Page 31 of 32 FirstFirst ... 2129303132 LastLast
Results 301 to 310 of 320

Thread: STR Wars, GDs, TMRCA estimates, Variance, Mutation Rates & SNP counting

  1. #301
    Registered Users
    Posts
    1,649
    Sex
    Location
    Calgary
    Ethnicity
    Anglo
    Nationality
    Canadian
    Y-DNA (P)
    I2-S2361 < L801
    mtDNA (M)
    H2a2b(1)
    mtDNA (P)
    H3

    Canada
    Rapid evolution of the human mutation spectrum
    More mutations at different rates between populations.

  2. The Following 2 Users Say Thank You to Megalophias For This Useful Post:

     Michał (10-31-2016),  Táltos (10-31-2016)

  3. #302
    Registered Users
    Posts
    343
    Sex
    Location
    USA
    Nationality
    USA
    Y-DNA (P)
    R1b-L21 L513*

    United States of America Ireland Germany Belgium Wallonia
    For Mark Jost or anyone who is familiar with Ken Nordtvedt's Y-STR-based Interclade age estimating methodology, I recently changed the TMRCA estimations in SAPP (my phylogenetic tree tool) to use that methodology instead of Nordtvedt's older adaptation of the Bruce Walsh methodology.

    The main reason I switched is that Nordtvedt's older method required knowledge of the allele frequencies, which are specific to individual haplogroups. I built the L21 allele frequencies into the tool, but that made it less accurate for other haplogroups. The Interclade least-squares estimations don't require allele frequencies, so they are more widely applicable.

    The tool recalculates an Interclade age estimation at every branching point (node) on the phylogenetic tree. Internally, every node has only two branches, so there are always two "sub-clades" to use for the methodology. When the phylogenetic tree is drawn, it eliminates unnecessary nodes so you only see relevant branching with their TMRCA estimates. I should also note that although I calculate coalescence ages as well, I don't currently report them, for simplicity.

    Switching between the two methods, I find the Interclade methodology produces slightly younger TMRCA estimates but with tighter one-standard-deviation error ranges than Nordtvedt's older method.

    Since the tool constrains the phylogenetic tree by SNP results, I also report the SNP TMRCA ages (where known) from YFull (V5.04 currently) against the SNPs reported on the tree. This allows for direct comparison around the tree between TMRCAs calculated by SNPs and TMRCAs calculated by STRs.

    But... this is not necessarily an apples-to-apples comparison since the TMRCA nodes reported on the tree are only the TMRCAs for the group being charted, not the TMRCA for that SNP overall. You would expect then in general that the SNP TMRCAs should be older (MUCH older, in some cases) than the STR TMRCAs. So the way to compare the two ages is that the SNP's overall TMRCA is somewhere ABOVE the node shown on the tree, and the node itself is of the age reported for the STR-based calculation.

    This is an example (from data invented for display purposes) of a node in the phylogenetic tree chart with both SNP (in blue) and STR (in green) ages shown:

    Attached Images

  4. The Following User Says Thank You to Dave-V For This Useful Post:

     Roslav (08-16-2021)

  5. #303
    Registered Users
    Posts
    22
    Sex

    Quote Originally Posted by MJost View Post
    Anatole Klyosov uses several methods to produce ages based on a 25-year-per-generation mutation rate.
    http://www.jogg.info/52/files/Klyosov1.pdf

    Chandler has posted his own set of calculated mutation rates. His paper is found at: http://www.jogg.info/22/Chandler.pdf

    Marko Heinila produced his own, more recent mutation rates in May 2012 using methods based on Chandler's. He has a link to his 111-marker rates near the bottom of this web page.
    https://dl.dropboxusercontent.com/u/...svg_trees.html

    Marko Heinila's results are based on about 4,000 111-marker samples. His estimation process treats each haplotype pair as an independent random draw from a model distribution. The model distribution gives the expected ratio of mismatches to matches at a given marker among pairs with a given number of matching markers overall. The pair data was then used to solve for the mutation rates. He said that this is the same idea as in Chandler's paper on mutation rate estimation.

    MJost
    A little late, but I have attached a short report that discusses my 2012 estimation method and its relation to Chandler's original approach. Basically, the (2012) estimation method reworks the relevant math, removing some approximations, and also, perhaps more importantly, introduces topological data weighting. It appears that, although unpublished, these estimates are still referred to on the web, even in some peer-reviewed papers. Some extra details therefore seem appropriate.

  6. The Following 5 Users Say Thank You to MarkoH For This Useful Post:

     JMcB (04-06-2021),  razyn (04-06-2021),  Roslav (07-03-2021),  sheepslayer (04-06-2021),  Telfermagne (04-06-2021)

  7. #304
    Registered Users
    Posts
    22
    Sex

    2021 YHRD

    The YHRD dataset has been slowly expanding since 2012. New loci have been included and the dataset has been diversified. So I wrote a short description of the current situation with respect to the accuracy of the 111-marker estimates.

    edit: I replaced the attachment to fix a comment related to the original error estimates
    Last edited by MarkoH; 07-16-2021 at 09:23 PM.

  8. The Following 3 Users Say Thank You to MarkoH For This Useful Post:

     J-Live (07-17-2021),  JMcB (07-16-2021),  sheepslayer (07-17-2021)

  9. #305
    Registered Users
    Posts
    22
    Sex

    There was the technical detail that the rate estimates are found by a probability maximization process. Such an estimate would then be a mode of a distribution rather than a mean, as I incidentally suggested in the document yhrd2021b.pdf.

    It seems that I can no longer edit the previous post, so the document with a more consistent error estimate section is attached here.

  10. The Following User Says Thank You to MarkoH For This Useful Post:

     sheepslayer (07-18-2021)

  11. #306
    Registered Users
    Posts
    22
    Sex

    Quote Originally Posted by MarkoH View Post
    There was the technical detail that the rate estimates are found by a probability maximization process. Such an estimate would then be a mode of a distribution rather than a mean, as I incidentally suggested in the document yhrd2021b.pdf.

    It seems that I can no longer edit the previous post, so the document with a more consistent error estimate section is attached here.
    Somehow I forgot to add the quite relevant 95% confidence interval for the overall scale of the estimates.

    After all, errors are composed of per locus errors and an error in the overall scale. For the very fastest loci these are fairly similar in magnitude, while the per locus error dominates in the case of the slower loci.

    Also, the scale factor uncertainty needed to be removed from the calculation producing Fig. 1.

    None of this of course changes anything material in the document....
    Last edited by MarkoH; 07-21-2021 at 08:12 AM.

  12. #307
    Registered Users
    Posts
    22
    Sex

    I would summarize my recent observations as follows:

    Any pair of real-world haplotypes is related by a sequence of independent
    periods of evolution (called "branches" in rrates.pdf). If each of the
    "branches" of evolution is given an equal probability of occurrence
    (by using a weighting scheme), random draws from an abstract haplotype
    pair distribution can be simulated and mutation rates can be
    estimated.


    Even though my 2012 estimates did not score particularly well against
    the father/son pair data that was available in 2012, they work very well
    with today's more complete YHRD dataset. It is estimated that the
    accuracy is at least similar to that of a father/son dataset of 10,000
    samples, but the accuracy could potentially be as good as with a set
    of 80,000 father/son pairs.
    Last edited by MarkoH; 07-22-2021 at 02:46 PM.

  13. #308
    Registered Users
    Posts
    22
    Sex

    The use of a particular weighting function in the estimation algorithm description document rrates.pdf was critical for the accuracy. However, it may appear cryptic. To clarify the meaning of the weighting, I wrote a short description of it (attached).

    For example, if m(a,b) is a mutation count between samples a and b, and t(a,b) is their time distance, the related mutation rate is simply

    sum w(a,b) m(a,b) / sum w(a,b) t(a,b)

    where w(a,b) is the weighting factor and the sum is over all sample pairs (a,b). This simple result is accurate, e.g., if the changes always happen at unique locations and can therefore be counted accurately from the pair observations, without complications such as back and parallel mutations.
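The ratio above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code behind rrates.pdf or weighting.pdf: the function name, the tuple layout (weight, mutation count, time distance) and the example numbers are all assumptions made up for display.

```python
from typing import Iterable, Tuple


def weighted_mutation_rate(pairs: Iterable[Tuple[float, float, float]]) -> float:
    """Return sum(w * m) / sum(w * t) over all sample pairs.

    Each item is (w, m, t): the weighting factor w(a,b), the mutation
    count m(a,b) and the time distance t(a,b) of one sample pair.
    The tuple layout is a hypothetical convention for this sketch.
    """
    weighted_mutations = 0.0
    weighted_time = 0.0
    for weight, mutations, time_distance in pairs:
        if weight < 0 or time_distance < 0:
            raise ValueError("weights and time distances must be non-negative")
        weighted_mutations += weight * mutations
        weighted_time += weight * time_distance
    if weighted_time <= 0:
        raise ValueError("total weighted time must be positive")
    return weighted_mutations / weighted_time


# Invented example: two pairs, the second one down-weighted by half.
example_pairs = [(1.0, 2, 100.0), (0.5, 1, 50.0)]
example_rate = weighted_mutation_rate(example_pairs)
```

With these invented numbers the numerator is 2.5 weighted mutations and the denominator is 125 weighted time units, so the rate is 0.02 mutations per time unit. How the weights w(a,b) themselves are chosen is the subject of the attached description and is not reproduced here.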
    Last edited by MarkoH; 07-26-2021 at 11:55 AM.

  14. #309
    Registered Users
    Posts
    22
    Sex

    If the statistical tests of the document yhrd2021d.pdf are done for
    "Heinila (2012)" rates, McDonald rates, and Chandler rates,
    respectively, I find the following values with the data at
    https://yhrd.org/pages/resources/mutation_rates :

    P value: 0.36 (Heinila 2012), 0.0016 (McDonald), 3e-5 (Chandler)

    Maximum likelihood N0 value: 89000 (Heinila 2012), 12000 (McDonald), 7300 (Chandler)

    This is surprising, since the McDonald rates are partly based on the (2012)
    rates. However, the inclusion of relatively small father/son studies
    may reduce the effective size N0. Perhaps the biggest single
    difference between the (2012) rates and the Chandler rates was the use of the
    summing theorem (weighting.pdf).

  15. #310
    Registered Users
    Posts
    22
    Sex

    I started editing the documents I posted earlier since they omit some details. If you would like to get extended versions, please send a message.

    I would strongly encourage P value tests for various mutation rate sets, perhaps simply ignoring complex loci like 385 & 389. P tells how likely it would be to see mutation counts as extreme as (or more extreme than) those observed in measured transmission events if the rate set in question were accurate.
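As a rough illustration of what such a test can look like for a single locus, here is a minimal sketch. It is not the test from yhrd2021d.pdf: it assumes the mutation count over n transmission events is approximately Poisson with mean n times the assumed rate, and it uses the common "twice the smaller tail" convention for a two-sided P value. The function names and the example numbers are invented.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k events, computed in log space."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def two_sided_p_value(observed: int, transmissions: int, assumed_rate: float) -> float:
    """Two-sided P value for one locus under a Poisson approximation.

    observed      -- mutations counted at the locus (hypothetical input)
    transmissions -- number of father/son transmission events
    assumed_rate  -- per-transmission mutation rate being tested
    """
    if observed < 0 or transmissions <= 0 or assumed_rate <= 0:
        raise ValueError("need observed >= 0 and positive transmissions and rate")
    mean = transmissions * assumed_rate
    below = sum(poisson_pmf(i, mean) for i in range(observed))  # P(X < observed)
    lower_tail = below + poisson_pmf(observed, mean)            # P(X <= observed)
    upper_tail = max(0.0, 1.0 - below)                          # P(X >= observed)
    return min(1.0, 2.0 * min(lower_tail, upper_tail))


# Invented example: 10,000 transmissions and an assumed rate of 0.002,
# so 20 mutations are expected at this locus.
p_typical = two_sided_p_value(20, 10_000, 0.002)
p_extreme = two_sided_p_value(40, 10_000, 0.002)
```

An observed count equal to the expectation gives a P value at the cap of 1, while twice the expected count gives a very small one. A test over a whole rate set would have to combine the loci in some way, which this sketch does not attempt.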
    Last edited by MarkoH; 08-01-2021 at 09:45 PM.

