Michał

04-17-2014, 06:40 PM

I was asked to repost here my SNP-based estimates for R1b-U106, which I originally announced in the R1b-U106 Yahoo group (these estimates were produced using the Big Y results of the R1b-U106 FTDNA project members).

Taking the opportunity of having access to such a large sample of nicely analyzed Big Y results, I've used those Big Y data (from the recent Big Y 120 spreadsheet) to make some provisional SNP-based TMRCA estimates for selected subclades of R1b-U106. Based on my previous experience with similar calculations for R1a and other haplogroups (including some R1b clades upstream of R1b-U106), I've decided to use a mutation rate of 0.66 x 10^-9 per bp per year, which gives approximately 150 years per BigY-tested mutation when the so-called "gold standard" region of about 10 Mb is considered. Although the sequenced region is actually slightly larger (11-13 Mb), it seems that the percentage of "reliable" mutations reported for sequences located outside the "gold standard" region is below 5% on average, so I didn't modify my calculations to take this into account.
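For anyone who wants to check the arithmetic behind the "150 years per mutation" figure, here is a minimal Python sketch. It uses only the numbers quoted above (0.66 x 10^-9 per bp per year and a region of about 10 Mb); the variable and function names are illustrative, and the unrounded value comes out at roughly 151.5 years, which is rounded down to 150 for the estimates.

```python
# Figures quoted in the post; all names below are illustrative only.
MUTATION_RATE = 0.66e-9      # mutations per bp per year
REGION_BP = 10_000_000       # "gold standard" region, about 10 Mb
ROUNDED_YEARS_PER_SNP = 150  # rounded value used for the estimates


def years_per_snp(rate=MUTATION_RATE, region_bp=REGION_BP):
    """Expected number of years between two SNPs in the tested region."""
    return 1.0 / (rate * region_bp)


def tmrca_years(mean_snps, years_per_mutation=ROUNDED_YEARS_PER_SNP):
    """Convert an average SNP count into an age in years."""
    return mean_snps * years_per_mutation


if __name__ == "__main__":
    print(round(years_per_snp(), 1))  # unrounded rate, about 151.5 years
    print(round(tmrca_years(39.6)))   # U106: 39.6 SNPs at 150 years each
```

Using the unrounded 151.5 years instead of 150 would make every estimate about 1% older, which is well within the margin of error of this approach.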

The average number of reliable BigY-tested mutations downstream of U106 seems to be 39.6, which corresponds to 5940 years (using the above-mentioned assumption). Since the number of downstream mutations may of course differ between particular sublineages, I've tried to base my calculations not only on the average number of mutations downstream of a given branching node but also on the number of mutations separating that node from U106. It is worth noting that this method gives reliable results only if a relatively large number of independent (!) downstream lineages is available. In all other cases, there is a huge risk of significant over- or under-estimation.
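The two sources of information mentioned above can be sketched roughly as follows. Please note that this is only one possible reading: the exact way of combining the two numbers is not spelled out here, so the simple mean used below is an assumption, and the function name and the input values in the example are hypothetical (they are not taken from the Big Y spreadsheet).

```python
YEARS_PER_SNP = 150.0        # rounded rate used in the post
U106_MEAN_DOWNSTREAM = 39.6  # average SNP count below U106 (from the post)


def node_age_estimates(mean_snps_below_node, snps_from_u106_to_node):
    """Return (bottom_up, top_down, combined) ages in years for one node.

    bottom_up: age from the average SNP count below the node itself.
    top_down:  age of U106 minus the time spent on the SNPs that
               separate the node from U106.
    combined:  plain mean of the two (ASSUMPTION, not the author's
               stated formula).
    """
    bottom_up = mean_snps_below_node * YEARS_PER_SNP
    top_down = (U106_MEAN_DOWNSTREAM - snps_from_u106_to_node) * YEARS_PER_SNP
    combined = (bottom_up + top_down) / 2.0
    return bottom_up, top_down, combined


if __name__ == "__main__":
    # Hypothetical node: 33 SNPs on average below it, 5 SNPs below U106.
    for label, age in zip(("bottom-up", "top-down", "combined"),
                          node_age_estimates(33.0, 5)):
        print(label, round(age))
```

When only a few lineages are available below a node, the bottom-up value depends heavily on those few samples, which is where the risk of over- or under-estimation comes from.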

Below are some of my results. As Raymond rightly pointed out, you need to keep in mind that each such estimate has a relatively large margin of error, so don't take those estimates too seriously. :)

U106 - 5940 ybp

Z381 - 5796 ybp

Z156 - 5628 ybp

DF98 - 4868 ybp

DF96 - 4412 ybp

L1 - 3416 ybp

L48 - 5663 ybp

L47 - 5084 ybp

Z160 - 4652 ybp

Z9 - 5134 ybp

Z30 - 5085 ybp

Z2 - 4943 ybp

Z7 - 4758 ybp

Z334 - 4814 ybp

Z326 - 3442 ybp

CTS2509 - 2887 ybp

Z5054 - 1802 ybp

Since the number of mutations downstream of U106 in all sublineages of clade Z18 is significantly lower than in most of the remaining subclades of U106, I am reluctant to provide any reasonably secure estimates for Z18 or Z372. We can of course imagine that such a lower number of SNPs results from natural random fluctuations that may lead to a decreased (or increased) mutation rate in certain lineages. In this particular case, we can safely assume that such a "local fluctuation" took place at some early stage, at or close to the root of Z18 (so it significantly affected all descending sublineages). On the other hand, we cannot rule out that some other subclades (like L48, and more specifically Z9) show the opposite phenomenon (i.e. an increased number of mutations due to some "random fluctuations"), and since those "overestimated" subclades are the most frequent ones, this may also bias the average number of mutations downstream of U106. One way to overcome this is to calculate the "overall average" by giving the same weight to all major subclades (irrespective of their frequency). Indeed, such an average number of mutations would be slightly lower (closer to 38 instead of approaching 40), which would place the most recent common ancestor of all U106 members at about 5700 ybp (or 3700 BC), while the TMRCA values for all descending subclades would also be affected accordingly.
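To show the difference between the two kinds of average, here is a small Python sketch. The SNP counts are invented purely so that the arithmetic is easy to follow (they are not the project's data), the function names are hypothetical, and the reference year of 2000 used for the ybp-to-BC conversion is an assumption implied by equating 5700 ybp with 3700 BC.

```python
YEARS_PER_SNP = 150  # rounded rate used in the post

# INVENTED SNP counts below U106 per tested sample, grouped by subclade.
# These are NOT real Big Y results; they only illustrate the two averages.
HYPOTHETICAL_COUNTS = {
    "L48": [41, 40, 42, 41, 40, 42],  # heavily sampled, slightly more SNPs
    "Z156": [39, 38],
    "Z18": [34, 35],                  # sparsely sampled, fewer SNPs
}


def pooled_mean(counts_by_clade):
    """Average over all samples, so frequent subclades dominate."""
    all_counts = [n for counts in counts_by_clade.values() for n in counts]
    return sum(all_counts) / len(all_counts)


def equal_weight_mean(counts_by_clade):
    """Average of the subclade averages, one vote per major subclade."""
    clade_means = [sum(c) / len(c) for c in counts_by_clade.values()]
    return sum(clade_means) / len(clade_means)


def ybp_to_bc(ybp, reference_year=2000):
    """Convert years before present to a BC year (assumed reference year)."""
    return ybp - reference_year


if __name__ == "__main__":
    for name, mean in (("pooled", pooled_mean(HYPOTHETICAL_COUNTS)),
                       ("equal-weight", equal_weight_mean(HYPOTHETICAL_COUNTS))):
        age = mean * YEARS_PER_SNP
        print(name, round(mean, 1), "SNPs,", round(age), "ybp,",
              round(ybp_to_bc(age)), "BC")
```

With these invented numbers the pooled average is pulled upwards by the heavily sampled subclade, while the equal-weight average sits lower, which is the same direction of shift as described above.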

Here are some links to my previous SNP-based estimates that were posted on the Anthrogenica and Molgen forums:

http://www.anthrogenica.com/showthread.php?828-STR-Wars-GDs-TMRCA-estimates-Variance-Mutation-Rates-amp-SNP-counting/page9&p=26002#post26002

http://eng.molgen.org/viewtopic.php?t=1300&p=20293
