SNP Counting and Estimating the Age of R1b



rms2
08-02-2014, 03:03 PM
What is the latest on the SNP-counting method of estimating the age of R1b and its subclades?

Where can one access a relatively complete list of SNPs that excludes SNP designators that are merely different names for the same SNP (to prevent overcounting)?

alan
08-02-2014, 03:54 PM
I noticed most of the discussions on this have not been on this site. I am very interested in this too and it would be nice if anyone weighed in with some thoughts on this here.

alan
08-02-2014, 04:09 PM
I posted about the long lifespans of Medieval Irish kings and their habit of taking many wives in sequence, which strongly suggests that any top-down expansion of Y lineages from kings may well have involved men who held positions of power into old age. I believe a longer average interval between generations than some suspect is appropriate for similar top-down expansions in the past. Leaders may have had sons over a span of 20 years or more and lived much longer than the poor. It seems pretty likely from the whole shape of P312 that a similar process was going on much earlier, perhaps from immediately after P312 itself. It looks like a top-down push rather like Medieval Ireland. If so, then I wouldn't hesitate to suggest 30 years or more as the average span between the birth of a male and that of his father on these apparently privileged lineages.

VinceT
08-02-2014, 06:29 PM
I'm in the process of compiling such a list, combining 1KGP, 23andMe, and FGC & YFull data via Ybrowse; several of these SNPs are recurrent, having at most 10 different occurrences on the tree. So far I've counted approximately 220 SNPs spanning from the top of the R-M343 branch (i.e. at the R1a-R1b split) down to the bottom of the R-L11 branch. If I presume that the U106-P312 split took place approximately 4500 ybp, and that these SNPs accumulate once per approximately 70 years on average (faster than FGC's estimate of ~89 years/SNP, which uses their filtered data and rejects recurrent SNPs), I end up at circa 19,900 ybp.

But I may be on the high side here. The rate may be slightly faster and the ages slightly more recent.

alan
08-03-2014, 12:55 AM
Why did you go for 70 years per SNP instead of 89?


George Chandler
08-03-2014, 01:16 AM
How many of those are known triangulated lineages where you can say you have 1 every 70 years or whatever?

George

VinceT
08-03-2014, 01:51 AM
I calibrated to the reckoning of Dr. Nordtvedt and Dr. Klyosov that P312 and U106 split approximately 4,000 to 4,200 ybp, but scaled up for Dr. Tim Janzen's reckoning of up to 5,500 ybp and the acknowledgement that sample variances tend to underestimate true population variances slightly, so I presumed 4,500 ybp.

Below R-L11 (i.e. typical R-U106 and R-P312 branches), there appears to be a range of roughly 45 to 65 SNPs per sub-branch.

4500/65 = ~69, bumped up to 70.

Going with the high end: (220 + 65) * 70 = 19,950. This is more than Dr. Karafet's max 18,500 ybp estimate for R-M343, but I was looking for the upper end.

We can take the median of approximately 55 SNPs from the present day to the U106-P312 split: 4500/55 = ~82 years per SNP. (220 + 55) * 82 = 22,550 ybp, which is a bit more than Karafet's 18,500 ybp. I'm leaning away from this, because I suspect that the occurrence of highly recurrent SNPs due to population expansion in recent times likely filters out a decent percentage of the true SNP count, somewhat akin to the Birthday Paradox problem.
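The arithmetic in this post can be sketched in a few lines of Python, assuming SNPs accrue at a constant average rate along every branch (the function and variable names are illustrative only, not from any tool). Note that with the median figures, (220 + 55) × 82 works out to 22,550.

```python
# Sketch of the calibration in this post. Assumes a constant average
# SNP rate on every branch; names are illustrative, not from any tool.

def years_per_snp(anchor_age_ybp: float, snps_below_anchor: float) -> float:
    """Years per SNP implied by a dated split and the SNP count below it."""
    return anchor_age_ybp / snps_below_anchor


def age_from_count(snps_above: float, snps_below: float, rate: float) -> float:
    """Age of an upstream node: every SNP from the present up to it, times the rate."""
    return (snps_above + snps_below) * rate


ANCHOR_YBP = 4500  # presumed U106-P312 split
ABOVE_L11 = 220    # SNPs from the R1a-R1b split down to the bottom of R-L11

# High end: 65 SNPs below L11, rate rounded up to 70 years per SNP.
high_rate = years_per_snp(ANCHOR_YBP, 65)
high_age = age_from_count(ABOVE_L11, 65, 70)

# Median: 55 SNPs below L11, rate rounded to 82 years per SNP.
median_rate = years_per_snp(ANCHOR_YBP, 55)
median_age = age_from_count(ABOVE_L11, 55, 82)

print(f"high end: {high_rate:.1f} y/SNP -> {high_age:,.0f} ybp")
print(f"median:   {median_rate:.1f} y/SNP -> {median_age:,.0f} ybp")
```

The rate and the age are kept as separate steps so that either input (the anchor age or the SNP count below it) can be swapped without touching the other.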

Obviously, it's still pretty loosey-goosey at this point. And of course, finding and successfully sequencing an LGM-era R1(b?) sample would be pretty nifty.

VinceT
08-03-2014, 02:03 AM
I don't know of any projects that employ high resolution and high coverage (i.e. FGC's 'Elite' product, not FTDNA's 'BigY' product) full sequencing of family members with proven deep-rooted pedigrees.

Years per SNP would be inversely proportional to breadth of coverage, presuming mutations are randomly distributed. BigY's coverage implies a rate closer to 137 years per SNP identified among the ~10 million bases sequenced by that test, according to several reports.
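A minimal sketch of that proportionality, assuming SNPs are scattered uniformly over the callable region so that the years between observed SNPs scale with the ratio of bases covered. Pairing FGC's ~89 years/SNP with the ~137 years/SNP reported for BigY's ~10 million bases implies FGC reads roughly 1.5 times as many usable bases; that ratio is derived here, not a published coverage figure.

```python
# Assumes SNPs fall uniformly at random across the callable Y region, so the
# expected number observed is proportional to the number of bases covered.

def scale_years_per_snp(years_per_snp: float, from_bases: float, to_bases: float) -> float:
    """Rescale years-per-SNP from a test covering from_bases to one covering to_bases."""
    return years_per_snp * from_bases / to_bases


def implied_coverage_ratio(years_a: float, years_b: float) -> float:
    """Bases covered by test A relative to test B, implied by their years-per-SNP."""
    return years_b / years_a


BIGY_BASES = 10_000_000  # ~10 million bases, the figure reported for BigY

ratio = implied_coverage_ratio(89, 137)  # FGC (~89 y/SNP) vs BigY (~137 y/SNP)
fgc_bases = ratio * BIGY_BASES           # derived, not a measured coverage figure

print(f"implied coverage ratio: {ratio:.2f}")
print(f"implied FGC callable bases: {fgc_bases:,.0f}")
print(f"check: {scale_years_per_snp(89, fgc_bases, BIGY_BASES):.0f} y/SNP on BigY")
```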

I still need to sift through that count of 220 SNPs to see how many of them are identifiable through BigY's coverage.

George Chandler
08-03-2014, 02:28 AM
We're getting ~45-60 under DF13 in our group, but what will differ is the number of Sanger-verified SNPs. Of the 32 Sanger-verified SNPs under DF13 for me, only 2 have happened in the past 12 generations (one every 189 years). When I look at the other Chandler, who is 10 generations to our MRCA, he discovered another 5 which were culled and for which primers can be made (yet to be retested). I'm going to retest those as well for consistency's sake. Personally, I don't like using STR calculations much anymore, as they have been so wrong when you actually look at the SNP results.

George

VinceT
08-03-2014, 02:47 AM
Of my own list of 52 SNPs below R-U106 > Z2265 (a count that excludes recurrent SNPs apart from L199), only 32 are testable and validated by Sanger sequencing. At least 9 of those happened within the last 1,250 years or so.

George Chandler
08-03-2014, 03:04 AM
That's interesting. How were you able to determine the 9 happened within the past 1250 years? What does the STR difference look like?

George

VinceT
08-03-2014, 03:26 AM
I miscounted above; there are actually at least 10 SNPs private to my line below L199.

My closest match (also L199+) has a GD of 8/67 and 14/111 to me. As near as I can figure (using McGee's Y-Utility), this places our TMRCA somewhere between 750 and 1,400 years ago. His BigY test shows 6, maybe 7, SNPs private to his line.

I have another less close match who is L199-, but McGee estimates our mutual TMRCA around 1250 ybp.

MJost
08-03-2014, 01:26 PM
Here is a chart I have been working on for my closest STR GD matches as they relate to the latest SNP testing, along with FTDNA TIP probabilities. I have always considered the 1-sigma (68%) range of the bell curve a good range to work within. In my chart you will see that a Watterson family with three sons came to the Virginia Colony in 1762.

I have 25 validated SNPs under DF13>FGC5496. I tested these 25 SNPs against my 155812 match (GD9/67 and GD13/111), and it produced SGD9/25, or 17 positive out of 25. My next two candidates are Wattersons. 206726, who was born on the Isle of Man, is an STR GD6/67 and GD8/111 match and turned out SGD3/25, or 22 positive of 25 SNPs. Then my predicted autosomal DNA 4th cousin 316063, a GD1/67 and GD3/111 match, had an SGD2/25, or 23 positive out of my 25 SNPs. (His aunt and I have the same small non-European admixture percentage, which would be consistent with it having been acquired since the 1762 arrival in colonial times.) There is a fourth candidate, N41593, a GD3/67 and GD5/111 match who appears to be at least one generation further back (MDKA b. 1824) and would be assumed to be SGD2 or 3. I am going to assume there is a SNP missing, based on the overall STR GDs and TIP calculations, or at least a mutation that has occurred in a different quadrant of the bell curve.

I also show various known generation birth years compared with the various years-per-SNP rates. You can see that, based on the known years per generation back to the late 1600s, 90 years per SNP works out very well as a general rule of thumb, and there is not a missing SNP after all, as may have been indicated.
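One way to frame this comparison, as a sketch: at a fixed 90 years per SNP, the expected number of SNPs on a line is the span from a known ancestor's birth to the tester's birth divided by 90, and SNP counts over such short spans scatter roughly like a Poisson variable. The tester birth year (1950) and the 1690 stand-in for "the late 1600s" below are hypothetical placeholders, not values taken from the chart.

```python
import math

YEARS_PER_SNP = 90   # rule-of-thumb rate from the post
TESTER_BIRTH = 1950  # hypothetical placeholder; not given in the post


def expected_snps(ancestor_birth: int, tester_birth: int = TESTER_BIRTH,
                  rate: float = YEARS_PER_SNP) -> float:
    """Expected SNPs on one line between an ancestor's birth and the tester's birth."""
    return (tester_birth - ancestor_birth) / rate


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of seeing exactly k SNPs when `mean` SNPs are expected."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# 1690 is a hypothetical stand-in for "the late 1600s".
for label, year in [("late 1600s", 1690), ("1762 arrival", 1762), ("MDKA b. 1824", 1824)]:
    mean = expected_snps(year)
    odds = ", ".join(f"P({k})={poisson_pmf(k, mean):.2f}" for k in range(5))
    print(f"{label}: expect {mean:.1f} SNPs; {odds}")
```

The Poisson spread is the point of the exercise: even if 90 years per SNP were exact, a line could easily show one SNP more or fewer than expected, which is the kind of "missing SNP" question raised above.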

https://drive.google.com/file/d/0By9Y3jb2fORNWkdIUlRjYTlKQWs/edit?usp=sharing

MJost

George Chandler
08-03-2014, 04:10 PM
Hi Mark,
How many of those relatives were you able to run a Full Y or Big Y against? I'm interested in how many were matched among the SNPs culled by YSEQ. What concerns me is that if we only take the viable Sanger SNPs into account, we are missing part of the information, while at the same time I understand that we risk putting unstable SNPs into the mix if we don't. If you look at the 9919 A4 results, there were 47 high-quality SNPs below DF13, and yet only 22 passed muster in terms of YSEQ, and those have yet to be retested. Although I'm really interested in your results, I'm also interested in the raw values. When I compare the results of the other Chandler line, as I mentioned before, there were only 2 of my validated SNPs we didn't match on. What is interesting is that his results picked up another 8 new SNPs for him, and of those, 5 were stable enough to develop primers. I'm going to retest those as well just to make sure, because there are 2 different sequencing companies in play and 2 different tests. It will be interesting to see how it plays out, since Astrid recommended 2 of the 5 new SNPs.

George

MJost
08-03-2014, 11:42 PM
I haven't had any others tested except myself, but 155812 tested all 25 SNPs. When he came back with 17, the other two kits tested only the last eight, which showed branching. If any had done a Full Y, we would be able to see their SNPs on their own branches, separate from mine.

I had 69 SNPs under DF13 but could validate only 25 by Sanger sequencing, due to various issues such as crossovers. I don't believe we can ever get a large number of SNPs back to DF13 from raw positive results; there are too many issues that may exclude some at some branch below. Who is to say with 100% certainty that these would prove positive in everyone who tests for that particular SNP?

MJost


George Chandler
08-04-2014, 01:27 AM
That's the frustrating part. I have 32 which have been validated, and if the other 5 Chandler SNPs are validated, that would be 35 for that line (2 of mine being negative). If I test negative for all 5, that would make 1 per 189 years for my line and 1 every 71 years for his line in terms of proven SNPs, which is a big difference.

It would just be interesting to see, for the kit that mismatched you at 9 SNPs, how that translated into raw SNPs. Would you see an equal number of mismatches in the raw results, or do the 9 you mismatch on just happen to be the stable ones? As you know, I've pretty much thrown out the STR comparison calculations after seeing that men who were supposed to be 1,000, 1,500, or 2,000 years from the MRCA share only 5 SNPs right under DF13.

George

George Chandler
08-04-2014, 02:26 AM
If I take the raw data results from the 2 Chandler lines, I have 53 high-quality SNPs below DF13 and the other Chandler has 55. From my birth to the time of our MRCA there are 377 years and 12 generations; for the other Chandler there are 356 years and 10 generations. Comparing the raw high-quality results under DF13, there have been 4 raw mutations in my line and 6 raw mutations in his. If we average the years as 367 and the number of mutations as 5, then there is one raw mutation every 73.3 years; multiplying by 54, the average number of SNPs between us including DF13, gives an age estimate of just over 4,000 years for DF13. Obviously results will vary between different tests, but it would be interesting to see what other families get when comparing the raw data.

George

George Chandler
08-04-2014, 02:28 AM
I should have said "and include DF13 which makes 55".
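The figures in these Chandler posts can be reproduced with a few lines of Python, as a sketch. It assumes, as the posts do, that the 5 new SNPs on the other Chandler line all turn out to be private to that line; the generation intervals at the end are simply the stated years divided by the stated generations.

```python
# Spans from each tester back to the shared Chandler MRCA, as stated in the posts.
MY_YEARS, MY_GENS = 377, 12
HIS_YEARS, HIS_GENS = 356, 10

# Validated (Sanger) SNPs since the MRCA: 2 on my line; 5 on his if all are private to it.
print(f"my line:  {MY_YEARS / 2:.1f} years per validated SNP")
print(f"his line: {HIS_YEARS / 5:.1f} years per validated SNP")

# Raw high-quality mutations since the MRCA: 4 on my line, 6 on his.
avg_years = (MY_YEARS + HIS_YEARS) / 2  # 366.5
avg_mutations = (4 + 6) / 2             # 5.0
raw_rate = avg_years / avg_mutations    # ~73.3 years per raw mutation

# Average high-quality SNPs below DF13 (53 and 55), plus DF13 itself.
snps_incl_df13 = (53 + 55) / 2 + 1      # 55.0
df13_age = raw_rate * snps_incl_df13

print(f"raw rate: {raw_rate:.1f} years per mutation")
print(f"DF13 age: ~{df13_age:,.0f} years")

# Average years per generation on each line.
print(f"generation interval: {MY_YEARS / MY_GENS:.1f} and {HIS_YEARS / HIS_GENS:.1f} years")
```

With only 4 and 6 mutations on the two lines, the raw rate carries wide statistical uncertainty, so the ~4,000-year result is best read as a ballpark.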

Michał
08-04-2014, 12:18 PM
Obviously, it's still pretty loosey-goosey at this point. And of course, finding and successfully sequencing an LGM-era R1(b?) sample would be pretty nifty.
Absolutely agreed. Some fully sequenced (and radiocarbon-dated!) R1b remains are what we badly need to verify our SNP-based age estimates, and I hope this will eventually happen. It is so frustrating that such aDNA data are not available yet.



I calibrated to the reckoning of Dr. Nordtvedt and Dr. Klyosov that P312 and U106 split approximately 4,000 to 4,200 ybp, but scaled up for Dr. Tim Janzen's reckoning of up to 5,500 ybp and the acknowledgement that sample variances tend to underestimate true population variances slightly, so I presumed 4,500 ybp.
I must admit that if I used your approach, I would definitely assume an older age for L11. I see no reason for giving more credit to the old STR-based estimates than to some more recent SNP-based calculations (like those of Tim Janzen, among others). Also, if L11 was only 4500 years old, this would practically rule out any involvement of the P312 males in the early dispersal of the Bell Beaker folk, which would then pose a significant problem when trying to explain the expansion of P312+U106 over the entire Western Europe.

In light of the above, it seems much better justified to assume that L11 is about 5500 years old (which is consistent with the estimates by Tim Janzen), even though I suspect that L11 might be a bit older (i.e. about 6000 years old).



Below R-L11 (i.e. typical R-U106 and R-P312 branches), there appears to be a range of roughly 45 to 65 SNPs per sub-branch.

4500/65 = ~69, bumped up to 70.
It is not clear to me why you decided to choose an upper limit when assuming the average number of SNPs downstream of L11 (instead of using a mean or median value). No wonder that, when combined with a very young (and probably underestimated) age for L11, this produced a surprisingly low figure of 70 years per SNP.

Unfortunately, I don’t have enough data to provide a reliable average number of the FGC-tested SNPs under L11. However, George Chandler reports here that the DF13 people get about 45-60 SNPs under DF13. I don’t know the exact average for this group, but it would probably be safe to assume that the average for DF13 is between 50 and 55, so when adding the 10 known SNPs positioned upstream (PF6547/S116/P312, PF6548/CTS12684/Z1904, S145/M529/L21, L459, S552/Y2598, S245/Z245, Z260, S461/Z290, Z2542/CTS8221, CTS241/S521/DF13), we get an average of 60-65 reliable SNPs downstream of L11. Based on what you wrote, we may suspect that some other sublineages of L11 show a lower number of SNPs downstream of L11 (as suggested by the median value of 55 for another subgroup of L11), so I would assume that the average number of SNPs in all L11 members is probably between 55 and 65, or close to 60.

Let me now use the above estimates to follow your approach and calculate the age of the R1a-R1b split. When dividing 5500 years by 60, we get 91.7 years per reliable FGC-tested SNP. In the next step, (220+60)x91.7 gives 25676 years to the R1a-R1b split, which is almost exactly the same as has been recently reported by Underhill in his paper about R1a. Most importantly, it is also perfectly consistent with the Y-DNA data for the radiocarbon-dated Mal’ta boy (which cannot be said about the old estimates by Karafet you were citing in your post).
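The same calculation with these inputs, as a sketch (the constant-rate assumption is unchanged). Rounding the rate to 91.7 before multiplying gives the 25,676 quoted; carrying the unrounded 5500/60 through gives about 25,667, which shows how little the rounding matters next to the uncertainty in the inputs.

```python
L11_AGE = 5500         # assumed age of L11, ybp
ABOVE_L11 = 220        # SNPs from the R1a-R1b split down to L11
UPSTREAM_OF_DF13 = 10  # the ten named SNPs from P312 down to DF13

# DF13 testers: ~50-55 SNPs under DF13 plus the 10 upstream -> 60-65 below L11.
df13_below_l11 = (50 + UPSTREAM_OF_DF13, 55 + UPSTREAM_OF_DF13)

BELOW_L11 = 60         # assumed average over all L11 sublineages

rate = L11_AGE / BELOW_L11                      # ~91.67 years per SNP
split_rounded = (ABOVE_L11 + BELOW_L11) * 91.7  # rate rounded first, as in the post
split_exact = (ABOVE_L11 + BELOW_L11) * rate

print(f"DF13 testers below L11: {df13_below_l11[0]}-{df13_below_l11[1]} SNPs")
print(f"rate: {rate:.1f} years per SNP")
print(f"R1a-R1b split: {split_rounded:,.0f} (rounded rate), {split_exact:,.0f} (unrounded)")
```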

The above estimates also suggest that L21 and DF13 are about 4800 and 4600 years old, respectively (within a quite reasonable margin of error, of course).

280 SNPs downstream of R1 is close to what we see in R1a, though if counting some well-known SNPs that usually give low quality results at FGC (i.e. ** or ***), this number will probably be a bit higher (close to 300).

parasar
08-04-2014, 02:15 PM
Michał,

Calibrating with Anzick-1 should give the same ballpark figures too.
http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=41327&viewfull=1#post41327

alan
08-04-2014, 03:58 PM
I have a lot of confidence in your estimates because they fall in line so well with Mal'ta, and in the case of L21 and DF13 they are uncannily close to the archaeological expectation for a Beaker group arriving at the English Channel.

Can I ask: after the R1-defining SNP, are there shared SNPs between R1b and R1a, or do the two lines split immediately?


A date around 24000BC for the split between R1a and R1b is also highly significant, because it is bang on the start of the LGM. We know that the extinct R line of Mal'ta boy was near Baikal c. 22000BC, so some R lineages hung around for some time during the LGM.


alan
08-04-2014, 04:34 PM
I think Michal's apparent finding that there is a significant gap between L51 and L11 on the one hand and downstream major branches like DF13, U152, etc. on the other is an important one. It slightly removes the sense of European R1b (other than L23xL51) coming out of the blue from nowhere. It allows European lines like L51xL11 and L11* a considerable length of time resident in at least Alpine Europe before the main burst a little below P312. It is interesting as it provides a European pre-Beaker phase. I am in no way saying that those lines today have an old intraclade TMRCA, but they show surviving branches were being shed before the upstream SNPs which define most European R1b. I think there is significance to them and their patterning.

This all does make me wonder even more whether L23 was somehow linked with copper working from its very invention, with L11 or L51 being linked to its later spread west of the Balkans, from the east end of the Alpine zone to the west, which was a relatively slow process c. 4000-3000BC. If so, then it does make me wonder whether older lineages of L23, and maybe even M269xL23, are linked to the oldest copper smelting in the Balkans c. 5000BC, Anatolia, Iran, etc. That wouldn't be a crazy idea if we accept that copper working was diffused by lineages rather than being the subject of coincidental invention in similar periods in the Balkans, Anatolia, north Iran, etc. It might explain the weird distribution of L23xL51. It is also possible that it got into the steppes along with the Balkans-steppe copper network run by Sredny Stog people, who appear to have been a mix of steppe and farming types c. 4500BC. The detail is unclear, but it's entirely conceivable that a lineage with the knowledge of copper jealously guarded their secret and spread widely into other cultures to feed demand but remain in control of it.

This is a handy guide to the not-fully-resolved debate over the origin of copper:

http://armchairprehistory.com/2010/07/07/a-primer-on-old-world-metals-before-the-copper-age/

I believe in single origin and diffusion but I have an open mind as to origin point - the evidence is just not good enough in many areas.

parasar
08-04-2014, 04:42 PM
I see 41 SNPs that are R1-equivalent or upstream in my YFull data.
R-equivalent SNPs are fewer: 12.

Michał
08-04-2014, 05:05 PM
The experimental tree at YFull shows 46 SNPs assigned to the R1 level, and I don't think this is a full list.

alan
08-04-2014, 06:31 PM
So you could say that the splitting into two lines that have survived may represent some minor improvement in reproductive survival, or a geographical change or split, after a long period (c. 4,000-5,000 years) of bare survival. The timing c. 24000BC seems a little odd for an improvement, as it's the start of the LGM. Perhaps, again, some improvement due to a change in geography is indicated, perhaps the decision to head to a southern refugium around the Altai or somewhere else. Perhaps it's counterintuitive, and the spread of steppe tundra with the LGM actually brought better hunting.

We know that for most of the period 30-22000BC, south-central Siberia was occupied by the Middle Upper Palaeolithic culture of that area, of which Mal'ta boy seems a very late representative: the only certain date after 25000BC for that culture found as yet. So R1 branched at a time when the culture Mal'ta boy lived in was slowly dying out. Again, an odd time to branch on the face of it. However, it seems clear from radiocarbon dating that most of Siberia was abandoned c. 25000BC, with Mal'ta being an exception. Did the R1 line exit southwards around this time, while the ancestors of Mal'ta boy remained in the harsher zone around Baikal for a few thousand years more before dying out?
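The "c. 4-5000 years" figure can be roughly checked by multiplying an R1-level SNP count by a years-per-SNP rate, as a sketch. It assumes the 46 SNPs on YFull's experimental tree are all genuinely R1-level (parasar's 41 also includes upstream SNPs, so it is not used) and tries the three rates floated earlier in the thread; with Michał's ~92-year rate it comes to about 4,200 years. Because the list is probably incomplete, these are lower bounds.

```python
R1_LEVEL_SNPS = 46  # YFull experimental tree; probably not a full list

# Years-per-SNP rates floated earlier in the thread.
RATES = {"VinceT high end": 70, "VinceT median": 82, "Michał": 91.7}


def branch_years(snp_count: int, years_per_snp: float) -> float:
    """Duration of a branch if SNPs accrue at a constant average rate."""
    return snp_count * years_per_snp


for name, rate in RATES.items():
    years = branch_years(R1_LEVEL_SNPS, rate)
    print(f"{name:16s} {rate:5.1f} y/SNP -> R1 branch spans ~{years:,.0f} years")
```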



VinceT
08-04-2014, 09:21 PM
I missed a dozen additional SNPs close to the R-L51 and R-L11 branches. My current total count is now approximately (234 + 55) = 289. Tack on a ± 5% or 10% error to that, maybe?

VinceT
08-04-2014, 09:33 PM
67 SNPs at the R1 level so far from Greg Magoon's analysis (http://biorxiv.org/content/early/2013/11/22/000802.1) (still under review), counting SNPs with fewer than 10 recurrent occurrences in the tree.

alan
08-04-2014, 10:33 PM
What age would people put on the P297 SNP using SNP counting?

alan
08-04-2014, 10:45 PM
...also, can anyone post, or post a link to, the entire vertical chain of SNPs from M343 down to, say, M269? I get the general idea of years per SNP and would like not to have to keep bothering others for comment on this.

VinceT
08-05-2014, 12:44 AM
Aside from low-coverage data for HG01947 (Peruvian) and HG00640 (Puerto Rican), I'm not aware of any other publicly available NGS samples positioned between R-M343 and R-M269 to resolve the SNP chain.

*Correction: there are 3 BigY tests in the R1b1(xP297) DNA Project at https://www.familytreedna.com/public/R1b1Asterisk/default.aspx which may be helpful.


i.e.
HG01947: M343+ P25? M415+ L278+ L1068+ L388- L780? L389- P297- L320- L757- M269-
HG00640: M343+ P25? M415? L278+ L1068+ L388+ L780+ L389+ P297- L320? L757- M269-
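To read call strings like these mechanically, a small parser helps. This is a hypothetical helper, not part of any tool mentioned here, and the bracketing step assumes the markers are listed in tree order from upstream to downstream, which the post does not state; it raises an error if a positive call follows a negative one, since that would contradict the assumption. Under that assumption, HG01947 would sit between L1068 and L388, and HG00640 between L389 and P297.

```python
from typing import Dict, Optional, Tuple


def parse_calls(line: str) -> Dict[str, str]:
    """Map each marker to its call: '+' derived, '-' ancestral, '?' no call."""
    calls: Dict[str, str] = {}
    for token in line.split():
        name, call = token[:-1], token[-1]
        if call not in "+-?":
            raise ValueError(f"unexpected call in {token!r}")
        calls[name] = call
    return calls


def bracket(calls: Dict[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Deepest '+' and first '-', assuming the markers are listed in tree order."""
    last_pos: Optional[str] = None
    first_neg: Optional[str] = None
    for name, call in calls.items():
        if call == "+":
            if first_neg is not None:
                raise ValueError(f"{name}+ follows {first_neg}-: not a single chain")
            last_pos = name
        elif call == "-" and first_neg is None:
            first_neg = name
    return last_pos, first_neg


samples = {
    "HG01947": "M343+ P25? M415+ L278+ L1068+ L388- L780? L389- P297- L320- L757- M269-",
    "HG00640": "M343+ P25? M415? L278+ L1068+ L388+ L780+ L389+ P297- L320? L757- M269-",
}
for sample, line in samples.items():
    below, above = bracket(parse_calls(line))
    print(f"{sample}: positive for {below}, negative for {above}")
```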





...also, can anyone post, or post a link to, the entire vertical chain of SNPs from M343 down to, say, M269? I get the general idea of years per SNP and would like not to have to keep bothering others for comment on this.

Working on that... :)

MJost
08-05-2014, 02:21 PM
For reference, here is a post by Iain McDonald over on the Yahoo r1b1c_u106-s21 group. I am seeing 99 years per SNP and 102 years per STR at 111 markers. MJost


https://groups.yahoo.com/neo/groups/R1b1c_U106-S21/conversations/topics/23797;_ylc=X3oDMTJzbHBubzBnBF9TAzk3MzU5NzE1BGdycElkAzIyMDMyNzk3BGdycHNwSWQDMTcwNTE4Nzk4MgRtc2dJZAMyMzc5NwRzZWMDZG1zZwRzbGsDdm1zZwRzdGltZQMxMzk4MjY2OTg5

"Sorry for the bombardment of posts, but I've been able to add the Cockburn ages in as well. For U106, the picture now looks more sensible at:
Best guess: 118 years/mutation (+28 -20 years)
68% confidence interval: 98-146 years/mutation
95% confidence interval: 83-189 years/mutation

For U106 + Clan Donald, the results become (including mini-satellites):
Best guess: 116 years/mutation (+20 -15 years)
68% confidence interval: 101-136 years/mutation
95% confidence interval: 89-163 years/mutation
(add on about 10 +/- 3 years if you don't want mini-satellites)

The closeness between the Clan Donald and U106 results indicates there is negligible difference between R1a and R1b. The latter value of 116 +20 -15 years/mutation is my recommended value at present, and equates to 0.71 - 0.96e-9 mutations per year per SNP, or a foundation of U106 of 3050 - 1600 BC at 68% confidence.

In combination with the STR results, this would give 2250 - 1350 BC at 68% confidence, with constraints from archeological M269 results strongly favouring ages above 2000 BC. My best estimate for the age of U106 is therefore now the latter half of the third millennium BC (approximately 2500 - 2000 BC).

Cheers,

Iain."
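Iain's conversion from years per mutation to a per-base rate (his "per SNP" reads as per base position) is a reciprocal scaled by the number of callable bases. As a sketch: his 0.71-0.96e-9 range implies a callable region of roughly 10.3-10.4 million bases, close to the ~10 million quoted for BigY earlier in the thread; using a flat 10 million gives slightly higher per-base rates. The callable-base counts here are derived, not stated in the email.

```python
def per_base_per_year(years_per_mutation: float, callable_bases: float) -> float:
    """Mutation rate per base position per year."""
    return 1.0 / (years_per_mutation * callable_bases)


def implied_bases(years_per_mutation: float, rate_per_base_year: float) -> float:
    """Callable bases implied by a years-per-mutation figure and a per-base rate."""
    return 1.0 / (years_per_mutation * rate_per_base_year)


# 68% range for U106 + Clan Donald: 136 y/mutation <-> 0.71e-9, 101 y/mutation <-> 0.96e-9.
for ypm, rate in [(136, 0.71e-9), (101, 0.96e-9)]:
    print(f"{ypm} y/mutation at {rate:.2e}/bp/yr implies {implied_bases(ypm, rate):,.0f} bases")

BASES = 10_000_000  # the ~10 million BigY figure from earlier in the thread
for ypm in (136, 116, 101):
    print(f"{ypm} y/mutation over {BASES:,} bases -> {per_base_per_year(ypm, BASES):.2e}/bp/yr")
```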



From: Iain McDonald
Sent: Wednesday, 23 April 2014, 13:14
Subject: Re: Michal's SNP ages & SNP rates from Clan Donald

"I've had a chance to look through the Clan Donald data. There are four lineages for which I have been able to trace ancestries, totalling 2604 +/- 42 years. This page gives the Clan Donald family tree:

http://dna-project.clan-donald-usa.org/DNAmain3.htm

One lineage comes from Angus Og (b. 1272); the other three lineages stem from his three grandchildren.

In this time, there have been 19 SNPs and the aforementioned 4 mini-satellite mutations. Assuming we count the mini-satellite mutations, as I presume they are listed in David's BigY spreadsheet, I find:

113 +32 -21 years/mutation, or 92-145 years/mutation at 68% confidence and 77-201 years/mutation at 95% confidence.

Perhaps unsurprisingly, this lies between Michal's rate and mine, and would imply that U106 arose around 2100 BC, give or take a few centuries, during the height of the Bell-Beaker culture.

If we stick strictly to SNPs, I find:

137 +44 -27 years/mutation, or 110-181 years/mutation at 68% confidence and 91-262 years/mutation at 95% confidence.

I presume this is not appropriate, however, as mini-satellites are included in the BigY novel variant list.

There are six other ancestries, which are phylogenetically equivalent to the sons of Lord John McDonald. It's not clear whether these are further unrecorded sons of Lord John; I think it more likely that they share a common ancestor in the few generations after him. We can limit the slow end of the mutation rate by assuming that they are all sons of Lord John, making the total lineage 6348 +/- 47 years, over which there are 32 SNP + 4 mini-satellite = 36 BigY mutations. That would give:

50% confidence of an upper limit of 176 y/mut

68% confidence of an upper limit of 213 y/mut

95% confidence of an upper limit of 268 y/mut

In practice, the lower limit from all lineages is not likely to be less than:

50% confidence of a lower limit of 149 y/mut

68% confidence of a lower limit of 126 y/mut

95% confidence of a lower limit of 109 y/mut

The more tightly constrained lineage from the four known lines is consistent with these results, so I'd be happy adopting 113 +32 -21 years for the McDonald lines. I'd also be happy extending it to the U106 results, based on what we think we know about the origin of U106.

I've also combined it with the rates we get from Dutton-Warburton and Mumma to get:

105 +19 -14 years/mutation, or 90-124 years/mutation at 68% confidence and 79-151 years/mutation at 95% confidence.

Based on the other Donald lines and historical evidence, I'd now be inclined to suggest that the real value is towards the upper end of this range, i.e. circa 125-135 years/mutation.

Cheers,

Iain."
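The Clan Donald figures above come from counting mutations along documented lineages (2604 years, 19 SNPs plus 4 mini-satellite mutations, so 23 events) and putting a confidence interval on that count. The sketch below uses a standard exact (Garwood) Poisson interval on the count; this is one common method and not necessarily the one Iain used. It lands close to his 68% range of 92-145 years/mutation, but his 95% range (77-201) is wider than this simple calculation gives, and the sketch ignores the ±42-year uncertainty in the lineage total.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect_root(f, lo, hi, iters=100):
    """Root of a monotone function f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_interval(n, conf):
    """Exact (Garwood) two-sided interval for the expected count, given n observed events."""
    tail = (1.0 - conf) / 2.0
    top = 3.0 * n + 30.0  # search ceiling, well above the upper bound for modest n
    if n == 0:
        lower = 0.0
    else:
        # Lower bound: expected count at which n or more events has probability `tail`.
        lower = bisect_root(lambda lam: (1.0 - poisson_cdf(n - 1, lam)) - tail, 1e-9, top)
    # Upper bound: expected count at which n or fewer events has probability `tail`.
    upper = bisect_root(lambda lam: poisson_cdf(n, lam) - tail, 1e-9, top)
    return lower, upper


def years_per_mutation(years, n, conf=0.68):
    """Point estimate plus (fast, slow) bounds in years per mutation; requires n >= 1."""
    lower, upper = poisson_interval(n, conf)
    return years / n, years / upper, years / lower


if __name__ == "__main__":
    # Four documented Clan Donald lineages: 2604 years, 19 SNPs + 4 mini-satellites.
    best, fast, slow = years_per_mutation(2604, 19 + 4, conf=0.68)
    print(f"{best:.0f} years/mutation, 68% range {fast:.0f}-{slow:.0f}")
```

The same function applied to 19 events (SNPs only) gives the slower, SNP-only rate that Iain also quotes.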

George Chandler
08-06-2014, 06:08 AM
The results from the Chandler comparison, 73.3 years per raw SNP and 139 years per Sanger-confirmed SNP, are interesting when compared with the 9919 group results. As I stated before, using my line we have a rough age estimate for DF13 of 4,032 years (raw) and 4,587 years (Sanger). If I use the average 9919 results (excluding BigY), I get 4,032 years raw and 3,952 years Sanger for DF13.

It's surprising how close the age estimates are.

George
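George's DF13 ages are SNP counts multiplied by a years-per-SNP figure. The counts themselves are not stated in the post; dividing back (4,032 / 73.3 ≈ 55 and 4,587 / 139 = 33) suggests about 55 raw and 33 Sanger-confirmed SNPs on his line, so treat those counts as inferred. A minimal sketch of the arithmetic:

```python
def age_from_snps(n_snps, years_per_snp):
    """SNP-counting age estimate: number of SNPs times years per SNP."""
    return n_snps * years_per_snp


def implied_snp_count(age_years, years_per_snp):
    """Invert the estimate to recover the SNP count that an age implies."""
    return age_years / years_per_snp


if __name__ == "__main__":
    # Rates and DF13 ages from the post; the SNP counts are back-calculated, not quoted.
    for label, age, rate in [("raw", 4032, 73.3), ("Sanger", 4587, 139)]:
        n = implied_snp_count(age, rate)
        print(f"{label}: about {n:.1f} SNPs, re-estimated age {age_from_snps(round(n), rate):.0f} years")
```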

alan
08-06-2014, 02:01 PM
Well, at least all the variations on dating techniques are coming in at about 3500-2000 BC for the major clade nodes immediately below L11. That clearly puts them after the first farmers across Europe, which is in line with the ancient DNA evidence and supports the Copper Age arrival model.

I still think the really hard part to work out is the initial story of how L11-derived groups got to the western half of Europe from the older R1b zone of Eastern Europe and SW Asia in the Copper Age. It's as if there is a missing link, or part of the story has been obscured by later population replacement.

Certainly the signal of that move is a great deal more subtle and arguable than we may have hoped for. However, the key is probably that this was a lineage moving rather than a population, and that would give it low archaeological visibility. Maybe it was a lineage associated with metallurgy from its very origin, which, almost from the moment smelting was invented, gained the ability to penetrate any society because of what it could offer. That would mean it could have had shifting and multiple linguistic identities.

I think I have come around more to the idea that the whole L23-derived expansion may be down to a metallurgical lineage that may have begun its rise to importance c. 5000 BC with copper smelting. What is interesting is that copper smelting or copper trading quickly penetrated a number of societies. Within a few centuries of its origin (Balkans?) it was known in the Balkans, Anatolia, Iran and the western steppes. It's interesting to note that copper trading into the steppes, and simple copper production there, began long before copper spread into most of Europe. So it is possible that branches of any copper-working and trading lines could have been in the steppes, probably as an element of the Sredny Stog elites, from only a few centuries after the invention of copper smelting and long before the final rise of full PIE c. 4000-3500 BC. These lineages would then probably have been in the mix on the steppe during the 500-1000 years leading up to PIE. It's pretty clear from craniology that the Sredny Stog groups were a mixture of steppe and Balkan elements, apparently with the latter better represented among males. It's also clear how important Sredny Stog was in the rise of social hierarchy and elites among steppe tribes. So the nativist picture of indigenous evolution dominating on the steppes, which you sometimes read about, is hugely exaggerated.

rms2
08-06-2014, 06:33 PM
At the risk of starting up the usual bugfest, I want to add that the same period witnessed the transformation of Western Europe into not only an L11 dominated region, but a region speaking the older, Centum variety of Indo-European. The exact details of how that happened can be argued - and will, no doubt, be argued - but it seems to me unlikely that the two are merely coincidental.

alan
08-06-2014, 08:03 PM
I agree. There is simply no phenomenon later than Beaker and Corded Ware that can explain a pan-European linguistic change. After that, Europe is too divided to find a common denominator, and there is no pan-European genetic change of any scale after the arrival of R1b and R1a that can tie language change to genetic change. It appears to be the last great pan-European Y-DNA change in European prehistory, and you were a prophet of that even when the gene dating being suggested made it seem impossible. Your idea of R1b as centum and R1a as satem still looks basically correct. I would also say there is still no evidence from Afanasievo-Tocharian that clearly contradicts it, as we cannot be sure the R1a Tarim mummies are linked to either of them. Indeed, if the unconfirmed claim of M269 in Afanasievo ever gets confirmed, then it's even more brownie points to you for calling that so many years ago.

While I don't think a lineage equals a dialect in any simple way, if one lineage is older then it does fit the idea that the older lineage links to the older form of IE. The earlier R1b forms, such as we see in SE Europe, are older, while most R1a doesn't look older than Yamnaya. My best interpretation of IE and R1b in a steppe model is that R1b lineages may have been strongly involved in the pre-Yamnaya groups of Sredny Stog type, which linked the Balkans and the western steppe and moved into the former in a detectable way from 4200 BC onwards for a few centuries, and which continued to remain in contact with the steppe after that. However, they had already been strongly interacting and trading with the Balkans for 500 years or so before that, so it wasn't the meeting of complete strangers IMO. This predated the rise of mobile pastoralism on wheels, which seems to be a Yamnaya invention c. 3300 BC further east within the European steppe. From what I understand, the Afanasievo culture is believed to have initially involved horses but not wagons.

So, I see two phases:

1. A pre-Yamnaya phase, in which Sredny Stog-type groups interacted heavily with the Balkans before moving into the latter while remaining in contact with the steppe. Unlike Anthony, I don't see any need to regard all of these pre-Yamnaya steppe groups as Anatolian pre-proto-IE, and I think they could be centum IEs. After moving into the Balkans they remained in contact with the steppes, with evidence of return journeys in metalwork. I suspect this group was predominantly R1b L23-derived, because that lineage is old enough to date to movements as early as this (over 6000 years ago), while most R1a in the same area doesn't seem to be old enough and is more likely linked to Yamnaya. This early group lived in an area at the western end of the steppe around the Dnieper, where agriculture was known before 4000 BC, and I think that is probably reflected in some languages having more IE agricultural vocabulary. As Mallory pointed out, Tocharian seems to be one of the languages with a more developed IE agricultural vocabulary, which casts doubt on the idea that the Tocharians originated east of the Don. Through their Sredny Stog origins and the long-term influence of the Balkans, these groups also had a very early connection with Balkan metal. Again, this connection with metalwork and a more developed agricultural background prefigures patterns further west and the kind of IE languages R1b seems to be linked with. Whether the metal knowledge and R1b itself originally entered Sredny Stog from the Balkans is not important when you consider the back-and-forth relations between Sredny Stog and the Balkans from 4500 BC until after 4000 BC.

2. I think R1a is associated with Yamnaya and Corded Ware, and I believe its sudden take-off around the Yamnaya period is no coincidence. It arose in an area that was on the eastern periphery of the Sredny Stog metal network and its links to metallurgy, and which also has very little evidence for agriculture other than pastoralism. Its rise IMO is a demographic expansion linked to the creation of pastoralism on wheels, something that probably happened when Maykop passed the wheel on to steppe groups, who then opened up the inter-riverine parts of the steppes as a huge new pastoral resource. It is also true that Yamnaya seems to have received some fresh impetus in metallurgy from Maykop sources: it became a northern branch of the CMP, and mining arose in the Urals soon after. Datewise, I think this happened around 3300 BC, only spreading west of the Black Sea after 3000 BC. I also think it was slightly post-Afanasievo. And I find it hard to believe that this wasn't the period when satemisation commenced.





alan
08-06-2014, 09:04 PM
One thing I think is abundantly clear (I am focusing only on Europe here) is that the branches M269 shed earliest are more common in the east of Europe, especially the Balkans. The SE European clades M269* and the L23xL51 clades seem to be dated by Michal's SNP counting to a considerable age, easily early enough to have been involved in the pre-Yamnaya steppe-Balkans interactions. Those interactions were a two-way thing that, in some form or another, was taking place from at least 4500 BC and included migration into the Balkans before 4000 BC. We can also be fairly sure from the craniology that a flow from the Balkans into the steppes had happened before the flow from the steppes into the Balkans around 4000 BC.

Where I disagree with Anthony is the idea that this was all Anatolian. I actually get the evidence for this from Anthony's own book, which points out that these steppe invaders of the Balkans remained in contact with the steppes afterwards and made return journeys bringing metals etc. IMO this would potentially mean some of these groups were not cut off from further linguistic developments in the steppes, and that interaction continued for some time afterwards. I think it's possible to make the branching of the IE language tree too chronological and to underestimate geography. The archaic IE homeland might have stretched over a vast space of the European steppe c. 4500-3500 BC, a period before mobile pastoralism on wheels. Before mobile pastoralism on wheels, geography would have had a much bigger impact: the dry, inhospitable steppe between the river valleys was not exploited, and groups were based on rivers and separated from each other, except for some intrepid traders who must have crossed between them, unless of course the Black Sea was used. To me, using the Black Sea would make a great deal of sense in terms of the Balkans-steppe metal network; Bulgaria especially would seem a likely coast to set off from, at places like Varna.

http://en.wikipedia.org/wiki/Varna_Necropolis

seferhabahir
08-06-2014, 09:55 PM
... If I presume that the U106-P312 split took place approximately 4500 ybp, and these SNPs accumulate once per approximately 70 years on average (faster than using FGC's estimate of ~89 years/SNP using their filtered data and rejection of recurrent SNPs), I end up at circa 19,900 ybp.


I'm late to the discussion. What is the rationale for rejecting or excluding recurrent SNPs in these kinds of counting calculations? I (and others) have a few recurrent SNPs in my DF13 line, but since they may have developed independently in my line and do not appear in any other DF13 line, why would they be ignored in the count of total novel SNPs in the line? Aren't they subject to the same kinds of randomness as a novel SNP that is not recurrent? At 88-90 years per SNP, this is 270 years of difference if there were three recurrent SNPs in a particular line. What am I not understanding?

parasar
08-06-2014, 11:13 PM
I'm late to the discussion. What is the rationale for rejecting or excluding recurrent SNPs in these kinds of counting calculations? I (and others) have a few recurrent SNPs in my DF13 line, but since they may have developed independently in my line and do not appear in any other DF13 line, why would they be ignored in the count of total novel SNPs in the line? Aren't they subject to the same kinds of randomness as a novel SNP that is not recurrent? At 88-90 years per SNP, this is 270 years of difference if there were three recurrent SNPs in a particular line. What am I not understanding?

Do you mean something like SRY10831 in R1a1?

VinceT
08-06-2014, 11:23 PM
I'm late to the discussion. What is the rationale for rejecting or excluding recurrent SNPs in these kinds of counting calculations? I (and others) have a few recurrent SNPs in my DF13 line, but since they may have developed independently in my line and do not appear in any other DF13 line, why would they be ignored in the count of total novel SNPs in the line? Aren't they subject to the same kinds of randomness as a novel SNP that is not recurrent? At 88-90 years per SNP, this is 270 years of difference if there were three recurrent SNPs in a particular line. What am I not understanding?

Well, on the extreme side in the case of the 1000 Genomes data-set, there are some SNPs that appear in excess of 20, even 30 times in various parts of the tree, while many other SNPs appear only once. Some other SNPs (such as P25) don't even show up at all in NGS data due to insufficient fragment read lengths. Part of the problem is that many of the more recurrent SNPs may be situated in regions that are somewhat unstable, such as palindromes, micro-satellites, and any other regions that may have or could experience chromosomal transposition due to high similarity. I certainly don't have a handle on how to objectively categorize the stability of polymorphisms in a particular region, but this issue was recently brought up for discussion on Facebook's ISOGG group (link (https://www.facebook.com/groups/isogg/permalink/10152636307522922#)).

haleaton
08-07-2014, 02:21 AM
Well, on the extreme side in the case of the 1000 Genomes data-set, there are some SNPs that appear in excess of 20, even 30 times in various parts of the tree, while many other SNPs appear only once. Some other SNPs (such as P25) don't even show up at all in NGS data due to insufficient fragment read lengths. Part of the problem is that many of the more recurrent SNPs may be situated in regions that are somewhat unstable, such as palindromes, micro-satellites, and any other regions that may have or could experience chromosomal transposition due to high similarity. I certainly don't have a handle on how to objectively categorize the stability of polymorphisms in a particular region, but this issue was recently brought up for discussion on Facebook's ISOGG group (link (https://www.facebook.com/groups/isogg/permalink/10152636307522922#)).

I noticed, culling through my FGC BGI and FTDNA BigY data, that many of the SNPs I shared with multiple distant haplogroups showed higher heterozygosity, often with a large number of reads, even though they were considered high quality. I was under the impression that in some cases this was due to read length with current NGS technology.

Is it possible that some of these are just due to current measurement uncertainty not instability?

George Chandler
08-07-2014, 03:34 AM
I noticed, culling through my FGC BGI and FTDNA BigY data, that many of the SNPs I shared with multiple distant haplogroups showed higher heterozygosity, often with a large number of reads, even though they were considered high quality. I was under the impression that in some cases this was due to read length with current NGS technology.

Is it possible that some of these are just due to current measurement uncertainty not instability?

I "believe" you're right, as there are various reasons why SNPs are culled by YSEQ. I know I use the word "instability" for all culled SNPs, but that may be an inappropriate term.

George

seferhabahir
08-07-2014, 05:32 AM
I'm referring to something like FGC11962 (17349099G>A), which is found in only two haplogroups. If you look in YBrowse, it says Approx. R-L21 below S9294, because FGC found it in their analysis of my DNA. However, according to FGC it was apparently also found in HG01356, a person who is R-Z1909 (downstream of L2). It is not a SNP that is found in 20 or 30 haplogroups; apparently it is in just two for the moment (R-L21 and R-L2), which to my way of thinking is probably almost the same as being unique. According to Greg Magoon, most FGC testers have one or two high-reliability SNPs that have appeared once before in some other haplogroup. If the region is stable, we might want to treat SNPs that have been found only once before the same way as never-before-found SNPs. In my BAM file it does not look heterozygous (100% of the sequences have exactly the same mutation).

Recurrent was probably a bad choice of words, although I think technically that is what they are. Another case is something like Z10164, which YBrowse places in R-Z251 (it is also found in my DNA, but not in any others who are R-Z251), but which was apparently first found in haplogroup A1a. This looks like yet another independently mutated SNP on my line. So here are two SNPs, FGC11962 and Z10164, that have each appeared in one other lineage that is not R-Z251, but within R-Z251 they appear only in my line. Z10164 is also unambiguous in my BAM file, with all 80 or so sequences having the same mutation. To my way of thinking, they should be used as part of the age calculation along with the other high-reliability FGC SNPs that have not appeared anywhere else to date.
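The counting rule proposed above can be made explicit: keep a high-quality call if it has been seen in at most `max_other` unrelated lineages, rather than rejecting every SNP that has appeared anywhere else. The sketch below is illustrative only. FGC11962 and Z10164 are the two cases from the post; the other two entries are hypothetical, and the 89 years/SNP figure is the FGC estimate quoted earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    name: str
    high_quality: bool   # e.g. unambiguous in the BAM file
    other_lineages: int  # times reported in unrelated haplogroups


def countable(v, max_other):
    """Keep a high-quality call seen in at most `max_other` unrelated lineages."""
    return v.high_quality and v.other_lineages <= max_other


def count_snps(calls, max_other):
    """Number of calls that pass the counting rule."""
    return sum(countable(v, max_other) for v in calls)


if __name__ == "__main__":
    calls = [
        VariantCall("FGC11962", True, 1),              # also reported in an R-Z1909 sample
        VariantCall("Z10164", True, 1),                # first reported in haplogroup A1a
        VariantCall("hypothetical_novel", True, 0),    # illustrative only
        VariantCall("hypothetical_hotspot", True, 25), # illustrative palindrome-style recurrence
    ]
    strict = count_snps(calls, max_other=0)   # reject anything seen elsewhere
    relaxed = count_snps(calls, max_other=1)  # allow one prior sighting
    years_per_snp = 89
    print(strict, relaxed, (relaxed - strict) * years_per_snp)  # prints: 1 3 178
```

Setting `max_other` to 1 rather than 0 still excludes SNPs that recur many times across the tree (the 20-30 occurrence cases mentioned above), which is the behaviour the post argues for.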

George Chandler
08-09-2014, 03:56 PM
I have a lot of confidence in your estimates because they are falling in line so well with Mal'ta, and in the case of L21 and DF13 they are uncannily close to archaeological expectation for a Beaker group arriving at the English Channel.

Can I ask: after the R1-defining SNP, are there shared SNPs between R1b and R1a, or do the two lines split immediately?

This is a great paper by Alison Sheridan which details Beaker artifacts that have been radiocarbon dated from about 3,500 to 3,900 years ago from Northern Scotland and mainly from the Aberdeen area. This fits like a key in a lock for the DF13 Beaker arrival in Scotland IMO.

http://www.academia.edu/4176664/Scottish_Beaker_dates_the_good_the_bad_and_the_ugly

George

George Chandler
08-15-2014, 03:27 AM
I just received SNP results back from another Chandler cousin with whom we share an MRCA born in 1800 (6 generations ago for me). Of the 2 Sanger-verified SNPs that occurred in my line since 1595, he shares only one. This means that the second SNP has occurred in my line since 1825, or within the past 146 years, and the first SNP occurred in the first 205 years.

George

MJost
08-15-2014, 10:03 AM
This is a great paper by Alison Sheridan which details Beaker artifacts that have been radiocarbon dated from about 3,500 to 3,900 years ago from Northern Scotland and mainly from the Aberdeen area. This fits like a key in a lock for the DF13 Beaker arrival in Scotland IMO.

http://www.academia.edu/4176664/Scottish_Beaker_dates_the_good_the_bad_and_the_ugly

George

Thanks for linking this paper.

This most probably reflects the influence of L11-P312/L21/DF13 ancestors around this time, via the Rhine/Netherlands connection mentioned in this paper, which I will quote.

"...Scottish Beaker use, taking on board the afore mentioned shortage of dates for the earliest material, would seem to span the 25th century BC to 1800 BC (ie 3900/3875–3550 BP). The vast majority of dates fall within the last three centuries of the third millennium BC (ie 3850–3650 BP); this, and the diversity of designs represented, fit with Needham’s model of a ‘fission horizon’. However, if one considers the totality of dates covering the diverse designs that characterise this phase of Beaker-related activity, it appears that this ‘fission horizon’ may have started slightly earlier in Scotland than in southern England – perhaps as early as 2350 BC"

"The second is that the dating evidence now available confirms earlier suspicions (as expressed, for example, by Ian Shepherd in 1986) that there had been a design influence from the Netherlands to north-east Scotland during the last three centuries of the third millennium – in addition to any previous Dutch (or other Continental) influence on Scottish/British Beaker design. ... Furthermore, such influence was not limited to ceramics, as the Dutch-style copper diadems or neck rings from Lumphanan, Aberdeenshire, indicate (Shepherd 1986: 9; Needham 2004: 237–8). Nor, indeed, is the area of contact limited to north-east Scotland. For all their faults, Clarke’s stylistic groupings (Clarke 1970) highlight widespread and regionally-variable design links across the North Sea; Clarke and Case (e.g. Case 2004) have argued for a possible lower Rhine conduit for the ultimately north European fashion of using battle axeheads as grave goods; and Needham has argued for the presence of a Veluwe-style Beaker, along with tanged copper knife closely comparable with Dutch examples, at Shrewton (5K, Wiltshire: Needham 1976: table 3; 2005). As far as Scottish Beakers are concerned, it would be useful to re-compare them with Dutch Beakers to assess the likely strength of the Dutch influence ..." pg 99-101

MJost

George Chandler
08-15-2014, 01:50 PM
Thanks for linking this paper.

This most probably had the influence of L11-P312/L21/DF13 ancestors around this time and via the Rhine/Netherlands connection as mentioned in this paper that I will quote.

"...Scottish Beaker use, taking on board the afore mentioned shortage of dates for the earliest material, would seem to span the 25th century BC to 1800 BC (ie 3900/3875–3550 BP). The vast majority of dates fall within the last three centuries of the third millennium BC (ie 3850–3650 BP); this, and the diversity of designs represented, fit with Needham’s model of a ‘fission horizon’. However, if one considers the totality of dates covering the diverse designs that characterise this phase of Beaker-related activity, it appears that this ‘fission horizon’ may have started slightly earlier in Scotland than in southern England – perhaps as early as 2350 BC"

"The second is that the dating evidence now available confirms earlier suspicions (as expressed, for example, by Ian Shepherd in 1986) that there had been a design influence from the Netherlands to north-east Scotland during the last three centuries of the third millennium – in addition to any previous Dutch (or other Continental) influence on Scottish/British Beaker design. ... Furthermore, such influence was not limited to ceramics, as the Dutch-style copper diadems or neck rings from Lumphanan, Aberdeenshire, indicate (Shepherd 1986: 9; Needham 2004: 237–8). Nor, indeed, is the area of contact limited to north-east Scotland. For all their faults, Clarke’s stylistic groupings (Clarke 1970) highlight widespread and regionally-variable design links across the North Sea; Clarke and Case (e.g. Case 2004) have argued for a possible lower Rhine conduit for the ultimately north European fashion of using battle axeheads as grave goods; and Needham has argued for the presence of a Veluwe-style Beaker, along with tanged copper knife closely comparable with Dutch examples, at Shrewton (5K, Wiltshire: Needham 1976: table 3; 2005). As far as Scottish Beakers are concerned, it would be useful to re-compare them with Dutch Beakers to assess the likely strength of the Dutch influence ..." pg 99-101

MJost

I'll have to try to find the other information I was looking at, which "I believe" stated that Bell Beaker pottery in Ireland has been dated to a similar period. This made me wonder whether, when they left continental Europe, they divided their boats in two directions. It would be amazing if they arrived on the NE shores of Scotland, with some travelling overland to the west coast, building boats and then populating Ireland. So if the carbon dating I saw is correct, it would all be within a couple of hundred years. I'm not sure whether the pottery was the same style as the NE Scotland/Rhine Beakers.

George

jamesdowallen
12-23-2014, 02:58 PM
Sorry to bump an oldish thread, but dating R1b-L11 is of great interest to me, and I don't check here often. I would love to see a clading diagram of R1b with SNP counts; something like the Francalacci-2013 chart, but it doesn't label the R1b branches. I read through this thread but if there was an explicit link to such a chart, I missed it.

Two comments/questions:
(1) Is the father-son generation time a key variable? I thought mutations multiply in spermatogenesis with passing years, so the increased mutation counts in older fathers would make years rather than generations the key variable.
(2) In another thread, I think we agreed the R1a/R1b split occurred very close to 26000 BP. With this as reference, I think the L11 timing should be determinable to about ± 400 years using modern DNA, or even better if an ancient skeleton can be fully sequenced.


Well at least all variations on dating techniques are coming in at about 3500-2000BC for the immediately below L11 major clade nodes which does at least seem to put it in a date range that is clearly post-first farmers across Europe - which is in line with ancient DNA evidence and does support the copper age arrival model.

I still think the really hard to work out part is the initial story of how L11 derived groups got to the western half of Europe from the older R1b zone of Eastern Europe and SW Asia in the copper age. It's like there is a missing link or part of the story has been obscured by later population replacement....

Your "3500-2000BC" date for L11 breakout is what we knew already and gives ± 750 years uncertainty. Asking for improvement to ± 400 years may seem like nitpicking, but it would help a lot in interpreting archaeological data. With present data can we even be sure whether the sudden L11 radiation began in Iberia, or much farther east? Do we even know whether the best Western copper workers arrived overland with Corded Ware, or came by sea from the Mediterranean?

Question, questions ... Help, please!

MJost
12-23-2014, 04:09 PM
Just to clarify

>'the "3500-2000BC" date for L11 breakout is what we knew already and gives ± 750 years uncertainty'

refers to a block of seven SNPs that contains L11, whose order of occurrence is not yet known. The 1500 years represent the total time for the block of mutations to have occurred, not a node date of 2750 BC for L11 with an error range of ± 750 years. The block containing L11 (7 other SNPs including P312) spans a total of 903 years, running from P312 at 4524 ybp (2524 BC) to a maximum time for the L11 block of 5427 ybp (3427 BC).

MJost
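The arithmetic in MJost's clarification bounds a block of SNPs rather than dating a single node: the youngest end is the P312 estimate, and the oldest end adds the block's total span. His pairing of "4524 ybp" with "2524bc" implies a present of AD 2000; that reference year is inferred from his numbers, not stated. A minimal sketch:

```python
# ASSUMPTION: "present" = AD 2000, inferred from "4524 ybp (2524bc)" in the post.
REFERENCE_YEAR = 2000


def ybp_to_bc(ybp, reference_year=REFERENCE_YEAR):
    """Convert years before present to a BC year (positive numbers are BC)."""
    return ybp - reference_year


def l11_block_bounds(p312_ybp, block_years):
    """Return (youngest, oldest) ybp for the L11 block, measured from the P312 estimate."""
    return p312_ybp, p312_ybp + block_years


if __name__ == "__main__":
    young, old = l11_block_bounds(4524, 903)
    print(f"{young}-{old} ybp, i.e. {ybp_to_bc(young)}-{ybp_to_bc(old)} BC")
```

Read this way, the figures describe a window in which the L11 block's mutations accumulated, which is the distinction MJost draws against a single 2750 BC ± 750 node date.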

MJost
12-23-2014, 04:13 PM
> (1) Is the father-son generation time a key variable? I thought mutations multiply in spermatogenesis with passing years, so the increased mutation counts in older fathers would make years rather than generations the key variable.

Mutations occur during a meiosis generation. The time for each generation varies but is generally averaged. My time plan generally uses 30 years per generation for the most recent 2000 years and 20 years before 0 AD, noting that this may vary in different paternal lines.

MJost
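The two-rate generation plan described above can be written as a piecewise conversion between generations and years. The 30- and 20-year figures are MJost's; taking "present" as AD 2000 is an assumption for this sketch.

```python
# Two-rate generation plan from the post. ASSUMPTION: "present" is taken as AD 2000.
GEN_RECENT = 30.0     # years per generation over the most recent 2000 years
GEN_ANCIENT = 20.0    # years per generation before 0 AD
RECENT_SPAN = 2000.0  # years from 0 AD to the assumed present


def generations_to_years(generations):
    """Years before present spanned by a number of father-son generations."""
    recent_gens = RECENT_SPAN / GEN_RECENT  # about 66.7 generations back to 0 AD
    if generations <= recent_gens:
        return generations * GEN_RECENT
    return RECENT_SPAN + (generations - recent_gens) * GEN_ANCIENT


def years_to_generations(years):
    """Inverse of generations_to_years."""
    if years <= RECENT_SPAN:
        return years / GEN_RECENT
    return RECENT_SPAN / GEN_RECENT + (years - RECENT_SPAN) / GEN_ANCIENT
```

For example, 100 generations works out to about 2,667 years under this plan, versus 3,000 years at a flat 30 years per generation, so the choice of generation length shifts older dates noticeably.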

jamesdowallen
12-23-2014, 05:46 PM
Thanks for your responses, MJost.

Is there a diagram, e.g. based on the 1000 Genomes study, showing SNPs within R1 or R1b? I find charts on the Web, but not ones like the Francalacci or Raghavan studies, where SNP distances to the present can be read or deduced directly from the chart.

MJost
12-23-2014, 06:52 PM
Jump over to here for further discussions and read this post I did.

http://www.anthrogenica.com/showthread.php?828-STR-Wars-GDs-TMRCA-estimates-Variance-Mutation-Rates-amp-SNP-counting&p=60307&viewfull=1#post60307

MJost

faulconer
12-23-2014, 06:55 PM
> (1) Is the father-son generation time a key variable? I thought mutations multiply in spermatogenesis with passing years, so the increased mutation counts in older fathers would make years rather than generations the key variable.

Mutations occur during a meiosis generation. Time for each generation varies but is generally averaged. My time plan generally uses 30 years per generation for the most recent 2000 years and 20 before 0AD noting that this may vary in different paternal lines.

MJost

An exponential model estimates paternal mutations doubling every 16.5 years. (http://www.ncbi.nlm.nih.gov/pubmed/22914163)

Aside from the fact that spermatogonia remain dormant until puberty, it seems that time is the important factor, though generations do matter if the mitosis of spermatogonia is to be considered.
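To see how the generations-versus-time question plays out under the exponential model quoted above, the sketch below assumes that a father's age at his son's birth equals the generation length and compares mutations per calendar year against a 20-year reference line. The linked study concerns paternal de novo mutations in general; applying its 16.5-year doubling time to Y-chromosome SNPs is an assumption, as is the 20-year reference age.

```python
import math

# ASSUMPTION: the 16.5-year doubling time is taken from the study linked above and
# applied to Y-chromosome SNPs purely for illustration.
DOUBLING_YEARS = 16.5


def relative_mutations(father_age, reference_age=20.0):
    """Expected paternal mutations per transmission, relative to a father of reference_age."""
    return 2.0 ** ((father_age - reference_age) / DOUBLING_YEARS)


def relative_rate_per_year(generation_years, reference_years=20.0):
    """Mutations per calendar year relative to a line averaging reference_years per generation.

    Assumes every father's age at his son's birth equals the generation length.
    """
    per_generation = relative_mutations(generation_years, reference_years)
    return per_generation * reference_years / generation_years


# Under this model the per-year rate is lowest where d/da [2**(a/T) / a] = 0, i.e. a = T / ln 2.
LOWEST_RATE_AGE = DOUBLING_YEARS / math.log(2)  # about 23.8 years


if __name__ == "__main__":
    for gen in (20, 25, 30, 35):
        print(gen, round(relative_rate_per_year(gen), 3))
```

Under these assumptions the mutations passed on per calendar year are not constant: they are lowest for generations of about 16.5 / ln 2 ≈ 24 years and rise for longer ones, so generation length would still matter even when dating in years rather than generations.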

MJost
12-23-2014, 07:18 PM
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3548427/

Are you suggesting that this study states or implies that aDNA would correspond to the same mutation rates as ChrY?

MJost

faulconer
12-23-2014, 07:32 PM
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3548427/

Are you suggesting that this study states or implies that aDNA would correspond to the same mutation rates as ChrY?

MJost

I am not. Do they need to be the same to allow for the age of the father to alter the number of Y mutations passed per generation? This is not a rhetorical question. I am curious as to how you chose your rates.

MJost
12-23-2014, 08:40 PM
I am not. Do they need to be the same to allow for the age of the father to alter the number of Y mutations passed per generation? This is not a rhetorical question. I am curious as to how you chose your rates.

Please review my posts here

http://www.anthrogenica.com/showthread.php?828-STR-Wars-GDs-TMRCA-estimates-Variance-Mutation-Rates-amp-SNP-counting&p=59988&viewfull=1#post59988

faulconer
12-23-2014, 10:56 PM
Thank you for the link. I have read through these posts before. I am specifically commenting on the generations vs time question brought up earlier in the thread. My quote from the article may be a bit misleading, and I apologize for that. My intention is to point out that I am not sure I follow how one can determine a SNPs-per-generation figure that is constant regardless of the time between generations.

If we can agree that the Y is copied many times before the sperm fertilizes the egg and that mutations occur during this process, then we can also agree that the number of mutations per generation would vary based on time between generations. Perhaps this is where we disagree?

It is possible that I am missing a basic understanding in this area and I am very interested in understanding your position. Please let me know what you think.

MJost
12-24-2014, 04:13 AM
Thank you for the link. I have read through these posts before. I am specifically commenting on the generations vs. time question brought up earlier in the thread. My quote from the article may be a bit misleading; I apologize for that. My intention is to point out that I am not sure I follow how one can determine a SNPs-per-generation figure that stays constant regardless of the time between generations.

If we can agree that the Y is copied many times before the sperm fertilizes the egg and that mutations occur during this process, then we can also agree that the number of mutations per generation would vary based on time between generations. Perhaps this is where we disagree?

It is possible that I am missing a basic understanding in this area and I am very interested in understanding your position. Please let me know what you think.

SNP (Single Nucleotide Polymorphism) mutations occur more frequently in germline cells.

>the Y is copied many times before the sperm fertilizes the egg and that mutations occur during this process

What you mean is somatic, where the Y chromosome undergoes multiple cell divisions during gametogenesis. Each cell division provides a further opportunity to accumulate base-pair mutations, but at very low rates.

----------------------------------

Wiki:

"A germline mutation is any detectable and heritable variation in the lineage of germ cells. Mutations in these cells are transmitted to offspring, while, on the other hand, those in somatic cells are not. A germline mutation gives rise to a constitutional mutation in the offspring, that is, a mutation that is present in virtually every cell. A constitutional mutation can also occur very soon after fertilisation, or continue from a previous constitutional mutation in a parent.[1]

This distinction is most important in animals, where germ cells are distinct from somatic cells. However, in plants, the reproductive cells in a particular flower will be derived from the same meristem as the cells in that flower and on the stem leading to the flower, which is a different population of cells than those that give rise to the other flowers on the plant. Single-celled organisms have no distinction between germline and somatic tissues.

In animals, mutations are more likely to occur in sperm than in ova, because a larger number of cell divisions are involved in the production of sperm.[2]

Mutations that are not germline are somatic mutations, which are also called acquired mutations"

-----------------------------

I used the Wei study information to calibrate my SNP counting (posted previously).


"estimate this number from the sequences of the three-generation family... observation of two germline mutations in two transmissions of 8.97 Mb is consistent with the expectation of ~0.6 mutations in two transmissions (0.3 variants observed per meiosis in 10.5 Mb)" (Xue et al. 2009). (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2748900/ A calibrated human Y-chromosomal phylogeny based on resequencing)

I posted this example.

"Let's say we have a point in time of 4000 ybp; with 33-year generations (ypg), that equates to 121.2 generations. At 0.3 variants observed per meiosis, that calculates to 36.4 mutations."

_______________________

I aged the list of SNPs using this mutation rate, to keep it simple.
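The arithmetic in the example fits into two small functions: one converts an age into expected mutations, and the other inverts that conversion. This is a sketch under two assumptions: a fixed generation length and a fixed rate per meiosis. The 0.3 figure in the quote refers to the roughly 10.5 Mb region studied by Xue et al., so counts from a region of a different size would need rescaling.

```python
def expected_mutations(years: float, years_per_generation: float, rate_per_meiosis: float) -> float:
    """Mutations expected along one line of descent over `years`."""
    generations = years / years_per_generation
    return generations * rate_per_meiosis


def years_from_mutations(mutations: float, years_per_generation: float, rate_per_meiosis: float) -> float:
    """Inverse of expected_mutations: convert a mutation count back to years."""
    return mutations / rate_per_meiosis * years_per_generation


print(round(4000 / 33, 1))  # 121.2 generations
print(round(expected_mutations(4000, 33, 0.3), 1))  # 36.4 mutations
```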


MJost

Chad Rohlfsen
12-24-2014, 05:28 AM
Everyone,

To the best of my knowledge, I am L11, P311, with nothing under it. What test would you recommend me buying? Will it be of good use to you?

Thanks,
Chad

Wing Genealogist
12-24-2014, 08:12 AM
Everyone,

To the best of my knowledge, I am L11, P311, with nothing under it. What test would you recommend me buying? Will it be of good use to you?

Thanks,
Chad

Ideally, the best SNP testing would be the NGS testing (either FTDNA's Big Y or FGC's (https://www.fullgenomes.com/) Y-Prime or Y-Elite). These tests are not cheap, but they would be the best tests on the market today.

FGC's tests also automatically include a listing of SNPs not found in public results (such as the 1000 Genomes Project or the PGP Project). They also analyze Big Y results for a small fee. In addition, another company, YFull (http://www.yfull.com), also analyzes Big Y and FGC results for a fee and lists where you match other individuals.

What would likely be of the most benefit is the listing of SNPs on which you DON'T currently match others. Seeing how many of these "singleton" SNPs you have may help in dating L11/P311.
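Here is a minimal sketch of how singleton counts can feed a date. The clade's age is estimated as the average number of private SNPs per tester multiplied by an assumed number of years per SNP. The counts and the 90-years-per-SNP figure below are hypothetical. The estimate is only meaningful if all testers were sequenced on the same platform.

```python
from statistics import mean


def clade_age_from_singletons(private_counts: list[int], years_per_snp: float) -> float:
    """Estimate a clade's TMRCA from the private SNPs of testers directly under it."""
    if not private_counts:
        raise ValueError("need at least one tester")
    return mean(private_counts) * years_per_snp


# Hypothetical private-SNP counts for three L11* testers on the same test.
counts = [48, 52, 55]
print(clade_age_from_singletons(counts, years_per_snp=90))
```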

jamesdowallen
12-24-2014, 02:45 PM
Thanks for pointing me to the YFull tree (http://www.yfull.com/tree/R1b/). I'm posting here again because the thread title is precisely my query.

It gives me feelings of confidence and definiteness to work with rawish data. So I counted the distance in SNPs from the root of the R1b tree to the leaves. Here are my results, which I did not yet double-check since I'm afraid my understanding is very flawed. I do realize I'm "reinventing a wheel"; I'd be happy to be pointed to an existing well-defined wheel. :)
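The root-to-leaf counting described here can be sketched as a tree walk. The sketch assumes the tree has already been transcribed into a dictionary. The node names and SNP counts are invented, and this is not YFull's data format. The walk sums the branch SNP counts along each path and reports the spread. Counts are only comparable across leaves if each leaf's private SNPs are included and all samples were sequenced over comparable regions.

```python
from statistics import mean, pstdev

# Hypothetical toy tree: node -> (SNPs on the branch leading into the node, children).
TREE = {
    "root": (0, ["A", "B"]),
    "A": (20, ["A1", "A2"]),
    "A1": (25, []),
    "A2": (31, []),
    "B": (12, ["B1"]),
    "B1": (40, []),
}


def root_to_leaf_snps(tree, node="root", so_far=0, result=None):
    """Return {leaf: total SNPs on the path from the root to that leaf}."""
    if result is None:
        result = {}
    branch_snps, children = tree[node]
    total = so_far + branch_snps
    if not children:
        result[node] = total
    for child in children:
        root_to_leaf_snps(tree, child, total, result)
    return result


depths = root_to_leaf_snps(TREE)
values = list(depths.values())
print(depths)  # {'A1': 45, 'A2': 51, 'B1': 52}
print(mean(values), pstdev(values), max(values) / min(values))
```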

YFull uses three labels (HG, YF, and NA) on its leaves. I couldn't find the page where they explain their terminology, but I broke down the counts by these three types and got:
HG -- 144 individuals, avg 45.4 SNPs (sd 15) range 5 - 93
YF -- 274 individuals, avg 51.6 SNPs (sd 17) range 28 - 100
NA -- 75 individuals, avg 45.3 SNPs (sd 12) range 25 - 93

The SNP counts shown in the Francalacci and Raghavan-Skoglund papers "compare apples with apples" because SNPs were counted only when tested in all individuals. I realize that complex mutations can pose difficulties even for that, but the above statistics suggest that the SNP counts extracted from YFull do NOT offer "apples with apples." (For example, the ratios of max count to min count are much larger than those of the above-mentioned papers.)

Help, please?

lgmayka
12-24-2014, 04:00 PM
For example, the ratios of max count to min count are much larger than those of the above-mentioned papers.
YFull does not post "private" SNPs to its tree, where private is defined to be any SNP not yet found in two or more people. Thus, the one member of C-V20 (C1a2) on YFull's tree (http://yfull.com/tree/C/) has 329 private SNPs that are not shown on the tree. As soon as another C-V20 submits his BAM file to YFull for analysis, most of those 329 SNPs will become publicly displayed on the tree.

In short, you cannot make comparisons based only on counts of public SNPs--you have to add in the private SNPs too.

jamesdowallen
12-24-2014, 04:40 PM
In short, you cannot make comparisons based only on counts of public SNPs--you have to add in the private SNPs too.

Instead one can make an "apples to apples" comparison as the Francalacci and Raghavan-Skoglund papers did. I think each was based on 1000 Genomes, but I can't find the 1000 Genomes data in a "digested" form suitable for answering my question.

lgmayka
12-24-2014, 05:27 PM
Instead one can make an "apples to apples" comparison as the Francalacci and Raghavan-Skoglund papers did. I think each was based on 1000 Genomes
You answered your own question. The 1000 Genomes samples were supposedly anonymized (actually not--the participants can be easily identified by their Y chromosomes and other identifying data), so authors felt free to publish and count the participants' private as well as public SNPs. YFull cannot do this for the personal BAM files submitted to it, for privacy reasons.

jamesdowallen
12-24-2014, 05:36 PM
You answered your own question.

No. What I've been asking for (though evidently doing a poor job of expressing) is a useful link to a derivation of the 1000 Genomes data that will most easily enable me to count the SNPs in the R1b tree. Is that part of the YFull data? If not, where do others get their data? I can find the RAW 1000 Genome data online, but to reconstruct the SNP counts from THAT really would be a massive reinvention!

GailT
12-24-2014, 06:10 PM
No. What I've been asking for (though evidently doing a poor job of expressing) is a useful link to a derivation of the 1000 Genomes data that will most easily enable me to count the SNPs in the R1b tree. Is that part of the YFull data? If not, where do others get their data? I can find the RAW 1000 Genome data online, but to reconstruct the SNP counts from THAT really would be a massive reinvention!

Your question is expressed well, perhaps the people who can respond are not online now, but the archives here and at eng.molgen might have some relevant discussion. One challenge is that different test methods differ in their coverage of the Y genome. FullY Elite has the best coverage (twice as many private SNPs identified compared to BigY), and I believe that 1000 Genome has poorer coverage than either of these. I've seen estimates that 1 FullY Elite SNP is equivalent to 3 generations. Not sure what the conversion would be for BigY or 1000G. I hope someone is compiling data on R1b* SNPs, as I'm also interested in an update on this topic. A large sample size might also be required because of variability in numbers of mutations among samples.
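Here is a rough sketch of how such a conversion might be applied. The code first rescales a private-SNP count from another platform to an FGC-equivalent count. It then applies the "1 SNP is about 3 generations" rule of thumb. The scaling factor of 2 comes from the rough "twice as many private SNPs" comparison and treats coverage as the only difference between the tests. The 30-year generation length is an assumption, and the tester's count of 12 is hypothetical.

```python
def fgc_equivalent(snp_count: float, platform_ratio: float) -> float:
    """Rescale a private-SNP count to an FGC-equivalent count (assumed ratio)."""
    return snp_count * platform_ratio


def snps_to_years(snps: float, generations_per_snp: float = 3.0, years_per_generation: float = 30.0) -> float:
    """Convert SNPs to years via an assumed generations-per-SNP rule of thumb."""
    return snps * generations_per_snp * years_per_generation


# Hypothetical Big Y tester with 12 private SNPs.
elite_equivalent = fgc_equivalent(12, platform_ratio=2.0)
print(elite_equivalent, snps_to_years(elite_equivalent))
```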

Joe B
12-24-2014, 07:46 PM
Everyone,

To the best of my knowledge, I am L11, P311, with nothing under it. What test would you recommend me buying? Will it be of good use to you?

Thanks,
Chad

Hey Chad,
Do you belong to the R1b1a2 (P312- U106-) DNA Project (aka ht35 Project) (https://www.familytreedna.com/public/ht35new)? That is where the phylogenetic work is being done for R1b-L11/P311. Consider testing for SNPs CTS4528 and/or DF100. Ideally, please follow the suggestion from Wing Genealogist and get some NGS testing. The data will be put to good use.

As for you SNP counters: have a look at the R1b-M269 (P312- U106-) DNA Project Phylogenetic Research Tree that smal has put together. It has data from a number of research projects, 1KGP data included.

VinceT
12-24-2014, 11:11 PM
^ Joe B's on the money. I'll also second the vote for testing with Full Genomes Corp., especially if you are counting ALL your SNPs that can be sequenced using current technology. That said, it appears that you may get better mileage if you went with FGC's Whole Genome offer as opposed to the Y-Elite. [link] (http://www.anthrogenica.com/showthread.php?742-Full-Y-Chromosome-Sequencing-Phase-III-Pilot&p=61903&viewfull=1#post61903)

I'm also slowly compiling an exhaustive list of SNPs on the backbone between R-M207 and R-L11/P310 etc., but it is slow going. There are remarkably few NGS samples on par with FGC or 1KGP data that represent the branches splitting off between R-M343 and R-L23, and those are what would indicate precisely where most of these SNPs are situated on the tree.

parasar
12-27-2014, 05:52 PM
Thanks for your responses, MJost.

Is there a diagram, e.g. based on the 1000 Genomes study, showing SNPs within R1 or R1b? I find charts on the 'Web, but not ones like the Francalacci or Raghavan studies where SNP distances to the present can be read or deduced directly from the chart.
Not sure if this will work.
The distances are proportional, but their absolutes are off when calibrated with ancient DNA.
http://mbe.oxfordjournals.org/content/suppl/2014/11/26/msu327.DC1/FigureS1_TreeWithSampleNames.pdf
https://lh4.googleusercontent.com/-F99Gkpu6UWQ/VI39pJrvZVI/AAAAAAAAB90/bO4qWelRZxs/w250-h287-no/Hallast_FigureS1_TreeWithSampleNames_small.png
http://mbe.oxfordjournals.org/content/early/2014/11/26/molbev.msu327/suppl/DC1

VinceT
12-27-2014, 09:03 PM
The list I have has 67 SNPs at R1 (common to R1a and R1b but not R2), and 66 SNPs at R (common to R1a, R1b, and R2, but no one else from 1KGP including hg Q). R1b is a bit murky: about 60 SNPs sit approximately at R-M343 or R-L278. From there, at least 163 more SNPs are somewhere at or below R-L388 but above R-L23.

Then 19 (possibly as many as 32) at R-L23, 5 SNPs at R-L51, and 13 (or 14) more at R-P310.

Between 80 and 100 years per SNP seems to be in the ballpark for 1KGP data. So if R-P312, R-U106, and R-S1200 are all around 4500 ybp, and counting 66 + 67 + 60 + 163 + 19 + 5 + 14 = 394 backbone SNPs, this suggests that hg R began somewhere around 4500 + 90 * 394 = 39,960 ybp, pretty darn close to 40kybp.
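VinceT's estimate can be reproduced as a short script. The per-node counts are taken from the list above. Where a range was given, the sketch uses 19 for R-L23 and 14 for R-P310, which gives the total of 394. The 4500 ybp anchor and the 80 to 100 years-per-SNP range are the post's working assumptions, not measured values.

```python
# Backbone SNP counts from R down to R-P310, as listed in the post.
BACKBONE = {
    "R (shared by R1a, R1b, R2)": 66,
    "R1 (shared by R1a, R1b)": 67,
    "R-M343 / R-L278 (approximate)": 60,
    "at or below R-L388, above R-L23": 163,
    "R-L23": 19,
    "R-L51": 5,
    "R-P310": 14,
}


def backbone_age(counts: dict[str, int], years_per_snp: float, anchor_ybp: float = 4500) -> float:
    """Age of the top of the backbone: anchor date plus accumulated SNP intervals."""
    return anchor_ybp + years_per_snp * sum(counts.values())


for rate in (80, 90, 100):
    print(rate, backbone_age(BACKBONE, rate))
```

The 80 to 100 years-per-SNP range alone moves the result by several thousand years in either direction. The same is true of the uncertain counts at R-L23 (19 vs. 32).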

MitchellSince1893
12-27-2014, 11:25 PM
...Between 80 and 100 years per SNP seems to be in the ballpark for 1KGP data, so if R-P312, R-U106, and R-S1200 are all around 4500 ybp...

Based on the various SNP dating methodologies, 4500 ybp seems very reasonable. Using my own SNP dating technique (based on the work of others) I get P312 as ~4740 ybp.

jamesdowallen
12-29-2014, 03:25 PM
Not sure if this will work.
The distances are proportional, but their absolutes are off when calibrated with ancient DNA.
http://mbe.oxfordjournals.org/content/suppl/2014/11/26/msu327.DC1/FigureS1_TreeWithSampleNames.pdf

Thank you! For my own benefit, I've prepared my own copy of the R1b portion, labeled (http://fabpedigree.com/james/r1bchart.jpg).
I'm not sure why the variations in SNP counts are so great, but I guess it will be impractical to use the SNP counts by themselves for dating L11 without an ancient skeleton for reference. Are there any? The 24,000-year old Siberian boy is useful, but is there one in R-M412?

parasar
12-29-2014, 06:14 PM
Thank you! For my own benefit, I've prepared my own copy of the R1b portion, labeled (http://fabpedigree.com/james/r1bchart.jpg).
I'm not sure why the variations in SNP counts are so great, but I guess it will be impractical to use the SNP counts by themselves for dating L11 without an ancient skeleton for reference. Are there any? The 24,000-year old Siberian boy is useful, but is there one in R-M412?

There is indeed variation, but at least for the older SNPs we are on good grounds with SNP counting.
Your ages look to me to be a little high ...

http://fabpedigree.com/james/r1bchart.jpg

...but perhaps in line with Michał's calculations: "significantly older age for haplogroup P (about 38-40 ky), so I was wondering if this difference could be related to a very specific set of data you were using, or maybe to the way you have performed your calculations. It seems that both these things could have contributed to the observed discrepancy." http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=48298&viewfull=1#post48298

I think the R-Q divergence is ~33kybp.

Ebizur
12-29-2014, 09:14 PM
There is indeed variation, but at least for the older SNPs we are on good grounds with SNP counting.
Your ages look to me to be a little high ...

...but perhaps in line with Michał's calculations: "significantly older age for haplogroup P (about 38-40 ky), so I was wondering if this difference could be related to a very specific set of data you were using, or maybe to the way you have performed your calculations. It seems that both these things could have contributed to the observed discrepancy." http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=48298&viewfull=1#post48298

I think the R-Q divergence is ~33kybp.

Yes, jamesdowallen seems to have misplaced those TMRCA estimates. The TMRCA of (R1 + R2) should be approximately 27,000 years, and the TMRCA of (Q + R) should be approximately 33,000 years. I do not know how he may have obtained the 40 kya estimate; the TMRCA of (K1 + K2) based on Hallast et al. should be approximately 45,000 years, and not 40,000 years.

alan
12-29-2014, 11:41 PM
Yes, jamesdowallen seems to have misplaced those TMRCA estimates. The TMRCA of (R1 + R2) should be approximately 27,000 years, and the TMRCA of (Q + R) should be approximately 33,000 years. I do not know how he may have obtained the 40 kya estimate; the TMRCA of (K1 + K2) based on Hallast et al. should be approximately 45,000 years, and not 40,000 years.

If R was literally 25000BC, it places this ancestor of Mal'ta boy at the start of the LGM. Now we know that Mal'ta boy lived in Siberia within the early LGM, 3000 years later. It's unthinkable that R moved north in the early LGM, so IMO all R came from an early LGM Siberian whose descendants moved from there either during or after the LGM. We know that, apart from Mal'ta boy, there was a major depopulation of his cultural group in the area he lived in around 25000BC, and it was all gone by 22000BC. So, for me there is no doubt R arose in south-central Siberia. Then when you factor in that exactly the same culture as Mal'ta boy's had existed in the same area from as early as 30000BC, it's safe to say that there was a long line of pre-R ancestors, who would have been haplogroup P people, living in the same area for 5000 years before R existed. As P is not usually dated older than 30000BC, it seems likely to me that P also arose in the early days of that culture from a K ancestor in the same area. Mal'ta boy's culture seems to have been a local evolution from a culture dating back as far as 45000 years, so it seems likely that K people were spread right across south Siberia in the period 40000-30000BC, which again Ust'-Ishim confirms. Archaeology and genetics seem in perfect harmony in terms of the early north Asian human story.

VinceT
12-29-2014, 11:54 PM
I have downloaded the Mal'ta MA-1 data (which I believe I had read previously was allegedly an early form of R1), and intend to compare against the list I have, in the days to come. At least I should be able to confirm which ones in the list are no-calls, negative-calls, and positive calls for MA-1.

alan
12-30-2014, 12:19 AM
I have downloaded the Mal'ta MA-1 data (which I believe I had read previously was allegedly an early form of R1), and intend to compare against the list I have, in the days to come. At least I should be able to confirm which ones in the list are no-calls, negative-calls, and positive calls for MA-1.

Am pretty sure from memory he was a third line a few thousand years downstream of R but not on the same line as R1 and R2. Sometimes referred to as R3 or a derived form of R*.

To me the significance of the Mal'ta boy was he was almost the last member of the south central Siberian middle upper Palaeolithic culture 30-22000BC so even without further samples we can have a very strong guess that R and perhaps even P arose within that culture. We also can be pretty sure that R arose in that culture about 3000 years before Malta lived. I am not sure what the latest estimate for R1 is- that would tell us quite a lot too if it can be calculated with confidence.

Michał
12-30-2014, 12:25 AM
...but perhaps in line with Michał's calculations: "significantly older age for haplogroup P (about 38-40 ky), so I was wondering if this difference could be related to a very specific set of data you were using, or maybe to the way you have performed your calculations. It seems that both these things could have contributed to the observed discrepancy." http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=48298&viewfull=1#post48298

I think the R-Q divergence is ~33kybp.

Please note that the recent FGC-based estimates provided by VinceT are supporting my older estimates:
http://www.anthrogenica.com/showthread.php?2963-SNP-Counting-and-Estimating-the-Age-of-R1b/page7&p=62911#post62911

Michał
12-30-2014, 12:32 AM
Yes, jamesdowallen seems to have misplaced those TMRCA estimates.

I don't see why.


The TMRCA of (R1 + R2) should be approximately 27,000 years

Such a young age for haplogroup R is inconsistent with the Mal'ta boy data.


and the TMRCA of (Q + R) should be approximately 33,000 years.

I don't know any data strongly suggesting that haplogroup P is that young. All studies I have seen, including Francalacci et al., Rootsi et al., Raghavan et al. (Mal'ta) and Rasmussen et al. (Anzick-1), indicate that haplogroup P is about 36-41 ky old. Even when parasar claims that the Anzick-1 data suggest that the R-Q split took place about 33 ky ago, this is only because he uses a very specific way of calculating the TMRCA value for haplogroup P, basing his estimate on a single subclade of haplogroup Q while disregarding the available data for haplogroup R.

Please see my calculations based on the Anzick data:
http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=48298&viewfull=1#post48298
and those based on the Mal'ta data:
http://eng.molgen.org/viewtopic.php?f=183&t=1808&start=10 (in Polish)
http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result/page4&p=20838#post20838
those based on the Francalacci data:
http://www.anthrogenica.com/showthread.php?828-STR-Wars-GDs-TMRCA-estimates-Variance-Mutation-Rates-amp-SNP-counting/page8&p=15936#post15936
and those based on the Rootsi data:
http://www.anthrogenica.com/showthread.php?828-STR-Wars-GDs-TMRCA-estimates-Variance-Mutation-Rates-amp-SNP-counting/page9&p=26002#post26002



I do not know how he may have obtained the 40 kya estimate;

Please see above. Also, according to Hallast et al. the TMRCA for haplogroup P is 39178 (or 34190-55067) when using the mutation rate suggested by Mendez (ie. 0.617 x 10^-9 per bp per year).
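A per-base-pair, per-year rate such as the Mendez figure only becomes a date once it is multiplied by the length of the region in which SNPs were counted. The sketch below shows that conversion. The 10 Mb callable region is a hypothetical value chosen for illustration, not the region actually used by Hallast et al.

```python
MENDEZ_RATE = 0.617e-9  # substitutions per bp per year, as quoted above


def years_per_snp(rate_per_bp_year: float, callable_bp: float) -> float:
    """Average years between SNPs along one lineage for a region of callable_bp."""
    return 1.0 / (rate_per_bp_year * callable_bp)


def tmrca_years(mean_snps_from_node: float, rate_per_bp_year: float, callable_bp: float) -> float:
    """TMRCA from the mean number of SNPs between a node and its present-day tips."""
    return mean_snps_from_node / (rate_per_bp_year * callable_bp)


HYPOTHETICAL_REGION_BP = 10_000_000
print(round(years_per_snp(MENDEZ_RATE, HYPOTHETICAL_REGION_BP), 1))  # 162.1
```

The same count of SNPs therefore implies very different ages depending on the region size and rate used. This is one reason why the Mendez-based and Xue-based TMRCAs quoted here diverge so much.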


the TMRCA of (K1 + K2) based on Hallast et al. should be approximately 45,000 years, and not 40,000 years.

The SNP-based TMRCA for haplogroup K, as provided by Hallast et al., is either 52822 (46098-74244) when using the mutation rate by Mendez et al., or 32589 (13036-108642) when using the much less reliable mutation rate by Xue et al.

In fact, the K1-K2 split could have happened neither 40 nor 45 kya, as the Ust'-Ishim remains are 45 ky old and they represent an extinct subclade of K2. Also, the Ust'-Ishim paper dates the K2 split to 50 (47-55) kya, and this is as close to the actual date as we can get at the moment.

Michał
12-30-2014, 12:39 AM
Am pretty sure from memory he was a third line a few thousand years downstream of R but not on the same line as R1 and R2. Sometimes referred to as R3 or a derived form of R*.
Actually, I would call it rather R0, as this lineage is a sister clade of a lineage ancestral to R1+R2.


I am not sure what the latest estimate for R1 is- that would tell us quite a lot too if it can be calculated with confidence.
I will be very surprised if the age of R1 turns out not to be within the 25-30 ky range.

alan
12-30-2014, 01:25 AM
The K1-K2 split of 47-55000 years pretty well places it as happening before north Asia was settled, and it must have happened in SW Asia IMO. From memory, the Ust-Ishim paper also calculated Neanderthal admixture back around 50-odd thousand years, again pointing to SW Asia within the Neanderthal range. So it seems incredibly likely to me that the first K2 person was a SW Asian. Archaeological evidence would strongly point towards the first humans in north Asia as having a technology that links with the Emiran culture of the Levant and similar ones around Iran, Uzbek etc. It all seems to fit together very well IMO.

It remains hard to explain how K2 has a distribution in both north Asia and south-east Asia and the islands. I think it's possible that there could be two factors. One could be a north-south shunt of populations from north Asia during climate downturns. It may seem a long distance, but there was a dead zone between Siberia and south-east coastal Asia during the LGM. The other factor could be that K2 itself split into two directions early on in SW Asia, with one group heading to north Asia via Iran (fairly archaeologically visible) while another group moved east by a southern route. This latter group is not archaeologically attested, but the bamboo factor may be to blame: south Asia has poor flint but much bamboo, and bamboo would have taken over from flint as the main material for tools, greatly reducing flint skills and giving very poor archaeological visibility due to the perishable nature of bamboo. http://www.eeob.iastate.edu/research/bamboo/maps/world-total-woody.gif

The terrible flint tradition/probable use of bamboo seems to be typical of south-east Asia and the islands in this period. Problem is this makes human movement almost impossible to determine for archaeologists.

Ust' Ishim is fascinating because from archaeology we know from his date and geography that he was part of the rapid wave that brought the first modern humans into north Asia. Around this time they were just arriving as far east as Altai but not yet Baikal, Mongolia etc. So we basically know now that north Asia's first human wave, already known from archaeology before DNA testing, carried K2. I suppose from this point onwards north Asia was occupied by various K2 lineages until P emerged by 34-39000BC or earlier. That early dating also places P within a later phase, but still within the early south Siberian culture period that lasted c. 43000-30000BC, the same horizon as Ust Ishim. Actually, that sort of date for P corresponds roughly to the period when the culture reached its fullest extent and ranged across south Siberia to Baikal, north Mongolia and the NW fringes of China, so it's hard to pinpoint exactly where P would have arisen in that range. The only clue is that a fairly early R guy was on Baikal in 22000BC.

Ebizur
12-30-2014, 01:50 AM
Such a young age for haplogroup R is inconsistent with the Mal'ta boy data.

It is not inconsistent. Actually, as alan has outlined above, it fits very nicely into a scenario that would have the common ancestors of the Mal'ta-Buret' culture and the Afontova Gora-Oshurkovo culture of Upper Palaeolithic Southern Siberia as the population from which Y-DNA haplogroup QR has descended.


In fact, the K1-K2 split could have happened neither 40 nor 45 kya, as the Ust'-Ishim remains are 45 ky old and they represent an extinct subclade of K2. Also, the Ust'-Ishim paper dates the K2 split to 50 (47-55) kya, and this is as close to the actual date as we can get at the moment.

You appear to have misunderstood my comment. After pointing out that jamesdowallen seemed to have misplaced the TMRCA of R1 and R2 at the node in the tree of Hallast et al. that represents the MRCA of R1a and R1b and the TMRCA of Q and R at the node in the tree that represents the MRCA of R1 and R2, I commented that the remaining TMRCA figure of 40,000 years would not be consistent with the TMRCA of K2a and K2b (i.e. one of the most likely nodes for a person to have a TMRCA estimate upstream of (Q + R)).

The Ust'-Ishim specimen's Y-DNA is only very slightly (about 2 + 6 = 8 SNPs) removed from the MRCA of K2a and K2b, and located on the K2a side of the split. This means that the MRCA of K2a and K2b must have lived not long before 45,000 YBP. A TMRCA of 40,000 years for (Q + R) would require an unusually high rate of accumulation of SNPs between the K2 node and the P1 (i.e. QR) node. Your estimate of 40,000 years for the TMRCA of Q and R is at the upper limit of that allowed by the 95% CI of Fu et al. 2014 for the TMRCA of K(xLT) (i.e. K2).
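Ebizur's argument reduces to a simple bracket. An ancient sample that carries only a few SNPs below a node must post-date that node by roughly that many SNP intervals. The sketch assumes a constant rate. The years-per-SNP values are hypothetical placeholders, because the real value depends on how much of the Y chromosome was callable in the sample.

```python
def mrca_age_from_ancient(sample_age_ybp: float, snps_after_mrca: int, years_per_snp: float) -> float:
    """The SNPs below the MRCA accumulated before the sample's death, so the MRCA
    is roughly snps_after_mrca SNP intervals older than the sample."""
    return sample_age_ybp + snps_after_mrca * years_per_snp


# 45,000-year-old sample, 8 SNPs below the K2a/K2b node; rates are hypothetical.
for rate in (150, 250, 375):
    print(rate, mrca_age_from_ancient(45_000, 8, rate))
```

Even a fairly slow assumed rate keeps the node within a few thousand years of the sample's age. This is the sense in which the K2a/K2b MRCA "must have lived not long before 45,000 YBP".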

At present, I cannot see the TMRCA of K2a and K2b being much greater than 48,000 years, so I would place the MRCA of Q and R at closer to 35,000 YBP than 40,000 YBP.

alan
12-30-2014, 01:57 AM
Actually, I would call it rather R0, as this lineage is a sister clade of a lineage ancestral to R1+R2.


I will be very surprised if the age of R1 turns out to be not within the 25-30 ky range.

I am using 25000BC, give or take a bit, as mental shorthand, which is almost in the middle of that range. It's an interesting date, as the LGM in Siberia is usually said to commence then. There was a good period from about 43000-36000BC, followed by a downturn around 36000-33000BC, then another better period 33000-25/22000BC, then a slump towards the LGM. Roughly speaking, these coincide with

1. the initial early upper palaeolithic culture of Siberia
2. A period of perhaps extension south into Mongolia and NW China
3. A revival period of the middle upper palaeolithic culture with Mal'ta boy its last representative-this expanded a little further north into Siberia than previous phases
4. Then a period of mostly abandonment of Siberia during the worst of the LGM except Altai.

It's impossible not to see these having a huge influence on the demographic history of the area. We are very fortunate in that K2 Ust-Ishim is a representative of 1. and R0 Mal'ta is a late representative of 3. It would be nice to get someone from Siberia who falls into the post-K2, pre-R phase, i.e. the P phase. I would also say my belief is that N and maybe even O may relate to 2. in my list above. I cannot see past them being a southward push from K2 elements in Siberia.

vettor
12-30-2014, 02:24 AM
Only when you include in the calculus the K1 split age, and the X as per the NO (K2a) group split age, will any calculation of the age of K2b be resolved... Most are clutching at straws in regard to dating and are dying to ensure that P and/or R are not SE-Asian.

alan
12-30-2014, 02:34 AM
It is not inconsistent. Actually, as alan has outlined above, it fits very nicely into a scenario that would have the Mal'ta-Buret' culture of Upper Palaeolithic Southern Siberia as the archaeological correlate of the population from which Y-DNA haplogroup QR has descended.

You appear to have misunderstood my comment. After pointing out that jamesdowallen seemed to have misplaced the TMRCA of R1 and R2 at the node in the tree of Hallast et al. that represents the MRCA of R1a and R1b and the TMRCA of Q and R at the node in the tree that represents the MRCA of R1 and R2, I commented that the remaining TMRCA figure of 40,000 years would not be consistent with the TMRCA of K2a and K2b (i.e. one of the most likely nodes for a person to have a TMRCA estimate upstream of (Q + R)).

The Ust'-Ishim specimen's Y-DNA is only very slightly (about 2 + 6 = 8 SNPs) removed from the MRCA of K2a and K2b, and located on the K2a side of the split. This means that the MRCA of K2a and K2b must have lived not long before 45,000 YBP. A TMRCA of 40,000 years for (Q + R) would require an unusually high rate of accumulation of SNPs between the K2 node and the P1 (i.e. QR) node.

If K2 is perhaps from around 44000BC, that puts that node in a tricky time which may or may not predate the thrust from SW Asia into Siberia. It's kind of on the cusp. I suspect, from datings for the climate improvement that allowed the first humans in, that such a date would slightly predate the settlement of Siberia, and therefore K2 could have come into being in SW Asia. However, it's a close call. Interestingly, a similar but short-lived culture spread into part of Europe (the Bohunician and Bachokirian), but they may have left no descendants. As I stated above, we should not underestimate the profoundly negative effect the likely use of bamboo for tools in south Asia has on archaeologists' ability to detect human movements into and through those bamboo-using areas. All that survives is what were likely crude flint blades used for cutting the bamboo. So we could be completely missing an important wave from SW Asia along the southern route, linked somehow to some of that south-east Asian K2 c. 44000BC, at the same sort of time as others moved east through the north Asian route. I personally suspect a second wave might have followed a very early C wave to east Asia.

jamesdowallen
12-30-2014, 10:06 AM
My date estimates (R1a/R1b split 27,000 BP, etc.) are shown in a chart I prepared (http://fabpedigree.com/james/yhestimf.jpg) and previously posted. As I explain (http://fabpedigree.com/james/yhchart.htm) the basic chart is copied from Francalacci but the dating is derived from the SNP counts in Raghavan-Skoglund's analysis of 24,000-year old Siberian boy's Y-chromosome. (Unfortunately the link to Nature supplement no longer works, though I'm sure I and others still have copies.)

In the Raghavan-Skoglund paper, they explain how they do an "apples with apples" comparison to 1000 Genome data to get usable SNP counts. The number of SNPs from R-common to Siberian boy is much greater than to R1; if their chart is drawn to scale, they agree, showing Siberian boy ("MA-1") well after the R1a-R1b split.

In the year-ago thread where this was discussed, the numbers I got seemed in agreement with other numbers. I thought it was settled. ;)

But anyway, fine-tuning the date of the R1a-R1b split will help only minimally to establish the more interesting dates of R1b-L11. How tight are those dates? Are there ancient skeletons to work with?

ETA: A key point that might be overlooked is that, although Siberian boy is R*, he has most of the SNPs between R and the R1/R2 split AND another 35 SNPs -- quite a few given the somewhat sparse sampling available for that old skeleton.

alan
12-30-2014, 12:36 PM
My date estimates (R1a/R1b split 27,000 BP, etc.) are shown in a chart I prepared (http://fabpedigree.com/james/yhestimf.jpg) and previously posted. As I explain (http://fabpedigree.com/james/yhchart.htm) the basic chart is copied from Francalacci but the dating is derived from the SNP counts in Raghavan-Skoglund's analysis of 24,000-year old Siberian boy's Y-chromosome. (Unfortunately the link to the Nature supplement no longer works, though I'm sure I and others still have copies.)

In the Raghavan-Skoglund paper, they explain how they do an "apples with apples" comparison to 1000 Genome data to get usable SNP counts. The number of SNPs from R-common to Siberian boy is much greater than to R1; if their chart is drawn to scale, they agree, showing Siberian boy ("MA-1") well after the R1a-R1b split.

In the year-ago thread where this was discussed, the numbers I got seemed in agreement with other numbers. I thought it was settled. ;)

But anyway, fine-tuning the date of the R1a-R1b split will help only minimally to establish the more interesting dates of R1b-L11. How tight are those dates? Are there ancient skeletons to work with?

ETA: A key point that might be overlooked is that, although Siberian boy is R*, he has most of the SNPs between R and the R1/R2 split AND another 35 SNPs -- quite a few given the somewhat sparse sampling available for that old skeleton.

So the split of R into R1, R2 and R0 happened some time after R i.e. they have a MRCA some time after R. How many SNPs after the R defining one do R1, R2 and Mal'ta/R0 share?

jamesdowallen
12-30-2014, 02:41 PM
So the split of R into R1, R2 and R0 happened some time after R i.e. they have a MRCA some time after R. How many SNPs after the R defining one do R1, R2 and Mal'ta/R0 share?

Since one of the co-authors directed me to the page; I'll assume I have permission to post the relevant chart:
(Attached chart: attachment 3303)
Three classes of SNP are counted in the chart: 'a', 'd' and 'N'. I found this somewhat confusing, but I think 'd' SNPs match Siberian boy ("MA-1"), 'a' do not match him, and 'N' are, for some reason, uninformative.

(I'm a beginner here; did I include the image best way?)

parasar
12-30-2014, 04:53 PM
So the split of R into R1, R2 and R0 happened some time after R i.e. they have a MRCA some time after R. How many SNPs after the R defining one do R1, R2 and Mal'ta/R0 share?

R level mutations are defined in terms of M242 and the R1-R2 split. MA1 is ancestral at 5 of those and shares I believe 19 of the remaining 36. So he is basal to R by those 5 and further derived by his 35 private SNPs. So I have preferred to term him a pre-R derivative.

His depth of coverage is relatively low (1.5X on 5.8 million bases) so it is possible that many SNPs were missed. In the figure below from DE to R we have 153+6+17+96+36+5=313 mutations. At each level we can see that MA1 falls about halfway short. As seen in the figure he has 72+3+5+39+19=138. So going approximately, he is possibly ancestral at ~10 and derived at ~70 compared to the coverage of the other samples on the chart (4X http://www.1000genomes.org/about).
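
The tallies above can be checked with a short Python sketch. The per-level counts are simply the numbers listed in the post (five values for MA-1, six for the reference samples), and the doubling step only mirrors the "about halfway short" approximation; it is not a formal coverage correction.

```python
# Per-level mutation counts from DE down to R on the chart, as listed in the post
reference_counts = [153, 6, 17, 96, 36, 5]
# Counts MA-1 shares at those levels, as listed in the post (five values given)
ma1_counts = [72, 3, 5, 39, 19]

total_reference = sum(reference_counts)
total_ma1 = sum(ma1_counts)
recovered_fraction = total_ma1 / total_reference

print(f"reference total: {total_reference}")
print(f"MA-1 total: {total_ma1} ({recovered_fraction:.0%} recovered)")

# Rough correction: if MA-1 recovers only about half the SNPs, double his observed counts
for label, observed in (("ancestral", 5), ("private/derived", 35)):
    print(f"{label}: observed {observed}, roughly {observed * 2} at comparable coverage")
```

The exact recovered fraction (about 44%) is a little under one half, which is why "~10" and "~70" should be read as round figures.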


http://www.anthrogenica.com/attachment.php?attachmentid=3303&d=1419950070

Due to the above, when I used MA1 for calibration initially I had to assume a very wide range of potential ages. http://www.anthrogenica.com/showthread.php?1507-Some-provisional-calculations-for-haplogroup-R1a-based-on-the-first-FGC-result&p=22644&viewfull=1#post22644

I think calibrating with Anzick1 is more reliable - the dating looks secure at 12600ybp and the coverage is at the same level as the other samples I compared with (5× for both Maya HGDP00877 and Anzick-1), and it works reasonably well at 360 years/mutation with the lower end of the range I got with MA1. I find the high-coverage Ust-Ishim to be consistent too (at least on the Y side).

jamesdowallen
12-30-2014, 05:21 PM
MA1 ... shares I believe 19 of the remaining 36.

I think calibrating with Anzick1 is more reliable - the dating looks secure at 12600ybp ...

My understanding is that MA1 shares 19 of the SNPs and, due to poor sampling of the ancient DNA could not be tested for the other 17, right? Hardly the same as not sharing those SNPs.

Can you point me to the Anzick1 data, please?

parasar
12-30-2014, 05:53 PM
My understanding is that MA1 shares 19 of the SNPs and, due to poor sampling of the ancient DNA could not be tested for the other 17, right? Hardly the same as not sharing those SNPs.

Can you point me to the Anzick1 data, please?

No doubt, it is due to coverage and poor sample issues as about the same 50% ratio is seen at other levels too. 1000 Genomes coverage is also relatively low compared to say FGC.

Anzick-1
http://www.nature.com/nature/journal/v506/n7487/images/nature13025-sf2.jpg
http://www.nature.com/nature/journal/v506/n7487/full/nature13025.html#supplementary-information
http://csfa.tamu.edu/cfsa-publications/Rasmussen%20et%20al%202014%20-%20anzick%20genome.pdf

Michał
12-30-2014, 06:48 PM
It is not inconsistent.

Then how would you explain the fact that the distance from MA-1 to the R0 node (35 mutations) is much larger than the distance between the R0 node and the R1 (R1a-R1b) node (23 mutations only)? IMO, it indicates quite strongly that the R1a-R1b split took place significantly earlier than 24 kya, most likely about 27.5 kya (or 25-30 kya), and thus the R1-R2 split could not have taken place 27 kya (as you have suggested), especially when knowing that the distance between the R1-R2 node and the R1a-R1b node is at least 4-5 ky (and probably about 6 ky). Please see the below scheme.
(Attached scheme: attachment 3304)

BTW, the above mutations from the selected group of SNPs (ie. from those detectable in MA-1) correspond to about 300 years each, but using the much more complete SNP data for the R2 and Q lineages from the same paper (and the mutation rate of 0.7 × 10⁻⁹ per site per year, strongly supported by the Ust’-Ishim data), we can date the R-Q split to 39.2 kya, the R1-R2 split to 33.8 kya and the R1a-R1b split to 28.3 kya, which is perfectly consistent with the above calculations based on the MA-1 data.



You appear to have misunderstood my comment. After pointing out that jamesdowallen seemed to have misplaced the TMRCA of R1 and R2 at the node in the tree of Hallast et al. that represents the MRCA of R1a and R1b and the TMRCA of Q and R at the node in the tree that represents the MRCA of R1 and R2, I commented that the remaining TMRCA figure of 40,000 years would not be consistent with the TMRCA of K2a and K2b (i.e. one of the most likely nodes for a person to have a TMRCA estimate upstream of (Q + R)).

Firstly, your suggestion that jamesdowallen has misplaced the TMRCAs (taking K2a-K2b for Q-R) was extremely strange, as nothing suggested he was not aware of the actual structure of the tree. Secondly, you seem to have missed the point I was trying to make in my previous post, namely that the 40ky TMRCA for haplogroup P (or for the Q-R split) is perfectly consistent with the estimated age for the K2a-K2b split which is quite precisely dated to 50 (47-55) kya by the Ust’-Ishim study, which can be easily demonstrated using any set of recently published Y-DNA sequencing data.


The Ust'-Ishim specimen's Y-DNA is only very slightly (about 2 + 6 = 8 SNPs) removed from the MRCA of K2a and K2b, and located on the K2a side of the split. This means that the MRCA of K2a and K2b must have lived not long before 45,000 YBP.

Importantly, only 1.86 Mb was sequenced in this particular case, so each such SNP corresponds to nearly 1 ky (or 812 years), which makes the K2a-K2b split quite securely dated to about 50 (47-55) kya, as mentioned above.


A TMRCA of 40,000 years for (Q + R) would require an unusually high rate of accumulation of SNPs between the K2 node and the P1 (i.e. QR) node.

Actually, this accumulation of SNPs between the K2a-K2b node and the Q-R node when assuming an older age for haplogroup P (like 38-40 ky) will be much smaller than when assuming the much younger date of the R-Q split (ie. 33 kya), so this argument speaks against your lower estimates for haplogroup P.


Your estimate of 40,000 years for the TMRCA of Q and R is at the upper limit of that allowed by the 95% CI of Fu et al. 2014 for the TMRCA of K(xLT) (i.e. K2).

I agree that 40 ky for haplogroup P is close to the upper limit, but it is still within the acceptable range of 36-41 ky. On the other hand, your estimate of 33 ky (as suggested in your previous post) seems to be below the lower limit for the Q-R split.


At present, I cannot see the TMRCA of K2a and K2b being much greater than 48,000 years, so I would place the MRCA of Q and R at closer to 35,000 YBP than 40,000 YBP.

You seem to have weakened your position a bit, as the K2(48 kya)->P(35 kya) scenario is not substantially different from the K2 (50 kya)->P(40 kya) scenario, and I would certainly agree with any intermediate variant, like K2(49 kya)->P(37.5 kya).

Let me now use the FGC-based SNP data to refine our estimates as much as possible. It seems that the average number of the FGC-tested SNPs downstream of R1 is very close to 300 (in both R1a and R1b). VinceT has recently reported that the number of reliable SNPs at the R1 level is 67, while a corresponding number for level R is 66. As for the P level (or actually for SNPs separating the M-P and Q-R nodes), there are 137 of them (according to YFull). YFull also reports 5 SNPs for the MP level and 1 SNP for the K(xLT) level. So we have (on average) about 576 SNPs downstream of K*, 575 SNPs downstream of MP, 570 reliable SNPs downstream of K2, 433 SNPs downstream of P1, 367 SNPs downstream of R and 300 SNPs downstream of R1. For calculating the TMRCA values, I have used two different rates (90 or 85 years per SNP), as they seem to be best supported when comparing the FGC data with the slightly better calibrated Big Y data.

Here are my estimates calculated using the rate of 90 years per each FGC-tested SNP:

Haplogroup R1 (R1a-R1b node) - 27.0 kya
Haplogroup R (R1-R2 node) - 33.0 kya
Haplogroup P1 (Q-R node) - 39.0 kya
Haplogroup K2b (M-P node) - 51.3 kya
Haplogroup K2 (K2a-K2b node) - 51.7 kya
Haplogroup K (K1-K2 node) - 51.8 kya

And the corresponding set when assuming 85 years per each SNP.

Haplogroup R1 (R1a-R1b node) - 25.5 kya
Haplogroup R (R1-R2 node) - 31.2 kya
Haplogroup P1 (Q-R node) - 36.8 kya
Haplogroup K2b (M-P node) - 48.5 kya
Haplogroup K2 (K2a-K2b node) - 48.9 kya
Haplogroup K (K1-K2 node) - 49.0 kya
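
Both lists can be reproduced by accumulating the per-level SNP counts quoted in the post and multiplying by a fixed number of years per SNP. This is a minimal Python sketch under those assumptions (a constant rate and the averaged counts given above); the node labels follow the two lists, and the names are illustrative only.

```python
# Average FGC-tested SNPs downstream of the R1a-R1b node, as quoted above
SNPS_BELOW_R1 = 300

# (node, SNPs on the level immediately below that node), walking upstream from R1
LEVELS = [
    ("R1 (R1a-R1b node)", 0),
    ("R (R1-R2 node)", 67),      # R1-level SNPs
    ("P1 (Q-R node)", 66),       # R-level SNPs
    ("K2b (M-P node)", 137),     # P-level SNPs
    ("K2 (K2a-K2b node)", 5),    # MP-level SNPs
    ("K (K1-K2 node)", 1),       # K(xLT)-level SNP
]

def tmrca_table(years_per_snp):
    """Return (node, cumulative SNP count, TMRCA in kya) assuming a constant rate."""
    rows = []
    count = SNPS_BELOW_R1
    for node, snps_on_level in LEVELS:
        count += snps_on_level
        rows.append((node, count, count * years_per_snp / 1000))
    return rows

for rate in (90, 85):
    print(f"{rate} years per SNP:")
    for node, count, kya in tmrca_table(rate):
        print(f"  {node}: {count} SNPs -> {kya:.3f} kya")
```

At 90 years per SNP the cumulative counts (300, 367, 433, 570, 575, 576) give 27.0, 33.03, 38.97, 51.3, 51.75 and 51.84 kya, which match the first list after rounding.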

A more precisely calculated number of SNPs downstream of R1 should provide an additional refinement, but what is really needed for this is a fully sequenced and radiocarbon-dated R1b member from the 20 kya - 5 kya period.

jamesdowallen
12-30-2014, 06:59 PM
I think calibrating with Anzick1 is more reliable ...

::confused::

It's good to estimate average mutation rate and the data you post may have much utility. But mutation rate seems so variable that to calibrate the time of, say, R1b-L11, one wants a dated skeleton as close to R1b-L11 as possible.

Almost any R1-skeleton will be more useful than a Q-skeleton for dating R1.

alan
12-30-2014, 07:13 PM
This is getting confusing. How many SNPs do we know for sure that Mal'ta shares with R1/2 after M207? I am desperately trying to get an idea of the date of the MRCA of Mal'ta and the rest of R. From what people are saying, the MRCA is significantly more recent than M207.

Michał
12-30-2014, 07:30 PM
This is getting confusing. How many SNPs do we know for sure that Mal'ta shares with R1/2 after M207?

This is actually quite simple. Only 24 mutations from the R level (ie. between the Q-R and R1-R2 nodes) could have been determined in MA-1 (as the remaining ones were not covered by the analysis, or could not have been reliably read). From those 24 mutations, MA-1 was positive for 19 and negative for 5, which means that the MRCA of MA-1, R1 and R2 predated the MRCA of R1 and R2.



I am desperately trying to get an idea of the date of the MRCA of Mal'ta and the rest of R. From what people are saying, the MRCA is significantly more recent than M207.
The distance between the R0/R1+R2 node and the R1/R2 node is about 1.5 ky, so when assuming that the R1-R2 split took place about 33 kya (which implies that Q and R have been separated about 40 kya), the lineage ancestral to MA-1 has been separated about 34.5 kya (see my old scheme posted above).

Ebizur
12-30-2014, 07:47 PM
Firstly, your suggestion that jamesdowallen has misplaced the TMRCAs (taking K2a-K2b for Q-R) was extremely strange, as nothing suggested he was not aware of the actual structure of the tree.

jamesdowallen is a newbie here. His TMRCA estimate for (R1a + R1b) happens to match perfectly with TMRCA estimates for (R1 + R2) and his TMRCA estimate for (R1 + R2) happens to match perfectly with TMRCA estimates for (Q + R) that I (and at least parasar as well) have posted on this forum previously. Therefore, I made a specious assumption that he had inadvertently misplaced the TMRCA estimates on his edited version of the phylogenetic tree from Hallast et al., but I could not explain his estimate of 40,000 YBP for the TMRCA of (Q + R) under that assumption. I apologize if I have worded my comment in an ambiguous manner. Anyway, it is now clear that his placement of the TMRCA estimates has not been made in error, but rather reflects a differing estimate of the NRY mutation rate.


Actually, this accumulation of SNPs between the K2a-K2b node and the Q-R node when assuming an older age for haplogroup P (like 38-40 ky) will be much smaller than when assuming the much younger date of the R-Q split (ie. 33 kya), so this argument speaks against your lower estimates for haplogroup P.

You have this completely backward. The TMRCA of K2 is constrained by the time of deposition of the Ust'-Ishim specimen to somewhere between 47,000 and 55,000 years according to Fu et al. (2014); the authors' point estimate is 50,000 years. The greater you make an estimate of the TMRCA of (Q + R), the less time you have between the MRCA of K2 (≈ 50,000 YBP according to Fu et al.) and the MRCA of QR for the accumulation of all the SNPs that distinguish QR from K2a. Less time to accumulate the same number of SNPs equates to a higher rate of accumulation of SNPs. I think this should be quite obvious.
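
To make the direction of this argument concrete, here is a hypothetical Python sketch. It fixes the K2 node at the Fu et al. point estimate of 50,000 years and assumes, purely for illustration, 142 SNPs between the K2 and Q-R nodes (the 5 MP-level plus 137 P-level SNPs from the FGC/YFull counts quoted earlier in the thread); neither the count nor the function name comes from this post.

```python
# Assumption: K2 node fixed at the Fu et al. (2014) point estimate
T_K2_YBP = 50_000
# Assumption: SNPs between the K2 and Q-R nodes (5 MP-level + 137 P-level, FGC/YFull counts)
SNPS_K2_TO_QR = 142

def years_per_snp(t_qr_ybp):
    """Average years per SNP needed to fit SNPS_K2_TO_QR SNPs between the K2 and Q-R nodes."""
    return (T_K2_YBP - t_qr_ybp) / SNPS_K2_TO_QR

for t_qr in (33_000, 35_000, 37_500, 40_000):
    print(f"Q-R at {t_qr} ybp -> {years_per_snp(t_qr):.0f} years per SNP")
```

Fewer years per SNP means faster accumulation, so with the K2 date held fixed, moving the Q-R node back towards 40,000 years requires a higher rate over that stretch of the tree.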

The ratio of the TMRCA (Q + R)/TMRCA (K2a + K2b) in Yan et al. (2014) is 0.730303... (0.73 followed by a repeating "03"). Taking the estimate of Fu et al. for the TMRCA of K2, the TMRCA of (Q + R) should be approximately 36,515 (34,324 - 40,167) years. Considering these data, I would accept that the TMRCA of QR should be most likely somewhere between 35,000 and 40,000 years; my previous estimate is rather low, and your estimate is rather high.
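
The scaling in the previous paragraph can be checked with a short Python sketch. It reads the ratio as 0.73 followed by a repeating "03", which equals the fraction 241/330; that fraction is an interpretation of the repeating-decimal notation, not a value stated in Yan et al. The function name is illustrative.

```python
from fractions import Fraction

# Assumption: 0.730303... (repeating "03") written as an exact fraction
RATIO_QR_TO_K2 = Fraction(241, 330)

def scale_tmrca(tmrca_k2_years):
    """Scale a K2 TMRCA by the (Q + R)/K2 ratio and round to whole years."""
    return round(tmrca_k2_years * RATIO_QR_TO_K2)

# Fu et al. (2014) K2 estimate: point 50,000 years, range 47,000-55,000 years
for label, t_k2 in (("point", 50_000), ("lower", 47_000), ("upper", 55_000)):
    print(label, scale_tmrca(t_k2))
```

Using an exact fraction avoids any drift from truncating the repeating decimal, and the three results reproduce the 36,515 (34,324 - 40,167) figures.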

alan
12-30-2014, 08:51 PM
Then how would you explain the fact that the distance from MA-1 to the R0 node (35 mutations) is much larger than the distance between the R0 node and the R1 (R1a-R1b) node (23 mutations only)? IMO, it indicates quite strongly that the R1a-R1b split took place significantly earlier than 24 kya, most likely about 27.5 kya (or 25-30 kya), and thus the R1-R2 split could not have taken place 27 kya (as you have suggested), especially when knowing that the distance between the R1-R2 node and the R1a-R1b node is at least 4-5 ky (and probably about 6 ky). Please see the below scheme.
(Attached scheme: attachment 3304)

BTW, the above mutations from the selected group of SNPs (ie. from those detectable in MA-1) correspond to about 300 years each, but using the much more complete SNP data for the R2 and Q lineages from the same paper (and the mutation rate of 0.7 × 10⁻⁹ per site per year, strongly supported by the Ust’-Ishim data), we can date the R-Q split to 39.2 kya, the R1-R2 split to 33.8 kya and the R1a-R1b split to 28.3 kya, which is perfectly consistent with the above calculations based on the MA-1 data.



Firstly, your suggestion that jamesdowallen has misplaced the TMRCAs (taking K2a-K2b for Q-R) was extremely strange, as nothing suggested he was not aware of the actual structure of the tree. Secondly, you seem to have missed the point I was trying to make in my previous post, namely that the 40ky TMRCA for haplogroup P (or for the Q-R split) is perfectly consistent with the estimated age for the K2a-K2b split which is quite precisely dated to 50 (47-55) kya by the Ust’-Ishim study, which can be easily demonstrated using any set of recently published Y-DNA sequencing data.


Importantly, only 1.86 Mb was sequenced in this particular case, so each such SNP corresponds to nearly 1 ky (or 812 years), which makes the K2a-K2b split quite securely dated to about 50 (47-55) kya, as mentioned above.


Actually, this accumulation of SNPs between the K2a-K2b node and the Q-R node when assuming an older age for haplogroup P (like 38-40 ky) will be much smaller than when assuming the much younger date of the R-Q split (ie. 33 kya), so this argument speaks against your lower estimates for haplogroup P.


I agree that 40 ky for haplogroup P is close to the upper limit, but it is still within the acceptable range of 36-41 ky. On the other hand, your estimate of 33 ky (as suggested in your previous post) seems to be below the lower limit for the Q-R split.


You seem to have weakened your position a bit, as the K2(48 kya)->P(35 kya) scenario is not substantially different from the K2 (50 kya)->P(40 kya) scenario, and I would certainly agree with any intermediate variant, like K2(49 kya)->P(37.5 kya).

Let me now use the FGC-based SNP data to refine our estimates as much as possible. It seems that the average number of the FGC-tested SNPs downstream of R1 is very close to 300 (in both R1a and R1b). VinceT has recently reported that the number of reliable SNPs at the R1 level is 67, while a corresponding number for level R is 66. As for the P level (or actually for SNPs separating the M-P and Q-R nodes), there are 137 of them (according to YFull). YFull also reports 5 SNPs for the MP level and 1 SNP for the K(xLT) level. So we have (on average) about 576 SNPs downstream of K*, 575 SNPs downstream of MP, 570 reliable SNPs downstream of K2, 433 SNPs downstream of P1, 367 SNPs downstream of R and 300 SNPs downstream of R1. For calculating the TMRCA values, I have used two different rates (90 or 85 years per SNP), as they seem to be best supported when comparing the FGC data with the slightly better calibrated Big Y data.

Here are my estimates calculated using the rate of 90 years per each FGC-tested SNP:

Haplogroup R1 (R1a-R1b node) - 27.0 kya
Haplogroup R (R1-R2 node) - 33.0 kya
Haplogroup P1 (Q-R node) - 39.0 kya
Haplogroup K2b (M-P node) - 51.3 kya
Haplogroup K2 (K2a-K2b node) - 51.7 kya
Haplogroup K (K1-K2 node) - 51.8 kya

And the corresponding set when assuming 85 years per each SNP.

Haplogroup R1 (R1a-R1b node) - 25.5 kya
Haplogroup R (R1-R2 node) - 31.2 kya
Haplogroup P1 (Q-R node) - 36.8 kya
Haplogroup K2b (M-P node) - 48.5 kya
Haplogroup K2 (K2a-K2b node) - 48.9 kya
Haplogroup K (K1-K2 node) - 49.0 kya

A more precisely calculated number of SNPs downstream of R1 should provide an additional refinement, but what is really needed for this is a fully sequenced and radiocarbon-dated R1b member from the 20 kya - 5 kya period.

From an archaeological point of view, I think that in both your dating methods P1 is the most upstream node in your list that is young enough to have occurred in northern Asia/Siberia. As radiocarbon dating shows that area wasn't settled by cultures associated with modern humans until around the time Ust-Ishim lived c. 45000 years ago, the K, K2 and K2b nodes simply had to have happened outside north Asia/Siberia. Archaeological evidence available at present would strongly point to SW Asia as the origin for the earliest north-central Asian/Siberian modern humans. The technology strongly resembles the Emiran culture of SW Asia. However, the P1 node is easily within the timespan of the settlement of Siberia, and IMO it's hard to imagine it didn't happen there, with a K2 Ust-Ishim and an R* Mal'ta being found among just a very few palaeolithic Eurasians tested to date.

Actually the suggested dating for P1 of 35-37000BC is interesting, as a climatic downturn hit the early upper palaeolithic Siberians around this time and didn't fully pick up again until around 30000BC. This creates a scenario where some of the P Siberians could have moved south into SE Asia. We know from archaeology that the early upper palaeolithic culture of Siberia seems to have arrived a lot later in Mongolia and NW China than in most of Siberia, and this may well be linked to that climatic downturn in Siberia. It provides the earliest climatic scenario for Siberians heading into SE Asia.

After that there was a pre-glacial upturn c. 30000-25000BC, which is less likely to have seen north-to-south migration. Indeed, in this period the middle upper palaeolithic culture of Siberia developed, and this is known to have extended further north than ever before. Of course, the last glacial then hit and cleared out most of Siberia by 25-22000BC, creating another scenario of migration south.

I am encouraged by the dating being suggested that a subset of K2 and K2b was among the group who headed north from SW Asia into north-central Asia and Siberia c. 43000BC and likely gave rise to P1 there. The dating suggests that the K2 node at c. 49-47000BC is so old that there is no problem at all in seeing parts of K2 and K2b heading into north Asia, and other parts of K and K2 and even K2b heading east along the southern Asian coastal route. If this movement happened, then your dating suggests that the south Asian west-to-east route of K2 could have been used at any time after 50000BC, with the period 50-45000BC making a lot of potential sense to me for a southern route from SW Asia to SE Asia, and a separate Levant to Iran to north-central Asia to Siberia route commencing around the young end of that range or a little after. As I noted before, use of bamboo for tools may be hiding this movement from archaeologists: great material for making tools, but a disaster for archaeologists trying to work out what happened.

However, I think the K2 picture in SE Asia has been further obscured by a later north-to-south movement from Siberia too, centred on the mid 30000s.

parasar
12-30-2014, 08:52 PM
::confused::

It's good to estimate average mutation rate and the data you post may have much utility. But mutation rate seems so variable that to calibrate the time of, say, R1b-L11, one wants a dated skeleton as close to R1b-L11 as possible.

Almost any R1-skeleton will be more useful than a Q-skeleton for dating R1.

I agree on the first part. Supposedly ERS389795 is L11.

On the second, yes too if we are comparing apples to apples. But if not, I would take a properly covered genome such as Anzick-1 or Ust-Ishim any day to date downstream and parallel clades, rather than a noisier, lower-coverage but closer individual.

I may be wrong on this, but I think it is best to avoid mutations on terminal strings to calculate rates. I would rather calculate using two known ancient samples such as Ust-Ishim and Anzick-1 going back to the common M526 node.

Regarding mutation rates, Hallast et al. had some interesting observations:
http://mbe.oxfordjournals.org/content/early/2014/12/13/molbev.msu327/suppl/DC1

Visual inspection of the tree (fig. 3) shows apparent heterogeneity in branch lengths between (and also within) clades—for example, the tips of hg C sequences appear to extend further than those of other haplogroups. Also, one previous study has shown a significantly reduced mean number of mutations to the root for haplogroup A, compared with other lineages ...
A comparison of the mean number of mutations to the root of the tree for the three different tissue sources (supplementary fig. S3, Supplementary Material online) shows that MSY sequences in the LCLs analyzed here indeed carry significantly more mutations (mean of 471, n = 152; P = 0.00124, one-way analysis of variance) than the sequences from blood...
[but]
Considering branch lengths to the root, absolute differences between sample sources are small, so have a minimal effect on TMRCA estimates...

To address the possibility of haplogroup-specific effects, we compared the mean number of mutations with the root of our tree for 17 different major haplogroups (supplementary table S8, Supplementary Material online). Numbers of samples per haplogroup vary widely, and once this is taken into account only two comparisons, hg E and hg O versus hg R1b, retain any signal of distinctive branch lengths
...
MSY Mutation Rate
Although the relative ages of clades in the MSY phylogeny can now be well established thanks to the large number of variants, the absolute estimates of TMRCA remain uncertain because of corresponding uncertainty about choice of the appropriate mutation rate...
Visual inspection of the phylogeny suggests that there may be branch length heterogeneity within our phylogeny. However, after adjustment for sample size differences, statistical support for such differences remains for only two comparisons, hg O versus hg R1b, and hg E versus hg R1b...

parasar
12-30-2014, 09:59 PM
Folk here look to be in agreement that the high coverage (21.7X) Ust-Ishim is reliably dated (45770-44010 cal BP (68.2%) / 46880-43210 cal BP (95.4%)) and that from him K-M9 can be reasonably dated to about 51000ybp (M9 is far closer to M526 than to M578 so I am splitting the 4k as 1:3)


The Ust’-Ishim sequence shares all the mutations common to the K macrohaplogroup and has one additional specific mutation rs2033003/M526 which defines the group K(xLT) (Figure S9.1, blue part). The Ust’-Ishim Y-chromosome carries no additional mutations belonging to any of the sub-haplogroups of K(xLT); however, there are 6 additional mutations ...

we estimate a mutation rate of 0.76 × 10⁻⁹ substitutions per site per year (95% HPD: 0.67-0.86 × 10⁻⁹)...

KIJ: 54 kya (49-59 kya)
KxLT: 50 kya (47-55 kya)


Going by Karafet et al. that K-M9 is about 12 + <5 ky distant from R,Q divergence we get 51-<17=<34kybp as the approximate age for R,Q divergence.

Courtesy Maju:
http://1.bp.blogspot.com/-xIyCVSPSm3I/U5LqVD5nawI/AAAAAAAACsM/TEjs1OTRu7k/s1600/Karafet-Y-DNA-K-tree-annotated.png

Michał
12-30-2014, 10:23 PM
we get 51-<17=<34kybp as the approximate age for R,Q divergence.

I guess you meant >34kybp.

parasar
12-30-2014, 10:47 PM
I guess you meant >34kybp.

Thanks. Yes.

I would also add that that 17k number is based on the <3k between K-M9 and P331.

We estimate the time between the most recent common ancestors of chromosomes carrying the M9 mutation and that of the subset of chromosomes carrying the P331 mutation using the probability distribution for mutations along a lineage as described in Karafet et al. [10]...

Considering that we do not observe any of the 68 mutations occurring between the common ancestor of K-M9 chromosomes and that of K-P331 chromosomes, we estimate that this interval of time was shorter than 4.3% of the TMRCA of M168 chromosomes (95% upper bound). Assuming ~70 ky for the TMRCA of M168 chromosomes [10], we estimate the interval of time between the diversification of K-M9 and that of K-P331 to be <3 ky. This rapid diversification has also been assessed using whole Y-chromosome sequence data [22]. In addition, we estimate the total time between the common ancestor of K-M9 and that of P-P295 to be <5 ky, and the time between the common ancestor P-P295 and that of P-P27 to be 12.3 ky (95% CI: 6.6-20 ky).


The above assumption is from Karafet's prior paper.
http://genome.cshlp.org/content/suppl/2008/04/02/gr.7172008.DC1/SOM_2.pdf

Assuming that the age of MRCA-CT is 70,000 years ... MRCA-F ... = 48,039 years.


So per the above
48039-<17k=>31039 ybp as their approximate age for R,Q divergence.
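
The bound arithmetic in the last few posts can be sketched in Python. The anchor of about 51,000 years for K-M9 follows parasar's 1:3 split of the 4 ky gap between K2 (50 kya) and IJK (54 kya) from Fu et al.; the 17 ky maximum gap is parasar's "12 + <5 ky" reading of Karafet et al. The helper name is hypothetical.

```python
def older_than(anchor_ybp, max_gap_years):
    """Lower bound on the age of a node lying less than max_gap_years below an anchor node."""
    return anchor_ybp - max_gap_years

# K-M9: K2 at 50 ky plus one quarter of the 4 ky gap up to IJK at 54 ky (1:3 split)
k_m9_ybp = 50_000 + (54_000 - 50_000) // 4

# Karafet et al.: ~12 ky from P-P295 to P-P27, plus <5 ky from K-M9 to P-P295
max_gap = 12_000 + 5_000

print(f"R,Q divergence older than {older_than(k_m9_ybp, max_gap)} ybp (Ust'-Ishim anchor)")
print(f"R,Q divergence older than {older_than(48_039, max_gap)} ybp (Karafet MRCA-F figure)")

# Karafet et al.'s K-M9 to K-P331 interval: 4.3% of an assumed 70 ky TMRCA for M168
print(f"K-M9 to K-P331 interval under about {round(0.043 * 70_000)} years")
```

Subtracting an upper bound on a gap from a fixed date gives a lower bound on the younger node's age, which is why the corrected result reads ">34 kybp" rather than "<34".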

alan
12-31-2014, 03:18 AM
Folk here look to be in agreement that the high coverage (21.7X) Ust-Ishim is reliably dated (45770-44010 cal BP (68.2%) / 46880-43210 cal BP (95.4%)) and that from him K-M9 can be reasonably dated to about 51000ybp (M9 is far closer to M526 than to M578 so I am splitting the 4k as 1:3)



Going by Karafet et al. that K-M9 is about 12 + <5 ky distant from R,Q divergence we get 51-<17=<34kybp as the approximate age for R,Q divergence.

Courtesy Maju:
http://1.bp.blogspot.com/-xIyCVSPSm3I/U5LqVD5nawI/AAAAAAAACsM/TEjs1OTRu7k/s1600/Karafet-Y-DNA-K-tree-annotated.png

Yes, I think 45000 years/43000BC is probably a good round figure for Ust-Ishim and easy to remember. This is also the very date climatologists have come up with as the start of a better climate period in Siberia. So I don't expect ever to find remains much older than Ust-Ishim in Siberia. He is also at the extreme western end of south-central Siberia, which again suggests he was early.

alan
12-31-2014, 03:34 AM
So it seems that anything to do with K dates from 50000BC onward, and any movements east by the southern route from SW Asia pre-dating 50000BC cannot have anything to do with K2.

jamesdowallen
12-31-2014, 08:53 AM
Ouch! I just had my ignorance fought on two counts.

(1) Many observed SNPs do not come from the germ line. Thus the 35 novel SNPs found for Siberian boy may give an overestimate of its distance from its R* ancestor. (But how common are such non-germ SNPs? Or could the extra "SNP"s be artifacts of the very poor sampling of that DNA?)

(2) I didn't know the subclades of K were renumbered recently and PQR is now synonymous with K2b2! I'd thought I was up-to-date (2008 style) knowing that K2 was the old K4, with T the old K2.
In the words of the famous Roseanne Rosannadanna, "Nevermind." :humble:

Michał
12-31-2014, 01:30 PM
The above assumption is from Karafet's prior paper.
http://genome.cshlp.org/content/suppl/2008/04/02/gr.7172008.DC1/SOM_2.pdf

So per the above
48039-<17k=>31039 ybp as their approximate age for R,Q divergence.

48039 years is Karafet’s estimate for the TMRCA of haplogroup F, and since we know that it is most likely a quite significant underestimation (as there are at least 27 mutations between the F and K2 levels, with K2 dated to 50 (47-55) kybp based on Ust’-Ishim), we cannot consider your above estimate for the age of the R-Q split as reliable.

alan
12-31-2014, 01:36 PM
48039 years is Karafet’s estimate for the TMRCA of haplogroup F, and since we know that it is most likely a quite significant underestimation (as there are at least 27 mutations between the F and K2 levels, with K2 dated to 50 (47-55) kybp based on Ust’-Ishim), we cannot consider your above estimate for the age of the R-Q split as reliable.

So F presumably dates to 50-odd thousand years ago. What is the current theory as to the story of F? I read the idea that it was a thrust into the Levant (which an early modern human thrust had abandoned to the Neanderthals) out of NE Africa around 50000 years back, related to the Emiran culture in the Levant; the latter, in archaeological terms, would appear to have a thrust into Siberia c. 45000 years ago and also a short-lived one into Europe around the same time or fractionally earlier, corresponding to a good climate period. It would make sense to me if F was related to the Emiran and that K, I and J arose in that culture and its derivatives in the subsequent few thousand years.

I must admit I am very encouraged by the way a lot of the DNA, genetic dating and cutting-edge archaeological dating is aligning. It seems to me that within a couple of years we will have a pretty definitive history of modern humans in the Palaeolithic. I actually don't think it would take a vast amount of ancient DNA to get the picture clarified, which is just as well, because burials of that period are not exactly plentiful.

Michał
12-31-2014, 01:50 PM
So F presumably dates to 50-odd thousand years ago.
IJK is dated by Fu et al. (the Ust'-Ishim paper) to about 54 kya (49-59 kya) and F seems to be about 1 ky older than IJK, so yes, it seems almost certain that F expanded between 50 and 60 kya (most likely about 55 kya), and I guess it had to have happened in the Near East (or very close to it).

alan
12-31-2014, 02:06 PM
IJK is dated by Fu et al. (the Ust'-Ishim paper) to about 54 kya (49-59 kya) and F seems to be about 1 ky older than IJK, so yes, it seems almost certain that F expanded between 50 and 60 kya (most likely about 55 kya), and I guess it had to have happened in the Near East (or very close to it).

I believe there is a theory that F may have been in NE Africa for a while before expanding into the Levant when climate changes shunted populations, and that this led to the Emiran in the Levant, SE Anatolia and derivatives beyond. I think this is because the best dates for the Emiran culture are not older than 50,000 years, so it is likely that F spent a few millennia somewhere else before the Levant. From memory, I think there is a technological resemblance between the Emiran in the Levant and earlier cultures in NE Africa, but I will need to check that, as ideas and data change very fast and turn on individual discoveries in this period.

alan
12-31-2014, 02:52 PM
This seems to be the most recent paper on the origin of the Emiran, which seems to be the root of the Siberian Early Upper Palaeolithic as well as of other cultures such as the Bohunician in Europe:

http://dienekes.blogspot.co.uk/2014/09/an-archaeological-scenario-for-out-of.html

parasar
01-01-2015, 12:13 AM
48309 years is Karafet's estimate for the TMRCA of haplogroup F, and since we know that it is most likely a quite significant underestimation (as there are at least 27 mutations between the F and K2 levels, with K2 dated to 50 (47-55) kybp based on Ust'-Ishim), we cannot consider your above estimate for the age of the R-Q split as reliable.

Perhaps, but I doubt it.
I think Karafet's ages need about a 1.1385x correction.
That would bump that <17K from M9 to P295 to about <19K. Deducting 19K from 51k would put us at >32k. So I think my estimate of 33k is not unreliable.
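A minimal Python sketch of the arithmetic in this post, assuming (as parasar does) that one constant factor of 1.1385 corrects Karafet et al.'s time spans. The factor, the 17k span and the 51k starting age are his figures; the function names are hypothetical. Because the span is an upper bound ("<17k"), the subtraction gives a lower bound for the R-Q split.

```python
# Sketch of parasar's correction arithmetic (all figures in ky).
# The 1.1385 factor is his proposed adjustment, not a published calibration.
KARAFET_CORRECTION = 1.1385


def corrected_span(span_ky: float, factor: float = KARAFET_CORRECTION) -> float:
    """Scale a Karafet et al. time span by a constant correction factor."""
    return span_ky * factor


def lower_node_age(upper_age_ky: float, span_ky: float) -> float:
    """Age of the lower node = age of the upper node minus the span between them.

    If span_ky is an upper bound, the result is a lower bound.
    """
    return upper_age_ky - span_ky


m9_to_p295 = corrected_span(17.0)    # the "<17k" span becomes roughly 19.35k
print(round(m9_to_p295, 2))
print(lower_node_age(51.0, 19.0))    # using the rounded 19k span, as in the post
```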


IJK is dated by Fu et al. (the Ust'-Ishim paper) to about 54 kya (49-59 kya) and F seems to be about 1 ky older than IJK, so yes, it seems almost certain that F expanded between 50 and 60 kya (most likely about 55 kya), and I guess it had to have happened in the Near East (or very close to it).

From what I have been able to find, there is no evidence of humans in the Near East in that time-frame.

alan
01-01-2015, 03:52 AM
Perhaps, but I doubt it.
I think Karafet's ages need about a 1.1385x correction.
That would bump that <17K from M9 to P295 to about <19K. Deducting 19K from 51k would put us at >32k. So I think my estimate of 33k is not unreliable.



From what I have been able to find, there is no evidence of humans in the Near East in that time-frame.

This indicates that the Emiran, a culture associated with modern humans, existed in the Levant from 49,000 years ago.

http://www.koutaigeki.org/PDF/2013/Sano%20et%20al.%20poster%20ESHE3.pdf

To me, the Emiran seems odds-on to be associated with F or perhaps K2, and the Siberian thrust a few millennia later clearly seems K2-associated, given that Ust'-Ishim was a pretty early example of the Siberian settlement by modern humans.

Michał
01-01-2015, 03:30 PM
Perhaps, but I doubt it.
I think Karafet's ages need about a 1.1385x correction.
That would bump that <17K from M9 to P295 to about <19K. Deducting 19K from 51k would put us at >32k. So I think my estimate of 33k is not unreliable.

You might be right, but there are too many independent data suggesting that haplogroup P is older than 35 ky, as admitted above by Ebizur. Here is another example. The paper by Scozzari et al. (2014) provides an SNP-based tree in which haplogroup F (or rather IJK) is dated to 63 or 66 kybp (using two different calculation methods and assuming the 0.64 mutation rate), but when rescaling this tree using the Ust'-Ishim-derived 0.76 rate we get 53 or 55.5 kybp for IJK (which is in full agreement with the 54 kybp estimate provided by Fu et al.). Now, IJK shows 64 mutations downstream, with 44 mutations (on average) downstream of P (R-Q), which corresponds to 37.1 kybp for the Q-R split.
http://genome.cshlp.org/content/early/2014/01/06/gr.160788.113.abstract

http://4.bp.blogspot.com/-AFN4jjwuNBc/Us15U6bBVVI/AAAAAAAAJdc/uY8iTtxO0Mo/s1600/Scozzari.png
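The two steps in Michał's argument can be sketched in Python as follows. The rates (0.64 and 0.76, in whatever units the thread quotes them) and the mutation counts come from the post; the function names are hypothetical, and the ratio step assumes a constant mutation rate along every branch.

```python
# Two steps from the post: rescale Scozzari et al.'s dates to a faster
# mutation rate, then date the Q-R split by its share of IJK's mutations.
OLD_RATE = 0.64   # rate assumed by Scozzari et al. (as quoted in the thread)
NEW_RATE = 0.76   # Ust'-Ishim-derived rate (as quoted in the thread)


def rescale(age_ky: float, old_rate: float = OLD_RATE, new_rate: float = NEW_RATE) -> float:
    """A faster rate means fewer years per mutation, so ages shrink by old/new."""
    return age_ky * old_rate / new_rate


def age_from_ratio(mutations_below_node: float, mutations_below_anchor: float,
                   anchor_age_ky: float) -> float:
    """Assuming a constant rate, age is proportional to downstream mutations."""
    return anchor_age_ky * mutations_below_node / mutations_below_anchor


print([round(rescale(a), 1) for a in (63, 66)])   # IJK, two calculation methods
print(age_from_ratio(44, 64, 54))                 # Q-R split, with IJK = 54 ky
```

The same ratio (44/64 = 0.6875, times 54) is what parasar reproduces later in the thread; the disagreement is over which node the 44 mutations are counted from, not over the arithmetic.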


From what I have been able to find, there is no evidence of humans in the Near East in that time-frame.

Right, but I am not sure how much this is affected by a potentially very small population size for those F* people exiting Africa (when compared to the local Neanderthal population) and a lack of calibrated data for many Near Eastern sites. Importantly, modern humans are seen in Central Europe as early as 48 kybp (the Bohunician industry of Emirian origin), while the more widespread Aurignacian culture (strongly associated with modern humans) has recently been redated (based on calibrated radiocarbon data) to a period ranging from 47 to 41 kybp, which strongly suggests that the F-derived sublineages heading north-west (towards the Caucasus and Europe) and east (towards South Asia) had separated before 50 kybp.

Also, dating the arrival of haplogroup F in SW Asia to a period between 60 and 50 kybp is consistent with the Neanderthal admixture data from Ust'-Ishim. This admixture is unlikely to have taken place in Africa, so we can assume that it happened shortly after arriving in SW Asia but before splitting into the many currently known sublineages of F, or at least before the GHIJK split. Based on the size of the Neanderthal-derived DNA fragments, Fu et al. estimated that "the Neanderthal gene flow occurred 232–430 generations before the Ust’-Ishim individual lived" or "approximately 50,000 to 60,000 years BP".
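Turning "232-430 generations before the Ust'-Ishim individual lived" into calendar years needs two numbers the quote does not give. The sketch below assumes that Ust'-Ishim lived about 45,000 years ago and that a generation is 29 years. Both values are assumptions for illustration, and the function name is hypothetical.

```python
# Convert "generations before the Ust'-Ishim individual" into years BP.
# Both constants are assumptions for illustration, not values from the post.
UST_ISHIM_AGE_BP = 45_000    # assumed approximate age of the Ust'-Ishim individual
GENERATION_YEARS = 29        # assumed mean generation interval


def gene_flow_age_bp(generations: int, sample_age_bp: int = UST_ISHIM_AGE_BP,
                     generation_years: int = GENERATION_YEARS) -> int:
    """Years BP of an event that happened `generations` before the sample lived."""
    return sample_age_bp + generations * generation_years


latest = gene_flow_age_bp(232)
earliest = gene_flow_age_bp(430)
print(latest, earliest)
```

With these assumptions the window is roughly 51.7 to 57.5 ky BP, inside the quoted 50,000 to 60,000 range; a longer or shorter generation interval would shift both ends.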

parasar
01-01-2015, 09:11 PM
This indicates that the Emiran, a culture associated with modern humans, existed in the Levant from 49,000 years ago.

http://www.koutaigeki.org/PDF/2013/Sano%20et%20al.%20poster%20ESHE3.pdf

To me, the Emiran seems odds-on to be associated with F or perhaps K2, and the Siberian thrust a few millennia later clearly seems K2-associated, given that Ust'-Ishim was a pretty early example of the Siberian settlement by modern humans.

That Emirian was some type of F-GHIJK is possible. C, E are other possibilities. That it was a precursor to Ust-Ishim to me looks unlikely if not impossible. I see them as parallel almost simultaneous developments as there is no precursor to any of these three (Levant, Europe, Altai) in situ.

Ebizur
01-02-2015, 02:21 AM
You might be right, but there are too many independent data suggesting that haplogroup P is older than 35 ky, as admitted above by Ebizur.

I have stated that an arithmetic comparison of the point estimate and 95% confidence interval of the TMRCA of K(xLT) as per Fu et al. 2014 and the point estimates of the TMRCA of (NO + QR) and the TMRCA of (Q + R) as per Yan et al. 2014 suggests that the TMRCA of (Q + R) is most likely between 35,000 and 40,000 YBP. If one also takes into account the 95% confidence intervals of the TMRCA estimates of Yan et al., then one can predict only that the TMRCA of (Q + R) should be somewhere between 28,700 and 47,500 YBP. Of course, the mean of those values does again fall between 35,000 and 40,000 YBP, but this only takes into account data from two sources (Yan et al. 2014 and Fu et al. 2014), and the prediction is very imprecise. I think the only thing that clearly can be agreed upon at present is that we need to refine our understanding of the NRY nucleotide substitution rate.
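Reading "the mean of those values" as the mean of the two bounds, a short Python check shows both the midpoint Ebizur refers to and how wide the interval is around it. The function name is hypothetical.

```python
# Ebizur's combined range for the Q-R TMRCA (years BP), from the post above.
LOW, HIGH = 28_700, 47_500


def midpoint_and_halfwidth(low: float, high: float) -> tuple[float, float]:
    """Return the midpoint of an interval and half its width."""
    return (low + high) / 2, (high - low) / 2


mid, half = midpoint_and_halfwidth(LOW, HIGH)
print(mid, half, 35_000 <= mid <= 40_000)
```

The midpoint (38,100) lands in the 35,000 to 40,000 window, but a half-width of 9,400 years is why he calls the prediction very imprecise.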

parasar
01-02-2015, 10:32 PM
You might be right, but there are too many independent data suggesting that haplogroup P is older than 35 ky, as admitted above by Ebizur. Here is another example. The paper by Scozzari et al. (2014) provides an SNP-based tree in which haplogroup F (or rather IJK) is dated to 63 or 66 kybp (using two different calculation methods and assuming the 0.64 mutation rate), but when rescaling this tree using the Ust'-Ishim-derived 0.76 rate we get 53 or 55.5 kybp for IJK (which is in full agreement with the 54 kybp estimate provided by Fu et al.). Now, IJK shows 64 mutations downstream, with 44 mutations (on average) downstream of P (R-Q), which corresponds to 37.1 kybp for the Q-R split.
http://genome.cshlp.org/content/early/2014/01/06/gr.160788.113.abstract

...

Isn't their P (the R,Q node) P-P295 rather than P-P27? There is a gap of 12-16k between the two.

Human Y chromosomes to be sequenced (Supplemental Table S1) were selected on the basis of their SNP/STR genotype which had been determined in the present or previous studies (Cruciani et al. 2004, 2007, 2010, 2011a, 2011b; Trombetta et al. 2011; Scozzari et al. 2012).

parasar
01-02-2015, 11:15 PM
This indicates that the Emiran, a culture associated with modern humans, existed in the Levant from 49,000 years ago.

http://www.koutaigeki.org/PDF/2013/Sano%20et%20al.%20poster%20ESHE3.pdf

To me, the Emiran seems odds-on to be associated with F or perhaps K2, and the Siberian thrust a few millennia later clearly seems K2-associated, given that Ust'-Ishim was a pretty early example of the Siberian settlement by modern humans.
That Emirian was some type of F-GHIJK is possible. C, E are other possibilities. That it was a precursor to Ust-Ishim to me looks unlikely if not impossible. I see them as parallel almost simultaneous developments as there is no precursor to any of these three (Levant, Europe, Altai) in situ.

Regarding the above, there is a chance that part of what alan is saying is possible under a secondary dispersal model. This secondary dispersal would have included, in some form, Indians and Negritos.
http://www.pnas.org/content/111/20/7248.full.pdf

Our dataset conforms to this hypothesis in that neither the genetic nor the cranial phenotype dataset from our sampled populations separate the Indo-European and Dravidian speakers from India, as might be expected if the latter where relic descendants of the southern route dispersal (Supporting Information, The “Negrito” Hypothesis). Instead, both Indian samples exhibit closer genetic and phenotypic affinity to the hypothetical second dispersal descendants (the Japanese, Aeta/Agta, and Central Asian populations) ...
Our results are unambiguous in their support of multiple dispersals into Eurasia, with Australians, Papuans, and Melanesians retaining the signal of a southern route dispersal that commenced closer to the temporal boundary of the Middle–Late Pleistocene ...
models of ancient admixture events with other hominin populations should enclose the South Asian, southern route geographical space


The “Negrito” Hypothesis
http://www.pnas.org/content/suppl/2014/04/17/1323666111.DCSupplemental/pnas.201323666SI.pdf

Papuans and Melanesians as descendants of the first dispersal and Agta/Aeta as descendants of the second dispersal

Michał
01-03-2015, 03:04 PM
Isn't their P (the R,Q node) P-P295 rather than P-P27? There is a gap of 12-16k between the two.
This seems extremely unlikely. Firstly, the supplementary table S1 suggests that the group marked as haplogroup P included only samples from haplogroups Q and R (this is also suggested by the referenced publications, in which no P(xQR) samples are mentioned). Secondly, this haplogroup P group is shown as defined by M45 (table S1), and based on the study of Karafet et al. (2014), it seems that M45 is positioned downstream of P295 (thus on the P27 level). Thirdly, the very first information about P-P295* was published about six months later (Karafet et al., 2014), so if Scozzari et al. had known about such an unexpected P-P295* case in their haplogroup P sample, they would certainly have informed us about it. Fourthly, if the haplogroup P group shown in the Scozzari tree corresponded to the recently discovered lineage (or paragroup) P-P295*, the proportion between the mutations positioned between the IJK node and the P295* node (20?) and those downstream of P295 (44?) would certainly be different (close to 1:10 rather than 1:2).

alan
01-03-2015, 04:12 PM
I don't know this for sure, but if P(xR,Q) is rare in SW Asia except Iran, then it at least confirms that P probably arose after the spread east and north out of SW Asia was under way, c. 45,000 years or so ago. This seems in line with the suggested genetic dating. If the dating does indeed fall into the 40-35,000 years ago range, most likely around 37,000 years, then this fits the relatively eastern distribution of P(xR,Q) rather well. We also cannot ignore the fact that early R and Q most likely arose in north-east Asia from P.

Actually, as a tangent, it did strike me that P in South Asia could be a hint at where some of its descendants retreated during the LGM. P is too young to be associated with the first modern human wave from the Levant to north-central Asia/Siberia, which was about 45,000 years ago or more. I am absolutely convinced P arose in Siberia c. 37000 BC, give or take a thousand years. Its highest frequency occurs among Turkic peoples of Central Asia and South Siberia (35.4% among Tuvans and 28.3% among Altaians-Kizhi, according to Wikipedia), which makes it overwhelmingly likely that at least a large part of P overwintered around the Altai in the LGM.

It therefore seems certain to me that P found in more southerly locations around Iran and south and south-east Asia is due to movements back south from Siberia. The timing is less clear. I suppose any period is possible but maybe some PxQ/R lineages made a trek towards the Stans and Iran along with some early R1b and R1a and R1 as the LGM started to grip Siberia. I however cannot say I am aware of archaeological evidence for an early LGM movement from Siberia. It could be post-LGM. Regardless, I see P in southern parts of Asia as a displacement from Siberia.

parasar
01-03-2015, 06:16 PM
This seems extremely unlikely. Firstly, the supplementary table S1 suggests that the group marked as haplogroup P included only samples from haplogroups Q and R (this is also suggested by the referenced publications, in which no P(xQR) samples are mentioned). Secondly, this haplogroup P group is shown as defined by M45 (table S1), and based on the study of Karafet et al. (2014), it seems that M45 is positioned downstream of P295 (thus on the P27 level). Thirdly, the very first information about P-P295* was published about six months later (Karafet et al., 2014), so if Scozzari et al. had known about such an unexpected P-P295* case in their haplogroup P sample, they would certainly have informed us about it. Fourthly, if the haplogroup P group shown in the Scozzari tree corresponded to the recently discovered lineage (or paragroup) P-P295*, the proportion between the mutations positioned between the IJK node and the P295* node (20?) and those downstream of P295 (44?) would certainly be different (close to 1:10 rather than 1:2).

As far as they are concerned, they can't distinguish among P-P27, P-P295, P-M45 and the other mutations in the 12+ ky string. So how many of these mutations are they including at the P level? They have 64 mutations on the P (M9) branch, with 44 in the fan section. I am assuming the fan section is mainly R and Q samples. Was, say, P1-P27 included among the 44? Was P-P295?
This knowledge would be needed to get the correct ratio:
44/64=0.6875
0.6875x54=37.125

Per Karafet 2014, P295 "was previously assumed to be equivalent to 18 other mutations defining the haplogroup P is derived in a broader group of chromosomes ... In our worldwide sample of 7462 Y chromosomes, we observe the newly defined paragroup P-P295* in 83 chromosomes from Island Southeast Asia (Timor, Sumba, Sulawesi) and the Negrito Aeta population from Philippines (Table 1 and Figure 2) ...and the time between the common ancestor P-P295 and that of P-P27 to be 12.3 ky ..."


I don't know this for sure, but if P(xR,Q) is rare in SW Asia except Iran, then it at least confirms that P probably arose after the spread east and north out of SW Asia was under way, c. 45,000 years or so ago. This seems in line with the suggested genetic dating. If the dating does indeed fall into the 40-35,000 years ago range, most likely around 37,000 years, then this fits the relatively eastern distribution of P(xR,Q) rather well. We also cannot ignore the fact that early R and Q most likely arose in north-east Asia from P.

Actually, as a tangent, it did strike me that P in South Asia could be a hint at where some of its descendants retreated during the LGM. P is too young to be associated with the first modern human wave from the Levant to north-central Asia/Siberia, which was about 45,000 years ago or more. I am absolutely convinced P arose in Siberia c. 37000 BC, give or take a thousand years. Its highest frequency occurs among Turkic peoples of Central Asia and South Siberia (35.4% among Tuvans and 28.3% among Altaians-Kizhi, according to Wikipedia), which makes it overwhelmingly likely that at least a large part of P overwintered around the Altai in the LGM.

It therefore seems certain to me that P found in more southerly locations around Iran and south and south-east Asia is due to movements back south from Siberia. The timing is less clear. I suppose any period is possible but maybe some PxQ/R lineages made a trek towards the Stans and Iran along with some early R1b and R1a and R1 as the LGM started to grip Siberia. I however cannot say I am aware of archaeological evidence for an early LGM movement from Siberia. It could be post-LGM. Regardless, I see P in southern parts of Asia as a displacement from Siberia.

It has only been found in South East Asia in a set of 7462 worldwide samples studied by Karafet and I have seen no evidence contradicting Karafet.

"With the exception of P-P27, all of the descendant lineages are located today in Southeast Asia and Oceania ...This pattern leads us to hypothesize a southeastern Asian origin for P-P295 and a later expansion of the ancestor of
subhaplogroups R and Q into mainland Asia. An alternative explanation would involve an extinction event of ancestral P-P295* chromosomes everywhere in Asia. These scenarios are equally parsimonious.
They involve either a migration event (P* chromosomes from Indonesia to mainland Asia) or an extinction event of P-P295* paragroup in Eurasia. However, given the geographic distribution of the P331 mutation, the immediate predecessor of P lineage and its likely origin in Southeast Asia/Indonesia, the existing evidence favors the first scenario."

vettor
01-03-2015, 06:40 PM
What are the ages of the K split that created the SNPs P326 and M526?

I understand from M. van Oven's chart that P326 was created before M526.

parasar
01-03-2015, 08:43 PM
What are the ages of the K split that created the SNPs P326 and M526?

I understand from M. van Oven's chart that P326 was created before M526.

M526 is about 51,000 years old, based on Ust'-Ishim.
I don't think Karafet found any SNP between M9 and M526.
In my FGC I see three SNPs at the M526 level - M5710 • CTS6445, CTS8508, M526 • PF5979 level K(xLT)
If those are the only three, I can't imagine P326 being much older, if at all, than M526.

So for now, phylogenetically:
https://www.familytreedna.com/PDF/MendezHumBiol2011.pdf

(P326) marks a higher-level branch (i.e., analogous to M526 in the Y chromosome phylogeny hierarchy) that represents the ancestor of both haplogroups T and L.


Eyeballing here, I would say P326 while older than the R,Q split is younger than M526.
http://mbe.oxfordjournals.org/content/suppl/2014/11/26/msu327.DC1/FigureS1_TreeWithSampleNames.pdf

Michał
01-03-2015, 08:48 PM
As far as they are concerned, they can't distinguish among P-P27, P-P295, P-M45 and the other mutations in the 12+ ky string. So how many of these mutations are they including at the P level? They have 64 mutations on the P (M9) branch, with 44 in the fan section. I am assuming the fan section is mainly R and Q samples. Was, say, P1-P27 included among the 44? Was P-P295?
This knowledge would be needed to get the correct ratio:
44/64=0.6875
0.6875x54=37.125

Since nothing indicates that any P(xQR) sample was analyzed in their study, it is quite obvious that their haplogroup P group should be considered as positive for all known SNPs from the P/P1 levels (even if most of them were not tested), and thus the ratio provided above does indeed correspond to the ratio between the age of QR (or P1) and the age of IJK, so the age of 37.1 ky for QR should be considered as reliable (of course, within a reasonable margin of error).


Per Karafet 2014, P295 "was previously assumed to be equivalent to 18 other mutations defining the haplogroup P is derived in a broader group of chromosomes ... In our worldwide sample of 7462 Y chromosomes, we observe the newly defined paragroup P-P295* in 83 chromosomes from Island Southeast Asia (Timor, Sumba, Sulawesi) and the Negrito Aeta population from Philippines (Table 1 and Figure 2) ...and the time between the common ancestor P-P295 and that of P-P27 to be 12.3 ky ..."

The very high distance between the age of P-P295 and P-P27 (ie. about 12.3 ky) indicates quite clearly that P-P295 (or K2b2) is only slightly younger than K2b. In fact, P295 seemed to be the only mutation from the former P level that turned out to be positioned upstream of the remaining ones (hence the distance between the K2b-P331 and K2b2-P295 was calculated as maximally 2 ky, and most likely not more than 1 ky). In other words, P-P295 (or K2b2) is maximally 5 ky younger than K (and most likely only about 2 ky younger than K), ie. about 49-51 ky old, while P-P27 (or P1/QR) is about 12.3 ky younger than K2b2, thus probably about 37.5 ky old, or let’s say 35-40 ky old, as suggested by all other SNP-based trees, including those from the Malta and Anzick-1 papers.

vettor
01-03-2015, 09:03 PM
In other words, P-P295 (or K2b2) is maximally 5 ky younger than K (and most likely only about 2 ky younger than K), ie. about 49-51 ky old, while P-P27 (or P1/QR) is about 12.3 ky younger than K2b2, thus probably about 37.5 ky old, or let’s say 35-40 ky old, as suggested by all other SNP-based trees, including those from the Malta and Anzick-1 papers.

So you are saying that from the creation of K to P-P295 (which you put at about 2 ky) the following were created:
G
H
I
J
LT
MPS
and X

That seems like very many... what caused this?

parasar
01-03-2015, 09:16 PM
Since nothing indicates that any P(xQR) sample was analyzed in their study, it is quite obvious that their haplogroup P group should be considered as positive for all known SNPs from the P/P1 levels (even if most of them were not tested), and thus the ratio provided above does indeed correspond to the ratio between the age of QR (or P1) and the age of IJK, so the age of 37.1 ky for QR should be considered as reliable (of course, within a reasonable margin of error).


The very high distance between the age of P-P295 and P-P27 (ie. about 12.3 ky) indicates quite clearly that P-P295 (or K2b2) is only slightly younger than K2b. In fact, P295 seemed to be the only mutation from the former P level that turned out to be positioned upstream of the remaining ones (hence the distance between the K2b-P331 and K2b2-P295 was calculated as maximally 2 ky, and most likely not more than 1 ky). In other words, P-P295 (or K2b2) is maximally 5 ky younger than K (and most likely only about 2 ky younger than K), ie. about 49-51 ky old, while P-P27 (or P1/QR) is about 12.3 ky younger than K2b2, thus probably about 37.5 ky old, or let’s say 35-40 ky old, as suggested by all other SNP-based trees, including those from the Malta and Anzick-1 papers.

Then we are in agreement, except that I'm applying a 1.1385x factor to Karafet's timeframes, so, e.g., that 12.3k becomes 14k.
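The remaining disagreement can be made concrete with a small Python sketch: subtract the P295-to-P27 gap from Michał's 49-51 ky range for P-P295, once with the gap as published (12.3 ky) and once scaled by parasar's 1.1385 factor. The function name is hypothetical, and applying the factor only to the gap (not to the Ust'-Ishim-based P-P295 age) is an assumption based on his wording.

```python
# P-P27 age = P-P295 age minus the P295-to-P27 gap (all in ky).
# Michał uses the gap as published; parasar scales Karafet's spans by 1.1385.
GAP_KY = 12.3


def p27_age(p295_age_ky: float, gap_ky: float = GAP_KY, factor: float = 1.0) -> float:
    """Age of P-P27 given an age for P-P295 and an (optionally scaled) gap."""
    return p295_age_ky - gap_ky * factor


for p295 in (49.0, 51.0):   # Michał's range for P-P295 (K2b2)
    plain = p27_age(p295)
    scaled = p27_age(p295, factor=1.1385)
    print(p295, round(plain, 1), round(scaled, 1))
```

Under these assumptions the factor moves P1/QR about 1.7 ky younger, to roughly 35-37 ky instead of 36.7-38.7 ky, so both readings stay within the 35-40 ky window.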

Michał
01-04-2015, 01:16 AM
So you are saying that from the creation of K to P-P295 (which you put at about 2 ky) the following were created:
G
H
I
J
LT
MPS
and X

That seems like very many... what caused this?

G, H, I and J do not descend from K.
However, you are indeed right that all these major subclades downstream of F (including those downstream of K) were born within a relatively short time period, probably between 55 and 50 kybp. As for the most likely reason for this sudden expansion, I can imagine many different scenarios, including extremely favorable climatic conditions, significant "technological" progress, or even some unknown beneficial consequences of interbreeding with H. neanderthalensis.

IMO, the most intriguing part of the above scenario is that it seemed to have included two different "expansion events" taking place in very remote locations. The initial "Near Eastern" expansion was supposed to take place within a short period of about 1 ky (probably between 55 and 54 kybp, or very close to this time range), producing several different sublineages of F (including F*, F1, F2, F3), plus G, H, IJ and K. This was quite quickly followed by another sudden expansion that was supposed to take place in SE Asia (Sundaland?) in a period between about 52 and 50 kybp, which resulted in producing LT (K1), NO (K2a), MP, S, P (K2b2), P1, K2c and K2d, among others. This means that between the "Near Eastern" birth of haplogroup K (or its separation from IJ) and its subsequent SE Asian expansion there was only about 2 ky (or even 1.5 ky, if assuming that the FGC-tested mutations from the K level include only 17 SNPs) during which the relatively small "tribe K" traveled a very long way (about 6,000-8,000 km) to reach a place where something caused its sudden explosion (with the descending subclades migrating in very different directions and colonizing nearly entire Eurasia, America, and even some regions in Africa).
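To put "about 6,000-8,000 km" in "about 2 ky (or even 1.5 ky)" into perspective, a Python sketch of the implied average pace follows. It assumes straight-line travel at a constant average rate, which no real migration follows, so it only indicates the order of magnitude. The function name is hypothetical.

```python
# Average speed implied by covering 6,000-8,000 km in 1.5-2 ky.
# Assumes a straight-line distance and a constant pace (a simplification).
def km_per_year(distance_km: float, years: float) -> float:
    """Average distance covered per year."""
    return distance_km / years


slowest = km_per_year(6_000, 2_000)   # shortest distance, longest time
fastest = km_per_year(8_000, 1_500)   # longest distance, shortest time
print(slowest, round(fastest, 2))
```

That is roughly 3 to 5.3 km per year on average.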

Other intriguing elements of the above scenario are the hypothetical interactions between the different haplogroups descending from haplogroup F and their very distant relatives from haplogroups C and D (including the hypothetical "Negritos" mentioned above by parasar), especially in places like Europe (where haplogroups C and I seem to have coexisted for many millennia) or India (where one should expect to find a much higher frequency of D, a haplogroup common among the Andaman Islanders, Tibetans and Japanese (Ainu)). This is also related to a hypothetical correlation between the major non-African Y-DNA haplogroups (like C, D and F) and their potential mitochondrial counterparts (like N, M and R), but I guess this is not the proper thread for discussing all these issues.

MJost
01-26-2015, 05:31 PM
I pulled the number of SNPs for each DF13+ kit out of AlexW's BigY L21 Tree spreadsheet. Using an Excel counting formula required changing the blank colored cells to a value. I just did a quick and dirty cell change, so it may not be 100% correct, but it will be close. :) This histogram of the distribution is based on 741 DF13 kits, and it also contains some FGC genome kits in the mix, but you can see the nice bell curve for the BigY section of the chart.

SNPplot-DF13-SNPCount
https://drive.google.com/file/d/0By9Y3jb2fORNVkNnNWN2dXExR28/view?usp=sharing
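MJost built this histogram in Excel. A rough Python equivalent might look like the sketch below. The spreadsheet layout (one row of SNP cells per kit), the kit names and the bin width are all hypothetical stand-ins, not AlexW's actual data. Blank cells are simply skipped instead of being overwritten with a value.

```python
from collections import Counter


def snps_per_kit(rows: dict[str, list]) -> dict[str, int]:
    """Count non-blank cells per kit; blank cells are represented as None or ''."""
    return {kit: sum(1 for cell in cells if cell not in (None, ""))
            for kit, cells in rows.items()}


def histogram(counts, bin_width: int = 5) -> dict[int, int]:
    """Number of kits per bin; each key is the lower edge of its bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return dict(sorted(bins.items()))


# Hypothetical spreadsheet rows (kit -> SNP cells below DF13); not real data.
rows = {
    "kit1": ["S1", "S2", None, "S3"],
    "kit2": ["S1", "", "S4"],
    "kit3": ["S1", "S2", "S5", "S6", "S7", "S8"],
}
counts = snps_per_kit(rows)
print(counts)
print(histogram(counts.values(), bin_width=2))
```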

MJost

vettor
01-26-2015, 06:18 PM
G, H, I and J do not descend from K.
However, you are indeed right that all these major subclades downstream of F (including those downstream of K) were born within a relatively short time period, probably between 55 and 50 kybp. As for the most likely reason for this sudden expansion, I can imagine many different scenarios, including extremely favorable climatic conditions, significant "technological" progress, or even some unknown beneficial consequences of interbreeding with H. neanderthalensis.

IMO, the most intriguing part of the above scenario is that it seemed to have included two different "expansion events" taking place in very remote locations. The initial "Near Eastern" expansion was supposed to take place within a short period of about 1 ky (probably between 55 and 54 kybp, or very close to this time range), producing several different sublineages of F (including F*, F1, F2, F3), plus G, H, IJ and K. This was quite quickly followed by another sudden expansion that was supposed to take place in SE Asia (Sundaland?) in a period between about 52 and 50 kybp, which resulted in producing LT (K1), NO (K2a), MP, S, P (K2b2), P1, K2c and K2d, among others. This means that between the "Near Eastern" birth of haplogroup K (or its separation from IJ) and its subsequent SE Asian expansion there was only about 2 ky (or even 1.5 ky, if assuming that the FGC-tested mutations from the K level include only 17 SNPs) during which the relatively small "tribe K" traveled a very long way (about 6,000-8,000 km) to reach a place where something caused its sudden explosion (with the descending subclades migrating in very different directions and colonizing nearly entire Eurasia, America, and even some regions in Africa).

Other intriguing elements of the above scenario are the hypothetical interactions between the different haplogroups descending from haplogroup F and their very distant relatives from haplogroups C and D (including the hypothetical "Negritos" mentioned above by parasar), especially in places like Europe (where haplogroups C and I seem to have coexisted for many millennia) or India (where one should expect to find a much higher frequency of D, a haplogroup common among the Andaman Islanders, Tibetans and Japanese (Ainu)). This is also related to a hypothetical correlation between the major non-African Y-DNA haplogroups (like C, D and F) and their potential mitochondrial counterparts (like N, M and R), but I guess this is not the proper thread for discussing all these issues.

Yes, you are correct, they came from F, but you are in error when you state that K1 formed in Sundaland, as the Karafet 2014 paper states otherwise. IIRC, K1 and K2a(xNO) formed around the India and Burma zone... it might even be Bhutan!

J1 DYS388=13
01-26-2015, 08:16 PM
I pulled the number of SNPs for each DF13+ kit out of AlexW's BigY L21 Tree spreadsheet. Using an Excel counting formula required changing the blank colored cells to a value. I just did a quick and dirty cell change, so it may not be 100% correct, but it will be close. :) This histogram of the distribution is based on 741 DF13 kits, and it also contains some FGC genome kits in the mix, but you can see the nice bell curve for the BigY section of the chart.

SNPplot-DF13-SNPCount
https://drive.google.com/file/d/0By9Y3jb2fORNVkNnNWN2dXExR28/view?usp=sharing

MJost

I hope this is not too elementary a question. In the time since DF13 occurred, one lineage led to a case with 95 further SNPs, and another lineage led to a case with only 20 further SNPs. What caused that vast difference?

George Chandler
01-26-2015, 08:17 PM
I pulled the number of SNPs for each DF13+ kit out of AlexW's BigY L21 Tree spreadsheet. Using an Excel counting formula required changing the blank colored cells to a value. I just did a quick and dirty cell change, so it may not be 100% correct, but it will be close. :) This histogram of the distribution is based on 741 DF13 kits, and it also contains some FGC genome kits in the mix, but you can see the nice bell curve for the BigY section of the chart.

SNPplot-DF13-SNPCount
https://drive.google.com/file/d/0By9Y3jb2fORNVkNnNWN2dXExR28/view?usp=sharing

MJost

I don't think it's a good idea to mix the different testing results, as the coverage of Big Y and FGC testing is different. That being said, I know I've received Big Y results with more reliable SNPs than FGC, but that's just the luck of the draw that the mutations happened in areas that were picked up. I just think the two different sequencing levels should be separated when trying to determine the age or average number of SNPs. Were these all obtained from BAM files, or were some just from CSV files? Are they only reliable SNPs, or all discovered SNPs below DF13? Have they been Sanger tested?
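George's suggestion to keep the two sequencing levels apart amounts to averaging per test type instead of over the pooled kits. A minimal Python sketch, with hypothetical (test type, SNP count) records standing in for real kit data:

```python
from statistics import mean


def mean_by_test(records):
    """Average SNP count per test type, so Big Y and FGC kits are not pooled."""
    groups: dict[str, list[int]] = {}
    for test_type, n_snps in records:
        groups.setdefault(test_type, []).append(n_snps)
    return {test_type: mean(values) for test_type, values in groups.items()}


# Hypothetical (test type, SNPs below DF13) records; not real kit data.
kits = [("BigY", 24), ("BigY", 28), ("BigY", 26), ("FGC", 60), ("FGC", 70)]
print(mean_by_test(kits))
print(mean(n for _, n in kits))   # pooled mean, which mixes the two coverages
```

With made-up numbers like these, the pooled mean matches neither group, which is the distortion George is warning about.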

George

MJost
01-26-2015, 09:05 PM
I hope this is not too elementary a question. In the time since DF13 occurred, one lineage led to a case with 95 further SNPs, and another lineage led to a case with only 20 further SNPs. What caused that vast difference?

As I mentioned in my post, the kits included both Big Y and Full Genomes tests, and the latter have considerably more coverage. The bell curve is where the Big Y kits fall. I haven't really looked, but I assume the right side of the graph is mostly the Full Genomes kits.

There will also be some difference in the SNP counts because Big Y coverage is not standardized, for whatever reason.

MJost

MJost
02-13-2015, 05:34 AM
I have recalibrated my list of SNPs in the YFull R tree, using the Mal'ta boy's (MA1) calibrated age of 24 kya as the reference. I also took T. Karafet et al.'s estimate of 18,500 years (12,500 - 25,700) for the age of R1, the parent of R1b, and recalibrated it to find R's age, which comes out at 23 kya.



YBP     BCE     SUM   SNPs in block   Chr Y HG
24000   22000    23   23              R
20000   18000    27    4              R.1-Y482
19304   17304    71   44              R1
11652    9652    72    1              R1b-M343/PF6242
11478    9478    74    2              R1b1.1 M415/PF6251 * L278
11130    9130    75    1              R1b1.2 L389/PF6531
10957    8957    77    2              R1b1a P297/PF6398 * L320/PF6092
10609    8609    84    7              R1b1a2 - M269
 9391    7391    87    3              R1b1a2a - L23
 8870    6870    91    4              R1b1a2a1 - L51
 8174    6174    99    8              R1b1a2a1a - L11
 6783    4783   101    2              R1b1a2a1a2 - P312
 6435    4435   107    6              R1b1a2a1a2c - L21
 5391    3391   109    2              DF13
 5043    3043   138   29              FGC5494
    0   -2000   138    0              Present 1950

Num SNPs: 138
YPSNP (years per SNP): 174
MA1: 24000




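The arithmetic behind this table can be reproduced with a short script. This is a sketch inferred from the numbers themselves (the formula is not spelled out in the post): the years-per-SNP rate is the MA1 age divided by the 138 SNPs on the FGC5494 line, and each row's YBP is the number of SNPs from the top of that row's block down to the present, times that rate. Node labels are shortened and the function name is illustrative.

```python
TOTAL_SNPS = 138         # SNPs from the top of R down to the present on this line
CALIBRATION_YBP = 24000  # MA1-based age assigned to the top of R

# (node, SNPs in block), root first, following the table's "SNPs in Block" column
BLOCKS = [
    ("R", 23), ("R.1-Y482", 4), ("R1", 44), ("M343", 1), ("M415", 2),
    ("L389", 1), ("P297", 2), ("M269", 7), ("L23", 3), ("L51", 4),
    ("L11", 8), ("P312", 2), ("L21", 6), ("DF13", 2), ("FGC5494", 29),
]


def node_ages(blocks, total_snps, calibration_ybp):
    """Date the top of each SNP block by counting SNPs down to the present."""
    years_per_snp = calibration_ybp / total_snps
    ages = {}
    snps_above = 0
    for node, snps_in_block in blocks:
        ages[node] = round((total_snps - snps_above) * years_per_snp)
        snps_above += snps_in_block
    if snps_above != total_snps:
        raise ValueError("block sizes must sum to the total SNP count")
    return years_per_snp, ages


if __name__ == "__main__":
    rate, ages = node_ages(BLOCKS, TOTAL_SNPS, CALIBRATION_YBP)
    print(f"{rate:.1f} years per SNP")
    for node, ybp in ages.items():
        print(f"{node:10s} {ybp:6d} ybp")
```

Note that every date scales linearly with the single calibration point, so any error in the MA1 anchor propagates proportionally to every node.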

MJost

MJost
02-13-2015, 06:03 AM
Here are the estimated ages using Karafet's R1 date as the calibration. Adjusting Karafet's estimate to R gives 23 kya, versus 24 kya using MA1.



YBP     BCE     SUM   SNPs in block   Chr Y HG
23000   21000    23   23              R
19167   17167    27    4              R.1-Y482
18500   16500    71   44              R1
11167    9167    72    1              R1b-M343/PF6242
11000    9000    74    2              R1b1.1 M415/PF6251 * L278
10667    8667    75    1              R1b1.2 L389/PF6531
10500    8500    77    2              R1b1a P297/PF6398 * L320/PF6092
10167    8167    84    7              R1b1a2 - M269
 9000    7000    87    3              R1b1a2a - L23
 8500    6500    91    4              R1b1a2a1 - L51
 7833    5833    99    8              R1b1a2a1a - L11
 6500    4500   101    2              R1b1a2a1a2 - P312
 6167    4167   107    6              R1b1a2a1a2c - L21
 5167    3167   109    2              DF13
 4833    2833   138   29              FGC5494
    0   -2000   138    0              Present 1950

Num R1 SNPs: 111
Num R SNPs: 138
YPSNP (years per SNP): 167
Karafet R1: 18500
Adj Karafet to R: 23000






T. Karafet et al. estimated the age of R1, the parent of R1b, as 18,500 (12,500 - 25,700)



http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2336805/
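The adjustment from Karafet's R1 date to R amounts to a simple proportion: divide the R1 age by the 111 SNPs below R1 to get a years-per-SNP rate, then multiply by the 138 SNPs below R. A minimal sketch, assuming a constant SNP rate along this one lineage as the table does; the function name is illustrative.

```python
def rescale_age(known_age, snps_below_known, snps_below_target):
    """Turn one dated node into a years-per-SNP rate and date another node.

    Assumes a constant SNP rate along a single lineage.
    """
    years_per_snp = known_age / snps_below_known
    return years_per_snp, years_per_snp * snps_below_target


if __name__ == "__main__":
    rate, r_age = rescale_age(18500, 111, 138)
    print(f"{rate:.1f} years/SNP, R about {r_age:.0f} ybp")
    # The same proportion applied to both ends of Karafet's interval for R1
    low = rescale_age(12500, 111, 138)[1]
    high = rescale_age(25700, 111, 138)[1]
    print(f"R interval about {low:.0f} - {high:.0f} ybp")
```

Carried through the same proportion, Karafet's 12,500 - 25,700 interval for R1 becomes roughly 15,500 - 32,000 years for R, so the single 23 kya figure hides a wide range.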




MJost

Megalophias
02-13-2015, 06:35 AM
Very interesting.

And totally incompatible with the results I got SNP counting from Hallast et al. I wonder why? Just the huge error bars inherent in SNP counting, or is my calibration totally off?

For comparison, I would make R ~31,000 YBP, R1 ~23,000 YBP, R1b1-L278 22,000 YBP, R1b1a2a-L23 and R1b1a2a1-L51 6600 YBP (not distinguished, so L51 is probably less than 800 years later), R1b1a2a1a-L11 4700 YBP, and R1b1a2a1a2-P312 4000-4500 YBP.

So the older nodes much older, but the younger nodes quite a bit younger.

Heber
02-13-2015, 03:57 PM
Interesting announcement (below) by YFull. I look forward to the article which explains their SNP counting algorithm.
I have analysed my Big Y and Y Elite with YFull and am awaiting results of another distant cousin who tested Big Y.
This should be a good benchmark and an apples-to-apples comparison.
YFull have a good database of R1b and R1a so this could be a good opportunity to estimate the age of both.
They also have an impressive phylogenetic tree which appears to be computer generated.
Here is what the results look like.
https://www.pinterest.com/gerardcorcoran/yfull/
and here is what FGC looks like:
https://www.pinterest.com/gerardcorcoran/full-genome-corporation/
Here is my analysis of the expansion of R1b and P312 and L21:
https://www.pinterest.com/gerardcorcoran/r1b/
https://www.pinterest.com/gerardcorcoran/r1b-p312/
https://www.pinterest.com/gerardcorcoran/r1b-l21/

Here is the paper by Hallast:
http://mbe.oxfordjournals.org/content/early/2014/12/13/molbev.msu327.full
Although Hallast clearly shows P312 expanding in the Atlantic Zone, it will be interesting to see what ancient DNA tells us when it becomes available. I am also interested in the stelae found in Samara and those of the Atlantic Zone.
https://www.pinterest.com/gerardcorcoran/the-stelae-people/

"In the version 3.4 of YFull Y-Tree
http://www.yfull.com/tree/R1b1a2a1a2/
we plan to show an estimated age for all subclades with at least one Big Y or Y Elite in our database. We will explain the algorithm for estimating age by SNP count later, in an article written by Dmitriy Adamov, Vladimir Gurianov, Sergey Karzhavin, Vladimir Tagankin, and Vadim Urasin. We have checked our estimates against the known common ancestors of our clients. For 12 of 14 subclades, the estimated age is inside the 95% confidence interval, but the estimated ages of I-YP1012 and I-A379 are not. In the chart you can see all 14 subclades with known ancestors. For all subclades in the Y-Tree, the confidence interval depends on the number of samples. For subclades with 1 sample, the bounds of the 95% confidence interval of the age estimate are -48.8% +61.6% on average; for subclades with 2 samples, -43.3% +50.1%; for 3 samples, -34.6% +37.3%; for 4 samples, -28.9% +32.2%. More samples, better estimation."
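Those average bounds can be turned into an approximate interval for any point estimate. A minimal sketch, assuming the quoted percentages apply multiplicatively to the point estimate; the dictionary and function names are illustrative, and YFull's own per-subclade intervals (shown on the tree) come from their own calculation and will differ.

```python
# Average 95% CI bounds quoted by YFull, as fractions of the point estimate,
# keyed by the number of samples in the subclade.
YFULL_AVG_CI_BOUNDS = {
    1: (-0.488, 0.616),
    2: (-0.433, 0.501),
    3: (-0.346, 0.373),
    4: (-0.289, 0.322),
}


def approx_ci(estimate_ybp, n_samples):
    """Apply the quoted average bounds to a point estimate (1 to 4 samples only)."""
    if n_samples not in YFULL_AVG_CI_BOUNDS:
        raise ValueError("bounds were only quoted for 1 to 4 samples")
    lower, upper = YFULL_AVG_CI_BOUNDS[n_samples]
    return estimate_ybp * (1 + lower), estimate_ybp * (1 + upper)


if __name__ == "__main__":
    for n in range(1, 5):
        low, high = approx_ci(5000, n)
        print(f"{n} sample(s): {low:.0f} - {high:.0f} ybp")
```

With a hypothetical 5000 ybp estimate, one sample gives roughly 2560 to 8080 ybp, while four samples narrow it to roughly 3555 to 6610 ybp.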

MitchellSince1893
02-14-2015, 05:06 AM
FWIW,

A member PMed me about the number of SNPs between P312 and the present. I went and looked at Richard Rocca's U152 tree (U152 is one SNP below P312) and began counting SNPs on various "bushy" branches. The high outlier was 110 SNPs (maybe this was an FGC test; I'm not sure why it's so high compared to the others). My own line currently has 24 known SNPs + 22 private SNPs = 46 SNPs.

I then looked at the 5 clusters on the tree that share common surnames, i.e. Barry, Bolgeri, Duncan, Graves, and McCarthy. Here I found more consistency in the counts. For argument's sake, I'm going to assume the common ancestor in each cluster lived around 1600 AD and that there have been an average of 3 SNPs from that ancestor to the present.

Total SNPs in line including U152:
Barry cluster: 34 + 3 SNPs from 1600 to present = 37
Bolgeri cluster: 28 + 3 = 31
Duncan cluster: 31 + 3 = 34
Graves cluster: 31 + 3 = 34
McCarthy cluster: 31 + 3 = 34

So on the U152 tree it appears there are around 34 SNPs (the median) from P312 to the present.
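The cluster arithmetic can be checked in a few lines. A sketch using the counts listed above and the stated assumption of about 3 SNPs since a common ancestor around 1600 AD; the variable names are illustrative.

```python
from statistics import median

# SNPs below P312 (including U152) for each surname cluster, from the list above.
CLUSTER_SNPS = {"Barry": 34, "Bolgeri": 28, "Duncan": 31, "Graves": 31, "McCarthy": 31}
# Assumption from the post: about 3 SNPs from a ~1600 AD common ancestor to today.
ASSUMED_RECENT_SNPS = 3

totals = {name: n + ASSUMED_RECENT_SNPS for name, n in CLUSTER_SNPS.items()}
median_total = median(totals.values())

if __name__ == "__main__":
    print(totals)
    print("median:", median_total)
```

The median of the five totals is 34; using the median rather than the mean keeps a single unusual cluster from shifting the figure much.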
The question then is how many years per SNP to multiply by to get the age of P312.

I've seen anywhere from 120 to 180 years per SNP, depending on the source:
120 x 34 = 4080 years before present (ybp)
180 x 34 = 6120 ybp

VinceT said in an earlier post in this thread:
BigY's coverage implies a rate closer to 137 years per SNP identified among the ~10 million bases sequenced by that test, according to several reports

That would be 34 x 137 = 4658 ybp for P312. That seems in the ballpark to me, based on what I believe Alan and Richard Rocca have said about Bell Beaker's arrival in Central Europe... correct me if I'm wrong.
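The spread in the answer comes entirely from the assumed rate, as a short sketch shows. It assumes a single fixed years-per-SNP figure and ignores the differing coverage of Big Y and FGC tests discussed earlier in the thread; the function name is illustrative.

```python
SNPS_BELOW_P312 = 34  # median count from the U152 surname clusters


def snp_age_ybp(snp_count, years_per_snp):
    """Naive SNP-counting age: SNPs on one line times an assumed rate."""
    return snp_count * years_per_snp


if __name__ == "__main__":
    # 120 and 180 bracket the rates quoted in the post; 137 is VinceT's Big Y figure.
    for rate in (120, 137, 180):
        print(f"{rate} years/SNP -> {snp_age_ybp(SNPS_BELOW_P312, rate)} ybp")
```

Because the age is just count times rate, a 50% change in the assumed rate moves the P312 date by 50%, which is why the choice of rate matters more than small differences in the SNP count.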

It's also old enough not to conflict with the P312 sample mentioned in the David Reich et al. paper:
I0806 Bell_Beaker_LN Bell Beaker LN Quedlinburg VII 2, Germany; QLB28b, feature 19617 2296-2206 cal BCE

And yes I do realize there are plenty of assumptions when coming up with these numbers.

FWIW

MitchellSince1893
02-15-2015, 01:17 AM
I heard about this in another thread but just saw it myself. The YFull tree now shows "formed" and TMRCA dates:

L150/PF6274/L150.1/PF6274.1/L150.2/PF6274.2 * L23/S141/PF6534 * L49.1/L49.2/PF6276/S349: formed 12200 ybp, TMRCA 6400 ybp
L51/M412/S167/PF6536 * PF6414 * CTS10373/PF6537: formed 6400 ybp, TMRCA 6000 ybp
L151/PF6542 * P310/S129/PF6546 * YSC0000191/PF6543/S1159: formed 6000 ybp, TMRCA 6000 ybp
P312: formed 6000 ybp, TMRCA 5800 ybp

MJost
02-15-2015, 04:33 AM
If MA1 really is 24 kya by calibrated maximum, then how is YFull's age for R so high (formed 31600 ybp, TMRCA 27800 ybp)? They show R-M222 as formed 4700 ybp, TMRCA 1700 ybp.

MJost

MitchellSince1893
02-15-2015, 05:20 AM
If MA1 really is 24 kya by calibrated maximum, then how is YFull's age for R so high (formed 31600 ybp, TMRCA 27800 ybp)? They show R-M222 as formed 4700 ybp, TMRCA 1700 ybp.

MJost


Here is a quote about this new feature from YFull's Facebook page:


In version 3.4 of the Y-Tree we plan to show an estimated age for all subclades with at least one Big Y or Y Elite. We will explain the algorithm for estimating age by SNP count later, in an article written by Adamov, Vladimir Gurianov, Sergey Karzhavin, Владимир Семаргл, and Vadim Urasin. We have checked our estimates against the known common ancestors of our clients. For 12 of 14 subclades, the estimated age is inside the 95% confidence interval, but the estimated ages of I-YP1012 and I-A379 are not. In the chart you can see all 14 subclades with known ancestors. For all subclades in the Y-Tree, the confidence interval depends on the number of samples. For subclades with 1 sample, the bounds of the 95% confidence interval of the age estimate are -48.8% +61.6% on average; for subclades with 2 samples, -43.3% +50.1%; for 3 samples, -34.6% +37.3%; for 4 samples, -28.9% +32.2%. More samples, better estimation.

https://scontent-atl.xx.fbcdn.net/hphotos-xfp1/v/t1.0-9/10996793_658259384296381_7234424791344635341_n.jpg?oh=4d88bdab588c62abb70e374c231f795c&oe=55535BD3

FYI: if you hover your mouse over the formed and TMRCA dates on the YFull tree, you will see the confidence intervals.

For haplogroup R, their 95% confidence interval for the formed age is 34100 to 29500 ybp, and for the TMRCA it is 30100 to 25500 ybp.

There are a couple of posters on the Facebook page who know their TMRCA. Here are their comments about YFull's age estimates:


FYI. We have high confidence that the founder of S781 was Sir John Stewart of Bonkyl, born about 1245. It will be interesting to see how close your model comes for S781 for which you have 10 samples in the database...YFull team. I will give you an A- for S781 as you are within 10%.


Excellent work. For the four of us who are Q-Y4929 we know our most recent common ancestor was Joseph VICK who was born around 1640-1650. You estimate 425 ybp, so that seems to me to be quite good.

For my own line they have:

P312: formed 6000 ybp, TMRCA 5800 ybp
-U152: formed 5800 ybp, TMRCA 5300 ybp
--L2: formed 5300 ybp, TMRCA 5300 ybp
---Z49: formed 5300 ybp, TMRCA 4500 ybp
----Z142: formed 4500 ybp, TMRCA 4400 ybp
-----Z150: formed 4400 ybp, TMRCA 4200 ybp
------R-Y3140, Y3144 * Y3140 * Y3141 * Y3142 * Y3143 * Y10987: formed 4200 ybp, TMRCA 3700 ybp (Abraham Lincoln's current terminal branch)
-------R-Y9080 * Y10984 * Y10985 *Y10986 * Y4272: formed 3700 ybp, TMRCA 3000 ybp (my current terminal SNP shared with YF02596 aka FTDNA-268283)

jbarry6899
02-15-2015, 01:32 PM
The TMRCA for Z49>S8183>Y4356>Y11178 of 750 ybp seems reasonable given the family history. We have another test in process that defines a more recent subclade. (I haven't been able to hover and see the confidence interval; it may be a Mac browser limitation.)

MitchellSince1893
02-15-2015, 02:36 PM
The TMRCA for Z49>S8183>Y4356>Y11178 of 750 ybp seems reasonable given the family history. We have another test in process that defines a more recent subclade. (I haven't been able to hover and see the confidence interval; it may be a Mac browser limitation.)

For Y11178, the 95% CI for the formed date is 5700 to 3500 ybp, and for the TMRCA it is 1300 to 300 ybp.

I'm looking forward to when my branch gets so close to modern times...will help to solve my paternal line mystery. At least I've narrowed the time to between 1000 BC and 1893 AD. A few months ago it was 2400 BC to 1893 AD.

jbarry6899
02-15-2015, 02:42 PM
For Y11178, the 95% CI for the formed date is 5700 to 3500 ybp, and for the TMRCA it is 1300 to 300 ybp.

I'm looking forward to when my branch gets so close to modern times...will help to solve my paternal line mystery.

Thanks; I was able to find the CI with Firefox, but not with Safari or Chrome. The two of us listed by YFull in Y11178 share a surname and general family location. The third kit is in process at YFull, but Rich has analyzed the BAM and we differ from him by only one identified SNP. He also shares the surname but has a modal value of DYS388=12, while the other man and I have 11. STR matching is generally consistent with what YFull estimates for the TMRCA.

Heber
02-17-2015, 09:29 PM
I took the YFull TMRCA and formed estimates from the YFull tree and mapped them against my terminal SNP, R1b-L21-S5456, and its defining mutations back to R1b, and against some of the Reich samples.
They seem to be on the high side.
https://www.pinterest.com/gerardcorcoran/r1b/

Heber
03-11-2015, 01:10 AM
DNA mutation clock proves tough to set

"Last year, population geneticist David Reich of Harvard Medical School in Boston, Massachusetts, and his colleagues compared the genome of a 45,000-year-old human from Siberia with genomes of modern humans and came up with the lower mutation rate[2]. Yet just before the Leipzig meeting, which Reich co-organized with Kay Prüfer of the Max Planck Institute for Evolutionary Anthropology, his team published a preprint article[3] that calculated an intermediate mutation rate by looking at differences between paired stretches of chromosomes in modern individuals (which, like two separate individuals' DNA, must ultimately trace back to a common ancestor). Reich is at a loss to explain the discrepancy. 'The fact that the clock is so uncertain is very problematic for us,' he says. 'It means that the dates we get out of genetics are really quite embarrassingly bad and uncertain.'"

http://www.nature.com/news/dna-mutation-clock-proves-tough-to-set-1.17079

lgmayka
03-11-2015, 02:03 AM
Yet just before the Leipzig meeting, which Reich co-organized with Kay Prüfer of the Max Planck Institute for Evolutionary Anthropology, his team published a preprint article[3] that calculated an intermediate mutation rate by looking at differences between paired stretches of chromosomes in modern individuals (which, like two separate individuals’ DNA, must ultimately trace back to a common ancestor).
Perhaps the clock is fine, but the assumptions are wrong. Any introgression from an archaic human into the autosome increases the number of differences. If one foolishly ignores such introgression and instead assumes that all differences are merely random mutations, the mutation rate will appear to be higher than it really is.
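lgmayka's point about introgression can be made concrete with a toy calculation. This is a sketch under simplified, hypothetical assumptions: two lineages separated for a known number of generations, a constant per-site rate, and some extra differences that did not arise as new mutations (for example, introgressed segments). The numbers are illustrative, not estimates from any paper, and the function name is made up.

```python
def apparent_rate(true_rate, generations, sites, extra_differences):
    """Per-site, per-generation rate inferred when every difference is treated
    as a new mutation.

    Two lineages that split `generations` ago each accumulate mutations, so the
    expected number of new-mutation differences is 2 * rate * generations * sites.
    """
    mutation_differences = 2 * true_rate * generations * sites
    observed = mutation_differences + extra_differences
    return observed / (2 * generations * sites)


if __name__ == "__main__":
    true_rate = 1e-8  # hypothetical
    for extra in (0, 10, 20):
        rate = apparent_rate(true_rate, generations=1000, sites=1_000_000,
                             extra_differences=extra)
        print(f"{extra:>2} extra differences -> apparent rate {rate:.2e}")
```

Any extra differences push the inferred rate above the true one, and a rate that is too high in turn makes dates computed from it come out too young.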

Also, natural or social selection can appear to either increase or decrease the mutation rate, depending on whether one compares selected vs. selected or selected vs. unselected.

rms2
04-08-2015, 12:11 PM
Can someone post a link to the latest YFull estimates for the major branches of R1b, beginning from M343 onward? In this case, I'm really not interested in anyone's own sparsely populated terminal SNP (including my own), just the bigger branches.

Thanks!

MJost
04-08-2015, 12:26 PM
Is this what you're looking for?

YFull Experimental YTree v3.7

http://www.yfull.com/tree/R1b/

MJost

rms2
04-09-2015, 11:44 AM
Is this what you're looking for?

YFull Experimental YTree v3.7

http://www.yfull.com/tree/R1b/

MJost

Thanks. I actually found it within a few minutes of making that earlier post, but I didn't have the time to go back and say never mind.

But thanks again anyway. Very useful stuff. Beats the old "cave man" dates that were current when I first got into genetic genealogy.

R.Rocca
04-28-2015, 02:38 PM
Has anyone tested the validity of the various SNP counting methods against the Hinxton 4 sample? It would seem like we know enough about the L21 tree to at least give it a shot...

Iron Age Briton (aka Hinxton 4)
Approximate age by authors: 2500-1800 years
Positive as per current tree: L21+L459+Z245+Z260+Z290+ > DF13+Z2542+ > DF21+ > FGC3903+ > Z246+ > DF25+
L21 Tree: https://dnaexplained.files.wordpress.com/2014/08/haplogroup-proj-17.png

MitchellSince1893
04-28-2015, 03:09 PM
On the YFull tree:

DF25/S253 formed 3800 ybp, TMRCA 3700 ybp
http://www.yfull.com/tree/R-DF25/

Muircheartaigh
04-28-2015, 05:02 PM
Has anyone tested the validity of the various SNP counting methods against the Hinxton 4 sample? It would seem like we know enough about the L21 tree to at least give it a shot...

Iron Age Briton (aka Hinxton 4)
Approximate age by authors: 2500-1800 years
Positive as per current tree: L21+L459+Z245+Z260+Z290+ > DF13+Z2542+ > DF21+ > FGC3903+ > Z246+ > DF25+
L21 Tree: https://dnaexplained.files.wordpress.com/2014/08/haplogroup-proj-17.png

Unless they have carried out NGS testing that they didn't report, we can't use Hinxton 4 to determine the ages of the L21 tree. We can only say that DF25 is older than Hinxton 4 and that DF25 is his terminal SNP based on targeted testing of known present-day SNPs. Without NGS testing, we don't know how many downstream private or extinct SNPs Hinxton 4 had.
NB. If lack of finance was an obstacle to NGS testing, I'd be happy to contribute towards a Full Genomes test.

R.Rocca
04-28-2015, 05:34 PM
Unless they have carried out NGS testing that they didn't report, we can't use Hinxton 4 to determine the ages of the L21 tree. We can only say that DF25 is older than Hinxton 4 and that DF25 is his terminal SNP based on targeted testing of known present-day SNPs. Without NGS testing, we don't know how many downstream private or extinct SNPs Hinxton 4 had.
NB. If lack of finance was an obstacle to NGS testing, I'd be happy to contribute towards a Full Genomes test.

We can at least validate SNP counting methods that are wrong if, based on the L21 tree, a method shows the DF25 branch as younger or substantially younger than the confirmed age of Hinxton 4 (2500-1800 years).
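R.Rocca's test can be written as a one-line check. A sketch, assuming the comparison is made against a method's "formed" age for DF25 (the SNP must have arisen before Hinxton 4 lived) and against the young end of the authors' 2500-1800 year range; the names are illustrative. Passing only means a method is not ruled out, since Hinxton 4 sets a lower bound on DF25's age and says nothing about the upper end.

```python
HINXTON4_AGE_YBP = (1800, 2500)  # approximate age range given by the authors


def rules_out_method(df25_formed_ybp, sample_age=HINXTON4_AGE_YBP):
    """True if a method dates DF25 younger than Hinxton 4 could have been.

    Hinxton 4 is DF25+, so the DF25 mutation must predate him; an estimate
    younger than even the youngest plausible age of the sample is impossible.
    """
    youngest_sample_age, _ = sample_age
    return df25_formed_ybp < youngest_sample_age


if __name__ == "__main__":
    # 3800 ybp is the YFull "formed" figure for DF25 quoted above;
    # 1500 ybp is a made-up estimate to show a failing case.
    for estimate in (3800, 1500):
        print(estimate, "ruled out" if rules_out_method(estimate) else "not ruled out")
```

The check uses the formed age rather than the TMRCA of present-day carriers, because Hinxton 4's own line may have left no living descendants, so today's DF25 TMRCA could in principle postdate him.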

razyn
04-28-2015, 07:24 PM
We can at least validate SNP counting methods that are wrong

Wouldn't that actually be invalidating them? I agree with your suggestion about the value of aDNA, even if it's only been tested on yesterday's chip -- but maybe not with the way this is phrased.

Muircheartaigh
04-28-2015, 10:09 PM
We can at least validate SNP counting methods that are wrong if, based on the L21 tree, a method shows the DF25 branch as younger or substantially younger than the confirmed age of Hinxton 4 (2500-1800 years).

An age of 1800-2500 years for DF25 doesn't correspond to any of the SNP mutation rates that have been used in recent scientific papers, or to the YFull calculation method. They all suggest a considerably greater age for DF25, implying that Hinxton 4 had a number of SNPs downstream of DF25 that are not present in today's tested population, which is, after all, what would be expected.

R.Rocca
04-28-2015, 11:45 PM
Wouldn't that actually be invalidating them? I agree with your suggestion about the value of aDNA, even if it's only been tested on yesterday's chip -- but maybe not with the way this is phrased.

Yes...and the word invalid applies to my brain function when I wrote that sentence. :)