Can Last Glacial Maximum evidence help us estimate SNP ages?



epp
07-30-2016, 02:03 PM
yfull estimates of haplogroup ages show a fallow period (between 39,000 and 30,000 BC) when virtually no new surviving y-dna lineages arose, and none of those that did arise seem to be of European origin. This would appear to indicate a human (and especially European) population struggling to survive and on the brink of extinction. Both before and after this period, yfull estimates that new haplogroups arose in substantial numbers.

The earth (and in particular Europe) is thought to have been at its least hospitable at the time of the Last Glacial Maximum, which is estimated to have occurred between 23,000 and 11,000 BC. This would seem to be the period in which SNP bottlenecks would be most likely to occur and fewer new lineages would be expected to arise.

So does this suggest that yfull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs, and that SNPs are generally much younger than they are currently believed to be? Or might there be another reason why a pre-LGM environment might have been significantly less capable of allowing a diverse human gene pool to thrive than the LGM itself?

ArmandoR1b
07-30-2016, 09:27 PM
A lack of new branches between 39,000 and 30,000 BC does not mean that there weren't new SNPs and therefore doesn't throw off the real ages of SNPs very much. It's the number of SNPs that are used to calculate the estimated dates of the branches and not the number of branches. Relatively speaking, the YFull date estimates and the date estimates from Poznik et al. 2016 (http://www.nature.com/ng/journal/v48/n6/full/ng.3559.html) (see the Supplementary Table 10 page 47 of the Supplementary PDF (http://www.nature.com/ng/journal/v48/n6/extref/ng.3559-S1.pdf)) aren't that far off from each other and if anything, YFull has estimated the SNPs to be somewhat younger than they really are, contradicting your suggestion that YFull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs.

miiser
07-30-2016, 10:19 PM
A lack of new branches between 39,000 and 30,000 BC does not mean that there weren't new SNPs and therefore doesn't throw off the real ages of SNPs very much. It's the number of SNPs that are used to calculate the estimated dates of the branches and not the number of branches. Relatively speaking, the YFull date estimates and the date estimates from Poznik et al. 2016 (http://www.nature.com/ng/journal/v48/n6/full/ng.3559.html) (see the Supplementary Table 10 page 47 of the Supplementary PDF (http://www.nature.com/ng/journal/v48/n6/extref/ng.3559-S1.pdf)) aren't that far off from each other and if anything, YFull has estimated the SNPs to be somewhat younger than they really are, contradicting your suggestion that YFull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs.

I think you're missing the point of Epp's comment. Epp can correct me if I'm wrong. But I don't think he's suggesting that there were no SNPs.

He's suggesting that the glacial maximum might be used as a calibration point to set the number of years per SNP. The dearth of branches pinpoints the time at which the glacial maximum occurred because of the strain it put on the population, resulting in a haplogroup bottleneck, according to the hypothesis. YFull used a similar method to establish their "years per SNP" constant - associating a known historical event at a particular time with a particular node in the tree and taking an SNP count from that node. Epp is simply suggesting that the glacial maximum might be used as a similar calibration point.

If one accepts that the empty section of the tree does represent the glacial maximum, then it's a valid approach. If this is the case, then this glacial maximum calibration contradicts the calibration currently being used by YFull. As far as I know, YFull has never validated their calibration by comparing it to a second calibration of a different event. I think it's a reasonable concern that the calibration may not be correct, or at least not applicable to all times and places.

My own opinion is that SNP rate has not been proven to be a fully random process. The SNP rate can vary significantly, beyond what's expected of a random distribution. So the SNP rate within a particular branch and particular time range may not be the same as the SNP rate of a different branch in a different time range. Different calibration standards may, in fact, give different SNP rates.

Along this same line of reasoning, though, we cannot assume that a block of phylogenetically equivalent SNPs necessarily represents a long period of time without extant branching. It may simply be a large number of SNPs that occurred within a short time. So the dearth of branches mentioned by Epp may not even represent a population bottleneck, but just a clump of SNPs occurring within a normally growing population.

At any rate, I think Epp's hypothesis is an interesting idea that bears further consideration. I would like to see YFull's calibration validated by additional calibrations to more than one event.
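The calibration idea discussed in the post above can be sketched as follows, assuming a strictly linear clock (age = SNP count × constant). The function names and both anchor values are hypothetical illustrations, not YFull data; the point is only that two dated events yield two independent constants that can be compared.

```python
def years_per_snp(anchor_age_years, snps_since_anchor):
    """Calibration constant implied by one dated node: age divided by SNP count."""
    if snps_since_anchor <= 0:
        raise ValueError("need at least one SNP below the anchor node")
    return anchor_age_years / snps_since_anchor


def calibration_ratio(rate_a, rate_b):
    """Ratio of two calibration constants; 1.0 means the two events agree."""
    return rate_a / rate_b


# Hypothetical anchors, for illustration only (not taken from any real tree).
rate_event_1 = years_per_snp(12_000, 80)    # 150.0 years per SNP
rate_event_2 = years_per_snp(30_000, 400)   # 75.0 years per SNP
ratio = calibration_ratio(rate_event_1, rate_event_2)
print(rate_event_1, rate_event_2, ratio)
```

A ratio far from 1.0 would mean the two events cannot both be dated by the same constant, which is the concern raised here.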

epp
07-30-2016, 10:41 PM
A lack of new branches between 39,000 and 30,000 BC does not mean that there weren't new SNPs
So what does the lack of new branches mean? What are the likely reasons why the new branches would tail off after 40,000 BC and then begin to arise again in significant numbers after 30,000 BC?

It's the number of SNPs that are used to calculate the estimated dates of the branches
How do we know that the number of SNPs provides an accurate estimate of the dates of the branches?

epp
07-30-2016, 10:43 PM
Thanks, miiser. You've explained my thinking better than I have.

ArmandoR1b
07-30-2016, 11:21 PM
So what does the lack of new branches mean? What are the likely reasons why the new branches would tail off after 40,000 BC and then begin to arise again in significant numbers after 30,000 BC?
It simply means that the sons of each SNP weren't successful in branching out.


How do we know that the number of SNPs provides an accurate estimate of the dates of the branches?

We have ancient DNA such as Bell Beaker Germany Osterhofen-Altenmarkt [RISE563] http://www.ancestraljourneys.org/copperbronzeagedna.shtml (coverage of 0.329 per SupplementaryDataTable2 (http://biorxiv.org/content/early/2016/06/16/059311.figures-only) of Lazaridis et al. 2016) who is positive for U152. The Bell Beaker period was c. 2800 – 1800 BC and YFull estimates U152 to have a TMRCA of 4500 ybp (2500 BC). Since the coverage was only 0.329 we don't know which other U152 SNPs he was positive for, but we do know that U152 is at least 3800 years old, which definitely means YFull's TMRCA estimates are not exaggerating (approximately doubling) the real ages of SNPs.

ArmandoR1b
07-30-2016, 11:47 PM
I think you're missing the point of Epp's comment. Epp can correct me if I'm wrong. But I don't think he's suggesting that there were no SNPs.
Actually you're missing my point. He used the term lineages for branches. There are multiple SNPs, sometimes many dozens of them in a branch, and those SNPs are used for dating.


He's suggesting that the glacial maximum might be used as a calibration point to set the number of years per SNP. The dearth of branches pinpoints the time at which the glacial maximum occurred because of the strain it put on the population, resulting in a haplogroup bottleneck, according to the hypothesis. YFull used a similar method to establish their "years per SNP" constant - associating a known historical event at a particular time with a particular node in the tree and taking an SNP count from that node. Epp is simply suggesting that the glacial maximum might be used as a similar calibration point.
You are both still missing the point. The lack of branches is irrelevant for SNP dating.


If one accepts that the empty section of the tree does represent the glacial maximum, then it's a valid approach.
No, it's not. The SNPs average out over time. More branches just mean more people.


If this is the case, then this glacial maximum calibration contradicts the calibration currently being used by YFull. As far as I know, YFull has never validated their calibration by comparing it to a second calibration of a different event. I think it's a reasonable concern that the calibration may not be correct, or at least not applicable to all times and places.
Poznik et al. used a different calibration and I provided links to that study and what to look for in that study.


My own opinion is that SNP rate has not been proven to be a fully random process. The SNP rate can vary significantly, beyond what's expected of a random distribution. So the SNP rate within a particular branch and particular time range may not be the same as the SNP rate of a different branch in a different time range. Different calibration standards may, in fact, give different SNP rates.
That will be true for short time periods but over a thousand or more years there will be an average.


Along this same line of reasoning, though, we cannot assume that a block of phylogenetically equivalent SNPs necessarily represents a long period of time without extant branching. It may simply be a large number of SNPs that occurred within a short time. So the dearth of branches mentioned by Epp may not even represent a population bottleneck, but just a clump of SNPs occurring within a normally growing population.
Some SNPs can occur simultaneously in a person or a few generations but there is no evidence that dozens have happened often enough for the date estimates to be wildly off for branches from thousands of years ago.


At any rate, I think Epp's hypothesis is an interesting idea that bears further consideration. I would like to see YFull's calibration validated by additional calibrations to more than one event.
We have the U152 example I provided in my post #6. There are other examples of ancient DNA that can be used to completely refute Epp's hypothesis that "YFull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs", such as Anzick-1, which is about 13,000 to 12,600 calendar years bp and is positive for Q-Z780, which has a TMRCA of 14200 ybp. If the YFull estimate were twice the actual age then Anzick-1 could not have been Q-Z780.
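The ancient-DNA argument in the post above reduces to simple arithmetic, sketched here. The sketch follows the thread in treating the TMRCA as the clade's age and in equating "present" with roughly the year 2000; both are simplifying assumptions, and the function names are hypothetical.

```python
def ybp_to_bc(ybp, present_year=2000):
    """Convert years before present to an approximate BC year (assumes present ~ AD 2000)."""
    return ybp - present_year


def doubling_is_possible(yfull_tmrca_ybp, sample_min_age_ybp):
    """True if halving the YFull age still leaves the clade at least as old as the sample."""
    halved_age = yfull_tmrca_ybp / 2
    return halved_age >= sample_min_age_ybp


# Figures quoted in the thread.
u152_bc = ybp_to_bc(4500)                     # 2500 BC
u152_ok = doubling_is_possible(4500, 3800)    # Bell Beaker sample, at least 3800 years old
z780_ok = doubling_is_possible(14200, 12600)  # Anzick-1, at least 12,600 years old
print(u152_bc, u152_ok, z780_ok)
```

Both checks come out False: a clade whose true age was half the YFull figure would be younger than a dated sample that already carries its defining SNP.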

MitchellSince1893
07-31-2016, 01:03 AM
A related article on climate extremes and mutation rates
https://www.sott.net/article/293696-Climate-change-may-produce-human-mutations

If accurate it implies an ice age might increase mutations.

miiser
07-31-2016, 01:09 AM
Actually you're missing my point. He used the term lineages for branches. There are multiple SNPs, sometimes many dozens of them in a branch, and those SNPs are used for dating.

You are both still missing the point. The lack of branches is irrelevant for SNP dating.

I'm not saying I believe Epp's hypothesis. But it's clear that, in your initial post, the response you made was not an answer to the hypothesis that Epp intended to propose. Whether your points are right or wrong, I think you misunderstood what he was proposing. But let's not dwell on the past...



No, it's not. The SNPs average out over time. More branches just mean more people.

Poznik et al. used a different calibration and I provided links to that study and what to look for in that study.


When I say that I'd like a validation of YFull's calibration, I mean a second calibration, using the same technique based on SNPs, for a second historical event. I've read the original paper by YFull. It includes no such validation.




That will be true for short time periods but over a thousand or more years there will be an average.

Some SNPs can occur simultaneously in a person or a few generations but there is no evidence that dozens have happened often enough for the date estimates to be wildly off for branches from thousands of years ago.


How long will it take for those differences in rate to average out? In evolutionary biology and bioinformatics, it's pretty much an accepted fact that gamma ray bursts can cause a very large number of mutations, and these events have generated discontinuities in the evolutionary mutation rate. Such events can certainly generate dozens of SNPs within an individual. Do you deny the existence of such events? If such an event occurs once every few thousand years, then a few thousand years is definitely not long enough to average out the rate.



We have the U152 example I provided in my post #6. There are other examples of ancient DNA that can be used to completely refute Epp's hypothesis that "YFull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs", such as Anzick-1, which is about 13,000 to 12,600 calendar years bp and is positive for Q-Z780, which has a TMRCA of 14200 ybp. If the YFull estimate were twice the actual age then Anzick-1 could not have been Q-Z780.

These specific examples demonstrate that the calibration is not an over estimate - for these specific cases, within this branch and within this time period. But your claim that YFull's dates may actually be an under estimate supports the idea that the calibration rate might be inaccurate, and different calibrations based on different events might give different rates. YFull being a consistent under estimate for many branches, as you claim, indicates a systematic error. A systematic error means that we are not observing just random variation, but an inaccuracy of the calibrated rate. If the mutation rate is not consistent through history, we should expect a calibration based on a single event to result in over estimates of some nodes and under estimates of other nodes. The under estimation of some nodes does not rule out the possibility of other nodes being over estimated.
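One way to put a number on "how long will it take to average out" is the purely random baseline. If SNPs arrived as a Poisson process, the relative standard deviation of a count would be 1/sqrt(expected count). The sketch below assumes a round figure of 144 years per SNP and says nothing about burst events, which are exactly what a Poisson model leaves out.

```python
import math


def relative_sd(span_years, years_per_snp=144.0):
    """Relative standard deviation of a Poisson SNP count over a time span.

    years_per_snp is an assumed rate; a pure Poisson process is assumed throughout.
    """
    expected_snps = span_years / years_per_snp
    return 1.0 / math.sqrt(expected_snps)


for span in (1_000, 5_000, 14_400, 50_000):
    print(span, round(relative_sd(span), 3))
```

Under these assumptions a single lineage still carries roughly 38% scatter over 1,000 years and about 10% over 14,400 years, so even the fully random case averages out slowly on one line of descent.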

ArmandoR1b
07-31-2016, 02:21 AM
I'm not saying I believe Epp's hypothesis. But it's clear that, in your initial post, the response you made was not an answer to the hypothesis that Epp intended to propose. Whether your points are right or wrong, I think you misunderstood what he was proposing.
His hypothesis is that the LGM needs to be used as a calibration point because he thinks YFull's estimations are off because there is a lack of branching between 39,000 and 30,000 BC and not in the LGM, which is when he expects for there to have been fewer branches (he uses the term lineages). My contention is that the branching is irrelevant because the Poznik dates are similar to YFull's and therefore the need to try and use the LGM as a calibration point is unnecessary. So my initial response was a counter to the data that he uses for his hypothesis. I'm basically saying that low branching between 39,000 and 30,000 BC and not in the LGM is a false cause to needing the LGM as a calibration point and I am also saying it is not an indication that YFull's TMRCA estimates might be exaggerating (approximately doubling) the real ages of SNPs.


When I say that I'd like a validation of YFull's calibration, I mean a second calibration, using the same technique based on SNPs, for a second historical event. I've read the original paper by YFull. It includes no such validation.
They calibrated to both Ust'-Ishim and Anzick-1 (I had forgotten that they used Anzick-1) so they did it with a pre-LGM specimen and a post-LGM specimen. I also provided the example of a Chalcolithic or Bronze Age specimen and how its YFull age is not too old. Which other historical event are you looking for?



How long will it take for those differences in rate to average out? In evolutionary biology and bioinformatics, it's pretty much an accepted fact that gamma ray bursts can cause a very large number of mutations, and these events have generated discontinuities in the evolutionary mutation rate. Such events can certainly generate dozens of SNPs within an individual. Do you deny the existence of such events? If such an event occurs once every few thousand years, then a few thousand years is definitely not long enough to average out the rate.
Just two dozen SNPs would throw off the dates by about 3,456 years. There isn't that much of a difference with the Bell Beaker.


These specific examples demonstrate that the calibration is not an over estimate - for these specific cases, within this branch and within this time period. But your claim that YFull's dates may actually be an under estimate supports the idea that the calibration rate might be inaccurate, and different calibrations based on different events might give different rates. YFull being a consistent under estimate for many branches, as you claim, indicates a systematic error. A systematic error means that we are not observing just random variation, but an inaccuracy of the calibrated rate. If the mutation rate is not consistent through history, we should expect a calibration based on a single event to result in over estimates of some nodes and under estimates of other nodes. The under estimation of some nodes does not rule out the possibility of other nodes being over estimated.
The range of the error is not so great for me to be concerned about it at the moment. So far there isn't a significant contradiction between historical events and YFull date estimates and the inaccuracy isn't anywhere close to as great as what epp proposed.
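The "two dozen SNPs" figure in the post above implies a rate of 144 years per SNP (3,456 / 24). A small sketch of that arithmetic, and of the reverse question of how many SNPs the Bell Beaker gap corresponds to, with hypothetical function names:

```python
YEARS_PER_SNP = 144  # implied by the post: 3,456 years / 24 SNPs


def date_shift_years(extra_snps, years_per_snp=YEARS_PER_SNP):
    """Years by which an estimate moves if extra_snps spurious SNPs are counted."""
    return extra_snps * years_per_snp


def snps_for_gap(gap_years, years_per_snp=YEARS_PER_SNP):
    """Number of SNPs that a given date gap corresponds to."""
    return gap_years / years_per_snp


burst_shift = date_shift_years(24)           # 3456 years
bell_beaker_gap = snps_for_gap(4500 - 3800)  # about 4.9 SNPs
print(burst_shift, round(bell_beaker_gap, 2))
```

On these numbers the 700-year gap between the U152 TMRCA and the sample's minimum age is worth about five SNPs, well short of a two-dozen-SNP burst.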

miiser
07-31-2016, 03:48 AM
A related article on climate extremes and mutation rates
https://www.sott.net/article/293696-Climate-change-may-produce-human-mutations

If accurate it implies an ice age might increase mutations.

I think there is some truth to the idea of a connection between climate and mutation rate. But that website you link to seems a little fruity to me. There are pseudo scientific comments in the article regarding "electrophonic phenomena" and such.

epp
07-31-2016, 08:34 AM
It simply means that the sons of each SNP weren't successful in branching out.
Why weren't they? That is my question. Why did hardly any of these branches survive 39,000-30,000 BC, when plenty were successfully branching out both beforehand & afterwards?


We have ancient DNA such as Bell Beaker Germany Osterhofen-Altenmarkt [RISE563] http://www.ancestraljourneys.org/copperbronzeagedna.shtml (coverage of 0.329 per SupplementaryDataTable2 (http://biorxiv.org/content/early/2016/06/16/059311.figures-only) of Lazaridis et al. 2016) who is positive for U152. The Bell Beaker period was c. 2800 – 1800 BC and YFull estimates U152 to have a TMRCA of 4500 ybp (1500 BC). Since the coverage was only 0.329 we don't know which other U152 SNPs was positive for but at least we know that U152 is at least 3800 years old which definitely means YFull's TMRCA estimates are not exaggerating (approximately doubling) the real ages of SNPs.
A TMRCA of 4500 ybp is 2,500 BC (not 1,500 BC), which is 800 years earlier than the sample that you say is at least 3,800 years old.

epp
07-31-2016, 08:55 AM
The SNPs average out over time.
How do you know that SNPs average out over time? Take two of yfull's examples for Y4213 in haplogroup I2 - id YF04998's ancestors apparently had no new SNPs since 9,800 BC, whereas id YF02042's ancestors apparently had 92 new SNPs since the same date. Averaging out at 46 new SNPs tells us nothing about either of these samples, and there are plenty of wild variations like this across yfull's database.

More branches just mean more people.
Exactly, and less branches means less people, as would have been the case when large areas of Eurasia would have been depopulated during the LGM.

Gravetto-Danubian
07-31-2016, 09:04 AM
Sorry, off topic, but you're that same "epp" that thinks haplogroup I originated and expanded from northern Europe, right ??

epp
07-31-2016, 09:04 AM
A related article on climate extremes and mutation rates
https://www.sott.net/article/293696-Climate-change-may-produce-human-mutations

If accurate it implies an ice age might increase mutations.
This is interesting. If mutation rates do vary with climate, this would seem to reduce the reliability that we can place on estimates that are based on the assumption that TMRCAs are proportional to the number of SNPs.

miiser
07-31-2016, 09:55 AM
They calibrated to both Ust'-Ishim and Anzick-1 (I had forgotten that they used Anzick-1) so they did it with a pre-LGM specimen and a post-LGM specimen. I also provided the example of a Chalcolithic or Bronze Age specimen and how its YFull age is not too old. Which other historical event are you looking for?


I just took a fresh look at the YFull paper by Adamov, et al. If anything, it clearly shows that calibrations to different standards give significantly different rates. Their discussion and conclusions amount to, "These calibration methods all give significantly different rates, but we think ours is best."

jdean
07-31-2016, 11:35 AM
I just took a fresh look at the YFull paper by Adamov, et al. If anything, it clearly shows that calibrations to different standards give significantly different rates. Their discussion and conclusions amount to, "These calibration methods all give significantly different rates, but we think ours is best."

Therein lies the nub !!

Megalophias
07-31-2016, 03:36 PM
Take two of yfull's examples for Y4213 in haplogroup I2 - id YF04998's ancestors apparently had no new SNPs since 9,800 BC, whereas id YF02042's ancestors apparently had 92 new SNPs since the same date. Averaging out at 46 new SNPs tells us nothing about either of these samples, and there are plenty of wild variations like this across yfull's database.
YF04998 has 74 SNPs, not zero.

ArmandoR1b
07-31-2016, 06:49 PM
Why weren't they? That is my question. Why did hardly any of these branches survive 39,000-30,000 BC, when plenty were successfully branching out both beforehand & afterwards?
We don't have a time machine to provide that answer.


A TMRCA of 4500 ybp is 2,500 BC (not 1,500 BC), which is 800 years earlier than the sample that you say is at least 3,800 years old.
Yes, I typo'd 1500 BC instead of 2500 BC for 4500 ybp. However, I did correctly state that the U152 specimen is at least 3,800 years old (because the Bell Beaker period was c. 2800 – 1800 BC) and I also correctly stated that the TMRCA of U152 (https://www.yfull.com/tree/R-U152/) is 4500 ybp. The difference between 3,800 years ago and 4500 years ago is 700 years and not 800 years, and that difference isn't anywhere close to being 100%, which is what you suggested it would be in your initial post. I also correctly stated that the coverage was only 0.329 so we don't know which other U152 SNPs he was positive for. He likely had some more SNPs which have a younger TMRCA.

ArmandoR1b
07-31-2016, 06:59 PM
How do you know that SNPs average out over time? Take two of yfull's examples for Y4213 in haplogroup I2 - id YF04998's ancestors apparently had no new SNPs since 9,800 BC, whereas id YF02042's ancestors apparently had 92 new SNPs since the same date. Averaging out at 46 new SNPs tells us nothing about either of these samples, and there are plenty of wild variations like this across yfull's database.
YF04998 just doesn't have any other SNPs that YFull has been able to show on the site because they don't know which ones are private and which ones aren't because there aren't enough people in his branch that have also had a BigY or FGC test that have submitted their files to YFull. Your misunderstanding of the process that YFull uses is causing you to come to erroneous conclusions.


Exactly, and less branches means less people, as would have been the case when large areas of Eurasia would have been depopulated during the LGM.
Again, we don't have a time machine in order to know all of the variables involved.

epp
07-31-2016, 07:48 PM
YF04998 has 74 SNPs, not zero.
https://www.yfull.com/tree/I2/ seems to show no new SNPs since 9,800 BC. Would you please identify where you got 74 from.

epp
07-31-2016, 08:29 PM
We don't have a time machine to provide that answer.
We don't have a time machine to provide any answers. As far as I know, yfull's answers haven't been verified by a time machine either, but that doesn't prevent yfull from making an estimate, nor anybody else from considering what possible reasons there could be to explain yfull's data. I'm not asking whether you have a time machine or the answer - I'm seeking possible hypotheses that might explain the data.
You seem to be under the impression that I am arguing that yfull's estimates are all double what they should be, but I haven't concluded (or even suggested) anything at all. I merely pointed out something striking in the data, and asked a question about it.

epp
07-31-2016, 08:51 PM
YF04998 just doesn't have any other SNPs that YFull has been able to show on the site because they don't know which ones are private and which ones aren't because there aren't enough people in his branch that have also had a BigY or FGC test that have submitted their files to YFull.
YF04998 is defined by yfull as I-Y4213* (with I-Y4213's TMRCA estimated as 11,800 ybp). However, YF02042 is identified as being in a subclade of this I-Y4213, but having had 92 new SNPs added to it since Y4213.
If YF04998 might also have some newer SNPs, why would it be defined as I-Y4213* and not as I-Y4213 (as is the case with the not fully defined sample YF06281)?

epp
07-31-2016, 08:59 PM
I'll ask again my question - other than the possibility that yfull's TMRCA calculations could be over-estimated, can anyone provide a plausible reason to explain why hardly any y-dna branches survive 39,000-30,000 BC (in yfull's estimates), when there was plenty of successful branching out both beforehand & afterwards?

Megalophias
08-01-2016, 03:25 AM
https://www.yfull.com/tree/I2/ seems to show no new SNPs since 9,800 BC. Would you please identify where you got 74 from.
Click on 'info' next to Y4213.


The earth (and in particular Europe) is thought to have been at its least hospitable at the time of the Last Glacial Maximum, which is estimated to have occurred between 23,000 and 11,000 BC
A recent estimate puts the LGM at 26.5 to 19-20 thousand years ago. After 19 000 years ago it was still quite cold but the ice sheets were retreating. 14 700 years ago the Bølling-Allerød warm period began, lasting until the Younger Dryas which began 12 900 years ago. 11 700 years ago the last glacial period ended and we entered the Holocene.

What I see is a rapid diversification of C, D, E, F, and K around 50 000 years ago, associated with the initial spread of modern humans and Upper Palaeolithic technology - possibly connected to the relatively mild and rainy climate of Greenland Interstadials 14-12 about 54-44 000 years ago - followed by much more gradual splits during the period of oscillation between long cold stretches and short mild periods that followed. Then we see much more growth after the LGM. But I'm not seeing that there is growth during the LGM and not before.

Here is a tree from Karmin et al with times of Heinrich events 4 and 5, the LGM, the Bølling-Allerød, and the beginning of the Holocene marked. Keep in mind that the actual TMRCAs of haplogroups have large uncertainties. There is a rapid phase of expansion early on, and later on after the LGM when conditions improved, but the middle part was just small Ice Age populations of modern humans already established, so there is no reason we should expect a lot of branching during that period.
[Attachment 10729: tree from Karmin et al. with the climate events marked]

ArmandoR1b
08-01-2016, 11:58 AM
YF04998 is defined by yfull as I-Y4213* (with I-Y4213's TMRCA estimated as 11,800 ybp). However, YF02042 is identified as being in a subclade of this I-Y4213, but having had 92 new SNPs added to it since Y4213.
If YF04998 might also have some newer SNPs, why would it be defined as I-Y4213* and not as I-Y4213 (as is the case with the not fully defined sample YF06281)?

It is because for the 74 mutations that YF04998 does have, as Megalophias pointed out, many of them can be private. YFull needs other people that share many of those mutations in order to determine which are private in order to put them into a new subclade and estimate the date to the subclade. For instance, YF02042 and YF01699 match all of the extra SNPs, between I-Y4213 and I-Y4252, with YF06415, YF06255, YF02232, YF01783, and YF01522 so there are 5 people they were able to be compared to down to I-Y4252. YF02042 only shows 10 SNPs up to I-A417 and using the corrected number of SNPs as 11.35 they get an age estimate of 1700. YF04998 does not share any SNPs below I-Y4213 with anyone else so they use all 74 SNPs for a corrected number of 79.97 and an age estimate of 11,608. Without knowing which SNPs to use for a new branch for YF04998 then he can't be put into a younger branch with fewer SNPs to calculate an age of a younger branch.

So all of this goes back to you not understanding the process that YFull uses and because of it you come to erroneous conclusions. These things should be asked about before jumping to conclusions.
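The two age estimates quoted in the post above can be reproduced with a linear formula, sketched below. The constants are assumptions not stated in this thread: 144.41 years per SNP and a 60-year offset for the average age of living testers. They are chosen because they match both quoted figures, and the function name is hypothetical.

```python
YEARS_PER_SNP = 144.41   # assumed rate constant (not stated in the thread)
SAMPLE_AGE_OFFSET = 60   # assumed average age of living testers, in years


def age_estimate(corrected_snps):
    """Age in years before present from a coverage-corrected SNP count."""
    return corrected_snps * YEARS_PER_SNP + SAMPLE_AGE_OFFSET


age_y4213_star = age_estimate(79.97)  # YF04998: all 74 SNPs, corrected to 79.97
age_a417 = age_estimate(11.35)        # YF02042: 10 SNPs, corrected to 11.35
print(round(age_y4213_star), round(age_a417))
```

Under these assumed constants the results are about 11,608 and 1,699 years, in line with the 11,608 and 1700 given above. The same formula shows why a sample with no matching relatives gets a single old estimate: all of its SNPs are counted from the last shared node.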

epp
08-01-2016, 09:33 PM
It is because for the 74 mutations that YF04998 does have, as Megalophias pointed out, many of them can be private. YFull needs other people that share many of those mutations in order to determine which are private in order to put them into a new subclade and estimate the date to the subclade. For instance, YF02042 and YF01699 match all of the extra SNPs, between I-Y4213 and I-Y4252, with YF06415, YF06255, YF02232, YF01783, and YF01522 so there are 5 people they were able to be compared to down to I-Y4252. YF02042 only shows 10 SNPs up to I-A417 and using the corrected number of SNPs as 11.35 they get an age estimate of 1700. YF04998 does not share any SNPs below I-Y4213 with anyone else so they use all 74 SNPs for a corrected number of 79.97 and an age estimate of 11,608. Without knowing which SNPs to use for a new branch for YF04998 then he can't be put into a younger branch with fewer SNPs to calculate an age of a younger branch.

So all of this goes back to you not understanding the process that YFull uses and because of it you come to erroneous conclusions. These things should be asked about before jumping to conclusions.
Sorry, sir! I think I'm now beginning to understand yfull's confusing presentation a little better. It would have been easier to spot and more user-friendly if the "info" identifying the number of the sample's SNPs were recorded on the same line as the sample number itself, rather than three lines above it. Although I still can't quite see why sample YF06281 would be shown on a branch separate from both I-Y4213* and I-L1287 (I-Y4213's only identified subclade) - presumably it is either one or the other, or perhaps half-private? (in which case, it might be clearer if it were presented in such a way that didn't make it look like it were neither in one nor the other)

Do we have evidence to indicate whether mutation rates can vary with climate, radiation levels, disease or other variables? Or whether SNPs can arise in clusters? If mutations were pretty much always singular and independent of other variables, then I would agree that we could confidently infer at least the relative ages of different SNPs from the yfull evidence (assuming, of course, that the people behind yfull are reliable in providing accurate and unbiased data).

miiser
08-01-2016, 09:55 PM
Sorry, sir! I think I'm now beginning to understand yfull's confusing presentation a little better. It would have been easier to spot and more user-friendly if the "info" identifying the number of the sample's SNPs were recorded on the same line as the sample number itself, rather than three lines above it. Although I still can't quite see why sample YF06281 would be shown on a branch separate from both I-Y4213* and I-L1287 (I-Y4213's only identified subclade) - presumably it is either one or the other, or perhaps half-private? (in which case, it might be clearer if it were presented in such a way that didn't make it look like it were neither in one nor the other)

Do we have evidence to indicate whether mutation rates can vary with climate, radiation levels, disease or other variables? Or whether SNPs can arise in clusters? If mutations were pretty much always singular and independent of other variables, then I would agree that we could confidently infer at least the relative ages of different SNPs from the yfull evidence (assuming, of course, that the people behind yfull are reliable in providing accurate and unbiased data).

Although you misread the data in this case, there are plenty of examples in which parallel branches of YFull's tree have wildly incongruous SNP counts, beyond what is expected of a random distribution. Now that you know how to read the data correctly, keep looking and you will find them.

The tree structure itself is evidence of a non-random distribution. It is much bushier than it ought to be, were the SNPs randomly distributed. The large number of branches per node and the long strings of phylogenetically equivalent SNPs are excessive - indicative of at least some degree of SNP clumping. A random distribution of SNPs is not statistically capable of creating the tree that we have.

There are plenty of well known factors that can affect mutation rate - the age of the father, cosmic radiation events, toxic exposure (such as arsenic from mining and metallurgy), and others. There is no rational reason to assume the distribution of SNPs is fully random, or that it is sufficiently random that it will average out to a smooth rate within a few thousand years. I've seen multiple people assume that this is the case, but I've yet to see a mathematical demonstration that it is true. Similarly, I've seen people make the assumption that SNP counting is a more accurate dating method than STR variance, but I've seen no mathematical evidence to support this claim.
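
One way to put the "beyond what is expected of a random distribution" claim to a test is an index-of-dispersion check. This is only a minimal sketch under stated assumptions: the parallel branches descend from the same node, all end in living samples, and are compared over the same sequenced regions, so that a constant-rate model predicts Poisson counts with a common mean. The SNP counts below are hypothetical illustration values, not figures taken from YFull.

```python
from statistics import mean, variance


def dispersion_index(snp_counts):
    """Sample variance divided by the mean; close to 1 for Poisson counts."""
    m = mean(snp_counts)
    if m == 0:
        raise ValueError("all branches have zero SNPs")
    return variance(snp_counts) / m


def dispersion_statistic(snp_counts):
    """(n - 1) * variance / mean.

    Under the Poisson (constant-rate) null this is approximately
    chi-square distributed with n - 1 degrees of freedom.
    """
    return (len(snp_counts) - 1) * dispersion_index(snp_counts)


# Hypothetical SNP counts on five parallel branches below one node.
even_branches = [18, 22, 20, 19, 21]
clumped_branches = [5, 40, 12, 33, 10]

for name, counts in (("even", even_branches), ("clumped", clumped_branches)):
    print(name, dispersion_index(counts), dispersion_statistic(counts))
```

With five branches there are 4 degrees of freedom, and the 5% critical value of chi-square is about 9.49, so a statistic well above that would argue against a purely random distribution for that node. Unequal coverage between samples would inflate the statistic on its own, so counts would need to be normalised before reading much into the result.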

The problem is that too many people have pet theories which depend on the random interpretation - a population explosion in this or that particular branch is associated with this or that famous person, chiefly lineage, or historical invasion, etc. A very large narrative structure has been built up by various haplogroup project leaders and researchers that depends on the distribution being random. A non-random distribution is viewed as a turd in the punch bowl. So some people are very hostile toward the idea. Damage the cornerstone of random SNP distribution, and the entire edifice comes tumbling down.

epp
08-01-2016, 10:49 PM
A recent estimate puts the LGM at 26.5 to 19-20 thousand years ago. After 19 000 years ago it was still quite cold but the ice sheets were retreating. 14 700 years ago the Bølling-Allerød warm period began, lasting until the Younger Dryas which began 12 900 years ago. 11 700 years ago the last glacial period ended and we entered the Holocene.
These estimates would reduce the difference slightly, putting yfull's TMRCA estimates only about 65% higher than those based purely on an LGM calibration. In the shorter run, the positions are even closer, with the branch expansions of what seem to be Northern haplogroups estimated by yfull to have started to arise at an average of 23,250 BC, and by the post-LGM hypothesis at some point after 17,500 BC (assuming the LGM estimate above is accurate) - a difference close to 30%.
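
The "close to 30%" figure can be checked with a few lines of arithmetic. This is a rough sketch that assumes BC dates are converted to years before present by adding 2,000 years (an assumption, chosen because it turns the 19,500 years ago in the quoted LGM estimate into 17,500 BC); the function names are just for illustration.

```python
def bc_to_ybp(year_bc, offset=2000):
    """Convert a BC date to years before present (offset is an assumption)."""
    return year_bc + offset


def percent_higher(older, younger):
    """How much larger `older` is than `younger`, as a percentage."""
    return 100.0 * (older - younger) / younger


yfull_start = bc_to_ybp(23250)     # 25250 years before present
post_lgm_start = bc_to_ybp(17500)  # 19500 years before present

print(round(percent_higher(yfull_start, post_lgm_start), 1))  # 29.5
```

Comparing the raw BC figures instead (23,250 against 17,500) would give about 33%, so the choice of reference year moves the answer by a few points.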


What I see is a rapid diversification of C, D, E, F, and K around 50 000 years ago, associated with the initial spread of modern humans and Upper Palaeolithic technology - possibly connected to the relatively mild and rainy climate of Greenland Interstadials 14-12 about 54-44 000 years ago - followed by much more gradual splits during the period of oscillation between long cold stretches and short mild periods that followed. Then we see much more growth after the LGM. But I'm not seeing that there is growth during the LGM and not before.
I would say yfull seems to indicate a slow initial diversification of surviving CT lineages outside of Africa, indicating to me an initial spread that was relatively unsuccessful (probably cut short by the worsening climate outside of the tropics). This then seemed to be followed by a more rapid diversification - by my analysis, nearly all within haplogroups that had stayed in or retreated to the Middle East or Southern Asia (C, F, H, IJ and K), rather than within Northern/European populations.

The following groups are examples of those estimated by yfull to have arisen during the LGM - G, I, pre-I1, I2, I2-L460, L, T, N, N-Y6503. My analysis indicates that each of these groups was most likely to have arisen north of the Alps-Caucasus-Himalayas. It just seems more plausible to me that multiple northern branches would have sprouted and grown northwards during the retreat of the LGM (10,000-17,000 BC), rather than in the middle of it (17,000-25,000 BC).

jamesdowallen
02-27-2018, 05:26 PM
I'm certainly posting in the wrong thread(*) but two comments on this topic:

(1) The European haplogroups I1, I2, R1a, R1b were each still a single lineage (as far as YFull is concerned) during the Last Glacial Maximum, so Europe's clading is almost irrelevant to this topic.

(2) Nevertheless I was intrigued enough to "mine" YFull for ALL its mean TMRCA figures (6642 with duplicates eliminated). I attach a histogram of those dates. (Having gone to this bother, I'll consider requests for other ways to present this data, e.g. by group.)

(* - How do you all navigate the zillions of threads here?) :\

Note: To keep this table short, I've combined dates together. For example the "312 instances of 300 ybp" include the 225/250/275 counts; the "44 instances of 11000 ybp" include 10100 - 10900 ybp; and so on. The YFull data I used is up-to-date; I downloaded it immediately before preparing this table.

128 instances of 100 ybp
168 instances of 200 ybp
312 instances of 300 ybp
239 instances of 400 ybp
262 instances of 500 ybp
240 instances of 600 ybp
197 instances of 700 ybp
183 instances of 800 ybp
193 instances of 900 ybp
154 instances of 1000 ybp
152 instances of 1100 ybp
171 instances of 1200 ybp
146 instances of 1300 ybp
153 instances of 1400 ybp
184 instances of 1500 ybp
147 instances of 1600 ybp
159 instances of 1700 ybp
117 instances of 1800 ybp
123 instances of 1900 ybp
134 instances of 2000 ybp
197 instances of 2200 ybp
161 instances of 2400 ybp
151 instances of 2600 ybp
140 instances of 2800 ybp
188 instances of 3000 ybp
162 instances of 3200 ybp
171 instances of 3400 ybp
146 instances of 3600 ybp
146 instances of 3800 ybp
170 instances of 4000 ybp
132 instances of 4200 ybp
142 instances of 4400 ybp
117 instances of 4600 ybp
65 instances of 4800 ybp
46 instances of 5000 ybp
101 instances of 5500 ybp
86 instances of 6000 ybp
76 instances of 6500 ybp
59 instances of 7000 ybp
56 instances of 7500 ybp
67 instances of 8000 ybp
30 instances of 8500 ybp
40 instances of 9000 ybp
31 instances of 9500 ybp
33 instances of 10000 ybp
44 instances of 11000 ybp
37 instances of 12000 ybp
41 instances of 13000 ybp
31 instances of 14000 ybp
27 instances of 15000 ybp
29 instances of 16000 ybp
20 instances of 17000 ybp
8 instances of 18000 ybp
22 instances of 19000 ybp
6 instances of 20000 ybp
5 instances of 21000 ybp
6 instances of 22000 ybp
3 instances of 23000 ybp
2 instances of 24000 ybp
5 instances of 25000 ybp
2 instances of 26000 ybp
4 instances of 27000 ybp
2 instances of 28000 ybp
5 instances of 29000 ybp
2 instances of 30000 ybp
4 instances of 31000 ybp
3 instances of 32000 ybp
3 instances of 35000 ybp
2 instances of 36000 ybp
2 instances of 37000 ybp
3 instances of 39000 ybp
2 instances of 40000 ybp
3 instances of 42000 ybp
5 instances of 43000 ybp
3 instances of 45000 ybp
10 instances of 46000 ybp
4 instances of 47000 ybp
1 instance of 48000 ybp
3 instances of 49000 ybp
5 instances of 50000 ybp
1 instance of 51000 ybp
1 instance of 52500 ybp
1 instance of 54700 ybp
1 instance of 59300 ybp
1 instance of 65200 ybp
1 instance of 65900 ybp
1 instance of 68500 ybp
2 instances of 84800 ybp
1 instance of 88000 ybp
1 instance of 125600 ybp
1 instance of 130700 ybp
1 instance of 133400 ybp
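
For anyone wanting to reproduce or re-bin the table, the grouping described in the note can be sketched roughly as below. The bin widths are inferred from the rows of the table (100-year steps up to 2000 ybp, 200 up to 5000, 500 up to 10000, 1000 up to 51000, individual values beyond that) and are an assumption, as is the rule that each date rounds up to the top of its bin, which is what the "225/250/275 count as 300" example suggests. The sample dates in the usage lines are hypothetical.

```python
from collections import Counter

# (upper limit of range in ybp, bin width in years); inferred from the table.
BIN_WIDTHS = [(2000, 100), (5000, 200), (10000, 500), (51000, 1000)]


def bin_label(tmrca_ybp):
    """Round a TMRCA (integer years before present) up to the top of its bin."""
    for limit, width in BIN_WIDTHS:
        if tmrca_ybp <= limit:
            # Integer ceiling division, then scale back to years.
            return -(-tmrca_ybp // width) * width
    # Dates older than the last limit appear individually in the table.
    return tmrca_ybp


def histogram(tmrcas):
    """Count how many TMRCA values fall into each bin."""
    return Counter(bin_label(t) for t in tmrcas)


# Hypothetical TMRCA values, for illustration only.
sample = [225, 250, 275, 300, 10100, 10900, 52500]
for label, count in sorted(histogram(sample).items()):
    print(f"{count} instances of {label} ybp")
```

Changing the entries in BIN_WIDTHS is enough to try a different presentation, for example uniform 1000-year bins across the whole range, which would make the pre-LGM and LGM counts easier to compare directly.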