I1 level SNPs at YFull



deadly77
03-17-2020, 06:45 PM
Due to a rather serious population bottleneck, modern members of haplogroup I1 share a large number (>300) of SNP mutations that accumulated after the lineage split from the rest of haplogroup I and before its modern descendants diversified from each other (at least among the people currently tested and in databases). For a while, the YFull tree has been pretty stable at the I1 level, using 312 SNPs to define haplogroup I1. However, when I looked at the top of the tree recently, I noticed that YFull appears to have slightly pruned the number of SNPs at the I1 level, down to 310.

Cross-referencing against an older YReport, it appears that the SNPs that have been removed are:

Z2746 ChrY position (Hg38): 11470758 A to G.
FGC33327 ChrY position (Hg38): 11351427 T to A.

Both of these have a 1 star rating at YFull, and they are in a region close to the centromere.
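The cross-referencing step amounts to a set difference between the SNP list in the older YReport and the current one. A minimal Python sketch, using a small hand-picked subset of SNP names from this thread as stand-in data (the real lists hold 312 and 310 entries, which are not reproduced here):

```python
# Illustrative subsets only: the real I1-level lists contain 312 (old) and 310 (new) SNPs.
old_i1_snps = {"Z2746", "FGC33327", "CTS898", "Z2741", "L574"}
new_i1_snps = {"CTS898", "Z2741", "L574"}

def removed_snps(old, new):
    """Return SNPs present in the old list but missing from the new one, sorted by name."""
    return sorted(old - new)

print(removed_snps(old_i1_snps, new_i1_snps))  # ['FGC33327', 'Z2746']
```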

Looking back through archived versions of the YFull tree, these two were removed between versions 7.10.01 and 8.00.1, in January/February 2020.

In my own results, I have an ambiguous read for FGC33327 (22A, 18T). Z2746 is more straightforward in terms of reads (25G, 1T), but in my "Hg and SNPs" tab at YFull it's listed at the I-Z58 level, even though it's a 1-star SNP and doesn't appear on the tree.
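One way to see why 22A/18T counts as ambiguous while 25G/1T does not is to look at the fraction of derived reads among the reads that show either the ancestral or the derived base. This sketch is hypothetical: the `classify_call` helper and its 0.1/0.9 cut-offs are assumptions for illustration, not YFull's actual calling rules.

```python
def classify_call(counts, ancestral, derived, lo=0.1, hi=0.9):
    """Classify a SNP from per-base read counts.

    Reads showing neither the ancestral nor the derived base (e.g. the single T
    at Z2746) are ignored. The lo/hi thresholds are illustrative assumptions.
    """
    informative = counts.get(ancestral, 0) + counts.get(derived, 0)
    if informative == 0:
        return "no call"
    derived_fraction = counts.get(derived, 0) / informative
    if derived_fraction >= hi:
        return "derived"
    if derived_fraction <= lo:
        return "ancestral"
    return "ambiguous"

# FGC33327 is T to A; Z2746 is A to G (read counts from the post above).
print(classify_call({"A": 22, "T": 18}, ancestral="T", derived="A"))  # ambiguous
print(classify_call({"G": 25, "T": 1}, ancestral="A", derived="G"))   # derived
```

For FGC33327 only 22 of 40 informative reads (55%) are derived, which falls between the assumed cut-offs; for Z2746 all 25 informative reads are derived.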

This doesn't really change anything for most people, as it's far up the tree at the root of I1; I'm just a little intrigued by the change. There are a few other 1-star SNPs still on the YFull tree at the I1 branch level, for example CTS898, L574 / Z2801, Z2741, FGC2468 / Y1831 and a few more.

spruithean
03-18-2020, 01:54 AM
Interesting. Any idea what the Hg19 positions are? I'm curious to see my reads for these.

deadly77
03-18-2020, 07:14 AM
Sure. My own data is mapped to Hg38, but I'm used to looking up Hg19 coordinates, as the majority of ancient samples are in Hg19 format. Thankfully, YFull displays the position for both references when you look up a SNP.

Z2746 ChrY position (Hg19): 13626434 A to G.
FGC33327 ChrY position (Hg19): 13507103 T to A.
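For these two SNPs the Hg19 and Hg38 positions differ by exactly the same amount, which is easy to confirm. This only says that both sites sit in a stretch that shifted as a block between the two assemblies; it is not a general Hg19-to-Hg38 conversion rule, and other positions should go through a proper liftover (for example UCSC's liftOver tool with a chain file).

```python
# (Hg19, Hg38) ChrY positions as displayed by YFull for the two removed SNPs.
positions = {
    "Z2746": (13626434, 11470758),
    "FGC33327": (13507103, 11351427),
}

# Difference between the two assemblies' coordinates for each SNP.
offsets = {snp: hg19 - hg38 for snp, (hg19, hg38) in positions.items()}
print(offsets)  # {'Z2746': 2155676, 'FGC33327': 2155676}
```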

It may be that these SNPs are derived in most I1 samples, but that with increased coverage YFull is finding derived reads for them in samples belonging to other haplogroups, so YFull may have removed them because of phylogenetic inconsistencies.
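The suspected reason can be phrased as a simple consistency check: a SNP that defines a clade should not be derived in samples outside that clade. The function name, the call labels, the SNP names and the toy samples below are all hypothetical; this is a sketch of the idea, not how YFull actually curates its tree.

```python
def phylogenetically_inconsistent(calls, clade_members):
    """Return SNPs with a derived call in at least one sample outside the clade.

    calls maps SNP name -> {sample_id: "derived" | "ancestral" | "no call"}.
    """
    flagged = []
    for snp, by_sample in calls.items():
        if any(call == "derived" and sample not in clade_members
               for sample, call in by_sample.items()):
            flagged.append(snp)
    return sorted(flagged)

# Toy data: SNP names and sample IDs are made up for illustration.
i1_samples = {"i1_a", "i1_b"}
calls = {
    "SNP_X": {"i1_a": "derived", "i1_b": "derived", "other_hg_c": "derived"},
    "SNP_Y": {"i1_a": "derived", "i1_b": "no call", "other_hg_c": "ancestral"},
}
print(phylogenetically_inconsistent(calls, i1_samples))  # ['SNP_X']
```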

JMcB
03-18-2020, 03:01 PM
Out of curiosity, I’m positive for FGC33327 with 21 reads, while Z2746 is listed as ? (no-call position).

mwauthy
03-18-2020, 03:46 PM
Thanks for the info. On a side note, I’m kind of perturbed that YFull doesn’t update age estimates once a subclade hits 100 samples. As a result, my sample isn’t used for the aging of I1, I-DF29, I-Z58, I-Z59, or I-Z2041.

deadly77
03-19-2020, 08:21 AM
I must admit that I hadn't noticed that before. My sample is listed in the age estimate list for I-L338, I-YSC261, I-Z2535 and I-Z141, but not at the levels above (I-Z140, I-Z60, etc. upstream). I'd be surprised if YFull is capping the contributing samples per branch at 100; I think it's more likely that all samples contribute to the age estimate but only 100 are displayed on the YFull tree under the info button.

I did a quick check for I-Z140 (the most downstream branch where my kit isn't listed). The age estimate of 3630 ybp comes from a single sample, YF09964 at I-Z140*, at 3084 ybp, and the branch I-Z141 at 4177 ybp. If I cut and paste the "age by this line only" values for all of the samples listed (excluding YF09964 and the age for the branch I-Z141) into Excel and use Excel's AVERAGE formula, I get a mean of 4126.04 for the 100 samples listed for I-Z141. That differs from the 4177 ybp value for the I-Z141 branch that's being used in the age estimate for I-Z140, so I'd assume that kits outside the 100 listed are contributing. But assumptions are often the mother of all you-know-whats, so this is probably best cleared up by emailing YFull and asking them.
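The arithmetic in this check can be reproduced with a plain unweighted mean. That assumption happens to reproduce the I-Z140 figure here ((3084 + 4177) / 2 = 3630.5), but it is an assumption about YFull's method rather than something confirmed in this thread; `mean_age` and `matches` are hypothetical helper names, and the 1-year tolerance is arbitrary.

```python
def mean_age(ages):
    """Unweighted arithmetic mean of ages in ybp."""
    return sum(ages) / len(ages)

def matches(computed, reported, tolerance=1.0):
    """True if a computed age agrees with a reported one within `tolerance` years."""
    return abs(computed - reported) <= tolerance

# I-Z140: sample YF09964 (I-Z140*) at 3084 ybp and branch I-Z141 at 4177 ybp.
z140 = mean_age([3084, 4177])
print(z140)                  # 3630.5
print(matches(z140, 3630))   # True

# Mean of the 100 listed I-Z141 lines (4126.04) vs. the I-Z141 value used upstream (4177).
print(matches(4126.04, 4177))  # False
```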

Large upstream branches with a lot of samples contributing to the age estimate might not change noticeably when new samples are added. YFull's FAQ describes the rounding of branch ages displayed in the tree: "An age of less than 500 ybp is rounded to the nearest "25" (e.g., 267 becomes 275); an age of 500 to 1999 is rounded to the nearest "50" (e.g., 1267 becomes 1250); and an age of 2000 or more is rounded to the nearest "100" (e.g., 2267 becomes 2300)." https://www.yfull.com/faq/how-does-yfull-determine-formed-age-tmrca-and-ci/ It's likely that new samples don't move the displayed, rounded value very much unless they are statistical outliers, and in larger branches with many samples contributing, outliers have less influence because averaging over many samples reduces statistical noise. Of course, samples closer to the root of a branch (for example YF09964 at I-Z140*, contributing to I-Z140) will have a larger influence on the age estimate than a kit on a downstream branch that has already been averaged (such as mine at I-L338, where the individual age estimate is averaged together with the rest of the 344 YF samples at I-Z141).
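The FAQ's rounding rule is easy to express in code. This is a sketch assuming halfway cases round up (the quoted FAQ text does not say how exact ties are handled); `displayed_age` is a hypothetical name.

```python
import math

def displayed_age(age_ybp):
    """Round an age (ybp) the way the quoted YFull FAQ describes for the tree display."""
    if age_ybp < 500:
        step = 25
    elif age_ybp < 2000:
        step = 50
    else:
        step = 100
    # Round half up to the nearest multiple of `step` (tie handling is an assumption).
    return int(math.floor(age_ybp / step + 0.5)) * step

for age in (267, 1267, 2267, 3630, 4177):
    print(age, "->", displayed_age(age))
```

The three FAQ examples come back as 275, 1250 and 2300. Under the same rule an underlying average anywhere from 3550 up to just under 3650 shows as 3600, so moving from 3630 to 3649 would not change the displayed value at all.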

deadly77
03-19-2020, 08:36 AM
I do admit to being very confused about some of the things displayed in my individual age estimate on the homepage. Under "+ known SNPs", Z58 is listed with a weight of 0.224 to the I-Z58 branch and a weight of 0.776 to the I-Z138 branch (which is parallel to I-Z59 and a branch I'm not part of). Several SNPs contribute to I-Z58 that are not on the tree, and Z2721 (on the YFull tree at the I1 level) contributes a weight of 0.002 to I-Z17954. No idea what's going on there.

JMcB
03-19-2020, 11:05 AM
I also have two of those: I-Z58 Z58 / S244 @ 0.224 and I-Z138 Z58 / S244 @ 0.776

Mine doesn’t go below DF29, so nothing is listed at the I1 level.

JMcB
03-19-2020, 11:38 AM
I suspect there’s a limit to how many samples they want to list on these pages. As you say, the best thing to do is just to write to them and ask. They’ve usually responded to my emails, although sometimes it may take a while.

mwauthy
03-19-2020, 03:40 PM
You are correct. I just took a closer look at YFull’s description, and they state: “Branch numbers are averages of the numbers given for all of the samples in the branch (but the table will be limited to only the first 100 samples listed in order of ‘id’ number).”
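That description can be mirrored in a small simulation: the branch value averages every sample, while the info table shows only the first 100 rows ordered by ID, so averaging the visible rows (as in the Excel check earlier in the thread) need not reproduce the branch number. The sample IDs and ages below are synthetic, the helper names are hypothetical, and 344 is borrowed from the I-Z141 sample count mentioned above.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    sample_id: str
    age: float  # "age by this line only", in ybp

def branch_average(samples):
    """Average over every sample in the branch."""
    return sum(s.age for s in samples) / len(samples)

def info_table(samples, limit=100):
    """Only the first `limit` samples, ordered by ID, are shown."""
    return sorted(samples, key=lambda s: s.sample_id)[:limit]

# Synthetic branch of 344 samples, listed in reverse ID order so the sort matters.
samples = [Sample(f"YF{i:05d}", 3900 + i) for i in reversed(range(344))]
table = info_table(samples)

print(len(table))             # 100
print(branch_average(samples))  # 4071.5 (all samples)
print(branch_average(table))    # 3949.5 (visible rows only)
```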