
I-A12974 Branch



deadly77
01-22-2020, 10:39 PM
I've been intrigued by the new(ish) I-A12974 branch close to the root of I-L338 for a couple of reasons: personally, because this may be where I split away from the rest of the I-L338 folks, and procedurally, because it flies rather close to SNPs that we may consider unreliable for Y-DNA phylogeny.

YFull inserted the new branch I-A12974 between I-L338 and two already established downstream branches, I-Y33691 and I-Y12329. This change to their phylogenetic tree first appeared in YTree version 7.08.01 (archived 21 October 2019) https://www.yfull.com/arch-7.08/tree/I1/ and has remained on the tree for the three subsequent versions, including the current one. Going back to version 7.07 of the YFull tree (archived 9 September 2019), I-A12974 doesn't appear, and I-Y33691 and I-Y12329 sit directly downstream of I-L338 alongside I-Y15155 and I-S12289: https://www.yfull.com/arch-7.07/tree/I1/

Over on the FTDNA public haplotree, they have also recently added A12974 as a transition branch at the same position. It covers several of the kits that are on the downstream branches at YFull (which FTDNA labels I-A2006 and I-S24090) as well as a few more that are not, and according to the country report there's one kit that is I-A12974 but belongs to neither of the two downstream branches: https://www.familytreedna.com/public/y-dna-haplotree/I;name=I-A12974
I'm not sure when FTDNA added this branch to their haplotree since, unlike YFull, they don't seem to list archived versions. However, I don't think I-A12974 was there in December.

deadly77
01-22-2020, 10:40 PM
FTDNA also lists FGC74335 as phyloequivalent to A12974. This grabbed my attention because FGC74335 was first reported in my own YElite test as one of my novel SNPs. However, FGC74335 is located at ChrY position (Hg38): 5457635, which is in the X-Y homologue region of Yp11.2. When my results were first analyzed, YFull tagged FGC74335 among my novel SNPs as homologous >95% (low confidence), but it has since removed FGC74335 from my list of novel SNPs. YSEQ flagged FGC74335 as not recommended when it was ordered through Wish a SNP, stating "98.1% similar to chromosome X + 92343012 92344014". I've seen YSEQ reject several SNPs in this region, and YFull doesn't appear to add them to its tree, although FTDNA has several examples on its haplotree.

Looking up FGC74335 in the Groups view at YFull, I can see that most people in I-Z140 either have the ancestral allele A or have no read at this position (most likely Big Y Y500 kits). The only ones showing the derived allele G are my own two kits (YElite and WGS) and YF66605 on the I-Y33691 branch. For the others on that branch the position is not read, as these are earlier Big Y Y500 kits, and the upgraded Y700 kits on the public YFull tree have not joined the I1-Z140 group at YFull. All three I-A8673 kits are Big Y Y500 with no read at this position. On the I-Y12663 branch, YF01814 is a Big Y Y500 kit, and YF67369 is the Y700 upgrade (or another test) for the same individual. FGC74335 is not read in the Y500 but is in the other test. However, that result shows as ambiguous, with YFull reporting the nucleotide “R”, which is the IUPAC code for A or G.
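
For anyone unfamiliar with the IUPAC notation, here is a minimal Python sketch of how a single reported base such as “R” can be read against a SNP's ancestral and derived alleles. The code table is the standard IUPAC nucleotide set; the helper name `interpret_call` and its labels are my own illustration, not anything YFull exposes.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"},
    "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}


def interpret_call(code: str, ancestral: str, derived: str) -> str:
    """Hypothetical helper: classify one reported base for a biallelic SNP."""
    code = code.upper()
    if code == "N":
        return "no call"
    bases = IUPAC[code]
    if bases == {ancestral}:
        return "ancestral"
    if bases == {derived}:
        return "derived"
    if {ancestral, derived} <= bases:
        return "ambiguous"  # both alleles possible, e.g. R for an A>G SNP
    return "unexpected allele"


# FGC74335 is A (ancestral) > G (derived); the YF67369 read was reported as R.
print(interpret_call("R", ancestral="A", derived="G"))  # prints: ambiguous
```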

deadly77
01-22-2020, 10:42 PM
Back to A12974: when I received my YElite results in the Excel spreadsheet mapped to hg19, A12974 was listed as one of my derived novel SNPs. But when the WGS results came in shortly after, the BAM file showed an ambiguous result for A12974, with 18 A (ancestral) and 27 T (derived) reads. After Full Genomes realigned to hg38, the hg38 BAM file I received also shows an ambiguous call for A12974, this time with 31 A and 22 T. Going back to the Groups view at YFull, most people in the I-Z140 project show the ancestral allele A. Under I-L338 (xS12289), two of the I-Y15155 kits are negative (A), while all of the I-Y12329 kits that I can see are positive (T), except YF01814, which has no read (although the upgraded kit YF67369 does). For the kits I can see on the I-Y33691 branch, YF09616 and YF07607 are ambiguous for A12974, although the upgraded kit YF66605 for the latter individual shows positive (T). I can't see the results for YF65004 (the upgraded kit for YF09616) or YF66752, as these kits haven't joined the I1-Z140 YFull group, but they may be positive for A12974 as well.
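
To make the read counts above concrete, here is a small Python sketch that turns allele read counts into a derived-allele fraction and a call. The depth and fraction cut-offs (`min_depth`, `derived_min`, `ancestral_max`) are illustrative assumptions, not the thresholds FGC, YFull or FTDNA actually use.

```python
from typing import Optional, Tuple


def classify_pileup(ancestral_reads: int, derived_reads: int,
                    min_depth: int = 10,
                    derived_min: float = 0.9,
                    ancestral_max: float = 0.1) -> Tuple[str, Optional[float]]:
    """Call a haploid Y position from read counts (thresholds are assumptions)."""
    depth = ancestral_reads + derived_reads
    if depth < min_depth:
        return "no call", None
    fraction = derived_reads / depth
    if fraction >= derived_min:
        return "derived", fraction
    if fraction <= ancestral_max:
        return "ancestral", fraction
    # A haploid Y position should not be mixed; one possible explanation is
    # reads from a similar sequence elsewhere in the genome mis-mapping here.
    return "ambiguous", fraction


for label, a, t in [("first WGS BAM", 18, 27), ("hg38 BAM", 31, 22)]:
    call, frac = classify_pileup(a, t)
    print(f"{label}: {a}A/{t}T -> {call} ({frac:.2f} derived)")
```

Both counts land well inside the "ambiguous" band under these assumed thresholds (0.60 and about 0.42 derived), which is why a manual look at the BAM, rather than a single cut-off, decides calls like this.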

deadly77
01-22-2020, 10:45 PM
A12974 is located at ChrY position (Hg38): 10125784, which puts it pretty close to the centromere. A12974 won't be available for Sanger single-SNP testing at YSEQ because it falls in the region chrY:10072350..11686750. In recent years we've seen FTDNA use SNPs in highly identical regions such as DYZ19, the centromere, the X-Y homologue region of Yp11.2, inside STRs, palindromes, etc. YFull also uses these SNPs for branching in some cases, but less often and not for age estimation. YSEQ, on the other hand, tends not to recommend SNPs from these highly identical regions. I've seen Thomas Krahn say a few times that he doesn't consider such SNPs reliable for Y-line phylogeny, given the higher probability of false mapping and that they're prone to recombination events. And even if such a SNP has been inherited for several generations, there's no guarantee that it won't undergo a recombination event in a future or parallel generation. Thomas has a presentation here https://www.yseq.net/images/2014-08_I4GG.pdf?fbclid=IwAR3cVxzY_tKezkogIsOMS1lzJ5T-Ac4T_VEZG4hdEWJWiTHXOSoOraD2qB4 . It's from a few years ago, but worth a view.
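
Checking whether a position falls inside a region like the one given above is just an interval test. A minimal Python sketch, using the hg38 coordinates from this thread and assuming the range is 1-based and inclusive; the region label is my own, and other regions (DYZ19, the X-Y homologue region) could be appended once you have their coordinates.

```python
from typing import List, Tuple

# (label, start, end) on hg38 chrY, assumed 1-based and inclusive.
# The label is hypothetical; the coordinates are the range quoted above.
REGIONS: List[Tuple[str, int, int]] = [
    ("yseq_no_sanger", 10_072_350, 11_686_750),
]

# hg38 positions as given in this thread.
SNPS = {"A12974": 10_125_784, "FGC74335": 5_457_635}


def regions_hit(pos: int, regions: List[Tuple[str, int, int]]) -> List[str]:
    """Return the labels of every region that contains pos."""
    return [label for label, start, end in regions if start <= pos <= end]


for snp, pos in SNPS.items():
    hits = regions_hit(pos, REGIONS)
    print(snp, pos, hits if hits else "outside the listed regions")
```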

Note that his presentation says that highly identical regions are not necessarily bad data, but require a bit more caution. Maybe in the future we can make better judgement calls with a better human genome reference or long-read technology. There's an article here https://www.theatlantic.com/science/archive/2018/03/y-chromosome-sequencing/556034/ about sequencing the Y chromosome centromere with nanopore sequencing, which references this paper in Nature Biotechnology: https://www.nature.com/articles/nbt.4109

I had a go at looking up A12974 on YBrowse: I zoomed out to 1 kbp, downloaded the decorated FASTA file and pasted it into UCSC BLAT against the hg38 reference. This is what came up: if I'm reading it correctly, the sequence is 94.7% similar to a sequence on chromosome 20.
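
The two steps above, taking roughly 1 kbp around the SNP and expressing how closely it matches another location, can be sketched in Python as follows. This assumes the window is centred on the SNP (YBrowse's exact window may differ), and `percent_identity` is a simplification that assumes you already have a pairwise alignment; BLAT computes its own identity figure from its alignment blocks, so the numbers won't match exactly.

```python
from typing import Tuple


def centred_window(pos: int, width: int = 1000) -> Tuple[int, int]:
    """1-based inclusive window of `width` bases roughly centred on pos (assumption)."""
    start = pos - width // 2
    return start, start + width - 1


def percent_identity(row_a: str, row_b: str) -> float:
    """Share of alignment columns with the same base; gap columns ('-') count as differences."""
    if len(row_a) != len(row_b):
        raise ValueError("expected two rows of one pairwise alignment")
    matches = sum(1 for x, y in zip(row_a.upper(), row_b.upper())
                  if x == y and x != "-")
    return 100.0 * matches / len(row_a)


print(centred_window(10_125_784))                    # (10125284, 10126283)
print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))  # toy rows, not real data: 90.0
```

At 94.7% identity over about 1 kbp, roughly 53 of every 1,000 aligned bases differ, which is close enough that some short reads could plausibly align to either location.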
I’d be interested to hear what others think about this: is this a real branch that's phylogenetically consistent, or is it fool’s gold? It’s not really necessary for either the I-Y33691 branch or the I-Y12329 branch, as each has several more SNPs describing its Y-line phylogeny downstream of I-L338 without A12974 or FGC74335. It does, however, move them closer to each other (and possibly to me) than to the I-Y15155/A1944 or I-S12289 branches. It doesn’t affect anything in the genealogical record, since it's over 2500 years ago for all concerned and we all share a common ancestor at I-L338 anyway. So is it worth considering, or am I overthinking something that may not be reliable?

JonikW
01-23-2020, 12:30 AM
I would definitely post a pointer to this thread in a wider FTDNA thread on the forum to make sure you get the necessary expertise. My only useful comment is that I take frequent screenshots from YFull, which has let me chart the TMRCA changes on the S12289 branch. The last change there was in the past few weeks. Good luck; it looks to me like you're learning something about your own L338 line.

mwauthy
01-23-2020, 06:23 PM
I lack the scientific acumen to give you a helpful answer; however, here is my opinion on the matter. At the moment, with so few people in the databases, I’m satisfied with just focusing on “reliable” SNPs that are found on both the FTDNA and YFull haplotrees for phylogeny and age estimates. In the future, as more men test and finer patrilineal differentiation becomes necessary, I’ll delve deeper into these “questionable” SNPs. I don’t really care about questionable SNPs that are millennia old, because in terms of what I want to know there is not much difference between a common patrilineal ancestor from 500 BC and a “questionable” common patrilineal ancestor from 420 BC.
I would find these “questionable” SNPs more worthwhile, much like Y-111 STRs, when you want finely detailed patrilineal differentiation within a genealogical timeframe and the paper trail is incomplete.

JMcB
01-23-2020, 07:52 PM
I would imagine that sometime in the future we may be able to use those SNPs in a more reliable fashion. Although, for the time being, I’m inclined to listen to Thomas Krahn and generally disregard them. Coincidentally, my match received his results yesterday and of the 12 Novel Variants we match (11 @ FT), two of them are in the centromeric region. So I’ve been wondering how to consider them. Mostly, I’ve been ignoring them but not always. Perhaps, as Deadly said, they’re useful for branching but not for dating. Hopefully, he’ll send his results to YFull and we can let them figure it out.


Edit: After reading the article in the Atlantic, maybe the future is closer than I thought. Let’s hope!

deadly77
01-24-2020, 12:11 AM
Quote from JonikW: "I would definitely post a pointer to this thread on a wider FTDNA post on the forum to make sure you get the necessary expertise."

That's a good point as it's clearly not a scenario that's exclusive to I1. I have taken screenshots of my YFull pages over time - there are a few things that have been added or removed since the initial results unlocked, some changes stick around, some changes go away. It's good to see that some things are dynamic and being updated. Good value for a one-off fee for analysis.

There were a few interesting things going on with the "Hg and SNPs" tab and the "Age Estimation" tab recently; I wish I had taken some screenshots of those. You probably see something similar in your results: under "+ known SNPs" on the "Age Estimation" tab, some SNPs were being counted twice, assigned to both I-YSC261 and I-L338 with a weight attached to each. For example, DF77/S1969 was assigned to the I-L338 level with a weight of 0.846 and to the I-YSC261 level with a weight of 0.154. This says to me that they're not completely sure which level these SNPs belong on, since everyone who is I-YSC261 is also I-L338. I guess we won't know for sure unless someone breaks up the block of phyloequivalent SNPs. On the "Hg and SNPs" tab, most of them are listed as "level I-L338 <-> I-YSC261".
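
One way to picture those split weights: each SNP contributes a total of 1 spread across its candidate levels, and a level's effective SNP count is the sum of the fractions assigned to it. This Python sketch shows that bookkeeping only; it is not YFull's actual algorithm, which isn't described here, and apart from the DF77/S1969 weights the entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (SNP, level, weight). DF77/S1969 uses the weights quoted above;
# "SNP_X" is a hypothetical SNP placed entirely at I-L338 for comparison.
PLACEMENTS: List[Tuple[str, str, float]] = [
    ("DF77/S1969", "I-L338", 0.846),
    ("DF77/S1969", "I-YSC261", 0.154),
    ("SNP_X", "I-L338", 1.0),
]


def effective_counts(placements: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Sum fractional SNP weights per level, checking each SNP's weights add to 1."""
    per_snp: Dict[str, float] = defaultdict(float)
    per_level: Dict[str, float] = defaultdict(float)
    for snp, level, weight in placements:
        per_snp[snp] += weight
        per_level[level] += weight
    # Each SNP should be split, not duplicated: its weights must sum to 1.
    for snp, total in per_snp.items():
        if abs(total - 1.0) > 1e-6:
            raise ValueError(f"weights for {snp} sum to {total}, not 1")
    return dict(per_level)


print(effective_counts(PLACEMENTS))
```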

I also have a private SNP that's not novel because it appears on the YFull tree as a recurrent SNP in haplogroup O. YFull was flipping that between my kit number and I-A12974 with a weight assigned to each, but the I-A12974 level has now dropped out of that. Should have taken some screenshots of that before it changed.

deadly77
01-24-2020, 12:22 AM
Quote from mwauthy: "At this moment in time with the minimal amount of folks in the databases I’m satisfied with just focusing on ‘reliable’ SNPs that are found on both the FTDNA and YFull Haplotrees for phylogeny and age estimates."

This is largely where I am with these, and I very much agree with the strategy Thomas Krahn outlined in the presentation linked in #4: concentrate on the reliable SNP candidates first and look at the more questionable ones later. And for sure, A12974 and FGC74335 don't have any recent genealogical influence. The I-Y33691 folks have between 17 and 20 SNPs on the YFull tree that are suitable for age estimation (plus several more on the tree that aren't) tracking their descent from I-L338, while I have 17 different ones in the same category, so we can chart our divergence from each other without using A12974 or FGC74335 as a stepping stone. I must admit, though, that I'm interested in whether I-A12974 is reliable in a phylogenetic sense, in terms of the architecture of the tree. It's more curiosity than anything practical for genealogy, and I may be chasing ghosts that aren't real.
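
As a back-of-the-envelope check on the "over 2500 years" figure, the SNP counts above can be turned into years with a single rate. This Python sketch assumes about 144.41 years per SNP, the figure commonly quoted for YFull's combBED regions; YFull's published estimates use more than a plain average of counts times a rate, so treat the output as a rough sanity check only.

```python
from typing import Sequence

# Assumption: ~144.41 years per SNP, the rate commonly quoted for YFull's
# combBED regions. Swap in another rate if you prefer.
YEARS_PER_SNP = 144.41


def rough_years_since_split(snp_counts: Sequence[int],
                            years_per_snp: float = YEARS_PER_SNP) -> float:
    """Average the age-usable SNP counts on each line below the split, times a rate."""
    if not snp_counts:
        raise ValueError("need at least one lineage")
    return sum(snp_counts) / len(snp_counts) * years_per_snp


# Counts from the post: 17 on my line, 17-20 on the I-Y33691 lines.
for counts in ([17, 17], [17, 20]):
    print(counts, round(rough_years_since_split(counts)))
```

Both come out at roughly 2,500 to 2,700 years, in the same range as the figure mentioned earlier in the thread.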

deadly77
01-24-2020, 12:49 AM
Quote from JMcB: "Although, for the time being, I’m inclined to listen to Thomas Krahn and generally disregard them. [...] Edit: After reading the article in the Atlantic, maybe the future is closer than I thought."

Indeed, there's an element of opinion on whether these types of SNPs are reliable or not. FTDNA and YSEQ are the easiest to predict: it seems (at least to me) that FTDNA uses every SNP it finds as long as the positives and negatives are consistent across branches in its own database, while YSEQ has drawn its red lines more conservatively. Like you, I lean a bit closer to the YSEQ view, as Thomas Krahn explains his reasoning, and I haven't yet seen anything similar from anyone associated with FTDNA. It's the others (YFull, FGC, Alex Williamson's Big Tree, etc.) that are a bit harder to predict, because I'm not sure what their accept/reject criteria are. For example, I can understand why YFull isn't including FGC74335 on its tree, given its homology with the X chromosome. But I'm not sure why they use A12974 and give it four stars out of five in their rating. I figured the best thing was to email them and ask, so we'll see if they respond.
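
The FTDNA-style criterion described above (positives and negatives consistent across branches) can be sketched as a simple check: a SNP placed at a node should be derived only in kits below that node and ancestral everywhere else, with no-reads ignored. The tree, kit IDs and calls below are hypothetical toy data, not anyone's real results.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy tree: branch -> parent branch.
PARENT: Dict[str, str] = {
    "I-A12974": "I-L338",
    "I-Y33691": "I-A12974",
    "I-Y12329": "I-A12974",
    "I-Y15155": "I-L338",
    "I-S12289": "I-L338",
}

# Hypothetical kits: kit -> (terminal branch, call); call is "+", "-" or None (not read).
KITS: Dict[str, Tuple[str, Optional[str]]] = {
    "kit1": ("I-Y33691", "+"),
    "kit2": ("I-Y12329", "+"),
    "kit3": ("I-Y12329", None),
    "kit4": ("I-Y15155", "-"),
    "kit5": ("I-S12289", "-"),
}


def is_below(branch: Optional[str], node: str) -> bool:
    """True if branch is node itself or one of its descendants."""
    while branch is not None:
        if branch == node:
            return True
        branch = PARENT.get(branch)
    return False


def conflicts(node: str, kits: Dict[str, Tuple[str, Optional[str]]]) -> List[str]:
    """Kits whose call contradicts placing the SNP at `node`."""
    bad = []
    for kit, (branch, call) in kits.items():
        if call is None:
            continue  # no read: neither supports nor contradicts
        inside = is_below(branch, node)
        if (inside and call == "-") or (not inside and call == "+"):
            bad.append(kit)
    return bad


print(conflicts("I-A12974", KITS))  # []
```

Placing the same toy SNP one level higher, at I-L338, flags the two negative kits, which is the kind of inconsistency that would keep a SNP off a tree.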

Yes, I thought that was an interesting article in the Atlantic, and an easier read than the Nature Biotechnology paper itself, which I haven't had a full go at yet. Better reference sequences are probably something we'll be able to apply to our own data in the future. Long-read sequencing would involve an extra NGS test, and the price is a bit rich for me currently. As well as the more difficult regions, I can see it being better for longer INDELs and STRs, which we're probably not using as much as we could due to the limitations of current technology, not to mention an even more limited pool to compare with.

deadly77
01-24-2020, 12:59 AM
Although, to add to that, I'm very intrigued by the one individual on the FTDNA public haplotree who is on the I-A12974 branch but on neither of the two downstream branches (I-A2006 and I-S24090), and whether he is on my branch. He would have to be a Big Y tester, since neither A12974 nor FGC74335 is in any SNP pack or offered individually. He's not in either of the FTDNA projects for I1 or I1-Z140, and it's unlikely he'll be on the Big Y match list of the I-S24090 folks, since that branch has 32 SNPs, which would take him past the 30 non-matching variants that FTDNA uses as its matching threshold. I hope this individual makes his way to the FTDNA project or YFull. Although, knowing my luck, our common ancestor will be at I-L338 and he'll have his own separate string of 17+ novel SNPs downstream of that...

JMcB
01-24-2020, 03:53 AM
If I’m reading it correctly, it looks like he has 16 Novel Variants:

“I-A12974

On average 16 Private Variants in 1 Big Y participant”.


He’s the only one I can see whose terminal branch is still I-A12974, among the group descending from it.
Unfortunately, the Big Y Block Tree is a little clunky and unmanageable, so it’s hard to get a reduced screenshot of it that includes all of the pertinent subclades.

On the other hand, it may be my iPad that’s the culprit.

deadly77
01-24-2020, 11:17 AM
Well, that at least confirms a Big Y :) I must admit I'm not too familiar with the Big Y Block Tree (since I don't have Big Y). I've seen screenshots that others have posted showing how many matches each kit has. Does that show in the Block Tree you can see for the I-A12974 individual, or does it just apply to your own matches? My guess is the latter.

Not sure if Mr I-A12974 will appear on anyone's Big Y match list as I'm guessing that he has more than 30 non matching variants with the others downstream. There are 16 SNPs on the I-A2006 branch level, so adding Mr I-A12974's 16 private variants gets to 32 without even considering the SNPs on downstream branches below I-A2006 or private variants. Ditto for I-S24090 branch's 32 SNPs.
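
The match-list arithmetic above can be written out in a few lines of Python. This assumes, as stated in the thread, that FTDNA lists two Big Y testers as matches when they differ by at most 30 variants, and it only counts the variants mentioned here, so it gives a lower bound rather than a full comparison.

```python
MATCH_THRESHOLD = 30  # non-matching variants, per the thread; assumed inclusive


def non_matching_variants(*counts: int) -> int:
    """Lower bound: add up the variant counts that separate two testers."""
    return sum(counts)


def could_match(n_non_matching: int, threshold: int = MATCH_THRESHOLD) -> bool:
    return n_non_matching <= threshold


# Mr I-A12974's 16 private variants plus the branch SNPs on the other side.
for branch, branch_snps in [("I-A2006", 16), ("I-S24090", 32)]:
    n = non_matching_variants(16, branch_snps)
    print(branch, n, "possible match" if could_match(n) else "beyond threshold")
```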

Looks like there are 13 individuals on the I-A12974 branch or downstream, with 10 of them in the FTDNA I1-Z140 project. The three missing are one from I-S24090, one from I-A2051 (Denmark) and Mr I-A12974.

I checked my STR matches and I don't match most of these at any level. The only exception is one of the I-A1972 folks at Y12, with a genetic distance (GD) of 1. Looking at the project, I have GD 2-3 with most of the rest of these folks at Y12, which is outside the threshold.
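
For the STR side, a genetic distance can be sketched as the total step difference across shared markers. This Python sketch uses a simple stepwise count and made-up haplotypes; FTDNA applies special rules to some markers (such as multi-copy markers) that aren't modelled here, and the Y12 threshold of GD 1 is taken from the post above.

```python
from typing import Dict

Y12_MATCH_GD = 1  # from the post: GD 1 matched at Y12, GD 2-3 did not


def genetic_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Stepwise GD: sum of repeat-count differences over markers both kits have."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical haplotypes, three markers only, for illustration.
me = {"DYS393": 13, "DYS390": 22, "DYS19": 14}
other = {"DYS393": 13, "DYS390": 23, "DYS19": 15}

gd = genetic_distance(me, other)
print(gd, "match" if gd <= Y12_MATCH_GD else "outside Y12 threshold")
```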

JMcB
01-24-2020, 04:08 PM
He doesn’t appear to have any matches. I managed to cobble together two screenshots of his area. Unfortunately, with the way the Big Y Block Tree is designed, you have to scroll back and forth to see everything you want, so I couldn’t fit everything in.

At any rate here it is:


deadly77
01-24-2020, 04:47 PM
Yep, looks that way - and that appears to be the case for all of the individuals in that screenshot. Thanks for the screenshot. I hope this individual makes it over to the FTDNA Z140 project or YFull of his own accord. Even if A12974 turns out not to be reliable or he doesn't share any of my currently private SNPs, it's a new branch of I-L338.

deadly77
01-25-2020, 10:31 PM
Well, YFull say they are confident in A12974 - they said they analyze BAM files manually and take into account not only this mutation but the whole situation in different samples in this region regarding potential pseudo SNPs from other chromosomes. They think my ambiguous A12974 result is actually positive. They say they don't trust FGC74335 due to homology with the X chromosome though.

mwauthy
01-29-2020, 06:59 PM
There is another SNP on YFull that received a 5-star rating and is located in the “questionable” centromeric region: the SNP/subclade branch I-Y2170. So it does appear that many factors go into judging the suitability of a SNP, and that we should not simply discount these “questionable” SNPs based solely on their location.

deadly77
01-29-2020, 07:51 PM
Yeah, I'm just looking for a bit more understanding of why some SNPs in the centromere are OK and some are not, so I can apply that to candidate SNPs in this region in the future. In my WGS, there are differences from the reference sequence at 44 positions in the centromere region and 65 positions in the DYZ19 region. Obviously not all of these will be reliable, but I'd like a better understanding of which ones are worth considering, which to ignore, and how to assess that. Comparing by BLAT against the reference genome for high identity? Phylogenetic consistency across Y-DNA branches? Statistically predicting potential recombination events? I'm just not sure how to evaluate that at my current level of understanding.

Of course, not all of these 109 differences in the centromere and DYZ19 will be truly "derived". Since the hg38 reference is R1b, some positions in the reference sequence carry mutations derived from the ancestral state along the R1b lineage, so a sample from another haplogroup will differ from the reference there while still carrying the ancestral allele.
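
The point about the R1b reference can be made concrete: whether a difference from hg38 is derived in your line depends on the ancestral allele at that position, which has to come from somewhere other than the reference itself (for example an outgroup, or the allele seen at the root of the tree). A minimal Python sketch with hypothetical alleles; `polarise` is an illustrative name, not a tool's API.

```python
from typing import Optional


def polarise(sample: str, reference: str, ancestral: Optional[str]) -> str:
    """Label a sample allele relative to the hg38 reference and a known ancestral allele."""
    if ancestral is None:
        return "polarity unknown"
    if sample == ancestral:
        if reference == ancestral:
            return "ancestral (same as reference)"
        # The reference lineage carries the mutation, not the sample.
        return "ancestral (reference carries derived allele)"
    return "derived in sample"


# Hypothetical positions, not real calls.
print(polarise(sample="T", reference="A", ancestral="A"))  # genuine derived difference
print(polarise(sample="A", reference="G", ancestral="A"))  # differs from hg38 but ancestral
```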

I've sent FTDNA a query asking for their view of A12974 and whether they'd consider making it available for single-SNP testing, but they have yet to reply to either question. Might ask FGC as well.

I asked a few people on this branch who had kits uploaded to YFull to join the YFull I1-Z140 group, and most of them have. It gives a clearer picture of what YFull is looking at from a phylogenetic-consistency perspective. This is especially true for those who have done a Big Y Y700 and added it to an earlier Big Y Y500, since the later test reads A12974 much better.

mwauthy
01-30-2020, 12:18 AM
I’m in the same boat as you are. I’d like to have a better understanding of the criteria for differences in quality as well.

I recently learned that some SNPs in the combBED regions are not suitable for analysis either, so I stopped trying to do rough age estimates for people’s private SNPs that were in the combBED regions.

Another question I have: why are some of the novel SNPs that carry the red “H” (because they are homologous) listed in the “Acceptable” category? My first cousin had one of those. When I uploaded my Big Y-700 results to YFull, that homologous novel SNP disappeared from his novel section, but it wasn’t added to the haplotree either. I'm not sure why it was considered acceptable at first but not afterwards.

deadly77
01-30-2020, 09:16 AM
Yes, I have some of those too. It seems being in the combBED region is not the only criterion YFull uses to decide whether a SNP is suitable for age estimation. The "x Known SNPs" and "x Novels" tabs under "Age Estimation" are helpful. From there, I can see several SNPs that YFull lists as phyloequivalent with the I-Z58 branch, but they're excluded because they have more than five different localizations. For example, S10311 is in the combBED region but is also found in 349 haplogroups/subclades other than mine, so there are too many examples of recurrence. Another one (Y125444) is classed as an INDEL.
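
The exclusion rules described above (combBED membership, more than five localizations, INDELs) can be expressed as a simple filter. A Python sketch, assuming these are the only criteria, which they very likely aren't; the limit of five comes from the post, and the S10311 count is the 349 other haplogroups/subclades plus mine. The `in_combbed` and `localizations` values for Y125444 are placeholders, since only its INDEL status is stated.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    name: str
    in_combbed: bool
    localizations: int  # number of distinct places on the tree where it occurs
    is_indel: bool = False


MAX_LOCALIZATIONS = 5  # "more than five different localizations" is excluded


def usable_for_age(s: Snp) -> bool:
    """Assumed criteria only: in combBED, not an INDEL, at most five localizations."""
    return s.in_combbed and not s.is_indel and s.localizations <= MAX_LOCALIZATIONS


candidates = [
    Snp("S10311", in_combbed=True, localizations=349 + 1),
    Snp("Y125444", in_combbed=True, localizations=1, is_indel=True),  # placeholders except INDEL
    Snp("hypothetical_snp", in_combbed=True, localizations=1),
]
for s in candidates:
    print(s.name, usable_for_age(s))
```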

For the SNPs tagged as homologous (the novel SNPs with the red “H” next to them), I believe YFull once said that they use a "soft reject" for these cases: they report them in novel SNPs but don't add them to the tree or use them for age estimation or phylogeny. I think the situation with your cousin's SNP is similar to what I saw with FGC74335 in post #2: it was initially reported among my novel SNPs because the position was read in my test but not in any of the other Big Y Y500 tests on I-Y33691 or I-Y12329. Since two of those kits upgraded to Y700, I can see they now show a positive result for FGC74335. So FGC74335 has disappeared from my novel SNPs because it's no longer found only in my sample, but it hasn't been added to the YFull tree because it's homologous. FTDNA is listing FGC74335 on their haplotree, though.
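
The "soft reject" behaviour described above can be sketched as a small status check: a variant derived only in your kit shows up as novel (flagged if homologous), and once another kit shares it, it either becomes a tree candidate or, if homologous, drops out of both lists. This Python sketch is my reading of the behaviour described in this thread, not YFull's code, and the kit names are hypothetical.

```python
from typing import Dict, Optional


def variant_status(calls: Dict[str, Optional[str]], my_kit: str,
                   homologous: bool) -> str:
    """calls: kit -> "+", "-" or None (position not read)."""
    derived = {kit for kit, call in calls.items() if call == "+"}
    if my_kit not in derived:
        return "not derived in this kit"
    if derived == {my_kit}:
        return "novel (flagged H)" if homologous else "novel"
    # Shared with at least one other kit.
    return "soft reject: not added to tree" if homologous else "candidate for tree"


# FGC74335-like scenario with hypothetical kit names.
before = {"mine": "+", "kitB": None, "kitC": None}  # other kits are Y500, no read
after = {"mine": "+", "kitB": "+", "kitC": "+"}     # after Y700 upgrades
print(variant_status(before, "mine", homologous=True))
print(variant_status(after, "mine", homologous=True))
```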

mwauthy
01-30-2020, 02:38 PM
I was looking at my first cousin's age estimate on YFull, and it shows that he has 8 SNPs associated with the I-Z58 subclade branch that were used for his age estimate. Is there a reason why YFull has not added them to their haplotree for the I-Z58 branch?

deadly77
01-30-2020, 03:51 PM
I had never noticed that before, but I see the same thing, with eight SNPs listed at the I-Z58 level on the "+ known SNPs" tab. Also, it seems that Z58 itself is weighted 0.225 to its own branch but, more weirdly, 0.775 to I-Z138. That doesn't make sense at all. They are also listing Z2721 on the I-Z17954 branch. If I look up Z2721 on my YReport, it's listed on the I1 branch (i.e., upstream of I-DF29 and I-Z17954), and that appears to be reflected on the YFull tree as well.
I looked up the first of the eight SNPs listed at the I-Z58 level, M11767, and it looks like it's on the YFull tree at A0 and A1b, but then it's listed for seemingly every other subclade haplogroup (below what's in the screenshot, the list scrolls down for a long, long way). Looking it up under check SNPs, it seems that Z40392 is C to T (1-star rating) while M11767 / Z12032 is T to C (no rating), although both mutations are listed together on the age estimate as if they were the same SNP. Either way, I don't see what it has to do with the I-Z58 branch.
This is all a bit bizarre and I'm not sure how to rationalize any of that. The only thing I can think of is that YFull is making some changes and hasn't finished yet, so we're looking at a work in progress.

deadly77
01-30-2020, 03:52 PM
I looked back at this "+ Known SNPs" tab in a screenshot that I took in 2018. This one makes a lot more sense to me.

JMcB
01-30-2020, 04:36 PM
Not surprisingly, I’ve got them, too.

I have another one they list as a known SNP used for dating (it's in the combBED region): Z27279, which is added to my 8 NV. When I look it up, it says:

Z27279 D-N1 Yseq tested: 1 (positive: 0) .BAM Rating for known SNP (three stars)

When I search on the tree it says:

− Z27279 D-N1

− Z27279 I-A13248*
This position for SNP is not in the YTree

So it’s in both my branch and haplogroup D-N1.

deadly77
01-30-2020, 06:09 PM
Yep, I have a similar one: a SNP that was first listed among my novel SNPs with the note "additional localizations for variant:1". It was added to YBrowse in 2017 in haplogroup O, before my own results came in. YFull later placed it on the tree at O-Y81597, where there are two samples. According to the tree, it's also found in I-M223* as well as my own branch (although it isn't on the tree for those latter two branches). After that, it was moved from my Novel SNPs to "Hg and SNPs" with a three-star rating. It's in the combBED region, so it counts towards age estimation on the "+ Known SNPs" tab.

The YFull FAQ says that they allow a SNP if it's found in five or fewer "localizations", while SNPs found on more than five different branches are excluded from the age estimation calculation: https://www.yfull.com/faq/what-yfulls-age-estimation-methodology/
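The FAQ rule, combined with the other exclusions mentioned earlier in the thread (INDELs, homologous SNPs, positions outside combBED), can be sketched as a simple filter. This is a hypothetical illustration, not YFull's implementation: the `KnownSnp` type, its field names and `usable_for_age_estimate` are invented for the example, and any localization count not stated in the thread is labeled as an assumption or placeholder.

```python
from dataclasses import dataclass

# Threshold from the YFull FAQ: five or fewer localizations are allowed.
MAX_LOCALIZATIONS = 5


@dataclass(frozen=True)
class KnownSnp:
    """Hypothetical record for a known SNP on the age-estimation tab."""
    name: str
    in_combbed: bool      # position falls inside the combBED region
    localizations: int    # number of distinct tree branches where it occurs
    is_indel: bool = False
    homologous: bool = False


def usable_for_age_estimate(snp: KnownSnp) -> bool:
    """Assumed exclusion rules gathered from this thread, not YFull's actual code."""
    return (
        snp.in_combbed
        and not snp.is_indel
        and not snp.homologous
        and snp.localizations <= MAX_LOCALIZATIONS
    )


snps = [
    # Assumed count: 349 other haplogroups/subclades plus this branch.
    KnownSnp("S10311", in_combbed=True, localizations=350),
    # Only the INDEL status comes from the thread; the other values are placeholders.
    KnownSnp("Y125444", in_combbed=True, localizations=1, is_indel=True),
    # The unnamed SNP from the post above: O-Y81597, I-M223*, and the poster's branch.
    KnownSnp("hypothetical_snp", in_combbed=True, localizations=3),
]
print([s.name for s in snps if usable_for_age_estimate(s)])  # ['hypothetical_snp']
```

Under these assumptions, S10311 drops out on recurrence and Y125444 on being an INDEL, while a combBED SNP seen on three branches still counts towards the age estimate.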