What is ZZ11 and what do we do with it?



TigerMW
04-20-2016, 04:12 PM
Alex Williamson discovered and documented ZZ11 in the Big Tree. It is in the MRCA of both DF27 and U152 and ZZ11 is a descendant of P312.

http://ytree.net/DisplayTree.php?blockID=2&star=false

http://ybrowse.y-chromosome.org/gb2/gbrowse_details/chrY?ref=ChrY;start=22286799;end=22286799;name=ZZ11_1;class=Sequence;feature_id=10824;db_id=chrY%3Adatabase

Unfortunately ZZ11_1 22286799-C-G sits in the DYZ19 region.

ISOGG describes ZZ SNPs in this way,
"ZZ = Alex Williamson. Mutations in palindromic regions. Each ZZ prefix represents two possible SNP locations."

We don't have a location for ZZ11_2 documented in YBrowse. Can ZZ11_2's position-anc-der be derived or predicted?

Most testing institutions don't really like this kind of SNP. It is a risky SNP, no doubt. If I thought I was ZZ11* (DF27- U152-) it would be especially risky, and the same would be the case if I thought I was P312*. It might have been wiped out.

Given all of that, I don't know that ZZ11 does much in terms of making formal, conservative trees, but that does not eliminate its importance in the phylogeny. It pretty much proves DF27 and U152 are more closely related to each other than to L21, DF19, DF99 or L238, right?

I don't think ZZ11 should be included in a SNP testing approach based on traversing the tree, but I could be wrong.

SNP Packs are not tree traversing; they are "all in one" fell swoops. I don't know if FTDNA would accept ZZ11 but we could try to get it in something. The likely places would be the R1b-M343 Backbone or the R1b-P312 Packs. What do you think of trying to get this placed in a pack?

swid
04-20-2016, 04:25 PM
It doesn't show up when you expand out the ZZ11 branch on its own Big Tree page, but Alex does also list Z38841 as a second mutation on that block on the main page itself; unfortunately, it's equally "risky" as it's really an STR.

razyn
04-20-2016, 05:03 PM
OK, there are 15 results for P312 SNP packs that I can see in the DF27 project. One is U152 and shouldn't be in it, but I can see his result, so I checked.

And what I see is, the test "worked" for 12 of the 15, including the U152 guy: ZZ11*, in which the asterisk means "no-call or heterozygous result." And a heterozygous result there is the call that one should expect, in a sample positive for this elusive SNP.

So, the pack is testing for it, and mostly returning what looks like the right call for these guys. There may be some mystery as to why three samples (42481, 52129 and 114395) didn't get a call for it. But the P312 pack is in its earliest days, so maybe that's just "noise." Anyway, I'm pleased that the effort is being made.

lgmayka
04-20-2016, 05:13 PM
There may be some mystery as to why three samples (42481, 52129 and 114395) didn't get a call for it.
The "mystery" is that it is very prone to back-mutation. Both members of R-Y18894 (https://yfull.com/tree/R-Y18894/) are entirely ancestral (396C and 392C) at 22286799.

The practical danger is that naive customers might actually take the ZZ11 level seriously: A customer might test ZZ11- and assume that he is also DF27- U152-, which is not a safe conclusion.

TigerMW
04-20-2016, 05:36 PM
It doesn't show up when you expand out the ZZ11 branch on its own Big Tree page, but Alex does also list Z38841 as a second mutation on that block on the main page itself; unfortunately, it's equally "risky" as it's really an STR.
I can't find Z38841's positionnum-anc-der details anywhere. Do you have them?

TigerMW
04-20-2016, 05:42 PM
The "mystery" is that it is very prone to back-mutation. Both members of R-Y18894 (https://yfull.com/tree/R-Y18894/) are entirely ancestral (396C and 392C) at 22286799.
Can you put some kind of probability or range around the words "very prone"? That's hard to use in decision-making. There is an SNP in U152 that was rejected by ISOGG as it sits in an X DNA similar region but the X DNA related mutation is thought to have happened 300 to 400 thousand years ago. That's not too bad for our purposes.

I really do not know the answer but is the location really prone to back mutation or is the issue that some testing systems (maybe all today) have a hard time sequencing the DYZ19 region?


The practical danger is that naive customers might actually take the ZZ11 level seriously: A customer might test ZZ11- and assume that he is also DF27- U152-, which is not a safe conclusion.

Agreed. I'm not sure this can or ever should make the haplotree, but it's definitely worthwhile to know, from an ancient research point of view, that U152 and DF27 are closely related.

TigerMW
04-20-2016, 05:59 PM
OK, there are 15 results for P312 SNP packs that I can see in the DF27 project. One is U152 and shouldn't be in it, but I can see his result, so I checked.

And what I see is, the test "worked" for 12 of the 15, including the U152 guy: ZZ11*, in which the asterisk means "no-call or heterozygous result." And a heterozygous result there is the call that one should expect, in a sample positive for this elusive SNP.

So, the pack is testing for it, and mostly returning what looks like the right call for these guys. There may be some mystery as to why three samples (42481, 52129 and 114395) didn't get a call for it. But the P312 pack is in its earliest days, so maybe that's just "noise." Anyway, I'm pleased that the effort is being made.
I think you are saying the test results were "no calls" for 42481, 52129 and 114395. That's different than an ancestral call. I'm a little concerned about testing and consulting institutions in terms of our assumptions on ancestral values and ambiguous calls. Ambiguity implies we can't be sure. An interpretation is just an interpretation. Different types of testing systems might shed light on ambiguities.

However, this is the kind of bad news I was expecting on experimental SNPs. I just checked. ZZ11 is no longer in the R1b-P312 SNP Pack list. Remember, I said that FTDNA was scrubbing their SNP list in the first quarter of this year. ZZ11 may be a casualty. Rather than struggle through these interpretations in difficult regions, some things will be thrown out the door.

swid
04-20-2016, 06:01 PM
I can't find Z38841's positionnum-anc-der details anywhere. Do you have them?

You'll get text with information if you hover over Z38841 on the main page: "Change from 10 copies of GAATG to 9 copies at position 13840815."; it's also listed on the mutations page (http://ytree.net/SNPIndex.php): 13840815-CGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG-CGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG
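As a quick sanity check on that allele notation, here is a minimal sketch that counts the GAATG copies in each allele string. It assumes the leading C is just an anchor base ahead of the repeat, and it builds the two alleles from the hover text (10 copies and 9 copies) rather than from a pasted sequence. The function name `count_tandem_repeats` is made up for illustration and is not part of any tool mentioned in this thread.

```python
def count_tandem_repeats(allele: str, unit: str = "GAATG", anchor: str = "C") -> int:
    """Count consecutive copies of `unit` that follow the anchor base."""
    # Forum software sometimes inserts stray spaces into long strings; drop them.
    allele = "".join(allele.split())
    if not allele.startswith(anchor):
        raise ValueError("allele does not start with the expected anchor base")
    rest = allele[len(anchor):]
    count = 0
    while rest.startswith(unit):
        count += 1
        rest = rest[len(unit):]
    if rest:
        raise ValueError(f"unexpected trailing sequence: {rest!r}")
    return count


# Assumption: alleles reconstructed from the hover text, not copied from a file.
ancestral = "C" + "GAATG" * 10
derived = "C" + "GAATG" * 9

print("ancestral copies:", count_tandem_repeats(ancestral))
print("derived copies:", count_tandem_repeats(derived))
```

A one-copy difference in a short tandem repeat is what makes this marker behave like an STR rather than a plain SNP.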

razyn
04-20-2016, 06:17 PM
I can't find Z38841's positionnum-anc-der details anywhere. Do you have them?
YFull search says 17673197, C to T

TigerMW
04-20-2016, 06:31 PM
You'll get text with information if you hover over Z38841 on the main page: "Change from 10 copies of GAATG to 9 copies at position 13840815."; it's also listed on the mutations page (http://ytree.net/SNPIndex.php): 13840815-CGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG-CGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATG
There are several rs mutations in YBrowse in this area.

http://ybrowse.y-chromosome.org/gb2/gbrowse_details/chrY?ref=ChrY;start=13840815;end=13840820;name=rs374883171;class=Sequence;feature_id=362105;db_id=chrY%3Adatabase

http://ybrowse.y-chromosome.org/gb2/gbrowse_details/chrY?ref=ChrY;start=13840816;end=13840816;name=rs146438935;class=Sequence;feature_id=362106;db_id=chrY%3Adatabase

Z38841 itself was apparently not considered for YBrowse.

lgmayka
04-20-2016, 09:21 PM
I really do not know the answer but is the location really prone to back mutation or is the issue that some testing systems (maybe all today) have a hard time sequencing the DYZ19 region?
Back-mutation certainly occurred in the line leading to the R-Y18894 clade (https://yfull.com/tree/R-Y18894/) (tested 396C and 392C).

Back-mutation certainly occurred in YF05239 (tested 197C), which is directly under R-L2 (https://yfull.com/tree/R-L2/).

Back-mutation certainly occurred in YF04510 (tested 340C), which belongs to R-FGC14124* (https://yfull.com/tree/R-FGC14124/).

Thus, 4 customer examples just from the members of YFull's U152 and DF27 groups.

razyn
04-22-2016, 04:08 PM
I don't exactly disagree with lgmayka's perception that ZZ11 is a bit chimerical. But I think that characteristic is hardly limited to ZZ11, among all SNPs -- and not a reason to ignore something that altered the phylogeny, at a time and place several thousand years (and miles) from where we currently stand. It's highly useful, if present. We may continue to disagree about its proper use; or the horrible consequences either of failing to test for it, or testing and getting the wrong answer; or the threat of wasting a few hundred dollars here and there.

Anyway, last night I went through the exercise of checking all the results so far in the P312 project for the P312 SNP Pack -- the only test that purports actually to look for ZZ11. I had previously done this in the DF27 project, where we had ZZ11* (no call or heterozygous call) for 11/14 DF27 guys, and 1/1 U152 guy who should not have been in that project anyhow. In my enthusiasm, I mistook that for a good result, assuming that they all should (at best) show heterozygosity. Having looked at the P312 guys, my enthusiasm has waned. I think the test isn't targeting accurately enough to get a credible read. Perhaps those asterisks were about evenly divided between no-calls and mixed calls. Anyway, with a couple dozen more tests (only a few of whom were DF27 or U152), everyone who got any call at ZZ11 got the asterisk -- 17 out of the 25 I hadn't already examined in the DF27 project. The other 8 had no result reported for ZZ11.

Of the four new U152 guys, both who were L2+ had the ZZ11*, and both who were PF6658+ had no result. Does that mean anything, really? I kind of doubt it. A case might be made that the little-tested Z29645>S27900 is consistently ZZ11*; and maybe even that DF99 is. But, does that mean anything? I doubt it.

I think we just need a better test than the P312 chip offers at present; or more enriched data for that problematic region, before betting the farm on its validity OR throwing out the baby with the bath water. And I sort of wish Alex would join this discussion.

TigerMW
04-22-2016, 04:36 PM
Back-mutation certainly occurred in the line leading to the R-Y18894 clade (https://yfull.com/tree/R-Y18894/) (tested 396C and 392C).

Back-mutation certainly occurred in YF05239 (tested 197C), which is directly under R-L2 (https://yfull.com/tree/R-L2/).

Back-mutation certainly occurred in YF04510 (tested 340C), which belongs to R-FGC14124* (https://yfull.com/tree/R-FGC14124/).

Thus, 4 customer examples just from the members of YFull's U152 and DF27 groups.

How do you know certainly that these are back mutations versus bad (ambiguous) reads that are not just reflective of a testing technology or interpretation method that struggles with this portion of the DYZ19 region?

lgmayka
04-22-2016, 05:22 PM
How do you know certainly that these are back mutations versus bad (ambiguous) reads that are not just reflective of a testing technology or interpretation method that struggles with this portion of the DYZ19 region?
The reads were not ambiguous. I already cited them: 396C, 392C, 197C, and 340C.

Yes, one could claim that the Big Y somehow always reads this location wrong. But that makes the location rather worthless anyway, at least for the near future.

Yes, one could claim that ZZ11 is not where we think it is. But then where or what is it? If no one can define it unambiguously, it is rather worthless for scientific purposes.

lgmayka
04-22-2016, 05:24 PM
We may continue to disagree about its proper use; or the horrible consequences either of failing to test for it, or testing and getting the wrong answer; or the threat of wasting a few hundred dollars here and there.
The examples of back-mutation I cited are not "wrong" answers. They reflect the reality of a finicky location that apparently often back-mutates (probably due to copy-over from elsewhere).

razyn
04-22-2016, 05:43 PM
The examples of back-mutation I cited are not "wrong" answers. They reflect the reality of a finicky location that apparently often back-mutates (probably due to copy-over from elsewhere).
I wasn't talking about your four examples; I believe in back mutations, or return to a status quo ante. I was talking about the results from the three dozen tests with the P312 SNP pack, the only ones to my knowledge that have actually purported to look for it.

Also, Alex lists several places to look for evidence of it, not just one. I suspect this SNP pack only looks at one; and for that matter I'm pretty sure the YFull data you cite is only looking at one.

Also, if it's a real event with phylogenetic consequences, it probably has several equivalents -- some or one of which may be easier to test reliably (and may be more stable, over four or five millennia).

So, hold off on the hyperbolic phrases like "worthless for scientific purposes," please. Scientific purposes include continuing to look for the branching points; if the ZZ11 event is indeed a major one, for the early Bronze Age peopling of central and western Europe, we need to find ways to look for it. And if actually looking for it proves that it's not such a branching point, I'll shut up. That hasn't really happened, yet.

Williamson
04-22-2016, 06:07 PM
We don't have a location for ZZ11_2 documented in YBrowse. Can ZZ11_2's position-anc-der be derived or predicted?

There is at least a ZZ11_2, and I suspect there may be a ZZ11_3 as well. There is a large portion of DYZ19 that has not been mapped. That region is still sequenced by BigY and FGC tests, and the reads from those regions simply get mapped to regions of the reference sequence that most closely resemble them. Judging what I can from the BAM files available to me, it looks as though reads from 2 or 3 different regions of the Y chromosome are being aligned to the region around 22286799. Only 1 of those regions has the ZZ11 mutation. Until a more complete reference sequence for DYZ19 becomes available, it isn't possible to assign positions to a ZZ11_2 or a ZZ11_3.



Most testing institutions don't really like this kind of SNP. It is a risky SNP, no doubt. If I thought I was ZZ11* (DF27- U152-) it would be especially risky, and the same would be the case if I thought I was P312*. It might have been wiped out.

There would be no concern of being ZZ11*, but being P312* would definitely be a concern. We should try to estimate some sort of back mutation rate for this SNP. That would allow us to better quantify our concerns. Ideally, most of our P312 branches would have other branches as well, which formed early on. The odds of recLOH on all the branches would be fairly low. Just a single P312* guy is a real problem.



Given all of that, I don't know that ZZ11 does much in terms of making formal, conservative trees, but that does not eliminate its importance in the phylogeny. It pretty much proves DF27 and U152 are more closely related to each other than to L21, DF19, DF99 or L238, right?

I don't think ZZ11 should be included in a SNP testing approach based on traversing the tree, but I could be wrong.


If we do believe the mutation exists and we can confidently place it on the tree then it should be on the tree even if not all the downstream branches are going to remain positive for it. I think those should be the only conditions.

When it comes to determining if someone belongs to the R-ZZ11 branch it makes far more sense to test for mutations that are thought to be more stable that are known to be downstream of ZZ11. There is no need to confuse the issue of what branching and mutations happened throughout history with what mutations we can reliably expect to find today.


SNP Packs are not tree traversing; they are "all in one" fell swoops. I don't know if FTDNA would accept ZZ11 but we could try to get it in something. The likely places would be the R1b-M343 Backbone or the R1b-P312 Packs. What do you think of trying to get this placed in a pack?

If it can be tested reliably, then it should be included. If it can't, don't include it. Back mutations are good mutations too, but they do require FTDNA to up their game when it comes to interpreting the results.

TigerMW
04-22-2016, 06:18 PM
The reads were not ambiguous. I already cited them: 396C, 392C, 197C, and 340C.

Yes, one could claim that the Big Y somehow always reads this location wrong. But that makes the location rather worthless anyway, at least for the near future.

Yes, one could claim that ZZ11 is not where we think it is. But then where or what is it? If no one can define it unambiguously, it is rather worthless for scientific purposes.

Let me just use the words "in error". It is possible that the system, maybe any testing system, would have trouble with this particular part of the Y chromosome.

I'm distinguishing between a truly unstable mutation and a difficult/technically infeasible mutation to read (today!).

If so, then I agree with Razyn, we don't want to throw the baby out with the bath water. Let's save our knowledge of this mutation for a day when testing systems can handle it. Perhaps more importantly, we can be very confident that U152 and DF27 are brothers, whereas L21, DF19, DF99 and L238 are their cousins. The tree of paternal lineages is what it is regardless of our current abilities to deal with the complexities of the Y chromosome.

Our tree description needs enhancements. There are some branches that are real but aren't feasible to test for reliably today. Call it a phantom SNP or whatever, it's still a real branch. Razyn, take note again. ;) I do care about the real phylogeny, not just a constructed model based on simple notions.

Williamson
04-22-2016, 06:25 PM
Can you put some kind of probability or range around the words "very prone"? That's hard to use in decision-making.

I agree with Mike, it would be great if these issues could be quantified more. Unfortunately it is hard to do. Right now, on my Big Tree, I have 649 ZZ11 men. In my database of results, I see 319 positive results. It shows up maybe 10% of the time in 1000 genomes VCF files, half the time in BigY VCF files and never in FGC interpretation files. I have BAM files for about 15% of the BigY/FGC men on my tree so I could check many of those. That would still leave many unknown.

YFull would probably be a better option, even though there are fewer men. I may yet do that.


Back-mutation certainly occurred in the line leading to the R-Y18894 clade (https://yfull.com/tree/R-Y18894/) (tested 396C and 392C).

Back-mutation certainly occurred in YF05239 (tested 197C), which is directly under R-L2 (https://yfull.com/tree/R-L2/).

Back-mutation certainly occurred in YF04510 (tested 340C), which belongs to R-FGC14124* (https://yfull.com/tree/R-FGC14124/).

Thus, 4 customer examples just from the members of YFull's U152 and DF27 groups.

Are these the only 3 branches with back mutations in the YFull data? How many DF27 and U152 men are there? I can take a look and count as well.

Williamson
04-22-2016, 06:55 PM
How do you know certainly that these are back mutations versus bad (ambiguous) reads that are not just reflective of a testing technology or interpretation method that struggles with this portion of the DYZ19 region?

I trust these numbers. BigY tests have great coverage in the region around 22286799, and that position is also the best place for ZZ11+ reads to align. The reads with the ZZ11 mutation are generally about 1/3 to 1/2 of them. If they were there, they'd show up.
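To put a rough number on "if they were there, they'd show up", here is a back-of-the-envelope sketch. It assumes each read at 22286799 is an independent draw that carries the derived allele with probability 1/3 (the low end of the range quoted above) in a ZZ11+ man. Independence is a simplification made here for illustration and is not something established in this thread, and the function name is hypothetical.

```python
def prob_no_derived_reads(n_reads: int, derived_fraction: float) -> float:
    """Chance of seeing zero derived reads in n_reads if the mutation is present.

    Assumes each read independently shows the derived allele with
    probability `derived_fraction` (a simplifying assumption).
    """
    if not 0.0 <= derived_fraction <= 1.0:
        raise ValueError("derived_fraction must be between 0 and 1")
    return (1.0 - derived_fraction) ** n_reads


# Read depths cited earlier in the thread: 396C, 392C, 197C and 340C.
for depth in (396, 392, 197, 340):
    p = prob_no_derived_reads(depth, 1 / 3)
    print(f"{depth} reads, all ancestral: chance of a missed mutation ~ {p:.1e}")
```

Even at the shallowest of those depths the chance of missing a mutation that is really there comes out vanishingly small under these assumptions, which is why an all-ancestral pileup at that depth reads as a genuine loss of the mutation rather than a sampling fluke.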

TigerMW
04-22-2016, 07:02 PM
I trust these numbers. BigY tests have great coverage in the region around 22286799, and that position is also the best place for ZZ11+ reads to align. The reads with the ZZ11 mutation are generally about 1/3 to 1/2 of them. If they were there, they'd show up.
So these really are back-mutations, then, right, since the sequencing is not likely out of whack?

Then the question is all about the mutation rate for these locations and the duration for which we need a low probability of mutation.

Williamson
04-22-2016, 07:08 PM
The "mystery" is that it is very prone to back-mutation. Both members of R-Y18894 (https://yfull.com/tree/R-Y18894/) are entirely ancestral (396C and 392C) at 22286799.

The practical danger is that naive customers might actually take the ZZ11 level seriously: A customer might test ZZ11- and assume that he is also DF27- U152-, which is not a safe conclusion.

Could you explain what you mean by "take the ZZ11 level seriously"? Either there is a mutation that happened between P312 and DF27 and U152 or there isn't. It sounds like you're suggesting there isn't. Or are you suggesting we just don't have enough data yet?

As for testing ZZ11- and assuming he is also DF27- and U152-, I agree with you that's a problem. I think it should become better known that one shouldn't put all their faith in any single mutation. There are R-DF13 men right now who are in fact DF13- after having lost that mutation. Had there not been additional downstream SNPs, they too might have come to the wrong conclusion.

TigerMW
04-22-2016, 07:12 PM
I trust these numbers. BigY tests have great coverage in the region around 22286799, and that position is also the best place for ZZ11+ reads to align. The reads with the ZZ11 mutation are generally about 1/3 to 1/2 of them. If they were there, they'd show up.
Does checking the opposite strand help in any manner?

TigerMW
04-22-2016, 07:28 PM
... There are R-DF13 men right now who are in fact DF13- after having lost that mutation. Had there not been additional downstream SNPs, they too might have come to the wrong conclusion.
I thought there was something funky with DF13 going on but thought it must have just been lab error or Geno 2 integration screw-ups. Does Z2542 seem to work better? I remember FTDNA didn't want to do DF13 back in the M343 Backbone Pack a year ago and chose Z2542 instead.

Williamson
04-22-2016, 07:50 PM
No. I believe any knowledge of what strand a read came from is only determined by mapping that read somewhere. In this case, it would amount to the same thing as looking at different positions within DYZ19. Some reads with the ZZ11 mutation do in fact get mapped to alternate positions, but that number is typically small compared to what is mapped to 22286799. Shorter reads have more ambiguity than the longer 165bp Big Y reads.

Williamson
04-22-2016, 08:15 PM
I thought there was something funky with DF13 going on but thought it must have just been lab error or Geno 2 integration screw-ups. Does Z2542 seem to work better? I remember FTDNA didn't want to do DF13 back in the M343 Backbone Pack a year ago and chose Z2542 instead.

It's an interesting question. I don't know the answer. DF13 shows up more consistently in BigY results than Z2542 does. If there were problems with Z2542, I wouldn't be able to identify them as easily as I can with DF13.

Williamson
04-22-2016, 08:49 PM
So these really are back-mutations, then, right, since the sequencing is not likely out of whack?

Then the question is all about the mutation rate for these locations and the duration for which we need a low probability of mutation.

Yes, that's the case. "Back mutation" here needs to be understood to refer to a few different mechanisms for losing the mutation, but they all have the same net effect.

We can try to estimate a rate until a more detailed analysis can be done. For argument's sake, let's say we currently identify one ZZ11 back mutation for every 100 R-ZZ11 men we look at. ZZ11 is about 4500 years old, so that would be 180 generations at 25yrs/generation. I want to estimate the number of transmission events (father to son) since ZZ11 within that group of 100 men. I'm going to say 120 per man on average, or 12,000 for the group. So that would give one back mutation for every 12,000 transmission events. I'm open to hearing different suggestions for these numbers.

If we consider a lone P312* guy, we can now estimate the chance he might belong to R-ZZ11 even if he tested ZZ11-. It would be just 180/12000 = 1.5%. It's not a small number and certainly larger than anyone would want it to be. Fortunately, we don't have many P312* men.
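The arithmetic above can be reproduced in a few lines. This is only a sketch using the for-argument's-sake numbers from this post (1 back mutation per 100 men, 120 transmission events per man, 4500 years at 25 years per generation). The variable and function names are invented here, and the compounded figure is shown alongside the simple ratio only for comparison.

```python
YEARS_SINCE_ZZ11 = 4500
YEARS_PER_GENERATION = 25
MEN_SAMPLED = 100
BACK_MUTATIONS_SEEN = 1          # assumed: one per 100 R-ZZ11 men
TRANSMISSIONS_PER_MAN = 120      # assumed average, as guessed in the post

generations = YEARS_SINCE_ZZ11 // YEARS_PER_GENERATION       # 180
total_transmissions = MEN_SAMPLED * TRANSMISSIONS_PER_MAN     # 12,000
rate_per_transmission = BACK_MUTATIONS_SEEN / total_transmissions


def loss_probability_linear(rate: float, n_generations: int) -> float:
    """Simple ratio used in the post: generations times per-event rate."""
    return n_generations * rate


def loss_probability_compound(rate: float, n_generations: int) -> float:
    """Chance of at least one loss along a single line of n_generations."""
    return 1.0 - (1.0 - rate) ** n_generations


linear = loss_probability_linear(rate_per_transmission, generations)
compound = loss_probability_compound(rate_per_transmission, generations)
print(f"generations: {generations}, transmissions: {total_transmissions}")
print(f"linear estimate:   {linear:.2%}")
print(f"compound estimate: {compound:.2%}")
```

With a rate this low the two estimates barely differ, so the simple 180/12000 ratio is a fine shortcut. Changing any of the three assumed inputs scales the answer roughly in proportion.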

To help the situation, we do actually have the INDEL, Z38841. I have asked for this INDEL to be included in YBrowse a couple times, but it doesn't seem to have made it yet. I think the problem with its inclusion is the way it is defined. The mutation is already in the reference sequence. Z38841 refers to a change from 10 repeats of GAATG to 9. The reference sequence already has 9 repeats.

Unlike ZZ11, Z38841 can be Sanger sequenced and can be ordered from YSEQ. I have had Thomas Krahn test me for it, and sure enough I have 10 repeats. (I belong to R-L21.) Thomas is reluctant to offer it as anything that will be useful for phylogeny, so it's not in the catalog. If you order a test for FGC20785, it will come up. Keep in mind, like ZZ11, we do expect some men to not have the expected value.

I'm sure we could estimate the mutation rate for Z38841 as well. Even if we assume it mutates 10 times as quickly as ZZ11, which I don't think it does, the two combined would greatly improve our confidence in predicting the true R-ZZ11 status for P312* men.
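Here is a rough sketch of why the two markers together help. It takes the illustrative ZZ11 rate of one loss per 12,000 transmission events, takes the pessimistic guess that Z38841 is lost 10 times as often, and assumes the two losses happen independently of each other. The independence assumption and all names in the code are introduced here for illustration. They sit at different positions on the chromosome, but nothing in this thread establishes that they are lost independently.

```python
GENERATIONS = 180                     # 4500 years at 25 years per generation
ZZ11_RATE = 1 / 12000                 # illustrative rate from the estimate above
Z38841_RATE = 10 * ZZ11_RATE          # pessimistic guess: 10 times as fast


def loss_probability(rate_per_transmission: float, generations: int) -> float:
    """Chance that a marker is lost at least once along one paternal line."""
    return 1.0 - (1.0 - rate_per_transmission) ** generations


p_zz11_lost = loss_probability(ZZ11_RATE, GENERATIONS)
p_z38841_lost = loss_probability(Z38841_RATE, GENERATIONS)

# Assumption: the two markers are lost independently of each other.
p_both_lost = p_zz11_lost * p_z38841_lost

print(f"ZZ11 lost:   {p_zz11_lost:.2%}")
print(f"Z38841 lost: {p_z38841_lost:.2%}")
print(f"both lost:   {p_both_lost:.2%}")
```

Under these assumptions, a man who is negative for both markers is several times less likely to be a hidden R-ZZ11 than a man who is negative for ZZ11 alone.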

As an FYI, I have checked the BAM files of both P312* men I know of, Keyes (104079) and Harbottle (278973), and both are negative for ZZ11 and Z38841.