There are many kits at FTDNA where ZP87 is negative while Z17 is positive.

I'm wondering why FTDNA considers ZP87 to be an equivalent to Z17. Surely an automated tree-building algorithm would have noticed this?

For my own savage amusement I'm trying to winnow out "impossibilities" within the FTDNA tree for Z18 (and below), and I'm starting with their idea of equivalent SNPs. If even one SNP / equivalent SNP pair is shown to be at odds, wouldn't that be an impossibility (if the equivalence relation holds)? And shouldn't an automated tree-building system (which I presume they have) be taking this into account?
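
A minimal sketch of that kind of check, assuming each kit's calls have already been loaded into a dict of SNP name to "+" or "-". The data layout, the function name, and the sample kits are all hypothetical; nothing here comes from FTDNA:

```python
def equivalence_violations(kits, snp_a, snp_b):
    """Return IDs of kits where two supposedly equivalent SNPs disagree."""
    bad = []
    for kit_id, calls in kits.items():
        a, b = calls.get(snp_a), calls.get(snp_b)
        # Only kits with a call for both SNPs can contradict the equivalence.
        if a is not None and b is not None and a != b:
            bad.append(kit_id)
    return sorted(bad)

# Made-up kits for illustration, not real FTDNA data.
kits = {
    "K1": {"Z17": "+", "ZP87": "-"},
    "K2": {"Z17": "+", "ZP87": "+"},
    "K3": {"Z17": "+"},  # no ZP87 call, so it proves nothing either way
}
print(equivalence_violations(kits, "Z17", "ZP87"))
```

A single kit in the returned list is enough to question the equivalence, provided the calls themselves can be trusted.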

ZP87 is an inconsistent call in BigY results. Until there is an individual SNP test result at FTDNA showing the Z17+, ZP87- result they will be considered equivalent. For some of this analysis you can use the hg37-based age comparison results from Iain McDonald. The comparison sheet run will tell you whether a specific call was consistent or not within the BigY data set. For U106 that was ~1100 BigY hg37 results.

Until there is an individual SNP test result at FTDNA showing the Z17+, ZP87- result they will be considered equivalent.

I suppose it's possible that the 108 FTDNA kits showing Z17+/ZP87- are all BigY results, and that none of them come from individual SNP tests or SNP packs.

I'm just looking at the SNP data pages that list kits and their associated SNPs for the Z18 project, as well as those Z18+ kits in the U106 project that are not in the Z18 project. There are about 660 or so kits in my analysis, and of these 108 are Z17+/ZP87-. There are just 14 that are Z17+/ZP87+. I'm not looking to duplicate the excellent analytical work done by Ray and Iain. My goal is to automate the finding of "impossibilities" within FTDNA's Z18 tree, and then pass them on to the appropriate parties so that what I see in the FTDNA tree on their site is at least logically consistent with their own SNP pages. :)
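
Counts like the 108 and 14 above are a cross-tabulation of two SNPs over all kits. A rough sketch of that tally, again assuming a hypothetical dict of per-kit calls and using made-up sample kits rather than the real project data:

```python
from collections import Counter

def cross_tab(kits, snp_a, snp_b):
    """Count kits by their (snp_a, snp_b) call pair; '?' means no call."""
    counts = Counter()
    for calls in kits.values():
        a = calls.get(snp_a, "?")
        b = calls.get(snp_b, "?")
        counts[(a, b)] += 1
    return counts

# Made-up kits for illustration only.
kits = {
    "K1": {"Z17": "+", "ZP87": "-"},
    "K2": {"Z17": "+", "ZP87": "+"},
    "K3": {"Z17": "+"},
    "K4": {"Z17": "+", "ZP87": "-"},
}
for (a, b), n in sorted(cross_tab(kits, "Z17", "ZP87").items()):
    print(f"Z17{a}/ZP87{b}: {n}")
```

Keeping the "?" bucket separate shows how many kits have no call at all for one of the two SNPs.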

Just something to while away the days while I'm waiting for my BAM file--- hopefully in the Spring of THIS year.


Because I was puzzled by YFull's use of the variant name FGC79182a instead of Z17, I recently asked them about it, and got a response. They said they couldn't use Z17 for some reason I did not understand, and then said something to the effect that they would use ZP87 instead. Since I'm just an amateur, I know I don't know as much as them, but ZP87 was not on my list of synonyms for Z17, so I decided to research it. What I found completely agrees with John's findings above, which leaves me scratching my head... For this research, I used the FTDNA database of about 800 records on this page: https://www.familytreedna.com/public/r-z18/default.aspx?section=ysnp
* First, ZP87 does not seem to be easy to determine reliably - of all the variants I was scanning for, only ZP87 was inconsistent. As we know, the records for many people contain the results of multiple yDNA tests, so we expect some redundancy, and hope that the results are always consistent (the same SNPs should always be positive or negative for the same person). But I found multiple records with both a ZP87+ and a ZP87-. I basically ignored those records, and relied only on the ones with consistent results.
* For all records with a reliable ZP87+, if there was a Z17, it was Z17+. However, the reverse was not true: there were many more records with Z17+ and ZP87-. That would indicate that R-ZP87 is a descendant clade below R-Z17.
* For all records with a reliable ZP87+, if there was a Z372, it was Z372+. I did not see any cases of a ZP87+ with a Z372- (can't totally guarantee there weren't though). That puts R-ZP87 equivalent to or below R-Z372 (and we know R-Z372 is below R-Z17).
* I spent a lot of time trying to pin ZP87 down to a fork below them, but gave up - too much work and time needed! But I did determine that ZP87 was negative for the branches of R-L257. That puts it on a side fork that splits off above L257, not on the direct line to R-L257, so it is not equivalent to Z17 or Z372 but a descendant of R-Z372.
* On that page, my own kit (670269) does not have a call for ZP87, but others with the same Haplogroup of R-S23346 have the same calls as I do, plus ZP87-.
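
The filtering and the "A+ always comes with B+" tests in the list above could be sketched roughly as follows. This assumes each record's SNPs arrive as one comma-separated string such as "Z17+, ZP87-"; that format, the function names, and the sample strings are assumptions for illustration. It also drops only the conflicting SNP rather than the whole record, which gives the same answer for any test involving that SNP:

```python
def reliable_calls(snp_string):
    """Parse 'Z17+, ZP87-' into {name: state}, dropping SNPs called both + and -."""
    seen = {}
    for token in snp_string.split(","):
        token = token.strip()
        if len(token) < 2 or token[-1] not in "+-":
            continue  # skip anything that is not NAME+ or NAME-
        seen.setdefault(token[:-1], set()).add(token[-1])
    return {name: next(iter(states))
            for name, states in seen.items() if len(states) == 1}

def implies(records, child, parent):
    """True if no record shows child+ together with parent-."""
    return not any(r.get(child) == "+" and r.get(parent) == "-"
                   for r in records)

# Made-up records for illustration only.
raw = [
    "Z17+, Z372+, ZP87+",
    "Z17+, Z372+, ZP87-",
    "Z17+, ZP87+, ZP87-",  # conflicting ZP87, so ZP87 is ignored here
]
records = [reliable_calls(s) for s in raw]
print(implies(records, "ZP87", "Z17"))  # is ZP87 at or below Z17?
print(implies(records, "Z17", "ZP87"))  # is Z17 at or below ZP87?
```

If the first test holds and the second fails, the data place ZP87 below Z17 rather than equivalent to it, which is the pattern described in the list.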

So at this time, I find it difficult to see how ZP87 could be considered an equivalent to Z17. Both FTDNA and YSEQ show them as equivalent in their trees.

John, I was happy to find this page, and see that I'm not the only one wondering what's up with this. Cofgene, I certainly agree ZP87 is an inconsistent call. But taking all the results statistically, and relying only on those that do *appear* to be valid, they seem to place it clearly below R-Z372. The fact that I could not find a record with a reliable determination of both ZP87+ and L257+ is pretty good evidence it's not on the direct line to R-L257. But yes, I'm an amateur, just my opinion ...

I have to say I'm very impressed with YFull's personal attention to any queries, very quickly too. I understand much better now why they make the decisions they make, and I've had to correct some wrong ideas I had. They pointed out their "Check SNP" function (my ignorance), and when I checked ZP87 on my 2 kits, both were clearly positive for ZP87! Now I have to figure out why FTDNA did not call it (and learn how to read VCF files). Both kits had at least 7 positive reads of ZP87.

They also said they don't have a single kit with both Z17+ and ZP87-, so that issue will have to wait. Since my own kits are positive for both, my thought that the L257 branches were negative for ZP87 was wrong. Their reason for choosing ZP87 over Z17 is that they have a kit placed downstream of this branch that does not have a read of Z17. I hope that someday they will allow "presumed positive" SNPs, when there are sufficient positive downstream SNPs.