Analysis of R-L21>Z253>DF73 and S933 (source data, YSNP prediction & SAPP charts)

10-27-2020, 11:25 PM
I am continuing to challenge the limits of YSNP prediction and YDNA charting. This analysis started out as DF73, but I had to reduce the scope to its son, S933.
There are several criteria that maximize the accuracy of both YSNP prediction and YDNA charting:

1) The haplogroup must be in the time frame of predictable haplogroups, which is between 1500 and 2500 YBP. Older haplogroups make YSTR signatures too small, and
charting is not possible because of the many hidden mutations in the upper parts of the chart. According to YFULL, DF73 is 2500 YBP, but it lists S933 as 2500 YBP as well.

2) The YSNP branch should have 10 to 20 branch equivalents in the top two levels of branching. This gives genetic isolation from other haplogroups. For haplogroup R,
the vast majority of the larger predictable haplogroups began to grow rapidly after a long bottleneck period.

3) Sample sizes must be between 50 and 100 testers, of whom 20 to 25 % have YSNP testing. This means you need 10 to 20 Big Y testers and 10 or so YSNP branches.
This seems to be a major limitation, as around half of the predictable haplogroups fail to be this large (but 90 % of the testers belong to the more prolific haplogroups).

4) You really need a minimum of six markers in the YSTR signature of the predictable haplogroup - this happens over 90 % of the time when other criteria are met.
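
As a rough sketch, the four criteria above can be written as a simple checklist. This is only an illustration, not part of the actual prediction or SAPP tooling: the class, field names and function are hypothetical, the 20 to 25 % YSNP share is treated as a minimum of 20 % (my assumption), and the example numbers are made up apart from the 2500 YBP age and the 23 testers mentioned in this post.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HaplogroupStats:
    # All field names are hypothetical, chosen for this illustration only.
    age_ybp: int             # estimated age in years before present
    branch_equivalents: int  # equivalent YSNPs in the top two levels of branching
    testers: int             # total testers in the sample
    ysnp_tested: int         # testers with YSNP results (e.g. Big Y)
    signature_markers: int   # markers in the YSTR signature


def check_criteria(s: HaplogroupStats) -> dict[str, bool]:
    """Return a pass/fail flag for each of the four criteria."""
    # Share of testers with YSNP testing; guard against an empty sample.
    share = s.ysnp_tested / s.testers if s.testers else 0.0
    return {
        "1) age 1500-2500 YBP": 1500 <= s.age_ybp <= 2500,
        "2) 10-20 branch equivalents": 10 <= s.branch_equivalents <= 20,
        "3a) 50-100 testers": 50 <= s.testers <= 100,
        # Assumption: 20 % is read as a floor, more YSNP testing is not penalized.
        "3b) at least 20 % YSNP tested": share >= 0.20,
        "4) 6+ signature markers": s.signature_markers >= 6,
    }


# Illustrative values only; just the age and tester count come from this post.
example = HaplogroupStats(
    age_ybp=2500,
    branch_equivalents=12,
    testers=23,
    ysnp_tested=6,
    signature_markers=6,
)
results = check_criteria(example)
failed = [name for name, ok in results.items() if not ok]
print(failed)
```

A haplogroup that fails one or more checks can still be analyzed, but the failed items show where to expect weaker prediction or charting.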

S933 was another challenge: it was hard to collect enough data, and the TMRCA estimates are still not certain. Most of the qualifications for S933 are pretty marginal, but YSNP prediction
seems to work (only at 80 % vs. the normal 99 % accuracy). Charting has some yellow flags as well (very large numbers of private YSTRs, and out of 23 known
confirmed and predicted testers, no two have the same surname). It seems that accuracy suffers quite a bit when the criteria are not well met, but the results are
still very useful for analysis (and beat most manual analysis).

Here are the files for the analysis:

High level summary of the analysis:


Source of all data related to analysis:

SAPP charting files (one input and two outputs):