New paper on YSNP prediction using signatures and genetic distance



RobertCasey
02-03-2018, 05:47 PM
I just completed a major update to this paper, with many significant improvements to binary logistic regression for YSNP prediction. For the first time, the model uses both signatures and genetic distance, and the first three haplogroups all reach 100% accuracy for 54,000 testers at 67 markers under haplogroup R. New models for L226, L555 and L371 are given as examples, and new easy-to-construct genetic filter tables have been created to visualize how these models work; the tables can be used for prediction without the supporting math models. At least 80% of all haplogroups across the entire genome can use this methodology to predict YSNPs with over 99% accuracy for haplogroups in the 1,200 to 2,500 year range. Feedback is welcome:

http://www.rcasey.net/DNA/R_L21/Math_behind_R_L21_SNP_Predictor_20180202A.pdf
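
For readers who want a feel for the idea before opening the PDF, here is a minimal sketch of a binary logistic model that combines a signature match count with genetic distance. It is only an illustration under stated assumptions, not the model from the paper: the marker names (STR_A to STR_D), the repeat values, and the coefficients b0, b_sig and b_gd are all made up, and genetic distance is taken as a simple step-wise sum of repeat differences.

```python
import math

def genetic_distance(haplotype, modal):
    """Step-wise genetic distance: sum of absolute repeat-count differences
    between a tester's haplotype and the haplogroup modal haplotype."""
    return sum(abs(haplotype[marker] - value) for marker, value in modal.items())

def signature_matches(haplotype, signature):
    """Number of signature markers on which the tester carries the signature value."""
    return sum(1 for marker, value in signature.items() if haplotype.get(marker) == value)

def predict_probability(haplotype, modal, signature, b0, b_sig, b_gd):
    """Binary logistic model: p = 1 / (1 + exp(-z)), where
    z = b0 + b_sig * (signature matches) + b_gd * (genetic distance)."""
    z = (b0
         + b_sig * signature_matches(haplotype, signature)
         + b_gd * genetic_distance(haplotype, modal))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical data: marker names and repeat values are invented for illustration.
modal = {"STR_A": 13, "STR_B": 25, "STR_C": 14, "STR_D": 10}   # haplogroup modal
signature = {"STR_B": 25, "STR_D": 10}                         # markers that differ from the parent modal

tester_close = {"STR_A": 13, "STR_B": 25, "STR_C": 15, "STR_D": 10}
tester_far = {"STR_A": 12, "STR_B": 24, "STR_C": 14, "STR_D": 11}

# Hypothetical coefficients; a real model would fit these to testers with known YSNP results.
B0, B_SIG, B_GD = -3.0, 2.5, -0.8

p_close = predict_probability(tester_close, modal, signature, B0, B_SIG, B_GD)
p_far = predict_probability(tester_far, modal, signature, B0, B_SIG, B_GD)
print(f"close tester: {p_close:.3f}, far tester: {p_far:.3f}")
```

A tester would be predicted positive when the probability is above some cutoff such as 0.5. In this toy setup each signature match pushes the probability up and each step of genetic distance pulls it down, which is the general shape of combining the two inputs in one model.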

Almagest
02-03-2018, 06:47 PM
How did you realise that the 1,200 to 2,500 year range worked best? I assume you compared the YSTR-to-YSNP prediction probabilities against testers who had tested positive for certain SNPs, and used the TMRCA of those SNPs.

RobertCasey
02-03-2018, 09:03 PM
During the "Walk the Y" testing era, I was able to analyze every new branch that was discovered under R-L21. It was obvious that YSNPs like L513, DF21, DF41, etc. could not be charted without multiple signatures, and there was still convergence within these haplogroups even with multiple signatures. I was also able to chart around 30 predictable YSNPs with high accuracy. These YSNPs now have pretty solid TMRCA estimates, which allowed me to derive this date range. I used to state between 1,500 and 2,500 years, but recent adjustments show that YSNP prediction works for somewhat younger haplogroups as well. Also, this is not an absolute range. If your signature is large enough and genetic isolation is very apparent, these limits will expand somewhat in either direction.

All 50 signatures that I have analyzed have pretty solid TMRCA estimates. Also, you really do not need to care about the upper limit of the TMRCA, as it is due to the bottleneck of YSNP equivalents for the haplogroup. You really should use the lower end of the range, when the YSNP became prolific, as 95 to 98% of the testers descend from the more recent date.