Search for largest predictable haplogroups under haplogroup R1b



RobertCasey
07-03-2021, 03:57 PM
I have now completed almost 50 reviews of predictable haplogroups under haplogroup R. However, I keep finding dozens of predictable haplogroups with only 5 to 10 branches, which are pretty small in scope. I am looking only for predictable haplogroups under haplogroup R (their TMRCA must be in the 1500 to 2200 YBP range for my prediction model, which uses binary logistic regression). These haplogroups can not only be predicted but can also be charted with SAPP. Here is my list of all predictable haplogroups with 50 or more branches (in order of size):

M222 - 1,093
L1065 - 593
CTS4466 - 405
Z255 - 317 - this haplogroup has some convergence
L226 - 222 - I am one of the admins for this project
L193 - 187
S844 - 83
Z375 - 77 - U106 branch
Z16506 - 54
CTS9881 - 51
FGC9798 - 50

R1 Basal and R1a are too old and do not have enough genetic isolation for much YSNP prediction. I was able to predict BY62252 under R1 Basal, but it has only two branches below it. Very little under R1 Basal is probably predictable. I was also able to predict YP358 under R1a, which has 18 branches; R1a probably has more predictable haplogroups, but not as many as P312/L21. U106 probably has several larger ones. It is also pretty old and lacks genetic isolation, but it should have significant YSNP prediction in the 1500 to 2500 YBP range. P312 and L21 can be predicted for over 50 % of the testers, but they have dozens of smaller haplogroups. I was able to predict around 75 % of Z253, but it took nine haplogroups to get this coverage.

Did I miss any larger predictable haplogroups in the 1500 to 2500 YBP range?

TigerMW
07-08-2021, 02:54 PM
What do you think of R1b-L513, which is a superset of L193?

It's probably more in the 4000 ybp range, but its signature of DYS406s1 <= 11 and DYS617 >= 13 works well. It is not 100% foolproof, as I've seen false predictions in M222 and CTS4466, among other places. However, they are fairly rare.
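As a rough illustration of that two-marker rule, here is a minimal Python sketch. The function name, the dictionary layout, and the sample haplotypes are made up for the example; only the two thresholds come from the post. As noted above, a rule like this can still produce occasional false positives.

```python
from typing import Mapping, Optional


def predicts_l513(haplotype: Mapping[str, int]) -> Optional[bool]:
    """Apply the two-marker rule: DYS406s1 <= 11 and DYS617 >= 13.

    Returns None when either marker is missing, because the rule
    cannot be evaluated without both values.
    """
    dys406s1 = haplotype.get("DYS406s1")
    dys617 = haplotype.get("DYS617")
    if dys406s1 is None or dys617 is None:
        return None
    return dys406s1 <= 11 and dys617 >= 13


# Hypothetical haplotypes, invented for illustration only.
samples = {
    "tester_a": {"DYS406s1": 11, "DYS617": 13},
    "tester_b": {"DYS406s1": 12, "DYS617": 12},
    "tester_c": {"DYS406s1": 11},  # DYS617 not tested
}

for name, markers in samples.items():
    print(name, predicts_l513(markers))
```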

I do think M222 will remain the clear winner.

RobertCasey
07-08-2021, 05:11 PM
For a haplogroup to be predictable with "binary logistic regression" models, it has to be in the 1500 to 2500 YBP range. There is just too much convergence above this time frame for accurate YSNP prediction. Almost all of the prediction models using two variables are over 99 % accurate these days; occasionally, accuracy drops to the 95 to 98 % range when significant convergence is found. I migrated to using both signature match and genetic distance from the signature in the model, which dramatically improves accuracy. You also need large blocks of YSNPs (many branch equivalents) at the two highest YSNP branches being predicted. This gives the haplogroup genetic isolation from nearby haplogroups. For much older haplogroups, the Athey prediction model works very well, but this model can usually only separate at the R-M269 level.
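For readers who have not seen a two-variable binary logistic regression, here is a minimal Python sketch of the general technique. It is not the actual model described above: the feature encoding (signature markers matched out of 10, and genetic distance from the signature), the scaling, the training settings, and every data point are assumptions invented for illustration.

```python
import math

# Hypothetical training data: (signature markers matched out of 10,
# genetic distance from the signature, 1 = tested positive for the YSNP).
TRAINING = [
    (9, 1, 1), (10, 0, 1), (8, 2, 1), (9, 2, 1),
    (3, 8, 0), (4, 7, 0), (2, 9, 0), (5, 6, 0),
]

SCALE = 10.0  # assumed scaling so both features fall roughly in 0..1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, learning_rate=0.5, epochs=2000):
    """Fit weights (w_match, w_distance, bias) by batch gradient descent."""
    w_match = w_dist = bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        g_match = g_dist = g_bias = 0.0
        for match, dist, label in rows:
            x1, x2 = match / SCALE, dist / SCALE
            error = sigmoid(w_match * x1 + w_dist * x2 + bias) - label
            g_match += error * x1
            g_dist += error * x2
            g_bias += error
        w_match -= learning_rate * g_match / n
        w_dist -= learning_rate * g_dist / n
        bias -= learning_rate * g_bias / n
    return w_match, w_dist, bias


def predict_proba(model, match, dist):
    """Estimated probability that a tester is positive for the YSNP."""
    w_match, w_dist, bias = model
    return sigmoid(w_match * match / SCALE + w_dist * dist / SCALE + bias)


model = train(TRAINING)
print(predict_proba(model, 9, 1))  # close match to the signature
print(predict_proba(model, 3, 8))  # poor match to the signature
```

With real data, convergence from neighboring haplogroups shows up as testers who match the signature well but test negative, which is what pulls accuracy down from the 99 % level.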

I do have three predictable haplogroups under L513 to date:

L193 - 189
FGC9798 - 50
S7897 - 35

There are probably three or four more in the 20 to 40 range, but it takes a lot of time to collect the data. Analyzing does not take long.

TigerMW
07-08-2021, 05:28 PM
How about these under R-L513?

R-Z16372 - It has a phylogenetic block of 28 SNPs and 67 branches, so it is of good size.
https://www.familytreedna.com/public/y-dna-haplotree/R;name=R-Z16372

R-CTS11744 (L705.2) - It has to be close to making the cut. It has only 40 known branches, but it has a phylogenetic block of 30 SNPs. This is where I live.
https://www.familytreedna.com/public/y-dna-haplotree/R;name=R-CTS11744

R-CTS3087 - It has a phylogenetic block of 22 SNPs and 48 branches.
https://www.familytreedna.com/public/y-dna-haplotree/R;name=R-CTS3087

I can't find S7897 under L513. Is this the subclade of Z253 instead? I see that it appears twice on the tree.

RobertCasey
07-22-2021, 08:27 PM
S7897 should be S7834, which had 35 branches when I last analyzed it. CTS11744 looks like a good one to analyze, and I believe L705 was one of the ones included in my L21 SNP predictor tool. Z16372 looks pretty good. CTS3087 is a little low on branch equivalents but would probably work. I am trying to find the largest predictable haplogroups as well as the largest ones across haplogroup R. R1b Basal, R1a and U106 are not great for YSNP prediction at the 99 % level. They have a more even progression of YSNP branches (with few or no branch equivalents). P312 and L21 have huge bottlenecks of blocks of YSNPs in the correct time frame. All six haplogroups together would allow prediction of 66 % of L513. Getting it up to 75 % would probably require another ten smaller haplogroups.