R1a subclade prediction



MfA
09-01-2016, 01:15 PM
Came across this specimen while looking at the results from a Kurdish village in Gokcumen et al., 2011. Unfortunately it has only 16 STR markers. The Nevgen predictor assigned it to Z287 with fairly decent fitness values, which is still unusual.
https://abload.de/img/ydnahaplogrouppredict1wsfu.png

Compared against the R1a project on all 16 STRs, it matches "N111143 Italy R-CTS3402";
on the first 15 STRs it matches "369183 Italy R-CTS1211".
Oddly, even when using only the first 11 markers it matches only Z283 people.



129045 Franc Pestotnik, b. 1813, Psajnovica, Slovenia Slovenia R-CTS1211
N114571 Symeon(Sam)Starosta,Borschiv,Ukr. d.1950's Canada Ukraine R-CTS1211
411421 Johan Christopher William Rost Germany R-YP340
52014 Johann Heinrich Rathke, b 23 Sep 1753, Balin, Ger Germany R-M512 CTS1211>YP343>YP340*-y
N7207 Georgio Loscavo Italy R-M198 CTS1211>YP343>YP340*-y
322969 Stefan Grabiański, b. 1780 and d. 1827 Poland R-Z283
245756 Zoltán Drinóczi, 1954, István Drinóczi, 1931 Hungary R-Z283
425156 Adopted - Ukrainian father Ukraine R-YP340
E16481 Macedonia R-YP372
316721 Wasyl Naberezny, b. 1894 and d. 1956 Poland R-YP343
N111143 Nazareno Montalbini b.1850 Italy d. 1924 Italy R-CTS3402
236632 Ivan Fedorov son Pirogov, born near 1607 y. Russian Federation CTS1211>Y35>CTS3402>Y33>CTS8816>Y2902-x4
327492 Илларион Яковлевич Каштанов, 1898 - 1968 Russian Federation R-M512 Y2902-x4
248305 Nikola Blagoev Kol. (14.6.1856-1905), Zagorichani, Bulgaria R-L366
187099 Triavna, Bulgaria Bulgaria R-Z283
N93324 Janez (Johan) Hocevar (Hozhevar), 1812, Krska vas Slovenia R-Z283
110695 John Harris, b. 1843, died 1887. Unknown Origin R-M198 CTS1211-y
369183 Italy R-CTS1211
446468 Sylvester Filipowski, b. ~1760 Bochnia, Poland Poland R-M198 YP1405*-A-x
72166 Poland Poland R-M512 YP270>CTS4648-x1
115874 Giuseppe Regine, b. 1891, Ischia, Italy Italy R-Z283
348024 Olof Pehrson, b 1779 Hällestad (R), d 1855 Sweden R-Z283


https://docs.google.com/spreadsheets/d/1asfF-hm-GhlSW_TxwqzXgwD_dEVzkLJxyW3JFqswujM/edit?usp=sharing

Michał
09-01-2016, 04:23 PM
I wouldn't trust those predictions, mostly because they are based on so few STRs. Z93 is a very old and large clade, and thus quite well differentiated. Also, since it remains undertested, we can safely assume that it includes many not-so-rare subclades that are not yet known to us (and thus are rarely used in any predictors). Finally, the modal STR haplotype for Z93 is practically identical to the modals for M417, Z645 and Z283, so unless a given haplotype belongs to a very specific subclade (with a very characteristic/unusual haplotype), it is almost impossible to discriminate between Z93 and Z283. This applies especially to Z93 versus Z280, as both M458 and Z284 are slightly less differentiated and show some relatively rare STR results that are common to nearly all their members, which is unfortunately missing for Z280 and Z93.

It is not surprising that most of those Z283 matches are from clade Z280 (especially CTS1211>CTS3402), and more specifically from categories encompassing the so-called unclustered haplotypes that do not fit any more specific subclades (maybe except subclade Y2902, as in this case the first 37 STRs are also very close to the M417/Z280/Z93 modal). Those unclustered CTS1211/CTS3402 cases are usually recognized as such (with no SNP testing) based on a characteristic result, DYS464a=13, that correlates quite strongly with CTS1211/CTS3402 (though in people of non-Central-Eastern European ancestry this can be very misleading).

As for Z287, it is frequently impossible to recognize this clade (even as an unknown subclade of Z284) based on the first 37 STRs, so I wonder what STR pattern was used in the Nevgen predictor to recognize a likely Z287 member in this particular case. Could you please post those Kurdish STR results here?

For any R1a-M417 haplotype from Kurdistan, I would expect about an 80-90% chance that it is Z93. As for the remaining 10-20%, more than 5% (5-15%?) should probably be attributed to the so-called West Asian clade Z282>Y17491>YP4858, a very old subclade of Z282 that is often impossible to recognize based on the first 37 STRs only.

MfA
09-01-2016, 04:37 PM
Thanks Michał, the Nevgen-friendly STR values are below.


13,25,16,11,11-14,0,0,11,13,11,30,15,0,0,0,0,14,19,0,0,0,0,0,15,0 ,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,23,0,0,0,0 ,0,0,0


The genetic distance (GD) between him and the Kurdish Z282 kit #214352 is 8 at 15 STRs.
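For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of a simple stepwise genetic distance over Nevgen-style strings. It is only an illustration: the function names are hypothetical, the second haplotype is made up (the values of kit #214352 are not posted here), and FTDNA's own GD calculation applies extra rules to some multi-copy markers that this sketch ignores.

```python
def parse_haplotype(text):
    """Parse a comma-separated Nevgen-style STR string into per-marker tuples.

    Multi-copy markers such as '11-14' stay together as one tuple so that
    field positions remain aligned between haplotypes. Stray spaces are dropped.
    """
    fields = [f for f in text.replace(" ", "").split(",") if f]
    return [tuple(int(v) for v in f.split("-")) for f in fields]


def is_tested(marker):
    """A field of 0 means the marker was not tested."""
    return any(v != 0 for v in marker)


def count_tested(haplotype):
    """Number of individual STR values that were actually tested."""
    return sum(len(m) for m in haplotype if is_tested(m))


def genetic_distance(a, b):
    """Stepwise GD: sum of absolute repeat differences over markers tested in both.

    Returns (distance, number_of_values_compared).
    """
    distance = 0
    compared = 0
    for x, y in zip(a, b):
        if is_tested(x) and is_tested(y) and len(x) == len(y):
            distance += sum(abs(p - q) for p, q in zip(x, y))
            compared += len(x)
    return distance, compared


# First 11 fields of the Kurdish haplotype posted above (a stray space is
# left in on purpose to show that the parser tolerates it).
kurdish = parse_haplotype("13,25,16,11,11-14,0 ,0,11,13,11,30")

# Made-up comparison haplotype, for illustration only (not a real kit).
other = parse_haplotype("13,24,16,10,11-15,12,12,10,13,11,30")

print(count_tested(kurdish))
print(genetic_distance(kurdish, other))
```

Keeping multi-copy fields together as one tuple matters because untested multi-copy markers appear as a single 0 in these strings, so splitting them would shift every later position.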

Michał
09-01-2016, 06:34 PM
Thanks. Honestly, I don't see anything in this haplotype that would specifically point to Z287. I guess this particular variant of the Nevgen predictor does not recognize any specific subclades under Z287 (as opposed to another version dedicated specifically to R1a, but requiring at least 67 STRs for proper functioning), and this is probably the reason why it "wrongly" predicts Z287.

When compared with the modal for M417, the only off-modal values found in that Kurdish haplotype are DYS439=11(+1), DYS448=19(-1) and DYS456=15(-1). DYS448=19 is rare in all major branches under M417 (except L664), but both DYS439=11 and DYS456=15 are indeed quite frequent in clade Z287. However, they are mostly associated with only one of the three major subclades under Z287 (i.e. with YP402), while the "basic" version of the Nevgen predictor does not take into account that YP402 is additionally quite strongly associated with some other rare STR results that are not typical for Z287 as a whole, including DYS389b=16(-1), among others. So although all three results found in that Kurd (i.e. DYS439=11, DYS389b=17 and DYS456=15) are indeed common in Z287, they rarely co-exist in the same subclade: instead of 11-17-15, we usually see either 11-16-15 (in subclade YP402) or 10-17-16 / 10-17-15 (in the two remaining major subclades).
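The reasoning in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the M417 modal values are inferred from the (+1)/(-1) annotations, DYS389I=13 and DYS389II=30 are read from the posted string assuming its fields follow FTDNA marker order, and all function and variable names are hypothetical.

```python
# STR values for the Kurdish haplotype. DYS389I/DYS389II are read from the
# posted string assuming FTDNA field order (an assumption of this sketch).
kurdish = {"DYS439": 11, "DYS389I": 13, "DYS389II": 30, "DYS448": 19, "DYS456": 15}

# M417 modal values implied by the (+1)/(-1) annotations in the post.
m417_modal = {"DYS439": 10, "DYS448": 20, "DYS456": 16}

# DYS439 / DYS389b / DYS456 combinations the post describes as usual in Z287.
z287_patterns = {
    (11, 16, 15): "YP402",
    (10, 17, 16): "other major Z287 subclade",
    (10, 17, 15): "other major Z287 subclade",
}


def dys389b(hap):
    """DYS389b is the difference DYS389II - DYS389I."""
    return hap["DYS389II"] - hap["DYS389I"]


def off_modal(hap, modal):
    """Return {marker: signed step difference} for markers that differ from the modal."""
    return {m: hap[m] - v for m, v in modal.items() if m in hap and hap[m] != v}


def z287_signature(hap):
    """The three-marker combination discussed in the post."""
    return (hap["DYS439"], dys389b(hap), hap["DYS456"])


print(off_modal(kurdish, m417_modal))
sig = z287_signature(kurdish)
print(sig, z287_patterns.get(sig, "no usual Z287 pattern"))
```

Each of the three values is individually common in Z287, but the lookup on the combined signature comes back empty, which is the point of the argument.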

As for the two "exact" matches from branch Z280 (more specifically Y2902 and an unclustered member of subclade CTS1211), the above-mentioned off-modal STR results are not typical for these particular subclades, so both these Z280 haplotypes are likely to match the very short Kurdish haplotype just by coincidence. Please note that these two Z280 lineages are strongly predicted to belong to separate subclades under CTS1211/CTS3402, so we know that they cannot form any specific common subclade with the Kurdish sample.

If I had to classify this haplotype without knowing its ethnic origin, I would probably give up (assigning it to a group of M417 haplotypes that simply require more STRs and/or SNP testing), even though I would provisionally rule out M458. However, if I knew this is a Kurdish (or West Asian) sample, I would most likely classify him as a potential member of branch Z93.

BTW, when using the R1a-specific version of the Nevgen predictor (http://www.nevgen.org/#), the two most likely predictions are shown as R1a Z282>Z280>CTS1211>YP343>YP3979 and R1a Z93>Z94>Z2123>Y2632, but I would approach this with great caution (as the number of STRs analyzed is much lower than required).

09-01-2016, 06:49 PM
Hi Michal,


I am R1a1a and got my DNA test with 23andMe. Am I able to use that Y-DNA predictor to calculate my subclade?
Sorry, I'm a complete novice with DNA, but I'm trying to learn.

Michał
09-01-2016, 08:30 PM
No, you need STR results. This predictor uses STR results to predict an SNP-defined subclade.

vettor
09-02-2016, 05:55 AM
See if this works:

http://www.y-str.org/2014/04/23andme-to-ysnps.html

MfA
09-02-2016, 10:20 AM
Michał, thank you very much for the detailed answer. If it's not too much trouble, could you look at these two samples as well? Fortunately they have more STR values. They are Kurds from Iran.


13,25,15,10,11-14,12,12,10,13,11,29,15,9-10,11,11,24,14,20,32,13-15-15-16,11,11,19-23,16,0,0,0,0,14,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,30,12,14,0,13,9,0,0,0,0,0,0,0,0,0,0,0,23,0,0,0,0,0,10

13,24,16,11,11-15,12,12,10,13,11,31,16,9-10,11,11,24,0,20,30,12-15-15-16,11,11,19-23,16,0,0,0,0,14,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,0,12,14,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11

Michał
09-02-2016, 12:28 PM
Michał thank you very much for the detailed answer, If it's not too much Could you look into these 2 samples as well? Fortunately they've more STR values. They're Kurds from Iran.


13,25,15,10,11-14,12,12,10,13,11,27,15,9-10,11,11,24,14,20,32,13-15-15-16,11,12,19-23,16,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,30,12,0,0,0,9,0,0,0,0,0,0,0,0,0,0,0,23,0,0,0,0,0,10,0

13,24,16,11,11-15,12,12,10,13,11,29,16,9-10,11,11,24,0,20,30,12-15-15-16,11,12,19-23,16,0,0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,0,0,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,0,12,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,0

Haplotype No. 1
If it were a European haplotype, I would consider it a likely candidate for an unknown (thus very rare) subclade under CTS1211, Y35 or CTS3402, mostly because of DYS464a=13. However, since this is an Iranian/Kurdish sample, this seems much less likely. In all such Asian cases of DYS464a=13, I am unwilling to assign them to CTS1211 unless the other STR results strongly indicate a specific downstream subclade/cluster under CTS1211, which doesn't seem to be the case here. Intriguingly, this haplotype includes an extremely rare DYS389 result (13-27), which I haven't seen so far. This suggests that this is either a very rare R1a subclade or a subclade that is relatively common in some undertested regions of Eurasia. Based on all those STR results, I can exclude L664, while M458 seems very unlikely, so we are left with Z93, Z280, Z284 and some rare/unknown subclades under Z283, Z282, Z645, M417, etc. Of course, all this makes an unknown/rare clade under Z93 the most likely option. When searching for potential distant relatives of this Iranian Kurd, I would look for haplotypes showing co-existence of at least 2 (and preferably 3) of the following 4 rare STR results: DYS389b<16(=14), DYS464a=13, DYS444<13(=12) and DYS461<11(=10).

Haplotype No. 2
This one seems to be much less unusual/atypical, but since it is also very close to the modal for M417, Z645, Z93, etc, it is hard to assign it to any specific branch/subclade under M417. As above, I would rule out L664 and (slightly less securely) M458, but all remaining major branches seem perfectly possible. Again, I would consider Z93 to be the most likely option for ethnic/geographical reasons, though I cannot indicate any specific subclade.
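The search rule suggested for Haplotype No. 1 (at least 2, preferably 3, of the 4 rare results) could be applied to a list of haplotypes with something like the Python sketch below. It is only a rough illustration: the names are hypothetical, haplotypes are assumed to be already available as marker-to-value dictionaries, and the DYS464 criterion is read here as DYS464a=13.

```python
# The four rare STR results named in the post, as simple predicates.
# Reading "DYS464=13" as DYS464a=13 is an assumption of this sketch.
CRITERIA = {
    "DYS389b": lambda v: v < 16,
    "DYS464a": lambda v: v == 13,
    "DYS444": lambda v: v < 13,
    "DYS461": lambda v: v < 11,
}


def rare_hits(haplotype):
    """Names of the rare results present; untested markers (missing keys) never count."""
    return [name for name, test in CRITERIA.items()
            if name in haplotype and test(haplotype[name])]


def is_candidate(haplotype, minimum=2):
    """True if at least `minimum` of the four rare results co-exist."""
    return len(rare_hits(haplotype)) >= minimum


# Haplotype No. 1, using the values given in parentheses in the post.
hap1 = {"DYS389b": 14, "DYS464a": 13, "DYS444": 12, "DYS461": 10}

# Made-up haplotype for illustration: only one of the four results is present.
example = {"DYS389b": 17, "DYS464a": 13, "DYS444": 13}

print(rare_hits(hap1), is_candidate(hap1))
print(rare_hits(example), is_candidate(example))
```

Passing `minimum=3` gives the stricter version of the rule.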

MfA
10-07-2016, 09:00 AM
2 new samples from Dêrsim, both Kurmanji Kurds, Z283 (Z280-, Z284-) # E18606 and ¿YP5664? # E20643.

https://docs.google.com/spreadsheets/d/1asfF-hm-GhlSW_TxwqzXgwD_dEVzkLJxyW3JFqswujM/edit#gid=624886255

https://www.familytreedna.com/public/PROJEDERSIMIEDNA?iframe=yresults

Michał
10-07-2016, 04:39 PM
Z283 (Z280-, Z284-) # E18606
He likely forms a new (yet unnamed) subclade under YP4858 with kit 182305 from Turkey/Armenia.


and ¿YP5664? # E20643.
Yes, he seems to be a member of cluster YP5664-A.

Would it be possible to ask them both to join our R1a project?

MfA
10-07-2016, 04:58 PM
Thanks very much, Michał. I'm not the admin over there, but since they're new members they may join in the next few days. If they don't, I'll try asking.

Smilelover
10-23-2016, 06:33 PM
I contacted my YP5664-A match and he has already joined the R1a project.

MfA
11-09-2016, 08:35 AM
Here is a new Ezidi Kurdish 12-marker (Y12) STR result. Despite the low resolution, it is likely CTS6+.

https://docs.google.com/spreadsheets/d/1asfF-hm-GhlSW_TxwqzXgwD_dEVzkLJxyW3JFqswujM/edit#gid=624886255


534043 13 25 16 10 11-14 12 12 12 14 11 31


There's a new study on its way; 300 Kurdish results will be available.

https://yhrd.org/YA004213
https://yhrd.org/YA004216

MfA
11-18-2016, 10:32 AM
544365 SNP pack results came back: Z2124* (Z2122-, Z2125-)

Michał
11-18-2016, 11:07 AM
AFAIK, this is the first Z2124* case known to us.

MfA
11-18-2016, 01:04 PM
Let's hope he orders Big Y

Michał
11-19-2016, 06:00 PM
At this point, the Big Y test won't help him much. However, his Big Y results should turn out to be very useful after another Z94* lineage is found (and tested with NGS).

MfA
11-19-2016, 06:45 PM
There aren't many Kurds and Iranians tested at FTDNA. A paper like the Sardinian paper, with NGS testing, would be greatly appreciated.

Smilelover
11-22-2016, 08:21 PM
A new YP4141 member from Qatar has joined the R-Arabia project.
He is assigned R-YP4132, ancestral to YP4131.

MfA
12-19-2016, 09:29 PM
Stenersen et al., 2004, "Kurdish (Iraq) and Somalian population data for 15 autosomal and 9 Y-chromosomal STR loci"

Kurdish R1a's.
3xR1a-L62>YP5664
2xR1a-L62>M417>Z94>Y40>YP4867
1xR1a-L62>M417>Z282>YP4858
8xvarious R1a M417,Z94,Z2124

MfA
01-01-2017, 11:34 AM
2 Kurdish R1a from Iraq

554429 Belbas (http://www.iranicaonline.org/articles/belbas-a-former-kurdish-tribal-confederacy-of-northwestern-iran-and-northeastern-iraq) R1a-L62>YP5664
N36757 Çingyanî R1a-L62>M417>Z94>Y40>YP4867

Updated haplotype spreadsheet
https://docs.google.com/spreadsheets/d/1asfF-hm-GhlSW_TxwqzXgwD_dEVzkLJxyW3JFqswujM/edit#gid=624886255

MfA
04-04-2017, 06:51 PM
Three new Kurdish R1a-YP4858 results. A decent number of Kurdish R1a samples belong to this clade; another such clade is YP5664.

E21646 R1a-L62>M417>Z282>YP4858
E22087 R1a-L62>M417>Z282>YP4858
534044 R1a-L62>M417>Z282>YP4858

Updated haplotype spreadsheet
https://docs.google.com/spreadsheets/d/1asfF-hm-GhlSW_TxwqzXgwD_dEVzkLJxyW3JFqswujM/edit#gid=624886255