How West Eurasian are different Uralic populations in Europe?



davit
01-15-2021, 02:26 AM
And did those lands have ANE/WSHG/EHG substrates before Uralics (guessing close to fully ENA populations) moved in?

al.Krivich
08-06-2021, 10:14 PM
From what I understand, the TRVE Uralics like Khanty, Mansi and Nenets are not West Eurasian whatsoever; their most "West Eurasian" component comes from EHG admixture that has been present in Siberia since forever.

The wannabe Uralics like Mari, Udmurts and Komi are assimilated Corded Ware and Combed Ware people with minor TRVE Uralic admixture.

If we're talking about Moksha/Erzya/Karelians/Finns/Estonians/Hungarians, then they're just typical Europeans of Corded Ware, Combed Ware and Bell Beaker descent, with pretty much nothing eastern.

J-Live
08-06-2021, 10:29 PM
So Finns/Estonians and the like have absolutely no Siberian or East Asian ancestry at all?

vasil
08-06-2021, 10:53 PM
Estonians have almost no Siberian ancestry. Finns have some, but actually less than Northern Russians; it does peak in the Saami, though.

al.Krivich
08-07-2021, 12:37 AM
Almost none pretty much.

Tsakhur
08-28-2021, 06:17 AM
Khanty, Mansi and Nenets actually have quite a lot of West Eurasian ancestry. EHG is a West Eurasian component as well, and there have been West Eurasians in Siberia since ancient times. Khanty and Mansi are around 47-49% West Eurasian, while Nenets are approximately 29% West Eurasian. Krasnoyarsk_BA is an East Eurasian component closely related to modern-day Nganasans.

Target: Khanty
Distance: 6.1798% / 0.06179766
53.4 RUS_Krasnoyarsk_BA
41.0 Yamnaya_RUS_Samara
3.6 Baltic_EST_BA
2.0 WHG



Target: Mansi
Distance: 6.0422% / 0.06042214
51.4 RUS_Krasnoyarsk_BA
38.8 Yamnaya_RUS_Samara
6.6 Baltic_EST_BA
2.6 WHG
0.6 TUR_Barcin_N


Target: Nenets
Distance: 4.4318% / 0.04431806
50.6 RUS_Krasnoyarsk_BA
25.8 Yamnaya_RUS_Samara
20.4 Nganassan
1.8 WHG
0.6 Baltic_EST_BA
0.4 FIN_Levanluhta_IA_o
0.4 TUR_Barcin_N
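The West Eurasian totals quoted above are the sums of every source except RUS_Krasnoyarsk_BA and Nganassan. A minimal Python sketch of that bookkeeping, assuming that East/West split of the source labels (the names EAST_EURASIAN, MODELS and west_eurasian_share are illustrative, not from any tool):

```python
# Sources treated as East Eurasian; this split is an assumption that
# follows the post (every other source counts as West Eurasian).
EAST_EURASIAN = {"RUS_Krasnoyarsk_BA", "Nganassan"}

# Model percentages copied from the outputs above.
MODELS = {
    "Khanty": {"RUS_Krasnoyarsk_BA": 53.4, "Yamnaya_RUS_Samara": 41.0,
               "Baltic_EST_BA": 3.6, "WHG": 2.0},
    "Mansi": {"RUS_Krasnoyarsk_BA": 51.4, "Yamnaya_RUS_Samara": 38.8,
              "Baltic_EST_BA": 6.6, "WHG": 2.6, "TUR_Barcin_N": 0.6},
    "Nenets": {"RUS_Krasnoyarsk_BA": 50.6, "Yamnaya_RUS_Samara": 25.8,
               "Nganassan": 20.4, "WHG": 1.8, "Baltic_EST_BA": 0.6,
               "FIN_Levanluhta_IA_o": 0.4, "TUR_Barcin_N": 0.4},
}


def west_eurasian_share(model: dict[str, float]) -> float:
    """Sum the percentages of all sources not listed as East Eurasian."""
    total = sum(pct for src, pct in model.items() if src not in EAST_EURASIAN)
    return round(total, 1)


for target, model in MODELS.items():
    print(f"{target}: {west_eurasian_share(model)}% West Eurasian")
```

This prints 46.6, 48.6 and 29.0 for Khanty, Mansi and Nenets; the East Eurasian share is simply 100 minus that figure.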

Also, Maris, Udmurts and Komis are not wannabe Uralics. If they were assimilated Corded Ware and Combed Ware people with minor TRVE Uralic admixture, they would cluster with Balts, Slavs and Scandinavians. In fact, they have very significant amounts of Uralic (East Eurasian) ancestry: Udmurts are around 25-26% East Eurasian, Mari approximately 32-33% and Komi 16-17% on average. Maitan_Alakul is a Steppe population closely related to Sintashta_MLBA.

Target: Udmurt
Distance: 2.7785% / 0.02778518
32.6 KAZ_Maitan_MLBA_Alakul
25.2 RUS_Krasnoyarsk_BA
21.4 Yamnaya_RUS_Samara
19.2 Baltic_LTU_BA
0.8 Baltic_EST_BA
0.4 Levant_PPNB
0.4 TUR_Barcin_N

Target: Mari
Distance: 8.8936% / 0.08893626
40.6 Baltic_LTU_BA
32.6 RUS_Krasnoyarsk_BA
24.0 KAZ_Maitan_MLBA_Alakul
2.2 Yamnaya_RUS_Samara
0.6 IRN_Ganj_Dareh_N

Target: Komi
Distance: 1.5613% / 0.01561284
32.0 KAZ_Maitan_MLBA_Alakul
22.6 Baltic_EST_BA
18.8 Baltic_LTU_BA
17.0 RUS_Krasnoyarsk_BA
6.0 Yamnaya_RUS_Samara
3.6 TUR_Barcin_N

Adding Bolshoy_Oleni_Ostrov_o and Uyelgi, which are ancient Uralic populations (keep in mind they already have West Eurasian admixture), alongside Krasnoyarsk_BA shows the amount of Uralic ancestry in these groups.
Udmurt, Mari and Komi also seem to have some Iranian/Caucasus-related affinity, likely from Scythian/Sarmatian or Turkic tribes roaming the Urals, which is something that Corded Ware and Combed Ware-descended Europeans generally lacked. RUS_Alan_MA and Dzharkutan2_BA resemble modern North Caucasians; Sappali_Tepe resembles modern Iranians.

Target: Khanty
Distance: 4.4647% / 0.04464659
42.6 RUS_Bolshoy_Oleni_Ostrov_o
25.8 Uyelgi
18.6 RUS_Krasnoyarsk_BA
12.6 KAZ_Maitan_MLBA_Alakul
0.4 UZB_Sappali_Tepe_BA_o

Target: Mansi
Distance: 4.1834% / 0.04183355
44.0 RUS_Bolshoy_Oleni_Ostrov_o
27.0 Uyelgi
15.0 RUS_Krasnoyarsk_BA
11.4 KAZ_Maitan_MLBA_Alakul
2.6 Levant_PPNB

Target: Nenets
Distance: 4.2736% / 0.04273600
56.8 RUS_Krasnoyarsk_BA
16.2 Uyelgi
12.4 RUS_Bolshoy_Oleni_Ostrov_o
8.0 Yamnaya_RUS_Samara
6.6 FIN_Levanluhta_IA_o


Target: Udmurt
Distance: 2.2967% / 0.02296709
37.8 KAZ_Maitan_MLBA_Alakul
14.4 Uyelgi
12.4 RUS_Krasnoyarsk_BA
11.4 RUS_Bolshoy_Oleni_Ostrov_o
10.6 Baltic_LTU_BA
5.6 Yamnaya_RUS_Samara
4.2 RUS_Alan_MA
3.2 Baltic_EST_BA
0.2 TUR_Barcin_N
0.2 UZB_Sappali_Tepe_BA_o

Target: Mari
Distance: 8.2352% / 0.08235223
34.0 Uyelgi
32.6 Baltic_LTU_BA
16.4 RUS_Krasnoyarsk_BA
13.6 KAZ_Maitan_MLBA_Alakul
2.4 Levant_PPNB
0.8 Han
0.2 MAR_Taforalt

Target: Komi
Distance: 1.5077% / 0.01507663
33.2 KAZ_Maitan_MLBA_Alakul
23.2 Baltic_EST_BA
16.6 Baltic_LTU_BA
14.4 RUS_Krasnoyarsk_BA
4.4 Uyelgi
3.8 TUR_Barcin_N
3.4 Yamnaya_RUS_Samara
1.0 RUS_Bolshoy_Oleni_Ostrov_o


Removing Krasnoyarsk_BA and leaving Uyelgi and Bolshoy Oleni Ostrov as the Uralic-related sources, to see the actual amount of Uralic ancestry:

Target: Khanty
Distance: 5.0651% / 0.05065091
71.0 RUS_Bolshoy_Oleni_Ostrov_o
24.8 Uyelgi
2.0 RUS_Alan_MA
1.2 MNG_North_N
1.0 MAR_Taforalt

Target: Mansi
Distance: 4.7348% / 0.04734837
68.4 RUS_Bolshoy_Oleni_Ostrov_o
25.8 Uyelgi
4.2 RUS_Alan_MA
1.6 MAR_Taforalt

Target: Nenets
Distance: 9.1206% / 0.09120610
60.4 RUS_Bolshoy_Oleni_Ostrov_o
26.8 MNG_North_N
12.8 Uyelgi


Target: Udmurt
Distance: 2.4874% / 0.02487387
32.2 RUS_Bolshoy_Oleni_Ostrov_o
30.2 KAZ_Maitan_MLBA_Alakul
15.2 Uyelgi
9.2 Baltic_LTU_BA
8.4 RUS_Alan_MA
2.6 TUR_Barcin_N
2.2 Baltic_EST_BA

Target: Mari
Distance: 8.4189% / 0.08418853
37.6 Uyelgi
27.6 Baltic_LTU_BA
24.4 RUS_Bolshoy_Oleni_Ostrov_o
5.8 TUR_Barcin_N
2.8 Levant_PPNB
1.4 Han
0.4 MAR_Taforalt

Target: Komi
Distance: 1.9040% / 0.01904045
23.8 RUS_Bolshoy_Oleni_Ostrov_o
22.6 KAZ_Maitan_MLBA_Alakul
20.8 Baltic_EST_BA
16.4 Baltic_LTU_BA
7.8 TUR_Barcin_N
5.8 Uyelgi
2.2 UZB_Dzharkutan2_BA
0.6 MNG_North_N

al.Krivich
08-31-2021, 04:14 PM
If Oleni Ostrov is a proxy for EHG, then we can conclude from this that Khanty are something like 71% EHG. This would make them very West Eurasian genetically; however, their appearance is rather East Eurasian.

Tsakhur
09-23-2021, 07:34 PM
Sorry for very late reply.

Oleni_Ostrov and Oleni_Ostrov_o are not proxies for EHG, as they also have very high levels of East Eurasian ancestry, mainly of Nganasan/Krasnoyarsk_BA (kra001)-related type. If you want a proxy for EHG, you have many options: Samara_HG, Karelia_HG, Sidelkino_HG, Volga_Kama_EN, Khvalynsk_EN and Veretye_Meso.

Target: RUS_Bolshoy_Oleni_Ostrov
Distance: 2.3544% / 0.02354423
37.8 RUS_Krasnoyarsk_BA
32.6 RUS_Khvalynsk_En
24.2 RUS_Volga-Kama_N
2.8 WHG
1.4 Han
1.2 MNG_North_N

Target: RUS_Bolshoy_Oleni_Ostrov_o
Distance: 2.6907% / 0.02690695
47.8 RUS_Krasnoyarsk_BA
23.8 RUS_Khvalynsk_En
22.8 RUS_Volga-Kama_N
3.8 Han
1.8 MNG_North_N

Oleni_Ostrov is around 40% ENA/Eastern Non-African (another term for Eastern Eurasian) while Oleni_Ostrov_o is approximately 53% Eastern Eurasian/ENA.

Khanty are not 71% EHG but closer to 44% EHG with the rest of their genome being 51% Eastern Eurasian and 5% EEF/Neolithic Farmer.

Target: Khanty
Distance: 4.7530% / 0.04753042
50.8 RUS_Krasnoyarsk_BA
43.8 RUS_Khvalynsk_En
5.4 TUR_Barcin_N
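Why 71% Bolshoy_Oleni_Ostrov_o does not mean 71% EHG can be checked by substituting Oleni_Ostrov_o's own model into the Khanty model. A rough Python sketch, assuming two separately fitted models can simply be composed (an approximation); Uyelgi is left unexpanded because no model for it is given here, and grouping Khvalynsk_En with Volga-Kama_N as "EHG-like" follows the post:

```python
# Oleni_Ostrov_o as modelled above (percent).
OLENI_O = {
    "RUS_Krasnoyarsk_BA": 47.8,
    "RUS_Khvalynsk_En": 23.8,
    "RUS_Volga-Kama_N": 22.8,
    "Han": 3.8,
    "MNG_North_N": 1.8,
}
EHG_LIKE = {"RUS_Khvalynsk_En", "RUS_Volga-Kama_N"}
EAST_LIKE = {"RUS_Krasnoyarsk_BA", "Han", "MNG_North_N"}


def expand(share: float, source_model: dict[str, float]) -> dict[str, float]:
    """Split `share` percent of a target across the source's own components."""
    return {src: share * pct / 100 for src, pct in source_model.items()}


# Khanty is 71.0% RUS_Bolshoy_Oleni_Ostrov_o in the model without Krasnoyarsk_BA.
via_oleni = expand(71.0, OLENI_O)
ehg = sum(v for k, v in via_oleni.items() if k in EHG_LIKE)
east = sum(v for k, v in via_oleni.items() if k in EAST_LIKE)
print(f"EHG-like via Oleni_Ostrov_o: {ehg:.1f}%")
print(f"East Eurasian via Oleni_Ostrov_o: {east:.1f}%")
```

The Oleni_Ostrov_o share alone breaks down into roughly 33% EHG-like and 38% East Eurasian ancestry, far from 71% EHG.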

Nganasankhan
09-23-2021, 09:26 PM
I selected modern samples from 1240K+HO with longitude over -30 and latitude over 30, and with no suffix like .SG or .DG. I excluded Chukchi, Eskimo, Itelmen, Koryak, Nanai, Negidal, Nivh, and Ulchi. Next I merged the samples with samples from Tambets et al. 2018 (https://evolbio.ut.ee/Tambets2018/). I then did an ADMIXTURE run of the samples with two components. The list below shows the percentage of the East Eurasian component for Uralic populations, and also for a few Turkic populations that are probably mixed with Uralics:

3 (2-4) Hungarian
5 (1-8) Tambets_Estonians
5 (3-9) Estonian
8 (6-10) Tambets_Karels
9 (6-12) Tambets_Ingrian
9 (5-13) Tambets_Finns
10 (7-12) Finnish
10 (6-13) Mordovian
11 (7-14) Karelian
12 (11-13) Tambets_Vepsas
12 (11-13) Veps
22 (15-27) Tambets_Saami_Kola
22 (17-27) Tambets_Tatars
26 (16-30) Chuvash
27 (24-29) Besermyan
30 (28-33) Udmurt
31 (26-34) Tambets_Saami_SWE
31 (31-31) Tambets_Maris
35 (7-62) Tambets_Mansi
36 (22-47) Bashkir
54 (52-55) Tatar_Siberian_Zabolotniye
55 (48-59) Mansi
59 (57-64) Tambets_Khanty
68 (39-90) Selkup
79 (73-82) Enets
87 (42-94) Tofalar
88 (81-95) Dolgan
90 (89-90) Todzin
99 (95-100) Nganasan

The Mansi samples by Tambets range from 7% East Eurasian to 62% East Eurasian, so they're only 35% East Eurasian on average. The Bashkir samples from Jeong et al. have a range of 22% to 47%, because the northern Bashkir samples have much lower East Eurasian ancestry than the southern Bashkir samples.
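The "mean (min-max)" figures above are per-population summaries of per-individual ADMIXTURE proportions (ADMIXTURE's .Q output has one row per individual and one column per component). A small Python sketch of that summary step; the population names and fractions below are made-up toy values, not samples from the run:

```python
from statistics import mean


def summarize(q_by_pop: dict[str, list[float]]) -> list[str]:
    """Format each population as 'mean (min-max) name' in whole percent, sorted by mean."""
    rows = []
    for pop, fractions in q_by_pop.items():
        pct = [100 * q for q in fractions]
        rows.append((round(mean(pct)), round(min(pct)), round(max(pct)), pop))
    rows.sort()
    return [f"{m} ({lo}-{hi}) {pop}" for m, lo, hi, pop in rows]


# Hypothetical eastern-component fractions for two toy populations.
toy = {
    "Toy_wide_range": [0.07, 0.36, 0.62],
    "Toy_narrow_range": [0.28, 0.31, 0.33],
}
for line in summarize(toy):
    print(line)
```

As with the Tambets Mansi samples, a wide range means the average hides very different individuals. Python's round() sends exact halves to the nearest even number, which can shift a borderline average by one point.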

Full results:

0 (0-0) Sardinian
0 (0-1) Basque
0 (0-1) Spanish_North
0 (0-2) Italian_North
1 (0-3) Spanish
1 (1-1) Tambets_Germans
1 (0-2) Italian_South
1 (0-2) Sicilian
1 (0-2) English
1 (0-3) French
1 (0-3) Greek
1 (1-2) Cypriot
2 (1-3) Orcadian
2 (0-3) Icelandic
2 (1-3) Croatian
2 (1-3) Maltese
2 (1-2) Jew_Turkish
2 (1-3) Albanian
2 (1-4) Bulgarian
2 (1-3) Lebanese_Christian
2 (1-3) Scottish
2 (1-4) Czech
2 (0-4) Norwegian
2 (1-3) Jew_Moroccan
2 (0-4) Romanian
2 (1-3) Jew_Libyan
2 (1-4) Jew_Tunisian
2 (1-3) Armenian_Hemsheni
2 (0-4) Armenian
2 (2-2) Tambets_Gagauzes
2 (2-3) Jew_Iraqi
2 (1-4) Druze
2 (0-4) Jew_Ashkenazi
2 (2-2) Tambets_Latvians
2 (1-4) Moldavian
3 (0-8) Georgian
3 (1-4) Gagauz
3 (2-4) Lithuanian
3 (2-4) Hungarian
3 (1-4) BedouinB
3 (1-6) Assyrian
3 (1-5) Ukrainian_North
3 (1-4) Ukrainian
3 (2-5) Jew_Georgian
3 (2-5) Jew_Iranian
4 (4-4) Kabardinian_outlier
4 (2-5) Lebanese_Muslim
4 (4-5) Abazin_outlier
4 (3-6) Belarusian
4 (4-4) Tambets_Swedes
4 (3-6) Palestinian
5 (2-8) Lebanese
5 (1-8) Tambets_Estonians
5 (4-5) Abkhasian
5 (3-9) Estonian
5 (3-8) Jordanian
5 (0-8) Syrian
5 (5-5) Avar_outlier2
5 (5-5) Avar_outlier1
6 (3-8) BedouinA
6 (3-12) Tambets_Poles
6 (5-8) Kubachinian
6 (3-7) Kaitag
6 (5-8) Ezid
6 (4-9) Kurd
7 (4-9) Tabasaran
7 (5-8) Egyptian
7 (6-8) Darginian
7 (6-8) Lezgin
7 (5-8) Libyan
7 (3-13) Russian
7 (4-9) Chechen
7 (6-8) Avar
7 (5-8) Lak
7 (6-8) Tambets_Russians_Central
7 (6-10) Mozabite
8 (6-10) Tunisian
8 (6-10) Tambets_Karels
8 (5-12) Adygei
8 (7-9) Algerian
8 (5-11) Iranian
8 (6-10) Ingushian
8 (6-12) Moroccan
9 (1-19) Turkish
9 (6-12) Tambets_Ingrian
9 (5-13) Tambets_Finns
9 (4-11) Kumyk
10 (7-12) Finnish
10 (8-13) Azeri
10 (6-13) Mordovian
10 (7-13) Ossetian
11 (8-13) Circassian
11 (7-14) Karelian
11 (10-12) Russian_Archangelsk_Krasnoborsky
11 (9-13) Balkar
11 (9-14) Karachai
12 (9-17) Kabardinian
12 (11-13) Tambets_Vepsas
12 (11-13) Veps
12 (10-15) Abazin
14 (12-15) Russian_Archangelsk_Pinezhsky
16 (13-21) Balochi
16 (14-21) Brahui
17 (16-18) Turkish_Balikesir
17 (15-18) Russian_Archangelsk_Leshukonsky
18 (17-21) Tatar_Mishar
19 (18-21) Kalash
21 (17-25) Pathan
22 (15-28) Tajik
22 (15-27) Tambets_Saami_Kola
22 (17-27) Tambets_Tatars
23 (21-25) Tatar_Kazan
26 (26-26) Turkmen_outlier
26 (16-30) Chuvash
27 (24-29) Besermyan
27 (21-36) Nogai_Karachay_Cherkessia
30 (28-33) Udmurt
30 (28-33) Burusho
31 (26-34) Tambets_Saami_SWE
31 (31-31) Tambets_Maris
35 (29-38) Punjabi
35 (7-62) Tambets_Mansi
36 (22-47) Bashkir
36 (10-59) Aleut
37 (32-48) Turkmen
41 (20-56) Uzbek
47 (33-61) Tlingit
50 (42-56) Tatar_Siberian
50 (37-60) Nogai_Stavropol
50 (45-56) Yukagir_Forest
50 (50-50) Tambets_Buryats
54 (52-55) Tatar_Siberian_Zabolotniye
55 (48-59) Mansi
56 (54-58) Nogai_Astrakhan
56 (48-67) Uyghur
56 (53-58) Hazara
57 (50-64) Karakalpak
59 (46-66) Altaian_Chelkan
59 (57-64) Tambets_Khanty
62 (62-62) Khakass_outlier
63 (58-67) Tubalar
64 (63-66) Shor_Mountain
65 (61-70) Kazakh
66 (62-68) Shor_Khakassia
68 (65-70) Kyrgyz_Tajikistan
68 (39-90) Selkup
69 (54-79) Khakass
70 (61-75) Kyrgyz_China
70 (35-98) Even
71 (66-74) Kyrgyz_Kyrgyzstan
71 (65-74) Ket
74 (67-82) Kazakh_China
76 (71-79) Khakass_Kachin
76 (73-80) Altaian
79 (73-82) Enets
83 (81-86) Kalmyk
85 (66-100) Evenk_FarEast
85 (81-89) Tuvinian
86 (84-89) Salar
87 (42-94) Tofalar
88 (83-92) Mongol
88 (86-93) Dongxiang
88 (81-95) Dolgan
88 (79-92) Buryat
89 (85-92) Dungan
90 (89-90) Todzin
91 (85-95) Yakut
91 (86-93) Khamnegan
92 (87-97) Bonan
94 (92-96) Tu
96 (89-100) Yugur
96 (92-100) Yukagir_Tundra
96 (91-100) Evenk_Transbaikal
96 (90-100) Tibetan
96 (95-97) Mongola
97 (94-99) Xibo
98 (96-99) Daur
99 (95-100) Nganasan
99 (95-100) Oroqen
99 (97-100) Hezhen
100 (98-100) Qiang
100 (97-100) Han
100 (100-100) Japanese
100 (100-100) Korean
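The sample selection described at the start of this post (modern samples, longitude over -30, latitude over 30, no .SG/.DG-style suffix, a list of excluded populations) can be expressed as a simple filter. A Python sketch over a hypothetical record layout; real AADR annotation files have different columns, and matching exclusions on the first "_"-separated token of the label is an assumption:

```python
from dataclasses import dataclass

# Populations excluded in the post.
EXCLUDED = {"Chukchi", "Eskimo", "Itelmen", "Koryak", "Nanai", "Negidal", "Nivh", "Ulchi"}


@dataclass
class Sample:
    """Hypothetical record; not the real annotation-file format."""
    population: str
    latitude: float
    longitude: float
    modern: bool


def keep(s: Sample) -> bool:
    if not s.modern:
        return False
    # Reject labels carrying a data-type suffix such as .SG or .DG.
    if "." in s.population:
        return False
    # Assumption: "Eskimo_Naukan" etc. are dropped along with "Eskimo".
    if s.population.split("_")[0] in EXCLUDED:
        return False
    return s.longitude > -30 and s.latitude > 30


# Toy records, not real sample coordinates.
samples = [
    Sample("Finnish", 61.0, 25.0, True),
    Sample("Mansi.DG", 61.0, 65.0, True),
    Sample("Nanai", 49.0, 136.0, True),
    Sample("Yoruba", 7.4, 3.9, True),
]
print([s.population for s in samples if keep(s)])
```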

Tsakhur
09-24-2021, 05:23 PM
I am surprised that the Maris in that run are only 31% Eastern Eurasian/ENA (Eastern Non-African) and that Udmurts can score slightly more Eastern Eurasian than Maris here.

In G25, Maris can reach close to 35% depending on the samples/calculators used, while Udmurts in G25 seem less ENA than in this run:

Mari average, using Davidski's Global Standard calculator with Sintashta, KAZ_Maitan_Alakul (Steppe) and Baltic_LTU_BA/EST_BA added to the run:

Target: Mari
Distance: 8.8936% / 0.08893626
40.6 Baltic_LTU_BA
32.6 RUS_Krasnoyarsk_BA
24.0 KAZ_Maitan_MLBA_Alakul
2.2 Yamnaya_RUS_Samara
0.6 IRN_Ganj_Dareh_N

Most Eastern-shifted Maris:


Target: Mari:mari1
Distance: 8.6989% / 0.08698884
38.6 Baltic_LTU_BA
33.8 RUS_Krasnoyarsk_BA
22.4 KAZ_Maitan_MLBA_Alakul
5.2 Yamnaya_RUS_Samara


Target: Mari:mari3
Distance: 8.8181% / 0.08818085
41.4 Baltic_LTU_BA
32.6 RUS_Krasnoyarsk_BA
23.8 KAZ_Maitan_MLBA_Alakul
2.2 Yamnaya_RUS_Samara

Compare to the Udmurts on G25:

Udmurt Average:

Target: Udmurt
Distance: 2.9797% / 0.02979689
26.2 RUS_Krasnoyarsk_BA
23.6 Yamnaya_RUS_Samara
23.0 RUS_Sintashta_MLBA
15.0 Baltic_LTU_BA
9.8 KAZ_Maitan_MLBA_Alakul
1.2 TUR_Tepecik_Ciftlik_N
1.0 Baltic_EST_BA
0.2 Levant_PPNB

Most Eastern-shifted Udmurts:

Target: Udmurt:udmurd8
Distance: 3.2481% / 0.03248071
28.6 RUS_Krasnoyarsk_BA
21.0 Yamnaya_RUS_Samara
20.4 KAZ_Maitan_MLBA_Alakul
17.0 RUS_Sintashta_MLBA
11.6 Baltic_LTU_BA
1.0 Levant_PPNB
0.4 TUR_Tepecik_Ciftlik_N

Target: Udmurt:153_R02C01
Distance: 4.8473% / 0.04847300
29.8 RUS_Sintashta_MLBA
28.2 RUS_Krasnoyarsk_BA
23.4 Yamnaya_RUS_Samara
9.6 KAZ_Maitan_MLBA_Alakul
9.0 Baltic_LTU_BA

Taking G25 as the baseline, it's a bit puzzling that the Udmurts in the 1240K run have a higher Eastern Eurasian range (28-33%) than the Maris (31-31%), when in G25 it's the reverse.

Also, this qpAdm data shows Mari to be more ENA:

Udmurt
Nganasan: 0.307 ± 0.007
Estonia_MN_CCC: 0.150 ± 0.033
Latvia_LN_CordedWare: 0.387 ± 0.045
Ukraine_Globular_Amphora: 0.155 ± 0.029
tail: 0.099657
chisq: 15.999
Udmurt_1.txt

Mari
Nganasan: 0.344 ± 0.013
Estonia_MN_CCC: 0.153 ± 0.052
Latvia_LN_CordedWare: 0.280 ± 0.078
Ukraine_Globular_Amphora: 0.223 ± 0.049
tail: 0.119238
chisq: 15.367
Mari_1.txt
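The ± values are standard errors, so one rough way to ask whether Mari's Nganasan weight (0.344) is really higher than Udmurt's (0.307) is a z-score on the difference. A Python sketch, assuming the two estimates are independent (they are not strictly, since both runs share reference populations and SNPs) and using the common p > 0.05 cutoff for a qpAdm model's tail not being rejected:

```python
from math import sqrt

# qpAdm weights and standard errors from the outputs above.
UDMURT = {"Nganasan": (0.307, 0.007), "Estonia_MN_CCC": (0.150, 0.033),
          "Latvia_LN_CordedWare": (0.387, 0.045), "Ukraine_Globular_Amphora": (0.155, 0.029)}
MARI = {"Nganasan": (0.344, 0.013), "Estonia_MN_CCC": (0.153, 0.052),
        "Latvia_LN_CordedWare": (0.280, 0.078), "Ukraine_Globular_Amphora": (0.223, 0.049)}
TAILS = {"Udmurt": 0.099657, "Mari": 0.119238}


def z_difference(a: tuple[float, float], b: tuple[float, float]) -> float:
    """z-score of b - a, treating the two estimates as independent (an approximation)."""
    (est_a, se_a), (est_b, se_b) = a, b
    return (est_b - est_a) / sqrt(se_a ** 2 + se_b ** 2)


for name, model in (("Udmurt", UDMURT), ("Mari", MARI)):
    total = sum(w for w, _ in model.values())
    print(f"{name}: weights sum to {total:.3f}, tail {TAILS[name]}, not rejected: {TAILS[name] > 0.05}")

z = z_difference(UDMURT["Nganasan"], MARI["Nganasan"])
print(f"Mari minus Udmurt Nganasan weight: z = {z:.2f}")
```

Both models pass the tail cutoff, and the difference comes out at about z = 2.5, i.e. a bit more than two combined standard errors.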

Nganasankhan
09-24-2021, 06:18 PM
There was only one Mari sample in the run, and other Mari samples would've probably gotten a higher percentage of the eastern component.

Also in order to increase the percentage of the eastern component in Maris, I could've included only Siberians and no East Asians in the run. Or I could've included a lot of WHG or ENF samples. Or I could've done a run with 3 or 4 components, so the eastern component would've split off into Siberian and East Asian components, which would usually result in Maris getting a higher percentage of the Siberian component.

The image below shows ADMIXTURE runs using the same samples as my previous run, except that I added one more Mari sample, which looks a bit more eastern than the old one. In the run with 2 components, the old Mari sample got 31% of the eastern component and the new sample got 33%; in the run with 4 components, the old Mari sample is 27% Siberian and 4% East Asian, and the new one is 28% Siberian and 5% East Asian. This time, splitting the eastern component into Siberian and East Asian components actually reduced the total amount of East Eurasian ancestry in some Uralic populations, though usually it's the reverse. Maybe that's because in the K=4 run the West Eurasian component also split into European and MENA components, giving Uralic populations a component that fits their western ancestry better than the combined European-MENA component of the K=2 run.

https://i.ibb.co/sQVhVBF/circlize.jpg

I created the plot using the Circlize package: https://anthrogenica.com/showthread.php?23708-Shell-and-R-scripts-for-SmartPCA-and-ADMIXTURE&p=771174&viewfull=1#post771174.

BTW the K=3 run is also interesting, because the average amount of the European component is about as high in Swedish Saami as in Latvians, and it's about as high in Khanty as in Bulgarians:

https://i.ibb.co/QHxd5nd/admixpoly.jpg

There are two samples labeled Tambets_Tatars, but I think one of them is a Crimean Tatar, because it plots close to Nogais from Karachay-Cherkessia. There is one sample labeled Tambets_Buryats, but it looks like a hapa (a mixed East Asian/European individual).

Tsakhur
09-24-2021, 06:54 PM
Nice! Great comparison of various Eurasian populations. There are also new Mari samples which haven't been added to G25 yet, and they seem slightly less ENA than the ones already in the datasheet: https://anthrogenica.com/showthread.php?22733-651-New-Sample-for-G25

It seems Volga-Ural peoples also score minor amounts of the East Asian and MENA components, which likely come from Turkic and Iranian tribes like the Scythians and Sarmatians who used to roam the area. There are some Scythians and Sarmatians from the Urals in the G25 database, for instance.

Nganasankhan
09-24-2021, 07:21 PM
Yeah, I tried making a SmartPCA run of the samples in my previous post, and I used a G25-style scaled datasheet of the run to model Udmurts, but they got 11% Darginian and 1% Kalash ancestry:

Udmurt (.002):
37% Tambets_Maris
24% Tambets_Khanty
20% Tambets_Latvians
11% Darginian
3% Tambets_Saami_SWE
3% Tubalar
1% Kalash
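A "G25-style scaled" datasheet is, as far as I understand the convention, a table of SmartPCA coordinates in which each PC is multiplied by the square root of its eigenvalue, so that PCs explaining more variance weigh more in Euclidean distances. A minimal Python sketch under that assumption (the exact scaling used in the post's own script may differ); the scores and eigenvalues are toy numbers:

```python
from math import sqrt


def scale_pcs(scores: dict[str, list[float]], eigenvalues: list[float]) -> dict[str, list[float]]:
    """Multiply PC k of every sample by sqrt(eigenvalue k)."""
    factors = [sqrt(ev) for ev in eigenvalues]
    return {name: [x * f for x, f in zip(row, factors)] for name, row in scores.items()}


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two scaled rows."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy values, not from the run.
scores = {"Pop_A": [0.10, -0.20], "Pop_B": [0.05, 0.10]}
scaled = scale_pcs(scores, [4.0, 1.0])
print(scaled)
print(round(distance(scaled["Pop_A"], scaled["Pop_B"]), 4))
```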

Also if Udmurts are modeled using a global f2 matrix of 265 populations (https://anthrogenica.com/showthread.php?23677-R-scripts-for-ADMIXTOOLS-2&p=798523&viewfull=1#post798523), and if Besermyans are excluded from the source populations, then Udmurts get 4% Darginian and 3% Kaitag ancestry:

Udmurt (.006):
47% Chuvash
25% Mansi
17% Russian_Archangelsk_Pinezhsky
4% Russian_Archangelsk_Leshukonsky
4% Darginian
3% Kaitag
1% Tlingit
0% Mordovian

If Chuvashes, Tatars, Bashkirs, and Mordovians are also excluded from the source populations, then Udmurts get 11% Tajik:

Udmurt (.007):
47% Russian_Archangelsk_Krasnoborsky
32% Mansi
11% Tajik
6% Yukagir_Forest
3% Aleut

If Mordvins are modeled using the f2 matrix, and if Tatars, Chuvashes, and Bashkirs are excluded from the source populations, then Mordvins get 7% Nogai and 2% Tajik ancestry:

Mordovian (.002):
61% Russian
29% Russian_Archangelsk_Krasnoborsky
7% Nogai_Karachay_Cherkessia
2% Aleut
2% Tajik
0% Udmurt

You can download the f2 matrix here: https://pastebin.com/raw/B1t0ESsj. You can then use it to create models with Vahaduo.
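
A Vahaduo-style model treats each population's row (here, its f2 distances to the column populations; in G25, its 25 coordinates) as a vector, and looks for non-negative source weights that sum to 1 and bring the weighted average of the source rows as close as possible to the target row. The sketch below is only an illustration under that assumption: it uses plain Python with projected gradient descent instead of Vahaduo's own code or the cvxpy-based script mentioned in this thread, and the helper names (fit_mixture, project_to_simplex) and toy rows are hypothetical.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto the simplex {w : w_i >= 0, sum(w) == 1}."""
    u = sorted(v, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, value in enumerate(u, start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / j
        if value - candidate > 0:
            theta = candidate  # threshold for the largest j that still qualifies
    return [max(x - theta, 0.0) for x in v]


def fit_mixture(target, sources, iterations=5000):
    """Hypothetical helper: find weights w (w_i >= 0, sum w_i = 1) that minimise
    ||target - sum_i w_i * sources[i]|| by projected gradient descent.
    Returns (weights, euclidean_distance)."""
    k, dims = len(sources), len(target)
    # 1 / trace(S^T S) is a safe step size because the trace bounds the largest eigenvalue.
    trace = sum(x * x for s in sources for x in s)
    step = 1.0 / trace if trace > 0 else 1.0
    weights = [1.0 / k] * k

    def mixed(w):
        return [sum(w[i] * sources[i][d] for i in range(k)) for d in range(dims)]

    for _ in range(iterations):
        residual = [m - t for m, t in zip(mixed(weights), target)]
        gradient = [sum(r * x for r, x in zip(residual, s)) for s in sources]
        weights = project_to_simplex([w - step * g for w, g in zip(weights, gradient)])

    distance = math.sqrt(sum((m - t) ** 2 for m, t in zip(mixed(weights), target)))
    return weights, distance


# Toy rows (made up): the target is exactly 60% source 0 + 40% source 1.
weights, distance = fit_mixture(
    [0.18, 0.16, 0.18],
    [[0.1, 0.2, 0.3], [0.3, 0.1, 0.0], [0.0, 0.4, 0.1]],
)
print([round(w, 3) for w in weights], round(distance, 6))
```

The weights come back in source order together with the Euclidean fit distance; whether that distance is on exactly the same scale as the figures in parentheses in the models above is an assumption, not something stated in the thread.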

Nganasankhan
09-24-2021, 07:54 PM
Here are models made using the f2 matrix linked in my previous post, with Belarusian, Norwegian, Croatian, Mongol, Yakut, and Nganasan as sources:

$ curl -Ls pastebin.com/raw/B1t0ESsj|tr -d \\r>f2
$ curl -Ls pastebin.com/raw/afaMiFSa|tr -d \\r>mix;chmod +x mix
$ pip3 install cvxpy
[...]
$ for t in Hungarian Estonian Finnish Karelian Veps Mordovian Udmurt Mansi Selkup Enets Nganasan;do mix <(printf %s\\n Croatian Belarusian Yakut Nganasan Mongol Norwegian|awk -F, 'NR==FNR{a[$0];next}$1 in a' - f2) <(grep ^$t, f2) -s -t.01;done
Hungarian (.006): 93% Croatian + 6% Belarusian + 1% Mongol
Estonian (.005): 86% Belarusian + 12% Norwegian + 2% Nganasan
Finnish (.010): 93% Belarusian + 7% Mongol
Karelian (.009): 91% Belarusian + 9% Mongol
Veps (.009): 90% Belarusian + 7% Yakut + 3% Mongol
Mordovian (.019): 46% Croatian + 45% Belarusian + 9% Mongol
Udmurt (.028): 70% Belarusian + 30% Mongol
Mansi (.027): 59% Mongol + 40% Belarusian + 1% Yakut
Selkup (.023): 64% Yakut + 27% Belarusian + 10% Mongol
Enets (.017): 60% Yakut + 23% Nganasan + 16% Belarusian
Nganasan (.000): 100% Nganasan

I made the models using a modified version of michal3141's convex optimization script (https://github.com/michal3141/g25), which produces models that are virtually identical to Vahaduo.

Mongols are a mixed population with low drift, so they are preferred as a source in these f2-based models, but Nganasans are a drifted population that has high f2 distance to other populations, so they are not preferred as a source.
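
A toy simulation can illustrate why a drifted source like Nganasans is penalized: population-specific drift adds its own variance to every f2 involving that population, so a drifted population looks distant from everyone, even from populations it is closely related to. This sketch assumes Gaussian noise on allele frequencies as a stand-in for drift (real drift behaves more like a binomial process whose variance scales with p(1-p)), and all frequencies and labels are made up.

```python
import random


def f2(p, q):
    """Naive f2: mean squared allele-frequency difference over SNPs."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def drift(freqs, sd, rng):
    """Toy drift model (assumption): Gaussian noise per SNP, clamped to [0, 1]."""
    return [min(1.0, max(0.0, f + rng.gauss(0.0, sd))) for f in freqs]


rng = random.Random(42)
ancestral = [rng.uniform(0.1, 0.9) for _ in range(5000)]

# All three descend from the same ancestor; only the amount of drift differs.
target = drift(ancestral, 0.05, rng)
low_drift_source = drift(ancestral, 0.02, rng)   # large, mixed population
high_drift_source = drift(ancestral, 0.15, rng)  # small, isolated population

print(f"f2(target, low-drift source)  = {f2(target, low_drift_source):.5f}")
print(f"f2(target, high-drift source) = {f2(target, high_drift_source):.5f}")
```

Ignoring the clamping, the expected f2 here is roughly the sum of the two drift variances (0.05² + 0.02² versus 0.05² + 0.15²), which is why the high-drift source sits much further from the target despite sharing the same ancestor.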

This employs the same concept as what Kale calls f3 outgroup nMonte (https://anthrogenica.com/showthread.php?16180-Modern-poles-of-Eurasian-variation), except I use a matrix of f2 stats where the columns consist of 265 modern populations, and he uses a matrix of f3 stats where the columns consist of about 10 ancient populations.
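
For comparing the two setups: with allele frequencies p_O, p_A, p_B, the outgroup statistic f3(O; A, B) is the average of (p_O - p_A)(p_O - p_B), and it can be rewritten exactly in terms of f2 as (f2(O, A) + f2(O, B) - f2(A, B)) / 2. The check below uses naive estimators without sample-size corrections and made-up frequencies; it is not Kale's code.

```python
def f2(p, q):
    """Naive f2: mean squared allele-frequency difference."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def f3(o, a, b):
    """Naive outgroup f3(O; A, B): mean of (p_O - p_A) * (p_O - p_B)."""
    return sum((x - y) * (x - z) for x, y, z in zip(o, a, b)) / len(o)


def f3_from_f2(o, a, b):
    """Same quantity via the identity f3 = (f2(O,A) + f2(O,B) - f2(A,B)) / 2."""
    return (f2(o, a) + f2(o, b) - f2(a, b)) / 2


# Hypothetical allele frequencies at four SNPs.
outgroup = [0.2, 0.5, 0.9, 0.4]
pop_a = [0.3, 0.4, 0.7, 0.6]
pop_b = [0.1, 0.7, 0.8, 0.5]
print(f3(outgroup, pop_a, pop_b), f3_from_f2(outgroup, pop_a, pop_b))
```

Because of this identity, both setups summarize the same kind of allele-frequency differences; they differ mainly in which populations make up the columns.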

Tsakhur
09-27-2021, 07:39 AM
From what I understand the TRVE Uralics like Khanty, Mansi, Nenets are not West Eurasian whatsoever; their most "West Eurasian" component comes from EHG admix that was present in Siberia since forever

The wannabe Uralics like Mari, Udmurts, Komi are assimilated Corded Ware and Combed Ware people with minor TRVE Uralic admixture

If we're talking about Moksha/Erzya/Karelians/Finns/Estonians/Hungarians then they're just typical Europeans of Corded Ware, Combed Ware, Bell Beaker descent with pretty much nothing eastern

Also, you might have a hard time believing this, but according to G25 runs, many Uralic groups, such as the Volga Udmurts and Mari, and even the Saami, are genetically closer to Tajiks, Hazaras, Siberian and Central Asian Turkic peoples such as Siberian Tatars, Uzbeks, Turkmens and Nogais, and Ugric peoples such as the Khanty/Mansi than they are to most ethnic Russians and other Europeans. The only exception seems to be Finns: they also have kra001/Nganasan-related East Eurasian admixture, which they share with Udmurts, and this pulls them closer to Udmurts than most other Europeans:

Here is the Udmurt average from G25:

Distance to: Udmurt

0.09233220 Bashkir
0.10996572 Tatar_Crimean_steppe
0.12669984 Tatar_Siberian
0.13017299 Turkmen
0.13209317 Turkmen_Uzbekistan
0.13358622 Tajik_Hisor
0.13659690 Finnish_East
0.13908269 Tajik_Ayni
0.13951604 Sarikoli_China
0.14377135 Tatar_Siberian_Zabolotniye
0.14399868 Tajik_Shugnan
0.14419189 Russian_Kostroma
0.14453708 Iranian_Turkmen_Golestan
0.14518332 Tajik_Rushan
0.14537728 Yukagir_Forest
0.14611296 Tajik_Kulob
0.14620147 Tajik_Badakshan
0.14730403 Uzbek
0.15097086 Finnish
0.15341731 Tajik_Ishkashim
0.15670755 Tlingit
0.15985649 Cossack_Kuban
0.16346492 Mansi
0.16363002 Tajik_Yagnobi
0.16588757 Russian_Yaroslavl
0.16668715 Russian_Tver
0.16744887 Nogai
0.16839026 Russian_Ryazan
0.17037648 Turkish_Northwest
0.17140857 Kho_Singanali
0.17162638 Turkish_Balikesir
0.17507607 Turkish_Rumeli
0.17543081 Russian_Kursk
0.17682555 Turkish_Southwest
0.17705807 Jatt_Pathak
0.17723851 Russian_Orel
0.17734978 Turkish_Deliorman
0.17771331 Khanty
0.17812440 Turkish_South
0.17864308 Turkish_Aydin
0.17880079 Cossack_Ukrainian
0.17908975 Hazara_Afghanistan
0.17924798 Russian_Kaluga
0.17929713 Estonian
0.17959051 Ror
0.18171679 Russian_Voronez
0.18211018 Russian_Pskov
0.18344477 Russian_Belgorod
0.18412141 Uygur
0.18449143 Ukrainian
0.18549235 Turkish_North
0.18763907 Uthmankhel
0.18771594 Russian_Smolensk
0.18895298 Swedish
0.18954798 Hazara
0.18959742 Polish
0.18964876 Hungarian
0.19272786 Latvian
0.19318914 Karakalpak
0.19554328 Lithuanian_PZ
0.19861093 German
0.20141186 Turkish_Central
0.20544633 Tubalar
0.21700773 Shor_Mountain
0.21998997 Shor_Khakassia
0.22240629 Shor
0.22504281 Kazakh


Most Eastern-shifted Udmurt individual:

Distance to: Udmurt:udmurd8

0.08187760 Bashkir
0.10852740 Tatar_Crimean_steppe
0.11499101 Tatar_Siberian
0.12768661 Turkmen
0.12839383 Turkmen_Uzbekistan
0.13030726 Tatar_Siberian_Zabolotniye
0.13430221 Yukagir_Forest
0.13723756 Tajik_Hisor
0.14008865 Uzbek
0.14242160 Tajik_Ayni
0.14307951 Sarikoli_China
0.14503555 Iranian_Turkmen_Golestan
0.14931869 Tlingit
0.15002433 Finnish_East
0.15051991 Mansi
0.15082985 Tajik_Shugnan
0.15099631 Tajik_Kulob
0.15170661 Tajik_Badakshan
0.15262305 Tajik_Rushan
0.15733351 Nogai
0.15807654 Russian_Kostroma
0.15932898 Tajik_Ishkashim
0.16412459 Finnish
0.16433979 Khanty
0.16977493 Hazara_Afghanistan
0.17150233 Tajik_Yagnobi
0.17424742 Cossack_Kuban
0.17470455 Uygur
0.17473450 Kho_Singanali
0.17560038 Turkish_Northwest
0.17658199 Turkish_Balikesir
0.17977632 Hazara
0.18022385 Russian_Yaroslavl
0.18054395 Russian_Tver
0.18132893 Jatt_Pathak
0.18183389 Turkish_Southwest
0.18225833 Karakalpak
0.18227607 Russian_Ryazan
0.18257704 Turkish_South
0.18350451 Turkish_Aydin
0.18364497 Ror
0.18371928 Turkish_Rumeli
0.18694029 Turkish_Deliorman
0.18932506 Russian_Kursk
0.19037902 Turkish_North
0.19079839 Russian_Orel
0.19152406 Uthmankhel
0.19219383 Cossack_Ukrainian
0.19309090 Russian_Kaluga
0.19334510 Estonian
0.19378021 Tubalar
0.19543075 Russian_Voronez
0.19625473 Russian_Pskov
0.19722412 Russian_Belgorod
0.19809225 Ukrainian
0.20108449 Swedish
0.20135408 Russian_Smolensk
0.20161450 Hungarian
0.20302204 Polish
0.20469775 Shor_Mountain
0.20685367 Turkish_Central
0.20715504 Latvian
0.20766697 Shor_Khakassia
0.20963820 Lithuanian_PZ
0.20993455 Shor
0.21031218 German
0.21358828 Kazakh


Now compare with the Mari, who should on average have more East Eurasian/ENA ancestry than Udmurts:

Distance to: Mari

0.11547308 Bashkir
0.13857526 Tatar_Siberian
0.14622919 Tatar_Siberian_Zabolotniye
0.14750185 Tatar_Crimean_steppe
0.15247870 Yukagir_Forest
0.15879922 Mansi
0.16609517 Turkmen
0.16666745 Turkmen_Uzbekistan
0.17214423 Tlingit
0.17229907 Uzbek
0.17348576 Khanty
0.17705239 Finnish_East
0.17783813 Nogai
0.18294903 Tajik_Hisor
0.18422114 Iranian_Turkmen_Golestan
0.18522712 Russian_Kostroma
0.18617886 Tajik_Ayni
0.18909122 Sarikoli_China
0.19255765 Finnish
0.19488972 Tajik_Kulob
0.19535284 Tajik_Shugnan
0.19586713 Hazara_Afghanistan
0.19632056 Tajik_Badakshan
0.19748239 Tajik_Rushan
0.19884563 Cossack_Kuban
0.19923069 Uygur
0.19996889 Karakalpak
0.20260554 Tajik_Ishkashim
0.20344348 Hazara
0.20516483 Russian_Yaroslavl
0.20847626 Russian_Tver
0.20920322 Russian_Ryazan
0.21058514 Turkish_Northwest
0.21182997 Tubalar
0.21232139 Tajik_Yagnobi
0.21235883 Turkish_Balikesir
0.21577945 Russian_Kursk
0.21622425 Turkish_Southwest
0.21650129 Kho_Singanali
0.21689662 Turkish_Aydin
0.21701708 Russian_Kaluga
0.21741102 Turkish_Rumeli
0.21763448 Estonian
0.21768552 Turkish_South
0.21780212 Turkish_Deliorman
0.21798475 Russian_Orel
0.21984970 Cossack_Ukrainian
0.22059257 Russian_Voronez
0.22088477 Russian_Pskov
0.22192787 Shor_Mountain
0.22200313 Jatt_Pathak
0.22287509 Russian_Belgorod
0.22346587 Ror
0.22396805 Turkish_North
0.22486058 Ukrainian
0.22617737 Shor_Khakassia
0.22640049 Shor
0.22654435 Russian_Smolensk
0.22718034 Kazakh
0.22981453 Polish
0.23081106 Hungarian
0.23084455 Uthmankhel
0.23128921 Latvian
0.23172467 Swedish
0.23346019 Lithuanian_PZ
0.23934037 Turkish_Central
0.24026907 German


Most East Asian-shifted Mari:

Distance to: Mari:mari1

0.11040352 Bashkir
0.13218886 Tatar_Siberian
0.13714602 Tatar_Siberian_Zabolotniye
0.14681540 Yukagir_Forest
0.14881216 Tatar_Crimean_steppe
0.14953210 Mansi
0.16383451 Khanty
0.16693582 Turkmen
0.16705886 Turkmen_Uzbekistan
0.16769483 Tlingit
0.16971483 Uzbek
0.17234959 Nogai
0.18345203 Finnish_East
0.18634596 Tajik_Hisor
0.18677092 Iranian_Turkmen_Golestan
0.18919353 Tajik_Ayni
0.19187852 Russian_Kostroma
0.19201913 Sarikoli_China
0.19206068 Hazara_Afghanistan
0.19396984 Karakalpak
0.19471678 Uygur
0.19885824 Hazara
0.19891662 Tajik_Kulob
0.19894694 Finnish
0.20018331 Tajik_Shugnan
0.20028861 Tajik_Badakshan
0.20214528 Tajik_Rushan
0.20393972 Tubalar
0.20712006 Cossack_Kuban
0.20717671 Tajik_Ishkashim
0.21260915 Russian_Yaroslavl
0.21346326 Shor_Mountain
0.21551440 Russian_Tver
0.21558532 Turkish_Northwest
0.21656360 Russian_Ryazan
0.21710570 Turkish_Balikesir
0.21750578 Tajik_Yagnobi
0.21750915 Shor_Khakassia
0.21771129 Shor
0.22030140 Kazakh
0.22037648 Kho_Singanali
0.22170558 Turkish_Southwest
0.22199340 Turkish_Aydin
0.22267565 Turkish_South
0.22306003 Russian_Kursk
0.22430097 Turkish_Rumeli
0.22434148 Russian_Kaluga
0.22465072 Turkish_Deliorman
0.22484249 Estonian
0.22493215 Russian_Orel
0.22638171 Jatt_Pathak
0.22698380 Cossack_Ukrainian
0.22792117 Russian_Voronez
0.22802434 Ror
0.22830061 Russian_Pskov
0.22879830 Turkish_North
0.23017250 Russian_Belgorod
0.23218075 Ukrainian
0.23369090 Russian_Smolensk
0.23532068 Uthmankhel
0.23707678 Polish
0.23795651 Hungarian
0.23845057 Latvian
0.23846213 Swedish
0.24026953 Lithuanian_PZ
0.24473670 Turkish_Central
0.24734527 German


Here is the Saami average:

Distance to: Saami

0.11182593 Bashkir
0.12036459 Finnish_East
0.12774128 Tatar_Crimean_steppe
0.13609329 Russian_Kostroma
0.13826217 Finnish
0.14160670 Yukagir_Forest
0.14216068 Tatar_Siberian
0.15078272 Cossack_Kuban
0.15607780 Tatar_Siberian_Zabolotniye
0.15704647 Russian_Yaroslavl
0.15796967 Tlingit
0.15887672 Russian_Tver
0.16336793 Turkmen
0.16338782 Russian_Ryazan
0.16459138 Turkmen_Uzbekistan
0.16632238 Estonian
0.16927429 Russian_Kursk
0.16938225 Tajik_Hisor
0.17094294 Russian_Kaluga
0.17167384 Russian_Orel
0.17178954 Russian_Pskov
0.17240465 Cossack_Ukrainian
0.17367742 Uzbek
0.17400811 Mansi
0.17453495 Tajik_Ayni
0.17516197 Sarikoli_China
0.17518874 Russian_Voronez
0.17794170 Russian_Belgorod
0.17822236 Tajik_Rushan
0.17827602 Tajik_Shugnan
0.17868669 Iranian_Turkmen_Golestan
0.17927329 Ukrainian
0.17931411 Latvian
0.18056644 Russian_Smolensk
0.18141920 Tajik_Badakshan
0.18157993 Nogai
0.18204987 Tajik_Kulob
0.18279075 Swedish
0.18283997 Lithuanian_PZ
0.18381285 Polish
0.18837516 Tajik_Ishkashim
0.18848438 Khanty
0.18949914 Hungarian
0.19112002 Turkish_Deliorman
0.19124715 Turkish_Rumeli
0.19634888 Turkish_Northwest
0.19658845 Tajik_Yagnobi
0.19719818 German
0.19724073 Turkish_Balikesir
0.20194140 Hazara_Afghanistan
0.20361165 Turkish_Southwest
0.20432707 Uygur
0.20462215 Turkish_Aydin
0.20636918 Karakalpak
0.20693335 Turkish_South
0.20699293 Kho_Singanali
0.20854188 Jatt_Pathak
0.21102027 Hazara
0.21116963 Ror
0.21361686 Turkish_North
0.21653079 Tubalar
0.22249621 Uthmankhel
0.22875742 Turkish_Central
0.22888814 Shor_Mountain
0.23134900 Shor_Khakassia
0.23286304 Shor
0.23581486 Kazakh


Most East Asian-shifted Saami sample:

Distance to: Saami:saami2

0.09848462 Bashkir
0.12327083 Yukagir_Forest
0.12433077 Tatar_Siberian
0.13207427 Tatar_Crimean_steppe
0.13276869 Tatar_Siberian_Zabolotniye
0.14038424 Tlingit
0.14647190 Finnish_East
0.15131717 Mansi
0.16238275 Russian_Kostroma
0.16399254 Finnish
0.16441909 Khanty
0.16453117 Turkmen_Uzbekistan
0.16464982 Turkmen
0.16634323 Uzbek
0.16761669 Nogai
0.17878843 Tajik_Hisor
0.17951571 Cossack_Kuban
0.18265408 Sarikoli_China
0.18304216 Tajik_Ayni
0.18317926 Russian_Yaroslavl
0.18423148 Iranian_Turkmen_Golestan
0.18544337 Russian_Tver
0.18957098 Russian_Ryazan
0.19026927 Karakalpak
0.19066948 Hazara_Afghanistan
0.19071665 Tajik_Shugnan
0.19160642 Tajik_Rushan
0.19176525 Tajik_Badakshan
0.19271019 Tajik_Kulob
0.19294976 Uygur
0.19302237 Estonian
0.19451723 Tubalar
0.19534246 Russian_Kursk
0.19670823 Russian_Kaluga
0.19704706 Russian_Orel
0.19782409 Cossack_Ukrainian
0.19811217 Hazara
0.19827910 Russian_Pskov
0.19911121 Tajik_Ishkashim
0.20084988 Russian_Voronez
0.20395317 Russian_Belgorod
0.20524660 Ukrainian
0.20555402 Latvian
0.20557912 Shor_Mountain
0.20588287 Russian_Smolensk
0.20770330 Shor_Khakassia
0.20779264 Swedish
0.20824022 Lithuanian_PZ
0.20962900 Shor
0.20964550 Polish
0.21008945 Turkish_Northwest
0.21076801 Turkish_Balikesir
0.21104590 Turkish_Rumeli
0.21150202 Tajik_Yagnobi
0.21176201 Turkish_Deliorman
0.21404769 Kho_Singanali
0.21421140 Hungarian
0.21756288 Jatt_Pathak
0.21783864 Turkish_Southwest
0.21796627 Kazakh
0.21816001 Turkish_Aydin
0.21957202 Turkish_South
0.21980440 Ror
0.22179960 German
0.22679866 Turkish_North
0.23126550 Uthmankhel
0.24294135 Turkish_Central


Do you find these runs shocking? Are you astonished that Udmurts, Mari and even a related group like some of the Saami are genetically closer to Tajiks, many Central Asians, Siberian Turks and the Khanty/Mansi than to ethnic Russians and most other Europeans?

Heck, even the mari1 sample is closer to Kazakhs than to Russians from Kursk, Kaluga and Orel, and the udmurd8 individual is genetically closer to Hazaras from Afghanistan and to Uyghurs than he/she is to Russians from Yaroslavl, Tver and Ryazan. Pretty fascinating, imo.
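
The "Distance to:" lists above rank populations by how far their averaged G25 coordinates are from the target. A minimal sketch of how such a list is produced, assuming plain Euclidean distance over the scaled coordinates (as Vahaduo-style tools commonly report) and using made-up 3-coordinate rows in place of the real 25-coordinate datasheet; distances_to is a hypothetical helper:

```python
import math


def distances_to(target, datasheet):
    """Return (distance, population) pairs sorted from closest to farthest."""
    origin = datasheet[target]
    ranked = []
    for name, coords in datasheet.items():
        if name == target:
            continue
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(origin, coords)))
        ranked.append((distance, name))
    return sorted(ranked)


# Hypothetical rows; real G25 rows have 25 scaled coordinates.
datasheet = {
    "Target": [0.00, 0.00, 0.00],
    "Pop_A": [0.10, 0.00, 0.00],
    "Pop_B": [0.00, 0.20, 0.00],
    "Pop_C": [0.05, 0.05, 0.05],
}
for distance, name in distances_to("Target", datasheet):
    print(f"{distance:.8f} {name}")
```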

CyrylBojarski
09-27-2021, 08:37 AM
Estonians have almost no Siberian ancestry. Finns have it, but actually less than Northern Russians; it does peak in the Saami, though.

They have a bit

Target: Estonian
Distance: 1.6651% / 0.01665124 | R2P
92.0 Lithuanian_VZ
8.0 Udmurt

Nganasankhan
09-27-2021, 12:56 PM
I get a higher percentage of Udmurt in my f2-based model, but that's partly to compensate for Lithuanian-specific drift: when I add Belarusians to the sources, I get less Udmurt and a much better fit:

$ mix <(egrep 'Udmurt|Lithuanian' f2) <(grep Estonian f2) -s
Estonian (.0092): 87% Lithuanian + 13% Udmurt
$ mix <(egrep 'Udmurt|Lithuanian|Belarusian' f2) <(grep Estonian f2) -s
Estonian (.0028): 55% Belarusian + 38% Lithuanian + 7% Udmurt

I also tried making a SmartPCA run of the same 265 populations that are included in my global f2 matrix. Now in a G25-style scaled datasheet of the run, the best two-way model for Estonians was the same as your G25 model:

$ f=kasi.p.avescaled;t=Estonian;mix <(grep -v ^$t, $f) <(grep ^$t, $f) -d4 -s -m2
Estonian (.0025): 92% Lithuanian + 8% Udmurt


Also you might have a hard time believing this, but according to G25 runs, many Uralic groups like the Udmurts and Mari, and even the Saami, are genetically closer to Tajiks, Hazaras, Siberian and Central Asian Turkic peoples such as Siberian Tatars, Uzbeks, Turkmens and Nogais, and Ugric peoples such as the Khanty/Mansi than they are to most Russians and other Europeans. The only exception seems to be Finns, as they also have kra001/Nganasan-like East Eurasian admixture which they share with Udmurts, pulling them closer than other pops:

Based on my f2 matrix, Udmurts are even closer to Tajiks than to Mordvins, even though G25 shows the reverse. That's partly because f2 is more sensitive to drift than G25, and Tajiks and other Central Asians are mixed populations with low drift:

$ curl -Ls pastebin.com/raw/B1t0ESsj|tr -d \\r|awk -F, 'NR==1{for(i=2;i<=NF;i++)if($i==x)break;next}$1!=x{print$i,$1}' x=Udmurt -|sort -n|head -n32
.00159 Besermyan
.00244 Tatar_Kazan
.00253 Bashkir
.00306 Chuvash
.00372 Tatar_Mishar
.00378 Nogai_Karachay_Cherkessia
.00385 Tatar_Siberian
.00403 Russian_Archangelsk_Krasnoborsky
.00409 Uzbek
.00424 Yukagir_Forest
.00431 Tajik
.00436 Mordovian
.00450 Nogai_Stavropol
.00458 Russian
.00466 Turkmen
.00471 Russian_Archangelsk_Pinezhsky
.00477 Finnish
.00494 Russian_Archangelsk_Leshukonsky
.00500 Karelian
.00522 Estonian
.00524 Belarusian
.00524 Ukrainian
.00525 Ukrainian_North
.00528 Abazin
.00530 Kabardinian
.00532 Veps
.00534 Czech
.00534 Hungarian
.00556 Kumyk
.00559 Uyghur
.00563 Karakalpak
.00567 Nogai_Astrakhan

If you don't have GNU/Linux or macOS, you can run the command above by copying and pasting it here: https://www.mycompiler.io/new/bash.
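
For anyone who prefers Python to a Unix shell, the awk/sort/head step above can be mirrored as follows. This is a sketch, not the author's code: closest_by_column is a hypothetical helper, and it assumes the same layout the awk command relies on, i.e. a CSV whose first row lists the population names as column headers (after a leading label cell) and whose first column holds the row names. The file is passed in as text, so it can be a local copy saved with curl.

```python
import csv
import io


def closest_by_column(csv_text, target, top=32):
    """Sort populations by their value in the target's column, like the awk/sort/head pipeline."""
    rows = list(csv.reader(io.StringIO(csv_text.replace("\r", ""))))
    header = rows[0]
    column = header.index(target, 1)  # skip the leading label cell; ValueError if absent
    pairs = [(float(row[column]), row[0]) for row in rows[1:] if row and row[0] != target]
    return sorted(pairs)[:top]


# Example usage with a local copy of the matrix (the file name "f2" is an assumption):
# with open("f2") as handle:
#     for value, name in closest_by_column(handle.read(), "Udmurt"):
#         print(f"{value:.5f} {name}")
```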

Even Finns are closer to Mishars than to the English or Icelanders, but that's partly because Mishars are less drifted than Icelanders. And Finns are also closer to Chuvashes than to North Italians:

f2 distance to Finnish:
.00034 Karelian
.00074 Russian_Archangelsk_Krasnoborsky
.00076 Estonian
.00102 Russian
.00126 Ukrainian
.00127 Belarusian
.00131 Hungarian
.00137 Mordovian
.00138 Ukrainian_North
.00146 Czech
.00147 Veps
.00156 Norwegian
.00164 Russian_Archangelsk_Pinezhsky
.00173 Lithuanian
.00173 Tatar_Mishar
.00174 English
.00175 French
.00176 Icelandic
.00176 Tatar_Kazan
.00177 Croatian
.00184 Bulgarian
.00223 Romanian
.00227 Orcadian
.00232 Gagauz
.00239 Scottish
.00239 Spanish
.00247 Russian_Archangelsk_Leshukonsky
.00259 Chuvash
.00262 Moldavian
.00269 Italian_North
.00271 Albanian
.00281 Greek

Tsakhur
09-28-2021, 02:39 AM
Yes, unfortunately I have neither GNU/Linux nor macOS. I tried copying and pasting the command above into that site, but for some reason it keeps printing " main.sh: line 2: $: command not found
Hello world!
[Program exited with exit code 0]". So I'm not sure what I've done wrong. :confused:

I'm surprised that the Khanty and Mansi didn't show up in the f2 Udmurt matrix. It's probably because both of these Ugric groups have heavy drift, which makes them seem more distant. I wonder which one is more accurate in measuring the genetic distance between populations: f2 or G25?

I think the "Russian" sample that shows up before the Turkmen in the Udmurt run is a heavily Finnic-admixed Northern Russian population, because the other Russian samples around it, Russian_Archangelsk_Krasnoborsky, Pinezhsky and Leshukonsky, are also Northern Russians with heavy East Eurasian ancestry. Can you add Russian_Smolensk, Russian_Ryazan, Russian_Voronezh and Russian_Orel to the Udmurt runs? I want to see what ranks they get.

Furthermore, it's really peculiar that in f2, Czechs and Hungarians are closer to Udmurts than Uyghurs are, and "Russian" is closer to Udmurts than Turkmens are, when in G25 it is the opposite.

Also, can you include groups such as Turkish_Balikesir, Hazara_Afghanistan, Turkish_Northwest, Kho_Singanali, Tajik_Rushan, Khanty, Tubalar and Shor in the Udmurt runs?

Finally can you conduct the same f2 matrix runs for the Mari and Chuvash?

Nganasankhan
09-28-2021, 12:00 PM
Yes, unfortunately I have neither GNU/Linux nor macOS. I tried copying and pasting the command above into that site, but for some reason it keeps printing " main.sh: line 2: $: command not found
Hello world!
[Program exited with exit code 0]". So I'm not sure what I've done wrong. :confused:

You have to remove the dollar sign when you run the command.


I'm surprised that the Khanty and Mansi didn't show up in the f2 Udmurt matrix. It's probably because both of these Ugric groups have heavy drift, which makes them seem more distant. I wonder which one is more accurate in measuring the genetic distance between populations: f2 or G25?

Khanty were not included in the run, but Mansi were at rank 48 (after Icelanders but before Scots). My f2 matrix included 8 Mansi samples from two different locations, but maybe Mansi would get lower f2 to other populations if I used samples from more locations or a bigger number of samples. And it's not just that Khanty and Mansi are drifted; they are also distinctive populations with high ANE / WSHG ancestry.

f2 measures genetic distance directly from allele frequencies, and it's usually close to FST. Distances calculated from a PCA like G25 depend a lot on how many dimensions you use, and Davidski could just as well have included 10 or 100 or 1000 dimensions in G25. If G25 had more dimensions, it could account for population-specific drift more accurately. The results of G25 also depend on which samples were part of the initial reference run that later samples were projected into (I don't know if Davidski has published that list anywhere).
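
To see why the number of dimensions matters: a population with its own private drift may differ from the target mainly on a later component, so a distance computed from only the first few components can rank it as the closest population while the full set of components does not. The coordinates below are invented for illustration and pca_distance is a hypothetical helper.

```python
import math


def pca_distance(a, b, k):
    """Euclidean distance using only the first k principal components."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[:k], b[:k])))


target = [0.10, 0.00, 0.00, 0.00]
drifted = [0.12, 0.00, 0.00, 0.08]  # close on PC1, but with a private PC4 signal
admixed = [0.05, 0.02, 0.00, 0.00]  # further on PC1, no private signal

for k in (1, 4):
    if pca_distance(target, drifted, k) < pca_distance(target, admixed, k):
        closer = "drifted"
    else:
        closer = "admixed"
    print(f"using {k} PC(s): the {closer} population is closer")
```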


I think the "Russian" sample that shows up before the Turkmen in the Udmurt run is a heavily Finnic-admixed Northern Russian population, because the other Russian samples around it, Russian_Archangelsk_Krasnoborsky, Pinezhsky and Leshukonsky, are also Northern Russians with heavy East Eurasian ancestry. Can you add Russian_Smolensk, Russian_Ryazan, Russian_Voronezh and Russian_Orel to the Udmurt runs? I want to see what ranks they get.

Furthermore, it's really peculiar that in f2, Czechs and Hungarians are closer to Udmurts than Uyghurs are, and "Russian" is closer to Udmurts than Turkmens are, when in G25 it is the opposite.

The population named "Russian" consists of 71 samples from all over Russia, so it's more mixed than the sets of samples from a single district of Arkhangelsk Oblast, and it gets lower f2 distance to other populations. Also in my f2 run, the population named "Russian" had more samples than almost any other population, and populations with a bigger number of samples get lower f2 distance to other populations, because their allele frequencies get more averaged out so they are less affected by random individual-level variation.
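
The sample-size effect can be reproduced with the naive estimator: an allele frequency estimated from h sampled haplotypes carries sampling variance of about p(1-p)/h, and that noise adds directly to the squared difference. The toy sketch below samples the same true frequencies once from 5 individuals and once from 100, and also shows the usual unbiased correction, which subtracts a(1-a)/(h-1) for each population. Frequencies are made up, and whether a particular f2 pipeline applies such a correction is outside the scope of this sketch.

```python
import random


def sample_frequencies(true_freqs, n_individuals, rng):
    """Observed allele frequencies from n diploid individuals (2n haplotypes per SNP)."""
    haplotypes = 2 * n_individuals
    return [sum(rng.random() < p for _ in range(haplotypes)) / haplotypes for p in true_freqs]


def naive_f2(p, q):
    """Mean squared difference of observed frequencies (biased upward by sampling noise)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def corrected_f2(p, q, n_p, n_q):
    """Subtract the expected sampling noise a(1-a)/(h-1) for each population (h = 2n)."""
    h_p, h_q = 2 * n_p, 2 * n_q
    total = sum((a - b) ** 2 - a * (1 - a) / (h_p - 1) - b * (1 - b) / (h_q - 1)
                for a, b in zip(p, q))
    return total / len(p)


rng = random.Random(7)
true_freqs = [rng.uniform(0.1, 0.9) for _ in range(2000)]

# The other population has exactly the same true frequencies, so the true f2 is 0.
reference = sample_frequencies(true_freqs, 100, rng)
small = sample_frequencies(true_freqs, 5, rng)
large = sample_frequencies(true_freqs, 100, rng)

print(f"naive f2, 5 samples:       {naive_f2(small, reference):.5f}")
print(f"naive f2, 100 samples:     {naive_f2(large, reference):.5f}")
print(f"corrected f2, 5 samples:   {corrected_f2(small, reference, 5, 100):.5f}")
print(f"corrected f2, 100 samples: {corrected_f2(large, reference, 100, 100):.5f}")
```

With these settings the naive value for 5 samples should come out roughly ten times larger than for 100, while both corrected values should sit near 0.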


Also, can you include groups such as Turkish_Balikesir, Hazara_Afghanistan, Turkish_Northwest, Kho_Singanali, Tajik_Rushan, Khanty, Tubalar and Shor in the Udmurt runs?

Finally can you conduct the same f2 matrix runs for the Mari and Chuvash?

It already included Turkish_Balikesir, Tubalar, and Shor. You can do it for Chuvashes yourself by using my code (or just sort the CSV file (https://pastebin.com/raw/B1t0ESsj) by the Chuvash column in a spreadsheet application).

Maris were not included in the run, but the Mari samples from Tambets et al. 2018 look like this:

f2 distance to Tambets_Maris:
0.001432 Tambets_Tatars
0.001529 Tambets_Mansi
0.003021 Tambets_Russians_Central
0.003265 Tambets_Karels
0.003789 Tambets_Swedes
0.004018 Tambets_Finns
0.004134 Tambets_Poles
0.004200 Tambets_Latvians
0.004322 Tambets_Ingrian
0.004527 Tambets_Estonians
0.005443 Tambets_Vepsas
0.005656 Tambets_Germans
0.006112 Tambets_Gagauzes
0.006552 Tambets_Saami_Kola
0.008591 Tambets_Khanty
0.009978 Tambets_Saami_SWE

The Mansi samples from Tambets ranged from about 10-60% Nganasan-like ancestry, with an average value of around 30-40%, so they are close to Maris on the list above. That's because f2 measures the difference in allele frequencies between population averages, and not the average difference in allele values between individual samples:

https://i.ibb.co/vwZXn3m/f2.png
https://uqrmaie1.github.io/admixtools/articles/admixtools.html#f-statistics-basics-1
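
A toy example of this point about population averages: if individuals in one group range from 10% to 60% of an East Eurasian-like component (mean 35%), the group's average allele frequencies equal those of a group where everyone carries 35%, so f2 between the two groups is zero even though the individuals differ from each other. The frequencies are invented, and each individual is represented by its expected allele frequency rather than an actual 0/1/2 genotype, which is a simplification.

```python
def f2(p, q):
    """Mean squared difference of frequencies (population averages or individuals)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def population_average(individuals):
    """Average each SNP's frequency over the individuals in a population."""
    return [sum(column) / len(column) for column in zip(*individuals)]


def mean_pairwise_difference(group_a, group_b):
    """Average squared difference between individuals, over all cross-group pairs."""
    pairs = [(x, y) for x in group_a for y in group_b]
    return sum(f2(x, y) for x, y in pairs) / len(pairs)


east = [0.9, 0.1, 0.7]  # hypothetical East Eurasian-like source frequencies
west = [0.2, 0.6, 0.3]  # hypothetical West Eurasian-like source frequencies


def individual(east_fraction):
    """Expected allele frequencies of an individual with a given admixture fraction."""
    return [east_fraction * e + (1 - east_fraction) * w for e, w in zip(east, west)]


varied_group = [individual(x) for x in (0.1, 0.3, 0.4, 0.6)]  # mean 0.35
uniform_group = [individual(0.35) for _ in range(4)]

print(f2(population_average(varied_group), population_average(uniform_group)))
print(mean_pairwise_difference(varied_group, uniform_group))
```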

Tsakhur
09-28-2021, 05:05 PM
Oh ok. It works now. :)

It's funny to me that Tajiks and a lot of Central Asian Turks are closer to Udmurts than the Mansi are. But you're right, it could be due to the low drift of those populations, since they are mixed, while the Mansi/Khanty, although also mixed, still have distinct genetic profiles. From what I know, Udmurts and Maris have pretty high ANE as well. They likely also have some WSHG, but not as much as the Khanty/Mansi.

Wouldn't G25 have 25 dimensions, though, given its name and its 25 principal components? Or is a dimension not the same thing as a principal component? And by "that list", do you mean the samples that were part of the initial reference run?

Right. Although I don't think it makes sense to have one "Russian" group containing 71 samples from all over the country while splitting the Russian_Archangelsk samples into Krasnoborsky, Pinezhsky and Leshukonsky. I think having one "Russian" group distorts the results of the f2 run. It would be better to split the 71 samples in the "Russian" cluster into Russian_Smolensk, Russian_Orel, Russian_Voronezh, Russian_Tver, Russian_Ryazan, etc. Otherwise, why separate "Russian_Archangelsk" into three groups?

Is there a way to include Mari in the run or find their coordinates? It would be great to see whether the Mari, compared with the Udmurts, are even closer to Central Asians (including Hazaras), the Mansi and Siberian Turks like the Shor, Tubalar and Altaians than they are to mainstream Europeans.

Those f2 distances for Tambets_Maris are very bizarre. Are Tambets_Maris really genetically closer to Russians_Central, Swedes, Poles, Latvians, Germans and Gagauzes than they are to Saami_Kola and Saami_SWE? Something is fishy here.

Ah OK, that makes sense. There seem to be at least three distinct clusters among the Mansi, then: one with low East Eurasian ancestry, another with an intermediate amount of Central Siberian-type ENA, and another with a high amount of Nganasan-related ancestry.

So I did the Chuvash f2 run and wow, they seem to have less ENA ancestry than Udmurts, as can be seen from how many mainstream Europeans rank closer to Chuvashes than Tajiks, other Central Asians and Nogais do. What's really puzzling is how Hungarians, Ukrainian_North and Czechs come up before Udmurts in the Chuvash f2 run, and how Chuvashes seem genetically closer to Belarusians than to Besermyans. This doesn't seem right, does it?

f2 distance to Chuvash:
.00079 Tatar_Kazan
.00142 Bashkir
.00151 Tatar_Mishar
.00167 Russian_Archangelsk_Krasnoborsky
.00204 Mordovian
.00213 Nogai_Karachay_Cherkessia
.00221 Russian
.00259 Finnish
.00259 Ukrainian
.00269 Karelian
.00271 Uzbek
.00274 Estonian
.00278 Belarusian
.00279 Besermyan
.00282 Tajik
.00283 Yukagir_Forest
.00285 Hungarian
.00289 Ukrainian_North
.00297 Tatar_Siberian
.00299 Czech
.00302 Russian_Archangelsk_Pinezhsky
.00306 Udmurt
.00309 Veps
.00321 Turkmen
.00322 Bulgarian
.00323 Croatian
.00325 Kabardinian
.00326 Nogai_Stavropol
.00330 Russian_Archangelsk_Leshukonsky
.00333 Lithuanian
.00338 Abazin
.00342 English
.00346 French
.00349 Norwegian
.00351 Gagauz
.00356 Kumyk
.00362 Icelandic
.00363 Circassian
.00363 Romanian
.00372 Turkish
.00384 Balkar
.00390 Orcadian
.00394 Moldavian
.00395 Albanian
.00401 Spanish
.00402 Scottish
.00405 Greek
.00413 Lezgin
.00415 Adygei
.00421 Italian_North

Nganasankhan
09-28-2021, 06:37 PM
So I did the Chuvash f2 run and wow, they seem to have less ENA ancestry than the Udmurts, as can be seen from how many mainstream Europeans rank closer to Chuvash than Tajiks, other Central Asians and Nogais do. What's really puzzling is how Hungarian, Ukrainian_North and Czech rank ahead of Udmurts in the Chuvash f2 list, or how Chuvash seem genetically closer to Belarusians than to Besermyan. That doesn't seem right, does it?

Yeah, but if you look at which populations are closest to Udmurts, Chuvashes rank 4th.

Hungarians are another mixed population that has low driftedness or high effective population size, so they're relatively close to other populations based on f2 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/):


The top admixture in the Asian continent can be observed among the Uzbek population from Central Asia (Uzbekistan Republic), Azerbaijanians, Iranians, and Pathan people from Pakistan. These populations lived for thousands of years in the middle of a complex network of trading, migration, and conquest events where millions of people representing the world's vast spectrum of genetic diversity passed through. Due to a historically high number of admixed people and large population sizes, the numbers of shared IBD fragments within the same population in the aforementioned groups are the lowest compared to the rest of the world (for Pathans, their shared number of IBDs among themselves is the record low - 8.95; this is followed by Azerbaijanians and Uyghurs, each at 10.8; Uzbeks at 11.1; Iranians at 12.2; and Turks at 12.9). Together, the low number of shared IBDs within the same population and the high values for relative relatedness from multiple DHGR components (Supplementary Table S4) indicate the populations with the strongest admixtures where millions of people from remote populations mixed with each other for hundreds of years. In Europe, such admixed populations include the Moldavians, Greeks, Italians, Hungarians, and Tatars, among others.
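To make the "low driftedness, so close to everyone" point concrete, here is a textbook sketch (standard population-genetics results, not something computed in this thread). For allele frequencies a and b in populations A and B, averaged over SNPs, f2 and its usual unbiased sample estimator (n_A, n_B = number of sampled chromosomes) are shown first; the second line assumes a simple split model with no admixture, where A and B descend from an ancestor with frequency p_0 and then drift for t_A and t_B generations at effective sizes N_A and N_B:

```latex
f_2(A,B) = \mathbb{E}\left[(a-b)^2\right], \qquad
\hat f_2(A,B) = (\hat a-\hat b)^2 - \frac{\hat a(1-\hat a)}{n_A-1} - \frac{\hat b(1-\hat b)}{n_B-1}

\mathbb{E}\left[f_2(A,B)\right] \approx \overline{p_0(1-p_0)}\left[\left(1-e^{-t_A/(2N_A)}\right) + \left(1-e^{-t_B/(2N_B)}\right)\right]
```

Because A's private-drift term appears in f2(A,B) for every partner B, a strongly drifted population (small N_A, such as an isolated group) is far from everyone, while a population with large effective size accumulates little private drift and stays comparatively close to many others. The formula covers drift only; it says nothing by itself about admixture.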


Right. Although I don't think it makes sense to have one "Russian" group that contains 71 samples from all over the country while separating the Russian_Archangelsk samples into Krasnoborsky, Pinezhsky and Leshukonsky. I think having one "Russian" group will distort the results of the f2 run. It's better to split those 71 samples in the "Russian" cluster into Russian_Smolensk, Russian_Orel, Russian_Voronezh, Russian_Tver, Russian_Ryazan, etc. Otherwise, why separate "Russian_Archangelsk" into three groups?

I guess the population labeled just "Russian" is meant to be semi-representative of Russians in general, so they didn't want it to include too many samples from Arkhangelsk. The Human Origins dataset doesn't have as many regional population averages as G25 does.


Those f2 Tambets_Mari distances are very bizarre. Tambets_Mari are genetically closer to Russian_Central, Swedes, Poles, Latvians, Germans and Gagauzes than they are to Saami_Kola and Saami_SWE? Something is fishy here.

Yeah cause Saami are far from everyone. If the populations are sorted by their distance to Tambets_Saami_SWE, then Maris rank third.


Is there a way to include Mari in the run, or to find their coordinates? It would be great to see whether Mari, compared with the Udmurt, are even closer to Central Asians (including Hazaras), Mansi and Siberian Turks like Shor, Tubalar and Altaians than they are to mainstream Europeans.

I made another f2 run where I included the one Mari sample that is in Tambets 2018, and also 8 samples from modern populations in the Human Origins dataset that have at least 8 samples. Here's the f2 distance to the Mari sample (there was something funky with merging the samples or the quality control, so the absolute distances don't make sense, but their relative order seems reasonable):


$ refam()(awk 'NR==FNR{a[$1]=$2;next}{$1=a[$2]}1' $1.{pick,fam}|sponge $1.fam)
$ f2c()(Rscript --no-init-file -e 'library(admixtools);p=commandArgs(T)[1];f=f2(p,unique_only=F);df=as.data.frame(f);write.csv(round(xtabs(df[,3]~df[,2]+df[,1]),6),paste0(p,".f2"),quote=F)' "$@")
$ x=marieight.p;refam $x;f2c $x; awk -F, 'NR==1{for(i=2;i<=NF;i++)if($i==x)break;next}$1!=x{print$i,$1}' x=Mari $x.f2|sort -n|awk '{$1=sprintf("%.5f",$1)}1'
0.00152 Chuvash
0.00371 Tatar_Mishar
0.00389 Tatar_Kazan
0.00398 Bashkir
0.00477 Tatar_Siberian
0.00481 Russian
0.00534 Nogai_Karachay_Cherkessia
0.00603 Nogai_Stavropol
0.00603 Udmurt
0.00627 Finnish
0.00639 Uzbek
0.00669 Hungarian
0.00673 Mordovian
0.00691 Ukrainian
0.00696 Karakalpak
0.00704 Kabardinian
0.00713 English
0.00718 Abazin
0.00728 Bulgarian
0.00735 Czech
0.00737 French
0.00744 Ukrainian_North
0.00755 Veps
0.00757 Tajik
0.00760 Estonian
0.00777 Icelandic
0.00779 Croatian
0.00780 Karelian
0.00784 Belarusian
0.00802 Moldavian
0.00808 Uyghur
0.00821 Spanish
0.00829 Azeri
0.00837 Romanian
0.00842 Norwegian
0.00850 Kazakh
0.00852 Lithuanian
0.00855 Orcadian
0.00859 Karachai
0.00861 Italian_North
0.00866 Balkar
0.00866 Greek
0.00869 Circassian
0.00874 Kumyk
0.00877 Gagauz
0.00878 Tabasaran
0.00879 Hazara
0.00880 Turkish
0.00885 Lezgin
0.00893 Ossetian
0.00915 Adygei
0.00915 Pathan
0.00916 Lak
0.00942 Iranian
0.00946 Sicilian
0.00962 Chechen
0.00971 Burusho
0.00978 Mansi
0.00989 Sindhi_Pakistan
0.00994 Avar
0.01002 Kyrgyz_Kyrgyzstan
0.01017 Kaitag
0.01026 Ezid
0.01040 Even
0.01052 Ingushian
0.01056 Cypriot
0.01059 Abkhasian
0.01067 Georgian
0.01103 Lebanese_Muslim
0.01109 Assyrian
0.01114 Lebanese
0.01119 Jew_Turkish
0.01119 Maltese
0.01128 Brahui
0.01135 Khakass
0.01137 Iranian_Bandari
0.01145 Armenian
0.01153 Lebanese_Christian
0.01157 Kyrgyz_China
0.01160 Syrian
0.01161 Makrani
0.01180 Jordanian
0.01181 Palestinian
0.01184 Balochi
0.01189 Basque
0.01192 Altaian
0.01195 Selkup
0.01196 Darginian
0.01202 Kurd
0.01209 Armenian_Hemsheni
0.01215 Kazakh_China
0.01215 Egyptian
0.01230 BedouinA
0.01299 Jew_Iranian
0.01323 Punjabi
0.01323 Altaian_Chelkan
0.01328 Druze
0.01351 Saudi
0.01352 Jew_Libyan
0.01370 Sardinian
0.01421 Tunisian
0.01438 Khakass_Kachin
0.01468 Kalmyk
0.01469 Yemeni_Highlands
0.01492 Moroccan
0.01524 Jew_Yemenite
0.01529 Tubalar
0.01561 Yemeni_Northwest
0.01569 Yemeni_Desert2
0.01602 Tuvinian
0.01645 Mozabite
0.01666 Kalash
0.01696 Ket
0.01705 Mongol
0.01724 Buryat
0.01728 Burmese
0.01745 Yemeni_Desert
0.01796 Salar
0.01819 Dungan
0.01826 Khamnegan
0.02008 BedouinB
0.02039 Tu
0.02057 Tibetan
0.02080 Yakut
0.02107 Daur
0.02108 Yugur
0.02121 Malay
0.02171 Thai
0.02175 Yukagir_Tundra
0.02187 Tofalar
0.02241 Bonan
0.02249 Cambodian
0.02254 Oroqen
0.02305 Hezhen
0.02440 Nanai
0.02442 Tujia
0.02472 Kinh
0.02473 Ulchi
0.02473 Japanese
0.02478 Evenk_Transbaikal
0.02486 Qiang
0.02495 Yi
0.02506 Kusunda
0.02518 Han
0.02524 Naxi
0.02562 Miao
0.02572 Vietnamese
0.02591 Zhuang
0.02594 Dong
0.02600 Mulam
0.02679 She
0.02722 Dai
0.02801 Gelao
0.02829 Chukchi
0.02843 Maonan
0.03004 Nganasan
0.03024 Nivh
0.03036 China_Lahu
0.03055 Dusun
0.03105 Somali
0.03184 Koryak
0.03186 Murut
0.03285 Mayan
0.03309 Ami
0.03393 Eskimo_Naukan
0.03710 Kankanaey
0.03718 Zapotec
0.03989 Atayal
0.03998 Mixtec
0.04073 Masai
0.04528 Mixe
0.04675 AA
0.04720 Pima
0.05067 Nasioi
0.05711 Luhya
0.05719 Luo
0.06092 Mandenka
0.06096 Malawi_Tumbuka
0.06114 Malawi_Chewa
0.06114 Esan
0.06124 Mende
0.06136 Malawi_Yao
0.06159 Yoruba
0.06167 Karitiana
0.06202 Khomani
0.06249 Papuan
0.06954 Surui
0.07218 Biaka
0.08343 Mbuti
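For anyone trying to reuse the pipeline above, here is an expanded, commented sketch of what its final awk step does, run on a made-up 3x3 matrix. It assumes the .f2 file has the layout that write.csv produces there (an empty first header cell, population names across the top, one row per population); the function name column_sorted and the toy values are hypothetical, not real f2 results.

```shell
# Hypothetical helper: expanded version of the awk column-extraction step.
# Assumed input: header row ",Pop1,Pop2,..." then rows "PopN,f2_to_Pop1,f2_to_Pop2,...".
column_sorted() {
  # $1 = f2 matrix CSV, $2 = target population
  awk -F, '
    # Header: scan for the column whose name equals the target. The loop
    # variable i keeps that index afterwards because awk variables are global.
    NR == 1 { for (i = 2; i <= NF; i++) if ($i == x) break; next }
    # Body: print "value-in-target-column row-name", skipping the target row.
    $1 != x { print $i, $1 }
  ' x="$2" "$1" | LC_ALL=C sort -n
}

# Made-up matrix, only to show the mechanics.
toy=$(mktemp)
cat > "$toy" <<'EOF'
,A,B,C
A,0,0.004,0.002
B,0.004,0,0.006
C,0.002,0.006,0
EOF
column_sorted "$toy" A   # prints "0.002 C" then "0.004 B"
rm -f "$toy"
```

LC_ALL=C is set on sort so that the decimal point is parsed the same way regardless of the user's locale.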

Tsakhur
09-29-2021, 03:19 AM
Yeah, but if you look at which populations are closest to Udmurts, Chuvashes rank 4th.

Hungarians are another mixed population that has low driftedness or high effective population size, so they're relatively close to other populations based on f2 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/):


I guess the population labeled just "Russian" is meant to be semi-representative of Russians in general, so they didn't want it to include too many samples from Arkhangelsk. The Human Origins dataset doesn't have as many regional population averages as G25 does.



Yeah cause Saami are far from everyone. If the populations are sorted by their distance to Tambets_Saami_SWE, then Maris rank third.



I made another f2 run where I included the one Mari sample that is in Tambets 2018, and also 8 samples from modern populations in the Human Origins dataset that have at least 8 samples. Here's the f2 distance to the Mari sample (there was something funky with merging the samples or the quality control, so the absolute distances don't make sense, but their relative order seems reasonable):


Mmm can you post the link which shows Chuvash being the 4th closest population to Udmurts?

Hungarians also have a very small amount of East Asian ancestry (probably more from Turkic invaders than from Huns/Magyars), 1-4% on average, so that probably pulls them closer to other populations as well. Are Ukrainians/Ukrainian_North and Czechs also mixed populations with low driftedness? They also have lower genetic distances to other groups, just like Hungarians do. Btw, what are Greeks and Italians mixed with according to that paragraph? As far as I know they don't have ENA admixture, unlike Hungarians. I'm guessing Moldavians are Moldovans, aka Romanian speakers?

Well, in my opinion it is pretty misleading to have one Russian group and then three other Russian groups split into Krasnoborsky, Pinezhsky and Leshukonsky, all of which seem to be Russified Uralics or heavily Finno-Ugric-admixed Northern Russians. I am certain Turkmens will be closer to Udmurts if the Russian samples are split into different categories by region of Russia, or if some of the significantly Uralic-admixed Northern Russians are removed from the "Russian" mega-sample (assuming there are some from Northern Russia among the 71 individuals).

It's possibly because the Saami are so geographically isolated, which leads to heavy drift among them. Ah ok, Maris coming up at #3 for Saami_SWE makes sense. Saami_SWE have more ENA ancestry than Saami_Kola, right? G25 also shows Saami as relatively close to Volga Uralics, so I think G25 manages to remove that genetic drift for the Saami.

Several rankings in the absolute distances are indeed incorrect. For instance, Maris are closer to Hungarians, English, Bulgarians, Czechs and French than to Tajiks? Or Maris are closer to Czechs and French than to Estonians and Uyghurs, or Udmurts rank after Russians? But yes, the relative order is somewhat okay, with Chuvash, Volga Tatars, Bashkirs, Siberian Tatars, Nogais and Uzbeks still among the populations closest to Maris. Btw, where is Nogai_Astrakhan? It should be somewhere between Ukrainian and Karakalpak, or before Hungarian, Mordovian and Ukrainian.

Could you select the two most Nganasan-shifted Maris out of the total of 8 samples from Human Origins and compute their absolute f2 distances to other populations?

al.Krivich
09-29-2021, 09:36 PM
Also, you might have a hard time believing this, but according to G25 runs many Volga Uralics like Udmurts and Mari, and even the Saami, are genetically closer to Tajiks, to Siberian and Central Asian Turkics such as Siberian Tatars, Uzbeks, Turkmens, Nogais and Hazaras, and to Ugrics such as the Khanty/Mansi than they are to most ethnic Russians and other Europeans. The only exception seems to be Finns, since they also have more kra001/Nganasan-related East Eurasian admixture, which they share with Udmurts, resulting in Finns being closer to Udmurts than other populations are:

Here is the Udmurt average from G25:

Distance to: Udmurt

0.09233220 Bashkir
0.10996572 Tatar_Crimean_steppe
0.12669984 Tatar_Siberian
0.13017299 Turkmen
0.13209317 Turkmen_Uzbekistan
0.13358622 Tajik_Hisor
0.13659690 Finnish_East
0.13908269 Tajik_Ayni
0.13951604 Sarikoli_China
0.14377135 Tatar_Siberian_Zabolotniye
0.14399868 Tajik_Shugnan
0.14419189 Russian_Kostroma
0.14453708 Iranian_Turkmen_Golestan
0.14518332 Tajik_Rushan
0.14537728 Yukagir_Forest
0.14611296 Tajik_Kulob
0.14620147 Tajik_Badakshan
0.14730403 Uzbek
0.15097086 Finnish
0.15341731 Tajik_Ishkashim
0.15670755 Tlingit
0.15985649 Cossack_Kuban
0.16346492 Mansi
0.16363002 Tajik_Yagnobi
0.16588757 Russian_Yaroslavl
0.16668715 Russian_Tver
0.16744887 Nogai
0.16839026 Russian_Ryazan
0.17037648 Turkish_Northwest
0.17140857 Kho_Singanali
0.17162638 Turkish_Balikesir
0.17507607 Turkish_Rumeli
0.17543081 Russian_Kursk
0.17682555 Turkish_Southwest
0.17705807 Jatt_Pathak
0.17723851 Russian_Orel
0.17734978 Turkish_Deliorman
0.17771331 Khanty
0.17812440 Turkish_South
0.17864308 Turkish_Aydin
0.17880079 Cossack_Ukrainian
0.17908975 Hazara_Afghanistan
0.17924798 Russian_Kaluga
0.17929713 Estonian
0.17959051 Ror
0.18171679 Russian_Voronez
0.18211018 Russian_Pskov
0.18344477 Russian_Belgorod
0.18412141 Uygur
0.18449143 Ukrainian
0.18549235 Turkish_North
0.18763907 Uthmankhel
0.18771594 Russian_Smolensk
0.18895298 Swedish
0.18954798 Hazara
0.18959742 Polish
0.18964876 Hungarian
0.19272786 Latvian
0.19318914 Karakalpak
0.19554328 Lithuanian_PZ
0.19861093 German
0.20141186 Turkish_Central
0.20544633 Tubalar
0.21700773 Shor_Mountain
0.21998997 Shor_Khakassia
0.22240629 Shor
0.22504281 Kazakh


Most Eastern-shifted Udmurt individual:

Distance to: Udmurt:udmurd8

0.08187760 Bashkir
0.10852740 Tatar_Crimean_steppe
0.11499101 Tatar_Siberian
0.12768661 Turkmen
0.12839383 Turkmen_Uzbekistan
0.13030726 Tatar_Siberian_Zabolotniye
0.13430221 Yukagir_Forest
0.13723756 Tajik_Hisor
0.14008865 Uzbek
0.14242160 Tajik_Ayni
0.14307951 Sarikoli_China
0.14503555 Iranian_Turkmen_Golestan
0.14931869 Tlingit
0.15002433 Finnish_East
0.15051991 Mansi
0.15082985 Tajik_Shugnan
0.15099631 Tajik_Kulob
0.15170661 Tajik_Badakshan
0.15262305 Tajik_Rushan
0.15733351 Nogai
0.15807654 Russian_Kostroma
0.15932898 Tajik_Ishkashim
0.16412459 Finnish
0.16433979 Khanty
0.16977493 Hazara_Afghanistan
0.17150233 Tajik_Yagnobi
0.17424742 Cossack_Kuban
0.17470455 Uygur
0.17473450 Kho_Singanali
0.17560038 Turkish_Northwest
0.17658199 Turkish_Balikesir
0.17977632 Hazara
0.18022385 Russian_Yaroslavl
0.18054395 Russian_Tver
0.18132893 Jatt_Pathak
0.18183389 Turkish_Southwest
0.18225833 Karakalpak
0.18227607 Russian_Ryazan
0.18257704 Turkish_South
0.18350451 Turkish_Aydin
0.18364497 Ror
0.18371928 Turkish_Rumeli
0.18694029 Turkish_Deliorman
0.18932506 Russian_Kursk
0.19037902 Turkish_North
0.19079839 Russian_Orel
0.19152406 Uthmankhel
0.19219383 Cossack_Ukrainian
0.19309090 Russian_Kaluga
0.19334510 Estonian
0.19378021 Tubalar
0.19543075 Russian_Voronez
0.19625473 Russian_Pskov
0.19722412 Russian_Belgorod
0.19809225 Ukrainian
0.20108449 Swedish
0.20135408 Russian_Smolensk
0.20161450 Hungarian
0.20302204 Polish
0.20469775 Shor_Mountain
0.20685367 Turkish_Central
0.20715504 Latvian
0.20766697 Shor_Khakassia
0.20963820 Lithuanian_PZ
0.20993455 Shor
0.21031218 German
0.21358828 Kazakh


Now compare with the Mari, who should on average be more East Eurasian/ENA-shifted than Udmurts:

Distance to: Mari

0.11547308 Bashkir
0.13857526 Tatar_Siberian
0.14622919 Tatar_Siberian_Zabolotniye
0.14750185 Tatar_Crimean_steppe
0.15247870 Yukagir_Forest
0.15879922 Mansi
0.16609517 Turkmen
0.16666745 Turkmen_Uzbekistan
0.17214423 Tlingit
0.17229907 Uzbek
0.17348576 Khanty
0.17705239 Finnish_East
0.17783813 Nogai
0.18294903 Tajik_Hisor
0.18422114 Iranian_Turkmen_Golestan
0.18522712 Russian_Kostroma
0.18617886 Tajik_Ayni
0.18909122 Sarikoli_China
0.19255765 Finnish
0.19488972 Tajik_Kulob
0.19535284 Tajik_Shugnan
0.19586713 Hazara_Afghanistan
0.19632056 Tajik_Badakshan
0.19748239 Tajik_Rushan
0.19884563 Cossack_Kuban
0.19923069 Uygur
0.19996889 Karakalpak
0.20260554 Tajik_Ishkashim
0.20344348 Hazara
0.20516483 Russian_Yaroslavl
0.20847626 Russian_Tver
0.20920322 Russian_Ryazan
0.21058514 Turkish_Northwest
0.21182997 Tubalar
0.21232139 Tajik_Yagnobi
0.21235883 Turkish_Balikesir
0.21577945 Russian_Kursk
0.21622425 Turkish_Southwest
0.21650129 Kho_Singanali
0.21689662 Turkish_Aydin
0.21701708 Russian_Kaluga
0.21741102 Turkish_Rumeli
0.21763448 Estonian
0.21768552 Turkish_South
0.21780212 Turkish_Deliorman
0.21798475 Russian_Orel
0.21984970 Cossack_Ukrainian
0.22059257 Russian_Voronez
0.22088477 Russian_Pskov
0.22192787 Shor_Mountain
0.22200313 Jatt_Pathak
0.22287509 Russian_Belgorod
0.22346587 Ror
0.22396805 Turkish_North
0.22486058 Ukrainian
0.22617737 Shor_Khakassia
0.22640049 Shor
0.22654435 Russian_Smolensk
0.22718034 Kazakh
0.22981453 Polish
0.23081106 Hungarian
0.23084455 Uthmankhel
0.23128921 Latvian
0.23172467 Swedish
0.23346019 Lithuanian_PZ
0.23934037 Turkish_Central
0.24026907 German


Most East Asian-shifted Mari:

Distance to: Mari:mari1

0.11040352 Bashkir
0.13218886 Tatar_Siberian
0.13714602 Tatar_Siberian_Zabolotniye
0.14681540 Yukagir_Forest
0.14881216 Tatar_Crimean_steppe
0.14953210 Mansi
0.16383451 Khanty
0.16693582 Turkmen
0.16705886 Turkmen_Uzbekistan
0.16769483 Tlingit
0.16971483 Uzbek
0.17234959 Nogai
0.18345203 Finnish_East
0.18634596 Tajik_Hisor
0.18677092 Iranian_Turkmen_Golestan
0.18919353 Tajik_Ayni
0.19187852 Russian_Kostroma
0.19201913 Sarikoli_China
0.19206068 Hazara_Afghanistan
0.19396984 Karakalpak
0.19471678 Uygur
0.19885824 Hazara
0.19891662 Tajik_Kulob
0.19894694 Finnish
0.20018331 Tajik_Shugnan
0.20028861 Tajik_Badakshan
0.20214528 Tajik_Rushan
0.20393972 Tubalar
0.20712006 Cossack_Kuban
0.20717671 Tajik_Ishkashim
0.21260915 Russian_Yaroslavl
0.21346326 Shor_Mountain
0.21551440 Russian_Tver
0.21558532 Turkish_Northwest
0.21656360 Russian_Ryazan
0.21710570 Turkish_Balikesir
0.21750578 Tajik_Yagnobi
0.21750915 Shor_Khakassia
0.21771129 Shor
0.22030140 Kazakh
0.22037648 Kho_Singanali
0.22170558 Turkish_Southwest
0.22199340 Turkish_Aydin
0.22267565 Turkish_South
0.22306003 Russian_Kursk
0.22430097 Turkish_Rumeli
0.22434148 Russian_Kaluga
0.22465072 Turkish_Deliorman
0.22484249 Estonian
0.22493215 Russian_Orel
0.22638171 Jatt_Pathak
0.22698380 Cossack_Ukrainian
0.22792117 Russian_Voronez
0.22802434 Ror
0.22830061 Russian_Pskov
0.22879830 Turkish_North
0.23017250 Russian_Belgorod
0.23218075 Ukrainian
0.23369090 Russian_Smolensk
0.23532068 Uthmankhel
0.23707678 Polish
0.23795651 Hungarian
0.23845057 Latvian
0.23846213 Swedish
0.24026953 Lithuanian_PZ
0.24473670 Turkish_Central
0.24734527 German


Here is the Saami average:

Distance to: Saami

0.11182593 Bashkir
0.12036459 Finnish_East
0.12774128 Tatar_Crimean_steppe
0.13609329 Russian_Kostroma
0.13826217 Finnish
0.14160670 Yukagir_Forest
0.14216068 Tatar_Siberian
0.15078272 Cossack_Kuban
0.15607780 Tatar_Siberian_Zabolotniye
0.15704647 Russian_Yaroslavl
0.15796967 Tlingit
0.15887672 Russian_Tver
0.16336793 Turkmen
0.16338782 Russian_Ryazan
0.16459138 Turkmen_Uzbekistan
0.16632238 Estonian
0.16927429 Russian_Kursk
0.16938225 Tajik_Hisor
0.17094294 Russian_Kaluga
0.17167384 Russian_Orel
0.17178954 Russian_Pskov
0.17240465 Cossack_Ukrainian
0.17367742 Uzbek
0.17400811 Mansi
0.17453495 Tajik_Ayni
0.17516197 Sarikoli_China
0.17518874 Russian_Voronez
0.17794170 Russian_Belgorod
0.17822236 Tajik_Rushan
0.17827602 Tajik_Shugnan
0.17868669 Iranian_Turkmen_Golestan
0.17927329 Ukrainian
0.17931411 Latvian
0.18056644 Russian_Smolensk
0.18141920 Tajik_Badakshan
0.18157993 Nogai
0.18204987 Tajik_Kulob
0.18279075 Swedish
0.18283997 Lithuanian_PZ
0.18381285 Polish
0.18837516 Tajik_Ishkashim
0.18848438 Khanty
0.18949914 Hungarian
0.19112002 Turkish_Deliorman
0.19124715 Turkish_Rumeli
0.19634888 Turkish_Northwest
0.19658845 Tajik_Yagnobi
0.19719818 German
0.19724073 Turkish_Balikesir
0.20194140 Hazara_Afghanistan
0.20361165 Turkish_Southwest
0.20432707 Uygur
0.20462215 Turkish_Aydin
0.20636918 Karakalpak
0.20693335 Turkish_South
0.20699293 Kho_Singanali
0.20854188 Jatt_Pathak
0.21102027 Hazara
0.21116963 Ror
0.21361686 Turkish_North
0.21653079 Tubalar
0.22249621 Uthmankhel
0.22875742 Turkish_Central
0.22888814 Shor_Mountain
0.23134900 Shor_Khakassia
0.23286304 Shor
0.23581486 Kazakh


Most East Asian-shifted Saami sample:

Distance to: Saami:saami2

0.09848462 Bashkir
0.12327083 Yukagir_Forest
0.12433077 Tatar_Siberian
0.13207427 Tatar_Crimean_steppe
0.13276869 Tatar_Siberian_Zabolotniye
0.14038424 Tlingit
0.14647190 Finnish_East
0.15131717 Mansi
0.16238275 Russian_Kostroma
0.16399254 Finnish
0.16441909 Khanty
0.16453117 Turkmen_Uzbekistan
0.16464982 Turkmen
0.16634323 Uzbek
0.16761669 Nogai
0.17878843 Tajik_Hisor
0.17951571 Cossack_Kuban
0.18265408 Sarikoli_China
0.18304216 Tajik_Ayni
0.18317926 Russian_Yaroslavl
0.18423148 Iranian_Turkmen_Golestan
0.18544337 Russian_Tver
0.18957098 Russian_Ryazan
0.19026927 Karakalpak
0.19066948 Hazara_Afghanistan
0.19071665 Tajik_Shugnan
0.19160642 Tajik_Rushan
0.19176525 Tajik_Badakshan
0.19271019 Tajik_Kulob
0.19294976 Uygur
0.19302237 Estonian
0.19451723 Tubalar
0.19534246 Russian_Kursk
0.19670823 Russian_Kaluga
0.19704706 Russian_Orel
0.19782409 Cossack_Ukrainian
0.19811217 Hazara
0.19827910 Russian_Pskov
0.19911121 Tajik_Ishkashim
0.20084988 Russian_Voronez
0.20395317 Russian_Belgorod
0.20524660 Ukrainian
0.20555402 Latvian
0.20557912 Shor_Mountain
0.20588287 Russian_Smolensk
0.20770330 Shor_Khakassia
0.20779264 Swedish
0.20824022 Lithuanian_PZ
0.20962900 Shor
0.20964550 Polish
0.21008945 Turkish_Northwest
0.21076801 Turkish_Balikesir
0.21104590 Turkish_Rumeli
0.21150202 Tajik_Yagnobi
0.21176201 Turkish_Deliorman
0.21404769 Kho_Singanali
0.21421140 Hungarian
0.21756288 Jatt_Pathak
0.21783864 Turkish_Southwest
0.21796627 Kazakh
0.21816001 Turkish_Aydin
0.21957202 Turkish_South
0.21980440 Ror
0.22179960 German
0.22679866 Turkish_North
0.23126550 Uthmankhel
0.24294135 Turkish_Central


Do you find these runs shocking? Are you astonished that Udmurts, Mari and even a related group like some Saami are genetically closer to Tajiks, many Central Asians and Siberian Turks, and Khanty/Mansi than to ethnic Russians and most other Europeans?

Heck, even the mari1 sample is closer to a Kazakh than to Russians from Kursk, Kaluga and Orel. Or look at how the udmurd8 individual is genetically closer to Hazaras from Afghanistan and to Uyghurs than to Russians from Yaroslavl, Tver and Ryazan. Pretty fascinating imo.

I think this is because G25 uses 25 coordinates. If they used a more simplistic system with fewer components, it would be clear that these groups are genetically closer to Europeans than they are to Asians. I wish we had averages for Gedrosia Ancient Eurasia K6 so we could set this straight.
So far it seems like a calculator error.
I have experienced such an error too, with the Eurogenes K36 calculator, according to which I am closer to Bulgarians than to Karelians.
Of course that doesn't actually mean my genes resemble those of Bulgarians more than those of Finns; it's just a calculator error that you get with calculators that have too many components and too great an FST distance.

Nganasankhan
09-29-2021, 10:01 PM
I think this is because G25 uses 25 coordinates. If they used a more simplistic system with fewer components, it would be clear that these groups are genetically closer to Europeans than they are to Asians.

A PCA with 10 dimensions is simply a truncated version of a PCA with 25 dimensions, so you can make your own G10 by keeping only the first 10 dimensions in G25.
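A minimal sketch of that truncation, assuming a headerless CSV of G25-style coordinates (population name followed by PC1, PC2, ...) and plain Euclidean distance, which is what R's dist() uses by default. The function name g_distance and the toy coordinates are made up; with k=25 you would get the full ranking and with k=10 the "G10" one.

```shell
# Hypothetical helper: rank rows by Euclidean distance to a target using only
# the first K coordinate columns. Assumed input: no header, "Pop,PC1,PC2,...".
g_distance() {
  # $1 = coordinates CSV, $2 = target population, $3 = number of leading PCs to keep
  awk -F, -v x="$2" -v k="$3" '
    {
      name[NR] = $1
      for (j = 1; j <= k; j++) c[NR, j] = $(j + 1)   # keep only PC1..PCk
      if ($1 == x) t = NR                             # remember the target row
    }
    END {
      if (!t) { print "target not found: " x > "/dev/stderr"; exit 1 }
      for (r = 1; r <= NR; r++) {
        if (r == t) continue
        s = 0
        for (j = 1; j <= k; j++) { d = c[r, j] - c[t, j]; s += d * d }
        printf "%.8f %s\n", sqrt(s), name[r]
      }
    }' "$1" | LC_ALL=C sort -n
}

# Toy coordinates (invented): the nearest neighbour of T flips when dimensions are dropped.
toy=$(mktemp)
printf 'T,0,0,0\nP,0.1,0.3,0\nQ,0.2,0,0\n' > "$toy"
echo "all 3 dims:"; g_distance "$toy" T 3   # 0.20000000 Q, then 0.31622777 P
echo "first dim:";  g_distance "$toy" T 1   # 0.10000000 P, then 0.20000000 Q
rm -f "$toy"
```

On the toy data the closest row to T changes from Q to P once the last two dimensions are dropped, which is the kind of reordering a G25 vs. G10 comparison can expose.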

In the image below, the x-axis shows regular G25 distance to Maris, and the y-axis shows distance to Maris using only the first 10 dimensions of G25. The only population far above the regression line is the Chuvash. That's because among modern population averages, Maris or Chuvashes are within the 20 populations with the highest or lowest values for PC13, PC14, PC15, PC16, PC19, PC20 and PC23. I think there were some Mari or Chuvash samples among the initial set of reference samples onto which later G25 samples were projected, so many G25 PCs ended up accounting for drift that is specific to Maris and Chuvashes.

Mbuti and San are far below the regression line, because they are far from Maris on many PCs after the first 10 PCs.
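The "within the 20 highest or lowest populations for PC13, PC14, ..." observation can be checked mechanically. A rough sketch, assuming the same kind of headerless coordinates CSV with one population average per row; pc_extremes is a hypothetical helper, ties are broken arbitrarily by sort, and with N=20 it lists the PCs on which the target is among the 20 highest or 20 lowest rows.

```shell
# Hypothetical helper: list the PCs on which TARGET is among the N lowest or
# N highest rows. Assumed input: no header, "Pop,PC1,PC2,...".
pc_extremes() {
  # $1 = coordinates CSV, $2 = target population, $3 = N
  file=$1 target=$2 n=$3
  ncol=$(head -n 1 "$file" | awk -F, '{ print NF }')
  total=$(awk 'END { print NR }' "$file")   # number of populations
  col=2
  while [ "$col" -le "$ncol" ]; do
    # Sort rows by this column (general numeric, ascending), take the target's 1-based rank.
    rank=$(cut -d, -f1,"$col" "$file" | LC_ALL=C sort -t, -k2,2g |
           awk -F, -v x="$target" '$1 == x { print NR; exit }')
    if [ -n "$rank" ] && { [ "$rank" -le "$n" ] || [ "$rank" -gt $((total - n)) ]; }; then
      echo "PC$((col - 1))"
    fi
    col=$((col + 1))
  done
}

# Toy data (invented): A is the highest on PC1 and the lowest on PC2.
toy=$(mktemp)
printf 'A,0.5,-0.1\nB,0.1,0.2\nC,-0.3,0.0\nD,0.0,0.9\n' > "$toy"
pc_extremes "$toy" A 1   # prints PC1 and PC2
rm -f "$toy"
```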

https://i.ibb.co/42KMg43/g10distance.jpg


library(tidyverse)

# Scaled G25 population averages: one row per population, population names as row names
t=read.csv("https://drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y",row.names=1)

# x = Euclidean distance to Mari over all 25 PCs, y = over the first 10 PCs only ("G10")
# ggplot needs a data frame rather than a bare matrix
xy=data.frame(pop=rownames(t),x=as.matrix(dist(t))[,"Mari"],y=as.matrix(dist(t[,1:10]))[,"Mari"])

ggplot(xy,aes(x,y))+
geom_abline(linetype="dashed",color="gray80",linewidth=.3)+
geom_smooth(method="lm",formula=y~x,se=F,linewidth=.5)+
geom_point(size=.5)+
geom_text(aes(label=pop),size=2,vjust=-.7)+
scale_x_continuous(breaks=seq(0,10,.05))+
scale_y_continuous(breaks=seq(0,10,.05))+
labs(x="G25 distance to Mari (scaled)",y="G10 distance to Mari (scaled)")+
theme(
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.text=element_text(size=6),
axis.ticks.length=unit(0,"cm"),
axis.ticks=element_blank(),
axis.title=element_text(size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray85",fill=NA,linewidth=.6),
panel.grid.major=element_line(color="gray85",linewidth=.2)
)

ggsave("a.png",width=14,height=14)
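To list which populations sit above or below the fitted line numerically rather than reading them off the plot, here is a small sketch. It fits y = a + b*x by ordinary least squares (the same kind of line geom_smooth(method="lm") draws) and prints each population's residual, largest first. It assumes the two distances have been exported as one "label x y" line per population; lm_residuals and the toy data are hypothetical.

```shell
# Hypothetical helper: OLS residuals of y on x, sorted from most positive to most negative.
# Assumed input: whitespace-separated "label x y" lines, at least two distinct x values.
lm_residuals() {
  awk '
    { lab[NR] = $1; x[NR] = $2; y[NR] = $3; sx += $2; sy += $3 }
    END {
      mx = sx / NR; my = sy / NR
      for (i = 1; i <= NR; i++) {
        sxy += (x[i] - mx) * (y[i] - my)
        sxx += (x[i] - mx) ^ 2
      }
      b = sxy / sxx; a = my - b * mx          # ordinary least-squares slope and intercept
      for (i = 1; i <= NR; i++) printf "%.6f %s\n", y[i] - (a + b * x[i]), lab[i]
    }' "$1" | LC_ALL=C sort -k1,1gr
}

# Toy data (invented): D lies well above the line fitted through all four points.
toy=$(mktemp)
printf 'A 1 1\nB 2 2\nC 3 3\nD 4 6\n' > "$toy"
lm_residuals "$toy"   # 0.600000 D, 0.400000 A, -0.200000 B, -0.800000 C
rm -f "$toy"
```

Positive residuals correspond to points above the regression line (like the Chuvash in the plot), negative ones to points below it (like Mbuti and San).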

al.Krivich
09-29-2021, 11:02 PM
A PCA with 10 dimensions is simply a truncated version of a PCA with 25 dimensions, so you can make your own G10 by keeping only the first 10 dimensions in G25.

I'd need to first find the dimensions of G25 that are most similar to one another in order to group them into G10. Unfortunately, I don't have enough knowledge of the calculator's inner workings to do that.

Nganasankhan
09-29-2021, 11:19 PM
Mmm can you post the link which shows Chuvash being the 4th closest population to Udmurts?

Just run the shell command I posted earlier (or look at the file at Pastebin in a spreadsheet application):


$ curl -s https://pastebin.com/raw/B1t0ESsj|tr -d \\r|awk -F, 'NR==1{for(i=2;i<=NF;i++)if($i==x)break;next}$1!=x{print$i,$1}' x=Udmurt -|sort -n|head
.00159 Besermyan
.00244 Tatar_Kazan
.00253 Bashkir
.00306 Chuvash
.00372 Tatar_Mishar
.00378 Nogai_Karachay_Cherkessia
.00385 Tatar_Siberian
.00403 Russian_Archangelsk_Krasnoborsky
.00409 Uzbek
.00424 Yukagir_Forest
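If you would rather use R than awk, you can do the same lookup with the sketch below. It assumes the Pastebin file is a square CSV distance matrix whose header row and first column both hold population names, which is also what the awk command relies on. The small inline matrix is invented and stands in for the real file. With the real data you would pass the raw Pastebin URL to `read.csv()` instead of `text =`.

```r
# Toy square distance matrix standing in for the real CSV (values invented)
csv <- "pop,Udmurt,Besermyan,Chuvash,Uzbek
Udmurt,0,0.0016,0.0031,0.0041
Besermyan,0.0016,0,0.0030,0.0045
Chuvash,0.0031,0.0030,0,0.0038
Uzbek,0.0041,0.0045,0.0038,0"
m <- read.csv(text = csv, row.names = 1, check.names = FALSE)

target <- "Udmurt"
others <- rownames(m) != target
d <- m[others, target]           # the target's column, without the target itself
names(d) <- rownames(m)[others]

closest <- head(sort(d), 10)     # up to ten closest populations
print(closest)
```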


I am certain Turkmens would come out closer to Udmurts if the Russian samples were split into separate categories by region of Russia, or if the significantly Uralic-admixed Northern Russians were removed from the "Russian" mega-sample (assuming there are some Northern Russians among the 71 individuals).

It's half and half:

f2 distance to Udmurt:
0.00403 Russian_Vologda
0.00468 Russian_Kursk
0.00483 Russian_Yaroslavl
0.00505 Russian_Tver
0.00509 Russian_Ryazan
0.00516 Turkmen
0.00538 Russian_Kaluga
0.00542 Russian_Belgorod
0.00543 Russian_Orel
0.00551 Russian_Pskov
0.00576 Russian_Smolensk


Could you select the two most Nganasan-shifted Maris out of the 8 Human Origins samples and compute their absolute f2 distances to other populations?

There are no Mari samples in the HO dataset. There's one Mari sample in 1240K+HO, but it's from the 1240K dataset. Also it has an .SG suffix, so its absolute f2 distance to other samples is way too high. (I don't know if there's some trick I'm missing for dealing with the SG samples.)

Edit: Actually I think something went wrong with merging the Mari sample from Tambets 2018, because it was even closer to Bulgarians than to Estonians based on the list in my previous post. The f2 distance to the Mari.SG sample seems more reasonable:

https://i.ibb.co/y4VpZrH/mari.jpg


Saami_SWE are more ENA than Saami_Kola, right?

Yeah, those are the same samples as Saami and Saami_Kola in G25, except G25 also has two Norwegian Saami samples from the Estonian Biocentre.


Several rankings in the absolute distances are indeed incorrect. For instance, are Maris really closer to Hungarians, English, Bulgarians, Czechs and French than to Tajiks? Or closer to Czechs and French than to Estonians and Uyghurs? And why do Udmurts rank after Russians? The relative order is still somewhat okay, though, with Chuvashes, Volga Tatars, Bashkirs, Siberian Tatars, Nogais and Uzbeks still among the closest populations to Maris. By the way, where is Nogai_Astrakhan? It should be somewhere between Ukrainian and Karakalpak, or before Hungarian, Mordovian and Ukrainian.

It's G25 that's incorrect, and f2 shows the real genetic distance. I didn't include Nogai_Astrakhan in the run because it had less than 8 samples. (BTW Astrakhan sounds cool, like it's some steppe warrior from the astral plane.)

Tsakhur
09-30-2021, 12:25 AM



There are no Mari samples in the HO dataset. There's one Mari sample in 1240K+HO, but it's from the 1240K dataset. Also it has an .SG suffix, so its absolute f2 distance to other samples is way too high. (I don't know if there's some trick I'm missing for dealing with the SG samples.)

But the Mari sample from Tambets is more Mongoloid than the Mari sample from 1240K, so it's closer to Mansi, Hazara, Kazakhs, and Kyrgyzes:

https://i.ibb.co/PTgPhKb/mari.jpg



Yeah, those are the same samples as Saami and Saami_Kola in G25, except G25 also has two Norwegian Saami samples from the Estonian Biocentre.



It's G25 that's incorrect, and f2 shows the real genetic distance. I didn't include Nogai_Astrakhan in the run because it had less than 8 samples. (BTW Astrakhan sounds cool, like it's some steppe warrior from the astral plane.)

Oh ok. I will run that pastebin again

Have Russian_Vologda, Kursk, Belgorod and Smolensk been added to the f2 pastebin? I can't find them.

Is the Mari from Tambets closer to Mansi, Hazara, Kazakhs and Kyrgyzes than to most Europeans? Can you post the code and the distances to other populations separately for the Mari from Tambets and the Mari from 1240K?

That makes sense regarding the Saami.

Hmm, but the previous f2 run for Mari showed them closer to Czechs and French than to Estonians and Uyghurs, and also showed Hungarians, English, Bulgarians, Czechs and French closer to Mari than Tajiks are. So I thought something was wrong there.

Also how many Udmurt individuals are there in f2? Can you pick the most Mongoloid-shifted Udmurt samples and run them in f2 distance? I want to see if they will pull closer to Asiatic groups even more than the Udmurt average.

Nganasankhan
09-30-2021, 01:39 AM
I'd need to first find the dimensions of G25 that are most similar to one another in order to group them into G10. Unfortunately, I don't have enough knowledge of the calculator's inner workings to do that.

I tried using hierarchical clustering to divide the dimensions in the datasheet for scaled modern averages into ten groups:


> t=read.csv("https://drive.google.com/uc?export=download&id=1y49hyvviJpHj9esVqyeiFm32DhnPlfRQ",r=1)
> k=cutree(hclust(dist(t(t))),10)
> cat(sapply(unique(k),function(x)paste(x,paste(names(k)[k==x],collapse=" "))),sep="\n")
1 PC1
2 PC2
3 PC3
4 PC4
5 PC5
6 PC6
7 PC7 PC8
8 PC9
9 PC10 PC11 PC12 PC13 PC14 PC15 PC16 PC17 PC19 PC20 PC21 PC22 PC23 PC24 PC25
10 PC18

I think PC7 and PC8 were grouped together because they're both the highest in Americans and the lowest in Africans.


Hmm, but the previous f2 run for Mari showed them closer to Czechs and French than to Estonians and Uyghurs, and also showed Hungarians, English, Bulgarians, Czechs and French closer to Mari than Tajiks are. So I thought something was wrong there.

Hmm hmm... Yeah actually I think I messed up merging that sample, because it was even closer to Bulgarians than to Estonians, so the distances do actually look wrong.

It would be nice if some pro like Generalissimo posted a global f2 matrix for a bigger set of populations.


Also how many Udmurt individuals are there in f2? Can you pick the most Mongoloid-shifted Udmurt samples and run them in f2 distance? I want to see if they will pull closer to Asiatic groups even more than the Udmurt average.

Here's the percentage of the eastern component in Udmurt samples in a K=2 Eurasian ADMIXTURE run:

31.0 Udmurt:UDM-254
30.7 Udmurt:UDM-138
29.6 Udmurt:UDM-256
29.1 Udmurt:UDM-033
29.1 Udmurt:UDM-004
29.0 Udmurt:UDM-034
28.3 Udmurt:UDM-229
27.1 Udmurt:UDM-335
26.9 Udmurt:UDM-255
26.5 Besermyan:UDM-362
26.2 Besermyan:UDM-366
25.6 Udmurt:UDM-318
24.9 Besermyan:UDM-357
24.4 Besermyan:UDM-287
23.5 Besermyan:UDM-309
23.5 Besermyan:UDM-349
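To pick the most eastern-shifted individuals from a run like this, you sort by the eastern component and take the top rows. This is a minimal sketch. It assumes ADMIXTURE's usual `.Q` output, which has one row of ancestry fractions per sample in `.fam` order, has already been joined to sample labels. The column name `east` and the file names in the comment are hypothetical, and in a real run you have to check by hand which column is the eastern one. The toy rows reuse values from the list above.

```r
# Toy table: sample labels plus the eastern fraction from a K=2 run.
# With real output, something like (file names hypothetical):
#   q <- read.table("data.2.Q"); q$sample <- read.table("data.fam")$V2
q <- data.frame(
  sample = c("Udmurt:UDM-138", "Udmurt:UDM-254", "Besermyan:UDM-362",
             "Udmurt:UDM-318", "Besermyan:UDM-349"),
  east   = c(0.307, 0.310, 0.265, 0.256, 0.235),
  stringsAsFactors = FALSE
)

n_top <- 2
top <- head(q[order(q$east, decreasing = TRUE), ], n_top)  # most eastern first
print(top)
```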

The easternmost Udmurt sample (UDM-254) was even closer to Siberian Tatars than to Mishars:

f2 distance to Udmurt:UDM-254:
0.00046 Udmurt
0.00240 Bashkir
0.00271 Tatar_Kazan
0.00293 Chuvash
0.00347 Tatar_Siberian
0.00388 Tatar_Mishar
0.00389 Russian
0.00391 Nogai_Stavropol
0.00405 Uzbek
0.00418 Nogai_Karachay_Cherkessia
0.00428 Finnish
0.00456 Uyghur
0.00458 Mordovian
0.00462 Karelian
0.00480 Karakalpak
0.00484 Ukrainian
0.00486 Circassian
0.00486 Belarusian
0.00488 Ukrainian_North
0.00488 Kumyk
0.00488 Czech
0.00488 Abazin
0.00489 Estonian
0.00493 Kabardinian
0.00493 Veps
0.00503 Mansi
0.00505 Hungarian
0.00505 Tajik
0.00507 Tabasaran
0.00514 Pathan
0.00521 French
0.00522 Balkar

f2 distance to Udmurt:
0.00046 Udmurt:UDM-254
0.00144 Bashkir
0.00168 Tatar_Kazan
0.00210 Chuvash
0.00238 Tatar_Mishar
0.00263 Russian
0.00272 Tatar_Siberian
0.00281 Nogai_Karachay_Cherkessia
0.00306 Uzbek
0.00309 Nogai_Stavropol
0.00310 Mordovian
0.00335 Finnish
0.00351 Ukrainian_North
0.00352 Tajik
0.00352 Belarusian
0.00353 Hungarian
0.00353 Estonian
0.00355 Kabardinian
0.00359 Ukrainian
0.00361 Uyghur
0.00362 Karelian
0.00364 Czech
0.00365 Abazin
0.00367 Kumyk
0.00375 Bulgarian
0.00376 Karakalpak
0.00383 Gagauz
0.00387 Veps
0.00387 French
0.00392 English
0.00392 Circassian
0.00397 Croatian

Tsakhur
09-30-2021, 02:17 AM

Are there separate Russian_Smolensk, Russian_Kursk, Russian_Ryazan, Russian_Orel, etc. in the f2 spreadsheet? I can't find them in the pastebin.

I also find it fishy when Estonians rank below Bulgarians and Czechs, or when Tajiks rank below Ukrainians, English, Bulgarians and French.

Can you do the f2 run specifically for the Mari sample from Tambets?

Nice! It's unusual, though, that for Udmurt:UDM-254 Tajiks rank farther away than Ukrainians, Belarusians, Czechs and Hungarians, while in the f2 run for the Udmurt average Tajiks are closer than Ukrainians, Hungarians, Belarusians and Czechs. Something is inconsistent here. Can you also include the various Russian groups such as Kursk, Orel, Tver, Ryazan, Belgorod and Voronezh, along with Russian Archangelsk, in the f2 runs for Udmurt:UDM-254?

Nganasankhan
09-30-2021, 02:57 PM
Are there separate Russian_Smolensk, Russian_Kursk, Russian_Ryazan, Russian_Orel, etc. in the f2 spreadsheet? I can't find them in the pastebin.

No, but they're now included in the f2 matrix linked below.


Can you do the f2 run specifically for the Mari sample from Tambets?

I already posted it twice, but I think the results were wrong because I merged the samples incorrectly or something.

Cardona et al. 2014 also includes four Mari samples: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE73996. I now merged the samples from that paper with samples from 1240K+HO, and I posted an f2 run here: https://drive.google.com/uc?export=download&id=1HamurTWskoqPdey_yDufvL56SpPr6St5. I used `--indep-pairwise 50 10 .1 --geno .01`, which kept about 110,000 SNPs. This time I included all samples instead of 8 samples from each population, so it introduces a bias because populations with a bigger sample size tend to get lower f2 distance to other populations. This shows f2 distance to Cordona_Mari:


$ curl -Ls 'https://drive.google.com/uc?export=download&id=1HamurTWskoqPdey_yDufvL56SpPr6St5'|awk -F, 'NR==1{for(i=2;i<=NF;i++)if($i==x)break;next}$1!=x{print$i,$1}' x=Cordona_Mari -|sort -n
.001192 Chuvash
.002292 Tatar_Kazan
.002536 Bashkir
.003925 Yukagir_Forest
.003986 Tatar_Siberian
.004034 Tatar_Mishar
.004161 Cordona_Komi
.004279 Nogai_Karachay_Cherkessia
.004400 Nogai_Stavropol
.004564 Russian_Vologda
.004610 Udmurt
.004616 Uzbek
.004838 Mordovian
.004845 Russian_Archangelsk_Krasnoborsky
.005104 Cordona_Turkmen
.005132 Cordona_Mordva
.005139 Turkmen
.005208 Besermyan
.005413 Finnish
.005564 Cordona_Tajik
.005580 Karakalpak
.005612 Russian_Ryazan
.005620 Russian_Yaroslavl
.005630 Nogai_Astrakhan
.005654 Russian_Tver
.005659 Karelian
.005679 Tajik
.005689 Russian_Kursk
.005730 Russian_Archangelsk_Leshukonsky
.005828 Uyghur
.005840 Russian_Orel
.006000 Ukrainian
.006040 Russian_Archangelsk_Pinezhsky
.006071 Russian_Belgorod
.006078 Ukrainian_North
.006158 Russian_Kaluga
.006168 Russian_Smolensk
.006187 Russian_Pskov
.006192 Veps
.006290 Abazin
.006378 Estonian
.006407 Cordona_Kazakh
.006415 Czech
.006452 Belarusian
.006510 Kabardinian
.006528 Hungarian
.006827 Kumyk
.006948 Cordona_German
.007024 Lithuanian
.007061 Kazakh
.007073 Croatian
.007080 Cordona_British
.007101 English
.007107 Balkar
.007164 Bulgarian
.007177 Gagauz
.007197 Cordona_Uygur
.007200 Hazara
.007354 Circassian
.007393 Icelandic
.007448 Norwegian
.007455 French
.007458 Mansi
.007524 Cordona_Turk
.007567 Turkish
.007651 Cordona_Khant
.007767 Lezgin
.007777 Ossetian
.007812 Pathan
.007837 Aleut
.007852 Azeri
.007874 Romanian
.007882 Kyrgyz_Tajikistan
.007890 Albanian
.007983 Adygei
.007991 Moldavian
.008030 Orcadian
.008060 Spanish
.008153 Greek
.008199 Burusho
.008213 Tatar_Siberian_Zabolotniye
.008222 Italian_North
.008323 Kyrgyz_Kyrgyzstan
.008343 Chechen
.008368 Even
.008373 Iranian
.008473 Sindhi_Pakistan
.008482 Karachai
.008487 GujaratiA
.008532 Cordona_Iranian
.008608 Tabasaran
.008646 Sicilian
.008700 Kyrgyz_China
.008754 Bahun
.008770 Cordona_Italian
.008827 Italian_South
.008888 Scottish
.008890 Ingushian
.008979 Spanish_North
.009013 Khakass
.009038 Cordona_Teleut
.009060 Kaitag
.009070 Tlingit
.009259 Jew_Turkish
.009366 Iranian_Bandari
.009414 Balochi
.009494 Lak
.009577 Abkhasian
.009603 Armenian
.009628 Cordona_Avar
.009671 Avar
.009731 GujaratiB
.009750 Canary_Islander
.009807 Makrani
.009820 Maltese
.009833 Ezid
.009851 Lebanese_Muslim
.009859 Jew_Ashkenazi
.010018 Lebanese
.010040 Brahui
.010251 Georgian
.010286 Cypriot
.010351 GujaratiC
.010357 Kazakh_China
.010462 Jordanian
.010586 Lebanese_Christian
.010596 Altaian
.010624 Basque
.010841 Turkish_Balikesir
.010859 Syrian
.010869 Cordona_Tundra_Nentsi
.010895 Armenian_Hemsheni
.010904 Cordona_Altai_Kizhi
.010964 Jew_Moroccan
.010977 Altaian_Chelkan
.011039 Assyrian
.011043 Bengali
.011068 Darginian
.011271 Kurd
.011306 Palestinian
.011310 Cordona_Ket
.011353 BedouinA
.011427 Tubalar
.011430 Punjabi
.011574 Druze
.011672 Jew_Cochin
.011710 Jew_Iraqi
.011715 Jew_Iranian
.011745 Yemeni
.011868 Shor_Mountain
.011880 Khakass_Kachin
.011932 Egyptian
.011980 Jew_Georgian
.012051 Selkup
.012126 Cordona_Egyptian
.012171 Newar
.012187 Kalmyk
.012253 Sardinian
.012536 Cordona_Indian
.012596 Cordona_Selkup
.012597 Cordona_Sri_Lankan
.012606 Tharu
.012723 Jew_Tunisian
.012790 GujaratiD
.012852 Tunisian
.012968 Evenk_FarEast
.013093 Libyan
.013113 Jew_Libyan
.013131 Tuvinian
.013401 Kubachinian
.013428 Yemeni_Northwest
.013567 Mongol
.013655 Yemeni_Highlands
.013865 Yemeni_Highlands_Raymah
.013878 Yemeni_Desert2
.013930 Cordona_Dolgan
.013945 Saudi
.013958 Cordona_Mongol
.014084 Cordona_Buryat
.014414 Buryat
.014739 Moroccan
.014948 Ket
.015052 Jew_Yemenite
.015120 Khamnegan
.015197 Salar
.015695 Shor_Khakassia
.015698 Dolgan
.015733 Enets
.015829 Dongxiang
.016349 Kalash
.016444 Dungan
.016584 Burmese
.016725 Cordona_Forest_Nentsi
.016956 Yemeni_Desert
.017077 Mozabite
.017388 Magar
.017491 Yakut
.017706 Cordona_Even
.017921 Cordona_Yakut
.017928 Tamang
.018004 Algerian
.018358 Tu
.018419 Mongola
.018605 Bonan
.018825 Daur
.018848 Todzin
.018915 BedouinB
.019003 Yugur
.019048 Xibo
.019159 Oroqen
.019328 Saharawi
.019868 Gurung
.019969 Yukagir_Tundra
.020131 Malay
.020140 Hezhen
.020171 Tibetan
.020677 Cambodian
.020798 Thai
.020988 Eritrea
.021149 Cordona_Tibetan
.021256 Tofalar
.021587 Cordona_Manchu
.021588 Negidal
.021729 Korean
.021788 Tagalog
.021804 Evenk_Transbaikal
.021828 Yi
.022031 Qiang
.022040 Naxi
.022165 Sherpa
.022214 Nanai
.022217 Kusunda
.022257 Han
.022264 Cordona_Korean
.022298 Ulchi
.022377 Cordona_Evenk
.022402 Japanese
.022638 Tujia
.022654 Kinh
.022701 Jew_Ethiopian
.022723 Rai
.023169 Cordona_Vietnamese
.023495 Cordona_Koryak
.023531 Vietnamese
.023578 Cordona_Indonesia_Java
.023868 Cordona_Han_South
.023885 Miao
.024178 Visayan
.024191 Dong
.024408 Zhuang
.024567 Mulam
.024878 Cordona_Nganasan
.024969 Dai
.025380 Gelao
.025486 She
.025565 Maonan
.025645 Li
.025940 Chukchi
.026380 Nganasan
.026808 Eskimo_ChaplinSireniki
.026954 Ilocano
.026996 Tibetan_Yunnan
.027966 China_Lahu
.028093 Cordona_Yukagir
.028327 Dusun
.028498 Nivh
.029392 Somali
.029413 Koryak
.030709 Murut
.031172 Mayan
.031225 Itelmen
.031298 Quechua
.031466 Ami
.031766 Eskimo_Naukan
.033054 Bolivian
.033951 Datog
.034193 Zapotec
.035164 Kankanaey
.035844 Mixtec
.037644 Masai
.038999 Atayal
.039872 Chukchi1
.041418 Kikuyu
.041684 AA
.041845 Mixe
.043945 Piapoco
.044230 Pima
.049022 Nasioi
.051883 Luhya
.052012 Cordona_PNG_Highland
.052563 Luo
.053645 BantuKenya
.054251 Cordona_Dinka
.054766 Gambian
.055445 Malawi_Ngoni
.055953 Malawi_Tumbuka
.055957 Malawi_Yao
.056146 Mende
.056202 Mandenka
.056325 Malawi_Chewa
.056549 Yoruba
.056946 BantuSA_Ovambo
.057042 Esan
.057383 Australian
.058506 Namibia_Bantu_Herero
.058539 BantuSA
.059191 Karitiana
.061020 Papuan
.063687 Khomani
.065488 Surui
.066858 Biaka
.070883 Hadza1
.077581 Mbuti
.084229 Ju_hoan_North
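The remark above that populations with bigger samples tend to get lower f2 has a well-known generic cause. An allele frequency estimated from few chromosomes is noisy, and that noise adds to the squared frequency difference. The sketch below simulates two populations with identical true frequencies, so the true f2 is 0. It compares the naive mean of (p1 - p2)^2 with the standard small-sample correction that subtracts p(1 - p)/(n - 1) for each population. The simulation settings are arbitrary. This only illustrates the mechanism and makes no claim about what the software used in the post computes, which may already apply such a correction.

```r
set.seed(42)
n_snp <- 1e5
p_true <- runif(n_snp, 0.05, 0.95)  # both populations share these true frequencies

# Allele counts from n sampled chromosomes per SNP
sample_counts <- function(p, n) rbinom(length(p), n, p)

f2_naive <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  mean((p1 - p2)^2)
}
f2_unbiased <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1; p2 <- x2 / n2
  # subtract the expected sampling variance of each frequency estimate
  mean((p1 - p2)^2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1))
}

small <- 16; large <- 200  # chromosomes sampled per population
xs1 <- sample_counts(p_true, small); xs2 <- sample_counts(p_true, small)
xl1 <- sample_counts(p_true, large); xl2 <- sample_counts(p_true, large)

res <- c(
  naive_small    = f2_naive(xs1, small, xs2, small),
  naive_large    = f2_naive(xl1, large, xl2, large),
  unbiased_small = f2_unbiased(xs1, small, xs2, small),
  unbiased_large = f2_unbiased(xl1, large, xl2, large)
)
print(signif(res, 3))
```

The naive values come out well above zero, and more so for the small sample. The corrected values stay close to zero for both sample sizes.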

Below is an MDS plot which contains the 64 populations that are closest to the Cordona_Mari in the f2 matrix linked above. Komis cluster together with Maris, and Komis plot further north on PC2 than Maris, but I think it's because they are Siberian Komis that include some Khanty-like and Nenets-like samples.

The fourth dimension ended up being dominated by Maris, but the same thing also happened when I did a SmartPCA run of the samples from Tambets 2018.

https://i.ibb.co/Mpx1wJd/marimds.png

Below is also a similar MDS plot that includes the 128 closest populations to Cordona_Mari. Now Ob-Ugrians and Swamp Tatars are the furthest from zero in the third dimension, and Tlingits and Aleuts are the furthest from zero in the fourth dimension. Now the cluster that includes Maris and Komis also includes Siberian Tatars and Forest Yukaghirs. On G25, Swamp Tatars are almost as close to Maris as other Siberian Tatars, but here there's a much bigger difference, which is probably because Swamp Tatars are more drifted than other Siberian Tatars. Less drifted populations tend to be connected with a line to more populations, and more drifted populations tend to be connected to fewer populations, but the number of lines to other populations is 2 for Swamp Tatars and 7 for other Siberian Tatars.

https://i.ibb.co/4d9cvY2/marimds2.png
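You can sketch the MDS step itself with base R's `cmdscale()`. This assumes a square, symmetric f2 matrix with population names as row and column names. The nearest-population selection follows the posts above (64 and 128 populations). The posts don't say whether the plots were built from f2 itself or from its square root, or how the connecting lines were drawn, so those details are left out. The toy matrix is built from invented 2-D points so that the result can be checked.

```r
set.seed(7)
# Invented 2-D coordinates for 10 hypothetical populations
pts <- matrix(rnorm(20), ncol = 2, dimnames = list(paste0("Pop", 1:10), NULL))
f2 <- as.matrix(dist(pts))  # stand-in for a real f2 distance matrix

target <- "Pop1"
n_near <- 6                 # the posts above used 64 and 128
near <- names(sort(f2[, target]))[1:n_near]  # includes the target itself (distance 0)

sub <- f2[near, near]
fit <- cmdscale(as.dist(sub), k = 2)  # classical (Torgerson) MDS

# The toy input is exactly 2-D, so 2 dimensions reproduce its distances up to rounding
print(round(fit, 3))
```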

Tsakhur
09-30-2021, 04:06 PM
I think this is because G25 uses 25 coordinates. If they used a more simplistic system with fewer components, it would be clear that these groups are genetically closer to Europeans than they are to Asians. I wish we had averages for Gedrosia Ancient Eurasia K6 so we could set this straight.
So far it seems like a calculator error.
I have experienced such an error too, with the Eurogenes K36 calculator, according to which I am closer to Bulgarians than to Karelians.
Of course, that doesn't actually mean my genes resemble those of Bulgarians more than those of Finns; it's just a calculator error that you get when you use calculators with too many components and too great an FST distance.

You could be right, but G25 doesn't even have or use components like GEDmatch calculators do. G25 is a mathematical tool, and its coordinates are based on your overall genome, not only on the SNPs your raw autosomal data shares with a GEDmatch calculator. In my opinion, G25 is more accurate than simplistic systems with fewer components. I will quote another member of this forum, because he explained it much better than I can:


Admixture calculator distances function mainly on component overlap and fail to account for the divergence of the source populations themselves, as I've explained before. G25 distances are Euclidean distances (i.e. the distance from one point to another), so the two approaches account for distance in very different ways. This is one reason why different admixture calculators give you different distances, as those distances rely heavily on the components used. G25 does not overestimate the difference for divergent components but more accurately represents the divergent aspect of the genome, as can be seen, e.g., on a PCA.

Take this analogy, for example: say there are 3 components, which I will call green, blue and red. Green and blue are cool colours, while red is a warm colour, so they belong to different groupings. In an admixture calculator, the distances would mainly be calculated from the overlap of colours in an individual sample, while G25 will account for the fact that red is a warm colour and rather divergent from blue and green.

I think G25 is more accurate than simplistic, low-component calculators in showing that Volga Uralics like Udmurts and Maris are genetically closer to Asiatic groups such as Tajiks, Uzbeks, Siberian Tatars, Turkmens, Nogais and Karakalpaks than to ethnic Russians (without Uralic admixture) and most other Europeans, for the reasons outlined and quoted above. Moreover, Udmurts and Maris have very significant amounts of East Eurasian ancestry (around 28-34% on average) and even some CHG/Iran_N-related ancestry, likely from ancient Iranian and Turkic tribes who used to roam the Urals. This pulls them closer to those Asians, while most ethnic Russians (not counting those with Uralic admixture) and most other Europeans have little to no Mongoloid admixture, which makes them more distant from Volga Uralics than many Central Asian and Siberian groups are.

Let's say there is a Mari who is 35% East Asian, a Karakalpak who is 55% East Eurasian and a Uralic-admixed Russian who is 6% Mongoloid: the Mari will be closer to the Karakalpak than to the Russian, because their amounts of East Eurasian ancestry are closer to one another. Similarly, if there is an Udmurt who is 30% ENA (Eastern Non-African, another term for East Eurasian), a Nogai Turk who is 45% East Asian and a Czech who is 0% East Eurasian, the Udmurt will come out closer to the Nogai than to the Czech in a distance run, because their amounts of Eastern ancestry are closer to one another.
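The argument above can be made concrete with a toy two-component model. Here each population is described only by its West/East Eurasian percentages, and distance is plain Euclidean distance. The percentages are the hypothetical ones from the post. Real G25 or f2 distances depend on many more dimensions, so this only illustrates the argument and doesn't test it.

```r
# Hypothetical West/East percentages from the examples in the post
pops <- rbind(
  Mari       = c(west = 65,  east = 35),
  Karakalpak = c(west = 45,  east = 55),
  Russian    = c(west = 94,  east = 6),
  Udmurt     = c(west = 70,  east = 30),
  Nogai      = c(west = 55,  east = 45),
  Czech      = c(west = 100, east = 0)
)

d <- as.matrix(dist(pops))  # Euclidean distance between percentage vectors
print(round(d[c("Karakalpak", "Russian"), "Mari"], 2))
print(round(d[c("Nogai", "Czech"), "Udmurt"], 2))
```

With only two components that sum to 100, the distance is just sqrt(2) times the difference in the eastern percentage, so the ranking follows the eastern share alone.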

Nganasankhan
09-30-2021, 07:48 PM
Admixture calculator distances function mainly on component overlap and fail to account for the divergence of the source populations themselves, as I've explained before. G25 distances are Euclidean distances (i.e. the distance from one point to another), so the two approaches account for distance in very different ways. This is one reason why different admixture calculators give you different distances, as those distances rely heavily on the components used. G25 does not overestimate the difference for divergent components but more accurately represents the divergent aspect of the genome, as can be seen, e.g., on a PCA.

If you use Vahaduo to calculate the distance between populations in the spreadsheet of an admixture calculator, it uses the same formula of Euclidean distance as with G25. You can think of K15 as a 15-dimensional space where the coordinate inside each dimension ranges from 0 to 100. Then the distance between two populations in K15 is the Euclidean distance between their points in 15-dimensional space.

When you calculate distances between populations in the datasheet of an admixture calculator, you can sometimes make it more accurate by multiplying the matrix of admixture percentages with the FST matrix of the calculator. Sometimes it gives weird results though.
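Both calculations described above fit in a few lines of R. The admixture percentages and the 3 x 3 FST matrix below are invented. A real K15 run would have 15 columns and would use the calculator's published FST table. The second distance multiplies the population-by-component matrix by the component FST matrix before taking Euclidean distances, as described above. The exact weighting the poster used may differ from this sketch.

```r
# Invented admixture percentages for three hypothetical populations (K = 3)
adm <- rbind(
  PopA = c(C1 = 60, C2 = 30, C3 = 10),
  PopB = c(C1 = 30, C2 = 60, C3 = 10),
  PopC = c(C1 = 60, C2 = 10, C3 = 30)
)

# Invented symmetric FST matrix between components: C1 and C2 are similar, C3 is divergent
fst <- matrix(c(0.00, 0.02, 0.10,
                0.02, 0.00, 0.09,
                0.10, 0.09, 0.00),
              nrow = 3, dimnames = list(colnames(adm), colnames(adm)))

# 1) Plain Euclidean distance in K-dimensional admixture space
d_plain <- as.matrix(dist(adm))

# 2) Euclidean distance after multiplying by the FST matrix
d_fst <- as.matrix(dist(adm %*% fst))

print(round(d_plain, 2))
print(round(d_fst, 2))
```

In this toy case the plain distance puts PopC closer to PopA. The FST-weighted distance puts PopB closer, because PopA and PopB differ only in two components that have low FST with each other.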

1. When you calculate distances between populations in K15 without multiplying by FST like in the top left graph below, it overestimates the distance of Maris to Turkmens, because Maris and Turkmens have a similar amount of total Mongoloid ancestry, but in K15 they have different western components and different eastern components. Maris are very close to Kets and Selkups, because in the original K15 spreadsheet (http://bga101.blogspot.com/2013/10/eurogenes-k15-now-at-gedmatch.html), the proportion of the Eastern_Euro component is the highest in Maris, 5th highest in Selkups, and 10th highest in Kets.
2. In the top right graph below, where the K15 matrix is multiplied by the FST matrix, Turkmens move closer to Maris, but some South Asian populations like Pathan and Burusho also move too close to Maris. It also causes Maris to be closer to Jordanians than to Lithuanians, which seems wrong.
3. In the bottom left graph below, you can see that relative to f2, G25 underestimates the distance of Maris to many drifted populations like West Siberians, Kusunda, Eskimos, Itelmens, Kubachinians, and Udmurts. There's a huge difference in the distance to Even, but it's probably because different samples were used in G25 and in my f2 matrix.
4. In the bottom right graph below, you can see that if K15 without multiplication by FST is compared to G25, it underestimates the distance of Maris to Eastern European populations and overestimates the distance to Western European populations. That's because K15 has many European components that have low FST distances with each other, so it overestimates the distances between European populations.

https://i.ibb.co/4VdMR80/maridistance.jpg