
Heatmaps of Uralic and Eurasian Y-DNA and mtDNA from Tambets 2018



Nganasankhan
03-19-2021, 01:58 PM
https://i.ibb.co/v1hHz1n/tambets-2018-yhg-mtdna.jpg

The data are from tables S5 and S4 of Tambets et al. 2018 ("Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations"): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1522-1

I omitted a few columns where the total percentage among all populations was 4% or less.

In the table for the distribution of Y-DNA haplogroups, there were rows for data about Kets and Selkups from two different sources, but I replaced them with the average value of the rows (not weighted by sample size).
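As a rough sketch of what that averaging does, here is a toy comparison of an unweighted and a sample-size-weighted mean. Only the two Selkup sample sizes (131 and 43) come from the source list; the haplogroup percentages and the row names `source_12` and `this_study` are made up for illustration.

```r
# Hypothetical haplogroup percentages for two rows of the same population.
# Only the sample sizes are taken from the source list; the rest is invented.
selkup <- rbind(
  source_12  = c(N = 10, Q = 70, R1a = 20),
  this_study = c(N = 20, Q = 70, R1a = 10)
)
n <- c(131, 43)

# Unweighted mean of the two rows (the approach described above)
unweighted <- colMeans(selkup)

# Mean weighted by sample size, shown only for comparison.
# selkup * n recycles n down the columns, so row 1 gets n[1] and row 2 gets n[2].
weighted <- colSums(selkup * n) / sum(n)

print(round(rbind(unweighted, weighted), 1))
```

With rows this different in size, the weighted mean stays closer to the larger sample, so the unweighted average gives the smaller sample more influence than its size would suggest.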

The dendrograms are sorted based on the frequency of haplogroups that are characteristic of East Eurasians. The clustering is done with the hclust function using Euclidean distances and the complete linkage method.
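A minimal sketch of that clustering and sorting step on made-up data is below. It uses `reorder()` on a dendrogram from base R's `stats` package instead of `reorder.hclust()` from `vegan`, so that it runs without extra packages; the population names, the two columns, and the choice of `agglo.FUN = mean` are all assumptions for the example.

```r
# Toy percentages (made up): two haplogroup columns, four populations,
# deliberately listed in a scrambled order.
m <- rbind(
  D = c(west = 5,  east = 95),
  A = c(west = 90, east = 10),
  C = c(west = 20, east = 80),
  B = c(west = 80, east = 20)
)

# Euclidean distances and complete linkage (both are the defaults)
hc <- hclust(dist(m, method = "euclidean"), method = "complete")

# Rotate the branches at each node so that lower weights come first.
# The tree topology is unchanged; only the left-to-right leaf order moves.
wts <- m[, "east"]
dend <- reorder(as.dendrogram(hc), wts, agglo.FUN = mean)

print(labels(dend))
```

Reordering only flips branches, so populations that cluster together stay adjacent even if their weights would sort them apart in a plain ranking.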

Sources for Y-DNA:

Saami from Sweden (n=73): 1,2,3
Saami from Kola Peninsula (n=23): 4
Finns (n=536): 5,6
Karelians (n=140): this study; updated from 7,8,9
Vepsians (n=39): this study; updated from 7,8,9
Estonians (n=327): 1,2,6
Latvians (n=199): 1,2,6
Lithuanians (n=164): 6
Swedes (n=1188): 3,6
Russians North (n=380): 10
Russians Central (n=364): 10
Russians South (n=484): 10
Hungarians (n=110): 1,2
Mordovians (n=82): 1
Udmurts (n=184): this study; 1,11
Komis (n=135): this study; updated from 7,8,9,12
Maris (n=97): this study; updated from 7,8,9,11
Bashkirs (n=122): this study; updated from 8,9
Chuvashes (n=193): this study; updated from 8,9,11
Tatars (n=207): this study; updated from 8,9,13
Gagauz (Moldova) (n=80): this study; updated from 2
Khanty (n=86): 12,13,14
Mansis (n=25): 14
Nenets (n=148): 12
Enets (n=9): 12
Nganasans (n=38): 12
Selkups (n=131): 12
Selkups (n=43): this study
Yakuts (n=369): 15,13,12
Buryats (n=385): 16,12,13
Mongolians (n=350): 12,17
Kalmyk (n=68): 16,12,13
Even (n=31): 12
Evenks (n=50): 16
Oroqens (n=30): 12,18
Uygurs (n=109): 12,4
Uzbeks (n=78): 12,13
Altaians (n=380): 13,12,16
Chelkans (n=25): 19
Tubalars (n=27): 19
Altai-Kizhi (n=120): 19
Dolgans (n=67): 12
Khakassians (n=228): 7,16
Shors (n=74): 7,16
Tuvinians (n=518): this study; updated from 8,13,16
Turks (n=523): 20
Turkmen (n=30): 4
Kazakhs (n=139): 4,13,12
Kyrgyz (n=91): 12,13
Nogay (Kuban) (n=87): 21
Balkar (n=135): 21
Kumyk (n=73): 21
Kets (n=69): 12,13
Kets (n=22): this study
Nivkhs (n=10): 13
Yukaghirs (n=11): 12
Tadjiks (n=24): 13

1. Tambets et al. (2004). The western and eastern roots of the Saami - The story of genetic "outliers" told by mitochondrial DNA and Y chromosomes. American Journal of Human Genetics 74, 661-682.
2. Rootsi et al. (2004). Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. American Journal of Human Genetics 75, 128-137.
3. Karlsson et al. (2006). Y-chromosome diversity in Sweden - A long-time perspective. Eur J Hum Genet 14, 963-970.
4. Wells et al. (2001). The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A 98, 10244-10249.
5. Lappalainen et al. (2006). Regional differences among the finns: A Y-chromosomal perspective. Gene 376, 207-215.
6. Lappalainen et al. (2008). Migration waves to the Baltic Sea region. Annals of Human Genetics 72, 337-348.
7. Rootsi et al. (2007). A counter-clockwise northern route of the Y-chromosome haplogroup N from Southeast Asia towards Europe. Eur J Hum Genet 15, 204-211.
8. Underhill et al. (2010). Separating the post-Glacial coancestry of European and Asian Y chromosomes within haplogroup R1a. Eur J Hum Genet 18, 479-484.
9. Myres et al. (2011). A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. Eur J Hum Genet 19, 95-101.
10. Balanovsky et al. (2008). Two sources of the Russian patrilineal heritage in their Eurasian context. Am J Hum Genet 82, 236-250.
11. Semino et al. (2000). The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science 290, 1155-1159.
12. Karafet et al. (2002). High levels of Y-chromosome differentiation among native Siberian populations and the genetic signature of a boreal hunter-gatherer way of life. Hum Biol 74, 761-789.
13. Kharkov, V.N.: Structure of Y-chromosomal lineages in Siberian populations. PhD thesis (in Russian). Tomsk, Research Institute of Medical Genetics at the Tomsk Scientific Center, Siberian Division of Russian Academy of Medical Sciences, 2005.
14. Pimenoff et al. (2008). Northwest Siberian Khanty and Mansi in the junction of West and East Eurasian gene pools as revealed by uniparental markers. Eur J Hum Genet 16, 1254-1264.
15. Fedorova et al. (2013). Autosomal and uniparental portraits of the native populations of Sakha (Yakutia): implications for the peopling of Northeast Eurasia. BMC Evol Biol 13, 127.
16. Derenko et al. (2006). Contrasting patterns of Y-chromosome variation in south Siberian populations from Baikal and Altai-Sayan regions. Human Genetics 118, 591-604.
17. Katoh et al. (2005). Genetic features of Mongolian ethnic groups revealed by Y-chromosomal analysis. Gene 346, 63-70.
18. Sengupta et al. (2006). Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet 78, 202-221.
19. Dulik et al. (2012). Mitochondrial DNA and Y Chromosome Variation Provides Evidence for a Recent Common Ancestry between Native Americans and Indigenous Altaians (vol 90, pg 229, 2012). American Journal of Human Genetics 90, 573-573.
20. Cinnioglu et al. (2004). Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114, 127-148.
21. Yunusbayev et al. (2011). The Caucasus as an Asymmetric Semipermeable Barrier to Ancient Human Migrations. Mol Biol Evol.

Sources for mtDNA:

Estonians (n=409): Loogväli et al. 2004, this study
Finns (n=603): Meinilä et al. 2001; Hedman et al. 2007
Ingrian Finns (n=36): Lappalainen et al. 2008
Karelians (n=595): Lappalainen et al. 2008; this study
Vepsians (n=125): Lappalainen et al. 2008; this study
Saami from Finland (n=69): Lappalainen et al. 2008; Sajantila et al. 1995
Saami from Norway (n=278): Sajantila et al. 1995; Dupuy and Olaisen et al. 1996; Delghandi et al. 1998
Saami from Sweden (n=98): Sajantila et al. 1995; Tambets et al. 2004
Saami from Kola Peninsula (n=86): this study
Maris (n=136): Bermisheva et al. 2002
Mordovians (n=298): Bermisheva et al. 2002; this study
Komis (n=345): Bermisheva et al. 2002; Gubina et al. 2005; this study
Udmurts (n=182): Bermisheva et al. 2002; this study
Hungarians (n=116): this study
Khanty (n=405): Gubina et al. 2005; Pimenoff et al. 2008; Naumova et al. 2009, this study
Mansis (n=199): Derbeneva et al. 2002, Pimenoff et al. 2008; this study
Selkup (n=120): this study
Nenets (n=137): Saillard et al. 2000; this study
Nganasans (n=131): Derbeneva et al. 2002; Goltsova et al. 2005
Tu (n=35): Yao et al. 2002a
Buryat (n=472): Derenko et al. 2003; 2007; Starikovskaya et al. 2005; Gibert et al. 2010
Daur (n=45): Kong et al. 2003
Kalmyk (n=230): Derenko et al. 2007; this study
Mongol (n=262): Kolman et al. 1996; Yao et al. 2002a; 2004; Kong et al. 2003; Derenko et al. 2007
Oroqen (n=44): Kong et al. 2003
Even (n=215): Derenko et al. 1997; Pakendorf et al. 2007; Fedorova et al. 2013
Evenk (n=480): this study; Kong et al. 2003; Starikovskaya et al. 2005; Derenko et al. 2007; Gibert et al. 2010; Fedorova et al. 2013
Hezhen (n=86): this study
Balkar (n=160): Quintana-Murci et al. 2004; this study
Nogay (n=129): Yunusbayev et al. 2010
Kumyk (n=112): this study
Turk (n=478): Quintana-Murci et al. 2004; this study
Gagauz (n=134): this study
Bashkir (n=215): Bermisheva et al. 2002; this study
Chuvash (n=169): Richards et al. 2000; Bermisheva et al. 2002; this study
Tatar (n=196): Bermisheva et al. 2002; Comas et al. 2004
Kazakh (n=572): Yao et al. 2000, 2004; this study
Kyrgyz (n=157): Comas et al. 2004; this study
Turkmen (n=77): Comas et al. 2004; Quintana-Murci et al. 2004; this study
Uzbek (n=259): Quintana-Murci et al. 2004; Yao et al. 2004; this study
Altaian (n=110): Derenko et al. 2003
Tuvinian (n=291): Derenko et al. 2003; 2007; Starikovskaya et al. 2005
Khakas (n=110): Derenko et al. 2003; 2007
Shor (n=82): Derenko et al. 2007
Uyghur (n=166): Yao et al. 2000; this study
Yakut (n=562): Derenko et al. 1997; 2007; Puzyrev et al. 2003; Fedorova et al. 2013
Dolgan (n=156): this study; Fedorova et al. 2013
Latvian (n=411): Lappalainen et al. 2008; Pliss et al. 2006
Lithuanian (n=201): Lappalainen et al. 2008; this study
Swede (n=550): Lappalainen et al. 2008; this study
Russian_North (n=144): this study
Russian_South (n=199): Malyarchuk et al. 2002
Tadjik (n=20): Comas et al. 2004
Ket (n=104): Derbeneva et al. 2002; this study
Nivkhi (n=56): Starikovskaya et al. 2005
Yukaghir (n=100): Volodko et al. 2008

1. this study
2. Bermisheva, M., Tambets, K., Villems, R., and Khusnutdinova, E. (2002). Diversity of mitochondrial DNA haplotypes in ethnic populations of the Volga-Ural region of Russia. Mol Biol 36, 990–1001.
3. Comas, D., Plaza, S., Wells, R.S., Yuldaseva, N., Lao, O., Calafell, F., and Bertranpetit, J. (2004). Admixture, migrations, and dispersals in Central Asia: evidence from maternal DNA lineages. Eur J Hum Genet 12, 495–504.
4. Delghandi, M., Utsi, E., and Krauss, S. (1998). Saami mitochondrial DNA reveals deep maternal lineage clusters. Hum Hered 48, 108–114.
5. Dupuy, B.M., and Olaisen, B. (1996). MtDNA sequences in the Norwegian Saami and main population. In Advances in Forensic Haemogenetics. 6, A. Carracedo, B. Brinkmann, and W. Bär, eds. (Berlin, Heidelberg, New York: Springer-Verlag), pp. 23–25.
6. Fedorova, S.A., Reidla, M., Metspalu, E., Metspalu, M., Rootsi, S., Tambets, K., Trofimova, N., Zhadanov, S.I., Kashani, B., Olivieri, A., et al. (2013). Autosomal and uniparental portraits of the native populations of Sakha (Yakutia): implications for the peopling of Northeast Eurasia. BMC Evol. Biol. 13, 127.
7. Gibert, M., Theves, C., Ricaut, F.X., Dambueva, I., Bazarov, B., Moral, P., Crubezy, E., Perrucho, M., Felix-Sanchez, M., and Sevin, A. (2010). mtDNA variation in the Buryat population of the Barguzin Valley: New insights into the micro-evolutionary history of the Baikal area. Ann. Hum. Biol. 37, 501–523.
8. Goltsova, T.V., Osipova, L., Zhadanov, S., and Villems, R. (2005). The Effect of Marriage Migration on the Genetic Structure of the Taimyr Nganasan Population. Russ. J. Genet. 41, 954–965.
9. Gubina, M.A., Osipova, L.P., and Villems, R. (2005). Analysis of the maternal gene pool by mitochondrial DNA polymorphism in Khanty and Komi populations of the Shuryshkarsky District of the YNAO (in Russian). In Indigenous Population of the Shuryshkarsky District of the Yamalo-Nenets Autonomous Okrug: Demographic, Genetic, and Medical Aspects, L.P. Osipova, ed. (Novosibirsk: ART-AVENUE), pp. 105–117.
10. Hedman, M., Brandstätter, A., Pimenoff, V., Sistonen, P., Palo, J.U., Parson, W., and Sajantila, A. (2007). Finnish mitochondrial DNA HVS-I and HVS-II population data. Forensic Sci. Int. 172, 171–178.
11. Kong, Q.-P., Yao, Y.-G., Liu, M., Shen, S.-P., Chen, C., Zhu, C.-L., Palanichamy, M.G., and Zhang, Y.-P. (2003). Mitochondrial DNA sequence polymorphisms of five ethnic populations from northern China. Hum. Genet. 113, 391–405.
12. Naumova, O.I., Khaiat, S.S., and Rychkov, S.I. (2009). [Mitochondrial DNA diversity in Kazym Khanty]. Genetika 45, 857–861.
13. Pakendorf, B., Novgorodov, I.N., Osakovskij, V.L., and Stoneking, M. (2007). Mating patterns amongst Siberian reindeer herders: Inferences from mtDNA and Y-chromosomal analyses. Am. J. Phys. Anthropol. 133, 1013–1027.
14. Pimenoff, V.N., Comas, D., Palo, J.U., Vershubsky, G., Kozlov, A., and Sajantila, A. (2008). Northwest Siberian Khanty and Mansi in the junction of West and East Eurasian gene pools as revealed by uniparental markers. Eur. J. Hum. Genet. 16, 1254–1264.
15. Pliss, L., Tambets, K., Loogvali, E.L., Pronina, N., Lazdins, M., Krumina, A., Baumanis, V., and Villems, R. (2006). Mitochondrial DNA portrait of Latvians: towards the understanding of the genetic structure of Baltic-speaking populations. Ann Hum Genet 70, 439–458.
16. Puzyrev, V.P., Stepanov, V.A., Golubenko, M. V, Puzyrev, K. V, Maximova, N.R., Kharkov, V.N., Spiridonova, M.G., and Nogovitsina, A.N. (2003). MtDNA and Y-Chromosome Lineages in the Yakut Population. Genetika 39, 975–981.
17. Quintana-Murci, L., Chaix, R., Wells, S., Behar, D., Sayar, H., Scozzari, R., Rengo, C., Al-Zahery, N., Semino, O., Santachiara-Benerecetti, A.S., et al. (2004). Where West meets East: The complex mtDNA landscape of the Southwest and Central Asian corridor. Am J Hum Genet 74, 827–845.
18. Sajantila, A., Lahermo, P., Anttinen, T., Lukka, M., Sistonen, P., Savontaus, M.L., Aula, P., Beckman, L., Tranebjaerg, L., Gedde-Dahl, T., et al. (1995). Genes and languages in Europe: an analysis of mitochondrial lineages. Genome Res 5, 42–52.
19. Starikovskaya, E.B., Sukernik, R.I., Derbeneva, O.A., Volodko, N. V, Ruiz-Pesini, E., Torroni, A., Brown, M.D., Lott, M.T., Hosseini, S.H., Huoponen, K., et al. (2005). Mitochondrial DNA diversity in indigenous populations of the southern extent of Siberia, and the origins of Native American haplogroups. Ann. Hum. Genet. 69, 67–89.
20. Tambets, K., Rootsi, S., Kivisild, T., Help, H., Serk, P., Loogvali, E.L., Tolk, H. V, Reidla, M., Metspalu, E., Pliss, L., et al. (2004). The Western and Eastern Roots of the Saami--the Story of Genetic "Outliers" Told by Mitochondrial DNA and Y Chromosomes. Am J Hum Genet 74, 661–682.
21. Yao, Y.-G., Nie, L., Harpending, H., Fu, Y.X., Yuan, Z.-G., and Zhang, Y.-P. (2002). Genetic relationship of Chinese ethnic populations revealed by mtDNA sequence diversity. Am J Phys Anthr. 118, 63–76.
22. Yao, Y.G., Kong, Q.P., Wang, C.Y., Zhu, C.L., and Zhang, Y.P. (2004). Different matrilineal contributions to genetic structure of ethnic groups in the silk road region in china. Mol Biol Evol 21, 2265–2280.
23. Derbeneva, O.A., Starikovskaya, E.B., Wallace, D.C., and Sukernik, R.I. (2002). Traces of early Eurasians in the Mansi of northwest Siberia revealed by mitochondrial DNA analysis. Am J Hum Genet 70, 1009–1014.
24. Derbeneva, O.A., Starikovskaya, E.B., Volod’ko, N. V, Wallace, D.C., and Sukernik, R.I. (2002). [Mitochondrial DNA variation in Kets and Nganasans and the early peoples of Northern Eurasia]. Genetika 38, 1554–1560.
25. Derenko, M. V, and Shields, G.F. (1997). [Diversity of mitochondrial DNA nucleotide sequences in three groups of aboriginal inhabitants of Northern Asia]. Mol Biol 31, 784–789.
26. Derenko, M. V, Grzybowski, T., Malyarchuk, B.A., Dambueva, I.K., Denisova, G.A., Czarny, J., Dorzhu, C.M., Kakpakov, V.T., Miscicka-Sliwka, D., Wozniak, M., et al. (2003). Diversity of mitochondrial DNA lineages in South Siberia. Ann Hum Genet 67, 391–411.
27. Derenko, M., Malyarchuk, B., Grzybowski, T., Denisova, G., Dambueva, I., Perkova, M., Dorzhu, C., Luzina, F., Lee, H.K., Vanecek, T., et al. (2007). Phylogeographic analysis of mitochondrial DNA in northern Asian Populations. Am J Hum Genet 81, 1025–1041.
28. Kolman, C., Sambuughin, N., and Bermingham, E. (1996). Mitochondrial DNA analysis of Mongolian populations and implications for the origin of New World founders. Genetics 142, 1321–1334.
29. Lappalainen, T., Laitinen, V., Salmela, E., Andersen, P., Huoponen, K., Savontaus, M.L., and Lahermo, P. (2008). Migration waves to the Baltic Sea region. Ann. Hum. Genet. 72, 337–348.
30. Loogväli, E.-L., Roostalu, U., Malyarchuk, B.A., Derenko, M. V, Kivisild, T., Metspalu, E., Tambets, K., Reidla, M., Tolk, H.-V., Parik, J., et al. (2004). Disuniting uniformity: a pied cladistic canvas of mtDNA haplogroup H in Eurasia. Mol Biol Evol 21, 2012–2021.
31. Malyarchuk, B.A., Grzybowski, T., Derenko, M. V, Czarny, J., Wozniak, M., and Miscicka-Sliwka, D. (2002). Mitochondrial DNA variability in Poles and Russians. Ann Hum Genet 66, 261–283.
32. Meinilä, M., Finnilä, S., and Majamaa, K. (2001). Evidence for mtDNA admixture between the Finns and the Saami. Hum Hered 52, 160–170.
33. Richards, M., Macaulay, V., Hickey, E., Vega, E., Sykes, B., Guida, V., Rengo, C., Sellitto, D., Cruciani, F., Kivisild, T., et al. (2000). Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67, 1251–1276.
34. Saillard, J., Evseva, I., Tranebjaerg, L., and Norby, S. (2000). Mitochondrial DNA diversity among Nenets. In Archaeogenetics: DNA and the Population Prehistory of Europe, C. Renfrew, and K. Boyle, eds. (Cambridge: McDonald Institute for Archaeological Research Monograph Series, Cambridge University), pp. 255–258.
35. Volodko, N. V, Starikovskaya, E.B., Mazunin, I.O., Eltsov, N.P., Naidenko, P. V, Wallace, D.C., and Sukernik, R.I. (2008). Mitochondrial genome diversity in arctic Siberians, with particular reference to the evolutionary history of Beringia and Pleistocenic peopling of the Americas. Am. J. Hum. Genet. 82, 1084–1100.
36. Yao, Y.G., Lu, X.M., Luo, H.R., Li, W.H., and Zhang, Y.P. (2000). Gene admixture in the silk road region of China: evidence from mtDNA and melanocortin 1 receptor polymorphism. Genes Genet Syst 75, 173–178.
37. Yunusbayev, B., Metspalu, M., Järve, M., Kutuev, I., Rootsi, S., Metspalu, E., Behar, D.M., Varendi, K., Sahakyan, H., Khusainova, R., et al. (2012). The Caucasus as an asymmetric semipermeable barrier to ancient human migrations. Mol. Biol. Evol. 29, 359–365.


library(pheatmap)
library(colorspace) # for hex()
library(vegan) # for reorder.hclust()

t=read.csv("https://pastebin.com/raw/aGPQSC24",row.names=1,header=T,check.names=F) # Y-DNA
# t=read.csv("https://pastebin.com/raw/MmttxJJM",row.names=1,header=T,check.names=F) # mtDNA

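# keep haplogroup columns whose percentages sum to at least 4 across all populations
t=t[,colSums(t)>=4]

# row weights: summed frequency of haplogroups characteristic of East Eurasians
wts=t[,"C3 (M217)"]+t[,"N(xN3)1# (M231)"]+t[,"N32# (TAT/M178)"]+t[,"P+Q+R*+R2 (M74/M242/M207/M124)"]
# wts=t[,"A"]+t[,"B"]+t[,"C"]+t[,"D"]+t[,"X"]+t[,"Z"] # mtDNA
# complete-linkage clustering on Euclidean distances, with branches rotated by the weights
sort=reorder(hclust(dist(t)),wts=wts)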

pheatmap(
t,
clustering_callback=function(...){sort},
cluster_cols=F,
filename="output.png",
legend=F,
treeheight_row=80,
treeheight_col=80,
cellwidth=16,
cellheight=16,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
breaks=seq(0,100,100/256),
color=colorRampPalette(hex(HSV(c(210,210,120,60,40,20,0),c(0,.5,.5,.5,.5,.5,.5),1)))(256)
)

Nganasankhan
03-19-2021, 05:25 PM
Here's just Uralic populations:

https://i.ibb.co/Xs1Qzkw/tambets-uralic-ydna.png
https://i.ibb.co/b7q44Ns/tambets-uralic-mtdna.png

Below is the combined proportion of these groups: Y, X, A, B, F, M(xD/G/C/Z), D, G, C, Z, L.

$ curl -Ls pastebin.com/raw/MmttxJJM|tr -d \\r>tambetsmtdna
$ awk -F, 'NR>1{s=0;for(i=27;i<=37;i++)s+=$i;printf"%.1f %s\n",s,$1}' tambetsmtdna|sort -rn
100.1 Nivkh
100.0 Yukaghir
99.0 Even
97.7 Oroqen
97.1 Tu
96.1 Dolgan
93.9 Evenk
90.6 Yakut
88.9 Daur
85.6 Tuvan
85.5 Buryat
83.7 Mongol
82.5 Hezhen
78.8 Nganasan
78.1 Khakas
76.9 Shor
74.6 Kyrgyz
72.7 Kalmyk
63.6 Altaian
57.8 Kazakh
56.9 Nenets
54.7 Uyghur
47.2 Uzbek
45.5 Turkmen
45.0 Tajik
40.6 Bashkir
38.4 Ket
38.3 Selkup
37.2 Mansi
30.1 Udmurt
26.8 Khanty
25.0 Nogay
15.9 Saami (Finland)
13.9 Komi
12.5 Balkar
9.9 Kumyk
9.7 Turk
9.6 Tatar
7.8 Chuvash
7.3 Mari
6.3 Russian (North)
5.4 Karelian
5.0 Russian (South)
3.5 Saami (Kola)
3.5 Hungarian
3.2 Finnish
2.9 Saami (Norway)
2.9 Gagauz
2.8 Finnish (Ingrian)
2.0 Mordovian
1.6 Vepsian
1.2 Swedish
1.0 Saami (Sweden)
0.7 Latvian
0.4 Estonian
0.0 Lithuanian

On the list above, Finnish Saami have more eastern mtDNA than Swedish, Norwegian, or Kola Saami, and Udmurts have more eastern mtDNA than Maris.

The percentage for Nganasans is only 79% on the list above because it doesn't include U4:


U4 has been found in ancient DNA, and it is relatively rare in modern populations, although it is found in substantial ratios in certain indigenous populations of Northern Asia and Northern Europe, being associated with the remnants of ancient European hunter-gatherers preserved in the indigenous populations of Siberia. U4 is found in the Nganasan people of the Taymyr Peninsula, in the Mansi (16.3%), an endangered people, and in the Ket people (28.9%) of the Yenisei River. It is found in Europe with highest concentrations in Scandinavia and the Baltic states, and is found in the Sami population of the Scandinavian peninsula (although U5b has a higher representation).
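The awk sum earlier in this post picks columns by position, which is easy to get wrong. A hedged alternative in R is to select the columns by name, sketched here on a made-up table: the population names, the values, and the two-haplogroup `east` subset are placeholders, not the real data.

```r
# Made-up mtDNA percentages; the column names mimic haplogroup labels,
# but the values and population names are illustrative only.
mt <- data.frame(
  H  = c(40.0, 2.0),
  U4 = c(5.0, 20.0),
  C  = c(1.0, 40.0),
  D  = c(0.5, 30.0),
  row.names = c("pop_west", "pop_east"),
  check.names = FALSE
)

east <- c("C", "D")            # hypothetical subset of eastern haplogroups
share <- rowSums(mt[, east])   # combined percentage per population
print(sort(share, decreasing = TRUE))

# Adding another haplogroup to the sum is a one-word change
share_with_u4 <- rowSums(mt[, c(east, "U4")])
print(sort(share_with_u4, decreasing = TRUE))
```

Selecting by name also makes it simple to check how much a single haplogroup such as U4 changes the combined percentage.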

Jaska
03-19-2021, 08:29 PM
Very illustrative, thank you! :)
I've tried to find out which of the Saami mtDNA lineages could have spread from the Volga direction:
http://www.elisanet.fi/alkupera/SaameMTDNA.pdf

Nganasankhan
03-20-2021, 08:34 AM
I've tried to find out which of the Saami mtDNA lineages could have spread from the Volga direction:
http://www.elisanet.fi/alkupera/SaameMTDNA.pdf

I made a heatmap for table B from your paper:

https://i.ibb.co/MC5swJ8/saami-mtdna.png

In the table from Tambets et al., I was wondering why D and Z were much more common among Finnish Saami than among Norwegian, Swedish, or Russian Saami. From the table you compiled, we can see that D5a was common only among Inari Saami and not Skolt Saami, although the sample size for Inari Saami was small. The source for the data on both Inari Saami and Skolt Saami was Sajantila et al. 1995 (https://www.researchgate.net/publication/14485460_Genes_and_languages_in_Europe_An_analysis_of_mitochondrial_lineages). Z1a was also fairly common among Saami from Västerbotten who practiced a traditional lifestyle. (In Ingman & Gyllensten 2007 (https://www.semanticscholar.org/paper/Rate-variation-between-mitochondrial-domains-and-in-Ingman-Gyllensten/646e65c68352fb70ba00f49f7f7f699d81ea21e8), Saami from Västerbotten were divided into two groups based on whether they practiced a traditional lifestyle or not. Västerbotten is part of the South Saami and Ume Saami region.)


library(pheatmap)
library(colorspace)

data=",U5b1b1a,V,H,D5a,Z1a,U5a,W+T,Other
Finland: Skolt (47),18,24,0,0,4,1,0,0
Finland: Lake Inari (22),10,2,2,6,1,1,0,0
Norway: Karasjok-1 (21),11,8,2,1,0,0,0,0
Norway: Karasjok-2 (16),80,62,1,5,3,0,3,0
Norway: Kautokeino (75),57,17,7,0,0,0,1,0
Norway: Coastal (23),9,6,0,2,0,0,1,0
Sweden: Jokkmokk (39),12,27,5,0,0,0,0,0
Sweden: Norrbotten-1 (25),8,14,2,0,1,0,0,0
Sweden: Norrbotten-2 (152),54,89,4,0,1,0,4,0
Sweden: Västerbotten-trad (46),11,17,7,0,5,0,2,4
Sweden: Västerbotten-non (92),15,8,41,0,1,0,3,24
Sweden: Other (73),18,52,2,0,0,0,1,0
Russia: Kola (85),48,17,11,2,0,7,0,0
GenBank (95),53,18,20,3,0,1,0,0"

t=read.csv(text=data,check.names=F,header=T,row.names=1)
t=100*(t/rowSums(t))

pheatmap(
t,
filename="out.png",
legend=F,
cluster_cols=F,
cluster_rows=F,
cellwidth=16,
cellheight=16,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
breaks=seq(0,100,100/256),
color=colorRampPalette(hex(HSV(c(210,210,120,60,45,30,15,0),c(0,.5,.5,.5,.5,.5,.5,.5),1)))(256)
)

Nganasankhan
04-30-2021, 07:00 AM
Here's a heatmap of table S2 from Ilumäe et al. (2016), "Human Y Chromosome Haplogroup N: A Non-trivial Time-Resolved Phylogeography that Cuts across Language Families": https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5005449/. I also made a scatterplot where the clustering is based on a distance matrix generated from the full haplogroup table.

https://i.imgur.com/z6jL4nd.png
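To illustrate the clustering step, here is a small sketch on invented proportions: the clusters are cut from a tree built on all columns, even though the plot only shows two derived axes. The population names `p1` to `p6`, the column names, and `k = 3` are placeholders for the example.

```r
# Made-up haplogroup proportions for six populations (rows) and three
# subclades (columns); all names are placeholders.
a <- rbind(
  p1 = c(0.80, 0.05, 0.00),
  p2 = c(0.75, 0.10, 0.00),
  p3 = c(0.05, 0.70, 0.05),
  p4 = c(0.00, 0.65, 0.10),
  p5 = c(0.00, 0.05, 0.60),
  p6 = c(0.05, 0.00, 0.55)
)
colnames(a) <- c("sub1", "sub2", "sub3")

# Cluster on the full table, then cut the tree into k groups
k <- cutree(hclust(dist(a)), k = 3)
print(k)

# A scatterplot can then use any two summary axes and color points by k
xy <- data.frame(x = a[, "sub1"], y = a[, "sub2"] + a[, "sub3"], k = factor(k))
print(xy)
```

Because the distance matrix uses every column, two populations can land in different clusters even when they sit close together on the two plotted axes.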


library(pheatmap)
library(colorspace)

t=read.csv("https://pastebin.com/raw/Wr9xPHnH",row.names=1,check.names=F)
n=t[,1]
t=t[,-c(1,2)]
t=t/n
t=cbind(t,rowSums(t))
colnames(t)[ncol(t)]="Total"
rownames(t)=paste0(rownames(t)," (n=",n,")")

pheatmap(
100*t,
filename="t/b.png",
cluster_cols=F,
cluster_rows=F,
legend=F,
treeheight_row=80,
treeheight_col=80,
cellwidth=16,
cellheight=16,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
gaps_col=ncol(t)-1,
color=colorRampPalette(hex(HSV(c(210,210,170,120,60,40,20,0),c(0,rep(.5,7)),1)))(256)
)


library(tidyverse)
library(ggrepel)

t=read.csv("https://pastebin.com/raw/Wr9xPHnH",row.names=1,check.names=F)

xy=100*data.frame(y=rowSums(t[,8:ncol(t)])/t[,1],x=rowSums(t[,3:7])/t[,1])

a=t[,-c(1,2)]/t[,1]
xy$k=cutree(hclust(dist(a)),16)

ggplot(xy,aes(x,y))+
geom_point(aes(color=as.factor(k)),size=.5)+
geom_polygon(data=xy%>%group_by(k)%>%slice(chull(x,y)),alpha=.2,aes(color=as.factor(k),fill=as.factor(k)),size=.3)+
geom_text_repel(aes(label=rownames(xy),color=as.factor(k)),size=2.4,force=5,box.padding=0,point.padding=.05,min.segment.length=.1,segment.size=.2,max.overlaps=Inf)+
# geom_text(label=rownames(xy),aes(color=as.factor(k)),size=2.4,vjust=-.7)+
geom_abline(linetype="dashed",color="gray80",size=.3)+
labs(y="N3-M46 (%)",x="N2a-P43 (%)")+
scale_x_continuous(breaks=seq(0,100,10))+
scale_y_continuous(breaks=seq(0,100,10))+
coord_fixed()+
scale_color_manual(values=hcl(head(seq(0,360,length.out=n_distinct(xy$k)+1),-1),125,55))+
theme(
axis.text=element_text(color="black",size=6),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.ticks.length=unit(0,"pt"),
axis.ticks=element_blank(),
axis.title=element_text(size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray80",fill=NA,size=.6),
panel.grid.major=element_line(color="gray80",size=.25),
plot.background=element_rect(fill="white")
)

ggsave("a.png",height=8,width=8)

Here's also a series of biplots based on the haplogroup table. One thing that stands out in the first plot is that Dolgans from Yakutia have more N3a2 so they are closer to Yakuts, but Dolgans from Taimyr have more N2a1 so they are closer to Samoyeds.

https://i.imgur.com/yM20jGv.png


library(tidyverse)
library(ggrepel)

t=read.csv("https://pastebin.com/raw/Wr9xPHnH",row.names=1,check.names=F)
n=t[,1]
t=t[,-c(1,2)]
t=t/n

p=prcomp(t)
pct=paste0(colnames(p$x)," (",sprintf("%.1f",p$sdev^2/sum(p$sdev^2)*100),"%)") # share of variance uses squared sdev
p2=as.data.frame(p$x)
p2$k=as.factor(cutree(hclust(dist(t)),k=16))
load=p$rotation

for(i in c(1,3,5,7)){
x=sym(paste0("PC",i))
y=sym(paste0("PC",i+1))

mult=min(max(p2[,i])/max(load[,i]),max(p2[,i+1])/max(load[,i+1]))

ggplot(p2,aes(!!x,!!y))+
geom_segment(data=load,aes(x=0,y=0,xend=mult*!!x,yend=mult*!!y),arrow=arrow(length=unit(.3,"lines")),color="gray60",size=.4)+
annotate("text",x=(mult*load[,i]),y=(mult*load[,i+1]),label=row.names(load),size=2.5,vjust=ifelse(load[,i+1]>0,-.5,1.4))+
geom_polygon(data=p2%>%group_by(k)%>%slice(chull(!!x,!!y)),alpha=.2,aes(color=k,fill=k),size=.3)+
geom_point(aes(color=k),size=.6)+
geom_text(aes(label=row.names(t),color=k),size=2.5,vjust=-.6)+
labs(x=pct[i],y=pct[i+1])+
coord_fixed()+
scale_x_continuous(breaks=seq(-2,2,.1),expand=expansion(mult=.06))+
scale_y_continuous(breaks=seq(-2,2,.1),expand=expansion(mult=.06))+
scale_color_manual(values=hcl(head(seq(15,375,length=length(unique(p2$k))+1),-1),120,50))+
theme(
aspect.ratio=1,
axis.text=element_text(color="black",size=6),
axis.ticks=element_line(size=.3,color="gray60"),
axis.ticks.length=unit(-.1,"cm"),
axis.text.x=element_text(margin=margin(.2,0,0,0,"cm")),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5,margin=margin(0,.2,0,0,"cm")),
axis.title=element_text(color="black",size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray60",fill=NA,size=.4),
panel.grid=element_blank()
)

ggsave(paste0(i,".png"))
}

Nganasankhan
05-05-2021, 06:09 AM
Here are also biplots based on tables S5 and S4 from Tambets et al. 2018 (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1522-1#Sec25).

In the image for Y-DNA below, PC1 has the highest loading for C3, and Oroqens who have the most C3 (87%) are the furthest away from zero on PC1. Evenks are part of the same cluster with Mongolians, but Evens and Oroqens are part of the same cluster with Kalmyks and Buryats.

PC2 has the highest loading for N-M178 (TAT) and N-M231 (other N), because both are found among Uralic populations. However, Selkups have so little N that they are on the opposite side of PC2 from the most N-rich Uralic populations.

Based on Y-DNA, Selkups cluster together with Chelkans and Tubalars, who are two subgroups of Northern Altaians.

https://i.ibb.co/WGpwKhp/tambets-biplot-y.jpg
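For anyone who wants to read loadings like these off a `prcomp` result directly, here is a small sketch on invented percentages. The population names and values are made up; the column names only mimic haplogroup labels.

```r
# Made-up percentages for four populations and three haplogroups
m <- rbind(
  pop1 = c(C3 = 85, N = 5,  R1a = 10),
  pop2 = c(C3 = 10, N = 60, R1a = 30),
  pop3 = c(C3 = 5,  N = 55, R1a = 40),
  pop4 = c(C3 = 15, N = 50, R1a = 35)
)

p <- prcomp(m)  # columns are centered but not scaled by default

# Haplogroup with the largest absolute loading on PC1
top <- rownames(p$rotation)[which.max(abs(p$rotation[, "PC1"]))]

# Population furthest from zero on PC1
far <- rownames(p$x)[which.max(abs(p$x[, "PC1"]))]

# Share of total variance per component (squared standard deviations)
var_share <- p$sdev^2 / sum(p$sdev^2)

print(c(top_loading = top, furthest_population = far))
print(round(100 * var_share, 1))
```

The sign of a principal component is arbitrary, which is why the sketch compares absolute values rather than looking for the largest positive loading or score.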

Based on mtDNA, Finns cluster together with Swedes, Hungarians, and Gagauzes. Even Maris and Chuvashes are part of the same cluster with Finns, but Udmurts and Komis cluster with Turks, Kumyks, Balkars, Nogays, and Tatars.

Selkups cluster together with Kets based on both mtDNA and Y-DNA. However, Khanty and Mansi only cluster together with Kets based on mtDNA. So based on mtDNA, Uralics cluster together with the locals not only in Europe but also in Western Siberia.

PC4 has the highest loading for Y, and Nivkhs are the furthest away from zero on PC4. PC6 differentiates V (which is more common among Swedish Saami) from U5 (which is more common among Saami from Norway and Kola).

Khakasses and Shors are differentiated from other Western Siberians based on high F, as you can see from the bottom right plot.

https://i.ibb.co/gV9gPN4/tambets-biplot-mt.jpg


library(tidyverse)

t=read.csv("https://pastebin.com/raw/aGPQSC24",row.names=1,header=T,check.names=F) # Y-DNA
# t=read.csv("https://pastebin.com/raw/MmttxJJM",row.names=1,header=T,check.names=F) # mtDNA

p=prcomp(t)
pct=paste0(colnames(p$x)," (",sprintf("%.1f",p$sdev^2/sum(p$sdev^2)*100),"%)") # share of variance uses squared sdev
p2=as.data.frame(p$x)
p2$k=factor(cutree(hclust(dist(t)),k=12))
load=p$rotation

for(i in seq(1,7,2)){
x=sym(paste0("PC",i))
y=sym(paste0("PC",i+1))

mult=min(max(p2[,i])/max(load[,i]),max(p2[,i+1])/max(load[,i+1]))

ggplot(p2,aes(!!x,!!y))+
geom_segment(data=load,aes(x=0,y=0,xend=mult*!!x,yend=mult*!!y),arrow=arrow(length=unit(.3,"lines")),color="gray60",size=.4)+
annotate("text",x=(mult*load[,i]),y=(mult*load[,i+1]),label=rownames(load),size=2.5,vjust=ifelse(load[,i+1]>0,-.5,1.4))+
geom_polygon(data=p2%>%group_by(k)%>%slice(chull(!!x,!!y)),alpha=.2,aes(color=k,fill=k),size=.3)+
geom_point(aes(color=k),size=.6)+
geom_text(aes(label=rownames(t),color=k),size=2.5,vjust=-.6)+
labs(x=pct[i],y=pct[i+1])+
scale_x_continuous(breaks=seq(-200,200,20),expand=expansion(mult=.06))+
scale_y_continuous(breaks=seq(-200,200,20),expand=expansion(mult=.06))+
scale_color_manual(values=hcl(head(seq(15,375,length=length(unique(p2$k))+1),-1),120,50))+
theme(aspect.ratio=1,
axis.text=element_text(color="black",size=6),
axis.ticks=element_line(size=.3,color="gray60"),
axis.ticks.length=unit(-.1,"cm"),
axis.text.x=element_text(margin=margin(.2,0,0,0,"cm")),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5,margin=margin(0,.2,0,0,"cm")),
axis.title=element_text(color="black",size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray60",fill=NA,size=.4),
panel.grid=element_blank())

ggsave(paste0(i,".png"),width=6,height=6)
}

Nganasankhan
05-06-2021, 07:42 AM
I now made updated versions of the heatmaps in the first post of this thread, where I used the `ComplexHeatmap` package to draw row and column names on all four sides of the plot: https://jokergoo.github.io/ComplexHeatmap-reference/book/. I also added sample sizes and the sources of the data to the images.

`ComplexHeatmap` is a Bioconductor package, so it cannot be installed with a plain `install.packages` call; use `BiocManager` instead: `install.packages("BiocManager"); BiocManager::install("ComplexHeatmap")`. The installation failed until I ran `brew install libxt` and `install.packages("png")`.

https://i.ibb.co/L6h0jtM/complexheatmap-tambets-2018-ydna.png
https://i.ibb.co/FxFD1cM/complexheatmap-tambets-2018-mtdna.png


library(ComplexHeatmap)
library(circlize) # for colorRamp2
library(colorspace) # for hex
library(vegan) # for reorder.hclust (may be masked by the package seriation)

t=read.table("https://pastebin.com/raw/9VcbF62f",sep="\t",check.names=F,header=T,comment.char="") # mtDNA
# t=read.table("https://pastebin.com/raw/N30HaLsz",sep="\t",check.names=F,header=T,comment.char="") # Y-DNA
t2=as.matrix(t[,-c(1:3)])

east=23:34
# east=c(1,2,4,10,11,12,13,14)
roweast=rowSums(t2[,east])
coleast=as.numeric(seq(ncol(t2))%in%east)

png("a.png",w=4000,h=4000,res=100)

ht_opt$COLUMN_ANNO_PADDING=unit(0,"mm")
ht_opt$ROW_ANNO_PADDING=unit(0,"mm")

Heatmap(
t2,
show_heatmap_legend=F,
show_column_names=F,
show_row_names=F,
width=ncol(t)*unit(30,"pt"),
height=nrow(t)*unit(30,"pt"),
column_dend_height=unit(200,"pt"),
row_dend_width=unit(200,"pt"),
clustering_distance_rows="euclidean",
clustering_distance_columns="euclidean",
cluster_rows=reorder(hclust(dist(t2)),-roweast),
cluster_columns=reorder(hclust(dist(t(t2))),-coleast),
column_title="Source: Tambets et al. 2018, table S5",column_title_gp=gpar(fontsize=24),
rect_gp=gpar(col="gray80",lwd=1.5),
col=colorRamp2(seq(0,100,length.out=7),hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1))),
cell_fun=function(j,i,x,y,w,h,fill)grid.text(sprintf("%.0f",t2[i,j]),x,y,gp=gpar(fontsize=17)),
top_annotation=columnAnnotation(text=anno_text(gt_render(colnames(t2),padding=unit(c(3,3,3,3),"mm")),just="left",rot=90,location=unit(0,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5))),
bottom_annotation=columnAnnotation(text=anno_text(gt_render(colnames(t2),padding=unit(c(3,3,3,3),"mm")),just="left",rot=270,gp=gpar(fontsize=17,border="gray80",lwd=1.5))),
left_annotation=rowAnnotation(text=anno_text(gt_render(t[,1],padding=unit(c(3,3,3,3),"mm")),just="right",location=unit(1,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5))),
right_annotation=rowAnnotation(
text1=anno_text(gt_render(t[,1],padding=unit(c(3,3,3,3),"mm")),just="left",location=unit(0,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5)),
text2=anno_text(gt_render(sprintf("%.0f",roweast),padding=unit(c(3,3,3,3),"mm")),just="center",location=unit(.5,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5)),
text3=anno_text(gt_render(t[,2],name="N",padding=unit(c(3,3,3,3),"mm")),just="center",location=unit(.5,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5)),
text4=anno_text(gt_render(t[,3],name="Sources",padding=unit(c(3,3,3,3),"mm")),just="left",location=unit(0,"npc"),gp=gpar(fontsize=17,border="gray80",lwd=1.5))
)
)

decorate_annotation("text2",grid.text("Total eastern (%)",y=unit(1,"npc")+unit(3,"mm"),rot=90,just="left",gp=gpar(fontsize=17)))
decorate_annotation("text3",grid.text("N",y=unit(1,"npc")+unit(3,"mm"),rot=90,just="left",gp=gpar(fontsize=17)))

dev.off()
system("mogrify -gravity center -trim -border 16 -bordercolor white a.png")

Nganasankhan
05-06-2021, 10:43 AM
Here's a combined table for both Y-DNA and mtDNA. It's missing a few populations that didn't have data for both Y-DNA and mtDNA.


Population,Y*(x/DE/F),C3 (M217),E3b (M35),D (M174),F (M89),G (M201),I (M170),J (12f2),K* (M9),L (M20),N(xN3)1# (M231),N32# (TAT/M178),O (M175),P+Q+R*+R2,R1b (M173/M269),R1a (SRY1532/M198),mt:H,mt:HV,mt:V,mt:U(xU1-8),mt:U1,mt:U2,mt:U3,mt:U4,mt:U5,mt:U6,mt:U7,mt:U8,mt:K,mt:R(xJ/T/F/B),mt:J,mt:T,mt:N(xN1-2/N9a/A/X/Y),mt:I,mt:N1a,mt:N1b,mt:N1c,mt:N1e,mt:N2a,mt:W,mt:N9,mt:Y,mt:X,mt:A,mt:B,mt:F,mt:M(xD/G/C/Z),mt:D,mt:G,mt:C,mt:Z,mt:L
Altaian,0,11.1,0.3,3.7,1.3,0.5,1.6,2.6,1.3,0,4.2,4.2,2.4,16.6,2.9,47.4,6.4,0,0,0,0,5.5,1.8,5.5,3.6,0,0,0,0,0,3.6,0.9,1.8,0,2.7,0,0,0,0,0,4.5,0,2.7,0,3.6,9.1,7.3,15.5,1.8,19.1,4.5,0
Balkar,0,0,0,0,0,32.6,3,19.3,0,0,0,0,0,3.7,13.3,28.1,25.0,3.8,0.6,0,8.1,3.8,5.6,0.6,1.3,0,1.9,0,5,0,3.1,10.6,0.6,3.1,0,2.5,0,0,0,11.9,0,0,8.1,1.3,0,0,0,2.5,0.6,0,0,0
Bashkir,0,4.9,0,0,0,0,2.5,1.6,0,0,0.8,29.5,0.8,0,28.7,31.1,12.1,0.9,2.8,0,0,0.5,0,12.1,14,0,0,0.5,1.4,0,3.3,5.1,1.4,0,3.7,0,0,0,0,0.5,1.4,0.5,0,4.2,0.9,5.6,1.9,9.3,4.7,12.6,0.9,0
Buryat,0,59.5,0,0,1,0.3,0.3,0.3,6.2,0,1.3,26.5,0.8,1.6,0.8,1.6,4.9,0.8,0.2,0,0.2,0,0,0.6,2.1,0,0,0.4,0.8,0,1.1,0.8,0.4,0.2,0.2,0,0,0.2,0,0,1.3,1.3,0.2,3.6,4.2,2.3,4.4,33.7,10.8,23.7,1.3,0
Chuvash,0,0.5,5.7,0,1.6,1,12.4,11.4,0.5,0.5,10.9,22.8,0,0,3.1,29.5,40.8,0,4.7,0,0,1.2,0.6,11.8,10.1,0,0,3,8.3,0,4.1,2.4,2.4,0,0.6,0,0,0,0,2.4,0,0,0,3,0,0,1.8,1.8,0,0.6,0.6,0
Dolgan,0,37.3,1.5,0,0,3,1.5,0,0,0,11.9,22.4,4.5,0,1.5,16.4,1.3,0,0,0,0,0,0,1.3,0,0,0,0,1.3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4.5,0,2.6,8.3,39.7,0.6,39.1,1.3,0
Estonian,0,0,3.4,0,0.3,1.2,18,1.8,0.6,0,0,31.8,0,0.6,7.3,34.9,45.0,0,7.1,0,0.2,1.5,1,5.4,14.4,0,0,1.7,2.7,0,10.3,7.8,1,0,1.2,0.2,0,0,0,0,0,0,0.2,0,0,0,0,0.2,0,0,0,0
Even,0,74.2,0,0,0,0,3.2,0,0,0,0,12.9,0,3.2,0,6.5,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.9,0,0,0,0,0,0,0,0,0,0,5.6,0,2.3,0.5,3.7,1.9,23.7,8.8,39.5,13,0
Evenk,0,40,0,0,4,0,2,0,0,0,18,16,0,0,6,14,2.3,0,0,0,0,0,0,0.2,0,0,0,0,0.4,0,3.3,0,0,0,0,0,0,0,0,0,0,1.7,0.4,6.7,1.7,1.5,0.4,26.3,2.1,52.1,1,0
Finnish,0.2,0.2,0.4,0,0.4,0,28.9,0,0,0,0.6,58.2,0,0.4,3.7,7.1,41.5,0.5,3.6,0,0,0.5,0,1,25.2,0,0.5,0.5,4.1,0,4.5,3.5,3,1,0.5,0,0,0,0,7,0,0,1.5,0,0,0,0,0.2,0,0,1.5,0
Hungarian,0,0.9,7.3,0,0.9,1.8,27.3,15.5,0,0,0,0.9,0,2.7,20.9,21.8,45.7,0,2.6,0,0.9,0.9,1.7,3.4,8.6,0.9,0,0.9,4.3,0,7.8,11.2,1.7,0,0,1.7,0,0,0,4.3,0,0,0.9,0,0,0,1.7,0,0.9,0,0,0
Kalmyk,0,70.6,0,0,0,0,0,0,4.4,1.5,2.9,0,0,11.8,2.9,5.9,2.2,5.2,0.9,0.4,0,0.9,0,2.2,2.6,0,0.4,0,1.7,0,3.5,1.3,0.9,0.4,0.4,0,0,0,0,0,4.3,0.9,0,5.7,4.3,7,9.6,24.8,7,10.4,3,0
Karelian,0,0,7.1,0,1.4,0,10,0.7,0,0,0.7,36.4,0,0,2.9,40.7,45.7,0.2,5.5,1,2.4,0.5,0.8,3.2,16.1,0,0,3.2,1.5,0,6.7,2.7,2.7,0.8,0,0.2,0,0,0,1.3,0,0,0.2,0,0,0,0,4.7,0.3,0,0.2,0
Kazakh,0,47.5,0,0.7,5,2.9,0,5.8,4.3,0,0,5,19.4,3.6,4.3,1.4,12.9,2.4,0.5,0.3,0.5,1.2,0.3,1.6,4,0,0.7,1.2,1.6,0,2.8,5.4,1.4,0.5,0.9,0,0,0,0,1,2.6,0.7,0.5,5.4,4.2,4.5,6.6,18.7,5.1,9.1,3,0
Ket,0,2.15,0,0,0,0,0,0,0.7,0,0,4.55,0.7,88.85,2.95,0,12.5,0,0,0,0,0,0,27.9,12.5,0,0,0,0,0,1,0,1,1.9,0,0,0,0,4.8,0,0,1.9,0,6.7,0,11.5,0,2.9,0,12.5,2.9,0
Khakas,0,2.2,0,0,0,0,2.2,0,2.6,0,22.8,25.4,0.9,4.8,5.7,33.3,5.5,0,0.9,0,0,0.9,0,5.5,0.9,0,0,0,0,0,3.6,3.6,0,0,0,0,0,0,0,0,0.9,0,0,3.6,6.4,23.6,1.8,14.5,0.9,27.3,0,0
Khanty,0,2.3,0,0,0,1.2,0,0,0,0,31.4,48.8,0,0,10.5,5.8,17.3,0,0,0,0.2,0.5,0,16.5,5.4,0,5.7,0,0.5,0,13.1,7.4,0,0,0.7,0,0,0,5.7,0,0,0.2,0.2,3,0,0.2,0,11.6,1.2,10.4,0,0
Komi,0,0,1.5,0,0,0,5.9,2.2,0.7,0,18.5,37,0,0.7,5.9,27.4,33.0,0,0.6,0,0,0,0,13.6,9.9,0,0.9,0.9,2,0,4.9,13.3,0.9,2.6,2.9,0,0,0,0,0.6,0,0.3,0,3.2,0,0,0,3.5,2.3,2.9,1.7,0
Kumyk,0,0,2.7,0,0,13.7,0,42.5,1.4,0,0,0,1.4,2.7,20.5,15.1,24.1,3.6,0,0,4.5,2.7,6.3,6.3,5.4,0,4.5,0.9,2.7,0,8,11.6,5.4,0.9,0,0.9,0,0,0,1.8,0.9,0,5.4,0,0,0,0.9,0.9,2.7,0,0,0
Kyrgyz,0,18.7,0,1.1,1.1,0,0,7.7,2.2,0,0,8.8,5.5,3.3,2.2,49.5,8.3,1.9,1.9,0.6,0.6,1.3,0,1.9,0.6,0,0.6,0,0.6,0,0.6,2.5,1.3,0,0,0,0,0,0,1.3,1.3,0.6,0.6,7,5.1,7,6.4,20.4,9.6,16.6,1.3,0
Latvian,0,0,1,0,0.5,0.5,7,0,0,0,0,41.7,0,1,9.5,38.7,42.1,1.7,2.7,0,0,3.6,2.2,10.2,10.2,0,0,0,2.4,0,6.1,9.2,4.6,0,0,0,0,0,0,4.1,0,0,0.2,0,0,0,0,0,0.5,0,0,0
Lithuanian,0,0,1.2,0,1.8,0,11.6,1.8,0,0,0,43.9,0,0.6,4.9,34.1,49.8,1.5,5.5,1,0,0,0.5,2.5,12.4,0,0,0,1.5,0,6.5,9.5,2.5,0.5,2,1.5,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0
Mansi,0,0,0,0,0,0,8,4,0,0,60,16,0,0,4,8,15.1,0,0.5,0,0,2,0,16.6,3.5,0,3.5,0,1.5,0,13.1,5,0,0,0,0,0,0,2,0,0,0,0,1.5,0,0.5,0.5,12.6,3.5,18.6,0,0
Mari,0,0,0,0,3.1,0,5.2,5.2,3.1,0,8.2,46.4,0,4.1,2.1,22.7,41.9,0,11,0,0,0,0,10.3,14,0,0,0,2.2,0,7.4,5.1,0.7,0,0,0,0,0,0,0,0,0,0,1.5,0,0,0.7,1.5,0,0.7,2.9,0
Mongol,0,53.1,0,2.9,1.1,0.3,0,2,2.9,0,2.6,4,20.6,5.1,0.3,5.1,4.2,1.5,0,0,0.4,1.1,0,1.9,1.1,0,0,0,0.8,0,1.1,0.8,0,0.4,0.4,0,0,0,0,0,2.7,0.8,0,6.5,8.8,8.8,11.1,26.3,6.9,11.8,2.7,0
Mordovian,0,0,0,0,3.7,3.7,19.5,12.2,2.4,0,2.4,15.9,0,0,13.4,26.8,46.3,0,3.7,0.3,0.3,2.7,0.3,2.3,15.8,0,0,0,1.3,0,6.7,6.7,6.7,1.7,0.7,0,2.3,0,0,0,0,0,0,0,0,0,0,0.7,0,1.3,0,0
Nenets,0,0,0,0,0,0,0,0,0,0,57.4,40.5,0.7,1.4,0,0,15.3,0,0,1.5,0,1.5,0,6.6,2.9,0,0,0,0,0,0.7,14.6,0,0,0,0,0,0,0,0,0,1.5,0,0.7,0,0,0,13.1,0,41.6,0,0
Nganasan,0,5.3,0,0,0,0,0,0,0,0,92.1,2.6,0,0,0,0,1.5,0,0,0,0,2.3,0,16.8,0,0,0,0,0,0,0,0.8,0,0,0,0,0,0,0,0,0,0.8,0,3.8,0,0,0.8,33.6,3.1,34.4,2.3,0
Nivkh,0,40,0,10,0,0,10,0,0,0,0,0,20,10,0,10,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,66.1,0,0,0,0,0,28.6,5.4,0,0,0
Nogay,0,8,0,1.1,0,13.8,0,37.9,1.1,0,2.3,2.3,3.4,0,17.2,12.6,20.9,5.4,0,0,3.9,2.3,7,3.1,4.7,0,0,0,3.9,0,4.7,8.5,3.9,0,0,0.8,0,0,0,6.2,0,0,2.3,4.7,0.8,1.6,2.3,3.9,4.7,1.6,2.3,0.8
Oroqen,0,86.7,0,0,0,0,0,0,0,0,0,10,3.3,0,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,2.3,0,0,0,0,0,0,0,0,0,0,0,0,0,4.5,2.3,4.5,0,43.2,11.4,29.5,2.3,0
Russian (North),0,0,0.3,0,0.8,1.3,12.9,1.8,0,0,7.1,35.5,0,0.8,5.3,34.2,47.9,1.4,0,1.4,0,0,0,2.8,4.9,0,0.7,0.7,10.4,0,9.7,5.6,0,6.9,0,0,0,0,0,1.4,0,0,0,0.7,0,0,0,5.6,0,0,0,0
Russian (South),0,0.2,1.9,0,0.4,1,20.9,3.5,1.2,0,0.4,9.5,0,0.4,5.2,55.4,41.7,3.5,4.5,0,1,1.5,1,3.5,10.6,0,0.5,0,3,0,8,11.1,2.5,0.5,0,0,0,0,0,2,0,0,3.5,0,0,0,0.5,0.5,0.5,0,0,0
Saami (Kola),0,0,8.7,0,0,0,17.4,4.3,0,0,0,39.1,0,0,8.7,21.7,12.8,0,19.8,0,0,0,0,0,64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.2,0,0,2.3,0,0,0,0
Saami (Sweden),0,0,0,0,2.7,0,32.9,0,0,0,0,41.1,0,0,5.5,17.8,3.1,0,68.4,0,0,0,0,0,26.5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0
Selkup,0,3.05,0,0,0,0,3.5,0,0,0,6.95,1.15,0,62.25,16.55,6.55,28.3,0,0,0,0,5.8,0,25,0.8,0,0,0,0,0,0,1.7,0,0,0,0,0,0,0,0,0,0,0,7.5,0,4.2,0,3.3,0.8,22.5,0,0
Shor,0,1.4,0,0,4.1,0,0,0,0,0,14.9,18.9,0,1.4,14.9,44.6,11.0,0,0,0,0,0,0,2.4,0,0,0,0,1.2,0,6.1,0,2.4,0,0,0,0,0,0,0,0,0,0,1.2,4.9,41.5,3.7,12.2,0,12.2,1.2,0
Swedish,0.2,0.1,4.1,0,3.5,1.7,44.3,0.9,0.8,0,0,6.8,0.6,2.9,20.9,13.4,45.6,2.2,1.3,0.2,0,0.4,1.3,4.2,12.7,0,0.2,2.5,7.1,0,8,9.1,2.4,0,0.2,0,0,0,0,1.5,0,0,0.5,0,0,0,0,0,0,0.2,0.5,0
Tajik,0,0,0,0,0,8.3,0,20.8,8.3,16.7,0,0,0,16.7,4.2,25,0.0,15,0,0,5,0,0,0,5,0,5,0,5,0,5,5,0,0,0,0,0,0,0,10,0,5,0,15,5,0,0,15,5,0,0,0
Tatar,0,4.8,3.9,0,0.5,2.9,7.2,9.2,0,1,1.4,22.2,1,3.9,10.1,31.9,28.6,3.1,4.1,1.5,1.5,0.5,3.1,6.6,10.2,0,0,0,6.6,0,7.7,10.2,1.5,0.5,0,2.6,0,0,0,2,0,0,0,2,0.5,0,2,2.6,1.5,1,0,0
Turkish,0.4,1.3,11.3,0,0.6,10.9,5.4,33.5,2.5,4.2,2.9,1,0.2,2.9,16.3,6.9,25.5,6.1,0.4,0.6,4.2,1.3,4.8,1,5.4,0,1.7,0.8,6.1,0.2,10.5,11.9,2.5,1.3,1,0.8,0.2,0,0.2,3.3,0.2,0.6,4.4,0.8,0,0.2,3.3,0.2,0.2,0,0,0
Turkmen,0,0,0,0,13.3,0,0,16.7,13.3,0,0,0,0,13.3,36.7,6.7,16.9,15.6,0,0,0,1.3,0,1.3,1.3,0,0,0,0,1.3,6.5,6.5,0,1.3,2.6,0,0,0,0,0,0,0,1.3,1.3,3.9,2.6,2.6,20.8,3.9,9.1,0,0
Udmurt,0,0,1.1,0,1.1,0,2.2,0,1.1,0,16.8,53.3,0,0.5,4.9,19,22.5,0,0.5,0.5,0,10.4,0,3.3,9.3,0,0,0,0,0,1.6,16.5,0,4.4,0.5,0,0,0,0,0,0,0,0,8.8,0,0,0,11,0.5,3.8,6,0
Uyghur,0,10.1,0,2.8,4.6,2.8,0.9,13.8,0,3.7,3.7,0.9,14.7,9.2,11,22,15.1,3.6,0.6,0,0,3,0.6,2.4,3.6,0,2.4,0,1.8,0,3.6,3,1.8,0.6,0,0,0,0,0,2.4,0.6,2.4,0.6,2.4,4.2,3.6,12.7,12,8.4,7.2,1.2,0
Uzbek,0,15.4,3.8,0,6.4,2.6,1.3,14.1,3.8,2.6,2.6,3.8,6.4,10.3,9,17.9,14.7,4.6,0,0,0.4,1.5,0.8,5.4,3.5,0,2.7,0.4,1.9,0.4,2.3,4.6,0,3.5,0.4,1.9,0,0,0,1.9,1.9,0.8,1.2,4.6,3.9,5,5.8,13.5,3.1,8.9,0.4,0
Vepsian,0,0,0,0,0,0,5.1,0,0,0,17.9,38.5,0,0,2.6,35.9,57.6,1.6,2.4,0,0,0.8,0,0.8,16.8,0,0,0.8,0,0,4.8,2.4,4,4,0,0,0,0,0,2.4,0,0,0,0,0,0,0,1.6,0,0,0,0
Yakut,0,4.9,0,0,0.3,0,0.5,0,0,0,3.8,86.2,0,0,0.8,3.5,3.2,1.1,0,0,0,0,0,0.5,0.4,0,0,0,0,0,2,0.9,0,0.4,0,0,0,0,0,1.1,0,0.9,0,2,1.8,4.1,3.7,29.4,4.4,43.6,0.7,0
Yukaghir,0,54.5,0,0,0,0,0,0,0,0,0,27.3,0,18.2,0,0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,12,67,2,0

If you use the table to make Vahaduo models, Finns come out looking strongly East Eurasian. The Yakut ancestry is explained by bottlenecks that caused both Finns and Yakuts to have high N-M178. The Saami ancestry is explained by high U5 in Finns.

Target: Finnish
Distance: 2917.9693% / 29.17969266
32.0 Lithuanian
21.2 Swedish
19.0 Saami(Kola)
17.8 Yakut
10.0 Mari
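
For readers unfamiliar with this kind of model, here is a rough sketch of the idea: find non-negative source weights that sum to 1 and minimize the Euclidean distance between the target row and the weighted average of the source rows. This is only an illustration. `fit_mixture` is a hypothetical helper that uses a simple greedy search over 1% steps, which is not the algorithm used by Vahaduo or nMonte, and the toy frequencies are made up.

```r
# Hypothetical helper for illustration; not the algorithm used by Vahaduo or nMonte.
# sources: numeric matrix with one row per source population
# target: numeric vector with the same columns as sources
fit_mixture=function(target,sources,slots=100){
  # Euclidean distance between the target and a mixture given as slot counts
  mixdist=function(counts)sqrt(sum((drop(crossprod(sources,counts/slots))-target)^2))
  # start with all slots assigned to the single closest source
  counts=numeric(nrow(sources))
  counts[which.min(sqrt(rowSums(sweep(sources,2,target)^2)))]=slots
  best=mixdist(counts)
  repeat{
    moved=FALSE
    # try moving one slot (1/slots of the weight) from source a to source b
    for(a in seq_along(counts))for(b in seq_along(counts)){
      if(a==b||counts[a]==0)next
      trial=counts
      trial[a]=trial[a]-1
      trial[b]=trial[b]+1
      d=mixdist(trial)
      if(d<best-1e-9){counts=trial;best=d;moved=TRUE}
    }
    if(!moved)break # stop when no single-slot move improves the fit
  }
  list(weights=setNames(counts/slots,rownames(sources)),distance=best)
}

# toy data (made-up frequencies): the target is an exact 70/30 mix of A and B
src=rbind(A=c(60,30,10),B=c(10,20,70),C=c(30,40,30))
colnames(src)=c("hg1","hg2","hg3")
tgt=0.7*src["A",]+0.3*src["B",]
fit=fit_mixture(tgt,src)
print(fit)
```

Because haplogroup frequencies in small samples are shaped heavily by drift, a low distance in such a model only means that the frequency profiles are similar, not that the sources are real ancestral populations.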

In the table I used to make the models, South Russians have the highest R1a (55%) and Shors also have 45% R1a. Therefore Shors get South Russian ancestry:

Target: Shor
Distance: 2968.5906% / 29.68590599
82.6 Khakas
14.4 Russian(South)
1.8 Kyrgyz
0.8 Altaian
0.4 Turkmen

In Tambets et al. 2018, mtDNA groups were characterized as either eastern or western, and eastern groups were listed at the end of table S5. Maris had high U4 (10%) which is today the most common in Siberia, but U4 was not considered to be eastern, so Maris only had a total of 4% of the eastern mtDNA groups.

U4 is an interesting haplogroup because it is mostly found in the domain of Uralic influence. In table S5, U4 is most common among Kets (28%), Selkups (25%), Nganasans (17%), Mansi (17%), and Khanty (16%). However, Komis, Chuvashes, Bashkirs, Maris, and Latvians also have relatively high U4. Wikipedia says that U4 is "found in Europe with highest concentrations in Scandinavia and the Baltic states", but maybe it's actually more common in the VUR or among Komis.
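
The ranking is easy to reproduce. The sketch below uses U4 percentages copied from the `mt:U4` column of the combined table above for a subset of populations; the commented-out last line assumes the full table has been read into a data frame called `t` with its original column names.

```r
# U4 percentages copied from the mt:U4 column of the combined table (subset of populations)
u4=c(Ket=27.9,Selkup=25,Nganasan=16.8,Mansi=16.6,Khanty=16.5,Komi=13.6,Bashkir=12.1,
  Chuvash=11.8,Mari=10.3,Latvian=10.2,Estonian=5.4,Swedish=4.2,Lithuanian=2.5,Finnish=1)
ranked=sort(u4,decreasing=TRUE)
print(ranked)

# with the full table in a data frame t (assumed to be read with check.names=F):
# t[order(-t[["mt:U4"]]),c("Population","mt:U4")]
```

In this subset, Komis, Bashkirs, Chuvashes, and Maris all have more U4 than Swedes, Estonians, Lithuanians, or Finns, and Latvians are the only Baltic population above 10%.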

U4 seems to peak in populations with high ANE, and the amount of U4 in Nganasans also nicely corresponds to the amount of ANE ancestry in Nganasans. In table S5, almost all of the western haplogroups of Nganasans are U4.

Maybe high U4 in Maris is part of the reason why Maris get Khanty and Ket ancestry. Another reason why Maris get Khanty ancestry is that Maris and Khanty had more N-M178 than other N, but Samoyeds and Mansi had more other N than N-M178.

Target: Mari
Distance: 1241.8026% / 12.41802615
47.0 Lithuanian
19.4 Khanty
16.0 Vepsian
7.0 Saami(Sweden)
4.6 Finnish
4.0 Ket
1.4 Chuvash
0.6 Yakut

Hungarians, Mordvins, Chuvashes, and Tatars have fairly high J, which explains why Mordvins get high Caucasian ancestry. When I have made models of Uralic people using Vahaduo, qpAdm, and ADMIXTURE, Mordvins have also received more CHG or Caucasian-like ancestry than Baltic Finnic populations.

Target: Mordovian
Distance: 977.8064% / 9.77806357
28.4 Vepsian
23.2 Swedish
17.2 Kumyk
14.0 Hungarian
9.0 Russian(South)
4.8 Estonian
1.6 Saami(Sweden)
1.0 Lithuanian
0.8 Finnish

Nganasankhan
05-06-2021, 12:08 PM
I used a shell wrapper for a simplified version of nMonte3.R (https://anthrogenica.com/showthread.php?23441-Shell-commands-and-R-scripts-for-working-with-G25&p=761927&viewfull=1#post761927) to model all populations in the CSV file in my previous post. In Vahaduo when you use the same populations as sources and targets, it's annoying that you have to manually remove target populations from the source tab, but it's easy to do with my shell wrapper for nMonte.

I used the `seriation` package to arrange related cells next to each other: https://jokergoo.github.io/ComplexHeatmap-reference/book/a-single-heatmap.html#heatmap-seriation.

For example, Nganasans were modeled as 85% Mansi and 15% Nenets, but the distance of the model was bad (50%). The model of Finns is very different from the Vahaduo model in my previous post, because now Finns have only 5% Yakut ancestry but 31% Mari ancestry. Finns have much more Saami ancestry than Karelians, Vepsians, or Estonians, which is probably partially because Finns have high U5.

https://i.ibb.co/C7fMGpm/nmonte.png

In the image above, Kazakhs are modeled as 79% Mongol and Mongols are modeled as 68% Kazakh. In the Y-DNA heatmap, Mongols and Kazakhs formed one of the branches with the lowest height. Out of all pairs of populations in the Y-DNA table, Mongols and Kazakhs actually have the fourth lowest distance:


> t=read.table("https://pastebin.com/raw/N30HaLsz",sep="\t",check.names=F,header=T,comment.char="")
> d=as.data.frame(as.matrix(dist(t[,-c(1:3)])))
> cbind(t[,1],setNames(d,t[,1]))%>%pivot_longer(cols=-1)%>%arrange(value)%>%filter(value>0&.[[1]]<.[[2]])%>%head(16)%>%as.data.frame
t[, 1] name value
1 Latvian Lithuanian 8.584870
2 Komi Vepsian 9.719053
3 Estonian Russian (North) 10.240605
4 Kazakh Mongolian 10.756858
5 Altai-Kizhi Altaian 10.961296
6 Karelian Latvian 11.175867
7 Lithuanian Russian (North) 11.241441
8 Kumyk Nogay (Kuban) 11.522587
9 Karelian Lithuanian 11.974974
10 Karelian Russian (North) 12.179080
11 Russian (Central) Russian (South) 12.392336
12 Estonian Karelian 12.526771
13 Dolgan Evenk 12.618241
14 Latvian Russian (North) 12.886039
15 Uygur Uzbek 13.041472
16 Mari Udmurt 14.188376

Nganasankhan
05-06-2021, 03:02 PM
Here are biplots based on a combined table of both Y-DNA and mtDNA. The clusters are based on hierarchical clustering using the complete linkage method. Each population is additionally connected with a line to its two nearest neighbors.
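
The nearest-neighbor lines come straight from the distance matrix: for each row, sort the distances and skip the first entry, which is the population itself at distance 0. A small sketch with toy coordinates (the points `A` to `D` are hypothetical, not from the table):

```r
# toy coordinates (hypothetical values, not from the haplogroup table)
m=rbind(A=c(0,0),B=c(1,0),C=c(0,3),D=c(5,5))
d=as.matrix(dist(m))

# for each row, the 2nd and 3rd smallest distances are the two nearest neighbors
# (the smallest distance is always the point itself)
nn=t(apply(d,1,function(x)names(sort(x))[2:3]))
colnames(nn)=c("nearest","second")
print(nn)
```

Note that the relation is not symmetric: an outlier like `D` is connected to `C` and `B`, but none of the other points has `D` among its two nearest neighbors.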

Yakuts are a lone outlier that forms its own cluster because they have very high N-M178 (86%), even though their closest neighbors are Udmurts and Khanty. Nivkhs are also an outlier because they had 66% mtDNA Y, whereas the second highest Y was 17% in Hezhen. Both Kola Saami and Swedish Saami are also in their own cluster with no other population.

PC1 separates the most common eastern mtDNA haplogroups (C and D) from the most common western haplogroup (H). Selkups are far from other Samoyeds on PC2, because PC2 has the highest loading for Y-DNA N haplogroups, and Selkups had only 8% N.
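
Statements like these can be checked from the `prcomp` object: the loadings are in `$rotation`, and the share of variance of each PC comes from the squared `$sdev`. Here is a minimal sketch with made-up frequencies (the rows `pop1` to `pop4` and their values are hypothetical, not from the table):

```r
# made-up haplogroup percentages for four hypothetical populations
x=data.frame(C=c(5,10,40,50),H=c(45,40,20,10),U5=c(10,12,9,11),
  row.names=c("pop1","pop2","pop3","pop4"))
p=prcomp(x)

# percentage of variance explained by each PC (variance is sdev squared)
var_pct=p$sdev^2/sum(p$sdev^2)*100
print(round(var_pct,1))

# haplogroup with the highest absolute loading on each PC
top=apply(abs(p$rotation),2,function(v)names(which.max(v)))
print(top)
```

In the toy data, C varies more than H across populations, so C gets the largest absolute loading on PC1 even though the two are almost perfectly anticorrelated.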

On PC5 and PC6, there appears one cline for Saami with a high loading for mtDNA U5 and V. There is another cline for Altaians, Shors, Khakasses, and Kyrgyzes, all of whom have around 50% R1a.

On PC7, the loading of mtDNA Y has the highest magnitude, which causes Nivkhs to plot very far from other populations.

https://i.ibb.co/sHkR74V/biplot-mtdna-ydna-combined.jpg


library(tidyverse)
library(ggforce)
library(ggrepel)

t=read.csv("https://pastebin.com/raw/kyzkgH3V",row.names=1,header=T,check.names=F)

p=prcomp(t)
pct=paste0(colnames(p$x)," (",sprintf("%.1f",p$sdev^2/sum(p$sdev^2)*100),"%)") # percentage of variance explained (variance is sdev squared)
p2=as.data.frame(p$x)
p2$k=as.factor(cutree(hclust(dist(t)),k=12))
load=p$rotation

for(xpc in c(1,3,5,7)){
ypc=xpc+1

xsym=sym(paste0("PC",xpc))
ysym=sym(paste0("PC",ypc))

dist=as.data.frame(as.matrix(dist(t)))
seg0=lapply(1:3,function(i)apply(dist,1,function(x)unlist(p2[names(sort(x)[i]),c(xpc,ypc)],use.names=F))%>%t%>%cbind(p2[,c(xpc,ypc)]))
seg=do.call(rbind,seg0)%>%setNames(paste0("V",1:4))

# spantree=cbind(2:nrow(t2),vegan::spantree(dist)$kid)
# seg=cbind(p2[spantree[,1],c(xpc,ypc)],p2[spantree[,2],c(xpc,ypc)])%>%setNames(paste0("V",1:4))

mult=max(max(p2[,xpc])/max(load[,xpc]),max(p2[,ypc])/max(load[,ypc]))

ggplot(p2,aes(!!xsym,!!ysym))+
geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="black",size=.3)+
ggforce::geom_mark_hull(aes(color=k,fill=k),concavity=100,radius=unit(.15,"cm"),expand=unit(.15,"cm"),alpha=.15,size=.1)+
# geom_polygon(data=p2%>%group_by(k)%>%slice(chull(!!xsym,!!ysym)),aes(color=k,fill=k),alpha=.2,size=.2)+
geom_segment(data=as.data.frame(load),aes(x=0,y=0,xend=mult*!!xsym,yend=mult*!!ysym),arrow=arrow(length=unit(.3,"lines")),color="gray85",size=.4)+
annotate("text",x=(mult*load[,xpc]),y=(mult*load[,ypc]),label=rownames(load),size=2.3,color="gray85",vjust=ifelse(load[,ypc]>0,-.5,1.4))+
geom_point(aes(color=k),size=.6)+
# geom_text(aes(label=rownames(t),color=k),size=2.5, vjust=-.6)+
ggrepel::geom_text_repel(aes(label=rownames(t),color=k),max.overlaps=Inf,force=5,size=2.3,box.padding=0,point.padding=1,min.segment.length=.2,segment.size=.2)+
labs(x=pct[xpc],y=pct[ypc])+
scale_x_continuous(breaks=seq(-200,200,20),expand=expansion(mult=.1))+
scale_y_continuous(breaks=seq(-200,200,20),expand=expansion(mult=.04))+
scale_color_manual(values=hcl(head(seq(15,375,length=length(unique(p2$k))+1),-1),100,80))+
theme(axis.text=element_text(color="black",size=6),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.ticks=element_line(size=.25,color="black"),
axis.title=element_text(color="black",size=8),
legend.position="none",
panel.background=element_rect(fill="gray40"),
panel.border=element_rect(color="black",fill=NA,size=.5),
plot.background=element_rect(fill="gray40",color=NA),
panel.grid=element_blank())

ggsave(paste0(xpc,".png"),width=6,height=6)
}