# Thread: The deep population history of northern East Asia from the Late Pleistocene to the

1. Originally Posted by Michalis Moriopoulos
David just converted these to G25 so I went ahead and put them through the wringer.
Here's an alternative PCA. It includes the 200 population averages that are the closest to the average coordinates of the new Amur River samples. Each population is connected with a line to its two closest neighbors.

Code:
```
library(tidyverse)
library(ggforce) # for geom_mark_hull

# t1 and t2 are assumed to be data frames of G25 coordinates that were loaded
# earlier, with sample or population names as row names
t=rbind(t1,t2)

amur=t1[grep("CHN_Amur_River.*BP",rownames(t1)),]
amur_mean=colMeans(amur)
# distance from each row to the mean of the Amur River samples
amur_dist=apply(t,1,function(x)sqrt(sum((x-amur_mean)^2)))
# t3 was not defined in the post; per the description it is taken here to be
# the 200 rows closest to the Amur River mean
t3=t[head(names(sort(amur_dist)),200),]
t=unique(rbind(amur,t3))

p=prcomp(t)
# share of variance per component (variance is sdev squared)
pct=paste0(colnames(p$x)," (",sprintf("%.1f",p$sdev^2/sum(p$sdev^2)*100),"%)")
p2=as.data.frame(p$x)
k=cutree(hclust(dist(t)),16)

p2$k=as.factor(k)

# color was not defined in the post; any vector of 16 colors works here
color=hcl(seq(15,375,length.out=17)[-17],100,60)

set.seed(0)

xpc=1
ypc=2
xsym=sym(paste0("PC",xpc))
ysym=sym(paste0("PC",ypc))

# pairwise distances over all coordinate columns, not only the two plotted PCs
d=as.data.frame(as.matrix(dist(t)))
# for each row, segments to its two nearest neighbors (rank 1 is the row itself)
seg=lapply(2:3,function(i)apply(d,1,function(x)unlist(p2[names(sort(x)[i]),c(xpc,ypc)],use.names=F))%>%t%>%cbind(p2[,c(xpc,ypc)]))%>%do.call(rbind,.)%>%setNames(paste0("V",1:4))

ggplot(p2,aes(x=!!xsym,y=!!ysym))+
  geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray20",size=.1)+
  geom_point(aes(color=k),size=.5)+
  geom_text(aes(color=k),label=rownames(p2),size=2,vjust=-.7)+
  labs(x=pct[xpc],y=pct[ypc])+
  coord_fixed()+
  scale_x_continuous(breaks=seq(-1,1,.05))+
  scale_y_continuous(breaks=seq(-1,1,.05))+
  scale_color_manual(values=color)+
  scale_fill_manual(values=color)+
  theme(
    # aspect.ratio=1,
    axis.text.x=element_text(margin=margin(.2,0,0,0,"cm")),
    axis.text.y=element_text(angle=90,vjust=1,hjust=.5,margin=margin(0,.2,0,0,"cm")),
    axis.text=element_text(color="black",size=6),
    axis.ticks.length=unit(-.13,"cm"),
    axis.ticks=element_line(size=.3,color="gray80"),
    axis.title=element_text(color="black",size=8),
    legend.position="none",
    panel.background=element_rect(fill="white"),
    panel.border=element_rect(color="gray80",fill=NA,size=.6),
    panel.grid=element_blank()
  )

ggsave("a.png",width=12,height=12)
# trim the margins with ImageMagick (requires mogrify on the PATH)
system("mogrify -trim -bordercolor white -border 16 a.png")
```

2. ## The Following 7 Users Say Thank You to Nganasankhan For This Useful Post:

Lenny Nero (06-11-2021),  Michalis Moriopoulos (06-10-2021),  Norfern-Ostrobothnian (06-10-2021),  Shuzam87 (06-10-2021),  Songtsen (06-11-2021),  thejkhan (06-10-2021),  Zelto (06-10-2021)

3. Originally Posted by Kristiina
Thanks Michalis for the explanation. I have a pretty big screen and I am able to identify only Tianyuan. Would you still like to explain what kind of populations/samples are in the top-left corner, the bottom centre and in the top-right corner?
Sure. On the PCA, the top left corner holds the groups with the strongest Southern East Asian ancestry (Taiwan, Indochina, Malay archipelago, Philippines, some Polynesians). At the bottom center are people from the Tibetan Plateau/Himalayas, Northern China, Japan, Korea, etc. In the top right corner are the Siberians (with 75% or more East Asian ancestry); the Nganasan are actually in this PCA too, but are out of frame. Of course, most Northern East Asian groups (including the Amur River people) are in between the Tibet and Siberia poles.

4. ## The Following 5 Users Say Thank You to Michalis Moriopoulos For This Useful Post:

Kristiina (06-11-2021),  Lenny Nero (06-11-2021),  okarinaofsteiner (06-15-2021),  Radboud (06-11-2021),  Shuzam87 (06-13-2021)

5. Originally Posted by Ebizur
Perhaps because of hypotheses linking subclades of haplogroup O-M175 with certain Neolithic population expansions in eastern Asia, the age and internal diversity of this haplogroup often seem to be underestimated.

If you assume that haplogroup O1-F265 originated south of the Yangtze River in the Paleolithic (O-F265 formed 30,500 [95% CI 28,700 <-> 32,400] ybp, TMRCA 29,900 [95% CI 27,800 <-> 32,100] ybp according to the current version of the YFull tree), how do you suppose that haplogroup O1b2-P49 might have reached Korea and Japan?

Several recent papers have discussed the concept of a coastal distribution of a population, so O1 is best suited to this role.

Several recent papers have discussed the concept of a coastal distribution of a population, so O1 is best suited to this role.
Actual specimens from coastal North China (Shandong) dated to the era of the First Agricultural Revolution in East Asia have been found to belong to Y-DNA haplogroup N1b.

Li Hui et al. (2007) found O1a-M119 with great frequency in specimens obtained from Liangzhu Culture sites around the mouth of the Yangtze River, but I do not know whether that early aDNA finding has been corroborated by subsequent research. In any case, they were assigned to O1a, so they cannot be ancestral to O1b2. O1a-M119 (more specifically, O-CTS5726 according to an online Chinese source) also has been found in Specimen LD1 (mean age 8,240 ybp) from Liangdao, a tiny island off the coast of Fujian, strengthening the evidence for a presence of O1a-M119 along the western shore of the East China Sea in the Early Neolithic. O1b1a1a1a-F1252 (basal to O-M111/M88, a clade that has been found predominantly among present-day Tai ethnic groups and the Vietnamese) has been found in Specimen L5696 from Tanshishan on the Fujian mainland, and O2a1-L465 (basal to O-JST002611, a clade that is spread widely among present-day East Asians) has been found in Specimen L5701 from nearby Xitoucun, but those specimens have been dated to the middle of the third millennium BCE, so they are Late Neolithic rather than Early Neolithic. Again, none of these can be ancestral to O1b2.

As far as I know, the earliest evidence of Y-DNA haplogroup O1b available at present is the Y-DNA obtained from Specimen La898 (mean age 6,000 ybp) from the Tam Hang rockshelter in northern Laos. That specimen belonged to mtDNA haplogroup N9a6. However, La898 appears to represent a recent intrusion into Hoabinhian territory, so he or his recent ancestors may have migrated to Tam Hang from some place further north. Furthermore, his Y-DNA has been assigned to O1b1a1a1b-F2028, which is the subclade of O-M95 that predominates among Austroasiatic-speaking peoples (except the Vietnamese) and western Malayo-Polynesians (e.g. Chamic peoples) in western Indonesia, Malaysia, Singapore, southern Vietnam, and eastern Cambodia and that also has been found in India and Bangladesh, where it might be related to the migrations of Munda peoples. La898 is most likely a (pre-)Proto-Austroasiatic migrant from whichever part of East Asia the Proto-Austroasiatic-speaking people had inhabited prior to their southward migration. Again, this specimen is not ancestral to haplogroup O1b2.

Relatively ancient specimens belonging to haplogroup O1b also have been found in WGM94 (mean age 5,300 ybp) from the Wanggou site of the Yangshao culture in Zhengzhou, Henan and in HJTM109 (mean age 3,958 ybp) from the Haojiatai site of the Longshan culture in Luohe, Henan. The remains of silk fabrics produced with already well-developed techniques also have been recovered from the Wanggou site. In any case, both WGM94 and HJTM109 are from central Henan and belong to O1b1a2-F3016, a branch of O1b that is now most commonly found among northern Han Chinese (approximately 5% of all present-day Northern Han belong to O1b1a2-F3016). However, O1b1a2-F3016 forms a clade with O1b1a1a1b-F2028 (and, more generally, with O1b1a1-PK4) vis-à-vis haplogroup O1b2-P49.

The location(s) of members of haplogroup O1b2a1a-K10 during the Neolithic and the location of their Paleolithic O1b2-P49 ancestor remain a mystery. Many people have inferred that they were most likely inhabiting the Korean Peninsula, which still appears to me to be the most likely hypothesis, but it cannot be said to have been substantiated at all.

7. ## The Following 2 Users Say Thank You to Ebizur For This Useful Post:

alchemist223 (06-13-2021),  Ryukendo (06-13-2021)

8. This is the picture from the Li Hui et al. paper on Chinese Neolithic Y-DNA, "Y chromosomes of prehistoric people along the Yangtze River":

It has detected O1 around Shanghai in Maqiao and Xindili:

Ancient yDNA from Yangtze River.JPG

9. Originally Posted by Kristiina
This is the picture from the Li Hui et al. paper on Chinese Neolithic Y-DNA, "Y chromosomes of prehistoric people along the Yangtze River":

It has detected O1 around Shanghai in Maqiao and Xindili:

Ancient yDNA from Yangtze River.JPG
That map is Fig. 1 from the study by Li Hui et al. (2007) to which I have referred in my previous comment in this thread. At the time, "O1" referred to haplogroup O1a-M119, "O2a" referred to haplogroup O1b1a1a-M95, "O3" referred to haplogroup O2-M122, "O3d" referred to haplogroup O2a2a1a2-M7, and "O3e" referred to haplogroup O2a2b1-M134.

10. The originating paper for this thread states: "In conclusion...populations in the Amur region...are the closest East Asian source known for Ancient Paleo-Siberians,...the closest relative of Native American populations outside of the Americas." In November 2020, the paper 《Post-last glacial maximum expansion of Y-chromosome haplogroup C2a-L1373 in northern Asia and its implications for the origin of Native Americans》 concluded the same: "Our results support...that the direct ancestor...of Native Americans is an admixture of 'Ancient Northern Siberians' and...communities from the Amur region, which appeared during the post-LGM ... ". The October 2020 paper 《The genomic formation of First American ancestors in East and Northeast Asia》 (the Houtaomuga paper), to which David Reich, Johannes Krause, Ron Pinhasi, Carles Lalueza-Fox, and others lend their prestige, uses DNA from the same Amur River, only 2,000 years more recent than AR14K, and states: "Ancient Beringians...harbor substantial admixture from a lineage that did not contribute to other Native Americans: Amur River Basin populations". That seems to me to be quite the opposite of what the first two papers say, since the Ancient Beringians became extinct in Beringia itself. I had already pointed out here what seemed to me to be a flaw in the bibliography of this Houtaomuga paper; now I point out another: what does the acronym NA mean on the map for 36-26 kya? I think it means North Asian (= Tianyuan-related), but it is not explained in the text accompanying the map. I continue to have faith in the names of David Reich and others. Can anyone help?

11. ## The Following 2 Users Say Thank You to jose luis For This Useful Post:

CopperAxe (06-15-2021),  Howard23 (06-14-2021)

12. Originally Posted by Nganasankhan
Here's an alternative PCA. It includes the 200 population averages that are the closest to the average coordinates of the new Amur River samples. Each population is connected with a line to its two closest neighbors.

Interesting how Han_Guangdong doesn’t have She as one of its neighbors, even though She is closer to Han_Guangdong than to Han_Fujian (which She is connected to in the graph) on the PCA.
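
One possible explanation, assuming the lines were drawn by the code as posted: the neighbors are picked from `dist(t)`, which uses every coordinate column, while the plot shows only PC1 and PC2. Two populations can therefore sit next to each other on the plot without being nearest neighbors in the full space. Here is a minimal sketch with made-up coordinates (`pop_A`, `pop_B`, `pop_C` and the helper `nearest` are hypothetical, not G25 data):

```r
# Hypothetical toy coordinates: the third column stands in for the dimensions
# that a two-axis plot does not show.
pts <- rbind(
  pop_A = c(0.0, 0.0, 0.0),
  pop_B = c(0.1, 0.0, 1.0),
  pop_C = c(0.5, 0.0, 0.0)
)

# Name of the nearest other row for each row of a coordinate matrix
nearest <- function(m) {
  d <- as.matrix(dist(m))
  diag(d) <- Inf                      # a row is never its own neighbor
  setNames(colnames(d)[apply(d, 1, which.min)], rownames(d))
}

full_space <- nearest(pts)            # all three columns
plot_plane <- nearest(pts[, 1:2])     # only the two plotted columns
print(rbind(full_space, plot_plane))
```

In this toy case pop_A's nearest neighbor is pop_C in the full space but pop_B in the plotted plane, which is the same kind of mismatch as Han_Guangdong and She.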

13. Originally Posted by Ebizur
cf. Hua Zhong, Hong Shi, Xue-Bin Qi, et al. (2011), "Extended Y Chromosome Investigation Suggests Postglacial Migrations of Modern Humans into East Asia via the Northern Route," Mol. Biol. Evol. 28(1):717–727. (For the most part, the same samples have been examined in this study as have been examined in the previous Zhong et al. 2010 "Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia" that you have cited.)

The mean frequency of C2-M217 among Han in the northern half of China is approximately 14.1%, whereas the mean frequency of C2-M217 among Han in the southern half of China is approximately 7.2%, so the N/S ratio is 1.955 (i.e. C2-M217 is approximately twice as common among Northern Chinese as it is among Southern Chinese).

The mean frequency of N-M231 among Han in the northern half of China is approximately 6.7%, whereas the mean frequency of N-M231 among Han in the southern half of China is approximately 6.8%, so the N/S ratio is 0.987 (i.e. N-M231 is approximately equally common among Northern Chinese as it is among Southern Chinese).

There is no clear longitudinal cline within the Han Chinese population in regard to the frequency of Y-DNA haplogroup N-M231, but, if anything, the distribution of N-M231 seems to be slightly positively correlated with longitude (i.e. the opposite of what you have claimed) at least in the southern half of the country:

Southern China
W < ------ > E
Yunnan 5.1% | Sichuan/Chongqing 5.8% | Guizhou 6.9% | (Guangxi 0% with n=27) | Hunan/Hubei 9.3% | Guangdong 8.6% | (Jiangxi 0% with n=26) | Fujian 9.0% | Zhejiang 9.4% | (Shanghai 0% with n=17) | Taiwan 6.0%

Northern China
W < ------ > E
(Xinjiang 0% with n=32) | Gansu 6.8% | Shaanxi/Shanxi 10.9% | Henan 4.5% | Anhui 7.7% | Shandong 8.7% | (Jiangsu 0% with n=39) | Liaoning 5.0% | Jilin 7.1% | Heilongjiang 4.8%
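
A rough way to check the west-to-east trend in the two rows above is a rank correlation between position in the row and frequency. This is only a sketch under two assumptions that are not in the quoted post: position in the list is used as a stand-in for longitude, and the parenthesized 0% entries from small samples are left out. The helper name `west_east_rho` is made up for illustration.

```r
# West-to-east N-M231 frequencies (%) as listed above; the zero-frequency
# entries from small samples (shown in parentheses) are left out.
south <- c(Yunnan = 5.1, "Sichuan/Chongqing" = 5.8, Guizhou = 6.9,
           "Hunan/Hubei" = 9.3, Guangdong = 8.6, Fujian = 9.0,
           Zhejiang = 9.4, Taiwan = 6.0)
north <- c(Gansu = 6.8, "Shaanxi/Shanxi" = 10.9, Henan = 4.5, Anhui = 7.7,
           Shandong = 8.7, Liaoning = 5.0, Jilin = 7.1, Heilongjiang = 4.8)

# Spearman rank correlation between west-to-east position and frequency
west_east_rho <- function(freq) {
  cor(seq_along(freq), freq, method = "spearman")
}

rho_south <- west_east_rho(south)
rho_north <- west_east_rho(north)
print(round(c(south = rho_south, north = rho_north), 2))
```

With these inputs the coefficient is positive for the southern row and weakly negative for the northern row. With only eight groups per row, neither should be read as more than a loose tendency.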
Originally Posted by Ebizur
Where have you obtained information regarding the Y-DNA haplogroup of the Bianbian specimen? Perhaps from Pribislav or another member of this forum?

Regarding the present-day distribution of C-M217, I noted ten years ago that it resembles the distribution of Q-M242 rather than that of N-M231 insofar as the frequencies of C-M217 and Q-M242 both exhibit clear latitudinal clines within China, whereas N-M231 is not significantly less common in present-day Southern China than it is in present-day Northern China.

In the case of Y-DNA testing, sample size is definitely very important, the larger the better. Some of the data you've quoted here is relatively old and the sample sizes are relatively small. Based on data collected by 23mofang (currently the largest DNA testing company in China, with tens of thousands of samples), here are the percentages and counts of N-M231 carriers by province in China, compiled at the beginning of 2021:

1) Gansu Province 8.42% (135 pax)
2) Shanxi Province 8.13% (485 pax)
3) Inner Mongolia Autonomous Region 7.98% (266 pax)
4) Shaanxi Province 7.96% (398 pax)
5) Henan Province 7.80% (796 pax)
6) Yunnan Province 7.64% (208 pax)
7) Shanghai City 7.41% (502 pax)
8) Heilongjiang Province 7.29% (292 pax)
9) Qinghai Province 7.28% (27 pax)
10) Hebei Province 7.27% (757 pax)
11) Liaoning Province 7.14% (507 pax)
12) Shandong Province 7.13% (1256 pax)
13) Jilin Province 7.03% (278 pax)
14) Jiangsu Province 7.02% (1354 pax)
15) Fujian Province 6.78% (415 pax)
16) Tianjin City 6.77% (178 pax)
17) Beijing City 6.76% (516 pax)
18) Anhui Province 6.48% (487 pax)
19) Sichuan Province 6.40% (704 pax)
20) Xinjiang Uyghur Autonomous Region 6.18% (100 pax)
21) Chongqing City 6.16% (235 pax)
22) Tibet Autonomous Region 6.04% (9 pax)
23) Zhejiang Province 5.72% (842 pax)
24) Guizhou Province 5.70% (99 pax)
25) Ningxia Hui Autonomous Region 5.33% (27 pax)
26) Hubei Province 4.92% (366 pax)
27) Hunan Province 4.67% (336 pax)
28) Hainan Province 4.61% (23 pax)
29) Taiwan 4.40% (31 pax)
30) Guangdong Province 4.29% (617 pax)
31) Jiangxi Province 3.59% (163 pax)
32) Guangxi Province 3.52% (97 pax)

There is a clear and distinctive north-to-south decline when it comes to Haplogroup N-M231 as well, the only notable exception being Yunnan province, which is known for its incredible ethnic and cultural diversity. Several Mongolian ethnic groups have settled in the province (e.g. the Daurs and the Khatso, descendants of the Mongol Empire's army), as have Muslims, alongside Tai-Kadai, Hmong-Mien, and Tibeto-Burman speakers. Note that the number of pax quoted for each province is not the sample size but the number of people who have tested N-M231+. In reality the sample size is much larger. For Jiangsu province alone, the sample size in this case is around 19,300 people.
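
The sample size behind each row can be recovered from the carrier count and the percentage. Here is a minimal sketch of that arithmetic, with an approximate binomial standard error added to show why the small-count rows are much noisier. The helper names `implied_n` and `pct_se` are made up for illustration, and the result is only as precise as the rounded percentages allow.

```r
# Sample size implied by a carrier count and a percentage
implied_n <- function(carriers, pct) carriers / (pct / 100)

# Approximate binomial standard error of the percentage, in percentage points
pct_se <- function(carriers, pct) {
  n <- implied_n(carriers, pct)
  p <- pct / 100
  100 * sqrt(p * (1 - p) / n)
}

# Three rows copied from the list above
quoted <- data.frame(
  province = c("Jiangsu", "Shandong", "Tibet"),
  carriers = c(1354, 1256, 9),
  pct      = c(7.02, 7.13, 6.04)
)
quoted$implied_n <- round(implied_n(quoted$carriers, quoted$pct))
quoted$se_pct    <- round(pct_se(quoted$carriers, quoted$pct), 2)
print(quoted)
```

For Jiangsu this gives roughly 19,288 people, in line with the figure of around 19,300, while the Tibet row rests on only about 149 people.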

Originally Posted by Kristiina
@ Ebizur

You are only quoting Han frequencies. Han Chinese are the most recent layer in China. The older distributions may be hidden rather among the different ethnic minorities.
I definitely agree with Kristiina on this one. Many Han Chinese in certain provinces and areas are recent immigrants from other parts of the country. Many Han Chinese in the three northeastern provinces (Dongbei: Heilongjiang, Jilin, Liaoning) are descended from a relatively recent wave of immigrants from Shandong province during what is known as "Chuang Guandong". The data I've quoted above includes all ethnic minorities and is self-declared by the user, so recent immigrants to a major city like Beijing or Shanghai for work would simply declare their ancestral hometown (be it another province or region). Compare this to a study that picks random people off the street and assumes they are "native" to the province.

14. ## The Following User Says Thank You to SG_Jun For This Useful Post:

Ryukendo (06-19-2021)
