
08-13-2020, 11:56 PM
Genetic-substructure and complex demographic history of South African Bantu speakers


South Eastern Bantu-speaking (SEB) groups constitute more than 80% of the population in South Africa. Despite clear linguistic and geographic diversity, the genetic differences between these groups have not been systematically investigated. Based on genome-wide data of over 5000 individuals, representing eight major SEB groups, we provide strong evidence for fine-scale population structure that broadly aligns with geographic distribution and is also congruent with linguistic phylogeny (separation of Nguni, Sotho-Tswana and Tsonga speakers). Although differential Khoe-San admixture plays a key role, the structure persists after Khoe-San ancestry-masking. The timing of admixture, levels of sex-biased gene flow and population size dynamics also highlight differences in the demographic histories of individual groups. The comparisons with five Iron Age farmer genomes further support genetic continuity over ~400 years in certain regions of the country. Simulated trait genome-wide association studies further show that the observed population structure could have major implications for biomedical genomics research in South Africa.

08-14-2020, 12:05 AM

Note 3. Sex-specific admixture patterns

Several recent studies based on surveys of mitochondrial DNA (mtDNA) and Y-chromosome (Y-chr) haplogroups in Southern African populations have demonstrated a clear sex-biased gene flow between the K-S and BS [16–18]. Among the five Y-haplogroups found to be common among the SEB of this study, three are associated with Bantu-speakers (E1b1/E-P2, E2b/E-M52, and B2a1/B-M109) and two are associated with K-S populations (B2b/B-P6 and A1b1b2a); the latter two together account for only 5.1% of the samples (Supplementary Table 4, Fig. 3a). Quality of assignment was measured using the F1 score: all assignments of E haplogroups were done with F1 score >0.89, and assignments of B haplogroups were done with F1 score >0.77. The assignment of our samples to A1b was with F1=1, though finer-scale resolution to A1b1b2a was only done with F1 in [0.60, 0.69]. The relatively few individuals with Y-haplogroups not usually associated with Africans were assigned with F1>0.9, except for about a dozen individuals classified in J2a1a with F1<0.6.
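
For anyone unfamiliar with the F1 score quoted above: it is the harmonic mean of precision and recall, so it always falls between 0 and 1. The snippet below is only a minimal Python sketch of that general definition. The counts are made up for illustration, and how the authors' haplogroup caller actually tallies matching and mismatching markers is not stated in the quoted text.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall.

    Equivalent to 2*TP / (2*TP + FP + FN); returns 0.0 when undefined.
    """
    denom = 2 * true_pos + false_pos + false_neg
    if denom == 0:
        return 0.0
    return 2 * true_pos / denom


# Hypothetical counts, NOT taken from the paper:
# 8 markers matched, 1 spurious match, 1 expected marker missed.
example = f1_score(true_pos=8, false_pos=1, false_neg=1)
print(f"F1 = {example:.2f}")
```

A perfect assignment (no false positives or false negatives) gives F1 = 1, which is how the A1b call above should be read.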

In contrast, among the mtDNA-haplogroups detected in our dataset, the proportion of the two K-S associated mtDNA-haplogroups (L0d and L0k) is about 20.5%, confirming K-S biased maternal gene flow (Supplementary Table 5, Fig. 3a). MtDNA classification was more complicated than for Y-haplogroups due to technical limitations of the H3A custom array. Nonetheless, this array allowed high resolution and accurate calling of L0 haplogroups associated with K-S ancestry/speakers (such as L0d and L0k), and could distinguish between three sub-haplogroups of L0d (L0d1, L0d2, and L0d3). However, the base of the array was from existing Illumina bead pools, which have good coverage of non-African haplogroups (viz, M and N and below) and some coverage of African haplogroups. As part of the design process, additional probes were added (Botha et al. in prep). However, with the underlying array technology, probes for SNPs that are within 100 bp of each other may interfere with each other (and more so as they get closer to each other). As the mitochondrial genome is short (just over 16 kb) and over 200 SNPs were genotyped, the array has limitations for the coverage of other African mtDNA-haplogroups. The classifications that were made had reasonable quality scores (except for L2a1 with a score of 0.63), but in some cases only at a very coarse resolution. For example, 16% of the samples were classified as L0a’b’g but could not be classified more deeply and about 5% were classified as L1’2’3’4’5’6 but could not be classified more deeply. Additional SNPs covering L0a and L0g seem to be the most pressing need, with extra coverage of L3, L4, L6 and especially L2 also desirable.
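
To put the two figures quoted above side by side (5.1% K-S associated Y-haplogroups versus about 20.5% K-S associated mtDNA-haplogroups), here is a rough Python sketch. The simple ratio is my own crude summary, not a statistic from the paper, and the function name is made up for illustration.

```python
def sex_bias_ratio(maternal_fraction: float, paternal_fraction: float) -> float:
    """Ratio of maternal-line to paternal-line lineage fractions.

    Values above 1 suggest the source population contributed more
    through women than through men. This is a crude descriptive
    summary, not the method used in the paper.
    """
    if paternal_fraction <= 0:
        raise ValueError("paternal_fraction must be positive")
    return maternal_fraction / paternal_fraction


# Fractions as quoted in the supplementary note above.
ks_mtdna = 0.205  # L0d + L0k among mtDNA-haplogroups
ks_ychr = 0.051   # B2b + A1b1b2a among Y-haplogroups

ratio = sex_bias_ratio(ks_mtdna, ks_ychr)
print(f"K-S maternal lineages are about {ratio:.1f}x as common as paternal ones")
```

Bear in mind the array limitations described above: the coarse mtDNA calls (e.g. L0a’b’g) mean the maternal figure comes with more uncertainty than this single number suggests.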

08-14-2020, 12:22 AM
The lowest cross-validation (CV) value was observed at K=5 which separates the Afro-Asiatic and the Central-West African ancestries (Fig. 2a, Supplementary Fig. 4a). Notably, the estimates show about 170 (4%) of the SEB participants harbour more than 5% Eurasian-like ancestry (Table 2).
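
As a side note on how "lowest CV value at K=5" is read: ADMIXTURE-style analyses are run for several values of K, and the K with the smallest cross-validation error is treated as the best-supported one. A minimal Python sketch follows. The error values are invented for illustration and are NOT the paper's numbers.

```python
def pick_best_k(cv_errors: dict[int, float]) -> int:
    """Return the K whose cross-validation error is smallest."""
    if not cv_errors:
        raise ValueError("cv_errors must not be empty")
    return min(cv_errors, key=cv_errors.get)


# Hypothetical CV errors per K (made up, not from the paper).
cv_errors = {2: 0.520, 3: 0.512, 4: 0.509, 5: 0.507, 6: 0.508}

best_k = pick_best_k(cv_errors)
print(f"Lowest CV error at K={best_k}")
```

The differences between neighbouring K values are often tiny, so the plots at other K (like K=3) can still be worth looking at.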

I wonder if the Eurasian ancestry is Cushitic via the Khoikhoi pastoralists or minor European/Dutch ancestry

08-14-2020, 12:34 AM
I wonder if the Eurasian ancestry is Cushitic via the Khoikhoi pastoralists or minor European/Dutch ancestry

Probably the former, given that, conversely, the African ancestry in Afrikaners has been shown to come from West African slaves and marriage with 'pure' Khoi women. It seems Bantu speakers were left largely alone in that regard.

11-08-2020, 11:18 AM
It seems that not all of them, or even most of them, have Eurasian ancestry though, if you look closely at the K=3 and K=5 graphs. Or am I reading it wrong?