High-depth African genomes inform human migration and health (Choudhury et al. 2020)



Angoliga
10-28-2020, 06:42 PM
Abstract (paper (https://www.nature.com/articles/s41586-020-2859-7))

The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed[1]. Here we performed whole-genome sequencing analyses of 426 individuals—comprising 50 ethnolinguistic groups, including previously unsampled populations—to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putative-damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon—but in other genes, variants denoted as ‘likely pathogenic’ in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.




Fig. 1) H3Africa WGS data
https://i.imgur.com/USDjNcr.png
a) Geographical regions and populations of origin for H3Africa WGS data. The size of the circles indicates the relative number of sequenced samples from each population group (before quality control; Supplementary Methods Table 1). Samples with WGS data from the 1000 Genomes Project and the African Genome Variation Project are included for comparison (grey circles). CAM includes 25 individuals who are homozygous for the sickle mutation (HbSS); MAL includes unaffected individuals with a family history of neurological disease; BOT comprises children who are HIV-positive; BRN included only female participants. 1000G, 1000 Genomes Project; AGVP, African Genome Variation Project. Maps were created using R[43]. b) Principal component analysis of African WGS data showing the first two principal components. New populations used in this study are indicated by crosses. Population abbreviations are as described in the 1000 Genomes and H3Africa Projects, as provided in Supplementary Methods Table 1 and Supplementary Table 22. Shaded background ellipses correspond to the geographical regions shown in a.
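Fig. 1b is a PCA of genotype data. For anyone wondering what that step involves, here's a minimal sketch under my own assumptions (the paper's exact pipeline and normalisation aren't given in this thread): genotypes coded as 0/1/2 alternate-allele counts, each SNP centred and scaled by its expected standard deviation under Hardy-Weinberg equilibrium, and sample scores taken from an SVD. The function name, toy populations and allele frequencies are all made up.

```python
import numpy as np

def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return (n_samples, n_components) PC scores for a 0/1/2 genotype matrix.

    genotypes: (n_samples, n_snps) alternate-allele counts.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                  # alt-allele frequency per SNP
    keep = (p > 0) & (p < 1)                  # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    # Centre each SNP and scale by its expected SD under Hardy-Weinberg, sqrt(2p(1-p)).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # SVD of the standardised matrix: left singular vectors times singular values
    # are the sample coordinates on each principal component.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy example: two hypothetical populations with very different allele frequencies.
rng = np.random.default_rng(0)
pop_a = rng.binomial(2, 0.1, size=(20, 200))
pop_b = rng.binomial(2, 0.9, size=(20, 200))
scores = genotype_pca(np.vstack([pop_a, pop_b]))
print(scores.shape)  # (40, 2)
print(scores[:3, 0], scores[-3:, 0])  # PC1 separates the groups (sign is arbitrary)
```

With real data you'd also prune SNPs in linkage disequilibrium and deal with missing calls first; the toy run just shows two differently drifted groups landing on opposite ends of PC1.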



Fig. 2) Population admixture and genetic ancestry among African populations
https://i.imgur.com/NKYrYyY.png
a) Admixture plot showing selected African populations based on WGS and array data for K = 10. b) Proposed movement during the Bantu migration, showing the populations that were used for inference. The blue line shows the migration patterns inferred from genetic distance estimates, with Zambia (BSZ) as an intermediate staging ground for Bantu migrations further east (red–teal arrow) and south (red–yellow arrow). The dotted black line shows the previously proposed late-split route[9]; the dotted blue–green line through the DRC indicates an alternative model of migration. GGK, Gǀwi, Gǁana and baKgalagadi. c) Key admixture dates (in generations) in populations of interest based on MALDER results. The colour of each circle represents the admixture date for NC components in each population group (KS, AA, RFF and NS). Dates are shown in terms of number of generations (1 generation = 29 years). Maps were created using R[43].
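The Fig. 2c dates are in generations, so turning them into years is just multiplication by the 29-year generation time stated in the caption. A trivial sketch (the function name is my own), using the 52-generation UBS estimate mentioned later in the thread as the example:

```python
GENERATION_TIME_YEARS = 29.0  # generation time stated in the Fig. 2c caption

def generations_to_years(generations: float,
                         generation_time: float = GENERATION_TIME_YEARS) -> float:
    """Convert an admixture date in generations into an approximate age in years."""
    return generations * generation_time

print(generations_to_years(52))        # 1508.0
print(generations_to_years(52, 25.0))  # 1300.0
```

Any standard error on a MALDER date scales by the same factor, and the result shifts noticeably with the assumed generation time (25 years gives 1,300 years for the same 52 generations).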



Extended Data Fig. 1) ADMIXTURE analysis, K = 2 to 10
https://i.imgur.com/bjvCRKL.jpg



Existing African datasets from AGVP[4], the 1000 Genomes Project[2], SAHGP[17] and previously published studies[9,14], and a representative European population (CEU) from the 1000 Genomes Project, are included as reference panels. K values from 2 to 10 are shown. See Supplementary Table 22 for definitions of abbreviations.
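For anyone curious what the K plots above are actually fitting: ADMIXTURE treats each genotype as two binomial draws whose allele frequency is the individual's ancestry proportions (Q) weighted over the K clusters' allele frequencies (F), and searches for the Q and F that maximise that likelihood. The sketch below only evaluates the log-likelihood (dropping the constant binomial coefficient); it is not the program's optimiser, and the function name and toy Q, F and genotype values are made up.

```python
import numpy as np

def admixture_log_likelihood(G: np.ndarray, Q: np.ndarray, F: np.ndarray) -> float:
    """Log-likelihood of genotypes under the ADMIXTURE model, up to a constant.

    G: (n_individuals, n_snps) alternate-allele counts in {0, 1, 2}
    Q: (n_individuals, K) ancestry proportions; each row sums to 1
    F: (K, n_snps) alternate-allele frequency in each of the K clusters
    """
    # Individual-specific allele frequency: p_ij = sum_k q_ik * f_kj
    P = np.clip(Q @ F, 1e-12, 1.0 - 1e-12)   # keep log() finite at 0 or 1
    # Binomial log-likelihood, omitting the constant log C(2, g_ij)
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1.0 - P)))

# Toy data (hypothetical): two clusters, two individuals, three SNPs.
F = np.array([[0.9, 0.1, 0.8],
              [0.1, 0.9, 0.2]])
G = np.array([[2, 0, 2],
              [0, 2, 0]])
unadmixed = np.array([[1.0, 0.0], [0.0, 1.0]])
half_half = np.array([[0.5, 0.5], [0.5, 0.5]])
print(admixture_log_likelihood(G, unadmixed, F) > admixture_log_likelihood(G, half_half, F))  # True
```

Nothing in this objective requires a cluster to correspond to a real ancestral population; a cluster can just as well capture drift specific to one group, which is part of the complaint further down the thread.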

pmokeefe
10-28-2020, 07:57 PM
Supplementary Figure 12 - Distribution of mitochondrial and Y-chromosome haplogroups in H3A high-coverage WGS samples.
Pie charts show the relative frequencies of (A) mitochondrial haplogroups and (B) Y-chromosome haplogroups in the populations surveyed. All samples from BRN were female. Maps were created using R[40]. Country border data was obtained from: http://thematicmapping.org/downloads/world_borders.php

Supplementary information (https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-020-2859-7/MediaObjects/41586_2020_2859_MOESM1_ESM.pdf)

I didn't see a more detailed breakdown of haplogroups in the paper and supplements, did I miss them?

beyoku
10-28-2020, 08:17 PM
Lots of interesting stuff in the Supp.

Angoliga
10-28-2020, 09:26 PM
These PCAs and ADMIXTURE results stood out the most for me:

*Especially Supp. Fig. 5; my paternal side (Aringa) are basically a sub-tribe of the Lugbara




Supplementary Figure 1 – Extended principal components (PCs) of H3Africa populations from WGS data.
https://i.imgur.com/2kWupkT.png

Supplementary Figure 5 - PC and ADMIXTURE clustering analysis of Ugandan Nilo-Saharan (UNS).
https://i.imgur.com/fytK4a1.png

Supplementary Figure 6 - Genetic affinities and ancestral composition of Berom (BRN).
https://i.imgur.com/sXUOCTO.png

Supplementary Figure 9 - Principal component and ADMIXTURE clustering analysis of ethnolinguistic groups
https://i.imgur.com/7mbPE7M.png



I contacted the H3 team back in mid-August about some of these samples; many appear to be from the Mulindwa et al. 2020 paper (https://www.cell.com/ajhg/pdf/S0002-9297(20)30237-8.pdf).

There was some correspondence after I submitted request forms for dataset access (EGAS00001002602)... I was told that it takes some time to be granted access, and that they've been partially delayed due to the pandemic :/

-- hoping someone can eventually gain access for G25 conversion

ThaYamamoto
10-28-2020, 11:21 PM
Always stoked to have a new paper on African demographics. But... they don't excite me much anymore. The current methods are way too unoptimized for me to glean anything useful. For example, the Admix runs show founder effects and symptoms of population bottlenecks but don't offer much insight into them; I mean, both the South African Bantu and the EA Bantu components being present in Cameroonians and Congolese [obviously] doesn't demonstrate admixture between these populations at all, and it's extremely unlikely that backflow has occurred. Allowing artificial clusters based on drift for Africans doesn't convey the full picture of the African genomic landscape at all. You can just as easily model South African Bantus as an entirely unique cluster while having the Baganda appear 80% like the (non-Bantu) Yoruba, just as a paper earlier this year did.

Maybe if most papers used the package tools from A different view on fine-scale population structure in Western African populations (https://link.springer.com/article/10.1007/s00439-019-02069-7), the results would be much more useful for those trying to learn more about fine-scale demographic history. Since Tishkoff it's been the same methods and the same results over and over again, barring a handful of studies, and that's still the benchmark a decade later. Even having these samples on G25 won't help much, since G25, and nMonte even more so, are heavily constrained for modern-day African groups.
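For context on the G25/nMonte point: that style of modelling takes a target's coordinates and looks for non-negative source weights summing to 1 whose weighted average sits closest to the target. A minimal sketch of the idea, assuming plain Euclidean distance and only two sources; it swaps nMonte's own search for a brute-force grid over the weight, and the function name and coordinates are hypothetical, not real G25 values.

```python
import math
from typing import List, Tuple

def fit_two_way(target: List[float], source_a: List[float], source_b: List[float],
                steps: int = 100) -> Tuple[float, float]:
    """Grid-search the weight w in [0, 1] minimising the Euclidean distance
    between target and the mixture w*source_a + (1 - w)*source_b.
    Returns (w, distance)."""
    best_w, best_d = 0.0, math.inf
    for i in range(steps + 1):
        w = i / steps
        mix = [w * a + (1 - w) * b for a, b in zip(source_a, source_b)]
        d = math.dist(target, mix)
        if d < best_d:
            best_w, best_d = w, d
    return best_w, best_d

# Hypothetical 3-D coordinates (real G25 coordinates have 25 dimensions).
src_a = [0.0, 0.0, 0.0]
src_b = [1.0, 1.0, 1.0]
print(fit_two_way([0.3, 0.3, 0.3], src_a, src_b))  # on the a-b line: distance ~0
print(fit_two_way([0.3, 0.3, 0.9], src_a, src_b))  # off the line: a residual remains
```

The second target can't be reached by any mix of the two sources, so the fit just returns the least-bad weight plus a residual distance. That's the constraint in a nutshell: the model can only describe a population in terms of the sources you give it.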

Of course I'm still gonna pore over this paper and its supplements, and it's always nice to have Basoga in these studies. I wish Africa were of paramount importance to the ancient DNA labs.

edit: It's cool how you can see that one Somali- and possibly Asian-admixed UBS individual plotting far away in the supplemental PCA above.

edit2: Maybe I spoke too soon, cuz the supplements are juicy lol. I wish they'd done some GLOBETROTTER/ALDER runs like Busby et al. though.

ThaYamamoto
10-29-2020, 01:09 AM
So the Lugbara/UNS and Basoga/UBS are only sequenced to 10x coverage, which sucks and also limits the Mulindwa paper. Unfortunate.

edit: the 1000 Genomes YRI/LWK/MSL/ESN etc. are only 2-4x coverage!? Damn. My bad, I see they did run MALDER, but it's flawed; AA introgression 52 generations ago in UBS based off two admixed out
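To put rough numbers on why 2-4x versus 10x coverage matters: under a deliberately simple model that I'm assuming here (not anything from the paper), read depth at a site is Poisson with mean equal to the nominal coverage, each read carries either allele of a heterozygote with probability 1/2, and a het only counts as seen if each allele shows up in at least one read. Real genotype callers use likelihoods and population priors (plus imputation for low-coverage data), so treat this as a trend only. The function name is my own.

```python
import math

def p_both_alleles_seen(mean_depth: float) -> float:
    """Probability that a heterozygous site gets >= 1 read of each allele.

    Assumes depth ~ Poisson(mean_depth) and each read carries either allele with
    probability 1/2. By Poisson thinning, the read counts for the two alleles are
    independent Poisson(mean_depth / 2), so P(both seen) = (1 - exp(-mean_depth / 2))**2.
    """
    return (1.0 - math.exp(-mean_depth / 2.0)) ** 2

for depth in (2, 4, 10, 30):
    print(f"{depth:>2}x: {p_both_alleles_seen(depth):.3f}")
```

That works out to roughly 40% at 2x, 75% at 4x and 99% at 10x, which is why low-coverage hets get missed so often without imputation.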

edit: Angoliga, what are your thoughts on the NS admixture in UBS being Central Sudanic (Equatorial Sudanese?), such as the Lugbara, as opposed to Dinka/Nuer/Shilluk/Lwo + proto-Luo? Or possibly admixture from both?

NetNomad
10-29-2020, 09:19 AM
Can these new WGS samples be uploaded to YFull's tree? Hope so.

mpatsibihugu89
11-10-2020, 03:12 AM

Could the Eurasian-shifted UBS sample be Hima-like?

ThaYamamoto
11-10-2020, 10:46 PM
Anything's possible, but as far as I know all UBS samples were collected at a university in Jinja (Busoga land) for the Mulindwa paper, where they were originally labelled UBB [Uganda Bantu Basoga]. Jinja had many Somalis as well as Arabs/Asians back in the day and there are a lot of mixed people, e.g. Zari Hassan, who I'd bet would plot exactly like that individual in the PCA. Jinja is about as far from the Cushitic-admixed zone (e.g. the south-west) as you can get, so I really doubt this happens to be a Hima or a person from a south-western tribe who identifies as Basoga for whatever reason lol.

It seems that the further east you travel, the lower the Cushitic admixture gets, down to very low or non-existent, something I suspected well before ever seeing the UBB sample. The Baganda samples, however, often have appreciable Cushitic admixture, as they are located centrally and span a far larger area. Since only Tororo and Busia lie further east than where these samples were collected, I really don't think it's a Hima person (the individual is hitting 50% Somali). The Cushitic signature seems to drop heavily in east Nyanza but increases on either side of the region and to the south as well (central and south-western Ugandans, Rwandans, Burundians etc. to its west, heavily mixed Kenyan tribes to the east, and Cushitic-admixed folks to the south in Tanzania). I do wonder how and why this came about. I've gone off on a tangent, but yeah.