Angoliga
10-28-2020, 06:42 PM
Abstract (paper (https://www.nature.com/articles/s41586-020-2859-7))
The African continent is regarded as the cradle of modern humans and African genomes contain more genetic variation than those from any other continent, yet only a fraction of the genetic diversity among African individuals has been surveyed [1]. Here we performed whole-genome sequencing analyses of 426 individuals (comprising 50 ethnolinguistic groups, including previously unsampled populations) to explore the breadth of genomic diversity across Africa. We uncovered more than 3 million previously undescribed variants, most of which were found among individuals from newly sampled ethnolinguistic groups, as well as 62 previously unreported loci that are under strong selection, which were predominantly found in genes that are involved in viral immunity, DNA repair and metabolism. We observed complex patterns of ancestral admixture and putative-damaging and novel variation, both within and between populations, alongside evidence that Zambia was a likely intermediate site along the routes of expansion of Bantu-speaking populations. Pathogenic variants in genes that are currently characterized as medically relevant were uncommon, but in other genes, variants denoted as ‘likely pathogenic’ in the ClinVar database were commonly observed. Collectively, these findings refine our current understanding of continental migration, identify gene flow and the response to human disease as strong drivers of genome-level population variation, and underscore the scientific imperative for a broader characterization of the genomic diversity of African individuals to understand human ancestry and improve health.
Fig. 1) H3Africa WGS data
https://i.imgur.com/USDjNcr.png
a) Geographical regions and populations of origin for H3Africa WGS data. The size of the circles indicates the relative number of sequenced samples from each population group (before quality control; Supplementary Methods Table 1). Samples with WGS data from the 1000 Genomes Project and the African Genome Variation Project are included for comparison (grey circles). CAM includes 25 individuals who are homozygous for the sickle mutation (HbSS); MAL includes unaffected individuals with a family history of neurological disease; BOT comprises children who are HIV-positive; BRN included only female participants. 1000G, 1000 Genomes Project; AGVP, African Genome Variation Project. Maps were created using R [43]. b) Principal component analysis of African WGS data showing the first two principal components. New populations used in this study are indicated by crosses. Population abbreviations are as described in the 1000 Genomes and H3Africa Projects as provided in Supplementary Methods Table 1 and Supplementary Table 22. Shaded background ellipses relate to the geographical regions as shown in a.
Fig. 2) Population admixture and genetic ancestry among African populations
https://i.imgur.com/NKYrYyY.png
a) Admixture plot showing select African populations based on WGS and array data for K = 10. b) Proposed movement during the Bantu migration, showing the populations that were used for inference. Blue line shows the migration patterns inferred by genetic distance estimates with Zambia (BSZ) as an intermediate staging ground for Bantu migrations further east (red–teal arrow) and south (red–yellow arrow). The dotted black line shows the previously proposed late-split route [9]; the dotted blue–green line through the DRC indicates an alternative model of migration. GGK, Gǀwi, Gǁana and baKgalagadi. c) Key admixture dates (in generations) in populations of interest based on MALDER results. The colour of each circle represents the admixture date for NC components in each population group (KS, AA, RFF and NS). Dates are shown in terms of number of generations (1 generation = 29 years). Maps were created using R [43].
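For anyone who wants to read the panel c dates in calendar terms, here is a minimal sketch of the conversion, using the caption's stated 29 years per generation. The function name and the generation counts in the loop are hypothetical and only for illustration; they are not values taken from the paper.

```python
# Generation length stated in the Fig. 2c caption (1 generation = 29 years).
YEARS_PER_GENERATION = 29


def generations_to_years(generations: float) -> float:
    """Convert an admixture date in generations to approximate years ago."""
    return generations * YEARS_PER_GENERATION


# Hypothetical generation counts, for illustration only (not from the paper).
for g in (10, 50, 100):
    print(f"{g} generations ~ {generations_to_years(g)} years")
```

Note that the result scales linearly with the assumed generation length, so a different assumption (e.g. 25 years) would shift every date proportionally.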
Data Fig. 1) Admixture K10
https://i.imgur.com/bjvCRKL.jpg
Existing African datasets from AGVP [4], the 1000 Genomes Project [2], SAHGP [17] and previously published studies [9, 14], and a representative European population (CEU) from the 1000 Genomes Project, are included as reference panels. K values from 2 to 10 are shown. See Supplementary Table 22 for definitions of abbreviations.