Hunter-gatherer genomes a trove of genetic diversity

08-04-2012, 05:21 PM
Sarah Tishkoff of UPenn has a new study out and she's going straight for the jugular. She's fully sequenced the genomes of 5 Pygmies (from Cameroon), 5 Hadza, and 5 Sandawe (both from Tanzania).

Free access news article: http://www.nature.com/news/hunter-gatherer-genomes-a-trove-of-genetic-diversity-1.11076

Full Cell study (limited access): http://www.sciencedirect.com/science/article/pii/S0092867412008318


Demographic History of African Hunter-Gatherers
Principal component analysis (PCA) reveals both continental and population-specific patterns of genetic variation. PC1 distinguishes Africans from non-Africans (with East African populations being closer to non-Africans), and PC2 differentiates Asian and European populations (Figure 1D). The Hadza are differentiated by PC3, and subsequent principal components differentiate Pygmies (PC4) and Sandawe (PC5) from other African populations (Figure S4).
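For anyone wondering what the PCA step does mechanically, here is a minimal sketch in plain Python. It is not the authors' pipeline: the toy genotype matrix is invented, the function name is hypothetical, and it only recovers the first principal component, using power iteration in place of a full eigendecomposition.

```python
from math import sqrt

def first_principal_component(genotypes, iterations=200):
    """Return (scores, loadings) for PC1 of a samples x SNPs matrix of 0/1/2 allele counts.

    Uses power iteration on X^T X, where X is the column-centered genotype matrix.
    """
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]

    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        xv = [sum(xi[j] * v[j] for j in range(m)) for xi in x]               # X v
        w = [sum(x[i][j] * xv[i] for i in range(n)) for j in range(m)]       # X^T (X v)
        norm = sqrt(sum(c * c for c in w))
        if norm == 0:
            raise ValueError("genotype matrix has no variance")
        v = [c / norm for c in w]

    scores = [sum(xi[j] * v[j] for j in range(m)) for xi in x]
    return scores, v

# Invented toy data: rows 0-1 and rows 2-3 stand in for two differentiated populations.
toy = [
    [2, 2, 0, 0, 1],
    [2, 1, 0, 0, 1],
    [0, 0, 2, 2, 1],
    [0, 0, 2, 1, 1],
]
pc1_scores, pc1_loadings = first_principal_component(toy)
```

The sign of a principal component is arbitrary, so only the grouping matters: the first two rows land on one side of zero and the last two on the other.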

To assess shared ancestry between diverse African hunter-gatherer populations, we examined the percentage of shared variants between Pygmy, Hadza, and Sandawe genomes and a previously sequenced San genome (Schuster et al., 2010). The percentage of San variants that are shared with one other hunter-gatherer population is similar for Pygmy-, Hadza-, and Sandawe-specific variants (5.6%–5.7%), suggesting that the San diverged before other hunter-gatherer populations. However, the D test of admixture (Green et al., 2010) indicates that the San genome shares more derived alleles with Pygmies than with the Hadza or Sandawe (p < 0.01; Table S3). This result suggests that the ancestors of the Tanzanian click-speakers (the Hadza and Sandawe) may have diverged more recently than the Pygmy/San split. However, additional possibilities involve gene flow between the ancestors of Pygmies and the San or stochastic loss of shared derived alleles among the ancestors of the Hadza and Sandawe.
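The D test mentioned above is the ABBA-BABA statistic. Below is a rough sketch of how it is counted, assuming one haploid allele per population per site, coded 0 (ancestral) or 1 (derived), with the outgroup defining the ancestral state. The function name, the toy alleles, and the P1/P2/P3 assignment (for example P1 = Hadza, P2 = Pygmy, P3 = San) are illustrative assumptions, not the setup behind Table S3.

```python
def d_statistic(p1, p2, p3):
    """ABBA-BABA D for three aligned lists of 0/1 alleles (outgroup assumed ancestral, 0).

    ABBA sites: P1 ancestral, P2 derived, P3 derived.
    BABA sites: P1 derived, P2 ancestral, P3 derived.
    D > 0 means P3 shares more derived alleles with P2 than with P1.
    """
    abba = baba = 0
    for a, b, c in zip(p1, p2, p3):
        if c != 1:
            continue  # only sites where P3 carries the derived allele are informative
        if a == 0 and b == 1:
            abba += 1
        elif a == 1 and b == 0:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)

# Invented alleles at five sites.
p1 = [0, 0, 1, 0, 1]
p2 = [1, 1, 0, 1, 0]
p3 = [1, 1, 1, 1, 0]
d_value = d_statistic(p1, p2, p3)  # 3 ABBA sites, 1 BABA site
```

A real analysis also needs a significance estimate (typically a block jackknife across the genome), which this sketch leaves out.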

A neighbor joining tree indicates that Pygmies diverged before the Hadza and Sandawe split (Figure 1F), and lack of monophyly among Pygmy genomes reveals population substructure involving Baka, Bakola, and Bedzan individuals. Hadza and Sandawe genomes are nested within a cluster that also includes the Maasai, possibly due to recent shared gene flow with neighboring East African populations. With the exception of Pygmies, clustering patterns reflect shared language families: Khoesan-speaking Hadza and Sandawe individuals cluster together, as do Niger-Kordofanian-speaking Yoruba and Luhya individuals.

We also observe differences in the number and cumulative size of long runs of homozygosity in each population. Of the 15 hunter-gatherer genomes analyzed in this paper, the five genomes with the most runs of homozygosity all belong to the Hadza (Figure S5). Though some of these differences may be due to a population bottleneck in the Hadza (Henn et al., 2011), an additional cause may be cryptic inbreeding (Pemberton et al., 2012), as indicated by the large variance in cumulative size of runs of homozygosity within the Hadza (Figure S5). Indeed, cumulative runs of homozygosity in three Hadza genomes are more than double the size of other hunter-gatherers analyzed in this paper (Figure S5).
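To make "runs of homozygosity" concrete, here is a simplified sketch that scans genotype calls along one chromosome and reports stretches of consecutive homozygous sites spanning at least a minimum length. The function name, positions, and threshold are invented. Production tools usually tolerate a few heterozygous or missing calls inside a run, which this version does not, and the paper's actual calling parameters are not given in the excerpt.

```python
def runs_of_homozygosity(positions, is_het, min_length):
    """Return (start, end) pairs for runs of homozygous sites spanning >= min_length bp.

    positions: sorted base-pair coordinates of genotyped sites
    is_het:    parallel list of booleans, True where the genotype is heterozygous
    """
    runs = []
    start = prev = None
    for pos, het in zip(positions, is_het):
        if het:
            # A heterozygous site closes any open run.
            if start is not None and prev - start >= min_length:
                runs.append((start, prev))
            start = None
        else:
            if start is None:
                start = pos
            prev = pos
    # Close a run that extends to the last site.
    if start is not None and prev - start >= min_length:
        runs.append((start, prev))
    return runs

positions = [100, 200, 300, 400, 500, 600, 700]
is_het = [False, False, False, True, False, False, True]
roh = runs_of_homozygosity(positions, is_het, min_length=150)
cumulative_roh = sum(end - start for start, end in roh)
```

The cumulative size is what the quoted passage compares between individuals: a genome with many long runs has a large total.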

Consistent with an historic bottleneck and/or inbreeding in the Hadza, we find that the proportion of polymorphic sites, as quantified by θ, is lowest for the Hadza and highest for Pygmies (Table 2). Depending on mutation rates, this translates to effective population sizes of 11,300–25,700 (Pygmy), 9,200–20,900 (Hadza), and 10,600–24,000 individuals (Sandawe). Genome-wide estimates of Tajima's D are lower for Pygmies and Sandawe compared to the Hadza (mean values of Tajima's D are −0.4273 for Pygmies, −0.0148 for Hadza, and −0.3453 for Sandawe). These results are consistent with the observation that low-frequency derived alleles (DAF ≤ 0.1) are overrepresented in Pygmy and Sandawe populations and underrepresented in the Hadza (Figure S6; p < 0.0001, χ² tests of independence). Together, these results suggest that Pygmy and Sandawe populations have recently expanded in size, whereas the Hadza population has recently decreased in size.
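Two calculations sit behind this paragraph: converting θ to an effective population size through the standard diploid relation θ = 4·Ne·μ, and Tajima's D, which compares mean pairwise diversity with the number of segregating sites. The sketch below shows both. The θ and μ values and the toy haplotypes are illustrative assumptions, not the Table 2 figures, and the function names are made up.

```python
from itertools import combinations
from math import sqrt

def effective_size(theta_per_site, mu_per_site):
    """Diploid effective population size from theta = 4 * Ne * mu."""
    return theta_per_site / (4.0 * mu_per_site)

def tajimas_d(haplotypes):
    """Tajima's D for equal-length strings of '0'/'1' alleles (one string per chromosome)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance term")
    seg = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if seg == 0:
        raise ValueError("no segregating sites")

    # Mean number of pairwise differences (pi).
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))

# Assumed values: theta = 1.13e-3 per site, mu = 2.5e-8 per site per generation.
ne_estimate = effective_size(1.13e-3, 2.5e-8)

# Toy sample in which every variant is a singleton (an excess of rare alleles).
singleton_sample = ["1000", "0100", "0010", "0001"]
d_singletons = tajimas_d(singleton_sample)
```

An excess of rare alleles pushes D negative, which is the pattern the passage reads as recent expansion in Pygmies and Sandawe; a D near zero, as in the Hadza, fits a population that has not expanded.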


Figure S4. PCA Plots, Related to Figure 1. (A–F) In each panel the x axis corresponds to PC1. The proportion of the variance explained by each PC is indicated along each axis, and individuals are represented by population name. Pygmies are labeled green, Hadza are labeled blue, and Sandawe are labeled red.

The deluge of data from next-generation sequencing has begun, with massively large data sets of low-coverage whole-genome sequences (1000 Genomes Project Consortium, 2010) and high-coverage exome sequences (Tennessen et al., 2012) being reported in thousands of individuals. Here, we described high-coverage whole-genome sequencing of individuals from three African hunter-gatherer populations, who harbor a large amount of previously unknown genetic diversity that is inaccessible by studying individuals of non-African ancestry or by focusing only on protein-coding regions. Despite evidence of inbreeding and a population bottleneck in the Hadza, high levels of genetic diversity are maintained in all three hunter-gatherer populations. Additionally, we found significant genetic divergence among the three African hunter-gatherer populations, including between the Hadza and Sandawe, who are geographically close (∼150 km apart) and have languages that contain click consonants, demonstrating the continued need to broadly sample human populations in order to comprehensively assess the spectrum of human genomic diversity.

We find evidence of selective constraint near genes, and these patterns are replicated in each hunter-gatherer population. We also observe signatures of local adaptation in Pygmy, Hadza, and Sandawe populations, including high locus-specific branch lengths for genes involved in taste/olfactory perception, pituitary development, reproduction, and immune function. These genetic differences reflect differences in local diets, pathogen pressures, and environments. Thus, Pygmies, Hadza, and Sandawe have continued to adapt to local conditions while sustaining their own unique cultures of hunting and gathering.
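The locus-specific branch length (LSBL) named above is usually computed per genomic window from three pairwise genetic distances. A minimal sketch follows, assuming the distances are pairwise FST values. The excerpt does not say which distance measure the paper used, and the numbers are invented.

```python
def lsbl(d_ab, d_ac, d_bc):
    """Locus-specific branch length for population A, given pairwise distances
    between A and B, A and C, and B and C at one locus or window."""
    return (d_ab + d_ac - d_bc) / 2.0

# Invented distances for one window: A is far from both B and C, which are close to each other.
lsbl_a = lsbl(d_ab=0.3, d_ac=0.4, d_bc=0.1)
```

A window where one population has an unusually long branch relative to the other two is a candidate for local adaptation in that population.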

Short Stature, Pituitary Function, and Local Adaptation in Western African Pygmies
Short stature in African Pygmies is thought to be an adaptation to a tropical forest environment. Several possible fitness advantages of short height have been proposed, including thermoregulation, early cessation of growth as a trade-off for early reproduction to compensate for shorter life expectancy, easier mobility in a dense forest environment, and reduced caloric requirements (Migliano et al., 2007; Perry and Dominy, 2009). Although stature in Europeans is a highly complex trait (Lango Allen et al., 2010), the genetic architecture of this trait in Pygmies may differ (Pygmy LSBL hits are not enriched for height genes found in largely European GWAS, p = 0.888 for the top 268 LSBL windows, confirming Jarvis et al. [2012]). AIMs within and near HESX1 and POU1F1 are strong candidates for the short stature phenotype in Pygmies, together with previously identified (chr3:45–60 Mb region; Jarvis et al., 2012) and other as yet undiscovered loci. The observation of long-range LD maintained in diverse populations at these loci raises the possibility that undetected inversions in these chromosome 3 regions play a role in population differentiation and adaptation. Additionally, the observation that third-chromosome AIM clusters exist at a very low frequency in other African populations suggests that, if selection has altered the frequency of AIM haplotypes in Pygmies, then it may have acted on standing variation, which existed prior to the divergence of Eastern and Western Pygmies from other African populations. Furthermore, AIM variants are not included in commercially available genome-wide SNP arrays, emphasizing the critical importance of whole-genome sequencing for identifying variants of potential functional significance that may be geographically or ethnically restricted due to distinct selection pressures and/or demographic histories.

In addition to the 3p14.3 (HESX1) and 3p11.2 (POU1F1) AIM clusters, we have identified other candidate loci that may play a role in local adaptation, height, and pituitary function in Pygmies. These loci include TRHR (thyrotropin-releasing hormone receptor), APPL1, FSHR, and genes associated with Williams Syndrome (Supplemental Information). Overall, we find that highly divergent regions of Pygmy genomes (as identified by LSBL scans) are enriched for genes that play a role in pituitary function (p = 0.0082, χ² test of independence).
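The enrichment claim rests on a χ² test of independence on a 2x2 table (gene is or is not pituitary-related, versus window is or is not a top LSBL hit). Here is a sketch of that test using only the standard library. The counts are hypothetical and do not reproduce the paper's p = 0.0082, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi2_independence_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = pituitary-related / other genes,
# columns = inside / outside highly divergent windows.
counts = [[10, 20], [30, 40]]
chi2_value, p_value = chi2_independence_2x2(counts)
```

With real data the table would be built from the gene annotations overlapping the top LSBL windows versus the genomic background.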

Together, these results point toward the possibility that development and expression of hormones produced by the anterior pituitary may play a central role in the Pygmy phenotype, potentially influencing a number of traits, including growth, reproduction, metabolism, and immunity. Further studies of pituitary function and development in vitro and using transgenic animal models will be necessary to elucidate the importance of this system in Pygmy development and physiology and to clarify the role of variants within the 3p14.3 and 3p11.2 Pygmy AIM clusters.