
Thread: A closer look at the I1 samples from Batini et al, Nat. Commun. 2015

  1. #1
    Moderator
    Posts
    982
    Sex
    Location
    United Kingdom
    Ethnicity
    European
    Nationality
    British
    Y-DNA (P)
    I-FGC74348
    mtDNA (M)
    J1c1

    United Kingdom Northumberland European Union

    A closer look at the I1 samples from Batini et al, Nat. Commun. 2015

    I was intrigued by the paper published in Nature Communications in 2015 by Batini et al, “Large-scale recent expansion of European patrilineages shown by population resequencing”, which came up a few times recently in the Ancient I1 Samples thread here on Anthrogenica. The Batini et al 2015 paper doesn’t actually contain any ancient samples, aside from a list in Supplementary Table 8, which the manuscript references as follows: “Ancient MSY (male specific region of the Y chromosome) sequences show that hgs R1a and R1b are present in the steppe much earlier than observed in any European sites (Supplementary Table 8), making this region a likely source for these MSY expansion lineages.” Given that there’s no ancient sample analysis, and the paper instead looks at the DNA sequences of modern individuals traced back to a TMRCA (time to most recent common ancestor), I thought I’d put it into a separate thread so it doesn’t take the main Ancient I1 thread off on a tangent. I think it does have some interest from an I1 perspective, especially relating to the founder effect and patrilineal bottleneck of the I1 men living today.

    The paper is open access, so anyone should be able to read it, and it can be found here https://www.nature.com/articles/ncomms8152. It seems to be a more focused discussion on TMRCA using a dataset established in an earlier 2014 paper from Hallast et al titled “Y chromosome tree bursts into leaf…”, which shares a lot of authors with the Batini et al 2015 paper. The Hallast et al 2014 paper is also open access and can be found here https://academic.oup.com/mbe/article/32/3/661/977118. This one is useful because its Supplementary Figure 1 is a higher-resolution version of the phylogenetic tree, with sample names that are easier to read. The corresponding Supplementary Figure 1 in the Supplementary Information from the Batini et al 2015 paper is not very easy to read.

    The study contains the MSY sequences of 334 males from 17 populations and uses NGS (next generation sequencing). So not a very high number of samples, but the Y-DNA analysis of those samples is in greater depth than some other studies. All samples are from anonymous donors and most were collected for this study, so a good chance to get a look at a dataset that’s outside of the heavily US-biased direct to consumer testing databases.

  2. The Following 5 Users Say Thank You to deadly77 For This Useful Post:

     Celt_?? (07-12-2019),  JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  3. #2
    46 of the samples in the study were found to be I1, representing 13.8% of the dataset. As expected, the dataset is dominated by R1b, although several other haplogroups are represented in the study. No I1 samples were found in the Basque, Greek, Palestinian, Spanish, Italian or Turkish populations. This doesn’t mean that there’s no I1 folks in these modern populations – each population was only sampled with up to 20 individuals. Small sample size means that some haplogroups are going to be missed. For the populations where I1 samples were found in this study, the breakdown looks like this:
    Batini I1 breakdown.png
    Can also follow these in the Batini et al 2015 manuscript on the pie charts depicted in Figure 1b where I1 samples are shown in a lime green colour.

    The percentage of I1 in the Danish and Norwegian populations appears to match up closely with other studies referenced by Eupedia, around the 30% mark. It’s a bit surprising to see that the highest percentage of I1 was in the Frisian population at 50%. My feeling is that this is a consequence of the small sample size of 20 (10 of which are I1) overestimating the percentage of I1. Compare Eupedia reporting 16.5% I1 in the Netherlands, citing a sample size of 500 to 1000 samples. It’s unlikely that 50% of men in that “Frisian” population are I1.
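
    To put a rough number on how uncertain 10 out of 20 is, here is a minimal sketch using the Wilson score interval for a binomial proportion. The choice of the Wilson method and of z = 1.96 (about 95%) is an assumption of this sketch, not something the paper does, and it treats the 20 Frisian samples as a simple random sample.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# 10 of the 20 Frisian samples in the dataset are I1
low, high = wilson_interval(10, 20)
print(f"Frisian I1: 10/20, approximate 95% interval {low:.1%} to {high:.1%}")
```

    With only 20 samples the interval comes out tens of percentage points wide, so the 50% point estimate on its own says little about the true frequency.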

    Some of the population names are easy to follow, such as Danish, Irish, Norwegian, etc. Some are less clear. It’s not clear to me whether the Saami population in this study refers specifically to the Finno-Ugric people inhabiting the Sápmi region or instead to the population of Finland as defined by the current political borders. My feeling is the latter. Same with Frisian for the Netherlands and Bavarian for Germany. When the paper mentions England, it says Hertfordshire and Worcestershire in the Methods section, so perhaps the English samples were only collected from those regions. It also appears that the English and Orcadian samples reference the POBI (People of the British Isles) project. The CEU samples are from Utah residents with Northern and Western European ancestry, from the International HapMap Consortium.

  4. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  5. #3
    One thing that’s interesting is that the Batini et al 2015 paper gives a couple of different age estimates for the TMRCA of I1. Using BEAST, it is 4190 YA (years ago) with 95% HPD (highest posterior density) 3470-5070 YA. Then there’s another age estimate using rho, with TMRCA 3460 YA and a range of 3180-3760 YA. Both are somewhat younger than the YFull estimate of 4600 ybp (years before present) with 95% CI (confidence interval) 5200-4000 ybp. It’s important to remember that estimates are just that: estimates. They are not genealogically exact and they never will be. They’re in the same general area (and some of the ranges overlap), but all estimates agree that the TMRCA comes a long time after I split into I1 and I2. I wanted to establish whether the TMRCA estimate in the Batini et al 2015 paper was based on a subset of I1, for example if all of the samples came from just I-DF29 or a downstream branch such as I-Z58, I-Z2336 or I-Z63.
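
    As a quick check on which of the quoted ranges actually overlap, here is a small sketch. The three ranges are the ones quoted in this post; the dictionary labels and the helper name are just shorthand for the example.

```python
# Age ranges in years ago, as quoted in the post
estimates = {
    "Batini BEAST": (3470, 5070),
    "Batini rho": (3180, 3760),
    "YFull": (4000, 5200),
}


def overlap(a, b):
    """Return the shared (low, high) span of two ranges, or None if they are disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


names = list(estimates)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        shared = overlap(estimates[first], estimates[second])
        print(f"{first} vs {second}: {shared if shared else 'no overlap'}")
```

    On these numbers the BEAST range overlaps both of the others, while the rho range and the YFull range do not overlap each other.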

    The Supplementary Data also includes a VCF file of the samples reported in this study. This can be loaded into the Broad Institute’s IGV software, which I use for looking at the BAM files of ancient I1 samples, to make some calls about the downstream I1 subclades of these samples. I can’t get as much information from the VCF as from a BAM. Think of the BAM file as the raw sequence data aligned against a reference genome (hg19 or hg38 for example) and think of the VCF as more like an annotated report rather than the raw data itself. Also, both papers listed above used 8 targeted X-degenerate regions of the Y chromosome. The hg19 coordinates of these targeted X-degenerate regions are listed in Supplementary Data 2 of the Batini et al paper and in Figure 1 of the main manuscript in the Hallast et al 2014 paper.
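
    A VCF is a tab-separated text file, so a single position can also be inspected without IGV. The sketch below is only an illustration: the sample names, the position 1000 and the alleles in the toy file are made up and are not taken from the VCF in the paper, and it assumes haploid GT values of 0 (reference allele), 1 (alternate allele) or . (no call). The reference allele is not always the ancestral one, so "alt" only means "derived" once the ancestral state of the SNP has been checked.

```python
def calls_at(vcf_text, pos):
    """Return {sample: 'ref' | 'alt' | 'no call'} for the first record at position pos."""
    labels = {"0": "ref", "1": "alt"}
    samples = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            continue
        if int(fields[1]) != pos:
            continue
        gt_index = fields[8].split(":").index("GT")
        return {
            sample: labels.get(value.split(":")[gt_index], "no call")
            for sample, value in zip(samples, fields[9:])
        }
    return {}


# Toy file: hypothetical samples and a hypothetical position
header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT",
          "sampleA", "sampleB", "sampleC"]
record = ["Y", "1000", ".", "C", "T", ".", "PASS", ".", "GT", "1", "0", "."]
toy_vcf = "##fileformat=VCFv4.1\n" + "\t".join(header) + "\n" + "\t".join(record) + "\n"

print(calls_at(toy_vcf, 1000))
```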

  6. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  7. #4
    This also means there are limitations on which SNPs can be extracted from the VCF file. For example, DF29 and the phyloequivalent SNPs for that branch aren’t covered. Same for Z58, which is one of the largest branches below I-DF29. However, I can see the SNPs Z2336, Z59 and Z63, which cover a lot of major branching points. Here’s how the numbers work out for this dataset:
    Batini 4 groups.png
    Most of the samples are I-Z59 and subclades. Second most common in this dataset is the I-Z2336 branch, followed by the I-Z63 branch, and the rest are I-M253, which covers the samples that don’t fit into the other three categories: I-Z58 samples that aren’t I-Z59 (such as I-Z138), other I-DF29 subclades, and subclades of I1 that are negative for DF29.
    Here’s a more detailed breakdown of how the main four I1 groups are distributed in this paper.
    Batini Groupings Table.png
    Or for those of you who like a more visual representation:
    Batini Groupings Bar Chart.png

  8. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  9. #5
    All four of the CEU samples are on YFull as part of their academic samples list, which makes them a bit easier to track. From the YFull tree, sample CEU-NA11992 is on branch I-Y9414, which is downstream of I-Z17954. So the dataset here includes at least one sample that is outside the I-DF29 branch, going back to the same TMRCA as the YFull I1 tree, which removes one of the main possible reasons for the difference in the TMRCA. There are several other reasons why the TMRCAs differ. YFull has significantly more samples and uses BAM files rather than VCF for the data. The two also pull their pool of SNPs from different regions of the Y chromosome. YFull uses SNPs from the combBED region, filters some out on other criteria (MNPs and indels, SNPs that appear too often in multiple different haplogroups, read quality and depth), adjusts for coverage and then applies an assumed rate of 144.41 years per SNP. The Batini et al 2015 paper used 8 targeted X-degenerate regions of the Y chromosome, so it doesn’t cover as much of the combBED region as YFull, and it uses a different mutation rate. For example, the rho calculation uses 268.5 years per mutation, a much larger figure than YFull’s, but it is based on fewer SNPs. There is some discussion of their mutation rate in the manuscript, and they reference a 2015 Nature Genetics paper by Helgason et al, which estimated mutation rates from Icelandic Y chromosome sequences. I haven’t read the Helgason 2015 paper (not open access), but I’ve seen it referred to in Iain McDonald’s age estimate calculations, where a substantial (18%) difference in the mutation rate among different regions of the Y chromosome was mentioned.
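
    The rho approach boils down to multiplying the average number of mutations separating each sample from the ancestral node by a years-per-mutation figure. A minimal sketch follows; the per-sample mutation counts are invented for illustration, only 268.5 and 3460 come from the paper as quoted in this thread, and the sketch ignores the standard error that gives the quoted range.

```python
def rho_tmrca(mutation_counts, years_per_mutation):
    """Mean mutations from the ancestral node to each sample, scaled to years."""
    rho = sum(mutation_counts) / len(mutation_counts)
    return rho * years_per_mutation


YEARS_PER_MUTATION = 268.5  # figure quoted for the rho calculation

# Hypothetical counts for three samples, purely to show the mechanics
hypothetical_counts = [12, 13, 14]
example_age = rho_tmrca(hypothetical_counts, YEARS_PER_MUTATION)

# Working backwards from the published rho estimate of 3460 years
implied_rho = 3460 / YEARS_PER_MUTATION

print(f"example age: {example_age:.0f} years")
print(f"implied average mutations per sample: {implied_rho:.1f}")
```

    Working backwards, 3460 / 268.5 is roughly 13, so the rho estimate corresponds to an average of about 13 mutations per sample within the targeted regions.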

    The smaller coverage of the 8 targeted X-degenerate regions of the Y chromosome means that I’m not able to assign everything clearly, as not every relevant SNP is covered in the VCF, but I managed to find a few SNPs that were covered and was able to use some phyloequivalent SNPs as surrogates. After a bit of trial and error, I found that SNPs with a PH prefix were well covered in this paper. Apparently, this prefix is assigned to Pille Hallast, one of the authors of this paper, so that’s probably why. It may be that several of the PH SNPs were discovered and registered by this study. I was able to assign three of the samples in the M253 group to I-Z138 based on derived reads for Z139. These are I-Z58 but negative for the I-Z59 branch (I-Z59 accounts for 21 of the 46 samples, so I-Z58 is 24 samples). Within these three samples, Fri-1312 is derived for PH4482 and Fri-1725 is derived for S19185, while for Nor-20 I didn’t find anything further than Z139. We know CEU-NA11992 is I-Y9414 from its position on the YFull tree, and further analysis shows that, of the two remaining I-M253 group samples, Ork-525 has a derived read for PH2706, which is on the I-Z131 branch (and therefore negative for DF29), and Nor-15 has a derived read for PH2510, which is a small branch below I-DF29 on the YFull tree. So, of the 46 samples, 44 are DF29+ and 2 are DF29-.

    Could dig out some more downstream information on the other branches – three of the I-Z63 samples were derived for Y2245 (Bav-53, Eng-O109, Ser-12) while the other three I-Z63 samples were ancestral for Y2245 (Fri-1319, Nor-14, Ser-1). Managed to find derived read for PH2195 for Ser-1 and derived read for PH3482 for Ser-12.
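
    The counts in this post can be cross-checked with a few lines of arithmetic. This is only a bookkeeping sketch: the sample names and totals are the ones given above, while the variable names and the way the groups are laid out are assumptions of the example.

```python
TOTAL_I1 = 46
Z59_COUNT = 21

# I-Z63 samples: three derived and three ancestral for Y2245
z63 = ["Bav-53", "Eng-O109", "Ser-12", "Fri-1319", "Nor-14", "Ser-1"]

# Samples in the catch-all I-M253 group and where they were placed
other_m253 = {
    "Fri-1312": "I-Z138",
    "Fri-1725": "I-Z138",
    "Nor-20": "I-Z138",
    "CEU-NA11992": "I-Y9414",
    "Ork-525": "I-Z131",
    "Nor-15": "PH2510 (below I-DF29)",
}
df29_negative = {"CEU-NA11992", "Ork-525"}

z138_count = sum(1 for branch in other_m253.values() if branch == "I-Z138")
z58_total = Z59_COUNT + z138_count          # I-Z59 plus the I-Z138 samples
z2336_count = TOTAL_I1 - Z59_COUNT - len(z63) - len(other_m253)
df29_positive = TOTAL_I1 - len(df29_negative)

print(f"I-Z58 total: {z58_total}")
print(f"I-Z2336 by subtraction: {z2336_count}")
print(f"DF29+: {df29_positive}, DF29-: {len(df29_negative)}")
```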

  10. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  11. #6
    For the I-Z2336 samples, I could separate five of the thirteen into I-Z74 based on derived for CTS1793. Of those five, Saa-5 was derived for L258 and a further four of those (CEU-NA11829, Den-152, Nor-2, Nor-21) into I-L813 based on derived read for S297. Of the other samples, three of the Saami samples (Saa-5, Saa-9, Saa-10) had derived read for PH5383 downstream of I-L22 branch. Of the rest, Ser-4 was while Den-158 and Hun-27 I didn’t find anything downstream of I-Z2336.
    In the I-Z59 group (the largest group), three samples were negative for Z60 and positive for Z2041. Of those three, Hun-37 had a derived read for PH4774, Den-183 had a derived read for PH4362 and CEU-NA12750 was I-Z2042.

    All the rest of the I-Z59 group were I-Z60. I didn’t find anything further for Den-207, but Den-113 was derived for PH902 and Nor-7 was derived for PH2834, placing that sample at I-BY453 downstream of I-F2642. Z140 and Z141 aren’t covered so there’s no read for those, although F2642 was, and Nor-7 (mentioned above) was the only derived sample. Z2535 was covered and six of the samples were derived – all of these were also derived for L338. Two of these I-L338 samples were derived for S1990 – Fri-1325 was also derived for PH5345 while Ire-0130 was ancestral for PH5345. The remaining four I-L338 samples were all derived for Y8337 – two of those (Fri-1309, Fri-1722) were also further derived for PH4462, while the other two (Fri-1048, Fri-1938) were not and so sit at I-Y8337.

    The remaining nine I-Z60 samples were derived for S1948, putting them on the I-CTS7362 branch. For two of these (Fri-1048, Fri-1937), I didn’t get further down than S1948. Bav-57 was derived for PH2753, which would be at branch I-Y63100 on the YFull tree, a small branch of I-CTS7362. The rest of the I-CTS7362 samples could be grouped into I-Z73 based on a derived read for phylogenetic SNP Y2927. For two of these (Den-176, Ork-573), I couldn’t find anything downstream of that, but CEU-NA06994 was I-Y11026 and three of the Saami samples (Saa-11, Saa-19, Saa-20) were I-L1302.
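
    A quick tally confirms that the I-Z59 subgroups described above add up to the 21 samples in that group. The dictionary keys are shorthand labels for this sketch, not names used in the paper, and the counts are taken from the description in this post.

```python
z59_counts = {
    "Z60 negative, Z2041 positive": 3,               # Hun-37, Den-183, CEU-NA12750
    "I-Z60 other (Den-207, Den-113, Nor-7)": 3,
    "I-L338: derived for S1990": 2,
    "I-L338: derived for Y8337": 4,
    "I-CTS7362: nothing below S1948": 2,
    "I-CTS7362: I-Y63100": 1,                        # Bav-57
    "I-CTS7362: I-Z73": 6,
}

total_z59 = sum(z59_counts.values())
l338_total = sum(n for label, n in z59_counts.items() if label.startswith("I-L338"))
cts7362_total = sum(n for label, n in z59_counts.items() if label.startswith("I-CTS7362"))

print(f"I-L338: {l338_total}, I-CTS7362: {cts7362_total}, I-Z59 total: {total_z59}")
```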

  12. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  13. #7
    I realize that’s a bit of a read, but wanted to give the rationale behind where I grouped these samples in this dataset. Much easier to summarize into a table:
    Batini Summary.png
    Or for those who prefer a more visual overview, I annotated Supplementary Material Figure 1 from the Hallast et al 2014 paper with branches based on derived SNPs that I could find.
    Hallast Annotated.png
    This doesn’t necessarily give the absolute position on the tree for the samples in this dataset, as a lot of the SNPs weren’t covered in the 8 targeted X-degenerate regions of the Y chromosome listed in Supplementary Data 2 of the Batini et al paper and in Figure 1 of the main manuscript in the Hallast et al 2014 paper. For some of the branches I was able to use phyloequivalent SNPs, and at least I was able to break down the I1 samples in this dataset into known subclades.

  14. The Following 4 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  spruithean (07-09-2019),  VytautusofAukstaitija (07-09-2019)

  15. #8
    Registered Users
    Posts
    1,628
    Sex
    Location
    Canada
    Nationality
    Canadian
    Y-DNA (P)
    I-FTA53697

    Canada Netherlands United Kingdom Cornwall Ireland France
    Very interesting work. I remember reading through this study and finding the high-rate of I1 in Frisians, to not be surprising, but I felt that it was skewed. Great work! Interesting to note the distribution of these various subclades of I1.

  16. The Following 3 Users Say Thank You to spruithean For This Useful Post:

     deadly77 (07-09-2019),  JMcB (07-09-2019),  JonikW (07-09-2019)

  17. #9
    Gold Class Member
    Posts
    1,776
    Sex
    Location
    Kent
    Ethnicity
    North Sea/Irish Sea
    Nationality
    British
    Y-DNA (P)
    I1 Z140+ A21912+
    mtDNA (M)
    V
    Y-DNA (M)
    R1b L21+ L371+
    mtDNA (P)
    J1c2l

    Wales England Cornwall Scotland Ireland Normandie
    Quote Originally Posted by spruithean View Post
    Very interesting work. I remember reading through this study and finding the high-rate of I1 in Frisians, to not be surprising, but I felt that it was skewed. Great work! Interesting to note the distribution of these various subclades of I1.
    I agree that this is excellent work and it's great to see that deadly77 is back. I was also drawn to the Frisians here and would love to see a detailed study on that region, particularly given the area's role as a springboard into England in the Migration Period. Ten out of 20 were I1: wow, it would be good to have 200 samples and see how much the I1 level changed...

  18. The Following 2 Users Say Thank You to JonikW For This Useful Post:

     Adrian Stevenson (07-09-2019),  deadly77 (07-09-2019)

  19. #10
    Quote Originally Posted by spruithean View Post
    Very interesting work. I remember reading through this study and finding the high-rate of I1 in Frisians, to not be surprising, but I felt that it was skewed. Great work! Interesting to note the distribution of these various subclades of I1.
    Cheers. Indeed - I saw that when I was reading the paper and thought "crikey! that's a lot of I1 in the Netherlands. Why have I never heard that before". Yeah, I believe it's a consequence of the small sample size and grabbing 10 out of the 20 Frisian samples being I1. I think with a larger sample size, the proportion of I1 isn't as large. Works the other way in that there's only one I1 sample from England. Luck of the draw I guess.

    I'm often seeing the percentage of I1 being quoted as "up to 50% of the population in some areas of Sweden" in various places. I'm feeling that figure is down to a low sample size. Going from Eupedia's Y-DNA haplogroups by country, it has 50% I1 in Gotland while 37% I1 in Sweden here https://www.eupedia.com/europe/europ...logroups.shtml

    Looking through the list of Eupedia's sources, I got to the paper "Y-chromosome diversity in Sweden – A long-time perspective" by Karlsson et al in the European Journal of Human Genetics, 2006, here (open access): https://www.nature.com/articles/5201651 - Table 1 lists 40 samples from Gotland, 18 of which are I1a and 2 of which are I1c. Given the year it was published, I think what the paper calls I1c is what we would today classify as I2. It seems there was a reclassification in 2008 where I1a became I1. But if Eupedia is adding 18+2=20 I1 out of 40 samples from Gotland to get 50%, then again that could be an inflated figure from a small sample size. If anyone knows of any other studies supporting the 50% statistic, I'd be interested to check them out.
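
    The difference between the two ways of counting the Gotland samples is easy to show. A minimal sketch using the Table 1 counts quoted above; whether Eupedia really sums I1a and I1c is a guess, as stated in the post.

```python
GOTLAND_SAMPLES = 40
i1a = 18  # would be I1 under current nomenclature
i1c = 2   # probably I2 under current nomenclature

i1_only = i1a / GOTLAND_SAMPLES
i1a_plus_i1c = (i1a + i1c) / GOTLAND_SAMPLES

print(f"I1a only: {i1_only:.0%}")
print(f"I1a + I1c: {i1a_plus_i1c:.0%}")
```

    Counting only I1a gives 45% rather than 50%, and either figure rests on just 40 samples.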

  20. The Following 3 Users Say Thank You to deadly77 For This Useful Post:

     JMcB (07-09-2019),  JonikW (07-09-2019),  RP48 (07-12-2019)
