
Thread: Human Y Chromosome --- Highly Contiguous Long Read Assemblies (discussion)

  #1
    pmokeefe (Global Moderator)
    Posts: 1,619
    Location: Cambridge MA / Rome, Italy / San Diego, CA (currently)
    Ethnicity: Polish/British Isles
    Nationality: U.S.
    Y-DNA (P): R-Y154732
    mtDNA (M): H1
    mtDNA (P): J1c2


    Assembly of 43 diverse human Y chromosomes reveals extensive complexity and variation
    Pille Hallast, Peter Ebert, ... , The Human Genome Structural Variation Consortium (HGSVC), Jan O. Korbel, Chris Tyler-Smith, Evan E. Eichler, Xinghua Shi, Christine R Beck, Tobias Marschall, Miriam K. Konkel, Charles Lee
    doi: https://doi.org/10.1101/2022.12.01.518658
    This article is a preprint and has not been certified by peer review

    Abstract
    The prevalence of highly repetitive sequences within the human Y chromosome has led to its incomplete assembly and systematic omission from genomic analyses. Here, we present long-read de novo assemblies of 43 diverse Y chromosomes, three contiguously assembled including two from deep-rooted African Y lineages. Examination of the full extent of genetic variation between Y chromosomes across 180,000 years of human evolution reveals its remarkable complexity and diversity in size and structure, in contrast with its low level of base substitution variation. The size of the Y chromosome assemblies varies extensively from 45.2 to 84.9 Mbp, with individual repeat arrays showing up to 6.7-fold difference in length across samples. Half of the male-specific euchromatic region is subject to large (up to 5.94 Mbp) inversions with a >2-fold higher recurrence rate compared to the rest of the human genome. The Y centromere, composed of 171 bp α-satellite monomer units, appears to have evolved from tandem arrays of a 36-mer ancestral higher order repeat (HOR), which has been predominantly replaced by a 34-mer HOR, and reveals a pattern of higher sequence variation towards the short-arm side. The Yq12 heterochromatic region is ubiquitously flanked by approximately 649 kbp and 472 kbp inversions that maintain the alternating arrays of DYZ1 and DYZ2 repeat units in between. While the sizes and the distribution of the DYZ1 and DYZ2 arrays vary considerably, primarily due to local expansions and contractions, the copy number ratio between the DYZ1 and DYZ2 monomer repeat units remains consistently close to 1:1. In addition, we have identified on average 65 kbp of novel sequence per Y chromosome. The availability of sequence-resolved Y chromosomes from multiple samples provides a basis for identifying new associations of specific traits with the Y chromosome and garnering novel evolutionary insights.

    The complete sequence of a human Y chromosome
    Arang Rhie, Sergey Nurk, ..., Adam M. Phillippy
    doi: https://doi.org/10.1101/2022.12.01.518724
    This article is a preprint and has not been certified by peer review

    Abstract
    The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure including long palindromes, tandem repeats, and segmental duplications. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029 base pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, revealing the complete ampliconic structures of TSPY, DAZ, and RBMY; 42 additional protein-coding genes, mostly from the TSPY gene family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a prior assembly of the CHM13 genome and mapped available population variation, clinical variants, and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.
    YFull: YF14620 (Dante Labs 2018)


  #2
    pmokeefe (Global Moderator)

    Haplogroups

    Haplogroup figures from the Supplement:

    Figure S1. Phylogenetic relationships of the analyzed Y chromosomes. Split times as estimated according to the BEAST analysis are shown with 95% HPD interval in brackets (kya - thousand years ago). Sample ID is followed by population designation, full Y haplogroup label according to ISOGG v15.73 and terminal marker ID.
    Population abbreviations:
      ACB - African Caribbean in Barbados
      ASW - African Ancestry in SW USA
      BEB - Bengali in Bangladesh
      CHB - Han Chinese in Beijing, China
      CHS - Han Chinese South
      CLM - Colombian in Medellín, Colombia
      ESN - Esan in Nigeria
      FIN - Finnish in Finland
      GBR - British From England and Scotland
      GWD - Gambian in Western Division – Mandinka
      IBS - Iberian Populations in Spain
      ITU - Indian Telugu in the U.K.
      JPT - Japanese in Tokyo, Japan
      KHV - Kinh in Ho Chi Minh City, Vietnam
      LWK - Luhya in Webuye, Kenya
      MSL - Mende in Sierra Leone
      MXL - Mexican Ancestry in Los Angeles CA USA
      PEL - Peruvian in Lima, Peru
      PJL - Punjabi in Lahore, Pakistan
      PUR - Puerto Rican in Puerto Rico
      TSI - Toscani in Italia
      YRI - Yoruba in Ibadan, Nigeria


    Figure S2. Phylogenetic relationships of the analyzed Y chromosomes and assembly completeness. Phylogenetic relationships of the analyzed Y chromosomes with branch lengths drawn proportional to the estimated times between successive splits according to BEAST analysis. Summary of Y assembly completeness with the number of contigs containing sequence from specific sequence class indicated with different colors (on the right - number of Y contigs needed to achieve the plotted assembly contiguity/total number of assembled Y contigs for each sample). Sample IDs include the population abbreviation, and the full Y lineage and terminal marker in brackets. See Figure S1 for population abbreviations.

    The Y chromosome in the other paper (T2T-Y, from HG002) belongs to haplogroup J-L816 (J1a2b3a1), from an Ashkenazi Jewish individual.

    Links to the corresponding YFull samples and haplogroups would be nice!
    Last edited by pmokeefe; 12-02-2022 at 02:38 PM.


  #3
    Ebizur (Registered Users)
    Posts: 1,198

    In Figure S1, it appears that the 95% HPD interval for the QR (= P1) node has erroneously been copied from the 95% HPD interval for the Q-M3 node.


  #4
    pmokeefe (Global Moderator)
    Would it be worth realigning our old samples against the closest haplogroup available from these new papers? It's a pity there's no sample from haplogroup I; presumably the J samples are the closest?


  #5
    pmokeefe (Global Moderator)
    Quote (originally posted by Ebizur):
    In Figure S1, it appears that the 95% HPD interval for the QR (= P1) node has erroneously been copied from the 95% HPD interval for the Q-M3 node.
    Good eye! Here's the corresponding author's listed email, if you get a chance to let them know:
    Corresponding author; email: charles.lee{at}jax.org
    Last edited by pmokeefe; 12-02-2022 at 03:21 PM.


  #6
    Ebizur (Registered Users)

    Quote (originally posted by pmokeefe):
    Would it be worth realigning our old samples against the closest haplogroup available from these new papers? It's a pity there's no sample from haplogroup I; presumably the J samples are the closest?
    It is a pity that the authors have not included any member of haplogroup C2, haplogroup D, haplogroup I, haplogroup L, haplogroup T, haplogroup M, or haplogroup S in their sample set.

    Judging from a comparison between the TMRCA estimates in Figure S1 and those of corresponding nodes on the YFull tree, haplogroup NO and haplogroup O should see the greatest increases in TMRCA through recalibration: the TMRCA estimate for the NO-M214 node according to the present study's T2T-Y sequences is 1.231x that of YFull's estimate, and the present study's TMRCA estimate for the O-M175 node is 1.220x that of YFull's estimate.

    The 1.10x correction factor that many people have opined should be applied to YFull's TMRCA estimates to make them align more closely with other data, such as radiocarbon datings of archaeological specimens, appears to be quite on the mark in regard to subclades of haplogroup QR-M45. However, I would say that one may reasonably apply a 1.15x correction factor to YFull's TMRCA estimates for most clades, and subclades of haplogroup O may require a correction factor of 1.20x or more.
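    To make the size of these correction factors concrete, here is a minimal Python sketch, assuming a single multiplicative factor is applied to both YFull's point estimate and its 95% CI bounds. This is only a back-of-the-envelope rescaling, not a recalibration of the mutation rate; the function name `corrected_tmrca` and the example input (4000 ybp, 95% CI 4300 <-> 3700) are illustrative, not taken from either paper.

```python
from __future__ import annotations


def corrected_tmrca(point_ybp: float, ci_ybp: tuple[float, float], factor: float):
    """Scale a YFull TMRCA point estimate and its CI bounds by one factor.

    Hypothetical helper: assumes a purely multiplicative correction, as in
    the 1.10x / 1.15x / 1.20x factors discussed above.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    older, younger = ci_ybp
    return point_ybp * factor, (older * factor, younger * factor)


if __name__ == "__main__":
    # Illustrative input only: 4000 ybp (95% CI 4300 <-> 3700).
    for factor in (1.10, 1.15, 1.20):
        point, (older, younger) = corrected_tmrca(4000, (4300, 3700), factor)
        print(f"x{factor:.2f}: {point:.0f} ybp (95% CI {older:.0f} <-> {younger:.0f})")
```

    Note that scaling the CI bounds by the same factor also widens the interval proportionally, which is what a uniform change in the assumed mutation rate would do.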


  #7
    Wing Genealogist (Moderator)
    Posts: 892
    Location: Waterville, ME
    Ethnicity: Great Migration Colonists
    Nationality: American
    Y-DNA (P): R1b-U106 (S10415)
    mtDNA (M): J1c2g (FMS)
    Y-DNA (M): I1a-P109 (23andMe)
    mtDNA (P): T2b3 (23andMe)
    Quote (originally posted by pmokeefe):
    Would it be worth realigning our old samples against the closest haplogroup available from these new papers? It's a pity there's no sample from haplogroup I; presumably the J samples are the closest?
    I believe that once long-read technologies are adopted, it will be important to develop multiple reference samples (likely broken down by haplogroup) and to compare each sample against the closest available reference. Overall, I believe the Y-DNA tests commercially available today do not cover the regions where most of the large-scale differences are found. Given the fragmentary nature of ancient DNA, I don't believe this will have any real impact on that area of study (although I would love to be proven wrong!)
    Gedmatch DNA: M032736 Gedcom: 6613110.
    Gedmatch Genesis: WH4547538
    co-administrator: Y-DNA R-U106 Haplogroup Project


  #8
    MacUalraig (Registered Users)
    Posts: 1,623
    Location: Glasgow, Scotland
    Ethnicity: Pictland/Deira
    Y-DNA (P): R1b-M222-FGC5864
    mtDNA (M): H5r*

    Quote (originally posted by Wing Genealogist):
    I believe that once long-read technologies are adopted, it will be important to develop multiple reference samples (likely broken down by haplogroup) and to compare each sample against the closest available reference. Overall, I believe the Y-DNA tests commercially available today do not cover the regions where most of the large-scale differences are found. Given the fragmentary nature of ancient DNA, I don't believe this will have any real impact on that area of study (although I would love to be proven wrong!)
    We had similar discussions last December; certainly, if I were in hg I, I would give hg J alignment a pass.
    YSEQ:#37; YFull: YF01405 (Y Elite 2013)
    WGS (Full Genomes Nov 2015, YSEQ Feb 2019, Dante Mar 2019, FGC-10X Linked Reads Apr 2019, Dante-Nanopore May 2019, Chronomics Jan 2020, Sano Genetics Feb 2020, Nebula Genomics June 2020, YSEQ WGS400 Feb 2022)
    Ancestry GCs: Scots in central Scotland & Ulster, Ireland; English in Yorkshire & Pennines
    FBIMatch: A------ (autosomal DNA) for segment matching DO NOT POST ADMIXTURE REPORTS USING MY KIT


  #9
    Ebizur (Registered Users)

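    I have now checked the R1b section of Figure S1 in full detail, and it turns out that YFull's TMRCA estimates for subclades of R-P312, which subsumes most of the major Western European lineages under haplogroup R1b(xR-U106), are also probably severe underestimates. For each node below, the first line gives this study's estimate in kya with the 95% HPD interval in parentheses, the second gives YFull's estimate in years before present (ybp), and the third gives the ratio of the two point estimates: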

    TMRCA Q-M3: 14 (11.6, 16.6)
    YFull: 12500 (95% CI 13000 <-> 12000) ybp
    1.120x

    TMRCA (R1+R2): 30.6 (26.5, 35.3)
    YFull: 28200 (95% CI 30500 <-> 25900) ybp
    1.085x

    TMRCA (R1a+R1b): 25.2 (21.6, 29.2)
    YFull: 22800 (95% CI 25100 <-> 20500) ybp
    1.105x

    TMRCA (R-DF13+(R-DF27+R-U152)): 5.6 (4.7, 6.7)
    YFull: TMRCA R-P312 4500 (95% CI 5300 <-> 3700) ybp
    1.244x

    TMRCA R1b1a1a2a1a2c1a-CTS241/DF13/S521: 4.9 (4, 5.9)
    YFull: 4100 (95% CI 4300 <-> 3900) ybp
    1.195x

    TMRCA (R-DF27+R-U152): 5.5 (4.6, 6.5)
    YFull: TMRCA R-P312 4500 (95% CI 5300 <-> 3700) ybp
    1.222x

    TMRCA R1b1a1a2a1a2b-PF6570/S28/U152: 5.3 (4.4, 6.3)
    YFull: 4500 (95% CI 5300 <-> 3700) ybp
    1.178x

    TMRCA R1b1a1a2a1a2b1a1-L20/S144: 4.6 (3.7, 5.6)
    YFull: 4000 (95% CI 4300 <-> 3700) ybp
    1.150x
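    The ratios above are simply the study's point estimate (converted from kya to years) divided by YFull's point estimate. A minimal Python sketch reproducing them from the numbers quoted above; the shortened node labels and the name `tmrca_ratio` are my own, and the HPD and CI intervals are ignored.

```python
# (node label, study TMRCA in kya, YFull TMRCA in ybp), as quoted above.
ESTIMATES = [
    ("Q-M3", 14.0, 12500),
    ("R1+R2", 30.6, 28200),
    ("R1a+R1b", 25.2, 22800),
    ("R-DF13+(R-DF27+R-U152)", 5.6, 4500),  # compared against YFull's R-P312
    ("R-DF13", 4.9, 4100),
    ("R-DF27+R-U152", 5.5, 4500),  # compared against YFull's R-P312
    ("R-U152", 5.3, 4500),
    ("R-L20", 4.6, 4000),
]


def tmrca_ratio(study_kya: float, yfull_ybp: float) -> float:
    """Ratio of the study's TMRCA to YFull's, after converting kya to years."""
    return study_kya * 1000 / yfull_ybp


if __name__ == "__main__":
    for label, kya, ybp in ESTIMATES:
        print(f"{label:<24} {tmrca_ratio(kya, ybp):.3f}x")
```

    Since the nodes are nested, these ratios are not independent data points, so averaging them would overstate how much support there is for any single correction factor.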

    T2T-Y sequencing also may be of interest for its potential to resolve multifurcations in the phylogeny. According to the present study's Figure S1, R-DF27 and R-U152 should form a clade vis-à-vis R-DF13.


  #10
    SarahMaludongsMotherInLaw
    Quote (originally posted by Ebizur):
    It is a pity that the authors have not included any member of haplogroup C2, haplogroup D, haplogroup I, haplogroup L, haplogroup T, haplogroup M, or haplogroup S in their sample set.

    Judging from a comparison between the TMRCA estimates in Figure S1 and those of corresponding nodes on the YFull tree, haplogroup NO and haplogroup O should see the greatest increases in TMRCA through recalibration: the TMRCA estimate for the NO-M214 node according to the present study's T2T-Y sequences is 1.231x that of YFull's estimate, and the present study's TMRCA estimate for the O-M175 node is 1.220x that of YFull's estimate.

    The 1.10x correction factor that many people have opined should be applied to YFull's TMRCA estimates to make them align more closely with other data, such as radiocarbon datings of archaeological specimens, appears to be quite on the mark in regard to subclades of haplogroup QR-M45. However, I would say that one may reasonably apply a 1.15x correction factor to YFull's TMRCA estimates for most clades, and subclades of haplogroup O may require a correction factor of 1.20x or more.
    What is going on here?



    I think this picture is better, because it includes a very ancient Y chromosome, D0, from Africa. It therefore depicts a DE population separating from a CF population more accurately than the article under discussion, which has no haplogroup D Y chromosome. What this new article shows is simply a more isolated population in Africa that contributed Y haplogroup E, part of which separated 77,800 years ago and contributed to Eurasians. That population is not necessarily ancestral to all Eurasians, so the split times in question more likely reflect when it dispersed, having also split from the population that contributed haplogroup E. The new article's dates should not be taken as the actual timing of the dispersal of the largest haplogroups. These patterns, and even entire haplogroup trees, change from one article to the next, which may itself be meaningful; but that does not mean any of these drawings represents the final, true picture of the past.
    Last edited by SarahMaludongsMotherInLaw; 12-05-2022 at 12:14 AM.
