Thread: Dante Labs (WGS)

  1. #1171
    Gold Class Member
    Posts
    1,075
    Sex
    Location
    Birmingham, UK
    Ethnicity
    Indian - Punjabi Jatt
    Nationality
    British
    Y-DNA (P)
    R2-SK2142 > A26340
    mtDNA (M)
    U7a3a
    Y-DNA (M)
    R1b-Z2109 > Y35099
    mtDNA (P)
    M5a1a

    England United Kingdom India India Punjab
    On an unrelated note, even with minimap2 or any other non-alt-aware aligner, you should still be able to map against unlocalised and decoy contigs, as these, unlike alt contigs, do not correspond to a known position on the primary assembly.

    For YFull analysis, this is unnecessary as the only unlocalised region relevant to chrY is the chrY_KI270740v1_random region, and YFull have told me that they don't look at this region.

    Decoys also don't appear to be that necessary to use as minimap2 is already a very quick aligner on usegalaxy.eu. No harm in it, though.

    Just thought this was worth mentioning

    EDIT: The NCBI FTP server states that the GRCh38 "no-alt analysis set" contains unplaced and unlocalised contigs, as well as the Epstein-Barr virus genome, which I believe serves as a decoy. There is also a "no-alt plus hs38d1 analysis set" that contains everything from the no-alt analysis set, with additional decoy contigs. I would recommend using the "no-alt plus hs38d1 analysis set" for non-alt aware aligners such as minimap2.
    Last edited by aaronbee2010; 01-07-2020 at 03:43 PM.
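To check which kind of contig a given reference sequence is before deciding whether an alt-aware aligner is needed, here is a minimal Python sketch. It assumes the UCSC-style names used in the analysis sets (for example chrY_KI270740v1_random, mentioned above); the function name and the category labels are my own, not part of any tool.

```python
def classify_contig(name: str) -> str:
    """Classify a GRCh38 analysis-set contig by its (assumed) UCSC-style name."""
    if name == "chrEBV" or name.endswith("_decoy"):
        return "decoy"          # EBV and hs38d1-style decoy sequences
    if name.endswith("_alt") or name.startswith("HLA-"):
        return "alt"            # alternate loci: these need alt-aware handling
    if name.endswith("_random"):
        return "unlocalised"    # chromosome known, position unknown
    if name.startswith("chrUn_"):
        return "unplaced"       # chromosome unknown
    return "primary"


if __name__ == "__main__":
    for contig in ("chr1", "chrY_KI270740v1_random", "chrUn_KI270302v1",
                   "chr1_KI270762v1_alt", "chrEBV"):
        print(contig, "->", classify_contig(contig))
```

Anything reported as "alt" is what a non-alt-aware aligner should not see; the other classes can stay in the reference.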
    FTDNA: IN41220, YFull: YF67680 (FTDNA)

    Ancestral Haplogroups (Punjabi Jatt) - Italics = Predicted:
    * Father: R2-M479 > M124 > M9710 > SK2142 > Y1383* (Z6135-) > A26340 (Novel) - M5a1a
    * Maternal Uncle: R1b-M343 > M269 > Z2103 > Z2109 > Y35099 - U7a3a
    * MGM's MGF: R1a-M420 > M417 > Z93 > L657 > Y7 - ?

    Friends haplogroups:
    * North Moroccan Berber: E-P96 > M35 > L19 > M81 > M183 - R0
    * Gujarati Lohana: T-M184 > M70 > Y11151 - R30b1


  2. #1172
    Registered Users
    Posts
    67
    Sex
    Nationality
    Finnish
    Y-DNA (P)
    I-Y40511
    mtDNA (M)
    U5b1-a2h2d2a1a2

    Finland
    Yeah, these alt contigs and their naming seem unnecessarily complicated. You also have to map the dbSNP VCF to the same naming convention if you are going to do variant calling yourself. I'll try to write a script that builds that extra/alt file from the diff between GRCh38_major_release_seqs_for_alignment_pipelines and GCA_000001405.28_GRCh38.p13. I don't feel like copy-pasting those sections manually.
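The renaming step for the dbSNP VCF could look roughly like the Python sketch below. It is only an illustration: the function name is made up, the mapping from dbSNP contig names to your reference's names has to be supplied by you (the single entry in the example is a placeholder), and records on contigs missing from the mapping are simply dropped.

```python
import re
from typing import Dict, Iterable, Iterator

# Matches the ID field of a "##contig=<ID=...,length=...>" header line.
_CONTIG_ID = re.compile(r"^(##contig=<ID=)([^,>]+)")


def rename_vcf_contigs(lines: Iterable[str],
                       mapping: Dict[str, str]) -> Iterator[str]:
    """Yield VCF lines with contig names translated through `mapping`."""
    for line in lines:
        if line.startswith("##contig="):
            # Header: rewrite the ID if we know the contig, else keep as is.
            m = _CONTIG_ID.match(line)
            if m and m.group(2) in mapping:
                line = m.group(1) + mapping[m.group(2)] + line[m.end():]
            yield line
        elif line.startswith("#"):
            # Other header lines pass through untouched.
            yield line
        else:
            # Data line: the contig name is the first tab-separated column.
            chrom, sep, rest = line.partition("\t")
            if chrom in mapping:
                yield mapping[chrom] + sep + rest
            # Records on contigs without a mapping are dropped.


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "##contig=<ID=NC_000001.11,length=248956422>\n",
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "NC_000001.11\t10001\trs1\tT\tA\n",
    ]
    # Placeholder mapping; build the real one for your own reference.
    for out_line in rename_vcf_contigs(example, {"NC_000001.11": "chr1"}):
        print(out_line, end="")
```

For a full dbSNP file you would stream a gzip-opened file through this rather than holding it in a list, then recompress and index the output with your usual tools.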

  3. The Following User Says Thank You to tontsa For This Useful Post:

     aaronbee2010 (01-07-2020)

  4. #1173
    Registered Users
    Posts
    423
    Sex

    The decoy contigs are there to serve as a sink for reads which don't originate from the primary human genome, and would otherwise end up forced to map against the primary assembly whether they actually belong there or not. Heng Li published some preliminary results indicating this does improve the final results; it also requires extra memory and in many cases would probably be slower. But if you're using a free outside processing service, why would you care about efficiency?

    However, for saliva-derived samples, be aware that the majority of the non-primary-assembly reads actually originate from the oral microbiome, so the blood-sequencing decoys like hs38d1 don't make that much difference; there's going to be a lot of off-target sequence from saliva. A good haplotype-aware variant caller may be able to ignore most of it, as it detects that the variants belong to a different haplotype (i.e. chromosome/genome), but it's going to skew things.

    Also note that including or excluding the unlocalized contigs will change your results for similar reasons. It's basically true that these don't require an alt-aware caller, but the results will still be affected, because reads can preferentially map to the unlocalized contig or get folded onto the primary contig. This is actually not very well accounted for in sequencing now, and they might benefit from being treated more like alt contigs.

    And finally, if you roll your own from the genome patches etc., the prime thing to be aware of is that a lot of the hs38d1 decoy sequences are actual human genome that is included in the patches, so you have to handle duplicates if you combine patches and decoy.

    A lot of this is detailed already in the "Dante Labs technical" thread on this forum, and done in my scripts like https://github.com/Donwulff/bio-tool...8_bwa_index.sh - on which, by the way, I'm eager to receive feedback and improvements. The current resource naming requires a bit of work (the HLA file updates but keeps the same name, so I have to handle that). Of course, I'm not sure if it makes sense to turn that into a completely fire & forget script; you'll need prerequisites and should preferably know what you're doing.
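To illustrate the duplicate handling when patches and decoys are combined, here is a minimal Python sketch. It is not what the linked script does; it only shows the simplest possible case. It drops a sequence when an identical one (or its exact reverse complement) has already been seen. Partial overlaps between a patch and a decoy would need an actual alignment step, which this does not attempt. All function names are made up for the example.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Tuple

# Complement table for plain bases; other IUPAC codes are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def canonical_digest(seq: str) -> str:
    """Hash that is identical for a sequence and its reverse complement."""
    forward = seq.upper()
    reverse = forward.translate(_COMPLEMENT)[::-1]
    return hashlib.sha256(min(forward, reverse).encode()).hexdigest()


def drop_duplicates(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep the first record for each distinct sequence, skip exact repeats."""
    seen: Dict[str, str] = {}
    for name, seq in records:
        digest = canonical_digest(seq)
        if digest in seen:
            continue  # same sequence already kept under seen[digest]
        seen[digest] = name
        yield name, seq


if __name__ == "__main__":
    patches = [">patch1 example", "ACGT", "TTAA"]
    decoys = [">decoy1", "TTAAACGT", ">decoy2", "GGGG"]
    for name, seq in drop_duplicates(read_fasta(patches + decoys)):
        print(name, len(seq))
```

Feeding the patches first means a patch wins over an identical decoy, which keeps the sequence under its human-assembly name.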

  5. The Following 2 Users Say Thank You to Donwulff For This Useful Post:

     aaronbee2010 (01-09-2020),  karwiso (01-11-2020)

  6. #1174
    Gold Class Member
    Posts
    2,064
    Sex
    Location
    Florida, USA.
    Ethnicity
    English, Scottish & Irish
    Nationality
    American
    Y-DNA (P)
    I-A13252
    mtDNA (M)
    H1e2
    mtDNA (P)
    K1

    England Scotland Ireland Prussia Italy Two Sicilies United States of America
    I'm not a Dante Labs customer but thought this might be of interest to some:


    BC Platforms Partners with Dante Labs to Build Europe's Largest Next Generation Sequencing Laboratory for Private and Public Customers


    https://markets.businessinsider.com/...qcy6xqut7epqcu
    Paper Trail: 43.8% English, 29.7% Scottish, 12.5% Irish, 6.25% German, 6.25% Italian & 1.5% French. Or: 86% British Isles, 6.25% German, 6.25% Italian & 1.5% French.
    LDNA: 88.1% British Isles (59.7% English, 27% Scottish & 1.3% Irish), 5.9% Europe South (Aegian 3.4%, Tuscany 1.3%, Sardinia 1.1%), 4.4% Europe NW (Scandinavia) & 1.6% Europe East, (Mordovia).
    BigY 700: I1-Z140 >I-F2642 >Y1966 >Y3649 >A13241 >Y3647 >A13248 (circa 620 AD) >A13242/YSEQ (circa 765 AD) >A13252/YSEQ (circa 1630 AD).

  7. The Following 2 Users Say Thank You to JMcB For This Useful Post:

     aaronbee2010 (01-09-2020),  pmokeefe (01-09-2020)

  8. #1175
    Gold Class Member
    Posts
    1,075
    Sex
    Location
    Birmingham, UK
    Ethnicity
    Indian - Punjabi Jatt
    Nationality
    British
    Y-DNA (P)
    R2-SK2142 > A26340
    mtDNA (M)
    U7a3a
    Y-DNA (M)
    R1b-Z2109 > Y35099
    mtDNA (P)
    M5a1a

    England United Kingdom India India Punjab
    Quote Originally Posted by JMcB View Post
    I’m not a Dante Labs customer but thought this might be of interest to some:


    BC Platforms Partners with Dante Labs to Build Europe's Largest Next Generation Sequencing Laboratory for Private and Public Customers


    https://markets.businessinsider.com/...qcy6xqut7epqcu
    I like the idea of workflow automation, as it has the potential to drive prices down further. More machines should mean less time to process a given number of kits, which, like automation, can also reduce the amount of labour needed.

    More than anything else, this can translate to more reliable results delivery, which I really hope will happen. I would also like to see some competitors pop up, which can help lower WGS prices for the consumer as a whole.

    I'm not 100% optimistic, but this does have the potential to be a big step forward for bringing WGS to the mainstream.

  9. The Following User Says Thank You to aaronbee2010 For This Useful Post:

     pmokeefe (01-09-2020)

  10. #1176
    Gold Class Member
    Posts
    1,075
    Sex
    Location
    Birmingham, UK
    Ethnicity
    Indian - Punjabi Jatt
    Nationality
    British
    Y-DNA (P)
    R2-SK2142 > A26340
    mtDNA (M)
    U7a3a
    Y-DNA (M)
    R1b-Z2109 > Y35099
    mtDNA (P)
    M5a1a

    England United Kingdom India India Punjab
    Quote Originally Posted by Donwulff View Post
    The decoy contigs are there to serve as a sink for reads which don't originate from the primary human genome, and would otherwise end up forced to map against the primary assembly whether they actually belong there or not. Heng Li published some preliminary results indicating this does improve the final results; it also requires extra memory and in many cases would probably be slower. But if you're using a free outside processing service, why would you care about efficiency?
    I was under the impression that decoys speed up the alignment process. Here's a quote I found that says this:

    But actually from what I hear, the major motivation for people to use the decoy genome is speed. If you include the decoy in your reference genome when you do the original alignment, many reads will quickly find a very confident alignment in the decoy, thus avoiding countless compute cycles spent trying to Smith-Waterman align it to someplace it doesn’t belong. This purpose – siphoning off these reads to keep them from slowing down the whole alignment – is why this is called the ‘decoy genome.’
    Of course, I'm not treating the above quote as divine truth, but it appears to make sense, at least to me.

    Quote Originally Posted by Donwulff View Post
    However, for saliva-derived samples, be aware that the majority of the non-primary-assembly reads actually originate from the oral microbiome, so the blood-sequencing decoys like hs38d1 don't make that much difference; there's going to be a lot of off-target sequence from saliva. A good haplotype-aware variant caller may be able to ignore most of it, as it detects that the variants belong to a different haplotype (i.e. chromosome/genome), but it's going to skew things.
    That's very interesting. I take it that downloading oral microbiome decoys and merging them into one decoy file might be a good idea for saliva-derived samples?

    Quote Originally Posted by Donwulff View Post
    Also note that including or excluding the unlocalized contigs will change your results for similar reasons. It's basically true that these don't require an alt-aware caller, but the results will still be affected, because reads can preferentially map to the unlocalized contig or get folded onto the primary contig. This is actually not very well accounted for in sequencing now, and they might benefit from being treated more like alt contigs.
    That's a fair point, if some decoys behave like alt contigs, then they could be subject to the same reduction in coverage and mapping quality for certain positions.

    Quote Originally Posted by Donwulff View Post
    And finally, if you roll your own from the genome patches etc., the prime thing to be aware of is that a lot of the hs38d1 decoy sequences are actual human genome that is included in the patches, so you have to handle duplicates if you combine patches and decoy.
    Looking at alt-aware BAM files I've made with BWA MEM, there are a few cases where reads map to more than one alt contig (or even primary contig), even without the presence of patches. I'm assuming that the MAPQ on the target region of the primary assembly would be fine with multiple alternative alignments. The hg38p13+alt+decoy BAM I made had slightly better MAPQ and coverage, and fewer no-calls, than the hg38+alt+decoy BAM, but those were made with basic .alt indexes without CIGAR strings or edit distances, so they didn't work with the post-alt script in bwakit when I tried. I have just aligned a BAM with hs38DH that did work with post-alt processing. I'll need to align some patch contigs to the primary assembly for their target chromosomes, but when I've tried, the alignment gets split across several lines in the SAM file instead of the single line I'm after.

    Quote Originally Posted by Donwulff View Post
    A lot of this is detailed already in the "Dante Labs technical" thread on this forum, and done in my scripts like https://github.com/Donwulff/bio-tool...8_bwa_index.sh - on which, by the way, I'm eager to receive feedback and improvements. The current resource naming requires a bit of work (the HLA file updates but keeps the same name, so I have to handle that). Of course, I'm not sure if it makes sense to turn that into a completely fire & forget script; you'll need prerequisites and should preferably know what you're doing.
    I'm quite busy with exams this month, but this is something I would like to have a look at afterwards.
    Last edited by aaronbee2010; 01-11-2020 at 05:28 PM.

  11. The Following User Says Thank You to aaronbee2010 For This Useful Post:

     pmokeefe (01-11-2020)

  12. #1177
    Junior Member
    Posts
    2
    Sex

    Quote Originally Posted by Mikewww View Post
    How does the hard drive method of transmission work in terms of safety? Does it provide extra security, or are there holes in this method?

    I have some information on my home systems cordoned off network-wise, and I only back it up with portable hard drives that go into safes. Maybe I'm overdoing it, but there is some data I want only my family to be able to retrieve. Hard drives are cheap and everyone should have a solid fireproof safe.
    I totally agree. I recently bought a small safe for the things I'd like to keep to myself. I found a nice portable fireproof one, the SentrySafe 1200 Fireproof Box, and if I need my info with me, it's a perfect solution. Maybe it's a bit old-fashioned, but I feel comfortable knowing all my materials are safe.

  13. #1178
    Registered Users
    Posts
    39
    Sex
    Location
    Vermont, USA
    Ethnicity
    NW European
    Nationality
    American
    Y-DNA (P)
    R1b-Z295*
    mtDNA (M)
    U2e1j1

    Quote Originally Posted by Staystay View Post
    I totally agree. I recently bought a small safe for the things I'd like to keep to myself. I found a nice portable fireproof one, the SentrySafe 1200 Fireproof Box, and if I need my info with me, it's a perfect solution. Maybe it's a bit old-fashioned, but I feel comfortable knowing all my materials are safe.
    Please pardon the off-topic post, but anyone considering a fireproof safe as a storage method for magnetic media like hard drives must consider that most fireproof safes are designed to protect paper documents (keeping contents below 451°F), not magnetic media (which need to stay below 125-150°F). CDs and DVDs may be able to make it up to 350°F or so. A fire safe designed for paper products will not protect drives or discs.

    In my case, I go for offsite storage for a backup copy of my and my wife's WGS results and analysis run on the data.

  14. #1179
    Junior Member
    Posts
    6
    Sex

    Finally got my BAM and FASTQ files.
    Had to wait 1 year and 4 months.
    I was able to generate my gVCF file from my BAM; it took 1,564.82 minutes (about 26 hours) on a high-end laptop from 2015.
    I don't want to know how long it would take to generate a new BAM from my FASTQ files mapped to a newer reference than hg19.
    These bioinformatics tools are so slow.

    Since I don't really trust Dante Labs after all the issues and the customer messages I needed to send to get my results,
    I want to test the results to see if it really is my data.
    Does anyone know how accurate a microarray like 23andMe's is for SNPs?
    I was thinking of ordering a test there to compare my results.
    My plan is to annotate my VCF files from the WGS with the dbSNP database to get the rs numbers and then filter for the high-quality calls.
    Then I'd write a little Python script to find conflicts between the 23andMe SNPs and the WGS SNPs by checking the bases and the zygosity.
    Is this a solid plan?
    Would I get a 99%+ match?
    I don't really have a background in bioinformatics, so I don't know the accuracy of these two technologies; that's why I'm asking.
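A comparison script along those lines might look like the Python sketch below. It is only a starting point under several assumptions: the array file is in the tab-separated 23andMe raw-data layout (rsid, chromosome, position, genotype, with "#" comment lines), the VCF has already been annotated with rsIDs and has one sample column, both files report alleles on the same strand, and only simple biallelic-style SNP calls are compared. Sites missing from a plain VCF may just be homozygous reference, so they are left out of the count rather than treated as conflicts. All function names are made up for the example.

```python
from typing import Dict, Iterable, Tuple

BASES = {"A", "C", "G", "T"}


def parse_array(lines: Iterable[str]) -> Dict[str, str]:
    """Read array genotypes as {rsid: sorted two-letter genotype}."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        rsid, genotype = fields[0], fields[3]
        # Skip no-calls ("--"), indel codes and single-allele (hemizygous) calls.
        if len(genotype) != 2 or not set(genotype) <= BASES:
            continue
        calls[rsid] = "".join(sorted(genotype))
    return calls


def parse_vcf(lines: Iterable[str]) -> Dict[str, str]:
    """Read diploid SNP genotypes from a single-sample, rsID-annotated VCF."""
    calls = {}
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 10 or len(f[3]) != 1:
            continue  # need a sample column and a single-base REF
        alleles = [f[3]] + f[4].split(",")
        keys = f[8].split(":")
        if "GT" not in keys:
            continue
        gt = f[9].split(":")[keys.index("GT")].replace("|", "/").split("/")
        if len(gt) != 2 or "." in gt:
            continue
        called = [alleles[int(i)] for i in gt]
        if any(a not in BASES for a in called):
            continue  # indel or symbolic allele
        for rsid in f[2].split(";"):
            if rsid.startswith("rs"):
                calls[rsid] = "".join(sorted(called))
    return calls


def concordance(array: Dict[str, str], wgs: Dict[str, str]) -> Tuple[int, int]:
    """Return (matching genotypes, SNPs present in both call sets)."""
    shared = array.keys() & wgs.keys()
    matches = sum(1 for rsid in shared if array[rsid] == wgs[rsid])
    return matches, len(shared)
```

Usage would be something like `matches, total = concordance(parse_array(open("array.txt")), parse_vcf(open("wgs.vcf")))` with your own file names; printing the mismatching rsIDs as well makes it easier to see whether conflicts cluster in particular regions.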

  15. The Following User Says Thank You to newuser For This Useful Post:

     pmokeefe (01-20-2020)

  16. #1180
    Registered Users
    Posts
    67
    Sex
    Nationality
    Finnish
    Y-DNA (P)
    I-Y40511
    mtDNA (M)
    U5b1-a2h2d2a1a2

    Finland
    You can use DNA Kit Studio to compare how close your Dante reads are to the microarray. I've verified my two Dante kits against raw files from Evogenom and MyHeritage, and they are as close as they can get.

  17. The Following User Says Thank You to tontsa For This Useful Post:

     teepean47 (01-20-2020)
