Page 9 of 123
Results 81 to 90 of 1223

Thread: Dante Labs (WGS)

  1. #81
    Registered Users
    Posts
    97
    Sex
    Location
    Stockholm, Sweden
    Ethnicity
    Ingrian + Russian
    Nationality
    Swedish, Russian
    Y-DNA (P)
    R1a1a1b1-Z92
    mtDNA (M)
    J1c3k

    Finland Russian Federation Sweden
    Hi!

    I contacted Dante Labs and they sent me two USB-drives with FASTQ and BAM files. The BAM is aligned to GRCh37.
    I have submitted the results to YFull.com to get my Y-SNPs.

    I also uploaded the BAM (120 GB) to sequencing.com, and it took a day or two until it showed up in my list of files. The Genome VCF app takes BAM files as input, and it produced a gVCF file in a day or two.
    I tried to upload the gVCF to GEDmatch Genesis, but the file was not accepted because it was too large (1.1 GB). So, bad news: we cannot use a gVCF with GEDmatch. I hope that they will develop a better matching algorithm and work with "usual" VCFs.

  2. #82
    Registered Users
    Posts
    445
    Sex

    I need to keep better track of my "experiments". I think the genomic VCF failed for me at GEDmatch Genesis too, but the last time I tried they had also stopped accepting VCF files directly from Sequencing.com, because the sharing URL doesn't end in a VCF extension. Originally I got around that by appending something like "&name=sample1.vcf.gz" to the URL (Sequencing.com ignores it, but the receiving service may interpret it as the file type), but GEDmatch stopped accepting that. I'm wondering what I was sending if the genomic VCF was too large, though...

    One solution would be to filter the VCF down to calls at the "known" sites in dbSNP. This has the added benefit of filtering out novel SNP calls, which might be errors. Rare SNPs that have emerged during the last few generations have little significance for relative matching anyway, because relatives already share so many other SNPs, and the rare ones could have emerged after the MRCA. Unfortunately I don't think Sequencing.com can do that filtering, and GEDmatch Genesis doesn't do it yet.
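A minimal sketch of that filtering idea in Python, assuming the VCF has already been annotated so that known sites carry a dbSNP rsID in the ID column (that annotation step is an assumption, it is not something Sequencing.com or GEDmatch is known to do). The function name and the demo records, including the rsID, are made up for illustration.

```python
def filter_known_sites(lines):
    """Yield VCF header lines plus the records whose ID column holds an rsID."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the column header
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is ID; multiple IDs are separated by ';'
        ids = fields[2].split(";") if len(fields) > 2 else []
        if any(i.startswith("rs") for i in ids):
            yield line


# Made-up demo records: one "known" site, one novel call, one gVCF reference block
demo = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t1000\trs0000001\tA\tG\t50\tPASS\t.\n",
    "1\t2000\t.\tC\tT\t12\tPASS\t.\n",
    "1\t3000\t.\tG\t<NON_REF>\t.\tPASS\tEND=3500\n",
]
kept = list(filter_known_sites(demo))
print("".join(kept), end="")
```

Records without an ID, which normally includes the reference-block records that make a gVCF so large, are dropped along with the novel calls, so the output should be much smaller than the input.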

    Another option worth considering is still imputation, because it produces calls only for locations included in the reference panel. It would also fill in uncertain and uncalled locations in the genome; I'm not sure if there's any data on whether it's better to leave them uncalled or fill them in with imputation, though. It depends a lot on the matching algorithm.

    Currently I'm testing my Dante Labs sequence, run through the Sanger Imputation Service, on GEDmatch Genesis. That significantly reduces the false positives, though it's far less straightforward than you'd want for consumer genetics. Also, for lack of notes I'm not certain, but I think I may have included my Genes for Good results in the imputation. That is great for data quality, but after that you're pretty much just doing microarray vs. microarray comparisons, since few people have whole genomes yet.

    Edit: I don't use 23andMe data for imputation because some of 23andMe's custom probes have the reverse orientation from what's expected, and I have STILL to get around to figuring out which ones... Also, the extension is .vcf.gz because the file is compressed.
    Last edited by Donwulff; 01-10-2018 at 08:24 PM.

  4. #83
    Registered Users
    Posts
    445
    Sex

    On the other hand, Promethease.com still has free interpretation (and, if you register and allow them to use your de-identified data, free report updates) until 15th January, so be sure to take advantage of that before then if you've got your data and aren't prone to worrying and hypochondria. Read their disclaimers carefully, though; I hate it when people take a Promethease interpretation as absolute truth or call it superior to clinically validated tests. The genomic VCF *does* work with them, although there are some caveats around calls with low sequencing depth or quality. If you have microarray tests and relatives' results (with consent!), all the better for spotting potentially conflicting calls.

  6. #84
    Registered Users
    Posts
    277
    Sex
    Location
    Wisconsin, USA
    Nationality
    American
    Y-DNA (P)
    R1b-FGC29071
    mtDNA (M)
    U5a1b1g*

    Ireland England Netherlands Germany France
    If one (or more) of you who have received your Dante Labs WGS BAM back would consider sharing the file, I'd like to get the Y chromosome coverage stats added to: http://www.haplogroup-r.org/stats.html In return I'll send back a gVCF aligned to hg38 and a callable status report in BED format. (This may take up to two weeks.)

    The BAM needs to be at a URL that can be retrieved via an HTTP or FTP pull, since my host cannot currently accommodate an upload that large.

    Use the BAM analysis submission form: http://www.haplogroup-r.org/submit.html It has a link to the Data Use policy the site abides by.

  8. #85
    Registered Users
    Posts
    445
    Sex

    Quote Originally Posted by JamesKane View Post
    If one (or more) of you who have received your Dante Labs WGS BAM back would consider sharing the file, I'd like to get the Y chromosome coverage stats added to: http://www.haplogroup-r.org/stats.html In return I'll send back a gVCF aligned to hg38 and callable status report in BED format. (This may take up to two weeks.)

    The BAM needs to be in URL that can be retrieved via a HTTP or FTP pull, since my host can not currently accommodate an upload that large.

    Use the BAM analysis submission form: http://www.haplogroup-r.org/submit.html It has a link to the Data Use policy the site abides by.
    Hey, is this for haplogroup R only, as could be inferred from the site, or can all haplogroups submit? Also, I've already done the hg38 mapping myself; however, I've had trouble identifying the adapter sequences for the purposes of de novo alignment. I used READ_NAME_REGEX="CL1000XXXXXL2C([0-9]+)R([0-9]+)_([0-9]+)" (I X'd out the sample ID here; I hard-coded it for performance) as the read-name template for optical duplicate detection, since the read names don't conform to normal bcl2fastq naming either. Could you figure out the adapter sequence? Are those sequences barcoded or degraded?
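To illustrate what that read-name template extracts, here is a small Python sketch. Picard's READ_NAME_REGEX expects three capture groups, read in order as tile, x and y. The pattern below is a generalised variant, not the exact one from the post: the X'd-out ID and the lane number are matched with `[0-9]+`, which is an assumption, and the example read name is entirely made up. Picard itself uses Java regular expressions, so this only checks the pattern logic.

```python
import re

# Generalised variant of the posted pattern (assumption: the masked ID and
# the lane number consist of digits only).
READ_NAME_REGEX = re.compile(r"CL[0-9]+L[0-9]+C([0-9]+)R([0-9]+)_([0-9]+)")


def parse_read_name(name):
    """Return the three captured numbers as integers, or None if no match."""
    match = READ_NAME_REGEX.match(name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


example = "CL100000000L2C001R002_123456"  # hypothetical read name
print(parse_read_name(example))  # (1, 2, 123456)
```

A name in the usual colon-separated bcl2fastq style does not match this pattern and returns None, which is why a custom template is needed for these reads.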

  10. #86
    Registered Users
    Posts
    277
    Sex
    Location
    Wisconsin, USA
    Nationality
    American
    Y-DNA (P)
    R1b-FGC29071
    mtDNA (M)
    U5a1b1g*

    Ireland England Netherlands Germany France
    The primary focus of that site is indeed Haplogroup R. The NGS statistics page is more about the tests being used, though, and includes outgroup samples picked up from publicly released WGS results.

    As to your question on the adapter sequence, I believe it is standard practice on Illumina instruments to trim adapters before delivering the final data. Having not played with any of the de novo alignment tools, there's not much advice I can offer on workflow here. Check the usual places for people who deal with these types of questions daily. These are the ones I hit most frequently:

    https://www.researchgate.net
    https://www.biostars.org

  12. #87
    Registered Users
    Posts
    445
    Sex

    Thanks! I had checked those places already; unfortunately the problem is not well understood or answered on those sites. In addition, as I suggested, there seems to be something weird going on with the Dante Labs raw data (not the quality or quantity, mind you, just that it isn't fully standard). For those not familiar with sequencing technology and terms: the adapters are DNA sequences that are ligated to the DNA fragments to be sequenced, which are called "library inserts". They're used to attach the DNA fragments inside the sequencer. When the insert length is shorter than the read length, here 100 bases, adapter read-through occurs: the sequencer runs off the end of the insert and reads part of the adapter instead of actual sample sequence.

    When the sequencing reads are mapped against a reference genome, e.g. with BWA MEM, this poses little problem, as BWA MEM will "soft-clip" the read ends that do not match the reference, and the clipped bases will not be used to determine variations/mutations. Tallying up the soft-clipped sequences is also probably the safest way to try to figure out what an unknown adapter sequence might be. Unfortunately, what I find doesn't appear to match Illumina's list of standard adapter sequences (which is quite big, and in addition a lab could develop its own adapters for reasons unknown), and there also seems to be a large variety in the potential adapter sequences. I ran Trim Galore!/cutadapt before realizing that these are not standard Illumina adapters, so that didn't work.
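A rough sketch of the tallying idea in Python, working on SAM text lines (for example the output of `samtools view`). It collects the soft-clipped bases at each read's original 3' end and counts their first few bases. Function names, the 12-base prefix length and the demo adapter are all made up for illustration; this is not a tested adapter-discovery tool.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_clip(flag, cigar, seq):
    """Return the soft-clipped bases at the read's original 3' end, or ''."""
    # Hard-clipped bases are absent from SEQ, so ignore H operations.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return ""
    if flag & 0x10:
        # Reverse-strand reads are stored reverse-complemented, so the
        # original 3' end is the leading clip.
        n, op = ops[0]
        return revcomp(seq[:n]) if op == "S" and n > 0 else ""
    n, op = ops[-1]
    return seq[-n:] if op == "S" and n > 0 else ""


def tally_clips(sam_lines, prefix_len=12, min_len=12):
    """Count the first prefix_len bases of every sufficiently long 3' soft clip."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag, cigar, seq = int(fields[1]), fields[5], fields[9]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x904 or cigar == "*" or seq == "*":
            continue
        clip = three_prime_clip(flag, cigar, seq)
        if len(clip) >= min_len:
            counts[clip[:prefix_len]] += 1
    return counts


adapter = "AAGTCGGAGGCC"  # made-up sequence for the demo
fwd = "\t".join(["r1", "0", "chrY", "100", "60", "10M12S", "*", "0", "0",
                 "ACGTACGTAC" + adapter, "*"])
rev = "\t".join(["r2", "16", "chrY", "200", "60", "12S10M", "*", "0", "0",
                 revcomp(adapter) + "TTGCATTGCA", "*"])
counts = tally_clips([fwd, rev])
print(counts.most_common(5))
```

Soft clips also come from chimeric reads and real structural variation, so only prefixes that recur very often are worth a closer look.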

    Another thing of note is that since the DNA insert fragment is read from both ends (paired-end sequencing), when adapter read-through happens the insert portions of the two reads end up as mirror images (reverse complements) of each other. A few of the more recent adapter trimming programs, chiefly Trimmomatic, use this to identify the adapters. Unfortunately, even Trimmomatic expects the adapter sequence to resemble one of Illumina's standard adapters. I think I'm going to give AlienTrimmer, which I just heard of, a try, and/or write a script to tally the soft-clipped sequences from the BAM file and look for some kind of pattern in them. Of course, if even you don't know the adapter sequence, I guess I might as well see if Dante Labs can reveal it. One thought is that they could be using NovaSeq, and I haven't found technical details on Illumina's NovaSeq platform yet. Or could they be saving costs by using self-made adapters?
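The paired-end observation can be sketched as follows: if the insert is shorter than the read length, the first bases of read 1 equal the reverse complement of the first bases of read 2, and whatever follows in each read is adapter. This toy Python version uses exact matching only, which is a simplification (real trimmers tolerate sequencing errors), and the sequences and function name are made up.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_read_through(read1, read2, min_insert=10):
    """Return (insert_len, tail1, tail2) if the pair shows read-through, else None.

    tail1 and tail2 are the bases read past the end of the insert, i.e. the
    candidate adapter fragments seen in read 1 and read 2.
    """
    n = min(len(read1), len(read2))
    # Try the longest candidate insert first so a chance short match is not
    # preferred over the real overlap.
    for insert_len in range(n - 1, min_insert - 1, -1):
        if read1[:insert_len] == revcomp(read2[:insert_len]):
            return insert_len, read1[insert_len:], read2[insert_len:]
    return None


insert = "ACGTTGCAAGGCTA"   # made-up 14-base insert
adapter1 = "AAGTCG"         # made-up adapter fragments
adapter2 = "CCGGCG"
read1 = insert + adapter1
read2 = revcomp(insert) + adapter2
print(find_read_through(read1, read2))  # (14, 'AAGTCG', 'CCGGCG')
```

Collecting the tails over many read pairs and tallying them would give candidate adapter sequences without needing a reference genome.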

  14. #88
    Registered Users
    Posts
    277
    Sex
    Location
    Wisconsin, USA
    Nationality
    American
    Y-DNA (P)
    R1b-FGC29071
    mtDNA (M)
    U5a1b1g*

    Ireland England Netherlands Germany France
    Here are the raw callable loci stats for the Y chromosome on hg38 for the first Dante Labs 30x WGS I received:

    state                     nBases
    CALLABLE              14,278,386
    NO_COVERAGE               21,194
    LOW_COVERAGE             308,597
    POOR_MAPPING_QUALITY   9,065,418

    The CALLABLE state marks regions covered by 4 or more reads, where at least 90% of the reads have a PHRED-scaled alignment (mapping) quality of at least 10. The median CALLABLE value for a 30x WGS test with 150 base pair reads is 14,922,530. Where the two differ is in the regions assigned POOR_MAPPING_QUALITY or LOW_COVERAGE. I won't comment much more on those differences until I've seen a few more samples to get an average.
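For anyone who wants the figures above as proportions, a quick Python calculation using only the numbers quoted in this post (the variable names are arbitrary):

```python
callable_loci = {
    "CALLABLE": 14_278_386,
    "NO_COVERAGE": 21_194,
    "LOW_COVERAGE": 308_597,
    "POOR_MAPPING_QUALITY": 9_065_418,
}
median_150bp_callable = 14_922_530  # median CALLABLE for 30x, 150 bp reads

total = sum(callable_loci.values())
fractions = {state: n / total for state, n in callable_loci.items()}

for state, n in callable_loci.items():
    print(f"{state:<22}{n:>12,}{100 * fractions[state]:>8.2f}%")

shortfall = median_150bp_callable - callable_loci["CALLABLE"]
print(f"total bases classified: {total:,}")
print(f"CALLABLE shortfall vs. 150 bp median: {shortfall:,} "
      f"({100 * shortfall / median_150bp_callable:.1f}%)")
```

That works out to roughly 60% of the classified chrY bases being CALLABLE, and a shortfall of 644,144 bases (about 4.3%) against the 150 base pair median.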

    At $700 this test is the bang-for-the-buck leader for chrY variant discovery at the moment. I'm not sure how well it performs for Y-STR purposes. Intuitively it won't be quite as good as the 150 base pair tests. That said, my Y Elite, which was also 100 base pair, has a better STR extraction rate than any 150 base pair Big Y I've seen at YFull, and this test should be similar.
    Last edited by JamesKane; 01-26-2018 at 12:55 PM.

  16. #89
    Registered Users
    Posts
    445
    Sex

    YFull stats, both hg19
    Big Y 10/14/2015 - up to 160bp PE
    STRs (all): 587
    Reliable alleles: 494 (84.16%)
    Uncertain alleles: 19 (3.24%)
    N/A: 74 (12.61%)
    Out of Y111: 100 reliable, 1 uncertain, 10 N/A

    Dante Labs, batched 18/07/2017
    STRs (all): 587
    Reliable alleles: 378 (64.40%)
    Uncertain alleles: 35 (5.96%)
    N/A: 174 (29.64%)
    Out of Y111: 81 reliable, 7 uncertain, 23 N/A

    This is just one sample, of course. With the much shorter read length it's no huge surprise that the STR results are weaker; this isn't a good deal for the STRs alone. However, when comparing against other NGS tests on YFull, remember that you still have >300 of the shorter STRs available. Also, having upgraded from Y-37 to Y-111, I can confirm that all listed STRs were called correctly, even the uncertain one. Some people still swear by FTDNA's STR tests, but I believe NGS tests are where it's at, and with more competition (Dante Labs unfortunately only in Europe) things are looking good. Since Dante Labs is WGS, it's not bound to any specific reference build, and it could also pick up on haplogroup-specific structural variation if that ever becomes a thing.

  18. #90
    Registered Users
    Posts
    97
    Sex
    Location
    Stockholm, Sweden
    Ethnicity
    Ingrian + Russian
    Nationality
    Swedish, Russian
    Y-DNA (P)
    R1a1a1b1-Z92
    mtDNA (M)
    J1c3k

    Finland Russian Federation Sweden
    And here are the statistics for my sample from Dante Labs (analysed at YFull):

    STRs (all): 587
    Reliable alleles: 504 (85.86%)
    Uncertain alleles: 25 (4.26%)
    N/A: 58 (9.88%)
