Page 102 of 125
Results 1,011 to 1,020 of 1241

Thread: Dante Labs (WGS)

  1. #1011
    Registered Users
    Posts: 488
    Location: Praha, Czech Republic
    Ethnicity: Czech
    Nationality: Czech
    Y-DNA (P): R-Y14088
    mtDNA (M): J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    Here is the fastqc report: https://www.dropbox.com/s/gexahrb6zc...stqc.html?dl=1

    Total Sequences 333747905
    Sequence length 35-151

    One error: Per base sequence content
    2 warnings: Sequence Length Distribution, Sequence Duplication Levels
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  2. The Following User Says Thank You to Petr For This Useful Post:

     pmokeefe (11-13-2019)

  3. #1012
    Registered Users
    Posts: 277
    Location: Wisconsin, USA
    Nationality: American
    Y-DNA (P): R1b-FGC29071
    mtDNA (M): U5a1b1g*

    Ireland England Netherlands Germany France
    Originally posted by Petr:
    I'm surprised that these two FASTQ files are only 25 GB each - previously, WGS 30x files from Dante were much bigger, 50 GB each.

    What is the size of the FASTQ files delivered to other testers this month?
    Can we all please STOP USING COMPRESSED FILE SIZES as a metric for evaluating the FASTQ data being returned? The important metrics are how many gigabases are represented in the files, how much of that maps to the genome reference of your choice, and the median coverage over the part of the genome that isn't defined as N (any base).

    The better your data quality and the longer the reads, the smaller your compressed files will be. (Within reason...)
    Last edited by JamesKane; 11-12-2019 at 11:34 PM.

  4. The Following 3 Users Say Thank You to JamesKane For This Useful Post:

     JMcB (11-13-2019),  karwiso (11-13-2019),  pmokeefe (11-12-2019)

  5. #1013
    Registered Users
    Posts: 488
    Location: Praha, Czech Republic
    Ethnicity: Czech
    Nationality: Czech
    Y-DNA (P): R-Y14088
    mtDNA (M): J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    Originally posted by JamesKane:
    Can we all please STOP USING COMPRESSED FILE SIZES as a metric for evaluating the FASTQ data being returned? The important metrics are how many gigabases are represented in the files, how much of that maps to the genome reference of your choice, and the median coverage over the part of the genome that isn't defined as N (any base).

    The better your data quality and the longer the reads, the smaller your compressed files will be. (Within reason...)
    OK, I agree, but how do I get these numbers from FASTQ files?
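    One rough way to get the raw totals without any extra tools is to read the FASTQ directly. Below is a minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines). The function name `count_reads_and_bases` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import gzip


def count_reads_and_bases(path):
    """Return (reads, bases) for one FASTQ file, gzipped or plain.

    Assumes every record is exactly four lines:
    header, sequence, '+', quality.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = 0
    bases = 0
    with opener(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:  # second line of each record is the sequence
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases
```

    Calling it once per file (R1 and R2) and adding the two base counts gives the total bases delivered. On a full WGS pair this is slow in pure Python, so a dedicated tool is more practical for routine use.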
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  6. #1014
    Registered Users
    Posts: 277
    Location: Wisconsin, USA
    Nationality: American
    Y-DNA (P): R1b-FGC29071
    mtDNA (M): U5a1b1g*

    Ireland England Netherlands Germany France
    fastp reports most of the same metrics you posted from FastQC, and it also includes the total base count. For a 30x WGS you should see more than 90 G bases (30 × a roughly 3 Gbp genome).

    The others require aligning the files to a human reference first. Your goal here is to see how well the 600+ million reads align and whether the average aligned coverage is about 30x.
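    The 90 G threshold is just target depth times genome size. Here is a small Python sketch of that arithmetic. The 3.0e9 genome size is a round-number assumption, and the function names are illustrative only. The result is a raw upper bound, since unmapped and duplicate reads lower the aligned coverage.

```python
def bases_needed(target_depth, genome_size=3.0e9):
    """Raw bases required to reach target_depth over genome_size bases."""
    return target_depth * genome_size


def estimated_coverage(total_bases, genome_size=3.0e9):
    """Raw (pre-alignment) depth implied by a total base count."""
    return total_bases / genome_size


print(bases_needed(30) / 1e9)    # 90.0 (gigabases for 30x)
print(estimated_coverage(90e9))  # 30.0
```

    Feeding in the "total bases" line from a fastp summary gives a quick sanity check before any alignment is done.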

  7. #1015
    Registered Users
    Posts: 488
    Location: Praha, Czech Republic
    Ethnicity: Czech
    Nationality: Czech
    Y-DNA (P): R-Y14088
    mtDNA (M): J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    Originally posted by JamesKane:
    fastp reports most of the same metrics you posted from FastQC, and it also includes the total base count. For a 30x WGS you should see more than 90 G bases (30 × a roughly 3 Gbp genome).
    General
    fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
    sequencing: paired end (151 cycles + 151 cycles)
    mean length before filtering: 148bp, 148bp
    mean length after filtering: 148bp, 148bp
    duplication rate: 5.986576%
    Insert size peak: 261
    Before filtering
    total reads: 667.495810 M
    total bases: 99.125328 G
    Q20 bases: 96.030385 G (96.877748%)
    Q30 bases: 90.925155 G (91.727470%)
    GC content: 43.566229%
    After filtering
    total reads: 662.534472 M
    total bases: 98.299320 G
    Q20 bases: 95.480444 G (97.132355%)
    Q30 bases: 90.481013 G (92.046428%)
    GC content: 43.541445%
    Filtering result
    reads passed filters: 662.534472 M (99.256724%)
    reads with low quality: 4.778964 M (0.715954%)
    reads with too many N: 37.338000 K (0.005594%)
    reads too short: 145.036000 K (0.021728%)


    And now: is this a good or a bad result?

    Originally posted by JamesKane:
    The others require aligning the files to a human reference first. Your goal here is to see how well the 600+ million reads align and whether the average aligned coverage is about 30x.
    If I have a BAM file, which statistics are the best ones to look at?
    Last edited by Petr; 11-13-2019 at 05:38 PM.
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  8. #1016
    Gold Class Member
    Posts: 579
    Location: San Diego, CA
    Ethnicity: Polish/British Isles
    Nationality: U.S.
    Y-DNA (P): R-A9185
    mtDNA (M): H1
    mtDNA (P): J1c2

    Poland England Ireland Munster

    Dante Labs 30X WGS test turned out to only be 7X

    ***Update ***
    I just checked the VCF file from that kit using bcftools stats. It shows 30X coverage, so hopefully there will be more FASTQ files coming.
    *************
    I just received FASTQ files from a Dante Labs test (from the 2018 Black Friday sale). It appears that the coverage was about 7X.
    I counted the lines: there were 290,777,460 in each file. With 4 lines per read, two files of 150 bp reads, and a ~3 Gbp genome, I get about 7X.
    The fastp report below seems to bear that out.
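    The line-count estimate above can be written out as a short Python sketch. The function name and the defaults (two files, 150 bp reads, a 3 Gbp genome) are illustrative assumptions taken from the rounded figures in this post, not exact values.

```python
def coverage_from_line_count(lines_per_file, n_files=2,
                             read_length=150, genome_size=3.0e9):
    """Estimate raw depth from a `wc -l` style line count of one FASTQ file."""
    reads_per_file = lines_per_file // 4  # four lines per FASTQ record
    total_bases = reads_per_file * n_files * read_length
    return total_bases / genome_size


print(round(coverage_from_line_count(290_777_460), 2))  # 7.27
```

    Because the actual mean read length is a little under 150 bp, the fastp "total bases" figure comes out slightly lower than this estimate.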

    Anyone else with a Dante Labs 30X result that came up short?

    >fastp -i 60820188474302_SA_L001_R1_001.fastq.gz -I 60820188474302_SA_L001_R2_001.fastq.gz

    Summary
    General
    fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
    sequencing: paired end (151 cycles + 151 cycles)
    mean length before filtering: 146bp, 146bp
    mean length after filtering: 146bp, 146bp
    duplication rate: 5.175692%
    Insert size peak: 261
    Before filtering
    total reads: 145.388730 M
    total bases: 21.354551 G
    Q20 bases: 20.879118 G (97.773623%)
    Q30 bases: 20.044981 G (93.867489%)
    GC content: 41.690978%
    After filtering
    total reads: 144.125952 M
    total bases: 21.141710 G
    Q20 bases: 20.733409 G (98.068742%)
    Q30 bases: 19.922371 G (94.232540%)
    GC content: 41.645787%
    Filtering result
    reads passed filters: 144.125952 M (99.131447%)
    reads with low quality: 1.208892 M (0.831489%)
    reads with too many N: 17.714000 K (0.012184%)
    reads too short: 36.172000 K (0.024880%)
    Last edited by pmokeefe; 11-13-2019 at 06:11 PM. Reason: The VCF had 30X coverage
    YFull: YF14620 (Dante Labs 2018)

  9. The Following 2 Users Say Thank You to pmokeefe For This Useful Post:

     darethehair (03-30-2020),  MacUalraig (11-13-2019)

  10. #1017
    Registered Users
    Posts: 488
    Location: Praha, Czech Republic
    Ethnicity: Czech
    Nationality: Czech
    Y-DNA (P): R-Y14088
    mtDNA (M): J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    This is a test finished on March 5th, 2019:

    General
    fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
    sequencing: paired end (150 cycles + 150 cycles)
    mean length before filtering: 150bp, 150bp
    mean length after filtering: 149bp, 149bp
    duplication rate: 21.974666%
    Insert size peak: 269
    Before filtering
    total reads: 834.317408 M
    total bases: 125.147611 G
    Q20 bases: 121.528261 G (97.107935%)
    Q30 bases: 115.666626 G (92.424158%)
    GC content: 40.918799%
    After filtering
    total reads: 827.816768 M
    total bases: 123.827389 G
    Q20 bases: 120.561766 G (97.362762%)
    Q30 bases: 114.859232 G (92.757534%)
    GC content: 40.873596%
    Filtering result
    reads passed filters: 827.816768 M (99.220843%)
    reads with low quality: 6.295018 M (0.754511%)
    reads with too many N: 65.278000 K (0.007824%)
    reads too short: 140.344000 K (0.016821%)
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  11. The Following User Says Thank You to Petr For This Useful Post:

     pmokeefe (11-13-2019)

  12. #1018
    Registered Users
    Posts: 31
    Location: Finland
    Y-DNA (P): N-Y35885
    mtDNA (M): X2b4a1a

    Here is my Dante fastp output, from a test ordered in Dec 2018. I have many FASTQ files; I ran this only for the first FASTQ pair.


    $ ./fastp -i /mnt/d/DanteLabs/F_L01_517_1.fq.gz -I /mnt/d/DanteLabs/F_L01_517_2.fq.gz

    fastp report
    Summary
    General
    fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
    sequencing: paired end (100 cycles + 100 cycles)
    mean length before filtering: 100bp, 100bp
    mean length after filtering: 99bp, 99bp
    duplication rate: 0.607919%
    Insert size peak: 169
    Before filtering
    total reads: 227.436332 M
    total bases: 22.743633 G
    Q20 bases: 22.372274 G (98.367193%)
    Q30 bases: 21.171145 G (93.086030%)
    GC content: 42.706677%
    After filtering
    total reads: 226.843730 M
    total bases: 22.658819 G
    Q20 bases: 22.308741 G (98.455004%)
    Q30 bases: 21.119156 G (93.205014%)
    GC content: 42.687204%
    Filtering result
    reads passed filters: 226.843730 M (99.739443%)
    reads with low quality: 566.188000 K (0.248944%)
    reads with too many N: 26.414000 K (0.011614%)
    reads too short: 0 (0.000000%)
    Last edited by Giosta; 11-13-2019 at 06:13 PM.

  13. The Following User Says Thank You to Giosta For This Useful Post:

     pmokeefe (11-13-2019)

  14. #1019
    Registered Users
    Posts: 488
    Location: Praha, Czech Republic
    Ethnicity: Czech
    Nationality: Czech
    Y-DNA (P): R-Y14088
    mtDNA (M): J1c1i

    Czech Republic Austria Austrian Empire Bohemia Carinthia
    And this test was finished by Dante on January 14th, 2018:

    General
    fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
    sequencing: paired end (100 cycles + 100 cycles)
    mean length before filtering: 100bp, 100bp
    mean length after filtering: 99bp, 99bp
    duplication rate: 1.406735%
    Insert size peak: 157
    Before filtering
    total reads: 1.200368 G
    total bases: 120.036831 G
    Q20 bases: 116.151332 G (96.763078%)
    Q30 bases: 106.174508 G (88.451609%)
    GC content: 41.953024%
    After filtering
    total reads: 1.180651 G
    total bases: 117.875819 G
    Q20 bases: 114.391330 G (97.043933%)
    Q30 bases: 104.688158 G (88.812242%)
    GC content: 41.951735%
    Filtering result
    reads passed filters: 1.180651 G (98.357428%)
    reads with low quality: 7.075926 M (0.589480%)
    reads with too many N: 12.640984 M (1.053092%)
    reads too short: 0 (0.000000%)
    Y-DNA: R-Y14088 (ISOGG: R1b1a1a2a1a2b1c2b1a1a)
    mtDNA: J1c1i (J1c1 + 7735G and 8848C) Extras: 198T 12007A 16422C 16431A

  15. The Following User Says Thank You to Petr For This Useful Post:

     pmokeefe (11-13-2019)

  16. #1020
    Gold Class Member
    Posts: 270
    Nationality: Finnish
    Y-DNA (P): R1b-Z142
    mtDNA (M): H10g

    Here's fastp from my 4x test.

    fastp version: 0.19.11 (https://github.com/OpenGene/fastp)
    sequencing: paired end (151 cycles + 151 cycles)
    mean length before filtering: 148bp, 148bp
    mean length after filtering: 148bp, 148bp
    duplication rate: 4.958682%
    Insert size peak: 261
    Before filtering
    total reads: 347.074070 M
    total bases: 51.605934 G
    Q20 bases: 49.989405 G (96.867553%)
    Q30 bases: 47.543664 G (92.128290%)
    GC content: 42.797089%
    After filtering
    total reads: 342.757768 M
    total bases: 50.954371 G
    Q20 bases: 49.556123 G (97.255882%)
    Q30 bases: 47.189698 G (92.611679%)
    GC content: 42.717622%
    Filtering result
    reads passed filters: 342.757768 M (98.756374%)
    reads with low quality: 4.115656 M (1.185815%)
    reads with too many N: 34.754000 K (0.010013%)
    reads too short: 165.892000 K (0.047797%)

  17. The Following 3 Users Say Thank You to teepean47 For This Useful Post:

     MacUalraig (11-14-2019),  mdn (11-15-2019),  pmokeefe (11-13-2019)
