
View Full Version : Dante Labs (WGS)




ntk
11-10-2019, 01:00 PM
Did anyone else watch the webinar with Andrea Riposati just now (8am PST)? He acknowledged several customer issues (delays, raw data, etc). Did anyone feel like their issue was not even mentioned, or poorly addressed?

Also, thanks to MacUalraig for the heads up, I would have missed it otherwise.
I didn't, since they didn't notify customers (at least not me), I wasn't hitting F5 in this thread that day, and they haven't posted audio.

It's a joke: "the truth behind the delays in 2018", as if this were a 2018 problem and not ongoing. My kit, ordered mid-February 2019 and returned that same month, has now been stuck at "Sequencing started" with no results for 22 days.

Feb 16, 2019: Ordered kit ($299)
Feb 18, 2019: Shipping notification
Feb 25, 2019: Kit received by Dante/logistics partner in Draper UT
Mar 26, 2019: Message support to ask why the kit manager still says "Waiting Confirmation from Dante Labs".
Apr 13, 2019: Response from Dante: "I can confirm that your sample has been received in our labs. We are currently running the DNA extraction ..."
Jun 28, 2019: Dante switches to new kit manager, status is still "QC Completed"
Sep 2, 2019: Dante: "84% of all our samples received by the end of July have been sequenced ... We plan to finish the entire backlog of samples by mid-October." I email to clarify about my kit status: same-day response: "Your sample is also being sequenced as with all our samples."
Oct 3, 2019: Kit manager still shows "QC Completed." Ask for status, same-day response: "I have checked your kit and it is still being sequenced within your [sic] lab." Asked how this is possible. Same-day response: "We are in the process of releasing data to all our customers."
Oct 13, 2019: "We are now sequencing more than 300 samples per week ... We have not completed our backlog yet, but we are getting closer every day."
Oct 19, 2019: Kit status updated to "Sequencing Started"

still nothing.

And is Dante Labs now offering the same 30x WGS for $199 again with "8 week turnaround time"? Why yes they are. Do you believe it?

pmokeefe
11-10-2019, 01:28 PM
Is the webinar still available anywhere? I missed it myself, and nobody has reported what they actually said.
Dante Labs Webinar on November 9th, 2019
https://www.youtube.com/watch?v=sgLwwsYinRA

ntk
11-11-2019, 12:41 PM
Well, a few hours after my last post, I got a notification that my VCF and FASTA files are now available (looks like no BAM yet). Will be reviewing these.

So here is my timeline to results:

Feb 16, 2019: Ordered kit ($299)
Feb 18, 2019: Shipping notification
Feb 25, 2019: Kit received by Dante/logistics partner in Draper UT
Mar 26, 2019: Message support to ask why the kit manager still says "Waiting Confirmation from Dante Labs".
Apr 13, 2019: Response from Dante: "I can confirm that your sample has been received in our labs. We are currently running the DNA extraction ..."
Jun 28, 2019: Dante switches to new kit manager, status is still "QC Completed"
Sep 2, 2019: Dante: "84% of all our samples received by the end of July have been sequenced ... We plan to finish the entire backlog of samples by mid-October." I email to clarify about my kit status: same-day response: "Your sample is also being sequenced as with all our samples."
Oct 3, 2019: Kit manager still shows "QC Completed." Ask for status, same-day response: "I have checked your kit and it is still being sequenced within your [sic] lab." Asked how this is possible. Same-day response: "We are in the process of releasing data to all our customers."
Oct 13, 2019: "We are now sequencing more than 300 samples per week ... We have not completed our backlog yet, but we are getting closer every day."
Oct 19, 2019: Kit status updated to "Sequencing Started"
Nov 11, 2019: VCF and FASTA received

pmokeefe
11-11-2019, 01:46 PM
My Trustpilot review of Dante Labs (https://www.trustpilot.com/users/5dc962bce93383a66c2df580)
I have ordered 9 test kits from Dante Labs so far. I have complete results for the first 3 kits, including all the raw data. All the results I received were very satisfactory. Customer support for the earlier kits was satisfactory as well. Because of that positive experience, I ordered four more kits on the Black Friday sale of 2018. It's now almost 11 months later and I have no results from those. When I contacted customer support about the 2018 Black Friday kits I received perfunctory, uninformative replies. I gave up trying to contact customer support. The last kit I ordered was for their Long Read test in the spring of 2019. I have received partial results for that test, but the results appear to be for a short-read test, not a long read test! The long read test was more expensive, but I appear to have received the results of a less expensive test. I'm waiting to see if I receive the full results before trying to contact Dante Labs about it. Dante Labs recently claims to have a new laboratory and new procedures that will address the sorts of issues I have experienced. I just ordered a new kit to test that. I feel rather foolish doing so, throwing good money after bad! But it is for my elderly father who is in very poor health. His was also one of the 2018 Black Friday kits with no results. I hope Dante Labs has actually improved its service, if they do I will write another, much better review and order many more kits. Wish me luck!

MacUalraig
11-11-2019, 02:46 PM
Dante Labs recently claims to have a new laboratory and new procedures that will address the sorts of issues I have experienced. I just ordered a new kit to test that. I feel rather foolish doing so, throwing good money after bad!

I'm tempted too although I can't see myself switching away from YSEQ for my regular project purchases unless there is a sea change in attitude from them.

Donwulff
11-11-2019, 09:09 PM
I have a somewhat similar conundrum. Like the Black Friday sale, today's sale is again crazy good - enough that it would be a steal (more affordable than many microarray tests) even if one got just the VCF and no BAM(*), and I might even test myself again for presumably 160bp reads if I couldn't talk any relatives into it... My first sequence with them in 2017 went down without a hitch - better, in fact, than any other sequencing or microarray company I've dealt with, all of whom have had scheduling and sometimes quality issues.

HOWEVER, yes... it appears that for my Long Read test they returned data that fails Oxford Nanopore's standard quality control in order to inflate the data amount; I suspect this was intentional because other members had those reads correctly quality-filtered. This is literally the worst possible thing you can do, since if, against all advice, someone were actually using these results for medical purposes, the errors could cause grievous bodily harm or even death in a way that "had to wait" or "didn't get all the data" doesn't. As it is, it undermines their credibility and goodwill. So far this seems like an isolated incident (of this type), but they also have not replied to my customer support request about it.

Speaking of which, a couple of months ago I ordered their AI report for my short-read sequence just to find out what it was about, but the option wasn't even available in their report menu after paying for it. Their customer support replied immediately that they were on it, however... no report. If they need some specific information beyond what the description on the order page says, they should say so. Has anybody managed to get the AI report? What is it like? I also requested the personalized report, because they say we can get one at the moment - not even a response, though. And I tried the monthly report update subscription, but there wasn't any change at all to the reports. (Edit: Just to stress that the Neurological Panel I ordered did work; I received it within 24 hours. It was pretty much a ClinVar dump where the table formatting broke because of over-long lines, plus an Excel table of all the variants. Useful if you can't run ANNOVAR or similar.)

The common thread between the Long Read quality issues and all the reports is that none of the caveats about wet-lab work being unpredictable, or having a third-party lab to blame, seem to apply to these. Their price is the best on the market, even if it suggests they're taking a loss to win market share just like essentially every DNA-testing company or startup out there, and their service is great, when it works. I really want to see them succeed; however, they shouldn't sell products they can't deliver (I mean, literally, not having the option in the menu...) or fail at things that should be entirely under their control, like QC and reports, if they hope for that market share to amount to anything. Of course, since they must be taking a loss at that price, I'm not sure they'll REALLY miss my repeat business, but neither am I sure I really want a second sequence from them that badly... It's still not clear to me how large a share of their customers haven't received short-read sequencing results after a long wait, however; considering people are more likely to talk about negative experiences than about everything going as intended, is it a large portion?

(*) And when people complain about not receiving results, can we get a clarification of whether they mean they received a VCF but are still waiting for the BAM, or that they received no VCF whatsoever? Because the seemingly most vocal complainers all talk about not receiving the BAM, so when people say "didn't receive data" or the like it's not clear what they mean.

MacUalraig
11-11-2019, 09:37 PM
People coming from the genetic genealogy community are primed for something they can upload to YFull (if they're male!*) so like me they tend to stop the stopwatch when the BAM turns up. Possibly people who bought for medical reasons are satisfied if they get a VCF or medical report only. What is annoying is that Dante themselves seem to count the product as delivered when they put the reports up.

Right now BAMs are controversial because of what FTDNA are up to so if Dante can produce them without extra charge or delay they have a winner.

* women of course could do just an mtDNA upload to YFull if they wanted.

tontsa
11-12-2019, 06:35 AM
It really sucks that FTDNA doesn't, at least yet, accept BAM/FASTQ imports, as they have some populations there that won't get Y-700 or just don't upload to YFull.. so in those cases you're kind of forced to do Y-700, even though for YFull you get better coverage with 30x WGS..

Petr
11-12-2019, 07:06 PM
I just received FASTQ files for one WGS 30x test ordered in February 2019. The names are:
60820188xxxxxx_SA_L001_R1_001.fastq.gz
60820188xxxxxx_SA_L001_R2_001.fastq.gz

I'm surprised that these two FASTQ files are only 25 GB each - previously, WGS 30x files from Dante were much bigger, 50 GB each.

What is the size of the FASTQ files delivered to other testers this month?

tontsa
11-12-2019, 07:27 PM
If you unpack those gzips, what size do you get? It's possible those 150bp reads compress better, especially if the adapter sequences have been trimmed away.

Petr
11-12-2019, 11:14 PM
Here is the fastqc report: https://www.dropbox.com/s/gexahrb6zc3vx0n/pa_fastqc.html?dl=1

Total Sequences 333747905
Sequence length 35-151

One error: Per base sequence content
2 warnings: Sequence Length Distribution, Sequence Duplication Levels

JamesKane
11-12-2019, 11:27 PM
I'm surprised that these two FASTQ files are only 25 GB each - previously, WGS 30x files from Dante were much bigger, 50 GB each.

What is the size of the FASTQ files delivered to other testers this month?

Can we all please STOP USING COMPRESSED FILE SIZES as a metric for evaluating FASTQ data being returned? The important metrics are how many gigabases are represented in the files, how much of that maps to the genome reference of your choice, and the median coverage on the portion of the genome that isn't defined as N (any) base.

The better your data quality and the longer the reads, the smaller your compressed files will be. (Within reason...)

Petr
11-13-2019, 09:49 AM
Can we all please STOP USING COMPRESSED FILE SIZES as a metric for evaluating FASTQ data being returned? The important metrics are how many gigabases are represented in the files, how much of that maps to the genome reference of your choice, and the median coverage on the portion of the genome that isn't defined as N (any) base.

The better your data quality and the longer the reads, the smaller your compressed files will be. (Within reason...)

OK, I agree, but how do I get these numbers from the FASTQ files?

JamesKane
11-13-2019, 12:48 PM
fastp (https://github.com/OpenGene/fastp) will perform most of the same metrics you posted from fastqc, but also includes the total base estimates. For a 30x WGS you should see more than 90G.

The others require aligning the files to a human reference first. Your goal here is to see how well the 600+ million reads fit and if the average aligned coverage is about 30x.
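
If you just want the headline number without opening the fastp report, something along these lines (untested as written; file names are placeholders and ~3.1 Gbp is a rough non-N genome size) sums the bases and converts them to an approximate raw coverage:

zcat kit_R1.fastq.gz kit_R2.fastq.gz | awk 'NR % 4 == 2 { bases += length($0) } END { printf "%.1f Gbases, ~%.1fx raw coverage\n", bases/1e9, bases/3.1e9 }'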

Petr
11-13-2019, 02:16 PM
fastp (https://github.com/OpenGene/fastp) will perform most of the same metrics you posted from fastqc, but also includes the total base estimates. For a 30x WGS you should see more than 90G.

General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 5.986576%
Insert size peak: 261
Before filtering
total reads: 667.495810 M
total bases: 99.125328 G
Q20 bases: 96.030385 G (96.877748%)
Q30 bases: 90.925155 G (91.727470%)
GC content: 43.566229%
After filtering
total reads: 662.534472 M
total bases: 98.299320 G
Q20 bases: 95.480444 G (97.132355%)
Q30 bases: 90.481013 G (92.046428%)
GC content: 43.541445%
Filtering result
reads passed filters: 662.534472 M (99.256724%)
reads with low quality: 4.778964 M (0.715954%)
reads with too many N: 37.338000 K (0.005594%)
reads too short: 145.036000 K (0.021728%)


And now - is this a good or a bad result?


The others require aligning the files to a human reference first. Your goal here is to see how well the 600+ million reads fit and if the average aligned coverage is about 30x.
If I have the BAM file, which statistics are best?

pmokeefe
11-13-2019, 05:22 PM
***Update ***
I just checked the VCF file from that kit using bcftools stats. It show 30X coverage, so hopefully there will be more FASTQ files coming.
*************
I just received FASTQ files from a Dante Labs test (from the 2018 Black Friday sale). It appears that the coverage was about 7X.
I counted the lines, there were 290,777,460 lines in each file. With 4 lines per read and two files with 150 bp reads, and a ~3GB genome I get about 7X.
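For anyone who wants to repeat the check, the arithmetic was roughly this (a sketch; file names are shortened placeholders):

zcat R1.fastq.gz | wc -l
# 290,777,460 lines per file, 4 lines per read -> ~72.7 M reads per file
echo '290777460 / 4 * 2 * 150 / 3100000000' | bc -l
# ~72.7 M reads x 2 files x ~150 bp / ~3.1 Gbp genome = ~7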
The fastp (https://github.com/OpenGene/fastp) report below seems to bear that out.

Anyone else with a Dante Labs 30X result that came up short?

>fastp -i 60820188474302_SA_L001_R1_001.fastq.gz -I 60820188474302_SA_L001_R2_001.fastq.gz

Summary
General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 146bp, 146bp
mean length after filtering: 146bp, 146bp
duplication rate: 5.175692%
Insert size peak: 261
Before filtering
total reads: 145.388730 M
total bases: 21.354551 G
Q20 bases: 20.879118 G (97.773623%)
Q30 bases: 20.044981 G (93.867489%)
GC content: 41.690978%
After filtering
total reads: 144.125952 M
total bases: 21.141710 G
Q20 bases: 20.733409 G (98.068742%)
Q30 bases: 19.922371 G (94.232540%)
GC content: 41.645787%
Filtering result
reads passed filters: 144.125952 M (99.131447%)
reads with low quality: 1.208892 M (0.831489%)
reads with too many N: 17.714000 K (0.012184%)
reads too short: 36.172000 K (0.024880%)

Petr
11-13-2019, 05:42 PM
This is test finished on March 5th, 2019:

General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (150 cycles + 150 cycles)
mean length before filtering: 150bp, 150bp
mean length after filtering: 149bp, 149bp
duplication rate: 21.974666%
Insert size peak: 269
Before filtering
total reads: 834.317408 M
total bases: 125.147611 G
Q20 bases: 121.528261 G (97.107935%)
Q30 bases: 115.666626 G (92.424158%)
GC content: 40.918799%
After filtering
total reads: 827.816768 M
total bases: 123.827389 G
Q20 bases: 120.561766 G (97.362762%)
Q30 bases: 114.859232 G (92.757534%)
GC content: 40.873596%
Filtering result
reads passed filters: 827.816768 M (99.220843%)
reads with low quality: 6.295018 M (0.754511%)
reads with too many N: 65.278000 K (0.007824%)
reads too short: 140.344000 K (0.016821%)

Giosta
11-13-2019, 06:11 PM
Here is my Dante fastp output. Ordered Dec 2018. I have many fastq files. I run this only for the first fastq pair.


$ ./fastp -i /mnt/d/DanteLabs/F_L01_517_1.fq.gz -I /mnt/d/DanteLabs/F_L01_517_2.fq.gz

fastp report
Summary
General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (100 cycles + 100 cycles)
mean length before filtering: 100bp, 100bp
mean length after filtering: 99bp, 99bp
duplication rate: 0.607919%
Insert size peak: 169
Before filtering
total reads: 227.436332 M
total bases: 22.743633 G
Q20 bases: 22.372274 G (98.367193%)
Q30 bases: 21.171145 G (93.086030%)
GC content: 42.706677%
After filtering
total reads: 226.843730 M
total bases: 22.658819 G
Q20 bases: 22.308741 G (98.455004%)
Q30 bases: 21.119156 G (93.205014%)
GC content: 42.687204%
Filtering result
reads passed filters: 226.843730 M (99.739443%)
reads with low quality: 566.188000 K (0.248944%)
reads with too many N: 26.414000 K (0.011614%)
reads too short: 0 (0.000000%)

Petr
11-13-2019, 06:31 PM
And this test was finished by Dante on January 14th, 2018:

General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (100 cycles + 100 cycles)
mean length before filtering: 100bp, 100bp
mean length after filtering: 99bp, 99bp
duplication rate: 1.406735%
Insert size peak: 157
Before filtering
total reads: 1.200368 G
total bases: 120.036831 G
Q20 bases: 116.151332 G (96.763078%)
Q30 bases: 106.174508 G (88.451609%)
GC content: 41.953024%
After filtering
total reads: 1.180651 G
total bases: 117.875819 G
Q20 bases: 114.391330 G (97.043933%)
Q30 bases: 104.688158 G (88.812242%)
GC content: 41.951735%
Filtering result
reads passed filters: 1.180651 G (98.357428%)
reads with low quality: 7.075926 M (0.589480%)
reads with too many N: 12.640984 M (1.053092%)
reads too short: 0 (0.000000%)

teepean47
11-13-2019, 10:08 PM
Here's fastp from my 4x test.

fastp version: 0.19.11 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 4.958682%
Insert size peak: 261
Before filtering
total reads: 347.074070 M
total bases: 51.605934 G
Q20 bases: 49.989405 G (96.867553%)
Q30 bases: 47.543664 G (92.128290%)
GC content: 42.797089%
After filtering
total reads: 342.757768 M
total bases: 50.954371 G
Q20 bases: 49.556123 G (97.255882%)
Q30 bases: 47.189698 G (92.611679%)
GC content: 42.717622%
Filtering result
reads passed filters: 342.757768 M (98.756374%)
reads with low quality: 4.115656 M (1.185815%)
reads with too many N: 34.754000 K (0.010013%)
reads too short: 165.892000 K (0.047797%)

JamesKane
11-14-2019, 12:53 AM
Before filtering
total reads: 667.495810 M
total bases: 99.125328 G
Q20 bases: 96.030385 G (96.877748%)
Q30 bases: 90.925155 G (91.727470%)
GC content: 43.566229%
After filtering
total reads: 662.534472 M
total bases: 98.299320 G
Q20 bases: 95.480444 G (97.132355%)
Q30 bases: 90.481013 G (92.046428%)
GC content: 43.541445%
Filtering result


A 30x WGS test should have more than 90 Gbases. These files easily exceed that amount.

Now the BAM should be investigated for genomic territory covered. You could use GATK's CollectWgsMetrics. MEAN_COVERAGE should be about 30x. If it's significantly lower, you had a higher bacterial concentration sequenced than normal.
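
Roughly like this (a sketch; file names are placeholders, and the reference must be the same one the BAM was aligned against, e.g. hs37d5 for Dante's hg19 BAMs):

gatk CollectWgsMetrics -I sample.bam -R hs37d5.fa -O sample.wgs_metrics.txt
# then check the MEAN_COVERAGE and MEDIAN_COVERAGE columns in the output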

JamesKane
11-14-2019, 12:58 AM
Before filtering
total reads: 834.317408 M
total bases: 125.147611 G
Q20 bases: 121.528261 G (97.107935%)
Q30 bases: 115.666626 G (92.424158%)
GC content: 40.918799%
After filtering
total reads: 827.816768 M
total bases: 123.827389 G
Q20 bases: 120.561766 G (97.362762%)
Q30 bases: 114.859232 G (92.757534%)
GC content: 40.873596%


Much higher than expected. That's closer to 40x coverage and would be atypical.

JamesKane
11-14-2019, 01:02 AM
Before filtering
total reads: 347.074070 M
total bases: 51.605934 G
Q20 bases: 49.989405 G (96.867553%)
Q30 bases: 47.543664 G (92.128290%)
GC content: 42.797089%
After filtering
total reads: 342.757768 M
total bases: 50.954371 G
Q20 bases: 49.556123 G (97.255882%)
Q30 bases: 47.189698 G (92.611679%)
GC content: 42.717622%


Helps to confirm what I've seen in the Y-chromosome data added to the Y-DNA Warehouse. These tests are actually 15x+ to date. The real question is why are they overshooting the mark so much? Why not market as 15x if this is expected?

Petr
11-14-2019, 10:30 AM
Summary of all my WGS 30x tests from Dante, with total bases after filtering and coverage calculated from idxstats (a quick sketch of the calculation follows the list):

2018-01-19: 118 G - 32.8x - 12.4 % of unmapped reads
2018-03-23: 125 G - 32.7x - 8.3 % of unmapped reads
2019-03-27: 124 G - 38.7x - 1.8 % of unmapped reads
2019-03-27: 89 G - 27.5x - 3.3 % of unmapped reads
2019-03-27: 102 G - 25.5x - 20.7 % of unmapped reads
2019-03-27: 93 G - 21.0x - 28.5 % of unmapped reads
2019-06-05: 106 G - 29.4x - 12.8 % of unmapped reads
2019-11-12: 98 G - mapping not processed yet
2019-11-14: 73.7 G - mapping not processed yet
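
The sketch mentioned above (it assumes an indexed BAM and ~150 bp reads - use 100 for the older kits - so it is only an approximation):

samtools idxstats sample.bam | awk '{ len += $2; mapped += $3; unmapped += $4 } END { printf "~%.1fx coverage, %.1f%% unmapped\n", mapped*150/len, 100*unmapped/(mapped+unmapped) }'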

teepean47
11-14-2019, 10:33 AM
Helps to confirm what I've seen in the Y-chromosome data added to the Y-DNA Warehouse. These tests are actually 15x+ to date. The real question is why are they overshooting the mark so much? Why not market as 15x if this is expected?

Weird thing is that YFull refused to accept this. According to their customer service: "Unfortunately, the quality of the sequence is extremely low for the Y chromosome".

MacUalraig
11-14-2019, 02:49 PM
Weird thing is that YFull refused to accept this. According to their customer service: "Unfortunately, the quality of the sequence is extremely low for the Y chromosome".

Can you ask them for the stats they used to reject it? If it's not a trade secret.

teepean47
11-14-2019, 05:54 PM
Can you ask them for the stats they used to reject it? If it's not a trade secret.

The only answer I got was that it is 2x. I did ask again, but I doubt I'll get an answer. Maybe they don't want to anger their new business partner... I honestly thought Dante's 4x would be the perfect cheap test to cover everything.

Petr
11-14-2019, 11:04 PM
The results of the WGS 30x received today have a somewhat lower number of reads:

General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 5.775481%
Insert size peak: 261
Before filtering
total reads: 502.517378 M
total bases: 74.698229 G
Q20 bases: 72.112478 G (96.538403%)
Q30 bases: 68.185465 G (91.281233%)
GC content: 41.676315%
After filtering
total reads: 497.104320 M
total bases: 73.690341 G
Q20 bases: 71.416587 G (96.914448%)
Q30 bases: 67.614213 G (91.754512%)
GC content: 41.600758%
Filtering result
reads passed filters: 497.104320 M (98.922812%)
reads with low quality: 5.140478 M (1.022945%)
reads with too many N: 31.094000 K (0.006188%)
reads too short: 241.486000 K (0.048055%)

mdn
11-15-2019, 07:11 AM
The only answer I got was that it is 2x. I did ask again, but I doubt I'll get an answer. Maybe they don't want to anger their new business partner... I honestly thought Dante's 4x would be the perfect cheap test to cover everything.
Could you please clarify whether your results are filtered for Y only or not?

For example, here https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=611568&viewfull=1#post611568 the statistics for 4x say: "ChrY... 29.20% Coverage and 1.66 Mean Depth". Could you show your BamDeal output?

As the Y DNA Warehouse statistics page also says: "On average WGS tests have fifty percent of test's rated depth covered in the Y chromosome e.g. a 15x test has 7 reads spanning each location". So 4x -> 2x should be OK.

I am considering 4x as a great (and very cheap) option to get basic autosomal DNA + mtDNA + maybe some Y (if possible), so it would be nice if they sometimes accidentally produce better Y results. :)

MacUalraig
11-15-2019, 07:33 AM
For comparison, I once loaded a WGSx4 (FGC low pass) to YFull, and I have several YSEQ kits with a median Y depth of 7-8x; my worst YSEQ has been 6x (on the Y), and those were all accepted. YFull have a very good relationship with YSEQ, of course.

I didn't bother uploading my WGSx2 though. :)

teepean47
11-15-2019, 07:55 AM
Could you please clarify whether your results are filtered for Y only or not?

For example, here https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=611568&viewfull=1#post611568 the statistics for 4x say: "ChrY... 29.20% Coverage and 1.66 Mean Depth". Could you show your BamDeal output?

As the Y DNA Warehouse statistics page also says: "On average WGS tests have fifty percent of test's rated depth covered in the Y chromosome e.g. a 15x test has 7 reads spanning each location". So 4x -> 2x should be OK.

I am considering 4x as a great (and very cheap) option to get basic autosomal DNA + mtDNA + maybe some Y (if possible), so it would be nice if they sometimes accidentally produce better Y results. :)

I am going to test it with BamDeal after work. I sent them just the Y-DNA part of the BAM.

The results on Y-Warehouse include my 4x BAM and at least to me the results look very good.

mdn
11-15-2019, 10:35 AM
The results on Y-Warehouse include my 4x BAM and at least to me the results look very good.
Yes, but those results are quite strange.
They say Big Y 500 has coverage of 16.3/23.7 = 69% and Big Y 700 has 22.4/23.7 = 95%, but in reality YFull itself reports ~52-53% coverage for Big Y 500 and 78-79% for Big Y 700.

My 2 recent (November) Big Y 700 results on YFull:
https://c.radikal.ru/c15/1911/10/d68307dfaec5.png
https://d.radikal.ru/d37/1911/a9/bc3dd3acb417.png

And an almost identical Dante Labs 30x shows 99.81% covered; I just don't have my own results there yet, so I can't show it.
So although the Y-Warehouse statistics show it as good, for the Y it might actually be quite bad (it was my 1st Big Y 700, and then I found it covers just 79% - :( )

JamesKane
11-15-2019, 11:04 AM
Length of coverage borders on a useless statistic. These are all the locations with at least a single read present.

The Y-DNA Warehouse differs in that the callable loci statistic only reports locations with at least four reads with an acceptable alignment quality score.

There may be a key differentiator in that the Warehouse takes the vendor's alignment analysis and shoves it in the dust bin.** Every sample is reverted to FASTQ and then put through a pipeline based on Broad Institute's Best Practices to get normalized data for reporting. As far as I know YFULL uses what you give them.

** Unless I know that vendor to be using the same processing steps.
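
For a rough home-grown approximation of the callable loci idea (this is not the Warehouse pipeline, just a sketch with assumed thresholds, and it needs an indexed BAM), samtools can do something like:

samtools depth -r chrY -Q 20 sample.bam | awk '$3 >= 4 { n++ } END { print n, "chrY positions with depth >= 4" }'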

teepean47
11-15-2019, 11:15 AM
Length of coverage borders on a useless statistic. These are all the locations with at least a single read present.

The Y-DNA Warehouse differs in that the callable loci statistic only reports locations with at least four reads with an acceptable alignment quality score.

There may be a key differentiator in that the Warehouse takes the vendor's alignment analysis and shoves it in the dust bin. Every sample is reverted to FASTQ and then put through a pipeline based on Broad Institute's Best Practices to get normalized data for reporting. As far as I know YFULL uses what you give them.

So if I create a new GRCh38 BAM with your instructions it might be more acceptable for YFull?

JamesKane
11-15-2019, 11:46 AM
Possibly. If you want I can create a temporary Dropbox share with the chrY and chrY_KI270740v1_random reads for you to use. The mtDNA wouldn't be there for them though.

teepean47
11-15-2019, 11:53 AM
Possibly. If you want I can create a temporary Dropbox share with the chrY and chrY_KI270740v1_random reads for you to use. The mtDNA wouldn't be there for them though.

Thank you! You can PM/email me the link. MtDNA is not necessary as I have sent it to them earlier.

mdn
11-15-2019, 12:25 PM
I have just got access to the FASTQ R1 and R2 files (each about 25 GB) for my latest 30x order! Sequencing started on 23.10.2019 and is still listed as ongoing, but those 2 files are ready.

Could someone advise how I can directly generate a filtered BAM (Y and/or mtDNA only) from those FASTQs? It should be much faster than generating the full BAM, shouldn't it?
I tried providing a reference for mtDNA only, but in that case the SAM file still has a lot of incorrect unaligned records and seems to take the same execution time as the full BAM.

Petr
11-15-2019, 02:04 PM
Maybe this process? http://www.it2kane.org/2018/05/variant-discovery-process-update/

Donwulff
11-15-2019, 02:11 PM
According to the posts above, 2 x 25 GB seems like the expected size for a whole genome.

An alignment against only part of the genome wouldn't be much faster: the storage I/O is still the same, and given the way the search-index lookup works it would still do a similar amount of work. However, if you don't have the rest of the genome there, the reads will be forced to align against what you do have (i.e. the Y chromosome), which would yield different results from a whole-genome alignment. I can't speak for YFull, but if the intent is to submit to YFull, I strongly recommend aligning against the whole genome so the results are comparable with other alignments produced that way.

You can do something like

(samtools view -H sample.bam; samtools view sample.bam | grep "[[:space:]]chrY") | samtools sort - -o sample-chrY-sorted.bam
to extract only the read pairs where either end aligned to something beginning with chrY from an unsorted alignment, and then sort them.
I didn't try that command out right now though, so it might need some adjustment...
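
Alternatively, if the BAM is already coordinate-sorted and indexed, plain region extraction is simpler (also untested just now; contig names depend on the reference, e.g. "Y" for hg19/hs37d5):

samtools index sample.bam
samtools view -b -o sample-chrY.bam sample.bam chrY chrY_KI270740v1_random

Note this keeps only reads whose own alignment is on those contigs, not mates mapped elsewhere, which is the subtle difference from the grep version above.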

teepean47
11-15-2019, 07:04 PM
The new BAM was accepted! I guess the dream still lives on :) Thank you very much James!

Petr
11-15-2019, 07:57 PM
It looks like Sequencing.com has some problem with the new FASTQ files from Dante. All older files were OK, but last two kits failed with this error:


Thank you for uploading 6082018847xxxx_SA_L001_R2_001.fastq.gz. Unfortunately, this file is not compatible with some of the apps at Sequencing.com. Because of this, you may not be able to select this file when starting some apps.
There are several reasons why a file may be incompatible. For example, the file format may not yet be fully supported, the file may be encrypted or the file may be damaged. Apps can only process human genetic data so files containing genetic data from any other species will also be incompatible.
Even though this file is incompatible with some apps, the file is securely stored in your account and you can share and download it at any time.
We're here to help. If you would like us to determine why your file is incompatible, please submit a Support Request.


And Data Viewer:

Your Data Viewer app encountered an issue while processing your file and will not be able to provide a result.
Action required: None. The Support Team has already been notified and they are investigating the error.
App: Data Viewer
Filename: 6082018847xxxx_SA_L001_R2_001.fastq.gz
ID: 2287209
We apologize about this issue. Most errors are resolved within 2-3 days. Once the error is resolved, your app will automatically restart and will analyze your file

mdn
11-16-2019, 06:34 AM
The new BAM was accepted! I guess the dream still lives on :) Thank you very much James!
Great - could you please share your YFull statistics page when it becomes available?

teepean47
11-16-2019, 09:24 AM
It looks like Sequencing.com has some problem with the new FASTQ files from Dante. All older files were OK, but last two kits failed with this error:



And Data Viewer:

I gave up on sequencing.com. After I got a subscription, everything stopped working. Luckily the trial period had not ended.

Petr
11-18-2019, 10:58 AM
3 WGS 30x results delivered last week (ordered in March), total bases after filtering 98.3 G, 73.7 G, 97.8 G.

mdn
11-19-2019, 08:12 AM
30x Dante Labs results from November 2019, converted from FASTQ to an hg38 BAM with bwa mem (reference pre-indexed, so just 20 hours on 12 threads :) ).

YFull results:
https://a.radikal.ru/a05/1911/5d/17c760f0a0e2.png
https://c.radikal.ru/c21/1911/16/7ba3934be597.png
https://c.radikal.ru/c27/1911/8b/9f78778ee357.png

Interesting how much faster, yet lower in quality, the CUDA aligners are.
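
Roughly what the conversion looks like in practice (a sketch rather than my exact command line; paths and thread counts are just examples):

bwa index hg38.fa   # one-off; skip if the reference is already indexed
bwa mem -t 12 hg38.fa kit_R1.fastq.gz kit_R2.fastq.gz | samtools sort -@ 4 -o kit.hg38.bam -
samtools index kit.hg38.bam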

teepean47
11-19-2019, 12:51 PM
Interesting how much faster, yet lower in quality, the CUDA aligners are.

I have been following bwa-mem2 for a while now but still cannot even index any references without crashing.

https://github.com/bwa-mem2/bwa-mem2

tontsa
11-20-2019, 06:32 AM
I managed to index hg38 with bwa-mem2 by using 320 gigs of swap space.. it's really really memory hungry. Haven't tried aligning yet though..

Donwulff
11-20-2019, 08:40 PM
The question is, if you have one or a handful of samples to process, and 12 cores and huge memory lying around for whatever reason, why do you even care all that much about performance? Also, the bwa-mem2 page says "not recommended for production uses at the moment", so maybe only use it if you want to check whether it really gives the same/correct results as the original.

With respect to overall performance question, Broad Institute announced they're starting to use DRAGEN, which Dante Labs claims to be using now too. Which really, really just confuses me. DRAGEN is lightning fast, but there appears to be no real research paper backing it or its correctness up. And it's lightning fast because it uses proprietary "hypercomputing" FPGA-accelerators. And yes, the Broad Institute blog just flat out says they're replacing BWA-MEM with DRAGEN. If the results are notably different from BWA-MEM, it would need to be staggeringly better to justify re-processing tens of thousands of samples with DRAGEN to ensure they're statistically comparable. I think that's more a financial tie with Illumina, which raises questions about the impartiality of Broad Institute Best Practices, though they only supported ILLUMINA BAM-tag & sequencing adapters as it is.

The casual user can probably run DRAGEN on one of their cloud-platforms for a fraction of what it would cost to learn to process & process the sample on their own computers. (Of course, if Dante Labs aligns with DRAGEN, that alone might be enough reason to use alternative aligner for "second opinion").

teepean47
11-20-2019, 09:17 PM
DRAGEN, which Dante Labs claims to be using now too.

The recent BAMs from Dante have been processed with Dragen:


@PG ID: Hash Table Build VN: 01.003.044.3.3.5-hv-7 CL: /opt/edico/bin/dragen --lic-instance-id-location /root/.edico --build-hash-table true --ht-reference /data/input/appresults/126241120/hs37d5.fa --ht-build-rna-hashtable true --enable-cnv true --ht-alt-aware-validate true --output-directory /data/scratch/hs37d5 DS: digest_type: 1 digest: 0xF2543D4A ref_digest: 0xC2311E75 ref_index_digest: 0xB4307AF0 hash_digest: 0x7BF2A3E5
@PG ID: DRAGEN HW build VN: 04261818
@PG ID: DRAGEN SW build VN: 05.021.332.3.4.5 CL:

teepean47
11-21-2019, 05:32 AM
Here are the statistics so far from the 4x sample:


ChrY BAM file size:
0.16 Gb
Hg38

Reads (all): 1242311
Mapped reads:
1242311
(100.00%)

Unmapped reads:
0

Length coverage:
23530215 bp
(99.55%)

Min depth coverage: 1X
Max depth coverage: 222X
Mean depth coverage: 7.13X
Median depth coverage: 6X
Length coverage for age: 7879245 bp
No call: 106140 bp

My Big Y-700:


ChrY BAM file size:
0.35 Gb
Hg38

Reads (all): 5867120
Mapped reads:
5867120
(100.00%)

Unmapped reads:
0

Length coverage:
18898332 bp
(79.95%)

Min depth coverage: 1X
Max depth coverage: 238X
Mean depth coverage: 34.70X
Median depth coverage: 35X
Length coverage for age: 8380022 bp
No call: 4738023 bp

pmokeefe
11-22-2019, 04:14 PM
Today I received the FASTQ files for a 30X short read test purchased during the 2018 Black Friday sale. The two FASTQ files were about 1.8GB each. It appears to have 2X coverage. Included below is the email I sent to Sydney Maria Sherwood and others at Dante Labs. The FASTQ files and the output of fastp are available on a public Google Drive folder at this link: https://drive.google.com/open?id=1hjctSLXy0VI-hI1wlmcRm9RdGebfWilm
What do you think?


Dear Dante Labs,
Today I received the FASTQ files from kit #56001801069027 which was purchased as a 30X WGS test on the 2018 Black Friday sale. Upon running fastp on the FASTQ files I received, it appears that the depth of coverage is about 2X . I have included a log of the commands I used on MacOS below. I would like to receive FASTQ files for a 30X test, not 2X. Can you please help me with that? Thank you.
Sincerely,
Patrick O'Keefe


ls -l *.fastq.gz
-rw-r--r--@ 1 patrick staff 1816972069 Nov 22 06:01 56001801069027_S15_L001_R1_001.fastq.gz
-rw-r--r--@ 1 patrick staff 1859161621 Nov 22 06:05 56001801069027_S15_L001_R2_001.fastq.gz


Patricks-MacBook-Pro:56001801069027 patrick$ fastp -i 56001801069027_S15_L001_R1_001.fastq.gz -I 56001801069027_S15_L001_R2_001.fastq.gz
*** Here is the beginning of the fastp.html output from the above command ***
fastp report

Summary

General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 147bp, 147bp
mean length after filtering: 146bp, 146bp
duplication rate: 4.011014%
Insert size peak: 261

Before filtering
total reads: 45.489846 M
total bases: 6.696706 G
Q20 bases: 6.419570 G (95.861610%)
Q30 bases: 6.018207 G (89.868174%)
GC content: 42.479490%

teepean47
11-23-2019, 09:17 AM
Looks like Dante is selling upgrades. I might consider it if there are any decent Black Friday discounts.

https://www.dantelabs.com/products/whole-genome-upgrade

MacUalraig
11-23-2019, 10:43 AM
Agreed, definitely await Black Friday announcements before ordering anything from them!

teepean47
11-24-2019, 09:13 PM
Black Friday discount: 30x test for 169€.

https://www.dantelabs.com/collections/our-tests/products/whole-genome-sequencing?variant=30759474593927

Petr
11-25-2019, 10:48 AM
3 WGS 30x results delivered last week (ordered in March), total bases after filtering 98.3 G, 73.7 G, 97.8 G.

The coverage computed using idxstats from the original hg19 BAM files supplied by Dante, and from hg38 BAM files I created myself using the procedure described by James Kane:

Kit 1: (98.3 Gbases): coverage 15.57x, 51.23 % unmapped - hg38 JKane: coverage 16.55x, 47.19 % unmapped
Kit 2: (73.7 Gbases): coverage 22.21x, 7.69 % unmapped - hg38 JKane: coverage 22.30x, 6.35 % unmapped
Kit 3: (97.8 Gbases): coverage 30.78x, 3.34 % unmapped - hg38 JKane: (not processed yet)

What could be the reason for so many unmapped reads in Kit 1? High contamination of the original sample? Or the 8-month storage of the sample by Dante?

Petr
11-25-2019, 12:03 PM
Black Friday discount: 30x test for 169€.

https://www.dantelabs.com/collections/our-tests/products/whole-genome-sequencing?variant=30759474593927

€152.10 with code YFULL10.

Donwulff
11-25-2019, 01:11 PM
The coverage computed using idxstats from the original hg19 BAM files supplied by Dante, and from hg38 BAM files I created myself using the procedure described by James Kane:

Kit 1: (98.3 Gbases): coverage 15.57x, 51.23 % unmapped - hg38 JKane: coverage 16.55x, 47.19 % unmapped
Kit 2: (73.7 Gbases): coverage 22.21x, 7.69 % unmapped - hg38 JKane: coverage 22.30x, 6.35 % unmapped
Kit 3: (97.8 Gbases): coverage 30.78x, 3.34 % unmapped - hg38 JKane: (not processed yet)

What could be the reason for so many unmapped reads in Kit 1? High contamination of the original sample? Or the 8-month storage of the sample by Dante?

Too many possibilities to answer. In principle, DNA GenoTek (https://blog.dnagenotek.com/), for example, seems to promise years of storage at room temperature. However, this assumes the stabilizing buffer was added promptly. In addition, your sample was probably collected on SPECTRUM; while I recall Dante Labs saying they switched kits specifically because the new one provides higher quality, I've not seen any specs or studies supporting that.

Looking at the read quality of the unmapped reads (I unfortunately can't recall whether any QC software readily provides that) could show whether it's degraded DNA; mapping them against some microbial references is another option (I have oral-microbiome references on the "Dante Labs technical" thread, but I'm thinking one of the high-level classifiers could be useful too, for mold etc.). I have to say that 15.57x is certainly concerning for something sold as 30x. The microarray genotypers also seem to have individuals who never have enough DNA in the sample, though, so I'm sure there are outliers, mis-applied stabilizing buffer, etc.

mdn
11-25-2019, 03:53 PM
Could somebody please clarify:
is there anything special about the 130x exome?

Or, if I buy and do 30x five times, I would get 150x for the exome + 150x for everything, so it is definitely better (and even cheaper now)?

I do not plan to do so, but it's just interesting. Maybe I will still do 60x. :)

ceh13
11-25-2019, 03:56 PM
There were 25% unmapped reads in my sample. Anyone can check the microbial content by making FASTQ files from the unmapped reads and uploading them to www.cosmosid.com. In my case there was no outside contamination; the dominant strains/OTUs belonged to normal oral flora.
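
A sketch of how to pull those unmapped reads out of the BAM for upload (assuming samtools is available; file names are placeholders):

samtools view -u -f 12 sample.bam | samtools collate -u -O - | samtools fastq -1 unmapped_R1.fastq.gz -2 unmapped_R2.fastq.gz -s unmapped_singletons.fastq.gz -
# flag 12 = read unmapped + mate unmapped; collate keeps the pairs together before writing FASTQ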

aaronbee2010
11-25-2019, 04:12 PM
Could somebody please clarify:
is there anything special about the 130x exome?

Or, if I buy and do 30x five times, I would get 150x for the exome + 150x for everything, so it is definitely better (and even cheaper now)?

I do not plan to do so, but it's just interesting. Maybe I will still do 60x. :)

In theory, you could purchase 5 30x kits then merge the FASTQ files from each kit into one super FASTQ file.
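
As far as I understand it, the merge itself is trivial because concatenating gzip files still gives a valid gzip file, e.g. something like this (kit names made up):

cat kit1_R1.fastq.gz kit2_R1.fastq.gz kit3_R1.fastq.gz > merged_R1.fastq.gz
cat kit1_R2.fastq.gz kit2_R2.fastq.gz kit3_R2.fastq.gz > merged_R2.fastq.gz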

I would love to do this myself, but first I would like to get my maternal Y-DNA and my paternal mtDNA sequenced. I ordered a Dante Labs kit for my maternal uncle 2 weeks ago, and I ordered another kit for my father today, as it's his birthday.

If you're mainly interested in your own DNA, then you could purchase the kits for yourself and see. It may be worth clarifying what I've said about merging FASTQ files with people more knowledgeable than me, though. There are quite a few people on the Dante Labs Customers Facebook page who should be able to help you.

MacUalraig
11-26-2019, 08:34 AM
Talking of which, Dante now have their own controlled 'customer service' group as well. FWIW.

tontsa
11-26-2019, 08:41 AM
In that group they have upgraded the customer service to an actual human being that reads the mail you actually sent.. not that it makes the sequencing process any faster if you are unlucky to have your sample at "partner Illumina" lab.

MacUalraig
11-26-2019, 03:53 PM
It could be read as a positive sign that they are confident they have turned the corner at last. I do wonder for example if everyone who bought last Black Friday has actually taken delivery of their data? Hope so.

pmokeefe
11-26-2019, 05:13 PM
It could be read as a positive sign that they are confident they have turned the corner at last. I do wonder for example if everyone who bought last Black Friday has actually taken delivery of their data? Hope so.
I finally received the last of my four 2018 Black Friday kits yesterday. This one had adequate coverage. But two of those four had low FASTQ coverage (2X, 7X); I'm still waiting for that to be resolved.

aaronbee2010
11-26-2019, 07:48 PM
Just received my reports and raw data (hg19) today. I'm currently waiting for my hg38 files.

I'll evaluate the coverage for my BAM + FASTQ files as soon as I can then keep you all posted (if I can remember, hopefully!).

So far, so good. Thank you Dante Labs :)

https://i.gyazo.com/66d991b8723e53e86096097641f0dec3.png

mixmaxdp
11-28-2019, 10:29 AM
I've finally received sequencing results from my last year's BF WGS kit from DanteLabs this morning.

Can someone kindly provide guidance in layman's terms on how to evaluate coverage and my sample sequencing quality, please?
I'm just starting out with genetics, and while I have some programming/CS experience, I'm a little bit lost when it comes to interpreting/processing WGS results, especially since things are moving relatively fast in this area.
Thanks! :)

mdn
11-28-2019, 11:38 AM
I've finally received sequencing results from my last year's BF WGS kit from DanteLabs this morning.
Can someone kindly provide guidance in layman's terms on how to evaluate coverage and my sample sequencing quality, please?
You can try fastp: https://github.com/OpenGene/fastp

For example my results for 30x (received 2 weeks ago):
General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 5.186679%
Insert size peak: 261
Before filtering
total reads: 639.948568 M
total bases: 95.087056 G
Q20 bases: 91.374658 G (96.095790%)
Q30 bases: 85.692296 G (90.119833%)
GC content: 41.692851%
After filtering
total reads: 633.689772 M
total bases: 93.900964 G
Q20 bases: 90.563383 G (96.445636%)
Q30 bases: 85.027882 G (90.550595%)
GC content: 41.612174%
Filtering result
reads passed filters: 633.689772 M (99.021985%)
reads with low quality: 5.919744 M (0.925034%)
reads with too many N: 35.972000 K (0.005621%)
reads too short: 303.080000 K (0.047360%)

95 Gbases before filtering means that it is slightly better than 30x.

mdn
11-28-2019, 01:17 PM
And finally I ran a fastp report on my 4x (the result is about 1 month old).



General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 5.278218%
Insert size peak: 261

Before filtering
total reads: 291.229862 M
total bases: 43.367725 G
Q20 bases: 42.106602 G (97.092024%)
Q30 bases: 40.101717 G (92.469035%)
GC content: 43.100605%

After filtering
total reads: 288.389676 M
total bases: 42.877317 G
Q20 bases: 41.776746 G (97.433210%)
Q30 bases: 39.831228 G (92.895804%)
GC content: 43.037300%

Filtering result
reads passed filters: 288.389676 M (99.024761%)
reads with low quality: 2.699558 M (0.926951%)
reads with too many N: 29.418000 K (0.010101%)
reads too short: 111.210000 K (0.038186%)


Seems it is something like 14x??? ;)

MacUalraig
11-28-2019, 05:04 PM
I've finally received sequencing results from my last year's BF WGS kit from DanteLabs this morning.

Can someone kindly provide guidance in layman's terms on how to evaluate coverage and my sample sequencing quality, please?
I'm just starting out with genetics, and while I have some programming/CS experience, I'm a little bit lost when it comes to interpreting/processing WGS results, especially since things are moving relatively fast in this area.
Thanks! :)

As well as the other steps suggested, don't forget to submit any BAM you got to YFull.com for analysis and tree placement; they will do both the Y chromosome (males) and mtDNA.

https://yfull.com/order/

tontsa
11-28-2019, 05:42 PM
Just got fastqs and bam for another 30x kit sequenced in Italy.. it's pretty low coverage:
General
fastp version: 0.20.0 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 148bp, 148bp
mean length after filtering: 148bp, 148bp
duplication rate: 3.934586%
Insert size peak: 261
Before filtering
total reads: 519.950394 M
total bases: 77.216558 G
Q20 bases: 74.817375 G (96.892917%)
Q30 bases: 71.092621 G (92.069140%)
GC content: 42.040095%
After filtering
total reads: 514.497438 M
total bases: 76.227120 G
Q20 bases: 74.141915 G (97.264485%)
Q30 bases: 70.537754 G (92.536297%)
GC content: 41.959235%
Filtering result
reads passed filters: 514.497438 M (98.951255%)
reads with low quality: 5.038110 M (0.968960%)
reads with too many N: 176.942000 K (0.034031%)
reads too short: 237.904000 K (0.045755%)

teepean47
11-29-2019, 05:58 PM
Dante is now offering a 4x to 30x upgrade for 99€. With the discount code YFULL10 the total is 89€.

https://www.dantelabs.com/products/whole-genome-upgrade

tontsa
11-29-2019, 06:45 PM
I guess I'll bite and see if I end up with 34x in the end :) Also, they said they will resequence the sample that only had 76 gigabases, so I'm happy.

Anoon
11-29-2019, 07:33 PM
I received results for a 30x eight-week WGS order this week. Surprisingly quickly as well – 9 calendar days from when they received my sample to the results and raw files being available for download.

However, I'm a bit disappointed with the coverage. The BAM file was 32GB in size and Picard's CollectWgsMetrics (on the provided BAM file) reports the median coverage as 17 and the mean just below 17 (that with an exclusion total of 25.8% which seems high – seems like duplicates and overlapping reads are bringing that up).

fastp reports 75.744586 G total bases.

I have queried it, so will see what they say.

Will also try realigning to see if the output of CollectWgsMetrics improves.

mdn
11-29-2019, 08:49 PM
Dante is now offering a 4x to 30x upgrade for 99€. With the discount code YFULL10 the total is 89€.

https://www.dantelabs.com/products/whole-genome-upgrade
Thank you, has anyone tried it?

I have bought it, but I have no idea how to indicate which kit should be upgraded. :)
Or, since I have just one 4x kit, will it be applied automatically?

teepean47
11-29-2019, 09:13 PM
Thank you, has anyone tried it?

I have bought it, but I have no idea how to indicate which kit should be upgraded. :)
Or, since I have just one 4x kit, will it be applied automatically?

I ordered it and I assume (hope) they know what they are doing :) I only have one kit so it shouldn't be too difficult.

ybmpark
12-03-2019, 01:25 AM
... I do wonder for example if everyone who bought last Black Friday has actually taken delivery of their data? ...

No, and they don't even reply to emails anymore but you will be accusing me of working for a rival company and lying anyway.

tontsa
12-03-2019, 05:43 AM
I've gotten everything from last Black Friday.. SNP/Indel vcfs in the portal around March and HDD in April. Also now couple of weeks ago they uploaded the fastqs and newly generated Dragen platform provided BAM.. though not with the hg38 alignment I asked for in May. Also from this Black Friday one of my kits is in sequencing.. so their Italy lab seems fine.. as long as you check coverage of the fastq and ask for missing bits if seriously under 30x. Though.. I have from their 90 day campaign from April still one kit just "passed QC" stage.

mdn
12-03-2019, 04:25 PM
It seems the Intro 4x has been removed from the shop. It's not listed under All, and the old page https://www.dantelabs.com/products/introduction-to-whole-genome-sequencing now returns Not Found.

tontsa
12-03-2019, 04:30 PM
The Intro 4x also got removed the last time their offer ended.. so it could be a glitch too.

sam-iJ-ZS1727
12-03-2019, 09:42 PM
I got results in 1 week.
That was so fast!
The problem now is how to send the BAM file (64.3 GB) to YFull.
And how can I read my results?

MacUalraig
12-04-2019, 08:36 AM
I got results in 1 week.
That was so fast!
The problem now is how to send the BAM file (64.3 GB) to YFull.
And how can I read my results?

They said they would provide a widget to allow direct sharing with YFull but it doesn't seem to have appeared at least on my account.

So unless you fancy loaning YFull your login details (!), you will have to download it to your local drive. After that, the best bet is to split off the Y chromosome (and mtDNA if desired) and then share those with YFull. There are various ways of doing it; I use samtools. Then put those much smaller BAM files on the cloud and send YFull the link. Or, if you really want, upload the whole thing and share it, but that is less desirable.

Sounds fiddly, but I do this with the YSEQ BAMs all the time - the only difference is that they provide a ready-split Y BAM for you as well as the whole genome.
(By the way, YSEQ also offer to manually share your data with YFull, but I never let them do this as you can't be 100% sure which files they shared.)
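
For reference, the samtools version of the split is only a couple of commands (a sketch; the contig names assume an hg19/hs37d5 BAM like Dante's - use chrY and chrM for hg38):

samtools index full.bam
samtools view -b -o y_and_mt.bam full.bam Y MT
samtools index y_and_mt.bam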

mdn
12-05-2019, 07:23 AM
What do you think: if I buy an upgrade from Premium (30x) to Super Premium (130x exome + 30x), will they just do an additional 100x for the exome only and merge it with the old results?
Or will it be a completely new sequencing run, which I could merge myself to get ~160x exome + ~60x genome?

mdn
12-05-2019, 07:28 AM
After that, the best bet is to split off the Y chromosome (and mtDNA if desired) and then share those with YFull.
It might also be worth checking whether the BAM is hg38; if not, create an hg38 BAM from that BAM or from the FASTQ files. It can be quite tricky and might take a week (or weeks, if multithreading gets forgotten - very easy the first time). :)

MacUalraig
12-05-2019, 08:32 AM
It might also be worth checking whether the BAM is hg38; if not, create an hg38 BAM from that BAM or from the FASTQ files. It can be quite tricky and might take a week (or weeks, if multithreading gets forgotten - very easy the first time). :)

I realigned mine with minimap so I could compare it with the long read BAM. Run time was 6h 32m with 3 threads (default).

real time: 23590.254 sec CPU 69011.285 sec Peak RSS 13.465Gb
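
The invocation was along these lines (a sketch rather than the exact command; the sr preset is minimap2's short-read mode, and the file names are placeholders):

minimap2 -ax sr -t 3 hg38.fa kit_R1.fastq.gz kit_R2.fastq.gz | samtools sort -o realigned.hg38.bam -
samtools index realigned.hg38.bam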

sam-iJ-ZS1727
12-05-2019, 02:42 PM
Where can I find the Minimap software?
Is it free?
Is it the same thing as the mapping service from YSEQ or Dante Labs?

MacUalraig
12-05-2019, 05:20 PM
Where can I find the Minimap software?
Is it free?
Is it the same thing as the mapping service from YSEQ or Dante Labs?

There are many freely available programs for doing alignment (mapping). Some have Windows binaries, but many don't (source code and maybe Linux binaries only). bowtie and bwa-mem are a couple of popular ones.

minimap2 is here:

https://github.com/lh3/minimap2

It needed 13+Gb of free RAM when I ran it.

Several companies will remap hg19 to hg38 or vice versa if you want; as well as YSEQ and Dante, Full Genomes do it:

https://www.fullgenomes.com/analysis-and-more/

https://www.fullgenomes.com/purchases/136/?

tontsa
12-06-2019, 05:42 AM
Dante now has the upgrades back on the website, plus a new Hybrid Whole Genome Sequencing (130X exome + 30X WGS + 15X long reads).. is there really any use for those long reads if you don't get FAST5 files?

MacUalraig
12-06-2019, 07:58 AM
On your own: compare large-scale variants, especially deletions, with the literature.
With a couple of others: start doing your own large-scale-variant-based phylogeny. That's a pricey hobby though.

aaronbee2010
12-06-2019, 05:04 PM
On your own: compare large-scale variants, especially deletions, with the literature.
With a couple of others: start doing your own large-scale-variant-based phylogeny. That's a pricey hobby though.

Off topic, but the bottom of your signature is pure gold. :beerchug:

mdn
12-09-2019, 02:44 PM
New sale, December special - 299€ for 30x/8 weeks.

Intro 4x is still missing.
My upgrade also seems to be lost; at least the status is "Unfulfilled", and 10 days of the promised 3 weeks have already passed. :)

pmokeefe
12-09-2019, 04:36 PM
The good news: a kit I ordered in the 11/11 sale with 8-week delivery had the FASTQ files posted on 12/09, so that was great!
The bad news: I ran fastp and got:

Before filtering
total reads: 374.563238 M
total bases: 55.366384 G
This implies 55.4/3.2 = 17.3X coverage, significantly short of the advertised coverage.

They haven't posted BAM or VCF yet, but the FASTQ was posted less than a day ago.

slievenamon
12-09-2019, 06:32 PM
Could someone please provide the mailing address for returning the vial to Dante Labs in Italy?
Three project members ordered.
Two in the USA have not received their kits to date.
No reply from Dante to their queries.
The member in Ireland received a kit and is ready to return it.
There is no address label, and no reply email from Dante with the address.
If someone could share a Dante Labs address, it would be appreciated.
Thanks...

tontsa
12-09-2019, 06:38 PM
You can print a new shipping label when you register the kit at genome.dantelabs.com. If you have already registered a kit, you can still press the "register the kit" button and just print a new label from that page. I don't recommend trying to send them via normal mail, as they have a special deal with UPS and DHL for receiving those.

karwiso
12-09-2019, 09:09 PM
And statistics from my last WGS at Dante (Nov 2019)
fastp -i kit_R1_001.fastq.gz -I kit_R2_001.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz :


sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 141bp, 141bp
mean length after filtering: 141bp, 140bp
duplication rate: 5.414195%
Insert size peak: 167
Before filtering
total reads: 586.671976 M
total bases: 82.797727 G
Q20 bases: 80.286072 G (96.966517%)
Q30 bases: 76.495246 G (92.388099%)
GC content: 42.565553%
After filtering
total reads: 580.360078 M
total bases: 81.833140 G
Q20 bases: 79.653677 G (97.336699%)
Q30 bases: 75.976144 G (92.842758%)
GC content: 42.504814%
Filtering result
reads passed filters: 580.360078 M (98.924118%)
reads with low quality: 6.029014 M (1.027664%)
reads with too many N: 137.120000 K (0.023373%)
reads too short: 145.764000 K (0.024846%)

So, total bases before filtering 82.80/3.2 = 25,9X
Total bases after filtering 81.83/3.2 = 25,6X

tontsa
12-11-2019, 11:00 AM
How would you guys go about combining multiple runs of 5x-30x WGS? I have 100bp BGI one and then 30x and 5x with Illumina 150bp reads. I've had mixed results of creating one BAM and then variant calling that.. not yet sure how to go about comparing variant calling individual BAMs and then merging the vcfs somehow.

pmokeefe
12-12-2019, 03:34 AM
I found a curious discrepancy between my FASTQ files and my BAM files received from Dante Labs recently. In 4/6 cases the FASTQ files had significantly fewer reads than the BAM files, which doesn't seem possible if the BAM files were based on the given FASTQ files. The number of FASTQ reads was determined by running fastp (https://github.com/OpenGene/fastp), BAM reads came from samtools flagstat (http://www.htslib.org/doc/samtools-flagstat.1.html).



Reads (millions):

fastq   bam
   45   752
  614   615
  145   699
  585   586
   44   569
  375   728




The first four tests in the list were ordered during Black Friday 2018, but the results were only posted recently. The last one was ordered on Nov 11.

Has anyone else noticed this?
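For reference, the two counts can be cross-checked roughly like this (a sketch; file names are placeholders, and note that samtools flagstat's "in total" line includes secondary and supplementary alignments, which can inflate the BAM count relative to the FASTQs):

# reads in a gzipped FASTQ (4 lines per read); repeat for R1 and R2
echo $(( $(zcat kit_R1_001.fastq.gz | wc -l) / 4 ))
# primary reads in a BAM, excluding secondary (0x100) and supplementary (0x800) records
samtools view -c -F 0x900 kit.bam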

tontsa
12-12-2019, 05:13 AM
At least I got exactly the same FASTQs as were on the HDD I received back in April 2019 or so for the 2018 Black Friday kit. The BAM was newly generated with Dragen, though, but at least it referenced the same files. I haven't really looked into that BAM too closely as I don't need an hs37d5-aligned one currently.

dosas
12-12-2019, 06:24 AM
I am thinking of getting their intro package (30x). Apart from the obvious benefits of detailed health-related reports, is it worth it for genealogical research regions, like uploading the BAM to YFull (I have already done a BIGY700)?

Thanks.

MacUalraig
12-12-2019, 12:57 PM
I am thinking of getting their intro package (30x). Apart from the obvious benefits of detailed health-related reports, is it worth it for genealogical research regions, like uploading the BAM to YFull (I have already done a BIGY700)?

Thanks.

Mine matched the full set of novel variants from my Y Elite test, and no more (so far). Don't forget you will also have mtDNA with it which YFull now analyse and place on their mtree:

https://yfull.com/mtree/

mtDNA is stripped out of the BY raw data.

teepean47
12-12-2019, 12:59 PM
I am thinking of getting their intro package (30x). Apart from the obvious benefits of detailed health-related reports, is it worth it for genealogical research regions, like uploading the BAM to YFull (I have already done a BIGY700)?

Thanks.

30x has good enough quality to create autosomal kits with WGSExtract, and YFull accepts the BAM as well.

teepean47
12-12-2019, 01:01 PM
How would you guys go about combining multiple runs of 5x-30x WGS? I have 100bp BGI one and then 30x and 5x with Illumina 150bp reads. I've had mixed results of creating one BAM and then variant calling that.. not yet sure how to go about comparing variant calling individual BAMs and then merging the vcfs somehow.

I was thinking about something similar: combine two FASTQ/BAM-files depending on read quality. So use one FASTQ/BAM to improve another.

bjp
12-12-2019, 05:34 PM
At least I got exactly the same FASTQs as were on the HDD I received back in April 2019 or so for the 2018 Black Friday kit. The BAM was newly generated with Dragen, though, but at least it referenced the same files. I haven't really looked into that BAM too closely as I don't need an hs37d5-aligned one currently.

Thanks for posting this. Just checked, and the kit I've been posting about now has BAM and FASTQ posted for download. I wish they could keep their story straight. When I ordered it, they said raw data would be downloadable. Then they said no way, gotta pay for the drive. Then I paid for the drive and they sent it about 6 months later. Now a few months after that and it is online. I have not bothered to download them yet to compare to the files on the drive.

MacUalraig
12-14-2019, 10:57 AM
Blaine Bettinger has got his raw data and made some long posts about it on his big fb group which has kicked off some interest. He used the YSEQ remapping service.

He mentioned that he expected to have a WGS chapter in the next edition of his book, which would be good.

belmont
12-15-2019, 09:37 AM
Yesterday I tested the download from Dante Labs: a 36 GB BAM file came down in 27 minutes, so it maxed out my 200 Mbit/sec Internet line - I must say, excellent speed. I also ran fastp on it:
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 147bp, 147bp
mean length after filtering: 147bp, 147bp
duplication rate: 5.237798%
Insert size peak: 261
Before filtering
total reads: 622.063512 M
total bases: 91.707459 G
Q20 bases: 89.236250 G (97.305334%)
Q30 bases: 85.153134 G (92.853008%)
GC content: 42.444874%
After filtering
total reads: 616.151406 M
total bases: 90.678039 G
Q20 bases: 88.540228 G (97.642416%)
Q30 bases: 84.579331 G (93.274327%)
GC content: 42.377279%
Filtering result
reads passed filters: 616.151406 M (99.049598%)
reads with low quality: 5.595058 M (0.899435%)
reads with too many N: 13.404000 K (0.002155%)
reads too short: 303.644000 K (0.048812%)

Jan_Noack
12-16-2019, 08:28 AM
The coverage counted using idxstats from the original hg19 BAM files supplied by Dante, and from hg38 BAM files created by myself using the procedure described by James Kane:

Kit 1 (98.3 Gbases): coverage 15.57x, 51.23% unmapped - hg38 (JKane): coverage 16.55x, 47.19% unmapped
Kit 2 (73.7 Gbases): coverage 22.21x, 7.69% unmapped - hg38 (JKane): coverage 22.30x, 6.35% unmapped
Kit 3 (97.8 Gbases): coverage 30.78x, 3.34% unmapped - hg38 (JKane): not processed yet

What could be the reason for so many unmapped reads for Kit 1? High contamination of the original sample? Or the 8 months storage of the sample by Dante?


Too many possibilities to answer. In principle, I believe DNA Genotek (https://blog.dnagenotek.com/), for example, seems to promise years of storage at room temperature. However, this assumes the stabilizing buffer was added promptly. In addition, your sample was probably also collected on SPECTRUM; while I recall Dante Labs saying they switched kits specifically because the new one provides higher quality, I've not seen any specs or studies supporting that.

Seeing the read quality of the unmapped reads (I can't, unfortunately, recall whether any QC software readily provides that) would help to see if it's degraded DNA, as would mapping them against some microbial references (I have oral microbiome results on the "Dante Labs technical" thread, but I'm thinking one of the high-level classifiers could be useful too, for mold etc.). I have to say that 15.57X is certainly concerning for something sold as 30X. The microarray genotyping companies also seem to have individuals who never have enough DNA in the sample, though, so I'm sure there are outliers, mis-applied stabilizing buffer, etc.
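For reference, the idxstats-based coverage and unmapped-read figures quoted above can be approximated along these lines (a sketch: the BAM name and the 150 bp read length are placeholders, the BAM must already be indexed, and duplicates/soft-clipping are ignored, so it is only a rough estimate):

# column layout of samtools idxstats: contig, length, mapped reads, unmapped reads
samtools idxstats sample.bam | awk -v rl=150 '
  $1 != "*" { len += $2; mapped += $3 }
  { unmapped += $4 }
  END {
    printf "coverage ~%.2fx\n", mapped * rl / len
    printf "unmapped  %.2f%%\n", 100 * unmapped / (mapped + unmapped)
  }'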

I've had two kits done, with a similar mapping/QC issue on my Kit 1.
Kit 1. Purchased Feb 2019 and returned by slow mail (even though I paid for tracking etc.), as I mistakenly ordered from the US site and they only send a return address if you post from the US. I did contact them and was not offered any help. This kit was delayed in US customs, presumably Asian customs, then US customs again, and then Italian customs - so months when it was not in ideal conditions. That said, they insist it can sit out in 50C heat or below-freezing temperatures (as it did during the big freeze in the US last winter). My usual wait in US customs with parcels and mail is 5 weeks; I've had up to 9 weeks for a saliva sample to clear customs.
This is my husband's kit. We were careful in taking the saliva: I wore gloves and a plastic apron, in a clean, washed, hard-floor environment; the sample was taken first thing in the morning with no drink or food since the evening before; teeth and mouth cleaned well the night before, etc. My husband does have a PhD in biology, if that adds to the point that it was NOT the collection process. (Although the kit instructions now say you can clean your teeth one hour before, that was not how the advice used to read a couple of years ago when I took the FTDNA, Ancestry etc. samples.) The stabilizing buffer was applied immediately. The kit was stored in a cool place at room temperature until posted that day, and kept in a cool place at the post office until collected by the mail truck on a coolish day (all considered in the collection). The timing was also after the extremes of temperature had passed for transit. Parcel tracking was paid for and it left Australia promptly. The kit was received by Dante in Utah in March 2019 I think, maybe April - I'd need to check that.

Kit 2. Purchased directly from the EU site, collected by DHL, and returned to Italy; processing and reports were finished very fast. Collected Friday
22 Nov 2019 in Australia; processing finished and the report loaded by 3 Dec 2019 in Italy. I didn't notice until I got home on 13 Dec. Everything was there. DHL expedites things through customs. This kit was also collected with care and consideration for temperature, and picked up promptly by the DHL courier.

Kit 1: FASTQ1 is 49.99 Gbases, FASTQ2 is 49.97 Gbases (using FQSUM). But running bam.iobio I get 61.7% mapped reads, with 83.4% proper pairs, 6.1% singletons, 87.8% both mates mapped, 8.5% duplicates.
https://www.facebook.com/groups/558973438185946/permalink/590295888387034/
That gives almost 20X, but I am unsure whether it is so corrupted that it can't be trusted even that far?

Kit 2: seems to hit the new minimum of 90 Gbases of reads (using FQSUM) and gave me almost 30X.
90.4% mapped reads, 98.1% proper pairs, 0.6% singletons, 98.6% both mates mapped, 7.1% duplicates.

I was assured it was NOT the delay or the sitting around in customs or transit that may have caused degradation of the sample. Both kits had stabilizer applied immediately, and both were taken with the utmost care and stored with care until sent (room temperature, air-con set to 20C). I read the technical material about the kits and the stabilizer and even emailed them. I am assured it can withstand 50C for months in the tests they provide; I am not sure I agree with this, though. If not, how does the sample get so contaminated... OK, it could be viruses? He does have a lurking shingles virus, but at the time of collection was healthy enough not to show any symptoms. Otherwise, with at least two of us having similar issues, maybe it is some problem at Dante Labs. I do wonder what Dante's QC test is and how these kits passed it?

pmokeefe
12-16-2019, 10:59 AM
I had mixed results with FASTQ files from Dante Labs since they started their in-house lab.
Four out of six kits were small; two were only 6.7 Gbases, i.e. about 2.1X coverage.
Thanks to the members of the Dante Labs Customers FB, I tried redownloading the FASTQ files from the Dante Labs site.
3 of the 4 small kits are now significantly larger. You can check to see if the files are bigger by starting the download and then cancelling if the file size isn't larger. I've been checking my fourth small kit that way to see if eventually it is replaced as well. Good luck!

Old vs. new FASTQ total bases (Gbases), from fastp:
 Old     New
 6.7    137.4
 6.6     84.3
55.4    107.5

mdn
12-16-2019, 12:29 PM
Is Dante Labs any good?
During sales it is 5-10 times cheaper than any competitor (on sale, the 30x WGS price is 169€).
But for 99.9% of potential users the provided output is clearly not enough. :) Either you will need to spend a lot of time understanding what you can do "for free", or you will need to pay more to some other company.

mdn
12-16-2019, 12:32 PM
(regarding the upgrade from Intro)

I ordered it and I assume (hope) they know what they are doing :) I only have one kit so it shouldn't be too difficult.

Have you received any update?
My 4x ready for "upgrade" is still untouched - it seems I will need to write to support.

tontsa
12-16-2019, 12:36 PM
My 4x upgrade hasn't moved.. nor has resequencing of one 30x kit.. they are prolly again swamped from all the sale campaigns that we have to wait months or year for something to happen..

pmokeefe
12-16-2019, 01:45 PM
How would you guys go about combining multiple runs of 5x-30x WGS? I have 100bp BGI one and then 30x and 5x with Illumina 150bp reads. I've had mixed results of creating one BAM and then variant calling that.. not yet sure how to go about comparing variant calling individual BAMs and then merging the vcfs somehow.

I'm currently struggling with this issue. I have 3 BGI kits and 7 Illumina kits, with repeated individuals within both BGI and Illumina and across the two. When I tried to use bcftools mpileup with BAMs from BGI and Illumina together, mpileup ignores the Illumina kits. But when I run mpileup separately on the BGI and Illumina kits it seems to work reasonably. I've tried every option I could find for mpileup without success in handling both BGI and Illumina together.

I was motivated to work with the bam files Dante Labs provided because they had fairly reasonable coverage. Their Illumina FASTQs however were sometimes very small, <7G bases read in two cases and only 2/6 were close to 90G bases.

In the meantime it's been brought to my attention (on the Dante Labs customer FB group) that in some cases the FASTQs on Dante's site have changed. 3 out of 4 of my kits that were small are now 85G or even bigger. So now I'm contemplating realigning the FASTQs.

But just out of curiosity, has anyone successfully combined BGI and Illumina BAMs using bcftools mpileup (or any other method)?

JamesKane
12-17-2019, 01:20 PM
How would you guys go about combining multiple runs of 5x-30x WGS? I have 100bp BGI one and then 30x and 5x with Illumina 150bp reads. I've had mixed results of creating one BAM and then variant calling that.. not yet sure how to go about comparing variant calling individual BAMs and then merging the vcfs somehow.

I don't recommend mixing BAMs of varying read lengths. In the experiment I did with a 100bp 1st generation Y Elite and a 150bp 30x WGS test, the resulting file was less callable overall than just using the 30x WGS file on chrY locations. The source of the deviation is in the lower alignment confidence in the short reads. You wind up with a lot of reads with lower MapQ scores washing out the much higher scores on the longer reads.

pmokeefe
12-17-2019, 04:46 PM
I don't recommend mixing BAMs of varying read lengths. In the experiment I did with a 100bp 1st generation Y Elite and a 150bp 30x WGS test, the resulting file was less callable overall than just using the 30x WGS file on chrY locations. The source of the deviation is in the lower alignment confidence in the short reads. You wind up with a lot of reads with lower MapQ scores washing out the much higher scores on the longer reads.

I just took a quick look at a 100bp BGISEQ-500 BAM and a 150bp Illumina NovaSeq 6000 BAM, both provided by Dante Labs. I used https://bam.iobio.io/ In both cases the "Mapping Quality" displayed was overwhelmingly loaded at 60; both graphs were basically tall spikes on the far right. From the description on the site I'm guessing their Mapping Quality is just MapQ - anyone know for sure? I hope so, it's cute and fast.

*** update ***
Some MapQ stats for chromosome 22 were:

% nonzero MapQ
100bp: 94.2%
150bp: 96.0%

average MapQ when MapQ!=0
100bp: 57.87
150bp: 58.28

So the chromosome 22 MapQ numbers were a bit better with the 150bp Illumina kit than the 100bp BGISEQ kit.
Hopefully 22 is representative.
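For reference, those chromosome 22 tallies can be reproduced with something like this sketch (the BAM name is a placeholder and must be indexed; the region is "22" for hg19/hs37d5 or "chr22" for hg38-style naming; column 5 of SAM is MAPQ, and -F 0x900 drops secondary/supplementary records):

samtools view -F 0x900 kit.bam 22 | awk '
  { n++; if ($5 > 0) { nz++; sum += $5 } }
  END {
    printf "%% nonzero MapQ: %.1f%%\n", 100 * nz / n
    printf "mean MapQ (MapQ > 0): %.2f\n", sum / nz
  }'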

slievenamon
12-17-2019, 07:54 PM
A 2019 WGS Black Friday kit finally arrived for a USA Project member.
It was from Draper, Utah, USA. Kit is to be returned to Draper, UT.
This kit had been delayed.

Another USA Project member is still pending receipt of his WGS kits.
Will his kits arrive from Italy or Draper, Utah, USA?

A Project member, in Ireland, received his WGS kit soon after ordering.
His kit was returned to Dante's lab in Italy.

Where did your WGS kit originate?
Did you return it to Italy or Draper, Utah, USA?
Thanks...

Donwulff
12-19-2019, 06:45 AM
What I would do is align both BAMs separately (especially with BWA-MEM, since it dynamically detects the insert length of the DNA fragments in the library!), making sure to include read groups so all statistical analysis can tell they have different statistics, and then merge the BAM files. I'm not sure exactly how variant callers deal with the different insert lengths etc.; it will undoubtedly depend on the variant caller, and many do local re-alignment during variant calling. It should be noted that reads are often trimmed for adapters and low-quality bases before analysis, so differing read lengths should already be supported to some degree - and speaking of which, if you have different adapter sequences it's probably worth trimming them with something that supports both adapter sets. On the whole I don't think anybody has really looked at or made a study of combining short reads from different libraries or sequencing platforms.
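A minimal sketch of that per-library alignment-and-merge workflow (read-group fields, file names and thread counts are illustrative; samtools merge keeps each input's @RG line so downstream tools can still tell the libraries apart):

# align each library separately with its own read group
bwa mem -t 8 -R '@RG\tID:bgi100\tSM:sample1\tPL:DNBSEQ\tLB:lib_bgi' \
    ref.fa bgi_R1.fq.gz bgi_R2.fq.gz | samtools sort -@ 8 -o bgi.bam -
bwa mem -t 8 -R '@RG\tID:ilmn150\tSM:sample1\tPL:ILLUMINA\tLB:lib_ilmn' \
    ref.fa ilmn_R1.fq.gz ilmn_R2.fq.gz | samtools sort -@ 8 -o ilmn.bam -
# merge; both @RG lines are preserved in the header of the combined BAM
samtools merge -f merged.bam bgi.bam ilmn.bam
samtools index merged.bam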

Donwulff
12-19-2019, 07:20 AM
If the variant caller penalizes locations with low-MapQ reads (and I'm not sure why it should - shouldn't any sane implementation still use the high-MapQ reads?), perhaps those could be filtered out from the shorter reads, although I admit this messes with the assumption of uniform statistics. VQSR could mark those as low-MapQ regions of course, but I'm not entirely certain VQSR would make sense for such a combined run. I also wonder how the new neural network version fares with that... stuff to test, I guess.

Adamm
12-20-2019, 12:44 PM
My first time with Dante Labs, hope it's worth it :)

https://i.imgur.com/lBHur5y.png

E_M81_I3A
12-20-2019, 06:24 PM
My first time with Dante Labs, hope it's worth it :)

https://i.imgur.com/lBHur5y.png

Not sure... They received my kit in April, 8 months ago and still waiting for the results! Their last answer in October was "We have a backlog of results to be released right now, but your results will be available in the next few weeks"… Since then, no news and they no longer answer to my emails>:(.

I think no one living in Europe or outside of the US has ever received his results, unfortunately.

slievenamon
12-20-2019, 07:36 PM
We have a Project member in Ireland. He sent his kit to Italy.
Italy is in the EU and the GDPR is in Dublin, Ireland.
There are options available, if you don't receive what you paid for.
Our other members have sent their kits to Draper, Utah, USA.
That's right up the road from me.
Not a problem...
The kits our Project received were from Spectrum DNA.
They are a business resource of Ancestry.
If you can't get answers from Dante, try their affiliates, eh?
From where did your kits arrive?
Where did you return them?

Adamm
12-20-2019, 07:42 PM
Not sure... They received my kit in April, 8 months ago and still waiting for the results! Their last answer in October was "We have a backlog of results to be released right now, but your results will be available in the next few weeks"… Since then, no news and they no longer answer to my emails>:(.

I think no one living in Europe or outside of the US has ever received his results, unfortunately.

Wow that is pretty shitty from them, can't you contact them through a phone?

teepean47
12-20-2019, 08:00 PM
I think no one living in Europe or outside of the US has ever received his results, unfortunately.

I live in Europe and got my results in less than eight weeks. There are others in this thread from Europe who have discussed their results.

E_M81_I3A
12-20-2019, 08:02 PM
We have a Project member in Ireland. He sent his kit to Italy.
Italy is in the EU and the GDPR is in Dublin, Ireland.
There are options available, if you don't receive what you paid for.
Our other members have sent their kits to Draper, Utah, USA.
That's right up the road from me.
Not a problem...
The kits our Project received were from Spectrum DNA.
They are a business resource of Ancestry.
If you can't get answers from Dante, try their affiliates, eh?
From where did your kits arrive?
Where did you return them?

I received my kit directly from Dante Labs and sent it back to Italy in April; they received it on May 12th. Since June, its status has been "Your Kit will be Sequenced Shortly. Your kit has passed the quality care inspection and is scheduled to be sequenced shortly. After sequencing is complete your results will be posted soon after".

Until October they always answered my emails, and in the last one in October they wrote: "Thanks for reaching out. We have a backlog of results to be released right now, but your results will be available in the next few weeks. At the moment, we are unable to give you the exact day and date of when your result will be out . And you will be notified by email when we have a more precise timeline for your results. Thanks for your patience and apology for the inconvenience."

But since then no news... and no more answers to my emails.

I think the problem might be that when I bought my kit on their web site on April 26th (at a special price of 199 euros), it was written that if the results were not sent after 90 days then I should get both my results and my money back... As they probably don't want to give me the money back, they don't send the results.

slievenamon
12-20-2019, 08:26 PM
I think the problem might be that when I bought my kit on their web site in April (at a special price of 199 euros), it was written that if the results were not sent after 90 days then I should get both my results and my money back... As they probably don't want to give my money back, they don't send the results.

You have requested your results from Dante and have not received them.
Have you request a refund from Dante?
How do you know they do not wish to return your money?
You are entitled to both a refund and results, after 90 days, yes?
If Dante is not in compliance, why have you not filed with GDPR in Dublin, Ireland?

E_M81_I3A
12-20-2019, 08:35 PM
You have requested your results from Dante and have not received them.
Have you request a refund from Dante?
How do you know they do not wish to return your money?
You are entitled to both a refund and results, after 90 days, yes?
If Dante is not in compliance, why have you not filed with GDPR in Dublin, Ireland?

Yes, I am entitled to both a refund and results after 90 days. That's why I requested a refund in August, after the 90 days, and they answered: "Thanks for reaching out. For your refund, I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information. Once your refund is completed successfully, you will be notified. Thanks for your patience."

MacUalraig
12-21-2019, 03:46 PM
Yes, I am entitled to both a refund and results after 90 days. That's why I requested a refund in August, after the 90 days, and they answered: "Thanks for reaching out. For your refund, I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information. Once your refund is completed successfully, you will be notified. Thanks for your patience."

Interesting, I recall at one stage they argued that they would *first* deliver the delayed results and then hand back the refund. Spot the catch.

pmokeefe
12-21-2019, 06:59 PM
I've been trying to use bcftools mpileup (http://samtools.github.io/bcftools/bcftools.html#mpileup) to combine different Dante Labs kits and have run into a brick wall. When I feed two old BGI bams to mpileup it works fine. When I feed two new Illumina kits to mpileup it also works fine. But when I give mpileup a BGI and an Illumina it only uses one. When the order of the input bam files given to mpileup is BGI Illumina it only uses the BGI bam. When the order is reversed, it only uses the Illumina bam. I tried many other permutations and combinations: with/without giving a reference file; changing the order of the lines in the read group file; selecting a region vs. using the entire bam; providing the bam files in a file with the -b option instead of on the command line etc. bcftools mpileup would never combine a BGI bam with an Illumina, and changing the order of the bam files controlled which bam it used. The order of lines in the read group file didn't matter.

So it doesn't seem to have anything to do with the relative quality of BGI and Illumina, it appears that mpileup deems them incompatible and uses whichever one it sees first.

Suggestions, other things to try, etc. would be most appreciated!


If you are curious the bam files are available in a public Google Drive folder (https://drive.google.com/open?id=1aj-Qu_Mq4zSI3Jo3LDVlNuN8aag4iIKN). They are just chromosome 22, so not too huge. The bash script and read group file I used are also there.


Here are the commands I used. bam1 is BGI and bam2 is Illumina, region=22:29877565, RG12 is the read group file.

bam1=$1
bam2=$2
region=$3
RG12=$4

echo "mpileup BGI"
bcftools mpileup --no-reference -a FORMAT/AD $bam1 -r $region| \
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%AD]\n'
echo

echo "mpileup Illumina"
bcftools mpileup --no-reference -a FORMAT/AD $bam2 -r $region| \
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%AD]\n'
echo

echo "--read-groups"
cat $RG12
echo

echo "mpileup BGI Illumina"
bcftools mpileup --no-reference -a FORMAT/AD --read-groups $RG12 $bam1 $bam2 -r $region | \
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%AD]\n'
echo

echo "mpileup Illumina BGI"
bcftools mpileup --no-reference -a FORMAT/AD --read-groups $RG12 $bam2 $bam1 -r $region | \
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%AD]\n'
echo



Here is the output of the script

mpileup BGI
22 29877565 N G,<*> 0,44,0

mpileup Illumina
22 29877565 N G,<*> 0,30,0

--read-groups
* 56001801067578A_WGZ/result_alignment/56001801067578A_WGZ.22.bam pmo
* 60820188474472/60820188474472.22.bam pmo

mpileup BGI Illumina
22 29877565 N G,<*> 0,44,0

mpileup Illumina BGI
22 29877565 N G,<*> 0,30,0


This is the command I used to invoke the script, your files might be in a different location
./mpileup.anomaly.sh 56001801067578A_WGZ/result_alignment/56001801067578A_WGZ.22.bam 60820188474472/60820188474472.22.bam 22:29877565 mpileup.anomaly.RG12.txt > mpileup.anomaly.output.txt

MacUalraig
12-21-2019, 07:30 PM
Did you look at the source code for mpileup.c? Quick skim shows only one place in the file loop where it skips a file as opposed to exits

conf->mplp_data[i]->bam_id = bam_smpl_add_bam(conf->bsmpl,h_tmp->text,conf->files[i]);
if ( conf->mplp_data[i]->bam_id<0 )
{
// no usable readgroups in this bam, it can be skipped

<snip>
continue;
}

pmokeefe
12-21-2019, 07:38 PM
Did you look at the source code for mpileup.c? Quick skim shows only one place in the file loop where it skips a file as opposed to exits

conf->mplp_data[i]->bam_id = bam_smpl_add_bam(conf->bsmpl,h_tmp->text,conf->files[i]);
if ( conf->mplp_data[i]->bam_id<0 )
{
// no usable readgroups in this bam, it can be skipped

<snip>
continue;
}

Thanks, I looked at that file yesterday, but didn't notice that spot. I will look again.

teepean47
12-21-2019, 10:30 PM
(regarding the upgrade from Intro)


Have you received any update?
My 4x ready for 'upgrade' is still not touched - but seems I will need to write to the support.

Nothing yet.

pmokeefe
12-22-2019, 12:59 AM
Did you look at the source code for mpileup.c? Quick skim shows only one place in the file loop where it skips a file as opposed to exits

conf->mplp_data[i]->bam_id = bam_smpl_add_bam(conf->bsmpl,h_tmp->text,conf->files[i]);
if ( conf->mplp_data[i]->bam_id<0 )
{
// no usable readgroups in this bam, it can be skipped

<snip>
continue;
}

I cloned the development version of bcftools from github and installed it on my system. mpileup still behaves the same given BGI and Illumina bam files. But alas mpileup does not hit that particular code in the process. Definitely worth checking though, and now I am all set up to explore further:)

Jatt1
12-22-2019, 04:24 AM
Should one do Big Y-700 or Dante Labs WGS?

MacUalraig
12-23-2019, 12:38 PM
YFull have finished their SNP analysis of my Dante short read bam which I uploaded in its original hg19 format as I was a bit impatient...

As per the statistics comparison page, where I can now compare 4 BAMs of mine, they claim to have found only 7 NVs compared with 12 on the Y Elite; however, this is misleading, as the Y Elite figure includes originally novel SNPs that were subsequently promoted to branch level.

ChrY BAM file size: 0.44 Gb (Hg19)
Reads (all): 5959939
Mapped reads: 5959939(100.00%)
Unmapped reads: 0
Length coverage: 22935991 bp(99.79%)
Min depth coverage: 1X
Max depth coverage: 205X
Mean depth coverage: 23.09X
Median depth coverage: 21X
Length coverage for age: 8418625 bp
No call: 48538 bp

MacUalraig
12-23-2019, 12:41 PM
Should one do Big Y-700 or Dante Labs WGS?

Dante (+ YFull upload of both chrY and mtDNA).

aaronbee2010
12-23-2019, 06:16 PM
Should one do Big Y-700 or Dante Labs WGS?

Wait for an offer from Dante Labs that falls under $200 for 30x WGS, and do that. You can use Galaxy (or even a Linux Bash Shell in Windows) to align the FASTQ's to hg38, extract Y-DNA and mtDNA reads and upload to YFull for $50. $250 or less is very low compared to the offer price of a Big Y ($550 including BAM download and YFull upload). Even without the BAM download and YFull upload, the offer price will be somewhere around $400, but you're restricted to FTDNA's tree and database, which can't grow as much as YFull's database, which consists of Y-DNA data from various companies and studies. FTDNA's database isn't good for South Asians, and you can still compare your YFull STR's to those from South Asians in the India and L projects on FTDNA.

Dante Labs seem to be doing better since they opened their own laboratory and customer service team (they outsourced both before). Some older backlogged customers are still waiting (I was a backlogged customer but I received my results), but newer customers appear to be getting their results; however, some of them appear to be getting lower coverage, though you can request resequencing if this does happen to you. bol_nat just received his results from Dante Labs and his coverage is around 35x (mine is around 29x), and Rustyshakelford has ordered a kit as well.

On top of this, Dante Labs also gives you autosomal and mtDNA data as well. More data for less money means that on offer, Dante Labs is vastly superior.
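For the Y-DNA/mtDNA step mentioned above, the extraction itself is just a region pull from the aligned, indexed BAM - roughly like this sketch (file names are placeholders; the contig names are chrY/chrM for the hg38 analysis set, Y/MT for some other references):

samtools index sample.hg38.bam
samtools view -b -o sample.Y-MT.bam sample.hg38.bam chrY chrM
samtools index sample.Y-MT.bam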

Jatt1
12-24-2019, 05:10 AM
Wait for an offer from Dante Labs that falls under $200 for 30x WGS, and do that. You can use Galaxy (or even a Linux Bash Shell in Windows) to align the FASTQ's to hg38, extract Y-DNA and mtDNA reads and upload to YFull for $50. $250 or less is very low compared to the offer price of a Big Y ($550 including BAM download and YFull upload). Even without the BAM download and YFull upload, the offer price will be somewhere around $400, but you're restricted to FTDNA's tree and database, which can't grow as much as YFull's database, which consists of Y-DNA data from various companies and studies. FTDNA's database isn't good for South Asians, and you can still compare your YFull STR's to those from South Asians in the India and L projects on FTDNA.

Dante Labs seem to be doing better since they opened their own laboratory and customer service team (they outsourced both before). Some older backlogged customers are still waiting (I was a backlogged customer but I received my results), but newer customers appear to be getting their results; however, some of them appear to be getting lower coverage, though you can request resequencing if this does happen to you. bol_nat just received his results from Dante Labs and his coverage is around 35x (mine is around 29x), and Rustyshakelford has ordered a kit as well.

On top of this, Dante Labs also gives you autosomal and mtDNA data as well. More data for less money means that on offer, Dante Labs is vastly superior.

I can get one for my brother for $259 US, but for me it would be full price, so Dante Labs will make sense for me and Big Y-700 sounds alright for my brother. Thanks

Dr_McNinja
12-24-2019, 05:25 AM
I'd consider Big Y if money isn't an issue or if your main need is the Y chromosome. The read depth is good and my previous Big Y in Nov 2019 got done in less than 5 weeks. The one before that took about two months during this past summer (June-Aug). Current Big Y approaching 5 weeks (batched late Nov), so we'll see if that quick turnaround continues. Price: $419 (including BAM and as an upgrade on top of previous tests)

Dante took around 6-7 months, gave an Hg19 BAM, not already in Hg38 alignment like Big Y, and mean/median depth was ~10X (Big Y was 45-46X for one kit, and 34X median/38X mean for another).

On the other hand, Dante for $199 is a great deal for the best autosomal test essentially.

slievenamon
12-24-2019, 05:25 AM
Yes, I am entitled to both a refund and results after 90 days. That's why I requested a refund in August, after the 90 days, and they answered: "Thanks for reaching out. For your refund, I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information. Once your refund is completed successfully, you will be notified. Thanks for your patience."

Ultimately, it's about choices.
If your patience is exhausted, dispute the monies paid to Dante for non performance.
Your credit card company can walk you through the process.

E_M81_I3A
12-24-2019, 07:18 AM
Ultimately, it's about choices.
If your patience is exhausted, dispute the monies paid to Dante for non performance.
Your credit card company can walk you through the process.

I can still wait but I would just like them to answer to my emails to give me an update. Not sure why they stopped since October...

Jatt1
12-24-2019, 08:12 AM
I'd consider Big Y if money isn't an issue or if your main need is the Y chromosome. The read depth is good and my previous Big Y in Nov 2019 got done in less than 5 weeks. The one before that took about two months during this past summer (June-Aug). Current Big Y approaching 5 weeks (batched late Nov), so we'll see if that quick turnaround continues. Price: $419 (including BAM and as an upgrade on top of previous tests)

Dante took around 6-7 months, gave an Hg19 BAM, not already in Hg38 alignment like Big Y, and mean/median depth was ~10X (Big Y was 45-46X for one kit, and 34X median/38X mean for another).

On the other hand, Dante for $199 is a great deal for the best autosomal test essentially.

Does FTDNA charge for BAM files? Thanks

aaronbee2010
12-24-2019, 09:10 AM
I'd consider Big Y if money isn't an issue or if your main need is the Y chromosome. The read depth is good and my previous Big Y in Nov 2019 got done in less than 5 weeks. The one before that took about two months during this past summer (June-Aug). Current Big Y approaching 5 weeks (batched late Nov), so we'll see if that quick turnaround continues. Price: $419 (including BAM and as an upgrade on top of previous tests)

Dante took around 6-7 months, gave an Hg19 BAM, not already in Hg38 alignment like Big Y, and mean/median depth was ~10X (Big Y was 45-46X for one kit, and 34X median/38X mean for another).

On the other hand, Dante for $199 is a great deal for the best autosomal test essentially.

Coverage isn't the only factor at play here. It's worth mentioning that Big Y-700 doesn't cover as much of the GRCh38 Y chromosome as Dante Labs does. Below is a comparison between Big Y-500 (top), Big Y-700 (middle) and Dante 30x (bottom) (Source (https://www.facebook.com/RochaHolmes/photos/p.165786074814137/165786074814137/?type=1&theater)):

https://i.gyazo.com/e982867c5fb82e0a3f492d01d203bb93.jpg

The difference in length of coverage seems to offset the disparity in breadth of coverage between the two tests. Bear in mind that genealogically irrelevant parts of the Y chromosome are discarded prior to sequencing with a Big Y test, which isn't the case for Dante 30x, so when a new reference is released (i.e. GRCh39), you can just align the Dante FASTQ's to the new reference, which can result in more SNPs being discovered due to an increase in the known percentage of the Y chromosome. You don't have this option with Big Y-700 as the discarded regions are only irrelevant as far as GRCh38 is concerned. If a new build is released, you would need to wait for FTDNA to release a new test and you would need to pay for an upgrade, in all likelihood.

This page (https://ydna-warehouse.org/statistics.html) provides a comparison between various Y-DNA sequencing services and their ability to call SNPs. Big Y-700 and Dante 30x both have similar results in this regard. As I mentioned previously, the increase in breadth of coverage of the targeted regions with Big Y-700 appears to be roughly cancelled out by its length of coverage.

https://i.gyazo.com/d75c552d23468a4b9c46f4f1f70c65b1.png

https://i.gyazo.com/8013ecf36df6b3020f98859c4cfe4679.png

https://i.gyazo.com/d9fa3b3437ed067117bdbeb0bb2a0a09.png

aaronbee2010
12-24-2019, 09:27 AM
I can get one for my brother for $259US, but for me it be full price so the Dante Labs will make sense for me and BIG Y 700 sounds alright for my brother. Thanks

If you're going to pay for the Big Y-700 as well, why not get that one for yourself too? :P

aaronbee2010
12-24-2019, 10:58 AM
Does FTDNA charge for BAM files? Thanks

They charge $100 to download the BAM.

MacUalraig
12-24-2019, 11:12 AM
They charge $100 to download the BAM.

One respected commentator has suggested this charge violates GDPR. I'm not an expert but the rules appear to say

"As a rule, the information has to be provided free of charge. If, in addition, further copies are requested, one can request a reasonable payment which reflects administrative costs. The controller is also allowed to refuse a data subject’s requests to right of access if it is unjustified or excessive. "

The BAM file of course already exists - they used it to do the SNP calling and probably are using it for the raw data browse function (despite one fan who dubiously claimed this was some separate file). So the administrative cost should be negligible.

JMcB
12-24-2019, 06:20 PM
Does FTDNA charge for BAM files? Thanks

The answer is yes. They charge $100 for the release of your Bam file, if you’ve tested after November 1st, 2019. So that should be figured into your calculations. Assuming you’re planning on having your results analyzed by YFull.

As an aside, YFull is now accepting VCF files for analysis and Family Tree DNA is still supposed to be supplying those for free. So that option appears to be available, but unfortunately it's still going to hinder their analysis to some degree, as BAM files contain more information (see below). Fortunately, they will be incorporating VCF file age estimations with their analysis, starting sometime in February. And, if you choose, you can still send in your BAM file later on and get a new analysis done for no extra cost.


Here’s the latest statement from YFull which was posted on their Facebook page yesterday:

Dear friends,

now we accept the VCF files for interpretation (Big Y700 only). Unfortunately, the VCF files are not raw data files. These files contain much less important information for analysis than BAM files. At any time you can make an order for a FREE upgrade "VCF > BAM". At the moment, we do not use VCF samples for age calculating , but we are already working to ensure that VCF samples also participate in the age calculations and plan this for February. Also, STR matches are not available for VCF samples. This is due to the fact that STR markers are very poorly extracted from VCF files, and number of markers does not allow showing matches with a high confidence. Thank you for being with us!

Vladimir Semargl

Jatt1
12-24-2019, 08:19 PM
Thank you all very much.

When would most likely Dante Labs be at lowest price?

aaronbee2010
12-24-2019, 08:45 PM
Thank you all very much.

When would most likely Dante Labs be at lowest price?

The biggest savings tend to be on or around Black Friday, but they've done offers at other times of year (e.g. DNA Day, Singles' Day). If you can wait until Black Friday next year, that's your best bet; otherwise I would keep track of prices on their website, wait for an offer that is $200 or less, and go for it. You can also try the "YFULL10" discount code for a 10% discount and see if that works. Look out for upcoming times of year like Easter and you might get lucky there. They might do something for Christmas or New Year's, but I don't think it's likely considering they had their Black Friday sale not long ago. I bought 3 kits during that sale, two for my parents and one for my maternal uncle. I got them for $170 each with the YFULL10 discount code (on top of the Black Friday sale price), which was a massive bargain!

E_M81_I3A
12-25-2019, 09:17 AM
Does anyone know other DanteLabs email address than "[email protected]" ? As this one no longer answers.

slievenamon
12-26-2019, 02:12 AM
I can still wait but I would just like them to answer to my emails to give me an update. Not sure why they stopped since October...

MacUalraig stated "spot the catch"
It's to do with Dante's results/refund offer.
You've suspected that from the beginning.
Appears you are correct...

FreeAmin
12-30-2019, 02:23 AM
Hello, I think the best solution is to join their official Facebook group and also the unofficial Facebook group; there they won't be able to ignore your messages. Do you have a Facebook account?

FreeAmin
12-30-2019, 02:26 AM
They have solved most of their problems now and they offer fast results, but new problems have appeared: sometimes the quality is not really 30x, so you have to check your test results.

tontsa
12-31-2019, 03:27 AM
They have solved most of their problems now and they offer fast results, but new problems have appeared: sometimes the quality is not really 30x, so you have to check your test results.

And now that the Facebook groups know they can easily check the number of gigabases in their FASTQs, Dante is backlogged again. I initially got around 25x coverage in 3 weeks, but since then I have been waiting for the resequencing. Also, my 5x->30x upgrade is still waiting, even though they promise a 3-week turnaround for those...

FreeAmin
12-31-2019, 11:32 PM
The price is unbeatable, too bad they have too many problems it never ends.

teepean47
01-03-2020, 02:39 PM
And now that the Facebook groups know they can easily check the number of gigabases in their FASTQs, Dante is backlogged again. I initially got around 25x coverage in 3 weeks, but since then I have been waiting for the resequencing. Also, my 5x->30x upgrade is still waiting, even though they promise a 3-week turnaround for those...

I noticed today that my upgrade kit's status changed to "Sequencing Started".

teepean47
01-03-2020, 08:57 PM
I got the results from YFull for my 4x test. Here's the comparison with Big Y-700.

https://i.imgur.com/oT5j9k8.png

35728

Giosta
01-05-2020, 02:02 AM
The bad news: I ran fastp and got:

Before filtering
total reads: 374.563238 M
total bases: 55.366384 G
Which implies 55.4/3.2 = 17.3X coverage, significantly short of the advertised coverage.


Hello

Where is this 3.2 coming from? I saw some other person using number 2.9 and I would like to know a little more about this number.

aaronbee2010
01-05-2020, 02:15 AM
Hello

Where is this 3.2 coming from? I saw some other person using number 2.9 and I would like to know a little more about this number.

It's the total number of bases in the human genome (3.2 GBases = 3.2 billion bases)

The total number of bases in pmokeefe's FASTQ files is 55.4 GBases. The coverage of the FASTQ file is the average number of times a base is covered by a test, which is calculated by dividing the no. of bases in the FASTQ files by the no. of bases in the human genome, which is 55.4 billion divided by 3.2 billion = 17.3 times coverage.

Some people use different values for the size of the human genome, hence the differing values. Dante Labs have stated that their target for 30x WGS is 90 GBases, which means that they're using a value of 3 billion bases. Obviously, they've fallen short with pmokeefe, and a lot of others from what I've seen on DL's customer care FB group, as well as AG.
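As a quick back-of-the-envelope check, the same arithmetic can be run straight from fastp's "total bases" figure (a sketch; the genome size is the ~3.2 Gbases value used in this thread, and some people prefer 3.0 or 2.9):

total_bases=55366384000   # from fastp: "total bases: 55.366384 G"
awk -v b="$total_bases" 'BEGIN { printf "~%.1fx estimated coverage\n", b / 3.2e9 }'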

aaronbee2010
01-05-2020, 02:25 AM
I got the results from YFull for my 4x test. Here's the comparison with Big Y-700.

https://i.imgur.com/oT5j9k8.png

35728

That's insanely good considering their 4x price was around 87 pounds. It's a shame they don't offer the test anymore from what I've seen.

pmokeefe
01-05-2020, 02:55 AM
It's the total number of bases in the human genome (3.2 GBases = 3.2 billion bases)

The total number of bases in pmokeefe's FASTQ files is 55.4 GBases. The coverage of the FASTQ file is the average number of times a base is covered by a test, which is calculated by dividing the no. of bases in the FASTQ files by the no. of bases in the human genome, which is 55.4 billion divided by 3.2 billion = 17.3 times coverage.

Some people use different values for the size of the human genome, hence the differing values. Dante Labs have stated that their target for 30x WGS is 90 GBases, which means that they're using a value of 3 billion bases. Obviously, they've fallen short with pmokeefe, and a lot of others from what I've seen on DL's customer care FB group, as well as AG.

In the meantime, Dante Labs replaced the FASTQ files for some of my kits. So now, of the 6 kits with results, 5 of them have coverage close to or exceeding the 90GB they advertise. One is still only 21.4GB.
Here's a link to a public Google Drive folder which contains the FASTQ for three of those kits and fastp output for all six: https://drive.google.com/open?id=1_x7ZtSenJNUyb9nsq0hcNbyizx1a39F3/

aaronbee2010
01-05-2020, 04:02 PM
In the meantime, Dante Labs replaced the FASTQ files for some of my kits. So now, of the 6 kits with results, 5 of them have coverage close to or exceeding the 90GB they advertise. One is still only 21.4GB.
Here's a link to a public Google Drive folder which contains the FASTQ for three of those kits and fastp output for all six: https://drive.google.com/open?id=1_x7ZtSenJNUyb9nsq0hcNbyizx1a39F3/

I get a 404 error when trying to view the link. It doesn't work when I use my phone either.

Thank you for the link, though!

pmokeefe
01-05-2020, 04:22 PM
I get a 404 error when trying to view the link. It doesn't work when I use my phone either.

Thank you for the link, though!

Sorry about that! This link seems to work: https://drive.google.com/open?id=1_x7ZtSenJNUyb9nsq0hcNbyizx1a39F3

Giosta
01-05-2020, 05:44 PM
It's the total number of bases in the human genome (3.2 GBases = 3.2 billion bases)

Some people use different values for the size of the human genome, hence the differing values. Dante Labs have stated that their target for 30x WGS is 90 GBases, which means that they're using a value of 3 billion bases.

Thank You very much.

My 78.35 Gigabase fastq files are possibly truncated and my coverage is 78.35/3=26.117x. I am currently trying to explain the situation to Dante CS but they don't want to agree there is something wrong.

And that's why it is encouraging to read pmokeefe's last post how those broken fastq files are replaced with the good ones.

My BAM is also too small in its numbers compared to a "good" BAM, so Dante has created my BAM from those truncated FASTQ files.
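One quick way to check for a truncated download is to test the gzip stream and compare the R1/R2 read counts, which should match for a paired-end run (a sketch; file names are placeholders):

gzip -t kit_R1_001.fastq.gz && echo "R1 gzip stream OK"
gzip -t kit_R2_001.fastq.gz && echo "R2 gzip stream OK"
echo $(( $(zcat kit_R1_001.fastq.gz | wc -l) / 4 ))   # reads in R1
echo $(( $(zcat kit_R2_001.fastq.gz | wc -l) / 4 ))   # reads in R2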

aaronbee2010
01-05-2020, 06:53 PM
Thank You very much.

My 78.35 Gigabase fastq files are possibly truncated and my coverage is 78.35/3=26.117x. I am currently trying to explain the situation to Dante CS but they don't want to agree there is something wrong.

And that's why it is encouraging to read pmokeefe's last post how those broken fastq files are replaced with the good ones.

My BAM is also too small in its numbers compared to a "good" BAM, so Dante has created my BAM from those truncated FASTQ files.

So despite you stating that the files are below 90 GBases (presumably with proof), they still refuse to do anything? I believe I've seen DL CS being quite difficult about this with quite a few people, so I hope you continue to pursue it. I would recommend posting this in the DL "Customer Care" FB group so others can see it too.

Ysearcher
01-06-2020, 12:05 AM
I can't remember if I ever posted this or not, but I couldn't find it with a search, so I'll go ahead and post it.

I ordered a kit from Dante Labs on Black Friday, 2018 for $199. I had it delivered to my son's address in Canada, and linked to his email account. My son was dying from ALS, and it was very low on his list of priorities. I live in the USA about 1500 miles from his address in Canada, so when I next visited him in July 2019, I eventually found the Dante collection kit sitting on a shelf under his computer monitor. I was returning home, so collected the kit to carry across the border to the USA and mail from there, which is waht I did. After I collected the kit, and before mailing it, I went to the website to register the kit, but the website was fouled up, and after a dozen or more emails to the help team, and following all of their instructions, the kit could not be registered and I was compelled to be on my way back home, so I followed the instructions included with the kit, and put it in the mail. The help team assured me that was fine, there would be no problem. I mailed the kit with a tracking number, and finally got proof of delivery, and again the help team assured me that there was no problem. A week or two passed and no further responses from the help team. I finally deduced that the return address included with the kit was in fact the mailing address of the kit manufacturer, definitely not the address for Dante's lab. After a couple more emails, the help team at Dante simply stopped responding to my emails. My son died August 6, 2019, and I lost the kit, the $199 and most importantly, the last chance to get a whole genome sequence for my dying son. Dante Labs offered nothing whatsoever to make amends, no new kit, no refund, zippo, nada, zero. Apparently, just to add insult to injury, they deleted all records of the kit from my son's Dante Labs account, as if by doing so they could pretend that it had never happened. Of course, I still have the credit card statement showing payment for the kit, the USPS proof of delivery, a PDF of the receipt page when I ordered it, and all of the email conversations with the Dante Labs "help" team. How would you feel about it if you experienced a similar encounter with them? What would you do about it? I went through a lot of different stages of anger and frustration, but ultimately decided that there really isn't much I can do about it beyond trying to file a suit against a company headquartered in Italy to try to recover my lost $199, which would of course, be beyond absurd, and I can't imagine any attorney that would take that case. So I have settled for shaming them on every public forum that I can. They are beyond shameful, and nothing I could say could possibly express my disgust with them.

domogled
01-06-2020, 11:16 AM
Hello everyone,
I have been following the Dante Labs (WGS) thread for almost a year. Until not long ago, registration here on the forum was deactivated, so only yesterday did I finally manage to finalize the sign-up process.
This thread was my main source of information regarding DL's WGS, until I discovered the customer Facebook group.
I purchased a kit back in February 2019 and got the partial results back in June: the 4 official files (CNV.vcf, SNP.vcf, indel.vcf and SV.vcf), 3 other "hidden" files (G.vcf, raw.indel.vcf and raw.SNP.vcf) and some reports.
And then I waited and waited, trying to get in contact with them, with no replies to any of my inquiries: neither e-mails, nor private messages on the Facebook group to some of their staff members, nor tagging these members in comments in those very same Facebook groups. Eventually their staff member Sydney replied to one of my not-so-positive comments about their lack of customer care and, finally, on a "sunny" day in December my raw files started to pop up: first a FASTQ (the R2), then its twin the R1, and finally the .bam file together with all the other small files (bam.bai, CNV.tbi, SV.tbi, SNP.tbi and indel.tbi). It was a ten-and-a-half-month game of patience, finally concluded.

I checked the FASTQs with a tiny tool, FQSum, and the result is 127.46 GBases. I have to add that it is 100 bp and not 150 bp like the kits newly sequenced in their lab in Italy. I assume the sequencing of my kit is the original one used to provide my partial results back in June, most probably a BGI run. Back in June I checked the header of my SNP.vcf file and found that the sequencing had been performed on the 26th of April. So all these months from the end of April until the end of December it was just a very slow delivery of the results, not any other type of problem.

I realigned my results against GRCh38.p13.genome.fa.gz using bwa-mem. Later I realized there is a newer option, minimap2, which takes less time and reportedly produces better results.
Then I created the .bai file and finally a Y-MT.bam for upload to YFull.

At a later point, reading a few pages back here about fastp, I used it too, but I am not sure what the use of the two newly obtained archives is. Here are the results:

fastp -i 60820188xxxyyy_SA_L001_R1_001.fastq.gz -I 60820188xxxyyy_SA_L001_R2_001.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz
Read1 before filtering:
total reads: 637289339
total bases: 63728933900
Q20 bases: 63037226096(98.9146%)
Q30 bases: 60502321555(94.937%)

Read1 after filtering:
total reads: 635356908
total bases: 63481273392
Q20 bases: 62811045155(98.9442%)
Q30 bases: 60301622626(94.9912%)

Read2 before filtering:
total reads: 637289339
total bases: 63728933900
Q20 bases: 62309378594(97.7725%)
Q30 bases: 58153383475(91.2511%)

Read2 aftering filtering:
total reads: 635356908
total bases: 63481273392
Q20 bases: 62181961704(97.9532%)
Q30 bases: 58070302134(91.4763%)

Filtering result:
reads passed filter: 1270713816
reads failed due to low quality: 3862052
reads failed due to too many N: 2810
reads failed due to too short: 0
reads with adapter trimmed: 7581564
bases trimmed due to adapters: 109119934

Duplication rate: 1.18263%

Insert size peak (evaluated by paired-end reads): 169

JSON report: fastp.json
HTML report: fastp.html

fastp -i 60820188xxxyyy_SA_L001_R1_001.fastq.gz -I 60820188xxxyyy_SA_L001_R2_001.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz
fastp v0.19.6, time used: 4533 seconds

Earlier today I uploaded the Y-MT.bam file to my Google Drive, shared the link with YFull, and finally ordered an analysis of my results.

I don't know if I should have realigned my results using minimap2 - whether the final .bam would be much better than the one I got using bwa-mem. I also don't know if I should have used the newly obtained FASTQ archives from fastp, and then minimap2, to create the .bam file.

aaronbee2010
01-07-2020, 05:30 AM
Hello everyone,
I am following Dante Labs (WGS) thread since almost one year ago, not until long ago the registration here on the forum been deactivated so only yesterday I finally managed to finalize the sign up process.
This thread was my main source of information regarding DL's WGS, until I discovered the customer Facebook group.
I purchased a kit back in February 2019, got the partial results back in June,the 4 official files (CNV.vcf, SNP.vcf, indel.vcf and SV.vcf) 3 other "hidden" files (G.vcf, raw.indel.vcf and raw.SNP.vcf) and some reports.
An then I waited and waited, trying to get in contact with them and no replies on any of my inquires neither e-mails, private messages on Facebook group to some of the members of their staff or tagging these members in comments on these very same facebook groups. One day, eventually their staff memeber Sydney, replied to one of my not so positive comments about their lack of customer care and, finally in a "sunny" day of December my raw files started to pop up. Firstly a fastq, the R2 then its twin the R1 and finally the .bam file together with all other small files (bam.bai, CNV.tbi, SV.tbi, SNP.tbi and indel.tbi) It was a 10 and a half month's game of patience, finally concluded.

I checked the fastq's with a tiny tool FQSum and the result is 127.46GBases. I have to add that it is 100bp and not 150bp as the newly sequenced kits in their lab in Italy. I assume the sequencing on my kit is the original one used to provide mt partial results back in June, most probably a BGI kit. Back in June I checked the header of my SNP.vcf file and I've found the sequencing has been performed on 26th of April. So all these months from the end April till the end of December it was just a very slow delivery of the results and not any other type of problems.

I realigned my reads against GRCh38.p13.genome.fa.gz using bwa-mem. Later I realized there is a newer option, minimap2, which takes less time and reportedly produces better results.
Then I created the .bai file and finally a Y-MT.bam for upload to YFull.

At that point, after reading a few pages back here about fastp, I used it too, but I am not sure what the two newly produced archives are for, or how to interpret the results:

fastp -i 60820188xxxyyy_SA_L001_R1_001.fastq.gz -I 60820188xxxyyy_SA_L001_R2_001.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz
Read1 before filtering:
total reads: 637289339
total bases: 63728933900
Q20 bases: 63037226096(98.9146%)
Q30 bases: 60502321555(94.937%)

Read1 after filtering:
total reads: 635356908
total bases: 63481273392
Q20 bases: 62811045155(98.9442%)
Q30 bases: 60301622626(94.9912%)

Read2 before filtering:
total reads: 637289339
total bases: 63728933900
Q20 bases: 62309378594(97.7725%)
Q30 bases: 58153383475(91.2511%)

Read2 after filtering:
total reads: 635356908
total bases: 63481273392
Q20 bases: 62181961704(97.9532%)
Q30 bases: 58070302134(91.4763%)

Filtering result:
reads passed filter: 1270713816
reads failed due to low quality: 3862052
reads failed due to too many N: 2810
reads failed due to too short: 0
reads with adapter trimmed: 7581564
bases trimmed due to adapters: 109119934

Duplication rate: 1.18263%

Insert size peak (evaluated by paired-end reads): 169

JSON report: fastp.json
HTML report: fastp.html

fastp -i 60820188xxxyyy_SA_L001_R1_001.fastq.gz -I 60820188xxxyyy_SA_L001_R2_001.fastq.gz -o out.R1.fq.gz -O out.R2.fq.gz
fastp v0.19.6, time used: 4533 seconds

Earlier today I uploaded the Y-MT.bam file to my Google Drive, shared the link with YFull, and finally ordered an analysis of my results.

I don't know whether I should have realigned my reads using minimap2, and whether that final .bam would be much better than the one I got using bwa-mem. I also don't know whether I should have used the newly produced FASTQ files from fastp, and then minimap2, to create the .bam file.

Yes, this definitely appears to be a BGI-generated file. Their files were known to often have much higher coverage than advertised, and their DNBseq sequencers are configured for 100 bp reads. Dante Labs currently use Illumina NovaSeq 6000s configured for 150 bp, if I recall correctly; the headers in my Dante FASTQ files show they were generated on a NovaSeq 6000.

I waited 8 months for my files, from when they received my kit on March 25th to the reports being ready on November 26th. Things didn't appear to be going anywhere for a long time. I sent messages to Dante CS roughly every 1-2 months, but initially it was just the same old scripted nonsense saying my results would be ready in (insert month here). Eventually, one of my messages was answered by Sydney, who appeared to be the first Dante CS rep I dealt with who wasn't outsourced. What she said came to pass around a month later when I received my results, so well done to Sydney for that. Initially the FASTQs, hg19 BAM, various hg19 VCFs and all index files arrived, and around a month later the hg38 BAM and VCFs arrived along with their corresponding indexes.

Moving on, the problem with aligning to p13 is that most aligners are not "alt-aware": they will assign fewer reads to the parts of the primary assembly where the patches apply, and will give those reads a mapping quality of 0/60 or 1/60, depending on the aligner. If you want to align to p13, you can align with BWA-MEM (the only free aligner that can be configured to be alt-aware, as far as I know) using a fa.alt index file, which lets BWA-MEM distinguish between alt contigs and the primary chromosomal assembly, with the latter getting mapping priority (patches aren't technically alt contigs, but they are apparently supposed to be treated like alt contigs so as not to interfere with the co-ordinates of the primary assembly - the GRC heavily emphasises not disrupting primary assembly co-ordinates with patches). Here's a diagram from https://software.broadinstitute.org/gatk/documentation/article?id=8017.

https://us.v-cdn.net/5019796/uploads/FileUpload/38/98aa1f4e0468b7fc8106a6bcc600c5.png
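
In practice, making BWA-MEM alt-aware is mostly a matter of placing that .alt file next to the other index files; a rough sketch, where ref.fa, the read file names and the thread counts are assumptions:

bwa index ref.fa   # builds ref.fa.{amb,ann,bwt,pac,sa}
# ref.fa.alt must sit alongside those index files and list the alt/patch contigs
# (see the hs38DH.fa.alt shipped with bwakit for the expected format)
bwa mem -t 8 ref.fa R1.fastq.gz R2.fastq.gz | samtools sort -@ 8 -o alt_aware.bam -
samtools index alt_aware.bam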

Additionally, here is a quote from the Author of the BWA aligner, Heng Li:

"During alignment, BWA-MEM will be able to classify potential hits to ALT and non-ALT hits. It reports alignments and assigns mapping quality (mapQ) loosely following these rules:

1. The original mapQ of a non-ALT hit is computed across non-ALT hits only. The reported mapQ of an ALT hit is always computed across all hits.

2. An ALT hit is only reported if its score is better than all overlapping non-ALT hits. A reported ALT hit is flagged with 0x800 (supplementary) unless there are no non-ALT hits.

3. The mapQ of a non-ALT hit is reduced to zero if its score is less than 80% (controlled by option -g) of the score of an overlapping ALT hit. In this case, the original mapQ is moved to the om tag.

This way, non-ALT alignments are only affected by ALT contigs if there are significantly better ALT alignments. BWA-MEM is carefully engineered such that ALT contigs do not interfere with the alignments to the primary assembly."

Saying all this, YFull only look at the chrY primary assembly; they don't look at the patch regions on chrY (listed below):

chrY:56821510-56887902 chrY_KN196487v1_fix REGION198 HG2062_PATCH fix-patch
chrY:9034110-9132718 chrY_KZ208923v1_fix REGION259 HG1531_PATCH fix-patch
chrY:21578942-21828659 chrY_KZ208924v1_fix REGION260 HG1535_PATCH fix-patch

So the only possible reason to use p13 would be if it improved the mapping of reads onto the chrY primary assembly. (1) was the first BAM file I produced and the first file I sent to YFull last month, before I was aware of alt-aware configurations for BWA-MEM. (5) was purchased soon after I realised I hadn't used an alt-aware pipeline, and (2), (3) and (4) were all produced by me this week for a comparison I'm doing to find the best option to use with my parents' and maternal uncle's FASTQ files when they arrive. Since your comment coincides with my plan shockingly well, I will post the comparison below so everyone here can have a look for themselves:

As a prerequisite, I've produced the following BAM files:

(1) GRCh38 (pseudo-autosomal regions not N-masked, alt/unlocalised contigs present, unplaced contigs present, no decoys present and p13 present) - BWA-MEM (not configured to be alt-aware)

(2) GRCh38 (pseudo-autosomal regions N-masked, no alt/unlocalised contigs present, no unplaced contigs present, no decoys present and no patches present) - minimap2

(3) GRCh38 (pseudo-autosomal regions N-masked, alt/unlocalised contigs present, unplaced contigs present, decoys present and no patches present) - BWA-MEM (configured to be alt-aware)

(4) GRCh38 (pseudo-autosomal regions N-masked, alt/unlocalised contigs present, unplaced contigs present, decoys present and p13 present) - BWA-MEM (configured to be alt-aware)

I also have a GRCh38 BAM produced by Dante Labs:

(5) GRCh38 (pseudo-autosomal regions N-masked, alt/unlocalised contigs present, unplaced contigs present, decoys present and no patches present) - Illumina DRAGEN (configured to be alt-aware) (can only be run on the DRAGEN Bio-IT Platform server, which requires a subscription if I'm not mistaken)

(1), (2), (3) and (4) were produced from the untouched FASTQ files straight from Dante Labs (no fastp filtering). For the sake of this comparison, all regions except the chrY primary assembly were filtered out of all BAM files. Finally, all BAM files had library and sequencing duplicates removed with Picard MarkDuplicates (REMOVE_DUPLICATES=true and OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500). All of these steps were done in a Linux bash shell on my computer (except producing BAM (5), obviously). All 5 trimmed files were then run through QualiMap BamQC, configured to evaluate the non-N sections of the hg38 chrY primary assembly (excluding pseudo-autosomal regions). This preprocessing is sketched just below, and the results follow:
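
A minimal sketch of that per-BAM preprocessing, assuming a coordinate-sorted, indexed sample.bam and a BED file (chrY_nonN_nonPAR.bed) describing the evaluated regions; both names are assumptions:

samtools view -b sample.bam chrY > sample.chrY.bam   # keep only the chrY primary assembly
java -jar picard.jar MarkDuplicates I=sample.chrY.bam O=sample.chrY.dedup.bam M=dup_metrics.txt REMOVE_DUPLICATES=true OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500
samtools index sample.chrY.dedup.bam
qualimap bamqc -bam sample.chrY.dedup.bam --feature-file chrY_nonN_nonPAR.bed -outdir qc_chrY   # restrict BamQC to the listed regions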

(1):

Mean coverage: 14.4x
Standard deviation of mean: 19.5357x
Modal coverage: 12x
Minimum coverage: 0x
P2.5 coverage: 5x
Q1 coverage: 10x
Median coverage: 12x
Q3 coverage: 15x
P97.5 coverage: 33x
Maximum coverage: 3628x
Mean MAPQ: 32.0124/60

(2):

Mean coverage: 14.5235x
Standard deviation of mean: 20.3284x
Modal coverage: 12x
Minimum coverage: 0x
P2.5 coverage: 5x
Q1 coverage: 10x
Median coverage: 12x
Q3 coverage: 15x
P97.5 coverage: 33x
Maximum coverage: 1934x
Mean MAPQ: 36.4122/60

(3):

Mean coverage: 15.3795x
Standard deviation of mean: 26.9002x
Modal coverage: 12x
Minimum coverage: 0x
P2.5 coverage: 5x
Q1 coverage: 10x
Median coverage: 12x
Q3 coverage: 15x
P97.5 coverage: 41x
Maximum coverage: 5933x
Mean MAPQ: 32.4252/60

(4):

Mean coverage: 15.383x
Standard deviation of mean: 26.9609x
Modal coverage: 12x
Minimum coverage: 0x
P2.5 coverage: 5x
Q1 coverage: 10x
Median coverage: 12x
Q3 coverage: 15x
P97.5 coverage: 41x
Maximum coverage: 5911x
Mean MAPQ: 32.4271/60

(5):

Mean coverage: 20.2451x
Standard deviation of mean: 94.5697x
Modal coverage: 13x
Minimum coverage: 0x
P2.5 coverage: 5x
Q1 coverage: 11x
Median coverage: 14x
Q3 coverage: 17x
P97.5 coverage: 39x
Maximum coverage: 9744x
Mean MAPQ: 38.3205/60

If we're talking about the advantages of p13, then it's best to look specifically at (3) and (4), as the only key variable between them is the presence of the p13 patches; every other variable was controlled. The results above for (3) and (4) are extremely close, with (4) only appearing to edge ahead by a minuscule margin. While this is technically a win, p13 doesn't appear to be as big a deal as it's often made out to be, at least as far as Y-DNA analysis is concerned.

If you do decide to go with p13, then the importance of alt-aware alignment can be summarised by the comparison of (1) and (4).

Apart from the presence of patches, the other two variables between (1) and (4) are the presence of N-masking of PAR regions and the presence of decoys.

As far as N-masking goes, it doesn't have an effect on Y-DNA analysis for phylogenetic purposes, as the PAR regions are ignored. N-masking is good if you want to analyse the PAR regions and call variants there for other reasons (e.g. health), as the PAR regions on chrY need to be N-masked for that purpose. For the purposes of this comparison, the presence of N-masking shouldn't theoretically have any effect on the results.

Decoy sequences also do not theoretically interfere with Y-DNA phylogenetic analysis as their main purpose is to filter off reads that aren't supposed to map to the primary assembly at all, which speeds up alignment as the aligner doesn't have to spend too much time trying to map some sequences to places they do not belong. This also ensures that a higher percentage of reads mapped to the reference genome are in the right place.

While (4) does appear to beat (1) in this comparison, the difference isn't vast. This is probably because patch regions only account for a small percentage of the whole chrY assembly ((334853 * 100%) / 23636305 = 1.42%), so the reduction in coverage and mapping quality in these regions shouldn't have a significant effect.

To further evaluate this, we can single out the regions of the chrY primary assembly that specifically align with the chrY patches present in p13. Here are the most important statistics for (1) and (4) in these regions:

(1)

Mean coverage: 14.6367x
Standard deviation of mean: 24.5075x
Mean MAPQ: 0.0922/60

(4)

Mean coverage: 28.1606x
Standard deviation of mean: 48.9373x
Mean MAPQ: 37.2826/60

The difference that alt-aware mapping makes with regard to p13 is very large in the affected regions. While this won't have much of an impact on coverage across the whole chrY assembly, it does demonstrate what happens when an alt-aware pipeline is not used for mapping to regions of the primary assembly that overlap with other sequences. This is more of a demonstration than anything else.

Looking at a comparison between BWA-MEM ((3) and (4)) and minimap2 (2), the former appears to have slightly higher coverage (particularly past the third quartile), whereas the latter has noticeably better average mapping quality. Since coverage below the third quartile is similar, and increases in coverage above the third quartile are relatively inconsequential (those positions already have at least 15x coverage anyway), mapping quality may be more important here, although I'm not entirely sure how the advantage in MAPQ translates into differences in the potential for novel SNP discovery or other aspects of YFull analysis. I'm not sure exactly how YFull weighs mapping quality.

If I had to guess, minimap2 would be better than BWA-MEM for YFull analysis (as mapping quality is an indicator of how reliably a read maps to its position on the reference, and in this case it appears to matter more than extra coverage in the top quartile), but I can't say this with absolute certainty. Hopefully somebody else can give their input.

Looking at the BAM produced using the DRAGEN platform, (5) comes out on top for everything but P97.5 coverage, which doesn't really mean anything. If you're willing to pay $32 for a GRCh38 alignment that you really should be getting for free, then this appears to be the best choice; however, if you want to align the FASTQ files yourself, minimap2 may be the best choice for you, although this is open to discussion. @domogled sorry if I wasn't able to answer this question for you definitively.

On a final note, I'm not by any stretch of the imagination an expert in regards to bioinformatics, so if I've made any notable errors, please let me know and I will do what I can to correct them.

tontsa
01-07-2020, 05:32 AM
I got 1-1.5% better alignment with bwa-mem after using fastp to trim adapters and move singletons to a separate file. I haven't really gotten into minimap2 or bwa-mem2 as they seem to require so much more memory; bwa-mem2's index of hg38.fasta requires something like 193 GB of memory, so I gave up.
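
A sketch of such a fastp run (adapter trimming, with singletons - reads whose mate fails filtering - routed to separate files); the file names are assumptions:

fastp -i R1.fastq.gz -I R2.fastq.gz -o trimmed.R1.fq.gz -O trimmed.R2.fq.gz --unpaired1 singletons.R1.fq.gz --unpaired2 singletons.R2.fq.gz --detect_adapter_for_pe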

aaronbee2010
01-07-2020, 05:45 AM
I got 1-1.5% better alignment with bwa-mem after using fastp to trim adapters and move singletons to a separate file. I haven't really gotten into minimap2 or bwa-mem2 as they seem to require so much more memory; bwa-mem2's index of hg38.fasta requires something like 193 GB of memory, so I gave up.

For me, minimap2 is mainly practical when using Galaxy. When running it on my PC, it regularly gets bottlenecked by running out of RAM, and I have 16 GB too. I'm planning on upgrading my PC in the summer, so hopefully that will fare better.

Can't say I've tried MEM2 but regular MEM seems to be fine.

What did the adapter trimming translate into in terms of mapping quality, if you know?

teepean47
01-07-2020, 07:01 AM
I realigned my reads against GRCh38.p13.genome.fa.gz using bwa-mem. Later I realized there is a newer option, minimap2, which takes less time and reportedly produces better results.


Thank you for the comparison. I have been considering both minimap2 and bwa-mem2 but have had a lot of problems especially with the latter.

What options did you use with minimap2?

MacUalraig
01-07-2020, 07:36 AM
For me, minimap2 is mainly practical when using Galaxy. When running it on my PC, it regularly gets bottlenecked by running out of RAM, and I have 16 GB too. I'm planning on upgrading my PC in the summer, so hopefully that will fare better.

Can't say I've tried MEM2 but regular MEM seems to be fine.

What did the adapter trimming translate into in terms of mapping quality, if you know?

A system total of 16 GB is probably pushing it too close for minimap2, depending on how much else (including the OS) is loaded. My notes indicate it went over 15 GB at one point. bowtie2 will run on an 8 GB machine, though, and has ready-built binaries for Windows and Linux.

tontsa
01-07-2020, 10:30 AM
What did the adapter trimming translate into in terms of mapping quality, if you know?

I don't have the stats available anymore, but I will try to produce them again when I tune the fastq->bam creation. By the way, how do you create the alt.fa index for newer patches? bwa-kit includes a pretty old version and there's no actual how-to for regenerating it from scratch.

aaronbee2010
01-07-2020, 02:03 PM
I don't have the stats available anymore, but I will try to produce them again when I tune the fastq->bam creation. By the way, how do you create the alt.fa index for newer patches? bwa-kit includes a pretty old version and there's no actual how-to for regenerating it from scratch.

First, you need to analyse the reference FASTA file you're using and gather the order of all the contigs present after chrM. I use Ubuntu's split command to split the FASTA into 100 MB fragments, look at the files that contain non-primary assemblies (in my experience, the first applicable file was called "xbd", so the earlier split files were discarded), use Notepad++ on each file to mark and bookmark all lines starting with ">" (which denotes the beginning of a new sequence), copy those lines to a new document (in the correct order!) and remove ">" from all lines.

You can then take the hs38DH.fa.alt file, delete all contigs below the primary assembly and replace them with the lines you produced previously (https://sourceforge.net/p/bio-bwa/mailman/message/34964710/). Make sure there's one empty line at the bottom of the .alt file, otherwise the last alt contig is ignored during analysis. Here is a compressed file with the sequence containing the alt contigs, and its complementary .alt file, which I used for (4): https://drive.google.com/file/d/191PKMj3nPecKdaTqtzB7PgAUC8Yf6BrJ/view?usp=sharing. From top to bottom, you have alt/unlocalised contigs, decoy contigs, and patch contigs.
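
As an alternative to the split/Notepad++ route, the contig names can probably be pulled out in one go; a sketch assuming the gzipped p13 reference mentioned earlier:

zcat GRCh38.p13.genome.fa.gz | grep '^>' | sed 's/^>//; s/ .*//' > contig_names.txt   # one contig name per header line, descriptions stripped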

aaronbee2010
01-07-2020, 03:18 PM
On an unrelated note, even with minimap2, or any other non-alt-aware aligner, you should still be able to map with unlocalised and decoy contigs included, as these do not have assigned positions on the primary assembly.

For YFull analysis, this is unnecessary as the only unlocalised region relevant to chrY is the chrY_KI270740v1_random region, and YFull have told me that they don't look at this region.

Decoys also don't appear to be that necessary to use as minimap2 is already a very quick aligner on usegalaxy.eu. No harm in it, though.

Just thought this was worth mentioning :)

EDIT: The NCBI FTP server states that the GRCh38 "no-alt analysis set" contains unplaced and unlocalised contigs, as well as the Epstein-Barr virus genome, which I believe serves as a decoy. There is also a "no-alt plus hs38d1 analysis set" that contains everything from the no-alt analysis set, with additional decoy contigs. I would recommend using the "no-alt plus hs38d1 analysis set" for non-alt aware aligners such as minimap2.

tontsa
01-07-2020, 04:03 PM
Yeah, these alt contigs and their naming seem unnecessarily complicated. You also have to map the dbSNP VCF to the same naming convention if you are going to do variant calling yourself. I'll try to construct a script that builds that extra/alt file from the diff between GRCh38_major_release_seqs_for_alignment_pipelines and GCA_000001405.28_GRCh38.p13; I don't feel like copy-pasting those sections manually.
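
For the dbSNP renaming part, a sketch using bcftools, where chr_map.txt is an assumed two-column file of old_name<TAB>new_name pairs:

bcftools annotate --rename-chrs chr_map.txt dbSNP.vcf.gz -Oz -o dbSNP.renamed.vcf.gz
tabix -p vcf dbSNP.renamed.vcf.gz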

Donwulff
01-09-2020, 03:35 AM
The decoy contigs are there to serve as a sink for reads which don't originate from the primary human genome, and would otherwise end up forced to map against the primary assembly whether they actually belong there or not. Heng Li published some preliminary results indicating this does improve the final results; it also requires extra memory and in many cases would probably be slower. But if you're using a free outside processing service, why would you care about efficiency?

However, for saliva-derived samples, be aware that the majority of the non-primary-assembly reads actually originate from the oral microbiome, so blood-sequencing decoys like hs38d1 don't make that much difference; there are going to be a lot of off-target sequences from saliva. A good haplotype-aware variant caller may be able to ignore most of them, as it detects that the variants belong to a different haplotype (i.e. chromosome/genome), but it's going to skew things.

Also note that including or excluding the unlocalized contigs will change your results for similar reasons. It's basically true that these don't require an alt-aware caller, but the results will still be affected, because reads can preferentially map to the unlocalized contig or get folded onto the primary contig. This is actually not very well accounted for in sequencing at the moment, and might benefit from these contigs being treated more like alt contigs.

And finally, if you roll your own from the genome patches etc., the prime thing to be aware of is that a lot of the hs38d1 decoy sequences are actual human genome that is included in the patches, so you have to handle duplicates if you combine patches and decoys.

A lot of this is detailed already in the "Dante Labs technical" thread on this forum, and done in my scripts like https://github.com/Donwulff/bio-tools/blob/master/mapping/GRCh38_bwa_index.sh - which, by the way, I'm eager to receive feedback and improvements on. The current resource naming requires a bit of work (the HLA file updates but keeps the same name, so I have to handle that). Of course I'm not sure it makes sense to turn that into a completely fire-and-forget script; you'll need the prerequisites and should preferably know what you're doing.

JMcB
01-09-2020, 04:00 AM
I’m not a Dante Labs customer but thought this might be of interest to some:


BC Platforms Partners with Dante Labs to Build Europe's Largest Next Generation Sequencing Laboratory for Private and Public Customers


https://markets.businessinsider.com/news/stocks/bc-platforms-partners-with-dante-labs-to-build-europe-s-largest-next-generation-sequencing-laboratory-for-private-and-public-customers-1028801705

aaronbee2010
01-09-2020, 05:28 PM
I’m not a Dante Labs customer but thought this might be of interest to some:


BC Platforms Partners with Dante Labs to Build Europe's Largest Next Generation Sequencing Laboratory for Private and Public Customers


https://markets.businessinsider.com/news/stocks/bc-platforms-partners-with-dante-labs-to-build-europe-s-largest-next-generation-sequencing-laboratory-for-private-and-public-customers-1028801705

I like the idea of workflow automation; that has the potential to drive prices down further. More machines should mean less time to process a given number of kits, and, like automation, can also reduce the amount of labour needed.

More than anything else, this can translate to more reliable results delivery, which I really hope will happen. I would also like to see some competitors pop up, which can help lower WGS prices for the consumer as a whole.

I'm not 100% optimistic, but this does have the potential to be a big step forward for bringing WGS to the mainstream.

aaronbee2010
01-11-2020, 03:56 PM
The decoy contigs are there to serve as a sink for reads which don't originate from the primary human genome, and would otherwise end up forced to map against the primary assembly whether they actually belong there or not. Heng Li published some preliminary results indicating this does improve the final results; it also requires extra memory and in many cases would probably be slower. But if you're using a free outside processing service, why would you care about efficiency?

I was under the impression that decoys speed up the alignment process. Here's a quote I found that says this:


But actually from what I hear, the major motivation for people to use the decoy genome is speed. If you include the decoy in your reference genome when you do the original alignment, many reads will quickly find a very confident alignment in the decoy, thus avoiding countless compute cycles spent trying to Smith-Waterman align it to someplace it doesn’t belong. This purpose – siphoning off these reads to keep them from slowing down the whole alignment – is why this is called the ‘decoy genome.’

Of course, I'm not treating the above quote as divine truth, but it appears to make sense, at least to me.


However, for saliva-derived samples, be aware that the majority of the non-primary-assembly reads actually originate from the oral microbiome, so blood-sequencing decoys like hs38d1 don't make that much difference; there are going to be a lot of off-target sequences from saliva. A good haplotype-aware variant caller may be able to ignore most of them, as it detects that the variants belong to a different haplotype (i.e. chromosome/genome), but it's going to skew things.

That's very interesting. I take it that downloading oral microbiome decoys and merging them into one decoy file might be a good idea for saliva-derived samples?


Also note that including or excluding the unlocalized contigs will change your results for similar reasons. It's basically true that these don't require an alt-aware caller, but the results will still be affected, because reads can preferentially map to the unlocalized contig or get folded onto the primary contig. This is actually not very well accounted for in sequencing at the moment, and might benefit from these contigs being treated more like alt contigs.

That's a fair point: if some decoys behave like alt contigs, then they could be subject to the same reduction in coverage and mapping quality at certain positions.


And finally, if you roll your own from the genome patches etc., the prime thing to be aware of is that a lot of the hs38d1 decoy sequences are actual human genome that is included in the patches, so you have to handle duplicates if you combine patches and decoys.

Looking at alt-aware BAM files I've made with BWA-MEM, there are a few cases where reads map to more than one alt contig (or even a primary contig), even without the presence of patches. I'm assuming that the MAPQ in the target region on the primary assembly would be fine with multiple alternative alignments. The hg38p13+alt+decoy BAM I made had slightly better MAPQ and coverage and fewer no-calls than the hg38+alt+decoy BAM, but those were built with basic .alt indexes without CIGAR strings or edit distances, so they didn't work with the post-alt script in bwakit when I tried. I have just aligned a BAM with hs38DH that did work with post-alt processing. I'll need to align some patch contigs to the primary assembly for their target chromosomes, but when I've tried, the alignment gets split across several lines in the SAM output instead of the single line I'm after.


A lot of this is detailed already in the "Dante Labs technical" thread on this forum, and done in my scripts like https://github.com/Donwulff/bio-tools/blob/master/mapping/GRCh38_bwa_index.sh - which, by the way, I'm eager to receive feedback and improvements on. The current resource naming requires a bit of work (the HLA file updates but keeps the same name, so I have to handle that). Of course I'm not sure it makes sense to turn that into a completely fire-and-forget script; you'll need the prerequisites and should preferably know what you're doing.

I'm quite busy with exams this month, but this is something I would like to have a look at afterwards.

Staystay
01-15-2020, 04:54 AM
How does the hard drive method of transmission work in terms of safety? Does it provide extra security, or are there holes in this method?

I have some information on my home systems cordoned off network-wise, and I only back up to portable hard drives that go into safes. Maybe I'm overdoing it, but there is some data I want only my family to be able to retrieve. Hard drives are cheap and everyone should have a solid fireproof safe.

I tend to agree; recently I bought a small safe for the things I'd like to be only mine. I found a nice portable fireproof box, the SentrySafe 1200 (https://wisepick.org/best-fireproof-safe/), and, you know, if I need my info with me, it's a perfect solution. Maybe it's a bit old-fashioned, but I feel comfortable knowing all my materials are safe.

bjp
01-15-2020, 06:03 PM
I tend to agree; recently I bought a small safe for the things I'd like to be only mine. I found a nice portable fireproof box, the SentrySafe 1200 (https://wisepick.org/best-fireproof-safe/), and, you know, if I need my info with me, it's a perfect solution. Maybe it's a bit old-fashioned, but I feel comfortable knowing all my materials are safe.

Please pardon the off-topic post, but anyone considering a fireproof safe as a storage method for magnetic media like hard drives must consider that most fireproof safes are designed to protect paper documents (keeping contents below 451°F), not magnetic media (which needs to stay below 125-150°F). CDs and DVDs may be able to make it up to 350°F or so. A fire safe designed for paper products will not protect drives or discs.

In my case, I go for offsite storage for a backup copy of my and my wife's WGS results and analysis run on the data.

newuser
01-19-2020, 06:21 PM
Finally got my BAM and FASTQ files.
I had to wait 1 year and 4 months.
I was able to generate my gVCF file from my BAM; it took me 1,564.82 minutes to do so on a high-end laptop from 2015.
I don't want to know how long it would take me to generate a new BAM from my FASTQ files mapped to a newer reference than hg19.
These bioinformatics tools are so slow.
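
For reference, a typical gVCF-from-BAM command with GATK looks roughly like the following; this is only a sketch, since the caller actually used isn't stated above, and the file names are assumptions:

gatk HaplotypeCaller -R hg19.fa -I sample.bam -O sample.g.vcf.gz -ERC GVCF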

Since I don't really trust Dante Labs after all the issues and the customer messages I needed to send to get my results,
I want to test the results to see if the data really is mine.
Does anyone know how accurate a microarray like 23andMe's is for SNPs?
I was thinking of ordering a test there to compare my results.
My plan was to annotate my VCF files from the WGS with the dbSNP database to get the rs numbers and after that filter for the high-quality calls,
and then write a little script in Python to find conflicts between the 23andMe SNPs and the WGS SNPs by checking the base pair and the zygosity.
Is this a solid plan?
Would I get a 99%+ match?
I don't really have a background in bioinformatics, so I don't know the accuracy of these two platforms; that's why I'm asking.
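
A rough sketch of that plan using command-line tools rather than Python (file names are assumptions; the dbSNP VCF must be bgzipped and tabix-indexed, and converting the VCF GT field into 23andMe-style two-letter genotypes is left as the final step):

bcftools annotate -a dbSNP.vcf.gz -c ID -Oz -o annotated.vcf.gz my_wgs.vcf.gz   # add rs numbers
bcftools query -f '%ID\t%REF\t%ALT\t[%GT]\n' annotated.vcf.gz | sort -k1,1 > wgs.tsv
grep -v '^#' 23andme_raw.txt | awk -v OFS='\t' '{print $1, $4}' | sort -k1,1 > chip.tsv
join -t "$(printf '\t')" chip.tsv wgs.tsv > joined.tsv   # rsID, chip genotype, REF, ALT, WGS GT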

tontsa
01-20-2020, 05:21 AM
You can compare with DNA Kit Studio how close your Dante reads are to the microarray. I've verified against Evogenom and MyHeritage's raw files my two Dante kits and they are as close as they can get.

Ann Turner
01-20-2020, 04:46 PM
Finally got my BAM and FASTQ files.
I had to wait 1 year and 4 months.
I was able to generate my gVCF file from my BAM; it took me 1,564.82 minutes to do so on a high-end laptop from 2015.
I don't want to know how long it would take me to generate a new BAM from my FASTQ files mapped to a newer reference than hg19.
These bioinformatics tools are so slow.

Since I don't really trust Dante Labs after all the issues and the customer messages I needed to send to get my results,
I want to test the results to see if the data really is mine.
Does anyone know how accurate a microarray like 23andMe's is for SNPs?
I was thinking of ordering a test there to compare my results.
My plan was to annotate my VCF files from the WGS with the dbSNP database to get the rs numbers and after that filter for the high-quality calls,
and then write a little script in Python to find conflicts between the 23andMe SNPs and the WGS SNPs by checking the base pair and the zygosity.
Is this a solid plan?
Would I get a 99%+ match?
I don't really have a background in bioinformatics, so I don't know the accuracy of these two platforms; that's why I'm asking.
I've done a cross-platform comparison of results from various chips at GEDmatch. The Dante file was prepared with DNA Kit Studio from a VCF file, which contains only variants (SNPs with an ALT allele). The VCF file is filtered for quality. WGSExtract, another tool listed on the DKS site, uses the BAM file and will extract any/all SNPs used by the genetic genealogy companies, both REF and ALT values. It would include some lower quality base calls.

http://dnagenics.com/dna-kit-studio/




[Attachment 35969: cross-platform comparison]

Donwulff
01-21-2020, 06:07 PM
Does that mean that Dante Labs results assume the REF allele for no-calls? I assume so; DNA Kit Studio seems to be just dropping ALT calls into the 23andMeV4 VCF template, which means the SNV overlap is just the same as the V4 chip (= meaningless), and the concordance is largely based on 23andMeV4 correctness.

Worth noting that all the other tests except Affymetrix and Dante Labs use the same technology and statistical processing. According to the Illumina validation files, the Illumina chip results should in fact be near 100% concordant. Maybe this concordance calculation is counting no-calls as discordant? The lower-than-100% concordance may be due to companies adding custom SNVs which give semi-random results; probes for the same SNV or indel can be constructed in different ways. Or (apologies, I'm just putting this out for completeness of possibilities ;) there could be an error in the comparison algorithm.

In most cases Dante Labs has slightly higher concordance than Affymetrix, which is interesting since the DNA microarrays use very similar technology to each other.

The last time I checked, it looked like the Dante Labs sequence and microarray results for my samples differed only where there was an SNV next to the one interrogated by the microarray probe, which is a systematic error in the microarray. Sequencing has different kinds of errors in hard-to-align locations, but those are generally not tested by microarrays.

WGSExtract is Windows re-wrapped version of Thomas Krahn's https://github.com/tkrahn/extract23 one concern I have is it uses samtools for variant calling which isn't state of the art, another is the Windows binary was unattributed and contains executables which can include all kinds of trojan horses/malware, and people got oddly defensive when I pointed that out earlier, lol.

We can perhaps do better, but that takes some effort. I'd like to find regions of Dante Labs/WGS data that are confidently called and construct a generic mask from that. Of course they changed from BGISEQ to Illumina, and the read length has changed too, so there aren't many sequences with the same technology yet. We can also look at OpenSNP data to pick variants which seem to be incorrectly called on microarrays, and filter those out of microarray results. And we could use Genomic VCFs (does Dante Labs STILL provide those? Other than the BAM) from state-of-the-art variant calling pipelines to get both ALT and REF calls. And we could, time and resources allowing, construct a database of variants that are generally incorrectly called (for microarrays this could be estimated by mapping the estimated flanking sequences of the variants to the reference genome and seeing whether they have non-unique mapping or variants next to the interrogated SNV) by comparing matched microarray and sequence samples.

DNA matching services like GEDMatch probably already have internal quality control filtering out some of the unreliable variants on microarrays, and there are some similar studies on sequencing data like https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004144

Using file conversions that blindly turn WGS/WES no-calls into REF alleles introduces big matching errors, though.

mwasser
01-21-2020, 07:28 PM
WGSExtract is Windows re-wrapped version of Thomas Krahn's https://github.com/tkrahn/extract23 one concern I have is it uses samtools for variant calling which isn't state of the art, another is the Windows binary was unattributed and contains executables which can include all kinds of trojan horses/malware, and people got oddly defensive when I pointed that out earlier, lol.

Seriously, you are repeating these wrong accusations again? Do you also still claim there was no source code in the archive?
Ok, then again...

1. WGSExtract runs directly from source. The structure of the Python script has been intentionally kept simple and as short as possible, so that everybody can see that it does nothing evil. The Windows version ships with Python binaries and samtools binaries so that the user can start immediately. If you are paranoid, you can still replace those exe files with your own. Do you also avoid using BAM Analysis Kit or DNA Kit Studio because they contain binaries? I hope not, as I like them as well.

2. You write: "WGSExtract is Windows re-wrapped version ..."
No, there is a Windows and a Mac version. The Mac version uses MacPorts. But you will probably suspect MacPorts of being trojan makers as well...

3. You write: "...of Thomas Krahn's extract23".
Yes, it has always kept extract23 at its core and wouldn't exist without extract23. Thomas did great work with that tool.
But it's not true to say that WGSExtract is just a dumb GUI. It also comes with some extras, like converting positions from hg38 to hg19, adding more autosomal output formats, generating statistics for BAM files...
It's a convenient little tool for non-tech-savvy genealogists. I don't get why you always try to talk it down. Just because the beta version is hosted on an IP address?
Registering a fancy domain name says nothing about security.

4.
You write: "it uses samtools for variant calling which isn't state of the art"
I can see in this thread that you know much more about bioinformatics than I do. But what I can do is compare data:
When I generate an Ancestry V2 file from an hs37d5 Dante BAM and compare it 1:1 on GEDmatch to a previous Ancestry V2 test, it is 99.9% fully identical. Other formats are worse, like 23andMeV5 at 99.5%. But doesn't that mean samtools is still good enough for this purpose? At least it is still more accurate than creating the files from VCFs, or am I missing something?

I hope my arguments don't sound too emotional. It's just that I have invested a lot of my almost non-existent spare time in it, just to help fellow genealogists. I get nothing out of it, and it is frustrating to repeatedly read wrong things about the program.

teepean47
01-22-2020, 08:17 AM
WGSExtract is Windows re-wrapped version of Thomas Krahn's https://github.com/tkrahn/extract23 one concern I have is it uses samtools for variant calling which isn't state of the art, another is the Windows binary was unattributed and contains executables which can include all kinds of trojan horses/malware, and people got oddly defensive when I pointed that out earlier, lol.

I have compiled most of those binaries. You can always use Virustotal to check if they include any nasty surprises.

I used WGSExtract to create an autosomal kit from my Dante BAM and compared it to my Ancestry kit; there were about 1,500 different calls out of 940,000.

Ann Turner
01-23-2020, 12:05 AM
Does that mean that Dante Labs results assume the REF allele for no-calls? I assume so; DNA Kit Studio seems to be just dropping ALT calls into the 23andMeV4 VCF template, which means the SNV overlap is just the same as the V4 chip (= meaningless), and the concordance is largely based on 23andMeV4 correctness.
The template I used with DNA Kit Studio was a merger of A1+A2+V4+V5 SNPs, which gives a SNP overlap of 460-624K with the various chips. DKS has an option to replace no-calls with the REF allele, but I didn't use it for this comparison; it's just the SNPs with at least one ALT allele. Allowing DKS to fill in SNPs with homozygous REF values gives worse results.

GEDmatch discards no-calls when it "tokenizes" a file, so the 1:1 tool just reports on SNPs that have results in both files. It also discards SNPs with a low minor allele frequency, but I don't think it has any filter for unreliable variants.

Ric69
01-28-2020, 06:52 PM
Technical question: I just got the resequenced data from DL (the first kit was only 15x), so now I have two sets of R1/R2 for the same individual (a relative). Can I just append those R1's (e.g. R1a+R1b -> R1a+b) and end up with one very large raw R1 (and do the same for the R2's separately)? I don't see a reason why not - it seems straightforward.
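
Concatenating gzipped FASTQs is generally fine (gzip streams can simply be appended), as long as R1 and R2 are merged in the same run order so the read pairing stays consistent; a sketch with assumed file names:

cat run1_R1.fastq.gz run2_R1.fastq.gz > merged_R1.fastq.gz
cat run1_R2.fastq.gz run2_R2.fastq.gz > merged_R2.fastq.gz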

pmokeefe
01-28-2020, 07:04 PM
Technical question: I just got the resequenced data from DL (the first kit was only 15x), so now I have two sets of R1/R2 for the same individual (a relative). Can I just append those R1's (e.g. R1a+R1b -> R1a+b) and end up with one very large raw R1 (and do the same for the R2's separately)? I don't see a reason why not - it seems straightforward.

Did you submit a new sample, or did they give you bigger files for the same sample? I received bigger files for the same sample. The few reads I spot-checked from the old files were all present in the new files, so I assume they all are.

Ric69
01-28-2020, 08:19 PM
Did you submit a new sample, or did they give you bigger files for the same sample? I received bigger files for the same sample. The few reads I spot-checked from the old files were all present in the new files, so I assume they all are.
In theory it should be a whole new sample (new kit, new kit ID, QC, sequencing step etc.). But good point, I should look carefully at the individual reads to see if they reappear identical in the new results, as you suggest. That should serve as a fingerprint. I notice they still aligned it to the older hg19, for both kits. (Another good reason to merge the FASTQs and create my own hg38-aligned BAM.) That should make quite a decent file (assuming it isn't half microbiome).

aaronbee2010
01-28-2020, 09:01 PM
In theory it should be a whole new sample (new kit, new kit ID, QC, sequencing step etc.). But good point, I should look carefully at the individual reads to see if they reappear identical in the new results, as you suggest. That should serve as a fingerprint. I notice they still aligned it to the older hg19, for both kits. (Another good reason to merge the FASTQs and create my own hg38-aligned BAM.) That should make quite a decent file (assuming it isn't half microbiome).

You should remove duplicates with Picard, and see how this affects coverage. Normally this would reduce coverage quite a bit, but you should see more than a 50% decrease if R1/2b contains all the reads from R1/2a.

Petr
01-29-2020, 11:17 PM
Did you submit a new sample, or did they give you bigger files for the same sample? I received bigger files for the same sample. The few reads I spot-checked from the old files were all present in the new files, so I assume they all are.

Exactly the same for my two samples: the new FASTQ files contain all the reads from the old FASTQ files plus some new ones.

grimani
02-03-2020, 09:45 AM
Finally downloaded FASTQ files for 4 individuals from Dante Labs after many months. They were sequenced in two batches, 1 initially and then 3 later.



Sample Name % Duplication M Q30 reads Mb Q30 bases GC content % PF % Adapter
0000000000 37.3% 0.0 128592.5 42.8% 98.8% 0.5%
0000000001 5.4% 0.0 73256.8 42.1% 98.8% 0.6%
0000000002 6.5% 0.0 90450.8 42.8% 99.0% 0.5%
0000000003 6.0% 0.0 93426.5 43.3% 99.1% 0.4%


Any idea why the first #0 has such a high duplication rate and should I be concerned?

FastQC also came up with weird failures:
"Per base sequence content" (all 4) 36190
"Sequence duplication levels" (only #0) 36187

And warnings:
"Per sequence GC content" (all 4) 36188
"Sequence length distribution" (all 4) 36189
"Overrepresented sequences" (only #0) 36186

Thanks!

adamg
02-10-2020, 02:24 AM
Any news on whether there will be a promotion?

mdn
02-21-2020, 08:40 AM
Has anyone received an upgrade result? I ordered an upgrade from Intro 4x to 30x on Black Friday - almost 3 months have passed and still no results (the upgrade order still promises '2 weeks') :).

tontsa
02-21-2020, 08:44 AM
Has anyone received an upgrade result? I ordered an upgrade from Intro 4x to 30x on Black Friday - almost 3 months have passed and still no results (the upgrade order still promises '2 weeks') :).

I got mine one week ago. I'm not super happy with the results but probably can't be bothered to chase them about it: 89 gigabases, and about 84% maps to the GRCh38 no-patches analysis set.

teepean47
02-24-2020, 01:45 PM
I got mine one week ago. I'm not super happy with the results but probably can't be bothered to chase them about it: 89 gigabases, and about 84% maps to the GRCh38 no-patches analysis set.

I got my upgrade as well after sending an email to their support. The preliminary results do not look very promising to me.


chr chr_length reads_mapped reads_unmapped
1 249250621 39238381 498054
2 243199373 40394701 609377
3 198022430 32199973 480300
4 191154276 30260782 485119
5 180915260 29365976 443127
6 171115067 27799030 410821
7 159138663 26220834 360356
8 146364022 24306044 333026
9 141213431 20296591 271121
10 135534747 22442288 289175
11 135006516 22526044 288951
12 133851895 21975666 298312
13 115169878 15469260 265463
14 107349540 15135857 200691
15 102531392 14326210 177549
16 90354753 15734439 147785
17 81195210 14380242 156772
18 78077248 12417469 184656
19 59128983 10831209 92290
20 63025520 10723919 124905
21 48129895 6261367 82612
22 51304566 6751207 55013
X 155270560 13133821 342829
Y 59373566 2606107 38017
MT 16569 120210 306
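
A per-contig table like this can be produced from a coordinate-sorted, indexed BAM with samtools idxstats (columns: contig, contig length, mapped reads, unmapped reads); the file name is an assumption:

samtools index sample.bam
samtools idxstats sample.bam | head -25   # the first 25 lines cover 1-22, X, Y and MT in this reference order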

teepean47
02-27-2020, 10:35 AM
I got my upgrade as well after sending an email to their support. The preliminary results do not look very promising to me.

Sample will be resequenced next week.

karwiso
02-27-2020, 09:44 PM
I am getting more and more disappointed with Dante Labs results after their switch to Illumina technology. It seems that Illumina is more costly and Dante Labs is tightening up the number of sequenced gigabases. The last two results were 86 and 94 gigabases, but with an OK alignment close to 30x - 24-25x and 28x. I have now received a third test result - 96.4 gigabases sequenced, but only 70% passed filtering in fastp and 25% of reads have too many N's. I'm waiting for the BAM to download, but based on the fastp statistics the coverage would be around 20x. The read length is also 124-137 bp, significantly under 150 bp. Since Nebula has announced a collaboration with BGI, possible future integration with FTDNA, and 150 bp read length, it would probably be a better option. BGI also delivered an average coverage of about 43-45x in the aligned BAM at the beginning of Dante Labs' business.

Here are the statistics from fastp for my latest Dante Labs WGS with Illumina technology:

fastp version: 0.19.6 (https://github.com/OpenGene/fastp)
sequencing: paired end (151 cycles + 151 cycles)
mean length before filtering: 137bp, 132bp
mean length after filtering: 134bp, 124bp
duplication rate: 11.507642%
Insert size peak: 167
Before filtering
total reads: 716.536520 M
total bases: 96.419756 G
Q20 bases: 88.914143 G (92.215690%)
Q30 bases: 84.819875 G (87.969395%)
GC content: 42.105996%
After filtering
total reads: 501.830700 M
total bases: 64.860703 G
Q20 bases: 63.649012 G (98.131856%)
Q30 bases: 61.083801 G (94.176902%)
GC content: 43.995924%
Filtering result
reads passed filters: 501.830700 M (70.035607%)
reads with low quality: 33.745142 M (4.709480%)
reads with too many N: 180.901744 M (25.246689%)
reads too short: 58.934000 K (0.008225%)

J Man
03-04-2020, 09:35 PM
I saw earlier that they said they have a tool called "DNA Explorer" that would allow us to search for individual SNP results. On my personal page, though, there does not seem to be any DNA Explorer tool. Does anyone else have this tool?

tontsa
03-05-2020, 06:03 AM
I saw earlier that they said they have a tool called "DNA Explorer" that would allow us to search for individual SNP results. On my personal page, though, there does not seem to be any DNA Explorer tool. Does anyone else have this tool?

It seems that newer kits, and those that were re-uploaded and processed with DRAGEN, have that DNA Explorer available. It's really just an embedded IGV using 24-hour-expiry AWS links (hint: if you need a 24-hour link for your BAM, see the source of that page).

E_M81_I3A
03-13-2020, 03:45 PM
Hello, I think the best solution is to join their official Facebook group and also the unofficial Facebook group; there they won't be able to ignore your messages. Do you have a Facebook account?

Thanks, I have now received my results today, finally, after 11 months... Do you know how I can check the quality of the data?

JamesKane
03-17-2020, 02:32 PM
Thanks, I have now received my results today, finally, after 11 months... Do you know how I can check the quality of the data?

Fastp: https://github.com/OpenGene/fastp - Tool to sample the read length, insert size, total bases, and quality of the reads in your FASTQ files.

GATK: https://github.com/broadinstitute/gatk/releases - CollectWgsMetrics will yield the mean and median coverage from the BAM, as well as the depth percentages in buckets from 1x to 90x. The package includes a ton of other tools for assessing quality and for variant discovery.
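
For example, a minimal CollectWgsMetrics invocation via GATK might look like this (file names are assumptions, and the reference must be the same one the BAM was aligned to):

gatk CollectWgsMetrics -I sample.bam -O wgs_metrics.txt -R reference.fa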

FreeAmin
03-18-2020, 01:59 PM
Thanks, I have now received my results today, finally, after 11 months... Do you know how I can check the quality of the data?
There are simpler tools
1. WGSExtract (Software) http://37.187.22.93/wgsextract/WGSExtractBeta.zip
2. https://bam.iobio.io/
3. https://qual.iobio.io/

E_M81_I3A
03-18-2020, 02:02 PM
There are simpler tools
1. WGSExtract (Software) http://37.187.22.93/wgsextract/WGSExtractBeta.zip
2. https://bam.iobio.io/
3. https://qual.iobio.io/

Yes thanks, I have used qual.iobio.io already.

aaronbee2010
03-18-2020, 02:27 PM
Thanks, I have now received my results today, finally, after 11 months... Do you know how I can check the quality of the data?

I would start with FQsum (https://workupload.com/file/Ctkj6RAw), as this determines whether or not you would be able to get a resequence from Dante Labs. They'll only resequence your sample if it's less than 90 GBases (all FASTQ files combined).
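
If you'd rather not run a separate tool, the total base count can also be tallied straight from the FASTQs (the filename glob is an assumption; divide the result by 10^9 for gigabases):

zcat *_R1_*.fastq.gz *_R2_*.fastq.gz | awk 'NR%4==2 {n += length($0)} END {print n}'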

Donwulff
03-28-2020, 05:46 PM
Suddenly I received e-mail:
"The following reports have been generated for your kit

Neurology Test
Pharmacogenetics Report (en_us)
Scientific Fitness Report (en_us)
Wellness & Lifestyle Report (en_us)
Nutrigenomics & Lifestyle Report (en_us)"

I'm not certain what this relates to. I had bought their neurology analysis and one month of the subscription, which I cancelled when not a single report was actually updated. So perhaps they've just finally got around to that, or alternatively this is related to the new reports (Fitness etc.), although I haven't seen an announcement that they were going to be rolled out to everybody for free.

Checking my profile, it appears my old BGI-seq SNP-only results have been re-processed too, and INDEL and CNV files have become available. In addition, the FASTQ and BAM files are downloadable, and interestingly there's a "DNA Explorer" that allows viewing the BAM file, which should require quite a bit of resources from them. I'm in the process of downloading those to see if they've used a new pipeline etc.; of course the old BGI-seq run was 100 bp PE only, so the longer structures probably suffer a bit.

There's a lot to unpack in all those files (both literally and figuratively). The first observation is that the "Neurology Test" no longer has text clipped off the page like the original report I had. I believe the original report just dumped public-domain Genetics Home Reference text for any variant they found, but this new report seems to contain a description for only a few of the variants found, and I'm not sure how those are chosen. It's possible the descriptions are now written by Dante Labs, and some include references to the literature. The rsIDs have now been promoted to the main identifier, while the HGVS molecular codes are listed only for the few variants with a detailed description. Also, the new report is PDF only, without the huge spreadsheet table with the different (old) sources for the variant information.

The new pharmacogenetics report states "This report is NOT intended for US persons. This report was not submitted for approval to the US Food and Drug Administration (FDA)", whereas the original one I received stated something along the lines of it being fully compliant, which I commented at the time was surely not correct. (There has, however, been a significant question as to whether simply leaving drug brand names out of a report, keeping the generic/active ingredient, would be compliant, as at least at the time FDA guidance seemed to suggest that was the issue.)

Overall, I'm seeing few disclaimers in the reports about misinterpreting the results. One main thing I'd like to note is that for *most* variants (but not all), harmful effects come with two copies, and the reports do not really address this. Clinically this is very complicated, though: all DNA reading methods are more likely to make an error on the number of copies of a variant than on its absence/presence, some compound heterozygotes (variants occurring at different locations on both copies of the same gene) and complex heterozygotes (variants in different genes on the same biological pathway) count, and structural variants could affect the other copy of the gene. There are also many cases where a variant in just one copy has a full or partial effect, and two carriers of a variant could pass the condition on to their children. Therefore it's never safe to say "You have only one copy, so no need to worry", but with the current reports I think people generally get the impression they have a condition when they have just one copy detected.

Donwulff
03-30-2020, 07:30 PM
The BAM file (which I didn't have a download link for, but originally received on a USB stick) and the INDEL and CNV files look to have been re-processed with DRAGEN, but interestingly the SNP file, which was the only one downloadable before, appears not to have changed. (The original BAM etc. had an A at the end of the kit number; the new BAM and INDEL/CNV have just the number.) That is a bit of a shame, as I was looking forward to comparing variant calls from GATK and DRAGEN. At least I have the DRAGEN BAM to compare now. The Broad Institute blog post on the DRAGEN-GATK project seems to suggest that DRAGEN primarily improves INDEL/CNV calling, so basing the reports on the old SNP calls may not be a bad idea.

Again, I'm not sure what triggered this, as the "Scientific Fitness Report", "Wellness & Lifestyle Report" and "Nutrigenomics & Lifestyle Report" can be ordered separately, but I have not ordered them. I have ordered the monthly report update, the AI report and the personal report, none of which I've received, so it's possible they're just getting around to that. On the other hand, these reports are included with newly ordered genome sequences. So maybe they're just being added to old kits for free? I haven't seen any other reports of this yet.

You can download a sample of the standard reports, at least if you give your e-mail for marketing (the download link is sent to your e-mail), so I'm not sure it would be fruitful for me to review those reports further. Like everything in this world, they could still use some work, though at least customers can also use third-party services like SNPedia/Promethease, Varsome, Genetics Home Reference, Google Scholar, OMIM etc., which I encourage using in conjunction with any report. There are also some fancy new tools I should keep track of but can never remember the URLs for; for example, there is a browser-side whole-genome VCF viewer which lets you filter for variants associated with given conditions, but I can't find it just now...

There is a consistent problem with over-promising in genetic analysis at the moment, though; for example, almost all of those interpretations are summarized as "People with your genetic profile are likely...". While they don't list references, many of these claims appear to be based on ridiculously small sample sizes with selection bias and unknown environmental factors ("Israeli elite endurance athletes" etc.), where the variant or variants in question are found to explain only a small fraction of the variance in the trait. Typically just a single variant is considered, while other variants not considered or not yet discovered (to say nothing of environmental factors) could have a larger effect. Admittedly, if every interpretation had to read, SNPedia-like, "This variant has been found associated with a small change in benefit from endurance training among some ethnicities and regions, assuming no other difference", or something like that, it would be a lot less sexy and overall incomprehensible. I think in the long run only overall genetic literacy will help, and people rely on far flakier sources for their fitness advice, for example. Many services have been trying to put a "confidence" rating on each analysis alongside the effect size, and increasingly large studies will help to improve confidence.

Dorkymon
04-22-2020, 01:52 PM
I am testing for genealogical purposes, not medical. If we consider discounted prices, you get more bang for the buck with WGS compared to the tests at FTDNA, Ancestry and so on. With WGS one gets the exome, autosomal DNA, mtDNA and the Y chromosome. If we count all the tests - FF, full mtDNA, Y67/Y111 plus Big Y - then WGS is the way to go. The only question is how we can use WGS for genealogy - the results cannot easily be shared and compared yet.

So, two years on from this reply, did you guys figure out whether WGS is a solid option to get BigY, full mtDNA and solid autosomal coverage for genealogical purposes?

teepean47
04-22-2020, 07:17 PM
So, two years on from this reply, did you guys figure out whether WGS is a solid option to get BigY, full mtDNA and solid autosomal coverage for genealogical purposes?

It seems to be so. Especially with Nebula, if the results can be transferred to FTDNA. I still haven't heard of anyone's experience with that option, though.
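For the Y and mtDNA side it mostly comes down to slicing those chromosomes out of the WGS BAM so they can be sent to YFull or similar. A rough sketch of how that is typically done, assuming samtools is installed, the BAM is already indexed, and the reference names the chromosomes chrY/chrM (some builds use Y/MT instead); the file name is hypothetical:

import subprocess

bam = "my_wgs.bam"                       # hypothetical; needs a .bai index alongside

for contig in ("chrY", "chrM"):          # adjust to "Y"/"MT" for some references
    out = contig + ".bam"
    subprocess.run(["samtools", "view", "-b", "-o", out, bam, contig], check=True)
    subprocess.run(["samtools", "index", out], check=True)
    print("wrote", out)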

Jatt1
04-22-2020, 07:38 PM
It seems to be so. Especially with Nebula, if the results can be transferred to FTDNA. I still haven't heard of anyone's experience with that option, though.

Has anybody been able to transfer Nebula WGS results to FTDNA so far?

aaronbee2010
04-22-2020, 10:32 PM
Has anybody been able to transfer Nebula WGS results to FTDNA so far?

FTDNA customer support mentioned to noman that they're planning on releasing more information regarding this at a later date (COVID-19 has probably delayed this).

Kazakh
04-23-2020, 02:49 PM
DNA Day sale on Dante Labs! A 50% discount is being offered. :)

slievenamon
05-24-2020, 01:56 PM
We're still waiting for 30X WGS from the 2019 Black Friday Sale!
What's up with that????
Anyone else in the Dante pipeline from 2019?
Thanks...

Adamm
05-28-2020, 01:53 AM
We're still waiting for 30X WGS from the 2019 Black Friday Sale!
What's up with that????
Anyone else in the Dante pipeline from 2019?
Thanks...

How is that possible? I know a guy who bought his 30X WGS in 2020 and he's already had his results delivered.

aaronbee2010
05-28-2020, 02:25 AM
How is that possible? I know a guy who bought his 30X WGS in 2020 and he's already had his results delivered.

He purchased the test back when Dante were partnered with BGI. A lot of tests (including mine) suffered very heavy delays during this partnership. Eventually, Dante Labs managed to raise enough money to build their own labs, and the number of delays faced by customers who purchased tests from early September 2019 onwards has gone down very significantly. While I purchased during the BGI days, BGI never got around to analysing my sample; fortunately, mine was one of the kits successfully recalled from China to Dante's own lab, and I received my results about two months after Dante Labs opened their own lab.

Not everyone is so lucky, and slievenamon appears to be one of the less fortunate customers whose sample wasn't received. I would recommend he contact DL support, although I wouldn't be surprised if he has done so already.

Jan_Noack
05-29-2020, 03:58 AM
I have one test from a few weeks before the Black Friday sale. It's been sitting in "reports being produced", or whatever the last step is on their timeline, for months! Another I returned after Xmas; they received it on Dec 31st and processed it within days.

slievenamon
05-29-2020, 07:15 PM
Two Black Friday tests from the USA were received and processed.
Not so promptly, but processed.
The kit from Ireland was sent on 12 December 2019. It was received.
Dante has not processed it and is giving the tester the royal runaround.

He is tenacious and will stay on Dante.
Dante does not have a proper business model.
A contractual agreement should not be based on luck.
The GDPR regulator is located in Dublin...

aaronbee2010
05-29-2020, 10:57 PM
Two Black Friday tests from the USA were received and processed.
Not so promptly, but processed.
The kit from Ireland was sent on 12 December 2019. It was received.
Dante has not processed it and is giving the tester the royal runaround.

He is tenacious and will stay on Dante.
Dante does not have a proper business model.
A contractual agreement should not be based on luck.
The GDPR regulator is located in Dublin...

D'Oh! It appears I completely misinterpreted your message! For some reason I misread the date you gave as December 2018! In hindsight, I have no idea how I managed that, so apologies for the confusion :)

The tests from Black Friday 2019 would've been sold after Dante opened their own labs, so there's no excuse for these results not arriving (although even for older customers, I don't see the issue with sending another kit).

As you alluded to, one has to be tenacious in order to get Dante to do their job sometimes, so I give your friend credit where it's due. I remember around this time last year that the Dante Labs Facebook page was filled to the brim with angry customer messages, and I would like to think Dante learnt their lesson since then, but now I'm not so sure.

MacUalraig
06-07-2020, 01:39 PM
Thanks to the admins, Dante now have their own sub-section. Seems fair enough, especially since the building of the new lab in L'Aquila last year.

https://dantelabs.co.uk/pages/our-labs

"Unlike other Whole Genome Sequencing companies sending to China, we strictly perform all our activities on EU-US soil"

OK we'll pretend they never used China themselves and of course YSEQ use a German lab for sequencing (CeGaT). :-)

TigerMW
06-08-2020, 12:53 AM
....
Seems fair enough, especially since the building of the new lab in L'Aquila last year....
Has Covid-19 affected their operation? Italy was hit pretty hard. Is the local university back in session?

slievenamon
06-10-2020, 08:27 PM
I would say Dante has serious issues producing a WGS test in a timely manner.
I'd venture to say, once they have your money, they really don't give a toss.
That's my humble opinion.
Can't speak to the local university being back in session. Is that of some import?
The new digs are working well, it appears. Two out of three tests were completed.
We will not support Dante again with our custom.

Adamm
06-10-2020, 09:14 PM
I've been waiting for 3 months now (they were supposed to deliver within 2 months). When I ask them why it's taking so long, they claim my sample is currently sequenced to 89 Gb and they cannot finish it because they don't have the supplies needed due to the coronavirus. I'm thinking they just use the coronavirus as an excuse for their delays, bad customer service and bad delivery times.

MacUalraig
06-11-2020, 08:53 AM
I've been waiting for 3 months now (they were supposed to deliver within 2 months). When I ask them why it's taking so long, they claim my sample is currently sequenced to 89 Gb and they cannot finish it because they don't have the supplies needed due to the coronavirus. I'm thinking they just use the coronavirus as an excuse for their delays, bad customer service and bad delivery times.

It must be tempting to just ask them to hand over what they've done if it's progressed that far but maybe they would decline?

Adamm
06-16-2020, 08:07 PM
It must be tempting to just ask them to hand over what they've done if it's progressed that far but maybe they would decline?

Well, I've been emailing them repeatedly, telling them that they are not upholding their promises, and today they delivered my report. So I think it helps to email them all the time.

Kazakh
11-23-2020, 12:54 PM
Dante Labs is having a Black Friday sale at 149 EUR (~177 USD). Dunno what to say, but if you are a risk taker, go for it. :)
There is an alternative option which is more expensive but has reliable delivery timeframes and excellent quality: Nebula Genomics.

Choose the US store through a VPN and the price will be $149, not €149
https://i.ibb.co/RyPKCvk/Screenshot-2020-11-23-19-35-06.png (https://ibb.co/DpkF7VH)

lacreme
11-24-2020, 06:14 PM
Quick question, as an amateur who will need help deciphering his WGS results regardless of the DNA testing company, and knowing their reliability issues as far as communication and speed go:
Is it worth the risk?
Also worth pointing out that I'm located in Greece, so them being in Italy should mean the delivery time (of the kit) back and forth, even with the COVID issues, shouldn't be months long ... right?

Marmaduke
11-24-2020, 06:26 PM
What's the best price anyone has seen for the Nebula Genomics WGS tests?

Adamm
11-24-2020, 06:45 PM
Quick question, as an amateur who will need help deciphering his WGS results regardless of the DNA testing company, and knowing their reliability issues as far as communication and speed go:
Is it worth the risk?
Also worth pointing out that I'm located in Greece, so them being in Italy should mean the delivery time (of the kit) back and forth, even with the COVID issues, shouldn't be months long ... right?

If you check this thread you can see that I had problems with their customer service, but they nevertheless delivered the results in the end for a cheap price ($150). Today I bought another kit from DanteLabs, and IMO it's worth the risk.

lacreme
11-24-2020, 07:35 PM
If you check this thread you can see that I had problems with their customer service, but they nevertheless delivered the results in the end for a cheap price ($150). Today I bought another kit from DanteLabs, and IMO it's worth the risk.

I've read about your problems (and similar complaints on other sites too), but if they delivered the results in the end, I'll probably take the plunge.
Do you know how many days the offer will stand?

dosas
11-24-2020, 07:44 PM
Don't do it guys, lol.

Join the Facebook page of Nebula+Dante customers, and check for yourself the endless horror stories about people getting ripped off.

Consider yourself warned about Dante: pay a bit extra and go for Nebula; the price difference is not worth the hassle.

pmokeefe
11-24-2020, 08:26 PM
What's the best price anyone has seen for the Nebula Genomics WGS tests?
I haven't seen it go below $299 for the 30X WGS test. To stay current with their reports, you have to buy a subscription on top of that. I have found the reports helpful and they seem to be pretty good about adding a couple of new reports every week, though I haven't been keeping close enough watch to be sure. Two more came today, for example.

Ibericus
11-24-2020, 08:27 PM
Nebula delivered right at the end of the promised 10 weeks (I have read it takes them anywhere from 8 to 12 weeks). However, it took them 5 weeks to mark my sample as received... Not sure if it was their fault or a post/customs issue.

If Dante had had this discount back in summer I would probably have picked them.

Adamm
11-24-2020, 09:41 PM
I've read about your problems (and similar complaints on other sites too), but if they delivered the results in the end, I'll probably take the plunge.
Do you know how many days the offer will stand?

I have no idea, maybe until the weekend.

Marmaduke
11-24-2020, 10:49 PM
Thank you. It's $299 today, and as you say, the reports add another $200. So $499 total. I've done other 30x WGS tests. Is it worth considering the Nebula 100x test at $999? Is that likely to turn up any additional SNPs? Thanks in advance for your thoughts.

Ibericus
11-25-2020, 05:33 AM
Thank you. It's $299 today, and as you say, the reports add another $200. So $499 total. I've done other 30x WGS tests. Is it worth considering the Nebula 100x test at $999? Is that likely to turn up any additional SNPs? Thanks in advance for your thoughts.

I am no expert, but if you're talking about Y-SNPs: I did the 30x Nebula test (I actually got 34x) and my results were:

- 17x average reads on the Y chromosome
- 99.54% coverage
- 18 novel SNPs according to Yfull of which 1 is 'ambiguous' and 1 is 'one read'

I'm not sure what 'ambiguous' means so I assume that only the 'one read' SNP would benefit from more reads. Then there is an 8.31% chance of finding another novel SNP in the uncovered region (assuming the 100x test gets 100% coverage). In my opinion not even close to being worth the extra money.
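If it helps, that 8.31% appears to be just the observed novel-SNP density extrapolated into the uncovered fraction of the chromosome; roughly, using the numbers above:

import math

novel_snps = 18        # novel SNPs found in the covered portion
covered = 0.9954       # fraction of the Y chromosome covered

expected_in_gap = novel_snps * (1 - covered) / covered
print(f"Expected extra novel SNPs in the uncovered 0.46%: {expected_in_gap:.3f}")   # about 0.083

# treating that as a Poisson mean, the chance of at least one more novel SNP:
print(f"P(at least one) = {1 - math.exp(-expected_in_gap):.1%}")                    # about 8%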

Jatt1
11-26-2020, 12:05 AM
I've been waiting for 3 months now (they were supposed to deliver within 2 months). When I ask them why it's taking so long, they claim my sample is currently sequenced to 89 Gb and they cannot finish it because they don't have the supplies needed due to the coronavirus. I'm thinking they just use the coronavirus as an excuse for their delays, bad customer service and bad delivery times.

I think the delay comes from a combination of the coronavirus and them using only spare capacity, when available, to process these sale kits. I personally am in no hurry as long as they do quality work and provide me with the full data.

dosas
11-26-2020, 04:04 PM
Thank you. It's $299 today, and as you say, the reports add another $200. So $499 total. I've done other 30x WGS tests. Is it worth considering the Nebula 100x test at $999? Is that likely to turn up any additional SNPs? Thanks in advance for your thoughts.

It's $299 for the whole genome sequence, and then you have to pay €20 for a one-month subscription (which you needn't renew if you're not interested in getting the updated/new medical reports). So the total to unlock your files and have access to the reports already published would be roughly 299 + 20 = $319.

For Black Friday, you can add BLACKFRIDAY in the coupon code area for an extra 10% off (just got an email).

Imesmouden
11-26-2020, 09:56 PM
Nebula delivered right at the end of the promised 10 weeks (I have read it takes them anywhere from 8 to 12 weeks). However, it took them 5 weeks to mark my sample as received... Not sure if it was their fault or a post/customs issue.

If Dante had had this discount back in summer I would probably have picked them.

Hello brother,

Are you the new Spanish E-Z5009 sample on YFull?

ThaYamamoto
11-27-2020, 11:36 AM
So, seeing as they've both got sales on, should I go for Dante or Nebula?

Jatt1
11-27-2020, 06:50 PM
So, seeing as they've both got sales on, should I go for Dante or Nebula?

I bought a Dante kit, but their reputation is bad; at the same time, you can basically get two kits from Dante for the price of one kit from Nebula.

ThaYamamoto
11-27-2020, 08:32 PM
I bought a Dante kit, but their reputation is bad; at the same time, you can basically get two kits from Dante for the price of one kit from Nebula.

Cool cool... Nebula sends your DNA to China though, as far as I know, so I might have to go with Dante. They have two tests ATM, both on sale, the 30x and the 130x... is the 130x worth it at this point?

Jatt1
11-27-2020, 10:23 PM
Cool cool... Nebula sends your DNA to China though, as far as I know, so I might have to go with Dante. They have two tests ATM, both on sale, the 30x and the 130x... is the 130x worth it at this point?

The expensive one is not required, but if one can deal with the extra data then the additional cost is only minor. Anyone who needs it for medical reasons should go for it.

Adamm
12-01-2020, 11:35 PM
Well, here is round 2. I bought a new kit from Dante Labs because my first experience overall (except for the shitty customer service) was good: they delivered my results within 3-4 months. With Black Friday I could buy a new kit for 120 euros (!), so even if their service is shit, it was worth the risk, so I bought the kit. It arrived at my house today (within a week):

https://i.imgur.com/jAwttBf.png

Jatt1
12-01-2020, 11:48 PM
Well, here is round 2. I bought a new kit from Dante Labs because my first experience overall (except for the shitty customer service) was good: they delivered my results within 3-4 months. With Black Friday I could buy a new kit for 120 euros (!), so even if their service is shit, it was worth the risk, so I bought the kit. It arrived at my house today (within a week):

https://i.imgur.com/jAwttBf.png

How could you get it for 120 Euro?

Adamm
12-01-2020, 11:51 PM
How could you get it for 120 Euro?

It was an easy trick: I used a VPN and set it to America (the price of the kit then becomes $149).

Jatt1
12-21-2020, 01:44 AM
Ordered a kit on Nov. 24th, still waiting.

Arlus
12-21-2020, 05:25 AM
Ordered a kit on Nov. 24th, still waiting.
Same here. I have received no tracking updates since it left Germany. I hope that it arrives this week.

Jatt1
12-21-2020, 05:42 AM
Same here. I have received no tracking updates since it left Germany. I hope that it arrives this week.

Where are you? I am in Canada.

Adamm
12-21-2020, 05:42 AM
I didn't post an update, but I received my kit a week after I ordered it on Black Friday.

Arlus
12-21-2020, 05:44 AM
Where are you? I am in Canada.

India.

lacreme
12-24-2020, 06:09 PM
My Greek friend ordered one kit at 26/11. The delivery and the return of the sample went very smoothly and today, after aprox. a week and a half in the "awaiting qc inspection" stage, his sample continued to "sequencing started" . How much will he have to wait, in the best case scenario and also at normal circumastances, for his results to be ready ? A member of the facebook group who also ordered a normal 8 weeks kit at 26/11 got freakishly lucky and got his results today :eek: ! But cases like his must be one in a million...obviously I don't expect similar timeframes.