
Dante Labs (WGS)




MacUalraig
09-21-2017, 11:26 AM
Dante Labs had a WGS day in July selling their product for just UKP349. They claim 8-week delivery, so if that holds, we should start to see some people discussing their results soon?

https://www.facebook.com/DanteLabs

They are still selling it at its normal price on Amazon but there is some speculation that the reviews are fake. Current price is UKP599 for 30x.

karwiso
09-21-2017, 11:01 PM
They have discounted WGS to 399 euros just today, 22 September 2017.

I've ordered WGS and WES for myself, my mother and my uncle. A superior option for the European market.

Donwulff
09-22-2017, 01:25 PM
I did take the chance with them during their earlier discount on 10th July. It's now 8 weeks since they received the sample, with the WGS processing time promised at 7-9 weeks, so I should find out any time now, though it wouldn't be the first company to miss its delivery estimates. All the reviews are obviously fake and/or internal, because they weren't on the market publicly long enough for those reviewers to get their results. The discount prices are below the reagent prices alone, with no explanation or plan offered for how they offset the price (beyond "partnered with Amazon" - somebody should get Amazon's perspective on this; they have been enrolled into Amazon's Launchpad, which gives some credibility). The reviewers use language consumers wouldn't, and the most commonly referenced review, on DNAMelogy, is from an anonymous group from the same country as Dante Labs (Italy). This is sometimes a marketing strategy that legit companies choose, though it's unfortunate. That said, a well-known genetic genealogist talked with them and vouched that it seemed like a genuine company trying to get off the ground. If you pay by PayPal, you may be able to use their refund guarantee if you don't get results, but be prepared for a long fight in that case.

The experience so far has been good; the sample kit, a standard Oragene OG-500, was dispatched and delivered by DHL (not their fault, but at least here DHL is nearly unworkable: I wasn't home during the day, so they cancelled the delivery with no notification, and I had to guess it was being delivered by DHL, call them to re-arrange the delivery and take the day off work) with pre-paid DHL return shipping. I received e-mail confirmation of the kit's reception and, unusually, a confirmation and report of successful DNA extraction. Their customer support has been very rapid, courteous and accommodating. According to their replies on Facebook they're using Sequencing.com's DNA analysis marketplace for analysis, and a sample report for the Wellness & Longevity app (a different one from the sample report on Sequencing.com) has been shared.

MacUalraig
09-22-2017, 01:49 PM
They have discounted WGS to 399 euros just today, 22 September 2017.

I've ordered WGS and WES for myself, my mother and my uncle. A superior option for the European market.

Crikey so they have - they must read Anthrogenica!

MacUalraig
09-22-2017, 01:56 PM
I did take the chance with them during their earlier discount on 10th July. It's now 8 weeks since they received the sample, with the WGS processing time promised at 7-9 weeks, so I should find out any time now, though it wouldn't be the first company to miss its delivery estimates. All the reviews are obviously fake and/or internal, because they weren't on the market publicly long enough for those reviewers to get their results. The discount prices are below the reagent prices alone, with no explanation or plan offered for how they offset the price (beyond "partnered with Amazon" - somebody should get Amazon's perspective on this; they have been enrolled into Amazon's Launchpad, which gives some credibility). The reviewers use language consumers wouldn't, and the most commonly referenced review, on DNAMelogy, is from an anonymous group from the same country as Dante Labs (Italy). This is sometimes a marketing strategy that legit companies choose, though it's unfortunate. That said, a well-known genetic genealogist talked with them and vouched that it seemed like a genuine company trying to get off the ground. If you pay by PayPal, you may be able to use their refund guarantee if you don't get results, but be prepared for a long fight in that case.

The experience so far has been good; the sample kit, a standard Oragene OG-500, was dispatched and delivered by DHL (not their fault, but at least here DHL is nearly unworkable: I wasn't home during the day, so they cancelled the delivery with no notification, and I had to guess it was being delivered by DHL, call them to re-arrange the delivery and take the day off work) with pre-paid DHL return shipping. I received e-mail confirmation of the kit's reception and, unusually, a confirmation and report of successful DNA extraction. Their customer support has been very rapid, courteous and accommodating. According to their replies on Facebook they're using Sequencing.com's DNA analysis marketplace for analysis, and a sample report for the Wellness & Longevity app (a different one from the sample report on Sequencing.com) has been shared.

There is a real need for some believable reviews including screenshots, report extracts, Y and mt haplogroup determination etc (assuming you get those). dnatestingchoice.com sometimes has useful reviews and I've talked to them before, but this is a bit lame:

"Dante Labs Review

5 September 2017

Whole Genome Sequencing
5
Approved Review

I am very interested in buying the whole genome sequencing pack. Next Generation Sequencing is very rare and I trust the quality of this product.

Dante Labs DNA Sequencing review by a DNA Testing Choice user"

https://dnatestingchoice.com/search/dante

What a joke.

karwiso
09-23-2017, 09:02 AM
As Donwulff said, there are not many reviews of the company. According to this GenomeWeb article (molecular-diagnostics/dante-labs-offers-direct-consumer-hereditary-disease-risk-genome-exome-tests), it is quite a new vendor for DNA testing. I think they have good possibilities to grow in the European market. Their offers are very attractive and I think they do this to attract new customers and get some reviews here and there.

So far: they deliver with DHL and you get prepaid DHL labels to return the samples. When sample kits are shipped you get a DHL waybill number so you can track the package. It took just one day to get the kit (from Italy to Sweden) and one day to ship the kit back to Italy. My kits were marked as received the same day they were delivered. The WES kit status was updated after 11 days and I got a DNA extraction report. I suppose that my sample is in sequencing now.
I ordered two WGS kits for my relatives in Russia and I was quite surprised that the kits are sent with DHL (not to big cities like St Petersburg or Moscow, but to a rather distant smaller city). There was some additional documentation attached for customs.
So far I am very satisfied with the service.
I am testing for genealogical purposes, not medical. If we consider the discounted prices, you get more bang for the buck with WGS compared to tests at FTDNA, Ancestry and so on. With WGS one gets exome, autosomal, mtDNA and Y-chromosome. If we count all the tests - FF, mtDNA full, Y67/Y111 plus Big Y - then WGS is the way to go. The only question is how we could use WGS for genealogy - the results cannot easily be shared and compared yet.

MacUalraig
09-23-2017, 12:00 PM
As Donwulff said, there are not many reviews of the company. According to this GenomeWeb article (molecular-diagnostics/dante-labs-offers-direct-consumer-hereditary-disease-risk-genome-exome-tests), it is quite a new vendor for DNA testing. I think they have good possibilities to grow in the European market. Their offers are very attractive and I think they do this to attract new customers and get some reviews here and there.

So far: they deliver with DHL and you get prepaid DHL labels to return the samples. When sample kits are shipped you get a DHL waybill number so you can track the package. It took just one day to get the kit (from Italy to Sweden) and one day to ship the kit back to Italy. My kits were marked as received the same day they were delivered. The WES kit status was updated after 11 days and I got a DNA extraction report. I suppose that my sample is in sequencing now.
I ordered two WGS kits for my relatives in Russia and I was quite surprised that the kits are sent with DHL (not to big cities like St Petersburg or Moscow, but to a rather distant smaller city). There was some additional documentation attached for customs.
So far I am very satisfied with the service.
I am testing for genealogical purposes, not medical. If we consider the discounted prices, you get more bang for the buck with WGS compared to tests at FTDNA, Ancestry and so on. With WGS one gets exome, autosomal, mtDNA and Y-chromosome. If we count all the tests - FF, mtDNA full, Y67/Y111 plus Big Y - then WGS is the way to go. The only question is how we could use WGS for genealogy - the results cannot easily be shared and compared yet.


If I recall, Full Genomes send out their saliva collection kit by courier too, presumably because of the liquid in it?

There are a few people looking at how we might go about using full (WGS) autosomal DNA for genealogy. One of them is also a traditional SNP based autosomal match to me luckily. I am convinced we will be doing it before long and hopefully WGS will go mainstream soon given the way prices are going.

karwiso
09-23-2017, 02:32 PM
If I recall, Full Genomes send out their saliva collection kit by courier too, presumably because of the liquid in it?

I don't think it is because of the liquid. AncestryDNA has a very similar kit for saliva collection, but their kit is mailed and you mail it back in a prepaid box. Probably it depends on the number of kits mailed and contracts with the post or other carriers. But it is still nice to get kits and samples faster and be able to track them. LivingDNA uses B-post and it took 2 or 3 weeks to see the kit as received by the lab...

Donwulff
09-29-2017, 12:52 PM
New review for Dante Labs on medgadget.com
It's not independent in that the reviewer got it for free and thus Dante Labs knew there was going to be a review, but it does look like a genuine outside review.
I, on the other hand, sent my sample in 11 weeks ago now, with Dante Labs stating three weeks ago that I'd receive results within a couple of weeks, so I'm inclined to think it's a scam. As said, it wouldn't be the first time a DNA company misses its delivery estimates, but given the campaign prices are so far below the going price/materials price (human whole genome sequencing at BGI starts at $600, Illumina a couple hundred more) with no plan presented to cover the price difference, the burden of proof is on them.
There is also a first "Verified Purchase" review on Amazon, but it sounds & looks like a Dante Labs press release. My reading is that they ordered the test on 5th September, so they could not actually have received the results in any case (which is quite confusing - why admit it was ordered three weeks ago if the processing time is eight weeks?).
Sequencing.com, their bioinformatics analytics partner, has been listing Dante Labs as their "preferred provider" for a while though.

MacUalraig
09-29-2017, 01:36 PM
Review here:

https://www.medgadget.com/2017/09/dante-labs-full-genome-sequencing-review.html

So it sounds like he hasn't looked at his BAM file, or it wasn't available... only a VCF file. And no look at his Y/mtDNA.

MacUalraig
10-02-2017, 07:51 AM
New review for Dante Labs on medgadget.com
It's not independent in that the reviewer got it for free and thus Dante Labs knew there was going to be a review, but it does look like a genuine outside review.
I, on the other hand, sent my sample in 11 weeks ago now, with Dante Labs stating three weeks ago that I'd receive results within a couple of weeks, so I'm inclined to think it's a scam. As said, it wouldn't be the first time a DNA company misses its delivery estimates, but given the campaign prices are so far below the going price/materials price (human whole genome sequencing at BGI starts at $600, Illumina a couple hundred more) with no plan presented to cover the price difference, the burden of proof is on them.
There is also a first "Verified Purchase" review on Amazon, but it sounds & looks like a Dante Labs press release. My reading is that they ordered the test on 5th September, so they could not actually have received the results in any case (which is quite confusing - why admit it was ordered three weeks ago if the processing time is eight weeks?).
Sequencing.com, their bioinformatics analytics partner, has been listing Dante Labs as their "preferred provider" for a while though.

If you read the Verified Purchase review on Amazon carefully, they never actually say they have had their results yet. It's an obfuscation masterpiece. Even the phrase 'The report results are in plain English and I did not find it difficult to understand' is from the paragraph 'SAMPLE REPORTS'.

Donwulff
10-06-2017, 02:06 AM
I actually received my results yesterday. That makes it 79 days, or 11 weeks and two days, since they confirmed receiving the sample; as I said, missing the delivery estimate is pretty common... The BAM file was not included, but since their website says it is available (after I asked them about it), I have requested it, and we'll see how it goes. As for the VCF file they make available, it's a near-perfect match with my 23andMe v3 results, and looks like a perfect half-match to one of my parents' exome sequences. This proves beyond a shadow of a doubt that it's an actual sequence, and not just an imputed SNP-microarray/chip result, which was my first concern.
For the casual user, the "Wellness and Longevity" report is the main feature. It's a little underwhelming, for me at least, though as indicated I've seen my own 23andMe and Promethease results as well as one parent's exome analysis, and these reports only cover SNP's which are generally tested by 23andMe already. The traits included are Athletic Performance, Melanoma, Arthritis, Osteoporosis, Malignant Hyperthermia, Heart Attack, Medication Assessment (Warfarin, Clopidogrel/Plavix, Aspirin, Statins, Blood Clot Risk including Deep Vein Thrombosis), Preventable Sudden Death (due to a Heart Arrhythmia) and finally a Genetic Profile which includes one-liners for Atrial Fibrillation, MTHFR Deficiency, Salt-Sensitive Hypertension, Effect of Breastfeeding as a Baby on IQ, Congenital Bilateral Absence of Vas Deferens, Obstructive Azoospermia, Lactose Intolerance, Liver Disease, Hirschsprung Disease, Blood Clot Risk (due to Factor V Leiden, Prothrombin 20210, Protein C Deficiency, Protein S Deficiency, Antithrombin III Deficiency), Bleeding Risk (due to Factor V Deficiency, Factor XI Deficiency, Von Willebrand Disease Type 2M, Prothrombin Deficiency, Von Willebrand Disease Type 1, Von Willebrand Disease Type 2B), Hemochromatosis, Drug-induced Hemolysis, Hemolytic Anemia due to Triosephosphate Isomerase Deficiency, Homocysteinuria, Thrombotic Thrombocytopenic Purpura, Idiopathic Pancreatitis, Susceptibility to Lead Poisoning due to the ALAD Gene, Brain Aneurysm, Alpha-1-Antitrypsin Deficiency, Asthma, Bronchiectasis, Emphysema, Chronic Respiratory Disease, Resistance to HIV Infection, Susceptibility to West Nile Virus, Recurrent Bacterial Infections, Increased Risk of Disseminated Infection with Mycobacterium avium and Salmonella enteritidis, and Noise-induced Hearing Loss.
Phew, that's a LOT. My main concern is that you get access to that report without a single click-wrap warning. Some people won't handle well learning that they have "Heart Attack" or "Sudden Death", for example, with no forced warning or disclaimer about how the results only reflect current knowledge, may contain errors, and in any case only indicate a risk. 23andMe had multiple click-wraps and disclaimers but still got into hot water, which is why Dante Labs is only available in Europe, but for ethical reasons I hope they will introduce clear warnings both before ordering and before viewing results. If you are taking the test for genealogical purposes only, consider carefully the possible impact on yourself *and your family* if you're found positive for any of the risks above before clicking on "Wellness and Longevity" in the kit manager.
Finally, there is a "Genome Overview", which is an SnpEff report for the VCF file. Most of the information in it, I daresay, holds little relevance or usability - what am I supposed to compare the quality metrics etc. against here? It shows there were 3,431 variants detected on the Y chromosome, though. The report shows the mtDNA incorrectly, and the reference sequence in the VCF looks to actually be UCSC hg19 chrM, which isn't compatible with anything, really. So the BAM will be essential for phylogenetic uses.

karwiso
10-08-2017, 03:54 PM
I am testing for genealogical purposes so I am interested in uploading WGS to GEDmatch.com. When I click on upload VCF to Genesis I see "Copy the FILE URL from your testing company, and paste into this box. Your data must be Build 37 (GRCh37/hg19)". Probably it is not that bad with hg19? FTDNA just announced the switch to GRCh38.

Donwulff
10-09-2017, 04:54 PM
Dante Labs is currently on UCSC hg19. It's just that the mitochondrial reference in that build is way outdated.
FTDNA's switch is for Big Y's Y-chromosome only; it won't matter for this, as YFull would redo the mapping to a specific build anyway.

Dante Labs provided a download link for the VCF file, which worked as-is for both GEDmatch Genesis and SNPedia Promethease. There's a general problem with the sequencing services though: due to the large amount of data, the VCFs only list variants that differ from the reference. GEDmatch Genesis will therefore take only those SNP's into account in matching, which means you're working with a very limited (and skewed) set of SNP's. This is a non-trivial problem because even a whole genome sequence doesn't get sufficient coverage across the whole genome, so you can't just assume non-listed variants are the reference allele either. The solution is for everybody to start using gVCF (and, preferably, some interchangeable common format...) which lists regions that match the reference.

I have no idea if GEDmatch or SNPedia support gVCF yet, I haven't seen a statement on that. It's possible to expand out the gVCF REF blocks, but the Dante Labs VCF doesn't contain them currently, and expanding them would hugely increase the file size (storage + file transfer). Another quick-and-dirty fix would be to run the VCF through an imputation process, which would fill out the reference alleles, and hopefully impute some of the variants in the non-tested regions. However, currently I'm not sure there ARE any public imputation services that would take a sequencing VCF.

I was offered the BAM file on a mailed USB stick, but given it would take a week or two to upload the BAM file to Dropbox/Google Drive etc. (oh, and I'd need expanded capacity too) over a fast consumer internet connection, I asked if they could arrange for a download link; still waiting for a response. YFull does not currently list Dante Labs as a testing company on their order form, but given they frequently add samples from studies with different workflows, I'm sure they can add Dante Labs BAM files easily enough.
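If you have the BAM file and a machine that can handle it, generating a gVCF yourself is also an option; a rough sketch with placeholder file names, assuming GATK 3.x and the matching hg19 reference FASTA (untested on the Dante Labs BAM):
# emit reference-confidence blocks as well as variant sites
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R hg19.fa -I dantelabs.bam --emitRefConfidence GVCF -o dantelabs.g.vcf.gz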

Donwulff
10-09-2017, 05:13 PM
Whoops double post deleted.

karwiso
10-09-2017, 07:21 PM
I would like to provide a link, but I am not allowed to post links here yet. Please google: NCBI Genome Remapping Service
It is a free remapping tool and you can choose the genome build and the input and output file types.

Well, I hope they can provide download links. I have ordered several tests and it would be disappointing to go through snail mail with USB sticks...

rbtlsr
10-09-2017, 07:47 PM
Thank you so much for the update, Donwulff! I was getting worried that I'd been scammed. They received my kit 9 and a half weeks ago. I contacted them late last week and they said I could expect to get my results October 20th.

warwick
10-12-2017, 02:28 AM
I actually received my results yesterday. That makes it 79 days, or 11 weeks and two days, since they confirmed receiving the sample; as I said, missing the delivery estimate is pretty common... The BAM file was not included, but since their website says it is available (after I asked them about it), I have requested it, and we'll see how it goes. As for the VCF file they make available, it's a near-perfect match with my 23andMe v3 results, and looks like a perfect half-match to one of my parents' exome sequences. This proves beyond a shadow of a doubt that it's an actual sequence, and not just an imputed SNP-microarray/chip result, which was my first concern.
For the casual user, the "Wellness and Longevity" report is the main feature. It's a little underwhelming, for me at least, though as indicated I've seen my own 23andMe and Promethease results as well as one parent's exome analysis, and these reports only cover SNP's which are generally tested by 23andMe already. The traits included are Athletic Performance, Melanoma, Arthritis, Osteoporosis, Malignant Hyperthermia, Heart Attack, Medication Assessment (Warfarin, Clopidogrel/Plavix, Aspirin, Statins, Blood Clot Risk including Deep Vein Thrombosis), Preventable Sudden Death (due to a Heart Arrhythmia) and finally a Genetic Profile which includes one-liners for Atrial Fibrillation, MTHFR Deficiency, Salt-Sensitive Hypertension, Effect of Breastfeeding as a Baby on IQ, Congenital Bilateral Absence of Vas Deferens, Obstructive Azoospermia, Lactose Intolerance, Liver Disease, Hirschsprung Disease, Blood Clot Risk (due to Factor V Leiden, Prothrombin 20210, Protein C Deficiency, Protein S Deficiency, Antithrombin III Deficiency), Bleeding Risk (due to Factor V Deficiency, Factor XI Deficiency, Von Willebrand Disease Type 2M, Prothrombin Deficiency, Von Willebrand Disease Type 1, Von Willebrand Disease Type 2B), Hemochromatosis, Drug-induced Hemolysis, Hemolytic Anemia due to Triosephosphate Isomerase Deficiency, Homocysteinuria, Thrombotic Thrombocytopenic Purpura, Idiopathic Pancreatitis, Susceptibility to Lead Poisoning due to the ALAD Gene, Brain Aneurysm, Alpha-1-Antitrypsin Deficiency, Asthma, Bronchiectasis, Emphysema, Chronic Respiratory Disease, Resistance to HIV Infection, Susceptibility to West Nile Virus, Recurrent Bacterial Infections, Increased Risk of Disseminated Infection with Mycobacterium avium and Salmonella enteritidis, and Noise-induced Hearing Loss.
Phew, that's a LOT. My main concern is that you get access to that report without a single click-wrap warning. Some people won't handle well learning that they have "Heart Attack" or "Sudden Death", for example, with no forced warning or disclaimer about how the results only reflect current knowledge, may contain errors, and in any case only indicate a risk. 23andMe had multiple click-wraps and disclaimers but still got into hot water, which is why Dante Labs is only available in Europe, but for ethical reasons I hope they will introduce clear warnings both before ordering and before viewing results. If you are taking the test for genealogical purposes only, consider carefully the possible impact on yourself *and your family* if you're found positive for any of the risks above before clicking on "Wellness and Longevity" in the kit manager.
Finally, there is a "Genome Overview", which is an SnpEff report for the VCF file. Most of the information in it, I daresay, holds little relevance or usability - what am I supposed to compare the quality metrics etc. against here? It shows there were 3,431 variants detected on the Y chromosome, though. The report shows the mtDNA incorrectly, and the reference sequence in the VCF looks to actually be UCSC hg19 chrM, which isn't compatible with anything, really. So the BAM will be essential for phylogenetic uses.

Thus far, Promethease remains superior to most commercial DTC health reports.

MacUalraig
10-12-2017, 12:52 PM
Dante Labs is currently on UCSC hg19. It's just that the mitochondrial reference in that build is way outdated.
FTDNA's switch is for Big Y's Y-chromosome only; it won't matter for this, as YFull would redo the mapping to a specific build anyway.

Dante Labs provided a download link for the VCF file, which worked as-is for both GEDmatch Genesis and SNPedia Promethease. There's a general problem with the sequencing services though: due to the large amount of data, the VCFs only list variants that differ from the reference. GEDmatch Genesis will therefore take only those SNP's into account in matching, which means you're working with a very limited (and skewed) set of SNP's. This is a non-trivial problem because even a whole genome sequence doesn't get sufficient coverage across the whole genome, so you can't just assume non-listed variants are the reference allele either. The solution is for everybody to start using gVCF (and, preferably, some interchangeable common format...) which lists regions that match the reference.

I have no idea if GEDmatch or SNPedia support gVCF yet, I haven't seen a statement on that. It's possible to expand out the gVCF REF blocks, but the Dante Labs VCF doesn't contain them currently, and expanding them would hugely increase the file size (storage + file transfer). Another quick-and-dirty fix would be to run the VCF through an imputation process, which would fill out the reference alleles, and hopefully impute some of the variants in the non-tested regions. However, currently I'm not sure there ARE any public imputation services that would take a sequencing VCF.

I was offered the BAM file on a mailed USB stick, but given it would take a week or two to upload the BAM file to Dropbox/Google Drive etc. (oh, and I'd need expanded capacity too) over a fast consumer internet connection, I asked if they could arrange for a download link; still waiting for a response. YFull does not currently list Dante Labs as a testing company on their order form, but given they frequently add samples from studies with different workflows, I'm sure they can add Dante Labs BAM files easily enough.

My WGS FGC reports include a dbSNP.vcf report of about 41Gb which includes all calls ref or alt for SNPs that occur in the dbSNP database. The other main VCF is a snpeff one (2 Gb) which is restricted to non-ref results.

Donwulff
10-13-2017, 09:17 AM
About a week after I requested the BAM & gave an impassioned plea for a download link, they sent it on a 128GB SanDisk Cruzer USB stick via DHL. The good news is it was an overnight delivery, but this seems untenable for most customers without data-processing tools at home, and it opens some new privacy concerns for people who believe in genetic exceptionalism (i.e. that the information couldn't be obtained in other ways). For myself, I have mobile broadband on a tablet, and made it a WLAN access point for my Windows laptop.

Uploading the ~100 GB BAM to Amazon S3, I first tried Cloudberry Explorer, which offers a 14-day Pro trial with multipart upload. This was horrible; with the current version, the percent-complete calculation was totally off (and actually showed two values), the time left showed the expected total transfer time, and most importantly, when the upload finished, 75 blocks had failed. I could find no way to either resume the upload or even delete the unfinished file so I wouldn't have to pay a storage fee for it. S3 Browser worked nicely enough, letting me view and delete incomplete multipart files, and again most importantly, finish the upload without errors even though I suspended the laptop without warning at one point.
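For what it's worth, the plain AWS command-line client should also handle the multipart upload and a temporary download link; a sketch assuming awscli is installed and configured and the bucket already exists (not what I actually used):
# multipart upload is handled automatically for large files
aws s3 cp genome.bam s3://my-genome-bucket/genome.bam
# generate a time-limited download URL to hand to YFull (7 days here)
aws s3 presign s3://my-genome-bucket/genome.bam --expires-in 604800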

I'm expecting Amazon AWS storage and transfer fees to be in the neighborhood of 5 euros, maybe less given I only need to store it until YFull downloads it. Considering the USB stick is 30 dollars and DHL shipping probably in the neighborhood of 50 dollars, this is obviously suboptimal, though if Dante Labs does their processing offline they could be facing a similar upload bandwidth problem. Nonetheless, I hope they solve that for future sequences. Having ordered my sequence at their Amazon Prime Day discount, it also seems like just the DHL, sample processing and USB costs would put them in the red, never mind the actual sequencing; it's likely they're burning investor money to get established, but I'm still not getting their business model exactly.

YFull replied that they think they can process the BAM file, and there doesn't seem to be anything out of the ordinary with it, but it will probably be a while before I know for sure.

As an aside, I could impute the 166MB VCF file that Dante Labs provides as standard with the Sanger Imputation Service (HRC 1.1), but this isn't currently an easy task. The service can't handle chrM or chrY, the chromosomes need to be renamed to GRCh37 style without the "chr" part, and Globus Connect is needed for transferring the files. After that you still need to combine the chromosomes into a single file that's close to the "dbSNP" list (with potentially unsequenced variants imputed and ref calls filled out) and do whatever preparation your intended use requires. I could not yet get GedMatch to accept the imputed file, and it provides no error so I don't know if it's the file size or what.
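In case it saves somebody some trial and error, the chromosome renaming and the post-imputation merge can be done with bcftools; a rough sketch with placeholder file names, assuming a chr_map.txt that lists one "chrN N" pair per line:
# drop the "chr" prefix so the names match the GRCh37 naming the service expects
bcftools annotate --rename-chrs chr_map.txt dantelabs.vcf.gz -Oz -o dantelabs.nochr.vcf.gz
bcftools index dantelabs.nochr.vcf.gz
# after imputation, merge the per-chromosome result files back into one
bcftools concat imputed.chr{1..22}.vcf.gz -Oz -o imputed.all.vcf.gz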

Donwulff
10-15-2017, 08:39 PM
Cost about $10 to serve the BAM file off Amazon S3 for YFull; since YFull pulled it almost immediately I could've deleted it right away so storage costs just pennies.

I was going to leave it there in case I get around to running some further analysis on Amazon AWS myself, but then I took a closer look at Sequencing.com who advertise Dante Labs as their sequencing partner, and noticed they offer free storage and transfer for unlimited amount of DNA data at least for now. So if you trust sequencing.com and don't have another reliable cloud drive, you can actually just use their Big Yotta client to upload the BAM file on their site, and serve the link off there. It might even be possible for Dante Labs to put it there, but I didn't ask.

My BitDefender antivirus did not play nice with Big Yotta though, and I had to add the BAM file to antivirus exclusions; not that there's a virus there, but the way Big Yotta accesses it causes BitDefender to lock up CPU & drives re-scanning all the time.

Sequencing.com offers a few other cloud-based automated tools that are free; sadly, despite claiming to process BAM files, their EvE Free pipeline won't let me select my BAM file, so the only free tool that currently works is "Genome VCF" for generating gVCF v4.1 off the BAM file, which *might* solve the problem of REF calls not being listed in the original VCF. I'm trying it out to see how it goes; though again, hopefully Dante Labs can offer this for most folks in the future.

MacUalraig
10-16-2017, 08:01 AM
Cost about $10 to serve the BAM file off Amazon S3 for YFull; since YFull pulled it almost immediately I could've deleted it right away so storage costs just pennies.



Please post a screenshot of the 'RAW data - Statistics' tab which should be the URL https://www.yfull.com/raw-data-stat/. This should be populated very shortly (a matter of hours) after they receive your file.

Donwulff
10-16-2017, 08:23 PM
Oddly enough, the Y chromosome stats remain empty save for "ChrY BAM file size: 0.95 Gb" vs 0.41 Gb for BigY, but the STR & mtDNA sections were populated almost immediately (though not available yet). 355 (71.00%) Reliable Alleles vs. 422 (84.40%) for Big Y; it makes sense because my Big Y was done with a 120bp read length, while the Dante Labs WGS is just 100bp, so longer STR's can't be read. This is weird, because the STR results used to lag months behind the SNP's. If I were them, I'd calibrate the SNP's against my existing Big Y test before using them, so perhaps that's what they are doing. Not batched yet for the next processing batch either, just "the samples are collected for a next batch" with last month's target time; I'm not sure how YFull is supposed to spot new samples nowadays. It may not get processed until the next batch starts.

kafky
10-20-2017, 06:50 PM
Hi! Anyone had results back from DANTE? Can you share the experience?

Donwulff
10-20-2017, 07:18 PM
Funny, the site won't let me reply to a PM...
Yes, I did receive my results, though several weeks late. I can't vouch that everybody will receive them; their pricing remains as suspect as ever. The results I got check out in every way, it's the real thing, and YFull is currently analyzing them for the Y chromosome & mtDNA.

There's some challenge with their formats so far, because they provide only a VCF file of sites that differ from the reference build, and with an outdated mtDNA sequence. In my case, they also only delivered the BAM file on a USB stick, which, while fast, doesn't make it easy to run third-party analysis, as you will first need to upload it to a 100GB+ cloud storage provider.

GEDMatch Genesis (their new beta matching) and SNPedia/Promethease accept their VCF file link directly, and Sequencing.com can be used not only for storing and serving the BAM file but also to run various kinds of analysis on it, many for free. In that case you may have to add "&.vcf.gz" at the end of the file URLs to get things like GedMatch to realize they're VCF files. Sequencing.com will let you get a Genomic VCF and an mtDNA alignment from the BAM file, plus ClinVar annotation (or test mapping to the GRCh38 reference), if you can get the BAM uploaded there.

Donwulff
10-20-2017, 09:45 PM
YFull has updated the expected completion date on my Dante Labs WGS sample to 20th November, so I guess I'll be able to report my impressions about it then; as of yet there are no Y chromosome statistics save for "ChrY BAM file size: 0.95 Gb". From my memory of the Big Y processing, the statistics took a while to fill in as well, and the predicted completion date kept sliding forwards and backwards. I'll probably get around to doing some analysis of my own before then, but I think what YFull gets out of it with their pipeline & quality control is what matters most, so that'll be about a month.

MacUalraig
10-21-2017, 08:20 AM
YFull has updated the expected completion date on my Dante Labs WGS sample to 20th November, so I guess I'll be able to report my impressions about it then; as of yet there are no Y chromosome statistics save for "ChrY BAM file size: 0.95 Gb". From my memory of the Big Y processing, the statistics took a while to fill in as well, and the predicted completion date kept sliding forwards and backwards. I'll probably get around to doing some analysis of my own before then, but I think what YFull gets out of it with their pipeline & quality control is what matters most, so that'll be about a month.

My memory may have been at fault. I admin a group on YFull but you can only join groups when the SNP analysis is complete (and hence the statistics tab populated).

In the meantime you could do some simple examination of the data locally eg compare read depths and calls at your previously determined NVs.

Donwulff
10-21-2017, 03:31 PM
It's been a few years since I got my BigY done, and this is a new situation since my sample may be the first one from Dante Labs they have, so it may be handled differently.
I'm not sure I should post my novel SNP's publicly, but what the heck, that's probably the most significant part. In the VCF provided by Dante Labs, most of my novel SNP's aren't listed. The sequence is nominally 30X, however the Y chromosome is haploid, i.e. there is only one copy, which means you can expect an average read depth of about 15 reads, and if Dante Labs' filters, for example, expect at least 12 reads to report a call, then many SNP's won't end up in the VCF. BigY is by design a much deeper sequence, and the variants matching Dante Labs have from 24 to 85 reads on YFull, typically 50 to 80 as per the BigY spec.

The main benefit of WGS is that you're supposed to get areas outside of the targeted locations of the Y chromosome, so novel SNP's not found on BigY are of the highest interest, but that will require YFull to finish their analysis. Comparing to the novel SNP's found on BigY by YFull will give some idea of its utility as the only test. In conclusion, all of the Acceptable and Good Quality SNP's can be called on the Dante Labs sequence, though only about two thirds pass filters on the VCF they provide. Read depth at my current terminal Y-SNP is 25; the terminal/haplogroup-determining SNP's are usually ones which are easy to determine by sequencing, because they need to match in all samples.

I have 6 Best Quality novel variants on YFull off the 150bp PE Big Y; the Dante Labs VCF includes 4 of them and the Sequencing.com Genome VCF finds them all (italics):
chrY 9427939 . T C 747 PASS DP=21;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=46;MQ0=1;QD=35.57;SB=-2.700e+02 GT:AD:DP:GQ:PL:MQ:GQX 1/1:1,20:21:60:780,60,0:46:60
chrY 16307293 . G A 460.77 PASS AC=2;AF=1.00;AN=2;DP=20;FS=0.000;GQ_MEAN=60.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=23.04;SOR=0.892;VQSLOD=1.58;culprit=FS GT:AD:DP:GQ:PL 1/1:0,20:20:60:489,60,0
chrY 17173949 . C T 509.77 PASS AC=2;AF=1.00;AN=2;DP=22;FS=0.000;GQ_MEAN=63.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=24.27;SOR=1.828;VQSLOD=1.90;culprit=FS GT:AD:DP:GQ:PL 1/1:0,21:21:63:538,63,0
chrY 19070707 . T C 293.77 PASS AC=2;AF=1.00;AN=2;DP=19;FS=0.000;GQ_MEAN=52.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=15.46;SOR=0.793;VQSLOD=2.18;culprit=FS GT:AD:DP:GQ:PL 1/1:0,19:19:52:322,52,0
chrY 20063414 . G C 366 PASS DP=11;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=53;MQ0=0;QD=33.27;SB=-1.987e+02 GT:AD:DP:GQ:PL:MQ:GQX 1/1:0,11:11:30:399,30,0:53:30
chrY 20063416 . G C 467.77 PASS AC=2;AF=1.00;AN=2;DP=12;FS=0.000;GQ_MEAN=36.00;MLEAC=2;MLEAF=1.00;MQ=54.16;MQ0=0;NCC=0;QD=27.40;SOR=1.022;VQSLOD=1.72;culprit=QD GT:AD:DP:GQ:PGT:PID:PL 1/1:0,12:12:36:1|1:20063414_G_C:496,36,0

17.5 average read depth. N.B. the Sequencing.com Genomic VCF has one read less at some locations.


Out of my 8 Acceptable Quality SNP's on YFull, Dante Labs has reported 5, and Sequencing.com Genomic VCF has the rest:
chrY 8341974 . G T 606 PASS DP=18;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60;MQ0=0;QD=33.67;SB=-3.160e+02 GT:AD:DP:GQ:PL:MQ:GQX 1/1:0,18:18:51:639,51,0:60:51
chrY 8420303 . G T 464 PASS DP=13;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60;MQ0=0;QD=35.69;SB=-2.349e+02 GT:AD:DP:GQ:PL:MQ:GQX 1/1:0,13:13:39:497,39,0:60:39
chrY 9167511 . C T 457.77 PASS AC=2;AF=1.00;AN=2;DP=19;FS=0.000;GQ_MEAN=57.00;MLEAC=2;MLEAF=1.00;MQ=58.41;MQ0=0;NCC=0;QD=24.09;SOR=1.609;VQSLOD=1.39;culprit=FS GT:AD:DP:GQ:PL 1/1:0,19:19:57:486,57,0
chrY 15349787 . G C 484.77 PASS AC=2;AF=1.00;AN=2;DP=20;FS=0.000;GQ_MEAN=60.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=24.24;SOR=0.892;VQSLOD=1.42;culprit=FS GT:AD:DP:GQ:PL 1/1:0,20:20:60:513,60,0
chrY 17172702 . G T 524.77 PASS AC=2;AF=1.00;AN=2;DP=21;FS=0.000;GQ_MEAN=63.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=24.99;SOR=0.784;VQSLOD=1.65;culprit=FS GT:AD:DP:GQ:PL 1/1:0,21:21:63:553,63,0
chrY 17465570 . G A 518.77 PASS AC=2;AF=1.00;AN=2;DP=22;FS=0.000;GQ_MEAN=66.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=23.58;SOR=1.085;VQSLOD=2.04;culprit=FS GT:AD:DP:GQ:PL 1/1:0,22:22:66:547,66,0
chrY 18872443 . C G 410 PASS DP=12;Dels=0.00;FS=0.000;HaplotypeScore=0.0000;MLEAC=2;MLEAF=1.00;MQ=60;MQ0=0;QD=34.17;SB=-7.900e+01 GT:AD:DP:GQ:PL:MQ:GQX 1/1:0,12:12:33:443,33,0:60:33
chrY 22138713 . C A 416.77 PASS AC=2;AF=1.00;AN=2;DP=18;FS=0.000;GQ_MEAN=54.00;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;NCC=0;QD=23.15;SOR=0.914;VQSLOD=1.43;culprit=FS GT:AD:DP:GQ:PL 1/1:0,18:18:54:445,54,0

This comes to an average read depth of 17.875.

Ambiguous and Low Quality SNP's on YFull weren't found in the Dante Labs VCF at all, which isn't too surprising since most of them are likely errors or just hard-to-sequence spots, and the Dante Labs VCF doesn't list REF calls. The Sequencing.com-produced Genomic VCF calls the Ambiguous Quality ones (2 reads on BigY) as REF and the one Low Quality one (8 reads with one DEL on BigY) as ALT:
chrY 3393577 . T . . PASS END=3393621;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:18:48:52
chrY 4319552 . G . . PASS END=4319597;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:28:57:26
chrY 5650724 . A . . PASS END=5650758;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:23:60:41
chrY 6403302 . G . . PASS END=6403312;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:20:48:50
chrY 8066241 . A . . PASS END=8066300;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:18:48:60
chrY 15444167 . A . . PASS END=15444217;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:15:45:60
chrY 16395059 . G . . PASS END=16395152;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:19:54:60
chrY 18702993 . T . . PASS END=18703017;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:13:39:60
chrY 18819125 . A . . PASS END=18819177;BLOCKAVG_min30p3a GT:DP:GQX:MQ 0/0:16:45:60

chrY 16742953 . A T 538 PASS DP=17;Dels=0.00;FS=0.000;HaplotypeScore=0.9998;MLEAC=2;MLEAF=1.00;MQ=60;MQ0=0;QD=31.65;SB=-2.220e+02 GT:AD:DP:GQ:PL:MQ:GQX 1/1:0,17:17:45:571,45,0:60:45
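If someone wants to repeat this kind of spot check on their own files, pulling the read depth at a handful of known positions out of the VCF is straightforward; a rough sketch, assuming bcftools/tabix, placeholder file names and the hg19 chrY coordinates used above:
# compress and index the Dante Labs VCF once
bgzip -c dantelabs.vcf > dantelabs.vcf.gz
tabix -p vcf dantelabs.vcf.gz
# print CHROM, POS, REF, ALT and read depth (DP) for a few of the positions above
bcftools query -r chrY:9427939,chrY:16307293,chrY:17173949 -f '%CHROM\t%POS\t%REF\t%ALT\t%INFO/DP\n' dantelabs.vcf.gz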

Donwulff
10-22-2017, 11:23 AM
YFull's statistics for my Dante Labs sequence filled out today. Dante Labs is on the LEFT, BigY on the RIGHT. Please note that a one-to-one comparison isn't really possible, because Dante Labs is a genome-wide "shotgun" (named after the random spread) sequence, while BigY is a targeted-capture deep sequencing test that targets certain phylogenetically informative regions of human genome build 37 that sequence well. And most notably, sequencing results vary from run to run and sample to sample. Given that, though, the Dante Labs statistics look better than I expected.

[Attached image: YFull 'RAW data - Statistics' comparison, Dante Labs vs. BigY]

On their Facebook page Dante Labs shared a post from someone else who got their BAM file on a USB stick, which indicates not only that they are still using USB delivery, but that *he got a better USB stick than me*. Pfft! ;) According to Facebook's translation he seemed pretty happy with the deal & security, though IMO it's still a terrible way of delivery. At least here the courier service will hand the package to anyone who looks like they might want it, and then it'll be your home computer and network (or school, library etc. if that's where you use it!) whose security matters, while the VCF file is still sitting on the Internet containing all the relevant data in a much more easily digestible form. Of course any enemies powerful enough to use my DNA can already get it from a glass at a restaurant or my garbage, so I'm not personally worried, just saying people need to ask Dante Labs to arrange an online link for the BAM files.

MacUalraig
10-22-2017, 12:06 PM
My broadband is capped and there is no way on earth I would be going near an online 100Gb BAM file. That's about what I streamed watching the London Olympics and my broadband bill was horrendous. So regardless of how I were to enable YFull to get to it, I would have to have hard copy locally for my own analysis. My own Y Elite BAM was 8.5Gb and I wouldn't want to go much bigger than that for downloading.

Donwulff
10-22-2017, 01:04 PM
Maybe both then. The economist in me says the "hardcopy" should cost more, since as discussed, the better USB stick costs 60-90 EUR at my location, and DHL shipment to Italy looks to cost 40 EUR, so the expense that needs to be passed to customer is 100-130EUR, while as shown, one download serving from Amazon S3 is about 8 EUR. By all means, ask what is most convenient for you, especially if you can get it for free. Though given uploading the BAM online is bound to cost at least as much as downloading it, I think most people would still prefer to have at least a download link for YFull, FGC, Sequencing.com and what have you. In fact, I'd still wager most of their intended clientele can't process let alone upload 100G of data. Also, unless people are actually getting (or taking) free resources at work, school or home, doing your own analysis on cloud servers is bound to be more affordable thanks to economies of scale than using your own hardware (And currently free at Sequencing.com).

Rant done ;)

Aside: YFull's "read count" and "unmapped reads" are identical to the numbers on the Dante Labs provided BAM, so if you *can* process the BAM file, I believe you can just extract the Y-chromosome part and save uploading most of it if you're doing *only* YFull (But better check with YFull first). Other services would require the whole file though.

MacUalraig
10-22-2017, 01:41 PM
Aside: YFull's "read count" and "unmapped reads" are identical to the numbers on the Dante Labs provided BAM, so if you *can* process the BAM file, I believe you can just extract the Y-chromosome part and save uploading most of it if you're doing *only* YFull (But better check with YFull first). Other services would require the whole file though.

I did that for a project member, it is quite trivial to split off a Y-only BAM using samtools.

Donwulff
10-23-2017, 12:09 PM
# index the full BAM, then write out only the reads mapped to chrY
samtools index original.bam
samtools view -bo me_chrY_only.bam original.bam chrY
Of course, you need a fast computer running an operating system with samtools etc., capable of reading USB sticks, preferably with a 128GB or larger hard drive. In other words, no phones or tablets. N.B. samtools doesn't let you set the compression level for the view sub-command, and the BAM format is kinda weird, so it takes some gymnastics to re-compress; sort would let you, but then you will need lots of memory and at least double the hard drive space. But the highest compression level doesn't save that much space, so it's probably not worth it.

I re-mapped the Dante Labs BAM to GRCh38 (analysis-ready + full decoy sets from bwa-kit), due in part to the talk about FTDNA switching to build 38, and the results did not change noticeably: read depths were within one read of the original, or two reads for the one YFull called "Low Quality". The GRCh38 chrY BAM has fewer reads in total (including the unplaced contig), 10,527,139 vs. 12,090,138 for the Dante Labs provided build 37 one, so restricting the reads to those mapped to chrY probably doesn't hurt results too much even if they're re-analysed against build 38. Of course, the best would still be to run the whole original unfiltered BAM through the pipeline, but I don't know if even YFull is doing that.
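For anyone curious, the general recipe looks something like this (a sketch only, with placeholder names; I used the bwa-kit hs38DH reference bundle, and the exact commands and thread count will differ):
# group read pairs together and dump them back to FASTQ
samtools collate -u -O original.bam | samtools fastq -1 r1.fq.gz -2 r2.fq.gz -0 /dev/null -s /dev/null -n -
# re-map against GRCh38 + decoys and produce a sorted, indexed BAM
bwa mem -t 8 hs38DH.fa r1.fq.gz r2.fq.gz | samtools sort -o remapped.hs38DH.bam -
samtools index remapped.hs38DH.bam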

Removed reference to FTDNA's build 38 re-analysis; apparently they match, it's just that their "Non-matching variants" column lists both REF and ALT variants which don't match with the other person, which makes sense but isn't obvious.

rbtlsr
10-29-2017, 11:31 PM
Just wanted to say that I'm still waiting on results. I mailed my kit in on 8/4, so it's been over 12 weeks. When I originally contacted them a few weeks ago they said I'd have it by 10/20, then I contacted them again when that date came around and they're now saying 11/3.

I asked if I could get the vcf as soon as it was available without having to wait for the report to be developed, and the person I was in contact with looked into it but unfortunately they can't do that, so I have to wait until everything is ready.

I'm trying not to get too impatient because they're a new company and it was a ridiculously low price, but I'm really looking forward to getting my data.

Donwulff
11-04-2017, 09:17 PM
Did you get your results yet? I don't want to be recommending them if their delivery seems dodgy. As said, mine was some weeks late too, but nothing drastic, and they shared a post from a happy customer on Facebook, but otherwise I've heard little.
As an aside, YFull updated my sample's "Full Statistics" maybe a week ago to show 15 novel SNP's for the Dante Labs sample. As I posted previously, this is the number of reliable novel SNP's found on my BigY test as well, so it would seem that in my case Dante Labs WGS doesn't improve over the BigY results. I have no idea of YFull's methodology though, the number may still change; for all I know they took the number from my post here ;)

jenssonjoi
11-06-2017, 03:35 PM
New review for Dante Labs on medgadget.com
It's not independent in that the reviewer got it for free and thus Dante Labs knew there was going to be a review, but it does look like a genuine outside review.
I, on the other hand, sent my sample in 11 weeks ago now, with Dante Labs stating three weeks ago that I'd receive results within a couple of weeks, so I'm inclined to think it's a scam. As said, it wouldn't be the first time a DNA company misses its delivery estimates, but given the campaign prices are so far below the going price/materials price (human whole genome sequencing at BGI starts at $600, Illumina a couple hundred more) with no plan presented to cover the price difference, the burden of proof is on them.
There is also a first "Verified Purchase" review on Amazon, but it sounds & looks like a Dante Labs press release. My reading is that they ordered the test on 5th September, so they could not actually have received the results in any case (which is quite confusing - why admit it was ordered three weeks ago if the processing time is eight weeks?).
Sequencing.com, their bioinformatics analytics partner, has been listing Dante Labs as their "preferred provider" for a while though.

Hi Donwulff

I am the one who has the Verified Purchase on Amazon UK, and I am the guilty one who added the review to Amazon on the 5th September 2017; I am sorry if my partial review looked like a Dante Labs press release. I also bought another WGS for my wife a few weeks later (23 September 2017), and my sister bought a WGS in Iceland a few days after me. I think she received and sent her sample back with DHL on the 13 September 2017. The sample kit for my sister was sent from Rome, Italy with DHL.

I have been in touch with Dante Labs a few times since I bought my sample kit for WGS and they have always been very fast replying to my emails, and their answers are exemplary and very detailed.

I am sure that Dante Labs are for real and I wish them great success, but I have a lot hinging on my belief that they are just trying to find their way in a difficult business world.

However, I did contact Dante Labs this morning (6 November 2017) and I asked when I can expect my WGS test results:

Here is a copy of my email to Dante Labs:

"Dear Dante Labs

The reason why I am writing to you today is to check when the results of my Dante Labs Full DNA Analysis Whole Genome Sequencing (WGS), 30X Coverage will be forwarded on to me.

I placed my order through Amazon UK. Order #205-5594037-XXXXXXX on the 5 September 2017 and I posted my spit sample on the 6 September 2017 with DHL. This was 8 weeks ago and as I was expecting the results to arrive in 6-8 weeks. The 8 weeks cut-off date makes it today from the date of shipping my sample to Dante Labs in Rome, Italy."

...and Dante Labs answered my email with this reply.

"Our lab is completing the bioinformatics analysis on your sample. You are expected to receive results by the end of November"

To receive my WGS test results at the end of November is perhaps not ideal but no serious hardship is done, that is if I do get my results by the end of November at the latest.

I intend to keep you all informed how I get on at the end of November.

Best regards

jenssonjoi

rbtlsr
11-08-2017, 02:58 PM
Did you get your results yet? I don't want to be recommending them if their delivery seems dodgy. As said, mine was some weeks late too, but nothing drastic, and they shared a post from a happy customer on Facebook, but otherwise I've heard little.


Still no results. This Friday will be 14 weeks. I've been in contact with them again and they said they'd look into my order specifically. It's worth noting that they've changed the estimate on their site to 50 business days (so about 10 weeks).

karwiso
11-08-2017, 10:09 PM
It is 9th week for the first sample I have sent to them. So, still waiting. We will see next week.

jenssonjoi
11-10-2017, 02:29 AM
Just wanted to say that I'm still waiting on results. I mailed my kit in on 8/4, so it's been over 12 weeks. When I originally contacted them a few weeks ago they said I'd have it by 10/20, then I contacted them again when that date came around and they're now saying 11/3.

I asked if I could get the vcf as soon as it was available without having to wait for the report to be developed, and the person I was in contact with looked into it but unfortunately they can't do that, so I have to wait until everything is ready.

I'm trying not to get too impatient because they're a new company and it was a ridiculously low price, but I'm really looking forward to getting my data.

rbtlsr

Did you get your results back from Dante Labs?

karwiso
11-13-2017, 10:22 PM
At the beginning of the 10th week I received the results of my WES analysis. I got an email that reports are available for the test.
I was able to download the VCF data, which is aligned to GRCh37 and comes as two files - one for SNPs and one for INDELs.
Right now I am trying to use them with Promethease, and I have uploaded the uncompressed VCF for SNPs to GEDmatch.com
The Wellness and Longevity report is very strange. It is some file from sequencing.com, but I find the information not useful. It is produced by the Genome Viewer app at sequencing.com and contains a lot of statistics about reads, amino acids and so on. I have not seen any 100-page report about my traits or disease risks and protections. It would be complete abracadabra for the average user. I haven't been able to find any directly useful health information there.
I've contacted customer service about the alignment to GRCh37 and they have answered that FASTQ files are provided for customers and the files can be aligned to any genome reference. That is nice. I have not seen any FASTQ or BAM files for direct download from the page. I will hold off on contacting customer service for the files because I am waiting for my WGS and have more samples from my relatives. I want to see how the WGS results are and which reports are provided then.
If you are planning to use WES to test for possible disease risks, then you will probably have to use some third-party service or have a bio/medical analyst interpret your test results.
After contact with a genealogist I can say that WES is not recommended for genealogical purposes. I will write more when I have got my WGS results.
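If anyone wants one combined file instead of two, I think the SNP and INDEL VCFs can be merged with bcftools; a sketch with placeholder file names that I have not tried on the Dante Labs files myself:
# compress and index both files, then concatenate allowing overlapping records
bgzip snps.vcf && tabix -p vcf snps.vcf.gz
bgzip indels.vcf && tabix -p vcf indels.vcf.gz
bcftools concat -a snps.vcf.gz indels.vcf.gz -Oz -o combined.vcf.gz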

Well, the current price for WGS and WES is hard competition for other services priced around or above $1000; another good point is the European market and very fast delivery with DHL. The customer service is friendly and responsive. They still need to improve their reports, and FASTQ and BAM files should be available for download directly from the account, at least for a limited time, hm... 30 days after completion and upon a later request. The Wellness and Longevity report is of inferior quality so far and should be improved.

PS Some part is referred to as Quality in the Number of effects section:
Quality:

Min 30
Max 20,628
Mean 2,159.85
Median 1,092

If it is about the number of reads, then it is superior, I think.

The Promethease report from this data was a lot more informative than the Promethease report based on the analysis of my AncestryDNA data.

jenssonjoi
11-14-2017, 03:59 AM
At the beginning of the 10th week I received the results of my WES analysis. I got an email that reports are available for the test.
I was able to download the VCF data, which is aligned to GRCh37 and comes as two files - one for SNPs and one for INDELs.
Right now I am trying to use them with Promethease, and I have uploaded the uncompressed VCF for SNPs to GEDmatch.com
The Wellness and Longevity report is very strange. It is some file from sequencing.com, but I find the information not useful. It is produced by the Genome Viewer app at sequencing.com and contains a lot of statistics about reads, amino acids and so on. I have not seen any 100-page report about my traits or disease risks and protections. It would be complete abracadabra for the average user. I haven't been able to find any directly useful health information there.
I've contacted customer service about the alignment to GRCh37 and they have answered that FASTQ files are provided for customers and the files can be aligned to any genome reference. That is nice. I have not seen any FASTQ or BAM files for direct download from the page. I will hold off on contacting customer service for the files because I am waiting for my WGS and have more samples from my relatives. I want to see how the WGS results are and which reports are provided then.
If you are planning to use WES to test for possible disease risks, then you will probably have to use some third-party service or have a bio/medical analyst interpret your test results.
After contact with a genealogist I can say that WES is not recommended for genealogical purposes. I will write more when I have got my WGS results.

Well, the current price for WGS and WES is hard competition for other services priced around or above $1000; another good point is the European market and very fast delivery with DHL. The customer service is friendly and responsive. They still need to improve their reports, and FASTQ and BAM files should be available for download directly from the account, at least for a limited time, hm... 30 days after completion and upon a later request. The Wellness and Longevity report is of inferior quality so far and should be improved.

PS Here is the part referred to as Quality in "Number of effects":
Quality:

Min 30
Max 20,628
Mean 2,159.85
Median 1,092

If it refers to the number of reads, then I think it is superior.

The Promethease report based on this data was a lot more informative than the Promethease report based on my AncestryDNA data.

Thank you Karwiso for your excellent update - Can I ask you why you get both WGS and WES tests?

I am still waiting for the results of my WGS but it should not be long now. I am interested in hearing how you get on with the WGS tests and in particular their usability for genealogical purposes.

Best regards

Johannes

MacUalraig
11-14-2017, 07:20 AM
After the contact with some genealogist I can say that WES is not recommended for genealogical purposes. I write more when I have got my WGS results.



For those who haven't seen it, this is what GEDmatch says about exome data:

"About the close 'Exome' matches

Are you puzzled by the new, very close, match you are seeing with somebody. Does it have "Exome" in the name? Even if it does not say "Exome", there are a few "Exome" kits on Genesis. Exome kits are different than the "genealogy" kits we are used to dealing with. The exome regions of the chromosome have much less difference from one individual to the next. Because of that, they appear as close matches to more people.

We apologize for the "false match". We plan to provide some means of differentiating these "exome matches" from the real thing, but it will take a while to get it in place."

So I'm not sure how much investigation of exome-exome comparisons has been carried out, contrasted with standard matching between the same people?

MacUalraig
11-14-2017, 07:58 AM
The Wellness and Longevity report is very strange. It is some file from sequencing.com, but I don't find the information useful. It is produced by the Genome Viewer App at sequencing.com and contains a lot of statistics about reads, amino acids and so on. I have not seen any 100-page report about my traits or disease risks and protections. It would be complete abracadabra for the average user. I haven't been able to find any directly useful health information there.


Sounds like you are looking at or were given the wrong report. There is a sample on their site

https://s3.amazonaws.com/dantelabswebsite/Dante+Lab+Sample+Report+-+Wellness%26Longevity+August+2017.pdf

for the health report. What you are describing sounds like the Genome Overview report, which is cool if you like data (as I do) but not particularly useful.

https://s3.amazonaws.com/dantelabswebsite/Dante+Labs+Genome+Overview+Sample+Report.zip

karwiso
11-14-2017, 08:42 AM
Thank you Karwiso for your excellent update - Can I ask you why you get both WGS and WES tests?

I am still waiting for the results of my WGS but it should not be long now. I am interested in hearing how you get on with the WGS tests and in particular their usability for genealogical purposes.

Best regards

Johannes

Hi! I ordered WES first because it was cheaper than WGS and because WES has higher coverage (100x). Some of my relatives are interested in health risks and I have ordered both tests for them.
WES could probably be used for genealogy. I have uploaded the VCF to GEDmatch Genesis and it has now been analysed. I see overlaps of 16,000 to 62,000 SNPs with other kits, mostly FGC, Gencove, Genes for Good and (imputed) VCFs from dna.land; much less with 23andMe, FTDNA and Ancestry. I match myself 64 cM in total with my Ancestry results and 104 cM in total with my MyHeritage results, and there is no match against my Living DNA kit. So, if genealogy is the goal, then any cheaper autosomal test will do better than WES.

I expect my WGS in two weeks. I ordered it a bit later than WES. For most of my relatives I have ordered only WGS.

Best regards.

MacUalraig
11-14-2017, 08:45 AM
Some genealogically useful Y SNPs are inside genes anyway. M222, the world's most famous SNP*, is in the USP9Y gene.

rbtlsr
11-14-2017, 09:06 PM
RBTISR

Did you get your results back from Dante Labs?

Nope. I've been bugging them about it and was just told "We are completing the quality control on your sample. You are expected to receive the result by the end of next week."
The end of next week will be 16 weeks since they received the sample.

vettor
11-15-2017, 12:16 AM
Some genealogically useful Y SNPs are inside genes anyway. M222, the world's most famous SNP*, is in the USP9Y gene.

Is that the gene stated as.....slow sperm swimmers?

gotten
11-15-2017, 01:03 AM
Is that the gene stated as.....slow sperm swimmers?

Well, it seems they were quite productive, nonetheless. :rofl:

MacUalraig
11-15-2017, 08:25 AM
Joking apart, yes USP9Y is involved in sperm creation but M222 is benign ;-)

karwiso
11-17-2017, 02:12 PM
Update about my WES:
It was probably a glitch in Dante Labs' systems, or it simply takes a few days to get results from the Wellness and Longevity App. Today I was able to access my health report.
The report is 160 pages, but most of them are explanations of how to read the report and references to research.
Effectively the report is 11-12 pages. The content and appearance correspond to the sample reports provided by Dante Labs. I think the report is easy to read, the risk overviews are easy to understand, and it is nice to have lifestyle recommendations to help prevent some possible conditions. I find the Promethease report more detailed, but it is of course not as easy to browse as the Wellness and Longevity App interpretation. I think the report is better suited to an average user who could be bored or scared by scientific or medical terminology and details. It would be more fun to read such a report if it contained some factoids and phenotype traits - that would be more entertaining for a general audience and could be a good selling point (like ethnicity estimates in genealogical testing).
After this correction I think the product meets my expectations, but access to the FASTQ and BAM files should be improved (as I said before - at least the possibility to download them during some limited period).

MacUalraig
11-17-2017, 02:52 PM
Update about my WES:
It was probably a glitch in Dante Labs' systems, or it simply takes a few days to get results from the Wellness and Longevity App. Today I was able to access my health report.
The report is 160 pages, but most of them are explanations of how to read the report and references to research.
Effectively the report is 11-12 pages. The content and appearance correspond to the sample reports provided by Dante Labs. I think the report is easy to read, the risk overviews are easy to understand, and it is nice to have lifestyle recommendations to help prevent some possible conditions. I find the Promethease report more detailed, but it is of course not as easy to browse as the Wellness and Longevity App interpretation. I think the report is better suited to an average user who could be bored or scared by scientific or medical terminology and details. It would be more fun to read such a report if it contained some factoids and phenotype traits - that would be more entertaining for a general audience and could be a good selling point (like ethnicity estimates in genealogical testing).
After this correction I think the product meets my expectations, but access to the FASTQ and BAM files should be improved (as I said before - at least the possibility to download them during some limited period).

I like the way you access the Promethease reports personally - dip in and out rather than just read one long doc which might contain things you didn't want to see or will put you off your dinner.

jenssonjoi
11-20-2017, 02:32 PM
Dear Karwiso

I am still waiting for my results from Dante Labs but if my calculations are right then I should get my WGS results sometime this week.

I hope you have managed to get all your test results for WGS and WES from Dante Labs by now! You mention that the "Wellness and Longevity report is of an inferior quality so far and should be improved".

What would you like to see improved for the Wellness and Longevity report?

I thought the Wellness and Longevity report came from the Wellness & Longevity App and not from the Genome Overview App. I noticed that the Genome Overview App has been upgraded and that the new 'Details By Gene' table now provides improved, comprehensive, searchable gene-based information.

Have you tried running the Genome Overview App with your data?

I hope everything works well for you.

Best regards

Johannes

karwiso
11-20-2017, 09:09 PM
Dear Karwiso

I hope you have managed to get all your test results for WGS and WES from Dante Labs by now! You mention that the "Wellness and Longevity report is of an inferior quality so far and should be improved".

What would you like to see improved for the Wellness and Longevity report?

Best regards

Johannes

Hi Jenssonjoi!

As I wrote above, the issue with the Wellness and Longevity report has been fixed (before, it was just statistics from the Genome Overview App); I suppose it was a glitch in the system, or it just takes some time to get the results from the Wellness and Longevity App. As I wrote, I think the report is reasonably informative for an average user, and the lifestyle advice is nice. Some factoids (like caffeine metabolism) and phenotype traits would be more entertaining for an average customer. Personally I would like to see more details about specific diseases - higher or lower calculated risk - but that would probably be too much for everyone.
I showed my Wellness and Longevity report to a colleague at work and I think she was pleased with the amount of detail and information, especially that I had no serious high-risk "genes" (her son worries about his health, so I am trying to convince them to test). Since I got the correct report, I'm satisfied.

I expect my WGS to come next week. I hope so.

Best regards,
karwiso

rbtlsr
11-25-2017, 08:17 AM
Got my data :) it took 16 weeks, but hopefully as the company grows they'll iron out any issues with timing/time estimates. It's definitely worth it for the price, even if it takes 16 weeks, though I wish I'd known from the beginning how long it would take rather than being reassured it would just be another week or two for an additional 8 weeks.

Edit: did those who received their VCFs receive two files? I got a snp.vcf.gz and an indel.vcf.gz. I put them both into promethease for one report and got a large number of conflicts (>2k), so I'm confused as to what's in the files/what the difference is (I know the difference between a SNP and an indel, just not what info is contained in these files and how it's written). I don't know if the conflicts are within the SNP file or if it's conflicts between the two files. I haven't been able to open the .gz file to try to figure it out by just reading it as a text document. I'm really just waiting on getting the BAM file, but I was hoping to get some answers before then. (I have no bioinformatics background (other than one perl-based class in undergrad), but I have a biology/genetics background).

Donwulff
11-25-2017, 03:40 PM
Promethease doesn't really handle indels, so it will likely interpret the indel file in the wrong way. My WGS results didn't include indels, and I forget if you were waiting for WES. Indels would make more sense for WES as it has deeper read depth and the exome is more vulnerable to indels. I'm also hoping they improve their analysis pipeline; as I previously noted they're using old UCSC chrM reference, and for me there was just one file of SNP's called with high confidence. Actually I'm going to ask Promethease if they'll let me swap the Dante Labs VCF for a gVCF (Showing ref-calls, uncertain calls and correct MtDNA) from same sample (Once I'm done re-running the analysis), not a big deal but worth a try. They've got Black Friday & Cyber Monday campaign going on right now, by the way.

MacUalraig
11-25-2017, 04:20 PM
Got my data :) it took 16 weeks, but hopefully as the company grows they'll iron out any issues with timing/time estimates. It's definitely worth it for the price, even if it takes 16 weeks, though I wish I'd known from the beginning how long it would take rather than being reassured it would just be another week or two for an additional 8 weeks.

Edit: did those who received their VCFs receive two files? I got a snp.vcf.gz and an indel.vcf.gz. I put them both into promethease for one report and got a large number of conflicts (>2k), so I'm confused as to what's in the files/what the difference is (I know the difference between a SNP and an indel, just not what info is contained in these files and how it's written). I don't know if the conflicts are within the SNP file or if it's conflicts between the two files. I haven't been able to open the .gz file to try to figure it out by just reading it as a text document. I'm really just waiting on getting the BAM file, but I was hoping to get some answers before then. (I have no bioinformatics background (other than one perl-based class in undergrad), but I have a biology/genetics background).

What are you using to (try to) open the gz file? gzip for example?

Donwulff
11-25-2017, 05:22 PM
VCF files are large, even the default VCF files provided by Dante Labs. One option for processing them if you don't have a Unix/Linux shell available is Sequencing.com, which has a few free-to-use bioinformatics analyses and browsers. DNA.Land also lists some ideas and editors, along with their "Compass" web app, which actually works mainly through client-side JavaScript so your DNA data never leaves your computer: https://dna.land/vcf-info
In short, on Windows I personally always have 7zip installed, which can extract gzip files. However, reading and searching a VCF file on Windows can be a bit of a challenge. DNA.Land finally settled on the GLOGG editor. There are some specialized VCF browsers/gene viewers that let you browse the gzip file without decompressing it (bgzip to be exact, with an index to the contents), but they require individual instructions. On the free side I like Golden Helix GenomeBrowse: http://goldenhelix.com/products/GenomeBrowse/index.html In a way it's for BAM files, but it does show VCF files as well, though not in the tabular format you tend to get from variant browsers. Some web-based alternatives, both free and not, for VCF browsing and interpretation can be found at https://caelestic.com/variant-analysis-list
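
If you just want to peek inside the .gz from a command line (Linux/Mac, or something like Git Bash on Windows), a minimal sketch - the file name is a placeholder and the rsID is only an example:
gunzip -c snp.vcf.gz > snp.vcf           # decompress to a plain-text file (it will be large)
gunzip -c snp.vcf.gz | head -200         # or just peek at the header and first records
gunzip -c snp.vcf.gz | grep -w rs53576   # look up a single rsID without writing anything to disk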

Donwulff
11-27-2017, 12:12 AM
I forgot that DNA.Land requires the .tbi index file for the VCF; generating this on Windows/Mac/mobile could be more trouble than it's worth. Sequencing.com doesn't seem to generate the .tbi files either. Usually the "tabix" utility is used, which is most easily available on Linux, so one of the other methods might be preferable, or you could ask Dante Labs to make the .tbi files available along with the VCF. Remember to register with Promethease first if you use them, since registered users can re-generate the report every month and a half with the latest information entered into SNPedia.
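
For what it's worth, on a Linux machine the index is a one-liner once htslib's bgzip and tabix are installed - a minimal sketch, with the file name a placeholder; the re-compression step is only needed if the .gz is plain gzip rather than bgzip:
gunzip -c snp.vcf.gz | bgzip > snp.vcf.bgz   # re-compress with bgzip
tabix -p vcf snp.vcf.bgz                     # writes the snp.vcf.bgz.tbi index next to it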

MacUalraig
11-30-2017, 08:43 AM
Dante now seems to have launched a €149 array test for 'Health and Wellness' but no mention of ancestry?

https://www.dantelabs.com/collections/our-tests/products/consumer-dna-test-health-wellness

Standard price will be €199.

VCF with 500k SNPs reported plus a Health and Wellness report - but the link to the sample report is actually their WGS report from sequencing.com...

https://s3.amazonaws.com/dantelabswebsite/Dante+Lab+Sample+Report+-+Wellness%26Longevity+August+2017.pdf

karwiso
11-30-2017, 07:09 PM
Hi!

Yesterday I got my WGS results from Dante Labs. It was the 10th week after Dante Labs had received the sample.
The WGS comes with a Wellness and Longevity report, a Genome Overview and a VCF file.

WES vs WGS reports
They are almost identical - the majority of the traits and health risks are the same, but there are some differences.
Athletic performance - identical interpretations
Melanoma - identical interpretations
Arthritis - High Risk for Knee Arthritis (WES) vs Slightly Increased Risk for Knee Arthritis (WGS); the rest of the interpretations are identical.
Osteoporosis - identical interpretations
Malignant Hyperthermia - identical interpretations
Heart Attack - nearly identical interpretations with Moderate Life Time Risk of Heart Attack - 37% (WES) vs. 46% (WGS).
Medication assessment - nearly identical: Warfarin, Clopidogrel, Aspirin, Statins (Lipitor) - identical. One difference - Blood Clot Risk: Increased Risk Detected (WES) vs. Normal Risk (WGS). Due to this difference the section about blood clot risk is a bit larger in the WES report and includes some recommendations, about 3/4 of an A4 page.
Preventable Sudden Death - identical interpretations.

Genetic profile:
- medications - 13 identical interpretations
- cancer - 3 identical and 1 different interpretations (one type of cancer - High Risk (WES) vs Slightly Increased Risk (WGS))
- heart & blood vessels - 3 identical interpretations
- child development - 1 identical interpretation
- fertility - 2 identical interp.
- Digestive Tract & Liver - 3 identical interp.
- Blood - 3 identical and 1 different interpretations (Blood Clot Risk - Carrier (WES) vs No Increased Risk (WGS))
- hematology - 14 identical interpretations
- neurology - 1 different interpretation (Brain Aneurysm - No Increased Risk (WES) vs Increased Risk (WGS))
- pulmonology - 5 identical interpretations
- infectious disease - 4 identical interpretations
- hearing - 1 different interpretation (Hearing Loss - Lower Risk (WES) vs Increased Risk (WGS)).

If health is the main reason for testing, one should do the WES, since the coverage is higher and the interpretations should be more accurate. In my opinion WGS gives a highly reliable health profile, although one should not take the interpretations as doom. There are some differences between WES and WGS - some risks are identified as high in WES but only slightly increased in WGS, while Hearing Loss and Brain Aneurysm show no increased risk in WES but increased risk in WGS. So a grain of salt is necessary when reading the results.

Genealogy
(my main reason for testing)

A one-to-one comparison of my WES and WGS on GEDmatch.com Genesis gives 478.4 cM in common... About 45,300 SNPs are used for the comparison. My WES results don't show up in the list of my matches for the WGS kit...

Overlap with other kits at GEDmatch.com Genesis:

Vendor           WES                                       WGS
FGC              38,000-43,000                             447,000; 560,000; 1,430,000; 1,735,000; 1,751,000
Veritas          44,422                                    1,742,000
Genos            abt 21,300                                16,000-17,000
FTDNA            16,000-17,000 (62,000 - 1 kit)            117,000-255,000
Ancestry         15,000-16,000 (58,000, 62,000 - 2 kits)   216,000-247,000
23andMe          19,000-20,000 (62,000? - 2 kits)          105,000-332,000
Genes for Good   50,000-51,000                             111,000-113,000; 1,943,000 - 6 kits?
MyHeritage       16,300-16,500                             243,000-248,000
Gencove          50,000-62,000                             no kits

Well, my kits from MyHeritage, FTDNA, Ancestry and LivingDNA show up in the WGS matches with about 3560 cM in common. BUT... Donwulff, sorry for the disclosure, is matching me at 3558 cM as well. BUT... Donwulff is not on my match list in the "old" style GEDmatch (the first page goes down to 9.6 cM). Even if I do a one-to-one comparison with Donwulff in Genesis I get 3558 cM in common.

Well, I am satisfied with the service from Dante Labs. I have also contacted them and asked them to upload the FASTQ and BAM files to my Google Drive or OneDrive (where I have quite a lot of space through my job), but they answered that I will get a USB drive with the files free of charge. That is nice, but I think they will have to reconsider this model in the future.

WES is not very useful for genealogy - a little bit, if anything. WGS is very nice and the way to go, but there are a lot of challenges ahead, both for GEDmatch and other vendors. If one can afford WGS, it should be done for the oldest relatives. Otherwise any genealogical test from the big vendors will be good enough for now, with WGS as additional testing for the Y chromosome. Dante Labs offers quite an affordable WGS (and WES) testing service - we can probably expect some Christmas and other sales, so that we can test more relatives.

MacUalraig
11-30-2017, 07:32 PM
Hi!

Yesterday I got my WGS results from Dante Labs. It was the 10th week after Dante Labs had received the sample.
The WGS comes with a Wellness and Longevity report, a Genome Overview and a VCF file.

WES vs WGS reports
They are almost identical - the majority of the traits and health risks are the same, but there are some differences.
Athletic performance - identical interpretations
Melanoma - identical interpretations
Arthritis - High Risk for Knee Arthritis (WES) vs Slightly Increased Risk for Knee Arthritis (WGS); the rest of the interpretations are identical.
Osteoporosis - identical interpretations
Malignant Hyperthermia - identical interpretations
Heart Attack - nearly identical interpretations with Moderate Life Time Risk of Heart Attack - 37% (WES) vs. 46% (WGS).
Medication assessment - nearly identical: Warfarin, Clopidogrel, Aspirin, Statins (Lipitor) - identical. One difference - Blood Clot Risk: Increased Risk Detected (WES) vs. Normal Risk (WGS). Due to this difference the section about blood clot risk is a bit larger in the WES report and includes some recommendations, about 3/4 of an A4 page.
Preventable Sudden Death - identical interpretations.

Genetic profile:
- medications - 13 identical interpretations
- cancer - 3 identical and 1 different interpretations (one type of cancer - High Risk (WES) vs Slightly Increased Risk (WGS))
- heart & blood vessels - 3 identical interpretations
- child development - 1 identical interpretation
- fertility - 2 identical interp.
- Digestive Tract & Liver - 3 identical interp.
- Blood - 3 identical and 1 different interpretations (Blood Clot Risk - Carrier (WES) vs No Increased Risk (WGS))
- hematology - 14 identical interpretations
- neurology - 1 different interpretation (Brain Aneurysm - No Increased Risk (WES) vs Increased Risk (WGS))
- pulmonology - 5 identical interpretations
- infectious disease - 4 identical interpretations
- hearing - 1 different interpretation (Hearing Loss - Lower Risk (WES) vs Increased Risk (WGS)).

If health is the main reason for testing, one should do the WES, since the coverage is higher and the interpretations should be more accurate. In my opinion WGS gives a highly reliable health profile, although one should not take the interpretations as doom. There are some differences between WES and WGS - some risks are identified as high in WES but only slightly increased in WGS, while Hearing Loss and Brain Aneurysm show no increased risk in WES but increased risk in WGS. So a grain of salt is necessary when reading the results.

Genealogy
(my main reason for testing)

A one-to-one comparison of my WES and WGS on GEDmatch.com Genesis gives 478.4 cM in common... About 45,300 SNPs are used for the comparison. My WES results don't show up in the list of my matches for the WGS kit...

Overlap with other kits at GEDmatch.com Genesis:

Vendor           WES                                       WGS
FGC              38,000-43,000                             447,000; 560,000; 1,430,000; 1,735,000; 1,751,000
Veritas          44,422                                    1,742,000
Genos            abt 21,300                                16,000-17,000
FTDNA            16,000-17,000 (62,000 - 1 kit)            117,000-255,000
Ancestry         15,000-16,000 (58,000, 62,000 - 2 kits)   216,000-247,000
23andMe          19,000-20,000 (62,000? - 2 kits)          105,000-332,000
Genes for Good   50,000-51,000                             111,000-113,000; 1,943,000 - 6 kits?
MyHeritage       16,300-16,500                             243,000-248,000
Gencove          50,000-62,000                             no kits

Well, my kits from MyHeritage, FTDNA, Ancestry and LivingDNA show up in the WGS matches with about 3560 cM in common. BUT... Donwulff, sorry for the disclosure, is matching me at 3558 cM as well. BUT... Donwulff is not on my match list in the "old" style GEDmatch (the first page goes down to 9.6 cM). Even if I do a one-to-one comparison with Donwulff in Genesis I get 3558 cM in common.

Well, I am satisfied with the service from Dante Labs. I have also contacted them and asked them to upload the FASTQ and BAM files to my Google Drive or OneDrive (where I have quite a lot of space through my job), but they answered that I will get a USB drive with the files free of charge. That is nice, but I think they will have to reconsider this model in the future.

WES is not very useful for genealogy - a little bit, if anything. WGS is very nice and the way to go, but there are a lot of challenges ahead, both for GEDmatch and other vendors. If one can afford WGS, it should be done for the oldest relatives. Otherwise any genealogical test from the big vendors will be good enough for now, with WGS as additional testing for the Y chromosome. Dante Labs offers quite an affordable WGS (and WES) testing service - we can probably expect some Christmas and other sales, so that we can test more relatives.

If you do the WGS you will surely want to upload enough data to get a full overlap with one of the big firms - I think YSEQ were doing this for customers at one stage (23andMe equivalent calls) although they seem to have removed the mention of it from their website.

MacUalraig
11-30-2017, 07:41 PM
Ha, now you mention it I can see a Dante Labs kit in my match list too. But it's a false positive :-(

I also now have a YSEQ WGS match which is a confirmed one.

RobertN
11-30-2017, 09:04 PM
Hello, I have some auto-immune diseases and I want to invest in genetic testing. Until now I was planning to do 23andMe and upload the results to Promethease for a health report, but I learned about the Dante Labs WES and WGS and now I'm a bit more confused about what to do. I even saw that Dante Labs added their own "budget" Health and Wellness test for €200 (€150 now).

So for someone who is a bit more than the average user and very interested in the health readings, which test would you suggest I take? A simple test (like 23andMe or the Dante Labs €200 test), WES or WGS? Which of these would give me the most reliable info?

Thank you!

MacUalraig
12-01-2017, 07:48 AM
Hello, I have some auto-immune diseases and I want to invest in genetic testing. Until now I was planning to do 23andMe and upload the results to Promethease for a health report, but I learned about the Dante Labs WES and WGS and now I'm a bit more confused about what to do. I even saw that Dante Labs added their own "budget" Health and Wellness test for €200 (€150 now).

So for someone who is a bit more than the average user and very interested in the health readings, which test would you suggest I take? A simple test (like 23andMe or the Dante Labs €200 test), WES or WGS? Which of these would give me the most reliable info?

Thank you!

If your interests are purely health and you don't care about ancestry, go for the WES (exome).

Donwulff
12-02-2017, 06:09 AM
I need to keep better notes as I don't remember if my Dante Labs WGS sample was processed in any way here. I think this one wasn't, just uploaded straight from Dante Labs site.
[attachment 20157: GEDmatch Genesis match-list screenshot]
The yellowed-out rows are my other microarray kits, which are supposed to match. What you can see is that even though I have WGS, the microarray kits don't match for all SNPs, and a lot of NGS kits are false positives against the WGS.

So let's see what happened when I imputed the Dante Labs-provided VCF for the WGS, filling in the blocks of variants called as reference, which Dante Labs doesn't include:
[attachment 20158: GEDmatch Genesis match-list screenshot after imputation]

Now I'm only closely matching my own kits, with the overlap showing whole kit sizes, though there's a Genos Research exome which matches distantly.
Oddly enough the imputed kit isn't a perfect match against the raw Dante Labs VCF, even though imputation shouldn't touch the known variants. I haven't had time to look into what's up with that.

Basically, though, as I've noted before, when you upload an NGS kit on GEDMatch Genesis it will *only include variants listed in that kit*. In retrospect this makes sense, because for example an exome kit will not have tested most locations on the genome, and you can't just assume all those locations match the reference. Consequently, all whole genome sequences will be very close matches, since every variant that is listed in both kits is going to match! The solution would be to use gVCF files, which list reference calls as well. I recall having tried that on GEDMatch Genesis, but I don't see the kit there right now, so there must have been some problem; in fact I think the gVCF files were too large for GEDMatch Genesis to currently process. Imputation could improve the usability of exomes for matching, but it would also introduce a lot of matching errors; and once again I'm not sure how GEDMatch Genesis deals with these resource-wise - a gVCF or imputed file is going to require a lot more resources.
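
For reference, a gVCF with those reference blocks can be produced from the BAM with GATK's HaplotypeCaller in GVCF mode - a minimal sketch, assuming GATK4 and an indexed copy of the reference the BAM was aligned to; file names are placeholders, and this is not necessarily how Sequencing.com or Dante Labs produce theirs:
# Emit reference-confidence blocks (gVCF) rather than just variant sites.
# hs37d5.fa stands in for whatever indexed reference the BAM was aligned to.
gatk HaplotypeCaller -R hs37d5.fa -I sample.bam -O sample.g.vcf.gz -ERC GVCF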

jenssonjoi
12-07-2017, 04:53 PM
Karwiso

That is an amazing and interesting comparison you did of your genetic matches. Unfortunately I am still in the same situation as before: I am still waiting for my results from Dante Labs. The latest scheduled delivery is 22 December 2017. It's a long time since I shipped my sample off to them on 6 September 2017.

I am hoping to give feedback on my results when they arrive.

karwiso
12-07-2017, 05:57 PM
Thank you, jenssonjoi!

I think my WGS sample was received by Dante Labs on 21 or 22 September 2017, so I got the results in the 10th week.
I have also ordered WES for my uncle - I was in Finland around Oct 20 and Dante Labs received the sample Oct 25. I got his WES results Dec 4, but I'm still waiting for his WGS.

It is really a pity that your results are delayed! Have you received any report about the DNA extraction? All of my submitted samples got reports of DNA extraction with A quality. What quality are your samples?


Karwiso

That is an amazing and interesting comparison you did of your genetic matches. Unfortunately I am still in the same situation as before: I am still waiting for my results from Dante Labs. The latest scheduled delivery is 22 December 2017. It's a long time since I shipped my sample off to them on 6 September 2017.

I am hoping to give feedback on my results when they arrive.

jenssonjoi
12-08-2017, 07:40 PM
The answer I got was that my sample is under bioinformatics analysis, but Dante Labs said they had quality issues after their quality control picked up some missing variants, so they started the process again. This is the only feedback I have received so far.

My sister handed in her sample 5-6 days after me but she has not heard anything from them, and the same applies to my wife's sample - I cannot see them getting their WGS results before the end of this year.

This delay is very frustrating and the loser is Dante Labs, as some other family members of mine are holding back on their tests until all three WGS results have been delivered.

Geldius
12-09-2017, 04:29 PM
Hi,

I am also in the process of a WGS test with Dante. It is the 4th week now.

I received an email update 3 weeks after my sample was delivered,
saying that the DNA extraction was completed with level A.

starsoccer
12-09-2017, 06:05 PM
Karwiso

That is an amazing and interesting comparison you did of your genetic matches. Unfortunately I am still in the same situation as before: I am still waiting for my results from Dante Labs. The latest scheduled delivery is 22 December 2017. It's a long time since I shipped my sample off to them on 6 September 2017.

I am hoping to give feedback on my results when they arrive.

Sorry to hear that - I am in the same situation. Mine was received in the first week of September, and I am still waiting for the results. I would suggest you complain to them via email; I did so and got a nice discount. I was told I would get my results before the end of the year.

Donwulff
12-10-2017, 06:03 AM
Sad to see so many people's results are delayed; as said, that also hurts the company's reputation, and I can imagine people are edgy with a new company that has yet to prove itself. Still, it seems they've probably delivered a fair number of results already, as there's been talk of many orders, some results have been seen, and I haven't really seen complaints.

In any case, YFull just finished the analysis of my Dante Labs WGS. I was a bit concerned about that, too, because having looked at the raw data myself, I couldn't identify the read names or sequencing adapters used in the run (and that's still a problem, because I want to trim them for some of my own analysis; could this be NovaSeq?). In any case, as I checked earlier, the WGS analysis confirmed every variant in my Big Y file - with the odd exception of TWO new BigY variants that seemed to appear out of nowhere just recently and are tagged "new" in my old BigY report. Where did those come from? They CAN be found in my WGS files, both hg19 and GRCh38 re-mapped, but aren't listed in the YFull analysis. Maybe their work to validate GRCh38 caused them to re-classify some filtered ranges as valid, but those ranges weren't yet applied to my WGS results. The BigY "Low Qual" SNP is similarly missing from the WGS results, though I can find it in my own data (see earlier post). Also, it's a shame it wasn't possible to do this analysis with GRCh38 at YFull yet; I ended up just paying for it now to get to see the results.

My novel SNP stats are now:
BigY: Best qual: 16 (61.54%) [7 (26.92%) - best; 9 (34.62%) - acceptable] Ambiguous qual: 9 (34.62%) (not on WGS) Low qual: 1 (3.85%) (not on WGS)
YFull: Best qual: 19 (82.61%) [18 (78.26%) - best; 1 (4.35%) - acceptable] One read!: 1 (4.35%) (Plus 3 special cases which aren't listed on YFull, but can be seen on raw data)

It's hard to present these calculations in a simple way due to so many categories, but I guess the WGS found 6 more best-quality SNPs than BigY, and in addition three SNPs have differing interpretations between the analyses. I'd say that seems like a pretty good haul, though of course those SNPs probably can't be matched against people who have tested on BigY.

jenssonjoi
12-11-2017, 12:40 AM
Hi,

I am also in the process of a WGS test with Dante. It is the 4th week now.

I received an email update 3 weeks after my sample was delivered,
saying that the DNA extraction was completed with level A.

I wish I could see the pattern that Dante Labs uses to process their WGS orders. For example you, Geldius, are getting your results in record time and I am happy for you. Dante Labs have moved the target date for my results a few times. My wife and my sister have not heard anything from Dante Labs either.

jenssonjoi
12-11-2017, 12:48 AM
Sad to see so many people's results are delayed; as said, that also hurts the company's reputation, and I can imagine people are edgy with a new company that has yet to prove itself. Still, it seems they've probably delivered a fair number of results already, as there's been talk of many orders, some results have been seen, and I haven't really seen complaints.

In any case, YFull just finished the analysis of my Dante Labs WGS. I was a bit concerned about that, too, because having looked at the raw data myself, I couldn't identify the read names or sequencing adapters used in the run (and that's still a problem, because I want to trim them for some of my own analysis; could this be NovaSeq?). In any case, as I checked earlier, the WGS analysis confirmed every variant in my Big Y file - with the odd exception of TWO new BigY variants that seemed to appear out of nowhere just recently and are tagged "new" in my old BigY report. Where did those come from? They CAN be found in my WGS files, both hg19 and GRCh38 re-mapped, but aren't listed in the YFull analysis. Maybe their work to validate GRCh38 caused them to re-classify some filtered ranges as valid, but those ranges weren't yet applied to my WGS results. The BigY "Low Qual" SNP is similarly missing from the WGS results, though I can find it in my own data (see earlier post). Also, it's a shame it wasn't possible to do this analysis with GRCh38 at YFull yet; I ended up just paying for it now to get to see the results.

My novel SNP stats are now:
BigY: Best qual: 16 (61.54%) [7 (26.92%) - best; 9 (34.62%) - acceptable] Ambiguous qual: 9 (34.62%) (not on WGS) Low qual: 1 (3.85%) (not on WGS)
YFull: Best qual: 19 (82.61%) [18 (78.26%) - best; 1 (4.35%) - acceptable] One read!: 1 (4.35%) (Plus 3 special cases which aren't listed on YFull, but can be seen on raw data)

It's hard to present these calculations in a simple way due to so many categories, but I guess the WGS found 6 more best-quality SNPs than BigY, and in addition three SNPs have differing interpretations between the analyses. I'd say that seems like a pretty good haul, though of course those SNPs probably can't be matched against people who have tested on BigY.

I agree with you Donwulff - I have never seen a complaint about any of the test results from Dante Labs. It appears that their quality is first class, but a few of us are getting on edge and starting to think we might never see any results. The weeks are ticking by one by one. Deadlines are set and broken.

I wish Dante Labs all the best and hope they will get their ship in order before it sinks.

jenssonjoi
12-18-2017, 03:39 PM
Things have moved on in the last few days. Dante Labs have delivered full results for my wife's WGS and they provided answers to a few suspected issues. Although these are bad issues, we prefer to know about these DNA faults rather than not know and always expect the worst without ever being sure. I intend to get further analysis of my wife's DNA results, so any advice is welcome, wherever it may come from.

As for my WGS results I have noticed in my Dante Labs account that my test status has been marked: Success DNA - A, so I take it that I am nearly there and it should not be long until I get my full results.

Petr
12-18-2017, 04:10 PM
As for my WGS results I have noticed in my Dante Labs account that my test status has been marked: Success DNA - A, so I take it that I am nearly there and it should not be long until I get my full results.

It is still a long way to go: the status "Success DNA - A" only means your sample is perfectly qualified (the saliva quality is high).

jenssonjoi
12-18-2017, 06:27 PM
It is still a long way to go:

Ohhh Noooo! Just starting then - the only result since 6 September 2017 is that 'my saliva quality is high'. It will be well into next year before I get my results.

Petr
12-24-2017, 08:23 AM
A month ago they wrote to me:
"Yes, we take 50 business days to carry out the analysis." In my case 50 business days (2.5 months) have just passed and there are no results yet.

starsoccer
12-30-2017, 10:26 PM
I am glad to say I finally got my results. Took nearly 5 months from when I sent my sample though. I also requested my data in BAM format which was quickly sent on a USB flash drive.

Does anyone have any tools/services they would suggest? I was not very impressed with the information provided in their PDF report. It more or less seems to come straight from a bunch of apps on Sequencing.com, which don't provide much information in general.

karwiso
12-31-2017, 10:56 AM
Hi Starsoccer!

I would suggest that you use sequencing.com and upload your BAM with their Big Yotta. Then you can use Genomic VCF to get all calls for your results. It is free. Then I would use Promethease to get more detailed health reports. (I am currently uploading my BAM and it really takes time unless you have a very fast Internet connection.)

starsoccer
12-31-2017, 05:44 PM
I did exactly that already. I am actually having some issues with sequencing.com. My BAM was uploaded and I used a few of the free apps with no issue, but the Genomic VCF tool took 3 days and just this morning I got an update saying it was complete, but no data or results were present so I re-ran it. I will see what happens though.

karwiso
01-10-2018, 03:34 PM
Hi!

I contacted Dante Labs and they sent me two USB-drives with FASTQ and BAM files. The BAM is aligned to GRCh37.
I have submitted the results to YFull.com to get my Y-SNPs.

I also uploaded the BAM (120 GB) to sequencing.com and it took a day or two until it showed up in the list of my files. The Genome VCF app uses BAM files and it produced a gVCF file in a day or two.
I tried to upload the gVCF to GEDmatch Genesis, but the file was not accepted because it was too large (1.1 GB). So, bad news - we cannot use gVCF with GEDmatch. I hope that they will develop a better matching algorithm and work with "usual" VCFs.

Donwulff
01-10-2018, 07:45 PM
I need to keep better track of my "experiments". I think the genomic VCF failed for me at GEDMatch Genesis, but the last time I tried, they had also stopped accepting VCF files direct from Sequencing.com, because the sharing URLs don't end in "vcf". I managed to use something like "&name=sample1.vcf.gz" (which Sequencing.com ignores, but the service may interpret as the file type) at the end originally, but GEDmatch stopped accepting that. I'm wondering what I was sending if the genomic VCF was too large, though...

One solution to this would be to filter the VCF to calls at the "known" sites in dbSNP. This has the added benefit of filtering out novel SNP calls which might be errors; rare SNPs which have emerged during the last few generations have little significance for relative matching, because you already share so many other SNPs, and they could have emerged after the MRCA anyway. Unfortunately I don't think Sequencing.com can do that filtering, and GEDMatch Genesis doesn't yet.
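
If someone wants to try that filtering themselves, bcftools can restrict a VCF to the positions present in a dbSNP release - a minimal sketch, assuming the input VCF is bgzipped and tabix-indexed; both file names are placeholders:
# Keep only records at positions listed in the dbSNP VCF, write bgzipped output and index it.
bcftools view -R dbsnp_GRCh37.vcf.gz sample.snp.vcf.gz -Oz -o sample.known.vcf.gz
tabix -p vcf sample.known.vcf.gz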

Another option worth considering is still imputation, because it produces calls only for locations included in the reference panel. It would also fill in uncertain and uncalled locations in the genome; I'm not sure if there's any data on whether it's better to leave them uncalled or fill them in with imputation, though. It depends a lot on the matching algorithm.

Currently I'm testing my Dante Labs sequence, imputed with the Sanger Imputation Service, on GEDMatch Genesis, and that significantly reduces the false positives, though it's significantly less straightforward than would be preferable for consumer genetics. Also, for lack of notes, I think I may have included my Genes for Good results in the imputation, which is great for data quality, but then you're pretty much just doing microarray vs. microarray comparisons, since few people have whole genomes yet.

Edit: I don't use 23andMe for imputation because some of 23andMe's custom probes have the reverse orientation from what's expected, and I STILL haven't got around to figuring out which... Also, it's .vcf.gz as it's compressed.

Donwulff
01-10-2018, 07:59 PM
On the other hand Promethease.com still has free interpretation (and if you register & allow them to use your de-identified data, free report updates) until 15th January, so be sure to take advantage of that before then if you've got your data and aren't prone to worrying and hypochondria. Read their disclaimers carefully though; I hate it when people take the Promethease interpretation as absolute truth or call it superior to clinically validated tests. The genomic VCF *does* work with them, although there are some caveats around low sequencing depth/quality calls. If you have microarray tests and relatives' results (with consent!), all the better for seeing potential conflicting calls.

JamesKane
01-12-2018, 12:07 PM
If one (or more) of you who have received your Dante Labs WGS BAM back would consider sharing the file, I'd like to get the Y chromosome coverage stats added to: http://www.haplogroup-r.org/stats.html In return I'll send back a gVCF aligned to hg38 and a callable-status report in BED format. (This may take up to two weeks.)

The BAM needs to be at a URL that can be retrieved via an HTTP or FTP pull, since my host cannot currently accommodate an upload that large.

Use the BAM analysis submission form: http://www.haplogroup-r.org/submit.html It has a link to the Data Use policy the site abides by.

Donwulff
01-19-2018, 01:18 PM
If one (or more) of you who have received your Dante Labs WGS BAM back would consider sharing the file, I'd like to get the Y chromosome coverage stats added to: http://www.haplogroup-r.org/stats.html In return I'll send back a gVCF aligned to hg38 and a callable-status report in BED format. (This may take up to two weeks.)

The BAM needs to be at a URL that can be retrieved via an HTTP or FTP pull, since my host cannot currently accommodate an upload that large.

Use the BAM analysis submission form: http://www.haplogroup-r.org/submit.html It has a link to the Data Use policy the site abides by.

Hey, is this for haplogroup R only, as might be inferred from the site, or can all haplogroups send in? Also, I've already done the hg38 mapping myself; however, I've had trouble identifying the adapter sequences for the purposes of de novo alignment. I used READ_NAME_REGEX="CL1000XXXXXL2C([0-9]+)R([0-9]+)_([0-9]+)" (X'd out the sample ID, wrote it in for performance) as the optical-duplicate read-name template, which doesn't conform to normal bcl2fastq naming either. Could you figure out the adapter sequence - are those sequences barcoded or degraded?

JamesKane
01-21-2018, 11:26 AM
The primary focus of that site is indeed Haplogroup R. The NGS statistics page is more about the tests being used, though, and includes the out-group samples picked up from publicly released WGS results.

As to your question on adapter sequences, I believe it is standard practice on Illumina instruments to trim them before delivering the final data. Having not played with any of the de novo alignment tools, there's not much advice I can offer on workflow here. Check the usual places for people who deal with these types of questions daily. These are the ones I hit most frequently:

https://www.researchgate.net
https://www.biostars.org

Donwulff
01-21-2018, 07:40 PM
Thanks! I checked those places already; unfortunately the problem is not well understood or answered on those sites. In addition, as suggested, there seems to be something weird going on with the Dante Labs raw data (not the quality/quantity, mind you, just that it isn't fully standard). For those not familiar with sequencing technology and terms: the adapters are DNA sequences which are ligated to the DNA fragments ("library inserts") to be sequenced; they are used to attach the fragments to the sequencer. When the insert length is shorter than the read length, here 100 bases, adapter read-through occurs and the sequencer reads part of the adapter instead of the actual sequence.

When the sequencing reads are mapped against a reference genome, e.g. with BWA MEM, this poses little problem, as BWA MEM will "soft-clip" the read ends that do not match the reference and will not use them to determine variants. This is also probably the safest way to try to figure out what an unknown adapter sequence might be: just tally up the soft-clipped sequences. Unfortunately, what I find doesn't appear to match Illumina's list of standard adapter sequences (which is quite big, and in addition a lab could develop its own adapters for reasons unknown), and there also seems to be a large variety in the potential adapter sequences. I used Trim Galore!/cutadapt before realizing that they're not standard Illumina adapters, so that doesn't work.

Another thing of note is that since the DNA insert fragment is read from both ends (paired-end sequencing), when adapter read-through happens the reads end up as mirrors of each other. A few of the more recent adapter-trimming programs, chiefly Trimmomatic, use this to identify the adapters. Unfortunately, even Trimmomatic expects the adapter sequence to resemble one of Illumina's standard adapters. I think I'm going to give AlienTrimmer, which I just heard of, a try, and/or write a script to programmatically tally the soft-clipped sequences from the BAM file and try to find some kind of pattern in that. Of course, if even you don't know the adapter sequence, I guess I might as well see if Dante Labs can reveal it. One thought is that they could be using NovaSeq, and I haven't found technical details on Illumina's NovaSeq platform yet. Or could they be saving costs by using self-made adapters?
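
In case it saves someone a step, here is a rough way to eyeball the clipped tails straight from the BAM before writing a proper script - a minimal sketch, assuming samtools; it only looks at reads whose CIGAR ends in a soft clip, and the region is just to keep the run short (drop it, or change the name, depending on the BAM):
# Print the soft-clipped 3' tail of each read whose CIGAR ends in <n>S, then count the most common tails.
samtools view sample.bam chr21 |
  awk '$6 ~ /[0-9]+S$/ { n = $6; sub(/^.*[MIDNPHX=]/, "", n); sub(/S$/, "", n);
                         print substr($10, length($10) - n + 1) }' |
  sort | uniq -c | sort -rn | head -20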

JamesKane
01-26-2018, 12:52 PM
Here are the raw callable loci stats for the Y chromosome on hg38 for the first Dante Labs 30x WGS I received:

state nBases
CALLABLE 14,278,386
NO_COVERAGE 21,194
LOW_COVERAGE 308,597
POOR_MAPPING_QUALITY 9,065,418

The CALLABLE metric indicates regions with 4 or more reads where at least 90% of the reads have a PHRED-scaled read alignment quality of at least 10. The median value for a 30x WGS test with 150 base pair reads is 14,922,530. Where the two differ is in the regions assigned POOR_MAPPING_QUALITY or LOW_COVERAGE. I won't make too many more comments on those differences without seeing a few more samples to get an average.
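
For anyone who wants a rough sanity check of their own BAM against that idea without running the full walker, a depth-only approximation - a minimal sketch, assuming samtools and an indexed BAM; it counts Y positions with at least 4 reads of mapping quality 10 or more, which is close to, but not the same as, the CALLABLE definition above:
# The chromosome may be named "Y" or "chrY" depending on the reference build.
samtools depth -r chrY -Q 10 sample.bam | awk '$3 >= 4' | wc -l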

At $700 this test is the bang-for-the-buck leader for chrY variant discovery at the moment. I'm not sure how well it performs for Y STR purposes; intuitively it won't be quite as good as the 150 base pair tests, although my Y Elite, which was also 100 base pair, has a better STR extraction rate than any 150 base pair Big Y I've seen at YFull. This should be similar.

Donwulff
01-28-2018, 07:45 PM
YFull stats, both hg19
Big Y 10/14/2015 - up to 160bp PE
STRs (all): 587
Reliable alleles: 494 (84.16%)
Uncertain alleles: 19 (3.24%)
N/A: 74 (12.61%)
Out of Y111: 100 reliable, 1 uncertain, 10 N/A

Dante Labs, batched 18/07/2017
STRs (all): 587
Reliable alleles: 378 (64.40%)
Uncertain alleles: 35 (5.96%)
N/A: 174 (29.64%)
Out of Y111: 81 reliable, 7 uncertain, 23 N/A

This is just one sample, of course. For STRs, with the much shorter read length it's no huge surprise that this isn't a good deal for the STRs alone. However, when comparing against other NGS tests on YFull, remember you still have >300 of the shorter STRs available. Also, having upgraded from Y-37 to Y-111, I can confirm that all listed STRs were called correctly, even the uncertain one. Some people still swear by FTDNA's STR tests, but I believe NGS tests are where it's at, and with more competition (Dante Labs unfortunately only in Europe) things are looking good. Since Dante Labs is WGS, it's not bound to any specific reference build, and could also pick up haplogroup-specific structural variation if that ever becomes a thing.

karwiso
01-29-2018, 09:01 PM
And here are the statistics for my sample from Dante Labs (analysed at YFull):

STRs (all): 587
Reliable alleles: 504 (85.86%)
Uncertain alleles: 25 (4.26%)
N/A: 58 (9.88%)

Petr
01-29-2018, 09:24 PM
I just received my Dante results. I was told that the raw data will be sent on a flash drive by the end of February.

So I compared the 60 (mostly) private mutations obtained from my FGC Y Elite 1.0 test, and the VCF files contain 44 of the 60 known mutations. This is not very good, but maybe the YFull analysis will discover more - I'll have to wait about 2 months for that, though.

The average number of reads is 14; probably the Y chromosome is not covered as well by this test if the total average should be 30 reads.

For completeness, here is the list of my SNPs and the data from the VCF:

SNP hg19 hg38 anc der reads
FGC20774 8356636 8488595 T C 14C
FGC20775 2842598 2974557 T C missing
FGC20776 3388904 3520863 A T 8T
FGC20777 6707665 6839624 A G missing
FGC20778 6804217 6936176 T G 26G
FGC20779 7315755 7447714 T G 14G
FGC20780 7523890 7655849 C A 10A
FGC20781 7611397 7743356 C G 22G
FGC20782 8854928 8986887 C T 26T
FGC20783 13225084 11069408 C T 16T
FGC20784 13809744 11689038 T G missing
FGC20785 13840799 11720093 G A 6A
FGC20786 14079163 11958457 T G 12G
FGC20787 14083779 11963073 G A 9A
FGC20788 14223146 12102440 A G 19G
FGC20789 15429333 13317453 G A missing
FGC20790 16697404 14585524 C T 16T
FGC20791 17239521 15127641 C T 10T
FGC20792 17240996 15129116 T C 13C
FGC20793 17362238 15250358 C T missing
FGC20794 17579706 15467826 G T 12T
FGC20795 17630454 15518574 C A 14A
FGC20796 17760314 15648434 T C missing
FGC20797 18068752 15956872 G A 16A
FGC20798 18247663 16135783 G A 15A
FGC20799 18880452 16768572 A G 22G
FGC20800 19074023 16962143 G A missing
FGC20801 19094079 16982199 G A 8A
FGC20802 19466943 17355063 T G 15G
FGC20803 20813181 18651295 G T 17T
FGC20804 21092042 18930156 G T 13T
FGC20805 21344128 19182242 T A missing
FGC20806 21468981 19307095 T C missing
FGC20807 22173652 20011766 A G 14G
FGC20808 22565789 20403903 G A 14A
FGC20809 22680941 20519055 G A 15A
FGC20810 23558775 21396889 A T 20T
FGC20811 24356640 22210493 T G missing
FGC20812 6762871 6894830 CAAT C 14C
FGC20813 16669603 14557722 GA G 20G
Z41150 21527519 19365633 T TA 3T 18TA
FGC31474 23883529 21721643 C T missing
FGC31475 13218678 11063002 C T 12T
FGC31476 3224887 3356846 A G 6G
FGC31477 4385730 4517689 C T 29T
FGC31478 5024010 5155969 G C missing
FGC31479 5739310 5871269 T C 9C
FGC31480 5762800 5894759 A G 15G
FGC31481 6581432 6581432 T C missing
FGC31482 6476811 6608770 T C 12C
FGC31483 6507549 6639508 G A 12A
FGC31484 7131444 7131444 A G missing
FGC31485 9076129 9238520 G A 11A
FGC31486 14391678 12270974 C G 9G
FGC31487 15773565 13661685 C A 12A
FGC31488 20454018 18292132 T C missing
FGC31489 21552883 19390997 C T 17T
FGC31490 23821665 21659779 G A 17A
FGC31491 28621381 26475234 G A missing
A9522 21362958 19201072 A G 5G
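
A quick way to do this kind of position check yourself, rather than eyeballing the whole VCF - a minimal sketch, assuming the VCF has been bgzip-compressed and tabix-indexed; the position is just FGC20774 from the list above, and the chromosome may be named "Y" or "chrY" depending on the build:
tabix snp.vcf.gz Y:8356636-8356636   # prints any record at that hg19 position, nothing if it's missing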

Donwulff
01-31-2018, 02:35 PM
The way the read depth is expressed is liable to lead to confusion. Remember we have chromosome *pairs*, one copy from each parent - except for the sex chromosomes, where we get Y from the father (if male) and X from the mother. The read depth is quoted for the chromosome pair, so the expected read depth of a 30X WGS for the Y chromosome alone is about 15X. (Of course, this also means the actual read depth usable for determining SNPs etc. on each chromosome of an autosomal pair is on average 15X too, so technically it should be called a 15X sequence, but that's just not the convention.) There are also a bunch of other factors that go into calculating coverage, depending on which regions you include in the average. I posted my coverage for the individual SNPs earlier in this thread, and for me it came closer to 20 or so.
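
If you want to check this on your own BAM, the chrY depth is easy to compute - a minimal sketch, assuming samtools and an indexed BAM; the file name is a placeholder and the chromosome may be "Y" rather than "chrY":
# Mean depth over every chrY position, including zero-coverage bases (-a).
samtools depth -a -r chrY sample.bam | awk '{ sum += $3 } END { print sum / NR }'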

Donwulff
01-31-2018, 02:39 PM
And here are the statistics for my sample from Dante Labs (analysed at YFull):

STRs (all): 587
Reliable alleles: 504 (85.86%)
Uncertain alleles: 25 (4.26%)
N/A: 58 (9.88%)

Interesting, I wonder how you got so much higher stats, or I got so much lower. To be honest, I'd expect the lower STR count from the 100bp read length; I'm sure many of the STRs are shorter than that. Has the read length changed, or is that 100bp as well?

karwiso
01-31-2018, 09:20 PM
Interesting, I wonder how you got so much higher stats, or I got so much lower. To be honest, I'd expect the lower STR count from the 100bp read length; I'm sure many of the STRs are shorter than that. Has the read length changed, or is that 100bp as well?

No, I have not seen any change in the specifications.

Here is some info that accompanied FASTQ files for my sample:
Raw reads 1329511702
Raw bases (Mb) 132951.17
Clean reads 1318214110
Clean bases (Mb) 131821.41
Clean data rate (%) 99.15
Clean read1 Q20 (%) 97.43
Clean read2 Q20 (%) 90.66
Clean read1 Q30 (%) 89.41
Clean read2 Q30 (%) 77.33
GC content (%) 41.50

And the statistics for BAM:

Clean reads 1318214110
Clean bases (Mb) 131821.41
Mapping rate (%) 88.39
Unique rate (%) 93.14
Duplicate rate (%) 2.26
Mismatch rate (%) 0.86
Average sequencing depth (X) 37.90
Coverage (%) 99.86
Coverage at least 4X (%) 99.50
Coverage at least 10X (%) 98.58
Coverage at least 20X (%) 94.28

These are images supplied together with the BAM:
[attachments 21123, 21124, 21125: images supplied together with the BAM]

One more note: my FASTQ files were realigned to GRCh38 and then went to YFull for analysis.

dbm
02-01-2018, 03:18 AM
Hello Everyone:

Ordered my WGS - it should not be too much longer before I get results. I am curious if anyone has:

1. Run samtools flagstat on their BAM file and would be willing to share the results.
2. Generated a coverage map of either all chromosomes or a single chromosome.
3. If anyone has personally re-mapped the fastq to an alternate reference genome - or even the same reference genome. If yes, what was your pipeline to get to VCF.
3a. For VCFs generated from alternate reference genomes, has anyone compared the SNPs in that one with the ones provided by Dante's VCF?
4. Any general experience/comments with using tools like samtools, bwa-mem, picard, gatk, etc., on Dante's files?
5. Has anyone used cloud-based resources like AWS, Azure, etc., to handle the alignment and other tasks that take up a massive amount of memory?
6. Any comments on the SNPs or other abnormalities identified by Dante? Are they primarily simple SNPs, or does their analysis look for more complex interactions?
7. Does anyone have any insight on the pipelines used by Dante to create their BAM and VCF files?
8. Anyone know if they have a specific sequencer they use? i thought in the fastq sequence identifier line there is supposed to be some code that shows, but i wasn't sure.

Not asking or accepting trade secrets here - more looking for feedback on the quality/robustness and how to verify accuracy with alternate processing pipelines.

Sorry, i know that is a lot of questions. i don't want to be a taker in all this. i was planning on writing some code that visualizes some of the quality metrics both across whole chromosomes and perhaps also letting you narrow down to a specific location also. If there is any interest, would be happy to share. Anyone coding basic python/pandas/matplotlib should be able to use.

Thanks very much!

boaziz
02-01-2018, 07:32 AM
Can you please help me find software for reading the VCF file I have from my Dante WGS 30x results?
It is very large, 979 MB (SNP.VCF).

JamesKane
02-01-2018, 11:26 AM
3. If anyone has personally re-mapped the fastq to an alternate reference genome - or even the same reference genome. If yes, what was your pipeline to get to VCF.

I have an old blog article that documents how to set up a pipeline to remap the reads in a BAM: http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/ This was the flow that produced the coverage stats shared earlier in this thread.

The bbmap dependency is strictly for Big Y BAMs, which for some odd reason don't have the pairs named correctly. That step can be removed for all other labs thus far.

I wouldn't recommend doing this type of work on a cloud-computing platform unless your own computer is very limited. The cost of these resources is disproportionately high. The workflow above will run on a system with 32GB of RAM. You'll also need the better part of a TB of disk unless you purge the intermediate files as soon as they are no longer needed.
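
For the curious, the core of that flow boils down to something like the following (a bare-bones sketch only, assuming bwa and samtools with a reference already prepared via bwa index; the blog post adds duplicate marking, recalibration and calling on top):
# extract the paired reads from the delivered BAM back into FASTQ
samtools collate -u -O original.bam | samtools fastq -1 r1.fq.gz -2 r2.fq.gz -0 /dev/null -s /dev/null -n -
# remap, coordinate-sort and index
bwa mem -t 8 -M -Y -R '@RG\tID:sample\tSM:sample\tPL:ILLUMINA' reference.fa r1.fq.gz r2.fq.gz | samtools sort -@ 8 -o remapped.bam -
samtools index remapped.bam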

Donwulff
02-01-2018, 11:28 AM
Hello Everyone:

Ordered my WGS - should be not too much longer before i get results. Am curious if anyone has:


I've posted some info before, but I'm now wondering how much variation there is between individual sequences. Maybe I need to write up a full technical review somewhere, but we may want multiple samples for that.

samtools flagstat
1318582757 + 0 in total (QC-passed reads + QC-failed reads)
1976391 + 0 secondary
0 + 0 supplementary
41999309 + 0 duplicates
1293479874 + 0 mapped (98.10% : N/A)
1316606366 + 0 paired in sequencing
658303183 + 0 read1
658303183 + 0 read2
1262728258 + 0 properly paired (95.91% : N/A)
1290527310 + 0 with itself and mate mapped
976173 + 0 singletons (0.07% : N/A)
21649590 + 0 with mate mapped to a different chr
16276232 + 0 with mate mapped to a different chr (mapQ>=5)

The chrM here is the UCSC hg19 one, no longer used by anybody. Primary contigs only below, as the full list would be too large otherwise.

samtools idxstats
reference sequence name, sequence length, # mapped reads and # unmapped reads, crude coverage (mapped/sequence length)
chrM 16571 1114400 743 6725
chr1 249250621 104835827 69382 42.0604
chr2 243199373 110231171 73358 45.3254
chr3 198022430 85934500 57802 43.3963
chr4 191154276 87467356 60970 45.7575
chr5 180915260 78877818 49480 43.5993
chr6 171115067 73078352 52386 42.7071
chr7 159138663 71034495 50350 44.6369
chr8 146364022 67846742 49551 46.3548
chr9 141213431 53744570 35734 38.0591
chr10 135534747 66866209 52418 49.3351
chr11 135006516 58715173 43470 43.4906
chr12 133851895 57176882 38311 42.7165
chr13 115169878 42195610 26464 36.6377
chr14 107349540 38966204 27580 36.2984
chr15 102531392 35926428 23781 35.0394
chr16 90354753 40119201 28965 44.4019
chr17 81195210 34812311 27299 42.8748
chr18 78077248 34701490 22736 44.4451
chr19 59128983 24955928 20875 42.2059
chr20 63025520 26404612 19901 41.8951
chr21 48129895 17396969 26926 36.1459
chr22 51304566 15499777 34082 30.2113
chrX 155270560 34573577 27791 22.2667
chrY 59373566 12071826 18312 20.332
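
(These come straight from samtools; a quick sketch for reproducing them, where the crude-coverage column is simply mapped reads x 100bp read length / contig length:)
samtools flagstat sample.bam
samtools idxstats sample.bam | awk '$2 > 0 {printf "%s\t%d\t%d\t%d\t%.4f\n", $1, $2, $3, $4, $3*100/$2}'
# columns: contig, length, mapped reads, unmapped reads, crude coverage (assumes 100bp reads)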

From my own re-mapping (bwa-kit hs37d5 reference, I'm not sure what happened to the Y coverage here)

1317646569 + 0 in total (QC-passed reads + QC-failed reads)
1040203 + 0 secondary
0 + 0 supplementary
36154060 + 0 duplicates
1294216236 + 0 mapped (98.22% : N/A)
1316606366 + 0 paired in sequencing
658303183 + 0 read1
658303183 + 0 read2
1270542026 + 0 properly paired (96.50% : N/A)
1292474850 + 0 with itself and mate mapped
701183 + 0 singletons (0.05% : N/A)
16699618 + 0 with mate mapped to a different chr
14619369 + 0 with mate mapped to a different chr (mapQ>=5)

reference sequence name, sequence length, # mapped reads and # unmapped reads, crude coverage (mapped/sequence length)
1 249250621 99369162 53752 39.8672
2 243199373 104985266 55846 43.1684
3 198022430 85451852 44913 43.1526
4 191154276 82975628 42818 43.4077
5 180915260 78463656 40925 43.3704
6 171115067 73767591 38779 43.1099
7 159138663 68268759 36823 42.8989
8 146364022 64142271 34032 43.8238
9 141213431 51655993 27951 36.5801
10 135534747 56830824 30867 41.9308
11 135006516 57089972 31340 42.2868
12 133851895 56648648 30103 42.3219
13 115169878 42085155 21765 36.5418
14 107349540 38757536 20876 36.1041
15 102531392 35740295 19621 34.8579
16 90354753 38401575 21179 42.5009
17 81195210 33312257 19731 41.0274
18 78077248 32864433 17732 42.0922
19 59128983 23157087 14305 39.1637
20 63025520 25708903 14900 40.7913
21 48129895 16086364 8797 33.4228
22 51304566 14788542 9134 28.825
X 155270560 34354354 18908 22.1255
Y 59373566 9177632 5009 15.4574
MT 16569 1115245 703 6730.91

Correction: my remap above was to hs37d5, not GRCh38 as originally stated. Here is GRCh38 as well, also bwa-kit (based on the analysis-ready reference):

I have not marked duplicates on this run, because I can't use GRCh38 VCF-file for pretty much anything yet.

1368769055 + 0 in total (QC-passed reads + QC-failed reads)
4970207 + 0 secondary
47192482 + 0 supplementary
0 + 0 duplicates
1345320465 + 0 mapped (98.29% : N/A)
1316606366 + 0 paired in sequencing
658303183 + 0 read1
658303183 + 0 read2
1263898958 + 0 properly paired (96.00% : N/A)
1292437000 + 0 with itself and mate mapped
720776 + 0 singletons (0.05% : N/A)
22472264 + 0 with mate mapped to a different chr
15192219 + 0 with mate mapped to a different chr (mapQ>=5)

chr1 248956422 109426298 58001 43.954
chr2 242193529 106587604 56626 44.0093
chr3 198295559 89476341 46912 45.1227
chr4 190214555 86539563 45239 45.4958
chr5 181538259 80265012 41809 44.2138
chr6 170805979 74924678 39361 43.8654
chr7 159345973 70345052 37822 44.1461
chr8 145138636 64220931 33981 44.248
chr9 138394717 53204756 28792 38.4442
chr10 133797422 59909375 33260 44.7762
chr11 135086622 58332392 31812 43.1815
chr12 133275309 57640597 30563 43.2493
chr13 114364328 46327525 24007 40.5087
chr14 107043718 38845806 21082 36.2897
chr15 101991189 36723227 20143 36.0063
chr16 90338345 39765271 22083 44.0182
chr17 83257441 35623219 21430 42.7868
chr18 80373285 35397632 18689 44.0415
chr19 58617616 23234996 14432 39.6382
chr20 64444167 29552764 17224 45.8579
chr21 46709983 19597422 12442 41.9555
chr22 50818468 16680818 10400 32.8243
chrX 156040895 34500656 18935 22.11
chrY 57227415 10510535 9686 18.3663
chrM 16569 1115221 706 6730.77

JamesKane
02-01-2018, 11:30 AM
Can you please help me in finding a software for reading th VCF file i have of my Dante WGS 30x results?
Because it is very Large 979 MB ( SNP.VCF )


http://software.broadinstitute.org/software/igv/ works great.

You can also use a tool like grep from the command line to search for specific sites, or tabix to do range searches.
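
A couple of hedged examples of what that looks like in practice (the rsID and region are just placeholders):
grep -m 1 -w 'rs1234567' SNP.vcf                # look up one rsID in the plain-text VCF
bgzip SNP.vcf && tabix -p vcf SNP.vcf.gz        # compress and index once
tabix SNP.vcf.gz chr7:117100000-117300000       # then range queries are near-instant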

Donwulff
02-01-2018, 12:01 PM
The read names aren't standard Illumina bcl2fastq names, so after BWA MEM I run MarkDuplicates with the extra parameters READ_NAME_REGEX="CL100......L.C([0-9]+)R([0-9]+)_([0-9]+)" OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500. The BAM file identifies the technology as ILLUMINA, but it seems NovaSeq was being used by some providers by the time Dante Labs rolled their product out. I'm not sure if BGISEQ uses its own identifier, but I had the impression it wasn't ready by that time, and there's no info in the specs. I run MarkDuplicates on the read-name-sorted BAM, before the coordinate sort, to take advantage of the new workflow that marks secondary alignments as duplicates as well. BQSR is run over the whole BAM and not the shortest chromosome only, because I'm analyzing few sequences so there's no need to save resources. HaplotypeCaller is dreadfully slow if you want to use it to call dbSNP sites only, so --dbsnp All_20170710_GRCh37p13.vcf.gz --emitRefConfidence GVCF for an rsID-annotated gVCF seems like a good first step; then pick the dbSNP sites from that with some other tool if needed for GEDmatch Genesis etc.
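
For reference, that gVCF step looks roughly like this (GATK 3.x syntax, shown against my hs37d5 remap so the contig names match that dbSNP build; file names are placeholders):
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
   -R hs37d5.fa -I sample.bam \
   --dbsnp All_20170710_GRCh37p13.vcf.gz \
   --emitRefConfidence GVCF \
   -o sample.g.vcf.gz
# the resulting gVCF can then be genotyped and the rsID-annotated sites extracted with whatever tool you prefer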

Donwulff
02-02-2018, 10:04 AM
Can you please help me in finding a software for reading th VCF file i have of my Dante WGS 30x results?
Because it is very Large 979 MB ( SNP.VCF )

DNA.Land had some instructions on https://dna.land/vcf-info - unfortunately Dante Labs doesn't provide the VCF index file, so you'd need to figure out a way to generate it for the Genome Compass, if I recall right.

If you're on a mobile device, cloud-based solutions are often the only option. I had some VCF exploration options listed on https://caelestic.com/variant-analysis-list though that list is always in need of revision.

Actually, I think there need to be some new lists/resources and perhaps even tools for VCF files; nobody really explains how to use grep etc., and it's far from perfect. In fact, if you can use grep, then you can probably use tabix (https://archive.codeplex.com/?p=bow maybe?), although that still lacks search by rsID.

I do almost all bioinformatics on Linux, or maybe R, but if you're not already using them, the learning curve is very steep. Windows GUI tools are sadly lacking, and despite now packing the computing power desktops had just a few years ago, mobile phones/tablets aren't really well geared for this.

boaziz
02-02-2018, 02:59 PM
glogg
bonnefon.org

Thank you

Donwulff
02-04-2018, 09:48 AM
A lot of questions, so I'm answering bit by bit as I have time ;) Although stuff like mapping should probably go in a thread/post of its own...
Most of this and my previous comments are in the context of the Broad Institute GATK Best Practices https://software.broadinstitute.org/gatk/best-practices/ which could be considered the reference book for this kind of pipeline.

But on to the Dante Labs pipeline analysis; a long read!

Identifying the reference genome, if it isn't stated, isn't always straightforward. In Dante Labs' case, though, looking at the BAM file header we have:
@SQ SN:chrM LN:16571

The 16,571-basepair chrM is NC_001807, which UCSC used but which practically nobody uses nowadays (it's not the same as rCRS or RSRS). Since in mapping, reads align to the location they match best, changing the reference, even something as simple as the mitochondrial DNA, can cause reads to map differently elsewhere, and labs like to keep things unchanged so that results remain comparable. This one probably should be changed, though; in any case YFull interpreted the full mtDNA from the sequence, which is probably the easiest option.

Anyway, there are no decoy sequences either - bits of DNA that aren't part of the human reference but are almost always present in samples, like the Epstein-Barr virus genome. Some studies have shown that results are better with the decoy sequences, because reads aren't forced to map where they don't belong on the human sequences, though I've not seen them used in any of the saliva-based sequencing results. Personally I use the hs37d5 reference, which has the decoy sequences, but this also means there will be more mapped reads on hs37d5 simply because the non-human reads map as well.

Finally, there are the program group headers, which in this case let us know that the alignment used a genome reference named "ucsc.hg19/ucsc.hg19.fa". Now, that's just a file name, so you could call it whatever you want, but as it happens this is consistent with the contigs (chrM) and the lack of decoy sequences, so we can safely assume this is the original UCSC hg19 reference. These contigs match the ones in the VCF, so we can assume at least the reference is the same, though it's possible the customer-delivered BAM file is different from what they used for calling.

Since we have the program group lines, there are a few incidental things we can glean about the Dante Labs pipeline:

@PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:/opt/bin/bwa-0.7.15/bwa mem -t 10 -M -Y -R @RG\tID:CL100XXXXXX_L0X_17\tPL:ILLUMINA\tPU:CL100XXXXXX_LX\tLB:WHYZLAADKGAAXXXXXX-17\tSM:1500XXXXXXXXXXX ucsc.hg19/ucsc.hg19.fa /l3bioinfo/CL100XXXXXX_L0X_17_1.05_1.fq.gz /l3bioinfo/CL100XXXXXX_L0X_17_1.05_2.fq.gz

This tells us it was mapped with bwa mem; that's the industry standard, so nothing surprising or particularly useful there. 0.7.15 is actually a really recent version. The most recent changes in it have involved alternate-contig support, which matters when the reference contains multiple different sequences for the same genomic region. Normally the reads would map to the different alt contigs haphazardly, preventing a good sequence for that region, so the mapper needs explicit support for this. hg19/GRCh37 doesn't have alt contigs though, so we're all good here; it's just good to see the pipeline is up to date. The "l3bioinfo" folder/directory suggests this is the L3 Bioinformatics pipeline and/or service. (And I don't consider this a trade secret because it's openly in the header; there could be a number of reasons for that, though.)

Then there are the parameters: "-t 10" means 10 threads. There are actually 127 parallel invocations of bwa mem, so with 10 threads each the mapping in this case ran on up to 1270 threads in parallel! The actual mapping won't take long, assuming they have that many cores. -M is "mark shorter split hits as secondary", which is expected by Picard Tools and therefore always used, and -Y is "use soft clipping for supplementary alignments". Sometimes the whole read doesn't map in a single place but can be split, with the parts mapped to different locations on the genome; normally the whole read is listed just once, with the remaining matches showing only the matching part of the read. With "-Y" the read is listed in full again, with the matching region marked by soft-clipping. I'm not actually sure what requires this; it's possible HaplotypeCaller could benefit from it, or it may just be to ensure all reads listed are 100bp long. In short, the settings here are pretty standard and fit for Illumina-style 100bp paired-end sequencing.

The other kind of program group header we have here is MarkDuplicates, from Picard Tools. This is standard choice, and the reason for the -M on the bwa mem invocation:
@PG ID:MarkDuplicates VN:2.5.0(2c370988aefe41f579920c8a6a678a201c5261c1_1466708365) CL:picard.sam.markduplicates.MarkDuplicates INPUT=[/l3bioinfo/... ... ...] OUTPUT=chr1.markdup.bam METRICS_FILE=chr1.metrics.txt VALIDATION_STRINGENCY=STRICT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 REMOVE_SEQUENCING_DUPLICATES=false TAGGING_POLICY=DontTag REMOVE_DUPLICATES=false ASSUME_SORTED=false DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values> OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json PN:MarkDuplicates

We can see this was run with just the standard settings, no special options, which means that no reads are actually deleted/removed; they're just marked as duplicates. That's good for extracting FASTQ from the BAM. However, there's one note I want to stress: the "READ_NAME_REGEX=<optimized capture of last three ':' separated fields as numeric values>" does not work, because that's not the format of the read names in these files! So the FASTQ read names have changed, but apparently nobody noticed. Also, OPTICAL_DUPLICATE_PIXEL_DISTANCE should be 2500 or so for this kind of high-throughput sequencing. I'm not sure this makes a practical difference, especially since the general duplicate marking is still done, but it does mean optical-duplicate detection doesn't happen at all. (Actually, I'll just re-run the duplicate marking on it and compare stats.) In short, the sequencer is an optical instrument that reads the sequence from fluorescent dyes attached to the DNA bases. The same DNA fragment can therefore be read multiple times, which would bias variant calling and use lower-quality optical reads, so you want to ignore similar sequences that are optically close together, which means you need to extract their location and distance on the flowcell from the read name. Again, the general duplicate detection catches most of these, so I'm not sure it's a real problem.

There are no other program group lines in the BAM. Specifically, Base Quality Score Recalibration (BQSR) appears not to have been run. This is good news if you want to extract the reads for your own use, because it means the reads haven't been altered. Likely they ran BQSR before variant calling but provided the raw file for people's own analysis; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5048557/ suggests it may not always be beneficial, but the Broad Institute GATK Best Practices still call for it. If you're doing your own analysis on the Dante Labs provided BAM file, you may want to consider running BQSR. I should probably run a report on that as well ;) BQSR determines the probability of sequencing error depending on sequencing context (read group and preceding nucleotide bases) from the novel (not seen before) variants in the BAM, and then uses that to adjust the quality of the bases read. I.e., if every time GATC is read the C base is a novel variant not seen before, then maybe you should trust that C less when you read the sequence GATC.
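
If someone wants to try BQSR on their own copy, a rough GATK 3.x sketch (the known-sites files are the usual GATK resource bundle ones for hg19; all file names here are placeholders):
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ucsc.hg19.fa -I sample.bam \
   -knownSites dbsnp_138.hg19.vcf.gz \
   -knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
   -o recal.table
java -jar GenomeAnalysisTK.jar -T PrintReads -R ucsc.hg19.fa -I sample.bam -BQSR recal.table -o sample.recal.bam
# the recalibrated BAM is what would then go into HaplotypeCaller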

The VCF file gives us the final bits we can glean from these:
##GATKCommandLine=<ID=ApplyRecalibration,Version=3.3-0-g37228af,...,ts_filter_level=99.0>

This is actually different from BQSR: variant quality score recalibration (VQSR) works at the variant-call level rather than on the BAM itself, though the idea is somewhat similar: find systematic errors via machine learning and apply the result to the variant calls. The cut-off here, ts_filter_level=99.0, is a truth-sensitivity threshold: the filter is set so that 99.0% of the "known real variants" used for training would pass it. I'd consider that pretty lenient, so further filtering, or at least checking the results, might be warranted if using this VCF. The raw score is VQSLOD in the INFO annotation; larger is better.

##GATKCommandLine=<ID=SelectVariants,Version=3.3-0-g37228af,..., variant=(RodBinding name=variant source=out_snp.vcf.gz)>

This one appears to just be selecting the SNPs that are in their "out_snp" list, so no novel SNPs are listed. Theoretically this would be "SNPs we've validated to genotype correctly in our lab", but I'd assume it's some older copy of dbSNP; I could add dbSNP revision annotation to the VCF and see what's the latest revision they list. I noticed that when I run the analysis myself, I get a hugely larger list of variants. If you run Genomic VCF on the BAM on Sequencing.com, the only header it has is UnifiedGenotyper, so I'm thinking that probably skips BQSR and VQSR entirely, but see the reference above for how those may not be beneficial, and at least you get all calls, including the gVCF ref-call blocks. (Also, UnifiedGenotyper is kinda old by now.)
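
One way (not necessarily what they did) to check that would be to re-annotate the IDs from a chosen dbSNP build with bcftools and see which rsIDs appear or change; a sketch, assuming the dbSNP VCF is bgzipped, tabix-indexed and uses the same contig names as the sample VCF:
bcftools annotate -a All_20170710_GRCh37p13.vcf.gz -c ID -O z -o sample.rsid.vcf.gz sample.vcf.gz
bcftools index -t sample.rsid.vcf.gz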

Re-running the whole analysis pipeline might be worth it, but considering how relatively little you can do with novel SNPs at least, it may not benefit the average user, especially in light of how many mistakes one might make doing it oneself. If one is looking to learn and experiment, then by all means, have at it ;)

Donwulff
02-04-2018, 07:59 PM
I'll have to admit my description of optical duplicates was slightly incorrect. I just tested this on the Dante Labs provided BAM file as I suggested, and found it doesn't affect which duplicates are found at all.
The main difference is that it appears to be slightly faster with it; I should run multiple trials on an otherwise idle computer to be sure. I started these two in parallel with some free cores, hence 12G per task. samtools flagstat results are exactly identical for both runs, although with the latest Picard Tools at least they're slightly different from the flags in the original BAM file (41999309 duplicates flagged in the original BAM, made with Picard 2.5.0; 42330893 with Picard 2.17.6 - you don't normally want to run the latest version in bioinformatics pipelines, by the way; I'm experimenting and developing here).

Duplicate marking runtime without optical duplicates:
real 499m10.051s
user 588m38.432s
sys 8m4.488s

Duplicate marking runtime with optical duplicates:
time java -Xms12G -jar picard.jar MarkDuplicates INPUT=sample1.bam OUTPUT=sample1_MarkDuplicates.bam METRICS_FILE=sample1_MarkDuplicates.bam.metrics.txt
real 471m32.868s
user 454m16.360s
sys 6m46.476s

Broad Institute actually says at https://gatkforums.broadinstitute.org/gatk/discussion/6747/how-to-mark-duplicates-with-markduplicates-or-markduplicateswithmatecigar:
"The Broad's production workflow increases OPTICAL_DUPLICATE_PIXEL_DISTANCE to 2500, to better estimate library complexity. The default setting for this parameter is 100. Changing this parameter does not alter duplicate marking. It only changes the count for optical duplicates and the library complexity estimate in the metrics file in that whatever is counted as an optical duplicate does not factor towards library complexity. The increase has to do with the fact that our example data was sequenced in a patterned flow cell of a HiSeq X machine. Both HiSeq X and HiSeq 4000 technologies decrease pixel unit area by 10-fold and so the equivalent pixel distance in non-patterned flow cells is 250. You may ask why are we still counting optical duplicates for patterned flow cells that by design should have no optical duplicates. We are hijacking this feature of the tools to account for other types of duplicates arising from the sequencer. Sequencer duplicates are not limited to optical duplicates and should be differentiated from PCR duplicates for more accurate library complexity estimates."

So my information on optical duplicates is somewhat outdated; apparently the setting is no longer needed for its original purpose on the newest technologies (granted, I can't tell whether this sequencing WAS done on patterned flowcells or not). However, it DOES make a significant difference for estimating library complexity: about 1.2 billion without optical duplicates, 3.5 billion with the Broad Institute recommended OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500 and the correct regex.
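
For completeness, that corrected re-run was along these lines (the regex is the read-name pattern I posted earlier in the thread; file names are placeholders):
time java -Xms12G -jar picard.jar MarkDuplicates INPUT=sample1.bam OUTPUT=sample1_MarkDuplicates_optical.bam \
   METRICS_FILE=sample1_MarkDuplicates_optical.metrics.txt \
   READ_NAME_REGEX='CL100......L.C([0-9]+)R([0-9]+)_([0-9]+)' \
   OPTICAL_DUPLICATE_PIXEL_DISTANCE=2500
# ESTIMATED_LIBRARY_SIZE in the metrics file is the library-complexity figure discussed above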

Anyway, the bottom line: in my write-up above I thought the optical duplicates would have relatively little effect on the actual duplicates, but testing shows it has exactly zero effect on which reads are marked as duplicate, and it only matters if you're using the sequencing library complexity estimate for something (probably not; it could be used for comparing different wet-lab workflows and preparation kits, for example). Skipping it will probably save you some runtime, though. The bioinformatics tool version does have a distinct effect, as you might expect; I don't see any difference in the options that should matter. (EDIT: I just remembered the Dante Labs pipeline marks duplicates per chromosome, probably to parallelise the runs. However, this probably makes it unable to mark duplicate fragments whose ends aren't on the same chromosome, which is a more likely reason for the large difference than the tool version. I'm not really sure I want to test that theory, though. Maybe.)

karwiso
02-05-2018, 09:01 AM
And here is some more statistics from my sample at YFull:

ChrY BAM file size: 0.85 Gb (Hg38)
Reads (all): 9877517
Mapped reads: 9855042 (99.77%)
Unmapped reads: 22475 (0.23%)
Length coverage: 26360194 bp (99.79%)
Min depth coverage: 1X
Max depth coverage: 8010X
Mean depth coverage: 32.61X
Median depth coverage: 19X
Length coverage for age: 8469975 bp
No call: 54849 bp

Donwulff
02-05-2018, 12:03 PM
And here is some more statistics from my sample at YFull:

ChrY BAM file size: 0.85 Gb (Hg38)
Reads (all): 9877517
Mapped reads: 9855042 (99.77%)
Unmapped reads: 22475 (0.23%)
Length coverage: 26360194 bp (99.79%)
Min depth coverage: 1X
Max depth coverage: 8010X
Mean depth coverage: 32.61X
Median depth coverage: 19X
Length coverage for age: 8469975 bp
No call: 54849 bp

I wanted to ask, how did the Hg38 mapping happen? I was under the impression YFull didn't even accept them yet.
I think I've probably posted mine before, but here they are again for comparison purposes, Hg19 though so... yeah.

ChrY BAM file size: 0.95 Gb (Hg19)
Reads (all): 12090138
Mapped reads: 12071826 (99.85%)
Unmapped reads: 18312 (0.15%)
Length coverage: 25578704 bp (99.71%)
Min depth coverage: 1X
Max depth coverage: 8016X
Mean depth coverage: 40.56X
Median depth coverage: 21X
Length coverage for age: 8445972 bp
No call: 74862 bp

BigY:
ChrY BAM file size: 0.41 Gb (Hg19)
Reads (all): 6464708
Mapped reads: 6464708 (100.00%)
Unmapped reads: 0
Length coverage: 14228866 bp (55.47%)
Min depth coverage: 1X
Max depth coverage: 7718X
Mean depth coverage: 51.70X
Median depth coverage: 36X
Length coverage for age: 7946632 bp
No call: 11424700 bp

karwiso
02-05-2018, 01:32 PM
I wanted to ask, how did the Hg38 mapping happen? I was under the impression YFull didn't even accept them yet.
I think I've probably posted mine before, but here they are again for comparison purposes, Hg19 though so... yeah.


I got FASTQ files from Dante Labs - that allows aligning the reads to any reference genome. YFull could extract the reads for the Y chromosome and align them to Hg38, but as I understood it they would not normally work with whole-genome data. I think this FASTQ option was on their order page, but it seems it has been removed now.

I have ordered the third-party analysis at FGC, so we can see what they can supply, probably a BAM aligned to Hg38. I will try Sequencing.com, but transferring 112-120 GB takes a lot of time. I have not tried samtools yet, but probably will later - I need to learn more about bioinformatics and genetic data processing.

JamesKane
02-05-2018, 04:09 PM
I wanted to ask, how did the Hg38 mapping happen? I was under the impression YFull didn't even accept them yet.

They have been taking Hg38 BAMs for a bit. About as long as YSEQ started aligning their WGS files to the newer reference anyway.

Since you should have the full set of reads from the FASTQ in a WGS BAM, anyone with a few tools and a good amount of free time could do this and upload them via some cloud storage.

Donwulff
02-05-2018, 09:02 PM
They have been taking Hg38 BAMs for a bit. About as long as YSEQ started aligning their WGS files to the newer reference anyway.

Since you should have the full set of reads from the FASTQ in a WGS BAM, anyone with a few tools and a good amount of free time could do this and upload them via some cloud storage.

Dante Labs offers FASTQ directly too; I got the BAM because I wanted to see their alignment, though I didn't realize I should've asked for both, lol. And yes, as can be seen, I've already mapped the reads to several different references myself. I was curious how other people do it, though. I know you can pay for mapping on Sequencing.com, possibly, and there was talk of the upload being too large. I wasn't aware YFull was already taking hg38 either, although in truth I've always been a little unsure of their pipeline. Accepting FASTQ in the past made it seem like they'd re-do the mapping themselves, but then they're requiring hg38-mapped submissions, so I'm not sure. Also, the stats page on YFull is a straight copy of the original mapping stats, so either they take those before analysis (likely, in fact) or they don't map it themselves. And if you do the mapping yourself, there's still a choice of GRCh38 with or without alt contigs, with decoy sequences, analysis-ready or latest patch etc., all of which would subtly change the results.

Anyway, yeah, the "In this case YFull mapped it to hg38 themselves" is a good answer ;)

BAM is *probably* the better option normally, because it can carry more additional information than FASTQ usually does. The Broad Institute's preference for raw reads is now the "unmapped BAM" format. Of course, you can strip the mapping out, or just leave it in so you can check the pileup/alignment later. I rarely work with FASTQ directly, but I understand if YFull's workflow takes/took them.

Also, while I'm here: I just finished testing duplicate-flagging the Dante Labs provided BAM in queryname (read name) sorted order, a new feature in Picard Tools that will also mark secondary alignments (when a read maps well to multiple locations on the genome) as duplicates if the primary alignment is a duplicate. Some newer Broad Institute pipelines use this feature. For my BAM file, this flagged 253,950 additional mappings out of 1,976,391 secondary mappings as duplicates, going from 42,330,893 (3.21%) to 42,584,843 (3.23%) duplicates total, versus 41,999,309 (3.19%) in the originally provided BAM.

Unfortunately this is a fairly expensive operation for a small gain (depending, in fact, on whether the downstream pipeline even uses secondary alignments), in particular because Picard Tools requires the BAM to be sorted in its own queryname order, which differs from everything else. Except... for the Broad Institute "unmapped BAM" order: if you use that, you can run the new duplicate-marking flow on the freshly mapped uBAM before sorting it to chromosome/coordinate order, adding little to the normal runtime (however, it reduces the ability to analyse pieces of the same genome in parallel!). In theory at least, the more duplicates you're able to mark and exclude from the variant calling, the fairer it will be.
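
In case anyone wants to reproduce this, a rough sketch of the queryname-sorted duplicate-flagging step (Picard; file names are placeholders):
java -jar picard.jar SortSam INPUT=sample.bam OUTPUT=sample.qsort.bam SORT_ORDER=queryname
java -jar picard.jar MarkDuplicates INPUT=sample.qsort.bam OUTPUT=sample.qdedup.bam \
   METRICS_FILE=sample.qdedup.metrics.txt ASSUME_SORT_ORDER=queryname
java -jar picard.jar SortSam INPUT=sample.qdedup.bam OUTPUT=sample.final.bam SORT_ORDER=coordinate
# the extra sorts are the expensive part; with an unmapped-BAM flow the queryname order comes for free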

dbm
02-06-2018, 06:44 AM
I have got FASTQ files from Dante Labs - that allows to align the reads to any reference genome. YFull could extract the reads for Y-chromosome and align it to Hg38, but as I understood they would not normaly work with whole genomes data. I think this FASTQ option was on their order page, but it seems it is removed now.

I have ordered the third party analysis at FGC, so we can see what they can supply, probably bam aligned to Hg38. I will try sequencing.com, but transferring 112-120 GB takes a lot of time. I have not tried samtools, but probably later - I need to learn more about bioinformatics and genetic data processing.

It looks like Sequencing.com does the mapping for you - it really surprised me that they would be willing to do that; I sort of wonder what their privacy policy is. Regardless, I believe the FASTQ files are significantly smaller than the BAM files.

So they would take your FASTQ, align it to a reference genome (creating a BAM), and then create a VCF for you.

I'd be very interested to see the differences in VCF files between Dante, Sequencing.com and any other provider's pipelines, including mapping to alternative reference genomes. I'm working on putting together a pipeline geared more towards single-genome analysis based on Broad/GATK4. They actually have some pipelines already made public; they are super confusing for a simpleton like me and seem very much geared towards the analysis of massive numbers of genomes, not just a single one.

JamesKane
02-06-2018, 02:58 PM
The initial GATK pipeline is generally the same for single samples or batch. You align, dedup, recalibrate, and produce the intermediate gVCF.

For a single file you simply run GenotypeGVCFs on the single target rather than combining a batch. You are stuck with hard filtration methods here, since last I tried, VQSR doesn't work on a single sample.

As a note, freebayes is another popular tool that doesn't require all the recalibration preprocessing: https://github.com/ekg/freebayes
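
Rough single-sample sketches of both routes (GATK 3.x syntax; file names are placeholders):
# GATK route: genotype the single gVCF produced by HaplotypeCaller
java -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fa -V sample.g.vcf.gz -o sample.vcf.gz
# freebayes route: call directly from the BAM, no recalibration preprocessing needed
freebayes -f reference.fa sample.bam > sample.freebayes.vcf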

Donwulff
02-06-2018, 10:44 PM
I was disappointed to note that with the GATK 4 release the Broad Institute has actually gutted their Best Practices documentation; it no longer links to all the specifics, just to their own pipeline implementations, which aren't in a really readable format. I think the previous instructions are probably still available, and I kind of hope they add links back to the processing sections.

I actually already called Y-SNPs from the three different duplicate-processing variants: Dante Labs' own, the one I re-ran on the whole genome, and the queryname-sorted one. I should add BQSR for comparison too, but that's a lot of computing for each duplicate check, especially when the Dante Labs mapping is basically throwaway (though there's good reason to assume the results hold for any mapping, as much as you can trust a single-sample study). If you do your own mapping, I would think you might just want to go for duplicate marking in queryname-sorted order and allele-specific annotation/VQSR from the start, but it's always good to check your assumptions.

The challenge is that there's no exact "truth set" for the calls, so it's difficult to tell what the improvement actually is. Though as I said, I'd go with the default assumption that more duplicate removal is better and see how much, if at all, it changes the results. A second idea is to run VQSR (Variant Quality Score Recalibration) and use it to gauge whether the quality has changed; but for that I need to call the whole genome for each case, which will take weeks on a single box. You can run VQSR on a single sample if it's a whole-genome sequence; you may need to reduce the number of Gaussians. Another challenge is that there really aren't good tools for comparing VCF files.

There was some discussion of uBAM vs. FASTQ. The Broad Institute prefers uBAM, their tools & pipeline examples pretty much require it by now, and they're the main ones in use, so to some degree they're just forcing it on us whether we like it or not. The only real drawback seems to be that uBAM loses any additional flags after the read name in the FASTQ, such as Illumina uses, though those could be put into custom tags in the uBAM. An unmapped BAM shouldn't be much larger, and everything else is computationally trivial compared to the bioinformatics analysis.

dbm
02-07-2018, 04:04 AM
I was disappointed to note that with the GATK 4 release the Broad Institute has actually gutted their Best Practices documentation; it no longer links to all the specifics, just to their own pipeline implementations, which aren't in a really readable format. I think the previous instructions are probably still available, and I kind of hope they add links back to the processing sections.

I actually already called Y-SNPs from the three different duplicate-processing variants: Dante Labs' own, the one I re-ran on the whole genome, and the queryname-sorted one. I should add BQSR for comparison too, but that's a lot of computing for each duplicate check, especially when the Dante Labs mapping is basically throwaway (though there's good reason to assume the results hold for any mapping, as much as you can trust a single-sample study). If you do your own mapping, I would think you might just want to go for duplicate marking in queryname-sorted order and allele-specific annotation/VQSR from the start, but it's always good to check your assumptions.

The challenge is that there's no exact "truth set" for the calls, so it's difficult to tell what the improvement actually is. Though as I said, I'd go with the default assumption that more duplicate removal is better and see how much, if at all, it changes the results. A second idea is to run VQSR (Variant Quality Score Recalibration) and use it to gauge whether the quality has changed; but for that I need to call the whole genome for each case, which will take weeks on a single box. You can run VQSR on a single sample if it's a whole-genome sequence; you may need to reduce the number of Gaussians. Another challenge is that there really aren't good tools for comparing VCF files.

There was some discussion of uBAM vs. FASTQ. The Broad Institute prefers uBAM, their tools & pipeline examples pretty much require it by now, and they're the main ones in use, so to some degree they're just forcing it on us whether we like it or not. The only real drawback seems to be that uBAM loses any additional flags after the read name in the FASTQ, such as Illumina uses, though those could be put into custom tags in the uBAM. An unmapped BAM shouldn't be much larger, and everything else is computationally trivial compared to the bioinformatics analysis.

Thanks, and agreed. GATK4 seems to have migrated to these WDL scripts that are geared towards massive datasets. I heard in a video that they should be able to run on a laptop, but I didn't dig too much into this.

You can actually load them into Notepad++ and try to parse them. I also heard about the unmapped BAM being preferred by Broad, but the bwa mem examples I saw (which I believe GATK4 basically just calls itself) took two FASTQs and a reference and created a mapped BAM. So I am wondering if the preference for unmapped BAM predates GATK4?

I don't have my FASTQ files from Dante yet, so I was playing around with low-coverage and exome files from the 1000 Genomes project.

I found one pipeline that Intel illustrated for Broad; it's a PDF file you can search for on Google.

It seemed a pretty good overview. Though I believe that with GATK4 they folded 'RealignerTargetCreator' and 'IndelRealigner' into HaplotypeCaller, though I haven't seen much documentation to support that. They sort of mention it (I think) in the GATK4 release videos on YouTube.

There are some pretty confusing compatibility issues, like a VCF file having contig names of "1" rather than "chr1"; this is what caused the 'no overlapping contigs' issue I was running into yesterday. A post on Biostars where someone wrote an awk script fixed that for me, allowing me to complete BQSR and HaplotypeCaller.
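
A hedged example of that kind of contig rename (not necessarily the exact Biostars one); either a quick awk pass over the VCF body or bcftools with a two-column name map:
# quick and dirty: prefix the body lines with "chr" (the ##contig header lines would still need editing)
awk 'BEGIN{OFS="\t"} /^#/{print; next} {$1="chr"$1; print}' in.vcf > renamed.vcf
# cleaner: chr_map.txt holds tab-separated lines like "1 chr1"
bcftools annotate --rename-chrs chr_map.txt -O z -o renamed.vcf.gz in.vcf.gz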

Question for you though - why do you say that the Dante BAM was throwaway? Poorly mapped? Or how were you able to determine that? Do you feel their VCF is reasonable?

Thank-you so much!

Kaper
02-07-2018, 04:40 AM
Is anyone else having a problem logging into their Dante labs account? I have never had a problem until today. Now when I put my email and my password and it says it is incorrect so I clicked on forgot password and it tells me that my email address does not have an account. My results should be ready soon and it looks like they deleted my account. Or is this happening to other people too and it’s just a temporary glitch ? Is anyone else having problems logging into their account today? Thanks

dbm
02-07-2018, 05:51 AM
Is anyone else having a problem logging into their Dante labs account? I have never had a problem until today. Now when I put my email and my password and it says it is incorrect so I clicked on forgot password and it tells me that my email address does not have an account. My results should be ready soon and it looks like they deleted my account. Or is this happening to other people too and it’s just a temporary glitch ? Is anyone else having problems logging into their account today? Thanks

No issues logging into my account. How long have you been waiting for your results?

JamesKane
02-07-2018, 11:50 AM
You can actually load them into a notepad++ and try to parse them. i also heard about the unmapped BAM being preferred by Broad, but when i look at bwa mem (which i believe GATK4 basically just calls itself - the examples i saw took two fastq's and a reference and creates a mapped bam). So i am wondering if the preference for unmapped BAM is based on pre-GATK4?

Using the unmapped BAMs is a relatively recent introduction for GATK. The script I linked earlier in this thread was the Best Practice flow verbatim at the time the blog post was created. The production of the clean BAMs came a few weeks later. Now that I have a fresh NovaSeq sample, this is something I'm toying with to see if there are actual benefits.


it seemed a pretty good overview. Though i believe with GATK4, they folded the 'Realigner Target Creator' and 'IndelRealigner' into HaplotypeCaller. Though i haven't seen much documentation to support that.

These tools were removed from the Best Practices sometime between 3.4 and 3.6. HaplotypeCaller handles local realignment as part of its processing, using the Smith-Waterman algorithm.

Donwulff
02-07-2018, 08:00 PM
You can actually load them into a notepad++ and try to parse them. i also heard about the unmapped BAM being preferred by Broad, but when i look at bwa mem (which i believe GATK4 basically just calls itself - the examples i saw took two fastq's and a reference and creates a mapped bam). So i am wondering if the preference for unmapped BAM is based on pre-GATK4?

bwa mem has supported aligning from BAM for a long time, and it still does, even from an aligned BAM, though the preference seems to be to shuffle the reads to prevent bias from the previous alignment. The previous Broad Institute GATK Best Practices version made, if I recall right, no mention at all of uBAM, but it was in some of the additional information they offered. Now, as can be seen at https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145 they no longer offer Best Practices workflows from FASTQ; all the example scripts start from uBAM. I suspect part of the reason is, as I described, that the new queryname-grouped secondary-mapping-marking MarkDuplicates workflow requires the BAM to be sorted in the Picard Tools queryname order, which the uBAM is.

If you did a pipeline from FASTQ, you would have to either skip the queryname-ordered MarkDuplicates (running the old version on the chromosome-coordinate-sorted file instead) or do double the sorting work, which is easily ~10 hours per sort for a WGS sequence. Of course, I'm not aware of any reason Picard Tools technically *couldn't* accept a FASTQ-ordered BAM for MarkDuplicates, since the mappings from the same read name are still grouped; in fact it actually runs fine until the point where it starts writing out the flagged BAM. Either it's an oversight that they're going to fix after enough people complain, or they're REALLY serious about forcing everybody to use uBAM for whatever reason.


Question for you though - why do you say that the Dante BAM was throwaway? Poorly mapped? Or how were you able to determine that? Do you feel their VCF is reasonable?

I mean from the perspective of doing your own pipeline/re-analysis, that is. There isn't going to be THAT much point in calling the variants again if you stick with the original BAM file mapping. But regarding the mapping, as said, the UCSC chrM reference is one nobody uses anymore, so you might want to re-map just for that. Additionally, the decoy sequences, for extra DNA often present in human samples, which have been shown to improve mapping, aren't included in that reference (though sequencing from saliva does have a different metagenome profile), and I believe the Y-chromosome pseudo-autosomal regions at least (and probably the centromeric regions) haven't been masked, which leads to an alt-contig-like problem where reads can map to either the X or the Y chromosome, or even to different ends of different chromosomes (of course, this only matters if you're interested in the PAR regions).

For GRCh38, the preferred "analysis sets" for mapping are at ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/ - note these do NOT contain the latest patch corrections (which do not change genome coordinates) to the reference genome, so I wish someone provided scripts for re-building the analysis sets from any build. But also be aware that most online sites still take hg19/GRCh37-mapped data, sometimes exclusively, so going to GRCh38 doesn't win you much yet, though eventually people will want to re-map their sequence to GRCh38, GRCh39 and so on... For GRCh37 you may consider using http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz which is the reference genome used by the 1000 Genomes project, with the rCRS mitochondrial and decoy sequences. RSRS would take a bit of assembly (hah!).
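
Getting set up with that GRCh37 reference is only a few commands (a sketch; bwa indexing a whole human genome takes an hour or two and several GB of RAM):
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
gunzip hs37d5.fa.gz
bwa index hs37d5.fa                     # builds the index files bwa mem needs
samtools faidx hs37d5.fa                # .fai index used by samtools/GATK
samtools dict hs37d5.fa > hs37d5.dict   # sequence dictionary for Picard/GATK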

Snoozy
02-09-2018, 09:24 PM
First time poster here who has ordered a WGS from Dante.

Could anyone tell me the 'timeline' steps, please, so I know how far along I am? So far, I have:
Awaiting confirmation from Dante Labs (done).
Kit Received.

Thanks. :)

RobertN
02-09-2018, 09:30 PM
It will take ~12-14 days until they let you know whether they have successfully extracted DNA from your sample. After that they will start sequencing, which will take a long time: 50+ days.

Snoozy
02-09-2018, 09:37 PM
It will take ~12-14 days until they let you know whether they have successfully extracted DNA from your sample. After that they will start sequencing, which will take a long time: 50+ days.

Thank you! I don't really mind the wait for a WGS, although wait times do seem variable. Was chatting to a Redditor earlier who got his back in 'almost two months' recently. Guess that's something the company could maybe provide more info on/streamline over time.

I've been waiting over six weeks for 23andMe and they are nowhere near finishing that, so I have to get used to it I suppose. :)

Donwulff
02-10-2018, 10:28 AM
I'd expect there is much variation in timeline due to various things.
10th July, late evening - Paid and order confirmed, reported as kit shipped
18th July - sample confirmed as received; I didn't record the exact day it arrived or picked back up, although coordinating things with the couriers is always a problem
3rd August - Report of DNA extraction
3rd September - asked about the processing time, was told that WGS takes 7-9 weeks instead of the 3-4 weeks for other services that was quoted on their front page
5th October - received notification results were ready, requested BAM file, which was offered on USB stick only
11th October - informed that the USB drive was mailed
12th October - 128GB USB flash drive with the BAM file arrived
13th October - finished uploading BAM file to AWS overnight (Afterwards I discovered Sequencing.com Big Yotta is free and easier), and submitted it to YFull for analysis
9th December - YFull analysis complete, along with STR's, and paid up.
4th January - New YTree with changes, in this case mainly age estimates based on the new sample

180 days total, so patience & perseverance definitely required ;)

Their web-site now says 50 days everywhere, on the front-page and on the WES and WGS product pages. I get it's bit difficult for them with different products, and varying timing. Only the microarray test product page says 3 weeks. That should include DNA extraction, but as can be seen, back when I did my test, DNA extraction took 16 days from when they confirmed sample as received, and sequencing + interpretation 48 days. It's likely they've streamlined things since, but there seems to have been stories of much longer wait-times too. I get that if any stage fails quality-control they have to go back and re-do it though.

Snoozy
02-10-2018, 01:47 PM
I'd expect there is much variation in timeline due to various things.
10th July, late evening - Paid and order confirmed, reported as kit shipped
18th July - sample confirmed as received; I didn't record the exact day it arrived or picked back up, although coordinating things with the couriers is always a problem
3rd August - Report of DNA extraction
3rd September - asked about the processing time, was told that WGS takes 7-9 weeks instead of the 3-4 weeks for other services that was quoted on their front page
5th October - received notification results were ready, requested BAM file, which was offered on USB stick only
11th October - informed that the USB drive was mailed
12th October - 128GB USB flash drive with the BAM file arrived
13th October - finished uploading BAM file to AWS overnight (Afterwards I discovered Sequencing.com Big Yotta is free and easier), and submitted it to YFull for analysis
9th December - YFull analysis complete, along with STR's, and paid up.
4th January - New YTree with changes, in this case mainly age estimates based on the new sample

180 days total, so patience & perseverance definitely required ;)

Their web-site now says 50 days everywhere, on the front-page and on the WES and WGS product pages. I get it's bit difficult for them with different products, and varying timing. Only the microarray test product page says 3 weeks. That should include DNA extraction, but as can be seen, back when I did my test, DNA extraction took 16 days from when they confirmed sample as received, and sequencing + interpretation 48 days. It's likely they've streamlined things since, but there seems to have been stories of much longer wait-times too. I get that if any stage fails quality-control they have to go back and re-do it though.

Thanks for that. :) So it was roughly 12 weeks from receipt for the results/raw data. It's tricky to know what to expect, as the company is quite young - and Amazon have removed all of the reviews for their offerings. Still, lurking and reading this thread convinced me they were legit.

RobertN
02-10-2018, 01:59 PM
Still, lurking and reading this thread convinced me they were legit.

Hehe, same for me :D. They were professional until now and support answered pretty fast to my questions.

My DNA extraction was complete on Jan 17 and since then i'm still waiting for WES seq, hopefully will have it by the end of the month or early next month. Will update once i will have them...for statistics :P

Snoozy
02-15-2018, 05:22 PM
Seems they're having website issues at the moment - can't view the Kit Manager on multiple browsers.

Edit: They're upgrading their systems at the moment, so Kit Manager might not be available for all of their customers.

Petr
02-15-2018, 07:18 PM
Their kit manager is unbelievable crap; it frequently does not work, and I have already reported the problem twice.

Besides other problems, it connects directly to https://snowtriceratops.com:8000/ and some proxy caches do not work with the https protocol on a nonstandard port (the standard is 443), giving an "SSL_ERROR_RX_RECORD_TOO_LONG" error.

The snowtriceratops.com domain is anonymously owned - the registered owner is Domains By Proxy, LLC from Arizona, USA - which does not look very reassuring for the handling of sensitive data.

Donwulff
02-16-2018, 01:30 PM
Their kit manager is unbelievable crap; it frequently does not work, and I have already reported the problem twice.

Besides other problems, it connects directly to https://snowtriceratops.com:8000/ and some proxy caches do not work with the https protocol on a nonstandard port (the standard is 443), giving an "SSL_ERROR_RX_RECORD_TOO_LONG" error.

The snowtriceratops.com domain is anonymously owned - the registered owner is Domains By Proxy, LLC from Arizona, USA - which does not look very reassuring for the handling of sensitive data.

I've never had a problem with their kit manager. snowtriceratops.com doesn't seem to be that much of a problem. Maybe it has changed since you posted, but I'm seeing a DigitalOcean cloud-hosted website with GoDaddy as the registrar. It's used for the kit details, and the actual data comes from Amazon AWS. However, the scripts launch connections all over the place, some of them, like Facebook, LinkedIn and DoubleClick, extremely questionable for this kind of usage. None of them get the "sensitive data", which is served straight off Amazon AWS, though. I do wish companies as a whole put more effort into protecting the actual data, like in this case requiring some token authentication for the download, as the URL will be left in your browser history, proxy logs etc.; it's a mistake I've seen many sequencing companies make. (On the other hand, you can provide a link to the VCF to sites for further analysis with no problems...)

Snoozy
02-27-2018, 11:49 AM
This Thursday will be 3 weeks in extraction. Patience was never my strong suit.

Snoozy
02-28-2018, 04:47 PM
Can't edit my last post. Extraction completed this morning. :)

AMAZIGH
02-28-2018, 11:02 PM
Hello everyone.

I want to order an NGS sequencing test, to get all the information I can about my Y chromosome. Do you recommend Big Y or WGS more? There is a sale now: 400 euros for WGS.

Especially, what works better with Yfull ?

omega56
02-28-2018, 11:43 PM
I don't know what test would be best for you, but just in case you or anyone else orders the WGS: After I clicked add to cart and entered my email, name etc. I waited awhile before finishing checkout and about 5-10 hours later I got an email from dantelabs with a 10% discount code. In the end it cost me 359,10€.

AMAZIGH
03-01-2018, 11:33 AM
I don't know what test would be best for you, but just in case you or anyone else orders the WGS: After I clicked add to cart and entered my email, name etc. I waited awhile before finishing checkout and about 5-10 hours later I got an email from dantelabs with a 10% discount code. In the end it cost me 359,10€.

Thank you !

JamesKane
03-01-2018, 01:17 PM
I want to order an NGS sequencing test, to get all the informations that I can have with my Y-Chromosome. Do you recommand more big Y or WGS ? There is now a sale : 400 euros for WGS.

Especially, what works better with Yfull ?

For short variant detection the most important attribute of the discovery tests is the horizontal coverage of the test. I maintain a chart that compares this metric from a large sample of Direct-2-Consumer options, which have been aligned to GRCh38 and run through an older version of GATK's Best Practice document. It can be found here: http://www.haplogroup-r.org/stats.html

For the short answer, Dante's test is a better value than Big Y, even before factoring in that you also get autosomal DNA and mtDNA results. Big Y is roughly 16k Y-DNA base pairs per dollar. Dante's WGS is roughly 29k Y-DNA base pairs per dollar at today's price. It's also outright cheaper at the current sale price.

AMAZIGH
03-01-2018, 03:56 PM
For short variant detection the most important attribute of the discovery tests is the horizontal coverage of the test. I maintain a chart that compares this metric from a large sample of Direct-2-Consumer options, which have been aligned to GRCh38 and run through an older version of GATK's Best Practice document. It can be found here: http://www.haplogroup-r.org/stats.html

For the short answer, Dante's test is a better value than Big Y, even before factoring in that you also get autosomal DNA and mtDNA results. Big Y is roughly 16k Y-DNA base pairs per dollar. Dante's WGS is roughly 29k Y-DNA base pairs per dollar at today's price. It's also outright cheaper at the current sale price.

Thank you, is it good for SNP discovery ?

JamesKane
03-01-2018, 04:15 PM
Yes, this test will work well for SNPs and short INDELs. The jury is out for how well it will work for long variants due to the 100 base read length.

I really think they are using BGISEQ-500 here. Some info on how it compares with HiSeq is here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190264#sec008

AMAZIGH
03-01-2018, 04:36 PM
Yes, this test will work well for SNPs and short INDELs. The jury is out for how well it will work for long variants due to the 100 base read length.

I really think they are using BGISEQ-500 here. Some info on how it compares with HiSeq is here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190264#sec008

Could you tell me what are long variants please ? STRs ?

Petr
03-01-2018, 08:13 PM
Yes, this test will work well for SNPs and short INDELs.
As I already reported earlier in this thread (https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=339953&viewfull=1#post339953), many SNPs were not shown in the VCF file. Now I have been waiting more than a month for the raw data. What is the best procedure to obtain as many variants as possible? Their VCF file is hg19-aligned, and I suppose the BAM file will be hg19-aligned too, so would the best approach be to analyse the FASTQ file? Using Sequencing.com? Is it hard to set up my Linux workstation (Ubuntu 16.04 LTS) for genome analysis?

JamesKane
03-01-2018, 08:52 PM
My process to convert a BAM to the same hg38 reference build used by the 1000 Genomes project was linked earlier in the thread. It should work without much effort on a Linux box with a JVM.

http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/

I'm in the process of reworking this for GATK 4.

Petr
03-01-2018, 09:09 PM
Thank you. Is it necessary to use GATK 3.6 only? Or 3.7 or 3.8 or 3.8.1 can be used as well?

Donwulff
03-01-2018, 11:44 PM
Yes, this test will work well for SNPs and short INDELs. The jury is still out on how well it will work for long variants, due to the 100 bp read length.

I really think they are using BGISEQ-500 here. Some info on how it compares with HiSeq is here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0190264#sec008

My main reason for doubting it was BGISEQ-500 is that the first papers on BGISEQ-500 started appearing in the summer of 2017, when Dante Labs was already in full swing, and I didn't think BGISEQ-500 was yet commercially available or accepted for whole genome sequencing. However, Dante Labs explicitly reserves the right to change their sequencing technology, and it now seems I was wrong: it's definitely BGISEQ. This explains some of the anomalies I noticed in the adapter sequences, for example that they sometimes seemed to be in the middle of reads instead of at the ends, which can happen in nanoball sequencing but not in Illumina paired-end sequencing. Consequently, I checked the FASTQ data BGI released for their first whole genome sequencing paper, and it follows the same naming convention, explaining that oddity. The technology is likely labeled as Illumina in the BAM for compatibility with current pipelines/programs.

Theoretically this means that if you can identify the adapter sequence, you could preserve the sequence at both ends of the adapter, though this would be so short that it would be unlikely to contribute much to the final sequence. I'm also unclear on whether it would effectively be a duplicate of the read from the other end; I need to look into that. Unfortunately I think this could have other implications too, since BWA MEM will soft-clip adapters at the ends of reads so they don't need to be trimmed, but if the adapter is in the middle of a read, it could be interpreted as structural variation. It appears the specifics of the technology are proprietary, though, and I'm not entirely sure how to recognize the adapter sequences. Anyway, overall BGISEQ quality looks competitive with, if not better than, Illumina, so it's great value regardless, but more research and bioinformatics tool development might help.

Edit: Reference to the reference (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467036/), because it has some simplified diagrams for BGISEQ technology etc.

JamesKane
03-02-2018, 12:07 AM
Thank you. Is it necessary to use GATK 3.6 only, or can 3.7, 3.8, or 3.8.1 be used as well?
The workflow should work for the entire GATK 3 line. GATK 4 changed a lot of the command-line options.

Miroslav
03-02-2018, 02:16 AM
I am interested in the WGS discount sale (Europe) because of the value for money it offers. I have already done Big Y and YFull, but I am interested in an alternative test to compare against and to confirm novel SNPs (length coverage was 58%, with 87x mean and 54x median depth coverage, and 58% of my novel variants were ambiguous). I'm slightly disappointed with Big Y now that I know its limits. Generally speaking, which test would you advise as best, in comparison to Dante's and according to JamesKane's chart?

Donwulff
03-02-2018, 02:46 AM
Looking at the supplementary materials of the "A reference human genome dataset of the BGISEQ-500 sequencer (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467036/)" paper, I happened to find the official adapter sequences for the BGISEQ, which on a quick grep appear to match the Dante Labs data.

⦁ Filtering
SOAPnuke 1.5.3: SOAPnuke filter -l 10 -q 0.1 -n 0.01 -Q 2 -G -f AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA -r CAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT

As I previously reported, there seems to be significant variation in the exact adapter sequence in the data, so a one-to-one match may not be enough. SOAPnuke (https://github.com/BGI-flexlab/SOAPnuke) is BGI's own quality-control and trimming program. The source code at https://github.com/BGI-flexlab/SOAPnuke/blob/master/src/FilterProcessor.cpp suggests the parameters correspond to "low quality threshold 10 (default 5)", "low quality rate 0.1 (default 0.5)", "N rate threshold 0.01 (default 0.05)", and "sanger (Illumina) quality input and output". These should correspond to the thresholds explained in the paper. Because it is better documented and characterized, I'm currently testing the results of using Trimmomatic (http://www.usadellab.org/cms/index.php?page=trimmomatic) for trimming with those adapter sequences, although it will take quite a while due to the size of the data. It's unclear whether the sequence has already been quality-filtered with the thresholds in the paper (or similar), but I suppose I will have to run some tests on that too if/when I have time ;)
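For reference, a minimal Trimmomatic invocation along those lines; the adapters.fa file name and the clipping thresholds are only example choices, not a validated recipe for BGISEQ data:

# adapters.fa holds the two SOAPnuke adapter sequences quoted above:
#   >bgiseq_forward
#   AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA
#   >bgiseq_reverse
#   CAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT
java -jar trimmomatic-0.36.jar PE -threads 4 -phred33 \
  sample_1.fq.gz sample_2.fq.gz \
  sample_1.trimmed.fq.gz sample_1.unpaired.fq.gz \
  sample_2.trimmed.fq.gz sample_2.unpaired.fq.gz \
  ILLUMINACLIP:adapters.fa:2:30:10 MINLEN:36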

boaziz
03-06-2018, 11:42 PM
" I just received my Dante results. I was told that the raw data will be sent on a flash disk by end of February."

Me too. But an update:
"You are expected to receive the USB drive by the end of March." :(

dbm
03-07-2018, 01:08 AM
" I just received my Dante results. I was told that the raw data will be sent on a flash disk by end of February."

Me too. But an update:
"You are expected to receive the USB drive by the end of March." :(

Thanks! How long ago did your DNA extraction finish? And do you find the report useful - e.g., does it contain the SNPs, etc.?

Wireless
03-07-2018, 11:07 AM
" I just received my Dante results. I was told that the raw data will be sent on a flash disk by end of February."

Me too. But an update:
"You are expected to receive the USB drive by the end of March." :(

Their service is a joke; I am still waiting for the raw data from 2 WES tests. Here is the timeline:

30/08/2017 - DNA extraction
13/11/2017 - WES results VCF and reports
21/11/2017 - email for RAW data "We are going to send them by next week. "
15/12/2017 - email reply to update request "We are doing our best to fulfill your request as quickly as possible. "
13/01/2018 - email reply to update request "We are going to send you two 128 GB USB flashdrive containing BAM files related to both your sample by the end of next week, free of charge."
23/01/2018 - email reply to update request "We are going to send you the USB Flashdrive containing BAM files related to both your samples by Friday."
28/02/2018 - email reply to update request "Unfortunately data transfer takes longer than expected. Anyway, I forwarded your request to our Biotech Department so as to boost the process. "

Seriously, I don't know what to do, and the bad news is that I have 2 other WES samples in the pipeline with them.
It has been almost 4 months since the results and still no RAW files.

Incredible..

rock
03-12-2018, 05:22 PM
One week after they received my sample, the kit manager status is still "Waiting confirmation from Dante Labs". Anyone in the same situation?

omega56
03-12-2018, 08:00 PM
They received my sample last week on Thursday, and this morning I got an email confirming that they received it; the status changed to "Kit received".
If you're worried, just send them an email asking if they received your sample.

Snoozy
03-15-2018, 09:54 PM
Their service is a joke; I am still waiting for the raw data from 2 WES tests. Here is the timeline:

30/08/2017 - DNA extraction
13/11/2017 - WES results VCF and reports
21/11/2017 - email for RAW data "We are going to send them by next week. "
15/12/2017 - email reply to update request "We are doing our best to fulfill your request as quickly as possible. "
13/01/2018 - email reply to update request "We are going to send you two 128 GB USB flashdrive containing BAM files related to both your sample by the end of next week, free of charge."
23/01/2018 - email reply to update request "We are going to send you the USB Flashdrive containing BAM files related to both your samples by Friday."
28/02/2018 - email reply to update request "Unfortunately data transfer takes longer than expected. Anyway, I forwarded your request to our Biotech Department so as to boost the process. "

Seriously, I don't know what to do, and the bad news is that I have 2 other WES samples in the pipeline with them.
It has been almost 4 months since the results and still no RAW files.

Incredible..

That's really disheartening. Did you get gVCF files at all? I'll be needing those, not the BAM files, and am hoping they won't need to send out a USB now.

Snoozy
03-15-2018, 10:30 PM
Hehe, same for me :D. They have been professional until now, and support answered my questions pretty fast.

My DNA extraction was completed on Jan 17 and since then I'm still waiting for the WES sequencing; hopefully I will have it by the end of the month or early next month. Will update once I have them... for statistics :P

Did you get your results yet? Seems about that time. :)

RobertN
03-15-2018, 10:33 PM
I really wanted to post an hour ago, but I said I would update next week - this is crazy :D.

Since you have asked: no, no updates. From what I saw in this thread the average time is somewhere around ~70 days for WGS. I don't know if WES should take less from a technical point of view, but I don't really expect any results until the end of the month...

Wireless
03-16-2018, 01:44 PM
That's really disheartening. Did you get gVCF files at all? I'll be needing those, not the BAM files, and am hoping they won't need to send out a USB now.

I did get VCF (not gVCF) files online in the kit manager dashboard, along with a PDF from Sequencing.com's Wellness and Longevity app, which is not so detailed or useful, to tell you the truth. I have played with Promethease, but I have been waiting for the BAM files to process the data myself.

dbm
03-17-2018, 05:35 AM
I did get VCF (not gVCF) files online in the kit manager dashboard, along with a PDF from Sequencing.com's Wellness and Longevity app, which is not so detailed or useful, to tell you the truth. I have played with Promethease, but I have been waiting for the BAM files to process the data myself.

I thought the FASTQ files were available online, with the BAM getting mailed on a USB. If you have the FASTQ files, I think that's all Sequencing.com needs. Am I incorrect? Thanks!

Petr
03-17-2018, 07:57 AM
No, both BAM and FASTQ files are delivered on a USB flash drive. And I'm still waiting for them, almost 2 months after delivery of the VCF file.

Renchuga
03-19-2018, 01:21 PM
Anyone else experiencing problems with their kit manager dashboard lately? I registered my kit last week; it was visible for a while (with the tag that it is waiting for confirmation from Dante Labs), then it disappeared from the dashboard. Apparently they are upgrading their system.

Renchuga
03-19-2018, 01:40 PM
I'm having the same problems and when I finally manage to log in, I can't see the kit I have already registered. I contacted their customer service and they responded that they are in the process of upgrading their system. More than a month for a system upgrade... not impossible but it does raise a red flag.

Renchuga
03-19-2018, 01:42 PM
Is anyone else having a problem logging into their Dante Labs account? I have never had a problem until today. Now when I put in my email and my password it says it is incorrect, so I clicked on "forgot password" and it tells me that my email address does not have an account. My results should be ready soon and it looks like they deleted my account. Or is this happening to other people too and it's just a temporary glitch? Is anyone else having problems logging into their account today? Thanks

I'm having the same problems and when I finally manage to log in, I can't see the kit I have already registered. I contacted their customer service and they responded that they are in the process of upgrading their system. More than a month for a system upgrade... not impossible but it does raise a red flag.

Wireless
03-19-2018, 04:30 PM
Yup, it's not working for me either... I log in, click on kit manager and nothing happens... tried it with both Firefox and IE, same thing.

Donwulff
03-19-2018, 05:53 PM
Maybe not necessary for everybody to check in, but tried again just now, works fine. I've yet to experience any problems with their kit manager. I wonder if it has to do with the newer kits, or something?
I don't have gVCF download for my WGS, so is that new? I hope that people are well aware of the limitations of genetics; I don't think the FASTQ and BAM are useful for anybody besides researchers at present.
Promethease can't give you an analysis for variants which aren't already well known, and those are already listed in the Dante Labs VCF. If you re-run the analysis, you're including a lot of erroneously called variants, and perhaps 99% of Promethease's interpretations are speculative at best. The only reason I've suggested re-analysis at Sequencing.com is to obtain the gVCF for GEDmatch Genesis. But if Dante Labs provides gVCF now, and GEDmatch Genesis doesn't seem to accept WGS gVCF, it may no longer matter.
People certainly deserve the raw data they've paid for, and I'm kind of wondering what the holdup is, but unless you're a bioinformatician or are going to send the (WGS) data to a third-party interpretation service like YFull, I'm not sure how useful it will be.

Wireless
03-20-2018, 01:43 PM
Maybe not necessary for everybody to check in, but tried again just now, works fine. I've yet to experience any problems with their kit manager. I wonder if it has to do with the newer kits, or something?
I don't have gVCF download for my WGS, so is that new? I hope that people are well aware of the limitations of genetics; I don't think the FASTQ and BAM are useful for anybody besides researchers at present.
Promethease can't give you an analysis for variants which aren't already well known, and those are already listed in the Dante Labs VCF. If you re-run the analysis, you're including a lot of erroneously called variants, and perhaps 99% of Promethease's interpretations are speculative at best. The only reason I've suggested re-analysis at Sequencing.com is to obtain the gVCF for GEDmatch Genesis. But if Dante Labs provides gVCF now, and GEDmatch Genesis doesn't seem to accept WGS gVCF, it may no longer matter.
People certainly deserve the raw data they've paid for, and I'm kind of wondering what the holdup is, but unless you're a bioinformatician or are going to send the (WGS) data to a third-party interpretation service like YFull, I'm not sure how useful it will be.

Thank you for your input, Donwulff; it is very much appreciated. The purpose of acquiring the raw files of MY exome is self-explanatory, I believe: they belong to me, and I don't have much trust in the workflow they used to produce the VCF I have been given. I would like, as you say, to put it through re-analysis at Sequencing.com in order to produce a gVCF, and also to be able to keep analysing it in the future through potential new services.

The kit manager is still not working for me, which is very strange, as is the delay in sending out the USB with the raw data.

Here is a link I found which sheds more light on the Dante Labs company: www . seedinvest . com / dante.labs / seed

Wireless
03-20-2018, 01:57 PM
The comments, questions and answers on the SeedInvest link are very enlightening.

This is something that all of us were expecting, given the low pricing for WGS and WES, but here it is in detail:


"The Company has incurred losses from inception of approximately $257,752 which, among other factors, raises substantial doubt about the Company's ability to continue as a going concern."

Donwulff
03-22-2018, 02:31 PM
Very interesting link - I wonder why I didn't find the investment page earlier! Though I'm no investor myself, I would like to note that the "incurred losses from inception" of $257,752 include all the costs of getting the company off the ground. That doesn't sound high for a modern startup, and running at a loss is oddly common for modern companies. One of the worst examples is Twitter, with around $2 billion in losses since going public (i.e. that doesn't even include the costs of getting the company off the ground!). It's definitely something for investors to consider, but at this point I don't think it indicates the company lacks a plan for profitability, if they can execute it.

Of course, I've commented from the start that their WGS discount price seems to barely cover their own costs of shipping the sample kit, extracting the DNA, running the analysis and sending out the USB drives, never mind the sequencing itself. At launch, BGISEQ whole genome sequencing was stated to cost about $600, or about 485 EUR. Dante Labs is currently selling it at 499 EUR. On the other hand, the closest competitor, Veritas Genetics, sells WGS at $999 or 810 EUR (including brief genetic counseling), but only in the USA. It's not unreasonable to think from those figures that Dante Labs could be taking around a 50% loss on every kit; their strategy as a new company must be to capture as much of the market as they can before running out of investor money, then raise the price and hope to become profitable. Most DNA companies have had introductory prices for new products for this reason.

Wireless
03-22-2018, 03:53 PM
Hi Donwulff,

I hope you had a chance to look at the comments, I found this particularly interesting.


Sequencing (WGS and WES) is a process with low differentiation opportunities, as labs use the same machines. We receive raw data from labs (FASTQ, BAM, VCF).

Then we perform Variant Annotations on more than 30 databases (vs. only ClinVar as most labs do), and we automatically generate customized reports. We built this system and this is proprietary.

Our hereditary disease reports are proprietary. Our database is also proprietary.

The decision about the patent submission is expected by early 2019.

And


2) What is proprietary, and what does that mean competitors products will not be able to do?

The data and the data analytics are proprietary. We are collecting more data points than anyone else and building proprietary country specific datasets and recruiting doctors with exclusive contracts so that competitors will have a lot to catch up. We are really building a health digital platform powered by whole genome data.

I have no idea if they did this for the WES kits I ordered, since the report in the kit manager results is obviously made with the Sequencing.com app, as mentioned in previous posts.

On a more positive note, they have apparently finally sent the USB with the raw data, with delivery expected tomorrow. The tracking code is confirmed. It took them four and a half months for that.

dbm
03-22-2018, 07:47 PM
Very interesting link - I wonder why I didn't find the investment page earlier! Though I'm no investor myself, I would like to note that the "incurred losses from inception" of $257,752 include all the costs of getting the company off the ground. That doesn't sound high for a modern startup, and running at a loss is oddly common for modern companies. One of the worst examples is Twitter, with around $2 billion in losses since going public (i.e. that doesn't even include the costs of getting the company off the ground!). It's definitely something for investors to consider, but at this point I don't think it indicates the company lacks a plan for profitability, if they can execute it.

Of course, I've commented from the start that their WGS discount price seems to barely cover their own costs of shipping the sample kit, extracting the DNA, running the analysis and sending out the USB drives, never mind the sequencing itself. At launch, BGISEQ whole genome sequencing was stated to cost about $600, or about 485 EUR. Dante Labs is currently selling it at 499 EUR. On the other hand, the closest competitor, Veritas Genetics, sells WGS at $999 or 810 EUR (including brief genetic counseling), but only in the USA. It's not unreasonable to think from those figures that Dante Labs could be taking around a 50% loss on every kit; their strategy as a new company must be to capture as much of the market as they can before running out of investor money, then raise the price and hope to become profitable. Most DNA companies have had introductory prices for new products for this reason.

There is also a series of articles about Dante Labs in the news, about them partnering to provide data to third parties. I think some of them have actually been removed. You can search for them - I'm not allowed to provide links. Also interesting - and others can correct me if I'm wrong - the privacy policy at the bottom of their page has disappeared. It seems that happened around the same time they further lowered their prices.

In the most recent articles, I see this statement from Dante Labs:

"All our samples are anonymous. Data, DNA, reports and samples have no personal identification information, not even the country of origin."

I believe that most DNA companies charge less than it costs them to sequence, and make most of their money selling 'anonymous' data to third parties. I have it in writing from Dante that they will not share my information with anyone, so I am guessing I ordered before all this switched. I hope so, at least.

lukaszM
03-24-2018, 04:46 PM
interesting:)

Donwulff
03-25-2018, 07:08 PM
A quick thought: when this started, Dante Labs was EU-only, as far as I can tell. Now they also have a US site, so they are apparently offering it in the US as well. Their US site indeed doesn't have a Privacy Policy link on the front page, but the EU site still does. However, it's linked deeper on the US site: https://us.dantelabs.com/pages/privacy-policy and https://www.dantelabs.com/pages/privacy-policy - I believe they're identical; maybe I need to diff them.

Anyway, I believe they're an EU-based company; even if they're not, they're definitely targeting the EU market and have a presence (office) in the EU, so they will be bound by the GDPR for all their processing, which is pretty strict about genetic data. I'm not sure they can reasonably escape that. In any case, *fully* anonymized data is not "personal data"; however, individual genetic sequences can't be fully anonymized, as it would always be possible to upload them to a genetic genealogy site and see whom they match. As such, Dante Labs' definition isn't exactly correct: "Anonymized information: information that has been stripped of your Registration Information (e.g., your name and contact information) and other identifying data such that you cannot reasonably be identified as an individual." - but this is why you define what you mean by these terms.

Most people don't seem to understand this about genetic information, but it's only valuable insofar as you can connect it to specific phenotypes, i.e. that the person had a specific disease, etc. Secondly, as far as actual research goes, ever since the fall of Nazi Germany (and human experiments like Dr. Mengele's), the Nuremberg Code has required explicit, specific consent for all human research. True, there certainly are secret government agencies and evil megalomaniac scientists who don't care about those points, but then they would have no use for waiting for people to randomly send in their samples either - they'd just arrange a blood drive, or ransack their trash for any chewed gum, comb with hair, or used cup. As such, people's worry that their DTC genetic testing results will be used for research or "other" evil purposes without their consent is somewhat misplaced.

The Privacy Policies currently clearly state "We will not sell, lease, or rent your individual-level information (i.e., information about a single individual's genotypes, diseases or other traits/characteristics) to any third-party or to a third-party for research purposes without your explicit consent." and then go to further stress the same point.

The PP does state: "Aggregate information. We may share aggregate information with third-parties, which is any information that has been stripped of your Registration Information (e.g., your name and contact information) and aggregated with information of others so that you cannot reasonably be identified as an individual ("Aggregate Information")." Conceivably, if they ask for additional information - for example, whether you have had breast cancer - the aggregate information could include things like "8% of 100 people with this variation have had breast cancer". Usually, this sort of data is mainly used for reporting the results of research a user has consented to. The issue is that you can withdraw consent, but they can't withdraw the research paper with the results when someone withdraws consent. There's been considerable research into using these sorts of anonymous summary results for further research, but it's generally challenging because, for example, you want to be able to separate different confounders in the research, like whether the participants have used oral contraceptives. So usually, you use consent for research, and unidentifiable aggregate data for publishing the results of the research.

Donwulff
03-25-2018, 07:43 PM
Looking further at the monthly projections in their investment round's supporting information, it looks like they're planning to become profitable by year end, on sales of genetic tests to consumers only. I cannot really comment on the viability of that, other than that I think it's fairly optimistic given they've listed average test prices at around $600, but of course this will depend entirely on demand. However, at least their sales pitch doesn't seem to include sales of data - yet - but yes, most genetic testing companies are looking at that option.

Interestingly, in their sales pitch they say 40% of the investment money will go into software development. In the monthly projections, they've projected software revenue starting at $400 a month and increasing very slowly. There's also "Data Analysis" with substantial income. I believe they're currently offering a gene panel of choice along with the sequences to customers, so it could be monetizing that. 23andMe has a research portal where qualifying and paying researchers can run analyses on consenting customers' data without getting possession of the data themselves (great both for ensuring they get paid for everything and for preventing researchers from putting the data to unethical uses). The simplest case would be for Dante Labs to do the same, perhaps with 40% of the investment going to developing the portal, a $400 license per research organization, and roughly an additional $1,000 per analysis run. This is not investment advice, just common-sense reasoning about what those numbers could mean. Of course, they would adjust prices as needed.

On the EU-side service, I do not see or remember a research consent document, though. Has anybody actually accepted one?

Another thought worth noting is that many of these startups will get bought by larger biotechnology firms; they may even plan for this (i.e. not aiming for profitability at all themselves). This would have some effect on the use of the data, as the biotech firm could use it directly themselves, although it still wouldn't remove the need for demonstrable consent.

rock
03-26-2018, 09:00 AM
One week after they received my sample, the kit manager status is still "Waiting confirmation from Dante Labs". Anyone in the same situation?

Today I got a notification email. I thought it was telling me the DNA extraction was complete, but it only told me the kit was received. My kit was delivered on March 6 according to the tracking website; after I contacted the helpdesk, the kit manager status changed to "Kit received" on March 13, and only today did I get the email notification. Weird - it seems the system has a delay.

Wireless
03-26-2018, 08:07 PM
Update: Today, 26/03/2018, I finally received the USB drive with the raw files of the 2 WES tests whose results they posted on 13/11/2017. It took them 4.5 months to put them on a USB stick and mail them, and it took me 7 whole minutes to copy them from the USB stick to my SSD.

Donwulff
03-26-2018, 09:04 PM
Meanwhile, FTDNA BAM files are at 6.5 months and counting, and they don't even give you free USB sticks. DNA testing companies really need to work on staying on schedule, and on expectation management in particular, though I understand the pressure on DNA testing companies to promise the moon and then try to figure out how to deliver it on a shoestring budget. Sending USB sticks by courier isn't sustainable for Dante Labs, and it's not secure - I got mine just by standing around when the courier arrived, with no proof of identity - so hopefully they will provide a download link. This would also solve what is IMHO the biggest problem: passing the data on for further processing at YFull etc. with just a mobile phone.

As an aside, has anyone used the paid EvE versions etc. on Sequencing.com and can say whether their bioinformatics workflows are actually any good? With the free version I've only been able to call a gVCF on a pre-mapped BAM.
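For comparison, calling a gVCF locally from an already-mapped, indexed BAM is a single GATK command (GATK 3.x syntax; the file names are placeholders, and this is of course not Sequencing.com's EvE pipeline):

# Emit a gVCF using the reference-confidence model
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller \
  -R ucsc.hg19.fasta \
  -I sample.bam \
  --emitRefConfidence GVCF \
  -o sample.g.vcf.gz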

Kaper
03-28-2018, 07:11 PM
Any idea how long processing is currently taking? I purchased my kit on November 10 and still do not have the results!
I'm a little disappointed in the length of time it is taking and in their lack of communication. Does anybody know roughly how long processing currently takes? I emailed them but haven't heard back.

Kaper
03-28-2018, 07:14 PM
This happened to me in January. I bought my kit in November and was told in early January that the extraction was successful. Then, when I went to check later in January, my account had been deleted. They did set up a new account for me after I contacted them, but I have still not gotten my results and I am getting worried. About how long is it taking these days to get results?

Donwulff
03-28-2018, 09:11 PM
As they now seem to have separate US and EU operations, it would be useful for people to specify which one they're using. For me in the EU, last summer it took 16 days to DNA extraction and 49 days for sequencing and interpretation; BAM delivery times seem to vary. For another poster, it seems to have been 10 weeks for WES plus interpretation, region unknown. Incidentally, their EU site now says 50 days and the US site 10 weeks. So sequencing plus interpretation, not including DNA extraction and raw data delivery, may match their claimed turnaround time. If they had to re-do the data for quality control reasons, I can understand the time nearly doubling.
I do wish Dante Labs luck, because they seem to be the only DTC sequencing service currently serving the EU market. After their acquisition, Genos Research is no longer mentioning service to the EU and is WES-only, and Full Genomes Corporation has stopped offering services to EU countries due to the GDPR. FTDNA only does targeted Y-chromosome sequencing with old reference build targets, and hasn't been providing raw data for over half a year (after another long break last year). Veritas Genetics still seems restricted to the US. Are there any other reliable DTC sequencing services?

Snoozy
03-29-2018, 06:37 PM
Any idea how long processing is currently taking? I purchased my kit on November 10 and still do not have the results!
I'm a little disappointed in the length of time it is taking and in their lack of communication. Does anybody know roughly how long processing currently takes? I emailed them but haven't heard back.

Did you get an answer from them?

MacUalraig
03-30-2018, 11:29 AM
As they now seem to have separate US and EU operations, it would be useful for people to specify which one they're using. For me in the EU, last summer it took 16 days to DNA extraction and 49 days for sequencing and interpretation; BAM delivery times seem to vary. For another poster, it seems to have been 10 weeks for WES plus interpretation, region unknown. Incidentally, their EU site now says 50 days and the US site 10 weeks. So sequencing plus interpretation, not including DNA extraction and raw data delivery, may match their claimed turnaround time. If they had to re-do the data for quality control reasons, I can understand the time nearly doubling.
I do wish Dante Labs luck, because they seem to be the only DTC sequencing service currently serving the EU market. After their acquisition, Genos Research is no longer mentioning service to the EU and is WES-only, and Full Genomes Corporation has stopped offering services to EU countries due to the GDPR. FTDNA only does targeted Y-chromosome sequencing with old reference build targets, and hasn't been providing raw data for over half a year (after another long break last year). Veritas Genetics still seems restricted to the US. Are there any other reliable DTC sequencing services?

YSEQ in Berlin have multiple WGS options and use a lab in Germany.

https://www.yseq.net/product_info.php?cPath=29&products_id=42468

Petr
03-30-2018, 09:06 PM
My process to convert a BAM to the same hg38 reference build used by the 1000 Genomes project was linked earlier in the thread. It should work without much effort on a Linux box with a JVM.

http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/

I'm in the process of reworking this for GATK 4.

I just prepared my Linux (Ubuntu 16.04 LTS) machine for the process. It was necessary to modify several things, but the process now works with an hg19 FTDNA BAM.

But how to use it with Dante raw data? There are files:
clean_data
15001702300576A_1.fq.gz (40.2 GiB)
15001702300576A_2.fq.gz (42.4 GiB)
result_alignment
15001702300576A.bam (81.1 GiB)
15001702300576A.bai (8.44 MiB)
I'm not sure about the alignment, but since the VCF files are hg19-aligned, the BAM file is probably aligned to hg19 as well.

Is it better to use the BAM or the FASTQ files? I'd guess that FASTQ should be better, but I'm not sure how to use these two files.
Do all the commands and parameters remain the same? I suppose FTDNA and Dante use different processes?

When do you expect to finish the rework for GATK 4?

Petr
03-30-2018, 09:09 PM
The workflow should work for entire line of GATK 3. 4 changed a lot of the command line options.
There is some error with GATK 3.7, but 3.6 works fine.

Petr
04-03-2018, 12:22 PM
I tried to align the delivered FASTQ files using bwa, but it failed:

[email protected]:~/Data/15001702300576A$ ~/Genomics/bwa/bwa mem -t 4 -M ~/Genomics/Reference/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa clean_data/15001702300576A_1.fq.gz clean_data/15001702300576A_2.fq.gz >test.sam
[M::bwa_idx_load_from_disk] read 3171 ALT contigs
[M::process] read 400000 sequences (40000000 bp)...
[M::process] read 400000 sequences (40000000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 1, 0, 1)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] skip orientation FR as there are not enough pairs
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[mem_sam_pe] paired reads have different names: "CL100054478L1C011R062_553015", "CL100054478L1C001R001_12"
[mem_sam_pe] paired reads have different names: "CL100054478L1C011R062_553024", "CL100054478L1C001R001_18"
[mem_sam_pe] paired reads have different names: "CL100054478L1C011R062_553021", "CL100054478L1C001R001_16"

I found that this could be corrected by BBmap, but there is another problem:

[email protected]:~/Data/15001702300576A$ ~/Genomics/bbmap/repair.sh -Xmx28g in1=clean_data/15001702300576A_1.fq.gz in2=clean_data/15001702300576A_2.fq.gz out1=dante1_fixed.fq out2=dante2_fixed.fq outsingle=single.fq overwrite=t
java -ea -Xmx28g -cp /home/petr/Genomics/bbmap/current/ jgi.SplitPairsAndSingles rp -Xmx28g in1=clean_data/15001702300576A_1.fq.gz in2=clean_data/15001702300576A_2.fq.gz out1=dante1_fixed.fq out2=dante2_fixed.fq outsingle=single.fq overwrite=t
Executing jgi.SplitPairsAndSingles [rp, -Xmx28g, in1=clean_data/15001702300576A_1.fq.gz, in2=clean_data/15001702300576A_2.fq.gz, out1=dante1_fixed.fq, out2=dante2_fixed.fq, outsingle=single.fq, overwrite=t]

Set INTERLEAVED to false
Started output stream.
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding$StringDecoder.decode(String Coding.java:149)
at java.lang.StringCoding.decode(StringCoding.java:19 3)
at java.lang.StringCoding.decode(StringCoding.java:25 4)
at java.lang.String.<init>(String.java:546)
at stream.FASTQ.makeId(FASTQ.java:568)
at stream.FASTQ.quadToRead(FASTQ.java:786)
at stream.FASTQ.toReadList(FASTQ.java:711)
at stream.FastqReadInputStream.fillBuffer(FastqReadIn putStream.java:109)
at stream.FastqReadInputStream.nextList(FastqReadInpu tStream.java:94)
at stream.ConcurrentGenericReadInputStream$ReadThread .readLists(ConcurrentGenericReadInputStream.java:6 77)
at stream.ConcurrentGenericReadInputStream$ReadThread .run(ConcurrentGenericReadInputStream.java:653)
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.regex.Matcher.<init>(Matcher.java:225)
at java.util.regex.Pattern.matcher(Pattern.java:1093)
at java.util.regex.Pattern.split(Pattern.java:1206)
at java.lang.String.split(String.java:2380)
at java.lang.String.split(String.java:2422)
at jgi.SplitPairsAndSingles.repair(SplitPairsAndSingl es.java:705)
at jgi.SplitPairsAndSingles.process3_repair(SplitPair sAndSingles.java:531)
at jgi.SplitPairsAndSingles.process2(SplitPairsAndSin gles.java:293)
at jgi.SplitPairsAndSingles.process(SplitPairsAndSing les.java:219)
at jgi.SplitPairsAndSingles.main(SplitPairsAndSingles .java:37)

This program ran out of memory.
Try increasing the -Xmx flag and using tool-specific memory-related parameters.

This computer has 32 GiB of RAM, so -Xmx28g is the maximum. I started with -Xmx8g and the result was the same - the new fixed fq files had a size of 4486 MiB, regardless of the -Xmx value.

Is it normal for Dante data that the paired reads have different names? How should I work with them?

Donwulff
04-04-2018, 04:24 PM
The read names reported by BWA are actually close to each other (per file, that is), so I'm not entirely sure what's going on there. Did they deliver more than two FASTQ files? My BAM actually had a number of read groups. Alternatively, if the FASTQ files are created by "samtools fastq", the BAM file needs to be queryname-sorted or they won't match, but then you'd likely get randomly ordered read names.

https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/repair-guide/ says that "Repairing (repair flag) arbitrarily disordered files will take a lot of memory - potentially, all reads need to be stored in memory.", so it seems you can't use it in this case. A common way to sort FASTQ files is "zcat reads_1.fq.gz | paste - - - - | sort -S 6G --parallel=4 -T /tmp | tr '\t' '\n' | gzip -c > reads_sorted_1.fq.gz" (you need enough temp space to store the uncompressed FASTQ, or pass --compress-program).

I have not tried this yet, but I think you might be able to move straight to GATK uBAM workflow though:
https://gatkforums.broadinstitute.org/gatk/discussion/6484/how-to-generate-an-unmapped-bam-from-fastq-or-aligned-bam

Assuming FastqToSam doesn't check that the FASTQ names match, that should work. Then you could run RevertSam to sort them into queryname order, and finally BWA MEM on the resulting uBAM. Apart from your commands above, you really need to set the read group data when you're mapping, because subsequent tools will require it. Do the FASTQ files have an RG:Z: tag? If not, your best bet is just to run RevertSam on the originally provided BAM instead.

Edit: I think you may need/want to use SortSam to sort them into the correct order either way; re-reading the page, it doesn't seem that RevertSam will do the sorting, it just assumes they're already sorted.
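A sketch of the uBAM route starting from the delivered BAM rather than the FASTQs, assuming Picard's RevertSam behaves as described in that article (paths and memory settings are just examples):

# Strip the old alignment but keep read groups, producing a queryname-sorted uBAM
java -Xmx8g -jar picard.jar RevertSam \
  I=15001702300576A.bam \
  O=15001702300576A.unmapped.bam \
  SANITIZE=true \
  REMOVE_ALIGNMENT_INFORMATION=true \
  REMOVE_DUPLICATE_INFORMATION=true \
  SORT_ORDER=queryname

The resulting uBAM keeps the @RG header lines and per-read RG tags, so the read-group problem goes away when it is fed back through bwa via SamToFastq or MergeBamAlignment.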

Petr
04-04-2018, 07:10 PM
Thank you Don. I received these files from Dante:

clean_data folder
15001702300576A_1.fq.gz (40.2 GiB)
15001702300576A_2.fq.gz (42.4 GiB)
result_alignment folder
15001702300576A.bam (81.1 GiB)
15001702300576A.bai (8.44 MiB)

15001702300576A_1.fq.gz has reads named: 553015 553019 553021 553024 553027 553029 553034 553036 553040 553045 ...
15001702300576A_2.fq.gz has reads named: 12 13 16 18 19 42 54 57 61 63 ...

repair.sh repairs the first 4.5 GB of these files by changing numbers 12 13 16 18 19 42 54 57 61 63 ... to 553015 553019 553021 553024 553027 553029 553034 553036 553040 553045 ...

It looks like the problem is not with the ordering but with the names only.

The FASTQ files look like:

@CL100054478L1C011R062_553015/1
ATGAAATTAAATACACATATAAAGTGCTTAGAACCCAGTAAAAGGTTACT ATATTTTAAATGGTATTACTATGCCTTAACATCTGTTCTGGTTTTTCTAT
+
[email protected]<ECAF6AE=>FFEA8>E9E27E>C>>3;DD;:FDEEFAF5DD5F.<F:[email protected]<[email protected](3DA<BCDF4F
@CL100054478L1C011R062_553019/1
ATGAACACATATAGCACCATGACCCCCCCCCCCCCCCCCCCCAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAACCACACAACA
+
GGFFGFFFFFEFFFFAFFGEE:[email protected]:E02'6<[email protected]>F><=FFE3CCDCEFACED/>96C?=:<F2>(+,)'++>49)2+6(
@CL100054478L1C011R062_553021/1
TGGGCTCTAGTTTTCCAATGTCCTGAGAGGCAGTGTATCATAGTGAAGGA AAAAGGAGGCTTCAGGATGCCGGCTTTGCCACTTACTAGGTGTCCAAGCT

I see no RG: tag there.

What are the possible advantages or disadvantages of using the BAM file instead of the FASTQ files for the analysis?

Donwulff
04-04-2018, 10:59 PM
I'm not certain what you mean by "the problem is not with the ordering but with the names only" - surely that's a problem of ordering? Also, it sounds like DL did just extract the FASTQ files from the BAM, but did so poorly, without preserving queryname order or read groups, which kind of sucks, though it's always possible to extract the FASTQ from the BAM anyway. "samtools fastq" requires the -t option to preserve RG (and BC/QT, if they exist) tags; Casava 1.8 format with -i might help too. Normally, though, each read group goes into a separate file, which, based on the BAM file, was originally the case for Dante Labs too.

https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups "There is no formal definition of what is a read group, but in practice, this term refers to a set of reads that were generated from a single run of a sequencing instrument." Actually, I'm not entirely sure what to make of this, because when I look at the Dante Labs read groups, for example, they run like:
@CL100XXXXXXL2C001R004_1 RG:Z:CL100XXXXXX_L02_17
@CL100XXXXXXL2C001R004_3 RG:Z:CL100XXXXXX_L02_22
@CL100XXXXXXL2C001R004_4 RG:Z:CL100XXXXXX_L02_18
@CL100XXXXXXL2C001R004_5 RG:Z:CL100XXXXXX_L02_19
@CL100XXXXXXL2C001R004_6 RG:Z:CL100XXXXXX_L02_24
@CL100XXXXXXL2C001R004_7 RG:Z:CL100XXXXXX_L02_21
@CL100XXXXXXL2C001R004_10 RG:Z:CL100XXXXXX_L02_18
@CL100XXXXXXL2C001R004_11 RG:Z:CL100XXXXXX_L02_24
@CL100XXXXXXL2C001R004_12 RG:Z:CL100XXXXXX_L02_19
@CL100XXXXXXL2C001R004_13 RG:Z:CL100XXXXXX_L02_23
@CL100XXXXXXL2C001R004_14 RG:Z:CL100XXXXXX_L02_23
@CL100XXXXXXL2C001R004_15 RG:Z:CL100XXXXXX_L02_24
@CL100XXXXXXL2C001R004_16 RG:Z:CL100XXXXXX_L02_22
@CL100XXXXXXL2C001R004_18 RG:Z:CL100XXXXXX_L02_22
@CL100XXXXXXL2C001R004_21 RG:Z:CL100XXXXXX_L02_24

So the read groups are randomly interlaced there among the querynames. This, however, suggests it's not possible to derive the read groups from the read names themselves. The base quality score recalibration (BQSR) phase of the variant calling, however, builds a separate sequencing quality profile for each read group, which isn't possible if the read group information is lost. I should play with the covariates files to see whether (for my sample) this actually makes a noticeable difference, but in short, you can't do the GATK best practices workflow if you don't know which read group each read belongs to. MarkDuplicates has a new, more effective mode that requires the BAM to be queryname-sorted, and uBAM input is the best way to achieve that. I don't think the variant calling stage cares about it, but I'm not entirely certain how the variant calling logic works with respect to that.

Otherwise there shouldn't be any difference in process and results for the mapping and analysis, regardless of whether FASTQ or BAM was used as input. I can't remember right now whether BWA MEM can take a coordinate-sorted BAM as input, but in any case I think you may be best off sorting the BAM into queryname order first for the MarkDuplicates phase.
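To illustrate the -t/-C point, a minimal sketch (not Dante's own pipeline) of regenerating FASTQ without losing the RG tags and carrying them through bwa; the @RG IDs follow the masked ones listed above, and PL is set to ILLUMINA only because some tools reject other values:

# Group mates together, keep the RG:Z: tag in the read comment (-t),
# then have bwa mem copy that comment into the output records (-C)
samtools collate -u -O 150XXXXXXXXXXXA.bam tmp_collate \
  | samtools fastq -t -1 R1.fq.gz -2 R2.fq.gz -0 /dev/null -s /dev/null -
# one -H line per read group present in the original BAM header:
bwa mem -t 4 -C \
  -H'@RG\tID:CL100XXXXXX_L02_17\tPL:ILLUMINA\tSM:150XXXXXXXXXXXA' \
  -H'@RG\tID:CL100XXXXXX_L02_18\tPL:ILLUMINA\tSM:150XXXXXXXXXXXA' \
  hs38DH.fa R1.fq.gz R2.fq.gz \
  | samtools sort -@ 4 -o remapped.hs38DH.bam -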

Petr
04-05-2018, 07:43 AM
Using the samtools view -H sample.bam | grep '@RG' command from your link, the BAM file supplied by Dante Labs shows:

@RG ID:CL100054478_L01_10 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_11 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_12 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_9 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@PG ID:bwa PN:bwa VN:0.7.15-r1140 CL:/bioapp/bwa-0.7.15/bwa mem -t 8 -M -Y -R @RG\tID:CL100054478_L01_9\tPL:COMPLETE\tPU:WHJMAAC YGAA171113-10\tLB:WHJMAACYGAA171113-10\tSM:15001702300576A\tCN:BGI /rdata/genedock/hg19_broad/ucsc.hg19.fasta CL100054478_L01_9_1.clean.fq.gz CL100054478_L01_9_2.clean.fq.gz
@PG ID:bwa.1 PN:bwa VN:0.7.15-r1140 CL:/bioapp/bwa-0.7.15/bwa mem -t 8 -M -Y -R @RG\tID:CL100054478_L01_12\tPL:COMPLETE\tPU:WHJMAA CYGAA171113-10\tLB:WHJMAACYGAA171113-10\tSM:15001702300576A\tCN:BGI /rdata/genedock/hg19_broad/ucsc.hg19.fasta CL100054478_L01_12_1.clean.fq.gz CL100054478_L01_12_2.clean.fq.gz
@PG ID:bwa.2 PN:bwa VN:0.7.15-r1140 CL:/bioapp/bwa-0.7.15/bwa mem -t 8 -M -Y -R @RG\tID:CL100054478_L01_11\tPL:COMPLETE\tPU:WHJMAA CYGAA171113-10\tLB:WHJMAACYGAA171113-10\tSM:15001702300576A\tCN:BGI /rdata/genedock/hg19_broad/ucsc.hg19.fasta CL100054478_L01_11_1.clean.fq.gz CL100054478_L01_11_2.clean.fq.gz
@PG ID:bwa.3 PN:bwa VN:0.7.15-r1140 CL:/bioapp/bwa-0.7.15/bwa mem -t 8 -M -Y -R @RG\tID:CL100054478_L01_10\tPL:COMPLETE\tPU:WHJMAA CYGAA171113-10\tLB:WHJMAACYGAA171113-10\tSM:15001702300576A\tCN:BGI /rdata/genedock/hg19_broad/ucsc.hg19.fasta CL100054478_L01_10_1.clean.fq.gz CL100054478_L01_10_2.clean.fq.gz


Yesterday I started JamesKane's script (http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/), and I expect it will take at least one day on my machine with a Xeon E5-1620, 32 GB RAM and an SSD.

Donwulff
04-05-2018, 07:55 PM
Big Y contains just one read group, so JamesKane's script doesn't handle this case quite correctly either. I'm currently doing some storage-space and other rearrangements on my home server, so I'm not sure about testing things myself right now. "samtools bamshuf" is interesting for a couple of reasons, even for queryname-sorted BAM, and I think it's way faster than the sorting options; however, it won't let you run the new GATK MarkDuplicates workflow. I have some scripts myself, but I've been playing with improving the workflow. Actually, I need to update my other thread, because I found out that for some reason the adapter trimming I tried didn't really work, and I've not yet figured out the problem. For most uses, of course, people can just skip the adapter trimming.

Petr
04-06-2018, 06:39 AM
Yesterday I started JamesKane's script (http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/), and I expect it will take at least one day on my machine with a Xeon E5-1620, 32 GB RAM and an SSD.

So after 35 hours I got an error message:

Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.io.IOException: No space left on device

I have 1.35 TB free space on SSD, I hoped it could be sufficient.

And the script has deleted all results at the end.

:-(

So again.

Note:
Just bwa took almost 24 hours:

[main] Version: 0.7.17-r1188
[main] CMD: /home/petr/Genomics/bwa/bwa mem -t 4 -M -p /home/petr/Genomics/Reference/GRCh38/GRCh38_full_analysis_set_plus_decoy_hla.fa fixed.fq
[main] Real time: 83303.874 sec; CPU: 334712.405 sec

Petr
04-06-2018, 06:58 AM
Big Y contains just one read group, so JamesKane's script doesn't handle this case quite correctly either.

Big Y 2014 BAM file (V1.4)
@RG ID:GRC11054780_S37_L00 SM:GRC11054780_S37_L00

Big Y 2016 BAM file (V 1.4)
@RG ID:GRC14388618_S53_L00 SM:GRC14388618_S53_L00

FGC Elite 1.0 BAM file (V 1.0) contains one read group as well:
@RG ID:KK2P9_1 PL:ILLUMINA LB:HUMsgiTAAANAAA-104 SM:KK2P9

FGC WGS 15x BAM file (V 1.4) contains 3 read groups:
@RG ID:UT4MM_DHG02985_HFK5KCCXX_L6 PL:illumina PU:DHG02985_HFK5KCCXX_L6 LB:UT4MM SM:UT4MM CN:novogene
@RG ID:UT4MM_DHG02985_HFK5KCCXX_L7 PL:illumina PU:DHG02985_HFK5KCCXX_L7 LB:UT4MM SM:UT4MM CN:novogene
@RG ID:UT4MM_DHG02985_HJHMLCCXX_L7 PL:illumina PU:DHG02985_HJHMLCCXX_L7 LB:UT4MM SM:UT4MM CN:novogene

Genos WES BAM file (V 1.5) contains just one:
@RG ID:37791510242018 SM:37791510242018 LB:37791510242018 PL:ILLUMINA PU:HJLM5BCXY:1 CN:MACROGEN_MD DT:2017-03-27T20:22:49-0400

And Dante BAM file (V 1.5)
@RG ID:CL100054478_L01_10 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_11 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_12 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI
@RG ID:CL100054478_L01_9 PL:COMPLETE PU:WHJMAACYGAA171113-10 LB:WHJMAACYGAA171113-10 SM:15001702300576A CN:BGI

Petr
04-08-2018, 06:52 AM
Exception in thread "main" htsjdk.samtools.util.RuntimeIOException: java.io.IOException: No space left on device

I have 1.35 TB free space on SSD, I hoped it could be sufficient.

And the script has deleted all results at the end.

:-(

So again.

I found that the problem was with space on the disk where the temporary files are placed, not on my working disk. I changed it to a different disk and the SortSam process finished successfully.
[Sun Apr 08 02:41:05 CEST 2018] picard.sam.SortSam done. Elapsed time: 289.35 minutes.
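For anyone hitting the same thing, the temporary directory can also be pointed at a roomier disk explicitly instead of relying on the default (paths here are just examples):

# Both the JVM and Picard accept an explicit temporary directory
java -Xmx8g -Djava.io.tmpdir=/bigdisk/tmp -jar picard.jar SortSam \
  I=input.bam O=sorted.bam SORT_ORDER=queryname TMP_DIR=/bigdisk/tmp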

But not the MarkDuplicates process

I don't know if this warning is important:
WARNING 2018-04-08 08:31:01 AbstractOpticalDuplicateFinderCommandLineProgram A field field parsed out of a read name was expected to contain an integer and did not. Read name: CL100054478L1C010R059_134742. Cause: String 'CL100054478L1C010R059_134742' did not start with a parsable number.
but the process ended shortly afterwards with the following message:
Exception in thread "main" htsjdk.samtools.SAMException: Value was put into PairInfoMap more than once. 1: null:CL100054478L1C015R021_110106
at htsjdk.samtools.CoordinateSortedPairInfoMap.ensure SequenceLoaded(CoordinateSortedPairInfoMap.java:13 3)
at htsjdk.samtools.CoordinateSortedPairInfoMap.remove (CoordinateSortedPairInfoMap.java:86)
at picard.sam.markduplicates.util.DiskBasedReadEndsFo rMarkDuplicatesMap.remove(DiskBasedReadEndsForMark DuplicatesMap.java:61)
at picard.sam.markduplicates.MarkDuplicates.buildSort edReadEndLists(MarkDuplicates.java:528)
at picard.sam.markduplicates.MarkDuplicates.doWork(Ma rkDuplicates.java:232)
at picard.cmdline.CommandLineProgram.instanceMain(Com mandLineProgram.java:269)
at picard.cmdline.PicardCommandLine.instanceMain(Pica rdCommandLine.java:98)
at picard.cmdline.PicardCommandLine.main(PicardComman dLine.java:108)

ValidateSamFile command shows:

ERROR: Read groups is empty
WARNING: Read name CL100054478L1C010R059_134742, A record is missing a read group
WARNING: Read name CL100054478L1C014R096_469848, A record is missing a read group
WARNING: Read name CL100054478L1C011R045_113731, A record is missing a read group
WARNING: Read name CL100054478L1C014R050_557359, A record is missing a read group
WARNING: Read name CL100054478L1C015R094_416090, A record is missing a read group
etc...

I have no idea how to continue. Everything I did was to use this script (http://www.it2kane.org/2016/10/y-dna-variant-discovery-workflow-pt-1-1/) with the BAM file supplied by Dante.

Donwulff
04-08-2018, 06:53 PM
The obvious problem with your current workflow is that the read group information is lost, as I posted above.
I'm not certain how much it helps to go over the problems step by step... On the other hand, I could post my own script, but I'm doing bleeding-edge stuff with constant tweaks.
"Value was put into PairInfoMap more than once. 1: null:CL100054478L1C015R021_110106" suggests that read exists more than once. I take it "null" is supposed to be the read group, so it's possible the same read occurs in different read groups and MarkDuplicates is unable to process them. I don't think this happens in my sample, but since I was puzzling over the read-group interleaving above, it's entirely possible.

bwakit will give you the basic command for running bwa on your data correctly, by the way (I'm masking identifiable information with X's, though I don't think there's really any need to):

./run-bwamem -o /output/mapped_sorted -t 4 -H -s -k -M hs38DH.fa sample1.bam
cat /mnt/MyGenome/dante_self/original/150XXXXXXXXXXXA.bam \
| ./htsbox bamshuf -uOn80 - /output/mapped_sorted.shuf \
| ./htsbox bam2fq -O -t - \
| ./bwa mem -p -t4 -M -H'@RG\tID:CL100XXXXXX_L02_17\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-17\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_18\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-18\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_19\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-19\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_20\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-20\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_21\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-21\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_22\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-22\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_23\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-23\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_24\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-24\tSM:150XXXXXXXXXXXA' -C hs38DH.fa - 2> /output/mapped_sorted.log.bwamem \
| ./k8 ./bwa-postalt.js -p /output/mapped_sorted.hla hs38DH.fa.alt \
| ./samtools sort -@ 4 -m1G - -o /output/mapped_sorted.aln.bam;
./run-HLA /output/mapped_sorted.hla > /output/mapped_sorted.hla.top 2> /output/mapped_sorted.log.hla;
touch /output/mapped_sorted.hla.HLA-dummy.gt; cat /output/mapped_sorted.hla.HLA*.gt | grep ^GT | cut -f2- > /output/mapped_sorted.hla.all;

I'm not sure what happens now that the BAM file has correct platform. Some tools might require ILLUMINA.

bwakit would use samblaster for MarkDuplicates, but I don't recall right now if it's a good choice. I'm using MarkDuplicates for the new queryname sorted order, but this requires queryname order as noted.
I think samtools is compatible with htsbox, although I've used the original htsbox with that pipeline. Note the temporary space requirements.
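For completeness, the queryname-sorted MarkDuplicates route mentioned above looks roughly like this with plain Picard (a sketch, not a tuned pipeline; samblaster would be the bwakit-style alternative):

# Queryname-sort first, then mark duplicates; READ_NAME_REGEX=null skips the
# optical-duplicate read-name parsing that trips over the BGI-style read names
java -jar picard.jar SortSam I=mapped.bam O=qsorted.bam SORT_ORDER=queryname
java -jar picard.jar MarkDuplicates I=qsorted.bam O=dedup.bam M=dup_metrics.txt \
  ASSUME_SORT_ORDER=queryname READ_NAME_REGEX=null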

Wireless
04-10-2018, 09:03 AM
Another WES timeline (4th and last one) from Dante: 37 business days to get the DNA extracted. Let's see if they live up to their 50-day quote.

13/02 Sample received
09/04 DNA extraction OK

Petr
04-11-2018, 04:17 PM
15.12.2017: We are pleased to confirm you we have received your sample. Our lab will start the DNA extraction as soon as possible
5.1.2018: We are pleased to confirm you we have successfully completed DNA extraction. Your sample is classified as level A.
50 working days passed on February 27th.
1.3.2018: We are currently running the bioinformatics analysis on your sample. You are expected to receive the final result by the end of next week.
22.3.2018: The analysis of your sample is completed. We are running the quality control, which generally takes 6 working days to be completed. You are expected to receive the final result by early April.
Since then, no reply to my e-mails.

:(

Petr
04-11-2018, 08:37 PM
./run-bwamem -o /output/mapped_sorted -t 4 -H -s -k -M hs38DH.fa sample1.bam
cat /mnt/MyGenome/dante_self/original/150XXXXXXXXXXXA.bam \
| ./htsbox bamshuf -uOn80 - /output/mapped_sorted.shuf \
| ./htsbox bam2fq -O -t - \
| ./bwa mem -p -t4 -M -H'@RG\tID:CL100XXXXXX_L02_17\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-17\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_18\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-18\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_19\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-19\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_20\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-20\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_21\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-21\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_22\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-22\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_23\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-23\tSM:150XXXXXXXXXXXA' -H'@RG\tID:CL100XXXXXX_L02_24\tPL:ILLUMINA\tPU:CL100XXXXXX_L2\tLB:XXXXXXXXXXXXXXXXXX-24\tSM:150XXXXXXXXXXXA' -C hs38DH.fa - 2> /output/mapped_sorted.log.bwamem \
| ./k8 ./bwa-postalt.js -p /output/mapped_sorted.hla hs38DH.fa.alt \
| ./samtools sort -@ 4 -m1G - -o /output/mapped_sorted.aln.bam;
./run-HLA /output/mapped_sorted.hla > /output/mapped_sorted.hla.top 2> /output/mapped_sorted.log.hla;
touch /output/mapped_sorted.hla.HLA-dummy.gt; cat /output/mapped_sorted.hla.HLA*.gt | grep ^GT | cut -f2- > /output/mapped_sorted.hla.all;

It worked and it took 2 days to finish.

It was a bit confusing that bwa mem (from bwa) is not the same as run-bwamem (from bwakit).

What should I do with the HLA files?

Snoozy
04-18-2018, 11:45 AM
So I'm now a bit confused about this 50 working days thing. Just been on their Facebook page and apparently, the 50 days starts when the sample has been extracted, not once they've acknowledged receipt. I've e-mailed them asking for an update anyhow - 50 working days is tomorrow.

Edit: So this is starting to get very confusing. To clarify:

The 50 working days only starts once your sample has been extracted according to what they've told me on Facebook Messenger. That's not made clear anywhere, but ok. I asked them also about a kit update. They told me:

Hi again, your sample is in the analysis phase, this means that library preparation and sequencing went well. Now the Fastq files have been generated and the bioinformatics are aligning your DNA to a reference genome to obtain the BAM. After that they will obtain a file that is easier to handle call VCF. When we receive it we perform a final QC and deliver to you. We can expect to finish this procedure in 10 days.


That's good comms, and I'm happy.

About an hour ago, I got a response to the same question (kit update query) saying this:

Your genome is still under analysis, in one month results will be ready.

We apologize for the delay

All the best

Now I'm confused as heck and wondering how one estimate is twice as long as the other in working days....how? :s Just gives me the impression they have no idea what's going on, so I have no confidence right now that I'll get my results within either timeframe.

Update: Apparently, it is true 'theoretically' that I could have my results in 10 days, but there is apparently a lot of demand for analysis. Great.

Petr
04-18-2018, 05:11 PM
After almost a month of silence I have received the first results for my second kit. The DNA was successfully extracted on January 5th, so 68 working days in total.

For my 1st sample, I have received:

On January 28th downloadable files:
Wellness-and-Longevity-xxxxxx-15001702300xxxA_snp_cds_annot_csv-18Jan26.pdf
15001702300xxxA.snp.annotation.tar.zip, containing:
- 15001702300xxxA.indel.annotation.tar.gz, containing
-- 15001702300xxxA.coding.indel.AnnotationTable.xls
-- 15001702300xxxA.indel.annot.csv.gz
-- 15001702300xxxA.indel.AnnotationTable.xls
-- 15001702300xxxA.indel.cds.len.png
-- 15001702300xxxA.indel.cds_annot.csv.gz
-- 15001702300xxxA.indel.gene.csv.gz
-- 15001702300xxxA.indel.len.png
-- 15001702300xxxA.indel.vcf.gz
- 15001702300xxxA.snp.annotation.tar.gz, containing
-- 15001702300xxxA.coding.snp.AnnotationTable.xls
-- 15001702300xxxA.snp.annot.csv.gz
-- 15001702300xxxA.snp.AnnotationTable.xls
-- 15001702300xxxA.snp.cds_annot.csv.gz
-- 15001702300xxxA.snp.gene.csv.gz
-- 15001702300xxxA.snp.vcf.gz
Files within the zip archive have date 18th January.

Genome-Overview-xxxxxx-15001702300xxxA_snp_cds_annot_csv-24Jan2018.zip, containing:
- Genome-Overview-xxxxxx-15001702300xxxA_snp_cds_annot_csv-24Jan2018.html
- Genome-Overview-xxxxxx-15001702300xxxA_snp_cds_annot_csv-24Jan2018.genes.txt
Files within the zip archive have date 24th January.

On March 26th flash disk with
15001702300xxxA_1.fq.gz
15001702300xxxA_2.fq.gz
15001702300xxxA.bam
15001702300xxxA.bai
All files with date March 8th.

For my 2nd sample, I have received
On April 17th
Wellness-and-Longevity-yyyyyy-15001702300yyyA_snp_vcf_gz-18Apr13.pdf
15001702300yyy.zip, containing:
- 15001702300yyyA.indel.vcf.gz
- 15001702300yyyA.snp.vcf.gz
Files within the zip archive have date 22nd March.

That's all, nothing more.


The Wellness and Longevity report was generated by: https://sequencing.com/wellness-and-longevity
The Genome Overview report was generated by https://sequencing.com/genome-overview

dudeweedlmao
04-20-2018, 12:08 AM
So I'm now a bit confused about this 50 working days thing. Just been on their Facebook page and apparently, the 50 days starts when the sample has been extracted, not once they've acknowledged receipt. I've e-mailed them asking for an update anyhow - 50 working days is tomorrow.

Edit: So this is starting to get very confusing. To clarify:

The 50 working days only starts once your sample has been extracted according to what they've told me on Facebook Messenger. That's not made clear anywhere, but ok. I asked them also about a kit update. They told me:

Hi again, your sample is in the analysis phase, this means that library preparation and sequencing went well. Now the Fastq files have been generated and the bioinformatics are aligning your DNA to a reference genome to obtain the BAM. After that they will obtain a file that is easier to handle call VCF. When we receive it we perform a final QC and deliver to you. We can expect to finish this procedure in 10 days.


That's good comms, and I'm happy.

About an hour ago, I got a response to the same question (kit update query) saying this:

Your genome is still under analysis, in one month results will be ready.

We apologize for the delay

All the best

Now I'm confused as heck and wondering how one estimate is twice as long as the other in working days....how? :s Just gives me the impression they have no idea what's going on, so I have no confidence right now that I'll get my results within either timeframe.

Update: Apparently, it is true 'theoretically' that I could have my results in 10 days, but there is apparently a lot of demand for analysis. Great.

Hey you mean it's been 50 working days since you got the 'extraction complete' update? If so, how long did it take to go from kit received to extraction complete? I've been sitting at kit received for 6 weeks today.

Donwulff
04-20-2018, 12:41 AM
Meanwhile, as a useless update, Dante Labs has hit $350.000 additional investment on https://www.seedinvest.com/dante.labs/seed with one day to go. They were seeking $250.000 to $1.000.000, so that's considerably less than their target, but it should alleviate any concern about the company running out of money before they can deliver current tests (the "going concern" warning mentioned above). The company CEO is also saying they will likely have another investment round at the end of the year. Since they've already got everything working with their earlier investment round, their mandatory expenses should be much lower now, but we'll have to wait and see. As they note, it's a "highly competitive market", but there don't seem to be that many competitors right now, at least for sequencing on the EU market.

Donwulff
04-20-2018, 12:52 AM
If you have not received a DNA extraction completed notice in 6 weeks, I'd definitely contact their customer support. On the other hand, measuring processing time from DNA extraction is a bit meh, because that's definitely not what customers would expect, and it allows them to potentially cover delays by delaying the DNA extraction notification. The reported processing times from that notice to receiving results seem to have held so far. And yes, I would expect a small number of samples may have to be re-run through sequencing; I don't know any of the specifics of their workflow, but that could even mean starting again from the DNA extraction if the sequencing fails.

1) US and EU have different processing times, which seems to have held true in the few processing times reported so far.
2) The processing time they advertise does not include DNA extraction or delivery of raw reads, just DNA extraction to VCF/interpretation.
3) Pretty sure some samples will have to fail quality control, no idea what their arrangements are in that case.

dbm
04-20-2018, 04:44 AM
Hey you mean it's been 50 working days since you got the 'extraction complete' update? If so, how long did it take to go from kit received to extraction complete? I've been sitting at kit received for 6 weeks today.

It took 4-5 weeks from when i sent the kit in to see extraction as complete. It's been more than 50 business days since. They say i'll get my results soon. Who knows. i think 50 days is a bit of a guess on their part. And as they get more and more customers, i think the processing time is slowly growing. As Donwulff said, EU vs US are probably way different. i am in the US, if that helps.

karwiso
04-23-2018, 05:39 PM
Edit:
Sorry, guys!
The mods seem to have deleted the posts now; I don't see them. Can you confirm?
The discussions had been going on for several days and I thought I needed to ask for your help.
The guys claimed to have read here and there about problems with Dante Labs. It seemed very suspicious right before DNA Day, just constant reposting. I don't know the exact details and I don't want to post any unsupported rumours here. Anyway, sorry for disturbing!
If you want to post a review there, please do!
Sorry again, I'm not working for Dante, I just have a feeling for justice. Dante were nice to me.
-------------------------------------------------------------------------------------------------

Hi guys!

There are now several discussions about Dante Labs results, truncated and copied files. The guys just keep reposting it over and over. We know there are sometimes problems with deadlines and with waiting for USB drives. But is Dante Labs really that bad?

The discussions are in Facebook group FTDNA Big Y * YSEQ * YFULL * FGC - NGS Discussion Forum (https://www.facebook.com/groups/257810104756408/). Whether you have any positive or negative experiences with Dante Labs, could you share it honestly in the group?

PS Roberta Estes is one of the moderators in the group, so I think it should be a serious discussion there.

Mich Glitch
04-23-2018, 05:47 PM
Calumniation?

Mich Glitch
04-23-2018, 05:50 PM
Roberta?
I still remember her attacks on YFull.

Mich Glitch
04-23-2018, 05:52 PM
BTW, I'm waiting for the Dante Labs results.

karwiso
04-23-2018, 06:52 PM
Sorry, guys!
The mods seem to have deleted the posts now; I don't see them. Can you confirm?
The discussions had been going on for several days and I thought I needed to ask for your help.

See my edit up there |^^^

MacUalraig
04-23-2018, 06:57 PM
Estes is in a formal business relationship with ftdna to promote her add-on consultancy. But Thomas Krahn of YSEQ is also a moderator.

Ust-Ishim
04-25-2018, 04:00 AM
Is the general consensus then that Dante Labs is too unreliable? I'm not in the EU; I just really don't want to pay over $1,000 AUD for 30x WGS.

Mich Glitch
04-25-2018, 04:27 AM
Estes is in a formal business relationship with ftdna to promote her add-on consultancy. But Thomas Krahn of YSEQ is also a moderator.

With all my respect to T.Krahn, YSEQ has a similar product.
So, I'm still waiting for my results from Dante Labs and won't listen to I-don't-know-who. People that I trust say the price/quality ratio for Dante is OK.

RobertN
04-25-2018, 07:06 AM
It's been more than 90 days since my DNA extraction and no status change... I think they forgot about me; let's see what they say.

Maybe I'm wrong, but since I ordered WES, shouldn't it technically be faster than WGS? I'm almost at double the 50-day target they promise...

dudeweedlmao
04-25-2018, 04:47 PM
Hey, I was a long time on 'Kit Received' and when I emailed them they told me it was already in sequencing and that results would be ready in 30 more days. Not sure if they're not pushing updates to the website on purpose or something broke; either way it's a tad unprofessional, but as long as I get my results I'm happy. EU btw.

andrea_dante_labs
04-25-2018, 11:53 PM
Hello - this is Andrea from Dante Labs. I just wanted to clarify that the errors with the BAM/FASTQ and with the truncated files were caused by the transfer of data from the cloud or hard drives to the flashdrives that we shipped to customers. We have now switched to WD Elements hard drives which have proven a better solution.

andrea_dante_labs
04-25-2018, 11:57 PM
Meanwhile, as a useless update, Dante Labs has hit $350.000 additional investment on SeedInvest with one day to go. They were seeking $250.000 to $1.000.000, so that's considerably less than their target, but it should alleviate any concern about the company running out of money before they can deliver current tests.

Hello - this is Andrea from Dante Labs. I just wanted to confirm that we are here to stay, we are serious about our commitment to our customers and we will never let you down. As an update, we ended up raising $580,000 on SeedInvest plus another $400,000 from grants and bank financing. We are now investing to automate the processes that have caused delays in the past.

This is not meant as an ad, but only as a transparent way to engage with the community and address questions. Thanks.

Ust-Ishim
04-26-2018, 10:03 AM
This is not meant as an ad, but only as a transparent way to engage with the community and address questions. Thanks.

Thanks Andrea, I appreciate it. Was considering ordering a kit but I'm a bit concerned with the backlog people are discussing here, and the privacy changes:


There are a series of articles about Dante Labs in the news also. Them partnering to provide data to 3rd parties. i think some of them have been removed, actually. You can search for them - not allowed to provide links. Also interesting - and others can correct me if i'm wrong - the privacy policy at the bottom of their page has disappeared. Seems like that happened around the same time they further lowered their prices.

In the most recent articles, i see this statement - from Dante Labs.

"All our samples are anonymous. Data, DNA, reports and samples have no personal identification information, not even the country of origin."

i believe that most DNA companies charge less than it costs them to sequence, and make most of their money selling 'anonymous' data to 3rd parties. i have it in writing from Dante that they will not share my information with anyone. So i am guessing i ordered before all this switched. i hope so at least.

Could you shed some light on this? Also, what sort of turnaround can I expect?

LPoropat
04-26-2018, 07:31 PM
I ordered WGS in late March for €399 during the Rare Disease Month. The lab received my sample via pre-paid DHL within two days, on 6 April. The extraction was successfully completed on 25 April, with classification level A. For now, I have to say that I am satisfied with the support. Whenever I asked a question they responded soon, and the laboratory work is seemingly also going fine.

If I have one piece of advice for them, it would be to give more attention to the appearance and functionality of their website. They could also consider having, even for one day, a walk-in office for patients in Italy and in New York, so people can come in, get an introduction to their offers as well as to the privacy policy and the collaboration with hospitals etc., arrange a testing day after being told not to eat or drink for several minutes beforehand, and do it all in a friendly environment, because the potential customers range from children to the elderly. Many people are still not used to doing such things via the post office; they are reluctant and even distrustful, not to mention the part of the population who, due to various issues, could have difficulty doing such a test properly on their own without supervision.

I hope they stay in the EU and Italian market and overcome the beginners' difficulties and mistakes; it's not as if other companies did not, and will not, have them when starting out. The demand for WGS will grow exponentially in the near future, both from the public and from medical institutions (hospitals). Their current WGS price (€599), and especially the €399 they offer on one or two rare occasions during the year, is relatively accessible. Many people are not aware that a simple disease/gene panel test costs €560-900-1600. Yes, panel results are expected within less than 30 days, but that kind of turnaround can be expected from Dante and other labs in the near future as well. If anything, with such accessible WGS prices we are not only going to see an exceptional drop in disease/gene panel prices, but the panels' very limited use from a clinical/technical point of view calls into question the existence of such single-panel offers at all. From a financial point of view, the laboratories/companies in the market will have no choice but to adapt or close up. How bright their future is depends only on the Dante Labs team.

Donwulff
04-27-2018, 11:49 AM
I'm wondering if we can get an accurate estimate/follow-up of the processing times. Earlier, I commented that most people seemed to get their results in the promised time (DNA extraction to VCF; it's perhaps a bit misleading that it doesn't cover DNA extraction and BAM delivery, but some of that at least is out of their hands). At least according to their site, there's no difference between Exome and Whole Genome delivery times. It doesn't actually take weeks to just sequence a sample; a lot of that will be batches, shipping, queues and bioinformatics processing. See Full Genomes Corporation https://www.fullgenomes.com/qanda/ for timing examples. Dante Labs' promised timing seems similar to the FTDNA Big Y delivery time (still waiting for BAMs...), for example. Of course, if they do not deliver in the promised time, that's always a reason for concern. But I've had interactions with several companies taking double the promised time, with no explanation or feedback; s*it happens. Indeed, I'm sure no company can promise the sample/batch won't fail quality check and have to be re-run, though the time needed to do that can vary.

But I hope that people can report here the time it took from shipping to DNA extraction, from DNA extraction to delivery of VCF, and then to raw data (If requested) so we can have some kind of idea what to expect & how things are going.

Donwulff
04-27-2018, 12:08 PM
Also, FaceBook is well... FaceBook. The FB ISOGG group has always been pretty openly pro-FTDNA, and you will find various posters railing against anything done by other companies. Some people have raised concerns in the past about the very close relationship between ISOGG & FTDNA itself (sorry, no reference; there were some posts on the FB group itself about that maybe half a year ago?).

I like the group's rules, but these appear not to be enforced: "Group Rules: Be kind and considerate; treat others the way you'd want to be treated (including the DNA companies). Stay on the group's topic; no religious posts, and no partisan political bashing. In other words, no contentiousness. The only political posts allowed are when it relates to DNA legislation, but keep partisan bashing out of it. Commercial genetic genealogy postings ARE allowed, but grandstanding is not. Grandstanding is when you complain about a company or other commercial entity (i.e.: a consultant, etc.) and you have not contacted them with your issues and grievances first so they can address it. If you do on-list grandstanding, you may be asked to provide correspondence with company or person you are complaining about. Grandstanding posts will be deleted by the admins."

I'd say that many if not most posts on the group are actually contentious (socio)political, medical or bashing specific DNA companies/startups. Though to be fair, FTDNA criticism doesn't seem particularly restricted. However, I wouldn't consider that group, or any with the same main actors, particularly unbiased, but they're useful groups which reach a lot of genetic genealogy aficionados. Just be aware that anybody with any kind of agenda can post on them, and individuals can and do delete responses which don't agree with their narrative.

Snoozy
04-28-2018, 11:30 AM
Hey you mean it's been 50 working days since you got the 'extraction complete' update? If so, how long did it take to go from kit received to extraction complete? I've been sitting at kit received for 6 weeks today.

No, it was 50 working days since they acknowledged receipt. They need to clarify if it's from extraction however, as none of their materials specify that. From kit received to extraction complete was 2.5 weeks.

Seems vague communication is a staple of DNA testing companies but I hope Dante makes things clearer as customers will savage them in reviews for not meeting advertised timescales, however good the product is. Last I checked, they were still advertising 6-8 week turnarounds for WGS on Amazon.

Geldius
04-30-2018, 11:32 PM
Hello - this is Andrea from Dante Labs. I just wanted to clarify that the errors with the BAM/FASTQ and with the truncated files were caused by the transfer of data from the cloud or hard drives to the flashdrives that we shipped to customers. We have now switched to WD Elements hard drives which have proven a better solution.

Hello Andrea, I appreciate your posts on behalf of Dante Labs and I would like to ask the following questions.

1) Could you please clarify when the BAM/FASTQ transfer fix can be expected and when USB drives with correct BAM/FASTQ files are going to be distributed?
2) Does Dante Labs store the BAM/FASTQ files of completed tests? If so, is it for an unlimited time? (I would welcome this service.)

Thank you!

My order history, in case it is of interest to others:
01/11/2017 - WGS ordered
14/11/2017 - DNA sample delivered
05/12/2017 - confirmed DNA extraction with Level A
19/03/2018 - completed: Wellness and Longevity report, VCF raw data
17/04/2018 - received USB drive, BAM/FASTQ files missing, drive apparently affected with some data transfer error
... waiting for correct BAM/FASTQ files

pm1113
05-01-2018, 07:08 PM
Hi all,
Here is my experience so far as a Dante WGS customer. Initially I wanted to wait for the results and then join the discussions on analysis of the WGS. As my experience seems to be a variation of the stories described before, I decided to share now:

Timeline part 1
27. Nov 2017 ORAGENE DNA SAMPLE KIT sent back to Dante Italy
5. Dec 2017 Confirmation sample arrived at Dante
21. Dec 2017 Confirmation DNA extraction successful
5. April 2018 Results files uploaded: .gvcf file and Wellness and Longevity report (from sequencing.com) (i.e. 70 working days after DNA extraction, assuming a Christmas to New Year break)
7. April 2018 Mail to Dante: Please send .bam and .fastq file
8. April 2018 Reply from Dante: You will receive a notification email as soon as the parcel is fulfilled.

Issues with the .gvcf file
1) As a test I ran the Lactose Tolerance app at sequencing.com. The app came back with an "empty" result and the message "Not enough genetic data". For none of the variant sites investigated could a sequence be found in the .gvcf file.
2) I ran the Genome Overview app at sequencing.com. The result file was weird: it reported about 6.1 million processed variants (sounds quite high?), of which 5.5 million were reported as multiallelic VCF entries (i.e. more than two alleles). Assuming I have a diploid genome, that seems weird.
3) I had a look into the .gvcf file (3.5 GB as .gz; 23.0 GB unzipped) and this looked weird to me: the majority of positions are recorded as <NON_REF>, defined in the header as:
##fileformat=VCFv4.1
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">
For only a few positions a base is given, but never homozygous, always in combination with <NON_REF> (see example line 3). The genotype is always reported as ./., i.e. missing, instead of 0/0 or 0/1 etc.

Here an extract for 3 positions in chr 1 (other chromosomes are similar)

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT xxxxxxxxxxxxxxxxxx-PE100-2
chr1 3791541 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL ./.:38:98:38:0,99,1333
chr1 3791542 . G <NON_REF> . . END=3791582 GT:DP:GQ:MIN_DP:PL ./.:41:99:38:0,105,1516
chr1 3791583 . A G,<NON_REF> . . BaseQRankSum=-3.555e+00;ClippingRankSum=-3.850e-01;DP=39;MQ=60.00;MQ0=0;MQRankSum=-3.280e-01;
ReadPosRankSum=-1.099e+00
GT:AD:DP:PL:SB ./.:16,23,0:39:652,0,542,700,611,1311:7,9,14,9

Timeline part 2
15. April 2018: Mail to Dante reporting the issue and asking for clarification.
20. April 2018: No response from Dante yet; Reminder sent by mail
25. April 2018: No response from Dante yet; Reminder sent by mail
27. April 2018: Response from Dante:
I have forwarded your case to our bioinformaticians that are reviewing this.
I will send you an email as soon as possible.
Apologies if the service would be slower than usual. This is due to a huge number of volumes reached.


So, obviously my questions to the community:
1) If I didn't miss anything, I seem to be the first one to receive a .gvcf instead of a .vcf file?
2) Is anything wrong with my interpretation that the content does not seem to represent my sequence correctly? The file was evidently produced with GATK (##GATKCommandLine=<ID=HaplotypeCaller,Version=3.3-0-g37228af).
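(If the goal is an ordinary VCF with explicit genotype calls, one way is to run GATK's GenotypeGVCFs over the gVCF; a minimal sketch, assuming GATK4, a placeholder reference hs37d5.fa matching the build the gVCF was called against, and a placeholder file name sample.g.vcf.gz:)

# If the gVCF is plain gzip rather than bgzip, recompress it first
zcat sample.g.vcf.gz | bgzip -c > sample.g.vcf.bgz && mv sample.g.vcf.bgz sample.g.vcf.gz
tabix -p vcf sample.g.vcf.gz
# The reference needs .fai and .dict files alongside it
gatk GenotypeGVCFs -R hs37d5.fa -V sample.g.vcf.gz -O sample.genotyped.vcf.gz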

Geldius
05-01-2018, 11:07 PM
I have received VCF available for download via Dante Labs website (1,1 GB uncompressed)
and the gVCF on a USB drive (split into 25 files by chromosome, 3,15 GB as .gz).
I haven't tried to analyze gVCF with Sequencing tools yet.

Genotype reported as ./. means "no call" at the specific position.

Donwulff
05-02-2018, 03:46 PM
Here an extract for 3 positions in chr 1 (other chromosomes are similar)

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT xxxxxxxxxxxxxxxxxx-PE100-2
chr1 3791541 . A <NON_REF> . . . GT:DP:GQ:MIN_DP:PL ./.:38:98:38:0,99,1333
chr1 3791542 . G <NON_REF> . . END=3791582 GT:DP:GQ:MIN_DP:PL ./.:41:99:38:0,105,1516
chr1 3791583 . A G,<NON_REF> . . BaseQRankSum=-3.555e+00;ClippingRankSum=-3.850e-01;DP=39;MQ=60.00;MQ0=0;MQRankSum=-3.280e-01;
ReadPosRankSum=-1.099e+00
GT:AD:DP:PL:SB ./.:16,23,0:39:652,0,542,700,611,1311:7,9,14,9


Other people have reported receiving a gVCF before. I didn't receive one, so I can't comment on Dante Labs' use of them, though that example looks kind of garbled.
However, Geldius' point is correct; that's not exactly what the gVCF file is saying. The "<NON_REF>" entries are in the ALT column, and they are supposed to be there on every line.
The actual call is in the last column, and in these rows it does indeed start with "./.". If it's "0/0", that's two REF alleles, and "1/1" is two ALT alleles.
The rows you have quoted aren't quite at the tip of the chromosome, so if they are the very first rows of the file (and you aren't quoting selectively), it looks like the start might be cut off. If they are merely the first no-calls, 3.8 megabases into the chromosome, then that's nothing to worry about.
Rows with "END=" on them aren't even exactly calls; that's just a block indicating that the region from 3791542 to 3791582 inclusive could not be called, though it's perhaps a bit of a weird format.

But yeah, there are other things that don't look right about those example rows. The first row has MIN_DP, but it doesn't have END= info on it, so it isn't a block for which there could be a minimum.
The third row, assuming that's a single row, has DP 39 in the INFO field but 23 in the sample column. Also, I think some of those probably should have been callable.
I believe at least one issue is that it was picked out of a multi-sample VCF file without correcting all the annotations, which could have caused some of the other issues.
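(A quick sanity check on whether a gVCF contains genotype calls at all is to tally the GT field; a minimal sketch with bcftools, using a placeholder file name:)

# Count genotype values across the whole file; a healthy single-sample gVCF should show
# plenty of 0/0, 0/1 and 1/1 alongside the ./. block records
bcftools query -f '[%GT]\n' sample.g.vcf.gz | sort | uniq -c | sort -rn | head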

pm1113
05-02-2018, 10:25 PM
Thanks Geldius and Donwulff,
the three lines are indeed representative of the whole file. MIN_DP is present in all lines, be it for single positions or blocks. For single positions (as in line 1) the values are identical. For blocks (as in line 2) MIN_DP is smaller than DP.
For line 3 (GT:AD:DP:PL:SB ./.:16,23,0:39:652,0,542,700,611,1311:7,9,14,9) I interpreted that DP is the number 39 after the 2nd colon, so this seemed ok to me.

Thanks for clarifying the meaning of <Non_Ref>. I had misinterpreted the header information (##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location">) and an example on the gvcftools page where the ALT sequence was given as a dot or an alternative base.
So the main problem seems to be that there are no calls in the SAMPLE column. Scrolling through the file I only found ./.
I will have to wait and see what Dante replies; without the BAM file I can't do much more.

Geldius
05-03-2018, 12:28 AM
pm1113, I forgot to note: if you have any other atDNA results (FTDNA, Ancestry, 23andMe, MyHeritage),
you can compare the VCF (Dante Labs) with the other raw data (atDNA) on GEDmatch Genesis (not allowed to post links).

I did so and verified that 99.9% of the overlapping SNPs match between my other raw data and the Dante Labs VCF :).

pm1113
05-03-2018, 09:26 PM
This is my first sequencing - so unfortunately nothing to compare. But I am happy to hear that results between technologies match well (as they should...).
Just got a mail from Dante that .bam and .fastq files will be delivered in three weeks due to the backlog caused by the problems with the flashdrives.

Geldius
05-03-2018, 09:53 PM
That's good news from Dante, I guess it concerns me as well.

In case you would like to have some other results to compare, there is still a special sale for $59 from Family Tree DNA or MyHeritage.

cholineman
05-23-2018, 12:50 PM
I just ordered a WGS kit from them now, and am located in Australia. I have been sick a long time and assume I have a genetic disease on an autosome, so it will be interesting to see the results.

Geldius
06-06-2018, 01:02 AM
My order history, in case it is of interest to others:
01/11/2017 - WGS ordered
14/11/2017 - DNA sample delivered
05/12/2017 - confirmed DNA extraction with Level A
19/03/2018 - completed: Wellness and Longevity report, VCF raw data
17/04/2018 - received USB drive, BAM/FASTQ files missing, drive apparently affected with some data transfer error
... waiting for correct BAM/FASTQ files

I received a 1TB WD Elements drive with the BAM and FASTQ files today. I will initiate analysis on yfull.com very soon ...

Thank you Dante Labs !

Petr
06-06-2018, 06:40 AM
I have received no response from them since April 17th...

Miqui Rumba
06-07-2018, 06:31 PM
Hi, I have had my WES results since last November and now I am waiting for my WGS kit.
I am checking some bioinformatic tools with Kubuntu 17: samtools, bcftools, bwa and bowtie2.
First I downloaded the Hg38 FASTA to index with BWA and tried to process my exome FASTQ files with bowtie2 and samtools, though I had problems (except with chromosome 1 and chrM).
Then I tried the Hg37 FASTA and bcftools with better results: chromosome by chromosome, aligning with BWA MEM and converting to VCF with "bcftools mpileup -f ref.fa dantelabs.bam". I'm still generating chrom1.vcf, chrom9.vcf and chrom19.vcf with all readings (2 GB per VCF).
2 days ago, I compressed chrom1.vcf with bgzip (a .tbi index is needed too) and with "bcftools merge" I generated a new VCF merging my 23andMe raw data with the Dante Labs WES chrom1.vcf.
Some readings are different, though most show identical alleles.
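(For anyone following along, a minimal per-chromosome sketch in the same spirit, with placeholder file names; bcftools mpileup takes the reference via -f, and a bcftools call step is needed to emit actual genotype calls:)

# One-time indexing
samtools faidx ref.fa
samtools index dantelabs.bam
# Call one chromosome at a time (the region name may be "1" or "chr1" depending on the reference)
bcftools mpileup -f ref.fa -r chr1 dantelabs.bam | bcftools call -mv -Oz -o chrom1.vcf.gz
tabix -p vcf chrom1.vcf.gz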

RobertN
07-03-2018, 05:09 PM
wrong thread, posted on the technical one.

oagl
07-11-2018, 07:58 PM
It seems Dante Labs now offers a combined WES-WGS package for 850€/999$ (Whole GenomeZ). However, it would be cheaper to just buy the WGS for 400€ and WES for 350€ separately, at least now while there is a sale ;)

Miqui Rumba
07-13-2018, 02:50 PM
I have my Dante Labs WES snp.VCF uploaded to GEDmatch Genesis. At first my results were like Karwiso's kit, though now my kit can't run the One-to-Many comparison: not enough SNPs to compare matches!

omega56
07-13-2018, 05:59 PM
I just got my hard drive. This is my order history, just in case anybody is interested:

28/02/2018: Ordered WGS for 359.10 €
02/03/2018: Saliva Collection Kit arrived
12/03/2018: Kit arrived at Dante Labs
10/04/2018: DNA extraction confirmed
31/05/2018: VCF and Report can be downloaded
13/07/2018: Received a 1TB Hard disk with BAM, FASTQ, GVCF, VCF, rawVCF and a bunch of other files.


I found these statistics in one of the excel files on the hard disk. Maybe someone can tell me how they compare to other BAM files from Dante Labs:

Clean reads 1470466708
Clean bases (Mb) 147046.67
Mapping rate (%) 97.18
Unique rate (%) 92.97
Duplicate rate (%) 2.74
Mismatch rate (%) 0.51
Average sequencing depth (X) 46.7
Coverage (%) 99.83
Coverage at least 4X (%) 99.59
Coverage at least 10X (%) 98.82
Coverage at least 20X (%) 96.55
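(For anyone wanting comparable numbers from their own BAM, a rough sketch with placeholder file names; this is not necessarily how Dante derived their figures, and the depth pass is slow on a whole genome, with mosdepth being a faster alternative:)

samtools flagstat sample.bam    # total reads, mapping rate, duplicate rate
# Mean depth and coverage fractions at 4x/10x/20x over all reference positions
samtools depth -a sample.bam | awk '{s+=$3; n++; if($3>=4)c4++; if($3>=10)c10++; if($3>=20)c20++}
    END{print "mean depth:", s/n; print ">=4x:", c4/n; print ">=10x:", c10/n; print ">=20x:", c20/n}'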

MacUalraig
07-17-2018, 11:42 AM
Dante have another flash sale on, WGS 30x for 299 Euros so just grabbed one, offer expires at the end of the day.


"
Dante Labs Offers $349 Whole Genome Sequencing on Amazon Prime Day
Jul 16, 2018 Posted by: Andrea Riposati

Press Release (ePRNews.com) - NEW YORK - Jul 16, 2018 - International biotech company Dante Labs announced today the offer of whole genome sequencing (WGS) and interpretation at only USD 349 (€299). This offer marks a further price reduction compared to Prime Day 2017, and marks another “first” in the worldwide reduction of the cost of whole genome sequencing.

The special offer is available for only 36 hours – from July 16 to the end of July 17 on both the Amazon websites (amazon.com, amazon.de, amazon.fr) and on Dante Labs website. Customers from all over the world will be able to benefit from this one-off, historic opportunity.

The sequencing coverage is 30X via next-generation sequencing (NGS). The service includes bioinformatics analysis and data interpretation, as well as customized reports for diseases or conditions of interest.
"This offer marks a further price reduction compared to Prime Day 2017, and marks another “first” in the worldwide reduction of the cost of whole-genome sequencing"
Andrea Riposati
CEO


Dante Labs confirms its excitement to work with Amazon on this opportunity to make advanced genetic testing accessible to everyone.

“Amazon has been a special resource for Dante Labs from the beginning,” said Dante Labs CEO Andrea Riposati. “We share several values such as customer centricity and passion for excellence. Amazon provides us with an amazing platform to reach people worldwide and to achieve economies of scale and cost savings that we are glad to pass to our customers.”
"

MacUalraig
07-17-2018, 11:46 AM
I did pass on this little additional offer at the end, on privacy grounds... ;-)

"To effectively support all the people affected by rare diseases, Dante Labs will provide you with a Customized Report for your diseases of interest (ex. Epilepsy, Periodic Paralysis, etc.).

After checkout, email us your notes about the specific diseases or genes you are interested in. We will provide you with a customized report, free of charge. "

MacUalraig
07-26-2018, 02:56 PM
WGS sample kit received back at the Dante lab so I've started my ten week stopwatch...

reignman
07-26-2018, 03:44 PM
Is there any I-CTS10228 who has ordered Dante labs WGS?

pmokeefe
07-28-2018, 04:36 PM
02/03/2018: Ordered WGS
09/03/2018: Kit arrived at Dante Labs - I was in Rome, Italy at the time, so kit logistics were probably better than most people will experience
23/06/2018: VCF and Report downloaded
19/07/2018: Received Hard disk with BAM, FASTQ, GVCF, VCF, rawVCF etc.


I combined my SummaryTable.xlsx with the statistics posted by omega56 in a previous post. They seem roughly similar.

pmokeefe omega56
Clean reads 1511812484 1470466708
Clean bases (Mb) 151181.25 147046.67
Mapping rate (%) 96.99 97.18
Unique rate (%) 93.77 92.97
Duplicate rate (%) 2.19 2.74
Mismatch rate (%) 0.4 0.51
Average sequencing depth (X) 48.41 46.7
Coverage (%) 99.84 99.83
Coverage at least 4X (%) 99.59 99.59
Coverage at least 10X (%) 98.81 98.82
Coverage at least 20X (%) 96.71 96.55

pmokeefe
07-28-2018, 06:06 PM
My Dante Labs vcf files do not appear to contain refSNP (rsid) information. The "ID" column seems to always be a ".".
I would like to create a new vcf file which includes that information.
I found some promising resources at NCBI/dbSNP and Broad Institute/GATK but the easiest/best procedure is not yet clear to me.
Advice would be much appreciated!
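(One common route is bcftools annotate with a dbSNP VCF for the matching reference build; a minimal sketch with placeholder file names, assuming both files are bgzip-compressed, tabix-indexed, and use the same chromosome naming:)

# dbsnp.vcf.gz = a dbSNP release for the same build, with its .tbi index alongside
bgzip -c sample.snp.vcf > sample.snp.vcf.gz
tabix -p vcf sample.snp.vcf.gz
# Copy the ID column (rsIDs) from dbSNP into the sample VCF
bcftools annotate -a dbsnp.vcf.gz -c ID -Oz -o sample.snp.rsid.vcf.gz sample.snp.vcf.gz
tabix -p vcf sample.snp.rsid.vcf.gz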

rock
08-12-2018, 05:47 AM
My statistics compared with the others; my file has noticeably fewer reads:

Sample pmokeefe omega56 rock
Clean reads 1511812484 1470466708 1328781140
Clean bases (Mb) 151181.25 147046.67 132878.11
Mapping rate (%) 96.99 97.18 93.23
Unique rate (%) 93.77 92.97 93.56
Duplicate rate (%) 2.19 2.74 2.17
Mismatch rate (%) 0.4 0.51 0.46
Average sequencing depth (X) 48.41 46.7 40.82
Coverage (%) 99.84 99.83 99.77
Coverage at least 4X (%) 99.59 99.59 99.39
Coverage at least 10X (%) 98.81 98.82 98.32
Coverage at least 20X (%) 96.71 96.55 94.02

dufresda
08-16-2018, 04:43 AM
Compared to other companies that do sequencing? What do you mean? I am new to this; I understand the statistics, but it's hard to understand your point without the comparison data.

MacUalraig
08-16-2018, 09:13 AM
Compared to other companies that do sequencing? What do you mean? I am new to this; I understand the statistics, but it's hard to understand your point without the comparison data.

The table posted is comparing posters' data as shown in the column labels 'Sample pmokeefe omega56 rock' - the samples are all Dante.

Ted Toal
08-22-2018, 07:53 AM
There are some hints of big problems with Dante Labs:

1. The web site currently has NO contact information: no names, no email addresses, no phone numbers, no physical addresses.
2. When you register for an account, you do not get a confirmation email, and though it says the registration succeeded, you cannot log in to the account.
3. Price is too good to be true?
4. I received my two kits promptly, but with NO RETURN LABEL OR ADDRESS! And no response to website email sent to them.
5. There is no Customer Service or Support link on the web site.

MacUalraig
08-22-2018, 09:12 AM
There are some hints of big problems with Dante Labs:

1. The web site currently has NO contact information: no names, no email addresses, no phone numbers, no physical addresses.
2. When you register for an account, you do not get a confirmation email, and though it says the registration succeeded, you cannot log in to the account.
3. Price is too good to be true?
4. I received my two kits promptly, but with NO RETURN LABEL OR ADDRESS! And no response to website email sent to them.
5. There is no Customer Service or Support link on the web site.

I can log in to my account. A form-based contact method only is quite common, and YSEQ and FGC don't have manned phones either. There is a NY address at the bottom of the home page.

Did you actually look at their site? Because it has a contact form here:

https://www.dantelabs.com/pages/contact-us

bjp
08-22-2018, 01:42 PM
There are some hints of big problems with Dante Labs:
...
2. When you register for an account, you do not get a confirmation email, and though it says the registration succeeded, you cannot log in to the account.
...
4. I received my two kits promptly, but with NO RETURN LABEL OR ADDRESS! And no response to website email sent to them.


I purchased a Dante Labs WGS kit during their Amazon Prime Day deal. Re 2, I noticed when registering an account that I was only able to log in successfully using the Google Chrome browser; none of my other browsers worked. Re 4, my kit also arrived with no return mailing label. I contacted support, provided my order details, and received a prepaid FedEx shipping label by email within two days. The kit is not for me (I am already sequenced by another vendor) but I am planning on comparing results with the tester's 23andMe v3 chip data to get a feel for the sequencing quality.

newuser
08-25-2018, 06:39 PM
Any EU customers here?
What is the shipping method of the test kit?
Is it delivered by courier or regular post?
Payment method is credit card?
I'm asking because i have privacy concerns.
Was wondering if it can be ordered on a fake name and drop shipment address.
So these details will help me.

karwiso
08-27-2018, 08:03 AM
Yes, payment is by credit card.
Last time I ordered, the kits were delivered by regular post, but I needed to go to the post office and sign for the package.

Miqui Rumba
08-28-2018, 04:35 PM
I bought the WES kit last year and the WGS this year, both via Amazon, paying with a VISA card. I had problems with the DHL return shipment from Alicante, Spain: they said that saliva was a biohazard and I had to pay extra. I told Dante Labs staff and they suggested I contact DHL again saying "that is medical research". Good trick, they accepted.
This year I shipped my WGS kit back with UPS and had no problem.

MacUalraig
08-29-2018, 03:31 PM
Any EU customers here?
What is the shipping method of the test kit?
Is it delivered by courier or regular post?
Payment method is credit card?
I'm asking because i have privacy concerns.
Was wondering if it can be ordered on a fake name and drop shipment address.
So these details will help me.

My kit arrived by DHL and returned to NL via UPS. I just took it to a local dropoff who scanned the label, didn't have to declare anything etc.
As regards payment I think it was direct to them by cc/debit card rather than paypal or anything like that.

pmokeefe
09-03-2018, 08:52 PM
I'm an American who spends several months a year in Italy. I ordered a Dante Labs WGS kit while I was in Rome. It arrived by DHL at my address in Rome (c/o my Italian landlord) within a couple of days. I dropped it off at a store on my block that handles DHL and Dante Labs received it a couple of days later. I paid with an American credit card in my own name. When the results were ready I was back in America. The VCF files were made available on their web site as links to an AWS cloud account (in Singapore as turned out). They shipped a hard drive with the FASTQ and BAM files to my address in America which is a box in UPS store (that's my legal address in America). The mail box is in my own name. I corresponded with Dante Labs quite a bit via email. They were responsive and helpful. I am still studying the data, but it looks good so far.
I ordered the test in March 2018 and the results came back in July.

pmokeefe
09-03-2018, 08:58 PM
Any EU customers here?
What is the shipping method of the test kit?
Is it delivered by courier or regular post?
Payment method is credit card?
I'm asking because i have privacy concerns.
Was wondering if it can be ordered on a fake name and drop shipment address.
So these details will help me.
I'm an American who spends several months a year in Italy. I ordered a Dante Labs WGS kit while I was in Rome. It arrived by DHL at my address in Rome (c/o my Italian landlord) within a couple of days. I dropped it off at a store on my block that handles DHL and Dante Labs received it a couple of days later. I paid with an American credit card in my own name. When the results were ready I was back in America. The VCF files were made available on their web site as links to an AWS cloud account (in Singapore as turned out). They shipped a hard drive with the FASTQ and BAM files to my address in America which is a box in UPS store (that's my legal address in America). The mail box is in my own name. I corresponded with Dante Labs quite a bit via email. They were responsive and helpful. I am still studying the data, but it looks good so far.
I ordered the test in March 2018 and the results came back in July.

Ownstyler
09-18-2018, 08:51 PM
So how large is the BAM file they provide? Is it actually 100GB? That would be impossible to upload to YFull.

Miqui Rumba
09-19-2018, 09:52 AM
You can generate a new BAM with your chrY only if you asked Dante Labs for your FASTQ files (the BAMs sometimes have errors). I downloaded the UCSC full chromFa for GRCh37.75 and GRCh38.86 to realign my WES FASTQ files. After processing with BWA/samtools I have two BAMs of my chrY, one aligned to the hg19 FASTA and another one to the hg38 FASTA (the right one for YFull). I detected some markers of my haplogroup R1b-L21 Z253 DF73 in the hg38 raw BCF, though I am still waiting for my WGS hard disk (since April!) to get fully sequenced markers. Armando Framarini explained on the Dante Labs Facebook page (17/07/18) three ways to get your chrY results from a Dante Labs WGS. https://www.facebook.com/ufi/reaction/profile/browser/?ft_ent_identifier=643212806042174&av=100007283505611
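(If the whole-genome BAM itself is sound, there is also a simpler route than realigning from FASTQ: pull out just the Y reads with samtools. A minimal sketch with placeholder names; the contig is chrY on hg38/hs38DH but may be Y on some hg19 references:)

samtools index sample.bam                          # region queries need an index
samtools view -b -o sample.chrY.bam sample.bam chrY
samtools index sample.chrY.bam                     # a much smaller BAM that is far easier to upload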

Miqui Rumba
09-23-2018, 08:54 AM
I have my first Dante Labs WGS results: the Sequencing.com Wellness Report and a VCF file with 3.5M SNP variants (too heavily filtered). I uploaded the snps.VCF to Sequencing.com to obtain another annotated VCF (SnpEff/VEP), totally free, with the EvE app. I was talking with Curtis Rogers (GEDmatch admin) yesterday because I had problems uploading this fat file to GEDmatch Genesis. He thinks this kind of SNP file is all imputed and not suitable for genealogical research. Well, you know we are all Vindija Neanderthal matches in the GEDmatch Genesis database (same segment on chr19). Still, if you have another genealogical kit (Ancestry, 23andMe, MyHeritage, FTDNA...) and you have uploaded your raw data to DNA.land, then you can use the converted VCF files to compare and intersect the shared SNPs. This is very easy with samtools/bcftools: running "bcftools isec Dantelabs.snps.vcf DNA.land.imported.vcf -p dir" generates 3 files: one with the shared positions (possibly with different GTs), one with the SNPs only in the Dante Labs file, and one with the SNPs only in the genealogical VCF. If you want to concatenate the DNA.land imputed VCF (genealogical) and the Dante Labs snps.vcf (medical) it is a bit more difficult because of the Build 37 / hg chromosome-column incompatibility. I shall explain in the technical forum.
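(A minimal sketch of that intersection with placeholder file names; bcftools isec wants both VCFs bgzip-compressed and tabix-indexed, and both must be on the same build with the same chromosome naming:)

bgzip -c Dantelabs.snps.vcf > Dantelabs.snps.vcf.gz && tabix -p vcf Dantelabs.snps.vcf.gz
bgzip -c DNAland.imported.vcf > DNAland.imported.vcf.gz && tabix -p vcf DNAland.imported.vcf.gz
bcftools isec -p isec_dir Dantelabs.snps.vcf.gz DNAland.imported.vcf.gz
# isec_dir/0000.vcf = records only in the Dante file, 0001.vcf = only in the DNA.land file,
# 0002.vcf and 0003.vcf = shared positions as seen in each file respectively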

dudeweedlmao
09-29-2018, 04:44 PM
I have my first Dante Labs WGS results: the Sequencing.com Wellness Report and a VCF file with 3.5M SNP variants (too heavily filtered). I uploaded the snps.VCF to Sequencing.com to obtain another annotated VCF (SnpEff/VEP), totally free, with the EvE app. I was talking with Curtis Rogers (GEDmatch admin) yesterday because I had problems uploading this fat file to GEDmatch Genesis. He thinks this kind of SNP file is all imputed and not suitable for genealogical research. Well, you know we are all Vindija Neanderthal matches in the GEDmatch Genesis database (same segment on chr19). Still, if you have another genealogical kit (Ancestry, 23andMe, MyHeritage, FTDNA...) and you have uploaded your raw data to DNA.land, then you can use the converted VCF files to compare and intersect the shared SNPs. This is very easy with samtools/bcftools: running "bcftools isec Dantelabs.snps.vcf DNA.land.imported.vcf -p dir" generates 3 files: one with the shared positions (possibly with different GTs), one with the SNPs only in the Dante Labs file, and one with the SNPs only in the genealogical VCF. If you want to concatenate the DNA.land imputed VCF (genealogical) and the Dante Labs snps.vcf (medical) it is a bit more difficult because of the Build 37 / hg chromosome-column incompatibility. I shall explain in the technical forum.

How long did it take?