View Full Version : Dante Labs (WGS)



Erikl86
05-23-2019, 08:47 AM
Ok so a quick update. The good people of YFull have compiled my FASTQ files into a single Hg38 BAM file.

Now I've tried to use WGS Extract to extract a 23andMe-like txt file to upload to Genesis, but it doesn't support that yet.

Is there any easy way to re-align an Hg38 BAM file to Hg19?

Donwulff
05-23-2019, 10:02 AM
fastp is sequencer-type agnostic though (at least in the adapter read-through use as in the script) which is why I've suggested it, so it doesn't actually matter.
https://academic.oup.com/bioinformatics/article/34/17/i884/5093234
https://github.com/OpenGene/fastp
The BAM file headers claim it's Illumina because of compatibility with tools that expect that. I'm wondering if the new sequence is really Illumina, is there confirmation of adapter sequences? (Heck, 150bp PE could have different adapter sequences anyway, but fastp doesn't care what the adapter sequence is, as long as it's paired-end short read sequencing).

aaronbee2010
05-23-2019, 10:10 AM
Ok so a quick update. The good people of YFull have compiled my FASTQ files into a single Hg38 BAM file.

Now I've tried to use WGS Extract to extract a 23andMe-like txt file to upload to Genesis, but it doesn't support that yet.

Is there any easy way to re-align an Hg38 BAM file to Hg19?

So they do accept FASTQ files then?

Erikl86
05-23-2019, 12:22 PM
So they do accept FASTQ files then?

I asked nicely :).

But seriously, now I need to convert the Hg38 BAM to Hg19 BAM.... any ideas?

CrossMap doesn't seem to want to install :(

JamesKane
05-23-2019, 01:59 PM
You need to realign using the appropriate reference. The author of BWA mem recommends ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz.

There are several posts about how to use BWA assuming you have access to a *nix box with at least 4GB of RAM and 1TB of free disk space.
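
A minimal sketch of the realignment described above, written as a Python helper that only builds the shell pipeline as a string so it can be inspected before running. It assumes samtools and bwa are installed, that the reference has already been indexed with `bwa index`, and that the reads are paired-end; the function name and file names are hypothetical, and read-group handling is left out.

```python
import shlex


def realign_pipeline(bam_in, reference_fasta, bam_out, threads=4):
    """Build a shell pipeline that turns an aligned BAM back into reads
    and maps them again with bwa mem against a different reference."""
    q = shlex.quote  # POSIX-style quoting, so paths with spaces survive
    steps = [
        # Group read pairs together so they stream out as interleaved FASTQ.
        f"samtools collate -u -O {q(bam_in)}",
        # Convert the collated BAM on stdin back to FASTQ on stdout.
        "samtools fastq -n -",
        # -p tells bwa mem that stdin holds interleaved paired-end reads.
        f"bwa mem -t {int(threads)} -p {q(reference_fasta)} -",
        # Coordinate-sort the new alignments into the output BAM.
        f"samtools sort -@ {int(threads)} -o {q(bam_out)} -",
    ]
    return " | ".join(steps)


if __name__ == "__main__":
    print(realign_pipeline("sample.hg38.bam", "human_g1k_v37.fasta", "sample.hg19.bam"))
```

Printing the command first rather than running it is deliberate: a whole-genome realignment takes many hours, so it is worth checking the paths before starting.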

Erikl86
05-25-2019, 01:14 PM
You need to realign using the appropriate reference. The author of BWA mem recommends ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz.

There are several posts about how to use BWA assuming you have access to a *nix box with at least 4GB of RAM and 1TB of free disk space.

Is there any Windows option? I've tried installing CrossMap but it fails to even install.

MacUalraig
05-25-2019, 07:06 PM
Is there any Windows option? I've tried installing CrossMap but it fails to even install.

There are a number of approaches you could follow. Without wanting to sound too trite, you can:
1. try a Windows port of bwa
2. use bowtie2, which I have demonstrated runs under Windows (watch out for howls of protest if you pursue this though) - other aligners are available...
3. install (and optionally learn?) Linux, or the Linux subsystem thingy for Windows
4. pay FGC a small fee and they will do it for you, I think - they offer a FASTQ-to-hg38 alignment for $75, so maybe hg19 will only be $37.50? :-). Best to email them rather than use the form, in case they align it to hg38 instead.

FGC FASTQ to hg38 bam is at
https://www.fullgenomes.com/purchases/136/?

J Man
05-26-2019, 01:11 AM
I am sorry if this seems like a noob type of question, but is there any way that you can download your data and then search for individual SNPs like you can with 23andMe once your testing is complete with Dante?

tontsa
05-26-2019, 06:54 AM
Hi,

The easiest way is probably to upload the .VCF to sequencing.com and use the free Data Viewer there. You can also upgrade it to provide you with ClinVar stuff. Sequencing.com also accepts the .BAM and/or FASTQ if you got them as 2 files.



I am sorry if this seems like a noob type of question, but is there any way that you can download your data and then search for individual SNPs like you can with 23andMe once your testing is complete with Dante?

MacUalraig
05-26-2019, 09:12 AM
I am sorry if this seems like a noob type of question, but is there any way that you can download your data and then search for individual SNPs like you can with 23andMe once your testing is complete with Dante?

If you mean search by position, you can view the VCF in IGV (a Java download). It will index it the first time you load it, then you can just key or paste in the chr:position you want to look at. Make sure it's pointing at the right human reference (hg19 or hg38). You still won't be able to search by rsID though.
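
For anyone who would rather not load a genome browser, a position lookup can also be sketched in a few lines of Python. This is a rough illustration, not a replacement for IGV: it scans the whole file line by line, so it is slow on a full WGS VCF, and the rsID search only finds something if the ID column of the VCF was actually annotated (in many WGS VCFs it is just "."). The function names are hypothetical.

```python
import gzip


def _strip_chr(name):
    """Treat 'chr1' and '1' as the same contig name."""
    name = str(name)
    return name[3:] if name.lower().startswith("chr") else name


def find_in_vcf(path, chrom=None, pos=None, rsid=None):
    """Return the data lines of a VCF that match chrom:pos and/or an rsID."""
    opener = gzip.open if str(path).endswith(".gz") else open
    hits = []
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):          # header and meta lines
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:               # not a valid data line
                continue
            if chrom is not None and _strip_chr(fields[0]) != _strip_chr(chrom):
                continue
            if pos is not None and fields[1] != str(pos):
                continue
            # The ID column may hold several IDs separated by ';', or '.' if none.
            if rsid is not None and rsid not in fields[2].split(";"):
                continue
            hits.append(fields)
    return hits
```

The coordinates you search for still have to match the reference the VCF was called against (hg19 or hg38), exactly as with IGV.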

pinoqio
05-27-2019, 01:35 PM
200€ / 200$ off today on any product with promo code MEMORIALDAY

Btw, there is a fairly active Facebook group: Dante Labs Customers (https://www.facebook.com/groups/373644229897409/). Andrea (Dante Labs CEO) occasionally answers some questions there.
One of the things he promised there was free hg38 alignment. There is a webform https://dantelabs.typeform.com/to/GlCpRP, but I asked customer support earlier and they put me in a queue manually too.
A few days ago, the first customer received a newly aligned BAM & VCF, and apparently they are simply running it through EvE Free on sequencing.com. According to that post, they used Illumina Isaac for aligning and variant calling. The user also reported that there were clear differences between the original hg19 VCF and the new hg38 one (he highlighted an indel which was not called, but he is not sure which one is accurate).
Does anyone know more about the relative performance of the Isaac pipeline vs the standard bwa-mem + GATK?

Donwulff
05-27-2019, 02:30 PM
Never heard of the Isaac aligner & variant caller, but quick Googling suggests that's probably because it doesn't come very highly recommended. The most curious thing, though, is that discussions suggest it only works on Illumina sequencer "raw raw" data, and not BGI/MGI/Complete Genomics. Pipeline accuracy comparisons are tricky; it's basically a matter of how much effort someone (comparison author, or tool author) put into tuning it for that specific case. Given that, it's surprising that in the original Illumina Isaac paper they couldn't show comparable accuracy to BWA-MEM & GATK Best Practices, and the speed comparison was against a single-threaded run of an old version of the GATK pipeline. On the PrecisionFDA Truth Challenge an Isaac pipeline came nearly last on SNP F-score (combined recall & precision): https://precision.fda.gov/challenges/truth/results

Regardless, if they're running Illumina Isaac, there should be some reason for that choice, so I'm curious if Illumina has managed to improve on it. With every run on EvE costing the same? Otherwise I'd think Microsoft Genomics would be most efficient, albeit perhaps MS charges extra for that.
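
For reference, the F-score mentioned above is the harmonic mean of precision and recall. A small sketch of the arithmetic follows; the counts in the usage line are made up for illustration and are not taken from the PrecisionFDA results.

```python
def f_score(true_pos, false_pos, false_neg):
    """F1 score: harmonic mean of precision and recall for a variant call set."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)   # share of calls that are real
    recall = true_pos / (true_pos + false_neg)      # share of real variants found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 990 correct calls, 5 false calls, 10 missed variants.
    print(round(f_score(990, 5, 10), 4))
```

Because it is a harmonic mean, a pipeline cannot score well by being strong on only one of the two measures.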

rock
05-28-2019, 07:45 AM
Have you ever encountered this situation?
The hard disk sent by Dante Labs was rejected by Customs; I need to pay extra tax based on the price of the hard disk and the cost of the DHL shipment.
DHL tracking shows:
Further Detail: Entry is rejected by Customs Authorities
Next Step: Entry will be corrected and resubmitted. A DHL representative shall attempt to contact the importer or shipper if further information is required.

MacUalraig
05-28-2019, 06:58 PM
Have you ever encountered this situation?
The hard disk sent by Dante Labs was rejected by Customs; I need to pay extra tax based on the price of the hard disk and the cost of the DHL shipment.
DHL tracking shows:
Further Detail: Entry is rejected by Customs Authorities
Next Step: Entry will be corrected and resubmitted. A DHL representative shall attempt to contact the importer or shipper if further information is required.

Not quite, although my 23andMe kit got held up at customs in the US pending further paperwork, which 23andMe sorted out after a short delay. Let us know what happens.

Donwulff
05-28-2019, 08:54 PM
23andMe: When they still used a courier, the instructions told exactly what to fill in on the paperwork. Someone from the courier service called and demanded to know my social security number, over the phone, which the 23andMe instructions said not to fill in. Well, it was either give it over the phone or request a new kit...
Genos Research: On kit inbound, customs sent a notice that I needed to declare the kit for customs. This is kind-of nightmare-fuel, because you have a $1000 invoice and no way to prove that the inbound kit isn't worth $1000. I called and explained the situation, and they released it immediately.
Not customs related:
AncestryDNA: Called to arrange pickup, and the service person demanded to know what the shipment was. How do you explain that to someone over the phone? "Uh... test tube, inert buffer and exempt human specimen?" I tried explaining several times, only to get cut off by the customer service person who wouldn't have any of it. I finally said just "sample", which they accepted. I stressed that they needed to ask for me by name, but instead the delivery driver came to the company shipping department, causing mayhem as they demanded "the shipment".
Dante Labs: On the inbound test kit, the deliveryman kept coming to the wrong door or not getting to the door at all, and left no notice, so the delivery would've failed if I didn't follow it very closely & arrange very specific details. When they delivered the hard-drive/USB-stick, I was standing way outside to make sure I didn't miss it, and the driver (different from before) just handed it to me without checking identity or anything.

Customs issues have been relatively minor; however, if you are receiving actual goods, like a hard-drive, then by law customs duties & taxes should be paid. It would be nice if Dante Labs arranged to have the shipper pay for that (what else are they using DHL for?), although since the customs duties & taxes as well as the maximum free value vary by country, I can see where the receiver might have to pay. Does DHL at least take care of the proper paperwork? If you have a separate invoice for the hard-drive, that should be relatively easy (do they itemize work, hard-drive and shipping though?). At least an SSD should fall into a single customs code...

aaronbee2010
05-28-2019, 10:34 PM
Have you ever encountered this situation?
The hard disk sent by Dante Labs was rejected by Customs; I need to pay extra tax based on the price of the hard disk and the cost of the DHL shipment.
DHL tracking shows:
Further Detail: Entry is rejected by Customs Authorities
Next Step: Entry will be corrected and resubmitted. A DHL representative shall attempt to contact the importer or shipper if further information is required.

DHL parcels are notorious for being prone to customs checks. I remember when I was ordering a dash camera from China, I was recommended to avoid DHL. A keyboard I got from China was shipped via DHL and I had to pay customs on it.

Ysearcher
05-29-2019, 07:41 PM
If you mean search by position, you can view the VCF in IGV (a Java download). It will index it the first time you load it, then you can just key or paste in the chr:position you want to look at. Make sure it's pointing at the right human reference (hg19 or hg38). You still won't be able to search by rsID though.

I always manage to find the most difficult way to do anything. I have loaded all of my many sources of data (whole genomes, whole exomes, 23andMe genotypes) for many family members into UltraEdit, then I just start that up, wait about 15 minutes for it to load all the results, and then browse them at will. Then I copy and paste into Apple Notes for any given gene/region of interest, and annotate using whatever genome browser. Very time consuming, but it works. I'm mostly retired, and it's better than crossword puzzles, and keeps me thinking. I just wish I had some reliable data to interrogate. All of my data is mostly cheesy crap.

ybmpark
06-05-2019, 10:49 PM
I am sure other companies will offer WGS at competitive prices soon. I regret buying Dante. Their service is comical at best.

Erikl86
06-10-2019, 08:44 AM
So, a few updates:

1. I've successfully got a full analysis of my Y-DNA line in YFull, and as I've mentioned before, they've also compiled the FASTQ files to an Hg38 BAM file.

2. Any attempt using Sequencing.com or other tools to produce a usable VCF file from my Hg38 BAM to get normal results in either Global25 or GEDmatch Genesis was futile - I do get Ashkenazi Jew as the closest population on GEDmatch, but the distances are all screwed up (I get shortest distance of @12).

3. I've used FullGenomes service to re-compile my Hg38 BAM file to Hg19 BAM file, and they've just delivered it to me, so I've started downloading it.

Will try to use this tool to extract a 23andMe-like VCF file from my Hg19 BAM file:

http://www.beholdgenealogy.com/blog/?p=3018

Will keep you updated.

Donwulff
06-10-2019, 11:53 AM
Will try to use this tool to extract a 23andMe-like VCF file from my Hg19 BAM file:

http://www.beholdgenealogy.com/blog/?p=3018

Will keep you updated.

That's interesting, although the blog page doesn't even state where that program is coming from. Packaging reference genomes and samtools, I'm guessing it's doing a pretty good job, if heavy. (And it should really download the reference genomes on demand, but I digress...). There seems to be a heavy tendency right now to run analysis on a home computer; I wonder why this is. Most people now, and certainly in the future, won't have a full-fledged computer (albeit, as noted, the newest mobile phones can pack 1 terabyte of storage, which would certainly be enough for whole genome analysis, if slow). First, you're buying a $1000-$2000 computer for maybe a one-time analysis of the genome; secondly, there's no telling how much the data transfer will take or cost; and thirdly, assuming this is due to privacy, how will you be able to tell the third-party software won't upload your genome, and everything else found on your computer at the same time, to a server in China or Russia or whatever your Evil Empire is?

You could use Galaxy for some genetic analysis right now, although I've had trouble getting my BAM out of Sequencing.com (huh?) so I may have to actually upload my whole BAM online again to try how well it works right now. Sequencing.com's own analyses have gotten relatively expensive, so I don't know how well they work right now either, although it's probably still cheaper than any other way. It's not clear from your description what you tried & what went wrong, but I'm guessing it's the same old VCF vs. gVCF/tab-file problem being discussed every few replies. Won't the EvE gVCF or 23andMe output work?

mwasser
06-10-2019, 11:11 PM
how will you be able to tell the third-party software won't upload your genome, and everything else found on your computer at the same time, to a server in China or Russia or whatever your Evil Empire is?

The program is open source, source is included.

Donwulff
06-11-2019, 01:13 PM
The program is open source, source is included.

So did you investigate and analyze all the binaries in the package? The issue is hypothetical, of course, much like people objecting before that they can't use Sequencing.com because they may send their DNA data to China (or Russia in the case of YFull). I'm just pointing out that running a Windows software package is more insecure than sending your DNA data for third party analysis.
In this case there was absolutely no information on the author/provenance of the software, or the license.

I did download the package, and it's a bunch of binaries and a couple of scripts, with the reference genomes making up the bulk of the download. Source is actually not included, but everything looks to be open source. Individual people aren't really equipped to audit binaries, much less source code, though. Open source protects against trojans & backdoors when you pull your code from a trusted location used by thousands of users and developers, but I'm not really here to argue that... If you want to be relatively safe you could always run it in a throwaway virtual machine without a network connection ;)

However, we really need secure web-services for processing genomic data. Unfortunately, as noted, Sequencing.com has started returning "403 Forbidden" to my attempts to retrieve/share my BAM files, which suggests either they didn't have backups or their storage isn't as unrestricted as they implied (no, I didn't contact their support yet, I just Googled/searched their knowledge base and was surprised to find no matches). So I'm not sure I can recommend them any more on data integrity grounds.

tontsa
06-11-2019, 02:56 PM
However, we really need secure web-services for processing genomic data. Unfortunately, as noted, Sequencing.com has started returning "403 Forbidden" to my attempts to retrieve/share my BAM files, which suggests either they didn't have backups or their storage isn't as unrestricted as they implied (no, I didn't contact their support yet, I just Googled/searched their knowledge base and was surprised to find no matches). So I'm not sure I can recommend them any more on data integrity grounds.

I've done a few EvE premium runs lately, and every time the results return a 403 and you have to ask support to give you a "48 hour" link so you can download them to your own computer. Paradoxically, they can't save the results under your account; you actually have to upload the file yourself if you want to use it on sequencing.com. I'll really need to look into building my own cluster with working pipelines so I can stop paying sequencing.com for each run.

TigerMW
06-11-2019, 04:10 PM
...and thirdly assuming this is due to privacy, how will you be able to tell the third-party software won't upload your genome, and everything else found on your computer at the same time, to a server in China or Russia or whatever your Evil Empire is?

... much like people objecting before that they can't use Sequencing.com because they may send their DNA data to China (Or Russia in the case of YFull). I'm just pointing out that running a Windows software package is more insecure than sending your DNA data for third party analysis.

These are considerations. I'm not sure I agree that Windows is less secure than sending your DNA data to third party analysis. I don't think anyone can answer that with some objective numbers.

Regardless, sending your Whole Genome Sequencing (WGS) data across the internet (anywhere) is something to consider heavily. Essentially, your WGS data is the blueprint to yourself as well as parts of your family members.

This is the reason I won't recommend WGS at this point. We need clear laws on this and secure systems. I don't think mom and pop businesses can support this. The big governments and hackers are fighting cyberwars.

JamesKane
06-11-2019, 10:52 PM
These are considerations. I'm not sure I agree that Windows is less secure than sending your DNA data to third party analysis. I don't think anyone can answer that with some objective numbers.

There is little stopping a trojan Windows application from phoning home with your genetic data. How does the operating system know if small encrypted packets broadcasting back to AWS contain sensitive information instead of typical check-for-updates chatter? Unless you are compiling the source for these newer GUIs popping up after auditing the code, you have no idea what's in the distributed archives. Most interested parties don't have the capacity to compile the code, much less understand the details of its operations.

On the other hand, if you are installing the individual components like GATK, bwa, Yleaf, etc. from their repositories, there are thousands of eyes on the ball. There's little risk in doing the analysis from a command prompt, and judicious use of Google makes it pretty simple these days.

Web hosted services like sequencing.com are convenient, but unless their marketplaces pick up or they score million dollar contracts to access the data, I don't see them remaining free. Cloud compute and storage resources are not inexpensive.

Donwulff
06-11-2019, 11:53 PM
I should stress I have no information that WGS Extract is dangerous, and I'm not accusing it of that; "it's probably safe". I've just seen people be very security-conscious even on that thread, so I thought to mention that running a program (or script running programs) found off the Internet on your genome probably isn't the best course if you're worried about the security of your genomic data. And it was more than a little concerning to me that the web-page gives no hint as to the author/origin/license of that software. In 2017 there was new malware every 4.2 seconds, today they're popping up faster than new humans: https://www.gdatasoftware.com/blog/2017/04/29666-malware-trends-2017 But yes, one way to mitigate would be to run it in some sort of sandbox, like a virtual machine without network access.

Sequencing.com is a good case in point on the price. They started off with some nice, free analysis products, but at the present time I think "gVCF to VCF" is the ONLY free thing they have, and the individual tool prices have been rising exponentially, so they're practically pricing themselves out of the market. I don't think "free" is the important word here, but $19.99 for extracting a 23andMe compatible file? You're probably better off buying a 23andMe test. Another good example is Helix: they even do their own sequencing, but apparently a sequencing & genomics marketplace wasn't profitable enough, so they made a strategic shift away from personal genomics. Of course, getting a computer cluster & learning all the tools to use on it for a one-time genomic analysis is still going to be the way, way more expensive option. Galaxy is still free (until people rush that), but it takes more effort and doesn't have specific tools. One could always just drop a pre-configured virtual machine on AWS/Azure and pay for the computing time and storage, though that has a somewhat similar issue to the "found binary on the Internet".

mwasser
06-12-2019, 09:37 AM
WGS Extract has got the license text included. It is the file "wgsextract-license" in the "open_source_licenses" subdirectory of the download archive. It is GPL V3.

The source code is also included. It is the file "wgsextract.py" in the "programs/wgsextract" subdirectory. The program itself even runs from source; there is no binary. The batch file in the main folder calls the Python interpreter and lets it run wgsextract.py.
If you look at the Python source code, then you can see that this file doesn't do anything evil. It just shows the GUI, using the official Tkinter module of Python, and it runs samtools, bcftools and other open source products.

The only binary files in the archive are the third party software products that come included, so that the user doesn't have to install anything and can run it immediately. Those binaries don't open any network connections, as can be seen in a network sniffer. But if somebody still suspects something evil, then he can download Python, Cygwin etc. from the official homepages and overwrite the .exe and .dll files in the archive with them. However, somebody who is that cautious would also not upload his whole genome to a commercial company like sequencing.com.

That's just my humble opinion. At the end of the day, everybody has to decide himself which way to take to extract the files.

TigerMW
06-12-2019, 04:30 PM
... Regardless, sending your Whole Genome Sequencing (WGS) data across the internet (anywhere) is something to consider heavily. Essentially, your WGS data is the blueprint to yourself as well as parts of your family members.

This is the reason I won't recommend WGS at this point. We need clear laws on this and secure systems. I don't think mom and pop businesses can support this. The big governments and hackers are fighting cyberwars.

How does the hard drive method of transmission work in terms of safety? Does it provide extra security, or are there holes in this method?

I have some information on my home systems cordoned off network-wise and only back up with portable hard drives that go into safes. Maybe I'm overdoing it but there is some data I want only my family be able to retrieve. Hard drives are cheap and everyone should have a solid fireproof safe.

JamesKane
06-12-2019, 04:48 PM
That depends on chain of custody. Many of the current WGS offerings are sending the bar-coded sample to partners for sequencing. You would have to inquire about how they get the FASTQ data back before delivery to you. I would imagine it’s a mix of encrypted file transfer or hard disk depending on volume.

pinoqio
06-12-2019, 06:55 PM
Virtually all trojan/virus/malicious software is created to make money in one way or another.
While stealing/uploading DNA seq data is probably not that hard, I just can't think of a scenario where a trojan writer could make money off the genomes stolen from random people.
Maybe with the exception of targeted thefts aimed at politicians and celebrities who could be blackmailed.

Once DNA printing advances, it might become useful to frame someone for a crime, but at the same time the public trust in DNA fingerprinting would nosedive, so it might not make much of a difference.
Those who have the most to gain from DNA data would probably be the insurance industry, but they won't risk the media firestorm if they are exposed buying customer DNA from dodgy botnet operators.

At the moment, the cost/payoff ratio for DNA theft just means that a criminal is much better off fishing for your bank data than creating a DNA database in the hopes of selling it at some point.
For authoritarian nation-state actors the equation changes a bit, but I still don't think even China would be interested in creating a DNA database for foreigners - if I were a Chinese citizen, I would be concerned.

I guess we'll have to live with the idea that our genome is not private - not any more private than a literal fingerprint. Nanopore sequencers are already the size of a large flash drive, and most of the tech in there is commodity electronics - the price will continue to go down. We are literally one order of magnitude in price reduction away from 50$, and then lifting a full genome off of a used coffee cup will be as easy as lifting off a fingerprint.

Trying to ban people from using tech that is cheap and can be used secretly is not going to be successful, the best we can hope for is stopping companies from discriminating using it.

Donwulff
06-12-2019, 07:03 PM
Should probably start (another) thread on genomics security. The hard-drive/USB-key home delivery is probably the worst possible choice from a security standpoint. In my case, the courier/deliveryman handed it to me with no identification or verification while I was just standing outside. There is also a fairly good chance that mysterious storage media gets copied for analysis at each border crossing. Also, anybody else in the receiving household will have access to it, which may not be desirable in many cases. The way to do the transfer safely is to encrypt the media with strong encryption, and provide the encryption key on the web-site to the authenticated user. Of course, encryption isn't always user-friendly, so perhaps make it optional.

HTTPS transfers themselves are fairly secure, I'd like to say as secure as we know how to make them. The risks of online transfer are mainly related to the endpoints, including someone getting a copy of the company's SSL/TLS certificate, but in that case they can probably get a copy of the data as well. I'm not sure if there are any DTC genomic companies where genetic data is NEVER loaded on an Internet-connected computer. The relative risk of WGS vs. microarray would be a fascinating debate: if a BAM file is around 100GB compressed and often ambiguous while requiring non-trivial computing resources, it becomes somewhat self-limited compared to microarray files. Nevertheless, microarrays (AncestryDNA, MyHeritage, 23andMe etc.) are more than enough to identify people, their genetic ethnicity, and genetic variants of known effect.

Consent is extremely important for reasons of ethics, trust and people's feeling of agency. However, please remember that there are many genetic analysis labs that will analyze samples no questions asked, or you can just say it's your late father's chewing gum for genealogical research. VolTRAX + Flongle even today allows DNA sequencing on the fly with a mobile phone. It's been shown time and time again that it's far more cost-effective and successful to just re-sequence someone's genome than to obtain it via targeted hacking. Researchers have access to hundreds of thousands of sequenced samples with attached information about ethnicity, medical conditions etc., so they're not really interested in obtaining YOUR sample with no attached information. Some opportunistic hacks, like indeed having a program that copies your genotypes & the contents of your Documents folder, or blackmailing a company by threatening to reveal customers' genetic information, are likely, and with more people taking genetic tests, scams claiming to have your genetic data can't be far behind, and will cause some issues.

gene.test
06-13-2019, 07:39 AM
How long were you waiting for results?

Erikl86
06-13-2019, 08:06 AM
I seriously give up. Using EvE, using WGS Extract, using DNA Kit Studio or whatever - I still can't get a normal raw data file to upload to GEDmatch or to use in Global25 (Davidski got bogus results).

For now, I'm happy with the YFull upload I got and the full medical report I have.

I did manage to get nice results through GEDmatch by merging my Dante Labs 23andMe extraction upload with my previous 23andMe v5 upload, but that just seems to rely mostly on my v5 upload. On its own, the Dante data just doesn't seem to be usable for ancestry testing yet, unfortunately.

tontsa
06-13-2019, 08:21 AM
I seriously give up. Using EvE, using WGS Extract, using DNA Kit Studio or whatever - I still can't get a normal raw data file to upload to GEDmatch or to use in Global25 (Davidski got bogus results).


I got OK results with EvE by converting FASTQ -> gVCF hg19 and then running DNA Kit Studio on it with the LivingDNA template.

mwasser
06-13-2019, 08:55 AM
I seriously give up. Using EvE, using WGS Extract, using DNA Kit Studio or whatever - I still can't get a normal raw data file to upload to GEDmatch or to use in Global25 (Davidski got bogus results).
[...]
I did manage to get nice results through GEDmatch by merging my Dante Labs 23andMe extraction upload with my previous 23andMe v5 upload, but that just seems to rely mostly on my v5 upload. On its own, the Dante data just doesn't seem to be usable for ancestry testing yet, unfortunately.

Did WGSExtract show you an error message in the black command line window? If an error occurs, then an empty 23andMe file is produced.
The downloadable beta version had the following limitations:

- the .bai index file had to be in the same directory as the bam file
- the path of the bam file wasn't allowed to have non standard characters in it
- autosomal export was only possible for hg19

The next version addresses these issues, but it is not published yet.
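
The first two limitations in the list above can be checked before starting a run. The following is only a sketch with a hypothetical function name; in particular, reading "non standard characters" as "spaces or non-ASCII characters" is an assumption, and the third limitation (hg19 only) cannot be verified from the path alone.

```python
import os


def preflight(bam_path):
    """Return a list of likely problems for the beta limitations listed above."""
    problems = []
    if not os.path.isfile(bam_path):
        problems.append("BAM file not found")
    # The index is normally named sample.bam.bai or sample.bai.
    index_names = [bam_path + ".bai", os.path.splitext(bam_path)[0] + ".bai"]
    if not any(os.path.isfile(name) for name in index_names):
        problems.append("no .bai index next to the BAM file")
    # Assumption: "non standard characters" means spaces or non-ASCII characters.
    if " " in bam_path or not bam_path.isascii():
        problems.append("path contains spaces or non-ASCII characters")
    return problems
```

An empty list means none of these particular problems were spotted; it says nothing about whether the BAM itself is intact.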

Erikl86
06-13-2019, 09:17 AM
I got OK results with EvE by converting FASTQ -> gVCF hg19 and then running DNA Kit Studio on it with the LivingDNA template.

Awesome ! I'll try to use the LivingDNA template !

EDIT: Didn't work - got the same bogus results :(

Erikl86
06-13-2019, 09:49 AM
Did WGSExtract show you an error message in the black command line window? If an error occurs, then an empty 23andMe file is produced.
The downloadable beta version had the following limitations:

- the .bai index file had to be in the same directory as the bam file
- the path of the bam file wasn't allowed to have non standard characters in it
- autosomal export was only possible for hg19

The next version addresses these issues, but it is not published yet.

I just saw that FullGenomes forgot to provide me with the .bai file for the realigned hg19 BAM file.

mwasser
06-13-2019, 10:39 AM
The next, yet unpublished version of WGSExtract checks directly after opening a bam file if there is no .bai index file, and creates it in that case.
In the meantime, there is an alternative possibility to create it, albeit a bit more cumbersome:
If you are comfortable using the Windows command line, you could open it, go to the "samtools-cygwin" subdirectory of WGSExtract, and type:
samtools index /path/to/your/bamfile/filename-of-bamfile.bam
This would create the missing .bai index file.
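
For anyone scripting this, here is a minimal sketch of the "check for the index, create it only if missing" idea. The function names are made up for illustration and this is not how WGSExtract itself does it. It assumes samtools is on the PATH and that the index sits next to the BAM as either name.bam.bai (what samtools index writes) or name.bai (what some other tools write).

```shell
#!/bin/sh
# Hypothetical helper names; a sketch, not WGSExtract's own code.

# Succeeds if an index file already sits next to the BAM.
# samtools writes "<name>.bam.bai"; some other tools write "<name>.bai".
has_bam_index() {
    bam="$1"
    [ -f "${bam}.bai" ] || [ -f "${bam%.bam}.bai" ]
}

# Creates the index only when none is found.
ensure_bam_index() {
    bam="$1"
    if has_bam_index "$bam"; then
        echo "index present for $bam"
    else
        echo "no index found, running samtools index on $bam"
        samtools index "$bam"   # assumes samtools is on the PATH
    fi
}

# Usage:
#   ensure_bam_index /path/to/your/bamfile/filename-of-bamfile.bam
```

Indexing a 30x BAM can take a while, so skipping it when the .bai is already there saves time on repeated runs.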

Erikl86
06-13-2019, 01:38 PM
The next, yet unpublished version of WGSExtract checks directly after opening a bam file if there is no .bai index file, and creates it in that case.
In the meantime, there is an alternative possibility to create it, albeit a bit more cumbersome:
If you are comfortable using the Windows command line, you could open it, go to the "samtools-cygwin" subdirectory of WGSExtract, and type:
samtools index /path/to/your/bamfile/filename-of-bamfile.bam
This would create the missing .bai index file.

I get this warning when I run it:

"[W::bam_hdr_read] EOF marker is absent. The input is probably truncated"

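A note on that warning for others who hit it: a complete BAM file ends with a fixed 28-byte empty BGZF block, and the warning means samtools did not find it, which often (not always) points to an interrupted download or copy. The sketch below checks for that marker with standard tools only. The function name is made up for illustration; newer samtools releases also ship a "quickcheck" subcommand for this, but I can't say whether the bundled cygwin build has it.

```shell
#!/bin/sh
# Hex of the 28-byte empty BGZF block that terminates a complete BAM file
# (the end-of-file marker defined in the SAM/BAM specification).
BGZF_EOF_HEX="1f8b08040000000000ff0600424302001b0003000000000000000000"

# Hypothetical helper: succeeds if the file ends with the BGZF EOF marker.
has_bgzf_eof() {
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \t\n')
    [ "$tail_hex" = "$BGZF_EOF_HEX" ]
}

# Usage:
#   if has_bgzf_eof your.bam; then echo "EOF marker present"; else echo "file looks truncated"; fi
```

If the marker is missing, comparing the file size or checksum against the source and downloading again is probably the first thing to try.
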
Erikl86
06-13-2019, 02:55 PM
Ok so I ran WGSExtract now with my bai file, and I get a raw data file of only 1 KB - which means a bogus file.

This is what I get:

C:\WGSExtractBeta\WGSExtract\programs\wgsextract>"C:\WGSExtractBeta\WGSExtract\programs\samtools-cygwin\bash" "C:\WGSExtractBeta\WGSExtract\temp\extract23.sh"
bash.exe: warning: could not find /tmp, please create!
Starting mpileup... Please be patient!
[warning] samtools mpileup option `v` is functional, but deprecated. Please switch to using bcftools mpileup in future.
[mpileup] 1 samples in 1 input files
Mpileup completed. Starting SNP calling...
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
SNP calling completed. Starting annotation...
Annotation completed. Starting extraction from VCF ...
Extraction from VCF completed. Sorting by chromosome and position ...
dos2unix: converting file C:/BAM/output/ASJ9N.GRCh37__23andmeV3.txt to Unix format...
C:/BAM/output/ASJ9N.GRCh37__23andmeV3.txt was created. Compressing ...
adding: ASJ9N.GRCh37__23andmeV3.txt (deflated 46%)
extract23: Output file C:/BAM/output/ASJ9N.GRCh37__23andmeV3.txt.zip was created.

mwasser
06-13-2019, 03:42 PM
Your BAM file is aligned to GRCh37, which is fine for autosomal export; the old beta version just expected hg19-style chromosome names (chr1, chrM) rather than GRCh37-style ones (1, MT).
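
To see which naming a given BAM uses before editing anything, a small sketch like the following can help. It reads header text on stdin and looks at the first @SQ line. The function name and the two labels it prints ("chr-prefixed" and "plain") are my own, chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reports the contig naming style of a SAM/BAM header.
# Prints "chr-prefixed" for hg19-style names (SN:chr1) and "plain" for
# GRCh37-style names (SN:1). Prints nothing if there is no @SQ line.
contig_style() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:chr/) { print "chr-prefixed"; exit }
                if ($i ~ /^SN:/)    { print "plain"; exit }
            }
        }
    '
}

# Usage (assumes samtools is on the PATH):
#   samtools view -H your.bam | contig_style
```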

Please open the file extract23_script_template.txt with a text editor. It is situated in the subdirectory /programs/extract23. Look for this line:


samtools mpileup -C 50 -v -l ${REF_23ANDME} -f ${REF} ${BAMFILE_SORTED} > 23andMe_raw.vcf.gz


replace it with these lines:


samtools view -H ${BAMFILE_SORTED} | /bin/sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | /bin/samtools reheader - ${BAMFILE_SORTED} > bam_tmp.bam
samtools index bam_tmp.bam
samtools mpileup -C 50 -v -l ${REF_23ANDME} -f ${REF} bam_tmp.bam > 23andMe_raw.vcf.gz


Please save the text file and run WGSExtract again.

EDIT: Don't use these steps, sorry. Please see my other post below.

Erikl86
06-13-2019, 07:06 PM
Your BAM file is aligned to GRCh37, which is fine for autosomal export; the old beta version just expected hg19-style chromosome names (chr1, chrM) rather than GRCh37-style ones (1, MT).

Please open the file extract23_script_template.txt with a text editor. It is situated in the subdirectory /programs/extract23. Look for this line:


samtools mpileup -C 50 -v -l ${REF_23ANDME} -f ${REF} ${BAMFILE_SORTED} > 23andMe_raw.vcf.gz


replace it with these lines:


samtools view -H ${BAMFILE_SORTED} | /bin/sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | /bin/samtools reheader - ${BAMFILE_SORTED} > bam_tmp.bam
samtools index bam_tmp.bam
samtools mpileup -C 50 -v -l ${REF_23ANDME} -f ${REF} bam_tmp.bam > 23andMe_raw.vcf.gz




Please save the text file and run WGSExtract again.


Is this the line to be replaced?


{{samtools}} mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} {{input_bamfile_sorted}} > {{temp_raw_vcf}}

If so, then it isn't exactly as you typed it - how exactly should I write the other lines? Or would copying them as you typed suffice?

EDIT: found the exact line in extract23_original_script.sh rather than in extract23_script_template.txt.

mwasser
06-13-2019, 08:01 PM
Sorry, I mixed up the different versions in my head.
Please ignore my last comment. Correct is:
Look in the file
extract23_script_template.txt

for this line:

{{samtools}} mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} {{input_bamfile_sorted}} > {{temp_raw_vcf}}

Replace it with:

samtools view -H {{input_bamfile_sorted}} | /bin/sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | /bin/samtools reheader - {{input_bamfile_sorted}} > bam_tmp.bam
samtools index bam_tmp.bam
samtools mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} bam_tmp.bam > {{temp_raw_vcf}}
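
To make it clearer what the sed part of that replacement does, here is a small standalone sketch that applies the same two expressions to a few sample header lines instead of a real BAM header. The function name is made up for illustration; the sed expressions are copied from the lines above.

```shell
#!/bin/sh
# Same two sed expressions as in the post, applied to header text on stdin.
# The first adds "chr" in front of names starting with a digit, X or Y;
# the second renames MT to chrM.
add_chr_prefix() {
    sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/'
}

# Sample tab-separated @SQ lines, for illustration only.
printf '@SQ\tSN:1\tLN:249250621\n@SQ\tSN:X\tLN:155270560\n@SQ\tSN:MT\tLN:16569\n' | add_chr_prefix
```

So SN:1 becomes SN:chr1, SN:X becomes SN:chrX and SN:MT becomes SN:chrM, while names that start with any other letter are left alone. Only the names in the header change; the reads themselves are not realigned.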

Erikl86
06-14-2019, 12:53 PM
Sorry, I mixed up the different versions in my head.
Please ignore my last comment. Correct is:
Look in the file
extract23_script_template.txt

for this line:

{{samtools}} mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} {{input_bamfile_sorted}} > {{temp_raw_vcf}}

Replace it with:

samtools view -H {{input_bamfile_sorted}} | /bin/sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | /bin/samtools reheader - {{input_bamfile_sorted}} > bam_tmp.bam
samtools index bam_tmp.bam
samtools mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} bam_tmp.bam > {{temp_raw_vcf}}


That doesn't work:

F:\WGSExtractBeta\WGSExtract\temp\extract23.sh: line 14: /bin/sed: No such file or directory
F:\WGSExtractBeta\WGSExtract\temp\extract23.sh: line 14: samtools: command not found
F:\WGSExtractBeta\WGSExtract\temp\extract23.sh: line 14: /bin/samtools: No such file or directory
F:\WGSExtractBeta\WGSExtract\temp\extract23.sh: line 15: samtools: command not found
F:\WGSExtractBeta\WGSExtract\temp\extract23.sh: line 16: samtools: command not found
tbx_index_build failed: F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_raw.vcf.gz
Mpileup completed. Starting SNP calling...
Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
Failed to open F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_raw.vcf.gz: unknown file type
tbx_index_build failed: F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_called.vcf.gz
SNP calling completed. Starting annotation...
Failed to open F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_called.vcf.gz: unknown file type
tbx_index_build failed: F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_annotated.vcf.gz
Annotation completed. Starting extraction from VCF ...
Failed to open F:/WGSExtractBeta/WGSExtract/temp/temp_autosomes_annotated.vcf.gz: unknown file type


EDIT: Found out what needed to be fixed:



{{samtools}} view -H {{input_bamfile_sorted}} | {{sed}} -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | {{samtools}} reheader - {{input_bamfile_sorted}} > bam_tmp.bam
{{samtools}} index bam_tmp.bam
{{samtools}} mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} bam_tmp.bam > {{temp_raw_vcf}}


Instead of these:



samtools view -H {{input_bamfile_sorted}} | /bin/sed -e 's/SN:\([0-9XY]\)/SN:chr\1/' -e 's/SN:MT/SN:chrM/' | /bin/samtools reheader - {{input_bamfile_sorted}} > bam_tmp.bam
samtools index bam_tmp.bam
samtools mpileup -C 50 -v -l {{reftab}} -f {{refgenom}} bam_tmp.bam > {{temp_raw_vcf}}


Now it works perfectly and I got excellent results on GEDmatch! What a great tool :) Thanks !

ybmpark
06-14-2019, 05:18 PM
How long were you waiting for results?

Almost 7 months and I still don't have the full raw data. I think they lost my data and are trying to reconstruct them from the vcf files. Dante is pure fraud.

Erikl86
06-15-2019, 09:18 AM
I should summarize the pipeline I went through, for others to follow:

1. Got my two FASTQ files from DanteLabs (as links in my account).
2. Shared my FASTQ files with YFull - they sent me an Hg38 BAM file and did my paternal haplogroup analysis.
3. Shared my Hg38 BAM with Fullgenomes to get it re-aligned to GRCh37/Hg19.
4. Used WGSExtract to create a 23andMe-like raw data and uploaded it to GEDmatch.

What didn't work:

1. Tried to use the SNP VCF from DanteLabs directly in DNA Kit Studio, even with the reference.

2. Tried Sequencing.com EvE to create VCF or BAM from my FASTQ files - didn't yield any good results.

MacUalraig
06-15-2019, 09:42 AM
I should summarize the pipeline I went through, for others to follow:

1. Got my two FASTQ files from DanteLabs (as links in my account).
2. Shared my FASTQ files with YFull - they sent me an Hg38 BAM file and did my paternal haplogroup analysis.
3. Shared my Hg38 BAM with Fullgenomes to get it re-aligned to GRCh37/Hg19.
4. Used WGSExtract to create a 23andMe-like raw data and uploaded it to GEDmatch.

What didn't work:

1. Tried to use the SNP VCF from DanteLabs directly in DNA Kit Studio, even with the reference.

2. Tried Sequencing.com EvE to create VCF or BAM from my FASTQ files - didn't yield any good results.

Did you speak to FGC about the 23andMe extract file too? Might save (others) a lot of bother. I had a feeling they offered this too, possibly as their 'Gedmatch Analysis of Uploaded WGS data $25.00' product - although if so, they could name it a bit better.

Someone surely will get a grip on all this soon otherwise we'll all have to wait for 23andMe to launch their WGS which at least has a chance of a decent interface. Probably it won't be Dante though!

ybmpark
06-18-2019, 08:48 AM
Dante told me that there is just no way that they would provide the link to my full data for free and led me to buy the hard drive but nothing happened after 8 weeks.
Anyone with the same experience?
And they keep sending me these inane ads about full genome sequencing for $ 399. Big joke. I think it was meant to be taunting.

tontsa
06-18-2019, 10:32 AM
Dante told me that there is just no way that they would provide the link to my full data for free and led me to buy the hard drive but nothing happened after 8 weeks.
Anyone with the same experience?


For me it took 3 months after the initial .vcf results on the web page for the hard disk to be delivered. Dante is a very good patience-building exercise.

bjp
06-18-2019, 01:16 PM
Dante told me that there is just no way that they would provide the link to my full data for free and led me to buy the hard drive but nothing happened after 8 weeks.
Anyone with the same experience?
And they keep sending me these inane ads about full genome sequencing for $ 399. Big joke. I think it was meant to be taunting.

I continue to contact their customer support once monthly about delivery of the hard disk I have already paid for. Responses have changed from "we will get you to the front of the shipping queue" (about three months ago) to "we will send it in a few weeks". I am not far from disputing the charge with my credit card provider. Enough chargebacks and they will lose the ability to accept credit card payment.

But, I want the data and would never have paid them for the sequencing without it.

poi
06-21-2019, 06:14 PM
I have also not received my BAM hard drive yet. Whenever I email them, they have been very responsive (within an hour) and promise that they have sent the request to their lab for the hard drive, but it has been over a month or two. Just now they responded saying that they have again sent the request to their lab. I already have the VCFs, but need the BAM for full utilization across various tools/platforms.

Wireless
06-21-2019, 06:39 PM
They go from time-wasting excuses to outright lying... they cannot be trusted, period. They lost my 6th sample with them and they lied about it. It was sent in October 2018, and after waiting and waiting I contacted them in March 2019. Their reply: After checking internally with UPS, we realized that, with that tracking code, no one has shipped back the kit with the saliva collection. Most likely you used another return shipping label. Please let me know about this.

I then contact UPS myself and get this as response using their tracking code: Thank you for your e-mail.

Our record shows that this shipment was delivered on xx/10/18 and signed by Patrick at the dock.

If we may assist you in the future, please feel free to contact us.

In the end they admitted the "mix-up" and sent me 2 kits back, but we lost all this time and potentially the sample could not be taken again because of location, illness, death etc.

You need patience and then some...

MacUalraig
06-21-2019, 07:23 PM
I always track outgoing DNA samples with the courier tracking number; both parties should then be able to track it without any dispute. Then you know when it turned up and can make sure they have updated their webpage.

Having said that, with my short read sample I misread an '8' for a 'B' when I wrote the tracking number down and ended up having to email about it anyway.

Donwulff
06-28-2019, 01:36 PM
Dante Labs "Manage Your Kits" now takes you (or well, at least me!) into a new "Health Manager". I kinda like the look & technical side of it, however all disclaimers and soft explanations seem to be gone from this experience, which is... quite concerning. I have one "high-risk" variant whose text starts with "We have found a heterozygous variant associated with..." but does not explain what heterozygous means, or that the condition is considered a recessive one, ie. they did not find I'm affected but to many readers it would sound like they are affected!

Let's be clear here, I quite like this level of information from a technical standpoint, but for the general public this must be quite bewildering & dangerous, although the text looks to be a direct copy of Genetics Home Reference https://ghr.nlm.nih.gov/ Health Conditions (US Government works are in public domain generally). However, when you combine diagnostic and treatment information with the genetic risk with no explanation or present disclaimer about the genetic results, it creates the impression you've been diagnosed and should start treatment. Also, especially for people who are familiar with genetics, this is still problematic because there appears to be no way to determine which variants have been considered, which were no-calls and variants of unknown significance.

On the plus side, registering a kit in the new system now asks you to confirm that you understand the information is not medical advice etc. Their Terms of Service covers this most comprehensively of the sites I've seen, however we all know most people don't read the ToS... Now you have to confirm you've read them with the short-form disclaimer before sending your sample in, but only if you're a new customer. There is also finally an option to consent to genetics research, although since they don't (yet) ask for any other personal information, that won't be very useful. But new customers may now consent to research if they wish to.

Heads-up though, at least I had to re-register a kit that's currently being processed. Since many people may no longer have their kit-ID with them, or may not realize they have to re-register, I seriously hope Dante Labs is going to add them back automatically at the latest when the kits are processed further, but at least for me it wasn't listed and I had to manually add it. The already processed kit was there though, along with links to all the previous reports & vcf's.

pmokeefe
06-28-2019, 01:58 PM
I looked at the new "Health Manager" as well. One problem for me: I have purchased 8 DNA test kits from them, plus 2 raw data kits, which are in various stages of processing, plus there have been replacement kits for ones which failed QC or were never received. On their old page they had a 'nickname' field for each kit, which I used to keep track of them. I can't seem to find the nickname field, any date associated with the kit or even my order history - it was completely blank the last time I looked. Has anyone found a way to access any of that information which I may have missed?

Donwulff
06-28-2019, 02:23 PM
I forgot one thing, which I just saw on Reddit. My login to the main Dante Labs website didn't work; I had to use the "forgot password" option to set a new password for the Health Manager. Chrome password manager may not really like that, because it seems to assume they're the same. Oh well.

When I re-registered my second kit, it asked me for an alias, and I entered one. An "All Kit IDs" button appeared at the upper left of the sidebar, which shows the kit-ID's as well as the alias I set. It looks like there's no way to set alias for the existing kits; I would have assumed they'd carry over from the old system. Order history is still there on my main account page, with order dates, but (un)fortunately it never showed kit-ID's, which is probably good for privacy. If you scroll to the end of the "Reports" page, it has links to the old reports & raw data, which at least for me also contains the processing date (Unfortunately not order date!).

Someone called this "Beta" though I'm not aware how you would access the old system. It certainly seems to have a few quirks to still work out.

Also re. the above comments about disclaimers, I should note that Promethease works somewhat similarly, with a disclaimer shown just once which you could have passed without reading, and then treatment information in the reports generally copied from research papers, usually without clear explanation of what was tested/considered and what the genetic result means (Although SNPedia allows a custom result for each genotype, so with some effort people can put in "Carrier", "Check for these variants" etc.). Genos Research also has a similar issue, although it pulls classification from ClinVar rather than Genetics Home Reference. So not to single out Dante Labs, it's obviously really common, though it should really be done better and as said the medical information from GHR contributes to the wrong impression.

pmokeefe
06-28-2019, 02:42 PM
...
Order history is still there on my main account page, with order dates, but (un)fortunately it never showed kit-ID's, which is probably good for privacy.
...

Thanks Don, but alas my Order History is blank. I sent the company a message.

poi
06-29-2019, 08:36 PM
The site's redesign takes a bit of getting used to (haven't spent much time there yet), but log in to the new site and visit "Reports": https://genome.dantelabs.com/reports

Collapse the "wellness and lifestyle report" panel and you should see all panels, including raw data downloads, etc.

https://i.imgur.com/5aFjjoDr.png

Administrator
06-29-2019, 11:40 PM
[ADMIN]

After some investigation, our team has removed multiple posts from this thread relating to client-side calls from Dante Labs.

Thanks for your cooperation.

Donwulff
06-30-2019, 06:43 AM
You can also just scroll to the end of the Reports "tab" to get to the raw data links, but you have to expand (from the right facing arrow/angle) the "Personalized Reports" section to get to the old PDF reports. It's not obvious they're "personalized" and the function of the arrows isn't self-evident so it took just a moment to figure out for me. The "Download PDF" for the list of traits they show doesn't work for me, however. I'm wondering if it works for people who have the new (non-Sequencing.com) Health & Wellness report. Also since they appear to be generating those reports themselves, they should've added the new report to old users so I could compare the reports ;)

tontsa
06-30-2019, 06:55 AM
To me that report just looks scraped info from the PDF.. at least in my case it's 1:1. What boggles my mind is that it seems to be based on the snp.vcf they upload to sequencing.com instead of the fastq and hg38 alignment so it's missing lots of data.

Donwulff
06-30-2019, 08:47 AM
If the online report & PDF were substantially different, THAT would be worrying, that's the whole point. Unfortunately I don't have the new PDF report, so I can't exactly confirm, but it looks like interactive and "hardcopy" versions of the same reports, nothing scraped. The health reports are no longer generated by Sequencing.com, and vcf aren't automatically uploaded there. The VCF is a "human readable" version of the FASTQ sequence data, it isn't missing any data that the reports could show.

Yes, they should probably get with the times and move to hg38 alignment, but honestly most analyses are still done and calibrated on GRCh37/hg19. (See for example http://dx.doi.org/10.1186/s12859-019-2620-0 ) There's an issue in the reports which is that they don't show which variants were considered ( https://doi.org/10.1038/gim.2017.119 sort of), if there were variants they couldn't determine ( http://dx.doi.org/10.1186/s13059-019-1707-2 ), or variants of unknown significance ( http://dx.doi.org/10.3802/jgo.2019.30.e80 ). That's really common, but without that information it's impossible to determine how comprehensive the report is. These issues aren't majorly affected by the reference genome build or FASTQ data, which are always converted to vcf before interpretation. A Genomic VCF or confidently called regions file would help with the "variants they couldn't determine" issue, but without a list of variants considered it's impossible to even tell if they used one.

tontsa
06-30-2019, 01:05 PM
A Genomic VCF or confidently called regions file would help with the "variants they couldn't determine" issue, but without a list of variants considered it's impossible to even tell if they used one.

Yeah that's what I was after.. looks like at least my report is based on the heavily filtered hg19 snp.vcf so it's missing some stuff that you would get from the hg19 BAM or gVCF. But I agree that health reports should be based on hg38 alignment and a recent enough dbsnp/clinvar/etc. dataset.

Ysearcher
07-02-2019, 05:02 PM
I finally collected a saliva sample for my son, and got it in the mail, and it was received yesterday, but I have been completely unable to register the kit. When I use the "Forgot my password" link, I don't get any reset messages in my email. I have tried over and over again. I think it may be because my email address is "not in our system", so the email address associated with the kit is not recognized. I have had numerous messages with support, back and forth, but can't fix the problem. I am afraid that the sample will be lost or discarded because it is not registered. I have provided the support team with the original order number, the saliva sample number and the USPS tracking number, but still the saliva sample is not registered. I don't know what else I can do. The kit was ordered in November 2018 (Black Friday sale) and sat on a shelf being ignored for seven months by the person it was ordered for. I finally collected the sample and got it in the mail after making the 1500 mile trip to see the relative. Now I fear after all that it will be lost.

pinoqio
07-02-2019, 09:29 PM
Try capitalizing the first letter of your email, or however else you may have written it when you first registered. I read on FB that their software doesn't ignore upper/lowercase in emails.

MacUalraig
07-04-2019, 08:08 AM
I seem to end up locked out of the new kit manager with neither my old or new passwords working - have to give it another go today on the offchance they've actually uploaded something useful. The short read hard disk order is now past four months and counting.

pinoqio
07-04-2019, 03:10 PM
You will need to reset your password using the same email address you used to sign up. Or are you saying you can't log in after resetting your password? In any case, they are handling this transition pretty poorly.

MacUalraig
07-04-2019, 06:32 PM
I tried another browser and the reset worked, to reveal a new kit manager which is a bit underwhelming. I'd rather have some of the overdue reports and data...

As regards the site I wish they'd get someone with decent English to write it.

bjp
07-08-2019, 05:20 PM
I just emailed Dante Labs with my monthly beg for my paid-for hard drive containing raw data, summarizing the timeline from purchase on April 4th, an email from them on April 7th saying it would be shipped in "a few days", to "in a few weeks" on June 6th, to my latest stating I need it shipped with a tracking number in the next 30 days or I will take further action.

My patience is nearly infinite, my tolerance for lies is not.

Donwulff
07-13-2019, 03:04 PM
Any experience/idea of Dante Labs' new condition-reports? Their business model starts to make slightly more sense, if they're selling interpretations separately. Unlike many other companies, Dante Labs has never promised free updates, and I don't blame them for it because they've been selling sequencing at times cheaper than microarray health tests like 23andMe/MyHeritage. This will also let them gather phenotypic data (i.e. people's diagnoses etc.) which they're now asking consent to use in research.

But it still remains that I'm not currently really impressed with the reports that come with the test itself, and I can't find examples or any data of any of their in-depth reports. You've been able to request "personalized health report" up to now, so I'm wondering if these are the same, or how they differ. And yes, they just announced "Artificial Intelligence Personalized Reports" which raises more questions than it answers. The "Artificial Intelligence" report has been added to "What do you get" on the sequencing product itself though, so it may be that new customers get it for free, but what exactly constitutes "medical documents" in this case?

tontsa
07-15-2019, 09:13 AM
My feeling on those new services is that they just put them up to impress some potential investors.. just for "fun" I ordered the now commercial Hg38 alignment a couple of weeks back and haven't gotten results yet. In theory it shouldn't take that long to deliver if they are using the sequencing.com premium pipeline for that..

MacUalraig
07-17-2019, 03:47 PM
Dante have another flash sale on, WGS 30x for 299 Euros so just grabbed one, offer expires at the end of the day.


"
Dante Labs Offers $349 Whole Genome Sequencing on Amazon Prime Day
Jul 16, 2018 Posted by: Andrea Riposati

Press Release (ePRNews.com) - NEW YORK - Jul 16, 2018 - International biotech company Dante Labs announced today the offer of whole genome sequencing (WGS) and interpretation at only USD 349 (€299). This offer marks a further price reduction compared to Prime Day 2017, and marks another “first” in the worldwide reduction of the cost of whole genome sequencing.

The special offer is available for only 36 hours – from July 16 to the end of July 17 on both the Amazon websites (amazon.com, amazon.de, amazon.fr) and on Dante Labs website. Customers from all over the world will be able to benefit from this one-off, historic opportunity.

The sequencing coverage is 30X via next-generation sequencing (NGS). The service includes bioinformatics analysis and data interpretation, as well as customized reports for diseases or conditions of interest.


Dante Labs confirms its excitement to work with Amazon on this opportunity to make advanced genetic testing accessible to everyone.

“Amazon has been a special resource for Dante Labs from the beginning,” said Dante Labs CEO Andrea Riposati. “We share several values such as customer centricity and passion for excellence. Amazon provides us with an amazing platform to reach people worldwide and to achieve economies of scale and cost savings that we are glad to pass to our customers.”
"

That was my short read order a year ago today. To recap, the VCF arrived in early March (~7.5 months) and I have yet to receive the BAM file. I believe they are running or have just run another of their 'sales'.

pmokeefe
07-17-2019, 05:53 PM
I just received (July 17, 2019) a disk drive with the raw data from two WGS x30 kits I ordered on October 31, 2018 for €349.00 each. The vcf's have been available on their website for some time. I'm not sure whether the raw data was supposed to be "included" with my original order, I believed so, but just to be sure I made an additional order for the raw data disks on May 5, 2019 for €59.00 per kit.

One question: I've only had time to take a brief look at the disk I received, but I did not notice a gvcf file. The raw data for a previous kit, which I received back in July of 2018 did include a gvcf file - which I found useful. Has anyone else received their raw data recently with a gvcf? Maybe there's an equivalent file with a different extension?

One further comment on the Dante Labs service times. The order that I made in March 2018 took less than five months for the raw data. The orders I made on October 31, 2018 took about 8.5 months for the raw data.

MacUalraig
07-17-2019, 06:18 PM
I just received (July 17, 2019) a disk drive with the raw data from two WGS x30 kits I ordered on October 31, 2018 for €349.00 each. The vcf's have been available on their website for some time. I'm not sure whether the raw data was supposed to be "included" with my original order, I believed so, but just to be sure I made an additional order for the raw data disks on May 5, 2019 for €118.00 per kit.

One question: I've only had time to take a brief look at the disk I received, but I did not notice a gvcf file. The raw data for a previous kit, which I received back in July of 2018 did include a gvcf file - which I found useful. Has anyone else received their raw data recently with a gvcf? Maybe there's an equivalent file with a different extension?

One further comment on the Dante Labs service times. The order that I made in March 2018 took less than five months for the raw data. The orders I made on October 31, 2018 took about 8.5 months for the raw data.

Glad to hear you got yours - there doesn't seem to be any pattern to this at all.

ybmpark
07-17-2019, 06:18 PM
I just emailed Dante Labs with my monthly beg for my paid-for hard drive containing raw data, summarizing the timeline from purchase on April 4th, an email from them on April 7th saying it would be shipped in "a few days", to "in a few weeks" on June 6th, to my latest stating I need it shipped with a tracking number in the next 30 days or I will take further action.

Same story. My original order was around Nov 20(Thanksgiving Promotion), vcf files were uploaded in early April. I ordered the hard disk around April 20th. In the beginning they were saying it would be shipped in the next few days but now the reply looks automated. They even automated the reply to "Was this an automated reply?". My fear is that they threw out the raw data and are now trying to reconstruct them from vcf files and the reference sequence. Which is a fraud.

newuser
07-17-2019, 06:34 PM
Does anybody know if there are other Europe-based companies that do WGS and deliver the full raw data?

I've been in the HDD queue for 5 months and still haven't received a hard drive.
The VCFs they supplied online were incomplete. I ordered the genomeZ (wgs/wes) and only received the wessnp.vcf.
I had to contact support to get the wgs snp data.

I also asked for the indel vcf months ago and never heard anything back.
I'm ok with that as long as I receive the hard drive so I can generate them myself.

But I'm starting to think they lost my data.
If I ever receive the hard drive, I'm definitely going to order a new kit somewhere else to see if the data overlaps.
Because I'm starting to lose trust in Dante after all these issues.

pmokeefe
07-17-2019, 07:09 PM
I just took another look at the hard drive I received today from Dante Labs. The "Date Modified" attribute for the files on the drive is April 24-25, 2019. Not sure if that is conclusive evidence though.

MacUalraig
07-17-2019, 07:25 PM
Does anybody know if there are other Europe-based companies that do WGS and deliver the full raw data?

I've been in the HDD queue for 5 months and still haven't received a hard drive.
The VCFs they supplied online were incomplete. I ordered the genomeZ (wgs/wes) and only received the wessnp.vcf.
I had to contact support to get the wgs snp data.

I also asked for the indel vcf months ago and never heard anything back.
I'm ok with that as long as I receive the hard drive so I can generate them myself.

But I'm starting to think they lost my data.
If I ever receive the hard drive, I'm definitely going to order a new kit somewhere else to see if the data overlaps.
Because I'm starting to lose trust in Dante after all these issues.

YSEQ's delivery is exemplary, all available online usually around 2 months after sample return then if ordered, a drive with the fastqs within a further fortnight. The online files include a gedmatch upload file and a chrY BAM ready for upload to yfull plus mtdna fasta. However they are of course a fair bit more expensive! If you mean which firm is as cheap as Dante but delivers in a decent timescale then there isn't one.

newuser
07-17-2019, 07:44 PM
YSEQ's delivery is exemplary, all available online usually around 2 months after sample return then if ordered, a drive with the fastqs within a further fortnight. The online files include a gedmatch upload file and a chrY BAM ready for upload to yfull plus mtdna fasta. However they are of course a fair bit more expensive! If you mean which firm is as cheap as Dante but delivers in a decent timescale then there isn't one.

I understand the price will be a lot higher since dante's prices seem unrealistic.
But isn't YSEQ US based?
Getting a sample through customs and the company not falling under GDPR is an issue for me.

MacUalraig
07-17-2019, 07:49 PM
YSEQ are in Berlin and their sequencing is done there.

ybmpark
07-17-2019, 10:10 PM
I just took another look at the hard drive I received today from Dante Labs. The "Date Modified" attribute for the files on the drive is April 24-25, 2019. Not sure if that is conclusive evidence though.

Have you checked "date created"?

pmokeefe
07-17-2019, 10:33 PM
Have you checked "date created"?

Same as modified, within minutes.

pmokeefe
07-18-2019, 01:39 AM
On my recent raw data disk I had the following large files

mtDNA_V300014569_L02_533_1.fq.gz
mtDNA_V300014569_L02_533_2.fq.gz

WGS_CL100112809_L01_501_1.fq.gz
WGS_CL100112809_L01_501_2.fq.gz
...
WGS_CL100112809_L01_508_2.fq.gz

There was one pair of mtDNA files and 8 pairs of WGS files. The mtDNA files were about 1.2GB each while the WGS files were 6-8GB each.
They all looked like typical paired read fastq files (100 bp long).
The raw data disk I received last year did not have the mtDNA fastq files.
What is going on? Are the paired reads from mtDNA somehow separated from the nuclear DNA? If so, how is that done? The depth of coverage of the mtDNA would typically be much higher than nuclear, is that how it is done?

Donwulff
07-18-2019, 05:19 AM
There are existing tools to simulate sequencing reads from a VCF file, so if they were doing that, it would take hours and not months, though it would be statistically fairly obvious. The fact that people are getting the drives, after months, speaks to the fact that they're able to deliver them, so there's no basis for either allegation: that they've lost the data or that they're trying to reconstruct it. Obviously it's bad enough that customers can't tell what the delivery schedule is, though.
Genomic sequencers don't use FASTQ internally, so usually there is a conversion step for generating the FASTQ. Creation/modification dates can be affected by copying files, especially if compressing, so that doesn't necessarily tell anything useful.
Having mtDNA separate is very weird though. There's no technical problem with that; it's just 16569 base pairs you can do a lossy compare against to recognize the reads. Why would you do that, though? All I can think of is that they have different statistical properties, so you might want to exclude them from statistical analysis like read depth, for example. The flow-cell ID V300014569 in principle suggests this was sequenced with something other than the main instrument, which raises more questions, though.

MacUalraig
07-18-2019, 10:26 AM
Some may have seen the CEO's latest mini vid of his new 'automated sequencing center'/'lab'? I can't make out the labels on it so what is the machine?

Not a great fan of the way they do these, how about a longer one?

https://youtu.be/oqubXXTiiRQ

Donwulff
07-19-2019, 10:07 AM
Weird, somebody asked same thing on Facebook about a week ago, and they actually replied:
"Dante Labs Automatic DNA extraction, library prep and sequencing. Large capacity. Specialized on whole genome. The instrument behind Andrea is a Qiagen DNA extractor (QiaSymphony). The one in his hand is a ThermoFisher QC instrument."

You can see the QIASymphony label in any case.
There's https://plos.figshare.com/articles/_Comparision_of_instrument_prices_and_material_costs_per_sample_for_five_automated_DNA_extraction_systems_/1132834 for example. No sequencer, though.

pinoqio
07-20-2019, 04:56 PM
Out of curiosity, I aligned your mtDNA fastqs against hg19 and got some stats:
91% of the reads are mappable to mtDNA, 8% to the rest of the genome, and <1% unmappable.
Which means an mtDNA-only BAM is ~1.85GB with an average mtDNA coverage of 171'529x, with a max coverage of about 350'000x.
IGV needs about 4G RAM to load it but here is the coverage plot:
[Attachment 31913: mtDNA coverage plot]

A normal 30x WGS should have about 3000x mtDNA coverage - I guess the cynics on the FB customers group were right, and Dante Labs is doing an mtDNA enrichment step and an extra sequencing run.
I can only assume this is the result of a miscommunication between the non-technical CEO at Dante and BGI in Hong Kong after customers complained about VCFs not having mtDNA calls.
Now I'll just hope my sample gets sequenced before they realize they could save a good bit by getting rid of this...
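
As a rough cross-check of the coverage figures above, mean coverage is approximately reads × read length / reference length. The sketch below turns that around to estimate how many reads a given coverage implies. The function name is made up for illustration, and the 100 bp read length is an assumption taken from the earlier description of the FASTQ files, not something measured here.

```shell
# Rough estimate: reads needed ~= coverage * reference_length / read_length.
# Integer arithmetic only, so the result is truncated.
reads_for_coverage() {
    coverage=$1
    ref_len=$2
    read_len=$3
    echo $(( coverage * ref_len / read_len ))
}

# 171529x over the 16569 bp mtDNA reference, assuming 100 bp reads
reads_for_coverage 171529 16569 100

# The ~3000x quoted for an ordinary 30x WGS, same assumptions
reads_for_coverage 3000 16569 100
```

Under those assumptions the enriched run corresponds to tens of millions of mtDNA reads, against roughly half a million for an ordinary 30x WGS.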

skoodamies
08-02-2019, 09:30 AM
I was investigating some Dante Labs BAM files with idxstats. Only one of them is a female sample, and it had around 30% as many mapped Y chromosome reads as the male samples. I was wondering whether this is a typical level. Has anyone else checked this? I know that the Y and X chromosomes overlap somewhat for evolutionary reasons, which could result in mapped reads, but there is most likely another explanation too.

Is there any easy way to check what these mapped reads are and how they would fit into the haplotree? Sending the Y chromosome BAM file to YFull is of course always an option, but I have a feeling they would not be able to process it..

tontsa
08-02-2019, 10:42 AM
Is it Hg19 aligned BAM? Those contain "phantom" Y-chromosome hits that are actually X or something else.. Hg38 shouldn't contain those for female samples.. I'm still waiting Dante on one female sample to see what they produce..

skoodamies
08-02-2019, 12:33 PM
Yes all of these are HG19 aligned BAM files.

Donwulff
08-03-2019, 04:54 AM
X and Y were originally the same chromosome: https://slideplayer.com/slide/6199812/18/images/56/Evolution+of+X+and+Y+Chromosomes+from+Homologous+Autosomes.jpg
They still share a lot of the same content: http://universe-review.ca/I11-35-Y22.jpg

One of the many annoyances on social media/internet is all the people declaring "Test Z is garbage, it's showing values on the Y chromosome for a woman!!", and then there's some know-it-all with "But is it on the PAR? If not, demand your money back immediately!" xD

Garvan Institute of Medical Research 50X sequence of well known NA12878 *Female* sample mapped on hs37d5:
samtools idxstats HG001-NA12878-50x-BWA-MEM.bam | grep "[XY]"
X 155270560 53586492 155661
Y 59373566 268381 1377

In the raw human genome references the Pseudoautosomal Regions are repeated on both X and Y, and so reads will additionally map to either one of them. 1000 Genomes and Analysis Ready genomes references have them hard-masked on Y reference, and so they will only map to X.

Let's try this:
samtools bedcov chrY.bed HG001-NA12878-50x-BWA-MEM.bam
Y 10000 2649520 0
Y 2649520 59034049 29668828
Y 59034049 59363566 0

Okay, this reference has Y-PAR hardmasked, but still substantial mapping to the Male Specific Y region.

samtools bedcov -Q60 chrY.bed HG001-NA12878-50x-BWA-MEM.bam
Y 10000 2649520 0
Y 2649520 59034049 4781722
Y 59034049 59363566 0

PAR's are identical between X and Y, so they have mapping quality 0, but there's still substantial amount of reads that map uniquely to MSY.
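
To compare samples the way the idxstats question above does, one option is to normalize the Y read count by the X read count. This is a minimal sketch, assuming the standard `samtools idxstats` column layout (contig, length, mapped reads, unmapped reads); the function name is hypothetical, and the demo input is just the two lines quoted above for NA12878.

```shell
# Ratio of mapped reads on Y to mapped reads on X.
# Reads `samtools idxstats` output on stdin; accepts "X"/"Y" or "chrX"/"chrY" naming.
y_to_x_ratio() {
    awk '$1 == "X" || $1 == "chrX" { x = $3 }
         $1 == "Y" || $1 == "chrY" { y = $3 }
         END { if (x > 0) printf "%.4f\n", y / x }'
}

# Demo with the two idxstats lines quoted above (female NA12878 sample)
printf 'X\t155270560\t53586492\t155661\nY\t59373566\t268381\t1377\n' | y_to_x_ratio
```

With a real file this would be `samtools idxstats sample.bam | y_to_x_ratio`. The ratio still includes the low-mapping-quality reads discussed above, so it is only a first-pass comparison between samples.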

deadly77
08-05-2019, 08:54 AM
There's a discussion on the YSEQ Facebook group (public group) regarding the Dante Labs WGS raw data output here: https://www.facebook.com/groups/YSEQDNA/permalink/2302463169832734/

It appears that Thomas Krahn of YSEQ is offering a service to take the raw data from Dante WGS and process it into a more manageable form: separate Y, mt and autosomal BAMs mapped to hg38, easy transfer to YFull, mtDNA FASTA, a "23andMe"-style hg19 autosomal file for Gedmatch, etc. No medical data analysis. The service costs $25 for existing YSEQ customers, $50 for externals.

Of course, you'll still need to get the data from Dante on a physical hard disk or by download and then get it to YSEQ (again either by shipping the hard disk or supplying them with a reliable and accessible link), but it's an option for those who would like the data in a more manageable form that you can do more with.

tsunami
08-15-2019, 10:51 AM
Is this guide (https://lucazammataro.blogspot.com/2018/01/from-fastq-to-bam-in-8-steps.html) good for producing BAM file from FASTQ files?

Donwulff
08-15-2019, 11:07 AM
Is this guide (https://lucazammataro.blogspot.com/2018/01/from-fastq-to-bam-in-8-steps.html) good for producing BAM file from FASTQ files?

I would say those instructions are all kinds of wrong, and certainly old, but apparently I'm not allowed to comment on bioinformatics without some people feeling personally offended ;)

tsunami
08-15-2019, 11:50 AM
If that's the case, could you please comment them in PM? ;)

tontsa
08-15-2019, 04:52 PM
On the "Dante Labs WES/WGS Sequencing Technical" thread karwisos' instructions at least lead to working Hg38 BAM from Dante's fastqs. I'm still testing Donwulff's hg38.p12/p13 bqsr scripts.. kinda slow especially the haplotype calling on whole genome..

E_M81_I3A
08-15-2019, 05:36 PM
I ordered a kit on April 26th for 199 EUR, sent it back to the Lab (located in Italy) beginning of May and they received it on May 10th.

After 3 months I still haven't received my results so according to the "90 Day Guarantee", they must give me a full refund as well as the full results…

Excellent… I therefore contacted them yesterday and this is what they answered to me "Thanks for reaching out. For your refund, I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information. Once your refund is completed successfully, you will be notified. Thanks for your patience. "

Has anyone experienced the same ? results + full refund ? or they don't tell the truth ?

Donwulff
08-15-2019, 06:19 PM
The revert_bam script is probably faster than alternatives on a single computing node (as long as there's enough memory for the processing threads, or they're adjusted properly) because the script is optimized to parallelize and pipe results uncompressed where possible. BQSR is optional; there are some suggestions it doesn't make much difference (I'm just updating the pre-processed dbSNP snapshot to version 153). The scripts do not currently include variant calling; it's both more straightforward and, well, varied...

I did just recall the revert_bam script expects BAM input, and people apparently only have FASTQ now (no BAM?). I'm using a script like this for making the unmapped BAM from FASTQ:


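#!/bin/bash
# Convert each paired FASTQ lane into an unmapped BAM, then merge the lanes.
ID=56001801099999A
LEVEL=1
JOIN=""
for R in /path/to/fastq/${ID}/*1.fq.gz; do
  BASE=${R%%1.fq.gz}    # path without the trailing "1.fq.gz"
  NAME=${BASE##*/}      # file name without the directory part
  READ=${NAME:1:18}     # read-group name cut out of the file name
  /usr/bin/time java -Xmx28G -jar picard.jar FastqToSam \
    FASTQ=${BASE}1.fq.gz \
    FASTQ2=${BASE}2.fq.gz \
    OUTPUT=/mnt/tmp/${READ}.bam \
    READ_GROUP_NAME=${READ} \
    SAMPLE_NAME=$ID \
    LIBRARY_NAME=$ID \
    PLATFORM=ILLUMINA \
    SEQUENCING_CENTER=BGI \
    RUN_DATE=$(TZ='America/Chicago' date -Iseconds -r "${R}") \
    TMP_DIR=/mnt/tmp
  JOIN="${JOIN} /mnt/tmp/${READ}.bam"
done

# $JOIN is left unquoted on purpose so it splits into one argument per BAM.
samtools merge -n /mnt/tmp/${ID}.unmapped.bam $JOIN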

I'll probably make that into part of the main script at some point. You'll have to touch ${ID}.bam or symlink it to the unmapped too for revert_bam to realize ${ID}.bam has already been unmapped. The benefit of the unmapped BAM format is that it keeps the metadata, in this case the read-group header mainly, with the file itself so one doesn't have to give or look it up elsewhere. Also, Broad Institute that's largely driving development of the sequence interpretation tools has decided that's the proper archive format; you can delete it after aligning the BAM though. Don't try to use unmapped BAM for anything else though.
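
Before running the FastqToSam loop on your own files, it may be worth checking which read-group name the filename slicing produces, since the offsets were presumably chosen for one particular naming scheme. The sketch below repeats the same three steps on a single path. The function name is made up, the example path reuses a file name mentioned earlier in the thread, and `cut -c2-19` stands in for bash's `${NAME:1:18}` so the snippet also runs under a plain POSIX shell.

```shell
# Show the read-group name the loop would derive from one FASTQ path.
derive_read_group() {
    r=$1
    base=${r%%1.fq.gz}      # strip the trailing "1.fq.gz"
    name=${base##*/}        # strip the directory part
    # characters 2-19 of the name, i.e. the same slice as ${NAME:1:18} in bash
    printf '%s' "$name" | cut -c2-19
}

derive_read_group /path/to/fastq/ID/WGS_CL100112809_L01_501_1.fq.gz
```

For a name like the one in the example the slice starts inside the `WGS_` prefix and drops the lane-specific suffix, so every lane would get the same read-group name and output file. In that case the offset and length need adjusting for the naming scheme at hand.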

tontsa
08-16-2019, 06:01 AM
I did just recall the revert_bam script expects BAM input, and people apparently only have FASTQ now (no BAM?). I'm using a script like this for making the unmapped BAM from FASTQ:


Cool! That is the missing piece I guess. With BWA approach I run into issues with GATK's HaplotypeCaller as it can't find the chrUn_KN707606v1_decoy and other _decoys from the reference. I'll rerun now whole pipeline with just picard, gatk and samtools. Will take few days with my rig though :)

MacUalraig
08-16-2019, 07:11 AM
I ordered a kit on April 26th for 199 EUR, sent it back to the Lab (located in Italy) beginning of May and they received it on May 10th.

After 3 months I still haven't received my results so according to the "90 Day Guarantee", they must give me a full refund as well as the full results…

Excellent… I therefore contacted them yesterday and this is what they answered to me "Thanks for reaching out. For your refund, I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information. Once your refund is completed successfully, you will be notified. Thanks for your patience. "

Has anyone experienced the same ? results + full refund ? or they don't tell the truth ?

I'll be interested to hear how things progress. I tested before the guarantee was announced but I did get a partial refund quite quickly after I overlooked a discount code.

tontsa
08-16-2019, 07:12 AM
I'm in the same boat.. they received the kit on 10th of May.. I asked about the refund and they say they process the refund same time the results come.. so you'll prolly have to wait around year for that.. atleast that was the case with my other kit.

MacUalraig
08-16-2019, 07:16 AM
I'm in the same boat.. they received the kit on 10th of May.. I asked about the refund and they say they process the refund same time the results come.. so you'll prolly have to wait around year for that.. atleast that was the case with my other kit.

You're kidding?

tontsa
08-16-2019, 07:19 AM
Not kidding.. this is their reply:
Thank you for your message.

Apologize for the inconvenience and delay with your sample status.

Your DNA has passed the quality control and is currently being sequenced: this is a long and complex process, composed of several steps. It can take up to a month to complete. We have not assigned a specific date to when the results would be ready as it may vary base on the sample.

Next, we will run the bioinformatics analyses, which will take a few more days. Once the process is complete we'll send you an email confirming that your results are ready. Your patience is truly appreciated.

Once eligible the refund will be processed once the results are ready and the internal team will update you once there us more information.

Donwulff
08-16-2019, 07:37 AM
Cool! That is the missing piece I guess. With BWA approach I run into issues with GATK's HaplotypeCaller as it can't find the chrUn_KN707606v1_decoy and other _decoys from the reference. I'll rerun now whole pipeline with just picard, gatk and samtools. Will take few days with my rig though :)

I'm not sure what you're going to do though, but the missing contigs/chromosomes are usually the result of using different references. I think for most of them it doesn't matter if downstream analysis is missing some contig definitions, if those contigs aren't analysed. (Most of the time you want to only call variants on the primary assembly, certainly not decoys). With HaplotypeCaller, you can give -L chrY or put chr1, chr2 etc. on separate lines in chromosomes.list and use -L chromosomes.list
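
A chromosomes.list of that kind can be generated rather than typed. This is a minimal sketch assuming UCSC-style names (chr1 … chr22, chrX, chrY, chrM) as used by hg38 references; the function name is hypothetical, and a reference with different contig naming would need a different prefix.

```shell
# Print one primary contig name per line: chr1..chr22, chrX, chrY, chrM.
make_chromosome_list() {
    for c in $(seq 1 22) X Y M; do
        printf 'chr%s\n' "$c"
    done
}

# Typical use: make_chromosome_list > chromosomes.list
make_chromosome_list
```

The resulting file is what `-L chromosomes.list` would point to.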

HC will complain if there's incompatible contig definitions. In my scripts I pre-process the references, like dbSNP, by either removing the contig definitions completely or replacing them with the reference I'm using. (Obviously, only do this if the references are compatible, ie. all GRCh38 for example). Those are actually in BQSR.sh script currently, so I may have to start breaking those into modules which make more sense... which is kind of a shame, those scripts started just as an example of how to do it. Might as well switch to some existing pipeline construction framework at this point.

Converting dbSNP dump contig names into UCSC ones using the GRCh38.p13 assembly report; assembly report doesn't cover fixes to primary assembly, so I might have to handle that mapping manually. Nobody cares about the fix-patches anyway ;)

gawk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.39_GRCh38.p13_assembly_report.txt > dbSNP-to-UCSC-GRCh38.p13.map
time bcftools annotate --rename-chrs dbSNP-to-UCSC-GRCh38.p13.map GCF_000001405.38.gz | gawk '/^#/ && !/^##contig=/ { print } !/^#/ { if( $1!="na" ) print }' \
| bgzip -@${cores} -l9 -c > ${DATA}/GCF_000001405.38.dbSNP153.GRCh38p12.GATK.vcf.gz
tabix ${DATA}/GCF_000001405.38.dbSNP153.GRCh38p12.GATK.vcf.gz
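
To see what the mapping step produces, the same awk logic can be tried on a couple of assembly-report lines. This sketch uses plain awk instead of gawk and a made-up function name. The chr1 row follows the real report layout (column 7 is the RefSeq accession, column 10 the UCSC-style name, column 5 the GenBank accession), while the second row is an invented placeholder showing the "na" fallback.

```shell
# Build "RefSeq-accession new-name" pairs from assembly-report lines on stdin.
# Uses the UCSC-style name (column 10) when present, else the GenBank accession (column 5).
refseq_to_ucsc_map() {
    awk 'BEGIN { FS = "\t" }
         { sub(/\r$/, "") }      # tolerate Windows line endings
         /^#/ { next }           # skip comment and header lines
         { if ($10 != "na") print $7, $10; else print $7, $5 }'
}

# Demo: one real-looking row and one invented row whose UCSC name is "na"
printf '# Sequence-Name\tSequence-Role\n1\tassembled-molecule\t1\tChromosome\tCM000663.2\t=\tNC_000001.11\tPrimary Assembly\t248956422\tchr1\r\nEXAMPLE_PATCH\tfix-patch\t1\tChromosome\tXX000000.1\t=\tNW_000000000.1\tPATCHES\t1000\tna\r\n' | refseq_to_ucsc_map
```

Each output line is an "old name, new name" pair, which is the format `bcftools annotate --rename-chrs` reads.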

Reheadering indel reference because HLA contig sizes change, and mismatched contig lengths don't work:

tabix -H ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz | grep -v "^##contig=<ID=HLA-" > ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.noHLA.vcf.head
bcftools reheader -h ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.noHLA.vcf.head ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz \
-o ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.noHLA.vcf.gz
tabix ${DATA}/Mills_and_1000G_gold_standard.indels.hg38.noHLA.vcf.gz

Completely removing contig definitions on compatible references, when they're non-standard:

zgrep -v "contig" snps_hg38.vcf.gz | bcftools norm -cs -f ${REF} | bgzip -c > ${DATA}/snps_hg38_GATK.vcf.gz
tabix -f ${DATA}/snps_hg38_GATK.vcf.gz

And this is why I have separate "technical" thread ;) GATK3 used ROD's which is acronym for Reference-Ordered Data. This is awkward, because not only do the contig lengths (and content) have to match, they also need to be in same order. Just one reason to use GATK4, though it's possible to re-order the contigs in the files, it's a royal pain. Also, a good reason to use existing services where available...

tontsa
08-16-2019, 07:44 AM
Yeah I used parts of your BQSR script to get that dbSNP contigs in UCSC naming convention.. I guess I just have to grep those SN:s from reference.dict to give to gatk's HC -L parameter so it'll be happy.

Donwulff
08-16-2019, 03:28 PM
grep "^[^#].*Primary Assembly" GCF_000001405.39_GRCh38.p13_assembly_report.txt | cut -f10 on that assembly report... Although I might have to say "chromosomes" instead of "primary assembly" (And um... those leave chrM out). SNPedia and places like GEDMatch really tend to only use the primary chromosomes chr1-chr22,chrX,chrY,chrM for anything. That might be changing at some point, as can be seen already dbSNP dumps have pretty much all contigs. This is a by-product of SNP's being defined by the flanking sequences though, so dbSNP just assigns the rsID to anywhere that the same sequence occurs, be it unplaced, unlocalized, fix or alternate loci. But yeah, that basically means that you *could* have the variant on unplaced contig, so using that ("Primary assembly") makes sense.

I don't recommend calling variants on decoy or HLA sequences, because that generally doesn't make sense. Fix, alt and HLA sequences should be mappable against the primary assembly, and if you're running post-alt.js script (A feat, granted) reads that *only* map to those should be lifted back into their rightful places on the primary assembly. Fix-patches are bit debatable, they're "meant" to alter the primary assembly and can change it quite significantly locally, but that could change the chromosome coordinates, so they haven't been incorporated into the primary assembly. UCSC includes them as separate contigs in their reference genome, and I judged it's better to include them than exclude them (Sort of like HLA alt-contigs on Heng Li's reference). The upshot is it's not clear how those affect the results, varies on case by case basis, but it's not clear calling variants on those makes any sense. On most cases you will ONLY need chr1 through chr22 and chrX/Y/M, the rest just serve to siphon reads off from mapping into wrong places, and would take long time to call variants on.

tontsa
08-18-2019, 02:16 PM
Going by the https://github.com/gatk-workflows/gatk4-germline-snps-indels I atleast seem to get ok variant only VCF. Commands I ran with the .bam:


wget -4 -c https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz

wget -4 -c https://ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.38.gz.tbi

gawk -v RS="(\r)?\n" 'BEGIN { FS="\t" } !/^#/ { if ($10 != "na") print $7,$10; else print $7,$5 }' GCF_000001405.38_GRCh38.p12_assembly_report.txt > dbSNP-to-UCSC-GRCh38.p12.map

bcftools annotate --rename-chrs dbSNP-to-UCSC-GRCh38.p12.map GCF_000001405.38.gz | gawk '/^#/ && !/^##contig=/ { print } !/^#/ { if( $1!="na" ) print }' | bgzip [email protected] -l9 -c > GCF_000001405.38.dbSNP152.GRCh38p12b.GATK.vcf.gz

tools/gatk/gatk CreateSequenceDictionary -R references/hs38DH.fa -O references/hs38DH.dict
cat hs38DH.dict | perl -le 'while(<>) { if(/SN:([^\s]+)\s/) { print qq|$1|; } }' | grep -v -E '(decoy|_alt|HLA)' >hs38DH.list
tools/gatk/gatk --java-options "-Xms30G -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10" HaplotypeCaller -R references/hs38DH.fa -I TBZBM.GRCh38.bam --dbsnp references/GCF_000001405.38.dbSNP152.GRCh38p12b.GATK.vcf.gz -L references/hs38DH.list -O foo52.g.vcf --add-output-vcf-command-line -ERC GVCF

tools/gatk/gatk --java-options "-Xmx4g -Xms4g" GenomicsDBImport -V foo52.g.vcf --genomicsdb-workspace-path /temppi/genodb -L references/hs38DH.list

tools/gatk/gatk --java-options "-Xmx5g -Xms5g" GenotypeGVCFs -R references/hs38DH.fa -O /dna/genotype.vcf -D references/GCF_000001405.38.dbSNP152.GRCh38p12b.GATK.vcf.gz -G StandardAnnotation --only-output-calls-starting-in-intervals --use-new-qual-calculator -V gendb:///temppi/genodb -L references/hs38DH.list

tools/gatk/gatk --java-options "-Xmx5g -Xms5g" VariantFiltration --filter-expression "ExcessHet > 54.69" --filter-name ExcessHet -O /dna/variant.vcf -V /dna/genotype.vcf


Dunno though if this helps anyone though.. just documenting what I'm playing with as total bioinformatics noob.
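
The contig-list step in the commands above (pulling SN: names out of the sequence dictionary and dropping decoy, alt and HLA contigs) can also be sketched with awk alone. The function name is made up, and the dictionary lines in the demo are a tiny hand-written stand-in for a real hs38DH.dict.

```shell
# Print contig names from a sequence dictionary on stdin,
# skipping decoy, _alt and HLA contigs (same filter as the grep -v above).
primary_contigs_from_dict() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) {
                name = substr($i, 4)
                if (name !~ /decoy|_alt|HLA/) print name
            }
    }'
}

# Demo on a four-contig stand-in dictionary
printf '@HD\tVN:1.5\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chrUn_KN707606v1_decoy\tLN:2200\n@SQ\tSN:chr1_KI270762v1_alt\tLN:354444\n@SQ\tSN:HLA-A*01:01:01:01\tLN:3503\n' | primary_contigs_from_dict
```

As with the original filter, unplaced and unlocalized contigs without those suffixes stay in the list, so it is broader than chr1-22/X/Y/M.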

Donwulff
08-18-2019, 04:23 PM
And gah, I already forgot recently Broad Institute added https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_spark_pipelines_ReadsPipelineSpark.php so separate pipeline may no longer be necessary at all. I've not tried that yet though, mind you, because I like to have more control over the process. Their Terra platform where you can run the Best Practices and sundry on Google cloud is interesting too, but I'd hate to blow the $300 trial credits on just one $5 genome analysis, which makes me wonder if I can just pay to test the features & possibly take the trial later...

The current Best Practices pipeline they have implemented uses Variant Quality Score Recalibration for primarily filtering the variants. Oddly enough their Best Practices web-page says to use Convolutional Neural Network for single sample variant filtration; I assume that's mainly they've just not had time to implement the pipeline yet. That's a new feature at any rate, which I'm yet to try myself. You could probably train it with results from an SNP array (genetic genealogy test) or the like, although I'm not sure how to avoid overfitting it; I'll just have to look at the documentation at some point...

tontsa
08-18-2019, 04:29 PM
I don't think you really need the Google Cloud for that.. just docker capable workstation/server with enough memory. Just above I just did one pipeline manually just to see what I get out of it before running it dockericed.

Donwulff
08-18-2019, 04:59 PM
Yeah, I'm mostly thinking people who don't have powerful home computer/allergic to Linux. Plus with people charging 50 bucks for re-analysis of the sequence, one could make $2500 profit on their $300 trial credits with no effort or cost just running the $5 genome analysis & egress, as long as they don't tell people that's what they're doing... oh oopsie! ;) I've got my own optimized single-node version of the pipeline that I can also run on Docker or AWS though, so the GATK scripts would be a step back in some ways. I want to see joint genotyping to take advantage of commonalities between sequences though.

Also, with the GATK single-sample neural network no doubt having been trained on Illumina sequences, there's bound to be some inefficiencies applying it to BGIseq sequences. BGI released sequences of the HG001/NA12878 benchmark, so it's possible to train it on that. I think that for microarrays one would have to filter sites that have major HWE, Mendelian violation or population frequency error. Having paired samples of BGIseq + microarrays would help a whole lot too, though, as that would allow filtering and identifying sites which are obviously wrong. Microarrays are a bit context-sensitive, too, though, so it's a conundrum.

Ecogene
08-19-2019, 10:54 AM
Question from a newbie. I availed of the Dante special offer last November and wanted to use some app to look at haplogroups so I investigated sequencing.com but got informed that their apps are not compatible with 130x data (I actually thought my data was 30x until I noticed WGZ in the filenames). Is this the same for other apps or can they handle 130x as well as 30x? I am interested in researching Y & MT haplogroups and some health related topics. Any help appreciated.

tontsa
08-19-2019, 11:04 AM
Question from a newbie. I availed of the Dante special offer last November and wanted to use some app to look at haplogroups so I investigated sequencing.com but got informed that their apps are not compatible with 130x data (I actually thought my data was 30x until I noticed WGZ in the filenames). Is this the same for other apps or can they handle 130x as well as 30x? I am interested in researching Y & MT haplogroups and some health related topics. Any help appreciated.

Did you get the data in BAM and/or FastQ format? If latter do you actually have separate 30x and then 130x exome? I don't see how their apps wouldn't be compatible with the 130x.. sequencing.com just has weird limit for fastqs that only 1 pair is supported.. so you need to join those _1s and _2s as just one pair and send that there and then make dataset out of them.
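
Joining the per-lane files into a single pair can be done by concatenating the compressed files directly, since concatenated gzip members still form a valid gzip stream. This is a minimal sketch with a made-up function name, assuming the lanes are named `*_1.fq.gz` and `*_2.fq.gz` in one directory and that the output prefix points somewhere outside that directory.

```shell
# Concatenate all *_1.fq.gz lanes into one file and all *_2.fq.gz lanes into another.
# Both globs sort the same way, so read 1 and read 2 stay in matching lane order.
merge_fastq_lanes() {
    in_dir=$1
    out_prefix=$2
    set -- "$in_dir"/*_1.fq.gz
    cat "$@" > "${out_prefix}_1.fq.gz"
    set -- "$in_dir"/*_2.fq.gz
    cat "$@" > "${out_prefix}_2.fq.gz"
}

# Example (paths are placeholders):
#   merge_fastq_lanes /path/to/clean_data /path/to/upload/merged
```

If a separate mtDNA pair sits in the same directory it would be swept up by the same globs, so it may need moving aside first.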

Ecogene
08-19-2019, 12:37 PM
Did you get the data in BAM and/or FastQ format? If latter do you actually have separate 30x and then 130x exome? I don't see how their apps wouldn't be compatible with the 130x.. sequencing.com just has weird limit for fastqs that only 1 pair is supported.. so you need to join those _1s and _2s as just one pair and send that there and then make dataset out of them.

Yes I got BAM and FastQ on hard disk. But absent an explanation from Dante I don't know what the files are. The main folder name is basically the kit number with WGZ appended. It has three folders: clean_data, result_alignment and result_variation. The latter two folders use WGZ in the file names and the first folder (clean_data) uses WGS in the file names. There are 8 1.fq and 2.fq files (16 fq files in total). Is it the case the clean_data are 30x and the alignment & variation files are 130x (just going by the file names)? What are the 1.fq and 2.fq files referring to?

Ecogene
08-19-2019, 12:38 PM
statement from sequencing.com:
If the whole genome sequencing you had was using Dante Labs' Whole GenomeZ (120x or 130x) then the BAM may have issues processing as our systems are optimized for 30x. I apologize that we can't guarantee the BAM will work but you can certainly upload it into your Sequencing.com account to test it

tontsa
08-19-2019, 01:05 PM
There are prolly some .png fqcheck files and also an .xls that is actually just a text file describing what those .fqs actually contain and at what read depth. Those 1 and 2s are just the pairs from paired-end sequencing. For Mt/Y analysis YFull is prolly the best choice to send the .bam file to. Also if you need a 23andMe-compatible file from that bam you might google WGS Extract for a pretty simple program. You can find it on the Dante Labs customers FB group at least.

Donwulff
08-19-2019, 01:51 PM
Recommended Variant Quality Score Recalibration parameters on GATK Best Practices workflow assume that it's a WGS sequence with even coverage. In that case, some sections of the genome suddenly having higher coverage would be interpreted as an error, and the variants discarded. There could also be hardcoded filters for read-depth. The new GATK Best Practices pipelines have separate mtDNA calling for this reason, as there's hundreds of mitochondria in single cell, so the read depth is usually in the thousands.

This is one possible reason why some people aren't seeing mitochondrial DNA calls in their VCF files (and someone also reported on here that Dante Labs was giving out raw data which had an extra mitochondrial sequencing run; I'm not sure if that could be the case here). If it actually is an exome sequence or combined WGS + WES, you'd have to be able to run it on an exome pipeline, or somehow disable that part of the VQSR. Note that the exome reads should be in separate FASTQ files, so you could just leave those out of analysis. Y chromosome (actual read depth about 17X, also because there's supposed to be only a single copy) and mtDNA analysis don't significantly benefit from exome sequencing, I feel, because they're haploid (phased single copy) and not specifically targeted by exome sequencing.

Ecogene
08-19-2019, 05:21 PM
There are prolly some .png fqcheck files and also an .xls that is actually just a text file describing what those .fqs actually contain and at what read depth. Those 1 and 2s are just the pairs from paired-end sequencing. For Mt/Y analysis YFull is prolly the best choice to send the .bam file to. Also if you need a 23andMe-compatible file from that bam you might google WGS Extract for a pretty simple program. You can find it on the Dante Labs customers FB group at least.

I don't see anything useful in the XLS files but I will look at using the WGS Extract program that you mentioned, it looks promising, thanks. When I looked again at my Dante files I noticed it is the BAM files and the separate SNP, CNV etc files that are associated with the WGZ naming while the FastQ files are associated with WGS file names. This is definitely confusing for a newbie. In terms of analysis it would be best for me to have an app working on my own PC if it needs the full 100+GB data files as I have a slow internet connection and uploading files that size is not feasible.

Ecogene
08-19-2019, 05:32 PM
Recommended Variant Quality Score Recalibration parameters on GATK Best Practices workflow assume that it's a WGS sequence with even coverage. In that case, some sections of the genome suddenly having higher coverage would be interpreted as an error, and the variants discarded. There could also be hardcoded filters for read-depth. The new GATK Best Practices pipelines have separate mtDNA calling for this reason, as there's hundreds of mitochondria in single cell, so the read depth is usually in the thousands.

This is one possible reason why some people aren't seeing mitochondrial DNA calls in their VCF files (also, someone reported on here that Dante Labs was giving out raw data which had an extra mitochondrial sequencing run; I'm not sure if that could be the case here). If it actually is an exome sequence or combined WGS + WES, you'd have to be able to run it on an exome pipeline, or somehow disable that part of the VQSR. Note that the exome reads should be in separate FASTQ files, so you could just leave those out of analysis. Y chromosome (actual read depth about 17X also because there's supposed to be only single copy) and mtDNA analysis don't significantly benefit from exome sequencing, I feel, because they're haploid (phased single copy) and not specifically targeted by exome sequencing.

As I mentioned in my reply above I am uncertain as to what sequencing I actually have: some file names are labelled with WGZ and some with WGS but the parent folder for all files is labelled with WGZ. I have separate mito files (*1.fq, *2.fq) approx 1GB each.

ssamlal
08-20-2019, 11:14 AM
I've finally received the results for my "My Full DNA: Whole Genome Sequencing with mtDNA" order (yesterday). It took 9 months (from start to finish)!

In Genome Manager > Reports, there are 5 file links listed in the Raw Data Library available for direct download:

FASTQ.1
FASTQ.2
BAI 8.5 MB
BAM 52.4 GB
SNP (VCF) 220 MB

The FASTQ.x links appear to be currently "dead"; the others direct and redirect to the following URLs:

https://genome.dantelabs.com/raw-data-download?id=17484&kitId=<kitId>

REDIRECTS TO:
https://dragen-output-vcfs.s3.amazonaws.com/<kitId>/<kitId>.bam?


Question Summary:
1. Will YFull/other service providers be able to use the amazonaws BAM file link?
2. Will YFull/other service providers also require the BAI file link?
3. Are the file sizes typical for 30x WGS (the BAM file seems quite small)?

Petr
08-20-2019, 01:49 PM
1. Will YFull/other service providers be able to use the amazonaws BAM file link?
2. Will YFull/other service providers also require the BAI file link?
3. Are the file sizes typical for 30x WGS (the BAM file seems quite small)?

1. It looks like these links do not work. I have received the following:

<Code>AccessDenied</Code>
<Message>Request has expired</Message>

2. No.

3. The BAM file sizes vary.
The old (2017/2018) BAM files with 100bp reads had the size 93 and 95 GB.
Newer BAM files with 150bp reads from April 2019 had the size 30, 33, 35, 37 and 41 GB only. And they are 15x to 24x, not 30x.

Donwulff
08-20-2019, 02:42 PM
Is the read depth confirmed from autosomal chromosomes, or calculated for example from the number of reads (which would be lower due to the reads being longer)? Because their sales page still says "Whole Genome Sequencing 30X" and "Average data size 90GB" (presumably these refer to GigaBases; file size will depend mainly on how the quality data is encoded), so that would be misleading. At least the longer read length makes it more powerful, but an increased error rate means that something like 15X read depth would be significantly worse.

The download links have never worked retroactively, you get what they're offering when your order goes through. Not that I wouldn't MIND getting download links to the raw data myself, the new Health Report to compare, and the "A Curated Personalized Report powered by our Artificial Intelligence, based on your medical history, health records or doctor's diagnosis" they're advertising for new customers (Honestly I'm not sure, maybe I could still get it but I have no idea what reports I should send to them as I don't have any rare diseases diagnosed. Bloodwork results? Be aware that they're probably gathering research data, although again I'm not clearly sure if there's consent or opt-in anywhere...)

Previously there was post https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=568447&viewfull=1#post568447 from Petr suggesting the 150BP sequences were still >90GB (I'm assuming that's whole sequence, including some oral metagenome reads etc.), and https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=583143&viewfull=1#post583143 from pmokeefe showing naming convention for the FASTQ with extra mtDNA reads. Back in 2017 when I got my sequence done, there were none of these naming conventions. I should check on Personal Genomes Project what the naming conventions and read depths are on what people are submitting there.

Petr
08-20-2019, 08:11 PM
For my 5 Spring 2019 "30x" WGS tests I have received just FASTQ files, not BAM files, and these FASTQ files were interpreted by YSEQ (to BAM) and 3 male results by YFull (to their web).

BAM files created by YSEQ have the size 38.1, 32.7, 34.8, 28.3 and 30.9 GiB, BAM files created by YSEQ from one year older FASTQ files have the size 87.8 and 87.3 GiB.

YFull statistics for 3 2019 samples (Y chromosome only):
Mean depth coverage: 17.73X
Median depth coverage: 15X

Mean depth coverage: 10.70X
Median depth coverage: 9X

Mean depth coverage: 13.01X
Median depth coverage: 11X

I'm not sure how to correctly determine the read depth.
Read depth is in VCF files delivered by Dante, separately for each chromosome. For chromosome 1 the read depth in the VCF file is:
##Depth_chr1=16.00
##Depth_chr1=11.00
##Depth_chr1=8.00
##Depth_chr1=20.00
##Depth_chr1=21.00
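
For anyone wanting to pull these per-chromosome values out of several VCF files for comparison, here is a minimal Python sketch. It assumes the header lines look exactly like the `##Depth_chr1=16.00` lines quoted above; these headers are not part of the VCF specification, so the pattern is a guess from these examples, and the function name and the chrY value in the sample data are made up for illustration.

```python
import re

# Matches header lines such as "##Depth_chr1=16.00". The format is taken from
# the lines quoted in the post, not from any specification (assumption).
DEPTH_HEADER = re.compile(r"^##Depth_(?P<contig>[^=]+)=(?P<depth>[0-9.]+)\s*$")


def depth_headers(lines):
    """Return {contig: depth} from the '##Depth_' meta lines of a VCF."""
    depths = {}
    for line in lines:
        if not line.startswith("##"):
            break  # meta header ends at the first line that is not '##...'
        match = DEPTH_HEADER.match(line)
        if match:
            depths[match.group("contig")] = float(match.group("depth"))
    return depths


example = [
    "##fileformat=VCFv4.2",
    "##Depth_chr1=16.00",
    "##Depth_chrY=8.00",  # hypothetical value, for illustration only
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]
print(depth_headers(example))
```

For a real file, the same function can be fed an open text handle (for example from `gzip.open(path, "rt")` if the VCF is gzipped), since it stops reading at the end of the header.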

FastQC shows the following data for their 150bp FASTQ files:
Total Sequences 342766670
Total Sequences 417158704
Total Sequences 302178190
Total Sequences 358447022
Total Sequences 312527687

and for 100bp files:
Total Sequences 600184154
Total Sequences 636133334

Rather confusing to me.

ybmpark
08-20-2019, 09:00 PM
I tried the amazon link and I get accessdenied for code and access denied for message.

Donwulff
08-21-2019, 11:12 AM
It's true that read depth can be defined in multiple ways, but whichever method you use to define it, you should use the same one when comparing. The count of reads times read length (Times two if counting pairs) is fairest and commonly used. Likely biggest thing that complicates that measure is the amount of oral microbiome included in the sample, but the sequencing company has little control over that, and some people might even be interested in that metagenome. Another source are entirely identical duplicate reads which are excluded from analysis to prevent bias.
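
The "reads times read length" estimate can be sketched in a few lines of Python. This is only a rough illustration: the 3.1 billion base genome length is an assumed round figure, the function name is made up, and it is not clear from Petr's post whether the FastQC "Total Sequences" counts are per FASTQ file (one mate of each pair) or for both files together, so both readings are printed.

```python
GENOME_SIZE = 3.1e9  # assumed approximate human genome length in bases


def estimated_depth(total_sequences, read_length, paired=False,
                    genome_size=GENOME_SIZE):
    """Raw depth estimate: sequenced bases divided by genome length.

    Set paired=True when total_sequences counts only one mate of each pair.
    """
    bases = total_sequences * read_length * (2 if paired else 1)
    return bases / genome_size


# Two of the FastQC "Total Sequences" values quoted earlier in the thread.
for label, count, length in (("150bp", 342766670, 150),
                             ("100bp", 600184154, 100)):
    single = estimated_depth(count, length)
    both = estimated_depth(count, length, paired=True)
    print(f"{label}: {single:.1f}X if the count covers all reads, "
          f"{both:.1f}X if it is per file of a pair")
```

Note that this counts every sequenced base, including duplicates and any non-human (e.g. oral microbiome) reads, so it is an upper bound on the depth actually usable for variant calling.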

I looked quickly and couldn't find a definition for the ##Depth_ headers in the VCF, but it stands to reason those indicate the depth of the reads actually used for calls of the variant sites in the VCF. Different variant callers have different parameters due to which they might exclude some reads from the calls being made, and the variant sites reported in the VCF are likely to have lower read depth than the whole genome. While, arguably, this "read depth of the called variants" is what you care about, it depends on the bioinformatics analysis of the sequence data, and would require processing each compared genome sequence exactly identically.

ssamlal
08-31-2019, 11:02 PM
I've finally received the results for my "My Full DNA: Whole Genome Sequencing with mtDNA" order (yesterday). It took 9 months (from start to finish)!

In Genome Manager > Reports, there are 5 file links listed in the Raw Data Library available for direct download:

FASTQ.1
FASTQ.2
BAI 8.5 MB
BAM 52.4 GB
SNP (VCF) 220 MB

The FASTQ.x links appear to be currently "dead"

Updates:

The FASTQ file links are working now :)

Another file link appeared: Indel (VCF)

Total of 6 working file links in the Raw Data Library section now:

FASTQ.1 - 29.6 GB
FASTQ.2 - 31.6 GB
BAM - 52.4 GB
BAI - 8.5 MB
SNP (VCF) - 220 MB
Indel (VCF) - 68.5 MB


Is it strange that the FASTQ compressed files are 61.2 GB (in total) and the uncompressed BAM is 52.4 GB?

pinoqio
09-01-2019, 10:02 AM
Is it strange that the FASTQ compressed files are 61.2 GB (in total) and the uncompressed BAM is 52.4 GB?

No, because in the BAM all reads of the same sequences are already grouped together, so they compress better.

Donwulff
09-02-2019, 04:51 PM
Also, BAM stands for Binary Sequence Alignment/Map Format, it's both binary & compressed. The uncompressed, ASCII file is called SAM and is significantly larger. I haven't seen Dante Labs new sequences, but the BAM file size usually depends significantly on the read base quality data, because it isn't redundant (repetitive) and thus compressible. The ~50GB BAM's probably have fewer quality levels (bins) and/or more uniform quality than ~100GB BAM's which have been seen. (The >90GB in sequence specs stands for GigaBases, not GigaBytes which will indeed depend on compression).
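
To illustrate why the number of quality levels matters so much for file size, here is a small Python sketch that compresses two synthetic quality strings with zlib (BAM's BGZF container is built on the same deflate compression). Everything here is made up for illustration: the string length, the choice of 40 versus 4 quality levels, and the particular binned values; real quality strings are not uniformly random, so only the direction of the effect carries over.

```python
import random
import zlib

random.seed(0)  # fixed seed so the comparison is repeatable
N = 200_000     # number of synthetic quality values (arbitrary)

# Phred+33 quality characters: 40 distinct levels versus 4 binned levels.
many_levels = "".join(chr(33 + random.randrange(1, 41)) for _ in range(N))
few_levels = "".join(chr(33 + random.choice((2, 14, 27, 40))) for _ in range(N))

size_many = len(zlib.compress(many_levels.encode("ascii")))
size_few = len(zlib.compress(few_levels.encode("ascii")))

print(f"40 quality levels: {size_many} bytes after compression")
print(f" 4 quality levels: {size_few} bytes after compression")
```

The binned string compresses to a fraction of the size of the 40-level one, even though both describe the same number of bases.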

ybmpark
09-04-2019, 11:46 PM
Anyone got their hard drives or the FASTQ file in the last month? Especially those who bought the Thanksgiving promo.
This is getting ridiculous. It is approaching 10 months since I ordered the test.

Ann Turner
09-05-2019, 11:43 AM
Anyone got their hard drives or the FASTQ file in the last month? Especially those who bought the Thanksgiving promo.
This is getting ridiculous. It is approaching 10 months since I ordered the test.
There is a new customer support person who introduced himself on the Dante Labs FaceBook group. He helped me get the hard drive I ordered in April (not at the time of the Black Friday sale). Try [email protected]

Donwulff
09-07-2019, 01:07 PM
I think this is Europe only, due to FDA. Only just now realized we apparently get one "credit" towards free reports at Dante Labs with the DNA test. The new reports now cost 29 EUR per panel, but I noticed there is now a subscription option, which says "All your reports are updated monthly". In addition, the silver subscription at 19.99 EUR gives one extra credit per month, which is more affordable than the individual panels.

The process to get the new panel is slightly confusing. After choosing it from the online store, I received e-mail telling me to use a special page to enter my kit ID, panel order ID and the panel I ordered, finding them from different places around the Genome Manager & e-mails I received. Instructions were unclear with "For multiple orders, we suggest you add a nickname." (Add what nickname where? For now I resolved to only ordering one panel at a time, though that may not be what they mean), and the input form didn't work without disabling my privacy guards. And yes, that means the subscription is kit-based, which means if you have 10 kits you'd need to buy 10 subscriptions, which makes sense but is expensive. The ordered panel appeared in about 24 hours.

Turns out the panel is just a dump of ClinVar data for genes in the panel, in a PDF with some formatting issues. I don't believe there are easy ways for an individual to get their ClinVar report (Even Promethease doesn't really report most of that) so certainly for Dante Labs users this is the easiest way, however I'm hoping they'll fix the PDF formatting and make this otherwise more approachable. There's an excel table which will be useful for more advanced users. It contains ClinVar, OMIM, SIFT & PolyPhen2 predictions, GWAS catalog and other annotations, however all of the sources are at least one year old.

Regarding the "All your reports are updated monthly", in a couple of days I've not yet observed any change to my existing reports, although I suppose that phrase is somewhat open to interpretation of what the "reports" are. There isn't any change to the actually useful dynamic browser on the Genome Manager, either. It still reads "Your holistic report that lists potential health conditions you might develop." but shows only pharmacogenetic items. A point of note is I did my sequence in 2017, when their bioinformatics process was significantly different, for example I only have an SNP file, no indels or CNVs. It would be "fair" for them to bring the bioinformatics analysis up to date as well, but with people waiting months for their BAM files, I'm not holding my breath.

In conclusion I can recommend the subscription feature for lower price access to annotated variants, but presently the PDF report is barely readable, and I'll keep you posted IF there are actual updates to the reports.

Geldius
09-07-2019, 10:55 PM
Question from a newbie. I availed of the Dante special offer last November and wanted to use some app to look at haplogroups so I investigated sequencing.com but got informed that their apps are not compatible with 130x data (I actually thought my data was 30x until I noticed WGZ in the filenames). Is this the same for other apps or can they handle 130x as well as 30x? I am interested in researching Y & MT haplogroups and some health related topics. Any help appreciated.

As to your question concerning Y & mt haplogroups, I can recommend you to use YFull service - https://yfull.com
You can extract Y chromosome and mtDNA reads using samtools or you can pass link to whole BAM file (also with BAI).
YFull should be able to extract it on their own, it would only take more time due to the upload.
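
A minimal sketch of the samtools extraction mentioned above, written as a Python helper that only builds the command line. Assumptions: samtools is installed, the BAM is coordinate-sorted and has its .bai index next to it (region queries need the index), and the contigs are named chrY and chrM as in hg38-style references; GRCh37-style files usually use Y and MT instead. The helper name and file names are hypothetical.

```python
import shlex


def samtools_extract_command(bam_path, out_path, contigs=("chrY", "chrM")):
    """Build a 'samtools view' command that writes only the given contigs.

    -b writes BAM output, -o names the output file; the regions follow the
    input file and require the input BAM to be indexed.
    """
    return ["samtools", "view", "-b", "-o", out_path, bam_path, *contigs]


cmd = samtools_extract_command("sample.bam", "sample.chrY_chrM.bam")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to `subprocess.run(cmd, check=True)`. The resulting file is typically far smaller than the whole-genome BAM, which helps on a slow connection.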

Geldius
09-07-2019, 11:25 PM
As I'm still waiting for new Dante raw data results, I did some tests with Bowtie 2 aligner on my recent and older CPUs.

input: 2x pair-ended FASTQ files (BGISEQ-500), Dante Labs WGS, received in June 2018
aligning tool: Bowtie 2 (v2.3.5.1)
reference genome: GRCh38.p13 (from February 28, 2019)

CPU: Core i5-7400, (1 thread) - 104 hours
CPU: Core i5-7400, (4 threads) - 30 hours
CPU: Core i7-9700K, (8 threads) - 14 hours
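
As a quick check on how well the alignment scales with threads, the two runs on the same CPU (1 thread versus 4 threads on the i5-7400) can be compared directly. A small Python sketch using only the timings listed above; the function name is made up:

```python
def speedup_and_efficiency(single_thread_hours, multi_thread_hours, threads):
    """Return (speedup, parallel efficiency) relative to a 1-thread run."""
    speedup = single_thread_hours / multi_thread_hours
    return speedup, speedup / threads


# Core i5-7400: 104 hours on 1 thread, 30 hours on 4 threads.
speedup, efficiency = speedup_and_efficiency(104, 30, 4)
print(f"{speedup:.2f}x speedup, {efficiency:.0%} parallel efficiency")
```

The 8-thread result comes from a different CPU, so it mixes per-core speed with thread scaling and is left out of the efficiency figure.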

I'm also considering the possibility to make an experiment with Intel Xeon Phi PCIe card as these cards are quite inexpensive at ebay.

Donwulff
09-17-2019, 12:54 PM
Looks like big changes with Dante Labs product lineup and pricing, they're now offering sequencing starting from 99 EUR for avg. 4X low-pass sequence. Sequences have two speed-tiers, 2 weeks and 8 weeks, and according to news post since September all sequencing is done in their "new european lab". (All this pertaining to their european site, I need to check the US site too). Artificial Intelligence Personalized Report (Which I still don't understand, should I just upload my whole blood count or what for that? Also it's promised as part of the original sequencing order, so shouldn't I get it for free?) is discounted to same 49 EUR as other tests.

Of course, their 3 month delivery guarantee page disappeared earlier without a word, and I'm just crossing 90 days of waiting on my long-read test, which should have been run at their Italian lab, so treat this with caution. Still, the prices are definitely competitive and they're actively changing things around.

MacUalraig
09-17-2019, 01:58 PM
From the FAQ:

"What about the delays I read online?
In the past, we used third party labs, which caused the delays. Now that all samples are sequenced internally, we control the entire process and can guarantee the turnaround time."

https://www.dantelabs.com/pages/faq

MacUalraig
09-17-2019, 03:24 PM
Import your 23andMe/Ancestry genetic data?

https://www.dantelabs.com/collections/subscription/products/import-your-genetic-data

Is this new, hard to keep up...?

Donwulff
09-17-2019, 03:35 PM
Import your 23andMe/Ancestry genetic data?

Well, that's confusing. They already had €19.99 / month per sequence subscription which is supposed to be "All your reports are updated monthly", though I've not yet seen any updated (But my profile page does read "Unfulfilled", and I did get monthly credit towards free reports). So I thought the "Subscription" offering was the same... And on that note the new products make it unclear what reports etc. are actually included, because on the new product offerings I find:

Intro: intro test, with reports on wellness and nutrition (4X coverage)

Premium: medical grade, full health check up for all common and rare diseases (30X coverage)

Super Premium: advanced medical grade, full health check up for advanced diagnostics (50X coverage)

So what reports do they actually include now? What reports did they include before because they listed AI reports etc. which I didn't get? Guess I'll have to ask them directly, but I would have thought someone here has actually dealt with these before.

These are definitely "new" though, I think the changes are from today or yesterday and haven't been officially announced yet, it may take them a while to clarify what they mean.

MacUalraig
09-17-2019, 03:43 PM
I never tried them but I thought the original ones only worked off their WGS data. Looks like you can get 4 reports off imported data (including a VCF option) but a bigger more detailed selection if you have their WGS data.

Donwulff
09-17-2019, 04:08 PM
I have "Neurology+Test (zip)" which is actually a ZIP, including pdf and excel file (which seems too large to really process with OpenOffice Calc) which I obtained within 24 hours of choosing the extra report for my report credits, Pharmacogenetic Report that came as free update to everybody. Then there's "Genome Overview" and "Health & Wellness" from Sequencing.com which came with the sequencing in 2017. None have updated in the two weeks since I bought the monthly "report updates" subscription (It's the most affordable way to get the new panels/reports, however).

I never received their new, in-house report which came after I bought the test. However, when they first rolled out the new Genome Manager, for a while it was seeing health-conditions alongside the pharmacogenetics, but they suddenly disappeared and it was re-named to Pharmacogenetics report with sub-title saying it's a "holistic report of health-conditions". Apparently the previous product description actually read "If you seek to get a Customized Report on a disease or gene panel, after receiving your saliva collection kit, fill out the form", that sounded the same as the AI-report but the AI report may actually be a separate one. The customized report is still/now detailed in their FAQ, so it may still be available to new customers, although in my opinion the new product listings are very unclear on what you're getting with regards to reports, and there haven't been any examples available for a good while.

Also this is all on the EU site; I'm under the impression that due to FDA they don't offer perhaps any reports on the US site (Their FaceBook top post says "* Reports are available only outside the United States."), but I'm not sure.

teepean47
09-18-2019, 10:41 AM
Looks like Dante Labs is offering their 4X WGS for €99.

https://www.dantelabs.com/products/whole-genome-sequencing?variant=29983751995436

Donwulff
09-18-2019, 12:49 PM
Already said that at the top of the page ;)

I feel it's important to stress that the 4X reads are spread randomly across the genome, not uniformly, which means that huge swaths will be left unread, and others will have just 1 read which leaves up to 1% error probability. It's also highly likely all the average 4 reads will hit on one strand only at many locations, unless you actually get the optimal situation of 2 reads on both strands, it'll be nearly impossible to tell if a variant is homozygous or heterozygous (Two or one copies).
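
To put rough numbers on this, here is a Python sketch that treats read depth at each position as Poisson-distributed around the average. That is an idealized assumption: real coverage is less even, so real gaps will be worse than these figures. The second function assumes that at a heterozygous site each read comes from either chromosome copy with probability 1/2; the function names are made up.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k reads at a position, given the mean depth."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_one_copy_only(mean, max_depth=60):
    """P(a heterozygous site is covered, but every read shows the same copy).

    With n reads, all n come from one of the two copies with probability
    2 * 0.5**n; this is averaged over the Poisson depth distribution.
    """
    return sum(poisson_pmf(n, mean) * 2 * 0.5**n
               for n in range(1, max_depth + 1))


for depth in (4, 30):
    print(f"{depth}X: no reads {poisson_pmf(0, depth):.2%}, "
          f"exactly one read {poisson_pmf(1, depth):.2%}, "
          f"het site seen from one copy only {prob_one_copy_only(depth):.2%}")
```

Under this model, at 4X roughly 2% of positions get no reads at all and close to a quarter of heterozygous sites would look homozygous, whereas at 30X both figures are negligible.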

So an individual might find 4X sequencing relatively useless; the sales page says "Intro: intro test, with reports on wellness and nutrition (4X coverage)" so apparently you won't get the health analysis and reports, which is good considering. I don't know about the wellness & nutrition report, it appears there are now separate "Nutrigenomics" and "Wellness & Lifestyle" reports.

I tried to find a good image illustrating the distribution for 4X, but that wasn't so quick... if the mood strikes me I could subsample Dante Labs 30X sequence to about 4X and produce graphs from that, since the technology will have slightly different read distribution. But it's also presumably 4X minimum, so real sequences could see 5X or 6X, just no promises.

MacUalraig
09-18-2019, 01:40 PM
I've got both 2x and 4x WGS data in my surname project and they weren't bad for testing known Y SNPs, the 2x not surprisingly had a lot of no-calls but even there it was enough to do a fairly good branch placement. Novel variants were another matter without more samples.

The Rocca et al. citizen science paper used 2-4x data

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0041634

but they had sufficient samples to id new branches.

Donwulff
09-18-2019, 01:58 PM
On Y-chromosome, https://isogg.org/wiki/Full_Genomes_Corporation#/media/File:WGS-150bp-Y-coverage-estimation.png seems to suggest you get maybe 16% callable loci on Y chromosome. Of course if that 16% happens to be your known terminal SNP, due to phylogeny you have very good idea of your rough haplogroup, but much less luck verifying or finding new terminal SNP's. It's haploid chromosome, so you don't have the same homozygous/heterozygous problem. The coverage must not be high enough to get any useful STR's out of it. Seems like it would be pretty decent for mtDNA.

But I'm not really talking about Y chromosome analysis at all. That's not health, wellness or nutrigenomics. People have been able to do YSEQ orientation panel to get their exact known terminal SNP for about same price for a while.

I'm just pointing out that the 100 EUR 4X sequence is a terrible choice if you want reliable & actionable health data, or autosomal DNA matches. It's affordable, but low expectations.

1000 genomes project (About 2500 samples in the end) used 4X sequencing; it's not a poor choice for population genomics where you don't care about individual genotypes, just general trends. Also, Nebula does have I think 0.25X sequencing with imputation for much higher price than that (after a loud campaign about how they were going to be free). So it's being done, but I wouldn't really trust the results on individual level.

Tonev
09-18-2019, 05:06 PM
Just decided to ask Dante Labs for a price for upgrading from 30x to 50x for former users like me. One of them, Nat, twice replied that there is no such thing as a price upgrade.
Quoting: "If you are considering getting the 50X, you would need to have that one purchased."
That's it as of tonight...:(...

Donwulff
09-18-2019, 05:14 PM
And on second thought, I'm not certain what we're talking about, because coverage over the Y chromosome is a bit tricky. The normal sequencing coverage is calculated over diploid, i.e. autosomal, chromosomes, so a 4X sequence should have an average read depth of 2X over the euchromatic regions of the Y chromosome. Conversely, a 4X Y-chromosome sequence would be 8X whole genome. If it's coverage over the whole Y chromosome, all bets are off because most of the Y chromosome doesn't sequence well and therefore gets 0 coverage. The graph I linked says WGS reads, so we can assume the Y-chromosome reads are about half that, and the low coverage/callable loci supports that.

Some regions of the Y chromosome DO sequence more reliably/commonly, and because they need to be sequenced in all compared sequences, phylogenetically informative loci tend to be among those, so this isn't to strictly say "16% chance of getting your terminal SNP", too many variables for that. A bit higher average read depth, a bit higher quality reads covering the SNP of interest, accepting a bit lower confidence variant call due to supporting evidence (phylogenetic tree, family ancestry etc.) and you'd have a decent chance of getting at, or at least near, your terminal branch Y-SNP. Note though that YFull for example at least says they'll take minimum 15X WGS reads only, though they did take lower in the past.

Donwulff
09-18-2019, 05:23 PM
Calling the first test "Intro" does indeed make it sound like you should be able to upgrade, but it doesn't really appear to make much sense for them to offer upgrades. The sequencing has to be prepared anew, the bioinformatics analysis has to be run anew from the start, the cost to them is the same and there can't be much air in their prices right now. Or you could do like someone on these forums, and buy two 30X sequences. If they're on the same sequencing technology, I don't see why they couldn't offer to combine them into one sequence, and even offer mapping to GRCh38 as a service though. Failing that you might be able to do it on Sequencing.com (or at home of course), I've not played with their paid offerings though so I don't know if they have multi-input. Certainly should allow multiple FASTQ for separate read groups.

bjp
09-19-2019, 02:30 PM
I just emailed Dante Labs with my monthly beg for my paid-for hard drive containing raw data, summarizing the timeline from purchase on April 4th, an email from them on April 7th saying it would be shipped in "a few days", to "in a few weeks" on June 6th, to my latest stating I need it shipped with a tracking number in the next 30 days or I will take further action.


Two months since this post, five and a half months since paying for the hard drive delivery of raw data, and 14 months after returning the kit, I finally received a tracking number last night, shipping out of Rome, Italy. Once received I will add some notes on the contents. The product here was the 30X WGS offered via Amazon for the Prime Day promotion in July 2018.

Geldius
09-27-2019, 10:45 PM
I'm also among lots of people waiting for raw data of Dante WGS.

2018-11-26 - order placed with Dante
2018-12-14 - shipment notification from Dante
2019-01-01 - DNA sample complete, ready to mail
2019-01-16 - Dante confirmed receipt of sample
2019-01-30 - completed DNA extraction, sample classified as level A
2019-04-27 - sequencing finished
2019-05-17 - results ready, VCF only

… waiting for FASTQ, BAM files

tiggerle
09-30-2019, 07:56 PM
I'm also among lots of people waiting for raw data of Dante WGS.

2018-11-26 - order placed with Dante
2018-12-14 - shipment notification from Dante
2019-01-01 - DNA sample complete, ready to mail
2019-01-16 - Dante confirmed receipt of sample
2019-01-30 - completed DNA extraction, sample classified as level A
2019-04-27 - sequencing finished
2019-05-17 - results ready, VCF only

… waiting for FASTQ, BAM files

I am very happy that I came across this forum because we share a similar story:

2018-11-23 - order placed with Dante
2018-12-20 - sample finally received (first kit was broken)
2019-March - sequencing finished, only 17 pages "Wellness and Lifestyle Report" and "Pharmacological Report"
2019-March - results ready, VCF only

Ever since, I am trying to:
- Order BAM / FASTQ files, but they keep ignoring my requests. If I get a reply at all, it is something like a standard mail from Andrea that they know that they are late with sequencing (which doesn't fit my case and request at all...).
- Get the full “Wellness and Longevity report” that was said to have 160 pages when I ordered the kit
- Get a customized report on all the variants present in my genome that are considered pathogenic as well as a list of VUS present in my genome, because that's what I had been promised to get BEFORE I ordered!

You can well imagine that I feel betrayed. I am already thinking about suing them.

Geldius
09-30-2019, 10:59 PM
I'm a bit disappointed as well.

Last year, the delivery timeline was much better:

2017 Nov - order placed
2018 March - VCF ready
2018 June - raw data delivered at HDD

tontsa
10-01-2019, 07:11 AM
I have their new 4x offering with 2 weeks delivery running.. they acked receiving the sample on last Friday. That will tell if they are still up to their old tricks. I also have one 30x going when they introduced the 90 day guarantee.. i'm guessing they forgot one zero from it..

aaronbee2010
10-01-2019, 10:38 PM
I have their new 4x offering with 2 weeks delivery running.. they acked receiving the sample on last Friday. That will tell if they are still up to their old tricks. I also have one 30x going when they introduced the 90 day guarantee.. i'm guessing they forgot one zero from it..

Another AG user here purchased three 4x kits from Dante. I'll post the timeline here when the results arrive.

ntk
10-02-2019, 05:43 AM
I ordered my kit back in mid-February and they got the kit back from me on February 25. A month ago, on September 2, I got this email from Dante:

Where is your kit?

Why is it taking so long to receive your results?

As the CEO of Dante Labs, I am aware that these questions are popping in your mind. The responsibility is ours, and primarily mine. You deserve an answer.

[snip excuses]

As a result, today, approximately 84% of all our samples received by the end of July have been sequenced. This numbers was below 70% in June (apple to apple comparison). We plan to finish the entire backlog of samples by mid October.
Miffed, I asked them how that is so when mine is still listed as "QC completed." Their reply:

Your sample is also being sequenced as with all our samples. We mention the June and July samples to explain how far along we are in releasing the data for our customers especially the ones that were before June.
So what is 84% of? Clearly not the samples received in July; maybe all of the kits they ordered from inception through the end of July? Maybe just a random number?

But a month later, their new and improved dashboard still has the bubble filled at "QC Completed," with "Sequencing Started" still grayed out, shriveled and lonely in the future half of the timeline:

Your Kit will be Sequenced Shortly

Your kit has passed the quality care inspection and is scheduled to be sequenced shortly. After sequencing is complete your results will be posted soon after.
It seems that someone is being less than honest. But hey, it's not mid-October (yet), so I guess they left themselves a little wiggle room.

I do have to say, a dishonest apology with more empty promises leaves me a lot less satisfied than I was just sitting around waiting with radio silence and low expectations.

Petr
10-02-2019, 02:40 PM
3 kits delivered to Dante in Italy on April 4th, their reply today:


Thanks for reaching out and apology for the delayed results.

The kits are still in the process and the FASTQ files are not ready. We are also battling with a backlog of result to be released right now, but your results will be availed in the next few weeks. You will be notified by email when we have a more precise timeline for your results.

Thanks for your patience and apology for the inconvenience.

aaronbee2010
10-02-2019, 03:12 PM
Just decided to ask Dante Labs for a price for upgrading from 30x to 50x for former users like me. One of them, Nat, twice replied that there is no such thing as a price upgrade.
Quoting: "If you are considering getting the 50X, you would need to have that one purchased."
That's it as of tonight...:(...

It appears they've taken the 50x test off their store, which is a shame. They've replaced it with their previously-seen 130x exome + 30x rest-of-genome service.

I can see why they've done it though. 130x exome coverage is very good for health reasons, and they don't seem to market themselves to people interested in genealogy.

teepean47
10-02-2019, 06:27 PM
I have their new 4x offering with 2 weeks delivery running.. they acked receiving the sample on last Friday. That will tell if they are still up to their old tricks. I also have one 30x going when they introduced the 90 day guarantee.. i'm guessing they forgot one zero from it..

I ordered one as well as I noticed they were advertising their new European lab that was opened on 1st of September.

Donwulff
10-02-2019, 08:23 PM
You could still get twice 30X, although I'm not sure if they'd do the bioinformatics combining. Also I forgot what the previous test cost, but 2x30X seems a somewhat better option if you really desire that. I can't imagine that being meaningfully better for genealogy though; that has its own challenges but isn't really constrained by sequencing depth once you're at around 30X. Trying your luck on hard-to-sequence Y chromosome SNPs, perhaps. The WGS+WES is actually kinda interesting from a technical standpoint because they don't combine so readily; both sequences would normally need separate analysis. I don't know if they deliver it like that.

Donwulff
10-02-2019, 08:31 PM
Long sequences were supposedly done at their Italian lab too, but they're still having schedule problems with those. Although the timestamps on my FASTQ seem to indicate they sequenced it months ago (even before I sent them e-mails asking why it was "waiting for QC" for months), they made the FASTQ available only yesterday. So... I certainly wouldn't hold my breath for the delivery schedule. Still, they seem to be the most affordable and in some cases the only option out there, but missing delivery promises does hurt their reputation. Oddly enough, when other companies constantly do it, I don't see people "rioting", calling them a fraud or threatening to sue them. I wonder if this is because of poor expectation management, social media management, an affordable price point where a different crowd buys them, them being newcomers to the scene, people being fans of a different company, or something else?

JamesKane
10-02-2019, 11:56 PM
I wonder if it just isn’t the volume of tests they may be handling. The one data guy claims to have to work through a petabyte backlog. That would indicate over 4,000 30x WGS BAMs and FASTQ files have been processed. Consider unhappy customers are 3x more likely to leave any kind of feedback and things don’t look so bad.
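As a rough sanity check on that estimate, the arithmetic can be sketched as below. This assumes decimal units (1 PB = 1,000,000 GB) and uses hypothetical helper names; the roughly 250 GB per sample it implies for BAM plus FASTQ is derived from the two figures above, not a number published by Dante.

```python
# Back-of-envelope check of "1 PB backlog = over 4,000 30x genomes".
GB_PER_PB = 1_000_000  # decimal units; this is an assumption


def implied_gb_per_sample(backlog_pb: float, samples: int) -> float:
    """Average storage per sample implied by a backlog size and a sample count."""
    return backlog_pb * GB_PER_PB / samples


def samples_in_backlog(backlog_pb: float, gb_per_sample: float) -> float:
    """Number of samples a backlog holds at a given per-sample size."""
    return backlog_pb * GB_PER_PB / gb_per_sample


if __name__ == "__main__":
    print(implied_gb_per_sample(1, 4000))  # GB per sample implied by the estimate
    print(samples_in_backlog(1, 250))      # samples if each one takes 250 GB
```

If the real per-sample footprint is smaller than that, the same backlog would correspond to more kits, so the 4,000 figure is best read as a lower bound under these assumptions.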

ntk
10-03-2019, 01:46 AM
I wonder if it just isn’t the volume of tests they may be handling. The one data guy claims to have to work through a petabyte backlog. That would indicate over 4,000 30x WGS BAMs and FASTQ files have been processed. Consider unhappy customers are 3x more likely to leave any kind of feedback and things don’t look so bad.
I mean that's nice to be optimistic and all, but that doesn't do much for me when they had my kit in February and I'm still stuck at "QC completed." Meanwhile they've since had huge sales where they guaranteed results in 90 days and they are still advertising full data in 8 to 10 weeks on new sales, and they are claiming on other forums that they are down to 1,200 kits outstanding. Do you not see anything wrong with this picture?

They told me specifically a month ago that my kit is "also being sequenced as with all our samples," but their dashboard still says it hasn't started.

JamesKane
10-03-2019, 09:07 AM
Since other labs are having trouble clearing backlogs of tests with much smaller file sizes? Other than stupid marketing claims, no, I don't see a problem here. It is not uncommon for NGS testing to take upwards of a year.

ntk
10-03-2019, 05:26 PM
Since other labs are having trouble clearing backlogs of tests with much smaller file sizes? Other than stupid marketing claims, no, I don't see a problem here. It is not uncommon for NGS testing to take upwards of a year.
That's funny, because I've never heard of any other consumer NGS companies where it is not uncommon for them to take upwards of a year.

Nor advertising times of 8-10 weeks that they know they cannot fulfill in that time, while they still have large backlogs of kits waiting > 7 months to start sequencing.

"Order volume" is not an excuse when they are still taking orders and breaking promises for new customers.

Not to mention the fact that I was told my test was being sequenced over a month ago. The most charitable interpretation is that their support staff is giving out false information because they are unable to get correct information. Less charitably, they are blowing smoke and giving out false information deliberately or without checking. The only logical possibility where that was true is that it was actually sequenced a month ago and they are sitting on it without updating their new-and-improved communications dashboard.

JamesKane
10-03-2019, 09:29 PM
The new dashboard is garbage frankly. My Long Read sample was fully quality controlled and sequenced in late July according to the date stamps. The progress wasn't updated until the FASTQ was made available yesterday.

Again, we don't know the volume of testing they are doing; we just see the unhappy customers who expect unrealistic timelines for what they paid for the sequencing service. As with all services, you may choose two from the price, quality, and timeliness menu. Granted, Dante's marketing is overly optimistic and based on the best case of what can happen. The reality is that the best case is rarely achieved by startups.

ybmpark
10-10-2019, 01:48 PM
BBB recently downgraded Dante Labs to F.

There is another board where they have an average 3-star rating (5 max), but it is apparently due to Dante Labs employees generating multiple IDs to boost its rating. So you have mostly 1-star, some 2-star and rarely 3-star ratings, and then suddenly there are a lot of 5-star ratings, and they appear in spurts, on a single day each time. It is obvious who they are.

Other than the fact that they cannot deliver what they promised, they are downright sleazy.

I also overheard what they blurted out, apparently by mistake: they are trying to recover the raw data, BAM and FASTQ, from the "machine". So if you ordered a hard drive later, after you ordered the initial test, the chances are good that they had thrown out the raw data already.

I also find some members here quite funny. Why are they so belligerent to other members who are dissatisfied for very legitimate reasons? Once they get their results, they suddenly turn into Dante lovers. Others here like me are still pissed because we have not gotten the result after almost a year, OK? (I have seen a guy who claimed 1 1/2 years.)

Donwulff
10-10-2019, 04:46 PM
I think many of us care mostly about civil, factual discourse. The above post, for example, contains wild, unsourced, nonsensical conspiracy theories and ad hominem attacks against "some members". At the risk of sounding hypocritical, to be honest that seems like the typical attitude of Dante Labs critics, which certainly doesn't help their cause, because that forces people to defend *themselves* and not just Dante Labs.

I have not yet seen a DNA testing company which keeps its promised schedules. Ordering a complicated wet-lab analysis is nothing like ordering some off-the-shelf item or picking up a soft drink at a grocery store, this we know. As for the ever-changing world of bioinformatics, be glad if you did NOT pre-order Duke Nukem Forever... People have been waiting for FTDNA Big Y upgrades for over half a year, with multiple months of delays, and they've shut down raw BAM downloads multiple times for new and existing customers, once for over half a year as well, and that's to mention just a single well-established and trusted DNA company.

Dante Labs was running DNA analysis at outside labs off-site, so there's no way that they could try to "recover from the machine" anything. Luckily, there's no need to because the short read sequencing machines do not produce direct FASTQ or BAM files, they are processed on servers as the produced files clearly indicate. Several people have also received their raw data after months of waiting, so it doesn't look like they're lost at all. I'm also wondering where you were in position to overhear Dante Labs "they" blurting anything.

On the "another board where they have an average 3 star rating" Dante Labs looks to be responding to negative reviews, where I saw them claiming they did not receive raw data from said outside lab, and are therefore sequencing those samples again at their own cost. I'm not certain whether to believe that either (How come they were delivering raw data, just slowly, and this is business-to-business, they should be able to sue the lab for breach of contract).

And on the practical side, a lot of us recognize the benefit of more affordable sequencing (CLEARLY so: many people paid less for their sequences than for a DNA microarray test, which produces only 0.03% of the results, never mind the raw data) even IF the wait time was a year. Unlike with the aforementioned well-established DNA-testing company, I'm not sure >year wait-times are real. One should ask themselves why TrustPilot has almost no reviews between 1 and 5, and why so many negative reviews repeat identical phrases. Yet just from the Anthrogenica forums, the majority of customers do appear to have received their data, raw data included, except for a few people who are seemingly posting "It's a scam, you idiots will never see anything" (paraphrased) in multiple threads.

karwiso
10-10-2019, 09:17 PM
I have ordered 8 DNA tests from DanteLabs and a 9th is processing. I have received the results for all tests and all requested FASTQ/BAM files.

Yes, it took time. A lot of time sometimes. For FASTQ/BAM on one test it took 7 months to get the raw results, but they are real and authentic. I have created some artificial autosomal kits based on DanteLabs raw data and they matched results from autosomal kits from other vendors, like FTDNA, MH or 23andMe (from the same testers).

My choice of DanteLabs was based on the tests' affordability. I could order them for more of my potential relatives, and the results have been submitted to yfull.com and were accepted for their analysis.

Well, I am not happy about waiting times at DanteLabs, but others are not on schedule either. FTDNA has delayed results delivery by several months on 3 of the 4 BigY tests I have ordered. It doesn't excuse DanteLabs, but the situation is not so uncommon in the industry. DanteLabs tests are affordable and I am not bothered by some additional waiting. The results are real and reliable.

I have shown my results to some other genealogists and they have been interested in doing their tests also. I hope that DanteLabs will stay in the industry since there is a need for affordable WGS testing. If it weren't for DanteLabs I would probably have done 1-3 WGS tests instead of 9.

Negative reviews and negative posts about DanteLabs appear also in some spurts. So it is obvious that either some individuals or some competitors are in the game.

ybmpark
10-11-2019, 01:24 AM
Oh really?... This is Dante Labs' response to a certain user at Trustpilot:
"We had an issue with the BAM and FASTQ delivery by a third party lab. We are resequencing samples like yours so that you can have your full raw data (at an extra cost for us)."

In many cases (I say the majority) they have to do the sequencing again because they cannot recover the raw data from the sequencing machine. It is already gone.
This is not data processing into BAM and FASTQ format; "resequencing samples" means a biochemical process, not data processing.

MacUalraig
10-11-2019, 07:26 AM
Negative reviews and negative posts about DanteLabs appear also in some spurts. So it is obvious that either some individuals or some competitors are in the game.

I think people can form their own opinions on who is a genuine poster and who isn't - probably best to keep the thoughts on specific cases to themselves though!

I have waited over a year for raw data from my short read test, so it's hard to see me ever recommending them to anyone else. There are other European options emerging which may be more tempting for me.

MacUalraig
10-11-2019, 07:43 AM
I'm not sure >year wait-times are real.

When I first got interested in Dante I saw comments from Max de Plum and Ted Kandell warning of year plus waits for data and decided as usual that I would make my own mind up. Hey guess what I am now waiting over a year too. If you choose to disbelieve that go ahead.

Having said that, people I already knew and have no reason to doubt have come along later and got the lot well within a year. Whether that leads one to conclude that I'm going to get mine too eventually ...?

pinoqio
10-11-2019, 05:02 PM
I also ordered from Dante knowing that their actual delivery times would be many months longer than advertised, and looking at their last annual report, I still don't think they are a scam.

But what cannot be denied is that they are paying for fake reviews and comments, and incredibly crude ones at that.
Take for instance this review on TrustPilot: https://www.trustpilot.com/users/5d90c9fb4e1ff7090da41814
Ostensibly from Helsinki, Finland with a decidedly non-Finnish name and a stock photo as an avatar: https://www.bigstockphoto.com/image-215511178
Going through the five star reviews, I'd say easily a third are pretty blatant astroturfing with an American/English name, random location, and a stock/celebrity avatar.
And then there is this gem of a reddit thread: https://www.reddit.com/r/promethease/comments/cb0pxt/dante_labs_a_good_discovery/ (All accounts wrote one post and were created on the same day)
On the customers Facebook group, there is a mail showing them offering a 50% refund in exchange for a better TrustPilot review.

Most likely, they are paying for some dubious "online reputation management" made in Hyderabad, but it looks terrible and I can't blame those who feel they got scammed.
Honesty and details about the timeline and backlog would go a long way for Dante Labs instead of BS canned responses.

MacUalraig
10-12-2019, 11:25 AM
Take for instance this review on TrustPilot: https://www.trustpilot.com/users/5d90c9fb4e1ff7090da41814
Ostensibly from Helsinki, Finland with a decidedly non-Finnish name and a stock photo as an avatar: https://www.bigstockphoto.com/image-215511178


"Reply from Dante Labs
Oct 1, 2019
Hi Russell, thanks for your feedback. We are now enabling every new user to receive data via download.

Mary"

tiggerle
10-12-2019, 12:01 PM
Did any of you order a HDD with the raw data?
I am asking myself how they would assign it to the right kit, because I have two analyzed and there was no option to choose which raw data from which kit should be delivered... How did you guys manage to get the right data? My sequencing was finished half a year ago. How long would I have to wait for the raw data after ordering it (finally, I was able to order it 2 weeks ago after months of pleading for it, haha)?

Thanks for your experiences and advice

Petr
10-12-2019, 11:14 PM
3 kits sent back more than 7 months ago, their reply is every month the same:



Hello Petr,

Thanks for reaching out.

We have a backlog of samples right now, but your sample will be sequenced in the next weeks. You will be notified by email when we have a more precise timeline for your results.

Thanks for your patience and apology for the inconvenience.

Kind regards,

Devi

teepean47
10-13-2019, 08:53 AM
Since our last update 45 days ago, we sequenced more than 1000 whole genomes in our new lab in Europe. We are now sequencing more than 300 whole genomes per week.

This means that 300 people will receive their long-awaited, full results every week.

We have not completed our backlog yet, but we are getting closer every day.

We will now post more regular updates.

Also, we learned from our mistakes and we made investments to ensure that no one will ever have to experience delays in the future. We will continue to ensure that more and more people will get access to advanced genomics at accessible prices.

Thanks and kindest regards,

Andrea Riposati
CEO
Dante Labs


https://www.dantelabs.com/blogs/news/sequencing-update-2

aaronbee2010
10-13-2019, 09:12 AM
https://www.dantelabs.com/blogs/news/sequencing-update-2

But how many of the backlogged samples are being processed per week? And how many backlogged samples do they have in total? Those statistics would be far more relevant to the older customers still waiting for their results.

MacUalraig
10-13-2019, 11:20 AM
The statement could mean anything. If new cases are being done quickly at the Italian lab why describe them as 'long awaited'?

What samples is he counting as 'backlog'?

No reference to hard disks of course...

Petr
10-13-2019, 06:12 PM
My 3 samples have been sitting in the Italian lab since March 8th. Are they still sequencing even older samples? Or what does this mean? According to https://genome.dantelabs.com/updates the sequencing has not started yet.

Regarding hard disks, I cancelled the order on June 24th and I have not received the refund yet. On September 17th I received:
Hello Petr,

Thanks for your message.

I have seen this request for cancellation and refund. I apologize for the timeframe you have been waiting for. Once it has been handled by our internal team, we will reach out to you with more details.

Thanks for your patience. Apologies for any inconvenience caused.

Best Regards,

Lenox
No message since then.

aaronbee2010
10-14-2019, 10:07 PM
Hello all,

Before I say anything else, I'm a complete novice with all of this, so apologies if I say (or have done) anything that sounds silly :)

I've generated a .vcf file of SNPs from a donated Dante Labs sample (thanks to a very kind member from the Dante Labs Facebook group), containing those that are not in the YBrowse database. I'm not sure what filters I should use on the list?

I've tried removing SNPs under YSEQ's criteria:

"Please do not suggest SNPs in the following hg38 regions:
chrY:1..2781479 (pseudo autosomal region 1, PAR1)
chrY:10072350..11686750 (synthetic assembled centromeric region, CEN)
chrY:20054914..20351054 (DYZ19 125 bp repeat region)
chrY:26637971..26673210 (post palindromic region, actually gradual start of Yq12 repetitive region)
chrY:56887903..57217415 (pseudo autosomal region 2, PAR2)
Those sections of the Y chromosome suffer from frequent recombination events and are therefore not useful for phylogenetic studies. Unfortunately we can't provide primers for those regions."

I then removed SNPs with a quality score of less than 30 and a read depth per SNP of less than 10, however there are still 2470 novels remaining. The sample belongs to R1b-CTS9733*, and I'm wondering if the number of remaining SNPs is normal for a subclade which YFull says was formed 3900 ybp and has a TMRCA of 3600 ybp? I have a feeling there are still too many SNPs, but I'm not sure as I'm not an expert here.

Again, if I've messed up anywhere, then I'm sorry. I'm very new to all of this :)

Thanks in advance.
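For anyone wanting to reproduce the filtering described above, here is a minimal sketch in plain Python. It assumes an uncompressed, tab-separated VCF aligned to hg38, that read depth is stored as `DP=` in the INFO column (some callers only put it in the per-sample FORMAT field), and that only chrY records are wanted; the function names are made up for illustration.

```python
# YSEQ's excluded hg38 chrY regions, as quoted above (1-based, inclusive).
EXCLUDED_HG38_REGIONS = [
    (1, 2781479),          # PAR1
    (10072350, 11686750),  # CEN
    (20054914, 20351054),  # DYZ19
    (26637971, 26673210),  # post palindromic region
    (56887903, 57217415),  # PAR2
]


def in_excluded_region(pos):
    """True if a chrY position falls inside any excluded region."""
    return any(start <= pos <= end for start, end in EXCLUDED_HG38_REGIONS)


def info_depth(info):
    """Return the DP value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("DP="):
            return int(field[3:])
    return None


def keep_record(line, min_qual=30.0, min_depth=10):
    """Apply the region, QUAL and depth filters to one VCF data line."""
    cols = line.rstrip("\n").split("\t")
    chrom, pos, qual, info = cols[0], int(cols[1]), cols[5], cols[7]
    if chrom not in ("chrY", "Y"):
        return False
    if in_excluded_region(pos):
        return False
    if qual == "." or float(qual) < min_qual:
        return False
    depth = info_depth(info)
    return depth is not None and depth >= min_depth


def filter_vcf(lines):
    """Yield header lines unchanged plus the data lines that pass the filters."""
    for line in lines:
        if line.startswith("#") or keep_record(line):
            yield line
```

Usage would be something like `with open("in.vcf") as f: kept = list(filter_vcf(f))`, where `in.vcf` is a placeholder file name. This only applies the three filters mentioned in the post; it does not check whether a SNP is already named in YBrowse.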

JamesKane
10-14-2019, 11:42 PM
Which reference is the VCF prepared against? Remember that ybrowse and YSEQ are using GRCh38 now. By default Dante aligns to hg19. This may be inflating your figures.

However, the number of private mutations really depends on the distance from the other samples you are comparing against.

aaronbee2010
10-15-2019, 12:17 AM
Which reference is the VCF prepared against? Remember that ybrowse and YSEQ are using GRCh38 now. By default Dante aligns to hg19. This may be inflating your figures.

However, the number of private mutations really depends on the distance from the other samples you are comparing against.

The VCF was prepared using a GRCh38 (no patch) reference FASTA.

I was initially provided with R1 and R2 FASTQ files which I aligned with BWA MEM against GRCh38 as well.

MacUalraig
10-15-2019, 10:21 AM
In addition to the 'CEO second sequencing update' I have now also received an email 'Clarification about Hard Drives'

'...we hope you are enjoying your Results. You will receive your BAM and FASTQ files within two weeks. Since we will upload these files on the cloud, you don't need to buy a hard drive, unless you want one'.

Again I have no idea what order this refers to or whether they just go through their entire address book for these mail shots. Is it talking about my July 2018 short read test, that I have only received a VCF for? I somehow doubt it. Is it a reference to my Long Read test which they gave me a FASTQ file for some time ago but no BAM or VCFs? Err, somehow I doubt that too.

pmokeefe
10-15-2019, 11:57 AM
In addition to the 'CEO second sequencing update' I have now also received an email 'Clarification about Hard Drives'

'...we hope you are enjoying your Results. You will receive your BAM and FASTQ files within two weeks. Since we will upload these files on the cloud, you don't need to buy a hard drive, unless you want one'.

Again I have no idea what order this refers to or whether they just go through their entire address book for these mail shots. Is it talking about my July 2018 short read test, that I have only received a VCF for? I somehow doubt it. Is it a reference to my Long Read test which they gave me a FASTQ file for some time ago but no BAM or VCFs? Err, somehow I doubt that too.

I received an email with the exact same content about BAM and FASTQ files. It doesn't particularly apply to me. I believe it was a mass mailing.

generjustin
10-16-2019, 02:43 AM
Yes I got BAM and FastQ on hard disk. But absent an explanation from Dante I don't know what the files are. The main folder name is basically the kit number with WGZ appended. It has three folders: clean_data, result_alignment and result_variation. The latter two folders use WGZ in the file names and the first folder (clean_data) uses WGS in the file names. There are 8 1.fq and 2.fq files (16 fq files in total). Is it the case the clean_data are 30x and the alignment & variation files are 130x (just going by the file names)? What are the 1.fq and 2.fq files referring to?

Hello Ecogene,

Can you send me a screenshot of the contents of the clean_data, result_alignment and result_variation folders? I want to see what is contained on the HD before I order one to see if it has any added value over waiting for the cloud?

tontsa
10-16-2019, 03:48 AM
Yesterday I got the FASTQs for the 4x 2-week order. So they actually delivered them on the intro order as well, even though they don't mention them, at least on the European product page.
FastQC results Total Sequences 300974962, Sequence length 35-151, Sequence Length Distribution warn
#Length Count
35-39 899099.0
40-44 647287.0
45-49 467423.0
50-54 662310.0
55-59 525450.0
60-64 717720.0
65-69 615328.0
70-74 776673.0
75-79 741090.0
80-84 844697.0
85-89 898572.0
90-94 912192.0
95-99 1069470.0
100-104 995304.0
105-109 1263675.0
110-114 1102195.0
115-119 1479676.0
120-124 1233438.0
125-129 1721059.0
130-134 1422394.0
135-139 2005357.0
140-144 1709133.0
145-149 2.615329E7
150-152 2.5211213E8

Will have to align it next to hg38.p13 to see how it compares to my 30x kit.
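To put a number on that length distribution, a small sketch like the following can parse the FastQC table and report what fraction of reads reach a given length. It assumes the two-column "Length Count" layout shown above (a range or a single length, then a count that may be in scientific notation); the function names are hypothetical and only the last three bins are included as sample input.

```python
def parse_length_distribution(text):
    """Return a list of (min_len, max_len, count) from a FastQC length table."""
    bins = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        length, count = line.split()
        low, _, high = length.partition("-")
        # A bin may be a single length ("151") or a range ("150-152").
        bins.append((int(low), int(high or low), float(count)))
    return bins


def fraction_at_least(bins, min_len):
    """Fraction of reads in bins whose lower bound is >= min_len."""
    total = sum(count for _, _, count in bins)
    long_enough = sum(count for low, _, count in bins if low >= min_len)
    return long_enough / total


if __name__ == "__main__":
    sample = """#Length Count
140-144 1709133.0
145-149 2.615329E7
150-152 2.5211213E8"""
    bins = parse_length_distribution(sample)
    # Share of reads at (near) full length within these three bins only.
    print(f"{fraction_at_least(bins, 150):.3f}")
```

Pasting in the full table instead of the three sample rows gives the share over all reads; the bin counts should also add up to the "Total Sequences" value FastQC reports.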

MacUalraig
10-16-2019, 07:57 AM
What was the turnaround for your '2 week' test?

I admit I'm tempted to try it, but really they should be doing it for free for people who've been waiting as long as I have for the outsourced version.

tontsa
10-16-2019, 10:00 AM
It was 2 days short of 14 days after they marked the saliva kit received.

teepean47
10-17-2019, 05:32 AM
It was 2 days short of 14 days after they marked the saliva kit received.

I ordered the cheapest one with eight week turnaround. Dante Labs received the kit on 27th of September and it is currently "Awaiting Quality Control Inspection".

MacUalraig
10-17-2019, 07:43 AM
According to the latest GGI blog Dante are sponsoring the ISOGG 'Day Out' in Dublin on Sunday - see the sponsor's logo at the end.

https://ggi2013.blogspot.com/2019/10/isogg-day-out-ggi2019-dublin.html

Not sure it will do them any good, this bunch are very FTDNA-centric.

tontsa
10-17-2019, 11:21 AM
I quickly aligned the 4x fastqs to hs38DH reference (the hg38 that bwa-mem gets). Here is BamDeal stats on that resulting .bam:
#ID Length CoveredBase TotalDepth Coverage% MeanDepth
chr1 248956422 222880980 2206162418 89.53 8.86
chr2 242193529 235929417 2249759259 97.41 9.29
chr3 198295559 193989798 1852895798 97.83 9.34
chr4 190214555 185309784 1734450477 97.42 9.12
chr5 181538259 174526997 1653441972 96.14 9.11
chr6 170805979 161754999 1526610250 94.70 8.94
chr7 159345973 153090572 1453468478 96.07 9.12
chr8 145138636 138621281 1313583602 95.51 9.05
chr9 138394717 115174553 1105468086 83.22 7.99
chr10 133797422 130344787 1267614751 97.42 9.47
chr11 135086622 129790250 1254986279 96.08 9.29
chr12 133275309 129610829 1237321487 97.25 9.28
chr13 114364328 96220125 905181036 84.13 7.91
chr14 107043718 84760557 813175822 79.18 7.60
chr15 101991189 75874082 730543406 74.39 7.16
chr16 90338345 75199500 830540525 83.24 9.19
chr17 83257441 73268854 744342103 88.00 8.94
chr18 80373285 74821469 708073888 93.09 8.81
chr19 58617616 53306136 557490389 90.94 9.51
chr20 64444167 61122004 620767449 94.84 9.63
chr21 46709983 34366313 347607752 73.57 7.44
chr22 50818468 34682892 366449627 68.25 7.21
chrX 156040895 144361855 711462489 92.52 4.56
chrY 57227415 16709244 94770026 29.20 1.66
chrY_KI270740v1_random 37240 21220 81572 56.98 2.19
chrM 16569 16569 2585629 100.00 156.05

Didn't include those "chr1_KI270706v1_random" style of contigs to this listing. Any ideas what else to check or how to compare to 30x bam?
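One simple way to compare this against a 30x BAM is to reduce each BamDeal table to a couple of summary numbers. The sketch below assumes the whitespace-separated column layout shown above, where MeanDepth is TotalDepth divided by Length and Coverage% is CoveredBase divided by Length (this matches the chr1 row); the function names are made up and only four rows are included as sample input.

```python
AUTOSOMES = {f"chr{i}" for i in range(1, 23)}


def parse_stats(text):
    """Return (name, length, covered_bases, total_depth) tuples from the table."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, length, covered, total_depth = line.split()[:4]
        rows.append((name, int(length), int(covered), int(total_depth)))
    return rows


def summarize(rows, names=None):
    """Length-weighted coverage and mean depth over the selected contigs."""
    picked = [r for r in rows if names is None or r[0] in names]
    length = sum(r[1] for r in picked)
    covered = sum(r[2] for r in picked)
    depth = sum(r[3] for r in picked)
    return {
        "coverage_pct": 100.0 * covered / length,
        "mean_depth": depth / length,
    }


if __name__ == "__main__":
    sample = """#ID Length CoveredBase TotalDepth Coverage% MeanDepth
chr21 46709983 34366313 347607752 73.57 7.44
chr22 50818468 34682892 366449627 68.25 7.21
chrX 156040895 144361855 711462489 92.52 4.56
chrY 57227415 16709244 94770026 29.20 1.66"""
    rows = parse_stats(sample)
    print(summarize(rows, AUTOSOMES))  # autosomes present in the sample only
    print(summarize(rows, {"chrX"}))
    print(summarize(rows, {"chrY"}))
```

Running the same summary on the 30x BAM's table gives directly comparable autosomal, chrX and chrY figures. In the listing above, chrX sits at about half the autosomal depth, which is what one would expect from a male sample.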

teepean47
10-17-2019, 05:23 PM
I quickly aligned the 4x fastqs to hs38DH reference (the hg38 that bwa-mem gets). Here is BamDeal stats on that resulting .bam:
#ID Length CoveredBase TotalDepth Coverage% MeanDepth
chr1 248956422 222880980 2206162418 89.53 8.86
chr2 242193529 235929417 2249759259 97.41 9.29
chr3 198295559 193989798 1852895798 97.83 9.34
chr4 190214555 185309784 1734450477 97.42 9.12
chr5 181538259 174526997 1653441972 96.14 9.11
chr6 170805979 161754999 1526610250 94.70 8.94
chr7 159345973 153090572 1453468478 96.07 9.12
chr8 145138636 138621281 1313583602 95.51 9.05
chr9 138394717 115174553 1105468086 83.22 7.99
chr10 133797422 130344787 1267614751 97.42 9.47
chr11 135086622 129790250 1254986279 96.08 9.29
chr12 133275309 129610829 1237321487 97.25 9.28
chr13 114364328 96220125 905181036 84.13 7.91
chr14 107043718 84760557 813175822 79.18 7.60
chr15 101991189 75874082 730543406 74.39 7.16
chr16 90338345 75199500 830540525 83.24 9.19
chr17 83257441 73268854 744342103 88.00 8.94
chr18 80373285 74821469 708073888 93.09 8.81
chr19 58617616 53306136 557490389 90.94 9.51
chr20 64444167 61122004 620767449 94.84 9.63
chr21 46709983 34366313 347607752 73.57 7.44
chr22 50818468 34682892 366449627 68.25 7.21
chrX 156040895 144361855 711462489 92.52 4.56
chrY 57227415 16709244 94770026 29.20 1.66
chrY_KI270740v1_random 37240 21220 81572 56.98 2.19
chrM 16569 16569 2585629 100.00 156.05

Didn't include those "chr1_KI270706v1_random" style of contigs to this listing. Any ideas what else to check or how to compare to 30x bam?

You can send the Y-DNA portion of the BAM to James Kane and he will add it to the statistics page.

If James doesn't see this post you can use the form to contact him:

https://ydna-warehouse.org/contact.php

https://ydna-warehouse.org/statistics.html

ntk
10-19-2019, 11:49 AM
Here's my updated timeline:

Feb 16, 2019: Ordered kit ($299)
Feb 18, 2019: Shipping notification
Feb 25, 2019: Kit received by Dante/logistics partner in Draper UT
Mar 26, 2019: Message support to ask why kit messenger still says "Waiting Confirmation from Dante Labs".
Apr 13, 2019: Response from Dante: "I can confirm that your sample has been received in our labs. We are currently running the DNA extraction ..."
Jun 28, 2019: Dante switches to new kit manager, status is still "QC Completed"
Sep 2, 2019: Dante: "84% of all our samples received by the end of July have been sequenced ... We plan to finish the entire backlog of samples by mid-October." I email to clarify about my kit status: same-day response: "Your sample is also being sequenced as with all our samples."
Oct 3, 2019: Kit manager still shows "QC Completed." Ask for status, same-day response: "I have checked your kit and it is still being sequenced within your [sic] lab." Asked how this is possible. Same-day response: "We are in the process of releasing data to all our customers."
Oct 13, 2019: "We are now sequencing more than 300 samples per week ... We have not completed our backlog yet, but we are getting closer every day."
Oct 19, 2019: Kit status updated to "Sequencing Started"

With luck I could have some sort of VCF at least by end of month, though I'm not holding my breath. I'd actually be okay with that timeline for my order given the deal I got (assuming getting the BAM doesn't drag out forever, we shall see).

What I'm much less okay with is the misleading and opaque/canned statements from Dante, what appears to be damning evidence of planting fake reviews, and accounts of them dragging out promised refunds for months.

MacUalraig
10-20-2019, 03:11 PM
ntk, are we to assume then that your Feb sample was sent out to the Far East but is now being run in Italy? Or that it's still with their old supplier? If you're stuck on the old deal you may struggle especially to get the raw data (if that is important to you, e.g. for uploading to YFull etc.).

ntk
10-21-2019, 04:27 AM
ntk, are we to assume then that your Feb sample was sent out to the Far East but is now being run in Italy? Or that it's still with their old supplier? If you're stuck on the old deal you may struggle especially to get the raw data (if that is important to you, e.g. for uploading to YFull etc.).
Your guess is as good as mine. What do you think the odds are that if I asked I would get a straight answer?

MacUalraig
10-21-2019, 03:19 PM
Low??

MacUalraig
10-23-2019, 08:08 AM
Dante are now giving a 20% discount to ISOGG members to celebrate their sponsoring of the ISOGG Day Out in Dublin the other day - code is available in the isogg forum. There are a couple of pics of Andrea talking - appropriately seems to be inside a dark cellar. :biggrin1:

Petr
10-24-2019, 12:13 PM
2 of my 3 kits delivered to Dante on March 8th were marked as "Sequencing Started" yesterday. The remaining one is still in the "QC Is Complete" state.

tontsa
10-24-2019, 12:37 PM
My April 25 kit that had "90 day" guarantee is still in QC Is complete state.. prolly lost somewhere in south pacific sea.

ybmpark
10-29-2019, 02:47 AM
No BAM or FASTQ well after 2 weeks from Andrea's "clarification". No surprise. The amount of lies and deception they try looks even comical by now.

tontsa
10-29-2019, 05:42 AM
I think they mass-mailed the thing partly to the wrong people. They could only deliver them for those samples that have been sequenced at their Italian lab with Illumina. Did yours get sequenced more than 8 weeks ago but with only those reports?

Cofgene
10-29-2019, 10:08 AM
I'd have to look at when I did my order but my WGS results arrived 10 days ago. I have the BAM and FASTQ downloaded and made available for some 3rd party analysis. For a $300 fling it will be interesting to see how it fits in with my other sequencing results.

ybmpark
10-29-2019, 06:45 PM
I ordered mine during the 2018 Black Friday promo and the hard drive was ordered in early April. When I ordered the hard drive they promised to deliver within a month and possibly even within 2 weeks. From the way they talk in various forums it appears that they lost/discarded the full results and have only vcf files.

tontsa
10-30-2019, 05:51 AM
I ordered one kit also around that time.. I did get HDD though few months after the VCFs on the site. And still they only have VCFs on the site.

ybmpark
10-30-2019, 05:29 PM
Now they don't even reply to emails and inquiries. They are looking more and more like a typical sleazy operation.
And they have the audacity to advertise their new products on their Facebook.

It appears that the scam mostly targets US customers. No surprise there, as they are under more direct jurisdiction of the European Union.
I am looking at usa.gov/stop-scams-frauds to report the scam but I am not sure whether they understand that this is indeed a fraud.
I hope to mobilize some kind of class-action lawsuit, but my degrees are in science and I am not familiar with how these kinds of things go.
Curiously they stopped replying just after 6 months from the sale of "hard drive".
They might have waited so that they don't even lose 70 dollars from the fraudulent sale of the hard drive.
Very greedy.

ybmpark
10-31-2019, 09:04 PM
I finally got a reply after a third try.
Over the past year I have come to understand their language, and they now make it official that the raw data for some (which translates into most, probably US) samples were discarded by a "partner" company (probably bogus) and cannot be recovered.
Even though they don't say so directly, they do not want to resequence my sample (their exact wording is that there is "no ETA" or "promises", but I have learned what that means in some countries).
And they do not want to issue a refund (their exact wording is that my request was passed to the relevant department, but this is a joke).

Essentially they are just saying "tough luck, if you have time and money sue us".
I just cannot imagine that anyone will try to defend these fraudulent scammers. But I have seen a few in this very thread.
From the way they change their words it appears that they now are preparing for another BF special.
If the price is low enough there will be people who think they can try their luck.

I am going to email every journalist who wrote about Dante Promo as if it were legitimate.
At least they owe me a clarification and another article that shows the true nature of fraudulent Dante Labs.
I just hate to see them laughing all the way to the bank again.

karwiso
11-01-2019, 10:47 AM
I'd have to look at when I did my order but my WGS results arrived 10 days ago. I have the BAM and Fastq downloaded and made available for some 3rd party analysis. For a $300 fling it will be interesting to see how it fits in with my other sequencing results.

Have you requested access to FASTQ files? I got results of one WGS recently, but there are only one BAM and two VCF files (plus their indexes). BAM is still hg19/GRCh37.

MacUalraig
11-01-2019, 01:00 PM
I finally got a reply after a third try.
Over the past year I have come to understand their language and they now make it official that the raw data for some(which translates into most, probably US) samples were discarded by a "partner" company(probably bogus) and cannot be recovered.
Even though they don't say directly they do not want to resequence my sample(their exact wording is that "there is no ETA" or "promises". But I have learned what it means in some countries.).
And they do not want to issue a refund.(their exact wording is that my request was passed to the relevant department. But this is a joke.)

Essentially they are just saying "tough luck, if you have time and money sue us".
I just cannot imagine that anyone will try to defend these fraudulent scammers. But I have seen a few in this very thread.
From the way they change their words it appears that they now are preparing for another BF special.
If the price is low enough there will be people who think they can try their luck.

I am going to email every journalist who wrote about Dante Promo as if it were legitimate.
At least they owe me a clarification and another article that shows the true nature of fraudulent Dante Labs.
I just hate to see them laughing all the way to the bank again.

Post a redacted screenshot.

ybmpark
11-01-2019, 06:10 PM
Post a redacted screenshot.

What for?

MacUalraig
11-01-2019, 06:51 PM
What for?

So we can see exactly what they said as opposed to your paraphrasing of it.

ybmpark
11-01-2019, 08:49 PM
So we can see exactly what they said as opposed to your paraphrasing of it.
You don't seem to demand the same when a poster reports a positive experience with Dante.
Posting a PM is against the forum rule, the mods may insist the same for email messages.

MacUalraig
11-01-2019, 09:04 PM
they now make it official that the raw data for some(which translates into most, probably US) samples were discarded by a "partner" company(probably bogus) and cannot be recovered.

Well that to me is a major claim and I've not seen anyone else report it - I don't wade through all the noise on the fb group though to be fair. Is it 'official' on their website or from them posting it on social media? Or 'official' only in a private email to you? Why did it require you to 'come to understand their language'?

Some of us here are well known to each other from other forums and/or real DNA talks at shows. I've done public DNA talks for example. JamesKane and ChrisR are also well known to me. Some accounts have sprung up from nowhere to post good or bad stuff about Dante. If you're in the latter group and get all grumpy when asked to back up your claims, what do you expect?

Jan_Noack
11-01-2019, 09:30 PM
I have all my results. I still have to look at them though ... but they seem to be all there, and it took about 6 months from Dante receiving the kit in the US. Just got the last (the mtDNA) yesterday.

MacUalraig
11-01-2019, 09:36 PM
I have all my results. I still have to look at them though ... but they seem to be all there, and it took about 6 months from Dante receiving the kit in the US. Just got the last (the mtDNA) yesterday.

You mean reports or raw data? And if latter, aligned to what? pls.

sktibo
11-01-2019, 10:35 PM
Well that to me is a major claim and I've not seen anyone else report it - I don't wade through all the noise on the fb group though to be fair. Is it 'official' on their website or from them posting it on social media? Or 'official' only in a private email to you? Why did it require you to 'come to understand their language'?

Some of us here are well known to each other from other forums and/or real DNA talks at shows. I've done public DNA talks for example. JamesKane and ChrisR are also well known to me. Some accounts have sprung up from nowhere to post good or bad stuff about Dante. If you're in the latter group and get all grumpy when asked to back up your claims, what do you expect?

Personally, unless the user is trusted, if they offer no convincing evidence to their claims about a company or their experience / results, I automatically assume they're making up what they're saying, because even on this forum, a lot of people lie about results all the time. They make fake accounts, and sometimes go as far to modify screenshots or images. It's the internet for God's sake, and it's full of lunatics. I think it's insane how often people believe what is posted here without evidence of any kind.

ybmpark
11-02-2019, 12:41 AM
Well that to me is a major claim and I've not seen anyone else report it - ...

It is still wise not to make it known to Dante who is complaining to avoid being targeted as they have previously done.
But I will just quote that part
"
He said that for your sample, the raw data is unfortunately unavailable due to an error by a partner lab where we had previously sent some of our samples, yours among them."

"he" is one from "biometric department". But I often imagine it is all Andrea under different names though.

And now screenshot... who the hell you think you are?

Jan_Noack
11-02-2019, 01:17 AM
You mean reports or raw data? And if latter, aligned to what? pls.

Ah, I had only just read the email.. My apologies. It looks like another lot of FASTQ files (FASTQ_1 and FASTQ_2). I already had FASTQs (2*4 files) and now I have 2 new FASTQ files; the first one, which I am downloading now, is 25GB?


So looks like I received concatenated FASTQ files instead? I expected just a short file with the raw data, like in FTDNA, ie a 17KB FASTA file?

I guess I can get them extracted at FGC or somewhere or one day, get a new computer, download the software then extract it all myself. When I placed my order with the discount code, the price of the Dante test at the time, was not much more than the FTDNA mtDNA test so I thought why not? I already had the BigY500 test for this person.

Have you tried or do you know if anyone has merged the FTDNA BigY results with the Y from the WGS? I'm wondering if FGC or YSEQ could or would do this if it would produce any benefit from the greater no of reads in the BigY?.

Donwulff
11-02-2019, 03:12 AM
This was reported before, on https://anthrogenica.com/showthread.php?12075-Dante-Labs-(WGS)&p=609442&viewfull=1#post609442 I wrote "On the "another board where they have an average 3 star rating" Dante Labs looks to be responding to negative reviews, where I saw them claiming they did not receive raw data from said outside lab, and are therefore sequencing those samples again at their own cost. I'm not certain whether to believe that either (How come they were delivering raw data, just slowly, and this is business-to-business, they should be able to sue the lab for breach of contract)."

Unfortunately, I just went back trying to find that official reply from Dante Labs to screenshot it for discussion, however I could no longer find it, which is... eh, suspicious, though there are several possible reasons for that, including that I might just have missed it. While I do not want to forward any conspiracy theories, I do remain suspicious of the explanation given, because many of the long read sequences were delayed by months despite being sequenced locally, and because there are many, many reports of people receiving their BAM/FASTQ but just delayed. This, combined with them favoring hard-drive delivery over online download, strongly suggests they're having a bandwidth problem of some sort, which wouldn't be surprising considering the amount of data whole genome sequencing generates. Of course, it's possible that some raw sequence data has also been lost, perhaps because they couldn't download in time, although medical regulations seem to require that a copy of all human sequences be archived for possible later audits.

I still don't get what people are getting by attacking fellow Dante Labs customers & site users, and why the site-admins are letting that continue unabated.

Petr
11-02-2019, 09:04 AM
Summary of my orders of WGS 30x test:

2 tests ordered in October/November 2017: After several e-mails I have received both FASTQ and BAM (hg19) files on HDD in July 2018. The coverage was 30x and files 100 GB in size. Read length 100 bp.

5 tests ordered on 20-Nov-2018, 4 VCF results received on 02-Apr-2019, one on 07-Jun-2019. The coverage was 8x, 10x, 15x, 20x, 20x. I complained against the low coverage and Dante latest reply is:
I will be escalating to the relevant team for further assistance. We will reach out with an update as soon as we have more information.
In June 2019, FASTQ files appeared for download, the size of one FASTQ file is between 19 and 25 GB only. Read length 150 bp. No BAM file is available.

On 04-Apr-2019 I ordered 2 HDDs with FASTQ and BAM data (they told me that one HDD will take raw data of 2 kits). Never received, I asked for refund, never received, many promises, the latest:
I have seen this request for cancellation and refund. I apologize for the timeframe you have been waiting for. Once it has been handled by our internal team, we will reach out to you with more details.

Thanks for your patience. Apologies for any inconvenience caused.

3 tests ordered on 17-Feb-2019, 2 of them were marked as "Sequencing Started" on 23-Oct-2019, one is still in "QC Is Complete" state, for 6 months already. Dante latest reply:
We have a backlog of samples right now, but your sample will be sequenced in the next weeks. You will be notified by email when we have a more precise timeline for your results.

1 Whole GenomeL test ordered on 15-Apr-2019, only the FASTQ file received. Dante latest reply from October 30th:
Thank you for letting us know. I have submitted your query to our bioinformatics team who will look into uploading the rest of your raw data files for you.

I cannot give you an exact estimate on time, but I have submitted the request to them now. You should receive VCF, BAM, and a second FASTQ file.

Rather frustrating.

rippleish20
11-02-2019, 05:42 PM
I have no idea what's going on behind the scenes at Dante, but I have had a questionable experience. I ordered a WGS 30x in late Nov 2018. At the time they were insisting you order a HD also if you wanted a BAM file, even though their advertisements were for the results, including raw data, for $199 inclusive. I got VCF files in May but no mention of the BAM file. I have contacted them six times over the last six months. In three cases, I received useless or misleading excuses. I filed a BBB complaint at the end of August, where Dante had an F rating. In the other three cases, I received no reply at all. The complaint was closed by them with a reply of "your business is important to us, contact us if you have questions". I submitted the complaint because I *had* contacted them but it got no results.

Two weeks ago, Andrea sent me (and others) an email with a subject of "Clarification about Hard Drives

Hi-

Thanks for trusting our services. We are happy to introduce you to a community of thousands of people who have received their whole genome sequenced and liked to share their experience!

We hope you are enjoying your Results. You will receive your BAM and FASTQ files within two weeks. Since we will upload these files on the cloud, you don’t need to buy a hard drive, unless you want one.

Thanks,

Dante Labs Team

"

More than two weeks later, I have yet to see my BAM file. Note that he also specifically said you do not need to have ordered the HD.

If you look in the Dante Customer group on FB, you will find many people in the same boat as me, as well as some trying very hard to defend Dante. Andrea responded to one of my comments and contacted me via PM, but then responded to my response. They also recently advertised a special where you could get the 30x test with results in two weeks.

I have to seriously question the company's behavior and it would be interesting to know overall what percentage of testers have actually gotten their full results. It's very hard to tell given comments both pro and con.

MacUalraig
11-02-2019, 06:31 PM
As has been discussed, that message about the HD was a generic thing they sent to everyone so it may not have any connection with your particular kit status.

ceh13
11-03-2019, 06:33 AM
I have no idea what's going on behind the scene at Dante, but I have had a questionable experience. I ordered a WGS 30x in late Nov 2018. At the time they were insisting you order a HD also if you wanted a BAM file, even though their advertisements were for the results, including Raw data, for $199 inclusive.

I utilised their Nov 2018 offer too and they made all files downloadable without request. Seems to be a weird lottery. The files are on the smaller side, as some have reported here. The depths in the VCF file are 18x-19x for the chromosomes and 3600x for the mitochondria.

EDIT
The depths calculated by Qualimap from the BAM are better than the ones in the VCF file header.
chr1...chr22, chrX, chrY
25,26,25,25,25,25,26,26,22,30,26,25,21,21,21,27,27,26,27,25,21,18,13,10

Donwulff
11-03-2019, 06:59 AM
I utilised their Nov 2018 offer too and they made all files downloadable without request. Seems to be a weird lottery. The files are on the smaller side, as some have reported here. The depths in the VCF file are 18x-19x for the chromosomes and 3600x for the mitochondria.

That's interesting though. As long as they were using an outside lab for processing, I would have thought that it'd be impossible for them to deliver data that's below agreed-upon quality, without Dante Labs being able to demand to fix it. Same with the BAM files; there's no way they don't have a contract with the sequencing lab, and they should be able to keep that lab to the contract without huge cost to themselves. I know it's a struggle, contractors always try to slip from the agreed-upon quality, but I can't imagine their contract with the lab could have said 2/3rds read depth and no BAM is acceptable. Producing files for online download might mean this could have been a special run as well.

Edit: One possibility is if there's HIGH bacterial (or any other contaminant) content in the sample for whatever reason, of course. In that case most of the raw sequence would be metagenomic vs. human, which is likely main reason you would fall below normal yield with something as tried as short-read sequencing, and it's reasonable that might not violate their sequencing contract which is probably based on raw basepairs with the sequencing lab. "Hey, that's what you sent us for sequencing".

Mostly musing as I'm trying to figure this out myself... The BAM files on Harvard Personal Genomes Project for example are very good quality, and obviously they had BAMs available. There's no way you'll get a VCF file without having the FASTQ/BAM (or the sequencer-proprietary source format at least), so this whole "no BAM available" spectacle is ridiculously contrived. Requiring hard-drive shipping while they were promising download has been a staple though, but in 2017 they didn't charge for that. Worth noting that FTDNA just switched to requiring a $100 charge for the BAM data and their files are way smaller, though I think in FTDNA's case the reason is mostly trying to deny YFull competition. (Computational processing is now down to "$5 whole genome" pipelines, and BigY is less than 1% of whole genome, so storage and transfer has to be negligible compared to their other data now).

MacUalraig
11-03-2019, 12:46 PM
Did anyone hear if Roberta Estes got her data and commented on it? Not heard a thing from her since last year apart from the occasional remark in the comments section:

https://dna-explained.com/2018/11/19/whole-genome-sequencing-is-it-ready-for-prime-time/#comments

teepean47
11-04-2019, 11:59 AM
I got my 8 week 4X results. They received my DNA 28th of September and the results were ready today.

EDIT: Do I have to request fastq separately as I found only vcf and tbi files.

teepean47
11-04-2019, 12:02 PM
Yesterday I got fastqs for the 4x 2 week order. So they actually delivered them on the intro order as well, even though they don't mention them, at least on the European product page.


Did you have to request fastqs separately?

MacUalraig
11-04-2019, 01:43 PM
I got my 8 week 4X results. They received my DNA 28th of September and the results were ready today.

EDIT: Do I have to request fastq separately as I found only vcf and tbi files.

If you get a BAM do have a go with YFull, even though they say 'depth of coverage min 15X' they processed a FGC 4x* for one of my project members. Plus my worst case YSEQ test came in at 6x mean and 4x median and they processed that fine.

*That's WGS 4x depth, YF were specifying Y depth.

bjp
11-04-2019, 04:59 PM
Wow, I see this thread has become even more contentious, and I hope this serves as a lesson to other companies in, and soon to be in, this space, that promises you cannot keep may bring in some initial revenue but will leave a set of angry, highly-technical customers that will spread their displeasure far and wide, and even if that initial revenue gets you up and running at the pace you want,

Just to add something, maybe relevant, maybe not, to the question of whether or not some customers' raw data was lost - I (and the person whose DNA I had tested) strongly suspected that was the case based on the delay between ordering a hard drive and receiving the raw data (plus the delay after receiving VCF results before they told me I had to order the drive instead of just receiving downloadable data as promised). I know that datestamps on files do not prove anything, and can easily be forged by anyone with the slightest technical skill. But: I received VCF results for this kit in February, 6 months after returning the sample, ordered the hard drive in April, and received the hard drive in September. My VCF files have a datestamp of February 20, 2019. My BAM file has a datestamp of Jan 23, 2019. Assuming no messing around with file datestamps, this would tend to indicate that the BAM data I received was indeed the original BAM data used to generate the VCF files, and that my sample was not re-sequenced by Dante after I ordered the HD and badgered them for months about receiving it.

My order may or may not be typical. But I had a 6 month wait for VCFs, which eventually arrived, and then another 6 month wait for BAM/FASTQ, which eventually arrived. I have a feeling that the partner labs were offering a high-(VCF)-throughput deal to Dante with raw data sent to offline storage that was not easily accessible, and that Dante was in a position of hoping most users would not be interested in full raw data, and that the process for retrieving raw data from the partner lab was at the partner lab's leisure and not any kind of a high priority in any way that would detract from CPU or I/O resources needed to continue delivering 'rapid' (ahem) VCF turnaround.

I have no evidence for any of this, so take it for what you will. I was an upset customer for a while, now I am disappointed but satisfied in having received delivery, despite the constant runaround, false updates, and general clown show environment in dealing with Dante.

karwiso
11-04-2019, 05:29 PM
Did you have to request fastqs separately?

I got my FASTQs also. I got BAM last week and after a week FASTQ files appeared in the reports. I didn't need to request them. :thumb:

aaronbee2010
11-04-2019, 05:38 PM
Did you have to request fastqs separately?

From what I've seen, they usually deliver the FASTQ files after the BAM + VCF files.

mdn
11-05-2019, 05:09 AM
I received the "Intro 4x in 2 weeks (for the same price as 8 weeks)" results yesterday. It really was 2 weeks.
There was no real progress shown on their status page: it went straight from "Kit Received" to "Finished" (21st October - delivered to them, status not changed; 25th October - changed to "Kit Received"; 4th November - "Finished", so a real 2 weeks).
So I have some VCFs and am trying to extract the mtDNA from them with some tools.

The VCFs are generated against the hs37d5.fa reference. I already know that I have to use both VCFs: I can generate MT from either one alone, but the result will be invalid. :)

teepean47
11-05-2019, 07:44 AM
I received the "Intro 4x in 2 weeks (for the same price as 8 weeks)" results yesterday. It really was 2 weeks.
There was no real progress shown on their status page: it went straight from "Kit Received" to "Finished" (21st October - delivered to them, status not changed; 25th October - changed to "Kit Received"; 4th November - "Finished", so a real 2 weeks).
So I have some VCFs and am trying to extract the mtDNA from them with some tools.

The VCFs are generated against the hs37d5.fa reference. I already know that I have to use both VCFs: I can generate MT from either one alone, but the result will be invalid. :)

I filtered MT from the VCF and added the original header. Haplogrep gave the result H10g which is correct.
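
For anyone wanting to reproduce that filtering step, here is a minimal Python sketch. It is not necessarily how it was done above; it assumes an uncompressed, tab-separated VCF and that the mitochondrial contig is named "MT" (as in hs37d5) or "chrM". The function name filter_vcf_contig and the example records are made up for illustration.

```python
def filter_vcf_contig(lines, contigs=("MT", "chrM")):
    """Yield all header lines plus the records whose CHROM is in `contigs`."""
    wanted = set(contigs)
    for line in lines:
        if line.startswith("#"):
            # Header lines (## meta-information and the #CHROM column line) are kept as-is.
            yield line
        elif line.split("\t", 1)[0] in wanted:
            yield line


# Tiny made-up example; real input would come from the Dante VCF.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10177\t.\tA\tAC\t50\tPASS\t.\n",
    "MT\t263\t.\tA\tG\t99\tPASS\t.\n",
    "MT\t750\t.\tA\tG\t99\tPASS\t.\n",
]
mt_only = list(filter_vcf_contig(example))
print("".join(mt_only), end="")
```

For a .vcf.gz the same generator should work on a handle opened with gzip.open(path, "rt"). The output keeps the original header, which is what Haplogrep was given above.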

mdn
11-05-2019, 12:38 PM
I filtered MT from the VCF and added the original header. Haplogrep gave the result H10g which is correct.
Thank you, it says H1e. I just do not know the 'proper' haplogroup as the data is for my wife.
OK, maybe I will wait for the BAM or FASTQ and check with other tools.

mdn
11-08-2019, 08:04 AM
So some statistics on Intro 4x.

Monday noon - initial result with VCF (SNP and Indel).
Tuesday evening - 2 additional VCFs (CNV + something else).
Wednesday evening - FASTQ R1 and R2 (20 GB each)
Thursday evening - BAM (20 GB)

BamDeal stats (it is a girl, so some strange Y with 0.64% coverage and 0.05 mean depth is removed):
#ID Length CoveredBase TotalDepth Coverage% MeanDepth
1 249250621 221955859 2241274580 89.05 8.99
2 243199373 236104856 2291519659 97.08 9.42
3 198022430 194221826 1859468480 98.08 9.39
4 191154276 186638374 1728301243 97.64 9.04
5 180915260 175745684 1673761356 97.14 9.25
6 171115067 166756582 1597156474 97.45 9.33
7 159138663 153434835 1487997908 96.42 9.35
8 146364022 141330764 1369367867 96.56 9.36
9 141213431 114186762 1119817234 80.86 7.93
10 135534747 129253716 1290690007 95.37 9.52
11 135006516 130412836 1307149460 96.60 9.68
12 133851895 129858562 1268523973 97.02 9.48
13 115169878 95300694 888581322 82.75 7.72
14 107349540 87843774 860996242 81.83 8.02
15 102531392 79996070 809312638 78.02 7.89
16 90354753 77335729 873209365 85.59 9.66
17 81195210 77010661 831590323 94.85 10.24
18 78077248 74460303 718042436 95.37 9.20
19 59128983 55306480 621690764 93.54 10.51
20 63025520 59238733 628993885 93.99 9.98
21 48129895 34644781 353564696 71.98 7.35
22 51304566 34098755 387299468 66.46 7.55
X 155270560 148155109 1396235891 95.42 8.99
MT 16569 16569 3921325 100.00 236.67
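
A rough way to turn a table like this into one genome-wide figure is to divide the summed TotalDepth by the summed Length, i.e. a length-weighted mean. The sketch below assumes the whitespace-separated column layout shown above; the function name weighted_mean_depth is made up, and only three of the rows are pasted in to keep it short.

```python
def weighted_mean_depth(table_text, skip=("MT",)):
    """Length-weighted mean depth: sum(TotalDepth) / sum(Length) over the kept rows."""
    total_length = 0
    total_depth = 0
    for row in table_text.strip().splitlines():
        if row.startswith("#"):
            continue  # column header line
        chrom, length, _covered, depth, *_rest = row.split()
        if chrom in skip:
            continue  # MT depth is far higher and would distort the nuclear figure
        total_length += int(length)
        total_depth += int(depth)
    return total_depth / total_length


# Three rows copied from the table above; paste the full table for the real figure.
sample = """#ID Length CoveredBase TotalDepth Coverage% MeanDepth
1 249250621 221955859 2241274580 89.05 8.99
21 48129895 34644781 353564696 71.98 7.35
MT 16569 16569 3921325 100.00 236.67"""

depth = weighted_mean_depth(sample)
print(f"{depth:.2f}x")
```

Note that MeanDepth in the table is TotalDepth divided by the full contig Length, not by CoveredBase, so uncovered regions pull the per-chromosome figure down.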

mtDNA Server stats for MT BAM done by WGS Extractor.
===
Statistics:
Overall Reads: 176,455
Filtered Reads: 153,004
Passed Reads: 23,451
Passed FWD Reads: 1,690,116
Passed REV Reads: 1,687,354

Mapping Quality OK: 25,864
Mapping Quality BAD: 1,101
Unmapped Reads: 0
Wrong Reference in BAM: 149,490

Base Read Quality OK: 3,377,470
Base Read Quality BAD: 113,647

Bad Alignment: 15
Duplicates: 2,398
Short Reads (<25 bp): 0
===

Donwulff
11-08-2019, 10:39 AM
That's big for 4X, definitely strong contender against DNA microarrays if they can keep the quantity (and quality) up. Especially if it could be combined with imputation. Actually I think we can already do that fairly well, drop variant calls below certain quality, run it through imputation and then set discrepant imputed variants as no call. (There's PROBABLY an imputation algorithm that can take variant quality into account and do error-correction on the fly, but I haven't seen those in common use)

Bit unclear on the read length etc. from those stats though, (3,377,470+113,647)/(25,864+1,101) gives me 129.5 bp, which might mean 160bp reads with 30 bp overlap and/or trim. I have no idea how the mtDNA stats say "overall reads 176,455" and "passed fwd reads" 1,690,116 though. Is the overall reads for whole genome, and passed FWD/REV reads actually bases? Passed reads is probably read templates on chrM, and mapping quality is over alternative mappings as well with "wrong reference" being non-chrM reads including alt-mappings. There's no stat for bases in primary mappings only though which prevents calculating real read length.
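
The read-length estimate above is just total bases over total reads. A small Python check, assuming (as guessed above, not confirmed by the mtDNA Server output) that the "Base Read Quality" lines count bases and the "Mapping Quality" lines count reads:

```python
# Figures copied from the mtDNA Server statistics posted above.
bases_ok = 3_377_470    # "Base Read Quality OK"
bases_bad = 113_647     # "Base Read Quality BAD"
reads_ok = 25_864       # "Mapping Quality OK"
reads_bad = 1_101       # "Mapping Quality BAD"

# Assumption: the first pair are base counts and the second pair are read counts.
mean_read_length = (bases_ok + bases_bad) / (reads_ok + reads_bad)
print(f"{mean_read_length:.1f} bp")
```

If those counters mean something else, the figure is meaningless, which is why a primary-mappings-only base count would be needed for the real read length.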

mdn
11-08-2019, 11:01 AM
I have no idea how the mtDNA stats say "overall reads 176,455"
Sorry, WGS Extract produced a "chrY and chrM" BAM, so it contains some junk I can't identify (Y with 0.64% coverage and 0.05 mean depth). That might be the reason.

teepean47
11-08-2019, 04:06 PM
I created an autosomal kit from Dante 4X with WGSExtract and compared it to my Ancestry and FTDNA kits. The results were quite encouraging if one wants to create autosomal kits: Ancestry vs Dante 4X has around 1600 different SNPs out of 900000. With FTDNA the difference is around 300 out of 680000.
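
Expressed as percentages, those counts work out as in the sketch below. It takes the approximate numbers quoted above at face value and assumes every SNP in the total was actually compared (no-calls excluded); the dictionary labels are just for illustration.

```python
def concordance_pct(mismatches, compared):
    """Percentage of compared SNPs on which the two kits agree."""
    return 100.0 * (1 - mismatches / compared)


# Approximate (mismatches, compared) counts as quoted in the post above.
comparisons = {
    "Ancestry vs Dante 4X": (1600, 900_000),
    "FTDNA vs Dante 4X": (300, 680_000),
}
results = {label: concordance_pct(m, n) for label, (m, n) in comparisons.items()}
for label, pct in results.items():
    print(f"{label}: {pct:.3f}% concordant")
```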

Donwulff
11-09-2019, 04:09 AM
Studies like https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6370902/ have shown empirically that concordance drops sharply below about 10X read depth (Average!). 8X-9X is right near the sweet spot; if it actually were the 4X that's promised, it would be really poor even for the sites with coverage. Consequently, the concordance depends heavily on the exact read depth they hit, so YMMV, literally.

In fact though, your figures seem like they might be slightly higher concordance than in the study (especially since in the study they filtered suspicious calls from the microarray). One caution is in the study they used fixed confidence level for the sequence, and didn't really test the effect of different confidence levels. I suspect that using the new GATK Best Practices recommended convolutional neural network filter and higher required confidence the concordance could be pushed even higher. Perhaps I should try sub-sampling down a sequence and testing it out.

Of course, the majority of the error is driven by heterozygous variants, where all the sequenced reads happen to hit the copy from the same parent (same chromosome from a pair), which will statistically happen a lot over a large genome like the human genome, and is hard to detect. The probability for a second read to hit the same copy is 50%, for three reads it's 25%, and you need 8 reads to push it below 1% (0.78%) of variants that look homozygous. That filter would give you almost 50% no-calls though. This is also the main reason why rare variants will have more errors, because they're almost always heterozygous.
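
The 50% / 25% / 0.78% figures follow from a simple model: at a truly heterozygous site each read comes from either parental copy with probability 1/2, independently, so the chance that all n reads land on whichever copy the first read hit is 0.5^(n-1). A minimal sketch under that assumption (it ignores sequencing error and mapping bias; the function name is made up):

```python
def p_het_looks_homozygous(n_reads):
    """Chance that all n reads at a heterozygous site come from the same parental copy."""
    if n_reads < 1:
        raise ValueError("need at least one read")
    # The first read picks a copy; each later read matches it with probability 1/2.
    return 0.5 ** (n_reads - 1)


for n in (2, 3, 8):
    print(f"{n} reads: {100 * p_het_looks_homozygous(n):.2f}%")
```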

In general, you should be able to trust variants which appear heterozygous (ie. one copy out of two) in the sequence, but there's a chance of missing some variants entirely, or reading them as homozygous when in fact heterozygous. Filtering variants called as homozygous at read-depth below 14 should have very high concordance according to that paper. A lot of clinical reports though just say "variant exists" which is playing it safe, without specifying whether it's hetero or homozygous, or that there isn't a variant. VCF files need an extension that says the same...

MacUalraig
11-09-2019, 12:41 PM
Dante to sequence 100k Mediterraneans!?

"“We will sequence the genomes of 100,000 people from Africa, the Middle East, and Europe in the next 3 years”, says Dante Labs CEO Andrea Riposati."

"Early partners will be announced in the next few weeks. Universities, hospitals, research centres and biobanks are invited to apply at the Ulysses Genome Project website."

https://www.dantelabs.com/blogs/news/dante-labs-announces-ulysses-the-new-100-000-mediterranean-inclusive-reference-genome-project

Can't find such a genome project website so far mind.

MacUalraig
11-09-2019, 12:53 PM
Even better TODAY there is a Dante webinar at which they will reveal amongst other things...

drumroll...

"the truth behind the delays"!!

https://www.dantelabs.com/pages/webinar-inside-dante-labs

mdn
11-09-2019, 01:13 PM
There might be new queues from tomorrow: YFull and MyTrueAncestry have announced big discounts (for Dante Labs).
A quote from the YFull announcement:
"Dear friends, a couple of news for the community.

1) Transfer the BAM file from Dante Labs to YFull will be more convenient and easier

Dante Labs company plans in December 2019 for all its customers (for those who have received their results earlier and for future users) to add the ability to request for free charge the generation of a BAM file (in a special widget on the website), which will include only Y and/or MT Chromosomes. The user will be able to download this BAM file (Y+MT) to your computer or send a direct link in the order form on the YFull website. This option is preferable. We hope that after this the transfer of the BAM file for analysis will be simple and accessible to everyone!

2) Special extra discount for YFull users from Dante Labs dedicated to "China's Singles Day"

You can make a discounted order from 11pm CET (Europe) or 5pm eastern time (USA) on Sunday, November 10, 2019 on until Monday, November 11, 2019.

The price for the WGS (Whole Genome Sequencing) 30X on the Illumina NovaSeq 6000 will be $199.

For YFull users with the "YFULL10" code extra 10% discount, and the price for the WGS (Whole Genome Sequencing) 30X on Illumina NovaSeq 6000 will be only $180!

Discount Code: YFULL10

https://www.dantelabs.com/products/whole-genome-sequencing"

MacUalraig
11-09-2019, 01:29 PM
The YFull linkup is way overdue, we've been asking for it since day one. If they actually deliver the BAMs and the link this could be a game changer. But I've said that before and was wrong...

pmokeefe
11-09-2019, 02:18 PM
There was supposed to be a link up between sequencing.com and Dante Labs. I signed up for that and received this email from Dante Labs.

Dante-Labs [email protected] via freshdesk.com
Wed, Jul 17, 1:20 PM
to notification, me


Hello,

Thank you for your message

We are going to import your raw data to your sequencing.com account within the next 48 hours.

Please let me know if there is anything else we may assist you with.

Kind regards,

Lenox
I just checked sequencing.com, I could find no evidence that the import ever occurred. At least there was no charge. Did anyone else ever successfully import their Dante Labs data directly into sequencing.com?

MacUalraig
11-09-2019, 03:58 PM
Webinar registration link up now running 4-5pm GMT

https://register.gotowebinar.com/register/7223150781977895435

MacUalraig
11-09-2019, 04:25 PM
My BAM file is uploaded!! I just wanted to check before I complained live on air and there it was, must have gone up in the last 24h to head off the publicity.

pmokeefe
11-09-2019, 04:40 PM
Did anyone else watch the webinar with Andrea Riposati just now (8am PST)? He acknowledged several customer issues (delays, raw data, etc). Did anyone feel like their issue was not even mentioned, or poorly addressed?

Also, thanks to MacUalraig for the heads up, I would have missed it otherwise.

pmokeefe
11-09-2019, 04:57 PM
My BAM file is uploaded!! I just wanted to check before I complained live on air and there it was, must have gone up in the last 24h to head off the publicity.

When did you place your order for that kit?

MacUalraig
11-09-2019, 05:07 PM
When did you place your order for that kit?

Sample kit signed for back at Dante Jul 25th 2018, VCFs uploaded Mar 5th 2019. Hard disk fee paid following day.

rippleish20
11-09-2019, 05:10 PM
I would also thank MacUalraig for the notice, as I would not have known about it otherwise, although I assume they will post a link to the recording.

His explanation would cover the majority of complaints I have seen. I like the explanation, even if it is late in coming and downplays the shenanigans. I apparently am one of the "50 or so" that are being sent a new kit since they apparently lost my data along the way. I received VCF files but nothing else. I guess I am hopeful they really are going to fulfill all issues (and prioritize that), but we will have to see if that pans out. I have basically gotten lots of excuses in the past without any actual results.

The email I got from them was a little lacking on an apology. They indicated they would send me a new kit and sequence a new sample as a "gift" from them, yet I paid for those results, including a hard drive, a year ago. Calling it a gift is humorous.

MacUalraig
11-09-2019, 08:27 PM
OK, quick peek and it's hg19. Y now split off and on its way to YFull*. Will download the FASTQs tomorrow and align them.

Interestingly all the FGC novel variants that were missing from the VCF are present and I'd say callable in the BAM. We'll see what some alternate callers say.

*on second thoughts maybe I'll upload it after I realign it.
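
For anyone wanting to do the same "quick peek" at the reference build, one common trick is to look at the chromosome 1 length in the BAM header: 249250621 for hg19/GRCh37 and 248956422 for hg38/GRCh38. A minimal sketch, assuming the header text has already been dumped with something like `samtools view -H`; the function name `guess_build` is just made up for illustration, and this is not necessarily how the check was done above:

```python
# Known chromosome 1 lengths for the two common human reference builds.
CHR1_LENGTHS = {
    249250621: "hg19/GRCh37",
    248956422: "hg38/GRCh38",
}

def guess_build(header_text):
    """Guess the reference build from SAM header text (hypothetical helper)."""
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # only sequence dictionary lines carry SN/LN
        # Turn "SN:chr1<TAB>LN:249250621" into {"SN": "chr1", "LN": "249250621"}
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if fields.get("SN") in ("1", "chr1"):
            return CHR1_LENGTHS.get(int(fields.get("LN", "0")), "unknown")
    return "unknown"

# Made-up two-line header, for illustration only.
example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:249250621\n"
print(guess_build(example))
```

It only looks at chromosome 1, so it won't tell apart the different hg19 flavours (with or without decoys, "chr" prefixes and so on).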

Petr
11-09-2019, 08:39 PM
Just received refund for 2 HDDs ordered on April 4th and never sent.

Donwulff
11-10-2019, 10:34 AM
Ulysses, 100,000 whole genomes, that's 130+ whole genomes per working day for those three years. "Sequencing will be performed by Dante Labs in its high-output sequencing centre in Italy." Did they announce the capacity of their new sequencing center, because uh... On the flip side, they do need to keep that sequencing center busy to cover the fixed costs, so this makes sense if, as it seems, they have funding and perhaps can fulfill the requisite capacity. One might also do well to look at the Human Genome Diversity Project and the ensuing political/ethical fallout, which led among other things to the stipulation that genetic research should be as far as possible conducted in the region of interest so the benefits flow to them. If their partners, perhaps in the region, are willing to head off that fallout though... but it seems a bit over-ambitious, especially since I'm not sure they've even started their oral microbiome sequencing project which was announced earlier.
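
The per-day figure can be checked with a quick back-of-envelope calculation. A minimal sketch, assuming roughly 250 working days a year (that number is an assumption for illustration, not something from the press release):

```python
def genomes_per_working_day(total_genomes, years, working_days_per_year=250):
    """Average number of genomes that must be sequenced each working day.

    working_days_per_year=250 is an assumed round figure.
    """
    return total_genomes / (years * working_days_per_year)

rate = genomes_per_working_day(100_000, 3)
print(f"{rate:.0f} genomes per working day")
```

Running around the clock, 365 days a year, would bring that down to about 91 per day, which is still a lot of throughput to sustain for three years.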

Also I just noticed their press-release says "leading DNA sequencing company"...

Is the webinar still available anywhere? I missed it myself and nobody is reporting what they actually said.