
How much cheaper will whole genome sequencing (WGS) become?



Plantbasedhuman
07-28-2020, 10:30 AM
I'm considering purchasing a WGS test; however, I'm willing to wait if prices are likely to drop significantly and the technology is going to improve a lot in the next year or so. I see Nebula is currently offering 30x Whole Genome Sequencing for $299 USD, which is less than half the price of Dante Labs. Is that as good as it will get for the foreseeable future in terms of price? Also, is it likely that the accuracy of testing will improve over time? For example, could companies eventually offer 60x coverage instead of 30x, making the current results obsolete? Also, would a Big Y-700 at FTDNA have more detail than the Y chromosome data from a 30x WGS? I'm hoping that commercial companies like Ancestry, MyHeritage, 23andMe & FTDNA will eventually offer upgrades to WGS, as they already have the saliva samples on hand, but maybe that is wishful thinking?

Ibericus
07-28-2020, 11:24 AM
Welcome! BGI, the company that sequences for Nebula, has said that they are able to sequence a full genome for $100. However, when you add the costs of shipping, customer and IT services, etc., you may well end up close to $200. It's probably not going to go much lower than that anytime soon, so I don't think it's worth waiting several years just to save $100, but that's just my opinion.

Plantbasedhuman
07-28-2020, 11:38 AM
Thanks for the reply @Ibericus. I think the biggest advances will be in consumer-friendly, non-technical interpretation of the data and a standardised API for sharing with doctors etc. Another consideration is how future-proof the data is: if the file format we receive now remains compatible for years to come, then the cost is probably worth the investment. I'm leaning towards the Dante Labs Gen Z test since it has 120x coverage, so it might be more accurate.

Dave-V
07-28-2020, 01:05 PM
The next “quantum leap” that we should see in DNA testing is an eventual shift from the current short-read NGS/WGS testing to long-read technologies like nanopore or SMRT, which will be a big deal especially for the Y chromosome, as it should open up more of it for mutation discovery. Nanopore especially has lower equipment costs, but the error rates (as I understand it) are still too high for the consumer market. Either would require retesting us all from original DNA samples, or from new samples if available.

Don’t expect a major change on that front for at least the next couple of years though (at a guess), so if you’re waiting to see if things change soon, that’s not a factor. But even if companies don’t all jump on that bandwagon as soon as it becomes viable for genetic genealogy, you can expect it would still drive the cost of “traditional” NGS/WGS testing down just from market competition.

Just my personal opinion, but I would not expect significant changes in WGS read depth (the 30x/60x/120x/etc.). It’s possible of course, but we’ve pretty much reached the 23.6M base pair limit of the Y chromosome that NGS/WGS testing can usefully report (even a 15x WGS test covers most of it), so any additional read depth will just increase the reliability of the SNPs found. That gives a little boost in effective SNP numbers but not significant new discovery. I would agree generally that the higher the read depth, the better the test, so there’s nothing wrong with selecting a test with higher read depth if it’s within your price range.
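To make the diminishing-returns point concrete, here is a rough Python sketch. It assumes reads land uniformly at random, so per-position depth follows a Poisson distribution (the idealized Lander-Waterman picture), and that the single-copy male Y chromosome gets about half the headline genome depth. Real chrY coverage is much patchier because of repeats and mappability, and the `min_reads` cutoffs are hypothetical, not any vendor's actual calling rule, so treat the output as illustrative only.

```python
import math


def fraction_at_least(mean_depth: float, min_reads: int) -> float:
    """Fraction of positions with >= min_reads reads if depth ~ Poisson(mean_depth)."""
    # P(depth < min_reads) is the sum of Poisson probabilities for 0..min_reads-1.
    below = sum(
        math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
        for k in range(min_reads)
    )
    return 1.0 - below


for genome_depth in (15, 30, 60, 120):
    # Assumption: single-copy chrY gets roughly half the headline (diploid) depth.
    y_depth = genome_depth / 2
    print(
        f"{genome_depth:>3}x genome, ~{y_depth:.1f}x chrY: "
        f"P(>=3 reads) = {fraction_at_least(y_depth, 3):.4f}, "
        f"P(>=10 reads) = {fraction_at_least(y_depth, 10):.4f}"
    )
```

Under these idealized assumptions almost every position already clears a low cutoff at 30x, so extra depth mainly raises confidence at positions that are already covered.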

There are raging debates elsewhere comparing Big Y-700 and WGS for the Y chromosome (leaving out the autosomal side of WGS), but my summary would be that Big Y-700 is approximately the same as a 30x WGS in terms of reliable SNP discovery. I stress “reliable” because vendors providing WGS will report a little more of the Y at lower read depths, so WGS testing will report more SNPs based on 1 or 2 reads that Big Y-700 will cut off, and whether those are meaningful is part of the debate. So a 30x WGS will probably give you slightly more SNPs than a Big Y-700, but it’s an apples-to-oranges comparison once you get into the details, and experience varies on whether the difference is meaningful... hence my statement above. I’m not saying they’re identical tests, only that they’re very similar in genealogical value. Of course, the other point usually mentioned in those comparisons is that with Big Y-700 you get access to a much larger matching database, which is important too, although less important for people from under-tested populations. And on the other side, with WGS you also get autosomal data, of course.
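A minimal Python sketch of why SNP counts depend partly on the reporting cutoff: the same set of calls yields different totals depending on the minimum number of supporting reads required. The `YCall` records and both thresholds below are made up for illustration; they are not FTDNA's or any WGS vendor's actual filtering criteria.

```python
from dataclasses import dataclass


@dataclass
class YCall:
    position: int       # hypothetical chrY coordinate
    derived_reads: int  # reads supporting the derived (mutated) allele
    total_reads: int    # all reads covering the position


def count_reported(calls: list[YCall], min_reads: int, min_fraction: float = 0.9) -> int:
    """Count calls with enough supporting reads and a high derived-allele fraction."""
    return sum(
        1
        for c in calls
        if c.total_reads > 0
        and c.derived_reads >= min_reads
        and c.derived_reads / c.total_reads >= min_fraction
    )


# Made-up calls: two are supported by only 1 or 2 reads, one is ambiguous (3 of 6).
calls = [
    YCall(1_000, 1, 1),
    YCall(2_000, 2, 2),
    YCall(3_000, 12, 12),
    YCall(4_000, 20, 21),
    YCall(5_000, 3, 6),
]

print("permissive cutoff (>=1 read):", count_reported(calls, min_reads=1))
print("stricter cutoff (>=4 reads):", count_reported(calls, min_reads=4))
```

The two extra calls under the permissive cutoff are exactly the low-read SNPs whose meaning is debated.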

MacUalraig
07-28-2020, 01:19 PM
Originally posted by Plantbasedhuman:
I'm considering purchasing a WGS test; however, I'm willing to wait if prices are likely to drop significantly and the technology is going to improve a lot in the next year or so. I see Nebula is currently offering 30x Whole Genome Sequencing for $299 USD, which is less than half the price of Dante Labs. Is that as good as it will get for the foreseeable future in terms of price? Also, is it likely that the accuracy of testing will improve over time? For example, could companies eventually offer 60x coverage instead of 30x, making the current results obsolete? Also, would a Big Y-700 at FTDNA have more detail than the Y chromosome data from a 30x WGS? I'm hoping that commercial companies like Ancestry, MyHeritage, 23andMe & FTDNA will eventually offer upgrades to WGS, as they already have the saliva samples on hand, but maybe that is wishful thinking?

Not quite USD299, no, because you also have to buy a 'subscription' (monthly, yearly or lifetime). The cheapest option is to buy the monthly one and cancel it after one month.
Comparing it with targeted Y NGS like Y Elite (Full Genomes) or BigY is tricky, partly because Nebula plans to do a linkup with FTDNA down the road but hasn't published details on how this will work or whether it will involve extra charges.

What you do get in the headline price, though, is your raw data via download, so you can upload it to YFull, who will put you on both their Y and mtDNA trees. BigY bills you USD100 extra for the raw data, and if you do get it, they will have stripped out the incidental mtDNA reads, which in my experience are good enough that you don't need a separate FMS (full mitochondrial sequence) test.

The measurable difference between the various read depths is probably not worth worrying about; it is doubtful you will be able to tell the difference (I've done 8 sequencing tests, so I have some good comparison data!).

Dante will sequence you on a NovaSeq 6000 in Italy, which IIRC is the same machine BigY runs on.

MacUalraig
07-28-2020, 03:37 PM
Originally posted by Plantbasedhuman:
Thanks for the reply @Ibericus. I think the biggest advances will be in consumer-friendly, non-technical interpretation of the data and a standardised API for sharing with doctors etc. Another consideration is how future-proof the data is: if the file format we receive now remains compatible for years to come, then the cost is probably worth the investment. I'm leaning towards the Dante Labs Gen Z test since it has 120x coverage, so it might be more accurate.

The Dante 120x coverage is only for the exome not the whole genome.

YSEQ will sell you a 50x WGS, but it's pricey.

MacUalraig
07-28-2020, 05:56 PM
There are some Y SNPs in the exome, of course, so you could get lucky. M222 is in exon 25 of the USP9Y gene, so it gets captured during exome sequencing; in this screenshot (M222-) it got 80 reads.


jadegreg
07-29-2020, 02:01 PM
I spoke with Nebula: they are planning to roll out 100x WGS in the near future, but they couldn't give me dates or costs.

Plantbasedhuman
08-01-2020, 11:56 AM
Just as an update, I went ahead and purchased the Dante Labs test, which they call “Whole GenomeZ - Whole Genome for Advanced Analysis (130X + 30X)”. Thanks, everyone, for the really thoughtful replies; this looks like a great community.

JamesKane
09-25-2020, 12:49 PM
Drmanac, the scientist behind Complete Genomics/MGI, stated in one of his interviews around the time they announced the DNBSEQ-Tx and CoolMPS technology that processing costs of less than $10 could be achieved before he retires. The MGI chemistry has alleged patent entanglements, but those also expire in the next 4 years.

Basically, the price of 30x coverage with 150bp PE (paired-end) reads will start to be dominated by the IT costs of delivering all the generated data to customers. Expect companies to move toward monetizing the interpretation with subscriptions, as Nebula is showing, rather than the tests themselves.
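For a sense of the data volumes involved, here is a back-of-the-envelope Python sketch. It assumes a genome size of roughly 3.1 billion bases and about 2 bytes per base in uncompressed FASTQ (one base character plus one quality character, ignoring read headers and newlines); real files and compression ratios vary.

```python
GENOME_SIZE_BP = 3.1e9  # assumption: approximate human genome length in bases
BYTES_PER_BASE = 2      # assumption: 1 base char + 1 quality char in FASTQ, headers ignored


def sequencing_volume(mean_depth: float, read_length: int = 150):
    """Return (total bases, individual reads, read pairs, rough FASTQ bytes)."""
    total_bases = mean_depth * GENOME_SIZE_BP
    reads = total_bases / read_length  # each mate of a pair counts as one read
    read_pairs = reads / 2             # paired-end: two mates per DNA fragment
    fastq_bytes = total_bases * BYTES_PER_BASE
    return total_bases, reads, read_pairs, fastq_bytes


bases, reads, pairs, size = sequencing_volume(30)
print(f"{bases / 1e9:.0f} Gbases, {reads / 1e6:.0f}M reads "
      f"({pairs / 1e6:.0f}M pairs), ~{size / 1e9:.0f} GB uncompressed FASTQ")
```

Under these assumptions a single 30x genome is on the order of a couple of hundred gigabytes before compression, which is consistent with storage and delivery becoming a large share of the cost as sequencing itself gets cheaper.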

RobertCasey
09-25-2020, 02:13 PM
Read depth could increase and yield another 5 to 10% of usable YSNP discovery, but those tests would cost much more than 5 or 10% over the standard 30X WGS products. I think that much longer read lengths (Nanopore or 1000 bp NGS) would have the same impact on read quality, but again the usable YSNP discovery gain is pretty small, 5 to 10%. What is not being discussed, though, is that longer read lengths would dramatically help with YSTRs.

First, you would no longer need to order the Y111 test, which is currently key to discovering 3X as many branches as YSNP-only analysis. Also, the large number of no-calls for the existing Y700 YSTRs would go way down. Plus, you would get 1,000 to 2,000 YSTRs, although most of these would be too slow or too fast mutating to be useful. Another major factor is that the sample size of Big Y2000 YSTRs would eventually catch up with Y67-only and Y111-only testers. This would be the largest improvement in creating new YSTR branches. With extensive YSNP testing, YSTRs become much more reliable for charting, and Y2000 YSTRs would increase the YSTR to YSNP ratio to 5X or more. For R-L226, I can now chart 93% of Y67 and 97% of Y111, and this keeps improving with sample size. We are definitely hitting the YSNP discovery limit with our R-L226 project, as many Big Y700 testers are beginning to yield zero, 1 or 2 private YSNPs, which means parts of R-L226 have reached the limit of YSNP discovery; however, Y700 YSTRs still yield five to ten extra private YSTR mutations. Plus, we could easily get more than 100% more YSTR mutations vs. 5 to 10% more YSNP discovery.

But YSTR signatures are not 100% reliable, while YSNPs are 99% reliable. Larger numbers of testers (sample size) and larger numbers of YSTRs will close the gap, though. Using faster-mutating markers is also somewhat problematic: CDY and several existing faster-mutating Y700 YSTRs may need to be filtered out due to hidden mutations in the older parts of the charts. SAPP is a pretty robust charting tool, but it and any manual method of charting still have quality issues to improve. Still, larger YSTR signatures, above seven mutations, are now above 95% accurate (and YSNP prediction of older, predictable haplogroups from Y67 results is usually 99% accurate these days).

People want to ignore YSTRs, but they are already key to revealing more total branches in larger and more prolific haplogroups, where YSNP discovery from expensive NGS/WGS is declining. YSNPs are like probate and census records, which are much more reliable sources. But it is all the other genealogical source records where major advances are being made on brick walls: personal property tax lists and even older genealogical manuscripts do have a lot of value.

JamesKane
09-25-2020, 07:24 PM
I'll disagree here, Robert. All the statistical problems everyone had building out STR-based trees before NGS became affordable are still inherent. No doubt you've seen the report of an R1b-Z253 and an R1b-FGC11134 (supposedly Irish Type 2) having more than 100 STR markers in common on the 111-marker set. If that's correct, none of the statistical models can deal with these types of outliers, and they are therefore flawed to the core. In genetic genealogy we are dealing with individuals, not populations, and populations are where the STR models work. In populations it doesn't matter if you misclassify 2-5% of the cohort, since you are within a favorable margin of error.

This moves me into the camp that STRs are just not helpful; given the current cost of WGS, they are useless.

MacUalraig
09-30-2020, 04:59 PM
To my knowledge there has never been a scientific Y-STR tree; if there were, RobertCasey would be pointing it out to us. Right back to the YCC (Y Chromosome Consortium), it was always SNPs (plus an Alu or two). As a reminder, I cite Professor Mark Jobling, who amongst other things did the Y-SNP selections for LivingDNA, on my website:

Professor Mark Jobling (2017) writes "SNPs define stable haplotypes, known as haplogroups, which can be used to build a robust phylogeny using the principle of maximum parsimony". STRs cannot.

Dave-V
09-30-2020, 06:50 PM
Originally posted by JamesKane:
I'll disagree here, Robert. All the statistical problems everyone had building out STR-based trees before NGS became affordable are still inherent. No doubt you've seen the report of an R1b-Z253 and an R1b-FGC11134 (supposedly Irish Type 2) having more than 100 STR markers in common on the 111-marker set. If that's correct, none of the statistical models can deal with these types of outliers, and they are therefore flawed to the core. In genetic genealogy we are dealing with individuals, not populations, and populations are where the STR models work. In populations it doesn't matter if you misclassify 2-5% of the cohort, since you are within a favorable margin of error.

This moves me into the camp that STRs are just not helpful; given the current cost of WGS, they are useless.

Examples don't disprove statistics; only better statistics do. We all (probably) can cite examples of branches where STRs don't reliably identify branching. We SHOULD all (I know I can) be able to cite examples where SNPs don't reliably identify branching either, and not just over short time periods but even over periods much longer than the typical interval between the SNPs that WGS finds. If we're honest, neither STRs nor SNPs alone offer the consistent reliability that we would hope for in reassuring a new tester that they can use Y-DNA testing to adequately map their shared ancestry with other testers.

It boggles my mind that people still treat STRs versus SNPs as an either-or question, or think that either should be discarded based on its performance in particular branches. Both have been proven reliable when used correctly. One of the differences that we ignore at our peril is that with SNPs, much of the reliability issue is hidden from us by vendors who apply criteria to tell us which SNPs to count on over others, and we usually blindly accept those criteria. With STRs, the reliability issue is inherent in the mutation variability and is not filtered, so we think of them as less reliable.

Nobody's saying that you should consider a branch proven because a few people share an off-modal STR. Certainly, once you get past the reliability issues, SNPs absolutely have greater certainty than single STRs, or than STR patterns that don't show consistent signatures. And I would always advocate WGS or targeted NGS testing to get a reliable SNP-based structure as the backbone of the ancestral branching. So maybe we would agree that STRs are a secondary piece of data. But in ALL of the NGS/WGS testing and analysis that I have seen, I don't think I've ever seen SNPs alone provide enough branching to completely answer the testers' original genealogy questions. It's always "here's what we know so far". In that light, looking for two-marker, three-marker or even larger STR signatures, which depending on the markers will very quickly move from offering "likely" branching all the way up to being just as statistically reliable as SNPs, is a perfectly acceptable (and I would argue, still necessary) additional analysis that satisfies the same maximum parsimony principle from Jobling that MacUalraig posted about just before this.

Perhaps when we get to finding a SNP every generation we can adopt your approach. I don't think we can yet.
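One way to see why multi-marker STR signatures firm up quickly is a toy calculation: if each off-modal STR value has some chance of arising independently in an unrelated line, the expected number of coincidental matches to a whole k-marker signature shrinks geometrically with k. The Python sketch below assumes markers converge independently, uses a hypothetical 5% per-marker convergence chance and a hypothetical project of 1,000 unrelated testers. Fast-mutating markers violate these assumptions, so this is an optimistic simplification, not a validated model.

```python
def expected_false_matches(per_marker_prob: float, markers: int, unrelated_testers: int) -> float:
    """Expected number of unrelated testers who match a whole STR signature by chance.

    Assumes each marker converges independently with the same probability,
    which is optimistic for fast-mutating markers.
    """
    return unrelated_testers * per_marker_prob**markers


# Hypothetical inputs: 5% per-marker convergence chance, 1,000 unrelated testers.
for k in range(1, 5):
    print(f"{k}-marker signature: ~{expected_false_matches(0.05, k, 1000):.3f} chance matches")
```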

Dave-V
09-30-2020, 07:02 PM
Originally posted by MacUalraig:
To my knowledge there has never been a scientific Y-STR tree; if there were, RobertCasey would be pointing it out to us. Right back to the YCC (Y Chromosome Consortium), it was always SNPs (plus an Alu or two). As a reminder, I cite Professor Mark Jobling, who amongst other things did the Y-SNP selections for LivingDNA, on my website:

Professor Mark Jobling (2017) writes "SNPs define stable haplotypes, known as haplogroups, which can be used to build a robust phylogeny using the principle of maximum parsimony". STRs cannot.

Apologies, but I need to point out that you're taking Jobling out of context. He was actually advocating for the use of SNPs together with STRs, basically the same way I described in my earlier post just now. The full quote (Human Y-chromosome variation in the genome-sequencing era, 2017, Jobling & Tyler-Smith, available from the Wayback Machine at https://lra.le.ac.uk/bitstream/2381/40490/2/Jobling&TylerSmith17.NRG.YChromosomeNGSReview.pdf) is as follows:

Early studies involved discovering variants in small samples before genotyping them in larger samples, leading to strong biases because additional variants present in the larger samples were not accounted for. Some of these problems could be alleviated by performing combined analyses of slowly-mutating single-nucleotide polymorphisms (SNPs) and more rapidly-mutating short-tandem repeats (STRs). SNPs define stable haplotypes, known as haplogroups, which can be used to build a robust phylogeny using the principle of maximum parsimony. Deploying multiple STRs, which are variable in all populations and therefore lack ascertainment bias, can then reveal the level of variation within these haplogroups, and also provide some information about their time-depths (that is, the time since the haplogroup-defining mutation occurred); older haplogroups will harbour higher STR haplotype diversity.

RobertCasey
09-30-2020, 08:50 PM
Originally posted by JamesKane:
I'll disagree here, Robert. All the statistical problems everyone had building out STR-based trees before NGS became affordable are still inherent. No doubt you've seen the report of an R1b-Z253 and an R1b-FGC11134 (supposedly Irish Type 2) having more than 100 STR markers in common on the 111-marker set. If that's correct, none of the statistical models can deal with these types of outliers, and they are therefore flawed to the core. In genetic genealogy we are dealing with individuals, not populations, and populations are where the STR models work. In populations it doesn't matter if you misclassify 2-5% of the cohort, since you are within a favorable margin of error.

This moves me into the camp that STRs are just not helpful; given the current cost of WGS, they are useless.

With extensive Big Y700 testing (around 20%), YSTRs are very useful for two reasons: 1) for haplogroups in the 1500 to 2500 YBP range, you can predict the predictable haplogroup with 99% accuracy, which allows YSTR testers to participate in haplogroup projects; 2) charting reveals 3X to 4X as many branches, which YSNP-only testing does not reveal. With Y700 YSTRs, these signatures are becoming very large and their accuracy is climbing to around 80 to 95%. For the R-L226 project, it is now pretty common for testers to have 0, 1 or 2 private YSNPs while YSTR branches are working extremely accurately. Some YSTR signatures now have 10 to 15 mutations under predictable haplogroups, with 95 to 98% accuracy.

Currently, most people still test Big Y700 (90 to 95%), which unfortunately requires Y111 tests first. Once any FGS/WGS/new YSNP discovery test has longer read lengths of around 1000 bp, Y111 will no longer be needed, as Y111/Y700 will have only 1 or 2% no-calls. Also, both YFull and FTDNA are not reporting a large number of slower-mutating YSTR markers that will become more useful as sample sizes increase. This is like telling genealogists to use only well-proven source records such as census and probate records. However, just as YSTRs produce useful discoveries that YSNP-only testing fails to produce, the less rich source material produces most of the useful genealogical discoveries. We have over 100 testers in the O'Brien surname cluster, which is extensively Big Y700 tested:

http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Home.pdf#Page=8

Note that DC863 has five Big Y700 testers but only one YSNP. There is just one branch equivalent of DC863 and seven private YSNPs across five testers (1.4 private YSNPs per tester). But we clearly have three well-proven and consistent YSTR branches under DC863, which gives 4X as many branches as the single YSNP branch. This is consistent throughout the L226 chart, not just in large surname clusters. Binary logistic regression works for YSNP prediction and, 90% of the time, is just as accurate as YSNP branches. Charting (with very large Y700 YSTR signatures) is over 95% accurate as well. Binary logistic regression also works under predictable haplogroups. Charting is also key to getting people to upgrade to Big Y700. YSTR-only approaches do have poor accuracy (this is why the FTDNA YSTR matches and the Tip Tool lack accuracy as well). Charting and extensive YSNP testing dramatically restrict the placement of YSTR mutations; SAPP does an excellent job of not only enforcing the YSNP hierarchy but also predicting the placement of testers with less YSNP testing (those who went the SNP pack route or randomly tested YSNPs). SNP packs and individual YSNP tests are no longer as cost-effective as they once were.
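For readers unfamiliar with binary logistic regression, here is a minimal pure-Python sketch of the general technique: from testers whose subclade is SNP-confirmed, it learns a probability that an STR-only tester belongs to that subclade. This is not RobertCasey's actual model; the two markers, their repeat values and the training settings are hypothetical, and a real analysis would use many more markers and testers and a proper statistics library.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(rows, labels, lr=0.1, epochs=2000):
    """Fit a binary logistic regression by batch gradient descent on log-loss.

    Features are mean-centered first; returns (means, weights, bias).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    xs = [[r[j] - means[j] for j in range(d)] for r in rows]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return means, w, b


def predict(model, row):
    """Estimated probability that a tester with these STR values is in the subclade."""
    means, w, b = model
    return sigmoid(sum(wj * (v - m) for wj, v, m in zip(w, row, means)) + b)


# Hypothetical repeat counts at two made-up markers; label 1 = SNP-confirmed in the subclade.
rows = [[14, 11], [14, 11], [14, 12], [13, 12], [13, 12], [13, 11]]
labels = [1, 1, 1, 0, 0, 0]
model = fit(rows, labels)
print(f"P(subclade | 14, 11) = {predict(model, [14, 11]):.3f}")
print(f"P(subclade | 13, 12) = {predict(model, [13, 12]):.3f}")
```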

BroderTuck
10-01-2020, 07:05 AM
Originally posted by RobertCasey:
Currently, most people still test Big Y700 (90 to 95%), which unfortunately requires Y111 tests first.

Slightly confused by this statement, since BigY actually includes the Y111 test in its price (and has for about 2 years now), and is usually CHEAPER than first ordering Y111 and then doing a BigY upgrade.
(edit: Well, I might be mistaken about the current state of this, since I haven't been testing others, and my own BigY was done several years ago)


Originally posted by RobertCasey:
SNP packs and individual YSNP tests are no longer as cost-effective as they once were.

True, and even Thomas Krahn has said this himself (I think it was on the YSEQ Facebook page).

RobertCasey
10-01-2020, 03:32 PM
Originally posted by BroderTuck:
Slightly confused by this statement, since BigY actually includes the Y111 test in its price (and has for about 2 years now), and is usually CHEAPER than first ordering Y111 and then doing a BigY upgrade.
(edit: Well, I might be mistaken about the current state of this, since I haven't been testing others, and my own BigY was done several years ago)

Because many tested before FTDNA bundled the Y111 test with Big Y, 80% of the L226 testers are still YSTR-only (sometimes with limited YSNP testing via SNP packs or individual YSNP orders). Also, 2 or 3% of the early Big Y500 testers are at YSTR levels lower than Y111 (even though these upgrades were and remain very economical).

With FTDNA pricing, multiple upgrades are always more costly than going directly to Big Y700. The price difference was not that large until recently. During the last sale, the upgrade charges from Y37 to Y67 to Y111 to Big Y700 went up significantly. R-L226 still has 400 testers at Y37 out of the 1,200 known testers, and upgrading them to higher-resolution YSTR tests has become more expensive during the recent sales. The number of net new Big Y700 testers has gone up a lot, and the number of YSTR-only upgrades has gone down due to these pricing changes (primarily during the sales events). Upgrading from Y37 to Y67 to Y111 is 30% more expensive these days, so any Y37 tester should now upgrade directly to Y111. Going directly to Big Y700 always saves another 5 to 10% as well. This makes sense, as multiple orders require multiple lab runs and multiple rounds of analysis.

We still occasionally recommend the R-L226 SNP pack, depending on where the YSTR signatures place testers on the R-L226 chart. We have added analysis of the branches just above R-L226, since they are important for understanding the progression of YSTR mutations prior to R-L226, and also good for understanding the geographic migration pattern of L226 just before and just after the L226 mutation. We included quite a few L226 branch equivalents in the L226 SNP pack, so it is viable to recommend the L226 SNP pack as an economical alternative for validating the huge signatures of the branches above L226 (there is significant RecLOH involved in pre-L226 YSTR signatures).

Dave-V
10-01-2020, 05:11 PM
Originally posted by RobertCasey:
With FTDNA pricing, multiple upgrades are always more costly than going directly to Big Y700. The price difference was not that large until recently. During the last sale, the upgrade charges from Y37 to Y67 to Y111 to Big Y700 went up significantly.

I agree it's hard to keep up with pricing and sales etc., but I don't think there is still that much difference right now between multiple upgrades and a direct purchase.

Based on some exploration right now (with no sales in place), the direct prices (all in USD) are Y37 = $119, Y111 = $249, and Big Y700 = $449.

Skipping the Y67 level, upgrades are Y37->Y111 = $139, Y37->BigY = $339, Y111->BigY = $239.

So taking a Y37 first and then upgrading to Big Y costs 119+339 = $458, or $9 more than a direct Big Y purchase. Taking a Y111 first and then upgrading to Big Y costs 249+239 = $488, which is $39 more than a direct Big Y purchase. The highest additional price is if you took Y37 first, then upgraded to Y111, then upgraded to Big Y, which would be 119+139+239 = $497. That's $48 more than Big Y direct, but that's only an 11% increase.

Certainly I agree with you that it's better to go direct if you can, and avoiding an 11% increase is a good thing. But if you played it smarter and got those upgrades during sales periods, you should be able to knock at least $20 off each upgrade. At that point you'd be taking a Y37, then Y111, then Big Y for only $8 more than you would have paid for a direct Big Y. And that's not even counting the chance that the prices themselves, including upgrades, would fall over the period that you're taking these tests.
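The upgrade arithmetic above can be checked with a short Python sketch. The prices are the non-sale USD figures quoted in this post (October 2020) and will certainly have changed since; the flat `upgrade_discount` is a hypothetical stand-in for the "at least $20 off" sale estimate.

```python
# Non-sale FTDNA prices in USD as quoted above (October 2020); current prices will differ.
DIRECT = {"Y37": 119, "Y111": 249, "BigY700": 449}
UPGRADE = {("Y37", "Y111"): 139, ("Y37", "BigY700"): 339, ("Y111", "BigY700"): 239}


def path_cost(path: list[str], upgrade_discount: int = 0) -> int:
    """Buy path[0] directly, then upgrade step by step, less a flat discount per upgrade."""
    total = DIRECT[path[0]]
    for src, dst in zip(path, path[1:]):
        total += UPGRADE[(src, dst)] - upgrade_discount
    return total


direct = DIRECT["BigY700"]
for path in (["Y37", "BigY700"], ["Y111", "BigY700"], ["Y37", "Y111", "BigY700"]):
    cost = path_cost(path)
    extra = cost - direct
    print(f"{' -> '.join(path)}: ${cost} (+${extra}, +{100 * extra / direct:.1f}%)")

# Hypothetical sale scenario: $20 off each of the two upgrades.
print("Y37 -> Y111 -> BigY700 on sale:", path_cost(["Y37", "Y111", "BigY700"], upgrade_discount=20))
```

The totals reproduce the $458, $488 and $497 figures above, and the discounted three-step path comes to $457, which is $8 over the direct price.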

I'm not arguing FOR the multiple upgrade path as a first choice, only saying that I don't think there's a significant price issue with it. The real downside is the extra time spent at lower levels if those haven't helped to answer the questions you were hoping to get answers for.

TigerMW
10-01-2020, 06:57 PM
Everyone realizes that WGS tests are NOT inherently less costly to run than targeted Y NGS tests, right?

They both run on the same equipment, Next Generation Sequencing equipment. The targeting actually saves costs, as WGS tests chug on for quite a bit longer to cover the whole genome to the stated specs.

The reason why WGS tests are sometimes inexpensive is that vendors want your data for other reasons, be it for big pharma research or medical research or what have you. This means those vendors are willing to subsidize the costs with the intent to make more money elsewhere.

It's the same old story of why Google and Facebook are free. High-tech CEOs like Apple's have told us this for years: if you are getting a product for free, then YOU ARE THE PRODUCT.

In the case of Big Y700, FTDNA does have an additional cost. The Y111 STR panel is actually a second test embedded in the Big Y700 "package". I don't know how much the Y111 STR panel costs to run. Accounting-wise, they may allocate little of the machinery costs to it; for example, the equipment may already be fully depreciated.