YSEQ's new 15x WGS



MfA
01-10-2017, 09:40 AM
This seems like a well-made all-around product with more emphasis on Y-DNA, but you'll also receive a complete mtDNA sequence and an autosomal file in 23andMe format.

https://www.yseq.net/product_info.php?products_id=42468


This WGS test is specifically designed for genealogy researchers and includes:

Full Mitochondrial Genome (Fasta file & haplogroup)
Annotation of all Y-SNP names and Y haplogroup information from Ybrowse.org
Separate VCF files for derived and novel Y-SNPs
User friendly autosomal allele files that can be immediately uploaded to genealogy tools like GedMatch
All raw data (FastQ, BAM, VCF etc.) supplied for free download or on SD card by mail
FREE Sanger sequencing confirmation of up to 10 novel Y-SNPs (for well prepared submission to the ISOGG tree)
Turn around time less than 3 months (after sample arrival)

MacUalraig
01-10-2017, 02:31 PM
Yes it's an intriguing development and I wish Thomas well with it. As I understand it he was sort of working with FGC though, doing their DNA extraction??

RobertCasey
01-10-2017, 04:19 PM
It is a little disappointing that they chose 15X coverage - wish it was 30X or 50X (but that would push it to the $1,000 range, which would turn off many). Also, does anyone know the read length? Probably 150, which is another issue. There is also no mention of whether the relevant list of YSNPs, or the YSTR values, will be entered into their database. Both FGC and YSEQ are really missing the advantage of a robust database to match the genetic data being provided. They need to partner with some IT people and charge for this service. Of course, FTDNA does a minimal job of YSNPs for their database as well and fails to analyze the YSTRs at all. I guess that you have to enter the market with one test to see the level of interest. Delivery time could be a real plus for YSEQ over FGC.

Wish they had stuck to their core business of YSNP testing and introduced Mass Array SNP packs with a lot more branches and private YSNPs (without all the equivalent YSNPs that have minimal value compared to private YSNPs). This would obviously boost the YSNP sales as well, which are suffering because their SNP panel test orders are declining. They should also lighten up about testing YSNPs that sit in unstable areas but give consistent test results - another FTDNA advantage.

Do not get me wrong, I regularly encourage YSEQ testing for the L226 and the Casey projects. Under L226, YSEQ has been discovering branches at 10 % of the cost of yet more Big Y tests. It is really hard to convince a lot of people to test with YSEQ or FGC though. All three FGC tests under L226 are for Casey men only to date.

I am really surprised how much success L226 has had with the L226 SNP pack that includes 50 private YSNPs. Not only has this test revealed another seven branches under L226, but my charting of L226 using signatures has increased from 50% to 75%, and it has been more useful than just a few more Big Y tests. The Mass Array is having some substantial reliability issues (more than expected) but it remains a super useful test. Also, FTDNA's insistence on posting private YSNPs in the L226 SNP pack as branches really is misleading and confusing, which detracts from a fine test.

gotten
01-10-2017, 06:45 PM
I think it is actually a good move. They can learn a lot from it. I understand why they specifically offer it to Genealogical Researchers. A matching system would be nice but they are a small team.

Hopefully this also shakes things up a bit with the bigger companies. Competition is healthy.

Looking forward to hearing about the first results.

TigerMW
01-11-2017, 09:38 PM
Robert, do we have any assessments of confidence levels by read length? I was told 30x is questionable, particularly given that it is an average, so many regions will be less than 30x. I haven't seen any good information on what the curve really looks like. Obviously there is a cost/times coverage curve, but there is also a confidence/times coverage curve.

JamesKane
01-13-2017, 05:02 PM
30x read depth is questionable for medical applications. For diagnosis of genetic conditions or personalized medicine, a medical provider needs to be 99.999% or better sure the call is accurate. Much of this is because you are dealing with two copies of the autosomes. The reads are effectively split between them. With only 30x coverage that's only 15 reads per copy on average. This is why the current NCBI recommendation is 300x on the gene-encoding regions.
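To put a rough number on the "reads are split between two copies" point, here is a minimal sketch. It assumes each read at a heterozygous site comes from either copy with probability 0.5 and ignores sequencing error and mapping bias; the function name and the threshold of 4 reads are only illustrative choices, not anything a lab has published.

```python
from math import comb

def prob_allele_undersampled(depth: int, min_reads: int) -> float:
    """Chance that fewer than min_reads of the reads at a heterozygous
    site come from one given copy.

    Assumption: every read is drawn from either copy with probability 0.5.
    """
    # Binomial(depth, 0.5): sum the probabilities of 0 .. min_reads-1 reads.
    return sum(comb(depth, k) for k in range(min_reads)) / 2 ** depth

# Compare a few total read depths for a hypothetical 4-read threshold.
for depth in (15, 30, 60):
    p = prob_allele_undersampled(depth, min_reads=4)
    print(f"{depth}x total: P(fewer than 4 reads from one copy) = {p:.6f}")
```

The probability falls quickly as depth rises, which is the same direction as the argument above, though real data will behave worse than this idealized model.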

For genetic genealogy we only need 98% accuracy or so. This can be achieved with a relatively small number of reads in areas like Poznik's Gold Regions. You want more reads for checking the consensus where alignment algorithms like Burrows-Wheeler are challenged more. My comparison chart already summarizes what can be accurately called at a few WGS read levels.

15x coverage is actually performing surprisingly better than I thought for the Y chromosome. An average of 12 million loci have at least 4 reads, each with less than a 10% chance of being incorrectly aligned. This compares with 9 million loci in a Big Y, despite it averaging better than 55x read depth in the target regions. What the 15x WGS test doesn't give is any assurance that it covers the same regions as Big Y. The two I've looked at have less than 6.5 million of the 8.5 million combBED bases covered.

I'm working on generating histograms in an automated fashion to supplement the summaries, but keep getting distracted with other life priorities at the moment.

Eochaidh
01-13-2017, 06:22 PM
Would this be any use to someone who has taken the Y Elite 2 test?

JamesKane
01-13-2017, 07:01 PM
If you are only interested in your Y DNA, there is no real benefit for a Y Elite tester to take a WGS. Your upgrade path will require something with 10k read lengths for anything appreciably better. The difference between Y Elite 2 and a 30x WGS is only 1 million callable bases in the best scenario and WGS doesn't have the same PCR stutter problems, which can make STR calls more accurate.

If you haven't done autosomal testing, this is a very expensive 23andMe test you can put in Gedmatch until somebody builds a better mousetrap that does autosomal matching on WGS tests.

I would consider this a pretty good deal for someone that hasn't done much testing on the sample donor though.

MacUalraig
01-14-2017, 02:02 PM
WGS-wide autosomal matching will be the next big thing. I couldn't resist doing a WGS a couple of years ago, but as James says, it doesn't really add anything to the Y you already did. Luckily one of the top autosomal experts who is looking at this is an autosomal match to me ;-)

RobertCasey
01-14-2017, 04:28 PM
Here is a quote from FGC on callable loci at various coverage levels, from 2x to 30x:


Average number of bases with at least 1x coverage: (on the Y chromosome)

30x whole genome: 22,761,293
10x whole genome: 22,025,697
4x whole genome: 17,678,170
2x whole genome: 13,755,442

Callable Loci:
30x whole genome: 14,644,185
10x whole genome: 8,046,540
4x whole genome: 1,050,996
2x whole genome: 394,718

Y Elite 1.0 by comparison:
Average number of bases with at least 1x coverage: (on the Y chromosome):
22,772,412

Callable Loci: 14,000,000

I would ignore all the 1X numbers and concentrate on callable loci, which is FGC's criterion for calling a YSNP.
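To make the gap between "touched by at least one read" and "callable" easier to see, here is a small sketch that divides one set of FGC figures by the other. The numbers are copied from the quote above; the ratio itself is my own arithmetic, not something FGC stated.

```python
# Y-chromosome figures as quoted from FGC above, keyed by WGS coverage.
covered_1x = {30: 22_761_293, 10: 22_025_697, 4: 17_678_170, 2: 13_755_442}
callable_loci = {30: 14_644_185, 10: 8_046_540, 4: 1_050_996, 2: 394_718}

def callable_fraction(coverage: int) -> float:
    """Share of the bases with at least 1x coverage that are also callable."""
    return callable_loci[coverage] / covered_1x[coverage]

for cov in sorted(covered_1x, reverse=True):
    print(f"{cov}x: {callable_fraction(cov):.1%} of 1x-covered bases are callable")
```

The 1x counts barely move between 10x and 30x while the callable share changes a great deal, which is why the callable loci are the numbers worth comparing.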

Here is another analysis of a 15X coverage:

GenomeGuide 15x update:
13.2 mb Callable Loci
1000x+ mtDNA coverage
98.7% autosomal SNP coverage (3.5 million SNPs+)


Big Y is around 9,000,000 callable loci, so the 15x test would call a lot more than Big Y - but the calls would be in very different parts of the Y chromosome. At 30X, it would be a clear winner. There is one big pro of this test over the Big Y: it would read areas that Big Y does not cover. There is one big con of this approach: it will be missing many of the Big Y reads. It would probably have 30% beyond Big Y and miss 15% of Big Y. This has value in itself - like the old Walk the Y, where changing the area being scanned produced a large jump in YSNPs being discovered.

Of course, all these NGS/WGS tests will be replaced in 12 to 18 months, when WGS tests have much longer read lengths (think 500) with 50X to 100X coverage. These new WGS tests will easily read all 111 YSTR markers, be near the limits of what can be read on the YCHR, and include the atDNA data. So this test will be around $500 to $750 and will include 111 YSTRs (plus 300 more YSTRs), a far superior Big Y (35% more) and a better atDNA test (not sure how much larger atDNA tests would help vs. the IT costs to analyze them). However, with no databases for these tests, their future remains problematic. I am super surprised that you and others have done such a good job of feeding files into the Big Tree, which is a database of sorts. Expanding Alex's Big Tree scope from R-P312 to all of the genome is a big step though, as the leadership beyond P312 is more distributed in nature. Also, Alex's charts are missing branches that can only be discovered from VCF files that were not shared. They have most of the FGC files, but are missing all the individual YSNP testing from YSEQ and will be missing all the private YSNPs being included in FTDNA SNP packs. But we are slowly getting there. The tests are moving faster than the quality of the tools and databases though.

mlcarson
01-16-2017, 10:46 PM
How does the 15xWGS test compare with Fullgenome's GenomeGuide 15X test?
Price wise, it's $899 vs $795 but I don't understand enough about the testing to say anything beyond that.

JamesKane
01-17-2017, 12:05 AM
The main test is identical. Both are using Illumina X10 to perform the sequencing. The servicing labs are different but that should not impact things.

The main difference is that YSEQ is validating 10 of your singleton variants with Sanger sequencing. This is a convenient way to make them available for close STR matches to try. This can be more cost effective than a second NGS test. The drawback is that many of these can turn out negative in the match. Some feel this is a wasted investment.

Cofgene
01-17-2017, 12:43 PM
The other item is that there are going to be useful variants that Sanger sequencing cannot be used for. One may get 10 potential Sanger tests but those may not be useful for your close STR matches. You will still need to get BigY or WGS results to work on the variants from regions not accessible to Sanger.

mgrollman
07-30-2017, 10:46 PM
Robert et al - Assuming IT assets manage to close some of these gaps in the next year or two (which, in a cloud/IoT-driven world, the industry will be trying hard to do, at least from a capacity and power standpoint), your projection of a major WGS quality crossover point being reached sometime in 2018 leads me to ask two questions:

1.) Seems like someone patient who does not currently have high resolution Y results would be well advised to wait till next year before dropping $700+ in any of the products on the market in 2017. More or less true?

2.) Can you describe, for a WGS lab-work newbie who is proficient in technology terminology, what specific underlying technology in WGS testing you expect to see hitting the market in 2018 that will bring "much longer read lengths (think 500) with 50X to 100X coverage"? I am guessing it is a new generation of silicon from one or more specific vendors?

Thanks for any insight you or others in this space can offer...

RobertCasey
08-01-2017, 06:35 AM
There are several changes in the wings:

1) Whole Genome Sequencing (WGS) prices will come down faster than Next Generation Sequencing (NGS) prices for two reasons: a) NGS testing requires YDNA enrichment (separating the YDNA from the atDNA); b) WGS is far more mainstream in the medical industry, so its cost will fall faster because medical testing uses this technology in large quantities.

2) Read lengths for WGS/NGS will probably increase in the near future. Big Y is 150 base pairs and Elite 2.1 is 250 base pairs. Longer read lengths not only reveal more YSNPs but also increase the capability to read more YSTRs. At 1,000 base pairs, I think that would be enough to read all 111 markers of the FTDNA test as well as revealing a few more of the longer YSTR structures. With all 67 markers being read (or 111 markers), you would no longer need to take a redundant YSTR test at FTDNA. This would reduce the perceived costs of WGS/NGS tests by $269 (67) or $359 (111), plus give us another 300 or so new YSTRs to investigate. With the Big Y going on sale for only $395 over the next week, these savings could bring down the cost of WGS/NGS. However, we would have to manually track these YSTRs - but we already have to manually track the YSNPs from Big Ys. There is a 1,000,000 read length available today for $2,950 (that will only catch a few orders at that price). Remember that WGS was $10,000,000 only ten or so years ago. Help desk and IT support are probably already around 50% of the costs these days, so testing costs are not the only issue.

3) Waiting for technology to come down in price just does not make sense. Do you wait for new PCs to drop in price? No - you should purchase what you need when you need it. You can also wait for others to compile your genealogy for you. You can either be proactive and drive innovation/discovery, or take a more passive role and let others make the significant strides now vs. later. YDNA testing is not inexpensive - but neither are Ancestry.com subscription fees for genealogical databases or your "family history" trips, where travel expenses can add up. If you want to save your funds, wait for sales like tomorrow's, when the Big Y is reduced to $395 from the regular $575 price tag.

Sassoneg
10-24-2017, 06:01 PM
You can choose 15x, 30x, or 50x on the WGS on YSEQ currently. The cost goes up of course.

Afshar
10-24-2017, 06:24 PM
Excellent product, but (for the WGS) without adequate interpretation and databases it's useless. But considering you get FF, mtDNA and Big Y data for this price, it is a bargain.

MacUalraig
10-27-2017, 11:34 AM
It is better than that in terms of Y sequencing coverage, as it captures the regions that FGC sequences but Big Y doesn't.

Petr
10-27-2017, 01:12 PM
My FGC WGS 15x shows these statistics at YFull (Y chromosome):
Length coverage: 22547120 bp (87.89%)
Mean depth coverage: 13.08X
Median depth coverage: 6X

I don't have detailed statistics, but it looks like this means about 50% coverage at depth 4 - 5 or better. With fewer than 4 reads there are too many errors.

In comparison with a typical Big Y:

Length coverage: 14043329 bp (54.74%)
Mean depth coverage: 61.07X
Median depth coverage: 35X

It looks like the reliable coverage is almost the same.

Elite 1.0 is much better:

Length coverage: 22959696 bp (89.50%)
Mean depth coverage: 76.85X
Median depth coverage: 37X
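For anyone who wants the detailed statistics rather than an estimate, here is a minimal sketch of how mean depth, median depth and the share of positions at 4 reads or better could be computed. It assumes tab-separated per-position text with chromosome, position and depth columns (the layout that `samtools depth` writes); the five sample rows are made up purely to show how a few very deep positions pull the mean well above the median.

```python
from statistics import mean, median

def depth_summary(lines, min_depth=4):
    """Summarize per-position depth rows of the form 'chrom<TAB>pos<TAB>depth'."""
    depths = [int(line.split("\t")[2]) for line in lines if line.strip()]
    usable = sum(1 for d in depths if d >= min_depth)
    return {
        "mean": mean(depths),
        "median": median(depths),
        "fraction_at_min_depth": usable / len(depths),
    }

# Made-up rows, not real data: one deep position dominates the mean.
sample = ["chrY\t1\t2", "chrY\t2\t3", "chrY\t3\t6", "chrY\t4\t6", "chrY\t5\t48"]
print(depth_summary(sample))
```

Positions with no reads at all may be absent from such a listing, so the fraction should be read against the positions listed rather than the whole chromosome.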

MacUalraig
10-27-2017, 01:22 PM
Thanks for posting the stats. It does depend though on whether you are making a general comparison or whether you have a particular interest/need in the extended areas. And whether you are mostly looking for new variants (particularly in a previously unexplored group) or mainly comparing with a previous sample.

Petr
10-27-2017, 04:35 PM
It might be confusing that Whole Genome Sequencing 15x means that only about 25 % of the genome has the depth coverage at least 15x.

It is a good service, but it has its limits.

Petr
10-27-2017, 05:10 PM
On YSEQ validating 10 of your singleton variants with Sanger sequencing: many of these variants have just 1 or 2 reads, or inconsistent reads, so it is very useful even for me to have them validated. Any of them could easily be a false positive.

The other thing is that many SNPs are not suitable for Sanger sequencing. I asked about many variants discovered by NGS testing and just two (A1168 and A9502) out of 28 were suitable; the others were rejected or not recommended for various reasons:

3252488 G -> A 99.3% X + 88980954 88981953
13218678 C -> T 95.1% 11 + 70848858 70850055
13611363 G -> A 96.0% 2 + 92203880 92204879
13611386 T -> C 96.1% 2 + 92203903 92204902
59025293 C -> A Pseudo Autosomal Region (PAR)
9239350 T->C RC1 99.9% Y + 9347996 9369299
9316833 A->G A618 99.4% Y + 9296023 9297022
9350171 A->G ZS4230 99.1% Y + 9369980 9370977
19968557 G->C 100.0% Y - 20210718 20211717
22425174 A->G Z27750 DYZ19 125bp repeat
22436085 T->G BY320 DYZ19 125bp repeat
22463781 T->G DYZ19 125bp repeat
23673867 C->T ZS1894 99.9% Y - 24063071 24064071
23709848 T->C A999 99.8% Y - 24050629 24051629
28792932 T->C GGAAT repeat
28802511 G->A A644 GGAAT repeat
28802514 G->A A645 GGAAT repeat
FGC31478 5024010 G->C 98.8% X + 91186977 91187987
5275891 C->A 97.1% X + 91548060 91549050
rs2580669 6599640 T->C 97.4% X + 88631947 88632933
7120613 T->G 97.6% 14 + 91899724 91900732
19173957 T->C 99.2% 5 - 82614534 82615533
A16162 ChrY:14086865 C to T A&T rich complex repeat section (Can't get primers with enough Tm) *
ChrY:22568328 del to G DYZ19 125bp repeat *
ChrY:10023953 TCTC to del Centromeric repeat *
Z41150 homopolymer *
* = not recommended

Robert McBride
10-27-2017, 08:57 PM
Here are the YFull stats for my Y chromosome BAM file from my 50x YSEQ WGS:


ChrY BAM file size: 0.67 Gb
Reads (all): 9558654
Mapped reads: 9541476 (99.82%)
Unmapped reads: 17178 (0.18%)
Length coverage: 25592634 bp (99.76%)
Min depth coverage: 1X
Max depth coverage: 8039X
Mean depth coverage: 40.12X
Median depth coverage: 20X
Length coverage for age: 8462241 bp
No call: 60932 bp


I wanted to get my whole genome sequenced; otherwise I would have gone for the FGC Elite.
Before I ordered, I asked Thomas Krahn how many extra good-quality, Sanger-sequenceable Y chromosome SNPs it was likely to find that wouldn't have been found by my Big Y. He estimated two, and he was spot on.

MacUalraig
11-30-2017, 07:51 PM
There are now YSEQ WGS kits showing up on Genesis-Gedmatch... got one in my match list.

curiousII
11-11-2018, 09:13 PM
This is an old thread, hope I'm not "necro-posting" but I came just that close to buying a 15x WGS both yesterday and today. A fellow Z2573 agreed that's a good plan as Z2573 isn't covered well by Big Y and, since I'm downstream from DF27, I get to read all about the Iberia/Quedlinburg/Wherever controversy of DF27's origins. Z2573 doesn't have the Basque Marker; perhaps that's odd-man-out now but I'm just guessing as many of the involved parties seem to be, also.

Cut-and-paste from YSEQ:

Coverage:
15x means ca. 45Gbases raw data per sample
30x means ca. 90Gbases raw data per sample
50x means ca. 150Gbases raw data per sample
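The coverage figures quoted above follow from dividing the raw data volume by the genome size. A minimal sketch of that arithmetic, assuming a round 3 billion bases for the human genome (my assumption; YSEQ's page does not state the figure it uses):

```python
GENOME_SIZE_BASES = 3_000_000_000  # assumed round figure for the human genome

def nominal_coverage(raw_gigabases: float) -> float:
    """Average read depth implied by a given volume of raw sequence data."""
    return raw_gigabases * 1_000_000_000 / GENOME_SIZE_BASES

for gbases in (45, 90, 150):
    print(f"{gbases} Gbases -> about {nominal_coverage(gbases):.0f}x")
```

This is only the nominal average; as other posts in this thread show, the depth actually achieved at any given position varies a lot around it.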

15x is $740 at this writing, 30x is special priced at an additional $600 (meaning $1,340), but if you're considering that $1,340 you're just a step away from 50x at $1,430.

I'm staying at 15x, but my question is, being downstream from DF27, I wonder if I need to use the Sanger Confirmation Package option at another $100? From YSEQ: "Up to 10 Y-SNP confirmations" for that $100.

Good idea? Or just stay at 15x?

Link to that webpage: http://www.yseq.net/product_info.php?products_id=42468

edit: I just bought the 15x without the Sanger add on. Maybe I can add that at a later date. Or maybe not.

The $740 is just a little over the listed Big Y price that Family Tree is advertising now, and with WGS you get your mtDNA along with all the other bennies that go with it. I have no idea why Family Tree hasn't started offering WGS yet; it seems they would, to stay competitive.

MacUalraig
11-12-2018, 12:24 PM
GeneByGene, their parent company, does a WGS, but it's pricey and uses a blood sample, so not really DTC. It seems that they want to have clear daylight between 'genealogical' and potentially medical testing. USD2895 *without* variant calling!

https://www.genebygene.com/pages/research#

curiousII
11-12-2018, 01:28 PM
I had some fuzzy math in my post: YSEQ's 50x is $2,170. However it is that the Krahns do it, they always seem to have affordable products compared to the competition. That puts things like WGS within reach; I read that Ozzy Osbourne spent $10,000 for his years ago. No idea which company he used.

MacUalraig
11-12-2018, 02:38 PM
The YSEQ 15x is my standard test at the moment, without Sanger confirmation. They seem to have good batching/turnaround arrangements so far too.

curiousII
01-11-2019, 10:00 PM
Just got my WGS back, no idea how to read any of this. Both Y and mtDNA haplogroups are the same as Family Tree gave me years ago; I really want to see how much (if at all) my autosomal differs between the two companies. I'll have to ask for help with that one; I sure hope it's something with a map and percentages.
____________________________________________________________________________________

Quick results summary:

1.) Mitochondrial DNA
Your mitochondrial haplogroup is H11a according to PhyloTree Build 17 (2016-02-18)

The differences to the rCRS are
195C 263G 309.1C 315.1C 750G 961G 1438G 4769G 8448C 8860G 13759A 15326G 16293G 16311C 16368C

You can download your mtDNA fasta file here: xxxxxxx

If you want, you can submit your mtDNA sequence to NCBI Genbank:
http://www.ianlogan.co.uk/checker/yseq_submission_maker.htm

2.) Y Chromosome
Your Y chromosome haplogroup is R1b-DF27 > Z2573

The downstream SNPs of Z2573 are verified as:
BY30356 A-
BY30353 A-
BY30354 C-
BY30359 C-
BY30361 G-
BY30362 G-
BY30363 A-
BY30364 T-
BY30365 T-
BY30366 A-
BY30370 C-
BY30374 C-

BY161082 A-
BY161047 C-
BY161082 A-
BY161088 G-
BY161212 C-
BY161343 T-
BY161599 G-
BY161654 G-
BY161838 G-
BY162077 T-

Z29624 T-
A6086 G-
Z29620 C-

CTS1090 C-
A6104 C-

The novel SNPs found in sample xxxx are (hg38 positions):

chrY:3886880 C T A23918
chrY:9929780 G A A23919
chrY:14563154 C T A23920
chrY:15449544 C T A23921
chrY:15663409 C T A23922
chrY:20498535 C T A23923

We can order primers for the new SNPs and we recommend to verify them with Sanger sequencing.

We have primers in stock for A23920.

Of course there are many more novel mutations identified, but they have been rejected for the reasons given in this file: xxx

Also there are various gzipped VCF files for called, cleaned, derived, INDEL and novel SNPs that can be analyzed by experts.
The Y chromosome BAM file is available here: xxxxxx

We have made good experience with the YFull team. They are cooperative and also share their findings with us.
Therefore we recommend a 3rd party analysis at https://yfull.com .
If you want to release your BAM files to YFull, just send us an email.

3.) Autosomal DNA
The complete BAM file of your genome is available here: xxxxx

Please be aware that this is a huge download of 22.4 GByte. On request we can provide a SD card with all files at a reasonable handling and shipping fee.

The complete set of extracted mutations can be found in this gzipped VCF file: xxxxx

Please let us know if you have questions.

Your order has been updated to the following status.

New status: Delivered

Please reply to this email if you have any questions.

curiousII
01-15-2019, 02:04 PM
Looks like I'm going to have to order my autosomal results on that SD card YSEQ offers for $90. I guess I should have done that to begin with but I was assuming that the results would be along the lines of FTDNA and the rest with maps and percentages.

It also looks like I'll need to pay for an interpreter to translate all this information into something that I can understand. Any idea who does that? Anyone?

MacUalraig
01-15-2019, 02:15 PM
At one stage they were offering a GEDMatch ready upload file so maybe ask them about that.

You could spend the rest of your life analysing a WGS file, depends on what you are interested in.

Robert McBride
01-15-2019, 04:12 PM
One thing I noticed when I put my YSEQ WGS autosomal results file through Promethease is that only SNPs that I was positive for, plus novel variants, were included in the results; SNPs that I am ancestral for didn't seem to be included. Or at least that's what I inferred from the fact that Promethease couldn't predict my blood type unless I used the YSEQ 23andMe-format file, because the main autosomal results file had no result for some of the SNPs used to predict it (they were included in the YSEQ 23andMe-format file). I suppose it's a way to reduce the file size. Or perhaps something went wrong with Promethease.
Anyone else notice this, or is it just me?

JamesKane
01-17-2019, 11:16 AM
There are 3 billion base locations included in your WGS test results. By default callers only look for locations where you do not match the reference genome, since the variant call file would be larger than your BAM if all sites were included. YSEQ will create a file that's in the same format as 23andMe that can be uploaded to Gedmatch, if you'd want. Only about 500-900 thousand call sites are included in that report.

For the most part they only offer analysis on the Y and mitochondrial portions of their WGS tests. This is delivered as a summary of where you fit and recommendations of any additional testing that might be useful to clear up questionable results.
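
A minimal sketch of how a 23andMe-style genotype could be filled in from a variant-only VCF. This is an illustration under assumptions, not YSEQ's actual script: the function name, the arguments and the min_depth threshold are all hypothetical. The idea is that a site absent from the VCF is written as homozygous reference, but only if the BAM actually has enough reads there; otherwise it is reported as a no-call.

```python
# Hypothetical sketch: names and the depth threshold are illustrative only,
# not taken from any vendor's pipeline.
def mock_genotype(ref_base, vcf_call, depth, min_depth=4):
    """Return a two-letter genotype for one site of a 23andMe-style report.

    ref_base: reference allele at the site, e.g. "A"
    vcf_call: None if the site is absent from the VCF, otherwise a tuple
              (alt_base, zygosity) with zygosity "het" or "hom"
    depth:    number of reads covering the site in the BAM
    """
    if vcf_call is not None:
        alt, zygosity = vcf_call
        alleles = (alt, alt) if zygosity == "hom" else (ref_base, alt)
        return "".join(sorted(alleles))
    if depth < min_depth:
        # Too few reads to assume the site matches the reference.
        return "--"
    # Site not in the VCF but well covered: assume homozygous reference.
    return ref_base * 2
```

The depth guard is the design choice worth noting: without it, a site with no reads at all would be indistinguishable from a genuinely ancestral one.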

curiousII
01-18-2019, 06:30 PM
At one stage they were offering a GEDMatch ready upload file so maybe ask them about that.

I did, and I got a response very quickly from them. YSEQ always has courteous and prompt service, and in this one respect their service is quite satisfactory. They're allowing YFull access to my results, too.

So, since comparing results is so much fun with DNA tests, here's my Genesis GEDmatch from my YSEQ test:

Population

North_Atlantic: 45.16 Pct
Baltic: 26.98 Pct
West_Med: 13.36 Pct
West_Asian: 3.43 Pct
East_Med: 7.28 Pct
Red_Sea: 0.85 Pct
South_Asian: 1.25 Pct
East_Asian: -
Siberian: 0.86 Pct
Amerindian: 0.47 Pct
Oceanian: -
Northeast_African: 0.20 Pct
Sub-Saharan: 0.16 Pct

My pre-Genesis GEDmatch results from my FTDNA test, which remained static except for a reduction of my SSA from 0.16 to 0.15. Have to double-check to see if there's any other changes, but this is what it says now:

Population

North_Atlantic: 44.08 Pct
Baltic: 27.57 Pct
West_Med: 13.67 Pct
West_Asian: 3.04 Pct
East_Med: 8.05 Pct
Red_Sea: 0.70 Pct
South_Asian: 1.39 Pct
East_Asian: -
Siberian: 0.91 Pct
Amerindian: 0.33 Pct
Oceanian: -
Northeast_African: 0.10 Pct
Sub-Saharan: 0.15 Pct

I'm waiting for my new YFull/YSEQ results, that might be a few days, but I uploaded to DNA.Land and really got a difference. Matter of fact, I should post that in the DNA.Land forum. Anyhow, this was really nice. Affordable WGS, a sign of a happy future.

MacUalraig
02-12-2019, 07:26 AM
I got my own WGSx30 back overnight; my sample got back to the lab on Dec 14th. The Y BAM is downloadable at 343 MB but the full BAM is 32 GB, so I will be awaiting the postal version of that. A brief skim through the reports indicates they didn't find any new good-quality Y SNPs that I didn't have already from my Y Elite 1.0, but I'm uploading it to YFull for a second opinion.

JamesKane
02-16-2019, 12:44 AM
Adding a set of histograms for YSEQ's WGS products as pertains to the Y chromosome. I may publish a link to a blog post that does all contigs with my 30x sample, but don't hold your breath. :biggrin1:

15x WGS: (Note: Two of these really seem to be 20x)
https://haplogroup-r.org/data/histograms/YSEQ-15x%20WGS.png

30x WGS:
https://haplogroup-r.org/data/histograms/YSEQ-30x%20WGS.png

50x WGS:
https://haplogroup-r.org/data/histograms/YSEQ-50x%20WGS.png

MacUalraig
02-26-2019, 09:29 AM
Got my disc with my full WGS bam yesterday so plenty to play around with now.

As regards coverage it seems to vary quite a bit and one x15 kit seems to have got as good depth as my x30 on the Y.
Mine: mean 14.67x median 14x (roughly the expected value for a WGSx30)
x15: mean 14.14x median 14x

Range seen to date on x15: mean 6.31-20.68x, median 4x-18x
DNA volume: 14 to 215 microL
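
For anyone wanting to reproduce mean and median figures like these, here is a rough sketch. It assumes per-position depth lines in the three-column, tab-separated layout that `samtools depth` prints (contig, position, depth); the function name and the "chrY" contig label are assumptions, and the figures quoted above may have been computed over a different set of positions.

```python
import statistics

def depth_summary(lines, contig="chrY"):
    """Return (mean, median) read depth for one contig.

    lines: iterable of "contig<TAB>position<TAB>depth" strings.
    Returns None if no positions for the contig are present.
    """
    depths = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != contig:
            continue  # skip blank lines and other contigs
        depths.append(int(fields[2]))
    if not depths:
        return None
    return statistics.mean(depths), statistics.median(depths)
```

Usage would be along the lines of `depth_summary(open("depth.txt"))`, where depth.txt is a hypothetical file holding the depth output. Whether zero-coverage positions are present in that file changes both numbers, so it is worth checking how the depth list was generated before comparing kits.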

JamesKane
03-02-2019, 02:08 PM
A very short blog post containing coverage histograms for the 30x WGS option on all chromosomes is now up. A Bit About WGS Testing (http://www.it2kane.org/2019/03/a-bit-about-wgs-testing/)

While the sample is sourced from YSEQ, any test with 150 base pair reads running on Illumina equipment will look about the same.

MacUalraig
03-07-2019, 03:35 PM
Has anyone other than me tried doing a comparison between a real 23andMe report and YSEQ's mocked-up 23andMe-style report? I started with this as it seemed like a more manageable dataset, but was surprised, to be honest, how many differences there are. Going through the whole thing could be quite a big task.

In one instance I found the YSEQ "23" call was contradicted not just by 23andMe and my new Dante VCF but by the YSEQ BAM file! However, despite looking good in IGV, it is *missing* from the YSEQ VCF, so presumably it wasn't called and the 23 script then just wrote it up as ancestral AA instead of AG.

JamesKane
03-07-2019, 06:56 PM
Are you looking at the hg38 BAM? The 23andMe-mock report is done with an hg19/hg37 reference.

MacUalraig
03-07-2019, 10:08 PM
The comparison I am talking about is between the hg19 '23andme' mockup report from YSEQ and my real hg19 23andMe download file. So of course, any check I do in the hg38 BAM I have to convert. In fact in the specific case above I first used the hg38 coords from dbSNP and then double checked it manually using LiftOver to make sure.

There are several thousand diffs - don't have the exact figure to hand because the original query I did included the ones 23andMe genotype as 'II' or 'DD'. There are probably multiple issues accounting for them, but now I'm aware that it includes a subset which weren't called in the VCF I may just not bother.

If you have the same two files available for yourself I'd be interested to hear how you got on.
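
A rough sketch of the kind of comparison described above, for anyone with both files. It assumes the usual 23andMe raw-data layout (lines starting with # are comments, then whitespace-separated rsID, chromosome, position, genotype); the function names are made up for illustration. Genotypes containing I, D or - are skipped, and allele order is ignored so that CT and TC count as the same call. It does not attempt to detect strand flips.

```python
def load_calls(lines):
    """Parse 23andMe-style lines into {rsid: (chrom, pos, genotype)}."""
    calls = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        rsid, chrom, pos, genotype = line.split()[:4]
        # Sort the alleles so "TC" and "CT" compare equal.
        calls[rsid] = (chrom, int(pos), "".join(sorted(genotype)))
    return calls

def diff_calls(a, b, skip=frozenset("ID-")):
    """List (rsid, genotype_a, genotype_b) for shared rsIDs that disagree."""
    diffs = []
    for rsid in sorted(a.keys() & b.keys()):
        ga, gb = a[rsid][2], b[rsid][2]
        if set(ga) & skip or set(gb) & skip:
            continue  # indel codes and no-calls are not compared
        if ga != gb:
            diffs.append((rsid, ga, gb))
    return diffs
```

Usage would be something like `diff_calls(load_calls(open(mock_path)), load_calls(open(real_path)))`, with both paths being placeholders for your own files. Matching on rsID rather than position sidesteps any build mismatch, at the cost of missing sites that the two files name differently.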

MacUalraig
03-07-2019, 10:21 PM
Just in case anyone does have the two files I refer to, the example I was talking about was chr6:104503363 (hg19) where my mockup file listed me as AA but 23andMe have me as AG. Dante VCF has 6A 10G and the YSEQ BAM has 7A 5G.

The rsID is rs4145453 and hg38 pos is 104055488 which is not in my VCF (37_hg38.vcf).

https://www.ncbi.nlm.nih.gov/snp/rs4145453

I suspect there are more interesting categories of issues than this one though, if I looked further into it.
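
To make the read counts above concrete, here is a deliberately naive sketch of calling a genotype from allele counts by allele balance. Real callers use base and mapping qualities and a proper likelihood model; the function name and the 20% heterozygous threshold here are arbitrary assumptions chosen only for illustration.

```python
def naive_call(counts, het_min=0.2):
    """Call a two-letter genotype from per-base read counts.

    counts:  dict such as {"A": 7, "G": 5}
    het_min: minimum fraction of reads the second allele needs
             for the site to be called heterozygous (assumed value)
    """
    total = sum(counts.values())
    if total == 0:
        return "--"  # no reads, no call
    # Two most frequent bases, ties broken alphabetically.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2]
    if len(top) == 2 and top[1][1] / total >= het_min:
        return "".join(sorted(base for base, _ in top))
    return top[0][0] * 2
```

With this rule both 6A/10G and 7A/5G come out as AG, which is consistent with the 23andMe result rather than the AA in the mockup file.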

Robert McBride
03-08-2019, 12:09 AM
I did a text search for rs4145453 in my YSEQ mockup 23andMe report and in my 23andMe v4, FF and AncestryDNA raw data files, and I can't find it in any of them.

JamesKane
03-08-2019, 02:23 AM
Not covered in my 23andMe V3 results. As for the SNP, it shows as fully derived G with 30 reads, except for an oddball T read that I think is assigned to the wrong location due to a single-base insertion at 104,055,478.

MacUalraig
03-08-2019, 07:04 AM
My 23andMe test was v5, although the comparison I'm discussing could of course be done against any version, as it's the diffs I am querying.

MacUalraig
03-08-2019, 10:32 AM
I did a text search for rs4145453 in my Yseq mockup 23andme report, ....

Odd especially in view of my finding that some calls may have been simply assumed unless you perhaps had no reads in your BAM? These are the 5 entries either side

rs859982 6 104500232 CT
rs73002415 6 104501109 CT
rs859980 6 104501198 CC
rs859978 6 104502187 CC
rs6906370 6 104503032 CT
rs4145453 6 104503363 AA
rs6929295 6 104503485 AG
rs9499731 6 104509384 AA
rs859969 6 104512664 GT
rs859967 6 104512842 AG
rs860932 6 104513069 GT

How does this selection compare with your file? (no need to post the genotypes can't be arsed to edit them)

Robert McBride
03-08-2019, 09:46 PM
Odd especially in view of my finding that some calls may have been simply assumed unless you perhaps had no reads in your BAM? These are the 5 entries either side


How does this selection compare with your file? (no need to post the genotypes can't be arsed to edit them)

This is the 23andme V3 yseq mockup

rs9499726 6 104484800 CT
rs11754542 6 104485747 GG
rs9499727 6 104488425 TT
rs9499729 6 104491819 GG
rs6919443 6 104493098 AG
rs7753362 6 104494946 TT
rs1736577 6 104496518 AG
rs859985 6 104497838 GT
rs9404502 6 104499121 CC
rs859982 6 104500232 CC
rs859980 6 104501198 CT
rs6906370 6 104503032 CT
rs9499731 6 104509384 AA
rs859969 6 104512664 TT
rs859967 6 104512842 GG
rs860932 6 104513069 TT
rs11156355 6 104513200 CC
rs865803 6 104514169 CT
rs9499734 6 104515333 AA
rs9391148 6 104517073 AG
rs9377599 6 104518126 TT
rs12110847 6 104518954 TT

23andme v4
rs9499726 6 104484800 CT
rs9499729 6 104491819 GG
rs6919443 6 104493098 AG
rs7753362 6 104494946 TT
rs1736577 6 104496518 AG
rs9404502 6 104499121 CC
rs859982 6 104500232 CC
rs859967 6 104512842 GG
rs860932 6 104513069 TT
rs865803 6 104514169 CT
rs12110847 6 104518954 TT
rs11156357 6 104520128 AG



Ancestrydna
rs9499726 6 104484800 T C
rs9499727 6 104488425 T T
rs6919443 6 104493098 A G
rs1736577 6 104496518 A G
rs9404502 6 104499121 C C
rs859982 6 104500232 C C
rs859980 6 104501198 0 0
rs6906370 6 104503032 T C
rs859969 6 104512664 T T
rs859967 6 104512842 G G
rs11156355 6 104513200 C C
rs9391148 6 104517073 A G
rs9377599 6 104518126 T T
rs12110847 6 104518954 T T
rs9386402 6 104519786 T C
rs11156357 6 104520128 A G
rs9499736 6 104521604 T C