23andme SNP accuracy falls short compared to Ancestry and Geno 2+



aistea
07-14-2016, 05:44 PM
I've done tests with multiple companies and so far I've received results from 23andme, Ancestry and Geno 2+.

I compared the raw data from each result against the others and noticed that while Ancestry and Geno 2+ give nearly identical results for overlapping SNPs, 23andme doesn't. Here are the test results:

Ancestry v2 vs Geno 2+
410409 SNPs appear in both files
0.004 % (15) of the 410409 SNP results differ

Ancestry v2 vs 23andme
298665 SNPs appear in both files
0.086 % (257) of the 298665 SNP results differ

Geno 2+ vs 23andme
303283 SNPs appear in both files
0.092 % (278) of the 303283 SNP results differ
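For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how it could be done. It is not the script used for the numbers above. It assumes tab-separated raw-data lines with the columns rsid, chromosome, position and then either one genotype column (e.g. "AG") or two allele columns (e.g. "A" and "G"), and it assumes no-calls are written as "--" or "0"/"0". The names `parse_raw` and `compare` are made up for this example. No-calls are left out of the overlap, and haploid calls (a single letter on X/Y) are not handled specially.

```python
# Genotype strings treated as "no call" (an assumption, adjust per file format).
NO_CALLS = {"--", "00", ""}

def parse_raw(lines):
    """Map rsid -> genotype with alleles sorted, so "GA" and "AG" compare equal."""
    calls = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines and a possible header row.
        if not line or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue
        # One genotype column ("AG") or two allele columns ("A", "G").
        genotype = "".join(fields[3:5]).upper()
        if genotype in NO_CALLS:
            continue
        calls[fields[0]] = "".join(sorted(genotype))
    return calls

def compare(calls_a, calls_b):
    """Return (number of shared SNPs, sorted list of rsids whose calls differ)."""
    shared = calls_a.keys() & calls_b.keys()
    differing = sorted(r for r in shared if calls_a[r] != calls_b[r])
    return len(shared), differing

if __name__ == "__main__":
    file_a = ["# comment", "rs1\t1\t100\tAG", "rs2\t1\t200\t--", "rs3\t1\t300\tCC"]
    file_b = ["rsid\tchromosome\tposition\tallele1\tallele2",
              "rs1\t1\t100\tG\tA", "rs3\t1\t300\tC\tT", "rs4\t1\t400\tT\tT"]
    n_shared, diffs = compare(parse_raw(file_a), parse_raw(file_b))
    print(f"{n_shared} SNPs appear in both files")
    print(f"{100 * len(diffs) / n_shared:.3f} % ({len(diffs)}) of the {n_shared} SNP results differ")
```

With real files you would pass the opened file objects to `parse_raw` instead of the small lists used here.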

As it is very unlikely that Ancestry and Geno 2+ made exactly the same mistakes, I can assume that they are correct. That means that roughly 0.1% of all 23andme SNP results are wrong. So roughly 600 SNPs would be wrong. I am wondering whether I just fell into a batch with many errors or if 23andme simply produces more errors than the others.

In case you're interested in no-calls:

23andme no-calls: 15862
Ancestry no-calls: 2860
Geno 2+ no-calls: 10265

wombatofthenorth
07-19-2016, 02:00 AM
Don't forget that some Geno SNPs are reverse strand though.
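A quick sketch of what a strand-aware check could look like, in case it helps. It treats two calls as matching if they agree directly or after complementing one of them (A<->T, C<->G). The function name is made up for this example, and it assumes plain two-letter genotypes. Note that for A/T and C/G SNPs a strand flip cannot be detected from the genotype alone, so this check cannot tell a real difference from a flip in those cases.

```python
# Translation table for the reverse-strand complement of each base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def same_call_either_strand(genotype_a, genotype_b):
    """True if the calls match as given or after flipping genotype_b's strand."""
    a = "".join(sorted(genotype_a.upper()))
    b = "".join(sorted(genotype_b.upper()))
    flipped = "".join(sorted(genotype_b.upper().translate(COMPLEMENT)))
    return a == b or a == flipped

if __name__ == "__main__":
    print(same_call_either_strand("AG", "CT"))  # CT is AG read from the other strand
    print(same_call_either_strand("AA", "CC"))  # differs on either strand
```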

Cofgene
07-19-2016, 11:13 AM
The results would be expected to be different since you are not dealing with exactly the same probes for some of the SNPs. It IS likely that some calls can be incorrect across multiple chips due to the nature of the probe and/or an incorrect initial reference to the variant. In order to determine if a probe is inaccurate you would need to validate the call using a different methodology. Get a WGS result and see if that indicates better agreement with some of the chip-based results.

aistea
07-23-2016, 08:41 AM
The results would be expected to be different since you are not dealing with exactly the same probes for some of the SNPs. It IS likely that some calls can be incorrect across multiple chips due to the nature of the probe and/or an incorrect initial reference to the variant. In order to determine if a probe is inaccurate you would need to validate the call using a different methodology. Get a WGS result and see if that indicates better agreement with some of the chip-based results.

So you believe that Ancestry, Geno 2+ and 23andme all have approximately the same number of errors, but Ancestry and Geno 2+ only seem to have fewer because they miraculously made the same errors for some SNPs, even though each has its own custom-made chip?

aistea
07-23-2016, 08:43 AM
Don't forget that some Geno SNPs are reverse strand though.

That wasn't an issue in my analysis.

Cofgene
07-23-2016, 11:54 AM
So you believe that Ancestry, Geno 2+ and 23andme all have approximately the same number of errors, but Ancestry and Geno 2+ only seem to have fewer because they miraculously made the same errors for some SNPs, even though each has its own custom-made chip?

How do you define an error? How do you know which one is correct? What have you done to validate your results to establish a correct answer?

The Ancestry v2 chip would be expected to contain some of the same errors present in FamilyFinder, Ancestry v1, and 23andme v2 and v3 due to the platform being used. When the panels were assembled they were based on information present in dbSNP and the literature. We know that there is wrong information present in dbSNP from various studies where the QC efforts around variant identification left something to be desired.

aistea
07-24-2016, 07:50 AM
How do you define an error? How do you know which one is correct? What have you done to validate your results to establish a correct answer?

The Ancestry v2 chip would be expected to contain some of the same errors present in FamilyFinder, Ancestry v1, and 23andme v2 and v3 due to the platform being used. When the panels were assembled they were based on information present in dbSNP and the literature. We know that there is wrong information present in dbSNP from various studies where the QC efforts around variant identification left something to be desired.

It's very simple. It's playing with odds. Ancestry and Geno share 400k SNPs. Only 15 of them are different. If you assume that they made the same mistakes on some other SNPs, then you are betting against incredible odds. You need to realize that each SNP can have at least 5 different combinations. Making the same mistake, out of 5 possible combinations, on the same SNP out of 400k is nearly impossible. It's like winning the lottery worldwide in every single country at the same time. It would be a miracle. So one must assume that they are correct. The fact that 23andme has roughly 20 times more differences compared to each of Ancestry and Geno 2+ just shows that it has way more errors.

It's not a guarantee, but neither is Einstein's theory of relativity. To this day it's not 100% proven, but even the mobile phone networks wouldn't work if Einstein's theory were wrong. It's playing with odds. At some point you simply have to assume that something is correct if the opposite, even if possible, would be a miracle.
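To put a rough number on the odds argument, here is a small sketch. It only holds if errors on the two chips are independent of each other, which is an assumption and not something the comparison itself proves. The 0.1% per-chip error rate is a hypothetical input chosen for illustration, and the 4 wrong alternatives come from the "5 different combinations" figure minus the correct one. The function name is made up for this example.

```python
def expected_shared_errors(n_snps, error_rate, wrong_alternatives):
    """Expected number of SNPs where two chips make the SAME wrong call.

    Assumes each chip errs independently with probability error_rate per SNP
    and picks one of wrong_alternatives wrong genotypes uniformly at random.
    """
    # Both chips wrong: error_rate squared. Same wrong call: 1 / wrong_alternatives.
    return n_snps * error_rate ** 2 / wrong_alternatives

if __name__ == "__main__":
    # 410409 overlapping SNPs between Ancestry v2 and Geno 2+ (from the results above).
    print(expected_shared_errors(410409, 0.001, 4))
```

Under these assumptions the expected number of identical errors comes out at roughly 0.1 across all 410409 shared SNPs. If the errors are not independent, the calculation no longer applies.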

geebee
07-24-2016, 08:04 AM
"It's like winning the lottery worldwide in every single country at the same time. It would be a miracle."

Supposing that each country used the same software, it wouldn't necessarily take a miracle. It might merely be a software glitch.

Similarly, there are reasons why different companies might experience issues on the same SNPs. Testing errors aren't necessarily random.

JaG
07-24-2016, 08:09 AM
Both Ancestry and Geno 2+ use the Illumina OmniExpress platform. They are expected to get very similar output.

geebee
07-24-2016, 08:22 AM
So far as I know, 23andMe, FTDNA, and Ancestry all use an Illumina chip. Granted, the chips are customized for each company, but starting from the same base. The processing for Geno is done through FTDNA.

EDIT: And I see that someone beat me to it, with better detail.

aistea
07-24-2016, 04:33 PM
"It's like winning the lottery worldwide in every single country at the same time. It would be a miracle."

Supposing that each country used the same software, it wouldn't necessarily take a miracle. It might merely be a software glitch.

Similarly, there are reasons why different companies might experience issues on the same SNPs. Testing errors aren't necessarily random.

Aha, ok. This conversation is going in a weird direction. I don't really believe in miracles and I think I'm done discussing it with you. Maybe about another topic sometime. I'm sorry.

Btw., you have exactly the same paternal haplogroup as me.