
GEDmatch Genesis - big problem with false matches



lukaszM
11-14-2017, 12:20 AM
Recently I uploaded a few low-coverage genomes to GEDmatch Genesis.

One was a Rapanui genome generated from an 11 MB BAM file, so it has only a few hundred SNPs in Oracle...
But this kit had an enormous number of very significant matches. Clearly all no-calls were treated by the algorithm as matching segments, and in this case they made up about 99.9% of the whole genome.

I have seen the same thing with a few other genomes. The question is: when will GEDmatch fix it? It is a serious error in the matching algorithm. Typical samples from genetic testing companies obviously don't have a significant number of no-calls, but right now uploading ancient genomes, and some modern genomes from official datasets, is pointless.
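The failure mode described above can be illustrated with a toy half-match scanner. GEDmatch has not published the Genesis matching code, so this is a hypothetical sketch assuming a simple rule: a segment continues as long as no SNP shows opposite homozygotes (IBS0). Real tools measure segments in cM and apply minimum SNP and cM thresholds, which this ignores. The function names are illustrative only; the single switch `nocall_counts_as_match` is the whole point.

```python
from typing import List, Optional, Sequence, Tuple

# A genotype is a pair of alleles; None marks a no-call.
Genotype = Optional[Tuple[str, str]]


def is_half_match(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True when two genotypes share at least one allele (IBS >= 1)."""
    return bool(set(a) & set(b))


def longest_half_match_run(
    kit1: Sequence[Genotype],
    kit2: Sequence[Genotype],
    nocall_counts_as_match: bool,
) -> int:
    """Longest run of SNP positions with no opposite homozygotes.

    Hypothetical matcher: counts SNPs, not cM, and has no minimum thresholds.
    """
    best = current = 0
    for g1, g2 in zip(kit1, kit2):
        if g1 is None or g2 is None:
            if nocall_counts_as_match:
                # The suspected flaw: missing data silently extends the segment.
                current += 1
                best = max(best, current)
            # Otherwise the site is skipped: it neither extends nor breaks the run.
            continue
        if is_half_match(g1, g2):
            current += 1
            best = max(best, current)
        else:
            # Opposite homozygotes (e.g. AA vs GG) end the segment.
            current = 0
    return best


# A fully called kit versus a low-coverage kit with only 3 called SNPs out of 1000.
full_kit: List[Genotype] = [("A", "A")] * 1000
sparse_kit: List[Genotype] = [None] * 1000
for i in (100, 400, 700):
    sparse_kit[i] = ("A", "G")

print(longest_half_match_run(full_kit, sparse_kit, nocall_counts_as_match=True))   # 1000
print(longest_half_match_run(full_kit, sparse_kit, nocall_counts_as_match=False))  # 3
```

With no-calls counted as matches, three real comparisons produce a 1000-SNP "segment"; skipping no-calls leaves only the three SNPs actually compared, far below any sensible matching threshold.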

Babyj
11-17-2017, 10:37 PM
I have a match in my Genesis list with someone who does not match me on FTDNA or 23andMe. Could that be the same problem?

RobinBMc
11-18-2017, 01:57 AM

Did they test at FTDNA and 23andMe? What does the one-to-one comparison at Genesis say? I have noticed a few people on my Genesis one-to-many list who supposedly share a significant amount of DNA with me (about 30 cM, if I recall), but the one-to-one comparison with them turns up nothing above 7 cM, so something is obviously wrong.

firemonkey
11-18-2017, 09:31 AM
Comparing Kit UB1655155 () [FTDNA] and SX8786608 (Loschbour) [Luk Genomics]

Chr | Start position | End position | cM   | SNPs
16  | 79,416,563     | 83,270,617   | 10.6 | 369


Largest segment = 10.6 cM

Total Half-Match segments (HIR) = 10.6 cM (0.3 Pct)
Estimated number of generations to MRCA = 5.2

1 shared segments found for this comparison.

110207 SNPs used for this comparison.

53.9 Pct SNPs are full identical

Donwulff
11-18-2017, 04:03 PM
The main problem seems to be sequences that list only alt-allele calls. Genos Research and Dante Labs at least do this, and with exomes it's obviously far worse, because there's little overlap with the microarray kits. From the user's point of view this can be fixed either by obtaining a gVCF file that also lists the ref-calls (for example by re-calling on sequencing.com; actually I'm not sure right now whether GEDmatch processes gVCFs correctly) or, if the BAM is not available, by imputing the missing genotypes (or you could merge/force calls at dbSNP sites).
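The core issue is that a variants-only VCF cannot tell "same as the reference" apart from "not sequenced at all", so a site missing from the file must not be assumed homozygous reference. A gVCF resolves this with reference blocks: stretches of positions confidently called as reference. The sketch below shows that decision on toy data. The function names and the simplified inclusive `(start, end)` block representation are hypothetical; real gVCF handling would also need to check genotype quality, multi-allelic sites and that both files use the same reference build.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

Genotype = Optional[Tuple[str, str]]  # None marks a no-call


def in_ref_block(pos: int, blocks: List[Tuple[int, int]]) -> bool:
    """True if pos lies in one of the sorted, non-overlapping, inclusive
    (start, end) reference blocks taken from a gVCF."""
    starts = [start for start, _ in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    return i >= 0 and pos <= blocks[i][1]


def genotypes_at_array_sites(
    sites: Dict[int, str],
    alt_calls: Dict[int, Tuple[str, str]],
    ref_blocks: List[Tuple[int, int]],
) -> Dict[int, Genotype]:
    """sites maps array position -> reference allele; alt_calls holds the
    records of a variants-only VCF."""
    result: Dict[int, Genotype] = {}
    for pos, ref in sites.items():
        if pos in alt_calls:
            result[pos] = alt_calls[pos]
        elif in_ref_block(pos, ref_blocks):
            result[pos] = (ref, ref)  # confidently homozygous reference
        else:
            result[pos] = None  # not covered: unknown, NOT homozygous reference
    return result


sites = {100: "A", 250: "C", 900: "G"}
alt_calls = {250: ("C", "T")}
ref_blocks = [(1, 200), (260, 600)]
print(genotypes_at_array_sites(sites, alt_calls, ref_blocks))
# {100: ('A', 'A'), 250: ('C', 'T'), 900: None}
```

Without the reference blocks, the only safe outputs are the alt call at 250 and no-calls elsewhere; filling position 900 with `('G', 'G')` would be a guess, which is exactly what imputation tries to make statistically.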

However, storing all those ref-calls will likely fill up GEDmatch's database; it already seems to have significant trouble even receiving whole-genome sequences. It depends on their internal implementation: actually storing all the ref-calls, especially for ALL locations, would be extremely resource-intensive. There's plenty of literature on storing whole genomes efficiently, though.
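To put a rough number on "extremely resource-intensive", here is a back-of-envelope estimate. All three constants are assumed round figures (about 3.1 billion reference positions, about 700,000 SNPs on a consumer microarray, 2 bits per genotype with no compression), not GEDmatch's actual storage format.

```python
# Back-of-envelope figures; all three constants are assumptions.
GENOME_POSITIONS = 3_100_000_000  # roughly the length of the GRCh37 reference
ARRAY_SNPS = 700_000              # roughly a consumer microarray
BITS_PER_GENOTYPE = 2             # 4 states: hom-ref, het, hom-alt, no-call


def packed_size_bytes(n_sites: int, bits: int = BITS_PER_GENOTYPE) -> int:
    """Uncompressed size of a bit-packed genotype vector."""
    return n_sites * bits // 8


genome_bytes = packed_size_bytes(GENOME_POSITIONS)
array_bytes = packed_size_bytes(ARRAY_SNPS)
print(genome_bytes)  # 775000000, about 775 MB per kit
print(array_bytes)   # 175000, about 175 KB per kit
print(round(genome_bytes / array_bytes))  # 4429
```

Under these assumptions a whole-genome kit is several thousand times larger than an array kit. Compression, for example run-length encoding long reference stretches as gVCF reference blocks do, would shrink the genome figure considerably.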

RobinBMc
11-18-2017, 05:00 PM
Here's my dad's top one-to-many match on Genesis (after me and his dad):

Largest Seg | Total cM | Gen | Overlap | Date Compared | Testing Company
14.2184     | 30.8311  | 4.4 | 146969  | 2017-10-12    | 23andMe

Yet the one-to-one comparison shows:

Largest segment = 0.0 cM
Total Half-Match segments (HIR) = 0.0 cM (0.0 Pct)
No shared DNA segments found
146991 SNPs used for this comparison.
54.6 Pct SNPs are full identical

My dad tested with AncestryDNA - this match is from 23andMe.

Further down the list:

Largest Seg | Total cM | Gen | Overlap | Date Compared | Testing Company
11.3349     | 18.517   | 4.8 | 150330  | 2017-11-05    | Living DNA

One-to-one:

Largest segment = 0.0 cM
Total Half-Match segments (HIR) = 0.0 cM (0.0 Pct)
No shared DNA segments found
150352 SNPs used for this comparison.
55.1 Pct SNPs are full identical

And another:

Largest Seg | Total cM | Gen | Overlap | Date Compared | Testing Company
10.0999     | 17.2785  | 4.8 | 149930  | 2017-11-05    | Living DNA

One-to-one:

Largest segment = 0.0 cM
Total Half-Match segments (HIR) = 0.0 cM (0.0 Pct)
No shared DNA segments found
149952 SNPs used for this comparison.
55.0 Pct SNPs are full identical

There's more, but obviously this gets repetitive. There was also one from Genes for Good that did the same thing. He doesn't have many matches from Ancestry on Genesis, but the few he does have all seem fine, so maybe it's a problem comparing across different companies? But why does it happen only some of the time?

Askye
01-22-2018, 11:59 PM
Re firemonkey's Loschbour comparison above (Chr 16, 10.6 cM, 369 SNPs): did you ever figure out whether your result was valid or not?
I get:
SX8786608 (Loschbour) [Luk Genomics]
Chr | Centimorgans (cM) | SNPs
19  | 10.4              | 291
I also have access to raw files from people close to me who aren't related to me. They're primarily European and get no match unless I drop the cM threshold below 2-3.

Nive1526
01-23-2018, 12:03 PM
Genesis uses very low SNP thresholds for low-overlap kits. 200 SNPs can line up by chance quite easily, so the match list fills up with a huge number of false-positive matches.
And many of those are located in regions with little natural variation.
For example, if I run a segment search of my chromosome 1 against my first 1000 autosomal matches, I get about 150 overlapping kits.
Roughly 80 of them match at positions 0-3 million and 30 at 242-246 million.
So around 75% of the matches are located within the same three percent of the chromosome.
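A simple probability model shows why low-variation regions attract chance matches. Assuming Hardy-Weinberg equilibrium, independent SNPs (no linkage disequilibrium) and a single minor-allele frequency (MAF) for every SNP, two unrelated people break a half-match only at opposite homozygotes, with probability 2q²(1-q)² per SNP. These are simplifying assumptions, not GEDmatch's model; real linkage disequilibrium makes chance runs even more likely. The second part just re-checks the chromosome 1 figures above, assuming a GRCh37 chromosome 1 length of about 249.25 Mb.

```python
def p_opposite_homozygotes(maf: float) -> float:
    """P(IBS0) at one biallelic SNP for two unrelated people under
    Hardy-Weinberg equilibrium: one is AA and the other aa, in either order."""
    q = maf
    return 2 * q**2 * (1 - q) ** 2


def p_chance_half_match_run(n_snps: int, maf: float) -> float:
    """Chance that n consecutive SNPs contain no opposite homozygote, assuming
    independent SNPs that all share one minor-allele frequency (no LD)."""
    return (1 - p_opposite_homozygotes(maf)) ** n_snps


# 200 highly variable SNPs versus 200 low-variation SNPs.
print(f"MAF 0.50: {p_chance_half_match_run(200, 0.50):.1e}")
print(f"MAF 0.05: {p_chance_half_match_run(200, 0.05):.2f}")

# Re-checking the chromosome 1 clustering figures.
overlapping_kits = 150
clustered_kits = 80 + 30                  # matches at 0-3 Mb plus at 242-246 Mb
clustered_mb = (3 - 0) + (246 - 242)      # 7 Mb in total
CHR1_MB = 249.25                          # assumption: GRCh37 chromosome 1 length

share_of_matches = clustered_kits / overlapping_kits
share_of_chromosome = clustered_mb / CHR1_MB
print(f"{share_of_matches:.0%} of matches in {share_of_chromosome:.1%} of chr1")
# 73% of matches in 2.8% of chr1
```

Under this model a 200-SNP run of highly variable SNPs is practically impossible by chance (a few in a trillion), while a 200-SNP run of low-MAF SNPs happens roughly 40% of the time, which is consistent with low SNP thresholds producing clustered false positives in low-variation regions.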

kingjohn
01-24-2018, 03:25 PM


From their site:
About the close 'Exome' matches
Are you puzzled by the new, very close match you are seeing with somebody? Does it have "Exome" in the name? Even if it does not say "Exome", there are a few "Exome" kits on Genesis. Exome kits are different than the "genealogy" kits we are used to dealing with. The exome regions of the chromosome have much less difference from one individual to the next. Because of that, they appear as close matches to more people.

We apologize for the "false match". We plan to provide some means of differentiating these "exome matches" from the real thing, but it will take a while to get it in place.

Oleg (Rus)
01-26-2018, 12:24 PM
Matches at Genesis are strange and unreliable in any case, not only this time.