DNA.Land Relationship Matches are impossible



J1 DYS388=13
10-12-2015, 03:16 PM
I've had three contacts from "2nd or 3rd cousin" DNA matches at DNA.Land. None could possibly be related to me that closely. Two are from countries where my ancestors never set foot (Germany, Denmark, Sweden).

What are you other DNA.Land users seeing?

MikeWhalen
10-12-2015, 07:41 PM
still waiting for that part of the analysis

Mike

N21163
10-12-2015, 08:01 PM
I have had the same result.

Looking at the chromosome browser on the relationship chart, they seem to have identified segments which are deemed ancient (green) and recent segments (orange).

Does anyone know what methods DNA.Land used to make such distinctions?

Following this they appear to just tally up the total cM shared and slap a label of "3rd cousin" degree of separation on it.

If the ancient and recent distinction is correct, then our matches are very, very distant relations.

On another note this could be why Felix Immanuel was able to find matches between ancient DNA samples and living people:

http://www.y-str.org/p/ancient-dna.html

Ultimately, I think they need to do some further analysis.

evon
10-12-2015, 10:31 PM
The DNA.Land team is greatly indebted to dnagenealogy for this awesome post that highlights issues in our relative matching report. As we mentioned earlier, our website is still in an early beta version. We are going to discuss this post in detail to tweak our algorithms and improve the quality of our matches.

http://dnagenealogy.tumblr.com/post/130934539888/dnaland-vs-gedmatch

Lirio100
10-12-2015, 11:29 PM
I'm not sure the relationship part can work effectively at this early date. Right now there are only 4700+ genomes registered, according to the login page. What are the odds of people being related in a genealogical time frame with just that number? Mine are still "in process" after more than 48 hours and I suspect they will be for some time. I also expect that should I get any matches, they will be due more to Ashkenazim endogamy than anything else, as happened on both FTDNA and 23andme. I entered my genome simply to see what the ancestry breakdown would be; it will be interesting to watch as they refine their parameters.

Huntergatherer1066
10-13-2015, 12:07 AM
I think it is going to take some patience. They have a lot of tweaking and fixes to do, and this project has just started. I've only uploaded my mother's so far, and I'm going to wait until things settle down before I upload anyone else. I uploaded my mother's yesterday afternoon, and while her ancestry percentages are back, the relationship matches are still pending.

khanabadoshi
10-13-2015, 01:32 AM
https://i.gyazo.com/b1bd19203f9db2c85a26d15f1c356f7d.png

Táltos
10-13-2015, 04:03 AM
This was posted tonight in the ISOGG Facebook group by Yaniv Erlich.

Hi All,
Thank you all for the discussions about DNA.Land features, methods, and issues.
As we work on improving our results, I thought to provide some information about the algorithmic steps behind relative matching. Please notice that this is a work in progress and things are likely to change.
1. Files uploaded to the website are converted to the standard format of build 37/hg19.
2. The files are phased using SHAPEIT and imputed using IMPUTE2 using the samples in the 1000Genomes phase1 data.
IMPUTE2 Reference: journals.plos.org/plosgenetics/article…
For imputation accuracy see:
https://mathgen.stats.ox.ac.uk/…/data_download_1000G_phase1…
3. Out of the 39 million imputed SNPs, we select 4 million SNPs. This list is pre-determined and fixed for all samples. The SNPs (i) have MAF>5% across all human populations, (ii) are bi-allelic, and (iii) are not in repeat regions.
4. We use GERMLINE to find IBD matches between the phased haplotypes of pairs of samples using the four million SNPs.
GERMLINE reference: http://genome.cshlp.org/content/19/2/318.long
[One issue we discovered is that our current parameters with GERMLINE are a bit sensitive to imputation errors. This creates breaks in long IBD segments, as was described in the excellent post of dnagenealogy. Another issue is that GERMLINE produces false positive matches in specific segments of the genome. Dr. Tris Hayeck is working on that.]
5. The IBD segments are fed to ERSA to estimate the most likely number of meiosis events. ERSA performs hypothesis testing that classifies IBD segments as "ancient" or "recent" based on their length. "Ancient" segments refer to short IBD segments that segregate in unrelated individuals and tell nothing about relatedness. Only "Recent" segments are scored towards relatedness. [There was some discussion that we use a very short cutoff (3cM) as part of the IBD detection. These short segments are (almost) always classified as ancient and do not confound our model. However, please note that the relative finder report shows the total cM, which includes both **ancient and recent** segments.]
ERSA reference: http://genome.cshlp.org/content/21/5/768.full
6. ERSA has some glitches with very close relatives such as brothers, MZ twins (or duplicated), etc. We have a final step to measure the Identity-by-State between samples with high relatedness in ERSA to refine close relatives.
DONE.
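To illustrate step 5 and the bracketed note about total cM, here is a minimal Python sketch. It is not ERSA: ERSA uses a likelihood-based hypothesis test, whereas this sketch swaps in a plain fixed length cutoff. The cutoff `RECENT_MIN_CM`, the function name `summarize_segments`, and the segment lengths are all made up for illustration and are not DNA.Land parameters or data.

```python
# Hypothetical cutoff in cM. NOT a DNA.Land or ERSA parameter.
RECENT_MIN_CM = 7.0


def summarize_segments(lengths_cm, recent_min_cm=RECENT_MIN_CM):
    """Split IBD segment lengths (in cM) into 'recent' and 'ancient'
    using a simple length cutoff, and total each group."""
    recent = [x for x in lengths_cm if x >= recent_min_cm]
    ancient = [x for x in lengths_cm if x < recent_min_cm]
    return {
        "total_cm": sum(lengths_cm),    # what a report showing "total cM" would display
        "recent_cm": sum(recent),       # the only part that would count towards relatedness
        "ancient_cm": sum(ancient),
        "n_recent": len(recent),
        "n_ancient": len(ancient),
    }


# Made-up example: many short segments and a single longer one.
example = [3.5, 4.0, 3.25, 5.0, 4.5, 3.75, 9.0]
summary = summarize_segments(example)
print(summary)
```

With these made-up numbers the total is 33 cM, but only 9 cM sits in a segment long enough to be treated as recent under the assumed cutoff. That is one way a sizeable displayed total can go together with a very distant relationship.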
---
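Step 6 mentions measuring Identity-by-State between samples that look closely related. As a rough sketch of what an IBS score is (not DNA.Land's actual code), the Python below assumes each genotype is coded as the count of alternate alleles (0, 1, or 2) at a bi-allelic SNP and averages the number of alleles shared per SNP. The function name `mean_ibs` and the example genotypes are hypothetical.

```python
def mean_ibs(genotypes_a, genotypes_b):
    """Average number of alleles shared identical-by-state per SNP (0 to 2).

    Genotypes are assumed to be coded 0, 1, or 2 (alternate-allele count)."""
    if len(genotypes_a) != len(genotypes_b):
        raise ValueError("genotype lists must have the same length")
    if not genotypes_a:
        raise ValueError("need at least one SNP")
    # Two identical genotypes share 2 alleles; opposite homozygotes share 0.
    shared = [2 - abs(a - b) for a, b in zip(genotypes_a, genotypes_b)]
    return sum(shared) / len(shared)


# Made-up genotypes for illustration only.
sample = [0, 1, 2, 1, 0, 2]
duplicate = list(sample)      # same person uploaded twice
other = [0, 2, 2, 0, 1, 2]    # a different person

print(mean_ibs(sample, duplicate))
print(mean_ibs(sample, other))
```

In this sketch an exact duplicate scores 2.0 because every SNP matches, while the second comparison scores lower. A real pipeline would also have to handle missing calls and genotyping errors, which this example ignores.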
Final note: many of you have not got your relative matches yet. We thank you for your patience. Our scripts were never tested with such a number of genomes, and we were not ready to scale from 0 to 4000 genomes in just four days. The team is working hard to eliminate kinks in the scripts and to accelerate some of the steps. This task takes a lot of effort and careful thinking. We didn't forget about you. We just need time.
Kudos to Jie Yuan and Richard Manuz. They joined the group a month ago to start their PhDs and haven't slept much since.