Gedmatch and FF: Shared DNA. Basic criteria and reliability when finding relatives.



Shadogowah
10-14-2016, 04:49 PM
Hello all,

I am building my family tree and using DNA tests to look for relatives who could help me find ancestors further back in time, as I am reaching the point where civil registries no longer help much.

The tools offered by FTDNA and Gedmatch provide me with a decently sized list of potential relatives based on shared DNA. They even provide an estimate of the genetic distance to the closest common ancestor.

I would appreciate any information that helps me understand above which parameters or thresholds I can consider this data reliable, plus some insight into the statistics behind these predictions.

Basically the list of relatives, as I would expect, contains a lot of Hispanic and Portuguese names, but I am also surprised to find so many British surnames ranked fairly high in the list. Some make sense: they seem to be US citizens with some Spanish ancestry. Others, though, look like people from the UK.

But perhaps the most exotic matches for a Spaniard to find in his list of potential relatives are Swedish people. One of them has uploaded his family tree; it is quite elaborate and does not show a single hint of foreign influence. This "outsider" shows up around position 25 in the list, with a longest block of 10.6 cM and a total of 36 cM shared. Are these figures significant, so that I should expect some surprises in my family tree, or can values this small be considered irrelevant?

Thanks in advance for any input.

dp
10-14-2016, 09:53 PM
Sorry to be brief. But I suggest you read this: http://isogg.org/wiki/Identical_by_descent
dp :)

Shadogowah
10-15-2016, 11:44 AM
That looks exactly like what I was looking for. Thanks a lot.

Shadogowah
10-15-2016, 01:15 PM
I am posting my initial conclusions in case they help somebody else, and so that I can be corrected if I got something wrong.

After a quick read, what I gather is that a shared segment may be an identical-by-descent (IBD) segment, that is, a chunk of DNA truly inherited from a common ancestor... or it may not. There are several reasons why this happens, but all I needed to know is that you can get false positives.

In any case, a longer shared segment both increases the chance that it is a genuine IBD segment and brings the common ancestor closer in time.

A threshold of 5 cM can then be taken as a sensible indication that the shared segment is actually IBD. Below that length the chance of a false positive becomes relatively high and the segment cannot be trusted (although it could still perfectly well be IBD).

IBD segments of that size, however, could very likely place the common ancestor far back in time (beyond 500 years), which makes them not very useful for genealogy.

As a threshold for that purpose, 10 cM is a safer bet, since the common ancestor is then more likely to fall within the last 500 years.

The author of the entry goes further and suggests an even stricter threshold of 15 cM because, in his opinion, very few people have their ancestors recorded back 10 generations. That is actually true.
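
The rules of thumb above can be written down as a small lookup. This is only a sketch: the function name and the wording of the labels are made up for illustration, and the cut-offs (5, 10 and 15 cM) are simply the ones quoted in this post, not an official FTDNA or Gedmatch rule.

```python
def classify_segment(longest_cm: float) -> str:
    """Map the longest shared segment (in cM) to the rough bands discussed above."""
    if longest_cm < 5:
        return "likely false positive"
    if longest_cm < 10:
        return "probably IBD, ancestor possibly beyond 500 years"
    if longest_cm < 15:
        return "probably IBD, ancestor more likely within 500 years"
    return "passes the stricter 15 cM threshold"


# The Swedish match from the first post has a longest block of 10.6 cM.
print(classify_segment(10.6))
```

The bands only describe likelihoods; a segment in any band can still turn out to be a false positive or a genuine match.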

Without too much effort I have managed to collect almost all of mine back 7 generations, reaching 300 years back in time in Spain, using only civil registries. I still need to start looking at church records, and that is indeed the difficult part, as a lot of this data is either lost or difficult to find. My wife, however, already had the work done by one of her grandparents, and she can trace many of her ancestors back to the 17th century in the Netherlands.

My initial conclusion after seeing these British and Swedish matches with IBD segments above 10 cM is that the chance of a common ancestor within the last 500 years connecting us is indeed high, but he or she is going to be difficult to find.

MitchellSince1893
10-15-2016, 02:38 PM
The page linked above, http://isogg.org/wiki/Identical_by_descent, includes a study by Tim Janzen of his and his wife's 8,000+ matches on Gedmatch. He found that only 20% of his unphased Family Finder 7 cM matches on Gedmatch remained when compared with his phased matches.

Phasing is the process of assigning alleles to the mother or the father. The highest degree of accuracy is achieved by using the phased data from a two-parent/one-child trio, where the error rate for phasing is only 0.01%.

At 9 cM only 53% remained matches; at 11 cM it reaches 98%. Hence I try to use 11 cM as my threshold when investigating matches on Gedmatch.

That is not to say that matches smaller than this aren't valid, but:

For genuine smaller shared segments in the range of 5 cM to 10 cM, the common ancestor may be as many as 10 to 15 generations or more back in time.
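
As a rough sketch of how that 11 cM cut-off could be applied to a match list: the retention figures below are the ones quoted above from the ISOGG page, while the match records, the field names and the function name are hypothetical and do not reflect any Gedmatch export format.

```python
# Share of unphased matches that remained after phasing, as quoted above.
RETAINED_AFTER_PHASING = {7.0: 0.20, 9.0: 0.53, 11.0: 0.98}


def keep_matches(matches, min_longest_cm=11.0):
    """Keep only matches whose longest shared segment reaches the threshold."""
    return [m for m in matches if m["longest_cm"] >= min_longest_cm]


# Hypothetical match list; the labels, values and field names are made up.
matches = [
    {"name": "match A", "longest_cm": 10.6, "total_cm": 36.0},
    {"name": "match B", "longest_cm": 12.6, "total_cm": 40.0},
    {"name": "match C", "longest_cm": 7.2, "total_cm": 15.0},
]

for m in keep_matches(matches):
    print(m["name"], m["longest_cm"])
```

Filtering on the longest segment rather than the total follows the way the figures above are reported; a stricter or looser threshold is just a different `min_longest_cm`.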

Shadogowah
10-17-2016, 08:25 AM
Thanks a lot.

I think I'll ask my parents to take the tests too.

Shadogowah
10-21-2016, 09:37 AM
Just an update: No, 10 cM is not good enough at all.

I had an interesting email exchange with a potential relative of Dutch origin. We had several matching DNA segments. The longest was 10 cM, but there was another of 5.3 cM, yet another of 4.8 cM, and many smaller ones. Thinking in terms of probability, this looked to me like a clear indication that we must have had a common ancestor somewhere. In fact, adding up all the segments, it is one of the matches with the largest total cM in my list.

However, he happened to have his parents' data available, and the connection was indeed a false positive, at least for the longest segment.

I assume it is an extreme case but it taught me that unphased matches cannot be trusted at all.

MitchellSince1893
10-21-2016, 11:26 AM
According to the link in post #5 above, just over 30% of unphased 10 cM matches were false positives.

Shadogowah
10-21-2016, 01:45 PM
Yes, you mean the study made by Tim and Rachel Janzen. I have been thinking this over again.

Their figures are indeed 69.3% genuine for matches at 10 cM and 97.9% for matches at 11 cM (as you yourself stated in the very same post #5). Initially, before knowing what I know now, I assumed a linear interpolation for values between these two points, but it is clear that an exponential curve fits better. EDIT: Actually, I am an idiot. Just above the table there is a graph showing that the data fits an error function very well.
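
To make the "error function" idea concrete, here is a minimal sketch that fits such a curve to the four percentages quoted in this thread. It assumes the curve has the shape of a cumulative normal distribution with a midpoint `mu` and a width `sigma` (both names are mine), and it uses a crude grid search instead of a proper fitting library. It does not reproduce the graph on the ISOGG page.

```python
import math

# Share of unphased matches confirmed by phasing, as quoted in this thread.
POINTS = [(7.0, 0.20), (9.0, 0.53), (10.0, 0.693), (11.0, 0.979)]


def erf_curve(cm, mu, sigma):
    """Cumulative normal curve written with the error function."""
    return 0.5 * (1.0 + math.erf((cm - mu) / (sigma * math.sqrt(2.0))))


def squared_error(mu, sigma):
    """Sum of squared differences between the curve and the quoted points."""
    return sum((erf_curve(cm, mu, sigma) - share) ** 2 for cm, share in POINTS)


def fit_by_grid_search():
    """Brute-force least squares over a coarse grid of (mu, sigma)."""
    candidates = (
        (5.0 + 0.05 * i, 0.5 + 0.05 * j)
        for i in range(161)  # mu from 5.0 to 13.0 cM
        for j in range(91)   # sigma from 0.5 to 5.0 cM
    )
    return min(candidates, key=lambda p: squared_error(*p))


mu, sigma = fit_by_grid_search()
for cm in (8.0, 10.5):
    print(f"{cm} cM -> estimated share of genuine matches {erf_curve(cm, mu, sigma):.2f}")
```

With only four points and two free parameters the result is illustrative at best; it mainly shows why values between the quoted thresholds should not be read off a straight line.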

In any case, these estimates consider the longest segment only. I also happened to have some "circumstantial evidence" besides the 10 cM match, namely many other smaller matching segments adding up to a total above 50 cM, and I thought that, taking everything into consideration, this would statistically increase the odds of a real match.

Actually, even when the longest segment turns out to be false when compared with phased data, I think that does not rule out a smaller genuine one within the same segment (perhaps 5, 6 or even 7 cM; it is difficult to tell from the screenshot he sent me).

I now suspect that it could still be a match, but probably beyond 10 generations. Perhaps even twice that distance, if the connection indeed goes back to the years when Spain ruled the Low Countries. It is not difficult to imagine marriages between Dutch and Spanish people there before Protestantism began. Besides, I have compared his kit with one of my other matches (whose largest segment is 12.6 cM), which I suspect could come from the same lineage, and they also seem to have a lot of small segments in common. Even more than with me.

I have emailed him again with these considerations to get his opinion. He seems to know more about this than I do.

Shadogowah
10-28-2016, 09:11 AM
Well, in case anybody is interested in my research, it turns out that I actually do have British family.

A couple of great-great-granduncles settled in London around the end of the 19th century. That would explain the British matches. I'll wait for my DNA data to be phased and then start contacting them. They will probably be interested in all I know about their Spanish branch.

I'll continue with the church records. Perhaps I'll also find the Swedish connection.