Three-way Match Topology

Mike Edwards
12-09-2013, 12:41 AM
Okay, so I'm out on my DNA testing company's website looking for matches. On one autosomal chromosome, I find two people who match me in the same place (or so) on one segment. Further research shows that they match each other there as well. So we have the following situation:

[Attachment 1029]

But there are two versions of that segment of DNA for each of us, which causes there to be two different ways that we can have a three-way match:

[Attachment 1030]

[Attachment 1031]

Now if we had the DNA of each of our mothers and fathers, we could tell immediately which kind of match we had. But let's assume that we don't.

So, the question is:

From a statistical point of view, what is the probability of the three-way match turning out to be "Disjoint" rather than "Coherent"?

Have any of you run across a research paper on this? If not, have you given this problem any thought?

I have been thinking about it a lot. I am thinking that the three-way match is two times (plus x) more likely to turn out to be Coherent rather than Disjoint (where x is an added factor based on two of the three being more closely related). But I am not totally convinced that my logic is sound. If someone provides a citation where this problem has already been figured out, there is no need for me to explain this logic. Otherwise, I'll make some more pictures and post them as a follow-up.

--Mike

Mike Edwards
12-10-2013, 08:20 AM
I am responding to my own thread and adding my probability calculations for one simple case. I am repeating everything so that it's not broken up. And I changed the pictures slightly.

-------------------------------------------------------------------------
Okay, so I'm out on my DNA testing company's website looking for matches. On one autosomal chromosome, I find two people who match me in the same place (or so) on one segment. Further research shows that they match each other there as well. So we have the following situation:

[Attachment 1048]

But there are two versions of that segment of DNA for each of us, which causes there to be two different ways that we can have a three-way match:

[Attachment 1047]

Now if we had the DNA of each of our mothers and fathers, we could tell immediately which kind of match we had. But let's assume that we don't.

So, the question is:

From a statistical point of view, what is the probability of the three-way match turning out to be "Disjoint" rather than "Coherent"?

Have any of you run across a research paper on this? If not, have you given this problem any thought?

I have been thinking about it a lot. I think I have an angle on this. Let's look at the topologies of the two possible descendancies that could lead to these two possible match configurations:

For the coherent match case, let's assume that Misters Plum, Peach and Lemon have the same MRCA (most recent common ancestor) and he is exactly 5 generations above them. So they are all 4th cousins. (Having all three tree up to just one MRCA is, of course, not the way that this usually happens: two of the three will normally tree up together at a closer ancestor, and that ancestor will tree up and join the third lineage higher. I'll cover that case later.)

We want the setup for the disjoint match case to be equivalent. So let's assume that in the disjoint match case each pair of Misters Plum, Peach and Lemon has a separate MRCA, but for each pair the MRCA is exactly 5 generations above them. So they are pairwise 4th cousins.

[Attachment 1045]

Now at this point, you have to think about the diagrams and convince yourself (or not) that:

These descendancy topologies do produce the two kinds of matching.
These descendancy topologies are equivalent in terms of comparing them together to see which is more likely to be the actual descendancy.

To be clear, let's redraw the two topologies and show the 5 generation steps up to the MRCAs.

[Attachment 1046]

And let's be even more specific: let's assume that the length of the common matching segment for our problem is 10 centimorgans. Remember what a centimorgan measures:

In genetics, a centimorgan (abbreviated cM) is a unit for measuring genetic linkage. It is defined as the distance between chromosome positions for which the expected average number of intervening chromosomal crossovers in a single generation is 0.01 (ie, 1%).

... and that a one-cM segment's chance of making it through to a particular child is (1/2)*(99/100). So the chance of a segment of length c making it through to the next generation is (1/2)*((99/100)^c). Since we are letting c=10, this is (1/2)*((99/100)^10), which works out to about 0.45, or 45%.
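Here is a minimal Python sketch of that per-generation arithmetic. The function name `pass_probability` is just an illustrative label, and the model is only the simple one described above (a 1/2 chance of inheriting the right copy, times a 99/100 chance per cM that no crossover lands inside the segment), not a full recombination model.

```python
def pass_probability(length_cm: float) -> float:
    """Chance that a segment of the given length (in cM) reaches one child unbroken.

    Assumption: the simple model from this post, i.e. 1/2 for inheriting
    the right copy, times 0.99 per centimorgan for "no crossover here".
    """
    return 0.5 * (0.99 ** length_cm)


p = pass_probability(10)
print(round(p, 4))  # 0.4522
```

Note that the unrounded value is closer to 0.4522 than to 0.45, which matters once it gets raised to a high power.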

So let's look at the Coherent Matches case. For the common segment to be passed from the common ancestor down to Mr. Peach, it has to remain unbroken through 5 recombination/meiosis steps. The probability of that happening is 0.45^5. But the same segment also has to be passed down, unbroken, to Mr. Plum and to Mr. Lemon, so the chance that all three things happen is 0.45^15.

Now let's look at the Disjoint Matches case. The MRCA of Mr. Peach and Mr. Plum has to pass his version of the common segment down to both Mr. Peach and Mr. Plum. That is a total of 10 generational steps. The chance of this happening is 0.45^10. But the same thing has to happen for Mr. Lemon and Mr. Peach's MRCA, and again for Mr. Lemon and Mr. Plum's MRCA. So the chance that the Disjoint case can occur is 0.45^30.

The key thing to see is the disjoint case requires 30 steps (that do not break up our segment range); the coherent case only requires 15 steps.
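The step counting can be sketched in a few lines of Python. The variable names are made up for illustration, and the per-step probability is the simple (1/2)*((99/100)^10) model from this post, so treat it as a check on the bookkeeping rather than a real genetic simulation.

```python
GENERATIONS = 5            # steps from each MRCA down to each descendant
P_STEP = 0.5 * 0.99 ** 10  # assumed per-generation pass probability, 10 cM segment

# Coherent: one MRCA feeding three descendant lines of 5 steps each.
coherent_steps = 3 * GENERATIONS
# Disjoint: three MRCAs (one per pair), each feeding two lines of 5 steps.
disjoint_steps = 3 * 2 * GENERATIONS

p_coherent = P_STEP ** coherent_steps
p_disjoint = P_STEP ** disjoint_steps

print(coherent_steps, disjoint_steps)  # 15 30
print(f"coherent: {p_coherent:.3e}")
print(f"disjoint: {p_disjoint:.3e}")
```

Because the disjoint case needs exactly twice as many steps, its probability under this model is simply the square of the coherent one.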

So the likelihood of the coherent case occurring compared to the disjoint case occurring is:

(0.45^15)/(0.45^30)

which is equal to:

(0.45^(-15)) = 147969.002 (computed with the unrounded per-step value (1/2)*((99/100)^10) rather than the rounded 0.45)
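For anyone who wants to reproduce that ratio, here is a short Python sketch. It assumes the same simple per-step model as the rest of this post and compares the unrounded per-step probability against the rounded 0.45, since the rounding changes the result noticeably at the 15th power.

```python
p_exact = 0.5 * 0.99 ** 10  # unrounded per-step probability, about 0.4522
p_rounded = 0.45            # the rounded figure used in the text

# coherent / disjoint = p^15 / p^30 = p^(-15)
ratio_exact = p_exact ** -15
ratio_rounded = p_rounded ** -15

print(f"unrounded p: {ratio_exact:,.0f}")
print(f"rounded p:   {ratio_rounded:,.0f}")
```

Either way the ratio is well over a hundred thousand to one under this model, so the conclusion does not depend on the rounding.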

So we shouldn't expect the disjoint case to happen very often.

--Mike