# Thread: TMRCA: Conditional Probability versus Absolute Probability

1. Using Bruce Walsh's study, I have tried to derive an estimate of the absolute TMRCA (whatever the surnames are) from the conditional TMRCA (when the surname is shared) computed by FTDNA TiP.

In principle the conditional probability is obtained by dividing the absolute probability by the probability that two people share the same surname, which I take to be the frequency of the surname in a given population. Let F be this frequency.

If we look at the posterior density (formula 12), we can simplify it by a first-order approximation and get a density which depends only on t^(n-k), for fixed n (markers compared) and k (markers that match).

Now, if we integrate this density after a change of variable that multiplies t by a coefficient c, the integration interval is dilated by c and the density is multiplied by (1/c)^(n-k+1).

Consequently, we can get rid of F if we set c^(n-k+1) equal to 1/F, that is, by choosing c equal to the (n-k+1)-th root of 1/F.

Conclusion: TMRCA (absolute) = (1/F)^(1/(n-k+1)) TMRCA (conditional)
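The substitution argument can be written out as follows. This is only a sketch under the first-order approximation: K (a constant), T (a number of generations) and the subscripts "cond" and "abs" are notation added here for illustration, not Walsh's.

$$
\begin{aligned}
P_{\text{cond}}(T) &\approx \int_0^T K\,t^{\,n-k}\,dt = \frac{K\,T^{\,n-k+1}}{n-k+1},\\
P_{\text{abs}}(T) &= F\,P_{\text{cond}}(T),\\
P_{\text{abs}}(cT) &= F\,c^{\,n-k+1}\,P_{\text{cond}}(T) = P_{\text{cond}}(T)
\quad\text{when } c = \left(\tfrac{1}{F}\right)^{1/(n-k+1)}.
\end{aligned}
$$

Since t^(n-k+1) grows without bound while a probability cannot exceed 1, the power law can only hold while the percentages are small, so the rescaling is at best a rough guide for the high percentages.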

I have tried to estimate a value of F by looking at the French population: most genealogists consider that there are approximately 350,000 surnames (ignoring variants) for 60 million people, which corresponds to about 171 individuals per surname on average. So F = 171/60,000,000 and 1/F ≈ 351,000.
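The arithmetic behind F can be checked with a few lines of Python. This is a minimal sketch using only the figures quoted above; the variable names are mine, and taking the average cluster size assumes every surname is equally common, which is a simplification.

```python
# Figures quoted in the post (genealogists' estimates, not official statistics).
population = 60_000_000
surnames = 350_000

# Average number of bearers per surname (assumes all surnames equally common).
cluster_size = population / surnames

# The post rounds the cluster to 171 individuals before computing F.
F = round(cluster_size) / population
inv_F = 1 / F

print(f"{cluster_size:.1f} people per surname, 1/F = {inv_F:,.0f}")
```

The result, about 350,877, is what the post rounds to 351,000.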

Turning to my personal case:

At the Y 67 level: one match with n=67 and k=60, so:

c = 351000^(1/8) ≈ 4.9, rounded to 5

Using FTDNA TiP, I get:

| Conditional TMRCA (generations) | Absolute TMRCA (generations) | Percentage |
|---|---|---|
| 4 | 20 | 23.58% |
| 8 | 40 | 68.57% |
| 12 | 60 | 91.1% |

At the Y 111 level: same match with n=111 and k=100, so:

c = 351000^(1/12) ≈ 2.9, rounded to 3

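| Conditional TMRCA (generations) | Absolute TMRCA (generations) | Percentage |
|---|---|---|
| 4 | 12 | 0.65% |
| 8 | 24 | 17.71% |
| 12 | 36 | 56.61% |
| 16 | 48 | 85.5% |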
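The two scaling factors and the "absolute" columns can be reproduced with a short sketch. The function name `scaling_factor` is hypothetical, 351,000 is the rounded 1/F from above, and the Y111 call uses k=100, which is the value that the exponent 1/12 implies (n-k+1 = 12).

```python
def scaling_factor(inv_f: float, n: int, k: int) -> float:
    """c = (1/F)^(1/(n-k+1)) for n markers compared, k of them matching."""
    return inv_f ** (1.0 / (n - k + 1))

INV_F = 351_000  # rounded value of 1/F used in the post

c67 = scaling_factor(INV_F, n=67, k=60)     # exponent 1/8
c111 = scaling_factor(INV_F, n=111, k=100)  # exponent 1/12

# Conditional TMRCA values (in generations) read from TiP at each level.
conditional = {"Y67": (4, 8, 12), "Y111": (4, 8, 12, 16)}

for label, c in (("Y67", c67), ("Y111", c111)):
    c_rounded = round(c)  # the post works with the rounded factor
    absolute = [g * c_rounded for g in conditional[label]]
    print(f"{label}: c = {c:.2f} (rounded to {c_rounded}), absolute TMRCA = {absolute}")
```

Using the unrounded factors (about 4.93 and 2.90) would shift the absolute values slightly downwards.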

This is consistent with my first estimate, based on historical research.

2. I finally got the answer regarding the interpretation of the results provided by FTDNA TiP: at the Y12 and Y25 levels a match with a different surname is not relevant, but at Y37, and a fortiori at Y67 and Y111, it is a real match, which means either an NPE (non-paternity event) or a common ancestor from before surname adoption. The TiP then estimates the absolute TMRCA back to this NPE or to the period before surname adoption.

I suppose that my US match at the Y67 level already knows this and has seen that the TiP gives an 80% probability of our being related 12 or 18 generations ago, which is why he no longer replies to me...

"When comparing two individuals, the DNA test shows them to be closely related. However, one has my surname and the other does not. Does the match with the similar surname mean we are more closely related?

In many cases, you are more likely to be recently related to a match who also shares your surname. However, there are several reasons why recently related individuals may not share the same surname (undocumented adoption, false paternity). If the match is with one of our basic tests (Y-DNA12 or Y-DNA25), it may make sense to upgrade to a Y-DNA37, Y-DNA67, or Y-DNA111 test to increase the precision or further exclude the probability of a recent common ancestor."

https://www.familytreedna.com/learn/...atching-tools/

4. Cheekily jumping in. What would be the TMRCA for a same surname match of 37-2?

5. It depends on which STR markers the mismatches occur on:

If they are on fast-mutating markers (as in my case), they will not drastically increase the TMRCA.

If they are on slow-mutating markers, they can drastically increase the TMRCA.

This is the value of the FTDNA TiP: it takes into account the mutation rate of each STR marker.

In fact, here is the process:

First you have to obtain a short list, so you upgrade the Y level until you reach one. In my case I reached the short list at the Y37 level:

700 matches at the Y12 level.

80 matches at the Y25 level.

2 matches at the Y37 level

But my haplogroup is rare; with a more common haplogroup you may have to upgrade to the Y67 or Y111 level.

Once you have obtained your short list, you know that you are within your family, and only then (before that it is not relevant) is it time to use the FTDNA TiP to estimate the TMRCA between you and your cousins.

Of course an estimate at the Y111 level will be more accurate than one at the Y67 level but, as far as I have understood, it will not change the short list of your cousins. (In my case I was misled by the fact that my Y67 match disappeared at the Y111 level: I thought that we were not related within the genealogical time frame, but the FTDNA TiP, even at the Y111 level, shows that we are.)

7. 12 markers - 99 matches. No same surname matches
25 markers - 43 matches. 1 same surname match 25-1
37 markers - 20 matches. 1 same surname match 37-2
67 markers - 17 matches. No same surname matches
111 markers - 6 matches. No same surname matches

My haplogroup is rare, i.e. only one (non-surname) match on YFull.

My surname match is not in any projects, so I'm not sure how I can compare markers.

8. Interesting. Normally the number of matches decreases exponentially with the number of markers: when the number of markers is doubled (from Y12 to Y25), the number of Y25 matches is approximately the square root of the number of Y12 matches. Your case seems to correspond to endogamous marriages.
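To put numbers on that rule of thumb, here is a quick sketch comparing it with the Y12 and Y25 counts quoted in posts 5 and 7. The function name `sqrt_rule` is hypothetical, and the rule itself is only the heuristic stated above, not something documented by FTDNA.

```python
import math

# Match counts quoted earlier in the thread (Y12 and Y25 reports).
reported = {
    "post 5": {"Y12": 700, "Y25": 80},
    "post 7": {"Y12": 99, "Y25": 43},
}

def sqrt_rule(y12_matches: int) -> float:
    """Y25 match count predicted by the square-root rule of thumb."""
    return math.sqrt(y12_matches)

for post, counts in reported.items():
    predicted = sqrt_rule(counts["Y12"])
    ratio = counts["Y25"] / predicted  # >1 means more Y25 matches than predicted
    print(f"{post}: predicted {predicted:.1f}, observed {counts['Y25']}, ratio {ratio:.1f}")
```

On these figures both observed counts sit above the rule-of-thumb value, the second further above it than the first.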

Currently I do not have a satisfactory answer: it seems that FTDNA has put in place a double probabilistic filter, the number of mismatches plus the mutation rate of each STR marker. It seems redundant, but there should be a reason. I have to think about that...

In the match report at 37 markers, the entry for your match with the same surname should say what level he has tested to. I suspect he has only tested to 37 markers. Can you confirm?

If so, he will by definition not appear in the match reports at higher numbers of markers, since you and he can't be compared on the additional markers.

It IS possible that a match will show up in the reports at lower marker levels but not at higher ones even though he has tested higher; that's typically because his genetic distance at the higher number of markers exceeds the report limit. The reports stop showing matches beyond a genetic distance of 7 at Y67 and beyond a genetic distance of 10 at Y111. It seems unlikely that a match with the same surname would exceed those genetic distances, which is why I'm pretty sure he has only tested to Y37.
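The two reasons a match can drop out of a higher-level report can be sketched as a small check. The names `REPORT_LIMIT` and `shows_in_report` are hypothetical, and only the two limits quoted above are included; this is not FTDNA's actual matching code.

```python
# Maximum genetic distance still listed in the match report, as quoted above.
REPORT_LIMIT = {67: 7, 111: 10}

def shows_in_report(level: int, genetic_distance: int, tested_to: int) -> bool:
    """Return True if a match can appear in the report at `level` markers."""
    if tested_to < level:
        # He has not tested the additional markers, so no comparison is possible.
        return False
    return genetic_distance <= REPORT_LIMIT[level]

print(shows_in_report(67, genetic_distance=7, tested_to=111))    # at the Y67 limit
print(shows_in_report(111, genetic_distance=11, tested_to=111))  # beyond the Y111 limit
print(shows_in_report(67, genetic_distance=2, tested_to=37))     # only tested to Y37
```

The first call returns True and the other two return False, one for each of the two reasons.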

At Y37, a genetic distance of 2 most often corresponds to a common ancestor about 6 generations back, but 95% of the time it will be between 2 and 16 generations ago. That large range can only be narrowed by looking at the specific markers, and as you said that requires you both to be in a project.

10. Yes, he's only tested to 37 markers. I've tried to get him to upgrade, including offering to pay, without success. It's the height of frustration.

12. When you look at the first TMRCA tables displayed by FTDNA, it is clear that the validity of their estimates was conditional on the two matches sharing the same surname:

“If two men share a surname, how should the genetic distance at…”

But now it has changed: in their tutorial they no longer mention any conditions or restrictions regarding the use of their TiP, probably because there are now new customers who want to find their biological lineage (in case of adoption) or who are interested in the historical time frame, like me (before the adoption of surnames).

However, the validity of the TiP estimates is still conditional: the estimate at a certain Y level is correct only if the other markers at the higher levels have not drastically changed.

Example:

For the same match, I have gotten:

At the Y12 level: 56% probability of being related within the past 8 generations

At the Y25 level: 27% probability of being related within the past 8 generations

At the Y37 level: 12% probability of being related within the past 8 generations

At the Y67 level: 3% probability of being related within the past 8 generations

At the Y111 level: 0% probability of being related within the past 8 generations

This example makes it clear that the condition “the other markers at the higher levels have not drastically changed” is never satisfied, which is why these estimates are not correct, except once you reach 0%, as you cannot go lower.

The ideal would be to test all of the STR markers, in order to reach an unconditioned estimation, or at least enough markers to get a good approximation of this unconditioned estimation, but how many markers exactly? 111? 500? 700? 2000? I do not know.

What I have understood: in order to reduce the uncertainty of the TiP conditional estimates, FTDNA has coupled the TiP with its old criterion of the number of mismatches. It is purely empirical: they suppose that if the number of mismatches at each Y level does not exceed a certain threshold, we can reasonably assume that the condition “the other markers at the higher levels have not drastically changed” is satisfied, and so the TiP estimate is correct.

But in my case I am not convinced, because my Y67 match disappears at the Y111 level, so my number of mismatches has exceeded the threshold at the Y111 level, and normally in this case the TiP estimate is no longer valid. And I would say: even if the threshold at the Y111 level had not been exceeded, it could have been exceeded at higher levels (Y500, Y700, etc.).

And I am a bit disappointed, because my US match with French ancestry believes that it is an NPE and apparently does not want to be tested further. But in fact it is not certain at all: without a comparison between my 32 unique SNP ancestral mutations and his unique SNP ancestral mutations, it is impossible to say whether it is an NPE or not.

13. No common ancestor in last 5 generations

25-1

Generations Percentage
6 37.69%
8 61.17%
10 75.81%
12 84.92%
14 90.61%
16 94.15%
18 96.35%
20 97.73%
22 98.58%
24 99.12%

37-2

Generations Percentage
6 45.98%
8 72.3%
10 86.28%
12 93.37%
14 96.85%
16 98.53%
18 99.32%
20 99.69%
22 99.86%
24 99.94%
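To read a median or a 95% bound off these two tables, one option is to interpolate between the tabulated generations. The sketch below assumes simple linear interpolation between rows, which is my own shortcut and not how TiP itself computes anything; the names `TIP_25_1`, `TIP_37_2` and `generations_at` are hypothetical.

```python
from bisect import bisect_left

# Cumulative TiP percentages copied from the two tables above.
TIP_25_1 = {6: 37.69, 8: 61.17, 10: 75.81, 12: 84.92, 14: 90.61,
            16: 94.15, 18: 96.35, 20: 97.73, 22: 98.58, 24: 99.12}
TIP_37_2 = {6: 45.98, 8: 72.3, 10: 86.28, 12: 93.37, 14: 96.85,
            16: 98.53, 18: 99.32, 20: 99.69, 22: 99.86, 24: 99.94}

def generations_at(table: dict, target_pct: float) -> float:
    """Generations at which the cumulative percentage reaches target_pct,
    using linear interpolation between the tabulated rows."""
    gens = sorted(table)
    pcts = [table[g] for g in gens]
    if not pcts[0] <= target_pct <= pcts[-1]:
        raise ValueError("target percentage is outside the tabulated range")
    i = bisect_left(pcts, target_pct)
    if pcts[i] == target_pct:
        return float(gens[i])
    g0, g1, p0, p1 = gens[i - 1], gens[i], pcts[i - 1], pcts[i]
    return g0 + (g1 - g0) * (target_pct - p0) / (p1 - p0)

for name, table in (("25-1", TIP_25_1), ("37-2", TIP_37_2)):
    print(f"{name}: 50% at {generations_at(table, 50):.1f} generations, "
          f"95% at {generations_at(table, 95):.1f} generations")
```

On these tables the 37-2 comparison reaches 50% a little after 6 generations and 95% at about 13, while the 25-1 comparison reaches the same percentages at about 7 and about 17 generations.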

