TMRCA estimates based on surname cluster milestones and variable years per YSNP



RobertCasey
10-08-2019, 11:38 PM
I just completed a 30-minute video that shows how to create much more accurate TMRCA estimates than the YSTR-only and YSNP-only approaches. This approach first creates TMRCA milestone dates for surname clusters, information that the YSTR-only and YSNP-only approaches do not use. For the L226 Irish haplogroup, where there are 31 known surname clusters to date, this approach assigns any YSTR or YSNP branch a TMRCA of 1000 AD. Also, for 55 of the 133 YSNP branches, this method uses variable years per YSNP, which eliminates a lot of the statistical variation that exists.

https://www.youtube.com/watch?v=sKaxanrxBgs&t=4s

Dave-V
11-24-2019, 02:39 PM
Please watch the YouTube video which clearly shows a dozen examples of just how bad the FTDNA match lists are (and the Tip Tool based on same math).

The first three that I analyzed (Y67):

L226 - O'Brien - Surname signature - one mutation - false positives = 91.1 % - with GD = 5, only reduced to 83.6 % false positives - these testers have very low genetic distance from the L226 signature
L226 - Casey - Surname signature - five mutations - false positives = 83.6 % - with GD = 4, reduced to 5.2 % false positives - these testers have very high genetic distance from the L226 signature
A5629 - Gleason (Maurice Gleason's surname cluster) - false positives = 74.4 % - with GD = 4 has 33 % false positives and GD = 3 has 0 % false positives - genetic distance filter could eliminate all false positives

What are the peer-reviewed publications?


These are clearly nowhere close to the 5 % that FTDNA advertises.

Note that Maurice's haplogroup has some convergence which few haplogroups have.

Continuing this discussion here instead of the other thread since it was a tangent from the original poster's question on the other thread.


Please watch the YouTube video which clearly shows a dozen examples of just how bad the FTDNA match lists are (and the Tip Tool based on same math).

That's the thing about statistics: examples don't prove that the statistics are bad. Every statistical curve has outliers. You actually have to show through statistics that the statistics are bad.

You quote kurtosis above 2.0 as a reason to reject the statistics that have been in use for over a decade for STR-based TMRCAs, but kurtosis is just a measure of the shape of a distribution curve, like skewness. Actually (https://en.wikipedia.org/wiki/Kurtosis), a normal distribution has a kurtosis of 3, so I'm confused about your definition.
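One possible source of the confusion is that there are two conventions: plain kurtosis (3 for a normal distribution) and excess kurtosis (kurtosis minus 3, so 0 for a normal distribution). A threshold of 2.0 means very different things under each. Here is a minimal sketch that computes both from simulated samples; the sample sizes, the seed, and the function name `kurtosis` are my own choices for illustration and are not taken from the video or from any of the papers.

```python
import random


def kurtosis(xs):
    """Plain (Pearson) kurtosis: fourth central moment / variance squared."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth central moment
    return m4 / (m2 * m2)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated samples (illustrative only, not genetic data).
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
uniform_sample = [rng.random() for _ in range(200_000)]

k_normal = kurtosis(normal_sample)
k_uniform = kurtosis(uniform_sample)

# Excess kurtosis is the same quantity shifted so that a normal curve gives 0.
excess_normal = k_normal - 3.0
excess_uniform = k_uniform - 3.0

print(f"normal : kurtosis={k_normal:.2f}  excess={excess_normal:.2f}")
print(f"uniform: kurtosis={k_uniform:.2f}  excess={excess_uniform:.2f}")
```

The normal sample comes out near 3 (excess near 0) and the flat uniform sample near 1.8 (excess near -1.2), so a cutoff of "2.0" only makes sense once it is clear which of the two conventions is meant.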

You also say that the "regular math" for STR-based TMRCAs depends on a normal distribution. Can you substantiate that? It doesn't as far as I can tell. There is an error range for the distribution of actual dates based on STR mutations and a 50% point where the peak of the distribution occurs. Whether that peak follows a normal distribution, or is skewed, or how high the peak is doesn't really enter into it.

It's true we do usually talk about an "average" within that error range as if most people would fall on that average point, and not all distributions HAVE a peak like that. So if you mean that that average is more typical sometimes than others (as in a higher kurtosis WOULD make that average less typical) then I agree with that, but it doesn't invalidate the approach.



The first three that I analyzed (Y67):

L226 - O'Brien - Surname signature - one mutation - false positives = 91.1 % - with GD = 5, only reduced to 83.6 % false positives - these testers have very low genetic distance from the L226 signature
L226 - Casey - Surname signature - five mutations - false positives = 83.6 % - with GD = 4, reduced to 5.2 % false positives - these testers have very high genetic distance from the L226 signature
A5629 - Gleason (Maurice Gleason's surname cluster) - false positives = 74.4 % - with GD = 4 has 33 % false positives and GD = 3 has 0 % false positives - genetic distance filter could eliminate all false positives


My problem here is that you're comparing the two approaches and saying the "regular" one is bad when it doesn't agree with yours. You can't do that; there needs to be some objective measure. Taking your approach as the standard by definition just means you're concluding the assumption that you started with.

I'm very concerned about tying surnames to signatures - while I think it's a neat idea, I think it introduces too much error. You don't really know when surnames were adopted on specific lines. You're using the current tested groups as if they are representative of the surname adoption patterns when you really don't know that. And you're assuming (I believe) that the common ancestors of those currently-tested groups lived very close to the generation in which the surname was adopted. All of those are assumptions that introduce unknown errors into the calculations.



What are the peer-reviewed publications?


These are clearly nowhere close to the 5 % that FTDNA advertises.

In the original thread I was talking about the original Bruce Walsh paper (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1461668/), Nordtvedt's adaptation of it (http://www.jogg.info/pages/42/files/Nordtvedt.htm), the Adamov paper (https://www.academia.edu/11554977/Defining_a_New_Rate_Constant_for_Y-Chromosome_SNPs_based_on_Full_Sequencing_Data), and several others that were mainly derivative of Walsh's original paper. All of them use essentially the same statistics to calculate TMRCAs. None of them talk about kurtosis as an invalidator of the statistical approaches, and you would think it would have been mentioned, especially in the later papers.

I'm sorry, I don't know which 5% figure FTDNA advertises.

Aspar
11-24-2019, 04:43 PM
I first tested Y-12 at FTDNA and then upgraded to BIGY-500, which came with the Y-111 STR markers added for free.

I now have two close matches at Y-67, both at a GD of 4, and they have not tested further.
These matches have a genealogical tree, and they are nephew and uncle: the nephew's father and the uncle are brothers.
The two of them are at a GD of 1 from each other at the Y-67 STR level. The difference is on the DYS439 marker, where the nephew has DYS439=12 and the uncle has DYS439=14.

As I already said, I am at a GD of 4 from both of them, and the differences are: DYS439=13 (me) vs. 12 and 14 (nephew and uncle); DYS460=9 (me) vs. 10 (nephew, uncle); DYS576=18 (me) vs. 17 (nephew, uncle); CDY=32-36 (me) vs. 32-35 (nephew, uncle).
Their surname is different from mine, but the two surnames have the same meaning in different languages, both stemming from the word 'grandma': theirs is of Aromanian origin, while mine is of Albanian origin.
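To make the marker comparison above easier to check, here is a minimal sketch that counts the differences in two simple ways. It only includes the four markers listed above and assumes all the other Y-67 markers match. The two counting rules are simplifications of my own and not FTDNA's actual procedure, which treats multi-copy markers such as CDY with special rules.

```python
# Marker values from the comparison above; single-copy markers are 1-tuples,
# the two-copy CDY marker is a 2-tuple. All other markers are assumed equal.
me = {"DYS439": (13,), "DYS460": (9,), "DYS576": (18,), "CDY": (32, 36)}
nephew = {"DYS439": (12,), "DYS460": (10,), "DYS576": (17,), "CDY": (32, 35)}
uncle = {"DYS439": (14,), "DYS460": (10,), "DYS576": (17,), "CDY": (32, 35)}


def stepwise_distance(a, b):
    """Sum of absolute repeat-count differences over all copies of all markers."""
    return sum(abs(x - y) for m in a for x, y in zip(a[m], b[m]))


def differing_markers(a, b):
    """Number of markers whose values differ, regardless of by how many steps."""
    return sum(1 for m in a if a[m] != b[m])


pairs = {
    "me vs nephew": (me, nephew),
    "me vs uncle": (me, uncle),
    "nephew vs uncle": (nephew, uncle),
}
for label, (a, b) in pairs.items():
    print(label, "stepwise:", stepwise_distance(a, b),
          "markers differing:", differing_markers(a, b))
```

Both rules give 4 between me and each of them. For nephew vs. uncle, counting differing markers gives 1, while a strict stepwise sum gives 2, because DYS439 goes from 12 to 14.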

Moreover, I have some indications that my paternal line probably comes from Southern Albania. I have some ethnographic books about the village my ancestors come from, and they mention that the village was first built and settled by migrants from Southern Albania and Western Macedonia who escaped the tyrannical rule of Ali Pasha of Ioannina.
I can't find any paper trail, as this is almost impossible for people from the modern country of North Macedonia. Any birth certificates or other genealogical papers, mostly from the Ottoman period, were either destroyed or confiscated by the Ottomans themselves and by the foreign rulers who came after them. However, my Y-DNA matches, whose origin is in Southern Albania, do have a genealogical tree, and they can trace their paternal line to the late 18th century with the same surname they have now. They have a trace of one more ancestor before the one born in 1780, but they don't have much information about this ancestor other than his name, George!
The same ethnographic book about the village my ancestors come from has information about a shepherd named George who was among the first settlers in the village and who also built a memorial fountain there. This information was gathered from the old inhabitants of the village, now deceased.

I like using this TMRCA calculator: http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode
That is primarily because you can play with the mutation rate. Having a list of STR markers and their mutation rates, I can see that the slowest non-matching marker is DYS460, with a mutation rate of 0.00402. As such, I chose the second mutation-rate option, FTDNA 0.004..0.0075.
When I set the probability factor to 50%, I get a TMRCA of only 240 years, or 8 generations ago, which is the middle of the 18th century, exactly the time when this George could have been born.
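As a rough cross-check of that figure, here is a minimal sketch of one simple way such a 50% point can be computed. I do not know the calculator's actual internals, so everything here is an assumption: a Poisson model in which mismatches accumulate on both lineages at one fixed rate per marker per generation (0.004, the low end of the option I selected), a flat prior on the number of generations, a GD of 4 over 67 markers, and 30 years per generation (which is what 240 years for 8 generations implies). The function names are hypothetical.

```python
import math


def prob_mrca_within(t, mismatches, n_markers, mu):
    """P(common ancestor lived within t generations), given the mismatches.

    With a flat prior on t and Poisson mutations on both lineages, this equals
    P(Poisson(2 * mu * n_markers * t) > mismatches).
    """
    lam = 2.0 * mu * n_markers * t  # expected mismatches after t generations
    tail = sum(math.exp(-lam) * lam ** i / math.factorial(i)
               for i in range(mismatches + 1))
    return 1.0 - tail


def generations_at_probability(p, mismatches, n_markers, mu):
    """Bisection for the t at which prob_mrca_within reaches probability p."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if prob_mrca_within(mid, mismatches, n_markers, mu) < p:
            lo = mid
        else:
            hi = mid
    return hi


# Assumed inputs: GD = 4 on 67 markers, 0.004 per marker per generation.
gens = generations_at_probability(0.5, mismatches=4, n_markers=67, mu=0.004)
years = gens * 30  # assumed 30 years per generation
print(f"50% point: about {gens:.1f} generations, about {years:.0f} years")
```

Under these assumptions the 50% point falls between 8 and 9 generations, which is in the same range as the calculator's result. Choosing a faster rate from the 0.004..0.0075 range would pull the estimate closer to the present.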

This is fascinating to me, as it takes me very close to what I am looking for in discovering my roots, although without a paper trail it is very hard to confirm.