
Subclades of P312 & U106



Wing Genealogist
06-20-2017, 01:57 PM
Currently, U106 has twelve subclades/branches (not including a presumed extinct branch from ancient DNA [RISE 98]). U106 has two larger subclades (Z381 & Z18) and several smaller clades (FGC3861/Z8056, S12025, S18632, FGC396, S19589, BY11501 & A2150). We currently have three tiny "family" branches which are currently limited to a single surname (these three branches would otherwise be considered U106*).

How does this compare with P312? As far as I am aware, P312 has three large subclades (L21, DF27, U152) and two (possibly more) smaller clades DF19 & ?ZZ37_1? I understand there are other tiny "family" clades as well, but am not familiar with them.

It is looking (to me) like U106 has almost twice as many subclades immediately below it as P312.

Given the fact U106 initially appeared to be more successful than P312 (at least in spawning new lines) it does seem strange that P312 is much more common than U106 today. The descendants of P312 (or at least the descendants of its largest subclade, L21 and possibly DF27 and/or U152 as well) must have enjoyed a period of explosive growth to account for the modern day dominance.

Earlier, I had remarked (in regards to U106) how the Z381 subclade immediately experienced a period of growth (spawning numerous subclades in fairly quick succession). All the other U106 subclades had a relative period of dormancy (noted by each subclade having multiple equivalent SNPs as well as an estimated age much younger than the estimated age of U106 itself). Z18 arose roughly 600 years after the formation of U106 and all of the smaller subclades are even younger than U106.

It would be interesting to know of the periods when the various subclades of P312 experienced explosive growth and dormancy/bottlenecks. I did recently see where L21 itself has several equivalent SNPs, which is an indication there was a period of time when it was stagnant. I find it interesting that this is today the largest subclade of P312.

rms2
06-20-2017, 02:11 PM
As far as I know, as you said, the three major subclades of P312 are DF27, L21, and U152, with two somewhat smaller subclades: DF19 and DF99. I've never heard of ZZ37_1.

Mike Walsh can probably tell you about the existence of even smaller branches off of P312, but I have not kept track of them, if there are any.

swid
06-20-2017, 02:17 PM
Alex' analysis of NGS results has done a good job of teasing out the branching immediately below P312 - his main page (http://ytree.net/) illustrates it quite nicely.

Much of P312 is united by a single SNP - Z40481. This SNP is above ZZ11, DF99, and ZZ37. ZZ11 is above U152 and DF27, and ZZ37 is above L624, Z29644, and Z39300.

The other clades directly below P312 are DF19, Z290 (which was recently found to be upstream of L21), L238, Y18209, and A9063.

So...overall, P312 has 6 known clades directly below it - two of which are very large (Z40481 and Z290), two smaller ones (DF19 and L238), and two that have only been found in a handful of men (Y18209 and A9063).
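
The branching described in this post can be written down as a small data structure. This is only a sketch in Python: the names `P312_TREE` and `descendants` are made up for illustration, and the tree holds nothing beyond the clades named in this post.

```python
# Branching below P312 exactly as listed in the post above.
# P312_TREE and descendants() are illustrative names, not from any tool.
P312_TREE = {
    "P312": ["Z40481", "Z290", "DF19", "L238", "Y18209", "A9063"],
    "Z40481": ["ZZ11", "DF99", "ZZ37"],
    "ZZ11": ["U152", "DF27"],
    "ZZ37": ["L624", "Z29644", "Z39300"],
    "Z290": ["L21"],
}


def descendants(tree, clade):
    """Return every clade below `clade`, depth first."""
    found = []
    for child in tree.get(clade, []):
        found.append(child)
        found.extend(descendants(tree, child))
    return found


direct = P312_TREE["P312"]
print(len(direct), "clades directly below P312:", ", ".join(direct))
print(len(descendants(P312_TREE, "P312")), "named clades below P312 in total")
```

Adding a newly found branch is then a one-line change to the dictionary, and the counts follow from it.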

rms2
06-20-2017, 03:01 PM
How many men thus far have been found to be Z290+ but L21-?

rms2
06-20-2017, 03:15 PM
How many men thus far have been found to be Z290+ but L21-?

From what I can see from Alex's Big Tree it looks like two, both with the surname Reynolds, and both sharing a long list of SNPs below Z290 that they don't share with anyone else who is derived for Z290.

It would be nice to find someone outside that one family who is Z290+ and L21-

Mikewww
06-20-2017, 03:34 PM
Alex' analysis of NGS results has done a good job of teasing out the branching immediately below P312 - his main page (http://ytree.net/) illustrates it quite nicely.

Much of P312 is united by a single SNP - Z40481. This SNP is above ZZ11, DF99, and ZZ37. ZZ11 is above U152 and DF27, and ZZ37 is above L624, Z29644, and Z39300.

The other clades directly below P312 are DF19, Z290 (which was recently found to be upstream of L21), L238, Y18209, and A9063.

So...overall, P312 has 6 known clades directly below it - two of which are very large (Z40481 and Z290), two smaller ones (DF19 and L238), and two that have only been found in a handful of men (Y18209 and A9063).

Thanks, SWID. This is how I depict it for the R1b project. Does this match what you have?

https://www.familytreedna.com/groups/r-1b/about

I think ZZ38 is testable whereas ZZ37 is not, so I use ZZ38. L624 is also shaky so I use an equivalent.

Mikewww
06-20-2017, 03:36 PM
Currently, U106 has twelve subclades/branches (not including a presumed extinct branch from ancient DNA [RISE 98]). U106 has two larger subclades (Z381 & Z18) and several smaller clades (FGC3861/Z8056, S12025, S18632, FGC396, S19589, BY11501 & A2150). We currently have three tiny "family" branches which are currently limited to a single surname (these three branches would otherwise be considered U106*).

Do I have the U106 side correct?

https://www.familytreedna.com/groups/r-1b/about

Wing Genealogist
06-20-2017, 03:45 PM
Do I have the U106 side correct?

https://www.familytreedna.com/groups/r-1b/about

I note you do include all but the three "family" clades (and the presumably extinct RISE 98 clade). I personally would not include these clades in your tree either.

Wing Genealogist
06-20-2017, 03:52 PM
Alex' analysis of NGS results has done a good job of teasing out the branching immediately below P312 - his main page (http://ytree.net/) illustrates it quite nicely.

Much of P312 is united by a single SNP - Z40481. This SNP is above ZZ11, DF99, and ZZ37. ZZ11 is above U152 and DF27, and ZZ37 is above L624, Z29644, and Z39300.

The other clades directly below P312 are DF19, Z290 (which was recently found to be upstream of L21), L238, Y18209, and A9063.

So...overall, P312 has 6 known clades directly below it - two of which are very large (Z40481 and Z290), two smaller ones (DF19 and L238), and two that have only been found in a handful of men (Y18209 and A9063).

As Alex explains at his site, Z40481 is actually an STR mutation, not a SNP mutation. We believe U106 has a similar STR mutation at the top of its tree. The ancestral value for DYS492 is 12 repeats. Virtually all U106+ results have 492=13. However, one of the small subclades directly below U106 (A2150) also has the ancestral value of 492=12.

Wing Genealogist
06-20-2017, 03:56 PM
From what I can see from Alex's Big Tree it looks like two, both with the surname Reynolds, and both sharing a long list of SNPs below Z290 that they don't share with anyone else who is derived for Z290.

It would be nice to find someone outside that one family who is Z290+ and L21-

As I reported above, the U106 tree currently has three tiny "family" clades, each of which (at the present time) is also limited to a single surname. They are analogous to the Z290+/L21- clade.

As only a small fraction of the population has done any NGS/WGS testing, future results will possibly add further structure to even these "upper" layers of the tree.

rms2
06-20-2017, 04:12 PM
As I reported above, the U106 tree currently has three tiny "family" clades, each of which (at the present time) is also limited to a single surname. They are analogous to the Z290+/L21- clade.

As only a small fraction of the population has done any NGS/WGS testing, future results will possibly add further structure to even these "upper" layers of the tree.

This thread reminded me of a discussion of some of these things back in the winter. I believe Z40481 was discussed back then, and the fact that it is actually an STR came up.

I also remember the first Reynolds L21- Z290+ Z260+ etc. result coming up back then. Now there has been a second member of that family with the same result. I'm a little leery of results from a single line being used to alter the tree, but I guess with several SNPs at the Z290 level, and the Reynoldses being positive for all of them, but negative for L21, that is significant.

I thought I recalled something, a SNP or an STR, that supposedly linked DF19, L21, and L238. What happened to that? Or am I misremembering?

Dewsloth
06-20-2017, 04:14 PM
DF19 has two immediate subclades (DF88 and Z302), each of which seems to be almost as old as DF19 itself, and whose modern geographical distributions look pretty similar.

The modern MDKA distribution of the whole DF19 group looks like this:

[Attachment 17087: image of the modern MDKA distribution of the DF19 group]

There is also one known family that is neither DF88 nor Z302. Their known ancestry seems to start in England in the early 1700s, so :noidea:

rms2
06-20-2017, 04:24 PM
There is a discussion of Z40481 here (http://www.anthrogenica.com/showthread.php?7150-Z40481-Splits-R-P312&p=156214&viewfull=1#post156214).

That was actually discussed in May of 2016.

Mikewww
06-20-2017, 04:26 PM
...
How does this compare with P312? As far as I am aware, P312 has three large subclades (L21, DF27, U152) and two smaller clades DF19 & ?ZZ37_1? I understand there are other small/"family" clades as well, but am not familiar with them.

It is looking (to me) like U106 has almost twice as many subclades immediately below it as P312.

Given the fact U106 initially appeared to be more successful than P312 (at least in spawning new lines) it does seem strange that P312 is much more common than U106 today. The descendants of P312 (or at least the descendants of its largest subclade, L21 and possibly DF27 and/or U152 as well) must have enjoyed a period of explosive growth to account for the modern day dominance.

Earlier, I had remarked (in regards to U106) how the Z381 subclade immediately experienced a period of growth (spawning numerous subclades in fairly quick succession). All the other U106 subclades had a relative period of dormancy (noted by each subclade having multiple equivalent SNPs as well as an estimated age much younger than the estimated age of U106 itself). Z18 arose roughly 600 years after the formation of U106 and all of the smaller subclades are even younger than U106.

It would be interesting to know of the periods when the various subclades of P312 experienced explosive growth and dormancy/bottlenecks. I did recently see where L21 itself has several equivalent SNPs, which is an indication there was a period of time when it was stagnant. I find it interesting that this is today the largest subclade of P312.

Here are some general points to consider.

1) I would not consider this as comparing P312 and U106, each in the aggregate. P312 subclades of DF19, DF99 and L238 have different regional distributions, just like L21, U152 and DF27 to a lesser degree. It's quite possible that a couple of these took U106's paths.

2) A point you make about L21 may pertain generally. The genetic distance in SNPs between the MRCAs must be considered. L21 is several rungs (SNPs) down the ladder age-wise from P312, whereas U152 and DF27 are very closely related and very close in age to the P312 MRCA.

3) It's not hard to find new * (paragroup) subclades given some individuals exist that have been tested. All you have to do is get a cousin tested and the subclade gets added to the tree. In that sense, smaller subclades may not be much different than counting individuals.

Here are specific comments.

L21 seemed to have a slightly longer fuse before it exploded. There are a large number of subclades immediately downstream of L21 and DF13/Z2542 so it was successful at the point of the L21 & DF13 MRCAs.

L21 may not be as large as we think relative to DF27 in particular. L21 is heavily Isles biased so testing penetration is high.

DF27 has an earlier (than L21) explosion given the large number of subclades directly branching from ZZ12. ZZ12 is a bit of a questionable SNP but the DF27 and ZZ12 blocks are only one SNP each and then you see the explosion. The TMRCAs for these ZZ12 subclades are right on top of DF27 so this could be considered as part of the early P312 expansion.

Z40481 is actually an STR.

Mark or Richard can comment on U152 but there may be both early and somewhat deferred explosions there too. L2 is in its own class.

GoldenHind
06-20-2017, 06:06 PM
As far as I am aware, P312 has three large subclades (L21, DF27, U152) and two smaller clades DF19 & ?ZZ37_1? I understand there are other small/"family" clades as well, but am not familiar with them.


Given the fact U106 initially appeared to be more successful than P312 (at least in spawning new lines) it does seem strange that P312 is much more common than U106 today. The descendants of P312 (or at least the descendants of its largest subclade, L21 and possibly DF27 and/or U152 as well) must have enjoyed a period of explosive growth to account for the modern day dominance.



As is reflected in some of the above posts, this is incorrect. There are seven currently known subclades of P312. The four less numerous subclades of P312 are:

L238 This appears to have undergone a long bottleneck before branching. There are some people currently working on its substructure. Today it is found primarily in Scandinavia.

DF99 This is the only one of the four united with DF27 and U152 by the upstream Z40481. After a single intermediate SNP (Z296643), it split into three subclades, which probably occurred very early, shortly after the birth of P312. While its outliers are extremely widespread (Portugal to Moscow), it is concentrated in Germany and England. Considering the Isles centric FTDNA database, it appears to be predominantly a continental subclade.

DF19 I haven't really followed this closely beyond looking at its distribution (which has a lot of similarities with that of DF99), so defer to the above post from Dewsloth. It appears to be the most numerous of these four.

ZZ37/ZZ38 This is the most recently identified, and not many know about it. Three subclades have been discovered to date. So far they appear to be closely connected with Wales, Scotland, Cornwall and Ireland.

None of these can be described as small/family subclades. While much less numerous than the larger three P312 subclades, they are of equal importance in determining the history of P312. There must be a reason why these four seem to have expanded at a smaller rate than the more numerous P312 subclades.

GoldenHind
06-20-2017, 06:34 PM
Here are some general points to consider.

1) I would not consider this as comparing P312 and U106, each in the aggregate. P312 subclades of DF19, DF99 and L238 have different regional distributions, just like L21, U152 and DF27 to a lesser degree. It's quite possible that a couple of these took U106's paths.

2) A point you make about L21 may pertain generally. The genetic distance in SNPs between the MRCAs must be considered. L21 is several rungs (SNPs) down the ladder age-wise from P312, whereas U152 and DF27 are very closely related and very close in age to the P312 MRCA.

L21 may not be as large as we think relative to DF27 in particular. L21 is heavily Isles biased so testing penetration is high.

DF27 has an earlier (than L21) explosion given the large number of subclades directly branching from ZZ12. ZZ12 is a bit of a questionable SNP but the DF27 and ZZ12 blocks are only one SNP each and then you see the explosion. The TMRCAs for these ZZ12 subclades are right on top of DF27 so this could be considered as part of the early P312 expansion.

Z40481 is actually an STR.

somewhat deferred explosions there too. L2 is in its own class.

Though I was not the first to notice it, I have mentioned before that the combined distribution of L238, DF19 and DF99 is remarkably similar to that of U106 as a whole. This could be coincidental, but I am inclined to doubt it.

DF99 is in the same category with DF27 and U152 in being very closely related and very near the age of P312.

The predominance of DF27 in Iberia and comparative scarcity (to L21) in the Isles has misled many into considering it as an Iberian subclade. One has to take into consideration the technical difficulty in identifying it in DNA testing. I believe it is the only P312 subclade which can be truly considered pan-European; I also feel fairly confident that it will eventually be proven to be by far the most numerous of the P312 subclades.

Wing Genealogist
06-20-2017, 06:45 PM
... None of these can be described as small/family subclades.

Please note I meant to make a clear distinction between smaller clades (as opposed to the behemoth large clades) and the family clades. The smaller clades are equally important, in terms of history and movement. It is only the family clades which do not currently merit the same level of attention.

Wing Genealogist
06-20-2017, 07:00 PM
Ignoring the Z40481 STR, it appears the tree immediately below P312 is composed of:

2 HUGE: ZZ11 (parent clade of U152 & DF27), Z260 (parent clade of L21)
Next level: L238, DF99, DF19, ZZ38, Y18209 & A9063



In comparison, U106 is composed of:

1 HUGE: Z381, 1 large: Z18
Next level: Z8052, S12025, S18632, FGC396, S19589, BY11501, A2150


EDIT: I am trying to list the SNPs in order of Largest to smallest. I am confident in the arrangement for U106, but am basically taking a guess for P312.

Mikewww
06-20-2017, 08:40 PM
Ignoring the Z40481 STR, it appears the tree immediately below P312 is composed of:

2 HUGE: ZZ11 (parent clade of U152 & DF27), Z260 (parent clade of L21)
Next level: L238, DF99, DF19, ZZ38, Y18209 & A9063



In comparison, U106 is composed of:

1 HUGE: Z381, 1 large: Z18
Next level: Z8052, S12025, S18632, FGC396, S19589, BY11501, A2150


EDIT: I am trying to list the SNPs in order of Largest to smallest. I am confident in the arrangement for U106, but am basically taking a guess for P312.

You can go look at the L21, U152, DF27, DF19 and P312 projects to get an idea.

L21 looks by far biggest but my guess is before all is said and done DF27 is largest. L21 looks to be next. After that is U152, then DF19 which is not too shabby a size. Then it is L238 and DF99 and Z2238, right Robert H?

Y18209 and A9063 are quite small population wise.

I'll go back to my caveat on comparing these subclades and what "layer" or branching level they are at. DF27 shows us the vagaries of comparing at what are thought to be comparable levels. DF27 doesn't even show up in standard testing and in some worlds of thought would not be considered valid (simple, nominal model). ZZ11 and ZZ12 could also easily be rejected. If you rejected those, like a stern lab tech would, you would have a whole ton of branches below ZZ12 that become immediate branches of P312.

http://www.ytree.net/DisplayTree.php?blockID=31

I count about two dozen branches immediately from ZZ12.

I don't know if Pan-European is the right description but I agree with the idea as I've noted before DF27 is more scattered than the other big subclades of P312. It appears to have been right in the middle of the big east meets west conflict in the mid-third millennium BC.

It's fun to look at all the flags at the bottom of the ZZ12 Big Tree (add Z195 too). You see Spain and Portugal of course, but also find Germany, France, Switzerland, Italy, Slovakia, Ireland, Scotland, England, Wales, Denmark, Netherlands, Belgium, Sweden, Finland. I know we have Poland out there somewhere.

Okay that's it. I've changed my mind. You see Lithuania, Ukraine, Belarus, Romania, Armenia and Turkey too. This is a Pan-European haplogroup.

Dewsloth
06-20-2017, 09:03 PM
Just a quick check of FTDNA groups. Note L21 is larger than P312, so these can't be taken as empirically accurate indicators and are subject to whatever demographic biases FTDNA may have:
R1b: 14,230 members
R1b Basal Subclades: 1,927
U106: 3,776
P312: 3,141
L21: 5,807
U152: 1,993
DF27: 1,893
DF19: 284
L238: 114
DF99: 83

Osiris
06-20-2017, 09:10 PM
It would be interesting to have graphs of # of known living subclades versus a timeline for various clades like P312, U106, etc. I tried to see if I could use the ytree to get a basic understanding but was unable to extract the data in a format I could use. Was thinking I could replace timeline with a SNP count. Be cool to make something like the histomap
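
A rough sketch of that kind of count, in Python. It is built only from the P312 branching named earlier in this thread, and it counts branching levels rather than SNPs or years, since neither is given here. `TREE` and `lineages_per_level` are illustrative names, not part of ytree or any other tool.

```python
from collections import deque

# Clades named earlier in this thread; anything deeper is left out.
TREE = {
    "P312": ["Z40481", "Z290", "DF19", "L238", "Y18209", "A9063"],
    "Z40481": ["ZZ11", "DF99", "ZZ37"],
    "ZZ11": ["U152", "DF27"],
    "ZZ37": ["L624", "Z29644", "Z39300"],
    "Z290": ["L21"],
}


def lineages_per_level(tree, root):
    """Count the clades found at each branching level below `root`."""
    counts = {}
    queue = deque([(root, 0)])
    while queue:
        clade, level = queue.popleft()
        counts[level] = counts.get(level, 0) + 1
        for child in tree.get(clade, []):
            queue.append((child, level + 1))
    return counts


# Crude text "graph": one '#' per clade at each level.
for level, count in sorted(lineages_per_level(TREE, "P312").items()):
    print(f"level {level}: {'#' * count} ({count})")
```

Swapping the level number for a SNP count or an age estimate per clade would only change what is stored alongside each name.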

Cofgene
06-20-2017, 11:31 PM
It would be interesting to have graphs of # of known living subclades versus a timeline for various clades like P312, U106, etc. I tried to see if I could use the ytree to get a basic understanding but was unable to extract the data in a format I could use. Was thinking I could replace timeline with a SNP count. Be cool to make something like the histomap


That is coming. Active discussions on the current state of Iain McDonald's age estimation result for P312 are in process. The SVG graphic that is provided as part of the output will line everything up on a timeline. There are still usability issues with that particular presentation. Individuals with advanced skills in html5/SVG could build a better output. We need to improve this format (http://www.jb.man.ac.uk/~mcdonald/genetics/tree.html). Send me a PM if you have a technical wizard who could help via an open source solution.

Osiris
06-21-2017, 12:01 AM
send me a PM if you have a technical wizard that could help via an open source solution.
If I had known back in college how much math, statistics and programming I could use in my genealogy research I'd have taken such a different course than I did. If you need someone to search an entire census reel from Walker Co., AL in 1850 looking for a specific Jones family then I'm your guy.

razyn
06-21-2017, 12:23 AM
I'll go back to my caveat on comparing these subclades and what "layer" or branching level they are at. DF27 shows us the vagaries of comparing at what are thought to be comparable levels. DF27 doesn't even show up in standard testing and in some worlds of thought would not be considered valid (simple, nominal model). ZZ11 and ZZ12 could also easily be rejected. If you rejected those, like a stern lab tech would, you would have a whole ton of branches below ZZ12 that become immediate branches of P312.

Since you mention it, I'll go back to my caveat that the concept of a mutation rate is both theoretical (if we believe there is one, we can apply it in formulae) and very flexible (the number of years used as this "rate" varies wildly over time, across haplogroups, between labs, and as new upstream SNPs get discovered). So it's useful to have one, and I applaud those who try to refine it; but I'm not actually a believer in the underlying assumption (that there genuinely is, or ever was, a rate).

Even less do I concur in the widespread consensus that a Single Nucleotide Polymorphism rules the lineages of its descendants; but other mutations (in STRs, on palindromes, Indels, RecLOH events, etc.) do not, so we need not count them. Although everything later has branched, as of that non-SNP event -- notably including the migratory paths of its bearers, or non-bearers. Leaving off our tree the ones that aren't really SNPs, or are SNPs but are in a troublesome region, more or less cripples the runner before the race begins. We may never agree on how to calculate the age of ZZ11 (or whatever, pick your own problem mutation); but it's older than its sons, and younger than its daddy. It happened to one guy in one place at one time, and is heritable.

The context of this particular rant, most recently, was a little disagreement I was having with Mikewww. Actually I think we are in agreement, about what he's said on the present thread. He mentioned his caveat, so I'm mentioning mine. http://www.anthrogenica.com/showthread.php?10749-Corded-Ware-origin-for-P312&p=240561&viewfull=1#post240561

Wing Genealogist
06-21-2017, 12:28 AM
This discussion has helped me better understand just what a complex "beast" the P312 clade is. I always knew it was much larger than U106, but I never really paid much attention to the details. Now that I am looking at the details, I think I need some Excedrin :biggrin1:

GoldenHind
06-21-2017, 01:48 AM
You can go look at the L21, U152, DF27, DF19 and P312 projects to get an idea.

L21 looks by far biggest but my guess is before all is said and done DF27 is largest. L21 looks to be next. After that is U152, then DF19 which is not too shabby a size. Then it is L238 and DF99 and Z2238, right Robert H?

Y18209 and A9063 are quite small population wise.

I'll go back to my caveat on comparing these subclades and what "layer" or branching level they are at. DF27 shows us the vagaries of comparing at what are thought to be comparable levels. DF27 doesn't even show up in standard testing and in some worlds of thought would not be considered valid (simple, nominal model). ZZ11 and ZZ12 could also easily be rejected. If you rejected those, like a stern lab tech would, you would have a whole ton of branches below ZZ12 that become immediate branches of P312.

I don't know if Pan-European is the right description but I agree with the idea as I've noted before DF27 is more scattered than the other big subclades of P312. It appears to have been right in the middle of the big east meets west conflict in the mid-third millennium BC.

It's fun to look at all the flags at the bottom of the ZZ12 Big Tree (add Z195 too). You see Spain and Portugal of course, but also find Germany, France, Switzerland, Italy, Slovakia, Ireland, Scotland, England, Wales, Denmark, Netherlands, Belgium, Sweden, Finland. I know we have Poland out there somewhere.

Okay that's it. I've changed my mind. You see Lithuania, Ukraine, Belarus, Romania, Armenia and Turkey too. This is a Pan-European haplogroup.

You may well be right about the relative sizes of the P312 subclades, but counting subclade project numbers may not be an accurate method of determining this. For example, DF99, unlike DF19, is not included in the Geno 2 test, which has identified many DF19 men. Secondly, DF99 wasn't discovered until several years after the others (except for ZZ37/38), so it has several years of catching up to do. It has only been with the recent introduction of the SMS-M343 test that the numbers of DF99 have really begun to expand. Finally, DF99 appears to be primarily a continental marker, and so suffers from the enormous overweighting of samples of Isles origin in the FTDNA database.

L238 was discovered fairly early on, and has the advantage of being readily identifiable from its STR signature (Nordtvedt's old R1b-Norse variety).

What is really needed is some scientific sampling studies that include testing of all of these. The only one I am aware of is the sample of 500 men from Holland in the Genomes of the Netherlands study. Of the 500, 93 were P312. Of these, 36 were U152, 22 were DF27>Z195, 16 were L21, 10 were DF19, 4 were DF99, there were no L238, and the remaining 5 were unclassified beyond P312. DF27 itself wasn't tested. Of course these proportions would likely vary considerably from country to country.
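
For anyone who wants the Dutch figures as percentages, here is a minimal Python sketch. The counts are the ones quoted in this post; the variable and function names (`P312_COUNTS`, `shares`) are just illustrative.

```python
# Counts quoted above from the sample of 500 Dutch men.
SAMPLE_SIZE = 500
P312_COUNTS = {
    "U152": 36,
    "DF27>Z195": 22,
    "L21": 16,
    "DF19": 10,
    "DF99": 4,
    "L238": 0,
    "unclassified": 5,
}

p312_total = sum(P312_COUNTS.values())  # should match the 93 quoted above


def shares(counts, denominator):
    """Percentage of `denominator` for each clade, to one decimal place."""
    return {name: round(100 * n / denominator, 1) for name, n in counts.items()}


of_p312 = shares(P312_COUNTS, p312_total)
of_sample = shares(P312_COUNTS, SAMPLE_SIZE)

for name in P312_COUNTS:
    print(f"{name:>12}: {of_p312[name]:5.1f}% of P312, {of_sample[name]:4.1f}% of all 500")
```

The seven counts do add up to the 93 P312 men, so the breakdown is internally consistent.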

Mikewww
06-21-2017, 02:14 AM
You may well be right about the relative sizes of the P312 subclades, but counting subclade project numbers may not be an accurate method of determining this.
I agree totally. I was just trying to respond to Raymond with some way to comprehend the sizes of the various subclades below:

I am trying to list the SNPs in order of Largest to smallest



What is really needed is some scientific sampling studies that include testing of all of these.
Agreed. We just don't have it, though.

Mikewww
06-21-2017, 02:24 AM
Since you mention it, I'll go back to my caveat that the concept of a mutation rate is both theoretical (if we believe there is one, we can apply it in formulae) and very flexible (the number of years used as this "rate" varies wildly over time, across haplogroups, between labs, and as new upstream SNPs get discovered). So it's useful to have one, and I applaud those who try to refine it; but I'm not actually a believer in the underlying assumption (that there genuinely is, or ever was, a rate).
There is a mutation rate. That is for sure. What is not precisely known is how much variation we have in the mutation rate. YFull, McDonald and others try to estimate the confidence intervals for these estimates.
This is not a "belief" thing. It is a reality. The question is if the variance is so great that TMRCA estimates are useful or not and to what extent.
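
To make the point about variance concrete, here is a bare-bones Python sketch of a SNP-counting age estimate. It is not the method YFull or McDonald use. The function name `tmrca_years`, the rate and the coverage figure are all hypothetical placeholders, and the interval is a simple normal approximation to a Poisson count.

```python
import math


def tmrca_years(snp_count, rate_per_bp_per_year, covered_bp, z=1.96):
    """Rough age and interval from a SNP count.

    Assumes mutations arrive as a Poisson process at a constant rate,
    so the count has a standard deviation of about sqrt(count).
    """
    snps_per_year = rate_per_bp_per_year * covered_bp
    age = snp_count / snps_per_year
    half_width = z * math.sqrt(snp_count) / snps_per_year
    return age, max(0.0, age - half_width), age + half_width


# Hypothetical inputs, for illustration only: 36 SNPs on a lineage,
# a rate of 8e-10 per base pair per year, 10 million base pairs covered.
age, low, high = tmrca_years(36, 8e-10, 10_000_000)
print(f"about {age:.0f} years (roughly {low:.0f} to {high:.0f})")
```

Even with the rate held fixed, the interval is wide when the SNP count is small, which is the usefulness question raised above. Uncertainty in the rate itself would widen it further.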


Even less do I concur in the widespread consensus that a Single Nucleotide Polymorphism rules the lineages of its descendants; but other mutations (in STRs, on palindromes, Indels, RecLOH events, etc.) do not, so we need not count them.
I absolutely agree. TMRCA estimation methods would be better if they used all available data, all available variants. There are some 300-500 STRs hidden in Big Y BAM files that need to be teased out and used along with other available data, such as constraints set by ancient DNA finds.
The state of our TMRCA methods is at a low level though. That is not a criticism of the great work of hobbyists and scientists. That is an assessment of their capability/funding to use real analytical computational assets, like people working on cancer and other medical issues do.


We may never agree on how to calculate the age of ZZ11 (or whatever, pick your own problem mutation); but it's older than its sons, and younger than its daddy.
Agreed, that's why I'm stressing the low levels of genetic distance between some of these subclade MRCAs. The MRCA of P312, the MRCA of U152, the MRCA of DF27 and the MRCA of DF99 all have very low genetic distances to each other. I think DF19 falls in the same boat.

BTW, DF19 has German and English written all over it.

L238 has Nordic written all over it.

DF99 is truly quite scattered, almost like VisiGoth.

DF27 has shown significant impact across the board and so is deserving of a Pan-European moniker. There is a SW European bent, but that is almost more like a wave that hit the Pyrenees, accumulated there, and then spilled over into the Iberian Peninsula over a long period of success.

U152, I don't know. It is Alpine, Italic and right up (or down I should say) the Rhine. Urnfielders and Hallstatt possibly with some DF27 in the mix.

The L21 ancestral lineage was more of a steamroller heading northwest and across the English Channel where it set up for a long time.

MitchellSince1893
06-21-2017, 05:18 AM
...U152, I don't know. It is Alpine, Italic and right up (or down I should say) the Rhine. Urnfielders and Hallstatt possibly with some DF27 in the mix...

I would also add La Tene, Gauls, and the Belgae as groups that may have had higher than average U152 representation and been major contributors of U152 in the Isles. I have a feeling that, if better sampled, present day France, especially the eastern half, would show up with high percentages for U152; but the legal issues and cultural views of DNA testing prevent us from getting a more accurate picture.

rms2
06-21-2017, 01:16 PM
. . .

What is really needed is some scientific sampling studies that include testing of all of these. The only one I am aware of is the sample of 500 men from Holland in the Genomes of the Netherlands study. Of the 500, 93 were P312. Of these, 36 were U152, 22 were DF27>Z195, 16 were L21, 10 were DF19, 4 were DF99, there were no L238, and the remaining 5 were unclassified. DF27 itself wasn't tested. Of course these proportions would likely vary considerably from country to country.

You're right, we need some better, more up-to-date studies. I don't mean to imply that you said such studies will radically alter the relative proportions of the P312 subclades; you didn't.

However, for those who may not have followed that, we have the Busby et al paper from five years ago or so. They did not test for DF19, DF27, or DF99, but they did have a catchall category, P312xL21,U152, and it's that remainder those subclades have left to split amongst themselves in each sample location.

One can see what proportion of the whole it is at each sample location on this Busby stats map (https://tinyurl.com/yarkw5zf).

Here are some examples:

1. Germany North, N=64: P312xL21,U152 = 3.1%

2. Friesland, N=94: P312xL21,U152 = 4.3%

3. Netherlands, N=87: P312xL21,U152 = 6.9%

4. Germany East, N=47: P312xL21,U152 = 0%

5. Germany West, N = 100: P312xL21,U152 = 10%

6. Germany South, N = 91: P312xL21,U152 = 9.9%

7. Germany Freiburg, N = 102: P312xL21,U152 = 7.8%

9. East France, N = 80: P312xL21,U152 = 7.5%

Those percentages, which, in the examples above, reach their maximums in SW Germany and nearby SE France, would have to be divided among all three of those subclades, DF19, DF27, and DF99, not to mention the very minor P312 subclades mentioned in this thread.

Even in the Genome of the Netherlands Project you mentioned, the percentage of DF19 was just 2% and that of DF99 0.8%.

Dewsloth
06-21-2017, 03:55 PM
You're right, we need some better, more up-to-date studies. I don't mean to imply that you said such studies will radically alter the relative proportions of the P312 subclades; you didn't.

However, for those who may not have followed that, we have the Busby et al paper from five years ago or so. They did not test for DF19, DF27, or DF99, but they did have a catchall category, P312xL21,U152, and it's that remainder those subclades have left to split amongst themselves in each sample location.



Thank you! I think you just solved a mystery:
This must be the source of the goofy "DF19" map and stats that Living DNA supplied for my Dad's results. :lol:

I thought the distribution looked more like DF27 (and it's unlikely that Spain is 35% DF19!); I guess it was the only source Living DNA could find without plagiarizing FTDNA's group map.
See attached: http://www.anthrogenica.com/showthread.php?9681-Living-DNA-the-Y&p=245925&viewfull=1#post245925

rms2
06-21-2017, 04:01 PM
Thank you! I think you just solved a mystery:
This must be the source of the goofy "DF19" map and stats that Living DNA supplied for my Dad's results. :lol:

I thought the distribution looked more like DF27 (and it's unlikely that Spain is 35% DF19!); I guess it was the only source Living DNA could find without plagiarizing FTDNA's group map.
See attached: http://www.anthrogenica.com/showthread.php?9681-Living-DNA-the-Y&p=245925&viewfull=1#post245925

It seems doubtful they based that on Busby et al. The figures don't look like the figures from that paper.

GoldenHind
06-21-2017, 06:16 PM
You're right, we need some better, more up-to-date studies. I don't mean to imply that you said such studies will radically alter the relative proportions of the P312 subclades; you didn't.

However, for those who may not have followed that, we have the Busby et al paper from five years ago or so. They did not test for DF19, DF27, or DF99, but they did have a catchall category, P312xL21,U152, and it's that remainder those subclades have left to split amongst themselves in each sample location.

One can see what proportion of the whole it is at each sample location on this Busby stats map (https://tinyurl.com/yarkw5zf).

Those percentages, which, in the examples above, reach their maximums in SW Germany and nearby SE France, would have to be divided among all three of those subclades, DF19, DF27, and DF99, not to mention the very minor P312 subclades mentioned in this thread.

Even in the Genome of the Netherlands Project you mentioned, the percentage of DF19 was just 2% and that of DF99 0.8%.

The problem with those stats is that the percentages given are of the entire male population, including all haplogroups, which makes them seem quite insignificant. What we are discussing on this thread is their importance within P312. So in the Dutch study, DF19 consists of just under 11% of P312 and DF99 4.3% of P312. Together they represent about 15% of P312, which is hardly insignificant.
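The two ways of expressing these figures (share of all men sampled versus share of P312 only) can be checked from the counts quoted earlier in the thread for the Dutch study: 500 men, 93 of them P312. The following is only a minimal Python sketch; the dictionary and function names are made up for illustration and are not taken from the study.

```python
# Counts as quoted in this thread for the Genome of the Netherlands sample.
TOTAL_MEN = 500
P312_COUNTS = {
    "U152": 36, "DF27>Z195": 22, "L21": 16, "DF19": 10,
    "DF99": 4, "L238": 0, "unclassified": 5,
}

def shares(count, p312_total, sample_total):
    """Return (percent of all men sampled, percent of P312 only)."""
    return 100 * count / sample_total, 100 * count / p312_total

p312_total = sum(P312_COUNTS.values())  # the 93 P312 men

for clade in ("DF19", "DF99"):
    of_all, of_p312 = shares(P312_COUNTS[clade], p312_total, TOTAL_MEN)
    print(f"{clade}: {of_all:.1f}% of all men, {of_p312:.1f}% of P312")
```

Run as written, this should reproduce both sets of numbers being argued over: 2% and 0.8% of the whole sample, and just under 11% and 4.3% of P312.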

Nor would I bet the farm on the reliability of the small sampling from Busby. For instance, looking at his stats for Germany East, one might conclude that DF27, DF19 and DF99 were entirely absent there. However a study was published in 2013 which included 121 samples from Mecklenburg (in eastern Germany) and 218 from Bavaria. They tested U152, L21 and SRY2627, so at least picked up a portion of DF27. The P312xU152,L21,SRY2627 portion of the total was 8.4% in Mecklenburg and 6.8% in Bavaria. Expressed as a percentage of P312, P312xU152,L21,SRY2627 was 44% of P312 in Mecklenburg and 35% in Bavaria. This is considerably different than zero. It also suggests that the DF27/DF19/DF99/L238 portion of P312 is higher in at least one part of eastern Germany than in the south.

The study in question is Rebala et al., "Contemporary Paternal Genetic Landscape of Polish and German Populations," published in the European Journal of Human Genetics, 2013.

rms2
06-21-2017, 06:51 PM
The problem with those stats is that the percentages given are of the entire male population, including all haplogroups, which makes them seem quite insignificant. What we are discussing on this thread is their importance within P312. So in the Dutch study, DF19 consists of just under 11% of P312 and DF99 4.3% of P312. Together they represent about 15% of P312, which is hardly insignificant.

I've always felt the proportion of the total population is more important than the proportion of just P312 or of just R1b.

If, as you say, the topic is importance within P312, then another way to think of it is that, in that Dutch study, if DF19 and DF99, taken together, comprise 15% of the P312 (yet just 2% and 0.8% of the total population respectively), then 85% of the P312 is something else, and just over 97% of the total population is something else.

However you slice it, according to that Genome of the Netherlands study, out of every 100 Dutchmen, two are DF19. We need 125 Dutchmen to get to one DF99.



Nor would I bet the farm on the reliability of the small sampling from Busby. For instance, looking at his stats for Germany East, one might conclude that DF27, DF19 and DF99 were entirely absent there. However a study was published in 2013 which included 121 samples from Mecklenburg (in eastern Germany) and 218 from Bavaria. They tested U152, L21 and SRY2627, so at least picked up a portion of DF27 . . .

You're comparing apples and oranges, since Busby et al's Germany East sample (N=47), which came from the Myres et al study, was not collected in Mecklenburg, but quite a ways south of there, though not as far west and south as Bavaria.

I listed the sample sizes of the examples I cited. They weren't small. Why should Busby's test results be doubted? I did not like the outcome in every case either, but there is no reason to think the men they tested weren't what the results indicated.

It would be nice if Busby's and Myres' P312xL21,U152 samples could be tested for the current array of P312 SNPs (other than L21 and U152).

The point of my previous post was that, if Busby et al is fairly representative (and I don't think there is much reason to doubt that it is), then there is only so much P312xL21,U152 left to be divvied up between DF19, DF27, DF99, and the rest.

More continental testing is not likely to alter the landscape much. Besides, probably the bulk of that remainder can be expected to go to DF27.

GoldenHind
06-22-2017, 07:07 PM
Those percentages, which, in the examples above, reach their maximums in SW Germany and nearby SE France, would have to be divided among all three of those subclades, DF19, DF27, and DF99, not to mention the very minor P312 subclades mentioned in this thread.

Even in the Genome of the Netherlands Project you mentioned, the percentage of DF19 was just 2% and that of DF99 0.8%.

I don't question the accuracy of Busby's testing, just the conclusions you drew from it based on a sample of 47 from somewhere in eastern Germany. The Rebala sampling I cited is not only much larger (121 in Mecklenburg and 218 in Bavaria), but also tested for at least a portion of DF27. It clearly shows that the overall percentage of P312xU152,L21,DF27/SRY2627 is larger in Mecklenburg in the east than in Bavaria in the south. Looking at the Busby data one might think that this group fades to nothing away from southern and western Germany, as you clearly do. But that is clearly not the case.

They may well be mainly DF27 as you assume. However while DF27 is pan-European and may well be the largest P312 subclade, that does not mean it is the most common P312 subclade in every region in Europe. I can say that there are at least two among the 80+ members of the DF99 project with confirmed German origins in the former eastern German provinces of Posen and Silesia (both now in Poland), so it is clearly present in the east.

The question originally presented was whether DF27, U152 and L21 were the only P312 subclades who weren't small/family subclades. Of course small is a relative term. Holland has a population of 17 million. Assuming half of that is male and that the Dutch study percentages for DF19 and DF99 are representative for the entire country, that translates into some 170,000 DF19 men there and some 68,000 DF99 men. You may consider that insignificant. I do not.
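The head-count arithmetic above is simple enough to sketch in a few lines of Python. The population figure, the one-half male share, and the 2% and 0.8% frequencies are all taken from this post; the function name is just an illustrative choice, and the result is only as good as the assumption that the sample frequencies hold for the whole country.

```python
def estimated_carriers(population, male_fraction, frequency):
    """Scale a sample frequency up to an approximate head count of men."""
    return population * male_fraction * frequency

DUTCH_POPULATION = 17_000_000  # figure used in the post
MALE_FRACTION = 0.5            # "assuming half of that is male"

df19 = estimated_carriers(DUTCH_POPULATION, MALE_FRACTION, 0.02)   # 2% DF19
df99 = estimated_carriers(DUTCH_POPULATION, MALE_FRACTION, 0.008)  # 0.8% DF99

print(f"DF19: about {df19:,.0f} men")
print(f"DF99: about {df99:,.0f} men")
```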

MitchellSince1893
06-22-2017, 07:28 PM
FWIW, Busby et al's Germany East sample (N=47). Germany East has about 16.3 million people.

In order to have a 95% confidence level with a 5% margin of error, one would need a minimum of 385 random samples from Germany East.

http://www.calculator.net/sample-size-calculator.html

rms2
06-22-2017, 07:43 PM
I don't question the accuracy of Busby's testing, just the conclusions you drew from it based on a sample of 47 from somewhere in eastern Germany . . .

What? I drew no conclusions from the Myres et al Germany East sample. It was one of many examples I cited. I paid no special attention to it in my initial post on this subject nor did I emphasize it. It caught your eye, I guess, because it had 0% P312xL21,U152. It apparently caused you some concern that I don't share, which is why I gave it no special emphasis. It was just another Germany-and-nearby stat from Busby as far as I was concerned.

The whole point of my post was that Busby's P312xL21,U152 category is what is left to be divided up among DF19, DF27, DF99, and the rest of P312.

If it is at all representative of reality and not grossly inaccurate, then there isn't enough P312 that isn't L21 or U152 left (not that there is all that much continental L21) to alter the European P312 landscape much, even if new studies are undertaken and the population is tested for all that Busby left out.

That was the only conclusion I drew.

Here are the NINE examples from Busby et al I cited, among which #4, Germany East, received no special emphasis.

http://www.anthrogenica.com/showthread.php?11016-Subclades-of-P312-amp-U106&p=249976&viewfull=1#post249976



Here are some examples:

1. Germany North, N=64: P312xL21,U152 = 3.1%

2. Friesland, N=94: P312xL21,U152 = 4.3%

3. Netherlands, N=87: P312xL21,U152 = 6.9%

4. Germany East, N=47: P312xL21,U152 = 0%

5. Germany West, N = 100: P312xL21,U152 = 10%

6. Germany South, N = 91: P312xL21,U152 = 9.9%

7. Germany Freiburg, N = 102: P312xL21,U152 = 7.8%

9. East France, N = 80: P312xL21,U152 = 7.5%

rms2
06-22-2017, 07:54 PM
FWIW, Busby et al's Germany East sample (N=47). Germany East has about 16.3 million people.

In order to have a 95% confidence level with a 5% margin of error, one would need a minimum of 385 random samples from Germany East. At 47 samples the confidence interval is: 60% ± 14.01%, or 45.99% - 74.01%.

http://www.calculator.net/sample-size-calculator.html?type=2&cl2=95&ss2=47&pc2=60&ps2=16300000&x=70&y=14
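For anyone who wants to see where a figure like the 14.01% quoted above comes from, here is a minimal Python sketch using the textbook normal-approximation formula for a sample proportion. It is an assumption that the linked calculator works this way; the 60% proportion and N=47 are the values that appear in the quoted post and its link, and z = 1.96 is the conventional value for a 95% confidence level.

```python
import math

Z_95 = 1.96  # z-score conventionally used for a 95% confidence level

def margin_of_error(p, n, z=Z_95):
    """Normal-approximation margin of error for a sample proportion p with n samples."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(0.60, 47)
low, high = 0.60 - moe, 0.60 + moe
print(f"60% +/- {100 * moe:.2f}%  ({100 * low:.2f}% to {100 * high:.2f}%)")
```

With a population of 16.3 million, a finite-population correction would change the result only negligibly, so it is left out of this sketch.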

You are following Robert's lead and making much more of the Germany East sample than I did. As I pointed out above, it was merely one of nine examples I cited from Germany and nearby. I gave it no special prominence.

Once again, the whole point of citing Busby et al was to show that, if it is at all accurate, there isn't a whole lot of P312xL21,U152 left to be divided up among DF19, DF27, DF99 and the rest.

I pointed that out to make it clear that no one should really expect new studies to radically alter what we currently know of the y-dna landscape of continental Europe. New studies are welcome. They should be done. They can show us how all the P312 subclades are distributed among modern Europeans. But we shouldn't look for big changes in the proportions of the subclades.

rms2
06-22-2017, 08:09 PM
If we add the examples from Rebala et al cited by Robert above, we still don't get any shocking changes.

10. Mecklenburg, N=121: P312xL21,U152,SRY2627 = 8.4%

11. Bavaria, N= 218: P312xL21,U152,SRY2627 = 6.8%

That means in Mecklenburg 8.4% of the male population is left to be divided up among DF19, DF27xSRY2627, DF99, and all the rest.

In Bavaria, 6.8% of the male population is left to be divided up among DF19, DF27xSRY2627, DF99, and all the rest.

That makes my point and does not do anything, really, to counter it.

It was not my purpose to try to make DF19, DF27, DF99 and the rest of P312xL21,U152 seem unimportant. I don't think they are unimportant. I was just pointing out that we probably should not expect one of them to surprise us and turn out to be much more in some area of Central or Eastern Europe than we currently think it is. In other words, when there is only 8% or so of the population left to work with and be divided up among several P312 subclades, one of those clades is not going to somehow miraculously turn out to be 15% or 20% of the population.

MitchellSince1893
06-22-2017, 08:29 PM
You are following Robert's lead and making much more of the Germany East sample than I did. As I pointed out above, it was merely one of nine examples I cited from Germany and nearby. I gave it no special prominence.

Once again, the whole point of citing Busby et al was to show that, if it is at all accurate, there isn't a whole lot of P312xL21,U152 left to be divided up among DF19, DF27, DF99 and the rest.

I pointed that out to make it clear that no one should really expect new studies to radically alter what we currently know of the y-dna landscape of continental Europe. New studies are welcome. They should be done. They can show us how all the P312 subclades are distributed among modern Europeans. But we shouldn't look for big changes in the proportions of the subclades.

Full disclosure. I'm not reading all the details of your discussion with Robert. I was just pointing out that, while Busby is the best we have in a lot of areas, in general we shouldn't treat it as gospel, as the sample sizes are often very small for the area they are supposed to represent. If Busby had taken all 47 samples from a tiny village in Eastern Germany then yeah that may be representative of that village, but that's about all we could say with any degree of confidence. To take this as being an accurate representation of a whole region, e.g. Eastern Germany, is risky.

If another study has a much larger sample size for an area than Busby then I would be inclined to give it more weight. That was my only point.

GoldenHind
06-23-2017, 12:08 AM
If we add the examples from Rebala et al cited by Robert above, we still don't get any shocking changes.

10. Mecklenburg, N=121: P312xL21,U152,SRY2627 = 8.4%

11. Bavaria, N= 218: P312xL21,U152,SRY2627 = 6.8%

That means in Mecklenburg 8.4% of the male population is left to be divided up among DF19, DF27xSRY2627, DF99, and all the rest.

In Bavaria, 6.8% of the male population is left to be divided up among DF19, DF27xSRY2627, DF99, and all the rest.

That makes my point and does not do anything, really, to counter it.

It was not my purpose to try to make DF19, DF27, DF99 and the rest of P312xL21,U152 seem unimportant. I don't think they are unimportant. I was just pointing out that we probably should not expect one of them to surprise us and turn out to be much more in some area of Central or Eastern Europe than we currently think it is. In other words, when there is only 8% or so of the population left to work with and be divided up among several P312 subclades, one of those clades is not going to somehow miraculously turn out to be 15% or 20% of the population.

You seem to be missing my point. No one is claiming that the amount of P312xU152,L21 in Germany is likely to surprise us or turn out to be 15% or 20% of the total population. What I am contesting is your statement that this group "reaches its maximum in SW Germany and nearby SE France," based on the small numbers sampled by Busby, almost all of which are less than 100. The much larger Rebala study suggested the opposite, at least with regard to Mecklenburg in eastern Germany compared to Bavaria in the south.

Mitchell underscored this. My point is merely that one simply can't assume samples of a hundred or less accurately reflect a population of millions, justifying such sweeping statements.

I can't speak for DF27, but I would be astounded if DF99 turns out to reach its maximum in SE France. I know France is severely under tested in the FTDNA database, but so far we have only identified one DF99 example there, from a location in Savoy which is very near the borders with Switzerland and Italy. The other French DF99's are from Normandy, a long way from the southeast of that country.

RobertCasey
06-23-2017, 12:16 AM
Here is a summary of my haplogroup R spreadsheet for 67 markers (which has 53,848 testers). This is only based on the terminal YSNP in the YSTR report (which is only pulled once when the tester upgrades to 67 markers). So, a lot of the M269 would move to other haplogroups - but should be proportional if testing is somewhat consistent over time. This also includes both confirmed and predicted results combined.

M269 - 33,492
L21 - 6,858
R1a - 5,415
U106 - 3,413
DF27 - 1,349
U152 - 1,272
P312 - 1,228
R1b* - 674
R2* - 246
R1* - 95
R* - 10

Only looking at R1b:

M269 - 33,492
L21 - 6,858
U106 - 3,413
DF27 - 1,349
U152 - 1,272
P312 - 1,228
R1b* - 674
Total - 48,286

Ignoring M269 and R1b*

L21 - 6,858 - 49 %
U106 - 3,413 - 24 %
DF27 - 1,349 - 10 %
U152 - 1,272 - 9 %
P312 - 1,228 - 9 %
Total - 14,120
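The percentages in that last table can be recomputed from the raw counts. This is just a small Python check using the numbers posted above; nothing beyond those counts is assumed.

```python
# Terminal-YSNP counts from the table above (M269 and R1b* left out).
counts = {"L21": 6858, "U106": 3413, "DF27": 1349, "U152": 1272, "P312": 1228}

total = sum(counts.values())
percentages = {clade: round(100 * n / total) for clade, n in counts.items()}

for clade, pct in percentages.items():
    print(f"{clade} - {counts[clade]:,} - {pct} %")
print(f"Total - {total:,}")
```

The rounded percentages add up to 101 rather than 100; that is rounding, not an error in the counts.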

These are the actual YSNP values found in 5,000 FTDNA projects (67 markers only) but are only from the terminal YSNP in the YSTR report. Since testing of DF27 started later, its percentage is probably higher than it actually is today. But as Mike W stated, DF27 is probably less tested than others. L21, U106 and P312 have been around a long time, so their numbers probably track over time.

rms2
06-23-2017, 01:37 AM
You seem to be missing my point . . . What I am contesting is your statement that this group "reaches its maximum in SW Germany and nearby SE France," based on the small numbers sampled by Busby, almost all of which are less than 100 . . .

Your point is meaningless because that is not what I wrote.

Here is what I wrote:



Those percentages, which, in the examples above, reach their maximums in SW Germany and nearby SE France, would have to be divided among all three of those subclades, DF19, DF27, and DF99, not to mention the very minor P312 subclades mentioned in this thread.

So you see, I never said P312xL21,U152 reaches its maximum in SW Germany and nearby SE France. I said "in the examples above", meaning among those nine I cited, it reaches its maximum in SW Germany and nearby SE France.

Pretty obviously, Busby's P312xL21,U152 category reaches its maximums in Iberia and France, since it includes DF27, so I would never have claimed what you erroneously thought I claimed.



Mitchell underscored this. My point is merely that one simply can't assume samples of a hundred or less accurately reflect a population of millions, justifying such sweeping statements.

Mitchell did not read all the posts and focused mistakenly on one sample set, Germany East. Busby et al sampled numerous locations, accumulating a large total number of samples. The results I cited for P312xL21,U152 ranged from a low of 0 to a high of 10%. The point I made was that those are the figures left to be divided among DF19, DF27, DF99 and all the rest, no more and no less.

The figures you cited from Rebala et al are actually in line with what Busby et al found, as I already pointed out.

Besides, I did NOT make any "sweeping statements", not a one.



I can't speak for DF27, but I would be astounded if DF99 turns out to reach its maximum in SE France. I know France is severely under tested in the FTDNA database, but so far we have only identified one DF99 example there, from a location in Savoy which is very near the borders with Switzerland and Italy. The other French DF99's are from Normandy, a long way from the southeast of that country.

No one said DF99 reaches its maximum in SE France. I certainly did not. You misunderstood what I wrote and took it as a claim that all of Busby's P312xL21,U152 reaches its maximum in SW Germany and nearby SE France. As I already pointed out, that is not what I wrote.

My point, as I have said several times already, is that, if Busby et al is accurate, its P312xL21,U152 category is all there is left to be divided among the several P312 clades DF19, DF27, DF99, L238 and the rest. Therefore, even with new studies and more inclusive testing, we will not see any radical changes in the proportions of P312 subclades.

That's it.

rms2
06-23-2017, 01:51 AM
Full disclosure. I'm not reading all the details of your discussion with Robert. I was just pointing out that, while Busby is the best we have in a lot of areas, in general we shouldn't treat it as gospel, as the sample sizes are often very small for the area they are supposed to represent. If Busby had taken all 47 samples from a tiny village in Eastern Germany then yeah that may be representative of that village, but that's about all we could say with any degree of confidence. To take this as being an accurate representation of a whole region, e.g. Eastern Germany, is risky.

Again, you are focusing on one sample set, Germany East. Busby sampled numerous locations throughout Germany and most of the rest of Europe, accumulating a large number of samples.

No one said Busby was to be taken as "gospel", but we aren't likely to get any studies that are to be taken as gospel anytime soon. Do you envision a study that will sample 500 or 1,000 men at each of numerous locations throughout Europe? That would be a trifle pricey.

The results in Busby were fairly consistent, and as I pointed out already, in line with the results of the Rebala et al study cited by Robert.



If another study has a much larger sample size for an area than Busby then I would be inclined to give it more weight. That was my only point.

Look again at all the examples I cited:




1. Germany North, N=64: P312xL21,U152 = 3.1%

2. Friesland, N=94: P312xL21,U152 = 4.3%

3. Netherlands, N=87: P312xL21,U152 = 6.9%

4. Germany East, N=47: P312xL21,U152 = 0%

5. Germany West, N = 100: P312xL21,U152 = 10%

6. Germany South, N = 91: P312xL21,U152 = 9.9%

7. Germany Freiburg, N = 102: P312xL21,U152 = 7.8%

9. East France, N = 80: P312xL21,U152 = 7.5%


Germany East from Myres et al, at N=47, was the smallest, and that is actually not a bad sample size for a population genetics study.

Do you really think Busby's results are inaccurate and that a new study would overturn them drastically?

I don't.

Again, the point I was making was that Busby's P312xL21,U152 category is what is left for DF19, DF27, DF99 and the rest to carve up amongst themselves, which means new studies are not at all likely to alter the current picture of the relative proportions of P312 subclades on the European continent.

That's a fairly simple and straightforward proposition, and it is pretty obviously true.

MitchellSince1893
06-23-2017, 02:15 AM
Do you really think Busby's results are inaccurate and that a new study would overturn them drastically?

I don't.
It depends on your definition of "drastically". Please state within how many % you think Busby's numbers will be compared to a study with at least 385 random samples per geographic region.

rms2
06-23-2017, 02:32 AM
It depends on your definition of "drastically". Please state within how many % you think Busby's numbers will be compared to a study with at least 385 random samples per geographic region.

Why stop at 385? Why not make it 500 or 1,000?

How likely are we to get such a dream study anytime soon?

Do you really think Busby et al was so inaccurate that a new study with 385 samples at each of numerous locations throughout Europe would change the relative proportions of P312 in any meaningful way?

I think those 385 samples would align themselves pretty much as Busby's samples did. U152 and L21 would not suddenly be found to diminish relative to DF19, DF27, DF99 and the rest.

MitchellSince1893
06-23-2017, 02:44 AM
Why stop at 385? Why not make it 500 or 1,000?


Do you really think Busby et al was so inaccurate that a new study with 385 samples at each of numerous locations throughout Europe would change the relative proportions of P312 in any meaningful way?

I think those 385 samples would align themselves pretty much as Busby's samples did. U152 and L21 would not suddenly be found to diminish relative to DF19, DF27, DF99 and the rest.

You need at least 383 samples to achieve a 95% confidence level with a 5% confidence interval for subject populations over 68,000. You can get 500 or 1000, but a minimum of 383 to achieve that level of confidence (385 when you get over 1 million)

It depends on your definition of "meaningful way". Please state within how many % you think Busby's numbers will be compared to a study with at least 383 random samples per geographic region.
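The 383 and 385 figures above can be reproduced with the standard sample-size formula for a proportion plus a finite population correction. This is a hedged Python sketch: it assumes the linked calculator uses that formula with z = 1.96 and the worst-case proportion p = 0.5, and the function name is purely illustrative.

```python
import math

def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Minimum random sample for a given margin of error at the confidence level implied by z."""
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Finite population correction shrinks the requirement for smaller populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

for population in (68_000, 1_000_000, 16_300_000):
    print(f"population {population:>10,}: {required_sample_size(population)} samples")
```

Note that the requirement barely moves once the population is large, which is why it is the same for a region of one million as for one of 16.3 million.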

rms2
06-23-2017, 02:50 AM
You need at least 383 samples (not 385 my typo) to achieve a 95% confidence level with a 5% confidence interval for subject populations over 68,000. You can get 500 or 1000, but a minimum of 383 to achieve that level of confidence.

It depends on your definition of "meaningful way". Please state within how many % you think Busby's numbers will be compared to a study with at least 383 random samples per geographic region.

Little if any, is the best you will get out of me, since such a question is not really reasonable.

"How inaccurate do you imagine Busby is?" is what you are asking me, and you want me to quantify it.

I think Busby was reasonably accurate. It's sophistry to propose a study that will probably never be and then ask me to compare that chimera to an actual study with real results.

Of the examples I cited from Busby, the numbers for Germany totaled 404. Too big a geographic region for you?
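The 404 total, and a sample-size-weighted P312xL21,U152 figure for the five German locations, can be worked out from the list of examples cited earlier in the thread. This is only a sketch: treating the five locations as one pooled sample is itself an assumption, and the percentages are the rounded values quoted in the thread, not raw counts.

```python
# (sample size, P312xL21,U152 percent) for the German locations cited from Busby et al.
GERMAN_SAMPLES = {
    "Germany North": (64, 3.1),
    "Germany East": (47, 0.0),
    "Germany West": (100, 10.0),
    "Germany South": (91, 9.9),
    "Germany Freiburg": (102, 7.8),
}

total_n = sum(n for n, _ in GERMAN_SAMPLES.values())
# Weight each location's percentage by its sample size.
pooled_pct = sum(n * pct for n, pct in GERMAN_SAMPLES.values()) / total_n

print(f"Total German samples: {total_n}")
print(f"Pooled P312xL21,U152: {pooled_pct:.1f}%")
```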

MitchellSince1893
06-23-2017, 04:11 AM
As you did not provide numerical values I will throw out a couple of examples of what I'm getting at

Germany North has 64 samples in Busby
N = 64
U106 = 18.8%
P312xL21,U152 = 3.1%
L21 = 3.1%
U152 = 6.3%

If 64 samples were taken numerous times there is a 58% chance the average of these multiple 64-sample tests would be within 5% of the actual Germany North numbers. However that's just the average of multiple 64-sample tests. A single 64-sample test may luck out and be right on the money or it could be drastically off.

By comparison if you did the same exercise with 400 samples there's a 95% chance you would be within 5% of the actual Germany North numbers.

Hence the reason I do not have a lot of confidence in the double-digit Busby sample sizes (this is not the same as me saying they are off, but rather recognizing there is a good chance they could be). There's too much margin for error for my comfort level. They may be close, but they could be way off, and it would be prudent to recognize this statistical possibility.

As they are often the only thing around we use them...but I wouldn't bet the farm on 'em.
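The 58% and 95% figures in this post look like the confidence level you get for a ±5% margin at a given sample size. Here is a minimal Python sketch under that reading, using the normal approximation and the worst-case proportion p = 0.5 (both are assumptions; a different p would give a different figure, and the function name is illustrative).

```python
import math

def confidence_within(margin, n, p=0.5):
    """Probability that a sample proportion from n samples lands within +/- margin of the true value."""
    standard_error = math.sqrt(p * (1 - p) / n)
    z = margin / standard_error
    # For a standard normal variable Z, P(|Z| <= z) = erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))

for n in (64, 400):
    print(f"N = {n}: {100 * confidence_within(0.05, n):.0f}% chance of being within 5%")
```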

rms2
06-23-2017, 01:42 PM
The overall sample size for Germany, at least in the examples I cited, was 404. If, as you said, for a 95% confidence level, 383 samples are needed for regions with a population in excess of 68,000, then the overall figures for Germany should be reasonably accurate.

I seriously doubt we're ever going to get a study that has 383 samples per location at as many locations as were sampled by Busby.

The only reason to attempt to cast doubts on the accuracy of Busby et al's results is if one thinks those results are so far wrong as to be grossly inaccurate.

When Busby et al first appeared, back in 2012 (I think), we did not have a whole lot of ancient y-dna results and so were using it to argue about the origin of various R1b subclades. I will admit I was rather disappointed by the relatively lackluster performance of L21 in Germany, where I expected higher frequencies. In the early days of the R L21 Project, we seemed to be doing well with Americans of German Palatine ancestry and even had a few German nationals who were L21+. Busby's results weren't what I had expected. We did pretty well in France, but Germany's L21 was much lower than I expected.

Thank God we now have ancient y-dna appearing in a growing stream, with the potential to tell us a lot more about y haplogroup origins than modern distribution ever could.

As far as the modern European proportions of the P312 subclades are concerned, however, I think it is pretty plain that new, more inclusive studies are not going to change things much, if at all.

BTW, I did provide the numerical values from Busby et al for the examples I cited, several times, in fact. Here they are again:




1. Germany North, N=64: P312xL21,U152 = 3.1%

2. Friesland, N=94: P312xL21,U152 = 4.3%

3. Netherlands, N=87: P312xL21,U152 = 6.9%

4. Germany East, N=47: P312xL21,U152 = 0%

5. Germany West, N = 100: P312xL21,U152 = 10%

6. Germany South, N = 91: P312xL21,U152 = 9.9%

7. Germany Freiburg, N = 102: P312xL21,U152 = 7.8%

9. East France, N = 80: P312xL21,U152 = 7.5%

rms2
06-23-2017, 02:06 PM
Leaving Busby behind, I think modern Belgium may be a place with the potential for a relatively significant level of DF19 and maybe of DF99, too. The Brabant Project (N=871) tested for SRY2627 and Z196xSRY2627, thus clearing the P312 remainder field of much of DF27. Its P312* catchall category was 9.6% of the total sample population. That is what would be split between DF19, DF27xZ196, DF99, L238 and the rest.

It would be really interesting to see how that 9.6% is parsed out among those subclades.

L21 did fairly well in the Brabant Project, at ~9%. U152 was ~10%, and the portion of DF27 I mentioned above (SRY2627 and Z196xSRY2627) was ~4%.

It would also be very interesting to see how that was divided up between Flemings and Walloons.

MitchellSince1893
06-23-2017, 03:36 PM
The only reason to attempt to cast doubts on the accuracy of Busby et al's results is if one thinks those results are so far wrong as to be grossly inaccurate.
Incorrect. There is another reason which for some reason I am failing to communicate. The reason I stated previously

this is not the same as me saying they are off, but rather recognizing there is a good chance they could be
This is not my opinion. It is a statistical fact. Small sample sizes = greater uncertainty in the results. This is not debatable.

So I will clearly state it again. Saying Busby is very close to the real numbers in regions with small sample sizes is not based in statistics. It is your opinion/your feeling, and we need to treat it as such.

As to the 404 samples in Germany: As long as they were truly random. That is, not collected at 5 selected geographic areas. Otherwise we are introducing a bias in selection.

rms2
06-23-2017, 03:59 PM
Incorrect. There is another reason which for some reason I am failing to communicate. The reason I stated previously

This is not my opinion. It is a statistical fact. Small sample sizes = greater uncertainty in the results. This is not debatable.

The sample sizes were not small. As I said (more than once), the sample size for Germany, at least in the examples I cited, was 404.

Were you wrong about 383 being all that were needed? Or did you mean to specify 383 per so many square miles of territory?

Again, the only real reason to dispute the accuracy of any study, Busby included, is if one believes it isn't accurate.



So I will clearly state it again. Saying Busby is very close to the real numbers in regions with small sample sizes is not based in statistics. It is your opinion/your feeling, and we need to treat it as such.

Once again, the sample sizes in Busby were not actually small in the aggregate, and were, for the most part, not small at the local level either.



As to the 404 samples in Germany: As long as they were truly random. That is, not collected at 5 selected geographic areas. Otherwise we are introducing a bias in selection.

What? How does collecting them at different locations make them less random than collecting them all at one central location, say, in Berlin?

Notice, too, that Busby's results are not out of line with the Rebala et al study cited earlier by Robert, the Brabant Project, and the Genome of the Netherlands Project.

You asked for 383 samples, and Busby gives you 404 for Germany. Yet you are still resorting to the same small-sample-size argument.

Do you honestly think Busby is seriously flawed and that, really, DF19 or DF99 will turn out to be considerably more frequent than they currently appear to be?

GoldenHind
06-23-2017, 05:21 PM
As they are often the only thing around we use them...but I wouldn't bet the farm on 'em.

Nor would I.

The population of Germany is over 80 million. I am no statistician, but to assume Busby's sampling of 404 accurately reflects its YDNA composition seems to me to be unrealistic.

MitchellSince1893
06-23-2017, 06:17 PM
Again, the only real reason to dispute the accuracy of any study, Busby included, is if one believes it isn't accurate....

Do you honestly think Busby is seriously flawed..?


I acknowledge the statistical uncertainty involved in the small sample sizes found in many of the Busby figures and you state your "opinion", what you "feel", "believe", "think" of Busby and ask me to do the same.

What you or I "feel" or believe is irrelevant. I "know" there is a significant degree of uncertainty in low sample sizes, and your opinion is Busby is close to the actual numbers; but you do not "know" this.

Because there is a known degree of uncertainty, it would not be accurate for someone to tell someone else they are wrong if they disagree with Busby in these instances.

It would be fair to say "Based on Busby I believe..." but to tell someone they are wrong solely based on Busby's numbers is only an opinion.

I "feel" the readers of this thread should be aware of the difference.

As I've previously stated: with these small sample sizes, Busby may have gotten close to the actual numbers, but they could also be way off. That is what I "know".

rms2
06-23-2017, 06:22 PM
Nor would I.

The population of Germany is over 80 million. I am no statistician, but to assume Busby's sampling of 404 accurately reflects its YDNA composition seems to me to be unrealistic.

I am no statistician either. I took one undergraduate statistics class and got an "A" in it. I took one graduate statistics class and also got an "A" in it. But both of those were long ago.

The guy you seem to think you are agreeing with said that it takes 383 samples to ensure a 95% confidence level in populations over 68,000. Busby's Germany sample, at least in the examples I cited, totaled 404, 21 more than the 383 required for 95% confidence.

Do you really think its P312xL21,U152 category should be much larger than it is? Since you doubt Busby's results, maybe it is actually lower.

rms2
06-23-2017, 06:30 PM
I acknowledge the statistical uncertainty involved in the small sample sizes found in many of the Busby figures . . .

It seems to me you are obfuscating. Busby's Germany sample totaled 404. You said at least 383 samples are needed for a 95% confidence level. Well, Busby provided 404 for Germany alone.

But you want to break things down to the level of the individual sampling locations in order to maintain your low-sample-size argument and then act as if you are being scientific and objective and that I only want to talk about my "feelings". Baloney.

Now, is 404 not enough? Or are you now requiring 383 per square mile or some similar formula?

razyn
06-23-2017, 06:31 PM
Leaving Busby behind

I long for that day really to come, but clearly it hasn't. I remember when that paper came out (2011?) and Mikewww, among others, shot it down in about a week. It's OK to cite what is out there to cite; but there is a lot better evidence now available, on what is being discussed here (subclades of P312 and U106). Much of it has to do with SNPs that Busby et al had not dreamed of -- DF27 being the largest, but far from the only, untargeted SNP. (The paper has many co-authors and its elements, for better or worse, are not all Busby's cross of pain to bear.)

rms2
06-23-2017, 06:47 PM
I long for that day really to come, but clearly it hasn't. I remember when that paper came out (2011?) and Mikewww, among others, shot it down in about a week. It's OK to cite what is out there to cite; but there is a lot better evidence now available, on what is being discussed here (subclades of P312 and U106). Much of it has to do with SNPs that Busby et al had not dreamed of -- DF27 being the largest, but far from the only, untargeted SNP. (The paper has many co-authors and its elements, for better or worse, are not all Busby's cross of pain to bear.)

Groan . . .

As I recall, what Mike and others criticized in Busby was its conclusions, not the stats themselves.

My point was that if Busby et al is reasonably accurate - and I see no good reason to think it is not - then its P312xL21,U152 category is what is left to be divvied up by DF19, DF27, DF99, L238 and the rest. In some spots, like throughout most of France and throughout all of Iberia, that's a BIG, important category. In other places, it is much smaller and less significant and comprises little to be split up.

MitchellSince1893
06-23-2017, 06:53 PM
It seems to me you are obfuscating. Busby's Germany sample totaled 404. You said at least 383 samples are needed for a 95% confidence level. Well, Busby provided 404 for Germany alone.

But you want to break things down to the level of the individual sampling locations in order to maintain your low-sample-size argument and then act as if you are being scientific and objective and that I only want to talk about my "feelings". Baloney.

Now, is 404 not enough? Or are you now requiring 383 per square mile or some similar formula?

I will answer your 404 sample question when you answer mine. Do you acknowledge the uncertainty involved in Busby's small samples and that they could be off by a significant amount?

rms2
06-23-2017, 07:32 PM
I will answer your 404 sample question when you answer mine. Do you acknowledge the uncertainty involved in Busby's small samples and that they could be off by a significant amount?

Not really, because the total sample for Germany was 404. So, for Germany, if what you yourself said is correct, Busby's results are accurate.

Busby collected samples from throughout Germany. I could be wrong, but I don't think the authors claimed, "The results for this particular location are good for Thuringia, the results for this other section are good for Bavaria", etc.

In other words, the study was not a section-by-section report of the R1b make-up of each given locality. If it were, and some sort of sectional dimensions and boundaries were established for each separate study of each locality, then we could say whether or not the sample size of a particular section was sufficient for its population. But that is not what Busby et al did. It merely collected samples from different locations throughout the various countries of Europe.

One cannot really say, for example, that the 47 samples from "Germany East" are insufficient, because there is no indication they were ever meant to represent all of eastern Germany by themselves. We don't know what specific region they were intended to represent, how big it is, and how many people inhabit it, because that was not the intent of the study. "Germany East" is merely the name given to the location where 47 samples were collected, and the results are as published. Those results are part of the total for Germany and of Europe as a whole and do not represent a separate sectional study of all or some part of eastern Germany.

MitchellSince1893
06-23-2017, 08:36 PM
Not really, because the total sample for Germany was 404. So, for Germany, if what you yourself said is correct, Busby's results are accurate.

Busby collected samples from throughout Germany. I could be wrong, but I don't think the authors claimed, "The results for this particular location are good for Thuringia, the results for this other section are good for Bavaria", etc.

In other words, the study was not a section-by-section report of the R1b make-up of each given locality. If it were, and some sort of sectional dimensions and boundaries were established for each separate study of each locality, then we could say whether or not the sample size of a particular section was sufficient for its population. But that is not what Busby et al did. It merely collected samples from different locations throughout the various countries of Europe.

One cannot really say, for example, that the 47 samples from "Germany East" are insufficient, because there is no indication they were ever meant to represent all of eastern Germany by themselves. We don't know what specific region they were intended to represent, how big it is, and how many people inhabit it, because that was not the intent of the study. "Germany East" is merely the name given to the location where 47 samples were collected, and the results are as published. Those results are part of the total for Germany and of Europe as a whole and do not represent a separate sectional study of all or some part of eastern Germany.
Your comments above get directly to the heart of my response about the 404.

I too don't know the specifics of how Busby collected the 404 samples. Were they collected from specific locations in Germany or were they randomly collected throughout Germany, and then later given geographic labels based on their locations?

The answer as to how they were collected will indicate the confidence we can have in the result. If they weren't randomly collected throughout Germany but rather collected from small geographic areas within each region of Germany then there could be a selection bias. e.g. if Busby collected samples within a few square mile area in Germany_East, Germany_North etc. that happened to be low in U106 while on average for U106 this region has a much higher percentage then the numbers could be biased.

If Busby collected these samples to be representative of each region then these small sample size reduce our confidence in the result for each region. Again that doesn't mean they are wrong...just that we can't have high confidence in their accuracy.
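To put rough numbers on that, here is a minimal Python sketch of the usual worst-case margin of error for a proportion. It assumes simple random sampling and the normal approximation with p = 0.5, neither of which is confirmed for Busby or Myres; the function name `margin_of_error` is my own, and the only inputs taken from this thread are the sample counts 404 and 47.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Half-width of the normal-approximation confidence interval for a proportion.

    Assumes a simple random sample; p=0.5 gives the worst (widest) case.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample counts quoted in this thread: 404 for Germany overall, 47 for "Germany East".
for label, n in [("Germany, all locations", 404), ("Germany East", 47)]:
    moe = margin_of_error(n) * 100
    print(f"{label}: n={n}, margin of error = +/-{moe:.1f} percentage points")
```

Under those assumptions the pooled 404 comes out just under 5 points either way, while a single location of 47 is closer to 14 points, which is the gap between the national and regional figures being argued over here.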

rms2
06-23-2017, 09:09 PM
There is no indication they were meant to represent a particular restricted region or population. If they were, then the specifics of the region would have been explained.

I doubt any kind of selection bias existed. How would Busby et al have anticipated the results sufficiently to have exercised some kind of bias? "Let's go here because we know P312xL21, U152 will be low". Is that it?

My pointing out that Busby's P312xL21,U152 category constitutes what is left to be divided up among DF19, DF27, DF99, L238 and the rest has really turned into a much more labored and extensive topic than I expected. I thought it was pretty obvious and pretty damned straightforward. I still think so.

MitchellSince1893
06-23-2017, 10:42 PM
Per the Busby study


We assembled a dataset of 2486 R-M269 Y chromosomes from across Europe, the Near East and western Asia, from a total population of 6503, which included both novel and previously published Y chromosomes. To assess the frequency distribution of R-M269 and various sub-haplogroups in Europe and Asia, we combined our data with that of Myres et al. [21], which gave a combined set of 4529 R-M269 chromosomes from a total sample of 16 298 from 172 different populations (electronic supplementary material, table S1 and figure S1).


Three English populations were generated by combining data between the two studies, where they came from the same area and the resultant population was greater than 30: North-West England (ENG-NW); South-West England (ENG-SW); and South-East England (ENG-SE). Two additional populations were made by combining populations within the Myres et al. dataset: South-East Denmark (DEN-SE) and Switzerland (SWI-SC).

I couldn't find any more details on how Busby collected his samples.

Per Myres

All samples studied were obtained using locally approved informed consent protocols. A total of 2193 samples within the R-M343 component were genotyped in a hierarchical manner for the following SNP markers: M412, M415, M478, M520, M529, L11, L23 and S116 (Supplementary Table S1). In addition, markers M420 [13] and V88 [28] were genotyped according to the previously published protocols, as well as the following seven previously published markers in haplogroup R1b: P297, M73, M269, U106, U198, U152 and M222 [29]. The M479 SNP (specifications given in Supplementary Table S1) was typed in R-M207(xM173) samples.

I looked through the supplementary tables but there wasn't anything specific on sample collections. So as best I can tell neither study provides sufficient detail to determine how samples were collected...how random they were. Maybe someone else can locate something more substantive on this.

razyn
06-23-2017, 11:00 PM
Groan . . .

As I recall, what Mike and others criticized in Busby was its conclusions, not the stats themselves.

My point was that if Busby et al is reasonably accurate - and I see no good reason to think it is not - then its P312xL21,U152 category is what is left to be divvied up by DF19, DF27, DF99, L238 and the rest. In some spots, like throughout most of France and throughout all of Iberia, that's a BIG, important category. In other places, it is much smaller and less significant and comprises little to be split up.
But apart from the deficiencies in Busby's conclusions, methodology, number of loci targeted, and the previously proposed European cline stuff his team was trying to disprove in the first place (not very convincingly) -- the vanishingly tiny sample (404/80,000,000 of the "sampled" population) just doesn't support the kind of argument you are trying to foist upon basically sensible people, based on it. I still think the best plan you've come up with is "Leaving Busby behind." And I still think Mitchell and GoldenHind understand the issue better -- not that I want to keep reading an argument about it.



The population of Germany is over 80 million. I am no statistician, but to assume Busby's sampling of 404 accurately reflects its YDNA composition seems to me to be unrealistic.

It doesn't advance our search for the subclades to write off millions of people who haven't actually been tested, in any statistically meaningful way. We have a fair handle on 404 of the German ones. That leaves the French, Swiss, Austrian, Czech, Romanian, Polish and other interesting populations to explore, e.g. for what those 404 specific guys didn't show much of to Busby. Many of them are going to be DF27, U152, and those other more "continental" haplogroups below P312 (that you L21 guys aren't really very fired-up about). Eventually we'll have a better handle, even from the FTDNA testing of a non-representative sample; but mainly from the aDNA, and newer papers by people who are beginning to have a much better grasp on the complexities involved.

GoldenHind
06-23-2017, 11:06 PM
Do you really think its P312xL21,U152 category should be much larger than it is? Since you doubt Busby's results, maybe it is actually lower.

Contrary to your repeated suggestion that I think a larger study would "radically" alter Busby's data and that I think his P312xU152 portion should be much larger, I have never said that. This is an assumption on your part. What I do doubt is that that category reaches its greatest numbers in SW Germany and SE France and is absent from eastern Germany, as the Busby data implied. The Rebala Mecklenburg study conclusively proves Busby's eastern Germany data is misleading. Why should we assume the rest of it is reliable?

I have no idea what a more representative sampling would show, other than I know for a fact that P312xU152,L21 is present to some extent in eastern Germany.

Finally I do not accept Busby's sampling of 404 in a country of over 80 million as determinative. Obviously you do, as you must have posted them a half dozen times, as if reposting them time after time somehow makes them more reliable.

I know you always have to have the last word, so I will leave you to it.

GoldenHind
06-23-2017, 11:12 PM
My point was that if Busby et al is reasonably accurate - and I see no good reason to think it is not - then its P312xL21,U152 category is what is left to be divvied up by DF19, DF27, DF99, L238 and the rest. In some spots, like throughout most of France and throughout all of Iberia, that's a BIG, important category. In other places, it is much smaller and less significant and comprises little to be split up.

Do you really think DF19, DF99 and L238 are important categories in Iberia and France? People who don't know much about those subclades could get the impression that's what you are suggesting.

MitchellSince1893
06-23-2017, 11:13 PM
I know you always have to have the last word, so I will leave you to it...

Great! Now you've made him aware of my secret battle to not let him get the last word. :) I was willing to drop the whole thing several pages ago, but I kept getting sucked back in every time he got in the last word. :biggrin1: This will now drag on for several more pages. Thanks a lot GoldenHind :P

rms2
06-24-2017, 10:54 AM
Do you really think DF19, DF99 and L238 are important categories in Iberia and France? People who don't know much about those subclades could get the impression that's what you are suggesting.

No, obviously not. I think we know by now that DF27 dominates in Iberia and is pretty significant in France. But we were talking about the entire P312xL21,U152 category in Busby, and it is obviously much bigger in Iberia and France than anywhere else.

L238 is apparently numerous only in Scandinavia as far as I know, but I am not sure even there what percentage of the total male population it accounts for.

I would not call either DF19 or DF99 unimportant, but I think if we found either at 5% anywhere (except a very localized area) that would probably represent its world maximum. That is not intended as an insult, although I understand it may and probably will be interpreted as one.

My own branch of DF13, DF41/CTS2501, appears to be one of the smaller DF13 subclades, especially when compared to DF49, DF21, and Z253. That's just the way it goes.



Contrary to your repeated suggestion that I think a larger study would "radically" alter Busby's data and that I think his P312xU152 portion should be much larger, I have never said that. This is an assumption on your part.

I began my initial post in this exchange (post #30 in this thread) with the caveat that I was not saying that you were claiming that. Guess you missed that part:

http://www.anthrogenica.com/showthread.php?11016-Subclades-of-P312-amp-U106&p=249976&viewfull=1#post249976



You're right, we need some better, more up-to-date studies. I don't mean to imply that you said such studies will radically alter the relative proportions of the P312 subclades; you didn't.

Oops!



What I do doubt is that that category reaches its greatest numbers in SW Germany and SE France and is absent from eastern Germany, as the Busby data implied. The Rebala Mecklenburg study conclusively proves Busby's eastern Germany data is misleading. Why should we assume the rest of it is reliable?

You are persisting in your misreading of what I wrote. I already explained that I said the P312xL21,U152 category reaches its maximum in SW Germany and nearby SE France in those nine examples I cited, not in the Busby data as a whole. Very obviously that category reaches its maximum in Iberia.

Rebala et al proves nothing of the kind. "Busby's eastern Germany data" was not meant to represent, as far as I can tell, all of eastern Germany or even any specific region in Germany. Myres et al collected 47 samples at a location they named "Germany East", and Busby used those samples. They were part of the total number of samples collected in Germany at various locations. Busby never said, "These 47 samples represent all of the so-many-millions of men in eastern Germany from x line of longitude to the Polish, Czech and Austrian borders".

The Rebala et al results were in line with Busby's results, showing a few percent left over to be divided up among DF19, DF27xSRY2627, DF99, L238 and the rest: 8.4% in the case of Mecklenburg, and 6.8% in the case of Bavaria. I don't know which clades would get how much of the 8.4% in Mecklenburg and the 6.8% in Bavaria, but that is all there is left to split up, which was the point I began with, i.e., that new, more expansive and inclusive studies will not alter the general picture much at all.



I have no idea what a more representative sampling would show, other than I know for a fact that P312xU152,L21 is present to some extent in eastern Germany.

Agreed. Again, Myres et al merely called the place in eastern Germany where they collected 47 samples "Germany East". They never said it was intended as a separate study of all of eastern Germany in itself or that it was to represent any specific geographic region with known boundaries and a specified number of people.



Finally I do not accept Busby's sampling of 404 in a country of over 80 million as determinative.

Indicative of the y haplogroup frequencies at the time the study was conducted, I think, but certainly not determinative.

I think the Busby stats are fairly reliable, within a margin of error that I am not statistician enough to quantify, probably a few percent either way. P312xL21,U152 could be more frequent than Busby's results indicate, but it could also be somewhat less frequent than Busby's results indicate.

If one needs 383 samples to get a 95% confidence level for populations of 68,000 and up, then the margin of error must be +/- 5%. That sounds about right.
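For anyone who wants to check where a figure like 383 comes from, here is a short Python sketch using the standard sample-size formula for a proportion with a finite population correction. It assumes simple random sampling, a 95% confidence level (z = 1.96), a +/-5% margin and the worst case p = 0.5; the function name `required_sample_size` is my own label, not anything from Busby or Myres.

```python
import math


def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    Assumes simple random sampling; p=0.5 is the most conservative choice.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size needed for an infinite population
    n = n0 / (1 + (n0 - 1) / population)      # shrink slightly for a finite population
    return math.ceil(n)


# 68,000 is the population figure quoted in this thread; 80,000,000 is roughly Germany.
for population in (68_000, 80_000_000):
    print(population, required_sample_size(population))
```

The required size barely moves between a population of 68,000 and one of 80 million, so the ratio of sample to population is not what drives the margin of error. Whether the samples were drawn at random is a separate question that this arithmetic cannot answer.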



Obviously you do, as you must have posted them a half dozen times, as if reposting them time after time somehow makes them more reliable.

I only reposted them because you and others apparently missed them, choosing to focus on Myres et al's "Germany East", as if it was a separate study of the entire eastern half of Germany based solely upon the 47 samples collected at the location that was christened with that unfortunate moniker.

I gave "Germany East" no special emphasis. It was merely one of the nine examples from Busby I cited to show how much P312xL21,U152 was left to be divvied up by DF19, DF27, DF99, L238 and the rest. Apparently you interpreted it as an attack on your subclade and responded accordingly, but nothing was further from my mind.

Maybe Busby et al was the worst possible study ever, with terrible results, upon which no one should ever rely.

I doubt that is the case, but it would be fine with me if we don't mention it again for awhile and let the ancient y-dna tell us what it will.

From my perspective, I feel I was making a point that is pretty obviously true and got attacked because I inadvertently touched a nerve.

Mikewww
06-25-2017, 02:17 AM
... Since testing of DF27 started later, its percentage is probably higher than it actually is today. But as Mike W stated, DF27 is probably less tested than others. L21, U106 and P312 have been around a long time, so their numbers probably track over time.
The other reason I think DF27 is the up and comer, from a modern DNA testing standpoint, is that Latin America's population and their descendants have faster population growth rates.

As far as the Busby, Myres, et al. surveys of modern DNA go, I'm glad we have them as they are much better than nothing. However, they are totally inadequate. It is true that the size of the sample can be surprisingly small and still be adequate. The problem is that their surveying methodologies totally missed out on proper cross-sectioning of the populations, in my opinion. You cannot at all assume that "Germany East" or "North Italy" or whatever has a uniform distribution of haplogroups. Sub-ethnic groups, probably religious groups, language dialects and some other things should all be measured, and the sample should not be considered adequate until each of the subgroups is adequately represented. Probably terrain divides (mountains, islands) must be controlled for too.

The best example I can think of is the finding of a higher percentage of L21 in Bologna, Italy. I don't think we found much L21 in Italy anywhere but Bologna so far. If you just happened to skip that region, or only hit the wrong sub-region, you'd think N. Italy was L21=0.

P.S. My wife used to work for Gallup Polls. All decent polls control down to fairly low level subgroups. If you hit these subgroups correctly, the total sampling can be small and still accurate. Of course, you can also find out later that you had the wrong subgroups defined (i.e. those who voted in the last election while not understanding current voters who hadn't voted recently). Designing a good survey methodology takes a lot of work and research on its own.
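To illustrate the subgroup point, here is a minimal Python sketch comparing a pooled estimate with a stratified one. The three regions, their population shares and their counts are entirely made up for illustration; they are not from Busby, Myres or any real survey, and the function names are my own.

```python
# Hypothetical strata: (share of total population, men sampled, men positive for a SNP).
# All numbers are invented to show the mechanics only.
strata = {
    "region_a": (0.10, 50, 10),  # small region where the SNP happens to be common
    "region_b": (0.60, 50, 1),
    "region_c": (0.30, 50, 2),
}


def pooled_estimate(strata):
    """Frequency if all samples are lumped together, ignoring region sizes."""
    sampled = sum(n for _, n, _ in strata.values())
    hits = sum(k for _, _, k in strata.values())
    return hits / sampled


def stratified_estimate(strata):
    """Frequency with each region weighted by its share of the population."""
    return sum(share * k / n for share, n, k in strata.values())


print(f"pooled:     {pooled_estimate(strata):.3f}")
print(f"stratified: {stratified_estimate(strata):.3f}")
```

With equal sampling effort in each region, the small high-frequency region is over-represented in the pooled figure; weighting by population share corrects for that. It only works if the strata and their shares are chosen sensibly in the first place, which is the hard part.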

RobertCasey
06-26-2017, 12:27 AM
P.S. My wife used to work for Gallup Polls. All decent polls control down to fairly low level subgroups. If you hit these subgroups correctly, the total sampling can be small and still accurate. Of course, you can also find out later that you had the wrong subgroups defined (i.e. those who voted in the last election while not understanding current voters who hadn't voted recently). Designing a good survey methodology takes a lot of work and research on its own.

I only wish that YDNA testing was driven with targeted funds used for Gallup Polls - but current testing is definitely not random sampling. First, you have American dominated testing (but European testing is now taking off in many European countries - a good trend). Not only is there testing bias by geographic regions, but there is extreme testing bias for L226 testing as well. I am now peaking out at around 80 % prediction based on 25 % extensively tested YSNP (40 or more branches under L226 or exact YSNP position is already known). There are two classes that are not YSNP testing - those on the extreme ends of the bell curve from the L226 signature (very high or very low genetic distance from the L226 signature).

1) The very high genetic distance testers have few matches, with fairly high genetic distances from their closest matches, and have many surnames in the matches. This is the largest obstacle, as these people are not seeing the immediate genealogical benefit from testing; this testing is more an investment in the future. These are the hardest individuals to get to test until they get more random matches.

2) Those with very low genetic distance from the L226 signature (we have one tester that has genetic distance of zero over the last 1,500 years). These people realize that YSTRs are not going to get them there and are YSNP testing. But many are not testing and do not understand that 111 markers will not help them; they either need 400 YSTRs (long term view) or need to extensively test YSNPs.

People have an extreme bias of over testing themselves vs. sponsoring YSNP testing of their closer YSTR matches. The numbers of branches being discovered for each new Big Y test is way down (only every third or fourth Big Y test reveals a new branch now). Yet we now have over 500 private YSNPs and only 10 % have been tested at YSEQ (and fortunately another 10 % of private YSNPs via the L226 SNP which is now yielding fewer new branches these days as well).

Mikewww
06-26-2017, 01:40 AM
I only wish that YDNA testing was driven with targeted funds used for Gallup Polls - but current testing is definitely not random sampling.
...
People have an extreme bias of over testing themselves vs. sponsoring YSNP testing of their closer YSTR matches.
You are asking for a centrally planned system. That hasn't worked too well in economics, as it ignores the incentives and motivations of folks. It also does a poor job of weeding out inefficient producers. We can ask our economics professors. I do agree we need much, much better funded scientific studies. I would have loved to have seen Myres, Busby, etc. spend five or six times the amount to get a good survey.

On the other hand, I didn't donate any money to them and have no right to complain. Instead I directed my hobbyist oriented budget mostly to things nearest and dearest my own genealogies and also where I had some control over the decision-making.

RobertCasey
07-14-2017, 02:36 AM
You are asking for a centrally planned system. That hasn't worked too well in economics, as it ignores the incentives and motivations of folks. It also does a poor job of weeding out inefficient producers. We can ask our economics professors. I do agree we need much, much better funded scientific studies. I would have loved to have seen Myres, Busby, etc. spend five or six times the amount to get a good survey.

On the other hand, I didn't donate any money to them and have no right to complain. Instead I directed my hobbyist oriented budget mostly to things nearest and dearest my own genealogies and also where I had some control over the decision-making.

I would never suggest donating funds to the academics - they should get their funds elsewhere. I was referring to crowd-funding of strategic testing by members of haplogroup projects. My personal bias is to raise funds to test individuals who lose interest in YDNA testing because their lines are not prolific (or not tested well to date), or who belong to very old branches where minimal genealogical progress will be made. I know which testers should be tested to be able to raise accurate charting from 80 % to 90-95 %. This high coverage would spark a lot of interest among genealogists when we are able to chart with such high coverage and reasonably high accuracy. Charting really resonates with genealogists, who can readily see its similarity to descendant charts.

MitchellSince1893
07-26-2017, 04:55 AM
Results of 683 German y-dna samples as it relates to some of the discussions about Busby, Germany, sample sizes, etc. in this thread:

Looking at RobertCasey's FTDNA R1b projects spreadsheet with additional ones I added from the FTDNA R1b project (12,25,37 marker tests); I filtered out the repeat surnames for subclades and duplicate kit#s, and ended up with 683 R1b samples for Germany. Busby had 209 R1b samples for Germany.

Of these 683 FTDNA R1b samples:
U106: 294 samples (43.0%)
U152: 139 samples (20.4%)
DF27: 75 samples (11.0%)
P312 Other: 54 samples (7.9%)
L21: 75 samples (11.0%)
R1b Other: 46 samples (6.7%)


To compare it to Busby's Germany numbers we need to combine the FTDNA DF27 with "P312 other" to equal Busby's "P312xL21,U152":
U106: 294 samples (43.0% of R1b)
U152: 139 samples (20.4% of R1b)
DF27 & P312 Other: 129 samples (18.9% of R1b)
L21: 75 samples (11.0% of R1b)
R1b Other: 46 samples (6.7% of R1b)


209 of Busby's 423 samples in Germany were R1b. This works out to:
U106: 95 samples (45.5% of R1b)
U152: 51 samples (24.4% of R1b)
DF27 & P312 Other: 29 samples (13.9% of R1b)
L21: 10 samples (4.8% of R1b)
R1b Other: 24 samples (11.5% of R1b)


The Busby and FTDNA numbers above are all within 6.3 percentage points of each other, but what's interesting is Busby's largest sampled haplogroup (U106) is closest to the FTDNA U106 numbers and Busby's smallest sampled haplogroup (L21) is furthest from the FTDNA L21 #s.

Busby's 95 U106 German samples differ by only 2.5 percentage points from the 294 U106 FTDNA samples in Germany.

While Busby's 10 L21 samples in Germany have the largest difference (6.2 percentage points) compared to the 75 L21 FTDNA samples in Germany.
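For anyone who wants to reproduce the comparison, here is a small Python sketch that recomputes the shares from the counts listed above and adds a rough two-proportion z-score for each group. The z-score is my own addition and assumes both datasets are independent simple random samples, which is almost certainly not true of the FTDNA data and is unconfirmed for Busby, so treat it as a yardstick only. Differences are computed from unrounded shares, so they can differ by a tenth of a point from the figures above.

```python
import math

# Counts as listed in this post (R1b samples for Germany).
busby = {"U106": 95, "U152": 51, "DF27 & P312 Other": 29, "L21": 10, "R1b Other": 24}
ftdna = {"U106": 294, "U152": 139, "DF27 & P312 Other": 129, "L21": 75, "R1b Other": 46}


def shares(counts):
    """Return each group's share of the total, plus the total itself."""
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}, total


def compare(a, b):
    """Per group: difference (b - a) in percentage points and a two-proportion z-score."""
    pa, na = shares(a)
    pb, nb = shares(b)
    rows = {}
    for group in a:
        diff = pb[group] - pa[group]
        se = math.sqrt(pa[group] * (1 - pa[group]) / na
                       + pb[group] * (1 - pb[group]) / nb)
        rows[group] = (diff * 100, diff / se)
    return rows


rows = compare(busby, ftdna)
for group, (diff_pts, z) in rows.items():
    print(f"{group:18s} FTDNA - Busby = {diff_pts:+.1f} pts, z = {z:+.2f}")
```

A z-score well inside +/-1.96 means the gap is about what sampling noise alone could produce under those assumptions; a larger one suggests something else is going on, such as a difference in who gets tested.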