Matching surnames to Y-Haplogroups and vice versa



jamesdowallen
07-18-2017, 04:54 PM
I'm trying to acquire information about surname/haplogroup correlations. For example, I see with Google that HERRICK's Y-chromosome is mostly in I-Haplogroup, but which subclade is it? For many surnames I don't even get that far with Google. Are there some good websites with such information?

The FTdna projects have the information I need, but these are confidential, right? Perhaps I should write to the project managers for the surnames of interest.
But I'm also interested in the reverse problem: given a haplogroup, find the surname(s). For example, Yfull shows R1a-YP360 (https://www.yfull.com/tree/R-YP360/) with a TMRCA date of 1700 AD and no less than 30 hits! Was this just one family that ALL decided to have their DNA tested? Or is YP360 a very large group? What is their surname(s)?

I want to add some Y-chromosome information at my own website. For example, here is the haplogroup of my own agnatic ancestor (http://fabpedigree.com/s024/f000010.htm).

jamesdowallen
07-18-2017, 05:09 PM
For example, I see with Google that HERRICK's Y-chromosome is mostly in I-Haplogroup, but which subclade is it?

My face turns red! I get so rattled doing several things at once in dozens of browser tabs, I missed that my own Googling produced a public page for FTdna Herrick results (https://www.familytreedna.com/public/Herrick/default.aspx?section=yresults) ! (Do many or all of the projects have public pages?)

My general questions are still asked, however. Who is YP360? :)

MacUalraig
07-18-2017, 05:14 PM
Most of the time you can figure it out yourself as well as the next guy tbh. But it can get messy if you only look at sites where the entry requirement is strs and people then go on to varying levels of snp testing from zero to ngs. It is fiddly in general to restrict the enquiries to those sufficiently tested.

Then there is massive testing bias, one clade may just be good at rounding up others to test, or they are all loaded.

spruithean
07-18-2017, 05:14 PM
https://www.familytreedna.com/public/Herrick/default.aspx?section=yresults

Appears the I-M253 here is L22+.

Edit:

I agree with MacUalraig, some groups are good at rounding up relatives who all happen to test their Y-DNA and tip the scales. In the surname project I belong to it took a number of years before distant male cousins began testing.

Dave-V
07-18-2017, 07:30 PM
Two thoughts about this coming from both sides of the question...

From a subclade viewpoint I would have to guess not many projects are progressed so far that they know which unique SNPs line up with the major surname groups (ignoring for this the small percent of other surnames that will invariably be part of any large group through NPEs etc). The major subclades in the project will normally fall under different highlevel SNPs which serve to delineate those major subclades for that project's purposes, but if the SNPs that break down those subclades in that project don't align in time perfectly with the advent of surnames, then they'll probably be shared across in other projects too. (Edit: I'm sure there are lots of examples where that SNP has been identified; my point is I don't think it's the norm).

An extreme example in my project is that we have an I-M223 subgroup. M223 is over 12,000 years old but it still serves to delineate that Vance group within the project (in this case there are others of the same surname outside of that group, but bear with me as an example since this could occur in a non-surname project for a major surname just as well). Obviously outside my project this Vance group shares M223 with thousands of other men and probably thousands of other surnames, so it doesn't identify the Vance surname anywhere outside my project. To find the SNP that DOES identify that group uniquely would require extensive NGS testing to find the one SNP that occurred most closely to when the surname was adopted in that line.

Coming at the question from the viewpoint of surnames, they rarely have one alignment with a single Y haplogroup in the first place, mainly due to separate origins of the same surname. So when you're looking for alignment between a SNP/subclade and a surname you have to answer the very real question of which "version" of the surname you're talking about. You can't just say that the best known origin, or the most famous line, etc, is the "correct" origin of that whole surname group and therefore that subclade matches to that surname one-to-one. Again using the Vance project as an example we have Irish lines, German lines, and Welsh lines. We know which one is from the Scottish noble line that is best known as the origin of the surname, but that's only ONE origin.

RobertCasey
07-18-2017, 10:36 PM
Most recent haplogroup projects are now routinely finding YSNP branches where one surname (and its variants) exceeds 50 %. We now have 22 out of 51 YSNP branches where one surname exceeds 50 % (a reasonable NPE rate for 1,000 years or 40 generations). If you combine YSNP branches with YSTR signatures, that number more than doubles. We even now have seven branches that are within the genealogical time frame. Here is a summary:

Y5610, DC1, Y6913, FGC13418, DC62, BY5212 - 24/32 O'Brien - this includes Sir Conor O'Brien, titled descendant of King Brian Boru (originator of the surname)
DC201 - 17/19 McGrath
Z17669 - 17/24 Butler
FGC5647, FGC5639 - 15/19 Casey
DC69, DC377 - 10/16 Casey
DC199 - 9/13 Mahoney
DC55 - 7/9 Noland
DC39 - 5/6 O'Hern
A6097 - 5/9 - Cannon
DC41 - 4/5 - Kelly
FGC12295 - 3/5 O'Brien
DC470 - 2/2 Curry
DC340 - 2/2 McNamara
DC49 - 2/3 Peavey
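
A minimal sketch in Python of how the tallies above could be checked against the 50 % threshold, and of what per-generation NPE rate that threshold implies over 40 generations. The regular expression and the function names (parse_summary, implied_npe_rate) are illustrative, and the constant, independent loss rate per generation is a simplifying assumption, not something taken from the project's own tooling.

```python
import re
from fractions import Fraction

# Tallies copied from the summary above: "<SNPs> - <matching>/<tested> <surname>".
SUMMARY = """\
Y5610, DC1, Y6913, FGC13418, DC62, BY5212 - 24/32 O'Brien
DC201 - 17/19 McGrath
Z17669 - 17/24 Butler
FGC5647, FGC5639 - 15/19 Casey
DC69, DC377 - 10/16 Casey
DC199 - 9/13 Mahoney
DC55 - 7/9 Noland
DC39 - 5/6 O'Hern
A6097 - 5/9 - Cannon
DC41 - 4/5 - Kelly
FGC12295 - 3/5 O'Brien
DC470 - 2/2 Curry
DC340 - 2/2 McNamara
DC49 - 2/3 Peavey
"""

# The optional "(?: -)?" tolerates the extra dash on the Cannon and Kelly lines.
LINE = re.compile(
    r"^(?P<snps>.+?) - (?P<hits>\d+)/(?P<total>\d+)(?: -)? (?P<surname>.+)$"
)


def parse_summary(text):
    """Return (snps, surname, share) for every line that matches the pattern."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            share = Fraction(int(m["hits"]), int(m["total"]))
            rows.append((m["snps"], m["surname"], share))
    return rows


def implied_npe_rate(retained_share, generations):
    """Loss rate r per generation such that (1 - r) ** generations == retained_share.

    Assumes the same independent rate in every generation (a simplification).
    """
    return 1 - retained_share ** (1 / generations)


rows = parse_summary(SUMMARY)
dominant = [row for row in rows if row[2] > Fraction(1, 2)]
for snps, surname, share in dominant:
    print(f"{surname:10s} {float(share):5.0%}  {snps}")
print(f"implied rate per generation: {implied_npe_rate(0.5, 40):.2%}")
```

All 14 branches listed clear the threshold, and under the constant-rate assumption a surname retaining 50 % of a branch after 40 generations corresponds to losing roughly 1.7 % per generation.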

R-M222 is several times larger than R-L226 and has many more YSNP branches where surnames exceed 50 % under YSNP branches. There are dozens of other younger haplogroups that include many YSNP branches that are dominated by one surname (and its variants) these days. However, probably less than 10 % of testers fall under this category but the trend continues to grow as more NGS and SNP pack results are analyzed.

There are several characteristics that favor these kinds of haplogroups:

1) The YSTR signature is usually genetically isolated and has very little overlap with other testers.
2) These haplogroups almost always became prolific 1,500 to 2,000 years ago and usually have numerous equivalents (around 50 L226 equivalents).
3) These haplogroups are very prolific in numbers of living descendants that can test (usually associated with successful conquering of neighbors).
4) These haplogroups usually have very specific geographic origins (five Irish counties for L226 include 80 % of testers).
5) Very active haplogroup admin leadership that drives NGS and SNP pack testing specific to the haplogroup.
6) Such clear progress that the testers are generally very interested in YSNP testing and upgrading to 67 markers. L226 has 550 67 marker testers, 80 NGS testers, 80 L226 SNP pack orders, 200 private YSNPs ordered from YSEQ, three Full Genomes tests (including our first 20X Whole Genomes Sequencing test).
7) Extensive enough branch discovery (51 branches under L226) and extensive YSNP testing (160 tests of 40 or more branches).
8) Comprehensive charting by signatures - 80 % of 550 67 markers testers can be reliably charted currently:

http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Home.pdf

jamesdowallen
07-19-2017, 04:18 AM
Most recent haplogroup projects are now routinely finding YSNP branches where one surname (and its variants) exceeds 50 %. We now have 22 out of 51 YSNP branches where one surname exceeds 50 % (a reasonable NPE rate for 1,000 years or 40 generations). If you combine YSNP branches with YSTR signatures, that number more than doubles. We even now have seven branches that are within the genealogical time frame. Here is a summary:

Y5610, DC1, Y6913, FGC13418, DC62, BY5212 - 24/32 O'Brien - this includes Sir Conor O'Brien, titled descendant of King Brian Boru (originator of the surname)
DC201 - 17/19 McGrath
Z17669 - 17/24 Butler
FGC5647, FGC5639 - 15/19 Casey
DC69, DC377 - 10/16 Casey
DC199 - 9/13 Mahoney
DC55 - 7/9 Noland
DC39 - 5/6 O'Hern
A6097 - 5/9 - Cannon
DC41 - 4/5 - Kelly
FGC12295 - 3/5 O'Brien
DC470 - 2/2 Curry
DC340 - 2/2 McNamara
DC49 - 2/3 Peavey
...


I just spot-checked these clade names against the names at Yfull and Isogg. Of all these clades, only one (FGC12295) shows up at all. ::sad-face:: And even it is hidden (waiting for mouse-over). (You can "Find" it at view-source:https://www.yfull.com/tree/R1b/ but not at https://www.yfull.com/tree/R1b/ itself.) And, I'd already noted that O'Brien connection in my database (perhaps finding it via Jean Manco's wonderful site*).

This isn't your fault of course. I'd be happy to be pointed toward more comprehensive trees especially if they are text-like. It's easy for me to search Yfull or Isogg, and tune their tables for my own convenience. Pngs, Jpegs, etc.? Not so much. FTdna's SNP tree? Much less convenient than Yfull or Isogg to copy-paste its tree structure, but if that's the best "standard", I'll work on it.

@ Jean Manco (If you're reading this): I bought your outstanding book recently and am re-reading it and savouring it. Thanks and Kudos!

JohnHowellsTyrfro
07-19-2017, 05:57 AM
Two thoughts about this coming from both sides of the question...

From a subclade viewpoint I would have to guess not many projects are progressed so far that they know which unique SNPs line up with the major surname groups (ignoring for this the small percent of other surnames that will invariably be part of any large group through NPEs etc). The major subclades in the project will normally fall under different highlevel SNPs which serve to delineate those major subclades for that project's purposes, but if the SNPs that break down those subclades in that project don't align in time perfectly with the advent of surnames, then they'll probably be shared across in other projects too. (Edit: I'm sure there are lots of examples where that SNP has been identified; my point is I don't think it's the norm).

An extreme example in my project is that we have an I-M223 subgroup. M223 is over 12,000 years old but it still serves to delineate that Vance group within the project (in this case there are others of the same surname outside of that group, but bear with me as an example since this could occur in a non-surname project for a major surname just as well). Obviously outside my project this Vance group shares M223 with thousands of other men and probably thousands of other surnames, so it doesn't identify the Vance surname anywhere outside my project. To find the SNP that DOES identify that group uniquely would require extensive NGS testing to find the one SNP that occurred most closely to when the surname was adopted in that line.

Coming at the question from the viewpoint of surnames, they rarely have one alignment with a single Y haplogroup in the first place, mainly due to separate origins of the same surname. So when you're looking for alignment between a SNP/subclade and a surname you have to answer the very real question of which "version" of the surname you're talking about. You can't just say that the best known origin, or the most famous line, etc, is the "correct" origin of that whole surname group and therefore that subclade matches to that surname one-to-one. Again using the Vance project as an example we have Irish lines, German lines, and Welsh lines. We know which one is from the Scottish noble line that is best known as the origin of the surname, but that's only ONE origin.

Welsh ancestry, or ancestry from near Wales, can be a particular problem because of the relatively late adoption of fixed surnames. In some parts they were not fixed until the 1600s, I think, so quite a few related people have very different surnames. This isn't always understood, maybe.
Just thinking about my own research: in a county like Herefordshire, the west of the county was culturally Welsh, and even to some extent Welsh-speaking, up to the 19th century. So I think you could have people who share descent, with those originating near the border having a culturally Welsh name and others from a little further away, but in the same county, having a culturally English or Anglo-Saxon based surname, and maybe the latter "fixed" much earlier? John

RobertCasey
07-19-2017, 11:30 AM
I just spot-checked these clade names against the names at Yfull and Isogg. Of all these clades, only one (FGC12295) shows up at all. ::sad-face:: And even it is hidden (waiting for mouse-over). (You can "Find" it at view-source:https://www.yfull.com/tree/R1b/ but not at https://www.yfull.com/tree/R1b/ itself.) And, I'd already noted that O'Brien connection in my database (perhaps finding it via Jean Manco's wonderful site*).

This isn't your fault of course. I'd be happy to be pointed toward more comprehensive trees especially if they are text-like. It's easy for me to search Yfull or Isogg, and tune their tables for my own convenience. Pngs, Jpegs, etc.? Not so much. FTdna's SNP tree? Much less convenient than Yfull or Isogg to copy-paste its tree structure, but if that's the best "standard", I'll work on it.



The best haplotree for R-L226 is the one that I maintain at my web site. It contains not only NGS-discovered branches but also many branches revealed by YSEQ testing, FGC testing and FTDNA L226 SNP pack testing, several major branches in unstable areas that have consistent testing results across 80 NGS tests, and several indel branches (the source of discovery is listed when it is revealed by YSEQ testing or L226 SNP pack testing):

http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Tree.pdf

The second best haplotree is the one maintained by Alex Williamson at BigTree (the preferred method for R-P312):

http://www.ytree.net/DisplayTree.php?blockID=15

However, Alex's haplotree is only based on NGS testing. Many of our branches (17) were revealed via YSEQ testing of YSNPs and L226 SNP pack tests. These are noted in my chart. Also, Alex's chart still has several errors (ZZ31_1 which is in an unstable area AND has inconsistent test results as it tests negative for a major branch under ZZ31_1).

The next best haplotree is FTDNA's, but FTDNA insists on listing all 50 private YSNPs found in the L226 SNP pack as branches rather than in the pop-up menu with the branch equivalents. So around half of the branches listed in the L226 haplotree are not real branches but private YSNPs. We opened a ticket with FTDNA, but their official position is that all private YSNPs have to be listed as branches so that end users can easily order them. They could be ordered from the pop-up list of branch equivalents as well, but FTDNA did not want to mix branch equivalents and private YSNPs. Also, FTDNA continues to refuse to add branches based on YSEQ testing AND refuses to add YSEQ-discovered branches to their individual YSNP ordering so that we can validate them at FTDNA. You have to be logged in to FTDNA to view their haplotree, which is also a major inconvenience:

https://www.familytreedna.com/my/y-dna-haplotree/?ekit=xTkMOy9VAMS2x8FrSaO18A%3d%3d

ISOGG haplotrees are really not very relevant for haplogroups like L226. Their listing criteria no longer meet the needs of our project, so we no longer submit new branches to be added to their haplotree. They will not list branches in unstable areas or indels. But the biggest issue is the genetic diversity requirement, which will not allow most of our L226 branches to be listed. They really need to remove the "Genealogy" from their name with that rule. Plus they require so much documentation that it is too much effort to provide for such an outdated haplotree (they have only eight of fifty-one branches - one is in error as well):

https://isogg.org/tree/ISOGG_HapgrpR.html

The next on the list is YFULL since most of R-P312 testers use BigTree at no cost vs. the $50 fee from YFULL. R-L226 and many other R-P312 haplogroup admins analyze their own BAM files, so paying $50 for NGS analysis for a third version of the haplotree just does not make sense:

https://www.yfull.com/tree/R1b/

The worst haplotree by far is Full Genomes Corporation's, which lists only R-L226. It only includes NGS tests analyzed by FGC, so none of the 51 branches under L226 are listed, since only a handful of NGS testers have been tested (three) or analyzed (one more) there. They still have not added FGC5628, which should have been there for a couple of years. Their interface is horrible as well (you have to expand dozens of the ancient meaningless YSNP branches to see more recent haplogroups):

https://www.fullgenomes.com/demo/ytree/












jamesdowallen
07-19-2017, 07:07 PM
My post here is NOT about genetics or genealogy. It is about software methods and toothpaste!
And it is NOT addressed at RobertCasey. Rather, I'm taking the opportunity to voice a pet peeve. B) I am posting partly out of curiosity: Try to understand what I'm saying and see if I'm communicating successfully! ;)

Have you heard the expression "You can't squeeze toothpaste back into the tube" ?

When I'm studying the data at Ytree or Isogg, I work on my own private copies of those pages with a text editor. I can navigate quickly using search commands, even merge, with software assistance, the Yfull and Isogg trees if I wish. Push a few buttons in a text editor and ... Presto! I have a list of subclades. (If you clicked the link to my website in OP you'll see I've invested some time in amassing my data. Useful editing aids simplify that task.)


The best haplotree for R-L226 is the one that I maintain at my web site....
http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Tree.pdf

The second best haplotree is ...

http://www.ytree.net/DisplayTree.php?blockID=15


The first link in above excerpt is to a pdf image. Perhaps there's a tool that would OCR that image, but I don't have it. The first thing I'd want to do is laboriously re-type all that data. (It's small enough that that won't take me too long, but consider the principle. Imagine if I had to do the same thing for the giant trees at Isogg or Yfull.) Perhaps you have that data in a nice textual format and used tools to make your pdf image. But for me to reconstruct the text from the image you created would be like shoving toothpaste back into the tube.

The 2nd link is also not ready for a quick cut-and-paste. But it has some options. With study I might be able to find it will generate a usable report. Or, I could try to reconstruct the tree structure by writing a script to parse the html source. I go to the top of that tree (R-P312/S116) and get a giant display. Is there a way to render this in .csv or do I need to study their script, reinvent a wheel, and get their "toothpaste back into the tube" ? ;)
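
A rough sketch in Python of the "parse the html source" route, using only the standard library. It assumes the tree is marked up as nested <ul>/<li> lists, which is a guess for illustration: the actual markup behind the BigTree display is not known here, and the sample HTML, the class name ListTreeParser and the function rows_to_csv are all hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample markup; a real page would need its own inspection.
SAMPLE_HTML = """
<ul>
  <li>R-P312
    <ul>
      <li>R-DF27</li>
      <li>R-L21
        <ul><li>R-DF13</li></ul>
      </li>
    </ul>
  </li>
</ul>
"""


class ListTreeParser(HTMLParser):
    """Collect (depth, label) pairs from nested <ul>/<li> markup."""

    def __init__(self):
        super().__init__()
        self.depth = 0            # current <ul> nesting level
        self.rows = []            # (depth, label) in document order
        self._want_label = False  # True until the first text of an <li> is seen

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.depth += 1
        elif tag == "li":
            self._want_label = True

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._want_label:
            self.rows.append((self.depth, text))
            self._want_label = False


def rows_to_csv(rows):
    """Turn (depth, label) rows into CSV text with columns clade, parent, depth."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["clade", "parent", "depth"])
    ancestors = []  # labels on the path from the root to the previous row
    for depth, label in rows:
        del ancestors[depth - 1:]  # keep only the ancestors above this depth
        parent = ancestors[-1] if ancestors else ""
        writer.writerow([label, parent, depth])
        ancestors.append(label)
    return out.getvalue()


parser = ListTreeParser()
parser.feed(SAMPLE_HTML)
parser.close()
csv_text = rows_to_csv(parser.rows)
print(csv_text)
```

Once the tree is in clade/parent rows it can be searched, diffed or merged with other trees in a text editor or a spreadsheet. If the real page builds its display with tables or scripts instead of lists, the parser class would have to change but the CSV step could stay the same.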

Sorry for the rant! I'm not trying to cause trouble; I'm mostly posting out of curiosity to see if y'all understand what my point is! :P

RobertCasey
07-20-2017, 12:26 AM
We all shove toothpaste back into the tube every day. I did not even list other known haplogroup sources for U106, R1a and haplogroup I. Here are the major issues:

1) Many vendors are reluctant to use FTDNA IDs - the de facto standard for tracking YSNP results - so you have a classic IT issue of ID translation to create these charts. Only BigTree uses FTDNA IDs. This is a major issue, but customer IDs are legally considered the intellectual property of each vendor.
2) None of the vendors follows a standard for exporting data: FTDNA, Full Genomes, YSEQ, BigTree, YFULL, ISOGG, numerous admins. At least most of the NGS/WGS data is available in VCF and BAM files. You have to screen scrape FTDNA data and YSEQ data for individual YSNP testing and SNP packs.
3) You have two major sources of YSNP data - NGS/WGS and individual testing/SNP pack testing. Individual YSNP testing at YSEQ reveals branches at 10 % of the cost per branch of Big Y testing - so a Big Y (NGS) only approach is just not cost efficient. FTDNA SNP packs dedicated to recent YSNPs (predictable YSNPs in the last 1,500 years or so) are revealing many branches as well. One third of the L226 branches were discovered via YSEQ and FTDNA SNP packs.
4) You have two major players in NGS testing - FTDNA and FGC - and soon YSEQ will be a third vendor.
5) You have two major players in individual/SNP pack testing - FTDNA and YSEQ.
6) You have many minor players in the atDNA world providing YSNP testing - Ancestry.com, 23andme, Nat Geo, BritainsDNA (wrapping up), LivingDNA (new guy - same old product from a YSNP point of view). Why can't some vendor put 50,000 relevant YSNPs in a static test ??

Competition is very good as history teaches us: 1) FGC came out with NGS testing (FTDNA dropped WTY and eventually came out with Big Y); 2) YSEQ introduced SNP pack testing (and now we have much better SNP pack testing from FTDNA); 3) BigTree and YFULL came out with some decent haplotrees but different haplogroups went with different options (and FTDNA finally started making major improvements in their haplotree); 4) FTDNA dropped the very economical testing of individual YSNPs which YSEQ still provides (FTDNA seems OK with allowing YSEQ to have this low profit space for now and is presenting ever-improving SNP packs as an alternative).

The only eventual solution will be a paid service to access data that is vendor independent. YFULL is the first step in this direction but the $50 per test needs to be replaced with a monthly fee to allow major improvements and enhancements. BigTree is another cross-vendor attempt, offered for free - but being free limits its scope to P312 and now U106. GEDMATCH may be another possibility as they do now charge monthly for premium services. Being in Russia, YFULL could solve the FTDNA ID issue since Russia does not observe intellectual property issues like developed countries do.

Tools could be used to collect data. I import all my relevant L226 data to Dave Vance's SAPP charting tool which could be captured. Once everyone starts charting with a good charting tool, data collection would be possible. Obviously, YFULL and BigTree collect VCF and BAM files already. However, collected data can have errors, so screen scraping FTDNA and YSEQ YSTR and YSNP reports may be required. But FTDNA has web site terms and conditions to stop such activities. Russia or China would be a solution as these countries are hard to pursue legally. YSEQ is in Germany and FGC is in the US, so they are not good candidates. GEDMATCH understands the downside of using FTDNA IDs. If BigTree becomes too comprehensive, they would get FTDNA attention as well.

Reconstructing from numerous sources is not a good long term solution. For L226, start with BigTree, but it is missing one-third of the branches since it is only NGS/WGS based. For individual YSNP data and SNP pack data, you are at the mercy of dozens of admins and dozens of formats. A standardized format would help, but if you have all the raw data, a tool could easily generate all the tables you would want. So figuring out how to collect all the NGS data and uploading it to a commercial entity to host the database is the way to go. YSEQ would probably be very cooperative in sharing their data, but screen scraping FTDNA reports may run into legal issues (uploading one tester at a time just would not work). Do not get me started on privacy issues - that is toothpaste that remains in the tube and will never be used in any meaningful way.

These issues have been discussed many times - not much progress being made.

jamesdowallen
07-20-2017, 09:29 PM
Google turned up a wonderful page for me: http://haplogroup-r.org/tree/R.html
I didn't see it mentioned in the thread.

It's a huge version of the R-tree with many test results shown. Its surname ids are more numerous than the other tree: 28 Sullivans, for example, along with ten other surnames at 21+, and three dozen surnames at 10-19 hits. I'll ignore anything but obvious clusters but still have some work cut out for me. Ignoring all but the fifty surnames with 10+ hits, and not amending alternate spellings, I'd have missed the clearcut Gleeson/Gleason/Glisson cluster.

And -- joy of joys! -- unlike two sources mentioned upthread, the webpage is easily parsed; indeed the very same script I use to convert Yfull and Isogg to easily browsed text files operates as-is on R.Org.

I'll show a specific example:
The Isogg tree shows a terminal node at

DF13 > Z255/S219 = R1b1a1a2a1a2c1a4a Leaf

YFull shows two subclades of Z255 but our example involves just one:

# # R-Z255 350 AD == .IRL
# # # R-Y17109 600 AD == .IRL
# # # # R-A557 1850 AD == .IRL.IRL
# # # # R-Y17108 1200 AD ==
# # # # # R-A10634 1350 AD == .XXX.XXX.XXX
# # # # # R-Y17112 1250 AD ==
# # # # # # R-Y17989 1950 AD == .XXX.XXX
# # # # # # R-Y16880 1675 AD == .XXX.IRL.IRL
Note that one of the two subclades shown for R-Y17109 supposedly has MRCA less than 2 centuries ago!
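
A small sketch in Python of how an outline in this "# # R-clade year AD" layout could be turned into root-to-clade paths and filtered by TMRCA. The layout is taken from the excerpt above; the function names and the assumption that every date is a plain AD year (no BC or ybp values) are illustrative only.

```python
YFULL_EXCERPT = """\
# # R-Z255 350 AD == .IRL
# # # R-Y17109 600 AD == .IRL
# # # # R-A557 1850 AD == .IRL.IRL
# # # # R-Y17108 1200 AD ==
# # # # # R-A10634 1350 AD == .XXX.XXX.XXX
# # # # # R-Y17112 1250 AD ==
# # # # # # R-Y17989 1950 AD == .XXX.XXX
# # # # # # R-Y16880 1675 AD == .XXX.IRL.IRL
"""


def parse_outline(text):
    """Return (path, tmrca_year) per line; path is the list of clades from the root."""
    stack = []  # (depth, clade) entries for the current line and its ancestors
    nodes = []
    for line in text.splitlines():
        tokens = line.split()
        depth = tokens.count("#")  # number of "#" markers gives the nesting depth
        if depth == len(tokens):
            continue  # blank or marker-only line
        clade, year = tokens[depth], int(tokens[depth + 1])
        # Drop siblings and deeper clades left over from earlier lines.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, clade))
        nodes.append(([c for _, c in stack], year))
    return nodes


def younger_than(nodes, cutoff_year):
    """Clades whose TMRCA falls after cutoff_year, as (path string, year)."""
    return [(" > ".join(path), year) for path, year in nodes if year > cutoff_year]


# "Less than 2 centuries ago", counted back from the 2017 posting date.
recent = younger_than(parse_outline(YFULL_EXCERPT), 2017 - 200)
for path, year in recent:
    print(year, path)
```

On this excerpt the filter picks out R-A557 (1850 AD) directly under R-Y17109, plus the deeper R-Y17989 (1950 AD).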

And here is what is shown at R_org just for the Y17109 clade, which R_org calls Z16437

# # # R-Z16437/Y17109
# # # # R-Z16437* 3X-McMahon McCarthy Orgain XXX
# # # # R-FGC33226
# # # # # R-FGC33226* 3X-Carroll
# # # # R-Z16440 Pendergrast Phelps
# # # # R-A558 - A557
# # # # # R-A558* Bowman Mack McConnell McDonald Smith
# # # # # R-FGC39568
# # # # # # R-FGC39568* 2X-Creamer 2X-Cremin McCarthy McConnell Nicholson XXX XXX
# # # # # R-Z29008
# # # # # # R-Z29008* Miller XXX
# # # # # # R-A10889 8X-Treacy
# # # # R-A5631
# # # # # R-A5631* Gleeson Keefe
# # # # # R-A5629
# # # # # # R-A10634
# # # # # # # R-A10634* Gleason Glisson
# # # # # # # R-BY5709 2X-Gleeson
# # # # # # R-A5628
# # # # # # # R-A5628* McLachlan
# # # # # # # R-A660 2X-Gleason
# # # # # # # R-HR494
# # # # # # # # R-HR494* Gleeson
# # # # # # # # R-HR1293 Gleason Little

Bad news: They have their own names of course. In fact I'm still trying to figure out how to map the two versions of R-Z16437/Y17109 to each other (although it shows that A5631 == Y17108).

For my purpose I'll ignore surname clusters less than five or so -- so here just Treacy and Gleason. I happened on this particular subtree when I noticed, on skimming, the spelling variations Gleason/Gleeson/Glisson.
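
A minimal sketch in Python of that filtering step, run on the R_org excerpt above. The "NX-" prefix is read as a count and "XXX" as an anonymised tester, which is my reading of the listing rather than documented behaviour; the variant table (Gleeson and Glisson folded into Gleason) and the threshold of five are assumptions made for illustration.

```python
import re
from collections import Counter

R_ORG_EXCERPT = """\
# # # R-Z16437/Y17109
# # # # R-Z16437* 3X-McMahon McCarthy Orgain XXX
# # # # R-FGC33226
# # # # # R-FGC33226* 3X-Carroll
# # # # R-Z16440 Pendergrast Phelps
# # # # R-A558 - A557
# # # # # R-A558* Bowman Mack McConnell McDonald Smith
# # # # # R-FGC39568
# # # # # # R-FGC39568* 2X-Creamer 2X-Cremin McCarthy McConnell Nicholson XXX XXX
# # # # # R-Z29008
# # # # # # R-Z29008* Miller XXX
# # # # # # R-A10889 8X-Treacy
# # # # R-A5631
# # # # # R-A5631* Gleeson Keefe
# # # # # R-A5629
# # # # # # R-A10634
# # # # # # # R-A10634* Gleason Glisson
# # # # # # # R-BY5709 2X-Gleeson
# # # # # # R-A5628
# # # # # # # R-A5628* McLachlan
# # # # # # # R-A660 2X-Gleason
# # # # # # # R-HR494
# # # # # # # # R-HR494* Gleeson
# # # # # # # # R-HR1293 Gleason Little
"""

COUNT_PREFIX = re.compile(r"^(\d+)X-(.+)$")

# Assumed spelling variants; extend by hand as clusters are noticed.
VARIANTS = {"Gleeson": "Gleason", "Glisson": "Gleason"}


def surname_counts(tree_text, variants=VARIANTS):
    """Count testers per surname across the outline, merging known variants."""
    counts = Counter()
    for line in tree_text.splitlines():
        tokens = [t for t in line.split() if t != "#"]
        if not tokens:
            continue
        labels = tokens[1:]  # tokens[0] is the clade name
        if labels and labels[0] == "-":
            continue  # "R-A558 - A557" style alias line, no surnames on it
        for label in labels:
            if label == "XXX":
                continue  # anonymised tester, no surname to count
            m = COUNT_PREFIX.match(label)
            n, name = (int(m.group(1)), m.group(2)) if m else (1, label)
            counts[variants.get(name, name)] += n
    return counts


counts = surname_counts(R_ORG_EXCERPT)
clusters = {name: n for name, n in counts.items() if n >= 5}
print(clusters)
```

With the variants merged, the Gleason spellings add up to nine testers and Treacy to eight; nothing else in this excerpt reaches five. Without the variant table, Gleason and Gleeson would each show four and the cluster would be missed.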

I'm posting this mainly for my own benefit -- before my own notes get corroded. :-) But others please feel free to comment.

RobertCasey
07-27-2017, 07:43 PM
Here is a recent analysis that I did for BY2852 (part of Z16437):

http://www.rcasey.net/DNA/Temp/BY2852_20170707C.pdf

This includes results for BigTree as well as information from FTDNA YSTR and YSNP reports. Again, this is a quick and convenient format using Adobe InDesign. This is in a box chart form that genealogists can understand vs. a standard format for capturing for further analysis. These are my manually generated charts but I am working with others to generate charts in many formats - including one that could be in MySQL format or HTML that you could parse. Unfortunately, the charting programs are early in development and generate too many errors. At some point in time, we would need to output formats that could be modified by additional data or different approaches.

Unfortunately, this part of the haplotree has characteristics that will not yield high accuracy charting: these YSNP branches are not genetically isolated from a YSTR point of view from other YSNP branches. Not only is there YSTR overlap within the branches of Z16437 but there is overlap with its brother L159.2 and even some overlap with other haplogroups not under Z255. This is the main issue with this area of the haplotree.

The Gleason surname cluster could be charted but you have a major choice to make: 1) either leave out many possible testers that are really part of this cluster by being very conservative in what you chart; OR 2) include many more testers in this part of the haplotree, knowing that there will be many errors from testers that do not belong. Another alternative is to chart all of Z255, which would be a major effort to collect all the relevant data. By broadening the scope to Z255, many more signatures would be revealed where weak two and three marker signatures under Z16437 would be replaced with stronger three to five marker signatures under L159.2.

I collected most of the relevant data for all of Z16437 but accuracy just was not high enough and I was having to be extremely conservative with signature matches since there was so much overlap with L159.2 and even outside of Z255. This area of the haplotree will need more than 111 markers to chart (radically reducing the YSTR overlap of signatures) or an unbelievable amount of YSNP testing with very high coverage of testers (pretty much what the Gleason surname cluster is doing).