View Full Version : Using a combination of STR variants to predict Y subclades

emmental
12-06-2017, 05:04 PM
I’ve found that in my haplogroup (U152>Z36>FGC6418/Y3577) there is a very accurate combination of four STR variants for predicting fellow subclade members.
• DYS19=15
• DYS437=14
• DYS448=20
• DYS511=9
According to MacDonald’s Age Analysis Method on Big Tree, a member of this group and I have a MRCA who was probably born over 3200 years ago. If you look at our subclade in the U152 Project you can see that everyone (so far) has these four variants.
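A minimal sketch of how such a signature check could be scripted. The four marker values come from the list above; the function name and the two sample testers are hypothetical and only there for illustration.

```python
# Signature values taken from the post above.
SIGNATURE = {"DYS19": 15, "DYS437": 14, "DYS448": 20, "DYS511": 9}


def signature_match(haplotype, signature=SIGNATURE):
    """Count how many signature markers the haplotype matches.

    A marker that the tester has not tested counts as a non-match.
    """
    return sum(1 for marker, value in signature.items()
               if haplotype.get(marker) == value)


# Hypothetical testers, for illustration only.
tester_a = {"DYS19": 15, "DYS437": 14, "DYS448": 20, "DYS511": 9}
tester_b = {"DYS19": 14, "DYS437": 15, "DYS448": 19, "DYS511": 10}

print(signature_match(tester_a))  # 4
print(signature_match(tester_b))  # 0
```

A tester would be predicted as a subclade member only when all four values match.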

I thought it would be interesting to see if others who studied STRs in their (relatively recent) subclades have also found a combination which works well for predicting.

RobertCasey
12-07-2017, 04:27 PM
I just finished up using a trial copy of the statistical software package SPSS from IBM. For the first time, I used all of haplogroup R as input (54,000 testers at 67 markers) and determined that using genetic distance as a second variable produces 99 % plus prediction for haplogroups in the 1,500 to 2,500 year time frame. The great thing about this statistical model - it can easily be implemented with EXCEL as well. Here is the model used for binary logistic regression:

P = e**(constant1 + constant2*Signature + constant3*geneticdistance) / (1 + e**(constant1 + constant2*Signature + constant3*geneticdistance))

For each haplogroup analyzed, the three constants will change. Unfortunately, it takes a statistical software package to calculate the constants for each haplogroup. But this can easily be implemented via spreadsheet analysis as well.

___________Constant___Significance___Comment

Constant___-28.724_____0.996________99.6 % accuracy
Signature____17.557_____0.985_______98.5 % accuracy
GenDist_____-8.007_____0.968________96.8 % accuracy
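For anyone who prefers a script to a spreadsheet, here is a rough sketch of the formula using the three constants from the table above. It assumes that "Signature" is the number of matching signature markers and that "GenDist" is the genetic distance to the haplogroup modal; the function and variable names are made up for this example.

```python
import math

# Constants as posted in the table above; the dictionary keys are illustrative.
COEFFICIENTS = {"constant": -28.724, "signature": 17.557, "gen_dist": -8.007}


def predict_probability(signature_match, genetic_distance, coef=COEFFICIENTS):
    """Binary logistic model: P = e**z / (1 + e**z)."""
    z = (coef["constant"]
         + coef["signature"] * signature_match
         + coef["gen_dist"] * genetic_distance)
    # Evaluate the logistic function in a way that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# With these constants a signature match of 6 flips from predicted
# positive to predicted negative between genetic distance 9 and 10.
print(predict_probability(6, 9) > 0.5)
print(predict_probability(6, 10) > 0.5)
```

A probability above 0.5 would be read as a predicted positive.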

For L226, here is the EXCEL version that also has 100 % accuracy:

Signature__ Result___Max GD____Min GD_____GD
Match_____________Positive____Negative____Filter

9_________0/1______10________38_________24.0
8_________0/1______11________26_________18.5
7_________0/1______15________17_________16.0
6_________0/1_______8________11_________9.5

The genetic distance filter is the key in the above chart. It is just the average of the highest genetic distance to test positive and the lowest genetic distance to test negative. Taking this average leaves room for future testers to test positive at higher genetic distances and negative at lower genetic distances. As more tests are conducted, the genetic distance filter could be adjusted, but that will rarely, if ever, change the prediction. If no negative testers are found, use a genetic distance of 20, and if no positives are found, use a genetic distance of zero.
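The filter rule described above can be sketched in a few lines. The defaults of 20 and 0 are the ones given in the post; treating a genetic distance strictly below the filter as a predicted positive is an assumption of this sketch, and the function names are hypothetical.

```python
def gd_filter(max_positive_gd=None, min_negative_gd=None):
    """Average of the highest positive GD and the lowest negative GD.

    Per the post: use 20 when no negative testers exist and 0 when no
    positive testers exist.
    """
    if max_positive_gd is None:
        max_positive_gd = 0
    if min_negative_gd is None:
        min_negative_gd = 20
    return (max_positive_gd + min_negative_gd) / 2


def predicted_positive(genetic_distance, filter_value):
    """Assumption: a tester below the filter is predicted positive."""
    return genetic_distance < filter_value


print(gd_filter(10, 38))    # 24.0
print(gd_filter(11, 26))    # 18.5
print(gd_filter(4, None))   # 12.0
print(gd_filter(None, 8))   # 4.0
```

The printed values reproduce the GD Filter column for the corresponding rows of the EXCEL charts in this thread.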

Also note that the binary logistic regression constants will change some with more tested data as well. It is a form fitting regression model that calculates the constants to fit the observed results. Testing more candidates with signature matches of six and low genetic distance would change the constants - but not by very much. Also, if there is later found to be overlap between positive and negative testers at signature of six, the accuracy of the model would fall to high 99 % range vs. 100 % accuracy.

Hosmer and Lemeshow - Chi Square = 0.000 (0 % is under 5 % which is acceptable) and Significance = 1.000 (100 % accuracy in concordance of pairs - model accuracy matches the actual test results 100 % of the time).

I also had time to analyze L555 and L371. Both have Chi Square 0.000 and Significance of 1.000.

For L555

___________Constant___Significance___Comment

Constant___-56.886_____0.998________99.8 % accuracy
Signature____14.772_____0.996_______99.6 % accuracy
GenDist_____-5.178_____0.993________99.3 % accuracy

EXCEL equivalent:

Signature__Result___Max GD_____Min GD____GD
Match_____________Positive_____Negative__Filter

8_________1_______4__________[20]______12.0
7_________1_______7__________[20]______13.5
6_________0/1_____3__________9_________6.0
5_________0_______[0]________8_________4.0

For L371

____________Constant____Significance___Comment

Constant_____-42.899_____0.999________99.9 % accuracy
Signature_____8.675______0.999_________99.9 % accuracy
GenDist______-1.510______0.999_________99.9 % accuracy

EXCEL Equivalent

Signature__Result___Max GD___Min GD_____GD
Match_____________Positive___Negative___Filter

9_________1_______9________[20]_______14.5
8_________1_______6________[20]_______13.0
7_________NA_____[0]_______[20]_______10.0
6_________0______[0]_______18_________9.0

Note that L371 has a "perfect" curve for signature only prediction. However, both MiniTab and SPSS fail to calculate constants in that case, which is a known bug in almost all statistical software packages. This is documented in numerous academic papers. Adding a second variable (genetic distance) removes the limitation of the software packages. Here is a paper that I have written about YSNP prediction which goes into detail about the "perfect" curve issue with statistical packages:
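The "perfect" curve problem can be illustrated with a toy example. This is only a sketch on made-up data, not the L371 data set: when one variable separates positives from negatives completely, the fitted constant keeps growing the longer the fitting procedure runs, so no finite best value exists. The plain gradient ascent used here is a stand-in for whatever fitting method the statistical packages actually use.

```python
import math


def fit_slope(xs, ys, steps, learning_rate=0.1):
    """Gradient ascent on the logistic log-likelihood for p = 1 / (1 + exp(-b * x))."""
    b = 0.0
    for _ in range(steps):
        gradient = sum((y - 1.0 / (1.0 + math.exp(-b * x))) * x
                       for x, y in zip(xs, ys))
        b += learning_rate * gradient
    return b


# Hypothetical, perfectly separated data: a centred signature score
# (negative = few matches, positive = many matches) and the test result.
xs = [-2, -1, 1, 2]
ys = [0, 0, 1, 1]

# The fitted slope never settles; it is larger after every longer run.
for steps in (100, 1000, 10000):
    print(steps, round(fit_slope(xs, ys, steps), 2))
```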

http://www.rcasey.net/DNA/R_L21/Math_behind_R_L21_SNP_Predictor_20170307A.pdf

Before you go off and download your trial copies of MiniTab or SPSS, you really need to spend a lot of time collecting your data (95 % of the time required). Getting the YSTRs is not too hard - but getting all the YSNP data (including branches under the haplogroup) is a really time consuming effort (unless you already do this as a haplogroup admin). The manual method via EXCEL works - but the statistical modeling verifies that this approach is correct. I believe that 90 % of the haplogroups could be predicted - but only if the haplogroup is 1,500 to 2,500 years old. Younger YSNPs will regularly require certain YSTR values, resulting in poor prediction accuracy with this methodology. Also, older haplogroups can not be predicted either since YSTR mutations begin to have too many secondary mutations along each path (hidden to living testers). Prediction for L1335 works great but its descendant L743 does not work since it requires certain YSTR values for accuracy.

The only solution for younger time frames is charting using signatures. For L226, I can now chart 82 % of 616 testers with accuracy between 60 to 95 %. We now have 63 branches under L226 with 100 Big Y tests and 90 L226 SNP packs. In addition to 63 YSNP branches, there are around three to four times as many YSTR branches within each YSNP branch.

Please note, that this analysis is only at 67 markers and ignores both CDY markers. If you use Burgarella mutation rates, CDY markers represent 40 % of the 67 marker mutations for a 1,500 year old haplogroup like L226. Due to excessive hidden mutations, these markers are not used. Both YSNP prediction and charting could be pushed to older and younger time frames with 500 YSTRs - but would require filtering out a lot of faster mutating markers that approach CDY mutation rates.
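As a rough illustration of leaving the CDY markers out, here is a sketch of a genetic distance calculation that skips them. It assumes a simple stepwise count (the sum of the absolute differences at each shared marker), which may not be how the genetic distances in this thread were computed. The marker labels "CDYa" and "CDYb" and the two sample haplotypes are hypothetical.

```python
# Markers left out of the comparison; the labels are an assumption.
EXCLUDED = {"CDYa", "CDYb"}


def genetic_distance(haplotype_a, haplotype_b, excluded=EXCLUDED):
    """Stepwise genetic distance over shared markers, skipping excluded ones."""
    shared = (set(haplotype_a) & set(haplotype_b)) - set(excluded)
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)


# Hypothetical testers, for illustration only.
a = {"DYS19": 15, "DYS437": 14, "CDYa": 36, "CDYb": 38}
b = {"DYS19": 14, "DYS437": 14, "CDYa": 38, "CDYb": 39}

print(genetic_distance(a, b))               # 1 (CDY differences ignored)
print(genetic_distance(a, b, excluded=()))  # 4 (CDY differences counted)
```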

By using genetic distance as a second variable, you no longer have to test positive for L21, U106, DF27, etc. for prediction to work. The convergence that is present between the vast majority of older haplogroups is now filtered out via genetic distance or genetic distance filters. It still requires that you look at haplogroup R and haplogroup I separately. Even this filter may not be needed - I just have not attempted this across multiple older haplogroups. I was surprised to learn that the L226 signature has a significant overlap with R1a testers - but genetic distance filters them out. The nine marker signature of L226 is just not isolated across all of haplogroup R, being only nine markers. But genetic distance filters out the convergence of smaller signatures and filters out false hits from DF27, U106, R1a and others.

RobertCasey
12-07-2017, 05:21 PM
Signature sizes really need to be at least seven markers to be reliable, and the time frame of 3,200 years makes YSTR prediction very problematic due to so many hidden YSTR mutations (dependent secondary mutations of the same marker along the same path). Even at 1,500 years, this is a minor factor and there will be around 20 or 30 hidden mutations out of several thousand mutations. You really need to YSNP test down to a more recent time frame to reveal more YSTR mutations. Surprisingly, you need very minimal sample sizes to have accurate YSNP prediction - but statistics require a younger time frame for this kind of YSNP prediction via YSTR signatures.

There are older haplogroup prediction tools like Athey's tool which work differently from your approach. Smaller signatures can work sometimes - just not very accurately. If these are rare marker values, this really allows you to get by with fewer markers in the signature. According to the Little summary of R1b:

19=15 is 9 %, 437 = 14 is 12 %, 448 = 20 is 5 % and 511 = 9 is 3 %. Also, 511 is in markers 68 to 111, so you are using 111 markers vs. 67 which helps.
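To get a feel for how rare the full combination would be, the four frequencies can be multiplied together. This is only a back-of-the-envelope sketch: it assumes the four markers vary independently of each other, which will not be true among related testers, so it should not be read as the real frequency of the signature.

```python
from math import prod

# Frequencies quoted above from the Little summary of R1b.
frequencies = {
    "DYS19=15": 0.09,
    "DYS437=14": 0.12,
    "DYS448=20": 0.05,
    "DYS511=9": 0.03,
}

# Joint frequency under the (strong) assumption of independence.
joint = prod(frequencies.values())

print(f"{joint:.7f}")    # 0.0000162
print(round(1 / joint))  # 61728, i.e. roughly 1 in 62,000
```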

emmental
12-07-2017, 08:15 PM
Yes, I know that the values of these markers are relatively rare - and to have all four would be extremely rare. This is why it works so well for this particular subclade.

DYS 511 is in the 38-67 STR marker group - which is better for me because I do not need to see all 111 for the prediction. Nothing in the 68-111 group seems to consistently stick out for this subclade.

I really wasn't using STRs for determining any TMRCA. I just stated that the one member, who did the Big Y, and I, who did FGC Elite 1.0, both joined the Big Tree and were estimated to be 3200 years apart. It just surprised me that these four markers would seemingly hold consistent that far back in time. So far seven different surnames have tested positive for FGC6418 and there are at least 3 more surnames which have tested down to Z36 who have not yet taken any test for FGC6418. All have this combination.

I guess to reword my original question - Is it rare to be able to find a combination like this? Are there any other subclades out there with a reliable combination of variant STRs?

RobertCasey
12-07-2017, 10:02 PM
For any YSNP that is predicted to be 3,200 years old, YSNP prediction is very questionable from a statistical point of view. But in your case, you are using rare marker values, which adds another dimension to prediction and would improve the ability, as would adding very slow mutating markers that are different as well. Here is a quick and dirty analysis pulling from 54,000 67 marker testers at haplogroup R. This data is around six months old and the YSNP chains are not updated from the YSTR report so they could be even more dated.

fN64186___Brumby___FGC6418-___R1b-U152>Z36___R1b___0
f227997___Bain___FGC6418-___R1b-M269___Bain___0
f31552___Venter___FGC6418-___R1b-U152>Z36___Venter___1
fE15502___Crosta___FGC6418-___R1b-U152>Z36___R-P312___1
f484973___Domeika___FGC6418-___R1b-U152>Z36___[blank]___1
f121964___zUnkName___FGC6418-___R1b-U152>Z36___Ortiz___0
f150506___deAlcantara___FGC6418-___R1b-U152>Z36___R1b___1
f275841___Prewitt___FGC6418-___R1b-U152___Border_Rei___0
f293548___Perry___FGC6418-___R1b-U152>Z36___Summers___0
f76785___Flippo___FGC6418-___R1b-U152>Z36___Flippo___1
fB6664___May___FGC6418+___R1b-M269___R-M269___4
f243331___Mueller___FGC6418+___R1b-M269___Miller___4
f204013___Musselman___FGC6418+___R1b-U152>Z36___Musselman___4

I found four positive and nine negative. The last column is your signature match. However, there is a lot of convergence on this signature from R1a testers. I found 26 matches under R1a. You can simply add an R1b requirement, but your marker values are not unique within haplogroup R:

f275621___Salpagarov___INFO___R1a-M459>M417>Y52___Karachay___4
f315627___Orr___INFO___R1a-M459>M512___Orr___4
f37584___Shoesmith___INFO___R1a-M459>M512___Copeland___4
f88372___Coles___INFO___R1a-M459>M512___Cole___4
fN6841___zUnkName___INFO___R1a-M459>M512___India_Pak___4
f128448___zUnkName___INFO___R1a-M459>M512___Ingram___4
f141317___Chapman___INFO___R1a-M459>M512___Chapman___4
f195104___Graham___INFO___R1a-M459>M512___Graham___4
f197817___Lamont___INFO___R1a-M459>M512___Lamont___4
f229468___Leamon___INFO___R1a-M459>M512___Lemon___4
f247901___Chapman___INFO___R1a-M459>M512___Chapman___4
f248535___zUnkName___INFO___R1a-M459>M512___Chapman___4
f71842___Beard___INFO___R1a-M459>M512___Baird-2___4
f90458___Chapman___INFO___R1a-M459>M512___Chapman___4
f95631___Chapman___INFO___R1a-M459>M512___Chapman___4
fM8204___Altayy___INFO___R1a-M459>M512/M198___R-Arabia___4
f320913___[blank]___INFO___R1a-M459>M512/M198___Swedish___4
f424366___Piotrowski___INFO___R1a-Z280>CTS1211>(YP951)___Poland___4
fN112858___zUnkName___INFO___R1a-Z280>CTS1211>CTS3402___Poland___4
fN113071___zUnkName___INFO___R1a-Z280>CTS1211>CTS3402>Y2608___Poland___4
f64924___Antiporuk___INFO___R1a-Z282>L1029>YP263___Belarus_1___4
f345297___Jensen___INFO___R1a-Z283___Norway___4
fB5637___DeCochrane___INFO___R1a-Z283___Scottish___4
fN9416___Liersta?___INFO___R1a-Z284>CTS4179___Holt___4
f158276___Chapman___INFO___R1a-Z284>CTS4179___Kauffman___4
f212871___Abdurrahman___INFO___R1a-Z645>Z93>L657___R-Arabia___4

You also have at least five Z36 testers (ancestor of FGC6418) that have not been tested:

f188413___Mosemann___INFO___R1b-U152>Z36___Musselman___4
f42389___Binggeli___INFO___R1b-U152>Z36___Singer___4
f78682___Binkley___INFO___R1b-U152>Z36___Binkley___4
fE17805___Binggeli___INFO___R1b-U152>Z36___Alpine___4
fN117099___Binkley___INFO___R1b-U152>Z36___Binkley___4

Then you have 36 R-M269 testers that have not been tested as well:

f175934___Binggeli(Binkele)___INFO___R1b-M269___Binkley___4
f247898___Musselman___INFO___R1b-M269___Musselman___4
f268314___Binkele___INFO___R1b-M269___Binkley___4
f309887___zUnkName___INFO___R1b-M269___Binkley___4
f131066___Binkley___INFO___R1b-M269___Binkley___4
f86194___zUnkName___INFO___R1b-M269___Binkley___4
f132111___Binkley___INFO___R1b-M269___Binkley___4
f187794___Binkley___INFO___R1b-M269___Binkley___4
f252478___Binkley___INFO___R1b-M269___Binkley___4
f284919___Binkley___INFO___R1b-M269___Binkley___4
f71366___Binkley___INFO___R1b-M269___Binkley___4
f71417___Binggeli___INFO___R1b-M269___Binkley___4
f71739___Binkley___INFO___R1b-M269___Binkley___4
f87035___Binkley___INFO___R1b-M269___Binkley___4
fB2325___Musselman___INFO___R1b-M269___Musselman___4
f129068___Binkley___INFO___R1b-M269___Binkley___4
f109709___Binkley___INFO___R1b-M269___Binkley___4
f111574___Pinkley___INFO___R1b-M269___Binkley___4
f112154___Pinckley___INFO___R1b-M269___Binkley___4
f144904___Singer___INFO___R1b-M269___Singer___4
f157836___White___INFO___R1b-M269___White___4
f170393___zUnkName___INFO___R1b-M269___Binkley___4
f178612___Binkley___INFO___R1b-M269___Hayes___4
f179439___zUnkName___INFO___R1b-M269___Binkley___4
f212790___zUnkName___INFO___R1b-M269___Binkley___4
f221275___zUnkName___INFO___R1b-M269___Binkley___4
f226102___zUnkName___INFO___R1b-M269___Binkley___4
f227800___zUnkName___INFO___R1b-M269___Binkley___4
f284737___zUnkName___INFO___R1b-M269___Binkley___4
f284942___zUnkName___INFO___R1b-M269___Binkley___4
f285048___zUnkName___INFO___R1b-M269___Binkley___4
f294157___zUnkName___INFO___R1b-M269___Binkley___4
f6019___Cooper___INFO___R1b-M269___Cooper___4
f75662___zUnkName___INFO___R1b-M269___Mercer___4
f90724___Binkley___INFO___R1b-M269___Binkley___4
fB2910___Moseman___INFO___R1b-M269___Musselman___4

So only seven percent of the testers matching your signature have been YSNP tested - so that is a pretty small sample of testing candidates for establishing a really strong relationship (after filtering out R1a testers).

On a very positive note, there is a very strong affinity with the surname Binkley, so this YSNP could be much younger than the 3,200 year TMRCA estimate.

emmental
12-08-2017, 12:21 AM
A representative of the Binggeli/Binkley family has tested positive for six out of the six SNPs he tested for in the FGC6424 block with YSEQ. He also tested negative for eight out of the eight SNPs he tested for in the FGC6439 block. Also, Mosemann, a variation of my surname (Musselman), has tested positive for eight out of ten SNPs he tested for in the FGC6439 block through YSEQ. A Zenger/Singer has tested positive for Z36. I know about f75662 and White (I can't say any more than that), but Cooper I didn't know about. Thanks for that lead!!!

From the different testing done within this group (FTDNA, YSEQ and FGC), at this point it looks like the blocks can be broken up something like this (I still can't determine the exact placing of each SNP or the number of SNPs in each block): Z36>FGC6418>FGC6424>FGC6439>FGC6446

PS: I wish I had your computer skills to pull all that up so quickly!

RobertCasey
12-09-2017, 03:42 AM
I decided to test just how many R1b testers match 3 of 4 of your marker values. I filtered out a huge number of R1a testers and a couple of dozen testers that can not belong to your branch due to conflicting YSNP testing. I did not find any Binkley variants or Musselman variants, which means at least the two most common surnames are holding to 4 of 4 matches. So to date, your four marker signature appears to be holding up (but there is one from the Binkley project with no surname listed):

f287025___Andrews___INFO___R1b-M269___Joyce___3
fN53053___Beard___INFO___R1b-M269___Baird___3
f170237___Calkins___INFO___R1b-M269___Calkins___3
f227507___Coffman___INFO___R1b-M269___Kauffman___3
f217358___Cooley___INFO___R1b-M269___Cooley___3
f239285___Fox___INFO___R1b-M269___Fox___3
f358970___Grindle___INFO___R1b-M269___Grindle___3
fN62139___Hall___INFO___R1b-M269___Hall___3
f349809___Harris___INFO___R1b-M269___Harris-1___3
f229205___Hawley___INFO___R1b-M269___Hawley___3
f210812___Hillenbrand___INFO___R1b-M269___R-M269___3
f37292___Jackson___INFO___R1b-M269___Jackson___3
f6132___Jackson___INFO___R1b-M269___Jackson___3
f62953___Jackson___INFO___R1b-M269___Jackson___3
f198012___Jackson___INFO___R1b-M269___Jackson___3
f144045___Johnistune___INFO___R1b-M269___Johnson___3
f56060___Johnson___INFO___R1b-M269___Johnson___3
f156626___Johnson___INFO___R1b-M269___Johnson___3
f254143___Johnson___INFO___R1b-M269___Johnson___3
f66391___Johnson___INFO___R1b-M269___Johnson___3
f260314___Johnson___INFO___R1b-M269___Johnson___3
f356103___Johnson___INFO___R1b-M269___Johnson___3
f133310___Johnston___INFO___R1b-M269___Johnson___3
f260420___Johnston___INFO___R1b-M269___Johnson___3
f285034___Lytle___INFO___R1b-M269___Little___3
f268830___McCarrell___INFO___R1b-M269___Carroll___3
f267076___McCarroll___INFO___R1b-M269___McCarroll___3
f277922___McCarroll___INFO___R1b-M269___Carroll___3
f521910___McDonald___INFO___R1b-M269___Cumberland___3
f124947___McDonald___INFO___R1b-M269___Clan Colla___3
f112197___McDonald___INFO___R1b-M269___McCain___3
f133546___McDonald___INFO___R1b-M269___Clan Colla___3
f196960___McDonald___INFO___R1b-M269___Clan Colla___3
f38885___McDonald___INFO___R1b-M269___Clan_Colla___3
f367328___McDonald___INFO___R1b-M269___R-DF21___3
f187597___McWood___INFO___R1b-M269___Clan Fraser___3
f173704___Oliver___INFO___R1b-M269___Oliver___3
fB98433___Poirier___INFO___R1b-M269___Perry___3
f261146___Sinclair___INFO___R1b-M269___Sinclair___3
f405798___Webb___INFO___R1b-M269___Webb___3
f319143___Wells___INFO___R1b-M269___Wells___3
f343595___Wells___INFO___R1b-M269___Wells___3
f373445___Wells___INFO___R1b-M269___Wells___3
f148880___zUnkName___INFO___R1b-M269___Stiles___3
f250571___zUnkName___INFO___R1b-M269___Leigh___3
f93674___zUnkName___INFO___R1b-M269___Stiles___3
f205151___zUnkName___INFO___R1b-M269___Hess___3
f364744___zUnkName___INFO___R1b-M269___Binkley___3
f267222___zUnkName___INFO___R1b-M269___Stiles___3
f289783___zUnkName___INFO___R1b-M269___Cortez___3
f294097___zUnkName___INFO___R1b-M269___Owen___3
f66696___zUnkName___INFO___R1b-M269___[blank]___3

emmental
12-09-2017, 01:54 PM
Thank you for that table. Many of those names look familiar. I've probably looked them over in YSearch.

The Binkley Project is quite organized. There are over 40 separate family members who have done Y-STR testing. The name (Binggeli) is Swiss Bernese just as my name (Mosimann). Other confirmed members of the FGC6418 group are from Germany, Alsace, Austria, and a 1KG participant from Utah. There are a few other Swiss families who match the STR combination; Zenger (Z36), Nigg, Muheim, Uetltschi (Z36) and Trosch (Z36) who as of yet have not tested for FGC6418.

Three members of the Binkley group, who didn't test out to 67, (so you probably didn't pick up) do not fit the FGC6418 modal. One is DYS19=16, one is DYS19=14 and one is DYS448=19. All twenty-eight members who did test to 67 are 4 for 4.

No one has yet responded that they have found a relatively accurate combination of STRs in their haplogroup so I'm guessing this is rather rare. I agree, like you stated earlier, "using rare marker values which adds another dimension to prediction and would improve the ability as would adding very slow mutating markers that are different as well."

RobertCasey
12-09-2017, 07:32 PM
I tried to determine if U152 has a dependable modal and found 1,278 testers (first column). However, there are seven markers where the modal is somewhat problematic, so it looks like you need to use P312 without L21 as your modal instead. If memory serves me correctly, P312 is the same as L21 except for 449, which is problematic for both. The list breaks up the multi-copy markers and adjusts 389 into a delta format.

U152 (first column)

272, 333, 478, 510, 405, 320, 217

Total: 1278

1144, 1009, 1046, 809, 1060, 852, 1257, 1236, 755, 1049, 1149, 975, 635

1207, 985, 1230, 1221, 924, 1051, 1081, 584, 1022, 929, 686

968, 902, 940, 1200, 1126, 606, 969, 465, 628

941, 1134, 1217, 1224, 1202, 1228, 1262, 1194, 1210, 1277, 1095, 1071, 1252, 983, 1125, 864, 1236, 1263, 1256, 678, 1246, 1004, 877, 1097, 816, 1173, 1208, 1174, 1079, 1215, 1116, 1137

Modal values

13, 24, 14, 11, 11, 14, 12, 12, 12, 13, 13, 16, 17, 9, 10, 11, 11, 25, 15, 19, 29, 15, 15, 17, 17, 11, 11, 19, 23, 16, 15, 18, 17, 36, 38, 12, 12, 11, 9, 15, 16, 8, 10, 10, 8, 10, 10, 12, 23, 23, 16, 10, 12, 12, 15, 8, 12, 22, 20, 13, 12, 11, 13, 11, 11, 12, 12

P312 w/o L21

901, 1069, 1371, 1632, 1034, 878, 1014, 983, 541

Total: 3878

3433, 2837, 3308, 2626, 3236, 2641, 3820, 3771, 2248, 2961, 3552, 3009, 1857, 3666, 3007, 3779, 3783, 2691, 3034, 3021, 1761, 3065, 3034, 2089, 2827, 2750, 2715, 3657, 3408, 1711, 2988, 1523, 1986, 1316, 1228, 2952, 3582, 3663, 3781, 3674, 3756, 3796, 3471, 3730, 3870, 3326, 3225, 3780, 3028, 3501, 2801, 3748, 3835, 3675, 1905, 3813, 2846, 2556, 3220, 2615, 3609, 3721, 3519, 3372, 3739, 3671, 3527

So the only difference between the P312 (w/o L21) modal and the L21 modal is 449, which is pretty marginal for both haplogroups. 449 = 30 for L21 and 449 = 29 for P312. These have been the well established modals for L21 which have worked very well for YSNP prediction. Also, many modals could be wrong but for L21, these modals have worked well across 40 or more haplogroups.
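For anyone wanting to build a modal like this from their own project data, here is a minimal sketch. It takes the most common value at each marker and reports how many testers carry it, which is one plausible way to produce counts like those listed above (an assumption; the post does not say exactly how they were generated). The function name and the three sample testers are hypothetical.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return {marker: (modal_value, number_of_testers_with_that_value)}."""
    markers = set().union(*(h.keys() for h in haplotypes))
    modal = {}
    for marker in sorted(markers):
        # Only testers who actually tested this marker are counted.
        counts = Counter(h[marker] for h in haplotypes if marker in h)
        value, n = counts.most_common(1)[0]
        modal[marker] = (value, n)
    return modal


# Hypothetical testers, for illustration only.
testers = [
    {"DYS19": 14, "DYS449": 29},
    {"DYS19": 14, "DYS449": 30},
    {"DYS19": 15, "DYS449": 29},
]

for marker, (value, n) in modal_haplotype(testers).items():
    print(marker, value, f"{n} of {len(testers)}")
```

A marker where the modal value is carried by only a little over half of the testers would be one of the "problematic" ones.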

The problem with very small signatures is that they will miss the most critical testers at Signature match = 3, which will define the most critical mutations in your haplotree. I would start with analyzing the well tested Binkley testers first with all 67 markers. Also, you are on the right track with charting branches below FGC6418, as these branches really help in charting of signatures via YSTRs within and between these YSNP branches. Your YDNA sampling under FGC6418 is very biased towards Binkley testers - but you can take a free ride with their testing. Here is a chart that shows what can be done with 63 branches under one haplogroup:

http://www.rcasey.net/DNA/R_L226/Haplotrees/L226_Home.pdf

You just need to concentrate on YSNP testing, and in particular on more NGS testing, as there are just not enough NGS tests (you really need 25 to 50 NGS tests to get 20 or 30 well defined branches). You could reduce the NGS testing by 50 % if you extensively test with YSEQ.