SNP testing for P312 and the use of STR signatures



TigerMW
07-03-2013, 01:42 PM
I am initiating this thread to facilitate discussion specifically for the R1b-P312 project and to try to determine guidance for SNP test planning at the individual level. This could help us set subgrouping criteria in the project and/or find some other way to communicate SNP (Single Nucleotide Polymorphism) testing guidance.

The first thought I'd like to express is the premise for SNP testing: Y SNP testing can help us discover the human family tree of paternal lineages. There is nothing else in DNA testing that has Y DNA's combination of high resolution and phylogenetic properties.

Y STR (Short Tandem Repeat) mutations are very useful and act as rough clocks, measuring time through diversity. However, STRs mutate both up and down and can jump two or more increments in one step (event). Their relatively high per-generation mutation rate also makes multiple (parallel) occurrences of the same mutation within a lineage a real possibility.

Y SNPs, however, mutate so slowly that we can consider them to mutate hardly at all, and if one does, it is probably just once in a lineage. SNPs chosen to define subclades in formal Y DNA phylogenetic trees have been evaluated for their suitability for that purpose, to ensure they are stable and appear in a fixed position relative to other SNPs. ISOGG (International Society of Genetic Genealogy) maintains a very current Y DNA tree. For example, here is the tree for all of haplogroup R: http://isogg.org/tree/ISOGG_HapgrpR.html

Y STRs can be used to indicate potential relationships, but Y SNPs can be used to confirm them. In effect, Y STRs are indicators but Y SNPs are truth tellers. In the past, Y SNPs were limited in resolution, or level of detail: there weren't enough of them to define the human family tree of paternal lineages. That is changing, and the potential is great. Even though a particular Y location may mutate only once every 100 million or so father/son transmissions, there are so many locations on the Y chromosome with SNP potential that a new Y SNP occurs on average once every one and a half generations!
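As a rough back-of-the-envelope check, the two round figures in the paragraph above can be combined. If each position mutates about once per 100 million transmissions and a new SNP shows up somewhere on the Y about once every 1.5 generations, the implied number of positions with SNP potential is just a division. This is a minimal sketch; the constants are the post's round numbers (not measured values) and the function names are hypothetical.

```python
# Round figures quoted in the post; illustrative, not measured constants.
PER_SITE_RATE = 1 / 100_000_000   # ~1 mutation per position per 100 million father/son transmissions
GENERATIONS_PER_NEW_SNP = 1.5     # a new Y SNP "on average once every one and a half generations"


def implied_mutable_sites(per_site_rate: float, generations_per_snp: float) -> float:
    """How many positions are needed so that, summed over all of them,
    one new SNP is expected every `generations_per_snp` generations."""
    new_snps_per_generation = 1 / generations_per_snp
    return new_snps_per_generation / per_site_rate


def generations_between_snps(per_site_rate: float, mutable_sites: float) -> float:
    """Expected generations between new SNPs along one line of descent."""
    return 1 / (per_site_rate * mutable_sites)


sites = implied_mutable_sites(PER_SITE_RATE, GENERATIONS_PER_NEW_SNP)
print(f"Implied positions with SNP potential: about {sites / 1e6:.0f} million")
print(f"Check: {generations_between_snps(PER_SITE_RATE, sites):.2f} generations per new SNP")
```

The result, roughly 67 million positions, is only what the post's two numbers imply together; it is a consistency check on the arithmetic, not an independent estimate.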

Since the SNPs used in formal Y phylogenetic trees (like ISOGG's) are stable and fixed in position on the tree, they are reliable and can help show us who we might be more closely related to as well as who we are not. I recommend that everyone (or at least one member of each true genealogical family) test to their terminal (youngest) SNP on the formal Y trees. The question is which SNPs to test. There are a couple of directions to go: the more expensive Walk the Y (WTY) and full genome testing, a la carte testing one SNP at a time, or something in the middle of the road, like National Geographic's Geno 2.0 offering.

Y STRs can help guide us. R1b people are quite numerous and of recent relationship, albeit prehistoric. That means we need to get to at least 67 Y STR markers in most cases. 111 Y STRs is fast becoming the new gold standard for R1b.

TigerMW
07-03-2013, 02:13 PM
There is a technique of subgrouping by STR pattern called clustering. A common way to do this is to evaluate both the Genetic Distance (GD) and the shared off-modal STR values within groups of people. When we find matches and form a high-potential (strongly suspected) subgroup, it is typically called a cluster. Clusters usually imply a genealogical timeframe. However, in some cases we are interested in deeper or older ancestral relationships, such as when we are trying to break haplogroups up into smaller and smaller branches of the Y phylogenetic tree of paternal lineages.
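For readers new to GD, here is a minimal sketch of a genetic distance count between two haplotypes under a simple stepwise model, where each marker contributes the absolute difference in repeat counts. Testing companies may score some multi-copy or volatile markers differently, which this sketch ignores. The DYS labels are used only as keys; the repeat values are invented for illustration.

```python
from typing import Dict

Haplotype = Dict[str, int]  # marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Stepwise GD: sum of |a - b| over the markers both haplotypes share."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical haplotypes (values invented for illustration).
kit_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11, "DYS385a": 11}
kit_b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 10, "DYS385a": 11}
print(genetic_distance(kit_a, kit_b))  # 2 (one step at DYS390, one at DYS391)
```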

Ken Nordtvedt, a former NASA scientist and member of the US National Science Board, has used the term "variety" to represent subgroupings within haplogroups that go back further in time than the genealogical timeframe. I extend that term to what I call "speculative deep ancestral varieties". These are just subgroupings intended to help aggregate people logically so they can more efficiently do SNP testing. I will use the term "variety" for short but when I use that term I'm really intending to describe a speculative potential ancient relationship. It is speculative. I'm not trying to predict haplogroups. SNP testing is required to validate or disprove the true relationships.

Y STRs are very useful in evaluating potential varieties. However, any one STR in any one or more individuals could be an anomaly. Remember, STRs mutate relatively rapidly and sometimes erratically. To overcome the vagaries of Y STRs, it's important to look at STR patterns, not just individual STRs. The more STRs in a pattern, the better the odds of finding a true relationship. In terms of data, the more the merrier. Put another way, statistical analysis relies on the law of large numbers to gain precision.
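One simple way to see why longer patterns help: if we assume, as a simplification, that STR markers vary independently, the chance that an unrelated person happens to share every value in a pattern is the product of the per-marker frequencies, so it drops quickly as markers are added. The 30% frequency below is a hypothetical placeholder, not population data.

```python
from math import prod
from typing import Sequence


def coincidental_match_probability(value_frequencies: Sequence[float]) -> float:
    """Chance an unrelated person shares every listed STR value by coincidence,
    assuming markers are independent (real markers are not fully independent)."""
    return prod(value_frequencies)


# Hypothetical: each value in the pattern is carried by 30% of unrelated people.
for k in (1, 3, 6):
    p = coincidental_match_probability([0.3] * k)
    print(f"{k} markers: {p:.4f}")
```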

There are different ways to look for STR patterns. One type of analysis is called the STR Signature Method. In this method we set an expected (best guess) ancestral set of STR values (the ancestral haplotype) and then look for differences from it in suspected subgroupings. Oftentimes we use the modal (most common) values as the best guess for an ancestral haplotype. The differences are called off-modals. An STR signature is just a collection of off-modal STR values found across some subgroup of suspected related people. As always with data, the more the merrier: the more STRs in the signature, the firmer it is likely to be. Slower-moving STRs are typically better for STR signatures, but they can be red herrings. Even the slowest STR could have mutated in just the last generation or "back" mutated in the last generation. Unusual or rare patterns are what we typically look for. Sometimes a very high or very low value, or a set of off-modal values for a fast STR or two, can be helpful, but rarity is a good thing because it limits coincidental convergence among unrelated people.
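The signature method described above can be sketched in a few lines: take the most common value at each marker across a reference group as the best-guess ancestral (modal) haplotype, list each person's off-modal values, and keep the off-modal values shared by every member of a suspected subgroup. This is a minimal illustration, not the spreadsheet workflow used in this thread; it assumes every haplotype has the same markers, breaks ties arbitrarily, and uses invented values.

```python
from collections import Counter
from typing import Dict, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def modal_haplotype(group: List[Haplotype]) -> Haplotype:
    """Most common value at each marker (ties broken arbitrarily)."""
    markers = group[0].keys()
    return {m: Counter(h[m] for h in group).most_common(1)[0][0] for m in markers}


def off_modals(h: Haplotype, modal: Haplotype) -> Dict[str, int]:
    """Markers where this haplotype differs from the modal, with its value."""
    return {m: v for m, v in h.items() if modal.get(m) != v}


def shared_signature(subgroup: List[Haplotype], modal: Haplotype) -> Dict[str, int]:
    """Off-modal marker/value pairs carried by every member of the subgroup."""
    common = set(off_modals(subgroup[0], modal).items())
    for h in subgroup[1:]:
        common &= set(off_modals(h, modal).items())
    return dict(sorted(common))


# Hypothetical reference group; the last two people are the suspected subgroup.
reference_group = [
    {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    {"DYS393": 13, "DYS390": 23, "DYS19": 15, "DYS391": 10},
    {"DYS393": 13, "DYS390": 23, "DYS19": 15, "DYS391": 11},
]
modal = modal_haplotype(reference_group)
print(shared_signature(reference_group[3:], modal))  # {'DYS19': 15, 'DYS390': 23}
```

Note that DYS391=10 is off-modal for only one subgroup member, so it drops out of the shared signature.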

There is no golden method or set of rules for STR signatures because of the vagaries of Y STRs. I try to use the STR signature method but I also cross-check with Genetic Distances (GDs) between members of a suspected subgroup. If someone's GD to the rest of the group is way out of line, I'll typically throw them out but really, only SNP testing can validate or disprove the relationships.
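The GD cross-check can be sketched as computing each member's average stepwise GD to the other members and flagging anyone well above the group's typical value. The rule here (more than `margin` steps above the median of those averages) is an arbitrary assumption for illustration, since there is no golden rule; a flagged kit is only a candidate for review, and SNP testing still decides. Kit names and values are hypothetical.

```python
from statistics import median
from typing import Dict, List

Haplotype = Dict[str, int]  # marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Stepwise GD over the markers both haplotypes share."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def flag_outliers(members: Dict[str, Haplotype], margin: float = 2.0) -> List[str]:
    """Kits whose mean GD to the rest of the group exceeds the median mean GD by `margin`."""
    mean_gd = {}
    for kit, hap in members.items():
        others = [genetic_distance(hap, h) for k, h in members.items() if k != kit]
        mean_gd[kit] = sum(others) / len(others)
    typical = median(mean_gd.values())
    return [kit for kit, gd in mean_gd.items() if gd > typical + margin]


# Hypothetical suspected subgroup; kitD is deliberately far from the others.
group = {
    "kitA": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11},
    "kitB": {"DYS393": 13, "DYS390": 24, "DYS19": 15, "DYS391": 11},
    "kitC": {"DYS393": 13, "DYS390": 25, "DYS19": 14, "DYS391": 11},
    "kitD": {"DYS393": 15, "DYS390": 21, "DYS19": 14, "DYS391": 11},
}
print(flag_outliers(group))  # ['kitD']
```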

Given the desire to look for multiple STRs in a signature, you can see why 67 STRs are needed. In some cases 111 are needed. In any case, 111 STRs help provide better TMRCA (time to most recent common ancestor) estimates and better matches, so I encourage 111 STRs for R1b people. Either 67 or 111 STRs is a great starting point. Then you can look around at what other people like you are seeing in their SNP test results.

TigerMW
07-04-2013, 02:17 PM
I've made STR signature-based variety assignments for every P312xL21 person I have found in the P312, U152, SRY2627 and R1b projects who is at least confirmed P312+ and has 67 STRs. I maintain them in a spreadsheet, posted as described in the following thread:
http://www.anthrogenica.com/showthread.php?917-R1b-P312xL21-Haplotypes-Spreadsheet-amp-SNP-Tree

If you are P312+ with 67 STRs and in the P312 project, I can easily show you who your closest GDs are, and you can also look at STR signature patterns through this spreadsheet, so don't hesitate to post your request (and include your kit #).

I request that advocates for the various sub-components of P312 (e.g. L238 and DF19) review the STR signatures and tell me how I should modify them, or whether there are conflicts between SNP results and the subgroupings. One person has already alerted me that a subgroup needed to be broken up because one member was Z225+ in the FTDNA system while another was S225- (at another testing company), and Z225 and S225 are the same SNP.
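The Z225/S225 case is partly a naming problem: the same SNP can carry different names at different labs, so results need to be normalized to one name before checking a subgroup for contradictions. A minimal sketch, assuming a hand-maintained alias table (only the Z225 = S225 equivalence comes from this thread) and results stored as SNP name to "+" or "-". The kit IDs and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Alternate name -> canonical name. Only Z225 = S225 is from the thread;
# a real table would have to be maintained against a tree such as ISOGG's.
ALIASES = {"S225": "Z225"}


def canonical(snp: str) -> str:
    return ALIASES.get(snp, snp)


def subgroup_conflicts(results: Dict[str, Dict[str, str]]) -> List[Tuple[str, List[str], List[str]]]:
    """(snp, positive_kits, negative_kits) for every SNP with both + and - calls in the subgroup."""
    calls = defaultdict(lambda: {"+": [], "-": []})
    for kit, snps in results.items():
        for snp, call in snps.items():  # call must be "+" or "-"
            calls[canonical(snp)][call].append(kit)
    return [(snp, c["+"], c["-"]) for snp, c in sorted(calls.items()) if c["+"] and c["-"]]


subgroup = {
    "kit1": {"P312": "+", "Z225": "+"},  # reported under the FTDNA name
    "kit2": {"P312": "+", "S225": "-"},  # same SNP under another lab's name
}
print(subgroup_conflicts(subgroup))  # [('Z225', ['kit1'], ['kit2'])]
```

Without the alias step the two results would sit under different names and the conflict would go unnoticed.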

Of course, my opinion is that everyone in the suspected new Z225 subdivision should test for Z225, to make sure the subdivision is legitimate and to find their terminal SNP.

Wing Genealogist
07-04-2013, 03:10 PM
The R1b-U106 Project uses the same type of STR clustering. In the past, such clustering has fairly reliably predicted new SNPs (from Ken Nordtvedt's Frisian variety = Z8 [discovered in the 1000 Genomes Project] to new SNPs from the Geno 2.0 Project). As Mike has made clear, this method should never replace SNP testing, but should be used as a tool to predict which of the large number of available SNPs should be tested first.

In many ways, the Geno 2.0 test is a real game-changer for SNP testing. While the GenoChip is missing some very important markers (such as P312 and L48), it is a one-stop shop for the vast majority of currently known SNPs, and the cost is roughly equivalent to 5 a la carte SNP tests at FTDNA. Some clades have very distinctive STR signatures and would be more economical to test a la carte, but most people run a very real risk of spending a lot more money (and time) buying a string of SNP tests when they could achieve basically the same result via Geno 2.0 (with the small possibility of ordering a single SNP test later).

brianlm1
07-04-2013, 09:53 PM
As per your suggestion my kit number is 51865.
Many thanks for your help.
Brian McFarlane

jeanL
07-08-2013, 10:45 PM
Hey Mike, I'm thinking about testing with FTDNA, the simple Y-12, which is like 54 bucks with shipping. 23andMe has my dad as R1b-L11(xU106); he tests negative for U152, L21, M65, M153 and M167. So what can I expect in terms of haplogroup if I test with FTDNA? Will they test for DF27 and Z196, or how does it work?

Wing Genealogist
07-08-2013, 11:13 PM
Jean, none of the STR testing will give you any SNP results. It's sort of like the old toy commercials: "Batteries not included" (you have to purchase the SNP tests separately). However, FTDNA does not allow anyone to purchase SNP testing until they have done at least some STR testing (even if it is just the bare-bones 12-marker test). Given the large number of SNPs below L11 (even after eliminating U106 and its subclades), you may be better off saving your money for the Nat. Geo. Geno 2.0 test. While it is missing some very important SNP markers, by and large it does a good job of testing the vast majority of currently known Y SNPs.

TigerMW
07-17-2013, 03:23 PM
JeanL, please join in and get into one of the projects.

As Wing noted, 12 STRs are not going to tell us much for a haplogroup as crowded as R1b (the bushy tree), and no SNP testing comes with the test, just a very generic prediction like "R1b1a2", which probably means M269+. However, the 12-STR test is really a "ticket" into the FTDNA project system, where you can start sharing and researching information. It is important to get started with Y STRs, even at a low level.

For someone in your situation, I would recommend National Geographic's Geno 2.0. It has good coverage within R1b, and there is a chance you might identify a new SNP/subclade for your group. It also includes mtDNA and autosomal DNA, which I consider a bonus. The primary alternative to Geno 2.0 is a la carte SNP testing, at $39 apiece. That's inexpensive if you hit your correct terminal SNP with your first rifle shot, but I'd consider that lucky unless you have some inside genealogical information on what to test for. The Geno 2.0 approach is more of a shotgun or hand grenade approach.
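The rifle-versus-shotgun trade-off is simple arithmetic. The sketch below uses the figures quoted in this thread ($39 per a la carte SNP, and a Geno 2.0 price of roughly five a la carte tests, i.e. about $195); both are assumptions tied to the time of these posts, not current prices. The real unknown is how many single-SNP tests you would need, which depends on how well STR signatures and your matches narrow the candidates.

```python
A_LA_CARTE_PRICE = 39.0             # USD per single SNP, as quoted in this thread
GENO2_PRICE = 5 * A_LA_CARTE_PRICE  # "roughly equivalent to 5 a la carte SNP tests" (approximation)


def a_la_carte_cost(num_tests: int, price: float = A_LA_CARTE_PRICE) -> float:
    """Total cost of ordering `num_tests` single SNPs."""
    return num_tests * price


def cheaper_option(num_tests: int) -> str:
    """Which route costs less if you would end up needing `num_tests` single-SNP tests."""
    return "a la carte" if a_la_carte_cost(num_tests) < GENO2_PRICE else "Geno 2.0"


for n in (1, 3, 5, 8):
    print(n, a_la_carte_cost(n), cheaper_option(n))
```

A tie goes to Geno 2.0 here, on the reasoning from the post that its mtDNA and autosomal results come as a bonus.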

I guess I'm telling you to buy both, the 12 STRs and Geno 2.0. I'm not really thinking about your pocketbook, though; those decisions are up to you. In reality, you are better off going all the way to 67 Y STRs, but you can always upgrade later.

One approach is to start out with 67 Y STRs and then use them to compare with other people. We even have a tool (a big spreadsheet) to help you see the SNP testing of the people closest to you at 67 STRs. If you are lucky, you'll see a clear pattern with some GD corroboration. At that point, the rifle-shot a la carte approach might work well.
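The "look at the people closest to you" step can be sketched as ranking other kits by stepwise GD to yours, keeping those within a cutoff, and tallying the terminal SNPs they have reported. If most close matches share one terminal SNP, that SNP is a reasonable rifle-shot candidate. This is not the actual spreadsheet tool mentioned above; the cutoff, kit IDs, haplotypes and SNP assignments are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Haplotype = Dict[str, int]  # marker name -> repeat count


def genetic_distance(a: Haplotype, b: Haplotype) -> int:
    """Stepwise GD over the markers both haplotypes share."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def closest_matches(mine: Haplotype, others: Dict[str, Haplotype], max_gd: int) -> List[Tuple[str, int]]:
    """(kit, GD) pairs within max_gd, closest first."""
    scored = [(kit, genetic_distance(mine, hap)) for kit, hap in others.items()]
    return sorted((pair for pair in scored if pair[1] <= max_gd), key=lambda pair: pair[1])


def suggest_snp(matches: List[Tuple[str, int]], terminal_snps: Dict[str, Optional[str]]) -> Counter:
    """Count terminal SNPs reported by close matches; kits without SNP results are skipped."""
    return Counter(terminal_snps[kit] for kit, _ in matches if terminal_snps.get(kit))


mine = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
others = {
    "k1": {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 10},
    "k2": {"DYS393": 13, "DYS390": 23, "DYS19": 15, "DYS391": 11},
    "k3": {"DYS393": 14, "DYS390": 22, "DYS19": 16, "DYS391": 12},
    "k4": {"DYS393": 13, "DYS390": 24, "DYS19": 15, "DYS391": 11},
}
terminal = {"k1": "Z225", "k2": "Z225", "k3": "L238", "k4": None}

matches = closest_matches(mine, others, max_gd=2)
print(matches)                                      # [('k1', 1), ('k4', 1), ('k2', 2)]
print(suggest_snp(matches, terminal).most_common(1))  # [('Z225', 2)]
```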

GoldenHind
07-22-2013, 11:12 PM
Perhaps I should mention here that I posted some news about the STR signature for L238 and a recently discovered STR signature within P312** on the "L238 and P312** News" thread.