Need deeper (not basic) understanding of SNPs

de Burgh
07-22-2019, 03:53 PM
I have learned all the basics of yDNA, STRs and SNPs. I am working with a more experienced person on constructing a clan haplotree which begins at about 1160 using both STRs and SNPs. I always find myself at a disadvantage and want to come up to speed.

I am looking for sources (including technical papers, books, webpages) for understanding how SNPs/haplogroups are placed in order, how to understand the relative age of equivalent SNPs, the variations in the rate of SNP formation/mutation (causes/range), determining the significance of private SNPs, etc.

I don't need the shallows or kiddy pool. I have 2 Masters and read Supreme Court decisions for fun. On the other hand, I am not technically informed on the Next Gen processes and am more interested in the post-processing analysis. If I need to learn some of the processing steps to comprehend the answers to my questions, I'm not intimidated. I just don't want to waste my time on things I don't have to know.

07-22-2019, 06:42 PM
It's hard to suggest exactly what you're looking for, for two reasons - first because resources that bring people up to speed tend to be relatively basic, and second because you're covering a lot of ground - causes of mutations as well as how haplotrees are created, creating what Maurice Gleeson calls "mutation history trees", how private SNPs are used, etc.

I'll throw out a range of resources and hopefully others can add more.

This is probably too simplistic for you, but if you need more explanation of the types of SNPs (recurrent, synonyms, equivalents) and why some SNPs are not used on haplotrees while others are, here is a very basic intro to how commercial haplotrees (FTDNA, YFull, etc.) are formed (https://drive.google.com/open?id=1m0kWFlkpr9HRJe9qN8-DbN9Xd0XN1GRI). I can guess your reaction will be "too basic" but I was starting at the low end.

Again on the more basic side, but some approaches to post-processing analysis combining SNPs and STRs can be found in Maurice Gleeson and John Cleary's videos, for example here (https://www.youtube.com/watch?v=rvyHY4R6DwE), here (https://www.youtube.com/watch?v=rFfB-Y3XfCg), and here (https://www.youtube.com/watch?v=pxexkvfus6w). Those videos are still very much on-point even though we have more SNPs to work with than when they were produced.

As to variations in SNP mutation rates and other such topics, I'm not aware of anything formal. There have been lots of conversations about it here on Anthrogenica; some of my own comments are here (https://anthrogenica.com/showthread.php?10785-What-mutation-rates-should-we-consider-for-Y-SNPs&p=244811#post244811) and here (https://anthrogenica.com/showthread.php?10785-What-mutation-rates-should-we-consider-for-Y-SNPs&p=244915#post244915).

On the more technical end but still focused on post-testing analysis:

Age estimation using SNPs is covered in two papers - one by Adamov et al describing the YFull approach (https://www.academia.edu/11554977/Defining_a_New_Rate_Constant_for_Y-Chromosome_SNPs_based_on_Full_Sequencing_Data), and a variation by Dr. Iain McDonald (http://www.jb.man.ac.uk/~mcdonald/genetics/pipeline-summary.pdf?fbclid=IwAR2ak5ufOLWSe5pbcwWFOYejqfqo1RzWbU3Ik5DHocI0ZChSoMartuXJo5Y). The Adamov paper in particular does a good job of explaining why the years-per-SNP mutation rate is tied to the coverage area of the Y chromosome across which the SNPs are counted.
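To make the coverage point concrete, here is a minimal sketch of the arithmetic. It assumes a single constant per-base-pair mutation rate; the function name, the rate, and both coverage figures are illustrative assumptions of mine, not values taken from either paper or from this thread.

```python
def years_per_snp(rate_per_bp_per_year: float, coverage_bp: float) -> float:
    """Average number of years between SNPs inside a region of coverage_bp base pairs.

    Expected SNPs per year in the region = rate * coverage, so the
    years-per-SNP figure is simply the reciprocal of that product.
    """
    if rate_per_bp_per_year <= 0 or coverage_bp <= 0:
        raise ValueError("rate and coverage must be positive")
    return 1.0 / (rate_per_bp_per_year * coverage_bp)


# Illustrative inputs only (assumptions, not values quoted in this thread).
ASSUMED_RATE = 8.2e-10  # mutations per base pair per year
ASSUMED_COVERAGES_BP = {"narrower test": 8.5e6, "wider test": 15.0e6}

for label, coverage in ASSUMED_COVERAGES_BP.items():
    print(f"{label}: {years_per_snp(ASSUMED_RATE, coverage):.0f} years per SNP")
```

The same per-base-pair rate gives fewer years per SNP on the wider test, simply because more of the chromosome is being read. That is why a years-per-SNP figure should only be compared between tests with similar coverage.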

The automated approach used by the SAPP tool to combine SNPs, STRs, and traditional genealogy information into a phylogenetic tree is described in this paper (https://www.academia.edu/38515225/Y-DNA_Phylogeny_Reconstruction_using_likelihood-weighted_phenetic_and_cladistic_data_-_the_SAPP_Program). The approach assumes you already have collected meaningful SNP, STR, and traditional genealogy information, but it does explain in detail what information from each source is important to building the phylogenetic tree, and an approach to analyzing each source of information that can easily be performed manually.

If these missed the mark entirely, then perhaps if you can describe the questions you're looking to answer in more detail someone can offer some other resources.

de Burgh
07-30-2019, 10:53 PM
Thanks, Davd-V. This adds to my knowledge-base with very little overlap with what I've seen. I very much appreciate it.

08-01-2019, 05:56 AM
Search YouTube for "Genetic Genealogy Ireland" for many great YDNA videos. Presenters include Maurice Gleeson, John Cleary, Dave Vance, James Mallory, Dennis O'Brien, Dennis Wright, James Irvine, Brad Larkin, James Brazil, Paddy Waldron and myself. I also have around ten videos on YouTube - search "Genetic Genealogy Robert Casey". Maurice Gleeson has many excellent YDNA videos as well - search "Genetic Genealogy Maurice Gleeson".

de Burgh
08-07-2019, 02:11 AM
Thank you, Robert Casey. It's nice to have an opinion from someone presenting.

I'm especially interested at the moment in SNP mutation rates and how they vary across the haplotree and various haplogroups. Being statistically inclined, I am looking not just for the averages, but also for the error bars (variation) on that rate.

I see you are in Texas. I am, too. I suppose it would be too great a coincidence that you might be near DFW.

08-07-2019, 06:24 AM
For the 24 surname clusters under L226, the YSNP mutation rate ranges from 250 years per YSNP to 56 years per YSNP, but it averages 84 years per YSNP.

Using the L226 average of 84 years per YSNP, the surname cluster under DC69, which has two YSNPs, dates to 668 AD. For DC191 and DC877 (nine YSNP branches), the TMRCA estimate would be 1256 AD. Since we know that Irish surnames were created around 1000 AD, this represents a pretty significant statistical variation. Using the worst-case YSNP mutation rate (56 years per YSNP), DC69 becomes 612 AD and DC191 and DC877 become 1004 AD (right on target).
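The dates above can be reproduced with a short sketch. One assumption needs flagging: the post never states the date used for L226 itself. A start of 500 AD is inferred here because it is the value that reproduces all four quoted dates (668, 1256, 612 and 1004 AD); the constant and function names are hypothetical.

```python
# Inferred start date for L226 (an assumption, not stated in the post).
L226_START_AD = 500

AVERAGE_YEARS_PER_YSNP = 84  # L226 average quoted in the post
FASTEST_YEARS_PER_YSNP = 56  # low end of the quoted 250-to-56 range


def tmrca_ad(ysnp_count: int, years_per_ysnp: int, start_ad: int = L226_START_AD) -> int:
    """Estimated TMRCA (year AD) for a cluster ysnp_count YSNPs below L226."""
    return start_ad + ysnp_count * years_per_ysnp


print("YSNPs  at 84 yrs/YSNP  at 56 yrs/YSNP")
for count in range(2, 10):
    average_date = tmrca_ad(count, AVERAGE_YEARS_PER_YSNP)
    fastest_date = tmrca_ad(count, FASTEST_YEARS_PER_YSNP)
    print(f"{count:>5}  {average_date:>14}  {fastest_date:>14}")
```

Each extra YSNP in the path moves the estimate 84 years later at the average rate, so the seven-YSNP difference between DC69 and the nine-YSNP clusters alone accounts for 588 years.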

Here is a summary of using the average of 84 years for L226 surname clusters:

YSNPs___Qty___TMRCA (AD)

I doubt that any surname cluster will ever have only one YSNP, since FGC5660 represents 99 % of L226 and its brother, DC70, is only 1 % of L226 (over 800 Y67 or greater testers). But there are three surname clusters already with nine mutations, so someday there will be surname clusters with 10 and probably even 11 YSNP mutations.

So if you are among the lucky 42 % of L226 that have six YSNPs in your path - things work out for you at 1004 AD. But three existing surname clusters would be dated at 1256 AD (which does not sit well with those that obviously belong to solid surname clusters). On the other end, one surname cluster would be dated 668 AD.

The L226 project is very aggressive in using mutations located in complex areas. FTDNA adds these for us but rarely adds such branches via automated YSNP calling (with Big Y700, they are now getting more aggressive in calls to warrant the upgrade costs).
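For a rough sense of how much spread pure chance would produce, here is a sketch that treats the number of YSNPs accumulated along one path as a Poisson count. That model is my assumption, not something established in this thread. The expected count of six comes from 504 years (an inferred L226 start of 500 AD to surnames at 1004 AD) at 84 years per YSNP.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected count is mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def poisson_quantile(mean: float, p: float) -> int:
    """Smallest k whose cumulative probability is at least p."""
    k = 0
    cumulative = poisson_pmf(0, mean)
    while cumulative < p:
        k += 1
        cumulative += poisson_pmf(k, mean)
    return k


# Assumption: 504 years at 84 years per YSNP gives six expected YSNPs.
EXPECTED_YSNPS = 504 / 84

low = poisson_quantile(EXPECTED_YSNPS, 0.025)
high = poisson_quantile(EXPECTED_YSNPS, 0.975)
print(f"Central 95% range: {low} to {high} YSNPs")

for count in range(0, 13):
    print(f"{count:>2} YSNPs: {poisson_pmf(count, EXPECTED_YSNPS):.3f}")
```

Under this model a single path's count says little about its age on its own, which is one reason to combine YSNP counts with STRs and documented genealogy rather than dating a cluster from the count alone.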