Is there an "easy" method to look at Ancient Y-DNA results



Wing Genealogist
09-23-2020, 06:57 PM
I have discovered a rather complicated way to examine the raw DNA data for ancient DNA remains (without downloading them to my computer and obtaining more software).

The raw data from many Ancient/Medieval DNA studies are stored on the European Nucleotide Archive (ENA) at: https://www.ebi.ac.uk/ena/browser/view/SAMEA######### (not a valid link as written; replace the # characters with the sample number). This data can be viewed on the National Center for Biotechnology Information (NCBI) website https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=CM000686.1&srz=ERR######## (again not a valid link as written; replace the # characters with the run ERR number).
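For anyone who would rather not edit the links by hand, here is a minimal Python sketch that fills in the two link templates above. The function names are my own, and the accession check (the SAMEA or ERR prefix followed by digits) is an assumption based only on the patterns shown in the links, not on any official rule.

```python
import re

# Link templates taken from the post above; only the accession part changes.
ENA_SAMPLE_URL = "https://www.ebi.ac.uk/ena/browser/view/{sample}"
NCBI_VIEWER_URL = (
    "https://www.ncbi.nlm.nih.gov/projects/sviewer/?id=CM000686.1&srz={run}"
)


def ena_sample_url(sample_accession: str) -> str:
    """Return the ENA browser link for a sample accession (SAMEA + digits)."""
    # Assumed format check: prefix plus digits, as in the template above.
    if not re.fullmatch(r"SAMEA\d+", sample_accession):
        raise ValueError(f"unexpected sample accession: {sample_accession!r}")
    return ENA_SAMPLE_URL.format(sample=sample_accession)


def ncbi_viewer_url(run_accession: str) -> str:
    """Return the NCBI sequence viewer link for a run accession (ERR + digits)."""
    if not re.fullmatch(r"ERR\d+", run_accession):
        raise ValueError(f"unexpected run accession: {run_accession!r}")
    return NCBI_VIEWER_URL.format(run=run_accession)


if __name__ == "__main__":
    # Placeholder accessions, NOT real samples; substitute the ones from the paper.
    print(ena_sample_url("SAMEA000000000"))
    print(ncbi_viewer_url("ERR00000000"))
```

With a list of run numbers from a paper's supplementary table, a short loop over `ncbi_viewer_url` gives one link per sample.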

This NCBI website uses Build 37 positions, while YBrowse (https://ybrowse.org/gb2/gbrowse/chrY/?) gives only Build 38 positions. I am aware of an online tool, https://genome.ucsc.edu/cgi-bin/hgLiftOver, which can convert positions from one build to the other, but I would like to find a less complicated solution.

I also utilize the FTDNA Haplotree https://www.familytreedna.com/public/y-dna-haplotree/R (for Haplogroup R) to trace down the line. I also have a (now somewhat outdated) list of equivalent SNPs for U106 (which is my area of study) online at: https://docs.google.com/spreadsheets/d/1rpJP0Bt4qUQb9wWBFA7i1tLPV75ie_qS0iplwvvlVmQ/edit?usp=sharing under the clades tab. (The Ancient/Medieval/Royal DNA tab shows the results of my work on Ancient U106 remains.) Is there a more up-to-date location which lists all of the equivalent SNPs for Y-DNA clades?

I start with the clade listed in the paper, then I go through (one at a time) every SNP (not clade) within the largest subclade. If I find a positive result, I then go to the next level. If I find a negative result, then I go to the next largest subclade.
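The walk described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `Clade` class, the `descend` function and the clade/SNP names are all hypothetical, "largest" is taken to mean the subclade with the most downstream clades, and the calls are assumed to have been read off already as "+" (derived) or "-" (ancestral), with missing entries being no-calls.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    name: str
    snps: list[str]                      # equivalent SNPs defining this clade
    children: list[Clade] = field(default_factory=list)

    def size(self) -> int:
        """Number of clades in this subtree (assumed measure of 'largest')."""
        return 1 + sum(child.size() for child in self.children)


def first_call(snps: list[str], calls: dict[str, str]) -> str | None:
    """Go through the SNPs one at a time; return the first '+' or '-' found."""
    for snp in snps:
        result = calls.get(snp)          # a missing SNP is treated as a no-call
        if result in ("+", "-"):
            return result
    return None


def descend(start: Clade, calls: dict[str, str]) -> Clade:
    """Follow positive results down the tree, largest subclade first."""
    current = start
    while True:
        ranked = sorted(current.children, key=lambda c: c.size(), reverse=True)
        for child in ranked:
            if first_call(child.snps, calls) == "+":
                current = child          # positive: go to the next level
                break
            # negative or no-call: try the next largest subclade
        else:
            return current               # no subclade tested positive


if __name__ == "__main__":
    # Made-up tree and calls, purely to show the mechanics.
    tree = Clade("ROOT", ["r1"], [
        Clade("A", ["a1", "a2"], [Clade("A1", ["a1x"])]),
        Clade("B", ["b1"]),
    ])
    print(descend(tree, {"a2": "-", "b1": "+"}).name)
```

Note that this stops at the first positive SNP in a subclade, which matches the description above but gives no protection against a single false-positive read.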

This process is very labor (and time) intensive, and I would like to know if others have found an easier way.

deadly77
09-23-2020, 07:30 PM
I do it pretty much the same way (manually looking through the SNPs one by one) but I use the IGV from the Broad Institute for viewing the BAM files. It is time consuming to look through these one by one, but I find that it's useful for seeing ambiguous calls and also whether there are a lot of mutations in the vicinity of the SNP, which can be an indication that a derived SNP may not be a real one. I find it's important to go through the phyloequivalent SNPs on the same level in order to weed out false positives - a lot of ancient DNA samples might have only one read at a position and there are also a lot of no-calls.
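To illustrate the point about phyloequivalent SNPs, here is a rough Python sketch of tallying the calls across all SNPs on one level before accepting it. The function name, the threshold of two supporting SNPs and the example read counts are all assumptions made up for illustration; the read counts would still come from looking at each position in the viewer.

```python
def classify_level(read_counts, min_supporting_snps=2):
    """Summarise one tree level from per-SNP (derived_reads, ancestral_reads).

    Returns (verdict, groups), where groups lists the SNPs in each category.
    The min_supporting_snps threshold is an arbitrary, assumed cut-off.
    """
    groups = {"derived": [], "ancestral": [], "mixed": [], "no_call": []}
    for snp, (derived_reads, ancestral_reads) in read_counts.items():
        if derived_reads and ancestral_reads:
            groups["mixed"].append(snp)       # both alleles seen at one position
        elif derived_reads:
            groups["derived"].append(snp)
        elif ancestral_reads:
            groups["ancestral"].append(snp)
        else:
            groups["no_call"].append(snp)     # no reads covering the position

    if groups["derived"] and groups["ancestral"]:
        verdict = "conflicting"               # equivalents disagree: look closer
    elif groups["derived"]:
        enough = len(groups["derived"]) >= min_supporting_snps
        verdict = "derived" if enough else "derived (weak)"
    elif groups["ancestral"]:
        verdict = "ancestral"
    elif groups["mixed"]:
        verdict = "ambiguous"
    else:
        verdict = "no call"
    return verdict, groups


if __name__ == "__main__":
    # Hypothetical SNP names and read counts for a single level.
    level = {"SNP_A": (1, 0), "SNP_B": (0, 0), "SNP_C": (2, 0)}
    verdict, groups = classify_level(level)
    print(verdict, groups)
```

A "derived (weak)" or "conflicting" verdict is the cue to go back and inspect the individual reads rather than trust the tally.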

Most ancient BAM files are mapped to hg19, and as you say YBrowse only shows the hg38 locations. If you have an account with YFull, I find that's the easiest reference to follow, either by the Check SNPs function, or by using your own YTree and clicking on one of the SNPs in that, which will bring up a pop-up window. YFull displays both the hg19 and hg38 positions of a SNP, and tends to update pretty closely from YBrowse. There might be a bit of a lag for newer FTDNA SNPs since they don't submit directly to YBrowse. Brad Larkin's Genetic Homeland site also has an option to search for a SNP and that will show both hg19 and hg38 positions.

There are automated extraction tools out there, but I haven't used any of them. Some of that's due to my unfamiliarity with Linux/Unix, where most of these tools appear to be developed, and I'm not computer savvy enough to understand Linux through Ubuntu on Windows. Some of that is because I like to see the calls individually.

It is rather time consuming though, and the only time saver that I've really managed is running multiple BAM tracks in parallel on the IGV, so I can do several samples in one go. But I'm probably not doing it the best way - just a self-taught route from trial and error.

MacUalraig
09-23-2020, 07:42 PM
I've got a local SQL SNP database with both hg19 and hg38 positions but only for SNPs that I am interested in. If I want a full cross checking list I usually use YFull as they supply both.
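As a rough idea of what such a local lookup table could look like, here is a minimal sketch using Python's built-in sqlite3 module. SQLite, the table layout and the function names are assumptions for illustration rather than a description of the actual database, and the SNP names and positions in the example are made-up placeholders, not real coordinates.

```python
import sqlite3


def build_snp_db(rows):
    """Create an in-memory table of (name, hg19_pos, hg38_pos) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp ("
        " name TEXT PRIMARY KEY,"
        " hg19_pos INTEGER NOT NULL,"
        " hg38_pos INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def hg19_to_hg38(conn, hg19_pos):
    """Return (name, hg38_pos) for an hg19 position, or None if not stored."""
    return conn.execute(
        "SELECT name, hg38_pos FROM snp WHERE hg19_pos = ?", (hg19_pos,)
    ).fetchone()


def positions_for(conn, name):
    """Return (hg19_pos, hg38_pos) for a SNP name, or None if not stored."""
    return conn.execute(
        "SELECT hg19_pos, hg38_pos FROM snp WHERE name = ?", (name,)
    ).fetchone()


if __name__ == "__main__":
    # Placeholder rows only; real positions would be copied from a source
    # that lists both builds.
    conn = build_snp_db([("EXAMPLE1", 1000, 1100), ("EXAMPLE2", 2000, 2150)])
    print(hg19_to_hg38(conn, 1000))
    print(positions_for(conn, "EXAMPLE2"))
```

Swapping `":memory:"` for a file path would keep the table between sessions, so it only has to be filled once for the SNPs of interest.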