Spreadsheet for combBed SNPs and Adamov-based TMRCAs - Hg38 coordinates



Dave-V
03-14-2018, 02:00 PM
For anyone who used the older "YFull-based SNP Block Age Estimator" spreadsheet, I have updated the combBed regions in that spreadsheet to Hg38 coordinates. The version linked below can be used with any new (i.e. Hg38) SNP position references. Please note that the older version of the spreadsheet is valid only for Hg19 position references!

The two main uses of the spreadsheet are to narrow down a list of SNPs (by position reference) to those that fall within YFull's combBed regions (one measure of higher quality), and to estimate the age of the given block of SNPs using the method from the Adamov et al. paper.
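
As a rough illustration of those two steps (not the spreadsheet's actual formulas), here is a minimal Python sketch. It assumes regions are 1-based inclusive (start, end) pairs, which differs from the 0-based half-open convention of BED files, and that the age is simply the SNP count divided by (covered base pairs x mutation rate). The function names, the toy regions and the rate value passed in are hypothetical placeholders; the real rate and its error range should be taken from the Adamov et al. paper.

```python
from bisect import bisect_right


def filter_snps(positions, regions):
    """Keep only the positions that fall inside one of the regions.

    Assumption: regions are non-overlapping (start, end) pairs,
    1-based and inclusive at both ends.
    """
    regions = sorted(regions)
    starts = [start for start, _ in regions]
    kept = []
    for pos in positions:
        # Index of the last region whose start is <= pos.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= regions[i][1]:
            kept.append(pos)
    return kept


def block_age_years(n_snps, covered_bp, rate_per_bp_per_year):
    """Simple rate-based age: SNP count / (covered length * mutation rate).

    Assumption: a plain constant-rate estimate with no further
    corrections; the spreadsheet may differ in detail.
    """
    if covered_bp <= 0 or rate_per_bp_per_year <= 0:
        raise ValueError("covered_bp and rate_per_bp_per_year must be positive")
    return n_snps / (covered_bp * rate_per_bp_per_year)


# Toy data only: these are not real combBed regions or SNP positions.
toy_regions = [(100, 200), (500, 900)]
toy_positions = [50, 100, 150, 200, 201, 700, 950]
kept = filter_snps(toy_positions, toy_regions)
print(kept)

# Placeholder length and rate, chosen only to make the arithmetic easy to follow.
print(block_age_years(len(kept), covered_bp=1_000_000, rate_per_bp_per_year=1e-9))
```

Passing a test-specific coverage length as `covered_bp` instead of the full region length is one way to reproduce the kind of per-test customization described in the next paragraph.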

One additional note - I was horrified a while ago to get a question as to whether this spreadsheet replaces YFull's analysis. Certainly not! There is much more to YFull's analysis than just number of SNPs and TMRCAs. In fact, YFull does a more customized TMRCA calculation based on your specific test's coverage of the combBed (the "Length Coverage for Age" figure), which after their analysis you can also enter in the spreadsheet here to get the same customization. But this spreadsheet is neither a replacement for nor an advertisement for YFull's methods. It simply automates the calculations from the Adamov paper for anyone who wants to apply that method to a phylogenetic block of SNPs.

Rant over :). The combBed was converted to Hg38 using Alex Williamson's list of conversion rules at http://www.ytree.net/hg19tohg38.html so thanks to Alex.

The Hg38 version of the spreadsheet can be downloaded here: https://drive.google.com/open?id=1xqSlX2GcEgL3goFJquzkRD_Cq6BuWV7Z

Cofgene
03-15-2018, 12:33 AM
Thanks. But users should beware of the statistical problems associated with the use of small data sets in estimating ages. N=1 is bad. N=100 is OK.

Dave-V
03-15-2018, 04:12 AM
A valid point, but it's a criticism of the variance of TMRCA estimates in general, and they are still in high demand. Statistically, even at N=1, 95% of the population should fit within the error range given. Whether that range is meaningful or not to the user is another question.
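
To put rough numbers on the N=1 versus N=100 exchange, here is a hedged Python sketch of an exact 95% interval for a Poisson count, solved by bisection with the standard library only. Treating the SNP count as Poisson is an assumption made for illustration, the function names are hypothetical, and the interval covers counting noise only, not the uncertainty in the mutation rate itself, so it is not necessarily the error range the spreadsheet reports.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect_root(f, lo, hi, iters=100):
    """Root of a monotone f on [lo, hi], where f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def poisson_interval(n, conf=0.95):
    """Exact two-sided interval for the Poisson mean, given an observed count n."""
    alpha = 1 - conf
    if n == 0:
        lower = 0.0
    else:
        # Smallest mean for which seeing n or more events has probability alpha/2.
        lower = bisect_root(
            lambda lam: 1 - poisson_cdf(n - 1, lam) - alpha / 2, 0.0, float(n)
        )
    # Largest mean for which seeing n or fewer events has probability alpha/2.
    upper_limit = n + 10 * math.sqrt(n + 1) + 10
    upper = bisect_root(
        lambda lam: poisson_cdf(n, lam) - alpha / 2, float(n), upper_limit
    )
    return lower, upper


for n in (1, 100):
    low, high = poisson_interval(n)
    # Bounds expressed as multiples of the point estimate.
    print(n, round(low / n, 3), round(high / n, 3))
```

Under these assumptions a single SNP gives an interval running from roughly 0.025 to 5.6 times the point estimate, while 100 SNPs narrow it to roughly 0.81 to 1.22 times. The range is statistically valid in both cases, but far wider at N=1.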