R1b-Early Subclades Haplotypes Spreadsheet



TigerMW
10-18-2013, 04:07 AM
I maintain an R1b-Early_haplotypes spreadsheet of all the confirmed haplotypes that I can find from public projects. It's found under the Links section of the R1b-YDNA Yahoo group under "Haplotype Data for R1b Early Branches." The Links section is under the More drop-down menu arrow on the main page.
http://groups.yahoo.com/groups/R1b-YDNA/

I do NOT use any robot type programs to copy information from projects. I do it manually and only from public information.

There is a "Readme" help tab on the far left at the bottom. You may have to hit the left arrow to get to it.

The "AllHts" tab has all of the haplotypes.

All haplotypes are displayed, but only the first 67 STRs of each haplotype.

The first group of columns provides background information including the kit #, Most Distant Known Ancestor (MDKA) surname, SNP based haplogroup definition, STR signature based variety (cluster) and geographic origin.
The Hg (haplogroup) column shows a description of the phylogenetic tree branch that the haplotype is on, according to the relevant Downstream SNPs.

The next group of columns is the first set of STR columns. They are the STR values in FTDNA panel sequence order with each multi-copy element broken out.

The Relevant SNPs column is a list of directly relevant SNP test results. Only the lowest level (youngest) positive (derived) SNP is shown, plus any tested negative (ancestral) SNPs one level younger than that SNP. In addition, results for SNPs that are not yet positioned on the tree are also shown.
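As a rough illustration of that selection rule, here is a minimal Python sketch. The tree, the SNP names (S1, S2, X9, ...) and the function names are all hypothetical placeholders, not the spreadsheet's actual data or logic; it also assumes each kit's positive results lie on a single lineage.

```python
# Hypothetical toy tree: each SNP maps to its parent (None marks the root).
# These names are placeholders, not real SNPs.
PARENT = {"S1": None, "S2": "S1", "S3": "S2", "S4": "S2", "S5": "S3"}


def depth(snp):
    """Number of steps from the root down to this SNP."""
    d = 0
    while PARENT[snp] is not None:
        snp = PARENT[snp]
        d += 1
    return d


def relevant_snps(results):
    """results maps SNP name -> '+' (derived) or '-' (ancestral)."""
    positioned_pos = [s for s, r in results.items() if r == "+" and s in PARENT]
    shown = []
    if positioned_pos:
        # Youngest positive = the deepest positive SNP on the tree.
        youngest = max(positioned_pos, key=depth)
        shown.append(youngest + "+")
        # Negatives exactly one level below the youngest positive.
        shown += sorted(
            s + "-"
            for s, r in results.items()
            if r == "-" and s in PARENT and PARENT[s] == youngest
        )
    # Results for SNPs that have no position on the tree are listed as-is.
    shown += sorted(s + r for s, r in results.items() if s not in PARENT)
    return shown


kit = {"S1": "+", "S2": "+", "S3": "-", "S4": "-", "S5": "-", "X9": "+"}
print(relevant_snps(kit))
```

In this toy case S5 is dropped because it sits two levels below the youngest positive, while the unpositioned X9 is kept.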

The second set of STR columns shows only the Genetic Distance (GD) from the Base Haplotype at the top of the screen for each STR. These STR columns are displayed in order from slowest to fastest mutation rate.

On the far right are three columns that control GD calculations for the distance across all 67 STRs for any haplotype that you select versus all of the other 67 STR length haplotypes in the worksheet. You select the Target haplotype by putting an "x" in the yellow cell next to it. You can only select one haplotype as the target. You can make the target haplotype the modal for a selected set of haplotypes by putting an "x" at the top of the spreadsheet next to the calculated Mode row.
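For anyone who wants to see the arithmetic, here is a minimal Python sketch of a GD calculation against a target or a calculated modal. It assumes a simple stepwise count (the absolute difference at each marker, summed); the spreadsheet's exact rules, for instance for multi-copy markers, may differ. The kit names and the four-marker values are made up for illustration.

```python
from collections import Counter


def per_marker_gd(target, other):
    """Stepwise distance at each marker: absolute difference in repeat count."""
    return [abs(a - b) for a, b in zip(target, other)]


def genetic_distance(target, other):
    """Total GD across all markers (assumed: simple sum of the steps)."""
    return sum(per_marker_gd(target, other))


def modal_haplotype(haplotypes):
    """Most common value at each marker position across the selected set."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*haplotypes)]


# Made-up four-marker haplotypes; real rows would carry 67 values.
haps = {
    "kitA": [13, 24, 14, 11],
    "kitB": [13, 25, 14, 10],
    "kitC": [12, 24, 14, 11],
}

# Use one kit as the target.
target = haps["kitA"]
distances = {kit: genetic_distance(target, h) for kit, h in haps.items()}

# Or use the calculated mode of the set as the target instead.
modal = modal_haplotype(list(haps.values()))
to_modal = {kit: genetic_distance(modal, h) for kit, h in haps.items()}
print(distances, modal, to_modal)
```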

The last column has the Ysearch IDs where we could find them. You can use Ysearch to contact the owner of the haplotype directly.

A statistical summary section is at the bottom of the spreadsheet. The calculations include:
Allele distribution table per each STR, 1-67
Count
Mode per each STR
Diversity per each STR
Variance per each STR
Standard Deviation per each STR
Mean per each STR
Sum of the Variance for all 67 STRs
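The statistics listed above can be sketched in Python as follows. This is only an approximation under stated assumptions: the post does not say how diversity is defined or whether variance is population or sample based, so the sketch assumes diversity = 1 minus the sum of squared allele frequencies, and population variance. The haplotype values are invented.

```python
from collections import Counter
from statistics import mean, pstdev, pvariance


def str_summary(values):
    """Summary statistics for one STR column (one value per haplotype)."""
    n = len(values)
    counts = Counter(values)
    freqs = [c / n for c in counts.values()]
    return {
        "alleles": dict(counts),                      # allele distribution
        "count": n,
        "mode": counts.most_common(1)[0][0],
        "diversity": 1 - sum(p * p for p in freqs),   # assumed definition
        "variance": pvariance(values),                # assumed: population
        "stdev": pstdev(values),
        "mean": mean(values),
    }


def sum_of_variance(haplotypes):
    """Add up the per-STR variances across every marker column."""
    return sum(pvariance(col) for col in zip(*haplotypes))


# Invented two-marker haplotypes; a real worksheet row has 67 values.
haps = [[13, 24], [13, 25], [14, 24], [12, 25]]
first_str = str_summary([h[0] for h in haps])
total_var = sum_of_variance(haps)
print(first_str, total_var)
```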

The blue titled rows are numbers totalled for the whole spreadsheet. The green titled rows are totals for just the selected haplotypes.

You can use the autofiltering function of the spreadsheet in conjunction with the green titled totals. Select (filter) to view just the haplotypes you want, and the green titled statistics will subtotal for just those selected. Selection is done with the "autofilter" drop-down arrow in each column heading.
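The blue versus green distinction amounts to computing the same statistic twice: once over every row and once over the filtered rows. A minimal Python sketch, with invented kits and values, and assuming population variance as the statistic:

```python
from statistics import pvariance

# Invented rows; "hg" stands in for the Hg column used as the filter.
rows = [
    {"kit": "k1", "hg": "Z2103", "strs": [13, 24]},
    {"kit": "k2", "hg": "Z2103", "strs": [13, 25]},
    {"kit": "k3", "hg": "other", "strs": [12, 24]},
]


def sum_of_variance(selected_rows):
    """Sum of per-STR variances over whatever rows are passed in."""
    columns = zip(*(r["strs"] for r in selected_rows))
    return sum(pvariance(col) for col in columns)


# "Blue" style total: every haplotype on the sheet.
whole_sheet = sum_of_variance(rows)

# "Green" style subtotal: only the rows that pass the filter.
filtered = [r for r in rows if r["hg"] == "Z2103"]
subtotal = sum_of_variance(filtered)
print(whole_sheet, subtotal)
```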


The "ExtHts" tab has all of the 111 STR haplotypes.

Extended 111 STRs (only) are displayed on this worksheet. The GD and statistical capabilities are similar to AllHts.


The "Clades" tab has the SNP/haplogroup tree pointer information and the STR signatures of the deep ancestral varieties assigned.

Clades has two sections. The first is a list of SNPs with associated haplogroup labels. The haplogroup labels on the other worksheets are displayed based on the SNPs in the Downstream SNPs column and how they align with this first table within Clades. This table contains branching depth levels as well as SNPs that need to be translated to other synonymous SNPs for consistency purposes.

The second section of Clades is a list of STR based variety labels along with their actual off-modal STR values.


The "Rates" tab has misc. information.

This worksheet primarily contains supplemental information, like mutation rates for STRs. There is a set of columns where the modal values for the largest R1b subclades are kept. I need to update those modals for Z2103, Z2113, etc.