Haplogroup R - Public Testing ID's Cross-Reference Project



JamesKane
07-09-2017, 02:22 PM
Hello folks,

As some may be aware, I maintain a table of public testing IDs from various labs and web resources for testers beneath Haplogroup R. The purpose is to provide a centralized resource that makes it convenient to associate a tester's results across those sources.

Each person is given the following entries:


Haplogroup - This reflects the position in the experimental tree at the same site. This is systematically assigned. It is not intended to supplant the efforts of the experts in each region of the tree, so it may be quite behind current knowledge.

Surname - The surname of the most distant known ancestor

Origin - The country of origin of the most distant known ancestor

Lab IDs for FTDNA, FGC, YSEQ, BISDNA (from the C2 2000 workbook where known), YFULL's tree, or other resources like PGP: Harvard.

The last few columns indicate whether I have created a GRCh38-aligned BAM file and which type of NGS test is used in the reporting elsewhere on the site.


The table is hosted at: Kit Cross Reference (http://www.haplogroup-r.org/kits.html). If you are aware of any entries that could be made more complete or merged, please send a correction to the email address in the page's footer. A form will be developed to assist with the process in the near future.

For those who would like to integrate the data into their own efforts, you may use http://www.haplogroup-r.org/api/crossref.json as an API to receive the table in JSON format.
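For anyone wiring this into a script, here is a minimal Python sketch of pulling the table and indexing it by one ID column. The URL is the one given above; everything about the JSON layout is an assumption on my part (a top-level list with one object per tester), and the field name "ftdna" in the usage note is hypothetical, so check the actual response before relying on it.

```python
import json
from urllib.request import urlopen

CROSSREF_URL = "http://www.haplogroup-r.org/api/crossref.json"


def fetch_crossref(url=CROSSREF_URL, timeout=30):
    """Download the cross-reference table and decode it as JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def index_by_field(records, field):
    """Map each non-empty value of `field` to the record that carries it.

    ASSUMPTION: `records` is a list of dicts, one per tester. The real
    key names are not documented in this thread, so `field` must be
    whatever key the API actually uses for the ID you care about.
    """
    index = {}
    for record in records:
        value = record.get(field)
        if value:  # skip testers with no ID at this lab
            index[str(value)] = record
    return index
```

Usage would look something like `by_kit = index_by_field(fetch_crossref(), "ftdna")`, after which `by_kit.get(some_kit_number)` returns that tester's row, or None if the kit is not in the table.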

Mikewww
01-03-2018, 06:06 PM
I think the whole Y DNA data warehouse concept you are using could ultimately change the nature of sharing information for Y DNA across the board.

I know that the Big Tree is P312 focused and we have to respect that Alex is not committed to any reprocessing, but I see that U106 is asking everyone to upload their Hg38 VCF zipped folders. Regardless of any reprocessing for the Big Tree, it might help with the McDonald age estimates to have the Hg38 VCF zipped folders for anyone in R1b. I know he likes to use early branches for comparisons. I'm not sure what other analyses you might be performing.

Do you want me to encourage people to submit their Hg38 VCF zipped folders for purposes beyond the Big Tree's focus?

JamesKane
01-05-2018, 12:56 PM
The Y-DNA Warehouse (http://www.haplogroup-r.org/shared_data.html) is ultimately intended to be a repository open to any Y-DNA project that may want to make use of the capacity. The domain name for this functionality will be changing in the not too distant future to make this more clear. The new site will introduce registration and self-management components. This will allow submitters to edit their most distant known paternal ancestor information and manage their cross-reference list entries. All the existing warehouse links will simply forward you to the new site when that cut-over occurs.

After that development is complete, there are several ideas floating around for what kinds of reports can be exposed via APIs. At first most will be geared to what is needed to support my version of the experimental tree, but they could be quite useful for others:
1) A report that returns the variants considered to be regionally unique to a tester and those which are placed in the tree. (See R-FGC11134's experimental tree (http://www.haplogroup-r.org/tree/R-FGC11134.html) and click on one of the blue kit buttons in the tree for an example.)
2) A variant occurrence report that shows the number of positive and negative calls under each major subclade. This is actually a prerequisite for the first one, but is quite useful for other purposes.
3) A call matrix report for a group of tests at a particular site.
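To make reports 2 and 3 a little more concrete, here is a rough Python sketch of the two aggregations. This is not the warehouse's implementation; the input shapes (tuples of subclade, variant, and a positive/negative flag, and a per-kit dict of calls) and the "?" marker for a missing call are all assumptions chosen only for illustration.

```python
from collections import defaultdict


def variant_occurrence(calls):
    """Count positive and negative calls per (subclade, variant).

    ASSUMPTION: `calls` is an iterable of (subclade, variant, is_positive)
    tuples, one per tester call. The real data model may differ.
    """
    counts = defaultdict(lambda: {"positive": 0, "negative": 0})
    for subclade, variant, is_positive in calls:
        key = "positive" if is_positive else "negative"
        counts[(subclade, variant)][key] += 1
    return dict(counts)


def call_matrix(calls_by_kit, variants):
    """Build one row per kit with one column per requested variant.

    ASSUMPTION: `calls_by_kit` maps a kit ID to a dict of variant -> call.
    A variant with no call for a kit is shown as "?".
    """
    return {
        kit: [kit_calls.get(variant, "?") for variant in variants]
        for kit, kit_calls in calls_by_kit.items()
    }
```

The occurrence counts are keyed by subclade and variant together, so the same variant can be tallied separately under each major subclade, which is what the first report would need as input.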

To answer the specific question about whether groups outside of Big Tree's focus area should be encouraged to submit results: I'd say it does no harm, and it allows data to be collected into a central repository as a start. There should be an understanding that it will take time for analysts to begin digesting it.