Wish list for YDNA tools and databases - R1b only (but could be expanded later)



RobertCasey
08-05-2016, 03:51 AM
If you could have access to advanced YDNA tools and to consolidated YDNA databases, what would be high on your list? The assumption is that these tools and databases would cost around $10 per month (cancel or add at any time). For now, assume that they are limited to haplogroup R; if business warrants expanding, they would be expanded to other haplotrees for an additional cost. Please respond with the ones you want the most (in order of importance) and omit the ones you do not care about. State the priority of each on a scale of one to ten, with ten being the highest.

1) Very functional access to all of the 80 % of FTDNA data that is in the public domain YSTR and YSNP reports. Currently there are over 50,000 67-marker YSTR submissions under haplogroup R, with options to pull 111 markers as well. You could just download the file to Excel or use 20 or 30 canned SQL queries for filtering.

2) A YSNP predictor tool that would predict single-signature YSNPs (which covers 80 to 90 % of submissions under haplogroup R). The only prerequisites are that the submission is predicted by FTDNA to be haplogroup R and has 67 markers tested. This would save the testing costs of the R1b and L21 SNP packs, since the prediction would allow you to order the M222 or L226 SNP pack directly. Accuracy would average 90 to 99 %, and the figure would be posted. A predicted YSNP could be verified at YSEQ for only $17.50.

3) Once you have a predicted YSNP, production of descendant charts that connect around 50 % of known submissions when only 15 % have been tested. Quality of prediction would range from 60 to 95 % (which would be posted), based on signature and genetic distance. As more YDNA data is tested in the future, coverage would increase to 80 to 90 % and average accuracy would increase as well. You could enter your own data, or use the data from option 1) and supplement it with your own data. This would allow much better analysis of who needs to test what next.

4) Give Alex Williamson a break or some funds to expand his work to all of haplogroup R and beyond, since Alex is only missing U106 and R1a. If Alex does not want more to do, that would be OK as well. This is just a wish list of what is possible. Allow data to be downloaded to spreadsheets for further analysis.

5) Access to a cross reference of IDs between FTDNA, YSEQ, YFULL, FGC, 1K Genomes, YSearch, etc. (would be minimal at first and would depend on user input) - probably free at first due to its limited nature.

6) Access to YSEQ data that could be integrated with FTDNA data - limited by issue 5). Would be complete if YSEQ allows, but would be primarily YSEQ IDs at first. Information could be downloaded to Excel or filtered with 10 or 20 SQL queries.

7) Access to a database of the 400 to 500 YSTRs produced by NGS tests (limited by current knowledge of which ones are reliable and which should be excluded). People are working on this and have tools to extract this data from NGS files.

8) Create an option at FGC to release NGS tests to a common database without dealing with all the uploads. Raw data would be available for download as well. This would reduce the need to get testers to acquire the links and send them off to others. FGC would obviously have to want to do this. Approach FTDNA to add the same functionality for Big Y NGS files - if they want to do this. Automate transfers to FGC or YFULL if FTDNA or FGC wants to participate. This is raw data transfer only. Other options would include analysis.

9) Create an FGC database of YSNPs that can be downloaded and would have canned SQL queries to filter common requests.

10) A collection of miscellaneous tools, including a convergence estimator to see if YSTR matches are false hits, a genetic distance calculator and charting, etc. Please add your favorite small YDNA tool that would be good to include.
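
To make the genetic distance calculator in item 10 concrete, here is a minimal sketch in Python. It assumes a simple stepwise count (the sum of absolute differences in repeat counts over the markers two kits share); vendor calculators may treat multi-copy markers and null values differently, so this is not a description of how FTDNA computes it. The kit names and marker values are made up for illustration.

```python
def genetic_distance(a, b):
    """Stepwise genetic distance: sum of absolute repeat-count
    differences over the markers present in both haplotypes."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


def closest_matches(target, candidates, limit=5):
    """Rank candidate kits by distance to the target, smallest first.
    No distance cutoff is applied; only the number of results is limited."""
    ranked = sorted(
        (genetic_distance(target, hap), kit) for kit, hap in candidates.items()
    )
    return ranked[:limit]


# Hypothetical three-marker haplotypes (real comparisons would use 37/67/111).
target = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
candidates = {
    "kit_A": {"DYS393": 13, "DYS390": 25, "DYS19": 14},
    "kit_B": {"DYS393": 12, "DYS390": 22, "DYS19": 14},
    "kit_C": {"DYS393": 13, "DYS390": 24, "DYS19": 14},
}

for distance, kit in closest_matches(target, candidates):
    print(kit, distance)
```

Because the ranking is limited by count rather than by a distance threshold, the same function would also return the closest kits when none of them is a near match.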

MitchellSince1893
08-05-2016, 04:05 AM
1. The ability to see my closest 111, 67, and 37 marker matches, regardless of the genetic distance.

2. The ability to filter my BigY results so I don't see matches that don't share certain SNPs with me. E.g., only show me matches that are positive for R-Z142.

3. The ability to search through FTDNA databases for certain SNP matches. E.g., show me everyone positive for FGC12378.

4. Geographic info. E.g., of all YDNA tests where England is listed as origin, what percentage is R-U152?
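
As an illustration of request 4, here is a rough sketch using Python's built-in sqlite3 module. The `kits` table, its columns, and the sample rows are all hypothetical; no actual FTDNA schema or data is implied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per kit, with the listed origin and a
# 1/0 flag for whether the kit is R-U152.
conn.execute("CREATE TABLE kits (kit TEXT, origin TEXT, u152 INTEGER)")
conn.executemany(
    "INSERT INTO kits VALUES (?, ?, ?)",
    [
        ("k1", "England", 1),
        ("k2", "England", 0),
        ("k3", "England", 0),
        ("k4", "England", 1),
        ("k5", "Ireland", 0),
    ],
)


def percent_u152(conn, origin):
    """Percentage of kits with the given origin flagged as R-U152.
    Returns None when no kit lists that origin."""
    row = conn.execute(
        "SELECT 100.0 * SUM(u152) / COUNT(*) FROM kits WHERE origin = ?",
        (origin,),
    ).fetchone()
    return row[0]


england_pct = percent_u152(conn, "England")
print(england_pct)
```

The result is only as good as the origin field, which testers fill in themselves, so any real percentage would reflect who tested rather than the population as a whole.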

RobertCasey
08-05-2016, 04:16 AM
With access to YSTR and YSNP reports, these requests are primarily SQL queries. However, if you are talking about integration across multiple databases, that would be a future option; these kinds of queries would still be available for each database individually. The more common YSNPs would be stored as database fields. For the very recent discoveries, the raw YSNP fields would have to be scanned, which would take time. Probably all the YSNPs in the FTDNA haplotree would be loaded as fields. For the private YSNPs, equivalents, unstable YSNPs, etc., you would have to extract only a few at a time, since that would require scanning the source files for matches.
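
The difference between a YSNP stored as a field and one that has to be scanned from raw data can be sketched as follows, again with Python's sqlite3. The table layout, the `raw_snps` text format ("NAME+" or "NAME-" separated by spaces), and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: a common YSNP (Z142) gets its own indexed column,
# while everything else stays in a raw text field.
conn.execute("CREATE TABLE kits (kit TEXT, z142 INTEGER, raw_snps TEXT)")
conn.executemany(
    "INSERT INTO kits VALUES (?, ?, ?)",
    [
        ("k1", 1, "Z142+ FGC12378+ L2+"),
        ("k2", 1, "Z142+ L2+"),
        ("k3", 0, "Z142- L21+"),
        ("k4", 1, "Z142+ FGC12378- L2+"),
    ],
)
conn.execute("CREATE INDEX idx_z142 ON kits (z142)")


def kits_positive_by_field(conn):
    """Fast path: the YSNP is a column, so the index can be used."""
    rows = conn.execute("SELECT kit FROM kits WHERE z142 = 1 ORDER BY kit")
    return [r[0] for r in rows]


def kits_positive_by_scan(conn, snp):
    """Slow path: every raw_snps value is scanned for 'NAME+'.
    Padding with spaces avoids matching inside a longer SNP name.
    Assumes the SNP name contains no LIKE wildcards (% or _)."""
    rows = conn.execute(
        "SELECT kit FROM kits "
        "WHERE ' ' || raw_snps || ' ' LIKE ? ORDER BY kit",
        (f"% {snp}+ %",),
    )
    return [r[0] for r in rows]


print(kits_positive_by_field(conn))
print(kits_positive_by_scan(conn, "FGC12378"))
```

The scan version has to read every row's raw text, which is why extracting only a few private or recent YSNPs at a time would be the practical limit.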