FGC6550 - review of predictable haplogroup (L21>DF49>DF23>Z2961>FGC6550)



RobertCasey
05-12-2020, 01:07 AM
I spent several hours analyzing the haplogroup FGC6550. This is about the 20th haplogroup under R-L21 that I have reviewed in this manner, so the
procedure is pretty well established. This analysis is by no means complete, but I attempted to pull all FGC6550 testers in the BigTree, updated the
YSNP branches to FTDNA labels and looked at several key surname projects. I have pulled most of the testers that have tested positive for branches
under FGC6550 (those in public YSTR reports). Of course many are private and are not included (I do not feel comfortable using any private data in
this analysis).

What is really missing:

1) I only looked at the obvious surname projects for data (I did not pull DF49 or L21 data, though).
2) I primarily pulled data that was confirmed positive for FGC6550 only.
3) I used very old (3 years old) data for predicted testers (the primary surname projects can now be pulled and many would be added as predicted FGC6550).
4) I deleted 90 % of the HG R testers that have no chance of being positive for FGC6550 (to cut down on data that is not relevant).


I did create a model for prediction of FGC6550 which should be 98 to 99 % accurate. Currently the AcaStat model shows 100 % accuracy. The model is in
column CB (you can just copy and paste it into new rows).
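
For anyone unsure what an accuracy figure like this means, here is a minimal sketch, assuming accuracy is simply the share of testers whose predicted status matches their confirmed status. This is only an illustration and not how AcaStat computes it; the function name and the sample records are hypothetical and are not taken from the spreadsheet.

```python
def prediction_accuracy(records):
    """Return the fraction of testers whose prediction matches the confirmed result.

    records: iterable of (confirmed_positive, predicted_positive) boolean pairs.
    """
    records = list(records)
    if not records:
        raise ValueError("no records to score")
    # A prediction is correct when it agrees with the confirmed test result.
    correct = sum(1 for confirmed, predicted in records if confirmed == predicted)
    return correct / len(records)


# Hypothetical sample: three correct predictions and one miss.
sample = [(True, True), (True, True), (False, False), (True, False)]
accuracy = prediction_accuracy(sample)
print(f"{accuracy:.0%}")
```

Only testers with a confirmed result can be scored this way, so a figure from a small confirmed set may shift as more testers are added.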

This analysis includes:

1) 50 confirmed FGC6550 testers (this could probably be increased 10 or 20 % from public reports).
2) 23 predicted FGC6550 testers (with more pulls, this would easily double in size).

The first file is my source file for all analysis (it has three tabs):

http://www.rcasey.net/DNA/Temp/FGC6550_HG_R_Master_20200510J.xlsx

1) Tab FGC6550 has the original data.
2) Tab AcaStat is the input and output of the AcaStat statistical program (only costs $20 to download).
3) Tab SAPP is the source file for SAPP input (just needs copy and pasting to a Notepad txt file).


http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Input_20200510L.txt
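
The copy and paste step can also be scripted. Here is a rough sketch, assuming the rows of the SAPP tab have already been read into lists of strings and that a tab-delimited text file (which is what pasting Excel cells into Notepad produces) is acceptable. The row values are placeholders; the actual layout SAPP expects is not described here, so check the linked input file for that.

```python
import csv
import io


def rows_to_text(rows):
    """Join spreadsheet rows into tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows standing in for cells copied from the SAPP tab.
rows = [["kit_1", "13", "24"], ["kit_2", "13", "25"]]
text = rows_to_text(rows)
print(text, end="")
```

The resulting text can then be saved to a .txt file in place of the manual Notepad step.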

Here is a link to the SAPP tool:

http://www.jdvtools.com/SAPP/

Here is the SAPP output file (html version):

http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Output_20200510M.html

Here is the SAPP output file (graphic version):

http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Output_20200510J.png

Anyone should feel free to update these files with new data. Collecting the data took 90 % of the time. AcaStat and SAPP took less than one hour once the data was collected. I did not really sanity check the data that well. Feel free to ask questions. I have several YouTube videos that explain this process. Search "Genetic Genealogy Robert Casey" for a dozen presentations.

RobertCasey
05-21-2020, 10:01 PM
I have completed a new review of FGC6550, which adds the pulls made late last year from over 2,000 projects. I also decided, for the first time, to produce a
YouTube video for the analysis of FGC6550, showing the reasons/advantages for YSNP prediction and charting of such haplogroups. It also goes over
the SAPP chart and describes how surname clusters are revealed and minor issues with SAPP charting:

https://www.youtube.com/watch?v=S0yUTotwI2I

Here are the five source files used for the latest analysis. The first is my master database for FGC6550 in Excel format (all analysis comes from this source file):

http://www.rcasey.net/DNA/Temp/FGC6550_HG_R_Master_20200520D.xlsx

This is the source file used for creating the SAPP output for the graphic version of the chart:

http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Input_20200520A.txt

Here are links to the HTML and graphic versions of the SAPP output generated from that input file:

http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Output_20200520A.html

http://www.rcasey.net/DNA/Temp/FGC6550_SAPP_Output_20200520A.png

Hopefully, somebody is interested in this review as this methodology can be used across all haplogroups.

Bart
05-24-2020, 06:26 PM
Thanks Robert, looks really useful, albeit I've just been getting used to it with a very small sample size.

Will take a much more detailed look over the coming weeks. It looks like what I've been doing manually, but of course much much quicker and automated.

Appreciate your efforts!


Bart