A Mutation Tree for YFull's NGS "500 STRs"



Dave-V
01-31-2017, 05:47 PM
This is just a start at organizing the STR data from NGS testing, but it is possible at least.

YFull currently has 317 kits in the L21 group reporting up to 503 STR markers from their NGS analyses. I ran those through SAPP to build an L21 mutation history tree for the 317 kits for all markers.

When I say "all markers", I did have to clean up the data for no-reads, etc. I included only the 439 STRs with >50% reporting among the 317 kits. Perhaps more importantly, for this first run I simplified the remaining data by using the group modal values in place of the remaining blanks and converting the compound/complex repeats (e.g. "10.3", "16.g", etc.) to a "simple"/integer form. In later runs I can try different assumptions. I also need to gather more STR mutation rates for the non-FTDNA-tested markers; for now I just used the NIST standard of 1.3 x 10^-3 for those.
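To make the clean-up steps above concrete, here is a minimal Python sketch. It is not SAPP code and not the spreadsheet actually used; the function names, the marker names (STR_A, etc.) and the rule of keeping only the leading integer of a compound call are all assumptions for illustration.

```python
from collections import Counter


def to_simple_repeat(value):
    """Reduce a call such as "10.3" or "16.g" to its leading integer.

    Assumption: the integer part is the "simple" form. The rule actually
    used for the run may have been different.
    """
    digits = ""
    for ch in str(value).strip():
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else None  # None means no-read / blank


def clean_kits(kits, min_coverage=0.5):
    """kits: {kit_id: {marker: raw call or ""}} -> (cleaned kits, modal values)."""
    markers = sorted({m for calls in kits.values() for m in calls})
    parsed = {
        kit: {m: to_simple_repeat(calls.get(m, "")) for m in markers}
        for kit, calls in kits.items()
    }
    kept, modal = [], {}
    for m in markers:
        values = [parsed[kit][m] for kit in parsed if parsed[kit][m] is not None]
        # Keep only markers reported by more than half of the kits.
        if len(values) / len(parsed) > min_coverage:
            kept.append(m)
            modal[m] = Counter(values).most_common(1)[0][0]
    # Replace the remaining blanks with the group modal value.
    cleaned = {
        kit: {m: parsed[kit][m] if parsed[kit][m] is not None else modal[m] for m in kept}
        for kit in parsed
    }
    return cleaned, modal


# Hypothetical three-kit example (made-up kit and marker names).
example = {
    "KIT_1": {"STR_A": "13", "STR_B": "10.3", "STR_C": ""},
    "KIT_2": {"STR_A": "13", "STR_B": "", "STR_C": ""},
    "KIT_3": {"STR_A": "", "STR_B": "10", "STR_C": "16.g"},
}
cleaned, modal = clean_kits(example)
print(cleaned)
```

In this toy input STR_C is reported by only one kit in three, so it is dropped, and the blanks in STR_A and STR_B are filled with the group modal.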

And finally I also included in the run YFull's Y-SNP tree under L21 and the terminal SNPs for each of the 317 kits so the program would follow the Y-SNP tree.

SAPP completed the 317 kits with 439 data points in about 15 minutes. I have attached the output below. This is a huge amount of data to absorb and neither the picture output nor the table output is particularly easy to follow. However, there are interesting points in the mutation history, including several possible STR signatures for sub-groups of L21 among the non-FTDNA markers. The run table output also includes 439-STR-long modal haplotypes for the major sub-L21 SNPs.

Ignore kits YF01441 and YF01525 in the data; they're non-L21 kits in the group that the program correctly shows as outliers.

For anyone outside of L21 I'm happy to repeat the exercise for other groups.

Files (Dropbox):

Output Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20439%20STRs%20SAPP%20run%20tables.html) (input tables plus SNP modal haplotypes and genetic distances between kits) (warning: 28Meg)

439-STR Mutation History Tree as Picture (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20439%20STRs%20SAPP%20picture%20output.png) (PNG) (warning: 14Meg & 54133x2946 pixels. Normally STR mutations are listed above the nodes/kits; here they are above and to the right of each node or kit for readability).

439-STR Mutation History Tree as Table (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20439%20STRs%20SAPP%20table%20output.html) (HTML)

MitchellSince1893
01-31-2017, 06:06 PM
Thanks for the kind offer. Request a run on R-U152

jbarry6899
01-31-2017, 07:52 PM
Thanks for the kind offer. Request a run on R-U152

Second the request. Thanks!

razyn
01-31-2017, 08:02 PM
I don't want to be late to the party, so I request a run on R-DF27. Really it should be paired with U152 anyhow, but I gather the data will be completely out of hand without doing that. I don't have a computer at present that can handle it, but am currently shopping, maybe I will have a new one before you get around to us. Meanwhile, others are not so handicapped.

Mikewww
01-31-2017, 08:58 PM
Dave, we have a DF1 YFull group too. Would it help you to be an admin for that? I haven't compared but there may be some DF1/L513 people who aren't in the L21 group.

Dave-V
01-31-2017, 10:37 PM
Thanks for the kind offer. Request a run on R-U152

MitchellSince1893 sent me the U152 NGS kits; results attached (note in this case I took out kits YF05298, YF01457, and YF05911 as they were not in R-U152). There were 445 STRs with >50% read coverage for this group.

I have also attached here the input file to SAPP for reference, and the spreadsheet I used to create the various sections of the input file.

Picture Output (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20445%20STRs%20SAPP%20output%20picture.png) (PNG)

Table Output (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20445%20STRs%20SAPP%20output%20table.html) (HTML) (same as picture output but in tabular form; easier to read/print/etc)

Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20445%20STRs%20run%20tables.html) (HTML) (kit inputs, modal haplotypes, etc)

Input File (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152.txt)

Spreadsheet (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/STRs_all_r-u152_20170131%20mod.xlsx) used to create input file.

Dave

Dave-V
01-31-2017, 10:42 PM
Dave, we have a DF1 YFull group too. Would it help you to be an admin for that? I haven't compared but there may be some DF1/L513 people who aren't in the L21 group.

I had forgotten to join the DF1 YFull Group but just did this week (thanks!). Those kits should hopefully be in the L21 set too, but here are the results for DF1 by itself. 451 STRs in this set were >50%, so there are a few more captured than in the L21 output.

Output picture (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20451%20STRs%20output%20picture.png) (PNG)

Output table (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20451%20STRs%20output%20table.html) (HTML) (same as above but in table form)

Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20451%20STRs%20run%20tables.html) (modal haplotypes, etc)

Input File (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1.txt)

Spreadsheet (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/STRs_all_r-df1_20170201%20mod.xlsx) used to create input file

MitchellSince1893
02-01-2017, 01:37 AM
...

When I say "all markers", I did have to clean up the data for no-reads, etc. I included only the 445 STRs with >50% read coverage for this group...Perhaps more importantly for this first run I simplified the remaining data by using the group modal values in place of the remaining blanks and converting the compound/complex repeats (e.g. "10.3", "16.g", etc) to a "simple"/integer form. In later runs I can try different assumptions. I need to also gather more STR mutation rates for the non-FTDNA-tested markers; for now for those I just used the NIST standard 1.3x10-3....

Feedback for you: This approach of using modal values in place of blanks resulted in me having a 36 marker difference with myself. I'm YF01489 and YY06577. (My father's BigY result and FGC result).

I'd be interested to see if different assumptions get that number down.

Dave-V
02-01-2017, 03:27 PM
Feedback for you: This approach of using modal values in place of blanks resulted in me having a 36 marker difference with myself. I'm YF01489 and YY06577. (My father's BigY result and FGC result).

I'd be interested to see if different assumptions get that number down.

Good example, thanks. A better approach than just using modals might be to use the nearest reported value based on how the kits fall on the SNP tree. I'll try that next.

lgmayka
02-01-2017, 03:40 PM
I ran those through SAPP to build an L21 mutation history tree for the 317 kits for all markers.
YFull already does something along this line itself, although I don't know whether the employed algorithm is documented anywhere--it may still be very experimental. Just as an example, below is the signature YFull gives for L226. The so-called mutation rate actually has an inverse logic--more stars implies a slower mutation rate.

This is found on the STRs tab of the "Info" on a tree branch.
---
STR | SHARED WITH | MUTATION RATE | ANC → DER
DYS538 | - | **** | 10 → 9
DYS716 | - | *** | 26 → 24
DYS592 | YF04772 | *** | 11 → 12
DYS522 | YF06573 | *** | 10 → 11
DYR59 | - | *** | 13 → 15
DYS526A | - | *** | 13 → 14
DYS510 | YF01431 | *** | 17 → 18
DYR44 | - | ** | 11 → 10
DYS557 | YF03622, YF06573 | ** | 16 → 15
DYS439 | R-Z2189 | ** | 12 → 11
DYS551 | - | ** | 13 → 14
DYR101 | R-A16 | ** | 11 → 12
DYS679 | R-A16 | ** | 13 → 14
DYS622 | - | ** | 19 → 20
DYR161 | YF04772 | ** | 16 → 15
DYS456 | R-Z2189 | ** | 16 → 15
DYS517 | R-A16, YF06573 | ** | 14 → 15
DYS722 | R-A16, YF06573 | ** | 21 → 22
DYR33 | R-A16, YF04772, YF06573 | ** | 14 → 15
DYS546 | R-Z2189, YF01431 | ** | 15 → 16
DYR75 | - | ** | 14 → 13
DYS630 | YF01431, YF04772 | ** | 21 → 20
DYR1 | R-Z2189 | ** | 17 → 16
DYS523 | - | ** | 13 → 12
DYR55 | R-A16, YF01431, YF03622, YF04772 | ** | 11 → 12
DYR160 | - | ** | 13 → 15
DYF393 | YF03622 | ** | 27 → 28
DYS471 | - | ** | 28 → 29
DYS626 | R-A16, YF06573 | ** | 26 → 27
DYS664 | YF01431, YF03622, YF04772 | ** | 50 → 51
DYS627 | R-A16 | ** | 28 → 27
DYS526B | - | ** | 35 → 36
DYS684 | - | ** | 57 → 59
DYS612 | R-A16 | ** | 31 → 30
DYR170 | - | ** | 37 → 34
DYS688 | - | ** | 79 → 80
---
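As a small illustration of reading the ANC → DER column, the Python sketch below takes a handful of rows from the L226 listing above and flags the markers whose derived value is more than one repeat away from the ancestral value. The function names are made up for this example, and it says nothing about how YFull itself derives or scores the signature.

```python
# A few (marker, ancestral, derived) rows copied from the L226 listing.
ROWS = [
    ("DYS538", 10, 9),
    ("DYS716", 26, 24),
    ("DYR59", 13, 15),
    ("DYS684", 57, 59),
    ("DYR170", 37, 34),
    ("DYS688", 79, 80),
]


def step_change(anc, der):
    """Signed change in repeat count from ancestral to derived."""
    return der - anc


def multi_step(rows):
    """Return (marker, change) for rows that differ by more than one repeat."""
    return [
        (marker, step_change(anc, der))
        for marker, anc, der in rows
        if abs(step_change(anc, der)) > 1
    ]


for marker, change in multi_step(ROWS):
    print(f"{marker}: {change:+d} repeats")
```

Changes of two or more repeats could come from several single-step mutations along the branch or from one multi-step event; the listing alone does not say which.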

jbarry6899
02-01-2017, 03:41 PM
Looking at the U152 results they seem to be missing some key subclades, in particular S8183 and those below.

Cofgene
02-01-2017, 04:50 PM
The U106 guys are curious what you would show for us.....

Dave-V
02-02-2017, 01:25 AM
Here's round 2 for U152. Instead of choosing the modal values for no-call markers, this version triangulated a value from the closest kits and only adopted the modal if there were no values among the closer kits. It does make the GDs smaller.
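For anyone curious what "triangulating from the closest kits" could look like, here is a rough Python sketch. It is only my reading of the idea, not the spreadsheet logic actually used: each kit is described by a hypothetical root-to-terminal SNP path, "closest" is taken to mean sharing the longest leading run of SNPs, and the group modal is the fallback when no kit below the group root has a value.

```python
from collections import Counter


def shared_depth(path_a, path_b):
    """Number of leading SNPs two root-to-terminal paths have in common."""
    depth = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        depth += 1
    return depth


def fill_no_call(kit_id, marker, calls, paths, modal):
    """Infer a value for a no-call marker from the closest kits on the SNP tree.

    calls: {kit_id: {marker: int or None}}, paths: {kit_id: tuple of SNP names},
    modal: {marker: group modal value}. All names here are illustrative.
    """
    best_depth, candidates = 1, []  # depth 1 = only the group root in common
    for other, other_calls in calls.items():
        value = other_calls.get(marker)
        if other == kit_id or value is None:
            continue
        depth = shared_depth(paths[kit_id], paths[other])
        if depth > best_depth:
            best_depth, candidates = depth, [value]
        elif depth == best_depth and depth > 1:
            candidates.append(value)
    if candidates:
        # Most common value among the closest kits that reported the marker.
        return Counter(candidates).most_common(1)[0][0]
    return modal[marker]  # nothing closer than the group root: use the modal


# Hypothetical example with made-up kit, SNP and marker names.
paths = {
    "K1": ("ROOT", "A", "A1"),
    "K2": ("ROOT", "A", "A1"),
    "K3": ("ROOT", "A", "A2"),
    "K4": ("ROOT", "B"),
}
calls = {
    "K1": {"STR_X": None},
    "K2": {"STR_X": 12},
    "K3": {"STR_X": 11},
    "K4": {"STR_X": 11},
}
print(fill_no_call("K1", "STR_X", calls, paths, {"STR_X": 11}))
```

Here K1 takes its value from K2, which shares its terminal SNP, even though the group modal points the other way.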

MitchellSince1893, there is still a GD of 6 between your two kits. I checked the original data and all 6 are original differences! It's a little disconcerting that Big Y and FGC can be off by that many of the same markers, although it's really only just over 1% of the data.
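The "just over 1%" figure is simply 6 differing markers out of the 445 used for U152. A minimal Python sketch of that comparison follows; it counts differing markers rather than summing repeat-count steps, which is an assumption about how the difference is tallied, and the two tiny haplotypes are invented purely to exercise the function.

```python
def compare_kits(kit_a, kit_b):
    """Return (number of differing markers, number of markers compared).

    Only markers with a value in both kits are compared.
    """
    shared = [
        m for m in kit_a
        if m in kit_b and kit_a[m] is not None and kit_b[m] is not None
    ]
    differing = [m for m in shared if kit_a[m] != kit_b[m]]
    return len(differing), len(shared)


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


# Made-up haplotypes for two tests of the same man.
test_1 = {"STR_A": 13, "STR_B": 24, "STR_C": 11, "STR_D": None}
test_2 = {"STR_A": 13, "STR_B": 25, "STR_C": 11, "STR_D": 30}
print(compare_kits(test_1, test_2))

# The figures from this thread: 6 differences across 445 markers.
print(f"{percent(6, 445):.2f}% of markers differ")
```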

FYI the reason not all the U152 subclades are listed in the output is because not all of the kits list all of the SNPs. The program only reports the SNPs that it's given. I did clean up the <SNP1 SNP1> designations in this version though; that's because I originally told it all the terminal SNPs were SNP+ and I should have said they were SNP* (i.e. negative for ALL downstream SNPs). If anyone cares I'll explain that in more detail but in this output it should look right.

Output as Picture (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20run%202%20output%20picture.png)

Output as Table (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20run%202%20output%20table.html)

Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20U152%20445%20STRs/YFull%20U152%20run%202%20run%20tables.html) (modal haplotypes, etc).

I'll run DF1 V2 tomorrow. For anyone else who would like a run, I just need the Excel table of STRs from the YFull Group for the subclade. It's probably easier if you can send it to me (davevance01@gmail.com) than if I get added to everyone's YFull Group ;).

Osiris
02-02-2017, 02:41 AM
It's very interesting. Thank you for the effort. FYI, in the U152 table you have a neat comparison between an early tester and later testers. I'm YF1527 and I ordered within the first few weeks it was available. My father's brother is 5800 and a male-line 4th cousin is 5811. They both ordered after it had been out for over a year. Interestingly, it puts my uncle and the 4th cousin in a late clade and me related further out. So maybe these are just poorly performing strings?

RobertCasey
02-02-2017, 03:57 AM
Here a summary of mutation rates from the academic world and what they have tested extensively (around 200 STRs included). This article was a consolidation of ten or twenty other papers on mutation rates (sorry, this is only the summary chart and does not include the entire article):

http://www.rcasey.net/DNA/Casey/Sources/Mutation_Rates_Burgarella_2010.pdf

Dave-V
02-02-2017, 04:06 AM
It's very interesting. Thank you for the effort. FYI, in the U152 table you have a neat comparison between an early tester and later testers. I'm YF1527 and I ordered within the first few weeks it was available. My father's brother is 5800 and a male-line 4th cousin is 5811. They both ordered after it had been out for over a year. Interestingly, it puts my uncle and the 4th cousin in a late clade and me related further out. So maybe these are just poorly performing strings?

Not sure about the first run, but in the second run I posted just above, YF01527 and YF05800 are together with YF05811 on the next tier out, about what you'd expect for those relationships.

MitchellSince1893
02-02-2017, 04:42 AM
This run seems much better...at least when comparing my two kits...as you said, only 6 out of 446 STRs differ now. A few differences are to be expected, as STRs derived from the BigY and FGC SNP tests are not as accurate as actual STR tests.

Osiris
02-02-2017, 06:52 AM
Not sure about the first run, but in the second run I posted just above, YF01527 and YF05800 are together with YF05811 on the next tier out, about what you'd expect for those relationships.

That did it. Thank you very much.

Dave-V
02-04-2017, 07:58 PM
Ok; I finished up L21 and DF1 using the triangulation of nearby kits for no-call markers rather than modals. Here are the results for both - with 443 STRs for L21, and 451 STRs for DF1. Modal haplotypes for the SNPs under each are in the Run Tables (after the kit data).

L21:
Output as Picture (PNG) (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20run%202%20output%20picture.png)

Output as Table (HTML) (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20run%202%20output%20table.html)

Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20L21%20439%20STRs/YFull%20L21%20run%202%20run%20tables.html) (modal haplotypes, etc).


DF1:
Output as Picture (PNG) (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20run%202%20output%20picture.png)

Output as Table (HTML) (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20run%202%20output%20table.html)

Run Tables (https://dl.dropboxusercontent.com/u/106196821/SAPP%20DF1%20451%20STRs/YFull%20DF1%20run%202%20run%20tables.html) (modal haplotypes, etc).

Mikewww
02-04-2017, 08:27 PM
Great work, Dave. Do you mind if we post these links on DF1/L513 on the L513 yahoo group? Are you ready to answer questions about this?

I know folks will ask if we need to do YFull interpretations. I don't have a strong answer other than that I definitely think there is NOT any harm in it and it's only $50. If you feel like these STR calls look reliable, that's important.

Dave-V
02-04-2017, 09:36 PM
Great work, Dave. Do you mind if we post these links on DF1/L513 on the L513 yahoo group? Are you ready to answer questions about this?

I know folks will ask if we need to do YFull interpretations. I don't have a strong answer other than that I definitely think there is NOT any harm in it and it's only $50. If you feel like these STR calls look reliable, that's important.

I'm fine if you want to post them and I can answer questions. Correct answers may cost a little more...

For me this exercise is really more about testing if we - in the absence of commercial tools and databases for it - can really make use of the STRs from NGS testing. lgmayka pointed out that YFull is starting to offer interpretive data linking STR signatures to SNPs, but it's not yet really in usable form.

I think with this kind of analysis we can say we CAN make sense out of the "500-STR" (or 400-some at least) database at YFull with today's capabilities, although we shouldn't fool anyone that it's as straightforward as, for example, what you've made in L21 and L513 for the 67 and 111 marker databases. One of the main issues (as shown so well in MitchellSince1893's case) is reliability, but if we assume reliability is about 98% (just from that one data point, so we really need more) then it's still worth pursuing. We just need to keep signatures to more STRs (8-10 or more, probably) until the reliability is tested. The other issue is just the practical effort of handling that volume of data (would YOU like to build a version of your L21/L513 spreadsheets for these? :)). This kind of triage is a start at that part.
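To put rough numbers on the "98% reliable, so use 8-10 STRs" idea, here is a back-of-the-envelope Python sketch. It assumes each marker call is independently correct with probability 0.98, which is a simplification (bad calls may well cluster on particular markers), and the function name is made up for the example.

```python
from math import comb


def prob_at_most_errors(k, max_errors, reliability=0.98):
    """Chance that a k-marker signature has at most `max_errors` miscalls,
    assuming independent calls with the given per-marker reliability."""
    err = 1.0 - reliability
    return sum(
        comb(k, i) * err**i * reliability ** (k - i)
        for i in range(max_errors + 1)
    )


for k in (2, 5, 10):
    clean = prob_at_most_errors(k, 0)
    near = prob_at_most_errors(k, 1)
    print(f"{k:2d} markers: all correct {clean:.3f}, at most one miscall {near:.3f}")
```

Under these assumptions a longer signature is more likely to contain a miscall somewhere, but it also leaves room to tolerate one and still recognize the pattern, which is the practical argument for not relying on only a couple of markers.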

We need to be prepared for the questions that boil down to "why bother looking at NGS STR data if you have SNPs?". It's another area where "your mileage will vary" since it won't help everyone. IMO there's one clear-cut instance where you want to look at these and that's for larger haplogroups with significant branching within genealogical time, where even current NGS testing can't define the near-term SNP tree well enough (either for lack of NGS testing or lack of identified SNPs). In that situation there's real value to be mined from 400-500 STRs (if they're mostly reliable).

Mikewww
02-05-2017, 09:55 PM
...
For me this exercise is really more about testing if we - in the absence of commercial tools and databases for it - can really make use of the STRs from NGS testing.
I appreciate this Dave. Strong patterns of STRs can have very high probabilities and all of those STRs are worth investigating.


...
I think with this kind of analysis we can say we CAN make sense out of the "500-STR" (or 400-some at least) database at YFull with today's capabilities; although we shouldn't fool anyone that it's as straightforward as, for example, you've made in L21 and L513 for the 67 and 111 marker databases.
Agreed, FTDNA is probably fooling around with this stuff. If they can figure out how to make any new STRs an enhancement to the existing straightforward (almost no no-calls) matching levels, it could be useful.


... The other issue is just the practical effort of handling that volume of data (would YOU like to build a version of your L21/L513 spreadsheets for these? :)). This kind of triage is a start at that part.
Yes, but I need to get a set of data and see how hole-ly it is.


We need to be prepared for the questions that boil down to "why bother looking at NGS STR data if you have SNPs?".
I've been thinking about this for a while, a dangerous thing. The straightforward use case is for the NGS committed clusters, so much so that we are seeing one NGS per MDKA and we are reaching most youthful haplogroup levels across the MDKAs. In other words, the SNPs are running out of gas in these NGS committed clusters. In this case the extra STRs can provide an STR signature that differentiates further, unfolding the leaves at the ends of the twigs (SNPs). In some cases 111 STRs may be doing this, but there are generally not enough signature STRs in 111 out beyond the twigs (SNPs). You can't rely on just one STR, I don't think. Maybe two is okay for a brief timeframe, but of course three, four or five is always better, which the 400 STRs may afford.

However, I don't think the straight-forward use case is that promising. NGS testing is just too expensive for this so I don't think we can expect to see that many NGS tests available down truly to full MDKA coverage levels.

I think another use case is more promising. The #2 use case would be to see the potential STR signatures driven out by analyzing the extra STRs and then for a vendor to produce a new STR panel. I am not smart enough to develop criteria for such a panel. I think we'd want a series of non multi-copy fast to medium STRs that can be extracted fairly reliably from the Big Y regions. In this use case, the non-NGS testers can play. They do just two things .... get their STR panels including the new one and test for their expected target most youthful (terminal) shared SNP. If that fails, the backup one SNP higher on the tree.

Some will argue and say that this use case #2 is crazy because 1-111 STRs is expensive. However, the state of the existing STR testers is such that many have already moved up to 67 and even 111. What if there were 150,000 67 STR people out there already? What if a third of those were already at 111? That's a lot of folks who could inexpensively take advantage of a new STR panel. The latent testers out there at 37 STRs may also have the budgetary reach to play.

dcissell
04-27-2017, 05:35 PM
I have not been able to access the output results/tables. Are they still available somewhere?

Thanks.

Dave-V
04-28-2017, 01:02 AM
I have not been able to access the output results/tables. Are they still available somewhere?

Thanks.

That's because Dropbox eliminated public folders. I have copied the files to Google Drive and you can find them here (each one opens a Google Drive folder):

DF1 (https://drive.google.com/open?id=0B1oWf7A5py4AUHhsMklGT2J3cEU)

L21 (https://drive.google.com/open?id=0B1oWf7A5py4AZ3JpS2R2LWx0UTQ)

U152 (https://drive.google.com/open?id=0B1oWf7A5py4Ab2RwbmdGUlF1ck0)

DF27 (https://drive.google.com/open?id=0B1oWf7A5py4AcU1QS2o0dnc4aHM)


Dave

Cofgene
04-28-2017, 10:30 AM
Would be interested in seeing this done for the U106 region.

Dave-V
05-16-2017, 04:16 PM
Would be interested in seeing this done for the U106 region.

Here's a run based on the U106 group's 213 kits currently showing in YFull. The files are located here (Google Drive): https://drive.google.com/open?id=0B1oWf7A5py4ANWlhMldkSkNqdzQ.

This was based on 445 STRs in the group that were more than 50% called and so could be reliably used to infer values based on close kits in the tree.

Again the files are as follows:

- The "output picture" is a PNG file plotting the mutation history tree with the U106 MRCA at the top; yellow boxes are kits ("leaf nodes") and blue boxes are branching points. The blue boxes are numbered just for reference; yellow boxes show the kit names. At each branching point or kit, the STRs that mutated along those branches are shown above and to the right of the branching point (normally SAPP puts them just above but I've moved them over so they don't cover the boxes). Also, the branching points that correspond to MRCAs for SNPs below U106 are marked with those SNP names in blue.

- The "output table" is the same tree but shown in table form instead of as a picture. Especially when the picture is unreadable or too large to process, the table makes a useful reference.

- The "run tables" show the in-progress output from the SAPP run that produced the picture (or table). This is where tables of genetic distance between kits are produced as well as modal haplotypes for each of the SNPs under U106.

Those are the outputs from the analysis. The other file I've included in the Google Drive folder is the U106 SNP tree that SAPP uses to guide its STR mutation history tree. The U106 SNP tree is taken from YFull's 5.03 tree and looks like the following (with YFull's TMRCAs included where reported). Note - this SNP tree is purposely formatted to fit on one page so it's pretty busy. If you can't enlarge the one below enough to read it, the folder I listed above has a PDF version that should be workable.

http://www.anthrogenica.com/attachment.php?attachmentid=16069&stc=1

Dave

Cofgene
05-16-2017, 04:39 PM
Thanks!!!!! We will chew on this in U106 land.