Thread: Y-111 > 600+?

  1. #1
    Registered Users
    Posts
    574

    Y-111 > 600+?

    What’s your opinion on this real-life comparison? All four people have completed Big Y-700.

    Scenario 1: Person A and Person B have a genetic distance of 4 at 111 markers. Person A has 1 Variant (Unnamed) since their common patrilineal ancestor. Person B has 4 Variants (1 Named and 3 Unnamed) since their common patrilineal ancestor.

    Scenario 2: Person C and Person D both have 4 Unnamed Variants since their common patrilineal ancestor and they have a genetic distance of 8 at 111 markers.

    This evidence leads me to conclude that the patrilineal relationship for scenario 1 is closer. However, there are 6/633 Big Y STR differences for scenario 1 and only 2/651 for scenario 2. Am I to conclude that Y-111 is more accurate than 600+ STRs or that Scenario 2 has a closer patrilineal relationship?
    I1> DF29> Z58> Z59> Z2041> Z2040> Z382> S26361> S16414> FGC24354> FGC24357> FGC24356> S10350> FGC75802> Y125947> S21197> BY149414> BY188003> BY188570

    YFull id: YF15884

  2. #2
    Registered Users
    Posts
    36
    Sex
    mtDNA (M)
    T1a1j w A16194c
    Y-DNA (P)
    N-Y85063 (Y7300+)

    Sweden
    Scenario 1 has 4 + 6 = 10 differences in total (the genetic distance at Y-111 plus the differences in the additional STRs from the Big Y test).
    Scenario 2 has 8 + 2 = 10 differences in total.

    Just something to keep in mind.
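
    The totals above can be checked with a short Python sketch. The figures (111 markers plus 633 or 651 additional Big Y STRs) come from the first post; the class and method names are made up for this example and are not part of any tool. It also assumes that the genetic distance at 111 markers can simply be added to the count of differences in the extra STRs, which is how the totals above are formed.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """STR comparison between two Big Y-700 testers (figures from the first post)."""
    name: str
    gd_111: int          # genetic distance at the first 111 markers
    extra_diffs: int     # differences among the additional Big Y STRs
    extra_compared: int  # number of additional Big Y STRs compared

    def total_differences(self) -> int:
        # Assumption: Y-111 genetic distance and extra STR differences are additive
        return self.gd_111 + self.extra_diffs

    def total_compared(self) -> int:
        return 111 + self.extra_compared


scenarios = [
    Comparison("Scenario 1", gd_111=4, extra_diffs=6, extra_compared=633),
    Comparison("Scenario 2", gd_111=8, extra_diffs=2, extra_compared=651),
]

for s in scenarios:
    print(f"{s.name}: {s.total_differences()} differences "
          f"across {s.total_compared()} markers")
```

    Both scenarios come out at 10 differences, over 744 and 762 compared markers respectively, so the raw totals alone do not separate them.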

  3. The Following User Says Thank You to BroderTuck For This Useful Post:

     mwauthy (06-28-2019)

  4. #3
    Registered Users
    Posts
    248
    Sex
    Location
    USA
    Nationality
    USA
    Y-DNA (P)
    R1b-L21 L513*

    United States of America Ireland Germany Belgium Wallonia
    Based just on the “usual” averaging process for SNP age estimation, scenario 1 would be estimated as the closer relationship via SNPs since the average of the 2 lines is 2.5 private variants compared to 4 private variants for scenario 2.

    In reality, however, either could be closer with only 2 data points. Both SNPs and STRs regularly deviate from statistical norms, sometimes in opposite directions, so either estimate could be correct or both could be wrong.

    If these are real scenarios, the “next level down” of analysis for STRs would be to look at the individual marker mutations and assess fast vs. slow markers, etc., to get a better idea of the common ancestor's age. Either Colin Ferguson’s adaptation of the McGee utility (using the Group option) or SAPP can do that analysis on the individual markers.

    With SNPs, the “next level down” would be to look at the base positions for the private variants and apply an age estimate using anyone’s quality region definition (usually combBed, McDonald, or Poznik).
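
    As a rough illustration of the averaging described at the start of this post, the following Python sketch averages the private variant counts for the two lines in each scenario. The counts are the ones given in the first post; the function name is made up for this example. Turning the averages into years would need a per-SNP rate and a coverage figure, neither of which is given in this thread.

```python
from statistics import mean

# Private variants on each line since the common patrilineal ancestor
private_variants = {
    "Scenario 1": [1, 4],  # Person A, Person B
    "Scenario 2": [4, 4],  # Person C, Person D
}


def average_private_variants(counts):
    """Average the private variant counts across the lines below a shared ancestor."""
    return mean(counts)


averages = {
    name: average_private_variants(counts)
    for name, counts in private_variants.items()
}

for name, avg in averages.items():
    print(f"{name}: {avg} private variants on average")
```

    With only two lines per scenario, each average rests on two numbers, which is why the estimate can easily be off in either direction.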

  5. The Following 2 Users Say Thank You to Dave-V For This Useful Post:

     JMcB (06-28-2019),  mwauthy (06-28-2019)

  7. #5
    Registered Users
    Posts
    574

    Thanks for the response! You’ve confirmed my suspicion that Scenario 2 is not necessarily a closer patrilineal relationship just because it shows fewer differences in the 600+ STRs.

    Is there a list somewhere of which of those 600+ STRs have faster mutation rates? Are there enough Big Y-700 results out so far to determine the varying mutation rates for these extra 600+ STRs? Since you’re the creator of the SAPP tool, have you already been able to adjust it based on this Big Y-700 variability, or are you still waiting for more data?

  8. The Following User Says Thank You to mwauthy For This Useful Post:

     JMcB (06-28-2019)

  9. #6
    Registered Users
    Posts
    248
    Sex
    Location
    USA
    Nationality
    USA
    Y-DNA (P)
    R1b-L21 L513*

    United States of America Ireland Germany Belgium Wallonia
    Yes to ALL of these (including "yes I'm waiting for more data").

    There are published rates for the first 111 markers (actually several sets of rates from many different studies), but no published rates yet for any of the additional STR markers found in the Y500 or Y700 Big Ys. FTDNA has not provided either locations or estimated rates for those additional STRs, and to my knowledge they haven't been covered by any study. However, based on collected sets of Y500 and Y700 data, we've been able to run "field estimates" of the rates, at least for markers that have had mutations in the data.

    FTDNA HAS confirmed that the "Panel 6" STRs (markers 112-561, from Y500) were chosen in part for their extreme stability, i.e. VERY low mutation rates, while the "Panel 7" STRs (markers 562-838, from Y700) have variability closer to that of the first 111. Field testing bears that out and suggests that the Panel 7 (Y700) markers have average variability comparable to the slower half of the first 111.

    SAPP already has mutation rate estimates for all 838 markers built into the tool based on the field testing, with an assumed rate of 0.000001 for the many Panel 6 markers that have not had sufficient mutations to estimate, and an assumed rate of 0.000235 for the many Panel 7 markers that have not had sufficient mutations to estimate. Those appear to provide good results for branching and age estimation purposes, but they're of course just placeholders and we need more data to get accurate rate estimates.
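
    To see why a handful of differences in the 600+ extra STRs says little on its own, here is a minimal Python sketch of the expected number of differences per panel. It assumes, purely for illustration, that every Panel 6 and Panel 7 marker mutates at the placeholder rate quoted above, that those rates are per marker per generation, and that mutations accumulate independently on both lines. None of this is how SAPP itself is implemented, and the 20 generations used in the example is an arbitrary figure.

```python
# Marker ranges from the post: Panel 6 = markers 112-561, Panel 7 = markers 562-838.
# Rates are the placeholder values from the post, ASSUMED here to be
# per marker per generation and applied to every marker in the panel.
PANELS = {
    "Panel 6": (112, 561, 0.000001),
    "Panel 7": (562, 838, 0.000235),
}


def marker_count(first: int, last: int) -> int:
    """Number of markers in an inclusive range."""
    return last - first + 1


def expected_differences(generations: int) -> dict:
    """Expected differences between two men who each descend `generations`
    generations from the common ancestor (2 * generations transmissions)."""
    result = {}
    for panel, (first, last, rate) in PANELS.items():
        result[panel] = 2 * generations * marker_count(first, last) * rate
    return result


# Hypothetical example: common ancestor 20 generations back on each line
for panel, value in expected_differences(20).items():
    print(f"{panel}: about {value:.2f} expected differences")
```

    Under these assumptions almost all of the expected differences come from the 277 Panel 7 markers, so the 450 Panel 6 markers add very little resolving power despite their number.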

    There is a spreadsheet with these rates here. That file actually has 7 collections of rates for the first 111 from the various studies, and the rates for the 112-838 markers collected from field testing.

    As I said, SAPP has these rates built in and will use them for branching and age estimation. But since FTDNA doesn't provide an easy report of all 838 markers, it can be a pain to collect all the additional markers for a group of kits into the required SAPP TXT file format. The easiest approach is probably to download the CSV files of the STR Results for all the kits in the group and put them into the "CSVAnalysis" tool on the SAPP website, which will create the TXT file for you with all the kits and their up to 838 markers. See the FAQ page for a link to a video explaining how to use the CSVAnalysis tool if needed. Note that you CAN run it on only the CSVs containing STR markers and then just use the TXT file that the run produces.

  10. The Following 4 Users Say Thank You to Dave-V For This Useful Post:

     JMcB (06-28-2019),  mwauthy (06-28-2019),  rms2 (07-07-2019),  tamandua (06-29-2019)

  11. #7
    Registered Users
    Posts
    574

    Thanks! You’ve been very helpful!

  12. The Following 2 Users Say Thank You to mwauthy For This Useful Post:

     Dave-V (06-28-2019),  JMcB (06-28-2019)
