Results 501 to 510 of 519

Thread: Initial Upper Palaeolithic Homo sapiens from Bacho Kiro Cave, Bulgaria

  1. #501
    Registered Users
    Posts
    55
    Sex

    Mmh, you are right, just tried the same command above but with maxmiss=1, and indeed the discrepancy pops up
    From path:
    South_Africa_2000BP.SG Belgium_UP_GoyetQ116_1_published_all China_Tianyuan Mongolia_East_N -0.00108850519442685 0.000680976107641864 -1.5984484363133 0.109943212333304
    From precomp:
    South_Africa_2000BP.SG Belgium_UP_GoyetQ116_1_published_all China_Tianyuan Mongolia_East_N 0.00719661806190947 0.00109888868109211 6.54899644134767 5.79250117164811e-11
    I even updated admixtools and re-ran both extract_f2 and f2_from_precomp and this still happens, BUT, if I add the option afprod=T in f2_from_precomp, the problem seems fixed
    Code:
    f2_from_precomp("../data/admixtools_f2s/test", afprod=T)
    South_Africa_2000BP.SG Belgium_UP_GoyetQ116_1_published_all China_Tianyuan Mongolia_East_N -0.00108850519442685 0.000680976107641864 -1.5984484363133 0.109943212333304
    from the reference:
    afprod
    Return negative average allele frequency products instead of f2 estimates. This will result in more precise f4-statistics when the original data had large amounts of missingness, and should be used in that case for qpdstat and qpadm
    I guess one would expect f2_from_precomp to do this automatically if extract_f2 was run with maxmiss != 0
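
    For anyone comparing the two rows: the last two columns look like the estimate divided by its standard error, plus a two-sided normal p-value. Here is a quick sanity check in plain Python. This is not admixtools code; the function name is made up for the example, and the p-value formula is an assumption that appears to reproduce the printed values.

```python
import math

def z_and_p(est, se):
    """Z-score and two-sided p-value, assuming a standard normal null."""
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Estimates and standard errors copied from the two output rows above.
z_path, p_path = z_and_p(-0.00108850519442685, 0.000680976107641864)
z_pre, p_pre = z_and_p(0.00719661806190947, 0.00109888868109211)

print(f"from path:    z = {z_path:.4f}, p = {p_path:.4g}")
print(f"from precomp: z = {z_pre:.4f}, p = {p_pre:.4g}")
```

    So the two runs differ in the estimate itself (sign and size), not just in how the significance is computed.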

  2. The Following User Says Thank You to mauors For This Useful Post:

     Kale (05-06-2021)

  3. #502
    Registered Users
    Posts
    1,971
    Sex
    Omitted

    Here's an informative correspondence I got from Dr. Maier on the matter.
     

    Hi Kale,

    yes, I think you're right about this, it has to do with the maxmiss variable, and more generally, with the selection of SNPs.

    I have written some of this up here: https://uqrmaie1.github.io/admixtool....html#biases-1

    The short version is this: maxmiss = 0 is the default option, which is most conservative, but also uses the smallest number of SNPs. You can set maxmiss to higher values to retain more SNPs when extracting data for many populations, but often it's better to just extract data for fewer populations at a time.

    The reason why maxmiss > 0 can cause problems is that it is generally not the case that SNPs that are missing in some populations are just a random subset of all SNPs. After I realized that this is in fact often a problem in practice, I set the default to maxmiss = 0, and changed the documentation to advise against using maxmiss > 0. I don't want to take the option out completely, because in many cases this can improve power without introducing bias. It's just not easy to know when it's a good idea to use this and when it isn't.

    So what's the difference between option 1 (f2_from_precomp) and option 2 (A directory which contains pre-computed f2-statistics)?

    When there is no missing data (or when you set maxmiss = 0), f4 is exactly equal to the sum of 4 f2-statistics, and also to the sum of the 4 average allele frequency products of those population pairs. When some of the data is missing (or when maxmiss > 0), this is still true in expectation, but it's not exactly true anymore. It turns out that in the case of having missing data, the sum of 4 average allele frequency products is a better approximation than the sum of 4 f2-statistics.
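
    [Illustration, not part of the email] A rough sketch of the identity described above, in plain Python with simulated allele frequencies. Everything here is an assumption made for the example: the population labels A to D, the missingness pattern, and the helper names. It also ignores the finite-sample bias corrections that the real tools apply. With complete data the three routes to f4 agree up to floating-point error. Once each pair is averaged over its own set of non-missing SNPs they no longer agree exactly, and the toy data say nothing about which route is closer in practice.

```python
import random

def shared(freqs, pops):
    """Indices of SNPs that are non-missing in every listed population."""
    n = len(freqs[pops[0]])
    return [i for i in range(n) if all(freqs[p][i] is not None for p in pops)]

def f2(freqs, x, y):
    """Mean squared frequency difference over SNPs present in both x and y."""
    idx = shared(freqs, [x, y])
    return sum((freqs[x][i] - freqs[y][i]) ** 2 for i in idx) / len(idx)

def afprod(freqs, x, y):
    """Mean allele frequency product over SNPs present in both x and y."""
    idx = shared(freqs, [x, y])
    return sum(freqs[x][i] * freqs[y][i] for i in idx) / len(idx)

def f4_direct(freqs, a, b, c, d):
    """f4 computed per SNP on SNPs present in all four populations."""
    idx = shared(freqs, [a, b, c, d])
    return sum((freqs[a][i] - freqs[b][i]) * (freqs[c][i] - freqs[d][i])
               for i in idx) / len(idx)

def f4_from_f2(freqs, a, b, c, d):
    """Signed half-sum of four pairwise f2 values."""
    return 0.5 * (f2(freqs, a, d) + f2(freqs, b, c)
                  - f2(freqs, a, c) - f2(freqs, b, d))

def f4_from_afprod(freqs, a, b, c, d):
    """Signed sum of four pairwise allele frequency products."""
    return (afprod(freqs, a, c) + afprod(freqs, b, d)
            - afprod(freqs, a, d) - afprod(freqs, b, c))

random.seed(1)
n_snps = 2000
complete = {p: [random.random() for _ in range(n_snps)] for p in "ABCD"}

# Copy the data and knock out values: randomly in A, non-randomly in D.
missing = {p: list(v) for p, v in complete.items()}
for i in range(n_snps):
    if random.random() < 0.2:
        missing["A"][i] = None
    if missing["D"][i] < 0.3 and random.random() < 0.8:
        missing["D"][i] = None

for label, data in (("complete", complete), ("with missing data", missing)):
    print(label,
          f4_direct(data, "A", "B", "C", "D"),
          f4_from_f2(data, "A", "B", "C", "D"),
          f4_from_afprod(data, "A", "B", "C", "D"))
```

    In the f2 route the squared-frequency terms only cancel exactly when every pair is averaged over the same SNPs. The product route has no such terms to cancel.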

    extract_f2() computes 3 different things for each population pair: f2, average allele frequency products, and fst (fst is a recent addition).

    When you run f2_from_precomp(), it will read the f2-statistics by default (because it doesn't know what you want to do with the output, and you can't use average allele frequency products to compute f3, for example). But when you run f4/qpdstat (or qpadm, which uses f4-stats), it will instead read the average allele frequency products, because they give better approximations of f4 in cases where maxmiss was set to > 0 when running extract_f2().

    You could get the same behavior by running f2_from_precomp(..., afprod = TRUE), and then plugging that into the f4/qpdstat function.

    There is another small difference between how the f2-stats and the average allele frequency products are computed, which is that SNPs which have exactly the same allele frequency in all populations will be ignored in one case, but not the other. This can change the number of SNPs used and to some extent the estimate and SE, but it will barely affect the Z-score. The reason for this is that it makes it easier to exactly match the output of various ADMIXTOOLS 1 programs, which sometimes do and sometimes don't count those SNPs.

    I wish this was all a bit simpler and less confusing, but this was the best I came up with in trying to make things both as accurate as possible, and match the results of the original ADMIXTOOLS. Many of these complications disappear when there either is little missing data, or when you keep maxmiss = 0. Even with maxmiss = 0, the number of SNPs can differ depending on which populations you choose, and sometimes even that can make a difference! But all in all, you'll be better off with setting maxmiss = 0. If you lose too many SNPs this way, you can either reduce the number of populations, or always compute f-stats from scratch, by passing the genotype file prefix as the first argument to f4. This will select the largest number of SNPs that are non-missing in all 4 populations of every f4-stat.
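
    [Illustration, not part of the email] A small sketch of why extracting fewer populations, or computing each f4 straight from the genotype files, keeps more SNPs. It is plain Python with made-up population labels and missingness rates. It assumes that maxmiss = 0 keeps only SNPs with data in every extracted population, and it does not call admixtools at all.

```python
import random

random.seed(7)
n_snps = 10_000

# Hypothetical per-population missingness rates, invented for this example.
miss_rate = {"Out": 0.0, "P1": 0.1, "P2": 0.3, "P3": 0.5, "P4": 0.6, "P5": 0.7}

# present[pop][i] is True when SNP i has data in that population.
present = {pop: [random.random() >= rate for _ in range(n_snps)]
           for pop, rate in miss_rate.items()}

def n_shared(pop_names):
    """Number of SNPs with data in every population in pop_names."""
    return sum(all(present[p][i] for p in pop_names) for i in range(n_snps))

all_pops = list(miss_rate)
quad = ["Out", "P1", "P2", "P3"]

n_all = n_shared(all_pops)   # SNPs kept when all six pops must be complete
n_quad = n_shared(quad)      # SNPs kept when only this one f4 is considered

print(f"non-missing in all {len(all_pops)} populations: {n_all}")
print(f"non-missing in the 4 populations of one f4: {n_quad}")
```

    The second count can never be smaller than the first, because every SNP that is complete across all six populations is also complete in any four of them.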


    UPDATE:
    Further correspondence...
     

    Yes, if you set afprod = TRUE, you're not getting f2-statistics, and if you try to compute f3 from it, the results might be wrong. This is different from f4, where you can use either f2 or allele frequency products. I attached a plot that shows the difference between regular f3-stats computed from f2, and pseudo-f3-stats computed from allele frequency products. There is little difference for f3-stats where the first population is not Mbuti or Ust Ishim, but if the first population is Mbuti or Ust Ishim (in my example), what you get is very different.

    For qpgraph, this actually matters very little in my experience, but I wouldn't use afprod = TRUE when interpreting f3-stats directly. And I would be very careful in general with using maxmiss > 0!
    Last edited by Kale; 05-06-2021 at 04:22 PM.
    Collection of 14,000 d-stats: Hidden Content Part 2: Hidden Content Part 3: Hidden Content PM me for d-stats, qpadm, qpgraph, or f3-outgroup nmonte models.

  4. The Following 7 Users Say Thank You to Kale For This Useful Post:

     Helen (06-03-2021),  Jatt1 (05-07-2021),  kolompar (05-06-2021),  mauors (05-06-2021),  Megalophias (05-18-2021),  Nganasankhan (05-06-2021),  Ryukendo (05-21-2021)

  5. #503
    Registered Users
    Posts
    206
    Sex

    Quote Originally Posted by Kale View Post
    outgroup                pop1                      pop2        pop3         d-value    std.err.  z-score
    South_Africa_2000BP.SG  Onge.DG                   Yana_UP.SG  GoyetQ116_1  -0.00185   0.000588  -3.15
    South_Africa_2000BP.SG  Tianyuan                  Yana_UP.SG  GoyetQ116_1  -0.00129   0.0007    -1.84
    South_Africa_2000BP.SG  Japan_Jomon               Yana_UP.SG  GoyetQ116_1  0.00171    0.000635  2.7
    South_Africa_2000BP.SG  PrimorskyKrai_Boisman_MN  Yana_UP.SG  GoyetQ116_1  0.00154    0.000543  2.84
    South_Africa_2000BP.SG  Mongolia_N_East           Yana_UP.SG  GoyetQ116_1  0.00373    0.000575  6.48

    Mbuti.DG                Onge.DG                   Yana_UP.SG  GoyetQ116_1  -0.00123   0.000561  -2.19
    Mbuti.DG                Tianyuan                  Yana_UP.SG  GoyetQ116_1  -0.000665  0.000673  -0.988
    Mbuti.DG                Japan_Jomon               Yana_UP.SG  GoyetQ116_1  0.00233    0.000599  3.9
    Mbuti.DG                PrimorskyKrai_Boisman_MN  Yana_UP.SG  GoyetQ116_1  0.00216    0.000514  4.21
    Mbuti.DG                Mongolia_N_East           Yana_UP.SG  GoyetQ116_1  0.00435    0.000555  7.84

    Chimp.REF               Onge.DG                   Yana_UP.SG  GoyetQ116_1  -0.0012    0.000652  -1.83
    Chimp.REF               Tianyuan                  Yana_UP.SG  GoyetQ116_1  -0.00063   0.000754  -0.836
    Chimp.REF               Japan_Jomon               Yana_UP.SG  GoyetQ116_1  0.00237    0.000697  3.4
    Chimp.REF               PrimorskyKrai_Boisman_MN  Yana_UP.SG  GoyetQ116_1  0.0022     0.000611  3.6
    Chimp.REF               Mongolia_N_East           Yana_UP.SG  GoyetQ116_1  0.00439    0.000646  6.79

    Mongolia_N_East are the most severe offender I've found yet, that level of affinity is crazy, these don't even reach significance.
    South_Africa_2000BP.SG GoyetQ116_1 Yana_UP.SG Mongolia_N_East = z -1.88
    Mbuti.DG GoyetQ116_1 Yana_UP.SG Mongolia_N_East = z -1.57

    I need to run these stats on my admixtools1 machine, something seems off here.

    EDIT:
    Yeah ok with admixtools1...
    South_Africa_2000BP.SG Onge.DG Yana_UP.SG GoyetQ116_1 = z -2.1
    South_Africa_2000BP.SG Tianyuan Yana_UP.SG GoyetQ116_1 = z -1.0
    South_Africa_2000BP.SG Japan_Jomon Yana_UP.SG GoyetQ116_1 = z -1.1
    South_Africa_2000BP.SG PrimorskyKrai_Boisman_MN Yana_UP.SG GoyetQ116_1 = z -2.4
    South_Africa_2000BP.SG Mongolia_N_East Yana_UP.SG GoyetQ116_1 = z -2.0

    That makes more sense. I've narrowed down the problem to the function f2_from_precomp.

    Here's a d-stat using 'The prefix of genotype files' for data.
    South_Africa_2000BP.SG GoyetQ116_1 Tianyuan Mongolia_N_East -9.48e-4 6.40e-4 -1.48 0.139 660799
    And using 'A directory which contains pre-computed f2-statistics'
    South_Africa_2000BP.SG GoyetQ116_1 Tianyuan Mongolia_N_East -0.00131 0.000627 -2.10 0.0359

    Basically the same. Also matches results I've gotten using admixtools1 in Linux.

    But using 'f2_from_precomp'
    South_Africa_2000BP.SG GoyetQ116_1 Tianyuan Mongolia_N_East 0.00428 0.000666 6.43 1.28e-10
    Woa what happened here?

    Can anyone else replicate this?
    I am confused now haha. So is Goyet closer to Boisman compared to ANE or not? The stats with Mongolia_N_East seem contradictory.

    I think Loschbour might also help tease out the relationships here. There was one model where Loschbour contributed to Mongolia_N_East, so in this case the affinity could be explained as Goyet -> Loschbour -> Mongolia_N_East. But it could also be the reverse.

  6. #504
    Registered Users
    Posts
    1,971
    Sex
    Omitted

    Quote Originally Posted by Max_H View Post
    I am confused now haha. So is Goyet closer to Boisman compared to ANE or not? The stats with Mongolia_N_East seem contradictory.

    I think Loschbour might also help tease out the relationships here. There was one model where Loschbour contributed to Mongolia_N_East, so in this case the affinity could be explained as Goyet -> Loschbour -> Mongolia_N_East. But it could also be the reverse.
    I think that the Goyet <-> Mongolia_N connection is an artifact of admixtools2 specifically...

    Further correspondence from Dr. Maier.
    I understand now what's causing the differences. It has to do with how the bias correction is calculated in qp3Pop. To imitate that behavior exactly, it's necessary to read the data from genotype files directly. It will take me a few days to change the f3 and maybe qpgraph function in admixtools 2 so that they can use all SNPs available for each population triple and calculate the bias correction in the same way.

  7. The Following 3 Users Say Thank You to Kale For This Useful Post:

     Jatt1 (05-18-2021),  Max_H (05-20-2021),  tipirneni (05-18-2021)

  8. #505
    Registered Users
    Posts
    206
    Sex

    Quote Originally Posted by Kale View Post
    I think that the Goyet <-> Mongolia_N connection is an artifact of admixtools2 specifically...

    Further correspondence from Dr. Maier.
    Thanks.

    On that note, maybe you saw it already but this paper was recently posted on biorxiv. I think their qpGraph is very similar to the one you posted. Do you think this helps explain the Goyet-Mongolia_N-ANE connection? https://www.biorxiv.org/content/10.1...05.18.444621v1

    My only contention (as far as I can understand it) is the model for ANE. I think West Eurasian-like (UP) ancestry in Malta and Yana should be higher than only 50%, given that Tianyuan is supposed to be 100% IUP in their model, and in previous papers ANE populations are not 50% Tianyuan. Also, a small Ust Ishim contribution to Sunghir relative to Kostenki, as the Peştera Muierii paper suggested, could also explain why Kostenki appears more basal relative to Sunghir.

  9. The Following 2 Users Say Thank You to Max_H For This Useful Post:

     etrusco (05-20-2021),  Kale (05-25-2021)

  10. #506
    Registered Users
    Posts
    665
    Sex
    Ethnicity
    1/2 Italian, 1/2 Armenian
    Nationality
    USA
    Y-DNA (P)
    R1b-U152
    mtDNA (M)
    H5a

    So do we think Ust-Ishim, Bacho Kiro and Oase being East Eurasian is accurate?

  11. #507
    Registered Users
    Posts
    1,971
    Sex
    Omitted

    Quote Originally Posted by Max_H View Post
    Thanks.

    On that note, maybe you saw it already but this paper was recently posted on biorxiv. I think their qpGraph is very similar to the one you posted. Do you think this helps explain the Goyet-Mongolia_N-ANE connection? https://www.biorxiv.org/content/10.1...05.18.444621v1

    My only contention (as far as I can understand it) is the model for ANE. I think West Eurasian-like (UP) ancestry in Malta and Yana should be higher than only 50%, given that Tianyuan is supposed to be 100% IUP in their model, and in previous papers ANE populations are not 50% Tianyuan. Also, a small Ust Ishim contribution to Sunghir relative to Kostenki, as the Peştera Muierii paper suggested, could also explain why Kostenki appears more basal relative to Sunghir.
    I've had Yana/MA1 come out as ~50% Tianyuan in graphs before also, there's a lot of room for flexibility because we have so few ancient East Eurasians to constrain things.

  12. The Following 4 Users Say Thank You to Kale For This Useful Post:

     Helen (06-03-2021),  K33 (05-25-2021),  Max_H (05-29-2021),  parasar (05-25-2021)

  13. #508
    Registered Users
    Posts
    1,971
    Sex
    Omitted

    Testing something with qpadm that keeps popping up in qpgraph.

    BK1653
    GoyetQ116_1: 76.0% +/- 12.7%
    Sunghir.SG: 22.7% +/- 12.7%
    Neanderthal_Altai.DG: 1.38% +/- 0.47%
    Tail: 0.12
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')
    Pretty much the exact same percentages as in qpgraph.

    Then you have
    Gravettian
    GoyetQ116_1: 40.5% +/- 8.02%
    Sunghir.SG: 59.5% +/- 8.02%
    Tail: 0.82
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')

    Or with BK1653 in left and Goyet in right...
    Gravettian
    BK1653: 47.5% +/- 8.58%
    Sunghir.SG: 52.5% +/- 8.58%
    Tail: 0.42
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'GoyetQ116_1', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')

    Post-CI Europe a continuum between Goyet and Sunghir related pops?
    Last edited by Kale; 06-17-2021 at 08:09 AM.

  14. The Following 4 Users Say Thank You to Kale For This Useful Post:

     etrusco (06-17-2021),  Megalophias (06-18-2021),  Ryukendo (06-18-2021),  traject (06-18-2021)

  15. #509
    Registered Users
    Posts
    1,037
    Sex
    Location
    lombardy
    Nationality
    italian

    Quote Originally Posted by Kale View Post
    Testing something with qpadm that keeps popping up in qpgraph.

    BK1653
    GoyetQ116_1: 76.0% +/- 12.7%
    Sunghir.SG: 22.7% +/- 12.7%
    Neanderthal_Altai.DG: 1.38% +/- 0.47%
    Tail: 0.12
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')
    Pretty much the exact same percentages as in qpgraph.

    Then you have
    Gravettian
    GoyetQ116_1: 40.5% +/- 8.02%
    Sunghir.SG: 59.5% +/- 8.02%
    Tail: 0.82
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')

    Or with BK1653 in left and Goyet in right...
    Gravettian
    BK1653: 47.5% +/- 8.58%
    Sunghir.SG: 52.5% +/- 8.58%
    Tail: 0.42
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'Kostenki14', 'GoyetQ116_1', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')

    Post-CI Europe a continuum between Goyet and Sunghir related pops?
    Very interesting. How about Sunghir? What can it be modeled with?

  16. #510
    Registered Users
    Posts
    1,971
    Sex
    Omitted

    Works fine as just Kostenki14 with right pops...
    right = c('ZlatyKun.SG', 'Ust_Ishim.DG', 'BachoKiro_IUP', 'GoyetQ116_1', 'Yana_UP.SG', 'Onge.DG', 'Tianyuan_AR33K', 'Denisova.DG')

    Even adding BK1653 to the right doesn't break it.
    Adding Gravettian to the right obviously does, though.
    Last edited by Kale; 06-18-2021 at 05:00 AM.

  17. The Following 2 Users Say Thank You to Kale For This Useful Post:

     Megalophias (06-18-2021),  Ryukendo (06-18-2021)
