
Thread: Understanding Formal Statistics, f4 and D-Stats

  1. #11
    Banned
    Posts
    5,173
    Sex

    Quote Originally Posted by Kale View Post
    Sorry if I am missing something... but from the two charts at the top of your article, wouldn't it be more problematic to use Chimp?
    The Chimp excess alleles chart is basically the farther from English you get, the more excess alleles with chimp there are. That's basically ascertainment bias favoring English samples right?

    The Mbuti excess alleles chart looks like an inverse of what I'd imagine levels of archaic ancestry are in those populations.
    That doesn't seem like bias, but actual sharing within the human-node. To be fair it does seem a bit exaggerated though.
    Sorry, I should have clarified that biases against E/W/S Asians in D(Europeans, W/S Asians; Europeans, OUTGROUP) exist for BOTH Mbuti and Chimp. The biases exist due to ascertainment bias AND African admixture. The two factors do not necessarily depend on each other in how they act on the test populations.

    For the majority of W Asians the NET (Mbuti bias - Chimp bias) is positive. For S Asians and Oceanians it's negative. In other words, for W Asians it's better to use Chimp, whereas for E/S Asians and Oceanians it's better to use Mbuti. However, neither outgroup is ideal when comparing Europeans with E/W/S Asians and Oceanians.

    To better understand what I'm referring to, this is the table that was used to generate the barplots in the article:

    Columns: Mbuti (raw) = Excess_Alleles_Shared_With_MBUTI; [M] = Mbuti excess over English; [C] = Excess_Alleles_Shared_With_CHIMP over English; Net = net over English [M-C].

    Population   Mbuti (raw)   [M]    [C]    Net [M-C]
    Abkhasian    589           321    179    142
    Kurds        657           389    273    116
    Armenian     588           320    209    111
    Adygei       518           250    158    92
    Druze        722           454    392    62
    Georgian     583           315    269    46
    Pathan       660           392    348    44
    Sicilian     646           378    345    33
    Chechen      510           242    213    29
    Greek        442           174    166    8
    Ukrainian    311           43     35     8
    French       294           26     21     5
    English      268           0      0      0
    Iran_Fars    704           436    438    -2
    Jordanian    1059          791    800    -9
    Finnish      236           -32    3      -35
    Albanian     473           205    255    -50
    Belarusian   271           3      61     -58
    Estonian     210           -58    41     -99
    Kalash       419           151    269    -118
    GujaratiD    835           567    705    -138
    Scottish     234           -34    105    -139
    Saudi        894           626    772    -146
    Altaian      404           136    307    -171
    Balochi      553           285    494    -209
    Han          230           -38    391    -429
    Chukchi      0             -268   363    -631
    BedouinB     937           669    1400   -731
    Papuan       26            -242   4127   -4369

    Notice that for Abkhazians, Kurds, Adygei, etc., it's better to use Chimp as an outgroup when testing against Europeans, whereas for S Asians, E Asians, and Papuans it's better to use Mbuti. However, also notice that neither Mbuti nor Chimp is bias-free when testing E/S/W Asians against Europeans.
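    For anyone who wants to check the arithmetic behind the table, here is a minimal Python sketch. It assumes only what the table shows: [M] is the raw Mbuti count minus the English value, and the net is [M] minus [C]. The function name `net_bias` and the handful of rows picked are purely illustrative, not part of the original analysis.

```python
# A few rows copied from the table: (raw Mbuti excess, Chimp excess over English [C]).
ROWS = {
    "Abkhasian": (589, 179),
    "Kurds": (657, 273),
    "English": (268, 0),
    "Han": (230, 391),
    "BedouinB": (937, 1400),
    "Papuan": (26, 4127),
}


def net_bias(rows, baseline="English"):
    """Return {population: (M, C, M - C)} relative to the baseline population."""
    base_mbuti = rows[baseline][0]
    result = {}
    for pop, (mbuti_raw, chimp_excess) in rows.items():
        m = mbuti_raw - base_mbuti          # [M]: Mbuti excess over English
        result[pop] = (m, chimp_excess, m - chimp_excess)
    return result


for pop, (m, c, net) in net_bias(ROWS).items():
    print(f"{pop:10s} M={m:5d}  C={c:5d}  M-C={net:6d}")
```

    Under the reading given in the post, a positive net (Mbuti bias larger than Chimp bias) points toward Chimp as the less biased outgroup, and a negative net points toward Mbuti.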

    The plots are not exaggerated since I did not use normalized values here. When I do normalize, to highlight differences that are very small, I say so in the description.

    EDIT: To give an idea how bad it can get, take a look at Jordanians and Bedouin. For Bedouin, using Mbuti, there are 669 alleles shared with Mbuti to the exclusion of English, and using Chimp there are 1400 alleles shared with Chimp to the exclusion of English. Thus neither outgroup would work well for Bedouin or Jordanians.
    Last edited by Kurd; 09-09-2018 at 11:06 PM.


  3. #12
    Registered Users
    Posts
    1,430
    Sex
    Omitted

    Theoretically with full genomes, Jordanians & Bedouin should share a bit more with Mbuti than English do, by virtue of lower Neanderthal levels, just as Papuans share less with Mbuti than English because of excess Denisovan.
    That's almost certainly not enough to account for the discrepancy (that's what I meant by exaggerated in the last post, not the plot itself), but it would be interesting to know how much it contributes.

  4. #13
    Banned
    Posts
    5,173
    Sex

    Quote Originally Posted by Kale View Post
    Theoretically with full genomes, Jordanians & Bedouin should share a bit more with Mbuti than English do, by virtue of lower Neanderthal levels, just as Papuans share less with Mbuti than English because of excess Denisovan.
    That's almost certainly not enough to account for the discrepancy (that's what I meant by exaggerated in the last post, not the plot itself), but it would be interesting to know how much it contributes.
    I think most of the allele sharing is due to African <-> SW Asian introgression.

  5. #14
    Registered Users
    Posts
    317
    Sex

    Quote Originally Posted by Kale View Post
    Sorry if I am missing something... but from the two charts at the top of your article, wouldn't it be more problematic to use Chimp?
    The Chimp excess alleles chart is basically the farther from English you get, the more excess alleles with chimp there are. That's basically ascertainment bias favoring English samples right?
    There seems to be a lot of discussion here about how ascertainment will affect the relationship between the two outgroups (Mbuti, Chimp) and different modern populations, but not much so far on how it will affect the relationship between outgroups and ancients.

    If there are ascertainment issues inflating relatedness between non-West Europeans and chimp (through sites that are monomorphic in them but polymorphic in West Europeans), then will they not cut the other way and also inflate relatedness between ancients and chimp? Thereby also inflating relatedness between non-West Europeans and ancients! (Especially in measures that don't use an outgroup, like direct IBS between an ancient and a modern sample.)

    There doesn't seem to be a reason why West Europeans + ancients would be unaffected by ascertainment, while non-West Europeans alone would be affected by it, deflating their relatedness to the ancient. It seems like ancients should be affected by ascertainment too.

    The only way this would not be the case for ancients is if those ancients are genuinely more ancestral to West Europeans, and share the spectrum of variation for that reason...

    Put another way, in an f4(A,B;C,D), I'd have thought any additional allele sharing between any three of A, B, D due to ascertainment should cancel. It's only if ascertainment affects a single pair alone, e.g. A and D, that you should get inflation/deflation?
    Last edited by Eterne; 09-10-2018 at 08:03 AM.

  6. #15
    Banned
    Posts
    6,336
    Sex
    Location
    Torun
    Ethnicity
    Central 75% + 25% Mazovia
    Nationality
    Pole
    Y-DNA (P)
    R1a > M198 > YP 1337

    Quote Originally Posted by TuaMan View Post
    Can D-stats (or f4) tell you anything about the directionality of the gene flow between pops? I seem to have read conflicting things in the past regarding that question.
    This post is about D-stats on ADMIXTURE components, but maybe the theory Dienekes provides there is applicable to other uses of D-stats?

    http://dienekes.blogspot.com/2012/12...omponents.html
    Last edited by lukaszM; 09-10-2018 at 08:47 AM.

  7. #16
    Banned
    Posts
    5,173
    Sex

    Quote Originally Posted by Eterne View Post
    There seems to be a lot of discussion here about how ascertainment will affect the relationship between the two outgroups (Mbuti, Chimp) and different modern populations, but not much so far on how it will affect the relationship between outgroups and ancients.

    If there are ascertainment issues inflating relatedness between non-West Europeans and chimp (through sites that are monomorphic in them but polymorphic in West Europeans), then will they not cut the other way and also inflate relatedness between ancients and chimp? Thereby also inflating relatedness between non-West Europeans and ancients! (Especially in measures that don't use an outgroup, like direct IBS between an ancient and a modern sample.)

    There doesn't seem to be a reason why West Europeans + ancients would be unaffected by ascertainment, while non-West Europeans alone would be affected by it, deflating their relatedness to the ancient. It seems like ancients should be affected by ascertainment too.

    The only way this would not be the case for ancients is if those ancients are genuinely more ancestral to West Europeans, and share the spectrum of variation for that reason...

    Put another way, in an f4(A,B;C,D), I'd have thought any additional allele sharing between any three of A, B, D due to ascertainment should cancel. It's only if ascertainment affects a single pair alone, e.g. A and D, that you should get inflation/deflation?


    A great question which is very relevant to dstats, ADMIXTURE, and IBS, and sort of opens up a can of worms. There are many factors to consider here and I could probably take a couple of hours to completely cover the material.

    Here are some main points to consider based on the following IBS run which I did at 100% genotype rate for an apples to apples comparison:

    1- Ascertainment bias becomes an increasingly big factor the further back you go in time. Notice how Denisovan and Neanderthal share more alleles with Chimp than with Mbuti. This of course should not be the case.

    2- Ascertainment bias and reference bias when mapping sequences with unusual variation, such as Denisovan and Neanderthal, cause them to share more alleles with Mbuti than Eurasians do. That is why you see Neanderthal with 90%+ SSA in ADMIXTURE. Ascertainment bias at play big time!!

    3- Ascertainment bias becomes noticeable as you go back 25K years. Notice the order of Ust -> Kostenki -> MA1 wrt allele sharing with Chimp.

    4- The main reason I started diploid genotyping some of the published pseudo-haploid aDNA is because pseudo-haploids are inherently wrong at about 30% of the genome, because humans are heterozygous at about 30% of the genome. Notice how the pseudo-haploid steppe genomes only share 58-59% of alleles with Mbuti vs their diploid counterparts at 64-65%.

    5- Chimp is more forgiving of the published pseudo-haploid aDNA than Mbuti is.

    6- Some of Ust Ishim's affinity with SSA may be real, or it could be reference bias. My experience with using Broad Institute's GATK pipeline to genotype was that it creates considerable reference bias.
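    To illustrate point 4, here is a minimal Python sketch of why pseudo-haploid calls lower IBS sharing. It assumes the usual PLINK-style definition of IBS (per site, 1 minus half the absolute difference in alternate-allele counts), which may not be exactly how the run in this post was computed. The genotype vectors are made-up toy data, and `ibs` and `pseudo_haploid` are illustrative names only.

```python
import random


def ibs(g1, g2):
    """Mean identity-by-state sharing between two genotype vectors.

    Genotypes are alternate-allele counts (0, 1 or 2); each site contributes
    1 - |g1 - g2| / 2, i.e. 1.0, 0.5 or 0.0.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equal length")
    return sum(1 - abs(a - b) / 2 for a, b in zip(g1, g2)) / len(g1)


def pseudo_haploid(genotypes, rng):
    """Mimic pseudo-haploid calling: at heterozygous sites keep one randomly
    chosen allele and code the site as homozygous (0 or 2)."""
    return [rng.choice((0, 2)) if g == 1 else g for g in genotypes]


rng = random.Random(0)                     # fixed seed so the toy run is repeatable
diploid = [0, 1, 2, 1, 1, 0, 2, 1]         # toy diploid sample
reference = [0, 1, 2, 1, 0, 0, 2, 2]       # toy comparison sample
hap = pseudo_haploid(diploid, rng)

print("diploid IBS:       ", ibs(diploid, reference))
print("pseudo-haploid IBS:", ibs(hap, reference))
```

    The loss comes from sites where both samples are heterozygous: a diploid call scores 1.0 there, while a pseudo-haploid call can only score 0.5.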

    BTW, Martiniano genotyped the diploid steppe aDNA, not me. I take a different approach than Rui when diploid genotyping ancients. I have expressed my reservations about his approach, which entails dropping T genotypes where the reference is a C **. Although it's an OK approach, I have told him about the bias it creates. My diploid aDNA genotypes tend to be a tad more accurate. I'll post a dstat comparison of mine vs his later.

    SAMPLE IBS-CHIMP (68K SNPs)
    CHIMP 100.00%
    Denisovan 81.28%
    Neanderthal-Altai 79.77%
    MBUTI 68.05%
    Ust-Ishim 60.43%
    Kostenki14 59.68%
    MA1 59.48%
    Eskimo1 59.10%
    Eskimo2 59.04%
    Karasuk-495-Diploid 59.01%
    Kotias-Diploid 58.97%
    BedouinB1 58.95%
    BedouinB2 58.83%
    Karasuk-493-Diploid 58.82%
    Karasuk-495-Haploid 58.72%
    Estonian2 58.72%
    Loschbour_Imputed 58.68%
    Estonian1 58.66%
    Loschbour 58.63%
    Karasuk-496-Diploid 58.63%
    Yamnaya-552-Dip 58.62%
    Kotias-Haploid 58.48%
    Sintashta-395-Dip 58.39%
    Yamnaya-552-Hap 58.22%
    Sintashta-395-Hap 58.05%
    Karasuk-493-Haploid 58.00%
    Karasuk-496-Haploid 57.58%

    SAMPLE IBS-MBUTI (68K SNPs)
    MBUTI 100.00%
    Denisovan 69.18%
    Neanderthal-Altai 68.75%
    CHIMP 68.05%
    BedouinB1 66.14%
    Ust-Ishim 66.07%
    BedouinB2 65.94%
    Eskimo2 65.33%
    Eskimo1 65.31%
    Karasuk-495-Diploid 65.11%
    Yamnaya-552-Dip 64.55%
    Kotias-Diploid 64.43%
    Karasuk-496-Diploid 64.41%
    Karasuk-493-Diploid 64.40%
    Sintashta-395-Dip 64.39%
    Estonian1 64.38%
    Estonian2 64.33%
    Loschbour_Imputed 63.09%
    Loschbour 63.07%
    Kostenki14 59.85%
    Karasuk-495-Haploid 59.48%
    MA1 59.46%
    Kotias-Haploid 59.23%
    Yamnaya-552-Hap 59.15%
    Sintashta-395-Hap 58.72%
    Karasuk-493-Haploid 58.61%
    Karasuk-496-Haploid 58.41%

    Edit: ** to mitigate aDNA damage
    Last edited by Kurd; 09-10-2018 at 03:26 PM.


  9. #17
    Registered Users
    Posts
    78
    Sex
    Location
    Amerika ist wunderbar
    Ethnicity
    Greco-Mediterranean
    Nationality
    White American
    Y-DNA (P)
    J2b2*
    mtDNA (M)
    H1

    I don't really know how to properly read/understand most formal admixture stats either. But I see them popping up in roughly every new ancient DNA study I read, so I figure it's time to stop being such a lazy ass. I don't need to know the hard math, just the basics.

    So let's take a look at this simple one from Lazaridis 2016, where they are testing whether or not the Natufians have SSA admixture relative to other Ancient West Eurasians.

    I have a few questions. My first would be:

    A. What does the third column mean? What does it stand for? You know, the one that says "f4(Other Ancient, African, Chimp)"? Also, what does the Z score mean? I know that a higher Z score would indicate that admixture occurred, but that's about it.


  10. #18
    Registered Users
    Posts
    1,455
    Sex
    Ethnicity
    Anglo
    Nationality
    Canadian

    The f4 column is the actual value of the f4 statistic; the Z value measures the significance of the stat (of course a stronger stat tends to be more significant, but the two don't correspond perfectly).
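    To make the f4 value and its Z score a bit more concrete, here is a minimal Python sketch. It uses the standard definition, f4(A,B;C,D) = the average over SNPs of (a - b) * (c - d), where a, b, c, d are allele frequencies, and divides the estimate by a delete-one block jackknife standard error to get Z. The equal-sized contiguous blocks and the toy frequencies are simplifying assumptions of mine; real tools such as ADMIXTOOLS jackknife over genomic blocks with weighting, so treat this as a sketch rather than a reimplementation. With an outgroup like Chimp in the D slot, a positive f4 means A shares more alleles with C than B does, and a negative one means the reverse.

```python
import math


def f4_per_snp(a, b, c, d):
    """Per-SNP f4 terms (a - b) * (c - d) from allele-frequency lists."""
    return [(ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d)]


def f4_with_z(a, b, c, d, n_blocks=5):
    """Return (f4 estimate, jackknife standard error, Z score).

    Simplified delete-one block jackknife over equal-sized contiguous blocks
    of SNPs (an assumption made for this sketch).
    """
    terms = f4_per_snp(a, b, c, d)
    n = len(terms)
    if n_blocks < 2 or n < n_blocks:
        raise ValueError("need at least 2 blocks and one SNP per block")
    total = sum(terms)
    estimate = total / n
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]
    leave_one_out = []
    for i in range(n_blocks):
        block = terms[bounds[i]:bounds[i + 1]]
        leave_one_out.append((total - sum(block)) / (n - len(block)))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


# Toy allele frequencies (made up): A and C carry the same alleles at the
# first two SNPs, B and D (the outgroup) carry none.
A = [1.0, 1.0, 0.0, 0.0]
B = [0.0, 0.0, 0.0, 0.0]
C = [1.0, 1.0, 0.0, 0.0]
D = [0.0, 0.0, 0.0, 0.0]
print(f4_with_z(A, B, C, D, n_blocks=2))
```

    Because the standard error shrinks as more independent blocks of SNPs agree with each other, two stats with the same f4 value can have quite different Z scores. By convention |Z| > 3 is usually treated as significant.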

  11. #19
    Registered Users
    Posts
    78
    Sex
    Location
    Amerika ist wunderbar
    Ethnicity
    Greco-Mediterranean
    Nationality
    White American
    Y-DNA (P)
    J2b2*
    mtDNA (M)
    H1

    Quote Originally Posted by Megalophias View Post
    The f4 column is the actual value of the f4 statistic; the Z value measures the significance of the stat (of course a stronger stat tends to be more significant, but the two don't correspond perfectly).
    OK, so what is the F4 statistic? And what does chimp have to do with anything? I'm lost

  12. #20
    Registered Users
    Posts
    232
    Sex
    Location
    Papantla, Mexico

    There are so many people on here who could give a better answer than me, but if I'm correct, the third column (f4) and fourth column (Z score) are related: the Z score gives the 'strength/reliability' of the f4 stat. If there are two separate f4 stats with identical f4 values, the stat with the higher SNP count will get the stronger Z score, supposedly being 'more reliable' because of the extra SNPs.

    As I said, I stand to be corrected on that.

    Edit: should have actually read the thread-Megalophias had already posted the answer as to what the f4 stat columns mean.
    Last edited by Bas; 10-10-2018 at 11:10 PM.
