Page 3 of 3
Results 21 to 29 of 29

Thread: Understanding Formal Statistics, f4 and D-Stats

  #21

    Can someone explain the difference between f4 and f3?

  #22

    f4 stats:

    In the below example (http://eurogenes.blogspot.com/2016/0...screpancy.html)

    f4: Corded_Ware_Germany Anatolia_Neolithic CHG Chimp 0.002396 9.226 574503


    If the f4 pops are:

    A (Chimp)
    X (CHG)
    Y (Anatolia_N)
    Z (Corded Ware Germany)

    Then it shows how much closer or further away one population (Z) is to another population (X) compared with population Y, using population A as an outgroup. The example above is a good one, as it shows Corded Ware has CHG admixture that is not present in Anatolia_N.

    The first number (0.002396, the blue score) is the f4 score. Being positive here, it means that in this stat Z and X share admixture to the exclusion of Y.
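    To make the f4 value concrete: in ADMIXTOOLS' ordering the Eurogenes line reads f4(Corded_Ware_Germany, Anatolia_Neolithic; CHG, Chimp), and f4(W, X; Y, Z) is the average over SNPs of (w - x)(y - z), where w, x, y, z are allele frequencies. Below is a minimal Python sketch, assuming the frequencies are already polarised to the same allele; the toy values are made up for illustration and are not real data.

```python
import numpy as np

def f4(w, x, y, z):
    """f4(W, X; Y, Z) = mean over SNPs of (w - x) * (y - z).

    w, x, y, z are allele-frequency arrays (one entry per SNP),
    all counting the same allele.
    """
    w, x, y, z = (np.asarray(p, dtype=float) for p in (w, x, y, z))
    return float(np.mean((w - x) * (y - z)))

# Hypothetical toy frequencies at three SNPs, in the Eurogenes order:
# W = Corded_Ware_Germany, X = Anatolia_Neolithic, Y = CHG, Z = Chimp.
cwc = [0.6, 0.4, 0.5]
anat = [0.5, 0.5, 0.5]
chg = [0.7, 0.3, 0.6]
chimp = [0.0, 1.0, 0.0]

# Positive: W (CWC) leans towards Y (CHG) more than X (Anatolia_N) does.
print(f4(cwc, anat, chg, chimp))
```

    Swapping W and X flips the sign, which is why the order of the four populations matters when reading published stats.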

    The second number (9.226, the red score) is the Z-score. It always has the same sign as the f4 score, because it is the f4 value divided by its standard error; the difference is that the Z-score indicates how significant the f4 score is. (In my experience, a strongly positive or negative f4 stat computed on few SNPs, the SNP count being the final column, will give a low Z-score. It's almost like a confidence score, as few SNPs mean low confidence in the result.)
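    ADMIXTOOLS estimates that standard error with a block jackknife over the genome (the qpAdm excerpt quoted later in this thread mentions 5 cM blocks). The sketch below is a simplified, unweighted delete-one-block jackknife over equal-sized runs of SNPs, not the exact ADMIXTOOLS procedure; the function name and the random data are hypothetical. Fewer SNPs give a noisier estimate and a larger standard error, hence a smaller |Z|, which matches the observation above. It also shows that the Eurogenes line implies a standard error of roughly 0.002396 / 9.226, about 0.00026.

```python
import numpy as np

def f4_jackknife(w, x, y, z, n_blocks=20):
    """Return (f4, standard error, Z) for f4(W, X; Y, Z).

    Simplifying assumption: SNPs are in genome order and are split into
    n_blocks contiguous blocks of roughly equal SNP count, then an
    unweighted delete-one-block jackknife is applied. ADMIXTOOLS instead
    uses a weighted block jackknife over blocks defined in centiMorgans.
    """
    w, x, y, z = (np.asarray(p, dtype=float) for p in (w, x, y, z))
    per_snp = (w - x) * (y - z)
    estimate = per_snp.mean()
    blocks = np.array_split(np.arange(per_snp.size), n_blocks)
    # Recompute f4 with each block left out in turn.
    loo = np.array([np.delete(per_snp, idx).mean() for idx in blocks])
    n = len(blocks)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return float(estimate), float(se), float(estimate / se)

# Hypothetical random frequencies for 5,000 SNPs in four populations.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.0, 1.0, size=(4, 5000))
est, se, z = f4_jackknife(*freqs)
print(f"f4={est:.6f}  SE={se:.6f}  Z={z:.2f}")

# Reading the Eurogenes line backwards: SE = f4 / Z.
print(0.002396 / 9.226)  # roughly 0.00026
```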

    f3 stats:

    To my knowledge, the difference between f4 and f3 stats is that f3 stats only show evidence of admixture between two populations, using one population as an outgroup: for example, (Mbuti; Yamnaya, Corded Ware) would be hugely significant.
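    For reference, f3(C; A, B) is the average over SNPs of (c - a)(c - b). Used as an admixture test with C as the target, a significantly negative value is evidence that C is a mix of populations related to A and B. Used as an outgroup f3 like (Mbuti; Yamnaya, Corded Ware), the outgroup goes in the first slot and larger values mean more drift shared by the other two. A minimal sketch with made-up frequencies, omitting the small-sample bias correction that ADMIXTOOLS applies:

```python
import numpy as np

def f3(c, a, b):
    """f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    Plain version without the finite-sample bias correction
    used by ADMIXTOOLS' qp3Pop.
    """
    c, a, b = (np.asarray(p, dtype=float) for p in (c, a, b))
    return float(np.mean((c - a) * (c - b)))

# Admixture f3: a target that is an exact 50/50 mix of two sources is
# negative, since each SNP contributes -(a - b)**2 / 4.
src_a = np.array([0.2, 0.8, 0.4])
src_b = np.array([0.8, 0.2, 0.6])
target = 0.5 * src_a + 0.5 * src_b
print(f3(target, src_a, src_b))

# Outgroup f3: same formula, but the first slot holds the outgroup
# (hypothetical frequencies here).
outgroup = np.array([0.0, 1.0, 0.5])
print(f3(outgroup, src_a, src_b))
```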
    Last edited by Bas; 12-12-2018 at 09:35 PM.

  #23

    Quote Originally Posted by Bas
    f4 stats:

    In the below example (http://eurogenes.blogspot.com/2016/0...screpancy.html)

    f4: Corded_Ware_Germany Anatolia_Neolithic CHG Chimp 0.002396 9.226 574503
    Bas,

    The f4 test is used to calculate admixture ratios, correct? In this instance, how could we tell anything about the admixture ratio of any of the pops X, Y, or Z?

    This stat above just looks like a regular D-stat. If f4 can be used to calculate admix ratios, then what exactly does D-stats tell us that f4 doesn't?

    EDIT: Also, just to clarify, the positive stat just means CWC and CHG share more drift with each other than Anatolian Neo and CHG share with each other, and not that Anatolian Neolithic is necessarily without any CHG, right?
    Last edited by TuaMan; 12-13-2018 at 01:35 AM.

  #24

    Quote Originally Posted by TuaMan
    Bas,

    The f4 test is used to calculate admixture ratios, correct? In this instance, how could we tell anything about the admixture ratio of any of the pops X, Y, or Z?

    This stat above just looks like a regular D-stat. If f4 can be used to calculate admix ratios, then what exactly does D-stats tell us that f4 doesn't?

    EDIT: Also, just to clarify, the positive stat just means CWC and CHG share more drift with each other than Anatolian Neo and CHG share with each other, and not that Anatolian Neolithic is necessarily without any CHG, right?
    Yeah, I think you're right about the shared drift; I worded it a bit clumsily! As for f4 stats and admixture ratios, qpAdm uses f4 stats to work out the admixture proportions. This explains it: http://gensoft.pasteur.fr/docs/AdmixTools/4.1/pdoc.pdf

    Also: http://science.sciencemag.org/conten...sdrecht_SM.pdf

    [For admixture modeling, we used the program qpAdm (v632) (16) of the admixtools v3.0 package. QpAdm can be viewed as a generalization of f4 statistics jointly modeling multiple of them. It tests if the observed target population and the proposed admixture model for it are symmetrically related to a set of outgroups, and summarizes the results of multiple such comparisons into a single statistic (16). It also estimates ancestry proportion coefficients, and their 5 cM block jackknife SEs, by minimizing the difference between the target and the model. More specifically, qpAdm requires a target population (T), source/surrogate populations (S) and a set of outgroups (O). Outgroups are differentially related to sources so that they can be distinguished by f4 statistics (Fig. S18). However, at the same time, outgroups must be related to the target and the sources distantly enough so that a source and its related ancestry in the target have a symmetrical genetic distance to all outgroups. An example of many scenarios to break this prerequisite is a post-mixture gene flow from the target into an outgroup.]
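    The core idea behind using f4 stats for admixture proportions can be sketched for two sources, but this is a heavily simplified toy and not qpAdm itself. It assumes the target's allele frequencies are exactly (1 - alpha) * S1 + alpha * S2, uses the first outgroup as a base, and fits alpha by ordinary least squares across the remaining outgroups, with none of qpAdm's covariance weighting, jackknife standard errors, or rank test. Function names and data are hypothetical.

```python
import numpy as np

def f4(w, x, y, z):
    """f4(W, X; Y, Z) = mean over SNPs of (w - x) * (y - z)."""
    return float(np.mean((w - x) * (y - z)))

def two_source_alpha(target, s1, s2, outgroups):
    """Toy fit of target = (1 - alpha) * s1 + alpha * s2.

    outgroups[0] is the base outgroup R0; for every other Ri we require
    f4(T, S1; R0, Ri) = alpha * f4(S2, S1; R0, Ri) and solve for alpha by
    ordinary least squares. Not qpAdm: no covariance weighting,
    no jackknife, no rank test.
    """
    r0, rest = outgroups[0], outgroups[1:]
    lhs = np.array([f4(target, s1, r0, ri) for ri in rest])
    rhs = np.array([f4(s2, s1, r0, ri) for ri in rest])
    alpha = float(lhs @ rhs / (rhs @ rhs))
    residuals = lhs - alpha * rhs  # large residuals suggest the model does not fit
    return alpha, residuals

# Hypothetical data: four outgroups, two sources, and a target built as a
# 70/30 mix, so the true alpha is 0.3.
rng = np.random.default_rng(1)
outgroups = rng.uniform(0.0, 1.0, size=(4, 2000))
s1 = rng.uniform(0.0, 1.0, 2000)
s2 = rng.uniform(0.0, 1.0, 2000)
target = 0.7 * s1 + 0.3 * s2
alpha, residuals = two_source_alpha(target, s1, s2, outgroups)
print(round(alpha, 3), np.abs(residuals).max())
```

    This works because f4 is linear in allele frequencies: T - S1 = alpha * (S2 - S1), so every f4 involving T - S1 is alpha times the matching f4 for S2 - S1. It also shows why the outgroups must be differentially related to the sources, as the quote says: if every f4(S2, S1; R0, Ri) were zero, alpha could not be estimated.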


    The difference between f4 stats and D-stats, as stated by Nick Patterson (I actually copied this from the Eurogenes comments section a couple of years back):

    As mentioned earlier, D-statistics are very similar to the 4-population test statistics introduced in REICH et al. (2009). The primary difference is in the computation of the denominator of D. For statistical estimation, and testing for ‘treeness’, the D-statistics are preferable, as the denominator of D, the total number of ‘ABBA’ and ‘BABA’ events, is uninformative for whether a tree phylogeny is supported by the data, while D has a natural interpretation: the extent of the deviation on a normalized scale from -1 to 1.

    http://www.genetics.org/content/earl...ics.112.145037
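    The quoted difference comes down to the denominator. In the allele-frequency form used in the paper linked above, both statistics share the numerator, the sum of (w - x)(y - z). f4 divides it by the number of SNPs, while D divides it by the sum of (w + x - 2wx)(y + z - 2yz), which for single haploid genomes is the total count of 'ABBA' and 'BABA' sites. That is why D is normalised to the range -1 to 1 and always has the same sign as f4. A small sketch with made-up 0/1 genotypes:

```python
import numpy as np

def f4_and_d(w, x, y, z):
    """Return (f4, D) for the same quartet (W, X; Y, Z).

    Both share the numerator sum((w - x) * (y - z)). f4 divides it by the
    number of SNPs; D divides it by sum((w + x - 2wx) * (y + z - 2yz)),
    which for single genomes (0/1 values) counts the discordant
    'ABBA'/'BABA' sites, so D lies between -1 and 1.
    """
    w, x, y, z = (np.asarray(p, dtype=float) for p in (w, x, y, z))
    num = np.sum((w - x) * (y - z))
    den = np.sum((w + x - 2 * w * x) * (y + z - 2 * y * z))
    return float(num / w.size), float(num / den)

# Hypothetical haploid genotypes (1 = derived allele) at four sites.
w = [1, 1, 0, 1]
x = [0, 0, 1, 1]
y = [1, 1, 1, 0]
z = [0, 0, 0, 0]
print(f4_and_d(w, x, y, z))  # -> (0.25, 0.3333333333333333)
```

    The fourth site contributes nothing to either statistic because W and X agree there; only sites where both pairs disagree enter D's denominator.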

  #25

    Does anyone here run Admixtools out of a Linux virtual machine (I have a Windows PC), and if so which VM do you recommend? Ditto for the distribution as well.

  #26

    Quote Originally Posted by TuaMan
    Does anyone here run Admixtools out of a Linux virtual machine (I have a Windows PC), and if so which VM do you recommend? Ditto for the distribution as well.
    I run Fedora 25 (yes, only 25; I don't want to update it, given the problems with shared libraries when installing ADMIXTOOLS on newer versions) in an Oracle VirtualBox VM. That works, but the problems are unavoidable: slowness and RAM management. I had planned to buy another PC with Linux as the only system, but money, money, money...
    En North alom, de North venom
    En North fum naiz, en North manom

    (Roman de Rou, Wace, 1160-1170)

  #27

    Sorry for the newbie question, but how does one make plots with f3-stats in PAST3, like Matt at Eurogenes did here https://imgur.com/a/42BjyWe ?

  #28

    Quote Originally Posted by Ruderico
    Sorry for the newbie question, but how does one make plots with f3-stats in PAST3, like Matt at Eurogenes did here https://imgur.com/a/42BjyWe ?
    Assuming you want to plot a two-column matrix under a regression model (linear or polynomial), you choose "Model>Linear>Bivariate" or "Model>Polynomial". Example: two series of D-stats showing affinity to WHG and Natufian for some modern populations:

    [Image: Capture1.JPG]

    First, select the two columns and make an XY plot (PLOT XY):

    [Image: Capture2.JPG]

    Obviously the relationship is not linear, so select Model>Polynomial and run it. You get the natural parabolic regression:

    [Image: reg.jpg]
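    PAST3 is point-and-click, but if you prefer scripting, the same second-degree (parabolic) least-squares fit can be reproduced in Python with numpy.polyfit. This is not how PAST3 works internally, just an equivalent quadratic regression; the two columns below are hypothetical placeholders for your own D-stats, and plotting (for example with matplotlib) is left out.

```python
import numpy as np

def quadratic_fit(x, y):
    """Least-squares fit of y = a*x**2 + b*x + c.

    Returns the coefficients (a, b, c), highest power first, and R^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, deg=2)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coeffs, float(1.0 - ss_res / ss_tot)

# Hypothetical two-column data: D-stat affinity to WHG (x) and to
# Natufian (y) for a handful of populations; replace with your own columns.
whg = [-0.010, -0.005, 0.000, 0.004, 0.009, 0.015]
natufian = [0.012, 0.006, 0.003, 0.002, 0.004, 0.009]
coeffs, r2 = quadratic_fit(whg, natufian)
print("a, b, c =", coeffs, "R^2 =", round(r2, 3))
```

    Comparing R^2 from deg=1 and deg=2 is a quick way to see whether the parabola really fits better than a straight line.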

  #29

    Quote Originally Posted by TuaMan
    Hey all, a question that's been on my mind for some time and that I've been meaning to seek further clarity on. Everyone who follows population genetics and has an interest in the historical ethnogenesis of different populations has surely encountered references to these two "formal" statistical methods of inferring population demographic history; they're standard features in the toolkit of any decent academic paper on historical population genetics. From my understanding, they're usually considered a more robust method of inferring population history than other tools like ADMIXTURE and whatnot.

    I've read some of the big papers that explain the use of these methods, namely Green et al 2010 (which I believe was the paper that actually introduced D-Stats) and another paper by Nick Patterson on D-Stats (I believe he actually invented the methodology itself). While I think I have a decent rudimentary grasp of what these methods are and how they work, I'm not gonna lie, the papers were pretty technical and I don't have any university-level course work in genetics or much in the way of advanced statistics, so it's pretty dense reading for me.

    So, I wanted to open a thread on these formal stats in the hope that some of the more knowledgeable (and patient) members here could elaborate on how these methodologies really work. I would like to approach this from as much of a blank slate as possible, in the interest of capturing as much information on the technicalities of these tools as possible.

    1. In layman's terms, what are you actually testing when utilizing either of these methods?

    2. What is the exact difference between the two? When would you want to use one but not the other?

    3. What are some limitations or confounding factors that can skew the results of these methods? (I know choice of outgroup is one, but how exactly?)

    I've been meaning to make a thread on this for a while, so I wanted to finally put something out there and hope someone would be patient enough to flesh things out a bit. I know this is an inherently technical topic, and I can ask more targeted questions and explain what I'm trying to get at if need be, but for now I figured I would keep it relatively basic. If anyone else reading this has their own questions about either of these methods, by all means feel free to chime in as well.
    Found this excellent guide from a pop. gen workshop, it covers everything from filtering/converting BAM files to working with ADMIXTOOLS:

    https://buildmedia.readthedocs.org/m...rkshop2019.pdf
