Can someone explain the difference between f4 and f3?
f4 stats:
In the example below (http://eurogenes.blogspot.com/2016/0...screpancy.html):
f4: Corded_Ware_Germany Anatolia_Neolithic CHG Chimp 0.002396 9.226 574503
If the populations in the f4 are:
A (Chimp)
X (CHG)
Y (Anatolia_N)
Z (Corded Ware Germany)
Then it shows how much closer or further away one population (Z) is to another population (X) compared to pop Y, using pop A as an outgroup. The example above is a good one, as it shows Corded Ware has CHG admixture that is not present in Anatolia_N.
The blue score (0.002396) is the f4 score; being positive here means that, in this stat, Z and X share admixture to the exclusion of Y.
The red score (9.226) is the Z-score, which always has the same sign as the f4 score; the difference is that the Z-score indicates how significant the f4 score is. Generally I have found that a highly positive/negative f4 stat from a run with few SNPs (SNPs used is the final column, 574503 here) will result in a low Z-score. It's almost like a confidence score, as a low SNP count means low confidence in the result.
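To make the f4 score and the Z-score concrete, here is a minimal Python sketch. It assumes f4 is computed as the average over SNPs of (w - x) * (y - z) on allele frequencies, and it uses a plain, unweighted delete-one-block jackknife for the standard error. ADMIXTOOLS itself uses a weighted block jackknife over genomic blocks, so this is only an illustration of the idea. The function names and the allele frequencies are made up for the example.

```python
import math

def f4(p_w, p_x, p_y, p_z):
    """Average over SNPs of (w - x) * (y - z), given per-SNP allele frequencies."""
    terms = [(w - x) * (y - z) for w, x, y, z in zip(p_w, p_x, p_y, p_z)]
    return sum(terms) / len(terms)

def f4_with_z(p_w, p_x, p_y, p_z, n_blocks=5):
    """Return (f4, standard error, Z) from an unweighted delete-one-block jackknife."""
    n = len(p_w)
    # Assign SNPs to contiguous blocks of (nearly) equal size.
    block_of = [i * n_blocks // n for i in range(n)]
    estimate = f4(p_w, p_x, p_y, p_z)
    leave_one_out = []
    for b in range(n_blocks):
        keep = [i for i in range(n) if block_of[i] != b]
        subset = [[p[i] for i in keep] for p in (p_w, p_x, p_y, p_z)]
        leave_one_out.append(f4(*subset))
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum((v - mean_loo) ** 2 for v in leave_one_out)
    se = math.sqrt(variance)
    return estimate, se, estimate / se

# Made-up allele frequencies at 10 SNPs (NOT real data).
corded_ware = [0.6, 0.5, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5, 0.7, 0.4]
anatolia_n  = [0.4, 0.5, 0.5, 0.4, 0.5, 0.3, 0.5, 0.4, 0.6, 0.3]
chg         = [0.7, 0.2, 0.8, 0.1, 0.9, 0.3, 0.6, 0.5, 0.8, 0.4]
chimp       = [0.0] * 10

# Same population order as the stat quoted above.
estimate, se, z = f4_with_z(corded_ware, anatolia_n, chg, chimp)
print(f"f4={estimate:.4f}  SE={se:.4f}  Z={z:.2f}")
```

With fewer SNPs there are fewer or smaller blocks, the leave-one-out estimates scatter more, and the Z-score shrinks even if the f4 value itself stays large.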
f3 stats:
To my knowledge, the difference between f4 and f3 stats is that f3 uses only three populations: it shows evidence of shared drift or admixture between two populations, with the third population as an outgroup. For example, f3(Mbuti; Yamnaya, Corded Ware) would be hugely significant.
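A minimal sketch of the f3 idea in Python, assuming the basic definition f3(C; A, B) = average over SNPs of (c - a) * (c - b). It leaves out the sample-size bias correction that real software applies, and all allele frequencies below are made up. With an outgroup in the first position, a larger positive value means more drift shared by the other two populations; with a test population in the first position, a negative value is a sign that it is a mixture of the other two.

```python
def f3(p_c, p_a, p_b):
    """Average over SNPs of (c - a) * (c - b), given per-SNP allele frequencies."""
    terms = [(c - a) * (c - b) for c, a, b in zip(p_c, p_a, p_b)]
    return sum(terms) / len(terms)

# Outgroup form: made-up frequencies where the two test populations resemble
# each other and both differ from the outgroup.
mbuti       = [0.1, 0.9, 0.2, 0.8]
yamnaya     = [0.6, 0.3, 0.7, 0.2]
corded_ware = [0.6, 0.4, 0.6, 0.3]
outgroup_f3 = f3(mbuti, yamnaya, corded_ware)

# Admixture form: the target sits between two diverged sources at every SNP,
# which pushes the statistic below zero.
source_a = [0.1, 0.9]
source_b = [0.9, 0.1]
target   = [0.5, 0.5]
admixture_f3 = f3(target, source_a, source_b)

print(outgroup_f3, admixture_f3)
```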
Last edited by Bas; 12-12-2018 at 09:35 PM.
Bas,
The f4 test is used to calculate admixture ratios, correct? In this instance, how could we tell anything about the admixture ratio of any of the pops X, Y, or Z?
The stat above just looks like a regular D-stat. If f4 can be used to calculate admix ratios, then what exactly do D-stats tell us that f4 doesn't?
EDIT: Also, just to clarify, the positive stat just means CWC and CHG share more drift with each other than Anatolian Neo and CHG share with each other, and not that Anatolian Neolithic is necessarily without any CHG, right?
Last edited by TuaMan; 12-13-2018 at 01:35 AM.
Yeah, I think you're right about the shared drift thing there, I worded it a bit clumsily! About the f4 stats and admixture ratios, qpAdm uses f4 stats to work out the admix proportions. This explains it: http://gensoft.pasteur.fr/docs/AdmixTools/4.1/pdoc.pdf .
Also: http://science.sciencemag.org/conten...sdrecht_SM.pdf
[For admixture modeling, we used the program qpAdm (v632) (16) of the admixtools v3.0
package. QpAdm can be viewed as a generalization of f4 statistics jointly modeling multiple of
them. It tests if the observed target population and the proposed admixture model for it are
symmetrically related to a set of outgroups, and summarizes the results of multiple such
comparisons into a single statistic (16). It also estimates ancestry proportion coefficients, and
their 5 cM block jackknife SEs, by minimizing the difference between the target and the model.
More specifically, qpAdm requires a target population (T), source/surrogate populations (S) and a
set of outgroups (O). Outgroups are differentially related to sources so that they can be
distinguished by f4 statistics (Fig. S18). However, at the same time, outgroups must be related to
the target and the sources distantly enough so that a source and its related ancestry in the target
have a symmetrical genetic distance to all outgroups. An example of many scenarios to break this
prerequisite is a post-mixture gene flow from the target into an outgroup.]
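As a rough illustration of "minimizing the difference between the target and the model", here is a heavily simplified two-source sketch in Python. It is not qpAdm: the real program works on a matrix of f4 statistics, weights them by their jackknife covariance and tests the rank of the model. Here each population is simply represented by a short vector of f4 values against the same outgroup pairs, and the mixing proportion is found by ordinary least squares. The function name and all numbers are hypothetical.

```python
def two_source_proportion(target, source1, source2):
    """Least-squares alpha such that target ~ alpha * source1 + (1 - alpha) * source2."""
    diff_sources = [a - b for a, b in zip(source1, source2)]
    diff_target = [t - b for t, b in zip(target, source2)]
    numerator = sum(dt * ds for dt, ds in zip(diff_target, diff_sources))
    denominator = sum(ds * ds for ds in diff_sources)
    return numerator / denominator

# Hypothetical f4 values of each population against three outgroup pairs (NOT real data).
source1_f4 = [0.010, 0.004, -0.002]
source2_f4 = [0.002, 0.008, 0.006]
target_f4  = [0.0044, 0.0068, 0.0036]  # constructed as 0.3 * source1 + 0.7 * source2

alpha = two_source_proportion(target_f4, source1_f4, source2_f4)
print(f"source1 proportion ~ {alpha:.3f}, source2 proportion ~ {1 - alpha:.3f}")
```

The sketch only works because the outgroups relate differently to the two sources; if the two source vectors were identical, the denominator would be zero and the proportions could not be told apart, which is the point the excerpt makes about outgroups being differentially related to the sources.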
Difference between f4 stats and D-stats, as stated by Nick Patterson (I actually copied this from the Eurogenes comments section from a couple of years back):
As mentioned earlier, D-statistics are very similar to the 4-population test statistics introduced in REICH et al. (2009). The primary difference is in the computation of the denominator of D. For statistical estimation, and testing for ‘treeness’, the D-statistics are preferable, as the denominator of D, the total number of ‘ABBA’ and ‘BABA’ events, is uninformative for whether a tree phylogeny is supported by the data, while D has a natural interpretation: the extent of the deviation on a normalized scale from -1 to 1.
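A small Python sketch of the point about the denominator, assuming the allele-frequency form in which the numerator is the sum over SNPs of (w - x) * (y - z), the same quantity that f4 averages, and the denominator is the sum of (w + x - 2wx) * (y + z - 2yz). The allele frequencies are made up. The two statistics always have the same sign; D is just rescaled to lie between -1 and 1.

```python
def f4_stat(p_w, p_x, p_y, p_z):
    """Average over SNPs of (w - x) * (y - z)."""
    terms = [(w - x) * (y - z) for w, x, y, z in zip(p_w, p_x, p_y, p_z)]
    return sum(terms) / len(terms)

def d_stat(p_w, p_x, p_y, p_z):
    """Same numerator as f4 (summed), divided by a normalizing denominator."""
    numerator = sum((w - x) * (y - z) for w, x, y, z in zip(p_w, p_x, p_y, p_z))
    denominator = sum(
        (w + x - 2 * w * x) * (y + z - 2 * y * z)
        for w, x, y, z in zip(p_w, p_x, p_y, p_z)
    )
    return numerator / denominator

# Made-up allele frequencies at 10 SNPs (NOT real data).
corded_ware = [0.6, 0.5, 0.7, 0.4, 0.8, 0.3, 0.6, 0.5, 0.7, 0.4]
anatolia_n  = [0.4, 0.5, 0.5, 0.4, 0.5, 0.3, 0.5, 0.4, 0.6, 0.3]
chg         = [0.7, 0.2, 0.8, 0.1, 0.9, 0.3, 0.6, 0.5, 0.8, 0.4]
chimp       = [0.0] * 10

f4_value = f4_stat(corded_ware, anatolia_n, chg, chimp)
d_value = d_stat(corded_ware, anatolia_n, chg, chimp)
print(f"f4={f4_value:.4f}  D={d_value:.4f}")
```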
http://www.genetics.org/content/earl...ics.112.145037
Does anyone here run Admixtools out of a Linux virtual machine (I have a Windows PC), and if so which VM do you recommend? Ditto for the distribution as well.
I work with Fedora 25 (yes, only 25, and I don't want to update it, given the shared-library problems when installing ADMIXTOOLS on newer versions) in an Oracle VirtualBox VM. That works, but some problems are unavoidable: slowness and RAM management. I planned to buy another PC with Linux as the only system, but money, money, money...
En North alom, de North venom
En North fum naiz, en North manom
(Roman de Rou, Wace, 1160-1170)
Sorry for the newbie question, but how does one make plots with f3-stats in PAST3, like Matt at Eurogenes did here https://imgur.com/a/42BjyWe ?
Assuming you want to plot a two-column matrix under a regression model (linear or polynomial), you choose "Model>Linear>Bivariate" or "Model>Polynomial". Example: two series of D-stats showing affinity to WHG and Natufian for some modern populations:
Capture1.JPG
First select the 2 columns and make PLOT XY:
Capture2.JPG
Obviously the regression is not linear. You select Model>Polynomial, and run. You get the natural parabolic regression:
reg.jpg
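For anyone without PAST3, the same kind of second-order polynomial fit can be sketched in Python with NumPy's `polyfit`. The x and y values below are invented stand-ins for two columns of D-stats, not the data in the screenshots, and the comparison of residuals is just one simple way to see that a straight line fits worse than a parabola.

```python
import numpy as np

# Invented two-column data standing in for two series of D-stats (NOT real data).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1.0, 0.25, 1.0, 1.25, 1.0, 0.25])

# Degree-1 (linear) and degree-2 (parabolic) least-squares fits.
linear_coeffs = np.polyfit(x, y, 1)      # [slope, intercept]
quadratic_coeffs = np.polyfit(x, y, 2)   # [a, b, c] for a*x^2 + b*x + c

# Sum of squared residuals for each model.
linear_sse = float(np.sum((np.polyval(linear_coeffs, x) - y) ** 2))
quadratic_sse = float(np.sum((np.polyval(quadratic_coeffs, x) - y) ** 2))

print("linear fit SSE:", linear_sse)
print("quadratic fit SSE:", quadratic_sse)
print("quadratic coefficients (a, b, c):", quadratic_coeffs)
```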
Found this excellent guide from a pop. gen workshop. It covers everything from filtering/converting BAM files to working with ADMIXTOOLS:
https://buildmedia.readthedocs.org/m...rkshop2019.pdf