Admixture Databases and models



Theconqueror
03-11-2016, 02:15 AM
Does anyone know whether the databases behind the Gedmatch projects are available to the public? I would be interested in using different statistical models for matching autosomal signatures. Thanks.

AJL
03-11-2016, 05:21 AM
^ If you mean the population sample sets for the autosomal admixture analyses, those are mainly taken from direct user submissions to the various projects. Out of curiosity, what models were you thinking of using instead of Fst?

Theconqueror
03-11-2016, 05:38 PM
I was thinking about training neural networks on the data, grouped by sub-population, to generate and classify a library of aggregated admixture signatures. That library would then serve as the basis for assigning a given signature (mine, yours, etc.) to a class, with the program reporting the likelihood of belonging to each aggregated class. The Oracle does something similar, but I don't know its statistical methodology.
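
As a rough sketch of that idea, here is the simplest possible neural network (a single softmax layer, with no hidden layers) trained on synthetic admixture signatures. Everything in it is an assumption for illustration: the cluster names, the three-component signatures, the noise level, and the training settings are all made up, and a real attempt would use actual project data and probably a deeper model.

```python
import math
import random

# Hypothetical mean signatures (fractions of three admixture components).
CLASS_MEANS = {
    "cluster_A": [0.70, 0.20, 0.10],
    "cluster_B": [0.20, 0.60, 0.20],
    "cluster_C": [0.10, 0.20, 0.70],
}

def make_samples(rng, per_class=30, noise=0.05):
    """Generate synthetic individuals by jittering each class mean."""
    samples, labels = [], []
    for idx, mean in enumerate(CLASS_MEANS.values()):
        for _ in range(per_class):
            raw = [max(0.0, m + rng.uniform(-noise, noise)) for m in mean]
            total = sum(raw)
            samples.append([r / total for r in raw])  # renormalise to sum to 1
            labels.append(idx)
    return samples, labels

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, biases, x):
    """Class membership probabilities for one signature."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)

def train_softmax(samples, labels, n_classes, rng, epochs=200, lr=0.5):
    """Fit the softmax layer with plain stochastic gradient descent."""
    n_features = len(samples[0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    biases = [0.0] * n_classes
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            probs = predict_proba(weights, biases, x)
            for c in range(n_classes):
                # Gradient of the cross-entropy loss w.r.t. the class score.
                grad = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * grad * x[j]
                biases[c] -= lr * grad
    return weights, biases

if __name__ == "__main__":
    rng = random.Random(0)
    samples, labels = make_samples(rng)
    weights, biases = train_softmax(samples, labels, len(CLASS_MEANS), rng)
    query = [0.65, 0.25, 0.10]  # a made-up individual signature
    for name, p in zip(CLASS_MEANS, predict_proba(weights, biases, query)):
        print(f"{name}: {p:.3f}")
```

The output is one probability per aggregated class, which is the "likelihood of belonging" described above. Whether this beats a plain distance-based match on real data is an open question.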





AJL
03-11-2016, 06:33 PM
^ Interesting thought. I believe all the online admixture analyses use the standard Fst model to find components, then compare an individual's combination of components against the sample populations by least squares. But one other approach I've seen is D-statistics:

http://dienekes.blogspot.com/2012/12/d-statistics-on-admixture-components.html
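
For the least-squares comparison against sample populations, a minimal sketch might look like the following. It assumes the match is simply the Euclidean distance between an individual's component percentages and each reference population's averages; the population names and numbers are invented for illustration and do not come from any project.

```python
import math

# Hypothetical reference populations: average percentages per component.
REFERENCE = {
    "pop_A": [70.0, 20.0, 10.0],
    "pop_B": [20.0, 60.0, 20.0],
    "pop_C": [10.0, 20.0, 70.0],
}

def distance(a, b):
    """Euclidean (root of summed squared differences) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_populations(signature, reference):
    """Return (name, distance) pairs, closest reference population first."""
    pairs = [(name, distance(signature, props))
             for name, props in reference.items()]
    return sorted(pairs, key=lambda pair: pair[1])

if __name__ == "__main__":
    individual = [65.0, 25.0, 10.0]  # made-up individual signature
    for name, dist in rank_populations(individual, REFERENCE):
        print(f"{name}: {dist:.2f}")
```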

Theconqueror
03-11-2016, 08:25 PM
I will look into that, thanks. Ultimately, once a population cluster is defined and known, it becomes a simple classification problem: how close is an individual to an aggregated signature? There are different ways to approach this, but I believe using NNs to classify and detect matching patterns is a valid one. I believe PCA is already used in some research that looks at the shared fraction of the sample variance. I need to better understand the Oracle method that is used right now. It is interesting to see how forcing a 2-, 3-, or 4-population solution generates quite informative results.
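
To make the forced two-population case concrete, here is a small sketch. It is not the Oracle's actual method (which I have not seen documented); it assumes the fit is an ordinary least-squares mix of two reference populations, and the population names and percentages are hypothetical. For a pair of references a and b, the mixing weight w that minimises the squared distance to the signature s has the closed form w = (s - b)·(a - b) / |a - b|², clipped to the range 0 to 1.

```python
import math
from itertools import combinations

# Hypothetical reference populations: average percentages per component.
REFERENCE = {
    "pop_A": [70.0, 20.0, 10.0],
    "pop_B": [20.0, 60.0, 20.0],
    "pop_C": [10.0, 20.0, 70.0],
}

def best_weight(s, a, b):
    """Least-squares weight on a (the rest goes to b), clipped to [0, 1]."""
    diff = [x - y for x, y in zip(a, b)]
    norm_sq = sum(d * d for d in diff)
    if norm_sq == 0.0:
        return 0.5  # identical references: any weight fits equally well
    w = sum((x - y) * d for x, y, d in zip(s, b, diff)) / norm_sq
    return min(1.0, max(0.0, w))

def two_way_fits(signature, reference):
    """Fit every pair of populations; return results sorted by distance."""
    results = []
    for (name_a, a), (name_b, b) in combinations(reference.items(), 2):
        w = best_weight(signature, a, b)
        mix = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
        dist = math.sqrt(sum((m - s) ** 2 for m, s in zip(mix, signature)))
        results.append((dist, name_a, w, name_b, 1.0 - w))
    return sorted(results)

if __name__ == "__main__":
    individual = [40.0, 20.0, 40.0]  # made-up signature
    for dist, name_a, w_a, name_b, w_b in two_way_fits(individual, REFERENCE):
        print(f"{w_a:.0%} {name_a} + {w_b:.0%} {name_b}: distance {dist:.2f}")
```

Extending this to 3 or 4 populations would need a constrained least-squares solver rather than the closed form used here.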


