How to get AdmixTools running

Bas
12-12-2018, 08:33 PM
A little guide on how to get started with AdmixTools, the gold-standard software used by the leading population geneticists. With it you'll be able to run D-stats to find evidence of admixture, qpAdm for admixture coefficients, and the other tools included in the AdmixTools package. It's actually quite easy once you know where to get all the components from. A big obstacle for some may be having to use a Linux machine, but here I'll show you how to run Linux from a USB stick, meaning no change at all to your OS, files, etc.

Downloading required programs

Linux

Some Linux distributions have problems with AdmixTools. Fedora 25 is one that definitely works.

1) If you want to run from USB, make sure you have a clean USB stick (all existing files will be erased when Fedora is written to it), then go to: https://getfedora.org/en/workstation/download/

I haven’t tried versions newer than 25, so if you want that version, click the link below the download button where it says ‘Need instructions? Or a different version’

2) A media writer will download so you can turn your USB stick into a Linux boot device. From here it’s self-explanatory

AdmixTools

From here: https://github.com/DReichLab/AdmixTools click the green ‘clone or download’ button. This will download an AdmixTools-master zipfile. Unzip it and put it on a separate USB stick. Create a new folder inside and call it ‘bin’
You also need to download the specific programs that will actually run what you want to do. On this same page, click the ‘releases’ tab and you’ll see all the different programs that AdmixTools can run. Download one of these files and put it in the AdmixTools-master bin folder

Datasets: Can be run straight out of the box with example data from here: https://reich.hms.harvard.edu/software


Or alternatively, download one of the recent datasets from here:
https://reich.hms.harvard.edu/datasets

Once in Linux: running the example data in D-stats

Starting off with D-stats as it’s the first thing I learned to do and is pretty straightforward. Navigate to your USB stick with all the AdmixTools stuff and find the Data folder downloaded before. In this folder you’ll find 3 important files: allmap.ind, allmap.geno and allmap.snp. These are EIGENSTRAT files, the file type that the program works with. Copy or move these into your AdmixTools-master/bin folder

In the main AdmixTools-master folder, go into the ‘examples’ subfolder and find a file called ‘parqpDstat’. This file is in the format needed to run qpDstat. The file cannot be used as is so you need to fill it with your own information. This is the parameter file or ‘parfile’


DIR: the path to the directory where your .ind/.geno/.snp files are located
SSS: just the file name without extension (in this case allmap)
indivname: the full path to the .ind file
snpname: the full path to the .snp file
genotypename: the full path to the .geno file
poplistname: just the name of the text file with the populations that you want to run

The poplistname file is just a text file with the 4 pops that you want to run, one on each line:

Mbuti
Uygur
French
Han

You can also run multiple D-stats at one time (number depending on machine memory). If you want to do this, replace ‘poplistname’ with ‘popfilename’ and create a text file with 4 pops on each line

3) Once the file is filled out, it should look something like this:

DIR: /run/media/liveuser/0E3F552A/AdmixTools-master/bin
SSS: allmap
indivname: /run/media/liveuser/0E3F552A/AdmixTools-master/bin/allmap.ind
snpname: /run/media/liveuser/0E3F552A/AdmixTools-master/bin/allmap.snp
genotypename: /run/media/liveuser/0E3F552A/AdmixTools-master/bin/allmap.geno
popfilename: list_qpDstatmultiple

Save as something like: parqpDstat1.

4) This should also be in the bin folder. About the .geno/.snp/.ind files: don’t touch any file apart from the .ind file. This contains the individual names of each sample. It’s a good idea to use the example data files here just to get a feel for the program, because all samples here are named. When merging later you will have to rename a lot of samples, because during the merging process the samples keep their IDs but their labels are replaced with a generic ‘control’. It’s this label, not the ID, that you need to specify when making parfiles. Now you're ready for the first run
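
If you'd rather create the parfile from the terminal than in a text editor, here is a minimal sketch. It assumes all three data files share one prefix and sit in one folder, as in the example above; the function name write_parfile and its argument order are made up for illustration and are not part of AdmixTools.

```shell
# Print a qpDstat parameter file to standard output.
#   $1 = folder holding the .ind/.snp/.geno files
#   $2 = shared file prefix (e.g. allmap)
#   $3 = name of the population list file (4 pops per line)
# write_parfile is a hypothetical helper for this sketch only.
write_parfile() {
    dir=$1
    prefix=$2
    poplist=$3
    cat <<EOF
DIR: $dir
SSS: $prefix
indivname: $dir/$prefix.ind
snpname: $dir/$prefix.snp
genotypename: $dir/$prefix.geno
popfilename: $poplist
EOF
}
```

From inside the bin folder it could be used as: write_parfile "$PWD" allmap list_qpDstatmultiple > parqpDstat1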

Running the program

Open up a terminal. This is the command-line window that Linux uses. If no icon is present, search for it in the search box.
First we need to make sure we are in the directory where AdmixTools-master/bin is located. To do this easily, right click a file from this folder and copy the file path.

Type in ‘cd’ and then paste the file path. Press enter. The input here should look like this:
cd /run/media/liveuser/0E3F552A/AdmixTools-master/bin

Then, as a separate command, start qpDstat from that folder. The general form is ‘qpDstat -p parfile [-l lo] [-h hi] >logfile’. The parts in square brackets are optional and the brackets themselves are not typed. The ‘./’ in front tells the shell to look for the program in the current folder.

so:

./qpDstat -p parqpDstat1 >logfile

Press enter and the program should run. It will look like it has frozen, but after a few minutes the run will complete and the output will be written to logfile in AdmixTools-master/bin

In some cases a warning message may appear about a missing package, commonly ‘libblas’ or ‘liblapack’. To download both of them, make sure you are connected to the internet (bluetooth tethering with a smartphone also works) and type in the terminal:

sudo dnf install lapack-devel

(in some versions ‘yum’ replaces ‘dnf’)

The necessary components will download. I have to do this every time I use qpAdm, as I'm using Linux on a USB stick. It's a real pain, but it takes 30 mins and every run after that is fine.

qpAdm

qpWave is supposed to be used before qpAdm, to verify the number of admixing populations in the target pop. To be honest I never used it and still got results that were consistent with published papers, but it should be used first.

As for running qpAdm, it's the same idea as with qpDstat. All the commands for running qpAdm can be found in the material downloaded from the Reich lab. The right pops list is essential for qpAdm: make sure you pick pops from every possible contributing admixture angle in relation to the donor pops in the left pop list. So for modelling a European target, this means right pops like WHGs, EHGs, Anatolia_N, CHG, Levant_N, Iran_N, Natufians, Kostenki, Ust-Ishim and possibly any others that are relevant to European genetic prehistory, to get every possible admixture angle covered, as well as Mota as a base, going at the top of the right pop list (it was mentioned once in a paper, I think Lazaridis 2016 on Basal Eurasian, that it is important to have Mota as a base)

Kale
03-19-2019, 03:19 AM
I have both gsl and openblas installed in Fedora25 (via "dnf install gsl" and "dnf install openblas"), which are needed for compiling AdmixTools. During the process of compiling though, gsl and openblas cannot be found. What gives?
The binaries on github are not the newest versions, to get the newest versions you have to compile them yourself from what I understand.
Also, the qpgraph program needs one or both of those things to run as well, and can't find them.
Why can't Linux things ever just work? :(

Kale
03-21-2019, 05:51 AM
To follow up on the above issue. If 'installing' gsl straight from online or whatever doesn't work, try downloading the source files and build/compile it yourself, that worked for me. I can now confirm AdmixTools can be compiled & run on Debian9.8.0 as well.

anglesqueville
03-28-2019, 07:39 AM
No problem at all with Debian 9.8? No shared-library issues? I ask because I intend to buy a new system devoted solely to Linux and genetics in the upcoming months, and I'm wondering which Linux distribution to use. If Debian works perfectly, including with the updated versions of Admixtools, that's very good news for me. Is the same true of Eigensoft?

Kale
03-28-2019, 03:06 PM
I have not tried the newest version of admixtools. I ran the script available here...
http://docs.hpc.shef.ac.uk/en/latest/iceberg/software/apps/admixtools.html
But out of the 6 steps you'll see at the bottom of the page
1) Just downloads admixtools from github, anyone can do that
2) Installs GSL, it didn't do that right so I had to do it manually
3-4) Builds & installs admixtools... I don't remember if the script did that or I did it manually
5-6) nothing really
Also, what's eigensoft, what does that do?

anglesqueville
03-28-2019, 03:59 PM
Eigensoft is Patterson's package that contains smartpca and convertf (the latter is precious for conversions between the usual formats, in particular eigenstrat and plink/packedped). About the newest versions of admixtools: I installed them without any problem on fedora25, but was unable to do the same on Helgenes' fedora27 (or 28, I don't remember), because of very painful shared-library issues. If I build a new system on a very recent CPU (16 cores, perhaps 32, depending on the price of the new AMD processors expected this coming summer), I believe it will not be a good idea to choose an outdated Linux distribution. On the other hand, if I choose an up-to-date distribution but am unable to install admixtools on it... well, some months of reflection.

Kale
03-29-2019, 01:50 AM
For convertf I always just used the binary from github which worked out-of-box on Fedora25. I can't imagine convertf requires many dependencies.
When I get some time, which will probably be next Sunday, I'll try and get the newest version of admixtools working and give convertf a run.

Hmm... off-topic maybe maybe not. Weird issue with qpgraph. Working with a single graph file for all instances...
1) Original: worst outlier -3.297
2) Turned off lsqmode in the parfile: worst outlier -3.328 (basically no change)
3) Turned lsqmode back on, changed initmix from 1000 to 3000: worst outlier 13.973, proportions obviously wrong in multiple instances.
4) Changed initmix back to 1000 (all settings now back to original): result is not back to original, still looks like 3.
It's messed up in a way I have seen happen before when putting edges in certain places that are too unanchored maybe? for it to process right.

Ok nevermind... if you run into that issue close the terminal I guess and open a new one?

tahamimo
05-12-2020, 12:00 PM
Hi,

I followed the instruction but when running the D statistics, I receive the locus name errors:


$ qpDstat -p parqpDstat [-l lo] [-h hi] >logfile
fatalx:
bad chrom: APLT01000001.1
Aborted


here is a glimpse of how my snp file looks like:


loc0_pos17 APLT01000001.1 0.0 822 G T
loc1_pos100 APLT01000001.1 0.0 1070 G T
loc7_pos19 APLT01000001.1 0.0 3008 T G
loc7_pos32 APLT01000001.1 0.0 3021 T C
loc7_pos150 APLT01000001.1 0.0 3139 C A
loc7_pos176 APLT01000001.1 0.0 3165 C T
loc7_pos179 APLT01000001.1 0.0 3168 A G
loc7_pos184 APLT01000001.1 0.0 3173 A G
loc16_pos41 APLT01000003.1 0.0 2263 G A
loc16_pos55 APLT01000003.1 0.0 2277 C T
loc18_pos14 KL579098.1 0.0 19491 T C
loc19_pos7 KL579098.1 0.0 38380 G A
loc19_pos22 KL579098.1 0.0 38395 C A
loc21_pos46 KL579099.1 0.0 7699 G T
loc21_pos70 KL579099.1 0.0 7723 G A
loc22_pos5 KL579099.1 0.0 27665 T C
loc22_pos10 KL579099.1 0.0 27670 A T
loc22_pos32 KL579099.1 0.0 27692 A C
loc25_pos11 KL579100.1 0.0 8407 A C
loc25_pos24 KL579100.1 0.0 8420 C T
loc25_pos31 KL579100.1 0.0 8427 T C
loc25_pos66 KL579100.1 0.0 8462 T A
loc25_pos67 KL579100.1 0.0 8463 G A
loc25_pos75 KL579100.1 0.0 8471 A C
loc25_pos82 KL579100.1 0.0 8478 C A
loc25_pos98 KL579100.1 0.0 8494 G A
loc25_pos100 KL579100.1 0.0 8496 C A
loc26_pos42 KL579100.1 0.0 23965 C T
loc27_pos7 KL579100.1 0.0 49741 A G
loc27_pos8 KL579100.1 0.0 49742 C G
loc27_pos14 KL579100.1 0.0 49748 A G
loc27_pos16 KL579100.1 0.0 49750 G C
loc27_pos23 KL579100.1 0.0 49757 T C
loc27_pos24 KL579100.1 0.0 49758 A G
loc27_pos39 KL579100.1 0.0 49773 C G
loc27_pos43 KL579100.1 0.0 49777 T C
loc27_pos45 KL579100.1 0.0 49779 C G
loc27_pos93 KL579100.1 0.0 49827 T A
loc27_pos95 KL579100.1 0.0 49829 G C
loc27_pos122 KL579100.1 0.0 49856 G T
loc27_pos129 KL579100.1 0.0 49863 T C

do you know how I should fix this?

Kale
05-15-2020, 05:29 AM
If I had to guess I'd say it's expecting a chromosome number in column 2 rather than APLT####

misnomer
05-17-2020, 09:19 AM
This is what your snp file should look like when working with admixtools


rs3094315 1 0.020130 752566 G A
rs12124819 1 0.020242 776546 G A
rs28765502 1 0.022137 832918 C T
rs7419119 1 0.022518 842013 G T
rs950122 1 0.022720 846864 C G
rs113171913 1 0.023436 869303 T C
rs13302957 1 0.024116 891021 G A
rs59986066 1 0.024183 893462 C T
rs112905931 1 0.024260 896271 T C
rs6696609 1 0.024457 903426 T C
rs13303368 1 0.024771 914852 G C
rs8997 1 0.025727 949654 A G

1st column is SNP name
2nd column is chromosome. X chromosome is encoded as 23.
Also, Y is encoded as 24, mtDNA is encoded as 90, and XY is encoded as 91.
Note: SNPs with illegal chromosome values, such as 0, will be removed
3rd column is genetic position (in Morgans). If unknown, ok to set to 0.0.
4th column is physical position (in bases)
Optional 5th and 6th columns are reference and variant alleles.
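
To see quickly whether a .snp file has anything in column 2 other than the codes listed above, a small check like the following may help. It is only a sketch: the set of accepted values (1 to 24, 90 and 91) is taken from the description in this post, and the function name bad_chroms is made up.

```shell
# List every value in column 2 of a .snp file that is not one of the
# chromosome codes described above (1-24, 90, 91), with a count for each.
# Prints nothing when every row has an accepted code.
# bad_chroms is a hypothetical helper name.
bad_chroms() {
    awk '$2 !~ /^[0-9]+$/ || !(($2 >= 1 && $2 <= 24) || $2 == 90 || $2 == 91) { n[$2]++ }
         END { for (c in n) print c, n[c] }' "$1" | LC_ALL=C sort
}
```

Running bad_chroms mydata.snp on a file like the one posted earlier would list the scaffold names (APLT01000001.1 and so on) together with how many SNPs carry each.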

tahamimo
05-19-2020, 09:38 AM
Thanks for your explanation,
so based on what you said I cannot change chrom names to random numbers because they have meanings
however, I noticed you changed all to 1, although they have different names, that means that I have only 1 chrom type in my second column? (all different names follow the .1 decimal)

tahamimo
05-19-2020, 11:34 AM
I have already tried changing all chrom names to 1
when running I get:

fatalx:
(ineigenstrat) too many lines in file 189229 189229
Aborted

then I tried to cut the snps to only 20,000, still when trying to run, receive the same error:

fatalx:
(ineigenstrat) too many lines in file 20001 20001
Aborted

how much less should my snps be?

accordingly my genotype file is also huge (200,000 rows), should I cut that too??
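
One thing worth ruling out before cutting anything is a plain mismatch between the three files. In text-format EIGENSTRAT the .geno file has one row per SNP and one character per individual, so the .snp and .geno files should have the same number of rows and every .geno row should be as wide as the .ind file is long. The sketch below checks just that. It assumes unpacked text files sharing one prefix (it will not work on packed binary formats), and the function name is made up. It is not claimed to explain the error above, only to catch one possible cause.

```shell
# Compare the three files of a text-format EIGENSTRAT dataset.
#   $1 = shared prefix, e.g. "mydata" for mydata.snp, mydata.geno, mydata.ind
# Prints the counts and returns success only when they agree.
# check_eigenstrat is a hypothetical helper name.
check_eigenstrat() {
    nsnp=$(wc -l < "$1.snp" | tr -d ' ')
    ngeno=$(wc -l < "$1.geno" | tr -d ' ')
    nind=$(wc -l < "$1.ind" | tr -d ' ')
    # Count genotype rows whose width differs from the number of individuals.
    badrows=$(awk -v n="$nind" 'length($0) != n { c++ } END { print c + 0 }' "$1.geno")
    echo "snp rows: $nsnp  geno rows: $ngeno  individuals: $nind  odd-width geno rows: $badrows"
    [ "$nsnp" -eq "$ngeno" ] && [ "$badrows" -eq 0 ]
}
```

If the row counts differ, trimming only one of the files by hand will make things worse, since row N of the .geno file always belongs to row N of the .snp file.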

misnomer
05-19-2020, 03:06 PM
You should not be editing the SNP file or genotype file manually. What eigenstrat dataset are you using? The chromosome names shouldn't all be 1. They should run from 1, 2, 3 up to 24. I just gave a small snippet of my snp file: there are ~93000 SNPs on chromosome 1, a similar number on chromosome 2, and so on. In total, my snp file has 1233013 rows (downloaded from the Reich lab website at Harvard).

What are you trying to accomplish? Maybe you need to download your dataset again and start afresh, as it's now corrupted.

tahamimo
05-19-2020, 04:12 PM
I have a vcf file. I used vcf2eigenstrat to create the genotype and snp files from it, and the snp file I showed in my first comment is the original output. So basically I cannot change it manually, which leaves me with two options:
either finding another program to be able to convert my snp format into the desired format for admixtools
or give up on this package as it is not really practical for a wide variety of input formats

Do you have a suggestion for my first option?

Kale
05-20-2020, 02:09 AM
I have a general question on qpgraph... is there a way to specify the admixture proportions for a single admixture event? Usually qpgraph can pick out the right proportions even with quite minimal constraints, but occasionally it doesn't, and without adding another population to constrain the possible weights, I'd like to just specify the weights because they are known from other runs or other lines of evidence.

misnomer
05-20-2020, 01:42 PM
Maybe the ".1" at the end of your column 2 denotes chromosome number 1. What does column 2 of the last row of the snp file say? If it's in the 20s, you know for sure it's the chromosome number. In that case, edit the 2nd column to keep only the digits after the full stop, and leave the other columns as they are.

First, redownload the vcf and reconvert it, just so you are sure it's not corrupted.

tahamimo
05-23-2020, 10:43 AM
I have all my original data, nothing is corrupted misnomer :D

The column 2 doesn't seem to be chromosomes,
my dataset are RADseq not the whole genome sequencing,

this is the tail part:

loc37498_pos217 CM002851.1 0.0 13352 A G
loc37501_pos27 CM002851.1 0.0 13880 T C
loc37504_pos140 CM002851.1 0.0 14485 T C
loc37504_pos148 CM002851.1 0.0 14493 A C
loc37506_pos50 CM002851.1 0.0 14817 C T
loc37506_pos80 CM002851.1 0.0 14847 C T
loc37506_pos85 CM002851.1 0.0 14852 A T
loc37506_pos133 CM002851.1 0.0 14900 C T
loc37506_pos135 CM002851.1 0.0 14902 T A
loc37506_pos152 CM002851.1 0.0 14919 T C
loc37506_pos179 CM002851.1 0.0 14946 T C

TuaMan
06-15-2020, 06:07 PM
So I downloaded Fedora 32 recently, running it off VMware. I have my geno, ind, snp, par file, list_qpDstat file,and qpDstat executable (pulled straight from the 'Release' tab of the Admixtools github page) all saved in my bin folder. If I right click on any blank space within the bin folder, I can open the terminal, and it looks like it can already recognize that I'm in the bin file path where I have everything saved. But when I write my "qpDstat -p parqpDstat [-l lo] [-h hi] >logfile" code, I get this back:

bash: qpDstat: command not found...

If I try to just enter my code the long way including the file path, i.e. "cd /myfilepath/bin/qpDstat -p parqpDstat [-l lo] [-h hi] >logfile," I get:

bash: cd: too many arguments

Anyone have any ideas? For the second error, I'm guessing the redundancy of including the "cd /filepath" is causing the error, since the terminal already recognizes it's opened from the bin folder itself, but I'm concerned with the first error, if I download the executables straight from the site why is it not being recognized? I shouldn't have to recompile it myself, right?

Kale
06-16-2020, 02:32 AM
For... bash: qpDstat: command not found

Right click the d-stat executable > properties > check something to the effect of 'allow running as executable'.
I ran into the same thing, aggravated the crap out of me, it's like really?!

TuaMan
06-16-2020, 03:06 AM
No luck :/ the "Allow executing file as program" is already checked off. All other access permission are set to Read/Write too.

Kale
06-16-2020, 03:14 AM
Hmmm... maybe go to the bin folder > open terminal from there (so it starts in the bin folder), and try...
./qpDstat -p parqpDstat [-l lo] [-h hi] >logfile
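
Some background on why the "./" matters: when you type a bare program name, the shell only searches the folders listed in its PATH variable, and the current folder is normally not one of them, so a program sitting right there still gives "command not found". Writing ./qpDstat points at the file directly. The sketch below checks both conditions discussed here (the file is present and it is marked executable). The function name check_program is made up.

```shell
# Report whether a downloaded program can be started from the current folder.
#   $1 = program file name, e.g. qpDstat
# check_program is a hypothetical helper name.
check_program() {
    if [ ! -f "./$1" ]; then
        echo "$1: not found in $(pwd)"
        return 1
    elif [ ! -x "./$1" ]; then
        # chmod +x does the same as ticking the "allow executing" box.
        echo "$1: present but not executable; try: chmod +x $1"
        return 1
    fi
    echo "$1: ok, start it as ./$1"
}
```

Run it from the bin folder, for example: check_program qpDstat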

TuaMan
06-16-2020, 03:25 AM
Well maybe progress of a sort, looks like it actually tried to run when I throw in the "./" before the rest of the code but it failed again due to this:

fatalx:
No samples: Yoruba
Aborted (core dumped)

Kale
06-16-2020, 03:30 AM
Are you sure it's Yoruba and not Yoruba.DG? It has to be exact, including case.

TuaMan
06-16-2020, 03:44 AM
I copied both the list_qpDstat and example.ind files straight from the master directory download to my bin folder. The list_qpDstat looks like this:

Yoruba
French
Han
Uygur

And the example.ind file:

SAMPLE0 F Case
SAMPLE1 M Case
SAMPLE2 F Control
SAMPLE3 M Control
SAMPLE4 F Control

Do I need to rename the SAMPLEs to Yoruba/French/Han/Uygur in the ind file itself?

Kale
06-16-2020, 03:47 AM
Ahhh yep, indeed you do. The last column is your population names, in this case the only populations you have are 'Case' and 'Control'.
Hmmm also, there is something else... The list should look like...
Yoruba French Han Uygur
Not...
Yoruba
French
Han
Uygur

There is a setting somewhere you have to change to be able to run multiple stats with one command.
What does your parfile look like?
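
A "No samples" error like the one above can be checked before running anything by comparing the labels in the population list against column 3 of the .ind file. This is a rough sketch and both function names are made up. The comparison is exact, including case, as mentioned earlier.

```shell
# Count the samples carrying each population label (column 3 of a .ind file).
# list_pops and missing_pops are hypothetical helper names.
list_pops() {
    awk '{ n[$3]++ } END { for (p in n) print p, n[p] }' "$1" | LC_ALL=C sort
}

# Print every label used in a population list file that never appears
# in column 3 of the .ind file. Prints nothing when all labels are present.
#   $1 = .ind file, $2 = population list (one or several labels per line)
missing_pops() {
    awk 'NR == FNR { seen[$3] = 1; next }
         { for (i = 1; i <= NF; i++) if (!($i in seen) && !reported[$i]++) print $i }' "$1" "$2"
}
```

With the example.ind shown above, list_pops example.ind gives "Case 2" and "Control 3", and missing_pops example.ind list_qpDstat prints all four names from the list, because none of them appears in the file.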

TuaMan
06-16-2020, 05:11 AM
Success!!

result: Yoruba French Han Uygur 0.0410 32.300 - 30753 28328 621026
result: Yoruba Han French Uygur 0.0865 51.447 - 33690 28328 621026
result: Yoruba Uygur French Han 0.0456 22.403 best 33690 30753 621026
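
The logfile holds more than these lines, so here is a small sketch for pulling out just the "result:" rows and putting the strongest signal first. It assumes the layout shown above, where the seventh whitespace-separated field (32.300 in the first row) is the Z-score. The function name results_by_z is made up.

```shell
# Print the "result:" lines of a qpDstat log, largest |Z| first.
# Field 7 is taken to be the Z-score, as in the output shown above.
# results_by_z is a hypothetical helper name.
results_by_z() {
    grep '^result:' "$1" |
        awk '{ z = $7; if (z < 0) z = -z; print z "\t" $0 }' |
        LC_ALL=C sort -k1,1nr |
        cut -f2-
}
```

Usage: results_by_z logfile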



Basic error on my part - the ind/snp/geno files I was using originally were the dummy example files from the master download, I didn't actually download the example data-set from the Reich lab :doh: I re-read Bas's original post, noticed my oversight, downloaded the set, used the right allmap ind/geno/snp files, and it worked.

Regarding the list order, that was the default order of the 4 pops on the list_qpDstat file, and interestingly it actually ran the three tests for me above instead of just the one for Yoruba French Han Uyghur. The list_qpDstat1 file is the one that's supposed to run multiple tests for different pops, with 4 pops to a line, but it doesn't run all the permutations possible for each individual test like list_qpDstat did.

My list_qpDstat1 file:

Yoruba French Han Uygur
Yoruba French Japanese Uygur

My results:

result: Yoruba French Han Uygur 0.0410 32.300 30753 28328 621026
result: Yoruba French Japanese Uygur 0.0402 31.016 30690 28320 621026

Kale
06-16-2020, 05:31 AM
Ooo interesting, I didn't realize list_qpDstat setup ran permutations, that's pretty cool.
Might I ask when you get everything setup, which d-stats will you be running first? I'm sure you have a few in mind B)

Michalis Moriopoulos
06-17-2020, 12:31 AM
Is there any particular reason this program was designed for use with Linux and not the more common operating systems?

TuaMan
06-17-2020, 01:33 AM
Oh, what won't I be running B) Seriously though, I wanna take a stab at looking into the oldest available aDNA we have from across Africa, Europe, the Middle East, and Asia and seeing if I can find anything interesting. Deep ancestry and relationships stuff. For the most part I'm not super interested in getting into the weeds with figuring out complexities of admixture and relationships with Iron Age, Bronze Age, Copper Age samples, but I'll probably get into that stuff too on a case-by-case basis.


On whether there's any particular reason this program was designed for use with Linux and not the more common operating systems: that's something I've wondered about too, and that was the biggest reason preventing me from even making an attempt at this for so long. But I finally had enough free time this weekend and just said to hell with it, I'm interested enough in this stuff to at least make an effort to try to get more involved with it than just sitting around and waiting months (or years) for interesting papers to come out and have others analyze and explain everything for me. But I don't wanna get ahead of myself yet either, I still have to learn about merging different data-sets and what not which sounds like a mini-nightmare in itself, but that's gonna be a problem for this coming weekend most likely.:)

I'm not a programmer or any kind of tech geek at all, I don't think I've ever been in a Linux environment until this weekend. All the programs in Admixtools seem to be written in C, I'm not sure why Windows couldn't handle processing any of them.

Kale
06-17-2020, 02:08 AM
Alright! I was beginning to think I was alone in interest of deep ancestries :beerchug:


Don't worry, if you've gotten the programs running that's the hard part. Managing datasets is easy peazy.
I'd recommend getting notepad++ if you don't have it already, it's probably not uber-essential or even necessary for this (granted I've never tried without it), but it's such a damn useful program for any text management and is super lightweight.
Also very important tip, in your .fam files make sure you do an old-fashioned find&replace, if there's any (9 or -9 I forget which) replace it with 1.
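
On the "9 or -9" question: as far as I know, -9 is the usual missing-phenotype code that PLINK writes into the sixth column of a .fam file, so that is the default in the sketch below. The value can be overridden if your files use something else. The function name is made up, and the output is written with single spaces between columns.

```shell
# Rewrite a PLINK .fam file so that a given phenotype value in column 6
# becomes 1. The result goes to standard output; the input is unchanged.
#   $1 = .fam file
#   $2 = value to replace (optional, defaults to -9)
# fix_fam_phenotype is a hypothetical helper name.
fix_fam_phenotype() {
    awk -v old="${2:--9}" '$6 == old { $6 = 1 } { print }' "$1"
}
```

For example: fix_fam_phenotype mydata.fam > mydata.fixed.fam, then check the new file before replacing the original.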

polishguy
06-17-2020, 08:57 AM
Hi, for beginners it is better to use R runner for Admixtools. More https://anthrogenica.com/showthread.php?20714-bodkan-admixr-an-interface-for-running-ADMIXTOOLS-directly-from-R&p=676930&viewfull=1#post676930

Kale
07-06-2020, 05:17 AM
Had an issue with the new qpgraph using qpfstats

fatalx:
(vlog): negative or zero value 0
Aborted

Solution: Make sure the labeling section at the beginning of your graph file has the populations in the same order as they are in the list file used to generate the fstats.

TuaMan
07-30-2020, 04:23 AM
Are the executables no longer available on GitHub or something? When I first downloaded the version 6 zip file last month, on the "Releases" tab of the https://github.com/DReichLab/AdmixTools page you could download the actual executable files themselves directly, as Bas mentioned in his original post, but now it appears they're completely gone? Do we have to manually compile the programs from the source code included in the zip file download now?

I downloaded GCC which is supposed to be a C compiler. However whenever I go into the src folder and try to compile any programs - convertf, qpDstat, qp3Pop - the minute I write my gcc command with whichever file I want to compile I get this:

fatal error: nicklib.h: No such file or directory

Which makes no sense because I see that I have a nicklib.h file in my src folder...

EDIT: Completely forgot the executables are all saved out at the very bottom of the Releases page, under v1.0...I thought with each new version there would be a new roll-out of the programs too. Haven't touched any of this stuff in over a month since I first got up and running so my memory was a little hazy.