Thread: How do you deal with eigenstrat files

  1. #1

    How do you deal with eigenstrat files

    Hi,

    I am new to this stuff and have managed to get Admixtools running and can do the examples.

    However, I am not sure how I am supposed to deal with the data in the source files, because they are quite large and also I am running on a cloud VM.

    1. How do you get the names of the populations you are interested in from the files? Do you just open the files and look at the values in the column? The files are quite large.
    2. Does Admixtools run on individuals or on populations (the default)? How do you make it run on individuals, if that is possible?
    3. How do you run calculations across different datasets? Do you just merge the data into one big dataset? I know admixr has a function for this.
    4. Does the size of the dataset affect performance? If so, do you try to keep it as small as possible?
    Last edited by beyondAtheism; 05-14-2020 at 05:46 PM.

  2. #2

    What VM and distribution are you using?

  3. #3
    Ubuntu 18.04 on an Azure VM, Standard D2s v3 (2 vCPUs, 8 GiB memory), which I think is one of their standard low-end models.

    I'm getting the hang of it, kind of. For now I am noting all the populations I am interested in by grepping the .ind file, so that I have the names to use in R. I currently have an issue finding modern populations (North Indian). I am using the combined dataset with the HO array from Reich's page, but I am not sure which populations are in that HO array.
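
As a rough alternative to grepping, the population labels can be tallied directly from the .ind file, which is small even when the .geno file is large. The sketch below assumes the usual three whitespace-separated columns (sample ID, sex, population label); the function names and the file name in the usage comment are made up for illustration.

```python
from collections import Counter


def count_populations(ind_lines):
    """Count samples per population label in EIGENSTRAT .ind lines."""
    counts = Counter()
    for line in ind_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[-1]] += 1  # the population label is the last column
    return counts


def find_populations(counts, keyword):
    """Return the labels that contain keyword (case-insensitive), sorted."""
    key = keyword.lower()
    return sorted(label for label in counts if key in label.lower())


# Usage (hypothetical file name):
# with open("dataset.ind") as handle:
#     counts = count_populations(handle)
# for label in find_populations(counts, "india"):
#     print(label, counts[label])
```

Because only the .ind file is read, this should run comfortably on a small VM.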

  4. #4

    1. The .ind file is basically your list of samples. Change the population names in the last column to suit your needs.
    2. Populations by default. The only way I know to run on individuals is to change the labels of those individuals in the .ind file, so that each one has its own label in the population column.
    3. Yep. I merge with plink; I have never tried admixr.
    4. Yes, absolutely. If you have a large master dataset and plan on doing extensive work with a small subset of populations, I'd highly recommend making a separate dataset with just those samples of interest. Making a separate dataset is quite quick.
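
Point 2 can be sketched as a small script that rewrites the population column for chosen samples, so that each of them becomes its own single-sample "population". This is only an illustration under the assumption of a standard three-column .ind file; the function name and the file names in the usage comment are hypothetical, and it is safer to write to a new file than to overwrite the original.

```python
def relabel_as_individuals(ind_lines, sample_ids):
    """Give each listed sample its own population label (its sample ID)."""
    wanted = set(sample_ids)
    relabeled = []
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] in wanted:
            fields[-1] = fields[0]  # population label := sample ID
            relabeled.append(" ".join(fields))
        else:
            relabeled.append(line.rstrip("\n"))  # leave other lines untouched
    return relabeled


# Usage (hypothetical file names and sample IDs):
# with open("dataset.ind") as src:
#     new_lines = relabel_as_individuals(src, ["SampleA", "SampleB"])
# with open("dataset_individuals.ind", "w") as dst:
#     dst.write("\n".join(new_lines) + "\n")
```

The relabeled sample IDs can then be used wherever a population name is expected.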


  6. #5
    What he said. Also, only plink should be used for merging or for extracting a smaller dataset. If you use plink, you will also need convertf to convert between the EIGENSTRAT and plink formats.
    If you have Reich's latest HO dataset, you really do not need to merge anything right now, as most of the papers' samples are already there, except maybe the 2-3 newest papers.

    The HO dataset has more modern samples than the 1240k dataset, but I would still recommend using the 1240k dataset rather than the HO dataset (~530k SNPs) if possible.
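
For the convertf step, a parameter file has to be written first. The sketch below builds one for converting EIGENSTRAT files to plink binary format (PACKEDPED). The parameter names follow the convertf README as far as I know, but check them against the README shipped with your copy; the helper name, the file prefixes, and the assumption that the inputs end in .geno/.snp/.ind are all illustrative.

```python
def convertf_par(in_prefix, out_prefix):
    """Build convertf parameter text: EIGENSTRAT in, plink binary out."""
    params = [
        ("genotypename", f"{in_prefix}.geno"),
        ("snpname", f"{in_prefix}.snp"),
        ("indivname", f"{in_prefix}.ind"),
        ("outputformat", "PACKEDPED"),
        ("genotypeoutname", f"{out_prefix}.bed"),
        ("snpoutname", f"{out_prefix}.bim"),
        ("indivoutname", f"{out_prefix}.fam"),
    ]
    return "\n".join(f"{key}: {value}" for key, value in params) + "\n"


# Usage (hypothetical prefixes and file name):
# with open("eig2plink.par", "w") as handle:
#     handle.write(convertf_par("dataset", "dataset_plink"))
# Then, on the command line:  convertf -p eig2plink.par
```

The same helper with the input and output names swapped and `outputformat` set to EIGENSTRAT would cover the trip back after merging or extracting in plink.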
