
Thread: How to merge 2 datasets with different numbers of SNPs using plink?

  1. #11
    Quote Originally Posted by misnomer
    Just a tip to speed up qpAdm processing and save hard disk space (important when running Linux in a virtual machine); it might help someone.

    After merging the datasets, converting the PACKEDPED files to EIGENSTRAT, and editing the .ind file to add back the population labels, reconvert the EIGENSTRAT files to PACKEDANCESTRYMAP format with convertf. Also add the line 'hashcheck: NO' to the convertf parameter file; this lets you edit the population labels in the .ind file afterwards.
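A minimal sketch of the convertf parameter file for this tip, written as a small Python helper so the file names are easy to swap. It assumes the usual convertf parameter names (genotypename, snpname, indivname, outputformat, genotypeoutname, snpoutname, indivoutname) plus the 'hashcheck: NO' line described above; the function name make_convertf_par and the file prefixes in the example are hypothetical.

```python
def make_convertf_par(in_prefix, out_prefix, in_exts=("geno", "snp", "ind")):
    """Return the text of a convertf parameter file that converts
    <in_prefix>.<exts> into PACKEDANCESTRYMAP files <out_prefix>.geno/.snp/.ind."""
    if in_prefix == out_prefix:
        # Avoid overwriting the input .geno/.snp/.ind files with the output
        raise ValueError("use a different output prefix")
    geno_ext, snp_ext, ind_ext = in_exts
    lines = [
        f"genotypename: {in_prefix}.{geno_ext}",
        f"snpname: {in_prefix}.{snp_ext}",
        f"indivname: {in_prefix}.{ind_ext}",
        "outputformat: PACKEDANCESTRYMAP",
        f"genotypeoutname: {out_prefix}.geno",
        f"snpoutname: {out_prefix}.snp",
        f"indivoutname: {out_prefix}.ind",
        # From the tip: lets you edit the output .ind labels and still run qpAdm
        "hashcheck: NO",
    ]
    return "\n".join(lines) + "\n"
```

Save the returned text as, say, par.eig2packed and run convertf -p par.eig2packed. Passing in_exts=("bed", "bim", "fam") gives a PACKEDPED-to-PACKEDANCESTRYMAP parameter file instead.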

    The .geno file became 75% smaller, and qpAdm processing time dropped by 60%.
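The 75% figure is roughly what the two formats predict. A back-of-envelope sketch, assuming EIGENSTRAT stores one text character per genotype (plus a newline per SNP) and PACKEDANCESTRYMAP packs four genotypes per byte; the file header and any row padding are ignored, and the dataset size in the example is hypothetical.

```python
import math


def eigenstrat_geno_bytes(n_snps, n_inds):
    # EIGENSTRAT .geno: one ASCII character per genotype plus a newline per SNP row
    return n_snps * (n_inds + 1)


def packed_geno_bytes(n_snps, n_inds):
    # PACKEDANCESTRYMAP: 2 bits per genotype, i.e. 4 genotypes per byte.
    # The file header and any per-row padding are ignored in this estimate.
    return n_snps * math.ceil(n_inds / 4)


def size_reduction(n_snps, n_inds):
    """Fraction of .geno disk space saved by switching EIGENSTRAT -> PACKEDANCESTRYMAP."""
    return 1 - packed_geno_bytes(n_snps, n_inds) / eigenstrat_geno_bytes(n_snps, n_inds)
```

For a hypothetical 1,000,000 SNPs and 1,000 individuals, print(f"{size_reduction(1_000_000, 1000):.1%}") prints 75.0%. The qpAdm speedup is not modeled here.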
    You can skip one of the steps above by editing the PACKEDPED .fam file directly and adding back the population labels there (replace the 1s in the 6th column of the .fam file). Then run convertf straight to PACKEDANCESTRYMAP format instead of EIGENSTRAT. Remember to keep 'hashcheck: NO' in the convertf parameter file so that you can still edit the resulting .ind file and run qpAdm without errors.
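A rough sketch of the .fam edit, assuming a hypothetical labels dictionary that maps each individual ID (column 2) to its population; the helper name relabel_fam is also made up. It only rewrites column 6, which, per the tip, is where the population labels go. Since plink may complain about non-numeric values in that column, this is best done as the very last step, after all the plink merging.

```python
def relabel_fam(fam_lines, labels, default=None):
    """Return .fam lines with column 6 replaced by a population label.

    labels: hypothetical dict mapping individual ID (column 2) -> population.
    Individuals missing from labels keep their old value unless default is set.
    """
    out = []
    for line in fam_lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns in .fam line: {line!r}")
        iid = fields[1]
        if iid in labels:
            fields[5] = labels[iid]
        elif default is not None:
            fields[5] = default
        out.append(" ".join(fields))
    return out
```

For example, with from pathlib import Path: read essai.fam via Path("essai.fam").read_text().splitlines(), pass the lines through relabel_fam, write "\n".join(result) + "\n" back, and then run convertf on the edited fileset.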
    Last edited by misnomer; 02-07-2020 at 04:16 PM.

  2. #13
    Junior Member

    Quote Originally Posted by anglesqueville
    Here is how I usually work (I'm not claiming it is the best method). Say your 1040K file is A and the other one is B.
    1) Convert both files to bed/bim/fam (PACKEDPED) format with EIGENSOFT's convertf.
    Then, to make positioning problems less likely, exclude the SNPs in B that are not in A. You can try without this step, but in my experience it is often risky, and the later conversion to EIGENSTRAT may run into physical-distance problems:
    ./plink --bfile A --write-snplist
    ./plink --bfile B --extract plink.snplist --make-bed --out B1
    (You may have to add the --allow-no-sex flag. I assume you don't want to filter or prune in any way.)
    2) Try to merge:
    ./plink --bfile A --bmerge B1.bed B1.bim B1.fam --indiv-sort 0 --make-bed --out essai
    (The --indiv-sort 0 flag keeps the individuals of file B after those of A in the resulting .ind file.)
    If you don't run into multiallelic or multi-chromosome SNP problems, the job is done: rename the "essai" files and convert this plink fileset to EIGENSTRAT.
    3) If not, the essai-merge.missnp and essai.log files list the problematic SNPs. Usually I sacrifice them: I write these markers to a list (say "badsnps"), run
    ./plink --bfile B1 --exclude badsnps --make-bed --out B2
    and then repeat step 2 to merge A and B2. If you don't want to sacrifice them, you can try flipping the multiallelic SNPs and see whether the problem persists. For SNPs with multiple positions, I don't know of a simple way to keep them.
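A sketch of the "badsnps" bookkeeping in step 3. It assumes essai-merge.missnp holds one SNP ID per line and that the multi-position and multi-chromosome warnings in essai.log look like "Warning: Multiple positions seen for variant 'rs123'."; that log wording is an assumption, so check it against your own log and adjust the pattern if it differs. collect_bad_snps is a hypothetical helper name.

```python
import re

# Assumed plink 1.9 log wording; adjust if your essai.log phrases it differently.
_LOG_PATTERN = re.compile(
    r"Warning: Multiple (?:positions|chromosomes) seen for variant '([^']+)'"
)


def collect_bad_snps(missnp_text, log_text):
    """Union of SNP IDs from a .missnp file (one ID per line) and the
    multi-position / multi-chromosome warnings found in a plink log."""
    bad = {line.strip() for line in missnp_text.splitlines() if line.strip()}
    bad.update(_LOG_PATTERN.findall(log_text))
    return sorted(bad)
```

Write "\n".join(collect_bad_snps(missnp_text, log_text)) + "\n" to a file named badsnps, then run the ./plink --bfile B1 --exclude badsnps command from step 3.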
    Thanks for this. So how do I carry out step 3? What do I need to do to generate "essai-merge.missnp" with PLINK v1.9? I'm still running into problematic SNPs. Thanks in advance for answering.
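This does not explain how plink 1.9 produces essai-merge.missnp, but as a plink-independent cross-check, here is a sketch that compares the two .bim files directly. It sorts the shared SNPs into chromosome, position, possible strand-flip, and more-than-two-alleles conflicts, roughly the problem types discussed in the steps above. The function names are hypothetical, missing alleles coded 0 are ignored, and strand-ambiguous A/T or C/G SNPs cannot be detected this way.

```python
# Complement table used to test whether flipping B's strand resolves an allele clash
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def bim_records(bim_lines):
    """Map SNP ID -> (chromosome, bp position, set of non-missing alleles)."""
    records = {}
    for line in bim_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, snp, _cm, bp, a1, a2 = fields
        records[snp] = (chrom, int(bp), {a1, a2} - {"0"})
    return records


def find_conflicts(bim_a_lines, bim_b_lines):
    """Classify SNP IDs present in both .bim files whose annotations disagree."""
    a = bim_records(bim_a_lines)
    b = bim_records(bim_b_lines)
    conflicts = {"chromosome": [], "position": [], "flip_candidate": [], "multiallelic": []}
    for snp in sorted(a.keys() & b.keys()):
        chrom_a, pos_a, alleles_a = a[snp]
        chrom_b, pos_b, alleles_b = b[snp]
        if chrom_a != chrom_b:
            conflicts["chromosome"].append(snp)
        elif pos_a != pos_b:
            conflicts["position"].append(snp)
        # More than 2 alleles across both files is what makes a merge fail
        if len(alleles_a | alleles_b) > 2:
            flipped_b = {allele.translate(_COMPLEMENT) for allele in alleles_b}
            if len(alleles_a | flipped_b) <= 2:
                conflicts["flip_candidate"].append(snp)
            else:
                conflicts["multiallelic"].append(snp)
    return conflicts
```

The resulting lists could then be written to files for plink's --exclude or --flip options before retrying the merge.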
