How to merge 2 datasets with different number of SNPs using plink?

    Just a tip to speed up qpAdm processing time and save hard disk space (important on a Linux virtual machine); it might help someone.

    After merging the datasets, converting the PACKEDPED files to EIGENSTRAT format, and editing the .ind file to add back the population labels, convert the EIGENSTRAT files once more to PACKEDANCESTRYMAP format using convertf. Also add the line 'hashcheck: NO' to the parameter file. This lets you edit the population labels in the .ind file afterwards.

    The size of the .geno file was reduced by 75%, and the qpAdm processing time by 60%.
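    The reconversion described in this tip can be sketched as follows. This is only a sketch: the file names (merged.*, par.eigenstrat2packed) are placeholders I made up, and the parameter file uses the standard convertf keys together with the 'hashcheck: NO' line mentioned above.

```shell
# Write a convertf parameter file; all file names here are placeholders.
cat > par.eigenstrat2packed <<'EOF'
genotypename:    merged.geno
snpname:         merged.snp
indivname:       merged.ind
outputformat:    PACKEDANCESTRYMAP
genotypeoutname: merged.packed.geno
snpoutname:      merged.packed.snp
indivoutname:    merged.packed.ind
hashcheck:       NO
EOF

# Run the conversion only if convertf is actually on the PATH.
if command -v convertf >/dev/null 2>&1; then
    convertf -p par.eigenstrat2packed
fi
```

    The merged.packed.* files would then be the ones to point qpAdm at.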
    You can eliminate one step from the above by editing the PACKEDPED .fam file directly and adding back the population labels there (replace the '1's in the 6th column of the .fam file). Then run convertf straight to PACKEDANCESTRYMAP format instead of EIGENSTRAT format. Remember to keep 'hashcheck: NO' in the convertf parameter file so that you can edit the resulting .ind file and still run qpAdm without errors.
    Here is how I usually work (I'm not claiming that this is the best method). Say that your 1040k file is A and the other one is B.
    1) Convert the 2 files to bed/bim/fam (PACKEDPED) format with eigentools/convertf
    2) To make positioning problems less likely, exclude the SNPs in B that are not in A (you can try without this step, but in my experience skipping it is often risky, and the later conversion to eigenstrat may run into problems with physical distance):
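    Editing the 6th column of the .fam file by hand gets tedious for large panels, so here is one possible way to script it. The helper name relabel_fam and the two-column labels file ("sampleID population", one sample per line) are my own assumptions, not part of PLINK or convertf.

```shell
# relabel_fam LABELS FAM
# Print FAM with column 6 replaced by the population label of each sample.
# LABELS is a hypothetical two-column file: "sampleID population".
relabel_fam() {
    awk 'NR == FNR { pop[$1] = $2; next }   # first file: remember ID -> label
         ($2 in pop) { $6 = pop[$2] }        # .fam column 2 is the sample ID
         { print }' "$1" "$2"
}
```

    For example, relabel_fam labels.txt merged.fam > merged.labelled.fam writes a relabelled copy. Samples missing from the labels file keep their original 6th column, so it is worth checking the output before replacing the original .fam (and keeping a backup of it).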
    ./plink --bfile A --write-snplist
    ./plink --bfile B --extract plink.snplist --make-bed --out B1
    (You may have to add the flag --allow-no-sex. I assume that you don't want to filter or prune in any way.)
    2) Try to merge
    ./plink --bfile A --bmerge B1.bed B1.bim B1.fam --indiv-sort 0 --make-bed --out essai
    (The flag --indiv-sort 0 is there so that the individuals of file B follow those of A in the resulting .ind file.)
    If you didn't run into problems with multi-allelic SNPs or SNPs assigned to more than one chromosome, the job is done: rename the "essai" files and convert this plink fileset to eigenstrat.
    3) If not, the essai-merge.missnp and essai.log files will list the problematic SNPs. Usually I sacrifice them: I write these markers to a list (say "badsnps") and run ./plink --bfile B1 --exclude badsnps --make-bed --out B2, then repeat step 2 to merge A and B2. If you don't want to sacrifice them, you can try to flip the multi-allelic SNPs and see whether the problem remains. As for SNPs with multiple positions, I don't know of a simple way to avoid sacrificing them.
    Thanks for this. So how do I carry out step 3? What do I do to generate "essai-merge.missnp" when using PLINK v1.9? I still encounter problematic SNPs. Thanks in advance for answering.
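    A possible sketch of step 3, for what it is worth. With PLINK 1.9 there is nothing extra to run to get the .missnp file: when the merge in step 2 fails because of variants with three or more alleles, PLINK writes essai-merge.missnp itself (one variant ID per line), named after the --out prefix. The helper below, collect_bad_snps, is my own hypothetical name. It also assumes that the position and chromosome warnings in the log look like "Warning: Multiple positions seen for variant 'rs123'.", so check that against your own essai.log.

```shell
# collect_bad_snps PREFIX
# Print a sorted, de-duplicated list of problematic variant IDs for a merge
# that was run with --out PREFIX ("essai" in the post above).
collect_bad_snps() {
    prefix=$1
    {
        # IDs PLINK lists when the alleles do not match between the two files
        if [ -f "${prefix}-merge.missnp" ]; then
            cat "${prefix}-merge.missnp"
        fi
        # IDs named in "Multiple positions/chromosomes seen" warnings
        # (assumed wording of the log lines, see above)
        if [ -f "${prefix}.log" ]; then
            sed -nE "s/^Warning: Multiple (positions|chromosomes) seen for variant '(.*)'\.\$/\2/p" "${prefix}.log"
        fi
    } | sort -u
}
```

    After collect_bad_snps essai > badsnps, the --exclude command from step 3 and the merge from step 2 can be rerun as written. To try flipping instead of excluding, PLINK's --flip flag takes a list of variant IDs, for example ./plink --bfile B1 --flip essai-merge.missnp --make-bed --out B1_flipped, followed by another merge attempt. Whatever still fails after flipping is a candidate for the badsnps list.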

