Page 2 of 3 FirstFirst 123 LastLast
Results 11 to 20 of 29

Thread: R and shell scripts for admixture calculators

  1. #11
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    Use ComplexHeatmap to display calculator results in multiple columns

    Quote Originally Posted by PLogan View Post
    I now want to experiment with maybe doing this with other calculators such as K36, although that might produce a very busy chart.
    For calculators with a large number of components like Eurogenes K36 or MDLP K27, maybe a heatmap will be more clear. And a heatmap also has the advantage that you can easily see which components have small percentages like 0% or 1%.

    If you have a heatmap with a large number of columns and rows, then I think it's easier to read if you have row and column names on all four sides of the heatmap or if you split the heatmap in multiple columns. Neither of those is supported by the pheatmap package I used earlier, but you can use the ComplexHeatmap package instead:



    In order to reorder the columns of the heatmap, I did an MDS of the FST matrix of the calculator and I ordered the components based on their position in the first dimension of the MDS. I reordered the rows of the heatmap based on the position of the populations in a hierarchical clustering tree, where I reordered the branches of the tree so that populations with a high percentage of the first columns were placed first and populations with a high percentage of the last columns were placed last. Another approach to ordering the rows and columns would be to use the Seriation package: https://jokergoo.github.io/ComplexHe...tmap-seriation.

    Code:
    library(ComplexHeatmap)
    library(circlize) # for `colorRamp2`
    library(vegan) # for `reorder.hclust`
    library(colorspace) # for `hex`
    
    t=read.csv("https://pastebin.com/raw/0qDaQ8HL",row.names=1,check.names=F) # MDLP K27
    fst=read.csv("https://pastebin.com/raw/6t0gCQVh",row.names=1)
    
    mds=cmdscale(fst,ncol(fst)-1)
    t2=as.matrix(t)%*%mds
    
    t=t[,order(-mds[,1])] # reorder columns based on position in first dimension of MDS matrix of FST matrix
    
    t=t[reorder(hclust(dist(t2)),as.matrix(t)%*%seq(ncol(t))^2)$order,] # order rows based on position in hierarchical clustering tree so populations with a high percentage of the first columns are placed first
    # t=t[order(prcomp(t2)$x[,1]),] # order rows based on position in first dimension of PCA
    # t=t[reorder(hclust(dist(t2)),prcomp(t2)$x[,1])$order,] # order rows based on position in hierarchical clustering tree weighed by PC1 of PCA
    # ser=seriation::seriate(t,method="BEA_TSP");t=t[get_order(ser,1),get_order(ser,2)] # use the `seriation` package to reorder rows and columns (this doesn't take FST into account)
    
    rows=40
    mats=lapply(seq(1,nrow(t),rows),function(i)as.matrix(t[i:(i+rows-1),]))
    
    cellsize=30
    png("1.png",w=length(mats)*(ncol(t)*cellsize+1000),h=rows*cellsize+1000,res=72)
    
    maps=lapply(mats,function(mat){
      Heatmap(
        mat,
        show_heatmap_legend=F,
        show_column_names=F,
        show_row_names=F,
        cluster_columns=F,
        cluster_rows=F,
        width=ncol(mat)*unit(cellsize,"pt"),
        height=nrow(mat)*unit(cellsize,"pt"),
        top_annotation=columnAnnotation(text=anno_text(gt_render(colnames(mat),padding=unit(c(2,2,2,2),"mm")),just="left",rot=90,location=unit(0,"npc"),gp=gpar(fontsize=17))),
        bottom_annotation=columnAnnotation(text=anno_text(gt_render(colnames(mat),padding=unit(c(2,2,2,2),"mm")),just="left",rot=270,gp=gpar(fontsize=17))),
        left_annotation=rowAnnotation(text1=anno_text(gt_render(rownames(mat),padding=unit(c(2,2,2,2),"mm")),just="right",location=unit(1,"npc"),gp=gpar(fontsize=17))),
        right_annotation=rowAnnotation(text1=anno_text(gt_render(rownames(mat),padding=unit(c(2,2,2,2),"mm")),just="left",location=unit(0,"npc"),gp=gpar(fontsize=17))),
        col=colorRamp2(seq(0,100,length.out=7),hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1))),
        cell_fun=function(j,i,x,y,w,h,fill)grid.text(sprintf("%.0f",mat[i,j]),x,y,gp=gpar(fontsize=15))
      )
    })
    
    draw(Reduce(`+`,maps))
    dev.off()
    system("mogrify -gravity center -trim -border 16 -bordercolor white 1.png")
    I also made a script which uses ComplexHeatmap to display the results of an ADMIXTURE run with multiple K values side by side: https://anthrogenica.com/showthread....l=1#post770253.
    Last edited by Nganasankhan; 06-24-2022 at 03:00 AM.

  2. The Following 3 Users Say Thank You to Nganasankhan For This Useful Post:

     David Bush (06-27-2022),  PLogan (06-24-2022),  Telfermagne (06-26-2022)

  3. #12
    Registered Users
    Posts
    503
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    United States of America Scotland England Netherlands
    Thank you very much for the additional guidance.

  4. #13
    Registered Users
    Posts
    503
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    United States of America Scotland England Netherlands
    Injecting my family into the MDLP K27 heatmaps...


  5. The Following User Says Thank You to PLogan For This Useful Post:

     Nganasankhan (06-24-2022)

  6. #14
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    Display the closest populations to a population as a heatmap

    For some reason I hadn't thought of this before, but a nice way to show the closest populations to a population is to display them as a heatmap, so then you not only see the distance to each population but also the percentage of their components.

    I used the same method to account for FST as before, where I used matrix multiplication to multiply the matrix of admixture percentages with an MDS matrix of the FST matrix of the calculator.



    Code:
    library(ComplexHeatmap)
    library(circlize) # for `colorRamp2`
    library(vegan) # for `reorder.hclust`
    library(colorspace) # for `hex`
    
    t=read.csv("https://pastebin.com/raw/0qDaQ8HL",row.names=1,check.names=F) # MDLP K27
    fst=read.csv("https://pastebin.com/raw/6t0gCQVh",row.names=1)
    
    targ="Komi"
    head=64
    
    mds=cmdscale(fst,ncol(fst)-1)
    t2=as.matrix(t)%*%mds
    dist=sort(as.matrix(dist(t2))[targ,])
    t=t[,order(-mds[,1])] # rearrange columns based on the order of each component in the first dimension of MDS of the FST matrix
    t=t[head(names(dist),head),]
    rownames(t)=paste0(rownames(t)," (",sprintf("%.2f",head(dist,head)),")")
    
    cellsize=30
    png("1.png",w=ncol(t)*cellsize+1000,h=nrow(t)*cellsize+1000,res=72)
    
    draw(Heatmap(
      as.matrix(t),
      cell_fun=function(j,i,x,y,w,h,fill)grid.text(sprintf("%.0f",t[i,j]),x,y,gp=gpar(fontsize=15)),
      col=colorRamp2(seq(0,100,length.out=7),hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1))),
      show_heatmap_legend=F,
      show_column_names=F,
      show_row_names=F,
      cluster_columns=F,
      cluster_rows=F,
      width=ncol(t)*unit(cellsize,"pt"),
      height=nrow(t)*unit(cellsize,"pt"),
      column_title=paste0("Distance to ",targ," (multiplied by MDS of FST)"),column_title_gp=gpar(fontsize=36),
      left_annotation=rowAnnotation(text1=anno_text(gt_render(rownames(t),padding=unit(c(2,2,2,2),"mm")),just="right",location=unit(1,"npc"),gp=gpar(fontsize=17))),
      right_annotation=rowAnnotation(text1=anno_text(gt_render(rownames(t),padding=unit(c(2,2,2,2),"mm")),just="left",location=unit(0,"npc"),gp=gpar(fontsize=17)))
      top_annotation=columnAnnotation(text=anno_text(gt_render(colnames(t),padding=unit(c(2,2,2,2),"mm")),just="left",rot=90,location=unit(0,"npc"),gp=gpar(fontsize=17))),
      bottom_annotation=columnAnnotation(text=anno_text(gt_render(colnames(t),padding=unit(c(2,2,2,2),"mm")),just="left",rot=270,gp=gpar(fontsize=17))),
    ))
    
    dev.off()
    system("mogrify -gravity center -trim -border 16 -bordercolor white 1.png")

  7. The Following 3 Users Say Thank You to Nganasankhan For This Useful Post:

     David Bush (06-27-2022),  PLogan (06-24-2022),  Telfermagne (06-26-2022)

  8. #15
    Registered Users
    Posts
    503
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    United States of America Scotland England Netherlands
    Another learning lesson... converting the "Make a heatmap of the FST matrix of a calculator" example from K13 to K36

    Had to comment out the /1e3 on the read.csv or I got all zeros in the cells.

    Code:
    library(pheatmap)
    library(vegan) # for `reorder.hclust`
    library(colorspace) # for `hex`
    
    t=read.csv("d:/DataAnalysis/K36_fst.csv",row.names=1,check.names=F)  #/1e3  #"https://pastebin.com/raw/6JmN2hRY"
    
    hc=hclust(as.dist(t))
    hc=reorder(hc,cmdscale(t)[,1])
    
    pheatmap::pheatmap(
      t*1000,
      filename="K36_Heatmap_of_FST_Matrix.png",
      clustering_callback=function(...)hc,
      legend=F,
      cellwidth=16,
      cellheight=16,
      treeheight_row=80,
      treeheight_col=80,
      fontsize=8,
      border_color=NA,
      display_numbers=T,
      number_format="%.0f",
      fontsize_number=7,
      number_color="black",
      colorRampPalette(hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1)))(256)
    )


  9. The Following User Says Thank You to PLogan For This Useful Post:

     David Bush (06-27-2022)

  10. #16
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    Run calculators on the command line with ADMIXTURE

    I now figured out a faster way to run a calculator for a large batch of samples, which is to simply use the `-P` option of ADMIXTURE to project the samples. See section 2.14 of the ADMIXTURE manual (https://dalexander.github.io/admixtu...ure-manual.pdf). For example the following code runs Eurogenes K36 for all samples in the 1240K version of the Reich dataset (which took only about 2 hours on my computer, even though it would've taken about 4 days with the Python admix script):

    Code:
    wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_1240K_public.{anno,geno,ind,snp}
    f=v50.0_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
    curl -Lso K36.zip 'https://drive.google.com/uc?export=download&id=1vOJptDkvjDuukjDFQXhgnDDL0zHUJ4YN'
    unzip K36.zip
    plink --allow-no-sex --bfile v50.0_1240K_public  --extract <(cut -d\  -f1 K36.alleles) --geno .9999 --make-bed
    cut -d\  -f1 K36.alleles|paste - K36.36.F|awk 'NR==FNR{a[$1]=$0;next}{print a[$2]}' - plink.bim|cut -f2 >plink.36.P.in
    admixture -j4 -P plink.bed 36
    cut -d\  -f2 plink.fam|paste - <(awk '{for(i=1;i<=NF;i++)$i=sprintf("%.2f",100*$i)}1' plink.36.Q)|awk -F\\t 'NR==FNR{a[$2]=$13;next}{print a[$1]":"$0}' v50.0_1240K_public.anno -|tr '\t ' ,|sort>reich.k36.1240k
    The `--geno .9999` option is not actually needed in this case, but if you run ADMIXTURE with a smaller set of samples and you don't include the `--geno` option, then it can result in an error that "all genotypes are missing for a SNP locus".

    You can download a zip file for the calculator files for K36 from here: https://drive.google.com/file/d/1vOJ...hgnDDL0zHUJ4YN. (Mirror: https://onedrive.live.com/?authkey=%...1FDFEB45%21141.)

    Here's a CSV file for the K36 results of samples in the Reich dataset: https://drive.google.com/file/d/1dGD...IV4Cg6KnKHPeeX. I used the 1240K version when available and the HO version only for samples that were missing from the 1240K version. Eurogenes K36 has about 160,000 overlapping SNPs with the 1240K version but only about 50,000 overlapping SNPs with the HO version.

    The zip file linked above includes the text file `Eurogenes_K36_refs.txt` which contains a list of samples that were used as references for Eurogenes K36. I excluded a total of 1868 samples from the CSV file where the part after the last underscore of the sample ID in the text file was the same as the version ID or master ID of a sample in the Reich dataset. But that may have resulted in some false positives, and it also resulted in false negatives because many samples had different names in the text file and the Reich dataset. I didn't bother to remove samples that suffer from the calculator effect manually, but they include some Chuvash samples which got almost 100% of the Volga-Ural component, even though other Chuvash samples only got about 20%.

    The following code generates population averages so that it removes duplicate samples, it removes samples with less than 200,000 SNPs, it removes samples that have been marked as contaminated, and it removes various suffixes from the population names:

    Code:
    curl -Lso reich.k36.csv 'https://drive.google.com/uc?export=download&id=1dGDy5_jpDR6vxXYz1qIV4Cg6KnKHPeeX'
    curl -LsO https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_HO_public.anno
    LC_ALL=C sort -t$'\t' -rnk14 v50.0_HO_public.anno|awk -F\\t '!a[$3]++&&$14>=2e5{print$2}'|awk -F'[:,]' 'NR==FNR{a[$0];next}$2 in a' - <(sed 1d reich.k36.csv)|sed 's/:[^,]*//'||grep -v ^\\.\\.|sed -E 's/(_(son|daughter|brother|sister|father|mother|published|all|dup)|[._][12]d\.|[._]rel)[^,]*//'|sed -E 's/\.(SG|SDG|DG|WGA)//'|grev -v ^Ignore_|grep -Ev '_(contam|fail)'|awk -F, '{n[$1]++;for(i=2;i<=NF;i++){a[$1,i]+=$i}}END{for(i in n){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%.2f",a[i,j]/n[i]);print o}}'|sort|cat <(head -n1 reich.k36.csv) ->reich.k36.csv.ave
    Last edited by Nganasankhan; 06-26-2022 at 11:51 PM.

  11. The Following 3 Users Say Thank You to Nganasankhan For This Useful Post:

     Kale (06-27-2022),  MistH028 (07-06-2022),  PLogan (06-26-2022)

  12. #17
    Registered Users
    Posts
    503
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    United States of America Scotland England Netherlands
    Thank you so much for this K36 file Nganasankhan. I'm doing a little cleanup with kit names that have commas in them.

    Australian:IHW9118(BUR,E)
    Australian:IHW9193(WON,M)
    Australian:IHW9195(WON,C)
    and the seven Ignore_Australian kits.

    Nothing too bad.

  13. #18
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    I made two errors in the last shell command in post #16: `||grep` should be `|grep` and `grev` should be `grep`.

    And also in post #14, there's a comma missing from the end of the line that starts with `right_annotation`.

    Quote Originally Posted by PLogan View Post
    I'm doing a little cleanup with kit names that have commas in them.

    Australian:IHW9118(BUR,E)
    Australian:IHW9193(WON,M)
    Australian:IHW9195(WON,C)
    and the seven Ignore_Australian kits.
    I updated the file at Google Drive so I removed the part within the parentheses from the names of those Australian samples, so I changed for example `IHW9118(BUR,E)` to `IHW9118`.
    Last edited by Nganasankhan; 06-28-2022 at 03:53 AM.

  14. The Following User Says Thank You to Nganasankhan For This Useful Post:

     PLogan (06-28-2022)

  15. #19
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    Use Mantel's test to calculate a correlation coefficient between G25 and calculators

    I now used ADMIXTURE to project all samples in the 1240K+HO version of the Reich dataset using various calculators. I then selected 1053 modern samples which had the same sample name and population name in the Reich dataset and in G25, and for each calculator, I used Mantel's test to compare a distance matrix of the calculator results of the samples with a distance matrix of the same samples in G25 (https://anthrogenica.com/showthread....etic-distances). When I took FST into account by multiplying the matrix of calculator results with an MDS matrix of the FST matrix of the calculator, I got the highest correlation coefficient with Harappaworld (0.990), followed by Eurogenes K13 (0.987), Eurogenes K15 (0.986), and Eurogenes K36 (0.984):

    Calculator Correlation with G25
    (FST accounted)
    Correlation with G25
    (FST not accounted)
    SNPs in common
    with 1240K+HO
    Dodecad africa9 0.761 0.450 23889
    Dodecad eurasia7 0.913 0.821 45048
    Dodecad K12b 0.906 0.718 45265
    Dodecad K7b 0.909 0.817 45265
    Dodecad weac2 (K=7) 0.909 0.821 45877
    Eurogenes K13 0.987 0.852 52770
    Eurogenes K15 0.986 0.852 53263
    Eurogenes K36 0.984 0.684 46518
    Harappaworld (K=16) 0.990 0.854 56582
    MDLP K27 0.973 0.752 49757

    In the table above, accounting for FST resulted in a more dramatic increase in correlation with higher-K calculators like Eurogenes K36 and MDLP K27.

    I got the lowest correlation with Dodecad africa9, but it's probably because it's designed for Africans, so it has 7 African components, 2 West Eurasian components, and no East Eurasian or American component.

    I didn't attempt to exclude samples which suffered from the calculator effect, so in some cases the calculators with a lower correlation might simply have more samples that suffer from the calculator effect. One thing that might give an unfair advantage to the Eurogenes calculators is that my experiment only included samples that were in G25, but the G25 datasheets don't include samples that were part of the reference run for G25, so if Davidski also used the same references for G25 and Eurogenes calculators, then it might reduce the number of samples in my experiment that were part of the reference run of the Eurogenes calculators.

    For example this calculates a correlation coefficient between Dodecad K12b and G25:

    Code:
    library(ade4)
    
    t=read.csv("https://drive.google.com/uc?export=download&id=1nG0F4BiGKH5rGHWycVtEoG7w72uSANR2",r=1)
    g25=read.csv("https://drive.google.com/uc?export=download&id=1HYrDwxEXv82DvDLoq736pS5ZTGJA4dn5",r=1)
    pick=intersect(rownames(t),rownames(g25))
    
    fst=read.csv(h=F,text="0,121,74,123,62,48,81,132,66,118,41,173
    121,0,144,72,139,123,101,181,146,52,129,218
    74,144,0,142,56,67,99,97,60,137,52,129
    123,72,142,0,142,129,91,173,145,32,130,209
    62,139,56,142,0,41,98,136,52,137,33,185
    48,123,67,129,41,0,86,140,63,123,41,183
    81,101,99,91,98,86,0,135,99,85,82,172
    132,181,97,173,136,140,135,0,124,170,123,51
    66,146,60,145,52,63,99,124,0,139,47,169
    118,52,137,32,137,123,85,170,139,0,125,206
    41,129,52,130,33,41,82,123,47,125,0,171
    173,218,129,209,185,183,172,51,169,206,171,0")
    
    t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)
    
    mantel.rtest(dist(t2[pick,]),dist(g25[pick,]))
    I want to add more calculators to my comparison, but I have not yet found an FST matrix for other calculators.
    Last edited by Nganasankhan; 07-01-2022 at 03:47 AM.

  16. The Following User Says Thank You to Nganasankhan For This Useful Post:

     Tomenable (07-06-2022)

  17. #20
    Registered Users
    Posts
    315
    Sex
    Ethnicity
    Finnish

    Use Mantel's test to calculate a correlation coefficient between calculator results and an f2 matrix

    The following code downloads the 1240K+HO version of the Reich dataset, converts it to PLINK format, selects modern samples, ignores samples with certain affixes in the sample name, and selects the first 8 samples from each population with at least 8 samples. Then it uses ADMIXTOOLS 2 to create an f2 matrix between the populations:

    Code:
    wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_HO_public.{anno,ind,snp,geno}
    f=v50.0_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
    igno()(grep -Ev '\.REF|rel\.|fail\.|\.contam|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_relative|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutan|Primate_Chimp|hg19ref')
    x=eight;sed 1d v50.0_HO_public.anno|awk -F\\t '$5==0{print$2,$7}'|tr -d \"|grep -Fv .|igno|awk 'NR==FNR{++a[$0];next}a[$2]>=8&&++b[$2]<=8' <(cut -f7 v50.0_HO_public.anno) ->$x.pick
    plink --allow-no-sex --bfile v50.0_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v50.0_HO_public.fam) --make-bed --out $x
    awk 'NR==FNR{a[$1]=$2;next}{$1=a[$2]}1' $x.{pick,fam}>$x.fam.temp;mv $x.fam{.temp,}
    Rscript --no-init-file -e 'library(admixtools);f=f2("eight",unique_only=F);df=as.data.frame(f);m=xtabs(df[,3]~df[,2]+df[,1]);write.csv(round(m,8),"eight.f2",quote=F)'
    Then next the following code goes through the CSV files of results for various calculators, so that it selects the samples that are included in the f2 matrix from the CSV file, calculates their population average, and calculates the correlation coefficient of the population averages with the f2 matrix:

    Code:
    library(ade4)
    
    pick=read.table("eight.pick")[,1]
    
    sapply(c("africa9","eurasia7","harappaworld","k12b","k13","k15","k36","k7b","mdlpk27","weac2"),function(c){
      t=read.csv(paste0("reich.",c,".ho"),row.names=1,header=F)
      f=read.csv("eight.f2",row.names=1)
    
      t=t[sub(".*:","",rownames(t))%in%pick,]
      a=data.frame(aggregate(t,list(sub(":.*","",rownames(t))),mean),row.names=1)
    
      fst=as.matrix(as.dist(read.csv(paste0(c,"fst"),head=F)))
      a2=as.matrix(a)%*%cmdscale(fst,ncol(fst)-1)
    
      mantel.rtest(dist(a2),as.dist(f[rownames(a2),rownames(a2)]))$obs
    })
    Here are the results (the calculator results are multiplied by MDS of FST in each case):

    Calculator Correlation with f2 (no LD
    pruning before f2, 597,573 SNPs)
    Correlation with f2 (with LD
    pruning before f2, 115,808 SNPs)
    Dodecad africa9 0.755 0.778
    Dodecad eurasia7 0.848 0.831
    Dodecad K7b 0.840 0.823
    Dodecad K12b 0.841 0.824
    Dodecad weac2 (K=7) 0.863 0.858
    Eurogenes K13 0.947 0.925
    Eurogenes K15 0.945 0.921
    Eurogenes K36 0.964 0.950
    Harappaworld (K=16) 0.967 0.951
    MDLP K27 0.967 0.958

    Earlier I wasn't sure if I should perform LD pruning before I calculate f2 distances, but now I got higher correlation when I didn't do LD pruning, so I think I'll stop using LD pruning before f2. (In another experiment where I compared G25 distance to distance in `plink --pca`, I also got lower correlation when I did LD pruning before the PCA.)
    Last edited by Nganasankhan; 07-01-2022 at 08:03 AM.

  18. The Following User Says Thank You to Nganasankhan For This Useful Post:

     PLogan (07-04-2022)

Page 2 of 3 FirstFirst 123 LastLast

Similar Threads

  1. Shell commands and R scripts for working with G25
    By Nganasankhan in forum Autosomal (auDNA)
    Replies: 71
    Last Post: 08-13-2022, 03:30 PM
  2. Shell and R scripts for SmartPCA and ADMIXTURE
    By Nganasankhan in forum Autosomal (auDNA)
    Replies: 34
    Last Post: 07-12-2022, 12:40 PM
  3. Genoplot - ADMIXTURE Calculators
    By GenoPlot in forum GenoPlot
    Replies: 16
    Last Post: 08-17-2020, 02:23 AM
  4. Admixture Calculators
    By MR J in forum GenoPlot
    Replies: 1
    Last Post: 07-12-2020, 05:48 PM
  5. WHG-EEF-EHG-CHG Admixture... any calculators?
    By R.Rocca in forum Ancient (aDNA)
    Replies: 19
    Last Post: 05-21-2017, 10:15 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •