## R and shell scripts for admixture calculators

You can post your own scripts in other languages, and you can also post links to scripts made by other people.

Make an MDS plot based on the FST matrix of a calculator

MDS (multidimensional scaling) is essentially a version of PCA that takes a distance matrix as input. If you have the FST matrix of a calculator, you can use MDS to plot the components of the calculator in space so that the Euclidean distance between the points of the components is virtually the same as the FST distance between the components. The input matrix for MDS needs to be square, and the maximum number of dimensions MDS can produce is one fewer than the number of rows and columns of the input matrix, so for example it's 12 in the case of Eurogenes K13. Here you can see the first 6 dimensions out of 12: Code:
```library(tidyverse)
library(cowplot)
library(ggrepel)
library(ggforce)
library(colorspace) # for `rainbow_hcl`

# The line that read the FST matrix was garbled when the post was copied over;
# this is a reconstruction: the lower triangle of the K13 FST matrix (values
# multiplied by 1000), with component names assumed to follow the order of the
# K13 spreadsheet.
comps=c("North_Atlantic","Baltic","West_Med","West_Asian","East_Med","Red_Sea","South_Asian","East_Asian","Siberian","Amerindian","Oceanian","Northeast_African","Sub-Saharan")
fst=as.matrix(as.dist(read.csv(text=",,,,,,,,,,,,
19,,,,,,,,,,,,
28,36,,,,,,,,,,,
26,32,36,,,,,,,,,,
26,35,28,21,,,,,,,,,
52,62,50,48,39,,,,,,,,
64,65,76,57,60,82,,,,,,,
114,114,122,110,111,127,76,,,,,,
111,111,123,109,112,130,83,56,,,,,
138,137,154,138,144,161,120,113,105,,,,
179,181,187,177,176,191,146,166,177,217,,,
122,127,124,116,108,121,113,145,151,185,203,,
146,150,150,140,135,141,133,164,170,204,220,41,",header=F)))/1e3
dimnames(fst)=list(comps,comps)

mds=cmdscale(as.matrix(fst),ncol(as.matrix(fst))-1,eig=T)

p=as.data.frame(mds$points)
pct=paste0("Dim ",1:ncol(p)," (",sprintf("%.1f",100*mds$eig[1:ncol(p)]/sum(mds$eig)),"%)")

plots=lapply(seq(1,5,2),function(i){
xpc=paste0("V",i)
ypc=paste0("V",i+1)
seg=lapply(1:3,function(i)apply(as.data.frame(fst),1,function(x)unlist(p[names(sort(x)[i]),c(xpc,ypc)],use.names=F))%>%t%>%cbind(p[,c(xpc,ypc)]))%>%do.call(rbind,.)%>%setNames(paste0("V",1:4))

k=as.factor(cutree(hclust(as.dist(fst)),6))

ggplot(p,aes(!!sym(xpc),!!sym(ypc)))+
geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray60",size=.15)+
geom_point(aes(color=!!k),size=.5)+
# geom_text(aes(color=!!k),label=rownames(p),size=2,vjust=-.7)+
labs(x=pct[i],y=pct[i+1])+
scale_x_continuous(breaks=seq(-1,1,.02),expand=expansion(mult=c(.04,.04)))+
scale_y_continuous(breaks=seq(-1,1,.02),expand=expansion(mult=c(.04,.04)))+
scale_fill_manual(values=rainbow_hcl(nlevels(k),90,60))+
scale_color_manual(values=rainbow_hcl(nlevels(k),90,60))+
theme(
axis.text=element_text(size=6,color="black"),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"cm"),
axis.title=element_text(size=7),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray85",fill=NA,size=.6),
panel.grid.major=element_line(color="gray85",size=.2),
plot.background=element_rect(fill="white")
)
})

cowplot::plot_grid(plotlist=plots,ncol=3)
ggsave("1.png",width=15,height=5)```
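For intuition, what `cmdscale` does under the hood (classical MDS) can be sketched in pure Python on a toy example that is not part of the original scripts: double-center the squared distance matrix, then scale the leading eigenvector by the square root of its eigenvalue. Three collinear points keep it to a single dimension and a rank-1 matrix, so power iteration is enough here:

```python
# toy classical MDS: three points on a line with pairwise distances 1, 1, 2
import math

d = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
n = len(d)

# B = -1/2 * J * D^2 * J, with J the centering matrix (written out elementwise)
rm = [sum(x ** 2 for x in row) / n for row in d]   # row means of D^2
gm = sum(rm) / n                                   # grand mean of D^2
b = [[-0.5 * (d[i][j] ** 2 - rm[i] - rm[j] + gm) for j in range(n)] for i in range(n)]

# power iteration for the leading eigenpair (sufficient for this rank-1 example)
v = [1.0, 0.0, 0.0]
for _ in range(100):
    w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    s = math.sqrt(sum(x * x for x in w))
    v = [x / s for x in w]
lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))

coords = [math.sqrt(lam) * x for x in v]  # 1-D MDS coordinates: ~(1, 0, -1)
```

The Euclidean distances between the recovered coordinates reproduce the input distances exactly, because collinear points are perfectly embeddable in one dimension; `cmdscale` does the same with a full eigendecomposition across all dimensions.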
Calculate distance to a specified population so that FST is taken into account

When you use Vahaduo to calculate distances between populations in a calculator like K13, a population that is 100% Baltic is equally distant from a population that is 50% Siberian and 50% Baltic as from one that is 50% North_Atlantic and 50% Baltic. To calculate distances so that the FST distance between the components is taken into account, you can first run MDS on the FST matrix of K13 to plot the components of the calculator in 12-dimensional space, like in the previous script. You can then obtain the coordinates of each population inside the same 12-dimensional space by expressing the population as a linear combination of the coordinates of its components: if a population is 50% Siberian and 50% Baltic, it is moved halfway from the origin of the space towards the point of the Siberian component and halfway towards the point of the Baltic component. To generate the linear combination for each row in the spreadsheet of a calculator, you can simply multiply the spreadsheet matrix by the MDS matrix of the FST matrix.
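The linear-combination idea can be sketched with toy numbers (hypothetical 2-D component coordinates standing in for the 12-D MDS coordinates of the K13 components):

```python
# hypothetical component coordinates; a population that is 50% Siberian and
# 50% Baltic lands at the percentage-weighted average of the component points,
# i.e. the matrix product of the admixture row with the coordinate matrix
coords = {"Baltic": (0.0, 0.0), "Siberian": (10.0, 0.0), "North_Atlantic": (1.0, 2.0)}
pcts = {"Baltic": 50.0, "Siberian": 50.0, "North_Atlantic": 0.0}

point = [sum(pcts[c] / 100 * coords[c][i] for c in coords) for i in range(2)]
# point is halfway between Baltic and Siberian: [5.0, 0.0]
```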

For example, in the original K13 spreadsheet, Maris have a high percentage of the Baltic component, so if FST is not taken into account, Maris are closer to most Balto-Slavic populations than to any Central Asian population. But if FST is taken into account, Maris end up moving closer to some of the Central Asian populations that have a similar level of Mongoloid admixture to Maris. This calculates distance to Maris so that FST is taken into account:

Code:
```t=read.csv("https://pastebin.com/raw/H4pm9UF1",row.names=1,check.names=F) # K13 original

t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)

s=head(sort(as.matrix(dist(t2))[,"Mari"])*1e3,14) # reconstructed: this line was missing from the post; rescaled to match the output below
writeLines(paste(round(s),names(s)))```
Output:

0 Mari
656 Chuvash
1261 Tatar
1402 Nogay
1478 Afghan_Turkmen
1893 Turkmen
1947 Uzbeki
2128 Afghan_Hazara
2168 La_Brana-1
2211 Burusho
2411 East_Finnish
2417 Afghan_Pashtun
2516 MA-1
2518 Erzya

In another thread, I used Mantel's test to calculate a correlation coefficient between genetic distance calculated using admixture calculators and genetic distance calculated using f2 or G25 (showthread.php?22402-Mantel-s-Test-G25-vs-genetic-distances). I found that the correlation greatly increased when I multiplied the table of admixture percentages with an MDS matrix of the FST matrix of a calculator.
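The statistic behind a Mantel test can be sketched as the Pearson correlation between corresponding off-diagonal entries of two distance matrices (a toy Python sketch; the real test also permutes the rows and columns of one matrix to get a p-value):

```python
import math

def mantel_r(d1, d2):
    # Pearson correlation between the upper triangles of two distance matrices
    n = len(d1)
    x = [d1[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

d1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d2 = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # d1 scaled by 2: perfectly correlated
r = mantel_r(d1, d2)                    # -> 1.0
```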

Compare regular distance with distance where FST is taken into account

Here you can see that when FST is accounted for, then Maris move closer to Central Asians and South Asians but further from Europeans and Siberians: Code:
```library(tidyverse)
library(ggforce)
library(colorspace) # for `rainbow_hcl`

# `t` (the K13 spreadsheet) and `fst` are assumed to be loaded by the earlier scripts
t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)

targ="Mari"
xy=data.frame(x=rank(as.matrix(dist(t))[,targ]),y=rank(as.matrix(dist(t2))[,targ]))

k=as.factor(cutree(hclust(dist(t2)),24))

ggplot(xy,aes(x,y))+
geom_abline(linetype="dashed",color="gray80",size=.3)+
geom_point(aes(color=!!k),size=.5)+
geom_text(aes(color=!!k),label=rownames(xy),size=2,vjust=-.7)+
scale_x_continuous(breaks=seq(1,1000,20),expand=expansion(mult=c(.04,.04)))+
scale_y_continuous(breaks=seq(1,1000,20),expand=expansion(mult=c(.04,.04)))+
scale_fill_manual(values=rainbow_hcl(nlevels(k),90,60))+
scale_color_manual(values=rainbow_hcl(nlevels(k),90,60))+
labs(x=paste("Rank of K13 distance to",targ,"(FST not accounted for)"),y=paste("Rank of K13 distance to",targ,"(multiplied by MDS of FST)"))+
theme(
axis.text=element_text(size=6),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"cm"),
axis.title=element_text(size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray85",fill=NA,size=.6),
panel.grid.major=element_line(color="gray85",size=.2),
plot.background=element_rect(fill="white"),
plot.subtitle=element_text(size=7),
plot.title=element_text(size=11)
)

ggsave("1.png",width=7,height=7)```
Generate a CSV file for a calculator spreadsheet multiplied by MDS of FST

You can use the output of this script at Vahaduo to generate more accurate models than with a regular calculator spreadsheet (though in some cases it can be less accurate: for European populations, for example, this method also amplifies differences in noise-level admixture of non-European components).

Code:
```t=read.csv("https://pastebin.com/raw/H4pm9UF1",row.names=1,check.names=F) # K13 original

mds=cmdscale(fst,ncol(fst)-1)
t2=as.matrix(t)%*%mds

write.table(t(apply(t2,1,function(x)sub("^0","",sprintf("%.5f",x)))),"k13.mds.csv",sep=",",col.names=F,quote=F)```
Make a PCA plot which accounts for the FST distance between the components

If you make a PCA plot from all rows of the original K13 spreadsheet without accounting for FST, then in a plot of PC1 and PC2, one of the three corners of the plot is occupied by South Asians rather than by Africans like in a conventional global PCA. However, when you take FST into account, the PCA plot looks more like a typical global PCA, with Africans in the third corner: Code:
```library(tidyverse)
library(ggforce)

# `t` and `fst` are assumed to be loaded by the earlier scripts
t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)

p0=prcomp(t2) # use prcomp(t) instead for the version that doesn't account for FST
pct=paste0(colnames(p0$x)," (",sprintf("%.1f",100*p0$sdev^2/sum(p0$sdev^2)),"%)")
p=as.data.frame(p0$x)

p=p/sd(p[,1]) # divide the values of all PCs by the standard deviation of PC1 so the values are displayed on a normalized scale

dist=as.matrix(dist(t2))
k=as.factor(cutree(hclust(dist(t2)),24))

set.seed(1)
hue=sample(seq(15,375,length=nlevels(k)+1)[1:nlevels(k)]) # reconstructed: `hue` was not defined in the post
pal1=hcl(hue,100,55)

# draw a line between each population and its two nearest neighbors
seg=lapply(1:3,function(j)apply(dist,1,function(x)unlist(p[names(sort(x)[j]),c(1,2)],use.names=F))%>%t%>%cbind(p[,c(1,2)]))%>%do.call(rbind,.)%>%setNames(paste0("V",1:4))

ggplot(p,aes(PC1,PC2))+
geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray50",size=.1)+
geom_point(color=pal1[k],size=.3)+
geom_text(aes(label=rownames(p)),color=pal1[k],size=2,vjust=-.6)+
labs(x=pct[1],y=pct[2])+
scale_x_continuous(breaks=seq(-100,100,.5),expand=expansion(.09))+
scale_y_continuous(breaks=seq(-100,100,.5),expand=expansion(.03))+
theme(
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"pt"),
axis.text=element_text(color="black",size=6),
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.title=element_text(color="black",size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray80",fill=NA,size=.4),
panel.grid.major=element_line(color="gray90",size=.2),
panel.grid.minor=element_blank(),
plot.background=element_rect(fill="white",color=NA)
)

ggsave("1.png",width=6,height=6)```
Make a hierarchical clustering tree based on the FST matrix of a calculator

This uses `reorder.hclust` from `vegan` to reorder the branches of the clustering tree based on the position of each component on the first dimension in an MDS matrix of the FST matrix. (So basically if you think of the first two dimensions of the MDS matrix as the x- and y-axis of a PCA-style scatterplot, then the position of each component on the x-axis is used as the weight for reordering the branches of the clustering tree.) Code:
```library(ape) # for `plot.phylo`
library(vegan) # for `reorder.hclust`

png("1.png",w=700,h=800)

hc=hclust(as.dist(fst))
hc=reorder(hc,cmdscale(fst)[,1]) # use first dimension of MDS as weights for reordering the branches

plot(as.phylo(hc),cex=1.2,font=1,underscore=T) # `font=1` doesn't use italic and `underscore=T` doesn't replace underscores with spaces
dev.off()

system("mogrify -trim -bordercolor white -border 20 1.png") # remove empty space around the plot but add back a 20-pixel margin```
Make a clustered heatmap which accounts for FST distance between components

The branches are reordered so that the weight for each population is the position of the population in the first dimension of a PCA of the admixture matrix multiplied by the MDS matrix of the FST matrix. Code:
```library(pheatmap)
library(vegan) # for `reorder.hclust`
library(colorspace) # for `hex`

# `t` and `fst` are assumed to be loaded by the earlier scripts
t=t[sample(nrow(t),16),] # random subset of 16 rows so the heatmap stays readable

t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)
hc=hclust(dist(t2))
hc=reorder(hc,prcomp(t2)$x[,1])

# t=t[,order(colMeans(t))]
t=t[,ncol(t):1]

pheatmap::pheatmap(
t,
filename="1.png",
cluster_cols=F,
clustering_callback=function(...)hc,
legend=F,
cellwidth=16,
cellheight=16,
treeheight_row=80,
treeheight_col=80,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
# breaks=seq(0,100,100/256),
color=colorRampPalette(hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1)))(256)
)```
Make a heatmap of the FST matrix of a calculator

The distances are multiplied by 1000. Code:
```library(pheatmap)
library(vegan) # for `reorder.hclust`
library(colorspace) # for `hex`

# the post used `t` here, but for a heatmap of the FST matrix the input should be `fst` itself
hc=hclust(as.dist(fst))
hc=reorder(hc,cmdscale(fst)[,1])

pheatmap::pheatmap(
fst*1000,
filename="1.png",
clustering_callback=function(...)hc,
legend=F,
cellwidth=16,
cellheight=16,
treeheight_row=80,
treeheight_col=80,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
color=colorRampPalette(hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1)))(256)
)```
Sort populations based on their distance to a component

When you use matrix multiplication to multiply the spreadsheet of a calculator by the FST matrix of the calculator, you get a matrix which, instead of the percentage of each component in each sample, shows for each sample an admixture-weighted FST distance to each component.
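What one row of that product computes can be sketched with toy numbers (hypothetical FST values for just three components; the real K13 matrix has 13):

```python
# one row of t %*% fst: each entry is the admixture-weighted average FST to one
# component (columns in the order North_Atlantic, Baltic, Siberian; toy values)
fst = [[0.000, 0.019, 0.111],
       [0.019, 0.000, 0.111],
       [0.111, 0.111, 0.000]]
row = [0.2, 0.5, 0.3]  # a sample that is 20/50/30% of the three components

score = [sum(row[k] * fst[k][j] for k in range(3)) for j in range(3)]
# score[0] = 0.5*0.019 + 0.3*0.111 = 0.0428, and so on for the other columns
```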

For example this finds populations in K13 that have the lowest distance to the Siberian component:

Code:
```t=read.csv("https://pastebin.com/raw/9dMWpJcU",row.names=1,check.names=F)

# The FST matrix was garbled here when the post was copied over; reconstruction
# as in the first script (component names assumed to follow the K13 spreadsheet
# order; left in units of FST*1000 to match the output below)
comps=c("North_Atlantic","Baltic","West_Med","West_Asian","East_Med","Red_Sea","South_Asian","East_Asian","Siberian","Amerindian","Oceanian","Northeast_African","Sub-Saharan")
fst=as.matrix(as.dist(read.csv(text=",,,,,,,,,,,,
19,,,,,,,,,,,,
28,36,,,,,,,,,,,
26,32,36,,,,,,,,,,
26,35,28,21,,,,,,,,,
52,62,50,48,39,,,,,,,,
64,65,76,57,60,82,,,,,,,
114,114,122,110,111,127,76,,,,,,
111,111,123,109,112,130,83,56,,,,,
138,137,154,138,144,161,120,113,105,,,,
179,181,187,177,176,191,146,166,177,217,,,
122,127,124,116,108,121,113,145,151,185,203,,
146,150,150,140,135,141,133,164,170,204,220,41,",header=F)))
dimnames(fst)=list(comps,comps)

t2=as.matrix(t)%*%fst
s=head(sort(t2[,"Siberian"]),16) # reconstructed: this line was missing from the post
cat(paste(round(s),names(s)),sep="\n")```
Output:

1599 Evens
1737 Evenki
1818 Oroqen
2070 Yakut
2236 Dolgan
3010 Koryak
3109 Buryat
3501 Hezhen
3557 Tuvinian
3690 Xibo
3724 Chukchi
4012 Japanese
4081 Mongolian
4372 Selkup
4379 Ket
4453 Naxi

Find the 4 closest populations to each component

Code:
```$ curl https://pastebin.com/raw/9dMWpJcU|tr -d \\r>k13
$ curl https://pastebin.com/raw/6JmN2hRY|tr -d \\r>k13.fst
$ awk -F, 'NR==FNR{for(i=1;i<=NF;i++)a[NR,i]=$i;next}{for(i=1;i<=NF;i++){s=0;for(j=1;j<=NF;j++)s+=$j*a[i,j];printf"%s"(i==NF?"\n":FS),s}}' k13.fst <(sed 1d k13|cut -d, -f2-)|paste -d, <(cut -d, -f1 k13|sed 1d) -|cat <(head -n1 k13) ->k13.mult
$ for ((i=1;i<=$(wc -l<k13.fst);i++));do awk -F, '{print$(i+1),$1}' i=$i k13.mult|sed 1d|sort -n|awk '{$1=sprintf("%.0f",$1)}1'|head -n4|paste -sd\  -|sed s/^/$(head -n1 k13.mult|cut -d, -f$[i+1]):\ /;done
North_Atlantic: 1341 Norwegian 1350 Southwest_English 1353 West_Scottish 1356 North_Dutch
Baltic: 1465 Lithuanian 1587 Belorussian 1615 Russian_Smolensk 1685 Estonian_Polish
West_Med: 1493 Sardinian 2020 French_Basque 2264 Southwest_French 2336 Spanish_Aragon
West_Asian: 1553 Georgian 1587 Abhkasian 1891 Armenian 1956 Lezgin
East_Med: 1650 Lebanese_Christian 1666 Lebanese_Druze 1797 Cyprian 1805 Kurdish_Jewish
Red_Sea: 3234 Saudi 3493 Yemenite_Jewish 3995 Lebanese_Christian 4028 Samaritan
South_Asian: 1690 Chamar 1746 Piramalai 1752 Sakilli 1950 North_Kannadi
East_Asian: 919 Dai 946 She 1117 Vietnamese 1169 Miaozu
Siberian: 1599 Evens 1737 Evenki 1818 Oroqen 2070 Yakut
Amerindian: 45 Karitiana 1338 Pima 2017 Mayan 2844 North_Amerindian
Oceanian: 889 Papuan 5132 NAN_Melanesian 14564 Austroasiatic_Ho 14931 Sakilli
Northeast_African: 1365 Ethiopian_Gumuz 1586 Ethiopian_Anuak 1730 Sudanese 2144 Hadza
Sub-Saharan: 404 Yoruban 753 Bantu_S.W. 808 Bantu_S.E. 863 Mandenka```
R version:

Code:
`Rscript -e 't=read.csv("k13",row.names=1,check.names=F);fst=as.matrix(read.csv("k13.fst",head=F));t2=as.matrix(t)%*%fst;m=apply(t2,2,function(x){s=head(sort(x),4);paste(paste(round(s),names(s)),collapse=" ")});writeLines(paste0(names(t),": ",m))'`
Download a CSV file for the K13 results of all samples in the Reich dataset and generate subsets of the file

I made a CSV file with the K13 results of all samples in the Reich dataset. The file uses the 1240K version of each sample when available and the HO version otherwise, because Eurogenes K13 has more than three times as many overlapping SNPs with 1240K as with HO. The code below uses the file to generate population averages of modern and ancient samples, where it only includes ancient samples with at least 500,000 SNPs. It then multiplies the resulting files by an MDS matrix of the FST matrix of K13, and uses the multiplied files to find the closest ancient populations to the modern Mari average.

Code:
```$ curl -Lso reich.k13 'https://drive.google.com/uc?export=download&id=1pL2R5dDvQ0wsPw_SgoFn_zWLDrGXa9nM'
$ curl https://pastebin.com/raw/6JmN2hRY|tr -d \\r>k13.fst
$ wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_{1240K,HO}_public.anno
$ igno()(grep -Ev '\.REF|rel\.|fail\.|\.contam|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_relative|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutan|Primate_Chimp|hg19ref')
$ tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1,i]+=$i}}END{for(i in n){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i,j]/n[i]);print o}}' "FS=${1-$'\t'}")
$ awk -F\\t '$5==0{print$2}' v50.0_HO_public.anno|awk -F'[:,]' 'NR==FNR{a[$0];next}$2 in a' - reich.k13|igno|sed 's/:[^,]*//'|tav ,|sort>reich.k13.modern.ave
$ awk -F\\t '$5>0&&$14>=5e5{print$2}' v50.0_HO_public.anno|awk -F'[:,]' 'NR==FNR{a[$0];next}$2 in a' - reich.k13|igno|sed 's/:[^,]*//'|tav ,|sort>reich.k13.ancient.ave
$ # reconstructed step (missing from the post): multiply both average files by MDS of the FST matrix
$ for f in modern ancient;do Rscript -e 'a=read.csv(paste0("reich.k13.",commandArgs(T),".ave"),header=F,row.names=1);fst=as.matrix(read.csv("k13.fst",header=F));write.table(as.matrix(a)%*%cmdscale(fst,ncol(fst)-1),paste0("reich.k13.",commandArgs(T),".ave.mds"),sep=",",col.names=F,quote=F)' $f;done
$ dist()(awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' "$2" "$1"|sort -n|awk '{$1=sprintf("%."x"f",$1)}1' "x=${3-3}")
$ dist reich.k13.ancient.ave.mds <(grep ^Mari.SG, reich.k13.modern.ave.mds)|head -n8
0.483 Kazakstan_Sargat_IA
0.507 Russia_Karasuk_oRISE.SG
0.614 Russia_Tuva_IA_AldyBel
0.700 Kazakhstan_Saka_IA
0.715 Russia_Chalmny_Varre
0.719 Kazakhstan_Tasmola_Saka_IA
0.741 Kyrgyzstan_Saka_IA
0.779 Kazakhstan_IA_Tasbas```
The code below uses my modified version of michal3141's CVXPY script to create a model for the modern Mari average using ancient population averages (https://github.com/michal3141/g25). The models generated by the script are usually almost identical to Vahaduo. If you get an error that the solver CVXOPT failed, try running `sed -i s/CVXOPT/ECOS_BB/ mix`.

Code:
```$ curl https://pastebin.com/raw/afaMiFSa|tr -d \\r>mix;chmod +x mix;pip3 install cvxpy
$ ./mix reich.k13.ancient.ave.mds <(grep Mari reich.k13.modern.ave.mds)
Mari.SG (.058):
34% Latvia_BA
23% Russia_KusnarenkovoKarajakupovo_Medieval.SG
23% Russia_Chalmny_Varre
12% Russia_Krasnoyarsk_BA.SG
4% Georgia_Kotias.SG
1% Russia_KolymaRiver_LN.SG
1% Congo_Kindoki_Protohistoric
1% Iran_ShahrISokhta_BA2
0% Jordan_EBA```
Make a PCA of individual samples using Voronoi tessellation

Documentation for `ggforce::geom_voronoi_tile`: https://ggforce.data-imaginist.com/r...om_delvor.html. Code:
```library(tidyverse)
library(ggforce)

# `t` (sample-level K13 results), `fst`, and `anno` (the Reich .anno file) are
# assumed to be loaded already, e.g. t=read.csv("reich.k13",row.names=1,check.names=F)
# and anno=read.delim("v50.0_HO_public.anno",check.names=F)
t=t[!grepl("\\.REF|Ignore_|_o|_lc|_outlier|_dup|_contam|_father|_mother|_son|_daughter|_brother|_sister|_sibling|_twin|\\.rel\\.|Neanderthal|Denisova|Vindija|Orangutang|Macaque|Marmoset|Primate_Chimp|hg19ref",rownames(t)),]

picks=anno[anno[,5]==0|anno[,5]>=4000,2]

mds=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)
mds=as.data.frame(mds[sub(".*:","",rownames(mds))%in%picks,])

rownames(mds)=sub("\\.(DG|SG|SDG|WGA)","",rownames(mds))
mds=mds[!sub(":.*","",rownames(mds))%in%c("PUR","CLM"),]
popav=data.frame(aggregate(mds,list(sub(":.*","",rownames(mds))),mean),row.names=1)
poppick=rownames(popav) # assumption: the post presumably defined a curated list of populations here; this keeps all of them
mds=mds[sub(":.*","",rownames(mds))%in%poppick,]
pop=sub(":.*","",rownames(mds))

p0=prcomp(mds)
p=as.data.frame(p0$x)
pct=paste0(colnames(p0$x)," (",sprintf("%.1f",100*p0$sdev^2/sum(p0$sdev^2)),"%)")

set.seed(1)
color=as.factor(sample(seq(1,length(unique(pop)))))
col=rbind(c(60,80),c(25,95),c(100,70),c(30,70),c(70,50),c(60,100),c(20,50),c(15,40),c(15,70),c(20,30),c(80,30))
hues=max(ceiling(length(color)/nrow(col)),7)
pal1=as.vector(apply(col,1,function(x)hcl(seq(15,375,length=hues+1)[1:hues],x,x)))
pal2=as.vector(apply(col,1,function(x)hcl(seq(15,375,length=hues+1)[1:hues],ifelse(x>50,.8*x,.2*x),ifelse(x>50,.3*x,100))))

for(i in seq(1,3,2)){
pc1=sym(paste0("PC",i+1))
pc2=sym(paste0("PC",i))

mult=diff(range(p[,i+1]))/diff(range(p[,i]))
p[,i]=p[,i]*mult
max=max(abs(p[,i]))
ranges=apply(p,2,function(x)abs(max(x)-min(x)))
maxrange=max(ranges[c(i,i+1)])

centers=aggregate(p,by=list(pop),mean)

ggplot(p,aes(!!pc1,!!pc2,group=-1L))+
# reconstructed layers: the geoms were lost from the post; the tile radius is arbitrary
geom_voronoi_tile(aes(fill=!!color[as.integer(as.factor(pop))]),max.radius=maxrange/50,color=NA)+
geom_text(data=centers,aes(label=Group.1),color=pal2[color[as.integer(as.factor(centers$Group.1))]],size=2)+
coord_fixed(expand=F)+
labs(x=pct[i+1],y=pct[i])+
scale_fill_manual(values=pal1)+
scale_color_manual(values=pal2)+
theme(
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"pt"),
axis.title=element_text(color="black",size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray85",fill=NA,size=.4),
panel.grid.major=element_blank()
)

ggsave(paste0(i,".png"),width=7,height=7)
}```
Make an area chart that displays changes in the global average share of components over time

The code below uses my CSV file for the K13 results of all samples in the 1240K version of the Reich dataset. It rounds the age of each sample to the nearest 1000 years, and it then calculates the global average percentage of the K13 components for each 1000-year period. Code:
```library(tidyverse)

# `t` (the 1240K reich.k13 file) and `anno` (the 1240K .anno file) are assumed
# to be loaded already
t=t[!grepl("\\.REF|rel\\.|fail\\.|\\.contam|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_relative|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutan|Primate_Chimp|hg19ref|_o",rownames(t)),]

age=anno[,c(2,9)]%>%data.frame(row.names=1,check.names=F)
age2=round(age/1000)
tage=age2[sub(".*:","",rownames(t)),]
ave=aggregate(t,list(tage),mean)%>%data.frame(row.names=1,check.names=F)

taco=function(x)data.frame(V1=rownames(x)[row(x)],V2=colnames(x)[col(x)],V3=unname(do.call(c,x)),check.names=F)
t2=taco(ave)
t2$V1=as.numeric(t2$V1)
group=factor(t2$V2,levels=unique(t2$V2))

pal=paste0("#",unlist(strsplit("79a6f8 44ccf9 ece879 c5ec79 f5c361 ec9f79 36d17b ff78b8 e178ff 7442b2 782e2e 714c33 4e3d32"," ")))

ggplot(t2,aes(x=V1,y=V3,color=!!group,fill=!!group))+
geom_area(size=.1,alpha=.8)+
labs(x="Thousands of years BP",y="Percentage of component")+
scale_x_continuous(breaks=seq(0,200,5),expand=expansion(mult=c(.005,.005)),trans="reverse")+
scale_y_continuous(breaks=seq(0,100,20),expand=expansion(mult=c(.005,.005)))+
scale_color_manual(values=pal)+
scale_fill_manual(values=pal)+
guides(color=F,fill=guide_legend(ncol=7))+
theme(
axis.text=element_text(size=6,color="black"),
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"cm"),
axis.title=element_text(size=8),
legend.background=element_blank(),
legend.box.just="center",
legend.box.margin=margin(0,unit="cm"),
legend.box.spacing=unit(0,"in"),
legend.direction="vertical",
legend.justification="center",
legend.key=element_rect(fill="white"),
legend.margin=margin(-1,0,0,0),
legend.position="bottom",
legend.text=element_text(size=7,vjust=.5),
legend.title=element_blank(),
panel.background=element_rect(fill="white"),
panel.grid=element_blank(),
plot.background=element_rect(fill="white")
)

ggsave("1.png",width=8,height=4)```
Shell version:

Code:
```$ curl -Lso reich.k13 'https://drive.google.com/uc?export=download&id=1-x0ZOywaMH3v1KsZYaK0fbPMpReDxE6m'
$ curl -LsO https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_HO_public.anno
$ awk -F\\t '{print sprintf("%.0f",$5/1000)*1000,$2}' v50.0_HO_public.anno|awk -F'[ :,]' 'NR==FNR{a[$2]=$1;next}{print a[$2]","$0}' - reich.k13|sed 1d|grep -v ^,|awk -F, '{for(i=3;i<=NF;i++)a[$1][i]+=$i;n[$1]++}END{for(i in a){o=i","n[i];for(j=3;j<=NF;j++)o=o","sprintf("%.1f",a[i][j]/n[i]);print o}}'|sort -t, -nk1,1|cat <(head -n1 reich.k13|sed s/^/age,n/) -|column -s, -t```
Make a scatterplot which shows the total percentage of two sets of components or the average distance to two sets of components

Each population is connected with a line to its two nearest neighbors. The populations are divided into clusters by cutting a hierarchical clustering tree at the height where it has 24 subtrees. Code:
```library(tidyverse)
library(ggforce)

# `t`, `fst`, and `anno` are assumed to be loaded already (see the earlier scripts)
t=t[sub(".*:","",rownames(t))%in%anno[anno[,5]==0,2],]
t=t[!grepl("\\.REF|rel\\.|fail\\.|\\.contam|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_relative|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutan|Primate_Chimp|hg19ref|_o",rownames(t)),]

t=aggregate(t,list(sub(":.*","",rownames(t)%>%gsub("\\.(DG|SDG|SG|WGA)|_published","",.))),mean)%>%data.frame(row.names=1)

dm=as.matrix(t)%*%as.matrix(fst)
colnames(dm)=colnames(t)
t2=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)

p1=c("Baltic","North_Atlantic")
p2=c("East_Asian","Siberian","Amerindian")
xy=data.frame(x=rowSums(t[,p1]),y=rowSums(t[,p2]))
# xy=data.frame(x=rowMeans(dm[,p1]),y=rowMeans(dm[,p2]))

# pick=apply(xy<14,1,all)
# xy=xy[pick,]
# t2=t2[pick,]

seg=lapply(1:3,function(i)apply(as.matrix(dist(t2)),1,function(x)unlist(xy[names(sort(x)[i]),],use.names=F))%>%t%>%cbind(xy))%>%do.call(rbind,.)%>%setNames(paste0("V",1:4))

lim=c(min(xy),max(xy))+c(-1,1)*(max(xy)-min(xy))/20

k=cutree(hclust(dist(t2)),24)

ggplot(xy,aes(x,y))+
geom_abline(linetype="dashed",color="gray80",size=.3)+
geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray50",size=.1)+
geom_point(size=.5,aes(color=as.factor(!!k)))+
geom_text(aes(color=as.factor(!!k)),label=rownames(xy),size=2,vjust=-.7)+
scale_x_continuous(breaks=seq(0,100,10),expand=expansion(mult=c(.06,.06)))+
scale_y_continuous(breaks=seq(0,100,10),expand=expansion(mult=c(.06,.06)))+
# assumption: the post reused `pal1` from an earlier script; any 24-colour palette works
scale_fill_manual(values=colorspace::rainbow_hcl(24,90,60))+
scale_color_manual(values=colorspace::rainbow_hcl(24,90,60))+
coord_cartesian(xlim=lim,ylim=lim,expand=F)+
labs(x=paste0("Total percentage of ",paste(p1,collapse=" + ")),y=paste0("Total percentage of ",paste(p2,collapse=" + ")))+
# labs(x=paste0("Average distance to: ",paste(p1,collapse=" + ")),y=paste0("Average distance to: ",paste(p2,collapse=" + ")))+
theme(
axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
axis.text=element_text(size=6),
axis.ticks=element_blank(),
axis.ticks.length=unit(0,"cm"),
axis.title=element_text(size=8),
legend.position="none",
panel.background=element_rect(fill="white"),
panel.border=element_rect(color="gray85",fill=NA,size=.6),
panel.grid.major=element_line(color="gray85",size=.2),
panel.grid.minor=element_blank(),
plot.background=element_rect(fill="white"),
plot.subtitle=element_text(size=7),
plot.title=element_text(size=11)
)

ggsave("1.png",width=7,height=7)```
Visualize the results of a calculator as a polygonal diagram

This is basically a version of a ternary plot that can be generalized to more than three dimensions. A ternary plot is a type of graph where points are drawn inside an equilateral triangle so that the position of each point is determined by three variables that sum to a constant (https://en.wikipedia.org/wiki/Ternary_plot). Code:
```library(tidyverse)
library(ggforce)
library(colorspace)

# `t` (sample-level K13 results) and `fst` are assumed to be loaded already
t=t[!grepl("\\.REF|rel\\.|fail\\.|\\.contam|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_relative|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutan|Primate_Chimp|hg19ref|_o",rownames(t)),]

rownames(t)=sub("\\.(SG|DG|SDG|WGA|WGC)|_published","",rownames(t))
t=data.frame(aggregate(t,list(sub(":.*","",rownames(t))),mean),row.names=1,check.names=F)

mds=as.matrix(t)%*%cmdscale(fst,ncol(fst)-1)

t=data.frame(t[,2],t[,9],t[,8],rowSums(t[,c(1,3:7,10:13)]))
names(t)=c("Baltic","Siberian","East_Asian","Other")

# reconstructed: the line that computed `corners` was lost from the post; the
# corners of a regular polygon on the unit circle, rotated by `start` so that a
# square sits on its side (cf. the triangle example further down)
start=ifelse(ncol(t)==4,.25,0)
corners=sapply(c(sin,cos),function(f)head(f((seq(0,2,length.out=ncol(t)+1)+start)*pi),-1))
corners=corners*min(2/diff(apply(corners,2,range)))
corners[,2]=corners[,2]-mean(range(corners[,2]))

xy=as.data.frame(as.matrix(t)%*%corners/100) # /100 because the spreadsheet is in percentages
grid=as.data.frame(rbind(cbind(corners,rbind(corners[-1,],corners[1,])),cbind(corners,matrix(apply(corners,2,mean),ncol=2,nrow=ncol(t),byrow=T))))

seg=lapply(1:3,function(i)as.matrix(dist(mds))%>%apply(1,function(x)unlist(xy[names(sort(x)[i]),],use.names=F))%>%t%>%cbind(xy))%>%do.call(rbind,.)%>%setNames(paste0("V",1:4))

k=as.factor(cutree(hclust(dist(mds)),32))

set.seed(1)
hue=sample(seq(0,360,length.out=nlevels(k)+1)[-1]) # reconstructed: `hue` was not defined in the post
pal1=hex(colorspace::HSV(hue,.5,1))
pal2=hex(colorspace::HSV(hue,.3,1))

angle=(atan2(corners[,1],corners[,2])*180/pi)%%360 # reconstructed: label angles for the rotated-label variant
angle=ifelse(angle>90&angle<=270,angle+180,angle)

ggplot(xy,aes(x=V1,y=V2))+
geom_polygon(data=as.data.frame(corners),fill="gray25")+
# geom_text(data=as.data.frame(corners),aes(x=1.04*V1,y=1.04*V2),label=colnames(t),size=3.2,angle=angle,color="gray85")+ # use rotated labels (for pentagon and higher number of corners)
geom_text(data=as.data.frame(corners),aes(x=V1,y=V2+sign(V2)*.02),vjust=(1-corners[,2])/2,hjust=(1+corners[,1])/2,label=colnames(t),size=3.2,color="gray85")+ # don't rotate labels (for square and triangle)
geom_segment(data=grid,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray30",size=.4)+
geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray10",size=.25)+
geom_point(aes(color=!!k),size=.5)+
geom_text(aes(label=rownames(xy),color=!!k),size=2.2,vjust=-.6)+
coord_fixed(xlim=c(-1,1),ylim=c(-1,1))+
scale_fill_manual(values=pal1)+
scale_color_manual(values=pal2)+
theme(
axis.text=element_blank(),
axis.ticks=element_blank(),
axis.title=element_blank(),
legend.position="none",
panel.background=element_rect(fill="gray20"),
panel.grid=element_blank(),
plot.background=element_rect(fill="gray20",color=NA,size=0),
plot.margin=margin(0,0,0,0)
)

ggsave("1.png",width=13,height=13)```
The following code first gets the coordinates of the corners of an equilateral triangle inscribed in a unit circle, and then uses matrix multiplication to plot the Gedrosia K3 results of populations inside the triangle:

Code:
```> triangle=sapply(c(sin,cos),function(x)head(x(seq(0,2,length.out=3+1)*pi),-1))
> triangle
           [,1] [,2]
[1,]  0.0000000  1.0
[2,]  0.8660254 -0.5
[3,] -0.8660254 -0.5
> head(as.matrix(k3)%*%triangle,2) # reconstructed command (lost from the post); `k3`: Gedrosia K3 fractions with rows summing to 1
                [,1]     [,2]
Finnish  -0.77162863 -0.33650
Nganasan -0.03230275  0.94405```
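The same construction can be checked in Python: corners of a regular n-gon on the unit circle via sin and cos, and a composition placed at the weighted average of the corners (a toy composition; assumes fractions summing to 1):

```python
import math

n = 3
# same corner layout as the R `triangle` matrix: (sin, cos) pairs on the unit circle
corners = [(math.sin(2 * math.pi * i / n), math.cos(2 * math.pi * i / n)) for i in range(n)]
comp = [0.5, 0.25, 0.25]  # toy composition

# the point for the composition is the weighted average of the corner coordinates
pt = [sum(w * c[i] for w, c in zip(comp, corners)) for i in range(2)]
# the first corner is (0, 1), matching the first row of the R output above
```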
Run calculators on the command line with Python

As far as I know, the DIYDodecad program only comes with binaries for Windows and Linux; Dienekes wrote it in Fortran but didn't publish the source code (https://dodecad.blogspot.com/2011/08...lator-for.html). However, there is now also a Python-based alternative to DIYDodecad (https://github.com/stevenliuyi/admix):

Code:
```$ wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_HO_public.{anno,ind,snp,geno}
$ f=v50.0_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
$ plink --bfile v50.0_HO_public --keep <(grep Chuvash33 v50.0_HO_public.fam) --recode 23
$ admix -f plink.txt -m K12b # reconstructed: the admix call itself was missing from the post

Calcuation is started...

K12b
Gedrosia: 2.06%
Siberian: 20.86%
Northwest African: 1.27%
Southeast Asian: 0.00%
Atlantic Med: 4.46%
North European: 53.12%
South Asian: 0.71%
East African: 0.00%
Southwest Asian: 1.85%
East Asian: 2.89%
Caucasus: 12.78%
Sub Saharan: 0.00%```
It's important to specify a lower termination tolerance with `-t 1e-7`, because if you use the default tolerance of 1e-3, then certain samples end up getting close to 100% one component and 0% of all other components, like for example this Vologda Russian sample with Eurogenes K13:

Code:
```$ plink --bfile v50.0_HO_public --keep <(grep HGDP00899 v50.0_HO_public.fam) --recode 23
$ for t in 1e-3 1e-5 1e-6 1e-7;do admix -f plink.txt -m K13 -t $t;done # reconstructed: the admix calls were missing from the post (tolerances assumed)

Calcuation is started...

K13
North_Atlantic: 0.12%
Baltic: 99.76%
West_Med: 0.00%
West_Asian: 0.00%
East_Med: 0.00%
Red_Sea: 0.00%
South_Asian: 0.00%
East_Asian: 0.00%
Siberian: 0.07%
Amerindian: 0.00%
Oceanian: 0.05%
Northeast_African: 0.00%
Sub-Saharan: 0.00%

Calcuation is started...

K13
North_Atlantic: 3.95%
Baltic: 71.60%
West_Med: 5.06%
West_Asian: 2.74%
East_Med: 6.62%
Red_Sea: 0.67%
South_Asian: 0.00%
East_Asian: 0.16%
Siberian: 6.24%
Amerindian: 1.26%
Oceanian: 1.31%
Northeast_African: 0.04%
Sub-Saharan: 0.36%

Calcuation is started...

K13
North_Atlantic: 4.09%
Baltic: 71.69%
West_Med: 4.73%
West_Asian: 2.64%
East_Med: 6.46%
Red_Sea: 0.91%
South_Asian: 0.00%
East_Asian: 0.11%
Siberian: 6.05%
Amerindian: 1.37%
Oceanian: 1.55%
Northeast_African: 0.03%
Sub-Saharan: 0.36%

Calcuation is started...

K13
North_Atlantic: 4.09%
Baltic: 71.69%
West_Med: 4.73%
West_Asian: 2.64%
East_Med: 6.46%
Red_Sea: 0.91%
South_Asian: 0.00%
East_Asian: 0.11%
Siberian: 6.05%
Amerindian: 1.37%
Oceanian: 1.55%
Northeast_African: 0.03%
Sub-Saharan: 0.36%```
I tried running Eurogenes K13 with different tolerance parameters for all 87 samples with the population name "Russian" in the HO version of the Reich dataset, because they included a few other samples whose results got messed up when I used the default tolerance of 1e-3. The average change in admixture proportions was about 0.3 percentage points between 1e-3 and 1e-4, but it fell to close to zero between 1e-5 and 1e-6, and even 1e-8 was almost as fast as 1e-3:

Tolerance   Avg. difference in admixture percentages   Running time
            compared to previous tolerance             per sample
1e-1        -                                          1.22
1e-2        3.144810                                   1.65
1e-3        3.515349                                   3.12
1e-4        0.316737                                   3.47
1e-5        0.002387                                   3.64
1e-6        0.000097                                   3.51
1e-7        0.000062                                   3.65
1e-8        0.000088                                   3.65

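The "average difference" column in the table above can be reproduced for a single sample by pairing two outputs line by line and averaging the absolute differences. A sketch with fabricated stand-in files (real runs have 13 components each):

```shell
# Two fabricated outputs for the same sample at different tolerances:
cat > run_a.txt <<'EOF'
North_Atlantic: 0.12%
Baltic: 99.76%
West_Med: 0.00%
EOF
cat > run_b.txt <<'EOF'
North_Atlantic: 3.95%
Baltic: 71.60%
West_Med: 5.06%
EOF
# Join the files side by side and average |difference| over the components:
paste run_a.txt run_b.txt |
awk '{gsub(/%/,"");d+=($2>$4?$2-$4:$4-$2);n++}END{printf "%.2f\n",d/n}'
```

With these stand-in numbers it prints 12.35 (percentage points).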
I picked 1e-7 as the tolerance partly because it's also the tolerance specified in the parameter files of DIYDodecad calculators, like this one: http://magnusducatus.blogspot.com/20...alculator.html.

Here's how you can run a batch of multiple samples and make population averages:

Code:
```printf %s\\n Nganasan Finnish|awk 'NR==FNR{a[$0];next}$3 in a{print$1,$3}' - v50.0_HO_public.ind|while read x y;do plink --bfile v50.0_HO_public --keep <(awk '$2==x' "x=$x" v50.0_HO_public.fam) --allow-no-sex --recode 23 --out 23;admix -f 23.txt -m K12b -t 1e-7|grep %|sed 's/%//;s/.* //'|paste -sd, -|sed "s/^/$y:$x,/">>admix;done```
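The loop above only collects the per-sample rows; the population averages themselves can then be computed with a bit of awk. A sketch using a fabricated `admix_example` file with the same `Population:SampleID,percentages` layout that the loop writes:

```shell
# Fabricated batch output in the same format as the `admix` file above:
cat > admix_example <<'EOF'
Finnish:S1,10.00,90.00
Finnish:S2,20.00,80.00
Nganasan:S3,50.00,50.00
EOF
# Average each component column per population (assumes every row has the
# same number of components); sort to make the output order deterministic:
awk -F, '{split($1,a,":");p=a[1];n[p]++;nf=NF;for(i=2;i<=NF;i++)s[p,i]+=$i}
END{for(p in n){printf "%s",p;for(i=2;i<=nf;i++)printf ",%.2f",s[p,i]/n[p];print ""}}' admix_example | sort
```

For the stand-in data this prints `Finnish,15.00,85.00` and `Nganasan,50.00,50.00`.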
I ran Eurogenes K13 for all samples in the 1240K and HO versions of the Reich dataset (I didn't manually remove any samples, so my files include samples that suffer from the calculator effect because Davidski used them as reference samples for K13):

- https://drive.google.com/file/d/1-x0...K0fbPMpReDxE6m - 1240K (more SNPs but fewer modern samples)
- https://drive.google.com/file/d/1bQV...230z3nYSM9UQ7Z - HO (fewer SNPs but more modern samples)
- https://drive.google.com/file/d/1pL2...Fn_zWLDrGXa9nM - 1240K and HO combined (duplicate samples with the same master ID are removed so that only the sample with the highest SNP count is kept; the 1240K version is used when available and the HO version only for samples missing from 1240K)

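The duplicate-removal rule for the combined file (keep only the row with the highest SNP count per master ID) can be sketched like this; `dups.txt` and its `master_id snp_count version` layout are invented for illustration:

```shell
# Invented index of duplicate samples: master ID, SNP count, dataset version:
cat > dups.txt <<'EOF'
I001 900000 1240K
I001 500000 HO
I002 400000 HO
EOF
# Sort by SNP count descending, then keep the first (= biggest) row per ID:
sort -k2,2nr dups.txt | awk '!seen[$1]++'
```

This keeps `I001 900000 1240K` and `I002 400000 HO`.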
Eurogenes K13 has about 180,000 overlapping SNPs with the 1240K version of the Reich dataset but only about 50,000 overlapping SNPs with the HO version:

Code:
```$ wget https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V50/V50.0/SHARE/public.dir/v50.0_{HO,1240K}_public.snp
$ for x in 1240K HO;do cut -d' ' -f1 /usr/local/lib/python3.9/site-packages/admix/data/K13.alleles|LC_ALL=C sort|comm -12 <(awk '{print$1}' v50.0_${x}_public.snp|LC_ALL=C sort) -|wc -l|echo "$(cat) out of $(wc -l< v50.0_${x}_public.snp) SNPs overlap with $x";done
177354 out of 1233013 SNPs overlap with 1240K
52770 out of 597573 SNPs overlap with HO```
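Per-sample SNP counts matter too. One way to list low-coverage samples is plink 1.9's `--missing` report (e.g. `plink --bfile v50.0_1240K_public --missing --out miss`, which writes a `miss.imiss` table); the file below is a fabricated stand-in with the same column layout:

```shell
# Fabricated miss.imiss (plink 1.9 columns: FID IID MISS_PHENO N_MISS N_GENO F_MISS):
cat > miss.imiss <<'EOF'
FID IID MISS_PHENO N_MISS N_GENO F_MISS
Fam1 HGDP00899 N 1200000 1233013 0.97
Fam2 I0001 N 100000 1233013 0.08
EOF
# Genotyped SNPs = N_GENO - N_MISS; flag samples under 100,000:
awk 'NR>1 && $5-$4<100000{print $2, $5-$4}' miss.imiss
```

For the stand-in data only `HGDP00899 33013` is flagged.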
When I compared the K13 results of samples that were included in both the 1240K and HO versions, the average difference in admixture percentages was over 2 percentage points for samples that had fewer than 100,000 SNPs in the 1240K version.

Here's how many SNPs different calculators have in common with the 1240K version of the Reich dataset (first column) and with the HO version (second column):

Code:
```$ for f in /usr/local/lib/python3.9/site-packages/admix/data/*.alleles;do awk 'ARGIND==1{a[$1];next}ARGIND==2{b[$1];next}$1 in a{na++}$1 in b{nb++}END{f=FILENAME;sub(".*/","",f);print na,nb,f}' v50.0_{1240K,HO}_public.snp $f;done
26711 23889 Africa9.alleles
35641 35433 AncientNearEast13.alleles
165122 48100 E11.alleles
154754 43947 EUtest13.alleles
159921 45048 Eurasia7.alleles
182709 56582 HarappaWorld.alleles
154754 43947 Jtest14.alleles
161597 45265 K12b.alleles
177354 52770 K13.alleles
160674 46518 K13M2.alleles
160674 46518 K14M1.alleles
179286 53263 K15.alleles
160674 46518 K18M4.alleles
160674 46518 K25R1.alleles
160674 46518 K36.alleles
76223 75060 K47.alleles
101280 101280 K7AMI.alleles
160674 46518 K7M1.alleles
161597 45265 K7b.alleles
101280 101280 K8AMI.alleles
92607 91944 KurdishK10.alleles
114866 49757 MDLPK27.alleles
76217 75054 MichalK25.alleles
112059 110039 TurkicK11.alleles
161125 45515 globe10.alleles
161125 45515 globe13.alleles
38491 9449 puntDNAL.alleles
163875 45877 weac2.alleles
165539 46293 world9.alleles```

3. Display calculator results as a circular stacked bar chart

This uses the Circlize package: https://jokergoo.github.io/circlize_book/book/.

The colored groups are based on cutting a hierarchical clustering tree at the height where it has 20 subtrees (which is just below the height of the black-colored branches in the clustering tree). Code:
```library(circlize)
library(ComplexHeatmap) # for `Legend`
library(vegan) # for `reorder.hclust`
library(dendextend) # for `color_branches`
library(colorspace) # for `hex`

# `t` (the samples-by-components admixture matrix) and `fst` (the calculator's
# FST matrix) are assumed to have been loaded already
ord=c(2,1,3,5,6,4,7:13)
t=t[,ord]
fst=fst[ord,ord]

hc=hclust(dist(as.matrix(t)%*%as.matrix(fst)))
hc=reorder(hc,as.matrix(t)%*%seq(ncol(t))^2)

hue=c(0,30,60,90,120,170,210,240,280,320)
labelcolor=c(hcl(hue+15,60,70),hcl(hue+15,50,50))
barcolor=c(hex(HSV(c(240,210,60,40,15,80,120,340,290),.5,.95)),hex(HSV(c(270,0,30),.5,.6)),hex(HSV(30,.5,.4)))
textcolor=c(rep("black",9),rep("white",4))

labels=hc$labels[hc$order]
cut=cutree(hc,20)
dend=color_branches(as.dendrogram(hc),k=length(unique(cut)),col=labelcolor[unique(cut[labels])])
t=t[hc$labels[hc$order],]

circos.clear()
png("1.png",w=3500,h=3500,res=300)
circos.initialize("a",xlim=c(0,nrow(t)))

# label track (this panel.fun is a reconstruction: it draws the population names)
circos.track(ylim=c(0,1),bg.border=NA,track.height=.28,track.margin=c(.01,0),panel.fun=function(x,y)circos.text(1:nrow(t)-.5,0,labels,adj=c(0,.5),facing="clockwise",niceFacing=T,cex=.44,col=labelcolor[cut[labels]]))

circos.track(ylim=c(0,1),track.margin=c(0,0),track.height=.26,bg.lty=0,panel.fun=function(x,y){
mat=as.matrix(t)
pos=1:nrow(mat)-.5
barwidth=1
for(i in 1:ncol(mat)){
seq1=rowSums(mat[,seq(i-1),drop=F])
seq2=rowSums(mat[,seq(i),drop=F])
circos.rect(pos-barwidth/2,if(i==1){0}else{seq1},pos+barwidth/2,seq2,col=barcolor[i],border="gray15",lwd=.1)
}
for(i in 1:ncol(mat)){
seq1=rowSums(mat[,seq(i-1),drop=F])
seq2=rowSums(mat[,seq(i),drop=F])
lab=round(100*t[,i])
lab[lab<=1]="" # don't show labels for 0% or 1%
circos.text(pos,if(i==1){seq1/2}else{seq1+(seq2-seq1)/2},labels=lab,col=textcolor[i],cex=.5,facing="clockwise",niceFacing=T)
}
})

circos.track(ylim=c(0,attr(dend,"height")),bg.border=NA,track.margin=c(0,.0015),track.height=.4,panel.fun=function(x,y)circos.dendrogram(dend))

circos.clear()

lg=Legend(at=colnames(t),legend_gp=gpar(col=barcolor),background=barcolor,type="points")
draw(lg,just=c("center","center"))
# draw(lg,x=unit(15,"mm"),y=unit(15,"mm"),just=c("left","bottom")) # draw legend in bottom left corner instead of middle

dev.off()```

5. That is awesome, Nganasankhan!

7. Nganasankhan, excuse my ignorance, is copy-pasting the code into a shell enough?
Recently got literate in Python, but it seems not to help me much with running the code. Thanks for these very valuable scripts you provide.

8. Got an error in the legend section, but attached is my rendering with myself and my parents injected. Look in the English area, lower right.

Code:
```> lg=Legend(at=colnames(t),legend_gp=gpar(col=barcolor),background=barcolor,type="points")
Error in Legend(at = colnames(t), legend_gp = gpar(col = barcolor), background = barcolor,  :
could not find function "Legend"
> draw(lg,just=c("center","center"))
Error in draw(lg, just = c("center", "center")) :
could not find function "draw"
> # draw(lg,x=unit(15,"mm"),y=unit(15,"mm"),just=c("left","bottom")) # draw legend in bottom left corner instead of middle
>

> install.packages("ComplexHeatmap")
Warning in install.packages :
package ‘ComplexHeatmap’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at```
Code:
```>
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

Random number generation:
RNG:     Mersenne-Twister
Normal:  Inversion
Sample:  Rounding

locale:
 LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
 LC_TIME=English_United States.1252
system code page: 65001

attached base packages:
 stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 circlize_0.4.15   dendextend_1.15.2 vegan_2.6-2       lattice_0.20-45   permute_0.9-7

loaded via a namespace (and not attached):
 Rcpp_1.0.8          pillar_1.7.0        compiler_4.1.2      RColorBrewer_1.1-2  viridis_0.6.2       tools_4.1.2         viridisLite_0.4.0   nlme_3.1-153
 lifecycle_1.0.1     tibble_3.1.6        gtable_0.3.0        mgcv_1.8-38         pkgconfig_2.0.3     rlang_1.0.1         Matrix_1.3-4        DBI_1.1.2
 cli_3.2.0           ggrepel_0.9.1       parallel_4.1.2      gridExtra_2.3       cluster_2.1.2       dplyr_1.0.8         GlobalOptions_0.1.2 generics_0.1.2
 vctrs_0.3.8         grid_4.1.2          tidyselect_1.1.1    cowplot_1.1.1       glue_1.6.1          R6_2.5.1            fansi_1.0.2         polyclip_1.10-0
 pheatmap_1.0.12     ggplot2_3.3.5       purrr_0.3.4         tweenr_1.0.2        farver_2.1.0        magrittr_2.0.2      splines_4.1.2       scales_1.1.1
 ellipsis_0.3.2      MASS_7.3-54         assertthat_0.2.1    shape_1.4.6         ggforce_0.3.3       colorspace_2.0-3    utf8_1.2.2          munsell_0.5.0
 crayon_1.5.0```
Thanks for sharing.

9. Those BioConductor packages require R 4.2. I'm apprehensive to upgrade and I've read horror stories from some who've upgraded and broken their environments.

Is there a simpler way to make a legend?

TIA

10. Originally Posted by Archetype0ne:
Nganasankhan, excuse my ignorance, is copy-pasting the code into a shell enough? Recently got literate in Python, but it seems not to help me much with running the code. Thanks for these very valuable scripts you provide.
Yeah I try to write most of my shell scripts so that they can just be copied and pasted to a terminal in macOS or GNU/Linux.

You can install R from here: https://cran.r-project.org. Then open the R console application and paste in my code. To install the required packages, run for example `install.packages("tidyverse")`.

Originally Posted by PLogan:
Those BioConductor packages require R 4.2. I'm apprehensive to upgrade and I've read horror stories from some who've upgraded and broken their environments.

Is there a simpler way to make a legend?

TIA
[s]You don't need BioConductor if you install the development version of ComplexHeatmap[/s]: `library(devtools);install_github("jokergoo/ComplexHeatmap")` (https://jokergoo.github.io/ComplexHe...eference/book/). (Edit: apparently you do.)

12. Hmmm... got all excited.

install fail.JPG

install fail2.JPG

13. Originally Posted by Nganasankhan:
[s]You don't need BioConductor if you install the development version of ComplexHeatmap[/s]: `library(devtools);install_github("jokergoo/ComplexHeatmap")` (https://jokergoo.github.io/ComplexHe...eference/book/). (Edit: apparently you do.)

Just wanted to give another shout-out and say thank you: first for your generosity in sharing code examples, and second for motivating me to upgrade my R environments on my machines. Having everything current has solved a number of problems I'd been experiencing. I've been burned in the past when upgrading, so I was gun-shy, but I really wanted to be able to reproduce and utilize your code for circlize.

Here is my final render with myself and my parents injected; they're in the lower right, in the English area. I now want to experiment with other calculators such as K36, although that might produce a very busy chart.

Thanks again, Nganasankhan!

14. Another tribute to Nganasankhan.

Playing with different populations to better learn how the code works. This is a "K13 British Isles counties and cities by Creoda" dataset I collected. I collect all I can lay my hands on. LOL

Here are the results.
