
Thread: Making two-way G25 models with negative weights

  1. #31
    Registered Users
    Posts
    718
    Location
    Florida Native
    Ethnicity
    Greek Peloponnese
    Nationality
    US since the 1890's
    Y-DNA (P)
    G2A2A

    0.020 Anthony_C_scaled = 62% Greek_Thessaly + 38% Greek_North_Tsakonia
    0.020 Anthony_C_scaled = 94% Greek_Messenia + 6% Sardinian
    0.020 Anthony_C_scaled = 86% Greek_Messenia + 14% Italian_Lombardy
    0.020 Anthony_C_scaled = 57% Greek_Messenia + 43% Greek_Thessaly
    0.020 Anthony_C_scaled = 61% Greek_Thessaly + 39% Greek_Laconia
    0.020 Anthony_C_scaled = 77% Greek_Laconia + 23% Italian_Lombardy
    0.020 Anthony_C_scaled = 89% Greek_Messenia + 11% Italian_Bergamo
    0.021 Anthony_C_scaled = 88% Greek_Messenia + 12% French_Corsica
    0.021 Anthony_C_scaled = 75% Greek_North_Tsakonia + 25% Italian_Bergamo
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Basque_Soule
    0.021 Anthony_C_scaled = 95% Greek_Messenia + 5% Spanish_Asturias
    0.021 Anthony_C_scaled = 58% Greek_Thessaly + 42% Greek_Argolis
    0.021 Anthony_C_scaled = 64% Greek_Thessaly + 36% Greek_Corinthia
    0.021 Anthony_C_scaled = 54% Greek_North_Tsakonia + 46% Albanian
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Pais_Vasco
    0.021 Anthony_C_scaled = 91% Greek_Messenia + 9% Italian_Liguria
    0.021 Anthony_C_scaled = 75% Greek_North_Tsakonia + 25% Italian_Lombardy
    0.021 Anthony_C_scaled = 66% Greek_Thessaly + 34% Greek_East_Taygetos
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Baztan
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% French_Bigorre
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Spanish
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Basque_French
    0.021 Anthony_C_scaled = 82% Greek_Argolis + 18% Italian_Lombardy
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Biscay
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Burgos
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Navarre_Center
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Lower_Navarre
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Araba
    0.021 Anthony_C_scaled = 86% Greek_Messenia + 14% Italian_Marche
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Aragon
    0.021 Anthony_C_scaled = 79% Greek_Laconia + 21% Italian_Bergamo
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Navarre_North
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Navarra
    0.021 Anthony_C_scaled = 93% Greek_Messenia + 7% Italian_Veneto
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% French_Bearn
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% French_Chalosse
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Roncal
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_La_Rioja
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Gipuzkoa_Southwest
    0.021 Anthony_C_scaled = 92% Greek_Messenia + 8% Italian_Piedmont
    0.021 Anthony_C_scaled = 90% Greek_Messenia + 10% Italian_Tuscany
    0.021 Anthony_C_scaled = 86% Greek_Thessaly + 14% Greek_Deep_Mani
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Menorca
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Castilla_La_Mancha
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Spanish_Peri-Barcelona
    0.021 Anthony_C_scaled = 66% Greek_Thessaly + 34% Greek_Arcadia
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Biscay
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Basque_Gipuzkoa
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Lleida
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Aragon_North
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Pirineu
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Valencia
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Catalunya_Central
    0.021 Anthony_C_scaled = 95% Greek_Messenia + 5% Italian_Trentino-Alto-Adige
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Terres_de_l'Ebre
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Murcia
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Eivissa
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Cataluna
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Barcelones
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Alacant
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% French_South
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Baleares
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Girona
    0.021 Anthony_C_scaled = 77% Greek_North_Tsakonia + 23% Italian_Veneto
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Spanish_Mallorca
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Castilla_Y_Leon
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Penedes
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% French_Auvergne
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Galicia
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Castello
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Camp_de_Tarragona
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% French_Occitanie
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Andalucia
    0.021 Anthony_C_scaled = 95% Greek_Messenia + 5% Italian_Umbria
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Swiss_French
    0.021 Anthony_C_scaled = 84% Greek_Thessaly + 16% Italian_Apulia
    0.021 Anthony_C_scaled = 87% Greek_Thessaly + 13% Sicilian_East
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Cantabria
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Portuguese
    0.021 Anthony_C_scaled = 92% Greek_Messenia + 8% Albanian
    0.021 Anthony_C_scaled = 96% Greek_Messenia + 4% Swiss_Italian
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Spanish_Extremadura
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Sicilian_East
    0.021 Anthony_C_scaled = 83% Greek_Argolis + 17% Italian_Liguria
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Italian_Northeast
    0.021 Anthony_C_scaled = 92% Greek_Messenia + 8% Greek_North_Tsakonia
    0.021 Anthony_C_scaled = 99% Greek_Messenia + 1% BelgianC
    0.021 Anthony_C_scaled = 99% Greek_Messenia + 1% Italian_Aosta_Valley
    0.021 Anthony_C_scaled = 99% Greek_Messenia + 1% French_Provence
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Italian_Apulia
    0.021 Anthony_C_scaled = 99% Greek_Messenia + 1% French_Paris
    0.021 Anthony_C_scaled = 100% Greek_Messenia + 0% BelgianA
    0.021 Anthony_C_scaled = 100% Greek_Messenia + 0% Swiss_German
    0.021 Anthony_C_scaled = 100% Greek_Messenia + 0% English_Cornwall
    0.021 Anthony_C_scaled = 100% Greek_Messenia + 0% French_Alsace
    0.021 Anthony_C_scaled = 100% Greek_Messenia + 0% French_Seine-Maritime
    0.021 Anthony_C_scaled = 98% Greek_Messenia + 2% Greek_Laconia
    0.021 Anthony_C_scaled = 97% Greek_Messenia + 3% Greek_Argoli
    88.0 Greek_Peloponnese + 12.0 Swiss_French Distance: 1.4582% / 0.01458192 | R2P

  2. #32
    Registered Users
    Posts
    571
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    You might be Greek

  3. #33
    Registered Users
    Posts
    340
    Sex
    Ethnicity
    Finnish

    Now my two-way script at JS Bin only takes about 50 ms instead of 300 ms to run my standard benchmark where it generates 255,255 models. Previously it sorted all models by distance, but now it just finds the 100 models with the smallest distance.
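
    The idea of finding only the lowest-distance models instead of sorting all of them can be sketched in C++ as follows. This is a minimal sketch and not the code of the JS Bin script: it swaps in `std::partial_sort` from the standard library, and the function name `smallestK` is made up for the illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the k smallest values in dist, ordered from the
// smallest to the largest. Only the first k positions end up sorted, so the
// work is much less than a full sort when k is small compared to dist.size().
std::vector<std::size_t> smallestK(const std::vector<double>& dist, std::size_t k) {
    std::vector<std::size_t> idx(dist.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, 2, ...
    k = std::min(k, idx.size());
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&dist](std::size_t a, std::size_t b) { return dist[a] < dist[b]; });
    idx.resize(k);
    return idx;
}
```

    Calling `smallestK(dist, 100)` on a vector of model distances would return the positions of the 100 best models, which can then be used to look up the corresponding weights and population names.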

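    I implemented the same change in my R script by replacing `ord=head(order(dist),10)` with `ord=kit::topn(dist,10,decreasing=F)` (`kit::topn` returns the largest values by default, so `decreasing=F` is needed to get the smallest distances), which reduced the running time of my standard benchmark from about 45 ms to about 35 ms.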

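    I also made a C++ version of my script which takes about 15 ms to run my standard benchmark. Compile with `g++ -O3 twoway.cpp`. The first argument is the CSV file of the source populations and the second argument is the CSV file of the target.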

    Code:
    #include<iostream>
    #include<fstream>
    #include<string>
    #include<vector>
    #include<sstream>
    #include<cmath>
    #include<algorithm> // for min
    #include<cstdio> // for printf
    
    using namespace std;
    
    int main(int argc,char**argv){
      vector<vector<double> >source;
      vector<double>row;
      vector<string>name;
      string line,field;
      vector<double>target;
      string targetname;
    
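      // argv[1] is the CSV file of the sources and argv[2] is the CSV file of the target
      if(argc<3){cerr<<"usage: "<<argv[0]<<" sources.csv target.csv"<<endl;return 1;}
      ifstream file1(argv[2]);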
      while(getline(file1,line)){
        row.clear();
        stringstream str(line);
        getline(str,field,',');
        targetname=field;
        while(getline(str,field,','))row.push_back(stod(field));
        target=row;
      }
    
      ifstream file2(argv[1]);
      while(getline(file2,line)){
        row.clear();
        stringstream str(line);
        getline(str,field,',');
        if(field==""||field==targetname)continue;
        name.push_back(field);
        while(getline(str,field,','))row.push_back(stod(field));
        source.push_back(row);
      }
    
      int nrow=source.size(),ncol=source[0].size(),ncomb=nrow*(nrow-1)/2;
      vector<double>targetdist(nrow),weight(ncomb),dist(ncomb);
    
      for(int i=0;i<nrow;i++){double sum=0;for(int j=0;j<ncol;j++){double q=source[i][j]-target[j];sum+=q*q;}targetdist[i]=sum;}
    
      int n=0;
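      // for each pair of sources, project the target onto the line through the two sources
      for(int i=0;i<nrow-1;i++)for(int j=i+1;j<nrow;j++){
        double d12=0; // squared distance between the two sources
        for(int k=0;k<ncol;k++){double q=source[i][k]-source[j][k];d12+=q*q;}
        double d1=targetdist[i]; // squared distance from source i to the target
        double d2=targetdist[j]; // squared distance from source j to the target
        double w=(d12+d2-d1)/(2*d12); // weight of source i (source j gets 1-w); can be below 0 or above 1
        weight[n]=w;
        dist[n++]=sqrt(max(0.0,d2-w*w*d12)); // max guards against a tiny negative value from rounding, which would give NaN
      }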
    
      int nprint=min(32,ncomb);
      vector<int>foundind(nprint);
      vector<double>founddist(nprint);
      double biggestfound=INFINITY; // initialized so that the first comparison below does not read an indeterminate value
    
      for(int n=0;n<ncomb;n++){
        double d=dist[n];
        if(d<biggestfound||n<nprint){
          int insertat=nprint-1;
          for(int i=0;i<nprint-1;i++)if(d<founddist[i]||i==n){insertat=i;break;}
          for(int i=nprint-1;i>insertat;i--){foundind[i]=foundind[i-1];founddist[i]=founddist[i-1];}
          foundind[insertat]=n;
          founddist[insertat]=d;
          biggestfound=founddist[nprint-1];
        }
      }
    
      cout<<"Target "<<targetname<<":"<<endl;
    
      for(n=0;n<nprint;n++){
        double dist=founddist[n];
        int ind=foundind[n];
        double w=weight[ind];
    
        int i=(2*nrow-1-sqrt(4*nrow*nrow-4*nrow+1-8.0*ind))/2;
        int j=ind-i*nrow+i*(i+1)/2+i+1;
    
        string name1=name[w>=.5?i:j];
        string name2=name[w>=.5?j:i];
        if(w<.5)w=1-w;
    
        printf("%.3f %.0f %s %.0f %s\n",dist,100*w,name1.c_str(),100*(1-w),name2.c_str());
      }
    
      return 0;
    }
    Last edited by Nganasankhan; 08-11-2022 at 06:47 PM.
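
    The two lines near the end of the C++ code that recover `i` and `j` from `ind` invert the order in which the pairs were generated: (0,1), (0,2), ..., (0,n-1), (1,2), and so on. Below is a minimal sketch of that mapping in both directions. The function names are made up for the illustration, and the two `while` loops are an added guard against floating-point rounding which is not in the code above.

```cpp
#include <cmath>
#include <utility>

// Flat index of the pair (i, j), 0 <= i < j < n, when the pairs are
// enumerated row by row: (0,1), (0,2), ..., (0,n-1), (1,2), ...
long long pairToIndex(long long i, long long j, long long n) {
    return i * n - i * (i + 1) / 2 + (j - i - 1);
}

// Inverse mapping: recover (i, j) from the flat index. Row i starts at
// index i*n - i*(i+1)/2, so i is the floor of the smaller root of
// i^2 - (2n-1)*i + 2*ind = 0.
std::pair<long long, long long> indexToPair(long long ind, long long n) {
    double disc = std::sqrt(double(4 * n * n - 4 * n + 1 - 8 * ind));
    long long i = (long long)((2 * n - 1 - disc) / 2);
    // Correct i if rounding put it one row off at a row boundary.
    while (i > 0 && pairToIndex(i, i + 1, n) > ind) --i;
    while (i + 2 < n && pairToIndex(i + 1, i + 2, n) <= ind) ++i;
    long long j = ind - i * n + i * (i + 1) / 2 + i + 1;
    return {i, j};
}
```

    With n sources there are n*(n-1)/2 pairs, so for example 715 sources would give 715*714/2 = 255,255 models.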

  4. The Following 5 Users Say Thank You to Nganasankhan For This Useful Post:

     Aben Aboo (08-11-2022),  Andour (08-12-2022),  Harut1994 (08-12-2022),  randwulf (08-11-2022),  Toguz (08-12-2022)

  5. #34
    Registered Users
    Posts
    340
    Sex
    Ethnicity
    Finnish

    I made another JavaScript interface: https://jsbin.com/pahovip. It automates the procedure of iteratively generating models where, after each step, you remove some of the source populations that appear in the lowest-distance models. I think veteran users of Vahaduo are very familiar with the procedure, but I don't know if anyone has tried to automate it before.

    With the default "elimination mode", the script first finds the best two-way model using all combinations of source populations. It then removes the population with the bigger percentage in that model from the sources, makes a second two-way model, again removes the source population that gets the bigger percentage, and so on. For example, here I made models for Udmurts where the sources included all ancient and modern population averages:

    Target: Udmurt (100 iterations, 2.3 million models generated in 1.7 s)
    1 0.013 89% Besermyan + 11% RUS_Bolshoy_Oleni_Ostrov
    2 0.022 67% Komi + 33% RUS_Kusnarenkovo_Karajakupovo_MED
    3 0.022 71% VK2020_NOR_North_VA_o1 + 29% IRN_Hajji_Firuz_BA
    4 0.024 57% Chuvash + 43% RUS_Karasuk
    5 0.025 61% Sargat_IA + 39% Russian_Leshukonsky
    6 0.025 58% Tatar_Kazan + 42% RUS_Krasnoyarsk_MLBA_o
    7 0.026 56% RUS_Tagar + 44% Mari
    8 0.026 56% KAZ_Zevakinskiy_LBA + 44% Mari
    9 0.026 72% FIN_Levanluhta_IA + 28% CHN_Shanpula_HE3
    10 0.026 58% RUS_Mezhovskaya + 42% Mari
    11 0.026 61% Russian_Leshukonsky + 39% RUS_Kusnarenkovo_Karajakupovo_MED
    12 0.026 50% RUS_Kusnarenkovo_Karajakupovo_MED + 50% Russian_Kostroma
    13 0.028 52% KAZ_Lisakovskiy_MLBA_Alakul + 48% Mansi
    14 0.028 57% UKR_MBA + 43% RUS_Bolshoy_Oleni_Ostrov_o
    15 0.028 52% RUS_Srubnaya_MLBA + 48% Mansi
    16 0.028 51% KAZ_Ak_Moustafa_MLBA1 + 49% Mansi
    17 0.028 53% KAZ_Zevakinskiy_MLBA + 47% Mari
    18 0.029 52% Corded_Ware_CZE_late + 48% Mansi
    19 0.029 52% KAZ_Kairan_MLBA + 48% Mansi
    20 0.029 52% Corded_Ware_DEU + 48% Mansi
    21 0.029 55% RUS_Krasnoyarsk_MLBA + 45% Mansi
    22 0.029 54% RUS_Srubnaya_Alakul_MLBA + 46% Mansi
    23 0.029 51% CHE_LN_steppe + 49% Mansi
    24 0.029 78% KAZ_Mys_MLBA + 22% RUS_Krasnoyarsk_BA

    So in just the first 12 iterations above, my script ended up discovering many of the ancient populations which are the best sources of ancestry for Uralic people, including even the less obvious ones like Sargat, Karasuk, and Mezhovskaya. But starting from the 13th iteration, the results became a bit repetitive, because the script just began to generate models where some Bronze Age steppe population got a bit over 50% ancestry and Mansi got a bit under 50% ancestry. That motivated me to add two new elimination modes to the script which are also able to eliminate populations that get under 50% ancestry. When I used one of the new modes so that each source population was removed after it had appeared in two models, the following were the first 24 models for Udmurts:

    Target: Udmurt (100 iterations, 2.3 million models generated in 1.6 s)
    1 0.013 89% Besermyan + 11% RUS_Bolshoy_Oleni_Ostrov
    2 0.014 81% Besermyan + 19% VK2020_NOR_North_VA_o1
    3 0.022 67% Komi + 33% RUS_Kusnarenkovo_Karajakupovo_MED
    4 0.022 71% VK2020_NOR_North_VA_o1 + 29% IRN_Hajji_Firuz_BA
    5 0.024 54% Sargat_IA + 46% Komi
    6 0.024 57% Chuvash + 43% RUS_Karasuk
    7 0.024 59% Chuvash + 41% KAZ_Zevakinskiy_LBA
    8 0.025 61% Sargat_IA + 39% Russian_Leshukonsky
    9 0.025 58% Tatar_Kazan + 42% RUS_Krasnoyarsk_MLBA_o
    10 0.026 56% RUS_Tagar + 44% Mari
    11 0.026 56% KAZ_Zevakinskiy_LBA + 44% Mari
    12 0.026 72% FIN_Levanluhta_IA + 28% CHN_Shanpula_HE3
    13 0.026 61% Russian_Leshukonsky + 39% RUS_Kusnarenkovo_Karajakupovo_MED
    14 0.027 69% FIN_Levanluhta_IA + 31% CHN_Ayousaigoukou_IA2
    15 0.028 52% KAZ_Lisakovskiy_MLBA_Alakul + 48% Mansi
    16 0.028 57% UKR_MBA + 43% RUS_Bolshoy_Oleni_Ostrov_o
    17 0.028 52% RUS_Srubnaya_MLBA + 48% Mansi
    18 0.029 55% KAZ_Lisakovskiy_MLBA_Alakul + 45% Khanty
    19 0.029 73% Tatar_Kazan + 27% RUS_Bolshoy_Oleni_Ostrov
    20 0.029 78% KAZ_Mys_MLBA + 22% RUS_Krasnoyarsk_BA
    21 0.029 76% UZB_Kashkarchi_BA + 24% RUS_Krasnoyarsk_BA
    22 0.029 83% Saami + 17% UZB_Bustan_BA_o1
    23 0.030 53% RUS_Krasnoyarsk_MLBA_o + 47% Tatar_Mishar
    24 0.030 52% Tatar_Siberian_Zabolotniye + 48% KAZ_Solyanka_MLBA

    So now it's a bit less repetitive, even though there are still many variations of the pattern where Udmurts get about 50% ancestry from Khanty or Mansi or Swamp Tatars or Uyelgi and about 50% ancestry from some Bronze Age steppe population. But now, for example, by allowing VK2020_NOR_North_VA_o1:VK518 to appear in multiple models, it also became clearer that VK518 was a better source of Fennoscandinavian ancestry for Udmurts than Levänluhta or Saami, because VK518 appeared in both the 2nd and 4th model, but Levänluhta appeared for the first time only in the 12th model, and Saami appeared for the first time only in the 22nd model.
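
    The elimination procedure described in this post can be sketched in C++ roughly as follows. This is a sketch under assumptions and not the code of the JS Bin script: the `Model` struct, the function name `eliminate`, and the `maxUses` and `countMinor` parameters are made up for the illustration, and the exact rules of the additional elimination modes are guessed from the description above.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Model {
    int major;    // index of the source with the larger weight
    int minor;    // index of the source with the smaller weight
    double dist;  // distance of the model from the target
};

// Greedy elimination: repeatedly pick the lowest-distance model whose sources
// are all still allowed, then count one use for its major source (and for its
// minor source too when countMinor is true). A source is dropped once it has
// been used maxUses times. Returns the indices of the chosen models in order.
std::vector<std::size_t> eliminate(const std::vector<Model>& models, int npop,
                                   int steps, int maxUses, bool countMinor) {
    std::vector<int> uses(npop, 0);
    std::vector<bool> taken(models.size(), false);
    std::vector<std::size_t> chosen;
    for (int s = 0; s < steps; ++s) {
        double best = std::numeric_limits<double>::infinity();
        std::size_t bestIdx = models.size();
        for (std::size_t m = 0; m < models.size(); ++m) {
            const Model& x = models[m];
            if (taken[m] || uses[x.major] >= maxUses || uses[x.minor] >= maxUses) continue;
            if (x.dist < best) { best = x.dist; bestIdx = m; }
        }
        if (bestIdx == models.size()) break;  // no admissible model is left
        taken[bestIdx] = true;
        chosen.push_back(bestIdx);
        ++uses[models[bestIdx].major];
        if (countMinor) ++uses[models[bestIdx].minor];
    }
    return chosen;
}
```

    In this sketch, `maxUses=1` with `countMinor=false` corresponds to the default mode where only the population with the bigger percentage is removed, and `maxUses=2` with `countMinor=true` corresponds to removing each population after it has appeared in two models.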

    My new script also supports making models with negative weights. For example, the following output shows that when I used all ancient and modern averages as the sources and allowed both positive and negative weights, the first 20 models for Maris all got a negative weight. So Maris are another population like Nganasans in the sense that their best models get negative weights, which is an indication that Maris and Nganasans are upstream from other populations in the ancestry tree, so that they have contributed ancestry to other populations and not vice versa:

    Target: Mari (100 iterations, 2.3 million models generated in 2.1 s)
    1 0.015 163% Chuvash - 63% Tatar_Mishar
    2 0.054 192% Udmurt - 92% KAZ_Zevakinskiy_LBA
    3 0.057 173% Besermyan - 73% UKR_Cimmerian
    4 0.066 304% Russian_Pinega - 204% Latvian
    5 0.068 226% Russian_Leshukonsky - 126% Latvian
    6 0.069 232% Mansi - 132% Selkup
    7 0.069 172% Komi - 72% DEU_Tollense_BA_o2
    8 0.071 463% Russian_Kostroma - 363% Lithuanian_PA
    9 0.072 177% Tatar_Kazan - 77% DEU_MA_ACD_Baiuvaric
    10 0.072 613% Moksha - 513% Russian_Orel
    11 0.074 148% Tatar_Siberian_Zabolotniye - 48% RUS_Lake_Baikal_BA
    12 0.075 195% Bashkir - 95% CHN_Simutasi_HE
    13 0.075 283% Tatar_Siberian - 183% CHN_Simutasi_HE
    14 0.079 118% FIN_Levanluhta_IA - 18% NOR_N_HG
    15 0.079 283% Russian_Pinezhsky - 183% Latvian
    16 0.079 368% Russian_Krasnoborsky - 268% Lithuanian_VA
    17 0.080 145% VK2020_NOR_North_VA_o1 - 45% RUS_Bolshoy_Oleni_Ostrov
    18 0.080 414% Vepsian - 314% Estonian
    19 0.080 264% Khanty - 164% Selkup
    20 0.082 124% Saami - 24% NOR_N_HG
    21 0.082 184% Sargat_IA - 84% CHN_Xianshuiquangucheng_HE
    22 0.083 181% Saami_Kola - 81% Baltic_EST_IA
    23 0.083 175% VK2020_NOR_North_VA_o2 - 75% DEU_Tollense_BA_o2
    24 0.083 137% RUS_Kusnarenkovo_Karajakupovo_MED - 37% RUS_Lake_Baikal_BA
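
    How a single two-way model gets its weight, and how the weight can be restricted, can be sketched in C++ as follows. This is a minimal sketch which assumes that the two sources are not identical (otherwise the division by `d12` fails). The struct `TwoWay` and the function names are made up for the illustration, and the three modes follow the `weightmode` comment in the R code in this post.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct TwoWay {
    double weight1;  // weight of source 1; source 2 gets 1 - weight1
    double dist;     // distance from the target to the model point
    bool valid;      // false when the model is rejected by the chosen mode
};

// Squared Euclidean distance between two coordinate vectors.
static double sqDist(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0;
    for (std::size_t k = 0; k < a.size(); ++k) { double q = a[k] - b[k]; s += q * q; }
    return s;
}

// mode 1: clamp the weight to [0, 1]; mode 2: allow any weight;
// mode 3: keep only models in which one of the two weights is negative.
TwoWay fitTwoWay(const std::vector<double>& target, const std::vector<double>& s1,
                 const std::vector<double>& s2, int mode) {
    double d12 = sqDist(s1, s2), d1 = sqDist(s1, target), d2 = sqDist(s2, target);
    double w = (d12 + d2 - d1) / (2 * d12);  // unconstrained weight of source 1
    double sq = d2 - w * w * d12;            // squared distance to the line
    TwoWay r{w, std::sqrt(sq > 0 ? sq : 0), true};
    if (mode == 1) {
        // The closest allowed point is one of the sources themselves.
        if (w > 1) { r.weight1 = 1; r.dist = std::sqrt(d1); }
        else if (w < 0) { r.weight1 = 0; r.dist = std::sqrt(d2); }
    } else if (mode == 3 && w >= 0 && w <= 1) {
        r.valid = false;
    }
    return r;
}
```

    A weight above 1 for one source means that the other source gets a negative weight, as in the model `163% Chuvash - 63% Tatar_Mishar` above.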

    Here's also a basic R version of my script:

    Code:
    weightmode=2 # 1 converts negative weights to zero, 2 allows both positive and negative weights, and 3 rejects models with no negative weight
    
    source=as.matrix(do.call(rbind,lapply(Sys.glob("g/25/?as"),read.csv,r=1)))
    targ="Mari"
    target=source[targ,]
    source=source[!rownames(source)%in%targ,]
    name=rownames(source)
    source=unname(source)
    npop=nrow(source)
    
    # this is a faster alternative to `t=combn(npop,2);i1=t[1,];i2=t[2,]`
    i1=unlist(lapply(2:npop,\(x)x:npop))
    i2=rep(1:(npop-1),(npop-1):1)
    
    # this is a fast way to calculate the distance of a vector to each row of a matrix
    d3=sqrt(rowSums(source^2)+sum(target^2)-2*source%*%as.matrix(target)[,1])
    
    d0=dist(source)
    d1=d3[i1]
    d2=d3[i2]
    weight=(d0^2+d2^2-d1^2)/(2*d0^2)
    dist=sqrt(d2^2-weight^2*d0^2)
    
    if(weightmode==1){o1=weight>1;u0=weight<0;weight[o1]=1;weight[u0]=0;dist[o1]=d3[i1[o1]];dist[u0]=d3[i2[u0]]}
    if(weightmode==3)dist[weight>=0&weight<=1]=NA # a weight below 0 or above 1 means that one of the two sources has a negative weight
    
    flip=weight<.5
    pop1=name[ifelse(flip,i2,i1)]
    pop2=name[ifelse(flip,i1,i2)]
    weight[flip]=1-weight[flip]
    
    rej=c()
    ord=c()
    dist2=dist
    for(n in 1:20){
      dist2[pop1%in%rej|pop2%in%rej]=NA
      min=which.min(dist2)
      rej[length(rej)+1]=pop1[min]
      # rej[length(rej)+1]=pop2[min]
      ord[n]=min
    }
    
    writeLines(sprintf("%.3f %.0f %s %.0f %s",dist[ord],100*weight[ord],pop1[ord],100*(1-weight[ord]),pop2[ord]))
    I came up with the idea for my new iterative two-way script because of this post by bce:

    Quote Originally Posted by bce View Post
    here's another way to measure it.
    first run a model without Baltic EST BA in the sources, and then with it. Then calculate the difference in distances (it's easy with Vahaduo's MULTI mode and excel)

    here are populations for which the distance drops by 0.005 or more:

    Baltic_EST_BA -0.09420711
    Baltic_LVA_BA -0.07990357
    RUS_Ingria_IA -0.05941208
    VK2020_POL_Cedynia_VA -0.05395064
    KAZ_Golden_Horde_Euro -0.05363308
    Baltic_LTU_Late_Antiquity_low_res -0.04965567
    DEU_MA_Krakauer_Berg -0.04957589
    Baltic_EST_MA -0.04814343
    VK2020_RUS_Kurevanikha_VA -0.04721516
    Baltic_LTU_BA -0.0466742
    [quoting only first 10 rows]
    I wanted to see if I could reproduce his sequence of populations by generating a series of two-way models, and actually if you just look at the fourth column of the output below which shows the population that was removed after each step, the list of populations is very close to the table above:

    Target: Baltic_LVA_BA (100 iterations, 1 million models generated in 0.55 s)
    1 0.012 95% Baltic_EST_BA + 5% HUN_MBA_Vatya_o
    2 0.032 93% RUS_Ingria_IA + 7% POL_BKG_N_o1
    3 0.034 63% Baltic_LTU_BA + 37% Baltic_LTU_Late_Antiquity_low_res
    4 0.040 51% Baltic_LTU_Late_Antiquity_low_res + 49% Baltic_EST_IA
    5 0.041 73% VK2020_POL_Cedynia_VA + 27% UKR_Meso
    6 0.043 76% KAZ_Golden_Horde_Euro + 24% UKR_Meso
    7 0.047 88% Baltic_EST_MA + 12% ROU_Meso
    8 0.047 79% VK2020_RUS_Kurevanikha_VA + 21% ROU_Meso
    9 0.049 83% Baltic_EST_IA + 17% DEU_Tollense_BA_o2
    10 0.053 73% DEU_MA_Krakauer_Berg + 27% UKR_Meso

  6. The Following 6 Users Say Thank You to Nganasankhan For This Useful Post:

     Aben Aboo (08-21-2022),  Huck Finn (08-21-2022),  MR J (08-26-2022),  PLogan (08-16-2022),  randwulf (08-21-2022),  Toguz (08-16-2022)

  7. #35
    Registered Users
    Posts
    340
    Sex
    Ethnicity
    Finnish

    I was trying to help someone understand why the weights of the two-way models add up to one even when there are negative weights, so I wrote a script for visualizing the models.

    So for example in the plot below, Han_Shanghai is modeled as 156% Kazakh - 56% German. The point of the model is the point closest to Han_Shanghai on the line that passes through the point of Germans and the point of Kazakhs. To get the coordinates of the point of the model, you can first go from the origin of the space to the point of Germans and then go towards the point of Kazakhs for 1.56 times the length of the line segment between Germans and Kazakhs. Or in other words, you can find the point of the model with the formula `German+1.56*(Kazakh-German)`, which is the same as `1.56*Kazakh-0.56*German`, so the two weights always add up to one.
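
    The geometry described above can be checked with a short C++ sketch. The function names are made up for the illustration, and the coordinates in the usage note are invented 2D points rather than real G25 coordinates.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Fraction of the way from p1 to p2 at which the line through p1 and p2
// comes closest to p3 (orthogonal projection). A value outside [0, 1] means
// that the closest point lies beyond one of the two sources.
double projectionFraction(const Vec& p1, const Vec& p2, const Vec& p3) {
    double num = 0, den = 0;
    for (std::size_t k = 0; k < p1.size(); ++k) {
        num += (p3[k] - p1[k]) * (p2[k] - p1[k]);
        den += (p2[k] - p1[k]) * (p2[k] - p1[k]);
    }
    return num / den;
}

// Model point written as p1 + f*(p2 - p1).
Vec modelFromOffset(const Vec& p1, const Vec& p2, double f) {
    Vec m(p1.size());
    for (std::size_t k = 0; k < p1.size(); ++k) m[k] = p1[k] + f * (p2[k] - p1[k]);
    return m;
}

// The same point written as the weighted sum f*p2 + (1 - f)*p1.
Vec modelFromWeights(const Vec& p1, const Vec& p2, double f) {
    Vec m(p1.size());
    for (std::size_t k = 0; k < p1.size(); ++k) m[k] = f * p2[k] + (1 - f) * p1[k];
    return m;
}
```

    For example with `p1={0,0}`, `p2={2,0}`, and `p3={3.12,1}`, the fraction is 1.56, and both functions place the model at (3.12, 0), with weights 1.56 and -0.56 that add up to one.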



    Code:
    library(ggplot2) # needed for ggplot(), the geoms, and ggsave() below

    t=as.matrix(do.call(rbind,lapply(Sys.glob("g/25/[am]as"),read.csv,r=1)))
    
    n1="German"
    n2="Kazakh"
    n3="Han_Shanghai"
    
    # n1="WHG"
    # n2="RUS_Karelia_HG"
    # n3="RUS_MA1"
    # t=prcomp(t[c(n1,n2,n3),])$x # do a secondary PCA to help avoid overlapping points
    
    p1=t[n1,]
    p2=t[n2,]
    p3=t[n3,]
    
    p21=p2-p1
    p31=p3-p1
    frac=(p31%*%p21/p21%*%p21)[1]
    p4=p1+frac*(p2-p1)
    
    xy=as.data.frame(rbind(p1,p2,p3,frac*p2,(1-frac)*p1,p4))
    
    f1=sprintf("%.2f",frac)
    f2=sprintf("%.2f",1-frac)
    rownames(xy)=c(paste0("Source 1: ",n1),paste0("Source 2: ",n2),paste0("Target: ",n3),
      paste0(f1,"*",n2),
      paste0(f2,"*",n1),
      paste0("Model: ",n1,"+",sprintf("%.2f",frac),"*(",n2,"-",n1,")"))
    
    # rownames(xy)=c(paste0("Source 1: ",n1),paste0("Source 2: ",n2),paste0("Target: ",n3),
    #   paste0(round(frac*100),"% ",n2),
    #   paste0(round((1-frac)*100),"% ",n1),
    #   paste0("Model: ",round(100*frac),"% ",n2,if(abs(frac)>1)" - "else" + ",round(abs(100*(1-frac))),"% ",n1))
    
    seg=as.data.frame(t(c(t(xy[1:2,1:2]))))
    
    ggplot(xy,aes(PC1,PC2))+
    geom_vline(xintercept=0,linetype="solid",color="black",size=.2)+
    geom_hline(yintercept=0,linetype="solid",color="black",size=.2)+
    geom_abline(aes(slope=Reduce(`/`,(p1-p2)[2:1]),intercept=unlist(p1[2]-(p1[1]/(p1[1]-p2[1]))*(p1[2]-p2[2]))),linetype="dotted",color="black",size=.3)+
    geom_abline(aes(slope=unlist(p2[2]/p2[1]),intercept=0),linetype="dotted",color="black",size=.3)+
    geom_abline(aes(slope=unlist(p1[2]/p1[1]),intercept=0),linetype="dotted",color="black",size=.3)+
    geom_segment(data=seg,aes(x=V1,y=V2,xend=V3,yend=V4),color="black",size=.4)+
    geom_point(size=.4)+
    # ggrepel::geom_label_repel(label=rownames(xy),fill="white",size=2,label.r=unit(0,"lines"),label.padding=unit(.05,"lines"),label.size=0,direction="both",max.overlaps=Inf,force=4,force_pull=2,segment.size=.2,min.segment.length=.3,box.padding=.1,point.padding=0)+
    geom_label(aes(label=rownames(xy)),size=2,alpha=1,label.padding=unit(.02,"lines"),label.r=unit(0,"lines"),label.size=0,nudge_y=0.025*(max(xy[,2])-min(xy[,2])))+
    expand_limits(x=0,y=0)+
    scale_x_continuous(breaks=seq(-1,1,.1),expand=expansion(mult=.3))+
    scale_y_continuous(breaks=seq(-1,1,.1),expand=expansion(mult=.1))+
    theme(axis.text=element_text(color="black",size=6),
      axis.text.x=element_text(margin=margin(.4,0,0,0,"lines")),
      axis.text.y=element_text(angle=90,vjust=1,hjust=.5,margin=margin(0,.2,0,0,"lines")),
      axis.ticks=element_line(size=.2,color="black"),
      axis.ticks.length=unit(-.2,"lines"),
      axis.title=element_text(color="black",size=6),
      legend.position="none",
      panel.background=element_rect(fill="white"),
      panel.border=element_rect(color="black",fill=NA,size=.4),
      panel.grid=element_blank())
    
    ggsave("1.png",width=4,height=3)
    Last edited by Nganasankhan; 08-21-2022 at 05:15 AM.

  8. The Following 3 Users Say Thank You to Nganasankhan For This Useful Post:

     Huck Finn (08-21-2022),  PLogan (08-21-2022),  Toguz (08-22-2022)

  9. #36
    Registered Users
    Posts
    571
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H

    Fun stuff...




  10. #37
    Registered Users
    Posts
    991
    Sex
    Ethnicity
    Mixed
    aDNA Match (1st)
    Muslim Andalusi Iberia_Southeast_c.10-16CE:I12514 0.03148322

    delet
    Last edited by Aben Aboo; 08-22-2022 at 01:23 AM.
     
    Illustrative DNA, Breakdown 5 POPS
    Fit: 1.007 (Good)
    Continental Celtic (400BC-200AD) 28%+Germanic (100-1000AD)25.4%+ Guanche (580-11601AD) 22.4% +Iberian (730-100BC) 15.4% +Anatolian (1-400AD) 8.8%
