
Thread: K36 nMonte for Reich 1240K Dataset - Nganasankhan

  1. #1
    Registered Users
    Posts
    449
    Sex
    Location
    Missouri, U.S.
    Ethnicity
    Colonial American
    Nationality
    American
    aDNA Match (1st)
    VK2020_Scotland_Orkney_VA:VK207
    Y-DNA (P)
    R1b-U152 >R-FTA96415
    mtDNA (M)
    J1b1a1a
    Y-DNA (M)
    I2-P37 > I-BY77146
    mtDNA (P)
    H



    I'm moving my results to their own thread. The purpose is to show the results of Nganasankhan's hard work in this thread:
    https://anthrogenica.com/showthread....l=1#post857192


    Distance to: PLogan
    7.81272040 CEU.SG:NA12778.SG
    7.87158180 England_BellBeaker_mediumEEF:I1767
    8.11208358 Orcadian.SDG:HGDP00800.SDG
    8.99216882 Orcadian.SDG:HGDP00806.SDG
    9.26131200 England_EarlyMedieval_Saxon.SG:I0777.SG
    10.02550747 France_BellBeaker:I3875
    10.09165992 Orcadian:HGDP00806
    10.11887840 Denmark_Viking.SG:VK328.SG
    10.57209062 French:French23830
    10.68203632 Orcadian:HGDP00805
    10.83641084 England_EarlyMedieval_Saxon.SG:I0161.SG
    10.88697387 Orcadian.SDG:HGDP00805.SDG
    10.91338169 CEU.SG:NA12812.SG
    10.95796970 Iceland_Viking.SG:VK230.SG
    10.98861229 GBR.SG:HG00109.SG
    11.01267452 CEU.SG:NA12872.SG
    11.05752233 England_C_EBA:I7573
    11.06221045 Norway_Viking.SG:VK528.SG
    11.13365618 Ireland_Viking.SG:VK545.SG
    11.16871076 Czech_EBA:I7196
    11.18232534 CEU.SG:NA11918.SG
    11.19416366 England_LBA:I7576
    11.25410147 GBR.SG:HG00099.SG
    11.32649990 Orcadian.SDG:HGDP00804.SDG
    11.32873779 Scotland_LBA:I2861


    Distance to: PLogan_Father
    9.27846970 England_BellBeaker_mediumEEF:I1767
    9.86115612 CEU.SG:NA12778.SG
    9.91685434 CEU.SG:NA12760.SG
    10.54879614 CEU.SG:NA12716.SG
    10.66555671 England_LBA:I7576
    10.81158175 Sweden_Viking.SG:VK353.SG
    10.86159749 CEU.SG:NA12348.SG
    10.88306023 CEU.SG:NA12340.SG
    11.05508028 GBR.SG:HG00116.SG
    11.07338702 Denmark_Viking.SG:VK368.SG
    11.12697623 CEU.SG:NA11881.SG
    11.18419867 CEU.SG:NA12045.SG
    11.23769104 Sweden_Viking.SG:VK31.SG
    11.31893104 Orcadian:HGDP00806
    11.36103428 Norway_Medieval.SG:VK118.SG
    11.48136316 French:French23830
    11.62233195 French:French24690
    11.62960876 England_IA_Roman.SG:3DT16.SG
    11.74225277 Iceland_Viking.SG:VK230.SG
    11.99906246 Hungary_Langobard:SZ38
    12.01933858 CEU.SG:NA11931.SG
    12.02552702 Denmark_Viking.SG:VK301.SG
    12.06789543 Hungary_Langobard:SZ42
    12.24561146 Sweden_Viking.SG:VK402.SG
    12.24970204 England_Viking.SG:VK177.SG

    Distance to: PLogan_Mother
    7.07407238 Orcadian.SDG:HGDP00806.SDG
    7.74324221 CEU.SG:NA12778.SG
    8.18899261 England_BellBeaker_mediumEEF:I1767
    8.50406373 Orcadian.SDG:HGDP00800.SDG
    8.71320836 Orcadian:HGDP00806
    8.72139897 England_C_EBA:I7573
    9.00616456 England_EarlyMedieval_Saxon.SG:I0161.SG
    9.02632262 England_EarlyMedieval_Saxon.SG:I0777.SG
    9.05571643 GBR.SG:HG00099.SG
    9.37778759 Ireland_Viking.SG:VK545.SG
    9.40714622 GBR.SG:HG00109.SG
    9.49086930 Orcadian.SDG:HGDP00804.SDG
    9.52764399 Iceland_Viking_1d.rel.VK110.VK230.SG:VK111.SG
    9.75676688 Denmark_Viking.SG:VK328.SG
    9.99913996 Orcadian.SDG:HGDP00805.SDG
    10.02243982 CEU.SG:NA12348.SG
    10.03120631 French:French23830
    10.03605002 Iceland_Viking.SG:VK230.SG
    10.06530675 GBR.SG:HG00096.SG
    10.09916828 CEU.SG:NA12872.SG
    10.15379732 Norway_Viking.SG:VK515.SG
    10.25985867 Denmark_Viking.SG:VK327.SG
    10.33089057 Orcadian:HGDP00805
    10.37651194 England_EarlyMedieval_Saxon.SG:I0774.SG
    10.39063521 GBR.SG:HG00112.SG



    Target: PLogan
    Distance: 184.0683% / 1.84068253 | ADC: 0.25x RC
    29.8 Sweden_Viking.SG
    24.0 Norway_Viking.SG
    14.4 CEU.SG
    7.2 Germany_Lech_EBA_contam
    6.8 Greenland_EarlyNorse_o1.SG
    5.8 French
    5.6 Spain_Greek_oLocal
    3.4 Germany_Lech_BellBeaker_lc
    3.0 England_BellBeaker_mediumEEF_published


    Target: PLogan_Father
    Distance: 320.9227% / 3.20922694 | ADC: 0.25x RC
    34.0 Sweden_Viking.SG
    29.0 CEU.SG
    11.0 Greenland_EarlyNorse_1d.rel.VK11.SG.VK1.SG
    10.6 Iceland_Pre_Christian.SG
    10.2 Greenland_LateNorse.SG
    4.6 Hungary_Langobard_sister.SZ14_sister.SZ6
    0.6 GBR.SG

    Target: PLogan_Mother
    Distance: 122.3923% / 1.22392311 | ADC: 0.25x RC
    26.4 Sweden_Viking.SG
    14.2 England_C_EBA
    12.0 CEU.SG
    9.6 French
    9.6 Orcadian.SDG
    9.0 Orcadian
    5.2 Iceland_Viking_1d.rel.VK110.VK230.SG
    3.2 Spain_Greek_oLocal
    3.0 Spain_EBA
    2.6 Russia_Yaroslavl_Fatyanovo_BA_lc.SG
    2.4 Germany_Lech_EBA_contam
    2.2 Germany_Tollense_BA.SG
    0.6 Scotland_C_EBA_published_lc

  2. The Following 2 Users Say Thank You to PLogan For This Useful Post:

     Bygdedweller (06-26-2022),  Molfish (06-27-2022)

  3. #2
    Gold Class Member
    Posts
    251
    Sex
    Location
    Eastern Norway
    Ethnicity
    Norwegian
    Y-DNA (P)
    R-Z19
    mtDNA (M)
    H4a

    Distance to: Bygdedweller
    5.86214978 Norwegian.DG:S_Norwegian-1.DG
    6.37921625 Greenland_EarlyNorse.SG:VK183.SG
    6.53324575 Norway_Viking.SG:VK394.SG
    7.73259336 Faroes_EarlyModern.SG:VK238.SG
    7.74743829 Denmark_EarlyViking.SG:VK296.SG
    8.13224446 Denmark_Viking.SG:VK320.SG
    8.30904327 England_Viking.SG:VK143.SG
    8.31686239 Italy_North_EarlyMedieval_Langobards_1:CL93
    8.37139176 England_Viking.SG:VK449.SG
    8.45257949 England_Viking.SG:VK172.SG
    8.53556091 Estonia_EarlyViking.SG:VK553.SG
    8.64596438 Denmark_Viking.SG:VK279.SG
    8.75051427 Norway_Viking.SG:VK388.SG
    8.80005114 Denmark_Viking.SG:VK370.SG
    8.86859628 Sweden_Viking.SG:VK306.SG
    8.99200200 Italy_North_EarlyMedieval_Langobards_1:CL146
    9.07545591 Iceland_Pre_Christian.SG:YGS-B-2_38.SG
    9.12082233 Sweden_Viking.SG:vik_84001.SG
    9.12319571 Denmark_Viking.SG:VK323.SG
    9.17408306 Sweden_IA.SG:RISE174.SG
    9.21515057 Italy_North_EarlyMedieval_Langobards_1_brother.CL146_son.CL151:CL145
    9.22100320 Hungary_Langobard:SZ9
    9.23418107 Denmark_Viking.SG:VK445.SG
    9.33298987 Denmark_Viking.SG:VK281.SG
    9.34086184 Greenland_EarlyNorse.SG:VK1.SG

  4. #3
    Registered Users
    Posts
    306
    Sex
    Ethnicity
    Finnish

    The CSV file I posted isn't really meant to be used without any kind of filtering. You're supposed to use it together with the anno files of the Reich dataset.

    I excluded only the 1868 samples where the part of the sample name after the last underscore in `Eurogenes_K36_refs.txt` matched the version ID or master ID of a sample in the Reich dataset. That missed a lot of samples, because they have a different name in the Reich dataset. As a result, some Chuvash samples suffer from the calculator effect and get around 90-100% of the Volga-Ural component, even though other Chuvash samples get only about 20% Volga-Ural. Similarly, some Bulgarian samples get about 90-100% of the East_Balkan component while others get less than 10%:



    You also need to remove ancient samples with low coverage. I didn't remove them myself, so that everyone can decide for themselves what threshold to use. In the Reich dataset, only samples with fewer than 15,000 SNPs are marked with the `_lc` suffix, but only about 13% of the SNPs in the 1240K panel are included in Eurogenes K36. So even a sample with 20,000 SNPs in 1240K is expected to have only about 2,600 SNPs in common with Eurogenes K36, which is not enough to get reliable results with the ADMIXTURE algorithm.
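As a rough check of that arithmetic, the expected K36 overlap can be estimated by scaling a sample's 1240K SNP count by the roughly 13% shared fraction. This is only a back-of-the-envelope sketch: the function name `k36_overlap` is made up for illustration, and 0.13 is the approximate figure quoted above, not an exact constant.

```shell
# Hypothetical helper: estimate how many of a sample's 1240K SNPs are expected
# to overlap with Eurogenes K36, assuming about 13% of the panel is shared.
k36_overlap() {
  # $1 = SNP count in 1240K, $2 = shared fraction (assumed default 0.13)
  awk -v n="$1" -v frac="${2:-0.13}" 'BEGIN { printf "%d\n", n * frac + 0.5 }'
}

k36_overlap 20000   # a sample with 20,000 SNPs
k36_overlap 15000   # a sample right at the _lc cutoff
k36_overlap 6680    # the SNP count of I4947 in the anno file
```

For I4947 the estimate comes out somewhat under the 958 overlapping SNPs that plink actually counts for that sample, so the 13% figure should be read as a ballpark rather than a per-sample prediction.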

    For example, if you sort the samples in my CSV file by their percentage of the East_Balkan component, the top 20 are all either modern samples that suffer from the calculator effect or ancient samples that were marked as low-coverage:

    Code:
    $ curl -Lso reich.k36.csv 'https://drive.google.com/uc?export=download&id=1dGDy5_jpDR6vxXYz1qIV4Cg6KnKHPeeX'
    $ awk -F, '{print$10,$1}' reich.k36.csv|sort -nr|head -n20
    96.05 Bulgarian:BulgarianF1
    94.57 Bulgarian.DG:S_Bulgarian-1.DG
    92.95 Bulgarian.DG:S_Bulgarian-2.DG
    92.84 Bulgarian:BulgarianB4
    91.88 Bulgarian:BulgarianA4
    90.47 Bulgarian:BulgarianA1
    89.60 Bulgarian:BulgarianC1
    89.46 Bulgarian:BulgarianB1
    56.67 Germany_Lech_EBA_lc:AITI_5_d
    50.47 Czech_BellBeaker_lc:I4947
    43.20 Greenland_LateNorse_lc.SG:VK76.SG
    38.52 Portugal_C_lc:I6466
    36.46 Denmark_Viking_lc.SG:VK318.SG
    32.07 Turkey_Catalhoyuk_N_Ceramic_lc.SG:CCH290.SG
    31.31 Scotland_C_EBA_lc:I5469
    31.27 Spain_IA_Tartessian_lc:I12560
    24.11 Sweden_TRB_MN_lc.SG:Gokhem7.SG
    24.07 Russia_Yaroslavl_Fatyanovo_BA_dup.I7356_lc.SG:NIK006.SG
    23.90 Tajikistan_Ksirov_Kushan_lc:I12257
    23.52 Greenland_EarlyNorse_lc.SG:VK185.SG
    And if you look at the low-coverage Bell Beaker sample that got 50% East_Balkan, it has only 6680 SNPs and only 958 of them overlap with K36:

    Code:
    $ awk -F\\t 'NR==1||/I4947/{print$2,$13,$21}' OFS=, v50.0_1240K_public.anno
    Version ID,Group ID,SNPs hit on autosomal targets
    I4947,Czech_BellBeaker_lc,6680
    $ curl -Lso K36.zip 'https://drive.google.com/uc?export=download&id=1vOJptDkvjDuukjDFQXhgnDDL0zHUJ4YN'
    $ unzip K36.zip
    $ plink --bfile v50.0_1240K_public --keep <(awk '$2=="I4947"' v50.0_1240K_public.fam) --geno 0 --make-bed
    $ awk 'NR==FNR{a[$2];next}$1 in a{n++}END{print n}' plink.bim K36.alleles
    958
    If someone wanted to create an ADMIXTURE-based calculator that was meant for projecting ancient samples with low coverage, then it might make sense to not do any LD pruning but to simply keep all SNPs of the 1240K panel. Then you might get semi-reasonable results even for samples with tens of thousands of SNPs.
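The coverage filtering described earlier in this post could be done along the following lines. This is a sketch under stated assumptions, not a tested pipeline: the function name `filter_by_snps` and the default threshold of 100000 are invented for illustration, the column numbers are taken from the awk command above (`$2` = Version ID, `$21` = SNPs hit on autosomal targets), and it assumes the first CSV field has the form `Group:VersionID` and that the SNP column is a plain number.

```shell
# Hypothetical filter: keep only CSV rows whose sample has at least min_snps
# SNPs according to the anno file. It does nothing about the calculator effect.
filter_by_snps() {
  # $1 = anno file (tab-separated), $2 = K36 CSV, $3 = minimum SNP count (assumed default)
  awk -v min="${3:-100000}" '
    NR == FNR { if (FNR > 1 && $21 + 0 >= min) keep[$2]; next }  # pass 1: remember well-covered version IDs
    FNR == 1  { print; next }                                    # pass 2: always keep the CSV header
    { id = $1; sub(/^.*:/, "", id); if (id in keep) print }      # strip the "Group:" prefix, then look up
  ' FS='\t' "$1" FS=',' "$2"
}
```

A call such as `filter_by_snps v50.0_1240K_public.anno reich.k36.csv 100000 > reich.k36.filtered.csv` would then drop I4947 (6680 SNPs) while keeping well-covered samples. The right threshold is a judgment call for each user.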
    Last edited by Nganasankhan; 06-27-2022 at 02:18 AM.

  5. The Following 4 Users Say Thank You to Nganasankhan For This Useful Post:

     Bygdedweller (06-27-2022),  David Bush (06-27-2022),  jadegreg (06-28-2022),  PLogan (06-27-2022)

  6. #4

    In the heatmap in my previous post, I noticed that the Bulgarian samples that don't suffer from the calculator effect actually get a higher percentage of the Italian component than the East_Balkan component, even though the references for the East_Balkan component consist solely of Bulgarian and Romanian samples. It might be because K36 is based on supervised ADMIXTURE, and K36 has 101 references for the Italian component but only 27 references for the East_Balkan component. In supervised ADMIXTURE the reference samples are forced to get 100% of a single component, but when you later project samples, there's a bias where the projected samples tend to get a lower percentage of components with fewer references and a higher percentage of components with more references. I think it's because individual-level genetic variation tends to get more averaged out for components with a larger number of references. f2 distances suffer from a similar bias: populations with a larger number of samples tend to have a lower f2 distance to other populations, and populations with a smaller number of samples tend to have a higher f2 distance to other populations.

    The graph below demonstrates the bias of reference population size in supervised ADMIXTURE. In the graph on the left, which includes modern samples, the effect might partially be because I wasn't able to exclude all samples that suffer from the calculator effect. The colored clusters are based on cutting a hierarchical clustering tree for the FST matrix of K36 at the height where it has 12 subtrees. Within the clusters, the components with a higher number of references also tend to get a higher average admixture percentage. For example, regardless of whether you look at only ancient samples or at both ancient and modern samples, within the yellow-orange European cluster the three components with the highest admixture percentage are Iberian, Italian, and Fennoscandian, which are also the three components with the highest number of references in the European cluster:



    If you look at all samples in the reich.k36.csv file, then the correlation coefficient between the number of references and the average admixture percentage is about 0.55:

    Code:
    $ curl -Lso K36.zip 'https://drive.google.com/uc?export=download&id=1vOJptDkvjDuukjDFQXhgnDDL0zHUJ4YN'
    $ unzip K36.zip
    $ curl -Lso reich.k36.csv 'https://drive.google.com/uc?export=download&id=1dGDy5_jpDR6vxXYz1qIV4Cg6KnKHPeeX'
    $ cor()(awk '{a[NR]=$1;b[NR]=$2;x+=$1;y+=$2}END{x2=x/NR;y2=y/NR;for(i in a){s1+=(a[i]-x2)*(b[i]-y2);s2+=(a[i]-x2)^2;s3+=(b[i]-y2)^2}print s1/(s2^.5*s3^.5)}')
    $ awk '{a[$1]++}END{for(i in a)print i,a[i]}' Eurogenes_K36_refs.txt|sort|paste -d\   - <(awk -F, 'NR>1{for(i=2;i<=NF;i++)a[i]+=$i}END{for(i in a)print a[i]/NR}' reich.k36.csv)|cut -d\  -f2-|cor
    0.549245
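For reference, the `cor` helper above computes the ordinary Pearson correlation coefficient. Writing a_i for the number of references of component i and b_i for its average admixture percentage (symbols chosen here to mirror the `a` and `b` arrays in the awk code), with n = 36 components, the printed value is:

```latex
r = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(b_i - \bar{b})}
         {\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^2}\;\sqrt{\sum_{i=1}^{n} (b_i - \bar{b})^2}},
\qquad
\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i,
\quad
\bar{b} = \frac{1}{n}\sum_{i=1}^{n} b_i
```

Because r is unchanged when either variable is multiplied by a positive constant, dividing the column sums by `NR` (which also counts the header line) rather than by the exact number of samples does not change the result.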
    In a spreadsheet of the K36 results of forum users and their family members, Bulgarians also get higher Italian than East_Balkan:

    Code:
    $ curl https://pastebin.com/raw/PNy8i3CG|tr -d \\r>k36forum
    $ tp(){ awk '{for(i=1;i<=NF;i++)a[i][NR]=$i}END{for(i in a)for(j in a[i])printf"%s"(j==NR?"\n":FS),a[i][j]}' "FS=${1-$'\t'}";}
    $ tab()(awk '{if(NF>m)m=NF;for(i=1;i<=NF;i++){a[NR][i]=$i;l=length($i);if(l>b[i])b[i]=l}}END{for(h in a){for(i=1;i<=m;i++)printf("%-"(b[i]+n)"s",a[h][i]);print""}}' n=${2-1} FS="${1-$'\t'}"|sed 's/ *$//')
    $ awk 'NR==1||/Bul:/' k36forum|tp ,|tab ,
                        Bul:Bained Bul:PAGANE Bul:PAGANE_Moter Bul:PAGANE_cousin Bul:PAGANE_brother
    Amerindian          0          0          0                0                 0
    Arabian             0.67       2.93       1.37             0                 1.49
    Armenian            4.49       8.59       8.06             3.81              7.55
    Basque              0.86       0.59       1.55             0                 0.76
    Central_African     0          0          0                0                 0
    Central_Euro        7.2        3.1        2.76             1.92              1.33
    East_African        0          0          0                0                 0
    East_Asian          0          0          0                0                 0
    East_Balkan         7.2        8.9        11.15            7.43              7.84
    East_Central_Asian  0          0          0                0                 0
    East_Central_Euro   11.17      9.86       9.36             12.21             13.55
    East_Med            6.8        5.23       9.61             0                 10.19
    Eastern_Euro        3.64       8.36       2.91             4.96              5.95
    Fennoscandian       2.28       3.92       3.62             7.29              2.29
    French              5.93       6.66       3.49             0.57              6.38
    Iberian             13.69      6.07       1.87             11.48             4.09
    Indo-Chinese        0          0          0                0                 0
    Italian             12.61      11.93      17.16            19.14             10.76
    Malayan             0          0          0                0                 0
    Near_Eastern        6.03       4.09       5.14             6.18              5.96
    North_African       0.38       0          0                0.78              0
    North_Atlantic      3.23       3.28       7.18             0.92              0.96
    North_Caucasian     2.61       6.14       8.31             5.07              7.42
    North_Sea           7.67       5.09       3.55             7.31              8.05
    Northeast_African   0          0          0                0.11              0
    Oceanian            0          0          0                0                 0
    Omotic              0          0          0                0                 0
    Pygmy               0          0          0                0                 0
    Siberian            0          0          0                0                 0
    South_Asian         0          0          0                0                 0
    South_Central_Asian 0          1.06       0                0.25              0.45
    South_Chinese       0          0          0                0                 0
    Volga-Ural          0          0          0                0.33              0
    West_African        0          0          0                0                 0
    West_Caucasian      0          3.32       2.93             3.38              2.37
    West_Med            3.52       0.89       0                6.83              2.6
    Similarly, the references of the Central_European component consist of 18 Hungarians and 11 Utah whites, but Hungarians still get fairly low percentages of the Central_European component, which might be because it has only 29 references in total. Many Hungarians instead get their highest percentage from the Italian or Iberian component, which might be because the Italian component has 101 references and the Iberian component has 96:

    Code:
    $ Rscript -e 't=read.csv("k36forum",r=1,check=F);t=t[grepl("Hun",rownames(t)),];x=apply(t,1,function(x){o=order(-x);paste(head(paste(round(x,1),colnames(t))[o],5),collapse=", ")});writeLines(paste(names(x),x,sep=": "))'
    Hun:Stears: 17.9 Italian, 15.9 East_Central_Euro, 13.7 Fennoscandian, 11.5 North_Sea, 6.7 Eastern_Euro
    Hun:Benyzero: 18.8 Iberian, 15.2 Central_Euro, 13.1 Italian, 12.7 East_Central_Euro, 12.4 North_Sea
    Hun:17571imremother: 18.2 Italian, 14.1 East_Central_Euro, 11.5 Fennoscandian, 10.2 Iberian, 10 Central_Euro
    Hun:Gergő: 20 Central_Euro, 15.8 Eastern_Euro, 14 East_Central_Euro, 11.7 Iberian, 8.1 Fennoscandian
    Hun:GergőGrandma: 18.9 East_Central_Euro, 11.8 North_Sea, 10.5 Central_Euro, 10.1 North_Atlantic, 9.9 East_Balkan
    Hun:Blanka: 19.5 Italian, 17.3 Eastern_Euro, 14.7 East_Central_Euro, 10.4 North_Atlantic, 8.4 North_Sea
    Hun:Blanka_husband: 19 East_Central_Euro, 15.2 North_Sea, 12 Eastern_Euro, 9.6 Iberian, 9.1 North_Atlantic
    Hun:Mr.G_father: 18.3 Italian, 15.2 East_Central_Euro, 11.4 Iberian, 11.2 Eastern_Euro, 10 Central_Euro
    Hun:someonenotyou: 14.6 French, 14.1 Italian, 9 Fennoscandian, 8.7 East_Central_Euro, 8.7 Eastern_Euro
    Hun:Kokeny: 12.1 East_Central_Euro, 11.8 Italian, 10.6 Iberian, 10.3 East_Balkan, 9 North_Sea
    Hun:kokeny_mom: 13.3 East_Central_Euro, 12.2 Iberian, 9.8 Italian, 8.4 North_Sea, 8.1 Eastern_Euro
    Hun:Universe: 20.6 East_Central_Euro, 16.9 Eastern_Euro, 9.7 North_Atlantic, 8.1 East_Med, 7.2 French
    Hun/DE:Mr.G: 14.2 Iberian, 13.9 Italian, 10.8 North_Atlantic, 10.7 East_Central_Euro, 9.7 North_Sea
    Hun/GB:Oszkar: 14.1 North_Sea, 11.7 Italian, 11.2 East_Central_Euro, 11.2 Eastern_Euro, 10.2 North_Atlantic
    Hun+NL:17571imre: 17.4 Italian, 14.4 East_Central_Euro, 9.9 Fennoscandian, 9.9 Iberian, 9.9 North_Sea
    Rom/Hun:IrisSelene: 15.6 East_Central_Euro, 13.1 Italian, 10.8 North_Caucasian, 10.5 East_Balkan, 9.6 Fennoscandian
    Last edited by Nganasankhan; 06-27-2022 at 04:15 AM.

  7. The Following 6 Users Say Thank You to Nganasankhan For This Useful Post:

     Bygdedweller (06-27-2022),  David Bush (06-27-2022),  jadegreg (06-28-2022),  PLogan (06-27-2022),  Ruderico (06-27-2022),  Varun R (06-27-2022)

  8. #5

    I did an experiment where I first did two supervised K=2 ADMIXTURE runs. In the first run one reference population consisted of 50 Han samples and the other reference population consisted of 20 French samples, and in the second run one reference population consisted of 20 Han samples and the other reference population consisted of 50 French samples. Then I selected 1000 random modern samples and I projected them on both runs, and when I looked at the average percentage of the Han component among the projected samples, it was 49% in the run with 50 Han references and 37% in the run with 20 Han references:

    Code:
    $ awk '($3=="French"&&++a[$3]<=20)||($3=="Han"&&++a[$3]<=50){print$1,$3}' v50.0_HO_public.ind>ref1.pick
    $ awk '($3=="French"&&++a[$3]<=50)||($3=="Han"&&++a[$3]<=20){print$1,$3}' v50.0_HO_public.ind>ref2.pick
    $ awk -F\\t 'NR>1&&$5==0&&$7!="French"&&$7!="Han"{print$2,$7}' v50.0_HO_public.anno|tr -d \"|shuf -n1000 >proj1.pick;cp proj{1,2}.pick
    $ for f in {ref,proj}{1,2};do plink --bfile v50.0_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $f.pick v50.0_HO_public.fam) --make-bed --out $f;cut -d\  -f2 $f.pick>$f.pop;done
    [...]
    $ for i in {1,2};do admixture -j4 --supervised ref$i.bed 2;cp ref$i.2.P proj$i.2.P.in;admixture -j4 -P proj$i.bed 2;done
    [...]
    $ for i in {1,2};do awk '{x+=$2}END{print x/NR}' proj$i.2.Q;done
    0.491115
    0.365475
    So in other words when Davidski made K36, he should've included the same number of reference samples for each component to avoid bias. But actually the number of references ranges from 16 to 159:

    Code:
    $ awk '{++a[$1]}END{for(i in a)print a[i],i}' Eurogenes_K36_refs.txt|sort -n
    16 Volga-Ural
    21 Basque
    21 Omotic
    24 West_Mediterranean
    25 North_African
    27 Armenian
    27 East_Balkan
    27 Oceanian
    28 French
    29 Amerindian
    29 Central_European
    32 West_Caucasian
    35 Central_African
    35 Pygmy
    38 Arabian
    39 East_Central_European
    42 East_Central_Asian
    42 Eastern_European
    43 North_Atlantic
    49 North_Sea
    70 East_African
    72 South_Asian
    73 Indo-Chinese
    82 East_Mediterranean
    83 South_Chinese
    86 Malayan
    90 Near_Eastern
    92 North_Caucasian
    93 Siberian
    95 South_Central_Asian
    96 Fennoscandian
    96 Iberian
    101 Italian
    101 Northeast_African
    124 West_African
    159 East_Asian
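One simple way to equalize the reference counts would be to cap every component at the size of the smallest one (16, for Volga-Ural). The sketch below only illustrates that idea and is not how K36 was actually built. It assumes, as the counting command above does, that the first whitespace-separated field of `Eurogenes_K36_refs.txt` is the component name; the function name `cap_refs` is made up.

```shell
# Hypothetical helper: keep at most k randomly chosen reference lines per component.
cap_refs() {
  # $1 = refs file, $2 = cap per component (assumed default 16)
  awk 'BEGIN { srand() } { printf "%.9f\t%s\n", rand(), $0 }' "$1" |  # tag each line with a random key
    sort -n |                                                         # shuffle by sorting on the key
    cut -f2- |                                                        # drop the key again
    awk -v k="${2:-16}" '++n[$1] <= k'                                # keep the first k lines seen per component
}
```

Running `cap_refs Eurogenes_K36_refs.txt 16 > refs.capped.txt` would leave 16 references for each of the 36 components. The trade-off is that most of the available references get thrown away, so this is a starting point for experiments rather than a recommendation.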
    Last edited by Nganasankhan; 06-27-2022 at 07:55 AM.

  9. The Following 4 Users Say Thank You to Nganasankhan For This Useful Post:

     jadegreg (06-28-2022),  PLogan (06-27-2022),  Ruderico (06-27-2022),  xerxez (06-28-2022)

  10. #6
    Great analysis Nganasankhan. Really enjoy learning from you.

  11. The Following 2 Users Say Thank You to PLogan For This Useful Post:

     jadegreg (06-28-2022),  Nganasankhan (06-27-2022)

  12. #7
    Global Moderator
    Posts
    4,538
    Sex
    Location
    Vissaiom
    Ethnicity
    Portuguese highlander
    Y-DNA (P)
    E-Y31991>FT17866
    mtDNA (M)
    H20 (xH20a)

    Nganasankhan, could you eventually do a similar analysis for K15? I've always been curious about it, since I get abnormally high Atlantic at the expense of very low North Sea for my ethnic background, which naturally leads to some truly terrible distances between us (Portuguese and Galician end up about the most distant Iberian references for me, barely closer than French Basque). I wonder if it has anything to do with the samples David chose for each of those two components. That said, I still enjoy looking at K15's results.
    YDNA E-Y31991>PF4428>Y134097>Y134104>Y168273>FT17866 (TMRCA ~1000AD) - Domingos Rodrigues, b. circa 1690, Viana do Castelo, Portugal - Stonemason, miller.
    mtDNA H20 - Monica Vieira, b. circa 1700, Porto, Portugal

    [1] "distance%=1.7123"
    Ruderico

    Gallic_North_IA,46.4
    West_Iberia_IA,31.8
    Berber_EMA,11
    Roman_Colonial,10.8

  13. #8
    Registered Users
    Posts
    593
    Sex
    Location
    Australia
    Ethnicity
    English/Irish
    aDNA Match (1st)
    England_Mid-Late Iron Age:I20623
    aDNA Match (2nd)
    Norway_Mid_MA:VK117
    aDNA Match (3rd)
    England_Iron Age:I0160
    Y-DNA (P)
    R1b-M222
    mtDNA (M)
    K1a10
    Y-DNA (M)
    R1b-U106
    mtDNA (P)
    H1bz

    Thanks to the one with unspellable usernames.

    CEU (White Americans from Utah) against my k36 averages:

    Code:
    CEU(n=75),0.26,0.06,0.30,2.22,0.13,6.86,0.05,0.03,2.43,0.06,5.66,0.11,3.90,9.96,6.71,12.51,0.16,8.06,0.06,0.12,0.15,14.48,2.76,18.64,0.05,0.14,0.09,0.05,0.05,0.42,1.07,0.11,0.90,0.07,0.48,0.88
    Distance to: CEU(n=75)
    2.92371339 English_East
    3.09938704 English_Southeast
    3.30582819 English_West-Midlands
    3.30611555 English
    3.55814558 English_East-Midlands
    3.70197245 Dutch_South
    3.89341752 English_Lancashire
    4.08690592 Dutch
    4.24829378 English_Yorkshire
    4.30555455 Dutch_Central
    4.63841568 English_Northeast
    4.90033672 English_Southwest
    4.96820893 British_Ulster
    5.10972602 Scottish
    5.51683786 Scottish_East
    5.69494513 French_Normandy
    5.69974561 Scottish_Highlands
    5.72495415 German_Northwest
    5.73939021 Welsh
    5.80420537 Scottish_Orkney

    Target: CEU(n=75)
    Distance: 99.5159% / 0.99515867 | ADC: 0.5x RC
    36.0 English_East
    27.6 English_Southeast
    15.8 Dutch_South
    9.2 Dutch
    5.4 Scottish_Highlands
    2.8 Volga_Tatar
    2.0 Tabasaran
    1.2 Kalash

    Target: CEU(n=75)
    Distance: 66.0410% / 0.66041017 | ADC: 0.25x RC
    42.8 English_East
    20.0 Dutch_South
    17.8 English_Southeast
    8.2 Icelandic
    5.4 British_Ulster
    2.6 Tabasaran
    2.0 Chuvash
    1.0 Bahun
    0.2 Kalash

    Individuals
    Code:
    Distance to:	CEU.SG:NA12890.SG
    9.16800960	English_East
    9.24170980	Scottish_East
    9.48151887	German_Northwest
    9.49628348	British_Ulster
    9.54545965	Scottish_Highlands
    
    Distance to:	CEU.SG:NA12878.SG
    9.10643728	Irish_Connacht
    9.21100429	Irish_Ulster
    9.37171809	Scottish_Orkney
    9.43326031	British_Ulster
    9.46770299	Irish
    
    Distance to:	CEU.SG:NA12874.SG
    8.05368239	English_West-Midlands
    8.30734615	English_East
    8.31137774	Scottish
    8.35329276	Welsh
    8.37269371	English_Southwest
    
    Distance to:	CEU.SG:NA12873.SG
    10.67980337	English_East
    11.01034059	English_Southwest
    11.17481991	British_Ulster
    11.20440538	English_Southeast
    11.21107488	English_Lancashire
    
    Distance to:	CEU.SG:NA12872.SG
    6.56115081	English_Northeast
    6.80720207	Scottish_Orkney
    7.03146500	English_West-Midlands
    7.07464487	English_East-Midlands
    7.16647054	Scottish
    
    Distance to:	CEU.SG:NA12843.SG
    7.22749611	German_Lower_Saxony_North
    7.92630431	German_Westphalia
    8.22346642	Icelandic
    8.43380697	Dutch_North
    8.95530011	German_Northwest
    
    Distance to:	CEU.SG:NA12842.SG
    10.28769654	Scottish_Highlands
    10.57330128	German_Lower_Saxony_South
    10.64838485	Scottish_East
    10.75171149	Scottish
    10.90531522	German_Bavaria_Lower/Middle_Franconia
    
    Distance to:	CEU.SG:NA12830.SG
    5.21662726	Dutch_South
    5.23756623	Afrikaner
    5.65416661	French_Normandy
    6.73562915	French_Hauts-de-France
    6.81161508	Flemish
    
    Distance to:	CEU.SG:NA12829.SG
    9.96687514	Swiss_German
    10.12937807	German_Baden-Württemberg
    10.55162547	Walloon
    10.66175408	German_Hesse_South
    10.82586717	German_South
    
    Distance to:	CEU.SG:NA12828.SG
    8.55427963	English_East
    8.70703738	German_Bavaria_Lower/Middle_Franconia
    9.16901849	English_West-Midlands
    9.25056755	English_East-Midlands
    9.26754013	French_Hauts-de-France
    
    Distance to:	CEU.SG:NA12814.SG
    6.93236612	German_Westphalia
    7.41891502	Dutch_Central
    7.70647131	Dutch
    8.45576726	German_Northwest
    8.86754194	Dutch_North
    
    Distance to:	CEU.SG:NA12813.SG
    7.58909744	English_Southwest
    8.05836832	French_Brittany
    8.17357327	English_East-Midlands
    8.18297012	English_East
    8.23560562	English_Lancashire
    
    Distance to:	CEU.SG:NA12812.SG
    6.09839323	English_Northeast
    6.12459795	English_Southeast
    6.24037659	English
    6.39709309	English_West-Midlands
    6.42288097	Scottish
    
    Distance to:	CEU.SG:NA12778.SG
    6.45404524	English_Northeast
    7.01787717	English_East-Midlands
    7.02746754	Cornish
    7.27662009	Dutch_South
    7.38375921	English_West-Midlands
    
    Distance to:	CEU.SG:NA12777.SG
    10.86379308	Dutch_South
    10.94039762	Flemish
    11.70018376	English_Lancashire
    11.70747624	English_East-Midlands
    11.72943306	English_Southeast
    
    Distance to:	CEU.SG:NA12775.SG
    8.74631923	French_Normandy
    8.94322649	Dutch_South
    8.95898432	Afrikaner
    9.29980107	Scottish_Orkney
    9.38211597	French_Brittany
    
    Distance to:	CEU.SG:NA12763.SG
    8.72639101	Afrikaner
    8.99524875	Austrian_Burgenland
    8.99786086	German_Bavaria_Oberpfalz
    9.10432315	German
    9.37704644	French_East
    
    Distance to:	CEU.SG:NA12760.SG
    9.53036725	German_Lower_Saxony_South
    10.40777594	German_Westphalia
    10.55098100	Dutch
    10.74777186	Dutch_Central
    10.92657311	English_East-Midlands
    
    Distance to:	CEU.SG:NA12751.SG
    6.88414846	German_Westphalia
    7.74330033	Dutch_Central
    7.76626036	German_Northwest
    7.77862456	German_Mecklenburg_Center/East
    7.80144858	Danish
    
    Distance to:	CEU.SG:NA12750.SG
    11.30004867	English_East
    11.75572626	German_Northwest
    12.17267021	Scottish_Highlands
    12.28188911	English_Southeast
    12.31250178	English_Southwest
    
    Distance to:	CEU.SG:NA12749.SG
    8.94716156	English_East
    9.16746421	English_Southeast
    9.19103367	English_Southwest
    9.31028464	English
    9.36703261	English_Lancashire
    
    Distance to:	CEU.SG:NA12748.SG
    10.56196951	Scottish_Highlands
    10.72268157	German_Westphalia
    10.82147864	German_Mecklenburg_Center/East
    11.06997290	Dutch
    11.11881738	Faroese
    
    Distance to:	CEU.SG:NA12718.SG
    8.32576723	French_Normandy
    9.01848103	Dutch_South
    9.02765196	Afrikaner
    9.20248336	English_Lancashire
    9.22085137	English_West-Midlands
    
    Distance to:	CEU.SG:NA12716.SG
    8.77880971	German_Saxony-Anhalt_South
    10.32707122	German
    11.15716810	German_Lower_Saxony_South
    11.32645576	German_Northwest
    11.43106732	German_Westphalia
    
    Distance to:	CEU.SG:NA12489.SG
    10.71109238	French_Normandy
    10.79301163	French_Brittany
    11.22106056	Cornish
    11.22653108	English_Southwest
    11.36335338	Afrikaner
    
    Distance to:	CEU.SG:NA12414.SG
    11.33858898	Icelandic
    12.25487658	Scottish_East
    12.28158785	British_Ulster
    12.45926162	Irish_Connacht
    12.48645666	Scottish_Highlands
    
    Distance to:	CEU.SG:NA12413.SG
    10.24385181	French_Northeast
    10.67729366	French_East
    11.00960036	Afrikaner
    11.20215158	French_Normandy
    11.99438619	French
    
    Distance to:	CEU.SG:NA12400.SG
    6.96103441	English_East-Midlands
    7.00631858	English_East
    7.18573587	English_Southeast
    7.36503225	English_West-Midlands
    7.46008043	English
    
    Distance to:	CEU.SG:NA12348.SG
    8.01746843	German
    8.02889781	English_West-Midlands
    8.07317162	Dutch
    8.31544948	Dutch_South
    8.33598225	Afrikaner
    
    Distance to:	CEU.SG:NA12342.SG
    7.12830274	English_Southeast
    7.27038513	English_East-Midlands
    7.29426487	English_Yorkshire
    7.38635905	English
    7.64600549	English_West-Midlands
    
    Distance to:	CEU.SG:NA12341.SG
    8.09701179	British_Ulster
    8.09884560	Scottish_Orkney
    8.33488452	English_Southeast
    8.39207364	Scottish_Southwest
    8.49974117	Scottish
    
    Distance to:	CEU.SG:NA12340.SG
    8.65210957	Dutch_South
    8.79072807	Flemish
    9.75691037	Afrikaner
    9.94916579	German_Rhineland-Palatinate
    10.25456484	German_South
    
    Distance to:	CEU.SG:NA12287.SG
    7.25667279	French_Northeast
    7.55832653	Afrikaner
    7.81758275	Dutch_South
    8.01709424	French_Normandy
    8.01915207	French_East
    
    Distance to:	CEU.SG:NA12286.SG
    7.41851063	Slovak_2
    8.01270866	Czech_Moravia
    8.11308203	Slovenian
    8.14030098	Austrian_Carinthia
    8.37386410	Hungarian
    
    Distance to:	CEU.SG:NA12283.SG
    10.13282784	Scottish_East
    10.39130406	Scottish
    10.42812543	British_Ulster
    10.70221005	Welsh
    10.76732093	Scottish_Highlands
    
    Distance to:	CEU.SG:NA12275.SG
    9.45995772	French_Normandy
    9.74681486	English_Southeast
    9.75122556	Dutch_South
    9.76885357	Afrikaner
    10.08451784	English_Southwest
    
    Distance to:	CEU.SG:NA12273.SG
    9.40132437	German_Bavaria_Oberpfalz
    9.44213429	German_North_Rhine
    10.00915081	Flemish
    10.02940676	French_East
    10.38352060	Swiss_German
    
    Distance to:	CEU.SG:NA12272.SG
    9.51036803	French
    10.93241968	French_East
    11.12346619	French_Hauts-de-France
    11.28094854	French_Brittany
    11.65769274	French_Normandy
    
    Distance to:	CEU.SG:NA12156.SG
    8.80454428	Flemish
    9.10580035	Afrikaner
    9.35908649	English_East-Midlands
    9.44167358	French_Hauts-de-France
    9.69416319	English_East
    
    Distance to:	CEU.SG:NA12155.SG
    10.94421308	Irish_Ulster
    12.10356972	Irish_Connacht
    12.31312308	Irish
    12.48224339	Irish_Munster
    12.52304675	Scottish_Orkney
    
    Distance to:	CEU.SG:NA12154.SG
    8.27871971	German
    8.53591823	Dutch_South
    8.69726394	Afrikaner
    8.79722684	English_Northeast
    8.90193237	Scottish_Highlands
    
    Distance to:	CEU.SG:NA12058.SG
    8.76919609	Scottish_Orkney
    9.02925800	Irish
    9.05512010	Welsh
    9.05564465	Irish_Connacht
    9.06865481	British_Ulster
    
    Distance to:	CEU.SG:NA12046.SG
    8.11098638	French_Normandy
    8.21386024	English_Southeast
    8.53055684	English_Southwest
    8.55854544	French_Brittany
    8.64175329	English
    
    Distance to:	CEU.SG:NA12045.SG
    6.11262628	German_South
    6.78275018	German_Baden-Württemberg
    7.03845153	Flemish
    7.24637841	Walloon
    7.25432285	German_Hesse_South
    
    Distance to:	CEU.SG:NA12043.SG
    10.25152672	English_Southeast
    10.37293112	English_East
    10.52057508	Scottish_Highlands
    10.61341133	British_Ulster
    10.70979925	English
    
    Distance to:	CEU.SG:NA12006.SG
    7.95710374	English_East
    8.23904728	English_West-Midlands
    8.36194953	English_Southeast
    8.44641344	English_Southwest
    8.48579990	English
    
    Distance to:	CEU.SG:NA12004.SG
    7.56562621	Afrikaner
    7.73292959	Dutch_South
    7.82231424	French_Normandy
    8.27571749	Flemish
    8.30341496	German_Rhineland-Palatinate
    
    Distance to:	CEU.SG:NA11995.SG
    5.21676145	German_Westphalia
    5.84012842	German_Northwest
    6.32134479	Dutch
    6.47374698	Dutch_Central
    6.69522965	Icelandic
    
    Distance to:	CEU.SG:NA11994.SG
    9.88378976	Cornish
    10.31055285	English_Northeast
    10.54466690	English_East-Midlands
    10.64923002	Afrikaner
    10.66580049	French_Hauts-de-France
    
    Distance to:	CEU.SG:NA11992.SG
    7.33993188	French_Hauts-de-France
    7.62226344	French_Brittany
    7.67659430	French_Normandy
    8.47332284	Afrikaner
    8.58504514	French
    
    Distance to:	CEU.SG:NA11933.SG
    9.29767713	French_Normandy
    9.59605127	French_East
    9.74994872	Afrikaner
    10.20005392	German_Bavaria_proper
    10.22576159	German_Bavarian_Swabia
    
    Distance to:	CEU.SG:NA11932.SG
    12.37712810	German_North_Moravia
    14.45766925	German_Mecklenburg_West
    14.46725613	German_West_Bohemia
    14.57240543	Dutch_Central
    14.63064934	German
    
    Distance to:	CEU.SG:NA11931.SG
    10.46013384	Flemish
    10.65914631	Walloon
    10.79878234	Dutch_South
    11.41189730	Austrian_Salzburg-Upper_Austria
    11.42084060	German_South
    
    Distance to:	CEU.SG:NA11920.SG
    6.93551007	English_Southeast
    6.96355513	Dutch
    6.99562006	English_Yorkshire
    7.22991010	English_West-Midlands
    7.33455520	English
    
    Distance to:	CEU.SG:NA11918.SG
    6.12175628	Irish_Munster
    6.24747149	Irish
    6.49241866	Irish_Connacht
    6.50329916	Irish_Leinster
    6.65487040	Irish_Ulster
    
    Distance to:	CEU.SG:NA11894.SG
    7.76902182	Scottish_East
    7.84680827	Welsh
    7.94243665	Scottish
    8.29241822	Scottish_Orkney
    8.36151302	Irish_Connacht
    
    Distance to:	CEU.SG:NA11893.SG
    7.91106820	British_Ulster
    8.17294317	Scottish_East
    8.36698273	English_Southwest
    8.44579185	English_Lancashire
    8.46615615	Scottish
    
    Distance to:	CEU.SG:NA11892.SG
    7.02538255	Scottish_East
    7.13164778	British_Ulster
    7.34945576	Scottish_Orkney
    7.39827007	English_Southwest
    7.50054665	Scottish
    
    Distance to:	CEU.SG:NA11881.SG
    8.99130691	Dutch_South
    9.18604376	German_Rhineland-Palatinate_East_of_Rhine
    9.19269819	Flemish
    9.41281042	Dutch
    9.52474147	Afrikaner
    
    Distance to:	CEU.SG:NA11843.SG
    10.39401751	German
    10.39442158	German_North_Rhine
    11.04955655	Dutch_Central
    11.11857905	Flemish
    11.13089844	Dutch
    
    Distance to:	CEU.SG:NA11840.SG
    9.83011190	Swedish
    11.93256469	Norwegian
    11.93349069	Swedish_North
    12.01338420	Swedish_2
    12.05341860	German_Brandenburg_Northwest
    
    Distance to:	CEU.SG:NA11831.SG
    7.22202880	English_Lancashire
    7.30879607	English_Southwest
    7.54416993	British_Ulster
    7.55368784	French_Brittany
    7.58478741	English_Southeast
    
    Distance to:	CEU.SG:NA10851.SG
    36.22762344	German_Frisian
    37.31018226	German_Lower_Saxony_North
    37.79421781	Dutch_North
    38.41379700	German_Schleswig-Holstein
    38.42682527	German_Westphalia
    
    Distance to:	CEU.SG:NA10847.SG
    34.80658271	German_Lower_Saxony_North
    34.83845289	German_Frisian
    35.53822027	German_Schleswig-Holstein
    35.55859671	Dutch_North
    35.55889762	German_Westphalia
    
    Distance to:	CEU.SG:NA07347.SG
    7.35343457	French
    7.67492020	French_Hauts-de-France
    8.06597793	German_Hesse_South
    8.09430664	French_Northeast
    8.16369402	French_Normandy
    
    Distance to:	CEU.SG:NA07056.SG
    6.70189525	English_Lancashire
    6.72249954	English_East
    6.75837998	French_Normandy
    6.98881249	English_East-Midlands
    7.05056735	English
    
    Distance to:	CEU.SG:NA07051.SG
    7.77209110	English_Southeast
    7.91924239	English_Yorkshire
    8.17875296	English
    8.25302975	English_Northeast
    8.26846419	English_Lancashire
    
    Distance to:	CEU.SG:NA07048.SG
    5.62206368	English_Southeast
    5.89547284	English
    5.89997458	English_West-Midlands
    5.96453686	English_Lancashire
    6.12107017	English_East
    
    Distance to:	CEU.SG:NA07037.SG
    41.71540123	Austrian_Salzburg-Upper_Austria
    42.90875086	German_North_Moravia
    43.40531534	German_Bavarian_Swabia
    43.42499741	German_Mecklenburg_West
    43.67852791	German_Saxony-Anhalt_South
    
    Distance to:	CEU.SG:NA07000.SG
    8.04397290	Dutch
    8.04662662	English_Northeast
    8.08086629	English_Lancashire
    8.11421592	Dutch_Central
    8.26352225	Icelandic
    
    Distance to:	CEU.SG:NA06994.SG
    9.51677466	Dutch
    9.86958966	German_Westphalia
    9.89350797	German_Northwest
    9.93374048	Dutch_Central
    10.24163561	German_Schleswig-Holstein
    
    Distance to:	CEU.SG:NA06989.SG
    7.03767007	Scottish_Southwest
    7.32726416	Scottish_Orkney
    7.35578004	Scottish_Highlands
    7.38080619	British_Ulster
    7.49711945	Welsh
    
    Distance to:	CEU.SG:NA06986.SG
    5.87464041	Scottish_Highlands
    6.72059521	Irish_Connacht
    6.73985905	Scottish_Southwest
    6.74785892	Irish
    7.01646635	Irish_Munster
    
    Distance to:	CEU.SG:NA06985.SG
    11.44111009	German
    11.52725032	German_Saxony-Anhalt_South
    11.82739194	German_North_Moravia
    12.08669516	German_Lower_Saxony_South
    12.09246873	Austrian_Salzburg-Upper_Austria
    
    Distance to:	CEU.SG:NA06984.SG
    7.98589381	English_West-Midlands
    8.02224408	English_Southeast
    8.09973456	English_East
    8.41107603	English
    8.76012557	English_Yorkshire
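For anyone wondering how "Distance to:" lists like the ones above come about: as far as I understand it, each figure is the plain Euclidean distance between the target's component percentages and a population's average percentages. Below is a minimal Python sketch under that assumption. The three-component vectors, population labels, and function names are made up for illustration; real K36 rows have 36 components.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length lists of component percentages."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_populations(target, averages, n=5):
    """Return the n (distance, population) pairs nearest to the target, closest first."""
    ranked = sorted((euclidean_distance(target, vec), name) for name, vec in averages.items())
    return ranked[:n]

# Hypothetical three-component averages and target (not real K36 data)
averages = {"Pop_A": [50.0, 30.0, 20.0], "Pop_B": [10.0, 60.0, 30.0]}
target = [47.0, 34.0, 19.0]

for dist, name in closest_populations(target, averages):
    print(f"{dist:.8f}\t{name}")
```

Sorting the (distance, name) pairs and keeping the first few gives the same kind of top-5 list as in the output above.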
    Compared to my collected results for Utah in K13:
    Code:
    Distance to:	US_Utah(n=22)
    1.28537932	English_Midlands
    1.30433891	English_Southeast
    1.81598458	English_North
    1.99802402	Dutch
    2.03575539	English_Southwest
    2.32447844	Welsh
    2.49543583	Dutch_Central
    2.97077431	Scottish
    3.08535249	German_Northwest
    3.77591049	Dutch_South
    3.96969772	Dutch_North
    4.33301281	French_Brittany
    4.37941777	Irish
    5.20783064	Flemish
    5.26353493	Icelandic
    6.42045170	Danish
    6.65690619	Norwegian
    7.08885745	Norwegian_Southcentral
    7.16507502	Swedish_Götaland
    8.12160698	French_North
    
    Target: US_Utah(n=22)
    Distance: 71.8201% / 0.71820141 | ADC: 0.5x RC
    66.2	English_Midlands
    21.2	Welsh
    11.2	German_Northwest
    1.4	Dargin
    
    Target: US_Utah(n=22)
    Distance: 39.0171% / 0.39017089 | ADC: 0.25x RC
    49.8	English_Midlands
    38.8	Welsh
    5.4	Norwegian
    4.4	German_Southeast
    1.2	Dargin
    0.4	Maya
    
    Distance to:	US_Utah(n=22)
    1.09590146	English_Southeast
    1.27976560	English_Northeast
    1.32762947	English_East-Midlands
    1.38329317	English_West-Midlands
    1.65987951	Dutch_South-Holland
    1.97643619	English_East
    2.12624081	English_Southwest
    2.32295071	Welsh_South
    2.34354432	English_Yorkshire
    2.35070202	English_Lancashire
    2.61597018	Dutch_North-Brabant
    2.67007491	Scottish_East
    2.75889471	Cornish
    2.81682445	German_Westphalia
    2.83594781	Scottish_North-Highlands
    2.84241095	Scottish_Northeast
    2.94154721	British_Ulster
    3.08535249	German_Lower_Saxony_South
    3.29047109	Scottish_Orkney
    3.30099985	Dutch_Gelderland
    
    
    Target: US_Utah(n=22)
    Distance: 74.1125% / 0.74112508 | ADC: 0.5x RC
    66.4	English_Southeast
    18.0	German_Westphalia
    15.6	Welsh_South
    
    Target: US_Utah(n=22)
    Distance: 46.5525% / 0.46552523 | ADC: 0.25x RC
    40.2	Welsh_South
    34.6	English_Southeast
    20.0	Dutch_South-Holland
    3.6	German_County_Glatz
    1.2	German_Westphalia
    0.4	Turkish_Meskheti
    Genealogy to 6 gens: 42% Staffordshire, 40% Offaly, 6% Mayo, 3% Derbyshire, 3% Galway, 3% Shropshire, 1% Roscommon
    Surnames to 4 gens: 10/16 English, 4/16 Irish, 1/16 Welsh, 1/16 unknown
    LivingDNA: 47.8% Ireland, 50.4% England (16.2% NW Eng, 15.7% Central Eng, 7.4% SE Eng, 5.4% Sth Eng, 4.4% S. Yorks, 1.3% S.Central Eng), 1.7% Sth Wales
    AncestryDNA: 48% England & NWE, 45% Ireland, 3% Scotland, 2% Sweden & Denmark, 2% Wales
    G25 closest ancient pop: Faroes_EM @ 0.018

  14. The Following 2 Users Say Thank You to Molfish For This Useful Post:

     jadegreg (06-28-2022),  PLogan (06-27-2022)

  15. #9
    Registered Users
    Posts
    306
    Sex
    Ethnicity
    Finnish

    Quote Originally Posted by Ruderico View Post
    Nganasankhan could you eventually do a similar analysis for K15?
    I'll wait to see whether I find more errors in my K13 or K36 files before I post files for more calculators.

    Maybe someone could ask Davidski whether he can provide a list of the samples he used as references in K15 or K13, or whether he can somehow convert the sample IDs in `Eurogenes_K36_refs.txt` so that they match the IDs used in the Reich dataset. If he just tells us which datasets he used for the reference samples, then maybe we can look at the files included with those datasets to convert the sample IDs. Or if he can post a PLINK dataset for all of the reference samples, then I can merge it with the Reich dataset and look at IBS to see which samples are also included in the Reich dataset.
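The IBS check mentioned at the end could look roughly like the sketch below. It assumes genotypes are already coded as 0/1/2 copies of one allele, with `None` for missing calls (getting them out of the merged PLINK files is not shown). The function name, the toy genotypes, and the 0.99 cutoff are all made up for illustration. Each SNP scores 1 when both samples have the same genotype, 0.5 when they differ by one allele, and 0 when they are opposite homozygotes.

```python
def ibs_similarity(g1, g2):
    """Mean identity-by-state over SNPs where both genotypes are called.

    Genotypes are 0, 1 or 2 copies of the counted allele; None means missing.
    """
    scores = [1 - abs(a - b) / 2 for a, b in zip(g1, g2) if a is not None and b is not None]
    if not scores:
        raise ValueError("no SNPs with calls in both samples")
    return sum(scores) / len(scores)

# Hypothetical genotypes: the same person under two sample IDs, plus someone else
ref_sample = [0, 1, 2, 2, None, 1, 0, 2]
reich_same = [0, 1, 2, 2, 1, 1, 0, 2]
reich_other = [1, 1, 0, 2, 1, 0, 0, 1]

DUPLICATE_CUTOFF = 0.99  # illustrative threshold, not a recommended value
print(ibs_similarity(ref_sample, reich_same) >= DUPLICATE_CUTOFF)
print(ibs_similarity(ref_sample, reich_other) >= DUPLICATE_CUTOFF)
```

Pairs of IDs that score close to 1 across many SNPs would be candidates for being the same individual under different names.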

    When I tried googling for the sample IDs in `Eurogenes_K36_refs.txt`, I found that some of the samples were included in old datasets published by the Estonian Biocentre, like "The Genome Wide Structure of the Jewish People" or "Population genetics context of ancient human genome sequence of an extinct Palaeo-Eskimo" (https://evolbio.ut.ee). And some samples were included in a study by Metspalu that's missing from evolbio.ut.ee titled "Shared and unique components of human population structure and genome-wide signals of positive selection in South Asia" (https://www.ncbi.nlm.nih.gov/geo/que...i?acc=GSE33489). And some samples were from SGVP (Singapore Genome Variation Project) (https://blog.nus.edu.sg/sshsphphg/si...ation-project/).

    BTW, does anyone know whether K13 and K15 are based on supervised or unsupervised ADMIXTURE? If they're based on supervised ADMIXTURE, then they might suffer from the same bias as K36, which is caused by differences in the size of the reference populations.

    If K13 is unsupervised, then it must have had a large number of Northwestern European samples in the reference run, because in a global unsupervised ADMIXTURE run it's normally difficult to get a distinct component for Northwestern Europeans like the North_Atlantic component in K13. Instead, Northwestern Europeans just get a mixture of a Northeastern European component and Southern European, West Asian, or Caucasian components. For example, yesterday I did a global unsupervised K=32 run where Lithuanians got 100% of the Northeastern European component, but there was no distinct Northwestern European component, so Norwegians got 80% of the Northeastern European component, 15% of a Sardinian component, 3% of a North Caucasian component, and 2% of a South Caucasian component. (And actually it kinda pisses me off that North_Atlantic is listed before Baltic in Eurogenes K13, when it's easy to see even from the FST matrix that North_Atlantic is intermediate between Baltic and West_Med.)

    Quote Originally Posted by Molfish View Post
    CEU (White Americans from Utah) against my k36 averages:
    For some reason one of those samples (CEU.SG:NA07037.SG) got 49% of the Central_European component, even though it wasn't included in `Eurogenes_K36_refs.txt`. I excluded 9 Utah samples from my CSV file because their sample names were included in the text file of reference samples for K36, and they all got between 87% and 96% of the Central_European component. But among the remaining 73 CEU samples, the proportion of the Central_European component ranged from 1% to 17%.
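A rough sketch of that kind of exclusion in Python, not necessarily how it was actually done. It assumes names look like `Population:SampleID` and that the ID after the last colon matches an entry in the refs file exactly. The function name and the reference ID below are made up for illustration.

```python
def split_by_reference(names, ref_ids):
    """Split 'Population:SampleID' names into (kept, excluded) by membership in ref_ids."""
    kept, excluded = [], []
    for name in names:
        sample_id = name.rsplit(":", 1)[-1]  # text after the last colon
        (excluded if sample_id in ref_ids else kept).append(name)
    return kept, excluded

# Made-up reference ID; the real ones would be read from Eurogenes_K36_refs.txt
ref_ids = {"NA00001.SG"}
names = ["CEU.SG:NA00001.SG", "CEU.SG:NA07037.SG"]

kept, excluded = split_by_reference(names, ref_ids)
print(kept)
print(excluded)
```

A sample such as CEU.SG:NA07037.SG passes this kind of filter simply because its ID is not in the list, which fits the point above that it was not among the listed references.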

    Below you can see the ten modern samples in my CSV file with the highest percentage of the Central_European component (the Hungarian samples suffer from the calculator effect because they were used as references for K36, but I didn't bother to exclude them manually because their sample ID in `Eurogenes_K36_refs.txt` was different from their ID in the Reich dataset):

    Code:
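    # Download the K36 results for the Reich dataset as a CSV file
    $ curl -Lso reich.k36.csv 'https://drive.google.com/uc?export=download&id=1dGDy5_jpDR6vxXYz1qIV4Cg6KnKHPeeX'
    # Print column 7 (Central_European) with the sample name, sort in descending order,
    # keep only samples whose ID is on an anno line where column 5 is 0 (the modern samples),
    # and show the top ten
    $ awk -F, '{print$7,$1}' reich.k36.csv|sort -rn|awk -F: 'NR==FNR{a[$0];next}$2 in a' <(awk -F\\t '$5==0{print$2}' v50.0_HO_public.anno) -|head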
    96.15 Hungarian:hungary20
    96.11 Hungarian:hungary15
    92.05 Hungarian:hungary2
    91.95 Hungarian:hungary7
    91.78 Hungarian:hungary6
    91.29 Hungarian:hungary3
    48.93 CEU.SG:NA07037.SG
    17.43 Icelandic:NA15758
    16.60 CEU.SG:NA11932.SG
    16.45 Icelandic:NA15755
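For those who find the awk one-liner hard to read, here is an approximate Python equivalent of the same pipeline. It is only a sketch: the function name is hypothetical, the set of modern IDs is passed in directly instead of being read from the anno file, and the demo CSV is made up apart from the two names and percentages copied from the output above.

```python
import csv
import io

def top_component(csv_text, modern_ids, column=6, n=10):
    """Return the n highest (value, name) pairs for one CSV column, modern samples only.

    column is zero-based, so 6 corresponds to awk's $7.
    """
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= column:
            continue
        name = row[0]
        parts = name.split(":")
        # Mirror awk -F: '$2 in a': the second colon-separated field is the sample ID
        if len(parts) < 2 or parts[1] not in modern_ids:
            continue
        try:
            value = float(row[column])
        except ValueError:
            continue  # skip a header row or a non-numeric cell
        rows.append((value, name))
    rows.sort(reverse=True)
    return rows[:n]

# Demo data: zeros are placeholders and Ancient_Pop:I0001 is a made-up sample
demo_csv = (
    "Hungarian:hungary20,0,0,0,0,0,96.15\n"
    "CEU.SG:NA07037.SG,0,0,0,0,0,48.93\n"
    "Ancient_Pop:I0001,0,0,0,0,0,99.00\n"
)
modern_ids = {"hungary20", "NA07037.SG"}

for value, name in top_component(demo_csv, modern_ids):
    print(f"{value:.2f} {name}")
```

The made-up ancient sample is dropped because its ID is not in `modern_ids`, which is the role the anno-file filter plays in the shell version.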
    Last edited by Nganasankhan; 06-28-2022 at 03:57 AM.

  16. The Following 5 Users Say Thank You to Nganasankhan For This Useful Post:

     jadegreg (06-28-2022),  Molfish (06-28-2022),  PLogan (06-28-2022),  Ruderico (06-28-2022),  xerxez (06-28-2022)

  17. #10
    After heeding Nganasankhan's advice, I ran his code snippet from post https://anthrogenica.com/showthread....l=1#post857192 to scrub the dataset.

    As described there, the code generates population averages after removing duplicate samples, samples with fewer than 200,000 SNPs, and samples that have been marked as contaminated, and it strips various suffixes from the population names.
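For anyone following along, here is a rough Python sketch of the kind of filtering described. It is not the original snippet. The record layout (`population`, `master_id`, `snps`, `contaminated`, `values`), the rule of keeping the duplicate with the most SNPs, and the suffix list are all assumptions made for illustration; only the 200,000 SNP threshold comes from the description.

```python
import re
from collections import defaultdict

MIN_SNPS = 200_000  # threshold quoted in the description

def clean_population(name):
    """Strip a trailing dataset suffix such as .SG, .DG or .SDG (assumed suffix list)."""
    return re.sub(r"\.(SG|DG|SDG)$", "", name)

def population_averages(samples):
    """Average component vectors per cleaned population name after filtering."""
    best = {}
    for s in samples:
        if s["snps"] < MIN_SNPS or s["contaminated"]:
            continue
        prev = best.get(s["master_id"])
        # Assumption: among duplicates, keep the version with the most SNPs
        if prev is None or s["snps"] > prev["snps"]:
            best[s["master_id"]] = s
    groups = defaultdict(list)
    for s in best.values():
        groups[clean_population(s["population"])].append(s["values"])
    return {pop: [sum(col) / len(col) for col in zip(*vecs)] for pop, vecs in groups.items()}

# Made-up two-component records
samples = [
    {"population": "CEU.SG", "master_id": "A", "snps": 900_000, "contaminated": False, "values": [10.0, 90.0]},
    {"population": "CEU", "master_id": "A", "snps": 500_000, "contaminated": False, "values": [12.0, 88.0]},
    {"population": "CEU.SG", "master_id": "B", "snps": 800_000, "contaminated": False, "values": [20.0, 80.0]},
    {"population": "CEU.SG", "master_id": "C", "snps": 150_000, "contaminated": False, "values": [99.0, 1.0]},
    {"population": "CEU.SG", "master_id": "D", "snps": 700_000, "contaminated": True, "values": [1.0, 99.0]},
]
print(population_averages(samples))
```

Stripping the suffixes is what turns labels like CEU.SG and Orcadian.SDG into plain CEU and Orcadian in the averaged output.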
    The results are very different for my family. I've never had high numbers for Spain or Iberia: single digits at best, since my mother has a small pull there. My distances look fairly normal, however.

    Copy of my file (first time trying this): https://pastebin.com/MgGtDfta

    Target: PLogan
    Distance: 723.7722% / 7.23772194 | ADC: 0.25x RC
    27.2 Spain_Visigoth_Barcelona
    17.4 Wales_C_EBA
    15.4 Norway_Viking_o2
    14.4 Hungary_LBA
    13.8 Norway_Medieval
    11.8 Estonia_CordedWare_o1

    Target: PLogan_Father
    Distance: 688.5419% / 6.88541878 | ADC: 0.25x RC
    49.8 Norway_Medieval
    15.6 Hungary_LBA
    13.6 Wales_C_EBA
    8.2 Italy_Sicily_EBA_o2
    5.6 Spain_Visigoth_Barcelona
    5.0 Iceland_Early_Christian_o
    2.2 Serbia_IronGates_N

    Target: PLogan_Mother
    Distance: 538.1274% / 5.38127382 | ADC: 0.25x RC
    30.8 Wales_C_EBA
    19.4 Norway_Medieval
    12.6 Spain_Menorca_LBA
    9.4 England_IA_o
    8.6 Estonia_CordedWare_o1
    5.8 Spain_Visigoth_Barcelona
    5.2 France_Alsace_Lingolsheim_EBA
    5.2 Spain_UP_Azilian
    3.0 Ukraine_EBA
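On reading the "Distance:" line in these fits: the two figures are the same number, with the first one multiplied by 100. As far as I understand it, the number is the Euclidean distance between the target and the percentage-weighted blend of the listed source populations. Below is a small Python sketch under that assumption. The two-component vectors, source names, and function name are invented for illustration, and the ADC penalty used during the search is not modelled.

```python
import math

def mixture_distance(target, sources, percentages):
    """Euclidean distance between the target and a percentage-weighted blend of sources."""
    total = sum(percentages.values())
    if abs(total - 100.0) > 1e-6:
        raise ValueError("percentages must sum to 100")
    modeled = [
        sum(percentages[name] / 100.0 * sources[name][i] for name in percentages)
        for i in range(len(target))
    ]
    return math.sqrt(sum((t - m) ** 2 for t, m in zip(target, modeled)))

# Hypothetical two-component sources and target
sources = {"Src_A": [80.0, 20.0], "Src_B": [20.0, 80.0]}
target = [50.0, 50.0]

even_mix = mixture_distance(target, sources, {"Src_A": 50.0, "Src_B": 50.0})
single_source = mixture_distance(target, sources, {"Src_A": 100.0})
print(f"{even_mix:.8f} vs {single_source:.8f}")
```

A blend that reproduces the target well gives a small distance, while a poor set of sources leaves a large one no matter how the percentages are split.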


    Distances
    Code:
    Distance to:	PLogan_Mother
    9.57267988	Orcadian
    9.96193756	England_IA_Roman
    10.04791023	GBR
    10.83173116	CEU
    10.83790570	France_BellBeaker
    11.01264273	Faroes_EarlyModern
    11.05788407	Ireland_Viking
    11.51654028	Norway_Medieval
    11.56122398	Greenland_LateNorse
    11.65798010	Scotland_Viking
    11.68271801	Scotland_Viking_o
    11.85104637	England_MBA
    11.86171573	Scotland_LBA
    11.92226489	England_EarlyMedieval_Saxon
    11.97411375	Ireland_EBA
    12.00089580	England_C_EBA
    12.00839706	Iceland_Pre_Christian
    12.04521067	England_LBA
    12.08148584	Denmark_Viking
    12.10662215	Iceland_Viking
    12.14153615	Czech_Bohemia_BellBeaker
    12.14402322	England_IA
    12.15120570	Scotland_MBA
    12.22566972	Czech_BellBeaker
    12.24880811	England_BellBeaker_mediumEEF
    
    Distance to:	PLogan_Father
    10.24925851	Norway_Medieval
    11.89538566	CEU
    12.67175994	Scotland_Viking_o
    13.05484967	Moldova_Glinoe_Scythian_o2
    13.23847423	Hungary_Langobard
    13.43957589	Italy_LA_oCentralEuropean_o3CentralEuropean
    13.48291141	Denmark_Viking
    13.54246285	Germany_EarlyMedieval
    13.57542265	Faroes_EarlyModern
    13.85080864	GBR
    14.01614783	Germany_BellBeaker
    14.04728799	Iceland_Early_Christian_o
    14.06897295	Orcadian
    14.12837924	Italy_North_EarlyMedieval_Langobards_1
    14.21253672	England_Viking_o
    14.28376701	England_IA_Roman
    14.48302455	Ireland_Viking
    14.55071132	England_EarlyMedieval_Saxon
    14.65423488	Czech_BellBeaker
    14.71425499	England_IA
    14.78738314	Germany_Lech_EBA
    14.79775997	England_IA_ERoman
    14.88002688	Iceland_Pre_Christian
    15.03509894	England_Viking
    15.04827897	Germany_EarlyMedieval_Alemanic_Byzantine
    
    Distance to:	PLogan
    11.08985572	Scotland_Viking_o
    11.17984794	Orcadian
    11.32317977	England_IA_Roman
    11.50415577	Norway_Viking_o2
    11.66127352	GBR
    11.89478877	CEU
    12.29499492	Scotland_Viking
    12.72970934	England_LBA
    13.04847884	Scotland_MBA
    13.06089583	Czech_BellBeaker
    13.10644879	Germany_BellBeaker
    13.18399408	Norway_Medieval
    13.25768079	England_IA
    13.33213786	Scotland_LBA
    13.46759444	England_BellBeaker_mediumEEF
    13.51877213	Ireland_EBA
    13.65611218	England_Viking_o
    13.74470807	Ireland_Viking
    13.74488996	France_BellBeaker
    13.84079116	Faroes_EarlyModern
    13.85159558	Germany_Lech_EBA
    13.98139836	England_IA_ERoman
    14.02408286	Czech_Bohemia_BellBeaker
    14.21200901	England_EarlyMedieval_Saxon
    14.29590501	England_MBA
    Last edited by PLogan; 06-29-2022 at 12:28 AM.

  18. The Following User Says Thank You to PLogan For This Useful Post:

     Ruderico (06-29-2022)
