In reading some of the comments, I noticed that many are in favor of this calculator, and a few don’t find it that useful.
I feel that a calculator’s usefulness is very subjective. It really depends on what is expected from it. A calculator’s accuracy is determined by a few parameters, which include, among other things, the number and size of the reference populations that are relevant to the end user’s ancestry, and the assignment algorithms (the manner and order in which SNPs are assigned to geographic regions).
If, for example, the expectation is to have admixture narrowed down to a sub-regional population and to have ancient genomes define admixture, then, as DMXX has suggested, a higher k, say around 16, and one that is rich in ancient genome references is of course more appropriate.
Also, the admixture program and the Oracle program are more or less independent. From what I have seen so far in this thread, the European members seem happy with the Oracle program. For them it appears to be performing as expected, since the European references seem to be good. This is reflected in the small distances in the single and mixed mode population comparisons with the various calculator references.
As for myself, I have gotten better Oracle results from some other calculators. It seems that this calculator did not have adequate SE Kurd references. The Iranian references used were substantially “closer” to me than the Kurd references.
If the expectation of the end user is to be compared with ancient populations, or to have ancient genomes define admixture, then obviously this calculator is not appropriate. A calculator that uses nothing but genomes from ancient references would be the right choice. As of now, I do not know of any such calculator, but I am hoping someone will put one together that incorporates all the ancient genomes discovered so far that have sufficient overlapping SNPs with our raw data.
Finally, I tested this calculator with 2 ancient genomes, one from pre-Yamnaya admixed Europeans, LBK Stuttgart, and one from post-Yamnaya admixed Europeans, BR2 Hungary. Here are the results:
LBK Stuttgart, 7ky
Population
South_Asian -
West_Asian 4.54%
Siberian -
African 0.17%
Southern 51.27%
Atlantic_Baltic 44.03%
East_Asian -
Single Population Sharing:
# Population (source) Distance
1 Sardinian (HGDP) 9.65
2 Canarias (1000Genomes) 20.78
3 C_Italian (Dodecad) 22.89
4 TSI30 (Metspalu) 22.9
5 Murcia (1000Genomes) 24.01
6 Tuscan (HGDP) 24.1
7 Ashkenazi (Dodecad) 24.15
8 S_Italian_Sicilian (Dodecad) 24.22
9 Sicilian (Dodecad) 24.24
10 North_Italian (HGDP) 24.61
11 Ashkenazy_Jews (Behar) 24.69
12 Morocco_Jews (Behar) 25.04
13 Andalucia (1000Genomes) 25.1
14 Greek (Dodecad) 25.11
15 Portuguese (Dodecad) 25.52
16 Moroccan (Dodecad) 25.72
17 O_Italian (Dodecad) 25.78
18 Galicia (1000Genomes) 25.87
19 Baleares (1000Genomes) 25.97
20 Castilla_Y_Leon (1000Genomes) 26.01
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 83.2% Sardinian (HGDP) + 16.8% Saudis (Behar) @ 1.32
2 83.4% Sardinian (HGDP) + 16.6% Yemen_Jews (Behar) @ 1.56
3 83.8% Sardinian (HGDP) + 16.2% Samaritians (Behar) @ 3.1
4 82.4% Sardinian (HGDP) + 17.6% Bedouin (HGDP) @ 3.13
5 83.2% Sardinian (HGDP) + 16.8% Palestinian (HGDP) @ 4.19
6 83.6% Sardinian (HGDP) + 16.4% Jordanians (Behar) @ 4.43
7 83.8% Sardinian (HGDP) + 16.2% Egyptans (Behar) @ 4.64
8 84.3% Sardinian (HGDP) + 15.7% Druze (HGDP) @ 4.8
9 74.8% Sardinian (HGDP) + 25.2% Morocco_Jews (Behar) @ 5.02
10 83.9% Sardinian (HGDP) + 16.1% Lebanese (Behar) @ 5.05
BR2 Hungary 3.2ky
Population
South_Asian -
West_Asian 10.10%
Siberian -
African 0.48%
Southern 19.50%
Atlantic_Baltic 69.91%
East_Asian -
Single Population Sharing:
# Population (source) Distance
1 French (HGDP) 0.78
2 French (Dodecad) 1.02
3 Hungarians (Behar) 6.41
4 Mixed_Germanic (Dodecad) 7.58
5 Cornwall (1000Genomes) 7.87
6 German (Dodecad) 8.45
7 Kent (1000Genomes) 8.55
8 English (Dodecad) 8.71
9 CEU30 (1000Genomes) 8.92
10 Cataluna (1000Genomes) 8.96
11 Dutch (Dodecad) 9.7
12 British (Dodecad) 10.23
13 British_Isles (Dodecad) 10.43
14 Valencia (1000Genomes) 10.5
15 Spaniards (Behar) 10.77
16 Irish (Dodecad) 10.8
17 Cantabria (1000Genomes) 10.86
18 Orcadian (HGDP) 11.31
19 Pais_Vasco (1000Genomes) 11.47
20 Argyll (1000Genomes) 11.57
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 56% German (Dodecad) + 44% Spaniards (Behar) @ 0.28
2 65% Mixed_Germanic (Dodecad) + 35% Andalucia (1000Genomes) @ 0.41
3 58.7% Mixed_Germanic (Dodecad) + 41.3% Spaniards (Behar) @ 0.41
4 52.6% Dutch (Dodecad) + 47.4% Spaniards (Behar) @ 0.42
5 62.4% German (Dodecad) + 37.6% Andalucia (1000Genomes) @ 0.43
6 66.8% Mixed_Germanic (Dodecad) + 33.2% Murcia (1000Genomes) @ 0.52
7 59.2% Dutch (Dodecad) + 40.8% Andalucia (1000Genomes) @ 0.55
8 61.3% German (Dodecad) + 38.7% Galicia (1000Genomes) @ 0.58
9 99.6% French (HGDP) + 0.4% ASW30 (HapMap3) @ 0.59
10 53.2% Argyll (1000Genomes) + 46.8% Baleares (1000Genomes) @ 0.59
There do not appear to be any inconsistencies between these results and the few excerpts that I have seen posted referencing the Haak paper. I do encourage members who are more familiar with the paper to comment on these results.
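For anyone wondering how distances like the ones above could come about, here is a minimal sketch in Python. It assumes the Oracle distance is a plain Euclidean distance between admixture percentage vectors, and that mixed mode picks the least-squares weight for a two-population blend; neither is confirmed by the calculator's documentation in this thread. The LBK Stuttgart percentages are the ones posted above. The reference vectors REF_A and REF_B and the function names are hypothetical, made up purely for illustration, and are not the calculator's real population averages.

```python
import math

# Component order taken from the admixture tables posted above.
COMPONENTS = ["South_Asian", "West_Asian", "Siberian", "African",
              "Southern", "Atlantic_Baltic", "East_Asian"]

# LBK Stuttgart percentages as posted above ("-" treated as 0).
LBK = [0.0, 4.54, 0.0, 0.17, 51.27, 44.03, 0.0]

# HYPOTHETICAL reference vectors, invented for illustration only.
REF_A = [0.0, 0.0, 0.0, 0.0, 60.0, 40.0, 0.0]
REF_B = [0.0, 40.0, 0.0, 5.0, 45.0, 10.0, 0.0]


def distance(a, b):
    """Euclidean distance between two admixture vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_two_way_mix(sample, p, q):
    """Weight w in [0, 1] minimising the distance from sample to w*p + (1-w)*q.

    Returns (w, distance at that w). Uses the closed-form projection of the
    sample onto the segment between q and p, clamped to the segment ends.
    """
    d = [pi - qi for pi, qi in zip(p, q)]
    denom = sum(di * di for di in d)
    if denom == 0:  # p and q are identical, so any weight gives the same mix
        return 1.0, distance(sample, p)
    w = sum((si - qi) * di for si, qi, di in zip(sample, q, d)) / denom
    w = min(1.0, max(0.0, w))
    mix = [w * pi + (1 - w) * qi for pi, qi in zip(p, q)]
    return w, distance(sample, mix)


single_a = distance(LBK, REF_A)
single_b = distance(LBK, REF_B)
w, mixed = best_two_way_mix(LBK, REF_A, REF_B)
print(f"single: A={single_a:.2f}  B={single_b:.2f}")
print(f"mixed:  {100 * w:.1f}% A + {100 * (1 - w):.1f}% B @ {mixed:.2f}")
```

Under these assumptions the mixed mode distance can never be larger than the better of the two single distances, which would fit the pattern in the tables above, where the mixed mode fits are much tighter than the single population ones.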
Point taken, but if you want this BR2 Hungary sample to come out as Hungarian, then maybe this is the way:
Kit Num: F999933
Threshold of components set to 1.000
Threshold of method set to 0.25%
Personal data has been read. 20 approximations mode.
Gedmatch.Com
MDLP K=8 4-Ancestors Oracle
This program is based on 4-Ancestors Oracle Version 0.96 by Alexandr Burnashev.
Questions about results should be sent to him at:
[email protected]
Original concept proposed by Sergey Kozlov.
Many thanks to Alexandr for helping us get this web version developed.
Admix Results (sorted):
# Population Percent
1 East_European 37.11
2 Paleo_Mediterranean 24.87
3 West_European 24.47
4 Caucasian 10.73
5 Paleo_Scandinavian 1.59
6 South_Central_Asian 1.22
Finished reading population data. 73 populations found.
8 components mode.
--------------------------------
Least-squares method.
Using 1 population approximation:
1 HNG @ 13.096346
2 BLG @ 14.194614
3 SLV @ 14.723685
4 GER @ 15.502356
5 CRT @ 17.827982
6 BSN @ 18.872555
7 SRB @ 19.082020
8 RMN @ 20.711857
9 SLK @ 21.622833
10 MCD @ 21.867672
11 MNT @ 22.924067
12 GGZ @ 26.392282
13 FRN @ 26.640478
14 WUKR @ 28.088476
15 CEU @ 29.054943
16 FIN @ 30.243456
17 SWD @ 32.063717
18 UKR @ 32.449532
19 CUKR @ 32.873066
20 NITAL @ 33.069458
And the LBK below matches your Sardinian:
Kit Num: F999916
Threshold of components set to 1.000
Threshold of method set to 0.25%
Personal data has been read. 20 approximations mode.
Gedmatch.Com
MDLP K=8 4-Ancestors Oracle
This program is based on 4-Ancestors Oracle Version 0.96 by Alexandr Burnashev.
Questions about results should be sent to him at:
[email protected]
Original concept proposed by Sergey Kozlov.
Many thanks to Alexandr for helping us get this web version developed.
Admix Results (sorted):
# Population Percent
1 Paleo_Mediterranean 63.53
2 West_European 19.53
3 Caucasian 16.94
Finished reading population data. 73 populations found.
8 components mode.
--------------------------------
Least-squares method.
Using 1 population approximation:
1 SRD @ 20.630960
2 CRS @ 22.827448
3 SIC @ 23.885744
4 CITAL @ 26.915306
5 NITAL @ 27.455641
6 ASHK @ 29.542480
7 GRK @ 33.299362
8 KSV @ 34.534298
9 PRT @ 37.158596
10 CPR @ 38.265881
11 IBR @ 40.302605
12 SPN @ 40.438301
13 GGZ @ 46.568909
14 BLG @ 46.676292
15 RMN @ 46.947178
16 MCD @ 48.226307
17 MNT @ 49.707516
18 FRN @ 51.335663
19 SRB @ 54.388084
20 TRK @ 56.747826
How do we judge what is correct?
The interesting thing is that MDLP K=8 has all GD numbers very high.