# Thread: An nMonte and 4mix Guide for the Participants of the Basal-rich K7 and/or Global 10 T

1. Originally Posted by Hobknobbob12
Thanks for the response. A few questions:

1) Do you decide what counts as a modern population by whether its name is a nationality, use prior knowledge to spot the ancient ones, and settle for a rough idea with the rest? Going through them all more precisely than that would take forever.
2) What is an over-fit?
3) I take it the PC values are the different coordinates of your data?

That's a lot of questions! Many of these topics have been discussed in the two primary threads on these tools. I will point you to some areas where you can read the discussions.

Question 1: I split the initial spreadsheet early in the process and posted it. The following link contains the moderns and ancients split into two:

http://www.anthrogenica.com/showthre...l=1#post200593

Question 2: Overfitting can occur even with a small number of references in your spreadsheet. The author of nMonte discussed the topic with us in several posts:

http://www.anthrogenica.com/showthre...l=1#post200908

http://www.anthrogenica.com/showthre...l=1#post200928

http://www.anthrogenica.com/showthre...l=1#post202697

Question 3: The 10 "PC" values you got are plot points that can be put onto a graph. The first one has the most "variation" built into it, the second one has the second most, and so on. The last few are not nearly as important as the first three or four. For example, one of the late dimensions highlights the difference between two Austronesian/Melanesian populations. If you don't have any Austronesian/Melanesian in your background, say as a person with long-time European roots, you will likely score close to the same number as most other Europeans, so that dimension won't highlight much about your ancestry. The following link shows the relative importance of each dimension by displaying the eigenvalue for each PC from PC1 through PC10:

Just for fun, I made a spreadsheet of the population associated with the most "extreme" value for each PC and ran it as a model. It worked pretty well:

http://www.anthrogenica.com/showthre...l=1#post209015

I hope that helps. Happy modeling!

2. Annoyingly, I went through all the populations last night and split them up myself. If only I'd seen this before! I think I understand most of what you're saying.
Although I'm not sure about question 3. When you run a number of populations through and the PC values are shown with the fitted values and the difference, is it showing the average fit and difference across all the matching populations that were run through?

Here are my results for Modern and Ancient

Modern

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Scottish Orcadian Ukrainian_West German Irish
0.003667765 0.003780952 0.004844804 0.005021295 0.005357714
Slovakian Norwegian Czech
0.005441838 0.005735294 0.005867538

[1] "distance%=0.3071 / distance=0.003071"

Robert_Hudson

English_Kent 41.65
Russian_Orel 25.10
Avar 1.35
Scottish 0.95
Spanish_Pais_Vasco 0.50
Khanty 0.30
English_Cornwall 0.25
Kosipe 0.15
Koinanbe 0.10
Slovakian 0.10

According to the link about overfit you sent me, this would be a good distance, as it is between 1 and 0.5%. What I don't understand is why the populations I match most closely in the closest single item distances don't come up in the correlation or percentages.

Ancient

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Unetice_EBA:I0117 Bell_Beaker_Czech:RISE569 Icelandic
0.005820653 0.006230570 0.006257795
Bell_Beaker_Germany:I0112 Nordic_LN:RISE97 Nordic_LN:RISE71
0.006603030 0.007253275 0.007391211
0.007580237 0.007900633

[1] "distance%=0.1409 / distance=0.001409"

Robert_Hudson

Corded_Ware_Germany:I1542 22.95
Bell_Beaker_Germany:I0806 20.35
Unetice_EBA:I0047 17.95
Baalberge_MN:I0559 13.95
Icelandic 13.45
Bell_Beaker_Czech:RISE568 9.45
Hungary_HG:I1507 1.90

Again, distance seems good.

Because of these results, I then tested, for Modern, the populations that were in the closest distance matches along with the other ones which appeared in the percentages but weren't among them.

"1. CLOSEST SINGLE ITEM DISTANCES"
Scottish Orcadian Ukrainian_West German Irish
0.003667765 0.003780952 0.004781435 0.005021295 0.005357714
Slovakian Norwegian Czech
0.005441838 0.005735294 0.005867538

[1] "distance%=0.3142 / distance=0.003142"

Robert_Hudson

English_Kent 39.5
Russian_Orel 24.0
Avar 1.6
Balochi 0.0
English_Cornwall 0.0
Irish 0.0
Norwegian 0.0
Scottish 0.0
Swedish 0.0
Czech 0.0
Circassian 0.0
Ukrainian_East 0.0
Ukrainian_West 0.0
Slovakian 0.0
German 0.0
Iranian_Persian 0.0

What possible conclusions or interpretations can I draw from these results? What might explain the Avar, Russian_Orel, and African components appearing in the percentages when they aren't included in the single item distances?

3. Originally Posted by Hobknobbob12
What possible conclusions or interpretations can I draw from these results? What might explain the Avar, Russian_Orel, and African components appearing in the percentages when they aren't included in the single item distances?

One other comment: it looks like you put Icelandic into your Ancients spreadsheet, but it should be in the Moderns one. I have more I can add on these topics. If you want, later today I can write a longer post with reference to my father who, similar to you, matches a single reference really well. That is less fun with all of the new toys, but I have still learned some additional things about his admixture in spite of it.
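
On the question quoted above, about populations showing up in the percentages without being among the closest single items: a mixture is fitted as a weighted average of reference coordinates, and an average of two fairly distant references can land nearer the target than any single reference does. The sketch below is only an illustration. It is written in Python rather than nMonte's R, the three populations and their three coordinates are made up, and it uses a plain grid search over two-way mixes, which is not how nMonte searches.

```python
import math

def distance(a, b):
    """Euclidean distance between two coordinate lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical reference populations with made-up 3-dimensional coordinates.
refs = {
    "Pop_A": [0.020, 0.030, 0.000],
    "Pop_B": [0.000, 0.010, 0.010],
    "Pop_C": [0.012, 0.026, 0.000],
}
# Hypothetical target, constructed to lie on the line between Pop_A and Pop_B.
target = [0.012, 0.022, 0.004]

# "Closest single item" view: distance from the target to each reference alone.
single = {name: distance(target, xyz) for name, xyz in refs.items()}
closest = min(single, key=single.get)

# Mixture view: try every Pop_A/Pop_B split in 1% steps and keep the best fit.
best_w, best_d = None, float("inf")
for step in range(101):
    w = step / 100
    fitted = [w * a + (1 - w) * b for a, b in zip(refs["Pop_A"], refs["Pop_B"])]
    d = distance(target, fitted)
    if d < best_d:
        best_w, best_d = w, d

print("closest single item:", closest, round(single[closest], 6))
print("best Pop_A share:", best_w, "mixture distance:", round(best_d, 6))
```

Here Pop_C is the closest single item, yet the best mixture uses only Pop_A and Pop_B, neither of which is the nearest population on its own.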

4. Okay, so if I can find populations which sit within this range, would those populations be a better match? Does that not mean that the German (0.005021295), Irish (0.005357714), Slovakian (0.005441838), Norwegian (0.005735294) and Czech (0.005867538) populations would be a better match, because they are within the range of 1.00 to 0.50? Or is this range only suitable when comparing more than one population? If it's just one population, is less distance better? I still don't think I entirely follow what overfit means.

Yes the Icelandic was an accident.

I've started to look at ancient populations. I ran the 3000BC test and it came up as
Yamnaya_Samara:I0429 40.55
LBK_EN:I0056 22.05
Esperstedt_MN:I0172 20.15
Hungary_HG:I1507 12.20
Samara_Eneolithic:I0433 5.05

and I've been looking into each population. So far I've found out that the Samara were a population in the Ukrainian region who left a strong footprint on modern Europeans, that LBK_EN is the Linear Pottery culture in central Europe, and that Esperstedt is in Germany.

If you would like to further elaborate on these topics then yes I'd be very interested.

5. In reply to Hobknobbob12:
For just one population, it isn't bad to be less than 0.50%. Your Scottish reference match is good. Even in a model with two or more populations, a distance below 0.50% is not necessarily bad; it is just a rule of thumb. That is what is meant by saying there is a risk of an overfit: it is a judgement call. Illustrating with my father's results may make it easier. Here are his results with the full moderns spreadsheet:

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Croatian Hungarian French_East English_Cornwall Slovenian English_Kent Montenegrin Orcadian
0.006694774 0.006708204 0.007141428 0.008024338 0.008276473 0.008363612 0.010605659 0.010826357

[1] "2. FULL TABLE nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Dad 0.0184000 0.0249000 0.0024000 -0.02770000 -0.00570000 0.023200 0.01820000 -0.00290000 -0.0053000 0.00070000
fitted 0.0182126 0.0248814 0.0023817 -0.02771505 -0.00559155 0.023233 0.01819385 -0.00284085 -0.0053161 0.00055425
dif -0.0001874 -0.0000186 -0.0000183 -0.00001505 0.00010845 0.000033 -0.00000615 0.00005915 -0.0000161 -0.00014575
[1] "distance%=0.0272 / distance=0.000272"

Spanish_Pais_Vasco 28.10
Swedish 21.45
Norwegian 15.90
Spanish_Castilla_la_Mancha 9.15
Slovakian 8.35
Basque_Spanish 4.60
Avar 2.55
Sardinian 1.45
Abkhasian 1.10
Udmurt 1.10
Icelandic 0.70
Basque_French 0.65
Khanty 0.65
Nenets_Forest 0.65
Eskimo_Chaplin 0.55
English_Kent 0.50
Georgian 0.35
Lezgin 0.20
Pashtun_Afghanistan 0.20
Scottish 0.20
Sindhi 0.20
Balkar 0.15
Polish 0.15
Greek 0.10
Irish 0.10
Ket 0.10
Macedonian 0.10
Nganasan 0.10
Bosnian 0.05
Circassian 0.05
Croatian 0.05
Eskimo_Sireniki 0.05
German 0.05
Montenegrin 0.05
Serbian 0.05
Slovenian 0.05
Tajik_Rushan 0.05
Ukrainian_East 0.05

[1] "3. RESTRICTED nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Dad 0.0184000 0.02490000 0.00240000 -0.0277000 -0.005700 0.0232000 0.01820000 -0.00290000 -0.00530000 0.00070000
fitted 0.0182167 0.02488005 0.00237055 -0.0277056 -0.005573 0.0232209 0.01820385 -0.00280605 -0.00532815 0.00051985
dif -0.0001833 -0.00001995 -0.00002945 -0.0000056 0.000127 0.0000209 0.00000385 0.00009395 -0.00002815 -0.00018015
[1] "distance%=0.0306 / distance=0.000306"

Swedish 28.30
Spanish_Pais_Vasco 23.90
Slovakian 15.65
Udmurt 7.45
Basque_Spanish 7.35
Spanish_Castilla_la_Mancha 6.80
Sardinian 5.35
Avar 2.80
Abkhasian 1.60
Norwegian 0.80

[1] "CORRELATION OF ADMIXTURE POPULATIONS"
Abkhasian Avar Basque_Spanish Norwegian Sardinian Slovakian Spanish_Castilla_la_Mancha Spanish_Pais_Vasco Swedish Udmurt
Abkhasian 1.00 0.87 -0.37 -0.21 -0.40 -0.18 -0.29 -0.36 -0.24 -0.05
Avar 0.87 1.00 -0.11 0.19 -0.41 0.23 -0.08 -0.09 0.17 0.36
Basque_Spanish -0.37 -0.11 1.00 0.91 0.82 0.89 0.99 1.00 0.91 0.51
Norwegian -0.21 0.19 0.91 1.00 0.51 1.00 0.85 0.91 1.00 0.75
Sardinian -0.40 -0.41 0.82 0.51 1.00 0.48 0.88 0.81 0.52 0.02
Slovakian -0.18 0.23 0.89 1.00 0.48 1.00 0.84 0.89 0.99 0.75
Spanish_Castilla_la_Mancha -0.29 -0.08 0.99 0.85 0.88 0.84 1.00 0.99 0.86 0.43
Spanish_Pais_Vasco -0.36 -0.09 1.00 0.91 0.81 0.89 0.99 1.00 0.92 0.52
Swedish -0.24 0.17 0.91 1.00 0.52 0.99 0.86 0.92 1.00 0.75
Udmurt -0.05 0.36 0.51 0.75 0.02 0.75 0.43 0.52 0.75 1.00

If you look at this, there are three single populations at nearly the same distance: Croatian, Hungarian, and French_East. For his ancestry, French_East happens to be the right one. However, the full table nMonte produces an overfit model. The distance is less than 0.03% and every PC dimension's fit is really close, yet there is no way it represents his ancestry, and it is difficult to draw any meaning from the model at all. The restricted nMonte report helps a little, since the smaller percentages are automatically removed and the tool reruns, but it still produces an overfit and unhelpful model with a distance of 0.0306%.

The correlation report can be used to remove all but one of the closely correlated populations, keeping the best one from each correlated set. For starters, if I were just exploring, I would probably pick French_East, since it fits his paper ancestry and is one of the three close single population matches; Swedish (eliminating the closely correlated Norwegian and Slovakian); Spanish_Pais_Vasco (eliminating Spanish_Castilla_la_Mancha); and the three non-correlated populations (Avar, Udmurt, and Abkhasian), as those may be indicating something different about his ancestry. Then I would run with just those to see what shakes out.
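
The pruning by correlation described here can be sketched in a few lines. This is a Python illustration rather than anything nMonte does itself. The correlation values and restricted percentages are copied from the report above, while the 0.95 cut-off and the rule "keep the highest-percentage population of each correlated set" are assumptions made for the example.

```python
# Correlation report copied from the restricted nMonte output above.
REPORT = """
Abkhasian 1.00 0.87 -0.37 -0.21 -0.40 -0.18 -0.29 -0.36 -0.24 -0.05
Avar 0.87 1.00 -0.11 0.19 -0.41 0.23 -0.08 -0.09 0.17 0.36
Basque_Spanish -0.37 -0.11 1.00 0.91 0.82 0.89 0.99 1.00 0.91 0.51
Norwegian -0.21 0.19 0.91 1.00 0.51 1.00 0.85 0.91 1.00 0.75
Sardinian -0.40 -0.41 0.82 0.51 1.00 0.48 0.88 0.81 0.52 0.02
Slovakian -0.18 0.23 0.89 1.00 0.48 1.00 0.84 0.89 0.99 0.75
Spanish_Castilla_la_Mancha -0.29 -0.08 0.99 0.85 0.88 0.84 1.00 0.99 0.86 0.43
Spanish_Pais_Vasco -0.36 -0.09 1.00 0.91 0.81 0.89 0.99 1.00 0.92 0.52
Swedish -0.24 0.17 0.91 1.00 0.52 0.99 0.86 0.92 1.00 0.75
Udmurt -0.05 0.36 0.51 0.75 0.02 0.75 0.43 0.52 0.75 1.00
"""

rows = [line.split() for line in REPORT.strip().splitlines()]
names = [row[0] for row in rows]
# corr[a][b] is the reported correlation between populations a and b.
corr = {row[0]: dict(zip(names, map(float, row[1:]))) for row in rows}

# Percentages from the restricted nMonte run above.
percent = {
    "Swedish": 28.30, "Spanish_Pais_Vasco": 23.90, "Slovakian": 15.65,
    "Udmurt": 7.45, "Basque_Spanish": 7.35, "Spanish_Castilla_la_Mancha": 6.80,
    "Sardinian": 5.35, "Avar": 2.80, "Abkhasian": 1.60, "Norwegian": 0.80,
}

def prune(percent, corr, threshold=0.95):
    """Walk populations from highest to lowest percentage and keep one only
    if it is not correlated at or above `threshold` with any already kept."""
    kept = []
    for name in sorted(percent, key=percent.get, reverse=True):
        if all(corr[name][other] < threshold for other in kept):
            kept.append(name)
    return kept

kept = prune(percent, corr)
print(kept)
```

Under these assumptions the sketch drops Norwegian, Slovakian, Basque_Spanish and Spanish_Castilla_la_Mancha. It also keeps Sardinian, which the hand-picked list does not include, and it knows nothing about paper ancestry, so French_East would still have to be added by hand.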

Your ancients model looks interesting and is using one of the "plans" (I think) that is in one of these threads.

6. Hey, thanks for the response. I understand what you're saying now. I don't get how the correlations work though. If two populations have a strong correlation number (the lower the number), does that mean the populations are similar? That might explain the Avar population coming up, as that population shows strong correlation with all the northern European ones. Is it possible the test is therefore picking up small percentages of these populations because of the similarities they share, since weren't northern Europeans from the Caucasus region to begin with? Yes, that is from one of the first comments on here, where someone asked for ancient samples and was provided with a population set from 3000BC. Would this diagram be a good place to start from in figuring out the migration path? Also, what are the Villabruna-related and Basal-rich components?

edit: gif isn't working here is link https://en.wikipedia.org/wiki/File:IE-migrations.gif

7. Originally Posted by Hobknobbob12
Hey, thanks for the response. I understand what you're saying now. I don't get how the correlations work though. If two populations have a strong correlation number (the lower the number), does that mean the populations are similar? That might explain the Avar population coming up, as that population shows strong correlation with all the northern European ones. Is it possible the test is therefore picking up small percentages of these populations because of the similarities they share, since weren't northern Europeans from the Caucasus region to begin with? Yes, that is from one of the first comments on here, where someone asked for ancient samples and was provided with a population set from 5000BC. I think the best way of finding out distant ancestry would be to start from, say, 5000BC, then look at the key populations which determined specific migratory paths and compare your data with them; hopefully that will bring me up to the modern nations.
Correlated populations have higher numbers, not lower. For example, Swedish and Norwegian are 1.00 with each other: very similar populations. The Avar is coming up because you may have slightly more Caucasus-type admixture than the reference Scottish person. It probably isn't indicating much more than that, since it is a low percentage. nMonte is just trying to make the fit with the PCA dimensions a little bit more perfect, and it may just be normal variation within Scottish people. So you may want to disregard it, or you may want to explore it. One thing I have learned is that this could be mediated through a closer population that just has a little more Caucasus than the Scottish reference. You can play around with your model to see if that is what is happening. It may mean including a correlated population that is a little more Caucasus-shifted. You will have to decide if it does more harm than good to your model. For my father, I used some outgroups with French_East and German to produce the following model:

[1] "3. RESTRICTED nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Dad 0.01840000 0.0249000 0.0024000 -0.0277000 -0.00570000 0.02320000 0.01820000 -0.0029000 -0.005300 0.0007000
fitted 0.01785885 0.0243098 0.0015388 -0.0278445 -0.00526435 0.02298545 0.01837475 -0.0024299 -0.003201 -0.0016656
dif -0.00054115 -0.0005902 -0.0008612 -0.0001445 0.00043565 -0.00021455 0.00017475 0.0004701 0.002099 -0.0023656
[1] "distance%=0.3449 / distance=0.003449"

"French_East" 65.60
"German" 31.15
"Nganasan" 2.55
"Andamanese_Onge" 0.70

This shows he really is southwest German-like (where all his recent ancestors came from), but that he has slightly elevated East and extreme Southeast Asian ancestry to resolve (Nganasan and Andamanese_Onge). Also, the early principal components that are most important (1, 2, and 3) have more "distance" than many of the later ones. I would rather have the model be a little tighter on the first three, with some looseness later in the model. A test Kurd is running indicated that the Siberian may be mediated through Finno-Ugric populations and that he had some SW Asian input that may be important. Based on those hints, I added just a few populations and got the following model:

[1] "3. RESTRICTED nMONTE"
[1] "Ncycles= 1000"
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Dad 0.0184000 0.0249000 0.0024000 -0.02770000 -0.00570000 0.02320000 0.01820000 -0.00290000 -0.0053000 0.0007000
fitted 0.0183859 0.0248644 0.0025644 -0.02759535 -0.00469625 0.02208805 0.01844655 -0.00274105 -0.0049156 -0.0005567
dif -0.0000141 -0.0000356 0.0001644 0.00010465 0.00100375 -0.00111195 0.00024655 0.00015895 0.0003844 -0.0012567
[1] "distance%=0.2024 / distance=0.002024"

Swedish 43.50
Italian_Bergamo 31.95
French_East 17.90
Saami 6.65

[1] "CORRELATION OF ADMIXTURE POPULATIONS"
French_East Swedish Saami Italian_Bergamo
French_East 1.00 0.97 0.73 0.95
Swedish 0.97 1.00 0.83 0.86
Saami 0.73 0.83 1.00 0.55
Italian_Bergamo 0.95 0.86 0.55 1.00

The French_East, Swedish, and Italian_Bergamo are somewhat correlated, but the mix makes the early PCs nice, so they do bring something a little different to the model in spite of their closeness. I am taking this as more of an indication of the "flavor" of his ancestry in terms of some of the neighbors. I am not sure what the Saami says, except that they are supplying some of the excess Siberian/East Asian to the model.
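
To put numbers on "the mix makes the early PCs nice", the "dif" rows of the two restricted reports in this post can be compared directly. The Python sketch below is an illustration only. It assumes the reported distance is the Euclidean length of the "dif" row, which does reproduce both reported values, and the "share of PC1 to PC3" measure is an ad hoc choice for this example, not something nMonte prints.

```python
import math

# "dif" rows (PC1..PC10) copied from the two restricted nMonte reports above.
dif_outgroup_model = [-0.00054115, -0.0005902, -0.0008612, -0.0001445, 0.00043565,
                      -0.00021455, 0.00017475, 0.0004701, 0.002099, -0.0023656]
dif_neighbor_model = [-0.0000141, -0.0000356, 0.0001644, 0.00010465, 0.00100375,
                      -0.00111195, 0.00024655, 0.00015895, 0.0003844, -0.0012567]

def distance(dif):
    """Euclidean length of the per-PC differences."""
    return math.sqrt(sum(d * d for d in dif))

def early_share(dif, n_early=3):
    """Fraction of the squared distance that comes from the first n_early PCs."""
    total = sum(d * d for d in dif)
    return sum(d * d for d in dif[:n_early]) / total

for label, dif in (("French_East/German + outgroups", dif_outgroup_model),
                   ("Swedish/Italian_Bergamo/French_East/Saami", dif_neighbor_model)):
    print(f"{label}: distance%={distance(dif) * 100:.4f}, "
          f"PC1-3 share={early_share(dif):.1%}")
```

The first model carries roughly 12% of its squared distance in PC1 to PC3, while the second carries under 1%, so the second model's remaining misfit sits almost entirely in the later, less important dimensions.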

8. Okay cool, I think I'm getting more to grips with it now, thanks. I think I'm going to take it out of my model for now and focus on what is most prevalent. Despite potential overfit I've been looking at populations that made up my results I posted and they seem logical in terms of the migration paths that were taken into the British isles.

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Unetice_EBA:I0117 Bell_Beaker_Czech:RISE569
0.005820653 0.006230570
Bell_Beaker_Germany:I0112 Nordic_LN:RISE97 Nordic_LN:RISE71
0.006603030 0.007253275 0.007391211
0.007580237 0.007900633

Corded_Ware_Germany:I1542 24.15
Bell_Beaker_Germany:I0806 22.60
Unetice_EBA:I0047 20.35
Baalberge_MN:I0559 16.15
Bell_Beaker_Czech:RISE568 11.70
Hungary_HG:I1507 2.80
Srubnaya:I0431 2.25

Yamnaya_Samara:I0429 40.55
LBK_EN:I0056 22.05
Esperstedt_MN:I0172 20.15
Hungary_HG:I1507 12.20
Samara_Eneolithic:I0433 5.05

Srubnaya (12th century BC, on the Caspian steppe) developed into the Samara; Hungarian HG after? This led to the Funnelbeaker culture in Germany, 4300-2800 BC, which my results reflect (Baalberge/Germany); then the Corded Ware culture, 2900-2350 BC, which was also in Germany; the Beaker culture, 2900-1800 BC; and the Unetice culture, 2300-1600 BC. If I include populations from the single item distances, the next one is Late Bronze Age Halberstadt, 1600-1200 BC, which again is in Germany. I thought Nordic_LN:RISE97 and Nordic_LN:RISE71 (which are from Sweden and Denmark) in the single item distances might indicate the next path taken, because I consistently show Orcadian in my results and, according to Morley's subclade predictor, I'm R1a1a1b1a3, which has been labeled as Viking, and the Vikings went to Orkney. However, those populations are Late Neolithic, and my ancestors couldn't have been in two places at the same time. I do have a close match to Battle Axe Sweden though (looking into it). So I can't really be sure what path was taken after Germany. It would be interesting to know if there are any populations to compare from the Hallstatt and La Tène cultures.

[1] "1. CLOSEST SINGLE ITEM DISTANCES"
Scottish Orcadian Ukrainian_West German Irish
0.003667765 0.003780952 0.004844804 0.005021295 0.005357714
Slovakian Norwegian Czech
0.005441838 0.005735294 0.005867538

My modern results would also further support this, as the Únětice culture "is known from Czech Republic and Slovakia from about 1,400 sites, from Poland (550 sites) and Germany (about 500 sites and loose finds locations). The Únětice culture is also known from north-eastern Austria (in association with the so-called Böheimkirchen Group), and from western Ukraine".

9. If someone could help me I would really appreciate it. I downloaded the R software, made a Basal-rich K7 target file and a Global 10 target file, and don't understand what to do with them.
It's a little (lot) over my head from there.

Ancient_North_Eurasian 20.21
Basal-rich 24.91
East_Eurasian 2.15
Oceanian 0
Southeast_Asian 0.01
Sub-Saharan 0
Villabruna-related 52.72

Global 10

,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10
Dani_Rudd,0.0186,0.0223,0.0035,-0.0344,-0.0071,0.0172,0.0226,-0.0036,-0.005,-0.0028

Global 10 - model 1

Yamnaya_Samara 41.8
LBK_EN 36.0
Western_HG 18.4
Pima 3.8

Thanks so much

10. In reply to Danilynn:
Hello Danilynn,

Here are your results, based on the model by our friend randwulf.

Command

Code:
`R -e 'source("nMonte2.R"); getMonte("input/Model_by_randwulf_based_Mathieson.txt","target/Dani_Rudd.txt");'`
Result

Code:
```
[1] "distance%=0.2451 / distance=0.002451"

Dani_Rudd

Swedish                    51.6
German                     39.6
Karitiana                   4.1
Sephardi_Jew                3.8
Austroasiatic_Kharia        0.7
Yoruba                      0.2
Ami                         0.0
Biaka                       0.0
Bougainville                0.0
Chukchi                     0.0
Eskimo_Sireniki             0.0
Han                         0.0
Andamanese_Onge             0.0
Papuan                      0.0
She                         0.0
Ulchi                       0.0
Bantu_SA_Tswana             0.0
Basque_Spanish              0.0
Spanish_Andalucia           0.0
Spanish_Aragon              0.0
Spanish_Baleares            0.0
Spanish_Cantabria           0.0
Spanish_Castilla_la_Mancha  0.0
Spanish_Castilla_y_Leon     0.0
Spanish_Cataluna            0.0
Spanish_Galicia             0.0
Spanish_Murcia              0.0
Spanish_Pais_Vasco          0.0
Spanish_Valencia            0.0
Italian_Bergamo             0.0
Italian_CentralSicilian     0.0
Italian_EastSicilian        0.0
Italian_South               0.0
Italian_Tuscan              0.0
Italian_WestSicilian        0.0
Ashkenazi_Jew               0.0
Jordanian                   0.0
Basque_French               0.0
French_East                 0.0
French_South                0.0
Portuguese                  0.0
English_Kent                0.0
```
Regards
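
For anyone preparing their own target file for the command above, here is a small Python sketch that writes one in the same layout as the Global 10 lines Danilynn posted (a header row starting with a comma, then one row with the name and ten PC values). That nMonte accepts exactly this layout is an assumption based on those posted lines, and `format_target` is a hypothetical helper name, not part of nMonte.

```python
def format_target(name, pcs):
    """Return the two-line CSV text for one target: header row, then data row."""
    header = "," + ",".join(f"PC{i}" for i in range(1, len(pcs) + 1))
    row = name + "," + ",".join(str(value) for value in pcs)
    return header + "\n" + row + "\n"

# Global 10 coordinates as posted in this thread.
dani = [0.0186, 0.0223, 0.0035, -0.0344, -0.0071,
        0.0172, 0.0226, -0.0036, -0.005, -0.0028]

text = format_target("Dani_Rudd", dani)
print(text, end="")

# To save it under the path used in the command above, uncomment:
# with open("target/Dani_Rudd.txt", "w") as handle:
#     handle.write(text)
```

The saved file is then passed as the second argument to `getMonte`, as in the command shown in this post.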

11. ## The Following User Says Thank You to Ravai For This Useful Post:

randwulf (04-30-2017)
