OptimaFit Adventures and Experimentation

BalkanKiwi
06-26-2020, 01:11 AM
Acknowledging how many credits OptimaFit uses, I've been selective about which populations I put together and about the hypothesis and logic for each run (to the best of my knowledge anyway). To begin with, when modelling North African populations, Egyptian provides the highest percentage. However, I wanted to see how that would compare when more North African populations are added, and whether it's still the best fit when each model is run. I have no doubt I have some minor North African, which seems logical based on my ancestry. For what MyHeritage is worth, I get 1.5% North African and my sister gets 2.9%. She also scores some North African on myOrigins. We also get varying amounts across certain calculators. Ideally, I would also do an OptimaFit for my grandfather (the likely source of the North African); however, I'm patiently counting down until the Eurogenes store reopens so I can get his coordinates. In the meantime, it may be difficult to draw conclusions based on mine alone.

For the first experiment, I added a number of North African populations, listed in the image below. You can add up to 20 populations on GenoPlot when conducting an OptimaFit; however, I thought I'd stick to a smaller number, specifically the ones I have previously modeled and know I score some of.

https://i.imgur.com/hNjCS6L.jpg

Probably not surprisingly, Egyptian comes out on top. I plan on doing this for my grandfather to see if Egyptian is also his best fit. Depending on this, I may also do the same for my mother and sister. Can it be said Egyptian is the possible source of the North African? It's possible, but I need to validate this with more family members (as much as you can validate ancestry from 900+ years ago without a paper trail).

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.2521
Results:
Irish 84.6%
Croatian 12%
Egyptian 3.4%

For my next experiment, I wanted to look at Levantine. This post (https://anthrogenica.com/showthread.php?20697-Playing-with-Iranian-populations-using-NMonte-Runner&p=676989&viewfull=1#post676989) shows that Lebanese_Christian is a good fit when using those particular populations; however, I wanted to take this a step further and include Levantine populations only. I also added Egyptian, as I was curious to see how strong it is when compared to Levantine populations (I wasn't expecting it to be stronger, but what's the harm in trying with such a great tool?).

Once again I included populations I know I score small amounts of when modelling:

https://i.imgur.com/IY8ERRe.jpg

What surprised me is that Samaritan is the best fit. Admittedly I've never modeled Samaritan before, and I was expecting Lebanese_Christian to still be the best fit. I've done some brief research and came across this 23andMe article (https://blog.23andme.com/ancestry-reports/more-than-just-a-parable-the-genetic-history-of-the-samaritans/). This is from 2008, so I'm not sure if the consensus has changed since then with new research. If every Ashkenazi Jew were to run this exact Levantine setup, would Samaritan be the best fit for most people? I'm not sure, possibly not if they are a unique population, so hopefully someone more knowledgeable in Jewish genetics can answer. Perhaps I could improve upon the populations I've selected. I'll be curious to see if my grandfather also gets the same result. I suppose Samaritan could also just be a "fill in" for something else; however, Samaritan seems distinct enough that this may not be the case. The purpose of OptimaFit is, in a sense, to avoid using proxies for something else: if Lebanese_Christian were the better fit overall, it wouldn't pick Samaritan to represent it, since it has the choice of both. I've yet to play around with unscaled data and/or turning the penalty off to see what happens.

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.1805
Results:
Irish 84.8%
Croatian 10.2%
Samaritan 5%

BalkanKiwi
06-27-2020, 12:50 AM
I decided to use OptimaFit to play around with my minor Polynesian/SE Asian ancestry. I put in a number of populations from the region, including Korean and Japanese, to see if they had any influence on what I expected the result to be. I was expecting Nasoi to be the best fit, or at the very least a Papuan-like component, which is the case. Interestingly, on a chromosome painting I have really no SE Asian segments, unlike my grandmother, who has a very large stretch on chromosome 21. Basically all of my Polynesian segments are small, scattered Oceanic segments, which is likely why Papuan populations are my best fit. I don't seem to have inherited much SE Asian, unlike my grandmother, who has more SE Asian than Oceanian. I plan to also model her to test this hypothesis.

https://i.imgur.com/sQNPWrJ.jpg

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.1104
Results:
Irish 81.6%
Croatian 11%
Ashkenazi Poland 6.6%
Kosipe 0.8%

BalkanKiwi
06-27-2020, 05:54 AM
Some of you may have seen my thread (https://anthrogenica.com/showthread.php?20374-Ashkenazi-and-Sub-Saharan-ancestry-tracing-using-segment-matches-and-G25-modelling) over in the Jewish section regarding East African ancestry and Ashkenazi. I'm hoping my grandfather's coordinates provide some insight into this, as it's somewhat interesting to play around with, acknowledging it could very well be noise. I was curious to see if an OptimaFit would show anything that a normal NMonte run wouldn't.

Firstly I created a model that included Eritrean or one of the few Ethiopian populations I match, such as Amhara. Shown below is a model I put together with Eritrean. It seems it could be mediated through Sicilian East, as when I remove Eritrean the Sicilian East percentage increases.

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.2324
Results:
Irish 72.5%
Croatian 14.5%
English Cornwall 9.5%
Sicilian East 3%
Eritrean 0.5%

When I run an OptimaFit (below) with Eritrean and a number of Ethiopian populations, the East African doesn't fit into any model. Whether this is due to it being noise, I'm not sure. My grandfather's results might help with this, but it might highlight the usefulness of OptimaFit for ruling out potential noise. Just because a population has a percentage in a normal NMonte model doesn't mean it's still going to appear in an OptimaFit run. Mind you, it's important to keep in mind that we are playing with very minor amounts of ancestry.

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.2402
Results:
Irish 82%
Croatian 12.8%
Sicilian East 5.2%

firemonkey
06-27-2020, 12:59 PM
I was unable to run OptimaFit; I got an error message.

firemonkey
06-27-2020, 01:45 PM
Sample: firemonkey ► firemonkey dad
Fit: 1.3331
Results:
Welsh 46.6%
Scottish 41.8%
Irish 6%
Swedish 5.6%

Sample: firemonkey ► firemonkey
Fit: 0.786
Results:
Scottish 52.2%
Irish 43%
Swedish 4.8%

GenoPlot
06-27-2020, 02:12 PM
Our primary goal with optimaFit was to provide users with a solid base set of source populations to build upon. In its current iteration it's configured to optimize down to the set of 4 (or fewer) sources that provides the best fit. We may make that a configurable option in the future to allow users to pick the optimal number of sources.

It's important to note that increasing the optimal number has an exponential impact on the number of possible combinations. As an example, going to an optimaFit of 6 versus 4 for 20 sources increases the number of possible combinations from 4,845 to 38,760.
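The counts quoted above can be checked with a few lines of Python. This is only a sanity-check sketch: `models_up_to` is a hypothetical helper name, and reading "4 (or lower)" as "every subset of size 1 through 4 gets evaluated" is an assumption, not something confirmed about how OptimaFit works internally.

```python
from math import comb

N_SOURCES = 20  # the GenoPlot limit mentioned earlier in the thread


def models_up_to(n_sources: int, cap: int) -> int:
    """Number of source subsets of size 1..cap drawn from n_sources (assumed search space)."""
    return sum(comb(n_sources, k) for k in range(1, cap + 1))


for cap in (4, 6):
    print(
        f"cap={cap}: subsets of exactly {cap} sources = {comb(N_SOURCES, cap):,}; "
        f"subsets of up to {cap} sources = {models_up_to(N_SOURCES, cap):,}"
    )
```

With 20 sources, the "exactly 4" and "exactly 6" counts are the 4,845 and 38,760 given in the post, an eightfold increase for two extra sources.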

All of these models run in parallel, so run times would not be affected. However, they do use extra compute capacity, so they would use quite a few more compute credits.

Setting the value to 4 gives us a nice set of base populations with little to no possibility of over-fitting.

BalkanKiwi
06-27-2020, 09:14 PM
Sample: firemonkey ► firemonkey dad
Fit: 1.3331
Results:
Welsh 46.6%
Scottish 41.8%
Irish 6%
Swedish 5.6%

Sample: firemonkey ► firemonkey
Fit: 0.786
Results:
Scottish 52.2%
Irish 43%
Swedish 4.8%

Which other Nordic populations did you include in the model?

BalkanKiwi
06-27-2020, 11:49 PM
From what I've seen so far, I figured 4 populations was the maximum. So essentially, to use Eritrean as an example (though it's probably the same for other smaller amounts): even if it appears in a normal NMonte run and then doesn't appear in an OptimaFit, that doesn't by itself confirm it's noise (although it could very well be), but rather that the 2-3 bigger populations fit better in combination when the lesser population is excluded?

BalkanKiwi
06-28-2020, 04:11 AM
I decided to play around with ancient samples and modelling, which I haven't done much of before. Firstly I wanted to use OptimaFit to see how the ancient Vanuatu and Tongan populations fit into an ancient model. There's a good chance there are better European populations I could use, but I need to experiment more. In any case, below are the populations I used. I should note I used default penalty with unscaled coordinates.

https://i.imgur.com/2YhYKPQ.jpg

I'm not surprised by the results below. There is a clear lack of ancient Oceanian samples and therefore unavoidable bias, as Vanuatu samples outnumber the single Tongan sample. Even with more Tongan samples, I'd still expect a Vanuatu sample to be selected. The 2300BP sample is mostly of Papuan ancestry according to this study (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5882562/) by Lipson et al. As most of my minor Polynesian seems to be Papuan-like, as shown in the posts above, it seems logical that an ancient sample of mostly Papuan ancestry would be selected as the best fit. Hopefully more ancient DNA from this region will be found (and be usable) in future. This OptimaFit modelling could also be expanded to include ancient samples from SE Asia, which I plan to do in the near future. Because my Polynesian is minor it may not make a big difference, so I'll be curious to see if my grandmother throws up something different.

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 1.2348
Results:
England MBA 87.6%
BKG N 10.6%
2300BP all (Vanuatu) 1.8%

GenoPlot
06-28-2020, 05:30 PM
From what I've seen so far, I figured 4 populations was the maximum. So essentially, to use Eritrean as an example (though it's probably the same for other smaller amounts): even if it appears in a normal NMonte run and then doesn't appear in an OptimaFit, that doesn't by itself confirm it's noise (although it could very well be), but rather that the 2-3 bigger populations fit better in combination when the lesser population is excluded?
Yes, that's correct. For a given sample whose ancestry is sufficiently differentiated and comes from more than n sources - in this case n being 4 - the major sources of the ancestry will determine the optimal fits. That's why it's important to run these models against parents and siblings, as the combined results will give a much clearer picture of the ancestral sources.
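To make the effect of the cap concrete, here is a toy sketch in Python. It is not GenoPlot's implementation: the exhaustive grid search, the function names (`compositions`, `best_mix`, `optima_fit`), the 5% weight step, the absence of any penalty term, and the two-dimensional coordinates for the made-up sources A to D are all assumptions chosen to keep the example small. The target is built as 70% A + 25% B + 5% C, so C is a genuine but minor source.

```python
from itertools import combinations
from math import dist


def compositions(total, parts):
    """Yield every way to split `total` units into `parts` positive integers."""
    if parts == 1:
        yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def best_mix(target, sources, names, units=20):
    """Grid-search weights (steps of 1/units) for one fixed set of sources."""
    best = (float("inf"), None)
    for split in compositions(units, len(names)):
        weights = [u / units for u in split]
        mix = [sum(w * sources[n][d] for w, n in zip(weights, names))
               for d in range(len(target))]
        fit = dist(mix, target)  # Euclidean distance, lower is better
        if fit < best[0]:
            best = (fit, dict(zip(names, weights)))
    return best


def optima_fit(target, sources, max_sources=4, units=20):
    """Try every subset of 1..max_sources sources and keep the best fit."""
    best = (float("inf"), None)
    for k in range(1, max_sources + 1):
        for names in combinations(sorted(sources), k):
            candidate = best_mix(target, sources, names, units)
            # Require a real improvement, so smaller subsets win ties.
            if candidate[0] < best[0] - 1e-12:
                best = candidate
    return best


# Hypothetical toy coordinates, not real population data.
sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0], "D": [5.0, 5.0]}
target = [0.25, 0.05]  # 0.70*A + 0.25*B + 0.05*C

capped_fit, capped_weights = optima_fit(target, sources, max_sources=2)
full_fit, full_weights = optima_fit(target, sources, max_sources=3)
print("cap 2:", round(capped_fit, 4), capped_weights)
print("cap 3:", round(full_fit, 4), full_weights)
```

With a cap of 2 the search settles on A and B and drops C, even though C really is part of the target; raising the cap to 3 brings C back. In this toy setup, a minor source disappearing from a capped run says more about the cap than about whether the source is noise.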

When it comes to minor ancestry, it can be helpful to view the optimaFit results in tandem with the accompanying oracles. The oracles, while a bit of a blunt instrument in comparison, will often point to alternative sources of minor ancestry with decreasing probabilities. For example, in your case:

Oracle | Distance
86.6% Irish + 13.4% Ashkenazi Poland | 1.1651
92.6% Irish + 7.4% Samaritan | 1.201
92% Irish + 8% Lebanese Christian | 1.2123
92.8% Irish + 7.2% Syrian | 1.2292
92.4% Irish + 7.6% Lebanese Druze | 1.2297
92.2% Irish + 7.8% Druze | 1.2336
74.8% Irish + 25.2% Croatian | 1.2857
91.6% Irish + 8.4% Syrian Jew | 1.301
87.8% English + 12.2% Ashkenazi Poland | 1.3551
94% Irish + 6% Iranian Fars | 1.3666

Note: the oracle distances are only directly comparable to optimaFit runs with the default penalty "off".
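For readers wondering how a 2-way oracle list like the one above might be produced, here is a minimal Python sketch. It assumes the oracle simply finds, for each pair of sources, the mixing weight that minimises the plain Euclidean distance to the target (no penalty) and then sorts the pairs by that distance. That is an assumption about the general technique, not a description of GenoPlot's code, and the name `two_way_oracle` and the toy coordinates are made up.

```python
from itertools import combinations
from math import dist


def two_way_oracle(target, sources):
    """Return (distance, name_a, weight_a, name_b, weight_b) rows, best first."""
    rows = []
    for a, b in combinations(sorted(sources), 2):
        pa, pb = sources[a], sources[b]
        diff = [x - y for x, y in zip(pa, pb)]
        denom = sum(d * d for d in diff)
        if denom == 0:
            w = 1.0  # identical sources: any weight gives the same mix
        else:
            # Closed-form least-squares weight for source a, clipped to [0, 1].
            w = sum((t - y) * d for t, y, d in zip(target, pb, diff)) / denom
            w = min(1.0, max(0.0, w))
        mix = [w * x + (1 - w) * y for x, y in zip(pa, pb)]
        rows.append((dist(mix, target), a, w, b, 1 - w))
    return sorted(rows)


# Hypothetical toy coordinates, not real population data.
sources = {"A": [0.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}
target = [0.25, 0.05]

rows = two_way_oracle(target, sources)
for distance, a, wa, b, wb in rows:
    print(f"{wa:.1%} {a} + {wb:.1%} {b} | {distance:.4f}")
```

As in the table above, several pairs can land within a few hundredths of each other, which is why small differences in distance between similar sources should be read cautiously.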

BalkanKiwi
06-28-2020, 08:32 PM
Thanks! That's interesting. I'll take note of this from now on. That 2-way oracle seems to align with the OptimaFit results, i.e. Samaritan has a slightly better fit than Lebanese Christian, but the difference is negligible. I don't know how much genetic difference there is between Samaritans and Lebanese Christians for one to fit better than the other. Perhaps in the overall context of Ashkenazi and Levantine, they are all the same. The only way to know for sure would be to have more Ashkenazi use this tool (along with family).

BalkanKiwi
07-03-2020, 10:39 AM
I've gone and done some quick modelling for my grandfather to see how North African fits. Using my North African results as a template, I constructed an OptimaFit model with similar populations:

https://i.imgur.com/ZolcmyV.jpg

Scaled OptimaFit with penalty

Sample: Balkan Kiwi ► BKGrandfather
Fit: 1.536
Results:
Scottish 97.4%
Karaite Egypt 2.6%

Unscaled OptimaFit with penalty

Sample: Balkan Kiwi ► BKGrandfather
Fit: 0.7679
Results:
Irish 35%
English 31.5%
Scottish 31.5%
Egyptian 1.5%
Mozabite 0.5%

Turning the penalty off certainly increases the numbers, but these outputs are useful enough without worrying too much about percentages. I'm more interested in the patterns of populations when comparing family. I used the information from the OptimaFit output to construct some quick models.

Scaled with penalty

Sample: Balkan Kiwi ► BKGrandfather
Fit: 1.2531
Results:
English 36.5%
Irish 33.5%
Scottish 27.5%
Karaite Egypt 2.5%

Sample: Balkan Kiwi ► BKGrandfather
Fit: 1.2749
Results:
English 36%
Irish 35%
Scottish 27%
Egyptian 2%

Unscaled with penalty

Sample: Balkan Kiwi ► BKGrandfather
Fit: 0.7391
Results:
Irish 38%
English 30.5%
Scottish 28.5%
Egyptian 3%

Sample: Balkan Kiwi ► BKGrandfather
Fit: 0.7904
Results:
Irish 37%
English 32%
Scottish 30%
Karaite Egypt 1%

The takeaway from this is that Egypt is the best-fitting North African population and stays consistent across the generations. I expect my mother to score somewhere in the middle. Are these results suggestive of Karaite Jews entering a line at some stage? I'm not sure; it's probably unlikely and almost impossible to prove. I would be more comfortable saying Egypt is a likely source of the North African.
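One simple way to compare patterns across family members, rather than percentages, is to tally which minor populations keep turning up. The sketch below does that in Python for three of the OptimaFit outputs posted in this thread; the 10% cutoff for "minor" and the names `runs`, `MINOR_CUTOFF` and `minor_counts` are arbitrary choices for illustration.

```python
from collections import Counter

# OptimaFit outputs quoted earlier in this thread (percentages).
runs = {
    "BalkanKiwi, scaled, penalty": {
        "Irish": 84.6, "Croatian": 12.0, "Egyptian": 3.4,
    },
    "BKGrandfather, scaled, penalty": {
        "Scottish": 97.4, "Karaite Egypt": 2.6,
    },
    "BKGrandfather, unscaled, penalty": {
        "Irish": 35.0, "English": 31.5, "Scottish": 31.5,
        "Egyptian": 1.5, "Mozabite": 0.5,
    },
}

MINOR_CUTOFF = 10.0  # arbitrary threshold for "minor", chosen for illustration

# Count how many runs each minor population appears in.
minor_counts = Counter(
    pop
    for result in runs.values()
    for pop, pct in result.items()
    if pct < MINOR_CUTOFF
)

for pop, count in minor_counts.most_common():
    print(f"{pop}: appears as a minor source in {count} of {len(runs)} runs")
```

Adding a parent's or sibling's run is just another entry in `runs`, which keeps the comparison focused on recurring populations instead of exact percentages.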

passenger
07-03-2020, 02:45 PM
I think most Western Jews have some sort of Egyptian-like component that not surprisingly shows up as Karaite in some models/calculators. Ancient NE African ancestry (in addition to Berber, which I think is partly separate - maybe from Cyrenaica?) is part of the standard Ashkenazi and Sephardic mix.

BalkanKiwi
07-04-2020, 01:01 AM
This makes sense. Thanks for the input. At the very least, this type of modelling probably helps to validate some of the mixing that occurred.

With regard to my grandmother, I went and ran an OptimaFit to focus on her Polynesian. I want to start by saying the Polynesian in my family is quite odd. My grandmother is the only one I've tested who matches other Polynesians (my mother, my sister and I don't). My grandmother has a long stretch of SE Asian on chromosome 21 that houses most, if not all, of her Polynesian matches. No one else in the family inherited this segment. I was expecting her to get a few more % on SE Asian/Oceanian populations, which would make sense, as she is two generations closer to our Māori ancestor (my grandmother's 3rd great grandmother).

This is the OptimaFit model I created, similar to mine:

https://i.imgur.com/wshtaUm.jpg

I also played around with the data types and penalty applied for the OptimaFit:

Scaled with penalty

Sample: Balkan Kiwi ► BKGrandmother
Fit: 1.4767
Results:
Scottish 99.6%
Kosipe 0.4%

Unscaled with penalty

Sample: Balkan Kiwi ► BKGrandmother
Fit: 0.9868
Results:
Scottish 99%
Atayal 0.6%
Koinanbe 0.4%

Unscaled with no penalty

Sample: Balkan Kiwi ► BKGrandmother
Fit: 0.9761
Results:
Scottish 98.6%
Atayal 0.8%
Papuan 0.6%

What surprises me is the percentages. Even two generations back, they are no larger than mine. The only difference is that she picks up a SE Asian population in combination with a Papuan-like one, which is logical.

I decided to run some models. What I think is the most accurate in this case is unscaled with penalty off:

Sample: Balkan Kiwi ► BKGrandmother
Fit: 0.6708
Results:
Irish 60%
English 25%
Scottish 13%
Atayal 1%
Kosipe 1%

As a comparison, if I run unscaled with no penalty for myself:

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 0.5424
Results:
Irish 66%
Croatian 23.5%
English 8%
Kosipe 1.5%
Ami 1%

Unscaled with penalty

Sample: Balkan Kiwi ► BalkanKiwi
Fit: 0.6483
Results:
Irish 50%
English 34.5%
Croatian 13%
Kosipe 1.5%
Ami 1%

It seems that overall my grandmother, in terms of these calculators anyway, has no more Oceanian/SE Asian than me. However, she has one long segment, while the rest of the family has smaller, broken-up segments that aren't big enough to get matches but can still be recognized in these types of tests. When I previously looked over our chromosomes, I seemed to have more Oceanian segments than her (albeit small ones), which might be reflected in the fact that I score slightly more Papuan than her.

BalkanKiwi
07-04-2020, 05:22 AM
I revisited the Levantine experiment from my first post and wanted to see how my grandfather compares. I created a larger OptimaFit model than the one I used for myself:

https://i.imgur.com/gbHbrK8.jpg

The result using scaled with penalty:

Sample: Balkan Kiwi ► BKGrandfather
Fit: 1.4961
Results:
Scottish 97.2%
Samaritan 2.8%

It's nice to validate findings and see consistency across generations using this tool.