View Full Version : Outliers and "Noise"



Ski
10-11-2016, 04:58 AM
Hello all,

When I got my results back from 23andme, I received a composition that includes a lot of strange results, even though I have known ancestry going back four or five generations in every direction. I received a result of nearly 23% Southern European, with the majority being Balkan, and with Italian, Iberian, and Sardinian thrown into the mix at very small percentages. I was also shown 0.1% Middle Eastern and North African, and <0.1% East Asian. I know the algorithms at the company have difficulty differentiating between basic European populations like French and German. What percentage qualifies as "noise", and how likely is it that some populations are misread as others?

wombatofthenorth
10-12-2016, 02:01 AM
It could be noise, or it could be real.
0.1% or less would likely come from farther back, even much farther back, than 4 to 5 generations.
Also, some populations carry ancient, population-wide average mix-ins, in which case the percentage would not come from one or two relatively recent ancestors but could come from a random slew of people from all sorts of different times way, way back.

For instance, many people from far Eastern Europe get Yakut/Broadly Asian etc. in small amounts, but it's almost always just a very old population-wide signal and has nothing to do with a single relative 4-8 generations back or anything like that.
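As a rough back-of-the-envelope check on the "farther back than 4 to 5 gens" point: on average, a single ancestor g generations back contributes about 1/2^g of your autosomal DNA. The sketch below just does that arithmetic. It is an illustration only, not anything 23andMe publishes, and real inheritance varies around these averages because of recombination. The function names are made up for this example.

```python
import math

def expected_share(generations_back: int) -> float:
    """Average autosomal fraction inherited from ONE ancestor that many generations back."""
    return 0.5 ** generations_back

def generations_for_share(share: float) -> float:
    """Generations back at which a single ancestor contributes `share` on average."""
    return math.log(share, 0.5)

# Average contribution of one ancestor at a few depths.
for g in (4, 5, 8, 10):
    print(g, f"{expected_share(g):.3%}")

# How far back would a single ancestor be to average 0.1%?
print(round(generations_for_share(0.001), 1))
```

So a lone 0.1% segment, if it is real, is on average what you would expect from one ancestor roughly ten generations back, well beyond a 4 to 5 generation paper trail.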

AnnieD
10-12-2016, 04:25 AM
I'm not certain there is a full consensus on the "noise" threshold at 23andMe yet, but the figure I've seen cited most often is < 1%. OTOH, I've seen staunch defenders, including a PhD who used to run a genetics co., of the view that anything < 2% without a paper trail is suspect. :argue:

My biggest trace % at 23andMe also shows up as something other than typical NW Euro (I think that I'm mostly British Diaspora) in the chromosome painter at Gedmatch and in other DNA tests such as Genes for Good. Perhaps your best bet for determining whether it is "real" is to compare the chromosome painters of your favorite Gedmatch calculators and admixture Oracles against those of people with similar reported ancestry on forums such as this one. Eurogenes K13 seems to be a favorite for reliability among many people with a wide variety of European backgrounds. However, given your reported background via your flags, I personally find your 23andMe % of S. Euro to be high. There is currently a thread by an Englishman with higher than usual S. Euro at FTDNA but not at 23andMe ... so it seems to prove that this is still a new science and a work in progress. ;)

Ski
10-12-2016, 08:32 PM
Thank you both for your responses. That seems to clear up a lot of questions I had. When I received the results, I imagined that a lot of these quirks would be the result of population markers from far antiquity before these groups diverged, but, as in the case with the Yakut result, I would imagine that 23andme's algorithms would eventually encode these results under 'unclassified' or something due to the similarity across different ethnic groups. Both of my non-European populations occurred in percentages at or less than 0.1%, so it is possible these are noise or a shared marker with different groups, as they were listed as 'broadly'. I would also like to see some information on the Southern European population, as it's entirely likely (based on what little I know) that 23andme can't differentiate between ethnic groups that are widely spread throughout the Continent, like Slavs. I know there is a reasonably large portion of my background that is Slavic, but I would assume that there is really no set way to distinguish between the genetics of a West Slav versus a Montenegrin since 23andme relies upon self-reported geographical data to a large degree.

wombatofthenorth
10-13-2016, 07:22 PM
If you look at the average breakdown from different parts of their gigantic Eastern European region you'll actually see rather noticeably different results come up if someone has most of their ancestry from one part of that region vs. another.

Also some recent papers were able to split the region (and it also gets into Balkan and Greek regions too) into gene pattern groups of:

Eastern European:
1. Baltic - Latvians, Lithuanians, Estonians (although the Estonians merge into Finland and have much closer gene pattern ties to Finland than the two Baltic-speaking Baltic countries do, they still have a considerably closer overall autosomal gene pattern tie to the Baltic-speaking Baltic countries than to Finland)

2. West Slavs (which includes a few isolated geographically outlying groups like Sorbs who match Polish people far more than Germans) - Polish, Sorbs, Czechs, Slovakians

3. East Slavs - Belorussians, Ukrainians, North Russians, Central Russians, Southern Russians (with apparently Belorussians, Ukrainians and South Russians the closest matching)

Balkan:
4. Balkan (South Slavs) - made up of South Slavic speakers and a few non-Slavic speakers who are autosomally fully related to them (such as Hungarians) - Slovenians, Croatians, Bosnians, Bulgarians, Montenegrins, Serbians, Hungarians, Romanians and probably Macedonians

Greek:
5. southern tip Balkan/Greek - Albanians and Greeks

Apparently, as far as gene patterns go, the Greek group is as far from the Balkan (South Slavic) group as the Broadly Northwestern European group is from the Eastern European group, and farther from the Balkan (South Slavic) group than the Balkan (South Slavic) group is from the Eastern European group.

wombatofthenorth
10-13-2016, 07:37 PM
Of course, it also depends on what components you create to test; you can create ones that give all sorts of different ties or separations between regions. You could make ones where even group 5 and group 1 would score 100% the same.

wombatofthenorth
10-13-2016, 07:41 PM
"but I would assume that there is really no set way to distinguish between the genetics of a West Slav versus a Montenegrin"

Actually, even 23andMe as it is should be able to tell those apart to a degree. They do have a Balkan category, although their database did seem to stick one or two West/North Slavic groups in it, which probably makes the picture a bit blurrier than it should have been. The latter should score a much higher Balkan to Eastern European ratio, by a very considerable degree. Of course, the problem remains: if someone gets 30 EE and 4 Balkan, does that mean they are Polish (assuming Polish people get a 30:4 ratio, which maybe they don't; this is just a made-up example)? Or 30 parts Latvian to 4 parts Montenegrin? And so on. And if the numbers are small, say only 8% of your background, the randomness of it all makes it even trickier to get a sense of what is up. In some cases it might be clear; in others, not.
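To make the 30:4 ambiguity concrete, here is a minimal sketch. Every number in it is invented, in the same spirit as the made-up example above: it assumes a hypothetical "Polish" average of 30:4, a hypothetical "Latvian" average of pure Eastern European, and a hypothetical "Montenegrin" average of pure Balkan. None of these are real 23andMe averages.

```python
# All profiles are made up for illustration only (not real population averages).
# Each maps a population to the fraction of its ancestry reported per component.
PROFILES = {
    "Polish":      {"Eastern European": 30 / 34, "Balkan": 4 / 34},
    "Latvian":     {"Eastern European": 1.0,     "Balkan": 0.0},
    "Montenegrin": {"Eastern European": 0.0,     "Balkan": 1.0},
}

def reported_components(ancestry_percent):
    """Sum each population's contribution to every reported component."""
    totals = {}
    for population, percent in ancestry_percent.items():
        for component, fraction in PROFILES[population].items():
            totals[component] = totals.get(component, 0.0) + percent * fraction
    return {component: round(value, 1) for component, value in totals.items()}

# Two very different family histories...
scenario_a = reported_components({"Polish": 34})
scenario_b = reported_components({"Latvian": 30, "Montenegrin": 4})

# ...produce the same 30 EE / 4 Balkan report.
print(scenario_a)
print(scenario_b)
```

Under these assumptions the report alone cannot distinguish the two scenarios, which is why the ratio by itself doesn't settle the question.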

Ski
10-13-2016, 11:16 PM
Your example of 30:4 is exactly what I mean. The ratio of something like EE to Balkan doesn't necessarily mean you're NOT one or the other.

jpb
10-14-2016, 03:47 PM
My relative gets 0.1% Broadly SSA with his background being in Colonial KY and VA. A lot of his early KY and VA ancestors were slave owners :-/, so I guess it's possible that could be where it's from or it could be just noise. Any thoughts??

AppalachianGumbo
10-16-2016, 02:37 PM
With autosomal ancestry inference, there is a panel of selected populations, each carrying the SNPs or markers found to be most representative of a group. Say, for example, 600 Igbo people are tested to represent a region of Africa, and 450 carry a similar marker. This marker is assumed to be the most common marker of that group. Most people who take an ancestry test and have Igbo ancestry would, in theory, share that most "common" marker to some extent and receive an assignment of ancestry from this geographical location. Out of the 600 Igbo tested, 450 are used for the panel. So what does this make the 150 Igbo who also carry a marker, but not the common one? They are outliers. Outliers are not representative of the vast majority of the 600 tested, which implies most people may not match their marker. The panel is streamlined by using few markers that deviate too widely from that population.
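The 600-to-450 trimming can be pictured with a toy sketch like the one below. This is only one plausible way a panel could be trimmed, not a documented method of 23andMe or any other company: it assumes each person is a list of marker counts (0, 1, or 2 copies of an allele), uses random made-up data, and keeps the 450 people closest to the panel average. The function name `trim_panel` is hypothetical.

```python
import random

def trim_panel(samples, keep):
    """Keep the `keep` samples closest to the panel's average marker profile."""
    n_markers = len(samples[0])
    # Average value of each marker across everyone tested.
    centroid = [sum(s[i] for s in samples) / len(samples) for i in range(n_markers)]

    def distance(sample):
        # Squared distance from the average profile.
        return sum((x - c) ** 2 for x, c in zip(sample, centroid))

    ranked = sorted(samples, key=distance)
    return ranked[:keep], ranked[keep:]

random.seed(0)
# Hypothetical panel: 600 people, 20 markers, each coded as 0/1/2 allele copies.
panel = [[random.choice((0, 1, 2)) for _ in range(20)] for _ in range(600)]

reference, outliers = trim_panel(panel, keep=450)
print(len(reference), len(outliers))  # 450 150
```

The 150 set aside are still genuinely members of the tested group; they just sit farther from the average that the panel ends up representing.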

What does this mean? Ancestry inferences are gathered and analyzed at certain loci. When you receive, let's say, 3% West SSA, this is not saying you are 3% Igbo. What it is saying is that the 3% analyzed is similar to the panel, not to the Igbo population in general... thus an inference.

Many of these DNA companies use the HGDP, 1000 Genomes and at times their own homemade panel. What this boils down to is that populations are not densely sampled. For Native Americans, the HGDP has a panel from Central and South America, originally explored for diabetes, which consisted of 108 people. This does not include Native Americans from North America or Alaska. It is a thin population panel. Not that it is inaccurate, but it is not entirely representative of the population, especially for their region.

Native Americans and East Asians do not have a gradient between them and share markers so similar that they are hard to tell apart, so Native American ancestry can be misread as East Asian, since Native American DNA is Asian based. The few markers used from Central/South America are the anchor group for all of Indigenous America. Europeans have a large gradient between them and Sub-Saharan Africa. South Asian Indians have a gradient with East Asians. Those are more easily separated, whereas some populations are more grainy. Italians and Ashkenazi were hard to tell apart at 23andMe because they were very similar. The ancestry inference is not ingenious and is only using statistical data.

Also bear in mind... Markers are shared among all groups; those found at higher frequencies in certain groups can be found in others as well.
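A small worked example of why a shared marker only shifts the odds rather than proving anything: the sketch below applies Bayes' rule to a single marker. The two groups and their frequencies (30% and 5%) are invented for illustration, and equal priors are assumed; real calculators combine many thousands of markers and use their own methods.

```python
def posterior(freqs, priors=None):
    """Probability that one observed allele came from each population (Bayes' rule)."""
    # Default assumption: every population is equally likely beforehand.
    priors = priors or {pop: 1 / len(freqs) for pop in freqs}
    weighted = {pop: freqs[pop] * priors[pop] for pop in freqs}
    total = sum(weighted.values())
    return {pop: w / total for pop, w in weighted.items()}

# Hypothetical marker: common in Group A, rare (but present) in Group B.
marker_freq = {"Group A": 0.30, "Group B": 0.05}

result = posterior(marker_freq)
print({pop: round(p, 3) for pop, p in result.items()})
```

Even with a marker six times more common in Group A, about one carrier in seven would still trace it to Group B under these assumptions, so a single match is weak evidence on its own.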
