Sample Bias?



mwauthy
11-25-2018, 04:34 PM
What's going on in Denmark? Is there sample bias there, where for some odd reason they are the only country in the region not testing Y-DNA very much, or is I-DF29 truly not that common in that country? Here are numbers for I-DF29 based on the FTDNA public haplotree.

-Sweden 985 kits with a pop of 10m
-Finland 525 kits with a pop of 5.5m
-Norway 358 kits with a pop of 5.3m
-Denmark 98 kits with a pop of 5.8m

spruithean
11-25-2018, 04:39 PM
I have a feeling it's just the number of Danes and those of Danish descent who haven't done DNA testing. I-DF29 is the most dominant lineage under M253, so its being absent from Denmark would be improbable.

JMcB
11-25-2018, 04:47 PM
What’s going on in Denmark? Is there sample bias there where for some odd reason they are the only country in the region not testing Y dna very much, or is I-DF29 truly not that common in that country? Here are numbers for I-DF29 based on the Ftdna public haplotree.

-Sweden 985 kits with a pop of 10m
-Finland 525 kits with a pop of 5.5m
-Norway 358 kits with a pop of 5.3m
-Denmark 98 kits with a pop of 5.8m

At this point, it appears the Danes aren’t really that inclined towards having their DNA tested. I don’t have the total figures off the top of my head but as your numbers indicate, the Swedes, Finns and Norwegians far surpass them in that regard.

Ruderico
11-25-2018, 04:47 PM
You shouldn't really compare against total population, because that might skew the result; compare against the number of kits sampled instead.

MitchellSince1893
11-25-2018, 04:48 PM
Denmark is underrepresented, but still has the lowest percentage of DF29.

Total kits for those countries:
Denmark: 555, 17.7% I-DF29
Finland: 2,421 (smaller pop than Denmark, they must really be into DNA tests), 21.7% I-DF29
Norway: 1,537, 23.3% I-DF29
Sweden: 2,850, 34% I-DF29

Denmark has ~half the DF29 percentage of Sweden.
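The two ways of looking at the numbers (kits relative to population versus DF29 relative to kits) can be put side by side. This is only a minimal sketch using the DF29 counts, kit totals and populations quoted in the posts above; the function names are made up for illustration.

```python
# I-DF29 kits, total FTDNA kits, and population in millions, as quoted in the thread.
data = {
    "Sweden":  (985, 2850, 10.0),
    "Finland": (525, 2421, 5.5),
    "Norway":  (358, 1537, 5.3),
    "Denmark": (98, 555, 5.8),
}

def df29_share(country):
    """Percentage of a country's kits that are I-DF29 (hypothetical helper)."""
    df29, total, _ = data[country]
    return 100.0 * df29 / total

def kits_per_million(country):
    """Total kits per million inhabitants, a rough measure of testing uptake."""
    _, total, pop_m = data[country]
    return total / pop_m

for c in sorted(data, key=df29_share, reverse=True):
    print(f"{c:8s} DF29 share {df29_share(c):5.1f}%   "
          f"kits per million {kits_per_million(c):6.1f}")
```

Dividing by kits rather than by population separates how much a country tests from how common the haplogroup is among those who did test.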

mwauthy
11-25-2018, 05:11 PM
You shouldn't really compare to total population because you might be skewing the result, but to the # of kits sampled

You are correct!

mwauthy
11-25-2018, 05:12 PM
Denmark is under represented, but still has the lowest percentage of DF29.

Total kits for those countries
Denmark: 555, 17.7% I-DF29
Finland: 2421 (smaller pop than Denmark, they must really be into dna tests) 21.7% I-DF29
Norway: 1537, 23.3% I-DF29
Sweden: 2850, 34% I-DF29

Denmark has ~ half the DF29 percentage of Sweden.
I misread your post initially. Thanks for the figures.

MitchellSince1893
11-25-2018, 05:16 PM
I misread your post initially. Thanks for the figures.

Here's a list of European countries in the FTDNA database with their DF29 percentages

Isle of Man 44.44%
Sweden 34.49%
Faroe Islands 25.00%
Norway 23.36%
Finland 21.81%
Denmark 17.12%
Liechtenstein 14.29%
Montenegro 13.95%
Slovenia 11.71%
Netherlands 9.87%
Estonia 9.30%
England 6.78%
Germany 6.62%
N Ireland 6.27%
Belgium 6.20%
United Kingdom 5.77%
Serbia 5.60%
Scotland 5.48%
Bosnia 5.31%
Switzerland 5.10%
Albania 4.55%
Croatia 4.55%
Macedonia 4.55%
Bulgaria 4.47%
France 4.12%
Portugal 3.74%
Poland 3.61%
Austria 3.14%
Iceland 3.03%
Romania 2.77%
Wales 2.71%
Ireland 2.69%
Czech 2.64%
Slovakia 2.51%
Russia 2.28%
Spain 2.18%
Lithuania 1.84%
Moldova 1.82%
Italy 1.64%
Hungary 1.63%
Ukraine 1.59%
Belarus 1.52%
Latvia 1.30%
Greece 0.98%
Georgia 0.42%
Turkey 0.27%

mwauthy
11-25-2018, 05:19 PM
Here's a list of European countries in the FTDNA database with their DF29 percentages

Where can I access this data? Where does it show total kits per country?

MitchellSince1893
11-25-2018, 05:25 PM
Where can I access this data? Where does it show total kits per country?

1. Go to the top of the tree and click on "A".
2. Then click on 3 dots to the right of the top line "A-PR292"
3. Select "Country Report"

This will show you the total number of kits, currently 162,525, and the total for each country.

I was doing this for some R haplogroups, so I already had the structure built in my Excel spreadsheet. All I had to do was enter the DF29 numbers into my spreadsheet and divide by the total numbers for each country.

Helgenes50
11-25-2018, 05:29 PM
Where can I access this data? Where does it show total kits per country?

The best way to compare is here
https://www.familytreedna.com/public/y-dna-haplotree/A

mwauthy
11-25-2018, 05:32 PM
1. Go to the top of the tree and click on "A".
2. Then click on 3 dots to the right of the top line "A-PR292"
3. Select "Country Report"

This will show you the total number of kits, currently 162,525, and the total for each country

I was doing this for some R haplogroups so I already had the structure built on my excel spreadsheet. All I had to do was enter the DF29 numbers into my spreadsheet and divide by the total numbers for each country.

Thanks!

MitchellSince1893
11-25-2018, 05:41 PM
Thanks!

Then you can create a map from the results. This isn't a completed map...no color key, but it gives you an idea of what you can do with any haplogroup using the FTDNA numbers.

https://i.pinimg.com/originals/7f/e1/f7/7fe1f7b127b30fab209328a80bb8dd4f.png

MitchellSince1893
11-25-2018, 05:58 PM
In case anyone is interested in doing this, this may save you some time on the European country totals:

Albania 66
Armenia 500
Austria 478
Azerbaijan 109
Belarus 591
Belgium 355
Bosnia 113
Bulgaria 291
Croatia 154
Cyprus 57
Czech 492
Denmark 555
England 10,996
Estonia 86
Faroe Islands 4
Finland 2,421
France 2,497
Georgia 238
Germany 7,326
Greece 615
Guernsey 4
Hungary 800
Iceland 132
Ireland 7,767
Isle of Man 18
Italy 2,749
Jersey 1
Latvia 154
Liechtenstein 7
Lithuania 708
Luxembourg 44
Macedonia 66
Malta 50
Moldova 55
Montenegro 43
N Ireland 830
Netherlands 983
Norway 1,537
Poland 2,633
Portugal 722
Romania 397
Russia 3,600
Scotland 5,835
Serbia 125
Slovakia 319
Slovenia 111
Spain 2,061
Sweden 2,850
Switzerland 1,255
Turkey 741
UK 4,127
Ukraine 1,255
Wales 922

mwauthy
11-25-2018, 06:08 PM
In case anyone is interested in doing this, They may save you some time on the European country totals...

I wonder what a good minimum number of sampled kits would be, so that results aren't skewed as with the Isle of Man percentage?
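One way to see why the Isle of Man figure is shaky is to put a confidence interval around it. The sketch below assumes the 44.44% corresponds to 8 of 18 kits (inferred from the percentage and the kit total, not stated in the thread) and uses the Wilson score interval, which is a common choice for small samples but not something anyone in the thread used.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Isle of Man: assumed 8 DF29 kits out of 18 (inferred from 44.44%).
iom_low, iom_high = wilson_interval(8, 18)
# Sweden: 985 DF29 kits out of 2,850, as quoted earlier in the thread.
swe_low, swe_high = wilson_interval(985, 2850)

print(f"Isle of Man: {100 * iom_low:.0f}% to {100 * iom_high:.0f}%")
print(f"Sweden:      {100 * swe_low:.1f}% to {100 * swe_high:.1f}%")
```

With 18 kits the plausible range spans tens of percentage points, while Sweden's 2,850 kits pin the share down to within a couple of points.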

MitchellSince1893
11-25-2018, 06:11 PM
I wonder what is a good amount for minimum amount of kits sampled in order not to have skewed results as with the Isle of Man percentage?

In the case of the Isle of Man, the male population is ~42,500, so you would need a minimum of 381 samples for a 95% confidence level with a 5% margin of error. https://www.calculator.net/sample-size-calculator.html?type=1&cl=95&ci=5&pp=50&ps=42500&x=82&y=20

According to the calculator above, the safe number for countries in general is a minimum of 385 samples.

The above answer isn't very helpful, as it excludes 23 of the above countries.

Just as a WAG (wild ass guess), go with countries with at least 50 to 100 samples total. It's really up to you.
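The 381 and 385 figures can be reproduced with the standard sample-size formula for a proportion plus a finite population correction. This is a sketch assuming the settings visible in the calculator link (95% confidence, 5% margin, 50% expected proportion); that the calculator uses exactly this formula internally is an assumption, though the results match.

```python
import math

def sample_size(population=None, z=1.96, margin=0.05, p=0.5):
    """Minimum sample size for estimating a proportion.

    z=1.96 corresponds to 95% confidence. If a population size is given,
    a finite population correction is applied.
    """
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is None:
        return math.ceil(n0)
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(sample_size())        # large (effectively unlimited) population
print(sample_size(42500))   # Isle of Man male population quoted above
```

Loosening the margin of error shrinks the requirement quickly, which is one way to put a number on the 50 to 100 rule of thumb: `sample_size(margin=0.10)` asks for roughly a quarter as many samples.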

ianz91
11-25-2018, 06:35 PM
DNA testing seems to be mainly a New World thing, think USA, Canada, and Australia. So there's going to be less testing from Europe, Asia, Africa, Middle East, etc., especially when it comes to specific Y-DNA testing.

spruithean
11-25-2018, 07:09 PM
DNA testing seems to be mainly a New World thing, think USA, Canada, and Australia. So there's going to be less testing from Europe, Asia, Africa, Middle East, etc., especially when it comes to specific Y-DNA testing.

Agreed. When you factor in the drive to find out where a certain ancestor came from, you are bound to end up with an abundance of New World genealogists flooding the Y-DNA databases.

It seems that "not knowing" exactly where one's family comes from drives this testing. I know what my maternal grandfather's haplogroup is; however, it isn't as "interesting" as my paternal haplogroup because I know where my grandfather was from: his village, the house he grew up in in the Netherlands, etc. That is not something I have with my direct paternal lineage.

mwauthy
11-25-2018, 08:55 PM
Here are percentages for I-M253 based on the Ftdna Haplotree since many STR testers show as I-M253 rather than I-DF29. I might add more countries later. Minimum of at least 25 kits positive for I-M253.

Sweden: 1,291/2,850= 45%
Norway: 579/1,537= 38%
Denmark: 202/555= 36%
Iceland: 42/132= 32%
Finland: 664/2,421= 27%
Netherlands: 229/983= 23%
England: 2,176/10,996= 20%
Germany: 1,278/7,326= 17%
Scotland: 797/5,835= 14%
Wales: 126/922= 14%
Belgium: 42/355= 12%
Switzerland: 148/1,255= 12%
France: 245/2,497= 10%
Austria: 45/478= 9%
Ireland: 569/7,767= 7%
Poland: 173/2,633= 7%

JonikW
11-25-2018, 09:05 PM
Fascinating. I remember reading somewhere that I1 shows most diversity in Denmark and may have originated there even if it ranks lower than others in percentage terms. Anyone know anything about that?

JMcB
11-25-2018, 09:24 PM
In case anyone is interested in doing this, This may save you some time on the European country totals...

There’s a lot of British Isles in those numbers (30495)

mwauthy
11-25-2018, 10:35 PM
There's a lot of British Isles in those numbers (30495)

19% of all kits, and that doesn't include people with British Isles ancestry who marked United States or Canada.

GoldenHind
11-27-2018, 02:52 AM
There’s a lot of British Isles in those numbers (30495)

The enormous overweighting of samples with ancestry from Britain and Ireland in the FTDNA database has been known for several years.

JMcB
11-27-2018, 04:41 AM
The enormous overweighting of samples with ancestry from Britain and Ireland in the FTDNA database has been known for several years.

I was aware of that but it was nice to see the numbers, nevertheless

oz
11-27-2018, 05:21 AM
Here are percentages for I-M253 based on the Ftdna Haplotree since many STR testers show as I-M253 rather than I-DF29. I might add more countries later. Minimum of at least 25 kits positive for I-M253.

Sweden: 1,291/2,850= 45%
Norway: 579/1,537= 38%
Denmark: 202/555= 36%
Iceland: 42/132= 32%
Finland: 664/2,421= 27%
Netherlands: 229/983= 23%
England: 2,176/10,996= 20%
Germany: 1,278/7,326= 17%
Scotland: 797/5,835= 14%
Wales: 126/922= 14%
Belgium: 42/355= 12%
Switzerland: 148/1,255= 12%
France: 245/2,497= 10%
Austria: 45/478= 9%
Ireland: 569/7,767= 7%
Poland: 173/2,633= 7%

Yeah, I'm one of those; I never tested further than Y12 on FTDNA.

mwauthy
12-08-2018, 05:49 PM
You shouldn't really compare to total population because you might be skewing the result, but to the # of kits sampled

Sweden: 1291/2850=45%, 10m
Norway: 579/1537=38%, 5m
Denmark: 202/555=36%, 6m
Iceland: 42/132=32%, 0.3m
Finland: 664/2421=27%, 6m
Netherlands: 229/983=23%, 17m
England: 2176/10996=20%, 55m
UK: 736/4127=18%, 66m
Germany: 1278/7326=17%, 83m
Scotland: 797/5835=14%, 5m
Wales: 126/922=14%, 3m
Belgium: 42/355=12%, 11m
Switzerland: 148/1255=12%, 8m
France: 245/2497=10%, 67m
Northern Ireland: 87/830=10%, 2m
Austria: 45/478= 9%, 9m
Ireland: 569/7767=7%, 5m
Poland: 173/2633=7%, 38m

I decided to include population totals to illustrate the sampling bias. Belgium has a slightly larger population than Sweden, yet Sweden has 8 times as many samples. The British Isles has around 7 times the population of Belgium, yet has 85 times as many samples. That's a disproportionate sampling bias of 12 to 1. I believe that sampling bias is going to negatively influence any theories we have regarding subclade distribution and tying them to particular historical events.
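The over-representation arithmetic can be checked directly from the kit totals posted earlier. A rough sketch, assuming "British Isles" means the seven entries summed below (which gives the 30,495 mentioned in the thread) and a population of 71m (the rounded UK 66m plus Ireland 5m from the list above); the variable names are illustrative only.

```python
# Kit totals from the European country list posted earlier in the thread.
british_isles = {
    "England": 10996, "Ireland": 7767, "Isle of Man": 18,
    "N Ireland": 830, "Scotland": 5835, "UK": 4127, "Wales": 922,
}
british_isles_kits = sum(british_isles.values())
belgium_kits, sweden_kits = 355, 2850

# Populations in millions (assumption: UK 66 + Ireland 5; Belgium 11).
british_isles_pop, belgium_pop = 71, 11

kit_ratio = british_isles_kits / belgium_kits   # how many times more kits
pop_ratio = british_isles_pop / belgium_pop     # how many times more people
over_representation = kit_ratio / pop_ratio

print(f"Sweden vs Belgium kits: {sweden_kits / belgium_kits:.1f}x")
print(f"British Isles vs Belgium kits: {kit_ratio:.1f}x, population: {pop_ratio:.1f}x")
print(f"Over-representation: about {over_representation:.0f} to 1")
```

With these unrounded inputs the ratio comes out a little above the 12 to 1 quoted, which follows from rounding to 85 times the kits and 7 times the population; either way the order of magnitude is the same.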

Michał
12-08-2018, 10:33 PM
I decided to include population totals to illustrate the sampling bias. Belgium has a slightly larger population than Sweden yet Sweden has 8 times as many samples. The British Isles has around 7 times the population of Belgium yet has 85 times as many samples. That's a disproportionate sampling bias of 12 to 1. I believe that sampling bias is going to negatively influence any theories we have regarding subclade distribution and tying them to particular historical events.

I wouldn't say that this kind of bias would affect any theories we have. For example, this won't affect a theory that assumes migration of I1a from Scandinavia to Britain, as I1a is relatively frequent in both regions and both regions are well represented among the FTDNA customers, so we can quite securely estimate the frequency of I1a in both Scandinavia and Britain (or even in specific countries/subregions). There is no doubt that I1a is most frequent in Scandinavia (and more specifically in Sweden), and this won't be changed after testing more people from underrepresented countries. Of course, the situation would be very different for some very rare subclades of I1a, especially when these subclades are occasionally seen in some countries that are strongly underrepresented among the FTDNA customers, like some countries in Eastern and SE Europe (Belarus, Moldova, Macedonia, Albania, etc.), as in all such cases it would be much harder to analyze the relative frequencies in particular countries.

spruithean
12-08-2018, 11:28 PM
I wouldn't say that this kind of bias would affect any theories we have. For example, this won't affect a theory that assumes migration of I1a from Scandinavia to Britain, as I1a is relatively frequent in both regions and both regions are well represented among the FTDNA customers, so we can quite securely estimate the frequency of I1a in both Scandinavia and Britain (or even in specific countries/subregions). There is no doubt that I1a is most frequent in Scandinavia (and more specifically in Sweden), and this won't be changed after testing more people from underrepresented countries. Of course, the situation would be very different for some very rare subclades of I1a, especially when these subclades are occasionally seen in some countries that are strongly underrepresented among the FTDNA customers, like some countries in Eastern and SE Europe (Belarus, Moldova, Macedonia, Albania, etc.), as in all such cases it would much harder to analyze the relative frequencies in particular countries.

I would have to agree. The lack of I1 within aDNA samples outside of Northern Europe seems to reflect our current theories.

mwauthy
12-08-2018, 11:31 PM
I wouldn't say that this kind of bias would affect any theories we have. For example, this won't affect a theory that assumes migration of I1a from Scandinavia to Britain, as I1a is relatively frequent in both regions and both regions are well represented among the FTDNA customers, so we can quite securely estimate the frequency of I1a in both Scandinavia and Britain (or even in specific countries/subregions). There is no doubt that I1a is most frequent in Scandinavia (and more specifically in Sweden), and this won't be changed after testing more people from underrepresented countries. Of course, the situation would be very different for some very rare subclades of I1a, especially when these subclades are occasionally seen in some countries that are strongly underrepresented among the FTDNA customers, like some countries in Eastern and SE Europe (Belarus, Moldova, Macedonia, Albania, etc.), as in all such cases it would much harder to analyze the relative frequencies in particular countries.

I agree with you that nothing is going to change the fact that I1a has its highest frequencies in Scandinavia. I'm simply alluding to the theories regarding the origins and distributions of subclades much farther downstream of I1a and how they can possibly be attributed to certain historical migrations.

spruithean
12-08-2018, 11:35 PM
I agree with you that nothing is going to change the fact that I1a has its highest frequencies in Scandinavia. I'm simply alluding to the theories regarding the origins and distributions of subclades much farther downstream of I1a and how they can possibly be attributed to certain historical migrations.

It's possible, I suppose. The frustrating thing with I1a subclades is their lower frequencies in databases. The lack of aDNA results, or more specifically of deeper testing on the aDNA, really limits the scope we have to theorize.

JonikW
12-09-2018, 09:39 PM
It's possible I suppose. The frustrating thing with I1a subclades is the lower frequencies of them in databases. The lack of aDNA results or more specifically deeper testing on the aDNA really limits the scope with which we can use to theorize.

Interesting posts. I hope we soon see loads of aDNA from Scandinavia that will help us build a picture. My bet is that my Z140 subclade originated with the Angles and was subsequently also spread into Britain by the Vikings.

ronzo
12-10-2018, 03:37 AM
Interesting posts. I hope we soon see loads of aDNA from Scandinavia that will help us build a picture. My bet is that my Z140 subclade originated with the Angles and was subsequently also spread into Britain by the Vikings.

I also think this is an interesting post, as I am a Z140 with Danish roots. I have been very fortunate to have a relatively close match from Denmark (375 years), considering the relatively low testing rates there. My own personal situation doesn't prove anything, of course, but I have a low level of Scandinavian autosomal DNA yet a proven Y-DNA link in Denmark to at least the early 1700s.

RobertCasey
12-10-2018, 05:14 AM
Here is my post on how this report can be used by haplogroup admins to determine how many missing testers are from their spreadsheet summaries (posted under Facebook group - Only FTDNA Project Admins)

There is a new handy report under the public FTDNA haplotree that allows you to put in your haplogroup and it will tell you the counts for "Country of Origin" for people who have tested positive for your haplogroup. This is an excellent tool to see how well you are doing with recruiting and finding testers in other projects. Unfortunately, this report omits Unknown and Blank origins.

So this new report on R-L226 showed 470 known L226 testers. My entire spreadsheet (with Unknown and Blank) only has 452 testers that are confirmed L226 positive. So I am missing a lot more than 18 testers in my spreadsheet. I have 149 testers that I summarize as Unknown European ties but there are only 50 testers in the report that are reported with non-European origins. So at least 99 confirmed L226 testers are omitted from this new FTDNA report due to being Unknown or Blank.

There is one small error found in this new report - they have two listings for origins of Haiti. These two testers actually belong to the branch immediately above L226 and they are L226 negative. However, based on this report, admins should encourage all testers to put some value into this setting. Putting USA, Australia, Canada, etc. helps make this report more complete and would allow admins to better determine how many confirmed testers of their haplogroup are missing.

Also, I went through and spot-checked around 100 testers and found changes in this field for around 10% of the testers, who have updated from Unknown/Blank to a country.

First go to the public FTDNA haplotree:

https://www.familytreedna.com/public/y-dna-haplotree/A

Key in your haplogroup (e.g. R-L226) - you must have the haplogroup letter included

Click on the three dots stacked on top of each other to the right and select "Country Report."

https://www.familytreedna.com/public/y-dna-haplotree/R;name=R-L226

For more details, see the post at Anthrogenica:

https://anthrogenica.com/showthread.php?15929-Sample-Bias

mwauthy
12-10-2018, 01:30 PM
I'd also like to point out that these countries of origin are self-reported and could occasionally be inaccurate or not relevant to historical migrations. For example, there is one kit that lists Austria as the country of origin but has a Welsh surname. I've contacted them trying to figure out the discrepancy but have received no response. Wales versus Austria is going to affect how I interpret the historical subclade origin and distribution.

oz
12-10-2018, 04:57 PM
I'd also like to point out that these countries of origin are self reported and could be occasionally inaccurate or not relevant to historical migrations. For example, there is one kit that lists Austria as the country of origin but they have a Welsh surname. I've contacted them trying to figure out the discrepancy but have received no response. Wales versus Austria is going to affect how I interpret the historical subclade origin and distribution.

I agree the self-reporting thing might be quite misleading in some cases. Plus a lot of these countries unfortunately have a very low number of samples, so drawing any conclusions would be very sketchy. What does look pretty obvious, considering the amount of sampling and population size, is that the Z63 branch is freakin tiny in Scandinavia and other northernmost regions of Europe compared to the other major subclades, Y2592 and Z58. Z63 seems to get more frequent in comparison to those two from the Netherlands towards the mainland. In Scandinavia it's only about 2.5% of the I1 (or DF29, same difference), with Denmark having the highest percentage of Z63 compared to Sweden and Norway, while in the Finnish samples it's only 1.3% of the DF29! There is one in Iceland though, out of 4 samples, lol. It is also more common in the British Isles than in Scandinavia, mostly in England.
And on the mainland it looks to be roughly around 20% of the DF29, given the limited amount of data and samples, with the highest percentage in Spain at 60% out of 45 samples, and Ukraine at 50% but out of even fewer, only 20 samples.

In conclusion I would say this database might only show very rough estimates, but what seems pretty certain is that Z58 and its downstreams are the most common type of I1 branch, probably both worldwide and in Europe. And Z63 might not even have originated in Scandinavia, given how tiny its frequency is there compared to the other major branches directly descended from DF29. Its ancestor might have originated in Scandinavia, but for whatever reason it seems to have spread around more on the mainland.

spruithean
12-11-2018, 12:40 AM
I'd also like to point out that these countries of origin are self reported and could be occasionally inaccurate or not relevant to historical migrations. For example, there is one kit that lists Austria as the country of origin but they have a Welsh surname. I've contacted them trying to figure out the discrepancy but have received no response. Wales versus Austria is going to affect how I interpret the historical subclade origin and distribution.

I've noticed this with my own matches on FTDNA at times. Some of them list a literal "paternal ancestor" and their place of origin instead of their direct patrilineal ancestor (Y-line) and where they were from. It caused some confusion until information was exchanged and trees were laid out :lol: . But establishing contact with matches is not a guaranteed event.


I agree the self-reporting thing might be quite misleading in some cases. Plus a lot of these countries unfortunately have a very low number of samples so drawing any conclusions would be very sketchy. What does look pretty obvious considering the amount of sampling and population size, is that the Z63 branch is freakin tiny in Scandinavia and other northernmost regions of Europe compared to other major subclades the Y2592 and the Z58. The Z63 seems to get more frequent in comparison to those two from the Netherlands towards the mainland. In Scandinavia it's only about 2.5% of the I1 (or DF29 same difference) pretty much. With Denmark having the highest percentage of Z63 compared to Sweden and Norway, while in the Finnish samples it's only 1.3% of the DF29! There is one in Iceland though out of 4 samples lol. It is also more common in the British Isles than Scandinavia, mostly England.
And on the mainland it looks to be roughly around 20% of the DF29 give the limited amount of data and samples. With the highest percentage in Spain at 60% out of 45 samples, and Ukraine at 50% but even less only out of 20 samples.

In conclusion i would say this database might only show very rough estimates, but what seems pretty certain is that Z58 and its downstreams is the most common type of I1 branch probably worldwide and in Europe. And Z63 might've have not even originated in Scandinavia given how tiny the frequency of it is there compared to other direct descendant and major branches of DF29. Its ancestor might have originated in Scandinavia but for whatever reason it seems to have spread around more on the mainland.

I've noticed similar things with Z63 vs Z58 and the other DF29 groups. There is definitely a large piece of I1 data missing from modern individuals, but even more so from ancient samples. Relative to most other haplogroups, I1 has a very young diversification date compared to its formation date. Something interesting happened...

oz
12-11-2018, 04:16 AM
I've noticed this with my own matches on FTDNA at times. Some of them list a literal "paternal ancestor" and their place of origin instead of their direct patrilineal ancestor (Y-line) and where they were from. It caused some confusion until information was exchanged and trees were laid out :lol: . But establishing contact with matches is not a guaranteed event.



I've noticed similar things with Z63 vs Z58 and the other DF29 groups. There is definitely a large piece of I1 data missing from both modern individuals, but more so with ancient samples. I1 has such a young diversification date compared to a formation date when compared to most other haplogroups. Something interesting happened...

Yea... something interesting might've happened, but who's gonna bother with it? To figure out the origins or history of a subclade that's as low as 2% of the European population. Hopefully we get lucky with some ancient DNA at least; it's a slim hope though.

spruithean
12-11-2018, 01:48 PM
Yea... something interesting might've happened but who's gonna bother with it? To figure out origins or history of a subclade that's as low as 2% of the European population. Hopefully we get lucky with some ancient Dna at least, it's a slim hope though.

I'm afraid that's probably true. Not a haplogroup that seems to be tied to the various IE groups or anything like that. Just seems to be one of those "also present" haplogroups.

mwauthy
12-11-2018, 02:01 PM
Yea... something interesting might've happened but who's gonna bother with it? To figure out origins or history of a subclade that's as low as 2% of the European population. Hopefully we get lucky with some ancient Dna at least, it's a slim hope though.

I did not know that I1 only accounts for 2% of the European population. I read somewhere that haplogroup I represents 20% of the European population. Does that mean that I2 is 18% or is the 2% figure for I1 incorrect?

oz
12-11-2018, 02:59 PM
I did not know that I1 only accounts for 2% of the European population. I read somewhere that haplogroup I represents 20% of the European population. Does that mean that I2 is 18% or is the 2% figure for I1 incorrect?

You misunderstood me; I said the Z63 branch is most likely around that percentage. But who knows.
Btw, looking at this FTDNA database I noticed that only a small share of the M253 people were interested in taking deeper tests or something, especially in England and Germany. Compare the numbers with the Swedes who tested DF29+: it's double the amount. So the numbers under the DF29 subclades could be much higher if those people tested deeper than just M253.

RobertCasey
12-11-2018, 03:25 PM
I did not know that I1 only accounts for 2% of the European population. I read somewhere that haplogroup I represents 20% of the European population. Does that mean that I2 is 18% or is the 2% figure for I1 incorrect?

The point of my post has nothing to do with countries (or surnames). These reports give excellent summaries of testers that are confirmed for any particular haplogroup. But look at the three key summaries for my L226 project (using the geography report which appears to be pretty good coverage of all of L226):

                        Confirmed L226
Within L226 project     428
In my spreadsheet       448
FTDNA report            597

25% of confirmed testers are missing from my spreadsheet, primarily because "DNA opt in to sharing" is disabled. Also included: projects that do not post a public YSTR report, and testers who belong to no project.

So even though I have increased my project by 100 % during the last 18 months - I am still missing 25 % of the testers that have tested positive for L226 or its downstream branches.
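The 25% figure follows from the three counts in the summary. A small sketch of that arithmetic, with a helper name invented here for illustration:

```python
# Confirmed L226 counts from the summary above.
confirmed = {"project": 428, "spreadsheet": 448, "ftdna_report": 597}

def missing_share(known, reported):
    """Percentage of reported testers that are absent from a known list."""
    return 100.0 * (reported - known) / reported

report = confirmed["ftdna_report"]
print(f"Missing from spreadsheet: {missing_share(confirmed['spreadsheet'], report):.0f}%")
print(f"Missing from project:     {missing_share(confirmed['project'], report):.0f}%")
```

The same comparison works for any haplogroup where an admin has both a private tally and the public report count.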

This report also gives you an idea of how large haplogroups are when compared to each other. There are three large haplogroups that are dominated by Irish testers:

M222 = 3,359___CTS4466 = 938____L226 = 597

It gives you the relative sizes of larger haplogroups (in comparison with each other)

M173 = 71,099 > M269 = 56,755 > P312= 33,440 > L21 = 19,261 > Z253 = 2334 > L226 = 597 > FGC5660 = 326 > FGC5628 = 163 > ZZ34_1 = 91 > FGC5647 = 4

The surname report is totally bogus - for 597 testers with geographies - it shows a mere 61 surnames.

For L226, I find the geographical origins very questionable. Because 90% of the testers are listed as "Ireland," many just assume that they must be Irish. I find that looking up the nationality of surnames on Surname.io is much more reliable. There is now strong evidence that the L226 mutation did not occur in Ireland. We recently broke up the L226 equivalents into two new branches just above L226, and we have one very old branch of L226 that is only 1.0% of L226 testers. Looking at the surnames, none are Irish surnames. These are spread out over many geographic areas: Scotland, England, Germany and France. So the origins of the surnames of the oldest proven ancestors are much more reliable than what people enter into this field. More than half of the testers still stuck in the USA enter Ireland, simply because most testers list Ireland, which is not correct.

mwauthy
12-11-2018, 03:38 PM
You misunderstood me, I said the Z63 branch is most likely around that percentage. But who knows.
Btw, looking at this ftdna database I noticed a small amount of the M253 people were interested in taking deeper tests or something. Especially the England and Germany. Compare the numbers with Swedes who tested for DF29 + it's double the amount. So the numbers under the DF29 subclades could be much higher if those people tested deeper than just M253.

The numbers I posted originally were for I-DF29. Then I posted numbers for I-M253. However, I think there are even more STR testers in the Ftdna database than I posted for I-M253 because I think people who are projected for I-M253 (subclade is shown in red) are not listed on the Ftdna Haplotree.

Pylsteen
12-11-2018, 04:41 PM
I am trying to have a look at the distribution of Dutch haplogroups, but it is difficult to obtain percentages. Those who have tested (or projected?) I-M253 are 229/988 (ca. 23%) of Dutch lineages; almost 60% of those are stuck at the I-M253 level. In contrast, I count 441/988 (ca. 44,6%) R1b, of which only ca. 20% seems stuck at R-M343 and R-M269. If I would only include more defined lineages, the percentage of I1 would certainly go down against R1b.

oz
12-11-2018, 04:52 PM
I am trying to have a look at the distribution of Dutch haplogroups, but it is difficult to obtain percentages. Those who have tested (or projected?) I-M253 are 229/988 (ca. 23%) of Dutch lineages; almost 60% of those are stuck at the I-M253 level. In contrast, I count 441/988 (ca. 44,6%) R1b, of which only ca. 20% seems stuck at R-M343 and R-M269. If I would only include more defined lineages, the percentage of I1 would certainly go down against R1b.

It's not that difficult to obtain percentages; there are online calculators if you google them.

Pylsteen
12-11-2018, 05:03 PM
It's not that difficult to obtain percentages; there are online calculators if you google them.

I know how to do that, my issue is that the number of people that are "stuck" in a projected or basic result makes it difficult to get a percentage of defined subclades of the whole.
For example, there are 19 Dutch I-Z63, so 19/988 = ca. 1,92%, but there must be some more of them among the 131 ones who have not tested/projected further than I-M253.

oz
12-11-2018, 05:07 PM
I know how to do that, my issue is that the number of people that are "stuck" in a projected or basic result makes it difficult to get a percentage of defined subclades of the whole.
For example, there are 19 Dutch I-Z63, so 19/988 = ca. 1,92%, but there must be some more of them among the 131 ones who have not tested/projected further than I-M253.

Yes and I already mentioned that in the above post.

Pylsteen
12-11-2018, 05:24 PM
Yes and I already mentioned that in the above post.

Ah yes, I see; I have to find a method. The results differ per method. If I get rid of all basic lineages in all haplogroups, I am left with ca. 662 more defined results. I would get 19/662 = ca. 2,9% of Z63 among the Dutch. However, this would also lead to ca. 15% of I-M253 instead of ca. 23%, since it seems that with I-M253, the number of basic results is much larger than with R-M269. If I keep the basic results and assume that the proportions of the subclades are the same among the basic lines as found among the more defined results, I would get ca. 4,5% of Z63 among the Dutch.

oz
12-11-2018, 05:50 PM
Ah yes, I see; I have to find a method. The results differ per method. If I get rid of all basic lineages in all haplogroups, I am left with ca. 662 more defined results. I would get 19/662 = ca. 2,9% of Z63 among the Dutch. However, this would also lead to ca. 15% of I-M253 instead of ca. 23%, since it seems that with I-M253, the number of basic results is much larger than with R-M269. If I keep the basic results and assume that the proportions of the subclades are the same among the basic lines as found among the more defined results, I would get ca. 4,5% of Z63 among the Dutch.

I don't know what you're projecting, but that list shows the Dutch at 308 I-M170, of which 229 are I-M253, and it shows 441 R1b-M343 and 420 R1b-M269.

Pylsteen
12-11-2018, 06:11 PM
I don't know what you're projecting, but that list shows the Dutch at 308 I-M170, of which 229 are I-M253, and it shows 441 R1b-M343 and 420 R1b-M269.

131/229 (ca. 57%) of I-M253 is "stuck" at the I-M253 level; 75/441 (ca. 17%) of R-M343 is stuck at M343 or M269; if I dropped those, the percentage of remaining I1 would go down against R1b, and this affects the percentages of subclades of the total too.
If I drop every seemingly basic result in all haplogroups, I am left with 662/988 "slightly to well defined" Dutch results to work with.
In that case, I-Z63 is 19/662 (ca. 2,9%) of all "defined" Dutch lineages.
Of the "defined" I-M253 lineages (229 minus the 131 "basic" = 98), I-Z63 is 19/98 = ca. 19,4%.
If I choose to keep working with all 988 Dutch results (including the basic ones), and assume that among all I-M253, I-Z63 is found at ca. 19,4%, then I get an estimate of (0,194x229)/988 = ca. 4,5% of I-Z63 in the whole population.
Hope it's clear what I am doing.
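
For anyone who wants to reproduce the three estimates, here is a minimal Python sketch. The counts are the ones given in this post; the variable names are only illustrative, and the third estimate rests on the stated assumption that the untested I-M253 kits have the same subclade proportions as the tested ones.

```python
# Counts taken from the post above (Dutch results in the FTDNA list).
total = 988          # all Dutch results
defined_total = 662  # results left after dropping basic ones in every haplogroup
m253_total = 229     # all I-M253, tested or projected
m253_basic = 131     # "stuck" at the I-M253 level
z63 = 19             # confirmed I-Z63


def pct(numerator, denominator):
    """Percentage of numerator in denominator."""
    return 100.0 * numerator / denominator


m253_defined = m253_total - m253_basic  # 98 "defined" I-M253 results

# Method 1: confirmed Z63 over everything (a lower bound).
lower_bound = pct(z63, total)

# Method 2: confirmed Z63 over "defined" results only.
defined_only = pct(z63, defined_total)

# Method 3: assume the Z63 share among defined I-M253 also holds
# for the basic I-M253 results, then scale back to all 988.
z63_within_m253 = z63 / m253_defined
extrapolated = pct(z63_within_m253 * m253_total, total)

print(f"lower bound:  {lower_bound:.2f}%")
print(f"defined only: {defined_only:.2f}%")
print(f"extrapolated: {extrapolated:.2f}%")
```

The gap between the second and third figures comes from I-M253 having a much larger share of basic results than R-M343, so dropping basic results shrinks I1 relative to R1b.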

oz
12-11-2018, 07:18 PM
131/229 (ca. 57%) of I-M253 is "stuck" at the I-M253 level; 75/441 (ca. 17%) of R-M343 is stuck at M343 or M269; if I dropped those, the percentage of remaining I1 would go down against R1b, and this affects the percentages of subclades of the total too.
If I drop every seemingly basic result in all haplogroups, I am left with 662/988 "slightly to well defined" Dutch results to work with.
In that case, I-Z63 is 19/662 (ca. 2,9%) of all "defined" Dutch lineages.
Of the "defined" I-M253 lineages (229 minus the 131 "basic" = 98), I-Z63 is 19/98 = ca. 19,4%.
If I choose to keep working with all 988 Dutch results (including the basic ones), and assume that among all I-M253, I-Z63 is found at ca. 19,4%, then I get an estimate of (0,194x229)/988 = ca. 4,5% of I-Z63 in the whole population.
Hope it's clear what I am doing.

Why are you so "stuck" on the M253? Of course most of them, up to 99%, will be under DF29.

Pylsteen
12-11-2018, 08:27 PM
Why are you so "stuck" on the M253? Of course most of them, up to 99%, will be under DF29.

Yes, I agree with that. It seems reasonable to me to assume there will not be much difference between the percentages of subclades found in the defined groups and in the yet undefined group. As long as the data set is large enough and I state that I make use of extrapolation/induction, it will probably not be a large problem. Well, that's it for today.

oz
12-12-2018, 03:18 AM
One more interesting thing about the chart is Spain and I1, particularly Z63. I did some rough calculations. It shows 943 R-M343, and if that represents about 60% of the population, then the 99 samples of I-M253, which are about 9.5 times less common, would amount to around 6.3% I1 in Spain. 27 of the 45 DF29 are under Z63, which means it could be up to 4% of the population. That doesn't have to be the case, since again these are rough estimates, but it still indicates a significant presence of Z63 there, and the sample size isn't that small. I'm thinking Visigoths maybe.
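
The rough arithmetic can be written out as a short Python sketch. The kit counts come from the chart; the 60% R1b share is this post's own assumption, not something the chart shows, and the Z63 step further assumes that the I-M253 kits not tested down to DF29 split the same way as the tested ones.

```python
# Kit counts for Spain as read off the chart in this thread.
r_m343 = 943  # R-M343 kits
i_m253 = 99   # I-M253 kits
df29 = 45     # kits confirmed I-DF29
z63 = 27      # of those, kits under Z63

# Assumption from the post: R1b is about 60% of Spanish male lineages.
assumed_r1b_share = 60.0

# Scale the I-M253 kit count against the R-M343 kit count.
times_less_common = r_m343 / i_m253
i1_share = assumed_r1b_share / times_less_common

# Apply the Z63 fraction seen among DF29-tested kits to all of I1.
z63_share = i1_share * z63 / df29

print(f"I-M253 is about {times_less_common:.1f} times less common than R-M343")
print(f"estimated I1 share:  {i1_share:.1f}%")
print(f"estimated Z63 share: {z63_share:.1f}%")
```

The "up to 4%" figure is therefore an upper-end estimate that moves in direct proportion to whatever R1b share is assumed.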

dink
12-12-2018, 09:55 PM
One more interesting thing about the chart is Spain and I1, particularly Z63. I did some rough calculations. It shows 943 R-M343, and if that represents about 60% of the population, then the 99 samples of I-M253, which are about 9.5 times less common, would amount to around 6.3% I1 in Spain. 27 of the 45 DF29 are under Z63, which means it could be up to 4% of the population. That doesn't have to be the case, since again these are rough estimates, but it still indicates a significant presence of Z63 there, and the sample size isn't that small. I'm thinking Visigoths maybe.

Or Suebi. The Visigoths were probably a mishmash of haplogroups, including some from the Balkans that had joined them by the time they reached Iberia.