HarappaWorld PCAs



poi
02-01-2018, 06:25 AM
Here is a very preliminary PCA built from the HarappaWorld calculator's spreadsheet. I have also added a couple of forum members (based on the scores they posted on this really old thread from 2015) just to see where some people fit in. Note that this does not use the mother-of-all dataset Khana has put together; this is just a baby first step. If it is useful, I might create a 3D version of it, but the 2D plot already accounts for 93% of the variance.

Purple=IA/IE speakers
Green=Dravidian speakers
Orange=AA speakers
Forum members are Red pluses.

Revision 3: added Kush, Xehanort, Gyanwali, and Mingle. Also fixed a color-coding issue where Sri Lankan and Goan were dark green instead of purple.
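
For readers wondering where a figure like "93% of the variance" comes from, here is a minimal sketch of a two-axis PCA on admixture percentages. It is not poi's script (which is not shown in the thread); the rows are made-up numbers, the function names are hypothetical, and it uses plain power iteration so it runs without any library, where a real script would more likely call a linear-algebra package.

```python
import math

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`, plus the centered rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(k)]
           for i in range(k)]
    return cov, centered

def leading_axes(cov, count=2, iters=200):
    """Top eigenpairs of a symmetric matrix by power iteration with deflation."""
    k = len(cov)
    work = [row[:] for row in cov]
    pairs = []
    for _ in range(count):
        # Alternating-sign start vector; a start that happened to be orthogonal
        # to the wanted axis would have to be replaced.
        vec = [(-1.0) ** j for j in range(k)]
        for _step in range(iters):
            nxt = [sum(work[i][j] * vec[j] for j in range(k)) for i in range(k)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        val = sum(vec[i] * work[i][j] * vec[j] for i in range(k) for j in range(k))
        pairs.append((val, vec))
        # Deflate: remove the axis just found so the next pass finds the next one.
        for i in range(k):
            for j in range(k):
                work[i][j] -= val * vec[i] * vec[j]
    return pairs

# Made-up admixture rows (three components, each row sums to 100).
rows = [[60.0, 30.0, 10.0], [50.0, 35.0, 15.0], [30.0, 45.0, 25.0], [20.0, 40.0, 40.0]]
cov, centered = covariance(rows)
axes = leading_axes(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
# Share of the total variance carried by PC1 and PC2.
explained = [val / total_variance for val, _vec in axes]
# Coordinates of each row on the two axes (what gets plotted).
scores = [[sum(c[j] * vec[j] for j in range(len(c))) for _val, vec in axes]
          for c in centered]
```

Because the toy rows have only three components that sum to 100, two axes capture all of their variance; with the full set of calculator components the first two axes would capture only part of it.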

Biplot (no changes)
https://i.imgur.com/QvDToho.png

Overall
https://i.imgur.com/vK14zfG.png

Zoomed into members
https://i.imgur.com/haWx2jz.png

Version 2

Revision 2: added the Reza family and Jortita, and added a 95% ellipse.
Note that the graph has flipped... it sort of looks like a scorpion!
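
The post does not say how the 95% ellipse was drawn. One common construction, sketched here under the assumption that the points in a group are roughly bivariate normal, scales the covariance ellipse by the chi-square quantile for two degrees of freedom. The function name and the sample points are hypothetical.

```python
import math

def confidence_ellipse(points, level=0.95):
    """Centre, semi-axes and rotation (radians) of a normal-theory ellipse."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    half_trace = (sxx + syy) / 2.0
    spread = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    lam_major, lam_minor = half_trace + spread, half_trace - spread
    # With two dimensions the chi-square quantile is exactly -2 * ln(1 - level).
    scale = math.sqrt(-2.0 * math.log(1.0 - level))
    angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    semi_major = scale * math.sqrt(lam_major)
    semi_minor = scale * math.sqrt(max(lam_minor, 0.0))
    return (mx, my), semi_major, semi_minor, angle

# Invented PC1/PC2 coordinates for one group of samples.
pc_scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
centre, semi_major, semi_minor, angle = confidence_ellipse(pc_scores)
```

Plotting packages may use a different default (for example a t-distribution based ellipse), so an ellipse drawn this way will not necessarily match the one in the images.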

Biplot - same as before
https://i.imgur.com/dV9qoB1.png

Overview
https://i.imgur.com/lidsYCL.png

Members zoomed in
https://i.imgur.com/DJ4VvXm.png


Version 1:

Using the biplot, we can see that the 1st quadrant has the SouthIndian component, the 2nd quadrant has the Baloch component, the 3rd quadrant has Caucasian + NE-European + NE-Asian + SW-Asian, while the 4th quadrant has the Papuan + SE-Asian pull.

https://i.imgur.com/q7PeKeT.png
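
Reading quadrants off a biplot amounts to looking at the signs of each component's loadings on PC1 and PC2. A small sketch, assuming the standard quadrant numbering (1st is upper right, counting counter-clockwise); the loading values are invented purely to reproduce the layout described above and are not taken from the actual plot.

```python
def quadrant(pc1, pc2):
    """Standard quadrant number (1 to 4); 0 if the point lies on an axis."""
    if pc1 > 0 and pc2 > 0:
        return 1
    if pc1 < 0 and pc2 > 0:
        return 2
    if pc1 < 0 and pc2 < 0:
        return 3
    if pc1 > 0 and pc2 < 0:
        return 4
    return 0

# Invented loadings (PC1, PC2), for illustration only.
loadings = {
    "S-Indian": (0.70, 0.20),
    "Baloch": (-0.30, 0.60),
    "Caucasian": (-0.45, -0.35),
    "NE-Euro": (-0.25, -0.30),
    "SE-Asian": (0.15, -0.25),
    "Papuan": (0.10, -0.15),
}
by_quadrant = {}
for component, (pc1, pc2) in loadings.items():
    by_quadrant.setdefault(quadrant(pc1, pc2), []).append(component)
```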

Here are just the IA/Dravidian groups zoomed in.

https://i.imgur.com/rryUyh2.png

jortita
02-01-2018, 09:44 AM
Can you please plot me as well, even though it is an outdated calculator?

bmoney
02-01-2018, 09:50 AM
I don't get how Baloch doesn't angle along with SW Asian, Siberian, NE Euro and Caucasian.

Siberian must be a part of that grouping due to ANE.

MonkeyDLuffy
02-01-2018, 01:15 PM
Clustering right between Kashmiri and Punjabi Brahmins makes me wonder if we really split from Brahmins.

I'm Punjabi Ramgarhia btw.

poi
02-01-2018, 08:56 PM
@jortita @reza -- updated the PCAs and added your scores.

khanabadoshi
02-01-2018, 10:02 PM
@bmoney Because the component is present at significant levels in lots of population groups besides the Baloch. So everything to the right of the y-axis has higher Baloch than everything to the left. You'll notice the Baloch, Makrani, and Brahui are in the 2nd quadrant (not the Baloch quadrant), because this quadrant is high Baloch + high SW Asian/Caucasian etc. The further right you go, the more Baloch. But not even the Baloch get pushed all the way down to the corner, because they aren't 100% of the component. In the 3rd quadrant are people with significant Baloch but low SW Asian/Caucasian, or with significant SI coupled with Baloch. So what's actually defining placement to the right or left of the axis is Baloch vs. SI, and what's defining placement above or below the axis is the other components.

The PCA would turn out differently if he included more populations out of the region as they would force everyone to be bounded in the shape you're used to seeing.

Reza
02-01-2018, 10:21 PM
Thanks!

Pretty much in the epicentre..

poi
02-01-2018, 10:43 PM
Very well said! FYI, the quadrants have shifted after revision 2. I could not figure out how to flip them back, lol.
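
The flipping happens because the sign of each principal axis is arbitrary: negating an axis changes nothing about the distances between points, so adding samples can mirror the plot. One way to keep the orientation stable between revisions, sketched here with a hypothetical helper and invented numbers, is to pick a reference component per axis and flip any axis on which that component's loading is negative.

```python
def orient(loadings, scores, references):
    """Flip each axis whose reference component has a negative loading.

    `references` names one component per axis, e.g. ("S-Indian", "Baloch").
    Loadings and scores are flipped together so the picture stays consistent.
    """
    signs = [1.0 if loadings[ref][axis] >= 0 else -1.0
             for axis, ref in enumerate(references)]

    def flip(vec):
        return tuple(s * v for s, v in zip(signs, vec))

    return ({name: flip(v) for name, v in loadings.items()},
            {name: flip(v) for name, v in scores.items()})

# Invented example: PC1 came out mirrored in this run.
loadings = {"S-Indian": (-0.7, 0.2), "Baloch": (0.3, 0.6)}
scores = {"Tamil": (-5.0, 1.0)}
fixed_loadings, fixed_scores = orient(loadings, scores, ("S-Indian", "Baloch"))
```

The choice of reference components is arbitrary; what matters is using the same ones every time the PCA is regenerated.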

Sapporo
02-01-2018, 11:49 PM
Are the HGDP Pathan or HGDP Sindhi going to be included? Also, poi, what averages did you use for the Harappa Punjabi Khatri and Punjabi Jatt Sikh? The ones from Zack's spreadsheets or the ones from Dr_McNinja's spreadsheets? Or did you combine them?

poi
02-02-2018, 12:06 AM
I have not merged McNinja's or Khana's data yet; I just used the calculator's spreadsheet plus Anthrogenica members.

Basically, my process is documented in a script, so it can be replicated. All populations are categorized by region (e.g. SouthAsia), country, ethnicity, and caste if applicable. If there are multiple Punjabi Khatri data points, they are averaged into one Punjabi-Khatri entry.

My plan is to use this script on any spreadsheet (after categorization) to generate a clean dataset for PCA generation, including color coding etc.

I will put this code on github if it is of interest to anyone.
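
The averaging step described above could look roughly like the following. This is a guess at the idea, not poi's actual script: the function name is hypothetical, the percentages are invented, and treating a component missing from a row as 0 is an assumption.

```python
from collections import defaultdict

def average_by_label(samples):
    """Average admixture rows that share a label; absent components count as 0."""
    grouped = defaultdict(list)
    for label, row in samples:
        grouped[label].append(row)
    averages = {}
    for label, rows in grouped.items():
        components = sorted({name for row in rows for name in row})
        averages[label] = {
            name: sum(row.get(name, 0.0) for row in rows) / len(rows)
            for name in components
        }
    return averages

# Invented rows; two Punjabi-Khatri entries collapse into one average.
samples = [
    ("Punjabi-Khatri", {"Baloch": 40.0, "S-Indian": 30.0}),
    ("Punjabi-Khatri", {"Baloch": 36.0, "S-Indian": 34.0, "NE-Euro": 10.0}),
    ("Tamil", {"S-Indian": 60.0, "Baloch": 30.0}),
]
averages = average_by_label(samples)
```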

Xehanort
02-02-2018, 12:13 AM
Poi, can you plot me as well bro!

Xehanort
02-02-2018, 12:13 AM
I would appreciate a plot. Thanks!

Sapporo
02-02-2018, 12:24 AM
Thanks for the explanation. The only point I wanted to make is not to mix results from Zack's averages (his DIY master version of Harappa) and Gedmatch Harappa (McNinja's spreadsheet). There are some noteworthy differences in the results for various individuals (especially for those after HRP0240) and the overall group averages they create. That is why it is better to just run results through Gedmatch Harappa for even the academic samples (HGDP, Xing, Metspalu, etc.). In fact, these differences are one of the reasons I am not a fan of Harappa besides it being very outdated.

khanabadoshi
02-02-2018, 01:07 AM
Oh this is awesome! So I can make a massive list, and then your script can clean it up and make averages? This will be very good for everyone.

khanabadoshi
02-02-2018, 01:09 AM
How did he get Xing to work? They have so little overlap. I'm going to upload some Xing samples, try them on Harappa, and see if it works. They don't work on many other calculators because they are just so different. If it does work, it means Harappa used very few SNPs or he imputed the Xing files.

Sapporo
02-02-2018, 01:13 AM
Honestly, I have no idea. My guess is he used the limited SNPs or as you suggested, he imputed the Xing files. Therefore, it's possible the Xing results aren't as accurate as the HGDP in comparison.

poi
02-02-2018, 01:48 AM
Yep, plan to create a PCA data generator from any spreadsheet... then we can easily curate data ad-hoc for any type of PCA. We can go nuts but that's ambitious enough lol.

MonkeyDLuffy
02-02-2018, 02:42 AM
Exciting times at Genica’s Hindi section.

Mingle
02-02-2018, 02:54 AM
Interesting PCA. Do you think you could plot me as well? Really curious to see where I stand.

By the way, does anyone know where the 'Pashtun' sample is taken from? It seems to be considerably south of surbakhun.

Kulin
02-02-2018, 02:58 AM
A minor error, but the Sinhalese and Goan samples are labelled with green, instead of purple lol.

poi
02-02-2018, 03:08 AM
Whoops, the categorization for language, which drives this PCA's color coding, is a bit wonky, as it sort of infers the language from geography and tribal affiliation. I need to make things more deterministic eventually. There can't be more than a few hundred ethnic groups, right?
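
A deterministic version could be as simple as an explicit lookup table that refuses to guess. A minimal sketch, assuming the purple/green/orange scheme from the first post; the table contents and names are illustrative, not the actual categorization.

```python
# Explicit group -> language family table (illustrative entries only).
LANGUAGE_FAMILY = {
    "sinhalese": "IA/IE",
    "goan": "IA/IE",
    "tamil": "Dravidian",
    "brahui": "Dravidian",
    "munda": "AA",
}
# Color scheme from the first post.
FAMILY_COLOR = {"IA/IE": "purple", "Dravidian": "green", "AA": "orange"}

def color_for(group):
    """Look the color up; fail loudly instead of inferring from geography."""
    key = group.strip().lower()
    if key not in LANGUAGE_FAMILY:
        raise KeyError(f"no language family recorded for {group!r}")
    return FAMILY_COLOR[LANGUAGE_FAMILY[key]]
```

Raising an error for unlisted groups means a mislabel shows up as a missing table entry to fill in rather than as a wrongly colored point.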

khanabadoshi
02-02-2018, 03:10 AM
I think it's based on the Pathan HGDP samples which are of various scores. I have them all plotted out separately, PM me your kit number or post your Harappa score here and I'll add you to that list as well. If you are comfortable sharing your tribe and where you are from, I'll add that to the name. It'll be labeled like: Country; Province, City/Village - Pashtun [Tribename(s)] Mingle or something like that.

bmoney
02-02-2018, 03:12 AM
Poi, what's with Gujarati A's location?

poi
02-02-2018, 03:14 AM
Your labeling strategy will get rid of 90% of the code... most of it is figuring out geography and language. It might still be useful as a backup or for non-South Asian groups.

Kulin
02-02-2018, 03:14 AM
@poi Haha yes, I don't think there are more than 100 ethnic groups on Harappa.

khanabadoshi
02-02-2018, 03:15 AM
@bmoney This stood out to me as well. Gujarati A and Gujarati Patel should be a lot further apart. But maybe I'm misremembering. Harappa is old; things are going to plot differently than they do now on other calculators.

khanabadoshi
02-02-2018, 03:20 AM
If the code will work better if I make these each their own column, I can do that as well.

I label as precisely as possible so that alphabetical sorting in Excel is sensible and tracks geographic location. I'm constantly editing my nomenclature method. It's hard to make something standard when you have to account for ethnic groups that overlap across countries, or two closely related ethnic groups that are in different provinces. That's why you might notice I started labeling with ASC:: or AS:: or AW:: or AC:: (i.e. Asia South Central, South, West, Central etc.); that ensures that results in a batch are more like each other despite the country, province, or ethnicity name.
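
Labels in this style can be split into columns mechanically, so separate columns are not strictly needed. A rough sketch, assuming the layout "PREFIX::Country; Province, City/Village - Ethnicity [Tribe(s)] Name" mentioned earlier in the thread; the exact separators, the comma between tribe names, and the sample label (including its ASC prefix) are guesses.

```python
import re

REGIONS = {"ASC": "Asia South Central", "AS": "Asia South",
           "AW": "Asia West", "AC": "Asia Central"}

LABEL = re.compile(
    r"^(?:(?P<region>[A-Z]{2,3})::\s*)?"   # optional prefix such as ASC::
    r"(?P<country>[^;]+);\s*"
    r"(?P<province>[^,]+),\s*"
    r"(?P<place>.+?)\s+-\s+"               # city/village, then " - "
    r"(?P<ethnicity>[^\[]+?)\s*"
    r"\[(?P<tribes>[^\]]*)\]\s*"
    r"(?P<member>.*)$"
)

def parse_label(label):
    """Split one label into a dict of metadata fields."""
    match = LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"label does not follow the expected layout: {label!r}")
    fields = match.groupdict()
    fields["region"] = REGIONS.get(fields["region"])
    fields["tribes"] = [t.strip() for t in fields["tribes"].split(",") if t.strip()]
    return fields

# Hypothetical label built from details posted in this thread.
fields = parse_label("ASC::Pakistan; Khyber-Pakhtunkhwa, Swabi - Pashtun [Yousafzai] Mingle")
```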

Mingle
02-02-2018, 03:23 AM
I'm a Yousafzai from Swabi, Khyber-Pakhtunkhwa.



HarappaWorld

Admix Results (sorted):

# Population Percent
1 Baloch 36.46
2 Caucasian 22.17
3 S-Indian 21.82
4 NE-Euro 12.84
5 SW-Asian 3.03
6 NE-Asian 1.42
7 Mediterranean 1.4
8 SE-Asian 0.38
9 American 0.32
10 Papuan 0.1
11 Siberian 0.06

Single Population Sharing:

# Population (source) Distance
1 pashtun (harappa) 5.99
2 pathan (hgdp) 8.04
3 kalash (hgdp) 8.67
4 punjabi-khatri (harappa) 10.16
5 sindhi (harappa) 10.27
6 kashmiri (harappa) 11.2
7 burusho (hgdp) 12.75
8 bhatia (harappa) 12.77
9 punjabi-jatt-sikh (harappa) 12.86
10 punjabi-jatt-muslim (harappa) 13.36
11 haryana-jatt (harappa) 13.59
12 kashmiri-pandit (reich) 14.1
13 gujarati-muslim (harappa) 14.76
14 punjabi (harappa) 15.16
15 punjabi-brahmin (harappa) 15.87
16 punjabi-arain (xing) 16.3
17 singapore-indian-c (sgvp) 16.52
18 up-muslim (harappa) 16.59
19 tajik (yunusbayev) 16.66
20 sindhi (hgdp) 16.86

Mixed Mode Population Sharing:

# Primary Population (source) Secondary Population (source) Distance
1 64.9% kashmiri-pahari (harappa) + 35.1% lezgin (behar) @ 2.29
2 65.5% singapore-indian-c (sgvp) + 34.5% lezgin (behar) @ 2.34
3 64.1% kashmiri-pahari (harappa) + 35.9% urkarah (xing) @ 2.52
4 67.6% punjabi (harappa) + 32.4% lezgin (behar) @ 2.65
5 63.7% punjabi-ramgarhia (harappa) + 36.3% lezgin (behar) @ 2.72
6 78.1% sindhi (harappa) + 21.9% chechen (yunusbayev) @ 2.74
7 76.1% punjabi-khatri (harappa) + 23.9% lezgin (behar) @ 2.85
8 66.8% punjabi (harappa) + 33.2% urkarah (xing) @ 2.97
9 67.9% kashmiri-pahari (harappa) + 32.1% chechen (yunusbayev) @ 2.97
10 68.5% singapore-indian-c (sgvp) + 31.5% chechen (yunusbayev) @ 2.98
11 77.9% punjabi-jatt-sikh (harappa) + 22.1% georgian (harappa) @ 2.98
12 68.8% punjabi-arain (xing) + 31.2% chechen (yunusbayev) @ 3.02
13 76% sindhi (harappa) + 24% lezgin (behar) @ 3.06
14 77.3% punjabi-khatri (harappa) + 22.7% kumyk (yunusbayev) @ 3.07
15 63.7% kashmiri-pahari (harappa) + 36.3% stalskoe (xing) @ 3.1
16 79.7% punjabi-jatt-sikh (harappa) + 20.3% abhkasian (yunusbayev) @ 3.11
17 66.6% punjabi-brahmin (harappa) + 33.4% lezgin (behar) @ 3.12
18 67.1% singapore-indian-c (sgvp) + 32.9% kumyk (yunusbayev) @ 3.12
19 75.4% sindhi (harappa) + 24.6% urkarah (xing) @ 3.13
20 80.3% sindhi (harappa) + 19.7% adygei (hgdp) @ 3.13
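
For anyone unsure how to read the distance columns above: they presumably measure how far the sample's admixture percentages are from a reference population's, and the mixed mode looks for the two-population blend that comes closest. A minimal sketch, assuming a plain Euclidean distance (the calculator's exact method is not stated in this thread) and using invented three-component vectors.

```python
import math

def distance(a, b):
    """Euclidean distance between two admixture vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_mix(sample, pop_a, pop_b):
    """Weight on pop_a (0 to 1) whose blend with pop_b lies closest to the sample."""
    diff = [x - y for x, y in zip(pop_a, pop_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        return 1.0, distance(sample, pop_a)
    # Project the sample onto the line through pop_b and pop_a, then clamp.
    weight = sum((s - y) * d for s, y, d in zip(sample, pop_b, diff)) / denom
    weight = min(1.0, max(0.0, weight))
    mix = [weight * x + (1 - weight) * y for x, y in zip(pop_a, pop_b)]
    return weight, distance(sample, mix)

# Invented vectors, not real population averages.
sample = [50.0, 30.0, 20.0]
pop_a = [60.0, 30.0, 10.0]
pop_b = [30.0, 30.0, 40.0]
single = distance(sample, pop_a)
weight, mixed = best_mix(sample, pop_a, pop_b)
```

This also shows why mixed-mode distances are so much smaller than single-population ones: a blend has an extra free parameter to fit the sample with.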

poi
02-02-2018, 03:26 AM
Awesome, man. Don't worry about what makes it easier for me, but what makes sense in terms of capturing the metadata. You have a good system. The script can easily process as long as the system is well understood. Basically, your labeling strategy is great.

khanabadoshi
02-02-2018, 03:29 AM
Awesome, I have a few other Yusufzai results, so you'll have a good comparison! I might be able to do a quick and dirty comparison in an hour or so; we'll see. Who knows, maybe one of the results I have is already you. I'll check and let you know.

Sapporo
02-02-2018, 03:48 AM
@Mingle poi created the PCA, so he knows. It's either the Pashtun (Harappa) participant average or the HGDP Pathan samples.

@khanabadoshi Sometimes the "upper caste" or diverse Sindh/Baloch-shifted Gujaratis are labeled Gujarati B (I think it was reversed on the Houston samples), but normally they are referred to as Gujarati A on most anthro forums. In addition, the tightly clustered "Patel"-like samples are sometimes labeled Gujarati A.

http://www.harappadna.org/tag/sindhi/

kush
02-02-2018, 04:27 AM
@poi Can you plot me as well? I feel left out haha..
I'm pretty sure where I'll be on the plot, but still curious regardless. Thanks

# Population Percent
1 S-Indian 53.7
2 Baloch 37.19
3 Caucasian 4.32
4 NE-Asian 1.52
5 SE-Asian 1.2
6 SW-Asian 1.1
7 Papuan 0.45
8 American 0.4
9 Mediterranean 0.12

poi
02-02-2018, 04:36 AM
Oh, I thought I added you but guess not. I will add a bunch of people in the next revision. The PCA might shift a bit, though, and the minimum distance lines between populations will certainly adjust accordingly.

khanabadoshi
02-02-2018, 05:09 AM
@kush Which state/province are you from again?

State/Province; City/Village -- Telugu [Reddy] | Kush

kush
02-02-2018, 05:16 AM
State- Telangana
City- Nizamabad

poi
02-02-2018, 06:12 AM
Added Xehanort, Mingle, Kush, and Gyanwali! Check out the first post.

@Xehanort In this Harappa PCA, you are closer to the Punjabi Arains. Hmm.

@Mingle Interestingly, you are closer to the Pashtun sample... and inside the ellipse... I noticed that Surbakhun has less SouthIndian and a bit higher Caucasian and SW-Asian, while you have a tad more NE-Euro. Could that be it? Note that SouthIndian is pulling things towards the 3rd quadrant, while Caucasian (along with other components like NE-Euro, NE-Asian, Siberian, SW-Asian) is pulling things towards the 1st. Looks like your higher SouthIndian is pulling you towards the center, relatively speaking.

@kush Left out no more!

Mingle
02-02-2018, 06:19 AM
Appreciate your work man :) What is the 'Pashtun' sample based off of? Is it the HGDP Pashtun?

bmoney
02-02-2018, 06:24 AM
Amazing work, poi. We should start doing this for other calcs.

There are clear zones here: an East/Bengal zone, a Punjabi-Pahari zone, and Central India, Munda and ASI zones.

BTW, did anyone else notice that the Munda, particularly from Orissa, are the closest pops to Onge, and not Tamil Dalits/ASI?

Must be the SE Asian admixture in Onge, which shows what a poor proxy it is for ASI.

About the Sri Lankan sample (I'm assuming Sinhalese): they are meant to be descendants of the Vanga kingdom of what is now Bengal. The plot shows that they are pretty close to Bengalis.

@Varun you're pretty much an eastern Indian Brahmin; your community must have been straight imports.

poi
02-02-2018, 06:46 AM
I got Pashtun directly from the spreadsheet section in the HarappaWorld Gedmatch calculator. I do not know beyond that, but other members likely can tell you where that sample came from.

poi
02-02-2018, 06:54 AM
Appreciate the words, bro. It is fun doing this. With so many spreadsheets, including metadata rich ones Khana is building, we can do some cool stuff.

I understand that Harappa is like the godfather of South Asian non-academic calculators, but which calculator would be the next most interesting or useful one?

Xehanort
02-02-2018, 08:16 AM
Interesting. Thanks bro. Appreciate it! I think it's my high Baloch on this which gives me an Arain pull.

misanthropy
02-02-2018, 09:09 AM
Admix Results (sorted):

# Population Percent
1 S-Indian 48.9
2 Baloch 30.29
3 Caucasian 6.2
4 SW-Asian 4.51
5 NE-Euro 2.67
6 NE-Asian 1.99
7 E-African 1.9
8 Siberian 1.2
9 American 0.77
10 Papuan 0.51
11 SE-Asian 0.45
12 San 0.35
13 Beringian 0.26

Single Population Sharing:

# Population (source) Distance
1 kerala-christian (harappa) 4.98
2 ap-hyderabad (harappa) 4.99
3 kerala-muslim (harappa) 5.35
4 tamil (harappa) 5.46
5 up (harappa) 5.93
6 bihari (harappa) 6.2
7 rajasthani (harappa) 6.26
8 caribbean-indian (harappa) 6.55
9 andhra-pradesh (harappa) 6.77
10 sri-lankan (harappa) 7.14
11 singapore-indian-b (sgvp) 7.18
12 cochin-jew (behar) 7.31
13 maharashtrian (harappa) 7.47
14 ap-brahmin (xing) 7.49
15 karnataka-brahmin (harappa) 7.56
16 up-muslim (metspalu) 7.67
17 srivastava (reich) 7.89
18 tharu (metspalu) 7.93
19 iyengar-brahmin (harappa) 8.02
20 karnataka (harappa) 8.16

Mixed Mode Population Sharing:

# Primary Population (source) Secondary Population (source) Distance
1 94.4% ap-hyderabad (harappa) + 5.6% ethiopian-jew (behar) @ 2.26
2 94.5% ap-hyderabad (harappa) + 5.5% ethiopian (behar) @ 2.3
3 94.5% ap-hyderabad (harappa) + 5.5% tygray (pagani) @ 2.4
4 93.7% ap-hyderabad (harappa) + 6.3% qatari (henn2012) @ 2.41
5 94.5% ap-hyderabad (harappa) + 5.5% amhara (pagani) @ 2.42
6 94.6% ap-hyderabad (harappa) + 5.4% afar (pagani) @ 2.48
7 95% ap-hyderabad (harappa) + 5% somali (harappa) @ 2.54
8 92.4% tamil (harappa) + 7.6% yemenese (behar) @ 2.54
9 94.3% ap-hyderabad (harappa) + 5.7% bedouin (hgdp) @ 2.58
10 92.9% tamil (harappa) + 7.1% egyptian (behar) @ 2.6
11 94.8% ap-hyderabad (harappa) + 5.2% oromo (pagani) @ 2.6
12 94.5% ap-hyderabad (harappa) + 5.5% saudi (behar) @ 2.67
13 93.8% tamil (harappa) + 6.2% yemen-jew (behar) @ 2.67
14 90.5% andhra-pradesh (harappa) + 9.5% yemenese (behar) @ 2.68
15 93% tamil (harappa) + 7% egypt (henn2012) @ 2.68
16 93.7% tamil (harappa) + 6.3% bedouin (hgdp) @ 2.69
17 90.6% tharu (metspalu) + 9.4% yemen-jew (behar) @ 2.73
18 93.8% tamil (harappa) + 6.2% saudi (behar) @ 2.73
19 95.2% ap-hyderabad (harappa) + 4.8% esomali (pagani) @ 2.74
20 93.3% tamil (harappa) + 6.7% qatari (henn2012) @ 2.74

khanabadoshi
02-02-2018, 10:24 AM
So this is kind of the opposite of Poi's PCA, as I'm not using any of the averages. So look at both to get a full sense of where you place. Yellow Squares are forum members, but I've added very few people so far.

https://i.gyazo.com/1f13c54861b7798e5097458dea38373a.jpg


https://i.gyazo.com/bf5806af4088a2e0a77f99261b4d3ba9.png

Kurd
02-02-2018, 01:18 PM
Harappa's Baloch component is NOT a real Baloch signal but a hybrid Baloch-Indian signal. Not only do Kurds (26 - 30%) score less of it than Indians, but even Durrani Pashtuns, myself, and actual Iranian Baloch (Farid & Zara), who cluster close to Baloch, have actual recent Baloch geneflow, and have Baloch very high up in oracles on other calculators, score less of it than Indians!

Here is Zara (Baloch):

Admix Results (sorted):
# Population Percent
1 Baloch 36.26
2 Caucasian 27.67
3 SW-Asian 12.38
4 S-Indian 11.07
5 NE-Euro 6.49
6 W-African 2.07
7 Siberian 1.58
8 E-African 1.27
9 Pygmy 0.47
10 Beringian 0.45
11 Papuan 0.17
12 American 0.12

Single Population Sharing:
# Population (source) Distance
1 pashtun (harappa) 13.91
2 iranian (behar) 15.45
3 turkmen (yunusbayev) 16.31
4 iranian (harappa) 16.75
5 tajik (yunusbayev) 17.54
6 kurd (harappa) 18.32
7 kalash (hgdp) 18.93
8 pathan (hgdp) 19.44
9 kurd (xing) 19.75
10 makrani (hgdp) 20.38
11 kurd (yunusbayev) 20.88
12 balochi (hgdp) 21.19
13 sindhi (harappa) 21.19
14 punjabi-khatri (harappa) 22.83
15 iraqi-arab (harappa) 23.11
16 bhatia (harappa) 23.25
17 burusho (hgdp) 23.25
18 kashmiri (harappa) 23.66
19 azeri (harappa) 24.32
20 turkish (harappa) 24.38



Here is Farid (Baloch):

Admix Results (sorted):
# Population Percent
1 Baloch 38.22
2 Caucasian 25.21
3 S-Indian 13.87
4 SW-Asian 9.98
5 NE-Euro 6.16
6 W-African 2.42
7 E-African 1.2
8 Siberian 1.17
9 SE-Asian 0.54
10 San 0.5
11 Pygmy 0.5
12 Beringian 0.17
13 Papuan 0.07

Single Population Sharing:
# Population (source) Distance
1 pashtun (harappa) 11.13
2 kalash (hgdp) 14.98
3 pathan (hgdp) 15.5
4 sindhi (harappa) 17.2
5 tajik (yunusbayev) 17.49
6 makrani (hgdp) 17.89
7 balochi (hgdp) 18.12
8 turkmen (yunusbayev) 18.39
9 punjabi-khatri (harappa) 19.02
10 bhatia (harappa) 19.17
11 iranian (behar) 19.31
12 burusho (hgdp) 19.55
13 kashmiri (harappa) 19.9
14 iranian (harappa) 20.27
15 punjabi-jatt-muslim (harappa) 20.97
16 punjabi-jatt-sikh (harappa) 21.22
17 gujarati-muslim (harappa) 21.29
18 sindhi (hgdp) 21.88
19 brahui (hgdp) 21.93
20 kurd (harappa) 21.98



Just to show that Farid and Zara are really Balochis (not advocating Gedmatch calculators for their admixture %s):

MDLP World-22 4-Ancestors Oracle
This program is based on 4-Ancestors Oracle Version 0.96 by Alexandr Burnashev.
Questions about results should be sent to him at: [email protected]
Original concept proposed by Sergey Kozlov.
Many thanks to Alexandr for helping us get this web version developed.

Admix Results (sorted):
# Population Percent
1 West-Asian 43.27
2 Indian 18.15
3 Near_East 11.92
4 Indo-Iranian 6.77
5 North-East-European 6.20
6 Atlantic_Mediterranean_Neolithic 4.58
7 Samoedic 3.47
8 Sub-Saharian 2.28
9 North-European-Mesolithic 1.23


Finished reading population data. 276 populations found.
22 components mode.

--------------------------------

Least-squares method.

Using 1 population approximation:
1 Parsi_derived @ 8.527444
2 Makrani_derived @ 10.857623
3 Balochi_derived @ 12.277985
4 Brahui_derived @ 12.819414
5 Pashtun_derived @ 12.853998

MDLP World-22 4-Ancestors Oracle
This program is based on 4-Ancestors Oracle Version 0.96 by Alexandr Burnashev.
Questions about results should be sent to him at: [email protected]
Original concept proposed by Sergey Kozlov.
Many thanks to Alexandr for helping us get this web version developed.

Admix Results (sorted):
# Population Percent
1 West-Asian 44.28
2 Indian 14.97
3 Near_East 12.56
4 Indo-Iranian 7.60
5 Atlantic_Mediterranean_Neolithic 7.18
6 North-East-European 5.01
7 Samoedic 2.82
8 Sub-Saharian 2.68


Finished reading population data. 276 populations found.
22 components mode.

--------------------------------

Least-squares method.

Using 1 population approximation:
1 Parsi_derived @ 9.383298
2 Makrani_derived @ 12.695770
3 Pashtun_derived @ 14.584200
4 Iranian_derived @ 14.705791
5 Brahui_derived @ 14.792733
6 Balochi_derived @ 15.442040


Edit: It appears that Zack wanted to turn all Indians into Balochis for some reason B)

bmoney
02-02-2018, 01:35 PM
Harappa's Baloch component is NOT a real Baloch signal but a hybrid Baloch-Indian signal. Not only do Kurds (26 - 30%) score less of it than Indians, but even Durrani Pashtuns, myself, and actual Iranian Baloch (Farid & Zara), who cluster close to Baloch, have actual recent Baloch geneflow, and have Baloch very high up in oracles on other calculators, score less of it than Indians!

It's probably reflecting the Iran_N component rather than the recent West Asian geneflow the Baloch have received.

Iran_N is pretty high in the Baloch

Kurd
02-02-2018, 01:51 PM
It's probably reflecting the Iran_N component rather than the recent West Asian geneflow the Baloch have received.

Iran_N is pretty high in the Baloch

I don’t think so, because if that were the case then Farid, Zara, some Kurds and Iranians should still score higher than Indians. Besides, an Iran N component should be built on Iran N references.

poi
02-02-2018, 02:12 PM
Harappa's Baloch component is NOT a real Baloch signal but a hybrid Baloch-Indian signal. Not only do Kurds (26 - 30%) score less of it than Indians, but even Durrani Pashtuns, myself, and actual Iranian Baloch (Farid & Zara), who cluster close to Baloch, have actual recent Baloch geneflow, and have Baloch very high up in oracles on other calculators, score less of it than Indians!
...
Edit: It appears that Zack wanted to turn all Indians into Balochis for some reason B)

Thanks for the clarification about the misnomer Baloch in Harappa. In this PCA, the Balochis are pulled towards Mediterranean, while Gujarati Patels are towards the Baloch.

Are there any other outright component issues like that? Also, would those flaws matter when we are looking at something like a PCA? It could be an issue if some of these component flaws skew some populations too much, but I'm still trying to wrap my head around their effect on relationships.

Mingle
02-02-2018, 03:53 PM
About the Sri Lankan sample (I'm assuming Sinhalese): they are meant to be descendants of the Vanga kingdom of what is now Bengal. The plot shows that they are pretty close to Bengalis.

If Sinhalas are of Bengali descent, then why do they speak a language descended from Maharashtri Prakrit?

MonkeyDLuffy
02-02-2018, 03:57 PM
If Sinhalas are of Bengali descent, then why do they speak a language descended from Maharashtri Prakrit?

Languages are all over the place in the subcontinent tbh. Like the Brahui, who live in such an isolated place surrounded by Iranic languages and yet speak a Dravidian language.

Mingle
02-02-2018, 04:30 PM
Languages are all over the place in the subcontinent tbh. Like the Brahui, who live in such an isolated place surrounded by Iranic languages and yet speak a Dravidian language.

True, but there is generally an explanation for all of that stuff. Are there any explanations for why Sinhalas speak a language descended from Maharashtri Prakrit?

parasar
02-02-2018, 04:36 PM
If Sinhalas are of Bengali descent, then why do they speak a language descended from Maharashtri Prakrit?

The language of the Maharashtra region at the time the Singhalas moved to Lanka was pretty much the same as the language of Bangal. A few centuries after the Singhala move we have the Asokan inscriptions which are tuned to the local dialects and there is essentially no difference. There are a few peculiarities here and there no doubt - for example in the east r becomes l.

Textually (Mahavamsa, Kulavamsa) and genetically the connection has been known for some time.

"A genetic distance analysis by Dr Robert Kirk also concluded that the modern Sinhalese are most closely related to the Bengalis.[3]"

"This is further substantiated by a VNTR study, which found 70-82% of Sinhalese genes to originate from Bengali admixture:[4]"
https://en.wikipedia.org/wiki/Genetic_studies_on_Sinhalese

Another way to look at it is outer band and inner band Indo-Aryan languages. The outer band despite the massive distances at which they are spoken are closer to each other than to the core (sauraseni, braj).

khanabadoshi
02-02-2018, 04:53 PM
It's probably reflecting the Iran_N component rather than the recent West Asian geneflow the Baloch have received.

Iran_N is pretty high in the Baloch


TBH, it looks more like Iran ChL + some SI -- all the Iranians and Western Afghans are in the direct path of the Baloch cline.
The Baloch should be near the Bandari and Parsis, I was not expecting them to be so far apart. Maybe in the 3D PCA they will come closer. This PCA only accounts for 80-some percent of variance. If anything, making this PCA proves that we shouldn't refer to Harappa anymore. Also, in Harappa, a lot of groups outside of South Asia score some significant percentage of SI, so SI is not purely based on Onge or Paniya or something, it's also hybridized.

poi
02-02-2018, 05:10 PM
TBH, it looks more like Iran ChL + some SI -- all the Iranians and Western Afghans are in the direct path of the Baloch cline.
The Baloch should be near the Bandari and Parsis, I was not expecting them to be so far apart. Maybe in the 3D PCA they will come closer. This PCA only accounts for 80-some percent of variance. If anything, making this PCA proves that we shouldn't refer to Harappa anymore. Also, in Harappa, a lot of groups outside of South Asia score some significant percentage of SI, so SI is not purely based on Onge or Paniya or something, it's also hybridized.

Not sure which PCA you mean, but revision3 2D PCA accounted for about 92% (based on these SouthAsian populations up there). I can create a 3D PCA for Harappa, which will account for 96% of the variance.

Overall, though, it makes sense not to read into Harappa alone due to its oddities. Let's move on to more relevant calculator results. I should have the script ready to ingest spreadsheets by this weekend.
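
A rough sketch of where figures like "92% for 2D, 96% for 3D" come from: each principal component's share is its eigenvalue of the covariance matrix divided by the total variance. The code below is not the script used for the plots in this thread; it is a stdlib-only illustration using power iteration, and the function names are made up here. The five rows are the first four HarappaWorld components of samples from khanabadoshi's spreadsheet later in the thread, truncated to keep the example short, so the percentages it prints will not match the plots.

```python
import math

def covariance(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigen(matrix, iters=1000):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    d = len(matrix)
    vec = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    value = 0.0
    for _ in range(iters):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:  # no variance left to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
        value = norm
    return value, vec

def explained_variance(rows, k):
    """Share of total variance carried by each of the first k principal components."""
    cov = covariance(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    ratios = []
    for _ in range(k):
        value, vec = top_eigen(cov)
        ratios.append(value / total)
        # Deflate: subtract the component just found before looking for the next.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return ratios

# S-Indian, Baloch, Caucasian, NE-Euro only (other components dropped for brevity).
ROWS = [
    [14.77, 36.26, 24.84, 14.76],  # ASC::KPK - Ormuri [Burki]
    [29.82, 46.48, 13.42, 7.75],   # ASC::PATHAN 11
    [4.47, 29.11, 26.98, 22.31],   # ASC::TAJIK - YAGNOBI 01
    [11.56, 27.77, 28.45, 11.14],  # ASC::AFG - Qizilbash
    [17.38, 35.00, 19.76, 17.42],  # ASC::KPK; Chitral, Yarkhun, Lasht - KHO 07
]
ratios = explained_variance(ROWS, 3)
print("2 PCs:", round(sum(ratios[:2]), 3), "3 PCs:", round(sum(ratios), 3))
```

Which populations go into the run changes the covariance matrix, so two PCAs of the same calculator can legitimately report different percentages.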

Kurd
02-02-2018, 06:50 PM
Thanks for the clarification about the misnomer Baloch in Harappa. In this PCA, the Balochis are pulled towards Mediterranean, while Gujarati Patels are towards the Baloch.

Are there any other outright component issues like that? Also, would those flaws matter when we are looking at something like a PCA? It could be an issue if some of these component flaws skew some populations too much, but I'm still trying to wrap my head around their effect on relationships.

Sure, it's the domino effect, which results in many components not representing their labels. To get an idea of what each component truly represents, look at the spreadsheet and see which populations score high; that will tell you what the component should really be called.

I believe I'm the only one who designs calculators without throwing hundreds of non-reference samples into the run, except for some of the commercial companies.
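
The spreadsheet check described above (see which populations score highest on a component) can be sketched in a few lines. The three samples and their values are taken from khanabadoshi's spreadsheet later in this thread, limited to the first four components; the helper name `top_scorers` is made up for this example.

```python
# First four HarappaWorld components for three samples from this thread.
COMPONENTS = ["S-Indian", "Baloch", "Caucasian", "NE-Euro"]
SAMPLES = {
    "ASC::PATHAN 11": [29.82, 46.48, 13.42, 7.75],
    "ASC::TAJIK - YAGNOBI 03": [4.14, 28.81, 25.74, 22.53],
    "ASC::AFG - Qizilbash": [11.56, 27.77, 28.45, 11.14],
}

def top_scorers(samples, components, name, n=3):
    """Return the n (label, score) pairs with the highest score on component `name`."""
    idx = components.index(name)
    ranked = sorted(samples.items(), key=lambda kv: kv[1][idx], reverse=True)
    return [(label, scores[idx]) for label, scores in ranked[:n]]

for component in COMPONENTS:
    print(component, top_scorers(SAMPLES, COMPONENTS, component, n=1))
```

Run over the full reference spreadsheet rather than three rows, the top of each list shows which populations actually anchor a component, whatever its label says.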

Xehanort
02-02-2018, 08:34 PM
Edit: It appears that Zack wanted to turn all Indians into Balochis for some reason B)

You need to get off your high horse thinking Balochis are better than everyone. We'd take our Steppe ancestry over the "Baloch" component any day. Tell me, how much Steppe ancestry do Baloch have? Not very, I assume (29% max).

Xehanort
02-02-2018, 08:36 PM
It's probably reflecting the Iran_N component rather than the recent West Asian geneflow the Baloch have received.

Iran_N is pretty high in the Baloch

Yeah, well said! I got a high component here, and turns out that I ended up with a very high Iran Neolithic score (40 to 47%) on all of David's calculators. My Steppe was between 36 and 40%.

Xehanort
02-02-2018, 08:37 PM
TBH, it looks more like Iran ChL + some SI -- all the Iranians and Western Afghans are in the direct path of the Baloch cline.
The Baloch should be near the Bandari and Parsis, I was not expecting them to be so far apart. Maybe in the 3D PCA they will come closer. This PCA only accounts for 80-some percent of variance. If anything, making this PCA proves that we shouldn't refer to Harappa anymore. Also, in Harappa, a lot of groups outside of South Asia score some significant percentage of SI, so SI is not purely based on Onge or Paniya or something, it's also hybridized.

No, I doubt it.

poi
02-02-2018, 08:45 PM
You need to get off your high horse thinking Balochis are better than everyone. We'd take our Steppe ancestry over the "Baloch" component any day. Tell me, how much Steppe ancestry do Baloch have? Not very, I assume (29% max).

I think Kurd was just being playful about Zack's liberal use of the component Baloch in HarappaWorld. We need expert inputs from Kurd and other scientists like him, dude.

Xehanort
02-02-2018, 09:00 PM
I think Kurd was just being playful about Zack's liberal use of the component Baloch in HarappaWorld. We need expert inputs from Kurd and other scientists like him, dude.

Maybe I got the wrong impression from him, so I was a bit angered. I apologize if he said it in a playful way.

misanthropy
02-02-2018, 09:07 PM
You need to get off your high horse thinking Balochis are better than everyone. We'd take our Steppe ancestry over the "Baloch" component any day. Tell me, how much Steppe ancestry do Baloch have? Not very, I assume (29% max).

This is sig worthy 😂👌

khanabadoshi
02-03-2018, 12:04 AM
Not sure which PCA you mean, but revision3 2D PCA accounted for about 92% (based on these SouthAsian populations up there). I can create a 3D PCA for Harappa, which will account for 96% of the variance.

Overall, though, it makes sense not to read into Harappa alone due to its oddities. Let's move on to more relevant calculator results. I should have the script ready to ingest spreadsheets by this weekend.

Yours accounts for more variance; mine was less, at 80-something %.

bmoney
02-03-2018, 12:16 AM
If Sinhalas are of Bengali descent, then why do they speak a language descended from Maharashtri Prakrit?

That's a good point, not sure. But they have no historical attestation to Maharashtra, whereas they do for Vanga.

EDIT: just read parasar's post

bmoney
02-03-2018, 12:21 AM
TBH, it looks more like Iran ChL + some SI -- all the Iranians and Western Afghans are in the direct path of the Baloch cline.
The Baloch should be near the Bandari and Parsis, I was not expecting them to be so far apart. Maybe in the 3D PCA they will come closer. This PCA only accounts for 80-some percent of variance. If anything, making this PCA proves that we shouldn't refer to Harappa anymore. Also, in Harappa, a lot of groups outside of South Asia score some significant percentage of SI, so SI is not purely based on Onge or Paniya or something, it's also hybridized.

SI has an ANE and ENF component for sure, inferring from calculators.

bmoney
02-03-2018, 12:25 AM
You need to get off your high horse thinking Balochis are better than everyone. We'd take our Steppe ancestry over the "Baloch" component any day. Tell me, how much Steppe ancestry do Baloch have? Not very, I assume (29% max).

I'd take Baloch over steppe, dude - some of that sweet sweet basal

Baloch Zindabad (not saying Balochistan for political reasons)

bmoney
02-03-2018, 12:29 AM
Not sure which PCA you mean, but revision3 2D PCA accounted for about 92% (based on these SouthAsian populations up there). I can create a 3D PCA for Harappa, which will account for 96% of the variance.

Overall, though, it makes sense not to read into Harappa alone due to its oddities. Let's move on to more relevant calculator results. I should have the script ready to ingest spreadsheets by this weekend.

I'm interested in an ANE_K7 run or a calc that has ancients as opposed to modern references, because those are more informative for SAs like me with older admixture.

Suggestions on the best ancients calc? Kurd, Khana, Sapporo, anyone else?

Xehanort
02-03-2018, 12:33 AM
I'd take Baloch over steppe, dude - some of that sweet sweet basal

Baloch Zindabad (not saying Balochistan for political reasons)

Yeah, you're right! Gotta love that Basal!

khanabadoshi
02-03-2018, 12:40 AM
@Kakiasumi, Rukha, Mingle, Surbakhun:

POPID  S-Indian  Baloch  Caucasian  NE-Euro  SE-Asian  Siberian  NE-Asian  Papuan  American  Beringian  Mediterranean  SW-Asian  San  E-African  Pygmy  W-African
ASC::KPK - Ormuri [Burki]  14.77  36.26  24.84  14.76  0.29  0.69  0.00  1.22  0.93  1.05  2.41  2.55  0.10  0.00  0.00  0.06
ASC::KPK - Pashtun [Khan] (U02)  20.78  28.68  21.65  11.20  0.89  0.97  1.62  0.10  1.49  0.11  5.10  4.56  0.49  1.82  0.00  0.53
ASC::KPK - Pashtun [Khan] (U04)  21.61  36.41  18.67  11.51  0.00  2.06  1.53  1.10  1.32  1.70  0.00  4.08  0.00  0.00  0.00  0.00
ASC::KPK - Pashtun [Khan] (U66)  21.63  37.04  18.92  12.02  1.03  2.88  0.00  0.00  0.96  1.17  1.31  3.05  0.00  0.00  0.00  0.00
ASC::KPK - Pashtun [Khan] (U73)  31.53  36.13  12.94  11.98  1.32  1.10  0.63  0.27  0.06  1.51  0.88  1.64  0.00  0.00  0.00  0.00
ASC::KPK - Pashtun [Khattak]  21.30  36.50  18.46  13.74  0.37  1.56  0.00  0.52  1.50  1.04  1.66  3.34  0.00  0.00  0.00  0.00
ASC::KPK - Pashtun [Niazi]  23.94  38.19  19.69  10.89  0.00  1.02  0.00  0.67  1.77  0.39  0.47  2.59  0.00  0.00  0.00  0.37
ASC::KPK - Pashtun [Zaman]  31.91  37.19  14.04  9.94  0.00  1.97  0.00  0.87  0.14  0.44  1.18  2.14  0.18  0.00  0.00  0.00
ASC::KPK, Chitral, Ashret - PHALURA 01  24.54  38.77  16.49  13.65  0.56  3.47  0.00  0.12  1.20  0.61  0.00  0.59  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Arandu Gol - ARANDUYIWAR [Savi/Gawar-Bati] 01  28.17  39.61  15.97  8.83  0.00  0.13  1.45  0.24  0.91  0.59  1.97  2.13  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Arandu Gol - ARANDUYIWAR [Savi/Gawar-Bati] 02  31.36  45.35  12.46  6.71  0.00  0.00  0.00  1.31  0.25  0.35  0.00  2.05  0.00  0.00  0.16  0.00
ASC::KPK; Chitral, Boroghol, Chelmarabad - WAKHEK 01  14.36  32.99  18.19  19.53  0.00  3.06  2.68  0.61  3.32  1.25  2.94  1.05  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Boroghol, Chelmarabad - WAKHEK 02  16.70  32.18  19.31  19.24  0.00  2.98  2.44  0.35  1.71  2.65  0.99  1.37  0.08  0.00  0.00  0.00
ASC::KPK; Chitral, Bumboret, Shekhen Deh - NURISTANI [Shekhani] 01  19.18  44.07  19.16  13.28  0.00  1.62  0.00  0.00  1.15  0.62  0.00  0.93  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Bumboret, Shekhen Deh - NURISTANI [Shekhani] 02  21.03  42.32  18.52  13.43  0.00  0.82  0.64  0.16  0.50  2.07  0.00  0.51  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Chitral City - KHO 01 [Mehtar]  20.25  35.76  18.28  15.32  0.00  2.89  2.36  0.00  2.88  0.31  0.00  1.91  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Damir, Dundi Darri - DAMELI 01  21.80  40.41  20.91  12.85  0.00  2.61  0.43  0.00  0.60  0.00  0.00  0.34  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Damir, Musta Japan Deh - DAMELI 02  19.77  39.37  16.01  15.97  0.00  1.50  1.26  0.54  1.41  1.91  1.40  0.83  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Drosh, Kesu - KHO 02 [Singanali]  20.95  37.33  17.29  14.25  0.00  3.42  1.54  0.00  2.28  0.94  0.00  1.98  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Lutkho, Gobor Bokh - YIDGHA or SHEKHANI  17.25  41.55  18.53  15.94  0.00  1.54  0.00  0.00  1.50  2.32  0.00  1.37  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Rambur, Kalashgram - KALASHA Muslim  20.33  41.48  20.01  12.13  0.00  2.09  0.29  0.08  1.26  0.77  0.00  1.55  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Shishi Koh - KHO 03 [Singanali]  20.54  34.79  16.92  16.89  0.00  2.85  2.61  0.00  0.40  3.05  1.45  0.50  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Singhur, Shah Miran Deh - KHO 04  17.37  38.33  17.47  16.10  0.65  2.29  0.64  0.16  2.20  1.68  2.00  1.10  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Turkho, Mizegram - KHO 05  18.18  35.04  19.86  16.17  0.00  2.51  3.16  0.80  1.66  0.80  0.42  1.27  0.00  0.00  0.12  0.00
ASC::KPK; Chitral, Yarkhun, Lasht - KHO 07  17.38  35.00  19.76  17.42  0.00  3.24  5.36  0.00  0.41  0.66  0.00  0.72  0.00  0.00  0.00  0.00
ASC::KPK; Chitral, Yarkhun, Power - KHO 06  15.86  34.72  17.85  18.98  0.00  2.45  4.04  0.00  2.09  1.93  2.00  0.00  0.08  0.00  0.00  0.00
ASC::KPK; FATA - Pashtun [Mohmand]  20.28  38.32  18.24  11.46  0.00  3.53  0.20  0.41  1.70  0.00  3.89  1.80  0.16  0.00  0.00  0.00
ASC::KPK; Swabi - Pashtun [Yusufzai] | Mingle  21.82  36.46  22.17  12.84  0.38  0.06  1.42  0.10  0.32  0.00  1.40  3.03  0.00  0.00  0.00  0.00
ASC::KPK; Swat - Pashtun [Yusufzai]  24.80  35.83  17.21  13.56  0.00  2.38  0.42  0.00  1.54  1.06  2.38  0.51  0.00  0.00  0.00  0.33
ASC::PATHAN [Outlier] 02  11.81  39.78  21.19  11.31  0.00  2.52  5.22  0.16  0.00  0.00  0.67  7.09  0.00  0.26  0.00  0.00
ASC::PATHAN [Outlier] 05  10.51  27.53  13.35  10.26  0.00  12.80  17.28  0.07  0.92  0.34  4.34  2.61  0.00  0.00  0.00  0.00
ASC::PATHAN 01  20.03  42.18  18.41  13.01  0.00  0.21  0.00  0.85  0.23  2.31  0.15  2.62  0.00  0.00  0.00  0.00
ASC::PATHAN 03  25.26  40.27  16.10  11.47  0.15  2.34  0.52  0.00  1.54  0.00  0.00  2.33  0.00  0.00  0.00  0.00
ASC::PATHAN 04  19.50  42.97  13.62  12.85  2.08  1.43  0.08  0.21  0.28  1.12  2.17  3.69  0.00  0.00  0.00  0.00
ASC::PATHAN 06  25.96  42.74  17.46  9.61  0.84  0.00  0.00  0.70  1.14  1.37  0.00  0.19  0.00  0.00  0.00  0.00
ASC::PATHAN 07  20.64  41.74  13.93  13.44  0.88  1.80  0.00  0.49  1.29  0.92  3.13  1.73  0.00  0.00  0.00  0.00
ASC::PATHAN 08  21.23  38.08  20.66  9.91  0.99  2.15  0.66  0.56  2.35  0.29  0.93  2.17  0.00  0.00  0.00  0.00
ASC::PATHAN 09  25.60  41.47  14.63  11.35  0.95  1.27  0.00  0.12  0.64  1.34  0.38  1.74  0.00  0.51  0.00  0.00
ASC::PATHAN 10  26.76  41.34  12.67  12.59  0.00  1.45  0.97  0.35  1.56  0.00  0.81  1.49  0.00  0.00  0.00  0.00
ASC::PATHAN 11  29.82  46.48  13.42  7.75  0.00  0.00  0.00  0.00  0.91  0.28  1.32  0.00  0.00  0.00  0.00  0.00
ASC::PATHAN 12  22.47  41.61  18.45  11.31  2.74  0.85  0.00  0.43  1.53  0.59  0.00  0.00  0.00  0.00  0.00  0.00
ASC::PATHAN 13  34.59  38.09  9.44  8.99  0.00  2.64  0.00  0.63  0.00  1.01  2.53  1.97  0.00  0.00  0.09  0.00
ASC::PATHAN 14  36.10  43.51  9.44  4.37  0.86  1.90  0.00  0.47  0.45  0.00  0.64  2.26  0.00  0.00  0.00  0.00
ASC::PATHAN 15  21.94  43.67  11.58  13.15  0.00  2.38  0.00  0.12  0.44  0.00  2.41  4.30  0.00  0.00  0.00  0.00
ASC::PATHAN 16  18.18  44.23  19.35  11.89  0.00  1.58  0.00  0.00  1.74  0.06  0.00  2.95  0.00  0.00  0.00  0.00
ASC::PATHAN 17  21.37  43.23  12.53  11.47  0.00  2.73  2.30  0.05  1.07  0.00  0.81  4.29  0.00  0.00  0.00  0.17
ASC::PATHAN 18  23.95  43.51  12.64  13.14  0.00  1.59  2.35  0.36  0.00  0.00  0.00  1.98  0.49  0.00  0.00  0.00
ASC::PATHAN 19  21.20  40.40  19.56  12.69  1.56  1.64  0.00  0.00  0.56  0.80  0.00  1.59  0.00  0.00  0.00  0.00
ASC::PATHAN 20  28.53  41.36  12.20  10.64  0.00  2.79  0.16  0.61  0.93  0.74  0.78  1.27  0.00  0.00  0.00  0.00
ASC::PATHAN 21  18.10  43.45  18.75  13.80  1.59  0.26  0.30  0.00  0.91  1.90  0.42  0.51  0.00  0.00  0.00  0.00
ASC::PATHAN 22  21.04  41.21  19.01  11.36  0.00  2.53  1.18  0.31  0.00  1.12  0.07  2.14  0.00  0.00  0.00  0.00
ASC::PATHAN 23  26.32  39.68  16.07  9.38  0.12  0.93  0.85  0.34  2.04  1.07  0.71  2.51  0.00  0.00  0.00  0.00
ASC::PK; BAL, Musa Khel - Pashtun [Kakar]  20.50  37.14  21.07  14.37  0.00  1.08  0.28  0.00  1.19  1.16  0.97  2.17  0.05  0.00  0.00  0.00
ASC::TAJIK - ISHKASHIM 01  12.73  33.47  20.54  19.44  0.00  4.45  3.97  1.11  1.35  0.86  1.42  0.65  0.00  0.00  0.00  0.00
ASC::TAJIK - ISHKASHIM 02  12.60  34.09  21.05  20.33  3.18  0.96  0.00  0.68  1.92  1.55  2.14  1.50  0.00  0.00  0.00  0.00
ASC::TAJIK - ISHKASHIM 03  11.23  33.04  21.59  19.59  0.26  3.99  2.07  0.07  2.33  0.41  4.22  0.82  0.00  0.00  0.37  0.00
ASC::TAJIK - ISHKASHIM 04  14.24  33.60  15.48  23.04  0.00  3.19  3.32  0.96  1.17  0.64  3.61  0.72  0.00  0.00  0.00  0.00
ASC::TAJIK - RUSHAN 01  6.87  32.34  18.38  25.70  0.00  3.69  3.44  0.00  2.16  1.33  3.88  2.21  0.00  0.00  0.00  0.00
ASC::TAJIK - RUSHAN 02  8.66  31.93  20.14  24.05  1.03  3.42  1.79  1.05  0.98  1.11  5.15  0.44  0.24  0.00  0.00  0.00
ASC::TAJIK - RUSHAN 03  9.04  33.54  18.99  22.07  1.56  2.58  2.22  0.08  2.03  1.97  5.56  0.37  0.00  0.00  0.00  0.00
ASC::TAJIK - RUSHAN 04  8.64  29.73  21.64  22.69  0.00  4.19  3.13  0.00  1.01  1.13  4.67  3.18  0.00  0.00  0.00  0.00
ASC::TAJIK - SHUGNAN 01  9.93  33.26  20.31  22.54  0.00  4.15  1.61  0.19  2.49  0.52  4.58  0.22  0.00  0.18  0.00  0.00
ASC::TAJIK - SHUGNAN 02  9.26  30.41  21.47  20.45  0.12  5.02  4.60  0.07  1.52  1.42  3.90  1.75  0.00  0.00  0.00  0.00
ASC::TAJIK - YAGNOBI 01  4.47  29.11  26.98  22.31  0.00  3.16  2.55  0.31  1.25  1.00  4.70  4.11  0.00  0.00  0.00  0.00
ASC::TAJIK - YAGNOBI 02  4.66  30.63  26.66  21.92  0.00  3.48  2.32  0.00  1.36  0.00  5.42  3.57  0.00  0.00  0.00  0.00
ASC::TAJIK - YAGNOBI 03  4.14  28.81  25.74  22.53  0.64  3.45  1.97  0.68  1.18  0.97  5.99  3.88  0.00  0.00  0.00  0.00
ASC::TAJIKISTANI 01  5.27  29.74  22.72  16.55  0.00  4.46  7.10  0.00  1.82  0.55  6.10  5.31  0.00  0.38  0.00  0.00
ASC::TAJIKISTANI 02  8.87  23.96  20.12  15.80  0.36  6.73  11.15  0.46  1.77  1.88  5.46  3.36  0.00  0.06  0.00  0.00
ASC::TAJIKISTANI 03  6.48  28.08  23.75  14.93  0.00  5.16  6.97  1.12  1.59  2.23  5.74  3.52  0.12  0.32  0.00  0.00
ASC::TAJIKISTANI 04  7.72  28.95  22.63  13.88  0.00  5.47  9.34  0.06  1.51  0.87  6.18  3.38  0.00  0.00  0.00  0.00
ASC::AFG - Pashtun [Achakzai] + ⅛ Tajik  21.90  33.91  17.51  11.29  1.31  1.90  5.23  0.00  2.24  1.03  1.80  1.57  0.27  0.00  0.00  0.00
ASC::AFG - Pashtun [Amarkhel] + Hazara  15.04  32.91  19.39  13.16  2.21  6.54  2.35  0.19  0.00  1.80  2.38  4.03  0.00  0.00  0.00  0.00
ASC::AFG - Pashtun [Barak]  17.04  36.04  22.65  11.88  1.49  2.19  0.00  0.81  0.32  1.72  2.97  2.89  0.00  0.00  0.00  0.00
ASC::AFG - Pashtun [Barakzai]  20.68  35.32  20.30  12.58  1.31  0.42  2.51  0.00  1.30  0.73  2.01  2.85  0.00  0.00  0.00  0.00
ASC::AFG - Pashtun [Mohammadzai]  14.72  34.19  26.43  12.85  0.00  2.10  1.19  0.00  0.98  1.08  2.72  3.15  0.00  0.00  0.00  0.59
ASC::AFG - Pashtun [Tscharki]  13.25  31.27  24.70  14.15  0.00  3.01  3.19  0.00  0.00  1.21  4.24  4.99  0.00  0.00  0.00  0.00
ASC::AFG - Pashtun/Tajik [Nassiri]  16.01  34.14  21.55  11.74  0.00  4.35  4.69  0.00  1.47  0.85  1.35  3.84  0.00  0.00  0.00  0.00
ASC::AFG - Qizilbash  11.56  27.77  28.45  11.14  0.77  4.31  4.80  0.08  0.81  1.36  1.57  7.16  0.23  0.00  0.00  0.00
ASC::AFG - Tajik [⅞ Qizilbash + ⅛ Iranian]  15.02  29.87  25.10  11.76  0.00  4.40  6.29  0.00  0.45  0.00  3.16  3.94  0.00  0.00  0.00  0.00
ASC::AFG - Tajik [Ahmedi]  24.39  35.14  18.23  10.64  0.00  0.67  3.91  0.46  1.65  1.06  1.53  1.56  0.18  0.48  0.07  0.00
ASC::AFG - Tajik [Ghazizadeh]  20.88  32.20  16.56  9.45  0.21  3.61  6.81  0.33  1.68  1.22  2.92  3.86  0.00  0.00  0.00  0.27
ASC::AFG - Tajik [Syed: Shah]  16.61  35.90  22.95  10.98  0.17  4.26  1.85  0.00  0.86  1.64  1.68  3.01  0.00  0.00  0.00  0.08
ASC::AFG - Tajik [Yousefzada]  30.69  36.95  14.00  11.99  0.00  1.23  0.00  0.59  1.18  1.92  1.33  0.09  0.00  0.00  0.00  0.00
ASC::AFG: PANJSHIR - Tajik [Korshidzada] + ¾ Samarqandi  17.02  37.79  22.21  12.22  0.99  2.44  0.59  0.82  0.85  0.25  1.26  3.26  0.00  0.00  0.00  0.29
ASC::AFG; HELMAND, Gereshk - Tajik [Syed: Naqwi]  13.93  31.68  21.54  9.69  0.11  5.01  6.00  0.00  0.83  0.00  2.72  6.46  0.00  0.82  0.00  1.21
ASC::AFG; HERAT - Tajik [Syed: Heravi]  13.13  29.25  23.07  10.25  0.60  5.25  4.07  0.55  0.96  1.02  5.31  6.29  0.11  0.00  0.14  0.00
ASC::AFG; KANDAHAR - Pashtun [Ayubi]  15.66  36.50  23.44  14.70  0.05  2.00  1.35  0.23  0.78  0.70  0.34  4.24  0.00  0.00  0.00  0.00
ASC::AFG; KAPISA, Kohistan - Tajik [Rahmati/Bukhari]  14.42  29.84  22.32  13.40  1.21  3.73  4.91  0.00  2.08  1.26  3.64  2.71  0.24  0.00  0.00  0.23
ASC::AFG; KUNDUZ - Pashtun [Mohmand]  18.46  34.19  20.39  13.76  0.00  2.38  2.55  1.19  1.64  0.45  2.39  2.58  0.00  0.00  0.00  0.00
ASC::AFG; KUNDUZ - Tajik [Syed]  7.56  23.12  19.57  13.49  2.74  8.22  11.36  0.00  2.67  0.54  5.02  5.66  0.00  0.00  0.00  0.05
ASC::AFG; LAGHMAN - Pashtun/Tajik  22.20  34.92  20.96  10.72  0.00  0.77  1.53  0.31  0.94  1.85  2.06  3.62  0.12  0.00  0.00  0.00
ASC::AFG; LOGAR - Pashtun [Sahak/Ahmadzai]  15.29  35.12  19.43  15.33  0.00  2.51  1.72  0.09  1.33  0.78  2.77  5.28  0.00  0.00  0.00  0.34
ASC::AFG; PANJSHIR - Tajik + Pashtun [Wardak] | Rukha  15.63  35.45  23.10  15.57  0.00  1.18  2.37  0.56  2.70  0.00  0.18  2.81  0.00  0.00  0.00  0.45

khanabadoshi
02-03-2018, 12:41 AM
I'm interested in an ANE_K7 run or a calc that has ancients as opposed to modern references, because those are more informative for SAs like me with older admixture.

Suggestions on the best ancients calc? Kurd, Khana, Sapporo, anyone else?

I like Davidski's Basal-Rich K7

Xehanort
02-03-2018, 12:55 AM
I like Davidski's Basal-Rich K7

Davidski is the man for these things.

bmoney
02-03-2018, 01:00 AM
I like Davidski's Basal-Rich K7

I'm interested in that, does Davidski have reference pops we can use for a PCA too?

khanabadoshi
02-03-2018, 01:04 AM
I'm interested in that, does Davidski have reference pops we can use for a PCA too?

Yes. I made a PCA for the BRK7 and his Global 10 both. I still have the files somewhere. More people tested on the Global 10, and he has more references for it, but the BRK7 is actually a calculator while the Global 10 is coordinates for a genome-wide PCA (I think).

EDIT:
You'll have to zoom in like crazy, it's a massive PDF: https://1drv.ms/b/s!Am_OOQX_t-4D3B5IxG2kEMnEVRoa

Dendrogram: https://1drv.ms/b/s!Am_OOQX_t-4D3By4s2p_fKesz9Iv

bmoney
02-03-2018, 01:26 AM
Yes. I made a PCA for the BRK7 and his Global 10 both. I still have the files somewhere. More people tested on the Global 10, and he has more references for it, but the BRK7 is actually a calculator while the Global 10 is coordinates for a genome-wide PCA (I think).

EDIT:
You'll have to zoom in like crazy, it's a massive PDF: https://1drv.ms/b/s!Am_OOQX_t-4D3B5IxG2kEMnEVRoa

Dendrogram: https://1drv.ms/b/s!Am_OOQX_t-4D3By4s2p_fKesz9Iv

fascinating - so Brahuis are the closest living relatives to Iran_N

HGDP Pathans and Kalash overlap

Gujarati A is basically Sindhi

Burusho and UP Brahmin are cousins - maybe the z93 wave came via the Hunza valley

Chenchu Chamar Gujarati D and Velamas form a group

Kurd
02-03-2018, 01:26 AM
I'm interested in an ANE_K7 run or a calc that has ancients as opposed to modern references, because those are more informative for SAs like me with older admixture.

Suggestions on the best ancients calc? Kurd, Khana, Sapporo, anyone else?

The K12 is the most accurate ancient based calculator with regards to admixture percentages because the component labels and component allele frequencies are not skewed/tainted by the 100s of non-references you see included in other ADMIXTURE based tests. The following should be kept in mind when doing a PCA:

1- Imperative to run PCA on 23 V4 and V3 only
2- Combine MLBA and EMBA comp into 1 comp
3- Try to equalize number of samples from various regions as much as practical


The K29 is the most accurate high K modern based ADMIXTURE calculator for the aforementioned reasons. Would be interesting to see a PCA for it.

Edit: with the K12, the component allele frequencies for the non-E Asian comps are based 100% on ancients. With some other ancient calculators out there, the Iran N comp allele frequencies, for example, are not based 100% on ancient Iranians only, but rather on a combination of ancients and modern SC Asians.

I was tempted to do this in the K12 to mitigate the underestimation of Iran N due to the limited number of samples as compared with EEF, but resisted the temptation in order to keep the comps purely based on ancients.
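
Points 2 and 3 of the checklist above (merge MLBA and EMBA into one component, and even out the number of samples per region) can be sketched as a small preprocessing step before the PCA. Everything named here is hypothetical: the merged label `Steppe_BA`, the `region` field, the sample scores, and both helper functions are illustrations, not part of the K12 or of any script used in this thread.

```python
import random
from collections import defaultdict

def merge_components(scores, parts, merged_name):
    """Return a copy of `scores` with the components in `parts` summed into one."""
    merged = {k: v for k, v in scores.items() if k not in parts}
    merged[merged_name] = sum(scores.get(p, 0.0) for p in parts)
    return merged

def balance_by_region(samples, per_region, seed=0):
    """Keep at most `per_region` randomly chosen samples from each region."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_region = defaultdict(list)
    for sample in samples:
        by_region[sample["region"]].append(sample)
    balanced = []
    for region in sorted(by_region):
        group = by_region[region]
        balanced.extend(group if len(group) <= per_region
                        else rng.sample(group, per_region))
    return balanced

# Hypothetical scores for one sample, for illustration only.
one = {"MLBA": 10.0, "EMBA": 15.0, "Iran_N": 40.0}
print(merge_components(one, ("MLBA", "EMBA"), "Steppe_BA"))

# Hypothetical sample list: five from region "A", two from region "B".
samples = [{"id": i, "region": "A"} for i in range(5)]
samples += [{"id": 5 + i, "region": "B"} for i in range(2)]
print(len(balance_by_region(samples, per_region=2)))
```

Capping each region rather than duplicating the small ones avoids inventing data, at the cost of discarding samples from the well-covered regions.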

khanabadoshi
02-03-2018, 01:28 AM
The K12 is the most accurate ancient based calculator with regards to admixture percentages because the component labels and component allele frequencies are not skewed/tainted by the 100s of non-references you see included in other ADMIXTURE based tests. The following should be kept in mind when doing a PCA:

1- Imperative to run PCA on 23 V4 and V3 only
2- Combine MLBA and EMBA comp into 1 comp
3- Try to equalize number of samples from various regions as much as practical


The K29 is the most accurate high K modern based ADMIXTURE calculator for the aforementioned reasons. Would be interesting to see a PCA for it.

Will get on the PCA :)

khanabadoshi
02-03-2018, 01:29 AM
fascinating - so Brahuis are the closest living relatives to Iran_N

HGDP Pathans and Kalash overlap

Gujarati A is basically Sindhi

Burusho and UP Brahmin are cousins - maybe the z93 wave came via the Hunza valley

Chenchu Chamar Gujarati D and Velamas form a group

Depends on the calculator, on many calcs the Bandari are the closest, then the Makrani. If I remember, it's the case for this one too, but it's only obvious in 3d pca. (but I might be mistaken).

Kurd
02-03-2018, 01:47 AM
fascinating - so Brahuis are the closest living relatives to Iran_N

HGDP Pathans and Kalash overlap

Gujarati A is basically Sindhi

Burusho and UP Brahmin are cousins - maybe the z93 wave came via the Hunza valley

Chenchu Chamar Gujarati D and Velamas form a group


No ADMIXTURE based calculator can ever reliably settle this question. A much more sensitive test is required, such as the 1 to 1 derived allele test I have been working on for the past couple of months. There’s absolutely no test like it anywhere. I have put the script together but have been doing QC for the past several weeks. It does not give a percentage but rather a likeness score of the test sample to a target sample or population. It takes into account many more of the factors that contribute to errors and biases when processing genomes.

It will be made available via GenePlaza when ready to roll out.

bmoney
02-03-2018, 01:47 AM
The K12 is the most accurate ancient based calculator with regards to admixture percentages because the component labels and component allele frequencies are not skewed/tainted by the 100s of non-references you see included in other ADMIXTURE based tests. The following should be kept in mind when doing a PCA:

1- Imperative to run PCA on 23 V4 and V3 only
2- Combine MLBA and EMBA comp into 1 comp
3- Try to equalize number of samples from various regions as much as practical


The K29 is the most accurate high K modern based ADMIXTURE calculator for the aforementioned reasons. Would be interesting to see a PCA for it.

Edit: with the K12 the component allele frequencies for the non E Asian comps are based 100% on ancients, whereas with some other ancient calculators out there for ex Iran N comp allele frequencies is not based 100% on ancient Iranians only, but rather is based on a combination of ancients and modern SC Asians

I was tempted to do this in the K12 to mitigate the underestimation of Iran N due to limited number of samples as compared with EEF, but resisted the temptation in order to keep comps purely based on ancients

What do you mean by non-references? SNPs not indicative of ancestry?

bmoney
02-03-2018, 01:50 AM
No ADMIXTURE based calculator can ever reliably settle this question. A much more sensitive test is required such as the 1 to 1 derived allele test I have been working on for the past couple of months. There’s absolutely no test like it anywhere. I have put the script together but have been doing QC for the past several weeks. It does not give a percentage but rather a test sample likeness score to a target sample or population. It takes into account many more factors that contribute to errors and biases processing genomes.

It will be made available via GenePlaza when ready to roll out.

Cant w8

yes i see what you mean - a similar assortment of markers doesn't necessarily indicate genesis from a particular ancient sample, it's more related to similarity between groups

Xehanort
02-03-2018, 01:53 AM
The K12 is the most accurate ancient based calculator with regards to admixture percentages because the component labels and component allele frequencies are not skewed/tainted by the 100s of non-references you see included in other ADMIXTURE based tests. The following should be kept in mind when doing a PCA:

1- Imperative to run PCA on 23 V4 and V3 only
2- Combine MLBA and EMBA comp into 1 comp
3- Try to equalize number of samples from various regions as much as practical


The K29 is the most accurate high K modern based ADMIXTURE calculator for the aforementioned reasons. Would be interesting to see a PCA for it.

Edit: with the K12 the component allele frequencies for the non E Asian comps are based 100% on ancients, whereas with some other ancient calculators out there for ex Iran N comp allele frequencies is not based 100% on ancient Iranians only, but rather is based on a combination of ancients and modern SC Asians

I was tempted to do this in the K12 to mitigate the underestimation of Iran N due to limited number of samples as compared with EEF, but resisted the temptation in order to keep comps purely based on ancients

Edited!

Kurd
02-03-2018, 02:01 AM
What do you mean by non-references? SNPs not indicative of ancestry?


In all Gedmatch calculators and other ADMIXTURE based tests out there, designers load the run up with 100s of samples that contribute to the component allele frequencies. For example, in a supervised run they will designate perhaps 10 component references for the calculator components and label the components accordingly. The problem is that if, for example, it's a Gedrosian component and you have 10 Gedrosian references, the other non-reference samples in the run (Indians etc.) will stack up under that comp, since it lines up with them better than E Asian or Caucasian does. When they stack up, they turn the comp from a Gedrosian one into a hybrid Gedrosian/Indian comp. That causes an avalanche effect, because now the comp becomes less attractive to, say, Iranians and thus pushes some of their % to Caucasian. This in turn causes some S Euros, for example, to get pushed out of Caucasian, and so on. That is why neither I nor Ancestry DNA load up our runs with many non-ref samples.

poi
02-03-2018, 02:24 AM
Will get on the PCA :)
Bro, do you have the latest k29?

bmoney
02-03-2018, 02:28 AM
In all Gedmatch calculators and other ADMIXTURE based tests out there designers load the run up with 100s of samples that contribute to the component allele frequencies. For ex in a supervised run they will designate perhaps 10 component references for the calculator components and label the components accordingly. The problem is for ex if its a Gedrosian component and you have 10 Gedrosian references, the other non-references in the run Indians etc will stack up under that comp since it better lines up with them than E Asian or Caucasian. The problem is when they stack up they turn the comp from a Gedrosian to a hybrid Gedrosian/Indian comp. That causes an avalanche effect because now the comp becomes less attractive to say Iranians and thus pushes some of their % to Caucasian. This causes some S Euros for ex to get pushed out of Caucasian and so on. That is why I or Ancestry DNA don’t load up our runs with many non-ref samples

Ah ok gotcha - yes makes sense

So in reality Iranians would score higher of actual Baloch

I was of the impression that Iranians had higher levels of more recent Caucasus derived ancestry than Iran_N - but i probably extrapolated from Harappa to get that idea

khanabadoshi
02-03-2018, 02:34 AM
Bro, do you have the latest k29?

TBH, I'm not sure LOL. I curate here and there. Lemme check.

Xehanort
02-03-2018, 02:44 AM
Ah ok gotcha - yes makes sense

So in reality Iranians would score higher of actual Baloch

I was of the impression that Iranians had higher levels of more recent Caucasus derived ancestry than Iran_N - but i probably extrapolated from Harappa to get that idea

No, Iranians are Iran Chalcolithic not Iran Neolithic. Only Bandaris have significant Iran Neolithic ancestry.

khanabadoshi
02-03-2018, 02:48 AM
*The post has since been removed*



Everyone has different opinions regarding the accuracy of a calculator or the methods of a calculator designer. You are entitled to your opinion of disliking his calculators, however, there is no need to be unnecessarily rude regarding his work. Irrespective of whether or not you agree with his methods, he does put an inordinate amount of time into delving into autosomal genetics, with a specific focus on South Asians (an effort you'd be hard-pressed to substitute with another). He has made dozens of calculators for free before ever charging for one, and I believe he is charging only because Gedmatch cannot host his type of calculator so he must resort to GenePlaza.

Normally, I wouldn't get involved in these types of disagreements or posts, however, I'm a moderator now. I feel it's appropriate to nip this tangent in the bud before it gets too riley.

If you have one-on-one issues with his methods you want to discuss, I suggest you PM him. If you wish to dispute his methods in a thread, then argue against the methodology, not the man.

Kurd
02-03-2018, 03:14 AM
Per the Anthrogenica ToS:

3.8 Tangents from thread topics are an organic feature of discussion and are to be expected, but threads should remain reasonably on-topic. The administration may split, delete, merge, or create new threads without notice should a need for such maintenance arise.

3.12 Anthrogenica encourages its members to participate in discussions in a topic-focused manner. Personalization of discussions is completely prohibited at all times. This includes (and is not limited to) direct personal attacks, accusations, insinuations and false disclosures. Additionally, discussions that degenerate into inconsequential flaming or inanity will be deleted without prior notice.
Note that this discussion policy also applies to Anthrogenica's Private Messaging and Visitor Message functions.

Violating posts have been removed.

Xehanort
02-03-2018, 03:44 AM
Per the Anthrogenica ToS:

3.8 Tangents from thread topics are an organic feature of discussion and are to be expected, but threads should remain reasonably on-topic. The administration may split, delete, merge, or create new threads without notice should a need for such maintenance arise.

3.12 Anthrogenica encourages its members to participate in discussions in a topic-focused manner.
Personalization of discussions is completely prohibited at all times. This includes (and is not limited to) direct personal attacks, accusations, insinuations and false disclosures. Additionally, discussions that degenerate into inconsequential flaming or inanity will be deleted without prior notice.
Note that this discussion policy also applies to Anthrogenica's Private Messaging and Visitor Message functions.

Violating posts have been removed.

Xehanort
02-03-2018, 03:55 AM
Edited.

Xehanort
02-03-2018, 04:02 AM
Edited.

Xehanort
02-03-2018, 04:04 AM
Edited.

khanabadoshi
02-03-2018, 04:05 AM
Consider this fair warning. This thread will be monitored.
Please remain on topic.

Xehanort
02-03-2018, 04:08 AM
Consider this fair warning. This thread will be monitored.
Please remain on topic.



Edited.

Xehanort
02-03-2018, 04:10 AM
I have removed my posts.

khanabadoshi
02-03-2018, 04:11 AM
I have removed my posts.

Thank you. I appreciate it.

Xehanort
02-03-2018, 04:12 AM
Thank you. I appreciate it.

No problem bro.

poi
02-03-2018, 04:12 AM
Okay, back to the topic of harappa PCA. I want to plot some Chitralis next.

poi
02-03-2018, 08:05 AM
Here is the brand new PCA (revision 4) with the Chitrali, Afghan, and Tajik populations that Khana posted above:

Notes:

The biplot, interestingly, has not changed
The 1st quadrant is still Caucasian+NE-Euro+NE-Asian+Siberian+SW-Asian pull, probably even more so with the addition of Tajiks/Afghans
Almost all SouthAsian populations have shifted towards SouthIndian (3rd quadrant) and Baloch (4th quadrant)


Biplot:
21188
https://i.imgur.com/6qg2o2U.png

Overall:
21191
https://i.imgur.com/qauOHv6.png

Members zoomed in:
21190
https://i.imgur.com/8HCIw7U.png

poi
02-03-2018, 07:40 PM
Okay, the last revision, hopefully: added @Misanthrophy, as well as Turkmen and Iranian averages.

Biplot - Caucasian has a major pull towards the 1st quadrant, while SouthIndian and Baloch are pulling things to the 3rd and 4th quadrants respectively.
21209
https://i.imgur.com/ABd52gEg.png

Overall
21210
https://i.imgur.com/US0c4DW.png

Zoomed in
21211
https://i.imgur.com/wsSkpjbg.png

---

3D PCA - note that the 2D PC1+PC2 account for over 92% of the variance, so the 3rd dimension may not be that important. Still, here it is.

https://plot.ly/~mpan19/19.embed
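
For anyone who wants to try this kind of plot themselves, here is a minimal sketch of a PCA on a table of admixture percentages, including the share of variance each PC explains and the component loadings that a biplot draws as arrows. The population names and percentages are made up for illustration, and centering without scaling is an assumption; this is not necessarily how the plots above were produced.

```python
import numpy as np

# Made-up admixture percentages (each row sums to 100).
# These are NOT real HarappaWorld results; they only illustrate the mechanics.
components = ["S-Indian", "Baloch", "Caucasian", "NE-Euro"]
populations = ["PopA", "PopB", "PopC", "PopD", "PopE", "PopF"]
X = np.array([
    [60.0, 30.0,  8.0,  2.0],
    [58.0, 30.0,  2.0, 10.0],
    [20.0, 70.0,  8.0,  2.0],
    [25.0, 60.0,  5.0, 10.0],
    [80.0, 10.0,  5.0,  5.0],
    [40.0, 45.0, 10.0,  5.0],
])

def pca(table):
    """PCA via SVD on the column-centered table (assumption: no scaling)."""
    centered = table - table.mean(axis=0)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U * S                        # coordinates of each population on each PC
    explained = S**2 / np.sum(S**2)       # fraction of total variance per PC
    return scores, Vt, explained

scores, loadings, explained = pca(X)

print("Variance explained by PC1+PC2: {:.1%}".format(explained[:2].sum()))

# Points for the 2D scatter plot
for name, (pc1, pc2) in zip(populations, scores[:, :2]):
    print("{:5s} PC1={:8.2f} PC2={:8.2f}".format(name, pc1, pc2))

# Biplot arrows: how each admixture component loads on PC1 and PC2
for name, (l1, l2) in zip(components, loadings[:2].T):
    print("{:10s} PC1={:6.2f} PC2={:6.2f}".format(name, l1, l2))
```

The sign of a PC is arbitrary, which is one reason a plot can appear flipped between revisions even when the underlying structure has not changed.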

Xehanort
02-03-2018, 07:54 PM
Okay, the last revision, hopefully: added @Misanthrophy, as well as Turkmen and Iranian averages.

Biplot - Caucasian has a major pull towards the 1st quadrant, while SouthIndian and Baloch are pulling things to the 3rd and 4th quadrants respectively.
21209
https://i.imgur.com/ABd52gEg.png

Overall
21210
https://i.imgur.com/US0c4DW.png

Zoomed in
21211
https://i.imgur.com/wsSkpjbg.png

Thanks Poi. I think my Baloch shift on this pulls me very close to Arains. In other tests, I cluster with Sindhis. This isn't weird at all though, considering my heavy Iran Neolithic (Baloch) admixture in comparison to my steppe proportions (Caucasian+NE Euro).

Mingle
02-03-2018, 11:32 PM
Okay, the last revision, hopefully: added @Misanthrophy, as well as Turkmen and Iranian averages.

Biplot - Caucasian has a major pull towards the 1st quadrant, while SouthIndian and Baloch are pulling things to the 3rd and 4th quadrants respectively.
21209
https://i.imgur.com/ABd52gEg.png

Overall
21210
https://i.imgur.com/US0c4DW.png

Zoomed in
21211
https://i.imgur.com/wsSkpjbg.png

---

3D PCA - note that the 2D PC1+PC2 account for over 92% of the variance, so the 3rd dimension may not be that important. Still, here it is.

https://plot.ly/~mpan19/19.embed

Great work, the 3D model looks very fascinating :) I'd be curious how different this would look if we did one with another calculator. I would love to give it a go but I'm 100% illiterate in this kind of stuff. Anyways, that is another topic to discuss.

What I found interesting was how far away the Iran sample was from the Baloch sample and how close the Turkmen sample was to it. I'm guessing this is because the Iranian sample was from Northwestern Iran. A Coastal Persian (Iranian Bandari) would probably be closer to Baloches than to Turkmens. Anatolian Turks are also pretty close to Turkmens and Northwest Persians as well. Maybe you can add an Iranian Bandari sample if you want. An Uzbek one would be interesting too as they tend to cluster with Tajikistanis from what I recall.

All the Tajik and Pashtun samples seem to be above the x-axis line (with the one exception being 'PATHAN-avg'). That really makes the chart easy to follow and look more organized. By the way, do you know why the 'PATHAN-avg' is an outlier if it is supposed to be an average? And do you know which part of Afghanistan the 'AFG-Tajik-Avg' is from?

I'm a bit confused on how your x-axis vs. y-axis works. How come tribals and some South Indians are in the northwest quadrant whereas Brahmin South Indians and Bengalis tend to be in the southwest quadrant? Shouldn't it be the other way around with Brahmins in the north and tribals in the south?

Sapporo
02-04-2018, 12:32 AM
All the Tajik and Pashtun samples seem to be above the x-axis line (with the one exception being 'PATHAN-avg'). That really makes the chart easy to follow and look more organized. By the way, do you know why the 'PATHAN-avg' is an outlier if it is supposed to be an average? And do you know which part of Afghanistan the 'AFG-Tajik-Avg' is from?


I'd note that it isn't just the HGDP Pathan who are under the x-axis but the HGDP Burusho, HGDP Kalash and KPK Chitral Nuristani average. Now, they might score minimally more SI (like 1-2%) on Harappa than some of the populations above the x-axis but a better explanation is these samples are very old and thus their results were calculated on very old chips. More importantly, however, is that they are pulled toward the HGDP Pakistani Baloch/Brahui cluster since the majority of them score very high Baloch (40-46%; including those Nuristani samples khana managed to get) while scoring slightly lower Caucasus (15-19%). Since the Baloch/Brahui cluster is pulled in a specific direction below the x-axis, these groups have a notable pull toward that cluster.

On a separate note, we have seen the results of two Iranian Baloch on this forum. They seem to be much more Near Eastern/Iran ChL shifted than the Pakistani Baloch. On Harappa, they score much lower Baloch and much higher Caucasus than the HGDP Baloch.

bmoney
02-04-2018, 12:48 AM
I'd note that it isn't just the HGDP Pathan who are under the x-axis but the HGDP Burusho, HGDP Kalash and KPK Chitral Nuristani average. Now, they might score minimally more SI (like 1-2%) on Harappa than some of the populations above the x-axis but a better explanation is these samples are very old and thus their results were calculated on very old chips. More importantly, however, is that they are pulled toward the HGDP Pakistani Baloch/Brahui cluster since the majority of them score very high Baloch (40-46%; including those Nuristani samples khana managed to get) while scoring slightly lower Caucasus (15-19%). Since the Baloch/Brahui cluster is pulled in a specific direction below the x-axis, these groups have a notable pull toward that cluster.

On a separate note, we have seen the results of two Iranian Baloch on this forum. They seem to be much more Near Eastern/Iran ChL shifted than the Pakistani Baloch. On Harappa, they score much lower Baloch and much higher Caucasus than the HGDP Baloch.

I think the Iranian Baloch are a distinct pop from the Pakistani Baloch, who are Iranicized Brahui. You'll notice a pull towards the north compared to the Brahui, due to recent West Asian admixture from a pop like the Iranian Baloch, who provided the language shift.

Poi, can we do a basal rich k7 run or ane_k7 run if no one's interested in purchasing the basal k7, which looks pretty good IMO?

We have pop average spreadsheets for both

@Davidski group discount?

Kurd
02-04-2018, 02:14 AM
Great work, the 3D model looks very fascinating :) I'd be curious how different this would look if we did one with another calculator. I would love to give it a go but I'm 100% illiterate in this kind of stuff. Anyways, that is another topic to discuss.

What I found interesting was how far away the Iran sample was from the Baloch sample and how close the Turkmen sample was to it. I'm guessing this is because the Iranian sample was from Northwestern Iran. A Coastal Persian (Iranian Bandari) would probably be closer to Baloches than to Turkmens. Anatolian Turks are also pretty close to Turkmens and Northwest Persians as well. Maybe you can add an Iranian Bandari sample if you want. An Uzbek one would be interesting too as they tend to cluster with Tajikistanis from what I recall.

All the Tajik and Pashtun samples seem to be above the x-axis line (with the one exception being 'PATHAN-avg'). That really makes the chart easy to follow and look more organized. By the way, do you know why the 'PATHAN-avg' is an outlier if it is supposed to be an average? And do you know which part of Afghanistan the 'AFG-Tajik-Avg' is from?

I'm a bit confused on how your x-axis vs. y-axis works. How come tribals and some South Indians are in the northwest quadrant whereas Brahmin South Indians and Bengalis tend to be in the southwest quadrant? Shouldn't it be the other way around with Brahmins in the north and tribals in the south?


Admixture percentage based PCAs are one of the worst tools to use if you are looking for an accurate estimate of genetic distance between 2 samples, and here is why:

1- You can't capture all variation in 2 dimensions. For example, Iranian and Turkmen may look close, but that is because the axis that captures E Asian variation, which may be shown in the PC3-PC4 axes, is hidden from your view (I did not run the PCA myself, so it's an educated guess based on the PCAs I've generated). So if you were to look at a plot of PC3-PC4, they may not be that close. Remember you have to add the distances from all PCs (axes): PC1/2, PC3/4, PC5/6, etc. So in effect even an oracle distance measurement would convey more accurate info than a simple glance at PC1/2, because the oracle distance adds up the differences from ALL the components and conveys that info to you as ONE number.

2- Component references are even trickier, whether it be oracles or PCAs. Here is why. Imagine you declare Balochis as 100% Balochi. If you now throw Balochis into oracles or PCAs, they are going to look like they are from Mars, because their score would be 100%. Mathematically they will have a HUGE oracle distance from everyone else because all their other comps would be 0, since PCAs and oracles are sensitive to A-A', B-B', C-C', etc., where each letter represents a component score.
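
To make point 1 concrete, here is a small sketch comparing the distance between two populations as seen on PC1/PC2 only with the distance over all components. The populations and percentages are hypothetical, and the "oracle-style" distance is assumed here to be a plain Euclidean distance over the component scores.

```python
import numpy as np

# Hypothetical admixture percentages (rows sum to 100); not from any real calculator.
populations = ["PopA", "PopB", "PopC", "PopD", "PopE", "PopF"]
X = np.array([
    [60.0, 30.0,  8.0,  2.0],
    [58.0, 30.0,  2.0, 10.0],
    [20.0, 70.0,  8.0,  2.0],
    [25.0, 60.0,  5.0, 10.0],
    [80.0, 10.0,  5.0,  5.0],
    [40.0, 45.0, 10.0,  5.0],
])

# PCA via SVD of the centered table; rows of `scores` are PC coordinates.
centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
scores = U * S

def euclid(M, i, j, k=None):
    """Euclidean distance between rows i and j using the first k columns (all if None)."""
    diff = M[i, :k] - M[j, :k]
    return float(np.sqrt(np.sum(diff ** 2)))

i, j = 0, 1  # PopA vs PopB
d_2d = euclid(scores, i, j, k=2)     # what a PC1/PC2 plot shows
d_all_pcs = euclid(scores, i, j)     # distance summed over every PC
d_oracle = euclid(X, i, j)           # assumed oracle-style distance over raw components

print("PC1/PC2 only :", round(d_2d, 2))
print("all PCs      :", round(d_all_pcs, 2))
print("oracle-style :", round(d_oracle, 2))
```

Because PCA only centers and rotates the data, the distance over all PCs equals the distance over the raw component scores, while the PC1/PC2 distance can only be the same or smaller. Two points that look close in 2D may therefore still be far apart on the hidden axes.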


Just to give you an idea how off things are in a PCA: here are 20 HGDP Brahuis. What I did was tabulate their actual alleles for each position in the genome. I wrote a script to compare their alleles at every single position in the genome and report to me ONLY the positions where there was allele agreement between ALL 20 Brahuis.

So for example, if 18 of them were C/C for position rs1234 and 2 of the Brahuis were T/T for that position, I trashed that position (I removed the position from the analysis). Why did I do this? Because I wanted to make 100% sure that I only included positions in the analysis where ALL 20 Brahuis had allele agreement. Why? Because I wanted to make sure I was only including alleles that are SPECIFIC to Brahuis (in effect, where Brahuis have an allele frequency of 100%).

After extracting positions where ALL 20 Brahuis agreed on a certain allele, I was left with only 1541 positions. So now I took those 1541 positions with Brahui allele assignments and compared them to a few dozen populations, and documented how many alleles each population shared with them.
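
The filtering and counting steps described above can be sketched roughly as follows. This is not the actual script; the data layout (a dict of genotype calls per sample), the toy SNP IDs, and the idea that a population's score is the average count over its samples (suggested by the fractional values in the table) are all assumptions for illustration.

```python
def fixed_sites(reference_samples):
    """Keep only SNPs where every reference sample has the identical genotype call.

    reference_samples: {sample_id: {snp_id: genotype}}, e.g. {"B1": {"rs1": "CC"}}
    Returns {snp_id: genotype} for the sites with full agreement.
    """
    samples = list(reference_samples.values())
    common = set(samples[0])
    for calls in samples[1:]:
        common &= set(calls)              # SNP must be called in every sample
    fixed = {}
    for snp in common:
        genotypes = {calls[snp] for calls in samples}
        if len(genotypes) == 1:           # all samples agree, so keep the site
            fixed[snp] = genotypes.pop()
    return fixed

def mean_sharing(fixed, population):
    """Average number of fixed sites at which a population's samples carry the same call."""
    counts = [
        sum(1 for snp, genotype in fixed.items() if calls.get(snp) == genotype)
        for calls in population.values()
    ]
    return sum(counts) / len(counts)

# Toy data: 3 hypothetical reference samples, 3 hypothetical SNPs
reference = {
    "B1": {"rs1": "CC", "rs2": "TT", "rs3": "AA"},
    "B2": {"rs1": "CC", "rs2": "CC", "rs3": "AA"},   # disagrees at rs2, so rs2 is dropped
    "B3": {"rs1": "CC", "rs2": "TT", "rs3": "AA"},
}
other_pop = {
    "X1": {"rs1": "CC", "rs3": "AG"},
    "X2": {"rs1": "CC", "rs3": "AA"},
}

fixed = fixed_sites(reference)
print(sorted(fixed.items()))
print(mean_sharing(fixed, other_pop))
```

By construction the reference group scores the full number of retained sites against itself, which matches the 1,541.00 in the first row of the table.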

I tabulated and sorted the table from highest shared alleles on top:



Population          Unadjusted sharing of Hom-Ref Alleles (0/0)
20 BRAHUIS          1,541.00
Balochi             1,424.58
Kalash              1,415.54
Kurds_Kurmanji      1,404.80
Sindhi              1,404.67
Pathan              1,404.47
Abkhasian           1,403.44
Iran_Zoroastrian    1,403.14
Georgian            1,401.00
Iran_Fars           1,400.59
Punjabi             1,400.50
Iranian             1,399.68
Armenian            1,399.40
Kurds_Feyli         1,399.20
Assyrian            1,397.82
GujaratiD           1,395.80
Jew_iraqi           1,394.33
Kumyk               1,390.88
Tajik               1,389.00
Chechen             1,386.56
North_Ossetian      1,386.30
Balkar              1,379.80
Druze               1,379.59
Saudi               1,379.00
Turkish             1,378.12
Greek               1,372.57
BedouinB            1,370.24
Romanian            1,370.10
English             1,369.50
Jew_Ashkenazi       1,361.29
Icelandic           1,360.42
Jordanian           1,359.11
French              1,358.39
Norwegian           1,357.82
Sardinian           1,355.52
Ukrainian           1,354.89
Estonian            1,353.60
Nogai               1,353.44
Finnish             1,352.57
Belarusian          1,352.38
Scottish            1,347.50
Chuvash             1,344.70
Stuttgart           1,339.00
Uzbek               1,335.50
Uygur               1,332.60
Nganasan            1,291.89
Han                 1,284.69
Ulchi               1,275.00
Somali              1,208.54
Khomani             1,147.50




This is where researchers would stop the analysis, but not me. My experience with genotyping had indicated unequal representation of populations in the Hg19 Human Reference, so I decided to go the extra mile and investigate the Build 37/Hg19 Human Reference. If the anonymous volunteers that contributed to Hg19 represent the various populations unequally, then that would theoretically bias the genotyping of the HGDP Brahuis towards populations with greater representation in Hg19.

So how do you go about figuring out how much each population has contributed to the Hg19 Human Reference if that information is secret? What I did was take the above populations and see how many alleles each shared with Hg19, because then I could come up with a coefficient called Beta that would level the playing field for all populations by mitigating the differences in contribution to Hg19.

After multiplying the coefficient by the number of alleles shared above, I came up with this adjusted table, also sorted. Keep in mind this is not something you will see published anywhere.
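
As a quick check of the arithmetic, the adjustment step can be sketched as adjusted = unadjusted × β. The figures below are copied from a few rows of the two tables in this post; how β itself is derived from the Hg19 comparison is not spelled out here, so the sketch only covers the multiplication.

```python
# (unadjusted sharing, beta) for a few populations, as posted in the tables
rows = {
    "20 BRAHUIS": (1541.00, 1.02),
    "Sindhi":     (1404.67, 1.02262),
    "Kalash":     (1415.54, 1.01191),
    "Somali":     (1208.54, 1.05000),
}

# Apply the correction factor and round to two decimals, as in the posted table
adjusted = {pop: round(shared * beta, 2) for pop, (shared, beta) in rows.items()}

# Sort from highest adjusted sharing to lowest
for pop, value in sorted(adjusted.items(), key=lambda item: item[1], reverse=True):
    print("{:12s} {:10,.2f}".format(pop, value))
```

These rows reproduce the posted adjusted values. Not every row does under this simple product (the listed β for Balochi and Punjabi do not give their listed adjusted values), so treat this as an illustration of the step rather than a reconstruction of the full table.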

Since we are talking Homozygous Reference sites only, the discrepancies in representation are not that bad. They are much worse when it comes to heterozygous sites (0/1s); however, unlike some of the other pops I have analyzed, there was not much agreement in het sites among the Brahuis, and thus none were used in the analysis. I have some high coverage Brahuis and Baloch from the Simons Genome Diversity Project (SGDP), with a few million SNPs genotyped, that will yield more accurate results, but for now here is the adjusted table. Again, keep in mind you'll not see an adjusted table anywhere else. It's not ideal because variant sites were not included, but I intend to include them when I use the SGDP data in the future.




Population          Correction Factor (β)    Adjusted sharing of Hom-Ref Alleles (0/0)
20 BRAHUIS          1.02                     1,571.82
Balochi             1.03059                  1,443.34
Punjabi             1.01081                  1,439.97
Sindhi              1.02262                  1,436.44
Pathan              1.02141                  1,434.55
GujaratiD           1.02632                  1,432.54
Kalash              1.01191                  1,432.40
Iran_Fars           1.01750                  1,425.10
Kurds_Kurmanji      1.01383                  1,424.22
Kurds_Feyli         1.01715                  1,423.20
Abkhasian           1.01385                  1,422.88
Iranian             1.01575                  1,421.73
Georgian            1.01327                  1,419.59
Armenian            1.01406                  1,419.08
Tajik               1.02066                  1,417.70
Kumyk               1.01841                  1,416.48
Assyrian            1.01148                  1,413.87
North_Ossetian      1.01774                  1,410.90
Balkar              1.02116                  1,409.00
Jew_iraqi           1.00978                  1,407.96
Iran_Zoroastrian    1.00212                  1,406.11
Chechen             1.01247                  1,403.85
Nogai               1.03298                  1,398.07
Turkish             1.01187                  1,394.48
Druze               1.00847                  1,391.27
Greek               1.01214                  1,389.24
Romanian            1.01372                  1,388.90
Uygur               1.04206                  1,388.65
Saudi               1.00488                  1,385.73
Uzbek               1.03571                  1,383.19
English             1.00905                  1,381.89
Chuvash             1.02673                  1,380.64
Jew_Ashkenazi       1.01378                  1,380.05
Jordanian           1.01367                  1,377.69
Ukrainian           1.01609                  1,376.69
French              1.01035                  1,372.45
BedouinB            1.00121                  1,371.90
Belarusian          1.01433                  1,371.76
Finnish             1.01397                  1,371.47
Icelandic           1.00735                  1,370.42
Norwegian           1.00921                  1,370.33
Estonian            1.01157                  1,369.27
Stuttgart           1.01753                  1,362.47
Scottish            1.00857                  1,359.05
Sardinian           1.00082                  1,356.63
Han                 1.02204                  1,313.01
Nganasan            1.00669                  1,300.53
Ulchi               1.00675                  1,283.61
Somali              1.05000                  1,268.97
Khomani             1.01465                  1,164.31

MonkeyDLuffy
02-04-2018, 02:21 AM
Kurd, when you use Punjabi samples, do you use PJL or those samples include us the Punjabi forum members as well?

Kurd
02-04-2018, 02:28 AM
Kurd, when you use Punjabi samples, do you use PJL or those samples include us the Punjabi forum members as well?

I used a mix of you guys and the PJL, but keep in mind that this analysis does not include 0/1 heterozygous sites because I didn't have any left with the HGDP Brahuis. Those would make a difference too, based on others I have done, but I believe if I use the SGDP Brahuis I will have plenty of 0/1 sites. The only thing is SGDP only has about 4 samples.

Mingle
02-04-2018, 02:44 AM
I'd note that it isn't just the HGDP Pathan who are under the x-axis but the HGDP Burusho, HGDP Kalash and KPK Chitral Nuristani average. Now, they might score minimally more SI (like 1-2%) on Harappa than some of the populations above the x-axis but a better explanation is these samples are very old and thus their results were calculated on very old chips.

Interesting, what do you mean by that?

If newer chips come out in 2020, then will my 23andMe data be invalid (like how the HGDP Pathans are somewhat invalid) and then will I have to take another DNA test to have more up to date DNA data?


On a separate note, we have seen the results of two Iranian Baloch on this forum. They seem to be much more Near Eastern/Iran ChL shifted than the Pakistani Baloch. On Harappa, they score much lower Baloch and much higher Caucasus than the HGDP Baloch.

Do you know which part of Iranian Balochistan they were from?

Mingle
02-04-2018, 02:47 AM
I think the Iranian Baloch are a distinct pop from the Pakistani Baloch, who are Iranicized Brahui. You'll notice a pull towards the north compared to the Brahui, due to recent West Asian admixture from a pop like the Iranian Baloch, who provided the language shift.

Poi, can we do a Basal-rich K7 run, or an ane_k7 run if no one is interested in purchasing the Basal K7, which looks pretty good IMO?

We have pop average spreadsheets for both

@Davidski group discount?

I wouldn't say that they are Balochified Brahuis just yet. The Brahuis are also said to be descended from Dravidian-speaking migrants from Central India or so. Based on geography, it would make sense for Pakistan Baloches to shift slightly more south and for Iran Baloches to shift slightly more west. The Iran Baloches are probably a buffer between Pakistan Baloches and their Persian neighbors immediately west of them. If we had more Persian samples from the eastern part of Iran, it would solve a lot of questions.

Sapporo
02-04-2018, 02:58 AM
Interesting, what do you mean by that?

If newer chips come out in 2020, then will my 23andMe data be invalid (like how the HGDP Pathans are somewhat invalid) and then will I have to take another DNA test to have more up to date DNA data?

What I mean is their results may be less accurate when you do a direct comparison to people who were tested on newer 23andMe chips like the V4 or V5. Also, you have to take into account that many of the Gedmatch calculators are quite old and work better for people tested on older 23andMe chips like V3 or FTDNA. Note, I'm not suggesting a V4 or V5 chip will give you extremely inaccurate results but V3 will tend to be more accurate.

The HGDP samples are originally from 2009 I believe. I'm not sure how many SNPs were used to analyze their data either. Their results aren't necessarily invalid. I'm just suggesting they can't be compared directly to users who used V5 chips from 23andMe without taking into account that there might be slight differences because of technology, platform used, etc.

Older results don't necessarily become invalid when new technology comes out. It's just that improved technology and new discoveries in the genetics world improves the accuracy of results.




Do you know which part of Iranian Balochistan they were from?

No, I don't recall. They are both forum members here, though, just inactive. Kurd has more details on their origins, so you can probably PM him.

Mingle
02-04-2018, 03:03 AM
What I mean is their results may be less accurate when you do a direct comparison to people who were tested on newer 23andMe chips like the V4 or V5. Also, you have to take into account that many of the Gedmatch calculators are quite old and work better for people tested on older 23andMe chips like V3 or FTDNA.

The HGDP samples are originally from 2009 I believe. I'm not sure how many SNPs were used to analyze their data either. Their results aren't necessarily invalid. I'm just suggesting they can't be compared directly to users who used V5 chips from 23andMe without taking into account that there might be slight differences because of technology, platform used, etc.

Older results don't necessarily become invalid when new technology comes out. It's just that improved technology and new discoveries in the genetics world improves the accuracy of results.

When newer technology comes out, would I have to retake a DNA test to increase accuracy, or would they be able to somehow update my DNA sample on the computer using my raw data?

Kurd
02-04-2018, 03:09 AM
Guys, just to show you the effect of using heterozygous sites (less common alleles), 0/1, from a high-coverage Simons Genome Diversity Project (SGDP) Abkhazian sample with a few million genotyped SNPs.

First, with no adjustment factor to compensate for Hg19 biases due to unequal representation of populations. This is what all research, including publications, is based on. Sorted with the highest sharing with the Abkhazian on top.



SAMPLE              0/1 ALLELE SHARING UNADJUSTED
Abkhazian_SGDP      23627
.Sapporo            9510
.Kaido              9382
.Kurds              9375
.Parasar            9369
.Kurds_Feyli        9364
.Pashtun_Afg1       9356
N Italian           9351
.Kurds_Feyli        9343
.Kurds_Kur          9335
.Sein               9334
.Kurds_Kur          9324
.Punjabi_Rgh        9316
.Jatt_Har393        9313
.Kurds_Feyli        9312
.Kurds_Feyli        9311
.Balq               9311
.Pashtun_Afg2       9290
.Jesus              9278
.Jam                9276
.Greek_AT           9264
.Kurds_Kur          9254
.Kurds_Feyli        9224
.Kurds_Kur          9214
Normandy            9209
.Reza               9203
.Targum             9195
.Kurds_Kur          9192
.NK19191            9180
NW European         9173
.Hanif              9157
.Arain_Sheikh       9141
.Khana              9134
NW European         9124
English             9095
.Hanna              9090
.Kurds_Feyli        9088
.Punjabi_Rajput     9085
Romanian            9056
.Kurds_Feyli        9050
German              9049
British             9049
.Zephyrous          8987
.Varun              8924
.Kurds_Feyli        8753
Somali              8663
.Sadia              8045




Now with my proprietary adjustment. Which makes more sense to you: the unadjusted ranking above, or the one here with my Hg19 adjustment coefficient included? The trouble is that I don't know of any other researcher or paper where adjustments such as these are made.



SAMPLE              0/1 ALLELE SHARING ADJUSTED
Abkhazian_SGDP      42,209.71
.Kurds_Feyli        18,947.04
.Kurds_Feyli        18,904.55
.Kurds_Feyli        18,841.83
.Kurds_Feyli        18,839.80
.Kurds              18,759.44
.Greek_AT           18,711.18
.Kurds_Kur          18,679.40
.Kurds_Feyli        18,663.77
.Kurds_Kur          18,657.39
Normandy            18,632.76
.Kurds_Kur          18,517.32
.Targum             18,515.84
.Kurds_Kur          18,437.28
.Kurds_Kur          18,393.26
.Kurds_Feyli        18,388.59
.Kurds_Feyli        18,311.70
Romanian            18,283.09
.NK19191            18,255.48
NW European         18,237.02
NW European         18,139.60
English             18,081.95
German              17,990.49
British             17,990.49
N Italian           17,948.71
.Hanna              17,926.61
.Kaido              17,901.28
.Pashtun_Afg1       17,851.67
.Varun              17,741.98
.Pashtun_Afg2       17,725.74
.Kurds_Feyli        17,710.75
.Sapporo            17,575.11
.Sein               17,249.85
.Punjabi_Rgh        17,216.59
.Jatt_Har393        17,211.04
.Balq               17,207.34
.Parasar            17,186.26
.Jesus              17,146.36
.Jam                17,142.66
.Hanif              16,922.74
.Arain_Sheikh       16,893.17
.Reza               16,881.76
.Khana              16,880.24
.Punjabi_Rajput     16,789.68
.Zephyrous          16,485.53
.Sadia              14,867.69
Somali              12,216.85
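
The adjustment itself is not described in the post, but one thing readers can check from the two lists is the ratio of adjusted to unadjusted score per sample. Here is a minimal sketch using a few rows copied from the tables above (only samples whose label appears once in each list, so the pairing is unambiguous). The `implied_coefficient` helper is a name introduced here, and the ratio is only what the posted numbers imply, not the actual coefficient formula.

```python
# (unadjusted, adjusted) 0/1 sharing scores copied from the two tables above
scores = {
    "Abkhazian_SGDP": (23627, 42209.71),
    ".Sapporo": (9510, 17575.11),
    ".Kaido": (9382, 17901.28),
    ".Khana": (9134, 16880.24),
    ".Sadia": (8045, 14867.69),
    "Somali": (8663, 12216.85),
}


def implied_coefficient(unadjusted, adjusted):
    """Ratio implied by the posted numbers (hypothetical helper, not the real formula)."""
    return adjusted / unadjusted


ratios = {name: implied_coefficient(u, a) for name, (u, a) in scores.items()}

# Print samples from the largest implied ratio to the smallest
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<16} {ratio:.3f}")
```

The implied ratio is not the same for every sample, which is why the order of the list changes between the unadjusted and adjusted versions.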

Kurd
02-04-2018, 03:31 AM
I think the Iranian Baloch are a distinct pop from the Pakistani Baloch, who are Iranicized Brahui. You'll notice a pull towards the north compared to the Brahui, due to recent West Asian admixture from a pop like the Iranian Baloch, who provided the language shift.


Actually, the 2 Iranian Baloch I posted have the same amount of S Indian as the Pak Baloch according to the Harappa calculator. The difference is higher Baloch score for the Pakistani Baloch samples, but that can simply be a result of inbreeding. Of course if Baloch score decreases something else has to go up to compensate. The other thing I've noticed with the 2 I posted is consistently higher SSA admix than the Pak ones.

I think you have that backwards with regard to the 2 I posted being Iranicized Brahui. Actually, the Pakistani Baloch are slightly Punjabified or Sindhized Baloch. Remember that they moved from the NW Iran area to the Iran Balochistan area and subsequently east to Pakistan. The same sort of thing happened to the Kurds who moved from W Iran to the Khash area of Iranian Balochistan during the reign of Shah Abbas (late 1500s) to control the out-of-control Baloch tribes in that area and collect taxes for the king, for example the Soharbzai and Ghulam Shahzai Kurds. Many of them voluntarily crossed over to Pakistan or were displaced to the other side of Bolan by powerful tribes such as the Yar Ahmadzai Baloch.

Sapporo
02-04-2018, 03:35 AM
When newer technology comes out, would I have to retake a DNA test to increase accuracy, or would they be able to somehow update my DNA sample on the computer using my raw data?

You will likely need to take one of the newer tests if the chip you tested on becomes outdated. For example, my V3 is outdated but still useful for outdated Gedmatch calculators. Currently, the 23andMe V5 and Living DNA are the most current chips. I've been told that FTDNA is updating their chip shortly to match up with 23andMe's V5 and Living DNA. khana is more informed on this than myself.

MonkeyDLuffy
02-04-2018, 03:45 AM
Guys, just to show you the effect of using heterozygous sites (less common alleles), 0/1, from a high-coverage Simons Genome Diversity Project (SGDP) Abkhazian sample with a few million genotyped SNPs.




For anyone looking at the list, I am Punjabi_Rgh.

pegasus
02-04-2018, 03:48 AM
I wouldn't say that they are Balochified Brahuis just yet. The Brahuis are also said to be descended from Dravidian-speaking migrants from Central India or so. Based on geography, it would make sense for Pakistan Baloches to shift slightly more south and for Iran Baloches to shift slightly more west. The Iran Baloches are probably a buffer between Pakistan Baloches and their Persian neighbors immediately west of them. If we had more Persian samples from the eastern part of Iran, it would solve a lot of questions.

The Dravidian language they speak is a late one, not an archaic branch, and genetically they are not related to Central Indians. I agree with Bmoney: the Iranian Baloch show much more of an Iranian shift, while the Pak Baloch and Brohi populations look like what they are, largely descendants of Iran_N farmers. They are like the Sardinians of the region in some ways. Yes, there is a lack of Eastern Iranian samples; it would be interesting to see where people from, say, Bam, or Sistani Persians, fall on the cline.

Mingle
02-04-2018, 03:54 AM
The Dravidian language they speak is a late one, not an archaic branch, and genetically they are not related to Central Indians. I agree with Bmoney: the Iranian Baloch show much more of an Iranian shift, while the Pak Baloch and Brohi populations look like what they are, largely descendants of Iran_N farmers. They are like the Sardinians of the region in some ways. Yes, there is a lack of Eastern Iranian samples; it would be interesting to see where people from, say, Bam, or Sistani Persians, fall on the cline.

I didn't mean to imply I thought that they were genetically related to Central Indians. I only mentioned that to make a point that the Brahuis aren't really any more native to Balochistan than the Baloch are.

poi
02-04-2018, 04:15 AM
If newer chips come out in 2020, then will my 23andMe data be invalid ... and then will I have to take another DNA test to have more up to date DNA data?


Bad news: yes, by 2020, old 23andMe data will likely not work well with new calculators.
Good news: by 2020, whole genome sequencing will probably be so affordable that you'd be able to walk to your local pharmacy and get your DNA sequenced in an hour. Okay, maybe not within an hour at a local pharmacy, but definitely so much more affordable that "new chips" will no longer be about SNP counts but about the speed and cost of sequencing. Once you're fully sequenced, you can do whatever you want... extract the 23andMe version 1 SNPs if you want. Correct me if I'm wrong here.

Meanwhile, while we are waiting for Walgreens to catch up, you can check this chart to see how to maximize SNPs using multiple companies:

https://isogg.org/wiki/Autosomal_SNP_comparison_chart
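
To make the two ideas concrete, here is a rough sketch: pulling an old chip's SNP set out of a full sequence, and pooling raw data from several companies to maximize SNP count. Everything here is hypothetical: the function names, the toy rsIDs and genotypes, and the rule that the first file wins when two files disagree. Real raw-data files also need parsing and strand/build checks that are skipped here.

```python
def extract_chip_snps(genome_calls, chip_snp_ids):
    """Keep only the calls at SNPs that a given chip would have tested."""
    return {snp: genome_calls[snp] for snp in chip_snp_ids if snp in genome_calls}


def merge_raw_data(*files):
    """Union of SNP calls from several raw-data files; earlier files win on conflicts."""
    merged = {}
    for calls in files:
        for snp, genotype in calls.items():
            merged.setdefault(snp, genotype)
    return merged


# Toy data: rsIDs and genotypes are made up
genome = {"rs1": "AA", "rs2": "AG", "rs3": "CC", "rs4": "TT"}
old_chip = ["rs1", "rs3", "rs9"]  # rs9 stands for a SNP missing from the sequence
company_a = {"rs1": "AA", "rs2": "AG"}
company_b = {"rs2": "AG", "rs4": "TT"}

subset = extract_chip_snps(genome, old_chip)
combined = merge_raw_data(company_a, company_b)
print("chip subset:", subset)
print("SNPs after merging two files:", len(combined))
```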

Mingle
02-04-2018, 04:42 AM
Bad news: yes, by 2020, old 23andMe data will likely not work well with new calculators.
Good news: by 2020, whole genome sequencing will probably be so affordable that you'd be able to walk to your local pharmacy and get your DNA sequenced in an hour. Okay, maybe not within an hour at a local pharmacy, but definitely so much more affordable that "new chips" will no longer be about SNP counts but about the speed and cost of sequencing. Once you're fully sequenced, you can do whatever you want... extract the 23andMe version 1 SNPs if you want. Correct me if I'm wrong here.

Meanwhile, while we are waiting for Walgreens to catch up, you can check this chart to see how to maximize SNPs using multiple companies:

https://isogg.org/wiki/Autosomal_SNP_comparison_chart

I was worried that I would have to order a whole new 23andMe kit in a few years in order to increase accuracy of my results. This is great news if true. :)

poi
02-04-2018, 04:46 AM
Guys, just to show you the effect of using heterozygous sites (less common alleles), 0/1, from a high-coverage Simons Genome Diversity Project (SGDP) Abkhazian sample with a few million genotyped SNPs.




@Kurd - do you have a rough timetable/estimate/guesstimate for the new calculators/GenePlaza apps you're releasing this year?

bmoney
02-05-2018, 03:07 AM
I wouldn't say that they are Balochified Brahuis just yet. The Brahuis are also said to be descended from Dravidian-speaking migrants from Central India or so. Based on geography, it would make sense for Pakistan Baloches to shift slightly more south and for Iran Baloches to shift slightly more west. The Iran Baloches are probably a buffer between Pakistan Baloches and their Persian neighbors immediately west of them. If we had more Persian samples from the eastern part of Iran, it would solve a lot of questions.

I've heard that theory, but considering the large ASI + Munda levels in Central India (along with Y-hgs H and O), which virtually do not exist in the Brahui, it seems more likely to me that the Brahui/Baloch are an ancient isolated pop who have been in the region for millennia, given how isolated they are from everyone else in plots.

Also the reason I say Iranicized Brahui is that the Pakistani Baloch and Brahui are genetically the same and plot the same. So either the Brahui are Dravidianised Baloch (Central India theory) which genetics do not support, or the Baloch are Iranicized Brahui

According to F Southworth Brahui is a split from 'Zagrosian' - this is the only linguistic position that lines up with Brahui/Baloch Iran_N scores

http://www.academia.edu/7336719/Rice_in_Dravidian

parasar
02-05-2018, 05:13 PM
I've heard that theory, but considering the large ASI + Munda levels in Central India (along with Y-hgs H and O), which virtually do not exist in the Brahui, it seems more likely to me that the Brahui/Baloch are an ancient isolated pop who have been in the region for millennia, given how isolated they are from everyone else in plots.

Also the reason I say Iranicized Brahui is that the Pakistani Baloch and Brahui are genetically the same and plot the same. So either the Brahui are Dravidianised Baloch (Central India theory) which genetics do not support, or the Baloch are Iranicized Brahui

According to F Southworth Brahui is a split from 'Zagrosian' - this is the only linguistic position that lines up with Brahui/Baloch Iran_N scores

http://www.academia.edu/7336719/Rice_in_Dravidian

The Baloch region was also adjacent to Saka-, Avestan-, Vedic-, and Old Persian-speaking areas, but none of those made it into Brahui. Brahui has loanwords from Balochi. So perhaps both moved in together about 1,000 years back?

bmoney
02-05-2018, 11:08 PM
The Baloch region was also adjacent to Saka-, Avestan-, Vedic-, and Old Persian-speaking areas, but none of those made it into Brahui. Brahui has loanwords from Balochi. So perhaps both moved in together about 1,000 years back?

Or the Brahui were so isolated that the only foreign contribution they received since moving from the Zagros was from the Baloch, who moved into their areas 1,000 years back.

heksindhi
02-05-2018, 11:20 PM
I think the Iranian Baloch are a distinct pop from the Pakistani Baloch, who are Iranicized Brahui. You'll notice a pull towards the north compared to the Brahui, due to recent West Asian admixture from a pop like the Iranian Baloch, who provided the language shift.


I think the conclusions being drawn from the Baluch/Brahui/Makrani samples are perhaps premature given the limited samples we have. While 25 each for the 3 groups would appear to be sufficient, the issue is that all samples were collected from narrowly scoped locations for each group. These are tribal groups and by definition share common ancestry. By sampling in limited, non-cosmopolitan areas, you end up with individuals who are (to various degrees) related to each other. This comes on top of the Baluch and Brahui being isolated/inbred populations...

You can easily verify this by seeing their matches on gedmatch. Virtually all, except the outliers, are closely related to each other. Compare this to the Sindhi samples which were collected from a major city and you can see the difference in terms of variety of samples.

The Iranian Baluch should be (and are) closest to the Makrani in all calculators except Harappa. I would bet that the variation in Baluchi results is closely tied to geography and tribal affiliation. I know I've said this before, but I really do think that it doesn't make sense to build calculators using only the HGDP Baluch. If any are to be used, it should be the Makrani set because they appear to be the least inbred.

bmoney
02-06-2018, 12:14 AM
I think the conclusions being drawn from the Baluch/Brahui/Makrani samples are perhaps premature given the limited samples we have. While 25 each for the 3 groups would appear to be sufficient, the issue is that all samples were collected from narrowly scoped locations for each group. These are tribal groups and by definition share common ancestry. By sampling in limited, non-cosmopolitan areas, you end up with individuals who are (to various degrees) related to each other. This comes on top of the Baluch and Brahui being isolated/inbred populations...

You can easily verify this by seeing their matches on gedmatch. Virtually all, except the outliers, are closely related to each other. Compare this to the Sindhi samples which were collected from a major city and you can see the difference in terms of variety of samples.

The Iranian Baluch should be (and are) closest to the Makrani in all calculators except Harappa. I would bet that the variation in Baluchi results is closely tied to geography and tribal affiliation. I know I've said this before, but I really do think that it doesn't make sense to build calculators using only the HGDP Baluch. If any are to be used, it should be the Makrani set because they appear to be the least inbred.

Makes sense, but the only things I'd ask are:

- Aren't all Brahui/Pakistani Baloch pops inbred due to their isolation? Where would you find a more representative pop for the Baloch/Brahui than the HGDP samples?
- Are the Iranian Baloch and the Makrani being artificially pulled closer due to both having SSA admixture?
- How are the Makrani less inbred, apart from their SSA admixture?

bmoney
02-06-2018, 12:16 AM
@Poi any plot plans?

poi
02-06-2018, 12:26 AM
@Poi any plot plans?

Yep! The PCA data generator is very close to completion. It will allow us to select the populations to include in the PCA by region, country, language, ethnicity, etc. Members can also be added ad hoc just to see where we plot.

With Khana's dataset for different samples and members here, we can theoretically generate different types of PCAs using relevant samples so that the total variance can be kept high even for 2D PCA.
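
As a rough illustration of the idea (not the actual generator), here is a dependency-free sketch: filter the reference populations by a label, fit a 2D PCA on their calculator percentages, report how much of the variance the two axes keep, and project an extra member into the same space. The population names, percentages, and function names are all made up, and the PCA uses plain power iteration with deflation for simplicity.

```python
import math


def center(rows):
    """Subtract the column means; returns (centered rows, means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k=2, iters=500):
    """Power iteration with deflation: returns [(eigenvalue, eigenvector), ...]."""
    d = len(cov)
    cov = [row[:] for row in cov]
    found = []
    for _ in range(k):
        # Non-uniform start: percentages sum to 100, so a uniform vector is in the null space
        v = [float(i + 1) for i in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        found.append((lam, v))
        # Remove the component just found so the next pass finds the following one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return found


def project(vec, means, components):
    centered_vec = [x - m for x, m in zip(vec, means)]
    return [sum(c * e for c, e in zip(centered_vec, v)) for _, v in components]


# Hypothetical reference rows: (name, language group, calculator percentages)
samples = [
    ("PopA", "IA", [40.0, 35.0, 10.0, 15.0]),
    ("PopB", "IA", [35.0, 38.0, 12.0, 15.0]),
    ("PopC", "Dravidian", [60.0, 28.0, 5.0, 7.0]),
    ("PopD", "Dravidian", [65.0, 25.0, 4.0, 6.0]),
    ("PopE", "IA", [20.0, 45.0, 20.0, 15.0]),
]
member = [30.0, 40.0, 15.0, 15.0]  # hypothetical forum member added ad hoc

selected = [row for row in samples if row[1] in {"IA", "Dravidian"}]
centered, means = center([vals for _, _, vals in selected])
cov = covariance(centered)
components = top_components(cov, k=2)
total_var = sum(cov[i][i] for i in range(len(cov)))
explained = sum(lam for lam, _ in components) / total_var
member_xy = project(member, means, components)

print(f"variance explained by 2 PCs: {explained:.1%}")
print("member coordinates:", [round(x, 2) for x in member_xy])
```

Projecting members after fitting, rather than including them in the fit, keeps the axes defined by the reference populations only.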

heksindhi
02-06-2018, 01:17 AM
Makes sense, but the only things I'd ask are:

- Aren't all Brahui/Pakistani Baloch pops inbred due to their isolation? Where would you find a more representative pop for the Baloch/Brahui than the HGDP samples?
- Are the Iranian Baloch and the Makrani being artificially pulled closer due to both having SSA admixture?
- How are the Makrani less inbred, apart from their SSA admixture?

If I had the luxury of collecting my own samples, I would broaden the collection area and try to ensure that I included a diverse set of tribes in the samples. I happen to believe that the Brahui and Baluch are, in fact, closely related, at least at the present time. That being said, we have no idea which tribes are labeled as Brahui or Baluch here. My personal experience is that both will identify as Baluch, so the identifying characteristic had to be their professed first language. This is a rather arbitrary criterion as Baluch tribes are invariably bi or tri-lingual (again depending on geography) with a documented history of switching primary languages.

The Makrani and the Iranian Baluch are not being pulled closer due to SSA admixture. IMO, the presence of SSA admixture indicates that the Iranian Baluch *are* Makrani (aka coastal Baluch)- just from a little further west...I'm no expert on calculators, but I have to believe that if you expanded the set of HGDP Baluchi samples to include the Iranian Baluch, they would auto-magically become more "baloch" in that calculator. Run the Makrani and the Iranian Baluch through a calculator that does not have a "baloch" component (built using the HGDP Baluch) and I'd put money on them being virtually identical.

The Makrani tend to be less in-bred for multiple reasons. Makran has far weaker tribal links compared to other parts of Baluchestan. The coastal location also makes for a less insular society. This is the part of Baluchestan that has historical links with Oman and the Gulf. Add all that, and you get less in-breeding.

As for the value of a Baluch component in genetic calculators? I honestly struggle to see the need. The entirety of the Baluch-like groups numbered less than a million as recently as a century ago. A thousand years ago, they numbered in the tens of thousands at best. Real Baluch genetic impact is limited to Sindh and Southern Punjab. The component of actual relevance to South Asia is Iran_N, so why not use Iran_N in the calculator? This way you avoid what I see as a common pitfall in Harappa-like calculators, i.e. two different groups scoring similar amounts of the "baloch" component but for different reasons, as the Baluch are most definitely more than just Iran_N.

bmoney
02-06-2018, 01:29 AM
If I had the luxury of collecting my own samples, I would broaden the collection area and try to ensure that I included a diverse set of tribes in the samples. I happen to believe that the Brahui and Baluch are, in fact, closely related, at least at the present time. That being said, we have no idea which tribes are labeled as Brahui or Baluch here. My personal experience is that both will identify as Baluch, so the identifying characteristic had to be their professed first language. This is a rather arbitrary criterion as Baluch tribes are invariably bi or tri-lingual (again depending on geography) with a documented history of switching primary languages.

The Makrani and the Iranian Baluch are not being pulled closer due to SSA admixture. IMO, the presence of SSA admixture indicates that the Iranian Baluch *are* Makrani (aka coastal Baluch)- just from a little further west...I'm no expert on calculators, but I have to believe that if you expanded the set of HGDP Baluchi samples to include the Iranian Baluch, they would auto-magically become more "baloch" in that calculator. Run the Makrani and the Iranian Baluch through a calculator that does not have a "baloch" component (built using the HGDP Baluch) and I'd put money on them being virtually identical.

The Makrani tend to be less in-bred for multiple reasons. Makran has far weaker tribal links compared to other parts of Baluchestan. The coastal location also makes for a less insular society. This is the part of Baluchestan that has historical links with Oman and the Gulf. Add all that, and you get less in-breeding.

As for the value of a Baluch component in genetic calculators? I honestly struggle to see the need. The entirety of the Baluch-like groups numbered less than a million as recently as a century ago. A thousand years ago, they numbered in the tens of thousands at best. Real Baluch genetic impact is limited to Sindh and Southern Punjab. The component of actual relevance to South Asia is Iran_N, so why not use Iran_N in the calculator? This way you avoid what I see as a common pitfall in Harappa-like calculators, i.e. two different groups scoring similar amounts of the "baloch" component but for different reasons, as the Baluch are most definitely more than just Iran_N.

Absolutely agree with you there. The Baloch/Brahui have recent West Asian admixture that's not present in pops like South Indians, so Iran_N is a better component. However, Harappa was a calc based on modern pops. To me this also supports my point that ancient-based calcs are more informative for SAs, as components in modern-pop admixture calcs don't necessarily reflect actual admixture.

You could be right on the Iranian Baloch absorbing more Iranian ancestry as they move west, not sure if they score identically to the Makrani - @Khana @Kurd

khanabadoshi
02-06-2018, 02:38 AM
I think the conclusions being drawn from the Baluch/Brahui/Makrani samples are perhaps premature based on the limited samples we have. While 25 each of the 3 groups would appear to be sufficient, I think there is an issue in that all samples were collected from narrowly scoped locations for each group. These are tribal groups and by definition share common ancestry. By sampling in limited non cosmopolitan areas, you end up with individuals that are (to various degrees) related to each other). This, on top of the Baluch and Brahui being an isolated/inbred population...

You can easily verify this by seeing their matches on gedmatch. Virtually all, except the outliers, are closely related to each other. Compare this to the Sindhi samples which were collected from a major city and you can see the difference in terms of variety of samples.

The Iranian Baluch should be (and are) closest to the Makrani in all calculators except Harappa. I would bet that the variation in Baluchi results is closely tied to geography and tribal affiliation. I know I've said this before, but I really do think that it doesn't make sense to build calculators using only the HGDP Baluch. If any are to be used, it should be the Makrani set because they appear to be the least inbred.

Tbh, now that we have Iran N/ChL, it makes no sense to use any modern population as the basis for that "signal".
If you aren't using ancients as the basis for components, then I agree with you: Use the Makrani or Bandaris, and I lean towards using Bandaris.

heksindhi
02-06-2018, 02:48 AM
You could be right on the Iranian Baloch absorbing more Iranian ancestry as they move west, not sure if they score identically to the Makrani - @Khana @Kurd

I don't have access to the Iranian Baluch samples, but I think the comparison below between a Makrani and an Iranian Bandari sample illustrates my point. I have no idea which calculators include the HGDP Baluch so I picked a couple where the Makrani did not have extremely high West Asian scores as that implied the Baluch samples had been used in the calculator.

MDLP K11

Makrani (MW5060914)

Population
African 2.37
Amerindian -
ASI 23.68
Basal 20.61
Iran-Mesolithic 10.69
Neolithic 2.33
Oceanic -
EHG 37.20
SEA -
Siberian 0.17
WHG 2.96

Bandari (GZ9303501)

Population
African 3.29
Amerindian 0.96
ASI 19.97
Basal 23.86
Iran-Mesolithic 9.52
Neolithic 5.35
Oceanic 0.38
EHG 35.70
SEA -
Siberian 0.45
WHG 0.52

MDLP World 22



Makrani

Pygmy 0.07
West-Asian 47.29
North-European-Mesolithic 3.35
Indo-Tibetan -
Mesoamerican -
Arctic-Amerind 0.66
South-America_Amerind -
Indian 18.60
North-Siberean -
Atlantic_Mediterranean_Neolithic 3.28
Samoedic 2.04
Indo-Iranian 7.07
East-Siberean -
North-East-European 5.22
South-African 0.56
North-Amerind -
Sub-Saharian 1.73
East-South-Asian -
Near_East 10.11
Melanesian -
Paleo-Siberian -
Austronesian -

Bandari

Pygmy 1.44
West-Asian 45.19
North-European-Mesolithic 0.60
Indo-Tibetan -
Mesoamerican -
Arctic-Amerind 1.16
South-America_Amerind 0.80
Indian 16.70
North-Siberean 1.28
Atlantic_Mediterranean_Neolithic 5.57
Samoedic 1.39
Indo-Iranian 5.29
East-Siberean -
North-East-European 1.86
South-African 1.13
North-Amerind -
Sub-Saharian 1.73
East-South-Asian -
Near_East 15.70
Melanesian -
Paleo-Siberian -
Austronesian 0.15

Harappa - same 2 individuals:

Makrani

S-Indian 7.73
Baloch 57.75
Caucasian 13.67
NE-Euro 3.20
SE-Asian 0.84
Siberian -
NE-Asian -
Papuan -
American -
Beringian 0.33
Mediterranean 1.67
SW-Asian 11.99
San -
E-African 2.59
Pygmy -
W-African 0.2

Bandari

S-Indian 12.08
Baloch 35.31
Caucasian 30.12
NE-Euro 3.92
SE-Asian -
Siberian 0.54
NE-Asian -
Papuan 0.07
American 1.16
Beringian 0.77
Mediterranean -
SW-Asian 11.52
San -
E-African 3.16
Pygmy 0.30
W-African 1.05

Edit:

If I had to guesstimate, I would suggest that the Harappa "baloch" component is a composite of about 65% "Gedrosian/Zagrosian" signal + 10% ASI and the remainder generic West Asian/Caucasian
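
To put rough numbers on that guess, here is a small Python sketch that splits each Harappa "Baloch" score by the guesstimated weights. The 65/10/25 split is only the guess above (25% being whatever is left over), and the function and variable names are made up for illustration.

```python
# Guesstimated make-up of the Harappa "Baloch" component.
# These weights are an assumption from the post, not a measured result.
BALOCH_SPLIT = {
    "Gedrosian/Zagrosian": 0.65,
    "ASI": 0.10,
    "West Asian/Caucasian": 0.25,  # the remainder: 1 - 0.65 - 0.10
}

def split_baloch(baloch_pct):
    """Split a Harappa 'Baloch' percentage by the guessed weights."""
    return {name: round(baloch_pct * w, 2) for name, w in BALOCH_SPLIT.items()}

# Harappa "Baloch" scores of the two individuals posted above.
makrani = split_baloch(57.75)
bandari = split_baloch(35.31)

for label, parts in (("Makrani", makrani), ("Bandari", bandari)):
    print(label, parts)
```

If the guess is anywhere near right, the Makrani's 57.75% "Baloch" would hide only about 6 points of ASI, on top of his 7.73% S-Indian.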

khanabadoshi
02-06-2018, 03:16 AM
Absolutely agree with you there - the Baloch/Brahui have recent West Asian admixture thats not present in pops like South Indians - Iran_N is a better component. However Harappa was a modern pop based calc. To me this also supports my point that ancients calcs are more informative to SAs, as modern pop admixture calcs don't necessarily mean actual admixture

You could be right on the Iranian Baloch absorbing more Iranian ancestry as they move west, not sure if they score identically to the Makrani - @Khana @Kurd

I wrote this whole post out explaining... but here is something more fun.

puntDNAL K13 scores with all outliers removed of all Baloch, Brahui, Makrani samples + Iranian Baluch + Emirati and Omani Baluch. Who is who? Where is the cline going? Which is the odd group out?

https://i.gyazo.com/cba7c61f2a5293b3507af4df56d3be97.png

bmoney
02-06-2018, 06:04 AM
puntDNAL K13 scores with all outliers removed of all Baloch, Brahui, Makrani samples + Iranian Baluch + Emirati and Omani Baluch. Who is who? Where is the cline going?

Nice - overall very similar

The pops highlighted in brown and light yellow have generally lower West Asian, higher SW Asian and higher SW Europe so guessing they are Emirati and Omani Baloch

Also in Harappa you can see elevated Caucasian in the Bandari which is common in Iranian pops compared to elevated Baloch in the Makrani

Im guessing the difference is one is Iran_N dominant, the other is newer Iran_Chl
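
A quick way to check this is to diff the two Harappa profiles component by component. Minimal Python sketch below; the scores are the ones heksindhi posted for the Makrani and Bandari individuals, "-" is treated as 0 (my assumption), and the variable names are just for illustration.

```python
# Harappa scores for the two individuals posted above.
# "-" entries are taken as 0.0 (an assumption).
components = ["S-Indian", "Baloch", "Caucasian", "NE-Euro", "SE-Asian", "Siberian",
              "NE-Asian", "Papuan", "American", "Beringian", "Mediterranean",
              "SW-Asian", "San", "E-African", "Pygmy", "W-African"]
makrani = [7.73, 57.75, 13.67, 3.20, 0.84, 0.0, 0.0, 0.0,
           0.0, 0.33, 1.67, 11.99, 0.0, 2.59, 0.0, 0.2]
bandari = [12.08, 35.31, 30.12, 3.92, 0.0, 0.54, 0.0, 0.07,
           1.16, 0.77, 0.0, 11.52, 0.0, 3.16, 0.30, 1.05]

# Signed difference (Bandari minus Makrani) per component, largest gaps first.
diffs = sorted(
    ((name, round(b - m, 2)) for name, m, b in zip(components, makrani, bandari)),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, delta in diffs[:3]:
    print(f"{name}: {delta:+.2f}")
```

The two biggest gaps are Baloch (lower in the Bandari) and Caucasian (higher in the Bandari); every other component differs by less than 5 points.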

bmoney
02-06-2018, 06:10 AM
I don't have access to the Iranian Baluch samples, but I think the comparison below between a Makrani and an Iranian Bandari sample illustrates my point. I have no idea which calculators include the HGDP Baluch so I picked a couple where the Makrani did not have extremely high West Asian scores as that implied the Baluch samples had been used in the calculator.

Do you have the gedmatch kit numbers for those?

Sapporo
02-06-2018, 08:36 AM
I'm sure their Gedmatch results are somewhere on the forum but zara and farid (both genica members) have posted their results in the past. Kurd has also occasionally used their data in some of his admixture calculators and statistical analysis.

From what I recollect, both of them seemed heavily shifted toward Iranians in comparison to the HGDP Baloch/Brahui/Makrani. Anyways, the easiest way to see if there is actually a notable difference between Iranian Baloch like zara and farid versus the HGDP samples is if we use an admixture calculator that includes both Iran N & Iran Chl. I've seen some anthro bloggers use both components in their nnmonte runs but don't recall seeing a Gedmatch or DIY calculator that contained both.

khanabadoshi
02-06-2018, 09:02 AM
Nice - overall very similar

The pops highlighted in brown and light yellow have generally lower West Asian, higher SW Asian and higher SW Europe so guessing they are Emirati and Omani Baloch

Also in Harappa you can see elevated Caucasian in the Bandari which is common in Iranian pops compared to elevated Baloch in the Makrani

Im guessing the difference is one is Iran_N dominant, the other is newer Iran_Chl

Each Color is either:

1) Baloch Pakistan
2) Baluch Iran
3) Baluch Emirati
4) Brahui
5) Makrani

Obviously, the 3 larger group colors are either: Baloch Pakistan, Brahui, or Makrani. (If I had run the Bandari, I would have added them too, my little exercise would still work)
So using one's own personal theories or hypothesis, (like the ones that are being discussed in the thread), I'm hoping people can take a guess at which is which, and say why they think that.
The 2 smaller groups are Iranian Baluch and Emirati Baloch. After you decide the 3 larger groups, you gotta guess which is which of these guys. Your reasoning should make them fit using the SAME logic/assumptions you used to decide which was Baloch/Brahui/Makrani.


Some premises are: Iranian Baluch are more pure; Makrani are the least admixed populace of the region; Brahui are from Central India; Emirati Baluch will be Arab-shifted; the most in-bred or related grouping/average is the Baloch Pakistan group; the least inbred group is the Makrani; the least inbred group is the Iranian Baluch; Pakistani Baloch are really just Brahui; Brahui are really just Baloch; the Makrani are everyone's father and the Bandari the grandfather! etc etc...

I'm hoping a few people give me guesses. Start with the assumption/premise you are most sure of or think is true and work your way from there.

khanabadoshi
02-06-2018, 09:12 AM
I'm sure their Gedmatch results are somewhere on the forum but zara and farid (both genica members) have posted their results in the past. Kurd has also occasionally used their data in some of his admixture calculators and statistical analysis.

From what I recollect, both of them seemed heavily shifted toward Iranians in comparison to the HGDP Baloch/Brahui/Makrani. Anyways, the easiest way to see if there is actually a notable difference between Iranian Baloch like zara and farid versus the HGDP samples is if we use an admixture calculator that includes both Iran N & Iran Chl. I've seen some anthro bloggers use both components in their nnmonte runs but don't recall seeing a Gedmatch or DIY calculator that contained both.

I'll discuss Zarafshaan and Farid and the 3rd person in some detail after the guesses. I've actually discussed them quite a bit before, but everyone seems to not remember LOL.
All the Iranian Baloch are posted in the punt K13 table I posted.

I'm hoping a few of you guys actually make a thought-out real attempt, one that tries to conform to all one's premises.

heksindhi
02-06-2018, 12:04 PM
I'll give it a shot - I honestly thought it would have been easier to pick them out, but they are remarkably similar. However, the bottom dark yellow group is slightly more SW Asian - hence Makrani.

The green have the highest NE Europe and South Asian - indications of Sindhi/Punjabi (possibly Pashtun) input - so I'll guess Baluch - that leaves the blue as Brahui...

The beige have the highest SW Asian, so Omani Baluch and the brown are in between Makrani and Omani - Iranian Baluch?

The last 2 groups could really go either way. If I recall correctly - the Omani Baluch originate largely from Pakistani Makran/Gawadar so, depending on how you draw your map, they could turn out to be the geographically intermediate group instead.

ssamlal
02-06-2018, 12:11 PM
I'm late to the party but here are my numbers. @Poi feel free to include (or exclude) in the PCA :)





Harappa        Mine           Mine        Brother RS   Brother VS   Pat_Uncle   Mum
               (MyHeritage)   (Ancestry)
S-Indian       53.94          53.68       54.06        54.42        52.67       55.99
Baloch         31.83          31.87       33.34        31.93        32.78       31.66
Caucasian      2.87           2.91        0.98         2.43         2.42        2.82
NE-Euro        3.74           3.76        3.68         4.86         4.73        3.42
SE-Asian       2.42           2.59        2.34         2.72         2.42        2.94
Siberian       0.99           1.1         1.41         1.29         0.77        1.97
NE-Asian       -              -           1.56         0.4          1.27        -
Papuan         0.19           0.24        0.82         -            1.17        0.38
American       0.76           0.6         -            -            0.3         0.4
Beringian      1.03           0.83        0.35         0.67         0.06        0.15
Mediterranean  2.12           1.9         1.35         1.15         0.81        0.27
SW-Asian       -              -           -            -            -           -
San            -              -           0.1          -            0.61        -
E-African      -              0.42        -            -            -           -
Pygmy          0.1            0.09        -            0.11         -           -
W-African      -              -           -            -            -           -

parasar
02-06-2018, 04:14 PM
1) Baloch Pakistan Yellow
2) Baluch Iran Brown
3) Baluch Emirati Beige
4) Brahui Green
5) Makrani Blue

khanabadoshi
02-06-2018, 08:19 PM
1) Baloch Pakistan Yellow
2) Baluch Iran Brown
3) Baluch Emirati Beige
4) Brahui Green
5) Makrani Blue

So I lied and left 2 obviously major outliers in one of the groups to see if anyone would notice. I think you might have seen them, based on how you grouped. If that is your basis, know that I'm being tricky on purpose... don't consider those outliers, and re-evaluate, you may come to different conclusions.

bmoney
02-09-2018, 06:04 AM
Regarding Iran Baluch vs Pakistan Baluch the real test would be Iran Chl vs Iran N for both

Otherwise Poi can you whip up a Eurogenes ANE K7 pca with forum users and reference pops included like you did for Harappa

It would give us a good ancients breakdown for SAs and better identify clusters

Less components in it as well so assuming that means you can capture more variation on 2d?

poi
02-09-2018, 06:43 AM
Regarding Iran Baluch vs Pakistan Baluch the real test would be Iran Chl vs Iran N for both

Otherwise Poi can you whip up a Eurogenes ANE K7 pca with forum users and reference pops included like you did for Harappa

It would give us a good ancients breakdown for SAs and better identify clusters

Less components in it as well so assuming that means you can capture more variation on 2d?

I was thinking about ANE K7. That and ASI K9 are interesting for South Asians. This week has been a bit busy with work, but should be able to get the PCAs coming starting tomorrow.

poi
02-09-2018, 06:55 AM
I was thinking about ANE K7. That and ASI K9 are interesting for South Asians. This week has been a bit busy with work, but should be able to get the PCAs coming starting tomorrow.

Okay, ANE K7 could get interesting. Just quickly loading non-tribal averages of South Asians from all areas(healthy mix from the region), the ANE itself has almost no pull lol! The top quadrants have a massive WHG pull, while quadrant 3 is exclusively ENF, while quadrant 4 has mix of ASE+EastEurasian+Africans. I will work on this tomorrow. Post your scores if you want.

[Attachment 21331]

bmoney
02-09-2018, 02:56 PM
Okay, ANE K7 could get interesting. Just quickly loading non-tribal averages of South Asians from all areas(healthy mix from the region), the ANE itself has almost no pull lol! The top quadrants have a massive WHG pull, while quadrant 3 is exclusively ENF, while quadrant 4 has mix of ASE+EastEurasian+Africans. I will work on this tomorrow. Post your scores if you want.

[Attachment 21331]

Why's that? Because ANE is so widespread in SAs?

Yes WHG in this calc is generally only present in Indo-Aryan upper castes and NW groups

poi
02-09-2018, 04:24 PM
Why's that? Because ANE is so widespread in SAs?

ANE, in ANE-K7, is so widespread among non-tribal SAs that the frequency does not differ much among population averages. Even tribals like Paniya have more ANE than many Pashtuns, Bengalis, and South Indians.

parasar
02-09-2018, 05:44 PM
... Post your scores if you want.

...

ANE 29.12
ASE 18.23
WHG-UHG 6.16
East_Eurasian 5.33
West_African -
East_African 3.11
ENF 38.05

bmoney
02-10-2018, 02:29 AM
ANE, in ANE-K7, is so widespread among non-tribal SAs that the frequency does not differ much among population averages. Even tribals like Paniya have more ANE than many Pashtuns, Bengalis, and South Indians.

We can pretty much conclude ANE is a part of the 'ASI' composite

misanthropy
02-12-2018, 12:14 AM
ANE 26.65
ASE 19.58
WHG-UHG -
East_Eurasian 8.06
West_African -
East_African 5.89
ENF 39.82

Mingle
02-13-2018, 05:41 AM
Post your scores if you want.

[Attachment 21331]


Population
ANE 27.86
ASE 12.38
WHG-UHG 7.86
East_Eurasian 2.03
West_African 1.21
East_African 0.38
ENF 48.28

Mingle
02-13-2018, 05:56 AM
Even tribals like Paniya have ANE more than many Pashtuns, Bengalis, and South Indians.

Interesting if true. Do you have any results of theirs? How much do tribals get? 30%+?

poi
02-13-2018, 06:23 AM
Interesting if true. Do you have any results of theirs? How much do tribals get? 30%+?

From what I've seen, pretty much all South Asians score 27-31% ANE in ANE-K7. I am working on a script to make the PCA cleaner and faster to generate based on existing spreadsheets. I wanted to finish it this weekend, but could not do it due to being a bit busy. I hope to start cranking out PCAs soon on South Asian populations.
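
For anyone who wants to try it themselves, here is roughly what such a script could look like. It is only a sketch, not the actual script: it assumes numpy, hard-codes three ANE-K7 score sets posted above (parasar, misanthropy, Mingle) instead of reading a spreadsheet, and treats "-" as 0.

```python
import numpy as np

# ANE-K7 components, in the order the scores were posted above.
components = ["ANE", "ASE", "WHG-UHG", "East_Eurasian",
              "West_African", "East_African", "ENF"]

# Rows are individuals; "-" in the posted scores is entered as 0.0 (an assumption).
labels = ["parasar", "misanthropy", "Mingle"]
scores = np.array([
    [29.12, 18.23, 6.16, 5.33, 0.00, 3.11, 38.05],
    [26.65, 19.58, 0.00, 8.06, 0.00, 5.89, 39.82],
    [27.86, 12.38, 7.86, 2.03, 1.21, 0.38, 48.28],
])

# PCA via SVD of the mean-centred score matrix.
centred = scores - scores.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)

coords = u * s                    # sample coordinates on each principal component
explained = s**2 / np.sum(s**2)   # share of total variance per component
loadings = vt                     # rows: PCs, columns: admixture components (biplot arrows)

for label, (pc1, pc2) in zip(labels, coords[:, :2]):
    print(f"{label}: PC1={pc1:.2f}, PC2={pc2:.2f}")
print("variance explained by PC1+PC2:", round(float(explained[:2].sum()), 3))
```

With only three samples, two axes capture all of the variance by construction, so this only shows the mechanics; the real plots use the full population spreadsheet.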

bmoney
02-13-2018, 06:24 AM
Population
ANE 29.73
ASE 21.23
WHG-UHG -
East_Eurasian 4.49
West_African -
East_African 4.68
ENF 39.87

Sapporo
02-13-2018, 12:12 PM
Population

ANE 30.43
ASE 15.14
WHG-UHG 6.33
East_Eurasian 3.02
West_African -
East_African 2.47
ENF 42.61

pegasus
02-14-2018, 04:40 AM
ANE, in ANE-K7, is so widespread among non-tribal SAs that the frequency does not differ much among population averages. Even tribals like Paniya have more ANE than many Pashtuns, Bengalis, and South Indians.

In Eurasia, after Siberians, ANE peaks in Hindu Kush populations, i.e. Kalash, Burusho, and Pashtuns, some even on par, as in the case of some Kalash who score 40%. Paniya atm have around 20-25% ANE; my guess is this will rise a bit given the archaic connection between ANE and ASE.

poi
02-14-2018, 05:24 AM
In Eurasia, after Siberians, ANE peaks in Hindu Kush populations, i.e. Kalash, Burusho, and Pashtuns, some even on par, as in the case of some Kalash who score 40%. Paniya atm have around 20-25% ANE; my guess is this will rise a bit given the archaic connection between ANE and ASE.

I've read that ANE-K7's ANE wasn't MA1.

khanabadoshi
02-14-2018, 06:02 AM
Interesting if true. Do you have any results of theirs? How much do tribals get? 30%+?


From what I've seen, pretty much all South Asians score 27-31% ANE in ANE-K7. I am working on a script to make the PCA cleaner and faster to generate based on existing spreadsheets. I wanted to finish it this weekend, but could not do it due to being a bit busy. I hope to start cranking out PCAs soon on South Asian populations.


I've read that ANE-K7's ANE wasn't MA1.

You have to take into account that ANE % ≠ ANE; it could be "ANE-like".

Weren't we all scoring a bazillion % CHG not too long ago? Iran N was "CHG-like", we just didn't know that until a few samples in the Zagros were tested.

PS. I'll get that spreadsheet to ya... I finally slept.

pegasus
02-14-2018, 07:49 AM
You have to take into account that ANE % ≠ ANE; it could be "ANE-like".

Weren't we all scoring a bazillion % CHG not too long ago? Iran N was "CHG-like", we just didn't know that until a few samples in the Zagros were tested.

PS. I'll get that spreadsheet to ya... I finally slept.

I agree it's a combo of ANE and ANE-like stuff.

bmoney
02-14-2018, 09:25 AM
I agree it's a combo of ANE and ANE-like stuff.

Maybe ANE combined with older Ust'Ishim like stuff so an archaic pop that links ANE to East Asians but clearly separated from ghost-ASI

ancient East Eurasians such as UI, which predates MA-1 by 20,000 years, are a) as close to MA-1 as they are to modern East Eurasians and b) closer to them than to modern West Eurasians.

Ust'Ishim y-dna: Basal K2a* has been found only in the remains of two Paleolithic individuals from western Siberia (Ust-Ishimsky District) and southwestern Romania (Peștera cu Oase), while K2a1* has been found only in living individuals from India (Telugu) and South East Asia (Malay).

In a 2016 study, modern Tibetans were identified as the modern population that has the most alleles in common with Ust'-Ishim man. According to a 2017 study, "Siberian and East Asian populations shared 38% of their ancestry" with Ust'-Ishim man.

Fu et al. (2014, 448-9) found that UI “is not more closely related to the Onge from the Andaman Islands (putative descendants of an early coastal migration) than he is to present-day East Asians or Native Americans.”

One of Lukasz calcs running these samples, not sure what the genotype rate was but I've highlighted the differences

Mal'ta

0.31% West-African
0.00% Siberian
9.80% South-Indian
1.32% Ne-Asian
17.76% Kalash
0.04% Papuan
0.21% Paleo-African
16.64% Samoyedic
38.52% NE-Euro
0.00% SE-Asian
0.00% Tibeto-Burmese
0.00% SW-Euro
0.00% Caucasian
15.40% Amerindian
0.00% Red-Sea

Ust'-Ishim

8.33% West-African
1.04% Siberian
26.66% South-Indian
3.40% Ne-Asian
4.85% Kalash
11.50% Papuan
4.48% Paleo-African
0.00% Samoyedic
10.32% NE-Euro
8.59% SE-Asian
7.56% Tibeto-Burmese
5.52% SW-Euro
0.10% Caucasian
0.58% Amerindian
7.07% Red-Sea

pegasus
02-14-2018, 09:33 AM
South Indians come out as closest to Ust'-Ishim, but that's because they show a similar West Eurasian/East Eurasian mix; he is thought to have lived before the East/West Eurasian split. MA1, though, is thought of as distinctly "West Eurasian" even though he too has 30-35% ASE. Using modern populations to model such ancient genomes is very confusing.

purohit
06-11-2018, 05:07 AM
Clustering right between kashmiri and punjabi brahmins, makes me wonder if we really split from Brahmins.

I'm Punjabi Ramgarhia btw.

You once said your grandfather is baloch?

khanabadoshi
06-11-2018, 05:15 AM
You once said your grandfather is baloch?

His grandfather was from Chaman, Balochistan. However, he wasn't Baloch. It's relevant to mention, in case there was some admixture or something.

MonkeyDLuffy
06-11-2018, 07:46 AM
You once said your grandfather is baloch?

Yes like khana said, although keep in mind chaman is heavily pashtun area as well, so far I score a little different to other Ramgarhias and I do get pashtun matches. But I doubt any mix since my SI is at the higher side of Ramgarhia samples we have.

purohit
06-12-2018, 09:22 AM
Pushtikar and Rajasthani Brahmins cluster between NW groups, but we don't look like them. Genotype doesn't mean anything when it comes to phenotype.

khanabadoshi
06-13-2018, 08:17 AM
Pushtikar and Rajasthani Brahmins cluster between NW groups, but we don't look like them. Genotype doesn't mean anything when it comes to phenotype.

It's a request, if you can show what you look like. Maybe your people's looks resemble those of South Punjab or Sindh? In my view, the looks of people from southeast Punjab and of Rajasthanis are roughly the same.

MonkeyDLuffy
06-13-2018, 03:51 PM
It's a request, if you can show what you look like. Maybe your people's looks resemble those of South Punjab or Sindh? In my view, the looks of people from southeast Punjab and of Rajasthanis are roughly the same.

There is a difference, paji; their looks depend on the community. The Rajputs, especially the upper class, look Punjabi, but the rest have that West Indian touch, like some Sindhi people have.

purohit
06-13-2018, 07:43 PM
There is a difference, paji; their looks depend on the community. The Rajputs, especially the upper class, look Punjabi, but the rest have that West Indian touch, like some Sindhi people have.

Rajasthani Rajputs look very Rajasthani to me. Especially the western ones.

purohit
06-13-2018, 07:53 PM
It's a request, if you can show what you look like. Maybe your people's looks resemble those of South Punjab or Sindh? In my view, the looks of people from southeast Punjab and of Rajasthanis are roughly the same.

Should I post only my own pic, or should I also post all my friends' pics?

purohit
06-13-2018, 08:36 PM
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35198164_1687481278004699_961633492423147520_n.jpg ?_nc_cat=0&efg=eyJpIjoidCJ9&oh=97ec84a785d35eb4b1d41647ed652612&oe=5BAB8AE1
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35143851_1687484998004327_6263971593739829248_n.jp g?_nc_cat=0&efg=eyJpIjoidCJ9&oh=1bd163919e565548d42d0ff91b2e61bf&oe=5BA5785D
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35228041_1687485981337562_5859807135302418432_n.jp g?_nc_cat=0&efg=eyJpIjoidCJ9&oh=a2e6e7ca1f6beb2d55a9baa42ebbbf38&oe=5BA51484
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35193521_1687485041337656_2135057770741760000_n.jp g?_nc_cat=0&efg=eyJpIjoidCJ9&oh=e0bcd7b2cc1c611523aaac8734e245c6&oe=5BB37FE7
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35236641_1687485461337614_291330565150343168_n.jpg ?_nc_cat=0&efg=eyJpIjoidCJ9&oh=dd454f57434f040332b8774228d74554&oe=5B775461
Me purohit
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35362286_1687485214670972_4069470603743068160_n.jp g?_nc_cat=0&efg=eyJpIjoidCJ9&oh=3012971c5dd379d75de93221cd59c21b&oe=5BBDCB32
https://scontent.fjai2-1.fna.fbcdn.net/v/t34.18173-12/fr/cp0/e15/q65/11418672_1649302125347189_1426337820_n.jpg?_nc_cat =0&efg=eyJpIjoidCJ9&oh=a1072eca17a6fd96457f63fc0f821aa6&oe=5B244449
https://scontent.fjai2-1.fna.fbcdn.net/v/t34.18173-12/fr/cp0/e15/q65/12244051_1649302128680522_1221735686_n.jpg?_nc_cat =0&efg=eyJpIjoidCJ9&oh=86242bf407b0f24db6a65b143a8eb589&oe=5B24114A
https://scontent.fjai2-1.fna.fbcdn.net/v/t34.18173-12/fr/cp0/e15/q65/12212146_880825612003607_145326689_n.jpg?_nc_cat=0&efg=eyJpIjoidCJ9&oh=45e67a0d39b768298cd2f23410b23237&oe=5B2334D2
https://scontent.fjai2-1.fna.fbcdn.net/v/t34.18173-12/fr/cp0/e15/q65/12231540_1649266702017398_1807266234_n.jpg?_nc_cat =0&efg=eyJpIjoidCJ9&oh=75b7b0457a4e5e693e5b110f84ad0a59&oe=5B23ED62
https://scontent.fjai2-1.fna.fbcdn.net/v/t1.6435-9/fr/cp0/e15/q65/35162219_1687525611333599_8460899257438699520_n.jp g?_nc_cat=0&efg=eyJpIjoidCJ9&oh=134240d14200b449c2f1112117fcfa8c&oe=5BAEEBA2


Moderator: Spoiler tagged the photos to prevent clutter/disruption of the thread. Will delete photos when appropriate.

redifflal
06-13-2018, 09:02 PM
I don't think these two places, meaning Punjab or Sindh/Gujarat, differ so much that a Rajasthani guy would pass in one and not in the other. The genetics may differ, but the phenotype range is nearly the same.

Edit: never mind saw khana had asked for the pics to be posted. Go on then

purohit
06-13-2018, 09:20 PM
Alright, sir.

khanabadoshi
06-13-2018, 09:23 PM
Damn, very interesting set of pictures. Almost all of you pass easily in Punjab to me, but in different regions of it. Two stand out immediately to me as not native, i.e. if I saw them in Pakistan I would assume they moved post-47. One guy looks very Brahui/Baloch/Sindhi to me, and if I saw him in Pakistan, my immediate assumption would be that he is Brahui.

I'll post more in detail later.

redifflal
06-13-2018, 09:39 PM
Alright, sir.

Sorry yaar, only after writing that did I see that Khana-ji himself had told you to post here. If the moderator says yes, then who am I to say otherwise.

By the way, is everyone in your pictures a Pushtikar Brahmin? To be honest, I don't think I have ever seen a Brahmin from Rajasthan. The Rajasthanis in Kolkata are Marwari Baniyas or Rajputs. Your phenotype is like that of Rajputs/Thakurs.

redifflal
06-13-2018, 09:45 PM
What are your people's surnames?

purohit
06-13-2018, 10:08 PM
What are your people's surnames?

Purohit, Vyas, joshi, acharya, bohra, bora, chhangani ,bissa ,ranga, thanvi, kiradoo, kalla kapta, pushkarna, there are more than 36 subcastes. Our language is marwari most pushkarna live in bikaner jodhpur and jaisalmer districts. Some pushkarna live in North gujrat and pakistan.

MonkeyDLuffy
06-13-2018, 10:32 PM
I don't think these two places, meaning Punjab or Sindh/Gujarat, differ so much that a Rajasthani guy would pass in one and not in the other. The genetics may differ, but the phenotype range is nearly the same.

Edit: never mind saw khana had asked for the pics to be posted. Go on then

There is a variation, I guess since I am native to the place, and grew up around punjabis, himachalis, kashmiris, Uttrakhandis, Haryanvis and rajasthanis, for me it is easier to tell the difference than a person who is not native to the region.

Some of them look very punjabi as khana stated, but some show a little foreign element, IMO they would fit better in Haryana than punjab. While Sindhis and punjabis overlap in looks usually, gujaratis are different looking than both of them (unless we are talking about mixed individuals or brahmins). 8/10 Patels who make majority of gujurati diaspora are very easy to distinguish. Same way I have seen purohit, he fits better in Haryana than Punjab.

khanabadoshi
06-13-2018, 10:48 PM
Alright, sir.


Sorry yaar, only after writing that did I see that Khana-ji himself had told you to post here. If the moderator says yes, then who am I to say otherwise.

By the way, is everyone in your pictures a Pushtikar Brahmin? To be honest, I don't think I have ever seen a Brahmin from Rajasthan. The Rajasthanis in Kolkata are Marwari Baniyas or Rajputs. Your phenotype is like that of Rajputs/Thakurs.

I will edit the post and put the pictures in a Spoiler tag. Later I will delete the pictures from the post when purohit wants me to, or if it has been up too long.

poi
06-13-2018, 11:01 PM
Purohit, Vyas, joshi, acharya, bohra, bora, chhangani ,bissa ,ranga, thanvi, kiradoo, kalla kapta, pushkarna, there are more than 36 subcastes. Our language is marwari most pushkarna live in bikaner jodhpur and jaisalmer districts. Some pushkarna live in North gujrat and pakistan.

Bro, have I run your gedmatch? If not and want me to run to generate scores and place you in the PCA, you can send it to me.

redifflal
06-13-2018, 11:27 PM
There is a variation, I guess since I am native to the place, and grew up around punjabis, himachalis, kashmiris, Uttrakhandis, Haryanvis and rajasthanis, for me it is easier to tell the difference than a person who is not native to the region.

Some of them look very punjabi as khana stated, but some show a little foreign element, IMO they would fit better in Haryana than punjab. While Sindhis and punjabis overlap in looks usually, gujaratis are different looking than both of them (unless we are talking about mixed individuals or brahmins). 8/10 Patels who make majority of gujurati diaspora are very easy to distinguish. Same way I have seen purohit, he fits better in Haryana than Punjab.

Ah no problem. I was honestly seeing Haryana and Himachal as a sub region of historical Punjab. And yes they do look very Haryanvi to me as well.

purohit
06-14-2018, 04:36 AM
Bro, have I run your gedmatch? If not and want me to run to generate scores and place you in the PCA, you can send it to me.

I have not taken any dna test yet. Ken ji aryan once said its not possible in india. Is it true? Can i order it online?

poi
06-14-2018, 04:57 AM
I have not taken any dna test yet. Ken ji aryan once said its not possible in india. Is it true? Can i order it online?

I am guessing, but there has to be a HUGE genetic testing industry in India. Maybe shipping biological samples outside of the country is not allowed (again, guessing), but there are probably local companies offering services.

purohit
06-14-2018, 05:03 AM
There is a variation, I guess since I am native to the place, and grew up around punjabis, himachalis, kashmiris, Uttrakhandis, Haryanvis and rajasthanis, for me it is easier to tell the difference than a person who is not native to the region.

Some of them look very punjabi as khana stated, but some show a little foreign element, IMO they would fit better in Haryana than punjab. While Sindhis and punjabis overlap in looks usually, gujaratis are different looking than both of them (unless we are talking about mixed individuals or brahmins). 8/10 Patels who make majority of gujurati diaspora are very easy to distinguish. Same way I have seen purohit, he fits better in Haryana than Punjab.

Guide me, brother. How and where can I take a DNA test in India? How much will it cost? Is it possible in India?

poi
06-14-2018, 05:17 AM
Guide me, brother. How and where can I take a DNA test in India? How much will it cost? Is it possible in India?

Brother, FTDNA might be able to do it, but the shipping charge could be high. I do not know anything about local Indian companies though. https://www.familytreedna.com/learn/ftdna/shipping-dna-tests-to-international-destinations/

Kulin
06-14-2018, 05:21 AM
Guide me, brother. How and where can I take a DNA test in India? How much will it cost? Is it possible in India?

Brother, the beard suits your look.