Page 215 of 398 FirstFirst ... 115165205213214215216217225265315 ... LastLast
Results 2,141 to 2,150 of 3971

Thread: Global25 automated nMonte for South/Central Asian members

  1. #2141
    Registered Users
    Posts
    23
    Sex

    Quote Originally Posted by Generalissimo View Post
    Don't worry about any of this. The accuracy of the Global25 is validated by formal statistics models.

    In other words, when the Global25 and formal models correlate closely, then there's no problem. When they don't correlate then there's a problem, and there's no point going down that road, like modeling or modeling with super old samples that aren't ancestral to any modern populations.
    Hey, great to hear from the General himself. Please also read my post after that, which poi quoted just above. I recently got my Global25 coordinates from you and have been in this forum for a bit. I'm asking out of curiosity, but since you have probably seen these numbers back to front, you might be able to do an estimate by eye.
    Say we are modeling p1 and p2 : nep_bram and up_bram (or pakistani) using these as base populations (DMXX's work):


    Barcin N + Ganj D + Han + Khvalynsk + N Simulated AASI + W Siberia N (+MysteryP)

    All of them have a non-zero distance. Could you produce a point MysteryP (25 numbers) such that both p1 and p2 can have zero distance?
    - Ideally, there exists such a point that is closest to the existing points.
    - Maybe it's a stretch, but an algorithm like that could be used to speculate about a population that is related to both p1 and p2.
    Last edited by dumbo007; 09-12-2018 at 02:14 AM.

  2. The Following 4 Users Say Thank You to dumbo007 For This Useful Post:

     Jatt1 (09-12-2018),  pnb123 (09-12-2018),  poi (09-12-2018),  Sapporo (09-12-2018)

  3. #2142
    Registered Users
    Posts
    412
    Sex
    Location
    U.S
    Nationality
    Indian
    Y-DNA
    Z30522+
    mtDNA
    C4a1a (T195C!)

    Quote Originally Posted by dumbo007 View Post
    Thank you. I'm glad you're looking at it this way.
    I'll summarize the assumptions I am making, so we can agree on them or correct them. I think you guys are doing a great job coming up with base populations that make historical sense and calculating these distances; I wanted to hack the math to maybe find a shortcut looking the other way.
    - The PCA basis that Davidski defined creates a 25-dimensional space where each population is a point.
    - When looking for a fit for pop_x as an admixture of p1, p2, p3, ..., we are looking for the point on the shape formed by p1, p2, ... that is closest to pop_x. The Euclidean distance from pop_x to that closest point is the fit distance. The coordinates of that point in the basis of p1, p2, ... are the admixture fractions we look at.
    - The admixture fractions have to be less than 1 (you wouldn't have a person that is 1.1 ASI + 1.1 Iran etc.). So, in that example, the closest point would be on the plane defined by the three points (the normal projection of (1.1, 1.1, 1.1)), and the fit distance would be non-zero (the length of the normal drawn from the point onto the plane in this case).
    - Changing bases should preserve the Euclidean distance in a space; I think your star analogy might be inaccurate.
    - I agree that the points Ganj_dareh, Barcin, Steppe, ASI form a polygon (in 2D; a polyhedron in 3D, a general polytope in n-D) and that the SA populations are scattered on the periphery of this polytope, so that all the fit distances are non-zero, and it is possible that they lie in different directions.
    If this was 3D, let's say these components make up a shape like a glob of ice cream, and say india_bram lies slightly outside in one direction (like a fruit fly), such that the fit distance (= Euclidean distance) to the closest point on the ice-cream glob is 2.7. Maybe nep_bram also lies at a distance of 2.7, but in a slightly different direction. But it should be possible to find a point in this space, say the tip of an ice-cream cone, such that when the cone touches the ice-cream glob, the point india_bram falls between (inside) the ice cream and the tip of the cone. If the tip of the cone were taken as another of the base components, it should be possible to find a fit distance of zero with all admixture components less than 1.
    - My speculation was: are we systematically missing an ice-cream-cone tip point (= some mystery population)? There may be more than one, but maybe just one such point would explain this for more than one SA group. If we add that 'mystery component', it may be possible to write these populations with admixture fractions less than 1 and a fit distance of zero.
    - I understand that even if such a point exists and is found with math first (I've seen a lot of attempts to come up with what the base components for these populations might be, but they still have a distance), it might not make sense genealogically, but I think it is worth thinking about because it's not ruled out. Once we find a point like that, we can see what that mystery component is: which way the cone tip points, or which existing/ancient population is closest to that tip.
    Yes, you are correct in that 1.1 would not be a value in a scaled PCA (because it should all sum to 1), but I was just illustrating an example of a linear combination, which for a truly orthonormal choice of real eigenvectors should be able to represent any point in the space, such as when using unscaled basis vectors.
    Here is my understanding of the PCA coordinates given by David and what nMonte is doing:
    Each of the samples is represented by 25 coordinates in a PCA space with 25 orthonormal vectors, where PC1 contains the most variance in the data, followed by PC2, PC3 etc., with PC25 containing the least variance (with the caution that this is the variance of the whole data set, so it may wipe out the uniqueness of a highly divergent population with only a few samples). nMonte is fitting a reduced-dimensional space (identified as the surface formed by the points we choose as source populations) to this 25D space by least-squares projections, and then calculating our individual sample's location in that reduced space (using an overfit model, because the source populations themselves are related to each other).
    Distances are only preserved under a linear map (translation, rotation etc.) when dimensions are preserved. Projections do not preserve distances. A simple example: if we collapse 2D space into 1D space using a projection, a point originally at '(a,b)' is now at 'a', and the Euclidean distance from the origin to that point has shrunk from 'sqrt(a^2+b^2)' to '|a|'. (Sorry if I am being too basic, but I wanted to be as inclusive of others in this discussion as possible, so that my own viewpoint can be corrected if in error.) This is where the astronomy example came in. (Regarding the ice-cream example, I wonder if the main problem is not where the fruit fly is with respect to the ice cream in a given frame, but rather that the fruit fly appears in that location because it is superimposed onto the image with the ice cream by removing time as a dimension.)
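
    To make the 2D-to-1D example concrete, here is a minimal sketch in plain Python. The point (3, 4) and the 30 degree angle are made-up illustration values, not Global25 data. It only shows that a rotation keeps the distance from the origin while dropping a coordinate shrinks it.

```python
import math

def norm(v):
    """Euclidean length of a vector given as a sequence of floats."""
    return math.sqrt(sum(c * c for c in v))

def rotate_2d(v, angle_rad):
    """Rotate a 2D point about the origin (both dimensions are kept)."""
    x, y = v
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return (x * cos_a - y * sin_a, x * sin_a + y * cos_a)

def project_to_first_axis(v):
    """Collapse 2D to 1D by dropping the second coordinate."""
    return (v[0],)

point = (3.0, 4.0)  # hypothetical (a, b)
rotated = rotate_2d(point, math.radians(30))
projected = project_to_first_axis(point)

print(norm(point))      # 5.0, i.e. sqrt(a^2 + b^2)
print(norm(rotated))    # 5.0 up to floating-point rounding
print(norm(projected))  # 3.0, i.e. |a|
```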

    Instead of manually deciding which source populations best fit all of us, I guess one can use monte-carlo simulations or step-wise regression to figure out the best set of source populations that works for all of us within the given population space.
    Paternal YDNA: G-P303+ -> G-Z30522+
    Paternal mtDNA: U7a3b1
    Maternal YDNA: R-Z2123+ -> R-YP526+
    Maternal mtDNA: C4a1 (T195C!)

  4. The Following 4 Users Say Thank You to soulblighter For This Useful Post:

     dumbo007 (09-12-2018),  Jatt1 (09-12-2018),  pnb123 (09-12-2018),  poi (09-12-2018)

  5. #2143
    Gold Member Class
    Posts
    3,296
    Sex
    Ethnicity
    Nepali Brahmin
    Y-DNA
    R1a-L657>Y6
    mtDNA
    M30

    fwiw - here are the eigenvals (used to convert the unscaled pcs into scaled)

    Code:
    129.557,103.13,14.222,10.433,9.471,7.778,5.523,5.325,4.183,3.321,2.637,2.246,2.21,1.894,1.842,1.758,1.7,1.605,1.58,1.564,1.557,1.529,1.519,1.452,1.434
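
    For anyone who wants to try the conversion, here is a rough sketch. It ASSUMES the scaled coordinate of each PC is the unscaled coordinate multiplied by the square root of that PC's eigenvalue; the post only says the eigenvalues are "used to convert", so please check the exact formula before relying on it. The function names and the example vector are made up for illustration.

```python
import math

# Eigenvalues as posted above, one per principal component (PC1..PC25).
EIGENVALUES = [
    129.557, 103.13, 14.222, 10.433, 9.471, 7.778, 5.523, 5.325, 4.183,
    3.321, 2.637, 2.246, 2.21, 1.894, 1.842, 1.758, 1.7, 1.605, 1.58,
    1.564, 1.557, 1.529, 1.519, 1.452, 1.434,
]

def to_scaled(unscaled):
    """ASSUMPTION: scaled PC = unscaled PC * sqrt(eigenvalue of that PC)."""
    if len(unscaled) != len(EIGENVALUES):
        raise ValueError("expected 25 coordinates")
    return [c * math.sqrt(ev) for c, ev in zip(unscaled, EIGENVALUES)]

def to_unscaled(scaled):
    """Inverse of to_scaled under the same assumption."""
    if len(scaled) != len(EIGENVALUES):
        raise ValueError("expected 25 coordinates")
    return [c / math.sqrt(ev) for c, ev in zip(scaled, EIGENVALUES)]

# Hypothetical unscaled sample: the same small value on every PC.
example_unscaled = [0.01] * 25
example_scaled = to_scaled(example_unscaled)

# Under this assumption PC1 is stretched far more than PC25.
print(example_scaled[0], example_scaled[24])
```

    If the assumption holds, this is also why distances in the scaled sheet are dominated by the first two PCs: their eigenvalues are roughly ten times larger than any of the others.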

  6. The Following 3 Users Say Thank You to poi For This Useful Post:

     dumbo007 (09-12-2018),  Jatt1 (09-12-2018),  soulblighter (09-12-2018)

  7. #2144
    Gold Member Class
    Posts
    3,296
    Sex
    Ethnicity
    Nepali Brahmin
    Y-DNA
    R1a-L657>Y6
    mtDNA
    M30

    Does anyone want to create a group in the tool for their ethnic group (or add to the existing groups)?

    I added one for my ethnic group. I can do the same for others as well... just make sure your sample is a "true" representative of your group, with all 4 grandparents from the same ethnic group.

    It is interesting to see how the "average" looks versus the individuals. Also added the ability to use the G25 group averages as if they were individuals.




  8. The Following 8 Users Say Thank You to poi For This Useful Post:

     aaronbee2010 (09-12-2018),  bmoney (09-12-2018),  bored (09-14-2018),  Jatt1 (09-12-2018),  pnb123 (09-12-2018),  Sapporo (09-12-2018),  Zaid (09-13-2018),  Zuran (09-12-2018)

  9. #2145
    Registered Users
    Posts
    23
    Sex

    Quote Originally Posted by soulblighter View Post
    Yes, you are correct in that 1.1 would not be a value in a scaled PCA (because it should all sum to 1), but I was just illustrating an example of a linear combination, which for a truly orthonormal choice of real eigenvectors should be able to represent any point in the space, such as when using unscaled basis vectors.
    Here is my understanding of the PCA coordinates given by David and what nMonte is doing:
    Each of the samples is represented by 25 coordinates in a PCA space with 25 orthonormal vectors, where PC1 contains the most variance in the data, followed by PC2, PC3 etc., with PC25 containing the least variance (with the caution that this is the variance of the whole data set, so it may wipe out the uniqueness of a highly divergent population with only a few samples). nMonte is fitting a reduced-dimensional space (identified as the surface formed by the points we choose as source populations) to this 25D space by least-squares projections, and then calculating our individual sample's location in that reduced space (using an overfit model, because the source populations themselves are related to each other).
    Distances are only preserved under a linear map (translation, rotation etc.) when dimensions are preserved. Projections do not preserve distances. A simple example: if we collapse 2D space into 1D space using a projection, a point originally at '(a,b)' is now at 'a', and the Euclidean distance from the origin to that point has shrunk from 'sqrt(a^2+b^2)' to '|a|'. (Sorry if I am being too basic, but I wanted to be as inclusive of others in this discussion as possible, so that my own viewpoint can be corrected if in error.) This is where the astronomy example came in. (Regarding the ice-cream example, I wonder if the main problem is not where the fruit fly is with respect to the ice cream in a given frame, but rather that the fruit fly appears in that location because it is superimposed onto the image with the ice cream by removing time as a dimension.)

    Instead of manually deciding which source populations best fit all of us, I guess one can use monte-carlo simulations or step-wise regression to figure out the best set of source populations that works for all of us within the given population space.
    I agree: if the coefficients are not constrained to be <=1, orthogonal bases can span the entire space. Here, in admixtures, they are constrained, so in the example the closest point would be (because of symmetry) 1/3 p1 + 1/3 p2 + 1/3 p3, and the projection comes in when defining the distance from this point to pop_x. I'm pretty sure that we are talking about the fruit fly's distance from the ice cream (comments please). Projecting a distant point onto a second plane and then using the projected point without also including its distance from the original point would be misleading.
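
    Here is a small sketch of the (1.1, 1.1, 1.1) example, to check the 1/3, 1/3, 1/3 claim numerically. It is NOT how nMonte is implemented; I swapped in a plain projected-gradient solver that minimises the Euclidean distance with weights constrained to be non-negative and to sum to 1. The three "populations" are just unit vectors and every function name here is made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mixture(weights, sources):
    """Weighted sum of the source points, coordinate by coordinate."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def project_to_simplex(v):
    """Closest vector to v with entries >= 0 that sum to 1 (sort-based method)."""
    theta, cumulative = 0.0, 0.0
    for i, u in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u
        t = (cumulative - 1.0) / i
        if u - t > 0:
            theta = t
    return [max(c - theta, 0.0) for c in v]

def fit(target, sources, iterations=2000):
    """Admixture weights and fit distance for target, by projected gradient."""
    n = len(sources)
    # Conservative step: 1 / (2 * sum of squared source lengths).
    step = 1.0 / (2.0 * sum(dot(s, s) for s in sources))
    weights = [1.0 / n] * n
    for _ in range(iterations):
        residual = [m - t for m, t in zip(mixture(weights, sources), target)]
        grad = [2.0 * dot(s, residual) for s in sources]
        weights = project_to_simplex(
            [w - step * g for w, g in zip(weights, grad)])
    residual = [m - t for m, t in zip(mixture(weights, sources), target)]
    return weights, math.sqrt(dot(residual, residual))

# Hypothetical base populations p1, p2, p3 as orthonormal unit vectors.
sources = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, distance = fit([1.1, 1.1, 1.1], sources)
print(weights)   # each weight close to 1/3
print(distance)  # close to sqrt(3) * (1.1 - 1/3), so clearly non-zero
```

    A target that already lies inside the triangle, for example (0.2, 0.3, 0.5), comes back with those same numbers as weights and a distance of essentially zero, which is the "inside the ice cream" case.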

    As for doing Monte Carlo to find the distance of a given point from a set of known points: that is what is being done now, and it also makes sense genealogically. The reason I'm suggesting we look for a new point is that there might not be an existing point where the cone tip would be, or else I'm sure you guys would have found it by trial already. Because all these SA samples still have some distance from sets of well-chosen reference points, it might be instructive to ask: if I pick these n base populations, what extra point (the cone tip) is needed to make the fit distances zero, i.e. so that the populations are entirely contained within the shape formed by the basis points?
    In a way, it is a way to express the fit distance as a vector instead of a scalar, or to ask where all the SA data points (fruit flies) are in reference to the glob (the defining base populations).
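
    A rough sketch of what I mean by a vector instead of a scalar, assuming the closest point on the glob has already been found by whatever fitting tool you use. All coordinates below are made-up 3D numbers, not real Global25 values, and the function names are hypothetical. The residual vector points from the closest point to the sample; a cone tip further along that same ray, added as an extra base population, gives a zero-distance fit with weights between 0 and 1.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def cosine(a, b):
    """Cosine of the angle between two residual vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def cone_tip(target, closest, stretch=2.0):
    """Point on the ray from closest through target, stretch times as far out."""
    return [c + stretch * (t - c) for t, c in zip(target, closest)]

def weights_with_tip(stretch):
    """target = w_closest * closest + w_tip * tip, both weights in [0, 1]."""
    return 1.0 - 1.0 / stretch, 1.0 / stretch

# Hypothetical samples and their closest points on the base-population shape.
target_a, closest_a = [2.0, 2.0, 2.0], [1.0, 0.0, 0.0]
target_b, closest_b = [2.5, 1.0, 2.0], [0.5, 0.0, 0.0]

residual_a = sub(target_a, closest_a)  # fit distance as a vector
residual_b = sub(target_b, closest_b)
print(norm(residual_a), norm(residual_b))  # the usual scalar fit distances
print(cosine(residual_a, residual_b))      # near 1 means similar direction

tip = cone_tip(target_a, closest_a, stretch=2.0)
w_closest, w_tip = weights_with_tip(2.0)
rebuilt = [w_closest * c + w_tip * t for c, t in zip(closest_a, tip)]
print(rebuilt)  # reproduces target_a, so the fit distance is zero
```

    Since the closest point is itself a mixture of the base populations, multiplying its weights by w_closest keeps everything between 0 and 1. Whether a single tip can serve two samples at once depends on the geometry; a cosine near 1 between their residual vectors is only a hint that it might, not a guarantee.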

  10. #2146
    Gold Member Class
    Posts
    3,296
    Sex
    Ethnicity
    Nepali Brahmin
    Y-DNA
    R1a-L657>Y6
    mtDNA
    M30

    It turns out that G25 coordinates I had for me ("poi") and my sister-in-law ("poi_sil") were SWAPPED. No wonder "poi" was so close to "poi_motherinlaw", while "poi_sil" was so close to "poi_mom". At least part of the craziness is now explained.

    So, it looks like I'm the most WestSiberianN+Barcin shifted of my group with extremely low KhvalynskEN. Also, it makes sense that my East Asian is elevated compared to others since my mom also has higher East Asian in the group. My mother in law is at the bottom when it comes to East Asian in our group.

    ps -- I double checked everybody's coordinates and mine was the only instance that was screwed up, so your coordinate labels should be fine. I will fix this in the next update.
    Last edited by poi; 09-12-2018 at 03:30 PM.

  11. The Following 11 Users Say Thank You to poi For This Useful Post:

     bmoney (09-13-2018),  Jatt1 (09-12-2018),  khanabadoshi (09-13-2018),  MonkeyDLuffy (09-12-2018),  parasar (09-12-2018),  pnb123 (09-12-2018),  prashantvaidwan (09-12-2018),  Reza (09-12-2018),  Sapporo (09-12-2018),  traject (09-12-2018),  Zaid (09-13-2018)

  12. #2147
    Gold Member Class
    Posts
    2,634
    Sex
    Location
    Canada
    Ethnicity
    Punjabi Sikh Ramgarhia
    Nationality
    Canadian
    Y-DNA
    R1a-Y2568*
    mtDNA
    M3a2

    Does anyone have breakdown of saidu sharif outlier? Pegasus?
    Last edited by MonkeyDLuffy; 09-14-2018 at 03:24 AM.
    Deg Teg Fateh - Victory to Charity and Arms

    Punjab, Punjabi, Fateh.

  13. The Following 2 Users Say Thank You to MonkeyDLuffy For This Useful Post:

     Jatt1 (09-14-2018),  Zuran (09-14-2018)

  14. #2148
    Suspended Account
    Posts
    1,576
    Sex
    Location
    Silicon Valley
    Ethnicity
    JUTT
    Nationality
    American & Canadian
    Y-DNA
    L1a2a1 / L1a1
    mtDNA
    HV2a3 / R0

    Quote Originally Posted by MonkeyDLuffy View Post
    Does anyone have breakdown of saidu sharif outlier? Pegasus?
    It's more or less a more AASI shifted version of SIS BA3 I believe. Although, it might get very minor Steppe.

    Edit: Just modeled it. Exactly as I said.

    Screen Shot 2018-09-14 at 1.07.00 AM.png

    Modeled it alongside SIS BA2 and SIS BA3. Not 100% in line with the paper. Most probably because the simulated AASI is still just slightly off. A little inflated as SIS BA3 should be getting around 42-44% and SIS BA2 closer to 14-15%.

    Screen Shot 2018-09-14 at 1.15.36 AM.png

    Modeled them using the Barcin N + Khvalynsk model as well.

    Screen Shot 2018-09-14 at 1.39.22 AM.png
    Last edited by Sapporo; 09-14-2018 at 08:41 AM.
    I4285 I4285 1873-1661 calBCE (3430±25 BP, PSUAMS-2536) BMAC Sappali_Tepe_BA Sappali Tepe Uzbekistan U7a3 L1a
    I5604 I5604 1880-1697 calBCE (3465±20 BP, PSUAMS-2774) BMAC Bustan_BA Bustan Uzbekistan K1a1 L1a
    I6667 I6667 1497-1413 calBCE (3170±20 BP, PSUAMS-2998) Parkhai_LBA_o Parkhai_LBA_o Parkhai II Turkmenistan HV2a
    I6669 I6669 3082-2909 calBCE (4365±25 BP, PSUAMS-2950) Parkhai_EN Parkhai_EN Parkhai II Turkmenistan HV2
    I4899 I4899 1600-1300 BCE BMAC Bustan_BA Bustan Uzbekistan R0 J

  15. The Following 6 Users Say Thank You to Sapporo For This Useful Post:

     bmoney (09-15-2018),  pegasus (09-14-2018),  pnb123 (09-14-2018),  poi (09-14-2018),  tipirneni (09-19-2018),  Zuran (09-14-2018)

  16. #2149
    Registered Users
    Posts
    2,228
    Location
    Gonur Tepe

    Quote Originally Posted by MonkeyDLuffy View Post
    Does anyone have breakdown of saidu sharif outlier? Pegasus?
    He's like 40% Iran_N, 50% AASI, 10% EHG but lacks Barcin, so it cannot be MLBA Steppe. Irula and Gond do score Barcin, but in their case it's extremely likely representing some archaic combo of Basal and WHG-ish ancestry, not actual ANF.

  17. The Following 5 Users Say Thank You to pegasus For This Useful Post:

     bmoney (09-15-2018),  MonkeyDLuffy (09-14-2018),  poi (09-14-2018),  Sapporo (09-14-2018),  Zuran (09-14-2018)

  18. #2150
    Gold Member Class
    Posts
    2,634
    Sex
    Location
    Canada
    Ethnicity
    Punjabi Sikh Ramgarhia
    Nationality
    Canadian
    Y-DNA
    R1a-Y2568*
    mtDNA
    M3a2

    Quote Originally Posted by pegasus View Post
    He's like 40% Iran_N, 50% AASI, 10% EHG but lacks Barcin, so it cannot be MLBA Steppe. Irula and Gond do score Barcin, but in their case it's extremely likely representing some archaic combo of Basal and WHG-ish ancestry, not actual ANF.
    It gets me the best fits, better than Paniya, under 2.
    Deg Teg Fateh - Victory to Charity and Arms

    Punjab, Punjabi, Fateh.

  19. The Following 2 Users Say Thank You to MonkeyDLuffy For This Useful Post:

     Jatt1 (09-14-2018),  pegasus (09-14-2018)
