
Thread: European PCA and beyond

  1. #1
    Registered Users
    Posts
    876
    Sex
    Location
    Netherlands
    Ethnicity
    South-Dutch
    Nationality
    Dutch
    Y-DNA (P)
    I2a2a1b2-CTS1977
    mtDNA (M)
    H13a1a1

    Netherlands Belgium

    European PCA and beyond


    I am interested in how to best prepare a map of the European samples of the Eurogenes Global 25 PCA data.
    The classical method is to use Principal Component Analysis (PCA).
    Data scientists have since developed many alternative methods. In this post I will compare PCA with t-distributed Stochastic Neighbor Embedding (t-SNE).

    PCA transforms a set of correlated variables into a set of uncorrelated variables, which are called 'principal components' or, for short, 'dimensions'.
    The method is used not only to obtain uncorrelated (orthogonal) components, but also to reduce the number of variables: because the higher components are noisier, dropping them results in much cleaner data.

    The Global 25 dataset gives, for each raw DNA sample, estimated scores on 25 PCA components.
    In the complete set of Global 25 samples the optimal number of components appears to be 4.
    When selecting a subset of the Global 25, the independence of the variables may be lost, so it is advisable to perform a secondary PCA to restore the independence of the scores.
    The secondary PCA of the European subset appears to have only 2 principal components. This comes in handy for preparing scatter plots.
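
    The secondary PCA step described above can be sketched in base R as follows. This is only a minimal illustration, not the exact procedure used for the plots in this post: `myData` here is a made-up matrix standing in for the unscaled Global 25 scores of a subset, and the names `sec.pca`, `var.share` and `scores` are chosen for the example.

```r
# Hypothetical stand-in for the unscaled Global 25 scores of a subset:
# 200 samples x 25 columns of made-up, correlated numbers (NOT real G25 data).
set.seed(1)
n <- 200
latent <- matrix(rnorm(n * 2), ncol = 2)     # two underlying directions
mixing <- matrix(rnorm(2 * 25), nrow = 2)    # spread them over 25 columns
myData <- latent %*% mixing + matrix(rnorm(n * 25, sd = 0.1), ncol = 25)

# Secondary PCA on the subset: centre the columns, do not rescale them.
sec.pca <- prcomp(myData, center = TRUE, scale. = FALSE)

# Share of variance per component; a sharp drop suggests how many to keep.
var.share <- sec.pca$sdev^2 / sum(sec.pca$sdev^2)
print(round(var.share[1:5], 3))

# Keep the first two components; their scores are uncorrelated again.
scores <- sec.pca$x[, 1:2]
print(round(cor(scores), 6))
```

    A scatter plot such as `plot(scores)` would then show the two retained dimensions. How many components to keep for real data is a judgment call based on the variance shares.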

    In the ecosystem of Global 25 users it is customary NOT to use the PCA scores as they are, but to multiply them by the square root of the eigenvalues and call the result 'scaled' data.
    When the subset has only 2 principal components, the bias of this 'scaling' will remain limited, but IMO it is an amateurish method without scientific basis.
    Anyway, I do not use it.
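
    For readers unfamiliar with what this 'scaling' does mechanically, here is a small sketch in base R. All numbers are invented for illustration (three hypothetical samples A, B, C and two hypothetical eigenvalues); they are not Global 25 values. The sketch only shows that multiplying each column by the square root of its eigenvalue changes the relative distances between samples.

```r
# Made-up scores on two components for three hypothetical samples.
scores <- rbind(A = c(0.10, 0.02),
                B = c(0.12, 0.08),
                C = c(0.16, 0.02))
eigenvalues <- c(100, 4)   # hypothetical eigenvalues of PC1 and PC2

# 'Scaling': multiply each column by the square root of its eigenvalue.
scaled <- sweep(scores, 2, sqrt(eigenvalues), "*")

# Euclidean distances between the samples before and after scaling.
print(round(dist(scores), 3))
print(round(dist(scaled), 3))
```

    With these invented numbers the nearest neighbour of A is C before scaling and B after scaling, because the first component is stretched five times more than the second.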

    I selected the European samples from the Iron Age to modern times. I dropped some very distant populations (Chuvash, Mari, Bashkir, Kalmyk, Nogai, England_Roman_o).
    Here is the scatterplot of the first two dimensions of the secondary PCA.
    To give an idea of where the populations are located on the plot, I have color-marked 10 populations. These labels are derived from the Global 25 labels; they are not the result of the PCA algorithm.
    secPCA.png

    In the past few years data scientists have developed alternative methods for handling multidimensional data. Many of these methods try to find a 'manifold'.
    The idea is that high-dimensional data may locally lie close to a lower-dimensional manifold. An example is the surface of the earth: its points have 3-dimensional coordinates, but at small distances it appears to be 2-dimensional.
    Here is the scatterplot of the same 25-dimensional European data projected on a 2-dimensional t-SNE manifold.
    t-SNE.png

    Again the color-marking is from the Global 25 labeling, not from the t-SNE algorithm.
    The scatterplot shows a much more detailed structure than the secondary PCA. Yet in a topological sense, the structure appears similar.
    (Note the two yellow outliers in both plots. They are Italian_Northeast_o:ALP188 and Italian_Trentino-Alto-Adige_o:ALP414.)
    On the lower right are two Caucasian clusters. On the upper right the North East European populations are separated from the more Western Europeans.
    A t-SNE plot should not be interpreted in terms of genetic distances, but in terms of shared near neighbors.
    t-SNE is considered a useful tool for exploring and visualizing, but the result should always be validated.
    In this case the clustering of the color-marked populations appears to be plausible.
    Yet my feeling is that the result could have been stronger with more data, especially from Central and Eastern Europe.
    t-SNE was designed for datasets with more dimensions and more samples.
    As it is now, the clustering may be more or less overfitted, but most models are. I like it as an interesting exploration, which is methodologically very different from PCA.

  2. The Following 30 Users Say Thank You to Huijbregts For This Useful Post:

     Просигој (09-29-2019),  μόνος (10-06-2019),  Agamemnon (09-30-2019),  Andour (10-03-2019),  anglesqueville (09-29-2019),  Aroon1916 (09-29-2019),  Camulogène Rix (09-29-2019),  cpan0256 (10-01-2019),  dchicn (09-29-2019),  Defski (10-02-2019),  Dimanto (10-25-2019),  DMXX (09-29-2019),  Endovelicus (09-30-2019),  Garimund (09-29-2019),  Jessie (09-29-2019),  JMcB (09-29-2019),  loxias (10-16-2019),  MitchellSince1893 (10-02-2019),  Nino90 (09-29-2019),  ph2ter (09-29-2019),  Power77 (10-02-2019),  Pribislav (10-06-2019),  Radboud (09-29-2019),  randwulf (09-29-2019),  Ruderico (09-29-2019),  Sea Warrior (10-04-2019),  sktibo (09-29-2019),  timberwolf (10-01-2019),  traject (10-07-2019),  Trelvern (09-29-2019)

  3. #2
    Moderator
    Posts
    6,205
    Sex
    Location
    Normandy
    Ethnicity
    northwesterner
    Y-DNA (P)
    R-BY3604-Z275
    mtDNA (M)
    H5a1

    Normandie Netherlands Friesland Finland Orkney
    Hi Ger. Did you use the tsne package in R?
    En North alom, de North venom
    En North fum naiz, en North manom

    (Roman de Rou, Wace, 1160-1170)

  4. The Following 3 Users Say Thank You to anglesqueville For This Useful Post:

     Agamemnon (09-30-2019),  JMcB (09-29-2019),  Power77 (10-02-2019)

  5. #3
    Registered Users
    Posts
    876
    Sex
    Location
    Netherlands
    Ethnicity
    South-Dutch
    Nationality
    Dutch
    Y-DNA (P)
    I2a2a1b2-CTS1977
    mtDNA (M)
    H13a1a1

    Netherlands Belgium
    There are several implementations of t-SNE in R.
    I used Rtsne.
    The Rtsne function has many optional parameters. I used:
    res.tsne <- Rtsne(myData, perplexity=15, initial_dims=2, theta=0, max_iter=3000)
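
    For anyone who wants to try the call above, here is a minimal self-contained sketch. It assumes the Rtsne package is installed, and it uses a made-up random matrix in place of the real `myData`, so the resulting embedding has no genetic meaning. The name `embedding` is chosen for the example.

```r
library(Rtsne)

# Hypothetical stand-in for myData: 150 samples x 25 columns of random numbers
# (NOT real Global 25 scores).
set.seed(42)   # t-SNE is stochastic; fix the seed to make a run repeatable
myData <- matrix(rnorm(150 * 25), ncol = 25)

# Same settings as in the call above. Per the Rtsne documentation,
# initial_dims is the number of dimensions retained by the initial PCA step
# (performed by default), and theta = 0 requests exact t-SNE.
res.tsne <- Rtsne(myData, perplexity = 15, initial_dims = 2,
                  theta = 0, max_iter = 3000)

# The 2-dimensional coordinates, one row per sample.
embedding <- res.tsne$Y
print(dim(embedding))
```

    The rows of `embedding` are in the same order as the rows of `myData`, so population labels can be attached for color-marking a scatter plot.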

  6. The Following 6 Users Say Thank You to Huijbregts For This Useful Post:

     Agamemnon (09-30-2019),  anglesqueville (09-29-2019),  JMcB (09-29-2019),  ph2ter (09-29-2019),  Power77 (10-02-2019),  Ruderico (10-01-2019)

  7. #4
    Moderator
    Posts
    6,205
    Sex
    Location
    Normandy
    Ethnicity
    northwesterner
    Y-DNA (P)
    R-BY3604-Z275
    mtDNA (M)
    H5a1

    Normandie Netherlands Friesland Finland Orkney
    Thanks, but what about the data? smartpca (the underlying program for G25) uses genotypes in Eigenstrat format. Which matrix is your "myData"?

  8. The Following 3 Users Say Thank You to anglesqueville For This Useful Post:

     Agamemnon (09-30-2019),  JMcB (09-29-2019),  Power77 (10-02-2019)

  9. #5
    Registered Users
    Posts
    876
    Sex
    Location
    Netherlands
    Ethnicity
    South-Dutch
    Nationality
    Dutch
    Y-DNA (P)
    I2a2a1b2-CTS1977
    mtDNA (M)
    H13a1a1

    Netherlands Belgium
    Quote Originally Posted by anglesqueville View Post
    Thanks, but what about the data? smartpca (the underlying program for G25) uses genotypes in Eigenstrat format. Which matrix is your "myData"?
    The data are the unscaled PCA scores as supplied by Eurogenes.
    I have selected the Iron Age and modern European samples, with the exception of some distant populations:
    Chuvash, Mari, Bashkir, Kalmyk, Nogai, England_Roman_o.

  10. The Following 4 Users Say Thank You to Huijbregts For This Useful Post:

     Agamemnon (09-30-2019),  JMcB (09-29-2019),  Power77 (10-02-2019),  Ruderico (09-29-2019)

  11. #6
    Registered Users
    Posts
    2,693
    Sex
    Location
    Zagreb
    Ethnicity
    Croatian (NW)
    Nationality
    Croatian
    Y-DNA (P)
    I2a1a2b (Y5596>A815)

    Croatia Austrian Empire Slovenia
    Looks promising.
    I will try it in the near future.

  12. The Following 3 Users Say Thank You to ph2ter For This Useful Post:

     Agamemnon (09-30-2019),  JMcB (09-29-2019),  Power77 (10-02-2019)

  13. #7
    Gold Class Member
    Posts
    3,659
    Sex
    Location
    Calne,England
    Ethnicity
    British and Irish
    Nationality
    Great Britain
    Y-DNA (P)
    E-Y45878
    mtDNA (M)
    H67

    United Kingdom Scotland England Ireland
    Obviously a lot of thought goes into something like this, but for the average person who's interested in exploring their ethnicity, this kind of thing is as clear as mud.
    Please support Mental health research and world community grid


  14. The Following User Says Thank You to firemonkey For This Useful Post:

     jstephan (10-17-2019)

  15. #8
    Registered Users
    Posts
    876
    Sex
    Location
    Netherlands
    Ethnicity
    South-Dutch
    Nationality
    Dutch
    Y-DNA (P)
    I2a2a1b2-CTS1977
    mtDNA (M)
    H13a1a1

    Netherlands Belgium
    Quote Originally Posted by firemonkey View Post
    Obviously a lot of thought goes into something like this, but for the average person who's interested in exploring their ethnicity, this kind of thing is as clear as mud.
    Unfortunately, exploring ethnicity is not as simple as counting blood cells.

  16. The Following 5 Users Say Thank You to Huijbregts For This Useful Post:

     Просигој (09-29-2019),  Agamemnon (09-30-2019),  JMcB (09-29-2019),  Power77 (10-02-2019),  Trelvern (09-29-2019)

  17. #9
    Moderator
    Posts
    6,205
    Sex
    Location
    Normandy
    Ethnicity
    northwesterner
    Y-DNA (P)
    R-BY3604-Z275
    mtDNA (M)
    H5a1

    Normandie Netherlands Friesland Finland Orkney
    Ger, first of all, kudos for this: I'm perfectly aware that this methodology is highly interesting. But, at least in some ways of use and interpretation, there will likely be some problems. In my understanding, t-SNE can basically be used if the data can be seen as points on a 2-manifold. With a data matrix of the kind that we want to use here, that is a heavy assumption. Example: I used Rtsne with your settings on a matrix of "unscaled" G25 components of some forum users (including yourself, btw), plus some European individuals from a few populations (British, Irish, German, Italian, Spanish, Ukrainian and Finnish). The matrix includes Agamemnon, his father (South Italian + Jewish), and his mother (English). This trio is very badly treated, as we could expect from a basically local (in a topological sense) methodology:

    plot.jpg

    edit: the brown cluster is the German one, the dark blue the BritishIrish
    edit edit: hopefully you will not take this post as criticism. In fact, I'm rather excited (in a very positive sense) about your idea. I'm only wondering if there is a way to treat this kind of issue.
    Last edited by anglesqueville; 09-29-2019 at 04:45 PM.

  18. The Following 7 Users Say Thank You to anglesqueville For This Useful Post:

     Agamemnon (09-30-2019),  Camulogène Rix (09-29-2019),  Dewsloth (09-30-2019),  Drewcastle (10-07-2019),  Helgenes50 (10-08-2019),  JMcB (09-29-2019),  Power77 (10-02-2019)

  19. #10
    Registered Users
    Posts
    1,270
    Sex
    Location
    Sweden
    Ethnicity
    Germanic + Italo-Celt
    Nationality
    Swedish
    Y-DNA (P)
    R-L2 / R1b-U152
    mtDNA (M)
    H1a1

    Sweden Italy Italy 1861-1946 Sami Vatican
    Good luck!

    Eurogenes K13 Mixed Mode Population Sharing:
    1. 63.1% Swedish + 36.9% Spanish_Andalucia @ 2.93

    Dodecad K12b
    7. 66.4% Norwegian + 33.6% TSI30 (Metspalu) @ 3.24
    9. 60.6% Norwegian + 39.4% N_Italian @ 3.32

     

    [1] "distance%=4.0727"
    Nino_scaled

    FARMERS-Balkans_Neolithic,42
    STEPPE-Eneolithic,35.8
    HUNTERS-WHG,16.6
    HUNTERS-West_Siberia_Neolithic,5
    EAST-ASIATIC_Neolithic,0.4

    IBEROMAURUSIAN,0.2
