Page 1 of 30 12311 ... LastLast
Results 1 to 10 of 298

Thread: How I run clustering in Past4 and Excel

  1. #1
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    How I run clustering in Past4 and Excel

    This will be over multiple posts, as I'm limited to 5 images per post. I will try to use images rather than lots of description.

    Step by step, how to run G25 clustering using the following software:

    Past4
    Excel


    Step 1: Import the G25 modern dataset into Excel.

    Attachment 39244

    Find the Global25_PCA_modern file on your PC.
    Import>Load.
    Close the Queries window.
    Select all.
    Copy.
    Paste special>Values.
    Select column B and insert a new column.
    Enter the following into cell B2: =LEFT(A2, FIND(":", A2&":")-1)
    Fill down (double click bottom right of cell B2).


    Step 2: Select the populations to be clustered.

    Create a new worksheet.
    Copy and paste the populations you want to cluster. I've chosen a Northern and Western European subset for this example.
    Title the first two columns Sample and Population.

    Attachment 39245

    Step 3: Run clustering in Past4

    Copy the entire datasheet in Excel (Select all>Copy).
    Open Past4.
    Check Show Row and Column attributes (top left).
    Click on the blank cell in the Name column, left of A and above 1.
    Paste.
    Above the Population column select Group from the dropdown menu.

    Attachment 39246

    Save the file.
    Select all (top left cell does this quickly).

    Attachment 39247

    Enter the number of clusters to be created (I chose 10 for this example).
    Click OK.
    Click Copy.

    Go back to Excel.
    Insert two blank columns between the Population and PC1 columns.
    Click on cell C1 and paste.

    Attachment 39248
    Last edited by Capitalis; 08-29-2020 at 09:07 PM.

  2. The Following 26 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  btree (09-16-2020),  Bygdedweller (01-21-2021),  Devadatta (10-19-2020),  digital_noise (08-29-2020),  E_M81_I3A (08-30-2020),  FionnSneachta (08-30-2020),  Garimund (08-29-2020),  Itrane2000 (01-08-2021),  jadegreg (10-08-2020),  Jessie (09-19-2020),  JMcB (08-29-2020),  jstephan (09-02-2020),  Kale (12-27-2020),  laltota (01-06-2021),  MitchellSince1893 (08-31-2020),  msmarjoribanks (10-10-2020),  Northern Adriatic (04-11-2021),  Nqp15hhu (08-31-2020),  PLogan (08-30-2020),  randwulf (12-30-2020),  Robert1 (08-31-2020),  Rufus191 (06-16-2021),  Telfermagne (01-27-2021),  Tomenable (09-06-2021),  Trelvern (11-12-2020)

  3. #2
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    Step 4: Lots of work in Excel!

    Select all.

    Attachment 39249

    Attachment 39250

    Sheet should now be sorted by population clusters.

    Attachment 39251

    Select entire Column B (Population).
    Copy.
    Paste special>Values.

    Insert a blank column between the Population and Item columns.
    Enter the following in cell C2: =B2&"_"&E2
    Press Enter and fill down.
    Select entire Column C (Clustered population).
    Copy.
    Paste special>Values.
    Select Columns B, D and E (control click to select multiple) and delete them.

    Sheet should look like this:

    Attachment 39252

    Select all and copy.


    Step 5: Run LDA analysis in Past4

    Go back to Past4.
    Click on the Sample cell.
    Paste.
    Select all.

    Attachment 39253

    Takes a few seconds to run LDA analysis.
    Last edited by Capitalis; 08-29-2020 at 08:45 PM.

  4. The Following 12 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  FionnSneachta (08-30-2020),  jadegreg (10-08-2020),  JMcB (08-29-2020),  Kale (12-27-2020),  msmarjoribanks (10-10-2020),  Nqp15hhu (09-01-2020),  randwulf (12-30-2020),  Robert1 (08-31-2020),  Rufus191 (06-16-2021),  Tomenable (09-24-2021),  Trelvern (11-12-2020)

  5. #3
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    You should now have a visual representation of the relationships between the clusters you created.

    Attachment 39254

    You can play around with the Graph settings to your liking.

    Or you can change the colours and symbols.

    Attachment 39255

    You can select a subset of your clustering (e.g. remove Eastern Europeans) and run another LDA analysis.

    You can add ancient samples (to the plot or to the clustering run in Past4), by loading the Global25_PCA dataset into Excel and following the same process outlined here.

    Of course you can import your own G25 coordinates into Excel too. I keep a separate sheet with Anthrogenica members that I've collected over time.

    You can run PCA instead of LDA (LDA emphasises differences between groups, which is why I prefer it here).
    Last edited by Capitalis; 08-29-2020 at 09:05 PM.

  6. The Following 11 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  Devadatta (10-19-2020),  FionnSneachta (08-30-2020),  Jessie (09-19-2020),  JMcB (08-29-2020),  Kale (12-27-2020),  Nqp15hhu (09-01-2020),  randwulf (12-30-2020),  Robert1 (08-31-2020),  Tomenable (09-24-2021),  Trelvern (11-12-2020)

  7. #4
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    If anyone attempts to follow this from scratch, with no prior experience, and manages to generate a plot in Past4... well done!
    Last edited by Capitalis; 08-30-2020 at 12:34 PM.

  8. The Following 5 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  btree (09-16-2020),  FionnSneachta (08-30-2020),  JMcB (08-29-2020),  Nqp15hhu (09-01-2020)

  9. #5
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    Quote Originally Posted by Capitalis View Post
    You can select a subset of your clustering (e.g. remove Eastern Europeans) and run another LDA analysis.
    Now we get to the point of this clustering process.

    Here I've run an LDA analysis on a mostly Western European subset of the dataset I clustered above.

    Axis 1 vs Axis 2
    Attachment 39265

    In the clustering process (when we ran K=10 above), three Irish clusters were created. This means that instead of one Irish population to compare ourselves with, we now have three.

    If you view Axis 1 vs Axis 3, you can clearly see the separation between the Irish clusters. This is because Axis 3 is dominated by within-country variation.

    Axis 1 vs Axis 3
    Attachment 39266

    This means that Axis 1 vs Axis 3 should not be interpreted as a geographic map of Europe, but rather as many individual countries sharing the same space in order to display their regional differences.
    Last edited by Capitalis; 08-30-2020 at 09:40 AM.

  10. The Following 8 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  FionnSneachta (08-30-2020),  Garimund (08-31-2020),  jadegreg (10-08-2020),  JMcB (08-30-2020),  Nqp15hhu (08-31-2020),  randwulf (12-30-2020),  Tomenable (09-24-2021)

  11. #6
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    You can now begin to model your ancestry.

    In the clustering process, four English clusters were created. I can see that I am likely a combination of English_6 and either Irish_9 or Irish_3.

    Axis 1 vs Axis 3
    Attachment 39267

    Axis 1 vs Axis 2
    Attachment 39268

    My Irish ancestry is mostly from SW Ireland, West Cork in particular. I am not attracted to Irish_4, which is a hint that this is a northern half of Ireland cluster.

    We can support this interpretation by adding Anthrogenica member FionnSneachta, who is largely of Connacht ancestry, to the plot.

    Axis 1 vs Axis 3
    Attachment 39269

    Axis 1 vs Axis 2
    Attachment 39270
    Last edited by Capitalis; 08-30-2020 at 12:37 PM.

  12. The Following 10 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-30-2020),  FionnSneachta (08-30-2020),  Garimund (08-31-2020),  jadegreg (10-08-2020),  Jessie (09-19-2020),  JMcB (08-30-2020),  Nqp15hhu (08-31-2020),  randwulf (12-30-2020),  Ruderico (08-31-2020),  Tomenable (09-24-2021)

  13. #7
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    We now want to run some models, guided by what we have observed on the LDA plots, in Vahaduo.

    Go back to the Excel worksheet.
    Right click on the worksheet tile at the bottom of the screen.

    Attachment 39271

    Attachment 39272

    Insert two blank columns between the Sample and Clustered population columns.
    Enter the following in cell B2: =REPLACE(A2,1,FIND(":",A2)-1,"")
    Press Enter and fill down.
    Enter the following in cell C2: =D2&""&B2
    Press Enter and fill down.
    Select entire Column C.
    Copy.
    Paste special>Values.
    Select Columns A, B and D (control click to select multiple) and delete them.

    Sheet should look like this:

    Attachment 39273
    Last edited by Capitalis; 08-30-2020 at 11:11 AM.

  14. The Following 9 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-31-2020),  FionnSneachta (08-30-2020),  Garimund (08-31-2020),  jadegreg (10-08-2020),  JMcB (08-30-2020),  Nqp15hhu (09-01-2020),  randwulf (12-30-2020),  Ruderico (08-31-2020),  Tomenable (09-24-2021)

  15. #8
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    We now want to create a CSV file, so we can copy the source populations we want to model with into Vahaduo.

    In Excel:
    File>Export.

    Attachment 39274

    Click Save As.
    Name the file and click Save.
    Click OK on the warning message.

    Locate the newly created CSV file on your PC.
    Right click on the file.

    Attachment 39275

    File should look like this in Notepad:

    Attachment 39276
    Last edited by Capitalis; 08-30-2020 at 11:26 AM.

  16. The Following 9 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-31-2020),  FionnSneachta (08-30-2020),  Garimund (08-31-2020),  jadegreg (10-08-2020),  JMcB (08-30-2020),  Nqp15hhu (09-01-2020),  randwulf (12-30-2020),  Ruderico (08-31-2020),  Tomenable (09-24-2021)

  17. #9
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    Copy and paste into the Vahaduo Source tab the population clusters that you identified in the LDA analysis in Past4.

    Attachment 39277

    Paste your G25 coordinates into the Target tab, then select the Single tab and run the model.

    Attachment 39278

    Looks pretty good based on my known ancestry and LivingDNA results.

    Try any alternative model(s) that you identified.

    Attachment 39279

    Not so good.

    I then wanted to confirm in Vahaduo that Irish_4 is a northern half of Ireland cluster, so I ran Anthrogenica member FionnSneachta with all Irish samples as sources.

    Attachment 39280

    Attachment 39281

    Looks good.
    Last edited by Capitalis; 08-30-2020 at 12:39 PM.

  18. The Following 7 Users Say Thank You to Capitalis For This Useful Post:

     Andour (08-31-2020),  FionnSneachta (08-30-2020),  Garimund (08-31-2020),  jadegreg (10-08-2020),  JMcB (08-30-2020),  Nqp15hhu (09-01-2020),  randwulf (12-30-2020)

  19. #10
    Registered Users
    Posts
    1,026
    Sex
    Omitted

    If this thread is too much and you just want the finished DAT file for use in Past4, or TXT file with the clustered populations for use in Vahaduo, I have uploaded G25 N+W Europe K10 clustered to my shared OneDrive folder.

    My G25 coordinates:

    Code:
    Capitalis_G25,0.0119,0.0134,0.0137,0.0131,0.0136,0.0047,0.0024,0.0016,0.0007,0.0029,-0.0022,-0.0015,-0.0093,-0.0019,0.0142,0.0055,0.0002,0.0020,0.0051,0.0019,0.0055,0.0062,-0.0016,0.0073,0.0008


    Attachment 39282
    Last edited by Capitalis; 08-30-2020 at 10:41 PM.

  20. The Following 6 Users Say Thank You to Capitalis For This Useful Post:

     FionnSneachta (08-30-2020),  jadegreg (10-08-2020),  JMcB (08-30-2020),  randwulf (12-30-2020),  Ruderico (08-31-2020),  Tomenable (09-24-2021)

Page 1 of 30 12311 ... LastLast

Similar Threads

  1. Replies: 227
    Last Post: 08-30-2020, 12:03 PM
  2. PAST4 Greek regionals plot.
    By dosas in forum Southern
    Replies: 0
    Last Post: 07-16-2020, 10:25 AM
  3. 1405 Ancient DNA Genomes - BAM Analysis in Excel File
    By falconson1 in forum Ancient (aDNA)
    Replies: 16
    Last Post: 12-08-2019, 10:58 PM
  4. A TMRCA Estimator Excel spreadsheet
    By MJost in forum Other
    Replies: 42
    Last Post: 03-29-2014, 08:32 PM
  5. Excel Sheet of Haplotypes Glitch
    By scottraveler in forum General
    Replies: 1
    Last Post: 08-09-2012, 07:26 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •