Thread: Eurogenes Global 25 with Gradient Descent Optimization - Run Your Own Models!

  1. #1
    Gold Class Member
    Posts
    3,147
    Sex
    Location
    Pennsylvania
    Ethnicity
    West European
    Nationality
    USA
    Y-DNA (P)
    R1b-U152-Z36+FGC6511
    mtDNA (M)
    H11a2a

    United States of America Germany England France Scotland Ireland

    Eurogenes Global 25 with Gradient Descent Optimization - Run Your Own Models!

    My friend has worked with me to finish an online version of the Gradient Descent optimization for Eurogenes Global 25 modeling. I mostly provided guidance and specifications; he provided the coding and the knowledge of the optimizers. I have been running models on request for a few months while this was being finished. Now you can run them for yourself. Give it a try:

    https://yk.github.io/ancestry

    I will make a few more posts with "how to use it", but you may want to dive right in. Have at it!
    Last edited by randwulf; 01-09-2020 at 12:31 PM.

  3. #2
    If you are curious, you can search for sites/posts/articles on Gradient Descent, such as this one:

    https://towardsdatascience.com/machi...t-366b77b52645

    The application screen looks like the following:

    MainScreen.png

    In the top box, paste your scaled or unscaled coordinates (with or without the header line). Then pick a model. The eight Eurogenes spreadsheets are available, or you can choose "Custom", which opens a text box where you can paste your own model. Just make sure that you use scaled coordinates with scaled spreadsheets and unscaled with unscaled.
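
    For anyone preparing their own "Custom" input, here is a rough sketch of how comma-separated Global 25 rows could be read, with or without a header line. This is only an illustration, not the application's actual parser; the function name parse_g25 and the sample values are made up, and only three of the 25 columns are shown to keep it short.

```python
import csv
import io


def parse_g25(text):
    """Parse pasted Global 25 rows into {label: [floats]}.

    A header line such as ",PC1,PC2,..." is skipped because its
    fields are not numeric. (Hypothetical helper, for illustration.)
    """
    rows = {}
    for fields in csv.reader(io.StringIO(text.strip())):
        if len(fields) < 2:
            continue  # blank or malformed line
        label = fields[0].strip()
        try:
            coords = [float(value) for value in fields[1:]]
        except ValueError:
            continue  # non-numeric fields: treat as a header line
        rows[label] = coords
    return rows


# Made-up sample with a header line and one target row.
sample = """,PC1,PC2,PC3
Target_scaled,0.1252,0.1361,0.0596
"""
parsed = parse_g25(sample)
print(parsed)
```

    The same function accepts the input without the first line, which matches the "with or without the header line" behavior described above.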

    Next, choose an optimizer. Two are available: AdaGrad and Adam. They each work a little differently and have some different parameters that you can set. I recommend using the defaults at first, though you may want to experiment with the options. The application is designed to run in your browser on your device, so you won't hurt anything by experimenting. If it hangs, I recommend just reloading the page. If you pick a lot of steps, it may run for a while.

    I think the two parameters you will want to experiment with first are the learning rate and the number of steps. For example, for my family I have found that I get some pretty good results using the unscaled, modern individuals spreadsheet with a learning rate of 300 and 10000 steps. You may find differently for your coordinates.
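
    To give a feel for what the learning rate and the number of steps control, here is a minimal sketch of an AdaGrad fit of mixture weights on a toy problem. It is not the application's code. The softmax trick for keeping the weights positive and summing to 1, the squared-distance loss, the function names, and the tiny two-dimensional "populations" are all assumptions made for the illustration, and the default learning rate here is tuned to the toy data rather than to real Global 25 spreadsheets.

```python
import math


def softmax(logits):
    """Map unconstrained numbers to positive weights that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_adagrad(target, sources, learning_rate=0.5, steps=2000, eps=1e-8):
    """Fit mixture weights with AdaGrad; returns (weights, loss_history)."""
    n, dims = len(sources), len(target)
    logits = [0.0] * n   # start from equal weights
    accum = [0.0] * n    # AdaGrad state: running sum of squared gradients
    history = []
    for _ in range(steps):
        w = softmax(logits)
        # Residual between the weighted mix of sources and the target.
        resid = [sum(w[i] * sources[i][d] for i in range(n)) - target[d]
                 for d in range(dims)]
        history.append(sum(r * r for r in resid))  # squared distance
        # Gradient of the squared distance with respect to each weight.
        g_w = [2.0 * sum(resid[d] * sources[i][d] for d in range(dims))
               for i in range(n)]
        # Chain rule through softmax: dL/dz_i = w_i * (g_i - sum_j w_j g_j).
        mean_g = sum(w[j] * g_w[j] for j in range(n))
        for i in range(n):
            g = w[i] * (g_w[i] - mean_g)
            accum[i] += g * g
            # Each parameter's step shrinks as its gradient history grows.
            logits[i] -= learning_rate * g / (math.sqrt(accum[i]) + eps)
    return softmax(logits), history


# Toy data: the target is exactly 50% / 30% / 20% of three made-up sources.
sources = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.2]
weights, history = fit_adagrad(target, sources)
print([round(100 * w, 1) for w in weights])
```

    In a sketch like this, a larger learning rate takes bigger steps early on, and more steps give the optimizer longer to settle, which is the trade-off you are adjusting with those two settings.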
    Last edited by randwulf; 01-09-2020 at 03:19 AM.

  5. #3
    The first output you will get looks like the following:

    lossgraph.png

    This graph shows the progress the Gradient Descent optimizer makes as it steps "down the hill" from its starting point toward a closer-distance model. It gives you a visual idea of how efficiently your parameters allow the optimizer to find a solution. Often you will see fairly quick initial progress, followed by smaller and smaller steps over time.
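
    If you wanted to check for that flattening numerically rather than by eye, something like the following sketch would do it. The function name has_plateaued, the window size, and the tolerance are all made up for the example; the application itself only shows the graph.

```python
def has_plateaued(losses, window=100, rel_tol=1e-4):
    """True if the loss improved by less than rel_tol (as a fraction of
    its earlier value) over the last `window` steps."""
    if len(losses) <= window:
        return False  # not enough history to judge
    earlier, latest = losses[-window - 1], losses[-1]
    if earlier == 0:
        return True  # already at zero loss, nothing left to gain
    return (earlier - latest) / abs(earlier) < rel_tol


# A curve that is still dropping, then the same curve with a flat tail.
still_dropping = [1.0 / (1 + t) for t in range(200)]
flat_tail = still_dropping + [still_dropping[-1]] * 200
print(has_plateaued(still_dropping), has_plateaued(flat_tail))
```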

  7. #4
    The second output is the listing that is fairly standard around here: a distance%, followed by populations and their percentages. Only results greater than 0.5% are displayed; the rest are summed into the "Other" category. If you have a large "Other" category, you can usually reduce it by adjusting the learning rate and steps. I found that I can usually get it below 0.5% by watching the graph to see whether the optimizer is still making progress or has stopped (indicated by a long, level line at the end of the run), and then adjusting the settings so that the optimizer approaches the solution from a slightly different perspective. Here is what that report looks like:

    percs.png
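
    The 0.5% cutoff and the "Other" bucket can be sketched like this. It is just an illustration of the rule described above, with a made-up function name and made-up population labels and weights.

```python
def summarize(weights, labels, threshold=0.5):
    """Split fitted weights into displayed rows and an "Other" total.

    Returns (rows, other): rows are (label, percent) pairs above the
    threshold, largest first; other is the summed percent of the rest.
    """
    percents = [(label, 100.0 * w) for label, w in zip(labels, weights)]
    rows = sorted((p for p in percents if p[1] > threshold),
                  key=lambda p: p[1], reverse=True)
    other = sum(pct for _, pct in percents if pct <= threshold)
    return rows, other


# Made-up fit result: two tiny components fall below the cutoff.
labels = ["Pop_A", "Pop_B", "Pop_C", "Pop_D", "Pop_E"]
fitted = [0.362, 0.618, 0.016, 0.003, 0.001]
rows, other = summarize(fitted, labels)
for label, pct in rows:
    print(f"{label},{pct:.1f}")
print(f"Other,{other:.1f}")
```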
    Last edited by randwulf; 01-09-2020 at 03:35 AM.

  9. #5
    The last report is a bar graph representation of your results. If you hover over or click any of the bars, its percentage is displayed. This looks like the following:

    bargraphresults.png
    Last edited by randwulf; 01-09-2020 at 03:36 AM.

  11. #6
    Both optimizers allow the addition of an "L2" penalty to introduce additional "regularization" into the optimization process, beyond anything the optimizers already have built into their standard formulas. This is something like the penalty in nMonte. For the mathematically minded, here is an article showing how L2 is used with the Adam optimizer; I imagine you could find something similar for AdaGrad.

    https://towardsdatascience.com/adam-...n-6be9a291375c

    If you leave this at zero, you just get the default behavior of the optimizer with no additional regularization.
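
    As a rough picture of what an L2 penalty does, the sketch below adds the usual (l2 / 2) * sum(p^2) term to a loss and the matching l2 * p term to each gradient component. I am assuming the penalty is applied to the parameters being optimized, which is the common convention; the application may define it differently. The function name and the numbers are made up.

```python
def l2_penalized(loss, grad, params, l2=0.0):
    """Return (loss, grad) with an L2 penalty on params added.

    Adds (l2 / 2) * sum(p^2) to the loss and l2 * p to each gradient
    component. With l2 = 0 both come back unchanged.
    """
    new_loss = loss + 0.5 * l2 * sum(p * p for p in params)
    new_grad = [g + l2 * p for g, p in zip(grad, params)]
    return new_loss, new_grad


# Made-up numbers: the penalty pulls the larger parameter back harder.
plain = l2_penalized(1.0, [0.2, -0.1], [1.0, 2.0], l2=0.0)
penalized = l2_penalized(1.0, [0.2, -0.1], [1.0, 2.0], l2=0.1)
print(plain, penalized)
```

    With l2 set to zero the function returns its inputs unchanged, which corresponds to the "no additional regularization" default.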

  13. #7
    Both optimizers can sample the spreadsheet before processing the model if you supply an initialization "Standard Deviation" amount. This can have some interesting effects, but to be honest I am not sure exactly how best to use it. If you leave the value at 0.0, the entire spreadsheet is used. My friend said this technique is used for very large data sets, and that this data set may not be large enough for it to be useful. Nevertheless, it is there if you want to try it out.

  15. #8
    The Adam optimizer has two additional parameters, "Beta1" and "Beta2". These are part of the actual formula that is running. The following article, again for the mathematically minded, runs through AdaGrad, RMSProp (not part of the application at this time), and Adam in some detail, including the Adam Beta1 and Beta2 parameters. Most of the articles I have read recommend leaving these at the default values that you see in the application, but I did get some interesting results by modifying them slightly. It doesn't hurt to try, but you may want to just stick with the defaults. Here is the article:

    https://towardsdatascience.com/learn...5-65a2f3583f7d
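
    To show where Beta1 and Beta2 sit in the formula, here is a sketch of one textbook Adam update, run on a toy one-parameter problem. Beta1 sets how much of the previous gradient average is kept, and Beta2 does the same for the average of squared gradients. The defaults shown (0.9 and 0.999) are the ones usually quoted in the literature; I have not checked them against the application, and the function name and the toy problem are made up.

```python
import math


def adam_step(params, grads, m, v, t, learning_rate=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update. t is the 1-based step count.

    Returns (new_params, new_m, new_v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g      # running mean of gradients
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # running mean of squared gradients
        m_hat = m_i / (1.0 - beta1 ** t)           # bias correction for early steps
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - learning_rate * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy problem: minimize (x - 3)^2 starting from x = 0.
x, m, v = [0.0], [0.0], [0.0]
first_x = None
for t in range(1, 501):
    grad = [2.0 * (x[0] - 3.0)]
    x, m, v = adam_step(x, grad, m, v, t, learning_rate=0.1)
    if t == 1:
        first_x = x[0]  # the first step has a size of about learning_rate
print(round(x[0], 3))
```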

  17. #9
    Global Moderator
    Posts
    3,727
    Sex
    Location
    Vissaiom
    Ethnicity
    Portuguese highlander
    Y-DNA (P)
    E-Y31991>FT17866
    mtDNA (M)
    H20 (xH20a)

    Asturias Galicia Portugal 1143 Portugal 1485 Portugal Order of Christ PortugalRoyalFlag1830
    Thanks randwulf

    I tried with standard settings using population averages (I always like averages because they allow me to better understand how a sample behaves), and the output is identical to nMonte3.
    Code:
    [1] "distance%=1.2135"
    
             Ruderico
    
    Spanish_Castilla_Y_Leon,61.8
    Spanish_Asturias,36.2
    Spanish_Pais_Vasco,2



    Edit: Same with ancients. I wonder if it would be a good idea to remove low-res samples from the list.

    Last edited by Ruderico; 01-09-2020 at 01:56 PM.

  19. #10
    Originally posted by Ruderico:
        Thanks randwulf

        I tried with standard settings using population averages (I always like averages because they allow me to better understand how a sample behaves) and the output is identical to nMonte3

        Edit: Same with ancients, I wonder if it would be good idea to remove low-res samples from the list

    I guess I want to keep it simple and load the published files. But the Custom option allows you to paste your own pruned version into a text box and run it. It can take an entire spreadsheet pretty easily; I have tried it.
