
UCLA SPA Results



Telfermagne
07-31-2013, 06:56 PM
Finally got access to a non-cloud-based computer with a command prompt, so I could run UCLA's SPA on my 23andMe (V2 to V3 upgrade) raw data. Feel free to share your results in this thread if you have used this utility. The two-way mix was a pleasant surprise: the first location, in Kent, England, and the second, in Lorraine, account for my mix of British Islander and Alemannic (Alsace-Lorraine and Switzerland), Rhenish (Rhine-Pfalz) and Rhine-Alemannic (Baden).


This is consistent with Dr. Doug McDonald’s prediction of 53.2% English and 46.8% French and his “genetic location” prediction (if I were an unmixed individual) - which was off the coast of Southern England (near the Channel Islands).


http://i1161.photobucket.com/albums/q506/sar1227/UCLASPA2waymixprediction_zpsd40d89fe.jpg (http://s1161.photobucket.com/user/sar1227/media/UCLASPA2waymixprediction_zpsd40d89fe.jpg.html)

Sein
08-08-2013, 04:51 AM
Hi there,

For those of us who are computationally illiterate (like myself), how do you use the command prompt, and what do you actually have to do? This tool looks like a lot of fun, but I am completely handicapped when it comes to even simple stuff like this. They don't really explain it well on the page. Thank you!

MJost
08-08-2013, 11:18 AM
Just for fun I plotted my four Jost siblings' dual SPA results on a Google map. Chris is the True paternal location and his other end is the oddest. Maternal is Dutch GGgrandfather and Old Romanian core aDNA for GGgrandmother, who was born in Poland in 1864. My line shows half maternal with my four siblings. The parental DNA is so randomly distributed to the children.

https://mapsengine.google.com/map/edit?mid=zY_XzlsWjKZs.kQzKSUPqSDZg

MJost

Telfermagne
08-10-2013, 09:03 PM
The SPA seems to be a lot like Dr. McDonald's analysis. It can only accurately pinpoint one's general genetic "homelands" when one is of one ancestry group, of two ancestry groups, or of two chief ancestry groups with any others being so minute or redundant that they would not be very influential in the analysis. When one is "Heinz 57" the test can only give an "ancestral average" result, so not everyone will get unambiguous results.

It was a pleasant surprise to see actual ancestral residencies within a reasonable radius of both of my points.


From Kent -

Hurstbourne Priors - 98.3 miles from Kent
Andover - 103 miles from Kent
Halesworth - 121 miles from Kent
Westhall - 126 miles from Kent
Stainforth - 205 miles from Kent
Whitkirk - 221 miles from Kent
Whitgift - 219 miles from Kent
Blacktoft - 223 miles from Kent
----------------------------------------------------------
Castlederg - 534 miles from Kent
Churchill - 561.72 miles from Kent

From Lorraine -

Saarbrücken - 60.9 miles from Lorraine
Alsace - 106.3 miles from Lorraine
Hütten - 104.4 miles from Lorraine
Berwangen - 174.6 miles from Lorraine
Fürfeld - 176.46 miles from Lorraine
Kuchen - 206.9 miles from Lorraine

Telfermagne
08-13-2013, 06:23 PM
Saw some variation after doing multiple runs so I did 20 consecutive runs then identified the most common prediction.

The dual runs will be the most meaningful for I am an admixed individual. My primary ancestry groups are from England, Ulster, Wales and the German states that were once part of Alemannia and Lotharingia, then Bavaria. Given the extent of my mix, the program can only give ancestral “averages” - or best fits under the assumption that I am of only two ancestry groups.

8 out of 20 runs predict my first ancestry as being Welsh.
8 out of 20 runs predict my first ancestry as being English.
4 out of 20 runs predict my first ancestry as being Northern French.
16 out of 20 runs predict one ancestry group as being British Islander.
11 out of 20 runs predict my second ancestry group as being German.
9 out of 20 runs predict my second ancestry group as being German-French (Lorraine & Alsace).
20 out of 20 runs predict my second ancestry group as being from a German cultured region.

1.) cd C:\
2.) spa --mfile genome_Seth_Reeder_Full_20130731102500.txt --model-input europe.model --location-output europe.loc -n 2
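For anyone who would rather not retype step 2 for every run, here is a minimal Python sketch that builds the same command and repeats it. Only the flags shown above (--mfile, --model-input, --location-output, -n) come from this post. The helper names, the per-run output file names, and the assumption that the spa program can be found from the current folder are illustrative guesses, not something taken from the SPA documentation.

```python
import subprocess

def build_spa_command(genome_file, model_file, output_file, n_ancestries=2):
    """Return the argument list for one SPA run, mirroring step 2 above."""
    return [
        "spa",
        "--mfile", genome_file,
        "--model-input", model_file,
        "--location-output", output_file,
        "-n", str(n_ancestries),
    ]

def run_spa_repeatedly(genome_file, model_file, runs=20):
    """Run SPA `runs` times, saving each result under its own (hypothetical) file name."""
    outputs = []
    for i in range(1, runs + 1):
        out = f"europe_run{i:02d}.loc"  # assumed naming scheme, one file per run
        subprocess.run(build_spa_command(genome_file, model_file, out), check=True)
        outputs.append(out)
    return outputs

# Example call (not executed here):
# run_spa_repeatedly("genome.txt", "europe.model", runs=20)
```

Writing each run to a separate file is a design choice so that a later run does not overwrite an earlier prediction.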
--------------------------------------------------------------------------------------------------------------------
Trial Run (July 31, 2013) -
If a two way mix:
- My predicted first ancestry is from Kent, South East England
- My predicted second ancestry is from Lorraine, North-eastern France

If unmixed:
- My predicted ancestry is from Picardy, Northern France
--------------------------------------------------------------------------------------------------------------------

Dual Run (assumes that the subject descends from two ancestry groups)
Official Run 1 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 2 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Pas-de-Calais, Northern France
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 3 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from London, South East England
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 4 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from North-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 5 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 6 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 7 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Worcestershire, West Midlands, England
- My predicted second ancestry is from Baden-Württemberg, Germany

Official Run 8 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Pas-de-Calais, Northern France
- My predicted second ancestry is from Ardennes, Northern France

Official Run 9 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Kent, South East England
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 10 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 11 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Buckinghamshire, South East England
- My predicted second ancestry is from Baden-Württemberg, Germany

Official Run 12 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Kent, South East England
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 13 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Buckinghamshire, South East England
- My predicted second ancestry is from Alsace, North-eastern France

Official Run 14 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Pas-de-Calais, Northern France
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 15 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 16 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from North-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 17 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Mid-west Wales
- My predicted second ancestry is from Bavaria, Germany

Official Run 18 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Buckinghamshire, South East England
- My predicted second ancestry is from Baden-Württemberg, Germany

Official Run 19 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Pas-de-Calais, Northern France
- My predicted second ancestry is from Lorraine, North-eastern France

Official Run 20 (August 13, 2013)
- 1 individual 991,786 SNPs are read
- 103,218 SNPs in common between model and genome file
- My predicted first ancestry is from Hampshire, South East England
- My predicted second ancestry is from Lorraine, North-eastern France
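The tallies quoted at the top of this post can be rechecked from the 20 official runs listed above. The sketch below is one way to do the count in Python. The grouping of first-ancestry locations into Wales / England / Northern France is a reading of the run list, and the variable names are only illustrative.

```python
from collections import Counter

# (first ancestry group, second ancestry region) for Official Runs 1-20,
# transcribed from the run list above.
RUNS = [
    ("Wales", "Bavaria"),                      # 1  Mid Wales
    ("Northern France", "Lorraine"),           # 2  Pas-de-Calais
    ("England", "Lorraine"),                   # 3  London
    ("Wales", "Bavaria"),                      # 4  North-west Wales
    ("Wales", "Bavaria"),                      # 5  Mid-west Wales
    ("Wales", "Bavaria"),                      # 6  Mid-west Wales
    ("England", "Baden-Württemberg"),          # 7  Worcestershire
    ("Northern France", "Ardennes"),           # 8  Pas-de-Calais
    ("England", "Lorraine"),                   # 9  Kent
    ("Wales", "Bavaria"),                      # 10 Mid Wales
    ("England", "Baden-Württemberg"),          # 11 Buckinghamshire
    ("England", "Lorraine"),                   # 12 Kent
    ("England", "Alsace"),                     # 13 Buckinghamshire
    ("Northern France", "Lorraine"),           # 14 Pas-de-Calais
    ("Wales", "Bavaria"),                      # 15 Mid-west Wales
    ("Wales", "Bavaria"),                      # 16 North-west Wales
    ("Wales", "Bavaria"),                      # 17 Mid-west Wales
    ("England", "Baden-Württemberg"),          # 18 Buckinghamshire
    ("Northern France", "Lorraine"),           # 19 Pas-de-Calais
    ("England", "Lorraine"),                   # 20 Hampshire
]

first_counts = Counter(first for first, _ in RUNS)
second_counts = Counter(second for _, second in RUNS)
pair_counts = Counter(RUNS)

print(first_counts.most_common())
print(second_counts.most_common())
print(pair_counts.most_common(1))  # the single most recurrent combination
```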

Telfermagne
08-14-2013, 07:36 PM
Upped it to 40 runs total (20 runs on August 13, 2013 and another 20 on August 14, 2013). Gonna eventually get the total up to 100 and see how the figures look.

Western Britain + Southern Germany
12 Wales + S German
5 West Mid England + S German
1 Irish Sea nearer to SE Ireland + West Austrian bordering S German
Total = 18 out of 40. Prediction occurs 45% of the time.


Northern France + North-eastern France
10 N France + NE France
Total = 10 out of 40. Prediction occurs 25% of the time.

South East England + North-eastern France
9 SE England + NE French
Total = 9 out of 40. Prediction occurs 22.5% of the time.


South East England + Southern Germany
3 SE England + S German
Total = 3 out of 40. Prediction occurs 7.5% of the time.

Telfermagne
06-27-2014, 04:00 AM
I wound up doing a total of 60 runs with SPA. I will share the final tally and one more step. I decided to average the results to see where that single point fell in relation to what I got in single-population mode. It turns out the single-population result usually coincided with where my two-way splits balanced out (Northern France). I'm going to go out on a limb and wager that this coincidence reinforces the point that the single-population mode is a kind of "ancestral average", like Dr. McDonald's spot on the map. Sort of beating a dead horse, but doing so with the hope that there's evidence weighing the club/bat/mallet/hammer/stick. The fact that SPA gives different outcomes with varying recurrence leads me to suspect that the engines behind 23andMe's and FTDNA's aDNA tools do something similar. I'm wondering whether they balance out the probabilities for the final presentation or just go with whatever their programs first spit out.


Western British Islander and Southern Germanosphere mix
15 Wales + Southern Germany
10 West Midlands + Southern Germany
2 Irish Sea nearer to SE Ireland + Southern German/West Austrian Border
Total = 27 out of 60. Prediction occurs 45% of the time.

South East English and Southern Germanosphere mix
7 South East England + Southern Germany
Total = 7 out of 60. Prediction occurs 11.66666% of the time.

South East English and North-eastern French mix
13 South East England + North-eastern France
Total = 13 out of 60. Prediction occurs 21.66666% of the time.

Northern French and North-eastern French mix
13 Northern France + North-eastern France
Total = 13 out of 60. Prediction occurs 21.66666% of the time.
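As a quick check on the percentages in this tally, here is a small Python sketch that recomputes each category's share of the 60 runs. The category labels are shortened versions of the headings above, and the dictionary name is just illustrative.

```python
# Run counts per category, taken from the 60-run tally above.
category_counts = {
    "Western British Islander + Southern Germanosphere": 15 + 10 + 2,
    "South East English + Southern Germanosphere": 7,
    "South East English + North-eastern French": 13,
    "Northern French + North-eastern French": 13,
}

total_runs = sum(category_counts.values())

# Share of runs for each category, as a percentage rounded to two decimals.
percentages = {
    name: round(100 * count / total_runs, 2)
    for name, count in category_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```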

With regard to going another step and averaging out the two-way split to see if it matches the general vicinity of what I get with single population mode: The point that I get in single population mode jumps between somewhere in Northern France and the English Channel (more frequently in Northern France though). To assist in this task I will be using an online utility for finding a geographic midpoint: http://www.geomidpoint.com/

First the weights between the initial points seem to be equal since SPA does not yield detailed percentages. It just gives two spots on the map, so I will assume that it's attempting to predict a 50/50 split. Next I'm going to work with a line that goes between the reported points.

Under category 1 (Western British Islander + Southern Germanosphere mix), my point farthest Northwest is in Ireland while my point farthest Southeast is in Southern Germany near the Austrian border (I will just assign Austria for the second point). The equal weighted geographic midpoint between Ireland and Austria is in Belgium. I will recycle the percentage of recurrence for this category of two way split as a weight, and that is 45% for a weight of 0.45. This will be point A.

Under category 2 (Southeast English + Southern Germanosphere mix), my point farthest Northwest is in SE England (let's give it to Kent) while my point farthest Southeast is in Southern Germany around Munich. The equal weighted geographic midpoint between Southeast England and the part of Germany around Munich is in Luxembourg. This will be point B and its weight will be 0.1166666.

A weighted midpoint between points A & B falls well within Belgium. So, I will create a new point A in Belgium with a weight combining that of the old point A and point B for a total of 0.5666666.

Under category 3 (Southeast English + Northeast French), my point farthest Northwest is in SE England (let's give it to Kent also) while my point farthest Southeast lies in Northeast France (let's give it to Meuse). Point C will then be in the Pas-de-Calais region of France. The weight will be 0.2166666.

Under category 4 (NE France + Northern France), I will find a midpoint between Picardy and Meuse (both recurrent locations in SPA). Point D will then be in the Marne department. The weight will be 0.2166666.

Points C & D have equal weight, and a midpoint is in Picardy and it will be the new point B. The weight will then be 0.4333332.

The midpoint between the new A & B is, voilà, in Northern France, around the Nord department, which is in the vicinity of other locations given by the single population mode.

Telfermagne
09-25-2014, 07:53 PM
Played around a bit with my SPA results and redid the weighted-geographic midpoint between my results (similar to 23andMe's Global Similarity). The result that I got is about 76-91 miles away from what SPA spat out for my non-mixed mode (Weighted Midpoint = Audignies; non-mixed mode = Picardy). This is very tight and well within a 600 mile error (a rule of thumb that I've seen Dr. McDonald toss around at FTDNA Forums). I also compared this weighted midpoint to what I got with my 23andMe results.

The midpoint I get with my 23andMe results is Wrangle, Boston, Lincolnshire, England which is about 439 miles from the weighted-midpoint I get from SPA. This also is well within a 600 mile error.
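For anyone who wants to check distances like these against the 600 mile rule of thumb, here is a minimal Python sketch using the haversine (great-circle) formula. The Wrangle coordinates are the ones given under "Point B Scratchwork" later in this post. The Audignies coordinates are only an approximate assumption and do not come from this thread. Great-circle figures can also differ from what a road or map tool reports, so treat the output as a rough cross-check.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in statute miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

WRANGLE = (53.00458, 0.234492)  # 23andMe weighted midpoint, from this post
AUDIGNIES = (50.28, 3.82)       # SPA weighted midpoint; approximate, assumed coordinates

distance = haversine_miles(*WRANGLE, *AUDIGNIES)
print(f"{distance:.0f} miles apart; within 600 miles: {distance <= 600}")
```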

More details regarding the SPA weighted midpoint:

http://alookatgeneticgenealogy.blogspot.com/2014/09/a-look-at-uclas-spatial-ancestry.html

In short I found a midpoint for each of the 60 mixed-mode results SPA spat out (un-weighted since SPA does not give precise %). I then found a weighted midpoint between the previously found midpoints with the weight being the % of occurrence.

Rebecq, Belgium 0.2333 occurrence
Ath, Belgium 0.1333 occurrence
Ham-sur-Heure-Nalinnes, Belgium 0.0333 occurrence
Wetteren, Belgium 0.0333 occurrence
Lobbes, Belgium 0.0833 occurrence
Peruwelz, Belgium 0.05 occurrence
La Férée 0.2 occurrence
Oisy-le-Verger 0.0167 occurrence
Solesmes 0.0833 occurrence
Caudry 0.2 occurrence
Hédauville 0.0167 occurrence
Wattignies 0.0167 occurrence
Viesly 0.0333 occurrence
Sarton 0.0167 occurrence
Longueville 0.0167 occurrence
Leschelle 0.0333 occurrence

How I got the 23andMe weighted midpoint (I split and shared Nonspecific N. Euro between the various N. Euro populations):

Point B Scratchwork: 53.00458, 0.234492. Wrangle, Boston, Lincolnshire, England.
23andMe ID: Seth Reeder (Cadwallon)

52.2% British & Irish
11.6% French & German
3.1% Scandinavian
29.1% Nonspecific Northern European
-------------------------------------------------------------------
1.1% Iberian
0.5% Ashkenazi
2.2% Nonspecific European
0.1% West African
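The per-country weights in the list that follows appear to be built by splitting each 23andMe category evenly among its countries and then adding an equal share of the nonspecific percentages. The Python sketch below reproduces that arithmetic. The divisor of 7 for both nonspecific categories is inferred from the numbers in the post (0.291 / 7 and 0.022 / 7), so treat it as an assumption about how the scratchwork was done.

```python
# 23andMe percentages from the list above, as fractions.
BRITISH_IRISH = 0.522
FRENCH_GERMAN = 0.116
SCANDINAVIAN = 0.031
NONSPECIFIC_N_EURO = 0.291
IBERIAN = 0.011
NONSPECIFIC_EURO = 0.022

# Assumed: each nonspecific category is divided by the 7 Northern European
# countries (UK, Ireland, France, Germany, Sweden, Norway, Denmark).
n_euro_share = NONSPECIFIC_N_EURO / 7
euro_share = NONSPECIFIC_EURO / 7

weights = {
    "United Kingdom": BRITISH_IRISH / 2 + n_euro_share + euro_share,
    "France": FRENCH_GERMAN / 2 + n_euro_share + euro_share,
    "Sweden": SCANDINAVIAN / 3 + n_euro_share + euro_share,
    "Spain": IBERIAN / 2 + euro_share,
}

for country, weight in weights.items():
    print(f"{country}: {weight:.9f}")
```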

United Kingdom 52.735567708185, -1.6789861408734. Weight 0.261 + 0.0415714286 + 0.0031428571 = 0.305714286
Republic of Ireland 53.115425619273, -7.4069761943761. Weight 0.261 + 0.0415714286 + 0.0031428571 = 0.305714286
Weighted Outcome1: 52.959934, -4.530406. Weight = 0.611428572

France 47.123369318629, 2.6805478522224. Weight 0.058 + 0.0415714286 + 0.0031428571 = 0.102714286
Germany 50.846581236184, 9.6860092676237. Weight 0.058 + 0.0415714286 + 0.0031428571 = 0.102714286
Weighted Outcome1: 49.037944, 6.052218. Weight = 0.205428572

Sweden 58.888444437865, 15.535572129725. Weight 0.0103333333 + 0.0415714286 + 0.0031428571 = 0.055047619
Norway 61.089862500005, 9.9201984517371. Weight 0.0103333333 + 0.0415714286 + 0.0031428571 = 0.055047619
Denmark 55.853126914967, 10.864851918353. Weight 0.0103333333 + 0.0415714286 + 0.0031428571 = 0.055047619
Weighted Outcome1: 58.633564, 12.117512. Weight = 0.165142857

Spain 39.674578415577, -3.2843582298931. Weight 0.0055 + 0.0031428571 = 0.0086428571
Portugal 39.704754462102, -9.1655868359197. Weight 0.0055 + 0.0031428571 = 0.0086428571
Weighted Outcome1: 39.726771, -6.224329. Weight = 0.0172857142

Ashkenazi (based on Polish cities with notable Jewish populations) 51.906906, 18.550952. Weight = 0.005

West Africa 8.48445, -13.23445. Weight = 0.001

Final Weighted Outcome = 53.00458, 0.234492
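The weighted midpoints above can be approximated offline with a few lines of Python. This is a sketch that assumes the online calculator uses the usual center-of-gravity method (convert each point to 3D Cartesian coordinates, take the weighted average, convert back). That assumption and the function name are not confirmed by the post. As a check, the United Kingdom and Republic of Ireland points with equal weights should land very close to the 52.959934, -4.530406 listed above.

```python
import math

def weighted_midpoint(points):
    """Center-of-gravity midpoint of (lat_deg, lon_deg, weight) tuples, in degrees."""
    x = y = z = total = 0.0
    for lat_deg, lon_deg, weight in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        # Unit-sphere Cartesian coordinates, scaled by the point's weight.
        x += weight * math.cos(lat) * math.cos(lon)
        y += weight * math.cos(lat) * math.sin(lon)
        z += weight * math.sin(lat)
        total += weight
    x, y, z = x / total, y / total, z / total
    lon_mid = math.atan2(y, x)
    lat_mid = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat_mid), math.degrees(lon_mid)

# Coordinates and weights copied from the list above.
uk = (52.735567708185, -1.6789861408734, 0.305714286)
ireland = (53.115425619273, -7.4069761943761, 0.305714286)

lat, lon = weighted_midpoint([uk, ireland])
print(f"{lat:.6f}, {lon:.6f}")
```

Merging points pairwise and then merging the intermediate results, as done in the scratchwork, is not guaranteed to give exactly the same answer as feeding every point to the function at once, so small differences are to be expected.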

Telfermagne
09-29-2014, 05:01 PM
Did 50 more runs, this time with an updated raw data file from 23andMe. Had forgotten to replace that after 23andMe made some changes. Results weren't too much different.

As noted before SPA does not give the same result each time. That being said, run the program multiple times and find the most recurrent result. I presume that the most recurrent result is the most likely result (general area as opposed to exact coordinates).

My most likely outcome was one half British Islander and one half German speaker. This outcome happened 46% of the time. My next most likely outcome was one half British Islander and one half Northern (& Northeastern) French. This outcome happened 34% of the time. The next most likely outcome was one half Northern (& Northeastern) French and one half Northern (& Northeastern) French. This outcome happened 20% of the time.

EDIT:

I ran it again with single population mode. I got the same result every time (Picardy). I'm guessing that either the mixed-mode is unstable, or my previous guess is true (the most recurrent combination = the most likely).