# Thread: How I run clustering in Past4 and Excel

1. Complete analysis of Y-DNA haplogroup I-M253 (I1); 50 tests minimum.

Table

Hierarchical clustering

PCA plot (PC 1v2)

PCA plot (PC 1v2) - removed Finland and Spain.

If you can interpret any of this output, you're a better person than me. You probably are anyway.

3. Complete analysis of all subclades below R-P312>ZZ11; 50 R-ZZ11 tests minimum.

R-ZZ11 is the MRCA of R-U152 (42% of England R-ZZ11) and R-DF27 (58% of England R-ZZ11).

I don't agree with the idea that we should hold off analysing modern Y-DNA and instead wait for our subclades of personal interest to turn up in ancient DNA, because:

A. I suspect I'll be long dead before bones containing my precise subclade are dug up somewhere on the continent.
B. I find it interesting to see how the relations between European countries differ depending on which haplogroups we analyse. Somewhere in the ether, Tacitus* is nodding sagely at the PCA plot below.
C. I am attempting to find a haplogroup that doesn't fit the simple formula (Ireland + Netherlands) / 2 = England (assuming a decent number of FTDNA tests are in the analysis).
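For anyone who wants to try the (Ireland + Netherlands) / 2 = England check outside Excel, here is a minimal Python sketch. It assumes each country is a row of subclade percentages and measures how far England sits from the Ireland/Netherlands midpoint. The function name `midpoint_gap` and all the numbers are made up for illustration; they are not FTDNA data.

```python
import math

def midpoint_gap(ireland, netherlands, england):
    """Euclidean distance between England and the Ireland/Netherlands midpoint."""
    midpoint = [(i + n) / 2 for i, n in zip(ireland, netherlands)]
    return math.dist(midpoint, england)

# Hypothetical subclade percentages (each row sums to 100), NOT real FTDNA figures.
ireland = [70.0, 20.0, 10.0]
netherlands = [30.0, 40.0, 30.0]
england = [52.0, 28.0, 20.0]

gap = midpoint_gap(ireland, netherlands, england)
print(f"England is {gap:.2f} percentage points from the midpoint")
```

A gap near zero means the formula holds for that haplogroup; what counts as "near" is a judgment call that depends on how many tests sit behind each percentage.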

---

R-ZZ11 subclades _ 50 tests _ Table - This is rather massive, so apologies to anyone on a tablet as this may be illegible. You'll need to either save the image, or open it in a new window to view on a PC.

R-ZZ11 subclades _ 50 tests _ Hierarchical clustering

R-ZZ11 subclades _ 50 tests _ PCA plot _ PC 1v2

You may note that PCs 1 and 2 capture only 50% of the variance. This is because other PCs are potentially significant, as the scree plot shows.
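As a rough illustration of where a figure like 50% comes from: each PC's share of the variance is its eigenvalue divided by the sum of all eigenvalues, and the scree plot is simply those eigenvalues in order. The sketch below assumes you have read the eigenvalues off the PCA output; the values used here are invented so that the first two add up to half, and are not the ones from the R-ZZ11 analysis.

```python
def explained_variance(eigenvalues):
    """Return (per-PC share, cumulative share) of variance, both in percent."""
    total = sum(eigenvalues)
    shares = [100.0 * ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative

# Hypothetical eigenvalues, largest first; NOT taken from the dat file.
eigenvalues = [3.0, 2.0, 1.5, 1.2, 1.0, 0.8, 0.5]

shares, cumulative = explained_variance(eigenvalues)
for k, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"PC{k}: {share:5.1f}%  (cumulative {cum:5.1f}%)")
```

If the cumulative column climbs slowly after PC 2, as it does with these made-up numbers, that is the sign that the later PCs are worth a look.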

R-ZZ11 subclades _ 50 tests _ Scree plot

If you want to explore the other PCs, you can download the dat file for use in Past 4 from my shared OneDrive folder.

*Silures
Several Roman authors including Pliny, Ptolemy and Tacitus mention this tribe and later civitas (administrative unit in a Roman province). Their territory was south east Wales - the Brecon Beacons and south Welsh valleys. A people of the mountains and valleys, we know relatively little about how they lived.

Like the other tribes of the Welsh Mountains, they were difficult for the Romans to conquer and control. For a time in the period around AD 45-57, they led the British opposition to the Roman advance westwards.

Tacitus describes them as a strong and warlike nation, and for ten years or more the Romans fought to contain, rather than conquer them. Although defeated and occupied by the early 60's, their bitter resistance may explain the late grant of self governing civitas status to them only in the early 2nd century. The capital was established at a previously unoccupied site at Caerwent and was given the name Venta Silurum. Tacitus described them as swarthy and curly-haired, and suggested their ancestors might be from Spain because of the similarities in appearance with some peoples in Spain. However, there is no evidence to suggest any genetic links between south Wales and parts of Spain.

Source: https://www.bbc.co.uk/history/ancien...shtml#nineteen

5. Using the dat file, you can push the analysis further, as I've done below. However, the finer the scale of the analysis, the fewer Y-DNA tests you will be basing the analysis on. But don't let that stop you.

R-ZZ11>U152 subclades _ 50 tests _ Hierarchical clustering

R-ZZ11>U152 subclades _ 50 tests _ PCA plot _ PC 1v2

R-ZZ11>DF27 subclades _ 50 tests _ Hierarchical clustering

R-ZZ11>DF27 subclades _ 50 tests _ PCA plot _ PC 1v2

7. Complete analysis of haplogroup I-P215 (I2); 50 tests minimum.

Table

Hierarchical clustering

PCA plot _ PC 1v2

PCA plot _ PC 1v2 _ Eastern Europe

PCA plot _ PC 1v2 _ Western Europe

My recent posts cover the main Y-DNA haplogroups in England, namely I-M253 (I1), I-P215 (I2), R-U106>>>Z381 and R-P312>ZZ11. Only R-P312>Z290 is missing.

Netherlands currently has 46 R-Z290 tests, and from this experience I would prefer to wait for 50 tests (100 is obviously better but we'll be waiting some time for many relevant countries to reach 100 tests).
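The minimum-tests rule is easy to express as a filter. The sketch below is only an illustration of that step: the Netherlands figure of 46 is the one quoted above, while "Country A" and "Country B" and their counts are placeholders, and the function name is my own.

```python
MIN_TESTS = 50  # the threshold used throughout these posts

def countries_meeting_threshold(test_counts, minimum=MIN_TESTS):
    """Return the countries with at least `minimum` tests, sorted by name."""
    return sorted(c for c, n in test_counts.items() if n >= minimum)

# Netherlands = 46 is from the post; the other two rows are hypothetical placeholders.
test_counts = {"Netherlands": 46, "Country A": 120, "Country B": 75}

print(countries_meeting_threshold(test_counts))
print(countries_meeting_threshold(test_counts, minimum=40))
```

Dropping the threshold to 40 lets the Netherlands back in, at the cost of percentages that rest on fewer tests.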

Analysis could also be performed on the secondary haplogroups E, G and J (Netherlands >50 tests for each), and R1a (Netherlands 36 tests).

I also considered analysing mtDNA, but a cursory glance revealed that Ireland, England and the Netherlands have identical ratios of the main mtDNA haplogroups HV, J, T and U. That doesn't bode well for telling them apart, and it seems like a lot of work for potentially no return.

My final observation is that Northern Ireland has an overall fascinating Y-DNA story that I hope one day will be analysed professionally.

Fin.

9. Complete analysis of Y-DNA haplogroup R-U106; 50 tests minimum. The dat file is available.

This includes the missing 14% of R-U106 in England from my previous R-U106 analysis (i.e. the non-R-Z381 subclades).

For DF96/DF98 fans, R-Z306 is the grandfather of R-Z304, and R-Z304 is the MRCA of R-DF96 and R-DF98 (as great-grandfather and grandfather).

Table

Hierarchical clustering

Scree plot

11. PCA plot _ PC 1v2

PCA plot _ PC 1v3

PCA plot _ PC 2v3

Again, Northern Ireland occupies a fairly unique position vis-à-vis the rest of the Isles.

13. I made a formula error (a missing $ sign in a cell reference) in the I-M253 spreadsheet, so I will post the corrected analysis soon.

---

Complete analysis (Europe only) of Y-DNA haplogroup E-CTS9083 (E1); 50 tests minimum.

Table

Hierarchical clustering

PCA plot _ PC 1v2 - The PC 2v3 plot generally matches geographic expectations better; see the dat file if interested.

The Hierarchical clustering chart is a better summary of the data than the PCA plot, as the clustering chart takes all of the data into consideration, whereas a PC 1v2 plot shows only two components (e.g. Ukraine and the Netherlands are not as closely related as the PCA plot suggests).
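To see how two countries can look like neighbours on a PC 1v2 plot and still be far apart overall, here is a small Python sketch. It assumes the clustering works from distances computed over every column, and it compares that with the distance you get from the first two coordinates only. The two "countries" and their three-component scores are invented for the purpose; they are not Ukraine and the Netherlands, and not values from the dat file.

```python
import math

# Hypothetical scores on three components (PC1, PC2, PC3); NOT from the dat file.
scores = {
    "Country A": [1.0, 2.0, -3.0],
    "Country B": [1.2, 2.1, 3.0],
}

def distance(a, b, n_components=None):
    """Euclidean distance over the first n_components coordinates (all if None)."""
    return math.dist(a[:n_components], b[:n_components])

a, b = scores["Country A"], scores["Country B"]
two_pc = distance(a, b, n_components=2)  # what a PC 1v2 plot shows
full = distance(a, b)                    # what all of the data says

print(f"Distance on PC1/PC2 only: {two_pc:.2f}")
print(f"Distance on all components: {full:.2f}")
```

The pair sits almost on top of each other in two dimensions, but the third component pulls them well apart, which is the kind of disagreement between plot and dendrogram described above.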

The Ire-Eng-Neth formula seems to "pass" (meaning England falls between Ireland and the Netherlands in the major subclades of a haplogroup) in some cases (R-U106, R-ZZ11), and fail in others (E-CTS9083, I-P215). This perhaps relates to (an accumulation of?) different population movements at different times, some of which affected both England and the Netherlands, while others affected England but not the Netherlands.

England's closest populations in Hierarchical clustering (primary; secondary)

I-M253: TBC
I-P215: Wales; France/Italy
R-U106: Netherlands; Wales/Ireland/Scotland
R-ZZ11: Netherlands; France
E-CTS9083: Ireland; Scotland

15. The error in the I-M253 spreadsheet resulted in a small number of countries not quite summing to 100%, but in terms of clustering/PCA nothing has changed with the correction.
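A quick sanity check would have caught this sooner. In Excel, a formula that divides by a total without a $ sign in the reference will drift to the wrong cell when filled across or down, so some rows stop summing to 100%. The Python sketch below is a hedged illustration of the check, not a reconstruction of the actual spreadsheet: the function names, country labels and counts are all hypothetical.

```python
def to_percentages(counts):
    """Convert raw subclade counts for one country into percentages of its total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def rows_not_summing_to_100(table, tolerance=0.01):
    """Return the countries whose percentages do not sum to 100 within tolerance."""
    return [country for country, pcts in table.items()
            if abs(sum(pcts) - 100.0) > tolerance]

# Hypothetical subclade counts; NOT real FTDNA figures.
counts = {"Country A": [30, 50, 20], "Country B": [5, 10, 25]}
table = {country: to_percentages(row) for country, row in counts.items()}

# A deliberately broken row, standing in for a cell that used the wrong denominator.
table["Country C"] = [40.0, 35.0, 22.0]

bad = rows_not_summing_to_100(table)
print("Rows to recheck:", bad)
```

Running a check like this before clustering flags the affected countries, even when, as here, the error turns out to be too small to move the results.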

---

Complete analysis of Y-DNA haplogroup I-M253 (I1); 50 tests minimum.

Table

Hierarchical clustering

PCA plot _ PC 1v2

PCA plot _ PC 1v2 - removed Finland (assumed founder effect from Norway/Sweden).

Scree plot _ No Finland

If all/the vast majority of I1 in England is purportedly post-Roman, then what links England, France, Italy and Hungary?

https://en.wikipedia.org/wiki/Ostrogoths

https://en.wikipedia.org/wiki/Lombards

https://en.wikipedia.org/wiki/Carolingian_Empire

https://en.wikipedia.org/wiki/Normans

---

England's closest populations in Hierarchical clustering (primary; secondary)

I-M253: France; Italy
I-P215: Wales; France/Italy
R-U106: Netherlands; Wales/Ireland/Scotland
R-ZZ11: Netherlands; France
E-CTS9083: Ireland; Scotland

17. Complete analysis of Y-DNA haplogroup J-M304 (J); 50 tests minimum.

According to Wikipedia,

"J-M267 is uncommon in most of Northern and Central Europe. It is, however, found in significant pockets at levels of 5–10% among many populations in southern Europe."

FTDNA data reveals that J-M267 (J1) is ~25% of J in NW Europe, which is why I analysed J as a whole.

I am not able to comment on how much of J across Europe is Jewish or non-Jewish; I can only analyse what has been recorded at FTDNA. Again, according to Wikipedia,

"[J-M267] is also found at very high but lesser extent in parts of the Caucasus, Ethiopia and parts of North Africa and amongst most Levant peoples, incl. Jewish groups, especially those with Cohen surnames."

Table

Hierarchical clustering

PCA plot _ PC 1v2

19. Complete analysis of Y-DNA haplogroup R-P312>Z290; 40 tests minimum (rather than 50, otherwise the Netherlands would have been excluded for the first time).

All dat files are available.

Table

Hierarchical clustering

Scree plot

PCA plot _ PC 1v2

---

England's closest populations in Hierarchical clustering (primary; secondary)

I-M253: France; Italy
I-P215: Wales; France/Italy
R-U106: Netherlands; Wales/Ireland/Scotland
R-Z290: Germany; Wales
R-ZZ11: Netherlands; France
E-CTS9083: Ireland; Scotland
J-M304: Germany; Bulgaria

