Z253 analysis of Recent Chromo2 2000 sample spreadsheet.



IrishTypeIII
02-21-2014, 12:55 AM
The recent availability of data from the Britain's DNA Chromo 2 random sample of 2000 men, an extremely large file here (https://www.britainsdna.com/download/C2_2000.zip), has enabled detailed investigation of the SNPs being tested in Chromo 2.

This file is HUGE so I have broken it down somewhat. I have selected just those men that were S218 (which is Britain's DNA name for Z253).
There were a total of 60, of which 4 were also S168 (L226). I then removed all SNPs that were ancestral for ALL of the 60 men, as these are of no interest to our clade.

I was left with 325 SNPs that were derived for ALL 60 men, and presume these are pre-Z253. (Someone else can work out which lie between L21 and Z253).
There were 2 that were derived for most of the 60 so I presume these are Private SNPs in those men.

Then there are 66 SNPs that are a mix of Ancestral and Derived across the 60 men. This is where the branching is occurring post Z253.
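For anyone who would rather script this filtering than do it in a spreadsheet, the steps above can be sketched roughly as follows. This is only a minimal illustration: the layout of the real Chromo 2 file is not described in this thread, so the table `calls`, the "D"/"A" coding, the placeholder SNPs X1 to X3 and the function name `classify_snps` are all assumptions made for the example.

```python
# Hypothetical toy data: kit id -> {SNP name: "D" (derived) or "A" (ancestral)}.
# The real Chromo 2 file layout is not shown here; X1-X3 are placeholder SNPs.
calls = {
    "k1": {"S218": "D", "S168": "D", "X1": "D", "X2": "A", "X3": "A"},
    "k2": {"S218": "D", "S168": "A", "X1": "D", "X2": "D", "X3": "A"},
    "k3": {"S218": "A", "S168": "A", "X1": "D", "X2": "A", "X3": "D"},
}


def classify_snps(calls, marker="S218"):
    """Keep kits derived for `marker`, then bucket every SNP by its calls."""
    # Step 1: select only the men who are derived for the marker SNP.
    selected = {kit: snps for kit, snps in calls.items()
                if snps.get(marker) == "D"}

    # Step 2: bucket each SNP by the set of states seen across those men.
    groups = {"derived_in_all": [], "mixed": [], "ancestral_in_all": []}
    names = sorted({name for snps in selected.values() for name in snps})
    for name in names:
        states = {snps.get(name) for snps in selected.values()}
        if states == {"D"}:
            groups["derived_in_all"].append(name)    # upstream candidates
        elif states == {"A"}:
            groups["ancestral_in_all"].append(name)  # dropped as uninformative
        else:
            groups["mixed"].append(name)             # branching below marker
    return selected, groups


selected, groups = classify_snps(calls)
print(groups)
```

A SNP with a missing call in any selected kit falls into the "mixed" bucket in this sketch, so no-calls would need separate handling on real data.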

Of interest to me is that for the 4 S168(L226) men, there were NO OTHER exclusive SNPs, meaning Chromo2 has found NO SNPs that split the L226 cluster.

For those that would like to have a play, here is my spreadsheet Z253 L226 Chromo2 Data (https://dl.dropboxusercontent.com/u/14028750/Z253_L226%20Chromo2%20Data.xlsx)

Have fun!

hoxgi
02-21-2014, 10:30 AM
Dennis, thank you very much indeed for modifying and posting the spreadsheet of Chromo 2 results. There is a large amount of information in the spreadsheet and it will take time to work through it.

I have been looking at the Z2185 pathway under Z2534. Originally there were five SNPs identified in Thousand Genomes data: Z2182 (a.k.a. S882), Z2183, Z2184 (a.k.a. S883), Z2185 (a.k.a. S893) and Z2186. FTDNA did not develop primers for Z2182 or Z2186. Z2184 has been thought to be phylogenetically equivalent to Z2185, and Z2183 has been thought to be phylogenetically equivalent to L1066.

The data from the Chromo 2 results now allows further conclusions to be drawn, although I accept that confirmation is needed, especially when making a deduction based on the result of only one kit.
1. L1066 and Z2183 remain phylogenetically equivalent.
2. Z2183 is downstream from Z2186, because of the results of kit #929 (Z2183 ancestral, Z2186 derived).
3. Z2186 is downstream from Z2185, because of the results of kit #899 (Z2186 ancestral, Z2185 derived).
4. Z2182, Z2184 and Z2185 are phylogenetically equivalent.
5. The results of S8889 are inconsistent; it appears to be in this pathway but could be either above or below Z2534 (a.k.a. S868).
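The reasoning behind points 2 to 4 can be written down as a small rule: if at least one kit is derived for B but ancestral for A, and no kit shows the reverse, then A sits downstream of B; if the two never differ, they are equivalent in the sample. The sketch below is only an illustration of that rule. The kits k1 to k4 are hypothetical (they mirror the kind of pattern described for kits #929 and #899, not the actual results), and the function name `relate` and its return labels are invented for the example.

```python
# Toy calls: "D" = derived, "A" = ancestral. Kits k1-k4 are hypothetical.
calls = {
    "k1": {"Z2185": "D", "Z2184": "D", "Z2186": "D", "Z2183": "D"},
    "k2": {"Z2185": "D", "Z2184": "D", "Z2186": "D", "Z2183": "A"},
    "k3": {"Z2185": "D", "Z2184": "D", "Z2186": "A", "Z2183": "A"},
    "k4": {"Z2185": "A", "Z2184": "A", "Z2186": "A", "Z2183": "A"},
}


def relate(calls, snp_a, snp_b):
    """Compare two SNPs across all kits and report how they nest."""
    a_without_b = False  # some kit is derived for A but ancestral for B
    b_without_a = False  # some kit is derived for B but ancestral for A
    for snps in calls.values():
        a, b = snps.get(snp_a), snps.get(snp_b)
        if a is None or b is None:
            continue  # skip kits with a missing call for either SNP
        if a == "D" and b == "A":
            a_without_b = True
        elif a == "A" and b == "D":
            b_without_a = True
    if a_without_b and b_without_a:
        return "not_nested"  # parallel branches, or an unreliable SNP
    if b_without_a:
        return "a_downstream_of_b"
    if a_without_b:
        return "b_downstream_of_a"
    return "equivalent"


print(relate(calls, "Z2183", "Z2186"))
print(relate(calls, "Z2186", "Z2185"))
print(relate(calls, "Z2184", "Z2185"))
```

As noted above, a placement that rests on a single kit (as with k2 and k3 here) is only as good as that one result, so it would still need confirmation.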

There are also two “new” SNPs, S884 and S891, which are phylogenetically equivalent to each other and are downstream from Z2185 (there are a number of kits with S884 and S891 ancestral, Z2185 derived). There are two kits, #895 and #899, which are positive for S884 and S891, negative for Z2183 and Z2186, and positive for Z2534, Z2185, Z2184 and Z2182.

So S884 and S891 are in a parallel pathway to Z2186, Z2183 and L1066.

S884 is also known as CTS4314, which is included in Geno 2.0. There is one CTS4314+ result from Geno 2 in Mike’s SNP spreadsheet and the Z253 Project – Christiansen fN94900, who is Z253+, Z2534+, Z2185+ and L1066- (which is consistent with the Chromo 2 results above).

The primers for S884/CTS4314 are known and hopefully it will be tested by FTDNA in Big Y, as well as Geno 2.

I suspect that there is quite a bit more data mining to do from the Chromo 2 spreadsheet. I’ll look at it again tomorrow.

Greg H

IrishTypeIII
02-22-2014, 04:31 AM
I have rearranged the spreadsheet somewhat, and additional SNPs appear more relevant. Comments?
Revised Z253 Chromo2 Data (https://dl.dropboxusercontent.com/u/14028750/Z253_L226%20Chromo2%20Data.xlsx)

hoxgi
02-22-2014, 05:47 AM
Dennis, you've definitely improved the spreadsheet. Having a basic division into Z2534 ancestral and Z2534 derived assists in determining which SNPs are unstable, recurrent or otherwise unreliable. Alex Williamson posted in the L21 Yahoo group that PF2642, S19777, S3023, S8889, V244 and S583 fall into this category. I would also suggest including PF4655, S3846, S6157 and S8933, based on their results in your spreadsheet.

There is another finding in the revised spreadsheet which demands attention. There are four kits (#889, #890, #916 and #930) which are all Z2534- and which are all derived for a group of sixteen SNPs. Every other kit in the spreadsheet is ancestral for every one of these 16 SNPs. The SNPs are S841, S844, S845, S847, S848, S849, S850, S853, S854, S855, S856, S857, S859, S860, S862 and S865.
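A block like this can also be found mechanically by asking which SNPs are derived in exactly a given set of kits and ancestral everywhere else. The following is a minimal sketch of that check. The table is a hand-typed toy subset that reflects only the pattern described above (two of the sixteen SNPs, the four Z2534- kits, and kit #895 as one Z2534+ comparison kit); the function name `snps_exclusive_to` and the data layout are assumptions for illustration.

```python
# Toy subset: "D" = derived, "A" = ancestral. Only the pattern described in
# the post is encoded; this is not the full Chromo 2 table.
calls = {
    "889": {"Z2534": "A", "S841": "D", "S844": "D"},
    "890": {"Z2534": "A", "S841": "D", "S844": "D"},
    "916": {"Z2534": "A", "S841": "D", "S844": "D"},
    "930": {"Z2534": "A", "S841": "D", "S844": "D"},
    "895": {"Z2534": "D", "S841": "A", "S844": "A"},
}


def snps_exclusive_to(calls, kits):
    """Return SNPs derived in exactly `kits` and in no other kit."""
    wanted = set(kits)
    names = sorted({name for snps in calls.values() for name in snps})
    exclusive = []
    for name in names:
        derived_in = {kit for kit, snps in calls.items()
                      if snps.get(name) == "D"}
        if derived_in == wanted:
            exclusive.append(name)
    return exclusive


print(snps_exclusive_to(calls, ["889", "890", "916", "930"]))
```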

None of these SNPs appears to have an alternative name, except for S847, which is also known as CTS2554 and was discovered in 1000 Genomes data. Its primers have not been determined, and it does not appear in any of the Geno 2.0 results in the Z253 Project or in Mike's spreadsheet of L21+ SNP results.

There are some large varieties which are currently Z253*, so ideally we should follow through on these results with Sanger testing of CTS2554 when available.

There is also another new SNP, S7879, which is derived in two of the other Z2534- kits (#893 and #896) and ancestral in all the rest, so this may also be a new terminal SNP under Z253.

Greg H

IrishTypeIII
02-22-2014, 10:19 PM
Greg,

I noticed that there is no L554 cluster and am wondering if the testers 889, 890, 916 and 930 are that cluster. I don't know of an alternative name for L554, but it is ancestral 'A' and derived 'G'. S845, S847, S856, S862 and S865 are all A > G. Is one of these the alternative name for L554?

Updated the Z253 Chromo2 file (https://dl.dropboxusercontent.com/u/14028750/Z253_L226%20Chromo2%20Data.xlsx) once more, grouping the unstable SNPs at the bottom of the spreadsheet.


And this is just the start .... just wait for Big-Y results :-)

hoxgi
02-23-2014, 03:34 AM
Dennis,

As far as I am aware, there are no alternative names for L554 and no L554+ people have tested Chromo 2. There are only around five L554+ people so far known, despite their high GDs to each other. David Pike is admin of the L554 Project and is L554+ himself; he has a Big Y test in progress.

So I suspect that L554 may not have been included in the Chromo 2 tests/results, at least as far as the results released on the public website are concerned. This appears to be the case with PF825 as well.

And yes, Big Y results are going to generate considerably more work. I trust that you will continue to put your spreadsheet abilities to good use :).

I have been trying to select promising "new" SNPs under Z253 for further testing and ISOGG listing. I have made those SNPs with concordant and consistent results from two different testing companies (such as Geno 2 from FTDNA and Chromo 2 from Britain's DNA) my top priority, especially if the relevant primers are already known. This approach has been successful with CTS9881, which is now positioned on the ISOGG Y-tree. My next target SNP at this stage is CTS4314, which defines a new branch below Z2185.

Greg H

IrishTypeIII
02-23-2014, 06:54 AM
Greg,
This is the 'tree' I have constructed from the data presented.
[Attachment 1484: tree diagram]

Interesting?
Dennis

hoxgi
02-23-2014, 12:11 PM
Dennis,

The modified spreadsheet (current version) is brilliantly set out and the tree structure is very useful as a basis for incorporating future results from Big Y and other sources. Readers should keep in mind that quite a few of the SNPs included have only been detected in one sample so far and may remain private. Also, many of the "new" SNPs have not been confirmed by Sanger sequencing or by cross-referencing to other test results such as Geno 2, Big Y or FG. However the tree is very helpful in depicting how the Chromo 2 results appear to fit into the accepted Z253 branch structure.

There's a lot of SNPs!

I'll have to go back and check in detail that my notes correlate with your tree for each new SNP, but I suspect that any discrepancies will be my error rather than yours.

I do have a couple of suggestions to enhance the utility of your tree.

One is to include the SNPs already reliably positioned downstream of Z253 which were not included in the Chromo 2 results. So L554, L1308 and PF825 could be added to the second row, immediately under Z253, and DF73 could be added to the third row, immediately under Z2534. You could also add L894 and L895 (which remain private and equivalent) to the row immediately below L1066.

The other is that, as far as I can see, Z2182 and Z2184 remain equivalent to Z2185, so they should all be boxed together.

You also have a typo in the L1066 box; it should read CTS1202, not CTS1201.

Finally, I wonder what you and others think about S1984; it is ancestral in only one Z253+ sample (which is also Z2534+ Z2185-), so I presume that this represents a back-mutation, which is private (at this stage) when occurring below Z253. However, while it is mainly derived across L21+, there are a few other ancestral results, including everyone who is L159+.

Anyway, great work. As we will use the tree as a research tool, it's useful to include all new SNPs, even those that are currently private or equivalent to other SNPs. Perhaps we should refer to it as "the Z253 Research Tree" to differentiate it from the established and simpler trees such as that of ISOGG.

Greg H

Celtarion
02-23-2014, 10:43 PM
If Full Genome and FTDNA are on time as planned, we might potentially see new branches below Z253 and Z2534 :-)

When I get the data, I'll share the files. I'll be travelling again at the end of next week, so I'll try to upload them through Yahoo! Groups.

Are there any new folders available through Yahoo! Groups for FG and Big Y?

Thanks,

Joss.

IrishTypeIII
02-24-2014, 12:18 AM
Thanks Greg,

Gee I thought I might have made more mistakes than you picked up! My mind was whirring while I was moving boxes around! I believe I have corrected all now, and added the known SNPs as you have recommended.

Yes, I agree, that S1984 is probably a back mutation, and private at that, but for completeness we must report ALL that the Chromo 2 shows us. ISOGG SNPs have been highlighted.

[Attachment 1490: revised tree diagram]

I have updated the tree in the spreadsheet (https://dl.dropboxusercontent.com/u/14028750/Z253_L226%20Chromo2%20Data.xlsx) too.

Cheers

Dennis

hoxgi
02-24-2014, 11:24 AM
Dennis,

Your tree diagram is almost perfect now and I really like the way it correlates visually with the way you have set out the spreadsheet.

I have a couple more corrections for you.

The colour coding for the ISOGG-listed SNPs is an excellent addition. However the following SNPs should be coloured as they have been included in the ISOGG Y-tree: Z2534, PF825, and DF73.

PF825 is actually PF825.2 according to ISOGG and FTDNA.

L894 and L895 should be boxed together, as they are phylogenetically equivalent at present.

Finally, according to your spreadsheet, you should have S808 in its own box below the large box with the 16 SNPs starting with S841.

You are also missing a couple of alternative names: S884=CTS4314 and S847=CTS2554.

I'm not trying to be obsessive (I hope) with these comments. It's just that if we want to use your diagram as a serious research tool, we need it to be as accurate as humanly possible.

Many of the "new" SNPs from the Chromo 2 data have only been found in one person and so are private at present. What do you think about adding a second colour to show those SNPs which are not private, but not yet on the ISOGG Y-tree? They would include the 16 in the one large box on the left of the diagram, as well as L1308, S7839, S27687, S882, S883, Z2186, Z2183, S884, S891, CTS11831, CTS11843, S23267 and S9189. All the others, the private ones, could remain as they are.
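The colour scheme suggested here comes down to counting how many kits are derived for each SNP and checking it against the ISOGG list. A rough sketch of that bookkeeping is below. Everything in it is illustrative: the kits k1 to k3, the placeholder SNP names, the category labels and the function name `categorise` are assumptions, and only Z2534 is taken from the thread as an ISOGG-listed SNP.

```python
# Toy calls: "D" = derived, "A" = ancestral. Kits and the SNP_* names are
# hypothetical placeholders used only to exercise each category.
calls = {
    "k1": {"Z2534": "D", "SNP_SHARED": "D", "SNP_PRIVATE": "D", "SNP_NONE": "A"},
    "k2": {"Z2534": "D", "SNP_SHARED": "D", "SNP_PRIVATE": "A", "SNP_NONE": "A"},
    "k3": {"Z2534": "A", "SNP_SHARED": "A", "SNP_PRIVATE": "A", "SNP_NONE": "A"},
}
isogg_listed = {"Z2534"}  # SNPs already on the ISOGG Y-tree


def categorise(calls, isogg_listed):
    """Label each SNP as isogg, shared, private or not_observed."""
    # Count the kits that are derived for each SNP.
    derived_counts = {}
    for snps in calls.values():
        for name, state in snps.items():
            derived_counts[name] = derived_counts.get(name, 0) + (state == "D")

    categories = {}
    for name, count in derived_counts.items():
        if name in isogg_listed:
            categories[name] = "isogg"         # first colour
        elif count >= 2:
            categories[name] = "shared"        # proposed second colour
        elif count == 1:
            categories[name] = "private"       # left uncoloured
        else:
            categories[name] = "not_observed"  # derived in no kit
    return categories


print(categorise(calls, isogg_listed))
```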

Hopefully Big Y results will be released as planned at the end of the week. Apart from the sheer number of SNPs, it will also be difficult to fit them in with the Chromo 2 results unless Britain's DNA release identifying information for each of their new "S" SNPs. So far we have a list of duplicate names, which is very helpful, but as far as I am aware, we could have one of the "S" SNPs rediscovered by FTDNA in Big Y or by FGC with Full Genomes and given another name, and we wouldn't know that it is a duplicate. Or am I missing something?

Greg H

IrishTypeIII
02-27-2014, 02:16 AM
I have completely redone the spreadsheet and Tree based on Version 2 of the data (https://www.britainsdna.com/download/C2_2000_v2.zip) released by Britain's DNA.

My spreadsheet and tree are in the Files section of the R-L21 project at Yahoo.

For those that do not have access, here is the Z253 Research Data (https://dl.dropboxusercontent.com/u/14028750/Z253%20Research%20Data%20v2.xlsx) spreadsheet and the Z253 Research Tree (https://dl.dropboxusercontent.com/u/14028750/Z253%20Research%20Tree%20V2.pdf).

There are several differences from the version 1 data, and it is a lot cleaner. There are still 60 sets of Z253 results.

Dennis