Chromo2 Results Data



Williamson
02-19-2014, 01:48 PM
Hi Everyone,

BISDNA has produced a spreadsheet containing the Chromo2 results of 2000 men it has tested. It is not meant to cover everyone, just to give a sample of the various haplogroups they're finding.

The file can be downloaded from

https://www.britainsdna.com/download/C2_2000.zip

The first line of the file contains a short description of the results below it:

These are the Chromo2 Y genotypes for 2000 anonymised individuals. Row 2 has the unique identifier for each sample. Row 3 indicates the inferred haplogroup, and row 4 the inferred subtype, as of February 2014. Rows 5 - 14503 are the genotyping results for all 14498 Y SNPs on the Chromo2 chip (NOT including SNPs that were tested by DNA sequencing or Taqman, such as S28 and S21, which were only tested when phylogenetically relevant), including the 258 SNPs that are currently known to be phylogenetically uninformative or have failed (rows 14246-14503). Column 1 gives the SNP name; column 2 indicates the ancestral (negative) genotype, column 3 indicates the derived (positive) genotype for this SNP, called to the Illumina TOP strand. Columns 4 - 2003 are the genotyping results for each of the 2000 samples. While the Y chromosome is a single copy piece of DNA, the Illumina software used to call alleles is designed for autosomal markers which come in two copies. Hence when AA or GG are given, it means A or G, respectively. A small number of markers give apparent heterozygote calls, e.g. AG. The reason they appear to be heterozygous is again because of limitations in the Illumina genotype clustering software, which expects three clusters. If the true ancestral and derived clusters are too close together, it is not possible to force a "homozygous" call for the one variant, hence an AG call is allowed. In a few cases a marker might arise, for example, by the DNA letter A changing to C, then much later in time in someone with the C it changes back to an A again; this is called back-mutation. For SNPs where the back-mutation state defines a subtype, the ANCESTRAL allele defines the subtype, rather than the derived allele, as would normally be the case. Back-mutations are easily spotted when viewing the results of several samples from the same part of the Y tree.
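
For anyone who would rather script this than scroll, here is a minimal Python sketch of the layout described above. It assumes the spreadsheet has been exported as comma-separated text, which the description does not state; the function names and the tiny demo table are made up for illustration and are not real Chromo2 data. It collapses the diploid-style calls (AA becomes A) and labels each call as derived, ancestral, or unclear (an apparent heterozygote such as AG, or a missing call).

```python
import csv
import io

def collapse(call):
    """Collapse a diploid-style call (e.g. 'AA') to one allele; None if not homozygous."""
    call = call.strip().upper()
    if len(call) == 1 and call in "ACGT":
        return call
    if len(call) == 2 and call[0] == call[1] and call[0] in "ACGT":
        return call[0]
    return None  # apparent heterozygote (e.g. 'AG'), empty cell, or anything else

def classify(call, ancestral, derived):
    """Label one call relative to the ancestral (column 2) and derived (column 3) genotypes."""
    allele = collapse(call)
    if allele is None:
        return "unclear"
    if allele == collapse(derived):
        return "derived"
    if allele == collapse(ancestral):
        return "ancestral"
    return "unclear"

def parse_rows(rows):
    """Row 1: description, row 2: sample IDs, row 3: haplogroup, row 4: subtype, rows 5+: SNPs."""
    ids = rows[1][3:]
    haplogroups = rows[2][3:]
    subtypes = rows[3][3:]
    calls = {}
    for row in rows[4:]:
        if not row or not row[0]:
            continue  # skip blank lines
        name, ancestral, derived = row[0], row[1], row[2]
        calls[name] = [classify(c, ancestral, derived) for c in row[3:]]
    return ids, haplogroups, subtypes, calls

# Made-up miniature table in the described layout (hypothetical names throughout).
demo = io.StringIO(
    "Description line\n"
    ",,,S1,S2,S3\n"
    ",,,R1b,R1a,I\n"
    ",,,sub1,sub2,sub3\n"
    "SNP_X,AA,GG,GG,AA,AG\n"
)
ids, haplogroups, subtypes, calls = parse_rows(list(csv.reader(demo)))
print(ids)
print(calls["SNP_X"])
```

Note that for the back-mutation SNPs mentioned in the description, an "ancestral" label is the state that defines the subtype, so the labels here only report agreement with columns 2 and 3, not membership of a clade.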

Regards,
Alex

Heber
02-19-2014, 06:25 PM
Thanks for posting. I was waiting on this dataset for some time and I find it quite disappointing.
There is a total absence of surname and location data which makes geogenealogical analysis impossible.
It may be useful for determining the phylogeny of BISDNA Trees although I haven't figured that out yet.
I would have preferred a straight comparison of BISDNA SNP names and ISOGG names.

cmorley
02-20-2014, 12:43 AM
I ran my phylogenetic algorithm on this data. Here is a sample of the output, after some basic tuning:
[Attachment 1447]

Rathna
02-20-2014, 03:41 AM
I ran my phylogenetic algorithm on this data. Here is a sample of the output, after some basic tuning:
[Attachment 1447]
Have you run only this page or the whole tree? I published some observations about R-L23 on "R1b phylogeny".

razyn
02-20-2014, 04:03 AM
I ran my phylogenetic algorithm on this data. Here is a sample of the output, after some basic tuning:
[Attachment 1447]

Anybody who isn't impressed should open the current version of Chris Morley's experimental tree and search on (i.e., "Find") CTS4027 -- where there are four haplotypes, and a dead end. The Chromo2 data explodes, at that position.

I agree with Gioiello, it would be nice to see a lot more positions. But this is a promising way to learn a lot about our phylogenetic tree.

I'm glad somebody can handle that huge spreadsheet. I opened it (took a while), but it is intimidating.

Rathna
02-20-2014, 04:31 AM
I agree with Gioiello, it would be nice to see a lot more positions. But this is a promising way to learn a lot about our phylogenetic tree.

I'm glad somebody can handle that huge spreadsheet. I opened it (took a while), but it is intimidating.
Of course I did my inquiry without a computer program, by going through every line, and I made some mistakes, which I'll correct next on "R1b phylogeny" (for instance, it seems that 584, 585, 586 and 587, besides 591, are R-L51). It also seems that the 10 Z2110 have the SNPs S12460, S17864 and S20900 as well, but I have to verify.

Morley's tree, if run completely, would also be useful for understanding the position of the S SNPs; many turned out to be unreliable because they are present everywhere, but many others seem good.
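
A check like "do all the carriers of one SNP also carry these others?" can be scripted rather than done by eye. The sketch below is only an illustration: it assumes the calls have already been reduced to a mapping from SNP name to a list of per-sample labels ("derived", "ancestral" or "unclear"), and the function name and SNP names are hypothetical. For a reference SNP it lists every other SNP that is derived in all of the reference's carriers, together with how many additional samples are derived for it. A count of 0 suggests the two SNPs mark the same branch in this dataset; a larger count suggests an upstream SNP, or one that turns up everywhere.

```python
def shared_by_carriers(calls, ref):
    """SNPs derived in every carrier of `ref`, with the number of extra derived samples."""
    carriers = [i for i, s in enumerate(calls[ref]) if s == "derived"]
    result = {}
    if not carriers:
        return result  # nobody is derived for the reference SNP
    for name, states in calls.items():
        if name == ref:
            continue
        if all(states[i] == "derived" for i in carriers):
            total = sum(1 for s in states if s == "derived")
            result[name] = total - len(carriers)
    return result

# Made-up labels for four samples (not real Chromo2 data).
demo_calls = {
    "REF":      ["derived", "derived", "ancestral", "ancestral"],
    "EQUIV":    ["derived", "derived", "ancestral", "ancestral"],
    "UPSTREAM": ["derived", "derived", "derived", "ancestral"],
    "OTHER":    ["derived", "ancestral", "ancestral", "ancestral"],
}
print(shared_by_carriers(demo_calls, "REF"))
```

Unclear calls count against a match here, so a single no-call among the carriers will hide an otherwise equivalent SNP; that strictness is a design choice of the sketch, not a property of the data.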

Rathna
02-20-2014, 04:42 AM
Of course I did my inquiry without a computer program, by going through every line, and I made some mistakes, which I'll correct next on "R1b phylogeny" (for instance, it seems that 584, 585, 586 and 587, besides 591, are R-L51). It also seems that the 10 Z2110 have the SNPs S12460, S17864 and S20900 as well, but I have to verify.

Morley's tree, if run completely, would also be useful for understanding the position of the S SNPs; many turned out to be unreliable because they are present everywhere, but many others seem good.

S12460 could be the link between my R-Z2110 and all those tested by BritainsDNA for this SNP, and could demonstrate the link of my Z2110* with the British Z2110* and all the CTS9219 subclades. If confirmed, this could be a great discovery.

cmorley
02-20-2014, 09:28 PM
There are only 1999 samples. No sample #407.
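
A quick way to confirm a gap like this, sketched in Python. It assumes the identifiers in row 2 are plain integers running from 1 to 2000, which is my reading of the file rather than something its description states; the function name is hypothetical.

```python
def missing_ids(ids, expected=range(1, 2001)):
    """Return the expected sample numbers that do not appear among the identifiers."""
    present = {int(i) for i in ids}
    return sorted(set(expected) - present)

# Illustration with a made-up identifier row that skips 407.
demo_ids = [str(n) for n in range(1, 2001) if n != 407]
print(len(demo_ids), missing_ids(demo_ids))
```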


Have you run only this page or the whole tree? I published some observations about R-L23 on "R1b phylogeny".

Whole tree. When analysing data like this I think one really needs to take a holistic approach. Otherwise, you miss any systematic irregularities.

The attached image was just an excerpt. It was generated using the Geno variant of my algorithm. I don't want to release the entire report without first checking it over.

The Geno variant produced over 300 phylogenetic inconsistencies. I've just rerun the analysis using the NGS variant of my algorithm, and now there are only 82 phylogenetic inconsistencies reported.
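
To give readers a feel for what an "inconsistency" can mean, here is a generic pairwise compatibility check in Python. It is not my Geno or NGS algorithm, just the textbook test: with the ancestral state known, two SNPs fit on one tree only if their sets of derived samples are nested or disjoint. A pair conflicts when some sample is derived for both, another for only the first, and another for only the second. The input format (SNP name mapped to per-sample labels) and all names are assumptions made for the example.

```python
from itertools import combinations

CLEAR = {"derived", "ancestral"}

def compatible(states_a, states_b):
    """True if two SNPs can sit on the same tree, ignoring samples with unclear calls."""
    seen = set()
    for a, b in zip(states_a, states_b):
        if a in CLEAR and b in CLEAR:
            seen.add((a == "derived", b == "derived"))
    # Conflict only if all three patterns occur: both derived, only A, only B.
    return not {(True, True), (True, False), (False, True)} <= seen

def conflicts(calls):
    """List every pair of SNP names that fails the compatibility test."""
    return [
        (x, y)
        for x, y in combinations(sorted(calls), 2)
        if not compatible(calls[x], calls[y])
    ]

# Made-up labels for four samples (not real Chromo2 data).
D, A = "derived", "ancestral"
demo_calls = {
    "SNP_A": [D, D, A, A],
    "SNP_B": [D, A, A, A],  # nested inside SNP_A: compatible
    "SNP_C": [A, D, D, A],  # overlaps SNP_A without nesting: conflict
}
print(conflicts(demo_calls))
```

Back-mutations and miscalls both show up as conflicts under this test, so a reported pair is a prompt to look closer rather than proof of an error. Comparing all pairs across roughly 14,500 SNPs is slow in pure Python; dropping SNPs with no derived samples first would cut the work considerably.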


Anybody who isn't impressed should open the current version of Chris Morley's experimental tree and search on (i.e., "Find") CTS4027 -- where there are four haplotypes, and a dead end. The Chromo2 data explodes, at that position.

I don't see CTS4027 on the Chromo2 spreadsheet. The screenshot I posted centres on R1a-L448/S200, a close relative of R1a-CTS4027. Geno 2.0 did previously reveal structure downstream of R1a-L448/S200. Still, the Chromo2 results are pleasing for an R1a-L448/S200 researcher. There is actually a more dramatic "explosion" for R1a-Z287/S223, another clade in this genetic neighbourhood.

razyn
02-20-2014, 09:46 PM
I don't see CTS4027 on the Chromo2 spreadsheet. The screenshot I posted centres on R1a-L448/S200, a close relative of R1a-CTS4027.

I wasn't looking at the Chromo2 spreadsheet, I was referring to your screen shot (with CTS4027 at the very top) compared with the Feb. 7 release of your experimental tree.

I have, however, looked at the spreadsheet (in the area that interests me, which isn't R1a1). If you feel any inclination to compare what I found with what your algorithm found, I posted here: http://www.anthrogenica.com/showthread.php?1484-DF27-aka-S250-results-from-the-Chromo2-test&p=31686&viewfull=1#post31686

My money would be on your algorithm. At my age it's hard to scroll down 14,000 rows looking for anomalies... 1,999 times. And I didn't.

pgo1963
02-21-2014, 01:50 AM
Chris, well done. May we see the rest of the tree? I'm particularly interested in haplogroup I. Thanks,

Phil