
Y-Full results: what can I do with them?



tonwil61
05-29-2014, 10:25 AM
Hi Guys,
I need advice about my Y-full results.

They have analysed my FTDNA .bam files and today released the results: 53185 SNPs, 90 private SNPs (different from the novel variants supplied by FTDNA?), 497 STR results, as well as full mtDNA results (reported against rCRS and RSRS values).

My questions are: how can I share this data (ysearch.com has no capability for these extra STRs, and the SNPs cannot be translated to the FTDNA database, or can they)? And how can the data add to the current pool of knowledge about R1b-L2 if it can't be shared? Sorry if these sound like stupid questions.

Cheers

Tony Wilson

lgmayka
05-29-2014, 02:21 PM
YFull has groups, but none specifically for U152 yet. (There are groups for L21, U106, DF27, and P312.) Perhaps you should ask YFull management to create a group for U152.

tonwil61
05-30-2014, 07:44 AM
Hi Guys,
As lgmayka suggested, I contacted the administrators at YFull. I am happy to announce there is now an R-U152 group available, thanks to the quick and efficient Vadim Urasin at YFull.

Cheers

Tony Wilson

MitchellSince1893
05-30-2014, 07:04 PM
I just requested to join that group....looks like 44 other people beat me to it.

EDIT: Actually the 44 other people were from studies such as 1000 genome project.

There are now 4 individual members for a total of 48.

Bolgeris
05-30-2014, 07:17 PM
I will join that group when my BAM file arrives. :)

R.Rocca
05-30-2014, 07:17 PM
Aside from being able to see one's own SNPs with the associated positions, is there a place where you can see the SNP positions of others, especially those that get put on their Y-Full tree?

MitchellSince1893
05-30-2014, 07:35 PM
Once I get accepted into the group I will tell you what's viewable.

haleaton
05-30-2014, 09:23 PM
Thanks, I forgot to apply and add my data.

RCO
05-31-2014, 01:54 AM
Yes, you can see the SNPs of the others as well.

R.Rocca
05-31-2014, 02:12 AM
I'm more interested in the positions, not the SNP names. Can you tell me if they show the positions of the SNPs of others?

RCO
05-31-2014, 02:15 AM
Yes, they show the positions related to the known nodes in their tree (ISOGG tree), but you can only observe the results from your group. I only know the ones from the J1 group.

MitchellSince1893
05-31-2014, 02:29 AM
Rich, not sure if I'm answering your question, but when I went to the YBrowser and entered position 22867545 I got a table that showed this (just an excerpt):


Sample ID  HG                 22867545

REFSEQ                        A

HG00152    R1b1a2a1a2b        A
NA20754    R1b1a2a1a2b        A
NA19685    R1b1a2a1a2b*       A
NA19661    R1b1a2a1a2b*       A
NA19720    R1b1a2a1a2b*       A
HG00145    R1b1a2a1a2b1*      A
HG00129    R1b1a2a1a2b1*      A
NA07357    R1b1a2a1a2b1*      A
YF01555    R1b1a2a1a2b1       A
YF01527    R1b1a2a1a2b1       A
YF01647    R1b1a2a1a2b1*      A
NA19679    R1b1a2a1a2b1*      A
HG01941    R1b1a2a1a2b1*      A
NA20515    R1b1a2a1a2b1a1     A
HG01356    R1b1a2a1a2b1a1     A
NA19649    R1b1a2a1a2b1a1     A
HG01767    R1b1a2a1a2b1a1     A
NA12005    R1b1a2a1a2b1a1     A
NA20759    R1b1a2a1a2b1a2a*   A
HG02262    R1b1a2a1a2b1a2a*   A
NA07347    R1b1a2a1a2b1a2a1   A
NA11994    R1b1a2a1a2b1a2a1   A
HG00736    R1b1a2a1a2b1c*     A
NA20812    R1b1a2a1a2b1c1*    T
HG00244    R1b1a2a1a2b1c1*    T
HG01536    R1b1a2a1a2b1c1a    A
...
HG01777    R-Y3140            T
NA12144    R-Y3577            A
NA20512    R-Z36*             A
HG01383    R-Z36*             A
HG01060    R-Z36*             A
NA20803    R-Z37              A
NA20798    R-Z37              N
NA20538    R-Z37              A
NA20806    R-Z37              A

MitchellSince1893
05-31-2014, 02:45 AM
This is very cool, as I can now run through my private SNPs to see if any others match them.
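A minimal sketch of that kind of check, assuming the YBrowser table has been pasted into plain text with one row per sample (sample ID, haplogroup, allele) and that "N" means no call. The function name and the pasted layout are illustrative assumptions, not anything YFull provides; the rows are a handful copied from the excerpt for position 22867545.

```python
# Rows copied from the YBrowser excerpt for position 22867545.
# Assumption: the table has been pasted as whitespace-separated plain text.
TABLE = """\
REFSEQ A
HG00736 R1b1a2a1a2b1c* A
NA20812 R1b1a2a1a2b1c1* T
HG00244 R1b1a2a1a2b1c1* T
HG01536 R1b1a2a1a2b1c1a A
HG01777 R-Y3140 T
NA20798 R-Z37 N
"""


def non_reference_samples(table_text):
    """Return (sample_id, haplogroup, allele) for rows that differ from REFSEQ."""
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    # The reference allele is the last field on the REFSEQ row.
    ref = next(r[-1] for r in rows if r[0] == "REFSEQ")
    hits = []
    for r in rows:
        sample, allele = r[0], r[-1]
        haplogroup = r[1] if len(r) == 3 else ""
        # Skip the reference row and no-calls (assumed to be shown as "N").
        if sample == "REFSEQ" or allele == "N":
            continue
        if allele != ref:
            hits.append((sample, haplogroup, allele))
    return hits


for sample, haplogroup, allele in non_reference_samples(TABLE):
    print(sample, haplogroup, allele)
```

Running the same function over the table for each private SNP position would list which other samples, if any, share the non-reference allele.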

MitchellSince1893
05-31-2014, 05:11 AM
FYI for those joining, if you want your kit to show up on the YResults (Y SNPs and Y STRs) and YBrowser results, you may need to go into the account settings and check the applicable boxes.

I kept waiting for my kit# to show up on the results and I finally realized I had to make those changes.

haleaton
05-31-2014, 06:36 PM
The tree part may be available without being a member of the group, perhaps depending on how the account settings are set: http://www.yfull.com/tree/R1b1a2a1a2b1/

Most of the U152 samples are from public data sets (e.g. 1K Genomes), with a few YFXXXX ones from members who have contributed data and adjusted their settings.

Not all features have been implemented, but there are two useful and fun operational functions.

You can "View Y Results" and compare named SNPs within the group as positive, negative, or ?. Multiple entries can be separated by commas. I did not find a small upper limit to the number that could be input, but there may be a large upper limit.

A similar "View Y Results" function for STRs at first seemed to be non-operational. [Edit: it is operational and compares the user STRs in the data set. It just takes a long time to process and come back. My error!]

There is also a "Y Chr Browser" where you can submit hg19 positions separated by commas, with a limit I found to be 10, and see comparisons with the other group datasets, with the REFSEQ listed first. It reports back the consensus value, if there was one, or a letter code for ambiguous cases, but you can hover over the box and the YFull read values will be displayed.
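Since the browser seems to take only 10 positions at a time, a longer list of private SNP positions has to be split up by hand. A small sketch of the splitting, assuming the limit really is 10; the function name is made up here and the positions in the example are placeholders, not real SNPs.

```python
def batches(positions, limit=10):
    """Yield comma-separated query strings with at most `limit` positions each."""
    for start in range(0, len(positions), limit):
        chunk = positions[start:start + limit]
        yield ",".join(str(p) for p in chunk)


# Placeholder hg19 positions, purely for illustration.
my_positions = list(range(1000000, 1000023))

for query in batches(my_positions):
    print(query)  # paste each line into the Y Chr Browser box in turn
```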

I don't mind making my data set public and it is ID: YF1461 or YF01461, U152-L2*, sample ancestry is Jonas Eaton, b. 1618, Kent, England, came to Mass. in 1637.

In comparing FGC and YFull reliability/quality in reported private SNPs I did find (in the two I checked) low read number SNPs that had correct Sanger values tested on my sample only--no matches for me.

In looking at my ** NGC Private SNPs I did find a very small number of cases that YFull did not call out based on their quality criteria. One of them was:

13254034 (C>G) rs113991829 **

I am "16 G", but in the Y Browser YF01647 (who is L2*; Edit: DF110+ > 7983816+ > 1750878+1 et al., L196-) is "9C 23G", which they classify as "G". Most of the others in the group are low-read, unambiguous "C" (negative). Some others are heterozygous. Is this a positive match with another L2* sample, or is it lost in the noise?
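One way to look at read strings like "16 G" or "9C 23G" is to turn them into counts and label each position by coverage and by how dominant the top allele is. This is only a sketch: the thresholds (10 reads, 90%) and the labels are hypothetical choices for illustration and are not YFull's actual quality criteria, which are not given here.

```python
import re


def allele_counts(position_data):
    """Parse a read string such as '9C 23G' or '16 G' into {'C': 9, 'G': 23}."""
    counts = {}
    for number, base in re.findall(r"(\d+)\s*([ACGT])", position_data.upper()):
        counts[base] = counts.get(base, 0) + int(number)
    return counts


def call(position_data, min_reads=10, min_fraction=0.9):
    """Return (top_allele, label). Thresholds are hypothetical, not YFull's."""
    counts = allele_counts(position_data)
    total = sum(counts.values())
    if total == 0:
        return (None, "no-call")
    top = max(counts, key=counts.get)
    if total < min_reads:
        return (top, "low-coverage")
    if counts[top] / total < min_fraction:
        return (top, "mixed")
    return (top, "clean")


print(call("16 G"))    # my sample at 13254034
print(call("9C 23G"))  # YF01647 at the same position
```

Under these made-up thresholds the 23-of-32 "G" would be labelled mixed rather than clean, which is one way of saying the match should not be taken at face value.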

lamahorse
07-27-2014, 03:51 PM
Recently FTDNA changed my haplogroup to U152. Still waiting on my Big Y results: 3-5 weeks!

R.Rocca
07-27-2014, 06:06 PM
Have you joined the U152 project? What kit number are you?

haleaton
10-27-2014, 08:29 PM
The folks at YFull have now updated their software so that multiple bam files from the same person can be analyzed and can be compared.

Not surprisingly besides differences in coverage, there were also many cases where Big Y gave different results (positive/negative).

CTS4727 15785857 (T/G), with 7 negative reads for BGI and 2 positive reads for Big Y, which reports it as positive for my sample:
Jonas Eaton R-L2*, FTDNA#125963, for YFull FGC BGI (YF01461), FTDNA Big Y (YF02170).

CTS4727 is also positive for at least one R-L2 downstream in the DF110 branch, though I am DF110-. YFull currently has CTS4727 as
part of this branch based on the same sample. However, I also find references to CTS4727 being positive for an R-L21 person, which would
mean this SNP may be difficult to determine or unstable.

I am awaiting Sanger sequencing of CTS4727 in a week or so. Rich Rocca predicts it will likely be negative. I tend to agree.

I am interested in whether others out there tested CTS4727+ in their Big Y or other data.

I was able to show, for my sample only, that FGC BGI (now Elite) low-read calls down to two reads could be verified as positive
using Sanger sequencing from YSeq; however, the one-read positives that could be Sanger sequenced (4 of them) all turned out to be false positives.

The Big Y data has much more "noise" and YFull does not even report single read "Novel SNPs", though each location can be examined for it.

For Big Y, YFull reported 43 "ambiguous" Novel SNPs, which were all 2 reads, but every single one of them turned out to be negative with several reads in my NGC BGI data. (Remember that for Novel SNPs YFull reports only positive SNPs; a comparison of known SNPs such as CTS4727 also yields differences that may be false negatives.)

So, I would be careful with SNPs that FTDNA reports as positive with only two reads.
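A rough sketch of the comparison between two tests of the same person: list the SNPs where the two data sets disagree and note which side has fewer reads behind its call. The CTS4727 numbers are the ones quoted above (7 negative reads in BGI, 2 positive reads in Big Y); the DF110 read counts are placeholders standing in for "many reads", and the class, function name and 3-read cut-off are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Call:
    positive: bool  # True = derived (+), False = ancestral (-)
    reads: int      # number of reads supporting the call


def discordant_calls(a, b, name_a, name_b, min_reads=3):
    """Return (snp, weaker_dataset, is_low_read) for SNPs where a and b disagree."""
    flagged = []
    for snp in sorted(a.keys() & b.keys()):
        call_a, call_b = a[snp], b[snp]
        if call_a.positive == call_b.positive:
            continue
        # The call backed by fewer reads is the first one to doubt
        # (on a tie this arbitrarily names the second data set).
        weaker = name_a if call_a.reads < call_b.reads else name_b
        is_low_read = min(call_a.reads, call_b.reads) < min_reads
        flagged.append((snp, weaker, is_low_read))
    return flagged


bgi = {"CTS4727": Call(False, 7), "DF110": Call(False, 20)}    # DF110 count is a placeholder
big_y = {"CTS4727": Call(True, 2), "DF110": Call(False, 15)}   # DF110 count is a placeholder

for snp, weaker, is_low_read in discordant_calls(bgi, big_y, "BGI", "Big Y"):
    print(snp, "disagrees; weaker call is from", weaker, "| low read:", is_low_read)
```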

haleaton
10-28-2014, 04:01 AM

Just got the results of Sanger sequencing at YSeq, and by Sanger I am CTS4727- 15785857 (T/G) T-. This shows that SNP calls at the 2-read level in Big Y data should be questioned, and I think that applies whether the call is positive or negative.

The FTDNA Big Y report lists CTS4727 for me as positive with "high" confidence.

tonwil61
10-28-2014, 04:24 AM
Hi Haleaton,
YFull now has an extra called 'Y Full Report', which lists all positive, no-call, and ambiguous results laid out as a descending tree from A1 to (in my case) what YFull are calling R-Y3960, which is one step below R-L2. On this line I am listed as CTS4727 (T>G), i.e. +ve, ChrY position: 15785857 (+strand). Also, in the FTDNA Big Y results I am confirmed as (T>G) +ve for CTS4727.

I am the DF110 from the U152 Y-DNA tree.

Cheers

Tony Wilson

PS: When I say one step below R-L2 , I mean one step further down the tree.

haleaton
10-28-2014, 05:12 AM
From communicating with Rich Rocca, I think your DF110 branch is pretty solid, but CTS4727 (at the same level as DF110) is suspect. From the U152 YFull group,
I think you (YF01647) also have only 2 G reads in your Big Y data for CTS4727, which could be a similar error.

I am DF110- by many reads in both BGI and BigY.

One of the odd ironies is that random "known" SNPs, reported and defined somewhere but not established on the tree by multiple samples, are very often "unstable" or heterozygous or, in this case, perhaps a measurement error, yet they are reported as positive or negative as if that were exact truth unless you look into the read counts. Higher standards are applied to "Novel SNPs" because they have not been defined before.

This was certainly true here, in that all 43 two-read "ambiguous" SNPs are likely false positives. YFull never claimed they were valid SNPs. FTDNA never cited them as Novel Variants; however, a known SNP with two reads gets classified as "high" confidence.

[Edit: I think there is an L196 Big Y in the queue, so it will be interesting to see how it turns out.]

haleaton
11-12-2014, 08:17 AM
Doh!

Y-Full put CTS4727 on their tree and me with it,
Terminal subclade events:
terminal haplogroup of sample YF02170 changed from R1b1a2a1a2b1 to R-CTS4727*

If any of the three current DF110 folks from BigY could check their results for this or provide Rich Rocca their bam it might help straighten it out for them.

Or anybody from other haplogroups who tested positive for CTS4727. The YFull tree placement was based on my Big Y and one other, each with only two reads, which is where Big Y gets iffy.

tonwil61
11-12-2014, 09:55 PM
Hi Haleaton,
I just checked my YFull report, and CTS4727 is listed as one step above R-Y3960 (which is what they call my sub-grouping) and one step below R1b1a2a1a2b1 (S139 • L2).

Sample: #YF01647 (R-Y3960)
ChrY position: 15785857 (+strand)
Reads: 2
Position data: 2G
Weight for G: 1.0
Probability of error: 0.0 (0<->1)
Sample allele: G
Reference (hg19) allele: T
Known SNPs at this position: CTS4727 (T->G)

Richard already has my .Bam files I think.

Cheers

Tony Wilson

haleaton
11-13-2014, 02:33 AM
YFull said it was an oversight, as they found it in four places. So they should remove CTS4727. You were the other case with two reads, like me, though I am DF110-.

It is an interesting probability issue. If you have a reported 2-read positive SNP from Big Y, and it is one that has been shown to be phylogenetically rock solid in others' tests and is consistent with your upstream SNPs, then it is more likely that even a two-read call is valid. So it is probably a good thing for FTDNA to report it.

However, if you are searching the massive number of locations and comparing against the reference, a significant number of false positives at low read counts will be found, which is why they do not report them as Novel SNPs.
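The point about priors can be put into numbers with Bayes' rule. The sketch below is a toy model only: it assumes every read independently shows the derived base by mistake with a fixed probability (0.01 here), and the two priors are invented for illustration. Real false positives (mis-mapping, repeated regions) are not independent like this, so treat the output as showing the direction of the effect, not its actual size.

```python
def posterior_derived(prior, reads, per_read_error=0.01):
    """P(truly derived | all `reads` show the derived base), by Bayes' rule.

    Toy assumptions: reads are independent, and a read from an ancestral
    site shows the derived base with probability `per_read_error`.
    """
    like_if_derived = (1 - per_read_error) ** reads
    like_if_ancestral = per_read_error ** reads
    numerator = prior * like_if_derived
    return numerator / (numerator + (1 - prior) * like_if_ancestral)


# A known SNP consistent with your upstream SNPs: a generous made-up prior.
print(posterior_derived(prior=0.5, reads=2))

# Any one position out of millions scanned for novel variants: a tiny made-up prior.
print(posterior_derived(prior=1e-6, reads=2))
```

With the same two reads, the well-placed known SNP comes out almost certain while the randomly scanned position stays unlikely, which matches the reasoning for reporting one and not the other.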

Sven Vermaete
12-02-2014, 03:10 PM
I got my YFull analysis today. It confirmed my thoughts: I remain an outlier in R1b-L2, just as the Fluxus network, based on 111 STRs, pointed out before.
I remain R1b-L2*.