YFull interpretation services, analysis and tree support



lgmayka
09-25-2015, 01:27 PM
I am not sure how Yfull got into this but they are far behind for the subclades I work with the most.
I am very appreciative of citizen scientists' efforts in R-P312 and R-U106, but the human race is far larger than those two specific clades. I mentioned YFull because I know of no other comprehensive, up-to-date Y-DNA haplotree.

TigerMW
09-25-2015, 01:36 PM
I am very appreciative of citizen scientists' efforts in R-P312 and R-U106, but the human race is far larger than those two specific clades. I mentioned YFull because I know of no other comprehensive, up-to-date Y-DNA haplotree.
The YFull tree is not up to date, though, not even close. It's like every other tree: some branching is in good shape, some is not. To boot, I think they only consider NGS-tested BAM files... ironic, given this conversation.

I think if you truly want comprehensive, you'll have to work with FTDNA, warts and all. I know, I know; Razyn and you will cite the molasses and decades problem. Don't forget, you saw your pack this year and you are skeptical of an R1b-DF27 Pack this year.

Umm... this seems to happen all over R1b forums. Folks wonder why such focus on R1b in an R1b forum.:)

lgmayka
09-25-2015, 01:44 PM
The YFull tree is not up to date, though, not even close.
It is as up to date as customers provide. It is true that some administrators are aggressively discouraging their project members from submitting BAM files to YFull. The result is a lack of development in those branches of the haplotree.

Once again I must emphasize that a haplotree that covers only one clade of one haplogroup is of much less value to the world as a whole, and can in fact exacerbate the suffocating bias already prevalent in academic research. Unless/until someone provides a genuine competitive alternative, it is to everyone's advantage to submit BAM files to YFull in addition to any analysis performed by administrators.

MacUalraig
09-25-2015, 02:14 PM
I am not sure how Yfull got into this but they are far behind for the subclades I work with the most.

Have you asked the people in 'the subclades you work with most' to support their work which is independent of the testing cos? I browse the L513 forum and don't recall you mentioning it. Nor in M222 forums.

If you really want what you describe as a system of record they are more independent than having it at an actual testing co. It would be even better if it was non-commercial I agree but I don't see any unpaid person stepping up to the plate to code one.

I believe there is an L21 group at YFull with a large number of admins (15) and 242 members. Everyone who is anyone, it would seem.;)

lgmayka
09-25-2015, 02:31 PM
It would be even better if it was non-commercial I agree but I don't see any unpaid person stepping up to the plate to code one.
ISOGG is an excellent example of how a purely volunteer effort cannot possibly keep up with the continuing SNP tsunami across all clades and haplogroups.

TigerMW
09-25-2015, 02:40 PM
There are people who like to advocate the benefits of YFull, which is fine. I set up this topic to facilitate that.

[EDIT: I just took the initiative to start this thread to help those who want to focus on YFull trees, etc.

It is my understanding, they call their tree an "experimental" tree. That doesn't mean it is not very accurate, though.]

TigerMW
09-25-2015, 02:44 PM
Why did YFull come up in this discussion again? If you guys want to start up a thread on YFull, go ahead.


Have you asked the people in 'the subclades you work with most' to support their work which is independent of the testing cos? I browse the L513 forum and don't recall you mentioning it. Nor in M222 forums.
We have discussed YFull and FGC interpretations both on L513 and we have people that use both. The truth is that with Alex Williamson's Big Tree, we have outstanding phylogenetic analysis for free. I'm not sure how many trees we need and I invest my time to facilitate what I think is most productive. I am the person who contacts almost everyone in L21 to get their Big Y raw results and get them shared on the Yahoo group so Alex can get them. We have over 1,100 Big Y results shared via this method. It takes a lot of my time. I personally have had an FGC interpretation done. I paid, although they kindly give us a $10 discount.

I encourage the sharing of FGC results via that mechanism too but not everyone wants to. That's okay. People have preferences.


If you really want what you describe as a system of record they are more independent than having it at an actual testing co. It would be even better if it was non-commercial I agree but I don't see any unpaid person stepping up to the plate to code one.
I see you do read what I write. That's flattering. For genetic genealogy we also need integrated surname, MDKA origin support and more. The nice thing about working with a testing company is this also supports a chain of evidence that is needed for a formal tree. Plus some testing companies support a complete product set: Sanger Sequenced SNP tests, SNP Packs/Panels, Next Generation Sequencing. These are all tied to the same kit #/record, which better supports a chain of evidence needed for a true formal tree.

I have to keep a cross-reference of FTDNA kit #s, YSEQ IDs, FGC IDs.. and yes I included a column for YFull IDs. I guess you didn't notice that as you track me. BTW, I'm behind on all of that and just about everything. I have to try to prioritize my time.



I believe there is an L21 group at YFull with a large number of admins (15) and 242 members. Everyone who is anyone, it would seem.;)
I am a nobody.

[[[EDIT: This is too much fun. I can't resist so I'll copy my note from elsewhere.

... Remember "My name is Nobody"?
https://www.youtube.com/watch?v=2RYq1PLdT0s

;) Shoot, I probably just gave fodder to some other threads who think everyone in R1b dreams of being a cowboy.
]]]

George Chandler
09-25-2015, 03:06 PM
The truth is that with Alex Williamson's Big Tree, we have outstanding phylogenetic analysis for free.

I can't speak for the rest of it but I do know that the R-S1051 tree variations have been incorrect. I know that non Sanger validated SNP's (that were unreliable) were being posted. I sent a couple of private emails regarding the errors previously and received no reply. No corrections were made after the emails were sent.

George

TigerMW
09-25-2015, 03:10 PM
I can't speak for the rest of it but I do know that the R-S1051 tree variations have been incorrect. I know that non Sanger validated SNP's (that were unreliable) were being posted. I sent a couple of private emails regarding the errors previously and received no reply. No corrections were made after the emails were sent.

George
What are the changes you are asking for George? I will get them to Alex.

You realize that Alex's Big Tree is similar to YFull's in that it is experimental and NGS only based, right?

BTW, I think Alex likes the YFull toolset.

Perhaps the issue is that Alex is using all variants that are phylogenetically consistent and you want him to remove those that are not easily Sanger Sequenced or may have some other problems, but this is off-topic from YFull.

Any additional comments on the YFull tree?

Joe B
09-25-2015, 04:53 PM
I am very appreciative of citizen scientists' efforts in R-P312 and R-U106, but the human race is far larger than those two specific clades. I mentioned YFull because I know of no other comprehensive, up-to-date Y-DNA haplotree.
The YFull tree is not up to date, though, not even close. It's like every other tree: some branching is in good shape, some is not. To boot, I think they only consider NGS-tested BAM files... ironic, given this conversation.

I think if you truly want comprehensive, you'll have to work with FTDNA, warts and all. I know, I know; Razyn and you will cite the molasses and decades problem. Don't forget, you saw your pack this year and you are skeptical of an R1b-DF27 Pack this year.

Umm... this seems to happen all over R1b forums. Folks wonder why such focus on R1b in an R1b forum.:)
Let's talk R1b then. lgmayka is correct. The world is a lot bigger than the R1b-P312 and U106 haplogroups. YFull has done a great job with the more basal R1b haplogroups, especially with R1b-Z2103. And YFull cross-checks their work with the ISOGG tree. Can't say the same for FTDNA and their haplotree. FTDNA has been ass backwards for the R1b (U106- P312-) haplogroups for a long time.
I didn't really buy into this bias towards P312 and L21 until L277 was put downstream of L21 in the FTDNA haplotree last week. I do wonder if they even looked at R1b-Z2103 haplotypes that are L277+ and L277-, let alone consulted the ISOGG or YFull trees. Better yet, why not consult with haplogroup projects to get the phylogeny straight? The YFull haplotree is very accurate for the early R1b branches but will always be behind our project's haplotree because we have smal. YFull gives the consumer a great user interface so they can better understand their NGS results, and a haplotree that is updated monthly with TMRCAs, something to look forward to. It's likely that the YFull haplotree has sold quite a few Big Y tests for FTDNA because people can actually see the potential of NGS testing with YFull.
ISOGG did a great job of updating the R1b (P312- U106-) part of the tree during the SNP avalanche.

TigerMW
09-25-2015, 05:10 PM
Let's talk R1b then. lgmayka is correct. The world is a lot bigger than the R1b-P312 and U106 haplogroups. YFull has done a great job with the more basal R1b haplogroups, especially with R1b-Z2103. And YFull cross-checks their work with the ISOGG tree. Can't say the same for FTDNA and their haplotree. FTDNA has been ass backwards for the R1b (U106- P312-) haplogroups for a long time.
Do you guys really want to talk about the FTDNA haplotree here? I thought you wanted to talk about YFull. YFull was being brought up off-topic in other threads.

I hope no one is saying I don't do anything to support R1b people outside of P312 and U106.

We already have a thread about FTDNA updating their haplotree. I've subscribed to it, so if you want to talk about that instead let's go over there.
http://www.anthrogenica.com/showthread.php?5195-Getting-FTDNA-to-update-their-Haplotree


I didn't really buy into this bias towards P312 and L21 until L277 was put downstream of L21 in the FTDNA haplotree last week....
There is actually a very good reason for that and it goes against some of your other perceptions.

George Chandler
09-25-2015, 05:34 PM
What are the changes you are asking for George? I will get them to Alex.

You realize that Alex's Big Tree is similar to YFull's in that it is experimental and NGS only based, right?

BTW, I think Alex likes the YFull toolset.

Perhaps the issue is that Alex is using all variants that are phylogenetically consistent and you want him to remove those that are not easily Sanger Sequenced or may have some other problems, but this is off-topic from YFull.

Any additional comments on the YFull tree?

I'll send you a private email regarding the corrections. I appreciate the offer but I'm not sure why he needs a go-between to answer several previous emails?

I do know that it's an experimental tree but I sort of wish he would wait and gather more information before posting it. I realize that we can all make mistakes, but posting errors to the tree can lead to incorrect conclusions, especially when his name and his work are appreciated by many in the community. It's not meant to be personal, but it is frustrating when I get emails that ask why my S1051 tree doesn't match that of Alex's. I do notice he has removed some SNP's which are problematic that he had posted previously. I personally prefer that SNP's which don't meet Sanger validation aren't posted to trees. There are some like FGC9660 that both YFull and Alex use that seem to be ok but didn't pass Sanger.

I haven't spent as much time as I would like investigating the YFull Tree. The S1051 portion is good in the sense it's conservative. I think it's better to sit back and watch the results come in for a while and try to revalidate using YSEQ or FTDNA before sending anything to ISOGG. My preference is to cover as much ground as possible in terms of the actual sequencing, then keep the non-Sanger-validated positions on hold and watch them. I agree with the YFull BAM-analysis-only criteria. Combine that with the YSEQ conservative Sanger validation and for me that works the best.

George

TigerMW
09-25-2015, 05:54 PM
I'll send you a private email regarding the corrections. I appreciate the offer but not sure why he needs a go between to answer several previous emails?
He's very busy and has done really great work. Folks like YSEQ look at his work closely too.

I try to protect his time. He's too important to get sidetracked like I tend to.



I do know that it's an experimental tree but I sort of wish he would wait and gather more information before posting it. I realize that we can all make mistakes, but posting errors to the tree can lead to incorrect conclusions, especially when his name and his work are appreciated by many in the community. It's not meant to be personal, but it is frustrating when I get emails that ask why my S1051 tree doesn't match that of Alex's. I do notice he has removed some SNP's which are problematic that he had posted previously. I personally prefer that SNP's which don't meet Sanger validation aren't posted to trees. There are some like FGC9660 that both YFull and Alex use that seem to be ok but didn't pass Sanger.
It is "draft" or "experimental", which means he is not encumbered with process issues, etc. He really does work hard to get it right, given his interpretations. I would not say his blocks and blocks of equivalents diagramming style is for everyone. It really is more for people who are comfortable diving into NGS. Still, it's been invaluable.



I haven't spent as much time as I would like investigating the YFull Tree. The S1051 portion is good in the sense it's conservative. I think it's better to sit back and watch the results come in for a while and try to revalidate using YSEQ or FTDNA before sending anything to ISOGG. My preference is to cover as much ground as possible in terms of the actual sequencing, then keep the non-Sanger-validated positions on hold and watch them. I agree with the YFull BAM-analysis-only criteria. Combine that with the YSEQ conservative Sanger validation and for me that works the best.

George
If you like conservative, ISOGG and Ray Banks are the ticket. I punched a bunch into it for L513 but it is exhaustive. Just hit ctrl-F/Find over on their hg R tree:
http://isogg.org/tree/ISOGG_HapgrpR.html

I spared no expense as far as adding phylogenetic equivalents. I'm glad I did and I really do need to thank and compliment Ray because he did a ton of work on it and gave some really good SNP assessment feedback.

He and Alex are folks we are truly indebted to.

Joe B
09-25-2015, 07:49 PM
Do you guys really want to talk about the FTDNA haplotree here? I thought you wanted to talk about YFull. YFull was being brought up off-topic in other threads.

I hope no one is saying I don't do anything to support R1b people outside of P312 and U106.

We already have a thread about FTDNA updating their haplotree. I've subscribed to it, so if you want to talk about that instead let's go over there.
http://www.anthrogenica.com/showthread.php?5195-Getting-FTDNA-to-update-their-Haplotree


There is actually a very good reason for that and it goes against some of your other perceptions.
Mike, I remember when you kept track of the early R1b branches' phylogenetics when nobody else would or could. It's simply a matter of the oxygen being sucked out of the room for the early branches of R1b, reason unknown.
It's hard to talk about one haplotree without comparing it to another. In this case, it may be a good idea to have your results in a 3rd party tree like YFull's because of the errors in the FTDNA tree. We try to practice good science in building the project's phylogenetic tree. That includes cross-checking results, on a level that's beyond me, between smal's analysis and YFull or Full Genomes Corp. That's smal's idea. The YFull interface is just easier to work with and has a public haplotree. FGC analysis is as good as it gets.

TigerMW
09-25-2015, 09:40 PM
Mike, I remember when you kept track of the early R1b branches' phylogenetics when nobody else would or could. It's simply a matter of the oxygen being sucked out of the room for the early branches of R1b, reason unknown.

You are right. The oxygen does get sucked up. Here is another example of a testing company and what a public figure in the industry sees when he looks at the marketplace.

----------------------------------------------------------------------------------------
On Tue, Aug 25, 2015 at 12:12 PM, Thomas Krahn [R1b-L21-Project] wrote:

"Dear Doug,

To defend my anger, I have said this before. A M343 panel makes only sense for early/old R1b samples.
However more than 90% of all tested R1b samples fall in the L21 or U106 category.
...."
https://groups.yahoo.com/neo/groups/R1b-L21-Project/conversations/messages/29705
---------------------------------------------------------------------------------------

In this case, some feel I over-recommended the number of SNPs outside of L21 and U106 to be included in FTDNA's $99 test. I stretched FTDNA for every SNP I could and I think broke new ground for the other subclades, at least within R1b. It's hard to get ahead of hg G, as Ray is nearly a one-man haplogroup project and ISOGG all in one.


It's hard to talk about one haplotree without comparing it to another. In this case, it may be a good idea to have your results in a 3rd party tree like YFull's because of the errors in the FTDNA tree. We try to practice good science in building the project's phylogenetic tree. That includes cross-checking results, on a level that's beyond me, between smal's analysis and YFull or Full Genomes Corp. That's smal's idea. The YFull interface is just easier to work with and has a public haplotree. FGC analysis is as good as it gets.

I agree. Everything I hear is that YFull has a great user interface. Sensibly, their tree is based on NGS testing, but some who advocate YFull think NGS testing is too expensive and not practical for everyone. That seems a bit ironic.

It's also ironic that FTDNA is actually the one putting out tests that won't pay off well, i.e. the Haplogroup N SNP Pack. They really do want to cover the waterfront and that's good. That's what we need in terms of the masses, but we also need the YFull's, etc. for research platforms.

lgmayka
09-25-2015, 11:26 PM
The truth is that with Alex Williamson's Big Tree, we have outstanding phylogenetic analysis for free. I'm not sure how many trees we need and I invest my time to facilitate what I think is most productive.
That's fine, as long as you realize that a tree specifically for R-P312 is utterly useless to everyone else.

lgmayka
09-25-2015, 11:39 PM
Sensibly, their tree is based on NGS testing, but some who advocate YFull think NGS testing is too expensive and not practical for everyone. That seems a bit ironic.
Not ironic, just practical. I myself vigorously encourage the Big Y, but it's obvious--from email responses, and even more from the lack of response--that the great majority of project members consider even the discount price of $475 way too much to spend on a hobby. Thus, I am now also vigorously encouraging the SNP packs in appropriate cases. I am still somewhat discouraged by the fairly low number of orders, but we must always take the attitude that "some is much better than none."

miiser
09-26-2015, 12:34 AM

I agree. Everything I hear is that YFull has a great user interface. Sensibly, their tree is based on NGS testing, but some who advocate YFull think NGS testing is too expensive and not practical for everyone. That seems a bit ironic.

This is not ironic. There are two separate issues here.

1. For an individual who has already done an STR test (and targeted SNP testing as warranted by the STR test outcome), are they likely to learn anything new and significant regarding their family history from a NGS test? Will the information they gain justify the expense? For many people (but not all), the answer to this question is "no". NGS is not always a smart investment for the customer, even if they can afford it.

2. For a person who DOES decide to spend big money on NGS, for whatever reason, what is the best investment, the most informative path, and the best option for intelligent data analysis and tree management?


Since the discussion has already strayed into the subject of the motives behind test recommendations -

It seems apparent that the motivation, among some project administrators, for promoting FTDNA is that they are in a position of authority to see, analyze, and control all the data when testers stay within the FTDNA bubble. When individuals stray outside of this system, the administrators who relish this position of authority lose that control. There is nothing ironic about it, but it is an obvious and noteworthy private agenda that factors into which services are recommended by administrators. It then becomes a question not of which service is better for the customer, but of which service is better for the administrator.

You recommend NGS testing because YOU want to see the data. You recommend NGS testing specifically through FTDNA and tree management through Alex because this keeps the data within your sphere of influence. Saying repeatedly that it's a "team sport" doesn't hide the fact that you are recommending the course of action which favors your private agenda.

VinceT
09-26-2015, 05:58 AM
My biggest complaint with YFull right now is that they seem to be exercising selection bias (Big-Y over FGC) in positioning SNPs on their tree.

Almost everything else I've found outstanding, and I recommend that both FGC and BigY BAM files are worthy of submission.

Yesterday, I was advised of the following from a fellow clade mate, which indicates that at least Russian population geneticists are taking it seriously:



From: YFull Team <acgt[at]yfull[dot]com>

Subject: YFull, R1b research

Date: September 22, 2015 at 2:26:53 AM PDT

To: [redacted]


Hello,

We started a collaborative research of some group of R1b samples with Dr. Oleg Balanovsky (Vavilov Institute of General Genetics, Moscow, Russia). We want to compare researched group with samples from different countries and subclades of R1b. Also after the research we will add several samples from Dr. Oleg Balanovsky's collection in our database for improving YTree.
For this research we need to give access at your BAM file [redacted] for Dr. Oleg Balanovsky's team. Can we do it?

Best regards,
Vadim Urasin
YFull Team

Earl Davis
09-26-2015, 04:21 PM
What are the changes you are asking for George? I will get them to Alex.

Perhaps the issue is that Alex is using all variants that are phylogenetically consistent and you want him to remove those that are not easily Sanger Sequenced or may have some other problems, but this is off-topic from YFull.

Any additional comments on the YFull tree?

I think it's vital that we have both or failing that some configuration so certain types of snp can be included or excluded based on what the user needs.

I find the Big Tree most useful for my purposes whilst YFull may suit others' needs better. The main reason I go to YFull is to look at the estimated branching dates. The reason I go to the Big Tree is to get a full picture of the early branching.

As an example one of my ancestral lines is DF27+.

If I went to the Big Tree and all the complicated and recurrent snips were stripped out I would get an impression of DF27 that would suggest that DF27 had over 40 immediate surviving 'child' branches most of whom probably appeared within 100 years of DF27 itself. I would also get the impression that one of those 'children' then accounted for almost 50% of the 'grandchildren' of DF27 and may even start to imagine that branch might have been some sort of tribal leader.

Including some of the more consistent palindrome snips, however, immediately challenges that impression, even if a degree more caution is now needed due to the dangers of re-loc events and other complications. Suddenly DF27 appears to have a small number of immediate surviving offspring branches. No longer does it seem that it burst out and established itself within 100 years of DF27, but that we may be talking about 500 years or more before the surviving branches started to really expand. What seemed a puzzle about why those 40 branches spread so geographically widely in just 100 years now seems to make more sense.

I then like to put my own direct line into context.

I am DF27>ZZ12>ZZ19/ZZ20>Z31644>FGC17112+

Only FGC17112 in that chain is without its problems, and like any SNP it will eventually prove recurrent somewhere as more people test. As we know, even DF27 has its own problems. Without the complex variants included in the Big Tree, my FGC17112 would sit directly under DF27 as one of many brothers of Z195, a tiny struggling twig of a branch at 1% or less of all DF27 next to Z195 with 47% of all of DF27. Add the complicated snips back and Z195 stays where he is, but FGC17112 is now 4 levels lower than Z195 and can be compared with the subclades of Z195 that probably arose at a similar timespan to FGC17112. The problem gets worse in the next level under FGC17112, as FGC17112 sits in a block with two other complicated snips that some people will ignore. So by the time we get to the next 'gold standard' snp, FGC17119, looking at the tree in the two different ways suggests either that FGC17119 is two levels under DF27 (and perhaps less than 200 years after it) or, with the complicated snips included, that FGC17119 is 7 levels under DF27 and perhaps in the range of 600+ years after DF27.

So that's why I like the Big Tree. It includes problematic snips, providing they have a degree of consistency, and gives me an alternative way of visualising the early branching. It helps me understand where each of my own snips fits in the context of which of their contemporary branches were wandering the earth in that century or so. For those that like snp counting it provides an alternative view of timescales.
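
The level counting in this post can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the Big Tree or YFull build their trees. The chain and the "complicated" flags follow the post; the two `unnamed complex SNP` entries are placeholders for the two complicated snips in the FGC17112 block, which the post does not name, and treating each of them as its own level is an assumption made so that the counts match the post.

```python
# Lineage below DF27 as (SNP name, is_complex). Flags follow the post:
# only FGC17112 and FGC17119 are treated as unproblematic.
# The two "unnamed complex SNP" entries are placeholders, not real SNP names.
LINEAGE = [
    ("ZZ12", True),
    ("ZZ19/ZZ20", True),
    ("Z31644", True),
    ("FGC17112", False),
    ("unnamed complex SNP 1", True),
    ("unnamed complex SNP 2", True),
    ("FGC17119", False),
]


def levels_below_root(lineage, target, include_complex):
    """Count tree levels from the root (DF27) down to and including target.

    When include_complex is False, complex SNPs are stripped out and
    contribute no level of their own.
    """
    levels = 0
    for name, is_complex in lineage:
        if include_complex or not is_complex:
            levels += 1
        if name == target:
            return levels
    raise ValueError(f"{target!r} is not in the lineage")


for target in ("FGC17112", "FGC17119"):
    with_complex = levels_below_root(LINEAGE, target, include_complex=True)
    without_complex = levels_below_root(LINEAGE, target, include_complex=False)
    print(target, "levels under DF27:", with_complex, "vs", without_complex)
```

With the complicated snips stripped, FGC17112 lands directly under DF27 and FGC17119 two levels down; with them included, FGC17119 is seven levels down, which is the contrast described above.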

Earl.

Rory Cain
10-02-2015, 09:21 PM
My biggest complaint with YFull right now is that they seem to be exercising selection bias (Big-Y over FGC) in positioning SNPs on their tree.

Almost everything else I've found outstanding, and I recommend that both FGC and BigY BAM files are worthy of submission.

Thanks Vince. I have also found YFull outstanding. A far more useful storage and retrieval tool than either Big Y or FGC provide their customers. IMHO, after a Big Y or FGC test, YFull is an essential add-on, with any one of its beneficial features being sufficient to justify the $49 price tag. Get in now - it could well go up!

Can't say I have seen a bias towards Big Y over FGC. Perhaps that's because I tested through FGC, and I am seeing my FGC-numbered SNPs progressively get confirmed by the Big Y results of others. There appears to be no upper limit to the number of times Big Y can "discover" a SNP already discovered by FGC and still report it as a "novel" SNP. This doesn't help Big Y customers to make sense of their results. No problem. Pay YFull $49, because YFull are up to the job even if Big Y are not.

fjnj
10-07-2015, 12:32 AM
What surprised me is that YFull missed 34 of my high quality private snps that were reported by BigY. They did find 9 missed by BigY, but it is clear those are low read (10 or less). I am wondering whether anyone else has similar experience with YFull.

VinceT
10-07-2015, 03:06 AM
^ YFull neglected to identify over 10% of the SNPs below R-U106 found in my Full Genomes BAM file that were otherwise identified by Full Genomes Corp. and by FTDNA for a Big Y file from someone else in my haplogroup. All save one were in the centromere, which I find understandable. But this is why I recommend having your bases covered (ha-ha) by getting both Full Genomes Corp. and YFull to analyze your BAM file. They use different filtering protocols and algorithms which seem to complement each other nicely.

Petr
10-07-2015, 06:54 AM
BigY misidentifies many SNPs as private while they are not private or they are very unstable. I think you can safely ignore the BigY interpretation.

TigerMW
10-07-2015, 07:24 AM
BigY misidentifies many SNPs as private ....
Wait a minute. I think Big Y calls are way too rigid but they don't call something "private" or "public". Where are you getting their "private" call? It's not really a call anyway. It is a temporal state. Eventually, many private SNPs turn out to be public.

haleaton
10-07-2015, 12:18 PM
What surprised me is that YFull missed 34 of my high quality private snps that were reported by BigY. They did find 9 missed by BigY, but it is clear those are low read (10 or less). I am wondering whether anyone else has similar experience with YFull.

Seems like a lot. Are you sure they did not give them names and move them from Novel to Known SNPs as they matched other samples being put on the tree? As this happens the number of Novel gets reduced for previous samples.

Also, many of the Big Y high quality private SNPs as identified by FTDNA are widely shared across haplogroups. This culling process is done internally by YFull and is not transparent, but done explicitly by FGC.

I had two samples analyzed by YFull over a year and a half ago and they did identify SNPs that FGC found but classified as less reliable and did not name which turned out to later match in other samples or were verified by Sanger sequencing. I just recently had YFull analyze a Big Y sample, still in process, and they did seem to miss a couple and the analysis will complete (age estimate, STRs) after almost three months which seems long. The small companies often have day jobs, remember. FGC took three weeks to do the same job, but does not do tree building or age estimation--it is just a comprehensive analytical report. [Edit: FGC also identifies INDELs which YFull does not, though some can be found in Yfull if you already know where to look in the callable data. I would count these as misses of valuable mutation information.]

FGC does rarely miss an odd SNP that never gets ranked at all in their four quality ratings. I found this in getting both FGC NGS and Big Y tests on same person. Data always has noise and special cases.

The odd thing was FTDNA identified a SNP, A197, as medium quality that had too low reliability by their criteria to name, and YFull did not even call it out at all. Ironically, it ended up on the YFull tree as a single subclade-defining SNP, though at the time it was only found in my own two samples from the same person. It is now found in the samples of three persons.

I found the analysis by all three companies to teach me something. I find the big value of YFull is the ability to query your data and compare against others in a group. It would be nice to be able to compare across the entire set of public samples though.

34 sounds like a lot to miss if they were indeed high quality.
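
The point about Novel counts shrinking as matching samples arrive can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not YFull's actual procedure (which, as noted above, is not transparent): a variant counts as "shared" once it appears in two or more samples, and the sample labels and variant IDs are invented placeholders.

```python
from collections import Counter


def reclassify(novel_by_sample):
    """Split variants into shared ones and those still novel per sample.

    novel_by_sample maps a sample label to the set of variant IDs first
    reported as novel for that sample. A variant seen in two or more
    samples is treated as shared (an assumption of this sketch).
    """
    counts = Counter(v for variants in novel_by_sample.values() for v in variants)
    shared = {v for v, n in counts.items() if n >= 2}
    remaining = {s: set(vs) - shared for s, vs in novel_by_sample.items()}
    return shared, remaining


# Placeholder data: sample B arrives later and matches one variant of sample A.
samples = {"A": {"v1", "v2", "v3"}, "B": {"v2", "v4"}}
shared, remaining = reclassify(samples)
print(sorted(shared))
print({s: sorted(vs) for s, vs in remaining.items()})
```

Sample A's novel count drops from three to two without any of its calls being lost, which is one way an apparent "miss" can turn out to be a reclassification.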

fjnj
10-07-2015, 12:57 PM
Seems like a lot. Are you sure they did not give them names and move them from Novel to Known SNPs as they matched other samples being put on the tree? As this happens the number of Novel gets reduced for previous samples.

Also, many of the Big Y high quality private SNPs as identified by FTDNA are widely shared across haplogroups. This culling process is done internally by YFull and is not transparent, but done explicitly by FGC.

I had two samples analyzed by YFull over a year and a half ago and they did identify SNPs that FGC found but classified as less reliable and did not name which turned out to later match in other samples or were verified by Sanger sequencing. I just recently had YFull analyze a Big Y sample, still in process, and they did seem to miss a couple and the analysis will complete (age estimate, STRs) after almost three months which seems long. The small companies often have day jobs, remember. FGC took three weeks to do the same job, but does not do tree building or age estimation--it is just a comprehensive analytical report. [Edit: FGC also identifies INDELs which YFull does not, though some can be found in Yfull if you already know where to look in the callable data. I would count these as misses of valuable mutation information.]

FGC does rarely miss an odd SNP that never gets ranked at all in their four quality ratings. I found this in getting both FGC NGS and Big Y tests on same person. Data always has noise and special cases.

The odd thing was FTDNA identified a SNP, A197, as medium quality that had too low reliability by their criteria to name and YFull did not even call it out at all. Ironically, it ended up on the YFull tree as a single subclade defining SNP, though at the time it was only found in my own two samples from the same person. It is now found in the samples of three persons.

I found the analysis by all three companies to teach me something. I find the big value of YFull is the ability to query your data and compare against others in a group. It would be nice to be able to compare across the entire set of public samples though.

34 sounds like a lot to miss if they were indeed high quality.
I have an answer from YFull and it makes sense. Those 34 private SNPs are all in palindromic regions or regions homologous to the X chromosome and hence have little phylogenetic value. This is basically a reasonable decision on their part. However, by the same standard, some of the Z series and FGC series SNPs should be discarded too as they are in the same regions.

haleaton
10-07-2015, 02:05 PM
I have an answer from YFull and it makes sense. Those 34 private SNPs are all in palindromic regions or regions homologous to the X chromosome and hence have little phylogenetic value. This is basically a reasonable decision on their part. However, by the same standard, some of the Z series and FGC series SNPs should be discarded too as they are in the same regions.

Interesting, I wonder if they are being more selective now after study of SNPs for YFull's age estimation. I did notice that more recently, in a couple of cases, they may be excluding SNPs which are homologous to the X chromosome and therefore cannot be easily Sanger sequenced. Still, in the limited cases (3 persons) I have looked at, these SNPs do seem able to distinguish between two phylogenetically distinct samples, and they end up not being "unstable" or found widely in distant haplogroups.

As to SNPs in the centromeric GGAAT repeat region, my own A197, which YFull has as my U152 > L2 > R-A197 subclade definition on the current YFull tree, comes to mind. It is really part of an MNP GA > AT with its twin, A7393. Recent counsel I had from Thomas Krahn said:

"For your understanding A197 and A7393 are both in the centromeric GGAAT repeat region. Of course we can design primers for it and maybe we can get a readable sequence. However this SNPs will be meaningless for your research.

You may have heard that the centromere is a highly repetitive region that develops in a permanent flow of self-recombination. Mutations come and go because they are deleted through LOH between the large scale repetitive elements. The reference sequence only represents a small fraction of the repeats that exist in real Y chromosomes. It just can't get sequenced with any sequencing technology, especially not with NGS short read sequencing. The reads just align on the A197 region because the actual regions where they come from are missing in the reference sequence."

Are many of the SNPs on the various trees actually "meaningless?" It would be interesting to know if some of these can be "measured" using whatever method the FTDNA SNP Pack tests use.

On the other hand, in NGS sequencing by both FGC (BGI & Complete Genomics labs) and FTDNA Big Y this MNP or "Phantom" MNP does appear to be phylogenetically useful, so far, though there are better choices if you are to pick only one to define a subclade.

fjnj
10-07-2015, 03:19 PM
Interesting, I wonder if they are being more selective now after study of SNPs for YFull's age estimation. I did notice that more recently, in a couple of cases, they may be excluding SNPs which are homologous to the X chromosome and therefore cannot be easily Sanger sequenced. Still, in the limited cases (3 persons) I have looked at, these SNPs do seem able to distinguish between two phylogenetically distinct samples, and they end up not being "unstable" or found widely in distant haplogroups.

As to SNPs in the centromeric GGAAT repeat region, my own A197, which YFull has as my U152 > L2 > R-A197 subclade definition on the current YFull tree, comes to mind. It is really part of an MNP GA > AT with its twin, A7393. Recent counsel I had from Thomas Krahn said:

"For your understanding A197 and A7393 are both in the centromeric GGAAT repeat region. Of course we can design primers for it and maybe we can get a readable sequence. However this SNPs will be meaningless for your research.

You may have heard that the centromere is a highly repetitive region that develops in a permanent flow of self-recombination. Mutations come and go because they are deleted through LOH between the large scale repetitive elements. The reference sequence only represents a small fraction of the repeats that exist in real Y chromosomes. It just can't get sequenced with any sequencing technology, especially not with NGS short read sequencing. The reads just align on the A197 region because the actual regions where they come from are missing in the reference sequence."

Are many of the SNPs on the various trees actually "meaningless?" It would be interesting to know if some of these can be "measured" using whatever method the FTDNA SNP Pack tests use.

On the other hand, in NGS sequencing by both FGC (BGI & Complete Genomics labs) and FTDNA Big Y this MNP or "Phantom" MNP does appear to be phylogenetically useful, so far, though there are better choices if you are to pick only one to define a subclade.
I think that YFull is more selective now than they were at the beginning, which is a good thing. The list of my novel SNPs generated by Big Y is over 270, which of course includes "recently" identified SNPs not in their database and "novel" SNPs in questionable regions of the Y chromosome. In the end, there are about 40 useful private SNPs according to YFull out of the 270 plus. YFull did pick up some low-coverage but good-quality ones from the BAM file that are not reported by Big Y.

Rory Cain
10-08-2015, 08:22 AM
Wait a minute. I think Big Y calls are way too rigid but they don't call something "private" or "public". Where are you getting their "private" call? It's not really a call anyway. It is a temporal state. Eventually, many private SNPs turn out to be public.

Mike, Petr has a valid point here, he just used a different word. I believe he refers to what FTDNA labels as "novel SNPs". There appears to be no upper limit to the number of times that Big Y can re-"discover" already discovered SNPs (many of them already on the ISOGG or YFull Y-trees because of their commonality).

Rory

Cofgene
10-08-2015, 11:27 AM
I have an answer from YFull and it makes sense. Those 34 private SNPs are all in palindromic regions or regions homologous to the X chromosome and hence have little phylogenetic value. This is basically a reasonable decision on their part. However, by the same standard, some of the Z series and FGC series SNPs should be discarded too as they are in the same regions.

Yes, this seems to be the thought, but I wonder if there is data available to back up the "instability" of SNPs in this region over a phylogenetic timeframe. [That time concept is also undefined.] Do we have studies which show SNPs in these regions are "flippy" in less than a 2000 year time frame?

fjnj
10-08-2015, 12:59 PM
Yes, this seems to be the thought, but I wonder if there is data available to back up the "instability" of SNPs in this region over a phylogenetic timeframe. [That time concept is also undefined.] Do we have studies which show SNPs in these regions are "flippy" in less than a 2000 year time frame?
I can't find the original post but it has been discussed earlier by lgmayka that regions like DYZ19 should be ignored for good reasons. About 40 of my private SNPs reported by FTDNA are within DYZ19, so I ignored those. But the 36 novel SNPs ignored by YFull are not in obviously problematic regions. I simply took their word for it, but it seems worthwhile to get the analysis from FGC too.
It seems very clear that YFull does not adhere to a clear standard: they claim those 36 novel SNPs do not pass their "standard", but report other SNPs in DYZ19 as positive. They even include Z13704, which is located at such a position on the Y chromosome that no further comments are necessary.

TigerMW
10-08-2015, 01:17 PM
I can't find the original post but it has been discussed earlier by lgmayka that regions like DYZ19 should be ignored for good reasons. About 40 of my private SNPs reported by FTDNA are within DYZ19, so I ignored those. But the 36 novel SNPs ignored by YFull are not in obviously problematic regions. I simply took their word for it, but it seems worthwhile to get the analysis from FGC too.
It seems very clear that YFull does not adhere to a clear standard: they claim those 36 novel SNPs do not pass their "standard", but report other SNPs in DYZ19 as positive. They even include Z13704, which is located at such a position on the Y chromosome that no further comments are necessary.
I don't know what YFull is doing but there are many SNPs found in DYZ19 that turn out to be challenging as you would expect.

However, some are very consistent across a lot of people.

I think there is more to it than it is either in DYZ19 or it is out.

fjnj
10-08-2015, 03:31 PM
I don't know what YFull is doing but there are many SNPs found in DYZ19 that turn out to be challenging as you would expect.

However, some are very consistent across a lot of people.

I think there is more to it than it is either in DYZ19 or it is out.
That would be true. However, with so many private SNPs to chase down, one needs to prioritize, and SNPs in regions such as DYZ19 can justifiably be placed on the lower end of the list.

Cofgene
10-08-2015, 04:20 PM
That would be true. However, with so many private SNPs to chase down, one needs to prioritize, and SNPs in regions such as DYZ19 can justifiably be placed on the lower end of the list.

A proper priority cannot be set without an equal opportunity evaluation.

The point that needs to be made for the repeat regions is that they are trouble for the shorter read technologies as of today. We aren't going after them today due to the low thru-put and higher cost of the long read options. This is a technical limitation of what most labs use to get sequence information. That should not dictate "ignoring" them. Within a couple of years the highly repetitive regions will yield to longer read results which allow for viable assemblies based upon embedded SNP locations.

haleaton
10-09-2015, 07:25 PM
. . .
As to SNPs in the centromeric GGAAT repeat region, my own A197, which YFull has as my U152 > L2 > R-A197 subclade definition on the current YFull tree, comes to mind. It is really part of an MNP GA > AT with its twin, A7393. Recent counsel I had from Thomas Krahn said:

"For your understanding A197 and A7393 are both in the centromeric GGAAT repeat region. Of course we can design primers for it and maybe we can get a readable sequence. However this SNPs will be meaningless for your research.

You may have heard that the centromere is a highly repetitive region that develops in a permanent flow of self-recombination. Mutations come and go because they are deleted through LOH between the large scale repetitive elements. The reference sequence only represents a small fraction of the repeats that exist in real Y chromosomes. It just can't get sequenced with any sequencing technology, especially not with NGS short read sequencing. The reads just align on the A197 region because the actual regions where they come from are missing in the reference sequence."

Are many of the SNPs on the various trees actually "meaningless?" It would be interesting to know if some of these can be "measured" using whatever method the FTDNA SNP Pack tests use.

On the other hand, in NGS sequencing by both FGC (BGI & Complete Genomics labs) and FTDNA Big Y this MNP or "Phantom" MNP does appear to be phylogenetically useful, so far, though there are better choices if you are to pick only one to define a subclade.

To my surprise, YSEQ reported back my sample as positive for A197 & A7393 from Sanger Sequencing . . .

The only reason they even tried was the lab assistant was new and did not reject them for being in the centromeric GGAAT repeat region.

I still wonder now if a lot of the SNPs on the trees are actually "meaningless" and will go away with longer read lengths or an updated Reference Sequence . . .

George Chandler
10-10-2015, 03:57 AM
I don't know what YFull is doing but there are many SNPs found in DYZ19 that turn out to be challenging as you would expect.

However, some are very consistent across a lot of people.

I think there is more to it than it is either in DYZ19 or it is out.

I'm only speculating as I don't know the specific positions that are being discussed but one possibility is that there are positions deemed unreliable and others that are "not recommended" but "possible" to sequence for. I think those "may" be what is being observed? I have one like that in my R-S1051 group called FGC9660 which didn't meet Sanger standards but doesn't seem to bounce around like others do when comparing many different sets of results.

George

George
10-10-2015, 01:57 PM
Is the yfull.com website down? I haven't been able to reach their tree for over a day.

jbarry6899
10-10-2015, 02:32 PM
Yes, it's been down since yesterday. Vadim posted on Facebook that they are working on it but gave no estimated time for repair.

Rory Cain
10-11-2015, 06:54 AM
Yes, it's been down since yesterday. Vadim posted on Facebook that they are working on it but gave no estimated time for repair.

Back up now. I was able to access it.

George
10-31-2015, 01:24 PM
A question about Yfull numbers.

I use my own clade as an example, but the same question could be put in all contexts. I see that the "formation" date for Y4460 is 2200 BP, as is the TMRCA date. However the "age" of Y4460 differs if one looks at it from the perspective of different "lines". There are two lines in the Y3118 subclade which are considerably older than the given overall "formation" and "TMRCA" ages. One of these puts the beginning of Y4460 at 3083 BP and the other at 3017 BP.

The question is: why should the oldest date not be taken as the date for the inception of Y4460?

MitchellSince1893
10-31-2015, 06:02 PM
A question about Yfull numbers.

I use my own clade as an example, but the same question could be put in all contexts. I see that the "formation" date for Y4460 is 2200 BP, as is the TMRCA date. However the "age" of Y4460 differs if one looks at it from the perspective of different "lines". There are two lines in the Y3118 subclade which are considerably older than the given overall "formation" and "TMRCA" ages. One of these puts the beginning of Y4460 at 3083 BP and the other at 3017 BP.

The question is: why should the oldest date not be taken as the date for the inception of Y4460?

I believe they take the average of all kits within a branch to determine these dates. With few data points you sometimes have jr branches with older dates than their parent branches...as more kits are added these things will work themselves out.

George
10-31-2015, 07:04 PM
I believe they take the average of all kits within a branch to determine these dates. With few data points you sometimes have jr branches with older dates than their parent branches...as more kits are added these things will work themselves out.

Thank you for the reply. I guess I'm just too dull to understand the system :behindsofa:

Rory Cain
11-17-2015, 07:52 PM
The question is: why should the oldest date not be taken as the date for the inception of Y4460?

Come to think of it, I've always wondered what these age estimates are based on - SNP or STR markers? Or a combination?

lgmayka
11-19-2015, 03:00 PM
Come to think of it, I've always wondered what these age estimates are based on - SNP or STR markers?
YFull's age estimates are based only on SNPs.

jbarry6899
11-19-2015, 03:03 PM
See: http://www.yfull.com/faq/what-yfulls-age-estimation-methodology/

Joe B
11-19-2015, 08:20 PM
YFull's age estimates are based only on SNPs.

See: http://www.yfull.com/faq/what-yfulls-age-estimation-methodology/
Is anybody working on a SNP/STR hybrid for TMRCA estimates? There are a couple of R1b-Z2103 lines that are working on the ~500 to ~1250 YBP time frame.

http://www.yfull.com/faq/how-distance-calculated/

Q: How is "Distance" calculated?

A: The YFull formula for calculating "Distance" is "Differences" divided by "Compared STRs". For example, 49/392 = 0.125. Extrapolating this to the Y67 and Y111 STR tests offered by another company would produce a genetic distance of 8/67 (0.125 x 67) and 14/111 (0.125 x 111).
Last updated on September 24, 2015.
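
For anyone who wants to check the FAQ arithmetic above, here is a minimal Python sketch. The function names are made up for illustration and are not part of any YFull tool, and rounding to the nearest whole marker is an assumption inferred from the FAQ's 8/67 and 14/111 figures.

```python
def str_distance(differences: int, compared_strs: int) -> float:
    """'Differences' divided by 'Compared STRs', as the FAQ describes it."""
    if compared_strs <= 0:
        raise ValueError("compared_strs must be positive")
    return differences / compared_strs


def extrapolate(differences: int, compared_strs: int, panel_size: int) -> int:
    """Scale the distance to another STR panel size.

    Rounding to the nearest whole marker is an assumption, not a
    documented YFull rule.
    """
    return round(str_distance(differences, compared_strs) * panel_size)


print(str_distance(49, 392))      # 0.125
print(extrapolate(49, 392, 67))   # 8
print(extrapolate(49, 392, 111))  # 14
```

The extrapolated values are only a rough equivalence, since the 392 compared STRs and the 67 or 111 marker panels do not contain the same markers.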
http://www.yfull.com/faq/what-does-genetic-distance-two-str-samples-mean-terms-years-pres/


Q: What does genetic distance of two STR samples mean in terms of years before the present time?

A: YFull does not know the answer, and the matter will be investigated.
Last updated on September 24, 2015.

Rory Cain
11-19-2015, 08:46 PM
YFull's age estimates are based only on SNPs.
Ok, thanks. I suspect then that many of the present estimates, drawn from a limited SNP database, could change considerably as further results arrive and a more complete picture emerges from the database. For instance I seem to recall that the original age estimates for DF21 and DF5 were very modest compared to what we now know. DF5 was supposed to be something like 1600 years old whereas Yfull presently have it as 3,500 years old.

Hence my question whether the age estimates included the STR comparisons. But I have an answer now, so thanks for that.

Rory Cain
11-19-2015, 08:57 PM
See: http://www.yfull.com/faq/what-yfulls-age-estimation-methodology/

Thanks. YFull's assumed rate of one SNP mutation per 144 years is quite different from other estimates, which can be as frequent as one SNP mutation per two generations, which might equate to, say, 50-70 years rather than 144 years. Even allowing for all the usual variations, like how many years per generation, how many SNPs were reported, etc., I'm wondering how estimates can differ so widely.

VinceT
11-19-2015, 11:08 PM
YFull's estimates are based on limiting to the CombBED regions, which is a subset of typical Big-Y coverage. Big-Y coverage suggests rates somewhere between 120 and 150 years per SNP, so of course the rate implied by the CombBED regions will be skewed to a slower rate.

When you consider data for all 14 million callable sites available from FGC's Y-Elite and WGS data, you will get rate estimates somewhere in the range of 70-100 years per SNP. Estimates giving less than 60-70 years per SNP likely (and incorrectly) assume that all 18-22 million positions available from FGC's Y-Elite and WGS data are validly callable. Basically, chances are that those wildly fast estimates are based on analysis of unfiltered and unqualified data.
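
The relationship described above can be sketched numerically: if mutations arrive at a roughly constant rate per base pair per year, then the years per SNP is the reciprocal of that rate times the number of positions counted. In the sketch below the per-base rate of 0.82e-9 is an assumed, illustrative value (it is not quoted anywhere in this thread) and the function names are hypothetical.

```python
# Assumed illustrative mutation rate, in mutations per base pair per year.
# This number is an assumption for the sketch, not a figure from the thread.
ASSUMED_RATE = 0.82e-9


def years_per_snp(callable_bp: float, rate: float = ASSUMED_RATE) -> float:
    """Average years between SNPs when callable_bp positions are counted."""
    return 1.0 / (callable_bp * rate)


def implied_callable_bp(years: float, rate: float = ASSUMED_RATE) -> float:
    """Number of counted positions implied by a given years-per-SNP figure."""
    return 1.0 / (years * rate)


# Region sizes mentioned in the post: 14 million callable, 18-22 million total.
for bp in (14e6, 18e6, 22e6):
    print(f"{bp / 1e6:.0f} Mbp counted -> {years_per_snp(bp):.0f} years per SNP")

print(f"144 years per SNP -> about {implied_callable_bp(144) / 1e6:.1f} Mbp counted")
```

Under that assumed rate, 14 million positions works out to roughly 87 years per SNP and 22 million to roughly 55, so the same underlying per-base rate produces very different "years per SNP" figures depending only on how much of the chromosome is counted.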

Cofgene
11-19-2015, 11:32 PM
Is anybody working on a SNP/STR hybrid for TMRCA estimates? There are a couple of R1b-Z2103 lines that are working on the ~500 to ~1250 YBP time frame.



Iain McDonald is in the very, very early stages of investigating that topic. He has indicated that it will be easier to develop the models if 111 STR markers are utilized within the U106 project.