H6a1a3/T5785C became H6-b2d1p ... OR Confused by YFull MTree



Donwulff
07-21-2019, 08:58 AM
As the title says. It looks like H6a1a3, defined by mutation T5785C downstream of the H6a1a1 mutation T11253C, has become H6-b2d1p in the latest YFull MTree: https://www.yfull.com/mtree/H6-b2d1p/

I suppose YFull is doing their own thing, but that seems to raise all sorts of questions in the context of the mtDNA haplotree. How can an H6b branch be downstream of H6a subclades? What happened to H6-b2d1a, and how did we get to b2d1q? Or is this just an attempt to extend the "official" haplotree without colliding with possible new versions? Seems like there should be a better notation for it, because this confuses the phylogenetic interpretation of the branch, which is the main reason for this notation in the first place.

While some of the markers (T11253C and C522CAC) have mutated back and forth, I don't see how this could really be a parallel branch of H6a; A16482G and A4727G at least would have had to mutate independently (purifying selection on ancestral heteroplasmy?). However, that's not what the current MTree depicts in any case, with H6-b2d1p as a subclade of H6-a1a1.

Actually, I just noticed there's also "H6a1a2a1" as a subclade of "H6-b2d1c2" (some names with a dash, others without), and there are also clades like "H6a1a2b-a", so I have to re-think what all this means, or whether the MTree is just messed up right now... Any ideas?

Donwulff
07-23-2019, 02:07 PM
Today, the same branch is H6-a5a1o*, so I'm going to go with at least "The MTree is just messed up right now" from the options in the previous post ;) Most of the questions still remain. The dash may mean a branch added by YFull, although in that case it seems like it should now be H6a-5a1o*, since H6a already exists. Or should that be read as "dash a", i.e. different from H6a? I think that's the case, although I can't make heads or tails of the phylogeny in the tree right now.

Donwulff
07-25-2019, 08:31 PM
And today it's H6-a5a1. Going to stop paying attention for now though; I'm not sure when it happened, but the tree version now states "YFull MTree 1.01.13413 (under construction)", if it hadn't otherwise become obvious. H6-a5a1 is listed as a 50-year-old haplogroup, with hundreds of samples in it ;) It's good to see that they're doing some serious re-calculation of the tree though, but I'm wondering what the possible benefit is of showing such way-off results in the meantime.

Saetro
07-29-2019, 08:10 PM
In a similar subclade and position.
FTDNA is also supposed to be working on tree extensions.
We are shortly likely to have 2 versions - just like Y.
I am pushing down my impatience by working flat out on autosomal while waiting.
I happen to share an extra mutation with 2 others at FTDNA, so am hoping for a tree extension.

Donwulff
08-04-2019, 11:37 AM
The haplogroup roulette seems to have ended for me a couple of days ago. First the sample went back to SNP results in progress and the reports became inaccessible; now it's "Complete" again. The branch settled on H6-a5a1*, so just a star added, indicating no subgroup identified. The identifying SNP is T11253C though (formerly H6a1a-defining, but it's now defining for both H6-a5a and H6-a5a1, huh?). A lot of the branch changes seem to be based on indels at location 522, which looks a bit strange to me; BGISEQ is worse for indels, so I'm also wondering if that could be a sequencing error.

Based on a link to Wikitree posted in another discussion, I gather this happened to other branches earlier in the summer. I've yet to try to figure out how much sense the new assignment & phylogeny makes, though. It makes me curious: in the absence of very strong historical records or ancient samples, how exactly do you construct a sound phylogeny for recurrent mutations on the mtDNA? Also, as usual, YFull should have some documentation.

[Attachment 32144]

An exclamation mark seems to indicate a back-mutation, sometimes three or four times over, as with AC302A!!! and AC302A!!!! actually in the mismatches. But what's with the color-coding? Most earlier back-mutations seem to be in red, so does that mean the mutation was not found, which is expected? Or is it a mutation hotspot? Not all instances of the locations are in red though. Some SNPs are marked in grey; intuitively I'd think that means low confidence in the call, but A302C in grey for H6 before it mutates back suggests it's something else, maybe uncertainty in the phylogenetic tree.
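To make the notation guess above concrete, here is a minimal Python sketch that splits a label such as AC302A!!! into its parts and counts the trailing exclamation marks. The function name `parse_mutation` and the label pattern are my own assumptions for illustration; this is not YFull's parser, and what the marks actually mean is still just the guess from this post.

```python
import re

# Assumed label shape: reference base(s), position, derived base(s),
# then zero or more trailing '!' characters (guessed to mark back-mutations).
_LABEL = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)(!*)$")

def parse_mutation(label):
    """Split a label like 'AC302A!!!' into its components (hypothetical helper)."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    ref, pos, alt, marks = match.groups()
    return {
        "ref": ref,            # base(s) before the change
        "position": int(pos),  # mtDNA position
        "alt": alt,            # base(s) after the change
        "marks": len(marks),   # number of trailing exclamation marks
    }

print(parse_mutation("AC302A!!!"))
print(parse_mutation("T5785C"))
```

Counting the marks like this at least makes it easy to tabulate which positions flip most often in a list of mismatches.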

Another question these changes raise... if FTDNA does the same, do they get (potentially rightly) accused of plagiarism/copyright infringement? This is fact-based, but with different samples and different algorithms they could arrive at a different phylogenetic tree (see above about recurrent/back mutations). So we'll have a different branch name on every service?

Donwulff
08-04-2019, 06:17 PM
In https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3819997/ "Concept for estimating mitochondrial DNA haplogroups using a maximum likelihood approach (EMMA)":
"Furthermore, zero weight was applied to deletions at positions 521 and 522 as they are sometimes the result of a 5′ alignment of indels in the AC-stretch. For multiple insertions at positions 315 (C-insertions), 455 (T-insertions), 524 (AC-insertions), and 573 (C-insertions) only the first insertion was weighted."
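As a rough illustration of the rule quoted above, here is a small Python sketch. The positions come straight from the quote; everything else (the function name `weight_variants`, the `(position, kind)` tuples, and a flat default weight of 1.0) is my own simplification and not how EMMA itself is implemented.

```python
# Positions taken from the quoted EMMA passage.
ZERO_WEIGHT_DELETIONS = {521, 522}
FIRST_INSERTION_ONLY = {315, 455, 524, 573}

def weight_variants(variants, default_weight=1.0):
    """Assign weights to (position, kind) variants; kind is 'sub', 'del' or 'ins'.

    Hypothetical helper: the flat default weight is an assumption made
    only to keep the example short.
    """
    seen_insertions = set()
    weighted = []
    for pos, kind in variants:
        weight = default_weight
        if kind == "del" and pos in ZERO_WEIGHT_DELETIONS:
            # Deletions at 521/522 may be alignment artifacts in the AC stretch.
            weight = 0.0
        elif kind == "ins" and pos in FIRST_INSERTION_ONLY:
            # Only the first insertion at these positions counts.
            if pos in seen_insertions:
                weight = 0.0
            seen_insertions.add(pos)
        weighted.append((pos, kind, weight))
    return weighted

sample = [(522, "del"), (524, "ins"), (524, "ins"), (5785, "sub")]
for row in weight_variants(sample):
    print(row)
```

With weighting like this, a 522 deletion alone could not move a sample to a different branch, which is the behaviour I'd have expected from the MTree as well.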

I'm going to assume by default that YFull knows what they're doing, but yeah, given BGISEQ's known worse performance on indels (how much of that is due to a different adapter sequence, which could be interpreted as an indel at the ends of reads?), I'm tempted to spend some time looking at this in detail. I'm not really seeing the same thing on a quick look with a BAM browser... In fact, an additional consideration would be NUMTs (copies of mtDNA sequence) elsewhere in the genome. I'm also wondering if YFull has something to handle circular genomes, because BWA alignment doesn't do it well. Most research seems to align the reads to two references, one of them formed by joining the end of the mtDNA sequence to its beginning, so that reads can align continuously across the mtDNA start/end junction. Not that position 522 should really be affected by the sequence breakpoint, but just as a general thought on the analysis.
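The two-reference idea can be sketched in a few lines of Python. This is only my own illustration under stated assumptions: the function names and the choice of shift are hypothetical, and I have no idea whether YFull does anything like this. A second reference is made by rotating the circular sequence so that the original start/end junction lands in the middle, and positions called on it are then mapped back to the original numbering.

```python
def make_shifted_reference(seq, shift):
    """Rotate a circular sequence so the original start/end junction
    ends up inside the new linear sequence (hypothetical helper)."""
    shift %= len(seq)
    return seq[shift:] + seq[:shift]

def shifted_to_original(pos, shift, length):
    """Map a 1-based position on the shifted reference back to the
    1-based position on the original reference."""
    return (pos - 1 + shift) % length + 1

# Toy 8-base "genome"; for the real rCRS the length would be 16569.
original = "ACGTTGCA"
shift = 4
shifted = make_shifted_reference(original, shift)
print(shifted)

# Every base on the shifted reference maps back to the same base on the original.
for p in range(1, len(shifted) + 1):
    q = shifted_to_original(p, shift, len(original))
    assert shifted[p - 1] == original[q - 1]
```

Reads would be aligned to both references, and calls near the original ends would be taken from the shifted alignment after converting the coordinates back.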

Donwulff
09-05-2019, 07:14 PM
I did stop checking for a while, but now that I've checked again, this is getting even weirder. The base haplogroup switched to H6a1b4 with -a1*. I'm guessing everybody on the branch got moved, though I'm not sure if I want to spend the effort to really try to figure out if it makes sense. I know the order of the mutations (and back-mutations) makes all the difference, but it's still interesting to move from H6a1a3 to H6a1b4; maybe it's not the final position either, since at least the age estimation still reads "Data is being processed...". Hopefully more samples can help clear up the phylogeny, although there's already quite a big group of samples stuck at H6a1b4-a1.

Donwulff
10-04-2019, 10:13 PM
Another related post inspired me to look again, and it looks like after a wild ride I'm now back at H6a1a3*, version 1.02.00 (Under Construction). Actually I think I've gained a star, and I do think the list of samples I'm matching has changed somewhat, so others in related haplogroups might have ended up somewhere else. While I keep appreciating that they're actually making changes, I'm almost ready to express my frustration at the display of apparently badly incorrect results.

On that note, I'd like to remind anybody actually reading this to check that your mtDNA settings are set properly. Mine at least defaulted to hidden with no country of origin, so I think that's what most people's settings are. If you wish to participate in the genealogical community and in finding Y and mtDNA lineages, it's important to have them right. It took a moment to find the settings at https://www.yfull.com/settings/#t12-tab (while logged in).