
New U5 sequences



GailT
10-07-2013, 08:19 PM
I'll use this thread for updates on new U5 sequences, starting with the results from the new Li et al study on diabetes in Denmark. The study has 2000 full mtDNA sequences, including 160 in haplogroup U5, or 8%. The sequencing is of uneven quality with some samples having large numbers of no calls, but I was able to place all but 1 in subclades of U5, listed below.

It is interesting that Denmark is nearly 73% U5a and 27% U5b. The U5b samples are heavily concentrated in U5b2. In contrast, the 13 recent U5 samples from northwest Spain (Zamora Province) were 31% U5a and 69% U5b.

Some notable finds were new U5a1*, U5b1* and U5b3* samples.

For two of the subclades found often in Finland, U5b1b1a and U5b1b2, there were no U5b1b1a samples, and there were 3 U5b1b2 samples. I've speculated that U5b1b1a arrived in Finland via an eastern European route, and U5b1b2 via a western European route, and these results seem consistent with that theory.


U5a1 = 51%
N = 84
U5a1* = 1
U5a1a1 = 31
U5a1a2 = 5
U5a1b = 22
U5a1c2a = 9
U5a1d = 3
U5a1e = 1
U5a1f = 4
U5a1g = 4
U5a1h = 4
U5a1*i1 = 1

U5a2 = 21%
N = 34
U5a2a = 13
U5a2b = 10
U5a2c = 5
U5a2d = 3
U5a2e = 1
U5a2*g = 1


U5b1 = 7.6%
N = 12
U5b1* = 1
U5b1b2 = 3
U5b1c2 = 1
U5b1c2b = 3
U5b1d2 = 2
U5b1e = 2


U5b2 = 17%
N = 27
U5b2a1a1 = 3
U5b2a1a1*C = 2
U5b2a1a1*C2 = 2
U5b2a2a1 = 4
U5b2a2b = 1
U5b2a2b1 = 3
U5b2a2c = 1
U5b2a4a = 1
U5b2a5 = 1

U5b2b* = 1
U5b2b1a = 1
U5b2b4 = 3
U5b2b4*B = 2
U5b2b4*B1 = 1

U5b2c2b = 1


U5b3 = 1.3%
N = 2
U5b3* = 1
U5b3e = 1

Baltimore1937
10-08-2013, 02:53 AM
Why is my U5b2b2 so elusive?

rbstens
11-17-2013, 12:38 AM
Would you please elaborate on what differentiates U5b2a1a1, U5b2a1a1*C, and U5b2a1a1*C2? Also, why haven't these sub-groups been officially designated?

Anglecynn
11-17-2013, 01:09 AM
Interesting, seems U5a1b is fairly common there, or the second largest group among the U5a.

GailT
11-17-2013, 01:12 AM
Would you please elaborate on what differentiates U5b2a1a1, U5b2a1a1*C, and U5b2a1a1*C2? Also, why haven't these sub-groups been officially designated?

Here are the mutations I used to define these:

U5b2a1a1*C: 16239 and 16192!
U5b2a1a1*C2: 13708, 16269


A new subclade name requires a certain number of samples, and some diversity among them, before it is included in Phylotree. Updates have also been slow. Perhaps it will make it into the next update.

annika
11-22-2013, 11:59 AM
May I ask how you concluded that haplogroup U5b1b2 arrived in Finland via a western European route?

GailT
01-25-2014, 05:42 AM
May I ask how you concluded that haplogroup U5b1b2 arrived in Finland via a western European route?

This is speculative, but there are several U5b1b2 samples from Denmark, Germany, UK and Ireland, but none from eastern Europe. There are quite a few U5b1b1a samples from eastern Europe, but it is rare in western Europe. So I'm guessing that U5b1b1a and U5b1b2 arrived in Finland by different routes.

GailT
01-25-2014, 06:26 AM
There are 1292 FMS samples just published in a study by Raule et al. on Longevity (link (http://onlinelibrary.wiley.com/doi/10.1111/acel.12186/pdf)). First, thanks to Ian Logan for sorting these into haplogroups.

By country, the sample counts are:
Denmark 854
Finland 292
Italy 125
Greece 21

There are 112 U5 samples, with totals and percentage by country:
Denmark 59 (6.9%)
Finland 48 (16.4%)
Italy 5 (4%)
Greece none

The Li et al study had 2000 Danish samples, with 170 U5, or 8.5%.

The distribution by subclade is similar to the Li et al study and the U5 project results with very few U5b1 samples in Denmark and a large number of U5b1 samples in Finland.

Denmark   N    % of U5
U5a1      32   54%
U5a2      10   17%
U5b1       3    5%
U5b2      10   17%
U5b3       4    7%

Finland   N    % of U5
U5a1      11   23%
U5a2       8   17%
U5b1      25   52%
U5b2       4    8%
U5b3       0    0%
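
The percentages in this post are plain ratios of the quoted counts. Here is a minimal sketch for re-deriving them, assuming only the sample totals and U5 counts given above. The names `TOTALS`, `U5_BY_BRANCH`, `share`, `u5_frequency` and `branch_shares` are made up for illustration and are not part of any published tool.

```python
# Counts quoted in this post for the Raule et al. samples.
# All names here are illustrative only.
TOTALS = {"Denmark": 854, "Finland": 292, "Italy": 125, "Greece": 21}
U5_BY_BRANCH = {
    "Denmark": {"U5a1": 32, "U5a2": 10, "U5b1": 3, "U5b2": 10, "U5b3": 4},
    "Finland": {"U5a1": 11, "U5a2": 8, "U5b1": 25, "U5b2": 4, "U5b3": 0},
}


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def u5_frequency(country):
    """U5 samples as a percentage of all samples from one country."""
    return share(sum(U5_BY_BRANCH[country].values()), TOTALS[country])


def branch_shares(country):
    """Each branch as a percentage of that country's U5 samples."""
    counts = U5_BY_BRANCH[country]
    n_u5 = sum(counts.values())
    return {branch: share(n, n_u5) for branch, n in counts.items()}


if __name__ == "__main__":
    for country in U5_BY_BRANCH:
        print(country, u5_frequency(country), branch_shares(country))
```

Keeping the raw counts next to the derived percentages makes it easy to recheck the figures whenever a new study is added.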

tuuli
02-05-2014, 04:05 PM
My U5b2a1a has a marker, 16185T, which I cannot find in this group, only in the Z haplogroup.
Please advise if possible, thanks.

GailT
02-06-2014, 06:07 AM
My U5b2a1a has a marker, 16185T, which I cannot find in this group, only in the Z haplogroup.
Please advise if possible, thanks.

Did you test at FTDNA or 23andMe? If 23andMe, you can use mthap to possibly identify a subclade of U5b2a1a. There is one person in U5b2a1a1 (GQ853200) who also has 16185.

GailT
05-17-2014, 04:37 AM
Ian Logan has processed the 1041 mtDNA full sequence results from the Human Genome Diversity Project. There are 29 U5 sequences. The most interesting one is a U5a1* Uygur/China sample. There is also 1 Hazara/Pakistan sample that is U5a1a1. The others are distributed across Europe and the Middle East, as expected.

Baltimore1937
05-18-2014, 07:12 AM
Ian Logan has processed the 1041 mtDNA full sequence results from the Human Genome Diversity Project. There are 29 U5 sequences. The most interesting one is a U5a1* Uygur/China sample. There is also 1 Hazara/Pakistan sample that is U5a1a1. The others are distributed across Europe and the Middle East, as expected.

Was U5b2b2 among the U5 samples in the Human Genome Diversity Project? Could you give us a link so we could see it for ourselves?

GailT
05-18-2014, 01:55 PM
Was U5b2b2 among the U5 samples in the Human Genome Diversity Project? Could you give us a link so we could see it for ourselves?


Yes, KJ445922 is U5b2b2 and is "Orcadian". This sample is missing several expected mutations so is not completely reliable. It has an extra mutation A13127G but does not share any extras with the other U5b2b2 samples. Here is the link to Ian's post (http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2014-05/1400007181).

Also, in case you did not already see this, there were two ancient U5b2b2 samples reported by Bollongino et al. from Blätterhöhle, Germany, dated at 5900 and 5600 ybp. One was a poor quality read and the other did not share any extra mutations with modern people.

parasar
05-18-2014, 06:41 PM
Ian Logan has processed the 1041 mtDNA full sequence results from the Human Genome Diversity Project. There are 29 U5 sequences. The most interesting one is a U5a1* Uygur/China sample. There is also 1 Hazara/Pakistan sample that is U5a1a1. The others are distributed across Europe and the Middle East, as expected.

GailT,

What is your opinion on the provenance of the sequence below that you have reviewed previously?
GQ214520(New Zealand) Corser Haplogroup U5 07-JUN-2010
A73G A263G 309.1C 315.1C A750G A1438G A2706G T3197C A4769G C7028T
A8860G C9107T G9477A A11467G G11719A A12308G G12372A T13617C A13827G G13928C
C14766T T14783C G15221A A15326G C16114A C16192T C16256T C16270T C16294T G16526A
http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-06/1276238678
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0052022&representation=PDF

GailT
05-19-2014, 03:43 AM
What is your opinion on the provenance of the sequence below that you have reviewed previously? GQ214520(New Zealand) Corser Haplogroup U5 07-JUN-2010


GQ214520 is U5a2a1 with extra mutations C9107T, T14783C and G15221A. It is missing A14793G so this might be a reversion, or perhaps they made a mistake in the analysis. I wonder about the quality of the read when they report a reversion and an extra mutation close to the site of the reversion. Corser et al. report this as a "European Haplotype" and it seems likely that GQ214520 has recent European ancestry.
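
The check described here (is an expected mutation present, and which extras occur?) can be done mechanically on the variant list quoted earlier in the thread. The following is a minimal sketch, assuming the usual notation of reference base, position and derived base (for example `A73G`), insertions written as `309.1C`, and a trailing `!` for a back mutation. The regular expression and the helper names `parse_variant` and `positions` are illustrative and handle only those forms.

```python
import re

# Variant list for GQ214520 exactly as quoted earlier in the thread.
GQ214520 = """
A73G A263G 309.1C 315.1C A750G A1438G A2706G T3197C A4769G C7028T
A8860G C9107T G9477A A11467G G11719A A12308G G12372A T13617C A13827G G13928C
C14766T T14783C G15221A A15326G C16114A C16192T C16256T C16270T C16294T G16526A
"""

# Either a substitution such as A73G (optional "!" = back mutation)
# or an insertion such as 309.1C. Other notations are not handled.
_VARIANT = re.compile(
    r"^(?:(?P<ref>[ACGT])(?P<pos>\d+)(?P<alt>[ACGT])(?P<back>!?)"
    r"|(?P<ipos>\d+)\.(?P<idx>\d+)(?P<ibase>[ACGT]))$"
)


def parse_variant(token):
    """Split one variant token into its parts, or raise ValueError."""
    m = _VARIANT.match(token)
    if m is None:
        raise ValueError(f"unrecognised variant: {token!r}")
    if m.group("pos") is not None:
        return {
            "kind": "substitution",
            "pos": int(m.group("pos")),
            "ref": m.group("ref"),
            "alt": m.group("alt"),
            "back": m.group("back") == "!",
        }
    return {
        "kind": "insertion",
        "pos": int(m.group("ipos")),
        "base": m.group("ibase"),
    }


def positions(text):
    """Set of positions carrying any variant in a whitespace-separated list."""
    return {parse_variant(tok)["pos"] for tok in text.split()}


if __name__ == "__main__":
    found = positions(GQ214520)
    print("14793 present:", 14793 in found)
    print("extras present:", {9107, 14783, 15221} <= found)
```

This only confirms what is or is not in the reported list. It cannot tell a true reversion at 14793 from a sequencing or analysis error.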

GailT
06-29-2014, 12:55 AM
I'm a volunteer administrator for some of the mtDNA haplogroup projects but have no other formal connection to FTDNA. I summarized the project results on the U5 project page (https://www.familytreedna.com/public/u5b/default.aspx?section=results), and plan to update it to include recent FMS samples.

U5b2a2 is estimated to be about 11,000 years old and is widespread in northern Europe. There are no U5b2a2 samples with 215G, but the FMS test would be useful for identifying a subclade of U5b2a2.

GailT
07-01-2014, 04:29 AM
I can't send a private message response so I'm replying here. U8c is defined by DV13, i.e., Mannis created a new subclade for this apparently extinct branch of U8. But perhaps extant U8c samples will be found when more people are tested.

Kwheaton
09-29-2014, 02:03 PM
Gail,
Just thought I would give this thread a bump. Any News in the mtDNA U world?
Kelly

GailT
09-30-2014, 04:14 AM
Gail,
Just thought I would give this thread a bump. Any News in the mtDNA U world?
Kelly

There have been very old branches of U2b and U2c (dating to around 25 to 30 kya) found in white South Africans. These haplogroups are most often found in southwest to south Asia, but given the age of these groups, and the fact that they have no members outside of South Africa, they really could have originated anywhere. I'd really like to see more full sequences of U samples from India, Pakistan and Afghanistan.

We had one person with HVR test results in the U5 project who appeared to be neither U5a nor U5b, which would have been very interesting, but the FMS showed that they were B4 (the U5 prediction was wrong).

I'm still waiting for the mtDNA results for the HGDP samples used in the Lippold and Rieux papers, they include some undersampled regions so maybe we will see some interesting results there.

GailT
10-10-2014, 04:50 PM
I'm still waiting for the mtDNA results for the HGDP samples used in the Lippold and Rieux papers, they include some undersampled regions so maybe we will see some interesting results there.

1058 HGDP sequences appeared on GenBank yesterday and Ian is processing them now...

Krefter
11-10-2014, 04:52 AM
1058 HGDP sequences appeared on GenBank yesterday and Ian is processing them now...

Have you gotten the results yet? If so, can you please post them? If you don't already do this, I've found it helpful to predict the haplogroups of samples in academic studies online by looking at their mutations (and keeping track of extras). In a Basque study the vast majority of U5s were U5b1f1a. BTW, through 23andMe I found which U5b2a2 clade I belong to; it's listed in my signature. Every little feather helps.

Baltimore1937
11-11-2014, 04:49 AM
Since my old laptop crashed, I lost a lot of my bookmarks, including Ian Logan. Anyway, I'm wondering about my rare U5b2b2. Although it is probably Norse or other northern European, it came into being during the LGM down in southern France, etc. Browsing around in Ancestry and all of the different trees there, I keep bumping in to Huguenot or possible Huguenot (how to differentiate between Norman and Huguenot?). What I'm getting at is there is also the possibility that my haplotype never migrated northward after the LGM, but came over to the colonies from France (or from France via England).

GailT
11-11-2014, 07:57 PM
Have you gotten the results yet? If so, can you please post them? If you don't already do this, I've found it helpful to predict the haplogroups of samples in academic studies online by looking at their mutations (and keeping track of extras). In a Basque study the vast majority of U5s were U5b1f1a. BTW, through 23andMe I found which U5b2a2 clade I belong to; it's listed in my signature. Every little feather helps.

It turns out that these were almost all duplicates of the HGDP samples previously published by Zheng et al., some of which had been previously published by Hartmann et al., so now we have some cases where the same individual has 3 different GenBank accession numbers. Not very helpful.

I also look at HVR samples from any studies that seem interesting - I'm the person who identified the U5b1f1a cluster in the Basque samples in the Behar et al study. I shared this info with Doron, and at one time he seemed interested in doing full sequences for these U5 samples, but dropped it for lack of funding.

GailT
11-11-2014, 08:09 PM
I'm wondering about my rare U5b2b2. Although it is probably Norse or other northern European, it came into being during the LGM down in southern France, etc.

There are now 4 other people in your U5b2b2* Group B proposed new subclade, but none of them have listed a country of origin. Two seem to have colonial southern US ancestry and that might suggest an origin in the UK. Hopefully some of these folks will share more info on their ancestry, and you'll get more close matches.

GailT
11-13-2017, 04:36 AM
Peng et al. published 382 mtDNA FMS samples in GenBank, as part of the as yet unpublished paper "Mitochondrial Genomes Uncover the Maternal History of the Pamir Populations". There are 14 U5 samples including 12 from subclades that might be associated with a Steppe/Indo-European or eastern European hunter-gatherer origin (U5a1a1, U5a1b, U5a1d2b, U5a1g, U5a2a1, U5a2b) and two U5b samples.

One of the U5b samples is unclassified U5b2 and falls on the same branch as another unclassified U5b2 sample from India that was published in GenBank in 2011. They share six mutations (T152C, C264T, T4080C, G9139A, T14199C, A14409G) but they also differ by six steps, so they are not closely related. This branch of U5b2 has not been found in Europe, and I've guessed that it is a branch of U5b2 that migrated east during or after the LGM. The new Pamiri sample seems consistent with an ancient migration east. We cannot reliably estimate the age of this branch with only two samples, but a difference of 6 steps indicates a very distant maternal ancestor for this branch of U5b2.
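
The "shared mutations" and "steps apart" figures are set operations on the two samples' extra mutations. Below is a minimal sketch, assuming each sample is represented as a set of variant labels. The six shared mutations are the ones listed above, but the private entries are hypothetical placeholders, because this post gives only the total of six differing steps and not which mutations they are or how they split between the two samples.

```python
# Shared mutations listed in the post.
SHARED = {"T152C", "C264T", "T4080C", "G9139A", "T14199C", "A14409G"}

# HYPOTHETICAL placeholders: the real private mutations are not given in
# the thread, and the 3 + 3 split is an assumption made for illustration.
PAMIR_PRIVATE = {"pamir_private_1", "pamir_private_2", "pamir_private_3"}
INDIA_PRIVATE = {"india_private_1", "india_private_2", "india_private_3"}

pamir = SHARED | PAMIR_PRIVATE
india = SHARED | INDIA_PRIVATE


def compare(sample_a, sample_b):
    """Return (shared mutations, number of mutational steps between samples).

    A step is counted for every mutation present in one sample but not
    the other, i.e. the size of the symmetric difference.
    """
    shared = sample_a & sample_b
    steps = len(sample_a ^ sample_b)
    return shared, steps


if __name__ == "__main__":
    shared, steps = compare(pamir, india)
    print(sorted(shared), steps)
```

With real data, the private sets would come from each sample's variant list after removing the mutations that define U5b2.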

The U5b2a1 sample is closely related to U5b2a1b (shares the 2 coding region mutations but not the HVR mutations at 152 and 16325). U5b2a1b has an age estimate of about 3000 years and seems to be Germanic, but the Pamiri sample seems to diverge at an earlier date. There is also a pre-U5b2a1b Ancient sample in the Unetice culture, from Haak et al. dated at about 3600 ybp.

Táltos
01-20-2019, 07:33 PM
Discovered my father’s mtDNA this morning-U5a1! Opened up 23andme and discovered that I have a second cousin match. His mother was the sister of my paternal grandmother. They were Rusyns from what is now NE Slovakia.

GailT
01-22-2019, 04:27 AM
Discovered my father’s mtDNA this morning-U5a1! Opened up 23andme and discovered that I have a second cousin match. His mother was the sister of my paternal grandmother. They were Rusyns from what is now NE Slovakia.

If the mtDNA results are from 23andMe you can upload them to James Lick's mthap web tool and maybe identify a more specific subclade of U5a1.

Táltos
01-22-2019, 02:57 PM
If the mtDNA results are from 23andMe you can upload them to James Lick's mthap web tool and maybe identify a more specific subclade of U5a1.

Thanks Gail, good idea! I will see if he will do that.

solarius
11-12-2019, 12:00 AM
I hope this doesn't count as a necro post but I think it might be of interest to you all.

My father's maternal haplogroup is U5a2a1 which is unheard of as far as Punjabi/Indian people are concerned. I'm trying to upload my own mtDNA fasta file to Genbank but haven't heard back yet, but once I've got confirmation I will be trying to get his uploaded there too. He can confirm that his maternal line is ethnic Punjabi as far back as the mid 19th century.

I am aware of the Pamir study mentioned above, those U5a2a1 samples in that paper cluster most closely when I put my father's FASTA into Genbank's Blast and then generate a tree of descent.

His sample is YF66503 on YFull's mtDNA tree near the top:

https://www.yfull.com/mtree/U5a2a1/

GailT
11-12-2019, 03:11 AM
There is one Indian sample in GenBank, JX462730. There is also an Italian sample that also shares a coding region mutation with the Pamir samples. It would be interesting to see his results to see how closely he matches the Pamir samples.

My guess is that U5a2a1 originated in the Steppe region about 6000 years ago, and likely spread west, south and east from there. My maternal line is also U5a2a1, with German maternal ancestry.




aaronbee2010
11-12-2019, 12:45 PM
I hope this doesn't count as a necro post but I think it might be of interest to you all.

My father's maternal haplogroup is U5a2a1 which is unheard of as far as Punjabi/Indian people are concerned. I'm trying to upload my own mtDNA fasta file to Genbank but haven't heard back yet, but once I've got confirmation I will be trying to get his uploaded there too. He can confirm that his maternal line is ethnic Punjabi as far back as the mid 19th century.

I am aware of the Pamir study mentioned above, those U5a2a1 samples in that paper cluster most closely when I put my father's FASTA into Genbank's Blast and then generate a tree of descent.

His sample is YF66503 on YFull's mtDNA tree near the top:

https://www.yfull.com/mtree/U5a2a1/

Me and a few other AG users have a spreadsheet of haplogroups of South Asian relatives found through various genotyping companies.

Since most of the people with edit access to the spreadsheet are Punjabi Jatts (3/4 of them including me - the remaining person is a Punjabi Ramgarhia), this does mean that Jatts are heavily overrepresented on this spreadsheet. Here's a list of Punjabi U5's on there for you (I've filtered out results from close relatives and results from those with non-Punjabi maternal ancestry):

U5a1a1
U5a1
U5b2a1
U5a2b
U5a2a
U5b2b
U5a1
U5b2a2
U5a1a1
U5b2a2
U5a2a1
U5b1b1a
U5b2a2
U5a1
U5a1a1
U5a1

Your father's mtDNA appears to be a Pontic-Caspian Steppe derived lineage (where your R1a also came from). Steppe mtDNA in South Asia is mainly found in the Northwest as far as I know.

solarius
11-13-2019, 04:52 AM
There is one Indian sample in GenBank, JX462730. There is also an Italian sample that also shares a coding region mutation with the Pamir samples. It would be interesting to see his results to see how closely he matches the Pamir samples.

My guess is that U5a2a1 originated in the Steppe region about 6000 years ago, and likely spread west, south and east from there. My maternal line is also U5a2a1, with German maternal ancestry.

I actually overlooked that one without realising, there's hundreds of results and it never appeared near the top of Blast's search.

I've joined the U5 group on YFull and discovered that only one sample there shares a private mutation with his sample: an Italian woman from Tuscany in the HGDP, sample designation HGDP01168, which appears in GenBank as accession no. KF451626. It's the one that shares the C allele:

[Attachment 34567]

Oddly enough, the other Indian sample does not share any private mutations with my father's sample. I'm struggling to find the original study that it came from (the sample on GenBank doesn't link to its "unpublished" study). The associated academics for that sample have done other medical genetics studies involving U5 Punjabis/Indians, but I can't find the specific one with this U5a2a1 individual. It would be nice if anybody could find out what background individual P66 came from:

https://www.ncbi.nlm.nih.gov/nuccore/JX462730

solarius
11-13-2019, 05:16 AM
Me and a few other AG users have a spreadsheet of haplogroups of South Asian relatives found through various genotyping companies.

Since most of the people with edit access to the spreadsheet are Punjabi Jatts (3/4 of them including me - the remaining person is a Punjabi Ramgarhia), this does mean that Jatts are heavily overrepresented on this spreadsheet. Here's a list of Punjabi U5's on there for you (I've filtered out results from close relatives and results from those with non-Punjabi maternal ancestry):

U5a1a1
U5a1
U5b2a1
U5a2b
U5a2a
U5b2b
U5a1
U5b2a2
U5a1a1
U5b2a2
U5a2a1
U5b1b1a
U5b2a2
U5a1
U5a1a1
U5a1

Your father's mtDNA appears to be a Pontic-Caspian Steppe derived lineage (where your R1a also came from). Steppe mtDNA in South Asia is mainly found in the Northwest as far as I know.

Thanks I appreciate the info, on FTDNA there appears to be nobody of Indian origin when it comes to U5a2a1, my father has 57 full sequence matches all with a GD of 3. Just to point out, when it comes to mtDNA it's not particularly useful to compare outside of haplogroups, all those haplogroups you posted haven't shared a common maternal ancestor between each other for over 33,000 years. Even a full sequence mtDNA match with someone (ie within a haplogroup) can represent a 600 year time frame. It is nice to know that there is at least 1 other Punjabi in your dataset that is U5a2a1, ironically in this context those "non-Punjabis" that you've filtered out are more relevant assuming they are also U5a2a1, it would be interesting to know of their background too.

Regarding the Pontic-Caspian Steppe, if you mean proto-Indo-European, very likely I would agree, if you mean Saka/Scythian, I would disagree.

aaronbee2010
11-13-2019, 01:57 PM
Thanks I appreciate the info, on FTDNA there appears to be nobody of Indian origin when it comes to U5a2a1, my father has 57 full sequence matches all with a GD of 3. Just to point out, when it comes to mtDNA it's not particularly useful to compare outside of haplogroups, all those haplogroups you posted haven't shared a common maternal ancestor between each other for over 33,000 years. Even a full sequence mtDNA match with someone (ie within a haplogroup) can represent a 600 year time frame. It is nice to know that there is at least 1 other Punjabi in your dataset that is U5a2a1, ironically in this context those "non-Punjabis" that you've filtered out are more relevant assuming they are also U5a2a1, it would be interesting to know of their background too.

Regarding the Pontic-Caspian Steppe, if you mean proto-Indo-European, very likely I would agree, if you mean Saka/Scythian, I would disagree.

While these groups may have a TMRCA over 33k years, what's also very important to consider is how close together these lineages were and are they "equivalent" to each other i.e. did they come from the same source around the same time?

To give an example, The I2-L704 in Swat Valley IA samples would've come from the same source (PC_Steppe) as the R1a-Z93 found in them. While the TMRCA of these two lineages is around 47200 years (IJK), what matters more is that they're historically equivalent to each other.

Transferring this thought process to mtDNA, you can say that U5, H1, H2, W1 (and others) are equivalent in the sense that they're groups associated with one particular region and timeframe (PC_Steppe). Of course, posting all PC_Steppe samples in a list would take up too much time, so just posting U5's made more sense to me. Evidently, it seems you're more interested in samples specifically along the lines of U5a2a1 (including upstream subclades such as U5a2a, U5a2 etc. - these samples may turn out to be U5a2a1 upon further testing) than with U5 as a whole. As I've said, the subclades themselves are historically equivalent.

The samples I filtered were all from samples with European maternal ancestry, with the exception of one whose maternal grandmother was from Myanmar. In the context of U5's historical presence in Punjab, I don't think those samples are particularly relevant.

And yes, I am referring to PC_Steppe in a PIE context.

solarius
11-13-2019, 03:10 PM
While these groups may have a TMRCA over 33k years, what's also very important to consider is how close together these lineages were and are they "equivalent" to each other i.e. did they come from the same source around the same time?

To give an example, The I2-L704 in Swat Valley IA samples would've come from the same source (PC_Steppe) as the R1a-Z93 found in them. While the TMRCA of these two lineages is around 47200 years (IJK), what matters more is that they're historically equivalent to each other.

Transferring this thought process to mtDNA, you can say that U5, H1, H2, W1 (and others) are equivalent in the sense that they're groups associated with one particular region and timeframe (PC_Steppe). Of course, posting all PC_Steppe samples in a list would take up too much time, so just posting U5's made more sense to me. Evidently, it seems you're more interested in samples specifically along the lines of U5a2a1 (including upstream subclades such as U5a2a, U5a2 etc. - these samples may turn out to be U5a2a1 upon further testing) than with U5 as a whole. As I've said, the subclades themselves are historically equivalent.

The samples I filtered were all from samples with European maternal ancestry, with the exception of one whose maternal grandmother was from Myanmar. In the context of U5's historical presence in Punjab, I don't think those samples are particularly relevant.

And yes, I am referring to PC_Steppe in a PIE context.

This is a cogent post, I assumed you might have been unaware of how these haplogroups were related to each other, apologies for underestimating your knowledge.

Are you familiar with these Black Sea Scythian samples? Mind the sample links, they're missing an "m" ie links should say "mtree" instead of "tree":

https://www.yfull.com/samples-from-paper/179/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339713/

These samples range approximately between 7th century BCE to the 2nd Century BCE and they somewhat correlate to your list.

Regarding my father's subclade, I would be interested in confirming the origin of his mtDNA, YFull had incorrectly dated his haplogroup defining SNPs (not shown on the tree) as being 1200 YBP which led me to pursue a non-PIE origin given the published samples on YFull's mtree, this can't be the case given the 2500 year old Scythian U5a2a1 sample.

aaronbee2010
11-13-2019, 09:18 PM
This is a cogent post, I assumed you might have been unaware of how these haplogroups were related to each other, apologies for underestimating your knowledge.

Are you familiar with these Black Sea Scythian samples? Mind the sample links, they're missing an "m" ie links should say "mtree" instead of "tree":

https://www.yfull.com/samples-from-paper/179/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5339713/

These samples range approximately between 7th century BCE to the 2nd Century BCE and they somewhat correlate to your list.

Regarding my father's subclade, I would be interested in confirming the origin of his mtDNA, YFull had incorrectly dated his haplogroup defining SNPs (not shown on the tree) as being 1200 YBP which led me to pursue a non-PIE origin given the published samples on YFull's mtree, this can't be the case given the 2500 year old Scythian U5a2a1 sample.

The problem is that YFull haven't yet sorted the U5a2a1 samples on their MTree into different subclades. At one point they had done so, but then their tree underwent a lot of revision and they eventually reverted back to the beginning to start again. The age estimations also fluctuated heavily (from 1000 to 10000 years roughly for my subclade, IIRC) but then they took off the age estimation. Your best bet would probably be what you said you were pretty much doing in a previous post, analysing the individual samples under the U5a2a1 MTree (as they appear to be public submissions on GenBank) and forming your own tree, although waiting for YFull is another option.

The PC_Steppe mtDNA in Punjab probably left the aforementioned region around 5000-6000 years ago, if I'm not mistaken. I'm not sure how this relates exactly to those Scythian samples, although their U5 subclades appear to be from the Eurasian Steppe and not South Asia, so the TMRCA of Scythian and Punjabi mtDNA is probably not that much less than the 5000-6000 year range I mentioned, if at all.

solarius
11-13-2019, 11:56 PM
The problem is that YFull haven't yet sorted the U5a2a1 samples on their MTree into different subclades. At one point they had done so, but then their tree underwent a lot of revision and they eventually reverted back to the beginning to start again. The age estimations also fluctuated heavily (from 1000 to 10000 years roughly for my subclade, IIRC) but then they took off the age estimation. Your best bet would probably be what you said you were pretty much doing in a previous post, analysing the individual samples under the U5a2a1 MTree (as they appear to be public submissions on GenBank) and forming your own tree, although waiting for YFull is another option.

The PC_Steppe mtDNA in Punjab probably left the aforementioned region around 5000-6000 years ago, if I'm not mistaken. I'm not sure how this relates exactly to those Scythian samples, although their U5 subclades appear to be from the Eurasian Steppe and not South Asia, so the TMRCA of Scythian and Punjabi mtDNA is probably not that much less than the 5000-6000 year range I mentioned, if at all.

You mentioned a spreadsheet of haplogroups, where is it? Who maintains it? I would like to update it with my data and compare my private mutations with that fellow U5a2a1.

GailT
11-14-2019, 04:53 AM
I've joined the U5 group on YFull and discovered that only one sample there shares a private mutation with his sample: an Italian woman from Tuscany in the HGDP, sample designation HGDP01168, which appears in GenBank as accession no. KF451626. It's the one that shares the C allele:


Oddly enough, the other Indian sample does not share any private mutations with my father's sample. I'm struggling to find the original study that it came from (the sample on GenBank doesn't link to its "unpublished" study). The associated academics for that sample have done other medical genetics studies involving U5 Punjabis/Indians, but I can't find the specific one with this U5a2a1 individual. It would be nice if anybody could find out what background individual P66 came from:

https://www.ncbi.nlm.nih.gov/nuccore/JX462730


The Indian U5a2a1 sample is from a study of people with the LHON mutation, which causes eye disease, so that is one that you don't want to match, and the authors are unlikely to share any specific info about the individuals in that study.

Because mtDNA has a very slow but also highly variable mutation rate, even an exact match can share a common maternal ancestor up to several thousand years ago, or even older in samples with no "recent" mutations, so you may not get very specific info on origins from mtDNA other than IE Steppe several thousand years ago. The Tuscan HGDP01168 sample shares a coding region mutation, A15844G, with the Pamir samples, and the Pamir samples have additional extra mutations. My age estimate for U5a2a1 is about 6000 years, so the A15844G group probably shares a common maternal ancestor between 5000 and 2000 years ago, maybe toward the older end of that range given the wide geographic distribution. If your father has more extra mutations in addition to A15844G, exact matches could be more recently related.
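
To see why an exact match says so little about recency, here is a rough sketch that treats mutations as a Poisson process along both maternal lines. The rate constant is an assumption for illustration only (roughly one mutation per 3,600 years per lineage across the whole mitogenome). It does not come from this thread, and real rates vary by site and by lineage. The names `YEARS_PER_MUTATION` and `p_exact_match` are made up.

```python
import math

# ASSUMPTION: illustrative placeholder, not a value taken from this thread.
YEARS_PER_MUTATION = 3600.0


def p_exact_match(tmrca_years, years_per_mutation=YEARS_PER_MUTATION):
    """Probability that two maternal lines are still identical.

    Both lines accumulate mutations independently since the common
    ancestor, so the expected number of differences is
    2 * tmrca_years / years_per_mutation. Under a Poisson model the
    chance of zero differences is exp(-expected).
    """
    if tmrca_years < 0:
        raise ValueError("tmrca_years must be non-negative")
    expected_differences = 2.0 * tmrca_years / years_per_mutation
    return math.exp(-expected_differences)


if __name__ == "__main__":
    for years in (500, 1000, 2000, 5000):
        print(years, round(p_exact_match(years), 3))
```

Under these assumptions the probability falls off gradually rather than sharply, which is why an exact match alone cannot pin down a date.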

solarius
11-14-2019, 06:32 AM
The Indian U5a2a1 sample is from a study of people with the LHON mutation, which causes eye disease, so that is one that you don't want to match, and the authors are unlikely to share any specific info about the individuals in that study.

Because mtDNA has a very slow but also highly variable mutation rate, even an exact match can share a common maternal ancestor up to several thousand years ago, or even older in samples with no "recent" mutations, so you may not get very specific info on origins from mtDNA other than IE Steppe several thousand years ago. The Tuscan HGDP01168 sample shares a coding region mutation, A15844G, with the Pamir samples, and the Pamir samples have additional extra mutations. My age estimate for U5a2a1 is about 6000 years, so the A15844G group probably shares a common maternal ancestor between 5000 and 2000 years ago, maybe toward the older end of that range given the wide geographic distribution. If your father has more extra mutations in addition to A15844G, exact matches could be more recently related.

My father's sample shares two private mutations with that Tuscan sample at A302AC and T16192C! and also shares A302AC with several other samples within U5a2a1. He doesn't share any private mutations with the Indian sample JX462730.

T16192C! does define the downstream clade U5a2a2 as well.

I ran BLAST again to grab some screenshots. I collated all the U5a2a1 samples from YFull's mtree and Ian Logan's tree as input, 55 samples in total (including my dad's), and I noticed something interesting when I generated the distance trees:

[Attachments: two distance-tree screenshots]

I did one with fast minimum evolution (the first screenshot) and another with neighbour joining (both sorted by distance too). In both I re-rooted the tree to a midpoint from a 3000-year-old confirmed U5a2a1 sample discovered in the Baltic. I then selected the other Indian sample, P66, for reference's sake.

P66 appears to have a much more distant common ancestor with my dad's sample than my dad's sample has with the Tuscan sample and the 3 Pamir samples that cluster nearby. The neighbour joining tree appears to make the most sense, as the Tuscan sample has SNPs in common with both my dad's sample and the Pamir group, i.e. it gives the most likely and simplest path for those mutations to occur. A302AC is also common to all 5 samples.

I'm going to submit both my own mtDNA genome and my dad's to NCBI, pending their acceptance, and hopefully Ian Logan will incorporate these results once I get the accession numbers.

GailT
11-15-2019, 04:23 AM
I'm going to submit both my own mtDNA genome and my dad's to NCBI, pending their acceptance, and hopefully Ian Logan will incorporate these results once I get the accession numbers.

They will only accept samples submitted by the person who was tested, so you might need to submit your dad's using his email, or have him submit if possible. There were even some cases in which they removed samples already in GenBank after finding they were submitted by a family member of the person who was tested.

aaronbee2010
11-16-2019, 01:04 AM
You mentioned a spreadsheet of haplogroups, where is it? Who maintains it? I would like to update it with my data and compare my private mutations with that fellow U5a2a1.

Check your PM please. You only need 2 more posts before you're able to reply to it.

mdn
11-17-2019, 04:52 PM
My father's mtDNA: U5a2a1 (test - DanteLabs 30x, analyzed by mtDNA server, dna.jameslick.com and some other service).
It will be uploaded to YFull once the Y analysis has finished.

Markers found (shown as differences to rCRS):

HVR2: 73G 263G (309.1C) 310.1T (315.1C)
CR: 750G 1438G 2706G 3197C 4769G 7028T 7837C 8860G 9477A 11467G 11719A 12308G 12372A 13617C 13630G 13827G 13928C 14766T 14793G 15326G
HVR1: 16114A 16192T 16256T 16270T 16292T 16294T (16519C) 16526A

Best mtDNA Haplogroup Matches:

1) U5a2a1

Defining Markers for haplogroup U5a2a1:
HVR2: 73G 263G
CR: 750G 1438G 2706G 3197C 4769G 7028T 8860G 9477A 11467G 11719A 12308G 12372A 13617C 13827G 13928C 14766T 14793G 15326G
HVR1: 16114A 16192T 16256T 16270T 16294T 16526A

Marker path from rCRS to haplogroup U5a2a1 (plus extra markers):
H2a2a1(rCRS) ⇨ 263G ⇨ H2a2a ⇨ 8860G 15326G ⇨ H2a2 ⇨ 750G ⇨ H2a ⇨ 4769G ⇨ H2 ⇨ 1438G ⇨ H ⇨ 2706G 7028T ⇨ HV ⇨ 14766T ⇨ R0 ⇨ 73G 11719A ⇨ R ⇨ 11467G 12308G 12372A ⇨ U ⇨ 16192T 16270T ⇨ U5 ⇨ 3197C 9477A 13617C ⇨ U5a'b ⇨ 14793G 16256T ⇨ U5a ⇨ 16526A ⇨ U5a2 ⇨ 16294T ⇨ U5a2(C16294T) ⇨ 16114A ⇨ U5a2a ⇨ 13827G 13928C ⇨ U5a2a1 ⇨ (309.1C) 310.1T (315.1C) 7837C 13630G 16292T (16519C)

Good Match! Your results also had extra markers for this haplogroup:
Matches(26): 73G 263G 750G 1438G 2706G 3197C 4769G 7028T 8860G 9477A 11467G 11719A 12308G 12372A 13617C 13827G 13928C 14766T 14793G 15326G 16114A 16192T 16256T 16270T 16294T 16526A
Extras(4): (309.1C) 310.1T (315.1C) 7837C 13630G 16292T (16519C)
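
For anyone who wants to check the Matches/Extras lines by hand, here is a minimal Python sketch. It is not the tool's own code: it simply treats the "Markers found" and "Defining Markers" lists above as sets. The helper names (`parse_markers`, `position`) are made up for illustration, and it assumes that markers written in parentheses are listed but left out of the Extras count, which is only my reading of the output.

```python
import re

def parse_markers(text):
    """Map each marker to True if it was written in parentheses, else False."""
    markers = {}
    for token in text.split():
        bracketed = token.startswith("(") and token.endswith(")")
        markers[token.strip("()")] = bracketed
    return markers

def position(marker):
    """Numeric position of a marker, e.g. '310.1T' -> 310.1, used for sorting."""
    return float(re.match(r"\d+(?:\.\d+)?", marker).group())

# "Markers found" (HVR2, CR, HVR1), copied from the report above.
found = parse_markers(
    "73G 263G (309.1C) 310.1T (315.1C) "
    "750G 1438G 2706G 3197C 4769G 7028T 7837C 8860G 9477A 11467G 11719A "
    "12308G 12372A 13617C 13630G 13827G 13928C 14766T 14793G 15326G "
    "16114A 16192T 16256T 16270T 16292T 16294T (16519C) 16526A"
)

# "Defining Markers for haplogroup U5a2a1", copied from the report above.
defining = set(parse_markers(
    "73G 263G "
    "750G 1438G 2706G 3197C 4769G 7028T 8860G 9477A 11467G 11719A "
    "12308G 12372A 13617C 13827G 13928C 14766T 14793G 15326G "
    "16114A 16192T 16256T 16270T 16294T 16526A"
))

# Matches: found markers that define the haplogroup. Extras: everything else.
matches = sorted((m for m in found if m in defining), key=position)
extras = sorted((m for m in found if m not in defining), key=position)

# Assumption: parenthesised markers are shown but not counted.
counted_extras = [m for m in extras if not found[m]]

print(f"Matches({len(matches)}):", " ".join(matches))
print(f"Extras({len(counted_extras)}):",
      " ".join(f"({m})" if found[m] else m for m in extras))
```

With the lists above this gives 26 matches and 4 counted extras (310.1T, 7837C, 13630G, 16292T), the same counts as in the report. The counted extras are the private mutations worth comparing against other U5a2a1 samples.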

solarius
11-20-2019, 10:40 AM
Do you plan on uploading your results to YFull?

https://yfull.com/mtree/U5a2a1/

I would also implore you to upload your mtDNA to NCBI's GenBank to further the study of this haplogroup. If you're interested, I could point you in the right direction with some easy-to-follow instructions.

mdn
11-26-2019, 09:06 AM
Do you plan on uploading your results to YFull?
I would also implore you to upload your mtDNA to NCBI's GenBank to further the study of this haplogroup. If you're interested, I could point you in the right direction with some easy-to-follow instructions.
It is YF67337. As I understand it, the mtDNA will only become available after the Y processing has finished (it seems this will take some time).
Regarding NCBI, could you please provide some links or a description? I do not understand how it can be done.

solarius
12-01-2019, 04:19 AM
It is YF67337. As I understand it, the mtDNA will only become available after the Y processing has finished (it seems this will take some time).
Regarding NCBI, could you please provide some links or a description? I do not understand how it can be done.

Ian Logan, a renowned amateur geneticist and retired physician, has a website with a tool that helps you make such submissions (you will need your father's consent, and he will need to send it from his own email if you wish to do this, as NCBI will remove or reject samples submitted on behalf of relatives):

http://www.ianlogan.co.uk/checker/submission_maker.htm

If you do decide to make such an important donation, do inform him once you obtain an accession number from NCBI's GenBank, as he maintains a mitochondrial haplotree on his website.

Good luck and let us know what accession number you get.

I have one for my sample which is due to be publicly available on the 3rd of December.

GailT
12-09-2019, 03:46 AM
Good Match! Your results also had extra markers for this haplogroup:
Extras(4): (309.1C) 310.1T (315.1C) 7837C 13630G 16292T (16519C)

There is a U5a2a1 sample in GenBank, KY670988 (https://www.ncbi.nlm.nih.gov/nuccore/KY670988), that shares two of his extra mutations, 7837C and 16292T. The sample is from the Pskov region of Russia and was published in a research study 9 years ago.

FreeAmin
12-30-2019, 02:02 AM
Is there any new info about U5b2a1a1 and U5b2a1a?

FreeAmin
12-30-2019, 02:08 AM
I'm going to do a WGS test. I need help from someone who knows the U5 haplogroup very well. I'm U5b2a1a1 according to 23andMe, and I'll surely have some new exclusive subclade.

GailT
01-19-2020, 12:40 AM
I'm going to do a WGS test. I need help from someone who knows the U5 haplogroup very well. I'm U5b2a1a1 according to 23andMe, and I'll surely have some new exclusive subclade.

I have an age estimate of about 7000 years for U5b2a1a1, and it is most often found in northern Europe. There are well over 100 full sequence samples of U5b2a1a1 and many subgroups. If you send me your full sequence results, I can tell which subgroup you match.

FreeAmin
02-10-2020, 04:02 PM
I have an age estimate of about 7000 years for U5b2a1a1, and it is most often found in northern Europe. There are well over 100 full sequence samples of U5b2a1a1 and many subgroups. If you send me your full sequence results, I can tell which subgroup you match.

I am waiting for my WGS results. When I have them, I will contact you and Ian Logan and donate my mtDNA to GenBank. I am sure I will have a new exclusive branch.

FreeAmin
02-22-2020, 04:41 PM
I have an age estimate of about 7000 years for U5b2a1a1, and it is most often found in northern Europe. There are well over 100 full sequence samples of U5b2a1a1 and many subgroups. If you send me your full sequence results, I can tell which subgroup you match.
I received my results and sent you the FASTA file. I contacted you by e-mail, since we've been in touch that way in the past: gto******@*mail.com