
View Full Version : GenBank Submissions



Humanist
06-20-2014, 02:53 AM
Funny, that 16311C is one of the terminal SNPs for M30b so you have more matches for M30b than any other haplogroup but two mismatches with it.

Try submitting your sample here: http://www.ianlogan.co.uk/Submission.htm

Does Sein have a mtDNA FGS? Or is he referring to his 23andMe data?

Dr_McNinja
06-20-2014, 02:56 AM
Does Sein have a mtDNA FGS? Or is he referring to his 23andMe data?

Oh whoops, forgot he had 23andMe and not FTDNA. If you have an mt Full Sequence test from FTDNA you can use that.

Humanist
06-20-2014, 03:01 AM
Oh whoops, forgot he had 23andMe and not FTDNA. If you have an mt Full Sequence test from FTDNA you can use that.

Do you have a FGS? If so, do you have a link to your sample's page at GenBank? I am here (http://www.ncbi.nlm.nih.gov/nuccore/296801916). I used to have a link to it in my signature page on the old DNA Forums, and had kind of forgotten about it. Glad I submitted my sample, though, as it was included in a study of HV4 a couple of years ago.

Dr_McNinja
06-20-2014, 03:09 AM
Do you have a FGS? If so, do you have a link to your sample's page at GenBank? I am here (http://www.ncbi.nlm.nih.gov/nuccore/296801916). I used to have a link to it in my signature page on the old DNA Forums, and had kind of forgotten about it. Glad I submitted my sample, though, as it was included in a study of HV4 a couple of years ago.

No, not yet, I just e-mailed mine today.

Humanist
06-20-2014, 03:19 AM
No, not yet, I just e-mailed mine today.

Awesome! Congrats.

Dr_McNinja
06-20-2014, 02:57 PM
I added my rCRS results in here as number 14: http://i.imgur.com/UxD9rys.png I'm guessing that's how it'll look once they add me. I seem to have the most overlap with the one above me (#13) but they have overlap with the others. Any idea on how to find out where these samples were from?

I noticed some of the values in RSRS results don't match rCRS. My mismatched C16311T for example is only in RSRS and not in rCRS while the other M30b people have T16311C presumably from their rCRS tables.

Here's the e-mail I got from Ian summarizing my mutations from the rCRS table:


Punjabi Jatt

USA

Haplogroup M30b

Simplified CRS mutations (42)

73G 152C 195A 263G 315.1C 469T 489C 522- 523- 750G
1284Y 1438G 2706G 4769G 5147A 5529G 5899.1C 7028T 7647Y 8701G
8860G 9540C 10398G 10400T 10873C 11017C 11719A 11914A 12007A 12705T
14766T 14783C 14870G 15043A 15301A 15326G 15431A 16093C 16223T 16266T
16278T 16519C

CRS mutations as given by sqn (42)

A73G T152C T195A A263G 315.1C C469T T489C C522- A523- A750G
T1284Y A1438G A2706G A4769G G5147A A5529G 5899.1C C7028T T7647Y A8701G
A8860G T9540C A10398G C10400T T10873C T11017C G11719A G11914A G12007A C12705T
C14766T T14783C A14870G G15043A G15301A A15326G G15431A T16093C C16223T C16266T
C16278T T16519C

--------------

CHECK: CRS mutations as given by FASTA file supplied (42)

A73G T152C T195A A263G 315.1C C469T T489C C522- A523- A750G
T1284Y A1438G A2706G A4769G G5147A A5529G 5899.1C C7028T T7647Y A8701G
A8860G T9540C A10398G C10400T T10873C T11017C G11719A G11914A G12007A C12705T
C14766T T14783C A14870G G15043A G15301A A15326G G15431A T16093C C16223T C16266T
C16278T T16519C

-----------------

A simple breakdown of the CRS mutations (42)

A73G - mutation in HVR2 area
T152C - mutation in HVR2 area
T195A - mutation in HVR2 area
A263G - mutation in HVR2 area
315.1C - insertion in HVR2 area
C469T - mutation in HVR2 area
T489C - mutation in HVR2 area
C522- - deletion in HVR2 area
A523- - deletion in HVR2 area
A750G - mutation in 12S rRNA gene (MT-RNR1) . . . . . . . Common mutation near CRS
T1284Y - mutation in 12S rRNA gene (MT-RNR1)
A1438G - mutation in 12S rRNA gene (MT-RNR1) . . . . . . . Common mutation near CRS
A2706G - mutation in 16S rRNA gene (MT-RNR2) . . . . . . . Common mutation near CRS
A4769G - mutation in peptide NAD2 (MT-ND2) synonymous - no change in amino acid
G5147A - mutation in peptide NAD2 (MT-ND2) synonymous - no change in amino acid
A5529G - mutation in tRNA for Tryptophan (MT-TW)
5899.1C - insertion in a non-coding area
C7028T - mutation in peptide COX1 (MT-CO1) synonymous - no change in amino acid
T7647Y - mutation in peptide COX2 (MT-CO2) Heteroplasmy T7647C is non-synonymous 'I21T'
A8701G - mutation in peptide ATP6 (MT-ATP6) non-synonymous 'T59A' Defines Haplogroup M
A8860G - mutation in peptide ATP6 (MT-ATP6) non-synonymous 'T112A' Common mutation near CRS
T9540C - mutation in peptide COX3 (MT-CO3) synonymous - no change in amino acid
A10398G - mutation in peptide NAD3 (MT-ND3) non-synonymous 'T114A' Defines Haplogroup K1, M & L
C10400T - mutation in peptide NAD3 (MT-ND3) synonymous - no change in amino acid
T10873C - mutation in peptide NAD4 (MT-ND4) synonymous - no change in amino acid
T11017C - mutation in peptide NAD4 (MT-ND4) synonymous - no change in amino acid
G11719A - mutation in peptide NAD4 (MT-ND4) synonymous - no change in amino acid
G11914A - mutation in peptide NAD4 (MT-ND4) synonymous - no change in amino acid
G12007A - mutation in peptide NAD4 (MT-ND4) synonymous - no change in amino acid
C12705T - mutation in peptide NAD5 (MT-ND5) synonymous - no change in amino acid
C14766T - mutation in peptide CYTB (MT-CYB ) non-synonymous 'T7I' . Common mutation near CRS
T14783C - mutation in peptide CYTB (MT-CYB ) synonymous - no change in amino acid
A14870G - mutation in peptide CYTB (MT-CYB ) non-synonymous 'I42V' Defines Haplogroup M27c
G15043A - mutation in peptide CYTB (MT-CYB ) synonymous - no change in amino acid
G15301A - mutation in peptide CYTB (MT-CYB ) synonymous - no change in amino acid
A15326G - mutation in peptide CYTB (MT-CYB ) non-synonymous 'T194A' Common mutation near CRS
G15431A - mutation in peptide CYTB (MT-CYB ) non-synonymous 'A229T' Defines Haplogroup U6b1 & M30b
T16093C - mutation in HVR1 area
C16223T - mutation in HVR1 area
C16266T - mutation in HVR1 area
C16278T - mutation in HVR1 area
T16519C - mutation in HVR1 area


And matches fairly well (because of 'A5529G' & 'T11017C') with sequence :

JQ702987 Behar Haplogroup M30b 07-APR-2012
44.1C A73G T152C T195A A263G 309.1C 315.1C T489C C522- A523-
A750G A1438G A2706G A4769G G5147A A5529G 5899.1C C7028T A8701G A8860G
T8937C T9540C A10398G C10400T T10873C T11017C G11719A G12007A C12705T C14766T
T14783C G15043A G15301A A15326G G15431A C16223T C16278T T16304C T16311C G16391A
T16519C

See also:
http://www.ianlogan.co.uk/sequences_by_group/m30_genbank_sequences.htm
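For anyone who wants to repeat the comparison with JQ702987 by hand, one rough approach is to treat each mutation list as a set and look at the overlap. This is only a sketch: the helper names (`parse_mutations`, `position`) are made up for illustration, and it compares the tokens exactly as written in the rCRS lists above, so a heteroplasmy call such as T1284Y simply shows up as a difference.

```python
import re

def parse_mutations(text):
    """Split a whitespace-separated mutation list into a set of tokens."""
    return set(text.split())

def position(token):
    """Return the first run of digits in a token, e.g. 'T16311C' -> 16311."""
    return int(re.search(r"\d+", token).group())

# Mutation list from the e-mail above (rCRS notation).
mine = parse_mutations("""
A73G T152C T195A A263G 315.1C C469T T489C C522- A523- A750G
T1284Y A1438G A2706G A4769G G5147A A5529G 5899.1C C7028T T7647Y A8701G
A8860G T9540C A10398G C10400T T10873C T11017C G11719A G11914A G12007A C12705T
C14766T T14783C A14870G G15043A G15301A A15326G G15431A T16093C C16223T C16266T
C16278T T16519C
""")

# Mutation list given above for GenBank sequence JQ702987.
theirs = parse_mutations("""
44.1C A73G T152C T195A A263G 309.1C 315.1C T489C C522- A523-
A750G A1438G A2706G A4769G G5147A A5529G 5899.1C C7028T A8701G A8860G
T8937C T9540C A10398G C10400T T10873C T11017C G11719A G12007A C12705T C14766T
T14783C G15043A G15301A A15326G G15431A C16223T C16278T T16304C T16311C G16391A
T16519C
""")

shared = sorted(mine & theirs, key=position)
only_mine = sorted(mine - theirs, key=position)
only_theirs = sorted(theirs - mine, key=position)

print(len(shared), "shared")
print("only in my list:  ", " ".join(only_mine))
print("only in JQ702987: ", " ".join(only_theirs))
```

The T16311C discussed in this thread lands in the "only in JQ702987" group, alongside the other differences between the two lists.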

parasar
06-21-2014, 03:41 PM
I added my rCRS results in here as number 14: http://i.imgur.com/UxD9rys.png I'm guessing that's how it'll look once they add me. I seem to have the most overlap with the one above me (#13) but they have overlap with the others. Any idea on how to find out where these samples were from?

I noticed some of the values in RSRS results don't match rCRS. My mismatched C16311T for example is only in RSRS and not in rCRS while the other M30b people have T16311C presumably from their rCRS tables.

Here's the e-mail I got from Ian summarizing my mutations from the rCRS table:

The references are different. rCRS is referenced to a modern European who already had many mutations that were not ancestral.
So that European person already had the C16311T mutation and therefore in reference to that you are not showing it.

Since you are M and have RSRS C16311T and the rCRS reference was an N derivative H and also had RSRS C16311T, this is a very upstream mutation.
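To make the reference point concrete, here is a minimal sketch of how the same base at 16311 gets reported differently depending on the reference. The one-position "references" below are an assumption read off the notation used in this thread (T16311C against rCRS, C16311T against RSRS), not the full reference sequences, and `call_variant` is a made-up helper name.

```python
# Base at position 16311 in each reference, as implied by the notation
# in this thread (assumption: rCRS has T, RSRS has C at this position).
RCRS = {16311: "T"}
RSRS = {16311: "C"}

def call_variant(position, sample_base, reference):
    """Report a difference from the reference, or None if the bases match."""
    ref_base = reference[position]
    if sample_base == ref_base:
        return None
    return f"{ref_base}{position}{sample_base}"

# A sample carrying T at 16311 matches rCRS but differs from RSRS.
print(call_variant(16311, "T", RCRS))
print(call_variant(16311, "T", RSRS))

# A sample carrying C at 16311 differs from rCRS but matches RSRS.
print(call_variant(16311, "C", RCRS))
print(call_variant(16311, "C", RSRS))
```

So a "C16311T" in an RSRS table and the absence of "T16311C" in an rCRS table describe the same base in the sample.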

Dr_McNinja
06-21-2014, 05:04 PM
The references are different. rCRS is referenced to a modern European who already had many mutations that were not ancestral.
So that European person already had the C16311T mutation and therefore in reference to that you are not showing it.

Since you are M and have RSRS C16311T and the rCRS reference was an N derivative H and also had RSRS C16311T, this is a very upstream mutation.

Yeah, Ian said T16311C was a recent mutation:


Yes, there is the 16311 difference. A common mutation, so not a very important one.
Maybe it has arisen in the last 400 years, or so.

So you think it mutated in different populations around the same time recently? There are M4'67 derivatives that would have to have been in the Pakistan/NW-India region for at least several centuries which have it while there are other branches in Southeast India that also have it.

parasar
06-21-2014, 05:16 PM
Yeah, Ian said T16311C was a recent mutation:



So you think it mutated in different populations around the same time recently? There are M4'67 derivatives that would have to have been in the Pakistan/NW-India region for at least several centuries which have it while there are other branches in Southeast India that also have it.

If this is correct - "other M30b people have T16311C presumably from their rCRS" - then it means it has back mutated in these people.

Humanist
07-16-2014, 08:02 PM
The post below by MfA reminded me of the above exchange between Dr. McNinja and me. It appears that you can have your data listed on GenBank even if you have not had a mtDNA FGS. Apparently, though, you need to have your data uploaded at OpenSNP.org. I checked the HV page and found that I am listed twice: first as my mtDNA FGS sample, and again as "Paul G."



Ian Logan was very kind to add me to GenBank.
http://www.ianlogan.co.uk/sequences_by_group/j1b_genbank_sequences.htm

For those who are interested, all you need to do is upload your 23andMe raw data to OpenSNP.org (https://opensnp.org)

dp
07-18-2014, 09:03 PM
There are many partial sequences in GenBank, such as the following from the mitochondrial control region:
>ENA|FJ719801|FJ719801.1 Homo sapiens isolate XH119 D-loop, partial sequence; mitochondrial.
GGGTACCACCCAAGTATTGACTCACCCATCAACAACCGCTATGTATCTCGTACATTACTG
CCAGCCACCATGAATATTGTACGGTACTATAAATACTTGACCACCTGTAGTACATAAAAA
CCCAATCCACATCAAAACCCCCTCCCCATGCTTACAAGCAAGTACAGCAATCAACCCCCA
ACTATCACACATCAACTGCAACTCCAAAGCCACCCCTCACCCACTAGGATACCAACAAAC
CTACCCACCCTTAACAGTACATAGCACATAAAGCCATTTACCGTACATAGCACATTACAG
TCAAATCCCTTCTCGTCCCCATGGATGACCCCCCTCAGATAGGAG
Have they stopped allowing submissions like the above?
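Sequences pasted into a forum can pick up stray spaces or line breaks, so it may be worth tidying a FASTA record before doing anything with it. Below is a minimal sketch, not any official tool: `read_fasta` is a made-up helper, and the example record is a hypothetical one that reuses only the first line of the sequence above.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}, dropping whitespace inside sequences."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Remove any spaces inside the line and normalise to upper case.
            records[header] += "".join(line.split()).upper()
    return records

# Hypothetical record: first line of the sequence above, with a stray space.
example = """>example partial D-loop
GGGTACCACCCAAGTATTGACTCACCCATCAACAACCGCTATGTATCTCG TACATTACTG
"""

IUPAC = set("ACGTRYKMSWBDHVN-")

for name, seq in read_fasta(example).items():
    unexpected = sorted(set(seq) - IUPAC)
    print(name, len(seq), "bases; unexpected characters:", unexpected)
```

The IUPAC set includes ambiguity codes such as Y, which is how a heteroplasmy is usually written in these files.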

My submission of my complete mitogenome sequence .fasta file was in GenBank awaiting review during the government shutdown. That kept me from getting into a comprehensive study that looked at all K mitogenomes. Fortunately, they did have 3 sequences from my haplotype (K1a4a1c), so at least the group was represented and correlated in the coalescence estimates. I've posted under the K1a4 thread trying to get K's with T199C to have the complete sequence taken, to try to get other K1a4a1c's identified. So far, no dice; maybe I should have posted it in the K thread instead.
David Powell
dp :-)

Huntergatherer1066
01-14-2015, 01:27 AM
I just submitted my FMS fasta to GenBank. The first time around I accidentally attached the wrong file to the email. Let's see if they accept it this time; I was using these instructions: http://www.ianlogan.co.uk/checker/submission_maker.htm

Little bit
01-14-2015, 05:19 PM
I submitted my FMS in 2013 using Ian Logan's DIY tool and it went without a hitch. You get some spooky sounding questions from the person taking the submission, which basically prod you to say you are sure you know what you are doing and really want to do it. I'm very happy I did and see it as a nice legacy to my J1c3i ancestors which may help my J1c3i descendants (and other more distant relatives):
http://www.ncbi.nlm.nih.gov/nuccore/KF600657

Good luck!

vettor
01-14-2015, 06:06 PM
I submitted my FMS in 2013 using Ian Logan's DIY tool and it went without a hitch. You get some spooky sounding questions from the person taking the submission, which basically prod you to say you are sure you know what you are doing and really want to do it. I'm very happy I did and see it as a nice legacy to my J1c3i ancestors which may help my J1c3i descendants (and other more distant relatives):
http://www.ncbi.nlm.nih.gov/nuccore/KF600657

Good luck!

I did mine in early 2013, and the main issue is that your data is available for all studies to use forever. You cannot stop this once you commit to GenBank.

But apart from some correspondence from Ian, I never heard anything from anyone.

dp
01-14-2015, 07:20 PM
I just submitted my FMS fasta to GenBank. The first time around I accidentally attached the wrong file to the email. Let's see if they accept it this time; I was using these instructions: http://www.ianlogan.co.uk/checker/submission_maker.htm

If a mistake persists, after the sequence is added, try: http://www.ncbi.nlm.nih.gov/Genbank/update.html
It gave a link which I used to change/update my ethnicity from just plain Caucasian.
dp :-)

Huntergatherer1066
01-14-2015, 11:28 PM
If a mistake persists, after the sequence is added, try: http://www.ncbi.nlm.nih.gov/Genbank/update.html
It gave a link which I used to change/update my ethnicity from just plain Caucasian.
dp :-)

Thanks for the tip. So far so good though. The first time around I accidentally attached the .txt version of the fasta instead of the .sqn version.

Petr
01-15-2015, 12:30 AM
Does anybody know if this way of mtDNA submission to NCBI still works?

https://my.familytreedna.com/personal-survey.aspx

vettor
01-15-2015, 12:59 AM
Does anybody know if this way of mtDNA submission to NCBI still works?

https://my.familytreedna.com/personal-survey.aspx

For me, that way did not work.

I got access via Ian Logan's site and also got help from James Lick.

BTW, it took 6 months for mine to be done.

Huntergatherer1066
01-24-2015, 12:24 AM
My submission is searchable on GenBank as of today, so I guess that makes for a 10-day turnaround for anyone keeping track.

Huntergatherer1066
04-24-2015, 04:50 PM
This is more just to keep track for myself, but I just submitted my father's FMS. He is an H13a1a1a, so much more common than myself, but he has no exact matches in the FTDNA database so his sequence may still be useful.

Baltimore1937
04-24-2015, 06:45 PM
PhyloTree uses me as an example on their chart, for all to see. So I hope FTDNA did an accurate test for me (and for science).