Converting .VCF file to 23andMe format using Python3



digital_noise
08-06-2019, 04:44 PM
I did a Nebula Genomics test a while ago and am trying to make some use out of the files they provide.

Does anyone know how to convert a .VCF file to 23andMe format using Python3? I am at my wits' end trying to glean any value from this (worthless) test. I can successfully convert the file with DNA Kit Studio, but when I upload the result to GEDmatch, I get the following error:

Notice: Undefined index: u in /var/www/genesis-v1.0.0/html/v_upload2log.phpnf on line 23

Notice: Undefined index: name in /var/www/genesis-v1.0.0/html/v_upload2log.phpnf on line 24

Notice: Undefined index: name2 in /var/www/genesis-v1.0.0/html/v_upload2log.phpnf on line 25

Notice: Undefined index: email in /var/www/genesis-v1.0.0/html/v_upload2log.phpnf on line 26

Notice: Undefined index: auth in /var/www/genesis-v1.0.0/html/v_upload2log.phpnf on line 27
You must agree that you are authorized to upload this file.


I used an older version of DNA Kit Studio and managed to upload the file, but in the end I see this:
File does not contain the correct number of SNPs. The SNP count must be between 50000 and 10000000 SNPs.
Your file had 27550498 SNPs.

Any ideas?
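
For reference, here is a minimal Python 3 sketch of the kind of conversion being asked about. It is not what DNA Kit Studio does internally and it has not been tested against GEDmatch. It assumes a single-sample VCF with rsIDs in the ID column, reads only the first sample's GT field, skips indels and records without an rsID, and passes positions through unchanged (no genome build conversion). The names `vcf_to_23andme`, `convert` and `keep_rsids` are made up for illustration. Passing a set of rsIDs (for example, those found on a genotyping chip) as `keep_rsids` is one possible way to bring a 27 million SNP file under an upload limit.

```python
import gzip


def vcf_to_23andme(lines, keep_rsids=None):
    """Yield tab-separated rows: rsid, chromosome, position, genotype.

    lines      -- iterable of VCF text lines (first sample column is used)
    keep_rsids -- optional set of rsIDs to keep (hypothetical filter to
                  reduce the SNP count); None keeps every usable record
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip VCF headers and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns, so no genotype
        chrom, pos, rsid, ref, alt = fields[:5]
        rsid = rsid.split(";")[0]  # ID may hold several identifiers
        if not rsid.startswith("rs"):
            continue  # the 23andMe layout is keyed by rsID
        if keep_rsids is not None and rsid not in keep_rsids:
            continue
        alleles = [ref] + ([] if alt == "." else alt.split(","))
        if any(a not in ("A", "C", "G", "T") for a in alleles):
            continue  # keep single-base variants only
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt = fields[9].split(":")[keys.index("GT")]
        try:
            # GT holds allele indexes: 0 = REF, 1 = first ALT, and so on
            genotype = "".join(
                alleles[int(i)] for i in gt.replace("|", "/").split("/")
            )
        except (ValueError, IndexError):
            genotype = "--"  # no-call, e.g. "./."
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        if chrom == "M":
            chrom = "MT"
        yield "\t".join((rsid, chrom, pos, genotype))


def convert(vcf_path, out_path, keep_rsids=None):
    """Write a 23andMe-style text file and return the number of rows."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    count = 0
    with opener(vcf_path, "rt") as src, open(out_path, "w") as dst:
        dst.write("# rsid\tchromosome\tposition\tgenotype\n")
        for row in vcf_to_23andme(src, keep_rsids):
            dst.write(row + "\n")
            count += 1
    return count
```

`convert` returns the number of rows written, so the result can be checked against the SNP count range quoted in the error message before uploading.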

The Nebula test is worthless. I have read that the low coverage makes it borderline useless. I don't really care about that; I am more interested in genealogy and ancestry, and the ethnicity estimate isn't awful, but it's exactly what we got for free with Gencove before they stopped accepting uploads.

I'd like to get some value out of this, but Nebula seems to want to make that hard or even impossible.

Here is their page on this:
https://blog.nebula.org/how-to-start-exploring-your-raw-genomic-data/

Jorge Caballero
08-13-2019, 10:34 PM
Try DNA Kit Studio from simple tools.

anglesqueville
08-14-2019, 09:22 AM
More than 27 million SNPs for less than $200 is interesting. I hope they soon sell their services outside the USA. That said, if you are interested in ancestry you will likely never use all 27 million. The only modern detailed reference sample I know of that uses so many SNPs (even more, if I remember well) is Pagani, which is essentially useful for East Eurasian ancestry. 1000 Genomes uses around 40 million, but its European regionalisation is very sketchy. AFAIK all the others, and in particular the Harvard samples, use far fewer SNPs. As for the VCF-to-txt conversion, I know nothing about DNA Kit Studio; I have always used vcftools and PLINK. Would you mind posting a copy of your ancestry report? I'm curious about what they do with 27 million SNPs.

digital_noise
08-14-2019, 05:39 PM
I should clarify my statement. I think it's low-pass sequencing, and I have read in multiple places that the data is almost worthless. So for medical use etc., hard pass. The ancestry portion is not wrong overall, but the percentages are a bit mixed up, mostly the Scandinavian being 10% too high.

On paper I am:
50% English (likely 6% or so French as well in there)
25% Italian
12.5% Swedish
12.5% Dutch/German

What I have noticed is that the 50% English is rarely identified as actual English. Usually that percentage is lower and the Germanic is higher. Who knows what is correct.