Thread: WHY HASN'T NATIONAL GEOGRAPHIC/HELIX DONE ANYTHING ABOUT RELEASING RAW DATA FILES?

  1. #1
    Junior Member
    Posts: 5

    WHY HASN'T NATIONAL GEOGRAPHIC/HELIX DONE ANYTHING ABOUT RELEASING RAW DATA FILES?
    They first said in December 2016 that they were working on a way to release the raw data files. Any news since then?
    It looks like once the Genographic Project was completed and Dr. Wells left, they couldn't care less about it.

  2. #2
    Junior Member
    Posts: 1

    Quote, originally posted by WilliamBruce:
    WHY HASN'T NATIONAL GEOGRAPHIC/HELIX DONE ANYTHING ABOUT RELEASING RAW DATA FILES?
    They first said in December 2016 that they were working on a way to release the raw data files. Any news since then?
    It looks like once the Genographic Project was completed and Dr. Wells left, they couldn't care less about it.

    I have been wondering the same thing. I inquired and got a response on July 31, 2017, saying that they were in the testing stage, hoped it would be available within the next month, and would email me when the option was available.

  3. #3
    Junior Member
    Posts: 5

    Quote, originally posted by chipmunk226:
    I have been wondering the same thing. I inquired and got a response on July 31, 2017, saying that they were in the testing stage, hoped it would be available within the next month, and would email me when the option was available.

    Thanks. I inquired twice and all they could tell me was that they were currently working on it. It was the same response I received before I purchased Geno 2.0 in February.

  4. #4
    Registered Users
    Posts: 1,250
    Location: USA
    Ethnicity: Latvian
    Nationality: USA
    Y-DNA: R-L20
    mtDNA: k1a4a1

    I wonder what happened to the promised expanded-regions update for non-Helix users. People who bought the non-Helix Geno 2.0 NG were promised updated results with expanded regions, yet only Helix users have received them.

  5. #5
    Registered Users
    Posts: 1,203
    Location: Glasgow, Scotland
    Ethnicity: Pictland/Deira
    Y-DNA: R1b-M222-FGC5864
    mtDNA: H5r*

    They seem to have made a somewhat vague claim that it's coming 'later this year':

    "Downloads will be available later this year. Exact cost and timing will be announced in the coming weeks. If you'd like to be added to a list for notification when this feature becomes available, let us know at https://support.helix.com/hc/en-us/requests/new"

    https://twitter.com/my_helix/status/928679680219136000
    YSEQ:#37; YFull: YF01405 (Y Elite 2013)
    WGS (Full Genomes Nov 2015, YSEQ Feb 2019, Dante Mar 2019, FGC-10X Linked Reads Apr 2019, Dante-Nanopore May 2019) - further WGS tests pending ;-)
    Ancestry GCs: Scots in central Scotland & Ulster, Ireland; English in Yorkshire & Pennines
    FBIMatch: A828783 (autosomal DNA) for segment matching DO NOT POST ADMIXTURE REPORTS USING MY KIT

  6. The Following User Says Thank You to MacUalraig For This Useful Post:

     Robert1 (11-27-2017)

  7. #6
    Junior Member
    Posts: 5

    Geno 2.0 has finally released downloadable files. However, they are incomplete and do not work on GEDmatch or GEDmatch Genesis.

  8. The Following User Says Thank You to WilliamBruce For This Useful Post:

     Robert1 (12-06-2017)

  9. #7
    Registered Users
    Posts: 210

    Quote, originally posted by WilliamBruce:
    Geno 2.0 has finally released downloadable files. However, they are incomplete and do not work on GEDmatch or GEDmatch Genesis.

    Could you describe the content a little more? Or, if you don't mind sending me the raw data, I could take a look at the file format. My email is DNACousins@gmail.com.

  10. The Following User Says Thank You to Ann Turner For This Useful Post:

     Robert1 (12-06-2017)

  11. #8
    Registered Users
    Posts: 1,250
    Location: USA
    Ethnicity: Latvian
    Nationality: USA
    Y-DNA: R-L20
    mtDNA: k1a4a1

    Is it really true that the raw download includes full mtDNA? On another forum, someone claimed it gave 16,000+ data points for mtDNA (7% no-calls). So is it really a poor man's full mtDNA test? Or did they make a mistake, seeing a position numbered around 16,000 and not noticing tons of missing positions along the way?

  12. The Following User Says Thank You to wombatofthenorth For This Useful Post:

     Robert1 (12-06-2017)

  13. #9
    Registered Users
    Posts: 210

    Quote, originally posted by wombatofthenorth:
    Is it really true that the raw download includes full mtDNA? On another forum, someone claimed it gave 16,000+ data points for mtDNA (7% no-calls). So is it really a poor man's full mtDNA test? Or did they make a mistake, seeing a position numbered around 16,000 and not noticing tons of missing positions along the way?

    I'm now analyzing the files William Bruce graciously sent me. The mtDNA data does have slots for all 16,569 positions, but with a very high no-call rate, and the no-calls seem to run in rather long stretches (1,000+ bases). If anyone can send me another mtDNA file, I could check whether the no-call positions are consistent. DNACousins@gmail.com.
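    A minimal sketch of that check, assuming the mtDNA calls have been loaded into a dict keyed by position (1 to 16,569, the length of the revised Cambridge Reference Sequence) and that a no-call is written as "--" or "-" or is simply absent. The function names and the 1,000-base default are illustrative choices, not part of any Helix or GEDmatch tool.

```python
from typing import Dict, List, Tuple

MT_LENGTH = 16569  # number of positions in the rCRS mtDNA reference


def is_no_call(value: str) -> bool:
    """Treat empty strings and runs of dashes as no-calls (assumed encoding)."""
    return value.strip("-") == ""


def no_call_runs(calls: Dict[int, str], min_len: int = 1000) -> List[Tuple[int, int]]:
    """Return (start, end) stretches of consecutive no-calls, 1-based and inclusive,
    keeping only stretches at least min_len bases long."""
    runs = []
    start = None
    for pos in range(1, MT_LENGTH + 1):
        missing = is_no_call(calls.get(pos, ""))
        if missing and start is None:
            start = pos  # a new stretch of no-calls begins here
        elif not missing and start is not None:
            if pos - start >= min_len:
                runs.append((start, pos - 1))
            start = None
    # close a stretch that runs to the end of the molecule
    if start is not None and MT_LENGTH - start + 1 >= min_len:
        runs.append((start, MT_LENGTH))
    return runs


def mt_no_call_rate(calls: Dict[int, str]) -> float:
    """Fraction of the 16,569 positions that are no-calls or absent."""
    missing = sum(is_no_call(calls.get(pos, "")) for pos in range(1, MT_LENGTH + 1))
    return missing / MT_LENGTH
```

    Running no_call_runs on two kits and comparing the returned stretches would show whether the same regions drop out in both files.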

  14. The Following 2 Users Say Thank You to Ann Turner For This Useful Post:

     Robert1 (11-28-2017),  wombatofthenorth (11-27-2017)

  15. #10
    Registered Users
    Posts: 210

    Thanks to William Bruce for sending me his raw data from the Helix version of the Genographic Project.

    He was unable to upload the file to GEDmatch because of its format: it is sorted by rsid instead of by chromosome and position, it lacks a position column, and it puts the two alleles in separate columns. I used Excel's VLOOKUP function against a 23andMe v3 template to look up each SNP's position, concatenated allele1 and allele2 into a single genotype column, rearranged the columns, and sorted by chromosome and position. That was enough for GEDmatch to recognize the file as 23andMe format. There is no X data at all; GEDmatch noted this but still accepted the file.
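    The same fix could be scripted instead of done in Excel. This is a minimal sketch, assuming a tab-separated Helix export with hypothetical column names rsid, allele1 and allele2 (the real headers may differ) and a 23andMe-style template whose data rows are rsid, chromosome, position and genotype. The dictionary lookup plays the role of VLOOKUP; rsids missing from the template are dropped because their position is unknown.

```python
import csv
from typing import Dict, Iterable, Tuple

# Numeric chromosome order so that "10" sorts after "2", with X, Y, MT last.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def load_template(lines: Iterable[str]) -> Dict[str, Tuple[str, int]]:
    """Map rsid -> (chromosome, position) from a 23andMe-style file."""
    table = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blanks
        rsid, chrom, pos, _genotype = line.rstrip("\n").split("\t")
        table[rsid] = (chrom, int(pos))
    return table


def helix_to_23andme(helix_lines: Iterable[str], template: Dict[str, Tuple[str, int]]) -> str:
    """Convert rows with hypothetical columns rsid/allele1/allele2 to 23andMe layout."""
    rows = []
    for rec in csv.DictReader(helix_lines, delimiter="\t"):
        location = template.get(rec["rsid"])
        if location is None:
            continue  # the VLOOKUP equivalent found nothing: position unknown
        chrom, pos = location
        a1, a2 = rec["allele1"].strip(), rec["allele2"].strip()
        # concatenate the two alleles; 23andMe files write a no-call as "--"
        genotype = a1 + a2 if a1 and a2 and "-" not in a1 + a2 else "--"
        rows.append((rec["rsid"], chrom, pos, genotype))
    rows.sort(key=lambda r: (CHROM_ORDER.get(r[1], 99), r[2]))
    out = ["# rsid\tchromosome\tposition\tgenotype"]
    out += [f"{rsid}\t{chrom}\t{pos}\t{gt}" for rsid, chrom, pos, gt in rows]
    return "\n".join(out) + "\n"
```

    Sorting on a numeric chromosome key matters because a plain text sort would put chromosome 10 between 1 and 2.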

    General observations:

    The genotype distribution is unremarkable. SNPs whose two alleles are complementary base pairs (A/T or C/G) are avoided. Transitions (A<->G or C<->T) are more common than transversions.
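    A small sketch of how those proportions could be tallied from a single kit, assuming only heterozygous calls are used, since a homozygous call such as "AA" does not reveal the SNP's second allele. Strictly, A/T and C/G SNPs are transversions too; the sketch counts them separately as "ambiguous" because their strand cannot be told from the alleles alone.

```python
from collections import Counter
from typing import Iterable

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_snp(a: str, b: str) -> str:
    """Label a biallelic SNP by its two alleles."""
    if COMPLEMENT[a] == b:
        return "ambiguous"  # A/T or C/G: strand cannot be inferred
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tally_heterozygous(genotypes: Iterable[str]) -> Counter:
    """Count SNP classes using heterozygous calls, which show both alleles."""
    counts: Counter = Counter()
    for gt in genotypes:
        if len(gt) == 2 and gt[0] != gt[1] and set(gt) <= set(COMPLEMENT):
            counts[classify_snp(gt[0], gt[1])] += 1
    return counts
```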

    The no-call rate is quite high at 13%. Genealogy companies aim for no more than 3% and often achieve much better than that. It would be useful to know whether the no-calls follow a consistent pattern.
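    One way to put numbers on this, sketched under the assumption that each kit is loaded as a dict mapping rsid to genotype and that "--", "-" or an empty string marks a no-call: compute the no-call rate per kit, then the Jaccard index (shared no-calls divided by all no-calls) over the SNPs both kits contain. A value near 1 would mean the same SNPs fail in every kit; a low value would point to scattered, kit-specific failures.

```python
from typing import Dict, Set

NO_CALL = {"--", "-", ""}  # assumed no-call encodings


def no_calls(genotypes: Dict[str, str]) -> Set[str]:
    """rsids whose genotype is a no-call."""
    return {rsid for rsid, gt in genotypes.items() if gt in NO_CALL}


def no_call_rate(genotypes: Dict[str, str]) -> float:
    """Fraction of SNPs in the file that are no-calls."""
    return len(no_calls(genotypes)) / len(genotypes)


def no_call_jaccard(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Jaccard index of no-call rsids, restricted to SNPs present in both files."""
    shared = a.keys() & b.keys()
    na = no_calls(a) & shared
    nb = no_calls(b) & shared
    union = na | nb
    return len(na & nb) / len(union) if union else 1.0
```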

    The overall homozygosity is 60.2%, lower than in the SNP selections for 23andMe v3 (70.5%) or LivingDNA (83%). Homozygosity increases as more SNPs with rare alleles are added, since most people will carry two copies of the more common allele. The Genographic Project may have chosen SNPs that are fairly common globally but have different frequency distributions in various parts of the world.
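    The effect of rare alleles can be made concrete with the Hardy-Weinberg expectation: for a biallelic SNP whose minor allele has frequency p, the chance that a person is homozygous is p^2 + (1 - p)^2. This is a textbook approximation offered for illustration, not how the percentages above were computed; the sketch also includes a simple observed-homozygosity count over called genotypes.

```python
from typing import Iterable


def expected_homozygosity(p: float) -> float:
    """Hardy-Weinberg probability of being homozygous at a biallelic SNP
    whose minor allele has frequency p."""
    return p * p + (1 - p) * (1 - p)


def observed_homozygosity(genotypes: Iterable[str]) -> float:
    """Share of called two-letter genotypes whose two alleles are identical."""
    called = [gt for gt in genotypes if len(gt) == 2 and "-" not in gt]
    if not called:
        return float("nan")  # nothing called, so the share is undefined
    return sum(gt[0] == gt[1] for gt in called) / len(called)


if __name__ == "__main__":
    # The rarer the minor allele, the more people are homozygous for the common one.
    for p in (0.5, 0.3, 0.1, 0.02):
        print(f"minor allele frequency {p:.2f}: expected homozygosity {expected_homozygosity(p):.3f}")
```

    The expectation climbs from 0.5 at p = 0.5 to about 0.96 at p = 0.02, which is why a panel rich in rare variants shows higher overall homozygosity.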

    [Image: Helix genotype distribution]

    William has a LivingDNA file for comparison. GEDmatch Genesis shows that 99.5% of the calls in the Helix file match the calls in the LivingDNA file (about 237 differences out of the 47,423 SNPs available for comparison). That sounds like a decent concordance rate, but it is lower than what chip technology achieves: my son's LivingDNA file compared to his 23andMe file shows 54 differences out of 183,824 SNPs (99.97%).
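    A minimal sketch of the concordance arithmetic, assuming both kits are loaded as dicts of rsid to genotype. It counts only SNPs called in both files (the overlap that matters for GEDmatch), treats "AG" and "GA" as the same genotype, and does not try to correct for calls reported on opposite strands, which a real comparison tool would also need to handle.

```python
from typing import Dict, Tuple

NO_CALLS = {"--", "-", ""}  # assumed no-call encodings


def concordance(a: Dict[str, str], b: Dict[str, str]) -> Tuple[int, int, float]:
    """Return (SNPs compared, differences, concordance rate) for two kits."""
    compared = differences = 0
    for rsid in a.keys() & b.keys():
        ga, gb = a[rsid], b[rsid]
        if ga in NO_CALLS or gb in NO_CALLS:
            continue  # a no-call in either file removes the SNP from the overlap
        compared += 1
        if sorted(ga) != sorted(gb):  # compare unordered allele pairs
            differences += 1
    rate = 1 - differences / compared if compared else float("nan")
    return compared, differences, rate
```

    With the figures above, 1 - 237/47,423 rounds to 99.50% and 1 - 54/183,824 to 99.97%.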

    The SNP overlap with other platforms is important for GEDmatch. These statistics come from comparisons of specific files, so they would vary slightly depending on the no-calls in the other files.

    [Image: Helix SNP overlap]

    With the current GEDmatch algorithm, the comparison of William's LivingDNA and Helix files shows many gaps in regions where the SNP overlap falls below GEDmatch's threshold. As a result, he shows only 76.3% similarity with himself.

    [Image: Helix & LivingDNA comparison on GEDmatch Genesis]
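    GEDmatch's exact segment-breaking rules are not given in this thread, so the following is only a toy model of the effect described above. It takes the genetic-map positions (in cM, assumed already sorted) of the SNPs two kits share on one chromosome, splits the match wherever consecutive shared SNPs are farther apart than a gap limit, and drops pieces supported by too few SNPs. Both thresholds are hypothetical. Even a self-comparison then comes out as several shorter segments with less total cM.

```python
from typing import List, Tuple


def split_on_sparse_overlap(
    positions_cm: List[float],
    max_gap_cm: float = 1.0,  # hypothetical gap limit, not GEDmatch's actual value
    min_snps: int = 3,        # hypothetical minimum SNPs per segment
) -> List[Tuple[float, float]]:
    """Split one chromosome-long match into (start_cm, end_cm) segments wherever
    consecutive shared SNPs are more than max_gap_cm apart, dropping pieces
    supported by fewer than min_snps SNPs. positions_cm must be sorted."""
    segments: List[Tuple[float, float]] = []
    run = positions_cm[:1]  # current piece; empty if there are no shared SNPs
    for prev, cur in zip(positions_cm, positions_cm[1:]):
        if cur - prev > max_gap_cm:
            # overlap too sparse here: close the current piece
            if len(run) >= min_snps:
                segments.append((run[0], run[-1]))
            run = []
        run.append(cur)
    if len(run) >= min_snps:
        segments.append((run[0], run[-1]))
    return segments


def total_cm(segments: List[Tuple[float, float]]) -> float:
    """Sum of segment lengths in cM."""
    return sum(end - start for start, end in segments)
```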

    The Helix file fared better when compared with a parent/child kit from 23andMe v4, which has more SNPs in common with it. There were still some small breaks: 66 segments instead of the expected 22 (one per autosome). However, the total came to 3,455 cM, just shy of the 3,565 cM I see for a 23andMe v3 kit. GEDmatch also introduces breaks for a v3 kit in regions where the SNP density is low, so 45 segments were reported for a self-to-self comparison.
    Last edited by Ann Turner; 11-28-2017 at 03:07 PM.

  16. The Following 6 Users Say Thank You to Ann Turner For This Useful Post:

     bjp (11-30-2017),  Dewsloth (11-28-2017),  IberianHilary (01-20-2018),  MacUalraig (12-01-2017),  Robert1 (11-28-2017),  wombatofthenorth (12-09-2017)
