
Thread: How to add RSIDs to a .BAM or VCF file?

  1. #1

    How to add RSIDs to a .BAM or VCF file?

    Does anyone know how to add RSIDs to a BAM or FASTQ or even VCF file? I tested with a company called Seeq a long time ago and I have a BAM and FASTQ and VCF but they are basically irrelevant. Is there a way to add these?

  2. #2

    Quote Originally Posted by Smashorpass View Post
    Does anyone know how to add RSIDs to a BAM or FASTQ or even VCF file? I tested with a company called Seeq a long time ago and I have a BAM and FASTQ and VCF but they are basically irrelevant. Is there a way to add these?
    BAM and FASTQ files are collections of reads, so they have no per-variant records that an rsID could be attached to. Annotating a VCF with rsIDs can be done, e.g.:

    Annotate a VCF with dbSNP IDs and depth of coverage for each sample

    https://software.broadinstitute.org/...tAnnotator.php

    but I've not done one myself :-)


  4. #3

    Quote Originally Posted by MacUalraig View Post
    BAM and FASTQ files are collections of reads and so not structured that way. Annotating a VCF with rsIDs can be done eg

    Annotate a VCF with dbSNP IDs and depth of coverage for each sample

    https://software.broadinstitute.org/...tAnnotator.php

    but I've not done one myself :-)


    You're sending the guy on a wild goose chase with GATK tools. They are absolutely not easy to figure out and use; I'm pretty sure that I'm the only user here who has figured out how to use them.

    In addition to a Linux operating environment, a powerful computer and familiarity with the Linux command line, the poor guy will have to figure out all the dependencies needed, the correct formats for human reference genomes, the types of headers to add to BAMs and how to add them so that GATK will work (GATK is extremely picky), and a host of other issues. Quite a commitment, to be sure.

    Since I'm feeling generous today, here is a script that I have put together using Linux bash and AWK that simplifies matters a lot.

    1 - All you will need is a Linux environment and to install AWK with this command:

    sudo apt-get install gawk
    2- Next create your shell script by saving the following AWK command in a text file called, say, RS_Converter.sh (the .sh extension denotes a shell script):

    # Look up rs numbers from Data.bim by chromosome and position and insert them into A.bim. Rows of A.bim with no corresponding position in Data.bim are left unchanged. file2 file1 > output

    awk 'NR == FNR {REP[$1,$4] = $2; next} ($1,$4) in REP {$2 = REP[$1,$4]} 1' OFS="\t" Data.bim A.bim > A_rs.bim
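    Data.bim is a plink bim file that already contains rs numbers. Get one with over 1 million genotyped positions if possible. Because matching is by chromosome and position, it must use the same reference genome build as your own file.

    A.bim is your plink bim file that does not contain rs numbers. You can easily convert your VCF to plink bed bim fam format using plink tools (for example, PLINK 1.9's --vcf input option together with --make-bed).

    A_rs.bim will be the output bim file with rs numbers added.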
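    To see what the command does before running it on real data, here is a small worked example. The two input files are toy data with made-up rs numbers and positions, built with printf so the columns are tab-separated (chromosome, variant ID, genetic distance, position, allele 1, allele 2); only the awk line itself comes from the post above.

```shell
# Toy reference file that already has rs numbers (made-up values).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 rs0001 0 1000 A G \
  1 rs0002 0 2000 C T \
  2 rs0003 0 1500 G A > Data.bim

# Toy file without rs numbers ("." in column 2).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  1 . 0 1000 A G \
  1 . 0 2500 C T \
  2 . 0 1500 G A > A.bim

# Pass 1 (Data.bim): store column 2 under the key (chromosome, position).
# Pass 2 (A.bim): overwrite column 2 when the key is known; print every row.
awk 'NR == FNR {REP[$1,$4] = $2; next} ($1,$4) in REP {$2 = REP[$1,$4]} 1' OFS="\t" Data.bim A.bim > A_rs.bim

cat A_rs.bim
# 1	rs0001	0	1000	A	G
# 1	.	0	2500	C	T
# 2	rs0003	0	1500	G	A
```

    The row at position 2500 has no counterpart in Data.bim, so it is passed through unchanged, which is the behaviour described in the script's comment.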


    3- Next you need to make the script executable. Use the Linux cd command to make the directory that contains RS_Converter.sh, Data.bim and A.bim your working directory, then make RS_Converter.sh executable by typing the following command:

    chmod +x RS_Converter.sh
    If successful, a color-enabled ls listing will typically show RS_Converter.sh in a different color (often green), indicating that the file is now executable.


    To run the script, point your terminal to that directory and type ./RS_Converter.sh (the leading ./ is needed unless the directory is on your PATH).
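    Put together, steps 2 and 3 might look like the sketch below. The script body is the same awk command as above with a #!/bin/sh line added on top, which is an addition of mine rather than part of the original post; the one-row input files are made-up toy data so the script has something to read.

```shell
# Step 2: save the awk command in a script file (quoted 'EOF' keeps $1, $4 literal).
cat > RS_Converter.sh <<'EOF'
#!/bin/sh
awk 'NR == FNR {REP[$1,$4] = $2; next} ($1,$4) in REP {$2 = REP[$1,$4]} 1' OFS="\t" Data.bim A.bim > A_rs.bim
EOF

# Toy inputs in the same directory (made-up values).
printf '1\trs0001\t0\t1000\tA\tG\n' > Data.bim
printf '1\t.\t0\t1000\tA\tG\n' > A.bim

# Step 3: set the executable bit and confirm it took effect.
chmod +x RS_Converter.sh
[ -x RS_Converter.sh ] && echo "RS_Converter.sh is executable"

# Run it from the current directory; the ./ prefix is what makes the shell find it.
./RS_Converter.sh
cat A_rs.bim
```

    Checking with [ -x ... ] does not depend on terminal colors, so it also works over plain SSH sessions or in scripts.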


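    You can always change your plink bed bim fam files back to 23andMe text format by using the plink command below. Because --bfile A_rs looks for A_rs.bed and A_rs.fam alongside A_rs.bim, copy A.bed and A.fam to those names first: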

    /plink --bfile A_rs --recode 23 --out A_rs.txt

    Good luck....
    Last edited by Kurd; 09-23-2018 at 04:25 PM. Reason: typo


  6. #4
    Quote Originally Posted by Kurd View Post
    You can always change your plink bed bim fam files back to 23andMe text format by using the plink command:
    Wow, I never knew about --recode 23, thanks for that. I was manually converting the ped/map to 23andMe format this entire time (using Linux commands, of course).
