Thread: little script ( windows + R) converting to 23&me

  1. #1
    Moderator
    Posts
    5,212
    Sex
    Location
    Normandy
    Ethnicity
    northwesterner
    Y-DNA
    U152>L2>Z367
    mtDNA
    H5a1

    Normandie Netherlands Friesland Finland Orkney

    In case anyone is interested, I wrote a small script (nothing great, it's really very basic) which, within its limits, can replace the professional tools for converting DNA.Land data directly to the 23andMe format. It's in R, of course. Limitation:
    Without installing a tricky extra package (Bigmemory), R cannot handle really big data. An uncompressed imputed.vcf file weighs about 2.5 GB, so processing it in its entirety is out of the question. In any case that would be pointless, since the resulting txt file would be unusable. I will therefore assume that you want to convert extracts from your genome.
    Script: I'll put it at the end of the post; you just have to copy and paste it into Notepad and save it under any name with the extension .r
    Usage: First download the file imputed.vcf.gz (it is the third one; for well-connected townspeople this takes a few minutes, for rural people like me who live at the bottom of a hole, count 20 to 40 min). Unzip it and open it with glogg. Select with the mouse the part you want to convert. This is the painful part. Copy and paste this part into a text editor (it is slow if you have taken many lines, f***g windows) and save it under the name myfile.csv. Put this file and the script in the same folder, open an R session, source the script, and it runs. If your file is a bit large there is some latency (R reads your file), then it runs (very fast). I advise against more than 2,500,000 lines, especially since, Windows doing everything wrong, handling files with Notepad (or another editor) becomes painful as soon as the number of lines gets a bit big. As output you will get in your folder a file named myconvertedfile.txt, which you can open with a text editor, a spreadsheet (or glogg if it is big). Since I wanted a 23andMe file, I did not keep the allele probabilities, but adding them would not be a problem (except that it would slow things down, because they have to be calculated from the "genotype likelihoods"). If someone asks, I will post an augmented script.
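    For those who would rather skip the glogg copy-and-paste step, here is a minimal sketch of doing the extraction in R itself. It is only an illustration, not part of the script below: the function name extract_vcf_lines and its arguments are made up, and it assumes the selection can be described as a range of data lines (header lines starting with "#" are not counted). It reads the compressed file in chunks, so the whole 2.5 GB never sits in memory.

```r
# Hypothetical helper (not part of the original script): copy data lines
# first..last of a gzipped VCF into a tab-separated text file.
# Header lines starting with "#" are skipped and not counted.
extract_vcf_lines <- function(vcf_gz, out_file, first, last, chunk = 100000L) {
  con <- gzfile(vcf_gz, open = "rt")
  on.exit(close(con), add = TRUE)
  out <- file(out_file, open = "wt")
  on.exit(close(out), add = TRUE)
  seen <- 0L  # number of data (non-header) lines read so far
  while (seen < last) {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0L) break           # end of file
    lines <- lines[!startsWith(lines, "#")]  # drop VCF header lines
    idx <- seen + seq_along(lines)           # data-line numbers in this chunk
    keep <- lines[idx >= first & idx <= last]
    if (length(keep) > 0L) writeLines(keep, out)
    seen <- seen + length(lines)
  }
  invisible(seen)
}

# Example call (file names as used in this post):
# extract_vcf_lines("imputed.vcf.gz", "myfile.csv", first = 1, last = 100000)
```

    The output file can then be fed to the script as myfile.csv. Selecting by chromosome or position instead of by line number would need an extra filter on the first two columns.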

    script:

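    # Read the pasted VCF lines (tab-separated, no header) as text, so that
    # e.g. a column containing only T alleles is not read as TRUE.
    tab <- read.csv("myfile.csv", sep = "\t", header = FALSE, colClasses = "character")
    # Start from the first 4 columns of every row; column 4 will hold the genotype.
    Ntab <- tab[, 1:4]
    Ntab[, 4] <- NA_character_
    # Column 10 starts with the genotype call: 0 = REF allele (col 4), 1 = ALT allele (col 5).
    # Rows with any other call are left as NA.
    I <- which(substr(tab[, 10], 1, 3) == "0/0")
    Ntab[I, 4] <- paste(substr(tab[I, 4], 1, 1), substr(tab[I, 4], 1, 1), sep = "")
    I <- which(substr(tab[, 10], 1, 3) == "0/1")
    Ntab[I, 4] <- paste(substr(tab[I, 4], 1, 1), substr(tab[I, 5], 1, 1), sep = "")
    I <- which(substr(tab[, 10], 1, 3) == "1/1")
    Ntab[I, 4] <- paste(substr(tab[I, 5], 1, 1), substr(tab[I, 5], 1, 1), sep = "")
    # Reorder to the 23andMe layout: rsid, chromosome, position, genotype.
    Ntab[, 1] <- tab[, 3]
    Ntab[, 2] <- tab[, 1]
    Ntab[, 3] <- tab[, 2]
    write.table(Ntab, "myconvertedfile.txt", sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)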
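    To see what the script does to individual lines, here is a toy example that needs no input file. The three rows are made up (fake rsids, positions and sample column; the FORMAT fields of a real file may differ), and the mapping is written with ifelse instead of which, but the logic is meant to be the same: 0 stands for the REF allele, 1 for the ALT allele.

```r
# Made-up VCF-like rows: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, sample.
tab <- data.frame(
  V1 = c("1", "1", "1"),
  V2 = c("1000", "2000", "3000"),
  V3 = c("rs0000001", "rs0000002", "rs0000003"),
  V4 = c("A", "C", "G"),
  V5 = c("G", "T", "A"),
  V6 = ".", V7 = "PASS", V8 = ".", V9 = "GT:GL",
  V10 = c("0/0:0,-3,-9", "0/1:-3,0,-3", "1/1:-9,-3,0"),
  stringsAsFactors = FALSE
)

gt  <- substr(tab$V10, 1, 3)  # genotype call, e.g. "0/1"
ref <- substr(tab$V4, 1, 1)   # first base of REF
alt <- substr(tab$V5, 1, 1)   # first base of ALT

# Translate the 0/1 codes into two-letter genotypes; anything else becomes NA.
genotype <- ifelse(gt == "0/0", paste0(ref, ref),
            ifelse(gt == "0/1", paste0(ref, alt),
            ifelse(gt == "1/1", paste0(alt, alt), NA_character_)))

# 23andMe column order: rsid, chromosome, position, genotype.
out <- data.frame(rsid = tab$V3, chrom = tab$V1, pos = tab$V2,
                  genotype = genotype, stringsAsFactors = FALSE)
print(out)  # genotypes: AA, CT, AA
```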
    Warning: of course this elementary script does not correct the damned flipped snps!
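    For the curious, the mechanical half of a flip correction is easy to sketch: a flipped SNP is reported on the opposite strand, so each base has to be replaced by its complement. The function name flip_strand is made up for this illustration. The hard half, deciding which SNPs are actually flipped, needs a reference to compare against and is not shown here (and A/T or C/G SNPs cannot be recognised from the alleles alone).

```r
# Hypothetical helper: complement every base of a genotype string
# (A <-> T, C <-> G). It does NOT detect which SNPs are flipped.
flip_strand <- function(genotype) chartr("ACGT", "TGCA", genotype)

flipped <- flip_strand(c("AG", "CT", "CC"))
print(flipped)  # "TC" "GA" "GG"
```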
    Last edited by anglesqueville; 08-16-2017 at 04:05 PM.
    En North alom, de North venom
    En North fum naiz, en North manom

    (Roman de Rou, Wace, 1160-1170)

  2. The Following 3 Users Say Thank You to anglesqueville For This Useful Post:

     Adam A (09-27-2017),  lukaszM (08-17-2017),  Theconqueror (08-16-2017)

  3. #2
    Registered Users
    Posts
    1,136
    Sex
    Location
    Canada
    Ethnicity
    Norman/German
    Nationality
    Canadian
    Y-DNA
    R1B-DF99(FGC16982)

    Normandie Germany Imperial
    Thanks for this, Angles. Bigmemory is easily installed in RStudio. What is the tricky part with Bigmemory? Is it how it can be used in conjunction with your script?
    K8: French East/German South/Austrian 26%, French North East/Belgian/German West 25%, French North 25%, Irish/Scottish/Welsh 10%, French South/French Basque 9%, German East/Czech/Austrian 5%
    K16: German/Czech/Austrian 33%, French/Belgian 23%, Portuguese/Spanish 22%, Irish/Scottish/Welsh 13%
    K36: German Rheinland-Pfalz 35%, French North West 27%, Spain 27%, Ireland 11%
    G25: German 40%, French 28%, Spanish 23%, Irish 9%
    K16: German 50%, French 19%, French East 9%, Spanish 8%

  4. #3
    There is nothing really tricky with Bigmemory, in fact, but I wanted to write a script for people with very elementary knowledge of R (just open and source). That said, honestly, I never handle genetic files in R myself ... Well, I just ran chromosomes 1 and 2 of my mom through this little script. For chromosome 2: 2 minutes. Perhaps it's only a fun thing, perhaps it's useful, I don't know. In any case it wasn't that big a job...

  5. The Following User Says Thank You to anglesqueville For This Useful Post:

     Theconqueror (08-16-2017)
