Page 71 of 73
Results 701 to 710 of 723

Thread: New K25 Admixture Calculator at GenePlaza.com

  1. #701 Kurd (Gold Member Class, 5,164 posts)

    Quote Originally Posted by sammygene View Post
    Would someone be able to convert my brother's for me? I don't really understand how to do it.
    If no one does it for you by this weekend, I will.
    EurasianDNA.com - A study of the population history of West & South Asia.

  2. The Following 2 Users Say Thank You to Kurd For This Useful Post:

     kingjohn (02-12-2018),  sammygene (02-12-2018)

  3. #702 khanabadoshi (Global Moderator, 3,051 posts)
    Location: Chicago | Ethnicity: Baloch Kashmiri Uzbek Kho | Nationality: USA | Y-DNA: R-Y17491 > R-YP4858 | mtDNA: A8a

    Quote Originally Posted by Kurd View Post
    I would like to see if the V5 file converted with the script I uploaded makes more sense in some of the available admixture calculators. It would be nice if you or someone could post before-and-after conversion V5 results side by side.
    I will try on the V5s I have, though it seems most are Build 37. But, I'm pretty sure Poi will come upon this thread and have a few run before I even wake up. Haha.
    “Chahar chez est tohfay Multan, Gard-o- Garma, Gada-o- Goristan”.

    Four things are the gift of Multan: Dusty winds, hot seasons, beggars and graveyards.




  4. The Following 3 Users Say Thank You to khanabadoshi For This Useful Post:

     bmoney (02-12-2018),  Kurd (02-12-2018),  sammygene (02-12-2018)

  5. #703 Kurd (Gold Member Class)

    Quote Originally Posted by Kurd View Post
    I put together a script for 23andMe V5 users to convert from hg38 to hg19 so that they can use admixture calculators.

    My script is Linux-based, and you will need GNU awk (gawk) installed on your computer. The files you need are in the following zipped file: https://drive.google.com/file/d/1c8W...ew?usp=sharing

    Here are the instructions (they are also included in the folder):


    1- Make sure you have GNU awk (gawk) installed on your computer. If not (Debian/Ubuntu):
    sudo apt-get update
    sudo apt-get install gawk

    2- Rename your 23andMe V5 file to 23IN.txt and place it in the same folder as 23_TO_UCSC.sh and UCSC_TO_23.sh.
    3- Make sure your working directory is the folder containing the files mentioned in step 2.
    4- Run 23_TO_UCSC.sh. This will produce an input file (UCSC_IN.txt) for the UCSC build converter.
    5- Go to the UCSC LiftOver page at: https://genome.ucsc.edu/cgi-bin/hgLiftOver
    6- Choose hg38 as the "Original Assembly" and hg19 as the "New Assembly", point the "Browse" button to your file (UCSC_IN.txt), and click "Submit file".
    7- After a few minutes your converted file will be ready. Save it as UCSC_OUT.txt in the same folder as in step 2.
    8- Run UCSC_TO_23.sh. Your converted hg19 23andMe file will be written to 23OUT.txt.
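    For anyone without Linux/gawk, steps 4 and 8 can be sketched in Python. This is not the logic inside 23_TO_UCSC.sh or UCSC_TO_23.sh, which isn't shown in the thread. It is a minimal sketch that assumes the raw file is tab-separated (rsid, chromosome, position, genotype), that the LiftOver page is given BED input, and that LiftOver carries the BED name column through unchanged. The function names to_bed and from_bed are hypothetical.

```python
# Hypothetical helpers (to_bed / from_bed), NOT the code in Kurd's scripts.
# Assumes tab-separated 23andMe rows: rsid, chromosome, position, genotype.

def to_bed(raw_lines):
    """Step 4 sketch: 23andMe rows -> one-base BED records for UCSC LiftOver."""
    bed = []
    for line in raw_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the 23andMe comment header
        rsid, chrom, pos, genotype = line.rstrip("\r\n").split("\t")
        if chrom == "MT":
            continue  # mitochondrial rows are left out of this sketch
        start = int(pos) - 1  # 23andMe positions are 1-based; BED starts are 0-based
        # Carry rsid and genotype through LiftOver in the BED name column.
        bed.append(f"chr{chrom}\t{start}\t{pos}\t{rsid}|{genotype}")
    return bed


def from_bed(lifted_lines):
    """Step 8 sketch: lifted BED records -> 23andMe rows with new positions."""
    rows = []
    for line in lifted_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, _start, end, name = line.rstrip("\r\n").split("\t")[:4]
        rsid, genotype = name.split("|")
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        # For a one-base BED record, the end coordinate equals the 1-based position.
        rows.append(f"{rsid}\t{chrom}\t{end}\t{genotype}")
    return rows


if __name__ == "__main__":
    raw = ["# rsid\tchromosome\tposition\tgenotype", "rs123\t1\t1000\tAG", "i700\tMT\t73\tG"]
    print(to_bed(raw))
    print(from_bed(["chr1\t1500\t1501\trs123|AG"]))
```

    Records that LiftOver cannot map are reported by the web tool as failed conversions rather than in the converted file, so in this sketch they simply drop out of from_bed's output.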

    I just got off the phone with 23andMe. Apparently they are still using Build 37, so the odd results must be due to low marker overlap.

    hg38 is a much more accurate human reference than hg19, for technical reasons I have discussed elsewhere. I am inclined to use it when mapping ancient sequences, just to get a more accurate assembled genome, even if I have to convert back to hg19 coordinates after genotyping.

  6. The Following 4 Users Say Thank You to Kurd For This Useful Post:

     bmoney (02-13-2018),  khanabadoshi (02-13-2018),  kingjohn (02-13-2018),  poi (02-13-2018)

  7. #704 poi (Gold Member Class, 2,262 posts)
    Ethnicity: Nepali Brahmin | Y-DNA: R1a-L657>Y6 | mtDNA: M30

    Quote Originally Posted by khanabadoshi View Post
    I will try on the V5s I have, though it seems most are Build 37. But, I'm pretty sure Poi will come upon this thread and have a few run before I even wake up. Haha.
    Lol, just saw this thread/post (thanks bmoney!). I will get to it when my kids are asleep. lol

  8. The Following 2 Users Say Thank You to poi For This Useful Post:

     bmoney (02-13-2018),  khanabadoshi (02-13-2018)

  9. #705 khanabadoshi (Global Moderator)

    Quote Originally Posted by Kurd View Post
    I just got off the phone with 23andMe. Apparently they are still using build37. Therefore, the odd results must be due to low marker overlap

    Hg38 is a much more accurate human reference than Hg19 due to technical reasons which I have discussed elsewhere. I am inclined to use it when mapping ancient sequences just to get a more accurate assembled genome even if I have to convert back to hg19 coordinates post genotyping
    I honestly think the script should be doing exactly this: "patching" everything to GRCh38. There is an online remapping tool: https://www.ncbi.nlm.nih.gov/genome/tools/remap

    Also, you may find this useful: V5 is a custom-ordered version of these chips:

    Data Sheet: https://support.illumina.com/content...0-2016-016.pdf
    Specifically, the MAF tables, the 1000 Genomes reference Venn diagram, and the intron/exon/Y/mtDNA SNP breakdowns will be of interest.





    Infinium Global Screening Array v1.0 Product Files



    This download contains the latest manifest (.bpm and .csv), cluster (.egt), and LIMS product descriptor files for the Infinium Global Screening Array v1.0.
    https://support.illumina.com/downloa...ort-files.html


    In case it's useful, all of Illumina's reference builds/annotations: https://support.illumina.com/sequenc...e/igenome.html

    Arrays with GRCh38/hg38 files available:




  10. The Following 3 Users Say Thank You to khanabadoshi For This Useful Post:

     Kurd (02-13-2018),  poi (02-13-2018),  Varun R (02-13-2018)

  11. #706 lukaszM (Registered User, 6,241 posts)
    Location: Torun | Ethnicity: Central 75% + 25% Mazovia | Nationality: Pole | Y-DNA: R1a > M198 > YP 1337

    I don't know if the script was also supposed to fix the low genotype rate when using V5 in DIY calculators. I checked yesterday and saw no difference, but maybe I did something wrong on this site with the reference genomes. Could somebody check whether it is the same after conversion?
    Last edited by lukaszM; 02-13-2018 at 10:13 AM.

  12. #707 Kurd (Gold Member Class)

    Quote Originally Posted by lukaszM View Post
    I don't know if the script was also supposed to fix the low genotype rate when using V5 in DIY calculators. I checked yesterday and saw no difference, but maybe I did something wrong on this site with the reference genomes. Could somebody check whether it is the same after conversion?
    The script helps you convert from one coordinate system to another. No script can increase the low number of overlapping SNPs (genotype rate) that V5 has with V4 (110K) and others. You would have to impute those non-overlapping, uncalled SNPs, which isn't a perfect solution: imputation accuracy varies with the algorithm used but generally falls off with rarer variants, which you have seen me sometimes refer to as more population-specific rarer variants. I believe you can see a hint of this in the tables posted by Khana above.
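    The point that conversion moves coordinates but cannot add markers can be made concrete. A minimal sketch, assuming a calculator that matches markers by rsid and taking "genotype rate" to mean the share of the calculator's markers that have a real (non-"--") call in the kit; the function name and the toy data are made up. Under that assumption, lifting a file to another build leaves the number unchanged, because the set of rsids does not change.

```python
def genotype_rate(kit_calls, calculator_snps):
    """Share of a calculator's markers that the kit has a real call for.

    kit_calls: dict of rsid -> genotype; "--" marks a no-call in 23andMe files.
    calculator_snps: set of rsids the calculator uses (hypothetical input).
    """
    if not calculator_snps:
        return 0.0
    called = sum(1 for rsid in calculator_snps if kit_calls.get(rsid, "--") != "--")
    return called / len(calculator_snps)


if __name__ == "__main__":
    kit = {"rs1": "AA", "rs2": "--", "rs3": "AG"}  # toy kit
    calculator = {"rs1", "rs2", "rs4", "rs5"}       # toy calculator marker set
    print(genotype_rate(kit, calculator))  # 0.25
```

    Only rs1 is both present and called here, so the rate is 1/4; raising it would need the missing markers to be genotyped or imputed, not re-mapped.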

    Edit: Unless 23andMe gave me wrong info, they are supposedly still using Build 37. It just occurred to me that V5 results may also be off because not all the markers are actually genotyped; most may be imputed, which is why they published the imputation tables Khana showed above.
    Last edited by Kurd; 02-13-2018 at 11:44 AM.

  13. The Following 5 Users Say Thank You to Kurd For This Useful Post:

     khanabadoshi (02-14-2018),  kingjohn (02-13-2018),  lukaszM (02-13-2018),  poi (02-13-2018),  sammygene (02-13-2018)

  14. #708 kingjohn (Banned, 4,088 posts)
    Ethnicity: karius the black hair | Nationality: east med

    this imputation stuff is a disaster

  15. #709 khanabadoshi (Global Moderator)

    Quote Originally Posted by kingjohn View Post
    this imputation stuff is a disaster
    Yes, if you read the entire data sheet and the rest of the literature on the GSA chip, you'll see it is highly imputed. They go into detail on the accuracy/confidence of the imputation process, which is why 23andMe updated to this chip. They also managed to lower production cost substantially, to ~$40/chip when bought in bulk. I strongly suggest everyone at least read the full data sheet. It's only 7 pages.

    The chip itself can be mapped in either GRCh37 or 38. So it seems 23andMe continued with 37.

    V4 was typed on a completely customized HTS iSelect HD (unaltered, it would be the Illumina HumanOmniExpress-24) and also relies heavily on imputation relative to the V3 chip, which was typed on the Illumina HumanOmniExpress+. For all these versions, 23andMe always customizes the chip, removing many SNPs and adding others. This all depends on their own proprietary research into which SNPs they believe are ancestry-informative. It comes on top of the fact that Illumina also adds and removes SNPs in each new BeadChip, based on its own research. Population-genetics companies and researchers customize chips by necessity, since Illumina designs its chips for much broader (clinical/pharmaceutical) use. In both cases, the companies are confident that each iteration is considerably more accurate than the last. V4 tested almost half as many SNPs as V3, not because V4 is less accurate, but because the dropped V3 SNPs are either uninformative or highly predictable from one's general ethnic grouping. Remember, whatever was imputed on V4 was based on the known, tested calls of each ethnic group on the 900K+ V3 chip. They removed SNPs that had to have 90%+ confidence of a certain call in each grouping [I don't know the actual threshold and don't see it published, but I assume it's reasonable]. So the imputations are not pure statistical inferences in a vacuum; their algorithms also take hard, previously acquired data into account. The reference is always the 1000 Genomes sample set. You can download the full population breakdown for the newest chipset to see how they stack up on an unaltered chip (it's around 10 GB).
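    The pruning idea in this paragraph (drop SNPs whose call is highly predictable within every grouping, then impute them) can be illustrated with a toy filter. It is only a sketch: the 0.9 threshold is the assumption stated above, not a published 23andMe figure, and the function name, data layout and frequencies are made up.

```python
def predictable_snps(freqs_by_pop, threshold=0.9):
    """Return rsids whose most common genotype reaches `threshold` in every group.

    freqs_by_pop: {rsid: {population: {genotype: frequency}}} (hypothetical layout)
    """
    droppable = []
    for rsid, pops in freqs_by_pop.items():
        # An empty population table counts as "not predictable".
        if pops and all(g and max(g.values()) >= threshold for g in pops.values()):
            droppable.append(rsid)  # candidate to drop from the chip and impute instead
    return droppable


if __name__ == "__main__":
    toy = {  # made-up frequencies for illustration only
        "rs1": {"GIH": {"AA": 0.95, "AG": 0.05}, "CHB": {"AA": 0.92, "AG": 0.08}},
        "rs2": {"GIH": {"AA": 0.50, "AG": 0.40, "GG": 0.10}, "CHB": {"AA": 0.97, "AG": 0.03}},
    }
    print(predictable_snps(toy))  # ['rs1']
```

    rs2 survives because its call is only predictable in one of the two groups, which is the sense in which such SNPs stay "informative".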

    If you want to delve further into the imputation and ancestry admixture process, you can read all of 23andMe's patents, which is what I did. Essentially, your reported information has a significant effect on which allele is chosen during imputation. This is not ideal, but it is inherent: they have to base the probabilities on something. For South Asians, East Asians, or anyone nearby, the GIH sample set and one of the Chinese sample sets are the basis of our probabilities. So in that Venn diagram above, the SNPs in the SAS-EAS grouping are the basis for our imputation. Once we tell 23andMe we are South Asian, or the population closest to our own is found to be GIH, the imputation process assigns a high likelihood that we will have the same call as the GIH sample. However, they also strongly take into account any relatives who have tested; e.g., if both your parents have tested, your calls must conform to theirs. Another point to consider: 23andMe tests regions in-house on its own chips, likely based on your general ethnic grouping. So you are not tested only on the V5 chip; your raw data is the culmination of a few tests. It is very likely that they have extensively tested the 1000G groups and have full, actual non-imputed calls for each of the samples in-house, and these are the basis for our imputations in each iteration. The process makes sense; the weak point is that much depends on which 1000G sample we are identified as closest to. A caveat: much of this is based on their patents, and they may or may not implement everything claimed in those patents in their own workflow. Their ancestry admixture process is extremely interesting, but I won't go into it. You can read about it here:


    Genotype Calling: https://docs.google.com/viewer?url=p.../US8428886.pdf
    Estimation of Admixture Generation: https://www.google.com/patents/US201...AjwQ6AEIzQEwFw
    Processing Data from Genotyping Chips: https://docs.google.com/viewer?url=p.../US8645343.pdf
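    The two ideas described above, a reference-population prior (e.g. GIH for South Asians) and the rule that a child's calls must conform to tested parents, can be combined in a toy imputer. This is a deliberately simplified illustration, not the method in 23andMe's patents: it just picks the most frequent genotype in the assigned reference group among those consistent with the parents. Function names and frequencies are hypothetical.

```python
from itertools import product


def allowed_child_genotypes(mother, father):
    """Genotypes a child can carry when one allele comes from each parent."""
    return {"".join(sorted(m + f)) for m, f in product(mother, father)}


def impute(ref_freqs, mother=None, father=None):
    """Toy imputation of one SNP.

    ref_freqs: {genotype: frequency} in the assigned reference group, with
    genotypes written in sorted allele order (e.g. "AG", not "GA").
    """
    candidates = dict(ref_freqs)
    if mother and father:
        allowed = allowed_child_genotypes(mother, father)
        candidates = {g: p for g, p in candidates.items() if g in allowed}
    if not candidates:
        return "--"  # nothing consistent with the parents: leave as a no-call
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    gih = {"AA": 0.70, "AG": 0.25, "GG": 0.05}  # made-up reference frequencies
    print(impute(gih))                            # AA
    print(impute(gih, mother="GG", father="AG"))  # AG
```

    The weak point mentioned above shows up directly: swap in a different reference group's frequencies and the imputed call can change, unless the parents' genotypes rule the alternative out.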
    Last edited by khanabadoshi; 02-14-2018 at 06:22 AM.




  16. The Following 4 Users Say Thank You to khanabadoshi For This Useful Post:

     bmoney (02-14-2018),  Kurd (02-15-2018),  poi (02-14-2018),  Varun R (02-14-2018)

  17. #710 poi (Gold Member Class)

    Quote Originally Posted by khanabadoshi View Post
    Yes, if you read the entire data sheet and the rest of the literature on the GSA chip, you'll see it is highly imputed. They go into detail on the accuracy/confidence of the imputation process, which is why 23andMe updated to this chip. They also managed to lower production cost substantially, to ~$40/chip when bought in bulk. I strongly suggest everyone at least read the full data sheet. It's only 7 pages.

    I want my money back. They can keep my spit sample.

  18. The Following User Says Thank You to poi For This Useful Post:

     bmoney (02-14-2018)

