Thread: Problems with Illumina whole exome testing results

  1. #1
    Registered Users
    Posts
    24
    Sex
    Location
    Texas
    Ethnicity
    Colonial British/German
    Nationality
    USA
    Y-DNA
    R-L1065>S744>S691*
    mtDNA
    T2b19a*

    Problems with Illumina whole exome testing results

    I am dividing this post into three parts because of the high volume of characters. I really don't know where to go with this, so I'm just going to post it here in hopes that someone might have an idea of what the issue might be. Just to be clear, I have no formal training in molecular biology whatsoever, so everything I know I learned mostly from the internet. I have two whole exomes from late 2014, a mother and son. All of the data is consistent with mother and son, so no question about that, and we have 23andMe data that is consistent with the whole exomes as well, so we are 100% certain that there is no sample mix-up. Also, I have searched these exomes for rare variants for the past five years, and one extremely rare variant in both mother and son was also present in their 23andMe results, and confirmed with Sanger sequencing. The two whole exome tests were collected and processed at the same time, and the results were delivered at the same time. I have cut and pasted the whole exome specifications. The VCF format was an older format, complex and not as easy to read as some of the newer formats I have seen, but it includes a lot of information about every variant call.
    Methodology
    Enrichment
    Nextera Rapid Capture Expanded Exome Kit - FC-140-1006
    Platform: Illumina HiSeq
    Analysis: Gene By Gene uses the Arpeggi Engine for NGS analytics. This pipeline has been vetted and shown to be more accurate than traditional tools for alignment, variant calling, and variant annotation.
    Deliverables
    Raw data will be provided in FastQ format; BAM, VCF, and Annotated VCF will only be provided if ordered
    Results are delivered to the customer via electronic FTP transfer and are only stored by Gene By Gene for 30 days.

    So over the past few days while reviewing some data for the X chromosome, I "stepped back for a better view". The old saying, "can't see the forest because of all the trees" comes to mind. I think I noticed this several years back but really couldn't do anything with it and chalked it up to my inability to understand the data/VCF format. I selected a single gene from the X to demonstrate the problem, and I'll copy and paste that data. I considered sharing some background about why these two had whole exome testing, but suffice it to say that the male had a young adult onset of a rare, complex fatal genetic disease, and no cause has been identified to date. He is Canadian, and his specialists in Canada have shown zero interest in genomic/genetic testing, as there are currently zero treatment options for the disease. So the only options for genetic counseling are private pay, and his care is consuming all available resources. We simply can't afford to pay for clinical grade testing, and Canadian healthcare absolutely will not. It is amazing that he is still alive, but he is just a hollow shell of who he was before all this started.

  3. #2

    So here's the problem. While scrolling through the male data, it is obvious that he has an abundance of heterozygous and homozygous variants on the X, and the data is not identical to his mother's. In other words, the X chromosome results are reported in exactly the same manner as his autosomal results, chromosomes 1-22. How can that be? I have copied and pasted the results for the SLC25A5 gene, Xq24, chrX:118,602,363-118,605,359 (GRCh37/hg19), Size: 2,997 bases, Orientation: Plus strand, Size: 298 amino acids - https://www.genecards.org/cgi-bin/ca...ywords=slc25a5
    To clarify a little, this format did not come with rsID SNP annotation, and the rsIDs in parentheses were added by me. I just started working on this particular gene yesterday, so I haven't finished adding rsID numbers for the variants. The genotype is in the usual fashion, reference allele followed by variant allele, with heterozygous indicated by 0/1 and homozygous by 1/1. I chose this particular gene because all the variants listed are heterozygous calls. The numeric annotations for each variant are as follows (I have used the first listed variant @ 118603641 as an example) -

    GC - Guanine-Cytosine Content of all reads at the variant position - 0.514373
    HL - Longest homopolymer run adjacent to the position across all reads - 3
    HR - RMS Weighted Homopolymer Rate of all reads - 1.49647
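    IndelCnt - RMS number of indels near variant position across all reads - 0
    MismatchCnt - RMS number of mismatches near variant position across all reads - 1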
    GT - Genotype - 0/1
    AD - Allelic depth for the ref and alternate alleles in the order listed - 12,5
    DP - Approximate read depth - 17
    GQ - Genotype quality - 27
    PL - Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specifications - 32,0,49
    AB - Ratio of genotype allelic depths for heterozygous variants - 0.705882
    SR - Ratio of reads supporting genotypes that are on the forward strand - 0.647059
    BQ - RMS Phred-scaled base quality of supporting reads at variant position - 36
    LowMQ - Number of reads with MQ<=10 for the ref and alternate alleles in the order listed - 0,0
    ClipCnt - Number of reads with clipping for the ref and alt alleles in the order listed - 0,0
    ReadOffset - Average offset to variant position in reads supporting the ref and alt alleles in the order listed - 67.5,63.4
    RAD - Allelic depths for the ref and alt alleles in the order listed, for reverse strand reads only - 4,2
    AS - Allele support scores for the ref and alt alleles in the order listed - 11.9965,4.60191

    For this particular format, the highest possible call score is 1484.13, which is typical for very common variants. As you can see, some of the heterozygous variants listed have the highest possible score, so they cannot simply be low-confidence calls.
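For anyone who wants to read these per-sample annotations programmatically, here is a minimal Python sketch. It pairs each FORMAT key with its value for the first variant (118603641) and recomputes AB from AD. The function names are made up for illustration, and the formula AB = ref depth / (ref depth + alt depth) is an assumption inferred from the numbers in this thread (12 / 17), not something taken from the pipeline's documentation.

```python
def parse_sample(format_field: str, sample_field: str) -> dict:
    """Pair each colon-separated FORMAT key with its sample value."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        raise ValueError(f"{len(keys)} FORMAT keys but {len(values)} sample values")
    return dict(zip(keys, values))


def allele_balance(sample: dict) -> float:
    """Recompute AB as ref depth / (ref depth + alt depth) (assumed formula)."""
    ref_depth, alt_depth = (int(x) for x in sample["AD"].split(",")[:2])
    return ref_depth / (ref_depth + alt_depth)


# FORMAT keys as declared in the VCF header; sample values for chrX:118603641.
FORMAT = "GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS"
SAMPLE = "0/1:12,5:17:27:32,0,49:0.705882:0.647059:36:0,0:0,0:67.5,63.4:4,2:11.9965,4.60191"

sample = parse_sample(FORMAT, SAMPLE)
print(sample["GT"], sample["AD"], round(allele_balance(sample), 6))
```

The recomputed value can be compared against the AB sub-field stored in the record as a quick sanity check that the columns were split correctly.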

  4. #3

    So now, when I try to paste the data, I get these error messages -


    The following errors occurred with your submission

    You have included a total of 76 images in your message. The maximum number that you may include is 20. Please correct the problem and then continue again.

    Images include use of smilies, the BB code [img] tag, and HTML <img> tags. The use of these is all subject to them being enabled by the administrator.


    There are zero images or HTML tags included, so I have no idea what the problem is. Drat.

    I'll try posting the data in bites. Not ideal, but maybe I can post it that way.

    son - Illumina Nextera whole exome
    chrX 118603641 . G A 16.2974 . GC=0.514373;HL=3;HR=1.49647;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:12,5:17:27:32,0,49:0.705882:0.647059:36:0,0:0,0:67.5,63.4:4,2:11.9965,4.60191
    chrX 118603644 . G T 16.2994 . GC=0.514373;HL=3;HR=1.10058;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:12,5:17:27:32,0,49:0.705882:0.647059:37:0,0:0,0:68.5,64:4,2:11.9966,4.60236
    chrX 118603650 . T C 14.3443 . GC=0.513293;HL=2;HR=1.1528;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:14,5:19:29:11,0,36:0.736842:0.631579:37:0,0:0,0:67.9286,65.2:5,2:13.9957,4.60256
    chrX 118603668 . A G 33.4013 . GC=0.513969;HL=3;HR=3.69443;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:21,9:30:54:118,0,121:0.7:0.533333:35:0,2:0,0:82.4286,66.3333:10,4:20.9935,8.28304
    (not in dbSNP) chrX 118603706 . A AG 1484.13 . GC=0.517671;HL=5;HR=3.23131;IndelCnt=1;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:61,29:90:5000:500,0,500:0.677778:0.577778:37:2,18:1,0:64.5738,60.069:26,12:60.9793,26.6907
    (rs201182381) chrX 118603713 . T C 1484.13 . GC=0.517917;HL=5;HR=1.63986;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:61,29:90:5000:500,0,500:0.677778:0.566667:33:2,18:1,0:65.5246,61.4483:27,12:60.963,26.6988
    (rs143413528) chrX 118603729 . G A 1484.13 . GC=0.517921;HL=2;HR=1.84447;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:64,31:95:5000:500,0,500:0.673684:0.557895:36:4,20:1,2:68.1875,64.5806:29,13:63.9777,28.5357
    (rs148294496) chrX 118603742 . A C 1484.13 . GC=0.517663;HL=2;HR=2.01456;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:66,31:97:5000:500,0,500:0.680412:0.546392:35:4,20:1,2:73.303,66.6774:31,13:65.9792,28.5312
    chrX 118603747 . A T 1484.13 . GC=0.517121;HL=2;HR=1.183;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:66,31:97:5000:500,0,500:0.680412:0.536082:36:6,20:1,2:69.8485,67.4839:32,13:65.9745,28.5372
    chrX 118603773 . T C 1484.13 . GC=0.516763;HL=2;HR=1.9816;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:56,30:86:5000:500,0,500:0.651163:0.523256:36:6,20:1,2:80.2857,68.2667:28,13:55.9801,27.6157
    Last edited by Ysearcher; 06-07-2019 at 05:04 PM.

  5. #4

    Well, apparently the web site is interpreting something in the format as smiley faces. Bizarre. No idea how to turn it off. Being thwarted by clever tools is frustrating. I'll just move on with more data (and smiley faces)
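A likely explanation, sketched in Python: the VCF header declares the FORMAT keys AD, DP, GQ and PL, so the FORMAT column should read GT:AD:DP:GQ:PL:..., and the character pairs ":D" and ":P" inside it look like smiley codes. The smiley list below is a guess (the forum's actual configuration is not known), and the function is only an imitation of what the forum software might do.

```python
import re

# Hypothetical smiley codes; the forum's real list is not known.
SMILEY_CODES = [":D", ":P"]


def strip_smilies(text: str) -> str:
    """Imitate a forum that turns smiley codes into images, which vanish when the text is copied."""
    pattern = "|".join(re.escape(code) for code in SMILEY_CODES)
    return re.sub(pattern, "", text)


# FORMAT column as implied by the ##FORMAT lines in the VCF header.
original = "GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS"
print(strip_smilies(original))
print(strip_smilies("Orientation:Plus strand"))
```

Under that assumption, each pasted record would contain two smiley codes, which would explain why plain text with no images trips the image limit. Wrapping the data in the forum's code tags, if available, or attaching it as a file would probably avoid the substitution.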

    son - Illumina Nextera whole exome
    chrX 118603830 . A G 14.1288 . GC=0.520678;HL=3;HR=3.10238;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:27,8:35:54:9,0,35:0.771429:0.571429:35:0,7:0,2:83.037,93.875:13,2:26.9886,7.36502
    chrX 118604030 . T C 28.6962 . GC=0.495976;HL=3;HR=3.2093;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:37,12:49:85:101,0,105:0.755102:0.44898:36:21,8:0,0:56.2973,77.5833:21,6:36.9857,11.2438
    chrX 118604043 . T C 108.097 . GC=0.491017;HL=3;HR=3.30159;IndelCnt=0;MismatchCnt=0.288675 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:23,12:35:71:238,0,238:0.657143:0.457143:36:11,8:0,0:64.087,77.5833:13,6:22.9914,11.2428
    chrX 118604367 . C T 537.341 . GC=0.493317;HL=2;HR=1.15798;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:56,21:77:154:398,0,398:0.727273:0.519481:37:0,14:1,14:71.2143,63.0952:27,10:55.9833,20.933
    chrX 118604399 . C G 19.303 . GC=0.504334;HL=3;HR=1.46674;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:78,19:97:5000:55,0,66:0.804124:0.505155:36:2,12:3,12:59.4359,66.1579:38,10:77.9703,18.9421
    chrX 118604400 . T G 31.9592 . GC=0.505663;HL=3;HR=1.50407;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:73,19:92:155:113,0,116:0.793478:0.51087:36:2,12:3,12:60.6301,66.1053:35,10:72.9748,18.9408
    chrX 118604409 . C T 16.842 . GC=0.507428;HL=3;HR=4.13216;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:70,17:87:143:37,0,52:0.804598:0.505747:37:2,10:3,12:56.6286,74.1765:35,8:69.9772,16.9473
    chrX 118604412 . G A 16.8535 . GC=0.507428;HL=3;HR=4.25182;IndelCnt=0;MismatchCnt=2 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:70,17:87:143:37,0,52:0.804598:0.505747:36:2,10:3,12:56.6286,74.3529:35,8:69.9737,16.9482
    chrX 118604416 . A G 18.3581 . GC=0.507741;HL=2;HR=2.4436;IndelCnt=0;MismatchCnt=1.23669 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:69,17:86:142:48,0,61:0.802326:0.5:35:2,10:3,12:55.087,74.5882:35,8:68.9716,16.9473
    chrX 118604428 . T C 107.609 . GC=0.51153;HL=3;HR=3.4567;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:52,17:69:129:237,0,238:0.753623:0.492754:36:2,10:2,12:53.4423,75.2941:27,8:51.9805,16.9469
    Last edited by Ysearcher; 06-07-2019 at 05:06 PM.

  7. #6

    (rs12390) chrX 118604436 . T A 98.2523 . GC=0.515335;HL=2;HR=1.5;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:43,15:58:110:228,0,228:0.741379:0.482759:37:0,10:2,12:55.2326,80:23,7:42.9864,14.952
    chrX 118604444 . G C 22.8934 . GC=0.517641;HL=5;HR=4.49927;IndelCnt=0;MismatchCnt=0.301511 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:38,11:49:84:76,0,83:0.77551:0.469388:37:0,7:2,10:57.8421,78:20,6:37.9879,10.9652
    chrX 118605099 . C A 98.231 . GC=0.365118;HL=2;HR=1.82268;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:29,13:42:82:228,0,228:0.690476:0.642857:37:16,5:0,0:80.8621,71.2308:10,5:28.9906,12.1792
    chrX 118605114 . C A 534.353 . GC=0.356318;HL=3;HR=2.57216;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:24,16:40:82:398,0,398:0.6:0.625:36:16,6:0,0:76.2083,70.375:9,6:23.9916,14.9893
    chrX 118605134 . G T 512.255 . GC=0.347838;HL=2;HR=2.25332;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:30,17:47:101:394,0,394:0.638298:0.659574:37:20,6:0,0:73.3667,71.6471:10,6:29.9903,15.927
    chrX 118605144 . G T 399.771 . GC=0.344001;HL=4;HR=6.75494;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:33,17:50:105:369,0,369:0.66:0.64:36:20,6:0,0:79.3333,74.5882:12,6:32.9878,15.9277
    chrX 118605145 . G A 399.618 . GC=0.344001;HL=4;HR=6.75494;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:33,17:50:105:369,0,369:0.66:0.64:37:20,6:0,0:79.6061,74.8824:12,6:32.9891,15.9272
    chrX 118605163 . C T 1484.13 . GC=0.326887;HL=3;HR=1.69252;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:43,22:65:140:500,0,500:0.661538:0.615385:35:24,6:0,0:65.2558,81.8182:16,9:42.9835,20.6101
    chrX 118605195 . A ATGATGGTGG,ATGATGG 977.065 . GC=0.311063;HL=3;HR=1.03873;IndelCnt=1;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:63,25,1:89:5000:458,0,458,458,458,458:0.715909:0.522727:36:28,6,0:0,0,0:78.873,99.92,6:30,12,1:62.9749,23.4068,0.936435
    chrX 118605217 . G A 77.5456 . GC=0.295876;HL=6;HR=7.53493;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:73,22:95:5000:204,0,205:0.768421:0.505263:36:24,5:1,0:74.7808,92.1364:35,12:72.9741,20.6135

  8. #7

    chrX 118605245 . A C 35.3923 . GC=0.272431;HL=3;HR=2.8746;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:53,16:69:119:124,0,126:0.768116:0.463768:36:5,1:2,0:85.1698,69.6875:26,11:52.9815,14.9899

    mother - Illumina Nextera whole exome
    chrX 118604043 . T C 21.6092 . GC=0.495481;HL=3;HR=3.38824;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:18,7:25:43:69,0,77:0.72:0.48:36:15,5:1,0:55.2222,70.1429:9,4:17.9929,6.64943
    chrX 118604060 . G T 30.9822 . GC=0.497093;HL=3;HR=3.18229;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:8,6:14:21:110,0,113:0.571429:0.5:35:5,5:1,0:70.25,81.1667:4,3:7.99608,5.6988
    chrX 118604079 . C T 19.5107 . GC=0.477979;HL=2;HR=1.5;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 1/1:0,5:5:6:63,61,0:1:0.4:38:0,4:0,0:0,65:0,3:0,4.74874
    chrX 118604367 . C T 1484.13 . GC=0.499415;HL=2;HR=1.151;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:94,33:127:5000:500,0,500:0.740157:0.496063:37:0,23:1,31:63.7553,66.2121:46,18:93.9711,32.9169
    chrX 118605134 . G T 24.2917 . GC=0.344285;HL=2;HR=2.2381;IndelCnt=0;MismatchCnt=0 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:51,15:66:110:83,0,89:0.772727:0.560606:37:39,13:1,0:65.9804,62.1333:23,6:50.9833,13.8094
    chrX 118605144 . G T 36.0703 . GC=0.339348;HL=4;HR=6.83952;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:56,17:73:125:126,0,128:0.767123:0.534247:38:39,13:1,0:72.2679,63:27,7:55.9834,15.6498
    chrX 118605145 . G A 36.0863 . GC=0.339348;HL=4;HR=6.83952;IndelCnt=0;MismatchCnt=1 GT:AD:DP:GQ:PL:AB:SR:BQ:LowMQ:ClipCnt:ReadOffset:RAD:AS 0/1:56,17:73:125:126,0,128:0.767123:0.534247:37:39,13:1,0:72.3036,63.1765:27,7:55.9823,15.6504

    Any ideas or suggestions are greatly appreciated.
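To make the mother/son comparison concrete, the following Python sketch takes the chrX positions from the records posted in this thread and reports which are shared. A son inherits his X from his mother, so a real, inherited variant in his non-pseudoautosomal X would normally be expected in her call set as well, coverage permitting. The variable names are illustrative only.

```python
# chrX positions (hg19) copied from the son's records posted above.
SON = {
    118603641, 118603644, 118603650, 118603668, 118603706, 118603713,
    118603729, 118603742, 118603747, 118603773, 118603830, 118604030,
    118604043, 118604367, 118604399, 118604400, 118604409, 118604412,
    118604416, 118604428, 118604436, 118604444, 118605099, 118605114,
    118605134, 118605144, 118605145, 118605163, 118605195, 118605217,
    118605245,
}

# chrX positions (hg19) copied from the mother's records posted above.
MOTHER = {
    118604043, 118604060, 118604079, 118604367,
    118605134, 118605144, 118605145,
}

shared = sorted(SON & MOTHER)
son_only = sorted(SON - MOTHER)
mother_only = sorted(MOTHER - SON)

print(f"shared: {len(shared)}, son only: {len(son_only)}, mother only: {len(mother_only)}")
print("shared positions:", shared)
```

This only compares positions. A fuller check would also compare the ALT alleles and the read depth at each site in both samples.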

  9. #8

    Well, I guess I'm just talking to myself on a public forum, but it's all I can do. So here's the entire header for the whole exomes in question (and I can't fix data that is translated into smiley faces by the Anthrogenica website) -

    ##fileformat=VCFv4.1
    ##fileDate=20140804
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##FORMAT=<ID=AB,Number=1,Type=Float,Description="Ratio of genotype allelic depths for heterozygous variants">
    ##FORMAT=<ID=SR,Number=1,Type=Float,Description="Ratio of reads supporting genotype that are on the forward strand">
    ##FORMAT=<ID=BQ,Number=1,Type=Integer,Description="RMS Phred-scaled base quality of supporting reads at variant position">
    ##FORMAT=<ID=LowMQ,Number=.,Type=Integer,Description="Number of reads with MQ<=10 for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=ClipCnt,Number=.,Type=Integer,Description="Number of reads with clipping for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=ReadOffset,Number=.,Type=Float,Description="Average offset to variant position in reads supporting the ref and alt alleles in the order listed">
    ##FORMAT=<ID=RAD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed, for reverse-strand reads only">
    ##FORMAT=<ID=AS,Number=.,Type=Float,Description="Allele support scores for the ref and alt alleles in the order listed">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
    ##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
    ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
    ##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
    ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
    ##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
    ##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
    ##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
    ##INFO=<ID=HR,Number=1,Type=Float,Description="RMS Weighted Homopolymer Rate of all reads">
    ##INFO=<ID=HL,Number=1,Type=Integer,Description="Longest Homopolymer run adjacent to position across all reads">
    ##INFO=<ID=IndelCnt,Number=1,Type=Float,Description="RMS number of indels near variant position across all reads">
    ##INFO=<ID=MismatchCnt,Number=1,Type=Float,Description="RMS number of mismatches near variant position across all reads">
    ##INFO=<ID=GC,Number=1,Type=Float,Description="GC content of all reads at the variant position">
    ##contig=<ID=chrM,length=16571,assembly=hg19>
    ##contig=<ID=chr1,length=249250621,assembly=hg19>
    ##contig=<ID=chr2,length=243199373,assembly=hg19>
    ##contig=<ID=chr3,length=198022430,assembly=hg19>
    ##contig=<ID=chr4,length=191154276,assembly=hg19>
    ##contig=<ID=chr5,length=180915260,assembly=hg19>
    ##contig=<ID=chr6,length=171115067,assembly=hg19>
    ##contig=<ID=chr7,length=159138663,assembly=hg19>
    ##contig=<ID=chr8,length=146364022,assembly=hg19>
    ##contig=<ID=chr9,length=141213431,assembly=hg19>
    ##contig=<ID=chr10,length=135534747,assembly=hg19>
    ##contig=<ID=chr11,length=135006516,assembly=hg19>
    ##contig=<ID=chr12,length=133851895,assembly=hg19>
    ##contig=<ID=chr13,length=115169878,assembly=hg19>
    ##contig=<ID=chr14,length=107349540,assembly=hg19>
    ##contig=<ID=chr15,length=102531392,assembly=hg19>
    ##contig=<ID=chr16,length=90354753,assembly=hg19>
    ##contig=<ID=chr17,length=81195210,assembly=hg19>
    ##contig=<ID=chr18,length=78077248,assembly=hg19>
    ##contig=<ID=chr19,length=59128983,assembly=hg19>
    ##contig=<ID=chr20,length=63025520,assembly=hg19>
    ##contig=<ID=chr21,length=48129895,assembly=hg19>
    ##contig=<ID=chr22,length=51304566,assembly=hg19>
    ##contig=<ID=chrX,length=155270560,assembly=hg19>
    ##contig=<ID=chrY,length=59373566,assembly=hg19>
    ##contig=<ID=chr1_gl000191_random,length=106433,assembly=hg19>
    ##contig=<ID=chr1_gl000192_random,length=547496,assembly=hg19>
    ##contig=<ID=chr4_ctg9_hap1,length=590426,assembly=hg19>
    ##contig=<ID=chr4_gl000193_random,length=189789,assembly=hg19>
    ##contig=<ID=chr4_gl000194_random,length=191469,assembly=hg19>
    ##contig=<ID=chr6_apd_hap1,length=4622290,assembly=hg19>
    ##contig=<ID=chr6_cox_hap2,length=4795371,assembly=hg19>
    ##contig=<ID=chr6_dbb_hap3,length=4610396,assembly=hg19>
    ##contig=<ID=chr6_mann_hap4,length=4683263,assembly=hg19>
    ##contig=<ID=chr6_mcf_hap5,length=4833398,assembly=hg19>
    ##contig=<ID=chr6_qbl_hap6,length=4611984,assembly=hg19>
    ##contig=<ID=chr6_ssto_hap7,length=4928567,assembly=hg19>
    ##contig=<ID=chr7_gl000195_random,length=182896,assembly=hg19>
    ##contig=<ID=chr8_gl000196_random,length=38914,assembly=hg19>
    ##contig=<ID=chr8_gl000197_random,length=37175,assembly=hg19>
    ##contig=<ID=chr9_gl000198_random,length=90085,assembly=hg19>
    ##contig=<ID=chr9_gl000199_random,length=169874,assembly=hg19>
    ##contig=<ID=chr9_gl000200_random,length=187035,assembly=hg19>
    ##contig=<ID=chr9_gl000201_random,length=36148,assembly=hg19>
    ##contig=<ID=chr11_gl000202_random,length=40103,assembly=hg19>
    ##contig=<ID=chr17_ctg5_hap1,length=1680828,assembly=hg19>
    ##contig=<ID=chr17_gl000203_random,length=37498,assembly=hg19>
    ##contig=<ID=chr17_gl000204_random,length=81310,assembly=hg19>
    ##contig=<ID=chr17_gl000205_random,length=174588,assembly=hg19>
    ##contig=<ID=chr17_gl000206_random,length=41001,assembly=hg19>
    ##contig=<ID=chr18_gl000207_random,length=4262,assembly=hg19>
    ##contig=<ID=chr19_gl000208_random,length=92689,assembly=hg19>
    ##contig=<ID=chr19_gl000209_random,length=159169,assembly=hg19>
    ##contig=<ID=chr21_gl000210_random,length=27682,assembly=hg19>
    ##contig=<ID=chrUn_gl000211,length=166566,assembly=hg19>
    ##contig=<ID=chrUn_gl000212,length=186858,assembly=hg19>
    ##contig=<ID=chrUn_gl000213,length=164239,assembly=hg19>
    ##contig=<ID=chrUn_gl000214,length=137718,assembly=hg19>
    ##contig=<ID=chrUn_gl000215,length=172545,assembly=hg19>
    ##contig=<ID=chrUn_gl000216,length=172294,assembly=hg19>
    ##contig=<ID=chrUn_gl000217,length=172149,assembly=hg19>
    ##contig=<ID=chrUn_gl000218,length=161147,assembly=hg19>
    ##contig=<ID=chrUn_gl000219,length=179198,assembly=hg19>
    ##contig=<ID=chrUn_gl000220,length=161802,assembly=hg19>
    ##contig=<ID=chrUn_gl000221,length=155397,assembly=hg19>
    ##contig=<ID=chrUn_gl000222,length=186861,assembly=hg19>
    ##contig=<ID=chrUn_gl000223,length=180455,assembly=hg19>
    ##contig=<ID=chrUn_gl000224,length=179693,assembly=hg19>
    ##contig=<ID=chrUn_gl000225,length=211173,assembly=hg19>
    ##contig=<ID=chrUn_gl000226,length=15008,assembly=hg19>
    ##contig=<ID=chrUn_gl000227,length=128374,assembly=hg19>
    ##contig=<ID=chrUn_gl000228,length=129120,assembly=hg19>
    ##contig=<ID=chrUn_gl000229,length=19913,assembly=hg19>
    ##contig=<ID=chrUn_gl000230,length=43691,assembly=hg19>
    ##contig=<ID=chrUn_gl000231,length=27386,assembly=hg19>
    ##contig=<ID=chrUn_gl000232,length=40652,assembly=hg19>
    ##contig=<ID=chrUn_gl000233,length=45941,assembly=hg19>
    ##contig=<ID=chrUn_gl000234,length=40531,assembly=hg19>
    ##contig=<ID=chrUn_gl000235,length=34474,assembly=hg19>
    ##contig=<ID=chrUn_gl000236,length=41934,assembly=hg19>
    ##contig=<ID=chrUn_gl000237,length=45867,assembly=hg19>
    ##contig=<ID=chrUn_gl000238,length=39939,assembly=hg19>
    ##contig=<ID=chrUn_gl000239,length=33824,assembly=hg19>
    ##contig=<ID=chrUn_gl000240,length=41933,assembly=hg19>
    ##contig=<ID=chrUn_gl000241,length=42152,assembly=hg19>
    ##contig=<ID=chrUn_gl000242,length=43523,assembly=hg19>
    ##contig=<ID=chrUn_gl000243,length=43341,assembly=hg19>
    ##contig=<ID=chrUn_gl000244,length=39929,assembly=hg19>
    ##contig=<ID=chrUn_gl000245,length=36651,assembly=hg19>
    ##contig=<ID=chrUn_gl000246,length=38154,assembly=hg19>
    ##contig=<ID=chrUn_gl000247,length=36422,assembly=hg19>
    ##contig=<ID=chrUn_gl000248,length=39786,assembly=hg19>
    ##contig=<ID=chrUn_gl000249,length=38502,assembly=hg19>
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT GRC14384680_S4_L00

  10. #9

    Here is the web page for the VCFv4.1 format - https://github.com/samtools/hts-spec...er/VCFv4.1.pdf Now, copied and pasted from the VCF format web page, the specifications related to genotype calling -

    1.4.2 Genotype fields
    If genotype information is present, then the same types of data must be present for all samples. First a FORMAT field is given specifying the data types and order (colon-separated alphanumeric String). This is followed by one field per sample, with the colon-separated data in this field corresponding to the types specified in the format. The first sub-field must always be the genotype (GT) if it is present. There are no required sub-fields.
    As with the INFO field, there are several common, reserved keywords that are standards across the community:
    • GT : genotype, encoded as allele values separated by either of / or |. The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on. For diploid calls examples could be 0/1, 1|0, or 1/2, etc. For haploid calls, e.g. on Y, male non-pseudoautosomal X, or mitochondrion, only one allele value should be given; a triploid call might look like 0/0/1. If a call cannot be made for a sample at a given locus, ‘.’ should be specified for each missing allele in the GT field (for example ‘./.’ for a diploid genotype and ‘.’ for haploid genotype). The meanings of the separators are as follows (see the PS field below for more details on incorporating phasing information into the genotypes):
    ◦ / : genotype unphased
    ◦ | : genotype phased
    So I hope this clarifies my dilemma. All phenotype evidence argues against Klinefelter syndrome (the subject has a presumed biological son, a normal, well-developed muscular build, no gynecomastia, etc.), yet the whole exome data indicates a diploid X chromosome, and the proband has a fatal neurodegenerative disorder. What am I missing here? Either there IS a diploid X chromosome, or the data is incorrect. Which is it? I suspect lab error.
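The GT rule quoted above can be checked mechanically. This is a small Python sketch (helper names are illustrative) that counts the allele values in a GT string and reports whether the call is heterozygous. A haploid call as described in the specification would have a single allele value, whereas the son's records in this thread carry two.

```python
import re


def gt_alleles(gt: str) -> list:
    """Split a GT string such as '0/1', '1|0', '1', or './.' into its allele values."""
    return re.split(r"[/|]", gt)


def ploidy(gt: str) -> int:
    """Number of allele values given in the GT field."""
    return len(gt_alleles(gt))


def is_heterozygous(gt: str) -> bool:
    """True when at least two different called alleles are present."""
    called = {allele for allele in gt_alleles(gt) if allele != "."}
    return len(called) > 1


for gt in ["0/1", "1/1", "1", "./."]:
    print(gt, "ploidy:", ploidy(gt), "het:", is_heterozygous(gt))
```

Note that a diploid GT on chrX does not by itself show that the sample has two X chromosomes. It may only mean that the variant caller was run with a fixed ploidy of 2 for every contig, which is an assumption worth checking against the pipeline's settings.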
    Last edited by Ysearcher; 06-09-2019 at 05:17 PM.

  11. #10
    Registered Users
    Posts
    1,198
    Sex
    Location
    Glasgow, Scotland
    Ethnicity
    Pictland/Deira
    Y-DNA
    R1b-M222-FGC5864
    mtDNA
    H5r*

    So your primary question is about the validity of heterozygous calls on a male X?

    I have them on mine, and cross-checking against multiple tests shows that at least some of them are real; some turn into homozygous calls from another lab. I haven't done a full analysis yet.

    If there is something riding on the answer I would consult someone like Dr. Ann Turner who probably can just rattle the answer off.

    I did wonder if there is some connection with genes that have escaped X inactivation but I'm unsure.
    KDM6A is cited as a classic example of this and looking at that gene virtually all the variants I have within it are hetero. Nearby ZNF674 which has been categorised as a well behaved silenced gene shows the exact opposite pattern. However when I checked some more the pattern didn't seem to always hold up.

    I found a table of escapeeism status in this paper
    https://bsd.biomedcentral.com/articl...293-015-0053-7

    but it doesn't have data for the gene you mentioned in particular. The table is at

    https://static-content.springer.com/...OESM1_ESM.xlsx

    I'm not sure how, if at all, this affects the germline. One paper I looked at specifically said the silencing is outside the germline, in which case this is a red herring.

    A disproportionate number of my het calls are in PAR1/PAR2 which is to be expected - about 1/6 of the total within just 1.7% of the length.
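To separate expected het calls (in the pseudoautosomal regions) from unexpected ones, a position can be classified with a few lines of Python. The boundaries below are the commonly cited GRCh37/hg19 coordinates for PAR1 and PAR2 on chrX and should be treated as an assumption to verify against the reference actually used. The chrX length is taken from the ##contig line of the VCF header posted earlier.

```python
# Commonly cited GRCh37/hg19 pseudoautosomal boundaries on chrX
# (1-based, inclusive); verify against your own reference build.
PAR_REGIONS_HG19 = {
    "PAR1": (60_001, 2_699_520),
    "PAR2": (154_931_044, 155_260_560),
}

# chrX length from the ##contig line in the posted VCF header.
CHRX_LENGTH_HG19 = 155_270_560


def classify_chrx(pos: int) -> str:
    """Return 'PAR1', 'PAR2', or 'non-PAR' for a chrX position."""
    for name, (start, end) in PAR_REGIONS_HG19.items():
        if start <= pos <= end:
            return name
    return "non-PAR"


par_bases = sum(end - start + 1 for start, end in PAR_REGIONS_HG19.values())

print(classify_chrx(118_603_641))  # first SLC25A5 variant posted in this thread
print(f"PAR share of chrX: {par_bases / CHRX_LENGTH_HG19:.1%}")
```

With these boundaries, the SLC25A5 positions posted earlier fall well outside both regions, so pseudoautosomal inheritance would not account for those particular calls.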
    Last edited by MacUalraig; 06-09-2019 at 05:21 PM. Reason: tidied up url
    YSEQ:#37; YFull: YF01405 (Y Elite 2013)
    WGS (Full Genomes Nov 2015, YSEQ Feb 2019, Dante Mar 2019, FGC-10X Linked Reads Apr 2019, Dante-Nanopore May 2019) - further WGS tests pending ;-)
    Ancestry GCs: Scots in central Scotland & Ulster, Ireland; English in Yorkshire & Pennines
    FBIMatch: A828783 (autosomal DNA) for segment matching DO NOT POST ADMIXTURE REPORTS USING MY KIT
