BAM Analysis Kit (Windows and Linux). New release.



teepean47
06-02-2019, 12:25 PM
I have released a new version of BAM Analysis Kit. Please report any bugs you find!

Highlights:

-Update SNP filters to include more SNPs (23andMe V3 to V5, FTDNA old and new, Ancestry)

-Replace samtools and bcftools with native versions. Htslib has fixed a bug that prevented using native samtools/bcftools. Using native version of samtools speeds up handling large BAM files.

-Remove old mtDNA haplogroup prediction

-Add Yleaf v2 and use it to predict Y-haplogroup

-Update haplogrep to 2.1.20

Download the zip file or use a Git client.

https://github.com/teepean/BAM-Analysis-Kit/releases

I have also released an experimental Linux version of BAM Analysis Kit. Required software: bedtools, samtools 1.7 or higher, bcftools 1.7 or higher, gawk, Python 3.6 or higher with pandas installed

https://github.com/teepean/BAM-Analysis-Kit-Linux/releases/tag/v2.09
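
For anyone unsure whether their Linux setup meets the requirements above, here is a rough Python sketch of a pre-flight check. It is not part of the kit. The helper names (`parse_version`, `check_tool`) are made up for illustration, and it assumes each tool prints its version number when called with `--version`.

```python
import re
import shutil
import subprocess

# Minimum versions quoted in the post; no minimum is stated for bedtools or gawk.
REQUIRED = {"samtools": (1, 7), "bcftools": (1, 7), "bedtools": None, "gawk": None}


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def check_tool(name, minimum):
    """Return (ok, detail) for one command-line tool (hypothetical helper)."""
    if shutil.which(name) is None:
        return False, "not found on PATH"
    result = subprocess.run(
        [name, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    version = parse_version(result.stdout + result.stderr)
    if minimum is not None and (version is None or version < minimum):
        return False, "found {}, need {} or higher".format(version, minimum)
    return True, "found {}".format(version)
```

Looping over `REQUIRED.items()` and calling `check_tool(name, minimum)` for each entry gives a quick overview. Versions are compared as tuples so that 1.10 counts as newer than 1.7.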

ArmandoR1b
06-02-2019, 04:26 PM
Windows BAK v2.091 fails for me. I tested it with a random file. Here is the first error with v2.091.



Input BAM : "C:\BAM\Crusades\SI-41.bam"

Pre-execution Cleanup ...

Sorting ...
sort: invalid option -- '@'
open: No such file or directory
[bam_sort_core] fail to open file 4


BAM Analysis Kit 1.8 works fine for the same file. The BAM is in a different directory than the programs.

Tonev
06-02-2019, 06:26 PM
Same errors and behaviour for me too...

teepean47
06-03-2019, 03:19 AM
I have released a fix, version 2.0.92. For some reason Git did not update the old executables.

https://github.com/teepean/BAM-Analysis-Kit/releases

xenus
06-03-2019, 04:46 AM
Were the issues Generalissimo brought up ever addressed?

teepean47
06-03-2019, 05:31 AM
Were the issues Generalissimo brought up ever addressed?

If you mean base alignment quality then yes. BAQ is not enabled in samtools or GATK.

I have been working on a new tool based on pileupCaller. The problem with that tool at the moment is that I haven't figured out a decent way to annotate the results.

https://github.com/teepean/FADAK

xenus
06-03-2019, 07:52 AM
If you mean base alignment quality then yes. BAQ is not enabled in samtools or GATK.

I have been working on a new tool based on pileupCaller. The problem with that tool at the moment is that I haven't figured out a decent way to annotate the results.

https://github.com/teepean/FADAK

What annotations do you need?
Are you just trying to work pileupCaller into a pipeline?

teepean47
06-03-2019, 09:19 AM
What annotations do you need?
Are you just trying to work pileupCaller into a pipeline?

The aim is to replace GATK with pileupCaller or any other program (snpAD, ANGSD, FreeBayes) suitable for ancient DNA. One aim is to produce Gedmatch-compatible files, and therefore annotating missing rsids is a necessity. The pipeline has to work on Windows, but so far pileupCaller is the only one that is working and is stable for everyday usage. The problem with pileupCaller is that it is not multi-threaded, so it could be a lot faster, but I have no experience with Haskell.
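
To make the rsid annotation step concrete, here is a minimal Python sketch of a position-based lookup. It is only an illustration. The function names, the dictionary layout, and the four-column output (rsid, chromosome, position, genotype, in the style of 23andMe raw data) are assumptions and are not taken from FADAK or pileupCaller.

```python
def annotate_rsids(calls, rsid_by_position):
    """Fill in missing rsids by (chromosome, position) lookup.

    calls: list of dicts with keys "rsid", "chrom", "pos", "genotype".
    rsid_by_position: dict mapping (chrom, pos) -> rsid, built from some
    reference table (hypothetical; the thread does not name one).
    Calls with no known rsid keep "." as a placeholder.
    """
    annotated = []
    for call in calls:
        rsid = call["rsid"]
        if not rsid or rsid == ".":
            rsid = rsid_by_position.get((call["chrom"], call["pos"]), ".")
        annotated.append({**call, "rsid": rsid})
    return annotated


def to_tsv_lines(calls):
    """Render calls as tab-separated rsid, chromosome, position, genotype."""
    return [
        "\t".join([c["rsid"], c["chrom"], str(c["pos"]), c["genotype"]])
        for c in calls
    ]
```

The lookup itself is trivial. The hard part, as noted above, is finding a reference table that covers the positions the caller emits.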

xenus
06-03-2019, 01:04 PM
The aim is to replace GATK with pileupCaller or any other program (snpAD, ANGSD, FreeBayes) suitable for ancient DNA. One aim is to produce Gedmatch-compatible files, and therefore annotating missing rsids is a necessity. The pipeline has to work on Windows, but so far pileupCaller is the only one that is working and is stable for everyday usage. The problem with pileupCaller is that it is not multi-threaded, so it could be a lot faster, but I have no experience with Haskell.

If the problem is just parallelizing without shared resources, it should be easy. My Haskell ability is rudimentary at best, and Haskell's Cabal package manager has always wanted to lock me in dependency hell, which I wouldn't imagine to be any nicer on Windows.

Seeing that your repo includes Cygwin, I'm wondering whether that's just because it works better for your use case than WSL, or does WSL just not work at all?

teepean47
06-03-2019, 02:24 PM
If the problem is just parallelizing without shared resources, it should be easy. My Haskell ability is rudimentary at best, and Haskell's Cabal package manager has always wanted to lock me in dependency hell, which I wouldn't imagine to be any nicer on Windows.

Cabal works on Windows. Most of the time. I am currently experimenting with the parallel runtime options (+RTS), as pileupCaller is compiled with the -threaded option.


Seeing that your repo includes Cygwin, I'm wondering whether that's just because it works better for your use case than WSL, or does WSL just not work at all?

I am using Windows/Cygwin combination so that the program is more accessible to average users. Linux test version is working fine using WSL.

ArmandoR1b
06-04-2019, 12:29 AM
I have released a fix, version 2.0.92. For some reason Git did not update the old executables.

https://github.com/teepean/BAM-Analysis-Kit/releases

That works for me. If I gave you a spreadsheet of ISOGG 2017 Y-DNA longhand names would you be able to add them so they show in the output of Complete_SNPs_y.csv? I currently use Access to cross-reference the output so that I can put the SNPs in order by longhand name. The only reason I mention 2017 is because that is closest to what more recent studies have used. Otherwise I would just use the most recent SNP Index that ISOGG has available.

teepean47
06-04-2019, 05:21 AM
That works for me. If I gave you a spreadsheet of ISOGG 2017 Y-DNA longhand names would you be able to add them so they show in the output of Complete_SNPs_y.csv? I currently use Access to cross-reference the output so that I can put the SNPs in order by longhand name. The only reason I mention 2017 is because that is closest to what more recent studies have used. Otherwise I would just use the most recent SNP Index that ISOGG has available.

Sure why not. By the way, Yleaf uses ISOGG from 2019.

ArmandoR1b
06-04-2019, 12:52 PM
Sure why not. By the way, Yleaf uses ISOGG from 2019.

Nice. I hadn't looked for Yleaf v2 and was used to the first version of Yleaf, so I didn't know they had also used a more current tree. I'll get both in the spreadsheet so that we can have both 2019 and 2017, since the 2019 tree has more SNPs.

xenus
06-04-2019, 09:44 PM
Taking a look at my development bookmarks, there are a few things I've saved. These two are written in Go, so I think getting them working with Cygwin isn't out of the question.
Elprep claims to be a GATK4 replacement https://github.com/ExaScience/elprep
Bigly is a pileup caller library with more detailed output than the samtools mpileup https://github.com/brentp/bigly

Only other things I have saved are the mega projects like http://bdgenomics.org/ and https://usegalaxy.org/ which are really impressive but massive projects. There really are some awesome things going on as far as platforms go but even if it's open source it's rarely painless to set up a local version of a hosted platform.

167273
06-05-2019, 12:17 AM
I tried it and version 2.091 isn't working. It starts analysing, but there is a message that the sorted and realigned BAM was not found. After the analysis, no SNPs are found.

AbdoNumen
06-05-2019, 03:49 AM
Ran 2 more BAMs today, not as successfully.

I1945 (https://www.ebi.ac.uk/ena/data/view/ERS1211897) returned as "NA", while I1293 (https://www.ebi.ac.uk/ena/data/view/ERS1211894) returned just J-CTS5904* (=J2). By comparison, Genetiker reported them as R2a-Y3399 and J2a-CTS1085, respectively.

teepean47
06-05-2019, 06:08 PM
Taking a look at my development bookmarks, there are a few things I've saved. These two are written in Go, so I think getting them working with Cygwin isn't out of the question.
Elprep claims to be a GATK4 replacement https://github.com/ExaScience/elprep
Bigly is a pileup caller library with more detailed output than the samtools mpileup https://github.com/brentp/bigly

Only other things I have saved are the mega projects like http://bdgenomics.org/ and https://usegalaxy.org/ which are really impressive but massive projects. There really are some awesome things going on as far as platforms go but even if it's open source it's rarely painless to set up a local version of a hosted platform.

Elprep looks very interesting and it might speed up the preprocessing stage a bit but unfortunately the actual genotyping is still handled by GATK.

I have been testing Freebayes but don't have any idea at the moment what settings are suitable for aDNA.

I did test Bigly but the output was just zeroes for some reason.

teepean47
06-05-2019, 06:09 PM
Ran 2 more BAMs today, not as successfully.

I1945 (https://www.ebi.ac.uk/ena/data/view/ERS1211897) returned as "NA", while I1293 (https://www.ebi.ac.uk/ena/data/view/ERS1211894) returned just J-CTS5904* (=J2). By comparison, Genetiker reported them as R2a-Y3399 and J2a-CTS1085, respectively.

Looks like Yleaf doesn't get enough SNPs to make a prediction. The output of the program Genetiker uses looks very familiar, but I do not remember what it is. Anyway, he does the prediction "manually".

Puntanen
06-11-2019, 11:20 PM
Excellent, thank you! Something seems to be better now.
The Levänluhta JK2065 individual with the original and the updated analysis kit.
Here's the Gedmatch comparison differences between the two, and the respective diagnostic statuses.
This was I believe a sparse DNA set.


teepean47
06-13-2019, 05:52 PM
Excellent, thank you! Something seems to be better now.
The Levänluhta JK2065 individual with the original and the updated analysis kit.
Here's the Gedmatch comparison differences between the two, and the respective diagnostic statuses.
This was I believe a sparse DNA set.



Increased SNP count is because of a better filter that includes a wider range of SNPs.

ArmandoR1b
06-25-2019, 02:40 AM
Sure why not. By the way, Yleaf uses ISOGG from 2019.
I finally got around to looking at the files of the program and I see that YLeaf has an hg19.txt file with the longhand names included. Would you be able to tweak your program so it includes the longhand name in the output of the Complete_SNPs_y.csv ?

teepean47
06-25-2019, 04:34 PM
I finally got around to looking at the files of the program and I see that YLeaf has an hg19.txt file with the longhand names included. Would you be able to tweak your program so it includes the longhand name in the output of the Complete_SNPs_y.csv ?

I can look into it. The source of Yleaf is included with the kit as it is written in Python.

ArmandoR1b
06-26-2019, 02:38 AM
I can look into it.
Thanks. Just an idea to throw out there. Maybe an integrated tweak of http://www.y-str.org/2014/04/23andme-to-ysnps.html to take the Complete_SNPs_y.csv instead of 23andMe data. I edited it to work as a standalone program that can process the Complete_SNPs_y.csv file https://bit.ly/2NeMkTR If you are willing to try it, let me know what you think.



The source of Yleaf is included with the kit as it is written in Python.
Right, I found the hg19.txt file in your program files under Yleaf>Position Files.

teepean47
06-28-2019, 02:32 PM
Thanks. Just an idea to throw out there. Maybe an integrated tweak of http://www.y-str.org/2014/04/23andme-to-ysnps.html to take the Complete_SNPs_y.csv instead of 23andMe data. I edited it to work as a standalone program that can process the Complete_SNPs_y.csv file https://bit.ly/2NeMkTR If you are willing to try it, let me know what you think.


Right, I found the hg19.txt file in your program files under Yleaf>Position Files.

About the longhand names: my results do have them. For example I7275 gives following output:


chr  marker_name haplogroup                pos       mutation  anc  der  reads  called_perc  called_base  state  Description
chrY FGC4725     J1a2a1a2d2b2b2c3b3~       2810893   G->T      G    T    1      100          A            NA     Discordant genotype
chrY Y18093      Q1b2b1a2e1~               2853508   C->G      C    G    2      50           T            NA     Discordant genotype
chrY L1242       I1a3a1a2b~                6753291   G->T      G    T    1      100          A            NA     Discordant genotype
chrY M5606       CT                        7707388   T->G      T    G    2      50           A            NA     Discordant genotype
chrY Z46011      G2a2a2b3b~                7797337   A->G      A    G    1      100          C            NA     Discordant genotype
chrY Z12438      C1b1a1a                   7821105   G->T      G    T    1      100          A            NA     Discordant genotype
chrY YP1257      R1a1a1b1a3a2b1~           8472810   C->A      C    A    2      50           T            NA     Discordant genotype
chrY FGC59478    G2a2a1a1a1a2b~            8616697   G->A      G    A    14     100          T            NA     Discordant genotype
chrY Z12176      NO                        9978055   C->A      C    A    2      50           T            NA     Discordant genotype
chrY BY32169     I2b                       13464054  C->A      C    A    2      50           G            NA     Discordant genotype
chrY Y23139      I1a2a1a1d1a1a1b1~         13824095  A->T      A    T    1      100          G            NA     Discordant genotype
chrY AM01255     I2a1a2~                   14201162  T->G      T    G    1      100          C            NA     Discordant genotype
chrY Z35308      N1a1a1a1a3a1~             14775042  C->A      C    A    1      100          T            NA     Discordant genotype
chrY Z13912      H3b                       15103568  G->C      G    C    2      50           A            NA     Discordant genotype
chrY Y50468      I2a1b2a1a1                15270735  C->A      C    A    1      100          T            NA     Discordant genotype
chrY CTS5904     J2                        16542685  C->G      C    G    2      50           T            NA     Discordant genotype
chrY Z42059      S1d                       16971068  C->T      C    T    2      50           A            NA     Discordant genotype
chrY FGC26903    A00                       17566434  C->A      C    A    1      100          G            NA     Discordant genotype
chrY Y19555      I2a1b1a2a1a1~             17973524  C->A      C    A    3      100          T            NA     Discordant genotype
chrY MF16057     O1b1a1a1a1b1a1a1a1b       19122575  C->A      C    A    1      100          T            NA     Discordant genotype
chrY S2320       I1c1a1~                   19341119  G->T      G    T    1      100          A            NA     Discordant genotype
chrY Z18982      H2                        21807412  G->T      G    T    1      100          A            NA     Discordant genotype
chrY Z4084       C                         21810124  G->A      G    A    1      100          C            NA     Discordant genotype
chrY ZS9877      J1a2a1a1a~                21916926  G->T      G    T    2      50           A            NA     Discordant genotype
chrY ZS1616^^    J1a2a1a2d2b2b2c4b1c1e1a~  22254665  G->C      G    C    1      100          A            NA     Discordant genotype
chrY CTS11448    N1                        23115653  G->A      G    A    2      50           T            NA     Discordant genotype
chrY Z31172      M1~                       23635202  G->C      G    C    1      100          A            NA     Discordant genotype

ArmandoR1b
06-28-2019, 04:47 PM
About the longhand names: my results do have them. For example I7275 gives following output:
I'm looking for a Windows based solution. The output that you posted is from Yleaf which is Linux based. I don't get the output that you have when I use BAM Analysis Kit in Windows.

teepean47
06-28-2019, 06:11 PM
I'm looking for a Windows based solution. The output that you posted is from Yleaf which is Linux based. I don't get the output that you have when I use BAM Analysis Kit in Windows.

This was created with the Windows version.

ArmandoR1b
06-28-2019, 06:16 PM
This was created with the Windows version.

So why do I not get the same output as you?

teepean47
06-28-2019, 06:29 PM
So why do I not get the same output as you?

I have no idea. Could you point me to a BAM you have tried.

ArmandoR1b
06-28-2019, 08:13 PM
I have no idea. Could you point me to a BAM you have tried.

I don't think that it has anything to do with the BAM file but rather a setting or something similar. A file I recently tried is I5661 (ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR220/ERR2207066/I5661_1240k.bam) What are the steps you use and what is the name of the output file that has the longhand name? All I do is open BAM Analysis Kit.exe with only Y chromosome selected then I point to the BAM then when it is done the folder named "out" opens with the output files which includes Complete_SNPs_y.csv and when I open that file the only columns are SNP Name, Position, Derived, Reference, and Genotype.
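
The Access cross-reference described earlier could also be done with a short script. This is a rough sketch only. It assumes Complete_SNPs_y.csv has the header row SNP Name, Position, Derived, Reference, Genotype (the columns listed above), and that Yleaf's hg19.txt is tab-separated with the haplogroup name in the third column and the position in the fourth. The hg19.txt layout is an assumption and should be checked against the actual file. The function names are invented for the example.

```python
import csv


def load_longhand(position_file):
    """Map hg19 position -> longhand haplogroup name from a Yleaf-style file.

    Assumed layout (tab-separated, no header): chr, marker_name, haplogroup, pos, ...
    If several markers share a position, the last one wins.
    """
    longhand = {}
    for row in csv.reader(position_file, delimiter="\t"):
        if len(row) < 4 or not row[3].isdigit():
            continue  # skip blank or malformed lines
        longhand[int(row[3])] = row[2]
    return longhand


def add_longhand(snp_csv, longhand):
    """Return the SNP rows with an extra "Longhand" column, sorted by that column.

    Rows without a matching position get an empty name and are sorted last.
    """
    rows = []
    for row in csv.DictReader(snp_csv):
        row["Longhand"] = longhand.get(int(row["Position"]), "")
        rows.append(row)
    return sorted(rows, key=lambda r: (r["Longhand"] == "", r["Longhand"]))
```

Both functions take open file objects, so they can be tried on small in-memory samples before pointing them at real output.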

teepean47
06-28-2019, 09:17 PM
I don't think that it has anything to do with the BAM file but rather a setting or something similar. A file I recently tried is I5661 (ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR220/ERR2207066/I5661_1240k.bam) What are the steps you use and what is the name of the output file that has the longhand name? All I do is open BAM Analysis Kit.exe with only Y chromosome selected then I point to the BAM then when it is done the folder named "out" opens with the output files which includes Complete_SNPs_y.csv and when I open that file the only columns are SNP Name, Position, Derived, Reference, and Genotype.

Complete_SNPs_y.csv is generated by the older version made by Felix. The files generated by Yleaf are y-haplogroup.out, y-haplogroup.txt, y-haplogroup.fmf, y-haplogroup.chr. If you do not have these files then you are not running the latest version of BAM Analysis Kit.

ArmandoR1b
06-28-2019, 09:53 PM
Complete_SNPs_y.csv is generated by the older version made by Felix. The files generated by Yleaf are y-haplogroup.out, y-haplogroup.txt, y-haplogroup.fmf, y-haplogroup.chr. If you do not have these files then you are not running the latest version of BAM Analysis Kit.

I downloaded and extracted BAM Analysis Kit 2.0.92 and I run BAM Analysis Kit.exe from that folder. How do I have the wrong version if that is the latest version and it is what I am running on my PC?

ArmandoR1b
06-28-2019, 10:23 PM
I had never paid attention but there are errors before the program finishes.



Erasmus MC Department of Genetic Identification

Yleaf: software tool for human Y-chromosomal
phylogenetic analysis and haplogroup inference v2.0



|
/|\
/\|/\
\\\|///
\\|//
|||
|||
|||


A subdirectory or file C:\BAM already exists.
Error occurred while processing: C:\BAM.
A subdirectory or file Analysis already exists.
Error occurred while processing: Analysis.
A subdirectory or file Kit already exists.
Error occurred while processing: Kit.
A subdirectory or file 2.092\out already exists.
Error occurred while processing: 2.092\out.
Starting...
bam_sorted_realigned.bam
A subdirectory or file C:\BAM already exists.
Error occurred while processing: C:\BAM.
A subdirectory or file Analysis already exists.
Error occurred while processing: Analysis.
A subdirectory or file Kit already exists.
Error occurred while processing: Kit.
A subdirectory or file 2.092\out\bam_sorted_realigned already exists.
Error occurred while processing: 2.092\out\bam_sorted_realigned.
Traceback (most recent call last):
File "bin\yleaf\Yleafw.py", line 470, in <module>
output_file = samtools(threads, folder, folder_name, bam_file, args.Quality_thresh)
File "bin\yleaf\Yleafw.py", line 382, in samtools
header,total_reads = chromosome_table(bam_file,folder,file_name)
File "bin\yleaf\Yleafw.py", line 200, in chromosome_table
df_chromosome.to_csv(output, index=None, sep="\t")
File "C:\BAM Analysis Kit 2.092\bin\python373\lib\site-packages\pandas\core\generic.py", line 3020, in to_csv
formatter.save()
File "C:\BAM Analysis Kit 2.092\bin\python373\lib\site-packages\pandas\io\formats\csvs.py", line 157, in save
compression=self.compression)
File "C:\BAM Analysis Kit 2.092\bin\python373\lib\site-packages\pandas\io\common.py", line 424, in _get_handle
f = open(path_or_buf, mode, encoding=encoding, newline="")
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\BAM Analysis Kit 2.092\\out\\bam_sorted_realigned\\bam_sorted_realigned.chr'
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
Extracting Y-SNP markers ..

All Tasks Completed. Please find results in out subfolder.
Also check the logs/info in this window for errors (if any).
Press any key to continue . . .


The BAK program that Felix created would automatically clean up anything that needed cleaning up prior to processing. It would change the out directory to out.old. However, with your version I still get errors even if I manually delete directories such as C:\BAM and C:\BAM Analysis Kit 2.092\out\. Why are there two backslashes before each directory in
C:\\BAM Analysis Kit 2.092\\out\\bam_sorted_realigned\\bam_sorted_realigned.chr?

teepean47
06-29-2019, 07:33 AM
I noticed that path C:\BAM Analysis Kit 2.092 does have spaces so could you rename it to BAM-Analysis-Kit-2.092? The file downloaded from Github should not have spaces. Cygwin is very particular about not having spaces in path names which is one of the reasons I have been trying to eliminate Cygwin from BAM Analysis Kit.

Can you also send me the complete output? Use pastebin or similar.
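
A small Python illustration of both symptoms in that log, as a sketch rather than a diagnosis of the kit's scripts. Splitting the unquoted path on whitespace gives exactly the four fragments named in the "already exists" messages. The doubled backslashes come from Python printing the file name with `repr()` in its error message, so they are display-only. `has_spaces` is a made-up helper name.

```python
path = r"C:\BAM Analysis Kit 2.092\out"

# An unquoted path is split on whitespace by the command interpreter, so a
# command such as mkdir sees four separate arguments, not one directory name.
fragments = path.split()
print(fragments)

# Python shows file names in error messages using repr(), which escapes each
# backslash. The real path still contains single backslashes.
print(repr(path))
print(path)


def has_spaces(p):
    """True if the path contains whitespace that may trip up unquoted scripts."""
    return any(ch.isspace() for ch in p)
```

A check like `has_spaces` on the install directory before starting would catch this case early.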

ArmandoR1b
06-29-2019, 01:21 PM
I noticed that path C:\BAM Analysis Kit 2.092 does have spaces so could you rename it to BAM-Analysis-Kit-2.092? The file downloaded from Github should not have spaces. Cygwin is very particular about not having spaces in path names which is one of the reasons I have been trying to eliminate Cygwin from BAM Analysis Kit.

Can you also send me the complete output? Use pastebin or similar.

Renaming the directory worked. I now have y-haplogroup.out, y-haplogroup.txt, y-haplogroup.fmf and y-haplogroup.chr, and y-haplogroup.out has the longhand name like I was looking for. It doesn't have the headers in the same order that you had for I7275. My y-haplogroup.out has chr pos marker_name haplogroup mutation anc der reads called_perc called_base state and does not have Description. I don't see that as a problem though. The file y-haplogroup.txt must be the prediction output. It is hilariously bad. The BAM for esp005 (ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR222/ERR2224078/esp005.merged.hs37d5.fa.cons.90perc.bam) shows him to be positive for DF27 and P312 and ancestral for M405 R1b1a1b1a1a1, but y-haplogroup.txt contains a prediction of R-S25738, which is a subclade of R-U106.

Here is the log in pastebin. https://pastebin.com/UzAub4Fy
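
Checking a handful of key markers by hand, as done above for DF27, P312 and M405, can be scripted. A minimal sketch follows. It assumes y-haplogroup.out is tab-delimited with the header row quoted above, and `marker_states` is a hypothetical helper, not something Yleaf provides.

```python
import csv


def marker_states(out_file, markers):
    """Return {marker_name: (haplogroup, state)} for the requested markers.

    out_file: an open text file in the y-haplogroup.out layout, assumed to be
    tab-delimited with a header row that includes marker_name, haplogroup, state.
    Markers that are absent from the file are simply left out of the result.
    """
    wanted = set(markers)
    found = {}
    for row in csv.DictReader(out_file, delimiter="\t"):
        if row["marker_name"] in wanted:
            found[row["marker_name"]] = (row["haplogroup"], row["state"])
    return found
```

Calling `marker_states(handle, ["DF27", "P312", "M405"])` on the opened file would show at a glance whether the prediction in y-haplogroup.txt agrees with the individual calls.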

ArmandoR1b
06-29-2019, 04:46 PM
I updated the hg19.txt file to include synonyms of the SNPs since they were missing. It's annoying to see more commonly known SNPs such as M405 but not its synonym U106. https://bit.ly/2RIiWnv I replaced the original hg19.txt file with that file in the directory
C:\BAM-Analysis-Kit-2.092\bin\yleaf\Position_files It makes it easier to spot important SNPs.

teepean47
07-02-2019, 02:07 PM
I updated the hg19.txt file to include synonyms of the SNPs since they were missing. It's annoying to see more commonly known SNPs such as M405 but not its synonym U106. https://bit.ly/2RIiWnv I replaced the original hg19.txt file with that file in the directory
C:\BAM-Analysis-Kit-2.092\bin\yleaf\Position_files It makes it easier to spot important SNPs.

You should send it to the author:

https://github.com/genid/Yleaf

azzam
07-30-2019, 03:55 PM
I have released a new version of BAM Analysis kit. Please report any bugs you find!


https://github.com/teepean/BAM-Analysis-Kit-Linux/releases/tag/v2.09

Is it HG38 or HG19?

teepean47
08-01-2019, 03:13 PM
Is it HG38 or HG19?

HG19 at the moment.

teepean47
11-09-2019, 08:44 AM
I have released two new versions of BAM Analysis Kit:

"Normal" version 2.093, with a fix for a bug that manifested as "[W::sam_parse1] urecognized reference name; treated as unmapped"

https://github.com/teepean/BAM-Analysis-Kit/archive/v2.093.zip

A Lite version where I have removed most of the Cygwin binaries and deprecated lobSTR as it is not developed any longer. I have also updated Y-SNP reference, thanks to shadowhite

https://github.com/teepean/BAM-Analysis-Kit/archive/v3.01.zip

Future:

I have been investigating the possibility of including FreeBayes so that it could be used instead of GATK. FreeBayes has better support for aDNA and is considerably faster.

dufresda
11-10-2019, 01:28 AM
Hey teepean,

Awesome work again, by the way. I appreciated your fast response to the error in .092.

Currently testing the Lite version. It's running on a WGS BAM so it's not quite done yet, but I will report should there be any issues.

Tonev
01-02-2020, 02:05 PM
Hello, I need some help/clarification. I tested with Dante Labs in 2018 (the Chinese laboratory at that time). My original VCF file contained ~3718 lines of Y-DNA SNPs.
On the other hand, BAM Analysis Kit v2.092 processed the original BAM file from Dante and produced over 16 million lines of SNPs for the Y-DNA (bam_chrY.vcf.gz).
I deleted all lines with only Ref contents (and empty Alt positions), assuming that those match the reference. Yet the remaining part of the Y-DNA file (containing both Ref and Alt positions) was still huge: more than 33 thousand lines, instead of the 3718 lines of the Dante VCF file. Any thoughts/comments? teepean47?
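
The manual split into reference-only and variant lines can be reproduced with a few lines of Python. This is a sketch under the usual VCF column order (CHROM, POS, ID, REF, ALT, ...), where a "." in ALT means no alternate allele was called. Treating a lone `<NON_REF>` as reference-only is an added assumption for GATK-style output. The function names are illustrative.

```python
import gzip


def count_vcf_records(lines):
    """Count reference-only and variant records in an iterable of VCF lines."""
    counts = {"reference_only": 0, "variant": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        alt = fields[4]  # fifth VCF column is ALT
        if alt in (".", "", "<NON_REF>"):
            counts["reference_only"] += 1
        else:
            counts["variant"] += 1
    return counts


def count_vcf_file(path):
    """Same count for a gzip-compressed VCF such as bam_chrY.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return count_vcf_records(handle)
```

Counting this way only separates called sites from variant sites. It does not apply any quality or depth filter, which is one place where two pipelines can end up with very different totals.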

teepean47
01-02-2020, 04:55 PM
Dante Labs has a different method to call variants.

Tonev
01-02-2020, 05:16 PM
Dante Labs has a different method to call variants.

I did already see that- my pack from the Chinese laboratory from 2018 is different from 2019 kits from the Italian laboratory of Dante. Any other comments?

teepean47
01-02-2020, 06:29 PM
I did already see that- my pack from the Chinese laboratory from 2018 is different from 2019 kits from the Italian laboratory of Dante. Any other comments?

I have no idea how Dante Labs' variant calling process works or what criteria they have to accept SNPs.

SakaDo
01-02-2020, 07:44 PM
Happy New Year, Tonev!
In your opinion, can this sample from the Varna necropolis be linked to the so-called Thracian population, since the material culture definitely points to that? Or is the sample not reliable enough to be linked to anything specific?

Tonev
01-02-2020, 08:09 PM
Happy New Year, Tonev!
In your opinion, can this sample from the Varna necropolis be linked to the so-called Thracian population, since the material culture definitely points to that? Or is the sample not reliable enough to be linked to anything specific?

A good new 2020 to you too. I'm not an expert on the subject, but the necropolis is dated to the 5th-4th millennium BCE, while the Thracians as an ethnonym appear in the middle of the 2nd millennium BCE.

But every new wave inherits the genes and culture of those it finds in place, right? All the way down to us :)

A few links to our beloved Wikipedia: on 5000 BCE in Bulgarian (https://bg.wikipedia.org/wiki/5_%D1%85%D0%B8%D0%BB%D1%8F%D0%B4%D0%BE%D0%BB%D0%B5%D1%82%D0%B8%D0%B5_%D0%BF%D1%80.%D0%BD.%D0%B5.), in Russian (https://ru.wikipedia.org/wiki/5-%D0%B5_%D1%82%D1%8B%D1%81%D1%8F%D1%87%D0%B5%D0%BB%D0%B5%D1%82%D0%B8%D0%B5_%D0%B4%D0%BE_%D0%BD._%D1%8D.), on 4000 BCE in Bulgarian (https://bg.wikipedia.org/wiki/4_%D1%85%D0%B8%D0%BB%D1%8F%D0%B4%D0%BE%D0%BB%D0%B5%D1%82%D0%B8%D0%B5_%D0%BF%D1%80.%D0%BD.%D0%B5.) and on the Thracians (https://bg.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BA%D0%B8)

SakaDo
01-07-2020, 03:31 PM
The common name for the numerous "Thracians" appears late because it is an exonym given by the Greeks who later invaded our lands, but it is not the name by which we called ourselves. It is not an autonym.
Either way, the burial in question is associated with the Thraco-Scythian material culture. Besides, the language in these lands has not changed, and the oldest toponyms can still be explained through our mother tongue today.
Sorry about the "4" in place of "ч" in the original, my keyboard is messed up :)
https://www.facebook.com/SparotokPavelSerafimov/posts/459603164087652/

pepitus
01-22-2020, 07:51 PM
Good evening Teepean47. I am trying to use the software, but only version 1.8 works for me. With the other versions, when it gets to creating bam_complete_sort.bam or something like that, it doesn't do it and doesn't remove the .bam files, so it finishes in a few seconds, creating the files in the out directory but totally empty...

I am using the Windows one. Of the different 2.x and 3.x versions none work, only 1.8 does.

Can you help me?

Regards

pepitus
01-22-2020, 08:39 PM
It doesn't work with Linux either:

Sorting ...
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
samtools sort: truncated file. Aborting

Indexing the sorted BAM file ...
[E::hts_open_format] Failed to open file bam_complete_sorted.bam
samtools index: failed to open "bam_complete_sorted.bam": No such file or directory

teepean47
01-23-2020, 07:14 AM
Can you point me to the BAM you are trying to process.

pepitus
01-23-2020, 10:30 AM
Thanks for your reply, the BAM is from DanteLab, what do you need to know?

teepean47
01-23-2020, 01:56 PM
Thanks for your reply, the BAM is from DanteLab, what do you need to know?

Can you copy the whole output and paste it here or to pastebin. It looks like something is corrupted.

I would like to download the BAM as well.

pepitus
01-24-2020, 12:40 PM
It seems to be corrupted but at the same time it seems to be ok...

[email protected]:~/BAM-Analysis-Kit-Linux$ samtools quickcheck -v my_bam.bam > bad_bams.fofn && echo 'all ok' || echo 'some files failed check, see bad_bams.fofn'
all ok

[email protected]:~/BAM-Analysis-Kit-Linux$ samtools flagstat my_bam.bam
[E::bgzf_uncompress] Inflate operation failed: 3
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
[bam_flagstat_core] Truncated file? Continue anyway.
937865035 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1192053 + 0 supplementary
68527680 + 0 duplicates
917589400 + 0 mapped (97.84% : N/A)
936672982 + 0 paired in sequencing
468336491 + 0 read1
468336491 + 0 read2
907962854 + 0 properly paired (96.93% : N/A)
914178332 + 0 with itself and mate mapped
2219015 + 0 singletons (0.24% : N/A)
3034226 + 0 with mate mapped to a different chr
1458950 + 0 with mate mapped to a different chr (mapQ>=5)
[email protected]:~/BAM-Analysis-Kit-Linux$ tail -c 28 my_bam.bam | hexdump -C
00000000 1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 |............BC..|
00000010 1b 00 03 00 00 00 00 00 00 00 00 00 |............|
0000001c

[email protected]:~/BAM-Analysis-Kit-Linux$ samtools quickcheck -qvvv my_bam.bam
verbosity set to 3
checking my_bam.bam
opened my_bam.bam
my_bam.bam is sequence data
my_bam.bam has 86 targets in header.
my_bam.bam has good EOF block.

I will post the program's output this afternoon. The truth is that with version 1.8 on Windows I have no problems... but with the rest... a lot.

pepitus
01-24-2020, 01:07 PM
[email protected]:~/BAM-Analysis-Kit-Linux$ samtools quickcheck -v my_bam.bam > bad_bams.fofn && echo 'all ok' || echo 'some files failed check, see bad_bams.fofn'
all ok
[email protected]:~/BAM-Analysis-Kit-Linux$ samtools flagstat GFX0237712.bam
[E::bgzf_uncompress] Inflate operation failed: 3
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
[bam_flagstat_core] Truncated file? Continue anyway.
937865035 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
1192053 + 0 supplementary
68527680 + 0 duplicates
917589400 + 0 mapped (97.84% : N/A)
936672982 + 0 paired in sequencing
468336491 + 0 read1
468336491 + 0 read2
907962854 + 0 properly paired (96.93% : N/A)
914178332 + 0 with itself and mate mapped
2219015 + 0 singleton…
[email protected]:~/BAM-Analysis-Kit-Linux$ samtools view my_bam.bam | tail
[E::bgzf_uncompress] Inflate operation failed: 3
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
[main_samview] truncated file.
A00910:60:HYH7LDSXX:4:1448:17445:32863 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ################################### RG:Z:1
A00910:60:HYH7LDSXX:4:1448:17445:32863 141 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ################################### RG:Z:1
A00910:60:HYH7LDSXX:2:2326:14226:4241 77 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ################################### RG:Z:1
A00910:60:HYH7LDSXX:2:2326:14226:4241 141 * 0 0 * * 0 0 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN ################################### RG:Z:1
A00910:60:HYH7LDSXX:2:1208:12771:16470 77 * 0 0 * * 0 0 GTTGTGGGTATTGTTGTTTTCCTATGCAGGCTACTTCTTCGCCAATATCC CCGTCGTCAAAAACAATCTGGGCTTAGTCATGGCGGCAATCATCGTGATT TCGATTTTACCTGCGGTAGTCGAAATTATCCGTGCCAAACTTAACGCCAA A FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFF,FFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF FFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF F RG:Z:1
... (And more)

teepean47
01-24-2020, 04:12 PM
I will post the program's output this afternoon. The truth is that with version 1.8 on Windows I have no problems... but with the rest... a lot.

Version 1.8 uses an older version of samtools that is more forgiving of errors.

EDIT: I am interested in the point at which the error occurs. It also does look like the BAM is corrupted, so I recommend downloading it again and comparing it to the copy you downloaded previously.
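
The outputs earlier in the thread show why this can be confusing: samtools quickcheck reported a good EOF block, while flagstat hit an inflate error partway through the file. A BAM is a series of BGZF blocks, so damage in the middle of the file only shows up when every block is actually read. Below is a minimal Python sketch of that idea, assuming the standard BGZF block layout. The helper names has_eof_marker and first_bad_bgzf_block are hypothetical; they are not part of BAM Analysis Kit or samtools.

```python
import struct
import zlib

# The 28-byte empty BGZF block that terminates a complete BAM file
# (the same bytes as in the hexdump posted above).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_eof_marker(path):
    """Return True if the file ends with the BGZF EOF block."""
    with open(path, "rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        if fh.tell() < len(BGZF_EOF):
            return False
        fh.seek(-len(BGZF_EOF), 2)
        return fh.read() == BGZF_EOF


def first_bad_bgzf_block(path):
    """Return the byte offset of the first BGZF block that cannot be
    inflated, or None if every block in the file is readable."""
    with open(path, "rb") as fh:
        offset = 0
        while True:
            header = fh.read(12)
            if not header:
                return None  # reached the end without problems
            if len(header) < 12 or header[:4] != b"\x1f\x8b\x08\x04":
                return offset
            xlen = struct.unpack("<H", header[10:12])[0]
            extra = fh.read(xlen)
            if len(extra) < xlen:
                return offset
            # Look for the "BC" subfield, which stores (block size - 1).
            bsize = None
            pos = 0
            while pos + 4 <= len(extra):
                si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
                if si1 == 66 and si2 == 67 and slen == 2:
                    bsize = struct.unpack("<H", extra[pos + 4:pos + 6])[0]
                    break
                pos += 4 + slen
            if bsize is None:
                return offset
            rest_len = bsize + 1 - 12 - xlen
            rest = fh.read(max(rest_len, 0))
            if rest_len < 8 or len(rest) < rest_len:
                return offset
            try:
                # wbits=31 makes zlib parse the gzip wrapper and check the CRC.
                zlib.decompress(header + extra + rest, wbits=31)
            except zlib.error:
                return offset
            offset += bsize + 1
```

A file can pass has_eof_marker and still make first_bad_bgzf_block return an offset, which matches the "all ok" followed by "Truncated file?" pattern seen above. Comparing checksums of the old and new downloads is still the simpler test when both copies are available.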

pepitus
01-24-2020, 09:16 PM
I have downloaded the BAM file again, and now on my computer, with the Linux version, IT IS WORKING! It seems you were completely right. I will report back if something goes wrong, and also if it goes well. Thanks a lot!

pepitus
01-24-2020, 11:38 PM
After a lot of these warnings, it is now processing chr1. This happened once bam_complete_sorted.bam was created:

[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
... and more like these

BAM-Linux-v.2.09

As I said, it is now calculating. Are these messages a bad sign?

Regards

teepean47
01-25-2020, 05:25 AM
It is not dangerous and is probably due to the way BAM Analysis Kit handles BAMs on a per-chromosome basis. I will update GitHub later today.
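
To illustrate the kind of situation that produces this warning, here is a small Python sketch. It assumes the per-chromosome step passes along SAM text whose header lists only one chromosome, so a mate placed on another chromosome has a reference name the parser does not know. That cause is an assumption drawn from the explanation above, not something confirmed from console_bam.sh, and count_unknown_mates is a hypothetical helper.

```python
def count_unknown_mates(sam_lines):
    """Count SAM records whose mate reference (RNEXT, column 7) is
    neither '=', '*', nor a name declared in an @SQ header line."""
    known_refs = set()
    unknown = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        known_refs.add(field[3:])
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        rnext = fields[6]
        if rnext not in ("=", "*") and rnext not in known_refs:
            unknown += 1
    return unknown


# Illustrative records only; the header declares chr1 and nothing else.
example = [
    "@SQ\tSN:chr1\tLN:248956422",
    "r1\t99\tchr1\t100\t60\t5M\t=\t300\t205\tACGTA\tFFFFF",
    "r2\t65\tchr1\t500\t60\t5M\tchr5\t900\t0\tACGTA\tFFFFF",
    "r3\t73\tchr1\t700\t60\t5M\t*\t0\t0\tACGTA\tFFFFF",
]
print(count_unknown_mates(example))  # prints 1 (only r2 has a mate on chr5)
```

Reads like r2 are still processed; only the mate information is dropped, which fits the description of the warning as harmless.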

pepitus
01-25-2020, 07:30 AM
Thanks :)

teepean47
01-25-2020, 07:35 AM
Thanks :)

I have updated console_bam.sh, please test.

pepitus
01-25-2020, 06:34 PM
mmm still does it :(

... (A lot of)
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
[W::sam_parse1] urecognized mate reference name; treated as unmapped
Adding or Replace Read Group Header ...
INFO 2020-01-25 19:32:28 AddOrReplaceReadGroups

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
********** AddOrReplaceReadGroups -INPUT bam_wh_tmp.bam -OUTPUT bam_wh.bam -SORT_ORDER coordinate -RGID rgid -RGLB rglib -RGPL illumina -RGPU rgpu -RGSM sample -VALIDATION_STRINGENCY SILENT
**********


19:32:28.617 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/home/XXX/BAM-Analysis-Kit-Linux/bin/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat Jan 25 19:32:28 CET 2020] AddOrReplaceReadGroups INPUT=bam_wh_tmp.bam OUTPUT=bam_wh.bam SORT_ORDER=coordinate RGID=rgid RGLB=rglib RGPL=illumina RGPU=rgpu RGSM=sample VALIDATION_STRINGENCY=SILENT VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat Jan 25 19:32:28 CET 2020] Executing as [email protected] on Linux 4.19.0-7-amd64 amd64; OpenJDK 64-Bit Server VM 1.8.0_222-8u222-b10-1~deb9u1-b10; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.18.16-SNAPSHOT
INFO 2020-01-25 19:32:28 AddOrReplaceReadGroups Created read-group ID=rgid PL=illumina LB=rglib SM=sample

INFO 2020-01-25 19:32:39 AddOrReplaceReadGroups Processed 1.000.000 records. Elapsed time: 00:00:10s. Time for last 1.000.000: 10s. Last read position: chr1:2.763.215
INFO 2020-01-25 19:32:48 AddOrReplaceReadGroups Processed 2.000.000 r

pepitus
01-25-2020, 08:52 PM
INFO 21:33:02,339 ProgressMeter - done 2.49250621E8 64.8 m 15.0 s 100.0% 64.8 m 0.0 s
INFO 21:33:02,339 ProgressMeter - Total runtime 3888.13 secs, 64.80 min, 1.08 hours
INFO 21:33:02,340 MicroScheduler - 4227223 reads were filtered out during the traversal out of approximately 72216137 total reads (5.85%)
INFO 21:33:02,340 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
INFO 21:33:02,341 MicroScheduler - -> 232396 reads (0.32% of total) failing BadMateFilter
INFO 21:33:02,342 MicroScheduler - -> 3831284 reads (5.31% of total) failing DuplicateReadFilter
INFO 21:33:02,343 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
INFO 21:33:02,344 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
INFO 21:33:02,344 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
INFO 21:33:02,345 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter
INFO 21:33:02,345 MicroScheduler - -> 163543 reads (0.23% of total) failing UnmappedReadFilter
------------------------------------------------------------------------------------------
Done. There were 1 WARN messages, the first 1 are repeated below.
WARN 20:28:14,256 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
------------------------------------------------------------------------------------------

Is the warning dangerous? This is from the chr1 part.

teepean47
01-25-2020, 09:00 PM
No. GATK warns that it needs at least 10 samples to perform an InbreedingCoeff calculation, and BAM Analysis Kit handles only one BAM at a time.

teepean47
01-25-2020, 09:01 PM
mmm still does it :(


It is fixed now. I did the previous fix too early in the morning :)

Hodo Scariti
02-08-2020, 11:10 AM
A huge effort, thanks Teepean.

A question: is there a GUI for the Linux version?

teepean47
02-08-2020, 03:36 PM
A huge effort, thanks Teepean.

A question: is there a GUI for the Linux version?

Thanks!

I have been thinking about creating a GUI for Linux as well but haven't had the time to do that.

Hodo Scariti
02-10-2020, 07:22 AM
Thanks!

I have been thinking about creating a GUI for Linux as well but haven't had the time to do that.

Yes, I think a GUI would be useful for Linux users who are not very comfortable with the console... but I also think that you have already done wonderful and really exhausting work.

pepitus
02-21-2020, 12:03 AM
Hello again. Would it be possible to use your software on an Ethernet cluster with 32 cores? What about memory limitations? It would be this one: https://www.hardkernel.com/shop/odroid-mc1-my-cluster-one-with-32-cpu-cores-and-8gb-dram/

Regards

teepean47
02-21-2020, 05:38 AM
Hello again. Would it be possible to use your software on an Ethernet cluster with 32 cores? What about memory limitations? It would be this one: https://www.hardkernel.com/shop/odroid-mc1-my-cluster-one-with-32-cpu-cores-and-8gb-dram/

Regards

The Linux version is just a script and all the included programs use Java or Python. All the steps that use GATK would have to be rewritten to use a cluster. I have no experience with that and do not have access to any cluster.

werno
10-05-2020, 09:12 AM
Hello,

I have analyzed my BAM files with your tool, bam-analysis-kit 1.8 and 2.0, and I have found a discrepancy in the HTML SNPedia report for chromosome 17. I did not check other variations or chromosomes.

For example your tool found the following variations:

rs59328451
Location: 17:39766801
Your Genotype: TT
Summary: homozygote for pachyonychia congenita Type I mutation (more ..)

rs57424749
Location: 17:39768561
Your Genotype: CC
Summary: homozygote for pachyonychia congenita Type I mutation (more ..)

and in the SNPedia it is as follows, respectively

Geno   Mag  Summary
(A;A)  0    normal
(A;T)  3    heterozygote for pachyonychia congenita Type I mutation
(T;T)  3    homozygote for pachyonychia congenita Type I mutation

Geno   Mag  Summary
(C;C)  3    homozygote for pachyonychia congenita Type I mutation
(C;G)  3    heterozygote for pachyonychia congenita Type I mutation
(G;G)  0    normal

According to these facts, your tool evaluates the variations as pathogenic, but that is wrong. It seems that your tool does not check the orientation (plus or minus) of the strand.

I have checked my data in IGV Viewer and at https://www.ncbi.nlm.nih.gov/snp, and in my case these variants are not pathogenic.

What do you think about this?

The second problem I have is that bam-analysis-kit 1.8 and 2.0 cannot analyze chromosomes 2, 9, 19 and 20 in my BAM files. It is always the same chromosomes, for all BAM files. The BAM files are from two different institutes, possibly sequenced on the same machine, but I do not have that information. One BAM file was created 4 years ago, the others last year. I have no such problem with, e.g., current samtools or GATK.

Thanks in advance for feedback.

With best regards

werno

teepean47
10-05-2020, 09:58 AM
Snpedia analysis was written by the original author.
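
For readers following the strand question, here is a minimal Python sketch of the check being asked about. It assumes the BAM-derived genotype is on the plus strand and that SNPedia lists these two SNPs in minus orientation; that orientation is an assumption for illustration and has not been checked against SNPedia. The summaries are copied from the tables quoted in the post above, and in_listed_orientation and summary are hypothetical helper names, not functions from BAM Analysis Kit.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Genotype summaries as quoted from SNPedia in the post above.
SNPEDIA = {
    "rs59328451": {
        "AA": "normal",
        "AT": "heterozygote for pachyonychia congenita Type I mutation",
        "TT": "homozygote for pachyonychia congenita Type I mutation",
    },
    "rs57424749": {
        "CC": "homozygote for pachyonychia congenita Type I mutation",
        "CG": "heterozygote for pachyonychia congenita Type I mutation",
        "GG": "normal",
    },
}


def in_listed_orientation(genotype, orientation):
    """Rewrite a plus-strand genotype in the orientation the SNP is listed in.
    For 'minus', each base is complemented; alleles are then sorted so that
    'TA' and 'AT' map to the same key."""
    bases = genotype.upper()
    if orientation == "minus":
        bases = "".join(COMPLEMENT[b] for b in bases)
    return "".join(sorted(bases))


def summary(rsid, plus_strand_genotype, orientation):
    """Look up the summary after converting the genotype's orientation."""
    key = in_listed_orientation(plus_strand_genotype, orientation)
    return SNPEDIA[rsid].get(key, "no summary for " + key)


# The orientation argument is the assumption under test here.
print(summary("rs59328451", "TT", "minus"))  # prints: normal
print(summary("rs57424749", "CC", "minus"))  # prints: normal
```

Calling summary with "plus" instead returns the homozygote summaries shown in the report, so a lookup that skips the orientation step would produce exactly the discrepancy described.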



The second problem I have is that bam-analysis-kit 1.8 and 2.0 cannot analyze chromosomes 2, 9, 19 and 20 in my BAM files. It is always the same chromosomes, for all BAM files. The BAM files are from two different institutes, possibly sequenced on the same machine, but I do not have that information. One BAM file was created 4 years ago, the others last year. I have no such problem with, e.g., current samtools or GATK.

Would it be possible for you to share one of the BAMs and/or copy the output from BAM Analysis Kit?