03-12-2018, 03:44 AM
I'm trying to use the BAM Analysis Kit from http://www.y-str.org/2014/04/bam-analysis-kit.html to play around with some ancient samples, and I've been advised that it can take "hours, if not days" to analyze files.

Well, it's been about 24 hours so far (I left the computer on overnight). I'm using the tool to analyze a 3.3 GB BAM file from Broushaki et al. 2016 (a small-to-medium-sized file relative to some others, from what I can tell), but it still appears to be going slower than expected.

It took ~22 hours (even if my PC slept overnight, still at least 15 hours) just to get the notification "INITIALIZATION COMPLETE; STARTING PROCESSING"! Now the CPU is cranking up and working on chromosome 1.

I used all the "default" settings: processing all 23 chromosomes, Y chr, X chr, telomere set, Y-STR set, 4 parallel threads.

My hardware:

Acer Aspire E5 575-33bm
Windows 10 64 bit
Intel Core i3-7100U (Kaby Lake) @ 2.4 GHz (2 cores, 4 threads), 3 MB L3 cache
8 GB 2200 MHz DDR4 dual-channel RAM
1 TB 5400 RPM hard drive

I'd thought the CPU would be the "weak link" for this program, but CPU and RAM usage have only spiked occasionally. So it occurs to me that the CPU may be constantly waiting on my hard drive, and that the drive is the real weak link.
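One rough way to check this suspicion is to time a plain sequential read of the BAM file and see what throughput the drive actually delivers. The sketch below is only an illustration: `measure_read_throughput` is a hypothetical helper name, not part of the BAM Analysis Kit, and the 500 MB cap is an arbitrary choice to keep the test short. Note that a second run on the same file can be inflated by the operating system's file cache.

```python
import sys
import time


def measure_read_throughput(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially and return (bytes_read, seconds, mb_per_s).

    Hypothetical helper for illustration only. If `max_bytes` is given,
    reading stops once at least that many bytes have been read, so the
    total may overshoot the cap by up to one chunk.
    """
    bytes_read = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or bytes_read < max_bytes:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file
                break
            bytes_read += len(chunk)
    seconds = time.perf_counter() - start
    mb_per_s = bytes_read / 1e6 / seconds if seconds > 0 else float("inf")
    return bytes_read, seconds, mb_per_s


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_test.py path\to\sample.bam
    n, secs, rate = measure_read_throughput(sys.argv[1], max_bytes=500 * 1024 * 1024)
    print(f"read {n / 1e6:.0f} MB in {secs:.1f} s ({rate:.0f} MB/s)")
```

If the measured rate is close to what the drive benchmarks at, plain sequential reading is probably not what is taking hours, and random access or the tool's own intermediate files would be the more likely suspects.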

I was going to purchase a solid-state drive anyway; my question is whether my intuition is correct that the hard drive is the main bottleneck for this program. The SSD I'm looking at has much higher sequential and random read and write speeds than my HDD. Below are the CrystalDiskMark benchmark results from my current hard drive:


versus a reviewer's benchmark of the internal SSD I'm looking at buying (250 GB Samsung 860 EVO M.2):


Would loading these BAM files from an SSD produce the kind of massive speed boost I'm envisioning, particularly in the "initialization" phase? Also, would I need to place the BAM Analysis Kit program files on the SSD as well to get the full benefit?
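As a back-of-the-envelope check on the sequential part of the question, the time for one full pass over the file is just size divided by throughput. The throughput figures in the sketch below are placeholder assumptions (roughly typical orders of magnitude for a laptop HDD and a SATA SSD), not the benchmark results mentioned above, and `sequential_read_seconds` is a hypothetical helper name.

```python
def sequential_read_seconds(size_gb, mb_per_s):
    """Seconds needed to read `size_gb` gigabytes once at `mb_per_s` MB/s.

    Uses decimal units (1 GB = 1000 MB), as drive vendors do.
    """
    if mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1000 / mb_per_s


if __name__ == "__main__":
    bam_size_gb = 3.3  # size of the BAM file from this post
    # Assumed throughputs for illustration only; substitute measured values.
    for label, rate in [("HDD (assumed 100 MB/s)", 100), ("SSD (assumed 500 MB/s)", 500)]:
        secs = sequential_read_seconds(bam_size_gb, rate)
        print(f"{label}: about {secs:.0f} s per full pass")
```

Under those assumptions a single pass takes well under a minute on either drive, so if the initialization really is disk-bound, it would have to come from many repeated passes or from random (seek-heavy) access rather than from one sequential read. That is where an SSD would be expected to help most, though I can't confirm how the kit actually accesses the file.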

Any tips/advice would be great...

03-15-2018, 12:10 PM
I advise looking for the same samples in PLINK format, which is much easier to handle.
I used to waste time analyzing BAMs, but most of them were available in better formats.