Wing Genealogist
01-05-2017, 11:45 AM
I received permission from Thomas Krahn to cross post his message on this forum:

I'm not sure if anyone noticed DebbieK's post on Facebook mid December.
(This was the time when everybody was running after coupon codes like blind chickens)

She mentioned a nice DIY genome sequencing project by Clive G. Brown, who is actually the CTO of Oxford Nanopore.

Here are some links:
I have actually downloaded the FastQ files and mapped them to the hg19 reference. We all know about the error rate issues of Nanopore reads (an error every fifth base or so). However, with the huge amount of data Clive produced, I was surprised to get quite usable results. Here is the Y chromosome BAM file I separated out:


It's quite easy to trace down the Y-haplogroup to
R1b-U106 > Z18 > Z237 > S4031 > S2307 > CTS5533 > S4034 > S3201/S6986
He is negative for the downstream SNP S6979 and, interestingly, also for S6980, which kicks this node one notch deeper in the phylogeny.
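
For anyone who wants to follow the logic rather than the screenshots, here is a minimal sketch of how such a path is walked: go down the SNP list from the root and stop at the first negative (ancestral) call. The function name and the "derived" / "ancestral" / "no-call" labels are made up for illustration, and the calls dictionary only restates the result quoted above. It is not computed from the BAM file.

```python
def deepest_confirmed(path, calls):
    """Walk a root-to-tip SNP path and return the SNPs confirmed as derived."""
    confirmed = []
    for snp in path:
        state = calls.get(snp, "no-call")
        if state == "derived":
            confirmed.append(snp)
        elif state == "ancestral":
            break  # a negative call ends the lineage above this SNP
        # a no-call neither confirms nor ends the walk, so it is skipped
    return confirmed


# Path as quoted in the post, with S6979 appended as the downstream SNP.
path = ["U106", "Z18", "Z237", "S4031", "S2307",
        "CTS5533", "S4034", "S3201", "S6979"]

# Hand-entered states restating the post's result (illustration only).
calls = {snp: "derived" for snp in path[:-1]}
calls["S6979"] = "ancestral"

print(" > ".join(deepest_confirmed(path, calls)))
```

A no-call is deliberately skipped instead of ending the walk, since a missing position in noisy data says nothing either way.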

This can all be done through visual inspection of the BAM file. However, I have a hard time using common SNP callers (like bcftools call, freebayes) on these error-prone reads. We certainly want to exclude indels from the gappy read alignments, because it doesn't make sense to interpret anything from them. But even so, the SNP calls are either too noisy or there is not enough coverage for automated SNP calling.
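
One way to picture what "too noisy or not enough coverage" means for a single position is a plain majority vote over the bases seen in the pileup, with gap characters dropped so indels never enter the count. This is only a sketch and not how bcftools or freebayes work internally. The function name and the thresholds (min_depth=8, min_frac=0.7) are arbitrary assumptions chosen for illustration, not values from Thomas's analysis.

```python
from collections import Counter


def call_site(bases, ref_base, min_depth=8, min_frac=0.7):
    """Return 'ref', 'alt:X' or 'no-call' for the bases observed at one position."""
    # Count only plain A/C/G/T observations. Gap symbols ('*', '-') and N
    # are dropped, which keeps indel noise out of the vote.
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return "no-call"  # not enough coverage (assumed threshold)
    top_base, top_count = counts.most_common(1)[0]
    if top_count / depth < min_frac:
        return "no-call"  # too noisy: no base clearly dominates
    return "ref" if top_base == ref_base.upper() else "alt:" + top_base


print(call_site("GGGGGGGGAG", "A"))   # nine G and one A at depth 10
print(call_site("AAAA*A-AAAT", "A"))  # gaps are ignored in the count
print(call_site("AAG", "A"))          # only three usable bases
```

With an error every fifth base or so, a position needs a fair number of reads before one base clearly dominates, which is why both thresholds matter.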

I strongly suggest contacting Clive Brown and inviting him to this group. Contact with the CTO of Nanopore should benefit both sides.

Thomas mentions needing to go in by hand due to the large number of misreads. It may be possible to automate the process so that reads from this technology can be used with the common SNP callers Thomas mentions.