Using the correct Y chromosome coverage statistics



FGC Corp
03-22-2019, 10:30 PM
Comment by FGC team:

"There are only 23,673,595 non-N bases in the non-recombining chrY GRCh38 reference sequence (including the unplaced chrY contig), that is, excluding the pseudoautosomal regions (PARs; https://en.wikipedia.org/wiki/Pseudoautosomal_region#Location). Human genome reference sequences, such as the hs38DH sequence used by the 1000 Genomes Project and FGC, mask the chrY PARs with N's, so reads from those regions get mapped to the chrX PARs instead. Since the PARs are not relevant for analyses of paternal-line chrY inheritance, those regions should be excluded.

Accordingly we think the 23,673,595 number is a better number to use, rather than the 26,000,000 number that is often quoted."

Put differently:
We think the 26,000,000 number is incorrect.

For example, people are comparing 30x WGS stats from us and other companies. The difference is that we exclude the PARs.

The technology is the same, as is the SNP yield.
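
To illustrate the denominator point above, here is a minimal sketch of how the same covered-base count turns into two different coverage percentages. The two denominators come from the comment; the `covered` value is a made-up number for illustration only, and the sketch assumes that count already excludes the PARs.

```python
# Denominators taken from the thread.
QUOTED_DENOMINATOR = 26_000_000    # the figure that is "often quoted"
NON_PAR_DENOMINATOR = 23_673_595   # non-N, non-recombining chrY bases (GRCh38) per the FGC comment


def coverage_percent(covered_bases: int, denominator: int) -> float:
    """Return covered bases as a percentage of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * covered_bases / denominator


# Hypothetical covered-base count, not taken from any real test result.
covered = 14_000_000

for label, denominator in (
    ("against 26,000,000", QUOTED_DENOMINATOR),
    ("against 23,673,595", NON_PAR_DENOMINATOR),
):
    print(f"{label}: {coverage_percent(covered, denominator):.2f}%")
```

The sequencing result is identical in both lines of output; only the denominator changes, which is why percentages computed against different denominators are not directly comparable.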

FGC Corp
03-22-2019, 11:36 PM
If one service represents its 30x WGS as better because it yields 26,000,000 vs 23,673,595, then I think that is a misrepresentation of the facts.

JamesKane
03-23-2019, 12:14 AM
The absolute number of bases covered in a WGS test is pretty meaningless anyway. In my opinion, any location with fewer than 2 reads is not useful except under very special circumstances. That’s the whole reason I dislike YFull’s coverage statistics and started publishing a set using more generally accepted heuristics.

The PARs that you are calling out here should just be filtered since, as you note, they recombine with chrX. That makes any mutation found there suspect, since it could have originated on a maternal line.
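
A rough sketch of that kind of heuristic, for anyone who wants to try it: count only positions with at least 2 reads and skip anything inside the PARs. The PAR boundaries below are my assumption for GRCh38 chrY (1-based, inclusive) and should be checked against the reference you actually use; the function names and the sample depths are invented for the example.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive

# ASSUMED GRCh38 chrY PAR1 and PAR2 boundaries; verify before relying on them.
ASSUMED_PARS_GRCH38: List[Interval] = [
    (10_001, 2_781_479),
    (56_887_903, 57_217_415),
]


def in_any_interval(pos: int, intervals: Iterable[Interval]) -> bool:
    """True if pos falls inside at least one of the intervals."""
    return any(start <= pos <= end for start, end in intervals)


def count_usable_positions(
    depths: Iterable[Tuple[int, int]],
    min_depth: int = 2,
    excluded: Iterable[Interval] = ASSUMED_PARS_GRCH38,
) -> int:
    """Count (position, depth) pairs that meet min_depth and lie outside the excluded regions."""
    excluded = list(excluded)
    return sum(
        1
        for pos, depth in depths
        if depth >= min_depth and not in_any_interval(pos, excluded)
    )


# Invented per-position depths on chrY, for illustration only.
sample_depths = [
    (1_000_000, 30),   # inside assumed PAR1: skipped
    (5_000_000, 1),    # only one read: skipped
    (5_000_001, 2),    # counted
    (20_000_000, 8),   # counted
    (57_000_000, 12),  # inside assumed PAR2: skipped
]
print(count_usable_positions(sample_depths))
```

In practice the per-position depths would come from a depth report generated from the BAM rather than a hand-written list.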

FGC Corp
03-23-2019, 01:27 AM
The issue here is that some people are relying on those statistics. I think the technical point isn't clear to some people in the community.

FGC Corp
03-26-2019, 03:23 PM
YFull is still using the 26,000,000 statistic.

MacUalraig
03-26-2019, 04:11 PM
To be fair, though, I'm not sure any actual vendor has cited the 26M number. In fact, one vendor I can think of publicly states that SNPs in the PARs are 'not useful for phylogenetic studies', so they would only be contradicting themselves ;-)

By the way, Greg M. once told me that even a single read can sometimes be relied upon. If I recall correctly, he may have been talking about a call for a known SNP rather than novel discovery. I keep meaning to go back to the x2 and x4 data I got from FGC and do a full true/false positive analysis. Again in terms of known SNPs (i.e. my Y Elite SNPs), my x2 gave only one false negative, and that was from a single read that Greg then said was misaligned. I realigned GWK3W to hg38 the other day and it still shows ancestral for that SNP, so I think the read is still misaligned.
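
For what it's worth, the known-SNP comparison described above could be tallied along these lines. This is only a sketch: the SNP names and calls are placeholders, the high-coverage result is simply assumed to be the truth set, and `tally_known_snps` is a made-up helper rather than part of any vendor tool.

```python
from typing import Dict


def tally_known_snps(truth: Dict[str, str], calls: Dict[str, str]) -> Dict[str, int]:
    """Compare low-coverage calls with a trusted set of known SNP states.

    Both dicts map SNP name -> "derived" or "ancestral".
    SNPs missing from `calls` are counted as no-calls.
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0, "no_call": 0}
    for snp, expected in truth.items():
        observed = calls.get(snp)
        if observed is None:
            counts["no_call"] += 1
        elif expected == "derived":
            counts["TP" if observed == "derived" else "FN"] += 1
        else:
            counts["FP" if observed == "derived" else "TN"] += 1
    return counts


# Placeholder data: high-coverage states treated as truth, low-coverage calls under test.
truth = {"SNP_A": "derived", "SNP_B": "derived", "SNP_C": "ancestral", "SNP_D": "derived"}
low_cov_calls = {"SNP_A": "derived", "SNP_B": "ancestral", "SNP_C": "ancestral"}

print(tally_known_snps(truth, low_cov_calls))
```

Here SNP_B plays the role of the single false negative (derived in the truth set, ancestral in the low-coverage call), and SNP_D is a position with no usable reads.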

FGC Corp
04-07-2019, 07:11 PM
Good news:
The YFull team is updating their coverage statistics to reflect our recommendations.

MitchellSince1893
04-07-2019, 07:28 PM
Does this impact their dating methodology? I ask because my father has done Big Y-500 and Y Elite 2.1, and their dates are about 500 years apart: Y Elite gives 1243 ybp and Big Y gives 733 ybp.

MacUalraig
04-07-2019, 07:32 PM
The disputed figure was the total coverage length, expressed as a percentage of 26 Mb rather than 23 Mb. Their dating uses a subset of the uncontroversial regions.