WGS Extract Help



Scythoslav
07-22-2021, 07:48 PM
How do I align my bam file to use hs37d5 reference as the source?

capsian
08-26-2021, 01:45 AM

What do you mean?
Do you mean quality?

SurplusGadgets2003
09-12-2021, 08:59 PM
Sorry, just saw this. The new update of 10 Jul 2021 makes this even easier.

No matter which method you use, make sure you have downloaded and indexed the hs37d5 reference model. This was done automatically if you upgraded from v2; otherwise it is part of the Install / Upgrade script in v3. That script is re-entrant, so you can run it again if you are not sure. The last action in the script is the prompt to download and index reference genomes. The standard source for hs37d5 is the NIH server. Some users in Europe have trouble accessing it, so there is also an alternative source on the EBI server in the UK.
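The install script handles indexing for you. For anyone curious what "indexing a reference" involves, here is a minimal sketch, assuming BWA and samtools are installed and the reference is an uncompressed FASTA at the hypothetical path `hs37d5.fa`. It only prints the generic index commands (`samtools faidx` and `bwa index`) by default; it is not necessarily what the WGS Extract script runs internally, and the helper names are made up for illustration.

```python
import shlex
import subprocess


def index_commands(fasta: str) -> list[list[str]]:
    """Generic commands that index a reference FASTA for alignment.

    Assumes an uncompressed or bgzip-compressed FASTA: samtools faidx
    cannot index a plain gzip file.
    """
    return [
        ["samtools", "faidx", fasta],  # writes <fasta>.fai (contig lookup)
        ["bwa", "index", fasta],       # writes the BWA index files (.bwt, .sa, ...)
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(index_commands("hs37d5.fa"))  # hypothetical path; dry run prints only
```

Keeping `dry_run=True` as the default means nothing touches disk until you have checked the printed commands.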

If your BAM is based on the hs38 or hs38DH reference model, simply hit the Realign button on the first tab. It runs an Unalign to create FASTQ files and then an Align with preset values to create the new BAM. When the starting point is one of the Build 38 1K Genome project reference models, it automatically selects hs37d5 as the target.
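The Realign shortcut depends on the BAM already being on a Build 38 model, so it helps to know which build your file uses. A minimal sketch, assuming you feed it the header text printed by `samtools view -H`: it fingerprints the build by the length of chromosome 1 (249,250,621 bp in GRCh37, 248,956,422 bp in GRCh38) and treats `HLA-` contigs as a hint of hs38DH. The function `guess_build` is hypothetical, and this heuristic is not how WGS Extract itself decides.

```python
# Length of chromosome 1 in each build, used as a fingerprint.
CHR1_LENGTH = {
    249250621: "GRCh37 (e.g. hs37d5)",
    248956422: "GRCh38 (e.g. hs38, hs38DH)",
}


def guess_build(header: str) -> str:
    """Guess the reference build from SAM header text (`samtools view -H`)."""
    contigs: dict[str, int] = {}
    for line in header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Header fields are TAB-separated TAG:VALUE pairs; split on the first
        # colon only, because HLA contig names themselves contain colons.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            contigs[fields["SN"]] = int(fields["LN"])

    # Chromosome 1 is named "chr1" in the hs38 family and "1" in hs37d5.
    chr1 = contigs.get("chr1", contigs.get("1"))
    if chr1 is None:
        return "unknown"
    build = CHR1_LENGTH.get(chr1, "unknown")
    if build.startswith("GRCh38") and any(n.startswith("HLA-") for n in contigs):
        return "GRCh38 with HLA contigs (likely hs38DH)"
    return build
```

For example, save the header with `samtools view -H my.bam > header.txt` and call `guess_build(open("header.txt").read())`.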

Otherwise, see the third tab (Analyze) under the FASTQ file(s) section.

If you do not have the FASTQ files yet, hit the Unalign button to create them from your already-loaded BAM file. When you hit the Align button later, it will find them automatically.
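Outside the GUI, unaligning is usually done with samtools. A minimal sketch, assuming a paired-end BAM (not CRAM) and a reasonably recent samtools; it only builds the shell pipeline string, following the `collate | fastq` pattern from the samtools documentation. The function name and the `_R1`/`_R2` file naming are hypothetical, and this is not necessarily what the Unalign button runs.

```python
import shlex


def unalign_pipeline(bam: str, prefix: str) -> str:
    """Build a generic BAM -> paired FASTQ shell pipeline (string only).

    collate groups each read with its mate so that fastq can write
    R1 and R2 in matching order; -u/-O stream uncompressed BAM to stdout.
    """
    q = shlex.quote
    r1, r2 = f"{prefix}_R1.fastq.gz", f"{prefix}_R2.fastq.gz"  # .gz => compressed
    collate = f"samtools collate -u -O {q(bam)}"
    # -0 and -s discard reads without a READ1/READ2 flag and singletons;
    # -n keeps read names without /1 and /2 suffixes; "-" reads stdin.
    fastq = (f"samtools fastq -1 {q(r1)} -2 {q(r2)} "
             f"-0 /dev/null -s /dev/null -n -")
    return f"{collate} | {fastq}"


if __name__ == "__main__":
    print(unalign_pipeline("my_sample.bam", "my_sample"))  # hypothetical names
```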

Once you have the FASTQ files, use the Align button there. It opens three pop-ups asking for additional information. One asks which reference genome (of the 10 it knows about) to use; select hs37d5. Another asks for the name(s) of the FASTQ files if it could not find them. The third asks for the name to give the BAM file it will create. Then sit back and wait. Alignment is roughly a 160 CPU-hour task in most cases. This is when you really appreciate a CPU with 16 or more cores, as the work is spread across the cores and each additional core reduces the overall time.
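The 160 CPU-hour figure converts to wall-clock time as roughly CPU hours divided by cores, so about 10 hours on 16 cores in the ideal case; real runs are slower because of disk I/O and sorting. The sketch below assumes BWA-MEM as the aligner piped into `samtools sort`, a common generic choice rather than a statement of exactly what WGS Extract runs. Both function names are hypothetical.

```python
import shlex


def align_pipeline(ref: str, r1: str, r2: str, out_bam: str, threads: int) -> str:
    """Generic paired-end BWA-MEM alignment piped into a coordinate sort."""
    q = shlex.quote
    return (f"bwa mem -t {threads} {q(ref)} {q(r1)} {q(r2)} | "
            f"samtools sort -@ {threads} -o {q(out_bam)} -")


def wall_clock_hours(cpu_hours: float, cores: int) -> float:
    """Ideal-case wall-clock time, assuming the work splits evenly across cores."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    return cpu_hours / cores


if __name__ == "__main__":
    for cores in (4, 8, 16):
        print(cores, "cores:", wall_clock_hours(160, cores), "hours (ideal)")
```

Treat the estimate as a lower bound: doubling the cores halves the ideal figure, but the sort and disk writes do not scale as cleanly.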

More information is available in the User's Manual on the tool website at https://wgsextract.github.io/

Pleiades
09-25-2021, 04:13 PM

How much free disk space do you need for this process? My 30x WGS CRAM is 65 GB.

Is 200 GB enough?