View on GitHub

linux_cookbook

Tips and tricks for your daily life on our servers

Genome Assembly Instruction

This pipeline is a prototype of genome assembly using Lichen genome as an example.

Reads trimming/filtering (optional)

Reads trimming/filtering are usually using Trimmomatic (for short reads-usually Illumina reads) and Filtlong (for long reads)

Trimmomatic trimming

Trim the pair ends reads:

trimmomatic PE -threads 16 -phred33 -trimlog trim.log reads_R1.fastq reads_R2.fastq reads_trim_R1.fastq reads_unpair_R1.fastq reads_trim_R2.fastq reads_unpair_R2.fastq LEADING:10 TRAILING:10 ILLUMINACLIP:/path_to_your_trimmomatic/trimmomatic-0.39-1/adapters/TruSeq3-PE.fa/:2:30:10

Trim the single end reads:

trimmomatic SE -threads 16 -phred33 -trimlog trim.log reads.fastq reads_trim.fastq LEADING:10 TRAILING:10 ILLUMINACLIP:/path_to_your_trimmomatic/trimmomatic-0.39-1/adapters/TruSeq3-PE.fa/:2:30:10

Filtlong filtering

generic setting for keeping 90% of the reads:

filtlong --min_length 1000 --keep_percent 90 nanopore_reads.fastq | gzip > nanopore_filtlong.fastq.gz

Reads assembly

Reads assembly are using long reads as skeleton and short reads for correction. Two different software are usually used for long reads assembly: Canu or Flye, each has their own advantages.

Canu assembly

Canu required an estimate genome size for assembly, following command is using a 1.2 Gb as an example:

canu -p canu -genomeSize=1.2g -nanopore-raw reads.fastq

Flye assembly

Flye required an estimate genome size for assembly, following command is using a 1.2 Gb as an example:

flye --nano-raw nanopore.fastq.gz --threads 30 --out-dir ./ --genome-size 1.2g

Contigs/scaffolds polish

The assembled contigs/scaffolds need to be farther polish for consensus (using Racon), reads correction (Illumina short reads mapping and correction using Pilon) and curating heterozygous (using Purge Haplotigs *optional)

Racon consensus

Racon can correct the assembly process that do not have consensus steps. This usually take 2 round Racon for better result. This example using a -m (score for matching bases) 3 and -x (score for mismatching bases) -5. Racon also required overlaps mapping which can be acquired using bwa.

bwa index assembled.fasta
bwa mem -t 16 -x ont2d assembled.fasta nanopore.fastq > mapping.sam
racon -m 8 -x -6 -t 16 nanopore.fastq mapping.sam assembled.fasta > racon.fasta
bwa index racon.fasta
bwa mem -t 14 -x ont2d racon.fasta nanopore.fastq > mapping2nd.sam
racon -m 8 -x -6 -t 16 nanopore.fastq mapping2nd.sam racon.fasta > racon2nd.fasta

Pilon polishing

Pilon can be used for improving draft assemblies. Personally, I use 2 round Pilon for assembly polishing. Pilon also required mapping information using illumine short reads which can be acquired using bwa.

bwa index racon2nd.fasta
bwa mem -t 16 racon2nd.fasta reads_R1.fastq reads_R2.fastq | samtools view - -Sb | samtools sort - -@14 -o mapping.sorted.bam
samtools index mapping.sorted.bam
java -Xmx32G -jar pilon-1.23.jar --genome racon2nd.fasta --fix all --changes --frags mapping.sorted.bam --threads 16 --output ./pilon_round1 | tee ./round1.pilon
bwa index pilon_round1.fasta
bwa mem -t 16 pilon_round1.fasta reads_R1.fastq reads_R2.fastq | samtools view - -Sb | samtools sort - -@14 -o ./mapping_pilon1.sorted.bam
samtools index mapping_pilon1.sorted.bam
java -Xmx32G -jar pilon-1.23.jar --genome pilon_round1.fasta --fix all --changes --frags mapping_pilon1.sorted.bam --threads 16 --output ./pilon_round2 | tee ./round2.pilon

Purge Haplotigs heterozygous curating for diploid genome

** Purge Haplotigs can help with curating heterozygous diploid genome assemblies. Purge Haplotigs also required mapping information using long sequencing reads by using minimap2. The purge_haplotigs contigcov need the result from purge_haplotigs readhist and using the low cutoff for -l, mid-point for -m and high cutoff for -h.**

minimap2 -ax map-pb round2.pilon.fasta nanopore.fastq | samtools view -hF 256 - | samtools sort -@ 8 -m 1G -o aligned.bam -T tmp.ali
purge_haplotigs readhist -b aligned.bam -g round2.pilon.fasta -t 16
purge_haplotigs contigcov -i aligned.bam.gencov -l 10 -m 95 -h 190
purge_haplotigs purge -g round2.pilon.fasta -c coverage_stats.csv -t 16 -r ../repeats.bed