fusemblr is a pipeline wrapper designed for the assembly of complex genomes using Nanopore reads and paired-end Illumina reads.
fusemblr was designed for the Fusarium oxysporum assembly project (hence the name).
The pipeline uses Nanopore reads (the longer and the higher the coverage, the better) and paired-end Illumina reads. PacBio HiFi reads are optional but recommended.
Pipeline in 5 steps:
- Downsampling of reads to a designated coverage using Filtlong (Default: 100X, which appears to work well)
- Polishing of the downsampled reads with the paired-end Illumina reads using Ratatosk
- Assembly with Flye
  - By default, the minimum overlap is set automatically to the read N95 after polishing
  - The previously hard-coded maximum of 10 kb for the minimum overlap threshold has been removed
- Optional: Polishing of the assembly with PacBio HiFi and paired-end Illumina reads using NextPolish2
- Filtering (minimum contig length 10 kb), reordering and renaming using Seqkit and awk
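The final filtering, reordering and renaming step can be sketched with plain awk and sort. This is a minimal sketch, not fusemblr's actual code: the function name filter_sort_rename and the contig_N naming scheme are hypothetical, and the pipeline itself also uses Seqkit for this step.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not fusemblr's code): read a FASTA assembly on stdin,
# drop contigs shorter than minlen (default 10000 bp), sort the remaining
# contigs longest-first and rename them <prefix>_1, <prefix>_2, ...
# Stage 1 collapses each record to "length<TAB>sequence" and filters by length,
# stage 2 sorts numerically by length (descending), stage 3 writes renamed,
# unwrapped FASTA records in the new order.
filter_sort_rename() {
  local minlen=${1:-10000} prefix=${2:-contig}
  awk -v minlen="$minlen" '
    BEGIN { minlen += 0 }
    function flush() {
      if (seq != "" && length(seq) >= minlen) print length(seq) "\t" seq
      seq = ""
    }
    /^>/ { flush(); next }
    { seq = seq $0 }
    END { flush() }
  ' | sort -t $'\t' -k1,1nr \
    | awk -F '\t' -v prefix="$prefix" '{ printf ">%s_%d\n%s\n", prefix, NR, $2 }'
}
```

For example, `filter_sort_rename 10000 < assembly.fasta > assembly.filtered.fasta` (file names hypothetical). Sequences are emitted on a single line each, so rewrap them afterwards if a fixed line width is needed.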
Installation:
conda install samtobam::fusemblr
Example usage:
fusemblr.sh -n nanopore.fq.gz -1 illumina.R1.fq.gz -2 illumina.R2.fq.gz -g 70000000
Required inputs:
-n | --nanopore Nanopore long reads used for assembly, in fastq or fasta format (*.fastq / *.fq); can be gzipped (*.gz)
-1 | --pair1 Paired-end Illumina reads in fastq format, first read of each pair. Used for Ratatosk polishing and PAQman evaluation. Can be gzipped (*.gz)
-2 | --pair2 Paired-end Illumina reads in fastq format, second read of each pair. Used for Ratatosk polishing and PAQman evaluation. Can be gzipped (*.gz)
-g | --genomesize Estimated genome size in bp, required for downsampling and assembly
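The genome size drives downsampling: as described for -x, the number of bases kept is coverage × genome size. The sketch below shows that arithmetic feeding Filtlong's --min_length and --target_bases options. The helper names target_bases and downsample are hypothetical, and fusemblr's -w weighting is left out because this README does not say which Filtlong option it sets.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not fusemblr's code) for the downsampling step.

# Print the number of bases to keep: genome size (bp) x coverage (default 100X).
target_bases() {
  local genomesize=$1 coverage=${2:-100}
  local re='^[1-9][0-9]*$'
  if [[ ! $genomesize =~ $re || ! $coverage =~ $re ]]; then
    echo "target_bases: genome size and coverage must be positive integers" >&2
    return 1
  fi
  echo $(( genomesize * coverage ))
}

# Drop reads shorter than minsize (default 5000 bp), then let Filtlong keep
# its best-scoring reads up to the target number of bases. Writes FASTQ to stdout.
downsample() {
  local reads=$1 genomesize=$2 coverage=${3:-100} minsize=${4:-5000}
  local target
  target=$(target_bases "$genomesize" "$coverage") || return 1
  filtlong --min_length "$minsize" --target_bases "$target" "$reads"
}
```

With the example genome size, `downsample nanopore.fq.gz 70000000 > downsampled.fq` would ask Filtlong for 7,000,000,000 bases (100X of 70 Mb).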
Recommended inputs:
-h | --hifi PacBio HiFi reads used for assembly polishing with NextPolish2 (recommended if available)
-t | --threads Number of threads for tools that accept this option (Default: 1)
Optional parameters:
-m | --minsize Minimum read length (bp) to keep during downsampling (Default: 5000)
-x | --coverage Target coverage (X) for downsampling, based on genome size, i.e. coverage*genomesize bases are kept (Default: 100)
-v | --minovl Minimum overlap for the Flye assembly (Default: calculated during the run as the N95 of the reads used for assembly)
-w | --weight Weighting used by Filtlong when selecting reads, balancing read length against read quality (Default: 5)
-p | --prefix Prefix for output (Default: name of assembly file (-a) before the fasta suffix)
-o | --output Name of the output folder for all results (Default: fusemblr_output)
-c | --cleanup Remove the many intermediate files produced by each tool, which can take up a lot of space. Choose 'yes' or 'no' (Default: 'yes')
-h | --help Print this help message
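The default for -v is the N95 of the reads used for assembly. Below is a minimal sketch of computing an N95 from read lengths using the standard Nx definition (the largest length L such that reads of length L or longer hold at least 95% of all bases). The function name n95 is hypothetical, and fusemblr's own calculation may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not fusemblr's code): read one read length per line on
# stdin and print the N95. Returns a non-zero status on empty input.
n95() {
  sort -nr | awk '
    { len[NR] = $1; total += $1 }
    END {
      if (NR == 0) exit 1            # no lengths: nothing to report
      target = 0.95 * total          # 95% of all bases
      for (i = 1; i <= NR; i++) {    # walk from longest to shortest
        cum += len[i]
        if (cum >= target) { print len[i]; exit }
      }
    }'
}
```

For a gzipped FASTQ with standard four-line records, the lengths can be piped in with `zcat polished.fq.gz | awk 'NR % 4 == 2 { print length($0) }' | n95` (file name hypothetical), and the result passed to -v to override the automatic value.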
After assembly, it is recommended that you run PAQman on the resulting assembly to check its quality comprehensively.
PAQman can also be used to compare multiple assemblies and identify the best one.