NGS Server: Bioinformatics Software Catalog¶
Last updated: 2025-11-13
145 program packages for bioinformatics now available!
This catalog groups installed tools by primary use and gives a one-line description of what each does. Items marked [legacy] are older but still useful; [MPI] indicates an MPI build is available.
Programs aren’t available by default; you must enable them via Lmod modules. Module names match the program name in lowercase; for example, loading the star module enables the STAR program. See Modules for more info.
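The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the rule applies unchanged to single-word program names; the helper names `module_name` and `module_load_command` are hypothetical and not part of Lmod. Multi-word catalog entries (for example, Subread Suite) may use a different module name, so check `module avail` on the server when in doubt.

```python
def module_name(program: str) -> str:
    """Return the assumed Lmod module name for a catalog program name.

    Assumption taken from the catalog text: the module name is the
    program name in lowercase.
    """
    return program.strip().lower()


def module_load_command(program: str) -> str:
    """Build the shell command that enables a program in the current session."""
    return f"module load {module_name(program)}"


if __name__ == "__main__":
    # Print the command to run in the shell for a few catalog entries.
    for program in ("STAR", "FastQC", "Minimap2"):
        print(module_load_command(program))
```

The printed commands are meant to be run in a login shell on the server; the sketch only builds the strings and does not call Lmod itself.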
Read QC, Trimming, Filtering & Contamination¶
- FastQC — Quality control reports for raw sequencing reads (per-base quality, GC, adapters, overrepresented sequences).
- TrimGalore — Wrapper around Cutadapt and FastQC for adapter/quality trimming of Illumina reads.
- Cutadapt — Adapter and primer trimming for short reads ([legacy] version kept for reproducibility).
- Filtlong — Length/quality-based filtering for ONT/PacBio reads, optionally guided by a reference.
- BBTools — Versatile toolkit for read mapping, trimming, filtering, QC and more.
- SortMeRNA — Rapid rRNA read filtering (enrichment/depletion) for RNA-Seq/metatranscriptomes.
- vecscreen — Screens nucleotide sequences for vector contamination using NCBI’s UniVec database.
- Rcorrector — k-mer based error correction for Illumina RNA-Seq reads.
- Lighter — Fast, memory‑efficient error correction for high-coverage Illumina genomic reads.
Read Mapping & Spliced Alignment¶
- BWA — Fast short read alignment to a reference genome (aln/mem algorithms).
- Bowtie — Ultrafast memory‑efficient short‑read aligner.
- Bowtie2 — Gapped, local alignment of reads to long reference sequences.
- BBMap/BBTools — Sensitive generic read mapper with many specialized modes (metagenomes, RNA, long reads).
- BLAT — Fast local alignment for highly similar sequences (good for EST/mRNA to genome).
- LAST — General purpose aligner for divergent or long sequences; supports split/long indel alignments.
- Minimap2 — Versatile long and short read mapper; splice-aware modes for RNA and cDNA.
- HISAT2 — Graph-based, splice-aware aligner for RNA-Seq to large genomes.
- GMAP/GSNAP — Spliced alignment of mRNA/ESTs (GMAP) and reads (GSNAP) to genomes.
- STAR — High-speed, splice-aware RNA-Seq aligner with rich junction discovery.
- TopHat2 — Older splice-aware aligner built on Bowtie ([legacy]).
- Subread Suite — A suite of programs for processing next-gen sequencing data, in particular RNA-seq data. Contains featureCounts.
Genome Assembly (short read, long read, hybrid)¶
- ABySS — de novo assembler for short paired-end reads and genomes of all sizes. Can use [MPI].
- SPAdes — de novo assembler (Illumina, IonTorrent, plasmids, metagenomes; includes rnaSPAdes).
- Velvet — de Bruijn graph assembler for short reads ([legacy] but still used for small genomes).
- SOAPdenovo2 — Short read genome assembler ([legacy]).
- Ray — Parallel [MPI] de novo assembler ([legacy] but useful for teaching/testing).
- MaSuRCA — Hybrid assembler combining short reads and long reads.
- Canu — Long read assembler (PacBio/ONT) with built-in correction and trimming.
- Flye — Fast long read genome assembler with repeat resolution.
- wtdbg2 — Lightweight long read assembler for large genomes.
- Smartdenovo — OLC assembler for long reads (simple and fast; experimental).
- GoldRush — A fast long-read genome assembler for producing high-quality contigs and scaffolds.
- LongStitch — A genome scaffolder that improves draft assemblies using long reads and linked reads.
- ntLink — A genome assembly scaffolder using long reads to link contigs.
Transcriptomics: Assembly, Clustering & Quantification¶
- Trinity — de novo transcriptome assembler for short read RNA-Seq.
- Oases — Transcriptome assembler built on Velvet ([legacy]).
- Trans-ABySS — Transcriptome assembly based on ABySS contigs.
- TransLiG — de novo transcriptome assembler for RNA-Seq reads based on line graph iteration.
- TransPi — Modular transcriptome assembly/annotation pipeline orchestrating multiple assemblers.
- TGICL — EST clustering and assembly to generate tentative consensuses (TCs).
- Corset — Clusters transcripts/contigs by shared read evidence; supports differential analysis.
- StringTie — Reference-guided transcript assembly and abundance estimation from RNA-Seq alignments.
- Cufflinks — Transcript assembly/quantification suite ([legacy]).
- Kallisto — Pseudoalignment-based transcript quantification (fast, low memory).
- Salmon — Quasi-mapping-based transcript quantification; supports GC/positional bias correction.
- RSEM — Expectation-maximization-based transcript/isoform abundance estimation.
- ESPRESSO — Long read RNA pipeline for isoform discovery, correction and quantification.
- rMATS‑turbo — Replicate-aware differential alternative splicing (SE, A5SS, A3SS, MXE, RI), reports PSI and FDR (turbo version is much faster).
Long read Basecalling & Signal Processing¶
- ONT Guppy — Oxford Nanopore basecaller (CPU build for ONT FAST5/FASTQ workflows).
- TELL-Read — Converts raw BCL/FASTQ into barcode-linked FASTQs and generates QC; includes reference index generator and genome mapping via genomes.json.
- TELL-Link — De novo assembly of TELL-Seq linked reads (barcode-aware graph, contigs, scaffolding).
Error Correction & Polishing (Reads/Assemblies)¶
- proovread — Hybrid correction of long reads using short reads.
- Racon — Consensus polishing of long read assemblies from read-to-assembly alignments.
- Lighter — Memory‑efficient k‑mer–based error correction for Illumina reads (also listed under QC).
- Rcorrector — RNA-Seq read error correction (also listed under QC).
- Tigmint — Corrects misassemblies in draft genomes using linked-read or long-read data.
- ntEdit — A k-mer-based genome polishing tool for correcting errors in draft assemblies.
Gene Prediction, Structural Annotation & Completeness¶
- AUGUSTUS — ab initio gene prediction; optional BUSCO backend for gene-set–guided assessment.
- Prodigal — Prokaryotic gene prediction (CDS and translation initiation sites).
- Exonerate — Fast pairwise alignment with models for protein2genome, est2genome; great for annotating genes by homology.
- PASApipeline — Aligns transcripts to the genome to refine/validate gene models and build evidence-based gene sets.
- EvidentialGene — Evidence-driven reconstruction, filtering and classification of best mRNA/protein gene sets.
- BUSCO — Assembly/annotation completeness via benchmarking single-copy orthologs (e.g., embryophyta_odb10; miniprot/HMMER pipelines).
- RNAmmer — rRNA gene prediction.
- TransDecoder — ORF prediction on transcripts to extract likely coding sequences.
- Trinotate — Functional annotation of transcriptomes (BLAST/HMMER/signal peptides/TM, GO/KEGG reports).
- QUAST — Quality assessment of genome assemblies with reference‑based and reference‑free metrics and plots.
- QUAST‑LG — Large‑genome mode of QUAST enabling scalable reference‑guided evaluation for large assemblies.
- RagTag — Reference‑guided correction, scaffolding, and patching to reduce contig fragmentation and improve continuity.
- Liftoff — Lifts gene annotations from a reference onto a target assembly using exon‑aware minimap2 alignments.
Sequence Search, Homology & Multiple Sequence Alignment¶
- NCBI BLAST+ — Standard nucleotide/protein similarity searches and database utilities.
- DIAMOND — BLAST-compatible, ultra-fast protein aligner (useful for large metagenomes/transcriptomes).
- PLAST — Parallel local alignment search tool (protein BLAST-like at scale).
- BLAT — Fast local aligner for similar sequences (also listed under mapping).
- HMMER — Profile HMM searches against sequence/domain databases (Pfam, custom profiles).
- MAFFT — Multiple sequence alignment for nucleotides/proteins (accurate/fast modes).
- FASTA suite — FASTA36 suite for classic sequence similarity searching, alignment, and related utilities (FASTA/FASTX).
Whole Genome Alignment, Synteny & Comparative Genomics¶
- MUMmer4 — Rapid whole genome alignment/synteny (nucmer/promer); good for assemblies and structural differences.
- Minimap2 — Also supports whole genome and assembly-to-assembly mapping (listed above under mapping).
- LAST — General-purpose aligner for divergent/long sequences; robust split alignments and long indels.
Pan genome & Graph Genomics¶
- Cactus — Whole-genome aligner and pangenome pipeline (minigraph-cactus) for multi-genome alignment and graph construction.
- PGGB — PanGenome Graph Builder orchestrating wfmash → seqwish → smoothxg → gfaffix → odgi to construct and polish chromosome‑scale pangenome graphs.
- wfmash — Fast Mash‑based long‑read/assembly aligner optimized for whole‑genome/pangenome all‑vs‑all mappings (PAF output).
- seqwish — Induces a compact variation graph (GFA) from sequences and their pairwise alignments (PAF).
- smoothxg — Block‑POA smoothing and consensus polishing of variation graphs to improve mappability and contiguity.
- odgi — Toolkit to build, sort, visualize and analyze variation graphs (OG/GFA), with rich QC and stats.
- GFAffix — Detects and resolves shared sequence affixes in GFA graphs to reduce fragmentation and simplify paths.
- vg — Variation graph toolkit for mapping, indexing (GBWT/GBZ), variant calling, and graph conversions.
- GraphAligner — Long-read to variation-graph aligner (GAF/GAM output), optimized for large graphs.
- PanGenie — Short-read genotyper on pangenome graphs using k-mer counts and haplotype panels.
- MultiQC — Aggregates QC outputs from many tools/runs into a single HTML report (handy for PGGB pipelines).
- Minigraph — Sequence‑to‑graph mapper/graph inducer for pangenomics, ideal for mapping assemblies/reads to graph references.
- PanTools — Pangenomic toolkit to construct and analyze pangenomes/panproteomes; builds De Bruijn graph pangenomes, integrates annotations, and supports homology grouping, read mapping, and Cypher-based queries.
- Gretl — Genome graph evaluation toolkit that computes statistics on GFA graphs.
Variant Analysis & GWAS¶
- BCFtools — Utilities for BCF/VCF: view, filter, query, subset, merge, normalize, call; complements SAMtools/HTSlib workflows.
- VCFtools — Suite for filtering, summarizing, and converting VCF; computes population genetics stats (MAF, HWE, Fst) and subsets samples/sites.
- GEMMA — Genome‑wide Efficient Mixed Model Association: LMM/BSLMM tools for GWAS; GRM estimation and fast association on large cohorts.
- GraphTyper — Graph-based variant calling and genotyping (SNVs/indels/SVs) from short reads.
- Paragraph — Structural-variant genotyper for short reads via local graph realignment around candidate SVs.
- SyRI — Synteny and Rearrangement Identifier to detect inversions, translocations, and other SVs between assemblies.
- TELL‑Sort — Variant calling and haplotype phasing from TELL‑Read linked‑reads.
Phylogenetics & Phylogenetic Placement¶
- SEPP — SATé-Enabled Phylogenetic Placement; places metagenomic short reads (short fragments/OTUs) onto reference trees.
- Phylo (Biopython) — Local collection of phylogenetics utilities/pipelines (exact contents vary on this server).
- FastTree — Approximately maximum-likelihood phylogenetic trees from large alignments, fast with good accuracy.
- OrthoFinder — Orthogroup inference and comparative genomics pipeline (orthologs, gene trees, rooted species tree).
- FastME — A comprehensive, accurate and fast distance-based phylogeny inference program.
Protein Features & Subcellular Targeting¶
- SignalP 6.0 Fast — Predicts signal peptides and cleavage sites across all domains of life (supports all known SP types).
- TMHMM 2.0 — Predicts transmembrane helices and topology.
- Miniprot — Fast protein-to-genome aligner used by BUSCO (eukaryote mode) for exon-aware mapping.
Machine Learning & Gradient Boosting¶
- XGBoost — Scalable gradient‑boosted decision trees for tabular data; CPU-only build with support for distributed training.
- LightGBM — Fast, memory‑efficient gradient boosting (histogram/leaf‑wise growth); CPU-only build with support for distributed learning.
- CatBoost — Gradient boosting with strong native handling of categorical features; CPU-only build with production‑ready tools.
- Random Forest (scikit‑learn) — Bagging ensemble of decision trees for classification/regression; robust baselines with multi‑core parallelism.
Molecular Dynamics / Protein Modeling¶
- GROMACS — Molecular dynamics for biomolecules; multiple builds available: Gromacs_2025.2, Gromacs_2022.6, Gromacs_2021.4 (Gromacs_2025 is [MPI] built for multi-node runs).
Clustering, Dereplication & k-mer Analysis¶
- MCL — Markov Cluster Algorithm for graph clustering; widely used to cluster proteins into families.
- CD‑HIT — Fast clustering/dereplication of nucleotide/protein sequences at user-defined identity.
- Jellyfish — Fast, parallel k-mer counting to estimate coverage/complexity and drive downstream QC.
RNA-Seq/Transcriptome Quality Assessment¶
- RNAQuast — Quality assessment of de novo transcriptome assemblies.
- Transrate — Assembly quality metrics based on read mapping and contig properties.
- DETONATE — Suite for reference-based (REF-EVAL) and reference-free (RSEM-EVAL) transcriptome assembly evaluation.
Metagenomics¶
- Kraken2 — Ultrafast k-mer–based taxonomic classifier for metagenomic reads; supports custom/standard databases, confidence thresholds, paired-end/long reads, and produces per-read assignments plus summary reports. Databases are installed on request.
Utilities & File Formats¶
- GenomeTools — Comprehensive genome annotation toolkit; GFF3 validation/manipulation and many sequence utilities.
- BamTools — Library and utilities for working with BAM files.
- SAMtools — Toolkit for working with SAM/BAM/CRAM files (view/sort/index/merge/convert, region fetch, pileups).
- bio — Collection of Python bioinformatics tools that combines data from different sources (GenBank, Ensembl, Gene Ontology, Sequence Ontology, NCBI Taxonomy) and provides a unified, consistent interface. Developed for Bioinformatics CookBook.
- AGAT — Suite of tools to handle gene annotations in any GTF/GFF format (Perl scripts in a SIF container).
- UCSC Utilities — Command-line tools from the UCSC Genome Browser (e.g., faToTwoBit, twoBitToFa, wigToBigWig, bedGraphToBigWig, liftOver).
Legacy / Technology-specific¶
- 454 (Newbler) — Discontinued tools and pipelines related to Roche/454 sequencing data ([legacy]).
- Boiler — RNA-Seq alignment compression/summary.
- DISCASM — Extract discordant/unmapped reads and assemble to reveal fusions/foreign transcripts.
Notes¶
- Some packages appear in multiple categories due to overlapping functionality (e.g., minimap2, BBMap).
- [legacy] tools are kept to support older workflows; prefer modern equivalents when possible.
- Contact me if you need anything else.