R on the UEB HPC via Slurm¶
This page shows how to run R on compute nodes and what is different compared to a normal desktop/laptop computer. Worked examples are given for two R programs:
- Phenology (PhenoFlex from chillR)
- RNA‑Seq differential expression (DESeq2)
Key differences vs. normal interactive use¶
R must not be run on the login node! Here is what to do instead:
- **Where computation runs**
  Laptop/desktop: everything runs locally.
  HPC: submit work to compute nodes with `srun`/`sbatch`; avoid heavy work on the login node.
- **Resources are explicit**
  You request CPUs, memory, and time (`-c`/`--mem`/`-t`). The scheduler enforces these limits.
- **Non-interactive is the default**
  Prefer `Rscript my_analysis.R` inside a batch job. You can get an interactive R shell on a node using `srun --pty`.
- **Parallel execution**
  Tie R parallel workers to Slurm, e.g. `workers = $SLURM_CPUS_PER_TASK`. Don't oversubscribe (workers should not exceed requested CPUs); see the sketch below this list.
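A minimal sketch of that last point, assuming the base `parallel` package (any parallel backend follows the same pattern):

```r
library(parallel)

# Read the CPU allocation granted by Slurm; fall back to 1 outside a job
workers <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Keep the number of R workers at or below the allocation
results <- mclapply(seq_len(100), function(i) i^2, mc.cores = workers)
```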
Quick start¶
Interactive R on a compute node (for testing only!)¶
```bash
# 1 hour, 4 CPUs, 8 GiB RAM, short partition
srun --pty -p short -c 4 --mem=8G -t 01:00:00 R --vanilla
```
Run a script non‑interactively¶
```bash
Rscript my_analysis.R
```
In production, wrap that Rscript in an sbatch job (see examples below).
Example 1 — PhenoFlex (from chillR)¶
The table below compares the two workflows: Local (desktop/laptop) and HPC via Slurm; the R script and the Slurm batch file follow. Adjust input paths and parameters with real study values.
Files¶
- `temps.csv`: hourly (or sub-daily) temperature data per station/season.
Local vs. HPC (side‑by‑side comparison)¶
| Local (desktop/laptop) | HPC via Slurm (compute nodes) |
|---|---|
| R script: `phenoflex.R`, run with `Rscript phenoflex.R` | R script: `phenoflex.R` (same file), wrapped in `phenoflex.sbatch` and submitted with `sbatch` |
phenoflex.R¶
Minimal scaffold—fill in parameters per your study.

```r
library(chillR)

# 1) Read/prepare temperatures
#    Example structure: timestamp, Tmin, Tmax or hourly temps
temps <- read.csv("temps.csv")

# If needed, convert to hourly format using chillR helpers:
hourly <- chillR::stack_hourly_temps(years, latitude, daily_tmin, daily_tmax)

# 2) Set/fit PhenoFlex parameters (fill in your calibrated values)
params <- list(E0=..., E1=..., A0=..., A1=..., Tf=..., Tc=..., slope=..., delta=...)

# 3) Run the model (placeholders below — consult your calibrated setup)
result <- chillR::PhenoFlex(temperatures = temps, par = params)

# 4) Save outputs
write.csv(result, file = "phenoflex_output.csv", row.names = FALSE)
cat("PhenoFlex scaffold executed. Fill in params and inputs.\n")
```

phenoflex.sbatch¶

```bash
#!/usr/bin/env bash
#SBATCH -J phenoflex
#SBATCH -p short
#SBATCH -c 4
#SBATCH --mem=8G
#SBATCH -t 01:00:00
#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err
module use /share/apps/Modules/modulefiles
module load R # or a specific R version/module available on your cluster
# Tie parallel workers (if you parallelize parts) to the CPUs you requested
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

Rscript phenoflex.R
```
_Submit with_ `sbatch phenoflex.sbatch`
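If you later turn this into a parameter sweep (see Final words and the job-array examples on the Slurm page), a common pattern is to let each Slurm array task pick one parameter set via `SLURM_ARRAY_TASK_ID`. A minimal sketch, with a made-up grid of candidate values:

```r
# Hypothetical parameter sweep driven by a Slurm job array
# (e.g. sbatch --array=1-6 phenoflex.sbatch, with the range matching the grid size)
task_id <- as.integer(Sys.getenv("SLURM_ARRAY_TASK_ID", unset = "1"))

# Made-up grid of candidate values; replace with your calibration ranges
param_grid <- expand.grid(Tc = c(34, 36, 38), slope = c(1.2, 1.6))
params_i <- param_grid[task_id, ]

cat("Array task", task_id, ": Tc =", params_i$Tc, ", slope =", params_i$slope, "\n")
# ...run the scaffold above with this parameter set and write task-specific output
```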
Example 2 — RNA‑Seq differential expression with DESeq2¶
Common pattern using tximport + DESeq2. The R script detects CPUs from Slurm and registers BiocParallel accordingly.
Files¶
- `samples.tsv`: tab-separated with columns `sample`, `condition`, `folder` (path to the folder containing the Salmon `quant.sf` count file).
- `tx2gene.tsv`: two columns: transcript ID, gene ID (for tximport aggregation).
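For orientation, a minimal sketch of what `deseq2.R` expects after reading these files (the sample names, conditions, and paths below are made up):

```r
# Hypothetical samples.tsv contents (tab-separated), as seen after read.delim():
#   sample  condition  folder
#   s01     control    salmon/s01
#   s02     control    salmon/s02
#   s03     treated    salmon/s03
#   s04     treated    salmon/s04
samp <- read.delim("samples.tsv", stringsAsFactors = FALSE)
stopifnot(all(c("sample", "condition", "folder") %in% colnames(samp)))

# Each folder must contain a Salmon quant.sf file
stopifnot(all(file.exists(file.path(samp$folder, "quant.sf"))))
```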
Local vs. HPC (side‑by‑side)¶
| Local (desktop/laptop) | HPC via Slurm (compute nodes) |
|---|---|
| R script: `deseq2.R`, run with `Rscript deseq2.R` | R script: `deseq2.R` (same file, worker count taken from Slurm), wrapped in `deseq2.sbatch` and submitted with `sbatch` |

deseq2.R¶

```r
library(tximport)
library(DESeq2)
library(BiocParallel)

# 0) Parallel setup (local default: 4 workers)
workers <- 4L
register(MulticoreParam(workers = workers))

# 1) Metadata and files
samp <- read.delim("samples.tsv", stringsAsFactors = FALSE)
files <- file.path(samp$folder, "quant.sf")
names(files) <- samp$sample

# 2) Mapping transcripts to genes
tx2gene <- read.delim("tx2gene.tsv", header = FALSE)
colnames(tx2gene) <- c("TX", "GENE")

# 3) Import & build DESeq2 object
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData = samp, design = ~ condition)

# 4) Fit model
dds <- DESeq(dds, parallel = TRUE)

# 5) Results
res <- results(dds, contrast = c("condition", "treated", "control"))
write.csv(as.data.frame(res), file = "deseq2_results.csv")
```

deseq2.sbatch¶

```bash
#!/usr/bin/env bash
#SBATCH -J deseq2
#SBATCH -p DB
#SBATCH -c 16
#SBATCH --mem=64G
#SBATCH -t 08:00:00
#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err
module use /share/apps/Modules/modulefiles
module load R # a module that provides DESeq2 & tximport (or install in your user lib)
# Match thread-hungry libs to Slurm allocation
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
Rscript deseq2.R
```

On the HPC, run the same `deseq2.R`, but auto-detect the workers from Slurm:

```r
# deseq2.R (HPC-aware)
library(tximport)
library(DESeq2)
library(BiocParallel)
# Detect workers from Slurm; fall back to 1 if not set
slurm_cpus <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
register(MulticoreParam(workers = slurm_cpus))
samp <- read.delim("samples.tsv", stringsAsFactors = FALSE)
files <- file.path(samp$folder, "quant.sf")
names(files) <- samp$sample
tx2gene <- read.delim("tx2gene.tsv", header = FALSE)
colnames(tx2gene) <- c("TX", "GENE")
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData = samp, design = ~ condition)
dds <- DESeq(dds, parallel = TRUE)
res <- results(dds, contrast = c("condition", "treated", "control"))
write.csv(as.data.frame(res), file = "deseq2_results.csv")
```

Notes
- On clusters where fork is restricted or you need multi-node parallelism, replace MulticoreParam with SnowParam(workers = slurm_cpus, type = "SOCK") (still within one node unless you open ports); see the sketch after these notes.
- Align -c (CPUs) in your sbatch with workers in R to avoid oversubscription.
- Large imports (tximport) and model fitting can be memory‑intensive; size --mem accordingly.
- For reproducibility: record exact package versions (e.g., renv::snapshot()), and the Slurm resources in your analysis log.
- If Salmon/Kallisto outputs live on IB‑connected storage, keep I/O paths there (see InfiniBand page).
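A minimal sketch of the SnowParam swap mentioned in the first note (a drop-in replacement for the register() call in deseq2.R):

```r
library(BiocParallel)

# Socket-based workers instead of fork(); still limited to a single node
slurm_cpus <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
register(SnowParam(workers = slurm_cpus, type = "SOCK"))
```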
Handy patterns¶
Clean environment Rscript (no site/user profiles)¶
```bash
Rscript --vanilla script.R
```
Per‑job R library (isolate from your default one)¶
```bash
# inside a job, before installing packages
export R_LIBS_USER="$PWD/rlib"
mkdir -p "$R_LIBS_USER"
# tximport and DESeq2 are Bioconductor packages, so install them via BiocManager
Rscript -e 'install.packages("BiocManager", repos="https://cloud.r-project.org"); BiocManager::install(c("tximport","DESeq2"))'
```
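To verify from inside R that the per-job library is actually picked up (assuming `R_LIBS_USER` was exported as above):

```r
# The job-local "rlib" directory should appear first in the library search path
print(.libPaths())
stopifnot(dir.exists(Sys.getenv("R_LIBS_USER")))
```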
Use renv (project-local library and lockfile)¶
```bash
Rscript -e 'install.packages("renv", repos="https://cloud.r-project.org"); renv::init(); renv::snapshot()'
```
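To reproduce that environment later (e.g. on another node or from a fresh clone of the project), replay the lockfile with:

```r
renv::restore()  # reinstalls the package versions recorded in renv.lock
```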
See also¶
- Slurm examples — job arrays, GPU jobs, quick reference
- Modules — loading R and friends
- InfiniBand — why keeping data on the fabric helps
Final words¶
- For small datasets and quick exploratory work (e.g., a typical DESeq2 run with tens of samples), running on a desktop/laptop is simplest and often faster—no queueing and no shared-filesystem overhead.
- For larger datasets or compute/memory-intensive tasks (e.g., PhenoFlex parameter sweeps, XGBoost/Random Forest with big grids or many trees, permutations/bootstraps), the HPC cluster's extra RAM and CPUs deliver better throughput and keep your workstation free.
- Rule of thumb: if a job needs >8–16 CPU threads, >16–32 GB RAM, runs >30–60 min, or you want to launch many variants in parallel, submit it to Slurm.