
R on the UEB HPC via Slurm

This page shows how to run R on the compute nodes and how this differs from normal use on a desktop/laptop computer. Worked examples are given for two R programs:

  • Phenology (PhenoFlex from chillR)
  • RNA‑Seq differential expression (DESeq2)

Key differences vs. normal interactive use

Do not run R on the login node! Here is what to do instead:

  • Where computation runs
    Laptop/Desktop: everything runs locally.
    HPC: submit work to compute nodes with srun/sbatch; avoid heavy work on the login node.

  • Resources are explicit
    You request CPUs, memory, and time (-c/--mem/-t). The scheduler enforces these limits.

  • Non‑interactive is the default
    Prefer Rscript my_analysis.R inside a batch job. You can get an interactive R shell on a node using srun --pty.

  • Parallel execution
    Tie R parallel workers to the Slurm allocation by reading SLURM_CPUS_PER_TASK (see the sketch below). Don't oversubscribe: workers should never exceed the requested CPUs.
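
A minimal sketch of that pattern with base R's parallel package (the DESeq2 example further below does the same via BiocParallel):

# Use exactly as many workers as Slurm allocated; fall back to 1 outside a job
library(parallel)
workers <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
cl <- makeCluster(workers)
res <- parLapply(cl, 1:100, function(x) x^2)  # placeholder workload
stopCluster(cl)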


Quick start

Interactive R on a compute node (for testing only!)

# 1 hour, 4 CPUs, 8 GiB RAM, short partition
srun --pty -p short -c 4 --mem=8G -t 01:00:00 R --vanilla

Run a script non‑interactively

Rscript my_analysis.R

In production, wrap that Rscript in an sbatch job (see examples below).
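
If the script needs inputs, pass them on the command line and read them with commandArgs; a minimal sketch (file names are placeholders):

# my_analysis.R, invoked as: Rscript my_analysis.R temps.csv results.csv
args <- commandArgs(trailingOnly = TRUE)
infile  <- args[1]
outfile <- args[2]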


Example 1 — PhenoFlex (from chillR)

The same R script runs in both settings; the comparison below shows local (desktop/laptop) use next to HPC via Slurm. Adjust input paths and parameters to real study values.

Files

  • temps.csv — hourly (or sub‑daily) temperature data per station/season; a possible layout is sketched below.
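
If you start from daily minima and maxima instead, chillR can convert them to hourly values (see the script below); a hypothetical first few lines of temps.csv:

Year,Month,Day,Tmin,Tmax
2023,1,1,-2.1,5.3
2023,1,2,-1.4,6.0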

Local vs. HPC

Local (desktop/laptop): run Rscript phenoflex.R directly.
HPC via Slurm (compute nodes): submit the same phenoflex.R through the batch script below.

phenoflex.R

Minimal scaffold; fill in parameters per your study.

library(chillR)

# 1) Read/prepare temperatures
# Example structure: timestamp, Tmin, Tmax (or hourly temps)
temps <- read.csv("temps.csv")

# If needed, convert daily data to hourly using chillR helpers
# (expects Year/Month/Day/Tmin/Tmax columns; latitude is a placeholder)
hourly <- chillR::stack_hourly_temps(temps, latitude = 50)

# 2) Set/fit PhenoFlex parameters (fill in your calibrated values)
params <- list(E0 = ..., E1 = ..., A0 = ..., A1 = ..., Tf = ..., Tc = ..., slope = ..., delta = ...)

# 3) Run the model (placeholders below; consult your calibrated setup)
result <- chillR::PhenoFlex(temperatures = temps, par = params)

# 4) Save outputs
write.csv(result, file = "phenoflex_output.csv", row.names = FALSE)

cat("PhenoFlex scaffold executed. Fill in params and inputs.\n")

Batch script: phenoflex.sbatch

#!/usr/bin/env bash
#SBATCH -J phenoflex
#SBATCH -p short
#SBATCH -c 4
#SBATCH --mem=8G
#SBATCH -t 01:00:00
#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err

module use /share/apps/Modules/modulefiles
module load R     # or a specific R version/module available on your cluster

# Tie parallel workers (if you parallelize parts) to the CPUs you requested
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

Rscript phenoflex.R

Submit with:

sbatch phenoflex.sbatch


Example 2 — RNA‑Seq differential expression with DESeq2

Common pattern using tximport + DESeq2. The R script detects CPUs from Slurm and registers BiocParallel accordingly.

Files

  • samples.tsv — tab‑separated with columns: sample, condition, folder (path to the folder containing Salmon's quant.sf or other count files).
  • tx2gene.tsv — two columns: transcript ID, gene ID (for tximport aggregation). Both files are sketched below.
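
As an illustration (sample names, paths, and IDs are hypothetical), the two files might look like:

samples.tsv:

sample    condition    folder
S1        control      quants/S1
S2        control      quants/S2
S3        treated      quants/S3
S4        treated      quants/S4

tx2gene.tsv (no header):

ENST00000456328    ENSG00000223972
ENST00000450305    ENSG00000223972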

Local vs. HPC

Local (desktop/laptop): run Rscript deseq2.R directly; the version below registers a fixed number of workers.
HPC via Slurm (compute nodes): the same deseq2.R, submitted with sbatch; an HPC‑aware variant that auto‑detects workers from Slurm follows further below.

library(tximport)
library(DESeq2)
library(BiocParallel)

# 0) Parallel setup (local default: 4 workers)
workers <- 4L
register(MulticoreParam(workers = workers))

# 1) Metadata and files
samp <- read.delim("samples.tsv", stringsAsFactors = FALSE)
files <- file.path(samp$folder, "quant.sf")
names(files) <- samp$sample

# 2) Map transcripts to genes
tx2gene <- read.delim("tx2gene.tsv", header = FALSE)
colnames(tx2gene) <- c("TX", "GENE")

# 3) Import & build DESeq2 object
txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData = samp, design = ~ condition)

# 4) Fit model
dds <- DESeq(dds, parallel = TRUE)

# 5) Results
res <- results(dds, contrast = c("condition", "treated", "control"))
write.csv(as.data.frame(res), file = "deseq2_results.csv")

Batch script: deseq2.sbatch

#!/usr/bin/env bash
#SBATCH -J deseq2
#SBATCH -p DB
#SBATCH -c 16
#SBATCH --mem=64G
#SBATCH -t 08:00:00
#SBATCH -o %x.%j.out
#SBATCH -e %x.%j.err

module use /share/apps/Modules/modulefiles
module load R     # a module that provides DESeq2 & tximport (or install in your user lib)

# Match thread-hungry libs to Slurm allocation
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}

Rscript deseq2.R

Same deseq2.R, but auto‑detect workers from Slurm:

# deseq2.R (HPC‑aware)
library(tximport)
library(DESeq2)
library(BiocParallel)

# Detect workers from Slurm; fall back to 1 if not set
slurm_cpus <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
register(MulticoreParam(workers = slurm_cpus))

samp <- read.delim("samples.tsv", stringsAsFactors = FALSE)
files <- file.path(samp$folder, "quant.sf")
names(files) <- samp$sample

tx2gene <- read.delim("tx2gene.tsv", header = FALSE)
colnames(tx2gene) <- c("TX", "GENE")

txi <- tximport(files, type = "salmon", tx2gene = tx2gene)
dds <- DESeqDataSetFromTximport(txi, colData = samp, design = ~ condition)

dds <- DESeq(dds, parallel = TRUE)
res <- results(dds, contrast = c("condition", "treated", "control"))
write.csv(as.data.frame(res), file = "deseq2_results.csv")

Notes

  • On clusters where fork is restricted, replace MulticoreParam with SnowParam(workers = slurm_cpus, type = "SOCK"); this still stays within one node unless you open ports for multi‑node use (see the sketch below).
  • Align -c (CPUs) in your sbatch with the number of workers in R to avoid oversubscription.
  • Large imports (tximport) and model fitting can be memory‑intensive; size --mem accordingly.
  • For reproducibility, record exact package versions (e.g., renv::snapshot()) and the Slurm resources in your analysis log.
  • If Salmon/Kallisto outputs live on IB‑connected storage, keep I/O paths there (see the InfiniBand page).
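
For example, the BiocParallel registration in deseq2.R would change to the socket backend like this (a minimal sketch):

library(BiocParallel)

# Socket workers avoid fork(); still cap them at the Slurm allocation
slurm_cpus <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
register(SnowParam(workers = slurm_cpus, type = "SOCK"))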


Handy patterns

Clean environment Rscript (no site/user profiles)

Rscript --vanilla script.R

Per‑job R library (isolate from your default one)

# inside a job before installing packages
export R_LIBS_USER="$PWD/rlib"
mkdir -p "$R_LIBS_USER"
# tximport and DESeq2 come from Bioconductor, so install them via BiocManager
Rscript -e 'install.packages("BiocManager", repos="https://cloud.r-project.org"); BiocManager::install(c("tximport","DESeq2"))'

Use renv (project-local library and lockfile)

Rscript -e 'install.packages("renv", repos="https://cloud.r-project.org"); renv::init(); renv::snapshot()'



Final words

  • For small datasets and quick exploratory work (e.g., a typical DESeq2 run with tens of samples), running on a desktop/laptop is simplest and often faster—no queueing and no shared-filesystem overhead.
  • For larger datasets or compute/memory-intensive tasks (e.g., PhenoFlex parameter sweeps, XGBoost/Random Forest with big grids or many trees, permutations/bootstraps), the HPC cluster’s extra RAM and CPUs deliver better throughput and keep your workstation free.

  • Rule of thumb: if a job needs >8–16 CPU threads, >16–32 GB RAM, runs >30–60 min, or you want to launch many variants in parallel, submit it to Slurm.