Questions
- How do I submit a job to the HPC cluster?
- How do I write a Slurm job script?
- What is the sbatch command?
- How do I run a parallel processing job on the HPC cluster?
- How do I check the status of my HPC job?
- How do I cancel a job on the HPC cluster?
- How do I run an interactive job on the HPC cluster?
- What is the hpcsub command?
- Why is my job waiting in the queue?
- What is the difference between SMP and OpenMPI parallel processing?
Environment
This article applies to Bowdoin faculty, students, and researchers submitting computational jobs to the Bowdoin HPC Slurm cluster. Jobs are submitted from the cluster headnode at moosehead.bowdoin.edu, accessible via SSH or the HPC Web Portal. See Access the Bowdoin HPC Environment in the Related Articles section for connection instructions.
Resolution
Write and Submit a Basic Job Script
To run a job on the HPC cluster, create a plain text script file containing the commands you want to run. A basic script called myscript.sh looks like this:
#!/bin/bash
#SBATCH --mail-type=BEGIN,END,FAIL
my-program-name
The #SBATCH --mail-type=BEGIN,END,FAIL line tells Slurm to send you email notifications when your job starts, finishes, or fails.
To submit the script to the cluster, log in to moosehead.bowdoin.edu, change into the directory containing your script, and run:
sbatch myscript.sh
By default, the cluster assigns one CPU core to your job. To request more cores, see the parallel processing section below.
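Slurm supports many additional #SBATCH directives beyond the mail notification shown above. The sketch below adds some commonly used ones; the job name, email address, memory request, and time limit are illustrative values, not site defaults:

```shell
#!/bin/bash
#SBATCH --job-name=myjob             # name shown in squeue output (illustrative)
#SBATCH --mail-type=BEGIN,END,FAIL   # email when the job starts, finishes, or fails
#SBATCH --mail-user=you@bowdoin.edu  # where to send notifications (example address)
#SBATCH --mem=4G                     # memory request (illustrative)
#SBATCH --time=01:00:00              # wall-clock limit of 1 hour (illustrative)

my-program-name
```

Directives set in the script can also be overridden on the command line, for example sbatch --time=02:00:00 myscript.sh.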
Submit a Quick Job with hpcsub
The hpcsub wrapper script provides a shortcut for running a single command or program on the cluster without writing a script file. For example, to run a program called myprogram using the cluster defaults of one CPU core and 6 GB of memory:
hpcsub -cmd myprogram
Run Parallel Processing Jobs
The HPC cluster supports two types of parallel processing: SMP (Symmetric Multiprocessing) and OpenMPI.
SMP (shared memory) runs a job on multiple CPU cores on a single machine. To submit an SMP job, use the -N 1 flag (one compute node) and -n to specify the number of CPU cores:
sbatch -N 1 -n 8 myscript.sh
This example requests 8 CPU cores on a single compute node. The program being run must support multi-core processing — simply requesting multiple cores does not automatically parallelize a single-threaded program.
OpenMPI runs a job across multiple CPU cores spread over multiple machines. To submit an OpenMPI job, use only the -n flag to specify the total number of CPU cores (Slurm will distribute cores across available nodes):
sbatch -n 16 myscript.sh
The program must be compiled with OpenMPI support to use this mode.
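MPI programs are typically started by a launcher inside the job script rather than run directly. A minimal sketch, assuming a program compiled against Open MPI (my-mpi-program is a hypothetical name):

```shell
#!/bin/bash
#SBATCH --mail-type=BEGIN,END,FAIL

# srun reads the job's Slurm allocation, so no explicit host list
# or process count is needed; it launches one MPI rank per allocated core.
srun my-mpi-program
```

Submitting this script with sbatch -n 16 would launch 16 MPI ranks distributed across available nodes.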
Fixed cores per node: To request a specific number of cores per node in a multi-node job, use --ntasks-per-node. For example, to request 4 nodes with 12 cores each (48 cores total):
sbatch -N 4 --ntasks-per-node=12 myscript.sh
Run Interactive Jobs
Interactive jobs allow you to work with HPC resources in real time. To start an interactive session on a compute node:
srun --pty bash
When the session starts, you will be placed on a compute node where you can run commands directly. Type exit to end the interactive session and return to the headnode.
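srun accepts the same resource flags as sbatch, so an interactive session can request more than the default single core. A sketch with illustrative values:

```shell
# Request 4 CPU cores on one compute node for an interactive shell
srun -N 1 -n 4 --pty bash

# ...run commands on the compute node...

# Return to the headnode when finished
exit
```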
Monitor and Control Your Jobs
Use these Slurm commands to check job status and manage running jobs:
squeue — view all jobs currently running or waiting in the queue
squeue -u username — view only your own jobs
scancel jobid — cancel a specific job (replace jobid with the job number shown in squeue output)
scancel -u username — cancel all of your jobs
scontrol show job jobid — view detailed information about a specific job
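A typical monitoring workflow combines these commands. The job ID below (12345) is a made-up example; use the ID that sbatch reports when you submit:

```shell
# Submit a job; Slurm prints the assigned job ID
sbatch myscript.sh

# Check the state of your jobs (R = running, PD = pending)
squeue -u username

# Inspect one job in detail, then cancel it if needed
scontrol show job 12345
scancel 12345
```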
Why Is My Job Waiting in the Queue?
Jobs may wait in the queue when the requested resources are not immediately available. Common reasons include:
- SMP jobs requesting many cores: An SMP job that requires all cores on a single machine (for example, 24 cores) may wait until a single node has that many cores free, even if the total number of free cores across the cluster is higher.
- High cluster utilization: When many jobs are running, new jobs wait until resources become available.
- GPU or specialized resource requests: Jobs requesting specific GPU cards or high-memory nodes may wait if those resources are currently in use.
To check what resources are currently available, use:
sinfo
If your SMP job is waiting because no single node has enough free cores, you can cancel the job with scancel and resubmit it requesting fewer cores so it can start sooner.
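Slurm also records why each pending job is waiting in the NODELIST(REASON) column of squeue output. To list only your pending jobs with their reasons (the job ID below is illustrative):

```shell
# Show only pending jobs for your account; the NODELIST(REASON) column
# explains the wait (e.g. "Resources" means not enough free cores yet)
squeue -u username -t PENDING

# Ask Slurm to estimate when a specific pending job will start
squeue --start -j 12345
```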
Additional Help
If you need further assistance, you have several options:
- Bowdoin Bot: Chat with Bowdoin Bot directly from any KB page for instant answers.
- Phone: Call the Bowdoin College Service Desk at (207) 725-3030.
- In person: Visit the Tech Hub in Smith Union during business hours.
- Submit a ticket: Request assistance through the Service Catalog.
Additional Resources
AI-assisted content: This article was drafted with the assistance of an AI writing tool and reviewed by Bowdoin IT staff for accuracy.