Submit and Manage Jobs on the HPC Slurm Cluster

Questions

  • How do I submit a job to the HPC cluster?
  • How do I write a Slurm job script?
  • What is the sbatch command?
  • How do I run a parallel processing job on the HPC cluster?
  • How do I check the status of my HPC job?
  • How do I cancel a job on the HPC cluster?
  • How do I run an interactive job on the HPC cluster?
  • What is the hpcsub command?
  • Why is my job stuck in the wait queue?
  • What is the difference between SMP and OpenMPI parallel processing?

Environment

This article applies to Bowdoin faculty, students, and researchers submitting computational jobs to the Bowdoin HPC Slurm cluster. Jobs are submitted from the cluster headnode at moosehead.bowdoin.edu, accessible via SSH or the HPC Web Portal. See Access the Bowdoin HPC Environment in the Related Articles section for connection instructions.
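
For example, from a campus network (or over VPN from off campus), you can reach the headnode with a standard SSH client; your_username below is a placeholder for your Bowdoin username:

ssh your_username@moosehead.bowdoin.edu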

Resolution

Write and Submit a Basic Job Script

To run a job on the HPC cluster, create a plain text script file containing the commands you want to run. A basic script called myscript.sh looks like this:

#!/bin/bash
#SBATCH --mail-type=BEGIN,END,FAIL
my-program-name

The #SBATCH --mail-type=BEGIN,END,FAIL line tells Slurm to send you email notifications when your job starts, finishes, or fails.
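
If you want the notifications sent to a specific address, you can also add a --mail-user line to the script (the address below is a placeholder):

#SBATCH --mail-user=your_username@bowdoin.edu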

To submit the script to the cluster, log in to moosehead.bowdoin.edu, change into the directory containing your script, and run:

sbatch myscript.sh

By default, the cluster assigns one CPU core to your job. To request more cores, see the parallel processing section below.
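
Other sbatch options can also be written into the script itself as #SBATCH lines, so a plain sbatch myscript.sh picks them up automatically. A small sketch using two common Slurm options (the job name and output filename are placeholders; %j expands to the job number):

#!/bin/bash
# Name the job and send its output to a file instead of the default slurm-<jobid>.out
#SBATCH --job-name=myjob
#SBATCH --output=myjob-%j.out
#SBATCH --mail-type=BEGIN,END,FAIL
my-program-name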

Submit a Quick Job with hpcsub

The hpcsub wrapper script provides a shortcut for running a single command or program on the cluster without writing a script file. For example, to run a program called myprogram using the default 1 CPU core and 6 GB of memory:

hpcsub -cmd myprogram

Run Parallel Processing Jobs

The HPC cluster supports two types of parallel processing: SMP (Symmetric Multiprocessing) and OpenMPI.

SMP (shared memory) runs a job on multiple CPU cores on a single machine. To submit an SMP job, use the -N 1 flag (one compute node) and -n to specify the number of CPU cores:

sbatch -N 1 -n 8 myscript.sh

This example requests 8 CPU cores on a single compute node. The program being run must support multi-core processing — simply requesting multiple cores does not automatically parallelize a single-threaded program.
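
As with the options above, the core request can live in the script itself as #SBATCH lines. A minimal SMP sketch (my-threaded-program stands in for your own multi-core program; the OMP_NUM_THREADS line only matters if your program uses OpenMP threads):

#!/bin/bash
# Request 8 CPU cores on a single compute node (SMP / shared memory)
#SBATCH -N 1
#SBATCH -n 8
#SBATCH --mail-type=BEGIN,END,FAIL
# If the program reads OMP_NUM_THREADS, match it to the allocated cores
export OMP_NUM_THREADS=$SLURM_NTASKS
my-threaded-program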

OpenMPI runs a job across multiple CPU cores spread over multiple machines. To submit an OpenMPI job, use only the -n flag to specify the total number of CPU cores (Slurm will distribute cores across available nodes):

sbatch -n 16 myscript.sh

The program must be compiled with OpenMPI support to use this mode.
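
A matching OpenMPI script might look like the sketch below, assuming my_mpi_program is your MPI-built executable and the appropriate OpenMPI environment or module is already loaded; with a Slurm-aware mpirun, the launcher finds the allocated cores on its own:

#!/bin/bash
# Request 16 CPU cores; Slurm may spread them across several machines
#SBATCH -n 16
#SBATCH --mail-type=BEGIN,END,FAIL
# Launch the MPI program across the allocated cores
mpirun my_mpi_program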

Fixed cores per node: To request a specific number of cores per node in a multi-node job, use --ntasks-per-node. For example, to request 4 nodes with 12 cores each (48 cores total):

sbatch -N 4 --ntasks-per-node=12 myscript.sh

Run Interactive Jobs

Interactive jobs allow you to work with HPC resources in real time. To start an interactive session on a compute node:

srun --pty bash

When the session starts, you will be placed on a compute node where you can run commands directly. Type exit to end the interactive session and return to the headnode.
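
If the default single core is not enough for your interactive work, the same resource flags used with sbatch above also work with srun. For example, to get 4 cores on one compute node for the session:

srun -N 1 -n 4 --pty bash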


Monitor and Control Your Jobs

Use these Slurm commands to check job status and manage running jobs; a short worked example follows the list:

  • squeue — view all jobs currently running or waiting in the queue
  • squeue -u username — view only your own jobs
  • scancel jobid — cancel a specific job (replace jobid with the job number shown in squeue output)
  • scancel -u username — cancel all of your jobs
  • scontrol show job jobid — view detailed information about a specific job
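
Put together, a typical cycle looks like the sketch below. The job ID 12345 is made up; yours will be whatever number sbatch reports when you submit:

sbatch myscript.sh        # prints "Submitted batch job 12345"
squeue -u your_username   # check whether the job is running or waiting
scontrol show job 12345   # detailed information about the job
scancel 12345             # cancel the job if you no longer need it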

Why Is My Job Waiting in the Queue?

Jobs may wait in the queue when the requested resources are not immediately available. Common reasons include:

  • SMP jobs requesting many cores: An SMP job that requires all cores on a single machine (for example, 24 cores) may wait until a single node has that many cores free, even if the total number of free cores across the cluster is higher.
  • High cluster utilization: When many jobs are running, new jobs wait until resources become available.
  • GPU or specialized resource requests: Jobs requesting specific GPU cards or high-memory nodes may wait if those resources are currently in use.

To check what resources are currently available, use:

sinfo

If your SMP job is waiting because no single node has enough free cores, you can cancel it with scancel and resubmit it with fewer cores so it can start sooner.
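
To get a rough sense of when a pending job might start, squeue can also report Slurm's estimated start times; treat the estimate as a guide only, since it changes as other jobs finish or are submitted:

squeue -u your_username --start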

Additional Help

If you need further assistance, you have several options:

  • Bowdoin Bot: Chat with Bowdoin Bot directly from any KB page for instant answers.
  • Phone: Call the Bowdoin College Service Desk at (207) 725-3030.
  • In person: Visit the Tech Hub in Smith Union during business hours.
  • Submit a ticket: Request assistance through the Service Catalog.

AI-assisted content: This article was drafted with the assistance of an AI writing tool and reviewed by Bowdoin IT staff for accuracy.

Related Articles (7)

  • Bowdoin College provides a Linux-based High-Performance Computing (HPC) cluster for faculty, students, and researchers. The cluster offers approximately 1,400 CPU cores, GPU computing, up to 2 TB of RAM per node, and a variety of scientific software. This article provides an overview of HPC resources and how to get started.
  • Instructions for connecting to the Bowdoin HPC environment using SSH, the HPC Web Portal, JupyterLab, or RStudio. Covers SSH access from macOS and Linux, VPN requirements for off-campus use, and SSH configuration tips for dropped connections.
  • A comprehensive reference of software available on the Bowdoin HPC Linux cluster, including commercial packages such as MATLAB, Gaussian, Mathematica, Stata, and COMSOL, as well as over 130 open-source scientific applications. Includes instructions for using the module system to load software and detailed usage guides for each commercial package.
  • Reference information for the Bowdoin HPC Slurm cluster, including queue (partition) descriptions, job policies and resource limits, and a hardware overview suitable for grant proposals.
  • Instructions for transferring files between your local computer and the Bowdoin HPC environment. Covers the HPC Web Portal file browser, mounting the HPC research space via SMB from macOS or Windows, SFTP from the command line, and using Gluster temporary scratch storage for running jobs.
  • Instructions for requesting GPU computing, high-memory nodes, and other specialized resources on the Bowdoin HPC Slurm cluster. Covers available NVIDIA GPU cards and request syntax, memory reservation options, mixed GPU and CPU jobs, and the experimental NVIDIA Grace Hopper system.
  • The Bowdoin HPC Web Portal (Open OnDemand) provides browser-based access to the HPC environment for command-line sessions, graphical applications, file management, and job monitoring. The portal is accessed at hpcweb.bowdoin.edu using Firefox or Chrome.