Use GPU and High-Memory Resources on the HPC Cluster

Questions

  • How do I run a GPU job on the Bowdoin HPC cluster?
  • What GPU cards are available on the HPC cluster?
  • How do I request a specific GPU card for my job?
  • How do I request more memory for my HPC job?
  • What is the maximum amount of RAM I can request?
  • How do I use the high-memory queue?
  • Can I run a job that uses both GPU and CPU resources?
  • What is the NVIDIA Grace Hopper system at Bowdoin?
  • What CUDA compute capability do the GPU cards support?

Environment

This article applies to Bowdoin faculty, students, and researchers running jobs that require GPU computing or large amounts of memory on the HPC Slurm cluster. Familiarity with basic job submission using sbatch is assumed. See Submit and Manage Jobs on the HPC Slurm Cluster in the Related Articles section for job submission basics.

Resolution

GPU Computing

GPU (Graphics Processing Unit) computing uses the parallel processing power of video cards for high-speed computation. A GPU contains hundreds of specialized processors and dedicated high-speed memory. Many scientific software applications can take advantage of GPU acceleration, and you can also write custom GPU code using the NVIDIA CUDA programming environment.

The Bowdoin HPC cluster includes a variety of NVIDIA GPU cards. Each type of card has a unique identifier used when requesting it in a job submission. The currently available GPU cards are:

  • NVIDIA GeForce RTX 3080 — 10 GB, CUDA compute capability 8.6 — request with --gres=gpu:rtx3080:1
  • NVIDIA GeForce RTX 2080 Ti — 11 GB, CUDA compute capability 7.5 — request with --gres=gpu:rtx2080:1
  • NVIDIA GeForce RTX 5090 — 32 GB, CUDA compute capability 12.0 — request with --gres=gpu:rtx5090:1
  • NVIDIA A100 Tensor Core — 80 GB, CUDA compute capability 8.0 — request with --gres=gpu:a100:1
  • NVIDIA RTX PRO 6000 Blackwell Server Edition — 96 GB, CUDA compute capability 12.0 — request with --gres=gpu:pro6000:1
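If you compile your own CUDA code, target the compute capability of the card you plan to request. A sketch of the nvcc invocation (the source and output file names are placeholders, and compiling requires the CUDA toolkit to be installed; newer capabilities such as 12.0 need a correspondingly recent toolkit):

```shell
# Pass the card's compute capability to nvcc with the dot removed:
# 8.6 -> sm_86, 7.5 -> sm_75, 12.0 -> sm_120.
nvcc -arch=sm_86 my_kernel.cu -o my_kernel    # targets the RTX 3080 (capability 8.6)
nvcc -arch=sm_75 my_kernel.cu -o my_kernel    # targets the RTX 2080 Ti (capability 7.5)
```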

To submit a job that uses a GPU card, specify the gpu partition and the type of GPU card:

sbatch -p gpu --gres=gpu:rtx2080:1 myscript.sh

This example runs your job on a compute node with an RTX 2080 Ti GPU card. The GPU card is assigned exclusively to your job until it completes.
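Rather than passing options on the command line, you can embed them in the job script itself as #SBATCH directives. A minimal sketch (the job name and program name are placeholders for your own software):

```shell
#!/bin/bash
#SBATCH -p gpu                   # run in the gpu partition
#SBATCH --gres=gpu:rtx2080:1     # request one RTX 2080 Ti card
#SBATCH -J gpu-example           # job name (placeholder)

# nvidia-smi prints the card Slurm assigned to this job -- a quick sanity check.
nvidia-smi

# Replace with your own GPU-enabled program (hypothetical name):
./my_gpu_program
```

With the directives in the script, you can submit it with a plain sbatch myscript.sh.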


Memory Reservation

The default main queue allows up to 370 GB of RAM per compute node. The highmem queue allows up to 2 TB of RAM per compute node.

To request a specific total amount of memory per compute node, use the --mem option:

sbatch --mem=50G myscript.sh

To request a specific amount of memory per CPU core, use --mem-per-cpu. For example, to request 5 GB per core across 12 cores (60 GB total):

sbatch --mem-per-cpu=5G -n 12 myscript.sh
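The total reservation is simply the per-core amount multiplied by the core count, which you can sanity-check before submitting (the variable names below are illustrative, not Slurm-defined):

```shell
# --mem-per-cpu multiplied by the core count (-n) gives the total memory reserved.
MEM_PER_CPU_GB=5
NTASKS=12
TOTAL_GB=$(( MEM_PER_CPU_GB * NTASKS ))
echo "This job will reserve ${TOTAL_GB}G in total"
```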

To request more than 370 GB, use the highmem queue. For example, to request 1 TB (1,000 GB) of RAM:

sbatch -p highmem --mem=1000G myscript.sh

Important: If you request more memory than any available node can provide, the job will remain pending in the queue indefinitely and never start. Check available resources with sinfo before submitting high-memory jobs.
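One way to see how much memory each node offers before submitting (a sketch; this must be run on the cluster, and sinfo's %m field reports memory in megabytes):

```shell
# List each node with its configured memory (MB) and partition, so you can
# confirm a node exists that satisfies your --mem request.
sinfo -N -o "%N %m %P"
```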

Mixed GPU and CPU Jobs

You can request both GPU and CPU resources in a single job. For example, to request a GPU card along with 8 CPU cores:

sbatch -p gpu --gres=gpu:rtx2080:1 -n 8 myscript.sh
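The same mixed request can be written as a batch script with #SBATCH directives (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH -p gpu                   # gpu partition
#SBATCH --gres=gpu:rtx2080:1     # one RTX 2080 Ti card
#SBATCH -n 8                     # eight CPU cores

# Placeholder for a program that uses the GPU alongside multiple CPU cores,
# e.g. for data loading or preprocessing threads:
./my_hybrid_program
```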

Experimental NVIDIA Grace Hopper System

The Bowdoin HPC environment includes an experimental NVIDIA Grace Hopper system. The Grace Hopper architecture combines an NVIDIA GPU with an ARM-based CPU in a single integrated module, connected by a high-bandwidth NVLink-C2C interconnect.

Note: The Grace Hopper system is experimental. Contact Bowdoin IT through the Service Catalog for guidance on using the Grace Hopper system for your research.

Additional Help

If you need further assistance, you have several options:

  • Bowdoin Bot: Chat with Bowdoin Bot directly from any KB page for instant answers.
  • Phone: Call the Bowdoin College Service Desk at (207) 725-3030.
  • In person: Visit the Tech Hub in Smith Union during business hours.
  • Submit a ticket: Request assistance through the Service Catalog.


AI-assisted content: This article was drafted with the assistance of an AI writing tool and reviewed by Bowdoin IT staff for accuracy.

Related Articles (3)

Bowdoin College provides a Linux-based High-Performance Computing (HPC) cluster for faculty, students, and researchers. The cluster offers approximately 1,400 CPU cores, GPU computing, up to 2 TB of RAM per node, and a variety of scientific software. This article provides an overview of HPC resources and how to get started.
Reference information for the Bowdoin HPC Slurm cluster, including queue (partition) descriptions, job policies and resource limits, and a hardware overview suitable for grant proposals.
Instructions for submitting, monitoring, and managing jobs on the Bowdoin HPC Slurm cluster. Covers writing job scripts, using sbatch and the hpcsub wrapper, running parallel processing jobs (SMP and OpenMPI), running interactive jobs, and controlling jobs with squeue and scancel.