Body
Questions
- What queues are available on the Bowdoin HPC cluster?
- What are the resource limits for HPC jobs?
- What is the maximum number of CPU cores I can request?
- What is the maximum runtime for an HPC job?
- How many jobs can I run at the same time?
- What is the difference between the main, gpu, and highmem queues?
- What hardware does the Bowdoin HPC cluster have?
- How do I describe the HPC cluster in a grant proposal?
Environment
This article is a reference for Bowdoin faculty, students, and researchers using the HPC Slurm cluster. It covers cluster queues (also called partitions in Slurm), job policies, resource limits, and hardware specifications.
Resolution
Queue (Partition) Descriptions
The Bowdoin HPC Slurm cluster organizes resources into queues (Slurm calls these "partitions"). When submitting a job, you can specify which queue to use with the -p option. If you do not specify a queue, your job is submitted to the main queue by default.
- main — the default queue. Provides access to standard compute nodes with up to 370 GB of RAM per node.
- gpu — for jobs that require GPU computing. You must also specify the GPU type with the --gres option. See Use GPU and High-Memory Resources on the HPC Cluster in the Related Articles section.
- highmem — for jobs that require more than 370 GB of RAM per node, up to 2 TB. Use the --mem option to specify the amount of memory needed (see the sample job script after this list).
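As an illustration, a minimal job script for the highmem queue might look like the following. The job name, memory amount, wall time, and program path are placeholders; the #SBATCH directives shown are standard Slurm options.

    #!/bin/bash
    #SBATCH --job-name=highmem-example   # illustrative job name
    #SBATCH -p highmem                   # select the highmem queue (partition)
    #SBATCH --mem=500G                   # request 500 GB of RAM (above the 370 GB main-queue ceiling)
    #SBATCH --time=12:00:00              # requested wall time; adjust to your needs

    # Placeholder for your actual program.
    ./my_analysis --input data.csv

Submit the script with sbatch, for example: sbatch highmem-job.sh. For a GPU job, you would instead use -p gpu together with a --gres request such as --gres=gpu:1; typed requests (for example, gpu:a100:1) depend on how GPU types are named on this cluster, so check the GPU article linked above.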
Job Policies and Resource Limits
The following policies and limits apply to jobs submitted to the HPC cluster. These limits help ensure fair access to shared resources across the Bowdoin HPC community.
Note: Specific numeric limits for maximum concurrent jobs, maximum runtime per job, and maximum cores per user may change as the cluster is expanded or reconfigured. Run sacctmgr show qos format=Name,MaxWall,MaxTRESPerUser on the cluster headnode to see the current limits, or contact the IT Service Desk for the latest information.
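For example, from a login shell on the headnode (both are standard Slurm commands):

    # Per-QOS limits: maximum wall time and per-user resource caps
    sacctmgr show qos format=Name,MaxWall,MaxTRESPerUser

    # Per-partition settings, including time limits and node lists
    scontrol show partition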
General policies include:
- Jobs are scheduled on a first-come, first-served basis.
- Each job is assigned a priority based on submission time and resource request.
- Email notifications are available for job start, completion, and failure (configured with #SBATCH --mail-type=BEGIN,END,FAIL in your job script; see the sample script after this list).
- Jobs that exceed their requested resources (memory, time) may be terminated by the scheduler.
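A minimal sketch of a job script with email notifications enabled; the job name, email address, and program are placeholders:

    #!/bin/bash
    #SBATCH --job-name=notify-example          # illustrative job name
    #SBATCH --mail-type=BEGIN,END,FAIL         # email on start, completion, and failure
    #SBATCH --mail-user=username@bowdoin.edu   # placeholder; use your own address
    #SBATCH --time=01:00:00                    # stay within your requested runtime

    ./my_program   # placeholder for your actual workload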
Hardware Overview
The Bowdoin HPC Cluster consists of the following hardware (suitable for inclusion in grant proposals):
- CPU cores: approximately 1,400 cores spread across multiple compute nodes
- Compute nodes: 16 to 192 CPU cores and 192 GB to 2 TB of RAM per node
- GPU cards: approximately 20 NVIDIA GPU cards, including RTX 2080 Ti, RTX 3080, RTX 5090, A100, and RTX Pro 6000 Blackwell models
- Networking: 2x100 Gb/s low-latency Ethernet per node (200 Gb/s aggregate)
- Storage: dedicated, redundantly configured Gluster high-speed networked filesystem for temporary scratch storage
- Operating system: Rocky Linux
- Job scheduler: Slurm Workload Manager
- Parallel processing: single-threaded, SMP (shared memory), and OpenMPI environments (a sample MPI job script follows this list)
- Experimental: NVIDIA Grace Hopper integrated GPU-CPU system
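As a sketch, an OpenMPI job on the main queue might be submitted as follows. The module name, task count, and program path are assumptions about this cluster's configuration, not confirmed details:

    #!/bin/bash
    #SBATCH -p main              # standard compute nodes
    #SBATCH --ntasks=32          # number of MPI ranks (illustrative)
    #SBATCH --time=04:00:00      # requested wall time

    # The module name is an assumption; run "module avail" to see what is installed.
    module load openmpi

    # srun launches the MPI ranks under Slurm's control; mpirun also works with OpenMPI.
    srun ./my_mpi_program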
HPC Environment Status
A live status dashboard showing the current state of HPC resources is available at hpc.bowdoin.edu/status.
Additional Help
If you need further assistance, you have several options:
- Bowdoin Bot: Chat with Bowdoin Bot directly from any KB page for instant answers.
- Phone: Call the Bowdoin College Service Desk at (207) 725-3030.
- In person: Visit the Tech Hub in Smith Union during business hours.
- Submit a ticket: Request assistance through the Service Catalog.
Additional Resources
AI-assisted content: This article was drafted with the assistance of an AI writing tool and reviewed by Bowdoin IT staff for accuracy.