# Slurm

## Terminology

### Tasks

A task is a command that is run in parallel by Slurm using `srun`. Tasks can be used to run several commands at the same time instead of one after another, improving performance.
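As a minimal sketch (the job name is illustrative), a batch script can launch one command as several parallel tasks:

```bash
#!/bin/bash
#SBATCH --job-name=parallel-tasks
#SBATCH --ntasks=4

# srun launches the command once per allocated task, in parallel,
# so this prints the hostname four times
srun hostname
```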

### Partitions

Partitions are an organizational structure in Slurm that allows nodes to be grouped together and certain options and restrictions to be placed on them. We have a few partitions on Buddy:

> **Tip:** This information can be found using the `sinfo` command in the terminal.

| Name | Time limit | Description |
| --- | --- | --- |
| general | 5 days | Used for most jobs |
| general-long | 30 days | Used for jobs that are expected to run for a long time |
| high-mem | 5 days | Used for jobs that are expected to have high memory usage |
| high-mem-long | 30 days | Used for long jobs that are expected to have high memory usage |
| gpu | 5 days | Used for GPU jobs |
| gpu-long | 30 days | Used for GPU jobs that are expected to run for a long time |
| testing | 2 days | Reserved for our internal testing |

We recommend that you use the partition that is most appropriate for your application.
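To target a specific partition at submission time, pass `-p`/`--partition` to `sbatch` (here `job.sh` is a placeholder script name):

```bash
# Submit job.sh to the high-mem partition instead of the default (general)
sbatch --partition=high-mem job.sh
```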

### Cores

Each node has 20 cores, so the product of `--ntasks-per-node` and `--cpus-per-task` should not exceed 20.
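For example (a sketch; `my_program` is a placeholder), four tasks with five CPUs each exactly fills a 20-core node:

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # 4 tasks per node...
#SBATCH --cpus-per-task=5     # ...x 5 CPUs each = 20 cores, the whole node

srun ./my_program
```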

## Commands

- `sbatch`: used to allocate resources and run the given script using Slurm
- `srun`: used within an sbatch file to run a command as a parallel task
- `smap`: displays the jobs currently running on the cluster
- `sinfo`: displays information about down and running nodes, as well as partition information
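A typical workflow on a login node might look like this (`job.sh` is a placeholder):

```bash
sbatch job.sh   # submit the batch script; Slurm prints the assigned job ID
sinfo           # check node and partition states
smap            # browse running jobs in a curses interface
```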

## Sbatch Parameters

Sbatch parameters are used to control the way jobs are submitted and run on Buddy.

### Common sbatch parameters

| Name | Environment variable | Default | Description |
| --- | --- | --- | --- |
| `-J, --job-name` | `SLURM_JOB_NAME` | script name or "sbatch" | the name of your job |
| `-o, --output` | N/A | `slurm-%j.out` | file to which the program's standard output is written |
| `-e, --error` | N/A | `slurm-%j.out` | file to which the program's standard error is written |
| `-n, --ntasks` | N/A | 1, unless `--cpus-per-task` is set | the maximum number of tasks sbatch should allocate resources for |
| `-N, --nodes` | `SLURM_JOB_NUM_NODES` | enough nodes to satisfy the `-n` and `-c` options | the number of nodes to allocate; a minimum and maximum can also be set, e.g. `--nodes=10-12` |
| `-c, --cpus-per-task` | `SLURM_CPUS_PER_TASK` | one processor per task | the number of CPUs to allocate for each task |
| `--ntasks-per-node` | `SLURM_TASKS_PER_NODE` | ? | the number of tasks to allocate on each node |
| `-p, --partition` | `SBATCH_PARTITION` | general | the partition to run the job in |
| `-t, --time` | `SBATCH_TIMELIMIT` | max time for partition | the maximum amount of time the job is allowed to run |
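Putting these together, a complete submission script might look like the following sketch (the job name, time limit, and `my_program` are all placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=5
#SBATCH --time=1-00:00:00       # 1 day, within the 5-day limit for general
#SBATCH --output=slurm-%j.out   # %j expands to the job ID

# Run ./my_program once per task, in parallel
srun ./my_program
```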