Slurm

Terminology

Tasks

A task is a command that Slurm runs in parallel using srun. Tasks let you run several commands at the same time instead of one after another, which can improve performance.
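As a rough illustration of running two commands as parallel tasks, a batch script might look like the sketch below. The run_task helper, the echo commands, and the log file names are hypothetical and only there for illustration; the helper falls back to running the command directly so the script can be tried on a machine without Slurm. Depending on the Slurm version, srun may need an extra flag such as --exclusive or --exact for the two steps to run side by side inside the allocation.

```shell
#!/bin/bash
# Reserve resources for two tasks with one CPU each.
#SBATCH --job-name=two-tasks
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=1

# Hypothetical helper: launch one task with srun, or run the command
# directly when srun is not installed (for example, a dry run on a laptop).
run_task() {
    if command -v srun >/dev/null 2>&1; then
        srun --ntasks=1 "$@"
    else
        "$@"
    fi
}

# Start both tasks in the background so they can run at the same time.
run_task sh -c 'echo "task A finished"' > task_a.log &
run_task sh -c 'echo "task B finished"' > task_b.log &

# Block until every background task has exited.
wait
```

Without the trailing & and the final wait, the second command would only start after the first one had finished.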

Partitions

Partitions are an organizational structure in Slurm that allows nodes to be grouped together and options and restrictions to be placed on them. Buddy has the following partitions:

Tip

This information can also be found by running the sinfo command in the terminal.

Name	Time limit	Description
general	5 days	Used for most jobs
general-long	30 days	Used for jobs that are expected to run for a long time
high-mem	5 days	Used for jobs which are expected to have high memory usage
high-mem-long	30 days	Used for long jobs which are expected to have high memory usage
gpu	5 days	Used for GPU jobs
gpu-long	30 days	Used for GPU jobs which are expected to run for a long time
testing	2 days	Reserved for our internal testing

We recommend that you use the partition that is most appropriate to your application.
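One way to read the partition table is as a simple lookup on job type and expected run time. The pick_partition function below is a hypothetical helper, not something provided on Buddy; it only encodes the 5-day and 30-day limits listed in the table and leaves out the testing partition, which is reserved for internal use.

```shell
# Hypothetical helper: suggest a partition from the table of Buddy partitions.
#   $1 = expected run time in whole days
#   $2 = job kind: "cpu" (default), "high-mem", or "gpu"
pick_partition() {
    case "${2:-cpu}" in
        cpu)      base=general ;;
        high-mem) base=high-mem ;;
        gpu)      base=gpu ;;
        *)        echo "unknown job kind: $2" >&2; return 1 ;;
    esac

    if [ "$1" -le 5 ]; then
        # Fits within the 5-day limit of the standard partitions.
        echo "$base"
    elif [ "$1" -le 30 ]; then
        # Needs one of the 30-day "-long" partitions.
        echo "${base}-long"
    else
        echo "no partition allows a $1-day job" >&2
        return 1
    fi
}
```

The result could then be passed to sbatch, for example as --partition="$(pick_partition 10 gpu)", which would select gpu-long.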

Cores

Each node has 20 cores, so the product of --ntasks-per-node and --cpus-per-task should not exceed 20.
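The 20-core rule is a simple multiplication, which can be checked before submitting a job. The sketch below assumes the per-node core count stated above; fits_on_node is a hypothetical helper name.

```shell
# Cores per node on Buddy, as stated above.
CORES_PER_NODE=20

# Hypothetical helper: succeed only if the request fits on one node.
#   $1 = value for --ntasks-per-node
#   $2 = value for --cpus-per-task
fits_on_node() {
    [ $(( $1 * $2 )) -le "$CORES_PER_NODE" ]
}

# 4 x 5 = 20 cores: exactly at the limit, so this fits.
fits_on_node 4 5 && echo "4 tasks x 5 CPUs fits"

# 4 x 6 = 24 cores: more than one node can provide.
fits_on_node 4 6 || echo "4 tasks x 6 CPUs exceeds $CORES_PER_NODE cores"
```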

Commands

Command	Description
sbatch	allocates resources and runs the given script using Slurm
srun	used within an sbatch script to run a command as a parallel task
smap	displays the jobs currently running on the cluster
sinfo	displays information about down and running nodes as well as partition information

Sbatch Parameters

Sbatch parameters control how jobs are submitted and run on Buddy.

Common sbatch parameters

Name	Environment variables	Default	Description
-J, --job-name	SLURM_JOB_NAME	script name or “sbatch”	the name of your job
-o, --output	N/A	“slurm-%j.out”	the file to which the program's standard output is written
-e, --error	N/A	“slurm-%j.out”	the file to which the program's standard error is written
-n, --ntasks	N/A	1 unless --cpus-per-task is set	the maximum number of tasks sbatch should allocate resources for
-N, --nodes	SLURM_JOB_NUM_NODES	enough nodes to satisfy the -n and -c options	the number of nodes to allocate. A minimum and maximum can also be set, for example --nodes=10-12
-c, --cpus-per-task	SLURM_CPUS_PER_TASK	one processor per task	the number of CPUs to allocate for each task
--ntasks-per-node	SLURM_TASKS_PER_NODE	?	the number of tasks to run on each node
-p, --partition	SBATCH_PARTITION	general	the partition to run the job in
-t, --time	SBATCH_TIMELIMIT	max time for partition	the maximum amount of time the job is allowed to run
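To show how these parameters fit together, here is a minimal sketch of a batch script header. The job name, file names, and resource values are placeholders chosen for illustration, not recommended settings. The body falls back to default values when the Slurm environment variables are missing, so it also runs as a plain shell script outside a job.

```shell
#!/bin/bash
# Job name shown in the queue (-J).
#SBATCH --job-name=example
# Standard output and standard error files (-o, -e); %j expands to the job ID.
#SBATCH --output=example-%j.out
#SBATCH --error=example-%j.err
# Partition (-p) and time limit (-t), here one hour as hours:minutes:seconds.
#SBATCH --partition=general
#SBATCH --time=01:00:00
# One node (-N) with 4 tasks of 5 CPUs each (-c): 4 x 5 = 20 cores.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=5

# Slurm exports these variables inside the job; the values after ":-"
# are only used when the script is run outside Slurm.
job_name=${SLURM_JOB_NAME:-example}
cpus=${SLURM_CPUS_PER_TASK:-1}

# Pass the per-task CPU count to threaded programs (OpenMP convention).
export OMP_NUM_THREADS=$cpus

echo "Job $job_name: $cpus CPU(s) per task"
```

Options given on the sbatch command line take precedence over the #SBATCH lines in the script, so the same script can be resubmitted with, for example, a different partition.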