Understanding the following concepts is crucial: a resource allocation suited to your application can mean the difference between hours and days of running time.
Throughout this document we will use the following symbols:
P: number of processes
T: number of threads
N: number of nodes
X: simply means "many"
Their relationship is as follows:
- Each node contains multiple processes, and each process contains multiple threads.
- All threads of the same process can communicate directly through shared memory; the communication cost is minimal.
- Threads from different processes cannot communicate directly; this counts as process-to-process communication, described below.
- All processes on the same node can communicate through Linux IPC (shared files, shared memory, pipes, Unix sockets, signals, locks, etc.); the communication cost is larger.
- Processes on different nodes can communicate through MPI or another network protocol; the communication cost is the largest.
We will not discuss the details of processes and threads themselves, as they are not essential for this HPC discussion; for more information, read: Difference between Process and Thread
As far as HPC parallelization is concerned, applications fall into a few categories:
- Single Process Single Thread - no parallelization
Most applications work this way, and there is no way to speed them up even when running on an HPC cluster. However, you can still benefit from the large memory and storage space of our cluster.
You should request just a single process:
P = 1
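As a minimal sketch, a serial job script might look like the following (the program name, memory, and time limit are placeholders for illustration):

```shell
#!/bin/bash
# Serial job: one process, one thread (P = 1)
#SBATCH -n 1          # one task (process)
#SBATCH -c 1          # one CPU (thread) for that task
#SBATCH --mem=64G     # large memory is still useful for a serial job (placeholder value)
#SBATCH -t 24:00:00   # time limit (placeholder value)

./my_serial_program   # placeholder for your application
```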
- Single Process Multiple Thread
Many applications work this way, typically through OpenMP in HPC; sometimes an application may use pthreads directly. The threads communicate with each other within the same process memory space, so the communication cost between threads is very small.
You should request a single process with multiple threads:
P = 1, T = 2 ~ 40
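A hedged sketch of an OpenMP job script under these settings (the program name is a placeholder):

```shell
#!/bin/bash
# Single process, multiple threads (P = 1, T = 10)
#SBATCH -n 1    # one task (process)
#SBATCH -c 10   # ten CPUs (threads) for that task

# Tell OpenMP to use exactly the CPUs Slurm allocated
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program   # placeholder for your application
```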
- Multiple Process Single Thread on Multiple Node
Most MPI applications work this way. Processes communicate with each other through MPI (Message Passing Interface): over the InfiniBand network between different nodes, and through shared memory on the same node. This is the most expensive form of communication.
You should request multiple processes, each with a single thread:
P = 2 ~ X, T = 1
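A sketch of a pure MPI job script (the program name is a placeholder; your site may prefer mpirun over srun):

```shell
#!/bin/bash
# Multiple processes, single thread each (P = 40, T = 1)
#SBATCH -n 40   # forty tasks (MPI ranks)
#SBATCH -c 1    # one CPU (thread) per task

# srun launches one MPI rank per task
srun ./my_mpi_program   # placeholder for your application
```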
- Multiple Process Multiple Thread on Multiple Node
Some MPI applications work this way, typically utilizing both MPI and OpenMP. An application linked with an optimized BLAS library (Intel MKL, OpenBLAS) can also benefit from this model.
You should request multiple processes, each with multiple threads:
P = 2 ~ X, T = 2 ~ 40
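A sketch of a hybrid MPI + OpenMP job script (the program name is a placeholder):

```shell
#!/bin/bash
# Hybrid MPI + OpenMP (P = 4, T = 10)
#SBATCH -n 4    # four tasks (MPI ranks)
#SBATCH -c 10   # ten CPUs (OpenMP threads) per rank

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_hybrid_program   # placeholder for your application
```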
- Multiple Process Single Thread on Same Node - Uncommon
This kind of application is rare; it usually communicates over shared memory, Unix sockets, or other Linux IPC.
You should request multiple processes with a single thread each, on the same node:
P = 2 ~ 40, T = 1, N = 1
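A sketch of such a request as a job script (the program name is a placeholder):

```shell
#!/bin/bash
# Multiple processes, single thread, one node (P = 8, T = 1, N = 1)
#SBATCH -N 1                  # keep all tasks on one node so Linux IPC works
#SBATCH --ntasks-per-node 8   # eight tasks (processes) on that node
#SBATCH -c 1                  # one CPU (thread) per task

srun ./my_ipc_program   # placeholder for your application
```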
- Multiple Process Multiple Thread on Same Node - Very Uncommon
Very few applications work this way; it is almost the same as the case above.
You should request just one node and let the application decide how to run itself:
N = 1, exclusive
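A sketch of an exclusive single-node request as a job script (the program name is a placeholder):

```shell
#!/bin/bash
# One exclusive node; the application manages its own processes and threads
#SBATCH -N 1          # one node
#SBATCH --exclusive   # no other jobs share this node

./my_self_managing_program   # placeholder for your application
```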
Mapping Process and Thread to Slurm

In Slurm, nodes correspond to physical nodes, so to request nodes:
--nodes N
-N N # short form
To request a node exclusively, use:
--exclusive
In Slurm, a task is equivalent to a process in most cases, so to request processes:
--ntasks P
-n P # short form
--ntasks-per-node P # when used with --nodes or -N
In Slurm, CPUs per task are equivalent to threads in most cases, so to request threads:
--cpus-per-task T
-c T # short form
Examples:
- single process, multiple threads:
-n 1
-c 10
- multiple processes, multiple threads:
-n 5
-c 10
- multiple processes, multiple threads on the same node:
-N 1
--ntasks-per-node 4
-c 5
All of the above arguments work with both interactive salloc and job script sbatch. For example:
salloc -c 10
For a job script:
#SBATCH -n 2
#SBATCH -c 10
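Putting it together, a complete hedged sketch of a job script using these two directives (the program name is a placeholder):

```shell
#!/bin/bash
#SBATCH -n 2    # two tasks (processes)
#SBATCH -c 10   # ten CPUs (threads) per task

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_program   # placeholder for your application
```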