#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    char host[MPI_MAX_PROCESSOR_NAME];
    int rank, procs, host_len;

    MPI_Init(&argc, &argv);                    /* initialize the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &procs);     /* total number of tasks */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this task's rank */
    MPI_Get_processor_name(host, &host_len);   /* node this task runs on */
    printf("started %s at %s with rank: %d of %d\n", argv[0], host, rank, procs);
    MPI_Finalize();
    return 0;
}
# Step 0: Load New Module System
$ enable_lmod
# Step 1: Check modules
$ module list
Currently Loaded Modules:
1) slurm/17.02
# Step 2: Load correct modules
$ module load gcc/4
$ module load openmpi/2.0
# Step 3: Check again
$ module list
Currently Loaded Modules:
1) slurm/17.02 2) gcc/4 3) openmpi/2.0
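If you are not sure which versions are installed, Lmod can list them for you. `module avail` and `module spider` are standard Lmod commands, though the exact modules shown will differ from cluster to cluster:
# list the modules visible right now, optionally filtered by name
$ module avail openmpi
# search everything Lmod knows about, including modules hidden behind prerequisites
$ module spider openmpi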
For a simple program, you can compile it manually:
$ mpicc simple_mpi.c -o simple_mpi
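Note that `mpicc` is just a wrapper around the loaded gcc that adds the MPI include and library flags. With Open MPI you can inspect what it actually runs (`--showme` is Open MPI specific; MPICH uses `-show` instead):
# print the underlying compiler command line without executing it
$ mpicc --showme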
For a more complicated program, the `make` command is recommended to manage the compilation process. In addition, you may wish to create separate directories to hold source code, object files, and so on, so that everything stays tidy as the number of files in your program grows. For instance, you might create a simple directory structure like the one below:
$ tree
.
├── build
│   └── simple_mpi.o    # object files
├── Makefile            # generic Makefile
├── simple_mpi          # final binary
├── src
│   └── simple_mpi.c    # source code files
└── submission.sh       # submission script for Slurm
2 directories, 5 files
Here is a generic Makefile. You can expand on it for your own program.
# This Makefile assumes you are writing your code in C. You can change
# the EXT variable to make it work for other languages as well.
# For example:
#   C++:     EXT = cpp
#   Fortran: EXT = F
EXT = c
SRCS = $(shell find src -name '*.$(EXT)')
OBJS = $(SRCS:src/%.$(EXT)=build/%.o)
BIN = simple_mpi
# You need to change the compiler for other languages as well.
#   C++:     CC = mpiCC
#            LD = mpiCC
#   Fortran: CC = mpifort
#            LD = mpifort
CC = mpicc
LD = mpicc
# CFLAGS is passed to the compiler when compiling each object file;
# LDFLAGS is passed at the linking stage.
CFLAGS = -O2
LDFLAGS =
.PHONY: all clean

all: $(BIN)

$(BIN): $(OBJS)
	$(LD) $(LDFLAGS) $(OBJS) -o $(BIN)

build/%.o: src/%.$(EXT)
	@mkdir -p $(@D)    # create build/ if it does not exist yet
	$(CC) $(CFLAGS) -c $< -o $@

clean:
	rm -f build/*.o
	rm -f $(BIN)
With this Makefile, compiling is as simple as running `make`:
$ make
mpicc -O2 -c src/simple_mpi.c -o build/simple_mpi.o
mpicc build/simple_mpi.o -o simple_mpi
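The `clean` target removes everything the build produced, which is handy before a fresh rebuild. Since make echoes each recipe line, the output follows directly from the Makefile above:
$ make clean
rm -f build/*.o
rm -f simple_mpi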
This is a very basic, generic submission script. You can expand on it for your own program.
#!/bin/bash
#SBATCH --job-name=simple_mpi # set job name to simple_mpi
#SBATCH --output=output # send stdout and stderr to the same file
#SBATCH --ntasks=4 # launch 4 tasks, which will use 4 cores by default
enable_lmod # enable new module system
module load gcc/4 # load run time (gcc/openmpi) for your code
module load openmpi/2.0
srun simple_mpi
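If your program later needs more than a handful of cores, the same script scales up with a few extra #SBATCH directives. Below is a hedged sketch; the node count and tasks per node are assumptions, so check your cluster's actual core counts and partition limits before copying it:
#!/bin/bash
#SBATCH --job-name=simple_mpi
#SBATCH --output=output
#SBATCH --nodes=2               # request 2 nodes (assumed to be available)
#SBATCH --ntasks-per-node=16    # 16 MPI tasks per node (assumed core count)

enable_lmod
module load gcc/4
module load openmpi/2.0

srun simple_mpi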
To submit the job to Slurm, we will use the `sbatch` command.
$ sbatch submission.sh
Submitted batch job 740
# the job is submitted, and job number is 740
# at this point, the job should be in the PD (pending) state
#replace xxxxxxx with your username
$ squeue -u xxxxxxx
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
740 main simple_m mdong003 PD 0:00 1 (None)
# after a couple of seconds, the job starts executing
$ squeue -u xxxxxxx
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
740 main simple_m mdong003 R 0:01 1 coreV2-22-001
# you can also check your job status with `scontrol show job`, which shows more information
$ scontrol show job 740
JobId=740 JobName=simple_mpi
UserId=mdong003(30290) GroupId=users(14514) MCS_label=N/A
Priority=10002 Nice=0 Account=odu QOS=turing_default_qos
JobState=COMPLETED Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:05 TimeLimit=365-00:00:00 TimeMin=N/A
SubmitTime=2017-05-30T10:47:54 EligibleTime=2017-05-30T10:47:54
StartTime=2017-05-30T10:47:55 EndTime=2017-05-30T10:48:00 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=main AllocNode:Sid=turing2:164977
ReqNodeList=(null) ExcNodeList=(null)
NodeList=coreV2-22-001
BatchHost=coreV2-22-001
NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=4,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/mdong003/Source - HPCDocs/simple_mpi/submission.sh
WorkDir=/home/mdong003/Source - HPCDocs/simple_mpi
StdErr=/home/mdong003/Source - HPCDocs/simple_mpi/output
StdIn=/dev/null
StdOut=/home/mdong003/Source - HPCDocs/simple_mpi/output
Power=
# you can check the status of your job even after it has finished with `sacct`
$ sacct -j 740
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
740          simple_mpi       main        odu          4  COMPLETED      0:0
740.batch         batch                   odu          4  COMPLETED      0:0
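sacct can also report accounting fields that squeue never shows, such as elapsed time and peak memory use. The `--format` option and the field names below are standard sacct options, though what actually gets recorded depends on your cluster's accounting configuration:
# pick specific columns; MaxRSS (peak memory) is only recorded for job steps
$ sacct -j 740 --format=JobID,JobName,Elapsed,NCPUS,MaxRSS,State,ExitCode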
# Checking your output file is a relatively simple way to monitor your job
# Be aware that the output lines do not have a fixed order, due to the nature of parallel execution
$ cat output
started simple_mpi at coreV2-22-001 with rank: 0 of 4
started simple_mpi at coreV2-22-001 with rank: 2 of 4
started simple_mpi at coreV2-22-001 with rank: 1 of 4
started simple_mpi at coreV2-22-001 with rank: 3 of 4
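If you need the output in rank order (for debugging, say), one common trick is to let the ranks take turns, synchronizing with MPI_Barrier between turns. Here is a minimal sketch based on the program above:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, procs, turn;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ranks take turns: only the rank whose turn it is prints,
       then everyone synchronizes before the next turn */
    for (turn = 0; turn < procs; turn++) {
        if (rank == turn) {
            printf("rank %d of %d reporting\n", rank, procs);
            fflush(stdout);   /* flush before the next rank starts printing */
        }
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
Even this is not a strict guarantee, since Slurm and MPI forward stdout asynchronously, but in practice it keeps the lines in rank order.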