Jupyter is a very nice platform to use while developing a Python-based processing pipeline. However, there are several major limitations on using Jupyter on our cluster:
Although you can launch many Jupyter sessions to process more data or run more calculations, we advise turning your notebook into a Python script that runs in batch (non-interactive) mode, then submitting it for execution via SLURM, our job submission system.
In a nutshell, here are the steps you need to take:
1. Create a Python script, based on the code you have already written in the notebook, that runs without user intervention (i.e. it reads input from files and writes output to file(s) or prints it with the print function);
2. Create a "job script", which is a shell script with special instructions for the job scheduler (see the examples at https://wiki.hpc.odu.edu/Software/Python#job-script for a starting point; you will need to adjust the modules to load);
3. Submit the job from the cluster's terminal using sbatch YOUR_JOB_SCRIPT. Check the SLURM output file for errors, and fix the scripts/programs for any issues that arise.
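As an illustration of step 1, here is a minimal sketch of what such a non-interactive script might look like. The file name analyze.py, the one-number-per-line input format, and the statistics computed are all hypothetical; substitute the code from your own notebook. Results go to an output file, and progress messages go to standard output, which SLURM captures in its output file.

```python
"""analyze.py: a minimal non-interactive script (hypothetical example).

Reads one number per line from an input file, computes a few summary
statistics, and writes them to an output file. It never waits for input.
"""
import statistics
import sys


def load_values(path):
    """Read one floating-point number per non-blank line of a text file."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def summarize(values):
    """Return simple summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


def main(input_path, output_path):
    values = load_values(input_path)
    stats = summarize(values)
    # Write results to a file instead of displaying them in a notebook cell.
    with open(output_path, "w") as f:
        for key, value in stats.items():
            f.write(f"{key}\t{value}\n")
    # flush=True pushes the message into the SLURM output file right away
    # rather than leaving it in Python's output buffer.
    print(f"Processed {stats['count']} values from {input_path}", flush=True)
    return stats


if __name__ == "__main__":
    # Usage: python analyze.py INPUT_FILE OUTPUT_FILE
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("Usage: python analyze.py INPUT_FILE OUTPUT_FILE", file=sys.stderr)
```

In the job script, the command that runs it might then be python analyze.py values.txt summary.txt (file names hypothetical).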
Sometimes it is useful to test the Python script created in step 1 in an interactive Python session on the terminal. You can get such an interactive session with the salloc command (see, e.g., https://wiki.hpc.odu.edu/Software/Python#interactive-jobs).
See https://wiki.hpc.odu.edu/slurm for more detailed info on SLURM.
Q: How do I convert my Jupyter notebook into a Python script so I can run it with SLURM?
Method 1: Copy-and-paste
Using the OnDemand interface, open two tabs: (1) on the first tab, open the notebook in Jupyter; (2) on the second tab, use OnDemand's file explorer to create a blank Python script and edit it. Copy the pertinent segments of Python code from the notebook into the editor window.
Method 2: Using the jupyter nbconvert command
To do this, open a terminal session on the Wahab cluster, then issue the following sequence of commands ($ signifies the shell prompt):
$ module load python
$ module load py-jupyter
$ cd YOUR_NOTEBOOK_DIRECTORY
$ jupyter nbconvert --to python YOUR_NOTEBOOK.ipynb
Your entire notebook will be converted to a Python script, with the markdown documentation included as comments. Here is a sample snippet of what it looks like:
# ### 2.3 Updating Elements in a `Series`
#
# At times we need to modify certain elements in a `Series` or a `DataFrame`.
# This is accomplished by the use of `.loc[]` operator, which can read or update one or more elements corresponding to the specified labels.
#
# Here is an example for a single-element update:
# In[21]:
print("Before update:")
print(MEM_USAGE[7])
print(MEM_USAGE.loc[7])
# In[22]:
MEM_USAGE.loc[7] = 0.33
print("After update:")
print(MEM_USAGE.loc[7])
print(MEM_USAGE)
As you can see, it does not look as neat as the notebook.
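If you want only the code, without the markdown comments and the In[...] markers, a short script can pull the code cells out of the notebook file directly. This is a different approach from what jupyter nbconvert does. The sketch below uses only Python's standard library and assumes the notebook is stored in the nbformat 4 JSON layout (a top-level "cells" list whose entries have "cell_type" and "source" fields); the function names notebook_code and write_script are hypothetical. IPython-specific lines such as %matplotlib inline or !ls are copied as-is, so remove or rewrite them before running the result with plain python.

```python
import json


def notebook_code(path):
    """Return the source of all non-empty code cells in a .ipynb file, in order.

    Markdown and raw cells are skipped, so the result contains no
    documentation comments and no "In[...]" markers.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat 4 stores "source" either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        source = source.rstrip()
        if source:  # skip empty cells
            chunks.append(source)
    # Separate consecutive cells with a blank line.
    return "\n\n".join(chunks) + "\n"


def write_script(notebook_path, script_path):
    """Write the code cells of notebook_path to the file script_path."""
    with open(script_path, "w", encoding="utf-8") as f:
        f.write(notebook_code(notebook_path))
```

For example, write_script("YOUR_NOTEBOOK.ipynb", "YOUR_NOTEBOOK.py") produces a script containing only the notebook's code, which you can then tidy up by hand.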
Q: I'm still confused! Which method should I use, Method 1 or Method 2?
Method 1 is fully manual and may sound tedious, but you build your script step by step and stay fully aware of what goes into it.
Method 2 produces a messier script at first. But if you have written your notebook to work almost like a script (i.e. the notebook can be re-run over and over without manual intervention), you may choose this route to get started faster.
Whichever method you choose, keep in mind that you need a non-interactive script: it must rely only on inputs from files and/or command-line options.
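For command-line options, Python's standard argparse module is one convenient choice. The sketch below defines a parser for a hypothetical analysis script; the option names (--output, --threshold) and their defaults are illustrative assumptions only.

```python
import argparse


def build_parser():
    """Describe the command-line options of a hypothetical analysis script."""
    parser = argparse.ArgumentParser(
        description="Summarize a data file without any user interaction."
    )
    parser.add_argument("input", help="path to the input data file")
    parser.add_argument(
        "-o", "--output", default="summary.txt",
        help="where to write the results (default: %(default)s)",
    )
    parser.add_argument(
        "--threshold", type=float, default=0.5,
        help="example numeric parameter (default: %(default)s)",
    )
    return parser


# In the script itself, call build_parser().parse_args() with no arguments;
# argparse then reads the options from sys.argv. Passing an explicit list,
# as below, is handy for trying out the parser interactively.
args = build_parser().parse_args(["data.csv", "--threshold", "0.8"])
print(args.input, args.output, args.threshold)
```

In the job script, the corresponding line might then read python analyze.py data.csv --threshold 0.8 (script and file names hypothetical), so changing a parameter means editing the job script rather than the Python code.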
Q: I want to use Jupyter on my own local machine (desktop, laptop), can I get that?
Definitely! Jupyter is open-source software, available free of charge. Sometimes you may want to run an analysis or an interactive notebook session on your own machine, where you are not subject to the time limit on our cluster. The popular Anaconda Python distribution, which includes Jupyter, can be installed on computers running Windows, macOS, or Linux. Please see https://www.anaconda.com/products/individual#Downloads to download Anaconda.