TensorFlow is an open-source platform for machine learning using neural network models. The machine learning process consists of two steps: training the model and inference (i.e., using the trained model to perform predictions). The training phase is computationally very intensive, so it is best carried out on GPUs.
TensorFlow is available on the Wahab and Turing clusters as Singularity containers. Each container provides Python 3 and a specific build of TensorFlow optimized for the target hardware. Each of these containers also comes with PyTorch 1.3, another popular framework for neural networks.
TensorFlow can run on GPUs or CPUs. As of the time of writing, these are the available variants of TensorFlow:
Description | Module |
---|---|
TensorFlow 1.15 + PyTorch 1.3 for CPU | tensorflow-cpu/1.15.0 |
TensorFlow 1.15 + PyTorch 1.3 for GPU | tensorflow-gpu/1.15.0 |
TensorFlow 2.2 + PyTorch 1.3 for CPU | tensorflow-cpu/2.2.0 |
TensorFlow 2.2 + PyTorch 1.3 for GPU | tensorflow-gpu/2.2.0 |
The TensorFlow containers can be used in conjunction with the personalized module environment, which allows users to install and use additional Python libraries in the same manner as with the basic Python container.
For more information on TensorFlow please visit: http://tensorflow.org .
TensorFlow calculations are commonly run on GPUs because of their high efficiency for this workload. We recommend you try the GPU version first, unless you know that your workload benefits from the CPU implementation (see below). Please note that our Volta V100 GPUs have 16 GB of RAM per GPU, while older GPUs have less: 12 GB for the K40, K80, and P100.
To run TensorFlow GPU interactively, use the following commands (adjust `-c 4` to the number of CPU cores you need):

```
salloc -p gpu --gres gpu:1 -c 4
enable_lmod
module load container_env tensorflow-gpu/1.15.0
crun python script.py
```
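Once the module is loaded, it is worth confirming that TensorFlow actually sees the allocated GPU before starting a long run. A minimal sketch (the file name `check_gpu.py` is just an example; the `tf.config.experimental` API shown is the one available in TensorFlow 1.15):

```python
# check_gpu.py - verify that TensorFlow can see the allocated GPU
import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)
if not gpus:
    print("No GPU found - check that you requested --gres gpu:1")
```

Run it with `crun python check_gpu.py`; an empty list usually means the job was not allocated a GPU.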
A job script uses the same options as an interactive job; you can use the following sample as a reference and submit it with `sbatch`:
```
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gres gpu:1

enable_lmod
module load container_env tensorflow-gpu/1.15.0
crun python script.py
```
Due to the nature of the workload, the CPU implementation of TensorFlow has significantly lower performance than the GPU one. However, in a few special cases, TensorFlow CPU can be a viable alternative to the GPU version. To run TensorFlow CPU interactively:
```
salloc -N 1 --exclusive
enable_lmod
module load container_env tensorflow-cpu/1.15.0
crun python script.py
```
The corresponding job script:

```
#!/bin/bash
#SBATCH -N 1
#SBATCH --exclusive

enable_lmod
module load container_env tensorflow-cpu/1.15.0
crun python script.py
```
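With `--exclusive` the job gets every core on the node, and the MKL-backed CPU build performs best when its thread count matches the allocation. One way to do this (a sketch; `SLURM_CPUS_ON_NODE` is a standard Slurm environment variable, but verify the variables your jobs actually export) is to read the allocation at the top of your script, before TensorFlow is imported:

```python
import os

def slurm_thread_count(default=1):
    """Number of CPU cores Slurm allocated to this job (falls back to `default`)."""
    return int(os.environ.get("SLURM_CPUS_ON_NODE", default))

# MKL honors OMP_NUM_THREADS; set it before TensorFlow is imported
# so the math kernels use all of the cores you asked for.
os.environ.setdefault("OMP_NUM_THREADS", str(slurm_thread_count(os.cpu_count() or 1)))
```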
TensorFlow is usually used with the Python programming language. We provide Python version >= 3.7 in the container. Popular packages such as `numpy`, `scipy`, `pandas`, and `scikit-learn` are also included in the container.
The TensorFlow GPU build uses CUDA and cuBLAS for NVIDIA GPUs.
The TensorFlow CPU build uses the Intel MKL (Math Kernel Library), which is optimized for Intel CPUs.
Warning: The CPU implementation of PyTorch has issues with its optimizer. If you decide to use PyTorch on CPU, please test first to make sure it works as expected. Contact us at rcc@odu.edu if you need further assistance.
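As a concrete sanity check (a minimal sketch, not tied to any particular model): fit a trivial linear regression with SGD on the CPU and confirm the loss actually decreases before committing to a long CPU run.

```python
import torch

# Tiny sanity check: learn y = 2x and verify the optimizer makes progress.
torch.manual_seed(0)
x = torch.randn(64, 1)
y = 2.0 * x

model = torch.nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

initial_loss = loss_fn(model(x), y).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

If the loss does not drop, the optimizer issue mentioned above may be affecting you, and the GPU build is the safer choice.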
The `crun` ("container run") command is a shortcut to execute software in the container. If your workload requires more than one container at the same time, please use `crun.MODULE_NAME` (e.g., `crun.tensorflow-gpu`) in place of `crun` to disambiguate.