To run Hive or Spark on the cluster, follow the steps below:
SSH into Turing or Wahab (replace username with your own account name):
ssh username@turing.hpc.odu.edu
ssh username@wahab.hpc.odu.edu
Allocate resources
salloc -N number_of_nodes -c number_of_cores
On Turing, please use:
salloc -N 1 -c 32
On Wahab, please use:
salloc -N 1 -c 40
This step is required: double-check the node and core counts before running the command.
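Once the allocation is granted, you can verify it with standard Slurm commands, for example:
squeue -u $USER
echo $SLURM_JOB_NODELIST
squeue lists your running jobs, and SLURM_JOB_NODELIST names the node(s) assigned to you.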
Launch bash
bash -l
Spark will not start properly unless it is launched from a bash login shell.
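If you are unsure which shell you are in, the following prints a version string only when running under bash:
echo $BASH_VERSION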
Load the Hive modules and start a Spark cluster
enable_lmod
module load container_env python3 hive
start_spark_cluster
start_spark_cluster is a custom command; it starts a Spark cluster for you.
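To confirm the modules loaded correctly, you can list them with the standard Lmod command:
module list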
Initialize the metastore for Hive. This step only needs to be done once per directory.
create-workspace
This is also a custom command; it replaces steps that would otherwise be handled by YARN and Hadoop.
Run Hive
hive
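As a quick smoke test, you can also pass a query to Hive non-interactively with hive -e; the table name below is just an illustration:
hive -e 'CREATE TABLE IF NOT EXISTS demo (id INT, name STRING); SHOW TABLES;'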
Load the Spark modules and start a Spark cluster
enable_lmod
module load container_env python3 spark
start_spark_cluster
start_spark_cluster is a custom command; it starts a Spark cluster for you.
Run Spark using pyspark (an interactive Python shell), spark-shell (an interactive Scala shell), or spark-submit (to run a batch application):
pyspark
spark-shell
spark-submit
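For batch jobs, a minimal sketch of a PySpark application and its submission could look like the following (the filename, app name, and the job itself are hypothetical examples):
cat > count_example.py <<'EOF'
# Minimal PySpark job: count the integers 0..999 on the cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("count-example").getOrCreate()
total = spark.sparkContext.parallelize(range(1000)).count()
print("count:", total)
spark.stop()
EOF
spark-submit count_example.py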