leewyang edited this page Dec 2, 2019 · 19 revisions
Running TensorFlowOnSpark on a Spark Standalone cluster (Single Host)
We illustrate how to use TensorFlowOnSpark on a Spark Standalone cluster running on a single machine. While this is not a true distributed cluster, it is useful for small-scale development and testing of distributed Spark applications. Once your application works in this environment, it should run on a true distributed Spark cluster with minimal changes. Note that a Spark Standalone cluster running on multiple machines requires a distributed file system that is accessible from all of the executors/workers.
Install Spark
Install Apache Spark per instructions. Make sure that you can successfully run some of the basic examples. Also make sure you set the following environment variables:
export SPARK_HOME=<path to Spark>
export PATH=${SPARK_HOME}/bin:${PATH}
Install TensorFlow and TensorFlowOnSpark
Install TensorFlow per instructions. For example, using the pip install method, you should be able to install TensorFlow and TensorFlowOnSpark as follows:
pip install tensorflow
pip install tensorflowonspark
You can browse to the Spark Web UI to view your Spark cluster along with your application logs. In particular, each of the TensorFlow nodes in a TensorFlowOnSpark cluster will be "running" on a Spark executor/worker, so its logs will be available in the stderr logs of its associated executor/worker.
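The Web UI and the `$MASTER` variable used in the next section both presuppose a running cluster, but the launch commands are not shown on this page. The following is a minimal sketch of one way to start a master and workers on a single host. The port (7077, Spark's default master port), the worker count, the cores per worker, and the 3G memory limit are all assumptions to adjust for your machine. Older Spark releases name the worker script `start-slave.sh` and newer ones `start-worker.sh`, so the sketch uses whichever it finds.

```shell
# Assumed values (not from this page): default master port 7077,
# two worker instances with one core each, 3G of memory per worker.
export MASTER="spark://$(hostname):7077"
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1
export TOTAL_CORES=$((CORES_PER_WORKER * SPARK_WORKER_INSTANCES))

# Pick the worker launch script that this Spark release provides.
if [ -x "${SPARK_HOME}/sbin/start-worker.sh" ]; then
  WORKER_SCRIPT=start-worker.sh
else
  WORKER_SCRIPT=start-slave.sh
fi

# Start the master first, then the workers that register with it.
if [ -x "${SPARK_HOME}/sbin/start-master.sh" ]; then
  "${SPARK_HOME}/sbin/start-master.sh"
  "${SPARK_HOME}/sbin/${WORKER_SCRIPT}" "${MASTER}" -c "${CORES_PER_WORKER}" -m 3G
else
  echo "Spark sbin scripts not found; check SPARK_HOME" >&2
fi
```

Run this in the same shell you will use for the remaining steps, since `MASTER` and the other exported variables are only visible to that shell and its child processes.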
Test PySpark, TensorFlow, and TensorFlowOnSpark
Start a pyspark shell and import tensorflow and tensorflowonspark. If everything is set up correctly, you shouldn't see any errors.
pyspark --master $MASTER
>>> import tensorflow as tf
>>> import tensorflowonspark as tfos
>>> from tensorflowonspark import TFCluster
>>> tf.__version__
>>> tfos.__version__
>>> exit()
Run the MNIST examples
Once your Spark Standalone cluster is set up, you should be able to run the MNIST examples. Note: if you are using TensorFlow 1.x, please use the examples from the v1.4.4 tag.
Shut down Spark cluster
When you're done with the local Spark Standalone cluster, shut it down as follows:
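The shutdown commands are not shown on this page. The following is a minimal sketch that uses Spark's standard `sbin` stop scripts and assumes `SPARK_HOME` is set as in the "Install Spark" section. The `SBIN`, `WORKER_STOP`, and `STOPPED` names are illustrative only. As with the launch scripts, the worker script is `stop-slave.sh` in older Spark releases and `stop-worker.sh` in newer ones.

```shell
# Assumes SPARK_HOME points at the Spark installation.
SBIN="${SPARK_HOME}/sbin"

# Pick the worker stop script that this Spark release provides.
if [ -x "${SBIN}/stop-worker.sh" ]; then
  WORKER_STOP=stop-worker.sh
else
  WORKER_STOP=stop-slave.sh
fi

# Stop the workers first, then the master, counting the scripts that succeed.
STOPPED=0
for script in "${WORKER_STOP}" stop-master.sh; do
  if [ -x "${SBIN}/${script}" ]; then
    "${SBIN}/${script}" && STOPPED=$((STOPPED + 1))
  else
    echo "${script} not found under ${SBIN}; check SPARK_HOME" >&2
  fi
done
echo "stop scripts run: ${STOPPED}"
```

If you started more than one worker instance via `SPARK_WORKER_INSTANCES`, it is probably safest to run this from the same shell (or re-export that variable first), since the stop script appears to rely on it to find every worker instance.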