To build KeystoneML, run:

```sh
./sbt/sbt assembly
make # This builds the native libraries used in KeystoneML
```
## Example: MNIST pipeline
```sh
# Get the data from S3
wget https://mnist-data.s3.amazonaws.com/train-mnist-dense-with-labels.data
wget https://mnist-data.s3.amazonaws.com/test-mnist-dense-with-labels.data

KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  keystoneml.pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048
```
## Running with spark-submit
To run KeystoneML pipelines on large datasets, you will need a Spark cluster. KeystoneML pipelines run on the cluster using `spark-submit`. You must export `SPARK_HOME` before running KeystoneML with `spark-submit`; having done that, you can launch your pipeline with `run-pipeline.sh` as before.
```sh
export SPARK_HOME=~/spark-1.3.1-bin-cdh4 # should match the version KeystoneML is built with

KEYSTONE_MEM=4g ./bin/run-pipeline.sh \
  keystoneml.pipelines.images.mnist.MnistRandomFFT \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048
```
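For reference, `run-pipeline.sh` wraps a `spark-submit` invocation. A roughly equivalent direct call is sketched below; the assembly jar path and `--master` URL are assumptions, not values from the KeystoneML docs, so adjust them to match your build output and cluster.

```sh
# Sketch only: the jar path (from sbt assembly) and the master URL are
# illustrative assumptions -- substitute the ones from your own setup.
$SPARK_HOME/bin/spark-submit \
  --class keystoneml.pipelines.images.mnist.MnistRandomFFT \
  --master spark://your-master:7077 \
  --driver-memory 4g \
  target/scala-2.10/keystoneml-assembly-*.jar \
  --trainLocation ./train-mnist-dense-with-labels.data \
  --testLocation ./test-mnist-dense-with-labels.data \
  --numFFTs 4 \
  --blockSize 2048
```

The pipeline class name and its `--trainLocation`/`--testLocation` arguments are passed through unchanged; only the launcher differs.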
## About

Simplifying robust end-to-end machine learning on Apache Spark.