PySpark isn't on sys.path by default, but that doesn't mean it can't be used as a regular library.
You can address this by either symlinking pyspark into your site-packages,
or adding pyspark to sys.path at runtime. findspark does the latter.
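To illustrate what adding pyspark to sys.path at runtime involves, here is a minimal standard-library sketch. It is not findspark's actual implementation: the helper name `add_pyspark_to_path` is hypothetical, and it assumes the usual Spark layout in which the Python sources live under `python/` and a bundled py4j archive sits in `python/lib/`.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Hypothetical helper: prepend Spark's Python directories to sys.path."""
    spark_python = os.path.join(spark_home, "python")
    # Assumption: py4j ships as a zip archive under python/lib/.
    py4j_zips = sorted(glob.glob(os.path.join(spark_python, "lib", "py4j-*.zip")))
    added = [spark_python] + py4j_zips[:1]
    # Insert in reverse so the entries end up at the front in their listed order.
    for entry in reversed(added):
        if entry not in sys.path:
            sys.path.insert(0, entry)
    return added
```

After a call such as `add_pyspark_to_path('/path/to/spark_home')`, a plain `import pyspark` can resolve against that installation, provided the directories exist.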
When findspark.init() is called without any arguments, the SPARK_HOME environment variable is used,
and if that isn't set, other possible install locations are checked. If
you've installed Spark with
brew install apache-spark
on OS X, the location /usr/local/opt/apache-spark/libexec will be searched.
Alternatively, you can specify a location with the spark_home argument.
findspark.init('/path/to/spark_home')
To verify the automatically detected location, call
findspark.find()
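The lookup order described above (the SPARK_HOME environment variable first, then known install locations) can be sketched with the standard library alone. This is an illustration under stated assumptions, not findspark's own code: the function name `resolve_spark_home` and its parameters are hypothetical, and the default candidate list contains only the Homebrew path mentioned above.

```python
import os


def resolve_spark_home(environ=None, candidates=None):
    """Hypothetical helper mirroring the lookup order described above."""
    environ = os.environ if environ is None else environ
    # An explicitly set SPARK_HOME takes priority.
    spark_home = environ.get("SPARK_HOME")
    if spark_home:
        return spark_home
    if candidates is None:
        # Assumption: only the Homebrew location named in the text is checked.
        candidates = ["/usr/local/opt/apache-spark/libexec"]
    # Otherwise fall back to the first candidate directory that exists.
    for path in candidates:
        if os.path.isdir(path):
            return path
    raise ValueError("Could not find Spark; set SPARK_HOME or pass a location explicitly")
```

Passing the environment and candidate list as arguments keeps the sketch easy to exercise without touching the real environment.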
Findspark can add a startup file to the current IPython profile so that the environment variables are set and pyspark is imported when IPython starts. This file is created when edit_profile is set to true.
Findspark can also append to the .bashrc configuration file, if one is present, so that the environment variables are set whenever a new shell is opened. This is enabled by setting the optional argument edit_rc to true.
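As a rough picture of what editing a shell configuration file amounts to, the sketch below appends export lines to an rc file only if that file already exists. The helper name `append_spark_exports`, its parameters, and the exact lines written are assumptions made for illustration; the text findspark itself writes may differ.

```python
import os


def append_spark_exports(rc_path, spark_home, python_paths):
    """Hypothetical helper: append export lines to an existing shell rc file."""
    if not os.path.isfile(rc_path):
        return False  # do nothing when the rc file is absent
    with open(rc_path, "a") as rc:
        rc.write("\n# Added for PySpark (illustrative)\n")
        rc.write("export SPARK_HOME={}\n".format(spark_home))
        # Prepend Spark's Python directories to any existing PYTHONPATH.
        rc.write("export PYTHONPATH={}:$PYTHONPATH\n".format(":".join(python_paths)))
    return True
```

Appending rather than rewriting leaves the rest of the file untouched, which is why the file must already be present for anything to happen.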