`optimize` API to run a bin-packing operation on a Delta Table.
Reading from Delta Lake
```python
import dask_deltatable as ddt

# read delta table
df = ddt.read_deltalake("delta_path")

# with specific version
df = ddt.read_deltalake("delta_path", version=3)

# with specific datetime
df = ddt.read_deltalake("delta_path", datetime="2018-12-19T16:39:57-08:00")
```
`df` is a Dask DataFrame that you can work with in the same way you normally would. See
the Dask DataFrame documentation for
available operations.
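For example, ordinary Dask DataFrame operations apply directly to the result. A minimal sketch, assuming a hypothetical column named `value`:

```python
import dask_deltatable as ddt

df = ddt.read_deltalake("delta_path")

# `df` is lazy; standard Dask DataFrame operations work on it.
# "value" is a hypothetical column name used only for illustration.
row_count = len(df)                             # triggers a computation
summary = df.groupby("value").size().compute()  # aggregate, then materialize
print(row_count, summary.head())
```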
Accessing remote file systems
To read from S3, Azure, gcsfs, and other remote filesystems,
ensure that the credentials are properly configured in environment variables
or config files. For AWS, you may need `~/.aws/credentials`; for gcsfs,
`GOOGLE_APPLICATION_CREDENTIALS`. Refer to your cloud provider's documentation
to configure these.
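For example, with AWS credentials in place, an S3 path can be passed directly. The bucket and path below are placeholders:

```python
import dask_deltatable as ddt

# Credentials are resolved from environment variables or config files
# (e.g. ~/.aws/credentials for AWS); "bucket_name" is a placeholder.
df = ddt.read_deltalake("s3://bucket_name/delta_path")
```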
dask-deltatable can connect to the AWS Glue catalog to read a Delta table.
The method will look for the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`
environment variables and, if those are not available, fall back to
`~/.aws/credentials`.
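A minimal sketch of a catalog-based read. The `catalog`, `database_name`, and `table_name` keyword arguments are assumed to match the catalog support described above, and the database and table names are placeholders:

```python
import dask_deltatable as ddt

# Read a Delta table registered in the AWS Glue catalog.
# "science" / "physics" are placeholder database and table names.
df = ddt.read_deltalake(
    catalog="glue", database_name="science", table_name="physics"
)
```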
dask-deltatable can connect to Unity Catalog to read a Delta table.
The method will look for the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment
variables, or accept them as keyword arguments with the same names in lowercase.
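A hedged sketch of the same pattern for Unity Catalog. The `catalog="unity"` value and the schema and table names are assumptions used only for illustration, mirroring the Glue sketch above:

```python
import os
import dask_deltatable as ddt

# The reader looks for these environment variables (or the lowercase
# keyword arguments of the same names); the values below are placeholders.
os.environ["DATABRICKS_HOST"] = "https://<workspace>.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "<personal-access-token>"

# catalog="unity" and the schema/table names are assumptions, not a
# confirmed signature; adjust to the project's documented interface.
df = ddt.read_deltalake(
    catalog="unity", database_name="my_schema", table_name="my_table"
)
```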
Writing to Delta Lake
To write a Dask DataFrame to Delta Lake, use the to_deltalake method.
```python
import dask.dataframe as dd
import dask_deltatable as ddt

df = dd.read_csv("s3://bucket_name/data.csv")

# do some processing on the dataframe...

ddt.to_deltalake("s3://bucket_name/delta_path", df)
```
Writing to Delta Lake is still in development, so be aware that some features
may not work.