Delta Lake is an open-source storage format that runs on top of existing data lakes. Delta Lake is compatible with processing engines like Apache Spark and provides benefits such as ACID transaction guarantees, schema enforcement, and scalable data handling.
The Delta Lake project aims to unlock the power of Delta Lake for as many users and projects as possible
by providing native low-level APIs aimed at developers and integrators, as well as a high-level operations
API that lets you query, inspect, and operate your Delta Lake tables with ease.
The deltalake library aims to adopt patterns from other libraries in data processing,
so getting started should look familiar.
```python
from deltalake import DeltaTable, write_deltalake
import pandas as pd

# write some data into a delta table
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake("./data/delta", df)

# load data from the delta table
dt = DeltaTable("./data/delta")
df2 = dt.to_pandas()

assert df.equals(df2)
```
The same table can also be loaded using the core Rust crate:
```rust
use deltalake::{open_table, DeltaTableError};

#[tokio::main]
async fn main() -> Result<(), DeltaTableError> {
    // open the table written in python
    let table = open_table("./data/delta").await?;
    // show all active files in the table
    let files: Vec<_> = table.get_file_uris()?.collect();
    println!("{files:?}");
    Ok(())
}
```
The following section outlines some core features like supported storage backends
and operations that can be performed against tables. The state of implementation
of features outlined in the Delta protocol is also tracked.
## Cloud Integrations

| Storage              | Rust | Python | Comment                                   |
| -------------------- | ---- | ------ | ----------------------------------------- |
| Local                | ✓    | ✓      |                                           |
| S3 - AWS             | ✓    | ✓      |                                           |
| S3 - MinIO           | ✓    | ✓      |                                           |
| S3 - R2              | ✓    | ✓      |                                           |
| Azure Blob           | ✓    | ✓      |                                           |
| Azure ADLS Gen2      | ✓    | ✓      |                                           |
| Microsoft OneLake    | ✓    | ✓      |                                           |
| Google Cloud Storage | ✓    | ✓      |                                           |
| HDFS                 | ✓    | ✓      |                                           |
| LakeFS               | ✓    | ✓      | Python: Rust engine writer only supported |
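Cloud backends are addressed with a URI and per-backend options. A configuration sketch for S3, where the bucket name and credentials below are placeholders you would replace with your own (any option understood by the underlying object store layer can be passed via `storage_options`):

```python
import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame({"id": [1, 2]})

# "my-bucket" and the credential values are placeholders, not real settings
write_deltalake(
    "s3://my-bucket/delta-table",
    df,
    storage_options={
        "AWS_REGION": "us-east-1",
        "AWS_ACCESS_KEY_ID": "...",
        "AWS_SECRET_ACCESS_KEY": "...",
    },
)
```

The same `storage_options` mechanism applies to the other listed backends, each with its own set of keys.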
## Supported Operations

| Operation             | Rust | Python | Description                                   |
| --------------------- | ---- | ------ | --------------------------------------------- |
| Create                | ✓    | ✓      | Create a new table                            |
| Read                  | ✓    | ✓      | Read data from a table                        |
| Vacuum                | ✓    | ✓      | Remove unused files and log entries           |
| Delete - predicates   | ✓    | ✓      | Delete data based on a predicate              |
| Optimize - compaction | ✓    | ✓      | Harmonize the size of data files              |
| Optimize - Z-order    | ✓    | ✓      | Place similar data into the same file         |
| Merge                 | ✓    | ✓      | Merge a target Delta table with source data   |
| Update                | ✓    | ✓      | Update values from a table                    |
| Add Column            | ✓    | ✓      | Add new columns or nested fields              |
| Add Feature           | ✓    | ✓      | Enable delta table features                   |
| Add Constraints       | ✓    | ✓      | Set delta constraints to verify data on write |
| Drop Constraints      | ✓    | ✓      | Remove delta constraints                      |
| Set Table Properties  | ✓    | ✓      | Set delta table properties                    |
| Convert to Delta      | ✓    | ✓      | Convert a Parquet table to a Delta table      |
| FS check              | ✓    | ✓      | Remove corrupted files from the table         |
| Restore               | ✓    | ✓      | Restore the table to a previous version       |
## Protocol Support Level

| Writer Version | Requirement                                 | Status |
| -------------- | ------------------------------------------- | ------ |
| Version 2      | Append Only Tables                          |        |
| Version 2      | Column Invariants                           |        |
| Version 3      | Enforce delta.checkpoint.writeStatsAsJson   |        |
| Version 3      | Enforce delta.checkpoint.writeStatsAsStruct |        |
| Version 3      | CHECK constraints                           |        |
| Version 4      | Change Data Feed                            |        |
| Version 4      | Generated Columns                           |        |
| Version 5      | Column Mapping                              |        |
| Version 6      | Identity Columns                            |        |
| Version 7      | Table Features                              |        |

| Reader Version | Requirement                         | Status |
| -------------- | ----------------------------------- | ------ |
| Version 2      | Column Mapping                      |        |
| Version 3      | Table Features (requires reader V7) |        |