Sorry, I have been very busy with other projects recently. I expect to come back and continue improving this one in about a month (counting from Apr 15th). Please create an issue if you run into any problems.
Hive is not included in the current Feast roadmap; this project adds Hive support as an Offline Store.
For more details, see this Feast issue.
The public releases have passed all integration tests. Please create an issue if you run into any problems.
Change Logs
DONE [v0.1.1] I am working on the first workable version; I expect to release it in a couple of days.
DONE [v0.1.2] Allow custom Hive conf when connecting to HiveServer2
DONE [v0.14.0] Support Feast 0.14.x
DONE [v0.17.0] Support Feast 0.17.0
TODO Uploading entity_df currently uses INSERT INTO, which is somewhat inefficient. The next version will add extra parameters so that users who can provide an HDFS address can upload to HDFS instead.
CREATE TABLE driver_stats (
event_timestamp bigint,
driver_id bigint,
conv_rate float,
acc_rate float,
avg_daily_trips int,
created bigint
)
STORED AS PARQUET;
Load data into the table
LOAD DATA INPATH '/tmp/driver_stats.parquet' INTO TABLE driver_stats;
Edit example.py
# This is an example feature definition file

from google.protobuf.duration_pb2 import Duration

from feast import Entity, Feature, FeatureView, ValueType
from feast_hive import HiveSource

# Read data from the Hive table.
# Here we use a query to reuse the original parquet data,
# but you can replace it with your own table or query.
driver_hourly_stats = HiveSource(
    # table='driver_stats',
    query="""
    SELECT Timestamp(cast(event_timestamp / 1000000 as bigint)) AS event_timestamp,
           driver_id, conv_rate, acc_rate, avg_daily_trips,
           Timestamp(cast(created / 1000000 as bigint)) AS created
    FROM driver_stats
    """,
    event_timestamp_column="event_timestamp",
    created_timestamp_column="created",
)

# Define an entity for the driver.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id")

# Define the FeatureView.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
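The query in example.py divides the bigint timestamp columns by 1000000 before wrapping them in Timestamp(...). The sketch below mirrors that arithmetic in plain Python so the conversion is easier to check by hand. It assumes the stored values are microseconds since the Unix epoch and that Hive interprets the resulting integer as seconds; neither is stated explicitly above, and the helper name micros_to_timestamp is hypothetical, not part of feast-hive.

```python
from datetime import datetime, timezone

# Assumption: the bigint columns hold microseconds since the Unix epoch.
MICROS_PER_SECOND = 1_000_000


def micros_to_timestamp(micros: int) -> datetime:
    """Mirror cast(col / 1000000 as bigint): keep whole seconds, drop the rest."""
    seconds = micros // MICROS_PER_SECOND
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    # 1618444800 seconds after the epoch, expressed in microseconds.
    print(micros_to_timestamp(1_618_444_800_000_000).isoformat())
    # prints 2021-04-15T00:00:00+00:00
```

If your own table already stores a proper timestamp type, the division is unnecessary and the columns can be selected directly.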
git clone https://github.com/baineng/feast-hive.git
cd feast-hive
# creating virtual env ...
pip install -e ".[dev]"

# before commit
make format
make lint