You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Demonstrating and Building ETL pipelines in Airflow
This repo demonstrates a use case for n Ecommerce business that has a platform that generates transaction data each time a purchase is made. With this transaction data, the functions in the pipeline seek to answer 3 business questions
Who is our platinum customer? Anyone with purchase value equal to or more than 5000
What is the purchase history like for each user? This builds a dataset that can be used for a recommendation engine downstream
What items are commonly purchased together? This builds a dataset that can be used for Basket Analysis downstream
Analysis Implementation
The code can be found in etl_utils.py file.
Question 1 is implemented using pd.merge() to get the combined dataset and df.groupby().sum() to get total purchases.
Both question 2 and 3 are achieved using Pandas Pivot Tablespd.pivtot_table()
Generated Data
The sample CSVs generated by the above functions can be found in the samples/ folder.
Airflow Connections
I have used the PostgresOperator which requires a postgres connection and a SimpleHttpOperator which requires a http connection. This is set in the Admin UI in the connections tab.
About
Demonstrating and Building ML pipelines in Airflow