smart-sales-docs

In Module 2, we wrote a simple, centralized data_prep.py to get started. In real projects, this often evolves into modular per-table scripts to better manage complexity and focus. We'll see an example of that kind of evolution.

In Module 2, we had one file: scripts/data_prep.py.

In Module 3, we now use one file per data table:

scripts/data_prep/prepare_customers.py
scripts/data_prep/prepare_products.py
scripts/data_prep/prepare_sales.py
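As a rough illustration of what one of these per-table scripts might look like, here is a minimal sketch of prepare_customers.py. The file paths, the column names (CustomerID, Name), and the specific cleaning steps are assumptions made for this example, not the project's actual code; adjust them to match your own data.

```python
"""Sketch of a per-table prep script (paths and column names are assumptions)."""
from pathlib import Path

import pandas as pd

# Hypothetical locations; adjust to match your repository layout.
RAW_FILE = Path("data/raw/customers_data.csv")
PREPARED_FILE = Path("data/prepared/customers_prepared.csv")


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: trimmed names, no duplicate rows, no rows missing an ID."""
    cleaned = df.copy()
    # Trim whitespace first so " Ann" and "Ann " count as the same value.
    cleaned["Name"] = cleaned["Name"].str.strip()
    cleaned = cleaned.drop_duplicates()
    cleaned = cleaned.dropna(subset=["CustomerID"])
    return cleaned.reset_index(drop=True)


def main() -> None:
    """Read the raw file, clean it, and write the prepared file."""
    if not RAW_FILE.exists():
        print(f"Raw file not found: {RAW_FILE}")
        return
    df = pd.read_csv(RAW_FILE)
    cleaned = clean_customers(df)
    PREPARED_FILE.parent.mkdir(parents=True, exist_ok=True)
    cleaned.to_csv(PREPARED_FILE, index=False)
    print(f"Wrote {len(cleaned)} rows to {PREPARED_FILE}")


if __name__ == "__main__":
    main()
```

Keeping the cleaning logic in its own function (separate from file reading and writing) makes it possible to test that logic on a small in-memory DataFrame without touching the real data files.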

Why?

As data projects grow, it becomes easier to:

Focus on one dataset at a time
Avoid breaking other code when cleaning changes
Test and debug more easily
Let different team members work on different files

We move the old data_prep.py into an archive/ folder so you can compare and reuse it as needed.

Module 3: Continuing Project Work

We don't need to create our .venv, as we should already have it. If not, go back to Modules 1 and 2 and make sure those steps are completed. Now we just follow our regular workflow. If we find we need additional external packages, we can always re-run the install-from-requirements.txt command as needed. In general, we:

Pull any recent changes from GitHub.
Activate the .venv.
Run each per-table script in scripts/data_prep/.

Module 3: Mac/Linux Commands

Open your smart sales repository in VS Code. Open a terminal in the root project folder. Activate your .venv and run each file.

source .venv/bin/activate
python3 scripts/data_prep/prepare_customers.py
python3 scripts/data_prep/prepare_products.py
python3 scripts/data_prep/prepare_sales.py

Module 3: Windows PowerShell Commands

Open your smart sales repository in VS Code. Open a PowerShell terminal in the root project folder. Activate your .venv and run each file.

.venv\Scripts\activate
py scripts/data_prep/prepare_customers.py
py scripts/data_prep/prepare_products.py
py scripts/data_prep/prepare_sales.py
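If typing three commands each time becomes tedious, the same steps could be wrapped in a small runner that works on either platform. This is an optional sketch, not part of the project: the function names and the idea of a separate runner file are assumptions for illustration. It uses sys.executable so the scripts run with whichever interpreter (ideally the one in your activated .venv) launched the runner.

```python
"""Optional helper sketch: run every per-table prep script in order."""
import subprocess
import sys
from pathlib import Path

SCRIPT_DIR = Path("scripts/data_prep")
SCRIPT_NAMES = [
    "prepare_customers.py",
    "prepare_products.py",
    "prepare_sales.py",
]


def build_commands(script_dir: Path = SCRIPT_DIR) -> list[list[str]]:
    """Build one command per script, using the interpreter running this file."""
    return [[sys.executable, str(script_dir / name)] for name in SCRIPT_NAMES]


def run_all(script_dir: Path = SCRIPT_DIR) -> int:
    """Run each script in order; return how many completed successfully."""
    succeeded = 0
    for command in build_commands(script_dir):
        script = Path(command[1])
        if not script.exists():
            print(f"Skipping missing script: {script}")
            continue
        # check=False lets the remaining scripts run even if one fails.
        result = subprocess.run(command, check=False)
        if result.returncode == 0:
            succeeded += 1
        else:
            print(f"{script} exited with code {result.returncode}")
    return succeeded


if __name__ == "__main__":
    print(f"{run_all()} of {len(SCRIPT_NAMES)} scripts succeeded")
```

Run it from the root project folder with the .venv active so the relative paths resolve correctly.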

After Making Progress

Once you’ve verified the scripts ran successfully, git add, commit, and push changes to your GitHub repository.

git add .
git commit -m "ran per-table data prep scripts"
git push -u origin main

For best results, git add-commit-push frequently after making any useful progress.

Complete all Data Preparation

For this step, use pandas (and, optionally, a shared DataScrubber class) to clean and prepare each of the raw data files.

Cleaning is a critical task.

Continue until you think you have good data in all the prepared files.
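To make the idea of a shared DataScrubber class more concrete, here is a minimal sketch of what such a helper could look like. It is not the project's actual class: the method names, the chaining style, and the sample column names (ProductName, UnitPrice) are all assumptions chosen for illustration.

```python
import pandas as pd


class DataScrubber:
    """Hypothetical shared cleaning helper; method names are illustrative."""

    def __init__(self, df: pd.DataFrame) -> None:
        # Work on a copy so the caller's DataFrame is never modified.
        self.df = df.copy()

    def strip_text(self, column: str) -> "DataScrubber":
        """Trim leading and trailing whitespace in a text column."""
        self.df[column] = self.df[column].str.strip()
        return self

    def remove_duplicates(self) -> "DataScrubber":
        """Drop fully duplicated rows."""
        self.df = self.df.drop_duplicates().reset_index(drop=True)
        return self

    def fill_missing(self, column: str, value) -> "DataScrubber":
        """Replace missing values in one column with a fixed value."""
        self.df[column] = self.df[column].fillna(value)
        return self

    def missing_counts(self) -> pd.Series:
        """Count the missing values remaining in each column."""
        return self.df.isna().sum()


# Usage on a small made-up table.
raw = pd.DataFrame(
    {
        "ProductName": [" Laptop", "Laptop ", None],
        "UnitPrice": [999.0, 999.0, None],
    }
)
scrubber = (
    DataScrubber(raw)
    .strip_text("ProductName")
    .remove_duplicates()
    .fill_missing("UnitPrice", 0.0)
)
print(scrubber.df)
print(scrubber.missing_counts())
```

Each per-table script could import one shared class like this, so a fix to a cleaning step is made once rather than in three places. Checking missing_counts() after cleaning is one simple way to judge whether a prepared file is in good shape.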

Resources

pro-analytics-01
And the repos from earlier modules.
