This project provides an isolated development environment for Apache tools such as Kafka and Spark (via PySpark), using a locally installed JDK and a Python virtual environment.
Works across macOS, Linux, and Windows (via WSL).
Install Windows Subsystem for Linux (WSL) by following the official Microsoft instructions.
Open a PowerShell terminal and start WSL by running:
wsl
Important: All remaining commands must be run from within the WSL environment. Inside WSL, Windows users run the same commands as macOS and Linux users.
Change to your home directory. Run this command, and all commands that follow, in your shell terminal (the $ prompt).
cd ~/
- Copy the template repo into your GitHub account. You can change the name as desired.
- Open a terminal in your "Projects" folder or wherever you keep your coding projects.
- Avoid using "Documents" or any folder that syncs automatically to OneDrive or other cloud services.
- Clone this repository into that folder. Windows users: clone into your default WSL directory.
If you changed the repository name, use that name in the command below.
For example, clone with a command like this, substituting your own GitHub account name and repository name:
git clone https://github.com/denisecase/pro-analytics-apache-starter
Then cd into your new folder (if you changed the name, use that):
cd pro-analytics-apache-starter
Review requirements.txt and comment or uncomment the specific packages your project needs (a hypothetical layout is sketched below).
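For illustration only, a requirements.txt organized this way might look like the sketch below; the package names here are hypothetical examples, so edit the file that ships with the starter rather than copying this one.

```
# Hypothetical layout: uncomment only what your project needs.
# Shared utilities
pandas

# Kafka projects
# kafka-python

# PySpark projects
# pyspark
```

Then create and activate a local Python virtual environment: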
python3 -m venv .venv
source .venv/bin/activate
Important reminder: Always run source .venv/bin/activate before working on the project.
python3 -m pip install --upgrade pip setuptools wheel
python3 -m pip install --upgrade -r requirements.txt
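As an optional sanity check (not part of the starter's own steps), confirm that the active interpreter lives inside .venv:

python3 -c "import sys; print(sys.prefix)"

The printed path should end in .venv. Then make the setup and helper scripts executable: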
chmod +x ./01-setup/*.sh
chmod +x ./02-scripts/*.sh
chmod +x ./02-scripts/*.py
Verify compatible versions (see the instructions in the setup file), then install the required OpenJDK locally:
./01-setup/download-jdk.sh
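As an optional diagnostic (not part of the setup scripts), the snippet below, run inside the activated virtual environment, reports where Java appears to be configured. Depending on how the download script installs the JDK, JAVA_HOME and PATH may instead be handled by the run scripts, so treat missing values only as a hint.

```python
# Optional diagnostic: report how Java looks from the current shell environment.
import os
import shutil

print("JAVA_HOME:", os.environ.get("JAVA_HOME", "(not set)"))
print("java on PATH:", shutil.which("java") or "(not found)")
```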
Use the commands below to install only the tools your project requires:
./01-setup/install-kafka.sh
./01-setup/install-pyspark.sh
Start the Kafka service (keep this terminal running)
./02-scripts/run-kafka.sh
In a second terminal, create a Kafka topic
./kafka/bin/kafka-topics.sh --create --topic test-topic --bootstrap-server localhost:9092
In that second terminal, list Kafka topics
./kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
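If a Python Kafka client such as kafka-python is enabled in requirements.txt (an assumption, not something the starter necessarily includes), a quick round-trip like this sketch confirms the broker and the test-topic created above are working:

```python
# Minimal Kafka round-trip sketch; assumes the kafka-python package is installed.
from kafka import KafkaProducer, KafkaConsumer

BOOTSTRAP = "localhost:9092"  # broker started by run-kafka.sh
TOPIC = "test-topic"          # topic created above

# Send one message.
producer = KafkaProducer(bootstrap_servers=BOOTSTRAP)
producer.send(TOPIC, b"hello from the starter")
producer.flush()
producer.close()

# Read messages back, giving up after 5 seconds of silence.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BOOTSTRAP,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,
)
for message in consumer:
    print(message.value.decode("utf-8"))
consumer.close()
```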
In that second terminal, stop the Kafka service when you are done working with Kafka. Use whichever of these commands works:
./kafka/bin/kafka-server-stop.sh
pkill -f kafka
Start PySpark (leave this terminal running)
./02-scripts/run-pyspark.sh
Open a browser to http://localhost:4040/ to monitor Spark jobs and execution details.
In a second terminal, test Spark
python3 02-scripts/test-pyspark.py
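The bundled test script is the authoritative check. For reference, a standalone smoke test in the same spirit might look like this minimal sketch (the app name and sample data are illustrative):

```python
# Minimal PySpark smoke test sketch: build a tiny DataFrame and show it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("starter-smoke-test").getOrCreate()

df = spark.createDataFrame(
    [(1, "kafka"), (2, "spark")],
    ["id", "tool"],
)
df.show()
print("Row count:", df.count())

spark.stop()
```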
When you are done, use that second terminal to stop PySpark:
pkill -f pyspark