Run uv run pre-commit install to set up the git hooks
If you use VSCode, you may want to install the extensions (ruff, mypy) that it recommends when you open this folder
Running locally
uv run main --help
uv run main <DATASET_ID> update-template
uv run main <DATASET_ID> backfill-local <INIT_TIME_END>
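For example, a local backfill invocation might look like the following. The dataset ID and timestamp are illustrative placeholders, assuming <INIT_TIME_END> accepts an ISO 8601 datetime; run uv run main --help to see the datasets and argument formats this repo actually supports.
uv run main example-dataset-id backfill-local 2024-01-01T00:00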
Development commands
Add dependency: uv add <package> [--dev]. Use --dev to add a development-only dependency.
Lint: uv run ruff check
Type check: uv run mypy
Format: uv run ruff format
Tests:
Run tests in parallel on all available cores: uv run pytest
Run tests serially: uv run pytest -n 0
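Standard pytest selection also works through uv run if you want to run a single test file or test; the path and test name below are hypothetical.
uv run pytest -n 0 tests/example_test.py -k test_example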
Deploying to the cloud
To reformat a large archive, we parallelize work across multiple cloud servers.
We use
Docker to package the code and dependencies
Kubernetes Indexed Jobs to run work in parallel
Setup
Install Docker and kubectl. Make sure docker can be found at /usr/bin/docker and kubectl at /usr/bin/kubectl.
Set up a Docker image repository and export the DOCKER_REPOSITORY environment variable in your local shell, e.g. export DOCKER_REPOSITORY=us-central1-docker.pkg.dev/<project-id>/reformatters/main
Set up a Kubernetes cluster and configure kubectl to point to your cluster, e.g. gcloud container clusters get-credentials <cluster-name> --region <region> --project <project>
Create a Kubernetes secret containing your Source Coop S3 credentials: kubectl create secret generic source-coop-storage-options-key --from-literal=contents='{"key": "...", "secret": "..."}'.
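To sanity check the setup, assuming the steps above completed, you can confirm the tools, cluster, and secret are all reachable:
docker info
echo $DOCKER_REPOSITORY
kubectl get nodes
kubectl get secret source-coop-storage-options-key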
Development commands
DYNAMICAL_ENV=prod uv run main <DATASET_ID> backfill-kubernetes <INIT_TIME_END> <JOBS_PER_POD> <MAX_PARALLELISM>
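As an illustration, a production backfill launch might look like the following; the dataset ID, timestamp, and parallelism values are placeholders rather than recommendations.
DYNAMICAL_ENV=prod uv run main example-dataset-id backfill-kubernetes 2024-01-01T00:00 10 50
Once launched, the Indexed Job's pods can be watched with standard kubectl commands (the job name here is a placeholder):
kubectl get jobs
kubectl logs -f job/<job-name>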