To ensure the traceability, reproducibility, and standardization of all ML
datasets and models generated and consumed within Toyota Research Institute
(TRI), we developed the Dataset-Governance-Policy (DGP), which codifies the
schema and maintenance of all of TRI's Autonomous Vehicle (AV) datasets.
Components
Schema:
Protobuf-based schemas for
raw data, annotations and dataset management.
DataLoaders: Universal PyTorch DatasetClass to load all
DGP-compliant datasets.
CLI: Main CLI for handling DGP datasets and the entry point for the
visualization tools.
Getting started is as simple as initializing a dataset class with the relevant
dataset JSON, raw data sensor names, annotation types, and split information.
Below, we show an example of initializing a PyTorch dataset for multi-modal
learning from 2D and 3D bounding boxes.
from dgp.datasets import SynchronizedSceneDataset

# Load synchronized pairs of camera and lidar frames, with 2d and 3d
# bounding box annotations.
dataset = SynchronizedSceneDataset(
    '<dataset_name>_v0.0.json',
    datum_names=('camera_01', 'lidar'),
    requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
    split='train')
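When more than one split is needed, the same arguments are usually repeated with only `split` changing. The following is a minimal sketch of that pattern, not part of DGP: `load_splits` is a hypothetical helper, and the `'val'` split name is an assumption (only `'train'` appears above). It relies only on the constructor call shown above, with the dataset JSON as the first positional argument and `split` as a keyword.

```python
from typing import Any, Callable, Dict, Sequence


def load_splits(
    dataset_cls: Callable[..., Any],
    dataset_json: str,
    splits: Sequence[str] = ("train", "val"),  # 'val' is an assumed split name
    **dataset_kwargs: Any,
) -> Dict[str, Any]:
    """Build one dataset object per split, sharing all other arguments."""
    return {
        split: dataset_cls(dataset_json, split=split, **dataset_kwargs)
        for split in splits
    }


# Intended usage with DGP installed (not executed here):
#
#   from dgp.datasets import SynchronizedSceneDataset
#   datasets = load_splits(
#       SynchronizedSceneDataset,
#       '<dataset_name>_v0.0.json',
#       datum_names=('camera_01', 'lidar'),
#       requested_annotations=('bounding_box_2d', 'bounding_box_3d'),
#   )
#   train_dataset = datasets['train']
```

Passing the dataset class in as an argument keeps the helper independent of DGP itself, so it can be exercised with a stand-in class when the real data is not available.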
Examples
A list of starter scripts is provided in the examples directory.
examples/load_dataset.py: Simple example script to
load a multi-modal dataset based on the Getting Started section above.
Build and run tests
You can build the base Docker image and run the tests within a Docker
container via:
make docker-build
make docker-run-tests
Build the Python wheel.
make build
Set up a local development environment.
make develop
Run the tests using the local development environment.
make test
Versioning
This repository adheres to PEP 440 for
versioning.
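As an illustration of what PEP 440 compliance means for a version string, the sketch below checks candidates against the canonical public-version regular expression given in Appendix B of PEP 440. The helper name and the sample strings are illustrative and are not taken from the DGP repository.

```python
import re

# Canonical public version pattern from PEP 440, Appendix B.
_CANONICAL_PEP440 = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version: str) -> bool:
    """Return True if `version` is a canonical PEP 440 public version."""
    return _CANONICAL_PEP440.match(version) is not None


if __name__ == "__main__":
    # Illustrative candidates only.
    for candidate in ("0.1.0", "1.0rc1", "1.0.post2.dev3", "v1.0", "1.0-beta"):
        print(candidate, is_canonical(candidate))
```

This pattern accepts only the normalized form; PEP 440 also defines looser spellings (such as a leading `v`) that tools normalize before comparison, and those are rejected here by design.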
Contributing
We appreciate all contributions to DGP! To learn more about making a
contribution to DGP, please see Contribution Guidelines.
DGP is developed and currently maintained by Quincy Chen, Arjun Bhargava, Chao
Fang, Chris Ochoa and Kuan-Hui Lee from the ML-Engineering team at
Toyota Research Institute (TRI), with contributions
from the ML-Research team at TRI,
Woven Planet, and
Parallel Domain.