This repository was archived by the owner on Apr 29, 2025. It is now read-only.
DataKit -- Orchestrate applications using a Git-like dataflow
DataKit is a tool to orchestrate applications using a Git-like dataflow. It
revisits the UNIX pipeline concept, with a modern twist: streams of
tree-structured data instead of raw text. DataKit allows you to define
complex build pipelines over version-controlled data.
DataKit is currently used as the coordination layer for HyperKit, the
hypervisor component of Docker for Mac and Windows, and for the DataKitCI
continuous integration system.
There are several components in this repository:
src contains the main DataKit service. This is a Git-like database to which other services can connect.
ci contains DataKitCI, a continuous integration system that uses DataKit to monitor repositories and store build results.
ci/self-ci is the CI configuration for DataKitCI that tests DataKit itself.
bridge/github is a service that monitors repositories on GitHub and syncs their metadata with a DataKit database.
For example, when a pull request is opened or updated, the bridge commits that information to DataKit. Conversely, if you commit a status message to DataKit, the bridge pushes it to GitHub.
bridge/local is a drop-in replacement for bridge/github that just monitors a local Git repository. This is useful for local testing.
Quick Start
The easiest way to use DataKit is to start both the server and the client in containers.
To expose a Git repository as a 9p endpoint on port 5640 on a private network, run:
These commands will expose the database's 9p endpoint on port 5640.
To build the project from source without Docker, install ocaml and opam,
then run:
$ make depends
$ make && make test
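Before building, it can help to confirm that the prerequisites are on your PATH. The following is a minimal sketch, not part of the DataKit build system; the helper name missing_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DataKit): print each named tool
# that cannot be found on PATH, one per line. Prints nothing if all exist.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: list whichever of the build prerequisites are absent.
#   missing_tools ocaml opam make
```

An empty result means ocaml, opam and make were all found, so the make commands above can be attempted.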
For information about command-line options:
$ datakit --help
Prometheus metric reporting
Run with --listen-prometheus 9090 to expose metrics over plain HTTP at http://*:9090/metrics.
Note: there is no encryption and no access control. You are expected to run the
database in a container and to not export this port to the outside world. You
can either collect the metrics by running a Prometheus service in a container
on the same Docker network, or front the service with nginx or similar if you
want to collect metrics remotely.
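As a rough illustration of collecting the metrics by hand from another container on the same Docker network, the sketch below builds the endpoint URL and strips the comment lines of the Prometheus text format. The host name datakit and the helper names metrics_url and metric_samples are assumptions made for this example; only the port and the /metrics path come from the text above.

```shell
#!/bin/sh
# Assumption: the DataKit container is reachable under the host name
# "datakit" and was started with --listen-prometheus 9090.

# Build the metrics URL for a host and port. Plain HTTP is used because
# the endpoint is not encrypted.
metrics_url() {
  printf 'http://%s:%s/metrics\n' "$1" "$2"
}

# Read Prometheus text-format output on stdin and drop the comment lines
# (such as "# HELP" and "# TYPE"), leaving only the sample lines.
metric_samples() {
  grep -v '^#'
}

# Usage, from a container on the same Docker network:
#   curl -s "$(metrics_url datakit 9090)" | metric_samples
```

For continuous collection, a Prometheus service scraping the same URL is the better fit; the manual fetch is mainly useful for checking that the endpoint is up.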
Language bindings
Go bindings are in the api/go directory.
OCaml bindings are in the api/ocaml directory. See examples/ocaml-client for an example.
Licensing
DataKit is licensed under the Apache License, Version 2.0. See LICENSE for
the full license text.
Contributions are welcome under the terms of this license. You may wish to browse
the weekly reports to read about overall activity in the repository.