github-statistics is a workflow repository that pulls data from the
GitHub Repositories API and GitHub Users API on a regular schedule
to generate distribution statistics for a subset of early GitHub
repositories and users.
As of 2021, GitHub had over 73 million registered users. The github-users.db
SQLite database in this repository includes the first 1.5 million registered
users. It reflects 15 CI runs of 100,000 users each and is compressed with
Zstandard, the same compression algorithm GitHub uses for actions/cache@v3.
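The per-run pull described above can be sketched in Python as follows. This is a minimal illustration, not the workflow this repository actually runs: it assumes unauthenticated access to GET /users, which pages by user ID through the `since` query parameter and returns at most 100 users per page. The names `fetch_page` and `iter_users` are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable, Iterator

API_URL = "https://api.github.com/users"
PAGE_SIZE = 100  # GET /users returns at most 100 users per page


def fetch_page(since: int) -> list[dict]:
    """Fetch one page of users whose ID is greater than `since`."""
    url = f"{API_URL}?since={since}&per_page={PAGE_SIZE}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_users(
    since: int,
    limit: int,
    fetch: Callable[[int], list[dict]] = fetch_page,
) -> Iterator[dict]:
    """Yield up to `limit` users, starting after the user ID `since`."""
    yielded = 0
    while yielded < limit:
        page = fetch(since)
        if not page:
            return  # no more users to list
        for user in page:
            yield user
            yielded += 1
            if yielded == limit:
                return
        # Resume the next request after the last ID seen on this page.
        since = page[-1]["id"]
```

Under these assumptions, one run of 100,000 users corresponds to `iter_users(since=last_stored_id, limit=100_000)`, where `last_stored_id` is a hypothetical value read from the previous run. Unauthenticated requests are rate limited, so a real run would need an access token.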
The studies planned for this repository are bounded by GitHub
repository size limits, following the recommendations in the
Managing large files article. 1.5 million users is the maximum number of
users that fits in a full series of 100,000-user inserts after compression
with Zstandard.
As of Jun 17, 2022, github-statistics also collects repositories.
Note: Do not use Git LFS. It is not possible to remove Git LFS objects
from a repository without deleting and recreating the repository.
Databases
github-repositories.db
github-users.db
Tables
repositories (NEW)
GitHub repositories as listed by GET /repositories
repositories_stargazers (NEW)
GitHub repositories from repositories and their stargazer counts
users
GitHub users as listed by GET /users
users_followers
GitHub users from users and their follower counts
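To check which of the tables above a decompressed database contains, a short Python sketch such as the following may help. It reads table names from SQLite's own catalog, so it makes no assumption about column names, which this README does not list. The function name `table_row_counts` is hypothetical.

```python
from __future__ import annotations

import sqlite3


def table_row_counts(db_path: str) -> dict[str, int]:
    """Return {table name: row count} for each table in the database."""
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's catalog; skip its internal tables.
        names = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            row = connection.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            counts[name] = row[0]
        return counts
    finally:
        connection.close()


# Example: table_row_counts("github-users.db")
```

Counting rows in a table with millions of entries scans the table, so the call can take a few seconds on the full database.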
Decompress database
macOS
zstd -d github-users.tzst
tar xf github-users.tar
Ubuntu
tar --use-compress-program zstd -xf github-users.tzst