Welcome to the 2009 GitHub Contest
The 2009 GitHub contest is a simple recommendation engine contest. We have released a dataset of our watched-repository data and want to provide a list of recommended repositories for each of our users. Removed from the sample dataset are 4,788 watches - write an open source program that guesses the highest percentage of those removed watches and you win our prize.
The Contest
The contest is pretty simple. You download our sample data, which includes a file listing which users are watching which repositories and a file listing 4,788 of those users. Then you write a program to recommend up to 10 repositories for each of the test users. You get a point each time one of your guesses matches an entry we removed. To enter:
- Create a new repository on GitHub for your entry.
- Add our contest server as a post-receive URL:
- https://contest.github.com
- Push to your repo with a results.txt file that contains your guesses.
- Your score is calculated and added to the leaderboard.
The contest ends on August 30th, 2009 at noon PST - whichever repository has the highest number of guessed recommendations at that time wins.
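To make the task and the scoring rule concrete, here is a minimal popularity baseline in Python. Treat it as a sketch under assumptions rather than a reference entry: the `user:repo` line format, the function names, and the string IDs are all hypothetical, since the file layouts are not spelled out here. It recommends the most-watched repositories a user is not already watching, then counts one point per guess that matches a removed watch.

```python
from collections import Counter, defaultdict


def parse_watches(lines):
    """Parse 'user:repo' lines (an assumed format) into {user: set(repos)}."""
    watched = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        user, repo = line.split(":", 1)
        watched[user].add(repo)
    return watched


def recommend_popular(watched, test_users, limit=10):
    """Recommend up to `limit` most-watched repos each user is not watching."""
    popularity = Counter(repo for repos in watched.values() for repo in repos)
    # Sort by watcher count (descending), then by repo id so ties are stable.
    ranked = sorted(popularity, key=lambda repo: (-popularity[repo], repo))
    guesses = {}
    for user in test_users:
        seen = watched.get(user, set())
        guesses[user] = [repo for repo in ranked if repo not in seen][:limit]
    return guesses


def score(guesses, removed):
    """One point per guessed (user, repo) pair that is a removed watch."""
    return sum(
        1
        for user, repos in guesses.items()
        for repo in set(repos[:10])  # only the first 10 guesses count
        if (user, repo) in removed
    )
```

The layout of results.txt is not described here, so the sketch stops at in-memory guesses. A real entry would need something smarter than global popularity (for example, recommending what similar users watch), but the scoring function is handy for checking ideas against watches you hold out yourself.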
The Prize

A bottle of Pappy Van Winkle 20 year reserve - the best damn bourbon whiskey ever (age permitting) and a free large GitHub account for life.
The Data
The sample data consists of 440,237 entries - each identifying a single user who has watched a single repository. This data contains:
- 120,867 unique repositories
- 56,555 unique users
We also include, in separate files, the names and calculated language breakdowns for each repository in case you want to use that data for classification algorithms. If you want other data that we have and that you could access via the GitHub API, please let me know so I can snapshot it for you server-side instead of having you pound the server with API requests.
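A quick sanity check after downloading is to recount the entries, users, and repositories yourself. The sketch below assumes each line of the watch file has the form `user:repo`; that format and the `summarize` helper are assumptions for illustration, not something specified above.

```python
def summarize(lines):
    """Count entries, unique users, and unique repos in 'user:repo' lines.

    The 'user:repo' line format is an assumption, not a documented layout.
    """
    users, repos, entries = set(), set(), 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, repo = line.split(":", 1)
        users.add(user)
        repos.add(repo)
        entries += 1
    return {"entries": entries, "users": len(users), "repositories": len(repos)}
```

Because the function accepts any iterable of lines, an open file handle for the downloaded watch data can be passed in directly. If the format assumption holds, the counts should line up with the figures listed above.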
The Rules
- All the code used to generate your results.txt file must be included in your repository and must be licensed under an OSI-compatible license after the contest is over. If you wish to use a restrictive license during the contest so others can't copy you, that is fine, but the repository must still be public so we can check your results.
- GitHub must be allowed to use the code commercially without restriction, regardless of the license chosen.
- It is technically possible to guess who the users are in our system and figure out what we have removed, since this dataset is already basically public via our API. Doing this will disqualify your entry.
