This week I’ve changed the graphs around quite a bit! The activity on the site just continues to increase, and surprisingly it didn’t take a hit because of RubyConf.
Stats Breakdown
Notably New Projects
Settingslogic: Globally accessible settings in an ERB-enabled YML file. Where can I get some? Check out some more examples in binarylogic’s blog post about the plugin.
scanty: A Sinatra-driven blog from adamwiggins. I haven’t had the chance to look at any Sinatra apps, but this will definitely be my starting point. See it live on his blog.
hoshi: A new, clean way of defining HTML/XML views in pure Ruby. Definitely would be nice for apps that need a template system or HTML output quickly and easily. Tons of examples here.
Workaholic Repos
factor, 774 commits: The Factor programming language
digispeaker, 508 commits: From their site, ‘internet connected 100W stereo amplifier embedded in a pair of speakers’
As always, suggestions for improvements (and projects to cover!) are welcome. I’d love to break down the huge spikes in the activity or perhaps add an entirely new graph. I would also like to see some projects using languages other than Ruby next week.
That’s right, in the past accounts could not be deleted. Previously we advised users to downgrade to the free plan and stop using their account, but some people wanted to tie up those loose ends.
Now that you know this, we kindly ask that you forget this feature exists.
We send an email receipt whenever we charge your credit card on GitHub. While this is sufficient for most, not all corporations are happy receiving a plain/text receipt, so we’re now attaching a more formal PDF receipt along with your regular email.
It’ll look something like this:
PS. For those interested, we’re using the Prawn library to generate the PDFs.
Today we’re unveiling a new feature on GitHub that we’ve been working on for a bit now that lets you search through nearly all the public source code on the site (which is a lot).
GitHub Codesearch can be found at github.com/codesearch and will let you type in anything you’re looking for in source code and get highlighted results of any files in our public repositories that match. You will also get a sidebar with facet counts that break the results down by language and by repository.
You can also restrict your search to a specific language by selecting it in the dropdown:
You can also refine your search results by clicking on one of the languages or repositories listed in the sidebar to drill down to only those results:
There are a pretty good number of things you can do with this, and we hope you enjoy it. We’ll be incorporating this functionality into more aspects of the website soon (search from the front page, search through a user’s repositories from their user page, search through a repo’s code from the project page, etc). Let us know if you find any kinks or can think of any particularly interesting application of this technology that we could incorporate into the site.
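For the curious, the sidebar counts and the drill-down can be pictured with a few lines of Ruby. This is only a sketch of the general idea of faceted results, not how Codesearch is built: the `Hit` struct, the `facet_counts` and `refine` helpers, and the sample data are all made up for illustration.

```ruby
# Hypothetical search hit; the field names are illustrative only.
Hit = Struct.new(:repo, :language, :path)

# Count how many hits fall under each value of a facet (e.g. :language).
def facet_counts(hits, facet)
  hits.each_with_object(Hash.new(0)) do |hit, counts|
    counts[hit.public_send(facet)] += 1
  end
end

# Drill down: keep only the hits whose facet matches the chosen value.
def refine(hits, facet, value)
  hits.select { |hit| hit.public_send(facet) == value }
end

# Made-up sample results.
hits = [
  Hit.new("alice/blog", "Ruby", "lib/post.rb"),
  Hit.new("bob/parser", "C", "src/lexer.c"),
  Hit.new("carol/mailer", "Ruby", "lib/mailer.rb"),
]

language_facets = facet_counts(hits, :language)
repo_facets     = facet_counts(hits, :repo)
ruby_only       = refine(hits, :language, "Ruby")
```

Clicking a language in the sidebar corresponds to calling `refine` and then recomputing the counts over the smaller result set.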
It looks like I’ve been enlisted as an official blogger to bring you the latest and greatest work that’s coming out of GitHub! Sit back, ogle at the graphs, and fork away.
Stats Breakdown
Notably New Projects
Authlogic: The self-proclaimed “Chuck Norris of authentication solutions.” It’s a new, refreshing look at doing authentication in Rails (and any framework, really) that doesn’t rely on generators to inject code into your app. Still needs a lot of documentation and tutorials, but it’s definitely looking promising. (Note: the project’s first commit was on 10/24, but it started really gaining watchers and attention this week.)
Ronin: Simply put, Ronin is a hacker’s swiss army knife. You can test your apps for security vulnerabilities with it or even punch holes through your servers via a multitude of different protocols. This project lived on RubyForge for a while, but it’s just nestled into a new home here on GitHub. Clone away!
roleful: A DSL-ified way to describe permissions for objects, such as when you need to define what different classes/levels of users can do inside of your application. Keeps your models clean and is quite elegant in its execution.
Most Watched
cached_externals, 63 new watchers (New Capistrano extension for linking to external gems/plugins, etc)
homoiconic, 61 new watchers (A new code blog from the author of Raganwald)
rack-cache, 32 new watchers (Drop-in caching for Rack based apps…ETag, Expires, Cache-Control, and so on)
Next week I’d love to get a new graph up, perhaps hourly events so we can really see what times of day the hackers come out. If you’ve got any suggestions, let me know!
The site should be much faster – but it’s still not fast enough. We’re hard at work making things like git clones, tree browsing, and commit viewing much faster. As always, we’ll keep you in the loop.
After trying a few different solutions in the early days, we settled on Ara Howard’s Bj. It was fine for quite a while, but some of the design decisions haven’t been working out for us lately. Bj allows you to spawn exactly one worker per machine – we want a machine dedicated to workers. Bj loads a new Rails environment for every job submitted – we want to load a new Rails environment one time only. Both of these decisions carry performance implications.
If we were to run one Bj per machine, we’d only have four workers running as GitHub consists of four, ultra-beefy app slices. Unlike most contemporary web apps, the fewer the slices we have the better – it means fewer machines connected to our networked file system, and fewer machines create less network chatter and lock contention. As some of the jobs take a while to run (60+ seconds), four workers is a very low number. We want something like 20, but we’d settle for as few as 8.
We did hack Bj to allow multiple instances to run on a machine, but that ended up being counterproductive due to design decision #2: loading a new Rails environment for each job.
See, Rails takes a while to start up. Not only do you have to load all the associated libraries, but each require statement needs to look through the entire, massive load path – a load path that includes the Rails app, Rubygems, the Rails source code, and all of our plugins. Doing this over and over, multiple times a minute, burns a lot of CPU and takes a lot of time. In some cases, the Rails load time is 99% of the entire background job’s lifetime. Spawning a whole bunch of Bjs on a single machine meant we effectively DoS’d the poor CPU.
I started working on a solution, but it was at this point we realized we were doing something wrong. These are not flaws in Bj, they are design decisions – these two ideas make Bj a pleasure to work with and perfect for simple sites. It’s working great on FamSpam. We had simply outgrown it, and hacking Bj would have been error prone and time consuming. Luckily, we had seen people praising Dj in the past and a solid recommendation from technoweenie was all we needed.
The transition took about an hour and a half – from installing the plugin to successfully running Dj on the production site, complete with local and staging trial runs (and bug fixes). Because we had changed queues so many times in the past, we were using a simple interface for submitting a job.
RockQueue meant we didn’t have to change any application code, just infrastructure. I highly recommend an abstraction like this for vendor-specific APIs that would normally be littered all throughout your app, as changing vendors can become a major pain.
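To show the shape of such an abstraction, here is a minimal Ruby sketch. It is not RockQueue’s actual API: the `JobQueue` module, the two adapter classes, and the `:rebuild_graph` job name are all hypothetical, and the adapters are in-memory stand-ins for whatever vendor library would really sit behind them.

```ruby
# Hypothetical facade: application code only ever calls JobQueue.push.
module JobQueue
  class << self
    attr_accessor :adapter

    def push(job_name, *args)
      adapter.push(job_name, *args)
    end
  end
end

# Stand-in adapter that keeps jobs in an array.
class InMemoryAdapter
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def push(job_name, *args)
    @jobs << [job_name, args]
  end
end

# A second stand-in with a different backend behind the same interface.
class LoggingAdapter
  attr_reader :lines

  def initialize
    @lines = []
  end

  def push(job_name, *args)
    @lines << "queued #{job_name} with #{args.inspect}"
  end
end

memory = InMemoryAdapter.new
JobQueue.adapter = memory
JobQueue.push(:rebuild_graph, 42)

# Switching "vendors" is one line of infrastructure; the call site is unchanged.
logger = LoggingAdapter.new
JobQueue.adapter = logger
JobQueue.push(:rebuild_graph, 42)
```

The point is that only the adapter assignment knows which queue is in use, so replacing the queue never touches the code that submits jobs.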
Anyway, Dj lets us spawn as many workers on a machine as we want. They’re just rake tasks running a loop, after all. It deals with locking and retries in a simple way, and works much like Bj. The queue is much faster now that we don’t have to pay the Rails startup tax.
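The “loop with locking and retries” idea can be sketched in plain Ruby. This is an in-memory illustration only, not Dj’s code or schema: the `Job` struct, the `reserve` and `work_off` helpers, and the retry limit of 3 are assumptions made for the example.

```ruby
# Hypothetical job record; Dj's real table and API are not shown here.
Job = Struct.new(:handler, :attempts, :locked_by, :done)

MAX_ATTEMPTS = 3 # assumed retry limit, chosen for the example

# Claim the first job that is unfinished, unlocked, and still retryable.
def reserve(jobs, worker_name)
  job = jobs.find { |j| !j.done && j.locked_by.nil? && j.attempts < MAX_ATTEMPTS }
  job.locked_by = worker_name if job
  job
end

# Run one job. On success mark it done; on failure count the attempt.
# Either way the lock is released so the job can be picked up again.
def work_off(jobs, worker_name)
  job = reserve(jobs, worker_name)
  return false unless job

  begin
    job.handler.call
    job.done = true
  rescue StandardError
    job.attempts += 1
  ensure
    job.locked_by = nil
  end
  true
end

# A job that fails on its first run and succeeds on its second.
calls = 0
flaky = lambda do
  calls += 1
  raise "boom" if calls < 2
end
jobs = [Job.new(flaky, 0, nil, false)]

# A real worker would loop forever and sleep when idle; this one stops
# as soon as there is nothing left to run.
loop { break unless work_off(jobs, "worker-1") }
```

Because the environment is loaded once before the loop starts, each job pays only for its own work, which is where the speedup over a per-job startup comes from.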
We now have a single machine dedicated to running background tasks. We’re running 20 Dj workers on it with great success. There is no science behind this number.
Since people have already started asking “why didn’t you use queue X” or “you should use queue Y,” it seems reasonable to address that: we were very happy with Bj and wanted a similar system, albeit with a few different opinions. Dj is that system. It is simple, required no research beyond the short README, works wonderfully with Rails, is fast, is hackable, solves both the queue and the worker problems, and has no external dependencies. Also, it’s hosted on GitHub!
Dj is our 5th queue. In the past we’ve used SQS, ActiveMQ, Starling, and Bj. Dj is so far my favorite.
In a future post I’ll discuss the ways in which we use (and abuse) our queue. Count on it.
What happens when a dozen Git developers get together for beers after an all day Git conference? Patches like this:
The second annual GitTogether concluded today. A few dozen core git developers and evangelists met at Google (who graciously hosted and fed us) for the last three days. I had a great time, finally meeting in person a really significant group of Git developers who have literally written most of Git. Junio Hamano, Shawn Pearce, Johannes Schindelin, Eric Wong, Jeff King, Petr Baudis, Christian Couder, and Nick Hengeveld were all there, and between them have probably written more than half of the existing codebase.
On Tuesday night, A Large Angry SCM donated beer money for us all, and the above patch was the final outcome of that night. What is more Gitish than a dozen geeks passing around a laptop with an awesome patch so everyone can sign off on it, then four core Git developers trying to figure out how to use git send-email for 10 minutes? Good times.
Over the three days, besides a good amount of laughing and joking (at one point Jeff had a ‘git log --swedish-chef’ working, I think), there were some really interesting things discussed. Tom got to demonstrate GitHub and Gist to the group, most of whom are very command line oriented and had not used either before. I got to talk about the need for a linkable git library and Johannes found someone else (not with the GitTogether, but at Google with the OpenAFS project at the same time) that might be able to work with me on that (and in getting a TortoiseSVN-like client as well). I also got to show off my iPhone-based Git server work in progress that even got a few patches from David Symonds right after.
The whole thing started off with Johannes giving a nice tech talk on contributing with Git.
Sam demonstrated the work he went through to import 20 years of Perl history into the git repository that the Perl team is just now finishing transitioning to from Perforce. He also talked about the GitTorrent protocol, which we might be able to use at some point down the road to speed up git clones.
Shawn went over pack v4, which is a new packfile format, which was completely fascinating to me, but likely not to anyone who would be reading this blog entry. Basically it would speed up some operations by keeping commit and tree data in binary form in the packfile, rather than in deflated text form. Shawn also spoke about the status of the JGit project, which is the most complete re-implementation of Git around (it’s in Java).
Junio went through a sort of statistical history of the Git project that was fascinating (turns out there are still about 220 lines of code around from Linus’s original first commit).
Tim talked about something that I think will be one of the next huge (highly visible) changes in Git you’re likely to see in the next year – handling large media well, and being able to do narrow and sparse clones (and better shallow clones). This means being able to clone part of a Git repository, such as just the last revision (shallow), just the ‘lib’ directory (narrow) or just a single file (sparse). Importantly, you would be able to see the history of everything still (it would download the commit and tree objects, which are generally small, but not the larger blobs), and you would be able to do pushes back (which shallow clones can’t currently do).
The other important, highly visual thing that was discussed, and that a few patches are already in for, is a set of little improvements to the UI. The full planning document is on Gist, but already things like making use of the term ‘stage’ for things that happen in the index (such as using ‘git diff --staged’ instead of ‘git diff --cached’) are being worked on. I’m excited that staging files may soon be done via ‘git stage’ rather-than/in-addition-to ‘git add’. This is nice for new users who often have a hard time seeing why you have to keep ‘git add’ing to stage your changes.
Overall, a really great conference, and I’m so glad that now I have faces to go along with nearly all the names I see on the list all the time. And even geeky drinking stories with some of them. Now we just need to get Junio (which, btw, he says is pronounced ‘June’) to accept the pirate patch…
The Rails Rumble, which partnered with Linode to provide a uniform hosting environment for competitors and with GitHub to provide source control, tracked a total of 14,355 commits to the private GitHub repositories over the weekend. Over 12,000 files and 245,257 lines of code were produced.
The number of people running into issues with our gem builder has been overwhelming, so we’re releasing the code in case some kind souls want to lend a hand making the system more secure and robust.
Basically, the less effort that’s required to bring in code via a pull request, the sooner it can be added to the project release. And at the end of the day, that’s really what it’s all about.
I’ve converted our participation graphs (as seen above) to use Canvas instead of Flash. Linux users rejoice! This means the graphs should load quite a bit faster and not bog down your CPU like the Flash graphs did. I haven’t yet implemented the little mouse over bubbles that the old graphs had, but I plan to do that in the future.
You might also be interested to know that I’ve used my Open Source Friday project to implement these graphs. The project is called Primer and offers a Flash-like layer on top of Canvas that makes it easier to create dynamic and interactive Canvas-based works. It’s still very young, but I’ll be working on it every Friday and hope to get it to the point where I can redo the impact graphs with it. Enjoy!
The secret is out! Today Google released the entire Android stack as open source software and guess what version control system they are using? That’s right, it’s Git.
I say ‘the secret is out’, because we happened to know about this a bit beforehand, since I’ve been down to the Google campus a number of times in the last few weeks helping to train the Androids there in Git. I was asked by Shawn Pearce (you may know him from his Git and EGit/JGit glory – he is the hero that takes over maintenance when Junio is out of town) to come in to help him train the Google engineers working on Android in transitioning from Perforce to Git, so Android could be shared with the masses. I can tell you I was more than happy to do it.
I gave two talks for the Android team, working with Dave Bort (the handsome guy you see in the video on the front page here) and Shawn to develop material that helps them specifically in transitioning from Perforce and highlights the tasks that they felt would be most common with their workflow and project size.
Logical Awesome is now officially offering this type of custom training service to all companies, where we can help your organization with training and planning if you are thinking about switching to Git as well. If you would be interested, send me an email at scott@logicalawesome.com.
If you’re interested, the tech talks that I gave there should be available on YouTube once Google finishes post-producing them. We’ll be sure to remind everyone when that happens. :)
It’s hard to imagine, but just one year ago today we made the first commit to the GitHub repository. We don’t have a baby book for GitHub, so we’ll have to settle on the blog to record our handprint and first words.
We have four full-time employees: Tom Preston-Werner, Chris Wanstrath, PJ Hyett, and Scott Chacon. Our support man, Tekkub, answers all your nuanced questions via email, IRC, and on the forums. We’ve taken a grand total of $0 in venture capital or other outside investment. Just recently we topped 20,000 public repositories.
But we couldn’t have done it without you, our loyal users. You’ve dared to try a new version control system and seen how much better things can be. Thanks for joining us on this adventure. We look forward to making your life as a developer even more amazing in the upcoming year!
The 2008 Rails Rumble is set to start tonight at midnight and we’ll be sponsoring the competition in two ways. First, we’re providing free private repositories for each of the 237 teams. Second, we’ve thrown in a bottle of the traditional GitHub celebratory Bourbon, Pappy Van Winkle, for the winner!
Keep up with the event over at the Rails Rumble Blog. There are sure to be some really great apps coming out of it. We can’t wait to see them!
Economies around the world have hit a serious rough patch, and some folks may be looking for work elsewhere. What better way to impress your new employer than to include a link to your GitHub profile so they can check out your skills?
From personal experience, it’s much easier to hire someone who has made their code publicly available, as opposed to just taking their word that they’re good at what they do.
Don’t have any open source projects of your own? Find one to contribute to!
It’s been almost 6 months since we released the network graph and since then it has become an indispensable part of the GitHub experience. Over that period I’ve been compiling a list of things that would make the graph even better. More precise and less confusing portrayal of merge and branch structure. The ability to pull a specific range of commits in for drawing, resulting in blazingly fast incremental renders. Data caching to remove the overhead of just-in-time graph construction. A boatload of bug fixes. And finally, the ability to actually draw the Rails, Linux, and Git networks!
What are you waiting for? Go check out the new and improved graph of your favorite project!
No, we’re not crazy. Prior to today, if you had a project that had been forked, you were unable to delete it.
The reason is simple: when you fork a repository, we don’t copy over the data, we just point your repository to the original via an alternates file. It’s one of the great things about git, and the reason you can fork the linux kernel and not use up any of your allotted space (until you make changes).
So, if the owner of the repository wants to delete their repo, it would have broken all of the forks as well. Not exactly optimal in open source situations when the original project maintainer gets bored.
Our solution was to create a “graveyard” for deleted repositories. A place where your repo’s objects are stored, but no longer associated with your account. That way it’s not counted against your account’s limits nor are any of its forks affected.
One thing worth noting is that this does not apply to private repositories with forks. If a private repo is deleted, all of its forks will also be deleted for security reasons. It’s up to you to let your network and collaborators know what’s happening.
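The rules above fit in a short Ruby sketch. It models only the bookkeeping described in this post, not our actual code or the on-disk alternates mechanism: the `Repo` struct, the `delete_repo` helper, and the `live` and `graveyard` lists are hypothetical names chosen for the example.

```ruby
# Hypothetical repository record; visibility is :public or :private.
Repo = Struct.new(:name, :owner, :visibility, :forks)

# Remove a repo from the live list, following the rules in this post:
# - no forks: nothing depends on its objects, so it simply goes away
# - private with forks: the forks are deleted too
# - public with forks: the objects move to the graveyard, detached from
#   the owner's account, so the forks keep working
def delete_repo(repo, live, graveyard)
  live.delete(repo)
  return if repo.forks.empty?

  if repo.visibility == :private
    repo.forks.each { |fork| delete_repo(fork, live, graveyard) }
  else
    repo.owner = nil
    graveyard << repo
  end
end

pub_fork  = Repo.new("bob/widget", "bob", :public, [])
pub_repo  = Repo.new("alice/widget", "alice", :public, [pub_fork])
priv_fork = Repo.new("dave/secret", "dave", :private, [])
priv_repo = Repo.new("carol/secret", "carol", :private, [priv_fork])

live = [pub_repo, pub_fork, priv_repo, priv_fork]
graveyard = []

delete_repo(pub_repo, live, graveyard)
delete_repo(priv_repo, live, graveyard)
```

After both deletions only the public fork is still live, and the graveyard holds the public original with no owner attached.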
With awesome tools like GitNub and GitX adding GitHub integration, it would be great if you didn’t have to keep re-entering your username and API token.
So let’s settle on a standard, pioneered by GitNub: