Today is the day you have all been waiting for. The bi-weekly GitHub drinkup is on tonight at 7:30pm. While PJ and Scott are off on important business in NYC, Tom, Chris and I will be ready to party it up, talk some shop, pass out stickers, and do some important business (sans bull) of our own this evening at Elixir (3200 16th St. at Guerrero), one of the oldest saloons in San Francisco. Looking forward to seeing you there!
-
GitHub Meetup SF #3
-
Continuous Integration Spring Cleaning
Continuous Integration is a fancy term for “run your project’s tests after someone pushes to the repository and notify interested parties if they fail.”
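That definition fits in a few lines of Ruby. The following is only a minimal sketch: `run_build`, `runner`, and `notify` are hypothetical names made up for illustration and do not come from any of the tools discussed below.

```ruby
# Minimal sketch of the CI idea: run the test command and tell someone
# if it fails. `run_build`, `runner`, and `notify` are hypothetical names.
def run_build(command, runner: ->(cmd) { system(cmd) }, notify: ->(msg) { warn msg })
  # Kernel#system returns true only when the command exits successfully.
  passed = runner.call(command) == true
  notify.call("Build FAILED: #{command}") unless passed
  passed
end
```

A post-push hook could call something like `run_build("rake test")`; the injectable `runner` and `notify` are there only so the sketch can be exercised without a real test suite.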
We’re currently in the process of revamping our test suite (which we’ll blog about in the future) and moving servers, so I thought it’d be a good time to re-evaluate our options.
Integrity has grown a lot since its first release. It has a ton of features, great documentation, and nice notifiers (I wrote the Campfire notifier).
It also has a very attractive interface, is easy to configure, and works with multiple projects. And it’ll run anything – not just Ruby projects.
It’s not the easiest thing to install, though. There are a lot of dependencies and I never quite got it working on my latest install attempt. I hear it works better with Passenger than Thin (what I was using).
If you can get it working, it’s worth it. qrush’s report card is a great addition, too.
BuildBot is a Python builder that’s pretty feature complete. And, of course, it works great with GitHub.
I installed it and tried it out – it’s pretty easy to use. And because it’s a generic builder you can also use it for non-test related tasks, like compiling stuff. It has a server BuildBot and worker BuildBots which means you can scale it to run many concurrent tasks, even across machines.
For us it seemed overpowered, but I’ll definitely keep it in mind if we need hardcore lifting in the future.
RunCodeRun is Relevance’s hosted CI service. It supports both private and open source projects, but is Ruby-specific.
For an example check out the AASM project overview or a specific build view. Pretty nice. It has tight GitHub integration, too, linking to each commit.
Unfortunately RCR doesn’t support Campfire notifications yet, as far as I can tell. We need ’em!
CI Joe
Because knowing is half the battle. CI Joe is a dead simple, Git-talkin’, Unix-lovin’, HTTP slingin’ continuous integration server we wrote to do one thing and do it well.
It uses your Git config and lets you extend it through Git hooks. A POST will trigger a build – which means it works great with GitHub. It supports HTTP auth so Internet pranksters can’t trigger your builds. It comes with Campfire support. It’s language agnostic – as long as your test suite can be run from a Unix shell, CI Joe can handle it.
Check out the documentation for the complete tour.
We use the Campfire notifier (I sound like a broken record, don’t I) and Joe’s HTTP basic auth feature. Our config looks like this:
    $ cat .git/config
    ...
    [campfire]
        user = notifier@github.com
        pass = secret
        subdomain = github
        room = The GitHub Dancy Party
    [cijoe]
        user = chris
        pass = secret
        runner = rake -s test:units
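With HTTP auth configured as in the `[cijoe]` section above, triggering a build by hand means sending an authenticated POST. Here is a rough sketch using Ruby's standard `net/http`. The host, port, and root path are assumptions, not taken from the post, so check the CI Joe documentation for the real address of your server.

```ruby
require "net/http"
require "uri"

# Hypothetical address: host, port, and path are assumptions.
CI_URL = URI("http://localhost:4567/")

# Build (but do not send) a POST carrying HTTP basic auth credentials.
def build_trigger_request(uri, user, pass)
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, pass)
  request
end

request = build_trigger_request(CI_URL, "chris", "secret")

# Sending is left commented out so the sketch runs without a server:
# Net::HTTP.start(CI_URL.host, CI_URL.port) { |http| http.request(request) }
```

A GitHub post-receive hook does the same thing from the other side: it POSTs to the URL you give it, which is why a POST-triggered build fits so neatly.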
We also use Joe’s “after-reset” hook. We keep our database.yml file in Git, but the CI server needs its own database config settings. If Joe finds an executable “after-reset” Git hook it’ll run it after updating the repo and before running the tests. Ours looks like this:
    $ cat .git/hooks/after-reset
    rm config/database.yml
    cp database.yml config/database.yml
As you can see, we keep our good database.yml unversioned in the CI clone’s root and just remove the versioned one after each reset. Joe runs a “git reset --hard” which does not remove unversioned files – our custom database.yml won’t get wiped.
Again, check out the source and documentation to get rollin’. Issues go to Issues or the mailing list.
Enjoy.
-
Deployment Script Spring Cleaning
Better late than never, right? As we get ready to upgrade our servers I thought it’d be a good time to upgrade our deployment process. Currently pushing out a new version of GitHub takes upwards of 15 minutes. Ouch. My goal: one minute deploys (excluding server restart time).
We currently use Capistrano with a 400 line deploy.rb file. Engine Yard provides a handful of useful Cap tasks (in gem form) that we use along with many of the built-in features. We also use the fast_remote_cache deployment strategy and have written a handful (400 lines or so) of our own tasks to manage things like our service hooks or SVN importer.
As you may know, Capistrano keeps a releases directory where it creates timestamped versions of your app. All your daemons and processes then assume your app lives under a directory called current which is actually a symlink to the latest timestamped version of your app in releases. When you deploy a new version of your app, it’s put into a new timestamped directory under releases. After all the heavy lifting is done the current symlink is switched to it.
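To make that layout concrete, here is a small Ruby sketch of the releases-plus-symlink idea. It is not Capistrano's code: the helper name, the timestamp format, and the temporary-link-then-rename trick are all assumptions chosen for illustration.

```ruby
require "fileutils"
require "tmpdir"

# Create a timestamped release directory and repoint `current` at it.
# Directory names follow the description above; everything else here
# (helper name, timestamp format) is an illustrative assumption.
def deploy_release(app_root, timestamp)
  release = File.join(app_root, "releases", timestamp)
  FileUtils.mkdir_p(release)
  # ... the new version of the app would be copied into `release` here ...

  current = File.join(app_root, "current")
  temp_link = "#{current}.tmp"
  File.symlink(release, temp_link)
  # Renaming replaces the old link in one step instead of delete-then-create.
  File.rename(temp_link, current)
  release
end

Dir.mktmpdir do |root|
  deploy_release(root, "20090701120000")
  deploy_release(root, "20090729093000")
  puts File.readlink(File.join(root, "current"))
end
```

Every deploy pays for a fresh directory and a full copy of the app before the link moves.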
Which was really great. Before Git. So I went digging.
First I investigated Vlad the Deployer, the Capistrano alternative in Ruby. I like that it’s built on Rake but it seems to make the same assumptions as Capistrano. Basically both of these tools are modular and built in such a way that they work the same whether you’re using Subversion, Perforce, or Git. Which is great if you’re using SVN but unfortunate if you’re using Git.
For example, this is from Vlad’s included Git deployment strategy:
When you deploy a new copy of your app, Vlad removes the existing copy and does a full clone to get a new version. Capistrano does something similar by default but has a bundled “remote_cache” strategy that is a bit smarter: it caches the Git repo and does a fetch then a reset. It still has to then copy the updated version of your app into a timestamped directory and switch the symlink, but it’s able to cut down on time spent pulling redundant objects. It even knows about the depth option.
The next thing I looked at was Heroku’s rush. It lets you drive servers (even clusters of them) using Ruby over SSH, which looked very promising. Maybe I’d write a little git-deploy script based on it.
Unfortunately for me Rush needs to be installed on every server you’re managing. It also needs a running instance of rushd. Which makes sense – it’s a super powerful library – but that wouldn’t work for deploying GitHub.
Fabric is a library I first heard about back in February. It’s like Capistrano or Vlad but with more emphasis on being a framework/tool for remote management of servers. Easy deployment scripts are just a side effect of that mentality.
It’s very powerful and after playing with it for a while I was extremely pleased. I’ll definitely be using it in all my Python projects. However, I wasn’t looking forward to porting all our custom Capistrano tasks to Python. Also, though I love Python, we’re mostly a Ruby shop and everyone needs to be able to add, debug, and modify our deploy scripts with ease.
Playing with Fabric did inspire me, though. Capistrano is basically a tool for remote server management, too, if you think about it. We may have outgrown its ideas about deployment but I can always write my own deployment code using Capistrano’s ssh and clustering capabilities. So I did.
It turned out to be pretty easy. First I created a config/deploy directory and started splitting up the deploy.rb into smaller chunks:
    $ ls -1 config/deploy
    gem_eval.rb
    import.rb
    notify.rb
    queue.rb
    services.rb
    settings.rb
    sudo_everywhere.rb
    symlinks.rb
Then I pulled them in. Careful here: Capistrano overrides both load and require so it’s probably best to just use load.
This separation kept the deploy.rb and each specific file small and focused.
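One way to pull in a directory of chunks like that is a glob plus `load`. This is a sketch, not the original deploy.rb: `load_deploy_chunks` is a hypothetical helper, and outside a Capistrano recipe `load` is plain `Kernel#load` rather than Capistrano's overridden version.

```ruby
# Load every Ruby file in a directory in a stable, sorted order.
# `load_deploy_chunks` is a hypothetical helper. Inside a real deploy.rb,
# `load` would be Capistrano's own version; here it is Kernel#load.
def load_deploy_chunks(dir)
  Dir[File.join(dir, "*.rb")].sort.each { |file| load file }
end

# In a deploy.rb this might be called as:
# load_deploy_chunks("config/deploy")
```

Sorting makes the load order predictable, which matters if one chunk (say, settings.rb) defines values another chunk reads at load time.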
Next I thought about how I’d do Git-based deployment. Not too different from Capistrano’s remote_cache, really. Just get rid of all the timestamp directories and have the current directory contain our clone of the Git repo. Do a fetch then reset to deploy. Rollback? No problem.
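The fetch-then-reset idea boils down to two shell commands, sketched here as Ruby helpers that only build the command strings. The remote name, branch, example path, and the reflog-based rollback are assumptions for illustration, not a copy of the real deploy.rb.

```ruby
# Build the shell command for a Git-based deploy into `current_path`.
# Remote and branch defaults are assumptions.
def deploy_command(current_path, remote: "origin", branch: "master")
  "cd #{current_path} && git fetch #{remote} && git reset --hard #{remote}/#{branch}"
end

# Build the rollback command. HEAD@{1} is the reflog entry for where HEAD
# pointed before its most recent move, assumed here to be the deploy's reset.
def rollback_command(current_path)
  "cd #{current_path} && git reset --hard HEAD@{1}"
end

puts deploy_command("/data/app/current")   # hypothetical path
puts rollback_command("/data/app/current")
```

In a Capistrano task these strings would be handed to `run`, so the fetch and reset happen on every server in the role at once.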
The best part is that because Engine Yard’s gemified tasks and our own code both call standard Capistrano tasks like deploy and deploy:update, we can just replace them and not change the dependent code.
Here’s what our new deploy.rb looks like. Well, the meat of it at least:
Great. I like this – very Gitty and simple. But copying and removing directories wasn’t the only slow part of our deploy process.
Every Capistrano task you run adds a bit of overhead. I don’t know exactly why, but I imagine each task opens a fresh SSH connection to the necessary servers. Maybe. Either way, the fewer tasks you run the better.
We were running about eight symlink related tasks during each deploy. Config files and cache directories that only live on the server need to be symlinked into the app’s directory structure after the reset. Cutting these actions down to a single task made everything much, much faster.
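The trick is to build one long command instead of eight short tasks. A sketch of that, with made-up paths: `SHARED_LINKS` and `symlink_command` are hypothetical names, and the actual list of links on GitHub's servers is not known from this post.

```ruby
# Hypothetical server-only paths mapped to their place inside the app.
SHARED_LINKS = {
  "/data/app/shared/config/database.yml" => "config/database.yml",
  "/data/app/shared/cache"               => "tmp/cache"
}

# Join every `ln -nfs` into one command string, so a single remote
# invocation (one task) creates all the links.
def symlink_command(app_path, links)
  links.map { |source, target| "ln -nfs #{source} #{File.join(app_path, target)}" }.join(" && ")
end

puts symlink_command("/data/app/current", SHARED_LINKS)
```

Passing that one string to a single `run` call means one round trip per server rather than one per link.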
Here’s our symlinks.rb:
Finally, bundling CSS and JavaScript. I’d like to move us to Sprockets but we’re not using it yet and this adventure is all about speeding up our existing setup.
Since the early days we’ve been using Uladzislau Latynski’s jsmin.rb to minimize our JavaScript. Our Cap task looked something like this:
Spot the problem? We’re minimizing the JS locally, on every deploy, then uploading it to each server individually. We also do this same process for Gist’s JavaScript and the CSS (using YUI’s CSS compressor). So with N servers, this is basically happening 3N times on each deploy. Yowza.
Solution? Do the minimizing and bundling on the servers. The beefy, beefy servers:
As long as the bundle Rake tasks don’t need to load the Rails environment (which ours don’t), this is much faster.
Conclusion
We moved to a more Git-like deployment setup, cut down the number of tasks we run, and moved bundling and minimizing JS and CSS from our localhost to the server. Did it help?
As I said before, a GitHub deploy can take 15 minutes (not counting server restarts). My goal was to drop it down to 1 minute. How’d we do?
    $ time cap production deploy
      * executing `production'
      * executing `deploy'
        triggering before callbacks for `deploy:update'
      * executing `notify:campfire'
      * executing `deploy:update'
      * executing `deploy:update_code'
        triggering after callbacks for `deploy:update_code'
      * executing `symlinks:make'
      * executing `deploy:bundle'
      * executing `deploy:restart'
      * executing `mongrel:restart'
      * executing `deploy:cleanup'

    real    0m14.361s
    user    0m2.049s
    sys     0m0.560s
15 minutes down to 14 seconds. Not bad.
-
GitHub Meetup SF #2
You know the drill. 7:30pm, July 30th at Eddie Rickenbacker’s. Look for me or mojombo (picture reference).
And oh yeah, rumor is Tekkub will be here – don’t miss it!
-
Microformats on GitHub
Last weekend was a microformatsDevCamp in San Francisco. What’s a DevCamp? Chris Messina has an explanation.
Now that the dust is settled a few projects have been posted on GitHub:
- beaulebens/hCard-LDAP-Service
- hober/ufdevcamp-ubro
- chirags/Friend-Select
- reid/upcoming-attendees
- singpolyma/hCard-to-Google-Bookmarklet
- amccollum/microtron
- peliom/mfgae
Pretty cool stuff. For more projects you can check out the microformatsDevCamp wiki. Want to hear about all the latest microformats news as it happens? Follow their blog.
-
Smart JS Polling
While Comet may be all the rage, some of us are still stuck in web 2.0. And those of us that are use Ajax polling to see if there’s anything new on the server.
Here at GitHub we normally do this with memcached. The web browser polls a URL which checks a memcached key. If there’s no data cached, the request returns and polls again in a few seconds. If there is data, the request returns with it and the browser merrily goes about its business. On the other end our background workers stick the goods in memcached when they’re ready.
In this way we use memcached as a poor man’s message bus.
Yet there’s a problem with this: if after a few Ajax polls there’s no data, there probably won’t be for a while. Maybe the site is overloaded or the queue is backed up. In those circumstances the continued polling adds additional unwanted strain to the site. What to do?
The solution is to increment the amount of time you wait in between each poll. Really, it’s that simple. We wrote a little jQuery plugin to make this pattern even easier in our own JS. Here it is, from us to you:
Any time you see “Loading commit data…” or “Hardcore Archiving Action,” you’re seeing smart polling. Enjoy!
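The jQuery plugin itself is not reproduced here, but the pattern is language-neutral. Below is a Ruby sketch of the same idea: poll a cache key and wait longer after every miss. A plain Hash stands in for memcached, and the starting wait, growth factor, and attempt cap are assumptions, not the plugin's actual values.

```ruby
# Poll `cache` for `key`, waiting longer after every miss.
# `cache` is anything that responds to [] (a Hash stands in for memcached).
# The starting wait, growth factor, and attempt cap are assumptions.
def smart_poll(cache, key, wait: 1.0, factor: 1.5, max_attempts: 5,
               sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do
    value = cache[key]
    return value unless value.nil?
    sleeper.call(wait)
    wait *= factor # back off: each miss makes the next wait longer
  end
  nil # gave up; the caller decides what to show the user
end
```

The `sleeper` argument exists only so the waits can be observed or skipped in tests; in normal use the default `sleep` is fine. Multiplying the wait (rather than adding a constant) eases off quickly when the queue is badly backed up.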
-
The 2009 GitHub Contest
Today we’re announcing our 2009 GitHub Contest. Since the Netflix prize is now over, we figured you guys needed something to do. Here is your chance to contribute to the open source canon, make GitHub better, and possibly win two of the best prizes probably ever offered by a contest: a bottle of Pappy Van Winkle and a large GitHub account for life! We would estimate the value here, but, honestly, they’re priceless. Also, hopefully have some fun.
So, the problem is that we want to recommend repositories to you when you log into GitHub that you’ll love. How do we find the perfect projects for you? I wanted to just look at networks of what people were watching and figure out what you might like by what your friends liked. In researching collaborative filtering and recommendation systems papers I found little that is really helpful for this sort of problem, oddly, and very little open source code. Most papers I found online (for free, because I’m cheap – why aren’t all academic papers free and open, btw?) are explicit rating system based (like the Netflix prize – figuring out what you would rate something on a 1-X scale based on previous ratings) not item-based collaborative filters for binary implicit voting (like recommending new items based on past purchasing history) which seems way more useful to most websites to me.
Anyhow, so we figured perhaps you can do this better than we can. I extracted a dataset of all the repository watches in our database – close to half a million – and withheld a sample of them. I then created a test file listing the users I held watches back from. If you can write a program to analyze our dataset and best guess the watches we held back, you win our amazing prizes.
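For anyone wondering where to start, here is about the simplest possible baseline: a co-occurrence recommender that suggests repositories watched by users who share watches with you. It is a toy sketch, not the contest's scoring code or data format. The `WATCHES` data is made up and `recommend` is a hypothetical helper.

```ruby
# Toy data: user id => repositories that user watches (all made up).
WATCHES = {
  1 => %w[rails sinatra rack],
  2 => %w[rails rack],
  3 => %w[sinatra rack thin],
  4 => %w[rails]
}

# Score repos the user does not watch yet by how strongly their watchers
# overlap with the user's own watches, then return the top `limit` names.
def recommend(watches, user, limit: 2)
  mine = watches.fetch(user)
  scores = Hash.new(0)
  watches.each do |other, repos|
    next if other == user
    overlap = (repos & mine).size
    next if overlap.zero?
    (repos - mine).each { |repo| scores[repo] += overlap }
  end
  # Highest score first; ties broken alphabetically for stable output.
  scores.sort_by { |repo, score| [-score, repo] }.first(limit).map(&:first)
end

p recommend(WATCHES, 4)
```

Watches are binary (you either watch a repo or you don't), so there are no ratings to predict, only overlaps to count. A serious entry would need to do much better than this.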
To enter the contest, check out our contest website. Basically you just put your guesses into a file named ‘results.txt’ and push it to a public GitHub project that has “https://contest.github.com” as a post-receive hook. On each push, our site will see if you’ve changed your ‘results.txt’ file then download and score it if you have. At the end of the contest, your source code has to be released under an OSI compatible license so nobody ever has to worry about this problem again. Whoever has the highest score at noon PST on Aug 30, 2009 wins. Good luck!
-
Pro Git Book
For about the last 8 months, I’ve been working on a side project. In November, Apress contacted me about writing a book about Git and I thought it would be a good idea. I may have slightly underestimated the amount of work that it would take, but a few days ago I put the content of the book online under a Creative Commons noncommercial 3.0 license. The book is titled “Pro Git” and you can read it or reference it online at https://progit.org.
The actual printed version will be shipping in another few weeks, but as Apress was kind enough to allow me to publish it under the CC license, you can take a look now. I hope it’s helpful to you in learning or teaching Git.
The full markdown content for the book, as well as all the images and the .graffle file I used to generate them, are on GitHub at progit/progit. If you’re interested in providing a translation under the CC license, please fork the project, copy the ‘en’ folder to the language code of your choice and start translating – I’ll put them online as they are done. Chinese, Portuguese and Ukrainian translations have already been started. Man I love GitHub.
I also encourage you to buy a copy if you use the online resource a lot. Though as a disclaimer I do get royalties when you do, I really do want this to be a commercial success so that more publishing companies and authors will release technical books under open licenses – it benefits the entire community and I’m really glad Apress let me do it.
Oh yeah, the other cool thing is that the Pro Git website is a GitHub Pages site being generated with Jekyll.
-
GitHub Rebase #26
Welcome to Rebase #26! If you’ve got an interesting project you’d like to see on the column feel free to shoot me a message. I’d love to see more themed Rebases, like the book edition. Perhaps we could have a JSON edition, a hardcore C edition, unknown language edition, and so on. I follow some simple guidelines that you can check out here too.
Featured Project
asi-http-request is the Steven Seagal of HTTP libraries for Objective-C. Drop this guy into your OSX or iPhone application and it’s guaranteed to kick ass. Well, at least your HTTP calls will. The library makes it easy to interact with RESTful services as well as submit multipart/form-data if you’re in need of it. It also has a boatload of other features including progress delegates, a streamlined interface for uploading files from disk, and background/queueing support. Take a gander at the docs here, including a nice look at what applications are using it. Fork away, punk.
Notably New Projects
deadweight deals with a common problem that many developers face: unused CSS rules. What do you do with them? Comment them out? Leave them for that annoying team member to deal with instead? This project takes the higher ground by analyzing your stylesheets and some given views to determine what selectors you can safely dispose of. You can even use Mechanize to submit forms and make sure you’re shedding your unnecessary CSS.
jquery-visualize is a really nice way to get simple graphs in your application that’s both accessible (read: degrades into tables) and really spiffy looking. It’s as simple as filling up a table with data and then calling $('table').visualize(). Of course, there’s plenty of configuration options like colors, the type of graph, line weights, and more. Try out a demo or download it for yourself.
tokyo-recipes is a collection of Lua scripts that plug directly into Tokyo Cabinet, an extremely efficient and speedy key-value store. There’s plenty of awesome recipes in this cookbook including expiring data based on TTL, map reduce and even a simple high-low betting game. If you’re just getting started on writing your own Lua scripts for Tokyo Cabinet or are looking for some real examples of how you can use the plugins to your advantage, take a look at this repo.
weberl is a small Erlang webserver that’s based on web.py. It’s essentially a bare-bones web framework that doesn’t assume much, which is certainly ideal if you’re just getting off the ground or you don’t like too much baggage. This project has just started up and could certainly use the help of both experienced and greenhorn Erlang coders if you’re up for it. Go forth and clone!
Redisent is an interface to the Redis key value-store for PHP. Unlike memcached, Redis persists data, and now with this library you can easily hook in your code to it. It also supports clustering, which allows you to hook up more than one key-value store and set aliases for each. Read up more about Redisent on this great blog post/tutorial for using it.
-
Speedy Gem Indexing
As of RubyGems 1.3.2, the index generation code supports incremental index updates. What this means is that instead of taking minutes to rebuild all of the indexes for GitHub’s thousands of gems, it takes just seconds to index the new gems. So, your gem should show up in our index within 1-2 minutes now, assuming it builds correctly and our job queue isn’t backed up. We have also dropped support for legacy indexes, so anyone using a version of RubyGems prior to the 1.2.0 release needs to upgrade.
-
PHP.git
PHP moved from CVS to Subversion. Why should you care? Because they’ve got an official mirror on GitHub: https://github.com/php.
They’ve also published a nice mini guide to Git that you can give to your friends. Good work guys!
-
Clone Stats!
-
Git User's Survey 2009
jnareb tells us it’s that time of year again – the Git User’s Survey 2009 is up!
Please take a few minutes to participate. Remember: all questions are optional and your responses are saved via your session cookie if you want to take a break. No pressure.
The survey began July 15th and will remain open until September 15th, 2009.
More information about the survey (including the results when they’re ready) can be found on the Git wiki: https://git.or.cz/gitwiki/GitSurvey2009
And, if you’re curious, here’s the original announcement.
-
GitHub Meetup SF
We’re going to try and hold a GitHub Meetup in SF every fortnight (read: two weeks) at a bar or restaurant. After all, the best part of any tech meetup is the drinking!
The first one will be held tomorrow, July 16th at Kilowatt in the Mission at 7:30pm. First round’s on GitHub! Just look for me or mojombo (picture reference).
See you there!
-
GitHub Rebase #25
It’s the 25th edition of Rebase! If you’re feeling nostalgic and want to dive through the previous issues, check out the archive here.
Featured Project
pinax is a Django-based platform for building awesome web applications quickly. This combination of reusable apps and strong conventions can help get your site off the ground in no time. Nicholas Tollervey puts it best in this blog post:
In our analogy, Django is Lego bricks: it gives you the building blocks you need to build interesting things on the web. […] Pinax is a collection of Lego sets: it gives you a set of off-the-shelf components commonly used in web-development: a wiki, OpenID, Twitter clone and so on
Pinax is packed with plenty of features, not to mention a decent set of docs. Check it out in action on Cloud27, a social networking site built just to show off its features, and definitely check out this talk from DjangoCon 2008 about the framework.
Notably New Projects
data-baby-names is a quick study using R and Ruby to produce some neat graphs about the top 1000 baby names from 1880 to present day. For stats nuts or perhaps new parents, this could be an interesting diversion from your normal routine.
merle isn’t exactly new, but it sure is notable! This Erlang-based memcached client is the slickest way to interface with everyone’s favorite object caching system. If you’re writing a serious web app with Erlang, chances are it’s going to need something as high-performance and solid as memcached, so save yourself some frustration and check this out first. Read up or peruse the docs if you find yourself needing to integrate some caching in your application.
weary souls need REST, according to this library. Think of weary as the after-HTTParty. The DSL for declaring an HTTP resource is extremely clean, and it’s really simple to dive down and tinker with the details if necessary. This gem also uses Crack to parse the JSON and XML that your webservice du jour provides.
sykobot is an IRC bot from another universe. No really, it’s a bot for #archlinux on Freenode, and it’s got Google search, quoting, and plenty more fun goodies implemented already. The bot is written in Lisp, and it’s already got several active contributors even though it’s only been around for nearly 2 weeks. Get forking!
CssMerger is a C# app that allows you to develop CSS in separate files so some cohesion and sanity can be kept while designing your site, and then serves up a single file once deployed. Basically, it works by replacing import directives with your desired CSS file. If you’re using ASP.NET you should definitely take a look into how this can help you.
-
luckiestmonkey on Aug 13
-
defunkt on Aug 06
-
defunkt on Aug 04
-
defunkt on Jul 29
-
defunkt on Jul 29
-
defunkt on Jul 29
-
schacon on Jul 29
-
schacon on Jul 28
-
qrush on Jul 27
-
pjhyett on Jul 27
-
defunkt on Jul 23
-
defunkt on Jul 22
-
defunkt on Jul 20
-
defunkt on Jul 15
-
qrush on Jul 12