tag:github.com,2008:/blog
The GitHub Blog
2009-10-21T11:27:15-07:00
tag:github.com,2008:Post/532
2009-10-21T11:16:35-07:00
2009-10-21T11:27:15-07:00
GitHub Meetup SF #8
<center>
<p><img src="https://images-0.redbubble.net/img/art/size:large/view:main/2712622-2-blackbird-fly-away.jpg" /></p>
</center>
<p>Come get your drink on with the people of the Hub at <a href="https://maps.google.com/maps?hl=en&ie=UTF8&q=blackbird+san+francisco&fb=1&gl=us&hq=blackbird&hnear=san+francisco&cid=0,0,6763374940088794175&ei=_E3fSvCtA4S0swO3ntzdDw&ved=0CAsQnwIwAA&ll=37.768086,-122.429688&spn=0.009736,0.017316&t=h&z=16&iwloc=A">Blackbird</a> this Thursday, October 22nd at 8pm. Also, be sure to look out for a possible Drinkup:Shanghai, China edition in the next few days - PJ and Scott are headed out for <a href="https://kungfurails.com/">KungFuRails</a> right now!</p>
luckiestmonkey
tag:github.com,2008:Post/531
2009-10-20T13:43:17-07:00
2009-10-21T12:21:41-07:00
Introducing BERT and BERT-RPC
<p>As I detailed in <a href="https://github.com/blog/530-how-we-made-github-fast">How We Made GitHub Fast</a>, we have created a new data serialization and <span class="caps">RPC</span> protocol to power the GitHub backend. We have big plans for these technologies and I’d like to take a moment to explain what makes them special and the philosophy behind their creation.</p>
<p>The serialization format is called <span class="caps">BERT</span> (Binary ERlang Term) and is based on<br />
the existing <a href="https://www.erlang.org/doc/apps/erts/erl_ext_dist.html">external term format</a> already implemented by Erlang. The <span class="caps">RPC</span> protocol is called <span class="caps">BERT</span>-<span class="caps">RPC</span> and is a simple protocol built on top of <span class="caps">BERT</span> packets.</p>
<p>You can view the current specifications at <a href="https://bert-rpc.org">https://bert-rpc.org</a>.</p>
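<p>For a concrete feel of the format in Ruby, here is a minimal sketch of round-tripping a value through BERT. It assumes the <code>BERT.encode</code> and <code>BERT.decode</code> entry points of the Ruby BERT library mentioned later in this article; the payload itself is just an illustration.</p>
<pre><code># a minimal sketch: serialize a Ruby structure to a BERT binary and back
require 'bert'

payload = {
  :repo    => "mojombo/bert",     # strings and symbols survive the round trip
  :stars   => 100,                # so do integers, floats, nil, and booleans
  :tags    => [:ruby, :erlang],   # heterogeneous arrays work too
  :private => false
}

binary = BERT.encode(payload)     # => a compact binary (an Erlang external term)
BERT.decode(binary)               # => the original Ruby structure</code></pre>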
<p>This is a long article; if you want to see some example code of how easy it is to setup an Erlang/Ruby <span class="caps">BERT</span>-<span class="caps">RPC</span> server and call it from a Ruby <span class="caps">BERT</span>-<span class="caps">RPC</span> client, skip to the end.</p>
<h3>How <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> Came to Be</h3>
<p>For the new GitHub architecture, we decided to use a simple <span class="caps">RPC</span> mechanism to expose the Git repositories as a service. This allows us to federate users across disparate file servers and eliminates the need for a shared file system.</p>
<p>Choosing a data serialization and <span class="caps">RPC</span> protocol was a difficult task. My first thought was to look at <a href="https://incubator.apache.org/thrift/">Thrift</a> and <a href="https://code.google.com/p/protobuf/">Protocol Buffers</a> since they are both gaining traction as modern, low-latency <span class="caps">RPC</span> implementations.</p>
<p>I had some contact with Thrift when I worked at Powerset, I talk to a lot of people that use Thrift at their jobs, and Scott is using Thrift as part of some Cassandra experiments we’re doing. As much as I want to like Thrift, I just can’t. I find the entire concept behind IDLs and code generation abhorrent. Coming from a background in dynamic languages and automated testing, these ideas just seem silly. The developer overhead required to constantly maintain IDLs and keep the corresponding implementation code up to date is too frustrating. I don’t do these things when I write application code, so why should I be forced to do them when I write <span class="caps">RPC</span> code?</p>
<p>Protocol Buffers ends up looking very similar to Thrift. More IDLs and more code generation. Any solution that relies on these concepts does not fit well with my worldview. In addition, the set of types available to both Thrift and Protocol Buffers feels limiting compared to what I’d like to easily transmit over the wire.</p>
<p><a href="https://www.xmlrpc.com/"><span class="caps">XML</span>-<span class="caps">RPC</span></a>, <a href="https://www.w3.org/TR/2003/REC-soap12-part2-20030624/"><span class="caps">SOAP</span></a>, and other <span class="caps">XML</span> based protocols are hardly even worth mentioning. They are unnecessarily verbose and complex. <span class="caps">XML</span> is not convertible to a simple unambiguous data structure in any language I’ve ever used. I’ve wasted too many hours of my life clumsily extracting data from <span class="caps">XML</span> files to feel anything but animosity towards the format.</p>
<p><a href="https://json-rpc.org/"><span class="caps">JSON</span>-<span class="caps">RPC</span></a> is a nice system, much more inline with how I see the world. It’s simple, relatively compact, has support for a decent set of types, and works well in an agile workflow. A big problem here, though, is the lack of support for native binary data. Our applications will be transmitting large amounts of binary data, and it displeases me to think that every byte of binary data I send across the wire would have to be encoded into an inferior representation just because <span class="caps">JSON</span> is a text-based protocol.</p>
<p>After becoming thoroughly disenfranchised with the current “state of the art” <span class="caps">RPC</span> protocols, I sat down and started thinking about what the ideal solution would look like. I came up with a list that looked something like this:</p>
<ul>
<li>Extreme simplicity</li>
<li>Dynamic (No IDLs or code generation)</li>
<li>Good set of types (nil, symbols, hashes, bignums, heterogeneous arrays, etc)</li>
<li>Support for complex types (Time, Regex, etc)</li>
<li>No need to encode binary data</li>
<li>Synchronous and Asynchronous calls</li>
<li>Fast serialization/deserialization</li>
<li>Streaming (to and from)</li>
<li>Caching directives</li>
</ul>
<p>I mentioned before that I like <span class="caps">JSON</span>. I love the concept of extracting a subset of a language and using that to facilitate interprocess communication. This got me thinking about the work I’d done with <a href="https://github.com/mojombo/erlectricity">Erlectricity</a>. About two years ago I wrote a C extension for Erlectricity to speed up the deserialization of Erlang’s external term format. I remember being very impressed with the simplicity of the serialization format and how easy it was to parse. Since I was considering using Erlang more within the GitHub architecture, an Erlang-centric solution might be really nice. Putting these pieces together, I was struck by an idea.</p>
<p>What if I extracted the generic parts of Erlang’s external term format and made that into a standard for interprocess communication? What if Erlang had the equivalent of JavaScript’s <span class="caps">JSON</span>? And what if an <span class="caps">RPC</span> protocol could be built on top of that format? What would those things look like and how simple could they be made?</p>
<p>Of course, the first thing any project needs is a good name, so I started brainstorming acronyms. <span class="caps">EETF</span> (Erlang External Term Format) is the obvious one, but it’s boring and not accurate for what I wanted to do since I would only be using a subset of <span class="caps">EETF</span>. After a while I came up with <span class="caps">BERT</span> for Binary ERlang Term. Not only did this moniker precisely describe the nature of the idea, but it was nearly a person’s name, just like <span class="caps">JSON</span>, offering a tip of the hat to my source of inspiration.</p>
<p>Over the next few weeks I sketched out specifications for <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> and showed them to a bunch of my developer friends. I got some great feedback on ways to simplify some confusing parts of the spec and was able to boil things down to what I think is the simplest manifestation that still enables the rich set of features that I want these technologies to support.</p>
<p>The responses were generally positive, and I found a lot of people looking for something simple to replace the nightmarish solutions they were currently forced to work with. If there’s one thing I’ve learned in doing open source over the last 5 years, it’s that if I find an idea compelling, then there are probably a boatload of people out there that will feel the same way. So I went ahead with the project and created reference implementations in Ruby that would eventually become the backbone of the new GitHub architecture.</p>
<p>But enough talk, let’s take a look at the Ruby workflow and you’ll see what I mean when I say that <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> are built around a philosophy of simplicity and Getting Things Done.</p>
<h3>A Simple Example</h3>
<p>To give you an idea of how easy it is to get a Ruby based <span class="caps">BERT</span>-<span class="caps">RPC</span> service running, consider the following simple calculator service:</p>
<pre><code># calc.rb
require 'ernie'
mod(:calc) do
fun(:add) do |a, b|
a + b
end
end</code></pre>
<p>This is a complete service file suitable for use by my Erlang/Ruby hybrid <span class="caps">BERT</span>-<span class="caps">RPC</span> server framework called <a href="https://github.com/mojombo/ernie">Ernie</a>. You start up the service like so:</p>
<pre><code>$ ernie -p 9999 -n 10 -h calc.rb</code></pre>
<p>This fires up the server on port 9999 and spawns ten Ruby workers to handle requests. Ernie takes care of balancing and queuing incoming connections. All you have to worry about is writing your <span class="caps">RPC</span> functions; Ernie takes care of the rest.</p>
<p>To call the service, you can use my Ruby <span class="caps">BERT</span>-<span class="caps">RPC</span> client called <a href="https://github.com/mojombo/bertrpc"><span class="caps">BERTRPC</span></a> like so:</p>
<pre><code>require 'bertrpc'
svc = BERTRPC::Service.new('localhost', 9999)
svc.call.calc.add(1, 2)
# => 3</code></pre>
<p>That’s it! Nine lines of code to a working example. No IDLs. No code generation. If the module and function that you call from the client exist on the server, then everything goes well. If they don’t, then you get an exception, just like your application code.</p>
<p>Since a <span class="caps">BERT</span>-<span class="caps">RPC</span> client can be written in any language, you could easily call the calculator service from Python or JavaScript or Lua or whatever. <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> are intended to make communicating between different languages as streamlined as possible.</p>
<h3>Conclusion</h3>
<p>The Ernie framework and the <span class="caps">BERTRPC</span> library power the new GitHub and we use them exactly as-is. They’ve been in use since the move to Rackspace three weeks ago and are responsible for serving over 300 million <span class="caps">RPC</span> requests in that period. They are still incomplete implementations of the spec, but I plan to flesh them out as time goes on.</p>
<p>If you find <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> intriguing, I’d love to hear your feedback. The best place to hold discussions is on the <a href="https://groups.google.com/group/bert-rpc">official mailing list</a>. If you want to participate, I’d love to see implementations in more languages. Together, we can make <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span> the easiest way to get <span class="caps">RPC</span> done in every language!</p>
mojombo
tag:github.com,2008:Post/530
2009-10-20T11:54:03-07:00
2009-10-21T10:08:08-07:00
How We Made GitHub Fast
<p>Now that things have settled down from the move to Rackspace, I wanted to take some time to go over the architectural changes that we’ve made in order to bring you a speedier, more scalable GitHub.</p>
<p>In my first draft of this article I spent a lot of time explaining why we made each of the technology choices that we did. After a while, however, it became difficult to separate the architecture from the discourse and the whole thing became confusing. So I’ve decided to simply explain the architecture and then write a series of follow up posts with more detailed analyses of exactly why we made the choices we did.</p>
<p>There are many ways to scale modern web applications. What I will be describing here is the method that we chose. This should by no means be considered the only way to scale an application. Consider it a case study of what worked for us given our unique requirements.</p>
<h3>Understanding the Protocols</h3>
<p>We expose three primary protocols to end users of GitHub: <span class="caps">HTTP</span>, <span class="caps">SSH</span>, and Git. When browsing the site with your favorite browser, you’re using <span class="caps">HTTP</span>. When you clone, pull, or push to a private <span class="caps">URL</span> like <span style="background-color:#ddd; padding: 0 .2em; font: 90% Monaco, 'Courier New', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;">git@github.com:mojombo/jekyll.git</span> you’re doing so via <span class="caps">SSH</span>. When you clone or pull from a public repository via a <span class="caps">URL</span> like <code>git://github.com/mojombo/jekyll.git</code> you’re using the Git protocol.</p>
<p>The easiest way to understand the architecture is by tracing how each of these requests propagates through the system.</p>
<h3>Tracing an <span class="caps">HTTP</span> Request</h3>
<p>For this example I’ll show you how a request for a tree page such as <a href="https://github.com/mojombo/jekyll">https://github.com/mojombo/jekyll</a> happens.</p>
<p>The first thing your request hits after coming down from the internet is the active load balancer. For this task we use a pair of Xen instances running <a href="https://www.vergenet.net/linux/ldirectord/">ldirectord</a>. These are called <code>lb1a</code> and <code>lb1b</code>. At any given time one of these is active and the other is waiting to take over in case of a failure in the master. The load balancer doesn’t do anything fancy. It forwards <span class="caps">TCP</span> packets to various servers based on the requested IP and port and can remove misbehaving servers from the balance pool if necessary. In the event that no servers are available for a given pool it can serve a simple static site instead of refusing connections.</p>
<p>For requests to the main website, the load balancer ships your request off to one of the four frontend machines. Each of these is an 8 core, 16GB <span class="caps">RAM</span> bare metal server. Their names are <code>fe1</code>, …, <code>fe4</code>. <a href="https://nginx.net/">Nginx</a> accepts the connection and sends it to a Unix domain socket upon which sixteen <a href="https://github.com/blog/517-unicorn">Unicorn</a> worker processes are selecting. One of these workers grabs the request and runs the <a href="https://rubyonrails.org/">Rails</a> code necessary to fulfill it.</p>
<p>Many pages require database lookups. Our MySQL database runs on two 8 core, 32GB <span class="caps">RAM</span> bare metal servers with 15k <span class="caps">RPM</span> <span class="caps">SAS</span> drives. Their names are <code>db1a</code> and <code>db1b</code>. At any given time, one of them is master and one is slave. MySQL replication is accomplished via <a href="https://www.drbd.org/"><span class="caps">DRBD</span></a>.</p>
<p>If the page requires information about a Git repository and that data is not cached, then it will use our <a href="https://github.com/mojombo/grit">Grit</a> library to retrieve the data. In order to accommodate our Rackspace setup, we’ve modified Grit to do something special. We start by abstracting out every call that needs access to the filesystem into the Grit::Git object. We then replace Grit::Git with a stub that makes <span class="caps">RPC</span> calls to our Smoke service. Smoke has direct disk access to the repositories and essentially presents Grit::Git as a service. It’s called Smoke because Smoke is just Grit in the cloud. Get it?</p>
<p>The stubbed Grit makes <span class="caps">RPC</span> calls to <code>smoke</code> which is a load balanced hostname that maps back to the <code>fe</code> machines. Each frontend runs four <a href="https://github.com/mojombo/proxymachine">ProxyMachine</a> instances behind <a href="https://haproxy.1wt.eu/">HAProxy</a> that act as routing proxies for Smoke calls. ProxyMachine is my content aware (layer 7) <span class="caps">TCP</span> routing proxy that lets us write the routing logic in Ruby. The proxy examines the request and extracts the username of the repository that has been specified. We then use a proprietary library called Chimney (it routes the smoke!) to lookup the route for that user. A user’s route is simply the hostname of the file server on which that user’s repositories are kept.</p>
<p>Chimney finds the route by making a call to <a href="https://code.google.com/p/redis/">Redis</a>. Redis runs on the database servers. We use Redis as a persistent key/value store for the routing information and a variety of other data.</p>
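<p>To make the routing idea concrete, here is a rough sketch of what a Chimney-style lookup could look like from Ruby. Chimney itself is proprietary, so the key naming, the fallback host, and the method name here are invented for illustration; only the general shape (username in, file server hostname out, backed by Redis) comes from the description above.</p>
<pre><code># illustrative only: resolve a username to the file server that holds their repos
require 'redis'

REDIS = Redis.new(:host => "db1a")

def route_for(username)
  # a user's route is simply the hostname of their file server
  REDIS.get("route:#{username}") || "fs1a"   # key format and fallback are made up
end

route_for("mojombo")   # => e.g. "fs3a"</code></pre>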
<p>Once the Smoke proxy has determined the user’s route, it establishes a transparent proxy to the proper file server. We have four pairs of fileservers. Their names are <code>fs1a</code>, <code>fs1b</code>, …, <code>fs4a</code>, <code>fs4b</code>. These are 8 core, 16GB <span class="caps">RAM</span> bare metal servers, each with six 300GB 15K <span class="caps">RPM</span> <span class="caps">SAS</span> drives arranged in <span class="caps">RAID</span> 10. At any given time one server in each pair is active and the other is waiting to take over should there be a fatal failure in the master. All repository data is constantly replicated from the master to the slave via <span class="caps">DRBD</span>.</p>
<p>Every file server runs two <a href="https://github.com/mojombo/ernie">Ernie</a> <span class="caps">RPC</span> servers behind HAProxy. Each Ernie spawns 15 Ruby workers. These workers take the <span class="caps">RPC</span> call and reconstitute and perform the Grit call. The response is sent back through the Smoke proxy to the Rails app where the Grit stub returns the expected Grit response.</p>
<p>When Unicorn is finished with the Rails action, the response is sent back through Nginx and directly to the client (outgoing responses do not go back through the load balancer).</p>
<p>Finally, you see a pretty web page!</p>
<p>The above flow is what happens when there are no cache hits. In many cases the Rails code uses Evan Weaver’s Ruby <a href="https://github.com/fauna/memcached/">memcached</a> client to query the <a href="https://www.danga.com/memcached/">Memcache</a> servers that run on each slave file server. Since these machines are otherwise idle, we place 12GB of Memcache on each. These servers are aliased as <code>memcache1</code>, …, <code>memcache4</code>.</p>
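<p>As a hedged illustration of that read-through caching, here is a sketch using the memcached client mentioned above. The server aliases follow the ones described; the key naming and the placeholder for the expensive lookup are invented, not GitHub's actual code.</p>
<pre><code># illustrative read-through cache in front of the Smoke/Grit lookup
require 'memcached'

CACHE = Memcached.new(%w[memcache1:11211 memcache2:11211 memcache3:11211 memcache4:11211])

def tree_for(repo, sha)
  CACHE.get("tree:#{repo}:#{sha}")
rescue Memcached::NotFound
  tree = "...expensive Smoke/Grit lookup goes here..."   # placeholder, not real code
  CACHE.set("tree:#{repo}:#{sha}", tree)
  tree
end</code></pre>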
<h3><span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span></h3>
<p>For our data serialization and <span class="caps">RPC</span> protocol we are using <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span>. You haven’t heard of them before because they’re brand new. I invented them because I was not satisfied with any of the available options that I evaluated, and I wanted to experiment with an idea that I’ve had for a while. Before you freak out about <span class="caps">NIH</span> syndrome (or to help you refine your freak out), please read my accompanying article <a href="https://github.com/blog/531-introducing-bert-and-bert-rpc">Introducing <span class="caps">BERT</span> and <span class="caps">BERT</span>-<span class="caps">RPC</span></a> about how these technologies came to be and what I intend for them to solve.</p>
<p>If you’d rather just check out the spec, head over to <a href="https://bert-rpc.org">https://bert-rpc.org</a>.</p>
<p>For the code hungry, check out my Ruby <span class="caps">BERT</span> serialization library <a href="https://github.com/mojombo/bert"><span class="caps">BERT</span></a>, my Ruby <span class="caps">BERT</span>-<span class="caps">RPC</span> client <a href="https://github.com/mojombo/bertrpc"><span class="caps">BERTRPC</span></a>, and my Erlang/Ruby hybrid <span class="caps">BERT</span>-<span class="caps">RPC</span> server <a href="https://github.com/mojombo/ernie">Ernie</a>. These are the exact libraries we use at GitHub to serve up all repository data.</p>
<h3>Tracing an <span class="caps">SSH</span> Request</h3>
<p>Git uses <span class="caps">SSH</span> for encrypted communications between you and the server. In order to understand how our architecture deals with <span class="caps">SSH</span> connections, it is first important to understand how this works in a simpler setup.</p>
<p>Git relies on the fact that <span class="caps">SSH</span> allows you to execute commands on a remote server. For instance, the command <span style="background-color:#ddd; padding: 0 .2em; font: 90% Monaco, 'Courier New', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;">ssh tom@frost ls -al</span> runs <code>ls -al</code> in the home directory of my user on the <code>frost</code> server. I get the output of the command on my local terminal. <span class="caps">SSH</span> is essentially hooking up the <span class="caps">STDIN</span>, <span class="caps">STDOUT</span>, and <span class="caps">STDERR</span> of the remote machine to my local terminal.</p>
<p>If you run a command like <span style="background-color:#ddd; padding: 0 .2em; font: 90% Monaco, 'Courier New', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;">git clone tom@frost:mojombo/bert</span>, what Git is doing behind the scenes is SSHing to <code>frost</code>, authenticating as the <code>tom</code> user, and then remotely executing <code>git upload-pack mojombo/bert</code>. Now your client can talk to that process on the remote server by simply reading and writing over the <span class="caps">SSH</span> connection. Neat, huh?</p>
<p>Of course, allowing arbitrary execution of commands is unsafe, so <span class="caps">SSH</span> includes the ability to restrict what commands can be executed. In a very simple case, you can restrict execution to <a href="https://www.kernel.org/pub/software/scm/git/docs/git-shell.html">git-shell</a> which is included with Git. All this script does is check the command that you’re trying to execute and ensure that it’s one of <code>git upload-pack</code>, <code>git receive-pack</code>, or <code>git upload-archive</code>. If it is indeed one of those, it uses <a href="https://linux.die.net/man/3/exec" title="3">exec</a> to replace the current process with that new process. After that, it’s as if you had just executed that command directly.</p>
<p>So, now that you know how Git’s <span class="caps">SSH</span> operations work in a simple case, let me show you how we handle this in GitHub’s architecture.</p>
<p>First, your Git client initiates an <span class="caps">SSH</span> session. The connection comes down off the internet and hits our load balancer.</p>
<p>From there, the connection is sent to one of the frontends where <a href="https://www.au.kernel.org/software/scm/git/docs/git-daemon.html"><span class="caps">SSHD</span></a> accepts it. We have patched our <span class="caps">SSH</span> daemon to perform public key lookups from our MySQL database. Your key identifies your GitHub user and this information is sent along with the original command and arguments to our proprietary script called Gerve (Git sERVE). Think of Gerve as a super smart version of <code>git-shell</code>.</p>
<p>Gerve verifies that your user has access to the repository specified in the arguments. If you are the owner of the repository, no database lookups need to be performed, otherwise several <span class="caps">SQL</span> queries are made to determine permissions.</p>
<p>Once access has been verified, Gerve uses Chimney to look up the route for the owner of the repository. The goal now is to execute your original command on the proper file server and hook your local machine up to that process. What better way to do this than with another remote <span class="caps">SSH</span> execution!</p>
<p>I know it sounds crazy but it works great. Gerve simply uses <code>exec(3)</code> to replace itself with a call to <span style="background-color:#ddd; padding: 0 .2em; font: 90% Monaco, 'Courier New', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;">ssh git@<route> <command> <arg></span>. After this call, your client is hooked up to a process on a frontend machine which is, in turn, hooked up to a process on a file server.</p>
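<p>Here is a toy sketch of that final hand-off, not GitHub's actual Gerve code. The way the original command arrives and the hard-coded route are assumptions purely for illustration; the point is that <code>exec</code> replaces the current process, so the client ends up wired straight through to the file server.</p>
<pre><code># toy sketch of the hand-off: exec an inner ssh to the routed file server
route   = "fs3a"                                        # pretend Chimney returned this
command = ENV['SSH_ORIGINAL_COMMAND'] ||
          "git-upload-pack 'mojombo/bert.git'"          # example original command

exec("ssh", "git@#{route}", command)                    # never returns on success</code></pre>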
<p>Think of it this way: after determining permissions and the location of the repository, the frontend becomes a transparent proxy for the rest of the session. The only drawback to this approach is that the internal <span class="caps">SSH</span> is unnecessarily encumbered by the overhead of encryption/decryption when none is strictly required. It’s possible we may replace this internal <span class="caps">SSH</span> call with something more efficient, but this approach is just too damn simple (and still very fast) to make me worry about it very much.</p>
<h3>Tracing a Git Request</h3>
<p>Performing public clones and pulls via Git is similar to how the <span class="caps">SSH</span> method works. Instead of using <span class="caps">SSH</span> for authentication and encryption, however, it relies on a server side <a href="https://www.au.kernel.org/software/scm/git/docs/git-daemon.html">Git Daemon</a>. This daemon accepts connections, verifies the command to be run, and then uses <code>fork(2)</code> and <code>exec(3)</code> to spawn a worker that then becomes the command process.</p>
<p>With this in mind, I’ll show you how a public clone operation works.</p>
<p>First, your Git client issues a <a href="https://github.com/mojombo/egitd/blob/master/docs/protocol.txt">request</a> containing the command and repository name you wish to clone. This request enters our system on the load balancer.</p>
<p>From there, the request is sent to one of the frontends. Each frontend runs four ProxyMachine instances behind HAProxy that act as routing proxies for the Git protocol. The proxy inspects the request and extracts the username (or gist name) of the repo. It then uses Chimney to look up the route. If there is no route or any other error is encountered, the proxy speaks the Git protocol and sends back an appropriate message to the client. Once the route is known, the repo name (e.g. <code>mojombo/bert</code>) is translated into its path on disk (e.g. <code>a/a8/e2/95/mojombo/bert.git</code>). On our old setup that had no proxies, we had to use a modified daemon that could convert the user/repo into the correct filepath. By doing this step in the proxy, we can now use an unmodified daemon, allowing for a much easier upgrade path.</p>
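<p>The post doesn't say how that on-disk prefix is derived, so treat the following as a guess at the shape only: a sketch that fans repositories out across directories using hex pairs from an MD5 digest of the repo name. The real scheme may differ.</p>
<pre><code># illustrative sharding sketch (the actual derivation of "a/a8/e2/95" is not documented here)
require 'digest/md5'

def disk_path(name_with_owner)
  hex = Digest::MD5.hexdigest(name_with_owner)
  [hex[0, 1], hex[0, 2], hex[2, 2], hex[4, 2], "#{name_with_owner}.git"].join("/")
end

disk_path("mojombo/bert")   # => something shaped like "a/a8/e2/95/mojombo/bert.git"</code></pre>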
<p>Next, the Git proxy establishes a transparent proxy with the proper file server and sends the modified request (with the converted repository path). Each file server runs two Git Daemon processes behind HAProxy. The daemon speaks the pack file protocol and streams data back through the Git proxy and directly to your Git client.</p>
<p>Once your client has all the data, you’ve cloned the repository and can get to work!</p>
<h3>Sub- and Side-Systems</h3>
<p>In addition to the primary web application and Git hosting systems, we also run a variety of other sub-systems and side-systems. Sub-systems include the job queue, archive downloads, billing, mirroring, and the svn importer. Side-systems include GitHub Pages, Gist, gem server, and a bunch of internal tools. You can look forward to explanations of how some of these work within the new architecture, and what new technologies we’ve created to help our application run more smoothly.</p>
<h3>Conclusion</h3>
<p>The architecture outlined here has allowed us to properly scale the site and resulted in massive performance increases across the entire site. Our average Rails response time on our previous setup was anywhere from 500ms to several seconds depending on how loaded the slices were. Moving to bare metal and federated storage on Rackspace has brought our average Rails response time to consistently under 100ms. In addition, the job queue now has no problem keeping up with the 280,000 background jobs we process every day. We still have plenty of headroom to grow with the current set of hardware, and when the time comes to add more machines, we can add new servers on any tier with ease. I’m very pleased with how well everything is working, and if you’re like me, you’re enjoying the new and improved GitHub every day!</p>
mojombo
tag:github.com,2008:Post/529
2009-10-19T09:37:58-07:00
2009-10-19T10:04:52-07:00
Ryan Tomayko is a GitHubber
<p>Today marks <a href="https://github.com/rtomayko">Ryan Tomayko’s</a> first day as a GitHubber. He’ll be helping make GitHub more stable, reliable, and awesome.</p>
<p>Ryan has consistently impressed all of us with his work on <a href="https://github.com/sinatra/sinatra">Sinatra</a> and <a href="https://github.com/rack/rack">Rack</a>, the awesomeness of <a href="https://github.com/rtomayko/shotgun">shotgun</a> and <a href="https://github.com/rtomayko/git-sh">git-sh</a>, his prolific <a href="https://tomayko.com/">writing and linking</a>, and his various other projects.</p>
<div align="center"><a href="https://github.com/rtomayko"><img src="https://img.skitch.com/20091019-rxbm5925t37sjtxik2u7m4wj5a.jpg"/></a></div>
<div align="center" xmlns:cc="https://creativecommons.org/ns#" about="https://www.flickr.com/photos/mojombo/3625905407/in/set-72157619744233018/"><a rel="cc:attributionURL" href="https://www.flickr.com/photos/mojombo/">https://www.flickr.com/photos/mojombo/</a> / <a rel="license" href="https://creativecommons.org/licenses/by-nc-sa/2.0/">CC BY-NC-SA 2.0</a></div>
<p>You can follow <a href="https://tomayko.com/">his blog</a>, <a href="https://twitter.com/rtomayko">his twitter</a>, or <a href="https://github.com/rtomayko">his GitHub</a></p>
<p>Welcome to the team, Ryan!</p>
defunkt
tag:github.com,2008:Post/528
2009-10-18T18:51:27-07:00
2009-10-19T00:16:07-07:00
Scheduled Maintenance Tonight at 23:00 PDT
<p>We will be having a maintenance window tonight from <a href="https://www.timeanddate.com/worldclock/fixedtime.html?month=10&day=18&year=2009&hour=23&min=0&sec=0&p1=224">23:00 to 23:59 <span class="caps">PDT</span></a>. A very small amount of web unavailability will be required during this period.</p>
<p>We will be upgrading some core libraries to versions that are not compatible with what is currently running, so all daemons must be restarted simultaneously. For this to go smoothly, we will be disabling the web app for perhaps 30 seconds.</p>
<p><span class="caps">UPDATE</span>: Maintenance was completed successfully. Total web unavailability was a tad more than estimated at one minute and 40 seconds. Some job runners did not restart cleanly and as a result some jobs failed, but all job runners are operating normally now. If you experienced any problems during the maintenance window, don’t hesitate to contact us at <a href="https://support.github.com">https://support.github.com</a>.</p>
mojombo
tag:github.com,2008:Post/527
2009-10-16T09:38:50-07:00
2009-10-16T09:39:53-07:00
Helping with Texting
<p><a href="https://unicefinnovation.org/"><span class="caps">UNICEF</span></a> is using <span class="caps">SMS</span> to help those in need. And they’re doing it with open source.</p>
<p>You can read all about <a href="https://www.rapidsms.org/">RapidSMS</a>, their <a href="https://unicefinnovation.org/mobile-and-sms.php">Mobile and <span class="caps">SMS</span> platform</a>, but here’s a snippet:</p>
<blockquote>
<p>The impact a RapidSMS implementation has on UNICEF’s work practices is dramatic. In October 2008, Ethiopia experienced crippling droughts. Faced with the possibility of famine, <span class="caps">UNICEF</span> Ethiopia launched a massive food distribution program to supply the high-protein food Plumpy’nut to under-nourished children at more than 1,800 feeding centres in the country. Previously, <span class="caps">UNICEF</span> monitored the distribution of food by sending a small set of individuals who traveled to each feeding center. The monitor wrote down the amount of food that was received, was distributed, and if more food was needed. There had been a two week to two month delay between the collection of that data and analysis, prolonging action. In a famine situation each day can mean the difference between recovery, starvation, or even death.</p>
<p>The Ethiopian implementation of RapidSMS completely eliminated the delay. After a short training session the monitors would enter information directly into their mobile phones as <span class="caps">SMS</span> messages. This data would instantaneously appear on the server and immediately be visualized into graphs showing potential distribution problem and displayed on a map clearly showing where the problems were. The data could be seen, not only by the field office, but by the regional office, supply division and even headquarters, greatly improving response coordination. The process of entering the data into phones was also easier and more cost effective for the monitors themselves leading to quick adoption of the technology.</p>
</blockquote>
<p>What a great use of technology. The site says, “<span class="caps">GSMA</span> [predicts] that by 2010, 90% of the world will be covered by mobile networks.” Seems like <span class="caps">SMS</span> is going to become more important and more ubiquitous in the future.</p>
<p>Check out the <a href="https://www.rapidsms.org/">RapidSMS home page</a> or browse the source, right here on GitHub: <a href="https://github.com/rapidsms/rapidsms">https://github.com/rapidsms/rapidsms</a></p>
defunkt
tag:github.com,2008:Post/526
2009-10-15T15:16:58-07:00
2009-10-15T15:19:03-07:00
Scheduled Maintenance Tonight at 22:00 PDT
<p>We’re having another maintenance window tonight from <a href="https://www.timeanddate.com/worldclock/fixedtime.html?month=10&day=15&year=2009&hour=22&min=0&sec=0&p1=224">22:00 to 23:00 <span class="caps">PDT</span></a>. We will be installing and testing the “sorry server” that will be enabled if no frontends are available to serve requests. Instead of just refusing connections, this server will point you to the Twitter status feed and display information on surviving when GitHub is down. In order to test that everything is working properly we will need to deliberately remove all the frontends from the load balancer for a small period. Git and <span class="caps">SSH</span> access to repositories will be unaffected during this period.</p>
mojombo
tag:github.com,2008:Post/525
2009-10-15T09:50:22-07:00
2009-10-15T09:50:30-07:00
GitHub Ribbon in CSS
<p><a href="https://github.com/jbalogh">jbalogh</a> has a write up on how to implement <a href="https://github.com/blog/273-github-ribbons">GitHub’s Ribbons</a> in pure <span class="caps">CSS</span>: <a href="https://people.mozilla.com/%7Ejbalogh/ribbon/ribbon.html">Redoing the GitHub Ribbon in <span class="caps">CSS</span></a></p>
<div align="center"><a href="https://people.mozilla.com/%7Ejbalogh/ribbon/ribbon.html"><img src="https://s3.amazonaws.com/github/ribbons/forkme_left_red_aa0000.png"/></a></div>
<p>Daniel Perez Alvarez has a <a href="https://unindented.org/articles/2009/10/github-ribbon-using-css-transforms/">similar writeup</a> which includes a nice “supported browsers” table. Awesome work!</p>
defunkt
tag:github.com,2008:Post/524
2009-10-14T10:02:32-07:00
2009-10-14T10:03:11-07:00
TUAW on GitHub
<p><a href="https://www.tuaw.com/">The Unofficial Apple Weblog</a>, or <span class="caps">TUAW</span>, is now on GitHub: <a href="https://github.com/tuaw">https://github.com/tuaw</a></p>
<div align="center"><a href="https://github.com/tuaw"><img src="https://img.skitch.com/20091014-p9c85ngu2yrhsr7jnrd888ex6.png"/></a></div>
<p>While there’s nothing there <em>yet</em>, they only announced the account yesterday and said <a href="https://www.tuaw.com/2009/10/13/tuaw-is-now-on-github/">in the post</a>, “This is where you’ll be able to find code for our developer-related posts. We’ll try to get some projects hosted in there very soon, so don’t worry that it’s empty now!”</p>
<p>Today they’ve featured <a href="https://rentzsch.github.com/clicktoflash/">ClickToFlash</a>, everyone’s favorite Safari plugin, in a <a href="https://www.tuaw.com/2009/10/14/clicktoflash-makes-the-web-a-nicer-place-to-visit/">blog post</a>.</p>
<p>Welcome, TUAWers!</p>
defunkt
tag:github.com,2008:Post/523
2009-10-13T23:12:27-07:00
2009-10-13T23:17:14-07:00
Gist Improvements
<p>Another day, more updates! We just rolled out some subtle changes to <a href="https://gist.github.com">Gist</a>:</p>
<ul>
<li>Search! All gists are now searchable. Want to find all <a href="https://gist.github.com/gists/search?q=unicorn&page=1">gists about unicorn?</a> Done.</li>
<li>Unified user box. You may have noticed we rolled out a new user box a couple days ago – that now follows you to Gist as well.</li>
<li>Added your most recent gists on the home page.</li>
<li>Tweaked UI. Some of the design elements of Gist have changed a bit to be more consistent with the rest of the site.</li>
</ul>
<center>
<p><a href="https://gist.github.com"><img src="https://share.kyleneath.com/captures/Gist_-_GitHub-20091013-231646.png" /></a></p>
</center>
<p>Hope you enjoy!</p>
kneath
tag:github.com,2008:Post/521
2009-10-13T11:19:24-07:00
2009-10-13T11:29:42-07:00
Speedy Version Sorting
<div align="center"><a href="https://twitter.com/defunkt/statuses/4738823939"><img src="https://twictur.es/i/4738823939.gif"/></a></div>
<p>Last week I offered fame and fortune to anyone who could speed up our <a href="https://github.com/defunkt/version_sorter">version_sorter</a>. It’s used to sort a repo’s tags:</p>
<div align="center"><img src="https://img.skitch.com/20091013-1sg6bq1gr63gebwiuqeguqhja.png"/></div>
<p>This morning I ran the numbers and the winner is…</p>
<div align="center"><a href="https://github.com/pope"><img src="https://img.skitch.com/20091013-t29bnac6y5r5c5murydu5jbjam.png"/></a></div>
<p><a href="https://github.com/pope">Pope</a>!</p>
<p>Special thanks to <a href="https://github.com/binary42">binary42</a>, <a href="https://github.com/pope">pope</a>, <a href="https://github.com/jordi">jordi</a>, <a href="https://github.com/ahoward">ahoward</a>, <a href="https://github.com/jqr">jqr</a>, and <a href="https://github.com/mikeauclair">mikeauclair</a> for speeding up the code.</p>
<p>Here are my benchmarks from fastest to slowest. I used <a href="https://gist.github.com/209422">this script</a> with <a href="https://gist.github.com/209424">this dataset</a> to run them.</p>
<pre>version_sorter benchmarks
sorting 1,311 tags 100 times
original
user system total real
sort 49.840000 0.570000 50.410000 ( 60.088636)
rsort 51.610000 0.610000 52.220000 ( 61.462576)
-----------------------------------------------------------------
pope
user system total real
sort 0.650000 0.010000 0.660000 ( 0.686630)
rsort 0.740000 0.010000 0.750000 ( 0.806579)
-----------------------------------------------------------------
jordi
user system total real
sort 1.770000 0.020000 1.790000 ( 1.930918)
rsort 2.240000 0.020000 2.260000 ( 2.477109)
-----------------------------------------------------------------
ahoward
user system total real
sort 2.360000 0.020000 2.380000 ( 2.581706)
rsort 2.480000 0.030000 2.510000 ( 2.796861)
-----------------------------------------------------------------
binary42
user system total real
sort 4.170000 0.050000 4.220000 ( 4.693593)
rsort 4.470000 0.050000 4.520000 ( 5.112159)
-----------------------------------------------------------------
mikeauclair
user system total real
sort 44.060000 0.530000 44.590000 ( 54.701128)
rsort 46.280000 0.540000 46.820000 ( 54.965692)
-----------------------------------------------------------------
jqr
user system total real
sort 48.800000 0.540000 49.340000 ( 56.063984)
rsort 50.970000 0.580000 51.550000 ( 59.799366)
-----------------------------------------------------------------
</pre>
<p>Pope wrote a C extension, but jordi and ahoward had impressive pure-Ruby implementations as well. Check out all the entries:</p>
<ul>
<li><a href="https://github.com/pope/version_sorter">pope/version_sorter</a></li>
<li><a href="https://github.com/jordi/version_sorter">jordi/version_sorter</a></li>
<li><a href="https://github.com/ahoward/version_sorter">ahoward/version_sorter</a></li>
<li><a href="https://github.com/binary42/version_sorter">binary42/version_sorter</a></li>
<li><a href="https://github.com/mikeauclair/version_sorter">mikeauclair/version_sorter</a></li>
<li><a href="https://github.com/jqr/version_sorter">jqr/version_sorter</a></li>
</ul>
defunkt
tag:github.com,2008:Post/520
2009-10-13T10:31:10-07:00
2009-10-13T10:50:35-07:00
A Note on Today's Outage
<p>We had an outage this morning from 06:32 to 07:42 <span class="caps">PDT</span>. One of the file servers experienced an unusually high load that caused the heartbeat monitor on that file server pair to behave abnormally and confuse the dynamic hostname that points to the active file server in the pair. This in turn caused the frontends to start timing out and resulted in their removal from the load balancer. Here is what we intend to do to prevent this from happening in the future:</p>
<ul>
<li>The slave file servers are still in standby mode from the migration. We will have a maintenance window tonight at 22:00 <span class="caps">PDT</span> in order to ensure that slaves are ready to take over as master should the existing masters exhibit this kind of behavior.</li>
<li>To identify the root cause of the load spikes we will be enabling process accounting on the file servers so that we may inspect what processes are causing the high load.</li>
<li>As a related item, the site still gives a “connection refused” error when all the frontends are out of load balancer rotation. We are working on determining why the placeholder site that should be shown during this type of outage is not being brought up.</li>
<li>We’ve also identified a problem with the single unix domain socket upstream approach in Nginx. By default, any upstream failures cause Nginx to consider that upstream defunct and remove it from service for a short period. With only a single upstream, this obviously presents a problem. We are testing a change to the configuration that should make Nginx always try upstreams.</li>
</ul>
<p>We apologize for the downtime and any inconvenience it may have caused. Thank you for your patience and understanding as we continue to refine our Rackspace setup and deal with unanticipated events.</p>
mojombo
tag:github.com,2008:Post/519
2009-10-12T11:15:15-07:00
2009-10-12T11:26:05-07:00
unicorn.god
<p>Some people have been asking for our <a href="https://github.com/blog/517-unicorn">Unicorn</a> <a href="https://god.rubyforge.org/">god</a> config.</p>
<p>Here it is:</p>
<script src="https://gist.github.com/208581.js"></script><noscript><pre># <a href="https://unicorn.bogomips.org/SIGNALS.html">https://unicorn.bogomips.org/<span class="caps">SIGNALS</span>.html</a><br />
<br />
rails_env = <span class="caps">ENV</span>[‘RAILS_ENV’] || ‘production’<br />
rails_root = <span class="caps">ENV</span>[‘RAILS_ROOT’] || “/data/github/current”<br />
<br />
God.watch do |w|<br />
w.name = “unicorn”<br />
w.interval = 30.seconds # default<br />
<br />
# unicorn needs to be run from the rails root<br />
w.start = “cd #{rails_root} && /usr/local/bin/unicorn_rails -c #{rails_root}/config/unicorn.rb -E #{rails_env} -D”<br />
<br />
# <span class="caps">QUIT</span> gracefully shuts down workers<br />
w.stop = “kill -<span class="caps">QUIT</span> `cat #{rails_root}/tmp/pids/unicorn.pid`”<br />
<br />
# USR2 causes the master to re-create itself and spawn a new worker pool<br />
w.restart = “kill -USR2 `cat #{rails_root}/tmp/pids/unicorn.pid`”<br />
<br />
w.start_grace = 10.seconds<br />
w.restart_grace = 10.seconds<br />
w.pid_file = “#{rails_root}/tmp/pids/unicorn.pid”<br />
<br />
w.uid = ‘git’<br />
w.gid = ‘git’<br />
<br />
w.behavior(:clean_pid_file)<br />
<br />
w.start_if do |start|<br />
start.condition(:process_running) do |c|<br />
c.interval = 5.seconds<br />
c.running = false<br />
end<br />
end<br />
<br />
w.restart_if do |restart|<br />
restart.condition(:memory_usage) do |c|<br />
c.above = 300.megabytes<br />
c.times = [3, 5] # 3 out of 5 intervals<br />
end<br />
<br />
restart.condition(:cpu_usage) do |c|<br />
c.above = 50.percent<br />
c.times = 5<br />
end<br />
end<br />
<br />
# lifecycle<br />
w.lifecycle do |on|<br />
on.condition(:flapping) do |c|<br />
c.to_state = [:start, :restart]<br />
c.times = 5<br />
c.within = 5.minute<br />
c.transition = :unmonitored<br />
c.retry_in = 10.minutes<br />
c.retry_times = 5<br />
c.retry_within = 2.hours<br />
end<br />
end<br />
end</pre></noscript>
<p>That’s for starting and stopping the master. It’s important to note that god only knows about the master – not the workers. The memory limit condition, then, only applies to the master (and is probably never hit).</p>
<p>To watch the workers we use a cute hack <a href="https://github.com/mojombo">mojombo</a> came up with (though he promises first-class support in future versions of god): we start a thread and periodically check the memory usage of workers. If a worker is gobbling up more than 300mb of <span class="caps">RSS</span>, we send it a <span class="caps">QUIT</span>. The <span class="caps">QUIT</span> tells it to die once it finishes processing the current request. Once that happens the master will spawn a new worker – we should hardly notice.</p>
<script src="https://gist.github.com/208588.js"></script><noscript><pre># This will ride alongside god and kill any rogue memory-greedy<br />
# processes. Their sacrifice is for the greater good.<br />
<br />
unicorn_worker_memory_limit = 300_000<br />
<br />
Thread.new do<br />
loop do<br />
begin<br />
# unicorn workers<br />
#<br />
# ps output line format:<br />
# 31580 275444 unicorn_rails worker<sup class="footnote"><a href="#fn15">15</a></sup> -c /data/github/current/config/unicorn.rb -E production -D<br />
# pid ram command<br />
<br />
lines = `ps -e -www -o pid,rss,command | grep ‘[u]nicorn_rails worker’`.split(“\n”)<br />
lines.each do |line|<br />
parts = line.split(’ ‘)<br />
if parts<sup class="footnote"><a href="#fn1">1</a></sup>.to_i > unicorn_worker_memory_limit<br />
# tell the worker to die after it finishes serving its request<br />
::Process.kill(’QUIT’, parts<sup class="footnote"><a href="#fn0">0</a></sup>.to_i)<br />
end<br />
end<br />
rescue Object<br />
# don’t die ever once we’ve tested this<br />
nil<br />
end<br />
<br />
sleep 30<br />
end<br />
end</pre></noscript>
<p>That’s it! Don’t forget the <a href="https://unicorn.bogomips.org/SIGNALS.html">Unicorn Signals</a> page when working with Unicorn.</p>
defunkt
tag:github.com,2008:Post/518
2009-10-12T09:55:30-07:00
2009-10-12T09:57:15-07:00
Gemcutter Railscast
<p><a href="https://github.com/rbates">rbates</a> has a great (as always) <a href="https://railscasts.com/episodes/183-gemcutter-jeweler">screencast</a> on <a href="https://github.com/technicalpickles/jeweler">Jeweler</a> and <a href="https://gemcutter.org/">Gemcutter</a>.</p>
<div align="center"><a href="https://railscasts.com/episodes/183-gemcutter-jeweler"><img src="https://img.skitch.com/20091012-rrf7756fn2kpqeqbx7qukg8xrn.png"/></a></div>
<p>Check it out and give Gemcutter a try!</p>
<p>(<a href="https://github.com/sr/mg">mg</a> looks like another good tool to help you create gems, too (though I haven’t used it).)</p>
defunkt
tag:github.com,2008:Post/517
2009-10-09T10:44:47-07:00
2009-10-09T14:23:38-07:00
Unicorn!
<p>We’ve been running <a href="https://unicorn.bogomips.org/">Unicorn</a> for more than a month. Time to talk about it.</p>
<h3>What is it?</h3>
<p>Unicorn is an <span class="caps">HTTP</span> server for Ruby, similar to Mongrel or Thin. It uses Mongrel’s Ragel <span class="caps">HTTP</span> parser but has a dramatically different architecture and philosophy.</p>
<p>In the classic setup you have nginx sending requests to a pool of mongrels using a smart balancer or a simple round robin.</p>
<p>Eventually you want better visibility and reliability out of your load balancing situation, so you throw haproxy into the mix:</p>
<div align="center"><img src="https://img.skitch.com/20091009-dttaae9jykqttjdejd5qrrs37m.png"/></div>
<p>Which works great. We ran this setup for a long time and were very happy with it. However, there are a few problems.</p>
<h4>Slow Actions</h4>
<p>When actions take longer than 60s to complete, Mongrel will try to kill the thread. This has proven unreliable due to Ruby’s threading. Mongrels will often get into a “stuck” stage and need to be killed by some external process (e.g. god or monit).</p>
<p>Yes, this is a problem with our application. No action should ever take 60s. But we have a complicated application with many moving parts and things go wrong. Our production environment needs to handle errors and failures gracefully.</p>
<h4>Memory Growth</h4>
<p>We restart mongrels that hit a certain memory threshold. This is often a problem with parts of our application. Engine Yard has a great post on <a href="https://www.engineyard.com/blog/2009/thats-not-a-memory-leak-its-bloat/">memory bloat</a> and how to deal with it.</p>
<p>Like slow actions, however, it happens. You need to be prepared for things to not always be perfect, and so does your production environment. We don’t kill app servers often due to memory bloat, but it happens.</p>
<h4>Slow Deploys</h4>
<p>When your server’s <span class="caps">CPU</span> is pegged, restarting 9 mongrels hurts. Each one has to load all of Rails, all your gems, all your libraries, and your app into memory before it can start serving requests. They’re all doing the exact same thing but fighting each other for resources.</p>
<p>During that time, you’ve killed your old mongrels so any users hitting your site have to wait for the mongrels to be fully started. If you’re really overloaded, this can result in 10s+ waits. Ouch.</p>
<p>There are some complicated solutions that automate “rolling restarts” with multiple haproxy setups and restarting mongrels in different pools. But, as I said, they’re complicated and not foolproof.</p>
<h4>Slow Restarts</h4>
<p>As with the deploys, any time a mongrel is killed due to memory growth or timeout problems it will take multiple seconds until it’s ready to serve requests again. During peak load this can have a noticeable impact on the site’s responsiveness.</p>
<h4>Push Balancing</h4>
<p>With most popular load balancing solutions, requests are handed to a load balancer that decides which mongrel will service them. The better the load balancer, the smarter it is about knowing who is ready.</p>
<p>This is typically why you’d graduate from an nginx-based load balancing solution to haproxy: haproxy is better at queueing up requests and handing them to mongrels who can actually serve them.</p>
<p>At the end of the day, though, the load balancer is still pushing requests to the mongrels. You run the risk of pushing a request to a mongrel who may not be the best candidate for serving a request at that time.</p>
<h2>Unicorn</h2>
<div align="center"><img src="https://github.com/images/error/angry_unicorn.png"/></div>
<p><a href="https://unicorn.bogomips.org/">Unicorn</a> has a slightly different architecture. Instead of the nginx => haproxy => mongrel cluster setup you end up with something like:</p>
<div align="center"><img src="https://img.skitch.com/20091009-nhkxprrntc4k9u1x4kbe14x61t.png"/></div>
<p>nginx sends requests directly to the Unicorn worker pool over a Unix Domain Socket (or <span class="caps">TCP</span>, if you prefer). The Unicorn master manages the workers while the OS handles balancing, which we’ll talk about in a second. The master itself never sees any requests.</p>
<p>Here’s the only difference between our nginx => haproxy and nginx => unicorn configs:</p>
<pre># port 3000 is haproxy
upstream github {
server 127.0.0.1:3000;
}
# unicorn master opens a unix domain socket
upstream github {
server unix:/data/github/current/tmp/sockets/unicorn.sock;
}</pre>
<p>When the Unicorn master starts, it loads our app into memory. As soon as it’s ready to serve requests it forks 16 workers. Those workers then select() on the socket, only serving requests they’re capable of handling. In this way the kernel handles the load balancing for us.</p>
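<p>If the "kernel handles the load balancing" part sounds magical, here is a tiny stand-alone sketch of the same prefork pattern (not Unicorn's code, and using a TCP socket instead of a Unix domain socket): one parent binds the listener, the workers it forks all accept from that shared socket, and only a worker that is actually idle sits in <code>accept()</code>.</p>
<pre><code># minimal prefork sketch: shared listener, workers accept when they're free
require 'socket'

listener = TCPServer.new("127.0.0.1", 9999)   # Unicorn listens on a Unix socket instead

4.times do
  fork do
    loop do
      client = listener.accept                 # kernel hands the connection to an idle worker
      client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
      client.close
    end
  end
end

Process.waitall</code></pre>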
<h4>Slow Actions</h4>
<p>The Unicorn master process knows exactly how long each worker has been processing a request. If a worker takes longer than 30s (we lowered it from mongrel’s default of 60s) to respond, the master immediately kills the worker and forks a new one. The new worker is instantly able to serve a new request – no multi-second startup penalty.</p>
<p>When this happens the client is sent a 502 error page. You may have seen <a href="https://github.com/502">ours</a> and wondered what it meant. Usually it means your request was killed before it completed.</p>
<h4>Memory Growth</h4>
<p>When a worker is using too much memory, god or monit can send it a <span class="caps">QUIT</span> signal. This tells the worker to die after finishing the current request. As soon as the worker dies, the master forks a new one which is instantly able to serve requests. In this way we don’t have to kill your connection mid-request or take a startup penalty.</p>
<h4>Slow Deploys</h4>
<p>Our deploys are ridiculous now. Combined with our <a href="https://github.com/blog/470-deployment-script-spring-cleaning">custom Capistrano recipes</a>, they’re very fast. Here’s what we do.</p>
<p>First we send the existing Unicorn master a USR2 signal. This tells it to begin starting a new master process, reloading all our app code. When the new master is fully loaded it forks all the workers it needs. The first worker forked notices there is still an old master and sends it a <span class="caps">QUIT</span> signal.</p>
<p>When the old master receives the <span class="caps">QUIT</span>, it starts gracefully shutting down its workers. Once all the workers have finished serving requests, it dies. We now have a fresh version of our app, fully loaded and ready to receive requests, without any downtime: the old and new workers all share the Unix Domain Socket so nginx doesn’t have to even care about the transition.</p>
<p>We can also use this process to upgrade Unicorn itself.</p>
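<p>The restart step in a deploy really is just one signal. Here’s a hedged sketch of what such a Capistrano task might look like (this isn’t our actual recipe, and the pidfile path is an assumption that lines up with the config shown below):</p>
<pre># Hypothetical Capistrano task, sketched for illustration.
# USR2 tells the running master to load a fresh master with the new code;
# the before_fork hook further down QUITs the old master once the new
# workers are up.
namespace :deploy do
  task :restart, :roles => :app do
    run "kill -USR2 `cat /data/github/current/tmp/pids/unicorn.pid`"
  end
end</pre>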
<p>What about migrations? Simple: just throw up a “The site is temporarily down for maintenance” page, run the migration, restart Unicorn, then remove the downtime page. Same as it ever was.</p>
<h4>Slow Restarts</h4>
<p>As mentioned above, restarts are only slow when the master has to start. Workers can be killed and re-fork() incredibly fast.</p>
<p>When we are doing a full restart, only one process is ever loading all the app code: the master. There are no wasted cycles.</p>
<h4>Push Balancing</h4>
<p>Instead of being pushed requests, workers pull requests. <a href="https://github.com/rtomayko">Ryan Tomayko</a> has a great article on the nitty gritties of this process titled <a href="https://tomayko.com/writings/unicorn-is-unix">I like Unicorn because it’s Unix</a>.</p>
<p>Basically, a worker asks for a request when it’s ready to serve one. Simple.</p>
<h2>Migration Strategy</h2>
<p>So, you want to migrate from thin or mongrel cluster to Unicorn? If you’re running an nginx => haproxy => cluster setup it’s pretty easy. Instead of changing any settings, you can simply tell the Unicorn workers to listen on a <span class="caps">TCP</span> port when they are forked. These ports can match the ports of your current mongrels.</p>
<p>Check out the <a href="https://unicorn.bogomips.org/Unicorn/Configurator.html">Configurator documentation</a> for an example of this method. Specifically this part:</p>
<pre>after_fork do |server, worker|
  # per-process listener ports for debugging/admin/migrations
  addr = "127.0.0.1:#{9293 + worker.nr}"
  server.listen(addr, :tries => -1, :delay => 5, :tcp_nopush => true)
end</pre>
<p>This tells each worker to start listening on a port equal to their worker # + 9293 forever – they’ll keep trying to bind until the port is available.</p>
<p>Using this trick you can start up a pool of Unicorn workers, then shut down your existing pool of mongrel or thin app servers when the Unicorns are ready. The workers will bind to the ports as soon as possible and start serving requests.</p>
<p>It’s a good way to get familiar with Unicorn without touching your haproxy or nginx configs.</p>
<p>(For fun, try running “kill -9” on a worker then doing a “ps aux”. You probably won’t even notice it was gone.)</p>
<p>Once you’re comfortable with Unicorn and have your deploy scripts ready, you can modify nginx’s upstream to use Unix Domain Sockets then stop opening ports in the Unicorn workers. Also, no more haproxy.</p>
<h2>GitHub’s Setup</h2>
<p>Here’s our Unicorn config in all its glory:</p>
<script src="https://gist.github.com/206253.js"></script><noscript><pre># unicorn_rails -c /data/github/current/config/unicorn.rb -E production -D<br />
<br />
rails_env = <span class="caps">ENV</span>[‘RAILS_ENV’] || ‘production’<br />
<br />
# 16 workers and 1 master<br />
worker_processes (rails_env == ‘production’ ? 16 : 4)<br />
<br />
# Load rails+github.git into the master before forking workers<br />
# for super-fast worker spawn times<br />
preload_app true<br />
<br />
# Restart any workers that haven’t responded in 30 seconds<br />
timeout 30<br />
<br />
# Listen on a Unix data socket<br />
listen ‘/data/github/current/tmp/sockets/unicorn.sock’, :backlog => 2048<br />
<br />
##<br />
# <span class="caps">REE</span><br />
<br />
# <a href="https://www.rubyenterpriseedition.com/faq.html#adapt_apps_for_cow">https://www.rubyenterpriseedition.com/faq.html#adapt_apps_for_cow</a><br />
if GC.respond_to?(:copy_on_write_friendly=)<br />
GC.copy_on_write_friendly = true<br />
end<br />
<br />
<br />
before_fork do |server, worker|<br />
##<br />
# When sent a USR2, Unicorn will suffix its pidfile with .oldbin and<br />
# immediately start loading up a new version of itself (loaded with a new<br />
# version of our app). When this new Unicorn is completely loaded<br />
# it will begin spawning workers. The first worker spawned will check to<br />
# see if an .oldbin pidfile exists. If so, this means we’ve just booted up<br />
# a new Unicorn and need to tell the old one that it can now die. To do so<br />
# we send it a <span class="caps">QUIT</span>.<br />
#<br />
# Using this method we get 0 downtime deploys.<br />
<br />
old_pid = RAILS_ROOT + ‘/tmp/pids/unicorn.pid.oldbin’<br />
if File.exists?(old_pid) && server.pid != old_pid<br />
begin<br />
Process.kill(“<span class="caps">QUIT</span>”, File.read(old_pid).to_i)<br />
rescue Errno::<span class="caps">ENOENT</span>, Errno::<span class="caps">ESRCH</span><br />
# someone else did our job for us<br />
end<br />
end<br />
end<br />
<br />
<br />
after_fork do |server, worker|<br />
##<br />
# Unicorn master loads the app then forks off workers – because of the way<br />
# Unix forking works, we need to make sure we aren’t using any of the parent’s<br />
# sockets, e.g. db connection<br />
<br />
ActiveRecord::Base.establish_connection<br />
<span class="caps">CHIMNEY</span>.client.connect_to_server<br />
# Redis and Memcached would go here but their connections are established<br />
# on demand, so the master never opens a socket<br />
<br />
<br />
##<br />
# Unicorn master is started as root, which is fine, but let’s<br />
# drop the workers to git:git<br />
<br />
begin<br />
uid, gid = Process.euid, Process.egid<br />
user, group = ‘git’, ‘git’<br />
target_uid = Etc.getpwnam(user).uid<br />
target_gid = Etc.getgrnam(group).gid<br />
worker.tmp.chown(target_uid, target_gid)<br />
if uid != target_uid || gid != target_gid<br />
Process.initgroups(user, target_gid)<br />
Process::<span class="caps">GID</span>.change_privilege(target_gid)<br />
Process::<span class="caps">UID</span>.change_privilege(target_uid)<br />
end<br />
rescue => e<br />
if RAILS_ENV == ‘development’<br />
<span class="caps">STDERR</span>.puts “couldn’t change user, oh well”<br />
else<br />
raise e<br />
end<br />
end<br />
end</pre></noscript>
<p>I recommend making the <a href="https://unicorn.bogomips.org/SIGNALS.html"><span class="caps">SIGNALS</span></a> documentation your new home page and reading all the other pages available at the <a href="https://unicorn.bogomips.org/">Unicorn site</a>. It’s very well documented and Eric is focusing on improving it every day.</p>
<h2>Speed</h2>
<p>Honestly, I don’t care. I want a production environment that can gracefully handle chaos more than I want something that’s screaming fast. I want stability and reliability over raw speed.</p>
<p>Luckily, Unicorn seems to offer both.</p>
<p>Here are Tom’s benchmarks on our Rackspace bare metal hardware. We ran GitHub on one machine and the benchmarks on a separate machine. The servers are 8 core 16GB boxes connected via gigabit ethernet.</p>
<p>What we’re testing is a single Rails action rendering a simple string. This means each request goes through the entire Rails routing process and all that jazz.</p>
<p>Mongrel has haproxy in front of it. unicorn-tcp is using a port opened by the master, unicorn-unix with a 1024 backlog is the master opening a unix domain socket with the default “listen” backlog, and the 2048 backlog is the same setup with an increased “listen” backlog.</p>
<p>These benchmarks measure how many requests we were able to push through before getting any 502 or 500 errors. Each test uses 8 workers.</p>
<pre>
mongrel
  8: Reply rate [replies/s]:
     min 1270.4 avg 1301.7 max 1359.7 stddev 50.3 (3 samples)

unicorn-tcp
  8: Reply rate [replies/s]:
     min 1341.7 avg 1351.0 max 1360.7 stddev 7.8 (4 samples)

unicorn-unix (1024 backlog)
  8: Reply rate [replies/s]:
     min 1148.2 avg 1149.7 max 1152.1 stddev 1.8 (4 samples)

unicorn-unix (2048 backlog)
  8: Reply rate [replies/s]:
     min 1462.2 avg 1502.3 max 1538.7 stddev 39.6 (4 samples)
</pre>
<h2>Conclusion</h2>
<p><a href="https://www.modrails.com/">Passenger</a> is awesome. <a href="https://mongrel.rubyforge.org/">Mongrel</a> is awesome. <a href="https://code.macournoyer.com/thin/">Thin</a> is awesome.</p>
<p>Use what works best for you. Decide what you need and evaluate the available options based on those needs. Don’t pick a tool because GitHub uses it, pick a tool because it solves the problems you have.</p>
<p>We use Thin to serve the <a href="https://github.com/pjhyett/github-services">GitHub Services</a> and I use Passenger for many of my side projects. Unicorn isn’t for every app.</p>
<p>But it’s working great for us.</p>
<p><strong>Edit:</strong> Tweaked a diagram and clarified the Unicorn master’s role based on feedback from Eric.</p>
defunkt