GitHub Meetup SF #8
Come get your drink on with the people of the Hub at Blackbird this Thursday, October 22nd, at 8pm. Also, be sure to look out for a possible Drinkup: Shanghai, China edition in the next few days. PJ and Scott are headed out for KungFuRails right now!
-
Introducing BERT and BERT-RPC
As I detailed in How We Made GitHub Fast, we have created a new data serialization and RPC protocol to power the GitHub backend. We have big plans for these technologies and I’d like to take a moment to explain what makes them special and the philosophy behind their creation.
The serialization format is called BERT (Binary ERlang Term) and is based on
the existing external term format already implemented by Erlang. The RPC protocol is called BERT-RPC and is a simple protocol built on top of BERT packets. You can view the current specifications at https://bert-rpc.org.
This is a long article; if you want to see some example code of how easy it is to set up an Erlang/Ruby BERT-RPC server and call it from a Ruby BERT-RPC client, skip to the end.
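For a quick taste of the serialization side, here’s roughly what a round trip through the Ruby BERT library looks like (a minimal sketch using the gem’s encode/decode calls):

require 'bert'

# symbols, hashes, nil, bignums, and raw binaries all survive the round trip
packet = BERT.encode([:user, {:login => "mojombo", :admin => true}])
# packet is now a compact binary blob in Erlang's external term format
BERT.decode(packet)  # => [:user, {:login => "mojombo", :admin => true}]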
How BERT and BERT-RPC Came to Be
For the new GitHub architecture, we decided to use a simple RPC mechanism to expose the Git repositories as a service. This allows us to federate users across disparate file servers and eliminates the need for a shared file system.
Choosing a data serialization and RPC protocol was a difficult task. My first thought was to look at Thrift and Protocol Buffers since they are both gaining traction as modern, low-latency RPC implementations.
I had some contact with Thrift when I worked at Powerset, I talk to a lot of people that use Thrift at their jobs, and Scott is using Thrift as part of some Cassandra experiments we’re doing. As much as I want to like Thrift, I just can’t. I find the entire concept behind IDLs and code generation abhorrent. To someone coming from a background in dynamic languages and automated testing, these ideas just seem silly. The developer overhead required to constantly maintain IDLs and keep the corresponding implementation code up to date is too frustrating. I don’t do these things when I write application code, so why should I be forced to do them when I write RPC code?
Protocol Buffers ends up looking very similar to Thrift. More IDLs and more code generation. Any solution that relies on these concepts does not fit well with my worldview. In addition, the set of types available to both Thrift and Protocol Buffers feels limiting compared to what I’d like to easily transmit over the wire.
XML-RPC, SOAP, and other XML based protocols are hardly even worth mentioning. They are unnecessarily verbose and complex. XML is not convertible to a simple unambiguous data structure in any language I’ve ever used. I’ve wasted too many hours of my life clumsily extracting data from XML files to feel anything but animosity towards the format.
JSON-RPC is a nice system, much more in line with how I see the world. It’s simple, relatively compact, has support for a decent set of types, and works well in an agile workflow. A big problem here, though, is the lack of support for native binary data. Our applications will be transmitting large amounts of binary data, and it displeases me to think that every byte of binary data I send across the wire would have to be encoded into an inferior representation just because JSON is a text-based protocol.
After becoming thoroughly disenchanted with the current “state of the art” RPC protocols, I sat down and started thinking about what the ideal solution would look like. I came up with a list that looked something like this:
- Extreme simplicity
- Dynamic (No IDLs or code generation)
- Good set of types (nil, symbols, hashes, bignums, heterogeneous arrays, etc)
- Support for complex types (Time, Regex, etc)
- No need to encode binary data
- Synchronous and Asynchronous calls
- Fast serialization/deserialization
- Streaming (to and from)
- Caching directives
I mentioned before that I like JSON. I love the concept of extracting a subset of a language and using that to facilitate interprocess communication. This got me thinking about the work I’d done with Erlectricity. About two years ago I wrote a C extension for Erlectricity to speed up the deserialization of Erlang’s external term format. I remember being very impressed with the simplicity of the serialization format and how easy it was to parse. Since I was considering using Erlang more within the GitHub architecture, an Erlang-centric solution might be really nice. Putting these pieces together, I was struck by an idea.
What if I extracted the generic parts of Erlang’s external term format and made that into a standard for interprocess communication? What if Erlang had the equivalent of JavaScript’s JSON? And what if an RPC protocol could be built on top of that format? What would those things look like and how simple could they be made?
Of course, the first thing any project needs is a good name, so I started brainstorming acronyms. EETF (Erlang External Term Format) is the obvious one, but it’s boring and not accurate for what I wanted to do since I would only be using a subset of EETF. After a while I came up with BERT for Binary ERlang Term. Not only did this moniker precisely describe the nature of the idea, but it was nearly a person’s name, just like JSON, offering a tip of the hat to my source of inspiration.
Over the next few weeks I sketched out specifications for BERT and BERT-RPC and showed them to a bunch of my developer friends. I got some great feedback on ways to simplify some confusing parts of the spec and was able to boil things down to what I think is the simplest manifestation that still enables the rich set of features that I want these technologies to support.
The responses were generally positive, and I found a lot of people looking for something simple to replace the nightmarish solutions they were currently forced to work with. If there’s one thing I’ve learned in doing open source over the last 5 years, it’s that if I find an idea compelling, then there are probably a boatload of people out there that will feel the same way. So I went ahead with the project and created reference implementations in Ruby that would eventually become the backbone of the new GitHub architecture.
But enough talk, let’s take a look at the Ruby workflow and you’ll see what I mean when I say that BERT and BERT-RPC are built around a philosophy of simplicity and Getting Things Done.
A Simple Example
To give you an idea of how easy it is to get a Ruby based BERT-RPC service running, consider the following simple calculator service:
# calc.rb
require 'ernie'

mod(:calc) do
  fun(:add) do |a, b|
    a + b
  end
end
This is a complete service file suitable for use by my Erlang/Ruby hybrid BERT-RPC server framework called Ernie. You start up the service like so:
$ ernie -p 9999 -n 10 -h calc.rb
This fires up the server on port 9999 and spawns ten Ruby workers to handle requests. Ernie takes care of balancing and queuing incoming connections. All you have to worry about is writing your RPC functions; Ernie takes care of the rest.
To call the service, you can use my Ruby BERT-RPC client called BERTRPC like so:
require 'bertrpc'

svc = BERTRPC::Service.new('localhost', 9999)
svc.call.calc.add(1, 2)
# => 3
That’s it! Nine lines of code to a working example. No IDLs. No code generation. If the module and function that you call from the client exist on the server, then everything goes well. If they don’t, then you get an exception, just like your application code.
Since a BERT-RPC client can be written in any language, you could easily call the calculator service from Python or JavaScript or Lua or whatever. BERT and BERT-RPC are intended to make communicating between different languages as streamlined as possible.
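There’s no magic on the wire, either. Per the BERT-RPC spec, the client sends a BERT-encoded call tuple (each packet framed by a four-byte length header) and the server answers with a reply tuple. A rough sketch of those two terms using the BERT library (illustrative, not the literal bytes the BERTRPC gem produces):

require 'bert'

# request:  {call, Module, Function, Arguments}
request = BERT.encode(BERT::Tuple[:call, :calc, :add, [1, 2]])

# response: {reply, Result}
reply = BERT.encode(BERT::Tuple[:reply, 3])
BERT.decode(reply)  # => the tuple [:reply, 3]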
Conclusion
The Ernie framework and the BERTRPC library power the new GitHub and we use them exactly as-is. They’ve been in use since the move to Rackspace three weeks ago and are responsible for serving over 300 million RPC requests in that period. They are still incomplete implementations of the spec, but I plan to flesh them out as time goes on.
If you find BERT and BERT-RPC intriguing, I’d love to hear your feedback. The best place to hold discussions is on the official mailing list. If you want to participate, I’d love to see implementations in more languages. Together, we can make BERT and BERT-RPC the easiest way to get RPC done in every language!
-
How We Made GitHub Fast
Now that things have settled down from the move to Rackspace, I wanted to take some time to go over the architectural changes that we’ve made in order to bring you a speedier, more scalable GitHub.
In my first draft of this article I spent a lot of time explaining why we made each of the technology choices that we did. After a while, however, it became difficult to separate the architecture from the discourse and the whole thing became confusing. So I’ve decided to simply explain the architecture and then write a series of follow up posts with more detailed analyses of exactly why we made the choices we did.
There are many ways to scale modern web applications. What I will be describing here is the method that we chose. This should by no means be considered the only way to scale an application. Consider it a case study of what worked for us given our unique requirements.
Understanding the Protocols
We expose three primary protocols to end users of GitHub: HTTP, SSH, and Git. When browsing the site with your favorite browser, you’re using HTTP. When you clone, pull, or push to a private URL like git@github.com:mojombo/jekyll.git you’re doing so via SSH. When you clone or pull from a public repository via a URL like git://github.com/mojombo/jekyll.git you’re using the Git protocol.

The easiest way to understand the architecture is by tracing how each of these requests propagates through the system.
Tracing an HTTP Request
For this example I’ll show you how a request for a tree page such as https://github.com/mojombo/jekyll happens.
The first thing your request hits after coming down from the internet is the active load balancer. For this task we use a pair of Xen instances running ldirectord. These are called lb1a and lb1b. At any given time one of these is active and the other is waiting to take over in case of a failure in the master. The load balancer doesn’t do anything fancy. It forwards TCP packets to various servers based on the requested IP and port and can remove misbehaving servers from the balance pool if necessary. In the event that no servers are available for a given pool it can serve a simple static site instead of refusing connections.

For requests to the main website, the load balancer ships your request off to one of the four frontend machines. Each of these is an 8 core, 16GB RAM bare metal server. Their names are fe1, …, fe4. Nginx accepts the connection and sends it to a Unix domain socket upon which sixteen Unicorn worker processes are selecting. One of these workers grabs the request and runs the Rails code necessary to fulfill it.

Many pages require database lookups. Our MySQL database runs on two 8 core, 32GB RAM bare metal servers with 15k RPM SAS drives. Their names are db1a and db1b. At any given time, one of them is master and one is slave. MySQL replication is accomplished via DRBD.

If the page requires information about a Git repository and that data is not cached, then it will use our Grit library to retrieve the data. In order to accommodate our Rackspace setup, we’ve modified Grit to do something special. We start by abstracting out every call that needs access to the filesystem into the Grit::Git object. We then replace Grit::Git with a stub that makes RPC calls to our Smoke service. Smoke has direct disk access to the repositories and essentially presents Grit::Git as a service. It’s called Smoke because Smoke is just Grit in the cloud. Get it?
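Conceptually the stub is tiny: it takes the same calls Grit::Git would have run against the local disk and forwards them over BERT-RPC instead. A sketch of the idea (the port and the 'store' module name here are placeholders, not our real values):

require 'bertrpc'

# stand-in for the stub we swap in for Grit::Git
class SmokeStub
  def initialize(repo_path)
    @repo_path = repo_path
    @svc = BERTRPC::Service.new('smoke', 8149)  # placeholder port
  end

  # each filesystem-touching Grit::Git method gets a forwarding twin like this,
  # executed on the file server that actually holds the repository
  def rev_list(options, ref)
    @svc.call.store.rev_list(@repo_path, options, ref)
  end
end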
The stubbed Grit makes RPC calls to smoke, which is a load balanced hostname that maps back to the fe machines. Each frontend runs four ProxyMachine instances behind HAProxy that act as routing proxies for Smoke calls. ProxyMachine is my content aware (layer 7) TCP routing proxy that lets us write the routing logic in Ruby. The proxy examines the request and extracts the username of the repository that has been specified. We then use a proprietary library called Chimney (it routes the smoke!) to look up the route for that user. A user’s route is simply the hostname of the file server on which that user’s repositories are kept.

Chimney finds the route by making a call to Redis. Redis runs on the database servers. We use Redis as a persistent key/value store for the routing information and a variety of other data.
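Chimney itself is proprietary, but the lookup it does is conceptually a one-liner against Redis. A sketch (the key scheme is invented for illustration):

require 'redis'

redis = Redis.new(:host => "db1a", :port => 6379)

# a user's route is just the hostname of the file server holding their repositories
def route_for(redis, username)
  redis.get("route:#{username}")  # hypothetical key naming
end

route_for(redis, "mojombo")  # => e.g. "fs3a"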
Once the Smoke proxy has determined the user’s route, it establishes a transparent proxy to the proper file server. We have four pairs of file servers. Their names are fs1a, fs1b, …, fs4a, fs4b. These are 8 core, 16GB RAM bare metal servers, each with six 300GB 15K RPM SAS drives arranged in RAID 10. At any given time one server in each pair is active and the other is waiting to take over should there be a fatal failure in the master. All repository data is constantly replicated from the master to the slave via DRBD.

Every file server runs two Ernie RPC servers behind HAProxy. Each Ernie spawns 15 Ruby workers. These workers take the RPC call, reconstitute it, and perform the Grit call. The response is sent back through the Smoke proxy to the Rails app where the Grit stub returns the expected Grit response.
When Unicorn is finished with the Rails action, the response is sent back through Nginx and directly to the client (outgoing responses do not go back through the load balancer).
Finally, you see a pretty web page!
The above flow is what happens when there are no cache hits. In many cases the Rails code uses Evan Weaver’s Ruby memcached client to query the Memcache servers that run on each slave file server. Since these machines are otherwise idle, we place 12GB of Memcache on each. These servers are aliased as memcache1, …, memcache4.

BERT and BERT-RPC
For our data serialization and RPC protocol we are using BERT and BERT-RPC. You haven’t heard of them before because they’re brand new. I invented them because I was not satisfied with any of the available options that I evaluated, and I wanted to experiment with an idea that I’ve had for a while. Before you freak out about NIH syndrome (or to help you refine your freak out), please read my accompanying article Introducing BERT and BERT-RPC about how these technologies came to be and what I intend for them to solve.
If you’d rather just check out the spec, head over to https://bert-rpc.org.
For the code hungry, check out my Ruby BERT serialization library BERT, my Ruby BERT-RPC client BERTRPC, and my Erlang/Ruby hybrid BERT-RPC server Ernie. These are the exact libraries we use at GitHub to serve up all repository data.
Tracing an SSH Request
Git uses SSH for encrypted communications between you and the server. In order to understand how our architecture deals with SSH connections, it is first important to understand how this works in a simpler setup.
Git relies on the fact that SSH allows you to execute commands on a remote server. For instance, the command ssh tom@frost ls -al runs ls -al in the home directory of my user on the frost server. I get the output of the command on my local terminal. SSH is essentially hooking up the STDIN, STDOUT, and STDERR of the remote machine to my local terminal.

If you run a command like git clone tom@frost:mojombo/bert, what Git is doing behind the scenes is SSHing to frost, authenticating as the tom user, and then remotely executing git upload-pack mojombo/bert. Now your client can talk to that process on the remote server by simply reading and writing over the SSH connection. Neat, huh?

Of course, allowing arbitrary execution of commands is unsafe, so SSH includes the ability to restrict what commands can be executed. In a very simple case, you can restrict execution to git-shell which is included with Git. All this script does is check the command that you’re trying to execute and ensure that it’s one of git upload-pack, git receive-pack, or git upload-archive. If it is indeed one of those, it uses exec to replace the current process with that new process. After that, it’s as if you had just executed that command directly.

So, now that you know how Git’s SSH operations work in a simple case, let me show you how we handle this in GitHub’s architecture.
First, your Git client initiates an SSH session. The connection comes down off the internet and hits our load balancer.
From there, the connection is sent to one of the frontends where SSHD accepts it. We have patched our SSH daemon to perform public key lookups from our MySQL database. Your key identifies your GitHub user and this information is sent along with the original command and arguments to our proprietary script called Gerve (Git sERVE). Think of Gerve as a super smart version of git-shell.
Once access has been verified, Gerve uses Chimney to look up the route for the owner of the repository. The goal now is to execute your original command on the proper file server and hook your local machine up to that process. What better way to do this than with another remote SSH execution!
I know it sounds crazy but it works great. Gerve simply uses exec(3) to replace itself with a call to ssh git@<route> <command> <arg>. After this call, your client is hooked up to a process on a frontend machine which is, in turn, hooked up to a process on a file server.

Think of it this way: after determining permissions and the location of the repository, the frontend becomes a transparent proxy for the rest of the session. The only drawback to this approach is that the internal SSH is unnecessarily encumbered by the overhead of encryption/decryption when none is strictly required. It’s possible we may replace this internal SSH call with something more efficient, but this approach is just too damn simple (and still very fast) to make me worry about it very much.
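Gerve is proprietary so I won’t reproduce it here, but the flow above boils down to something like the following sketch (the permission check and route lookup are simplified stand-ins for the real MySQL and Chimney calls):

#!/usr/bin/env ruby
# called by our patched sshd with the authenticated GitHub user as ARGV[0];
# the original git command arrives in SSH_ORIGINAL_COMMAND
user    = ARGV[0]
command = ENV['SSH_ORIGINAL_COMMAND'].to_s  # e.g. "git upload-pack 'mojombo/bert'"

m = command.match(/\A(git upload-pack|git receive-pack|git upload-archive) '(.+)'\z/)
abort "Invalid command" unless m
verb, repo = m[1], m[2]

owner = repo.split('/').first
abort "Access denied" unless owner == user  # the real check consults MySQL for permissions
route = "fs1a"                              # the real lookup asks Chimney for the owner's file server

# become a transparent proxy: replace this process with an ssh to the file server
exec("ssh", "git@#{route}", "#{verb} '#{repo}'")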
Tracing a Git Request
Performing public clones and pulls via Git is similar to how the SSH method works. Instead of using SSH for authentication and encryption, however, it relies on a server side Git Daemon. This daemon accepts connections, verifies the command to be run, and then uses fork(2) and exec(3) to spawn a worker that then becomes the command process.

With this in mind, I’ll show you how a public clone operation works.
First, your Git client issues a request containing the command and repository name you wish to clone. This request enters our system on the load balancer.
From there, the request is sent to one of the frontends. Each frontend runs four ProxyMachine instances behind HAProxy that act as routing proxies for the Git protocol. The proxy inspects the request and extracts the username (or gist name) of the repo. It then uses Chimney to look up the route. If there is no route or any other error is encountered, the proxy speaks the Git protocol and sends back an appropriate message to the client. Once the route is known, the repo name (e.g. mojombo/bert) is translated into its path on disk (e.g. a/a8/e2/95/mojombo/bert.git). On our old setup that had no proxies, we had to use a modified daemon that could convert the user/repo into the correct filepath. By doing this step in the proxy, we can now use an unmodified daemon, allowing for a much easier upgrade path.

Next, the Git proxy establishes a transparent proxy with the proper file server and sends the modified request (with the converted repository path). Each file server runs two Git Daemon processes behind HAProxy. The daemon speaks the pack file protocol and streams data back through the Git proxy and directly to your Git client.
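The exact scheme for deriving that a/a8/e2/95/mojombo/bert.git style path isn’t important to the architecture; purely as an illustration of the shape of the mapping, a hash-based sharding function could look like this:

require 'digest/md5'

# illustrative only: shard repositories across directories by hashing the user name
def repo_path(user, repo)
  hex = Digest::MD5.hexdigest(user)
  "#{hex[0, 1]}/#{hex[0, 2]}/#{hex[2, 2]}/#{hex[4, 2]}/#{user}/#{repo}.git"
end

repo_path("mojombo", "bert")  # => a path of the same shape as a/a8/e2/95/mojombo/bert.git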
Once your client has all the data, you’ve cloned the repository and can get to work!
Sub- and Side-Systems
In addition to the primary web application and Git hosting systems, we also run a variety of other sub-systems and side-systems. Sub-systems include the job queue, archive downloads, billing, mirroring, and the svn importer. Side-systems include GitHub Pages, Gist, gem server, and a bunch of internal tools. You can look forward to explanations of how some of these work within the new architecture, and what new technologies we’ve created to help our application run more smoothly.
Conclusion
The architecture outlined here has allowed us to properly scale the site and resulted in massive performance increases across the entire site. Our average Rails response time on our previous setup was anywhere from 500ms to several seconds depending on how loaded the slices were. Moving to bare metal and federated storage on Rackspace has brought our average Rails response time to consistently under 100ms. In addition, the job queue now has no problem keeping up with the 280,000 background jobs we process every day. We still have plenty of headroom to grow with the current set of hardware, and when the time comes to add more machines, we can add new servers on any tier with ease. I’m very pleased with how well everything is working, and if you’re like me, you’re enjoying the new and improved GitHub every day!
-
Ryan Tomayko is a GitHubber
Today marks Ryan Tomayko’s first day as a GitHubber. He’ll be helping make GitHub more stable, reliable, and awesome.
Ryan has consistently impressed all of us with his work on Sinatra and Rack, the awesomeness of shotgun and git-sh, his prolific writing and linking, and his various other projects.
You can follow his blog, his Twitter, or his GitHub.
Welcome to the team, Ryan!
-
Scheduled Maintenance Tonight at 23:00 PDT
We will be having a maintenance window tonight from 23:00 to 23:59 PDT. A very small amount of web unavailability will be required during this period.
We will be upgrading some core libraries to versions that are not compatible with what is currently running, so all daemons must be restarted simultaneously. For this to go smoothly, we will be disabling the web app for perhaps 30 seconds.
UPDATE: Maintenance was completed successfully. Total web unavailability was a tad more than estimated, at one minute and 40 seconds. Some job runners did not restart cleanly and as a result some jobs failed, but all job runners are operating normally now. If you experienced any problems during the maintenance window, don’t hesitate to contact us at https://support.github.com.
-
Helping with Texting
UNICEF is using SMS to help those in need. And they’re doing it with open source.
You can read all about RapidSMS, their Mobile and SMS platform, but here’s a snippet:
The impact a RapidSMS implementation has on UNICEF’s work practices is dramatic. In October 2008, Ethiopia experienced crippling droughts. Faced with the possibility of famine, UNICEF Ethiopia launched a massive food distribution program to supply the high-protein food Plumpy’nut to under-nourished children at more than 1,800 feeding centres in the country. Previously, UNICEF monitored the distribution of food by sending a small set of individuals who traveled to each feeding center. The monitor wrote down the amount of food that was received, the amount that was distributed, and whether more food was needed. There had been a two-week to two-month delay between the collection of that data and its analysis, prolonging action. In a famine situation each day can mean the difference between recovery, starvation, or even death.
The Ethiopian implementation of RapidSMS completely eliminated the delay. After a short training session the monitors would enter information directly into their mobile phones as SMS messages. This data would instantaneously appear on the server and immediately be visualized into graphs showing potential distribution problems and displayed on a map clearly showing where the problems were. The data could be seen, not only by the field office, but by the regional office, supply division and even headquarters, greatly improving response coordination. The process of entering the data into phones was also easier and more cost effective for the monitors themselves, leading to quick adoption of the technology.
What a great use of technology. The site says, “GSMA [predicts] that by 2010, 90% of the world will be covered by mobile networks.” Seems like SMS is going to become more important and more ubiquitous in the future.
Check out the RapidSMS home page or browse the source, right here on GitHub: https://github.com/rapidsms/rapidsms
-
Scheduled Maintenance Tonight at 22:00 PDT
We’re having another maintenance window tonight from 22:00 to 23:00 PDT. We will be installing and testing the “sorry server” that will be enabled if no frontends are available to serve requests. Instead of just refusing connections, this server will point you to the Twitter status feed and display information on surviving when GitHub is down. In order to test that everything is working properly we will need to deliberately remove all the frontends from the load balancer for a small period. Git and SSH access to repositories will be unaffected during this period.
-
GitHub Ribbon in CSS
jbalogh has a write-up on how to implement GitHub’s Ribbons in pure CSS: Redoing the GitHub Ribbon in CSS
Daniel Perez Alvarez has a similar write-up which includes a nice “supported browsers” table. Awesome work!
-
TUAW on GitHub
The Unofficial Apple Weblog, or TUAW, is now on GitHub: https://github.com/tuaw
While there’s nothing there yet, they only announced the account yesterday and said in the post, “This is where you’ll be able to find code for our developer-related posts. We’ll try to get some projects hosted in there very soon, so don’t worry that it’s empty now!”
Today they’ve featured ClickToFlash, everyone’s favorite Safari plugin, in a blog post.
Welcome, TUAWers!
-
Gist Improvements
Another day, more updates! We just rolled out some subtle changes to Gist:
- Search! All gists are now searchable. Want to find all gists about unicorn? Done.
- Unified user box. You may have noticed we rolled out a new user box a couple days ago – that now follows you to Gist as well.
- Added your most recent gists to the home page.
- Tweaked UI. Some of the design elements of Gist have changed a bit to be more consistent with the rest of the site.
Hope you enjoy!
-
Speedy Version Sorting
Last week I offered fame and fortune to anyone who could speed up our version_sorter. It’s used to sort a repo’s tags.
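For the unfamiliar, version sorting means ordering tags numerically segment by segment, so v1.0.10 sorts after v1.0.9 rather than lexically. A naive pure-Ruby take on the idea (not one of the entries benchmarked below) looks something like this:

# naive illustration only; nowhere near as fast as the entries below
def version_sort(tags)
  tags.sort_by { |tag| tag.scan(/\d+/).map { |n| n.to_i } }
end

version_sort(%w[v1.0.10 v1.0.2 v1.0.9])  # => ["v1.0.2", "v1.0.9", "v1.0.10"]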
This morning I ran the numbers and the winner is…
Pope!
Special thanks to binary42, pope, jordi, ahoward, jqr, and mikeauclair for speeding up the code.
Here are my benchmarks from fastest to slowest. I used this script with this dataset to run them.
version_sorter benchmarks
sorting 1,311 tags 100 times

original          user     system      total        real
sort         49.840000   0.570000  50.410000 ( 60.088636)
rsort        51.610000   0.610000  52.220000 ( 61.462576)
-----------------------------------------------------------------
pope              user     system      total        real
sort          0.650000   0.010000   0.660000 (  0.686630)
rsort         0.740000   0.010000   0.750000 (  0.806579)
-----------------------------------------------------------------
jordi             user     system      total        real
sort          1.770000   0.020000   1.790000 (  1.930918)
rsort         2.240000   0.020000   2.260000 (  2.477109)
-----------------------------------------------------------------
ahoward           user     system      total        real
sort          2.360000   0.020000   2.380000 (  2.581706)
rsort         2.480000   0.030000   2.510000 (  2.796861)
-----------------------------------------------------------------
binary42          user     system      total        real
sort          4.170000   0.050000   4.220000 (  4.693593)
rsort         4.470000   0.050000   4.520000 (  5.112159)
-----------------------------------------------------------------
mikeauclair       user     system      total        real
sort         44.060000   0.530000  44.590000 ( 54.701128)
rsort        46.280000   0.540000  46.820000 ( 54.965692)
-----------------------------------------------------------------
jqr               user     system      total        real
sort         48.800000   0.540000  49.340000 ( 56.063984)
rsort        50.970000   0.580000  51.550000 ( 59.799366)
-----------------------------------------------------------------
Pope wrote a C extension, but jordi and ahoward had impressive pure-Ruby implementations as well. Check out all the entries:
-
A Note on Today's Outage
We had an outage this morning from 06:32 to 07:42 PDT. One of the file servers experienced an unusually high load that caused the heartbeat monitor on that file server pair to behave abnormally and confuse the dynamic hostname that points to the active file server in the pair. This in turn caused the frontends to start timing out and resulted in their removal from the load balancer. Here is what we intend to do to prevent this from happening in the future:
- The slave file servers are still in standby mode from the migration. We will have a maintenance window tonight at 22:00 PDT in order to ensure that slaves are ready to take over as master should the existing masters exhibit this kind of behavior.
- To identify the root cause of the load spikes we will be enabling process accounting on the file servers so that we may inspect what processes are causing the high load.
- As a related item, the site still gives a “connection refused” error when all the frontends are out of load balancer rotation. We are working on determining why the placeholder site that should be shown during this type of outage is not being brought up.
- We’ve also identified a problem with the single unix domain socket upstream approach in Nginx. By default, any upstream failures cause Nginx to consider that upstream defunct and remove it from service for a short period. With only a single upstream, this obviously presents a problem. We are testing a change to the configuration that should make Nginx always try upstreams.
We apologize for the downtime and any inconvenience it may have caused. Thank you for your patience and understanding as we continue to refine our Rackspace setup and deal with unanticipated events.
-
unicorn.god
Some people have been asking for our Unicorn god config.
Here it is:
That’s for starting and stopping the master. It’s important to note that god only knows about the master – not the workers. The memory limit condition, then, only applies to the master (and is probably never hit).
To watch the workers we use a cute hack mojombo came up with (though he promises first class support in future versions of god): we start a thread and periodically check the memory usage of workers. If a worker is gobbling up more than 300MB of RSS, we send it a QUIT. The QUIT tells it to die once it finishes processing the current request. Once that happens the master will spawn a new worker – we should hardly notice.
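In rough outline, the config amounts to a god watch for the Unicorn master plus that worker-watching thread. Here’s a minimal sketch (paths, intervals, and the master’s memory limit are placeholders; the 300MB worker RSS limit and the QUIT signal come from the description above):

rails_root = "/data/github/current"  # placeholder path

God.watch do |w|
  w.name     = "unicorn"
  w.interval = 30.seconds
  w.start    = "cd #{rails_root} && unicorn_rails -c config/unicorn.rb -E production -D"
  w.stop     = "kill -QUIT `cat #{rails_root}/tmp/pids/unicorn.pid`"
  w.restart  = "kill -USR2 `cat #{rails_root}/tmp/pids/unicorn.pid`"
  w.pid_file = "#{rails_root}/tmp/pids/unicorn.pid"
  w.behavior(:clean_pid_file)

  # keep the master running
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.running = false
    end
  end

  # the memory limit only applies to the master (and is probably never hit)
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 300.megabytes  # placeholder limit
      c.times = [3, 5]
    end
  end
end

# the worker hack: poll worker RSS and QUIT any worker over 300MB;
# the master then forks a fresh replacement
Thread.new do
  loop do
    `ps -e -o pid,rss,command`.each_line do |line|
      next unless line =~ /unicorn.* worker/
      pid, rss = line.split
      Process.kill(:QUIT, pid.to_i) if rss.to_i > 300 * 1024  # ps reports rss in KB
    end
    sleep 60
  end
end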
That’s it! Don’t forget the Unicorn Signals page when working with Unicorn.
-
Gemcutter Railscast
rbates has a great (as always) screencast on Jeweler and Gemcutter.
Check it out and give Gemcutter a try!
(mg looks like another good tool to help you create gems, too (though I haven’t used it).)
-
Unicorn!
We’ve been running Unicorn for more than a month. Time to talk about it.
What is it?
Unicorn is an HTTP server for Ruby, similar to Mongrel or Thin. It uses Mongrel’s Ragel HTTP parser but has a dramatically different architecture and philosophy.
In the classic setup you have nginx sending requests to a pool of mongrels using a smart balancer or a simple round robin.
Eventually you want better visibility and reliability out of your load balancing situation, so you throw haproxy into the mix:
Which works great. We ran this setup for a long time and were very happy with it. However, there are a few problems.
Slow Actions
When actions take longer than 60s to complete, Mongrel will try to kill the thread. This has proven unreliable due to Ruby’s threading. Mongrels will often get into a “stuck” state and need to be killed by some external process (e.g. god or monit).
Yes, this is a problem with our application. No action should ever take 60s. But we have a complicated application with many moving parts and things go wrong. Our production environment needs to handle errors and failures gracefully.
Memory Growth
We restart mongrels that hit a certain memory threshold. This is often a problem with parts of our application. Engine Yard has a great post on memory bloat and how to deal with it.
Like slow actions, however, it happens. You need to be prepared for things to not always be perfect, and so does your production environment. We don’t kill app servers often due to memory bloat, but it happens.
Slow Deploys
When your server’s CPU is pegged, restarting 9 mongrels hurts. Each one has to load all of Rails, all your gems, all your libraries, and your app into memory before it can start serving requests. They’re all doing the exact same thing but fighting each other for resources.
During that time, you’ve killed your old mongrels so any users hitting your site have to wait for the mongrels to be fully started. If you’re really overloaded, this can result in 10s+ waits. Ouch.
There are some complicated solutions that automate “rolling restarts” with multiple haproxy setups and restarting mongrels in different pools. But, as I said, they’re complicated and not foolproof.
Slow Restarts
As with the deploys, any time a mongrel is killed due to memory growth or timeout problems it will take multiple seconds until it’s ready to serve requests again. During peak load this can have a noticeable impact on the site’s responsiveness.
Push Balancing
With most popular load balancing solutions, requests are handed to a load balancer that decides which mongrel will service them. The better the load balancer, the smarter it is about knowing who is ready.
This is typically why you’d graduate from an nginx-based load balancing solution to haproxy: haproxy is better at queueing up requests and handing them to mongrels who can actually serve them.
At the end of the day, though, the load balancer is still pushing requests to the mongrels. You run the risk of pushing a request to a mongrel who may not be the best candidate for serving a request at that time.
Unicorn
Unicorn has a slightly different architecture. Instead of the nginx => haproxy => mongrel cluster setup you end up with something like:
nginx sends requests directly to the Unicorn worker pool over a Unix Domain Socket (or TCP, if you prefer). The Unicorn master manages the workers while the OS handles balancing, which we’ll talk about in a second. The master itself never sees any requests.
Here’s the only difference between our nginx => haproxy and nginx => unicorn configs:
# port 3000 is haproxy
upstream github {
  server 127.0.0.1:3000;
}

# unicorn master opens a unix domain socket
upstream github {
  server unix:/data/github/current/tmp/sockets/unicorn.sock;
}
When the Unicorn master starts, it loads our app into memory. As soon as it’s ready to serve requests it forks 16 workers. Those workers then select() on the socket, only serving requests they’re capable of handling. In this way the kernel handles the load balancing for us.
Slow Actions
The Unicorn master process knows exactly how long each worker has been processing a request. If a worker takes longer than 30s (we lowered it from mongrel’s default of 60s) to respond, the master immediately kills the worker and forks a new one. The new worker is instantly able to serve a new request – no multi-second startup penalty.
When this happens the client is sent a 502 error page. You may have seen ours and wondered what it meant. Usually it means your request was killed before it completed.
Memory Growth
When a worker is using too much memory, god or monit can send it a QUIT signal. This tells the worker to die after finishing the current request. As soon as the worker dies, the master forks a new one which is instantly able to serve requests. In this way we don’t have to kill your connection mid-request or take a startup penalty.
Slow Deploys
Our deploys are ridiculous now. Combined with our custom Capistrano recipes, they’re very fast. Here’s what we do.
First we send the existing Unicorn master a USR2 signal. This tells it to begin starting a new master process, reloading all our app code. When the new master is fully loaded it forks all the workers it needs. The first worker forked notices there is still an old master and sends it a QUIT signal.
When the old master receives the QUIT, it starts gracefully shutting down its workers. Once all the workers have finished serving requests, it dies. We now have a fresh version of our app, fully loaded and ready to receive requests, without any downtime: the old and new workers all share the Unix Domain Socket so nginx doesn’t have to even care about the transition.
We can also use this process to upgrade Unicorn itself.
What about migrations? Simple: just throw up a “The site is temporarily down for maintenance” page, run the migration, restart Unicorn, then remove the downtime page. Same as it ever was.
Slow Restarts
As mentioned above, restarts are only slow when the master has to start. Workers can be killed and re-fork() incredibly fast.
When we are doing a full restart, only one process is ever loading all the app code: the master. There are no wasted cycles.
Push Balancing
Instead of being pushed requests, workers pull requests. Ryan Tomayko has a great article on the nitty gritties of this process titled I like Unicorn because it’s Unix.
Basically, a worker asks for a request when it’s ready to serve one. Simple.
Migration Strategy
So, you want to migrate from thin or mongrel cluster to Unicorn? If you’re running an nginx => haproxy => cluster setup it’s pretty easy. Instead of changing any settings, you can simply tell the Unicorn workers to listen on a TCP port when they are forked. These ports can match the ports of your current mongrels.
Check out the Configurator documentation for an example of this method. Specifically this part:
after_fork do |server, worker|
  # per-process listener ports for debugging/admin/migrations
  addr = "127.0.0.1:#{9293 + worker.nr}"
  server.listen(addr, :tries => -1, :delay => 5, :tcp_nopush => true)
end
This tells each worker to start listening on a port equal to their worker # + 9293 forever – they’ll keep trying to bind until the port is available.
Using this trick you can start up a pool of Unicorn workers, then shut down your existing pool of mongrel or thin app servers when the Unicorns are ready. The workers will bind to the ports as soon as possible and start serving requests.
It’s a good way to get familiar with Unicorn without touching your haproxy or nginx configs.
(For fun, try running “kill -9” on a worker then doing a “ps aux”. You probably won’t even notice it was gone.)
Once you’re comfortable with Unicorn and have your deploy scripts ready, you can modify nginx’s upstream to use Unix Domain Sockets then stop opening ports in the Unicorn workers. Also, no more haproxy.
GitHub’s Setup
Here’s our Unicorn config in all its glory:
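In outline it boils down to something like the following sketch; the worker count, timeout, and socket path match what’s described in this post, while the rest is standard Unicorn boilerplate rather than our exact file:

rails_root = "/data/github/current"

worker_processes 16
working_directory rails_root
listen "#{rails_root}/tmp/sockets/unicorn.sock", :backlog => 2048
timeout 30
pid "#{rails_root}/tmp/pids/unicorn.pid"
preload_app true

after_fork do |server, worker|
  # the first worker forked from a new master tells the old master to quit,
  # completing the zero-downtime USR2 restart described above
  if worker.nr == 0
    old_pid = "#{server.config[:pid]}.oldbin"
    if File.exist?(old_pid)
      begin
        Process.kill(:QUIT, File.read(old_pid).to_i)
      rescue Errno::ENOENT, Errno::ESRCH
        # the old master is already gone
      end
    end
  end

  # per-worker connections (database, memcached, etc.) get re-established here
end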
I recommend making the SIGNALS documentation your new home page and reading all the other pages available at the Unicorn site. It’s very well documented and Eric is focusing on improving it every day.
Speed
Honestly, I don’t care. I want a production environment that can gracefully handle chaos more than I want something that’s screaming fast. I want stability and reliability over raw speed.
Luckily, Unicorn seems to offer both.
Here are Tom’s benchmarks on our Rackspace bare metal hardware. We ran GitHub on one machine and the benchmarks on a separate machine. The servers are 8 core 16GB boxes connected via gigabit ethernet.
What we’re testing is a single Rails action rendering a simple string. This means each request goes through the entire Rails routing process and all that jazz.
Mongrel has haproxy in front of it. unicorn-tcp is using a port opened by the master, unicorn-unix with a 1024 backlog is the master opening a unix domain socket with the default “listen” backlog, and the 2048 backlog is the same setup with an increased “listen” backlog.
These benchmarks examine as many requests as we were able to push through before getting any 502 or 500 errors. Each test uses 8 workers.
mongrel 8:
  Reply rate [replies/s]: min 1270.4 avg 1301.7 max 1359.7 stddev 50.3 (3 samples)
unicorn-tcp 8:
  Reply rate [replies/s]: min 1341.7 avg 1351.0 max 1360.7 stddev 7.8 (4 samples)
unicorn-unix (1024 backlog) 8:
  Reply rate [replies/s]: min 1148.2 avg 1149.7 max 1152.1 stddev 1.8 (4 samples)
unicorn-unix (2048 backlog) 8:
  Reply rate [replies/s]: min 1462.2 avg 1502.3 max 1538.7 stddev 39.6 (4 samples)
Conclusion
Passenger is awesome. Mongrel is awesome. Thin is awesome.
Use what works best for you. Decide what you need and evaluate the available options based on those needs. Don’t pick a tool because GitHub uses it, pick a tool because it solves the problems you have.
We use Thin to serve the GitHub Services and I use Passenger for many of my side projects. Unicorn isn’t for every app.
But it’s working great for us.
Edit: Tweaked a diagram and clarified the Unicorn master’s role based on feedback from Eric.
-
luckiestmonkey on Oct 21
-
mojombo on Oct 20
-
mojombo on Oct 20
-
defunkt on Oct 19
-
mojombo on Oct 18
-
defunkt on Oct 16
-
mojombo on Oct 15
-
defunkt on Oct 15
-
defunkt on Oct 14
-
kneath on Oct 13
-
defunkt on Oct 13
-
mojombo on Oct 13
-
defunkt on Oct 12
-
defunkt on Oct 12
-
defunkt on Oct 09