Mac OS X 10.4 Tiger introduced a new program called launchd. This came as a surprise to many developers, system administrators and computer hobbyists. The reactions typically went in the following order:
- What was wrong with the old way?
- Merger of cron, init, inetd and friends? Are you crazy? Isn’t that risky and against the Unix philosophy?
- What? You took away dependencies and you call that a feature?!? You definitely are crazy.
This blog will, over time, set the record straight on all counts. But first, I’d like to start at the beginning. I want you, the reader, to mentally back up a bit… No wait, a lot. In fact, so far that we’re not even thinking about code or technology anymore. For as long as people have organized themselves to accomplish a greater goal, bootstrapping newcomers has been a problem, and it will continue to be one. In the case of computers, their genesis wasn’t because somebody was anxious to write programs. Computers were built because people were interested in solving problems or asking questions that were formerly impractical to answer.
To that same end, the launchd project doesn’t exist because we think spawning programs is fun, interesting or even challenging. The launchd project exists to help developers bootstrap their slice of code into the relatively large operating system. So what does Mac OS X have in common with any other operating system, and how does launchd make implementing Mac OS X easier?
An operating system is, first and foremost, written and designed by people. Humans are imperfect creatures, and it would be naïve to think that any person or group of people could write bug-free software. It would also be naïve to think that today’s attempt at self-organizing the operating system into discrete components will be the same as tomorrow’s. In essence, the soul of launchd embraces imperfection and fluidity as a fact of life. Let’s enumerate how:
Case 1: Encouraging “Best Effort” Style Programming
Processes fail. If they don’t outright crash, they often simply abort upon the first unexpected condition. No worries. Launchd will happily restart a job. By giving programs a second chance at life, we’re changing the ethos of the programming community in subtle but important ways. First, developers begin to take on more adventurous designs and problems, since the cost of failure has been lowered. Second, developers begin to adopt the best-effort style of design for their own code. The net effect of these two changes is that cascading failures are slowly disappearing from the operating system.
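To make the restart idea concrete, here is a minimal sketch of a supervisor loop in Python. It is not how launchd is implemented; the function name keep_alive and the max_restarts and delay parameters are assumptions made up for this illustration, and a real supervisor would not give up after a fixed number of attempts.

```python
import subprocess
import sys
import time


def keep_alive(argv, max_restarts=3, delay=0.1):
    """Run argv, restarting it whenever it exits with a non-zero status.

    Returns the number of times the job was started. max_restarts and
    delay are illustrative knobs, not launchd settings.
    """
    starts = 0
    while True:
        starts += 1
        status = subprocess.call(argv)
        if status == 0 or starts > max_restarts:
            return starts
        time.sleep(delay)  # brief pause so a crash loop cannot spin the CPU


if __name__ == "__main__":
    # A hypothetical job that always aborts on its first unexpected condition.
    flaky_job = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(keep_alive(flaky_job, max_restarts=2))
```

The job itself contains no recovery logic at all. It simply exits when something goes wrong and relies on its supervisor for a second chance, which is the best-effort style described above.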
Case 2: Encouraging Robustness in a Modern World
Across all parts of the computing spectrum, customers are demanding more flexibility and robustness in the face of an ever-changing world. Nobody wants to restart a machine or a single application after making software or hardware configuration changes. The only reason these restarts have been required is assumptions made by programmers. A classic example is assuming that a given hardware device will already be attached when needed and will not go away while in use. Another favorite example is assuming that the machine is attached to a network and that the IP address will never change.
We have found that as developers adopt APIs that allow them to monitor and respond to hot-plug/hot-unplug events, the traditional “what order do we launch programs in?” problem evaporates quickly. Yes, that’s right: when programs are well behaved, we have learned that the order of invocation doesn’t matter. The launchd project embraces that fact and therefore provides no facilities to coordinate the order of invocation. If a developer has an ordering issue, there is, with rare exception, an API already on the system that helps them coordinate programmatic callbacks on the conditions they care about.
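As a rough illustration of why callbacks make launch order irrelevant, consider the following Python sketch. The StateHub class, the "network-address" condition name and the addresses are all hypothetical; they stand in for whatever notification API the system actually provides. The one design point that matters is that a watcher is told the current state when it registers, as well as every later change.

```python
class StateHub:
    """Hypothetical registry of named conditions (e.g. "network-address")."""

    def __init__(self):
        self._state = {}      # condition name -> current value
        self._watchers = {}   # condition name -> list of callbacks

    def watch(self, name, callback):
        """Register callback; fire immediately if the state is already known."""
        self._watchers.setdefault(name, []).append(callback)
        if name in self._state:
            callback(self._state[name])

    def publish(self, name, value):
        """Record a new value and notify every watcher."""
        self._state[name] = value
        for callback in self._watchers.get(name, []):
            callback(value)


def observed_addresses(consumer_first):
    """Start a consumer and a producer in either order; return what was seen."""
    hub = StateHub()
    seen = []
    if consumer_first:
        hub.watch("network-address", seen.append)
        hub.publish("network-address", "10.0.0.5")
    else:
        hub.publish("network-address", "10.0.0.5")
        hub.watch("network-address", seen.append)
    hub.publish("network-address", "10.0.0.9")  # the address changes later
    return seen
```

Calling observed_addresses(True) and observed_addresses(False) yields the same list, so the consumer behaves identically whether it starts before or after the producer, and it also copes with the address changing afterwards.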
Case 3: Encouraging More Predictable Availability of Services
Processes don’t operate in a vacuum. They in fact vend and consume services in the form of Unix sockets and/or Mach ports. Having said that, most of the I/O a process does is [and should be] hidden behind higher level APIs.
When a program dies, the sockets and ports normally die with it. Any program that attempts to talk directly or indirectly to it will experience a temporary outage until the target program finishes reinitializing itself (see Case 1).
Alternatively, a developer can avoid transient service unavailability by using launchd as a liaison for fundamental communication handles. The nice implication of this is that the client library code becomes simpler to implement. In Unix terms (and for local IPC only), a connect() either succeeds or it doesn’t. If it fails, a developer can assume that it will keep failing until some form of corrective action is taken by the user or system administrator.
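The liaison idea can be sketched with plain Unix sockets in Python. In this illustration the listening socket is created by a stand-in for launchd rather than by the server, so a client can connect even while no server is running. The socket path and the demo function are assumptions made up for the example, and everything runs in one process for brevity.

```python
import os
import socket
import tempfile


def demo():
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "job.sock")  # hypothetical socket path

    # The supervisor (standing in for launchd) creates and owns the listener.
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(8)

    # No server has called accept() yet, but the client can already connect
    # and send: the kernel queues both on the listening socket.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    client.sendall(b"hello")

    # Later, the (re)started server takes over the listener and catches up.
    conn, _ = listener.accept()
    data = conn.recv(16)

    for s in (conn, client, listener):
        s.close()
    os.unlink(path)
    os.rmdir(directory)
    return data
```

Because the listener outlives any particular server process, the client needs no retry loop around connect(). Its request simply waits in the queue until a server is there to pick it up.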
Case 4: Encouraging “Pay as You Go” Computing
As customers, we don’t like computers slowing down for features that we don’t use. One of the goals of the launchd project is to defer the invocation of programs until we absolutely know they are required. The method we use builds on the feature described in the previous case example.
When launchd acts as a liaison for communication handles, it can use the presence of data on a Unix socket or a message on a Mach port as a request to start a given program that will eventually dequeue the socket/port. If we use Mac OS X as a case example, we see how launchd helps Mac OS X stay feature rich and lean on resources at the same time. This is one of the contributing factors to why Mac OS X boots up so quickly.
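Here is a minimal Python sketch of demand-driven launching, assuming Unix sockets only (Mach ports are not covered). The names serve_on_demand and echo_job are hypothetical, and the "job" is an ordinary function so the example stays self-contained; a real supervisor would instead start a separate process and hand it the socket.

```python
import os
import select
import socket
import tempfile


def serve_on_demand(listener, start_job, timeout=1.0):
    """Wait until the listener has a pending connection, then start the job.

    Returns start_job's result, or None if nothing arrived before timeout.
    """
    readable, _, _ = select.select([listener], [], [], timeout)
    if not readable:
        return None             # no demand, so the job is never started
    return start_job(listener)  # the job dequeues the pending connection


def echo_job(listener):
    """Hypothetical job: accept one connection and return what it sent."""
    conn, _ = listener.accept()
    with conn:
        return conn.recv(64)


def demo():
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "demand.sock")  # hypothetical socket path
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(8)

    idle = serve_on_demand(listener, echo_job, timeout=0.1)  # nobody asked

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    client.sendall(b"ping")
    busy = serve_on_demand(listener, echo_job)  # demand arrived, job runs

    client.close()
    listener.close()
    os.unlink(path)
    os.rmdir(directory)
    return idle, busy
```

In the first call nothing is pending, so the job costs nothing. Only when a client actually connects does the supervisor spend resources on it, which is the "pay as you go" behavior described above.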
From a Technical Perspective:
This may seem like a bold statement, but we hope launchd will be as successful as virtual memory was. We view launchd as a manager of virtual processes. Much as virtual memory allows for a page of memory to be allocated long before it is backed by a physical page, launchd allows for processes to be allocated long before they’re backed by a concrete process the kernel knows about. The combinatorial advantages of this are huge, but we’ll get into those on another day, and with another blog post.