What to do
When you want to move a value out of a C++ std::optional in an expression, prefer
to std::exchange it with std::nullopt:
std::optional<std::string> src = "hi";
auto dest = std::exchange(src, std::nullopt);
The source receives the nullopt state (so it no longer holds a value), and the
expression evaluates to the optional’s former state. If the source was nullopt, then
the expression evaluates to nullopt; if you are sure that the source is non-nullopt,
then you can immediately use .value() or operator* on the expression to obtain the
underlying value.
(For readers familiar with Rust, this is analogous to Option::take.)
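For example, here is a minimal sketch (my own illustration, not one of the snippets above) of pulling the contained value out in a single expression, assuming the source is known to be non-nullopt:
std::optional<std::string> src = "hi";
// src is known to hold a value here, so dereferencing the result of
// std::exchange is safe; the string is moved out of the returned temporary.
std::string value = *std::exchange(src, std::nullopt);
// src.has_value() is now false, and value holds "hi".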
Using std::swap also works, but requires a separate statement and a
previously declared swap destination that is “move-assignable”:
std::optional<std::string> swap_src = "hi";
std::optional<std::string> swap_dest;
std::swap(swap_src, swap_dest);
In some cases, swap may be more efficient, because it is more common for swap to be template-specialized than exchange, but the difference is unlikely to matter outside of hot loops. Consult Godbolt and a profiler if performance is an overriding concern; otherwise prefer the version that’s more readable.
What not to do: pitfalls of other approaches
It is tempting to only move the optional:
std::optional<std::string> moved_from_optional = "hi";
auto only_moved_to = std::move(moved_from_optional);
std::cout << "moved_from_optional: " << moved_from_optional.has_value() << std::endl;
but this leaves the original in the non-nullopt state (older C++ standards proposals
call this the “engaged” state, but modern references don’t seem to use this terminology).
The snippet above will print moved_from_optional: 1.
Leaving the moved-from optional with the hollowed-out, moved-from value is almost never
useful. In the above example, moved_from_optional contains the empty string. It is
almost certainly a programming defect to use this value in any way; you’re probably better
off with an optional that returns false from .has_value() and throws on .value().
What if we move only the contained value, not the optional? The same problem occurs:
std::optional<std::string> moved_from_value = "hi";
auto moved_value = std::move(moved_from_value.value());
std::cout << "moved_from_value: " << moved_from_value.has_value() << std::endl;
This will print moved_from_value: 1.
Note that there are multiple references out there suggesting that one of the above
approaches is better than the other. This is wrong: for most practical purposes, they are
not even meaningfully different, and neither is what you want in the common case. The
only benefit, compared to doing exchange or swap with a nullopt, is that you might
save an instruction clearing the discriminant field of the optional. In most cases, this
is unlikely to matter, and may even be optimized away if the compiler can detect that the
source optional is dead after the move.
One can make a raw move less error-prone by following it with a separate reset call:
std::optional<std::string> move_then_reset = "hi";
auto moved_to = std::move(move_then_reset);
move_then_reset.reset();
std::cout << "move_then_reset: " << move_then_reset.has_value() << std::endl;
This prints move_then_reset: 0. However, this seems strictly worse than using
std::exchange: it is more verbose and requires more diligence.
Postscript
A couple of closing thoughts:
First, as of writing, none of the top hits on Stack Overflow or Google recommend
using std::exchange or std::swap for this task, even though they have been in the
standard since C++14 and C++11 respectively. The best alternative source I can find is
this answer on Stack Overflow, which is not even the top-voted answer to
its question. I don’t know what to make of this, other than to point out that if you’re
doing engineering, then (a) don’t expect other people to do your work for you, and (b)
recognize that our information ecosystem does not reliably rank the best information most
highly.
Second, this post is one of just a handful of pieces of writing that I’ve posted to my
extremely sparse homepage in the past few years. You might guess from this that I am some
kind of C++ aficionado. In reality, I have written only tens of KLOC of C++ code in my
entire life, and the necessity of mastering countless subtleties like the above to do
mundane tasks well is a major reason that I dislike C++. Even “modern” C++ is a
terrifying mountain of complexity and user-hostile design with minimal guardrails to
prevent you from tumbling down, and it is not true, as is often claimed, that you can
easily avoid the steep ledges by sticking to “modern” idioms. (For example, bonus fact
about std::optional: the standard does not promise that dereferencing an unset
(nullopt) optional fails in any well-defined way; it’s simply undefined behavior.) If
the use of C++ for building important large-scale systems does not worry you, then you
either don’t know enough C++ or you fundamentally misunderstand the stochastic nature of
software engineering at scale.
Appendix: Complete demonstration code
Save this as optional.cpp and run g++ --std=c++17 optional.cpp && ./a.out:
#include <iostream>
#include <optional>
#include <string>
#include <utility>

template <typename T>
void print(const char *prefix, const std::optional<T> &opt) {
  std::cout << prefix;
  if (opt.has_value()) {
    std::cout << "some(" << *opt << ")";
  } else {
    std::cout << "none";
  }
  std::cout << std::endl;
}

int main() {
  std::optional<std::string> src = "hi";
  auto dest = std::exchange(src, std::nullopt);
  print("src: ", src);
  print("dest: ", dest);
  std::cout << std::endl;

  std::optional<std::string> swap_src = "hi";
  std::optional<std::string> swap_dest;
  std::swap(swap_src, swap_dest);
  print("swap_src: ", swap_src);
  print("swap_dest: ", swap_dest);
  std::cout << std::endl;

  std::optional<std::string> moved_from_optional = "hi";
  auto only_moved_to = std::move(moved_from_optional);
  print("moved_from_optional: ", moved_from_optional);
  print("only_moved_to: ", only_moved_to);
  std::cout << std::endl;

  std::optional<std::string> moved_from_value = "hi";
  auto moved_value = std::move(moved_from_value.value());
  print("moved_from_value: ", moved_from_value);
  std::cout << "moved_value (not an optional): " << moved_value << std::endl;
  std::cout << std::endl;

  std::optional<std::string> move_then_reset = "hi";
  auto moved_to = std::move(move_then_reset);
  move_then_reset.reset();
  print("move_then_reset: ", move_then_reset);
  print("moved_to: ", moved_to);

  return 0;
}
Output:
src: none
dest: some(hi)
swap_src: none
swap_dest: some(hi)
moved_from_optional: some()
only_moved_to: some(hi)
moved_from_value: some()
moved_value (not an optional): hi
move_then_reset: none
moved_to: some(hi)
Summary
N. Forsgren, J. Humble, G. Kim. Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution Press, 2018.
Accelerate is an influential book of advice for building and delivering software. I’m a practicing software engineer; I spend a fair fraction of my free time trying to get better at my craft; so, among other things, I read books. I read this one after coming across numerous positive references (example), and skimming several of the annual “State of DevOps” reports that the authors and their collaborators have published under the “DORA” (DevOps Research and Assessment) banner.
Accelerate investigates some important questions in software development that are hard to investigate. I believe that the authors are smart people who are trying their best. Gathering and analyzing this data was a lot of work, and I admire that effort.
Nevertheless, the claim to have scientifically validated a set of superior software development practices seems questionable. I find the authors’ approach inadequate to the task in multiple ways. Moreover, their use of statistical language, and rhetoric in general, is sloppy enough to raise doubts about whether the methodological issues go deeper than just the language.
As for what you should do in your software development organization: I came away thinking that the authors’ high level recommendations are mostly fine; see the appendix for a summary. However, I urge you to consider for yourself, from first principles, whether improvement along any given axis is the most valuable use of your limited resources, compared to other things you could be doing. For example, think hard about whether investing in improving deploy frequency is where you should spend your time now, and at what point you would see diminishing returns.
At this point, you might reasonably reply:
Yeah? Well, you know, that’s just like uh, your opinion, man. [The Big Lebowski]
Fair enough. Now the burden falls on me to substantiate my opinion. I’ll begin with a summary of the book’s line of argument, and then go into my criticisms.
If bacteria could conceive of the Simulation Argument, they’d probably think, “Intelligent beings will simulate far more of us than could ever exist in reality! We must be simulations!”
In reality, humans prefer to use computers to watch cat videos and chat with other humans.
I posit that this stays true at every conceivable margin of intelligence. Future superintelligences will be chiefly enraptured by media about reality and conversations with other superintelligences. You are a pitiful human-level intelligence, not worth wasting computation on, and therefore not a simulation. Your superintelligent distant progeny will be too busy gawking at each other on Posthuman Tiktok to pay you any mind, much less resurrect you, whether for their own amusement or to punish you.
Some people hypothesize that future beings will nevertheless want to simulate especially interesting humans. However, paraphrasing Doctor Manhattan, I posit that the world’s most interesting human means no more to a god than does its most interesting termite.
Stated generally: once you have the resources to simulate many beings of a given type with high fidelity, doing so becomes uninteresting, at least compared to other uses for those resources. Robin Hanson has put numbers on this argument, although I think he rhetorically overstates the extent to which humans will even be viewed as meaningful “ancestors” by posthumans.
Nick Bostrom’s original paper on the Simulation Argument also mentions the “simulations become uninteresting” scenario (labeled “Proposition (2)”), but implies that it requires a weird change in the motivations of posthuman societies:
Another possible convergence point is that almost all individual posthumans in virtually all posthuman civilizations develop in a direction where they lose their desires to run ancestor-simulations. This would require significant changes to the motivations driving their human predecessors, for there are certainly many humans who would like to run ancestor-simulations if they could afford to do so. But perhaps many of our human desires will be regarded as silly by anyone who becomes a posthuman. Maybe the scientific value of ancestor-simulations to a posthuman civilization is negligible (which is not too implausible given its unfathomable intellectual superiority), and maybe posthumans regard recreational activities as merely a very inefficient way of getting pleasure – which can be obtained much more cheaply by direct stimulation of the brain’s reward centers. One conclusion that follows from (2) is that posthuman societies will be very different from human societies: they will not contain relatively wealthy independent agents who have the full gamut of human-like desires and are free to act on them.
However, I claim we can extrapolate straightforwardly from our current society and its ordinary motivations. How many high-fidelity simulations do humans run of our evolutionary relatives?
To be sure, we run a lot of low-fidelity simulations of many kinds of living things, but those simulations are conceived for our needs, and we invariably abstract away the parts that aren’t relevant to us.
In the scientific realm, today it is technically feasible (barely) to run a molecular-level high-fidelity simulation of some components of a bacterium, but when you want to understand bacterial behavior as a whole it’s far more efficient to use a more abstracted model. Efficient simulations of populations of bacteria use more abstracted models still. In general, efficient simulations of aggregates will always abstract away many low-level details that might be relevant when simulating individuals in detail.
In the entertainment realm, we could build computer games that accurately simulate animal brains at the neural level, but we seem to be much more efficiently entertained by simulations that abstract away those pieces. In practice, we prefer to program simplistic behaviors and instead spend our computational power simulating more entertaining aspects of animal behavior, like realistic fur physics.
Based on these observations, we can posit that even if future beings were interested in simulating humans, they’re unlikely to simulate many with sufficient fidelity to generate subjective qualia. (Whether computer simulations can generate qualia is somewhat controversial, of course; if they cannot, then this whole discussion is moot, but if they can, then it likely requires a simulation that is relatively complex and faithful to reality.)
Ironically, then, the notion that we are all simulations falls apart because it is an exercise in egotism: we assume that our present-day selves are so special that, amid the vastness of the universe and the deepness of cosmological time, our distant descendants will find us so uniquely fascinating that they will recreate us, not only occasionally or in shallow simulacra devised for their purposes, but in vast numbers and with enough fidelity to generate the rich inner lives that we each observe subjectively today. I will confess that I don’t have a rigorous disproof of this idea, but I also don’t see any reason to think that it’s a likely scenario, and it seems generally incompatible with many aspects of my worldview.
From first principles
Turn a misbehaving computer off and on, or stop a misbehaving program and then start it again. Often, the problem goes away.
Most users don’t think hard about this, and accept it as just another inscrutable fact about computers.
However, as you learn more about how computers work, I suspect that you start feeling uncomfortable about never outgrowing this seemingly hacky and arbitrary fix. Professional engineers working for the most celebrated technology companies on Earth are sometimes reduced to blindly rebooting everything from their personal workstation to hundred-node distributed systems clusters. Is this the best that anyone can do?
Well, I offer the following argument that restarting from the initial state is a deeply principled technique for repairing a stateful system — whether that system is a program, or an entire computer, or a collection of computers.
Before a computing system starts running, it’s in a fixed initial state. At startup, it executes its initialization sequence, which transitions the system from the initial state to a useful working state:
(init_0)
|
v
(init_1)
|
v
[...]
|
v
(init_N)
|
v
(w_0)
This initialization sequence has been executed many times during the development, testing, and operation of the system. It is therefore likely to be reliable: that is, the transitions from the initial state to the working state occur with very high cumulative reliability. And this is not accidental: it stems from fundamental characteristics of the engineering process that built the initialization sequence.
As the system runs correctly, it transitions from its initial working state to other well-behaved states:
(init_0)
|
v
.------------------.
| (w_0) <--> [...] |
| ^ ^ |
| | | | (working states)
| v v |
| [...] <--> (w_n) |
'------------------'
However, when the system encounters a defect, it leaves the set of working states and enters a broken state:
(init_0)
|
v
.------------------.
| (w_0) <--> [...] |
| ^ ^ |
| | | | (working states)
| v v |
| [...] <--> (w_n) |
'--------------+---'
|
v
(BROKEN)
By definition, this broken state is unexpected; had the transition been anticipated, it would simply have ended in another working state.
At this point, any attempt to bring your system back directly from the broken state into a working state is improvisational. We are no longer like the classically trained violist from Juilliard performing a Mozart sonata after rehearsing it a thousand times; we are now playing jazz. And in the engineering of reliable systems, we do not want our systems to improvise.
So, what should we do to fix the system?
Turn it off, and turn it on again. Anything else is less principled.
This is the basic insight behind the philosophy of crash-only software, a.k.a. recovery-oriented computing.
Complications
Granularity
If you were paying attention, you may have noticed some sleight of hand in the above reasoning. I glossed over the distinction between two different ways of resetting a system: rebooting a computer, and restarting a program.
As often happens with a crack in a simple story, if you pry at it, you will realize that a great chasm of complication opens up.
Restarting a program, as you well know from experience, is sometimes not enough to fix its misbehavior. There can be errant state elsewhere in the computer. Sometimes bad state can survive even a system reboot: if the program executable is corrupted on disk, no amount of rebooting will save you. If your hardware is corrupted deeply enough, even wiping the disk and reinstalling your operating system won’t work.
And yet, of course, we do not throw out our computers and buy new ones every time a program does something wrong. So the story of system repair is one of “turning it off and on again” at various layers of abstraction. At each layer, we hope that we can purge the corruption by discarding some compartmentalized state, and replacing it with a known start state, from which we can enter a highly reliable reinitialization sequence that ends in a working state.
(There seem to be certain analogies here between computing systems and biological ones. Your body is composed of trillions of compartmentalized cells, most of which are programmed to die after a while, partly because this prevents their DNA from accumulating enough mutations to start misbehaving in serious ways. Our body even sends its own agents to destroy misbehaving cells that have neglected to destroy themselves; sometimes you just gotta kill dash nine.)
Local crashes and global equilibria
So, resetting a single component’s state is insufficient to prevent the system as a whole from going wrong. We can go further: sometimes resetting a component can exacerbate the problem.
Consider, for example, the following scenario:
- A process P performs certain queries against a shared backend when it starts up, but not in routine operation.
- P contains a latent defect that, under certain conditions, is encountered with high probability in a short interval after startup.
- P contains assertions which catch the defect and crash.
What will happen when we encounter the conditions that trigger the defect? P will crash-loop, and every time it crashes, it will fire off its startup queries. Since the shared backend receives these queries relatively infrequently in ordinary operation, it may not be prepared for this load, and it may fall over. This is especially likely if the startup queries are expensive and there are many replicas of P.
Oops! Your beautiful crash-only error handling strategy has nudged your system into a new equilibrium where the backend is continually receiving too much load. A local defect has been amplified into a global system outage. Even if you remove the crashing defect, the flood of retrying startup queries may persist as a metastable failure mode of your system.
As with most software problems, there are ways to deal with the particular scenario outlined here (for example: stochastically delay restart timing after a crash, or add circuit breakers for the query load, or cache the startup query results so that they can be reused across restarts, or…). But the particular example is less important than the general insight that restarting a localized part of the system cannot be a silver bullet for reliability problems.
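For concreteness, here is a minimal sketch of the first of those mitigations, stochastically delayed restarts. The function name and constants are hypothetical, and a real system would tune or replace them:
#include <algorithm>
#include <chrono>
#include <cmath>
#include <random>
#include <thread>

// Hypothetical sketch: before the Nth consecutive restart, sleep for a
// randomized, exponentially growing delay so that many crashing replicas
// do not replay their startup queries against the shared backend in lockstep.
void delay_before_restart(int consecutive_crashes) {
  static std::mt19937 rng{std::random_device{}()};
  const double cap_ms = 60'000.0;  // never wait more than a minute
  const double base_ms =
      std::min(cap_ms, 100.0 * std::pow(2.0, consecutive_crashes));
  // "Full jitter": pick a uniform delay in [0, base_ms).
  std::uniform_real_distribution<double> jitter(0.0, base_ms);
  std::this_thread::sleep_for(
      std::chrono::duration<double, std::milli>(jitter(rng)));
}
The jitter is the important part: a fixed backoff would still synchronize the replicas, whereas a randomized one spreads the startup queries out over time.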
Crashiness is a healthy part of a balanced diet in reliable system engineering. But you should still think about what happens when you crash.
Forensic analysis vs. repair
The discussion above focuses on how to bring a broken system back into a working state. But, hopefully, you plan to continue building and operating your system for the foreseeable future, not just today.
In an ideal world, you will have designed your system for observability, and it will already have produced enough durable evidence to figure out what happened and fix the defect later. Here in the real world, the picture is often less complete. Depending on the urgency of the fix, you should consider pausing to gather forensic evidence before executing the reboot.
(If you’re a computer science researcher looking for a good ambitious problem, consider figuring out how to instrument multi-process and multi-computer distributed systems to support post hoc reconstruction of state at arbitrary points in time, at overheads low enough to be used in production systems. Yes, I know about rr. It’s amazing! But I think it’s not quite at the state where most companies would be comfortable running literally all their production processes under it, and multi-tier systems are outside its current scope.)
The parable of Mike and the login shell
Once, a student named Mike wondered whether it was better for programs to be written so that
- each function would be strict about its preconditions, checking its inputs and crashing immediately with an assertion failure if a precondition was violated; or
- each function would be permissive about its preconditions, checking its inputs where necessary, but repairing erroneous inputs and proceeding as best it could.
So, he wrote two Unix shells: one in the strict style, and one in the permissive style.
The shell written in the strict style would crash, at first. Mike was fearless enough to use his work-in-progress as his login shell; crashing was incredibly inconvenient, as it would log him out of the machine completely. Nevertheless, he persisted; he found and fixed defects at a rapid rate, and soon enough the shell became a usable and useful tool.
The shell written in the permissive style also had defects. But he was never able to find and fix enough of them to make it usable. Eventually he gave up on this shell.
He concluded that it was better for most programs to be written in a strict and crashing style. Even when crashing was incredibly inconvenient, it made errors so much easier to diagnose and fix that you could build better software if you did it.
Mike went on to become one of the eminent programmers of his generation, earning fame and fortune.
Meta
Acknowledgments
Thanks to my teammates on Airtable’s Performance & Architecture team for being a sounding board for these ideas, and encouraging me to write them up. (Consider joining us!)
Reactions
See this essay discussed elsewhere: