Haskell extensions and imports used in this post
{-# LANGUAGE
DataKinds,
DeriveGeneric,
DeriveTraversable,
DerivingStrategies,
GeneralizedNewtypeDeriving,
RankNTypes,
ScopedTypeVariables,
StandaloneDeriving,
TypeFamilies #-}
import Data.Ord (Down(Down, getDown))
import Data.List.NonEmpty (NonEmpty(..))
import qualified GHC.Generics as GHC
import Generics.SOP (Generic, HasDatatypeInfo, NP(..), K(..))
import Test.QuickCheck
import Test.StrictCheck
Minimax
Minimax is a general algorithm for finding optimal strategies. It’s not meant to be efficient or practical. It is more of a basic concept of game theory, and a reference against which to compare other game-solving algorithms.
We consider a simple model of two-player games. They take turns playing moves until reaching an end state with a final score. One player’s goal is to maximize the score, whereas the other player’s goal is to minimize it. Let us call these players Max and Min respectively, short for Maximizer and Minimizer.
We represent such a game by its game tree, which is made up of three constructors:
- a Max (resp. Min) node represents a game state where Max (resp. Min) chooses the next move, each move resulting in a new game state,
- and an End leaf represents an end state as its score.
data Game score
= Max (NonEmpty (Game score))
| Min (NonEmpty (Game score))
| End score
deriving stock (Show, Functor, Foldable)
Note that Max and Min nodes must have at least one possible move.
You may be wondering about games that end when one player can no longer play:
instead of an empty Min or Max node, such game states simply correspond
to an End leaf, making the final score explicit.
Most real games just have a win/tie/lose end condition.
They naturally correspond to applying Game to a type with three possible scores:
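For example, with a hypothetical three-valued score type (the names here are illustrative, not from the original):
data Score3 = Lose | Tie | Win
  deriving (Eq, Ord, Bounded, Show)
-- A win/tie/lose game is then a value of type Game Score3.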
In practice, chess engines don’t work with the whole game tree
since it is too massive. Instead, they build approximations by
pruning certain branches of the tree and replacing them with leaves.
The score on each leaf is a number which estimates how favorable
the game state is to either player. So we end up with
Game ℝ, or Game Double.
In general, the type Game represents two-player games
with complete information and zero-sum objectives.
We shall assume that score is a totally ordered set. This requirement
corresponds to a constraint Ord score in Haskell. In that case,
there exists an “optimal strategy” for each player which guarantees
them an “optimal score” m in the sense that as long as one player
sticks to their “optimal strategy”, the other player cannot
score better than m.
This situation is what we call a Nash equilibrium in game theory.
For win/tie/lose games, the existence of a Nash equilibrium
means that either there is a winning strategy for one of the players,
or they must tie by playing optimally.
The “optimal score” m is unique, and can be computed by a fold of the game tree,
replacing Max and Min constructors with the functions maximum and minimum.
This is the minimax algorithm:
minimax :: Ord score => Game score -> score
minimax (Max gs) = maximum (minimax <$> gs)
minimax (Min gs) = minimum (minimax <$> gs)
minimax (End s) = s
minimax is quite an inefficient algorithm:
it must traverse the whole game tree. Indeed, maximum
and minimum must traverse the whole list to find
the maximum or minimum element.
Often, we can do much better. For instance, consider the following tree:
Max [ End 0,
      Min [ End (-1),
            t ] ]
The minimax of that tree does not depend on the subtree t.
Indeed, minimum [-1, minimax t] is guaranteed to be at most -1,
so the maximum between that value and 0 is guaranteed to be 0.
Thus we can compute the minimax without inspecting the subtree t,
which may be arbitrarily large.
That idea leads to a more efficient algorithm to compute the minimax.
Alpha-beta
The alpha-beta pruning algorithm¹ is a modification of minimax with an extra pair of arguments:
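Its signature, repeated from the implementation given below:
alphabeta :: Ord score => Game score -> (score, score) -> score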
The pair (alpha, beta) represents a “relevance interval” which
relaxes the possible outputs of alphabeta.
Either alphabeta t (alpha, beta) produces a score within that interval,
in which case it is guaranteed to be equal to minimax. Otherwise,
alphabeta t (alpha, beta) produces a value outside of the interval,
in which case its exact value does not matter; it only has to be on
the same side of the interval as minimax t. More rigorously:
- if alpha < minimax t < beta, then alphabeta t (alpha, beta) = minimax t;
- if minimax t <= alpha, then alphabeta t (alpha, beta) <= alpha;
- if beta <= minimax t, then beta <= alphabeta t (alpha, beta).
Leaving the value of alphabeta underspecified when outside of the
interval allows the implementation to short-circuit:
we can stop searching through Max nodes as soon as we can guarantee a score greater than beta,
and we can stop searching through Min nodes as soon as we can guarantee a score smaller than alpha.
We can then use alphabeta to redefine minimax:
-- Minimax using alpha-beta pruning
minimaxAB :: (Ord score, Bounded score) => Game score -> score
minimaxAB t = alphabeta t (minBound, maxBound)
assuming that score is Bounded with extreme values
minBound :: score and maxBound :: score.
It’s possible to avoid the Bounded constraint by changing
the interval type (score, score) to (Maybe score, Maybe score),
which amounts to adding distinguished top and bottom elements.
We’ll stick with Bounded to keep things a bit simpler.
Implementing alphabeta is a standard exercise.
It is even easier when you have a formal specification like the above
to guide the implementation.
alphabeta :: Ord score => Game score -> (score, score) -> score
alphabeta (Max (g0 :| [])) i = alphabeta g0 i
alphabeta (Max (g0 :| g1 : gs)) (alpha, beta) =
let m0 = alphabeta g0 (alpha, beta) in
if beta <= m0 then m0
else m0 `max` alphabeta (Max (g1 :| gs)) (max alpha m0, beta)
alphabeta (Min (g0 :| [])) i = alphabeta g0 i
alphabeta (Min (g0 :| g1 : gs)) (alpha, beta) =
let m0 = alphabeta g0 (alpha, beta) in
if m0 <= alpha then m0
else m0 `min` alphabeta (Min (g1 :| gs)) (alpha, min beta m0)
alphabeta (End s) _ = s
But still, it is at least a little finicky and tedious to make sure that you haven’t mixed your alphas and betas.
As we will see in this post, we can streamline the implementation of alpha-beta pruning by factoring the short-circuiting logic out of the “minimax” logic.
Generalized minimax
Remark that minimax only uses min and max
(via minimum and maximum), rather than the comparison
functions of Ord (compare, (<=), etc.).
We can reduce the dependency footprint of minimax by
defining a new class with only the necessary operations,
the class of lattices:
class Lattice a where
-- Join, least upper bound, max
(\/) :: a -> a -> a
-- Meet, greatest lower bound, min
(/\) :: a -> a -> a
In mathematics, lattices are algebraic structures with two operations
(\/) (“join”) and (/\) (“meet”)
satisfying commutativity, associativity, as well as the absorption laws:
x \/ (x /\ y) = x
x /\ (x \/ y) = x
In this post, we will only be looking at lattices that arise
out of total orders,
so this class is rather just a way of saying that we only
depend on min and max.
Binary operations can be iterated to combine lists of arguments,
similarly to the maximum and minimum functions:
-- maximum
joins :: Lattice a => NonEmpty a -> a
joins = foldr1 (\/)
-- minimum
meets :: Lattice a => NonEmpty a -> a
meets = foldr1 (/\)
Minimax in lattices is defined by replacing Max and Min nodes with
the joins and meets operations.
-- Minimax in lattices
minimaxL :: Lattice score => Game score -> score
minimaxL (Max gs) = joins (minimaxL <$> gs)
minimaxL (Min gs) = meets (minimaxL <$> gs)
minimaxL (End x) = x
minimaxL generalizes minimax since every decidable total order is a lattice
(because you can use (<=) to define min/max).
Ideally this fact would be made explicit by making Lattice into
a superclass of Ord. Unfortunately in Haskell this would require us
to modify Ord or redefine it.
Another way to express the relation between Lattice and Ord is through a newtype.
newtype OrdLattice a = OrdLattice a
deriving newtype (Eq, Ord, Bounded)
unOrdLattice :: OrdLattice a -> a
unOrdLattice (OrdLattice x) = x
instance Ord a => Lattice (OrdLattice a) where
OrdLattice x \/ OrdLattice y = OrdLattice (max x y)
OrdLattice x /\ OrdLattice y = OrdLattice (min x y)
With that, we recover the starting minimax by specializing
minimaxL to OrdLattice s, and then unwrapping OrdLattice:
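A definition consistent with how it is used in the proof below (where it is called minimaxO):
minimaxO :: Ord score => Game score -> score
minimaxO = unOrdLattice . minimaxL . fmap OrdLattice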
Clamping functions
Focus on the type (score, score) -> score which appears in the signature of alphabeta.
More specifically, we are interested in a subset of those functions that
we shall call clamping functions.
Intuitively, a clamping function f is a delayed representation of a constant s:
the goal of f is to compute s, but it may also stop early with an approximation
if it’s not necessary to know the exact value of s.
The name “clamping function” is a reference to the clamp function:
clamp :: Ord score => score -> (score, score) -> score
clamp s (alpha, beta) = max alpha (min s beta)
We can think of the partially applied function clamp s as an encoding of the constant s,
which may or may not be output depending on the interval (alpha, beta).
More formally, a clamping function with value s is a function f :: (score, score) -> score
that satisfies the following, for all (alpha, beta) such that alpha < beta:
- if alpha < s < beta, then f (alpha, beta) = s;
- if s <= alpha, then f (alpha, beta) <= alpha;
- if beta <= s, then beta <= f (alpha, beta).
Two clamping functions with the same value s are considered equal.
In particular, as clamping functions, const s is equal to clamp s.
Making the notion of equality explicit is necessary to make sense of equations
(laws for lattices, homomorphisms, and isomorphisms).
We enshrine the definition of clamping functions in a newtype:
-- Type of clamping functions, satisfying the properties above.
newtype Clamping score = Clamping ((score, score) -> score)
unClamping :: Clamping score -> (score, score) -> score
unClamping (Clamping f) = f
For any value s, we can construct the constant clamping function:
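One possible definition, matching the remark that follows:
clamping :: s -> Clamping s
clamping s = Clamping (\_ -> s)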
Note that \_ -> s and clamp s are both clamping functions with value s,
so both are valid definitions of clamping s.
We prefer the constant function \_ -> s because it does less work.
Conversely, we can project clamping functions back into their values
by passing the whole interval (minBound, maxBound):
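A definition in line with that description:
declamp :: Bounded s => Clamping s -> s
declamp (Clamping f) = f (minBound, maxBound)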
Those two functions form an isomorphism between score and Clamping score,
meaning that they satisfy the following equations:
declamp . clamping = id
clamping . declamp = id
We now get to the secret sauce of this post: the maximum of two clamping functions (as well as the minimum). This operation can be defined in two ways. First is the naive definition, for reference:
-- "max" for clamping functions, naive variant
maxC :: Ord s => Clamping s -> Clamping s -> Clamping s
maxC (Clamping f) (Clamping g) = Clamping (\i -> max (f i) (g i))
Second is the lazy definition: if f (alpha, beta) is greater
than the given upper bound beta, then the max of f and g will
be even greater:
beta <= f (alpha, beta) <= max (f (alpha, beta)) (g (alpha, beta))
In that case, the maximum of f and g is allowed to output
f (alpha, beta) without looking at g.
Otherwise we must evaluate g, but we can tighten the interval by
updating the lower bound to max alpha (f (alpha, beta)).
-- "max" for clamping functions, lazy variant
lazyMaxC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMaxC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
let fi = f (alpha, beta) in
if beta <= fi then fi else fi `max` g (max alpha fi, beta))
Dually, we also have a lazyMinC.
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
let fi = f (alpha, beta) in
if fi <= alpha then fi else fi `min` g (alpha, min beta fi))
To avoid repeating ourselves,
we can also reuse lazyMaxC to implement lazyMinC.
Use Down to invert the ordering of an Ord:
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC f g = undualize (lazyMaxC (dualize f) (dualize g))
where
dualizeWith from to (Clamping h) =
Clamping (\(beta, alpha) -> from (h (to alpha, to beta)))
dualize = dualizeWith Down getDown -- Clamping s -> Clamping (Down s)
undualize = dualizeWith getDown Down -- Clamping (Down s) -> Clamping s
These “naive” and “lazy” functions denote the same value
(maxC = lazyMaxC and minC = lazyMinC),
but lazyMaxC and lazyMinC may do less work,
either by ignoring their second argument or by applying it to a smaller interval than expected.
The point is that these “lazy” functions embody the short-circuiting logic of alpha-beta pruning exactly. All that’s left to do is to plug them into minimax.
The lattice of clamping functions
With the lazy min and max that we just defined, we get a lattice:
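The instance uses the lazy operations just defined:
instance Ord s => Lattice (Clamping s) where
  (\/) = lazyMaxC
  (/\) = lazyMinC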
Specialize minimax in the lattice of clamping functions:
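That specialization, named minimaxC as it is used below:
minimaxC :: Ord score => Game (Clamping score) -> Clamping score
minimaxC = minimaxL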
This doesn’t look like much, but we have actually implemented the
alpha-beta pruning algorithm.
With a tiny bit of plumbing, we can redefine the function alphabeta
from earlier:
alphabeta' :: Ord score => Game score -> (score, score) -> score
alphabeta' = unClamping . minimaxC . fmap clamping
Then we want to partially apply alphabeta' to the interval (minBound, maxBound).
This amounts to replacing unClamping with declamp in the body of alphabeta'.
Behold our final implementation of minimax by alpha-beta pruning:
minimaxAB' :: (Ord score, Bounded score) => Game score -> score
minimaxAB' = declamp . minimaxC . fmap clamping
To sum up, we implemented alpha-beta pruning as a simple combination of:
- minimax, generalized from orders to lattices (minimaxL);
- the lattice of clamping functions (Lattice (Clamping score)).
This alternative approach does not completely absolve you from effort:
you still have to juggle alphas and betas correctly to implement the lattice
(lazyMinC and lazyMaxC).
But unlike in the original alphabeta,
you don’t have to do all that juggling in the middle of a recursive function.
The logic of alpha-beta pruning is neatly decomposed into bite-sized pieces.
Correctness for free
Since we just reused the code of minimax, it’s also easier to prove that alpha-beta pruning yields the same result:
minimax = minimaxAB'
As we are about to see, this is a direct consequence of
the free theorem² for minimaxL:
any function of type forall s. Lattice s => Game s -> s,
such as minimaxL, commutes with any lattice homomorphism³ f,
in the following sense:
f . minimaxL = minimaxL . fmap f
We can picture that equation as a commutative diagram:
\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxL}}>> \small\texttt{s} \\ @V{\texttt{fmap f}}VV @VV{\texttt{f}}V \\ \small\texttt{Game t} @>{\texttt{minimaxL}}>> \small\texttt{t} \end{CD}\]
If f has an inverse f⁻¹, we can rewrite that to
minimaxL = f⁻¹ . minimaxL . fmap f
By replacing (f, f⁻¹) with the isomorphism (clamping, declamp) defined earlier,
we obtain exactly the equality between minimax and alpha-beta pruning:
minimaxL = declamp . minimaxL . fmap clamping
= minimaxAB'
As a commutative diagram:
\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxAB’}\text{ (alpha-beta)}}>> \small\texttt{s} \\ @V{\texttt{fmap clamping}}VV @AA{\texttt{declamp}}A \\ \scriptsize\texttt{Game (Clamping s)} @>{\texttt{minimaxL}}>> \scriptsize\texttt{Clamping s} \end{CD}\]
QED.
(To be pedantic, the above proof
conflates minimaxL with minimax/minimaxO,
which relies on pretending that Lattice is a superclass of Ord.
Below is another proof that doesn’t take that shortcut,
by going through the OrdLattice newtype explicitly,
so this proof applies more directly to the Haskell definitions as written here.)
A somewhat more rigorous proof
We want to prove that the alpha-beta-pruning minimaxAB'
is equivalent to the naive minimax:
minimax = minimaxAB'
Recall the free theorem of minimaxL. For any lattice isomorphism (f, f⁻¹):
minimaxL = f⁻¹ . minimaxL . fmap f
Replace (f, f⁻¹) with the lattice isomorphism (clamping . unOrdLattice, OrdLattice . declamp)
between the lattices OrdLattice score and Clamping score.
minimaxL = OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice)
Now we can prove the equality between minimax and minimaxAB',
using the above equation as the middle step, followed by
canceling inverses:
minimax
= minimaxO
= unOrdLattice . minimaxL . fmap OrdLattice
= unOrdLattice . OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice) . fmap OrdLattice
= declamp . minimaxL . fmap clamping
= minimaxAB'
The above is only a proof of functional correctness:
minimax and minimaxAB' compute the same result.
To verify that minimaxAB' does so more efficiently
is another problem for another day. For now, we can test it.
Strictness check
We test that our “fancy” implementation of alpha-beta (minimaxAB') has the same
strictness as the “classical” implementation (minimaxAB),
which we presume to be much lazier than minimax.
We use StrictCheck for property-testing of strictness behaviors in Haskell.
The following test checks that minimaxAB and minimaxAB' have the
same demand on random inputs.
We use the function observe1 from StrictCheck to observe the demand
of a function f: observe1 applies f to an instrumented copy
of the provided input g, it forces the output (f g of type Int)
using the provided forcing function (`seq` ()),
and finally returns the demand on the input tree g that was observed
by forcing the instrumented copy of g.
main :: IO ()
main = do
quickCheck $ \(g :: Game Int) ->
label (bucket (length g)) $
let demand f = snd (observe1 (`seq` ()) f g) in
demand minimaxAB === demand minimaxAB'
From the source repository of this blog, the following command compiles and runs this blog post:
cabal run alpha-beta
Instances and auxiliary definitions
-- Histogram of generated value sizes
bucket :: Int -> String
bucket n | n == 1 = "= 1"
| n < 10 = "< 10"
| n < 100 = "< 100"
| n < 1000 = "< 1000"
| otherwise = ">= 1000"
-- Instances
deriving stock instance GHC.Generic (Game a)
instance Generic (Game a)
instance HasDatatypeInfo (Game a)
instance Shaped a => Shaped (Game a)
instance Arbitrary a => Arbitrary (Game a) where
arbitrary = sized $ \n -> if n == 0 then End <$> arbitrary else
resize (n `div` 2) $ frequency
[(1, End <$> arbitrary), (2, Max <$> arbitrary), (2, Min <$> arbitrary)]
shrink (Max (g :| gs)) = g : gs ++ (Max <$> shrink (g :| gs))
shrink (Min (g :| gs)) = g : gs ++ (Min <$> shrink (g :| gs))
shrink (End s) = End <$> shrink s
instance Arbitrary a => Arbitrary (NonEmpty a) where
arbitrary = liftA2 (:|) arbitrary arbitrary
shrink (x :| xs) = [y :| ys | y : ys <- shrink (x : xs)]
Conclusion
I came up with this idea a while back on Stack Overflow, as an answer to Alpha-beta pruning with recursion schemes. My understanding of alpha-beta pruning changed overnight from a somewhat tricky algorithm to a completely trivial solution. Getting to reuse minimax is not only a satisfying achievement in refactoring, it enables a neat proof of correctness by parametricity (via free theorems).
The role of laziness should also be underscored. If you try to do the same thing in a call-by-value language, the implementation of “generalized minimax” must explicitly delay computations, obscuring the point:
Alpha-beta pruning is just minimax in a lattice of clamping functions.
For a clearer presentation, see the talk Alpha-Beta Pruning Explored, Extended and Verified (2024) by Tobias Nipkow.↩︎
Theorems for free! by Philip Wadler. Free theorems involving type constructor classes by Janis Voigtländer.↩︎
A lattice homomorphism f is a function that commutes with the lattice operations:
f (x /\ y) = f x /\ f y
f (x \/ y) = f x \/ f y↩︎
Twentyseven is a Rubik’s cube solver and one of my earliest projects in Haskell. The first commit dates from January 2014, and version 0.0.0 was uploaded on Hackage in March 2016.
I first heard of Haskell in a course on lambda calculus in 2013. A programming language with lazy evaluation sounded like a crazy idea, so I gave it a try. Since then, I have kept writing in Haskell as my favorite language. For me it is the ideal blend of programming and math. And a Rubik’s cube solver is a great excuse for doing group theory.
Twentyseven 1.0.0 is more of a commemorative release for myself, with the goal of making it compile with the current version of GHC (9.12). There was surprisingly little breakage:
Aside from that, the code is basically just as it was 9 years ago,
including design decisions that I would find questionable today.
For example, I use unsafePerformIO to read precomputed tables
into top-level constants, but the location of the files to read from
can be configured by command-line arguments, so I better make sure that
the tables are not forced before the location is set…
How Twentyseven works
The input of the program is a string enumerating the 54 facelets of a Rubik’s cube; each character represents one color.
DDDFUDLRB FUFDLLLRR UBLBFDFUD ULBFRULLB RRRLBBRUB UBFFDFDRU
The facelets follow the order pictured below. They are grouped by faces (up, left, front, right, back, down), and in each face they are listed in top-down, left-right order.
00 01 02
03 04 05
06 07 08
10 11 12 20 21 22 30 31 32 40 41 42
13 14 15 23 24 25 33 34 35 43 44 45
16 17 18 26 27 28 36 37 38 46 47 48
50 51 52
53 54 55
56 57 58
The output is a sequence of moves to solve that cube.
U L B' L R2 D R U2 F U2 L2 B2 U B2 D' B2 U' R2 U L2 R2 U
The implementation of Twentyseven is based on Herbert Kociemba’s notes about Cube Explorer, a program written in Pascal!
The search algorithm is iterative deepening A*, or IDA*. Like A*, IDA* finds the shortest path between two vertices in a graph. A conventional A* is not feasible because the state space of a Rubik’s cube is massive (43 252 003 274 489 856 000 states, literally billions of billions). Instead, we run a series of depth-first searches with a maximum allowed number of moves that increases for each search. As it is based on depth-first search, IDA* only needs memory for the current path, which is super cheap.
IDA* relies on an estimate of the number of moves remaining to reach the solved state. We obtain such an estimate by projecting the Rubik’s cube state into a simpler puzzle. For example, we can consider only the permutation of corners, ignoring their orientation. We can pre-compute a table mapping each corner permutation (there are 8! = 40320) to the minimum number of moves to put the corners back to their location. This is a lower bound on the number of moves to actually solve a Rubik’s cube. Different projections yield different lower bounds (for example, by looking at the permutation of edges instead, or their orientation), and we can combine lower bounds into their maximum, yielding a more precise lower bound, and thus a faster IDA*.
Putting all that together, we obtain an optimal solver for Rubik’s cubes. But even with these heuristics, Twentyseven can take hours to solve a random cube optimally. Kociemba’s Cube Explorer is apparently much faster (I’ve never tried it myself). My guess is that the difference is due to a better selection of projections, yielding better heuristics. But I haven’t gotten around to figuring out whether I’ve misinterpreted his notes or whether those improvements can only be found in the code.
A faster alternative is Kociemba’s two phase algorithm. It is suboptimal, but it solves Rubik’s cubes in a fraction of a second (1000 cubes per minute). The first phase puts cubies into a “common orientation” and “separates” the edges into two groups. In other words, we reach a state where the permutation of 12 edges can be decomposed into two disjoint permutations of 4 and 8 edges respectively. In the second phase, we restrict the possible moves: quarter- and half-turns on the top and bottom faces, half-turns only on the other faces. These restricted moves preserve the “common orientation” of edges and corners from phase 1, and the edges in the middle slice stay in their slice. Each phase thus performs an IDA* search in a much smaller space than the full Rubik’s cube state space (2 217 093 120 and 19 508 428 800 states respectively).
Compositionality means that for every node, its descendants—the other nodes reachable from it—are defined by composing the descendants of its children. Dynamism means that the children of a node are generated only when that node is visited; we will see that this requirement corresponds to asking for a monadic unfold.
A prior solution, using the Phases applicative functor,
is compositional but not dynamic in that sense. The essence of Phases
is a zipping operation in free applicative functors.
What if we did zipping in free monads instead?
This is a Literate Haskell post. The source code is on Gitlab. A reusable version of this code is now available on Hackage: the weave library.
Extensions and imports for this Literate Haskell file
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PackageImports #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns #-}
{-# OPTIONS_GHC -Wno-x-partial -Wno-unused-matches -Wno-unused-top-binds -Wno-unused-imports #-}
import "deepseq" Control.DeepSeq (NFData)
import Data.Foldable (toList)
import Data.Function ((&))
import Data.Functor ((<&>))
import Data.Functor.Identity (Identity(..), runIdentity)
import GHC.Generics (Generic)
import "tasty" Test.Tasty (TestTree, localOption)
import "tasty-hunit" Test.Tasty.HUnit ((@?=), testCase)
import "tasty-bench" Test.Tasty.Bench (bgroup, bench, defaultMain, nf, bcompare)
-- import "tasty-bench" Test.Tasty.Bench (mutatorCpuTime)
import "tasty-expected-failure" Test.Tasty.ExpectedFailure (expectFail)
import "some" Data.Some.Newtype (Some(Some))
import "transformers" Control.Monad.Trans.State
import qualified "containers" Data.Set as Set
import "containers" Data.Set (Set)Background: breadth-first folds and traversals
Our running example will be the type of binary trees:
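A definition matching its use throughout this post (the deriving list is an assumption, chosen to cover the tests and benchmarks below):
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Eq, Show, Functor, Foldable, Traversable, Generic, NFData)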
A breadth-first walk explores the tree level by level; every level contains the
nodes at the same distance from the root. The list of levels of a tree can be defined
recursively—it is a fold. For a tree Node x l r, the first level contains
just the root node x, and the subsequent levels are obtained by appending the
levels of the subtrees l and r pairwise.
levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : zipLevels (levels l) (levels r)
zipLevels :: [[a]] -> [[a]] -> [[a]]
zipLevels [] yss = yss
zipLevels xss [] = xss
zipLevels (xs : xss) (ys : yss) = (xs ++ ys) : zipLevels xss yss
(We can’t just use zipWith because it throws away the end of a list when the
other list is empty.)
Finally, we concatenate the levels together to obtain the list of nodes in breadth-first order.
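Concretely (this function is referred to as toListBF later on):
toListBF :: Tree a -> [a]
toListBF = concat . levels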
Thanks to laziness, the list will indeed be produced by walking the tree in breadth-first order. So far so good.
The above function lets us fold a tree in breadth-first order. The next level of difficulty is to traverse a tree, producing a tree with the same shape as the original tree, only with modified labels.
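The signature we are after, mirroring traverse as noted next:
traverseBF :: Applicative m => (a -> m b) -> Tree a -> m (Tree b)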
This has the exact same type as traverse, which you might obtain with
deriving (Foldable, Traversable). The stock-derived Traversable—enabled
by the DeriveTraversable extension—is a depth-first traversal, but the laws
of traverse don’t specify the order in which nodes should be visited,
so you could make it a breadth-first traversal if you wanted.
To define a breadth-first traversal is a surprisingly non-trivial exercise, as pointed out by Chris Okasaki in Breadth-first numbering: lessons from a small exercise in algorithm design (ICFP 2000).
“Breadth-first numbering” is a special case of “breadth-first traversal”
where the arrow (a -> m b) is specialized to a counter.
Okasaki presents a “numbering” solution based on queues and another solution
based on levels.
Both are easily adaptable to the more general “traversal” problem as we will
soon see.
There is a wonderful Discourse thread from 2024 on the topic of
breadth-first traversals.
The first post gives an elegant breadth-first numbering algorithm
which also appears in the appendix of Okasaki’s paper,
but sadly it does not generalize from “numbering” to
“traversal” beyond the special case m = State s.
Last but not least, another level-based solution to the breadth-first traversal
problem can be found in the
tree-traversals library by Noah Easterly.
It is built around an applicative transformer named Phases,
which is a list of actions—imagine the type “[m _]”—where each
element m _ represents one level of the tree.
The Phases applicative enables a compositional definition of a
breadth-first traversal, similarly to the levels function above:
the set of nodes reachable from the root is defined by combining the sets of
nodes reachable from its children. This concern of compositionality
is one of the main motivations behind this post.
Non-standard terminology
The broad family of algorithms being discussed is typically called
“breadth-first search” (BFS) or “breadth-first traversal”,
but in general these algorithms are not “searching” for anything,
and in Haskell, “traversal” is reserved for “things like traverse”.
Instead, this post will use “walks” as a term encompassing folds, traversals,
unfolds, or any concept that can be qualified with “breadth-first”.
Problem statement: Breadth-first unfolds
Both the fold toListBF and the traversal traverseBF had in common that they
receive a tree as an input. This explicit tree makes the notion of levels
“static”. With unfolds, we will have to deal with levels that exist only
“dynamically” as the result of unfolding the tree progressively.
To introduce the unfolding of a tree, it is convenient to introduce its “base functor”. We modify the tree type by replacing the recursive tree fields with an extra type parameter:
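A base functor consistent with its use below (the deriving list is an assumption; Functor and Foldable are what the later code relies on):
data TreeF a t = LeafF | NodeF a t t
  deriving (Functor, Foldable)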
An unfold generates a tree from a seed and a function which expands the seed into a leaf or a node containing more seeds. A pure unfold—or anamorphism—can be defined readily:
unfold :: (s -> TreeF a s) -> s -> Tree a
unfold f s = case f s of
LeafF -> Leaf
NodeF a l r -> Node a (unfold f l) (unfold f r)
The order in which nodes are evaluated depends on
how the resulting tree is consumed. Hence unfold
is neither inherently “depth-first” nor “breadth-first”.
The situation changes if we make the unfold monadic.
An implementation of unfoldM must decide upon an ordering between actions.
To see why adding an M to unfold imposes an ordering,
contemplate the fact that these expressions have the same meaning:
Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
let tr = unfold f r in
Node a tl tr )
= ( let tr = unfold f r in
let tl = unfold f l in
Node a tl tr )
whereas these monadic expressions do not have the same meaning in general:
( unfoldM f l >>= \tl ->
unfoldM f r >>= \tr ->
pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
unfoldM f l >>= \tl ->
pure (Node a tl tr) )
Without further requirements, there is an “obvious” definition of unfoldM,
which is a depth-first unfold:
unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = f s >>= \case
LeafF -> pure Leaf
NodeF a l r -> liftA2 (Node a) (unfoldM_DF f l) (unfoldM_DF f r)
We unfold the left subtree l fully before unfolding the right one r.
The problem is to define a breadth-first unfoldM.
If you want to think about this problem on your own, you can stop reading here. The rest of this post presents solutions.
Queue-based unfold
The two breadth-first numbering algorithms in Okasaki’s paper can
actually be generalized to breadth-first unfolds.
Here is the first one that uses queues (using the function (<+) for “push” and
pattern-matching on (:>) for “pop”):
unfoldM_BF_Q :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_Q f b0 = go (b0 <+ Empty) <&> \case
_ :> t -> t
_ -> error "impossible"
where
go Empty = pure Empty
go (q :> b) = f b >>= \case
LeafF -> go q <&> \p -> Leaf <+ p
NodeF a b1 b2 -> go (b2 <+ b1 <+ q) <&> \case
p :> t1 :> t2 -> Node a t1 t2 <+ p
_ -> error "impossible"(The operator (<&>) is flip (<$>). I use it to avoid parentheses around
lambdas.)
Queue implementation for unfoldM_BF_Q
data Q a = Q [a] [a]
pattern Empty :: Q a
pattern Empty = Q [] []
infixr 1 <+
(<+) :: a -> Q a -> Q a
x <+ Q xs ys = Q (x : xs) ys
pop :: Q a -> Maybe (Q a, a)
pop (Q xs (y : ys)) = Just (Q xs ys, y)
pop (Q xs []) = case reverse xs of
[] -> Nothing
y : ys -> Just (Q [] ys, y)
infixl 1 :>
pattern (:>) :: Q a -> a -> Q a
pattern q :> y <- (pop -> Just (q, y))
{-# COMPLETE Empty, (:>) #-}
As it happens, containers uses that queue-based technique to implement
breadth-first unfold for rose trees (Data.Tree.unfoldTreeM_BF).
There is a pending question of whether we can improve upon it.
This post might provide a theoretical alternative,
but it seems too slow to be worth serious consideration
(see the benchmark section).
If you’re frowning upon the use of error—as you should be—you can replace
error with dummy values here (Empty, Leaf), but
(1) that won’t be possible with tree structures that must be non-empty
(e.g., if Leaf contained a value) and (2) this is dead code, which
is harmless but no more elegant than making it obvious with error.
The correctness of this solution is also not quite obvious.
There are subtle ways to get this implementation wrong:
should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q?
Should the pattern be p :> t1 :> t2 or p :> t2 :> t1?
For another version of this challenge, try implementing the unfold for another
tree type, such as finger trees or rose trees, without getting lost in the
order of pushes and pops (by the way, this is Data.Tree.unfoldTreeM_BF in
containers). The invariant is not complex but there is room for mistakes.
I believe that the compositional approach that will be presented later is more
robust on that front, although it is admittedly a subjective quality for which
it is difficult to make a strong case.
Some uses of unfolds
Traversals from unfolds
One sense in which unfoldM is a more difficult problem than traverse is
that we can use unfoldM to implement traverse.
We do have to make light of the technicality that there is a Monad constraint
instead of Applicative, which makes unfoldM not suited to implement the
Traversable class.
A depth-first unfold gives a depth-first traversal:
traverse_DF :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_DF = unfoldM_DF . traverseRoot
-- auxiliary function
traverseRoot :: Applicative m => (a -> m b) -> Tree a -> m (TreeF b (Tree a))
traverseRoot _ Leaf = pure LeafF
traverseRoot f (Node a l r) = f a <&> \b -> NodeF b l r
A breadth-first unfold gives a breadth-first traversal:
traverse_BF_Q :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_BF_Q = unfoldM_BF_Q . traverseRoot
Unfolds in graphs
We can use a tree unfold to explore a graph. This usage distinguishes unfolds from folds and traversals, which only let you explore trees.
Given a type of vertices V, a directed graph is represented by a function
V -> F V, where F is a functor which describes the arity of each node.
The obvious choice for F is lists, but we will stick to TreeF here
so we can just reuse this post’s unfoldM implementations.
The TreeF functor restricts us to graphs where each node has zero or two
outgoing edges; it is a weird restriction, but we will make do for the sake of
example.
+-------+
v |
+->1--->2--->3 |
| | | ^ |
| v v | |
| 4--->5--->6--+
| | | ^
| +----|----+
| |
+-------+
The graph drawn above turns into the following function, where every vertex
is mapped either to NodeF with the same vertex as the first argument followed
by its two adjacent vertices, or to LeafF if it has no outgoing edges or does
not belong to the graph.
graph :: Int -> TreeF Int Int
graph 1 = NodeF 1 2 4
graph 2 = NodeF 2 3 5
graph 3 = LeafF
graph 4 = NodeF 4 5 6
graph 5 = NodeF 5 1 6
graph 6 = NodeF 6 2 3
graph _ = LeafF
If we simply feed that function to unfold, we will get the infinite tree
of all possible paths from a chosen starting vertex.
To obtain a finite tree, we want to keep track of vertices that we have
already visited, using a stateful memory. The following function wraps graph,
returning LeafF also if a vertex has already been visited.
visitGraph :: Int -> State (Set Int) (TreeF Int Int)
visitGraph vertex = do
visited <- get
if vertex `elem` visited then pure LeafF
else do
put (Set.insert vertex visited)
pure (graph vertex)
Applying unfoldM_BF to that function produces a “breadth-first tree”
of the graph, an encoding of the trajectory of a breadth-first walk through the
graph. “Breadth-first trees” are a concept from graph theory with well-studied
properties.
-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
testGraphQ :: TestTree
testGraphQ = testCase "Q-graph" $
bfGraph_Q 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Compile and run
This post is a compilable Literate Haskell file. You can run all of the tests
and benchmarks in here. The source repository provides the necessary
configuration to build it with cabal.
$ cabal build breadth-first-unfolds
Test cases can then be selected with the -p option and a pattern
(see the tasty documentation for details).
Run all tests and benchmarks by passing no option.
$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
Q-graph: OK
S-graph: OK
“Global” level-based unfold
The other solution from Okasaki’s paper can also be adapted into a monadic unfold.
The starting point is to unfold a list of seeds [s] instead of a single seed:
we can traverse the list with the expansion function s -> m (TreeF a s) to
obtain another list of seeds, the next level of the breadth-first unfold,
and keep going.
Iterating this process naively yields a variant of monadic unfold without a
result. This no-result variant can be generalized from TreeF to
any foldable structure:
-- Inner loop: multi-seed unfold
unfoldsM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> [s] -> m ()
unfoldsM_BF_G_ f [] = pure ()
-- Read from right to left: traverse, flatten, recurse.
unfoldsM_BF_G_ f xs = unfoldsM_BF_G_ f . concatMap toList =<< traverse f xs
-- Top-level function: single-seed unfold
unfoldM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> s -> m ()
unfoldM_BF_G_ f = unfoldsM_BF_G_ f . (: [])
Modifying this solution to create the output tree requires a little more thought.
We must keep hold of the intermediate list of ts :: [TreeF a s] to
reconstruct trees after the recursive call returns.
unfoldsM_BF_G :: Monad m => (s -> m (TreeF a s)) -> [s] -> m [Tree a]
unfoldsM_BF_G f [] = pure []
-- traverse, flatten, recurse, reconstruct
unfoldsM_BF_G f xs = traverse f xs >>= \ts ->
reconstruct ts <$> unfoldsM_BF_G f (concatMap toList ts)
The reconstruction function picks a root in the first list and completes it with subtrees from the second list:
reconstruct :: [TreeF a s] -> [Tree a] -> [Tree a]
reconstruct (LeafF : ts) us = Leaf : reconstruct ts us
reconstruct (NodeF a _ _ : ts) (l : r : us) = Node a l r : reconstruct ts us
reconstruct _ _ = error "impossible"You could modify the final branch to produce [], but error makes it
explicit that this branch should never be reached by the unfold where it is
used.
The top-level unfold function wraps the seed in a singleton input list and extracts the root from a singleton output list.
unfoldM_BF_G :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_G f = fmap head . unfoldsM_BF_G f . (: [])
Unit test testGraphG
This solution is less brittle than the queue-based solution because
we always traverse lists left-to-right.
To avoid the uses of error in reconstruct,
you can probably create a specialized data structure in place of [TreeF a s],
but that is finicky in its own way.
In search of compositionality
Both of the solutions above (the queue-based and the “monolithic” level-based unfolds)
stem from a global view of breadth-first walks: we are iterating on a list or a
queue which holds all the seeds from one or two levels at a time.
That structure represents a “front line” between visited and unvisited
vertices, and every iteration advances the front line a little: with a queue we
advance it one vertex at a time, with a list we advance the whole front line
in an inner loop—one call to traverse—before recursing.
The opposite local view of breadth-first order is exemplified by the earlier
levels function: it only produces a list of lists of the vertices
reachable from the current root. It does so recursively, by composing
together the vertices reachable from its children. Our goal here is to find a
similarly local, compositional implementation of breadth-first unfolds.
Rather than defining unfoldM directly, which sequences the computations on
all levels into a single computation, we will introduce an intermediate
function weave that keeps levels separate—just as toListBF is defined
using levels.
The result of weave will be in an as yet unknown applicative functor F m
depending on m.
And because levels are kept separate, weave only needs
a constraint Applicative m to compose computations on the same level.
The goal is to implement this signature, where the result type F is also an
unknown:
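That is, the same signature as the definition given below:
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)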
The name weave comes from visualizing a breadth-first walk
as a path zigzagging across a tree like this:
which is reminiscent of weaving as in the making of textile:
With only what we know so far, a bit of type-directed programming leads to the
following incomplete definition. We have constructed something of type
m (F m (Tree a)), while we expect F m (Tree a):
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = _ (step <$> f s) where
step :: TreeF a s -> F m (Tree a)
step LeafF = pure Leaf
step (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)
To fill the hole _, we postulate the following primitive, weft,
as part of the unknown definition of F:
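Its assumed signature, stated in terms of the unknown F:
weft :: m (F m a) -> F m a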
Intuitively, F m represents “multi-level computations”.
The weft function constructs a multi-level (F m)-computation from
one level of m-computation which returns the subsequent levels
as an (F m)-computation.
We fill the hole with weft, completing the definition of weave:
weave :: forall m s a. Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s) where
weaveF :: TreeF a s -> F m (Tree a)
weaveF LeafF = pure Leaf
weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)
The function weave defines a multi-level computation which represents
a breadth-first walk from a seed s:
- the first level of the walk is f s, expanding the initial seed;
- the auxiliary function weaveF constructs the remaining levels from the initial seed’s expansion:
  - if the seed expands to LeafF, there are no more seeds, and we terminate with an empty computation (pure);
  - if the seed expands to NodeF, we obtain two sub-seeds l and r, they generate their own weaves recursively (weave f l and weave f r), and we compose them (liftA2).
One way to think about weft is as a generalization of the following primitives:
we can “embed” m-computations into F m,
and we can “delay” multi-level (F m)-computations, shifting the
m-computation on each level to the next level.
embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)
delay :: Applicative m => F m a -> F m a
delay u = weft (pure u)
The key law relating these two operations is that embedded computations and delayed computations commute with each other:
embed u *> delay v = delay v <* embed u
The embed and delay operations are provided by the Phases applicative
functor that I mentioned earlier, which enables breadth-first traversals,
but not breadth-first unfolds. Thus, weft is a strictly more expressive
primitive than embed and delay.
Eventually, we will run a multi-level computation as a single m-computation
so that we can use weave to define unfoldM. The runner function will be
called mesh:
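Its signature, matching the concrete version defined later:
mesh :: Monad m => F m a -> m a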
It is characterized by this law which says that mesh executes the first
level of the computation u :: m (F m a), then executes the remaining levels
recursively:
mesh (weft u) = u >>= mesh
Putting everything together, weave and mesh combine into a breadth-first unfold:
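In code, something along these lines:
unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f s = mesh (weave f s)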
It remains to find an applicative functor F equipped with weft and mesh.
The weave applicative
A basic approach to design a type is to make some of the operations it
should support into constructors. The weave applicative WeaveS has
constructors for pure and weft:
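That is:
data WeaveS m a
  = EndS a
  | WeftS (m (WeaveS m a))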
(The suffix “S” stands for Spoilers. Read on!)
We instantiate the unknown functor F with WeaveS.
Astute readers will have recognized WeaveS as the free monad.
Just as Phases has the same type definition as the free applicative functor but
a different Applicative instance, we will give WeaveS an Applicative
instance that does not coincide with the Applicative and Monad instances of
the free monad.
Starting with the easy functions,
weft is WeftS, and the equation for mesh above is basically its definition.
We just need to add an equation for EndS.
weft :: m (WeaveS m a) -> WeaveS m a
weft = WeftS
mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a) = pure a
mesh (WeftS u) = u >>= mesh
Recall that WeaveS represents multi-level computations.
Computations are composed level-wise with the following liftS2.
The interesting case is the one where both arguments are WeftS: we compose
the first level with liftA2, and the subsequent ones with liftS2
recursively.
liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 f (EndS a) wb = f a <$> wb
liftS2 f wa (EndS b) = flip f b <$> wa
liftS2 f (WeftS wa) (WeftS wb) = WeftS ((liftA2 . liftS2) f wa wb)
liftS2 will be the liftA2 in WeaveS’s Applicative instance.
The Functor and Applicative instances show that WeaveS is an
applicative transformer: for every applicative functor m,
WeaveS m is also an applicative functor.
instance Functor m => Functor (WeaveS m) where
fmap f (EndS a) = EndS (f a)
fmap f (WeftS wa) = WeftS ((fmap . fmap) f wa)
instance Applicative m => Applicative (WeaveS m) where
pure = EndS
liftA2 = liftS2
That completes the definition of unfoldM_BF: a level-based, compositional
breadth-first unfold.
As a unit test, we copy the code for visiting a graph from earlier:
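A sketch of bfGraphS, assuming it mirrors bfGraph_Q but uses the weave-based unfold defined above:
bfGraphS :: Int -> Tree Int
bfGraphS s = evalState (mesh (weave visitGraph s)) Set.empty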
testGraphS :: TestTree
testGraphS = testCase "S-graph" $
bfGraphS 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Code golf
There is a variant of weave that I prefer:
weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
LeafF -> pure Leaf
NodeF a l r -> liftA2 (Node a) (weft (weaveS f l)) (weft (weaveS f r))
The outer weft constructor was moved into the recursive calls.
The result type has an extra m, which makes it more apparent that
we always start with a call to f. It’s the same vibe as replacing the type
[a] with NonEmpty a when we know that a list will always have at least one
element; weaveS always produces at least one level of computation.
We also replace (<$>) with its flipped version (<&>) for aesthetic reasons:
we can apply it to a lambda without parentheses, and that change makes the
logic flow naturally from left to right: we first expand the seed s using
f, and continue depending on whether the expansion produced LeafF or NodeF.
To define unfoldM, instead of applying mesh directly, we chain it with
(>>=).
unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh
A wrinkle in time
That solution is Obviously Correct™, but it has a terrible flaw: it does not run in linear time!
We can demonstrate this by generating a “thin” tree whose height
is equal to its size.
The height h is the seed of the unfolding, and we generate a NodeF as long
as it is non-zero, asking for a decreased height h - 1 on the right,
and a zero height on the left.
thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
Compare the running times of evaluating thinTreeS at height 100
(the baseline)
and at height 1000 (10x the baseline).
benchS :: TestTree
benchS = bgroup "S-thin"
[ bench "1x" (nf thinTreeS 100)
, bench "10x" (nf thinTreeS 1000) & bcompare "S-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 105x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
S-thin
1x: OK
27.6 μs ± 2.6 μs, 267 KB allocated, 317 B copied, 6.0 MB peak memory
10x: OK
2.90 ms ± 181 μs, 23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x
Multiplying the height by 10x makes the function run 100x slower. Dramatically quadratic.
Complexity analysis
We can compare this implementation with levels from earlier, which is linear-time.
In particular, looking at zipLevels with liftS2—which play similar
roles—there is a crucial difference when one of the arguments is empty
([] or EndS):
zipLevels simply returns the other argument, whereas liftS2 calls (<$>),
continuing the recursion down the other argument.
So zipLevels stops working after reaching the end of either argument, whereas
liftS2 walks to the end of both arguments. There is at least one
call to liftS2 on every level which will walk to the bottom of the tree,
so we get a quadratic lower bound Ω(height²).
Out of sight, out of mind
The problematic combinators are fmap and liftS2, which weaveS uses to
construct the unfolded tree. If we don’t care about that tree—wanting only
the effect of a monadic unfold—then we can get rid of the complexity
associated with those combinators.
With no result to return, we remove the a type parameter from the definition
of WeaveS, yielding the oblivious (“O”) variant:
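That is:
data WeaveO m
  = EndO
  | WeftO (m (WeaveO m))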
We rewrite mesh into meshO, reducing a WeaveO m computation
into m () instead of m a.
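A definition in line with that description:
meshO :: Monad m => WeaveO m -> m ()
meshO EndO = pure ()
meshO (WeftO u) = u >>= meshO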
The Applicative instance for WeaveS becomes a Monoid instance for WeaveO.
liftA2 is replaced with (<>), zipping two computations level-wise.
instance Applicative m => Semigroup (WeaveO m) where
EndO <> v = v
u <> EndO = u
WeftO u <> WeftO v = WeftO (liftA2 (<>) u v)
instance Applicative m => Monoid (WeaveO m) where
mempty = EndO
mappend = (<>)
To implement a breadth-first walk, we modify weaveS above by replacing
liftA2 (Node a) with (<>). Note that the type parameter a is no longer in
the result. It was only used in the tree that we decided to forget.
weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
LeafF -> mempty
NodeF _ l r -> WeftO (weaveO f l) <> WeftO (weaveO f r)
Running weaveO with meshO yields an oblivious monadic unfold:
it produces m () instead of m (Tree a).
(You may remember seeing another implementation of that same signature
just earlier, unfoldM_BF_G_.)
unfoldM_BF_O_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_O_ f s = weaveO f s >>= meshO
Previously, we benchmarked the function thinTreeS that outputs a tree by
forcing the tree. With an oblivious unfold, there is no tree to force.
Instead we will count the number of generated NodeF constructors:
thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
where
f 0 counter = (LeafF, counter)
f h counter = (NodeF () 0 (h - 1), counter + 1) -- increment the counter for every NodeF
We adapt the benchmark from before to measure the complexity of unfolding thin trees. We have to increase the baseline height from 100 to 500 because this benchmark runs so much faster than the previous ones.
benchO :: TestTree
benchO = bgroup "O-thin"
[ bench "1x" (nf thinTreeO 500)
, bench "10x" (nf thinTreeO 5000) & bcompare "O-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 9.8x |
Raw output
$ cabal exec breadth-first-unfolds -- -p O-thin
All
O-thin
1x: OK
148 μs ± 8.3 μs, 543 KB allocated, 773 B copied, 6.0 MB peak memory
10x: OK
1.45 ms ± 113 μs, 5.4 MB allocated, 82 KB copied, 7.0 MB peak memory, 9.78x
The growth is linear, as desired: the “10x” bench is 10x slower than the baseline “1x” bench.
Laziness for the win
The oblivious unfold avoided quadratic explosion by simplifying the problem.
Now let’s solve the original problem again,
so we can’t just get rid of fmap and liftA2.
As mentioned previously, the root cause was that (1) liftA2 calls fmap when
one of the constructors is EndS, and (2) fmap traverses the other argument.
The next solution will be to make fmap take constant time,
by storing the “mapped function” in the constructor.
Behold the “L” variant of WeaveS, which is a GADT:
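Its shape, as implied by the instances below:
data WeaveL m a where
  EndL :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a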
For comparison, here is the previous “S” variant with GADT syntax:
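That is:
data WeaveS m a where
  EndS :: a -> WeaveS m a
  WeftS :: m (WeaveS m a) -> WeaveS m a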
This trick is also known as the “co-Yoneda construction”.
The definition of fmap is no longer recursive.
It doesn’t even need m to be a functor anymore!
instance Functor (WeaveL m) where
fmap f (EndL a) = EndL (f a)
fmap f (WeftL wa g) = WeftL wa (f . g)
The Applicative instance is… a good exercise for the reader.
The details are not immediately important—we only care about improving fmap
for now—we will come back to have a look at the Applicative instance soon.
The runner function meshL is a simple bit of type Tetris.
meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a) = pure a
meshL (WeftL wa f) = f <$> (wa >>= meshL)By partially applying WeftL to id as its second argument,
we obtain a counterpart to the unary WeftS constructor:
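Namely:
weftL :: m (WeaveL m a) -> WeaveL m a
weftL wa = WeftL wa id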
With those primitives redefined, the “weave” and “unfold” are identical. Below, we only renamed the “S” suffixes to “L”:
weaveL :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveL m (Tree a))
weaveL f s = f s <&> \case
LeafF -> pure Leaf
NodeF a s1 s2 -> liftA2 (Node a) (weftL (weaveL f s1)) (weftL (weaveL f s2))
unfoldM_BF_L :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_L f s = weaveL f s >>= meshL
The benchmarks show that 10x the height takes 10x the time. Linear growth again.
Benchmark code and output: thinTreeL and benchL
Copy of the benchS benchmark.
thinTreeL :: Int -> Tree ()
thinTreeL = runIdentity . unfoldM_BF_L f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
benchL :: TestTree
benchL = bgroup "L-thin"
[ bench "1x" (nf thinTreeL 100)
, bench "10x" (nf thinTreeL 1000) & bcompare "L-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 9.93x |
Raw output:
$ cabal exec breadth-first-unfolds -- -p "L-thin"
All
L-thin
1x: OK
14.1 μs ± 782 ns, 59 KB allocated, 5 B copied, 6.0 MB peak memory
10x: OK
140 μs ± 13 μs, 586 KB allocated, 51 B copied, 6.0 MB peak memory, 9.93x
Lazy in more ways than one
As hinted by the “L” and “S” suffixes,
WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones”
work by accumulating functions in the WeftL constructor.
That work is “forced” by meshL, which is where the fmap ((<$>)) of the
underlying monad m is called, performing the work accumulated
by possibly many calls to WeaveL’s fmap.
One subtlety is that there are multiple “lazinesses” at play.
The main benefit of using WeaveL is really to delay computation,
that is a kind of laziness, but WeaveL doesn’t need to be
implemented in a lazy language.
We can rewrite all of the code we’ve seen so far in a strict language
with minor changes, and we will still observe the quadratic vs linear behavior
of WeaveS vs WeaveL on thin trees.
The “manufactured laziness” of WeaveL is a concept independent of the
“ambient laziness” in Haskell.
Nevertheless, we can still find an interesting role for that “ambient laziness”
in this story. Indeed, the function weaveL also happens to be lazier than
weaveS in the usual sense.
A concrete test case is worth a thousand words. Consider the following
tree generator which keeps unfolding left subtrees while making
every right subtree undefined:
partialTreeF :: Bool -> TreeF () Bool
partialTreeF True = NodeF () True False
partialTreeF False = undefinedIf we used the pure unfold, we would get the same tree as this
recursive definition:
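Namely (as summarized again further down):
partialTree :: Tree ()
partialTree = Node () partialTree undefined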
What happens if we use one of the monadic unfolds? For example
unfoldM_BF_S:
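Presumably, by analogy with partialTreeL below:
partialTreeS :: Tree ()
partialTreeS = runIdentity (unfoldM_BF_S (Identity . partialTreeF) True)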
Try to force the first Node constructor.
whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
case partialTreeS of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable" -- definitely not a LeafAs it turns out, this test using the “S” variant fails. (That’s
why the test is marked with expectFail.)
Forcing partialTreeS evaluates the undefined in partialTreeF.
Therefore partialTreeS is not equivalent to partialTree.
$ cabal exec breadth-first-unfolds -- -p "S-whnf"
All
S-whnf: FAIL (expected)
Exception: Prelude.undefined
CallStack ...
In contrast, the “L” variant makes that same test succeed.
partialTreeL :: Tree ()
partialTreeL = runIdentity (unfoldM_BF_L (Identity . partialTreeF) True)
whnfTreeL :: TestTree
whnfTreeL = testCase "L-whnf" $ do
case partialTreeL of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
L-whnf: OK
This difference can only be seen with “lazy monads”, where (>>=) is
lazy in its first argument.
(If this definition sounds not quite right, that’s probably because of seq.
It makes a precise definition of “lazy monad” more complicated.)
Examples of lazy monads from the transformers library
are Identity, Reader, lazy State, lazy Writer, and Accum.
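As a minimal illustration (lazyBindDemo is a throwaway name, not code from the rest of this post), Identity’s (>>=) never forces the computation on its left, so the following evaluates to () instead of throwing:
lazyBindDemo :: ()
lazyBindDemo = runIdentity ((undefined :: Identity ()) >>= \_ -> pure ())
-- The undefined computation is discarded without ever being forced.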
The secret sauce is the definition of liftA2 for WeaveL:
instance Applicative m => Applicative (WeaveL m) where
pure = EndL
liftA2 f (EndL a) wb = f a <$> wb
liftA2 f wa (EndL b) = flip f b <$> wa
liftA2 f (WeftL wa g) (WeftL wb h)
= WeftL ((liftA2 . liftA2) (,) wa wb) (\ ~(a, b) -> f (g a) (h b))
In the third clause of liftA2, we put the function f in a lambda with a
lazy pattern (~(a, b)) directly under the topmost constructor WeftL.
Thus, we can access the result of f from the second field of WeftL
without looking at the first field. In liftS2 earlier, f was
passed as an argument to (liftA2 . liftS2), which forces us to run the
computation before we can get hold of the result of f.
Maximizing laziness
The “L” variant of unfoldM is lazier than the “S” variant,
but there is still a gap between partialTreeL and the pure partialTree:
if we force not only the root, but also the left subtree of partialTreeL,
then we run into undefined again.
forceLeftTreeL :: TestTree
forceLeftTreeL = expectFail $ testCase "L-left" $ do
case partialTreeL of
Node _ (Node _ _ _) _ -> pure () -- Succeed
_ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-left"
All
L-left: FAIL (expected)
Exception: Prelude.undefined
Although the unfold using WeaveL is lazier than using WeaveS,
it is not yet as lazy as it could be.
The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function.
The expansion function partialTreeF produces a level with an undefined
sub-computation, which crashes the whole level.
Each level in a computation will be either completely defined or undefined.
To recap, we’ve been looking at the following trees:
partialTreeS = undefined
partialTreeL = Node () undefined undefined
partialTree = Node () partialTree undefined
It is natural to ask: can we define a breadth-first unfold that, when applied
to partialTreeF, will yield the same tree as partialTree?
More generally, the new problem is to define a breadth-first unfoldM
whose specialization with the Identity functor is equivalent to
the pure unfold even on partially-defined values. That is, it satisfies
the following equation:
unfold f = runIdentity . unfoldM (Identity . f)
Laziness without end
The strictness of liftA2 is caused by WeaveL having two constructors.
Let’s get rid of EndL.
Having only one constructor lets us use lazy patterns:
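Concretely, the attempted definition would look something like this (a sketch: the fmap equation is the one quoted in the error message below, and the data declaration is a reconstruction):
data WeaveE m a where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeaveE m a
instance Functor (WeaveE m) where
  fmap f ~(WeftE wa g) = WeftE wa (f . g)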
Wait a second. I spoke too fast, GHC gives us an error:
error: [GHC-87005]
• An existential or GADT data constructor cannot be used
inside a lazy (~) pattern
• In the pattern: WeftE wa g
In the pattern: ~(WeftE wa g)
In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
|
641 | > fmap f ~(WeftE wa g) = WeftE wa (f . g)
| ^^^^^^^^^^
The feature we need is “first-class existentials”, for which there is an open GHC proposal.
That needn’t stop us: a simple version of first-class existentials
is available in the package some,
as the module Data.Some.Newtype (internally using unsafeCoerce).
That will be sufficient for our purposes.
All we need is an abstract type Some and a pattern synonym:
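The relevant part of that module’s interface boils down to this shape (signatures only, as a sketch):
data Some f
pattern Some :: f a -> Some f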
And we’re back on track. Here comes the actual “E” (endless) variant:
newtype WeaveE m a = MkWeaveE (Some (WeavingE m a))
data WeavingE m a b where
WeftE :: m (WeaveE m b) -> (b -> a) -> WeavingE m a b
I spare you the details.
Functor, Applicative, weftE, meshE
instance Functor (WeaveE m) where
fmap f (MkWeaveE (Some ~(WeftE u g))) = MkWeaveE (Some (WeftE u (f . g)))
instance Applicative m => Applicative (WeaveE m) where
pure x = MkWeaveE (Some (WeftE (pure (pure ())) (\_ -> x)))
liftA2 f (MkWeaveE (Some ~(WeftE u g))) (MkWeaveE (Some ~(WeftE v h)))
= MkWeaveE (Some (WeftE ((liftA2 . liftA2) (,) u v) (\ ~(x, y) -> f (g x) (h y))))
weftE :: m (WeaveE m a) -> WeaveE m a
weftE u = MkWeaveE (Some (WeftE u id))
meshE :: Monad m => WeaveE m a -> m a
meshE (MkWeaveE (Some (WeftE u f))) = f <$> (u >>= meshE)
Breadth-first unfold, “E” variant: weaveE and unfoldM_BF_E
weaveE :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveE m (Tree a))
weaveE f s = f s <&> \case
LeafF -> pure Leaf
NodeF a s1 s2 -> liftA2 (Node a) (weftE (weaveE f s1)) (weftE (weaveE f s2))
unfoldM_BF_E :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_E f s = weaveE f s >>= meshE
The endless WeaveE enables an even lazier implementation of unfoldM.
When specialized to the identity monad, it lets us force the resulting
tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).
partialTreeE :: Tree ()
partialTreeE = runIdentity (unfoldM_BF_E (Identity . partialTreeF) True)
forceLeftTreeE :: TestTree
forceLeftTreeE = testCase "E-left" $ do
case partialTreeE of
Node _ (Node _ _ _) _ -> pure () -- Succeed
_ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "E-left"
All
E-left: OK
One can also check that forcing the left spine of partialTreeE
arbitrarily deep throws no errors.
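For example, a check along these lines (forceSpine is a name made up for this sketch) terminates for any depth:
forceSpine :: Int -> Tree a -> ()
forceSpine 0 _ = ()
forceSpine _ Leaf = ()
forceSpine n (Node _ l _) = forceSpine (n - 1) l
-- forceSpine 1000 partialTreeE evaluates to (), never touching the
-- undefined right subtrees.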
We made it lazy, but at what cost?
First, this “Endless” variant only works for lazy monads.
With a strict monad, the runner meshE will loop forever.
It is possible to run things more incrementally by pattern-matching on
WeaveE, but you’re better off using the oblivious WeaveO anyway.
Second, when you aren’t running into an unproductive loop, the “Endless” variant of
unfoldM has quadratic time complexity Ω(height²). The reason
is essentially the same as for the “Strict” variant: liftA2 keeps looping even if
one argument is a pure weave. Before, that was in order to traverse the other,
non-pure argument; now, there isn’t even a way to tell when the computation
has ended.
Thus, every leaf may create work proportional to the height of the tree.
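In the thin-tree benchmark below there is roughly one leaf per level, so over a height h those contributions add up to on the order of h² steps, consistent with the Ω(height²) lower bound.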
Running the same benchmark as before, we measure even more baffling timings:
| height | time |
|---|---|
| baseline | 1x |
| 10x | 738x |
Benchmark: thinTreeE and benchE
thinTreeE :: Int -> Tree ()
thinTreeE = runIdentity . unfoldM_BF_E f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
benchE :: TestTree
benchE = {- localOption mutatorCpuTime $ -} bgroup "E-thin"
[ bench "1x" (nf thinTreeE 100)
, bench "10x" (nf thinTreeE 1000) & bcompare "E-thin.1x"
]
Raw output:
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
E-thin
1x: OK
243 μs ± 22 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
10x: OK
179 ms ± 17 ms, 119 MB allocated, 29 MB copied, 21 MB peak memory, 737.76x
Using the previous setup comparing a baseline and a 10x run, we see a more than 700x slowdown, so much worse than the 100x predicted by a quadratic model. Interestingly, the raw output shows that the total cumulative allocations did grow by a 100x factor.¹
But it gets weirder with more data points: it does not follow a clear power law. If Time(n) grew as nᶜ for some fixed exponent c, then the ratio Time(Mn)/Time(n) would be Mᶜ, a constant that does not depend on n.
In the following benchmark, we keep doubling the height (M = 2) for every test case, and we measure the time relative to the preceding case each time. A quadratic model predicts a 4x slowdown at every step. Instead, we observe wildly varying factors.
Benchmark output (each time factor is relative to the preceding line, for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):
| height | time |
|---|---|
| 1x | |
| 2x | 10.9x |
| 4x | 9.5x |
| 8x | 5.4x |
| 16x | 1.4x |
Code and raw output
benchE' :: TestTree
benchE' = {- localOption mutatorCpuTime $ -} bgroup "E-thin-more"
[ bench "1x" (nf thinTreeE 100)
, bench "2x" (nf thinTreeE 200) & bcompare "E-thin-more.1x"
, bench "4x" (nf thinTreeE 400) & bcompare "E-thin-more.2x"
, bench "8x" (nf thinTreeE 800) & bcompare "E-thin-more.4x"
, bench "16x" (nf thinTreeE 1000) & bcompare "E-thin-more.8x"
]
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
222 μs ± 9.3 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
2x: OK
2.43 ms ± 85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
4x: OK
23.1 ms ± 1.2 ms, 19 MB allocated, 2.7 MB copied, 10 MB peak memory, 9.53x
8x: OK
126 ms ± 7.8 ms, 76 MB allocated, 18 MB copied, 24 MB peak memory, 5.44x
16x: OK
181 ms ± 7.0 ms, 119 MB allocated, 30 MB copied, 24 MB peak memory, 1.44x
I believe this benchmark is triggering some pathological behavior in the garbage
collector. I modified tasty-bench with an option to measure CPU time without GC
(mutator time). At the time of writing, tasty-bench is still waiting for a new release.
We can point Cabal to an unreleased commit of tasty-bench by adding the following
lines to cabal.project.local.
source-repository-package
type: git
location: https://github.com/Bodigrim/tasty-bench.git
tag: 81ff742a3db1d514461377729e00a74e5a9ac1b8
Then, uncomment the setting “localOption mutatorCpuTime $” in benchE and
benchE' above and uncomment the import of mutatorCpuTime at the top.
Benchmark output (excluding GC time, relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 95x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
E-thin
1x: OK
216 μs ± 18 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
10x: OK
20.5 ms ± 1.9 ms, 119 MB allocated, 29 MB copied, 21 MB peak memory, 94.91x
For the “2x” benchmarks, we are closer to the expected 4x slowdown, but there is still a noticeable gap. I’m going to chalk the rest up to inherent measurement errors (the cost of tasty-bench’s simplicity) exacerbated by the pathological GC behavior; a possible explanation is that the pattern of memory usage becomes so bad that it affects non-GC time as well.
Benchmark output (excluding GC time, each measurement is relative to the preceding line):
| height | time |
|---|---|
| 1x | |
| 2x | 3.2x |
| 4x | 4.2x |
| 8x | 4.5x |
| 16x | 1.7x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
186 μs ± 16 μs, 1.2 MB allocated, 13 KB copied, 21 MB peak memory
2x: OK
597 μs ± 28 μs, 4.8 MB allocated, 236 KB copied, 21 MB peak memory, 3.20x
4x: OK
2.48 ms ± 148 μs, 19 MB allocated, 2.9 MB copied, 21 MB peak memory, 4.15x
8x: OK
11.2 ms ± 986 μs, 76 MB allocated, 18 MB copied, 24 MB peak memory, 4.50x
16x: OK
18.4 ms ± 1.7 ms, 119 MB allocated, 29 MB copied, 24 MB peak memory, 1.65x
It doesn’t seem possible for a breadth-first unfold to be both maximally lazy and of linear time complexity, but I don’t know how to formally prove that impossibility either.
Microbenchmarks: Queues vs Global Levels vs Weaves
So far we’ve focused on asymptotics (linear vs quadratic). Some readers will inevitably wonder about real speed. Among the linear-time algorithms—queues (“Q”), global levels (“G”), and weaves (lazy “L” or oblivious “O”)—which one is faster?
tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!), but keep in mind that these are all quite naive implementations.
There are two categories to measure separately: unfolds which produce trees,
and oblivious unfolds—which don’t produce trees. These microbenchmarks
construct full trees up to a chosen number of nodes. When there is an
output tree, we force it (using nf), otherwise we force a counter of the
number of nodes. We run on different sufficiently large sizes (500 and 5000)
to check the stability of the measured factors, ensuring that we are only
comparing the time components that dominate at scale.
The tables list times relative to the queue benchmark for each tree size.
Tree-producing unfolds
| algorithm | size | time |
|---|---|---|
| Queue | 500 | 1x |
| Global Levels | 500 | 1.4x |
| Lazy Weave | 500 | 3.1x |
| Queue | 5000 | 1x |
| Global Levels | 5000 | 1.2x |
| Lazy Weave | 5000 | 3.3x |
Code and raw output
fullTreeF :: Int -> Int -> TreeF Int Int
fullTreeF size n | n >= size = LeafF
fullTreeF size n = NodeF n (2 * n) (2 * n + 1)
fullTree_Q :: Int -> Tree Int
fullTree_Q size = runIdentity (unfoldM_BF_Q (Identity . fullTreeF size) 1)
fullTree_G :: Int -> Tree Int
fullTree_G size = runIdentity (unfoldM_BF_G (Identity . fullTreeF size) 1)
fullTree_L :: Int -> Tree Int
fullTree_L size = runIdentity (unfoldM_BF_L (Identity . fullTreeF size) 1)
fullTree :: TestTree
fullTree = bgroup "fullTree"
[ bench "Q-1x" (nf fullTree_Q 500)
, bench "G-1x" (nf fullTree_G 500) & bcompare "fullTree.Q-1x"
, bench "L-1x" (nf fullTree_L 500) & bcompare "fullTree.Q-1x"
, bench "Q-10x" (nf fullTree_Q 5000)
, bench "G-10x" (nf fullTree_G 5000) & bcompare "fullTree.Q-10x"
, bench "L-10x" (nf fullTree_L 5000) & bcompare "fullTree.Q-10x"
]
$ cabal exec breadth-first-unfolds -- -p fullTree
All
fullTree
Q-1x: OK
20.6 μs ± 1.1 μs, 141 KB allocated, 477 B copied, 6.0 MB peak memory
G-1x: OK
28.6 μs ± 2.4 μs, 223 KB allocated, 928 B copied, 6.0 MB peak memory, 1.39x
L-1x: OK
64.3 μs ± 5.6 μs, 353 KB allocated, 3.7 KB copied, 6.0 MB peak memory, 3.13x
Q-10x: OK
287 μs ± 26 μs, 1.5 MB allocated, 57 KB copied, 7.0 MB peak memory
G-10x: OK
349 μs ± 30 μs, 2.2 MB allocated, 94 KB copied, 7.0 MB peak memory, 1.22x
L-10x: OK
935 μs ± 73 μs, 3.5 MB allocated, 386 KB copied, 7.0 MB peak memory, 3.25x
Oblivious unfolds
| algorithm | size | time |
|---|---|---|
| Queue | 500 | 1x |
| Global Levels | 500 | 11x |
| Oblivious Weave | 500 | 25x |
| Queue | 5000 | 1x |
| Global Levels | 5000 | 10x |
| Oblivious Weave | 5000 | 24x |
Code and raw output
unfoldM_BF_Q_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_Q_ f s0 = unfoldM_f (s0 <+ Empty)
where
unfoldM_f (q :> s) = f s >>= \case
LeafF -> unfoldM_f q
NodeF _ l r -> unfoldM_f (r <+ l <+ q)
unfoldM_f Empty = pure ()
eatFullTree_Q :: Int -> Int
eatFullTree_Q size = (`execState` 0) (unfoldM_BF_Q_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree_G :: Int -> Int
eatFullTree_G size = (`execState` 0) (unfoldM_BF_G_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree_O :: Int -> Int
eatFullTree_O size = (`execState` 0) (unfoldM_BF_O_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree :: TestTree
eatFullTree = bgroup "eatFullTree"
[ bench "Q-1x" (nf eatFullTree_Q 500)
, bench "G-1x" (nf eatFullTree_G 500) & bcompare "eatFullTree.Q-1x"
, bench "W-1x" (nf eatFullTree_O 500) & bcompare "eatFullTree.Q-1x"
, bench "Q-10x" (nf eatFullTree_Q 5000)
, bench "G-10x" (nf eatFullTree_G 5000) & bcompare "eatFullTree.Q-10x"
, bench "W-10x" (nf eatFullTree_O 5000) & bcompare "eatFullTree.Q-10x"
]
$ cabal exec breadth-first-unfolds -- -p eatFullTree
All
eatFullTree
Q-1x: OK
11.0 μs ± 724 ns, 78 KB allocated, 338 B copied, 6.0 MB peak memory
G-1x: OK
116 μs ± 11 μs, 379 KB allocated, 1.3 KB copied, 6.0 MB peak memory, 10.57x
W-1x: OK
278 μs ± 14 μs, 830 KB allocated, 5.9 KB copied, 6.0 MB peak memory, 25.36x
Q-10x: OK
120 μs ± 11 μs, 781 KB allocated, 21 KB copied, 6.0 MB peak memory
G-10x: OK
1.23 ms ± 122 μs, 3.9 MB allocated, 109 KB copied, 7.0 MB peak memory, 10.27x
W-10x: OK
2.92 ms ± 255 μs, 8.4 MB allocated, 631 KB copied, 7.0 MB peak memory, 24.43x
Conclusion
I hope to have piqued your interest in breadth-first unfolds without
using queues.
To the best of my knowledge, this specific problem hasn’t been studied in the
literature. It is of course related to breadth-first traversals,
previously solved using the Phases applicative.²
The intersection of functional programming and breadth-first walks is a small
niche, which makes it quick to survey that corner of the world for any ideas
related to those presented here.
The paper Modular models of monoids with operations by Zhixuan Yang
and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an
example application of their theory. Basically, Phases is defined by a
fixed-point equation:
Phases f = Day f Phases :+: Identity
We can express Phases abstractly as a least fixed-point
μx.f▫x + Id in any monoidal category with a suitable structure.
If we instantiate the monoidal product ▫ not with Day convolution,
but with functor composition (Compose), then we get Weave.
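Spelled out in the same informal style as the Phases equation (a schematic sketch, ignoring newtype wrappers):
Weave m = Compose m Weave :+: Identity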
In another coincidence, the monad-coroutine package
implements a weave function which is a generalization of
liftS2—this may require some squinting.
While WeaveS as a data type coincides with the free monad Free,
monad-coroutine’s core data type Coroutine coincides
with the free monad transformer FreeT.
We can view Phases as a generalization of “zipping” from
lists to free applicatives—which are essentially lists of actions,
and Weave generalizes that further to free monads. To recap, the surprise was
that the naive data type of free monads results in a quadratic-time unfold.
That issue motivated a “lazy” variant³ which achieves a linear-time
breadth-first unfold. That in turn suggested an even “lazier” variant which
enables more control on evaluation order at the cost of efficiency.
I’ve just released the weave library which implements the main ideas of this post. I don’t expect it to have many users, given how much slower it is compared to queue-based solutions. But I would be curious to find a use case for the new compositionality afforded by this abstraction.
Recap table
| | Unfolds | Time | Laziness | Compositional |
|---|---|---|---|---|
| Phases* | No | linear† | by levels | Yes |
| Queue (Q) | Yes | linear† | strict | No |
| Global Levels (G) | Yes | linear† | by levels | No |
| Strict Weave (S) | Yes | quadratic‡ | strict | Yes |
| Oblivious Weave (O) | Oblivious only | linear† | N/A | Yes |
| Lazy Weave (L) | Yes | linear† | by levels | Yes |
| Endless Weave (E) | Yes | quadratic‡ᴱ | maximally lazy◊ | Yes |
†Linear wrt. size: Θ(size).
‡Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
ᴱThe “Endless” meshE only terminates with lazy monads.
*I guess there exists an “endless Phases” variant, which
would be quadratic and maximally lazy.
◊The definition of “maximally lazy” in this post actually misses a
range of possible lazy behaviors with monads other than Identity. A further
refinement seems to be another can of worms.
The main action of this Literate Haskell program
Extra test cases
whnfTreeE :: TestTree
whnfTreeE = testCase "E-whnf" $ do
case partialTreeE of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
whnfTreeQ :: TestTree
whnfTreeQ = expectFail $ testCase "Q-whnf" $ do
case partialTreeQ of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
partialTreeQ :: Tree ()
partialTreeQ = runIdentity (unfoldM_BF_Q (Identity . partialTreeF) True)
bfGraph_L :: Int -> Tree Int
bfGraph_L = (`evalState` Set.empty) . unfoldM_BF_L visitGraph
testGraphL :: TestTree
testGraphL = testCase "L-graph" $
bfGraph_L 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
bfGraph_E :: Int -> Tree Int
bfGraph_E = (`evalState` Set.empty) . unfoldM_BF_E visitGraph
testGraphE :: TestTree
testGraphE = testCase "E-graph" $
bfGraph_E 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Note that tasty-bench also reports memory statistics (allocated, copied, and peak memory) when certain RTS options are enabled, which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎
- The tree-traversals library by Noah Easterly.
- Algebras for weighted search, by Donnacha Oisín Kidney and Nicolas Wu, in ICFP 2021.
- Breadth-first traversal via staging, by Jeremy Gibbons, Donnacha Oisín Kidney, Tom Schrijvers, and Nicolas Wu, in MPC 2022. It has been revised into a short version (5 pages): Phases in software architecture in FUNARCH 2023.
Speaking of variants of free monads, one might think of the “freer” monad, which has different motivations and which does not help us here.↩︎
On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):
Whatâ€™s different this time? LLM edition
Yesterday I decided to do something about it.
Locating the problem
Tracing back where it came from, that title was sent already broken by Planet Haskell, which is itself a feed aggregator for blogs. The blog originally produces the good, not broken, title. Therefore the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, stuff I wouldn’t deal with myself, so I won’t ask someone else to.
Blog
|
| What’s
v
Planet Haskell
|
| Whatâ€™s
v
haskell.pl-a.net (my site)
|
| Whatâ€™s
v
Your screen
In any case, this mistake can be fixed after the fact. Mis-encoded text is such a ubiquitous issue that there are nicely packaged solutions out there, like ftfy.
ftfy has been used as a data processing step in major NLP research, including OpenAI’s original GPT.
But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.
Explaining the problem
This is the typical situation where a program is assuming the wrong text encoding.
Text encodings
A quick summary for those who don’t know about text encodings.
Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send it over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display them on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.
Multiple encodings are possible, but it’s not always obvious which encoding to use to decode a given byte string. There are good and bad reasons for this, but the net effect is that many text-processing programs arbitrarily guess and assume the encoding in use, and sometimes they assume wrong.
Back to the problem
UTF-8 is the most prevalent encoding nowadays.¹ I’d be surprised if one of the Planet Haskell blogs doesn’t use it, which is ironic considering the issue we’re dealing with.
- A blog using UTF-8 encodes the right single quote² " ’ " as three consecutive bytes (226, 128, 153) in its RSS or Atom feed.
- The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding different from UTF-8 where each byte corresponds to one character.
- It did some transformation to the decoded text (extract the title and body and put it on a webpage with other blogs).
- It encoded the final result in UTF-8.
What the blog sees → '’'
|
| UTF-8 encode (one character into three bytes)
v
226 128 153
|
| ??? decode (not UTF-8)
v
What Planet Haskell sees → 'â' '€' '™'
|
| UTF-8 encode
v
(...)
|
| UTF-8 decode
v
What you see → 'â' '€' '™'
The final encoding doesn’t really matter, as long as everyone else downstream agrees with it. The point is that Planet Haskell outputs three characters “â€™” in place of the right single quote " ’ ", all because UTF-8 represents " ’ " with three bytes.
In spite of their differences, most encodings in practice agree at least about ASCII characters, in the range 0-127, which is sufficient to contain the majority of English language writing if you can compromise on details such as confusing the apostrophe and the single quotes. That’s why in the title “What’s different this time?” everything but one character was transferred fine.
Solving the problem
The fix is simple: replace “â€™” with " ’ ". Of course, we also want to do that with all other characters that are mis-encoded the same way: those are exactly all the non-ASCII Unicode characters. The more general fix is to invert Planet Haskell’s decoding logic. Thank the world that this mistake can be reversed to begin with. If information had been lost by mis-encoding, I might have been forced to use one of those dreadful LLMs to reconstruct titles.³
- Decode Planet Haskell’s output in UTF-8.
- Encode each character as a byte to recover the original output from the blog.
- Decode the original output correctly, in UTF-8.
There is one missing detail: what encoding to use in step 2? I first tried the naive thing: each character is canonically a Unicode code point, which is a number between 0 and 1114111, and I just hoped that those which did occur would fit in the range 0-255. That amounts to making the hypothesis that Planet Haskell is decoding blog posts in Latin-1. That seems likely enough, but you will have guessed correctly that the naive thing did not reconstruct the right single quote in this case. The Latin-1 hypothesis was proven false.
As it turns out, the euro sign “€” and the trademark symbol “™” are not in the Latin-1 alphabet. They are code points 8364 and 8482 in Unicode, which are not in the range 0-255. Planet Haskell has to be using an encoding that features these two symbols. I needed to find which one.
Faffing about, I came across the Wikipedia article on Western Latin character sets which lists a comparison table. How convenient. I looked up the two symbols to find what encoding had them, if any. There were two candidates: Windows-1252 and Macintosh. Flip a coin. It was Windows-1252.
Windows-1252 differs from Latin-1 (and thus Unicode) in the 32 positions whose byte starts with 8 or 9 in hexadecimal (27 valid characters + 5 unused positions): that’s 27 characters that I had to map manually to the range 0-255 according to the Windows-1252 encoding, and the remaining characters would be mapped for free by Unicode. This data entry task was autocompleted halfway through by Copilot, because of course GPT-* knows Windows-1252 by heart.
let windows1252_hack (c : Uchar.t) : int =
let c = Uchar.to_int c in
if c = 0x20AC then 0x80
else if c = 0x201A then 0x82
else if c = 0x0192 then 0x83
else if c = 0x201E then 0x84
else if c = 0x2026 then 0x85
else if c = 0x2020 then 0x86
else if c = 0x2021 then 0x87
else if c = 0x02C6 then 0x88
else if c = 0x2030 then 0x89
else if c = 0x0160 then 0x8A
else if c = 0x2039 then 0x8B
else if c = 0x0152 then 0x8C
else if c = 0x017D then 0x8E
else if c = 0x2018 then 0x91
else if c = 0x2019 then 0x92
else if c = 0x201C then 0x93
else if c = 0x201D then 0x94
else if c = 0x2022 then 0x95
else if c = 0x2013 then 0x96
else if c = 0x2014 then 0x97
else if c = 0x02DC then 0x98
else if c = 0x2122 then 0x99
else if c = 0x0161 then 0x9A
else if c = 0x203A then 0x9B
else if c = 0x0153 then 0x9C
else if c = 0x017E then 0x9E
else if c = 0x0178 then 0x9F
else c
And that’s how I restored the quotes, apostrophes, guillemets, accents, et autres in my feed.
See also
- Mojibake, anyone? from BASHing data 2
Update: When Planet Haskell picked up this post, it fixed the intentional mojibake in the title.
There is no room for this in my mental model. Planet Haskell is doing something wild to parse blog titles.
As of September 2024, UTF-8 is used by 98.3% of surveyed web sites.↩︎
The Unicode right single quote is sometimes used as an apostrophe, to much disapproval.↩︎
Or I could just query the blogs directly for their titles.↩︎
There are two main episodes in this saga: Hope and Miranda. The primary conclusion is that the name comes from universal algebra, whereas another common interpretation of “algebraic” as a reference to “sums of products” is not historically accurate. We drive the point home with Clear. CLU is extra.
Disclaimer: I’m no historian and I’m nowhere as old as these languages to have any first-hand perspective. Corrections and suggestions for additional information are welcome.
Hope (1980)
Algebraic data types were at first simply called “data types”. This programming language feature is commonly attributed to Hope, an experimental applicative language by Rod Burstall et al. Here is the relevant excerpt from the paper, illustrating its concrete syntax:
A data declaration is used to introduce a new data type along with the data constructors which create elements of that type. For example, the data declaration for natural numbers would be:
data num == 0 ++ succ(num)
(…) To define a type ‘tree of numbers’, we could say
data numtree == empty ++ tip(num) ++ node(numtree#numtree)
(The sign # gives the cartesian product of types). One of the elements of numtree is:
node(tip(succ(0)), node(tip(succ(succ(0))), tip(0)))
But we would like to have trees of lists and trees of trees as well, without having to redefine them all separately. So we declare a type variable
typevar alpha
which when used in a type expression denotes any type (including second- and higher-order types). A general definition of tree as a parametric type is now possible:
data tree(alpha) == empty ++ tip(alpha) ++ node(tree(alpha)#tree(alpha))
Now tree is not a type but a unary type constructor – the type numtree can be dispensed with in favour of tree(num).
Pattern matching in Hope is done in multi-clause function declarations or multi-clause lambdas.
There was no case expression.
reverse(nil) <= nil
reverse(a::l) <= reverse(l) <> [a]
lambda true, p => p | false, p => false
As far as I can tell, other early programming languages cite Hope or one of its descendants as their inspiration for data types. There is a slightly earlier appearance in NPL by Darlington and the same Burstall, but I couldn’t find a source describing the language or any samples of data type declarations. Given the proximity, it seems reasonable to consider them the same language to a large extent. This paper by Burstall and Darlington (1977) seems to be using NPL in its examples, but data types are only introduced informally; see on page 62 (page 19 of the PDF):
We need a data type
atom, from which we derive a data type tree, using constructor functions tip to indicate a tip and tree to combine two subtrees
tip : atoms → trees
tree : trees x trees → trees
We also need lists of atoms and of trees, so for any type alpha let
nil : alpha-lists
cons : alphas x alpha-lists → alpha-lists
Hope inspired ML (OCaml’s grandpa) to adopt data types. In Standard ML:
datatype 'a option = Nothing | Some of 'a
Before it became Standard, ML started out as the “tactic language” of the LCF proof assistant by Robin Milner, and early versions did not feature data types (see the first version of Edinburgh LCF). It’s unclear when data types were added exactly, but The Definition of Standard ML by Milner et al. credits Hope for it (in Appendix F: The Development of ML):
Two movements led to the re-design of ML. One was the work of Rod Burstall and his group on specifications, crystallised in the specification language Clear and in the functional programming language Hope; the latter was for expressing executable specifications. The outcome of this work which is relevant here was twofold. First, there were elegant programming features in Hope, particularly pattern matching and clausal function definitions; second, there were ideas on modular construction of specifications, using signatures in the interfaces. A smaller but significant movement was by Luca Cardelli, who extended the data-type repertoire in ML by adding named records and variant types.
Miranda (1985)
“Data types” as a programming language feature appeared in Hope, but its first mention under the name “algebraic data types” that I could find is in Miranda: a non-strict functional language with polymorphic types by David Turner in 1985:
Algebraic data types
The basic method of introducing a new concrete data type, as in a number of other languages, is to declare a free algebra. In Miranda this is done by an equation using the symbol ::=,
tree ::= Niltree | Node num tree tree
being a typical example. (…) The idea of using free algebras to define data types has a long and respectable history [Landin 64], [Burstall 69], [Hoare 75]. We call it a free algebra, because there are no associated laws, such as a law equating a tree with its mirror image. Two trees are equal only if they are constructed in exactly the same way.
In case you aren’t aware, Miranda is a direct precursor of Haskell. A minor similarity with Haskell that we can see here is that data constructors are curried in Miranda, unlike in Hope and ML. Another distinguishing feature of Miranda is laziness. See also A History of Haskell: being lazy with class.
Below are links to the articles cited in the quote above. The first [Landin 64] doesn’t explicitly talk about algebra in this sense, while [Burstall 69] and [Hoare 75] refer to “word algebra” rather than “free algebra” to describe the same structure, without putting “algebra” in the same phrase as “type” yet.
- The mechanical evaluation of expressions, by Peter Landin (1964)
- Proving properties by structural induction, by Rod Burstall (1969)
- Recursive data structures by Tony Hoare (1975)
Hoare’s paper contains some futuristic pseudocode in particular:
A possible notation for such a type definition was suggested by Knuth; it is a mixture of BNF (the | symbol) and the PASCAL definition of a type by enumeration:
type proposition = (prop (letter) |
  neg (proposition) |
  conj, disj (proposition, proposition));
(…) In defining operations on a data structure, it is usually necessary to enquire which of the various forms the structure takes, and what are its components. For this, I suggest an elegant notation which has been implemented by Fred McBride in his pattern-matching LISP. Consider for example a function intended to count the number of &s contained in a proposition. (…)
function andcount (p: proposition): integer;
  andcount := cases p of
    (prop(c) → 0 |
     neg(q) → andcount(q) |
     conj(q,r) → andcount(q) + andcount(r) + 1 |
     disj(q,r) → andcount(q) + andcount(r));
Fred McBride’s pattern-matching LISP is the topic of his PhD dissertation. There is not enough room on this page to write about the groundbreaking history of LISP.
Unfree algebras in Miranda
If algebraic data types are “free algebras”, one may naturally wonder whether “unfree algebras” have a role to play. Miranda allows quotienting data type definitions by equations (“laws” or “rewrite rules”). You could then define the integers like this, with a constructor to decrement numbers, and equations to reduce integers to a canonical representation:
int ::= Zero | Suc int | Pred int
Suc (Pred n) => n
Pred (Suc n) => n
In hindsight this is superfluous, but it’s fun to see this kind of old experiments in programming languages. The modern equivalent in Haskell would be to hide the data constructors and expose smart constructors instead. There are uses for quotient types in proof assistants and dependently typed languages, but they work quite differently.
Sums of products?
There is another folklore interpretation of “algebraic” in “algebraic data types” as referring to “sums of products”.
It’s not an uncommon interpretation. In fact, trying to find a source for this folklore is what got me going on this whole adventure. The Wikipedia article on algebraic data types at the time of writing doesn’t outright say it, but it does refer to sums and products several times while making no mention of free algebras. Some [citation needed] tags should be sprinkled around. The Talk page of that article contains an unresolved discussion of this issue, with links to a highly upvoted SO answer and another one whose references don’t provide first-hand account of the origins of the term. For sure, following that idea leads to some fun combinatorics, like differentiation on data types, but that doesn’t seem to have been the original meaning of “algebraic data types”.
That interpretation might have been in some people’s mind in the 70s and 80s, even if only as a funny coincidence, but I haven’t found any written evidence of it except maybe this one sentence in a later paper, Some history of programming languages by David Turner (2012):
The ISWIM paper also has the first appearance of algebraic type definitions used to define structures. This is done in words, but the sum-of-products idea is clearly there.
It’s only a “maybe” because while the phrase “algebraic type” undeniably refers to sums of products, it’s not clear that the adjective “algebraic” specifically is meant to be associated with “sum-of-products” in that sentence. We could replace “algebraic type” with “data type” without changing the meaning of the sentence.
Clear (1979)
In contrast, free algebras—or initial algebras as one might prefer to call them—are a concept from the areas of universal algebra and category theory with a well-established history in programming language theory by the time algebraic data types came around, with influential contributions by a certain ADJ group; see for example Initial algebra semantics and continuous algebras.
Ironically, much related work focused on the other ADT, “abstract data types”. Using universal algebra as a foundation, a variety of “specification languages” have been designed for defining algebraic structures, notably the OBJ family of languages created by Joseph Goguen (a member of the aforementioned ADJ group) and others, and the Clear language by Rod Burstall (of Hope fame) and Joseph Goguen. Details of the latter can be found in The Semantics of Clear, a specification language. (You may remember seeing a mention of Clear earlier in the quote from The Definition of Standard ML.)
Example theories in Clear
Here is the theory of monoids in Clear. It consists of one sort named carrier,
an element (a nullary operation) named empty and a binary operation append.
constant Monoid = theory
sorts carrier
opns empty : carrier
append : carrier,carrier -> carrier
eqns all x: carrier . append(x,empty) = x
all x: carrier . append(empty,x) = x
all x,y,z: carrier . append(append(x,y),z) = append(x,append(y,z))
endth
A theory is an interface. Its implementations are called algebras. In that example, the algebras of “the theory of monoids” are exactly monoids.
In every theory, there is an initial algebra obtained by turning the
operations into constructors (or “uninterpreted operations”), equating elements
(which are trees of constructors) modulo the equations of the theory.
For the example above, the initial monoid is a singleton monoid, with only an empty element
(all occurrences of append are simplified away by the two equations for empty),
which is not very interesting. Better examples are those corresponding to the usual data types.
The booleans can be defined as the initial algebra of the theory with one sort (truthvalue)
and two values of that sort, true and false.
constant Bool = theory data
sorts truthvalue
opns true,false: truthvalue
endth
In Clear, the initial algebra is specified by adding the data keyword to a theory.
In the semantics of Clear, rather than thinking in terms of a specific algebra,
a “data theory” is still a theory (an interface),
with additional constraints that encode “initiality”, so the only possible
algebra (implementation) is the initial one.
My guess as to why the concept of data theory is set up that way
is that it allows plain theories and data theories to be combined seamlessly.
The natural numbers are the initial algebra of zero and succ:
constant Nat = theory data
sorts nat
opns zero: nat
succ: nat -> nat
endth
At this point, the connection between “data theories” in Clear and data types in Hope and subsequent languages is hopefully clear.
More substantial examples in Clear
Theories can be extended into bigger theories with new sorts, operations, and equations.
Here is an extended theory of booleans with two additional operations not, and,
and their equations. This should demonstrate that, beyond the usual mathematical structures,
we can define non-trivial operations in this language:
constant Bool1 = enrich Bool by
opns not: truthvalue -> truthvalue
and: truthvalue,truthvalue -> truthvalue
eqns all . not true = false
all . not false = true
all p: truthvalue . and(false, p) = false
all p: truthvalue . and(true, p) = p
enden
Initial algebras are also called free algebras, but that gets confusing because
“free” is an overloaded word. Earlier for instance, you might have expected the initial
monoid, or “free monoid”, to be the monoid of lists. The monoid of lists is the
initial algebra in a slightly different theory: the theory of monoids with an
embedding from a fixed set of elements A.
We might formalize it as follows in Clear.
The theory List is parameterized by an algebra A of the theory Set,
and its body is the same as Monoid, except that we renamed carrier to list,
we added an embed operation, and we added the data keyword to restrict that
theory to its initial algebra.
constant Set = theory sorts element endth
procedure List(A : Set) = theory data
sorts list
opns empty : list
append : list,list -> list
embed : element of A -> list
eqns all x: list . append(x,empty) = x
all x: list . append(empty,x) = x
all x,y,z: list . append(append(x,y),z) = append(x,append(y,z))
endth
One may certainly see a resemblance between theories in Clear, modules in ML, and object-oriented classes. It’s always funny to find overlaps between the worlds of functional and object-oriented programming.
CLU (1977)
CLU is a programming language created at MIT by Barbara Liskov and her students in the course of their work on data abstraction.
It features tagged union types, which are called “oneof types”. (Source: CLU Reference Manual by Barbara Liskov et al. (1979).)
T = oneof[empty: null,
integer: int,
real_num: real,
complex_num: complex]
Values are constructed by naming the oneof type (either as an identifier bound to it,
or by spelling out the oneof construct) then the tag prefixed by make_:
T$make_integer(42)
The tagcase destructs “oneof” values.
x: oneof[pair: pair, empty: null]
...
tagcase x
tag empty: return(false)
tag pair(p: pair): if (p.car = i)
then return(true)
else x := down(p.cdr)
end
end
The main missing feature for parity with algebraic data types is recursive type definitions, which are not allowed directly. They can be achieved indirectly though inconveniently through multiple clusters (classes in modern terminology). (Source: A History of CLU by Barbara Liskov (1992).)
Burstall’s papers on Hope and Clear cite CLU, but beyond that it doesn’t seem easy to make precise claims about the influence of CLU, which is an object-oriented language, on the evolution of those other declarative languages developed across the pond.
In a Turing machine, there is a tape and there is a program. What is the program in a Turing machine? drumroll 🥁… It is a finite-state machine, which is equivalent to a regular expression!
Just for a silly pun, I’m going to introduce this programming language in an absurd allegory about T-rexes (“Turing regular expressions”, or just “Turing expressions”).
Tales of T-rexes
T-rexes are mysterious creatures who speak in tales (Turing expressions) about one T-rex—usually the speaker. Tales are constructed from symbols for the operations of a Turing machine, and the standard regex combinators.
- > and < are the simple tales of the T-rex taking a single step to the left or right;
- 0! and 1! tell of the T-rex “writing” a 0 or 1;
- 0? and 1? tell of the T-rex “observing” a 0 or 1;
- e₁ ... eₙ is a tale made up of a sequence of n tales, and there is an empty tale in the case n = 0 (“nothing happened”);
- (e)* says that the tale e happened an arbitrary number of times, possibly zero;
- (e₁|e₂) says that one of the tales happened, e₁ or e₂.
As it is a foreign language from a fantasy world, the description above
shouldn’t be taken too literally.
For instance, it can be difficult to imagine a T-rex holding a pen,
much less writing with it. In truth, the actions that the tales 0! and 1!
describe are varied, and “writing” is only the closest approximation
among the crude words of humans.
Iteration (e)* and choice (e₁|e₂) make T-rex tales nondeterministic:
different sequences of events may be valid interpretations of the same tale.
The observations 0? and 1? enable us to prune the tree of
possibilities. T-rex communication may be convoluted sometimes,
but at least they mean to convey coherent series of events.
Enough exposition. Let’s meet T-rexes!
Two T-rexes greet us, introducing themselves as Alan and Alonzo. They invite us for a chat in their home in Jurassic park.
The tale of Alonzo
While Alan serves tea, Alonzo shows us a mysterious drawing. T-rex imagery is quite simplistic, owing to their poor vision and clumsy hands. We can sort of recognize Alonzo on the left, next to a row of circles and lines:
🦖
0001011000
Then, Alonzo tells us the following tale: “(>)*1?0!>1?0!”.
Noticing our puzzled faces, Alan fetches a small machine from the garage. It is a machine to interpret T-rex tales in a somewhat visual rendition. Alan demonstrates how to transcribe the tale that Alonzo told us together with his drawing into the machine. Here is the result:
Alan's machine
(>)*1?0!>1?0!
0001011000
0001010000
Press the Run button to see the machine render the tale.
It happens in the blink of an eye; T-rexes are really fast!
We will break it down step by step in what follows.
(You can also edit the program tale and the input
drawing in these examples and see what happens then.)
Alonzo’s drawing is the scene where the tale takes place.
We ask Alan and Alonzo what “0” and “1” represent,
but we are not fluent enough in T-rex to understand their apparently nuanced answer.
We have no choice but to abstract it away.
🦖 ← Alonzo
0001011000 ← the world
Alonzo first said “(>)*”: he walked toward the right.
As is customary in T-rex discourse, this tale leaves a lot up to interpretation.
There are many possible guesses of how many steps he actually took.
He could even have walked out of the picture!
🦖
0001011000
🦖
0001011000
🦖
0001011000
🦖
0001011000
But then the tale goes “1?”:
Alonzo observed a 1, whatever that means, right where he stopped.
That narrows the possibilities down to three:
🦖
0001011000
🦖
0001011000
🦖
0001011000
Afterwards, Alonzo tells us that “0!” happened,
which we think of abstractly as “writing” 0 where Alonzo is standing.
Before "0!"
🦖
0001011000
After "0!"
🦖
0000011000
Each of the three possibilities from earlier after writing 0:
🦖
0000011000
🦖
0001001000
🦖
0001010000
Alonzo then made one step to the right, and observed another 1 (“>1?”).
Only the second possibility above is consistent with that subsequent observation.
Finally, he writes 0 again.
🦖
0001000000
And we can see that the outcome matches the machine output above.
After puzzling over it for a while, we start to make sense of
Alonzo’s tale “(>)*1?0!>1?0!”, and imagine this rough translation:
“Funny story. I walked to the right, and I stopped in front of a 1,
isn’t that right Alan? I was feeling hungry. So I
ate it. I left nothing! I was still hungry, I am a dinosaur after all,
so I took a step to the right, only to find another 1.
Aren’t I lucky? I ate it too. It was super tasty!”
We have a good laugh. Alan and Alonzo share a few more tales. More tea? Of course. At their insistence, we try telling some of our own tales, to varied success. It’s getting late. Thank you for your hospitality. The end.
More examples
Exercise for the interested reader: implement the following operations using Turing expressions. Here’s a free machine to experiment with:
Alan's empty machine
Preamble: extra features and clarifications
For convenience, other digits can also be used in programs
(i.e., 2?, 2!, 3?, 3!, up to 9? and 9! are allowed).
The symbol 2 will be used to mark the end of a binary input
in the exercises that follow.
Turing expressions are nondeterministic, but the machine only searches for
the first valid execution. The search is biased as follows:
(e₁|e₂) first tries e₁ and then, if that fails, e₂;
(e)* is equivalent to (|e(e)*).¹
The tape extends infinitely to the left and to the right, initialized with zeroes
outside of the input.
The machine aborts if the tape head (🦖) walks out of the range [-100, 100].²
0 is the initial position of the tape head and where the input starts.
The machine prints the first 10 symbols starting at position 0 as the output.
Whitespace is ignored. A # starts a comment up to the end of the line.
What kind of person comments a regex?
Determinism
Although Turing expressions are nondeterministic in general,
we obtain a deterministic subset of Turing expressions by requiring
the branching combinators to be guarded.
Don’t allow unrestricted iterations (e)* and choices (e₁|e₂),
only use while loops (1?e)*0?³ and conditional statements (0?e₁|1?e₂).
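For example, the binary increment solution below, (1?0!>)*0?1!, already has this shape: a while loop (1?0!>)*0? that keeps replacing 1s with 0s while moving right, followed by writing a 1 once the guard observes a 0.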
Not
Flip the bits. 0 becomes 1, 1 becomes 0.
Examples:
Input: 0100110112
Output: 1011001002
Input: 1111111112
Output: 0000000002
Solution
Alan's machine
((0?1!|1?0!)>)*2?
0100110112
1011001002
Binary increment
Input: binary representation of a natural number n.
Output: binary representation of (n + 1).
Cheatsheet of binary representations:
0: 000
1: 100
2: 010
3: 110
4: 001
5: 101
Examples:
Input: 0010000000
Output: 1010000000
Input: 1110000000
Output: 0001000000
No input delimiters for this exercise.
Solution
Alan's machine
(1?0!>)*0?1!
1110000000
0001000000
Left shift
Move bits to the left.
Examples:
Input: 0100110112
Output: 1001101102
Input: 1111111112
Output: 1111111102
Solution
Alan's machine
(2?|(0!>0?|1!>1?)*(0!>2?))
0100110112
1001101112
Right shift
Move bits to the right.
Examples:
Input: 0100110112
Output: 0010011012
Input: 1111111112
Output: 0111111112
Solution
Alan's machine
((0?|1?)(0?>)*(2?|1?0!>(1?>)*(2?|0?1!>)))*2?
0100110112
0010011012
Cumulative xor
Each bit of the output is the xor of the input bits to the left of it.
Examples:
Input: 0100100012
Output: 0111000012
Input: 1111111112
Output: 1010101012
Solution
Alan's machine
((0?>)*(1?>((0?1!>)*1?0!|2?)|2?))*2?
0100100012
0111000012
Unary subtraction
Input: Two unary numbers x and y, separated by a single 0.
Output: The difference (x - y).
Example: evaluate (5 - 3).
Input: 1111101110
Output: 1100000000
Feel free to add a 2 to delimit the input.
I only decided to allow symbols other than 0 and 1
after finishing this exercise.
Solution
Alan's machine
(1?>)*0?<(1?>(0?>)*1?0!>(1?<(0?<)*1?0!<|0?(0?<)*1?0!))*0?<(1?<)*0?>
1111101110
1100000000
Example: evaluate (3 - 5).
Input: 1110111110
Output: 0000000110
In my solution, the result -2 is represented by two 1 placed in the location
of the second argument rather than the first. You may use a different encoding.
Solution (bis)
Alan's machine
(1?>)*0?<(1?>(0?>)*1?0!>(1?<(0?<)*1?0!<|0?(0?<)*1?0!))*0?<(1?<)*0?>
1110111110
0000000110
Commented program
The lack of delimiters in my version of the problem makes this a bit tricky.
# Example: evaluate (5 - 3).
# TAPE: 111110111
# HEAD: ^
(1?>)*0?
# TAPE: 111110111
# HEAD: ^
#
# The following line checks whether
# the second argument is 0,
# in which case we will skip the loop.
>(0?<|1?<<)
# Otherwise we move the head on the last 1
# of the first argument.
# TAPE: 111110111
# HEAD: ^
#
# BEGIN LOOP
# Loop invariant: the difference between
# the two numbers on tape is constant.
(1?
# Go to the first 1 of the second argument.
>(0?>)*1?
# During the first iteration,
# the tape looks like this:
# TAPE: 111110111
# HEAD: ^
#
# Erase 1 to 0 and move to the right.
0!>
# TAPE: 111110011
# HEAD: ^
#
# Check whether there remains
# at least one 1 to the right.
# BEGIN IF
(1?
# There is at least one 1 on the right.
# Move back into the first argument.
<(0?<)*1?
# TAPE: 111110011
# HEAD: ^
#
# Erase 1 to 0. Move left.
0!<
# TAPE: 111100011
# HEAD: ^
#
# If the 1s of the first argument ran out
# at this point (which would mean
# first argument < second argument),
# we will BREAK out of the loop (then terminate),
# otherwise, CONTINUE, back to the top of loop
|0?
# ELSE (second branch of the IF from three lines ago)
# The 1s of the second argument ran out
# (which means first argument >= second argument)
# Tape when we reach this point (in the last iteration):
# TAPE: 1110000000
# HEAD: ^
#
# Move back into the first argument.
(0?<)*1?
# TAPE: 111000000
# HEAD: ^
#
# Erase 1 to 0.
0!
# TAPE: 110000000
# HEAD: ^
#
# Reading a 0 will break out of the loop.
# BREAK
)
# END IF
)*0?
# END LOOP
The separation of program and tape
Until this point, there may remain misgivings about whether this is actually “regular expressions”. The syntax is the same, but is it really the same semantics? This section spells out a precise alternative definition of Turing machines with a clear place for the standard semantics of regular expressions (as regular languages).
Instead of applying regular expressions directly to an input string, we are using them to describe interactions between the program and the tape of a Turing machine. Then the regular expression might as well be the program.
The mechanics of Turing machines are defined traditionally via a transition relation between states. A Turing machine state is a pair (q, t) of a program state q ∈ Q (where Q is the set of states of a finite-state machine) and a tape state t ∈ 2^ℤ × ℤ (the bits on the tape and the position of the read-write head).
That “small-step” formalization of Turing machines is too monolithic for our present purpose of revealing the regular languages hidden inside Turing machines. The issue is that the communication between the program and the tape is implicit in the transition between states as program-tape pairs (q, t). We will take a more modular approach using trace semantics: the program and the tape each give rise to traces of interactions which make explicit the communication between those two components.
The standard semantics of regular expressions
The raison d’être of regular expressions is to recognize sequences of symbols,
also known as strings, lists, or words. Here, we will refer to them as traces.
Regular expressions are conventionally interpreted as sets of traces, reading
the iteration * and choice | combinators as operations on sets.
Let A be a set of symbols; in our case
A = {<, >, 0!, 1!, 0?, 1?} but the
following definition works with any A. The trace semantics of a regular
expression e over the alphabet A is defined inductively:
- An atomic expression e ∈ A contains a single trace which is just that symbol. Trace(e) = {e} if e ∈ A
- A concatenation of expressions e₁…eₙ contains concatenations of traces of every eᵢ. Trace(e₁…eₙ) = { t₁ … tₙ ∣ ∀i, tᵢ ∈ Trace(eᵢ) }
- An iteration (e)* contains all concatenations of traces t₁ … tₙ such that each subtrace tᵢ is a trace of that same e. Trace((e)*) = { t₁ … tₙ ∣ ∀i, tᵢ ∈ Trace(e) }
- A choice (e₁|e₂) contains the union of traces of e₁ and e₂. Trace((e₁|e₂)) = Trace(e₁) ∪ Trace(e₂)
Equivalently, a trace semantics can be viewed as a relation between program and trace.
We write e ⊢ t, pronounced “e recognizes t”, as an abbreviation of
t ∈ Trace(e).
This is also for uniformity with the notation in the next section.
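As a small worked example, the expression (0?|1?)> recognizes exactly two traces, (0?|1?)> ⊢ 0? > and (0?|1?)> ⊢ 1? >, while (>)* recognizes ϵ, >, > >, > > >, and so on.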
A core result of automata theory is that the sets of traces definable by regular expressions are the same as those definable by finite-state machines. That led to our remark that Turing machines might as well be Turing regular expressions.
The Turing machine memory model
In the semantics of regular expressions above, the meaning of the symbols
(<, >, etc.) is trivial:
a symbol in a regular expression denotes itself as a singleton trace.
In this section, we will give these symbols their natural meaning as
“operations on a tape”.
The tape is the memory model of Turing machines.
Memory models are better known in the context of concurrent programming
languages, as they answer the question of how to resolve concurrent writes and
reads.
The tape carries a sequence of symbols extending infinitely in both directions. A head on the tape reads one symbol at a time, and can move left or right, one symbol at a time. Addresses on the tape are integers, elements of ℤ. A tape state is a pair (m, i) ∈ 2^ℤ × ℤ: the memory contents is m ∈ 2^ℤ (note 2^ℤ = ℤ → {0, 1}) and the position of the head is i ∈ ℤ. The behavior of the tape is defined as a ternary relation pronounced “(m, i) steps to (m′, i′) with trace t”, written:
(m, i) ⇝ (m′, i′) ⊢ t
It is defined by the following rules. We step left and right by decrementing and incrementing the head position i.
(m, i) ⇝ (m, i − 1) ⊢ <
(m, i) ⇝ (m, i + 1) ⊢ >
Writing operations use the notation m[i ↦ v] for updating the value of the tape m at address i with v.
(m, i) ⇝ (m[i ↦ 0], i) ⊢ 0!
(m, i) ⇝ (m[i ↦ 1], i) ⊢ 1!
Observations, or assertions, step only when a side condition is satisfied. Otherwise, the tape is stuck, and that triggers backtracking in the search for a valid trace.
(m, i) ⇝ (m, i) ⊢ 0? if m(i) = 0
(m, i) ⇝ (m, i) ⊢ 1? if m(i) = 1
We close this relation by reflexivity (indexed by the empty trace ϵ) and transitivity (indexed by the concatenation of traces).
(m, i) ⇝ (m, i) ⊢ ϵ
(m, i) ⇝ (m′, i′) ⊢ t and (m′, i′) ⇝ (m″, i″) ⊢ t′ ⇔ (m, i) ⇝ (m″, i″) ⊢ t t′
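As a worked example, let m be Alonzo’s tape 0001011000 (and 0 everywhere else). Then (m, 0) ⇝ (m[3 ↦ 0], 3) ⊢ > > > 1? 0!: three steps to the right, a successful observation of the 1 at address 3, and a write of 0 at that address.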
Turing regular expressions
We now connect programs and tapes together through the trace.
A Turing regular expression e and an initial tape (m, 0) step
to a final tape (m′, i′), written
e, (m, 0) ⇝ (m′, i′)
if there exists a trace t recognized by both the program and the tape:
e ⊢ t
(m, 0) ⇝ (m′, i′) ⊢ t
We can then consider classes of functions computable by Turing expressions via an
encoding of inputs and outputs on the tape.
Let encode : ℕ → 2^ℤ be an encoding of natural numbers as tapes.
A Turing expression e computes a function f : ℕ → ℕ if, for all n,
there is exactly one final tape (m′, i′) such that
e, (encode(n), 0) ⇝ (m′, i′)
and that unique tape encodes f(n):
m′ = encode(f(n))
Et voilà. That’s how we can program Turing machines with regular expressions.
Finite-state machines: the next 700 programming languages
Finite-state machines appear obviously in Turing machines, but you can similarly view many programming languages in terms of finite-state machines by reducing the state to just the program counter: “where you are currently in the source program” can only take finitely many values in a finite program. From that point of view, all other components of the abstract machine of your favorite programming language—including the values of local variables—belong to the “memory” that the program counter interacts with. Why would we do this? For glory of course. So we can say that most programming languages are glorified regular expressions.
To be fair, there are exceptions to this idea: cellular automata and homoiconic languages (i.e., with the ability to quote and unquote code at will) are those I can think of. At most there is a boring construction where the finite-state machine writes the source program to memory then runs a general interpreter on it.
Completely free from Turing-completeness
The theory of formal languages and automata has a ready-made answer about the expressiveness of regular expressions: regular expressions denote regular languages, which belong to a lower level of expressiveness than recursively enumerable languages in the Chomsky hierarchy.
What I want to point out is that theory can only ever study “expressiveness” in a narrow sense. Real expressiveness is fundamentally open-ended: the only limit is your imagination. Any mathematical definition of “expressiveness” must place road blocks so that meaningful impossibility theorems can be proved. The danger is to forget about those road blocks when extrapolating mathematical theorems into general claims about the usefulness of a programming language.4
The expressiveness of formal languages is a delicate idea in that there are well-established mathematical concepts and theorems about it, but the rigor of mathematics hides a significant formalization gap between how a theory measures “expressiveness” and the informal open-ended question of “what can we do with this?”.
“Regular expressions are not Turing-complete” might literally be a theorem in some textbook; it doesn’t stop regular expressions from also being a feasible programming language for Turing machines as demonstrated in this post. Leaving you to come to terms with your own understanding of this paradox, a closing thought: at the end of the day, science is no slave to mathematics; we do mathematics in service of science.
Bonus track: Brainfuck
Turing regular expressions look similar to Brainfuck. Let’s extend the primitives of Turing expressions to be able to compile Brainfuck.
The loop operator [...] in Brainfuck can be written as (0~...)*0?,
with a new operation 0~ to observe a value not equal to zero.
With + and - (increment and decrement modulo 256) as additional operations
supported by our regular expressions,
a Brainfuck program is compiled to an extended Turing expression
simply by replacing [ and ] textually with (0~ and )*0?.5
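For example, the zeroing idiom [-] used below desugars to (0~-)*0?, and the addition loop [-<+>] desugars to (0~-<+>)*0?.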
Interestingly, translating Brainfuck to extended Turing expressions does not use |,
yet Brainfuck is Turing-complete: while loops seem to make conditional
expressions redundant.
(Are Turing expressions without choice (e₁|e₂)
(i.e., with only <, >, 0!, 1!, 0?, 1?, and (e)*) also Turing-complete?)
The machine implemented within this post supports those new constructs:
+, -, 0~, 1~ (also 2~ to 9~, just because; but not more, just because),
and the brackets [ and ]. You can write code in Brainfuck, and it will be desugared
and interpreted as an extended Turing expression. You can also directly write an extended
Turing expression.
The input can now be a comma-separated list prefixed by a comma (to allow multi-digit numbers).
Example: ,1,1,2,3,5,8,13.
Trailing zeroes in the output will not be printed for clarity.
Brainfuck is a high-level programming language compared to Turing expressions. Being able to increment and decrement numbers makes programming so much less tedious than explicitly manipulating unary or binary numbers in Turing machines.
Small examples
The idiom [-] zeroes out a number.
Alan's machine
[-],42,0
Add two numbers.
Alan's machine
>[-<+>],42,57,99
Nondeterministic Brainfuck
Extending Brainfuck with star (e)* and choice (e₁|e₂) equips the language with nondeterminism.
One thing we can do using nondeterminism is to define the inverse of a function
simply by guessing the output y, applying the function to it to get f(y),
then checking that it matches the input x, in which case y = f⁻¹(x).
It’s not efficient, but it’s a general implementation of inverses which may have its uses
in writing formal specifications.
Here’s a roundabout implementation of subtraction (x − y): guess the answer (call it z), add one of the operands to it (z + y), and compare the result with the other operand (if z + y = x then z = x − y).
Alan's machine
[->>>+<<<](+>>+<<)*>[->+<]>[->-<]>0?,10,3,7
Commented program
# Using four consecutive cells, named A, B, C, D
# Expected result: value of (A - B) placed in A
# D ← A
# A ← 0
[->>>+<<<]
# A ← GUESS
# C ← A
(+>>+<<)*
# C ← B + C
# B ← 0
>[->+<]
# D ← D - C
# C ← 0
>[->-<]
# Assert(D == 0)
>0?
Note that this is the opposite of most real-world regex engines: * is usually eager (equivalent to (e(e)*|) rather than (|e(e)*)). The lazy variant is usually written *? and you could add it to the syntax of Turing regexes if you want.↩︎
It’s very easy to accidentally write an infinite loop; this is a half-assed safeguard to catch some of them.↩︎
Extra care must be taken when more than two distinct symbols may be encountered on the tape.↩︎
Another hot take in the same vein is that the simply-typed lambda calculus—the simplest total functional programming language—is Turing-complete: you can encode Turing machines/general recursive functions/your favorite Turing-complete gadget by controlling nontermination with fuel, for a concrete example. Another idea is that a language where functions terminate can easily be extended with nontermination or recursion as an explicit effect.
More generally, Kleene’s normal form theorem says that if you can “express” primitive recursion, then you can “express” general recursion. Some might view this theorem as a counterargument, pointing to a boundary between “Turing-completeness” (can “express” general recursion) and “weak Turing-completeness” (can “express” primitive recursion) which can be made precise. While I recognize that there is a rich theory behind these concepts, I rather view Kleene’s normal form theorem as an argument why such a distinction is too subtle to be relevant to expressiveness in a broad sense of what we can and cannot do using a programming language.↩︎
We might even say that Brainfuck can be compiled to regular expressions using regular expressions. The regular expressions to do those substitutions are trivial though. Using sed: sed 's/\[/(0~/g;s/\]/)\*0?/g', where the two actual regular expressions are \[ and \].↩︎
Abstraction, food for thought
Two apples are the same as two apples.
Two apples are not the same as two oranges.
Two ripe apples are not the same as two rotten apples, even though they are both two apples and two apples.
Two fruits are the same as two fruits, even though they could be two apples and two oranges.
Algebraic effects
In Haskell, different sorts of effectful computations can be expressed using monads. Monads for individual effects are fairly well understood. The challenge now is to combine many different effects. Applications manage many kinds of resources (files, network, databases…), handle many types of errors, and run in different environments (production vs testing with mock components). Can that be done while maintaining a reasonable level of separation of concerns?
Currently, a common approach is to use monad transformers and type classes (mtl-style).
But when you have a big stack of monad transformers, it may not be easy to even
understand what (>>=) does, because its behavior arises from the composition
of all of these transformers. So the actual control flow of the program is
opaque to us, which can be an obstacle to locating errors and guaranteeing
performance.
Algebraic effects are another approach to combine effects. Whereas with transformers, every monad transformer must be defined from scratch, algebraic effects start from a few core primitives: you have one (parameterized) monad with abilities to “call” an operation and to “handle” such calls. The hope is that those core primitives:
- are simple to implement and to optimize;
- make it easy to define and reason about effects, in terms of both behavior and performance.
Until now, algebraic effect systems in Haskell used free monads or the continuation monad. Continuations were emulated as closures; this comes with a level of indirection whose cost is difficult to mitigate. The newly implemented delimited continuations primops let us directly manipulate native continuations.
This post uses delimited continuations to implement programs with various effects. The usual culprits:
- Exceptions
- Output
- Combining exceptions and output
- Input
- Combining input and output: streaming
- Interacting with the real world
- State
- Nondeterminism
- Concurrency
The example programs leveraging this mini effect library will look like your
standard-fare monadic code. What makes them interesting is that, operationally,
they are all in the IO monad. Unlike with monad transformers, adding a new
effect does not change the underlying monad, so code that doesn’t use that
effect does not pay a price for it. Another notable consequence is that
“unlifting” abstractions like UnliftIO or MonadBaseControl are no longer
relevant: there is nothing to “unlift” if you never leave IO.
The abstraction layer of algebraic effects over continuations is so thin that I
just use prompt and control0 directly, but the bits that are “operations”
and the bits that are “handlers” are clearly identifiable. The system
implemented here is untyped as far as effects are concerned, but features
named handlers as a mitigating alternative;
a complete effect system which would keep track of what operations each
computation may call and would provide safe primitives to define new effects is
left as an exercise for the reader.
This post is written in Literate Haskell (source code). It can be compiled using the development version of GHC (or GHC 9.6 if it has been released).
$ ghc 2023-01-02-del-cont-examples.lhs -main-is DelContExamples.main -o run-tests
$ ./run-tests
All tests passed!
Extensions and imports
{-# LANGUAGE
BangPatterns,
BlockArguments,
DerivingStrategies,
GADTs,
GeneralizedNewtypeDeriving,
MagicHash,
UnboxedTuples #-}
module DelContExamples where
import qualified Control.Exception as E
import Control.Exception.Base (NoMatchingContinuationPrompt(..))
import Data.Either
import Data.Foldable (for_)
import Data.Functor (void)
import Data.Functor.Sum (Sum(..))
import Data.Maybe (fromMaybe, maybe)
import System.IO.Unsafe
import System.Environment
import GHC.Exts (PromptTag#, newPromptTag#, prompt#, control0#)
import GHC.IO (IO(..))
import GHC.Stack (HasCallStack)
import Prelude hiding (log)
The mother of all monads
Capturing continuations is the power of the continuation monad, in which we can embed all other monads. It’s the mother of all monads.
Mom is defined identically to IO, but its only operations are the new
delimited continuation primitives.
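The definition of Mom itself is not included in this extract. Judging from how the wrappers below unwrap it, it is presumably a newtype around IO, along these lines (a sketch, not the post’s verbatim code):
-- Mom has IO's representation, but exposes no IO operations.
newtype Mom a = Mom (IO a)
  deriving newtype (Functor, Applicative, Monad)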
The available operations wrap the RTS primitives newPromptTag#,
prompt# and control0#.
-- Unsafe primitives
data PromptTag a = PromptTag (PromptTag# a)
newPromptTag :: Mom (PromptTag a)
newPromptTag = Mom (IO (\s -> case newPromptTag# s of
(# s', tag #) -> (# s', PromptTag tag #)))
prompt :: PromptTag a -> Mom a -> Mom a
prompt (PromptTag tag) (Mom (IO m)) = Mom (IO (prompt# tag m))
control0 :: PromptTag a -> ((Mom b -> Mom a) -> Mom a) -> Mom b
control0 (PromptTag tag) f =
Mom (IO (control0# tag (\k -> case f (\(Mom (IO a)) -> Mom (IO (k a))) of Mom (IO b) -> b)))
The boxing of the continuation k in control0 could be avoided by
introducing a new type for continuations, replacing (Mom b -> Mom a).
I’m not sure whether there is much to gain from that optimization.
I leave it like this for simplicity.
prompt and control0, “goto” with extra steps?
When a function terminates normally, it returns its result to its caller,
its predecessor in the call stack. prompt lets you prepare another return point
earlier in the call stack, and control0 returns to that point. What happens
to all the stack frames that were skipped that way? They are copied to the heap so they
can be restored later.
In more concrete terms, when you call control0 t f :: Mom b, the caller expects a
result of some type b. It is assumed that you have previously set up a
prompt t :: Mom a -> Mom a in the call stack with the same tag t :: PromptTag a.
The slice of the stack up to that prompt t is unwound and stored as a function
continue :: Mom b -> Mom a (IO b -> IO a).
prompt t is popped off the stack, and the program carries on as f continue.
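Here is a tiny illustration of those two primitives (my own example, not from the post): the captured continuation continue stands for “bind the result to x and multiply by 10”, and we resume it twice.
-- Capture the rest of the prompt-delimited block and resume it twice.
example :: Mom Int
example = do
  tag <- newPromptTag
  prompt tag do
    x <- control0 tag \continue -> (+) <$> continue (pure 1) <*> continue (pure 2)
    pure (x * 10)
-- run example == Just 30  (that is, 1 * 10 + 2 * 10)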
It sounds completely insane the first time you learn about it, it’s like “goto” with extra steps. And yet, when you get down to it, delimited continuations have rather clean semantics, both operationally and denotationally. The implementation was a surprisingly small change in GHC.
The changes required to implement prompt# and control0# are relatively minimal. They only impact the RTS, and they do not require any changes to existing functionality. Though capturing portions of the RTS stack may seem like a radical proposition, GHC actually already does it when raising an asynchronous exception to avoid the need to duplicate work for any blackholed thunks. In fact, getting that right is significantly more subtle than implementing control0#, which is quite straightforward in comparison.
— The GHC Proposal
The richness of continuations, both theoretically and practically, suggests that these control operators are not as arbitrary as they seem.
Effectful code, pure semantics
The code in this post can be split in two levels. Library-level code uses the delimited
continuation primitives to implement effects—operations and handlers, and user-level
code uses those effects in example programs.
Without direct access to delimited continuations, user-level code cannot
observe any mutation, so it will be safe to use the following pure run
function.
-- Look Ma', no IO!
run :: Mom a -> Maybe a
run (Mom m) = unsafePerformIO
(E.catch (Just <$> m) \NoMatchingContinuationPrompt -> pure Nothing)
Hiding the delimited continuation primitives avoids the danger of duplicating
and observing the creation of fresh PromptTags in a pure context.
Some partiality remains (Maybe) due to potentially mismatched
control0# calls. Such errors would be prevented by a type system for effects,
which is beyond the scope of this post.
Further reading
On prompt#, control0#, and newPromptTag#:
- The GHC proposal: Delimited continuations primops;
- Comment by @tomjaguarpaw in the discussion of the GHC proposal illustrating the semantics of the primops;
- Gist by @lexi-lambda with background on reduction semantics and continuations;
- A Monadic Framework for Delimited Continuations by Kent Dybvig, Simon Peyton Jones, and Amr Sabry (JFP 2007).
- The patch implementing the proposal.
On the continuation monad:
- The reasonable effectiveness of the continuation monad.
- The essence of functional programming, by Philip Wadler (POPL 1992).
Exceptions
To begin, let’s implement exceptions using delimited continuations.
This effect has an operation throw and a handler catch.
Operation
We first declare the uninterpreted operation Throw as a constructor
in a functor. The parameter a is ignored by exceptions; it will be
used by other effects.
We wrap this constructor in a user-facing function throw.
Every throw should have a matching catch, and we ensure this
by requiring a tag that identifies the corresponding catch.
The exact type of tag will be revealed in a moment.
control0 uses that tag to look up the matching catch in the call stack,
and returns to it with the exception e wrapped in Throw.
The underscore is the continuation, which is the slice of the stack below the
catch, which is thus discarded.
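The declarations of Throw and throw are not shown in this extract. Based on how catch consumes them below, they plausibly look like this (Op and the % tag type are described right after catch):
-- The Throw operation carries the exception; the type parameter a
-- (the continuation payload) is unused.
data Exception e a = Throw e

-- throw jumps to the matching catch via the tag, discarding the captured
-- continuation (the underscore).
throw :: Exception e % r -> e -> Mom a
throw tag e = control0 tag \_ -> pure (Op (Throw e))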
Handler
The type of catch should also look familiar, with the novelty that the
handled computation f expects a tag—so that it may call throw.
In catch f onThrow, a fresh tag is generated, then
f tag either (1) returns normally, and its result is wrapped in Pure a,
or (2) f tag throws an exception wrapped in Op (Throw e).
We then return the result or apply the handler onThrow accordingly.
catch :: (Exception e % a -> Mom a) -> (e -> Mom a) -> Mom a
catch f onThrow = do
tag <- newPromptTag
handle tag (f tag)
where
handle tag action = do
next <- prompt tag (Pure <$> action)
case next of
Op (Throw e) -> onThrow e
Pure a -> pure a
You might have guessed that the Exception e % a tag is just a PromptTag.
More surprisingly, the tag index involves a free monad.
For exceptions, Free (Exception e) a is equivalent to Either e a:
we expect the computation under prompt to produce either an exception e or
a result a. More generally, for an effect expressed as a functor f,
things will be set up exactly so that handlers will be matching on a
computation/tree of type Free f r.
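Concretely, the plumbing suggested here could be set up as follows (a sketch; these definitions are not part of this extract):
-- A computation that has either finished with a result (Pure) or stopped
-- at an operation (Op) of the effect functor f.
data Free f r = Op (f (Free f r)) | Pure r

-- The tag handed out by a handler of effect f whose overall result is r.
type (%) f r = PromptTag (Free f r)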
Using catch, we can implement try.
try :: (Exception e % Either e a -> Mom a) -> Mom (Either e a)
try f = catch (\tag -> Right <$> f tag) (\e -> pure (Left e))
The explicit tags serve as a form of capabilities, handles that functions
take as explicit arguments, granting the permission to use the associated
effects. This partly makes up for the lack of effect typing.
It’s not watertight: you can easily capture the tag to call throw outside of
try/catch. But from a non-adversarial perspective, this mechanism may
prevent quite a few mistakes.
Test
testThrow :: IO ()
testThrow = do
assert (isRight' (run (try (\_ -> pure "Result"))))
assert (isLeft' (run (try (\exc -> throw exc "Error"))))
where
isRight' = maybe False isRight
isLeft' = maybe False isLeft
-- Minimalistic unit testing framework
assert :: HasCallStack => Bool -> IO ()
assert True = pure ()
assert False = error "Assertion failed"
Output
Algebraic effects are also known as “resumable exceptions”, extending exceptions with the ability to continue the computation right where the exception was thrown.
The next simplest effect after exceptions is to produce some output.
Like Throw, we represent the Output operation as a constructor,
containing the value to output, and now also a continuation.
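That declaration is not shown in this extract; given how output and the handlers below use it, it presumably reads:
-- Output carries the emitted value and the continuation, which waits for a
-- computation to stand in for the "output o" call (of type Mom ()).
data Out o a = Output o (Mom () -> Mom a)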
Operation
The output wrapper is similar to throw, additionally storing the
continuation in the Output constructor.
The expected argument of the continuation continue is a computation which is
to replace the operation call.
When we call output o :: Mom (), that call “bubbles
up” like an exception, gets caught by a handler, and the call gets replaced by
pure () or some other Mom () computation.
output :: Out o % r -> o -> Mom ()
output tag o = control0 tag \continue -> pure (Op (Output o continue))
A synonym specialized to strings.
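That synonym is presumably log, which the examples below use for tracing (note the import of Prelude hiding (log) at the top):
log :: Out String % r -> String -> Mom ()
log = output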
Example
An infinite output stream of the Fibonacci sequence.
fibonacci :: Out Int % r -> Mom a
fibonacci out = fib 0 1
where
fib !a !b = do
output out a
fib b (a + b)
Handler
Run a computation lazily and collect its output in a list.
collect :: (Out o % () -> Mom ()) -> [o]
collect f = runList do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output o continue) ->
pure (o : runList (handle tag (continue (pure ()))))
Pure () -> pure []
runList = fromMaybe [] . run
Test
testFibonacci :: IO ()
testFibonacci =
assert (take 8 (collect fibonacci)
== [0, 1, 1, 2, 3, 5, 8, 13])
Combining exceptions and output
Example
The big selling point of algebraic effects is that effects can be
combined smoothly. We can thus use log to trace the
execution flow of a program using throw and catch
without further ceremony.
This looks like your usual monadic program. The point is that everything lives
in the same monad Mom (which is operationally equal to IO),
so you do not have to worry about “lifting” or “unlifting” anything through a
transformer: the semantics of (>>=) do not change with every new effect, and
there isn’t the problem that “lifting” catch and other operations that are
actually handlers is counter-intuitive for many transformers, if possible at all.
To be fair, there remain
difficulties in this area even with
algebraic effects.
tracedCatch :: Out String % r -> Mom Bool
tracedCatch out = catch this onThrow
where
this exc = do
log out "Start"
_ <- throw exc "Boom"
log out "This is unreachable"
pure False
onThrow msg = do
log out ("Error: " ++ msg)
pure True
Test
testTracedCatch :: IO ()
testTracedCatch =
assert (collect (void . tracedCatch) ==
[ "Start"
, "Error: Boom" ])Silent handler
There can also be different ways of handling an effect. The following handler discards output instead of collecting it, for example to ignore debugging logs.
discardOutput :: (Out o % a -> Mom a) -> Mom a
discardOutput f = do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output _o continue) -> handle tag (continue (pure ()))
Pure a -> pure a
Input
Dually, there is an effect to request some input.
Operation
The input call is expected to return a result i. As before, the type of the
input _ operation must coincide with the domain Mom i of the continuation.
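Neither the In functor nor input appears in this extract; based on how listInput and connect consume them below, they plausibly read:
-- Input carries only the continuation, which waits for the requested value.
data In i a = Input (Mom i -> Mom a)

input :: In i % r -> Mom i
input tag = control0 tag \continue -> pure (Op (Input continue))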
Example
Output the cumulative sum of an input stream.
Like fibonacci, this is an infinite loop in IO.
It gets broken by control0 in input.
Until now, an infinite loop in IO would either have to be broken by an
exception (which makes it not actually infinite), or have to involve
concurrency.
csum :: In Int % r -> Out Int % r -> Mom a
csum inp out = go 0
where
go !acc = do
n <- input inp
let acc' = acc + n
output out acc'
go acc'
Handler
Supply a list of inputs and stop when we run out.
listInput :: [i] -> (In i % a -> Mom a) -> Mom (Maybe a)
listInput is f = do
tag <- newPromptTag
catch (\exc -> handle exc tag is (Pure <$> f tag))
(\() -> pure Nothing)
where
handle exc tag is action = do
next <- prompt tag action
case next of
Op (Input continue)
| i : is' <- is -> handle exc tag is' (continue (pure i))
| otherwise -> handle exc tag [] (continue (throw exc ()))
Pure a -> pure (Just a)
Test
testCsum :: IO ()
testCsum =
assert ((collect \out ->
void $ listInput [1 .. 5] \inp ->
csum inp out
) == [1, 3, 6, 10, 15])
Combining input and output: streaming
The input and output effect can be combined in a streaming fashion, alternating execution between the consumer and the producer.
Handler
Feed the output of one computation into the input of the other. Terminate whenever one side terminates, discarding the other.
connect :: (Out x % a -> Mom a) -> (In x % a -> Mom a) -> Mom a
connect producer consumer = do
out <- newPromptTag
inp <- newPromptTag
handleI out inp (Pure <$> producer out) (Pure <$> consumer inp)
where
handleI out inp produce consume = do
next <- prompt inp consume
case next of
Op (Input continue) -> handleO out inp produce continue
Pure a -> pure a
handleO out inp produce consuming = do
next <- prompt out produce
case next of
Op (Output o continue) ->
handleI out inp (continue (pure ())) (consuming (pure o))
Pure a -> pure a
Test
Connect two copies of the cumulative sum process: compute the cumulative sum of the cumulative sum.
csum2 :: In Int % () -> Out Int % () -> Mom ()
csum2 inp out = connect (\out' -> csum inp out') (\inp' -> csum inp' out)
testConnect :: IO ()
testConnect =
assert ((collect \out ->
void $ listInput [1 .. 5] \inp ->
csum2 inp out
) == [1, 4, 10, 20, 35])
Interacting with the real world
What sets IO apart from ST and Mom is that it can change the world.
We can define handlers to send output and receive input from the real world.
The result of these handlers must be in IO.
Printing output
Text output can be printed to stdout.
printOutput :: (Out String % () -> Mom ()) -> IO ()
printOutput f = momToIO do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output o continue) -> pure do
putStrLn o
momToIO (handle tag (continue (pure ())))
Pure () -> pure (pure ())
momToIO = fromMaybe (pure ()) . run
Reading input
We can forward input from stdin into a
consumer computation.
readInput :: (In String % () -> Mom ()) -> IO ()
readInput f = momToIO do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Input continue) -> pure do
i <- getLine
momToIO (handle tag (continue (pure i)))
Pure () -> pure (pure ())
momToIO = fromMaybe (pure ()) . run
A drawback of this implementation is that for a computation that features both
input and output, these handlers are awkward to compose.
We can coerce IO to Mom so readInput can be composed with printOutput,
but that is a hacky solution that makes the type Mom a lie (it’s not supposed
to have side effects). A better solution may be to combine effects before
interpreting them in IO all at once.
State
No effect tutorial would be complete without the state effect.
Operations
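The State functor itself is not shown in this extract; a plausible reconstruction, inferred from how the operations below build it and how runState consumes it:
-- Get waits for the current state; Put carries the new state and waits for ().
data State s a = Get (Mom s -> Mom a) | Put s (Mom () -> Mom a)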
get :: State s % r -> Mom s
get tag = control0 tag \continue -> pure (Op (Get continue))
put :: State s % r -> s -> Mom ()
put tag s = control0 tag \continue -> pure (Op (Put s continue))
Handler
State-passing, no mutation.
runState :: s -> (State s % a -> Mom a) -> Mom (s, a)
runState s0 f = do
tag <- newPromptTag
handle tag s0 (Pure <$> f tag)
where
handle tag s action = do
next <- prompt tag action
case next of
Op (Get continue) -> handle tag s (continue (pure s))
Op (Put s' continue) -> handle tag s' (continue (pure ()))
Pure a -> pure (s, a)
Example
Again, combining state with logging is effortless, because effects live in the same underlying monad.
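The incr helper used below is also missing from this extract; a minimal sketch:
incr :: State Int % r -> Mom ()
incr st = do
  n <- get st
  put st (n + 1)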
logState :: Out String % r -> State Int % s -> Mom ()
logState out st = do
n <- get st
log out (show n)
incr2 :: Out String % r -> State Int % s -> Mom ()
incr2 out st = do
incr st
logState out st
incr st
logState out stTest
testState :: IO ()
testState = do
assert ((collect \out -> runState 0 (incr2 out) *> pure ()) == ["1", "2"])
assert (run (discardOutput \out -> runState 0 (incr2 out)) == Just (2, ()))
Nondeterminism
The examples above are quite sequential in nature.
Mom can also replace the list monad.
Operation
Choose one element in a list.
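The Nondet functor is not shown in this extract; since the element type is existentially quantified, it is presumably a GADT along these lines (GADTs is among the enabled extensions):
-- Choose pairs the candidate list with the continuation expecting the choice.
data Nondet a where
  Choose :: [x] -> (Mom x -> Mom a) -> Nondet a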
choose :: Nondet % r -> [x] -> Mom x
choose tag xs = control0 tag \continue -> pure (Op (Choose xs continue))
Example
nameTheorems :: Nondet % r -> Mom String
nameTheorems nd = do
name1 <- choose nd ["Church", "Curry"]
name2 <- choose nd ["Turing", "Howard"]
result <- choose nd ["thesis", "isomorphism"]
pure (name1 ++ "-" ++ name2 ++ " " ++ result)
Handler
Use the output effect to stream all results of a nondeterministic computation. Here, the continuation is not used linearly: it is called once for every element in the given list.
enumerate :: (Nondet % a -> Mom a) -> Out a % r -> Mom ()
enumerate f out = do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Choose xs continue) -> for_ xs (handle tag . continue . pure)
Pure a -> output out a
Test
testEnumerate :: IO ()
testEnumerate = do
assert (collect (enumerate nameTheorems) ==
[ "Church-Turing thesis"
, "Church-Turing isomorphism"
, "Church-Howard thesis"
, "Church-Howard isomorphism"
, "Curry-Turing thesis"
, "Curry-Turing isomorphism"
, "Curry-Howard thesis"
, "Curry-Howard isomorphism"
])
Concurrency
Earlier, the streaming handler connect interleaved execution of one consumer
and one producer thread. Here is a cooperative concurrency effect that lets us
dynamically fork any number of threads and interleave them.
Operations
Fork a thread to run the given computation.
fork :: Conc % r -> Mom () -> Mom ()
fork tag thread = control0 tag \continue -> pure (Op (Fork thread continue))
Cooperative concurrency: threads must yield explicitly.
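Neither the Conc functor nor yield is included in this extract; judging from how runConc consumes them below, they plausibly read:
-- Fork carries the new thread and the forking thread's continuation;
-- Yield carries only the yielding thread's continuation.
data Conc a
  = Fork (Mom ()) (Mom () -> Mom a)
  | Yield (Mom () -> Mom a)

yield :: Conc % r -> Mom ()
yield tag = control0 tag \continue -> pure (Op (Yield continue))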
Example
A thread that repeats an output value three times.
simpleThread :: Out String % r -> Conc % s -> Int -> Mom ()
simpleThread out conc n = do
log out (show n)
yield conc
log out (show n)
yield conc
log out (show n)
yield conc
Interleave 111, 222, 333.
interleave123 :: Out String % r -> Conc % s -> Mom ()
interleave123 out conc = do
fork conc (simpleThread out conc 1)
fork conc (simpleThread out conc 2)
fork conc (simpleThread out conc 3)
Handler
A round-robin scheduler. handle keeps track of a queue of threads.
It runs the first thread until the next event. If the thread yields,
its continuation is pushed to the end of the queue. If the thread
forks another thread, the forked thread is pushed to the end of the queue,
and we continue in the main thread (forking does not yield).
If the thread terminates, we remove it from the queue.
runConc :: (Conc % () -> Mom ()) -> Mom ()
runConc f = do
tag <- newPromptTag
handle tag [Pure <$> f tag]
where
handle tag [] = pure ()
handle tag (thread : threads) = do
next <- prompt tag thread
case next of
Op (Yield continue) -> handle tag (threads ++ [continue (pure ())])
Op (Fork th continue) -> handle tag (continue (pure ()) : threads ++ [Pure <$> th])
Pure () -> handle tag threads
Test
testInterleave :: IO ()
testInterleave =
assert ((collect \out -> runConc \conc -> interleave123 out conc)
== ["1", "2", "3", "1", "2", "3", "1", "2", "3"])Conclusion
Primitive delimited continuations in Haskell give us the power to jump around the stack to implement many kinds of effects. Under the hood, those operations live in the IO monad, grounding effectful code in a familiar execution model.
For those new to the topic, I hope that these examples may serve as a good starting point to experiment with delimited continuations and algebraic effects in Haskell.
The system implemented here is as rudimentary as it gets. To define new effects and handlers, we use the new primitives directly, which is dangerous. This was deliberate to provide material to familiarize oneself with those primitives. Moreover, on the one hand, a type system to keep track of the scope of delimited continuations is a non-trivial ask. On the other hand, the examples here all follow a regular structure, so there is probably a way to encapsulate the primitives, trading off some expressiveness for a safe interface to define new effects and handlers.
Named handlers—via prompt tags—occupy an interesting spot in the scale of safety guarantees. It is imperfect, even very easy to circumvent. But if you’re not working against it, it is still a neat way to prevent simple mistakes. This system can be reinforced further using rank-2 polymorphism, a technique described in:
- First-Class Names for Effect Handlers, Ningning Xie, Youyou Cong, Kazuki Ikemori, Daan Leijen (OOPSLA 2022).
Interestingly, prompt tags were not part of the original proposal, and they are not used by eff, the effect system which gave rise to Alexis King’s GHC proposal. Prompt tags were added during the feedback process to make the primitives type-safe by default.
Now is an exciting time for algebraic effects/delimited continuations, as they are making their way into industrial languages: Haskell, OCaml, WebAssembly.
All of this is executable
Quantified constraints and type families
QuantifiedConstraints is an extension from GHC 8.6 that lets us
use forall in constraints.
It lets us express constraints for instances of higher-kinded types like Fix:
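The example itself does not appear in this extract; the usual illustration (modulo QuantifiedConstraints and possibly UndecidableInstances) looks like this:
newtype Fix f = Fix (f (Fix f))

-- Eq (Fix f) holds whenever Eq can be lifted through f.
instance (forall a. Eq a => Eq (f a)) => Eq (Fix f) where
  Fix x == Fix y = x == y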
Other solutions existed previously, but they’re less elegant:
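One earlier workaround of that kind (whether it is the exact one the post showed, I cannot tell from this extract) lifts equality through f by hand, using Eq1 from Data.Functor.Classes:
-- needs: import Data.Functor.Classes (Eq1, liftEq)
eqFix :: Eq1 f => Fix f -> Fix f -> Bool
eqFix (Fix x) (Fix y) = liftEq eqFix x y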
It also lets us say that a monad transformer indeed transforms monads:
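That snippet is also missing here; the shape suggested by the User Guide is a class whose superclass is a quantified constraint:
-- Every transformed monad (t m) is again a Monad.
class (forall m. Monad m => Monad (t m)) => MonadTrans t where
  lift :: Monad m => m a -> t m a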
(Examples lifted from the GHC User Guide on QuantifiedConstraints, section Motivation.)
One restriction is that the conclusion of a quantified constraint cannot mention a type family.
type family F a
-- (forall a. C (F a)) -- Illegal type family application in a quantified constraint
Quantified constraints can be thought of as providing local instances, and they are subject to a similar restriction on the shape of instance heads, so that instance resolution may try to match required constraints with the heads of existing instances.
Type families are not matchable: we cannot determine whether an applied
type family F a matches a type constructor T in a manner satisfying the
properties required by instance resolution (“coherence”). So type families
can’t appear in the conclusion of a quantified constraint.
The quantified constraint trick
Step 1
To legalize type families in quantified constraints, all we need is a class synonym:
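The synonym in question, which the recap below spells out as well:
class    C (F a) => CF a
instance C (F a) => CF a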
That CF a is equivalent to C (F a), and forall a. CF a is legal.
Step 2?
Since GHC 9.2, Step 1 alone solves the problem. It Just Works™. And I don’t know why.
Before that, for GHC 9.0 and prior, we also needed to hold the compiler’s hand and tell it how to instantiate the quantified constraint.
Indeed, now functions may have constraints of the form forall a. CF a,
which should imply C (F x) for any x.
Although CF and C (F x) are logically related, when C (F x) is required,
that triggers a search for instances of the class C, and not the CF which
is provided by the quantified constraint.
The search would fail unless some hint is provided to the compiler.
When you require a constraint C (F x), insert a type annotation mentioning
the CF x constraint (using the CF class instead of C).
Inside the annotation (to the left of ::), we are given CF x, from which C (F x) is inferred as a superclass. Outside the annotation, we are requiring CF x,
which is trivially solved by the quantified constraint forall a. CF a.
Recap
-- Mixing quantified constraints with type families --
class C a
type family F a
-- forall a. C (F a) -- Nope.
class C (F a) => CF a -- Class synonym
instance C (F a) => CF a
-- forall a. CF a -- Yup.
-- Some provided function we want to call.
f :: C (F t) => t
-- A function we want to implement using f.
g :: (forall a. CF a) => t
g = f -- OK on GHC >= 9.2
g = f :: CF t => t -- Annotation needed on GHC <= 9.0
The part of that type annotation that really matters is the constraint. The rest of the type to the right of the arrow is redundant. Another way to write only the constraint uses the following identity function with a fancy type:
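Its definition is not in this extract; given the call with @(RepFmapRep a a' f) further down, one plausible definition is the following (ConstraintKinds, RankNTypes and TypeApplications assumed):
with :: forall c r. c => (c => r) -> r
with x = x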
So you can supply the hint like this instead:
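Presumably something like this, reusing g and CF from the recap and the with above:
g = with @(CF t) f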
Application: generic-functor
What do I need that trick for? It comes up in generic metaprogramming.
Imagine deriving Functor for Generic types (no Generic1, which is not as
general as you might hope). One way is to implement the following class on
generic representations:
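The class itself is not shown in this extract; judging from how it is used below, it is a multi-parameter class along these lines (its instances, which the post later reveals to be incoherent, are omitted):
class RepFmap a a' x x' where
  repFmap :: (a -> a') -> x -> x'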
A type constructor f :: Type -> Type will be a Functor when its
generic representation (Rep) implements RepFmap a a'…
for all a, a'.
-- Class synonym for generically derivable functors
class (forall a. Generic (f a), forall a a'. RepFmap a a' (Rep (f a) ()) (Rep (f a') ())) => GFunctor f
instance ... -- idem (class synonym)
-- Wait a second...
But that is illegal, because the type family Rep occurs in the conclusion of
a quantified constraint.
Time for the trick! We give a new name to the conclusion:
class RepFmap a a' (Rep (f a) ()) (Rep (f a') ()) => RepFmapRep a a' f
instance ... -- idem (class synonym)
And we can use it in a quantified constraint:
-- Now this works!
class (forall a. Generic (f a), forall a a'. RepFmapRep a a' f) => GFunctor f
instance ... -- idem (class synonym)
To obtain the final generic implementation of fmap, we wrap repFmap between to and from.
gfmap :: forall f a a'. GFunctor f => (a -> a') -> f a -> f a'
gfmap f =
with @(RepFmapRep a a' f) -- Hand-holding for GHC <= 9.0
(to @_ @() . repFmap f . from @_ @())
Et voilà.
Appendix: Couldn’t we do this instead?
If you’ve followed all of that, there’s one other way you might try defining
gfmap without QuantifiedConstraints, by just listing the three constraints
actually needed in the body of the function.
-- Dangerous gfmap!
gfmap ::
Generic (f a) =>
Generic (f a') =>
RepFmap a a' (Rep (f a) ()) (Rep (f a') ()) =>
(a -> a') -> f a -> f a'
gfmap f = to @_ @() . repFmap f . from @_ @()
This is okay as long as it is only ever used to implement fmap as in:
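Presumably an instance of this shape, for some Generic type, say the two-parameter T used in the examples below (illustrative only, since RepFmap's instances are left unimplemented in this post):
instance Functor (T a) where
  fmap = gfmap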
Any other use voids a guarantee you didn’t know you expected.
The thing I haven’t told you is that RepFmap is implemented with…
incoherent instances!1 In fact, this gfmap may behave differently
depending on how it is instantiated at compile time.
For example, for a functor with a field of constant type:
gfmap @(T a) @b @b' where a, b and b' are distinct type variables
behaves like fmap should. But gfmap @(T Int) @Int @Int
will unexpectedly apply its argument function to every field.
They all have type Int, so a function Int -> Int can and will be applied to
all fields.
I could demonstrate this if I had implemented RepFmap…
Luckily, there is a more general version of this “dangerous gfmap” readily
available in my library
generic-functor.
It can be very incoherent, but if you follow some rules, it can also be very
fun to use.
Playing with fire
gsolomap2 is a function from generic-functor that can implement
fmap, and much more.
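The definitions of T and fmapT are not included in this extract. To make the examples below concrete, here is one plausible setup, with a hypothetical type T chosen to match the C 0 0 0 examples further down, and assuming gsolomap is imported from the library's Generic.Functor module:
-- needs: import GHC.Generics (Generic)
-- needs: import Generic.Functor (gsolomap)
-- Hypothetical example type: two fields at the "constant" parameter a and
-- one field at the mapped parameter b.
data T a b = C a a b
  deriving (Eq, Show, Generic)

fmapT :: (b -> b') -> T a b -> T a b'
fmapT = gsolomap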
Map over the first parameter if you prefer:
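Perhaps (firstT is a hypothetical name):
firstT :: (a -> a') -> T a b -> T a' b
firstT = gsolomap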
Or map over both type parameters at once:
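And for both parameters at once (again a hypothetical name):
bothT :: (a -> a') -> T a a -> T a' a'
bothT = gsolomap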
I don’t know what to call this, but gsolomap also does what you might guess
from this type:
watT ::
(a -> a') ->
T (a , a ) ((a -> a') -> Maybe a ) ->
T (a', a') ((a' -> a ) -> Maybe a')
watT = gsolomap
It’s important to specialize gsolomap with distinct type variables
(a and a').
You cannot refactor code by inlining a function if its body uses gsolomap,
as it risks breaking that requirement.
Witnessing incoherence
For an example of surprising result caused by incoherence, apply the fmapT
defined above to some concrete arguments. See how the result changes when you
replace fmapT with its definition, gsolomap.
fmapT ((+1) :: Int -> Int) (C 0 0 0) == C 0 0 1 :: T Int Int
gsolomap ((+1) :: Int -> Int) (C 0 0 0) == C 1 1 1 :: T Int Int -- Noooooo...
(Gist of those gsolomap (counter)examples)
This is why gfmap’s signature should use quantified constraints:
this guarantees that when the RepFmap constraint is solved,
the first two parameters are going to be distinct type variables,
thanks to the universal quantification (forall a a').
Thus, incoherence is hidden away.
Following that recipe, generic-functor contains safe implementations of
Functor, Foldable, Traversable, Bifunctor, and Bitraversable.
In particular, the type of gfmap guarantees that it has a unique
inhabitant satisfying gfmap id = id, and this property is quite
straightforward to check by visual inspection of the implementation.
After all, gfmap will essentially do one of three things:
(1) it will be id on types that don’t mention the type parameters
in its function argument a -> a', (2) it will apply the provided function
f, or (3) it will fmap (or bimap, or dimap) itself through a type
constructor. In all cases (with some inductive reasoning for (3)),
if f = id, then gfmap f = id.
The dangerous gfmap (without QuantifiedConstraints) or gsolomap fail this
property, because the extra occurrences of a and a' in its constraint make
their signatures have a different “shape” from fmap.
The trade-off is that those safe functions can’t do the same crazy things
as gsolomap above.
Combining my two favorite topics, I’ve always wanted to mechanize combinatorics in Coq.2 An immediate challenge is to formalize the idea of “set”.3 We have to be able to define the set of things we want to count. It turns out that there are at least two ways of encoding sets in type theory: sets as types, and sets as predicates. They are suitable for defining different classes of operations: sums (disjoint union) are a natural operation on types, while unions and intersections are naturally defined on predicates.
The interplay between these two notions of sets, and finiteness, will then let us prove the standard formula for the cardinality of unions, aka. the binary inclusion-exclusion formula:
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
Imports and options
From Coq Require Import ssreflect ssrbool.
Set Implicit Arguments.
Sets as types
The obvious starting point is to view a type as the set of its inhabitants.
How do we count its inhabitants?
We will say that a set A has cardinality n if there is a bijection between
A and the set {0 .. n-1} of natural numbers between 0 and n-1.
Bijections
A bijection is a standard way to represent a one-to-one correspondence
between two sets, with a pair of inverse functions.
We define the type bijection A B as a record containing the two functions
and a proof of their inverse relationship.
Record is_bijection {A B} (to : A -> B) (from : B -> A) : Prop :=
{ from_to : forall a, from (to a) = a
; to_from : forall b, to (from b) = b }.
Record bijection (A B : Type) : Type :=
{ bij_to : A -> B
; bij_from : B -> A
; bij_is_bijection :> is_bijection bij_to bij_from }.
Infix "<-->" := bijection (at level 90) : type_scope.
We say that A and B are isomorphic when there exists a bijection between
A and B. Isomorphism is an equivalence relation: reflexive, symmetric,
transitive.4
Definition bijection_refl {A} : A <--> A.
Admitted. (* Easy exercise *)
Definition bijection_sym {A B} : (A <--> B) -> (B <--> A).
Admitted. (* Easy exercise *)
Definition bijection_trans {A B C} : (A <--> B) -> (B <--> C) -> (A <--> C).
Admitted. (* Easy exercise *)
Infix ">>>" := bijection_trans (at level 40).
Finite sets
Our “bijective” definition of cardinality shall rely on a primitive,
canonical family of finite types {0 .. n-1} that is taken for granted.
We can define them as the following sigma type, using the familiar set
comprehension notation, also known as ordinal in math-comp:
Definition fin (n : nat) : Type := { p | p < n }.
An inhabitant of fin n is a pair of a p : nat and
a proof object of p < n. Such proofs objects are unique for a given
p and n, so the first component uniquely determines the second component,
and fin n does have exactly n inhabitants.5
Finiteness
We can now say that a type A has cardinality n if there is a bijection
between A and fin n, i.e., there is an inhabitant of A <--> fin n.
Note that this only defines finite cardinalities, which is fine for doing
finite combinatorics. Infinity is really weird so let’s not think about it.
As a sanity check, you can verify the cardinalities of the usual suspects,
bool, unit, and Empty_set.
Definition bijection_bool : bool <--> fin 2.
Admitted. (* Easy exercise *)
Definition bijection_unit : unit <--> fin 1.
Admitted. (* Easy exercise *)
Definition bijection_Empty_set : Empty_set <--> fin 0.
Admitted. (* Easy exercise *)
A type A is finite when it has some cardinality n : nat.
When speaking informally, it’s common to view finiteness as a property,
a thing that a set either is or is not. To prove finiteness
is merely to exhibit the relevant data: a number to be the cardinality,
and an associated bijection (which we call an enumeration of A,
enum for short).
Hence we formalize “finiteness” as the type of that data.
Record is_finite (A : Type) : Type :=
{ card : nat
; enum : A <--> fin card }.
Further bundling is_finite A proofs with their associated set A, we obtain
a concept aptly named “finite type”.6 A finite type is a type A paired with
a proof of is_finite A.
Record finite_type : Type :=
{ ft_type :> Type
; ft_is_finite :> is_finite ft_type }.
We leverage coercions (indicated by :>) to lighten the notation of
expressions involving finite_type.
The first coercion ft_type lets us use a finite_type as a Type.
So if E : finite_type, we can write the judgement that
“e is an element of E” as e : E, which implicitly expands to
the more cumbersome e : ft_type E.
Similarly, the second coercion ft_is_finite lets us access
the evidence of finiteness without naming that field. In particular,
we can write the cardinality of E : finite_type as card E,
as if card were a proper field of E rather than the nested record it
actually belongs to. This is a convenient mechanism for overloading,
letting us reuse the name card(inality) even though records technically
cannot have fields with the same name.
With that, we define #|A| as sugar for card A:
Notation "'#|' A '|'" := (card A).
Some notation boilerplate
Declare Scope fintype_scope.
Delimit Scope fintype_scope with fintype.
Bind Scope fintype_scope with finite_type.
Uniqueness of cardinality
The phrase “cardinality of a set” suggests that cardinality is an inherent property of sets. But now we’ve defined “finite type” essentially as a tuple where the cardinality is just one component. What’s to prevent us from putting a different number there, for the same underlying type?
We can prove that this cannot happen. Cardinality is unique: any two finiteness proofs for the same type must yield the same cardinality.
(The proof is a little tedious and technical.)
Theorem card_unique {A} (F1 F2 : is_finite A) : card F1 = card F2.
Admitted. (* Intermediate exercise *)
A slightly more general result is that isomorphic types (i.e., related by
a bijection) have the same cardinality. It can first be proved
in terms of is_finite, from which a corollary in terms of finite_type
follows.
Theorem card_bijection {A B} (FA : is_finite A) (FB : is_finite B)
: (A <--> B) -> card FA = card FB.
Admitted. (* Like card_unique *)
Theorem card_bijection_finite_type {A B : finite_type}
: (A <--> B) -> #|A| = #|B|.
Proof.
apply card_bijection.
Qed.
The converse is also true and useful: two types with the same cardinality are isomorphic.
Theorem bijection_card {A B} (FA : is_finite A) (FB : is_finite B)
: card FA = card FB -> (A <--> B).
Admitted. (* Easy exercise *)
Theorem bijection_card_finite_type {A B : finite_type}
: #|A| = #|B| -> (A <--> B).
Proof.
apply bijection_card.
Qed.
Operations on finite sets
Sum
The sum of sets is also known as the disjoint union.
Inductive sum (A B : Type) : Type :=
| inl : A -> A + B
| inr : B -> A + B
where "A + B" := (sum A B) : type_scope.
sum is a binary operation on types. We must work to
make it an operation on finite types.
There is a bijection between fin n + fin m (sum of sets)
and fin (n + m) (sum of nats).
Definition bijection_sum_fin {n m} : fin n + fin m <--> fin (n + m).
Admitted. (* Intermediate exercise *)
The sum is a bifunctor.
Definition bijection_sum {A A' B B'}
: (A <--> B) -> (A' <--> B') -> (A + A' <--> B + B').
Admitted. (* Easy exercise *)
Combining those facts, we can prove that the sum of two finite sets is finite (finite_sum),
and the cardinality of the sum is the sum of the cardinalities (card_sum).
Definition is_finite_sum {A B} (FA : is_finite A) (FB : is_finite B)
: is_finite (A + B) :=
{| card := #|FA| + #|FB|
; enum := bijection_sum (enum FA) (enum FB) >>> bijection_sum_fin |}.
Definition finite_sum (A B : finite_type) : finite_type :=
{| ft_type := A + B ; ft_is_finite := is_finite_sum A B |}.
Infix "+" := finite_sum : fintype_scope.
Theorem card_sum {A B : finite_type} : #|(A + B)%fintype| = #|A| + #|B|.
Proof.
reflexivity.
Qed.
Product
The cartesian product has structure dual to the sum.
Inductive prod (A B : Type) : Type :=
| pair : A -> B -> A * B
where "A * B" := (prod A B) : type_scope.
- There is a bijection
fin n * fin m <--> fin (n * m). - The product is a bifunctor.
- The product of finite sets is finite.
- The cardinality of the product is the product of the cardinalities.
Coq code
Definition bijection_prod_fin {n m} : fin n * fin m <--> fin (n * m).
Admitted. (* Intermediate exercise *)
Definition bijection_prod {A A' B B'}
: (A <--> B) -> (A' <--> B') -> (A * A' <--> B * B').
Admitted. (* Easy exercise *)
Definition is_finite_prod {A B} (FA : is_finite A) (FB : is_finite B)
: is_finite (A * B) :=
{| card := #|FA| * #|FB|
; enum := bijection_prod (enum FA) (enum FB) >>> bijection_prod_fin |}.
Definition finite_prod (A B : finite_type) : finite_type :=
{| ft_type := A * B ; ft_is_finite := is_finite_prod A B |}.
Infix "*" := finite_prod : fintype_scope.
Theorem card_prod {A B : finite_type} : #|(A * B)%fintype| = #|A| * #|B|.
Proof.
reflexivity.
Qed.
Sets as predicates
Two other common operations on sets are union and intersection.
However, those operations don’t fit in the view of sets as types.
While set membership x ∈ X is a proposition, type inhabitation x : X is
a judgement, which is a completely different thing,7 so we need a different
approach.
The idea of set membership x ∈ X as a proposition presumes that x
and X are entities that exist independently of each other. This suggests
that there is some “universe” that elements x live in, and the
sets X under consideration are subsets of that same universe.
We represent the universe by a type A, and sets (i.e., “subsets of the universe”)
by predicates on A.
Definition set_of (A : Type) := (A -> bool).
Hence, if x : A is an element of the universe, and X : set_of A is a set
(subset of the universe), we will denote set membership x ∈ X simply as X x
(x satisfies the predicate X).
Those predicates are boolean, i.e., decidable. This is necessary in several
constructions and proofs here, notably to prove that the union or intersection
of finite sets is finite. We rely on a coercion to implicitly convert booleans
to Prop: is_true : bool >-> Prop, which is exported by ssreflect.
Union, intersection, complement
Those common set operations correspond to the usual logical connectives.
Section Operations.
Context {A : Type}.
Definition union' (X Y : set_of A) : set_of A := fun a => X a || Y a.
Definition intersection' (X Y : set_of A) : set_of A := fun a => X a && Y a.
Definition complement' (X : set_of A) : set_of A := fun a => negb (X a).
End Operations.
Define the familiar infix notation for union and intersection.
Declare Scope set_of_scope.
Delimit Scope set_of_scope with set_of.
Bind Scope set_of_scope with set_of.
Infix "∪" := union' (at level 40) : set_of_scope.
Infix "∩" := intersection' (at level 40) : set_of_scope.
Finiteness
Again, we will characterize finite sets using bijections to fin n.
We first transform the set X into a type to_type X, so we can form
the type of bijections to_type X <--> fin n. Like fin, we define
to_type X as a sigma type. Thanks to the predicate X being boolean,
there is at most one proof p : X a for each a, so the type
{ a : A | X a } has exactly one inhabitant for each inhabitant a : A
satisfying X a.
Definition to_type {A : Type} (X : set_of A) : Type := { a : A | X a }.
Coercion to_type : set_of >-> Sortclass.
We obtain a notion of finite set by imitating the structure of finite_type.
The set-as-predicate X is finite if the set-as-type to_type X is finite.
Record finite_set_of (A : Type) : Type :=
{ elem_of :> set_of A
; fso_is_finite :> is_finite (to_type elem_of)
}.
Similarly, a finite_set_of can be coerced to a finite_type.
Definition to_finite_type {A} (X : finite_set_of A) : finite_type :=
{| ft_type := elem_of X
; ft_is_finite := X |}.
Coercion to_finite_type : finite_set_of >-> finite_type.
Finite unions and intersections
We then prove that the union and intersection of finite sets are finite. This is actually fairly challenging, since proving finiteness means to calculate the cardinality of the set and to construct the associated bijection. Unlike sum and product, there is no simple formula for the cardinality of union and intersection. One candidate may seem to be the binary inclusion-exclusion formula:
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
But that only gives the cardinality of the union in terms of the intersection, or vice versa, and we don’t know either yet.
Rather than constructing the bijections directly, we decompose the proof.
The intuition is that X ∪ Y and X ∩ Y can easily be “bounded” by known
finite sets, namely X + Y and X respectively. By “bounded”, we mean that
there is an injection from one set to the other.
The standard definition of injectivity is via an implication
f x = f y -> x = y. However, a better definition for our purposes
comes from a one-sided inverse property: a function f : A -> B is
a section if there exists another function g : B -> A (called a retraction)
such that g (f x) = x.
Every section is an injection, but the converse requires the law of excluded
middle.
Record is_section {A B} (to : A -> B) (from : B -> A) : Prop :=
{ s_from_to : forall a, from (to a) = a }.
Record section (A B : Type) : Type :=
{ s_from : A -> B
; s_to : B -> A
; s_is_section : is_section s_from s_to }.
The point is that, given a section to a finite set, section A (fin n),
we can construct a bijection A <--> fin m for some m smaller than n.
We formalize this result with a proof-relevant sigma type.
Definition section_bijection (A : Type) (n : nat)
: section A (fin n) -> { m & A <--> fin m }.
Admitted. (* Hard exercise *)
This construction is rather involved. It is much more general than when we were looking specifically at union and intersection, but at the same time it is easier to come up with as it abstracts away the distracting details of those operations.
Now there is a section from X ∪ Y to X + Y,
and from X ∩ Y to X.
Definition section_union {A} (X Y : set_of A)
: section (X ∪ Y)%set_of (X + Y).
Admitted. (* Easy exercise *)
Definition section_intersection {A} (X Y : set_of A)
: section (X ∩ Y)%set_of X.
Admitted. (* Easy exercise *)
We can then rely on the finiteness of X and X + Y to extend those
sections to fin n for some n via the following theorem:
Theorem section_extend (A B C : Type)
: section A B -> (B <--> C) -> section A C.
Admitted. (* Easy exercise *)
Definition section_union' {A} (X Y : finite_set_of A)
: section (X ∪ Y)%set_of (fin (#|X| + #|Y|)).
Proof.
eapply section_extend.
- apply section_union.
- apply is_finite_sum.
Qed.
Definition section_intersection' {A} (X Y : finite_set_of A)
: section (X ∩ Y)%set_of (fin #|X|).
Proof.
eapply section_extend.
- apply section_intersection.
- apply enum.
Qed.
Finally, by section_bijection, we obtain finiteness proofs of union' and
intersection', which let us define union and intersection properly as operations
on finite sets.
Theorem is_finite_union {A} {X Y : set_of A}
(FX : is_finite X) (FY : is_finite Y)
: is_finite (X ∪ Y)%set_of.
Proof.
refine {| enum := projT2 (section_bijection _) |}.
eapply (section_extend (B := X + Y)%type).
- apply section_union.
- apply (is_finite_sum FX FY).
Qed.
Theorem is_finite_intersection {A} {X Y : set_of A}
(FX : is_finite X) (FY : is_finite Y)
: is_finite (X ∩ Y)%set_of.
Proof.
refine {| enum := projT2 (section_bijection _) |}.
eapply section_extend.
- apply section_intersection.
- apply (enum FX).
Qed.
Definition union {A} (X Y : finite_set_of A) : finite_set_of A :=
{| fso_is_finite := is_finite_union X Y |}.
Definition intersection {A} (X Y : finite_set_of A) : finite_set_of A :=
{| fso_is_finite := is_finite_intersection X Y |}.
Declare Scope fso_scope.
Delimit Scope fso_scope with fso.
Bind Scope fso_scope with finite_set_of.
Infix "∪" := union (at level 40) : fso_scope.
Infix "∩" := intersection (at level 40) : fso_scope.
Hereafter, ∪ and ∩ will denote finite unions and intersections.
#[local] Open Scope fso_scope.
Inclusion-exclusion
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
To prove that formula, it’s probably not a good idea to look at how ∪ and ∩
compute their cardinalities. A better idea is to construct a bijection, which
implies an equality of cardinalities by card_bijection.
To start, subtractions are bad, so we rewrite the goal:
#|X ∪ Y| + #|X ∩ Y| = #|X| + #|Y|
Now we look for a bijection (X ∪ Y) + (X ∩ Y) <--> X + Y.
It gets a bit tricky because of the dependent types.
Definition inclusion_exclusion_bijection {A} (X Y : finite_set_of A)
: (X ∪ Y)%set_of + (X ∩ Y)%set_of <--> X + Y.
Admitted. (* Hard exercise *)
Isomorphic sets have the same cardinality (by theorem card_bijection_finite_type).
The resulting equation simplifies to the binary inclusion-exclusion identity,
because #|A + B| equals #|A| + #|B| definitionally.
So the proof consists simply of applying that theorem with the above bijection.
Theorem inclusion_exclusion {A} (X Y : finite_set_of A)
: #|X ∪ Y| + #|X ∩ Y| = #|X| + #|Y|.
Proof.
apply (@card_bijection_finite_type ((X ∪ Y) + (X ∩ Y)) (X + Y)).
apply inclusion_exclusion_bijection.
Qed.
Conclusion
To formalize mathematics, it’s often useful to revisit our preconceptions about fundamental concepts. To carry out even basic combinatorics in type theory, it’s useful to distinguish two views of the naive notion of set.
For example, when we say “union”, we really mean one of two things depending on the context. Either the sets are obviously disjoint, so we really mean “sum”: this corresponds to viewing sets as types. Or we implicitly know that the two sets contain the same “type” of elements a priori, so the overlap is something we have to worry about explicitly: this corresponds to viewing sets as predicates on a given universe.
Ironically, when making restaurant reservations, I still occasionally forget to count myself.↩︎
The code from this post is part of this project I’ve started here. Also check out Brent Yorgey’s thesis: Combinatorial Species and Labelled Structures (2014).↩︎
Speaking of sets, it’s important to distinguish naive set theory from axiomatic set theory. Naive set theory is arguably what most people think of when they hear “set”. It is a semi-formal system for organizing mathematics: there are sets, they have elements, and there are various operations to construct and analyze sets, but overall we don’t think too hard about what sets are (hence, “semi-formal”). When this blog post talks about sets, it is in the context of naive set theory. Axiomatic set theory is formal, with rules that are clear enough to be encoded in a computer. The name “axiomatic set theory” is a stroke of marketing genius, establishing it as the “standard” way of formalizing naive set theory, and thus, all of mathematics, as can be seen in most introductory courses on formal logic. Historically, Zermelo’s set theory was formulated at around the same time as Russell’s type theory, and type theory is at the root of currently very active areas of programming language theory and formal methods.↩︎
Bijections actually form a groupoid (a “proof-relevant equivalence relation”).↩︎
We could also have defined fin as the inductive type of bounded naturals, which is named Fin.t in the standard library. Anecdotal experience suggests that the sigma type is more beginner-friendly. But past a certain level of familiarity, I think they are completely interchangeable.
Inductive fin' : nat -> Type := | F0 : forall n, fin' (S n) | FS : forall n, fin' n -> fin' (S n).
The definition of fin as a sigma type relies on details of the definition of the order relation _ < _. Other definitions may allow the proposition p < n to be inhabited by multiple proof objects, causing fin n to have “more” than n inhabitants unless they are collapsed by proof irrelevance.↩︎
math-comp has a different but equivalent definition of fintype.↩︎
… if you know what those words mean.↩︎