Haskell extensions and imports used in this post
{-# LANGUAGE
DataKinds,
DeriveGeneric,
DeriveTraversable,
DerivingStrategies,
GeneralizedNewtypeDeriving,
RankNTypes,
ScopedTypeVariables,
StandaloneDeriving,
TypeFamilies #-}
import Data.Ord (Down(Down, getDown))
import Data.List.NonEmpty (NonEmpty(..))
import qualified GHC.Generics as GHC
import Generics.SOP (Generic, HasDatatypeInfo, NP(..), K(..))
import Test.QuickCheck
import Test.StrictCheck
Minimax
Minimax is a general algorithm for finding optimal strategies. It’s not meant to be efficient or practical. It is more of a basic concept of game theory, and a reference against which to compare other game-solving algorithms.
We consider a simple model of two-player games. They take turns playing moves until reaching an end state with a final score. One player’s goal is to maximize the score, whereas the other player’s goal is to minimize it. Let us call these players Max and Min respectively, short for Maximizer and Minimizer.
We represent such a game by its game tree, which is made up of three constructors:
- a Max (resp. Min) node represents a game state where Max (resp. Min) chooses the next move, each move resulting in a new game state,
- and an End leaf represents an end state as its score.
data Game score
= Max (NonEmpty (Game score))
| Min (NonEmpty (Game score))
| End score
deriving stock (Show, Functor, Foldable)
Note that Max and Min nodes must have at least one possible move.
You may be wondering about games that end when one player can no longer play:
instead of an empty Min or Max node, such game states simply correspond
to an End leaf, making the final score explicit.
Most real games just have a win/tie/lose end condition.
They naturally correspond to applying Game to a type with three possible scores:
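For example, with a hypothetical three-valued score type (the names here are illustrative, not from the original):
data Score3 = Lose | Tie | Win
  deriving (Eq, Ord, Bounded, Show)
-- A win/tie/lose game is then a value of type Game Score3.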
In practice, chess engines don’t work with the whole game tree
since it is too massive. Instead, they build approximations by
pruning certain branches of the tree and replacing them with leaves.
The score on each leaf is a number which estimates how favorable
the game state is to either player. So we end up with
Game ℝ, or Game Double.
In general, the type Game represents two-player games
with complete information and zero-sum objectives.
We shall assume that score is a totally ordered set. This requirement
corresponds to a constraint Ord score in Haskell. In that case,
there exists an “optimal strategy” for each player which guarantees
them an “optimal score” m in the sense that as long as one player
sticks to their “optimal strategy”, the other player cannot
score better than m.
This situation is what we call a Nash equilibrium in game theory.
For win/tie/lose games, the existence of a Nash equilibrium
means that either there is a winning strategy for one of the players,
or they must tie by playing optimally.
The “optimal score” m is unique, and can be computed by a fold of the game tree,
replacing Max and Min constructors with the functions maximum and minimum.
This is the minimax algorithm:
minimax :: Ord score => Game score -> score
minimax (Max gs) = maximum (minimax <$> gs)
minimax (Min gs) = minimum (minimax <$> gs)
minimax (End s) = s
minimax is quite an inefficient algorithm:
it must traverse the whole game tree. Indeed, maximum
and minimum must traverse the whole list to find
the maximum or minimum element.
Often, we can do much better. For instance, consider the following tree:
Max [ End 0,
      Min [ End (-1),
            t ] ]
The minimax of that tree does not depend on the subtree t.
Indeed, minimum [-1, minimax t] is guaranteed to be at most -1,
so the maximum between that value and 0 is guaranteed to be 0.
Thus we can compute the minimax without inspecting the subtree t,
which may be arbitrarily large.
That idea leads to a more efficient algorithm to compute the minimax.
Alpha-beta
The alpha-beta pruning algorithm¹ is a modification of minimax with an extra pair of arguments:
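Its signature, repeated from the implementation given below:
alphabeta :: Ord score => Game score -> (score, score) -> score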
The pair (alpha, beta) represents a “relevance interval” which
relaxes the possible outputs of alphabeta.
Either alphabeta t (alpha, beta) produces a score within that interval,
in which case it is guaranteed to be equal to minimax. Otherwise,
alphabeta t (alpha, beta) produces a value outside of the interval,
in which case its exact value does not matter; it only has to be on
the same side of the interval as minimax t. More rigorously:
- if alpha < minimax t < beta, then alphabeta t (alpha, beta) = minimax t;
- if minimax t <= alpha, then alphabeta t (alpha, beta) <= alpha;
- if beta <= minimax t, then beta <= alphabeta t (alpha, beta).
Leaving the value of alphabeta underspecified when outside of the
interval allows the implementation to short-circuit:
we can stop searching through Max nodes as soon as we can guarantee a score greater than beta,
and we can stop searching through Min nodes as soon as we can guarantee a score smaller than alpha.
We can then use alphabeta to redefine minimax:
-- Minimax using alpha-beta pruning
minimaxAB :: (Ord score, Bounded score) => Game score -> score
minimaxAB t = alphabeta t (minBound, maxBound)
assuming that score is Bounded with extreme values
minBound :: score and maxBound :: score.
It’s possible to avoid the Bounded constraint by changing
the interval type (score, score) to (Maybe score, Maybe score),
which amounts to adding distinguished top and bottom elements.
We’ll stick with Bounded to keep things a bit simpler.
Implementing alphabeta is a standard exercise.
It is even easier when you have a formal specification like the above
to guide the implementation.
alphabeta :: Ord score => Game score -> (score, score) -> score
alphabeta (Max (g0 :| [])) i = alphabeta g0 i
alphabeta (Max (g0 :| g1 : gs)) (alpha, beta) =
let m0 = alphabeta g0 (alpha, beta) in
if beta <= m0 then m0
else m0 `max` alphabeta (Max (g1 :| gs)) (max alpha m0, beta)
alphabeta (Min (g0 :| [])) i = alphabeta g0 i
alphabeta (Min (g0 :| g1 : gs)) (alpha, beta) =
let m0 = alphabeta g0 (alpha, beta) in
if m0 <= alpha then m0
else m0 `min` alphabeta (Min (g1 :| gs)) (alpha, min beta m0)
alphabeta (End s) _ = s
But still, it is at least a little finicky and tedious to make sure that you haven’t mixed your alphas and betas.
As we will see in this post, we can streamline the implementation of alpha-beta pruning by factoring the short-circuiting logic out of the “minimax” logic.
Generalized minimax
Remark that minimax only uses min and max
(via minimum and maximum), rather than the comparison
functions of Ord (compare, (<=), etc.).
We can reduce the dependency footprint of minimax by
defining a new class with only the necessary operations,
the class of lattices:
class Lattice a where
-- Join, least upper bound, max
(\/) :: a -> a -> a
-- Meet, greatest lower bound, min
(/\) :: a -> a -> a
In mathematics, lattices are algebraic structures with two operations
(\/) (“join”) and (/\) (“meet”)
satisfying commutativity, associativity, as well as the absorption laws:
x \/ (x /\ y) = x
x /\ (x \/ y) = x
In this post, we will only be looking at lattices that arise
out of total orders,
so this class is rather just a way of saying that we only
depend on min and max.
Binary operations can be iterated to combine lists of arguments,
similarly to the maximum and minimum functions:
-- maximum
joins :: Lattice a => NonEmpty a -> a
joins = foldr1 (\/)
-- minimum
meets :: Lattice a => NonEmpty a -> a
meets = foldr1 (/\)
Minimax in lattices is defined by replacing Max and Min nodes with
the joins and meets operations.
-- Minimax in lattices
minimaxL :: Lattice score => Game score -> score
minimaxL (Max gs) = joins (minimaxL <$> gs)
minimaxL (Min gs) = meets (minimaxL <$> gs)
minimaxL (End x) = x
minimaxL generalizes minimax since every decidable total order is a lattice
(because you can use (<=) to define min/max).
Ideally this fact would be made explicit by making Lattice into
a superclass of Ord. Unfortunately in Haskell this would require us
to modify Ord or redefine it.
Another way to express the relation between Lattice and Ord is through a newtype.
newtype OrdLattice a = OrdLattice a
deriving newtype (Eq, Ord, Bounded)
unOrdLattice :: OrdLattice a -> a
unOrdLattice (OrdLattice x) = x
instance Ord a => Lattice (OrdLattice a) where
OrdLattice x \/ OrdLattice y = OrdLattice (max x y)
OrdLattice x /\ OrdLattice y = OrdLattice (min x y)
With that, we recover the starting minimax by specializing
minimaxL to OrdLattice s, and then unwrapping OrdLattice:
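A definition consistent with how it is used in the proof below (where it is called minimaxO):
minimaxO :: Ord score => Game score -> score
minimaxO = unOrdLattice . minimaxL . fmap OrdLattice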
Clamping functions
Focus on the type (score, score) -> score which appears in the signature of alphabeta.
More specifically, we are interested in a subset of those functions that
we shall call clamping functions.
Intuitively, a clamping function f is a delayed representation of a constant s:
the goal of f is to compute s, but it may also stop early with an approximation
if it’s not necessary to know the exact value of s.
The name “clamping function” is a reference to the clamp function:
clamp :: Ord score => score -> (score, score) -> score
clamp s (alpha, beta) = max alpha (min s beta)
We can think of the partially applied function clamp s as an encoding of the constant s,
which may or may not be output depending on the interval (alpha, beta).
More formally, a clamping function with value s is a function f :: (score, score) -> score
that satisfies the following, for all (alpha, beta) such that alpha < beta:
- if alpha < s < beta, then f (alpha, beta) = s;
- if s <= alpha, then f (alpha, beta) <= alpha;
- if beta <= s, then beta <= f (alpha, beta).
Two clamping functions with the same value s are considered equal.
In particular, as clamping functions, const s is equal to clamp s.
Making the notion of equality explicit is necessary to make sense of equations
(laws for lattices, homomorphisms, and isomorphisms).
We enshrine the definition of clamping functions in a newtype:
-- Type of clamping functions, satisfying the properties above.
newtype Clamping score = Clamping ((score, score) -> score)
unClamping :: Clamping score -> (score, score) -> score
unClamping (Clamping f) = f
For any value s, we can construct the constant clamping function:
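One possible definition, matching the remark that follows:
clamping :: s -> Clamping s
clamping s = Clamping (\_ -> s)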
Note that \_ -> s and clamp s are both clamping functions with value s,
so both are valid definitions of clamping s.
We prefer the constant function \_ -> s because it does less work.
Conversely, we can project clamping functions back into their values
by passing the whole interval (minBound, maxBound):
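A definition in line with that description:
declamp :: Bounded s => Clamping s -> s
declamp (Clamping f) = f (minBound, maxBound)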
Those two functions form an isomorphism between score and Clamping score,
meaning that they satisfy the following equations:
declamp . clamping = id
clamping . declamp = id
We now get to the secret sauce of this post: the maximum of two clamping functions (as well as the minimum). This operation can be defined in two ways. First is the naive definition, for reference:
-- "max" for clamping functions, naive variant
maxC :: Ord s => Clamping s -> Clamping s -> Clamping s
maxC (Clamping f) (Clamping g) = Clamping (\i -> max (f i) (g i))
Second is the lazy definition: if f (alpha, beta) is greater
than the given upper bound beta, then the max of f and g will
be even greater:
beta <= f (alpha, beta) <= max (f (alpha, beta)) (g (alpha, beta))
In that case, the maximum of f and g is allowed to output
f (alpha, beta) without looking at g.
Otherwise we must evaluate g, but we can tighten the interval by
updating the lower bound to max alpha (f (alpha, beta)).
-- "max" for clamping functions, lazy variant
lazyMaxC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMaxC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
let fi = f (alpha, beta) in
if beta <= fi then fi else fi `max` g (max alpha fi, beta))
Dually, we also have a lazyMinC.
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC (Clamping f) (Clamping g) = Clamping (\(alpha, beta) ->
let fi = f (alpha, beta) in
if fi <= alpha then fi else fi `min` g (alpha, min beta fi))
To avoid repeating ourselves,
we can also reuse lazyMaxC to implement lazyMinC.
Use Down to invert the ordering of an Ord:
lazyMinC :: Ord s => Clamping s -> Clamping s -> Clamping s
lazyMinC f g = undualize (lazyMaxC (dualize f) (dualize g))
where
dualizeWith from to (Clamping h) =
Clamping (\(beta, alpha) -> from (h (to alpha, to beta)))
dualize = dualizeWith Down getDown -- Clamping s -> Clamping (Down s)
undualize = dualizeWith getDown Down -- Clamping (Down s) -> Clamping s
These “naive” and “lazy” functions denote the same value
(maxC = lazyMaxC and minC = lazyMinC),
but lazyMaxC and lazyMinC may do less work,
either by ignoring their second argument or by applying it to a smaller interval than expected.
The point is that these “lazy” functions embody the short-circuiting logic of alpha-beta pruning exactly. All that’s left to do is to plug them into minimax.
The lattice of clamping functions
With the lazy min and max that we just defined, we get a lattice:
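The instance uses the lazy operations just defined:
instance Ord s => Lattice (Clamping s) where
  (\/) = lazyMaxC
  (/\) = lazyMinC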
Specialize minimax in the lattice of clamping functions:
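That specialization, named minimaxC as it is used below:
minimaxC :: Ord score => Game (Clamping score) -> Clamping score
minimaxC = minimaxL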
This doesn’t look like much, but we have actually implemented the
alpha-beta pruning algorithm.
With a tiny bit of plumbing, we can redefine the function alphabeta
from earlier:
alphabeta' :: Ord score => Game score -> (score, score) -> score
alphabeta' = unClamping . minimaxC . fmap clamping
Then we want to partially apply alphabeta' to the interval (minBound, maxBound).
This amounts to replacing unClamping with declamp in the body of alphabeta'.
Behold our final implementation of minimax by alpha-beta pruning:
minimaxAB' :: (Ord score, Bounded score) => Game score -> score
minimaxAB' = declamp . minimaxC . fmap clamping
To sum up, we implemented alpha-beta pruning as a simple combination of:
- minimax, generalized from orders to lattices (minimaxL);
- the lattice of clamping functions (Lattice (Clamping score)).
This alternative approach does not completely absolve you from effort:
you still have to juggle alphas and betas correctly to implement the lattice
(lazyMinC and lazyMaxC).
But unlike in the original alphabeta,
you don’t have to do all that juggling in the middle of a recursive function.
The logic of alpha-beta pruning is neatly decomposed into bite-sized pieces.
Correctness for free
Since we just reused the code of minimax, it’s also easier to prove that alpha-beta pruning yields the same result:
minimax = minimaxAB'
As we are about to see, this is a direct consequence of
the free theorem² for minimaxL:
any function of type forall s. Lattice s => Game s -> s,
such as minimaxL, commutes with any lattice homomorphism³ f,
in the following sense:
f . minimaxL = minimaxL . fmap f
We can picture that equation as a commutative diagram:
\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxL}}>> \small\texttt{s} \\ @V{\texttt{fmap f}}VV @VV{\texttt{f}}V \\ \small\texttt{Game t} @>{\texttt{minimaxL}}>> \small\texttt{t} \end{CD}\]
If f has an inverse f⁻¹, we can rewrite that to
minimaxL = f⁻¹ . minimaxL . fmap f
By replacing (f, f⁻¹) with the isomorphism (clamping, declamp) defined earlier,
we obtain exactly the equality between minimax and alpha-beta pruning:
minimaxL = declamp . minimaxL . fmap clamping
= minimaxAB'
As a commutative diagram:
\[\require{AMScd} \begin{CD} \small\texttt{Game s} @>{\texttt{minimaxAB’}\text{ (alpha-beta)}}>> \small\texttt{s} \\ @V{\texttt{fmap clamping}}VV @AA{\texttt{declamp}}A \\ \scriptsize\texttt{Game (Clamping s)} @>{\texttt{minimaxL}}>> \scriptsize\texttt{Clamping s} \end{CD}\]
QED.
(To be pedantic, the above proof
conflates minimaxL with minimax/minimaxO,
which relies on pretending that Lattice is a superclass of Ord.
Below is another proof that doesn’t take that shortcut,
by going through the OrdLattice newtype explicitly,
so this proof applies more directly to the Haskell definitions as written here.)
A somewhat more rigorous proof
We want to prove that the alpha-beta-pruning minimaxAB'
is equivalent to the naive minimax:
minimax = minimaxAB'
Recall the free theorem of minimaxL. For any lattice isomorphism (f, f⁻¹):
minimaxL = f⁻¹ . minimaxL . fmap f
Replace (f, f⁻¹) with the lattice isomorphism (clamping . unOrdLattice, OrdLattice . declamp)
between the lattices OrdLattice score and Clamping score.
minimaxL = OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice)
Now we can prove the equality between minimax and minimaxAB',
using the above equation as the middle step, followed by
canceling inverses:
minimax
= minimaxO
= unOrdLattice . minimaxL . fmap OrdLattice
= unOrdLattice . OrdLattice . declamp . minimaxL . fmap (clamping . unOrdLattice) . fmap OrdLattice
= declamp . minimaxL . fmap clamping
= minimaxAB'
The above is only a proof of functional correctness:
minimax and minimaxAB' compute the same result.
To verify that minimaxAB' does so more efficiently
is another problem for another day. For now, we can test it.
Strictness check
We test that our “fancy” implementation of alpha-beta (minimaxAB') has the same
strictness as the “classical” implementation (minimaxAB),
which we presume to be much lazier than minimax.
We use StrictCheck for property-testing of strictness behaviors in Haskell.
The following test checks that minimaxAB and minimaxAB' have the
same demand on random inputs.
We use the function observe1 from StrictCheck to observe the demand
of a function f: observe1 applies f to an instrumented copy
of the provided input g, it forces the output (f g of type Int)
using the provided forcing function (`seq` ()),
and finally returns the demand on the input tree g that was observed
by forcing the instrumented copy of g.
main :: IO ()
main = do
quickCheck $ \(g :: Game Int) ->
label (bucket (length g)) $
let demand f = snd (observe1 (`seq` ()) f g) in
demand minimaxAB === demand minimaxAB'
From the source repository of this blog, the following command compiles and runs this blog post:
cabal run alpha-beta
Instances and auxiliary definitions
-- Histogram of generated value sizes
bucket :: Int -> String
bucket n | n == 1 = "= 1"
| n < 10 = "< 10"
| n < 100 = "< 100"
| n < 1000 = "< 1000"
| otherwise = ">= 1000"
-- Instances
deriving stock instance GHC.Generic (Game a)
instance Generic (Game a)
instance HasDatatypeInfo (Game a)
instance Shaped a => Shaped (Game a)
instance Arbitrary a => Arbitrary (Game a) where
arbitrary = sized $ \n -> if n == 0 then End <$> arbitrary else
resize (n `div` 2) $ frequency
[(1, End <$> arbitrary), (2, Max <$> arbitrary), (2, Min <$> arbitrary)]
shrink (Max (g :| gs)) = g : gs ++ (Max <$> shrink (g :| gs))
shrink (Min (g :| gs)) = g : gs ++ (Min <$> shrink (g :| gs))
shrink (End s) = End <$> shrink s
instance Arbitrary a => Arbitrary (NonEmpty a) where
arbitrary = liftA2 (:|) arbitrary arbitrary
shrink (x :| xs) = [y :| ys | y : ys <- shrink (x : xs)]
Conclusion
I came up with this idea a while back on Stack Overflow, as an answer to Alpha-beta pruning with recursion schemes. My understanding of alpha-beta pruning changed overnight from a somewhat tricky algorithm to a completely trivial solution. Getting to reuse minimax is not only a satisfying achievement in refactoring, it enables a neat proof of correctness by parametricity (via free theorems).
The role of laziness should also be underscored. If you try to do the same thing in a call-by-value language, the implementation of “generalized minimax” must explicitly delay computations, obscuring the point:
Alpha-beta pruning is just minimax in a lattice of clamping functions.
For a clearer presentation, see the talk Alpha-Beta Pruning Explored, Extended and Verified (2024) by Tobias Nipkow.↩︎
Theorems for free! by Philip Wadler. Free theorems involving type constructor classes by Janis Voigtländer.↩︎
A lattice homomorphism f is a function that commutes with the lattice operations:
f (x /\ y) = f x /\ f y
f (x \/ y) = f x \/ f y↩︎
Twentyseven is a Rubik’s cube solver and one of my earliest projects in Haskell. The first commit dates from January 2014, and version 0.0.0 was uploaded on Hackage in March 2016.
I first heard of Haskell in a course on lambda calculus in 2013. A programming language with lazy evaluation sounded like a crazy idea, so I gave it a try. Since then, I have kept writing in Haskell as my favorite language. For me it is the ideal blend of programming and math. And a Rubik’s cube solver is a great excuse for doing group theory.
Twentyseven 1.0.0 is more of a commemorative release for myself, with the goal of making it compile with the current version of GHC (9.12). There was surprisingly little breakage:
Aside from that, the code is basically just as it was 9 years ago,
including design decisions that I would find questionable today.
For example, I use unsafePerformIO to read precomputed tables
into top-level constants, but the location of the files to read from
can be configured by command-line arguments, so I better make sure that
the tables are not forced before the location is set…
How Twentyseven works
The input of the program is a string enumerating the 54 facelets of a Rubik’s cube; each character represents one color.
DDDFUDLRB FUFDLLLRR UBLBFDFUD ULBFRULLB RRRLBBRUB UBFFDFDRU
The facelets follow the order pictured below. They are grouped by faces (up, left, front, right, back, down), and in each face they are listed in top-down, left-right order.
00 01 02
03 04 05
06 07 08
10 11 12 20 21 22 30 31 32 40 41 42
13 14 15 23 24 25 33 34 35 43 44 45
16 17 18 26 27 28 36 37 38 46 47 48
50 51 52
53 54 55
56 57 58
The output is a sequence of moves to solve that cube.
U L B' L R2 D R U2 F U2 L2 B2 U B2 D' B2 U' R2 U L2 R2 U
The implementation of Twentyseven is based on Herbert Kociemba’s notes about Cube Explorer, a program written in Pascal!
The search algorithm is iterative deepening A*, or IDA*. Like A*, IDA* finds the shortest path between two vertices in a graph. A conventional A* is not feasible because the state space of a Rubik’s cube is massive (43 252 003 274 489 856 000 states, literally billions of billions). Instead, we run a series of depth-first searches with a maximum allowed number of moves that increases for each search. As it is based on depth-first search, IDA* only needs memory for the current path, which is super cheap.
IDA* relies on an estimate of the number of moves remaining to reach the solved state. We obtain such an estimate by projecting the Rubik’s cube state into a simpler puzzle. For example, we can consider only the permutation of corners, ignoring their orientation. We can pre-compute a table mapping each corner permutation (there are 8! = 40320) to the minimum number of moves to put the corners back to their location. This is a lower bound on the number of moves to actually solve a Rubik’s cube. Different projections yield different lower bounds (for example, by looking at the permutation of edges instead, or their orientation), and we can combine lower bounds into their maximum, yielding a more precise lower bound, and thus a faster IDA*.
Putting all that together, we obtain an optimal solver for Rubik’s cubes. But even with these heuristics, Twentyseven can take hours to solve a random cube optimally. Kociemba’s Cube Explorer is apparently much faster (I’ve never tried it myself). My guess is that the difference is due to a better selection of projections, yielding better heuristics. But I haven’t gotten around to figuring out whether I’ve misinterpreted his notes or whether those improvements can only be found in the code.
A faster alternative is Kociemba’s two phase algorithm. It is suboptimal, but it solves Rubik’s cubes in a fraction of a second (1000 cubes per minute). The first phase puts cubies into a “common orientation” and “separates” the edges into two groups. In other words, we reach a state where the permutation of 12 edges can be decomposed into two disjoint permutations of 4 and 8 edges respectively. In the second phase, we restrict the possible moves: quarter- and half-turns on the top and bottom faces, half-turns only on the other faces. These restricted moves preserve the “common orientation” of edges and corners from phase 1, and the edges in the middle slice stay in their slice. Each phase thus performs an IDA* search in a much smaller space than the full Rubik’s cube state space (2 217 093 120 and 19 508 428 800 states respectively).
Compositionality means that for every node, its descendants—the other nodes reachable from it—are defined by composing the descendants of its children. Dynamism means that the children of a node are generated only when that node is visited; we will see that this requirement corresponds to asking for a monadic unfold.
A prior solution, using the Phases applicative functor,
is compositional but not dynamic in that sense. The essence of Phases
is a zipping operation in free applicative functors.
What if we did zipping in free monads instead?
This is a Literate Haskell post. The source code is on Gitlab. A reusable version of this code is now available on Hackage: the weave library.
Extensions and imports for this Literate Haskell file
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveTraversable #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE PackageImports #-}
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ViewPatterns #-}
{-# OPTIONS_GHC -Wno-x-partial -Wno-unused-matches -Wno-unused-top-binds -Wno-unused-imports #-}
import "deepseq" Control.DeepSeq (NFData)
import Data.Foldable (toList)
import Data.Function ((&))
import Data.Functor ((<&>))
import Data.Functor.Identity (Identity(..), runIdentity)
import GHC.Generics (Generic)
import "tasty" Test.Tasty (TestTree, localOption)
import "tasty-hunit" Test.Tasty.HUnit ((@?=), testCase)
import "tasty-bench" Test.Tasty.Bench (bgroup, bench, defaultMain, nf, bcompare)
-- import "tasty-bench" Test.Tasty.Bench (mutatorCpuTime)
import "tasty-expected-failure" Test.Tasty.ExpectedFailure (expectFail)
import "some" Data.Some.Newtype (Some(Some))
import "transformers" Control.Monad.Trans.State
import qualified "containers" Data.Set as Set
import "containers" Data.Set (Set)Background: breadth-first folds and traversals
Our running example will be the type of binary trees:
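A definition matching its use throughout this post (the deriving list is an assumption, chosen to cover the tests and benchmarks below):
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Eq, Show, Functor, Foldable, Traversable, Generic, NFData)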
A breadth-first walk explores the tree level by level; every level contains the
nodes at the same distance from the root. The list of levels of a tree can be defined
recursively—it is a fold. For a tree Node x l r, the first level contains
just the root node x, and the subsequent levels are obtained by appending the
levels of the subtrees l and r pairwise.
levels :: Tree a -> [[a]]
levels Leaf = []
levels (Node x l r) = [x] : zipLevels (levels l) (levels r)
zipLevels :: [[a]] -> [[a]] -> [[a]]
zipLevels [] yss = yss
zipLevels xss [] = xss
zipLevels (xs : xss) (ys : yss) = (xs ++ ys) : zipLevels xss yss
(We can’t just use zipWith because it throws away the end of a list when the
other list is empty.)
Finally, we concatenate the levels together to obtain the list of nodes in breadth-first order.
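Concretely (this function is referred to as toListBF later on):
toListBF :: Tree a -> [a]
toListBF = concat . levels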
Thanks to laziness, the list will indeed be produced by walking the tree in breadth-first order. So far so good.
The above function lets us fold a tree in breadth-first order. The next level of difficulty is to traverse a tree, producing a tree with the same shape as the original tree, only with modified labels.
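The signature we are after, mirroring traverse as noted next:
traverseBF :: Applicative m => (a -> m b) -> Tree a -> m (Tree b)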
This has the exact same type as traverse, which you might obtain with
deriving (Foldable, Traversable). The stock-derived Traversable—enabled
by the DeriveTraversable extension—is a depth-first traversal, but the laws
of traverse don’t specify the order in which nodes should be visited,
so you could make it a breadth-first traversal if you wanted.
To define a breadth-first traversal is a surprisingly non-trivial exercise, as pointed out by Chris Okasaki in Breadth-first numbering: lessons from a small exercise in algorithm design (ICFP 2000).
“Breadth-first numbering” is a special case of “breadth-first traversal”
where the arrow (a -> m b) is specialized to a counter.
Okasaki presents a “numbering” solution based on queues and another solution
based on levels.
Both are easily adaptable to the more general “traversal” problem as we will
soon see.
There is a wonderful Discourse thread from 2024 on the topic of
breadth-first traversals.
The first post gives an elegant breadth-first numbering algorithm
which also appears in the appendix of Okasaki’s paper,
but sadly it does not generalize from “numbering” to
“traversal” beyond the special case m = State s.
Last but not least, another level-based solution to the breadth-first traversal
problem can be found in the
tree-traversals library by Noah Easterly.
It is built around an applicative transformer named Phases,
which is a list of actions—imagine the type “[m _]”—where each
element m _ represents one level of the tree.
The Phases applicative enables a compositional definition of a
breadth-first traversal, similarly to the levels function above:
the set of nodes reachable from the root is defined by combining the sets of
nodes reachable from its children. This concern of compositionality
is one of the main motivations behind this post.
Non-standard terminology
The broad family of algorithms being discussed is typically called
“breadth-first search” (BFS) or “breadth-first traversal”,
but in general these algorithms are not “searching” for anything,
and in Haskell, “traversal” is reserved for “things like traverse”.
Instead, this post will use “walks” as a term encompassing folds, traversals,
unfolds, or any concept that can be qualified with “breadth-first”.
Problem statement: Breadth-first unfolds
Both the fold toListBF and the traversal traverseBF had in common that they
receive a tree as an input. This explicit tree makes the notion of levels
“static”. With unfolds, we will have to deal with levels that exist only
“dynamically” as the result of unfolding the tree progressively.
To introduce the unfolding of a tree, it is convenient to introduce its “base functor”. We modify the tree type by replacing the recursive tree fields with an extra type parameter:
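A base functor consistent with its use below (the deriving list is an assumption; Functor and Foldable are what the later code relies on):
data TreeF a t = LeafF | NodeF a t t
  deriving (Functor, Foldable)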
An unfold generates a tree from a seed and a function which expands the seed into a leaf or a node containing more seeds. A pure unfold—or anamorphism—can be defined readily:
unfold :: (s -> TreeF a s) -> s -> Tree a
unfold f s = case f s of
LeafF -> Leaf
NodeF a l r -> Node a (unfold f l) (unfold f r)
The order in which nodes are evaluated depends on
how the resulting tree is consumed. Hence unfold
is neither inherently “depth-first” nor “breadth-first”.
The situation changes if we make the unfold monadic.
An implementation of unfoldM must decide upon an ordering between actions.
To see why adding an M to unfold imposes an ordering,
contemplate the fact that these expressions have the same meaning:
Node a (unfold f l) (unfold f r)
= ( let tl = unfold f l in
let tr = unfold f r in
Node a tl tr )
= ( let tr = unfold f r in
let tl = unfold f l in
Node a tl tr )
whereas these monadic expressions do not have the same meaning in general:
( unfoldM f l >>= \tl ->
unfoldM f r >>= \tr ->
pure (Node a tl tr) )
/=
( unfoldM f r >>= \tr ->
unfoldM f l >>= \tl ->
pure (Node a tl tr) )
Without further requirements, there is an “obvious” definition of unfoldM,
which is a depth-first unfold:
unfoldM_DF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_DF f s = f s >>= \case
LeafF -> pure Leaf
NodeF a l r -> liftA2 (Node a) (unfoldM_DF f l) (unfoldM_DF f r)
We unfold the left subtree l fully before unfolding the right one r.
The problem is to define a breadth-first unfoldM.
If you want to think about this problem on your own, you can stop reading here. The rest of this post presents solutions.
Queue-based unfold
The two breadth-first numbering algorithms in Okasaki’s paper can
actually be generalized to breadth-first unfolds.
Here is the first one that uses queues (using the function (<+) for “push” and
pattern-matching on (:>) for “pop”):
unfoldM_BF_Q :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_Q f b0 = go (b0 <+ Empty) <&> \case
_ :> t -> t
_ -> error "impossible"
where
go Empty = pure Empty
go (q :> b) = f b >>= \case
LeafF -> go q <&> \p -> Leaf <+ p
NodeF a b1 b2 -> go (b2 <+ b1 <+ q) <&> \case
p :> t1 :> t2 -> Node a t1 t2 <+ p
_ -> error "impossible"(The operator (<&>) is flip (<$>). I use it to avoid parentheses around
lambdas.)
Queue implementation for unfoldM_BF_Q
data Q a = Q [a] [a]
pattern Empty :: Q a
pattern Empty = Q [] []
infixr 1 <+
(<+) :: a -> Q a -> Q a
x <+ Q xs ys = Q (x : xs) ys
pop :: Q a -> Maybe (Q a, a)
pop (Q xs (y : ys)) = Just (Q xs ys, y)
pop (Q xs []) = case reverse xs of
[] -> Nothing
y : ys -> Just (Q [] ys, y)
infixl 1 :>
pattern (:>) :: Q a -> a -> Q a
pattern q :> y <- (pop -> Just (q, y))
{-# COMPLETE Empty, (:>) #-}
As it happens, containers uses that queue-based technique to implement
breadth-first unfold for rose trees (Data.Tree.unfoldTreeM_BF).
There is a pending question of whether we can improve upon it.
This post might provide a theoretical alternative,
but it seems too slow to be worth serious consideration
(see the benchmark section).
If you’re frowning upon the use of error—as you should be—you can replace
error with dummy values here (Empty, Leaf), but
(1) that won’t be possible with tree structures that must be non-empty
(e.g., if Leaf contained a value) and (2) this is dead code, which
is harmless but no more elegant than making it obvious with error.
The correctness of this solution is also not quite obvious.
There are subtle ways to get this implementation wrong:
should the recursive call be b2 <+ b1 <+ q or b1 <+ b2 <+ q?
Should the pattern be p :> t1 :> t2 or p :> t2 :> t1?
For another version of this challenge, try implementing the unfold for another
tree type, such as finger trees or rose trees, without getting lost in the
order of pushes and pops (by the way, this is Data.Tree.unfoldTreeM_BF in
containers). The invariant is not complex but there is room for mistakes.
I believe that the compositional approach that will be presented later is more
robust on that front, although it is admittedly a subjective quality for which
it is difficult to make a strong case.
Some uses of unfolds
Traversals from unfolds
One sense in which unfoldM is a more difficult problem than traverse is
that we can use unfoldM to implement traverse.
We do have to make light of the technicality that there is a Monad constraint
instead of Applicative, which makes unfoldM not suited to implement the
Traversable class.
A depth-first unfold gives a depth-first traversal:
traverse_DF :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_DF = unfoldM_DF . traverseRoot
-- auxiliary function
traverseRoot :: Applicative m => (a -> m b) -> Tree a -> m (TreeF b (Tree a))
traverseRoot _ Leaf = pure LeafF
traverseRoot f (Node a l r) = f a <&> \b -> NodeF b l r
A breadth-first unfold gives a breadth-first traversal:
traverse_BF_Q :: Monad m => (a -> m b) -> Tree a -> m (Tree b)
traverse_BF_Q = unfoldM_BF_Q . traverseRoot
Unfolds in graphs
We can use a tree unfold to explore a graph. This usage distinguishes unfolds from folds and traversals, which only let you explore trees.
Given a type of vertices V, a directed graph is represented by a function
V -> F V, where F is a functor which describes the arity of each node.
The obvious choice for F is lists, but we will stick to TreeF here
so we can just reuse this post’s unfoldM implementations.
The TreeF functor restricts us to graphs where each node has zero or two
outgoing edges; it is a weird restriction, but we will make do for the sake of
example.
+-------+
v |
+->1--->2--->3 |
| | | ^ |
| v v | |
| 4--->5--->6--+
| | | ^
| +----|----+
| |
+-------+
The graph drawn above turns into the following function, where every vertex
is mapped either to NodeF with the same vertex as the first argument followed
by its two adjacent vertices, or to LeafF if it has no outgoing edges or does
not belong to the graph.
graph :: Int -> TreeF Int Int
graph 1 = NodeF 1 2 4
graph 2 = NodeF 2 3 5
graph 3 = LeafF
graph 4 = NodeF 4 5 6
graph 5 = NodeF 5 1 6
graph 6 = NodeF 6 2 3
graph _ = LeafF
If we simply feed that function to unfold, we will get the infinite tree
of all possible paths from a chosen starting vertex.
To obtain a finite tree, we want to keep track of vertices that we have
already visited, using a stateful memory. The following function wraps graph,
returning LeafF also if a vertex has already been visited.
visitGraph :: Int -> State (Set Int) (TreeF Int Int)
visitGraph vertex = do
visited <- get
if vertex `elem` visited then pure LeafF
else do
put (Set.insert vertex visited)
pure (graph vertex)
Applying unfoldM_BF to that function produces a “breadth-first tree”
of the graph, an encoding of the trajectory of a breadth-first walk through the
graph. “Breadth-first trees” are a concept from graph theory with well-studied
properties.
-- Visit `graph` in breadth-first order
bfGraph_Q :: Int -> Tree Int
bfGraph_Q = (`evalState` Set.empty) . unfoldM_BF_Q visitGraph
testGraphQ :: TestTree
testGraphQ = testCase "Q-graph" $
bfGraph_Q 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Compile and run
This post is a compilable Literate Haskell file. You can run all of the tests
and benchmarks in here. The source repository provides the necessary
configuration to build it with cabal.
$ cabal build breadth-first-unfolds
Test cases can then be selected with the -p option and a pattern
(see the tasty documentation for details).
Run all tests and benchmarks by passing no option.
$ cabal exec breadth-first-unfolds -- -p "/Q-graph/||/S-graph/"
All
Q-graph: OK
S-graph: OK
“Global” level-based unfold
The other solution from Okasaki’s paper can also be adapted into a monadic unfold.
The starting point is to unfold a list of seeds [s] instead of a single seed:
we can traverse the list with the expansion function s -> m (TreeF a s) to
obtain another list of seeds, the next level of the breadth-first unfold,
and keep going.
Iterating this process naively yields a variant of monadic unfold without a
result. This no-result variant can be generalized from TreeF to
any foldable structure:
-- Inner loop: multi-seed unfold
unfoldsM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> [s] -> m ()
unfoldsM_BF_G_ f [] = pure ()
-- Read from right to left: traverse, flatten, recurse.
unfoldsM_BF_G_ f xs = unfoldsM_BF_G_ f . concatMap toList =<< traverse f xs
-- Top-level function: single-seed unfold
unfoldM_BF_G_ :: (Monad m, Foldable f) => (s -> m (f s)) -> s -> m ()
unfoldM_BF_G_ f = unfoldsM_BF_G_ f . (: [])
Modifying this solution to create the output tree requires a little more thought.
We must keep hold of the intermediate list of ts :: [TreeF a s] to
reconstruct trees after the recursive call returns.
unfoldsM_BF_G :: Monad m => (s -> m (TreeF a s)) -> [s] -> m [Tree a]
unfoldsM_BF_G f [] = pure []
-- traverse, flatten, recurse, reconstruct
unfoldsM_BF_G f xs = traverse f xs >>= \ts ->
reconstruct ts <$> unfoldsM_BF_G f (concatMap toList ts)
The reconstruction function picks a root in the first list and completes it with subtrees from the second list:
reconstruct :: [TreeF a s] -> [Tree a] -> [Tree a]
reconstruct (LeafF : ts) us = Leaf : reconstruct ts us
reconstruct (NodeF a _ _ : ts) (l : r : us) = Node a l r : reconstruct ts us
reconstruct _ _ = error "impossible"You could modify the final branch to produce [], but error makes it
explicit that this branch should never be reached by the unfold where it is
used.
The top-level unfold function wraps the seed in a singleton input list and extracts the root from a singleton output list.
unfoldM_BF_G :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_G f = fmap head . unfoldsM_BF_G f . (: [])
Unit test testGraphG
This solution is less brittle than the queue-based solution because
we always traverse lists left-to-right.
To avoid the uses of error in reconstruct,
you can probably create a specialized data structure in place of [TreeF a s],
but that is finicky in its own way.
In search of compositionality
Both of the solutions above (the queue-based and the “monolithic” level-based unfolds)
stem from a global view of breadth-first walks: we are iterating on a list or a
queue which holds all the seeds from one or two levels at a time.
That structure represents a “front line” between visited and unvisited
vertices, and every iteration advances the front line a little: with a queue we
advance it one vertex at a time, with a list we advance the whole front line
in an inner loop—one call to traverse—before recursing.
The opposite local view of breadth-first order is exemplified by the earlier
levels function: it only produces a list of lists of the vertices
reachable from the current root. It does so recursively, by composing
together the vertices reachable from its children. Our goal here is to find a
similarly local, compositional implementation of breadth-first unfolds.
Rather than defining unfoldM directly, which sequences the computations on
all levels into a single computation, we will introduce an intermediate
function weave that keeps levels separate—just as toListBF is defined
using levels.
The result of weave will be in an as yet unknown applicative functor F m
depending on m.
And because levels are kept separate, weave only needs
a constraint Applicative m to compose computations on the same level.
The goal is to implement this signature, where the result type F is also an
unknown:
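That is, the same signature as the definition given below:
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)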
The name weave comes from visualizing a breadth-first walk
as a path zigzagging across a tree like this:
which is reminiscent of weaving as in the making of textile:
With only what we know so far, a bit of type-directed programming leads to the
following incomplete definition. We have constructed something of type
m (F m (Tree a)), while we expect F m (Tree a):
weave :: Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = _ (step <$> f s) where
step :: TreeF a s -> F m (Tree a)
step LeafF = pure Leaf
step (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)
To fill the hole _, we postulate the following primitive, weft,
as part of the unknown definition of F:
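Its assumed signature, stated in terms of the unknown F:
weft :: m (F m a) -> F m a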
Intuitively, F m represents “multi-level computations”.
The weft function constructs a multi-level (F m)-computation from
one level of m-computation which returns the subsequent levels
as an (F m)-computation.
We fill the hole with weft, completing the definition of weave:
weave :: forall m s a. Applicative m => (s -> m (TreeF a s)) -> s -> F m (Tree a)
weave f s = weft (weaveF <$> f s) where
weaveF :: TreeF a s -> F m (Tree a)
weaveF LeafF = pure Leaf
weaveF (NodeF a l r) = liftA2 (Node a) (weave f l) (weave f r)
The function weave defines a multi-level computation which represents
a breadth-first walk from a seed s:
- the first level of the walk is f s, expanding the initial seed;
- the auxiliary function weaveF constructs the remaining levels from the initial seed’s expansion:
  - if the seed expands to LeafF, there are no more seeds, and we terminate with an empty computation (pure);
  - if the seed expands to NodeF, we obtain two sub-seeds l and r, they generate their own weaves recursively (weave f l and weave f r), and we compose them (liftA2).
One way to think about weft is as a generalization of the following primitives:
we can “embed” m-computations into F m,
and we can “delay” multi-level (F m)-computations, shifting the
m-computation on each level to the next level.
embed :: Applicative m => m a -> F m a
embed u = weft (pure <$> u)
delay :: Applicative m => F m a -> F m a
delay u = weft (pure u)
The key law relating these two operations is that embedded computations and delayed computations commute with each other:
embed u *> delay v = delay v <* embed u
The embed and delay operations are provided by the Phases applicative
functor that I mentioned earlier, which enables breadth-first traversals,
but not breadth-first unfolds. Thus, weft is a strictly more expressive
primitive than embed and delay.
Eventually, we will run a multi-level computation as a single m-computation
so that we can use weave to define unfoldM. The runner function will be
called mesh:
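Its signature, matching the concrete version defined later:
mesh :: Monad m => F m a -> m a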
It is characterized by this law which says that mesh executes the first
level of the computation u :: m (F m a), then executes the remaining levels
recursively:
mesh (weft u) = u >>= mesh
Putting everything together, weave and mesh combine into a breadth-first unfold:
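In code, something along these lines:
unfoldM_BF :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF f s = mesh (weave f s)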
It remains to find an applicative functor F equipped with weft and mesh.
The weave applicative
A basic approach to design a type is to make some of the operations it
should support into constructors. The weave applicative WeaveS has
constructors for pure and weft:
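That is:
data WeaveS m a
  = EndS a
  | WeftS (m (WeaveS m a))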
(The suffix “S” stands for Spoilers. Read on!)
We instantiate the unknown functor F with WeaveS.
Astute readers will have recognized WeaveS as the free monad.
Just as Phases has the same type definition as the free applicative functor but
a different Applicative instance, we will give WeaveS an Applicative
instance that does not coincide with the Applicative and Monad instances of
the free monad.
Starting with the easy functions,
weft is WeftS, and the equation for mesh above is basically its definition.
We just need to add an equation for EndS.
weft :: m (WeaveS m a) -> WeaveS m a
weft = WeftS
mesh :: Monad m => WeaveS m a -> m a
mesh (EndS a) = pure a
mesh (WeftS u) = u >>= mesh
Recall that WeaveS represents multi-level computations.
Computations are composed level-wise with the following liftS2.
The interesting case is the one where both arguments are WeftS: we compose
the first level with liftA2, and the subsequent ones with liftS2
recursively.
liftS2 :: Applicative m => (a -> b -> c) -> WeaveS m a -> WeaveS m b -> WeaveS m c
liftS2 f (EndS a) wb = f a <$> wb
liftS2 f wa (EndS b) = flip f b <$> wa
liftS2 f (WeftS wa) (WeftS wb) = WeftS ((liftA2 . liftS2) f wa wb)
liftS2 will be the liftA2 in WeaveS’s Applicative instance.
The Functor and Applicative instances show that WeaveS is an
applicative transformer: for every applicative functor m,
WeaveS m is also an applicative functor.
instance Functor m => Functor (WeaveS m) where
fmap f (EndS a) = EndS (f a)
fmap f (WeftS wa) = WeftS ((fmap . fmap) f wa)
instance Applicative m => Applicative (WeaveS m) where
pure = EndS
liftA2 = liftS2
That completes the definition of unfoldM_BF: a level-based, compositional
breadth-first unfold.
As a unit test, we copy the code for visiting a graph from earlier:
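A sketch of bfGraphS, assuming it mirrors bfGraph_Q but uses the weave-based unfold defined above:
bfGraphS :: Int -> Tree Int
bfGraphS s = evalState (mesh (weave visitGraph s)) Set.empty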
testGraphS :: TestTree
testGraphS = testCase "S-graph" $
bfGraphS 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Code golf
There is a variant of weave that I prefer:
weaveS :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveS m (Tree a))
weaveS f s = f s <&> \case
LeafF -> pure Leaf
NodeF a l r -> liftA2 (Node a) (weft (weaveS f l)) (weft (weaveS f r))
The outer weft constructor was moved into the recursive calls.
The result type has an extra m, which makes it more apparent that
we always start with a call to f. It’s the same vibe as replacing the type
[a] with NonEmpty a when we know that a list will always have at least one
element; weaveS always produces at least one level of computation.
We also replace (<$>) with its flipped version (<&>) for aesthetic reasons:
we can apply it to a lambda without parentheses, and that change makes the
logic flow naturally from left to right: we first expand the seed s using
f, and continue depending on whether the expansion produced LeafF or NodeF.
To define unfoldM, instead of applying mesh directly, we chain it with
(>>=).
unfoldM_BF_S :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_S f s = weaveS f s >>= mesh
A wrinkle in time
That solution is Obviously Correct™, but it has a terrible flaw: it does not run in linear time!
We can demonstrate this by generating a “thin” tree whose height
is equal to its size.
The height h is the seed of the unfolding, and we generate a NodeF as long
as it is non-zero, asking for a decreased height h - 1 on the right,
and a zero height on the left.
thinTreeS :: Int -> Tree ()
thinTreeS = runIdentity . unfoldM_BF_S f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
Compare the running times of evaluating thinTreeS at height 100
(the baseline)
and at height 1000 (10x the baseline).
benchS :: TestTree
benchS = bgroup "S-thin"
[ bench "1x" (nf thinTreeS 100)
, bench "10x" (nf thinTreeS 1000) & bcompare "S-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 105x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "S-thin"
All
S-thin
1x: OK
27.6 μs ± 2.6 μs, 267 KB allocated, 317 B copied, 6.0 MB peak memory
10x: OK
2.90 ms ± 181 μs, 23 MB allocated, 178 KB copied, 7.0 MB peak memory, 105.35x
Multiplying the height by 10x makes the function run 100x slower. Dramatically quadratic.
Complexity analysis
We can compare this implementation with levels from earlier, which is linear-time.
In particular, looking at zipLevels with liftS2—which play similar
roles—there is a crucial difference when one of the arguments is empty
([] or EndS):
zipLevels simply returns the other argument, whereas liftS2 calls (<$>),
continuing the recursion down the other argument.
So zipLevels stops working after reaching the end of either argument, whereas
liftS2 walks to the end of both arguments. There is at least one
call to liftS2 on every level which will walk to the bottom of the tree,
so we get a quadratic lower bound Ω(height²).
Out of sight, out of mind
The problematic combinators are fmap and liftS2, which weaveS uses to
construct the unfolded tree. If we don’t care about that tree—wanting only
the effect of a monadic unfold—then we can get rid of the complexity
associated with those combinators.
With no result to return, we remove the a type parameter from the definition
of WeaveS, yielding the oblivious (“O”) variant:
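That is:
data WeaveO m
  = EndO
  | WeftO (m (WeaveO m))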
We rewrite mesh into meshO, reducing a WeaveO m computation
into m () instead of m a.
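A definition in line with that description:
meshO :: Monad m => WeaveO m -> m ()
meshO EndO = pure ()
meshO (WeftO u) = u >>= meshO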
The Applicative instance for WeaveS becomes a Monoid instance for WeaveO.
liftA2 is replaced with (<>), zipping two computations level-wise.
instance Applicative m => Semigroup (WeaveO m) where
EndO <> v = v
u <> EndO = u
WeftO u <> WeftO v = WeftO (liftA2 (<>) u v)
instance Applicative m => Monoid (WeaveO m) where
mempty = EndO
mappend = (<>)
To implement a breadth-first walk, we modify weaveS above by replacing
liftA2 (Node a) with (<>). Note that the type parameter a is no longer in
the result. It was only used in the tree that we decided to forget.
weaveO :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveO m)
weaveO f s = f s <&> \case
LeafF -> mempty
NodeF _ l r -> WeftO (weaveO f l) <> WeftO (weaveO f r)
Running weaveO with meshO yields an oblivious monadic unfold:
it produces m () instead of m (Tree a).
(You may remember seeing another implementation of that same signature
just earlier, unfoldM_BF_G_.)
unfoldM_BF_O_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_O_ f s = weaveO f s >>= meshO
Previously, we benchmarked the function thinTreeS that outputs a tree by
forcing the tree. With an oblivious unfold, there is no tree to force.
Instead we will count the number of generated NodeF constructors:
thinTreeO :: Int -> Int
thinTreeO = (`execState` 0) . unfoldM_BF_O_ (state . f)
where
f 0 counter = (LeafF, counter)
f h counter = (NodeF () 0 (h - 1), counter + 1) -- increment the counter for every NodeF
We adapt the benchmark from before to measure the complexity of unfolding thin trees. We have to increase the baseline height from 100 to 500 because this benchmark runs so much faster than the previous ones.
benchO :: TestTree
benchO = bgroup "O-thin"
[ bench "1x" (nf thinTreeO 500)
, bench "10x" (nf thinTreeO 5000) & bcompare "O-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 9.8x |
Raw output
$ cabal exec breadth-first-unfolds -- -p O-thin
All
O-thin
1x: OK
148 μs ± 8.3 μs, 543 KB allocated, 773 B copied, 6.0 MB peak memory
10x: OK
1.45 ms ± 113 μs, 5.4 MB allocated, 82 KB copied, 7.0 MB peak memory, 9.78x
The growth is linear, as desired: the “10x” bench is 10x slower than the baseline “1x” bench.
Laziness for the win
The oblivious unfold avoided quadratic explosion by simplifying the problem.
Now let’s solve the original problem again,
so we can’t just get rid of fmap and liftA2.
As mentioned previously, the root cause was that (1) liftA2 calls fmap when
one of the constructors is EndS, and (2) fmap traverses the other argument.
The next solution will be to make fmap take constant time,
by storing the “mapped function” in the constructor.
Behold the “L” variant of WeaveS, which is a GADT:
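Its shape, as implied by the instances below:
data WeaveL m a where
  EndL :: a -> WeaveL m a
  WeftL :: m (WeaveL m b) -> (b -> a) -> WeaveL m a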
For comparison, here is the previous “S” variant with GADT syntax:
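That is:
data WeaveS m a where
  EndS :: a -> WeaveS m a
  WeftS :: m (WeaveS m a) -> WeaveS m a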
This trick is also known as the “co-Yoneda construction”.
The definition of fmap is no longer recursive.
It doesn’t even need m to be a functor anymore!
instance Functor (WeaveL m) where
fmap f (EndL a) = EndL (f a)
fmap f (WeftL wa g) = WeftL wa (f . g)
The Applicative instance is… a good exercise for the reader.
The details are not immediately important—we only care about improving fmap
for now—we will come back to have a look at the Applicative instance soon.
The runner function meshL is a simple bit of type Tetris.
meshL :: Monad m => WeaveL m a -> m a
meshL (EndL a) = pure a
meshL (WeftL wa f) = f <$> (wa >>= meshL)By partially applying WeftL to id as its second argument,
we obtain a counterpart to the unary WeftS constructor:
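Namely:
weftL :: m (WeaveL m a) -> WeaveL m a
weftL wa = WeftL wa id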
With those primitives redefined, the “weave” and “unfold” are identical. Below, we only renamed the “S” suffixes to “L”:
weaveL :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveL m (Tree a))
weaveL f s = f s <&> \case
LeafF -> pure Leaf
NodeF a s1 s2 -> liftA2 (Node a) (weftL (weaveL f s1)) (weftL (weaveL f s2))
unfoldM_BF_L :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_L f s = weaveL f s >>= meshL
The benchmarks show that 10x the height takes 10x the time. Linear growth again.
Benchmark code and output: thinTreeL and benchL
Copy of the benchS benchmark.
thinTreeL :: Int -> Tree ()
thinTreeL = runIdentity . unfoldM_BF_L f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
benchL :: TestTree
benchL = bgroup "L-thin"
[ bench "1x" (nf thinTreeL 100)
, bench "10x" (nf thinTreeL 1000) & bcompare "L-thin.1x"
]
Benchmark output (relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 9.93x |
Raw output:
$ cabal exec breadth-first-unfolds -- -p "L-thin"
All
L-thin
1x: OK
14.1 μs ± 782 ns, 59 KB allocated, 5 B copied, 6.0 MB peak memory
10x: OK
140 μs ± 13 μs, 586 KB allocated, 51 B copied, 6.0 MB peak memory, 9.93x
Lazy in more ways than one
As hinted by the “L” and “S” suffixes,
WeaveL is a “lazy” variant of WeaveS: fmap for WeaveL “postpones”
work by accumulating functions in the WeftL constructor.
That work is “forced” by meshL, which is where the fmap ((<$>)) of the
underlying monad m is called, performing the work accumulated
by possibly many calls to WeaveL’s fmap.
One subtlety is that there are multiple “lazinesses” at play.
The main benefit of using WeaveL is really to delay computation,
that is a kind of laziness, but WeaveL doesn’t need to be
implemented in a lazy language.
We can rewrite all of the code we’ve seen so far in a strict language
with minor changes, and we will still observe the quadratic vs linear behavior
of WeaveS vs WeaveL on thin trees.
The “manufactured laziness” of WeaveL is a concept independent of the
“ambient laziness” in Haskell.
Nevertheless, we can still find an interesting role for that “ambient laziness”
in this story. Indeed, the function weaveL also happens to be lazier than
weaveS in the usual sense.
A concrete test case is worth a thousand words. Consider the following
tree generator which keeps unfolding left subtrees while making
every right subtree undefined:
partialTreeF :: Bool -> TreeF () Bool
partialTreeF True = NodeF () True False
partialTreeF False = undefinedIf we used the pure unfold, we would get the same tree as this
recursive definition:
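Namely (as summarized again further down):
partialTree :: Tree ()
partialTree = Node () partialTree undefined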
What happens if we use one of the monadic unfolds? For example
unfoldM_BF_S:
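Presumably, by analogy with partialTreeL below:
partialTreeS :: Tree ()
partialTreeS = runIdentity (unfoldM_BF_S (Identity . partialTreeF) True)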
Try to force the first Node constructor.
whnfTreeS :: TestTree
whnfTreeS = expectFail $ testCase "S-whnf" $ do
case partialTreeS of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable" -- definitely not a LeafAs it turns out, this test using the “S” variant fails. (That’s
why the test is marked with expectFail.)
Forcing partialTreeS evaluates the undefined in partialTreeF.
Therefore partialTreeS is not equivalent to partialTree.
$ cabal exec breadth-first-unfolds -- -p "S-whnf"
All
S-whnf: FAIL (expected)
Exception: Prelude.undefined
CallStack ...
In contrast, the “L” variant makes that same test succeed.
partialTreeL :: Tree ()
partialTreeL = runIdentity (unfoldM_BF_L (Identity . partialTreeF) True)
whnfTreeL :: TestTree
whnfTreeL = testCase "L-whnf" $ do
case partialTreeL of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-whnf"
All
L-whnf: OK
This difference can only be seen with “lazy monads”, where (>>=) is
lazy in its first argument.
(If this definition sounds not quite right, that’s probably because of seq.
It makes a precise definition of “lazy monad” more complicated.)
Examples of lazy monads from the transformers library
are Identity, Reader, lazy State, lazy Writer, and Accum.
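As a minimal illustration (lazyBindDemo is a throwaway name, not code from the rest of this post), Identity’s (>>=) never forces the computation on its left, so the following evaluates to () instead of throwing:
lazyBindDemo :: ()
lazyBindDemo = runIdentity ((undefined :: Identity ()) >>= \_ -> pure ())
-- The undefined computation is discarded without ever being forced.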
The secret sauce is the definition of liftA2 for WeaveL:
instance Applicative m => Applicative (WeaveL m) where
pure = EndL
liftA2 f (EndL a) wb = f a <$> wb
liftA2 f wa (EndL b) = flip f b <$> wa
liftA2 f (WeftL wa g) (WeftL wb h)
= WeftL ((liftA2 . liftA2) (,) wa wb) (\ ~(a, b) -> f (g a) (h b))
In the third clause of liftA2, we put the function f in a lambda with a
lazy pattern (~(a, b)) directly under the topmost constructor WeftL.
Thus, we can access the result of f from the second field of WeftL
without looking at the first field. In liftS2 earlier, f was
passed as an argument to (liftA2 . liftS2), which forces us to run the
computation before we can get hold of the result of f.
Maximizing laziness
The “L” variant of unfoldM is lazier than the “S” variant,
but there is still a gap between partialTreeL and the pure partialTree:
if we force not only the root, but also the left subtree of partialTreeL,
then we run into undefined again.
forceLeftTreeL :: TestTree
forceLeftTreeL = expectFail $ testCase "L-left" $ do
case partialTreeL of
Node _ (Node _ _ _) _ -> pure () -- Succeed
_ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "L-left"
All
L-left: FAIL (expected)
Exception: Prelude.undefined
Although the unfold using WeaveL is lazier than using WeaveS,
it is not yet as lazy as it could be.
The reason is that, strictly speaking, WeaveL’s liftA2 is a strict function.
The expansion function partialTreeF produces a level with an undefined
sub-computation, which crashes the whole level.
Each level in a computation will be either completely defined or undefined.
To recap, we’ve been looking at the following trees:
partialTreeS = undefined
partialTreeL = Node () undefined undefined
partialTree = Node () partialTree undefined
It is natural to ask: can we define a breadth-first unfold that, when applied
to partialTreeF, will yield the same tree as partialTree?
More generally, the new problem is to define a breadth-first unfoldM
whose specialization with the Identity functor is equivalent to
the pure unfold even on partially-defined values. That is, it satisfies
the following equation:
unfold f = runIdentity . unfoldM (Identity . f)
Laziness without end
The strictness of liftA2 is caused by WeaveL having two constructors.
Let’s get rid of EndL.
Having only one constructor lets us use lazy patterns:
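Concretely, the attempted definition would look something like this (a sketch: the fmap equation is the one quoted in the error message below, and the data declaration is a reconstruction):
data WeaveE m a where
  WeftE :: m (WeaveE m b) -> (b -> a) -> WeaveE m a
instance Functor (WeaveE m) where
  fmap f ~(WeftE wa g) = WeftE wa (f . g)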
Wait a second. I spoke too fast, GHC gives us an error:
error: [GHC-87005]
• An existential or GADT data constructor cannot be used
inside a lazy (~) pattern
• In the pattern: WeftE wa g
In the pattern: ~(WeftE wa g)
In an equation for ‘fmap’: fmap f ~(WeftE wa g) = WeftE wa (f . g)
|
641 | > fmap f ~(WeftE wa g) = WeftE wa (f . g)
| ^^^^^^^^^^
The feature we need is “first-class existentials”, for which there is an open GHC proposal.
That needn’t stop us: a simple version of first-class existentials
is available in the package some,
as the module Data.Some.Newtype (internally using unsafeCoerce).
That will be sufficient for our purposes.
All we need is an abstract type Some and a pattern synonym:
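The relevant part of that module’s interface boils down to this shape (signatures only, as a sketch):
data Some f
pattern Some :: f a -> Some f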
And we’re back on track. Here comes the actual “E” (endless) variant:
newtype WeaveE m a = MkWeaveE (Some (WeavingE m a))
data WeavingE m a b where
WeftE :: m (WeaveE m b) -> (b -> a) -> WeavingE m a b
I spare you the details.
Functor, Applicative, weftE, meshE
instance Functor (WeaveE m) where
fmap f (MkWeaveE (Some ~(WeftE u g))) = MkWeaveE (Some (WeftE u (f . g)))
instance Applicative m => Applicative (WeaveE m) where
pure x = MkWeaveE (Some (WeftE (pure (pure ())) (\_ -> x)))
liftA2 f (MkWeaveE (Some ~(WeftE u g))) (MkWeaveE (Some ~(WeftE v h)))
= MkWeaveE (Some (WeftE ((liftA2 . liftA2) (,) u v) (\ ~(x, y) -> f (g x) (h y))))
weftE :: m (WeaveE m a) -> WeaveE m a
weftE u = MkWeaveE (Some (WeftE u id))
meshE :: Monad m => WeaveE m a -> m a
meshE (MkWeaveE (Some (WeftE u f))) = f <$> (u >>= meshE)
Breadth-first unfold, “E” variant: weaveE and unfoldM_BF_E
weaveE :: Applicative m => (s -> m (TreeF a s)) -> s -> m (WeaveE m (Tree a))
weaveE f s = f s <&> \case
LeafF -> pure Leaf
NodeF a s1 s2 -> liftA2 (Node a) (weftE (weaveE f s1)) (weftE (weaveE f s2))
unfoldM_BF_E :: Monad m => (s -> m (TreeF a s)) -> s -> m (Tree a)
unfoldM_BF_E f s = weaveE f s >>= meshE
The endless WeaveE enables an even lazier implementation of unfoldM.
When specialized to the identity monad, it lets us force the resulting
tree in any order. The forceLeftTreeE test passes (unlike forceLeftTreeL).
partialTreeE :: Tree ()
partialTreeE = runIdentity (unfoldM_BF_E (Identity . partialTreeF) True)
forceLeftTreeE :: TestTree
forceLeftTreeE = testCase "E-left" $ do
case partialTreeE of
Node _ (Node _ _ _) _ -> pure () -- Succeed
_ -> error "unreachable"
Test output:
$ cabal exec breadth-first-unfolds -- -p "E-left"
All
E-left: OK
One can also check that forcing the left spine of partialTreeE
arbitrarily deep throws no errors.
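For example, a check along these lines (forceSpine is a name made up for this sketch) terminates for any depth:
forceSpine :: Int -> Tree a -> ()
forceSpine 0 _ = ()
forceSpine _ Leaf = ()
forceSpine n (Node _ l _) = forceSpine (n - 1) l
-- forceSpine 1000 partialTreeE evaluates to (), never touching the
-- undefined right subtrees.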
We made it lazy, but at what cost?
First, this “Endless” variant only works for lazy monads.
With a strict monad, the runner meshE will loop forever.
It is possible to run things more incrementally by pattern-matching on
WeaveE, but you’re better off using the oblivious WeaveO anyway.
Second, when you aren’t running into an unproductive loop, the “Endless” variant of
unfoldM has quadratic time complexity Ω(height²). The reason
is essentially the same as for the “Strict” variant: liftA2 keeps looping even if
one argument is a pure weave. Before, that was in order to traverse the other,
non-pure argument; now, there isn’t even a way to tell when the computation
has ended.
Thus, every leaf may create work proportional to the height of the tree.
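In the thin-tree benchmark below there is roughly one leaf per level, so over a height h those contributions add up to on the order of h² steps, consistent with the Ω(height²) lower bound.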
Running the same benchmark as before, we measure even more baffling timings:
| height | time |
|---|---|
| baseline | 1x |
| 10x | 738x |
Benchmark: thinTreeE and benchE
thinTreeE :: Int -> Tree ()
thinTreeE = runIdentity . unfoldM_BF_E f
where
f 0 = pure LeafF
f h = pure (NodeF () 0 (h - 1))
benchE :: TestTree
benchE = {- localOption mutatorCpuTime $ -} bgroup "E-thin"
[ bench "1x" (nf thinTreeE 100)
, bench "10x" (nf thinTreeE 1000) & bcompare "E-thin.1x"
]
Raw output:
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
E-thin
1x: OK
243 μs ± 22 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
10x: OK
179 ms ± 17 ms, 119 MB allocated, 29 MB copied, 21 MB peak memory, 737.76x
Using the previous setup comparing a baseline and a 10x run, we see a more than 700x slowdown, so much worse than the 100x predicted by a quadratic model. Interestingly, the raw output shows that the total cumulative allocations did grow by a 100x factor.¹
But it gets weirder with more data points: it does not follow a clear power law. If Time(n) grew as nᶜ for some fixed exponent c, then the ratio Time(Mn)/Time(n) would be Mᶜ, a constant that does not depend on n.
In the following benchmark, we keep doubling the height (M = 2) for every test case, and we measure the time relative to the preceding case each time. A quadratic model predicts a 4x slowdown at every step. Instead, we observe wildly varying factors.
Benchmark output (each time factor is relative to the preceding line, for example, the “4x” benchmark is 9.5x slower than the “2x” benchmark):
| height | time |
|---|---|
| 1x | |
| 2x | 10.9x |
| 4x | 9.5x |
| 8x | 5.4x |
| 16x | 1.4x |
Code and raw output
benchE' :: TestTree
benchE' = {- localOption mutatorCpuTime $ -} bgroup "E-thin-more"
[ bench "1x" (nf thinTreeE 100)
, bench "2x" (nf thinTreeE 200) & bcompare "E-thin-more.1x"
, bench "4x" (nf thinTreeE 400) & bcompare "E-thin-more.2x"
, bench "8x" (nf thinTreeE 800) & bcompare "E-thin-more.4x"
, bench "16x" (nf thinTreeE 1000) & bcompare "E-thin-more.8x"
]
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
222 μs ± 9.3 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
2x: OK
2.43 ms ± 85 μs, 4.8 MB allocated, 236 KB copied, 7.0 MB peak memory, 10.94x
4x: OK
23.1 ms ± 1.2 ms, 19 MB allocated, 2.7 MB copied, 10 MB peak memory, 9.53x
8x: OK
126 ms ± 7.8 ms, 76 MB allocated, 18 MB copied, 24 MB peak memory, 5.44x
16x: OK
181 ms ± 7.0 ms, 119 MB allocated, 30 MB copied, 24 MB peak memory, 1.44x
I believe this benchmark is triggering some pathological behavior in the garbage
collector. I modified tasty-bench with an option to measure CPU time without GC
(mutator time). At the time of writing, tasty-bench is still waiting for a new release.
We can point Cabal to an unreleased commit of tasty-bench by adding the following
lines to cabal.project.local.
source-repository-package
type: git
location: https://github.com/Bodigrim/tasty-bench.git
tag: 81ff742a3db1d514461377729e00a74e5a9ac1b8
Then, uncomment the setting “localOption mutatorCpuTime $” in benchE and
benchE' above and uncomment the import of mutatorCpuTime at the top.
Benchmark output (excluding GC time, relative):
| height | time |
|---|---|
| baseline | 1x |
| 10x | 95x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin."
All
E-thin
1x: OK
216 μs ± 18 μs, 1.2 MB allocated, 13 KB copied, 6.0 MB peak memory
10x: OK
20.5 ms ± 1.9 ms, 119 MB allocated, 29 MB copied, 21 MB peak memory, 94.91x
For the “2x” benchmarks, we are closer to the expected 4x slowdown, but there is still a noticeable gap. I’m going to chalk the rest up to inherent measurement errors (the cost of tasty-bench’s simplicity) exacerbated by the pathological GC behavior; a possible explanation is that the pattern of memory usage becomes so bad that it affects non-GC time as well.
Benchmark output (excluding GC time, each measurement is relative to the preceding line):
| height | time |
|---|---|
| 1x | |
| 2x | 3.2x |
| 4x | 4.2x |
| 8x | 4.5x |
| 16x | 1.7x |
Raw output
$ cabal exec breadth-first-unfolds -- -p "E-thin-more"
All
E-thin-more
1x: OK
186 μs ± 16 μs, 1.2 MB allocated, 13 KB copied, 21 MB peak memory
2x: OK
597 μs ± 28 μs, 4.8 MB allocated, 236 KB copied, 21 MB peak memory, 3.20x
4x: OK
2.48 ms ± 148 μs, 19 MB allocated, 2.9 MB copied, 21 MB peak memory, 4.15x
8x: OK
11.2 ms ± 986 μs, 76 MB allocated, 18 MB copied, 24 MB peak memory, 4.50x
16x: OK
18.4 ms ± 1.7 ms, 119 MB allocated, 29 MB copied, 24 MB peak memory, 1.65x
It doesn’t seem possible for a breadth-first unfold to be both maximally lazy and of linear time complexity, but I don’t know how to formally prove that impossibility either.
Microbenchmarks: Queues vs Global Levels vs Weaves
So far we’ve focused on asymptotics (linear vs quadratic). Some readers will inevitably wonder about real speed. Among the linear-time algorithms—queues (“Q”), global levels (“G”), and weaves (lazy “L” or oblivious “O”)—which one is faster?
tl;dr: Queues are (much) faster in these microbenchmarks (up to 25x!), but keep in mind that these are all quite naive implementations.
There are two categories to measure separately: unfolds which produce trees,
and oblivious unfolds—which don’t produce trees. These microbenchmarks
construct full trees up to a chosen number of nodes. When there is an
output tree, we force it (using nf), otherwise we force a counter of the
number of nodes. We run on different sufficiently large sizes (500 and 5000)
to check the stability of the measured factors, ensuring that we are only
comparing the time components that dominate at scale.
The tables list times relative to the queue benchmark for each tree size.
Tree-producing unfolds
| algorithm | size | time |
|---|---|---|
| Queue | 500 | 1x |
| Global Levels | 500 | 1.4x |
| Lazy Weave | 500 | 3.1x |
| Queue | 5000 | 1x |
| Global Levels | 5000 | 1.2x |
| Lazy Weave | 5000 | 3.3x |
Code and raw output
fullTreeF :: Int -> Int -> TreeF Int Int
fullTreeF size n | n >= size = LeafF
fullTreeF size n = NodeF n (2 * n) (2 * n + 1)
fullTree_Q :: Int -> Tree Int
fullTree_Q size = runIdentity (unfoldM_BF_Q (Identity . fullTreeF size) 1)
fullTree_G :: Int -> Tree Int
fullTree_G size = runIdentity (unfoldM_BF_G (Identity . fullTreeF size) 1)
fullTree_L :: Int -> Tree Int
fullTree_L size = runIdentity (unfoldM_BF_L (Identity . fullTreeF size) 1)
fullTree :: TestTree
fullTree = bgroup "fullTree"
[ bench "Q-1x" (nf fullTree_Q 500)
, bench "G-1x" (nf fullTree_G 500) & bcompare "fullTree.Q-1x"
, bench "L-1x" (nf fullTree_L 500) & bcompare "fullTree.Q-1x"
, bench "Q-10x" (nf fullTree_Q 5000)
, bench "G-10x" (nf fullTree_G 5000) & bcompare "fullTree.Q-10x"
, bench "L-10x" (nf fullTree_L 5000) & bcompare "fullTree.Q-10x"
]
$ cabal exec breadth-first-unfolds -- -p fullTree
All
fullTree
Q-1x: OK
20.6 μs ± 1.1 μs, 141 KB allocated, 477 B copied, 6.0 MB peak memory
G-1x: OK
28.6 μs ± 2.4 μs, 223 KB allocated, 928 B copied, 6.0 MB peak memory, 1.39x
L-1x: OK
64.3 μs ± 5.6 μs, 353 KB allocated, 3.7 KB copied, 6.0 MB peak memory, 3.13x
Q-10x: OK
287 μs ± 26 μs, 1.5 MB allocated, 57 KB copied, 7.0 MB peak memory
G-10x: OK
349 μs ± 30 μs, 2.2 MB allocated, 94 KB copied, 7.0 MB peak memory, 1.22x
L-10x: OK
935 μs ± 73 μs, 3.5 MB allocated, 386 KB copied, 7.0 MB peak memory, 3.25x
Oblivious unfolds
| algorithm | size | time |
|---|---|---|
| Queue | 500 | 1x |
| Global Levels | 500 | 11x |
| Oblivious Weave | 500 | 25x |
| Queue | 5000 | 1x |
| Global Levels | 5000 | 10x |
| Oblivious Weave | 5000 | 24x |
Code and raw output
unfoldM_BF_Q_ :: Monad m => (s -> m (TreeF a s)) -> s -> m ()
unfoldM_BF_Q_ f s0 = unfoldM_f (s0 <+ Empty)
where
unfoldM_f (q :> s) = f s >>= \case
LeafF -> unfoldM_f q
NodeF _ l r -> unfoldM_f (r <+ l <+ q)
unfoldM_f Empty = pure ()
eatFullTree_Q :: Int -> Int
eatFullTree_Q size = (`execState` 0) (unfoldM_BF_Q_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree_G :: Int -> Int
eatFullTree_G size = (`execState` 0) (unfoldM_BF_G_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree_O :: Int -> Int
eatFullTree_O size = (`execState` 0) (unfoldM_BF_O_ (state . \n c -> (fullTreeF size n, c + 1)) 1)
eatFullTree :: TestTree
eatFullTree = bgroup "eatFullTree"
[ bench "Q-1x" (nf eatFullTree_Q 500)
, bench "G-1x" (nf eatFullTree_G 500) & bcompare "eatFullTree.Q-1x"
, bench "W-1x" (nf eatFullTree_O 500) & bcompare "eatFullTree.Q-1x"
, bench "Q-10x" (nf eatFullTree_Q 5000)
, bench "G-10x" (nf eatFullTree_G 5000) & bcompare "eatFullTree.Q-10x"
, bench "W-10x" (nf eatFullTree_O 5000) & bcompare "eatFullTree.Q-10x"
]
$ cabal exec breadth-first-unfolds -- -p eatFullTree
All
eatFullTree
Q-1x: OK
11.0 μs ± 724 ns, 78 KB allocated, 338 B copied, 6.0 MB peak memory
G-1x: OK
116 μs ± 11 μs, 379 KB allocated, 1.3 KB copied, 6.0 MB peak memory, 10.57x
W-1x: OK
278 μs ± 14 μs, 830 KB allocated, 5.9 KB copied, 6.0 MB peak memory, 25.36x
Q-10x: OK
120 μs ± 11 μs, 781 KB allocated, 21 KB copied, 6.0 MB peak memory
G-10x: OK
1.23 ms ± 122 μs, 3.9 MB allocated, 109 KB copied, 7.0 MB peak memory, 10.27x
W-10x: OK
2.92 ms ± 255 μs, 8.4 MB allocated, 631 KB copied, 7.0 MB peak memory, 24.43x
Conclusion
I hope to have piqued your interest in breadth-first unfolds without
using queues.
To the best of my knowledge, this specific problem hasn’t been studied in the
literature. It is of course related to breadth-first traversals,
previously solved using the Phases applicative.²
The intersection of functional programming and breadth-first walks is a small
niche, which makes it quick to survey that corner of the world for any ideas
related to those presented here.
The paper Modular models of monoids with operations by Zhixuan Yang
and Nicolas Wu, in ICFP 2023, mentions a general construction of Phases as an
example application of their theory. Basically, Phases is defined by a
fixed-point equation:
Phases f = Day f Phases :+: Identity
We can express Phases abstractly as a least fixed-point
μx.f▫x + Id in any monoidal category with a suitable structure.
If we instantiate the monoidal product ▫ not with Day convolution,
but with functor composition (Compose), then we get Weave.
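Spelled out in the same informal style as the Phases equation (a schematic sketch, ignoring newtype wrappers):
Weave m = Compose m Weave :+: Identity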
In another coincidence, the monad-coroutine package
implements a weave function which is a generalization of
liftS2—this may require some squinting.
While WeaveS as a data type coincides with the free monad Free,
monad-coroutine’s core data type Coroutine coincides
with the free monad transformer FreeT.
We can view Phases as a generalization of “zipping” from
lists to free applicatives—which are essentially lists of actions,
and Weave generalizes that further to free monads. To recap, the surprise was
that the naive data type of free monads results in a quadratic-time unfold.
That issue motivated a “lazy” variant³ which achieves a linear-time
breadth-first unfold. That in turn suggested an even “lazier” variant which
enables more control on evaluation order at the cost of efficiency.
I’ve just released the weave library which implements the main ideas of this post. I don’t expect it to have many users, given how much slower it is compared to queue-based solutions. But I would be curious to find a use case for the new compositionality afforded by this abstraction.
Recap table
| | Unfolds | Time | Laziness | Compositional |
|---|---|---|---|---|
| Phases* | No | linear† | by levels | Yes |
| Queue (Q) | Yes | linear† | strict | No |
| Global Levels (G) | Yes | linear† | by levels | No |
| Strict Weave (S) | Yes | quadratic‡ | strict | Yes |
| Oblivious Weave (O) | Oblivious only | linear† | N/A | Yes |
| Lazy Weave (L) | Yes | linear† | by levels | Yes |
| Endless Weave (E) | Yes | quadratic‡ᴱ | maximally lazy◊ | Yes |
†Linear wrt. size: Θ(size).
‡Quadratic wrt. height: lower bound Ω(height²), upper bound O(height × size).
ᴱThe “Endless” meshE only terminates with lazy monads.
*I guess there exists an “endless Phases” variant, which
would be quadratic and maximally lazy.
◊The definition of “maximally lazy” in this post actually misses a
range of possible lazy behaviors with monads other than Identity. A further
refinement seems to be another can of worms.
The main action of this Literate Haskell program
Extra test cases
whnfTreeE :: TestTree
whnfTreeE = testCase "E-whnf" $ do
case partialTreeE of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
whnfTreeQ :: TestTree
whnfTreeQ = expectFail $ testCase "Q-whnf" $ do
case partialTreeQ of
Node _ _ _ -> pure () -- Succeed
Leaf -> error "unreachable"
partialTreeQ :: Tree ()
partialTreeQ = runIdentity (unfoldM_BF_Q (Identity . partialTreeF) True)
bfGraph_L :: Int -> Tree Int
bfGraph_L = (`evalState` Set.empty) . unfoldM_BF_L visitGraph
testGraphL :: TestTree
testGraphL = testCase "L-graph" $
bfGraph_L 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
bfGraph_E :: Int -> Tree Int
bfGraph_E = (`evalState` Set.empty) . unfoldM_BF_E visitGraph
testGraphE :: TestTree
testGraphE = testCase "E-graph" $
bfGraph_E 1 @?=
Node 1
(Node 2 Leaf
(Node 5 Leaf Leaf))
(Node 4 Leaf (Node 6 Leaf Leaf))
Note that tasty-bench also reports memory statistics (allocated, copied, and peak memory) when certain RTS options are enabled, which I’ve done by compiling the test executable with -with-rtsopts=-T.↩︎
- The tree-traversals library by Noah Easterly.
- Algebras for weighted search, by Donnacha Oisín Kidney and Nicolas Wu, in ICFP 2021.
- Breadth-first traversal via staging, by Jeremy Gibbons, Donnacha Oisín Kidney, Tom Schrijvers, and Nicolas Wu, in MPC 2022. It has been revised into a short version (5 pages): Phases in software architecture in FUNARCH 2023.
Speaking of variants of free monads, one might think of the “freer” monad, which has different motivations and which does not help us here.↩︎
On my feed aggregator haskell.pl-a.net, I occasionally saw posts with broken titles like this (from ezyang’s blog):
Whatâ€™s different this time? LLM edition
Yesterday I decided to do something about it.
Locating the problem
Tracing back where it came from, that title was sent already broken by Planet Haskell, which is itself a feed aggregator for blogs. The blog originally produces the good, not broken, title. Therefore the blame lies with Planet Haskell. It’s probably a misconfigured locale. Maybe someone will fix it. It seems to be running archaic software on an old machine, stuff I wouldn’t deal with myself, so I won’t ask someone else to.
Blog
|
| What’s
v
Planet Haskell
|
| Whatâ€™s
v
haskell.pl-a.net (my site)
|
| Whatâ€™s
v
Your screen
In any case, this mistake can be fixed after the fact. Mis-encoded text is such a ubiquitous issue that there are nicely packaged solutions out there, like ftfy.
ftfy has been used as a data processing step in major NLP research, including OpenAI’s original GPT.
But my hobby site is written in OCaml and I would rather have fun solving this encoding problem than figure out how to install a Python program and call it from OCaml.
Explaining the problem
This is the typical situation where a program is assuming the wrong text encoding.
Text encodings
A quick summary for those who don’t know about text encodings.
Humans read and write sequences of characters, while computers talk to each other using sequences of bytes. If Alice writes a blog, and Bob wants to read it from across the world, the characters that Alice writes must be encoded into bytes so her computer can send it over the internet to Bob’s computer, and Bob’s computer must decode those bytes to display them on his screen. The mapping between sequences of characters and sequences of bytes is called an encoding.
Multiple encodings are possible, but it’s not always obvious which encoding to use to decode a given byte string. There are good and bad reasons for this, but the net effect is that many text-processing programs arbitrarily guess and assume the encoding in use, and sometimes they assume wrong.
Back to the problem
UTF-8 is the most prevalent encoding nowadays.¹ I’d be surprised if one of the Planet Haskell blogs doesn’t use it, which is ironic considering the issue we’re dealing with.
- A blog using UTF-8 encodes the right single quote² " ’ " as three consecutive bytes (226, 128, 153) in its RSS or Atom feed.
- The culprit, Planet Haskell, read those bytes but wrongly assumed an encoding different from UTF-8 where each byte corresponds to one character.
- It did some transformation to the decoded text (extract the title and body and put it on a webpage with other blogs).
- It encoded the final result in UTF-8.
What the blog sees → '’'
|
| UTF-8 encode (one character into three bytes)
v
226 128 153
|
| ??? decode (not UTF-8)
v
What Planet Haskell sees → 'â' '€' '™'
|
| UTF-8 encode
v
(...)
|
| UTF-8 decode
v
What you see → 'â' '€' '™'
The final encoding doesn’t really matter, as long as everyone else downstream agrees with it. The point is that Planet Haskell outputs three characters “â€™” in place of the right single quote " ’ ", all because UTF-8 represents " ’ " with three bytes.
In spite of their differences, most encodings in practice agree at least about ASCII characters, in the range 0-127, which is sufficient to contain the majority of English language writing if you can compromise on details such as confusing the apostrophe and the single quotes. That’s why in the title “What’s different this time?” everything but one character was transferred fine.
Solving the problem
The fix is simple: replace “â€™” with " ’ ". Of course, we also want to do that with all other characters that are mis-encoded the same way: those are exactly all the non-ASCII Unicode characters. The more general fix is to invert Planet Haskell’s decoding logic. Thank the world that this mistake can be reversed to begin with. If information had been lost by mis-encoding, I might have been forced to use one of those dreadful LLMs to reconstruct titles.³
- Decode Planet Haskell’s output in UTF-8.
- Encode each character as a byte to recover the original output from the blog.
- Decode the original output correctly, in UTF-8.
There is one missing detail: what encoding to use in step 2? I first tried the naive thing: each character is canonically a Unicode code point, which is a number between 0 and 1114111, and I just hoped that those which did occur would fit in the range 0-255. That amounts to making the hypothesis that Planet Haskell is decoding blog posts in Latin-1. That seems likely enough, but you will have guessed correctly that the naive thing did not reconstruct the right single quote in this case. The Latin-1 hypothesis was proven false.
As it turns out, the euro sign “€” and the trademark symbol “™” are not in the Latin-1 alphabet. They are code points 8364 and 8482 in Unicode, which are not in the range 0-255. Planet Haskell has to be using an encoding that features these two symbols. I needed to find which one.
Faffing about, I came across the Wikipedia article on Western Latin character sets which lists a comparison table. How convenient. I looked up the two symbols to find what encoding had them, if any. There were two candidates: Windows-1252 and Macintosh. Flip a coin. It was Windows-1252.
Windows-1252 differs from Latin-1 (and thus Unicode) in the 32 positions whose byte starts with 8 or 9 in hexadecimal (27 valid characters + 5 unused positions): that’s 27 characters that I had to map manually to the range 0-255 according to the Windows-1252 encoding, and the remaining characters would be mapped for free by Unicode. This data entry task was autocompleted halfway through by Copilot, because of course GPT-* knows Windows-1252 by heart.
let windows1252_hack (c : Uchar.t) : int =
let c = Uchar.to_int c in
if c = 0x20AC then 0x80
else if c = 0x201A then 0x82
else if c = 0x0192 then 0x83
else if c = 0x201E then 0x84
else if c = 0x2026 then 0x85
else if c = 0x2020 then 0x86
else if c = 0x2021 then 0x87
else if c = 0x02C6 then 0x88
else if c = 0x2030 then 0x89
else if c = 0x0160 then 0x8A
else if c = 0x2039 then 0x8B
else if c = 0x0152 then 0x8C
else if c = 0x017D then 0x8E
else if c = 0x2018 then 0x91
else if c = 0x2019 then 0x92
else if c = 0x201C then 0x93
else if c = 0x201D then 0x94
else if c = 0x2022 then 0x95
else if c = 0x2013 then 0x96
else if c = 0x2014 then 0x97
else if c = 0x02DC then 0x98
else if c = 0x2122 then 0x99
else if c = 0x0161 then 0x9A
else if c = 0x203A then 0x9B
else if c = 0x0153 then 0x9C
else if c = 0x017E then 0x9E
else if c = 0x0178 then 0x9F
else c
And that’s how I restored the quotes, apostrophes, guillemets, accents, et autres in my feed.
See also
- Mojibake, anyone? from BASHing data 2
Update: When Planet Haskell picked up this post, it fixed the intentional mojibake in the title.
There is no room for this in my mental model. Planet Haskell is doing something wild to parse blog titles.
As of September 2024, UTF-8 is used by 98.3% of surveyed web sites.↩︎
The Unicode right single quote is sometimes used as an apostrophe, to much disapproval.↩︎
Or I could just query the blogs directly for their titles.↩︎
There are two main episodes in this saga: Hope and Miranda. The primary conclusion is that the name comes from universal algebra, whereas another common interpretation of “algebraic” as a reference to “sums of products” is not historically accurate. We drive the point home with Clear. CLU is extra.
Disclaimer: I’m no historian and I’m nowhere as old as these languages to have any first-hand perspective. Corrections and suggestions for additional information are welcome.
Hope (1980)
Algebraic data types were at first simply called “data types”. This programming language feature is commonly attributed to Hope, an experimental applicative language by Rod Burstall et al. Here is the relevant excerpt from the paper, illustrating its concrete syntax:
A data declaration is used to introduce a new data type along with the data constructors which create elements of that type. For example, the data declaration for natural numbers would be:
data num == 0 ++ succ(num)
(…) To define a type ‘tree of numbers’, we could say
data numtree == empty ++ tip(num) ++ node(numtree#numtree)
(The sign # gives the cartesian product of types). One of the elements of numtree is:
node(tip(succ(0)), node(tip(succ(succ(0))), tip(0)))
But we would like to have trees of lists and trees of trees as well, without having to redefine them all separately. So we declare a type variable
typevar alpha
which when used in a type expression denotes any type (including second- and higher-order types). A general definition of tree as a parametric type is now possible:
data tree(alpha) == empty ++ tip(alpha) ++ node(tree(alpha)#tree(alpha))
Now tree is not a type but a unary type constructor – the type numtree can be dispensed with in favour of tree(num).
Pattern matching in Hope is done in multi-clause function declarations or multi-clause lambdas.
There was no case expression.
reverse(nil) <= nil
reverse(a::l) <= reverse(l) <> [a]
lambda true, p => p | false, p => false
As far as I can tell, other early programming languages cite Hope or one of its descendants as their inspiration for data types. There is a slightly earlier appearance in NPL by Darlington and the same Burstall, but I couldn’t find a source describing the language or any samples of data type declarations. Given the proximity, it seems reasonable to consider them the same language to a large extent. This paper by Burstall and Darlington (1977) seems to be using NPL in its examples, but data types are only introduced informally; see on page 62 (page 19 of the PDF):
We need a data type
atom, from which we derive a data type tree, using constructor functions tip to indicate a tip and tree to combine two subtrees
tip : atoms → trees
tree : trees x trees → trees
We also need lists of atoms and of trees, so for any type alpha let
nil : alpha-lists
cons : alphas x alpha-lists → alpha-lists
Hope inspired ML (OCaml’s grandpa) to adopt data types. In Standard ML:
datatype 'a option = Nothing | Some of 'a
Before it became Standard, ML started out as the “tactic language” of the LCF proof assistant by Robin Milner, and early versions did not feature data types (see the first version of Edinburgh LCF). It’s unclear when data types were added exactly, but The Definition of Standard ML by Milner et al. credits Hope for it (in Appendix F: The Development of ML):
Two movements led to the re-design of ML. One was the work of Rod Burstall and his group on specifications, crystallised in the specification language Clear and in the functional programming language Hope; the latter was for expressing executable specifications. The outcome of this work which is relevant here was twofold. First, there were elegant programming features in Hope, particularly pattern matching and clausal function definitions; second, there were ideas on modular construction of specifications, using signatures in the interfaces. A smaller but significant movement was by Luca Cardelli, who extended the data-type repertoire in ML by adding named records and variant types.
Miranda (1985)
“Data types” as a programming language feature appeared in Hope, but its first mention under the name “algebraic data types” that I could find is in Miranda: a non-strict functional language with polymorphic types by David Turner in 1985:
Algebraic data types
The basic method of introducing a new concrete data type, as in a number of other languages, is to declare a free algebra. In Miranda this is done by an equation using the symbol ::=,
tree ::= Niltree | Node num tree tree
being a typical example. (…) The idea of using free algebras to define data types has a long and respectable history [Landin 64], [Burstall 69], [Hoare 75]. We call it a free algebra, because there are no associated laws, such as a law equating a tree with its mirror image. Two trees are equal only if they are constructed in exactly the same way.
In case you aren’t aware, Miranda is a direct precursor of Haskell. A minor similarity with Haskell that we can see here is that data constructors are curried in Miranda, unlike in Hope and ML. Another distinguishing feature of Miranda is laziness. See also A History of Haskell: being lazy with class.
Below are links to the articles cited in the quote above. The first [Landin 64] doesn’t explicitly talk about algebra in this sense, while [Burstall 69] and [Hoare 75] refer to “word algebra” rather than “free algebra” to describe the same structure, without putting “algebra” in the same phrase as “type” yet.
- The mechanical evaluation of expressions, by Peter Landin (1964)
- Proving properties by structural induction, by Rod Burstall (1969)
- Recursive data structures by Tony Hoare (1975)
Hoare’s paper contains some futuristic pseudocode in particular:
A possible notation for such a type definition was suggested by Knuth; it is a mixture of BNF (the | symbol) and the PASCAL definition of a type by enumeration:
type proposition = (prop (letter) |
  neg (proposition) |
  conj, disj (proposition, proposition));
(…) In defining operations on a data structure, it is usually necessary to enquire which of the various forms the structure takes, and what are its components. For this, I suggest an elegant notation which has been implemented by Fred McBride in his pattern-matching LISP. Consider for example a function intended to count the number of &s contained in a proposition. (…)
function andcount (p: proposition): integer;
  andcount := cases p of
    (prop(c) → 0 |
     neg(q) → andcount(q) |
     conj(q,r) → andcount(q) + andcount(r) + 1 |
     disj(q,r) → andcount(q) + andcount(r));
Fred McBride’s pattern-matching LISP is the topic of his PhD dissertation. There is not enough room on this page to write about the groundbreaking history of LISP.
Unfree algebras in Miranda
If algebraic data types are “free algebras”, one may naturally wonder whether “unfree algebras” have a role to play. Miranda allows quotienting data type definitions by equations (“laws” or “rewrite rules”). You could then define the integers like this, with a constructor to decrement numbers, and equations to reduce integers to a canonical representation:
int ::= Zero | Suc int | Pred int
Suc (Pred n) => n
Pred (Suc n) => n
In hindsight this is superfluous, but it’s fun to see this kind of old experiments in programming languages. The modern equivalent in Haskell would be to hide the data constructors and expose smart constructors instead. There are uses for quotient types in proof assistants and dependently typed languages, but they work quite differently.
Sums of products?
There is another folklore interpretation of “algebraic” in “algebraic data types” as referring to “sums of products”.
It’s not an uncommon interpretation. In fact, trying to find a source for this folklore is what got me going on this whole adventure. The Wikipedia article on algebraic data types at the time of writing doesn’t outright say it, but it does refer to sums and products several times while making no mention of free algebras. Some [citation needed] tags should be sprinkled around. The Talk page of that article contains an unresolved discussion of this issue, with links to a highly upvoted SO answer and another one whose references don’t provide first-hand account of the origins of the term. For sure, following that idea leads to some fun combinatorics, like differentiation on data types, but that doesn’t seem to have been the original meaning of “algebraic data types”.
That interpretation might have been in some people’s mind in the 70s and 80s, even if only as a funny coincidence, but I haven’t found any written evidence of it except maybe this one sentence in a later paper, Some history of programming languages by David Turner (2012):
The ISWIM paper also has the first appearance of algebraic type definitions used to define structures. This is done in words, but the sum-of-products idea is clearly there.
It’s only a “maybe” because while the phrase “algebraic type” undeniably refers to sums of products, it’s not clear that the adjective “algebraic” specifically is meant to be associated with “sum-of-products” in that sentence. We could replace “algebraic type” with “data type” without changing the meaning of the sentence.
Clear (1979)
In contrast, free algebras—or initial algebras as one might prefer to call them—are a concept from the areas of universal algebra and category theory with a well-established history in programming language theory by the time algebraic data types came around, with influential contributions by a certain ADJ group; see for example Initial algebra semantics and continuous algebras.
Ironically, much related work focused on the other ADT, “abstract data types”. Using universal algebra as a foundation, a variety of “specification languages” have been designed for defining algebraic structures, notably the OBJ family of languages created by Joseph Goguen (a member of the aforementioned ADJ group) and others, and the Clear language by Rod Burstall (of Hope fame) and Joseph Goguen. Details of the latter can be found in The Semantics of Clear, a specification language. (You may remember seeing a mention of Clear earlier in the quote from The Definition of Standard ML.)
Example theories in Clear
Here is the theory of monoids in Clear. It consists of one sort named carrier,
an element (a nullary operation) named empty and a binary operation append.
constant Monoid = theory
sorts carrier
opns empty : carrier
append : carrier,carrier -> carrier
eqns all x: carrier . append(x,empty) = x
all x: carrier . append(empty,x) = x
all x,y,z: carrier . append(append(x,y),z) = append(x,append(y,z))
endth
A theory is an interface. Its implementations are called algebras. In that example, the algebras of “the theory of monoids” are exactly monoids.
In every theory, there is an initial algebra obtained by turning the
operations into constructors (or “uninterpreted operations”), equating elements
(which are trees of constructors) modulo the equations of the theory.
For the example above, the initial monoid is a singleton monoid, with only an empty element
(all occurrences of append are simplified away by the two equations for empty),
which is not very interesting. Better examples are those corresponding to the usual data types.
The booleans can be defined as the initial algebra of the theory with one sort (truthvalue)
and two values of that sort, true and false.
constant Bool = theory data
sorts truthvalue
opns true,false: truthvalue
endth
In Clear, the initial algebra is specified by adding the data keyword to a theory.
In the semantics of Clear, rather than thinking in terms of a specific algebra,
a “data theory” is still a theory (an interface),
with additional constraints that encode “initiality”, so the only possible
algebra (implementation) is the initial one.
My guess as to why the concept of data theory is set up that way
is that it allows plain theories and data theories to be combined seamlessly.
The natural numbers are the initial algebra of zero and succ:
constant Nat = theory data
sorts nat
opns zero: nat
succ: nat -> nat
endth
At this point, the connection between “data theories” in Clear and data types in Hope and subsequent languages is hopefully clear.
More substantial examples in Clear
Theories can be extended into bigger theories with new sorts, operations, and equations.
Here is an extended theory of booleans with two additional operations not, and,
and their equations. This should demonstrate that, beyond the usual mathematical structures,
we can define non-trivial operations in this language:
constant Bool1 = enrich Bool by
opns not: truthvalue -> truthvalue
and: truthvalue,truthvalue -> truthvalue
eqns all . not true = false
all . not false = true
all p: truthvalue . and(false, p) = false
all p: truthvalue . and(true, p) = p
enden
Initial algebras are also called free algebras, but that gets confusing because
“free” is an overloaded word. Earlier for instance, you might have expected the initial
monoid, or “free monoid”, to be the monoid of lists. The monoid of lists is the
initial algebra in a slightly different theory: the theory of monoids with an
embedding from a fixed set of elements A.
We might formalize it as follows in Clear.
The theory List is parameterized by an algebra A of the theory Set,
and its body is the same as Monoid, except that we renamed carrier to list,
we added an embed operation, and we added the data keyword to restrict that
theory to its initial algebra.
constant Set = theory sorts element endth
procedure List(A : Set) = theory data
sorts list
opns empty : list
append : list,list -> list
embed : element of A -> list
eqns all x: list . append(x,empty) = x
all x: list . append(empty,x) = x
all x,y,z: list . append(append(x,y),z) = append(x,append(y,z))
endth
One may certainly see a resemblance between theories in Clear, modules in ML, and object-oriented classes. It’s always funny to find overlaps between the worlds of functional and object-oriented programming.
CLU (1977)
CLU is a programming language created at MIT by Barbara Liskov and her students in the course of their work on data abstraction.
It features tagged union types, which are called “oneof types”. (Source: CLU Reference Manual by Barbara Liskov et al. (1979).)
T = oneof[empty: null,
integer: int,
real_num: real,
complex_num: complex]
Values are constructed by naming the oneof type (either as an identifier bound to it,
or by spelling out the oneof construct) then the tag prefixed by make_:
T$make_integer(42)
The tagcase destructs “oneof” values.
x: oneof[pair: pair, empty: null]
...
tagcase x
tag empty: return(false)
tag pair(p: pair): if (p.car = i)
then return(true)
else x := down(p.cdr)
end
end
The main missing feature for parity with algebraic data types is recursive type definitions, which are not allowed directly. They can be achieved indirectly though inconveniently through multiple clusters (classes in modern terminology). (Source: A History of CLU by Barbara Liskov (1992).)
Burstall’s papers on Hope and Clear cite CLU, but beyond that it doesn’t seem easy to make precise claims about the influence of CLU, which is an object-oriented language, on the evolution of those other declarative languages developed across the pond.
In a Turing machine, there is a tape and there is a program. What is the program in a Turing machine? drumroll 🥁… It is a finite-state machine, which is equivalent to a regular expression!
Just for a silly pun, I’m going to introduce this programming language in an absurd allegory about T-rexes (“Turing regular expressions”, or just “Turing expressions”).
Tales of T-rexes
T-rexes are mysterious creatures who speak in tales (Turing expressions) about one T-rex—usually the speaker. Tales are constructed from symbols for the operations of a Turing machine, and the standard regex combinators.
- > and < are the simple tales of the T-rex taking a single step to the left or right;
- 0! and 1! tell of the T-rex “writing” a 0 or 1;
- 0? and 1? tell of the T-rex “observing” a 0 or 1;
- e₁ ... eₙ is a tale made up of a sequence of n tales, and there is an empty tale in the case n = 0 (“nothing happened”);
- (e)* says that the tale e happened an arbitrary number of times, possibly zero;
- (e₁|e₂) says that one of the tales happened, e₁ or e₂.
As it is a foreign language from a fantasy world, the description above
shouldn’t be taken too literally.
For instance, it can be difficult to imagine a T-rex holding a pen,
much less writing with it. In truth, the actions that the tales 0! and 1!
describe are varied, and “writing” is only the closest approximation
among the crude words of humans.
Iteration (e)* and choice (e₁|e₂) make T-rex tales nondeterministic:
different sequences of events may be valid interpretations of the same tale.
The observations 0? and 1? enable us to prune the tree of
possibilities. T-rex communication may be convoluted sometimes,
but at least they mean to convey coherent series of events.
Enough exposition. Let’s meet T-rexes!
Two T-rexes greet us, introducing themselves as Alan and Alonzo. They invite us for a chat in their home in Jurassic park.
The tale of Alonzo
While Alan serves tea, Alonzo shows us a mysterious drawing. T-rex imagery is quite simplistic, owing to their poor vision and clumsy hands. We can sort of recognize Alonzo on the left, next to a row of circles and lines:
🦖
0001011000
Then, Alonzo tells us the following tale: “(>)*1?0!>1?0!”.
Noticing our puzzled faces, Alan fetches a small machine from the garage. It is a machine to interpret T-rex tales in a somewhat visual rendition. Alan demonstrates how to transcribe the tale that Alonzo told us together with his drawing into the machine. Here is the result:
Alan's machine
(>)*1?0!>1?0!
0001011000
0001010000
Press the Run button to see the machine render the tale.
It happens in the blink of an eye; T-rexes are really fast!
We will break it down step by step in what follows.
(You can also edit the program tale and the input
drawing in these examples and see what happens then.)
Alonzo’s drawing is the scene where the tale takes place.
We ask Alan and Alonzo what “0” and “1” represent,
but we are not fluent enough in T-rex to understand their apparently nuanced answer.
We have no choice but to abstract it away.
🦖 ← Alonzo
0001011000 ← the world
Alonzo first said “(>)*”: he walked toward the right.
As is customary in T-rex discourse, this tale leaves a lot up to interpretation.
There are many possible guesses of how many steps he actually took.
He could even have walked out of the picture!
🦖
0001011000
🦖
0001011000
🦖
0001011000
🦖
0001011000
But then the tale goes “1?”:
Alonzo observed a 1, whatever that means, right where he stopped.
That narrows the possibilities down to three:
🦖
0001011000
🦖
0001011000
🦖
0001011000
Afterwards, Alonzo tells us that “0!” happened,
which we think of abstractly as “writing” 0 where Alonzo is standing.
Before "0!"
🦖
0001011000
After "0!"
🦖
0000011000
Each of the three possibilities from earlier after writing 0:
🦖
0000011000
🦖
0001001000
🦖
0001010000
Alonzo then made one step to the right, and observed another 1 (“>1?”).
Only the second possibility above is consistent with that subsequent observation.
Finally, he writes 0 again.
🦖
0001000000
And we can see that the outcome matches the machine output above.
After puzzling over it for a while, we start to make sense of
Alonzo’s tale “(>)*1?0!>1?0!”, and imagine this rough translation:
“Funny story. I walked to the right, and I stopped in front of a 1,
isn’t that right Alan? I was feeling hungry. So I
ate it. I left nothing! I was still hungry, I am a dinosaur after all,
so I took a step to the right, only to find another 1.
Aren’t I lucky? I ate it too. It was super tasty!”
We have a good laugh. Alan and Alonzo share a few more tales. More tea? Of course. At their insistence, we try telling some of our own tales, to varied success. It’s getting late. Thank you for your hospitality. The end.
More examples
Exercise for the interested reader: implement the following operations using Turing expressions. Here’s a free machine to experiment with:
Alan's empty machine
Preamble: extra features and clarifications
For convenience, other digits can also be used in programs
(i.e., 2?, 2!, 3?, 3!, up to 9? and 9! are allowed).
The symbol 2 will be used to mark the end of a binary input
in the exercises that follow.
Turing expressions are nondeterministic, but the machine only searches for
the first valid execution. The search is biased as follows:
(e₁|e₂) first tries e₁ and then, if that fails, e₂;
(e)* is equivalent to (|e(e)*).¹
The tape extends infinitely to the left and to the right, initialized with zeroes
outside of the input.
The machine aborts if the tape head (🦖) walks out of the range [-100, 100].²
0 is the initial position of the tape head and where the input starts.
The machine prints the first 10 symbols starting at position 0 as the output.
Whitespace is ignored. A # starts a comment up to the end of the line.
What kind of person comments a regex?
Determinism
Although Turing expressions are nondeterministic in general,
we obtain a deterministic subset of Turing expressions by requiring
the branching combinators to be guarded.
Don’t allow unrestricted iterations (e)* and choices (e₁|e₂),
only use while loops (1?e)*0?³ and conditional statements (0?e₁|1?e₂).
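For example, the binary increment solution below, (1?0!>)*0?1!, already has this shape: a while loop (1?0!>)*0? that keeps replacing 1s with 0s while moving right, followed by writing a 1 once the guard observes a 0.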
Not
Flip the bits. 0 becomes 1, 1 becomes 0.
Examples:
Input: 0100110112
Output: 1011001002
Input: 1111111112
Output: 0000000002
Solution
Alan's machine
((0?1!|1?0!)>)*2?
0100110112
1011001002
Binary increment
Input: binary representation of a natural number n.
Output: binary representation of (n + 1).
Cheatsheet of binary representations:
0: 000
1: 100
2: 010
3: 110
4: 001
5: 101
Examples:
Input: 0010000000
Output: 1010000000
Input: 1110000000
Output: 0001000000
No input delimiters for this exercise.
Solution
Alan's machine
(1?0!>)*0?1!
1110000000
0001000000
Left shift
Move bits to the left.
Examples:
Input: 0100110112
Output: 1001101102
Input: 1111111112
Output: 1111111102
Solution
Alan's machine
(2?|(0!>0?|1!>1?)*(0!>2?))
0100110112
1001101112
Right shift
Move bits to the right.
Examples:
Input: 0100110112
Output: 0010011012
Input: 1111111112
Output: 0111111112
Solution
Alan's machine
((0?|1?)(0?>)*(2?|1?0!>(1?>)*(2?|0?1!>)))*2?
0100110112
0010011012
Cumulative xor
Each bit of the output is the xor of the input bits to the left of it.
Examples:
Input: 0100100012
Output: 0111000012
Input: 1111111112
Output: 1010101012
Solution
Alan's machine
((0?>)*(1?>((0?1!>)*1?0!|2?)|2?))*2?
0100100012
0111000012
Unary subtraction
Input: Two unary numbers x and y, separated by a single 0.
Output: The difference (x - y).
Example: evaluate (5 - 3).
Input: 1111101110
Output: 1100000000
Feel free to add a 2 to delimit the input.
I only decided to allow symbols other than 0 and 1
after finishing this exercise.
Solution
Alan's machine
(1?>)*0?<(1?>(0?>)*1?0!>(1?<(0?<)*1?0!<|0?(0?<)*1?0!))*0?<(1?<)*0?>
1111101110
1100000000
Example: evaluate (3 - 5).
Input: 1110111110
Output: 0000000110
In my solution, the result -2 is represented by two 1 placed in the location
of the second argument rather than the first. You may use a different encoding.
Solution (bis)
Alan's machine
(1?>)*0?<(1?>(0?>)*1?0!>(1?<(0?<)*1?0!<|0?(0?<)*1?0!))*0?<(1?<)*0?>
1110111110
0000000110
Commented program
The lack of delimiters in my version of the problem makes this a bit tricky.
# Example: evaluate (5 - 3).
# TAPE: 111110111
# HEAD: ^
(1?>)*0?
# TAPE: 111110111
# HEAD: ^
#
# The following line checks whether
# the second argument is 0,
# in which case we will skip the loop.
>(0?<|1?<<)
# Otherwise we move the head on the last 1
# of the first argument.
# TAPE: 111110111
# HEAD: ^
#
# BEGIN LOOP
# Loop invariant: the difference between
# the two numbers on tape is constant.
(1?
# Go to the first 1 of the second argument.
>(0?>)*1?
# During the first iteration,
# the tape looks like this:
# TAPE: 111110111
# HEAD: ^
#
# Erase 1 to 0 and move to the right.
0!>
# TAPE: 111110011
# HEAD: ^
#
# Check whether there remains
# at least one 1 to the right.
# BEGIN IF
(1?
# There is at least one 1 on the right.
# Move back into the first argument.
<(0?<)*1?
# TAPE: 111110011
# HEAD: ^
#
# Erase 1 to 0. Move left.
0!<
# TAPE: 111100011
# HEAD: ^
#
# If the 1s of the first argument ran out
# at this point (which would mean
# first argument < second argument),
# we will BREAK out of the loop (then terminate),
# otherwise, CONTINUE, back to the top of loop
|0?
# ELSE (second branch of the IF from three lines ago)
# The 1s of the second argument ran out
# (which means first argument >= second argument)
# Tape when we reach this point (in the last iteration):
# TAPE: 1110000000
# HEAD: ^
#
# Move back into the first argument.
(0?<)*1?
# TAPE: 111000000
# HEAD: ^
#
# Erase 1 to 0.
0!
# TAPE: 110000000
# HEAD: ^
#
# Reading a 0 will break out of the loop.
# BREAK
)
# END IF
)*0?
# END LOOP
The separation of program and tape
Until this point, there may remain misgivings about whether this is actually “regular expressions”. The syntax is the same, but is it really the same semantics? This section spells out a precise alternative definition of Turing machines with a clear place for the standard semantics of regular expressions (as regular languages).
Instead of applying regular expressions directly to an input string, we are using them to describe interactions between the program and the tape of a Turing machine. Then the regular expression might as well be the program.
The mechanics of Turing machines are defined traditionally via a transition relation between states. A Turing machine state is a pair (q, t) of a program state q ∈ Q (where Q is the set of states of a finite-state machine) and a tape state t ∈ 2^ℤ × ℤ (the bits on the tape and the position of the read-write head).
That “small-step” formalization of Turing machines is too monolithic for our present purpose of revealing the regular languages hidden inside Turing machines. The issue is that the communication between the program and the tape is implicit in the transition between states as program-tape pairs (q, t). We will take a more modular approach using trace semantics: the program and the tape each give rise to traces of interactions which make explicit the communication between those two components.
The standard semantics of regular expressions
The raison d’être of regular expressions is to recognize sequences of symbols,
also known as strings, lists, or words. Here, we will refer to them as traces.
Regular expressions are conventionally interpreted as sets of traces, reading
the iteration * and choice | combinators as operations on sets.
Let A be a set of symbols; in our case
A = {<, >, 0!, 1!, 0?, 1?} but the
following definition works with any A. The trace semantics of a regular
expression e over the alphabet A is defined inductively:
- An atomic expression e ∈ A contains a single trace which is just that symbol. Trace(e) = {e} if e ∈ A
- A concatenation of expressions e₁…eₙ contains concatenations of traces of every eᵢ. Trace(e₁…eₙ) = { t₁ … tₙ ∣ ∀i, tᵢ ∈ Trace(eᵢ) }
- An iteration (e)* contains all concatenations of traces t₁ … tₙ such that each subtrace tᵢ is a trace of that same e. Trace((e)*) = { t₁ … tₙ ∣ ∀i, tᵢ ∈ Trace(e) }
- A choice (e₁|e₂) contains the union of traces of e₁ and e₂. Trace((e₁|e₂)) = Trace(e₁) ∪ Trace(e₂)
Equivalently, a trace semantics can be viewed as a relation between program and trace.
We write e ⊢ t, pronounced “e recognizes t”, as an abbreviation of
t ∈ Trace(e).
This is also for uniformity with the notation in the next section.
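As a small worked example, the expression (0?|1?)> recognizes exactly two traces, (0?|1?)> ⊢ 0? > and (0?|1?)> ⊢ 1? >, while (>)* recognizes ϵ, >, > >, > > >, and so on.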
A core result of automata theory is that the sets of traces definable by regular expressions are the same as those definable by finite-state machines. That led to our remark that Turing machines might as well be Turing regular expressions.
The Turing machine memory model
In the semantics of regular expressions above, the meaning of the symbols
(<, >, etc.) is trivial:
a symbol in a regular expression denotes itself as a singleton trace.
In this section, we will give these symbols their natural meaning as
“operations on a tape”.
The tape is the memory model of Turing machines.
Memory models are better known in the context of concurrent programming
languages, as they answer the question of how to resolve concurrent writes and
reads.
The tape carries a sequence of symbols extending infinitely in both directions. A head on the tape reads one symbol at a time, and can move left or right, one symbol at a time. Addresses on the tape are integers, elements of ℤ. A tape state is a pair (m, i) ∈ 2^ℤ × ℤ: the memory contents is m ∈ 2^ℤ (note 2^ℤ = ℤ → {0, 1}) and the position of the head is i ∈ ℤ. The behavior of the tape is defined as a ternary relation pronounced “(m, i) steps to (m′, i′) with trace t”, written:
(m, i) ⇝ (m′, i′) ⊢ t
It is defined by the following rules. We step left and right by decrementing and incrementing the head position i.
(m, i) ⇝ (m, i − 1) ⊢ <
(m, i) ⇝ (m, i + 1) ⊢ >
Writing operations use the notation m[i ↦ v] for updating the value of the tape m at address i with v.
(m, i) ⇝ (m[i ↦ 0], i) ⊢ 0!
(m, i) ⇝ (m[i ↦ 1], i) ⊢ 1!
Observations, or assertions, step only when a side condition is satisfied. Otherwise, the tape is stuck, and that triggers backtracking in the search for a valid trace.
(m, i) ⇝ (m, i) ⊢ 0? if m(i) = 0
(m, i) ⇝ (m, i) ⊢ 1? if m(i) = 1
We close this relation by reflexivity (indexed by the empty trace ϵ) and transitivity (indexed by the concatenation of traces).
(m, i) ⇝ (m, i) ⊢ ϵ
(m, i) ⇝ (m′, i′) ⊢ t and (m′, i′) ⇝ (m″, i″) ⊢ t′ ⇔ (m, i) ⇝ (m″, i″) ⊢ t t′
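As a worked example, let m be Alonzo’s tape 0001011000 (and 0 everywhere else). Then (m, 0) ⇝ (m[3 ↦ 0], 3) ⊢ > > > 1? 0!: three steps to the right, a successful observation of the 1 at address 3, and a write of 0 at that address.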
Turing regular expressions
We now connect programs and tapes together through the trace.
A Turing regular expression e and an initial tape (m, 0) step
to a final tape (m′, i′), written
e, (m, 0) ⇝ (m′, i′)
if there exists a trace t recognized by both the program and the tape:
e ⊢ t
(m, 0) ⇝ (m′, i′) ⊢ t
We can then consider classes of functions computable by Turing expressions via an
encoding of inputs and outputs on the tape.
Let encode : ℕ → 2^ℤ be an encoding of natural numbers as tapes.
A Turing expression e computes a function f : ℕ → ℕ if, for all n,
there is exactly one final tape (m′, i′) such that
e, (encode(n), 0) ⇝ (m′, i′)
and that unique tape encodes f(n):
m′ = encode(f(n))
Et voilà. That’s how we can program Turing machines with regular expressions.
Finite-state machines: the next 700 programming languages
Finite-state machines appear obviously in Turing machines, but you can similarly view many programming languages in terms of finite-state machines by reducing the state to just the program counter: “where you are currently in the source program” can only take finitely many values in a finite program. From that point of view, all other components of the abstract machine of your favorite programming language—including the values of local variables—belong to the “memory” that the program counter interacts with. Why would we do this? For glory of course. So we can say that most programming languages are glorified regular expressions.
To be fair, there are exceptions to this idea: cellular automata and homoiconic languages (i.e., with the ability to quote and unquote code at will) are those I can think of. At most there is a boring construction where the finite-state machine writes the source program to memory then runs a general interpreter on it.
Completely free from Turing-completeness
The theory of formal languages and automata has a ready-made answer about the expressiveness of regular expressions: regular expressions denote regular languages, which belong to a lower level of expressiveness than recursively enumerable languages in the Chomsky hierarchy.
What I want to point out is that theory can only ever study “expressiveness” in a narrow sense. Real expressiveness is fundamentally open-ended: the only limit is your imagination. Any mathematical definition of “expressiveness” must place road blocks so that meaningful impossibility theorems can be proved. The danger is to forget about those road blocks when extrapolating mathematical theorems into general claims about the usefulness of a programming language.4
The expressiveness of formal languages is a delicate idea in that there are well-established mathematical concepts and theorems about it, but the rigor of mathematics hides a significant formalization gap between how a theory measures “expressiveness” and the informal open-ended question of “what can we do with this?”.
“Regular expressions are not Turing-complete” might literally be a theorem in some textbook; it doesn’t stop regular expressions from also being a feasible programming language for Turing machines as demonstrated in this post. Leaving you to come to terms with your own understanding of this paradox, a closing thought: at the end of the day, science is no slave to mathematics; we do mathematics in service of science.
Bonus track: Brainfuck
Turing regular expressions look similar to Brainfuck. Let’s extend the primitives of Turing expressions to be able to compile Brainfuck.
The loop operator [...] in Brainfuck can be written as (0~...)*0?,
with a new operation 0~ to observe a value not equal to zero.
With + and - (increment and decrement modulo 256) as additional operations
supported by our regular expressions,
a Brainfuck program is compiled to an extended Turing expression
simply by replacing [ and ] textually with (0~ and )*0?.5
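For example, the zeroing idiom [-] used below desugars to (0~-)*0?, and the addition loop [-<+>] desugars to (0~-<+>)*0?.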
Interestingly, translating Brainfuck to extended Turing expressions does not use |,
yet Brainfuck is Turing-complete: while loops seem to make conditional
expressions redundant.
(Are Turing expressions without choice (e₁|e₂)
(i.e., with only <, >, 0!, 1!, 0?, 1?, and (e)*) also Turing-complete?)
The machine implemented within this post supports those new constructs:
+, -, 0~, 1~ (also 2~ to 9~, just because; but not more, just because),
and the brackets [ and ]. You can write code in Brainfuck, and it will be desugared
and interpreted as an extended Turing expression. You can also directly write an extended
Turing expression.
The input can now be a comma-separated list prefixed by a comma (to allow multi-digit numbers).
Example: ,1,1,2,3,5,8,13.
Trailing zeroes in the output will not be printed for clarity.
Brainfuck is a high-level programming language compared to Turing expressions. Being able to increment and decrement numbers makes programming so much less tedious than explicitly manipulating unary or binary numbers in Turing machines.
Small examples
The idiom [-] zeroes out a number.
Alan's machine
[-],42,0
Add two numbers.
Alan's machine
>[-<+>],42,57,99
Nondeterministic Brainfuck
Extending Brainfuck with star (e)* and choice (e₁|e₂) equips the language with nondeterminism.
One thing we can do using nondeterminism is to define the inverse of a function
simply by guessing the output y, applying the function to it to get f(y),
then checking that it matches the input x, in which case y = f⁻¹(x).
It’s not efficient, but it’s a general implementation of inverses which may have its uses
in writing formal specifications.
Here’s a roundabout implementation of subtraction (x − y): guess the answer (call it z), add one of the operands to it (z + y), and compare the result with the other operand (if z + y = x then z = x − y).
Alan's machine
[->>>+<<<](+>>+<<)*>[->+<]>[->-<]>0?,10,3,7
Commented program
# Using four consecutive cells, named A, B, C, D
# Expected result: value of (A - B) placed in A
# D ← A
# A ← 0
[->>>+<<<]
# A ← GUESS
# C ← A
(+>>+<<)*
# C ← B + C
# B ← 0
>[->+<]
# D ← D - C
# C ← 0
>[->-<]
# Assert(D == 0)
>0?
Note that this is the opposite of most real-world regex engines: * is usually eager (equivalent to (e(e)*|) rather than (|e(e)*)). The lazy variant is usually written *? and you could add it to the syntax of Turing regexes if you want.↩︎
It’s very easy to accidentally write an infinite loop; this is a half-assed safeguard to catch some of them.↩︎
Extra care must be taken when more than two distinct symbols may be encountered on the tape.↩︎
Another hot take in the same vein is that the simply-typed lambda calculus—the simplest total functional programming language—is Turing-complete: you can encode Turing machines/general recursive functions/your favorite Turing-complete gadget by controlling nontermination with fuel, for a concrete example. Another idea is that a language where functions terminate can easily be extended with nontermination or recursion as an explicit effect.
More generally, Kleene’s normal form theorem says that if you can “express” primitive recursion, then you can “express” general recursion. Some might view this theorem as a counterargument, pointing to a boundary between “Turing-completeness” (can “express” general recursion) and “weak Turing-completeness” (can “express” primitive recursion) which can be made precise. While I recognize that there is a rich theory behind these concepts, I rather view Kleene’s normal form theorem as an argument why such a distinction is too subtle to be relevant to expressiveness in a broad sense of what we can and cannot do using a programming language.↩︎
We might even say that Brainfuck can be compiled to regular expressions using regular expressions. The regular expressions to do those substitutions are trivial though. Using sed: sed 's/\[/(0~/g;s/\]/)\*0?/g', where the two actual regular expressions are \[ and \].↩︎
Abstraction, food for thought
Two apples are the same as two apples.
Two apples are not the same as two oranges.
Two ripe apples are not the same as two rotten apples, even though they are both two apples and two apples.
Two fruits are the same as two fruits, even though they could be two apples and two oranges.
Algebraic effects
In Haskell, different sorts of effectful computations can be expressed using monads. Monads for individual effects are fairly well understood. The challenge now is to combine many different effects. Applications manage many kinds of resources (files, network, databases…), handle many types of errors, and run in different environments (production vs testing with mock components). Can that be done while maintaining a reasonable level of separation of concerns?
Currently, a common approach is to use monad transformers and type classes (mtl-style).
But when you have a big stack of monad transformers, it may not be easy to even
understand what (>>=) does, because its behavior arises from the composition
of all of these transformers. So the actual control flow of the program is
opaque to us, which can be an obstacle to locating errors and guaranteeing
performance.
Algebraic effects are another approach to combine effects. Whereas with transformers, every monad transformer must be defined from scratch, algebraic effects start from a few core primitives: you have one (parameterized) monad with abilities to “call” an operation and to “handle” such calls. The hope is that those core primitives:
- are simple to implement and to optimize;
- make it easy to define and reason about effects, in terms of both behavior and performance.
Until now, algebraic effect systems in Haskell used free monads or the continuation monad. Continuations were emulated as closures; this comes with a level of indirection whose cost is difficult to mitigate. The newly implemented delimited continuations primops let us directly manipulate native continuations.
This post uses delimited continuations to implement programs with various effects. The usual culprits:
- Exceptions
- Output
- Combining exceptions and output
- Input
- Combining input and output: streaming
- Interacting with the real world
- State
- Nondeterminism
- Concurrency
The example programs leveraging this mini effect library will look like your
standard-fare monadic code. What makes them interesting is that, operationally,
they are all in the IO monad. Unlike with monad transformers, adding a new
effect does not change the underlying monad, so code that doesn’t use that
effect does not pay a price for it. Another notable consequence is that
“unlifting” abstractions like UnliftIO or MonadBaseControl are no longer
relevant: there is nothing to “unlift” if you never leave IO.
The abstraction layer of algebraic effects over continuations is so thin that I
just use prompt and control0 directly, but the bits that are “operations”
and the bits that are “handlers” are clearly identifiable. The system
implemented here is untyped as far as effects are concerned, but features
named handlers as a mitigating alternative;
a complete effect system which would keep track of what operations each
computation may call and would provide safe primitives to define new effects is
left as an exercise for the reader.
This post is written in Literate Haskell (source code). It can be compiled using the development version of GHC (or GHC 9.6 if it has been released).
$ ghc 2023-01-02-del-cont-examples.lhs -main-is DelContExamples.main -o run-tests
$ ./run-tests
All tests passed!
Extensions and imports
{-# LANGUAGE
BangPatterns,
BlockArguments,
DerivingStrategies,
GADTs,
GeneralizedNewtypeDeriving,
MagicHash,
UnboxedTuples #-}
module DelContExamples where
import qualified Control.Exception as E
import Control.Exception.Base (NoMatchingContinuationPrompt(..))
import Data.Either
import Data.Foldable (for_)
import Data.Functor (void)
import Data.Functor.Sum (Sum(..))
import Data.Maybe (fromMaybe, maybe)
import System.IO.Unsafe
import System.Environment
import GHC.Exts (PromptTag#, newPromptTag#, prompt#, control0#)
import GHC.IO (IO(..))
import GHC.Stack (HasCallStack)
import Prelude hiding (log)
The mother of all monads
Capturing continuations is the power of the continuation monad, in which we can embed all other monads. It’s the mother of all monads.
Mom is defined identically to IO, but its only operations are the new
delimited continuation primitives.
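The definition of Mom itself is not included in this extract. Judging from how the wrappers below unwrap it, it is presumably a newtype around IO, along these lines (a sketch, not the post’s verbatim code):
-- Mom has IO's representation, but exposes no IO operations.
newtype Mom a = Mom (IO a)
  deriving newtype (Functor, Applicative, Monad)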
The available operations wrap the RTS primitives newPromptTag#,
prompt# and control0#.
-- Unsafe primitives
data PromptTag a = PromptTag (PromptTag# a)
newPromptTag :: Mom (PromptTag a)
newPromptTag = Mom (IO (\s -> case newPromptTag# s of
(# s', tag #) -> (# s', PromptTag tag #)))
prompt :: PromptTag a -> Mom a -> Mom a
prompt (PromptTag tag) (Mom (IO m)) = Mom (IO (prompt# tag m))
control0 :: PromptTag a -> ((Mom b -> Mom a) -> Mom a) -> Mom b
control0 (PromptTag tag) f =
Mom (IO (control0# tag (\k -> case f (\(Mom (IO a)) -> Mom (IO (k a))) of Mom (IO b) -> b)))
The boxing of the continuation k in control0 could be avoided by
introducing a new type for continuations, replacing (Mom b -> Mom a).
I’m not sure whether there is much to gain from that optimization.
I leave it like this for simplicity.
prompt and control0, “goto” with extra steps?
When a function terminates normally, it returns its result to its caller,
its predecessor in the call stack. prompt lets you prepare another return point
earlier in the call stack, and control0 returns to that point. What happens
to all the stack frames that were skipped that way? They are copied to the heap so they
can be restored later.
In more concrete terms, when you call control0 t f :: Mom b, the caller expects a
result of some type b. It is assumed that you have previously set up a
prompt t :: Mom a -> Mom a in the call stack with the same tag t :: PromptTag a.
The slice of the stack up to that prompt t is unwound and stored as a function
continue :: Mom b -> Mom a (IO b -> IO a).
prompt t is popped off the stack, and the program carries on as f continue.
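Here is a tiny illustration of those two primitives (my own example, not from the post): the captured continuation continue stands for “bind the result to x and multiply by 10”, and we resume it twice.
-- Capture the rest of the prompt-delimited block and resume it twice.
example :: Mom Int
example = do
  tag <- newPromptTag
  prompt tag do
    x <- control0 tag \continue -> (+) <$> continue (pure 1) <*> continue (pure 2)
    pure (x * 10)
-- run example == Just 30  (that is, 1 * 10 + 2 * 10)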
It sounds completely insane the first time you learn about it, it’s like “goto” with extra steps. And yet, when you get down to it, delimited continuations have rather clean semantics, both operationally and denotationally. The implementation was a surprisingly small change in GHC.
The changes required to implement prompt# and control0# are relatively minimal. They only impact the RTS, and they do not require any changes to existing functionality. Though capturing portions of the RTS stack may seem like a radical proposition, GHC actually already does it when raising an asynchronous exception to avoid the need to duplicate work for any blackholed thunks. In fact, getting that right is significantly more subtle than implementing control0#, which is quite straightforward in comparison.
— The GHC Proposal
The richness of continuations, both theoretically and practically, suggests that these control operators are not as arbitrary as they seem.
Effectful code, pure semantics
The code in this post can be split in two levels. Library-level code uses the delimited
continuation primitives to implement effects—operations and handlers, and user-level
code uses those effects in example programs.
Without direct access to delimited continuations, user-level code cannot
observe any mutation, so it will be safe to use the following pure run
function.
-- Look Ma', no IO!
run :: Mom a -> Maybe a
run (Mom m) = unsafePerformIO
(E.catch (Just <$> m) \NoMatchingContinuationPrompt -> pure Nothing)
Hiding the delimited continuation primitives avoids the danger of duplicating
and observing the creation of fresh PromptTags in a pure context.
Some partiality remains (Maybe) due to potentially mismatched
control0# calls. Such errors would be prevented by a type system for effects,
which is beyond the scope of this post.
Further reading
On prompt#, control0#, and newPromptTag#:
- The GHC proposal: Delimited continuations primops;
- Comment by @tomjaguarpaw in the discussion of the GHC proposal illustrating the semantics of the primops;
- Gist by @lexi-lambda with background on reduction semantics and continuations;
- A Monadic Framework for Delimited Continuations by Kent Dybvig, Simon Peyton Jones, and Amr Sabry (JFP 2007).
- The patch implementing the proposal.
On the continuation monad:
- The reasonable effectiveness of the continuation monad.
- The essence of functional programming, by Philip Wadler (POPL 1992).
Exceptions
To begin, let’s implement exceptions using delimited continuations.
This effect has an operation throw and a handler catch.
Operation
We first declare the uninterpreted operation Throw as a constructor
in a functor. The parameter a is ignored by exceptions; it will be
used by other effects.
We wrap this constructor in a user-facing function throw.
Every throw should have a matching catch, and we ensure this
by requiring a tag that identifies the corresponding catch.
The exact type of tag will be revealed in a moment.
control0 uses that tag to look up the matching catch in the call stack,
and returns to it with the exception e wrapped in Throw.
The underscore is the continuation, which is the slice of the stack below the
catch, which is thus discarded.
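The declarations of Throw and throw are not shown in this extract. Based on how catch consumes them below, they plausibly look like this (Op and the % tag type are described right after catch):
-- The Throw operation carries the exception; the type parameter a
-- (the continuation payload) is unused.
data Exception e a = Throw e

-- throw jumps to the matching catch via the tag, discarding the captured
-- continuation (the underscore).
throw :: Exception e % r -> e -> Mom a
throw tag e = control0 tag \_ -> pure (Op (Throw e))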
Handler
The type of catch should also look familiar, with the novelty that the
handled computation f expects a tag—so that it may call throw.
In catch f onThrow, a fresh tag is generated, then
f tag either (1) returns normally, and its result is wrapped in Pure a,
or (2) f tag throws an exception wrapped in Op (Throw e).
We then return the result or apply the handler onThrow accordingly.
catch :: (Exception e % a -> Mom a) -> (e -> Mom a) -> Mom a
catch f onThrow = do
tag <- newPromptTag
handle tag (f tag)
where
handle tag action = do
next <- prompt tag (Pure <$> action)
case next of
Op (Throw e) -> onThrow e
Pure a -> pure a
You might have guessed that the Exception e % a tag is just a PromptTag.
More surprisingly, the tag index involves a free monad.
For exceptions, Free (Exception e) a is equivalent to Either e a:
we expect the computation under prompt to produce either an exception e or
a result a. More generally, for an effect expressed as a functor f,
things will be set up exactly so that handlers will be matching on a
computation/tree of type Free f r.
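Concretely, the plumbing suggested here could be set up as follows (a sketch; these definitions are not part of this extract):
-- A computation that has either finished with a result (Pure) or stopped
-- at an operation (Op) of the effect functor f.
data Free f r = Op (f (Free f r)) | Pure r

-- The tag handed out by a handler of effect f whose overall result is r.
type (%) f r = PromptTag (Free f r)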
Using catch, we can implement try.
try :: (Exception e % Either e a -> Mom a) -> Mom (Either e a)
try f = catch (\tag -> Right <$> f tag) (\e -> pure (Left e))
The explicit tags serve as a form of capabilities, handles that functions
take as explicit arguments, granting the permission to use the associated
effects. This partly makes up for the lack of effect typing.
It’s not watertight: you can easily capture the tag to call throw outside of
try/catch. But from a non-adversarial perspective, this mechanism may
prevent quite a few mistakes.
Test
testThrow :: IO ()
testThrow = do
assert (isRight' (run (try (\_ -> pure "Result"))))
assert (isLeft' (run (try (\exc -> throw exc "Error"))))
where
isRight' = maybe False isRight
isLeft' = maybe False isLeft
-- Minimalistic unit testing framework
assert :: HasCallStack => Bool -> IO ()
assert True = pure ()
assert False = error "Assertion failed"
Output
Algebraic effects are also known as “resumable exceptions”, extending exceptions with the ability to continue the computation right where the exception was thrown.
The next simplest effect after exceptions is to produce some output.
Like Throw, we represent the Output operation as a constructor,
containing the value to output, and now also a continuation.
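That declaration is not shown in this extract; given how output and the handlers below use it, it presumably reads:
-- Output carries the emitted value and the continuation, which waits for a
-- computation to stand in for the "output o" call (of type Mom ()).
data Out o a = Output o (Mom () -> Mom a)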
Operation
The output wrapper is similar to throw, additionally storing the
continuation in the Output constructor.
The expected argument of the continuation continue is a computation which is
to replace the operation call.
When we call output o :: Mom (), that call “bubbles
up” like an exception, gets caught by a handler, and the call gets replaced by
pure () or some other Mom () computation.
output :: Out o % r -> o -> Mom ()
output tag o = control0 tag \continue -> pure (Op (Output o continue))
A synonym specialized to strings.
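That synonym is presumably log, which the examples below use for tracing (note the import of Prelude hiding (log) at the top):
log :: Out String % r -> String -> Mom ()
log = output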
Example
An infinite output stream of the Fibonacci sequence.
fibonacci :: Out Int % r -> Mom a
fibonacci out = fib 0 1
where
fib !a !b = do
output out a
fib b (a + b)
Handler
Run a computation lazily and collect its output in a list.
collect :: (Out o % () -> Mom ()) -> [o]
collect f = runList do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output o continue) ->
pure (o : runList (handle tag (continue (pure ()))))
Pure () -> pure []
runList = fromMaybe [] . run
Test
testFibonacci :: IO ()
testFibonacci =
assert (take 8 (collect fibonacci)
== [0, 1, 1, 2, 3, 5, 8, 13])
Combining exceptions and output
Example
The big selling point of algebraic effects is that effects can be
combined smoothly. We can thus use log to trace the
execution flow of a program using throw and catch
without further ceremony.
This looks like your usual monadic program. The point is that everything lives
in the same monad Mom (which is operationally equal to IO),
so you do not have to worry about “lifting” or “unlifting” anything through a
transformer: the semantics of (>>=) do not change with every new effect, and
there isn’t the problem that “lifting” catch and other operations that are
actually handlers is counter-intuitive for many transformers, if possible at all.
To be fair, there remain
difficulties in this area even with
algebraic effects.
tracedCatch :: Out String % r -> Mom Bool
tracedCatch out = catch this onThrow
where
this exc = do
log out "Start"
_ <- throw exc "Boom"
log out "This is unreachable"
pure False
onThrow msg = do
log out ("Error: " ++ msg)
pure True
Test
testTracedCatch :: IO ()
testTracedCatch =
assert (collect (void . tracedCatch) ==
[ "Start"
, "Error: Boom" ])Silent handler
There can also be different ways of handling an effect. The following handler discards output instead of collecting it, for example to ignore debugging logs.
discardOutput :: (Out o % a -> Mom a) -> Mom a
discardOutput f = do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output _o continue) -> handle tag (continue (pure ()))
Pure a -> pure a
Input
Dually, there is an effect to request some input.
Operation
The input call is expected to return a result i. As before, the type of the
input _ operation must coincide with the domain Mom i of the continuation.
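Neither the In functor nor input appears in this extract; based on how listInput and connect consume them below, they plausibly read:
-- Input carries only the continuation, which waits for the requested value.
data In i a = Input (Mom i -> Mom a)

input :: In i % r -> Mom i
input tag = control0 tag \continue -> pure (Op (Input continue))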
Example
Output the cumulative sum of an input stream.
Like fibonacci, this is an infinite loop in IO.
It gets broken by control0 in input.
Until now, an infinite loop in IO would either have to be broken by an
exception (which makes it not actually infinite), or have to involve
concurrency.
csum :: In Int % r -> Out Int % r -> Mom a
csum inp out = go 0
where
go !acc = do
n <- input inp
let acc' = acc + n
output out acc'
go acc'
Handler
Supply a list of inputs and stop when we run out.
listInput :: [i] -> (In i % a -> Mom a) -> Mom (Maybe a)
listInput is f = do
tag <- newPromptTag
catch (\exc -> handle exc tag is (Pure <$> f tag))
(\() -> pure Nothing)
where
handle exc tag is action = do
next <- prompt tag action
case next of
Op (Input continue)
| i : is' <- is -> handle exc tag is' (continue (pure i))
| otherwise -> handle exc tag [] (continue (throw exc ()))
Pure a -> pure (Just a)
Test
testCsum :: IO ()
testCsum =
assert ((collect \out ->
void $ listInput [1 .. 5] \inp ->
csum inp out
) == [1, 3, 6, 10, 15])
Combining input and output: streaming
The input and output effect can be combined in a streaming fashion, alternating execution between the consumer and the producer.
Handler
Feed the output of one computation into the input of the other. Terminate whenever one side terminates, discarding the other.
connect :: (Out x % a -> Mom a) -> (In x % a -> Mom a) -> Mom a
connect producer consumer = do
out <- newPromptTag
inp <- newPromptTag
handleI out inp (Pure <$> producer out) (Pure <$> consumer inp)
where
handleI out inp produce consume = do
next <- prompt inp consume
case next of
Op (Input continue) -> handleO out inp produce continue
Pure a -> pure a
handleO out inp produce consuming = do
next <- prompt out produce
case next of
Op (Output o continue) ->
handleI out inp (continue (pure ())) (consuming (pure o))
Pure a -> pure a
Test
Connect two copies of the cumulative sum process: compute the cumulative sum of the cumulative sum.
csum2 :: In Int % () -> Out Int % () -> Mom ()
csum2 inp out = connect (\out' -> csum inp out') (\inp' -> csum inp' out)
testConnect :: IO ()
testConnect =
assert ((collect \out ->
void $ listInput [1 .. 5] \inp ->
csum2 inp out
) == [1, 4, 10, 20, 35])
Interacting with the real world
What sets IO apart from ST and Mom is that it can change the world.
We can define handlers to send output and receive input from the real world.
The result of these handlers must be in IO.
Printing output
Text output can be printed to stdout.
printOutput :: (Out String % () -> Mom ()) -> IO ()
printOutput f = momToIO do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Output o continue) -> pure do
putStrLn o
momToIO (handle tag (continue (pure ())))
Pure () -> pure (pure ())
momToIO = fromMaybe (pure ()) . run
Reading input
We can forward input from stdin into a
consumer computation.
readInput :: (In String % () -> Mom ()) -> IO ()
readInput f = momToIO do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Input continue) -> pure do
i <- getLine
momToIO (handle tag (continue (pure i)))
Pure () -> pure (pure ())
momToIO = fromMaybe (pure ()) . run
A drawback of this implementation is that for a computation that features both
input and output, these handlers are awkward to compose.
We can coerce IO to Mom so readInput can be composed with printOutput,
but that is a hacky solution that makes the type Mom a lie (it’s not supposed
to have side effects). A better solution may be to combine effects before
interpreting them in IO all at once.
State
No effect tutorial would be complete without the state effect.
Operations
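The State functor itself is not shown in this extract; a plausible reconstruction, inferred from how the operations below build it and how runState consumes it:
-- Get waits for the current state; Put carries the new state and waits for ().
data State s a = Get (Mom s -> Mom a) | Put s (Mom () -> Mom a)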
get :: State s % r -> Mom s
get tag = control0 tag \continue -> pure (Op (Get continue))
put :: State s % r -> s -> Mom ()
put tag s = control0 tag \continue -> pure (Op (Put s continue))
Handler
State-passing, no mutation.
runState :: s -> (State s % a -> Mom a) -> Mom (s, a)
runState s0 f = do
tag <- newPromptTag
handle tag s0 (Pure <$> f tag)
where
handle tag s action = do
next <- prompt tag action
case next of
Op (Get continue) -> handle tag s (continue (pure s))
Op (Put s' continue) -> handle tag s' (continue (pure ()))
Pure a -> pure (s, a)
Example
Again, combining state with logging is effortless, because effects live in the same underlying monad.
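The incr helper used below is also missing from this extract; a minimal sketch:
incr :: State Int % r -> Mom ()
incr st = do
  n <- get st
  put st (n + 1)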
logState :: Out String % r -> State Int % s -> Mom ()
logState out st = do
n <- get st
log out (show n)
incr2 :: Out String % r -> State Int % s -> Mom ()
incr2 out st = do
incr st
logState out st
incr st
logState out stTest
testState :: IO ()
testState = do
assert ((collect \out -> runState 0 (incr2 out) *> pure ()) == ["1", "2"])
assert (run (discardOutput \out -> runState 0 (incr2 out)) == Just (2, ()))
Nondeterminism
The examples above are quite sequential in nature.
Mom can also replace the list monad.
Operation
Choose one element in a list.
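The Nondet functor is not shown in this extract; since the element type is existentially quantified, it is presumably a GADT along these lines (GADTs is among the enabled extensions):
-- Choose pairs the candidate list with the continuation expecting the choice.
data Nondet a where
  Choose :: [x] -> (Mom x -> Mom a) -> Nondet a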
choose :: Nondet % r -> [x] -> Mom x
choose tag xs = control0 tag \continue -> pure (Op (Choose xs continue))
Example
nameTheorems :: Nondet % r -> Mom String
nameTheorems nd = do
name1 <- choose nd ["Church", "Curry"]
name2 <- choose nd ["Turing", "Howard"]
result <- choose nd ["thesis", "isomorphism"]
pure (name1 ++ "-" ++ name2 ++ " " ++ result)
Handler
Use the output effect to stream all results of a nondeterministic computation. Here, the continuation is not used linearly: it is called once for every element in the given list.
enumerate :: (Nondet % a -> Mom a) -> Out a % r -> Mom ()
enumerate f out = do
tag <- newPromptTag
handle tag (Pure <$> f tag)
where
handle tag action = do
next <- prompt tag action
case next of
Op (Choose xs continue) -> for_ xs (handle tag . continue . pure)
Pure a -> output out a
Test
testEnumerate :: IO ()
testEnumerate = do
assert (collect (enumerate nameTheorems) ==
[ "Church-Turing thesis"
, "Church-Turing isomorphism"
, "Church-Howard thesis"
, "Church-Howard isomorphism"
, "Curry-Turing thesis"
, "Curry-Turing isomorphism"
, "Curry-Howard thesis"
, "Curry-Howard isomorphism"
])
Concurrency
Earlier, the streaming handler connect interleaved execution of one consumer
and one producer thread. Here is a cooperative concurrency effect that lets us
dynamically fork any number of threads and interleave them.
Operations
Fork a thread to run the given computation.
fork :: Conc % r -> Mom () -> Mom ()
fork tag thread = control0 tag \continue -> pure (Op (Fork thread continue))
Cooperative concurrency: threads must yield explicitly.
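Neither the Conc functor nor yield is included in this extract; judging from how runConc consumes them below, they plausibly read:
-- Fork carries the new thread and the forking thread's continuation;
-- Yield carries only the yielding thread's continuation.
data Conc a
  = Fork (Mom ()) (Mom () -> Mom a)
  | Yield (Mom () -> Mom a)

yield :: Conc % r -> Mom ()
yield tag = control0 tag \continue -> pure (Op (Yield continue))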
Example
A thread that repeats an output value three times.
simpleThread :: Out String % r -> Conc % s -> Int -> Mom ()
simpleThread out conc n = do
log out (show n)
yield conc
log out (show n)
yield conc
log out (show n)
yield conc
Interleave 111, 222, 333.
interleave123 :: Out String % r -> Conc % s -> Mom ()
interleave123 out conc = do
fork conc (simpleThread out conc 1)
fork conc (simpleThread out conc 2)
fork conc (simpleThread out conc 3)
Handler
A round-robin scheduler. handle keeps track of a queue of threads.
It runs the first thread until the next event. If the thread yields,
its continuation is pushed to the end of the queue. If the thread
forks another thread, the forked thread is pushed to the end of the queue,
and we continue in the main thread (forking does not yield).
If the thread terminates, we remove it from the queue.
runConc :: (Conc % () -> Mom ()) -> Mom ()
runConc f = do
tag <- newPromptTag
handle tag [Pure <$> f tag]
where
handle tag [] = pure ()
handle tag (thread : threads) = do
next <- prompt tag thread
case next of
Op (Yield continue) -> handle tag (threads ++ [continue (pure ())])
Op (Fork th continue) -> handle tag (continue (pure ()) : threads ++ [Pure <$> th])
Pure () -> handle tag threads
Test
testInterleave :: IO ()
testInterleave =
assert ((collect \out -> runConc \conc -> interleave123 out conc)
== ["1", "2", "3", "1", "2", "3", "1", "2", "3"])Conclusion
Primitive delimited continuations in Haskell give us the power to jump around the stack to implement many kinds of effects. Under the hood, those operations live in the IO monad, grounding effectful code in a familiar execution model.
For those new to the topic, I hope that these examples may serve as a good starting point to experiment with delimited continuations and algebraic effects in Haskell.
The system implemented here is as rudimentary as it gets. To define new effects and handlers, we use the new primitives directly, which is dangerous. This was deliberate to provide material to familiarize oneself with those primitives. Moreover, on the one hand, a type system to keep track of the scope of delimited continuations is a non-trivial ask. On the other hand, the examples here all follow a regular structure, so there is probably a way to encapsulate the primitives, trading off some expressiveness for a safe interface to define new effects and handlers.
Named handlers—via prompt tags—occupy an interesting spot in the scale of safety guarantees. It is imperfect, even very easy to circumvent. But if you’re not working against it, it is still a neat way to prevent simple mistakes. This system can be reinforced further using rank-2 polymorphism, a technique described in:
- First-Class Names for Effect Handlers, Ningning Xie, Youyou Cong, Kazuki Ikemori, Daan Leijen (OOPSLA 2022).
Interestingly, prompt tags were not part of the original proposal, and they are not used by eff, the effect system which gave rise to Alexis King’s GHC proposal. Prompt tags were added during the feedback process to make the primitives type-safe by default.
Now is an exciting time for algebraic effects/delimited continuations, as they are making their way into industrial languages: Haskell, OCaml, WebAssembly.
All of this is executable
Quantified constraints and type families
QuantifiedConstraints is an extension from GHC 8.6 that lets us
use forall in constraints.
It lets us express constraints for instances of higher-kinded types like Fix:
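The example itself does not appear in this extract; the usual illustration (modulo QuantifiedConstraints and possibly UndecidableInstances) looks like this:
newtype Fix f = Fix (f (Fix f))

-- Eq (Fix f) holds whenever Eq can be lifted through f.
instance (forall a. Eq a => Eq (f a)) => Eq (Fix f) where
  Fix x == Fix y = x == y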
Other solutions existed previously, but they’re less elegant:
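One earlier workaround of that kind (whether it is the exact one the post showed, I cannot tell from this extract) lifts equality through f by hand, using Eq1 from Data.Functor.Classes:
-- needs: import Data.Functor.Classes (Eq1, liftEq)
eqFix :: Eq1 f => Fix f -> Fix f -> Bool
eqFix (Fix x) (Fix y) = liftEq eqFix x y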
It also lets us say that a monad transformer indeed transforms monads:
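That snippet is also missing here; the shape suggested by the User Guide is a class whose superclass is a quantified constraint:
-- Every transformed monad (t m) is again a Monad.
class (forall m. Monad m => Monad (t m)) => MonadTrans t where
  lift :: Monad m => m a -> t m a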
(Examples lifted from the GHC User Guide on QuantifiedConstraints, section Motivation.)
One restriction is that the conclusion of a quantified constraint cannot mention a type family.
type family F a
-- (forall a. C (F a)) -- Illegal type family application in a quantified constraint
Quantified constraints can be thought of as providing local instances, and they are subject to a similar restriction on the shape of instance heads, so that instance resolution may try to match required constraints with the heads of existing instances.
Type families are not matchable: we cannot determine whether an applied
type family F a matches a type constructor T in a manner satisfying the
properties required by instance resolution (“coherence”). So type families
can’t appear in the conclusion of a quantified constraint.
The quantified constraint trick
Step 1
To legalize type families in quantified constraints, all we need is a class synonym:
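The synonym in question, which the recap below spells out as well:
class    C (F a) => CF a
instance C (F a) => CF a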
That CF a is equivalent to C (F a), and forall a. CF a is legal.
Step 2?
Since GHC 9.2, Step 1 alone solves the problem. It Just Works™. And I don’t know why.
Before that, for GHC 9.0 and prior, we also needed to hold the compiler’s hand and tell it how to instantiate the quantified constraint.
Indeed, now functions may have constraints of the form forall a. CF a,
which should imply C (F x) for any x.
Although CF and C (F x) are logically related, when C (F x) is required,
that triggers a search for instances of the class C, and not the CF which
is provided by the quantified constraint.
The search would fail unless some hint is provided to the compiler.
When you require a constraint C (F x), insert a type annotation mentioning
the CF x constraint (using the CF class instead of C).
Inside the annotation (to the left of ::), we are given CF x, from which C (F x) is inferred as a superclass. Outside the annotation, we are requiring CF x,
which is trivially solved by the quantified constraint forall a. CF a.
Recap
-- Mixing quantified constraints with type families --
class C a
type family F a
-- forall a. C (F a) -- Nope.
class C (F a) => CF a -- Class synonym
instance C (F a) => CF a
-- forall a. CF a -- Yup.
-- Some provided function we want to call.
f :: C (F t) => t
-- A function we want to implement using f.
g :: (forall a. CF a) => t
g = f -- OK on GHC >= 9.2
g = f :: CF t => t -- Annotation needed on GHC <= 9.0
The part of that type annotation that really matters is the constraint. The rest of the type to the right of the arrow is redundant. Another way to write only the constraint uses the following identity function with a fancy type:
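Its definition is not in this extract; given the call with @(RepFmapRep a a' f) further down, one plausible definition is the following (ConstraintKinds, RankNTypes and TypeApplications assumed):
with :: forall c r. c => (c => r) -> r
with x = x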
So you can supply the hint like this instead:
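Presumably something like this, reusing g and CF from the recap and the with above:
g = with @(CF t) f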
Application: generic-functor
What do I need that trick for? It comes up in generic metaprogramming.
Imagine deriving Functor for Generic types (no Generic1, which is not as
general as you might hope). One way is to implement the following class on
generic representations:
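The class itself is not shown in this extract; judging from how it is used below, it is a multi-parameter class along these lines (its instances, which the post later reveals to be incoherent, are omitted):
class RepFmap a a' x x' where
  repFmap :: (a -> a') -> x -> x'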
A type constructor f :: Type -> Type will be a Functor when its
generic representation (Rep) implements RepFmap a a'…
for all a, a'.
-- Class synonym for generically derivable functors
class (forall a. Generic (f a), forall a a'. RepFmap a a' (Rep (f a) ()) (Rep (f a') ())) => GFunctor f
instance ... -- idem (class synonym)
-- Wait a second...
But that is illegal, because the type family Rep occurs in the conclusion of
a quantified constraint.
Time for the trick! We give a new name to the conclusion:
class RepFmap a a' (Rep (f a) ()) (Rep (f a') ()) => RepFmapRep a a' f
instance ... -- idem (class synonym)
And we can use it in a quantified constraint:
-- Now this works!
class (forall a. Generic (f a), forall a a'. RepFmapRep a a' f) => GFunctor f
instance ... -- idem (class synonym)
To obtain the final generic implementation of fmap, we wrap repFmap between to and from.
gfmap :: forall f a a'. GFunctor f => (a -> a') -> f a -> f a'
gfmap f =
with @(RepFmapRep a a' f) -- Hand-holding for GHC <= 9.0
(to @_ @() . repFmap f . from @_ @())
Et voilà.
Appendix: Couldn’t we do this instead?
If you’ve followed all of that, there’s one other way you might try defining
gfmap without QuantifiedConstraints, by just listing the three constraints
actually needed in the body of the function.
-- Dangerous gfmap!
gfmap ::
Generic (f a) =>
Generic (f a') =>
RepFmap a a' (Rep (f a) ()) (Rep (f a') ()) =>
(a -> a') -> f a -> f a'
gfmap f = to @_ @() . repFmap f . from @_ @()
This is okay as long as it is only ever used to implement fmap as in:
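Presumably an instance of this shape, for some Generic type, say the two-parameter T used in the examples below (illustrative only, since RepFmap's instances are left unimplemented in this post):
instance Functor (T a) where
  fmap = gfmap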
Any other use voids a guarantee you didn’t know you expected.
The thing I haven’t told you is that RepFmap is implemented with…
incoherent instances!1 In fact, this gfmap may behave differently
depending on how it is instantiated at compile time.
For example, for a functor with a field of constant type:
gfmap @(T a) @b @b' where a, b and b' are distinct type variables
behaves like fmap should. But gfmap @(T Int) @Int @Int
will unexpectedly apply its argument function to every field.
They all have type Int, so a function Int -> Int can and will be applied to
all fields.
I could demonstrate this if I had implemented RepFmap…
Luckily, there is a more general version of this “dangerous gfmap” readily
available in my library
generic-functor.
It can be very incoherent, but if you follow some rules, it can also be very
fun to use.
Playing with fire
gsolomap2 is a function from generic-functor that can implement
fmap, and much more.
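The definitions of T and fmapT are not included in this extract. To make the examples below concrete, here is one plausible setup, with a hypothetical type T chosen to match the C 0 0 0 examples further down, and assuming gsolomap is imported from the library's Generic.Functor module:
-- needs: import GHC.Generics (Generic)
-- needs: import Generic.Functor (gsolomap)
-- Hypothetical example type: two fields at the "constant" parameter a and
-- one field at the mapped parameter b.
data T a b = C a a b
  deriving (Eq, Show, Generic)

fmapT :: (b -> b') -> T a b -> T a b'
fmapT = gsolomap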
Map over the first parameter if you prefer:
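Perhaps (firstT is a hypothetical name):
firstT :: (a -> a') -> T a b -> T a' b
firstT = gsolomap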
Or map over both type parameters at once:
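And for both parameters at once (again a hypothetical name):
bothT :: (a -> a') -> T a a -> T a' a'
bothT = gsolomap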
I don’t know what to call this, but gsolomap also does what you might guess
from this type:
watT ::
(a -> a') ->
T (a , a ) ((a -> a') -> Maybe a ) ->
T (a', a') ((a' -> a ) -> Maybe a')
watT = gsolomap
It’s important to specialize gsolomap with distinct type variables
(a and a').
You cannot refactor code by inlining a function if its body uses gsolomap,
as it risks breaking that requirement.
Witnessing incoherence
For an example of surprising result caused by incoherence, apply the fmapT
defined above to some concrete arguments. See how the result changes when you
replace fmapT with its definition, gsolomap.
fmapT ((+1) :: Int -> Int) (C 0 0 0) == C 0 0 1 :: T Int Int
gsolomap ((+1) :: Int -> Int) (C 0 0 0) == C 1 1 1 :: T Int Int -- Noooooo...
(Gist of those gsolomap (counter)examples)
This is why gfmap’s signature should use quantified constraints:
this guarantees that when the RepFmap constraint is solved,
the first two parameters are going to be distinct type variables,
thanks to the universal quantification (forall a a').
Thus, incoherence is hidden away.
Following that recipe, generic-functor contains safe implementations of
Functor, Foldable, Traversable, Bifunctor, and Bitraversable.
In particular, the type of gfmap guarantees that it has a unique
inhabitant satisfying gfmap id = id, and this property is quite
straightforward to check by visual inspection of the implementation.
After all, gfmap will essentially do one of three things:
(1) it will be id on types that don’t mention the type parameters
in its function argument a -> a', (2) it will apply the provided function
f, or (3) it will fmap (or bimap, or dimap) itself through a type
constructor. In all cases (with some inductive reasoning for (3)),
if f = id, then gfmap f = id.
The dangerous gfmap (without QuantifiedConstraints) or gsolomap fail this
property, because the extra occurrences of a and a' in its constraint make
their signatures have a different “shape” from fmap.
The trade-off is that those safe functions can’t do the same crazy things
as gsolomap above.
Combining my two favorite topics, I’ve always wanted to mechanize combinatorics in Coq.2 An immediate challenge is to formalize the idea of “set”.3 We have to be able to define the set of things we want to count. It turns out that there are at least two ways of encoding sets in type theory: sets as types, and sets as predicates. They are suitable for defining different classes of operations: sums (disjoint union) are a natural operation on types, while unions and intersections are naturally defined on predicates.
The interplay between these two notions of sets, and finiteness, will then let us prove the standard formula for the cardinality of unions, aka. the binary inclusion-exclusion formula:
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
Imports and options
From Coq Require Import ssreflect ssrbool.
Set Implicit Arguments.
Sets as types
The obvious starting point is to view a type as the set of its inhabitants.
How do we count its inhabitants?
We will say that a set A has cardinality n if there is a bijection between
A and the set {0 .. n-1} of natural numbers between 0 and n-1.
Bijections
A bijection is a standard way to represent a one-to-one correspondence
between two sets, with a pair of inverse functions.
We define the type bijection A B as a record containing the two functions
and a proof of their inverse relationship.
Record is_bijection {A B} (to : A -> B) (from : B -> A) : Prop :=
{ from_to : forall a, from (to a) = a
; to_from : forall b, to (from b) = b }.
Record bijection (A B : Type) : Type :=
{ bij_to : A -> B
; bij_from : B -> A
; bij_is_bijection :> is_bijection bij_to bij_from }.
Infix "<-->" := bijection (at level 90) : type_scope.
We say that A and B are isomorphic when there exists a bijection between
A and B. Isomorphism is an equivalence relation: reflexive, symmetric,
transitive.4
Definition bijection_refl {A} : A <--> A.
Admitted. (* Easy exercise *)
Definition bijection_sym {A B} : (A <--> B) -> (B <--> A).
Admitted. (* Easy exercise *)
Definition bijection_trans {A B C} : (A <--> B) -> (B <--> C) -> (A <--> C).
Admitted. (* Easy exercise *)
Infix ">>>" := bijection_trans (at level 40).
Finite sets
Our “bijective” definition of cardinality shall rely on a primitive,
canonical family of finite types {0 .. n-1} that is taken for granted.
We can define them as the following sigma type, using the familiar set
comprehension notation, also known as ordinal in math-comp:
Definition fin (n : nat) : Type := { p | p < n }.
An inhabitant of fin n is a pair of a p : nat and
a proof object of p < n. Such proofs objects are unique for a given
p and n, so the first component uniquely determines the second component,
and fin n does have exactly n inhabitants.5
Finiteness
We can now say that a type A has cardinality n if there is a bijection
between A and fin n, i.e., there is an inhabitant of A <--> fin n.
Note that this only defines finite cardinalities, which is fine for doing
finite combinatorics. Infinity is really weird so let’s not think about it.
As a sanity check, you can verify the cardinalities of the usual suspects,
bool, unit, and Empty_set.
Definition bijection_bool : bool <--> fin 2.
Admitted. (* Easy exercise *)
Definition bijection_unit : unit <--> fin 1.
Admitted. (* Easy exercise *)
Definition bijection_Empty_set : Empty_set <--> fin 0.
Admitted. (* Easy exercise *)
A type A is finite when it has some cardinality n : nat.
When speaking informally, it’s common to view finiteness as a property,
a thing that a set either is or is not. To prove finiteness
is merely to exhibit the relevant data: a number to be the cardinality,
and an associated bijection (which we call an enumeration of A,
enum for short).
Hence we formalize “finiteness” as the type of that data.
Record is_finite (A : Type) : Type :=
{ card : nat
; enum : A <--> fin card }.
Further bundling is_finite A proofs with their associated set A, we obtain
a concept aptly named “finite type”.6 A finite type is a type A paired with
a proof of is_finite A.
Record finite_type : Type :=
{ ft_type :> Type
; ft_is_finite :> is_finite ft_type }.
We leverage coercions (indicated by :>) to lighten the notation of
expressions involving finite_type.
The first coercion ft_type lets us use a finite_type as a Type.
So if E : finite_type, we can write the judgement that
“e is an element of E” as e : E, which implicitly expands to
the more cumbersome e : ft_type E.
Similarly, the second coercion ft_is_finite lets us access
the evidence of finiteness without naming that field. In particular,
we can write the cardinality of E : finite_type as card E,
as if card were a proper field of E rather than the nested record it
actually belongs to. This is a convenient mechanism for overloading,
letting us reuse the name card(inality) even though records technically
cannot have fields with the same name.
With that, we define #|A| as sugar for card A:
Notation "'#|' A '|'" := (card A).
Some notation boilerplate
Declare Scope fintype_scope.
Delimit Scope fintype_scope with fintype.
Bind Scope fintype_scope with finite_type.
Uniqueness of cardinality
The phrase “cardinality of a set” suggests that cardinality is an inherent property of sets. But now we’ve defined “finite type” essentially as a tuple where the cardinality is just one component. What’s to prevent us from putting a different number there, for the same underlying type?
We can prove that this cannot happen. Cardinality is unique: any two finiteness proofs for the same type must yield the same cardinality.
(The proof is a little tedious and technical.)
Theorem card_unique {A} (F1 F2 : is_finite A) : card F1 = card F2.
Admitted. (* Intermediate exercise *)
A slightly more general result is that isomorphic types (i.e., related by
a bijection) have the same cardinality. It can first be proved
in terms of is_finite, from which a corollary in terms of finite_type
follows.
Theorem card_bijection {A B} (FA : is_finite A) (FB : is_finite B)
: (A <--> B) -> card FA = card FB.
Admitted. (* Like card_unique *)
Theorem card_bijection_finite_type {A B : finite_type}
: (A <--> B) -> #|A| = #|B|.
Proof.
apply card_bijection.
Qed.
The converse is also true and useful: two types with the same cardinality are isomorphic.
Theorem bijection_card {A B} (FA : is_finite A) (FB : is_finite B)
: card FA = card FB -> (A <--> B).
Admitted. (* Easy exercise *)
Theorem bijection_card_finite_type {A B : finite_type}
: #|A| = #|B| -> (A <--> B).
Proof.
apply bijection_card.
Qed.
Operations on finite sets
Sum
The sum of sets is also known as the disjoint union.
Inductive sum (A B : Type) : Type :=
| inl : A -> A + B
| inr : B -> A + B
where "A + B" := (sum A B) : type_scope.
sum is a binary operation on types. We must work to
make it an operation on finite types.
There is a bijection between fin n + fin m (sum of sets)
and fin (n + m) (sum of nats).
Definition bijection_sum_fin {n m} : fin n + fin m <--> fin (n + m).
Admitted. (* Intermediate exercise *)
The sum is a bifunctor.
Definition bijection_sum {A A' B B'}
: (A <--> B) -> (A' <--> B') -> (A + A' <--> B + B').
Admitted. (* Easy exercise *)
Combining those facts, we can prove that the sum of two finite sets is finite (finite_sum),
and the cardinality of the sum is the sum of the cardinalities (card_sum).
Definition is_finite_sum {A B} (FA : is_finite A) (FB : is_finite B)
: is_finite (A + B) :=
{| card := #|FA| + #|FB|
; enum := bijection_sum (enum FA) (enum FB) >>> bijection_sum_fin |}.
Definition finite_sum (A B : finite_type) : finite_type :=
{| ft_type := A + B ; ft_is_finite := is_finite_sum A B |}.
Infix "+" := finite_sum : fintype_scope.
Theorem card_sum {A B : finite_type} : #|(A + B)%fintype| = #|A| + #|B|.
Proof.
reflexivity.
Qed.
Product
The cartesian product has structure dual to the sum.
Inductive prod (A B : Type) : Type :=
| pair : A -> B -> A * B
where "A * B" := (prod A B) : type_scope.
- There is a bijection
fin n * fin m <--> fin (n * m). - The product is a bifunctor.
- The product of finite sets is finite.
- The cardinality of the product is the product of the cardinalities.
Coq code
Definition bijection_prod_fin {n m} : fin n * fin m <--> fin (n * m).
Admitted. (* Intermediate exercise *)
Definition bijection_prod {A A' B B'}
: (A <--> B) -> (A' <--> B') -> (A * A' <--> B * B').
Admitted. (* Easy exercise *)
Definition is_finite_prod {A B} (FA : is_finite A) (FB : is_finite B)
: is_finite (A * B) :=
{| card := #|FA| * #|FB|
; enum := bijection_prod (enum FA) (enum FB) >>> bijection_prod_fin |}.
Definition finite_prod (A B : finite_type) : finite_type :=
{| ft_type := A * B ; ft_is_finite := is_finite_prod A B |}.
Infix "*" := finite_prod : fintype_scope.
Theorem card_prod {A B : finite_type} : #|(A * B)%fintype| = #|A| * #|B|.
Proof.
reflexivity.
Qed.
Sets as predicates
Two other common operations on sets are union and intersection.
However, those operations don’t fit in the view of sets as types.
While set membership x ∈ X is a proposition, type inhabitation x : X is
a judgement, which is a completely different thing,7 so we need a different
approach.
The idea of set membership x ∈ X as a proposition presumes that x
and X are entities that exist independently of each other. This suggests
that there is some “universe” that elements x live in, and the
sets X under consideration are subsets of that same universe.
We represent the universe by a type A, and sets (i.e., “subsets of the universe”)
by predicates on A.
Definition set_of (A : Type) := (A -> bool).
Hence, if x : A is an element of the universe, and X : set_of A is a set
(subset of the universe), we will denote set membership x ∈ X simply as X x
(x satisfies the predicate X).
Those predicates are boolean, i.e., decidable. This is necessary in several
constructions and proofs here, notably to prove that the union or intersection
of finite sets is finite. We rely on a coercion to implicitly convert booleans
to Prop: is_true : bool >-> Prop, which is exported by ssreflect.
Union, intersection, complement
Those common set operations correspond to the usual logical connectives.
Section Operations.
Context {A : Type}.
Definition union' (X Y : set_of A) : set_of A := fun a => X a || Y a.
Definition intersection' (X Y : set_of A) : set_of A := fun a => X a && Y a.
Definition complement' (X : set_of A) : set_of A := fun a => negb (X a).
End Operations.
Define the familiar infix notation for union and intersection.
Declare Scope set_of_scope.
Delimit Scope set_of_scope with set_of.
Bind Scope set_of_scope with set_of.
Infix "∪" := union' (at level 40) : set_of_scope.
Infix "∩" := intersection' (at level 40) : set_of_scope.
Finiteness
Again, we will characterize finite sets using bijections to fin n.
We first transform the set X into a type to_type X, so we can form
the type of bijections to_type X <--> fin n. Like fin, we define
to_type X as a sigma type. Thanks to the predicate X being boolean,
there is at most one proof p : X a for each a, so the type
{ a : A | X a } has exactly one inhabitant for each inhabitant a : A
satisfying X a.
Definition to_type {A : Type} (X : set_of A) : Type := { a : A | X a }.
Coercion to_type : set_of >-> Sortclass.
We obtain a notion of finite set by imitating the structure of finite_type.
The set-as-predicate X is finite if the set-as-type to_type X is finite.
Record finite_set_of (A : Type) : Type :=
{ elem_of :> set_of A
; fso_is_finite :> is_finite (to_type elem_of)
}.
Similarly, a finite_set_of can be coerced to a finite_type.
Definition to_finite_type {A} (X : finite_set_of A) : finite_type :=
{| ft_type := elem_of X
; ft_is_finite := X |}.
Coercion to_finite_type : finite_set_of >-> finite_type.
Finite unions and intersections
We then prove that the union and intersection of finite sets are finite. This is actually fairly challenging, since proving finiteness means to calculate the cardinality of the set and to construct the associated bijection. Unlike sum and product, there is no simple formula for the cardinality of union and intersection. One candidate may seem to be the binary inclusion-exclusion formula:
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
But that only gives the cardinality of the union in terms of the intersection, or vice versa, and we don’t know either yet.
Rather than constructing the bijections directly, we decompose the proof.
The intuition is that X ∪ Y and X ∩ Y can easily be “bounded” by known
finite sets, namely X + Y and X respectively. By “bounded”, we mean that
there is an injection from one set to the other.
The standard definition of injectivity is via an implication
f x = f y -> x = y. However, a better definition for our purposes
comes from a one-sided inverse property: a function f : A -> B is
a section if there exists another function g : B -> A (called a retraction)
such that g (f x) = x.
Every section is an injection, but the converse requires the law of excluded
middle.
Record is_section {A B} (to : A -> B) (from : B -> A) : Prop :=
{ s_from_to : forall a, from (to a) = a }.
Record section (A B : Type) : Type :=
{ s_from : A -> B
; s_to : B -> A
; s_is_section : is_section s_from s_to }.
The point is that, given a section to a finite set, section A (fin n),
we can construct a bijection A <--> fin m for some m smaller than n.
We formalize this result with a proof-relevant sigma type.
Definition section_bijection (A : Type) (n : nat)
: section A (fin n) -> { m & A <--> fin m }.
Admitted. (* Hard exercise *)
This construction is rather involved. It is much more general than when we were looking specifically at union and intersection, but at the same time it is easier to come up with as it abstracts away the distracting details of those operations.
Now there is a section from X ∪ Y to X + Y,
and from X ∩ Y to X.
Definition section_union {A} (X Y : set_of A)
: section (X ∪ Y)%set_of (X + Y).
Admitted. (* Easy exercise *)
Definition section_intersection {A} (X Y : set_of A)
: section (X ∩ Y)%set_of X.
Admitted. (* Easy exercise *)
We can then rely on the finiteness of X and X + Y to extend those
sections to fin n for some n via the following theorem:
Theorem section_extend (A B C : Type)
: section A B -> (B <--> C) -> section A C.
Admitted. (* Easy exercise *)
Definition section_union' {A} (X Y : finite_set_of A)
: section (X ∪ Y)%set_of (fin (#|X| + #|Y|)).
Proof.
eapply section_extend.
- apply section_union.
- apply is_finite_sum.
Qed.
Definition section_intersection' {A} (X Y : finite_set_of A)
: section (X ∩ Y)%set_of (fin #|X|).
Proof.
eapply section_extend.
- apply section_intersection.
- apply enum.
Qed.
Finally, by section_bijection, we obtain finiteness proofs of union' and
intersection', which let us define union and intersection properly as operations
on finite sets.
Theorem is_finite_union {A} {X Y : set_of A}
(FX : is_finite X) (FY : is_finite Y)
: is_finite (X ∪ Y)%set_of.
Proof.
refine {| enum := projT2 (section_bijection _) |}.
eapply (section_extend (B := X + Y)%type).
- apply section_union.
- apply (is_finite_sum FX FY).
Qed.
Theorem is_finite_intersection {A} {X Y : set_of A}
(FX : is_finite X) (FY : is_finite Y)
: is_finite (X ∩ Y)%set_of.
Proof.
refine {| enum := projT2 (section_bijection _) |}.
eapply section_extend.
- apply section_intersection.
- apply (enum FX).
Qed.
Definition union {A} (X Y : finite_set_of A) : finite_set_of A :=
{| fso_is_finite := is_finite_union X Y |}.
Definition intersection {A} (X Y : finite_set_of A) : finite_set_of A :=
{| fso_is_finite := is_finite_intersection X Y |}.
Declare Scope fso_scope.
Delimit Scope fso_scope with fso.
Bind Scope fso_scope with finite_set_of.
Infix "∪" := union (at level 40) : fso_scope.
Infix "∩" := intersection (at level 40) : fso_scope.
Hereafter, ∪ and ∩ will denote finite unions and intersections.
#[local] Open Scope fso_scope.
Inclusion-exclusion
#|X ∪ Y| = #|X| + #|Y| - #|X ∩ Y|
To prove that formula, it’s probably not a good idea to look at how ∪ and ∩
compute their cardinalities. A better idea is to construct a bijection, which
implies an equality of cardinalities by card_bijection.
To start, subtractions are bad, so we rewrite the goal:
#|X ∪ Y| + #|X ∩ Y| = #|X| + #|Y|
Now we look for a bijection (X ∪ Y) + (X ∩ Y) <--> X + Y.
It gets a bit tricky because of the dependent types.
Definition inclusion_exclusion_bijection {A} (X Y : finite_set_of A)
: (X ∪ Y)%set_of + (X ∩ Y)%set_of <--> X + Y.
Admitted. (* Hard exercise *)
Isomorphic sets have the same cardinality (by theorem card_bijection_finite_type).
The resulting equation simplifies to the binary inclusion-exclusion identity,
because #|A + B| equals #|A| + #|B| definitionally.
So the proof consists simply of applying that theorem with the above bijection.
Theorem inclusion_exclusion {A} (X Y : finite_set_of A)
: #|X ∪ Y| + #|X ∩ Y| = #|X| + #|Y|.
Proof.
apply (@card_bijection_finite_type ((X ∪ Y) + (X ∩ Y)) (X + Y)).
apply inclusion_exclusion_bijection.
Qed.
Conclusion
To formalize mathematics, it’s often useful to revisit our preconceptions about fundamental concepts. To carry out even basic combinatorics in type theory, it’s useful to distinguish two views of the naive notion of set.
For example, when we say “union”, we really mean one of two things depending on the context. Either the sets are obviously disjoint, so we really mean “sum”: this corresponds to viewing sets as types. Or we implicitly know that the two sets contain the same “type” of elements a priori, so the overlap is something we have to worry about explicitly: this corresponds to viewing sets as predicates on a given universe.
Ironically, when making restaurant reservations, I still occasionally forget to count myself.↩︎
The code from this post is part of this project I’ve started here. Also check out Brent Yorgey’s thesis: Combinatorial Species and Labelled Structures (2014).↩︎
Speaking of sets, it’s important to distinguish naive set theory from axiomatic set theory. Naive set theory is arguably what most people think of when they hear “set”. It is a semi-formal system for organizing mathematics: there are sets, they have elements, and there are various operations to construct and analyze sets, but overall we don’t think too hard about what sets are (hence, “semi-formal”). When this blog post talks about sets, it is in the context of naive set theory. Axiomatic set theory is formal, with rules that are clear enough to be encoded in a computer. The name “axiomatic set theory” is a stroke of marketing genius, establishing it as the “standard” way of formalizing naive set theory, and thus, all of mathematics, as can be seen in most introductory courses on formal logic. Historically, Zermelo’s set theory was formulated at around the same time as Russell’s type theory, and type theory is at the root of currently very active areas of programming language theory and formal methods.↩︎
Bijections actually form a groupoid (a “proof-relevant equivalence relation”).↩︎
We could also have defined fin as the inductive type of bounded naturals, which is named Fin.t in the standard library. Anecdotal experience suggests that the sigma type is more beginner-friendly. But past a certain level of familiarity, I think they are completely interchangeable.
Inductive fin' : nat -> Type := | F0 : forall n, fin' (S n) | FS : forall n, fin' n -> fin' (S n).
The definition of fin as a sigma type relies on details of the definition of the order relation _ < _. Other definitions may allow the proposition p < n to be inhabited by multiple proof objects, causing fin n to have “more” than n inhabitants unless they are collapsed by proof irrelevance.↩︎
math-comp has a different but equivalent definition of fintype.↩︎
… if you know what those words mean.↩︎