A couple months ago, I asked for help creating a web interface for Disco, a student-oriented programming language for teaching functional programming and discrete mathematics. I am very happy to report that Heinrich Apfelmus responded to the call, and this is the result:
It’s fairly bare bones at the moment, but it’s a fantastic place to start, and far more than I would have been able to accomplish in a few weeks. Give it a try, and let me know of any bugs you find! You are also very welcome to contribute fixes, features, etc. on GitHub:
If you want to know more about Disco, you can read a paper about it here, or read the language documentation.
tl;dr: I would like to have a web interface for Disco, a student-oriented programming language for teaching functional programming and discrete mathematics, which is implemented in Haskell. I’m looking for others interested in helping build it. If you like building web stuff with Haskell compiled to WASM and want to have a positive impact on students learning mathematics and functional programming, get in touch!
Disco
For the past nine (!) years I have been developing Disco, a functional teaching language for use in a discrete mathematics course. It features first-class functions, polymorphism, and recursive algebraic data types, along with various built-in mathematical niceties and syntax that is designed to be close to mathematical practice.
As a simple example, here’s a recursive function in Disco to compute the hyperbinary numbers:
h : N -> N
h(0) = 1
h(2n) = h(n) + h(n .- 1)
h(2n + 1) = h(n)
Since recursive functions in Disco are automatically memoized, this evaluates almost instantly even on very large numbers:
Disco> h(1000000)
1287
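For comparison, here is a rough Haskell translation of the same function (Disco itself is implemented in Haskell), with the memoization made explicit via a lazy array. The name hyperbinary and the array-based memo table are my own choices for this sketch, not part of Disco:

```haskell
import Data.Array (listArray, (!))

-- Memoized version of the recurrence above: a lazy array serves as the
-- memo table, so each h(k) is computed at most once.
hyperbinary :: Int -> Integer
hyperbinary n = memo ! n
  where
    memo = listArray (0, n) (map h [0 .. n])
    h 0 = 1
    h k
      | even k    = memo ! q + memo ! (q - 1)  -- h(2q) = h(q) + h(q-1), q >= 1
      | otherwise = memo ! q                   -- h(2q+1) = h(q)
      where
        q = k `div` 2
```

For example, hyperbinary 6 yields 3, counting the three hyperbinary representations of 6 (namely 4+2, 4+1+1, and 2+2+1+1).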
If you want to know more about Disco, you can read a paper about it here, or read the language documentation.
The problem
I want students in my Discrete Mathematics course (or any Discrete Mathematics course) to be able to use Disco: to play around at a REPL, write and test their own code, and so on.
However, getting students to install Haskell and build Disco from source is a total non-starter, for multiple reasons. Some students in the course are not all that tech-savvy. Some students don’t even have their own computer, or the computer they do have chokes when trying to compile a large Haskell project (e.g. because of limited memory). Even aside from those issues, I simply don’t want to spend time and effort helping students install stuff at the beginning of the semester (at least not in this class).
Old solution: Replit
For a couple years, students were able to use Disco via a site called Replit, which provided free educational accounts, and supported arbitrary Nix configurations. Since the disco package on Hackage was picked up by nixpkgs, it was possible to spin up a Disco interpreter on Replit in just a few seconds. Replit provided a virtual file system, an editor, and, of course, a REPL.
This was a great solution while it lasted, but sadly it is no longer viable:
- Disco has been broken in nixpkgs for a while now, and I have no idea how to fix it.
- Even if I could fix it, Replit has stopped supporting free education accounts, and started trying to shove LLMs into everything, making it no longer a viable teaching platform.
Solution criteria
There are a few non-negotiable criteria that any solution must meet:
It must either run on existing cloud infrastructure, or run completely client-side. I do not want to be in the business of running a server, or of worrying about mitigating DOS attacks (by which I mean students accidentally writing infinite loops or allocating unbounded amounts of memory). I also do not want to be in the business of managing accounts and logins.
Students should not have to install anything. As I mentioned before, this is a nonstarter for some students.
It should allow students to work at a Disco REPL, and also test their own Disco code.
Potential solutions
Of course, a really nice solution would be to find an existing site which replicates some of the functionality we used to have with Replit and supports educational uses. I kind of doubt such a thing exists, though I would be happy to be wrong about this.
Another possibility is to have students use VSCode in their browsers via GitHub; to make this work we would have to add LSP support to Disco and package it up appropriately. I’m not really excited about using GitHub for this, although implementing LSP for Disco is independently a great idea.
The other solution I can think of is to compile Disco to WASM, and build a web UI on top of it, so that the whole thing runs in the student’s browser. I spent some time on this last year and successfully got Disco to compile to WASM, but didn’t make it any farther than that. Honestly, I simply don’t know anything about web development, and I am not very motivated to learn.
UI levels
Running with the last idea above (WASM + a custom web UI), what could such a thing look like? What features would we want? Here are some different solutions I could imagine, in increasing order of effort.
Level 0 would be to have just a web-based REPL. Students can edit Disco code on their own device, upload it to the website, then evaluate expressions at the REPL. Having to reupload their code every time they make changes would be annoying, but this would still be better than nothing.
Level 1 would be to have a web-based editor and REPL. Students can type code in the editor, then push a button to load it into the REPL and play with it.
Level 2 would be to have a web-based filesystem + editor + REPL. Students can see a list of multiple files, edit them individually, and load any of them into the REPL. The filesystem must live on their own computer, not on a remote server; but it could be their real filesystem, or a virtual filesystem.
I don’t know how difficult any of these are; I would assume that at least some of them can be achieved by gluing together some existing JavaScript components (e.g. CodeMirror?). I’m sure it will require extending Disco itself with some new APIs, but I can definitely work on that part once I know what would be needed.
If you know how to build web UIs like this and are interested in helping, get in touch!
In a post from a year ago, I explored how to prove decidable equality in Agda of a particular indexed data type. Recently, I discovered a different way to accomplish the same thing, without resorting to embedded sigma types.
This post is literate Agda; you can download it here if you want to play along. I tested everything here with Agda version 2.6.4.3 and version 2.0 of the standard library. (I assume it would also work with more recent versions, but haven’t tested it.)
Background
This section is repeated from my previous post, which I assume no one remembers.
First, some imports and a module declaration. Note that the entire
development is parameterized by some abstract set B of base types,
which must have decidable equality.
open import Data.Product using (Σ ; _×_ ; _,_ ; -,_ ; proj₁ ; proj₂)
open import Data.Product.Properties using (≡-dec)
open import Function using (_∘_)
open import Relation.Binary using (DecidableEquality)
open import Relation.Binary.PropositionalEquality using (_≡_ ; refl)
open import Relation.Nullary.Decidable using (yes; no; Dec)
module OneLevelTypesIndexed2 (B : Set) (≟B : DecidableEquality B) where

We’ll work with a simple type system containing base types, function types, and some distinguished type constructor □. So far, this is just to give some context; it is not the final version of the code we will end up with, so we stick it in a local module so it won’t end up in the top-level namespace.
For example, if \(X\) and \(Y\) are base types, then we could write down a type like \(\square ((\square \square X \to Y) \to \square Y)\):
infixr 2 _⇒_
infix 30 □_
postulate
BX BY : B
X : Ty
X = base BX
Y : Ty
Y = base BY
example : Ty
example = □ ((□ □ X ⇒ Y) ⇒ □ Y)

However, for reasons that would take us too far afield in this blog post, I don’t want to allow immediately nested boxes, like \(\square \square X\). We can still have multiple boxes in a type, and even boxes nested inside of other boxes, as long as there is at least one arrow in between. In other words, I only want to rule out boxes immediately applied to another type with an outermost box. So we don’t want to allow the example type given above (since it contains \(\square \square X\)), but, for example, \(\square ((\square X \to Y) \to \square Y)\) would be OK.
Two encodings
In my previous blog
post,
I ended up with the following encoding of types indexed by a Boxity,
which records the number of top-level boxes. Since the boxities of the
arguments to an arrow type do not matter, we make them sigma types
that package up a boxity with a type having that boxity. I was then
able to define decidable equality for ΣTy and Ty by mutual
recursion.
data Boxity : Set where
₀ : Boxity
₁ : Boxity
variable b b₁ b₂ b₃ b₄ : Boxity
module WithSigma where
ΣTy : Set
data Ty : Boxity → Set
ΣTy = Σ Boxity Ty
data Ty where
□_ : Ty ₀ → Ty ₁
base : B → Ty ₀
_⇒_ : ΣTy → ΣTy → Ty ₀

The problem is that working with this definition of Ty is really
annoying! Every time we construct or pattern-match on an arrow type,
we have to package up each argument type into a dependent pair with
its Boxity; this introduces syntactic clutter, and in many cases we
know exactly what the Boxity has to be, so it’s not even
informative. The version we really want looks more like this:
data Ty : Boxity → Set where
base : B → Ty ₀
_⇒_ : {b₁ b₂ : Boxity} → Ty b₁ → Ty b₂ → Ty ₀
□_ : Ty ₀ → Ty ₁
infixr 2 _⇒_
infix 30 □_

In this version, the boxities of the arguments to the arrow constructor are just implicit parameters of the arrow constructor itself. Previously, I was unable to get decidable equality to go through for this version… but just the other day, I finally realized how to make it work!
Path-dependent equality
The key trick that makes everything work is to define a path-dependent equality type. I learned this from Martín Escardó. The idea is that we can express equality between two indexed things with different indices, as long as we also have an equality between the indices.
_≡⟦_⟧_ : {A : Set} {B : A → Set} {a₀ a₁ : A} → B a₀ → a₀ ≡ a₁ → B a₁ → Set
b₀ ≡⟦ refl ⟧ b₁ = b₀ ≡ b₁

That’s exactly what we need here: the ability to express
equality between Ty values, which may be indexed by different
boxities—as long as we know that the boxities are equal.
Decidable equality for Ty
We can now use this to directly encode decidable equality for Ty.
First, we can easily define decidable equality for Boxity.
Boxity-≟ : DecidableEquality Boxity
Boxity-≟ ₀ ₀ = yes refl
Boxity-≟ ₀ ₁ = no λ ()
Boxity-≟ ₁ ₀ = no λ ()
Boxity-≟ ₁ ₁ = yes refl

Here is the type of the decision procedure: given two Ty values which may have different boxities, we decide whether or not we can produce a witness to their equality. Such a witness consists of a pair of (1) a proof that the boxities are equal, and (2) a proof that the types are equal, depending on (1). We would really like to write this as Σ (b₁ ≡ b₂) λ p → σ ≡⟦ p ⟧ τ, but for some reason Agda requires us to fill in some extra implicit arguments before it is happy that everything is unambiguous, requiring some ugly syntax.
Before showing the definition of Ty-≟′, let’s see that we can use it
to easily define both a boxity-homogeneous version of decidable
equality for Ty, as well as decidable equality for Σ Boxity Ty:
Ty-≟ : DecidableEquality (Ty b)
Ty-≟ {b} σ τ with Ty-≟′ σ τ
... | no σ≢τ = no (λ σ≡τ → σ≢τ ( refl , σ≡τ))
... | yes (refl , σ≡τ) = yes σ≡τ
ΣTy-≟ : DecidableEquality (Σ Boxity Ty)
ΣTy-≟ (_ , σ) (_ , τ) with Ty-≟′ σ τ
... | no σ≢τ = no λ { refl → σ≢τ (refl , refl) }
... | yes (refl , refl) = yes refl

A lot of pattern matching on refl, and everything falls out quite easily.
And now the definition of Ty-≟′. It looks complicated, but it is
actually not very difficult. The most interesting case is when
comparing two arrow types for equality: we must first compare the
boxities of the arguments, then consider the arguments themselves once
we know the boxities are equal.
Ty-≟′ (□ σ) (□ τ) with Ty-≟′ σ τ
... | yes (refl , refl) = yes (refl , refl)
... | no σ≢τ = no λ { (refl , refl) → σ≢τ (refl , refl) }
Ty-≟′ (base S) (base T) with ≟B S T
... | yes refl = yes (refl , refl)
... | no S≢T = no λ { (refl , refl) → S≢T refl }
Ty-≟′ (_⇒_ {b₁} {b₂} σ₁ σ₂) (_⇒_ {b₃} {b₄} τ₁ τ₂) with Boxity-≟ b₁ b₃ | Boxity-≟ b₂ b₄ | Ty-≟′ σ₁ τ₁ | Ty-≟′ σ₂ τ₂
... | no b₁≢b₃ | _ | _ | _ = no λ { (refl , refl) → b₁≢b₃ refl }
... | yes _ | no b₂≢b₄ | _ | _ = no λ { (refl , refl) → b₂≢b₄ refl }
... | yes _ | yes _ | no σ₁≢τ₁ | _ = no λ { (refl , refl) → σ₁≢τ₁ (refl , refl) }
... | yes _ | yes _ | yes _ | no σ₂≢τ₂ = no λ { (refl , refl) → σ₂≢τ₂ (refl , refl) }
... | yes _ | yes _ | yes (refl , refl) | yes (refl , refl) = yes (refl , refl)
Ty-≟′ (□ _) (base _) = no λ ()
Ty-≟′ (□ _) (_ ⇒ _) = no λ ()
Ty-≟′ (base _) (□ _) = no λ ()
Ty-≟′ (base _) (_ ⇒ _) = no λ { (refl , ()) }
Ty-≟′ (_ ⇒ _) (□ _) = no λ ()
Ty-≟′ (_ ⇒ _) (base _) = no λ { (refl , ()) }
Continuing a series of posts on techniques for calculating range queries, today I will present the sparse table data structure, for doing fast range queries on a static sequence with an idempotent combining operation.
Motivation
In my previous post, we saw that if we have a static sequence and a binary operation with a group structure (i.e. every element has an inverse), we can precompute a prefix sum table in \(O(n)\) time, and then use it to answer arbitrary range queries in \(O(1)\) time.
What if we don’t have inverses? We can’t use prefix sums, but can we
do something else that still allows us to answer range queries in
\(O(1)\)? One thing we could always do would be to construct an \(n
\times n\) table storing the answer to every possible range
query—that is, \(Q[i,j]\) would store the value of the range \(a_i
\diamond \dots \diamond a_j\). Then we could just look up the answer to
any range query in \(O(1)\). Naively computing the value of each \(Q[i,j]\) would take \(O(n)\) time, for a total of \(O(n^3)\) time to fill in each of the entries in the table (we only have to fill in \(Q[i,j]\) where \(i < j\), but this is still about \(n^2/2\) entries), though it’s not too hard to fill in the table in \(O(n^2)\) total time, spending only \(O(1)\) to fill in each entry—I’ll leave this to you as an exercise.
However, \(O(n^2)\) is often too big. Can we do better? More generally, we are looking for a particular subset of range queries to precompute, such that the total number is asymptotically less than \(n^2\), but we can still compute the value of any arbitrary range query by combining some (constant number of) precomputed ranges. In the case of a group structure, we were able to compute the values for only prefix ranges of the form \(1 \dots k\), then compute the value of an arbitrary range using two prefixes, via subtraction.
A sparse table is exactly such a scheme for precomputing a subset of ranges. (In fact, I believe, but do not know for sure, that this is where the name “sparse table” comes from—it is “sparse” in the sense that it only stores a sparse subset of range values.) Rather than only a linear number of ranges, as with prefix sums, we have to compute \(O(n \lg n)\) of them, but that’s still way better than \(O(n^2)\). Note,
however, that a sparse table only works when the combining operation
is idempotent, that is, when \(x \diamond x = x\) for all \(x\). For
example, we can use a sparse table with combining operations such as
\(\max\) or \(\gcd\), but not with \(+\) or \(\times\). Let’s see how it works.
Sparse tables
The basic idea behind a sparse table is that we precompute a series of “levels”, where level \(i\) stores values for ranges of length \(2^i\). So level \(0\) stores “ranges of length \(1\)”—that is, the elements of the original sequence; level \(1\) stores ranges of length \(2\); level \(2\) stores ranges of length \(4\); and so on. Formally, \(T[i,j]\) stores the value of the range of length \(2^i\) starting at index \(j\). That is,
\[T[i,j] = a_j \diamond \dots \diamond a_{j+2^i-1}.\]
We can see that \(i\) only needs to go from \(0\) up to \(\lfloor \lg n \rfloor\); above that and the stored ranges would be larger than the entire sequence. So this table has size \(O(n \lg n)\).
Two important questions remain: how do we compute this table in the first place? And once we have it, how do we use it to answer arbitrary range queries in \(O(1)\)?
Computing the table is easy: each range on level \(i\), of length \(2^i\), is the combination of two length-\(2^{i-1}\) ranges from the previous level. That is,
\[T[i,j] = T[i-1, j] \diamond T[i-1, j+2^{i-1}]\]
The zeroth level just consists of the elements of the original sequence, and we can compute each subsequent level using values from the previous level, so we can fill in the entire table in \(O(n \lg n)\) time, doing just a single combining operation for each value in the table.
Once we have the table, we can compute the value of an arbitrary range \([l,r]\) as follows:
Compute the biggest power of two that fits within the range, that is, the largest \(k\) such that \(2^k \leq r - l + 1\). We can compute this simply as \(\lfloor \lg (r - l + 1) \rfloor\).
Look up two range values of length \(2^k\), one for the range which begins at \(l\) (that is, \(T[k, l]\)) and one for the range which ends at \(r\) (that is, \(T[k, r - 2^k + 1]\)). These two ranges overlap; but because the combining operation is idempotent, combining the values of the ranges yields the value for our desired range \([l,r]\).
This is why we require the combining operation to be idempotent: otherwise the values in the overlap would be overrepresented in the final, combined value.
Haskell code
Let’s write some Haskell code! First, a little module for idempotent
semigroups. Note that we couch everything in terms of semigroups,
not monoids, because we have no particular need of an identity
element; indeed, some of the most important examples like \(\min\) and
\(\max\) don’t have an identity element. The IdempotentSemigroup
class has no methods, since as compared to Semigroup it only adds a
law. However, it’s still helpful to signal the requirement. You
might like to convince yourself that all the instances listed below
really are idempotent.
module IdempotentSemigroup where
import Data.Bits
import Data.Semigroup
-- | An idempotent semigroup is one where the binary operation
-- satisfies the law @x <> x = x@ for all @x@.
class Semigroup m => IdempotentSemigroup m
instance Ord a => IdempotentSemigroup (Min a)
instance Ord a => IdempotentSemigroup (Max a)
instance IdempotentSemigroup All
instance IdempotentSemigroup Any
instance IdempotentSemigroup Ordering
instance IdempotentSemigroup ()
instance IdempotentSemigroup (First a)
instance IdempotentSemigroup (Last a)
instance Bits a => IdempotentSemigroup (And a)
instance Bits a => IdempotentSemigroup (Ior a)
instance (IdempotentSemigroup a, IdempotentSemigroup b) => IdempotentSemigroup (a,b)
instance IdempotentSemigroup b => IdempotentSemigroup (a -> b)

Now, some code for sparse tables. First, a few imports.
{-# LANGUAGE TupleSections #-}
module SparseTable where
import Data.Array (Array, array, (!))
import Data.Bits (countLeadingZeros, finiteBitSize, (!<<.))
import IdempotentSemigroup

The sparse table data structure itself is just a 2D array over some
idempotent semigroup m. Note that UArray would be more efficient,
but (1) that would make the code for building the sparse table more
annoying (more on this later), and (2) it would require a bunch of
tedious additional constraints on m.
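The definition of SparseTable itself does not appear above; based on this description, a plausible reconstruction (my guess, not the original code) is simply a newtype wrapper around a 2D array:

```haskell
import Data.Array (Array)

-- A sparse table over some type m: entry (i, j) stores the value of the
-- length-2^i range starting at index j.
newtype SparseTable m = SparseTable (Array (Int, Int) m)
```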
We will frequently need to compute rounded-down base-two logarithms,
so we define a function for it. A straightforward implementation
would be to repeatedly shift right by one bit and count the number of
shifts needed to reach zero; however, there is a better way, using
Data.Bits.countLeadingZeros. It has a naive default implementation
which counts right bit shifts, but in most cases it compiles down to
much more efficient machine instructions.
-- | Logarithm base 2, rounded down to the nearest integer. Computed
-- efficiently using primitive bitwise instructions, when available.
lg :: Int -> Int
lg n = finiteBitSize n - 1 - countLeadingZeros n

Now let’s write a function to construct a sparse table, given a
sequence of values. Notice how the sparse table array st is defined
recursively.
This works because the Array type is lazy in the stored values, with
the added benefit that only the array values we end up actually
needing will be computed. However, this comes with a decent amount of
overhead. If we wanted to use an unboxed array instead, we wouldn’t
be able to use
the recursive definition trick; instead, we would have to use an
STUArray
and fill in the values in a specific order. The code for this would
be longer and much more tedious, but could be faster if we end up
needing all the values in the array anyway.
-- | Construct a sparse table which can answer range queries over the
-- given list in $O(1)$ time. Constructing the sparse table takes
-- $O(n \lg n)$ time and space, where $n$ is the length of the list.
fromList :: IdempotentSemigroup m => [m] -> SparseTable m
fromList ms = SparseTable st
where
n = length ms
lgn = lg n
st =
array ((0, 0), (lgn, n - 1)) $
zip ((0,) <$> [0 ..]) ms
++ [ ((i, j), st ! (i - 1, j) <> st ! (i - 1, j + 1 !<<. (i - 1)))
| i <- [1 .. lgn]
, j <- [0 .. n - 1 !<<. i]
]

Finally, we can write a function to answer range queries.
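The body of that function is not shown here, so the following is a self-contained sketch of how it could look, directly following the two-overlapping-ranges recipe described earlier. It repeats the surrounding definitions so the snippet stands alone, uses shiftL in place of the !<<. operator for portability with older base versions, and uses the plain Semigroup class in place of IdempotentSemigroup (which adds only a law, not methods):

```haskell
import Data.Array (Array, array, (!))
import Data.Bits (countLeadingZeros, finiteBitSize, shiftL)

newtype SparseTable m = SparseTable (Array (Int, Int) m)

-- Logarithm base 2, rounded down to the nearest integer.
lg :: Int -> Int
lg n = finiteBitSize n - 1 - countLeadingZeros n

-- Build the table: level 0 is the sequence itself; each level-i range of
-- length 2^i combines two length-2^(i-1) ranges from the previous level.
fromList :: Semigroup m => [m] -> SparseTable m
fromList ms = SparseTable st
  where
    n = length ms
    lgn = lg n
    st =
      array ((0, 0), (lgn, n - 1)) $
        zip (map ((,) 0) [0 ..]) ms
          ++ [ ((i, j), st ! (i - 1, j) <> st ! (i - 1, j + 1 `shiftL` (i - 1)))
             | i <- [1 .. lgn]
             , j <- [0 .. n - 1 `shiftL` i]
             ]

-- Answer a query for [l,r] by combining two (possibly overlapping)
-- length-2^k ranges, one starting at l and one ending at r; idempotence
-- of the combining operation makes the overlap harmless.
range :: Semigroup m => SparseTable m -> Int -> Int -> m
range (SparseTable st) l r = st ! (k, l) <> st ! (k, r - 1 `shiftL` k + 1)
  where
    k = lg (r - l + 1)
```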
Applications
Most commonly, we can use a sparse table to find the minimum or
maximum values on a range, \(\min\) and \(\max\) being the quintessential
idempotent operations. For example, this plays a key role in a
solution to the (quite tricky) problem
Ograda. (At first it seemed like that problem should be solvable with some kind of sliding window approach, but I couldn’t figure out how to make it work!)
What if we want to find the index of the minimum or maximum value in
a given range (see, for example, Worst Weather)? We can easily accomplish this using the semigroup Min (Arg m i) (or Max (Arg m i)), where m is the type of the values and i is
the index type. Arg, from Data.Semigroup, is just a pair which uses only the first value
for its Eq and Ord instances, and carries along the second value
(which is also exposed via Functor, Foldable, and Traversable
instances). In the example below, we can see that the call to range st 0 3 returns both the max value on the range (4) and its index
(2) which got carried along for the ride:
λ> :m +Data.Semigroup
λ> st = fromList (map Max (zipWith Arg [2, 3, 4, 2, 7, 4, 9] [0..]))
λ> range st 0 3
Max {getMax = Arg 4 2}
Finally, I will mention that being able to compute range minimum queries is one way to compute lowest common ancestors for a (static, rooted) tree. First, walk the tree via a depth-first search and record the depth of each node encountered in sequence, a so-called Euler tour (note that you must record every visit to a node—before visiting any of its children, in between each child, and after visiting all the children). Now the minimum depth recorded between visits to any two nodes will correspond to their lowest common ancestor.
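To make the reduction concrete, here is a small self-contained sketch (the names eulerTour and lca are mine). For brevity it finds the minimum with a naive linear scan over the tour slice; in practice you would precompute a sparse table over the recorded depths so each query takes \(O(1)\):

```haskell
import qualified Data.Map as M
import Data.Semigroup (Arg (..), Min (..))

-- Record (depth, node) at every visit during a depth-first walk:
-- before the first child, between children, and after the last child.
eulerTour :: M.Map Int [Int] -> Int -> [(Int, Int)]
eulerTour children = go 0
  where
    go d v = case M.findWithDefault [] v children of
      [] -> [(d, v)]
      cs -> (d, v) : concatMap (\c -> go (d + 1) c ++ [(d, v)]) cs

-- LCA(u,v) is the node of minimum depth recorded between any visit to u
-- and any visit to v.  (Naive scan here; a sparse table makes it O(1).)
lca :: M.Map Int [Int] -> Int -> Int -> Int -> Int
lca children root u v = w
  where
    tour = eulerTour children root
    firstIdx x = head [i | (i, (_, y)) <- zip [0 ..] tour, y == x]
    (lo, hi) = (min (firstIdx u) (firstIdx v), max (firstIdx u) (firstIdx v))
    Min (Arg _ w) =
      foldr1 (<>) [Min (Arg d y) | (d, y) <- take (hi - lo + 1) (drop lo tour)]
```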
Here are a few problems that involve computing lowest common ancestors in a tree, though note there are also other techniques for computing LCAs (such as binary jumping) which I plan to write about eventually.
In a previous blog post I categorized a number of different techniques for calculating range queries. Today, I will discuss one of those techniques which is simple but frequently useful.
Precomputing prefix sums
Suppose we have a static sequence of values \(a_1, a_2, a_3, \dots, a_n\) drawn from some group (that is, there is an associative binary operation with an identity element, and every element has an inverse), and want to be able to compute the total value (according to the group operation) of any contiguous subrange. That is, given a range \([i,j]\), we want to compute \(a_i \diamond a_{i+1} \diamond \dots \diamond a_j\) (where \(\diamond\) is the group operation). For example, we might have a sequence of integers and want to compute the sum, or perhaps the bitwise xor (but not the maximum), of all the values in any particular subrange.
Of course, we could simply compute \(a_i \diamond \dots \diamond a_j\) directly, but that takes \(O(n)\) time. With some simple preprocessing, it’s possible to compute the value of any range in constant time.
The key idea is to precompute an array \(P\) of prefix sums, so \(P_i = a_1 \diamond \dots \diamond a_i\). This can be computed in linear time via a scan; for example:
import Data.Array
import Data.List (scanl')
prefix :: Monoid a => [a] -> Array Int a
prefix a = listArray (0, length a) $ scanl' (<>) mempty a

(Actually, I would typically use an unboxed array, which is faster but slightly more limited in its uses: import Data.Array.Unboxed, use UArray instead of Array, and add an IArray UArray a constraint.)
Note that we set \(P_0 = 0\) (or whatever the identity element is for the group); this is why I had the sequence of values indexed starting from \(1\), so \(P_0\) corresponds to the empty sum, \(P_1 = a_1\), \(P_2 = a_1 \diamond a_2\), and so on.
Now, for the value of the range \([i,j]\), just compute \(P_j \diamond P_{i-1}^{-1}\)—that is, we start with a prefix that ends at the right place, then cancel or “subtract” the prefix that ends right before the range we want. For example, to find the sum of the integers \(a_5 + \dots + a_{10}\), we can compute \(P_{10} - P_4\).
That’s why this only works for groups but not for general monoids: only in a group can we cancel unwanted values. So, for example, this works for finding the sum of any range, but not the maximum.
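As a concrete instance of this recipe, here is a small sketch specialized to integer sums (the names prefixSums and rangeSum are my own, chosen for illustration):

```haskell
import Data.Array (Array, listArray, (!))
import Data.List (scanl')

-- P_0 = 0 is the empty sum; P_k = a_1 + ... + a_k.
prefixSums :: [Int] -> Array Int Int
prefixSums as = listArray (0, length as) (scanl' (+) 0 as)

-- Sum of a_i + ... + a_j (1-indexed, inclusive) in O(1): take the prefix
-- ending at j and subtract the prefix ending just before the range.
rangeSum :: Array Int Int -> Int -> Int -> Int
rangeSum p i j = p ! j - p ! (i - 1)
```

For example, rangeSum (prefixSums [1 .. 10]) 5 10 computes \(P_{10} - P_4 = 55 - 10 = 45\).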
Practice problems
Want to practice? Here are a few problems that can be solved using techniques discussed in this post:
It is possible to generalize this scheme to 2D—that is, to compute the value of any subrectangle of a 2D grid of values from some group in only \(O(1)\) time. I will leave you the fun of figuring out the details.
If you’re looking for an extra challenge, here are a few harder problems which use techniques from this post as an important component, but require some additional nontrivial ingredients:
Static range queries
Suppose we have a sequence of values, which is static in the sense that the values in the sequence will never change, and we want to perform range queries, that is, for various ranges we want to compute the total of all consecutive values in the range, according to some binary combining operation. For example, we might want to compute the maximum, sum, or product of all the consecutive values in a certain subrange. We have various options depending on the kind of ranges we want and the algebraic properties of the operation.
If we want ranges corresponding to a sliding window, we can use an amortized queue structure to find the total of each range in \(O(1)\), for an arbitrary monoid.
If we want arbitrary ranges but the operation is a group, the solution is relatively straightforward: we can precompute all prefix sums, and subtract to find the result for an arbitrary range in \(O(1)\).
If the operation is an idempotent semigroup (that is, it has the property that \(x \diamond x = x\) for all \(x\)), we can use a sparse table, which takes \(O(n \lg n)\) time and space for precomputation, and then allows us to answer arbitrary range queries in \(O(1)\).
If the operation is an arbitrary monoid, we can use a sqrt tree, which uses \(O(n \lg \lg n)\) precomputation time and space, and allows answering arbitrary range queries in \(O(\lg \lg n)\). I will write about this in a future post.
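The sliding-window case mentioned above can be sketched with the classic two-stack amortized queue, where each stack caches running combinations so the fold of the whole window takes a single application of the operation. This is a hedged sketch under my own names (MQueue, push, pop, windowFold); it is not code from any of the posts:

```haskell
import Data.Maybe (listToMaybe)

-- Two-stack queue; every element is stored together with the running
-- combination of itself and everything beneath it on its stack.
data MQueue m = MQueue [(m, m)] [(m, m)]

emptyQ :: MQueue m
emptyQ = MQueue [] []

-- Enqueue at the back, extending the back stack's running fold.
push :: Semigroup m => m -> MQueue m -> MQueue m
push x (MQueue f b) = MQueue f ((x, t) : b)
  where
    t = case b of
      []         -> x
      (_, s) : _ -> s <> x

-- Dequeue from the front; when the front stack is empty, move the back
-- stack over, recomputing running folds in the opposite direction.
-- Each element is moved at most once, so this is O(1) amortized.
pop :: Semigroup m => MQueue m -> Maybe (m, MQueue m)
pop (MQueue [] [])          = Nothing
pop (MQueue [] b)           = pop (MQueue front [])
  where
    front = foldl step [] b
    step acc (x, _) = (x, maybe x (\(_, s) -> x <> s) (listToMaybe acc)) : acc
pop (MQueue ((x, _) : f) b) = Just (x, MQueue f b)

-- Fold of the entire current window, if nonempty: at most one (<>).
windowFold :: Semigroup m => MQueue m -> Maybe m
windowFold (MQueue f b) = case (listToMaybe f, listToMaybe b) of
  (Nothing, Nothing)           -> Nothing
  (Just (_, s), Nothing)       -> Just s
  (Nothing, Just (_, s))       -> Just s
  (Just (_, sf), Just (_, sb)) -> Just (sf <> sb)
```

Sliding a window then amounts to one push and one pop per step, with windowFold giving the range total for an arbitrary monoid or semigroup.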
Dynamic range queries
What if we want dynamic range queries, that is, we want to be able to interleave range queries with arbitrary updates to the values of the sequence?
- If the operation is an arbitrary monoid, we can use a segment tree.
- If the operation is a group, we can use a Fenwick tree.
I published a paper about Fenwick trees, which also discusses segment trees, but I should write more about them here!
Table
Here’s a table summarizing the above classification scheme. I plan to fill in links as I write blog posts about each row.
| Sequence | Ranges | Operation | Solution | Precomputation | Queries |
|---|---|---|---|---|---|
| Static | Sliding window | Monoid | Amortized queue | \(O(1)\) | \(O(1)\) |
| Static | Arbitrary | Group | Prefix sum table | \(O(n)\) | \(O(1)\) |
| Static | Arbitrary | Idempotent semigroup | Sparse table | \(O(n \lg n)\) | \(O(1)\) |
| Static | Arbitrary | Monoid | Sqrt tree | \(O(n \lg \lg n)\) | \(O(\lg \lg n)\) |
| Dynamic | Arbitrary | Group | Fenwick tree | \(O(n)\) | \(O(\lg n)\) |
| Dynamic | Arbitrary | Monoid | Segment tree | \(O(n)\) | \(O(\lg n)\) |
In January 2009, while just a baby first-year PhD student, I wrote a blog post titled Abstraction, intuition, and the “monad tutorial fallacy”. In it, I made the argument that humans tend to learn best by first grappling with concrete examples, and only later proceeding to higher-level intuition and analogies; hence, it’s a mistake to think that clearly presenting your intuition for a topic will help other people understand it. Analogies and intuition can help, but only when accompanied by concrete examples and active engagement. To illustrate the point, I made up a fictitious programmer with a fictitious analogy.
But now Joe goes and writes a monad tutorial called “Monads are Burritos,” under the well-intentioned but mistaken assumption that if other people read his magical insight, learning about monads will be a snap for them. “Monads are easy,” Joe writes. “Think of them as burritos.” Joe hides all the actual details about types and such because those are scary, and people will learn better if they can avoid all that difficult and confusing stuff. Of course, exactly the opposite is true, and all Joe has done is make it harder for people to learn about monads…
My intention was to choose a fictitious analogy which was obviously ridiculous and silly, as a parody of many of the monad tutorials which existed at the time (and still do). Mark Jason Dominus then wrote a blog post, Monads are like burritos, pointing out that actually, monads are kinda like burritos. It’s really funny, though I don’t think it’s actually a very good analogy, and my guess is that Mark would agree: it was clearly written as a silly joke and not as a real way to explain monads.
In any case, from that point the “monads are burritos” meme took on a life of its own. For example:
- Chris Done made a webcomic about it
- Ed Morehouse wrote a ridiculous paper exploring the categorical foundations of burritos
- Someone made a burrito library in Rust
- Dr Eugenia Cheng tweeted about it
I even joined in the fun and made this meme image about bad monad tutorials:

Of course there are lots of people who still understand that it was all just a silly joke. Recently, however, I’ve seen several instances where people apparently believe “monads are burritos” is a real, helpful thing and not just a joke meme. For example, see this thread on lobste.rs, or this Mastodon post.
So, to set the record straight: “monads are burritos” is not a helpful analogy! (Yes, I am writing a blog post because People Are Wrong On The Internet, and I know it probably won’t make any difference, but here we are.)

Why not, you ask? To expand on my reasons from a 10-year-old Reddit comment:
- The burrito analogy strongly implies that a value of type m a somehow “contains” a value (or values) of type a. But that is not true for all monads (e.g. there is no sense in which a value of type IO String contains a String).
- Relatedly, the analogy also implies that a value of type m a can be “unwrapped” to get an a, but this is impossible for many monads.
- It is not actually very easy to take a burrito containing a burrito and merge it into a single-level burrito. At least this is not in any sense a natural operation on burritos. Perhaps you could argue that it is always easy to remove outer tortilla layers (but not the innermost one, since the food would all fall out), but this is a bad analogy, since in general join does not just “remove” an outer layer; it somehow merges the effects of two layers into one.
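To make that last point concrete, here is what join actually does for two standard monads (a small illustration of my own, not from the original post):

```haskell
import Control.Monad (join)

-- For lists, join concatenates: the effects (multiple results) of the
-- outer and inner layers are merged, not just "unwrapped".
listJoin :: [Int]
listJoin = join [[1, 2], [3]]                -- [1,2,3]

-- The writer-like monad ((,) String) combines the logs of both layers.
writerJoin :: (String, Int)
writerJoin = join ("outer ", ("inner", 42))  -- ("outer inner", 42)
```

Only for the Identity monad does join literally just remove a layer.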
Actually, burritos are a great analogy for the Identity monad!
…but not much beyond that.
On a more positive note, my sense is that the average pedagogical quality of Haskell materials, and monad tutorials in particular, has indeed gone up significantly since 2009. I’d love to think this can be at least partially attributed to my original blog post, though of course it’s impossible to know that for sure.
A few days ago I gave a talk at ZuriHac 2025 entitled Haskell for Competitive Programming, a basic introduction to competitive programming in general, and the joy of using Haskell for competitive programming in particular. This is an expanded version of my talk in blog post form. (For an even gentler introduction to competitive programming in Haskell, see this old blog post from 2019.)
Competitive Programming
First of all, what is competitive programming? It’s a broad term, but when I talk about competitive programming I have something in mind along the following lines:
- There are well-specified input and output formats, usually with a few examples, and a precise specification of what the output should be for a given input.
- Your job is to write a program which transforms input meeting the specification into a correct output.
- You submit your program, which is tested on a number of inputs and declared correct if and only if it yields the correct output for all the tested inputs.
- There is often time pressure involved—that is, you have a limited amount of time in which to write your program. However, it is also possible to participate “recreationally”, simply for the joy of problem-solving, without time pressure (in fact, the vast majority of the competitive programming I do is of this form, though I have occasionally participated in timed contests).
There are many variations: whether you are allowed to use code libraries prepared ahead of time, or must type everything from scratch; outputs can be scored according to some criteria rather than simply being judged right or wrong; and so on.
There are many sites which allow you to participate in contests and/or solve competitive programming problems recreationally. My favorite is Open Kattis; I mention some others at the end of this post.
Pot: a first example
As an introductory example, let’s look at
Pot. As usual, there’s a silly
story, but what it boils down to is that we will be given a sequence
of numbers, and we should interpret the last digit of each number as an
exponent, then sum the results. For example, if given 125, we
should interpret it as \(12^5\), and so on.
Dealing with I/O via interact
An imperative approach to such a problem would involve doing a
sequence of input commands, some computation, and a sequence of output
commands—possibly interleaved with one another—and we might
immediately think to start using functions like getLine and
putStrLn to do the required I/O in Haskell. However, there is a
much more fruitful functional perspective: we are simply being asked
to implement a particular (partial) function of type String -> String. The fact that the function’s input and output should be
hooked up to the program’s standard input and output is just an
implementation detail. Competitive programming is functional at
heart!
It turns out that Haskell’s standard library already has the perfect built-in function for this scenario:
interact takes a pure String -> String function and turns it into
an IO action which reads from standard input, passes the input to
the given String -> String function, and prints the result to standard output. It even
does this using lazy I/O—that is, the input is
read lazily, as demanded by the function, so that the output and input
can be automatically interleaved depending on which parts of the
output depend on which parts of the input. In particular, this means
that the entire input need not be stored in memory at once. If
the inputs can be processed into outputs in a streaming fashion—as
is the case in the example problem we are currently
considering—then the input and output will be interleaved. In
general, this kind of lazy I/O is
problematic
and even unsafe, but it’s perfect for this scenario.
Solving the problem with a pipeline
So interact does all the IO for us, and all we have to do is write
a pure String -> String function which transforms the input to the
output. In this case, we can split the input into lines, drop the
first line (we don’t need to know how many lines of input there
are—we just get a list of all of them, since interact will read
until EOF), read each number and turn it into its leading digits
raised to the power of its last digit, then sum them and show the
result. The full solution is below. Notice how I use the “backwards
composition” operator (>>>), since I find it more convenient to type
from left to right as I’m thinking about transforming from input to
output.
import Control.Category ((>>>))
main = interact $
lines >>> drop 1 >>> map (read >>> process) >>> sum >>> show
process :: Integer -> Integer
process n = (n `div` 10) ^ (n `mod` 10)

I use Integer here since raw performance doesn’t matter much for
this easy problem, and Integer avoids any potential problems with
overflow. However, using Int instead of Integer can make a big
difference for some compute-intensive problems. On Kattis, Int will
always be 64 bits, but last time I checked Int can be 32 bits on
Codeforces.
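To illustrate the overflow point (an example of my own; the specific numbers are arbitrary, but Pot’s answers really can be this large):

```haskell
-- 9999^9 has 36 digits, far more than a 64-bit Int can hold; with Int
-- the same computation silently wraps around, while Integer is exact.
correct :: Integer
correct = 9999 ^ 9

wrapped :: Int
wrapped = 9999 ^ 9   -- some wrapped-around value, not the true answer
```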
Shopping List: wholemeal programming and ByteString
Let’s consider Shopping List as a second example. In this problem, we are given a list of shopping lists, where each shopping list consists of a list of space-separated items on a single line. We are asked to find the items which are common to all the shopping lists, and print them in alphabetical order.
Wholemeal programming with standard data structures
This problem is very amenable to a “wholemeal programming”
approach,
where we work entirely at the level of whole data structure
transformations rather than looping over individual elements. We can
turn each shopping list into a set, then find the intersection of all
the sets. Moreover, if we use Data.Set, which uses an ordering on
the elements, we will get the result in alphabetical order “for free”
(“free” as in the amount of code we have to write, not necessarily
runtime cost). Haskell has a decent collection of data structures in
the containers library ((Int)Set, (Int)Map, Seq, Tree, and
even Graph) with a large collection of standard methods to construct
and manipulate them, which are bread and butter for many competitive
programming problems.
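Putting the pieces together, a first String-based attempt might look something like the following (a sketch reconstructed from the description above):

```haskell
import Control.Category ((>>>))
import qualified Data.Set as S

-- Turn each shopping list into a Set, intersect them all, and print
-- the count followed by the common items (S.toList yields sorted order).
solve :: String -> String
solve =
  lines >>> drop 1 >>> map (words >>> S.fromList) >>>
  foldr1 S.intersection >>>
  (\s -> show (S.size s) : S.toList s) >>> unlines

main :: IO ()
main = interact solve
```

The structure is exactly the same pipeline style as the Pot solution: a composition of whole-structure transformations.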
ByteString vs String
Unfortunately, when we try submitting a straightforward String-based version of this solution, we get a Time Limit Exceeded error! What’s wrong?
The issue is our use of String, which is an actual linked list of
characters and is very slow, especially when we have many short
strings, as in this problem. In the worst case, we could have 100
shopping lists, each with 5000 items of length 10, for a total of up
to 5 MB of input; with that much input data to read, any overhead
associated with reading and parsing the input can make a significant
difference.
Switching to ByteString is much faster. Why not Text, you ask?
Well, Text has to do a bunch of extra work to deal properly with
Unicode encodings, but in 99.99% of all competitive programming problems
I’ve ever seen, the input is guaranteed to be ASCII. So not
only do we not need Text, we can get away with a version of
ByteString that simply assumes every character is a single 8-bit
byte!
Once we import it, all we need to do is replace a bunch of
String operations with corresponding ByteString ones.
{-# LANGUAGE ImportQualifiedPost #-}
import Control.Category ((>>>))
import Data.Set (Set)
import Data.Set qualified as S
import Data.ByteString.Lazy.Char8 qualified as BS
main = BS.interact $
BS.lines >>> drop 1 >>> map (BS.words >>> S.fromList) >>>
foldr1 S.intersection >>>
(\s -> BS.pack (show (S.size s)) : S.toList s) >>> BS.unlines

A Favourable Ending: input parsing and lazy recursive structures
As a last example, let’s look at A Favourable Ending. This problem consists of a number of test cases; each test case describes a choose-your-own-adventure book with a number of sections, where each section is either an ending (either good or bad), or allows the reader to choose among three sections to proceed to next. For each test case, we are asked how many distinct stories there are with good endings.
More abstractly, since we are guaranteed that there are no loops, the sections of the book form a DAG, and we are asked to count the number of distinct paths in a DAG from a distinguished start node to any of a distinguished set of “good” leaves.
Parsing with Scanner
Parsing the input for this problem is trickier than the other
examples so far. In theory, we could still ignore the first number
specifying the number of test cases, and just continue reading test
cases until EOF. However, each test case begins with a number
specifying the number of sections in the book, and we cannot ignore
this number: we need to know how many lines to read before the start
of the next test case. Doing this manually involves pattern-matching
on a list of lines, using splitAt to split off the lines for each
test case, and manually passing around the list of the remaining
lines: tedious.
Fortunately, Haskell is great at building abstractions to insulate us
from such tedium. I’ve developed a simple Scanner
abstraction
which works well in this context.
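For concreteness, here is a minimal version of such a Scanner, modeled as a State computation over the list of remaining input tokens (a sketch of my own; the real library differs in some details, but peek, str, int, and >< behave as used in the code that follows):

```haskell
import Control.Monad (replicateM)
import Control.Monad.State
import qualified Data.ByteString.Lazy.Char8 as BS

-- A Scanner consumes a list of whitespace-separated input tokens.
type Scanner = State [BS.ByteString]

runScanner :: Scanner a -> BS.ByteString -> a
runScanner s = evalState s . BS.words

-- Look at the next token without consuming it.
peek :: Scanner BS.ByteString
peek = gets head

-- Consume and return the next token.
str :: Scanner BS.ByteString
str = state (\(t : ts) -> (t, ts))

-- Consume the next token as an Int.
int :: Scanner Int
int = read . BS.unpack <$> str

-- Repeat a scanner a given number of times, collecting the results.
(><) :: Int -> Scanner a -> Scanner [a]
(><) = replicateM
```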
We begin by creating some data types to represent the input in structured form:
type Book = Map Int Section
data Section = End Disposition | Choice [Int]
deriving (Eq, Show)
data Disposition = Favourably | Catastrophically
deriving (Eq, Show, Read)

Now we can write a Scanner to read a Book:
book :: Scanner Book
book = do
s <- int
M.fromList <$> s >< ((,) <$> int <*> section)
section :: Scanner Section
section = do
t <- peek
if isDigit (BS.head t)
then Choice <$> (3 >< int)
else End . readLower . BS.unpack <$> str
readLower :: Read a => String -> a
readLower = read . onHead toUpper
onHead :: (a -> a) -> [a] -> [a]
onHead _ [] = []
onHead f (x : xs) = f x : xs

(readLower and onHead are functions in my personal competitive
programming template, included here for completeness).
One more piece of boilerplate we can write at this point is the main
function, which simply consists of running the Scanner to read all the
test cases, solving each test case, and formatting the output.
DP + topsort with a lazy recursive map
With all that framework out of the way, we can turn to actually solving the problem. And here is where something really fun happens. In a typical imperative language, we would have to first topologically sort the book sections, then use dynamic programming to compute the number of good stories beginning at each section, starting with the leaves and proceeding backwards through the topological sort to the start—dozens of lines of code. However, in Haskell we can get all of this for free, just by defining a lazy, recursive map!
solve :: Book -> Int
solve book = endings ! 1
where
endings = M.fromList [(p, endingsFrom (book!p)) | p <- M.keys book]
endingsFrom (End d) = if d == Favourably then 1 else 0
endingsFrom (Choice ps) = sum $ map (endings !) ps

endings is a Map from each book section to the number of favorable
stories starting with that section. Notice how its values are defined
via the endingsFrom function, which is in turn defined, in the
Choice case, by looking up the values of the choices in the
endings map and summing them. endings is thus defined
recursively, which works because it is lazy in the values. When we
demand the value of endings ! 1, the runtime system starts evaluating
thunks in the map as needed, implicitly doing a topological sort for us.
Here’s another way to think about this: what we really want is the
function endingsFrom : Section -> Int, which tells us how many good
endings there are starting at a given section. It can be defined via a
recurrence; however, if we were to literally implement it as a
recursive function, our program would spend a ridiculous amount of
time recomputing the same values over and over again. So, we insert a
lazy map in the middle to memoize it (there are other data
structures
that can be used for this purpose as well).
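As a standalone illustration of the same trick (my own example, not from the problem), here is a more familiar recurrence memoized with a lazy Map:

```haskell
import qualified Data.Map as M

-- Memoize a recurrence with a Map whose values lazily refer back to
-- the Map itself; demanding an entry forces exactly the needed thunks.
fibMemo :: Int -> Integer
fibMemo n = table M.! n
  where
    table = M.fromList [(i, fib i) | i <- [0 .. n]]
    fib 0 = 0
    fib 1 = 1
    fib i = table M.! (i - 1) + table M.! (i - 2)
```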
Resources
Here are some resources in case you’re interested in exploring more.
- Open Kattis has a collection of thousands of high-quality problems which can be solved in Haskell (or many other languages). If you just want to try solving some problems for fun, it’s a great place to start.
- There are also other sites which accept Haskell, such as Codeforces. Check these out if you want to actually participate in timed contests.
- My public listing of Kattis problems I have solved, with my own personal rating system.
- I’ve written a series of blog posts about competitive programming in Haskell, on a variety of topics.
- I also have a repository of modules I’ve developed specifically for competitive programming. Many of the modules are documented in one or more blog posts.
- Soumik Sarkar has an even larger collection of Haskell libraries for competitive programming.
I haven’t written on here in a while, mostly because a lot of my time has gone into preparing for the second annual Hendrix College Programming Contest, which will take place this Saturday, March 15, from 12:30-5:30pm CDT (17:30-22:30 UTC).
I’ve created an open mirror contest which will run in parallel to the official contest, so if you want to grab some friends and try solving some of the problems together using your favorite language, be my guest!
My paper, You could have invented Fenwick trees, has just been published as a Functional Pearl in the Journal of Functional Programming. This blog post is an advertisement for the paper, which presents a novel way to derive the Fenwick tree data structure from first principles.
Suppose we have a sequence of integers \(a_1, \dots, a_n\) and want to be able to perform two operations:
- we can update any \(a_i\) by adding some value \(v\) to it; or
- we can perform a range query, which asks for the sum of the values \(a_i + \dots + a_j\) for any range \([i,j]\).
There are several ways to solve this problem. For example:
- We could just keep the sequence of integers in a mutable array. Updating is \(O(1)\), but range queries are \(O(n)\) since we must actually loop through the range and add up all the values.
- We could keep a separate array of prefix sums on the side, so that \(P_i\) stores the sum \(a_1 + \dots + a_i\). Then the range query on \([i,j]\) can be computed as \(P_j - P_{i-1}\), which only takes \(O(1)\); however, updates now take \(O(n)\) since we must also update all the prefix sums which include the updated element.
- We can get the best of both worlds using a segment tree, a binary tree storing the elements at the leaves, with each internal node caching the sum of its children. Then both update and range query can be done in \(O(\lg n)\).
I won’t go through the details of this third solution here, but it is relatively straightforward to understand and implement, especially in a functional language.
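To give the flavor, here is one way a segment tree for range sums might look in Haskell (my own sketch, assuming for simplicity that the length is a power of two; the paper develops this more carefully):

```haskell
-- Each internal node caches its subtree's size and sum.
data SegTree = Leaf Integer | Node Int Integer SegTree SegTree

size :: SegTree -> Int
size (Leaf _) = 1
size (Node n _ _ _) = n

total :: SegTree -> Integer
total (Leaf v) = v
total (Node _ s _ _) = s

node :: SegTree -> SegTree -> SegTree
node l r = Node (size l + size r) (total l + total r) l r

-- Build from a nonempty, power-of-two-length list (assumed).
build :: [Integer] -> SegTree
build [x] = Leaf x
build xs  = node (build l) (build r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Add v to element i (1-indexed), rebuilding O(lg n) nodes.
update :: Int -> Integer -> SegTree -> SegTree
update _ v (Leaf x) = Leaf (x + v)
update i v (Node _ _ l r)
  | i <= size l = node (update i v l) r
  | otherwise   = node l (update (i - size l) v r)

-- Sum of elements 1..i.
prefixSum :: Int -> SegTree -> Integer
prefixSum 0 _ = 0
prefixSum _ (Leaf x) = x
prefixSum i (Node n s l r)
  | i >= n      = s
  | i <= size l = prefixSum i l
  | otherwise   = total l + prefixSum (i - size l) r

-- Range query [i,j] as a difference of two prefix sums.
rangeSum :: Int -> Int -> SegTree -> Integer
rangeSum i j t = prefixSum j t - prefixSum (i - 1) t
```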
However, there is a fourth solution, known as a Fenwick tree or Fenwick array, independently invented by Ryabko (1989) and Fenwick (1994). Here’s a typical Java implementation of a Fenwick tree:
class FenwickTree {
private long[] a;
public FenwickTree(int n) { a = new long[n+1]; }
public long prefix(int i) {
long s = 0;
for (; i > 0; i -= LSB(i)) s += a[i];
return s;
}
public void update(int i, long delta) {
for (; i < a.length; i += LSB(i)) a[i] += delta;
}
public long range(int i, int j) {
return prefix(j) - prefix(i-1);
}
public long get(int i) { return range(i,i); }
public void set(int i, long v) { update(i, v - get(i)); }
private int LSB(int i) { return i & (-i); }
}

I know what you’re thinking: what the heck!? There are some loops adding and
subtracting LSB(i), which is defined as the bitwise AND of i and
-i? What on earth is this doing? Unless you have seen this
before, this code is probably a complete mystery, as it was for me the
first time I encountered it.
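As a quick aside, LSB(i) = i & (-i) isolates the lowest set bit of i in two’s complement; a small Haskell sketch (my own) traces the array indices the prefix loop visits:

```haskell
import Data.Bits ((.&.))

-- In two's complement, i .&. (-i) keeps only the lowest set bit of i.
lsb :: Int -> Int
lsb i = i .&. negate i

-- The indices visited by prefix(i): strip the lowest bit until 0.
prefixIndices :: Int -> [Int]
prefixIndices = takeWhile (> 0) . iterate (\i -> i - lsb i)
```

For example, prefixIndices 13 is [13, 12, 8]: 1101, 1100, 1000 in binary.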
However, from the right point of view, we can derive this mysterious imperative code as an optimization of segment trees. In particular, in my paper I show how we can:
- Start with a segment tree.
- Delete some redundant info from the segment tree, and shove the remaining values into an array in a systematic way.
- Define operations for moving around in the resulting Fenwick array by converting array indices to indices in a segment tree, moving around the tree appropriately, and converting back.
- Describe these operations using a Haskell EDSL for infinite-precision 2’s complement binary arithmetic, and fuse away all the intermediate conversion steps, until the above mysterious implementation pops out.
- Profit.
I may be exaggerating step 5 a teensy bit. But you’ll find everything else described in much greater detail, with pretty pictures, in the paper! The official JFP version is here, and here’s an extended version with an appendix containing an omitted proof.