| CARVIEW |
Differentiating Voices from Music
We begin by modeling a musical voice, which is, roughly speaking, the abstract version of a human voice. The voice can be doing one thing at a time, or can choose to not be doing anything.
Voices are modeled by step functions, which are divisions of the real line into discrete chunks. We interpret each discrete chunk as a note being played by the voice for the duration of the chunk.
This gives rise to a nice applicative structure that I alluded to in my previous post:
liftA2 f
|---- a ----|-- b --|
|-- x --|---- y ----|
=
|- fax -|fay|- fby -|
where we take the union of the note boundaries in order to form the applicative. If either voice is resting, so too is the applicative. There is also an Alternative instance here, which chooses the first non-rest.
There is a similar monoidal structure here, where multiplication is given by “play these two things simultaneously,” relying on an underlying Semigroup instance for the meaning of “play these two things:”
If either voice is resting, we treat its value as mempty, and can happily combine the two parts in parallel.
All of this gives rise to the following rich structure:
newtype Voice a = Voice { unVoice :: SF Time (Maybe a) }
deriving stock
(Eq, Ord, Show, Functor, Foldable, Traversable, Generic, Generic1)
deriving newtype
(Semigroup, Monoid)
deriving
Applicative
via Compose (SF Time) Maybe
instance Filterable Voice
instance Witherable Voice
instance Alternative Voice
-- | Delay a voice by some amount of time.
delayV :: Time -> Voice a -> Voice aFrom Voices to Music
Voices, therefore, give us our primitive notion of monophony. But real music usually has many voices doing many things, independently. This was a point in which I got stuck in my previous post.
The solution here, is surprisingly easy. Assign a Voice to each voice name:
newtype Music v a = Music
{ getVoices :: v -> Voice a
}
deriving stock
(Functor, Generic, Functor)
deriving newtype
(Semigroup, Monoid)
deriving (Applicative, Alternative)
via Compose ((->) v) Voice
instance Profunctor Music
instance Finite v => Foldable (Music v)
instance Finite v => Traversable (Music v)
instance Filterable (Music v)
instance Finite v => Witherable (Music v)We get an extremely rich structure here completely for free. Our monoid combines all voices in parallel; our applicative combines voices pointwise; etc. However, we also have a new Profunctor Music instance, whose characteristic lmap :: (b -> c) -> Music c a -> Music b a method allows us to trade lines between voices.
In addition to the in-parallel monoid instance, we can also define a tile product operator over Music v a, which composes things sequentially1:
duration :: Music v a -> Time
(##) :: Semigroup a => Music v a -> Music v a -> Music v a
m@(Music m1) ## Music m2 =
Music $ liftA2 (<>) m1 $ fmap (delayV $ duration m) m2The Semigroup a constraint on (##) arises from the fact that the pieces of music might extend off to infinity in either direction (which pure @Voice must do), and we need to deal with that.
There are a few other combinators we care about. First, we can lift anonymous voices (colloquially “tunes”) into multi-part Music:
voice :: Eq v => v -> Music () a -> Music v a
voice v (Music sf) = Music $
\case
((== v) -> True) -> sf ()
_ -> memptyand we can assign the same line to everyone:
Writing Lines
The primitives for building little tunes are
which you can then compose sequentially via (##), and assign to voices via voice.
Harmonic Constraints
One of the better responses to my last blog post was a link to Dmitri Tymoczko’s FARM 2024 talk.
There’s much more in this video than I can possibly due justice to here, but my big takeaway was that this guy is thinking about the same sorts of things that I am. So I dove into his work, and that lead to his quadruple hierarchy:
Voices move within chords, which move within scales, which move within macro-harmonies.
Tymoczko presents a T algebra which is a geometric space for reasoning about voice leadings. He’s got a lot of fun websites for exploring the ideas, but I couldn’t find an actual implementation of the idea anywhere, so I cooked one up myself.
The idea here is that we have some T :: [Nat] -> Type which describes a hierarchy of abstract scales moving with respect to one another. For example, the Western traditional of having triads move within the diatonic scale, which moves within the chromatic scale, would be represented as T '[3, 7, 12]. T forms a monoid, and has some simple generators that give rise to smooth voice leadings (chord changes.)
Having a model for smooth harmonic transformations means we can use it constructively. I am still working out the exact details here, but the rough shape of the idea is to build an underlying field of key changes (represented as smooth voice leadings in T '[7, 12]):
We can then make an underlying field of functional harmonic changes (chord changes), modeled as smooth voice leadings in T '[3, 7]:
Our voices responsible for harmony can now be written as values of type
and we can use the applicative musical structure to combine the elements together:
{-# LANGUAGE ApplicativeDo #-}
extend :: T ns -> T (ns ++ ms)
sink :: T ns -> T (n ': ns)
harmony :: Music h (Set (T '[3, 7, 12]))
harmony = do
k <- everyone keyChanges
c <- everyone chordChanges
m <- harmonicVoices
pure $ extend m <> extend c <> sink kwhich we can later fmap out into concrete pitches. The result is that we can completely isolate the following pieces:
- key changes
- chord changes
- how voices express the current harmony
- the rhythms of all of the above
and the result is guaranteed to compose in a way that the ear can interpret as music. Not necessarily good music, but undeniably as music.
The type indices on T are purely for my book-keeping, and nothing requires them to be there. Which means we could also use the applicative structure to modulate over different sorts of harmony (eg, move from triads to seventh chords.)
Melody: Still an Open Question
I haven’t quite gotten a feel for melody yet; I think it’s probably in T '[7, 12], but it would be nice to be able to target chord tones as well. Please let me know in the comments if you have any insight here.
However, I have been thinking about contouring, which is the overall “shape” of a musical line. Does it go up, and peak in the middle, and then come down again? Or maybe it smoothly descends down.
We can use the discrete intervals intrinsic inside of Voices to find “reasonable” times to sample them. In essence this assigns a Time to each segment:
and we can then use these times to then sample a function Time -> b. This then allows us to apply contours (given as regular Real -> Real functions) to arbitrary rhythms. I currently have this typed as
where a ~ T something, and the outputted Reals get rounded to their nearest integer values. I’m not deeply in love with this type, but the rough idea is great—turn arbitrary real-valued functions into musical lines. This generalizes contouring, as well as scale runs.
Next Steps?
I’m writing all of this up because I go back to work on Monday and life is going to get very busy soon. I’m afraid I won’t be able to finish all of this!
The types above I’m pretty certain are relatively close to perfect. They seem to capture everything I could possibly want, and nothing I don’t want. Assuming I’m right about that, they must make up the basis of musical composition.
The next step therefore is to build musical combinators on top. One particular combinator I’ve got my eye on is some sort of general ~> “get from here to there” operator:
which I imagine would bridge a gap between the end of one piece of music with beginning of another. I think this would be roughly as easy as moving each voice linearly in T space from where it was to where it needs to be. This might need to be a ternary operation in order to also associate a rhythmic pattern to use for the bridge.
But I imagine (~>) would be great for lots of dumb little musical things. Like when applied over the chord dimension, it would generate arpeggios. Over the scale dimension, it would generate runs. And it would make chromatic moves in the chroma dimension.
Choosing exactly what moves to make for Ts consisting of components in multiple axes might just be some bespoke order, or could do something more intelligent. I think the right approach would be to steal diagrams’ idea of an Envelope, and attach some relevant metadata to each Music. We could then write (~>) as a function of those envelopes, but I must admit I don’t quite know what this would look like.
As usual, I’d love any insight you have! Please leave it in the comments. Although I must admit I appreciate comments of the form “have you tried $X” much more than of the form “music is sublime and you’re an idiot for even trying this.”
Happy new year!
Strictly speaking, the tile product can also do parallel composition, as well as sychronizing composition, but that’s not super important right now.↩︎
What Music Is Not
It’s tempting to gesticulate wildly, saying that music is merely a function from time to wave amplitudes, eg something of the form:
While I think it’s fair to say that this is indeed the underlying denotation of sound, this is clearly not the denotation of music. For example, we can transpose a song up a semitone without changing the speed—something that’s very challenging without a great deal of in the waveform representation. And we can play a musical phrase backwards, which is probably impossible in a waveform for any timbral envelope.
Since we have now two examples of “reasonable to want to do” with musical objects, which cannot be expressed in terms of a function Time -> Amplitude, we must conceed that waveforms-over-time cannot be the denotation of music.
What Music Might Be
Music is obviously temporal, so keeping the “function from time” part seems relevant. But a function from time to what? As a first attempt:
which, for a given time, returns a set of notes starting at that time, and how long they ought to be played for. An immediate improvement would be to parameterize the above over notes:
It’s tempting to try to eliminate more of the structure here with our parametricity, but I was unable to do so. In contrapuntal music, we will want to be able to express two notes starting at the same moment, but ending at different times.
One alluring path here could to write monophonic voices, and combine them together for polyphony:
Such an encoding has many unfavorable traits. First, it just feels yucky. Why are there two layers of Time? Second, now I-as-a-composer need to make a choice of which voice I put each note in, despite the fact that this is merely an encoding quirk. So no, I don’t think this is a viable path forward.
So let’s return to our best contender:
This definition is trivially a monoid, pointwise over the time structure:
If we think about abstract sets here, rather than Data.Set.Set, such an object is clearly a functor. There are many possible applicatives here, but the pointwise zipper seems most compelling to me. Pictorally:
pure a
=
|----- a ----forever...
liftA2 f
|---- a ----|-- b --|
|-- x --|---- y ----|
=
|- fax -|fay|- fby -|
Such an applicative structure is quite nice! It would allow us to “stamp” a rhythm on top of a pure representation of a melody.
However, the desirability of this instance is a point against μ (Music a) = Time -> Set (Duration, a), since by Conal Elliott’s typeclass morphism rule, the meaning of the applicative here ought to be the applicative of the meaning. Nevertheless, any other applicative structure would be effecitvely useless, since it would require the notes on one side to begin at the same time as the notes on the other. To sketch:
-- bad instance!
liftA2 f (Music a) (Music b) =
Music (liftA2 (\(d1, m) (d2, n) -> (d1 <> d2, f m n)) a b)Good luck finding a musically meaningful pure for such a thing!
Ok, so let’s say we commit to the pointwise zippy instance as our applicative instance. Is there a corresponding monad? Such a thing would substitute notes with more music. My first idea of what to do with such a thing would be to replace chords with texture. For example, we could replace chords with broken chords, or with basslines that target the same notes.
Anyway, the answer is yes, there is such a monad. But it’s musically kinda troublesome. Assume we have the following function:
which will convert a Music a into its notes and an optional temporal interval (optional because pure goes on forever.) Then, we can write our bind as:
m >>= f = flip foldMap (notes m) \case
(Nothing, a) -> f a
(Just (start, duration), a) ->
offset start $ _ duration $ f awhere offset changes when a piece of music occurs. We are left with a hole of type:
whose semantics sure better be that it forces the given Music to fit in the alotted time. There are two reasonable candidates here:
where scaleTo changes the local interpretation of time such that the entire musical argument is played within the given duration, and truncate just takes the first Duration’s worth of time. Truncate is too obviously unhelpful here, since the >>= continuation doesn’t know how much time it’s been given, and thus most binds will drop almost all of their resulting music.
Therefore we will go with scaleTo. Which satisfies all of the algebraic (monad) laws, but results in some truly mystifying tunes. The problem here is that this is not an operation which respects musical meter. Each subsequent bind results in a correspondingly smaller share of the pie. Thus by using only bind and mconcat, it’s easy to get a bar full of quarter notes, followed by a bar of sixty-fourth notes, followed by two bars full of of 13-tuplets. If you want to get a steady rhythm out of the whole thing, you need a global view on how many binds deep you’re ever going to go, and you need to ensure locally that you only produce a small powers-of-two number of notes, or else you will accidentally introduce tuplets.
It’s a mess. But algebraically it’s fine.
What Music Seems Like It Should Be
The above foray into monads seems tentatively promising for amateur would-be algorithmic composers (read: people like me.) But I have been reading several books on musical composition lately, and my big takeaway from them is just how damn contextual notes are.
So maybe this means we want more of a comonadic interface. One in which you can extend every note, by taking into account all of the notes in its local vicinity. This feels just as right as the monadic approach does, albeit in a completely different way. Being able to give a comonad instance for Music would require us to somehow reckon with having only a single a at any given time. Which appeals to my functional programmer soul, but again, I don’t know how to do it.
But imagine if we did have a comonadic instance. We could perform voice leading by inspecting what the next note was, and by futzing around with our pitch. We could do some sort of reharmonization by shifting notes around according to what else is happening.
But maybe all of this is just folly.
Music as it’s actually practiced doesn’t seem to have much of the functionaly-compositional properties we like—ie, that we can abstract and encapsulate. But music doesn’t appear to be like that! Instead, a happy melody takes a different character when played on major vs minor chords. Adding a dissonant interval can completely reconceptualize other notes.
It feels like a bit of a bummer to end like this, but I don’t really know where to go from here. I’ve worked something like six completely-different approaches over the last few months, and what’s documented here is the most promising bits and pieces. My next thought is that maybe music actually forms a sheaf, which is to say that it is a global solution that respects many local constraints.
All of this research into music has given me much more thoughts about music qua music which I will try to articulate the next time I have an evening to myself. Until then.
]]>SELECT statement. When you run that query in postgres, you get the output of your program.
Why have I done this? Because I needed a funny compilation target to test out the actual features of the language, which is that its intermediary language is a bunch of abstract category theory nonsense. Which I’ll get to. But I’m sure you first want to see this bad boy in action.
Behold, the function that returns 100 regardless of what input you give it. But it does it with the equivalent of a while loop:
count : Int -> Int
count =
x ->
loop x
i ->
n <- join id id -< i
z <- abs . (-) -< (n, 100)
case z of
inl _ -> inr . (+) -< (n, 1)
inr _ -> inl -< nIf you’re familiar with arrow notation, you’ll notice the above looks kinda like one big proc block. This is not a coincidence (because nothing is a coincidence). I figured if I were to go through all of this work, we might as well get a working arrow desugarer out of the mix. But I digress; that’s a story for another time.
Anyway, what’s going on here is we have an arrow count, which takes a single argument x. We then loop, starting from the value of x. Inside the loop, we now have a new variable i, which we do some voodoo on to compute n—the current value of the loop variable. Then we subtract 100 from n, and take the absolute value. The abs function here is a bit odd; it returns Left (abs x) if the input was negative, and Right x otherwise. Then we branch on the output of abs, where Left and Right have been renamed inl and inr respectively. If n - 100 was less than zero, we find ourselves in the inl case, where we add 1 to n and wrap the whole thing in inr—which the loop interprets as “loop again with this new value.” Otherwise, n - 100 was non-negative, and so we can return n directly.
Is it roundabout? You bet! The obtuseness here is not directly a feature, I was just looking for conceptually simple things I could do which would be easy to desugar into category-theoretical stuff. Which brings us to the intermediary language. After desugaring the source syntax for count above, we’re left with this IL representation:
id △ id
⨟ cochoice
( undist
⨟ ( (prj₁ ⨟ id ▽ id) △ id
⨟ ( prj₁ △ 100
⨟ (-)
⨟ abs
)
△ id
⨟ prj₁ △ id
⨟ dist
⨟ ( (prj₂ ⨟ prj₂ ⨟ prj₁) △ 1
⨟ (+)
⨟ inr
)
▽ ( prj₂
⨟ prj₂
⨟ prj₁
⨟ inl
)
)
△ prj₂
⨟ dist
)
⨟ prj₁
We’ll discuss all of this momentarily, but for now, just let your eyes glaze over the pretty unicode.
The underlying idea here is that each of these remaining symbols has very simple and specific algebraic semantics. For example, A ⨟ B means “do A and pipe the result into B.” By giving a transformation from this categorical IL into other domains, it becomes trivial to compile catlang to all sorts of weird compilation targets. Like SQL.
You’re probably wondering what the generated SQL looks like. Take a peek if you dare.
Ungodly Compiled SQL
SELECT
f0 AS f0
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(SELECT *
FROM
(WITH t0 AS
(SELECT *
FROM
(WITH RECURSIVE recursion AS
(SELECT
clock_timestamp() as step
, *
FROM
(WITH t1 AS
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1, NULL::integer AS f2, NULL::integer AS f3
FROM
(WITH t2 AS
(SELECT * FROM (SELECT 0 as f0) AS _)
SELECT *
FROM
(SELECT * FROM (SELECT f0 AS f0 FROM t2 AS _) AS _
CROSS JOIN
(SELECT f0 AS f1 FROM t2 AS _))
AS _)
AS _)
AS _)
SELECT *
FROM
(WITH t3 AS
(SELECT *
FROM
(-- undist
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1, f1 AS f2
FROM
(-- undist1
SELECT * FROM t1 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, f2 AS f1, f3 AS f2
FROM
(-- dist2
SELECT * FROM t1 AS _ WHERE "f2" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(WITH t4 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t5 AS
(SELECT * FROM t3 AS _)
SELECT *
FROM
(WITH t6 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(WITH t7 AS
(SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t5 AS _) AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(-- join1
SELECT * FROM t7 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f1 AS f0
FROM
(-- join2
SELECT * FROM t7 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t5 AS _))
AS _)
SELECT *
FROM
(WITH t8 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t9 AS
(SELECT *
FROM
(SELECT
f0 - f1 AS f0
FROM
(WITH t10 AS
(SELECT * FROM t6 AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t10 AS _) AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t10 AS _) AS _))
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
abs(f0) as f0, NULL::integer as f1
FROM
t9
AS _
WHERE
f0 < 0)
AS _
UNION
SELECT *
FROM
(SELECT NULL::integer as f0, f0 as f1 FROM t9 AS _ WHERE f0 >= 0)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t6 AS _))
AS _)
SELECT *
FROM
(WITH t11 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(SELECT f0 AS f0, f1 AS f1 FROM t8 AS _)
AS _)
AS _
CROSS JOIN
(SELECT
f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
FROM
t8
AS _))
AS _)
SELECT *
FROM
(WITH t12 AS
(SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
FROM
(-- dist1
SELECT * FROM t11 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
FROM
(-- dist2
SELECT * FROM t11 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
NULL::integer AS f0, f0 AS f1
FROM
(SELECT
f0 + f1 AS f0
FROM
(WITH t13 AS
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
FROM
(-- join1
SELECT * FROM t12 AS _
WHERE
("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
t13
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t13 AS _) AS _))
AS _)
AS _)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
(SELECT
f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
FROM
(-- join2
SELECT * FROM t12 AS _
WHERE
("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t3 AS _) AS _))
AS _)
SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
FROM
(-- dist1
SELECT * FROM t4 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
FROM
(-- dist2
SELECT * FROM t4 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
UNION ALL
SELECT
clock_timestamp() as step
, *
FROM
(SELECT *
FROM
(WITH t14 AS
(SELECT * FROM recursion AS _)
SELECT *
FROM
(WITH t15 AS
(SELECT *
FROM
(-- undist
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1, f1 AS f2
FROM
(-- undist1
SELECT * FROM t14 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, f2 AS f1, f3 AS f2
FROM
(-- dist2
SELECT * FROM t14 AS _ WHERE "f2" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(WITH t16 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t17 AS
(SELECT * FROM t15 AS _)
SELECT *
FROM
(WITH t18 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(WITH t19 AS
(SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t17 AS _) AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(-- join1
SELECT * FROM t19 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f1 AS f0
FROM
(-- join2
SELECT * FROM t19 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t17 AS _))
AS _)
SELECT *
FROM
(WITH t20 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t21 AS
(SELECT *
FROM
(SELECT
f0 - f1 AS f0
FROM
(WITH t22 AS
(SELECT * FROM t18 AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t22 AS _) AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t22 AS _) AS _))
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
abs(f0) as f0, NULL::integer as f1
FROM
t21
AS _
WHERE
f0 < 0)
AS _
UNION
SELECT *
FROM
(SELECT NULL::integer as f0, f0 as f1 FROM t21 AS _ WHERE f0 >= 0)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t18 AS _))
AS _)
SELECT *
FROM
(WITH t23 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(SELECT f0 AS f0, f1 AS f1 FROM t20 AS _)
AS _)
AS _
CROSS JOIN
(SELECT
f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
FROM
t20
AS _))
AS _)
SELECT *
FROM
(WITH t24 AS
(SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
FROM
(-- dist1
SELECT * FROM t23 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
FROM
(-- dist2
SELECT * FROM t23 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
NULL::integer AS f0, f0 AS f1
FROM
(SELECT
f0 + f1 AS f0
FROM
(WITH t25 AS
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
FROM
(-- join1
SELECT * FROM t24 AS _
WHERE
("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
t25
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t25 AS _) AS _))
AS _)
AS _)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
(SELECT
f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
FROM
(-- join2
SELECT * FROM t24 AS _
WHERE
("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t15 AS _) AS _))
AS _)
SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
FROM
(-- dist1
SELECT * FROM t16 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
FROM
(-- dist2
SELECT * FROM t16 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
WHERE
("f2" IS NOT NULL) AND ("f3" IS NOT NULL))
AS _)
SELECT * FROM recursion ORDER BY step DESC LIMIT 1)
AS _)
SELECT *
FROM
(WITH t26 AS
(SELECT *
FROM
(-- undist
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1, f1 AS f2
FROM
(-- undist1
SELECT * FROM t0 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, f2 AS f1, f3 AS f2
FROM
(-- dist2
SELECT * FROM t0 AS _ WHERE "f2" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(WITH t27 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t28 AS
(SELECT * FROM t26 AS _)
SELECT *
FROM
(WITH t29 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(WITH t30 AS
(SELECT * FROM (SELECT f0 AS f0, f1 AS f1 FROM t28 AS _) AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(-- join1
SELECT * FROM t30 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f1 AS f0
FROM
(-- join2
SELECT * FROM t30 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1, f1 AS f2, f2 AS f3 FROM t28 AS _))
AS _)
SELECT *
FROM
(WITH t31 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(WITH t32 AS
(SELECT *
FROM
(SELECT
f0 - f1 AS f0
FROM
(WITH t33 AS
(SELECT * FROM t29 AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT f0 AS f0 FROM (SELECT f0 AS f0 FROM t33 AS _) AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 100 as f0 FROM t33 AS _) AS _))
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
abs(f0) as f0, NULL::integer as f1
FROM
t32
AS _
WHERE
f0 < 0)
AS _
UNION
SELECT *
FROM
(SELECT NULL::integer as f0, f0 as f1 FROM t32 AS _ WHERE f0 >= 0)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5 FROM t29 AS _))
AS _)
SELECT *
FROM
(WITH t34 AS
(SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1
FROM
(SELECT f0 AS f0, f1 AS f1 FROM t31 AS _)
AS _)
AS _
CROSS JOIN
(SELECT
f0 AS f2, f1 AS f3, f2 AS f4, f3 AS f5, f4 AS f6, f5 AS f7
FROM
t31
AS _))
AS _)
SELECT *
FROM
(WITH t35 AS
(SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f10, NULL::integer AS f11, NULL::integer AS f12, NULL::integer AS f13, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5, f7 AS f6, NULL::integer AS f7, NULL::integer AS f8, NULL::integer AS f9
FROM
(-- dist1
SELECT * FROM t34 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f4 AS f10, f5 AS f11, f6 AS f12, f7 AS f13, NULL::integer AS f2, NULL::integer AS f3, NULL::integer AS f4, NULL::integer AS f5, NULL::integer AS f6, f1 AS f7, f2 AS f8, f3 AS f9
FROM
(-- dist2
SELECT * FROM t34 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
NULL::integer AS f0, f0 AS f1
FROM
(SELECT
f0 + f1 AS f0
FROM
(WITH t36 AS
(SELECT *
FROM
(SELECT
f0 AS f0, f1 AS f1, f2 AS f2, f3 AS f3, f4 AS f4, f5 AS f5, f6 AS f6
FROM
(-- join1
SELECT * FROM t35 AS _
WHERE
("f0" IS NOT NULL) AND ((("f1" IS NOT NULL) OR ("f2" IS NOT NULL)) AND (("f3" IS NOT NULL) AND ((("f4" IS NOT NULL) OR ("f5" IS NOT NULL)) AND ("f6" IS NOT NULL)))))
AS _)
AS _)
SELECT *
FROM
(SELECT *
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
t36
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f1 FROM (SELECT 1 as f0 FROM t36 AS _) AS _))
AS _)
AS _)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
f0 AS f0, NULL::integer AS f1
FROM
(SELECT
f0 AS f0
FROM
(SELECT
f2 AS f0, f3 AS f1, f4 AS f2, f5 AS f3
FROM
(SELECT
f1 AS f0, f2 AS f1, f3 AS f2, f4 AS f3, f5 AS f4, f6 AS f5
FROM
(SELECT
f7 AS f0, f8 AS f1, f9 AS f2, f10 AS f3, f11 AS f4, f12 AS f5, f13 AS f6
FROM
(-- join2
SELECT * FROM t35 AS _
WHERE
("f7" IS NOT NULL) AND ((("f8" IS NOT NULL) OR ("f9" IS NOT NULL)) AND (("f10" IS NOT NULL) AND ((("f11" IS NOT NULL) OR ("f12" IS NOT NULL)) AND ("f13" IS NOT NULL)))))
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
CROSS JOIN
(SELECT f0 AS f2 FROM (SELECT f2 AS f0 FROM t26 AS _) AS _))
AS _)
SELECT *
FROM
(-- dist
SELECT *
FROM
(SELECT
f0 AS f0, f2 AS f1, NULL::integer AS f2, NULL::integer AS f3
FROM
(-- dist1
SELECT * FROM t27 AS _ WHERE "f0" IS NOT NULL)
AS _)
AS _
UNION
SELECT *
FROM
(SELECT
NULL::integer AS f0, NULL::integer AS f1, f1 AS f2, f2 AS f3
FROM
(-- dist2
SELECT * FROM t27 AS _ WHERE "f1" IS NOT NULL)
AS _)
AS _)
AS _)
AS _)
AS _)
AS _
WHERE
("f0" IS NOT NULL) AND ("f1" IS NOT NULL))
AS _)
AS _;It’s not pretty, rather amazingly, running the above query in postgres 17 will in fact return a single row with a single column whose value is 100. And you’d better believe it does it by actually looping its way up to 100. If you don’t believe me, make the following change:
which will instead return a row for each step of the iteration.
There are some obvious optimizations I could make to the generated SQL, but it didn’t seem worth my time, since that’s not the interesting part of the project.
What the Hell Is Going On?
Let’s take some time to discuss the underlying category theory here. I am by no means an expert, but what I have learned after a decade of bashing my head against this stuff is that a little goes a long way.
For our intents and purposes, we have types, and arrows (functions) between types. We always have the identity “do nothing arrow” id:
and we can compose arrows by lining up one end to another:1
Unlike Haskell (or really any programming language, for that matter), we DO NOT have the notion of function application. That is, there is no arrow:
You can only compose arrows, you can’t apply them. That’s why we call these things “arrows” rather than “functions.”
There are a bundle of arrows for working with product types. The two projection functions correspond to fst and snd, taking individual components out of pairs:
How do we get things into pairs in the first place? We can use the “fork” operation, which takes two arrows computing b and c, and generates a new arrow which generates a pair of (b, c):
If you’re coming from a Haskell background, it’s tempting to think of this operation merely as the (,) pair constructor. But you’ll notice from the type of the computation that there can be no data dependency between b and c, thus we are free to parallelize each side of the pair.
In category theory, the distinction between left and right sides of an arrow is rather arbitrary. This gives rise to a notion called duality where we can flip the arrows around, and get cool new behavior. If we dualize all of our product machinery, we get the coproduct machinery, where a coproduct of a and b is “either a or b, but definitely not both nor neither.”
Swapping the arrow direction of prj₁ and prj₂, and replacing (,) with Either gives us the following injections:
and the following “join” operation for eliminating coproducts:
Again, coming from Haskell this is just the standard either function. It corresponds to a branch between one of two cases.
As you can see, with just these eight operations, we already have a tremendous amount of expressivity. We can express data dependencies via ⨟ and branching via ▽. With △ we automatically encode opportunities for parallelism, and gain the ability to build complicated data structures, with prj₁ and prj₂ allowing us to get the information back out of the data structures.
You’ll notice in the IL that there are no variable names anywhere to be found. The desugaring of the source language builds a stack (via the something to allocate △ id pattern), and replaces subsequent variable lookups with a series of projections on the stack to find the value again. On one hand, this makes the categorical IL rather hard to read, but it makes it very easy to re-target! Many domains do have a notion of grouping, but don’t have a native notion of naming.
For example, in an electronic circuit, I can have a ribbon of 32 wires which represents an Int32. If I have another ribbon of 32 wires, I can trivially route both wires into a 64-wire ribbon corresponding to a pair of (Int32, Int32).
By eliminating names before we get to the IL, it means no compiler backend ever needs to deal with names. They can just work on a stack representation, and are free to special-case optimize series of projections if they are able to.
Of particular interest to this discussion is how we desugar loops in catlang. The underlying primitive is cochoice:
which magically turns an arrow on Eithers into an arrow without the eithers. We obviously must run that arrow on eithers. If that function returns inl, then we’re happy and we can just output that. But if the function returns inr, we have no choice but to pass it back in to the eithered arrow. In Haskell, cochoice is implemented as:
cochoiceHask :: (Either a c -> Either b c) -> a -> c
cochoiceHask f = go . Left
where
go :: Either a c -> b
go eac =
case f eac of
Left b -> b
Right c -> go (Right c)which as you can see, will loop until f finally returns a Left. What’s neat about this formulation of a loop is that we can statically differentiate between our first and subsequent passes through the loop body. The first time through eac is Left, while for all other times it is Right. We don’t take advantage of it in the original count program, but how many times have you written loop code that needs to initialize something its first time through?
Compiling to SQL
So that’s the underlying theory behind the IL. How can we compile this to SQL now?
As alluded to before, we simply need to give SQL implementations for each of the operations in the intermediary language. As a simple example, id compiles to SELECT * FROM {}, where {} is the input of the arrow.
The hardest part here was working out a data representation. It seems obvious to encode each element of a product as a new column, but what do we do about coproducts? After much work thought, I decided to flatten out the coproducts. So, for example, the type:
would be represented as three columns:
with the constraint that exactly one of f2 or f3 would be IS NOT NULL at any given point in time.
With this hammered out, almost everything else is pretty trivial. Composition corresponds to a nested query. Forks are CROSS JOINs which concatenate the columns of each sub-query. Joins are UNIONs, where we add a WHERE field IS NOT NULL clause to enforce we’re looking at the correct coproduct constructor.
Cochoice is the only really tricky thing, but it corresponds to a recursive CTE. Generating a recursive CTE table for the computation isn’t too hard, but getting the final value out of it was surprisingly tricky. The semantics of SQL tables is that they are multisets and come with an arbitrary greatest element. Which is to say, you need an column structured in a relevant way in order to query the final result. Due to some quirks in what postgres accepts, and in how I structured my queries, it was prohibitively hard to insert a “how many times have I looped” column and order by that. So instead I cheated and added a clock_timestamp() as step column which looks at the processor clock and ordered by that.
This is clearly a hack, and presumably will cause problems if I ever add some primitives which generate more than one row, but again, this is just for fun and who cares. Send me a pull request if you’re offended by my chicanery!
Stupid Directions To Go In the Future
I’ve run out of vacation time to work on this project, so I’m probably not going to get around to the meta-circular stupidity I was planning.
The compiler still needs a few string-crunching primitives (which are easy to add), but then it would be simple to write a little brainfuck interpreter in catlang. Which I could then compile to SQL. Now we’ve got a brainfuck interpreter running in postgres. Of course, this has been done by hand before, but to my knowledge, never via compilation.
There exist C to brainfuck compilers. And postgres is written in C. So in a move that would make Xzibit proud, we could run postgres in postgres. And of course, it would be fun to run brainfuck in brainfuck. That’d be a cool catlang backend if someone wanted to contribute such a thing.
Notes and Due Diligence and What Have You
I am not the first person to do anything like this. The source language of catlang is heavily inspired by Haskell’s arrow syntax, which in turn is essentially a desugaring algorithm for Arrows. Arrows are slightly the wrong abstraction because they require an operation arr :: (a -> b) -> (a ~> b)—which requires you to be able to embed Haskell functions in your category, something which is almost never possible.
Unfortunately, arrow syntax in Haskell desugars down to arr for almost everything it does, which in turn makes arrow notation effectively useless. In an ideal world, everything I described in this blog post would be a tiny little Haskell library, with arrow notation doing the heavy lifting. But that is just not the world we live in.
Nor am I the first person to notice that there are categorical semantics behind programming languages. I don’t actually know whom to cite on this one, but it is well-established folklore that the lambda calculus corresponds to cartesian-closed categories. The “closed” part of “cartesian-closed” means we have an operation eval :: (a ~> b, a) ~> b, but everyone and their dog has implemented the lambda calculus, so I thought it would be fun to see how far we can get without it. This is not a limitation on catlang’s turing completeness (since cochoice gives us everything we need.)
I’ve been thinking about writing a category-first programming language for the better part of a decade, ever since I read Compiling to Categories. That paper takes Haskell and desugars it back down to categories. I stole many of the tricks here from that paper.
Anyway. All of the code is available on github if you’re interested in taking a look. The repo isn’t up to my usual coding standards, for which you have my apologies. Of note is the template-haskell backend which can spit out Haskell code; meaning it wouldn’t be very hard to make a quasiquoter to compile catlang into what Haskell’s arrow desugaring ought to be. If there’s enough clamor for such a thing, I’ll see about turning this part into a library.
When looking at the types of arrows in this essay, we make the distinction that
~>are arrows that we can write in catlang, while->exist in the metatheory.↩︎
So I decided to rehash my understanding, and came up with something much conceptually clearer about what is happening and why.
A quick summary of Theorems for Free:
For any polymorphic type, we can generate a law that must hold for any value of that type.
One the examples given is for the function length :: forall a. [a] -> Int, which states that forall f l. length (fmap f l) = length l—namely, that fmap doesn’t change the length of the list.
Theorems for Free gives a roundabout and obtuse set of rules for computing these free theorems. But, as usual, the clarity of the idea is obscured by the encoding details.
The actual idea is this:
Parametrically-polymorphic functions can’t branch on the specific types they are instantiated at.
Because of this fact, functions must behave the same way, regardless of the type arguments passed to them. So all of the free theorems have the form “replacing the type variables before calling the function is the same as replacing the type variables after calling the function.”
What does it mean to replace a type variable? Well, if we want to replace a type variable a with a', we will generate a fresh function f :: a -> a', and then stick it wherever we need to.
For example, given the function id :: a -> a, we generate the free theorem:
or, for the function fromJust :: Maybe a -> a, we get:
This scheme also works for functions in multiple type parameters. Given the function swap :: (a, b) -> (b, a), we must replace both a and b, giving the free theorem:
In the special case where there are no type parameters, we don’t need to do anything. This is what’s happening in the length example given in the introduction.
Simple stuff, right? The obfuscation in the paper comes from the actual technique given to figure out where to apply these type substitutions. The paper is not fully general here, in that it only gives rules for the [] and (->) type constructors (if I recall correctly.) These rules are further obscured in that they inline the definitions of fmap, rather than writing fmap directly.1 But for types in one variable, fmap is exactly the function that performs type substitution.
Perhaps this paper predates typeclasses? Very possible.↩︎
Why doesn’t [the Data.Map function]
unionWith :: (a -> a -> a) -> Map k a -> Map k a -> Map k aallow for different value types the wayintersectionWith :: (a -> b -> c) -> Map k a -> Map k b -> Map k cdoes?
This is a very reasonable question, and it lead down an interesting rabbit hole of at the intersection of API design and efficient implementation.
To answer the original question, what would the type of a different value type of unionWith look like? It would be something in the flavor of:
But this new Maybe a -> Maybe b -> c parameter is somewhat lossy, in that it gives the impression that it could be called with Nothing Nothing as parameters, which doesn’t fit into the vibe of being a “union.”
So instead we could restrict that possibility by using These a b:
data These a b = This a | That b | These a b
unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k cwhich seems reasonable enough.
But let’s take reasonableness out of the picture and start again from first principles. Instead let’s ask ourselves the deep philsophical question of what even IS a map?
A Map k v is a particularly efficient implementation of functions with type k -> Maybe v. But why is this Maybe here? It’s really only to encode the “default” value of performing a lookup. Nothing goes wrong if we generalize this to be Monoid v => k -> v. In fact, it helps us make sense of the right bias present in Data.Map, where we see:
This equality is hard to justify under the normal understanding of Map k v being an encoding of a function k -> Maybe v. But under the general monoid interpretation, we get a nice semigroup homomorphism:
where the monoid in question has been specialized to be Last.
Of course, we also have a monoid homomorphism:
Let’s re-evaluate the original question in terms of this newly-generalized Map. Now that we’ve removed all of the unnecessary baggage of Maybe, we can again think about the desired type of unionWith:
which looks awfully familiar. This new type signature automatically resolves our original concerns about “what should we do if the key isn’t present?”—just call the function with mempty as a parameter!
We can give some semantics as to what unionWith ought to do again by relating it to the observation lookup. The relevant law here seems like it ought to be:
By choosing a degenerate function f, say, \_ _ -> nontrivial, where nontrivial is some value that is not mempty, we can see the beginnings of a problem:
Regardless of the key we lookup in our unionWithed Map, we need to get back nontrivial. How can we implement such a thing? I see only two ways:
- explicitly associate every key in the map with
nontrivial, or - keep
nontrivialaround as a default value in the map
#1 is clearly a non-starter, given that we want our Maps to be efficient encodings of functions, which leaves us with only #2. This is actually a pretty common construction, which stems immediately from the fact that a pair of monoids is itself a monoid. The construction would look something like this:
data Map k v = Map
{ defaultValue :: v
, implementation :: Data.Map.Map k v
}
deriving stock Generic
deriving (Semigroup, Monoid) via (Generically (Map k v))
unionWith
:: (a -> b -> c)
-> Map k a
-> Map k b
-> Map k c
unionWith f (Map def1 imp1) (Map def2 imp2) =
Map (f def1 def2) (liftA2 f imp1 imp2)Seems fine, right? The nail in the coffin comes from when we reintroduce our semigroup homomorphism:
Without loss of generalization, take m2 = pure nontrivial (where pure is just unionWith with a constant function.) This gives us:
Making this thing efficient is a further complication! We again have two options:
- modify the value at every key by multiplying in
nontrivial, or - finding a way of suspending this computation
#1 clearly requires \(O(n)\) work, which again forces us to look at #2. But #2 seems very challenging, because the monoidal values we need to suspend need not span the entire Map. For example, consider a Map constructed a la:
Representing this thing efficiently certainly isn’t impossible, but you’re not going to be able to do it on the balanced binary search trees that underlie the implementation of Data.Map.Map.
I find this quite an interesting result. I always assumed that Data.Map.Map (or at least, Data.Map.Monoidal.MonoidalMap) didn’t have an Applicative instance because it would require a Monoid constraint on its output—but that’s not the sort of thing we can express in Haskell.
But the analysis above says that’s not actually the reason! It’s that there can be no efficient implementation of Applicative, even if we could constrain the result.
What I find so cool about this style of analysis is that we didn’t actually write any code, nor did we peek into the implementation of Data.Map (except to know that it’s implemented as a balanced BST.) All we did was look at the obvious laws, instantiate them with degenerate inputs, and think about what would be required to to efficiently get the right answer.
Regardless of why, I thought I’d switch up the usual dance step today, and discuss what solving my most-recent-big-problem actually looked like, in terms of what I tried, where I looked, and what the timeline was.
The Problem
The problem is to serialize a program graph into a series of let-bindings. For example, given the following graph:
+
/ \
f ---> g
| / \
a \ /
expensive
which represents the program:
Unfortunately, this is a naive representation of the program, since it duplicates the work required to compute expensive four times, and g expensive expensive twice. Instead, we would prefer to generate the equivalent-but-more-efficient program:
This transformation is affectionately known as sharing, since it shares the computed answer whenever there is repeated work to be done.
So this is what we’re trying to do. Given the original graph, determine the best place to insert these let-bindings, for some reasonable definition of “best.” We can assume there are no side effects involved, so any place that an expression is well-scoped is an acceptable solution.
In order to understand some of my attempted solutions, it’s worth noting that our final solution should build something of type Expr, and the original graph is represented as a IntMap (ExprF Int). ExprF is the Base functor of Expr, with all of its self-references replaced by some type variable, in this case Int. Thus, the graph above looks much more like:
_ : IntMap (ExprF Int)
_ = IM.fromList
[ (0, Apply "+" [1, 3])
, (1, Apply "f" [2, 3]
, (2, ...) -- a
, (3, Apply "g" [4, 4])
, (4, ...) -- expensive
]The Original Solution
I spent over a year trying to solve this problem, with various mostly-working solutions during that time. My strategy here was to think really hard, write up some algorithm that seemed plausible, and then run it against our (small) battery of integration tests to make sure it got the same answer as before.
Why not property test it? I tried, but found it very challenging to implement well-typed generators that would reliably introduce shared thunks. But maybe there’s a different lesson to be learned here about writing good generators.
Anyway. For eight months, one of these think-really-hard algorithms fit the bill and didn’t give us any problems. It was a weird, bespoke solution to the problem that independetly kept track of all of the free variables in every graph fragment, and tried to let-bind a fragment as soon as we landed in a context where all of the free variables were in scope. It seemed to work, but it was extremely messy and unmaintainable.
At the time of writing, this sharing algorithm was the only source of let-binds in our entire language, which meant that it didn’t need to account for let-binds in the program.
Of course, that invariant eventually changed. We added a way in the source langauge to introduce lets, which meant my algorithm was wrong. And I had written it sufficiently long ago that I no longer remembered exactly why it worked. Which meant the theory of my program was lost, and thus that we ought to rewrite it.
Unfolding a Solution
I went back to the problem statement, and stared at it for a long time (back to the think-really-hard algorithm!) Upon staring at the problem, I realized that what I was really trying to do was determine where diamond patterns arose in the propgram graph.
Recall our original graph:
+
/ \
f ---> g
| / \
a \ /
expensive
If we redraw it such that g is on a different rank than f, then the two diamond patterns become much clearer:
+
/ \
f |
| \ |
a \ /
g
/ \
\ /
expensive
The insight I came up with is that if a node n is the source of a diamond, then we must let-bind the sink of the diamond immediately before inlining the definition of n.
This gives rise to the question of “how do we identify a diamond?” What we can do is give a mapping from each node to its reachable set of nodes. For example, in the above, we’d compute the map:
+ -> {+, f, a, g, expensive}
f -> {f, a, g, expensive}
a -> {a}
g -> {g, expensive}
expensive -> {expensive}
Then when we go to inline a node, say, +, we can look for any nodes that are reachable via more than one of its immediate subterms. Since the immediate subterms of + are f and g, we can take the intersections of their reachable sets:
{f, a, g, expensive} union {g, expensive}
giving us
{g, expensive}
which is exactly the set of nodes that we need to perform sharing on. If you topologically sort this set, it gives you the order that you should perform your let bindings.
EXCEPT there’s a kink in the whole thing. What happens if one of the terms in this diamond contains free variables? In particular, we might have something like this:
+
/ \
f |
| \ |
a \ /
λx
/ \
\ /
expensive
|
x
This gives us an analogous set of reachable nodes when we look at +, but we obviously can’t lift expensive x above the lambda.
Resolving this problem required giving up on the notion of memoizing the entire reachable set of nodes, and to instead crawl the graph ensuring that everything is well-scoped.
Performance Woes
My algorithm looked fine, and, importantly, got the right answer in a reasonable amount of time on our (small) battery of integration tests. So I shipped it, commended myself on a job well done, and thought nothing more about it. For about a week, until a bug report came in saying that our compiler now seemed to hang on big programs.
Which was something I hadn’t noticed, since we didn’t have any big programs in our integration tests.
Damn!
Upon digging in to what exactly was so slow, I noticed that my algorithm was accidentally quadratic. I needed to fold over every node in the graph, and that required looking at the entire reachable set underneath it. I had put in some of the obvious safeguards, hoping that they would prune the search tree early, but it wasn’t enough sacrifice for the Great God of Asymptotes.
Did I mention that at this point in the story, having this algorithm working fast was on the critical path of the company? Everybody else was blocked on me figuring this out. Talk about pressure!
Anyway. You’ll notice above that in my description of the algorithm, everything sounds fine. But the juice is in the details, as the common saying goes. Computing reachability isn’t quite the right thing to be using here, as it gave us the wrong answer for the lambda example above. Which is unfortunate because reachability is something we can do in linear time.
And then when reachability didn’t work, I just threw away the fast performance and hoped my bespoke algorithm would do the job. My only redemption comes from the fact that at least it got the right answer, even if it did so very slowly.
Finding the Kernel
Back to the drawing board.
Whenever I have graph theory problems, I call up my boy Vikrem. He’s good at nerd stuff like this.
We rubberducked the problem, and tried to reframe the problem in the language of graph theory. We had a Merkiv–Maguire moment where we indepdently realized that the goal was somehow related to finding the lowest common ancestor (LCA) of a node.
Which is to say, roughly, that we are looking for forks in the diamond diagram. Which we already knew, but it was nice to have some language for.
Our new problem is that LCA is defined only over trees. There are some extensions to DAGs, but none of them seem to be particularly well founded. However, searching for exactly that brought me to this stackoverflow question, where nestled in the comments is someone suggesting that the poster isn’t looking for LCA, but instead for a related notion the lowest single common ancestor. LSCA is defined in a 2010 paper New common ancestor problems in trees and directed acyclic graphs.
The standard definition of LCA(x, y) = l is that “l is an ancestor of x and of y, and that no descendent of l has this property.”
But the definition of LSCA(x, y) = l is that “l lies on all root-to-x paths, and that l lies on all root-to-y paths, and that no descendent of l has this property.”
The distinction between the two is easily seen in the following graph:
0
/ \
1 2
| X |
3 4
Under the standard definition, LCA is not uniquely defined for DAGs. That is, LCA(3, 4) = {1, 2}. But neither 1 nor 2 lies on all paths from the root. Under LSCA therefore we get LSCA(3, 4) = 0, which is the obviously-correct place to let-bind 3 and 4.
The paper gives a preprocessing scheme for computing LSCA by building a “lowest single ancestor” (LSA) tree. The LSA of a node is the LSCA of all of its in-edges. This definition cashes out to mean “the most immediate diamond above any node.” Finally! This is exactly what we’re looking for, since this is where we must insert our let-bindings! Even better, the paper gives us an algorithm for computing the LSA tree in linear time!
The First Implementer
Of course, I’m lazy and would prefer not to implement this thing. So instead I searched on hackage for lsca, and found nothing. But then I searched for lca and found that, like always, Ed Kmett was 13 years ahead of me.
The lca package implements an \(O(log n)\) algorithm for computing the LCA of any two nodes in a graph. Which is very convenient for me, since the LSCA algorithm requires being able to do this.
Time to roll up the sleeves and get cracking I suppose.
The paper was surprisingly straightforward, and my first attempt implemented the (imperative) algorithms as given (imperatively.) The first step is to do a topological sort on the DAG in order to know in which order one ought to unfold the LSA tree.
But as is so often the case, this topological sort isn’t actually relevant to the algorithm; it’s just an encoding detail of expressing the algorithm imperatively. But you don’t need that when you’ve got laziness on your side! Instead you can just tie the know and do something cool like this:
lsaTree :: Ord v => Map v (Set v) -> Map v (Path v)
lsaTree input = fix $ \result -> M.fromList $ do
(node, parents) <- M.toList input
let parentResults = fmap (result M.!) parents
...Notice how we use fix to bind the eventual result of the final computation. Then we can chase pointers by looking them up in result—even though it’s not yet “computed.” Who cares what order the computer does it in. Why is that a thing I should need to specify?
Anyway. The exact details of implementing LSA are not particularly important for the remainder of this blog post. If you’re interested, you can peep the PR, which is delightfully small.
Tying It All Back Together
Equipped with my LSA tree, I was now ready to go back and solve the original problem of figuring out where to stick let-bindings. It’s easy now. Given the original program graph, find the LSA for each node. The LSA is the place you should insert the let binding.
So given the map of nodes to their LSAs, invert that map and get back a map of nodes to descendents who have this node as an LSA. Now when you go to inline a node, just look up everything in this map and inline it first.
It turns out to be a very elegant solution. It’s one third of the length of my horrible ad-hoc implementations, and it runs in linear time of the number of nodes in the graph. All in all, very good.
More often than I’m comfortable about, people will ask me how I can have so many good ideas. And what I like about this story is that it’s pretty typical of how I actually “have” “good” ideas. I’m reminded of the fact that luck favors the prepared mind. Attentive readers will notice that none of this process was due to brilliance on my part. I happened to know Vikrem who’s a genius. Together we pulled at some ancient graph theory strings and remembered a fact that someone else had thought important to teach us. That wasn’t actually the right path, but it lead us to stackoverflow where someone had linked to a relevant paper. I implemented the paper using a library that someone else had done the heavy lifting on, and simplified the implementation using this knot-tying trick I picked up somewhere along the way.
Also, I’m just really pleased that the solution came from trying to reverse engineer the relevant graph-theory search terms. Maybe that’s the actual takeaway here.
]]>Imagine we have some little class, the details of which matter not in the least:
We can give some instances of this type:
Regular, everyday stuff. But the instances for type constructors are more interesting, because they come with an instance context:
Then, of course, if we know both Foo a and Foo b, we can infer Foo (a, b). To make this fact overwhelmingly explicit, we can reify the usual constraint-solving logic by using the Dict type, and thus the following program will typecheck:
import Data.Constraint
forwards
:: Dict (Foo a)
-> Dict (Foo b)
-> Dict (Foo (a, b))
forwards Dict Dict = DictPerhaps tipped off by the name here, the gentle reader is asked to notice the asymmetry here, since the converse program will not typecheck:
But why should it not typecheck?1 Recall from the relevant instance definition that these instances must, in fact, exist:
As a testament to just how good GHC is, we can support this bidirectionality via a minor tweak to the definition of class and its instances.
The trick is to add an associated type family to Foo, and to use it as a superclass constraint:
type Foo :: Type -> Constraint
class Evidence a => Foo a where
type Evidence a :: Constraint
type Evidence a = ()
...Because we’ve given a default implementation of the type family, our existing simple instances work as before:
with the only change required coming from the type constructor instances:
or, if we you want to be cute about it:
By sticking Evidence into the superclass constraint, GHC knows that this dictionary is always available when you’ve got a Foo dictionary around. And our earlier backwards program now typechecks as expected.
This is all available in a play session if you’d like to fool around with it.
Rhetorical question. I don’t want to hear about orphans or overlapping instances or whatever.↩︎
This anti-pattern isn’t particularly surprising in its prevalence; after all, if you’ve got the usual imperative brainworms, this is just how things get built. The gang of four “builder pattern” is exactly this; you can build an empty object, and setters on such a thing change the state but return the object itself. Thus, you build things by chaning together setter methods:
Even if you don’t ascribe to the whole OOP design principle thing, you’re still astronomically likely to think about building data structures like this:
To be more concrete, maybe instead of doodads and widgets you have BSTs and Nodes. Or dictionaries and key-value pairs. Or graphs and edges. Anywhere you look, you’ll probably find examples of this sort of code.
Maybe you’re thinking to yourself “I’m a hairy-chested functional programmer and I scoff at patterns like these.” That might be true, but perhaps you too are guilty of writing code that looks like:
Just because it’s dressed up with functional combinators doesn’t mean you’re not still writing C code. To my eye, the great promise of functional programming is its potential for conceptual clarity, and repeated mutation will always fall short of the mark.
The complaint, as usual, is that repeated mutation tells you how to build something, rather than focusing on what it is you’re building. An algorithm cannot be correct in the absence of intention—after all, you must know what you’re trying to accomplish in order to know if you succeeded. What these builder patterns, for loops, and foldrs all have in common is that they are algorithms for strategies for building something.
But you’ll notice none of them come with comments. And therefore we can only ever guess at what the original author intended, based on the context of the code we’re looking at.
I’m sure this all sounds like splitting hairs, but that’s because the examples so far have been extremely simple. But what about this one?
cgo :: (a -> (UInt, UInt)) -> [a] -> [NonEmpty a]
cgo f = foldr step []
where
step a [] = [pure a]
step a bss0@((b :| bs) : bss)
| let (al, ac) = f a
, let (bl, bc) = f b
, al + 1 == bl && ac == bc
= (a :| b : bs) : bss
| otherwise = pure a : bss0which I found by grepping through haskell-language-server for foldr, and then mangled to remove the suggestive variable names. What does this one do? Based solely on the type we can presume it’s using that function to partition the list somehow. But how? And is it correct? We’ll never know—and the function doesn’t even come with any tests!
It’s Always Monoids
The shift in perspective necessary here is to reconceptualize building-by-repeated-mutation as building-by-combining. Rather than chiseling out the object you want, instead find a way of gluing it together from simple, obviously-correct pieces.
The notion of “combining together” should evoke in you a cozy warm fuzzy feeling. Much like being in a secret pillow form. You must come to be one with the monoid. Once you have come to embrace monoids, you will have found inner programming happiness. Monoids are a sacred, safe place, at the fantastic intersection of “overwhelming powerful” and yet “hard to get wrong.”
As an amazingly fast recap, a monoid is a collection of three things: some type m, some value of that type mempty, and binary operation over that type (<>) :: m -> m -> m, subject to a bunch of laws:
which is to say, mempty does nothing and (<>) doesn’t care where you stick the parentheses.
If you’re going to memorize any two particular examples of monoids, it had better be these two:
instance Monoid [a] where
mempty = []
a <> b = a ++ b
instance (Monoid a, Monoid b) => Monoid (a, b) where
mempty = (mempty, mempty)
(a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2)The first says that lists form a monoid under the empty list and concatenation. The second says that products preserve monoids.
The list monoid instance is responsible for the semantics of the ordered, “sequency” data structures. That is, if I have some sequential flavor of data structure, its monoid instance should probably satisfy the equation toList a <> toList b = toList (a <> b). Sequency data structures are things like lists, vectors, queues, deques, that sort of thing. Data structures where, when you combine them, you assume there is no overlap.
The second monoid instance here, over products, is responsible for pretty much all the other data structures. The first thing we can do with it is remember that functions are just really, really big product types, with one “slot” for every value in the domain. We can show an isomorphism between pairs and functions out of booleans, for example:
from :: (Bool -> a) -> (a, a)
from f = (f False, f True)
to :: (a, a) -> (Bool -> a)
to (a, _) False = a
to (_, a) True = aand under this isomorphism, we should thereby expect the Monoid a => Monoid (Bool -> a) instance to agree with Monoid a => Monoid (a, a). If you generalize this out, you get the following instance:
which combines values in the codomain monoidally. We can show the equivalence between this monoid instance and our original product preservation:
from f <> from g
= (f False, f True) <> (g False, g True)
= (f False <> g False, f True <> g True)
= ((f <> g) False, (f <> g) True)
= from (f <> g)and
to (a11, a12) <> to (a21, a22)
= \x -> to (a11, a12) x <> to (a21, a22) x
= \x -> case x of
False -> to (a11, a12) False <> to (a21, a22) False
True -> to (a11, a12) True <> to (a21, a22) True
= \x -> case x of
False -> a11 <> a21
True -> a12 <> a22
= \x -> to (a11 <> a21, a12 <> a22) x
= to (a11 <> a21, a12 <> a22)which is a little proof that our function monoid agrees with the preservation-of-products monoid. The same argument works for any type x in the domain of the function, but showing it generically is challenging.
Anyway, I digresss.
The reason to memorize this Monoid instance is that it’s the monoid instance that every data structure is trying to be. Recall that almost all data structures are merely different encodings of functions, designed to make some operations more efficient than they would otherwise be.
Don’t believe me? A Map k v is an encoding of the function k -> Maybe v optimized to efficiently query which k values map to Just something. That is to say, it’s a sparse representation of a function.
From Theory to Practice
What does all of this look like in practice? Stuff like worrying about foldr is surely programming-in-the-small, which is worth knowing, but isn’t the sort of thing that turns the tides of a successful application.
The reason I’ve been harping on about the function and product monoids is that they are compositional. The uninformed programmer will be surprised by just far one can get by composing these things.
At work, we need to reduce a tree (+ nonlocal references) into an honest-to-goodness graph. While we’re doing it, we need to collect certain nodes. And the tree has a few constructors which semantically change the scope of their subtrees, so we need to preserve that information as well.
It’s actually quite the exercise to sketch out an algorithm that will accomplish all of these goals when you’re thinking about explicit mutation. Our initial attempts at implementing this were clumsy. We’d fold the tree into a graph, adding fake nodes for the Scope construcotrs. Then we’d filter all the nodes in the graph, trying to find the ones we needed to collect. Then we’d do a graph traversal from the root, trying to find these Scope nodes, and propagating their information downstream.
Rather amazingly, this implementation kinda sorta worked! But it was slow, and took \(O(10k)\) SLOC to implement.
The insight here is that everything we needed to collect was monoidal:
data Solution = Solution
{ graph :: Graph
, collectedNodes :: Set Node
, metadata :: Map Node Metadata
}
deriving stock (Generic)
deriving (Semigroup, Monoid) via Generically Solutionwhere the deriving (Semigroup, Monoid) via Generically Solution stanza gives us the semigroup and monoid instances that we’d expect from Solution being the product of a bunch of other monoids.
And now for the coup de grace: we hook everything up with the Writer monad. Writer is a chronically slept-on type, because most people seem to think it’s useful only for logging, and, underwhelming at doing logging compared to a real logger type. But the charm is in the details:
Writer w is a monad whenever w is a monoid, which makes it the perfect monad for solving data-structure-creation problems like the one we’ve got in mind. Such a thing gives rise to a few helper functions:
collectNode :: MonadWriter Solution m => Node -> m ()
collectNode n = tell $ mempty { collectedNodes = Set.singleton n }
addMetadata :: MonadWriter Solution m => Node -> Metadata -> m ()
addMetadata n m = tell $ mempty { metadata = Map.singleton n m }
emitGraphFragment :: MonadWriter Solution m => Graph -> m ()
emitGraphFragment g = tell $ mempty { graph = g }each of which is responsible for adding a little piece to the final solution. Our algorithm is thus a function of the type:
algorithm
:: Metadata
-- ^ the current scope
-> Tree
-- ^ the tree we're reducing
-> Writer Solution Node
-- ^ our partial solution, and the node corresponding to the root of the treewhich traverses the Tree, recursing with a different Metadata whenever it comes across a Scope constructor, and calling our helper functions as it goes. At each step of the way, the only thing it needs to return is the root Node of the section of the graph it just built, which recursing calls can use to break up the problem into inductive pieces.
This new implementation is roughly 20x smaller, coming in at @O(500)@ SLOC, and was free of all the bugs we’d been dilligently trying to squash under the previous implementation.
Chalk it down to another win for induction!
]]>^.. and folded their way to a solution that is much more naturally written merely as toList. And don’t get me started about the stateful operators like <<+= and their friends. Many programs which can be more naturally written functionally accidentally end up being imperative due to somebody finding a weird lens combinator and trying to use it in anger. Much like a serious drug collection, the tendency is to push it as far as you can.
Thus, my response has usually been one of pushback and moderation. I don’t avoid lenses at all costs, but I do try to limit myself to the prime types (Lens', Prism', Iso'), and to the boring combinators (view, set, over). I feel like these give me most of the benefits of lenses, without sending me tumbling down the rabbit hole.
All of this is to say that my grokkage of lenses has always been one of generalized injections and projections, for a rather shallow definition of “generalized”. That is, I’ve grown accustomed to thinking about lenses as getter/setter pairs for data structures—eg, I’ve got a big product type and I want to pull a smaller piece out of it, or modify a smaller piece in a larger structure. I think about prisms as the dual structure over coproducts—“generalized” injecting and pattern matching.
And this is all true; but I’ve been missing the forest for the trees on this one. That’s not to say that I want to write lensier code, but that I should be taking the “generalized” part much more seriously.
The big theme of my intellectual development over the last few years has been thinking about abstractions as shared vocabularies. Monoids are not inherently interesting; they’re interesting because of how they let you quotient seemingly-unrelated problems by their monoidal structure. Applicatives are cool because once you’ve grokked them, you begin to see them everywhere. Anywhere you’ve got conceptually-parallel, data-independent computations, you’ve got an applicative lurking somewhere under the surface (even if it happens to be merely the Identity applicative.)
I’ve had a similar insight about lenses, and that’s what I wanted to write about today.
The Context
At work, I’ve been thinking a lot about compilers and memory layout lately. I won’t get into the specifics of why, but we can come up with an inspired example. Imagine we’d like to use Haskell to write a little eDSL that we will use to generate x86 machine code.
The trick of course, is that we’re writing Haskell in order to not write machine code. So the goal is to design high-level combinators in Haskell that express our intent, while simultaneously generating machine code that faithfully implements the intention.
One particularly desirable feature about eDSLs is that they allow us to reuse Haskell’s type system. Thus, imagine we have some type:
Notice that the a parameter here is entirely phantom; it serves only to annotate the type of the value produced by executing getMachineCode. For today’s purpose, we’ll ignore all the details about calling conventions and register layout and what not; let’s just assume a Code a corresponds to a computation that leaves a value (or pointer) to something of type a in a well-known place, whether that be the top of the stack, or eax or something. It doesn’t matter!
Since the type parameter to Code is phantom, we need to think about what role it should have. Keeping it at phantom would be disastrous, since this type isn’t used by Haskell, but it is certainly used to ensure our program is correct. Similarly, representational seems wrong, since coerce is meaningful only when thinking about Haskell; which this thing decidedly is not. Thus, our only other option is:
Frustratingly, due to very similar reasoning, Code cannot be a functor, because there’s no way1 to lift an arbitrary Haskell function a -> b into a corresponding function Code a -> Code b. If there were, we’d be in the clear! But alas, we are not.
The Problem
All of the above is to say that we are reusing Haskell’s type system, but not its values. An expression of type Code Bool has absolutely no relation to the values True or False—except that we could write, by hand, a function litBool :: Bool -> Code Bool which happened to do the right thing.
It is tempting, however, to make new Haskell types in order to help constrain the assembly code we end up writing. For example, maybe we want to write a DSP for efficiently decoding audio. We can use Haskell’s types to organize our thoughts and prevent ourselves from making any stupid mistakes:
data Decoder = Decoder
{ format :: Format
, seekPos :: Int
, state :: ParserState
}
data Chunk = ...
createDecoder :: Code MediaHandle -> Code Decoder
decodeChunk :: Code Decoder -> (Code Decoder, Code Chunk)We now have a nice interface in our eDSL to guide end-users along the blessed path of signal decoding. We have documented what we are trying to do, and how it can be used once it’s implemented. But due to our phantom, yet nominal, parameter to Code, this is all just make believe. There is absolutely no correlation between what we’ve written down and how we can use it. The problem arises when we go to implement decodeChunk. We’ll need to know what state we’re in, which means we’ll need some function:
In a world where Code is a functor, this is implemented trivially as fmap state. But Code is not a functor! Alas! Woe! What ever can we do?
The Solution
Lenses, my guy!
Recall that Code is phantom in its argument, even if we use roles to restrict that fact. This means we can implement a safe-ish version of unsafeCoerce, that only fiddles with the paramater of our phantom type:
Judicious use of unsafeCoerceCode allows us to switch between a value’s type and its in-memory representation. For example, given a type:
we can reinterpret a Decode as a sequence of bytes:
decoderRep :: Iso' (Code Decoder) (Code (Bytes (32 + 4 + 1)))
decoderRep = iso unsafeCoerceCode unsafeCoerceCode
stateRep :: Iso' (Code ParserState) (Code (Bytes 1))
stateRep = iso unsafeCoerceCode unsafeCoerceCodewhich says we are considering our Decoder to be laid out in memory like:
Of course, this is a completely unsafe transformation, as far as the Haskell type system is aware. We’re in the wild west out here, well past any type theoretical life buoys. We’d better be right that this coercion is sound. But assuming this is in fact the in-memory representation of a Decoder, we are well justified in this transformation.
Notice the phrasing of our Iso' above. It is not an iso between Decoder and Bytes 37, but between Codes of such things. This witnesses the fact that it is not true in the Haskell embedding, merely in our Code domain. Of course, isos are like the least exciting optics, so let’s see what other neat things we can do.
Imagine we have some primitives:
slice
:: n <= m
=> Int -- ^ offset
-> Proxy n -- ^ size
-> Code (Bytes m)
-> Code (Bytes n)
overwrite
:: n <= m
=> Int -- ^ offset
-> Bytes n
-> Bytes m
-> Bytes mwhich we can envision as Haskell bindings to the pseudo-C functions:
const char[n] slice(size_t offset, char[m] bytes) {
return &bytes[offset];
}
char[m] overwrite(size_t offset, char[n] value, char[m] bytes) {
char[m] new_bytes = malloc(m);
memcpy(new_bytes, bytes, m);
memcpy(&new_bytes[offset], value, n);
return new_bytes;
}We can use slice and overwrite to give a Lens' into Bytes:
slicing :: n <= m => Int -> Code (Bytes m) -> Code (Bytes n)
slicing offset =
lens
(slice offset Proxy)
(\orig new -> overwrite offset new orig)and finally, we can give an implementation of the desired decoderState above:
decoderState :: Lens' (Code Decoder) (Code ParserState)
decoderState = decoderRep . slicing 36 . from stateRepSuch a lens acts exactly as a record selector would, in that it allows us to view, set, and over a ParserState inside of a Decoder. But recall that Code is just a list of instructions we eventually want the machine to run. We’re using the shared vocabulary of lenses to emit machine code! What looks like using a data structure to us when viewed through the Haskell perspective, is instead invoking an assembler.
Reflections
Once the idea sinks in, you’ll start seeing all sorts of cool things you can do with optics to generate code. Prisms generalize running initializer code. A Traversal over Code can be implemented as a loop. And since all the sizes are known statically, if you’re feeling plucky, you can decide to unroll the loop right there in the lens.
Outside of the context of Code, the realization that optics are this general is still doing my head in. Something I love about working in Haskell is that I’m still regularly having my mind blown, even after a decade.
Short of compiling to categories via something like categorifier.↩︎
Although perhaps it will be interesting only to future me, I thought it would be a good exercise to write up the experience—if only so I learn the lesson about how to read profiles and not make the same mistake again.
Some Context
I’m currently employed to work on a compiler. The performance has never been stellar, in that we were usually seeing about 5s to compile programs, even trivially small ones consisting of less than a hundred instructions. It was painful, but not that painful, since the test suite still finished in a minute or two. It was a good opportunity to get a coffee. I always assumed that the time penalties we were seeing were constant factors; perhaps it took a second or two to connect to Z3 or something like that.
But then we started unrolling loops, which turned trivially small programs into merely small programs, and our performance ballooned. Now we were looking at 45s for some of our tests! Uh oh! That’s no longer in the real of constant factors, and it was clear that something asymptotically was wrong.
So I fired up GHC with the trusty old -prof flag, and ran the test suite in +RTS -p mode, which instruments the program with all sorts of profiling goodies. After a few minutes, the test suite completed, and left a test-suite.prof file laying around in the current directory. You can inspect such things by hand, but tools like profiteur make the experience much nicer.
Without further ado, here’s what our profile looked like:
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
Well, that’s not very helpful. Of course MAIN takes 100% of the time. So I expanded that, and saw:
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
No clearer. Opening up main:
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
Sheesh.
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
└ getTest . . . . . . . . . . . . . . . . . . . 100%
OH MY GOD. JUST TELL ME SOMETHING ALREADY.
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
└ getTest . . . . . . . . . . . . . . . . . . . 100%
└ test . . . . . . . . . . . . . . . . . . . . 100%
Fast forwarding for quite a while, I opened up the entire stack until I got to something that didn’t take 100% of the program’s runtime:
MAIN . . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main . . . . . . . . . . . . . . . . . . . . . . . 100%
└ main.\ . . . . . . . . . . . . . . . . . . . . . 100%
└ getTest . . . . . . . . . . . . . . . . . . . 100%
└ test . . . . . . . . . . . . . . . . . . . . 100%
└ makeTest . . . . . . . . . . . . . . . . . 100%
└ makeTest.\ . . . . . . . . . . . . . . . 100%
└ compileProgram . . . . . . . . . . . . 100%
└ evalAppT . . . . . . . . . . . . . . 100%
└ runAppT . . . . . . . . . . . . . 100%
└ runAppT' . . . . . . . . . . . . 100%
└ withLogging . . . . . . . . . 100%
└ transformSSA . . . . . . . . 100%
└ >>= . . . . . . . . . . . 100%
└ >>>= . . . . . . . . . . 100%
└ ibind . . . . . . . . 100%
└ ibind.\ . . . . . . 100%
└ ibind.\.\ . . . . 100%
├ toSSA . . . . . 15%
├ transform1 . . . 15%
├ transform2 . . . 10%
├ transform3 . . . 10%
├ transform4 . . . 20%
└ collectGarbage . 30%
Now we’re in business. I dutifully dug into toSSA, the transforms, and collectGarbage. I cached some things, used better data structures, stopped appending lists, you know, the usual Haskell tricks. My work was rewarded, in that I managed to shave 80% off the runtime of our program.
A few months later, we wrote a bigger program and fed it to the compiler. This one didn’t stop compiling. We left it overnight.
Uh oh. Turns out I hadn’t fixed the problem. I’d only papered over it.
Retrospective
So what went wrong here? Quite a lot, in fact! And worse, I had all of the information all along, but managed to misinterpret it at several steps of the process.
Unwinding the story stack, the most salient aspect of having not solved the problem was reducing the runtime by only 80%. Dramatic percentages feel like amazing improvements, but that’s because human brains are poorly designed for building software. In the real world, big percentages are fantastic. In software, they are linear improvements.
That is to say that a percentage-based improvement is \(O(n)\) faster in the best case. My efforts improved our runtime from 45s to 9s. Which feels great, but the real problem is that this program is measured in seconds at all.
It’s more informative to think in terms of orders of magnitude. Taking 45s on a ~3GHz processor is on the order of 1011 instructions, while 9s is 1010. How the hell is it taking us TEN BILLION instructions to compile a dinky little program? That’s the real problem. Improving things from one hundred billion down to ten billion is no longer very impressive at all.
To get a sense of the scale here, even if we spent 1M cycles (which feels conservatively expensive) for each instruction we wanted to compile, we should still be looking at < 0.1s. Somehow we are over 1000x worse than that.
So that’s one mistake I made: being impressed by extremely marginal improvements. Bad Sandy.
The other mistake came from my interpretation of the profile. As a quick pop quiz, scroll back up to the profile and see if you can spot where the problem is.
After expanding a few obviously-not-the-problem call centers that each were 100% of the runtime, I turned my brain off and opened all of the 100% nodes. But in doing so, I accidentally breezed past the real problem. The real problem is either that compileProgram takes 100% of the time of the test, or that transformSSA takes 100% of compiling the program. Why’s that? Because unlike main and co, test does more work than just compiling the program. It also does non-trivial IO to produce debugging outputs, and property checks the resulting programs. Similarly for compileProgram, which does a great deal more than transformSSA.
This is somewhat of a philosophical enlightenment. The program execution hasn’t changed at all, but our perspective has. Rather than micro-optimizing the code that is running, this new perspective suggests we should focus our effort on determining why that code is running in the first place.
Digging through transformSSA made it very obvious the problem was an algorithmic one—we were running an unbounded loop that terminated on convergence, where each step it took @O(n^2)@ work to make a single step. When I stopped to actually read the code, the problem was immediate, and the solution obvious.
The lesson? Don’t read the profile. Read the code. Use the profile to focus your attention.
]]>