| CARVIEW |
Select Language
HTTP/2 200
server: nginx
date: Fri, 16 Jan 2026 11:53:20 GMT
content-type: application/atom+xml
last-modified: Fri, 16 Jan 2026 11:33:56 GMT
vary: Accept-Encoding
etag: W/"696a2224-2c5ec7"
strict-transport-security: max-age=2592000; includeSubDomains
referrer-policy: strict-origin-when-cross-origin
x-frame-options: SAMEORIGIN
x-xss-protection: 1; mode=block
x-content-type-options: nosniff
cache-control: no-transform,public,max-age=172800,s-maxage=172800
content-encoding: gzip
Posts tagged ‘haskell’ on abhinavsarkar.net https://abhinavsarkar.net/posts/tags/haskell/feed.atom 2025-10-21T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net https://abhinavsarkar.net/images/favicon.ico © 2017–2026, Abhinav Sarkar https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/ A Fast Bytecode VM for Arithmetic: The Virtual Machine 2025-10-21T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#the-compiler">Compiling ASTs to bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#the-decompiler">Disassembling and decompiling bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#testing-the-compiler">Unit testing for our compiler.</a></label></li>
<li><label><input type="checkbox"></input><span class="todo">Property-based testing for our compiler.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Efficiently executing bytecode in a virtual machine (VM).</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Benchmarking our code to see how the different passes perform.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">All the while keeping an eye on performance.</span></label></li>
</ul>
<p>In this final post, we write the virtual machine that executes our bytecode, and benchmark it.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">The Compiler</a>
</li>
<li>
<strong>The Virtual Machine</strong> 👈
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#testing-the-compiler">Testing the Compiler</a></li><li><a href="#the-virtual-machine">The Virtual Machine</a></li><li><a href="#testing-the-vm">Testing the <a href="%Virtual%20Machine" target="_blank" rel="noopener">VM</a></a></li><li><a href="#benchmarking-the-vm">Benchmarking the <a href="%Virtual%20Machine" target="_blank" rel="noopener">VM</a></a></li><li><a href="#benchmarking-against-c">Benchmarking Against C</a></li><li><a href="#future-directions">Future Directions</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="arithmetic-bytecode-vm" id="introduction">Introduction</h2>
<p><a href="https://en.wikipedia.org/wiki/Bytecode" target="_blank" rel="noopener">Bytecode</a> <em><a href="https://en.wikipedia.org/wiki/Virtual_machine#Process_virtual_machines" target="_blank" rel="noopener">Virtual Machines</a></em> (VMs) are known to be faster than <abbr title="Abstract Syntax Tree">AST</abbr>-walking interpreters. That’s why many real-world programming languages these days are implemented with bytecode <abbr title="Virtual Machine">VM</abbr>s, for example, <a href="https://en.wikipedia.org/wiki/Java_(programming_language)" target="_blank" rel="noopener">Java</a>, <a href="https://en.wikipedia.org/wiki/Python_(programming_language)" target="_blank" rel="noopener">Python</a>, <a href="https://en.wikipedia.org/wiki/PHP" target="_blank" rel="noopener">PHP</a>, and <a href="https://en.wikipedia.org/wiki/Raku" target="_blank" rel="noopener">Raku</a>. The reason is partially, the flat and compact nature of bytecode itself. But <abbr title="Virtual Machine">VM</abbr>s also have a few other tricks up their sleeves that make them highly performant. In this post, we write a <abbr title="Virtual Machine">VM</abbr> for our arithmetic expression language, and explore some of these performance tricks.</p>
<p>But first, we need to finish a pending task.</p>
<h2 data-track-content data-content-name="testing-the-compiler" data-content-piece="arithmetic-bytecode-vm" id="testing-the-compiler">Testing the Compiler</h2>
<p>We wrote <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#testing-the-compiler%5D">some unit tests</a> for our compiler in the last post, but unit tests cover only the cases we can think of. A compiler has to deal with any input, and with just unit tests we cannot be sure of its correctness.</p>
<p>To test our compiler and other components for correctness, we use the <a href="https://hackage.haskell.org/package/QuickCheck" target="_blank" rel="noopener">QuickCheck</a> library. QuickCheck is a <em>Property-based Testing</em> framework. The key idea of property-based testing is to write properties of our code that hold true for any input, and then to automatically generate a large number of arbitrary inputs and make sure that the properties are indeed true for them<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. Since we are writing an arithmetic expression parser/compiler/<abbr title="Virtual Machine">VM</abbr>, we generate arbitrary expression <abbr title="Abstract Syntax Tree">AST</abbr>s, and use them to assert certain invariants of our program.</p>
<p>With QuickCheck, we write generators for the inputs for our tests. These generators are composable just like parser combinators are. We use the library provided generators to write small generators that we combine to create larger ones. Let’s start:</p>
<figure>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">numGen ::</span> <span class="dt">Q.Gen</span> <span class="dt">Expr</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>numGen <span class="ot">=</span> <span class="dt">Num</span> <span class="op"><$></span> Q.arbitrary</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="ot">varGen ::</span> <span class="dt">Set.Set</span> <span class="dt">Ident</span> <span class="ot">-></span> <span class="dt">Q.Gen</span> <span class="dt">Expr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>varGen vars <span class="ot">=</span> <span class="dt">Var</span> <span class="op"><$></span> Q.elements (Set.toList vars)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> </span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="ot">identGen ::</span> <span class="dt">Q.Gen</span> <span class="dt">Ident</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>identGen <span class="ot">=</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> mkIdent</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> ( (<span class="op">:</span>)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> Q.elements lower</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> Q.resize <span class="dv">5</span> (Q.listOf1 <span class="op">$</span> Q.elements validChars)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> ) <span class="ot">`Q.suchThat`</span> (<span class="fu">not</span> <span class="op">.</span> isReservedKeyword <span class="op">.</span> BSC.pack)</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> lower <span class="ot">=</span> [<span class="ch">'a'</span> <span class="op">..</span> <span class="ch">'z'</span>]</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> validChars <span class="ot">=</span> lower <span class="op"><></span> [<span class="ch">'A'</span> <span class="op">..</span> <span class="ch">'Z'</span>]</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>First come the basic generators:</p>
<ul>
<li><code>numGen</code> generates number expressions by using QuickCheck’s built-in <code>arbitrary</code> function.</li>
<li><code>varGen</code> generates variable expressions by choosing from the set of passed valid variable names.</li>
<li><code>identGen</code> generates valid identifiers from combinations of letters a—z and A—Z, and discarding ones that are reserved keywords.</li>
</ul>
<p>Moving on to composite generators:</p>
<figure>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">binOpGen ::</span> <span class="dt">Set.Set</span> <span class="dt">Ident</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Q.Gen</span> <span class="dt">Expr</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>binOpGen vars size <span class="ot">=</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">BinOp</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> Q.chooseEnum (<span class="dt">Add</span>, <span class="dt">Div</span>)</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> exprGen vars (size <span class="ot">`div`</span> <span class="dv">2</span>)</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> exprGen vars (size <span class="ot">`div`</span> <span class="dv">2</span>)</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="ot">letGen ::</span> <span class="dt">Set.Set</span> <span class="dt">Ident</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Q.Gen</span> <span class="dt">Expr</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>letGen vars size <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> x <span class="ot"><-</span> identGen</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> vars' <span class="ot">=</span> Set.insert x vars</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Let</span> x <span class="op"><$></span> exprGen vars (size <span class="ot">`div`</span> <span class="dv">2</span>) <span class="op"><*></span> exprGen vars' (size <span class="ot">`div`</span> <span class="dv">2</span>)</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="ot">exprGen ::</span> <span class="dt">Set.Set</span> <span class="dt">Ident</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Q.Gen</span> <span class="dt">Expr</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>exprGen vars size</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> size <span class="op"><</span> <span class="dv">5</span> <span class="ot">=</span> Q.frequency [(<span class="dv">4</span>, Q.oneof baseGens), (<span class="dv">1</span>, Q.oneof compositeGens)]</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> Q.frequency [(<span class="dv">1</span>, Q.oneof baseGens), (<span class="dv">4</span>, Q.oneof compositeGens)]</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a> baseGens <span class="ot">=</span> numGen <span class="op">:</span> [varGen vars <span class="op">|</span> <span class="fu">not</span> <span class="op">$</span> Set.null vars]</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a> compositeGens <span class="ot">=</span> [binOpGen vars size, letGen vars size]</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<ul>
<li><p><code>binOpGen</code> generates binary expressions with arbitrary binary operations. It recursively calls <code>exprGen</code> to generate the operands. The <code>size</code> parameter controls the complexity of the generated expressions, and we half the size of operands (and so on recursively) so that we don’t end up with infinitely large expressions.</p></li>
<li><p><code>letGen</code> generates <code class="sourceCode haskell"><span class="dt">Let</span></code> expressions by generating an identifier, and then generating the assignment and body expressions recursively. We do the same trick of halving sizes here as well. Notice that the assignment is generated with the passed variable names in scope, whereas the body is generated with the new identifier added to the scope.</p></li>
<li><p><code>exprGen</code> uses the above generators to generate all kinds of expressions. At smaller sizes, it prefers to generate base expressions, while at larger sizes, it prefers composite ones. Due to the careful recursive halving of size in composite generators, we end up with expressions of finite sizes.</p></li>
</ul>
<p>Finally, we have some instances of QuickCheck’s <code class="sourceCode haskell"><span class="dt">Arbitrary</span></code> type class to tie everything together:</p>
<figure>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Q.Arbitrary</span> <span class="dt">Expr</span> <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> arbitrary <span class="ot">=</span> Q.sized <span class="op">$</span> exprGen Set.empty</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> shrink <span class="ot">=</span> Q.genericShrink</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Q.Arbitrary</span> <span class="dt">Ident</span> <span class="kw">where</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> arbitrary <span class="ot">=</span> identGen</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Q.Arbitrary</span> <span class="dt">Op</span> <span class="kw">where</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> arbitrary <span class="ot">=</span> Q.chooseEnum (<span class="dt">Add</span>, <span class="dt">Div</span>)</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>We can apply them in GHCi:</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeApplications</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Q.sample <span class="op">$</span> Q.arbitrary <span class="op">@</span><span class="dt">Expr</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>0</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>((let jgSg = 2 in (-2 - -2)) + -2)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>2</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>(0 / 1)</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>(-11 / -13)</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>((let kpuS = 10 in 31) + (let jChmZV = -12 in jChmZV))</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>((54 * -55) * (let ohLSk = 29 in -45))</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>(-102 - (-119 * -125))</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>(-234 - (32 / -217))</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>(let kVrB = (-261 * 238) in ((let qdz = 228 in 347) + 18))</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>(let uMMdXH = ((let ePUi = 842 in ePUi) - (let zrkM = (let vwH = ((9 + -987) / -487) in (let ylKowr = vwH in vwH)) in zrkM)) in (((uMMdXH / -836) / uMMdXH) - (let qkK = uMMdXH in qkK)))</span></code></pre></div>
<p>Notice that the generated samples increase in complexity. With the generators in place, we define our properties next. Let’s test our parser first:</p>
<figure>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_print_ast_then_parse_returns_same_ast ::</span> <span class="dt">Spec</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>prop_print_ast_then_parse_returns_same_ast <span class="ot">=</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> prop <span class="st">"Property: Print AST then parse returns same AST"</span> <span class="op">$</span> \expr <span class="ot">-></span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> parse (BSC.pack <span class="op">$</span> <span class="fu">show</span> expr) <span class="op">==</span> <span class="dt">Right</span> expr</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>This property is a simple round-trip test for the parser and printer: we parse the string representation of a generated expression, and assert that it gives back the same expression.</p>
<p>The second property is a more involved round-trip test for the compiler and decompiler:</p>
<figure>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode ::</span> <span class="dt">Spec</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> prop ( <span class="st">"Property: Disassemble bytecode then decompile then compile"</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">" returns same bytecode"</span> ) <span class="op">$</span> \expr <span class="ot">-></span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> compile (sizedExpr expr) <span class="kw">of</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> _ <span class="ot">-></span> Q.discard</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> bytecode <span class="ot">-></span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> (disassemble bytecode <span class="op">>>=</span> (decompile <span class="op">>>></span> <span class="fu">fmap</span> sizedExpr) <span class="op">>>=</span> compile)</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="op">==</span> <span class="dt">Right</span> bytecode</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>This asserts that compiling an expression, then disassembling and decompiling it, and finally compiling it again should result in the original bytecode<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<p>This requires a helper function to get the size of an expression:</p>
<figure>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">sizedExpr ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">SizedExpr</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>sizedExpr expr <span class="ot">=</span> <span class="kw">case</span> expr <span class="kw">of</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> _ <span class="ot">-></span> (expr, <span class="dv">3</span>)</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Var</span> _ <span class="ot">-></span> (expr, <span class="dv">2</span>)</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">BinOp</span> _ a b <span class="ot">-></span> (expr, <span class="fu">snd</span> (sizedExpr a) <span class="op">+</span> <span class="fu">snd</span> (sizedExpr b) <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Let</span> _ a b <span class="ot">-></span> (expr, <span class="fu">snd</span> (sizedExpr a) <span class="op">+</span> <span class="fu">snd</span> (sizedExpr b) <span class="op">+</span> <span class="dv">1</span>)</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>We run these tests in a later section. This ends our short detour.</p>
<h2 data-track-content data-content-name="the-virtual-machine" data-content-piece="arithmetic-bytecode-vm" id="the-virtual-machine">The Virtual Machine</h2>
<p>Now for the main event: the virtual machine. Our <abbr title="Virtual Machine">VM</abbr> is a stack-based machine that operates on a stack of values and executes the compiled bytecode. Our goal is to be as fast as possible. For a quick reminder, these are our <code class="sourceCode haskell"><span class="dt">Opcode</span></code>s:</p>
<figure>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Opcode</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">OPush</span> <span class="op">!</span><span class="dt">Int16</span> <span class="co">-- 0</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OGet</span> <span class="op">!</span><span class="dt">Word8</span> <span class="co">-- 1</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OSwapPop</span> <span class="co">-- 2</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OAdd</span> <span class="co">-- 3</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OSub</span> <span class="co">-- 4</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OMul</span> <span class="co">-- 5</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ODiv</span> <span class="co">-- 6</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Read</span>, <span class="dt">Eq</span>, <span class="dt">Generic</span>)</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>And now, the heart of the <abbr title="Virtual Machine">VM</abbr>:</p>
<figure>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretBytecode ::</span> <span class="dt">Bytecode</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Int16</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>interpretBytecode <span class="ot">=</span> interpretBytecode' defaultStackSize</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretBytecode' ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Bytecode</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Int16</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>interpretBytecode' stackSize bytecode <span class="ot">=</span> runST <span class="op">$</span> runExceptT <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> stack <span class="ot"><-</span> PA.newPinnedPrimArray stackSize</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> sp <span class="ot"><-</span> go <span class="dv">0</span> <span class="dv">0</span> stack</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> checkStack <span class="dt">InterpretBytecode</span> stackSize sp</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> PA.readPrimArray stack <span class="dv">0</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>size <span class="ot">=</span> BS.length bytecode</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a> go sp ip _ <span class="op">|</span> ip <span class="op">==</span> size <span class="ot">=</span> <span class="fu">pure</span> sp</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> go <span class="op">!</span>sp <span class="op">!</span>ip stack <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> opcode <span class="ot">=</span> readInstr bytecode ip</span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span></span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> sp <span class="op">>=</span> stackSize <span class="ot">-></span> throwInterpretError <span class="st">"Stack overflow"</span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> sp <span class="op"><</span> <span class="dv">0</span> <span class="ot">-></span> throwInterpretError <span class="st">"Stack underflow"</span></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> sp <span class="op"><</span> <span class="dv">2</span> <span class="op">&&</span> opcode <span class="op">>=</span> <span class="dv">2</span> <span class="ot">-></span> throwInsufficientElementsError</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> opcode <span class="op">==</span> <span class="dv">0</span> <span class="op">&&</span> ip <span class="op">+</span> <span class="dv">2</span> <span class="op">>=</span> size <span class="ot">-></span> throwIPOOBError <span class="op">$</span> ip <span class="op">+</span> <span class="dv">2</span></span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> opcode <span class="op">==</span> <span class="dv">1</span> <span class="op">&&</span> ip <span class="op">+</span> <span class="dv">1</span> <span class="op">>=</span> size <span class="ot">-></span> throwIPOOBError <span class="op">$</span> ip <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">-></span> <span class="kw">case</span> opcode <span class="kw">of</span></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> <span class="kw">do</span> <span class="co">-- OPush</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> PA.writePrimArray stack sp <span class="op">$</span> readInstrArgInt16 bytecode ip</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> go (sp <span class="op">+</span> <span class="dv">1</span>) (ip <span class="op">+</span> <span class="dv">3</span>) stack</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="ot">-></span> <span class="kw">do</span> <span class="co">-- OGet</span></span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> i <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> readInstrArgWord8 bytecode ip</span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> i <span class="op"><</span> sp</span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="kw">do</span></span>
<span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a> PA.copyMutablePrimArray stack sp stack i <span class="dv">1</span></span>
<span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a> go (sp <span class="op">+</span> <span class="dv">1</span>) (ip <span class="op">+</span> <span class="dv">2</span>) stack</span>
<span id="cb9-32"><a href="#cb9-32" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> throwInterpretError <span class="op">$</span></span>
<span id="cb9-33"><a href="#cb9-33" aria-hidden="true" tabindex="-1"></a> <span class="st">"Stack index "</span> <span class="op"><></span> <span class="fu">show</span> i <span class="op"><></span> <span class="st">" out of bound "</span> <span class="op"><></span> <span class="fu">show</span> (sp <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb9-34"><a href="#cb9-34" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> <span class="ot">-></span> <span class="kw">do</span> <span class="co">-- OSwapPop</span></span>
<span id="cb9-35"><a href="#cb9-35" aria-hidden="true" tabindex="-1"></a> PA.copyMutablePrimArray stack (sp <span class="op">-</span> <span class="dv">2</span>) stack (sp <span class="op">-</span> <span class="dv">1</span>) <span class="dv">1</span></span>
<span id="cb9-36"><a href="#cb9-36" aria-hidden="true" tabindex="-1"></a> go (sp <span class="op">-</span> <span class="dv">1</span>) (ip <span class="op">+</span> <span class="dv">1</span>) stack</span>
<span id="cb9-37"><a href="#cb9-37" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="ot">-></span> interpretBinOp (<span class="op">+</span>) <span class="co">-- OAdd</span></span>
<span id="cb9-38"><a href="#cb9-38" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> <span class="ot">-></span> interpretBinOp (<span class="op">-</span>) <span class="co">-- OSub</span></span>
<span id="cb9-39"><a href="#cb9-39" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> <span class="ot">-></span> interpretBinOp (<span class="op">*</span>) <span class="co">-- OMul</span></span>
<span id="cb9-40"><a href="#cb9-40" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> <span class="ot">-></span> <span class="kw">do</span> <span class="co">-- ODiv</span></span>
<span id="cb9-41"><a href="#cb9-41" aria-hidden="true" tabindex="-1"></a> b <span class="ot"><-</span> PA.readPrimArray stack <span class="op">$</span> sp <span class="op">-</span> <span class="dv">1</span></span>
<span id="cb9-42"><a href="#cb9-42" aria-hidden="true" tabindex="-1"></a> a <span class="ot"><-</span> PA.readPrimArray stack <span class="op">$</span> sp <span class="op">-</span> <span class="dv">2</span></span>
<span id="cb9-43"><a href="#cb9-43" aria-hidden="true" tabindex="-1"></a> when (b <span class="op">==</span> <span class="dv">0</span>) <span class="op">$</span> throwInterpretError <span class="st">"Division by zero"</span></span>
<span id="cb9-44"><a href="#cb9-44" aria-hidden="true" tabindex="-1"></a> when (b <span class="op">==</span> (<span class="op">-</span><span class="dv">1</span>) <span class="op">&&</span> a <span class="op">==</span> <span class="fu">minBound</span>) <span class="op">$</span></span>
<span id="cb9-45"><a href="#cb9-45" aria-hidden="true" tabindex="-1"></a> throwInterpretError <span class="st">"Arithmetic overflow"</span></span>
<span id="cb9-46"><a href="#cb9-46" aria-hidden="true" tabindex="-1"></a> PA.writePrimArray stack (sp <span class="op">-</span> <span class="dv">2</span>) <span class="op">$</span> a <span class="ot">`div`</span> b</span>
<span id="cb9-47"><a href="#cb9-47" aria-hidden="true" tabindex="-1"></a> go (sp <span class="op">-</span> <span class="dv">1</span>) (ip <span class="op">+</span> <span class="dv">1</span>) stack</span>
<span id="cb9-48"><a href="#cb9-48" aria-hidden="true" tabindex="-1"></a> n <span class="ot">-></span> throwInterpretError <span class="op">$</span></span>
<span id="cb9-49"><a href="#cb9-49" aria-hidden="true" tabindex="-1"></a> <span class="st">"Invalid bytecode: "</span> <span class="op"><></span> <span class="fu">show</span> n <span class="op"><></span> <span class="st">" at: "</span> <span class="op"><></span> <span class="fu">show</span> ip</span>
<span id="cb9-50"><a href="#cb9-50" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb9-51"><a href="#cb9-51" aria-hidden="true" tabindex="-1"></a> interpretBinOp op <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb9-52"><a href="#cb9-52" aria-hidden="true" tabindex="-1"></a> b <span class="ot"><-</span> PA.readPrimArray stack <span class="op">$</span> sp <span class="op">-</span> <span class="dv">1</span></span>
<span id="cb9-53"><a href="#cb9-53" aria-hidden="true" tabindex="-1"></a> a <span class="ot"><-</span> PA.readPrimArray stack <span class="op">$</span> sp <span class="op">-</span> <span class="dv">2</span></span>
<span id="cb9-54"><a href="#cb9-54" aria-hidden="true" tabindex="-1"></a> PA.writePrimArray stack (sp <span class="op">-</span> <span class="dv">2</span>) <span class="op">$</span> a <span class="ot">`op`</span> b</span>
<span id="cb9-55"><a href="#cb9-55" aria-hidden="true" tabindex="-1"></a> go (sp <span class="op">-</span> <span class="dv">1</span>) (ip <span class="op">+</span> <span class="dv">1</span>) stack</span>
<span id="cb9-56"><a href="#cb9-56" aria-hidden="true" tabindex="-1"></a> <span class="ot">{-# INLINE interpretBinOp #-}</span></span>
<span id="cb9-57"><a href="#cb9-57" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-58"><a href="#cb9-58" aria-hidden="true" tabindex="-1"></a> throwIPOOBError ip <span class="ot">=</span> throwInterpretError <span class="op">$</span></span>
<span id="cb9-59"><a href="#cb9-59" aria-hidden="true" tabindex="-1"></a> <span class="st">"Instruction index "</span> <span class="op"><></span> <span class="fu">show</span> ip <span class="op"><></span> <span class="st">" out of bound "</span> <span class="op"><></span> <span class="fu">show</span> (size <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb9-60"><a href="#cb9-60" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-61"><a href="#cb9-61" aria-hidden="true" tabindex="-1"></a> throwInsufficientElementsError <span class="ot">=</span></span>
<span id="cb9-62"><a href="#cb9-62" aria-hidden="true" tabindex="-1"></a> throwInterpretError <span class="st">"Not enough elements to execute operation"</span></span>
<span id="cb9-63"><a href="#cb9-63" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-64"><a href="#cb9-64" aria-hidden="true" tabindex="-1"></a> throwInterpretError <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">Error</span> <span class="dt">InterpretBytecode</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>The <code>interpretBytecode'</code> function is where the action happens. It is way more complex than <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter"><code>interpretAST</code></a>, but the complexity has a reason, namely performance.</p>
<p><code>interpretBytecode'</code> runs inside the <a href="https://hackage.haskell.org/package/base/docs/Control-Monad-ST.html#t:ST" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ST</span></code></a> monad wrapped with the <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Except.html#t:ExceptT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ExceptT</span></code></a> monad transformer. <code>ST</code> monad lets us use mutable data structures locally while ensuring the function remains externally pure. <code class="sourceCode haskell"><span class="dt">ExceptT</span></code> monad transformer adds support for throwing and propagating errors in a pure manner.</p>
<p>We use <a href="https://hackage.haskell.org/package/primitive/docs/Data-Primitive-PrimArray.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">PrimArray</span></code></a> for our stack, which is a mutable array of unboxed primitive types, in our case an array of <a href="https://hackage.haskell.org/package/base/docs/Data-Int.html#t:Int16" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Int16</span></code></a> values. Using a mutable unboxed array is much faster than using an immutable and/or boxed one like <a href="https://hackage.haskell.org/package/containers/docs/Data-Sequence.html#t:Seq" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Seq</span></code></a> or <a href="https://hackage.haskell.org/package/vector/docs/Data-Vector.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Vector</span></code></a> due to reduced allocation and/or pointer chasing.</p>
<p>The core of the <abbr title="Virtual Machine">VM</abbr> is the <code>go</code> function, a tight, tail-recursive loop that <abbr title="Glasgow Haskell Compiler">GHC</abbr> compiles into an efficient machine loop, as we see later. It takes the stack pointer (<code>sp</code>), instruction pointer (<code>ip</code>)<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>, and the stack as arguments.</p>
<p>At the top of each loop, a block of guard clauses checks for stack overflow, underflow, and other error conditions before branching on the current opcode. Placing these checks at the top instead of inside the opcode cases is a deliberate choice. This may make the code slightly harder to understand, but it significantly improves the performance of the loop by moving all branching at the beginning of the loop, resulting in code that is more friendly to the CPU’s <em><a href="https://en.wikipedia.org/wiki/Branch_predictor" target="_blank" rel="noopener">Branch Predictor</a></em>. Also notice how we reduce the number of checks by working with a range of opcodes at once in the <code class="sourceCode haskell">opcode <span class="op">>=</span> <span class="dv">2</span></code> guard. The checks are also sorted so as to be most performant, guided by profiling and benchmarking<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<p>The handling of each opcode is actually pretty straightforward. We use different <code class="sourceCode haskell"><span class="dt">PrimArray</span></code> specific operations to read and write to the stack, while taking care of doing the required bound and arithmetic checks. We also use the <code>readInstr*</code> functions that we wrote <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#cb7-1">earlier</a>.</p>
<p>After carrying out each operation, we reenter the loop by calling it tail-recursively with the right stack and instruction pointers. Finally, we make sure that the execution terminated correctly by checking the state of the stack, and return its first element.</p>
<h3 id="peeking-under-the-hood-ghc-core">Peeking Under the Hood: <abbr title="Glasgow Haskell Compiler">GHC</abbr> Core</h3>
<p>We see <a href="#benchmarking-the-vm">later</a> that the <abbr title="Virtual Machine">VM</abbr> is quite fast, but how does <abbr title="Glasgow Haskell Compiler">GHC</abbr> achieve this performance? To see the magic, we can look at <abbr title="Glasgow Haskell Compiler">GHC</abbr>’s intermediate language: <a href="https://hackage-content.haskell.org/package/ghc/docs/GHC-Core.html" target="_blank" rel="noopener">Core</a>. Core is a simpler functional language than Haskell to which <abbr title="Glasgow Haskell Compiler">GHC</abbr> compiles Haskell. The simpler nature of Core makes it easier for <abbr title="Glasgow Haskell Compiler">GHC</abbr> to optimize it, and compile it further. We can get the Core code for a program by compiling with the <abbr title="Glasgow Haskell Compiler">GHC</abbr> option <code>-ddump-simpl</code>.</p>
<p>The actual Core code for our <abbr title="Virtual Machine">VM</abbr> is too verbose to show here, but here is a simplified C-like pseudo-code version of our <code>go</code> loop:</p>
<div class="sourceCode" id="cb10" data-lang="core"><pre class="sourceCode c numberSource"><code class="sourceCode c"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>$wgo <span class="op">(</span>stack_addr<span class="op">,</span> ip<span class="op">,</span> sp<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>ip <span class="op">==</span> bytecode_size<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> sp<span class="op">;</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>sp <span class="op">>=</span> stack_size<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> throw <span class="st">"Stack Overflow"</span><span class="op">;</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>sp <span class="op"><</span> <span class="dv">0</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> throw <span class="st">"Stack Underflow"</span><span class="op">;</span></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> opcode <span class="op">=</span> read_byte_at<span class="op">(</span>bytecode_addr<span class="op">,</span> ip<span class="op">);</span></span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... other checks ...</span></span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a> <span class="cf">switch</span> <span class="op">(</span>opcode<span class="op">)</span> <span class="op">{</span></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> <span class="dv">0</span><span class="op">:</span> <span class="co">// OPush</span></span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a> val <span class="op">=</span> read_int16_at<span class="op">(</span>bytecode_addr<span class="op">,</span> ip <span class="op">+</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a> write_int16_at<span class="op">(</span>stack_addr<span class="op">,</span> sp<span class="op">,</span> val<span class="op">);</span></span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a> jump $wgo<span class="op">(</span>stack_addr<span class="op">,</span> ip <span class="op">+</span> <span class="dv">3</span><span class="op">,</span> sp <span class="op">+</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> <span class="dv">3</span><span class="op">:</span> <span class="co">// OAdd</span></span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true" tabindex="-1"></a> val2 <span class="op">=</span> read_int16_at<span class="op">(</span>stack_addr<span class="op">,</span> sp <span class="op">-</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true" tabindex="-1"></a> val1 <span class="op">=</span> read_int16_at<span class="op">(</span>stack_addr<span class="op">,</span> sp <span class="op">-</span> <span class="dv">2</span><span class="op">);</span></span>
<span id="cb10-24"><a href="#cb10-24" aria-hidden="true" tabindex="-1"></a> write_int16_at<span class="op">(</span>stack_addr<span class="op">,</span> sp <span class="op">-</span> <span class="dv">2</span><span class="op">,</span> val1 <span class="op">+</span> val2<span class="op">);</span></span>
<span id="cb10-25"><a href="#cb10-25" aria-hidden="true" tabindex="-1"></a> jump $wgo<span class="op">(</span>stack_addr<span class="op">,</span> ip <span class="op">+</span> <span class="dv">1</span><span class="op">,</span> sp <span class="op">-</span> <span class="dv">1</span><span class="op">);</span></span>
<span id="cb10-26"><a href="#cb10-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-27"><a href="#cb10-27" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... other cases ...</span></span>
<span id="cb10-28"><a href="#cb10-28" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb10-29"><a href="#cb10-29" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>A few key optimizations are worth pointing out:</p>
<ol type="1">
<li><p><strong>The loop:</strong> The tail-recursive <code>go</code> function is compiled into a proper loop. The <code>jump $wgo(...)</code> instruction is effectively a <code class="sourceCode c"><span class="cf">goto</span></code>, which means there’s no function call overhead for each iteration of the <abbr title="Virtual Machine">VM</abbr> loop.</p></li>
<li><p><strong>Unboxing:</strong> The Core code is full of primitive, unboxed types like <code class="sourceCode haskell"><span class="dt">Int</span><span class="op">#</span></code>, <code class="sourceCode haskell"><span class="dt">Addr</span><span class="op">#</span></code>, and <code class="sourceCode haskell"><span class="dt">Word</span><span class="op">#</span></code>, and operations on them. These are raw machine integers and memory addresses, not boxed Haskell objects. This means operations on them are as fast as they would be in C. The stack operations are not function calls on a <code class="sourceCode haskell"><span class="dt">PrimArray</span></code> instance, but primitive memory reads and writes on a raw memory address <code>stack_addr</code>.</p></li>
<li><p><strong>Inlining:</strong> The <code>interpretBinOp</code> helper function is completely inlined into the main loop. For <code class="sourceCode haskell"><span class="dt">OAdd</span></code>, the code for reading two values, adding them, and writing the result is laid out inline, and works on unboxed values and array address.</p></li>
</ol>
<p>In short, <abbr title="Glasgow Haskell Compiler">GHC</abbr> has turned our high-level, declarative Haskell code into a low-level loop that looks remarkably like one we would write in C. We get the safety and expressiveness of Haskell, while <abbr title="Glasgow Haskell Compiler">GHC</abbr> does the heavy lifting to produce highly optimized code. It’s the best of both worlds!</p>
<h2 data-track-content data-content-name="testing-the-vm" data-content-piece="arithmetic-bytecode-vm" id="testing-the-vm">Testing the <abbr title="Virtual Machine">VM</abbr></h2>
<p>We must test the <abbr title="Virtual Machine">VM</abbr> to make sure it works correctly<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. We reuse <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-interpreter">the success and failure tests</a> for the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter, as the bytecode interpreter should yield the same result:</p>
<figure>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">bytecodeInterpreterSpec ::</span> <span class="dt">Spec</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>bytecodeInterpreterSpec <span class="ot">=</span> describe <span class="st">"Bytecode interpreter"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> forM_ astInterpreterSuccessTests <span class="op">$</span> \(input, result) <span class="ot">-></span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"interprets: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> parseCompileInterpret input <span class="ot">`shouldBe`</span> <span class="dt">Right</span> result</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> forM_ errorTests <span class="op">$</span> \(input, err) <span class="ot">-></span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"fails for: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> parseCompileInterpret input <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Error</span> <span class="dt">InterpretBytecode</span> msg) <span class="op">|</span> err <span class="op">==</span> msg <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> parseCompileInterpret <span class="ot">=</span> parseSized <span class="op">>=></span> compile <span class="op">>=></span> interpretBytecode' <span class="dv">7</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> errorTests <span class="ot">=</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">"1/0"</span>, <span class="st">"Division by zero"</span>),</span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a> (<span class="st">"-32768 / -1"</span>, <span class="st">"Arithmetic overflow"</span>),</span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let a = 0 in let b = 0 in let c = 0 in let d = 0 in let e = 0 in "</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">"let f = 0 in a + b + c + d + e + f"</span>,</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a> <span class="st">"Stack overflow"</span></span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode ::</span></span>
<span id="cb11-25"><a href="#cb11-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Spec</span></span>
<span id="cb11-26"><a href="#cb11-26" aria-hidden="true" tabindex="-1"></a>prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode <span class="ot">=</span></span>
<span id="cb11-27"><a href="#cb11-27" aria-hidden="true" tabindex="-1"></a> prop ( <span class="st">"Property: Interpret AST returns same result as compile"</span></span>
<span id="cb11-28"><a href="#cb11-28" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">" then interpret bytecode"</span> ) <span class="op">$</span> \expr <span class="ot">-></span></span>
<span id="cb11-29"><a href="#cb11-29" aria-hidden="true" tabindex="-1"></a> interpretAST expr <span class="op">==</span> (compile (sizedExpr expr) <span class="op">>>=</span> interpretBytecode)</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>We also add a property-based test this time: for any given expression, interpreting the <abbr title="Abstract Syntax Tree">AST</abbr> should produce the same result as compiling it to bytecode and executing it in the <abbr title="Virtual Machine">VM</abbr><a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>.</p>
<p>Our test suite is complete now:</p>
<figure>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> hspec <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> parserSpec</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> astInterpreterSpec</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> compilerSpec</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> prop_print_ast_then_parse_returns_same_ast</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> prop_disassemble_bytecode_then_decompile_then_compile_returns_same_bytecode</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> bytecodeInterpreterSpec</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> prop_interpret_ast_returns_same_result_as_compile_assemble_then_interpret_bytecode</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>And finally, we run all tests together:</p>
<details>
<summary>
Test run
</summary>
<pre class="plain"><code>$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...
Parser
parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
parses: "1+2-3*4+5/6/0+1" [✔]
parses: "1 + -1" [✔]
parses: "let x = 4 in x + 1" [✔]
parses: "let x=4in x+1" [✔]
parses: "let x = 4 in let y = 5 in x + y" [✔]
parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
parses: "let x = let y = 3 in y + y in x * 3" [✔]
parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "" [✔]
fails for: "1 +" [✔]
fails for: "1 & 1" [✔]
fails for: "1 + 1 & 1" [✔]
fails for: "1 & 1 + 1" [✔]
fails for: "(" [✔]
fails for: "(1" [✔]
fails for: "(1 + " [✔]
fails for: "(1 + 2" [✔]
fails for: "(1 + 2}" [✔]
fails for: "66666" [✔]
fails for: "-x" [✔]
fails for: "let 1" [✔]
fails for: "let x = 1 in " [✔]
fails for: "let let = 1 in 1" [✔]
fails for: "let x = 1 in in" [✔]
fails for: "let x=1 inx" [✔]
fails for: "letx = 1 in x" [✔]
fails for: "let x ~ 1 in x" [✔]
fails for: "let x = 1 & 2 in x" [✔]
fails for: "let x = 1 inx" [✔]
fails for: "let x = 1 in x +" [✔]
fails for: "let x = 1 in x in" [✔]
fails for: "let x = let x = 1 in x" [✔]
AST interpreter
interprets: "1" [✔]
interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
interprets: "1 + -1" [✔]
interprets: "1 * -1" [✔]
interprets: "let x = 4 in x + 1" [✔]
interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
interprets: "let x = 4 in let y = 5 in x + y" [✔]
interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
interprets: "let x = let y = 3 in y + y in x * 3" [✔]
interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "1/0" [✔]
fails for: "-32768 / -1" [✔]
Compiler
compiles: "1" [✔]
compiles: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
compiles: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
compiles: "let x = 4 in x + 1" [✔]
compiles: "let x = 4 in let y = 5 in x + y" [✔]
compiles: "let x = 4 in let x = x + 1 in x + 2" [✔]
compiles: "let x = let y = 3 in y + y in x * 3" [✔]
compiles: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
compiles: "1/0" [✔]
compiles: "-32768 / -1" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "let x = 4 in let y = 1 in let z = 2 in y + x" [✔]
fails for: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in d" [✔]
fails for greater sized expr [✔]
fails for lesser sized expr [✔]
Property: Print AST then parse returns same AST [✔]
+++ OK, passed 100 tests.
Property: Disassemble bytecode then decompile then compile returns same bytecode [✔]
+++ OK, passed 100 tests.
Bytecode interpreter
interprets: "1" [✔]
interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
interprets: "1 + -1" [✔]
interprets: "1 * -1" [✔]
interprets: "let x = 4 in x + 1" [✔]
interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
interprets: "let x = 4 in let y = 5 in x + y" [✔]
interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
interprets: "let x = let y = 3 in y + y in x * 3" [✔]
interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "1/0" [✔]
fails for: "-32768 / -1" [✔]
fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in let e = 0 in let f = 0 in a + b + c + d + e + f" [✔]
Property: Interpret AST returns same result as compile then interpret bytecode [✔]
+++ OK, passed 100 tests.
Finished in 0.0166 seconds
91 examples, 0 failures
Test suite specs: PASS</code></pre>
</details>
<p>Happily, all tests pass.</p>
<h2 data-track-content data-content-name="benchmarking-the-vm" data-content-piece="arithmetic-bytecode-vm" id="benchmarking-the-vm">Benchmarking the <abbr title="Virtual Machine">VM</abbr></h2>
<p>Now for the fun part: benchmarking. We use the <a href="https://hackage.haskell.org/package/criterion" target="_blank" rel="noopener">criterion</a> library to benchmark the code.</p>
<figure>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">ArithVMLib</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Arrow</span> ((>>>))</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.DeepSeq</span> (force)</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span> (evaluate)</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> ((>=>))</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Criterion</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Criterion.Main</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Criterion.Main.Options</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Criterion.Types</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BS</span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> BS.getContents <span class="op">>>=</span> evaluate <span class="op">.</span> force</span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> <span class="dt">Right</span> ast <span class="ot">=</span> force <span class="op">$</span> parseSized code</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> bytecode <span class="ot">=</span> force <span class="op">$</span> compile ast</span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> program <span class="ot">=</span> force <span class="op">$</span> disassemble bytecode</span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a> runMode</span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Run</span></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a> (defaultConfig {reportFile <span class="ot">=</span> <span class="dt">Just</span> <span class="st">"benchmark.html"</span>})</span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Prefix</span></span>
<span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a> []</span>
<span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb14-28"><a href="#cb14-28" aria-hidden="true" tabindex="-1"></a> [ bgroup</span>
<span id="cb14-29"><a href="#cb14-29" aria-hidden="true" tabindex="-1"></a> <span class="st">"pass"</span></span>
<span id="cb14-30"><a href="#cb14-30" aria-hidden="true" tabindex="-1"></a> [ bench <span class="st">"parse"</span> <span class="op">$</span> whnf (parseSized <span class="op">>>></span> force) code,</span>
<span id="cb14-31"><a href="#cb14-31" aria-hidden="true" tabindex="-1"></a> bench <span class="st">"compile"</span> <span class="op">$</span> whnf (compile <span class="op">>>></span> force) ast,</span>
<span id="cb14-32"><a href="#cb14-32" aria-hidden="true" tabindex="-1"></a> bench <span class="st">"disassemble"</span> <span class="op">$</span> whnf (disassemble <span class="op">>>></span> force) bytecode,</span>
<span id="cb14-33"><a href="#cb14-33" aria-hidden="true" tabindex="-1"></a> bench <span class="st">"decompile"</span> <span class="op">$</span> whnf (decompile <span class="op">>>></span> force) program</span>
<span id="cb14-34"><a href="#cb14-34" aria-hidden="true" tabindex="-1"></a> ],</span>
<span id="cb14-35"><a href="#cb14-35" aria-hidden="true" tabindex="-1"></a> bgroup</span>
<span id="cb14-36"><a href="#cb14-36" aria-hidden="true" tabindex="-1"></a> <span class="st">"interpret"</span></span>
<span id="cb14-37"><a href="#cb14-37" aria-hidden="true" tabindex="-1"></a> [ bench <span class="st">"ast"</span> <span class="op">$</span> whnf (<span class="fu">fst</span> <span class="op">>>></span> interpretAST <span class="op">>>></span> force) ast,</span>
<span id="cb14-38"><a href="#cb14-38" aria-hidden="true" tabindex="-1"></a> bench <span class="st">"bytecode"</span> <span class="op">$</span> whnf (interpretBytecode <span class="op">>>></span> force) bytecode</span>
<span id="cb14-39"><a href="#cb14-39" aria-hidden="true" tabindex="-1"></a> ],</span>
<span id="cb14-40"><a href="#cb14-40" aria-hidden="true" tabindex="-1"></a> bgroup</span>
<span id="cb14-41"><a href="#cb14-41" aria-hidden="true" tabindex="-1"></a> <span class="st">"run"</span></span>
<span id="cb14-42"><a href="#cb14-42" aria-hidden="true" tabindex="-1"></a> [ bench <span class="st">"ast"</span> <span class="op">$</span></span>
<span id="cb14-43"><a href="#cb14-43" aria-hidden="true" tabindex="-1"></a> whnf (parse <span class="op">>=></span> interpretAST <span class="op">>>></span> force) code,</span>
<span id="cb14-44"><a href="#cb14-44" aria-hidden="true" tabindex="-1"></a> bench <span class="st">"bytecode"</span> <span class="op">$</span></span>
<span id="cb14-45"><a href="#cb14-45" aria-hidden="true" tabindex="-1"></a> whnf (parseSized <span class="op">>=></span> compile <span class="op">>=></span> interpretBytecode <span class="op">>>></span> force) code</span>
<span id="cb14-46"><a href="#cb14-46" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb14-47"><a href="#cb14-47" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<figcaption>
ArithVMBench.hs
</figcaption>
</figure>
<p>We have a benchmark suite to measure the performance of each pass, the two interpreters (<abbr title="Abstract Syntax Tree">AST</abbr> and bytecode), and the full end-to-end runs<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. We compile with the following <abbr title="Glasgow Haskell Compiler">GHC</abbr> options:</p>
<pre class="plain"><code> -O2
-fllvm
-funbox-strict-fields
-funfolding-use-threshold=16
-threaded
-rtsopts
-with-rtsopts=-N2</code></pre>
<details>
<summary>
Benchmark run
</summary>
<pre class="plain"><code>$ cat benchmark.tb | cabal bench
Running 1 benchmarks...
Benchmark bench: RUNNING...
benchmarking pass/parse
time 581.1 ms (566.7 ms .. 594.3 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 573.5 ms (570.4 ms .. 577.1 ms)
std dev 3.948 ms (1.359 ms .. 5.424 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking pass/compile
time 51.00 ms (50.48 ms .. 52.54 ms)
0.998 R² (0.995 R² .. 1.000 R²)
mean 50.82 ms (50.57 ms .. 51.87 ms)
std dev 810.9 μs (185.8 μs .. 1.509 ms)
benchmarking pass/disassemble
time 160.3 ms (154.7 ms .. 166.5 ms)
0.998 R² (0.990 R² .. 1.000 R²)
mean 155.8 ms (150.0 ms .. 160.5 ms)
std dev 7.642 ms (4.255 ms .. 11.76 ms)
variance introduced by outliers: 12% (moderately inflated)
benchmarking pass/decompile
time 495.1 ms (454.0 ms .. 523.7 ms)
0.999 R² (0.999 R² .. 1.000 R²)
mean 506.5 ms (495.0 ms .. 525.1 ms)
std dev 17.73 ms (2.167 ms .. 22.59 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking interpret/ast
time 49.57 ms (49.53 ms .. 49.61 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 49.80 ms (49.71 ms .. 50.07 ms)
std dev 255.9 μs (124.2 μs .. 433.9 μs)
benchmarking interpret/bytecode
time 15.83 ms (15.79 ms .. 15.88 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 15.79 ms (15.75 ms .. 15.83 ms)
std dev 96.85 μs (70.30 μs .. 140.9 μs)
benchmarking run/ast
time 628.0 ms (626.7 ms .. 630.5 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 617.2 ms (610.2 ms .. 621.0 ms)
std dev 6.679 ms (1.899 ms .. 8.802 ms)
variance introduced by outliers: 19% (moderately inflated)
benchmarking run/bytecode
time 643.8 ms (632.5 ms .. 655.3 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 638.3 ms (635.8 ms .. 641.2 ms)
std dev 2.981 ms (1.292 ms .. 4.153 ms)
variance introduced by outliers: 19% (moderately inflated)
Benchmark bench: FINISH</code></pre>
</details>
<p>Here are the results in a more digestible format:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Benchmark</th>
<th style="text-align: right;">Mean Time (ms)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">pass/parse</td>
<td style="text-align: right;">573.5</td>
</tr>
<tr>
<td style="text-align: left;">pass/compile</td>
<td style="text-align: right;">50.8</td>
</tr>
<tr>
<td style="text-align: left;">pass/disassemble</td>
<td style="text-align: right;">155.8</td>
</tr>
<tr>
<td style="text-align: left;">pass/decompile</td>
<td style="text-align: right;">506.5</td>
</tr>
<tr>
<td style="text-align: left;">interpret/ast</td>
<td style="text-align: right;">49.8</td>
</tr>
<tr>
<td style="text-align: left;">interpret/bytecode</td>
<td style="text-align: right;">15.8</td>
</tr>
<tr>
<td style="text-align: left;">run/ast</td>
<td style="text-align: right;">617.2</td>
</tr>
<tr>
<td style="text-align: left;">run/bytecode</td>
<td style="text-align: right;">638.3</td>
</tr>
</tbody>
</table>
</div>
<p>Here are the times in a chart (smaller is better):</p>
<figure class="w-100pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot7696613430207986151.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot7696613430207986151.svg" alt="Benchmark times"></img>
<noscript><img src="/images/plots/pandocplot7696613430207986151.svg" class="w-100pct" alt="Benchmark times"></img></noscript></a>
<figcaption>Benchmark times</figcaption>
</figure>
<p>Let’s break down these numbers:</p>
<ul>
<li><strong>Parsing and decompiling are slow</strong>: At ~573ms and ~506ms, these are by far the slowest passes. This isn’t surprising. Parsing with parser combinators has a known trade-off of expressiveness for performance. Decompiling is a shift-reduce parser that reconstructs an <abbr title="Abstract Syntax Tree">AST</abbr> from a linear stream of opcodes, and we didn’t spend any time optimizing it.</li>
<li><strong>Compilation is fast</strong>: At ~51ms, compilation is an order of magnitude faster than parsing. This is thanks to pre-calculating the bytecode size during the parsing phase, which allows us to pre-allocate a single <code class="sourceCode haskell"><span class="dt">ByteString</span></code> and fill it in with low-level pointer operations.</li>
<li><strong>Bytecode interpretation is blazingly fast</strong>: At just ~16ms, our <abbr title="Virtual Machine">VM</abbr>’s interpreter is over 3 times faster than the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter (~50ms), which proves our belief that bytecode interpreters are faster.</li>
<li><strong>End-to-end runs</strong>: Interestingly, the total time to run via bytecode (~638ms) is slightly slower than the run via <abbr title="Abstract Syntax Tree">AST</abbr> (~617ms). This is because the cost of parsing, compiling, and then interpreting is higher than just parsing and interpreting. The real win for a bytecode <abbr title="Virtual Machine">VM</abbr> comes when you compile <em>once</em> and run <em>many times</em>, amortizing the initial compilation cost.</li>
</ul>
<p>I can already see readers thinking, “Sure that’s fast, but is it faster than C/Rust/Zig/my favourite language?” Let’s find out.</p>
<h2 data-track-content data-content-name="benchmarking-against-c" data-content-piece="arithmetic-bytecode-vm" id="benchmarking-against-c">Benchmarking Against C</h2>
<p>To get a better sense of our <abbr title="Virtual Machine">VM</abbr>’s performance, I rewrote it in C.</p>
<p><a href="https://abhinavsarkar.net/code/arithvm.html?mtm_campaign=feed">The C implementation</a> is a classic manual approach: a hand-written tokenizer and recursive-descent parser, <code class="sourceCode c"><span class="kw">struct</span></code>s with pointers for the <abbr title="Abstract Syntax Tree">AST</abbr>, and manual memory management and error propagation. The <abbr title="Virtual Machine">VM</abbr> is a simple <code class="sourceCode c"><span class="cf">while</span></code> loop with a <code class="sourceCode c"><span class="cf">switch</span></code> statement for dispatching opcodes<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<p>To compare our Haskell code against the C code, we need to write the last Haskell module, the <abbr title="Command-line interface">CLI</abbr> app that we demonstrated in <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#introduction">the first post</a>:</p>
<details>
<summary>
ArithVMApp.hs
</summary>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">ArithVMLib</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Arrow</span> ((>>>))</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> ((>=>))</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BS</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Foldable</span> (toList)</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Set</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Set</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.String</span> (<span class="dt">IsString</span> (fromString))</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Options.Applicative</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">O</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Exit</span> (exitFailure)</span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.IO</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">IO</span></span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.QuickCheck</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Q</span></span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Pretty.Simple</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">PS</span></span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Command</span></span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">RunPass</span> <span class="dt">Pass</span> <span class="dt">Input</span></span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Run</span> <span class="dt">Input</span></span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Generate</span> <span class="dt">Int</span></span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Input</span> <span class="ot">=</span> <span class="dt">InputFP</span> <span class="dt">FilePath</span> <span class="op">|</span> <span class="dt">InputStdin</span> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">IsString</span> <span class="dt">Input</span> <span class="kw">where</span></span>
<span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a> fromString <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb17-28"><a href="#cb17-28" aria-hidden="true" tabindex="-1"></a> <span class="st">"-"</span> <span class="ot">-></span> <span class="dt">InputStdin</span></span>
<span id="cb17-29"><a href="#cb17-29" aria-hidden="true" tabindex="-1"></a> fp <span class="ot">-></span> <span class="dt">InputFP</span> fp</span>
<span id="cb17-30"><a href="#cb17-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-31"><a href="#cb17-31" aria-hidden="true" tabindex="-1"></a><span class="ot">commandParser ::</span> <span class="dt">IO</span> <span class="dt">Command</span></span>
<span id="cb17-32"><a href="#cb17-32" aria-hidden="true" tabindex="-1"></a>commandParser <span class="ot">=</span></span>
<span id="cb17-33"><a href="#cb17-33" aria-hidden="true" tabindex="-1"></a> O.customExecParser (O.prefs <span class="op">$</span> O.showHelpOnError <span class="op"><></span> O.showHelpOnEmpty)</span>
<span id="cb17-34"><a href="#cb17-34" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> O.info (O.hsubparser (<span class="fu">mconcat</span> subcommandParsers) <span class="op">O.<**></span> O.helper)</span>
<span id="cb17-35"><a href="#cb17-35" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> O.fullDesc <span class="op"><></span> O.header <span class="st">"Bytecode VM for Arithmetic written in Haskell"</span></span>
<span id="cb17-36"><a href="#cb17-36" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb17-37"><a href="#cb17-37" aria-hidden="true" tabindex="-1"></a> subcommandParsers <span class="ot">=</span></span>
<span id="cb17-38"><a href="#cb17-38" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span></span>
<span id="cb17-39"><a href="#cb17-39" aria-hidden="true" tabindex="-1"></a> ( \(command, pass, desc) <span class="ot">-></span></span>
<span id="cb17-40"><a href="#cb17-40" aria-hidden="true" tabindex="-1"></a> O.command command</span>
<span id="cb17-41"><a href="#cb17-41" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> O.info (<span class="dt">RunPass</span> pass <span class="op"><$></span> inputParser)</span>
<span id="cb17-42"><a href="#cb17-42" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> O.progDesc desc</span>
<span id="cb17-43"><a href="#cb17-43" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb17-44"><a href="#cb17-44" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">"read"</span>, <span class="dt">Read</span>, <span class="st">"Read an expression from file or STDIN"</span>),</span>
<span id="cb17-45"><a href="#cb17-45" aria-hidden="true" tabindex="-1"></a> (<span class="st">"parse"</span>, <span class="dt">Parse</span>, <span class="st">"Parse expression to AST"</span>),</span>
<span id="cb17-46"><a href="#cb17-46" aria-hidden="true" tabindex="-1"></a> (<span class="st">"print"</span>, <span class="dt">Print</span>, <span class="st">"Parse expression to AST and print it"</span>),</span>
<span id="cb17-47"><a href="#cb17-47" aria-hidden="true" tabindex="-1"></a> (<span class="st">"compile"</span>, <span class="dt">Compile</span>, <span class="st">"Parse and compile expression to bytecode"</span>),</span>
<span id="cb17-48"><a href="#cb17-48" aria-hidden="true" tabindex="-1"></a> (<span class="st">"disassemble"</span>, <span class="dt">Disassemble</span>, <span class="st">"Disassemble bytecode to opcodes"</span>),</span>
<span id="cb17-49"><a href="#cb17-49" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"decompile"</span>,</span>
<span id="cb17-50"><a href="#cb17-50" aria-hidden="true" tabindex="-1"></a> <span class="dt">Decompile</span>,</span>
<span id="cb17-51"><a href="#cb17-51" aria-hidden="true" tabindex="-1"></a> <span class="st">"Disassemble and decompile bytecode to expression"</span></span>
<span id="cb17-52"><a href="#cb17-52" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb17-53"><a href="#cb17-53" aria-hidden="true" tabindex="-1"></a> (<span class="st">"interpret-ast"</span>, <span class="dt">InterpretAST</span>, <span class="st">"Parse expression and interpret AST"</span>),</span>
<span id="cb17-54"><a href="#cb17-54" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"interpret-bytecode"</span>,</span>
<span id="cb17-55"><a href="#cb17-55" aria-hidden="true" tabindex="-1"></a> <span class="dt">InterpretBytecode</span>,</span>
<span id="cb17-56"><a href="#cb17-56" aria-hidden="true" tabindex="-1"></a> <span class="st">"Parse, compile and assemble expression, and interpret bytecode"</span></span>
<span id="cb17-57"><a href="#cb17-57" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb17-58"><a href="#cb17-58" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb17-59"><a href="#cb17-59" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> [ O.command <span class="st">"run"</span> <span class="op">.</span> O.info (<span class="dt">Run</span> <span class="op"><$></span> inputParser) <span class="op">$</span></span>
<span id="cb17-60"><a href="#cb17-60" aria-hidden="true" tabindex="-1"></a> O.progDesc <span class="st">"Run bytecode"</span>,</span>
<span id="cb17-61"><a href="#cb17-61" aria-hidden="true" tabindex="-1"></a> O.command <span class="st">"generate"</span> <span class="op">.</span> O.info (<span class="dt">Generate</span> <span class="op"><$></span> maxSizeParser) <span class="op">$</span></span>
<span id="cb17-62"><a href="#cb17-62" aria-hidden="true" tabindex="-1"></a> O.progDesc <span class="st">"Generate a random arithmetic expression"</span></span>
<span id="cb17-63"><a href="#cb17-63" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb17-64"><a href="#cb17-64" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-65"><a href="#cb17-65" aria-hidden="true" tabindex="-1"></a> inputParser <span class="ot">=</span></span>
<span id="cb17-66"><a href="#cb17-66" aria-hidden="true" tabindex="-1"></a> O.strArgument</span>
<span id="cb17-67"><a href="#cb17-67" aria-hidden="true" tabindex="-1"></a> ( O.metavar <span class="st">"FILE"</span></span>
<span id="cb17-68"><a href="#cb17-68" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.value <span class="dt">InputStdin</span></span>
<span id="cb17-69"><a href="#cb17-69" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.help <span class="st">"Input file, pass - to read from STDIN (default)"</span></span>
<span id="cb17-70"><a href="#cb17-70" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb17-71"><a href="#cb17-71" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-72"><a href="#cb17-72" aria-hidden="true" tabindex="-1"></a> maxSizeParser <span class="ot">=</span></span>
<span id="cb17-73"><a href="#cb17-73" aria-hidden="true" tabindex="-1"></a> O.option</span>
<span id="cb17-74"><a href="#cb17-74" aria-hidden="true" tabindex="-1"></a> O.auto</span>
<span id="cb17-75"><a href="#cb17-75" aria-hidden="true" tabindex="-1"></a> ( O.long <span class="st">"size"</span></span>
<span id="cb17-76"><a href="#cb17-76" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.short <span class="ch">'s'</span></span>
<span id="cb17-77"><a href="#cb17-77" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.metavar <span class="st">"INT"</span></span>
<span id="cb17-78"><a href="#cb17-78" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.value <span class="dv">100</span></span>
<span id="cb17-79"><a href="#cb17-79" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> O.help <span class="st">"Maximum size of the generated AST"</span></span>
<span id="cb17-80"><a href="#cb17-80" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb17-81"><a href="#cb17-81" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-82"><a href="#cb17-82" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb17-83"><a href="#cb17-83" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> commandParser <span class="op">>>=</span> runCommand</span>
<span id="cb17-84"><a href="#cb17-84" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-85"><a href="#cb17-85" aria-hidden="true" tabindex="-1"></a><span class="ot">runCommand ::</span> <span class="dt">Command</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb17-86"><a href="#cb17-86" aria-hidden="true" tabindex="-1"></a>runCommand <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb17-87"><a href="#cb17-87" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Read</span> i <span class="ot">-></span> run i (<span class="fu">const</span> <span class="op">$</span> <span class="fu">pure</span> ()) (\_ <span class="ot">-></span> <span class="dt">Right</span><span class="ot"> () ::</span> <span class="dt">Either</span> <span class="dt">String</span> ())</span>
<span id="cb17-88"><a href="#cb17-88" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Parse</span> i <span class="ot">-></span> run i (<span class="fu">const</span> <span class="op">$</span> <span class="fu">pure</span> ()) parse</span>
<span id="cb17-89"><a href="#cb17-89" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Print</span> i <span class="ot">-></span> run i pPrintExpr parse</span>
<span id="cb17-90"><a href="#cb17-90" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Compile</span> i <span class="ot">-></span> run i BS.putStr <span class="op">$</span> parseSized <span class="op">>=></span> compile</span>
<span id="cb17-91"><a href="#cb17-91" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Decompile</span> i <span class="ot">-></span> run i pPrintExpr <span class="op">$</span> disassemble <span class="op">>=></span> decompile</span>
<span id="cb17-92"><a href="#cb17-92" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">Disassemble</span> i <span class="ot">-></span> run i (<span class="fu">mapM_</span> <span class="fu">print</span>) <span class="op">$</span> disassemble <span class="op">>>></span> <span class="fu">fmap</span> toList</span>
<span id="cb17-93"><a href="#cb17-93" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">InterpretAST</span> i <span class="ot">-></span> run i <span class="fu">print</span> <span class="op">$</span> parse <span class="op">>=></span> interpretAST</span>
<span id="cb17-94"><a href="#cb17-94" aria-hidden="true" tabindex="-1"></a> <span class="dt">RunPass</span> <span class="dt">InterpretBytecode</span> i <span class="ot">-></span></span>
<span id="cb17-95"><a href="#cb17-95" aria-hidden="true" tabindex="-1"></a> run i <span class="fu">print</span> <span class="op">$</span> parseSized <span class="op">>=></span> compile <span class="op">>=></span> interpretBytecode</span>
<span id="cb17-96"><a href="#cb17-96" aria-hidden="true" tabindex="-1"></a> <span class="dt">Run</span> i <span class="ot">-></span> run i <span class="fu">print</span> interpretBytecode</span>
<span id="cb17-97"><a href="#cb17-97" aria-hidden="true" tabindex="-1"></a> <span class="dt">Generate</span> maxSize <span class="ot">-></span> Q.generate (exprGen Set.empty maxSize) <span class="op">>>=</span> pPrintExpr</span>
<span id="cb17-98"><a href="#cb17-98" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb17-99"><a href="#cb17-99" aria-hidden="true" tabindex="-1"></a> run input <span class="fu">print</span> process <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-100"><a href="#cb17-100" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> <span class="kw">case</span> input <span class="kw">of</span></span>
<span id="cb17-101"><a href="#cb17-101" aria-hidden="true" tabindex="-1"></a> <span class="dt">InputStdin</span> <span class="ot">-></span> BS.getContents</span>
<span id="cb17-102"><a href="#cb17-102" aria-hidden="true" tabindex="-1"></a> <span class="dt">InputFP</span> fp <span class="ot">-></span> BS.readFile fp</span>
<span id="cb17-103"><a href="#cb17-103" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> process code <span class="kw">of</span></span>
<span id="cb17-104"><a href="#cb17-104" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="dt">IO</span><span class="op">.</span>hPrint <span class="dt">IO</span><span class="op">.</span>stderr err <span class="op">>></span> exitFailure</span>
<span id="cb17-105"><a href="#cb17-105" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> val <span class="ot">-></span> <span class="fu">print</span> val</span>
<span id="cb17-106"><a href="#cb17-106" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-107"><a href="#cb17-107" aria-hidden="true" tabindex="-1"></a> pPrintExpr <span class="ot">=</span></span>
<span id="cb17-108"><a href="#cb17-108" aria-hidden="true" tabindex="-1"></a> PS.pPrintOpt <span class="dt">PS.CheckColorTty</span> <span class="op">$</span></span>
<span id="cb17-109"><a href="#cb17-109" aria-hidden="true" tabindex="-1"></a> PS.defaultOutputOptionsDarkBg</span>
<span id="cb17-110"><a href="#cb17-110" aria-hidden="true" tabindex="-1"></a> { PS.outputOptionsIndentAmount <span class="ot">=</span> <span class="dv">2</span>,</span>
<span id="cb17-111"><a href="#cb17-111" aria-hidden="true" tabindex="-1"></a> PS.outputOptionsCompact <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb17-112"><a href="#cb17-112" aria-hidden="true" tabindex="-1"></a> }</span></code></pre></div>
</details>
<p>We compile with the following <abbr title="Glasgow Haskell Compiler">GHC</abbr> options<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>:</p>
<pre class="plain"><code> -O2
-fllvm
-funbox-strict-fields
-funfolding-use-threshold=16</code></pre>
<p>And for the C version, we compile using GCC:</p>
<pre class="plain"><code>gcc -O3 arithvm.c -o arithvm -Wall</code></pre>
<p>Now, let’s see how they stack up against each other. We use <a href="https://github.com/sharkdp/hyperfine" target="_blank" rel="noopener">hyperfine</a> to run the two executables.</p>
<details>
<summary>
Hyperfine run
</summary>
<pre class="plain"><code>$ arith-vm compile benchmark.tb > benchmark.tbc
# Haskell runs
$ hyperfine -L pass read,parse,compile,interpret-bytecode --warmup 10 -r 30 \
"arith-vm {pass} benchmark.tb"
Benchmark 1: arith-vm read benchmark.tb
Time (mean ± σ): 30.4 ms ± 0.2 ms [User: 2.4 ms, System: 15.9 ms]
Range (min … max): 30.0 ms … 30.9 ms 30 runs
Benchmark 2: arith-vm parse benchmark.tb
Time (mean ± σ): 567.6 ms ± 5.7 ms [User: 537.4 ms, System: 22.0 ms]
Range (min … max): 554.7 ms … 579.9 ms 30 runs
Benchmark 3: arith-vm compile benchmark.tb
Time (mean ± σ): 630.0 ms ± 4.5 ms [User: 598.5 ms, System: 23.5 ms]
Range (min … max): 622.6 ms … 641.1 ms 30 runs
Benchmark 4: arith-vm interpret-bytecode benchmark.tb
Time (mean ± σ): 650.2 ms ± 4.9 ms [User: 619.0 ms, System: 23.3 ms]
Range (min … max): 640.9 ms … 656.6 ms 30 runs
$ hyperfine --warmup 10 -r 30 "arith-vm run benchmark.tbc"
Benchmark 1: arith-vm run benchmark.tbc
Time (mean ± σ): 29.3 ms ± 0.2 ms [User: 17.6 ms, System: 2.9 ms]
Range (min … max): 28.9 ms … 29.6 ms 30 runs
# C runs
$ hyperfine -L pass read,parse,compile,interpret --warmup 10 -r 30 \
"./arithvm {pass} benchmark.tb"
Benchmark 1: ./arithvm read benchmark.tb
Time (mean ± σ): 14.2 ms ± 0.2 ms [User: 0.8 ms, System: 13.0 ms]
Range (min … max): 14.0 ms … 14.6 ms 30 runs
Benchmark 2: ./arithvm parse benchmark.tb
Time (mean ± σ): 217.4 ms ± 2.6 ms [User: 192.2 ms, System: 23.7 ms]
Range (min … max): 213.6 ms … 223.9 ms 30 runs
Benchmark 3: ./arithvm compile benchmark.tb
Time (mean ± σ): 254.5 ms ± 2.9 ms [User: 228.3 ms, System: 24.7 ms]
Range (min … max): 246.0 ms … 259.1 ms 30 runs
Benchmark 4: ./arithvm interpret benchmark.tb
Time (mean ± σ): 267.9 ms ± 2.1 ms [User: 241.5 ms, System: 24.9 ms]
Range (min … max): 263.4 ms … 272.2 ms 30 runs
$ hyperfine --warmup 10 -r 30 "./arithvm run benchmark.tbc"
Benchmark 1: ./arithvm run benchmark.tbc
Time (mean ± σ): 13.9 ms ± 0.1 ms [User: 12.4 ms, System: 1.1 ms]
Range (min … max): 13.6 ms … 14.1 ms 30 runs</code></pre>
</details>
<p>Here’s a summary of the results:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Pass</th>
<th style="text-align: right;">C Time (ms)</th>
<th style="text-align: right;">Haskell Time (ms)</th>
<th style="text-align: right;">Slowdown</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Read</td>
<td style="text-align: right;">14.2</td>
<td style="text-align: right;">30.4</td>
<td style="text-align: right;">2.14x</td>
</tr>
<tr>
<td style="text-align: left;">Parse</td>
<td style="text-align: right;">203.2</td>
<td style="text-align: right;">537.2</td>
<td style="text-align: right;">2.64x</td>
</tr>
<tr>
<td style="text-align: left;">Compile</td>
<td style="text-align: right;">37.1</td>
<td style="text-align: right;">62.4</td>
<td style="text-align: right;">1.68x</td>
</tr>
<tr>
<td style="text-align: left;">Interpret</td>
<td style="text-align: right;">13.4</td>
<td style="text-align: right;">20.2</td>
<td style="text-align: right;">1.51x</td>
</tr>
<tr>
<td style="text-align: left;">Run</td>
<td style="text-align: right;">13.9</td>
<td style="text-align: right;">29.3</td>
<td style="text-align: right;">2.11x</td>
</tr>
</tbody>
</table>
</div>
<p>I have subtracted the times of previous passes to get the times for individual passes. Here’s the same in a chart (smaller is better):</p>
<figure class="w-100pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot18010519765177174314.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot18010519765177174314.svg" alt="Run time of different passes for C and Haskell VMs"></img>
<noscript><img src="/images/plots/pandocplot18010519765177174314.svg" class="w-100pct" alt="Run time of different passes for C and Haskell VMs"></img></noscript></a>
<figcaption>Run time of different passes for C and Haskell VMs</figcaption>
</figure>
<p>As expected, the C implementation is faster across the board, between 1.5x to 2.6x. The biggest difference is in parsing, where the hand-written C parser is more than twice as fast as our combinator-based one. On the other hand, the Haskell <abbr title="Virtual Machine">VM</abbr> is only 50% slower than the C <abbr title="Virtual Machine">VM</abbr>. In my opinion, the Haskell code’s performance is quite respectable, especially given the safety, expressiveness and conciseness benefits, as illustrated by the code sizes<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Implementation</th>
<th style="text-align: right;">Lines of Code</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">C</td>
<td style="text-align: right;">775</td>
</tr>
<tr>
<td style="text-align: left;">Haskell</td>
<td style="text-align: right;">407</td>
</tr>
</tbody>
</table>
</div>
<p>The Haskell implementation is almost half the size of the C code. I don’t know about you but I’m perfectly happy with the half as small, half as fast tread-off.</p>
<p>The benchmark results for the <abbr title="Virtual Machine">VM</abbr>s become less surprising when I compare the C <code>interpret</code> function with the <abbr title="Glasgow Haskell Compiler">GHC</abbr> Core code for <code>interpretBytecode'</code><a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a>.</p>
<div class="sourceCode" id="cb21" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> interpret<span class="op">(</span><span class="dt">const</span> <span class="dt">uint8_t</span> <span class="op">*</span>bytecode<span class="op">,</span> <span class="dt">const</span> <span class="dt">long</span> bytecode_size<span class="op">,</span> <span class="dt">int16_t</span> <span class="op">*</span>result<span class="op">)</span> <span class="op">{</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> VM vm<span class="op">;</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> vm_init<span class="op">(&</span>vm<span class="op">);</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> <span class="op">(</span>vm<span class="op">.</span>ip <span class="op"><</span> bytecode_size<span class="op">)</span> <span class="op">{</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>vm<span class="op">.</span>sp <span class="op">>=</span> STACK_SIZE<span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> VM_ERROR_STACK_OVERFLOW<span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>vm<span class="op">.</span>sp <span class="op"><</span> <span class="dv">0</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> VM_ERROR_STACK_UNDERFLOW<span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> op <span class="op">=</span> bytecode<span class="op">[</span>vm<span class="op">.</span>ip<span class="op">];</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> <span class="co">// other checks</span></span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">switch</span> <span class="op">(</span>op<span class="op">)</span> <span class="op">{</span></span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_PUSH<span class="op">:</span> <span class="op">{</span></span>
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> byte1 <span class="op">=</span> bytecode<span class="op">[</span>vm<span class="op">.</span>ip <span class="op">+</span> <span class="dv">1</span><span class="op">];</span></span>
<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">uint8_t</span> byte2 <span class="op">=</span> bytecode<span class="op">[</span>vm<span class="op">.</span>ip <span class="op">+</span> <span class="dv">2</span><span class="op">];</span></span>
<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">const</span> <span class="dt">int16_t</span> value <span class="op">=</span> <span class="op">(</span><span class="dt">int16_t</span><span class="op">)((</span><span class="dt">uint16_t</span><span class="op">)</span>byte1 <span class="op">|</span> <span class="op">((</span><span class="dt">uint16_t</span><span class="op">)</span>byte2 <span class="op"><<</span> <span class="dv">8</span><span class="op">));</span></span>
<span id="cb21-17"><a href="#cb21-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-18"><a href="#cb21-18" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>stack<span class="op">[</span>vm<span class="op">.</span>sp<span class="op">]</span> <span class="op">=</span> value<span class="op">;</span></span>
<span id="cb21-19"><a href="#cb21-19" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>sp<span class="op">++;</span></span>
<span id="cb21-20"><a href="#cb21-20" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>ip <span class="op">+=</span> <span class="dv">3</span><span class="op">;</span></span>
<span id="cb21-21"><a href="#cb21-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span><span class="op">;</span></span>
<span id="cb21-22"><a href="#cb21-22" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-23"><a href="#cb21-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-24"><a href="#cb21-24" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_ADD<span class="op">:</span></span>
<span id="cb21-25"><a href="#cb21-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_SUB<span class="op">:</span></span>
<span id="cb21-26"><a href="#cb21-26" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_MUL<span class="op">:</span></span>
<span id="cb21-27"><a href="#cb21-27" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_DIV<span class="op">:</span> <span class="op">{</span></span>
<span id="cb21-28"><a href="#cb21-28" aria-hidden="true" tabindex="-1"></a> <span class="dt">int16_t</span> value1 <span class="op">=</span> vm<span class="op">.</span>stack<span class="op">[</span>vm<span class="op">.</span>sp <span class="op">-</span> <span class="dv">2</span><span class="op">];</span></span>
<span id="cb21-29"><a href="#cb21-29" aria-hidden="true" tabindex="-1"></a> <span class="dt">int16_t</span> value2 <span class="op">=</span> vm<span class="op">.</span>stack<span class="op">[</span>vm<span class="op">.</span>sp <span class="op">-</span> <span class="dv">1</span><span class="op">];</span></span>
<span id="cb21-30"><a href="#cb21-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-31"><a href="#cb21-31" aria-hidden="true" tabindex="-1"></a> <span class="dt">int16_t</span> result<span class="op">;</span></span>
<span id="cb21-32"><a href="#cb21-32" aria-hidden="true" tabindex="-1"></a> <span class="cf">switch</span> <span class="op">(</span>op<span class="op">)</span> <span class="op">{</span></span>
<span id="cb21-33"><a href="#cb21-33" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_ADD<span class="op">:</span> <span class="op">{</span> result <span class="op">=</span> value1 <span class="op">+</span> value2<span class="op">;</span> <span class="cf">break</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-34"><a href="#cb21-34" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_SUB<span class="op">:</span> <span class="op">{</span> result <span class="op">=</span> value1 <span class="op">-</span> value2<span class="op">;</span> <span class="cf">break</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-35"><a href="#cb21-35" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_MUL<span class="op">:</span> <span class="op">{</span> result <span class="op">=</span> value1 <span class="op">*</span> value2<span class="op">;</span> <span class="cf">break</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-36"><a href="#cb21-36" aria-hidden="true" tabindex="-1"></a> <span class="cf">case</span> OP_DIV<span class="op">:</span> <span class="op">{</span></span>
<span id="cb21-37"><a href="#cb21-37" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>value2 <span class="op">==</span> <span class="dv">0</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> VM_ERROR_DIVISION_BY_ZERO<span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-38"><a href="#cb21-38" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>value2 <span class="op">==</span> <span class="op">-</span><span class="dv">1</span> <span class="op">&&</span> value1 <span class="op">==</span> <span class="op">-</span><span class="dv">32768</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb21-39"><a href="#cb21-39" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> VM_ERROR_ARITHMETIC_OVERFLOW<span class="op">;</span></span>
<span id="cb21-40"><a href="#cb21-40" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-41"><a href="#cb21-41" aria-hidden="true" tabindex="-1"></a> result <span class="op">=</span> value1 <span class="op">/</span> value2<span class="op">;</span></span>
<span id="cb21-42"><a href="#cb21-42" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span><span class="op">;</span></span>
<span id="cb21-43"><a href="#cb21-43" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-44"><a href="#cb21-44" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-45"><a href="#cb21-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-46"><a href="#cb21-46" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>stack<span class="op">[</span>vm<span class="op">.</span>sp <span class="op">-</span> <span class="dv">2</span><span class="op">]</span> <span class="op">=</span> result<span class="op">;</span></span>
<span id="cb21-47"><a href="#cb21-47" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>sp<span class="op">--;</span></span>
<span id="cb21-48"><a href="#cb21-48" aria-hidden="true" tabindex="-1"></a> vm<span class="op">.</span>ip<span class="op">++;</span></span>
<span id="cb21-49"><a href="#cb21-49" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span><span class="op">;</span></span>
<span id="cb21-50"><a href="#cb21-50" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-51"><a href="#cb21-51" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... other cases ...</span></span>
<span id="cb21-52"><a href="#cb21-52" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-53"><a href="#cb21-53" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb21-54"><a href="#cb21-54" aria-hidden="true" tabindex="-1"></a> <span class="co">// ... final checks and return ...</span></span>
<span id="cb21-55"><a href="#cb21-55" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>This structure is almost a 1-to-1 match with the <abbr title="Glasgow Haskell Compiler">GHC</abbr> Core code we saw earlier. The C <code class="sourceCode c"><span class="cf">while</span></code> loop corresponds to the optimized <code>$wgo</code> function that <abbr title="Glasgow Haskell Compiler">GHC</abbr> generates, the <code class="sourceCode c"><span class="cf">switch</span></code> statement is almost identical to the <code class="sourceCode c"><span class="cf">case</span></code> analysis on the raw opcode byte, and the C stack array is equivalent to the <code class="sourceCode haskell"><span class="dt">MutableByteArray</span><span class="op">#</span></code> <abbr title="Glasgow Haskell Compiler">GHC</abbr> uses. <abbr title="Glasgow Haskell Compiler">GHC</abbr> effectively compiles our high-level Haskell into a low-level code that is structurally identical to what we wrote by hand in C<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a>.</p>
<p>This explains why the performance is in the same ballpark. The remaining performance gap is probably due to the thin layer of abstraction that the Haskell runtime still maintains, but it’s remarkable how close we can get to C-like performance.</p>
<h2 data-track-content data-content-name="future-directions" data-content-piece="arithmetic-bytecode-vm" id="future-directions">Future Directions</h2>
<p>While our Haskell program is fast, we can improve certain things:</p>
<ul>
<li><p><strong>Parser optimizations</strong>: As the benchmarks showed, parsing is our slowest pass. For better performance, we could replace our Attoparsec-based combinator parser with a parser generator like <a href="https://www.haskell.org/alex/" target="_blank" rel="noopener">Alex</a> and <a href="https://www.haskell.org/happy/" target="_blank" rel="noopener">Happy</a>, or even write a recursive-descent parser by hand.</p></li>
<li><p><strong>Superinstructions</strong>: We could analyze the bytecode for common instruction sequences (like <code class="sourceCode haskell"><span class="dt">OPush</span></code> followed by <code class="sourceCode haskell"><span class="dt">OAdd</span></code>) and combine them into single superinstructions. This would reduce the instruction dispatch overhead, but may make compilation slower.</p></li>
<li><p><strong>Register-based <abbr title="Virtual Machine">VM</abbr></strong>: A register-based <abbr title="Virtual Machine">VM</abbr>, which uses a small array of virtual registers instead of a memory-based stack, could significantly reduce memory traffic and improve performance. This would require a more complex compiler capable of register allocation.</p></li>
<li><p><strong>Just-in-Time (JIT) compilation</strong>: The ultimate performance boost could come from a <a href="https://en.wikipedia.org/wiki/JIT_compiler" target="_blank" rel="noopener"><abbr title="Just in time">JIT</abbr> compiler</a>. Instead of interpreting bytecode, we could compile it to native machine code at runtime, eliminating the interpreter entirely. Maybe we could use <a href="https://web.archive.org/web/20251021/https://llvm.org/" target="_blank" rel="noopener">LLVM</a> to build a <abbr title="Just in time">JIT</abbr> compiler in Haskell.</p></li>
</ul>
<h2 data-track-content data-content-name="conclusion" data-content-piece="arithmetic-bytecode-vm" id="conclusion">Conclusion</h2>
<p>And that’s a wrap! We successfully built a bytecode compiler and virtual machine in Haskell. We covered parsing, <abbr title="Abstract Syntax Tree">AST</abbr> interpretation, compilation, and bytecode execution, as well as, debugging and testing functionalities. Let’s update our checklist:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#the-compiler">Compiling ASTs to bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#the-decompiler">Disassembling and decompiling bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#testing-the-compiler">Unit testing for our compiler.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#testing-the-compiler">Property-based testing for our compiler.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#the-virtual-machine">Efficiently executing bytecode in a virtual machine.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#testing-the-vm">Unit testing and property-based testing for our VM.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#benchmarking-the-vm">Benchmarking our code to see how the different passes perform.</a></label></li>
<li><label><input type="checkbox" checked></input>All the while <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#compiling-fast-and-slow">keeping</a> <a href="#peeking-under-the-hood-ghc-core">an eye</a> on performance.</label></li>
</ul>
<p>The journey from a simple <abbr title="Abstract Syntax Tree">AST</abbr> interpreter to a bytecode <abbr title="Virtual Machine">VM</abbr> has been a rewarding one. We saw a significant performance improvement, learned about how compilers and <abbr title="Virtual Machine">VM</abbr>s work, and how to write performant code in Haskell. While our Haskell implementation isn’t as fast as the hand-written C version, it’s far more concise and, I would argue, easier to reason about. It’s a great demonstration of Haskell’s power for writing high-performance—yet safe and elegant—code.</p>
<p>See the full code at:</p>
<ul>
<li><a href="https://abhinavsarkar.net/code/ArithVMLib.html?mtm_campaign=feed">ArithVMLib.hs</a></li>
<li><a href="https://abhinavsarkar.net/code/ArithVMSpec.html?mtm_campaign=feed">ArithVMSpec.hs</a></li>
<li><a href="https://abhinavsarkar.net/code/ArithVMBench.html?mtm_campaign=feed">ArithVMBench.hs</a></li>
<li><a href="https://abhinavsarkar.net/code/ArithVMApp.html?mtm_campaign=feed">ArithVMApp.hs</a></li>
<li><a href="https://abhinavsarkar.net/code/arithvm.html?mtm_campaign=feed">arithvm.c</a></li>
</ul>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Actually, QuickCheck does not generate entirely arbitrary inputs. It generates arbitrary inputs with increasing complexity—where the complexity is defined by the user—and asserts the properties on these inputs. When a test fails for a particular input, QuickCheck also tries to simplify the culprit and tries to find the simplest input for which the test fails. This process is called <em>Shrinking</em> in QuickCheck parlance. QuickCheck then shows this simplest input to the user for them to use it to debug their code.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Read this good <a href="https://jesper.sikanda.be/posts/quickcheck-intro.html" target="_blank" rel="noopener">introduction to QuickCheck</a> if you are unfamiliar.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Notice that we discard the expressions that do not compile successfully.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p><code>sp</code> and <code>ip</code> are not actual pointers, but indices into the stack and bytecode arrays respectively.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>Guided by the <abbr title="Glasgow Haskell Compiler">GHC</abbr> profiler, I tweaked the code in many different ways and ran benchmarks for every change. Then I chose the code that was most performant.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>It is extremely important to write good tests before getting your hands dirty with performance optimizations. In my case, the tests saved me many times from breaking the VM while moving code around for performance.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>We are using our <abbr title="Abstract Syntax Tree">AST</abbr> interpreter as a definitional interpreter, assuming it to be correctly implemented because of its simpler nature.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file generated using the expression generator we wrote earlier.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>I don’t claim to be a great or even a good C programmer. In fact, this C <abbr title="Virtual Machine">VM</abbr> is the first substantial C code I have written in decades. I’m sure the code is not most optimized. It may even be ridden with memory management bugs. If you find something wrong, please let me know in the comments.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>I tried various RTS options to tweak <abbr title="Glasgow Haskell Compiler">GHC</abbr> garbage collection, but the defaults proved to be fastest.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>The lines of code are for only the overlapping functionalities between C and Haskell versions.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>I did try using <em><a href="https://en.wikipedia.org/wiki/Threaded_code#Direct_threading" target="_blank" rel="noopener">Direct Threading</a></em> and <em><a href="https://en.wikipedia.org/wiki/Threaded_code#Subroutine_threading" target="_blank" rel="noopener">Subroutine Threading</a></em> in the C code, but they resulted in slower code than the switch-case variant. GCC may be smart enough in case of this simple <abbr title="Virtual Machine">VM</abbr> to optimize the switch-case to be faster than threaded code.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn13"><p>You may have noticed that the C <code>interpret</code> function is not laid out in the exact same manner as the Haskell <code>interpretBytecode'.go</code> function. In case of C, moving the checks to the front did not yield in performance improvement. I suspect this may be because GCC is smart enough to do that optimization by itself. The nested <code class="sourceCode c"><span class="cf">switch</span></code> were also no detriment to the performance of the C code.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">The Compiler</a>
</li>
<li>
<strong>The Virtual Machine</strong> 👈
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2025-10-21T00:00:00Z <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/#the-compiler">Compiling ASTs to bytecode.</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/#the-decompiler">Disassembling and decompiling bytecode.</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/#testing-the-compiler">Unit testing for our compiler.</a></label></li>
<li><label><input type="checkbox" /><span class="todo">Property-based testing for our compiler.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Efficiently executing bytecode in a virtual machine (VM).</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Benchmarking our code to see how the different passes perform.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">All the while keeping an eye on performance.</span></label></li>
</ul>
<p>In this final post, we write the virtual machine that executes our bytecode, and benchmark it.</p>
https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/ A Fast Bytecode VM for Arithmetic: The Compiler 2025-08-24T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox"></input><span class="todo">Compiling ASTs to bytecode.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Disassembling and decompiling bytecode.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Unit testing for our compiler.</span></label></li>
<li><label><input type="checkbox"></input>Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox"></input>Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox"></input>Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox"></input>All the while keeping an eye on performance.</label></li>
</ul>
<p>In this post, we write the compiler for our <abbr title="Abstract Syntax Tree">AST</abbr> to bytecode, and a decompiler for the bytecode.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<strong>The Compiler</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">The Virtual Machine</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#the-bytecode">The Bytecode</a><ol><li><a href="#num"><code class="sourceCode haskell"><span class="dt">Num</span></code></a></li><li><a href="#binop"><code class="sourceCode haskell"><span class="dt">BinOp</span></code></a></li><li><a href="#var-and-let"><code class="sourceCode haskell"><span class="dt">Var</span></code> and
<code class="sourceCode haskell"><span class="dt">Let</span></code></a></li></ol></li><li><a href="#the-compiler">The Compiler</a><ol><li><a href="#compiling-fast-and-slow">Compiling, Fast and Slow</a></li></ol></li><li><a href="#the-decompiler">The Decompiler</a></li><li><a href="#testing-the-compiler">Testing the Compiler</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="arithmetic-bytecode-vm-compiler" id="introduction">Introduction</h2>
<p><abbr title="Abstract Syntax Tree">AST</abbr> interpreters are well known to be slow because of how <abbr title="Abstract Syntax Tree">AST</abbr> nodes are represented in the computer’s memory. The <abbr title="Abstract Syntax Tree">AST</abbr> nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an <abbr title="Abstract Syntax Tree">AST</abbr>, the interpreter jumps all over the memory, causing a slowdown. One solution to this is to convert the <abbr title="Abstract Syntax Tree">AST</abbr> into a more compact and optimized representation known as <em><a href="https://en.wikipedia.org/wiki/Bytecode" target="_blank" rel="noopener">Bytecode</a></em>.</p>
<p>Bytecode is a flattened and compact representation of a program, usually manifested as a byte array. Bytecode is essentially an <em><a href="https://en.wikipedia.org/wiki/Instruction_Set" target="_blank" rel="noopener">Instruction Set</a></em> (IS), but custom-made to be executed by a <em><a href="https://en.wikipedia.org/wiki/Virtual_machine#Process_virtual_machines" target="_blank" rel="noopener">Virtual Machine</a></em> (VM), instead of a physical machine. Each bytecode instruction is one byte in size (that’s where it gets its name from). A bytecode and its <abbr title="Virtual Machine">VM</abbr> are created in synergy so that the execution is as efficient as possible<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Compiling source code to bytecode and executing it in a <abbr title="Virtual Machine">VM</abbr> also allows the program to be run on all platforms that the <abbr title="Virtual Machine">VM</abbr> supports without the developer caring much about portability concerns. The most popular combo of bytecode and <abbr title="Virtual Machine">VM</abbr> is probably the <a href="https://en.wikipedia.org/wiki/Java_bytecode" target="_blank" rel="noopener">Java bytecode</a> and the <a href="https://en.wikipedia.org/wiki/Java_virtual_machine" target="_blank" rel="noopener">Java virtual machine</a>.</p>
<p>The <abbr title="Virtual Machine">VM</abbr>s can be <a href="https://en.wikipedia.org/wiki/Stack_machine" target="_blank" rel="noopener">stack-based</a> or <a href="https://en.wikipedia.org/wiki/Register_machine" target="_blank" rel="noopener">register-based</a>. In a stack-based <abbr title="Virtual Machine">VM</abbr>, all values created during the execution of a program are stored only in a <em><a href="https://en.wikipedia.org/wiki/Stack_(abstract_data_type)" target="_blank" rel="noopener">Stack</a></em> data-structure residing in the memory. Whereas, in a register-based <abbr title="Virtual Machine">VM</abbr>, there is also an additional set of fixed number of <a href="https://en.wikipedia.org/wiki/Processor_register" target="_blank" rel="noopener">registers</a> that are used to store values in preference to the stack<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. Register-based <abbr title="Virtual Machine">VM</abbr>s are usually faster, but stack-based <abbr title="Virtual Machine">VM</abbr>s are usually simpler to implement. For our purpose, we choose to implement a stack-based <abbr title="Virtual Machine">VM</abbr>.</p>
<p>We are going to write a compiler that compiles our expression <abbr title="Abstract Syntax Tree">AST</abbr> to bytecode. But first, let’s design the bytecode for our stack-based <abbr title="Virtual Machine">VM</abbr>.</p>
<h2 data-track-content data-content-name="the-bytecode" data-content-piece="arithmetic-bytecode-vm-compiler" id="the-bytecode">The Bytecode</h2>
<p>Here is our expression AST as a reminder:</p>
<figure>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Num</span> <span class="op">!</span><span class="dt">Int16</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Var</span> <span class="op">!</span><span class="dt">Ident</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BinOp</span> <span class="op">!</span><span class="dt">Op</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Let</span> <span class="op">!</span><span class="dt">Ident</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Generic</span>)</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Ident</span> <span class="ot">=</span> <span class="dt">Ident</span> <span class="dt">BS.ByteString</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>, <span class="dt">Generic</span>, <span class="dt">Hashable</span>)</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Op</span> <span class="ot">=</span> <span class="dt">Add</span> <span class="op">|</span> <span class="dt">Sub</span> <span class="op">|</span> <span class="dt">Mul</span> <span class="op">|</span> <span class="dt">Div</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Enum</span>, <span class="dt">Generic</span>)</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>Let’s figure out the right bytecode for each case. First, we create <em><a href="https://en.wikipedia.org/wiki/Opcodes" target="_blank" rel="noopener">Opcodes</a></em> for each bytecode, which are sort of mnemonics for actual bytecode. Think of them as <em><a href="https://en.wikipedia.org/wiki/Assembly_language" target="_blank" rel="noopener">Assembly</a></em> is to <em><a href="https://en.wikipedia.org/wiki/Machine_Code" target="_blank" rel="noopener">Machine Code</a></em>.</p>
<h3 id="num"><code class="sourceCode haskell"><span class="dt">Num</span></code></h3>
<p>For a number literal, we need to put it directly in the bytecode so that we can use it later during the execution. We also need an opcode to push it on the stack. Let’s call it <code class="sourceCode haskell"><span class="dt">OPush</span></code> with an <code class="sourceCode haskell"><span class="dt">Int16</span></code> parameter.</p>
<h3 id="binop"><code class="sourceCode haskell"><span class="dt">BinOp</span></code></h3>
<p>Binary operations recursively use <code class="sourceCode haskell"><span class="dt">Expr</span></code> for their operands. To evaluate a binary operation, we need its operands to be evaluated before, so we compile them first to bytecode. After that, all we need is an opcode per operator. Let’s call them <code class="sourceCode haskell"><span class="dt">OAdd</span></code>, <code class="sourceCode haskell"><span class="dt">OSub</span></code>, <code class="sourceCode haskell"><span class="dt">OMul</span></code>, and <code class="sourceCode haskell"><span class="dt">ODiv</span></code> for <code class="sourceCode haskell"><span class="dt">Add</span></code>, <code class="sourceCode haskell"><span class="dt">Sub</span></code>, <code class="sourceCode haskell"><span class="dt">Mul</span></code>, and <code class="sourceCode haskell"><span class="dt">Div</span></code> operators respectively.</p>
<h3 id="var-and-let"><code class="sourceCode haskell"><span class="dt">Var</span></code> and <code class="sourceCode haskell"><span class="dt">Let</span></code></h3>
<p>Variables and <code class="sourceCode haskell"><span class="dt">Let</span></code> expressions are more complex<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. In the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter we chucked the variables in a map, but we cannot do that in a <abbr title="Virtual Machine">VM</abbr>. There is no environment map in a <abbr title="Virtual Machine">VM</abbr>, and all values must reside in the stack. How do we have variables at all then? Let’s think for a bit.</p>
<p>Each expression, after being evaluated in the <abbr title="Virtual Machine">VM</abbr>, must push exactly one value on the stack: its result. <code class="sourceCode haskell"><span class="dt">Num</span></code> expressions are a trivial case. When a binary operation is evaluated, first its left operand is evaluated. That pushes one value on the stack. Then its right operand is evaluated, and that pushes another value on the stack. Finally, the operation pops the two values from the top of the stack, does its thing, and pushes the resultant value back on the stack—again one value for the entire <code class="sourceCode haskell"><span class="dt">BinOp</span></code> expression.</p>
<p>A <code class="sourceCode haskell"><span class="dt">Let</span></code> expression binds a variable’s value to its name, and then the variable can be referred from the body of the expression. But how can we refer to a variable when the stack contains only values, not names? Let’s imagine that we are in middle of evaluating a large expression, wherein we encounter a <code class="sourceCode haskell"><span class="dt">Let</span></code> expression. First we evaluate its assignment expression, and that pushes a value on the top of the stack. Let’s say that the stack has <code>n</code> values at this point. After this we get to evaluate the body expression. At all times when we are doing that, the value from assignment stays at the same point in the stack because evaluating sub-expressions, no matter how complicated, only adds new values to the stack, without popping an existing value from before. Therefore, we can use the stack index of the assignment value (<code>n−1</code>) to refer to it from within the body expression. So, we encode <code class="sourceCode haskell"><span class="dt">Var</span></code> as an opcode and an integer index into the stack.</p>
<p>We choose to use a <code class="sourceCode haskell"><span class="dt">Word8</span></code> to index the stack, limiting us to a stack depth of 256. We encode the variable references with an opcode <code class="sourceCode haskell"><span class="dt">OGet</span></code>, which when executed gets the value from the stack at the given index and pushes it on the stack.</p>
<p>For a <code class="sourceCode haskell"><span class="dt">Let</span></code> expression, after we compile its assignment and body expressions, we need to make sure that the exactly-one-value invariant holds. Evaluating the assignment and body pushes two values on the stack, but we can have only one! So we overwrite the assignment value with the body value, and pop the stack to remove the body value. We invent a new opcode <code class="sourceCode haskell"><span class="dt">OSwapPop</span></code> to do this, called so because its effect is equivalent to swapping the topmost two values on the stack, and then popping the new top value<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<p>Putting all the opcodes together, we have the <code class="sourceCode haskell"><span class="dt">Opcode</span></code> ADT:</p>
<figure>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Opcode</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">OPush</span> <span class="op">!</span><span class="dt">Int16</span> <span class="co">-- 0</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OGet</span> <span class="op">!</span><span class="dt">Word8</span> <span class="co">-- 1</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OSwapPop</span> <span class="co">-- 2</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OAdd</span> <span class="co">-- 3</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OSub</span> <span class="co">-- 4</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OMul</span> <span class="co">-- 5</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ODiv</span> <span class="co">-- 6</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Read</span>, <span class="dt">Eq</span>, <span class="dt">Generic</span>)</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Opcode</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>Notice that we also assigned bytecodes—that is, a unique byte value—to each <code class="sourceCode haskell"><span class="dt">Opcode</span></code> above, which are just their ordinals. Now we are ready to write the compiler.</p>
<h2 data-track-content data-content-name="the-compiler" data-content-piece="arithmetic-bytecode-vm-compiler" id="the-compiler">The Compiler</h2>
<p>The compiler takes an expression with the bytecode size, and compiles it to a strict <a href="https://hackage.haskell.org/package/bytestring/docs/Data-ByteString.html#t:ByteString" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ByteString</span></code></a> of that size. Recall that in <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">the previous post</a>, we wrote our parser such that the bytecode size for each <abbr title="Abstract Syntax Tree">AST</abbr> node was calculated while parsing it. This allows us to pre-allocate a bytestring of required size before compiling the <abbr title="Abstract Syntax Tree">AST</abbr>. We compile to actual bytes here, and don’t use the opcodes.</p>
<figure>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Bytecode</span> <span class="ot">=</span> <span class="dt">BS.ByteString</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="ot">compile ::</span> <span class="dt">SizedExpr</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Bytecode</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>compile <span class="ot">=</span> compile' defaultStackSize</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="ot">compile' ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">SizedExpr</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Bytecode</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>compile' stackSize (expr, bytecodeSize) <span class="ot">=</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">uncurry</span> (<span class="fu">fmap</span> <span class="op">.</span> <span class="fu">const</span>) <span class="op">.</span> BSI.unsafeCreateUptoN' bytecodeSize <span class="op">$</span> \fp <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> (bytecodeSize,)</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> <span class="fu">fmap</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> (compileIO bytecodeSize stackSize fp fp expr <span class="op">>>=</span> checkSize fp <span class="op">.</span> TS.fst)</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> <span class="ot">`catch`</span> (<span class="fu">pure</span> <span class="op">.</span> <span class="dt">Left</span>)</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> checkSize fp ip <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> actualBytecodeSize <span class="ot">=</span> ip <span class="ot">`minusPtr`</span> fp</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> unless (actualBytecodeSize <span class="op">==</span> bytecodeSize) <span class="op">$</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> throwIO <span class="op">.</span> <span class="dt">Error</span> <span class="dt">Compile</span> <span class="op">$</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> <span class="st">"Compiled bytecode size "</span> <span class="op"><></span> <span class="fu">show</span> actualBytecodeSize</span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">" is not same as expected size: "</span> <span class="op"><></span> <span class="fu">show</span> bytecodeSize</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="ot">compileIO ::</span></span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Ptr</span> <span class="dt">Word8</span> <span class="ot">-></span> <span class="dt">Ptr</span> <span class="dt">Word8</span> <span class="ot">-></span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">IO</span> (<span class="dt">Pair</span> (<span class="dt">Ptr</span> <span class="dt">Word8</span>) <span class="dt">Int</span>)</span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a>compileIO bytecodeSize stackSize fp ip <span class="ot">=</span> go Map.empty <span class="dv">0</span> ip</span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> ep <span class="ot">=</span> fp <span class="ot">`plusPtr`</span> bytecodeSize</span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a> go env <span class="op">!</span>sp <span class="op">!</span>ip <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> n <span class="op">|</span> sp <span class="op">+</span> <span class="dv">1</span> <span class="op"><=</span> stackSize <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> <span class="op">!</span>lb <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> n <span class="op">.&.</span> <span class="bn">0xff</span></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>mb <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> ((<span class="fu">fromIntegral</span><span class="ot"> n ::</span> <span class="dt">Word16</span>) <span class="op">.&.</span> <span class="bn">0xff00</span>) <span class="ot">`shiftR`</span> <span class="dv">8</span></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> writeByte ip <span class="dv">0</span> <span class="co">-- OPush</span></span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> writeByte (ip <span class="ot">`plusPtr`</span> <span class="dv">1</span>) lb</span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> writeByte (ip <span class="ot">`plusPtr`</span> <span class="dv">2</span>) mb</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> (ip <span class="ot">`plusPtr`</span> <span class="dv">3</span> <span class="op">:!:</span> sp <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> _ <span class="ot">-></span> throwCompileError <span class="st">"Stack overflow"</span></span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a> <span class="dt">BinOp</span> op a b <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a> (ip' <span class="op">:!:</span> sp') <span class="ot"><-</span> go env sp ip a</span>
<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a> (ip'' <span class="op">:!:</span> sp'') <span class="ot"><-</span> go env sp' ip' b</span>
<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a> writeByte ip'' <span class="op">$</span> translateOp op</span>
<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> (ip'' <span class="ot">`plusPtr`</span> <span class="dv">1</span> <span class="op">:!:</span> sp'' <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a> <span class="dt">Let</span> x assign body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a> (ip' <span class="op">:!:</span> sp') <span class="ot"><-</span> go env sp ip assign</span>
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a> (ip'' <span class="op">:!:</span> sp'') <span class="ot"><-</span> go (Map.insert x sp env) sp' ip' body</span>
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a> writeByte ip'' <span class="dv">2</span> <span class="co">-- OSwapPop</span></span>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> (ip'' <span class="ot">`plusPtr`</span> <span class="dv">1</span> <span class="op">:!:</span> sp'' <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a> <span class="dt">Var</span> x <span class="op">|</span> sp <span class="op">+</span> <span class="dv">1</span> <span class="op"><=</span> stackSize <span class="ot">-></span> <span class="kw">case</span> Map.lookup x env <span class="kw">of</span></span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> throwCompileError <span class="op">$</span> <span class="st">"Unknown variable: "</span> <span class="op"><></span> <span class="fu">show</span> x</span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> varScope</span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> varScope <span class="op"><</span> stackSize <span class="op">&&</span> varScope <span class="op"><</span> <span class="fu">fromIntegral</span> (<span class="fu">maxBound</span> <span class="op">@</span><span class="dt">Word8</span>) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-51"><a href="#cb3-51" aria-hidden="true" tabindex="-1"></a> writeByte ip <span class="dv">1</span> <span class="co">-- OGet</span></span>
<span id="cb3-52"><a href="#cb3-52" aria-hidden="true" tabindex="-1"></a> writeByte (ip <span class="ot">`plusPtr`</span> <span class="dv">1</span>) <span class="op">$</span> <span class="fu">fromIntegral</span> varScope</span>
<span id="cb3-53"><a href="#cb3-53" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> (ip <span class="ot">`plusPtr`</span> <span class="dv">2</span> <span class="op">:!:</span> sp <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb3-54"><a href="#cb3-54" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> _ <span class="ot">-></span> throwCompileError <span class="st">"Stack overflow"</span></span>
<span id="cb3-55"><a href="#cb3-55" aria-hidden="true" tabindex="-1"></a> <span class="dt">Var</span> _ <span class="ot">-></span> throwCompileError <span class="st">"Stack overflow"</span></span>
<span id="cb3-56"><a href="#cb3-56" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-57"><a href="#cb3-57" aria-hidden="true" tabindex="-1"></a><span class="ot"> writeByte ::</span> <span class="dt">Ptr</span> <span class="dt">Word8</span> <span class="ot">-></span> <span class="dt">Word8</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb3-58"><a href="#cb3-58" aria-hidden="true" tabindex="-1"></a> writeByte <span class="op">!</span>ip <span class="op">!</span>val</span>
<span id="cb3-59"><a href="#cb3-59" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> ip <span class="op"><</span> ep <span class="ot">=</span> poke ip val</span>
<span id="cb3-60"><a href="#cb3-60" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> throwCompileError <span class="op">$</span></span>
<span id="cb3-61"><a href="#cb3-61" aria-hidden="true" tabindex="-1"></a> <span class="st">"Instruction index "</span> <span class="op"><></span> <span class="fu">show</span> (ip <span class="ot">`minusPtr`</span> fp)</span>
<span id="cb3-62"><a href="#cb3-62" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">" out of bound "</span> <span class="op"><></span> <span class="fu">show</span> (bytecodeSize <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb3-63"><a href="#cb3-63" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-64"><a href="#cb3-64" aria-hidden="true" tabindex="-1"></a> translateOp <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb3-65"><a href="#cb3-65" aria-hidden="true" tabindex="-1"></a> <span class="dt">Add</span> <span class="ot">-></span> <span class="dv">3</span> <span class="co">-- OAdd</span></span>
<span id="cb3-66"><a href="#cb3-66" aria-hidden="true" tabindex="-1"></a> <span class="dt">Sub</span> <span class="ot">-></span> <span class="dv">4</span> <span class="co">-- OSub</span></span>
<span id="cb3-67"><a href="#cb3-67" aria-hidden="true" tabindex="-1"></a> <span class="dt">Mul</span> <span class="ot">-></span> <span class="dv">5</span> <span class="co">-- OMul</span></span>
<span id="cb3-68"><a href="#cb3-68" aria-hidden="true" tabindex="-1"></a> <span class="dt">Div</span> <span class="ot">-></span> <span class="dv">6</span> <span class="co">-- ODiv</span></span>
<span id="cb3-69"><a href="#cb3-69" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-70"><a href="#cb3-70" aria-hidden="true" tabindex="-1"></a> throwCompileError <span class="ot">=</span> throwIO <span class="op">.</span> <span class="dt">Error</span> <span class="dt">Compile</span></span>
<span id="cb3-71"><a href="#cb3-71" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-72"><a href="#cb3-72" aria-hidden="true" tabindex="-1"></a><span class="ot">defaultStackSize ::</span> <span class="dt">Int</span></span>
<span id="cb3-73"><a href="#cb3-73" aria-hidden="true" tabindex="-1"></a>defaultStackSize <span class="ot">=</span> <span class="dv">256</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>We use the <code>unsafeCreateUptoN'</code> function from the <a href="https://hackage.haskell.org/package/bytestring/docs/Data-ByteString-Internal.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.ByteString.Internal</span></code></a> module that allocates enough memory for the provided bytecode size, and gives us a pointer to the allocated memory. We call this pointer <code>fp</code> for <a href="https://en.wikipedia.org/wiki/frame_pointer" target="_blank" rel="noopener">frame pointer</a>. Then we traverse the <abbr title="Abstract Syntax Tree">AST</abbr> recursively, writing bytes for opcodes and arguments for each case. We use pointer arithmetic and the <a href="https://hackage.haskell.org/package/base/docs/Foreign-Storable.html#v:poke" target="_blank" rel="noopener"><code>poke</code></a> function to write the bytes. <code class="sourceCode haskell"><span class="dt">Int16</span></code> numbers are encoded as two bytes in <a href="https://en.wikipedia.org/wiki/little_endian" target="_blank" rel="noopener">little endian</a> fashion.</p>
<p>In the recursive traversal function <code>go</code>, we pass and return the current stack pointer <code>sp</code> and instruction pointer <code>ip</code>. We update these correctly for each case<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>. We also take care of checking that the pointers stay in the right bounds, failing which we throw appropriate errors.</p>
<p>We also pass an <code>env</code> parameter that is similar to the variable names to values environment we use in the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter, but this one tracks variable names to stack indices at which they reside. We update this information before compiling the body of a <code class="sourceCode haskell"><span class="dt">Let</span></code> expression to capture the stack index of its assignment value. When compiling a <code class="sourceCode haskell"><span class="dt">Var</span></code> expression, we use the <code>env</code> map to lookup the variable’s stack index, and encode it in the bytecode.</p>
<p>At the end of compilation, we check that the entire bytestring is filled with bytes till the very end, failing which we throw an error. This check is required because otherwise the bytestring may have garbage bytes, and may fail inexplicably during execution.</p>
<p>All the errors are thrown in the <code>IO</code> monad using the <a href="https://hackage.haskell.org/package/base/docs/Control-Exception.html#v:throwIO" target="_blank" rel="noopener"><code>throwIO</code></a> function, and are caught after compilation using the <a href="https://hackage.haskell.org/package/base/docs/Control-Exception.html#v:catch" target="_blank" rel="noopener"><code>catch</code></a> function. The final result or error is returned wrapped into <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#error-handling"><code class="sourceCode haskell"><span class="dt">Result</span></code></a>.</p>
<p>Let’s see it in action:</p>
<pre class="plain"><code>$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | hexdump -C
00000000 00 01 00 00 02 00 03 00 03 00 00 04 00 05 04 |...............|
0000000f</code></pre>
<pre class="plain"><code>$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | hexdump -C
00000000 00 04 00 00 05 00 01 00 01 01 03 02 02 |.............|
0000000d</code></pre>
<p>You can verify that the resultant bytes are indeed correct. I assume that it is difficult for you to read raw bytes. We’ll fix this in a minute. Meanwhile, let’s ponder upon some performance characteristics of our compiler.</p>
<h3 id="compiling-fast-and-slow">Compiling, Fast and Slow</h3>
<p>You may be wondering why I chose to write the compiler in this somewhat convoluted way of pre-allocating a bytestring and using pointers. The answer is: performance. I didn’t actually start with pointers. I iterated through many different data and control structures to find the fastest one.</p>
<p>The table below shows the compilation times for a benchmark expression file when using different data structures to implement the <code>compileIO</code> function:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Data structure</th>
<th style="text-align: right;">Time (ms)</th>
<th style="text-align: right;">Incremental speedup</th>
<th style="text-align: right;">Overall speedup</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">List</td>
<td style="text-align: right;">4345</td>
<td style="text-align: right;">1x</td>
<td style="text-align: right;">1x</td>
</tr>
<tr>
<td style="text-align: left;">Seq</td>
<td style="text-align: right;">523</td>
<td style="text-align: right;">8.31x</td>
<td style="text-align: right;">8.31x</td>
</tr>
<tr>
<td style="text-align: left;">DList</td>
<td style="text-align: right;">486</td>
<td style="text-align: right;">1.08x</td>
<td style="text-align: right;">8.94x</td>
</tr>
<tr>
<td style="text-align: left;">BS Builder</td>
<td style="text-align: right;">370</td>
<td style="text-align: right;">1.31x</td>
<td style="text-align: right;">11.74x</td>
</tr>
<tr>
<td style="text-align: left;">Pre-allocated BS</td>
<td style="text-align: right;">54</td>
<td style="text-align: right;">6.85x</td>
<td style="text-align: right;">80.46x</td>
</tr>
<tr>
<td style="text-align: left;">Bytearray</td>
<td style="text-align: right;">52</td>
<td style="text-align: right;">1.02x</td>
<td style="text-align: right;">83.55x</td>
</tr>
</tbody>
</table>
</div>
<p>I started with the bread-and-butter data structure of Haskellers, the humble and known to be slow <a href="https://hackage.haskell.org/package/base/docs/Data-List.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">List</span></code></a>, which was indeed quite slow. Next, I moved on to <a href="https://hackage.haskell.org/package/containers/docs/Data-Sequence.html#t:Seq" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Seq</span></code></a> and thereafter <a href="https://hackage.haskell.org/package/dlist/docs/Data-DList.html#t:DList" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">DList</span></code></a>, which are known to be faster at concatenation/consing. Then I abandoned the use of intermediate data structures completely, choosing to use a bytestring <a href="https://hackage.haskell.org/package/bytestring/docs/Data-ByteString-Builder.html#t:Builder" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Builder</span></code></a> to create the bytestring. Finally, I had the epiphany that the bytestring size was known at compile time, and rewrote the function to pre-allocate the bytestring, thereby reaching the fastest solution.</p>
<p>I also tried using <a href="https://hackage.haskell.org/package/primitive/docs/Data-Primitive-ByteArray.html#t:ByteArra" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Bytearray</span></code></a>, which has more-or-less same performance of bytestring, but it is inconvenient to use because there are no functions to do IO with bytearrays. So I’d anyway need to use bytestrings for reading from STDIN or writing to STDOUT, and converting to-and-fro between bytearray and bytestring is a performance killer. Thus, I decided to stick to bytestrings.</p>
<p>The pre-allocated bytestring approach is 80 times faster than using lists, and almost 10 times faster than using <code class="sourceCode haskell"><span class="dt">Seq</span></code>. For such gain, I’m okay with the complications it brings to the code. Here are the numbers in a chart (smaller is better):</p>
<figure class="w-100pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot5432109677417900652.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot5432109677417900652.svg" alt="Compilation time using different data-structures"></img>
<noscript><img src="/images/plots/pandocplot5432109677417900652.svg" class="w-100pct" alt="Compilation time using different data-structures"></img></noscript></a>
<figcaption>Compilation time using different data-structures</figcaption>
</figure>
<p>The other important data structure used here is the map (or dictionary) in which we add the mappings from identifiers to their stack indices. This data structure needs to be performant because we do a lookup for each variable we encounter while compiling. I benchmarked compilation for some data structures<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Data structure</th>
<th style="text-align: right;">Time (ms)</th>
<th style="text-align: right;">Slowdown</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/unordered-containers/docs/Data-HashMap-Strict.html#t:HashMap" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.HashMap.Strict.HashMap</span></code></a></td>
<td style="text-align: right;">55</td>
<td style="text-align: right;">1.00x</td>
</tr>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/base/docs/Data-List.html#t:List" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.List.List</span></code></a><a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a></td>
<td style="text-align: right;">63</td>
<td style="text-align: right;">1.14x</td>
</tr>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html#t:Map" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Map.Strict.Map</span></code></a></td>
<td style="text-align: right;">71</td>
<td style="text-align: right;">1.29x</td>
</tr>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/bytestring-trie/docs/Data-Trie.html#t:Trie" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Trie.Trie</span></code></a></td>
<td style="text-align: right;">80</td>
<td style="text-align: right;">1.45x</td>
</tr>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/vector-hashtables/docs/Data-Vector-Hashtables.html#t:Dictionary" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Vector.Hashtables.Dictionary</span></code></a></td>
<td style="text-align: right;">104</td>
<td style="text-align: right;">1.89x</td>
</tr>
<tr>
<td style="text-align: left;"><a href="https://hackage.haskell.org/package/hashtables/docs/Data-HashTable-IO.html#t:BasicHashTable" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.HashTable.IO.BasicHashTable</span></code></a></td>
<td style="text-align: right;">312</td>
<td style="text-align: right;">5.67x</td>
</tr>
</tbody>
</table>
</div>
<p>Strict hashmap turns out to be the fasted one, but interestingly, linked list is a close second. Mutable hashtable is the slowest even though I expected it to be the fastest. Here are the times in a chart (smaller is better):</p>
<figure class="w-100pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot15511258508283734502.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot15511258508283734502.svg" alt="Compilation time using different map data-structures"></img>
<noscript><img src="/images/plots/pandocplot15511258508283734502.svg" class="w-100pct" alt="Compilation time using different map data-structures"></img></noscript></a>
<figcaption>Compilation time using different map data-structures</figcaption>
</figure>
<p>Another choice I had to make was how to write the <code>go</code> function. I ended up passing and returning pointers and environment map, and throwing errors in <code class="sourceCode haskell"><span class="dt">IO</span></code>, but a number of solutions are possible. I tried out some of them, and noted the compilation times for the benchmark expression file:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Control structure</th>
<th style="text-align: right;">Time (ms)</th>
<th style="text-align: right;">Slowdown</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">IO</td>
<td style="text-align: right;">57.4</td>
<td style="text-align: right;">1.00x</td>
</tr>
<tr>
<td style="text-align: left;">IO + IORef</td>
<td style="text-align: right;">65.0</td>
<td style="text-align: right;">1.13x</td>
</tr>
<tr>
<td style="text-align: left;">IO + ReaderT</td>
<td style="text-align: right;">60.9</td>
<td style="text-align: right;">1.06x</td>
</tr>
<tr>
<td style="text-align: left;">IO + StateT</td>
<td style="text-align: right;">65.6</td>
<td style="text-align: right;">1.14x</td>
</tr>
<tr>
<td style="text-align: left;">IO + ExceptT</td>
<td style="text-align: right;">65.9</td>
<td style="text-align: right;">1.15x</td>
</tr>
<tr>
<td style="text-align: left;">IO + ReaderT + ExceptT</td>
<td style="text-align: right;">107.1</td>
<td style="text-align: right;">1.87x</td>
</tr>
<tr>
<td style="text-align: left;">IO + StateT + ExceptT</td>
<td style="text-align: right;">383.9</td>
<td style="text-align: right;">6.69x</td>
</tr>
<tr>
<td style="text-align: left;">IO + StateT + ReaderT</td>
<td style="text-align: right;">687.5</td>
<td style="text-align: right;">11.98x</td>
</tr>
<tr>
<td style="text-align: left;">IO + StateT + ReaderT + ExceptT</td>
<td style="text-align: right;">702.0</td>
<td style="text-align: right;">12.23x</td>
</tr>
<tr>
<td style="text-align: left;">IO + CPS</td>
<td style="text-align: right;">78.2</td>
<td style="text-align: right;">1.36x</td>
</tr>
<tr>
<td style="text-align: left;">IO + DCPS</td>
<td style="text-align: right;">78.4</td>
<td style="text-align: right;">1.37x</td>
</tr>
<tr>
<td style="text-align: left;">IO + ContT</td>
<td style="text-align: right;">76.5</td>
<td style="text-align: right;">1.33x</td>
</tr>
</tbody>
</table>
</div>
<p>I tried putting the pointer in <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html#t:IORef" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IORef</span></code></a>s and <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-State-Strict.html#v:StateT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">StateT</span></code></a> state instead of passing them back-and-forth. I tried putting the environment in a <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Reader.html#t:ReaderT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ReaderT</span></code></a> config. I tried using <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Except.html#t:ExceptT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ExceptT</span></code></a> for throwing errors instead of using IO errors. Then I tried various combinations of these monad transformers.</p>
<p>Finally, I also tried converting the <code>go</code> function to be tail-recursive by using <em><a href="https://en.wikipedia.org/wiki/Continuation-passing_style" target="_blank" rel="noopener">Continuation-passing style</a></em> (CPS), and then <a href="https://en.wikipedia.org/wiki/Defunctionalization" target="_blank" rel="noopener">defunctionalizing</a> the continuations, as well as, using the <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Cont.html#t:ContT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ContT</span></code></a> monad transformer. All of these approaches resulted in slower code. The times are interesting to compare (smaller is better):</p>
<figure class="w-100pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot6077718961309221259.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot6077718961309221259.svg" alt="Compilation time using different control-structures"></img>
<noscript><img src="/images/plots/pandocplot6077718961309221259.svg" class="w-100pct" alt="Compilation time using different control-structures"></img></noscript></a>
<figcaption>Compilation time using different control-structures</figcaption>
</figure>
<p>There is no reason to use <code class="sourceCode haskell"><span class="dt">IORef</span></code>s here because they result in slower and uglier code. Using one monad transformer at a time results in slight slowdowns, which may be worth the improvement in the code. But using more than one of them degrades performance by a lot. Also, there is no improvement caused by <abbr title="Continuation-passing style">CPS</abbr> conversion, because <abbr title="Glasgow Haskell Compiler">GHC</abbr> is smart enough to optimize the non tail-recursive code to be faster then handwritten tail-recursive one that allocates a lot of closures (or objects in case of defunctionalization).</p>
<p>Moving on …</p>
<h2 data-track-content data-content-name="the-decompiler" data-content-piece="arithmetic-bytecode-vm-compiler" id="the-decompiler">The Decompiler</h2>
<p>It is a hassle to read raw bytes in the compiler output. Let’s write a decompiler to aid us in debugging and testing the compiler. First, a disassembler that converts bytes to opcodes:</p>
<figure>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Program</span> <span class="ot">=</span> <span class="dt">Seq</span> <span class="dt">Opcode</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="ot">disassemble ::</span> <span class="dt">Bytecode</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Program</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>disassemble bytecode <span class="ot">=</span> go <span class="dv">0</span> Seq.empty</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>size <span class="ot">=</span> BS.length bytecode</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> go <span class="op">!</span>ip <span class="op">!</span>program</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> ip <span class="op">==</span> size <span class="ot">=</span> <span class="fu">pure</span> program</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="kw">case</span> readInstr bytecode ip <span class="kw">of</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="op">|</span> ip <span class="op">+</span> <span class="dv">2</span> <span class="op"><</span> size <span class="ot">-></span></span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> go (ip <span class="op">+</span> <span class="dv">3</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OPush</span> (readInstrArgInt16 bytecode ip)</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> throwIPOOBError <span class="op">$</span> ip <span class="op">+</span> <span class="dv">2</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="op">|</span> ip <span class="op">+</span> <span class="dv">1</span> <span class="op"><</span> size <span class="ot">-></span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a> go (ip <span class="op">+</span> <span class="dv">2</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OGet</span> (readInstrArgWord8 bytecode ip)</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="ot">-></span> throwIPOOBError <span class="op">$</span> ip <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> <span class="ot">-></span> go (ip <span class="op">+</span> <span class="dv">1</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OSwapPop</span></span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="ot">-></span> go (ip <span class="op">+</span> <span class="dv">1</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OAdd</span></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> <span class="ot">-></span> go (ip <span class="op">+</span> <span class="dv">1</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OSub</span></span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> <span class="ot">-></span> go (ip <span class="op">+</span> <span class="dv">1</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">OMul</span></span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> <span class="ot">-></span> go (ip <span class="op">+</span> <span class="dv">1</span>) <span class="op">$</span> program <span class="op">|></span> <span class="dt">ODiv</span></span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a> n <span class="ot">-></span> throwDisassembleError <span class="op">$</span></span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a> <span class="st">"Invalid bytecode: "</span> <span class="op"><></span> <span class="fu">show</span> n <span class="op"><></span> <span class="st">" at: "</span> <span class="op"><></span> <span class="fu">show</span> ip</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a> throwIPOOBError ip <span class="ot">=</span> throwDisassembleError <span class="op">$</span></span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a> <span class="st">"Instruction index "</span> <span class="op"><></span> <span class="fu">show</span> ip <span class="op"><></span> <span class="st">" out of bound "</span> <span class="op"><></span> <span class="fu">show</span> (size <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a> throwDisassembleError <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">Error</span> <span class="dt">Disassemble</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>A disassembled program is a sequence of opcodes. We simply go over each byte of the bytecode, and append the right opcode for it to the program, along with any parameters it may have. Note that we do not verify that the disassembled program is correct.</p>
<p>Here are the helpers that read instruction bytes and their arguments from a bytestring:</p>
<figure>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">readInstr ::</span> <span class="dt">BS.ByteString</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Word8</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>readInstr <span class="ot">=</span> BS.unsafeIndex</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE readInstr #-}</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="ot">readInstrArgWord8 ::</span> <span class="dt">BS.ByteString</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Word8</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>readInstrArgWord8 bytecode ip <span class="ot">=</span> readInstr bytecode (ip <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE readInstrArgWord8 #-}</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="ot">readInstrArgInt16 ::</span> <span class="dt">BS.ByteString</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Int16</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>readInstrArgInt16 bytecode ip <span class="ot">=</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> lb <span class="ot">=</span> readInstr bytecode (ip <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> mb <span class="ot">=</span> readInstr bytecode (ip <span class="op">+</span> <span class="dv">2</span>)</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="ot"> b1 ::</span> <span class="dt">Word16</span> <span class="ot">=</span> <span class="fu">fromIntegral</span> lb</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a> b2 <span class="ot">=</span> <span class="fu">fromIntegral</span> mb <span class="ot">`shiftL`</span> <span class="dv">8</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="fu">fromIntegral</span> (b1 <span class="op">.|.</span> b2)</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE readInstrArgInt16 #-}</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>Next, we decompile the opcodes to an expression:</p>
<figure>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">decompile ::</span> <span class="dt">Program</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Expr</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>decompile program <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> stack <span class="ot"><-</span> go Seq.empty program</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> checkStack <span class="dt">Decompile</span> <span class="fu">maxBound</span> <span class="op">$</span> <span class="fu">length</span> stack</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> ast <span class="op">:<|</span> _ <span class="ot">=</span> stack</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> ast</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> go stack <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Seq.Empty</span> <span class="ot">-></span> <span class="fu">pure</span> stack</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> opcode <span class="op">:<|</span> rest <span class="ot">-></span> <span class="kw">case</span> opcode <span class="kw">of</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">OPush</span> n <span class="ot">-></span> go (stack <span class="op">|></span> <span class="dt">Num</span> n) rest</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">OAdd</span> <span class="ot">-></span> decompileBinOp <span class="dt">Add</span> <span class="op">>>=</span> <span class="fu">flip</span> go rest</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">OSub</span> <span class="ot">-></span> decompileBinOp <span class="dt">Sub</span> <span class="op">>>=</span> <span class="fu">flip</span> go rest</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">OMul</span> <span class="ot">-></span> decompileBinOp <span class="dt">Mul</span> <span class="op">>>=</span> <span class="fu">flip</span> go rest</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">ODiv</span> <span class="ot">-></span> decompileBinOp <span class="dt">Div</span> <span class="op">>>=</span> <span class="fu">flip</span> go rest</span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">OGet</span> i <span class="ot">-></span> go (stack <span class="op">|></span> <span class="dt">Var</span> (mkIdent <span class="op">$</span> mkName <span class="op">$</span> <span class="fu">fromIntegral</span> i)) rest</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">OSwapPop</span> <span class="ot">-></span> decompileLet <span class="op">>>=</span> <span class="fu">flip</span> go rest</span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> decompileBinOp op <span class="ot">=</span> <span class="kw">case</span> stack <span class="kw">of</span></span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a> stack' <span class="op">:|></span> a <span class="op">:|></span> b <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> stack' <span class="op">|></span> <span class="dt">BinOp</span> op a b</span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throwDecompileError <span class="op">$</span></span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true" tabindex="-1"></a> <span class="st">"Not enough elements to decompile binary operation: "</span> <span class="op"><></span> <span class="fu">show</span> op</span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true" tabindex="-1"></a> decompileLet <span class="ot">=</span> <span class="kw">case</span> stack <span class="kw">of</span></span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true" tabindex="-1"></a> stack' <span class="op">:|></span> a <span class="op">:|></span> b <span class="ot">-></span></span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> <span class="op">$</span> stack' <span class="op">|></span> <span class="dt">Let</span> (mkIdent <span class="op">$</span> mkName <span class="op">$</span> <span class="fu">length</span> stack <span class="op">-</span> <span class="dv">2</span>) a b</span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throwDecompileError <span class="st">"Not enough elements to decompile let"</span></span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true" tabindex="-1"></a> mkName i <span class="ot">=</span> names <span class="ot">`Seq.index`</span> i</span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true" tabindex="-1"></a> names <span class="ot">=</span> Seq.fromList <span class="op">$</span> <span class="fu">tail</span> <span class="op">$</span> combinations <span class="dv">2</span></span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true" tabindex="-1"></a> combinations <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> [<span class="st">""</span>]</span>
<span id="cb8-34"><a href="#cb8-34" aria-hidden="true" tabindex="-1"></a> n <span class="ot">-></span> <span class="kw">let</span> prev <span class="ot">=</span> combinations (n <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb8-35"><a href="#cb8-35" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> prev <span class="op"><></span> [x <span class="op">:</span> xs <span class="op">|</span> x <span class="ot"><-</span> [<span class="ch">'a'</span> <span class="op">..</span> <span class="ch">'z'</span>], xs <span class="ot"><-</span> prev]</span>
<span id="cb8-36"><a href="#cb8-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-37"><a href="#cb8-37" aria-hidden="true" tabindex="-1"></a> throwDecompileError <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">Error</span> <span class="dt">Decompile</span></span>
<span id="cb8-38"><a href="#cb8-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-39"><a href="#cb8-39" aria-hidden="true" tabindex="-1"></a><span class="ot">checkStack ::</span> (<span class="dt">MonadError</span> <span class="dt">Error</span> m) <span class="ot">=></span> <span class="dt">Pass</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> m ()</span>
<span id="cb8-40"><a href="#cb8-40" aria-hidden="true" tabindex="-1"></a>checkStack pass stackSize <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb8-41"><a href="#cb8-41" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="ot">-></span> <span class="fu">pure</span> ()</span>
<span id="cb8-42"><a href="#cb8-42" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> throwError <span class="op">$</span> <span class="dt">Error</span> pass <span class="st">"Final stack has no elements"</span></span>
<span id="cb8-43"><a href="#cb8-43" aria-hidden="true" tabindex="-1"></a> n <span class="op">|</span> n <span class="op">></span> stackSize <span class="ot">-></span> throwError <span class="op">.</span> <span class="dt">Error</span> pass <span class="op">$</span> <span class="st">"Stack overflow"</span></span>
<span id="cb8-44"><a href="#cb8-44" aria-hidden="true" tabindex="-1"></a> n <span class="op">|</span> n <span class="op">></span> <span class="dv">1</span> <span class="ot">-></span> throwError <span class="op">.</span> <span class="dt">Error</span> pass <span class="op">$</span> <span class="st">"Final stack has more than one element"</span></span>
<span id="cb8-45"><a href="#cb8-45" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throwError <span class="op">.</span> <span class="dt">Error</span> pass <span class="op">$</span> <span class="st">"Stack underflow"</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>Decompilation is the opposite of compilation. While compiling there is an implicit stack of expressions that are yet to be compiled. We make that stack explicit here, capturing expressions as they are decompiled from opcodes. For compound expressions, we inspect the stack and use the already decompiled expressions as the operands of the expression being decompiled. This way we build up larger expressions from smaller ones, culminating in the single top-level expression at the end<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. Finally, we check the stack to make sure that there is only one expression left in it. Note that like the disassembler, we do not verify that the decompiled expression is correct.</p>
<p>There is one tricky thing in decompilation: we lose the names of the variables when compiling, and are left with only stack indices. So while decompiling, we generate variable names from their stack indices by indexing a list of unique names. Let’s see it in action:</p>
<pre class="plain"><code>$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | arith-vm disassemble
OPush 1
OPush 2
OAdd
OPush 3
OPush 4
OMul
OSub
$ echo -n "1 + 2 - 3 * 4" | arith-vm compile | arith-vm decompile
( ( 1 + 2 ) - ( 3 * 4 ) )
$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | arith-vm disassemble
OPush 4
OPush 5
OGet 0
OGet 1
OAdd
OSwapPop
OSwapPop
$ echo -n "let x = 4 in let y = 5 in x + y" | arith-vm compile | arith-vm decompile
( let a = 4 in ( let b = 5 in ( a + b ) ) )</code></pre>
<p>That’s all for compilation and decompilation. Now, we use them together to make sure that everything works.</p>
<h2 data-track-content data-content-name="testing-the-compiler" data-content-piece="arithmetic-bytecode-vm-compiler" id="testing-the-compiler">Testing the Compiler</h2>
<p>We write some unit tests for the compiler, targeting both success and failure cases:</p>
<figure>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">compilerSpec ::</span> <span class="dt">Spec</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>compilerSpec <span class="ot">=</span> describe <span class="st">"Compiler"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> forM_ compilerSuccessTests <span class="op">$</span> \(input, result) <span class="ot">-></span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"compiles: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> parseCompile input <span class="ot">`shouldBe`</span> <span class="dt">Right</span> (Seq.fromList result)</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> forM_ compilerErrorTests <span class="op">$</span> \(input, err) <span class="ot">-></span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"fails for: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> parseCompile input <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Error</span> <span class="dt">Compile</span> msg) <span class="op">|</span> err <span class="op">==</span> msg <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> it <span class="st">"fails for greater sized expr"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a> compile (<span class="dt">Num</span> <span class="dv">1</span>, <span class="dv">4</span>) <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Error</span> <span class="dt">Compile</span> <span class="st">"Compiled bytecode size 3 is not same as expected size: 4"</span></span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a> ) <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a> it <span class="st">"fails for lesser sized expr"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a> compile (<span class="dt">Num</span> <span class="dv">1</span>, <span class="dv">2</span>) <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Error</span> <span class="dt">Compile</span> <span class="st">"Instruction index 2 out of bound 1"</span>) <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb10-24"><a href="#cb10-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb10-25"><a href="#cb10-25" aria-hidden="true" tabindex="-1"></a> parseCompile <span class="ot">=</span> parseSized <span class="op">>=></span> compile' <span class="dv">4</span> <span class="op">>=></span> disassemble</span>
<span id="cb10-26"><a href="#cb10-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-27"><a href="#cb10-27" aria-hidden="true" tabindex="-1"></a><span class="ot">compilerSuccessTests ::</span> [(<span class="dt">BSC.ByteString</span>, [<span class="dt">Opcode</span>])]</span>
<span id="cb10-28"><a href="#cb10-28" aria-hidden="true" tabindex="-1"></a>compilerSuccessTests <span class="ot">=</span></span>
<span id="cb10-29"><a href="#cb10-29" aria-hidden="true" tabindex="-1"></a> [ ( <span class="st">"1"</span>,</span>
<span id="cb10-30"><a href="#cb10-30" aria-hidden="true" tabindex="-1"></a> [<span class="dt">OPush</span> <span class="dv">1</span>]</span>
<span id="cb10-31"><a href="#cb10-31" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-32"><a href="#cb10-32" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"1 + 2 - 3 * 4 + 5 / 6 / 1 + 1"</span>,</span>
<span id="cb10-33"><a href="#cb10-33" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">2</span>, <span class="dt">OAdd</span>, <span class="dt">OPush</span> <span class="dv">3</span>, <span class="dt">OPush</span> <span class="dv">4</span>, <span class="dt">OMul</span>, <span class="dt">OSub</span>, <span class="dt">OPush</span> <span class="dv">5</span>, <span class="dt">OPush</span> <span class="dv">6</span>,</span>
<span id="cb10-34"><a href="#cb10-34" aria-hidden="true" tabindex="-1"></a> <span class="dt">ODiv</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">ODiv</span>, <span class="dt">OAdd</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OAdd</span> ]</span>
<span id="cb10-35"><a href="#cb10-35" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-36"><a href="#cb10-36" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)"</span>,</span>
<span id="cb10-37"><a href="#cb10-37" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">2</span>, <span class="dt">OPush</span> <span class="dv">3</span>, <span class="dt">OSub</span>, <span class="dt">OPush</span> <span class="dv">4</span>, <span class="dt">OMul</span>, <span class="dt">OAdd</span>, <span class="dt">OPush</span> <span class="dv">5</span>, <span class="dt">OPush</span> <span class="dv">6</span>,</span>
<span id="cb10-38"><a href="#cb10-38" aria-hidden="true" tabindex="-1"></a> <span class="dt">ODiv</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OAdd</span>, <span class="dt">ODiv</span>, <span class="dt">OAdd</span> ]</span>
<span id="cb10-39"><a href="#cb10-39" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-40"><a href="#cb10-40" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in x + 1"</span>,</span>
<span id="cb10-41"><a href="#cb10-41" aria-hidden="true" tabindex="-1"></a> [<span class="dt">OPush</span> <span class="dv">4</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OAdd</span>, <span class="dt">OSwapPop</span>]</span>
<span id="cb10-42"><a href="#cb10-42" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-43"><a href="#cb10-43" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in let y = 5 in x + y"</span>,</span>
<span id="cb10-44"><a href="#cb10-44" aria-hidden="true" tabindex="-1"></a> [<span class="dt">OPush</span> <span class="dv">4</span>, <span class="dt">OPush</span> <span class="dv">5</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OGet</span> <span class="dv">1</span>, <span class="dt">OAdd</span>, <span class="dt">OSwapPop</span>, <span class="dt">OSwapPop</span>]</span>
<span id="cb10-45"><a href="#cb10-45" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-46"><a href="#cb10-46" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in let x = x + 1 in x + 2"</span>,</span>
<span id="cb10-47"><a href="#cb10-47" aria-hidden="true" tabindex="-1"></a> [<span class="dt">OPush</span> <span class="dv">4</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OAdd</span>, <span class="dt">OGet</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">2</span>, <span class="dt">OAdd</span>, <span class="dt">OSwapPop</span>, <span class="dt">OSwapPop</span>]</span>
<span id="cb10-48"><a href="#cb10-48" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-49"><a href="#cb10-49" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = let y = 3 in y + y in x * 3"</span>,</span>
<span id="cb10-50"><a href="#cb10-50" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">OPush</span> <span class="dv">3</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OAdd</span>, <span class="dt">OSwapPop</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OPush</span> <span class="dv">3</span>, <span class="dt">OMul</span>, <span class="dt">OSwapPop</span> ]</span>
<span id="cb10-51"><a href="#cb10-51" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-52"><a href="#cb10-52" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3"</span>,</span>
<span id="cb10-53"><a href="#cb10-53" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">2</span>, <span class="dt">OGet</span> <span class="dv">1</span>, <span class="dt">OGet</span> <span class="dv">1</span>, <span class="dt">OMul</span>, <span class="dt">OSwapPop</span>, <span class="dt">OAdd</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OPush</span> <span class="dv">1</span>,</span>
<span id="cb10-54"><a href="#cb10-54" aria-hidden="true" tabindex="-1"></a> <span class="dt">OAdd</span>, <span class="dt">OSwapPop</span>, <span class="dt">OGet</span> <span class="dv">0</span>, <span class="dt">OPush</span> <span class="dv">3</span>, <span class="dt">OMul</span>, <span class="dt">OSwapPop</span> ]</span>
<span id="cb10-55"><a href="#cb10-55" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb10-56"><a href="#cb10-56" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1/0"</span>, [<span class="dt">OPush</span> <span class="dv">1</span>, <span class="dt">OPush</span> <span class="dv">0</span>, <span class="dt">ODiv</span>]),</span>
<span id="cb10-57"><a href="#cb10-57" aria-hidden="true" tabindex="-1"></a> (<span class="st">"-32768 / -1"</span>, [<span class="dt">OPush</span> (<span class="op">-</span><span class="dv">32768</span>), <span class="dt">OPush</span> (<span class="op">-</span><span class="dv">1</span>), <span class="dt">ODiv</span>])</span>
<span id="cb10-58"><a href="#cb10-58" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb10-59"><a href="#cb10-59" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-60"><a href="#cb10-60" aria-hidden="true" tabindex="-1"></a><span class="ot">compilerErrorTests ::</span> [(<span class="dt">BSC.ByteString</span>, <span class="dt">String</span>)]</span>
<span id="cb10-61"><a href="#cb10-61" aria-hidden="true" tabindex="-1"></a>compilerErrorTests <span class="ot">=</span></span>
<span id="cb10-62"><a href="#cb10-62" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">"x"</span>, <span class="st">"Unknown variable: x"</span>),</span>
<span id="cb10-63"><a href="#cb10-63" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in y + 1"</span>, <span class="st">"Unknown variable: y"</span>),</span>
<span id="cb10-64"><a href="#cb10-64" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = y + 1 in x"</span>, <span class="st">"Unknown variable: y"</span>),</span>
<span id="cb10-65"><a href="#cb10-65" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = x + 1 in x"</span>, <span class="st">"Unknown variable: x"</span>),</span>
<span id="cb10-66"><a href="#cb10-66" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in let y = 1 in let z = 2 in y + x"</span>, <span class="st">"Stack overflow"</span>),</span>
<span id="cb10-67"><a href="#cb10-67" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in let y = 5 in x + let z = y in z * z"</span>, <span class="st">"Stack overflow"</span>),</span>
<span id="cb10-68"><a href="#cb10-68" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let a = 0 in let b = 0 in let c = 0 in let d = 0 in d"</span>, <span class="st">"Stack overflow"</span>)</span>
<span id="cb10-69"><a href="#cb10-69" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>In each test, we parse and compile an expression, and then disassemble the compiled bytes, which we match with expected list of opcodes, or an error message.</p>
<p>Let’s put these tests with the parser tests, and run them:</p>
<figure>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> hspec <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> parserSpec</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> astInterpreterSpec</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> compilerSpec</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<details>
<summary>
Output of the test run
</summary>
<pre class="plain"><code>$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...
Parser
parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
parses: "1+2-3*4+5/6/0+1" [✔]
parses: "1 + -1" [✔]
parses: "let x = 4 in x + 1" [✔]
parses: "let x=4in x+1" [✔]
parses: "let x = 4 in let y = 5 in x + y" [✔]
parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
parses: "let x = let y = 3 in y + y in x * 3" [✔]
parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "" [✔]
fails for: "1 +" [✔]
fails for: "1 & 1" [✔]
fails for: "1 + 1 & 1" [✔]
fails for: "1 & 1 + 1" [✔]
fails for: "(" [✔]
fails for: "(1" [✔]
fails for: "(1 + " [✔]
fails for: "(1 + 2" [✔]
fails for: "(1 + 2}" [✔]
fails for: "66666" [✔]
fails for: "-x" [✔]
fails for: "let 1" [✔]
fails for: "let x = 1 in " [✔]
fails for: "let let = 1 in 1" [✔]
fails for: "let x = 1 in in" [✔]
fails for: "let x=1 inx" [✔]
fails for: "letx = 1 in x" [✔]
fails for: "let x ~ 1 in x" [✔]
fails for: "let x = 1 & 2 in x" [✔]
fails for: "let x = 1 inx" [✔]
fails for: "let x = 1 in x +" [✔]
fails for: "let x = 1 in x in" [✔]
fails for: "let x = let x = 1 in x" [✔]
AST interpreter
interprets: "1" [✔]
interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
interprets: "1 + -1" [✔]
interprets: "1 * -1" [✔]
interprets: "let x = 4 in x + 1" [✔]
interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
interprets: "let x = 4 in let y = 5 in x + y" [✔]
interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
interprets: "let x = let y = 3 in y + y in x * 3" [✔]
interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "1/0" [✔]
fails for: "-32768 / -1" [✔]
Compiler
compiles: "1" [✔]
compiles: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
compiles: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
compiles: "let x = 4 in x + 1" [✔]
compiles: "let x = 4 in let y = 5 in x + y" [✔]
compiles: "let x = 4 in let x = x + 1 in x + 2" [✔]
compiles: "let x = let y = 3 in y + y in x * 3" [✔]
compiles: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
compiles: "1/0" [✔]
compiles: "-32768 / -1" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "let x = 4 in let y = 1 in let z = 2 in y + x" [✔]
fails for: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
fails for: "let a = 0 in let b = 0 in let c = 0 in let d = 0 in d" [✔]
fails for greater sized expr [✔]
fails for lesser sized expr [✔]
Finished in 0.0147 seconds
73 examples, 0 failures
Test suite specs: PASS</code></pre>
</details>
<p>Awesome, it works! That’s it for this post. Let’s update our checklist:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#the-compiler">Compiling ASTs to bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#the-decompiler">Disassembling and decompiling bytecode.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="#testing-the-compiler">Unit testing for our compiler.</a></label></li>
<li><label><input type="checkbox"></input>Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox"></input>Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox"></input>Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox"></input>All the while keeping an eye on performance.</label></li>
</ul>
<p>In the <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">next part</a>, we write a virtual machine that runs our compiled bytecode, and do some benchmarking.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>There are <abbr title="Virtual Machine">VM</abbr>s that execute hardware <abbr title="Instruction Set">IS</abbr>s instead of bytecode. Such <abbr title="Virtual Machine">VM</abbr>s are also called <em><a href="https://en.wikipedia.org/wiki/Emulators" target="_blank" rel="noopener">Emulators</a></em> because they emulate actual CPU hardware. Some examples are <a href="https://www.qemu.org/" target="_blank" rel="noopener">QEMU</a> and <a href="https://en.wikipedia.org/wiki/Video_game_console_emulator" target="_blank" rel="noopener">video game console emulators</a>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p><abbr title="Virtual Machine">VM</abbr>s use virtual registers instead of actual CPU registers, which are often represented as a fixed size array of 1, 2, 4 or 8 byte elements.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>I call them variables here but they do not actually vary. A better name is let bindings.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>We could have used two separate opcodes here: <code class="sourceCode haskell"><span class="dt">OSwap</span></code> and <code class="sourceCode haskell"><span class="dt">OPop</span></code>. That would result in same final result when evaluating an expression, but we’d have to execute two instructions instead of one for <code class="sourceCode haskell"><span class="dt">Let</span></code> expressions. Using a single <code class="sourceCode haskell"><span class="dt">OSwapPop</span></code> instruction speeds up execution, not only because we reduce the number of instructions, but also because we don’t need to do a full swap, only a half swap is enough because we pop the stack anyway after the swap. This also shows how we can improve the performance of our <abbr title="Virtual Machine">VM</abbr>s by inventing specific opcodes for particular operations.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>Notice the use of <a href="https://hackage.haskell.org/package/strict/docs/Data-Strict-Tuple.html#t:Pair" target="_blank" rel="noopener">strict <code class="sourceCode haskell"><span class="dt">Pair</span></code>s</a> here, for performance reasons.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>I ran all benchmarks on an Apple M4 Pro 24GB machine against a 142MB file.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>Used as <em><a href="https://en.wikipedia.org/wiki/Association_list" target="_blank" rel="noopener">Association List</a></em>.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>The decompiler is a bottom-up <a href="https://en.wikipedia.org/wiki/shift-reduce_parser" target="_blank" rel="noopener">shift-reduce parser</a> from the opcodes to the expression tree.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<strong>The Compiler</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">The Virtual Machine</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2025-08-24T00:00:00Z <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked="" /><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox" /><span class="todo">Compiling ASTs to bytecode.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Disassembling and decompiling bytecode.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Unit testing for our compiler.</span></label></li>
<li><label><input type="checkbox" />Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox" />Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox" />Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox" />Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox" />All the while keeping an eye on performance.</label></li>
</ul>
<p>In this post, we write the compiler for our <abbr title="Abstract Syntax Tree">AST</abbr> to bytecode, and a decompiler for the bytecode.</p>
https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/ A Fast Bytecode VM for Arithmetic: The Parser 2025-08-02T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox"></input><span class="todo">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Unit testing for our parser.</span></label></li>
<li><label><input type="checkbox"></input><span class="todo">Interpreting ASTs.</span></label></li>
<li><label><input type="checkbox"></input>Compiling ASTs to bytecode.</label></li>
<li><label><input type="checkbox"></input>Disassembling and decompiling bytecode.</label></li>
<li><label><input type="checkbox"></input>Unit testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox"></input>Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox"></input>Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox"></input>All the while keeping an eye on performance.</label></li>
</ul>
<p>In this post, we write the parser for our expression language to an <abbr title="Abstract Syntax Tree">AST</abbr>, and an <abbr title="Abstract Syntax Tree">AST</abbr> interpreter.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<strong>The Parser</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">The Compiler</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">The Virtual Machine</a>
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#expressions">Expressions</a></li><li><a href="#parsing-expressions">Parsing Expressions</a></li><li><a href="#error-handling">Error Handling</a></li><li><a href="#the-parser">The Parser</a></li><li><a href="#testing-the-parser">Testing the Parser</a></li><li><a href="#the-ast-interpreter">The AST Interpreter</a></li><li><a href="#testing-the-interpreter">Testing the Interpreter</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="arithmetic-bytecode-vm-parser" id="introduction">Introduction</h2>
<p>The language that we are going to work with is that of basic <a href="https://en.wikipedia.org/wiki/arithmetic" target="_blank" rel="noopener">arithmetic</a> expressions, with integer values, and addition, subtraction, multiplication and integer division operations. However, our expression language has a small twist: it is possible to introduce a variable using a <code class="sourceCode haskell"><span class="kw">let</span></code> binding and use the variable in the expressions in the body of <code class="sourceCode haskell"><span class="kw">let</span></code><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Furthermore, we use the same syntax for <code class="sourceCode haskell"><span class="kw">let</span></code> as Haskell does. Here are some examples of valid expressions in our language:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="op">+</span> <span class="dv">2</span> <span class="op">-</span> <span class="dv">3</span> <span class="op">*</span> <span class="dv">4</span> <span class="op">+</span> <span class="dv">5</span> <span class="op">/</span> <span class="dv">6</span> <span class="op">/</span> <span class="dv">0</span> <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="dv">4</span> <span class="kw">in</span> x <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="dv">4</span> <span class="kw">in</span> <span class="kw">let</span> y <span class="ot">=</span> <span class="dv">5</span> <span class="kw">in</span> x <span class="op">+</span> y</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="dv">4</span> <span class="kw">in</span> <span class="kw">let</span> y <span class="ot">=</span> <span class="dv">5</span> <span class="kw">in</span> x <span class="op">+</span> <span class="kw">let</span> z <span class="ot">=</span> y <span class="kw">in</span> z <span class="op">*</span> z</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="dv">4</span> <span class="kw">in</span> (<span class="kw">let</span> y <span class="ot">=</span> <span class="dv">5</span> <span class="kw">in</span> x <span class="op">+</span> <span class="dv">1</span>) <span class="op">+</span> <span class="kw">let</span> z <span class="ot">=</span> <span class="dv">2</span> <span class="kw">in</span> z <span class="op">*</span> z</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> (<span class="kw">let</span> y <span class="ot">=</span> <span class="dv">3</span> <span class="kw">in</span> y <span class="op">+</span> y) <span class="kw">in</span> x <span class="op">*</span> <span class="dv">3</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="kw">let</span> y <span class="ot">=</span> <span class="dv">3</span> <span class="kw">in</span> y <span class="op">+</span> y <span class="kw">in</span> x <span class="op">*</span> <span class="dv">3</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> x <span class="ot">=</span> <span class="kw">let</span> y <span class="ot">=</span> <span class="dv">1</span> <span class="op">+</span> <span class="kw">let</span> z <span class="ot">=</span> <span class="dv">2</span> <span class="kw">in</span> z <span class="op">*</span> z <span class="kw">in</span> y <span class="op">+</span> <span class="dv">1</span> <span class="kw">in</span> x <span class="op">*</span> <span class="dv">3</span></span></code></pre></div>
<p>The only gotcha here is that the body of a <code class="sourceCode haskell"><span class="kw">let</span></code> expression extends as far as possible while accounting for nested <code class="sourceCode haskell"><span class="kw">let</span></code>s. It becomes clear when we look at parsed expressions later.</p>
<p>The eventual product is a command-line tool that can run different commands. Let’s start with a demo of the tool:</p>
<pre class="plain"><code>$ arith-vm -h
Bytecode VM for Arithmetic written in Haskell
Usage: arith-vm COMMAND
Available options:
-h,--help Show this help text
Available commands:
read Read an expression from file or STDIN
parse Parse expression to AST
print Parse expression to AST and print it
compile Parse and compile expression to bytecode
disassemble Disassemble bytecode to opcodes
decompile Disassemble and decompile bytecode to expression
interpret-ast Parse expression and interpret AST
interpret-bytecode Parse, compile and assemble expression, and
interpret bytecode
run Run bytecode
generate Generate a random arithmetic expression
$ arith-vm parse -h
Usage: arith-vm print [FILE]
Parse expression to AST and print it
Available options:
FILE Input file, pass - to read from STDIN (default)
-h,--help Show this help text
$ echo -n "let x = 1 in let y = 2 in y + x * 3" | arith-vm print
( let x = 1 in ( let y = 2 in ( y + ( x * 3 ) ) ) )
$ echo -n "let x = 1 in let y = 2 in y + x * 3" | arith-vm compile > a.tbc
$ hexdump -C a.tbc
00000000 00 01 00 00 02 00 01 01 01 00 00 03 00 05 03 02 |................|
00000010 02 |.|
00000011
$ arith-vm disassemble a.tbc
OPush 1
OPush 1
OPush 2
OGet 1
OGet 0
OPush 3
OMul
OAdd
OSwapPop
OSwapPop
$ arith-vm decompile a.tbc
( let a = 1 in ( let b = 2 in ( b + ( a * 3 ) ) ) )
$ echo -n "let x = 1 in let y = 2 in y + x * 3" | arith-vm interpret-ast
5
$ echo -n "let x = 1 in let y = 2 in y + x * 3" | arith-vm interpret-bytecode
5
$ arith-vm run a.tbc
5
$ arith-vm generate
(
(
(
( let nD =
( 11046 - -20414 ) in
( let xqf = ( -15165 * nD ) in nD )
) * 26723
) /
(
( let phMuOI =
( let xQ = ( let mmeBy = -28095 in 22847 ) in 606 ) in 25299
) *
( let fnoNQm = ( let mzZaZk = 29463 in 18540 ) in ( -2965 / fnoNQm ) )
)
) * 21400
)</code></pre>
<p>We can parse an expression, or compile it to bytecode. We can also disassemble bytecode to opcodes, or decompile it back to an expression. We can interpret an expression either as an AST or as bytecode. We can also run a bytecode file directly. Finally, we have a handy command to generate random expressions for testing/benchmarking purposes<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>Let’s start.</p>
<h2 data-track-content data-content-name="expressions" data-content-piece="arithmetic-bytecode-vm-parser" id="expressions">Expressions</h2>
<p>Since this is Haskell, we start with listing many language extensions and imports:</p>
<figure>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE OverloadedStrings #-}</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE UndecidableInstances #-}</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">ArithVMLib</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Expr</span>(<span class="op">..</span>), <span class="dt">Ident</span>(<span class="op">..</span>), <span class="dt">Op</span>(<span class="op">..</span>), <span class="dt">Pass</span>(<span class="op">..</span>), <span class="dt">Error</span>(<span class="op">..</span>), <span class="dt">Opcode</span>(<span class="op">..</span>), <span class="dt">Bytecode</span>,</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> sizedExpr, parse, parseSized, compile', compile, decompile, disassemble,</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> exprGen, interpretAST, interpretBytecode', interpretBytecode ) <span class="kw">where</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Applicative</span> ((<|>))</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.DeepSeq</span> (<span class="dt">NFData</span>)</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span> (<span class="dt">Exception</span>, catch, throwIO)</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (unless, void, when)</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Except</span> (<span class="dt">MonadError</span> (..), runExceptT)</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.ST.Strict</span> (runST)</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Attoparsec.ByteString.Char8</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">P</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Bits</span> (shiftL, shiftR, (.&.), (.|.))</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BS</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString.Char8</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BSC</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString.Internal</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BSI</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString.Unsafe</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BS</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (toUpper)</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.HashMap.Strict</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Hashable</span> (<span class="dt">Hashable</span>)</span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Int</span> (<span class="dt">Int16</span>)</span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">List</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Maybe</span> (fromMaybe)</span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Primitive.PrimArray</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">PA</span></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Sequence</span> (<span class="dt">Seq</span> (..), (|>))</span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Sequence</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Seq</span></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Set</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Set</span></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Strict.Tuple</span> (<span class="dt">Pair</span> ((:!:)))</span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Strict.Tuple</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">TS</span></span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Word</span> (<span class="dt">Word16</span>, <span class="dt">Word8</span>)</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Foreign.Ptr</span> (<span class="dt">Ptr</span>, minusPtr, plusPtr)</span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Foreign.Storable</span> (poke)</span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.Generics</span> (<span class="dt">Generic</span>)</span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.QuickCheck</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Q</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>We use the <code class="sourceCode haskell"><span class="dt">GHC2021</span></code> extension here that enables a lot of useful GHC extensions by default. We are using the <a href="https://hackage.haskell.org/package/bytestring/" target="_blank" rel="noopener">bytestring</a> and <a href="https://hackage.haskell.org/package/attoparsec/" target="_blank" rel="noopener">attoparsec</a> libraries for parsing, <a href="https://hackage.haskell.org/package/strict/" target="_blank" rel="noopener">strict</a>, <a href="https://hackage.haskell.org/package/containers/" target="_blank" rel="noopener">containers</a> and <a href="https://hackage.haskell.org/package/unordered-containers/" target="_blank" rel="noopener">unordered-containers</a> for compilation, <a href="https://hackage.haskell.org/package/deepseq/" target="_blank" rel="noopener">deepseq</a>, <a href="https://hackage.haskell.org/package/mtl/" target="_blank" rel="noopener">mtl</a> and <a href="https://hackage.haskell.org/package/primitive/" target="_blank" rel="noopener">primitive</a> for interpreting, and <a href="https://hackage.haskell.org/package/QuickCheck/" target="_blank" rel="noopener">QuickCheck</a> for testing.</p>
<p>The first step is to parse an expression into an <em><a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Tree" target="_blank" rel="noopener">Abstract Syntax Tree</a></em> (AST). We represent the <abbr title="Abstract Syntax Tree">AST</abbr> as Haskell <em><a href="https://en.wikipedia.org/wiki/Algebraic_data_type" target="_blank" rel="noopener">Algebraic Data Types</a></em> (ADTs):</p>
<figure>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Num</span> <span class="op">!</span><span class="dt">Int16</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Var</span> <span class="op">!</span><span class="dt">Ident</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BinOp</span> <span class="op">!</span><span class="dt">Op</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Let</span> <span class="op">!</span><span class="dt">Ident</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Generic</span>)</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Ident</span> <span class="ot">=</span> <span class="dt">Ident</span> <span class="dt">BS.ByteString</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>, <span class="dt">Generic</span>, <span class="dt">Hashable</span>)</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Op</span> <span class="ot">=</span> <span class="dt">Add</span> <span class="op">|</span> <span class="dt">Sub</span> <span class="op">|</span> <span class="dt">Mul</span> <span class="op">|</span> <span class="dt">Div</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Enum</span>, <span class="dt">Generic</span>)</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Expr</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Expr</span> <span class="kw">where</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> n <span class="ot">-></span> <span class="fu">show</span> n</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">Var</span> (<span class="dt">Ident</span> x) <span class="ot">-></span> BSC.unpack x</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">BinOp</span> op a b <span class="ot">-></span> <span class="st">"("</span> <span class="op"><></span> <span class="fu">show</span> a <span class="op"><></span> <span class="st">" "</span> <span class="op"><></span> <span class="fu">show</span> op <span class="op"><></span> <span class="st">" "</span> <span class="op"><></span> <span class="fu">show</span> b <span class="op"><></span> <span class="st">")"</span></span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Let</span> (<span class="dt">Ident</span> x) a b <span class="ot">-></span></span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let "</span> <span class="op"><></span> BSC.unpack x <span class="op"><></span> <span class="st">" = "</span> <span class="op"><></span> <span class="fu">show</span> a <span class="op"><></span> <span class="st">" in "</span> <span class="op"><></span> <span class="fu">show</span> b <span class="op"><></span> <span class="st">")"</span></span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Ident</span></span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Ident</span> <span class="kw">where</span></span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">Ident</span> x) <span class="ot">=</span> BSC.unpack x</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a><span class="ot">mkIdent ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Ident</span></span>
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a>mkIdent <span class="ot">=</span> <span class="dt">Ident</span> <span class="op">.</span> BSC.pack</span>
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-31"><a href="#cb4-31" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Op</span></span>
<span id="cb4-32"><a href="#cb4-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-33"><a href="#cb4-33" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Op</span> <span class="kw">where</span></span>
<span id="cb4-34"><a href="#cb4-34" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb4-35"><a href="#cb4-35" aria-hidden="true" tabindex="-1"></a> <span class="dt">Add</span> <span class="ot">-></span> <span class="st">"+"</span></span>
<span id="cb4-36"><a href="#cb4-36" aria-hidden="true" tabindex="-1"></a> <span class="dt">Sub</span> <span class="ot">-></span> <span class="st">"-"</span></span>
<span id="cb4-37"><a href="#cb4-37" aria-hidden="true" tabindex="-1"></a> <span class="dt">Mul</span> <span class="ot">-></span> <span class="st">"*"</span></span>
<span id="cb4-38"><a href="#cb4-38" aria-hidden="true" tabindex="-1"></a> <span class="dt">Div</span> <span class="ot">-></span> <span class="st">"/"</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>We add <code class="sourceCode haskell"><span class="dt">Show</span></code> instances for <abbr title="Algebraic Data Type">ADT</abbr>s so that we can pretty-print the parsed <abbr title="Abstract Syntax Tree">AST</abbr><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. Now, we can start parsing.</p>
<h2 data-track-content data-content-name="parsing-expressions" data-content-piece="arithmetic-bytecode-vm-parser" id="parsing-expressions">Parsing Expressions</h2>
<p>The <a href="https://en.wikipedia.org/wiki/EBNF" target="_blank" rel="noopener">EBNF</a> grammar for expressions is as follows:</p>
<div class="sourceCode" id="cb5" data-lang="ebnf"><pre class="sourceCode c numberSource"><code class="sourceCode c"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>expr <span class="op">::=</span> term <span class="op">|</span> term space<span class="op">*</span> <span class="op">(</span><span class="st">"+"</span> <span class="op">|</span> <span class="st">"-"</span><span class="op">)</span> term</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>term <span class="op">::=</span> factor <span class="op">|</span> factor space<span class="op">*</span> <span class="op">(</span><span class="st">"*"</span> <span class="op">|</span> <span class="st">"/"</span><span class="op">)</span> factor</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>factor <span class="op">::=</span> space<span class="op">*</span> <span class="op">(</span>grouping <span class="op">|</span> num <span class="op">|</span> var <span class="op">|</span> let<span class="op">)</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>grouping <span class="op">::=</span> <span class="st">"("</span> expr space<span class="op">*</span> <span class="st">")"</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>num <span class="op">::=</span> <span class="st">"-"</span><span class="op">?</span> <span class="op">[</span><span class="dv">0</span><span class="op">-</span><span class="dv">9</span><span class="op">]+</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>var <span class="op">::=</span> ident</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>ident <span class="op">::=</span> <span class="op">([</span>a<span class="op">-</span>z<span class="op">]</span> <span class="op">|</span> <span class="op">[</span>A<span class="op">-</span>Z<span class="op">])+</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>let <span class="op">::=</span> <span class="st">"let"</span> space<span class="op">+</span> ident space<span class="op">*</span> <span class="st">"="</span> expr space<span class="op">*</span> <span class="st">"in"</span> space<span class="op">+</span> expr space<span class="op">*</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>space <span class="op">::=</span> <span class="st">" "</span> <span class="op">|</span> <span class="st">"</span><span class="sc">\t</span><span class="st">"</span> <span class="op">|</span> <span class="st">"</span><span class="sc">\n</span><span class="st">"</span> <span class="op">|</span> <span class="st">"</span><span class="sc">\f</span><span class="st">"</span> <span class="op">|</span> <span class="st">"</span><span class="sc">\r</span><span class="st">"</span></span></code></pre></div>
<p>The <code>expr</code>, <code>term</code>, <code>factor</code>, and <code>grouping</code> productions take care of having the right precedence of arithmetic operations. The <code>num</code> and <code>var</code> productions are trivial. Our language is fairly oblivious of whitespaces; we allow zero-or-more spaces at most places.</p>
<p>The <code class="sourceCode haskell"><span class="kw">let</span></code> expressions grammar is pretty standard, except we require one-or-more spaces after the <code class="sourceCode haskell"><span class="kw">let</span></code> and <code class="sourceCode haskell"><span class="kw">in</span></code> keywords to make them unambiguous.</p>
<p>We use the <a href="https://en.wikipedia.org/wiki/parser_combinator" target="_blank" rel="noopener">parser combinator</a> library <a href="https://hackage.haskell.org/package/attoparsec/" target="_blank" rel="noopener">attoparsec</a> for creating the parser. attoparsec works directly with bytestrings so we don’t incur the cost of decoding unicode characters<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a><a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<p>We write the parser in a top-down <a href="https://en.wikipedia.org/wiki/Recursive_descent_parser" target="_blank" rel="noopener">recursive-descent</a> fashion, same as the grammar, starting with the <code>expr</code> parser:</p>
<figure>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">SizedExpr</span> <span class="ot">=</span> (<span class="dt">Expr</span>, <span class="dt">Int</span>)</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co">-- expr ::= term | term space* ("+" | "-") term</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="ot">exprParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>exprParser <span class="ot">=</span> chainBinOps termParser <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="ch">'+'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Add</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'-'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Sub</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> op <span class="ot">-></span> <span class="fu">fail</span> <span class="op">$</span> <span class="st">"Expected '+' or '-', got: "</span> <span class="op"><></span> <span class="fu">show</span> op</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a><span class="co">-- term ::= factor | factor space* ("*" | "/") factor</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a><span class="ot">termParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a>termParser <span class="ot">=</span> chainBinOps factorParser <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> <span class="ch">'*'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Mul</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> <span class="ch">'/'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Div</span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a> op <span class="ot">-></span> <span class="fu">fail</span> <span class="op">$</span> <span class="st">"Expected '*' or '/', got: "</span> <span class="op"><></span> <span class="fu">show</span> op</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a><span class="ot">chainBinOps ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span> <span class="ot">-></span> (<span class="dt">Char</span> <span class="ot">-></span> <span class="dt">P.Parser</span> <span class="dt">Op</span>) <span class="ot">-></span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a>chainBinOps operandParser operatorParser <span class="ot">=</span> operandParser <span class="op">>>=</span> rest</span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> rest (<span class="op">!</span>expr, <span class="op">!</span>size1) <span class="ot">=</span></span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> ( <span class="kw">do</span></span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a> P.skipSpace</span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> P.anyChar</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a> operator <span class="ot"><-</span> operatorParser c</span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a> (operand, <span class="op">!</span>size2) <span class="ot"><-</span> operandParser</span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a> rest (<span class="dt">BinOp</span> operator expr operand, size1 <span class="op">+</span> size2 <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a> ) <span class="op"><|></span> <span class="fu">pure</span> (expr, size1)</span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE chainBinOps #-}</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>One small complication: our parsers not only return the parsed expressions, but also the number of bytes they occupy when compiled to bytecode. We gather this information while building the AST in parts, and propagate it upward in the tree. We use the bytecode size later in the compilation pass<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>.</p>
<p>Both <code class="sourceCode haskell">exprParser</code> and <code class="sourceCode haskell">termParser</code> chain the right higher precedence parsers with the right operators between them<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a> using the <code>chainBinOps</code> combinator.</p>
<figure>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- factor ::= space* (grouping | num | var | let)</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="ot">factorParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>factorParser <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> P.skipSpace</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> P.peekChar' <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="ch">'('</span> <span class="ot">-></span> groupingParser</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'-'</span> <span class="ot">-></span> numParser</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> c <span class="op">|</span> P.isDigit c <span class="ot">-></span> numParser</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> c <span class="op">|</span> c <span class="op">/=</span> <span class="ch">'l'</span> <span class="ot">-></span> varParser</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> varParser <span class="op"><|></span> letParser</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="co">-- grouping ::= "(" expr space* ")"</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="ot">groupingParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>groupingParser <span class="ot">=</span> P.char <span class="ch">'('</span> <span class="op">*></span> exprParser <span class="op"><*</span> P.skipSpace <span class="op"><*</span> P.char <span class="ch">')'</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p><code>factorParser</code> uses <a href="https://en.wikipedia.org/wiki/Parsing#Lookahead" target="_blank" rel="noopener">lookahead</a> to dispatch between one of the primary parsers, which is faster than using <a href="https://en.wikipedia.org/wiki/backtracking" target="_blank" rel="noopener">backtracking</a>. <code>groupingParser</code> simply skips the parenthesis, and recursively calls <code>exprParser</code>.</p>
<figure>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- num ::= "-"? [0-9]+</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ot">numParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>numParser <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> n <span class="ot"><-</span> P.signed P.decimal <span class="op">P.<?></span> <span class="st">"number"</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> validInt16 n</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="fu">pure</span> (<span class="dt">Num</span> <span class="op">$</span> <span class="fu">fromIntegral</span> n, <span class="dv">3</span>)</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="fu">fail</span> <span class="op">$</span> <span class="st">"Expected a valid Int16, got: "</span> <span class="op"><></span> <span class="fu">show</span> n</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot"> validInt16 ::</span> <span class="dt">Integer</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> validInt16 i <span class="ot">=</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">fromIntegral</span> (<span class="fu">minBound</span> <span class="op">@</span><span class="dt">Int16</span>) <span class="op"><=</span> i</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> <span class="op">&&</span> i <span class="op"><=</span> <span class="fu">fromIntegral</span> (<span class="fu">maxBound</span> <span class="op">@</span><span class="dt">Int16</span>)</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p><code>numParser</code> uses the <code>signed</code> and <code>decimal</code> parsers from the attoparsec library to parse an optionally signed integer. We restrict the numbers to 2-byte integers (-32768–32767 inclusive)<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. The <code class="sourceCode haskell"><span class="op"><?></span></code> helper from attoparsec names parsers so that the error message shown in case of failures point to the right parser.</p>
<figure>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- var ::= ident</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="ot">varParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>varParser <span class="ot">=</span> (,<span class="dv">2</span>) <span class="op">.</span> <span class="dt">Var</span> <span class="op"><$></span> identParser</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="co">-- ident ::= ([a-z] | [A-Z])+</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="ot">identParser ::</span> <span class="dt">P.Parser</span> <span class="dt">Ident</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>identParser <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> ident <span class="ot"><-</span> P.takeWhile1 P.isAlpha_ascii <span class="op">P.<?></span> <span class="st">"identifier"</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> isReservedKeyword ident</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="fu">fail</span> <span class="op">$</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected identifier, got: \""</span> <span class="op"><></span> BSC.unpack ident</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">"\", which is a reversed keyword"</span></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Ident</span> ident</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE identParser #-}</span></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="ot">isReservedKeyword ::</span> <span class="dt">BSC.ByteString</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a>isReservedKeyword <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> <span class="st">"let"</span> <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> <span class="st">"in"</span> <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE isReservedKeyword #-}</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p><code>varParser</code> and <code>identParser</code> are straightforward. We restrict identifiers to upper-and-lowercase <a href="https://en.wikipedia.org/wiki/ASCII" target="_blank" rel="noopener">ASCII</a> alphabetic letters. We also check that our reserved keywords (<code class="sourceCode haskell"><span class="kw">let</span></code> and <code class="sourceCode haskell"><span class="kw">in</span></code>) are not used as identifiers.</p>
<p>Finally, we write the parser for <code class="sourceCode haskell"><span class="kw">let</span></code> expressions:</p>
<figure>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co">-- let ::= "let" space+ ident space* "=" expr space* "in" space+ expr space*</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">letParser ::</span> <span class="dt">P.Parser</span> <span class="dt">SizedExpr</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>letParser <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> expect <span class="st">"let"</span> <span class="op"><*</span> skipSpace1</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>x <span class="ot"><-</span> identParser</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> P.skipSpace <span class="op">*></span> expect <span class="st">"="</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> (assign, <span class="op">!</span>aSize) <span class="ot"><-</span> exprParser</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> P.skipSpace <span class="op">*></span> expect <span class="st">"in"</span> <span class="op"><*</span> skipSpace1</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> (body, <span class="op">!</span>bSize) <span class="ot"><-</span> exprParser <span class="op"><*</span> P.skipSpace</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> (<span class="dt">Let</span> x assign body, aSize <span class="op">+</span> bSize <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> expect s <span class="ot">=</span></span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> void (P.string s) <span class="op"><|></span> <span class="kw">do</span></span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a> found <span class="ot"><-</span> P.manyTill P.anyChar (void P.space <span class="op"><|></span> P.endOfInput)</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> found' <span class="ot">=</span> <span class="kw">if</span> found <span class="op">==</span> <span class="st">""</span> <span class="kw">then</span> <span class="st">"end-of-input"</span> <span class="kw">else</span> <span class="st">"\""</span> <span class="op"><></span> found <span class="op"><></span> <span class="st">"\""</span></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">fail</span> <span class="op">$</span> <span class="st">"Expected: \""</span> <span class="op"><></span> BSC.unpack s <span class="op"><></span> <span class="st">"\", got: "</span> <span class="op"><></span> found'</span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a> skipSpace1 <span class="ot">=</span> P.space <span class="op">*></span> P.skipSpace</span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>In <code>letParser</code> we use <code>identParser</code> to parse the variable name, and recursively call <code>exprParser</code> to parse the assignment and body expressions, while making sure to correctly parse the spaces. The helper parser <code>expect</code> is used to parse known string tokens (<code>let</code>, <code>=</code> and <code>in</code>), and provide good error messages in case of failures. Talking about error messages …</p>
<h2 data-track-content data-content-name="error-handling" data-content-piece="arithmetic-bytecode-vm-parser" id="error-handling">Error Handling</h2>
<p>Let’s figure out an error handling strategy. We use an <code class="sourceCode haskell"><span class="dt">Error</span></code> type wrapped in <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Either" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Either</span></code></a> to propagate the errors in our program:</p>
<figure>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Error</span> <span class="ot">=</span> <span class="dt">Error</span> <span class="op">!</span><span class="dt">Pass</span> <span class="op">!</span><span class="dt">String</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Generic</span>)</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Eq</span> <span class="dt">Error</span> <span class="kw">where</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Error</span> _ m1) <span class="op">==</span> (<span class="dt">Error</span> _ m2) <span class="ot">=</span> m1 <span class="op">==</span> m2</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Error</span> <span class="kw">where</span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">Error</span> pass msg) <span class="ot">=</span> <span class="fu">show</span> pass <span class="op"><></span> <span class="st">" error: "</span> <span class="op"><></span> msg</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Error</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Exception</span> <span class="dt">Error</span></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Pass</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Read</span></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Parse</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Print</span></span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Compile</span></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Decompile</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Disassemble</span></span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">InterpretAST</span></span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">InterpretBytecode</span></span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>, <span class="dt">Generic</span>)</span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">NFData</span> <span class="dt">Pass</span></span>
<span id="cb11-25"><a href="#cb11-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-26"><a href="#cb11-26" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Result</span> <span class="ot">=</span> <span class="dt">Either</span> <span class="dt">Error</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>The <code class="sourceCode haskell"><span class="dt">Error</span></code> type also captures the <code class="sourceCode haskell"><span class="dt">Pass</span></code> in which the error is thrown. <code class="sourceCode haskell"><span class="dt">Result</span></code> is a type alias that represents either an error or a result. Finally, we put all the parsers together to write the <code>parse</code> function.</p>
<h2 data-track-content data-content-name="the-parser" data-content-piece="arithmetic-bytecode-vm-parser" id="the-parser">The Parser</h2>
<p>Our <code>parseSized</code> function uses the <code>parse</code> function from attoparsec to run the <code>exprParser</code> over an input.</p>
<figure>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseSized ::</span> <span class="dt">BS.ByteString</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">SizedExpr</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>parseSized <span class="ot">=</span> processResult <span class="op">.</span> P.parse (exprParser <span class="op"><*</span> P.skipSpace)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> processResult <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Done</span> <span class="st">""</span> res <span class="ot">-></span> <span class="fu">pure</span> res</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Done</span> leftover _ <span class="ot">-></span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> throwParseError <span class="op">$</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> <span class="st">"Leftover input: \""</span> <span class="op"><></span> BSC.unpack leftover <span class="op"><></span> <span class="st">"\""</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Partial</span> f <span class="ot">-></span> processResult <span class="op">$</span> f <span class="st">""</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Fail</span> _ [] err <span class="ot">-></span></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a> throwParseError <span class="op">.</span> capitalize <span class="op">.</span> fromMaybe err <span class="op">$</span></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a> List.stripPrefix <span class="st">"Failed reading: "</span> err</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Fail</span> <span class="st">""</span> ctxs _ <span class="ot">-></span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a> throwParseError <span class="op">$</span></span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected: "</span> <span class="op"><></span> formatExpected ctxs <span class="op"><></span> <span class="st">", got: end-of-input"</span></span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">P.Fail</span> leftover ctxs _ <span class="ot">-></span></span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a> throwParseError <span class="op">$</span></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected: "</span> <span class="op"><></span> formatExpected ctxs</span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">", got: \""</span> <span class="op"><></span> <span class="fu">head</span> (<span class="fu">words</span> <span class="op">$</span> BSC.unpack leftover) <span class="op"><></span> <span class="st">"\""</span></span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a> capitalize <span class="op">~</span>(c <span class="op">:</span> cs) <span class="ot">=</span> <span class="fu">toUpper</span> c <span class="op">:</span> cs</span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a> formatExpected ctxs <span class="ot">=</span> <span class="kw">case</span> <span class="fu">last</span> ctxs <span class="kw">of</span></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a> [c] <span class="ot">-></span> <span class="st">"\'"</span> <span class="op"><></span> [c] <span class="op"><></span> <span class="st">"\'"</span></span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a> s <span class="ot">-></span> s</span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a> throwParseError <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">Error</span> <span class="dt">Parse</span></span>
<span id="cb12-28"><a href="#cb12-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-29"><a href="#cb12-29" aria-hidden="true" tabindex="-1"></a><span class="ot">parse ::</span> <span class="dt">BS.ByteString</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Expr</span></span>
<span id="cb12-30"><a href="#cb12-30" aria-hidden="true" tabindex="-1"></a>parse <span class="ot">=</span> <span class="fu">fmap</span> <span class="fu">fst</span> <span class="op">.</span> parseSized</span>
<span id="cb12-31"><a href="#cb12-31" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# INLINE parse #-}</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>The <code>processResult</code> function deals with intricacies of how attoparsec returns the parsing result. Basically, we inspect the returned result and throw appropriate errors with useful error messages. We use <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Error.html#v:throwError" target="_blank" rel="noopener"><code>throwError</code></a> from the <code class="sourceCode haskell"><span class="dt">MonadError</span></code> typeclass that works with all its instances, which <code class="sourceCode haskell"><span class="dt">Either</span></code> is one of.</p>
<p>Finally, we throw away the bytecode size from the result of <code>parseSized</code> in the <code>parse</code> function.</p>
<p>The parser is done. But as good programmers, we must make sure that it works correctly. Let’s write some unit tests.</p>
<h2 data-track-content data-content-name="testing-the-parser" data-content-piece="arithmetic-bytecode-vm-parser" id="testing-the-parser">Testing the Parser</h2>
<p>We use the <a href="https://hspec.github.io/" target="_blank" rel="noopener">hspec</a> library to write unit tests for our program. Each test is written as a <a href="https://en.wikipedia.org/wiki/Behavior-driven_development#Behavioral_specifications" target="_blank" rel="noopener">spec</a><a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<figure>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE OverloadedStrings #-}</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> (main) <span class="kw">where</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">ArithVMLib</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Arrow</span> ((>>>))</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (forM_, (>=>))</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteString.Char8</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BSC</span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Int</span> (<span class="dt">Int16</span>)</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Sequence</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Seq</span></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.Hspec</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.Hspec.QuickCheck</span></span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.QuickCheck</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Q</span></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a><span class="ot">parserSpec ::</span> <span class="dt">Spec</span></span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a>parserSpec <span class="ot">=</span> describe <span class="st">"Parser"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> forM_ parserSuccessTests <span class="op">$</span> \(input, result) <span class="ot">-></span></span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"parses: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> (<span class="fu">show</span> <span class="op"><$></span> parse input) <span class="ot">`shouldBe`</span> <span class="dt">Right</span> result</span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a> forM_ parserErrorTests <span class="op">$</span> \(input, err) <span class="ot">-></span></span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"fails for: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a> parse input <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb13-25"><a href="#cb13-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Error</span> <span class="dt">Parse</span> msg) <span class="op">|</span> err <span class="op">==</span> msg <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb13-26"><a href="#cb13-26" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb13-27"><a href="#cb13-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-28"><a href="#cb13-28" aria-hidden="true" tabindex="-1"></a><span class="ot">parserSuccessTests ::</span> [(<span class="dt">BSC.ByteString</span>, <span class="dt">String</span>)]</span>
<span id="cb13-29"><a href="#cb13-29" aria-hidden="true" tabindex="-1"></a>parserSuccessTests <span class="ot">=</span></span>
<span id="cb13-30"><a href="#cb13-30" aria-hidden="true" tabindex="-1"></a> [ ( <span class="st">"1 + 2 - 3 * 4 + 5 / 6 / 0 + 1"</span>,</span>
<span id="cb13-31"><a href="#cb13-31" aria-hidden="true" tabindex="-1"></a> <span class="st">"((((1 + 2) - (3 * 4)) + ((5 / 6) / 0)) + 1)"</span></span>
<span id="cb13-32"><a href="#cb13-32" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-33"><a href="#cb13-33" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"1+2-3*4+5/6/0+1"</span>,</span>
<span id="cb13-34"><a href="#cb13-34" aria-hidden="true" tabindex="-1"></a> <span class="st">"((((1 + 2) - (3 * 4)) + ((5 / 6) / 0)) + 1)"</span></span>
<span id="cb13-35"><a href="#cb13-35" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-36"><a href="#cb13-36" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"1 + -1"</span>,</span>
<span id="cb13-37"><a href="#cb13-37" aria-hidden="true" tabindex="-1"></a> <span class="st">"(1 + -1)"</span></span>
<span id="cb13-38"><a href="#cb13-38" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-39"><a href="#cb13-39" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in x + 1"</span>,</span>
<span id="cb13-40"><a href="#cb13-40" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in (x + 1))"</span></span>
<span id="cb13-41"><a href="#cb13-41" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-42"><a href="#cb13-42" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x=4in x+1"</span>,</span>
<span id="cb13-43"><a href="#cb13-43" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in (x + 1))"</span></span>
<span id="cb13-44"><a href="#cb13-44" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-45"><a href="#cb13-45" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in let y = 5 in x + y"</span>,</span>
<span id="cb13-46"><a href="#cb13-46" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in (let y = 5 in (x + y)))"</span></span>
<span id="cb13-47"><a href="#cb13-47" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-48"><a href="#cb13-48" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in let y = 5 in x + let z = y in z * z"</span>,</span>
<span id="cb13-49"><a href="#cb13-49" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in (let y = 5 in (x + (let z = y in (z * z)))))"</span></span>
<span id="cb13-50"><a href="#cb13-50" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-51"><a href="#cb13-51" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z"</span>,</span>
<span id="cb13-52"><a href="#cb13-52" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in ((let y = 5 in (x + 1)) + (let z = 2 in (z * z))))"</span></span>
<span id="cb13-53"><a href="#cb13-53" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-54"><a href="#cb13-54" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x=4in 2+let y=x-5in x+let z=y+1in z/2"</span>,</span>
<span id="cb13-55"><a href="#cb13-55" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = 4 in (2 + (let y = (x - 5) in (x + (let z = (y + 1) in (z / 2))))))"</span></span>
<span id="cb13-56"><a href="#cb13-56" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-57"><a href="#cb13-57" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = (let y = 3 in y + y) in x * 3"</span>,</span>
<span id="cb13-58"><a href="#cb13-58" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = (let y = 3 in (y + y)) in (x * 3))"</span></span>
<span id="cb13-59"><a href="#cb13-59" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-60"><a href="#cb13-60" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = let y = 3 in y + y in x * 3"</span>,</span>
<span id="cb13-61"><a href="#cb13-61" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = (let y = 3 in (y + y)) in (x * 3))"</span></span>
<span id="cb13-62"><a href="#cb13-62" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-63"><a href="#cb13-63" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3"</span>,</span>
<span id="cb13-64"><a href="#cb13-64" aria-hidden="true" tabindex="-1"></a> <span class="st">"(let x = (let y = (1 + (let z = 2 in (z * z))) in (y + 1)) in (x * 3))"</span></span>
<span id="cb13-65"><a href="#cb13-65" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb13-66"><a href="#cb13-66" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb13-67"><a href="#cb13-67" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-68"><a href="#cb13-68" aria-hidden="true" tabindex="-1"></a><span class="ot">parserErrorTests ::</span> [(<span class="dt">BSC.ByteString</span>, <span class="dt">String</span>)]</span>
<span id="cb13-69"><a href="#cb13-69" aria-hidden="true" tabindex="-1"></a>parserErrorTests <span class="ot">=</span></span>
<span id="cb13-70"><a href="#cb13-70" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">""</span>, <span class="st">"Not enough input"</span>),</span>
<span id="cb13-71"><a href="#cb13-71" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 +"</span>, <span class="st">"Leftover input: \"+\""</span>),</span>
<span id="cb13-72"><a href="#cb13-72" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 & 1"</span>, <span class="st">"Leftover input: \"& 1\""</span>),</span>
<span id="cb13-73"><a href="#cb13-73" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 + 1 & 1"</span>, <span class="st">"Leftover input: \"& 1\""</span>),</span>
<span id="cb13-74"><a href="#cb13-74" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 & 1 + 1"</span>, <span class="st">"Leftover input: \"& 1 + 1\""</span>),</span>
<span id="cb13-75"><a href="#cb13-75" aria-hidden="true" tabindex="-1"></a> (<span class="st">"("</span>, <span class="st">"Not enough input"</span>),</span>
<span id="cb13-76"><a href="#cb13-76" aria-hidden="true" tabindex="-1"></a> (<span class="st">"(1"</span>, <span class="st">"Expected: ')', got: end-of-input"</span>),</span>
<span id="cb13-77"><a href="#cb13-77" aria-hidden="true" tabindex="-1"></a> (<span class="st">"(1 + "</span>, <span class="st">"Expected: ')', got: \"+\""</span>),</span>
<span id="cb13-78"><a href="#cb13-78" aria-hidden="true" tabindex="-1"></a> (<span class="st">"(1 + 2"</span>, <span class="st">"Expected: ')', got: end-of-input"</span>),</span>
<span id="cb13-79"><a href="#cb13-79" aria-hidden="true" tabindex="-1"></a> (<span class="st">"(1 + 2}"</span>, <span class="st">"Expected: ')', got: \"}\""</span>),</span>
<span id="cb13-80"><a href="#cb13-80" aria-hidden="true" tabindex="-1"></a> (<span class="st">"66666"</span>, <span class="st">"Expected a valid Int16, got: 66666"</span>),</span>
<span id="cb13-81"><a href="#cb13-81" aria-hidden="true" tabindex="-1"></a> (<span class="st">"-x"</span>, <span class="st">"Expected: number, got: \"-x\""</span>),</span>
<span id="cb13-82"><a href="#cb13-82" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let 1"</span>, <span class="st">"Expected: identifier, got: \"1\""</span>),</span>
<span id="cb13-83"><a href="#cb13-83" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 1 in "</span>, <span class="st">"Not enough input"</span>),</span>
<span id="cb13-84"><a href="#cb13-84" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let let = 1 in 1"</span>,</span>
<span id="cb13-85"><a href="#cb13-85" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected identifier, got: \"let\", which is a reversed keyword"</span></span>
<span id="cb13-86"><a href="#cb13-86" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-87"><a href="#cb13-87" aria-hidden="true" tabindex="-1"></a> ( <span class="st">"let x = 1 in in"</span>,</span>
<span id="cb13-88"><a href="#cb13-88" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected identifier, got: \"in\", which is a reversed keyword"</span></span>
<span id="cb13-89"><a href="#cb13-89" aria-hidden="true" tabindex="-1"></a> ),</span>
<span id="cb13-90"><a href="#cb13-90" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x=1 inx"</span>, <span class="st">"Expected: space, got: \"x\""</span>),</span>
<span id="cb13-91"><a href="#cb13-91" aria-hidden="true" tabindex="-1"></a> (<span class="st">"letx = 1 in x"</span>, <span class="st">"Leftover input: \"= 1 in x\""</span>),</span>
<span id="cb13-92"><a href="#cb13-92" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x ~ 1 in x"</span>, <span class="st">"Expected: \"=\", got: \"~\""</span>),</span>
<span id="cb13-93"><a href="#cb13-93" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 1 & 2 in x"</span>, <span class="st">"Expected: \"in\", got: \"&\""</span>),</span>
<span id="cb13-94"><a href="#cb13-94" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 1 inx"</span>, <span class="st">"Expected: space, got: \"x\""</span>),</span>
<span id="cb13-95"><a href="#cb13-95" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 1 in x +"</span>, <span class="st">"Leftover input: \"+\""</span>),</span>
<span id="cb13-96"><a href="#cb13-96" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 1 in x in"</span>, <span class="st">"Leftover input: \"in\""</span>),</span>
<span id="cb13-97"><a href="#cb13-97" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = let x = 1 in x"</span>, <span class="st">"Expected: \"in\", got: end-of-input"</span>)</span>
<span id="cb13-98"><a href="#cb13-98" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>We have a bunch of tests for the parser, testing both success and failure cases. Notice how spaces are treated in the expressions. Also notice how the <code class="sourceCode haskell"><span class="kw">let</span></code> expressions are parsed. We’ll add property-based tests for the parser in the next post.</p>
<p>There is not much we can do with the parsed <abbr title="Abstract Syntax Tree">AST</abbr>s at this point. Let’s write an interpreter to evaluate our <abbr title="Abstract Syntax Tree">AST</abbr>s.</p>
<h2 data-track-content data-content-name="the-ast-interpreter" data-content-piece="arithmetic-bytecode-vm-parser" id="the-ast-interpreter">The AST Interpreter</h2>
<p>The <abbr title="Abstract Syntax Tree">AST</abbr> interpreter is a standard and short recursive interpreter with an environment mapping variables to their values:</p>
<figure>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretAST ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Result</span> <span class="dt">Int16</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>interpretAST <span class="ot">=</span> go Map.empty</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> go env <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> n <span class="ot">-></span> <span class="fu">pure</span> n</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Var</span> x <span class="ot">-></span> <span class="kw">case</span> Map.lookup x env <span class="kw">of</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> v <span class="ot">-></span> <span class="fu">pure</span> v</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> throwInterpretError <span class="op">$</span> <span class="st">"Unknown variable: "</span> <span class="op"><></span> <span class="fu">show</span> x</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">BinOp</span> op a b <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>a' <span class="ot"><-</span> go env a</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>b' <span class="ot"><-</span> go env b</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> op <span class="kw">of</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Add</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$!</span> a' <span class="op">+</span> b'</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Sub</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$!</span> a' <span class="op">-</span> b'</span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Mul</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$!</span> a' <span class="op">*</span> b'</span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">Div</span> <span class="op">|</span> b' <span class="op">==</span> <span class="dv">0</span> <span class="ot">-></span> throwInterpretError <span class="st">"Division by zero"</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Div</span> <span class="op">|</span> b' <span class="op">==</span> (<span class="op">-</span><span class="dv">1</span>) <span class="op">&&</span> a' <span class="op">==</span> <span class="fu">minBound</span> <span class="ot">-></span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> throwInterpretError <span class="st">"Arithmetic overflow"</span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">Div</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$!</span> a' <span class="ot">`div`</span> b'</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Let</span> x assign body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a> <span class="op">!</span>val <span class="ot"><-</span> go env assign</span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a> go (Map.insert x val env) body</span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a> throwInterpretError <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">Error</span> <span class="dt">InterpretAST</span></span></code></pre></div>
<figcaption>
ArithVMLib.hs
</figcaption>
</figure>
<p>This interpreter serves both as a performance baseline for the bytecode <abbr title="Virtual Machine">VM</abbr> we write later, and as a definitional interpreter for testing the <abbr title="Virtual Machine">VM</abbr><a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. We are careful in detecting division-by-zero and arithmetic overflow errors, but we ignore possible integer overflow/underflow errors that may be caused by the arithmetic operations.</p>
<h2 data-track-content data-content-name="testing-the-interpreter" data-content-piece="arithmetic-bytecode-vm-parser" id="testing-the-interpreter">Testing the Interpreter</h2>
<p>We write some unit tests for the interpreter following the same pattern as the parser:</p>
<figure>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">astInterpreterSpec ::</span> <span class="dt">Spec</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>astInterpreterSpec <span class="ot">=</span> describe <span class="st">"AST interpreter"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> forM_ astInterpreterSuccessTests <span class="op">$</span> \(input, result) <span class="ot">-></span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"interprets: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> parseInterpret input <span class="ot">`shouldBe`</span> <span class="dt">Right</span> result</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> forM_ astInterpreterErrorTests <span class="op">$</span> \(input, err) <span class="ot">-></span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> it (<span class="st">"fails for: \""</span> <span class="op"><></span> BSC.unpack input <span class="op"><></span> <span class="st">"\""</span>) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> parseInterpret input <span class="ot">`shouldSatisfy`</span> \<span class="kw">case</span></span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Error</span> <span class="dt">InterpretAST</span> msg) <span class="op">|</span> err <span class="op">==</span> msg <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> parseInterpret <span class="ot">=</span> parse <span class="op">>=></span> interpretAST</span>
<span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a><span class="ot">astInterpreterSuccessTests ::</span> [(<span class="dt">BSC.ByteString</span>, <span class="dt">Int16</span>)]</span>
<span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a>astInterpreterSuccessTests <span class="ot">=</span></span>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">"1"</span>, <span class="dv">1</span>),</span>
<span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 + 2 - 3 * 4 + 5 / 6 / 1 + 1"</span>, <span class="op">-</span><span class="dv">8</span>),</span>
<span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)"</span>, <span class="op">-</span><span class="dv">3</span>),</span>
<span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 + -1"</span>, <span class="dv">0</span>),</span>
<span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1 * -1"</span>, <span class="op">-</span><span class="dv">1</span>),</span>
<span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in x + 1"</span>, <span class="dv">5</span>),</span>
<span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in let x = x + 1 in x + 2"</span>, <span class="dv">7</span>),</span>
<span id="cb15-24"><a href="#cb15-24" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in let y = 5 in x + y"</span>, <span class="dv">9</span>),</span>
<span id="cb15-25"><a href="#cb15-25" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in let y = 5 in x + let z = y in z * z"</span>, <span class="dv">29</span>),</span>
<span id="cb15-26"><a href="#cb15-26" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z"</span>, <span class="dv">13</span>),</span>
<span id="cb15-27"><a href="#cb15-27" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = let y = 3 in y + y in x * 3"</span>, <span class="dv">18</span>),</span>
<span id="cb15-28"><a href="#cb15-28" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3"</span>, <span class="dv">18</span>)</span>
<span id="cb15-29"><a href="#cb15-29" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb15-30"><a href="#cb15-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-31"><a href="#cb15-31" aria-hidden="true" tabindex="-1"></a><span class="ot">astInterpreterErrorTests ::</span> [(<span class="dt">BSC.ByteString</span>, <span class="dt">String</span>)]</span>
<span id="cb15-32"><a href="#cb15-32" aria-hidden="true" tabindex="-1"></a>astInterpreterErrorTests <span class="ot">=</span></span>
<span id="cb15-33"><a href="#cb15-33" aria-hidden="true" tabindex="-1"></a> [ (<span class="st">"x"</span>, <span class="st">"Unknown variable: x"</span>),</span>
<span id="cb15-34"><a href="#cb15-34" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = 4 in y + 1"</span>, <span class="st">"Unknown variable: y"</span>),</span>
<span id="cb15-35"><a href="#cb15-35" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = y + 1 in x"</span>, <span class="st">"Unknown variable: y"</span>),</span>
<span id="cb15-36"><a href="#cb15-36" aria-hidden="true" tabindex="-1"></a> (<span class="st">"let x = x + 1 in x"</span>, <span class="st">"Unknown variable: x"</span>),</span>
<span id="cb15-37"><a href="#cb15-37" aria-hidden="true" tabindex="-1"></a> (<span class="st">"1/0"</span>, <span class="st">"Division by zero"</span>),</span>
<span id="cb15-38"><a href="#cb15-38" aria-hidden="true" tabindex="-1"></a> (<span class="st">"-32768 / -1"</span>, <span class="st">"Arithmetic overflow"</span>)</span>
<span id="cb15-39"><a href="#cb15-39" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<p>Now, we can run the parser and interpreter tests to make sure that everything works correctly.</p>
<figure>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> hspec <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> parserSpec</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> astInterpreterSpec</span></code></pre></div>
<figcaption>
ArithVMSpec.hs
</figcaption>
</figure>
<details>
<summary>
Output of the test run
</summary>
<pre class="plain"><code>$ cabal test -O2
Running 1 test suites...
Test suite specs: RUNNING...
Parser
parses: "1 + 2 - 3 * 4 + 5 / 6 / 0 + 1" [✔]
parses: "1+2-3*4+5/6/0+1" [✔]
parses: "1 + -1" [✔]
parses: "let x = 4 in x + 1" [✔]
parses: "let x=4in x+1" [✔]
parses: "let x = 4 in let y = 5 in x + y" [✔]
parses: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
parses: "let x = 4 in (let y = 5 in x + 1) + let z = 2 in z * z" [✔]
parses: "let x=4in 2+let y=x-5in x+let z=y+1in z/2" [✔]
parses: "let x = (let y = 3 in y + y) in x * 3" [✔]
parses: "let x = let y = 3 in y + y in x * 3" [✔]
parses: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "" [✔]
fails for: "1 +" [✔]
fails for: "1 & 1" [✔]
fails for: "1 + 1 & 1" [✔]
fails for: "1 & 1 + 1" [✔]
fails for: "(" [✔]
fails for: "(1" [✔]
fails for: "(1 + " [✔]
fails for: "(1 + 2" [✔]
fails for: "(1 + 2}" [✔]
fails for: "66666" [✔]
fails for: "-x" [✔]
fails for: "let 1" [✔]
fails for: "let x = 1 in " [✔]
fails for: "let let = 1 in 1" [✔]
fails for: "let x = 1 in in" [✔]
fails for: "let x=1 inx" [✔]
fails for: "letx = 1 in x" [✔]
fails for: "let x ~ 1 in x" [✔]
fails for: "let x = 1 & 2 in x" [✔]
fails for: "let x = 1 inx" [✔]
fails for: "let x = 1 in x +" [✔]
fails for: "let x = 1 in x in" [✔]
fails for: "let x = let x = 1 in x" [✔]
AST interpreter
interprets: "1" [✔]
interprets: "1 + 2 - 3 * 4 + 5 / 6 / 1 + 1" [✔]
interprets: "1 + (2 - 3) * 4 + 5 / 6 / (1 + 1)" [✔]
interprets: "1 + -1" [✔]
interprets: "1 * -1" [✔]
interprets: "let x = 4 in x + 1" [✔]
interprets: "let x = 4 in let x = x + 1 in x + 2" [✔]
interprets: "let x = 4 in let y = 5 in x + y" [✔]
interprets: "let x = 4 in let y = 5 in x + let z = y in z * z" [✔]
interprets: "let x = 4 in (let y = 5 in x + y) + let z = 2 in z * z" [✔]
interprets: "let x = let y = 3 in y + y in x * 3" [✔]
interprets: "let x = let y = 1 + let z = 2 in z * z in y + 1 in x * 3" [✔]
fails for: "x" [✔]
fails for: "let x = 4 in y + 1" [✔]
fails for: "let x = y + 1 in x" [✔]
fails for: "let x = x + 1 in x" [✔]
fails for: "1/0" [✔]
fails for: "-32768 / -1" [✔]
Finished in 0.0058 seconds
54 examples, 0 failures
Test suite specs: PASS</code></pre>
</details>
<p>Awesome, it works! That’s it for this post. Let’s update our checklist:</p>
<ul class="task-list">
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#parsing-expressions">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#testing-the-parser">Unit testing for our parser.</a></label></li>
<li><label><input type="checkbox" checked></input><a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#the-ast-interpreter">Interpreting ASTs.</a></label></li>
<li><label><input type="checkbox"></input>Compiling ASTs to bytecode.</label></li>
<li><label><input type="checkbox"></input>Disassembling and decompiling bytecode.</label></li>
<li><label><input type="checkbox"></input>Unit testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox"></input>Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox"></input>Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox"></input>Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox"></input>All the while keeping an eye on performance.</label></li>
</ul>
<p>In the <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">next part</a>, we write a bytecode compiler for our expression <abbr title="Abstract Syntax Tree">AST</abbr>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Variables are scoped to the body of the <code class="sourceCode haskell"><span class="kw">let</span></code> expressions they are introduced in, that is, our language has <a href="https://en.wikipedia.org/wiki/lexical_scoping" target="_blank" rel="noopener">lexical scoping</a>. Also, variables with same name in inner <code class="sourceCode haskell"><span class="kw">let</span></code>s <a href="https://en.wikipedia.org/wiki/Variable_shadowing" target="_blank" rel="noopener">shadow</a> the variables in outer <code class="sourceCode haskell"><span class="kw">let</span></code>s.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>If you are wondering why do this at all, when we can directly run the expressions while parsing, I think this is a great little project to learn how to write performant bytecode compilers and <abbr title="Virtual Machine">VM</abbr>s in Haskell.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Bangs (<code class="sourceCode haskell"><span class="op">!</span></code>) that enforce strictness are placed in the <code class="sourceCode haskell"><span class="dt">Expr</span></code> <abbr title="Algebraic Data Type">ADT</abbr> (and also in the later code) at the right positions that provide performance benefits. The right positions were found by profiling the program. A bang placed at a wrong position (for example in front of <code class="sourceCode haskell"><span class="dt">Expr</span></code> inside <code class="sourceCode haskell"><span class="dt">BinOp</span></code>) may ruin the compiler provided optimizations and make the overall program slower.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>attoparsec is very fast, but there are <a href="https://gitlab.com/FinnBender/haskell-parsing-benchmarks/" target="_blank" rel="noopener">faster parsing libraries</a> in Haskell. On the other hand, attoparsec does not provided great error messages. If the user experience were a higher priority, I’d use the <a href="https://hackage.haskell.org/package/megaparsec/" target="_blank" rel="noopener">megaparsec</a> library. I find attoparsec to have the right balance of performance, developer experience and user experience. Handwritten parsers from scratch could be faster, but they’d be harder to maintain and use.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>I wrote the first version of the parser using the <a href="https://hackage.haskell.org/package/base/docs/Text-ParserCombinators-ReadP.html" target="_blank" rel="noopener"><code>ReadP</code></a> library that comes with Haskell standard library. I rewrote it to use attoparsec and found that the rewritten parser was more than 10x faster.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>You don’t need to think about the bytecode size of expressions right now. It’ll become clear when we go over compilation in the next post.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>Certain functions such as <code>chainBinOps</code> are inlined using the <a href="https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/pragmas.html#inline-pragma" target="_blank" rel="noopener"><code>INLINE</code></a> pragma to improve the program performance. The functions to inline were chosen by profiling.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>Since the numbers need to be encoded into bytes when we compile to bytecode, we need to choose <em>some</em> encoding for them. For simpler code, we choose 2-byte integers.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Testing your parsers is crucial because that’s your programming languages’ interface to the users. Also because writing (fast) parsers is difficult and error-prone. Most of the bugs I found in this program were in the parser.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>Again, notice the carefully placed bangs to enforce strictness. Try to figure out why they are placed at some places and not at others.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>A Fast Bytecode VM for Arithmetic</strong>.</p>
<ol>
<li>
<strong>The Parser</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-compiler/?mtm_campaign=feed">The Compiler</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm/?mtm_campaign=feed">The Virtual Machine</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/arithmetic-bytecode-vm-parser/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2025-08-02T00:00:00Z <p>In this series of posts, we write a fast bytecode compiler and a virtual machine for arithmetic in Haskell. We explore the following topics:</p>
<ul class="task-list">
<li><label><input type="checkbox" /><span class="todo">Parsing arithmetic expressions to Abstract Syntax Trees (ASTs).</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Unit testing for our parser.</span></label></li>
<li><label><input type="checkbox" /><span class="todo">Interpreting ASTs.</span></label></li>
<li><label><input type="checkbox" />Compiling ASTs to bytecode.</label></li>
<li><label><input type="checkbox" />Disassembling and decompiling bytecode.</label></li>
<li><label><input type="checkbox" />Unit testing for our compiler.</label></li>
<li><label><input type="checkbox" />Property-based testing for our compiler.</label></li>
<li><label><input type="checkbox" />Efficiently executing bytecode in a virtual machine (VM).</label></li>
<li><label><input type="checkbox" />Unit testing and property-based testing for our <abbr title="Virtual Machine">VM</abbr>.</label></li>
<li><label><input type="checkbox" />Benchmarking our code to see how the different passes perform.</label></li>
<li><label><input type="checkbox" />All the while keeping an eye on performance.</label></li>
</ul>
<p>In this post, we write the parser for our expression language to an <abbr title="Abstract Syntax Tree">AST</abbr>, and an <abbr title="Abstract Syntax Tree">AST</abbr> interpreter.</p>
https://abhinavsarkar.net/posts/brainfuck-interpreter/ Interpreting Brainfuck in Haskell 2025-01-19T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer,
and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/brainfuck-interpreter/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more-->
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#setup">Setup</a></li><li><a href="#string-interpreter">String Interpreter</a></li><li><a href="#ast-interpreter">AST Interpreter</a></li><li><a href="#bytecode-interpreter">Bytecode Interpreter</a></li><li><a href="#optimizing-bytecode-interpreter">Optimizing Bytecode Interpreter</a></li><li><a href="#comparison">Comparison</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="brainfuck-interpreter" id="introduction">Introduction</h2>
<p><a href="https://en.wikipedia.org/wiki/Brainfuck" target="_blank" rel="noopener">Brainfuck</a> (henceforth BF) is the most famous of esoteric programming languages. Its fame lies in
the fact that it is extremely minimalist, with only eight instructions, and very easy to implement.
Yet, it is Turing-complete and as capable as any other programming language<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Writing
an interpreter for <abbr title="Brainfuck">BF</abbr> is a fun exercise, and so there are hundreds, maybe even thousands of them. Since <abbr title="Brainfuck">BF</abbr>
is very verbose, optimizing <abbr title="Brainfuck">BF</abbr> interpreters is almost a sport, with people posting benchmarks of their
creations. I can’t say that what I have in this post is novel, but it was definitely a fun exercise for me.</p>
<p><abbr title="Brainfuck">BF</abbr> has eight instructions of one character each. A <abbr title="Brainfuck">BF</abbr> program is a sequence of these instructions. It may have other characters as well, which are treated as comments and are ignored while executing. An instruction pointer (IP) points at the next instruction to be executed, starting with the first instruction. The instructions are executed sequentially, except for the jump instructions that may cause the <abbr title="Instruction pointer">IP</abbr> to jump to remote instructions. The program terminates when the <abbr title="Instruction pointer">IP</abbr> moves past the last instruction.</p>
<p><abbr title="Brainfuck">BF</abbr> programs work by modifying data in a memory that is an array of at least 30000 byte cells initialized to zero. A data pointer (DP) points to the current byte of the memory to be modified, starting with the first byte of the memory. <abbr title="Brainfuck">BF</abbr> programs can also read from standard input and write to standard output, one byte at a time using the <a href="https://en.wikipedia.org/wiki/ASCII" target="_blank" rel="noopener">ASCII</a> character encoding.</p>
<p>The eight <abbr title="Brainfuck">BF</abbr> instructions each consist of a single character:</p>
<dl>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="kw">></span></code></dt>
<dd>
Increment the <abbr title="Data pointer">DP</abbr> by one to point to the next cell to the right.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="kw"><</span></code></dt>
<dd>
Decrement the <abbr title="Data pointer">DP</abbr> by one to point to the next cell to the left.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="st">+</span></code></dt>
<dd>
Increment the byte at the <abbr title="Data pointer">DP</abbr> by one.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="st">-</span></code></dt>
<dd>
Decrement the byte at the <abbr title="Data pointer">DP</abbr> by one.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="fl">.</span></code></dt>
<dd>
Output the byte at the <abbr title="Data pointer">DP</abbr>.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="fl">,</span></code></dt>
<dd>
Accept one byte of input, and store its value in the byte at the <abbr title="Data pointer">DP</abbr>.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code></dt>
<dd>
If the byte at the <abbr title="Data pointer">DP</abbr> is zero, then instead of moving the <abbr title="Instruction pointer">IP</abbr> forward to the next command, jump it forward to the command after the matching <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">]</span></code> command.
</dd>
<dt><code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">]</span></code></dt>
<dd>
If the byte at the <abbr title="Data pointer">DP</abbr> is nonzero, then instead of moving the <abbr title="Instruction pointer">IP</abbr> forward to the next command, jump it back to the command after the matching <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code> command.
</dd>
</dl>
<p>Each <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code> matches exactly one <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">]</span></code> and vice versa, and the <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code> comes first. Together, they add conditions and loops to <abbr title="Brainfuck">BF</abbr>.</p>
<p>Some details are left to implementations. In our case, we assume that the memory cells are signed bytes that underflow and overflow without errors. Also, accessing the memory beyond array boundaries wraps to the opposite side without errors.</p>
<p>For a taste, here is a small <abbr title="Brainfuck">BF</abbr> program that prints <code>Hello, World!</code> when run:</p>
<div class="sourceCode" data-lang="brainfuck"><pre class="sourceCode brainfuck"><code class="sourceCode brainfuck"><span id="1"><a href="#1" aria-hidden="true" tabindex="-1"></a><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="cn">[</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="kw">></span><span class="st">+</span><span class="kw"><</span><span class="kw"><</span><span class="kw"><</span><span class="kw"><</span><span class="kw"><</span><span class="kw"><</span><span class="st">-</span><span class="cn">]</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="st">+</span></span>
<span id="2"><a href="#2" aria-hidden="true" tabindex="-1"></a><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="fl">.</span><span class="kw">></span><span class="st">+</span><span class="st">+</span><span class="fl">.</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="fl">.</span><span class="fl">.</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="fl">.</span><span class="kw">></span><span class="kw">></span><span class="fl">.</span><span class="kw">></span><span class="st">-</span><span class="fl">.</span><span class="kw"><</span><span class="kw"><</span><span class="st">-</span><span class="fl">.</span><span class="kw"><</span><span class="fl">.</span><span class="st">+</span><span class="st">+</span><span class="st">+</span><span class="fl">.</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="fl">.</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="st">-</span><span class="fl">.</span><span class="kw">></span><span class="kw">></span><span class="kw">></span><span class="st">+</span><span class="fl">.</span><span class="kw">></span><span class="st">-</span><span class="fl">.</span></span></code></pre></div>
<p>As you can imagine, interpreting <abbr title="Brainfuck">BF</abbr> is easy, at least when doing it naively. So instead of writing one interpreter, we are going to write four, with increasing performance and complexity.</p>
<h2 data-track-content data-content-name="setup" data-content-piece="brainfuck-interpreter" id="setup">Setup</h2>
<p>First, some imports:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase #-}</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TypeFamilies #-}</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Arrow</span> ((>>>))</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (void)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Bits</span> (shiftR, (.&.))</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.ByteArray</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">BA</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (chr, ord)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Functor</span> (($>))</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Int</span> (<span class="dt">Int8</span>)</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Kind</span> (<span class="dt">Type</span>)</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Vector</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">V</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Vector.Storable.Mutable</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">MV</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Word</span> (<span class="dt">Word16</span>, <span class="dt">Word8</span>)</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Foreign.Ptr</span> (<span class="dt">Ptr</span>, castPtr, minusPtr, plusPtr)</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Foreign.Storable</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">S</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Environment</span> (getArgs, getProgName)</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Exit</span> (exitFailure)</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.IO</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">IO</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.ParserCombinators.ReadP</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">P</span></span></code></pre></div>
<p>We use the <code class="sourceCode haskell"><span class="dt">GHC2021</span></code> extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the <a href="https://hackage.haskell.org/package/memory" target="_blank" rel="noopener">memory</a> and <a href="https://hackage.haskell.org/package/vector" target="_blank" rel="noopener">vector</a> libraries.</p>
<p>We abstract the interpreter interface as a typeclass:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Interpreter</span> a <span class="kw">where</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">data</span> <span class="dt">Program</span><span class="ot"> a ::</span> <span class="dt">Type</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> parse ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Program</span> a</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> interpret ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">Program</span> a <span class="ot">-></span> <span class="dt">IO</span> ()</span></code></pre></div>
<p>An <code class="sourceCode haskell"><span class="dt">Interpreter</span></code> is specified by a data type <code class="sourceCode haskell"><span class="dt">Program</span></code> and two functions: <code>parse</code> parses a string to a <code class="sourceCode haskell"><span class="dt">Program</span></code>, and <code>interpret</code> interprets the parsed <code class="sourceCode haskell"><span class="dt">Program</span></code>.</p>
<p>For modeling the mutable memory, we use a mutable unboxed <a href="https://hackage.haskell.org/package/vector/docs/Data-Vector-Mutable.html#t:IOVector" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IOVector</span></code></a> of signed bytes (<code class="sourceCode haskell"><span class="dt">Int8</span></code>) from the vector package. Since our interpreter runs in <code class="sourceCode haskell"><span class="dt">IO</span></code>, this works well for us. The <abbr title="Data pointer">DP</abbr> hence, is modelled as a index in this vector, which we name the <code class="sourceCode haskell"><span class="dt">MemIdx</span></code> type.</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Memory</span> <span class="ot">=</span> <span class="dt">Memory</span> {<span class="ot">unMemory ::</span> <span class="dt">MV.IOVector</span> <span class="dt">Int8</span>}</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">MemIdx</span> <span class="ot">=</span> <span class="dt">Int</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="ot">newMemory ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">IO</span> <span class="dt">Memory</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>newMemory <span class="ot">=</span> <span class="fu">fmap</span> <span class="dt">Memory</span> <span class="op">.</span> MV.new</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="ot">memorySize ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>memorySize <span class="ot">=</span> MV.length <span class="op">.</span> unMemory</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="ot">readMemory ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">IO</span> <span class="dt">Int8</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>readMemory <span class="ot">=</span> MV.unsafeRead <span class="op">.</span> unMemory</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="ot">writeMemory ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">Int8</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a>writeMemory <span class="ot">=</span> MV.unsafeWrite <span class="op">.</span> unMemory</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">modifyMemory ::</span> <span class="dt">Memory</span> <span class="ot">-></span> (<span class="dt">Int8</span> <span class="ot">-></span> <span class="dt">Int8</span>) <span class="ot">-></span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>modifyMemory <span class="ot">=</span> MV.unsafeModify <span class="op">.</span> unMemory</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a><span class="ot">nextMemoryIndex ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">MemIdx</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a>nextMemoryIndex memory memIdx <span class="ot">=</span> (memIdx <span class="op">+</span> <span class="dv">1</span>) <span class="ot">`rem`</span> memorySize memory</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="ot">prevMemoryIndex ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">MemIdx</span></span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a>prevMemoryIndex memory memIdx <span class="ot">=</span> (memIdx <span class="op">-</span> <span class="dv">1</span>) <span class="ot">`mod`</span> memorySize memory</span></code></pre></div>
<p>We wrap the <code class="sourceCode haskell"><span class="dt">IOVector</span> <span class="dt">Int8</span></code> with a <code class="sourceCode haskell"><span class="dt">Memory</span></code> <code class="sourceCode haskell"><span class="kw">newtype</span></code>. <code>newMemory</code> creates a new memory array of bytes initialized to zero. <code>memorySize</code> returns the size of the memory. <code>readMemory</code>, <code>writeMemory</code> and <code>modifyMemory</code> are for reading from, writing to and modifying the memory respectively. <code>nextMemoryIndex</code> and <code>prevMemoryIndex</code> increment and decrement the array index respectively, taking care of wrapping at boundaries.</p>
<p>Now we write the <code>main</code> function using the <code class="sourceCode haskell"><span class="dt">Interpreter</span></code> typeclass functions:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">IO</span><span class="op">.</span>hSetBuffering <span class="dt">IO</span><span class="op">.</span>stdin <span class="dt">IO</span><span class="op">.</span><span class="dt">NoBuffering</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">IO</span><span class="op">.</span>hSetBuffering <span class="dt">IO</span><span class="op">.</span>stdout <span class="dt">IO</span><span class="op">.</span><span class="dt">LineBuffering</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> progName <span class="ot"><-</span> getProgName</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> usage <span class="ot">=</span> <span class="st">"Usage: "</span> <span class="op"><></span> progName <span class="op"><></span> <span class="st">" -(s|a|b|o) <bf_file>"</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> getArgs <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> [interpreterType, fileName] <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> <span class="fu">filter</span> (<span class="ot">`elem`</span> <span class="st">"+-.,><[]"</span>) <span class="op"><$></span> <span class="fu">readFile</span> fileName</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> memory <span class="ot"><-</span> newMemory <span class="dv">30000</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> parseAndInterpret memory code usage interpreterType</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> exitWithMsg usage</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> parseAndInterpret memory code usage <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> <span class="st">"-s"</span> <span class="ot">-></span> interpret <span class="op">@</span><span class="dt">StringInterpreter</span> memory <span class="op">$</span> parse code</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> <span class="st">"-a"</span> <span class="ot">-></span> interpret <span class="op">@</span><span class="dt">ASTInterpreter</span> memory <span class="op">$</span> parse code</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a> <span class="st">"-b"</span> <span class="ot">-></span> interpret <span class="op">@</span><span class="dt">BytecodeInterpreter</span> memory <span class="op">$</span> parse code</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a> <span class="st">"-o"</span> <span class="ot">-></span> interpret <span class="op">@</span><span class="dt">OptimizingBytecodeInterpreter</span> memory <span class="op">$</span> parse code</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a> t <span class="ot">-></span> exitWithMsg <span class="op">$</span> <span class="st">"Invalid interpreter type: "</span> <span class="op"><></span> t <span class="op"><></span> <span class="st">"\n"</span> <span class="op"><></span> usage</span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a> exitWithMsg msg <span class="ot">=</span> <span class="dt">IO</span><span class="op">.</span>hPutStrLn <span class="dt">IO</span><span class="op">.</span>stderr msg <span class="op">>></span> exitFailure</span></code></pre></div>
<p>The <code>main</code> function calls the <code>parse</code> and <code>interpret</code> functions for the right interpreter with a new memory and the input string read from the file specified in the command line argument. We make sure to filter out non-<abbr title="Brainfuck">BF</abbr> characters when reading the input file.</p>
<p>With the setup done, let’s move on to our first interpreter.</p>
<h2 data-track-content data-content-name="string-interpreter" data-content-piece="brainfuck-interpreter" id="string-interpreter">String Interpreter</h2>
<p>A <abbr title="Brainfuck">BF</abbr> program can be interpreted directly from its string representation, going over the characters and executing the right logic for them. But strings in Haskell are notoriously slow because they are implemented as singly linked-lists of characters. Indexing into strings has <span class="math inline">\(O(n)\)</span> time complexity, so it is not a good idea to use them directly. Instead, we use a char Zipper<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">StringInterpreter</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Interpreter</span> <span class="dt">StringInterpreter</span> <span class="kw">where</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">data</span> <span class="dt">Program</span> <span class="dt">StringInterpreter</span> <span class="ot">=</span> <span class="dt">ProgramCZ</span> <span class="dt">CharZipper</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> parse <span class="ot">=</span> <span class="dt">ProgramCZ</span> <span class="op">.</span> czFromString</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> interpret memory (<span class="dt">ProgramCZ</span> code) <span class="ot">=</span> interpretCharZipper memory code</span></code></pre></div>
<p>Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">CharZipper</span> <span class="ot">=</span> <span class="dt">CharZipper</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot">czLeft ::</span> <span class="dt">String</span>,<span class="ot"> czFocus ::</span> <span class="dt">Maybe</span> <span class="dt">Char</span>,<span class="ot"> czRight ::</span> <span class="dt">String</span>}</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="ot">czFromString ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">CharZipper</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>czFromString <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">CharZipper</span> [] <span class="dt">Nothing</span> []</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> (x <span class="op">:</span> xs) <span class="ot">-></span> <span class="dt">CharZipper</span> [] (<span class="dt">Just</span> x) xs</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a><span class="ot">czMoveLeft ::</span> <span class="dt">CharZipper</span> <span class="ot">-></span> <span class="dt">CharZipper</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>czMoveLeft <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">CharZipper</span> [] (<span class="dt">Just</span> focus) right <span class="ot">-></span> <span class="dt">CharZipper</span> [] <span class="dt">Nothing</span> (focus <span class="op">:</span> right)</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">CharZipper</span> (x <span class="op">:</span> xs) (<span class="dt">Just</span> focus) right <span class="ot">-></span> <span class="dt">CharZipper</span> xs (<span class="dt">Just</span> x) (focus <span class="op">:</span> right)</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> z <span class="ot">-></span> z</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a><span class="ot">czMoveRight ::</span> <span class="dt">CharZipper</span> <span class="ot">-></span> <span class="dt">CharZipper</span></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a>czMoveRight <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">CharZipper</span> left (<span class="dt">Just</span> focus) [] <span class="ot">-></span> <span class="dt">CharZipper</span> (focus <span class="op">:</span> left) <span class="dt">Nothing</span> []</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">CharZipper</span> left (<span class="dt">Just</span> focus) (x <span class="op">:</span> xs) <span class="ot">-></span> <span class="dt">CharZipper</span> (focus <span class="op">:</span> left) (<span class="dt">Just</span> x) xs</span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> z <span class="ot">-></span> z</span></code></pre></div>
<p>This zipper is a little different from the usual implementations because we need to know when the focus of the zipper has moved out the program boundaries. Hence, we model the focus as <code class="sourceCode haskell"><span class="dt">Maybe</span> <span class="dt">Char</span></code>. <code>czFromString</code> creates a char zipper from a string. <code>czMoveLeft</code> and <code>czMoveRight</code> move the focus left and right respectively, taking care of setting the focus to <code class="sourceCode haskell"><span class="dt">Nothing</span></code> if we move outside the program string.</p>
<p>Parsing the program is thus same as creating the char zipper from the program string. For interpreting the program, we write this function:</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretCharZipper ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">CharZipper</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>interpretCharZipper memory <span class="ot">=</span> go <span class="dv">0</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> go <span class="op">!</span>memIdx <span class="op">!</span>program <span class="ot">=</span> <span class="kw">case</span> czFocus program <span class="kw">of</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">return</span> ()</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'+'</span> <span class="ot">-></span> modifyMemory memory (<span class="op">+</span> <span class="dv">1</span>) memIdx <span class="op">>></span> goNext</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="ch">'-'</span> <span class="ot">-></span> modifyMemory memory (<span class="fu">subtract</span> <span class="dv">1</span>) memIdx <span class="op">>></span> goNext</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'>'</span> <span class="ot">-></span> go (nextMemoryIndex memory memIdx) program'</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'<'</span> <span class="ot">-></span> go (prevMemoryIndex memory memIdx) program'</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">','</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">getChar</span> <span class="op">>>=</span> writeMemory memory memIdx <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> <span class="fu">ord</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> goNext</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a> <span class="ch">'.'</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> readMemory memory memIdx <span class="op">>>=</span> <span class="fu">putChar</span> <span class="op">.</span> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a> goNext</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a> <span class="ch">'['</span> <span class="ot">-></span> readMemory memory memIdx <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> go memIdx <span class="op">$</span> skipRight <span class="dv">1</span> program</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> goNext</span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a> <span class="ch">']'</span> <span class="ot">-></span> readMemory memory memIdx <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> goNext</span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> go memIdx <span class="op">$</span> skipLeft <span class="dv">1</span> program</span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> goNext</span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a> program' <span class="ot">=</span> czMoveRight program</span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a> goNext <span class="ot">=</span> go memIdx program'</span></code></pre></div>
<p>Our main driver here is the tail-recursive <code>go</code> function that takes the memory index and the program as inputs. It then gets the current focus of the program zipper, and executes the <abbr title="Brainfuck">BF</abbr> logic accordingly.</p>
<p>If the current focus is <code class="sourceCode haskell"><span class="dt">Nothing</span></code>, it means the program has finished running. So we end the execution. Otherwise, we switch over the character and do what the <abbr title="Brainfuck">BF</abbr> spec tells us to do.</p>
<p>For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="st">+</span></code> and <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="st">-</span></code>, we increment or decrement respectively the value in the memory cell at the current index, and go to the next character. For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="kw">></span></code> and <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="kw"><</span></code>, we increment or decrement the memory index respectively, and go to the next character.</p>
<p>For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="fl">,</span></code>, we read an ASCII encoded character from the standard input, and write it to the memory at the current memory index as a byte. For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="fl">.</span></code>, we read the byte from the memory at the current memory index, and write it out to the standard output as an ASCII encoded character. After either cases, we go to the next character.</p>
<p>For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code>, we read the byte at the current memory index, and if it is zero, we skip right over the part of the program till the matching <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">]</span></code> is found. Otherwise, we go to the next character.</p>
<p>For <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">]</span></code>, we skip left over the part of the program till the matching <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span></code> is found, if the current memory byte is non-zero. Otherwise, we go to the next character.</p>
<p>The next two functions implement the skipping logic:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">skipRight ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">CharZipper</span> <span class="ot">-></span> <span class="dt">CharZipper</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>skipRight <span class="op">!</span>depth <span class="op">!</span>program</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> depth <span class="op">==</span> <span class="dv">0</span> <span class="ot">=</span> program'</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="kw">case</span> czFocus program' <span class="kw">of</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">error</span> <span class="st">"No matching [ while skipping the loop forward"</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="ch">'['</span> <span class="ot">-></span> skipRight (depth <span class="op">+</span> <span class="dv">1</span>) program'</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="ch">']'</span> <span class="ot">-></span> skipRight (depth <span class="op">-</span> <span class="dv">1</span>) program'</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> skipRight depth program'</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> program' <span class="ot">=</span> czMoveRight program</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="ot">skipLeft ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">CharZipper</span> <span class="ot">-></span> <span class="dt">CharZipper</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a>skipLeft <span class="op">!</span>depth <span class="op">!</span>program</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> depth <span class="op">==</span> <span class="dv">0</span> <span class="ot">=</span> czMoveRight program</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="kw">case</span> czFocus program' <span class="kw">of</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">error</span> <span class="st">"No matching ] while skipping the loop backward"</span></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="ch">']'</span> <span class="ot">-></span> skipLeft (depth <span class="op">+</span> <span class="dv">1</span>) program'</span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="ch">'['</span> <span class="ot">-></span> skipLeft (depth <span class="op">-</span> <span class="dv">1</span>) program'</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> skipLeft depth program'</span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a> program' <span class="ot">=</span> czMoveLeft program</span></code></pre></div>
<p>The tail-recursive functions <code>skipRight</code> and <code>skipLeft</code> skip over parts of the program by moving the focus to right and left respectively, till the matching bracket is found. Since the loops can contain nested loops, we keep track of the depth of loops we are in, and return only when the depth becomes zero. If we move off the program boundaries while skipping, we throw an error.</p>
<p>That’s it! We now have a fully functioning <abbr title="Brainfuck">BF</abbr> interpreter. To test it, we use these two <abbr title="Brainfuck">BF</abbr> programs: <a href="https://abhinavsarkar.net/code/hanoi.html?mtm_campaign=feed"><code>hanoi.bf</code></a> and <a href="https://abhinavsarkar.net/code/mandelbrot.html?mtm_campaign=feed"><code>mandelbrot.bf</code></a>.</p>
<p><code>hanoi.bf</code> solves the <a href="https://en.wikipedia.org/wiki/Tower_of_Hanoi" target="_blank" rel="noopener">Tower of Hanoi</a> puzzle with animating the solution process as ASCII art:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 925 547'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-70pct" style="--image-aspect-ratio: 1.6910420475319927" data-src="/images/brainfuck-interpreter/hanoi.svg" alt="A freeze-frame from the animation of solving the Tower of Hanoi puzzle with hanoi.bf"></img>
<noscript><img src="/images/brainfuck-interpreter/hanoi.svg" class="w-100pct nolink mw-70pct" alt="A freeze-frame from the animation of solving the Tower of Hanoi puzzle with hanoi.bf"></img></noscript>
<figcaption>A freeze-frame from the animation of solving the Tower of Hanoi puzzle with <code>hanoi.bf</code></figcaption>
</figure>
<p><code>mandelbrot.bf</code> prints an ASCII art showing the <a href="https://en.wikipedia.org/wiki/Mandelbrot_set" target="_blank" rel="noopener">Mandelbrot set</a>:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 1090 865'%3E%3C/svg%3E" class="lazyload w-100pct nolink" style="--image-aspect-ratio: 1.260115606936416" data-src="/images/brainfuck-interpreter/mandelbrot.svg" alt="Mandelbrot set ASCII art by mandelbrot.bf"></img>
<noscript><img src="/images/brainfuck-interpreter/mandelbrot.svg" class="w-100pct nolink" alt="Mandelbrot set ASCII art by mandelbrot.bf"></img></noscript>
<figcaption>Mandelbrot set ASCII art by <code>mandelbrot.bf</code></figcaption>
</figure>
<p>Both of these <abbr title="Brainfuck">BF</abbr> programs serve as good benchmarks for <abbr title="Brainfuck">BF</abbr> interpreters. Let’s test ours by compiling and running it<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>:</p>
<pre class="plain"><code>❯ nix-shell -p "ghc.withPackages (pkgs: with pkgs; [vector memory])" \
--run "ghc --make bfi.hs -O2"
[1 of 2] Compiling Main ( bfi.hs, bfi.o )
[2 of 2] Linking bfi [Objects changed]
❯ time ./bfi -s hanoi.bf > /dev/null
29.15 real 29.01 user 0.13 sys
❯ time ./bfi -s mandelbrot.bf > /dev/null
94.86 real 94.11 user 0.50 sys</code></pre>
<p>That seems quite slow. We can do better.</p>
<h2 data-track-content data-content-name="ast-interpreter" data-content-piece="brainfuck-interpreter" id="ast-interpreter">AST Interpreter</h2>
<p>Instead of executing <abbr title="Brainfuck">BF</abbr> programs from their string representations, we can parse them to an <em><a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Tree" target="_blank" rel="noopener">Abstract Syntax Tree</a></em> (AST). This allows us to match brackets only once at parse time, instead of doing it repeatedly at run time. We capture loops as <abbr title="Abstract Syntax Tree">AST</abbr> nodes, allowing us to skip them trivially.</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ASTInterpreter</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Interpreter</span> <span class="dt">ASTInterpreter</span> <span class="kw">where</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">data</span> <span class="dt">Program</span> <span class="dt">ASTInterpreter</span> <span class="ot">=</span> <span class="dt">ProgramAST</span> <span class="dt">Instructions</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> parse <span class="ot">=</span> <span class="dt">ProgramAST</span> <span class="op">.</span> parseToInstrs</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> interpret memory (<span class="dt">ProgramAST</span> instrs) <span class="ot">=</span> interpretAST memory instrs</span></code></pre></div>
<p>We represent the <abbr title="Brainfuck">BF</abbr> <abbr title="Abstract Syntax Tree">AST</abbr> as a Haskell <em><a href="https://en.wikipedia.org/wiki/Algebraic_Data_Type" target="_blank" rel="noopener">Algebraic Data Type</a></em> (ADT):</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Instructions</span> <span class="ot">=</span> <span class="dt">V.Vector</span> <span class="dt">Instruction</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Instruction</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Inc</span> <span class="co">-- +</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Dec</span> <span class="co">-- -</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">MoveR</span> <span class="co">-- ></span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">MoveL</span> <span class="co">-- <</span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">GetC</span> <span class="co">-- ,</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">PutC</span> <span class="co">-- .</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Loop</span> <span class="dt">Instructions</span> <span class="co">-- []</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span></code></pre></div>
<p>There is one constructor per <abbr title="Brainfuck">BF</abbr> instruction, except for loops where the <code class="sourceCode haskell"><span class="dt">Loop</span></code> constructor captures both the start and end of loop instructions. We use immutable boxed vectors for lists of instructions instead of Haskell lists so that we can index into them in <span class="math inline">\(O(1)\)</span>.</p>
<p>We use the parse combinator library <a href="https://hackage.haskell.org/package/base/docs/Text-ParserCombinators-ReadP.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ReadP</span></code></a> to write a recursive-decent parser for <abbr title="Brainfuck">BF</abbr>:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseToInstrs ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Instructions</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>parseToInstrs code <span class="ot">=</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> V.fromList <span class="op">$</span> <span class="kw">case</span> P.readP_to_S (P.many instrParser <span class="op"><*</span> P.eof) code <span class="kw">of</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> [(res, <span class="st">""</span>)] <span class="ot">-></span> res</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> out <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Unexpected output while parsing: "</span> <span class="op"><></span> <span class="fu">show</span> out</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="ot">instrParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Instruction</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>instrParser <span class="ot">=</span> P.choice</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> [ P.char <span class="ch">'+'</span> <span class="op">$></span> <span class="dt">Inc</span>,</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">'-'</span> <span class="op">$></span> <span class="dt">Dec</span>,</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">'>'</span> <span class="op">$></span> <span class="dt">MoveR</span>,</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">'<'</span> <span class="op">$></span> <span class="dt">MoveL</span>,</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">','</span> <span class="op">$></span> <span class="dt">GetC</span>,</span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">'.'</span> <span class="op">$></span> <span class="dt">PutC</span>,</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Loop</span> <span class="op">.</span> V.fromList <span class="op"><$></span> P.between (P.char <span class="ch">'['</span>) (P.char <span class="ch">']'</span>) (P.many instrParser)</span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>All cases except the loop one are straightforward. For loops, we call the parser recursively to parse the loop body. Note that the parser matches the loop brackets correctly. If the brackets don’t match, the parser fails.</p>
<p>Next, we interpret the <abbr title="Brainfuck">BF</abbr> <abbr title="Abstract Syntax Tree">AST</abbr>:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretAST ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">Instructions</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>interpretAST memory <span class="ot">=</span> void <span class="op">.</span> interpretInstrs <span class="dv">0</span> memory</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretInstrs ::</span> <span class="dt">MemIdx</span> <span class="ot">-></span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">Instructions</span> <span class="ot">-></span> <span class="dt">IO</span> <span class="dt">MemIdx</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>interpretInstrs memIdx <span class="op">!</span>memory <span class="op">!</span>program <span class="ot">=</span> go memIdx <span class="dv">0</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> go <span class="op">!</span>memIdx <span class="op">!</span>progIdx</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> progIdx <span class="op">==</span> V.length program <span class="ot">=</span> <span class="fu">return</span> memIdx</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="kw">case</span> program <span class="op">V.!</span> progIdx <span class="kw">of</span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Inc</span> <span class="ot">-></span> modifyMemory memory (<span class="op">+</span> <span class="dv">1</span>) memIdx <span class="op">>></span> goNext</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Dec</span> <span class="ot">-></span> modifyMemory memory (<span class="fu">subtract</span> <span class="dv">1</span>) memIdx <span class="op">>></span> goNext</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">MoveR</span> <span class="ot">-></span> go (nextMemoryIndex memory memIdx) <span class="op">$</span> progIdx <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">MoveL</span> <span class="ot">-></span> go (prevMemoryIndex memory memIdx) <span class="op">$</span> progIdx <span class="op">+</span> <span class="dv">1</span></span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">GetC</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">getChar</span> <span class="op">>>=</span> writeMemory memory memIdx <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> <span class="fu">ord</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> goNext</span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">PutC</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> readMemory memory memIdx <span class="op">>>=</span> <span class="fu">putChar</span> <span class="op">.</span> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span></span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> goNext</span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Loop</span> instrs <span class="ot">-></span> readMemory memory memIdx <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> goNext</span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> interpretInstrs memIdx memory instrs <span class="op">>>=</span> <span class="fu">flip</span> go progIdx</span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a> goNext <span class="ot">=</span> go memIdx <span class="op">$</span> progIdx <span class="op">+</span> <span class="dv">1</span></span></code></pre></div>
<p>The <abbr title="Abstract Syntax Tree">AST</abbr> interpreter code is quite similar to the string interpreter one. This time we use an integer as the <abbr title="Instruction pointer">IP</abbr> to index the <code class="sourceCode haskell"><span class="dt">Instructions</span></code> vector. All cases except the loop one are pretty much same as before.</p>
<p>For loops, we read the byte at the current memory index, and if it is zero, we skip executing the <code class="sourceCode haskell"><span class="dt">Loop</span></code> <abbr title="Abstract Syntax Tree">AST</abbr> node and go to the next instruction. Otherwise, we recursively interpret the loop body and go to the next instruction, taking care of passing the updated memory index returned from the recursive call to the execution of the next instruction.</p>
<p>And we are done. Let’s see how it performs:</p>
<pre class="plain"><code>❯ time ./bfi -a hanoi.bf > /dev/null
14.94 real 14.88 user 0.05 sys
❯ time ./bfi -a mandelbrot.bf > /dev/null
36.49 real 36.32 user 0.17 sys</code></pre>
<p>Great! <code>hanoi.bf</code> runs 2x faster, whereas <code>mandelbrot.bf</code> runs 2.6x faster. Can we do even better?</p>
<h2 data-track-content data-content-name="bytecode-interpreter" data-content-piece="brainfuck-interpreter" id="bytecode-interpreter">Bytecode Interpreter</h2>
<p><abbr title="Abstract Syntax Tree">AST</abbr> interpreters are well known to be slow because of how <abbr title="Abstract Syntax Tree">AST</abbr> nodes are represented in the computer’s memory. The <abbr title="Abstract Syntax Tree">AST</abbr> nodes contain pointers to other nodes, which may be anywhere in the memory. So while interpreting an <abbr title="Abstract Syntax Tree">AST</abbr>, it jumps all over the memory, causing a slowdown. One solution to this is to convert the <abbr title="Abstract Syntax Tree">AST</abbr> into a more compact and optimized representation known as <em><a href="https://en.wikipedia.org/wiki/Bytecode" target="_blank" rel="noopener">Bytecode</a></em>. That’s what our next interpreter uses.</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">BytecodeInterpreter</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Interpreter</span> <span class="dt">BytecodeInterpreter</span> <span class="kw">where</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">data</span> <span class="dt">Program</span> <span class="dt">BytecodeInterpreter</span> <span class="ot">=</span> <span class="dt">ProgramBC</span> <span class="dt">BA.Bytes</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> parse <span class="ot">=</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> parseToInstrs</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> translate</span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> assemble</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> <span class="dt">ProgramBC</span></span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> interpret memory (<span class="dt">ProgramBC</span> bytecode) <span class="ot">=</span> interpretBytecode memory bytecode</span></code></pre></div>
<p>We reuse the parser from the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter, but then we convert the resultant <abbr title="Abstract Syntax Tree">AST</abbr> into bytecode by translating and assembling it<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>. We use the <a href="https://hackage.haskell.org/package/memory/docs/Data-ByteArray.html#t:Bytes" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Bytes</span></code></a> byte array data type from the memory package to represent bytecode.</p>
<p>Unlike <abbr title="Abstract Syntax Tree">AST</abbr>, bytecode has a flat list of instructions—called <em><a href="https://en.wikipedia.org/wiki/Opcodes" target="_blank" rel="noopener">Opcodes</a></em>—that can be encoded in a single byte each, with optional parameters. Because of its flat nature and compactness, bytecode is more CPU friendly to execute, which is where it gets its performance from. The downside is that bytecode is not human readable unlike <abbr title="Abstract Syntax Tree">AST</abbr>.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="9-9"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Opcode</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">OpInc</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpDec</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpMoveR</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpMoveL</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpGetC</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpPutC</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpLoop</span> <span class="dt">Opcodes</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">OpClear</span></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Opcodes</span> <span class="ot">=</span> [<span class="dt">Opcode</span>]</span></code></pre></div>
<p>We use the <code class="sourceCode haskell"><span class="dt">Opcode</span></code> <abbr title="Algebraic Data Type">ADT</abbr> to model the <abbr title="Brainfuck">BF</abbr> opcodes. For now, it corresponds one-to-one with the <code class="sourceCode haskell"><span class="dt">Instruction</span></code> <abbr title="Algebraic Data Type">ADT</abbr>.</p>
<p>The <code>translate</code> function translates <code class="sourceCode haskell"><span class="dt">Instructions</span></code> to <code class="sourceCode haskell"><span class="dt">Opcodes</span></code>:</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">translate ::</span> <span class="dt">Instructions</span> <span class="ot">-></span> <span class="dt">Opcodes</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>translate <span class="ot">=</span> V.toList <span class="op">>>></span> <span class="fu">map</span> translateOpcode</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> translateOpcode <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Inc</span> <span class="ot">-></span> <span class="dt">OpInc</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Dec</span> <span class="ot">-></span> <span class="dt">OpDec</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">MoveR</span> <span class="ot">-></span> <span class="dt">OpMoveR</span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">MoveL</span> <span class="ot">-></span> <span class="dt">OpMoveL</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">GetC</span> <span class="ot">-></span> <span class="dt">OpGetC</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">PutC</span> <span class="ot">-></span> <span class="dt">OpPutC</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Loop</span> instrs <span class="ot">-></span> <span class="dt">OpLoop</span> <span class="op">$</span> translate instrs</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<p>The <code>assemble</code> function assembles <code class="sourceCode haskell"><span class="dt">Opcodes</span></code> to bytecode byte array:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="20-20"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">assemble ::</span> <span class="dt">Opcodes</span> <span class="ot">-></span> <span class="dt">BA.Bytes</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>assemble <span class="ot">=</span> BA.pack <span class="op">.</span> <span class="fu">concatMap</span> assembleOpcode</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="ot">assembleOpcode ::</span> <span class="dt">Opcode</span> <span class="ot">-></span> [<span class="dt">Word8</span>]</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>assembleOpcode <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpInc</span> <span class="ot">-></span> [<span class="dv">0</span>]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpDec</span> <span class="ot">-></span> [<span class="dv">1</span>]</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpMoveR</span> <span class="ot">-></span> [<span class="dv">2</span>]</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpMoveL</span> <span class="ot">-></span> [<span class="dv">3</span>]</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpGetC</span> <span class="ot">-></span> [<span class="dv">4</span>]</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpPutC</span> <span class="ot">-></span> [<span class="dv">5</span>]</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpLoop</span> body <span class="ot">-></span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> assembledBody <span class="ot">=</span> <span class="fu">concatMap</span> assembleOpcode body</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> bodyLen <span class="ot">=</span> <span class="fu">length</span> assembledBody <span class="op">+</span> <span class="dv">3</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">if</span> bodyLen <span class="op">></span> <span class="dv">65</span>_536 <span class="co">-- 2 ^ 16</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Body of loop is too big: "</span> <span class="op"><></span> <span class="fu">show</span> bodyLen</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> assembledBodyLen <span class="ot">=</span> assembleBodyLen bodyLen</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> [<span class="dv">6</span>] <span class="op"><></span> assembledBodyLen <span class="op"><></span> assembledBody <span class="op"><></span> [<span class="dv">7</span>] <span class="op"><></span> assembledBodyLen</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">OpClear</span> <span class="ot">-></span> [<span class="dv">8</span>]</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> assembleBodyLen bodyLen <span class="ot">=</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> lb <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> bodyLen <span class="op">.&.</span> <span class="bn">0xff</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a> mb <span class="ot">=</span> <span class="fu">fromIntegral</span> <span class="op">$</span> (bodyLen <span class="op">.&.</span> <span class="bn">0xff00</span>) <span class="ot">`shiftR`</span> <span class="dv">8</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> [lb, mb] <span class="co">-- assumes Little-endian arch</span></span></code></pre></div>
<p>The <code>assembleOpcode</code> function assembles an <code class="sourceCode haskell"><span class="dt">Opcode</span></code> to a list of bytes (<code class="sourceCode haskell"><span class="dt">Word8</span></code>s). For all cases except for <code class="sourceCode haskell"><span class="dt">OpLoop</span></code>, we simply return a unique byte for the opcode.</p>
<p>For <code class="sourceCode haskell"><span class="dt">OpLoop</span></code>, we first recursively assemble the loop body. We encode both the body and the body length in the assembled bytecode, so that the bytecode interpreter can use the body length to skip over the loop body when required. We use two bytes to encode the body length, so we first check if the body length plus three is over 65536 (<span class="math inline">\(= 2^8*2^8\)</span>). If so, we throw an error. Otherwise, we return:</p>
<ol type="1">
<li>a unique byte for loop start (6),</li>
<li>followed by the body length encoded in two bytes (in the <em><a href="https://en.wikipedia.org/wiki/Little-endian" target="_blank" rel="noopener">Little-endian</a></em> order),</li>
<li>then the assembled loop body,</li>
<li>followed by a unique byte for loop end (7),</li>
<li>finally followed by the encoded body length again.</li>
</ol>
<p>We encode the body length at the end again so that we can use it to jump backward to the start of the loop, to continue looping. Let’s look at this example to understand the loop encoding better:</p>
<div class="sourceCode" id="cb18" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> code <span class="ot">=</span> <span class="st">"++++++++++++++++++++++++++++++++++++++++++++++++>+++++[<+.>-]"</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">concatMap</span> assembleOpcode <span class="op">.</span> translate <span class="op">.</span> parseToInstrs <span class="op">$</span> code</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,6,8,0,3,0,5,2,1,7,8,0]</span></code></pre></div>
<p>Let’s focus on the last twelve bytes. The diagram below shows the meaning of the various bytes:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 496 271'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-70pct" style="--image-aspect-ratio: 1.830258302583026" data-src="/images/brainfuck-interpreter/bytecode.svg" alt="Assembled bytecode for a loop"></img>
<noscript><img src="/images/brainfuck-interpreter/bytecode.svg" class="w-100pct nolink mw-70pct" alt="Assembled bytecode for a loop"></img></noscript>
<figcaption>Assembled bytecode for a <abbr title="Brainfuck">BF</abbr> loop</figcaption>
</figure>
<p>The example also demonstrates the flat nature of assembled bytecode. Now, all we have to do is to interpret it:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretBytecode ::</span> <span class="dt">Memory</span> <span class="ot">-></span> <span class="dt">BA.Bytes</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>interpretBytecode memory bytecode <span class="ot">=</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> MV.unsafeWith</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> (unMemory memory)</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> (BA.withByteArray bytecode</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> interpretBytecodePtr (memorySize memory) (BA.length bytecode))</span></code></pre></div>
<p>Instead of using integer indices in the bytecode array and memory vector, this time we use C-style direct <a href="https://en.wikipedia.org/wiki/Pointer_(computer_programming)" target="_blank" rel="noopener">pointers</a><a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="26-26"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">ProgPtr</span> <span class="ot">=</span> <span class="dt">Ptr</span> <span class="dt">Word8</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">MemPtr</span> <span class="ot">=</span> <span class="dt">Ptr</span> <span class="dt">Int8</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretBytecodePtr ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">ProgPtr</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>interpretBytecodePtr memLen programLen memStartPtr progStartPtr <span class="ot">=</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> go memStartPtr progStartPtr</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> progEndPtr <span class="ot">=</span> progStartPtr <span class="ot">`plusProgPtr`</span> programLen</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> memEndPtr <span class="ot">=</span> memStartPtr <span class="ot">`plusMemPtr`</span> memLen</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> go <span class="op">!</span>memPtr <span class="op">!</span>progPtr</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> progPtr <span class="op">==</span> progEndPtr <span class="ot">=</span> <span class="fu">return</span> ()</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> readProg <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> modifyMem (<span class="op">+</span> <span class="dv">1</span>) <span class="op">>></span> goNext <span class="co">-- Inc</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="ot">-></span> modifyMem (<span class="fu">subtract</span> <span class="dv">1</span>) <span class="op">>></span> goNext <span class="co">-- Dec</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> <span class="ot">-></span> jump (nextMemPtr memStartPtr memEndPtr memPtr <span class="dv">1</span>) <span class="dv">1</span> <span class="co">-- MoveR</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="ot">-></span> jump (prevMemPtr memStartPtr memEndPtr memPtr <span class="dv">1</span>) <span class="dv">1</span> <span class="co">-- MoveL</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> <span class="ot">-></span> <span class="fu">getChar</span> <span class="op">>>=</span> writeMem <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> <span class="fu">ord</span> <span class="op">>></span> goNext <span class="co">-- GetC</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> <span class="ot">-></span> readMem <span class="op">>>=</span> <span class="fu">putChar</span> <span class="op">.</span> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">>></span> goNext <span class="co">-- PutC</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> <span class="ot">-></span> readMem <span class="op">>>=</span> \<span class="kw">case</span> <span class="co">-- Loop start</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> readProg2 <span class="op">>>=</span> jump memPtr</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> jump memPtr <span class="dv">3</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a> <span class="dv">7</span> <span class="ot">-></span> readMem <span class="op">>>=</span> \<span class="kw">case</span> <span class="co">-- Loop end</span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> jump memPtr <span class="dv">3</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> readProg2 <span class="op">>>=</span> jump memPtr <span class="op">.</span> <span class="fu">negate</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dv">8</span> <span class="ot">-></span> writeMem <span class="dv">0</span> <span class="op">>></span> goNext <span class="co">-- Clear</span></span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a> op <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Unknown opcode: "</span> <span class="op"><></span> <span class="fu">show</span> op</span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> goNext <span class="ot">=</span> jump memPtr <span class="dv">1</span></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> jump memPtr offset <span class="ot">=</span> go memPtr <span class="op">$</span> progPtr <span class="ot">`plusProgPtr`</span> offset</span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> readProg <span class="ot">=</span> S.peek progPtr</span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a> readProg2 <span class="ot">=</span> <span class="co">-- assumes Little-endian arch</span></span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a> <span class="fu">fromIntegral</span> <span class="op"><$></span> S.peek (castPtr <span class="op">@</span>_ <span class="op">@</span><span class="dt">Word16</span> <span class="op">$</span> progPtr <span class="ot">`plusProgPtr`</span> <span class="dv">1</span>)</span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a> readMem <span class="ot">=</span> S.peek memPtr</span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a> writeMem <span class="ot">=</span> S.poke memPtr</span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a> modifyMem f <span class="ot">=</span> readMem <span class="op">>>=</span> writeMem <span class="op">.</span> f</span></code></pre></div>
<p>In Haskell, the pointer type <a href="https://hackage.haskell.org/package/base/docs/Foreign-Ptr.html#t:Ptr" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Ptr</span></code></a> is parametrized by the type of the data it points to. We have two types of pointers here, one that points to the bytecode program, and another that points to the memory cells. So in this case, the <abbr title="Instruction pointer">IP</abbr> and <abbr title="Data pointer">DP</abbr> are actually pointers.</p>
<p>The <code>go</code> function here is again the core of the interpreter loop. We track the current <abbr title="Instruction pointer">IP</abbr> and <abbr title="Data pointer">DP</abbr> in it, and execute the logic corresponding to the opcode at the current memory location. <code>go</code> ends when the <abbr title="Instruction pointer">IP</abbr> points to the end of the program byte array.</p>
<p>Most of the cases in <code>go</code> are similar to previous interpreters. Only difference is that we use pointers to read the current opcode and memory cell. For the loop start opcode, we read the byte pointed to by the <abbr title="Data pointer">DP</abbr>, and if it is zero, we read the next two bytes from the program bytecode, and use it as the offset to jump the <abbr title="Instruction pointer">IP</abbr> by to skip over the loop body. Otherwise, we jump the <abbr title="Instruction pointer">IP</abbr> by 3 bytes to skip over the loop start opcode and encoded loop body length bytes. For the loop end opcode, we follow similar steps, except we jump backward to the start of the loop.</p>
<p>The helper functions for doing pointer arithmetic are following:</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">plusProgPtr ::</span> <span class="dt">ProgPtr</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">ProgPtr</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>plusProgPtr <span class="ot">=</span> plusPtr</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a><span class="ot">plusMemPtr ::</span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">MemPtr</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a>plusMemPtr <span class="ot">=</span> plusPtr</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a><span class="ot">nextMemPtr ::</span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">MemPtr</span></span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a>nextMemPtr memStartPtr memEndPtr memPtr inc <span class="ot">=</span></span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> memPtr' <span class="ot">=</span> memPtr <span class="ot">`plusMemPtr`</span> inc</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">if</span> memEndPtr <span class="op">></span> memPtr'</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> memPtr'</span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> memStartPtr <span class="ot">`plusPtr`</span> (memPtr' <span class="ot">`minusPtr`</span> memEndPtr)</span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a><span class="ot">prevMemPtr ::</span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">MemPtr</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">MemPtr</span></span>
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a>prevMemPtr memStartPtr memEndPtr memPtr inc <span class="ot">=</span></span>
<span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> memPtr' <span class="ot">=</span> memPtr <span class="ot">`plusMemPtr`</span> (<span class="op">-</span><span class="dv">1</span> <span class="op">*</span> inc)</span>
<span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">if</span> memPtr' <span class="op">>=</span> memStartPtr</span>
<span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> memPtr'</span>
<span id="cb20-19"><a href="#cb20-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> memEndPtr <span class="ot">`plusPtr`</span> (memPtr' <span class="ot">`minusPtr`</span> memStartPtr)</span></code></pre></div>
<p><code>nextMemPtr</code> and <code>prevMemPtr</code> implement wrapping of pointers as we <a href="#cb3-1">do for memory indices</a> in <code>nextMemoryIndex</code> and <code>prevMemoryIndex</code>. Let’s see what the results of our hard work are:</p>
<pre class="plain"><code>❯ time ./bfi -b hanoi.bf > /dev/null
11.10 real 11.04 user 0.04 sys
❯ time ./bfi -b mandelbrot.bf > /dev/null
15.72 real 15.68 user 0.04 sys</code></pre>
<p>1.3x and 2.3x speedups for <code>hanoi.bf</code> and <code>mandelbrot.bf</code> respectively over the <abbr title="Abstract Syntax Tree">AST</abbr> interpreter. Not bad. But surely we can do even better?</p>
<h2 data-track-content data-content-name="optimizing-bytecode-interpreter" data-content-piece="brainfuck-interpreter" id="optimizing-bytecode-interpreter">Optimizing Bytecode Interpreter</h2>
<p>We can optimize our bytecode interpreter by emitting specialized opcodes for particular patterns of opcodes that occur frequently. Think of it as replacing every occurrence of a long phrase in a text with a single word that means the same, leading to a shorter text and faster reading time. Since <abbr title="Brainfuck">BF</abbr> is so verbose, there are many opportunities for optimizing <abbr title="Brainfuck">BF</abbr> bytecode<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. We are going to implement only one simple optimization, just to get a taste of how to do it.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="8-8"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">OptimizingBytecodeInterpreter</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Interpreter</span> <span class="dt">OptimizingBytecodeInterpreter</span> <span class="kw">where</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">data</span> <span class="dt">Program</span> <span class="dt">OptimizingBytecodeInterpreter</span> <span class="ot">=</span> <span class="dt">ProgramOBC</span> <span class="dt">BA.Bytes</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> parse <span class="ot">=</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> parseToInstrs</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> translate</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">>>></span> optimize</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> assemble</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> <span class="dt">ProgramOBC</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> interpret memory (<span class="dt">ProgramOBC</span> bytecode) <span class="ot">=</span> interpretBytecode memory bytecode</span></code></pre></div>
<p>The optimizing bytecode interpreter is pretty much same as the bytecode interpreter, with the <code>optimize</code> function called between the translation and assembly phases.</p>
<p>The pattern of opcode we are optimizing for is <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span><span class="st">-</span><span class="cn">]</span></code> and <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span><span class="st">+</span><span class="cn">]</span></code>. Both of these <abbr title="Brainfuck">BF</abbr> opcodes when executed, decrement or increment the current memory cell till it becomes zero. In effect, these patterns clear the current cell. We start the process by adding a new <code class="sourceCode haskell"><span class="dt">Opcode</span></code> for clearing a cell:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="9-9"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Opcode</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">OpInc</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpDec</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpMoveR</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpMoveL</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpGetC</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpPutC</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">OpLoop</span> <span class="dt">Opcodes</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">OpClear</span></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Opcodes</span> <span class="ot">=</span> [<span class="dt">Opcode</span>]</span></code></pre></div>
<p>The <code>optimize</code> function recursively goes over the <code class="sourceCode haskell"><span class="dt">Opcodes</span></code>, and emits optimized ones by replacing the patterns that clear the current cell with <code class="sourceCode haskell"><span class="dt">OpClear</span></code>:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">optimize ::</span> <span class="dt">Opcodes</span> <span class="ot">-></span> <span class="dt">Opcodes</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>optimize <span class="ot">=</span> <span class="fu">map</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpLoop</span> [<span class="dt">OpDec</span>] <span class="ot">-></span> <span class="dt">OpClear</span></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpLoop</span> [<span class="dt">OpInc</span>] <span class="ot">-></span> <span class="dt">OpClear</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpLoop</span> body <span class="ot">-></span> <span class="dt">OpLoop</span> <span class="op">$</span> optimize body</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> op <span class="ot">-></span> op</span></code></pre></div>
<p>Then we modify the <code>assembleOpcode</code> function to emit a unique byte for <code class="sourceCode haskell"><span class="dt">OpClear</span></code>:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="17-17"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">assembleOpcode ::</span> <span class="dt">Opcode</span> <span class="ot">-></span> [<span class="dt">Word8</span>]</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>assembleOpcode <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpInc</span> <span class="ot">-></span> [<span class="dv">0</span>]</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpDec</span> <span class="ot">-></span> [<span class="dv">1</span>]</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpMoveR</span> <span class="ot">-></span> [<span class="dv">2</span>]</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpMoveL</span> <span class="ot">-></span> [<span class="dv">3</span>]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpGetC</span> <span class="ot">-></span> [<span class="dv">4</span>]</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpPutC</span> <span class="ot">-></span> [<span class="dv">5</span>]</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">OpLoop</span> body <span class="ot">-></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> assembledBody <span class="ot">=</span> <span class="fu">concatMap</span> assembleOpcode body</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> bodyLen <span class="ot">=</span> <span class="fu">length</span> assembledBody <span class="op">+</span> <span class="dv">3</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">if</span> bodyLen <span class="op">></span> <span class="dv">65</span>_536 <span class="co">-- 2 ^ 16</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Body of loop is too big: "</span> <span class="op"><></span> <span class="fu">show</span> bodyLen</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> assembledBodyLen <span class="ot">=</span> assembleBodyLen bodyLen</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> [<span class="dv">6</span>] <span class="op"><></span> assembledBodyLen <span class="op"><></span> assembledBody <span class="op"><></span> [<span class="dv">7</span>] <span class="op"><></span> assembledBodyLen</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">OpClear</span> <span class="ot">-></span> [<span class="dv">8</span>]</span></span></code></pre></div>
<p>Finally, we modify the bytecode interpreter to execute the <code class="sourceCode haskell"><span class="dt">OpClear</span></code> opcode.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="16-16"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>go <span class="op">!</span>memPtr <span class="op">!</span>progPtr</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> progPtr <span class="op">==</span> progEndPtr <span class="ot">=</span> <span class="fu">return</span> ()</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> readProg <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> modifyMem (<span class="op">+</span> <span class="dv">1</span>) <span class="op">>></span> goNext <span class="co">-- Inc</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="ot">-></span> modifyMem (<span class="fu">subtract</span> <span class="dv">1</span>) <span class="op">>></span> goNext <span class="co">-- Dec</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dv">2</span> <span class="ot">-></span> jump (nextMemPtr memStartPtr memEndPtr memPtr <span class="dv">1</span>) <span class="dv">1</span> <span class="co">-- MoveR</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">3</span> <span class="ot">-></span> jump (prevMemPtr memStartPtr memEndPtr memPtr <span class="dv">1</span>) <span class="dv">1</span> <span class="co">-- MoveL</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dv">4</span> <span class="ot">-></span> <span class="fu">getChar</span> <span class="op">>>=</span> writeMem <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> <span class="fu">ord</span> <span class="op">>></span> goNext <span class="co">-- GetC</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dv">5</span> <span class="ot">-></span> readMem <span class="op">>>=</span> <span class="fu">putChar</span> <span class="op">.</span> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">>></span> goNext <span class="co">-- PutC</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dv">6</span> <span class="ot">-></span> readMem <span class="op">>>=</span> \<span class="kw">case</span> <span class="co">-- Loop start</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> readProg2 <span class="op">>>=</span> jump memPtr</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> jump memPtr <span class="dv">3</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dv">7</span> <span class="ot">-></span> readMem <span class="op">>>=</span> \<span class="kw">case</span> <span class="co">-- Loop end</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="ot">-></span> jump memPtr <span class="dv">3</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> readProg2 <span class="op">>>=</span> jump memPtr <span class="op">.</span> <span class="fu">negate</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dv">8</span> <span class="ot">-></span> writeMem <span class="dv">0</span> <span class="op">>></span> goNext <span class="co">-- Clear</span></span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> op <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Unknown opcode: "</span> <span class="op"><></span> <span class="fu">show</span> op</span></code></pre></div>
<p>We can see how the patterns <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span><span class="st">-</span><span class="cn">]</span></code> and <code class="sourceCode brainfuck" data-lang="brainfuck"><span class="cn">[</span><span class="st">+</span><span class="cn">]</span></code> that may execute operations tens, maybe hundreds, of times, are replaced by a single operation in the interpreter now. This is what gives us the speedup in this case. Let’s run it:</p>
<pre class="plain"><code>❯ time ./bfi -o hanoi.bf > /dev/null
4.07 real 4.04 user 0.01 sys
❯ time ./bfi -o mandelbrot.bf > /dev/null
15.58 real 15.53 user 0.04 sys</code></pre>
<p><code>hanoi.bf</code> runs 2.7x faster, whereas <code>mandelbrot.bf</code> is barely 1% faster as compared to the non-optimizing bytecode interpreter. This demonstrates how different optimizations apply to different programs, and hence the need to implement a wide variety of them to be able to optimize all programs well.</p>
<h2 data-track-content data-content-name="comparison" data-content-piece="brainfuck-interpreter" id="comparison">Comparison</h2>
<p>It’s time for a final comparison of the run times of the four interpreters:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Interpreter</th>
<th style="text-align: right;">Hanoi</th>
<th style="text-align: right;">Mandelbrot</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">String</td>
<td style="text-align: right;">29.15s</td>
<td style="text-align: right;">94.86s</td>
</tr>
<tr>
<td style="text-align: left;">AST</td>
<td style="text-align: right;">14.94s</td>
<td style="text-align: right;">36.49s</td>
</tr>
<tr>
<td style="text-align: left;">Bytecode</td>
<td style="text-align: right;">11.10s</td>
<td style="text-align: right;">15.72s</td>
</tr>
<tr>
<td style="text-align: left;">Optimizing Bytecode</td>
<td style="text-align: right;">4.07s</td>
<td style="text-align: right;">15.58s</td>
</tr>
</tbody>
</table>
</div>
<p>The final interpreter is 7x faster than the baseline one for <code>hanoi.bf</code>, and 6x faster for <code>mandelbrot.bf</code>. Here’s the same data as a chart:</p>
<figure class="w-100pct mw-80pct">
<a href="https://abhinavsarkar.net/images/plots/pandocplot3569819420457630450.svg" class="img-link"><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 800 600'%3E%3C/svg%3E" class="lazyload w-100pct mw-80pct" style="--image-aspect-ratio: 1.3333333333333333" data-src="/images/plots/pandocplot3569819420457630450.svg" alt="Run time of the four interpreters"></img>
<noscript><img src="/images/plots/pandocplot3569819420457630450.svg" class="w-100pct mw-80pct" alt="Run time of the four interpreters"></img></noscript></a>
<figcaption>Run time of the four interpreters</figcaption>
</figure>
<p>That’s it for this post. I hope you enjoyed it and took something away from it. In a future post, we’ll explore more optimization for our <abbr title="Brainfuck">BF</abbr> interpreter. The full code for this post is available <a href="https://abhinavsarkar.net/code/bfi.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p><abbr title="Brainfuck">BF</abbr> is Turning-complete. That means it can be used to implement any computable program. However, it is a Turing tarpit, which means it is not feasible to write any useful programs in it because of its lack of abstractions.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>A string interpreter also serves as an useful baseline for measuring the performance of <abbr title="Brainfuck">BF</abbr> interpreters. That’s why I decided to use strings instead of <a href="https://hackage.haskell.org/package/text/docs/Data-Text.html#t:Text" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Text</span></code></a> or <a href="https://hackage.haskell.org/package/containers/docs/Data-Sequence.html#t:Seq" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Sequence</span></code></a>, which are more performant.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>I am a big fan of zippers, as evidenced by <a href="https://abhinavsarkar.net/tags/zippers/?mtm_campaign=feed">this growing list</a> of posts that I use them in.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>We use <a href="https://nixos.org" target="_blank" rel="noopener">Nix</a> for getting the dependency libraries.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>If you are unfamiliar, <a href="https://hackage.haskell.org/package/base/docs/Control-Arrow.html#v:-62--62--62-" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="op">>>></span></code></a> is the left-to-right function composition function:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>f <span class="op">>>></span> g <span class="ot">=</span> g <span class="op">.</span> f</span></code></pre></div>
<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn6"><p>While the only way to access byte arrays is pointers, we could have continued accessing the memory vector using indices. I benchmarked both methods, and found that using pointers for memory access sped up the execution of <code>hanoi.bf</code> by 1.1x and <code>mandelbrot.bf</code> by 1.6x as compared to index-based access. It’s also nice to learn how to use pointers in Haskell. This is why we chose to use <a href="https://hackage.haskell.org/package/vector/docs/Data-Vector-Storable.html#g:1" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Storable</span></code></a> vectors for the memory.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>See <a href="https://bfc.wilfred.me.uk" target="_blank" rel="noopener">BFC</a>, which touts itself as “an industrial-grade Brainfuck compiler”, with a huge list of optimizations.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/brainfuck-interpreter/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2025-01-19T00:00:00Z <p>Writing an interpreter for Brainfuck is almost a rite of passage for any programming language implementer,
and it’s my turn now. In this post, we’ll write not one but four Brainfuck interpreters in Haskell. Let’s go!</p>
https://abhinavsarkar.net/posts/solving-aoc20-seating-system/ Solving Advent of Code “Seating System” with Comonads and Stencils 2025-01-05T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In this post, we solve the Advent of Code 2020 <a href="https://adventofcode.com/2020/day/11" target="_blank" rel="noopener">“Seating System”</a> challenge in Haskell using comonads and stencils.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<strong>“Seating System” with Comonads and Stencils</strong> 👈
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#the-challenge">The Challenge</a></li><li><a href="#the-cellular-automaton">The Cellular Automaton</a></li><li><a href="#the-solution">The Solution</a></li><li><a href="#the-zipper">The Zipper</a></li><li><a href="#the-comonad">The Comonad</a></li><li><a href="#the-array">The Array</a></li><li><a href="#the-stencil">The Stencil</a></li></ol></nav>
<h2 data-track-content data-content-name="the-challenge" data-content-piece="solving-aoc20-seating-system" id="the-challenge">The Challenge</h2>
<p>Here’s a quick summary of the challenge:</p>
<blockquote>
<p>The seat layout fits on a grid. Each position is either floor (<code>.</code>), an empty seat (<code>L</code>), or an occupied seat (<code>#</code>). For example, the initial seat layout might look like this:</p>
<pre class="plain"><code>L.LL.LL.LL
LLLLLLL.LL
L.L.L..L..
LLLL.LL.LL
L.LL.LL.LL
L.LLLLL.LL
..L.L.....
LLLLLLLLLL
L.LLLLLL.L
L.LLLLL.LL</code></pre>
<p>All decisions are based on the number of occupied seats adjacent to a given seat (one of the eight positions immediately up, down, left, right, or diagonal from the seat).</p>
<p>The following rules are applied to every seat simultaneously:</p>
<ul>
<li>If a seat is empty (<code>L</code>) and there are no occupied seats adjacent to it, the seat becomes occupied.</li>
<li>If a seat is occupied (<code>#</code>) and four or more seats adjacent to it are also occupied, the seat becomes empty.</li>
<li>Otherwise, the seat’s state does not change.</li>
</ul>
Floor (<code>.</code>) never changes; seats don’t move, and nobody sits on the floor.
</blockquote>
<p>This is a classic <em><a href="https://en.wikipedia.org/wiki/Cellular_Automaton" target="_blank" rel="noopener">Cellular Automaton</a></em> problem. We need to write a program that simulates seats being occupied till no further seats are emptied or occupied, and
returns the final number of occupied seats. Let’s solve this in Haskell.</p>
<h2 data-track-content data-content-name="the-cellular-automaton" data-content-piece="solving-aoc20-seating-system" id="the-cellular-automaton">The Cellular Automaton</h2>
<p>First, some imports:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GHC2021 #-}</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase #-}</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE PatternSynonyms #-}</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TypeFamilies #-}</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Arrow</span> ((>>>))</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Comonad</span> (<span class="dt">Comonad</span> (..))</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Function</span> (on)</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (intercalate, nubBy)</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Massiv.Array</span> (<span class="dt">Ix2</span> (..))</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Massiv.Array</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">A</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Massiv.Array.Unsafe</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">AU</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Proxy</span> (<span class="dt">Proxy</span> (..))</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Vector.Generic</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">VG</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Vector.Generic.Mutable</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">VGM</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Vector.Unboxed</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">VU</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Environment</span> (getArgs, getProgName)</span></code></pre></div>
<p>We use the <span class="hs"><code class="sourceCode haskell"><span class="dt">GHC2021</span></code></span> extension here that enables a lot of useful GHC extensions by default. Our non-base imports come from the <a href="https://bartoszmilewski.com/2017/01/02/comonads/" target="_blank" rel="noopener">comonad</a>, <a href="https://hackage.haskell.org/package/massiv" target="_blank" rel="noopener">massiv</a> and <a href="https://hackage.haskell.org/package/vector" target="_blank" rel="noopener">vector</a> libraries.</p>
<p>Quoting the Wikipedia page on <a href="https://en.wikipedia.org/wiki/Cellular_Automaton" target="_blank" rel="noopener">Cellular Automaton</a> (CA):</p>
<blockquote>
<ul>
<li>A cellular automaton consists of a regular grid of cells, each in one of a finite number of states.</li>
<li>For each cell, a set of cells called its neighborhood is defined relative to the specified cell.</li>
<li>An initial state is selected by assigning a state for each cell.</li>
<li>A new generation is created, according to some fixed rule that determines the new state of each cell in terms of the current state of the cell and the states of the cells in its neighborhood.</li>
</ul>
</blockquote>
<p>Let’s model the automaton of the challenge using Haskell:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">Cell</span> <span class="dt">Char</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">pattern</span> <span class="dt">Empty</span>, <span class="dt">Occupied</span>, <span class="dt">Floor</span><span class="ot"> ::</span> <span class="dt">Cell</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">pattern</span> <span class="dt">Empty</span> <span class="ot">=</span> <span class="dt">Cell</span> <span class="ch">'L'</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">pattern</span> <span class="dt">Occupied</span> <span class="ot">=</span> <span class="dt">Cell</span> <span class="ch">'#'</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="kw">pattern</span> <span class="dt">Floor</span> <span class="ot">=</span> <span class="dt">Cell</span> <span class="ch">'.'</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# COMPLETE Empty, Occupied, Floor #-}</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="ot">parseCell ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Cell</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>parseCell <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'L'</span> <span class="ot">-></span> <span class="dt">Empty</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="ch">'#'</span> <span class="ot">-></span> <span class="dt">Occupied</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> <span class="ch">'.'</span> <span class="ot">-></span> <span class="dt">Floor</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> c <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Invalid character: "</span> <span class="op"><></span> <span class="fu">show</span> c</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">rule ::</span> <span class="dt">Cell</span> <span class="ot">-></span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Cell</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>rule cell neighbours <span class="ot">=</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> occupiedNeighboursCount <span class="ot">=</span> <span class="fu">length</span> <span class="op">$</span> <span class="fu">filter</span> (<span class="op">==</span> <span class="dt">Occupied</span>) neighbours</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">case</span> cell <span class="kw">of</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Empty</span> <span class="op">|</span> occupiedNeighboursCount <span class="op">==</span> <span class="dv">0</span> <span class="ot">-></span> <span class="dt">Occupied</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Occupied</span> <span class="op">|</span> occupiedNeighboursCount <span class="op">>=</span> <span class="dv">4</span> <span class="ot">-></span> <span class="dt">Empty</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> cell</span></code></pre></div>
<p>A cell in the grid can be in empty, occupied or floor state. We encode this with the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/pattern_synonyms.html" target="_blank" rel="noopener">pattern synonyms</a> <code class="sourceCode haskell"><span class="dt">Empty</span></code>, <code class="sourceCode haskell"><span class="dt">Occupied</span></code> and <code class="sourceCode haskell"><span class="dt">Floor</span></code> over the <code class="sourceCode haskell"><span class="dt">Cell</span></code> <code class="sourceCode haskell"><span class="kw">newtype</span></code> over <code class="sourceCode haskell"><span class="dt">Char</span></code><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
<p>The <code>parseCell</code> function parses a character to a <code class="sourceCode haskell"><span class="dt">Cell</span></code>. The <code>rule</code> function implements <a href="#the-challenge">the automaton rule</a>.</p>
<h2 data-track-content data-content-name="the-solution" data-content-piece="solving-aoc20-seating-system" id="the-solution">The Solution</h2>
<p>We are going to solve this puzzle in three different ways. So, let’s abstract the details and solve it top-down.</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> (<span class="dt">Eq</span> a) <span class="ot">=></span> <span class="dt">Grid</span> a <span class="kw">where</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> fromLists ::</span> [[<span class="dt">Cell</span>]] <span class="ot">-></span> a</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> step ::</span> a <span class="ot">-></span> a</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> toLists ::</span> a <span class="ot">-></span> [[<span class="dt">Cell</span>]]</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="ot">solve ::</span> <span class="kw">forall</span> a<span class="op">.</span> (<span class="dt">Grid</span> a) <span class="ot">=></span> <span class="dt">Proxy</span> a <span class="ot">-></span> [[<span class="dt">Cell</span>]] <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>solve _ <span class="ot">=</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> fromLists <span class="op">@</span>a</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> fix step</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> toLists</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> <span class="fu">fmap</span> (<span class="fu">filter</span> (<span class="op">==</span> <span class="dt">Occupied</span>) <span class="op">>>></span> <span class="fu">length</span>)</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> <span class="fu">sum</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> fix f x <span class="ot">=</span> <span class="kw">let</span> x' <span class="ot">=</span> f x <span class="kw">in</span> <span class="kw">if</span> x <span class="op">==</span> x' <span class="kw">then</span> x <span class="kw">else</span> fix f x'</span></code></pre></div>
<p>We solve the challenge using the <code class="sourceCode haskell"><span class="dt">Grid</span></code> typeclass that all our different solutions implement. A grid is specified by three functions:</p>
<ol type="1">
<li><code>fromList</code>: converts a list of lists of cells to the grid.</li>
<li><code>step</code>: runs one step of the <abbr title="Cellular Automata">CA</abbr> simulation.</li>
<li><code>toList</code>: converts the grid back to a list of lists of cells.</li>
</ol>
<p>The <code>solve</code> function calculates the number of finally occupied seats for any instance of the <code class="sourceCode haskell"><span class="dt">Grid</span></code> typeclass by running the simulation till it converges<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>Now, we use <code>solve</code> to solve the challenge in three ways depending on the command line argument supplied:</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> progName <span class="ot"><-</span> getProgName</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> getArgs <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> [gridType, fileName] <span class="ot">-></span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">readFile</span> fileName</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> (<span class="fu">lines</span> <span class="op">>>></span> <span class="fu">map</span> (<span class="fu">map</span> parseCell) <span class="op">>>></span> solve' gridType <span class="op">>>></span> <span class="fu">print</span>)</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">putStrLn</span> <span class="op">$</span> <span class="st">"Usage: "</span> <span class="op"><></span> progName <span class="op"><></span> <span class="st">" -(z|a|s) <input_file>"</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> solve' <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"-z"</span> <span class="ot">-></span> solve <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>(<span class="dt">ZGrid</span> <span class="dt">Cell</span>)</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"-a"</span> <span class="ot">-></span> solve <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>(<span class="dt">AGrid</span> <span class="dt">Cell</span>)</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> <span class="st">"-s"</span> <span class="ot">-></span> solve <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>(<span class="dt">SGrid</span> <span class="dt">Cell</span>)</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">error</span> <span class="st">"Invalid grid type"</span></span></code></pre></div>
<p>We have set up the top (<code>main</code>) and the bottom (<code>rule</code>) of our solutions. Now let’s work on the middle part.</p>
<h2 data-track-content data-content-name="the-zipper" data-content-piece="solving-aoc20-seating-system" id="the-zipper">The Zipper</h2>
<p>To simulate a <abbr title="Cellular Automata">CA</abbr>, we need to focus on each cell of the automaton grid, and run the rule for the cell. What is the first thing that come to minds of functional programmers when we want to focus on a part of a data structure? <a href="https://en.wikipedia.org/wiki/Zipper_(data_structure)" target="_blank" rel="noopener">Zippers</a>!.</p>
<p>Zippers are a special view of data structures, which allow one to navigate and easily update them. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus.</p>
<p>The first way to solve the challenge is the zipper for once-nested lists. Let’s start with creating the zipper for a simple list:</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Zipper</span> a <span class="ot">=</span> <span class="dt">Zipper</span> [a] a [a] <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Functor</span>)</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">zPosition ::</span> <span class="dt">Zipper</span> a <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>zPosition (<span class="dt">Zipper</span> left _ _) <span class="ot">=</span> <span class="fu">length</span> left</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="ot">zLength ::</span> <span class="dt">Zipper</span> a <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>zLength (<span class="dt">Zipper</span> left _ right) <span class="ot">=</span> <span class="fu">length</span> left <span class="op">+</span> <span class="dv">1</span> <span class="op">+</span> <span class="fu">length</span> right</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="ot">listToZipper ::</span> [a] <span class="ot">-></span> <span class="dt">Zipper</span> a</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>listToZipper <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="fu">error</span> <span class="st">"Cannot create Zipper from empty list"</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> (x <span class="op">:</span> xs) <span class="ot">-></span> <span class="dt">Zipper</span> [] x xs</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="ot">zipperToList ::</span> <span class="dt">Zipper</span> a <span class="ot">-></span> [a]</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a>zipperToList (<span class="dt">Zipper</span> left focus right) <span class="ot">=</span> <span class="fu">reverse</span> left <span class="op"><></span> (focus <span class="op">:</span> right)</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="ot">pShowZipper ::</span> (<span class="dt">Show</span> a) <span class="ot">=></span> <span class="dt">Zipper</span> a <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a>pShowZipper (<span class="dt">Zipper</span> left focus right) <span class="ot">=</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">unwords</span> <span class="op">$</span></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> <span class="fu">show</span> (<span class="fu">reverse</span> left) <span class="op"><></span> ((<span class="st">"["</span> <span class="op"><></span> <span class="fu">show</span> focus <span class="op"><></span> <span class="st">"]"</span>) <span class="op">:</span> <span class="fu">map</span> <span class="fu">show</span> right)</span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a><span class="ot">zLeft ::</span> <span class="dt">Zipper</span> a <span class="ot">-></span> <span class="dt">Zipper</span> a</span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a>zLeft z<span class="op">@</span>(<span class="dt">Zipper</span> left focus right) <span class="ot">=</span> <span class="kw">case</span> left <span class="kw">of</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> z</span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a> x <span class="op">:</span> xs <span class="ot">-></span> <span class="dt">Zipper</span> xs x (focus <span class="op">:</span> right)</span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a><span class="ot">zRight ::</span> <span class="dt">Zipper</span> a <span class="ot">-></span> <span class="dt">Zipper</span> a</span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a>zRight z<span class="op">@</span>(<span class="dt">Zipper</span> left focus right) <span class="ot">=</span> <span class="kw">case</span> right <span class="kw">of</span></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> z</span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a> x <span class="op">:</span> xs <span class="ot">-></span> <span class="dt">Zipper</span> (focus <span class="op">:</span> left) x xs</span></code></pre></div>
<p>A list zipper has a focus element, and two lists that capture the elements to the left and right of the focus. We use it through these functions:</p>
<ul>
<li><code>zPosition</code> returns the zero-indexed position of the focus in the zipper.</li>
<li><code>zLength</code> returns the length of the zipper.</li>
<li><code>listToZipper</code> and <code>zipperToList</code> do conversions between lists and zippers.</li>
<li><code>pShowZipper</code> pretty-prints a zipper, highlighting the focus.</li>
<li><code>zLeft</code> and <code>zRight</code> move the zipper’s focus to left and right respectively.</li>
</ul>
<p>Let’s see it all in action:</p>
<div class="sourceCode" id="cb8" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> z <span class="ot">=</span> listToZipper [<span class="dv">1</span><span class="op">..</span><span class="dv">7</span>]</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> pShowZipper z</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>[1] 2 3 4 5 6 7</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> z' <span class="ot">=</span> zRight <span class="op">$</span> zRight <span class="op">$</span> zLeft <span class="op">$</span> zRight <span class="op">$</span> zRight z</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> pShowZipper z'</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>1 2 3 [4] 5 6 7</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zPosition z'</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>3</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zLength z'</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>7</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zipperToList z'</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a>[1,2,3,4,5,6,7]</span></code></pre></div>
<p>Great! Now, what is the zipper for a once-nested list? A once-nested zipper, of course:</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">ZGrid</span> a <span class="ot">=</span> <span class="dt">ZGrid</span> (<span class="dt">Zipper</span> (<span class="dt">Zipper</span> a)) <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Functor</span>)</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="ot">zgPosition ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> (<span class="dt">Int</span>, <span class="dt">Int</span>)</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>zgPosition (<span class="dt">ZGrid</span> rows<span class="op">@</span>(<span class="dt">Zipper</span> _ focus _)) <span class="ot">=</span> (zPosition rows, zPosition focus)</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="ot">zgSize ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> (<span class="dt">Int</span>, <span class="dt">Int</span>)</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>zgSize (<span class="dt">ZGrid</span> rows<span class="op">@</span>(<span class="dt">Zipper</span> _ focus _)) <span class="ot">=</span> (zLength rows, zLength focus)</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="ot">listsToZGrid ::</span> [[a]] <span class="ot">-></span> <span class="dt">ZGrid</span> a</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>listsToZGrid rows <span class="ot">=</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (first <span class="op">:</span> rest) <span class="ot">=</span> <span class="fu">fmap</span> listToZipper rows</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="dt">ZGrid</span> <span class="op">$</span> <span class="dt">Zipper</span> [] first rest</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a><span class="ot">zGridToLists ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> [[a]]</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>zGridToLists (<span class="dt">ZGrid</span> (<span class="dt">Zipper</span> left focus right)) <span class="ot">=</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">reverse</span> (<span class="fu">fmap</span> zipperToList left)</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> (zipperToList focus <span class="op">:</span> <span class="fu">fmap</span> zipperToList right)</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a><span class="ot">pShowZGrid ::</span> (<span class="dt">Show</span> a) <span class="ot">=></span> <span class="dt">ZGrid</span> a <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a>pShowZGrid (<span class="dt">ZGrid</span> (<span class="dt">Zipper</span> left focus right)) <span class="ot">=</span></span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a> intercalate <span class="st">"\n"</span> <span class="op">$</span> pShowRows left <span class="op"><></span> (pShowZipper focus <span class="op">:</span> pShowRows right)</span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> pShowRows <span class="ot">=</span> <span class="fu">map</span> pShowZipper'</span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> pShowZipper' <span class="ot">=</span></span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> zipperToList</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> <span class="fu">splitAt</span> (zPosition focus)</span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> <span class="op">>>></span> \ <span class="op">~</span>(left', focus' <span class="op">:</span> right') <span class="ot">-></span></span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a> <span class="fu">unwords</span> <span class="op">$</span></span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> <span class="fu">show</span> left' <span class="op"><></span> ((<span class="st">" "</span> <span class="op"><></span> <span class="fu">show</span> focus' <span class="op"><></span> <span class="st">" "</span>) <span class="op">:</span> <span class="fu">map</span> <span class="fu">show</span> right')</span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">ZGrid</span></code> is a <code class="sourceCode haskell"><span class="kw">newtype</span></code> over a zipper of zippers. It has functions similar to <code class="sourceCode haskell"><span class="dt">Zipper</span></code> for getting focus, position and size, for conversions to-and-from lists of lists, and for pretty-printing.</p>
<p>Next, the functions to move the focus in the grid:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">zgUp ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> <span class="dt">ZGrid</span> a</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>zgUp (<span class="dt">ZGrid</span> rows) <span class="ot">=</span> <span class="dt">ZGrid</span> <span class="op">$</span> zLeft rows</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="ot">zgDown ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> <span class="dt">ZGrid</span> a</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>zgDown (<span class="dt">ZGrid</span> rows) <span class="ot">=</span> <span class="dt">ZGrid</span> <span class="op">$</span> zRight rows</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="ot">zgLeft ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> <span class="dt">ZGrid</span> a</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a>zgLeft (<span class="dt">ZGrid</span> rows) <span class="ot">=</span> <span class="dt">ZGrid</span> <span class="op">$</span> <span class="fu">fmap</span> zLeft rows</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="ot">zgRight ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> <span class="dt">ZGrid</span> a</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a>zgRight (<span class="dt">ZGrid</span> rows) <span class="ot">=</span> <span class="dt">ZGrid</span> <span class="op">$</span> <span class="fu">fmap</span> zRight rows</span></code></pre></div>
<p>Let’s check them out in GHCi:</p>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zg <span class="ot">=</span> listsToZGrid <span class="op">$</span> <span class="fu">replicate</span> <span class="dv">7</span> <span class="op">$</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">7</span>]</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> pShowZGrid zg</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>[1] 2 3 4 5 6 7</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> 1 2 3 4 5 6 7</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zg' <span class="ot">=</span> zgDown <span class="op">$</span> zgRight <span class="op">$</span> zgDown <span class="op">$</span> zgRight zg</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> pShowZGrid zg'</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a>1 2 [3] 4 5 6 7</span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 7</span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zgPosition zg'</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a>(2,2)</span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zgSize zg'</span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a>(7,7)</span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zGridToLists zg'</span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a>[[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7]]</span></code></pre></div>
<p>It works as expected. Now, how do we use this to simulate a <abbr title="Cellular Automata">CA</abbr>?</p>
<h2 data-track-content data-content-name="the-comonad" data-content-piece="solving-aoc20-seating-system" id="the-comonad">The Comonad</h2>
<p>A <abbr title="Cellular Automata">CA</abbr> requires us to focus on each cell of the grid, and run a rule for the cell that depends on the neighbours of the cell. An Haskell abstraction that neatly fits this requirement is <em><a href="https://bartoszmilewski.com/2017/01/02/comonads/" target="_blank" rel="noopener">Comonad</a></em>.</p>
<p>Comonads are duals of Monads<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. We don’t need to learn everything about them for now. For our purpose, Comonad provides an interface that exactly lines up with what is needed for simulating <abbr title="Cellular Automata">CA</abbr>:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Functor</span> w <span class="ot">=></span> <span class="dt">Comonad</span> w <span class="kw">where</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> extract ::</span> w a <span class="ot">-></span> a</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> duplicate ::</span> w a <span class="ot">-></span> w (w a)</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> extend ::</span> (w a <span class="ot">-></span> b) <span class="ot">-></span> w a <span class="ot">-></span> w b</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="ot">{-# MINIMAL extract, (duplicate | extend) #-}</span></span></code></pre></div>
<p>Assuming we can make <code class="sourceCode haskell"><span class="dt">ZGrid</span></code> a comonad instance, the signatures for the above functions for <code class="sourceCode haskell"><span class="dt">ZGrid</span> <span class="dt">Cell</span></code> would be:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Comonad</span> <span class="dt">ZGrid</span> <span class="kw">where</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> extract ::</span> <span class="dt">ZGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">Cell</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> duplicate ::</span> <span class="dt">ZGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">ZGrid</span> (<span class="dt">ZGrid</span> <span class="dt">Cell</span>)</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> extend ::</span> (<span class="dt">ZGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">Cell</span>) <span class="ot">-></span> <span class="dt">ZGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">ZGrid</span> <span class="dt">Cell</span></span></code></pre></div>
<p>For <code>ZGrid</code> as a <abbr title="Cellular Automata">CA</abbr> comonad:</p>
<ul>
<li>The <code>extract</code> function would return the current focus of the grid.</li>
<li>The <code>duplicate</code> function would return a grid of grids, one inner grid for each possible focus of the input grid.</li>
<li>The <code>extend</code> function would apply the automata rule to each possible focus of the grid, and return a new grid.</li>
</ul>
<p>The nice part is, we need to implement only the <code>extract</code> and <code>duplicate</code> functions, and the generation of the new grid is taken care of automatically by the default implementation of the <code>extend</code> function. Let’s write the comonad instance for <code class="sourceCode haskell"><span class="dt">ZGrid</span></code>.</p>
<p>First, we write the comonad instance for <code class="sourceCode haskell"><span class="dt">Zipper</span></code>:</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Comonad</span> <span class="dt">Zipper</span> <span class="kw">where</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> extract (<span class="dt">Zipper</span> _ focus _) <span class="ot">=</span> focus</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> duplicate zipper <span class="ot">=</span> <span class="dt">Zipper</span> left zipper right</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> pos <span class="ot">=</span> zPosition zipper</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> left <span class="ot">=</span> iterateN pos zLeft <span class="op">$</span> zLeft zipper</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> right <span class="ot">=</span> iterateN (zLength zipper <span class="op">-</span> pos <span class="op">-</span> <span class="dv">1</span>) zRight <span class="op">$</span> zRight zipper</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="ot">iterateN ::</span> <span class="dt">Int</span> <span class="ot">-></span> (a <span class="ot">-></span> a) <span class="ot">-></span> a <span class="ot">-></span> [a]</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>iterateN n f <span class="ot">=</span> <span class="fu">take</span> n <span class="op">.</span> <span class="fu">iterate</span> f</span></code></pre></div>
<p><code>extract</code> for <code class="sourceCode haskell"><span class="dt">Zipper</span></code> simply returns the input zipper’s focus element.</p>
<p><code>duplicate</code> returns a zipper of zippers, with the input zipper as its focus, and the left and right lists of zippers as variation of the input zipper with all possible focuses. Trying out the functions in GHCi gives a better idea:</p>
<div class="sourceCode" id="cb15" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> z <span class="ot">=</span> listToZipper [<span class="dv">1</span><span class="op">..</span><span class="dv">7</span>]<span class="ot"> ::</span> <span class="dt">Zipper</span> <span class="dt">Int</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>t duplicate z</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>duplicate z :: Zipper (Zipper Int)</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">mapM_</span> (<span class="fu">putStrLn</span> <span class="op">.</span> pShowZipper) <span class="op">$</span> zipperToList <span class="op">$</span> duplicate z</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>[1] 2 3 4 5 6 7</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>1 [2] 3 4 5 6 7</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a>1 2 [3] 4 5 6 7</span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a>1 2 3 [4] 5 6 7</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>1 2 3 4 [5] 6 7</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 [6] 7</span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a>1 2 3 4 5 6 [7]</span></code></pre></div>
<p>Great! Now we use similar construction to write the comonad instance for <code class="sourceCode haskell"><span class="dt">ZGrid</span></code>:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Comonad</span> <span class="dt">ZGrid</span> <span class="kw">where</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> extract (<span class="dt">ZGrid</span> grid) <span class="ot">=</span> extract <span class="op">$</span> extract grid</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> duplicate grid <span class="ot">=</span> <span class="dt">ZGrid</span> <span class="op">$</span> <span class="dt">Zipper</span> left focus right</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> (focusRowPos, focusColPos) <span class="ot">=</span> zgPosition grid</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> (rowCount, colCount) <span class="ot">=</span> zgSize grid</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> focus <span class="ot">=</span> <span class="dt">Zipper</span> focusLeft grid focusRight</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> focusLeft <span class="ot">=</span> iterateN focusColPos zgLeft <span class="op">$</span> zgLeft grid</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> focusRight <span class="ot">=</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> iterateN (colCount <span class="op">-</span> focusColPos <span class="op">-</span> <span class="dv">1</span>) zgRight <span class="op">$</span> zgRight grid</span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> left <span class="ot">=</span> iterateN focusRowPos (<span class="fu">fmap</span> zgUp) <span class="op">$</span> <span class="fu">fmap</span> zgUp focus</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> right <span class="ot">=</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> iterateN (rowCount <span class="op">-</span> focusRowPos <span class="op">-</span> <span class="dv">1</span>) (<span class="fu">fmap</span> zgDown) <span class="op">$</span> <span class="fu">fmap</span> zgDown focus</span></code></pre></div>
<p>It works in similar fashion:</p>
<div class="sourceCode" id="cb17" data-lang="ghci"><pre class="sourceCode lhs noNumberSource"><code class="sourceCode literatehaskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> zg <span class="ot">=</span> listsToZGrid <span class="op">$</span> <span class="fu">replicate</span> <span class="dv">4</span> <span class="op">$</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">3</span>]<span class="ot"> ::</span> <span class="dt">ZGrid</span> <span class="dt">Int</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> pShowZGrid zg</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>[0] 1 2 3</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> 0 1 2 3</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> 0 1 2 3</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> 0 1 2 3</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>t duplicate zg</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>duplicate zg :: ZGrid (ZGrid Int)</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>t <span class="fu">mapM_</span> (<span class="fu">putStrLn</span> <span class="op">.</span> pShowZGrid) <span class="op">$</span> <span class="fu">concat</span> <span class="op">$</span> zGridToLists <span class="op">$</span> duplicate zg</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a>mapM_ (putStrLn . pShowZGrid) $ concat $ zGridToLists $ duplicate zg :: IO ()</span></code></pre></div>
<p>I’ve rearranged the output of running the last line of the code above for clarity:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 661 625'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-80pct" style="--image-aspect-ratio: 1.0576" data-src="/images/solving-aoc20-seating-system/gridgrid.svg" alt="Output of duplicate for ZGrid"></img>
<noscript><img src="/images/solving-aoc20-seating-system/gridgrid.svg" class="w-100pct nolink mw-80pct" alt="Output of duplicate for ZGrid"></img></noscript>
<figcaption>Output of <code>duplicate</code> for <code class="sourceCode haskell"><span class="dt">ZGrid</span></code></figcaption>
</figure>
<p>We can see a grid of grids, with one inner grid focussed at each possible focus of the input grid. Now we finally implement the automaton:</p>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">zGridNeighbours ::</span> <span class="dt">ZGrid</span> a <span class="ot">-></span> [a]</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>zGridNeighbours grid <span class="ot">=</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> <span class="fu">snd</span> <span class="op">.</span> nubBy ((<span class="op">==</span>) <span class="ot">`on`</span> <span class="fu">fst</span>) <span class="op">$</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a> [ (pos, extract grid')</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> move <span class="ot"><-</span> moves,</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> grid' <span class="ot">=</span> move grid,</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> pos <span class="ot">=</span> zgPosition grid',</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a> pos <span class="op">/=</span> zgPosition grid</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> moves <span class="ot">=</span></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> [ zgUp, zgDown, zgRight, zgLeft,</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a> zgUp <span class="op">>>></span> zgLeft, zgUp <span class="op">>>></span> zgRight,</span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a> zgDown <span class="op">>>></span> zgLeft, zgDown <span class="op">>>></span> zgRight</span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a><span class="ot">stepZGrid ::</span> <span class="dt">ZGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">ZGrid</span> <span class="dt">Cell</span></span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a>stepZGrid <span class="ot">=</span> extend <span class="op">$</span> \grid <span class="ot">-></span> rule (extract grid) (zGridNeighbours grid)</span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Grid</span> (<span class="dt">ZGrid</span> <span class="dt">Cell</span>) <span class="kw">where</span></span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a> fromLists <span class="ot">=</span> listsToZGrid</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a> step <span class="ot">=</span> stepZGrid</span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a> toLists <span class="ot">=</span> zGridToLists</span></code></pre></div>
<p><code>zGridNeighbours</code> returns the neighbour cells of the currently focussed cell of the grid. It does so by moving the focus in all eight directions, and extracting the new focuses. We also make sure to return unique cells by their position.</p>
<p><code>stepZGrid</code> implements one step of the <abbr title="Cellular Automata">CA</abbr> using the <code>extend</code> function of the <code class="sourceCode haskell"><span class="dt">Comonad</span></code> typeclass. We call <code>extend</code> with a function that takes the current grid, and returns the result of running the <abbr title="Cellular Automata">CA</abbr> <code>rule</code> on its focus and the neighbours of the focus.</p>
<p>Finally, we plug in our functions into the <code class="sourceCode haskell"><span class="dt">ZGrid</span> <span class="dt">Cell</span></code> instance of <code class="sourceCode haskell"><span class="dt">Grid</span></code>.</p>
<p>That’s it! Let’s compile and run the code<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>:</p>
<pre class="plain"><code>❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[1 of 2] Compiling Main ( seating-system.hs, seating-system.o )
[2 of 2] Linking seating-system
❯ time ./seating-system -z input.txt
2243
2.72 real 2.68 user 0.02 sys</code></pre>
<p>I verified with the Advent of Code website that the result is correct. We also see the time elapsed, which is 2.7 seconds. That seems pretty high. Can we do better?</p>
<h2 data-track-content data-content-name="the-array" data-content-piece="solving-aoc20-seating-system" id="the-array">The Array</h2>
<p>The problem with the zipper approach is that lists in Haskell are too slow. Some operations on them like <code>length</code> are <span class="math inline">\(O(n)\)</span>. The are also lazy in spine and value, and build up thunks. We could switch to a different list-like data structure<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>, or cache the grid size and neighbour indices for each index to make it run faster. Or we could try an entirely different approach.</p>
<p>Let’s think about it for a bit. Zippers intermix two things together: the data in the grid, and the focus. When running a step of the <abbr title="Cellular Automata">CA</abbr>, the grid data does not change when focussing on all possible focuses, only the focus itself changes. What if we separate the data from the focus? Maybe that’ll make it faster. Let’s try it out.</p>
<p>Let’s model the grid as combination of a 2D array and an index into the array. We are using the arrays from the <a href="https://hackage.haskell.org/package/massiv" target="_blank" rel="noopener">massiv</a> library.</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">AGrid</span> a <span class="ot">=</span> <span class="dt">AGrid</span> {<span class="ot">aGrid ::</span> <span class="dt">A.Array</span> <span class="dt">A.B</span> <span class="dt">A.Ix2</span> a,<span class="ot"> aGridFocus ::</span> <span class="dt">A.Ix2</span>}</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Functor</span>)</span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">A.Ix2</span></code> is massiv’s way of representing an index into an 2D array, and is essentially same as a two-tuple of <code class="sourceCode haskell"><span class="dt">Int</span></code>s. <code class="sourceCode haskell"><span class="dt">A.Array</span> <span class="dt">A.B</span> <span class="dt">A.Ix2</span> a</code> here means a 2D boxed array of <code>a</code>s. massiv uses <a href="https://hackage.haskell.org/package/massiv/docs/Data-Massiv-Core.html#t:Strategy" target="_blank" rel="noopener">representation strategies</a> to decide how arrays are actually represented in the memory, among which are boxed, unboxed, primitive, storable, delayed etc. Even though primitive and storable arrays are faster, we have to go with boxed arrays here because the <code class="sourceCode haskell"><span class="dt">Functor</span></code> instance of <code class="sourceCode haskell"><span class="dt">A.Array</span></code> exists only for boxed and delayed arrays, and boxed ones are the faster among the two for our purpose.</p>
<p>It is actually massively<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a> easier to write the <code class="sourceCode haskell"><span class="dt">Comonad</span></code> instance for <code class="sourceCode haskell"><span class="dt">AGrid</span></code>:</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Comonad</span> <span class="dt">AGrid</span> <span class="kw">where</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> extract (<span class="dt">AGrid</span> grid focus) <span class="ot">=</span> grid <span class="op">A.!</span> focus</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> extend f (<span class="dt">AGrid</span> grid focus) <span class="ot">=</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">AGrid</span> (A.compute <span class="op">$</span> A.imap (\pos _ <span class="ot">-></span> f <span class="op">$</span> <span class="dt">AGrid</span> grid pos) grid) focus</span></code></pre></div>
<p>The <code>extract</code> implementation simply looks up the element from the array at the focus index. This time, we don’t need to implement <code>duplicate</code> because it is easier to implement <code>extend</code> directly. We map with index (<code class="sourceCode haskell">A.imap</code>) over the grid, calling the function <code>f</code> for the variation of the grid with the index as the focus.</p>
<p>Next, we write the <abbr title="Cellular Automata">CA</abbr> step:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">listsToAGrid ::</span> [[<span class="dt">Cell</span>]] <span class="ot">-></span> <span class="dt">AGrid</span> <span class="dt">Cell</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>listsToAGrid <span class="ot">=</span> A.fromLists' <span class="dt">A.Seq</span> <span class="op">>>></span> <span class="fu">flip</span> <span class="dt">AGrid</span> (<span class="dv">0</span> <span class="op">:.</span> <span class="dv">0</span>)</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="ot">aGridNeighbours ::</span> <span class="dt">AGrid</span> a <span class="ot">-></span> [a]</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>aGridNeighbours (<span class="dt">AGrid</span> grid (x <span class="op">:.</span> y)) <span class="ot">=</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> [ grid <span class="op">A.!</span> (x <span class="op">+</span> i <span class="op">:.</span> y <span class="op">+</span> j)</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> i <span class="ot"><-</span> [<span class="op">-</span><span class="dv">1</span>, <span class="dv">0</span>, <span class="dv">1</span>],</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> j <span class="ot"><-</span> [<span class="op">-</span><span class="dv">1</span>, <span class="dv">0</span>, <span class="dv">1</span>],</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a> (x <span class="op">+</span> i, y <span class="op">+</span> j) <span class="op">/=</span> (x, y),</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a> validIndex (x <span class="op">+</span> i, y <span class="op">+</span> j)</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">A.Sz</span> (rowCount <span class="op">:.</span> colCount) <span class="ot">=</span> A.size grid</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a> validIndex (a, b) <span class="ot">=</span> <span class="fu">and</span> [a <span class="op">>=</span> <span class="dv">0</span>, b <span class="op">>=</span> <span class="dv">0</span>, a <span class="op"><</span> rowCount, b <span class="op"><</span> colCount]</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a><span class="ot">stepAGrid ::</span> <span class="dt">AGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">AGrid</span> <span class="dt">Cell</span></span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a>stepAGrid <span class="ot">=</span> extend <span class="op">$</span> \grid <span class="ot">-></span> rule (extract grid) (aGridNeighbours grid)</span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Grid</span> (<span class="dt">AGrid</span> <span class="dt">Cell</span>) <span class="kw">where</span></span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a> fromLists <span class="ot">=</span> listsToAGrid</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a> step <span class="ot">=</span> stepAGrid</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a> toLists <span class="ot">=</span> aGrid <span class="op">>>></span> A.toLists</span></code></pre></div>
<p><code>listsToAGrid</code> converts a list of lists of cells into an <code class="sourceCode haskell"><span class="dt">AGrid</span></code> focussed at <code>(0,0)</code>. <code>aGridNeighbours</code> finds the neighbours of the current focus of a grid by directly looking up the valid neighbour indices into the array. <code>stepAGrid</code> calls <code>extract</code> and <code>aGridNeighbours</code> to implement the <abbr title="Cellular Automata">CA</abbr> step, much like the <code class="sourceCode haskell"><span class="dt">ZGrid</span></code> case. And finally, we create the <code class="sourceCode haskell"><span class="dt">AGrid</span> <span class="dt">Cell</span></code> instance of <code class="sourceCode haskell"><span class="dt">Grid</span></code>.</p>
<p>Let’s compile and run it:</p>
<pre class="plain"><code>❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -a input.txt
2243
0.10 real 0.09 user 0.00 sys</code></pre>
<p>Woah! It takes only 0.1 second this time. Can we do even better?</p>
<h2 data-track-content data-content-name="the-stencil" data-content-piece="solving-aoc20-seating-system" id="the-stencil">The Stencil</h2>
<p>massiv has a construct called <em><a href="https://hackage.haskell.org/package/massiv/docs/Data-Massiv-Array-Stencil.html" target="_blank" rel="noopener">Stencil</a></em> that can be used for simulating <abbr title="Cellular Automata">CA</abbr>:</p>
<blockquote>
<p>Stencil is abstract description of how to handle elements in the neighborhood of every array cell in order to compute a value for the cells in the new array.</p>
</blockquote>
<p>That sounds like exactly what we need. Let’s try it out next.</p>
<p>With stencils, we do not need the instance of <code class="sourceCode haskell"><span class="dt">Comonad</span></code> for the grid. So we can switch to the faster unboxed array representation:</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="kw">instance</span> <span class="dt">VU.MVector</span> s <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">MV_Char</span> (<span class="dt">VU.MVector</span> s <span class="dt">Char</span>)</span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="kw">instance</span> <span class="dt">VU.Vector</span> <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">V_Char</span> (<span class="dt">VU.Vector</span> <span class="dt">Char</span>)</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="kw">deriving</span> <span class="kw">instance</span> <span class="dt">VGM.MVector</span> <span class="dt">VU.MVector</span> <span class="dt">Cell</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a><span class="kw">deriving</span> <span class="kw">instance</span> <span class="dt">VG.Vector</span> <span class="dt">VU.Vector</span> <span class="dt">Cell</span></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">VU.Unbox</span> <span class="dt">Cell</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">SGrid</span> a <span class="ot">=</span> <span class="dt">A.Array</span> <span class="dt">A.U</span> <span class="dt">A.Ix2</span> a</span></code></pre></div>
<p>First five lines make <code class="sourceCode haskell"><span class="dt">Cell</span></code> an instance of the <a href="https://hackage.haskell.org/package/vector/docs/Data-Vector-Unboxed.html#t:Unbox" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Unbox</span></code></a> typeclass. We chose to make <code class="sourceCode haskell"><span class="dt">Cell</span></code> a <code class="sourceCode haskell"><span class="kw">newtype</span></code> wrapper over <code class="sourceCode haskell"><span class="dt">Char</span></code> because <code class="sourceCode haskell"><span class="dt">Char</span></code> has an <code class="sourceCode haskell"><span class="dt">Unbox</span></code> instance.</p>
<p>Then we define a new grid type <code class="sourceCode haskell"><span class="dt">SGrid</span></code> that is an 2D unboxed array.</p>
<p>Now, we define the stencil and the step function for our <abbr title="Cellular Automata">CA</abbr>:</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">ruleStencil ::</span> <span class="dt">A.Stencil</span> <span class="dt">A.Ix2</span> <span class="dt">Cell</span> <span class="dt">Cell</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>ruleStencil <span class="ot">=</span> AU.makeUnsafeStencil (<span class="dt">A.Sz</span> (<span class="dv">3</span> <span class="op">:.</span> <span class="dv">3</span>)) (<span class="dv">1</span> <span class="op">:.</span> <span class="dv">1</span>) <span class="op">$</span> \_ get <span class="ot">-></span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> rule (get (<span class="dv">0</span> <span class="op">:.</span> <span class="dv">0</span>)) <span class="op">$</span> <span class="fu">map</span> get neighbourIndexes</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> neighbourIndexes <span class="ot">=</span></span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> [ <span class="op">-</span><span class="dv">1</span> <span class="op">:.</span> <span class="op">-</span><span class="dv">1</span>, <span class="op">-</span><span class="dv">1</span> <span class="op">:.</span> <span class="dv">0</span>, <span class="op">-</span><span class="dv">1</span> <span class="op">:.</span> <span class="dv">1</span>,</span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> <span class="dv">0</span> <span class="op">:.</span> <span class="op">-</span><span class="dv">1</span>, <span class="dv">0</span> <span class="op">:.</span> <span class="dv">1</span>,</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> <span class="dv">1</span> <span class="op">:.</span> <span class="op">-</span><span class="dv">1</span>, <span class="dv">1</span> <span class="op">:.</span> <span class="dv">0</span>, <span class="dv">1</span> <span class="op">:.</span> <span class="dv">1</span></span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a><span class="ot">stepSGrid ::</span> <span class="dt">SGrid</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">SGrid</span> <span class="dt">Cell</span></span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a>stepSGrid <span class="ot">=</span> A.mapStencil (<span class="dt">A.Fill</span> <span class="dt">Floor</span>) ruleStencil <span class="op">>>></span> A.computeP</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Grid</span> (<span class="dt">SGrid</span> <span class="dt">Cell</span>) <span class="kw">where</span></span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a> fromLists <span class="ot">=</span> A.fromLists' <span class="dt">A.Seq</span></span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> step <span class="ot">=</span> stepSGrid</span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a> toLists <span class="ot">=</span> A.toLists</span></code></pre></div>
<p>We make a stencil of size 3-by-3, where the focus is at index <code>(1,1)</code> relative to the stencil’s top-left cell. In the callback function, we use the supplied <code>get</code> function to get the neighbours of the focus by using indices relative to the focus, and call <code>rule</code> with the cells at focus and neighbour indices.</p>
<p>Then we write the step function <code>stepSGrid</code> that maps the stencil over the grid. Finally we put everything together in the <code class="sourceCode haskell"><span class="dt">SGrid</span> <span class="dt">Cell</span></code> instance of <code class="sourceCode haskell"><span class="dt">Grid</span></code>.</p>
<p>Let’s compile and run it:</p>
<pre class="plain"><code>❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt
2243
0.08 real 0.07 user 0.00 sys</code></pre>
<p>It is only a bit faster than the previous solution. But, this time we have another trick up our sleeves. Did you notice <code>A.computeP</code> we sneaked in there? With stencils, we can now run the step for all cells in parallel! Let’s recompile it with the right options and run it again:</p>
<pre class="plain"><code>❯ rm ./seating-system
❯ nix-shell -p "ghc.withPackages (p: [p.massiv p.comonad])" \
--run "ghc --make seating-system.hs -O2 -threaded -rtsopts"
[2 of 2] Linking seating-system
❯ time ./seating-system -s input.txt +RTS -N
2243
0.04 real 0.11 user 0.05 sys</code></pre>
<p>The <code>-threaded</code> option enables multithreading, and the <code>+RTS -N</code> option makes the process use all CPU cores<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. We get a nice speedup of 2x over the single-threaded version.</p>
<h2 class="notoc" data-track-content data-content-name="bonus-round-simulation-visualization" data-content-piece="solving-aoc20-seating-system" id="bonus-round-simulation-visualization">Bonus Round: Simulation Visualization</h2>
<p>Since you’ve read the entire post, here is a bonus visualization of the <abbr title="Cellular Automata">CA</abbr> simulation for you (warning: lots of fast blinking):</p>
<details>
<summary class="print-hide">
Play the simulation
</summary>
<img src class="lazyload nolink w-100pct mw-80pct" data-src="/images/solving-aoc20-seating-system/ca.gif"></img>
<noscript><img src="/images/solving-aoc20-seating-system/ca.gif" class="nolink w-100pct mw-80pct"></img></noscript>
</details>
<p>That’s it for this post! I hope you enjoyed it and took something away from it. The full code for this post is available <a href="https://abhinavsarkar.net/code/seating-system.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>The reason for using a <code class="sourceCode haskell"><span class="kw">newtype</span></code> instead of a <code class="sourceCode haskell"><span class="kw">data</span></code> is explained in the <a href="#the-stencil">Stencil</a> section.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>If you are unfamiliar, <a href="https://hackage.haskell.org/package/base/docs/Control-Arrow.html#v:-62--62--62-" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="op">>>></span></code></a> is the left-to-right function composition function:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode haskell noNumberSource"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>f <span class="op">>>></span> g <span class="ot">=</span> g <span class="op">.</span> f</span></code></pre></div>
<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn3"><p>This <a href="https://www.quora.com/What-is-a-Comonad-and-when-should-I-use-them/answer/Bartosz-Milewski" target="_blank" rel="noopener">short post</a> by <a href="https://bartoszmilewski.com/" target="_blank" rel="noopener">Bartosz Milewski</a> explains how comonads and monads are related.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>We use <a href="https://nixos.org" target="_blank" rel="noopener">Nix</a> for getting the dependency libraries.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>I did try a variation with <a href="https://hackage.haskell.org/package/containers/docs/Data-Sequence.html#t:Seq" target="_blank" rel="noopener"><code>Data.Sequence.Seq</code></a> instead of lists, and it was twice as fast.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>Pun very much intended.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>I tried running the process with different values of <code>N</code> and found that <code>N4</code> gave the fastest results. So, <a href="https://en.wikipedia.org/wiki/Amdahl’s_law" target="_blank" rel="noopener">Amdahl’s law</a> applies here.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<strong>“Seating System” with Comonads and Stencils</strong> 👈
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2025-01-05T00:00:00Z <p>In this post, we solve the Advent of Code 2020 <a href="https://adventofcode.com/2020/day/11" target="_blank" rel="noopener">“Seating System”</a> challenge in Haskell using comonads and stencils.</p>
https://abhinavsarkar.net/posts/repling-with-haskeline/ Going REPLing with Haskeline 2024-10-31T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a <a href="https://en.wikipedia.org/wiki/REPL" target="_blank" rel="noopener">REPL</a>! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/repling-with-haskeline/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more-->
<nav id="toc"><h3>Contents</h3><ol><li><a href="#the-demo">The Demo</a></li><li><a href="#dawn-of-a-new-language">Dawn of a New Language</a></li><li><a href="#a-repl-of-our-own">A REPL of Our Own</a></li><li><a href="#state-and-settings">State and Settings</a></li><li><a href="#repling-down-the-prompt">REPLing Down the Prompt</a></li><li><a href="#reading-the-input">Reading the Input</a></li><li><a href="#evaluating-the-input">Evaluating the Input</a></li><li><a href="#doing-the-completions">Doing the Completions</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="the-demo" data-content-piece="repling-with-haskeline" id="the-demo">The Demo</h2>
<p>First a short demo:</p>
<div class="asciinema" data-src="/files/repling-with-haskeline/demo.cast">
</div>
<noscript>
<details>
<summary>
Play demo
</summary>
<img src class="lazyload nolink w-100pct" data-src="/images/repling-with-haskeline/demo.gif"></img>
<noscript><img src="/images/repling-with-haskeline/demo.gif" class="nolink w-100pct"></img></noscript>
</details>
</noscript>
<p>That is a pretty good REPL, isn’t it? You can even <a href="https://demo.abnv.me/fibolisp/" target="_blank" rel="noopener">try it online</a><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>, running entirely in your browser.</p>
<h2 data-track-content data-content-name="dawn-of-a-new-language" data-content-piece="repling-with-haskeline" id="dawn-of-a-new-language">Dawn of a New Language</h2>
<p>Let’s assume that we have created a new small <a href="https://en.wikipedia.org/wiki/Lisp_(programming_language)" target="_blank" rel="noopener">Lisp</a><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>, just large enough to be able to conveniently write and run the Fibonacci function that returns the nth <a href="https://en.wikipedia.org/wiki/Fibonacci_sequence" target="_blank" rel="noopener">Fibonacci number</a>. That’s it, nothing more. This lets us focus on the features of the REPL<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>, not the language.</p>
<p>We have a parser to parse the code from text to an AST, and an interpreter that evaluates an AST and returns a value. We are not going into the details of the parser and the interpreter, just listing the type signatures of the functions they provide is enough for this post.</p>
<p>Let’s start with the AST:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp.Types</span> <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Text</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Text</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Text.Lazy</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">LText</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Pretty.Simple</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">PS</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Printf</span> (printf)</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Ident</span> <span class="ot">=</span> <span class="dt">String</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Num_</span> <span class="dt">Integer</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Bool_</span> <span class="dt">Bool</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Var</span> <span class="dt">Ident</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BinaryOp</span> <span class="dt">Op</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">If</span> <span class="dt">Expr</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Apply</span> <span class="dt">Ident</span> [<span class="dt">Expr</span>]</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Op</span> <span class="ot">=</span> <span class="dt">Add</span> <span class="op">|</span> <span class="dt">Sub</span> <span class="op">|</span> <span class="dt">LessThan</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Enum</span>)</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Def</span> <span class="ot">=</span> <span class="dt">Def</span> {<span class="ot">defName ::</span> <span class="dt">Ident</span>,<span class="ot"> defParams ::</span> [<span class="dt">Ident</span>],<span class="ot"> defBody ::</span> <span class="dt">Expr</span>}</span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Program</span> <span class="ot">=</span> <span class="dt">Program</span> [<span class="dt">Def</span>] [<span class="dt">Expr</span>]</span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a><span class="ot">carKeywords ::</span> [<span class="dt">String</span>]</span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a>carKeywords <span class="ot">=</span> [<span class="st">"def"</span>, <span class="st">"if"</span>, <span class="st">"+"</span>, <span class="st">"-"</span>, <span class="st">"<"</span>]</span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Def</span> <span class="kw">where</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="dt">Def</span> {<span class="op">..</span>} <span class="ot">=</span></span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> printf <span class="st">"(Def %s [%s] (%s))"</span> defName (<span class="fu">unwords</span> defParams) (<span class="fu">show</span> defBody)</span>
<span id="cb1-33"><a href="#cb1-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-34"><a href="#cb1-34" aria-hidden="true" tabindex="-1"></a><span class="ot">showProgram ::</span> <span class="dt">Program</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb1-35"><a href="#cb1-35" aria-hidden="true" tabindex="-1"></a>showProgram <span class="ot">=</span></span>
<span id="cb1-36"><a href="#cb1-36" aria-hidden="true" tabindex="-1"></a> Text.unpack</span>
<span id="cb1-37"><a href="#cb1-37" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> LText.toStrict</span>
<span id="cb1-38"><a href="#cb1-38" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> PS.pShowOpt</span>
<span id="cb1-39"><a href="#cb1-39" aria-hidden="true" tabindex="-1"></a> ( PS.defaultOutputOptionsNoColor</span>
<span id="cb1-40"><a href="#cb1-40" aria-hidden="true" tabindex="-1"></a> { PS.outputOptionsIndentAmount <span class="ot">=</span> <span class="dv">2</span>,</span>
<span id="cb1-41"><a href="#cb1-41" aria-hidden="true" tabindex="-1"></a> PS.outputOptionsCompact <span class="ot">=</span> <span class="dt">True</span>,</span>
<span id="cb1-42"><a href="#cb1-42" aria-hidden="true" tabindex="-1"></a> PS.outputOptionsCompactParens <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb1-43"><a href="#cb1-43" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-44"><a href="#cb1-44" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>That’s right! We named our little language FiboLisp.</p>
<p>FiboLisp is expression oriented; everything is an expression. So naturally, we have an <code class="sourceCode haskell"><span class="dt">Expr</span></code> AST. Writing the Fibonacci function requires not many syntactic facilities. In FiboLisp we have:</p>
<ul>
<li>integer numbers,</li>
<li>booleans,</li>
<li>variables,</li>
<li>addition, subtraction, and less-than binary operations on numbers,</li>
<li>conditional <code>if</code> expressions, and</li>
<li>function calls by name.</li>
</ul>
<p>We also have function definitions, captured by <code class="sourceCode haskell"><span class="dt">Def</span></code>, which records the function name, its parameter names, and its body as an expression.</p>
<p>And finally we have <code class="sourceCode haskell"><span class="dt">Program</span></code>s, which are a bunch of function definitions to define, and another bunch of expressions to evaluate.</p>
<p>Short and simple. We don’t need anything more<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. This is how the Fibonacci function looks in FiboLisp:</p>
<div class="sourceCode" id="cb2" data-lang="fibolisp"><pre class="sourceCode clj numberSource"><code class="sourceCode clojure"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> fibo </span>[n]</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> (<span class="kw">if</span> (<span class="kw"><</span> n <span class="dv">2</span>)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> n</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> (<span class="kw">+</span> (fibo (<span class="kw">-</span> n <span class="dv">1</span>)) (fibo (<span class="kw">-</span> n <span class="dv">2</span>)))))</span></code></pre></div>
<p>We can see all the AST types in use here. Note that FiboLisp is lexically scoped.</p>
<p>The module also lists a bunch of keywords (<code>carKeywords</code>) that can appear in the <em>car</em><a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a> position of a Lisp expression, that we use later for auto-completion in the REPL, and some functions to convert the AST types to nice looking strings.</p>
<p>For the parser, we have this pared-down code:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp.Parser</span> (<span class="dt">ParsingError</span>(<span class="op">..</span>), parse) <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.DeepSeq</span> (<span class="dt">NFData</span>)</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span> (<span class="dt">Exception</span>)</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.Generics</span> (<span class="dt">Generic</span>)</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Types</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="ot">parse ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Either</span> <span class="dt">ParsingError</span> <span class="dt">Program</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ParsingError</span> <span class="ot">=</span> <span class="dt">ParsingError</span> <span class="dt">String</span> <span class="op">|</span> <span class="dt">EndOfStreamError</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Generic</span>, <span class="dt">NFData</span>)</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Exception</span> <span class="dt">ParsingError</span></span></code></pre></div>
<p>The essential function is <code>parse</code>, which takes the code as a string, and returns either a <code class="sourceCode haskell"><span class="dt">ParsingError</span></code> on failure, or a <code class="sourceCode haskell"><span class="dt">Program</span></code> on success. If the parser detects that an <a href="https://en.wikipedia.org/wiki/S-expression" target="_blank" rel="noopener">S-expression</a> is not properly closed, it returns an <code class="sourceCode haskell"><span class="dt">EndOfStreamError</span></code> error.</p>
<p>We also have this pretty-printer module that converts function ASTs back to pretty Lisp code:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp.Printer</span> (prettyShowDef) <span class="kw">where</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Types</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">prettyShowDef ::</span> <span class="dt">Def</span> <span class="ot">-></span> <span class="dt">String</span></span></code></pre></div>
<p>Finally, the last thing before we hit the real topic of this post, the FiboLisp interpreter:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp.Interpreter</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Value</span>, <span class="dt">RuntimeError</span>, interpret, builtinFuncs, builtinVals) <span class="kw">where</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.DeepSeq</span> (<span class="dt">NFData</span>)</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span> (<span class="dt">Exception</span>)</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Map.Strict</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.Generics</span> (<span class="dt">Generic</span>)</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Types</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="ot">interpret ::</span> (<span class="dt">String</span> <span class="ot">-></span> <span class="dt">IO</span> ()) <span class="ot">-></span> <span class="dt">Program</span> <span class="ot">-></span> <span class="dt">IO</span> (<span class="dt">Either</span> <span class="dt">RuntimeError</span> <span class="dt">Value</span>)</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">RuntimeError</span> <span class="ot">=</span> <span class="dt">RuntimeError</span> <span class="dt">String</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Generic</span>, <span class="dt">NFData</span>)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Exception</span> <span class="dt">RuntimeError</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Value</span> <span class="ot">=</span> <span class="op">...</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Generic</span>, <span class="dt">NFData</span>)</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a><span class="ot">builtinFuncs ::</span> <span class="dt">Map.Map</span> <span class="dt">String</span> <span class="dt">Value</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="ot">builtinVals ::</span> [<span class="dt">Value</span>]</span></code></pre></div>
<p>We have elided the details again. All that matters to us is the <code>interpret</code> function that takes a program, and returns either a runtime error or a value. <code class="sourceCode haskell"><span class="dt">Value</span></code> is the runtime representation of the values of FiboLisp expressions, and all we care about is that it can be <code>show</code>n and fully evaluated via <code class="sourceCode haskell"><span class="dt">NFData</span></code><a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. <code>interpret</code> also takes a <code class="sourceCode haskell"><span class="dt">String</span> <span class="ot">-></span> <span class="dt">IO</span> ()</code> function, that’ll be demystified when we get into implementing the REPL.</p>
<p>Lastly, we have a map of built-in functions and a list of built-in values. We expose them so that they can be treated specially in the REPL.</p>
<p>If you want, you can go ahead and fill in the missing code using your favourite parsing and pretty-printing libraries<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>, and the method of writing interpreters. For this post, those implementation details are not necessary.</p>
<p>Let’s package all this functionality into a module for ease of importing:</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> ( <span class="kw">module</span> <span class="dt">Language.FiboLisp.Types</span>,</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">module</span> <span class="dt">Language.FiboLisp.Parser</span>,</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">module</span> <span class="dt">Language.FiboLisp.Printer</span>,</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">module</span> <span class="dt">Language.FiboLisp.Interpreter</span>,</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="kw">where</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Interpreter</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Parser</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Printer</span></span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp.Types</span></span></code></pre></div>
<p>Now, with all the preparations done, we can go REPLing.</p>
<h2 data-track-content data-content-name="a-repl-of-our-own" data-content-piece="repling-with-haskeline" id="a-repl-of-our-own">A REPL of Our Own</h2>
<p>The main functionality that a REPL provides is entering expressions and definitions, one at a time, that it <strong>R</strong>eads, <strong>E</strong>valuates, and <strong>P</strong>rints, and then <strong>L</strong>oops back, letting us do the same again. This can be accomplished with a simple program that prompts the user for an input and does all these with it. However, such a REPL will be quite lackluster.</p>
<p>These days programming languages come with advanced REPLs like <a href="https://ipython.org/" target="_blank" rel="noopener">IPython</a> and <a href="https://nrepl.org/" target="_blank" rel="noopener">nREPL</a>, which provide many functionalities beyond simple REPLing. We want FiboLisp to have a great REPL too.</p>
<p>You may have already noticed some advanced features that our REPL provides in the demo. Let’s state them here:</p>
<ol type="1">
<li>Commands starting with colon:
<ol type="1">
<li>to set and unset settings: <code>:set</code> and <code>:unset</code>,</li>
<li>to load files into the REPL: <code>:load</code>,</li>
<li>to show the source code of functions: <code>:source</code>,</li>
<li>to show a help message: <code>:help</code>.</li>
</ol></li>
<li>Settings to enable/disable:
<ol type="1">
<li>dumping of parsed ASTs: <code>dump</code>,</li>
<li>showing program execution times: <code>time</code>.</li>
</ol></li>
<li>Multiline expressions and functions, with correct indentation.</li>
<li>Colored output and messages.</li>
<li>Auto-completion of commands, code and file names.</li>
<li>Safety checks when loading files.</li>
<li><a href="https://en.wikipedia.org/wiki/GNU_Readline" target="_blank" rel="noopener">Readline</a>-like navigation through the history of previous inputs.</li>
</ol>
<p><a href="https://hackage.haskell.org/package/haskeline" target="_blank" rel="noopener">Haskeline</a> — the Haskell library that we use to create the REPL — provides only basic functionalities, upon which we build to provide these features. Let’s begin.</p>
<h2 data-track-content data-content-name="state-and-settings" data-content-piece="repling-with-haskeline" id="state-and-settings">State and Settings</h2>
<p>As usual, we start the module with many imports<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>:</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TemplateHaskell #-}</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Language.FiboLisp.Repl</span> (run) <span class="kw">where</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.DeepSeq</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">DS</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Exception</span> (<span class="dt">Exception</span> (..), evaluate)</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Lens.Basic</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Lens</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (when)</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Catch</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Catch</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.IO.Class</span> (<span class="dt">MonadIO</span>, liftIO)</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Identity</span> (<span class="dt">IdentityT</span> (..))</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Reader</span> (<span class="dt">MonadReader</span>, <span class="dt">ReaderT</span> (runReaderT))</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Reader</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Reader</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.State.Strict</span> (<span class="dt">MonadState</span>, <span class="dt">StateT</span> (runStateT))</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.State.Strict</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">State</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Trans</span> (<span class="dt">MonadTrans</span>, lift)</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Char</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Functor</span> ((<&>))</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a> (dropWhileEnd, foldl', isPrefixOf, isSuffixOf, nub, <span class="fu">sort</span>, stripPrefix)</span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Map.Strict</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Maybe</span> (fromJust)</span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Set</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Set</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Time</span> (<span class="dt">NominalDiffTime</span>, diffUTCTime, getCurrentTime)</span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Language.FiboLisp</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">L</span></span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Console.Haskeline</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">H</span></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Console.Terminfo</span> <span class="kw">qualified</span> <span class="kw">as</span> <span class="dt">Term</span></span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Directory</span> (canonicalizePath, doesFileExist, getCurrentDirectory)</span></code></pre></div>
<p>Notice that we import the previously shown <code class="sourceCode haskell"><span class="dt">Language.FiboLisp</span></code> module qualified as <code class="sourceCode haskell"><span class="dt">L</span></code>, and Haskeline as <code class="sourceCode haskell"><span class="dt">H</span></code>. Another important library that we use here is <a href="https://hackage.haskell.org/package/terminfo" target="_blank" rel="noopener">terminfo</a>, which helps us do colored output.</p>
<p>A REPL must preserve the context through a session. In case of FiboLisp, this means we should be able to define a function<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a> as one input, and then use it later in the session, one or many times<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. The REPL should also respect the REPL settings through the session till they are unset.</p>
<p>Additionally, the REPL has to remember whether it is in middle of writing a multiline input. To support multiline input, the REPL also needs to remember the previous indentation, and the input done in previous lines of a multiline input. Together these form the <code class="sourceCode haskell"><span class="dt">ReplState</span></code>:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ReplState</span> <span class="ot">=</span> <span class="dt">ReplState</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> _replDefs ::</span> <span class="dt">Defs</span>,</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> _replSettings ::</span> <span class="dt">Settings</span>,</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> _replLineMode ::</span> <span class="dt">LineMode</span>,</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="ot"> _replIndent ::</span> <span class="dt">Int</span>,</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="ot"> _replSeenInput ::</span> <span class="dt">String</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Defs</span> <span class="ot">=</span> <span class="dt">Map.Map</span> <span class="dt">L.Ident</span> <span class="dt">L.Def</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Settings</span> <span class="ot">=</span> <span class="dt">Set.Set</span> <span class="dt">Setting</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Setting</span> <span class="ot">=</span> <span class="dt">Dump</span> <span class="op">|</span> <span class="dt">MeasureTime</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Ord</span>, <span class="dt">Enum</span>)</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">LineMode</span> <span class="ot">=</span> <span class="dt">SingleLine</span> <span class="op">|</span> <span class="dt">MultiLine</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>)</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Setting</span> <span class="kw">where</span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">Dump</span> <span class="ot">-></span> <span class="st">"dump"</span></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">MeasureTime</span> <span class="ot">-></span> <span class="st">"time"</span></span></code></pre></div>
<p>Let’s deal with settings first. We set and unset settings using the <code>:set</code> and <code>:unset</code> commands. So, we write the code to parse setting the settings:</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">SettingMode</span> <span class="ot">=</span> <span class="dt">Set</span> <span class="op">|</span> <span class="dt">Unset</span> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Enum</span>)</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">SettingMode</span> <span class="kw">where</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Set</span> <span class="ot">-></span> <span class="st">":set"</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Unset</span> <span class="ot">-></span> <span class="st">":unset"</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="ot">parseSetting ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Setting</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>parseSetting <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> <span class="st">"dump"</span> <span class="ot">-></span> <span class="dt">Just</span> <span class="dt">Dump</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"time"</span> <span class="ot">-></span> <span class="dt">Just</span> <span class="dt">MeasureTime</span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a><span class="ot">parseSettingMode ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">SettingMode</span></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>parseSettingMode <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> <span class="st">":set"</span> <span class="ot">-></span> <span class="dt">Just</span> <span class="dt">Set</span></span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="st">":unset"</span> <span class="ot">-></span> <span class="dt">Just</span> <span class="dt">Unset</span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a><span class="ot">parseSettingCommand ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Either</span> <span class="dt">String</span> (<span class="dt">SettingMode</span>, <span class="dt">Setting</span>)</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a>parseSettingCommand command <span class="ot">=</span> <span class="kw">case</span> <span class="fu">words</span> command <span class="kw">of</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> [modeStr, settingStr] <span class="ot">-></span> <span class="kw">case</span> parseSettingMode modeStr <span class="kw">of</span></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> mode <span class="ot">-></span> <span class="kw">case</span> parseSetting settingStr <span class="kw">of</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> setting <span class="ot">-></span> <span class="dt">Right</span> (mode, setting)</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Unknown setting: "</span> <span class="op"><></span> settingStr</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Unknown command: "</span> <span class="op"><></span> command</span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> [modeStr]</span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Just</span> _ <span class="ot"><-</span> parseSettingMode modeStr <span class="ot">-></span> <span class="dt">Left</span> <span class="st">"No setting specified"</span></span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Unknown command: "</span> <span class="op"><></span> command</span></code></pre></div>
<p>Nothing fancy here, just splitting the input into words and going through them to make sure they are valid.</p>
<p>The REPL is a monad that wraps over <code class="sourceCode haskell"><span class="dt">ReplState</span></code>:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Repl</span> a <span class="ot">=</span> <span class="dt">Repl</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> runRepl_ ::</span> <span class="dt">StateT</span> <span class="dt">ReplState</span> (<span class="dt">ReaderT</span> <span class="dt">AddColor</span> <span class="dt">IO</span>) a</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Functor</span>,</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Applicative</span>,</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span>,</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadIO</span>,</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadState</span> <span class="dt">ReplState</span>,</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadReader</span> <span class="dt">AddColor</span>,</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Catch.MonadThrow</span>,</span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Catch.MonadCatch</span>,</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Catch.MonadMask</span></span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">AddColor</span> <span class="ot">=</span> <span class="dt">Term.Color</span> <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a><span class="ot">runRepl ::</span> <span class="dt">AddColor</span> <span class="ot">-></span> <span class="dt">Repl</span> a <span class="ot">-></span> <span class="dt">IO</span> a</span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a>runRepl addColor <span class="ot">=</span></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> <span class="fu">fst</span></span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">flip</span> runReaderT addColor</span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">flip</span> runStateT (<span class="dt">ReplState</span> Map.empty Set.empty <span class="dt">SingleLine</span> <span class="dv">0</span> <span class="st">""</span>)</span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runRepl_</span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">Repl</span></code> also lets us do IO — is it really a REPL if you can’t do printing — and deal with exceptions. Additionally, we have a read-only state that is a function, which will be explained soon. The REPL starts in the single line mode, with no indentation, functions definitions, settings, or previously seen input.</p>
<h2 data-track-content data-content-name="repling-down-the-prompt" data-content-piece="repling-with-haskeline" id="repling-down-the-prompt">REPLing Down the Prompt</h2>
<p>Let’s go top-down. We write the <code>run</code> function that is the entry point of this module:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">run ::</span> <span class="dt">IO</span> ()</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>run <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> term <span class="ot"><-</span> Term.setupTermFromEnv</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> addColor <span class="ot">=</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> Term.getCapability term <span class="op">$</span> Term.withForegroundColor <span class="op">@</span><span class="dt">String</span> <span class="kw">of</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> fc <span class="ot">-></span> fc</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> \_ s <span class="ot">-></span> s</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> runRepl addColor <span class="op">.</span> H.runInputT settings <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> H.outputStrLn <span class="op">$</span> addColor promptColor <span class="st">"FiboLisp REPL"</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> H.outputStrLn <span class="op">$</span> addColor infoColor <span class="st">"Press <TAB> to start"</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> repl</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> settings <span class="ot">=</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> H.setComplete doCompletions <span class="op">$</span></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> H.defaultSettings {H.historyFile <span class="ot">=</span> <span class="dt">Just</span> <span class="st">".fibolisp"</span>}</span></code></pre></div>
<p>This sets up Haskeline to run our REPL using the functions we provide in the later sections: <code>repl</code> and <code>doCompletions</code>. This also demystifies the read-only state of the REPL: a function that adds colors to our output strings, depending on the capabilities of the terminal in which our REPL is running in. We also set up a history file to remember the previous REPL inputs.</p>
<p>When the REPL starts, we output some messages in nice colors, which are defined as:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>promptColor, printColor, outputColor, errorColor,<span class="ot"> infoColor ::</span> <span class="dt">Term.Color</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>promptColor <span class="ot">=</span> <span class="dt">Term.Green</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>printColor <span class="ot">=</span> <span class="dt">Term.White</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>outputColor <span class="ot">=</span> <span class="dt">Term.Green</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>errorColor <span class="ot">=</span> <span class="dt">Term.Red</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>infoColor <span class="ot">=</span> <span class="dt">Term.Cyan</span></span></code></pre></div>
<p>Off we go <code>repl</code>ing now:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Prompt</span> <span class="ot">=</span> <span class="dt">H.InputT</span> <span class="dt">Repl</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="ot">repl ::</span> <span class="dt">Prompt</span> ()</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>repl <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> replLineMode <span class="op">.=</span> <span class="dt">SingleLine</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> replIndent <span class="op">.=</span> <span class="dv">0</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> replSeenInput <span class="op">.=</span> <span class="st">""</span></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> Catch.handle (\<span class="dt">H.Interrupt</span> <span class="ot">-></span> repl) <span class="op">.</span> H.withInterrupt <span class="op">$</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> readInput <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">EndOfInput</span> <span class="ot">-></span> outputWithColor promptColor <span class="st">"Goodbye."</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> input <span class="ot">-></span> evalAndPrint input <span class="op">>></span> repl</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a><span class="ot">outputWithColor ::</span> <span class="dt">Term.Color</span> <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a>outputWithColor color text <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> addColor <span class="ot"><-</span> getAddColor</span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> H.outputStrLn <span class="op">$</span> addColor color text</span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a><span class="ot">getAddColor ::</span> <span class="dt">Prompt</span> <span class="dt">AddColor</span></span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a>getAddColor <span class="ot">=</span> lift Reader.ask</span></code></pre></div>
<p>We infuse our <code class="sourceCode haskell"><span class="dt">Repl</span></code> with the powers of Haskeline by wrapping it with Haskeline’s <code class="sourceCode haskell"><span class="dt">InputT</span></code> monad transformer, and call it the <code class="sourceCode haskell"><span class="dt">Prompt</span></code> type. In the <code>repl</code> function, we <code>readInput</code>, <code>evalAndPrint</code> it, and <code>repl</code> again.</p>
<p>We also deal with the user quitting the REPL (the <code class="sourceCode haskell"><span class="dt">EndOfInput</span></code> case), and hitting <kbd>Ctrl</kbd> + <kbd>C</kbd> to interrupt typing or a running evaluation (the handling for <code class="sourceCode haskell"><span class="dt">H.Interrupt</span></code>).</p>
<p>Wait a minute! What is that imperative looking <code class="sourceCode haskell"><span class="op">.=</span></code> doing in our Haskell code? That’s right, we are looking through some lenses!</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Lens'</span> s a <span class="ot">=</span> <span class="dt">Lens.Lens</span> s s a a</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="ot">replDefs ::</span> <span class="dt">Lens'</span> <span class="dt">ReplState</span> <span class="dt">Defs</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>replDefs <span class="ot">=</span> <span class="op">$</span>(Lens.field '_replDefs)</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="ot">replSettings ::</span> <span class="dt">Lens'</span> <span class="dt">ReplState</span> <span class="dt">Settings</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>replSettings <span class="ot">=</span> <span class="op">$</span>(Lens.field '_replSettings)</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="ot">replLineMode ::</span> <span class="dt">Lens'</span> <span class="dt">ReplState</span> <span class="dt">LineMode</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>replLineMode <span class="ot">=</span> <span class="op">$</span>(Lens.field '_replLineMode)</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a><span class="ot">replIndent ::</span> <span class="dt">Lens'</span> <span class="dt">ReplState</span> <span class="dt">Int</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a>replIndent <span class="ot">=</span> <span class="op">$</span>(Lens.field '_replIndent)</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a><span class="ot">replSeenInput ::</span> <span class="dt">Lens'</span> <span class="dt">ReplState</span> <span class="dt">String</span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a>replSeenInput <span class="ot">=</span> <span class="op">$</span>(Lens.field '_replSeenInput)</span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a><span class="ot">use ::</span> (<span class="dt">MonadTrans</span> t, <span class="dt">MonadState</span> s m) <span class="ot">=></span> <span class="dt">Lens'</span> s a <span class="ot">-></span> t m a</span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a>use l <span class="ot">=</span> lift <span class="op">.</span> State.gets <span class="op">$</span> Lens.view l</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a><span class="ot">(.=) ::</span> (<span class="dt">MonadTrans</span> t, <span class="dt">MonadState</span> s m) <span class="ot">=></span> <span class="dt">Lens'</span> s a <span class="ot">-></span> a <span class="ot">-></span> t m ()</span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a>l <span class="op">.=</span> a <span class="ot">=</span> lift <span class="op">.</span> State.modify' <span class="op">$</span> Lens.set l a</span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a><span class="ot">(%=) ::</span> (<span class="dt">MonadTrans</span> t, <span class="dt">MonadState</span> s m) <span class="ot">=></span> <span class="dt">Lens'</span> s a <span class="ot">-></span> (a <span class="ot">-></span> a) <span class="ot">-></span> t m ()</span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a>l <span class="op">%=</span> f <span class="ot">=</span> lift <span class="op">.</span> State.modify' <span class="op">$</span> Lens.over l f</span></code></pre></div>
<p>If you’ve never encountered <a href="https://hackage.haskell.org/package/lens-tutorial/docs/Control-Lens-Tutorial.html" target="_blank" rel="noopener">lenses</a> before, you can think of them as pairs of setters and getters. The <code>repl*</code> lenses above are for setting and getting the corresponding fields from the <code class="sourceCode haskell"><span class="dt">ReplState</span></code> data type<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>. The <code>use</code>, <code class="sourceCode haskell"><span class="op">.=</span></code>, and <code class="sourceCode haskell"><span class="op">%=</span></code> functions are for getting, setting and modifying respectively the state in the <code class="sourceCode haskell"><span class="dt">State</span></code> monad using lenses. We see them in action at the beginning of the <a href="#cb13-3"><code>repl</code></a> function when we use <code class="sourceCode haskell"><span class="op">.=</span></code> to set the various fields of <code class="sourceCode haskell"><span class="dt">ReplState</span></code> to their initial values in the <code class="sourceCode haskell"><span class="dt">State</span></code> monad.</p>
<p>All that is left now is actually reading the input, evaluating it and printing the results.</p>
<h2 data-track-content data-content-name="reading-the-input" data-content-piece="repling-with-haskeline" id="reading-the-input">Reading the Input</h2>
<p>Haskeline gives us functions to read the user’s input as text. However, being Haskellers, we prefer some structure around it:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Input</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Setting</span> (<span class="dt">SettingMode</span>, <span class="dt">Setting</span>)</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Load</span> <span class="dt">FilePath</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Source</span> <span class="dt">String</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Help</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Program</span> <span class="dt">L.Program</span></span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BadInputError</span> <span class="dt">String</span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">EndOfInput</span></span></code></pre></div>
<p>We’ve got all previously mentioned cases covered with the <code class="sourceCode haskell"><span class="dt">Input</span></code> data type. We also do some input validation and capture errors for the failure cases with the <code class="sourceCode haskell"><span class="dt">BadInputError</span></code> constructor. <code class="sourceCode haskell"><span class="dt">EndOfInput</span></code> is used for when the user quits the REPL.</p>
<p>Here is how we read the input:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">readInput ::</span> <span class="dt">Prompt</span> <span class="dt">Input</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>readInput <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> addColor <span class="ot"><-</span> getAddColor</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> lineMode <span class="ot"><-</span> use replLineMode</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> prevIndent <span class="ot"><-</span> use replIndent</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> promptSym <span class="ot">=</span> <span class="kw">case</span> lineMode <span class="kw">of</span> <span class="dt">SingleLine</span> <span class="ot">-></span> <span class="st">"λ"</span>; _ <span class="ot">-></span> <span class="st">"|"</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> prompt <span class="ot">=</span> addColor promptColor <span class="op">$</span> promptSym <span class="op"><></span> <span class="st">"> "</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> mInput <span class="ot"><-</span> H.getInputLineWithInitial prompt (<span class="fu">replicate</span> prevIndent <span class="ch">' '</span>, <span class="st">""</span>)</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> currentIndent <span class="ot">=</span> <span class="fu">maybe</span> <span class="dv">0</span> (<span class="fu">length</span> <span class="op">.</span> <span class="fu">takeWhile</span> (<span class="op">==</span> <span class="ch">' '</span>)) mInput</span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> trimStart <span class="op">.</span> trimEnd <span class="op"><$></span> mInput <span class="kw">of</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">return</span> <span class="dt">EndOfInput</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> input <span class="op">|</span> <span class="fu">null</span> input <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a> replIndent <span class="op">.=</span> <span class="kw">case</span> lineMode <span class="kw">of</span></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">SingleLine</span> <span class="ot">-></span> prevIndent</span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">MultiLine</span> <span class="ot">-></span> currentIndent</span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a> readInput</span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> input<span class="op">@</span>(<span class="ch">':'</span> <span class="op">:</span> _) <span class="ot">-></span> parseCommand input</span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> input <span class="ot">-></span> parseCode input currentIndent</span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a><span class="ot">trimStart ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a>trimStart <span class="ot">=</span> <span class="fu">dropWhile</span> <span class="dt">Char</span><span class="op">.</span><span class="fu">isSpace</span></span>
<span id="cb16-25"><a href="#cb16-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-26"><a href="#cb16-26" aria-hidden="true" tabindex="-1"></a><span class="ot">trimEnd ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb16-27"><a href="#cb16-27" aria-hidden="true" tabindex="-1"></a>trimEnd <span class="ot">=</span> dropWhileEnd <span class="dt">Char</span><span class="op">.</span><span class="fu">isSpace</span></span></code></pre></div>
<p>We use the <code>getInputLineWithInitial</code> function provided by Haskeline to show a prompt and read user’s input as a string. The prompt shown depends on the <code class="sourceCode haskell"><span class="dt">LineMode</span></code> of the REPL state. In the <code class="sourceCode haskell"><span class="dt">SingleLine</span></code> mode we show <code>λ></code>, where in the <code class="sourceCode haskell"><span class="dt">MultiLine</span></code> mode we show <code>|></code>.</p>
<p>If there is no input, that means the user has quit the REPL. In that case we return <code class="sourceCode haskell"><span class="dt">EndOfInput</span></code>, which is handled in the <a href="#cb13-3"><code>repl</code></a> function. If the input is empty, we read more input, preserving the previous indentation (<code>prevIndent</code>) in the <code class="sourceCode haskell"><span class="dt">MultiLine</span></code> mode.</p>
<p>If the input starts with <code>:</code>, we parse it for various commands:</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseCommand ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Prompt</span> <span class="dt">Input</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>parseCommand input</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">":help"</span> <span class="ot">`isPrefixOf`</span> input <span class="ot">=</span> <span class="fu">return</span> <span class="dt">Help</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">":load"</span> <span class="ot">`isPrefixOf`</span> input <span class="ot">=</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> checkFilePath <span class="op">.</span> trimStart <span class="op">.</span> fromJust <span class="op">$</span> stripPrefix <span class="st">":load"</span> input</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">":source"</span> <span class="ot">`isPrefixOf`</span> input <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">.</span> <span class="dt">Source</span> <span class="op">.</span> trimStart <span class="op">.</span> fromJust <span class="op">$</span> stripPrefix <span class="st">":source"</span> input</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> input <span class="op">==</span> <span class="st">":"</span> <span class="ot">=</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">BadInputError</span> <span class="st">"No command specified"</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="kw">case</span> parseSettingCommand input <span class="kw">of</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> setting <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Setting</span> setting</span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">BadInputError</span> err</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="ot">checkFilePath ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Prompt</span> <span class="dt">Input</span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>checkFilePath file</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">null</span> file <span class="ot">=</span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">BadInputError</span> <span class="st">"No file specified"</span></span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span></span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a> isSafeFilePath file <span class="op"><&></span> \<span class="kw">case</span></span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">True</span> <span class="ot">-></span> <span class="dt">Load</span> file</span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">False</span> <span class="ot">-></span> <span class="dt">BadInputError</span> <span class="op">$</span> <span class="st">"Cannot access file: "</span> <span class="op"><></span> file</span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a><span class="ot">isSafeFilePath ::</span> (<span class="dt">MonadIO</span> m) <span class="ot">=></span> <span class="dt">FilePath</span> <span class="ot">-></span> m <span class="dt">Bool</span></span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a>isSafeFilePath fp <span class="ot">=</span></span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a> liftIO <span class="op">$</span> isPrefixOf <span class="op"><$></span> getCurrentDirectory <span class="op"><*></span> canonicalizePath fp</span></code></pre></div>
<p>The <code>:help</code> and <code>:source</code> cases are straightforward. In case of <code>:load</code>, we make sure to check that the file asked to be loaded is located somewhere inside the current directory of the REPL or its recursive subdirectories. Otherwise, we deny loading by returning a <code class="sourceCode haskell"><span class="dt">BadInputError</span></code>. We parse the settings using the <a href="#cb9-20"><code>parseSettingCommand</code></a> function we wrote earlier.</p>
<p>If the input is not a command, we parse it as code:</p>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseCode ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Prompt</span> <span class="dt">Input</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>parseCode currentInput indent <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a> seenInput <span class="ot"><-</span> use replSeenInput</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> input <span class="ot">=</span> seenInput <span class="op"><></span> <span class="st">" "</span> <span class="op"><></span> currentInput</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> L.parse input <span class="kw">of</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> <span class="dt">L.EndOfStreamError</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> replLineMode <span class="op">.=</span> <span class="dt">MultiLine</span></span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a> replIndent <span class="op">.=</span> indent</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a> replSeenInput <span class="op">.=</span> input</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> readInput</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">$</span> <span class="dt">BadInputError</span> <span class="op">$</span> <span class="st">"ERROR: "</span> <span class="op"><></span> displayException err</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> program <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Program</span> program</span></code></pre></div>
<p>We append the previously seen input (in case of multiline input) with the current input and parse it using the <code>parse</code> function provided by the <code class="sourceCode haskell"><span class="dt">Language.FiboLisp</span></code> module. If parsing fails with an
<code class="sourceCode haskell"><span class="dt">EndOfStreamError</span></code>, it means that the input is incomplete. In that case, we set the REPL line mode to <code class="sourceCode haskell"><span class="dt">Multiline</span></code>, REPL indentation to the current indentation, and seen input to the previously seen input appended with the current input, and read more input. If it is some other error, we return a <code class="sourceCode haskell"><span class="dt">BadInputError</span></code> with it.</p>
<p>If the result of parsing is a program, we return it as a <code class="sourceCode haskell"><span class="dt">Program</span></code> input.</p>
<p>That’s it for reading the user input. Next, we evaluate it.</p>
<h2 data-track-content data-content-name="evaluating-the-input" data-content-piece="repling-with-haskeline" id="evaluating-the-input">Evaluating the Input</h2>
<p>Recall that the <a href="#cb13-3"><code>repl</code></a> function calls the <code>evalAndPrint</code> function with the read input:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evalAndPrint ::</span> <span class="dt">Input</span> <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>evalAndPrint <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">EndOfInput</span> <span class="ot">-></span> <span class="fu">return</span> ()</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">BadInputError</span> err <span class="ot">-></span> outputWithColor errorColor err</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Help</span> <span class="ot">-></span> H.outputStr helpMessage</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Setting</span> (<span class="dt">Set</span>, setting) <span class="ot">-></span> replSettings <span class="op">%=</span> Set.insert setting</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Setting</span> (<span class="dt">Unset</span>, setting) <span class="ot">-></span> replSettings <span class="op">%=</span> Set.delete setting</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Source</span> ident <span class="ot">-></span> showSource ident</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Load</span> fp <span class="ot">-></span> loadAndEvalFile fp</span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Program</span> program <span class="ot">-></span> interpretAndPrint program</span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a> helpMessage <span class="ot">=</span></span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">unlines</span></span>
<span id="cb19-14"><a href="#cb19-14" aria-hidden="true" tabindex="-1"></a> [ <span class="st">"Available commands"</span>,</span>
<span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a> <span class="st">":set/:unset dump Dumps the program AST"</span>,</span>
<span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a> <span class="st">":set/:unset time Shows the program execution time"</span>,</span>
<span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a> <span class="st">":load <file> Loads a source file"</span>,</span>
<span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a> <span class="st">":source <func_name> Prints the source code of a function"</span>,</span>
<span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a> <span class="st">":help Shows this help"</span></span>
<span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>The cases of <code class="sourceCode haskell"><span class="dt">EndOfInput</span></code>, <code class="sourceCode haskell"><span class="dt">BadInputError</span></code> and <code class="sourceCode haskell"><span class="dt">Help</span></code> are straightforward. For settings, we insert or remove the setting from the REPL settings, depending on it being set or unset. For the other cases, we call the respective helper functions.</p>
<p>For a <code>:source</code> command, we check if the requested identifier maps to a user-defined or builtin function, and if so, print its source. Otherwise we print an error.</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">showSource ::</span> <span class="dt">L.Ident</span> <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>showSource ident <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> defs <span class="ot"><-</span> use replDefs</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> Map.lookup ident defs <span class="kw">of</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> def <span class="ot">-></span> outputWithColor infoColor <span class="op">$</span> L.prettyShowDef def</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="kw">case</span> Map.lookup ident L.builtinFuncs <span class="kw">of</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> func <span class="ot">-></span> outputWithColor infoColor <span class="op">$</span> <span class="fu">show</span> func</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span></span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a> outputWithColor errorColor <span class="op">$</span> <span class="st">"No such function: "</span> <span class="op"><></span> ident</span></code></pre></div>
<p>For a <code>:load</code> command, we check if the requested file exists. If so, we read and parse it, and interpret the resultant program. In case of any errors in reading or parsing the file, we catch and print them.</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">loadAndEvalFile ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>loadAndEvalFile fp <span class="ot">=</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> liftIO (doesFileExist fp) <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">False</span> <span class="ot">-></span> outputWithColor errorColor <span class="op">$</span> <span class="st">"No such file: "</span> <span class="op"><></span> fp</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">True</span> <span class="ot">-></span> Catch.handleAll outputError <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> liftIO <span class="op">$</span> <span class="fu">readFile</span> fp</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> outputWithColor infoColor <span class="op">$</span> <span class="st">"Loaded "</span> <span class="op"><></span> fp</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> L.parse code <span class="kw">of</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> outputError err</span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> program <span class="ot">-></span> interpretAndPrint program</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a><span class="ot">outputError ::</span> (<span class="dt">Exception</span> e) <span class="ot">=></span> e <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a>outputError err <span class="ot">=</span></span>
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a> outputWithColor errorColor <span class="op">$</span> <span class="st">"ERROR: "</span> <span class="op"><></span> displayException err</span></code></pre></div>
<p>Finally, we come to the workhorse of the REPL: the interpretation of the user provided program:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretAndPrint ::</span> <span class="dt">L.Program</span> <span class="ot">-></span> <span class="dt">Prompt</span> ()</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>interpretAndPrint (<span class="dt">L.Program</span> pDefs exprs) <span class="ot">=</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> Catch.handleAll outputError <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> defs <span class="ot"><-</span> use replDefs</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> settings <span class="ot"><-</span> use replSettings</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> defs' <span class="ot">=</span></span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> foldl' (\ds d <span class="ot">-></span> Map.insert (L.defName d) d ds) defs pDefs</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a> program <span class="ot">=</span> <span class="dt">L.Program</span> (Map.elems defs') exprs</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a> when (<span class="dt">Dump</span> <span class="ot">`Set.member`</span> settings) <span class="op">$</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> outputWithColor infoColor (L.showProgram program)</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a> addColor <span class="ot"><-</span> getAddColor</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a> extPrint <span class="ot"><-</span> H.getExternalPrint</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a> (execTime, val) <span class="ot"><-</span> liftIO <span class="op">.</span> measureElapsedTime <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a> val <span class="ot"><-</span> L.interpret (extPrint <span class="op">.</span> addColor printColor) program</span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a> evaluate <span class="op">$</span> DS.force val</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> val <span class="kw">of</span></span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> outputError err</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> v <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> output <span class="ot">=</span> <span class="fu">show</span> v</span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> <span class="fu">null</span> output</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="fu">return</span> ()</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> outputWithColor outputColor <span class="op">$</span> <span class="st">"=> "</span> <span class="op"><></span> output</span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a> when (<span class="dt">MeasureTime</span> <span class="ot">`Set.member`</span> settings) <span class="op">$</span></span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a> outputWithColor infoColor <span class="op">$</span></span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a> <span class="st">"(Execution time: "</span> <span class="op"><></span> <span class="fu">show</span> execTime <span class="op"><></span> <span class="st">")"</span></span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a> replDefs <span class="op">.=</span> defs'</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a><span class="ot">measureElapsedTime ::</span> <span class="dt">IO</span> a <span class="ot">-></span> <span class="dt">IO</span> (<span class="dt">NominalDiffTime</span>, a)</span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a>measureElapsedTime f <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a> start <span class="ot"><-</span> getCurrentTime</span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a> ret <span class="ot"><-</span> f</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a> end <span class="ot"><-</span> getCurrentTime</span>
<span id="cb22-39"><a href="#cb22-39" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> (diffUTCTime end start, ret)</span></code></pre></div>
<p>We start by collecting the user defined functions in the current input with the previously defined functions in the session such that current functions override the previous functions with the same names. At this point, if the <code>dump</code> setting is set, we print the program AST.</p>
<p>Then we invoke the <code>interpret</code> function provided by the <code class="sourceCode haskell"><span class="dt">Language.FiboLisp</span></code> module. Recall that the <a href="#cb5-10"><code>interpret</code></a> function takes the program to interpret and a function of type <code class="sourceCode haskell"><span class="dt">String</span> <span class="ot">-></span> <span class="dt">IO</span> ()</code>. This function is a color-adding wrapper over the function returned by the Haskeline function <code>getExternalPrint</code><a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a>. This function allows non-REPL code to safely print to the Haskeline driven REPL without garbling the output. We pass it to the <code>interpret</code> function so that the interpret can invoke it when the user code invokes the builtin <code>print</code> function or similar.</p>
<p>We make sure to <a href="https://hackage.haskell.org/package/deepseq/docs/Control-DeepSeq.html#v:force" target="_blank" rel="noopener"><code>force</code></a> and <a href="https://hackage.haskell.org/package/base/docs/Control-Exception.html#v:evaluate" target="_blank" rel="noopener"><code>evaluate</code></a> the value returned by the interpreter so that any lazy values or errors are fully evaluated<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a>, and the measured elapsed time is correct.</p>
<p>If the interpreter returns an error, we print it. Else we convert the value to a string, and if is it not empty<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a>, we print it.</p>
<p>Finally, we print the execution time if the <code>time</code> setting is set, and set the REPL defs to the current program defs.</p>
<p>That’s all! We have completed our REPL. But wait, I think we forgot one thing …</p>
<h2 data-track-content data-content-name="doing-the-completions" data-content-piece="repling-with-haskeline" id="doing-the-completions">Doing the Completions</h2>
<p>The REPL would work fine with this much code, but it would not be a good experience for the user, because they’d have to type everything without any help from the REPL. To make it convenient for the user, we provide contextual auto-completion functionality while typing. Haskeline lets us plug in our custom completion logic by setting a completion function, which we did <a href="#cb11-14">way back</a> at the start. Now we need to implement it.</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">doCompletions ::</span> <span class="dt">H.CompletionFunc</span> <span class="dt">Repl</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>doCompletions <span class="ot">=</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> runIdentityT <span class="op">.</span> H.completeWordWithPrev <span class="dt">Nothing</span> <span class="st">" "</span> <span class="op">$</span> \leftRev word <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> defs <span class="ot"><-</span> use replDefs</span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> lineMode <span class="ot"><-</span> use replLineMode</span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> settings <span class="ot"><-</span> use replSettings</span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> funcs <span class="ot">=</span> nub <span class="op">$</span> Map.keys defs <span class="op"><></span> Map.keys L.builtinFuncs</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> vals <span class="ot">=</span> <span class="fu">map</span> <span class="fu">show</span> L.builtinVals</span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> (word, lineMode) <span class="kw">of</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> (<span class="ch">'('</span> <span class="op">:</span> rest, _) <span class="ot">-></span></span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span></span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">H.Completion</span> (<span class="ch">'('</span> <span class="op">:</span> hint) hint <span class="dt">True</span></span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> hint <span class="ot"><-</span> nub <span class="op">.</span> <span class="fu">sort</span> <span class="op">$</span> L.carKeywords <span class="op"><></span> funcs,</span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a> rest <span class="ot">`isPrefixOf`</span> hint</span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"></a> (_, <span class="dt">SingleLine</span>) <span class="ot">-></span> <span class="kw">case</span> word <span class="kw">of</span></span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"></a> <span class="st">""</span> <span class="op">|</span> <span class="fu">null</span> leftRev <span class="ot">-></span></span>
<span id="cb23-18"><a href="#cb23-18" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> [<span class="dt">H.Completion</span> <span class="st">""</span> s <span class="dt">True</span> <span class="op">|</span> s <span class="ot"><-</span> commands <span class="op"><></span> funcs <span class="op"><></span> vals]</span>
<span id="cb23-19"><a href="#cb23-19" aria-hidden="true" tabindex="-1"></a> <span class="ch">':'</span> <span class="op">:</span> _ <span class="op">|</span> <span class="fu">null</span> leftRev <span class="ot">-></span></span>
<span id="cb23-20"><a href="#cb23-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> [H.simpleCompletion c <span class="op">|</span> c <span class="ot"><-</span> commands, word <span class="ot">`isPrefixOf`</span> c]</span>
<span id="cb23-21"><a href="#cb23-21" aria-hidden="true" tabindex="-1"></a> _</span>
<span id="cb23-22"><a href="#cb23-22" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">"tes:"</span> <span class="ot">`isSuffixOf`</span> leftRev <span class="ot">-></span></span>
<span id="cb23-23"><a href="#cb23-23" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span></span>
<span id="cb23-24"><a href="#cb23-24" aria-hidden="true" tabindex="-1"></a> [ H.simpleCompletion <span class="op">$</span> <span class="fu">show</span> s</span>
<span id="cb23-25"><a href="#cb23-25" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> s <span class="ot"><-</span> [<span class="dt">Dump</span> <span class="op">..</span>], s <span class="ot">`notElem`</span> settings, word <span class="ot">`isPrefixOf`</span> <span class="fu">show</span> s</span>
<span id="cb23-26"><a href="#cb23-26" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb23-27"><a href="#cb23-27" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">"tesnu:"</span> <span class="ot">`isSuffixOf`</span> leftRev <span class="ot">-></span></span>
<span id="cb23-28"><a href="#cb23-28" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span></span>
<span id="cb23-29"><a href="#cb23-29" aria-hidden="true" tabindex="-1"></a> [ H.simpleCompletion <span class="op">$</span> <span class="fu">show</span> s</span>
<span id="cb23-30"><a href="#cb23-30" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> s <span class="ot"><-</span> [<span class="dt">Dump</span> <span class="op">..</span>], s <span class="ot">`elem`</span> settings, word <span class="ot">`isPrefixOf`</span> <span class="fu">show</span> s</span>
<span id="cb23-31"><a href="#cb23-31" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb23-32"><a href="#cb23-32" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">"daol:"</span> <span class="ot">`isSuffixOf`</span> leftRev <span class="ot">-></span></span>
<span id="cb23-33"><a href="#cb23-33" aria-hidden="true" tabindex="-1"></a> isSafeFilePath word <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb23-34"><a href="#cb23-34" aria-hidden="true" tabindex="-1"></a> <span class="dt">True</span> <span class="ot">-></span> H.listFiles word</span>
<span id="cb23-35"><a href="#cb23-35" aria-hidden="true" tabindex="-1"></a> <span class="dt">False</span> <span class="ot">-></span> <span class="fu">pure</span> []</span>
<span id="cb23-36"><a href="#cb23-36" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="st">"ecruos:"</span> <span class="ot">`isSuffixOf`</span> leftRev <span class="ot">-></span></span>
<span id="cb23-37"><a href="#cb23-37" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span></span>
<span id="cb23-38"><a href="#cb23-38" aria-hidden="true" tabindex="-1"></a> [ H.simpleCompletion ident</span>
<span id="cb23-39"><a href="#cb23-39" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> ident <span class="ot"><-</span> funcs,</span>
<span id="cb23-40"><a href="#cb23-40" aria-hidden="true" tabindex="-1"></a> ident <span class="ot">`Map.notMember`</span> L.builtinFuncs,</span>
<span id="cb23-41"><a href="#cb23-41" aria-hidden="true" tabindex="-1"></a> word <span class="ot">`isPrefixOf`</span> ident</span>
<span id="cb23-42"><a href="#cb23-42" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb23-43"><a href="#cb23-43" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">-></span></span>
<span id="cb23-44"><a href="#cb23-44" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> [H.simpleCompletion c <span class="op">|</span> c <span class="ot"><-</span> funcs <span class="op"><></span> vals, word <span class="ot">`isPrefixOf`</span> c]</span>
<span id="cb23-45"><a href="#cb23-45" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">pure</span> []</span>
<span id="cb23-46"><a href="#cb23-46" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb23-47"><a href="#cb23-47" aria-hidden="true" tabindex="-1"></a> commands <span class="ot">=</span> <span class="st">":help"</span> <span class="op">:</span> <span class="st">":load"</span> <span class="op">:</span> <span class="st">":source"</span> <span class="op">:</span> <span class="fu">map</span> <span class="fu">show</span> [<span class="dt">Set</span> <span class="op">..</span>]</span></code></pre></div>
<p>Haskeline provides us the <code>completeWordWithPrev</code> function to easily create our own completion function. It takes a callback function that it calls with the current word being completed (the word immediately to the left of the cursor), and the content of the line before the word (to the left of the word), reversed. We use these to return different completion lists of strings.</p>
<p>Going case by case:</p>
<ol type="1">
<li>If the word starts with <code>(</code>, it means we are in middle of writing FiboLisp code. So we return the <a href="#cb1-27"><code>carKeywords</code></a> and the user-defined and builtin function names that start with the current word sans the initial <code>(</code>. This happens regardless of the current line mode. Rest of the cases below apply only in the <code class="sourceCode haskell"><span class="dt">SingleLine</span></code> mode.</li>
<li>If the entire line is empty, we return the names of all commands, functions, and builtin values.</li>
<li>If the word starts with <code>:</code>, and is at the beginning of the line, we return the commands that start with the word.</li>
<li>If the line starts with
<ol type="i">
<li><code>:set</code>, we return the <strong>not set</strong> settings</li>
<li><code>:unset</code>, we return the <strong>set</strong> settings</li>
<li><code>:load</code>, we return the names of the files and directories in the current directory</li>
<li><code>:source</code>, we return the names of the user-defined functions</li>
</ol>
that start with the word.</li>
<li>Otherwise we return no completions.</li>
</ol>
<p>This covers all cases, and provides helpful completions, while avoiding bad ones. And this completes the implementation of our wonderful REPL.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="repling-with-haskeline" id="conclusion">Conclusion</h2>
<p>I wrote this REPL while implementing a Lisp that I wrote<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a> while going through the <a href="https://web.archive.org/web/20241031/https://mitpress.mit.edu/9780262047760/essentials-of-compilation/" target="_blank" rel="noopener">Essentials of Compilation</a> book, which I thoroughly recommend for getting started with compilers. It started as a basic REPL, and gathered a lot of nice functionalities over time. So I decided to extract and share it here. I hope that this Haskeline tutorial helps you in creating beautiful and useful REPLs. Here is the complete <a href="https://abhinavsarkar.net/code/fibolisp-repl.html?mtm_campaign=feed">code</a> for the REPL.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>The online demo is rather slow to load and to run, and works only on Firefox and Chrome. Even though I managed to put it together somehow, I don’t actually know how it exactly works, and I’m unable to fix the issues with it.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Lisps are awesome and I absolutely recommend creating one or more of them as an amateur PL implementer. Some resources I recommend are: the <a href="https://web.archive.org/web/20241031/https://www.buildyourownlisp.com/" target="_blank" rel="noopener">Build Your Own Lisp</a> book, and the <a href="https://github.com/kanaka/mal" target="_blank" rel="noopener">Make-A-Lisp</a> tutorial.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>REPLs are wonderful for doing interactive and exploratory programming where you try out small snippets of code in the REPL, and put your program together piece-by-piece. They are also good for debugging because they let you inspect the state of running programs from within. I still fondly remember the experience of connecting (or <a href="https://docs.cider.mx/cider/basics/up_and_running.html#launch-an-nrepl-server-from-emacs" target="_blank" rel="noopener">jacking in</a>) to running productions systems written in <a href="https://clojure.org" target="_blank" rel="noopener">Clojure</a> over REPL, and figuring out issues by dumping variables.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>We don’t even need <code class="sourceCode clojure"><span class="kw">let</span></code>. We can, and have to, define variables by creating functions, with parameters serving the role of variables. In fact, we can’t even assign or reassign variables. Functions are the only scoping mechanism in FiboLisp, much like old-school JavaScript with its <a href="https://developer.mozilla.org/en-US/docs/Glossary/IIFE" target="_blank" rel="noopener">IIFEs</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p><em>car</em> is obviously <a href="https://en.wikipedia.org/wiki/CAR_and_CDR" target="_blank" rel="noopener"><strong>C</strong>ontents of the <strong>A</strong>ddress part of the <strong>R</strong>egister</a>, the first expression in a list form in a Lisp.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>You may be wondering about why we need the <code class="sourceCode haskell"><span class="dt">NFData</span></code> instances for the errors and values. This will become clear when we write the REPL.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>I recommend the <a href="https://hackage.haskell.org/package/sexp-grammar" target="_blank" rel="noopener">sexp-grammar</a> library, which provides both parsing and printing facilities for S-expressions based languages. Or you can write something by yourself using the parsing and pretty-printing libraries like <a href="https://hackage.haskell.org/package/megaparsec" target="_blank" rel="noopener">megaparsec</a> and <a href="https://hackage.haskell.org/package/prettyprinter" target="_blank" rel="noopener">prettyprinter</a>.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>We assume that our project’s Cabal file sets the default-language to GHC2021, and the default-extensions to <code class="sourceCode haskell"><span class="dt">LambdaCase</span></code>, <code class="sourceCode haskell"><span class="dt">OverloadedStrings</span></code>, <code class="sourceCode haskell"><span class="dt">RecordWildCards</span></code>, and <code class="sourceCode haskell"><span class="dt">StrictData</span></code>.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Recall that there is no way to define variables in FiboLisp.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>If the interpreter allows mutually recursive function definitions, functions can be called before defining them.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>We are using the <a href="https://hackage.haskell.org/package/basic-lens" target="_blank" rel="noopener">basic-lens</a> library here, which is the tiniest lens library, and provides only the five functions and types we see used here.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>Using the function returned from <code>getExternalPrint</code> is not necessary in our case because the REPL blocks when it invokes the interpreter. That means, nothing but the interpreter can print anything while it is running. So the interpreter can actually print directly to <code>stdout</code> and nothing will go wrong.</p>
<p>However, imagine a case in which our code starts a background thread that needs to print to the REPL. In such case, we must use the Haskeline provided print function instead of printing directly. When printing to the REPL using it, Haskeline coordinates the prints so that the output in the terminal is not garbled.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn13"><p>Now we see why we <a href="#cb5-12">derive</a> <code>NFData</code> instances for errors and <code class="sourceCode haskell"><span class="dt">Value</span></code>.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn14"><p>Returned value could be of type void with no textual representation, in which case we would not print it.<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn15"><p>I wrote the original REPL code almost three years ago. I refactored, rewrote and improved a lot of it in the course of writing this post. As they say, writing is thinking.<a href="#fnref15" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/repling-with-haskeline/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2024-10-31T00:00:00Z <p>So you went ahead and created a new programming language, with an AST, a parser, and an interpreter. And now you hate how you have to write the programs in your new language in files to run them? You need a <a href="https://en.wikipedia.org/wiki/REPL" target="_blank" rel="noopener">REPL</a>! In this post, we’ll create a shiny REPL with lots of nice features using the Haskeline library to go along with your new PL that you implemented in Haskell.</p>
https://abhinavsarkar.net/posts/nix-for-haskell/ Getting Started with Nix for Haskell 2024-08-29T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>So, you’ve heard of the new hotness that is <a href="https://nixos.org" target="_blank" rel="noopener">Nix</a>, for creating reproducible and isolated development environments, and want to use it for your new Haskell project? But you are unclear about how to get started? Then this is the guide you are looking for.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/nix-for-haskell/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more-->
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#nix-for-haskell">Nix for Haskell</a></li><li><a href="#shelling-out">Shelling Out</a></li><li><a href="#bonus-round-flakes">Bonus Round: Flakes</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<p>Nix is notoriously hard to get started with. If you are familiar with Haskell, you may have an easier time learning the <a href="https://nix.dev/manual/nix/2.24/language/" target="_blank" rel="noopener">Nix language</a>, but it is still difficult to figure out the various toolchains and library functions needed to put your knowledge of the Nix language to use. There are <a href="https://github.com/pwm/nixkell" target="_blank" rel="noopener">some</a> <a href="https://github.com/srid/haskell-flake" target="_blank" rel="noopener">frameworks</a> for setting up Haskell projects with Nix, but again, they are hard to understand because of their large feature scopes. So, in this post, I’m going to show a really easy way for you to get started.</p>
<h2 data-track-content data-content-name="nix-for-haskell" data-content-piece="nix-for-haskell" id="nix-for-haskell">Nix for Haskell</h2>
<p>But first, what does it mean to use Nix for a Haskell project? It means that all the dependencies of our projects — Haskell packages, and non-Haskell ones too — come from <a href="https://github.com/NixOS/nixpkgs/" target="_blank" rel="noopener">Nixpkgs</a>, a repository of software configured and managed using Nix<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. It also means that all the tools we use for development, such as builders, linters, style checkers, LSP servers, and everything else, also come from Nixpkgs<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. And all of this happens by writing some configuration files in the Nix language.</p>
<p>Start with creating a new directory for the project. For the purpose of this post, we name this project <code>ftr</code>:</p>
<div class="sourceCode" id="cb1" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> mkdir ftr</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> cd ftr</span></code></pre></div>
<p>The first thing to do is to set up the project to point to the Nixpkgs repo — more specifically, a particular fixed version of the repo — so that our builds are reproducible<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. We do this by using <a href="https://github.com/nmattia/niv" target="_blank" rel="noopener">Niv</a>.</p>
<p>Niv is a tool for pinning/locking down the version of the Nixpkgs repo, much like <code>cabal freeze</code> or <code>npm freeze</code>. But instead of pinning each dependency at some version, we pin the entire repo (from which all the dependencies come) at a version.</p>
<p>Run the following commands:</p>
<div class="sourceCode" id="cb2" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix-shell <span class="at">-p</span> niv</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> niv init</span></code></pre></div>
<p>Running <code>nix-shell -p niv</code> drops us into a nested shell in which the <code>niv</code> executable is available. Running <code>niv init</code> sets up Niv for our project, creating <code>nix/sources.{json|nix}</code> files. The <code>nix/sources.json</code> file is where the Nixpkgs repo version is pinned<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. If we open it now, it may look something like this:</p>
<figure>
<div class="sourceCode" id="cb3" data-lang="json"><pre class="sourceCode numberSource json"><code class="sourceCode json"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"nixpkgs"</span><span class="fu">:</span> <span class="fu">{</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"branch"</span><span class="fu">:</span> <span class="st">"nixos-unstable"</span><span class="fu">,</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"description"</span><span class="fu">:</span> <span class="st">"Nix Packages collection"</span><span class="fu">,</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"homepage"</span><span class="fu">:</span> <span class="kw">null</span><span class="fu">,</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"owner"</span><span class="fu">:</span> <span class="st">"NixOS"</span><span class="fu">,</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">"repo"</span><span class="fu">:</span> <span class="st">"nixpkgs"</span><span class="fu">,</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">"rev"</span><span class="fu">:</span> <span class="st">"6c43a3495a11e261e5f41e5d7eda2d71dae1b2fe"</span><span class="fu">,</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">"sha256"</span><span class="fu">:</span> <span class="st">"16f329z831bq7l3wn1dfvbkh95l2gcggdwn6rk3cisdmv2aa3189"</span><span class="fu">,</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">"type"</span><span class="fu">:</span> <span class="st">"tarball"</span><span class="fu">,</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"url"</span><span class="fu">:</span> <span class="st">"https://github.com/NixOS/nixpkgs/archive/6c43a3495a11e261e5f41e5d7eda2d71dae1b2fe.tar.gz"</span><span class="fu">,</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"url_template"</span><span class="fu">:</span> <span class="st">"https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code></pre></div>
<figcaption>
nix/sources.json
</figcaption>
</figure>
<p>By default, Niv sets up the Nixpkgs repo, pinned to some version. Let’s pin it to the latest stable version as of the time of writing this post: 24.05. Run:</p>
<div class="sourceCode" id="cb4" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> niv drop nixpkgs</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> niv add NixOS/nixpkgs <span class="at">-n</span> nixpkgs <span class="at">-b</span> nixos-24.05</span></code></pre></div>
<p>Now, <code>nix/sources.json</code> may look like this:</p>
<figure>
<div class="sourceCode" id="cb5" data-lang="json"><pre class="sourceCode numberSource json"><code class="sourceCode json"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"nixpkgs"</span><span class="fu">:</span> <span class="fu">{</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"branch"</span><span class="fu">:</span> <span class="st">"nixos-24.05"</span><span class="fu">,</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"description"</span><span class="fu">:</span> <span class="st">"Nix Packages collection & NixOS"</span><span class="fu">,</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"homepage"</span><span class="fu">:</span> <span class="st">""</span><span class="fu">,</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"owner"</span><span class="fu">:</span> <span class="st">"NixOS"</span><span class="fu">,</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">"repo"</span><span class="fu">:</span> <span class="st">"nixpkgs"</span><span class="fu">,</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">"rev"</span><span class="fu">:</span> <span class="st">"36bae45077667aff5720e5b3f1a5458f51cf0776"</span><span class="fu">,</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">"sha256"</span><span class="fu">:</span> <span class="st">"0mkbsp2f07lrqcnlsnybi6kbxdr7sjs3hiz4kf4jkqirk4qgswfi"</span><span class="fu">,</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">"type"</span><span class="fu">:</span> <span class="st">"tarball"</span><span class="fu">,</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"url"</span><span class="fu">:</span> <span class="st">"https://github.com/NixOS/nixpkgs/archive/36bae45077667aff5720e5b3f1a5458f51cf0776.tar.gz"</span><span class="fu">,</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"url_template"</span><span class="fu">:</span> <span class="st">"https://github.com/<owner>/<repo>/archive/<rev>.tar.gz"</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code></pre></div>
<figcaption>
nix/sources.json
</figcaption>
</figure>
<p>Pinning is done. Now, let’s get some stuff from the repo. But wait, first we have to configure Nixpkgs. Create a file <code>nix/nixpkgs.nix</code>:</p>
<figure>
<div class="sourceCode" id="cb6" data-lang="nix"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">system</span> <span class="op">?</span> <span class="bu">builtins</span><span class="op">.</span>currentSystem <span class="op">}</span>:</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="va">sources</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./sources.nix</span><span class="op">;</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span> <span class="bu">import</span> sources<span class="op">.</span>nixpkgs <span class="op">{</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="va">overlays</span> <span class="op">=</span> <span class="op">[</span> <span class="op">];</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="va">config</span> <span class="op">=</span> <span class="op">{</span> <span class="op">};</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<figcaption>
nix/nixpkgs.nix
</figcaption>
</figure>
<p>Well, I lied. We <em>could</em> configure Nixpkgs if we had to<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>, but for this post, we leave all the settings empty, and just import it from Niv sources.</p>
<p>At this point, we could start pulling things from Nixpkgs manually, but to make it declarative and reproducible, let’s create our own Nix shell.</p>
<h2 data-track-content data-content-name="shelling-out" data-content-piece="nix-for-haskell" id="shelling-out">Shelling Out</h2>
<p>Create a file named <code>shell.nix</code>:</p>
<figure>
<div class="sourceCode" id="cb7" data-lang="nix"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">system</span> <span class="op">?</span> <span class="bu">builtins</span><span class="op">.</span>currentSystem<span class="op">,</span> <span class="va">devTools</span> <span class="op">?</span> <span class="cn">true</span> <span class="op">}</span>:</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="va">pkgs</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./nix/nixpkgs.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="va">myHaskellPackages</span> <span class="op">=</span> pkgs<span class="op">.</span>haskellPackages<span class="op">;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span> myHaskellPackages<span class="op">.</span>shellFor <span class="op">{</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="va">packages</span> <span class="op">=</span> <span class="va">p</span><span class="op">:</span> <span class="op">[</span> <span class="op">];</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="va">nativeBuildInputs</span> <span class="op">=</span> <span class="kw">with</span> pkgs<span class="op">;</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="op">[</span> ghc cabal-install <span class="op">]</span> <span class="op">++</span> lib<span class="op">.</span>optional devTools <span class="op">[</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> niv</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> hlint</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> ormolu</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> <span class="op">(</span>ghc<span class="op">.</span>withPackages <span class="op">(</span><span class="va">p</span><span class="op">:</span> <span class="op">[</span> p<span class="op">.</span>haskell-language-server <span class="op">]))</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> <span class="op">];</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<figcaption>
shell.nix
</figcaption>
</figure>
<p>Ah! Now, the Nix magic is shining through. What <code>shell.nix</code> does is, it creates a custom Nix shell with the things we mention already available in the shell. <code>pkgs.haskellPackages.shellFor</code> is how we create the custom shell, and <code>nativeBuildInputs</code> are the tools we want available.</p>
<p>We make <code>ghc</code> and <code>cabal-install</code> mandatorily available, because they are necessary for doing any Haskell development; and <code>niv</code>, <code>hlint</code>, <code>ormolu</code> and <code>haskell-language-server</code><a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a><a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a> optionally available (depending on the passed <code>devTools</code> flag), because we need them only when writing code.</p>
<p>Exit the previous Nix shell, and start a new one to start working on the project<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>:</p>
<div class="sourceCode" id="cb8" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix-shell <span class="at">--arg</span> devTools false</span></code></pre></div>
<p>Okay, I lied again, we are still setting up. In this new shell, <code>hlint</code>, <code>ormoulu</code> etc are not available but we can run <code>cabal</code> now. We use it to initialize the Haskell project:</p>
<div class="sourceCode" id="cb9" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> cabal init <span class="at">-p</span> ftr</span></code></pre></div>
<p>After answering all the questions Cabal asks us, we are left with a <code>ftr.cabal</code> file, along with some starter Haskell code in the right directories. Let’s build and run the starter code:</p>
<pre class="plain"><code>$ cabal run
Hello, Haskell!</code></pre>
<p>It works!</p>
<p>Edit the <code>ftr.cabal</code> file now to add some new Haskell dependency (without a version), such as <a href="https://hackage.haskell.org/package/extra" target="_blank" rel="noopener"><code>extra</code></a>. If we run <code>cabal build</code> now, Cabal will start downloading the <code>extra</code> package. Cancel that! We want our dependencies to come from Nixpkgs, not Hackage. For that we need to tell Nix about our Haskell project.</p>
<p>Create a file <code>package.nix</code>:</p>
<figure>
<div class="sourceCode" id="cb11" data-lang="nix"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">system</span> <span class="op">?</span> <span class="bu">builtins</span><span class="op">.</span>currentSystem <span class="op">}</span>:</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="va">pkgs</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./nix/nixpkgs.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="va">hlib</span> <span class="op">=</span> pkgs<span class="op">.</span>haskell<span class="op">.</span>lib<span class="op">.</span>compose<span class="op">;</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span> pkgs<span class="op">.</span>lib<span class="op">.</span>pipe</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="op">(</span>pkgs<span class="op">.</span>haskellPackages<span class="op">.</span>callCabal2nix <span class="st">"ftr"</span> <span class="op">(</span>pkgs<span class="op">.</span>lib<span class="op">.</span>cleanSource <span class="ss">./.</span><span class="op">)</span> <span class="op">{</span> <span class="op">})</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a><span class="op">[</span> hlib<span class="op">.</span>dontHaddock <span class="op">]</span></span></code></pre></div>
<figcaption>
package.nix
</figcaption>
</figure>
<p>The <code>package.nix</code> file is the Nix representation of the Cabal package for our project. We use <a href="https://github.com/NixOS/cabal2nix" target="_blank" rel="noopener"><code>cabal2nix</code></a> here, a tool that makes Nix aware of Cabal files, making it capable of pulling the right Haskell dependencies from Nixpkgs. We also configure Nix to not run <a href="https://www.haskell.org/haddock/" target="_blank" rel="noopener">Haddock</a> on our code by setting the <code>hlib.dontHaddock</code> option<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>, since we are not going to write any doc for this demo project.</p>
<p>Now, edit <code>shell.nix</code> to make it aware of our new Nix package:</p>
<figure>
<div id="cb1" class="sourceCode" data-lang="nix" data-emphasize="4-5,7-7"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">system</span> <span class="op">?</span> <span class="bu">builtins</span><span class="op">.</span>currentSystem<span class="op">,</span> <span class="va">devTools</span> <span class="op">?</span> <span class="cn">true</span> <span class="op">}</span>:</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="va">pkgs</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./nix/nixpkgs.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="va">myHaskellPackages</span> <span class="op">=</span> pkgs<span class="op">.</span>haskellPackages<span class="op">.</span>extend</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">(</span><span class="va">final</span><span class="op">:</span> <span class="va">prev</span><span class="op">:</span> <span class="op">{</span> <span class="va">ftr</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./package.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span> <span class="op">});</span></span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span> myHaskellPackages<span class="op">.</span>shellFor <span class="op">{</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="va">packages</span> <span class="op">=</span> <span class="va">p</span><span class="op">:</span> <span class="op">[</span> p<span class="op">.</span>ftr <span class="op">];</span></span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="va">nativeBuildInputs</span> <span class="op">=</span> <span class="kw">with</span> pkgs<span class="op">;</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op">[</span> ghc cabal-install <span class="op">]</span> <span class="op">++</span> lib<span class="op">.</span>optional devTools <span class="op">[</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> niv</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> hlint</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> ormolu</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="op">(</span>ghc<span class="op">.</span>withPackages <span class="op">(</span><span class="va">p</span><span class="op">:</span> <span class="op">[</span> p<span class="op">.</span>haskell-language-server <span class="op">]))</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="op">];</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<figcaption>
shell.nix
</figcaption>
</figure>
<p>We extend Haskell packages from Nixpkgs with our own package <code>ftr</code>, and add an entry in the previously empty <code>packages</code> list. This makes all the Haskell dependencies we mention in <code>ftr.cabal</code> available in the Nix shell. Exit the Nix shell now, and restart it by running:</p>
<div class="sourceCode" id="cb12" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix-shell <span class="at">--arg</span> devTools false</span></code></pre></div>
<p>We can run <code>cabal build</code> now. Notice that nothing is downloaded from Hackage this time.</p>
<p>Even better, we can now build our project using Nix:</p>
<div class="sourceCode" id="cb13" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix-build package.nix</span></code></pre></div>
<p>This builds our project in a truly isolated environment outside the Nix shell, and puts the results in the <code>result</code> directory. Go ahead and try running it:</p>
<pre class="plain"><code>$ result/bin/ftr
Hello, Haskell!</code></pre>
<p>Great! Now we can quit and restart the Nix shell without the <code>--arg devTools false</code> option. This will download and set up all the fancy dev tools we configured. Then we can start our favorite editor from the terminal and have access to all of them in it<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>.</p>
<p>This is all we need to get started on a Haskell project with Nix. There are some inconveniences in this setup, like we need to restart the Nix shell and the editor every time we modify our project dependencies, but these days most editors come with some extensions to do this automatically, without needing restarts. For more seamless experience in the terminal, we could install <a href="https://direnv.net/" target="_blank" rel="noopener"><code>direnv</code></a> and <a href="https://github.com/nix-community/nix-direnv" target="_blank" rel="noopener"><code>nix-direnv</code></a> that refresh the Nix shells automatically<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>.</p>
<h2 data-track-content data-content-name="bonus-round-flakes" data-content-piece="nix-for-haskell" id="bonus-round-flakes">Bonus Round: Flakes</h2>
<p>As a bonus, I’m going to show how to easily set up a <a href="https://nix.dev/concepts/flakes.html" target="_blank" rel="noopener">Nix Flake</a> for this project. Simply create a <code>flake.nix</code> file:</p>
<figure>
<div class="sourceCode" id="cb16" data-lang="nix"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> <span class="va">description</span> <span class="op">=</span> <span class="st">"ftr is demo project for using Nix to manage Haskell projects"</span><span class="op">;</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="va">inputs</span>.<span class="va">flake-utils</span>.<span class="va">url</span> <span class="op">=</span> <span class="st">"github:numtide/flake-utils"</span><span class="op">;</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> <span class="va">outputs</span> <span class="op">=</span> <span class="op">{</span> <span class="va">self</span><span class="op">,</span> <span class="va">flake-utils</span> <span class="op">}</span>:</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> flake<span class="op">-</span>utils<span class="op">.</span>lib<span class="op">.</span>eachDefaultSystem <span class="op">(</span><span class="va">system</span><span class="op">:</span></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> <span class="va">ftr</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./package.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">rec</span> <span class="op">{</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> <span class="va">devShells</span>.<span class="va">default</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./shell.nix</span> <span class="op">{</span> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span> <span class="op">};</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> <span class="va">packages</span>.<span class="va">default</span> <span class="op">=</span> ftr<span class="op">;</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> <span class="va">apps</span>.<span class="va">default</span> <span class="op">=</span> <span class="op">{</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="va">type</span> <span class="op">=</span> <span class="st">"app"</span><span class="op">;</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> <span class="va">program</span> <span class="op">=</span> <span class="st">"</span><span class="sc">${</span>ftr<span class="sc">}</span><span class="st">/bin/ftr"</span><span class="op">;</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> <span class="op">});</span></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<figcaption>
flake.nix
</figcaption>
</figure>
<p>We reuse the package and shell Nix files we created earlier. We have to commit everything to our VSC at this point. After that, we can run the newfangled Nix commands such as<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a>:</p>
<div class="sourceCode" id="cb18" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix develop <span class="co"># same as: nix-shell</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix build <span class="co"># same as: nix-build package</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix shell <span class="co"># builds the package and starts a shell with the built executable available</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix run <span class="co"># builds the package and runs the built executable</span></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix profile install <span class="co"># builds the package and installs the built executable in our Nix profile</span></span></code></pre></div>
<p>If we upload the project to a public Github repo, anyone with Nix set up can run and/or install our package executable by running single commands:</p>
<div class="sourceCode" id="cb19" data-lang="shell"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix run github:username/ftr <span class="co"># downloads, builds and runs without installing</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="ex">$</span> nix profile install github:username/ftr <span class="co"># downloads, builds and installs</span></span></code></pre></div>
<p>If that not super cool then I don’t know what is.</p>
<h2 class="notoc" data-track-content data-content-name="bonus-round-2-statically-linked-executable" data-content-piece="nix-for-haskell" id="bonus-round-2-statically-linked-executable">Bonus Round 2: Statically Linked Executable</h2>
<p>Create a file <code>package-static.nix</code> and <code>nix-build</code> it to create a statically linked executable on Linux<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a>, which can be run on any Linux machine without installing any dependency libraries or even Nix<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a>:</p>
<figure>
<div class="sourceCode" id="cb20" data-lang="nix"><pre class="sourceCode numberSource nix"><code class="sourceCode nix"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="op">{</span> <span class="va">system</span> <span class="op">?</span> <span class="bu">builtins</span><span class="op">.</span>currentSystem <span class="op">}</span>:</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="va">sources</span> <span class="op">=</span> <span class="bu">import</span> <span class="ss">./nix/sources.nix</span><span class="op">;</span></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="va">nixpkgs</span> <span class="op">=</span> <span class="bu">import</span> sources<span class="op">.</span>nixpkgs <span class="op">{</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">inherit</span> <span class="va">system</span><span class="op">;</span></span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="va">overlays</span> <span class="op">=</span> <span class="op">[</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> <span class="op">(</span><span class="va">final</span><span class="op">:</span> <span class="va">prev</span><span class="op">:</span> <span class="op">{</span></span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> <span class="va">haskellPackages</span> <span class="op">=</span> prev<span class="op">.</span>haskellPackages<span class="op">.</span>override <span class="op">{</span></span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a> <span class="va">ghc</span> <span class="op">=</span> prev<span class="op">.</span>haskellPackages<span class="op">.</span>ghc<span class="op">.</span>override <span class="op">{</span></span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a> <span class="va">enableRelocatedStaticLibs</span> <span class="op">=</span> <span class="cn">true</span><span class="op">;</span></span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a> <span class="va">enableShared</span> <span class="op">=</span> <span class="cn">false</span><span class="op">;</span></span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a> <span class="va">enableDwarf</span> <span class="op">=</span> <span class="cn">false</span><span class="op">;</span></span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a> <span class="va">buildHaskellPackages</span> <span class="op">=</span></span>
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a> prev<span class="op">.</span>haskellPackages<span class="op">.</span>buildHaskellPackages<span class="op">.</span>override</span>
<span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a> <span class="op">(</span><span class="va">old</span><span class="op">:</span> <span class="op">{</span> <span class="va">ghc</span> <span class="op">=</span> final<span class="op">.</span>haskellPackages<span class="op">.</span>ghc<span class="op">;</span> <span class="op">});</span></span>
<span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a> <span class="op">})</span></span>
<span id="cb20-19"><a href="#cb20-19" aria-hidden="true" tabindex="-1"></a> <span class="op">];</span></span>
<span id="cb20-20"><a href="#cb20-20" aria-hidden="true" tabindex="-1"></a> <span class="va">config</span> <span class="op">=</span> <span class="op">{</span> <span class="op">};</span></span>
<span id="cb20-21"><a href="#cb20-21" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb20-22"><a href="#cb20-22" aria-hidden="true" tabindex="-1"></a> <span class="va">pkgs</span> <span class="op">=</span> nixpkgs<span class="op">.</span>pkgsMusl<span class="op">;</span></span>
<span id="cb20-23"><a href="#cb20-23" aria-hidden="true" tabindex="-1"></a> <span class="va">hlib</span> <span class="op">=</span> pkgs<span class="op">.</span>haskell<span class="op">.</span>lib<span class="op">.</span>compose<span class="op">;</span></span>
<span id="cb20-24"><a href="#cb20-24" aria-hidden="true" tabindex="-1"></a><span class="kw">in</span> pkgs<span class="op">.</span>lib<span class="op">.</span>pipe</span>
<span id="cb20-25"><a href="#cb20-25" aria-hidden="true" tabindex="-1"></a><span class="op">(</span>pkgs<span class="op">.</span>haskellPackages<span class="op">.</span>callCabal2nix <span class="st">"ftr"</span> <span class="op">(</span>pkgs<span class="op">.</span>lib<span class="op">.</span>cleanSource <span class="ss">./.</span><span class="op">)</span> <span class="op">{</span> <span class="op">})</span> <span class="op">[</span></span>
<span id="cb20-26"><a href="#cb20-26" aria-hidden="true" tabindex="-1"></a> hlib<span class="op">.</span>dontHaddock</span>
<span id="cb20-27"><a href="#cb20-27" aria-hidden="true" tabindex="-1"></a> hlib<span class="op">.</span>justStaticExecutables</span>
<span id="cb20-28"><a href="#cb20-28" aria-hidden="true" tabindex="-1"></a> hlib<span class="op">.</span>disableSharedLibraries</span>
<span id="cb20-29"><a href="#cb20-29" aria-hidden="true" tabindex="-1"></a> hlib<span class="op">.</span>enableDeadCodeElimination</span>
<span id="cb20-30"><a href="#cb20-30" aria-hidden="true" tabindex="-1"></a> <span class="op">(</span>hlib<span class="op">.</span>appendConfigureFlags <span class="op">[</span></span>
<span id="cb20-31"><a href="#cb20-31" aria-hidden="true" tabindex="-1"></a> <span class="st">"-O2"</span></span>
<span id="cb20-32"><a href="#cb20-32" aria-hidden="true" tabindex="-1"></a> <span class="st">"--ghc-option=-fPIC"</span></span>
<span id="cb20-33"><a href="#cb20-33" aria-hidden="true" tabindex="-1"></a> <span class="st">"--ghc-option=-optl=-static"</span></span>
<span id="cb20-34"><a href="#cb20-34" aria-hidden="true" tabindex="-1"></a> <span class="st">"--extra-lib-dirs=</span><span class="sc">${</span>pkgs<span class="op">.</span>gmp6<span class="op">.</span>override <span class="op">{</span> <span class="va">withStatic</span> <span class="op">=</span> <span class="cn">true</span><span class="op">;</span> <span class="op">}</span><span class="sc">}</span><span class="st">/lib"</span></span>
<span id="cb20-35"><a href="#cb20-35" aria-hidden="true" tabindex="-1"></a> <span class="st">"--extra-lib-dirs=</span><span class="sc">${</span></span>
<span id="cb20-36"><a href="#cb20-36" aria-hidden="true" tabindex="-1"></a> pkgs<span class="op">.</span>libffi<span class="op">.</span>overrideAttrs <span class="op">(</span><span class="va">old</span><span class="op">:</span> <span class="op">{</span> <span class="va">dontDisableStatic</span> <span class="op">=</span> <span class="cn">true</span><span class="op">;</span> <span class="op">})</span></span>
<span id="cb20-37"><a href="#cb20-37" aria-hidden="true" tabindex="-1"></a> <span class="sc">}</span><span class="st">/lib"</span></span>
<span id="cb20-38"><a href="#cb20-38" aria-hidden="true" tabindex="-1"></a> <span class="st">"--extra-lib-dirs=</span><span class="sc">${</span>pkgs<span class="op">.</span>ncurses<span class="op">.</span>override <span class="op">{</span> <span class="va">enableStatic</span> <span class="op">=</span> <span class="cn">true</span><span class="op">;</span> <span class="op">}</span><span class="sc">}</span><span class="st">/lib"</span></span>
<span id="cb20-39"><a href="#cb20-39" aria-hidden="true" tabindex="-1"></a> <span class="st">"--extra-lib-dirs=</span><span class="sc">${</span>pkgs<span class="op">.</span>zlib<span class="op">.</span>static<span class="sc">}</span><span class="st">/lib"</span></span>
<span id="cb20-40"><a href="#cb20-40" aria-hidden="true" tabindex="-1"></a> <span class="op">])</span></span>
<span id="cb20-41"><a href="#cb20-41" aria-hidden="true" tabindex="-1"></a><span class="op">]</span></span></code></pre></div>
<figcaption>
package-static.nix
</figcaption>
</figure>
<h2 data-track-content data-content-name="conclusion" data-content-piece="nix-for-haskell" id="conclusion">Conclusion</h2>
<p>This post shows a quick and easy way to get started with using Nix for managing simple Haskell projects. Unfortunately, if we have any complex requirements, such as custom dependency versions, patched dependencies, custom non-Haskell dependencies, custom configuration for Nixpkgs, multi-component Haskell projects, using a different GHC version, custom build scripts etc, this setup does not scale. In such case you can either grow this setup by learning Nix in more depth with the help of the official <a href="https://wiki.nixos.org/w/index.php?title=Haskell" target="_blank" rel="noopener">Haskell with Nix docs</a> and this <a href="https://github.com/Gabriella439/haskell-nix" target="_blank" rel="noopener">great tutorial</a>, or switch to using a framework like <a href="https://github.com/pwm/nixkell" target="_blank" rel="noopener">Nixkell</a> or <a href="https://github.com/srid/haskell-flake" target="_blank" rel="noopener">haskell-flake</a>.</p>
<p>This post only scratches the surface of all things possible to do with Nix. I hope I was able to showcase some benefits of Nix, and help you get started. Happy Haskelling and happy Nixing!</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>One big advantage that Nix has over using Cabal for managing Haskell projects is the <a href="https://cache.nixos.org/" target="_blank" rel="noopener">Nix binary cache</a> that provides pre-built libraries and executable for download. That means no more waiting for Cabal to build scores of dependencies from sources.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Search Nixpkgs for packages at <a href="https://search.nixos.org/" target="_blank" rel="noopener">search.nixos.org</a>.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>I’m assuming that you’ve already set up Nix at this point. If you have not, follow <a href="https://nix.dev/manual/nix/2.24/installation/" target="_blank" rel="noopener">this guide</a>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Of course, we can use Niv to manage any number of source repos, not just Nixpkgs. But we don’t need any other for this post.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>We could do all sort of interesting and useful things here, like patching some Nixpkgs packages with our own patches, reconfiguring the build flags of some packages, etc.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p><a href="https://github.com/ndmitchell/hlint" target="_blank" rel="noopener"><code>hlint</code></a> is a Haskell linter, <a href="https://github.com/tweag/ormolu" target="_blank" rel="noopener"><code>ormolu</code></a> is a Haskell file formatter, and <a href="https://github.com/haskell/haskell-language-server/" target="_blank" rel="noopener"><code>haskell-language-server</code></a> is an LSP server for Haskell. Other tools that I find useful are <a href="https://github.com/kowainik/stan" target="_blank" rel="noopener"><code>stan</code></a>, the Haskell static analyzer, <a href="https://just.systems/" target="_blank" rel="noopener"><code>just</code></a>, the command runner, and <a href="https://github.com/NixOS/nixfmt" target="_blank" rel="noopener"><code>nixfmt</code></a>, the Nix file formatter. All of them and more are available through Nixpkgs. You can start using them by adding them to <code>nativeBuildInputs</code>.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>If you are wondering why we need to wrap only <code>haskell-language-server</code> with all the <code>ghc</code> stuff, that’s because, to work correctly <code>haskell-language-server</code> is required to be compiled with same version of <code>ghc</code> that your project is going to used. The other tools do not have this restriction.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>You may notice Nix downloading a lot of stuff from Nixpkgs. It may occasionally need to build a few things as well, if they are not available in the binary cache.</p>
<p>You may need to tweak the <code>connect-timeout</code> and <code>download-attempts</code> settings in the <a href="https://nix.dev/manual/nix/2.24/command-ref/conf-file.html" target="_blank" rel="noopener"><code>nix.conf</code></a> file if you are on a slow network.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>There are many more options that we can set here. These options roughly correspond to the command line options for the <code>cabal</code> command. See a comprehensive list <a href="https://ryantm.github.io/nixpkgs/languages-frameworks/haskell/#haskell-development-helpers" target="_blank" rel="noopener">here</a>.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>To update the tools and dependencies of the project, run <code>niv update nixpkgs</code>, and restart the Nix shell.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>Use this <code>.envrc</code> file to configure <code>direnv</code> for automatic refreshes for this project:</p>
<div class="sourceCode" id="cb15" data-lang="bash"><pre class="sourceCode numberSource bash"><code class="sourceCode bash"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="co">#!/usr/bin/env bash</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ex">use</span> nix</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="ex">watch_file</span> shell.nix</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ex">watch_file</span> nix/sources.json</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ex">watch_file</span> ftr.cabal</span></code></pre></div>
<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn12"><p>First, we would have to modify our <code>nix.conf</code> file to enable these commands by adding the line:</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode numberSource conf"><code class="sourceCode toml"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="dt">experimental-features</span> <span class="op">=</span> <span class="dt">nix-command</span> <span class="dt">flakes</span></span></code></pre></div>
<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn13"><p>This might take several hours to finish when run for the first time. Also, the <code>enableDwarf = false</code> config requires GHC >= 9.6.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn14"><p>Another benefit of statically linked executables is, if you package them in Docker/OCI containers, the container sizes are much smaller than ones created for dynamically linked executables.<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/nix-for-haskell/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2024-08-29T00:00:00Z <p>So, you’ve heard of the new hotness that is <a href="https://nixos.org" target="_blank" rel="noopener">Nix</a>, for creating reproducible and isolated development environments, and want to use it for your new Haskell project? But you are unclear about how to get started? Then this is the guide you are looking for.</p>
https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/ Solving Advent of Code “Aplenty” by Compiling 2024-04-07T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>Every year I try to solve some problems from the <a href="https://adventofcode.com" target="_blank" rel="noopener">Advent of Code</a> (AoC) competition in a <a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">not</a> <a href="https://abhinavsarkar.net/notes/2022-type-level-rps?mtm_campaign=feed">straightforward</a> <a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">way</a>. Let’s solve the part one of the day 19 problem <a href="https://adventofcode.com/2023/day/19" target="_blank" rel="noopener">Aplenty</a> by compiling the problem input to an executable file.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<strong>“Aplenty” by Compiling</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#the-problem">The Problem</a></li><li><a href="#the-parser">The Parser</a></li><li><a href="#the-interpreter">The Interpreter</a></li><li><a href="#the-control-flow-graph">The Control-flow Graph</a></li><li><a href="#the-compiler">The Compiler</a></li><li><a href="#the-compiler-output">The Compiler Output</a></li><li><a href="#the-bonus-optimizations">The Bonus: Optimizations</a></li><li><a href="#the-conclusion">The Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="the-problem" data-content-piece="compiling-aoc23-aplenty" id="the-problem">The Problem</h2>
<p>What the problem presents as input is essentially a program. Here is the example input:</p>
<figure>
<pre class="plain"><code>px{a<2006:qkq,m>2090:A,rfg}
pv{a>1716:R,A}
lnx{m>1548:A,A}
rfg{s<537:gd,x>2440:R,A}
qs{s>3448:A,lnx}
qkq{x<1416:A,crn}
crn{x>2662:A,R}
in{s<1351:px,qqz}
qqz{s>2770:qs,m<1801:hdj,R}
gd{a>3333:R,R}
hdj{m>838:A,pv}
{x=787,m=2655,a=1222,s=2876}
{x=1679,m=44,a=2067,s=496}
{x=2036,m=264,a=79,s=2244}
{x=2461,m=1339,a=466,s=291}
{x=2127,m=1623,a=2188,s=1013}</code></pre>
<figcaption>
exinput.txt
</figcaption>
</figure>
<p>Each line in the first section of the input is a code block. The bodies of the blocks have statements of these types:</p>
<ul>
<li><em>Accept</em> (<code>A</code>) or <em>Reject</em> (<code>R</code>) that terminate the program.</li>
<li>Jumps to other blocks by their names, for example: <code>rfg</code> as the last statement of the <code>px</code> block in the first line.</li>
<li>Conditional statements that have a condition and what to do if the condition is true, which can be only Accept/Reject or a jump to another block.</li>
</ul>
<p>The problem calls the statements <em>“rules”</em>, the blocks <em>“workflows”</em>, and the program <em>“system”</em>.</p>
<p>All blocks of the program operates on a set of four values: <code>x</code>, <code>m</code>, <code>a</code>, and <code>s</code>. The problem calls them <em>“ratings”</em>, and each set of ratings is for/forms a <em>“part”</em>. The second section of the input specifies a bunch of these parts to run the system against.</p>
<p>This seems to map very well to a C program, with <code>Accept</code> and <code>Reject</code> returning <code class="sourceCode c"><span class="kw">true</span></code> and <code class="sourceCode c"><span class="kw">false</span></code> respectively, and jumps accomplished using <code class="sourceCode c"><span class="cf">goto</span></code>s. So that’s what we’ll do: we’ll compile the problem input to a C program, then compile that to an executable, and run it to get the solution to the problem.</p>
<p>And of course, we’ll do all this in Haskell. First some imports:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase #-}</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE StrictData #-}</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Array</span> <span class="kw">as</span> <span class="dt">Array</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (digitToInt, isAlpha, isDigit)</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Foldable</span> (foldl', foldr')</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Function</span> (fix)</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Functor</span> (($>))</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Graph</span> <span class="kw">as</span> <span class="dt">Graph</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (intercalate, (\\))</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Map.Strict</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Environment</span> (getArgs)</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.ParserCombinators.ReadP</span> <span class="kw">as</span> <span class="dt">P</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Prelude</span> <span class="kw">hiding</span> (<span class="dt">GT</span>, <span class="dt">LT</span>)</span></code></pre></div>
<h2 data-track-content data-content-name="the-parser" data-content-piece="compiling-aoc23-aplenty" id="the-parser">The Parser</h2>
<p>First, we parse the input program to Haskell data types. We use the <a href="https://hackage.haskell.org/package/base/docs/Text-ParserCombinators-ReadP.html" target="_blank" rel="noopener">ReadP</a> parser library built into the Haskell standard library.</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Part</span> <span class="ot">=</span> <span class="dt">Part</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> partX ::</span> <span class="dt">Int</span>,</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> partM ::</span> <span class="dt">Int</span>,</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> partA ::</span> <span class="dt">Int</span>,</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="ot"> partS ::</span> <span class="dt">Int</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> } <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Rating</span> <span class="ot">=</span> <span class="dt">X</span> <span class="op">|</span> <span class="dt">M</span> <span class="op">|</span> <span class="dt">A</span> <span class="op">|</span> <span class="dt">S</span> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="ot">emptyPart ::</span> <span class="dt">Part</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>emptyPart <span class="ot">=</span> <span class="dt">Part</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span> <span class="dv">0</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="ot">addRating ::</span> <span class="dt">Part</span> <span class="ot">-></span> (<span class="dt">Rating</span>, <span class="dt">Int</span>) <span class="ot">-></span> <span class="dt">Part</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a>addRating p (r, v) <span class="ot">=</span> <span class="kw">case</span> r <span class="kw">of</span></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">X</span> <span class="ot">-></span> p {partX <span class="ot">=</span> v}</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">M</span> <span class="ot">-></span> p {partM <span class="ot">=</span> v}</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">A</span> <span class="ot">-></span> p {partA <span class="ot">=</span> v}</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">S</span> <span class="ot">-></span> p {partS <span class="ot">=</span> v}</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="ot">partParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Part</span></span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a>partParser <span class="ot">=</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a> foldl' addRating emptyPart</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> P.between (P.char <span class="ch">'{'</span>) (P.char <span class="ch">'}'</span>)</span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> (partRatingParser <span class="ot">`P.sepBy1`</span> P.char <span class="ch">','</span>)</span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a><span class="ot">partRatingParser ::</span> <span class="dt">P.ReadP</span> (<span class="dt">Rating</span>, <span class="dt">Int</span>)</span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a>partRatingParser <span class="ot">=</span></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a> (,) <span class="op"><$></span> ratingParser <span class="op"><*></span> (P.char <span class="ch">'='</span> <span class="op">*></span> intParser)</span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a><span class="ot">ratingParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Rating</span></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a>ratingParser <span class="ot">=</span></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> P.get <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> <span class="ch">'x'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">X</span></span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> <span class="ch">'m'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">M</span></span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> <span class="ch">'a'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">A</span></span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> <span class="ch">'s'</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">S</span></span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> P.pfail</span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-39"><a href="#cb3-39" aria-hidden="true" tabindex="-1"></a><span class="ot">intParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Int</span></span>
<span id="cb3-40"><a href="#cb3-40" aria-hidden="true" tabindex="-1"></a>intParser <span class="ot">=</span></span>
<span id="cb3-41"><a href="#cb3-41" aria-hidden="true" tabindex="-1"></a> foldl' (\n d <span class="ot">-></span> n <span class="op">*</span> <span class="dv">10</span> <span class="op">+</span> d) <span class="dv">0</span> <span class="op"><$></span> P.many1 digitParser</span>
<span id="cb3-42"><a href="#cb3-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-43"><a href="#cb3-43" aria-hidden="true" tabindex="-1"></a><span class="ot">digitParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Int</span></span>
<span id="cb3-44"><a href="#cb3-44" aria-hidden="true" tabindex="-1"></a>digitParser <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> P.satisfy <span class="fu">isDigit</span></span>
<span id="cb3-45"><a href="#cb3-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-46"><a href="#cb3-46" aria-hidden="true" tabindex="-1"></a><span class="ot">parse ::</span> (<span class="dt">Show</span> a) <span class="ot">=></span> <span class="dt">P.ReadP</span> a <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Either</span> <span class="dt">String</span> a</span>
<span id="cb3-47"><a href="#cb3-47" aria-hidden="true" tabindex="-1"></a>parse parser text <span class="ot">=</span> <span class="kw">case</span> P.readP_to_S (parser <span class="op"><*</span> P.eof) text <span class="kw">of</span></span>
<span id="cb3-48"><a href="#cb3-48" aria-hidden="true" tabindex="-1"></a> [(res, <span class="st">""</span>)] <span class="ot">-></span> <span class="dt">Right</span> res</span>
<span id="cb3-49"><a href="#cb3-49" aria-hidden="true" tabindex="-1"></a> [(_, s)] <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Leftover input: "</span> <span class="op"><></span> s</span>
<span id="cb3-50"><a href="#cb3-50" aria-hidden="true" tabindex="-1"></a> out <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Unexpected output: "</span> <span class="op"><></span> <span class="fu">show</span> out</span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">Part</span></code> is a Haskell data type representing parts, and <code class="sourceCode haskell"><span class="dt">Rating</span></code> is an enum for, well, ratings<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.
Following that are parsers for parts and ratings, written in Applicative and Monadic styles using the basic parsers and combinators provided by the ReadP library.</p>
<p>Finally, we have the <code>parse</code> function to run a parser on an input. We can try parsing parts in GHCi:</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parse partParser <span class="st">"{x=2127,m=1623,a=2188,s=1013}"</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>Right (Part {partX = 2127, partM = 1623, partA = 2188, partS = 1013})</span></code></pre></div>
<p>Next, we represent and parse the program, I mean, the system:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">System</span> <span class="ot">=</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">System</span> (<span class="dt">Map.Map</span> <span class="dt">WorkflowName</span> <span class="dt">Workflow</span>)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Workflow</span> <span class="ot">=</span> <span class="dt">Workflow</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> wName ::</span> <span class="dt">WorkflowName</span>,</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="ot"> wRules ::</span> [<span class="dt">Rule</span>]</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> } <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">WorkflowName</span> <span class="ot">=</span> <span class="dt">String</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Rule</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">AtomicRule</span> <span class="dt">AtomicRule</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">If</span> <span class="dt">Condition</span> <span class="dt">AtomicRule</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">AtomicRule</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Jump</span> <span class="dt">WorkflowName</span></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Accept</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Reject</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>, <span class="dt">Ord</span>)</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Condition</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Comparison</span> <span class="dt">Rating</span> <span class="dt">CmpOp</span> <span class="dt">Int</span></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">CmpOp</span> <span class="ot">=</span> <span class="dt">LT</span> <span class="op">|</span> <span class="dt">GT</span> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span></code></pre></div>
<p>A <code class="sourceCode haskell"><span class="dt">System</span></code> is a map of workflows by their names. A <code class="sourceCode haskell"><span class="dt">Workflow</span></code> has a name and a list of rules. A <code class="sourceCode haskell"><span class="dt">Rule</span></code> is either an <code class="sourceCode haskell"><span class="dt">AtomicRule</span></code>, or an <code class="sourceCode haskell"><span class="dt">If</span></code> rule. An <code class="sourceCode haskell"><span class="dt">AtomicRule</span></code> is either a <code class="sourceCode haskell"><span class="dt">Jump</span></code> to another workflow by name, or an <code class="sourceCode haskell"><span class="dt">Accept</span></code> or <code class="sourceCode haskell"><span class="dt">Reject</span></code> rule. The <code class="sourceCode haskell"><span class="dt">Condition</span></code> of an <code class="sourceCode haskell"><span class="dt">If</span></code> rule is a less that (<code class="sourceCode haskell"><span class="dt">LT</span></code>) or a greater than (<code class="sourceCode haskell"><span class="dt">GT</span></code>) <code class="sourceCode haskell"><span class="dt">Comparison</span></code> of some <code class="sourceCode haskell"><span class="dt">Rating</span></code> of an input part with an integer value.</p>
<p>Now, it’s time to parse the system:</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">systemParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">System</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>systemParser <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">System</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> foldl' (\m wf <span class="ot">-></span> Map.insert (wName wf) wf m) Map.empty</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> workflowParser <span class="ot">`P.endBy1`</span> P.char <span class="ch">'\n'</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a><span class="ot">workflowParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Workflow</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>workflowParser <span class="ot">=</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Workflow</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> P.many1 (P.satisfy <span class="fu">isAlpha</span>)</span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> P.between (P.char <span class="ch">'{'</span>) (P.char <span class="ch">'}'</span>)</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> (ruleParser <span class="ot">`P.sepBy1`</span> P.char <span class="ch">','</span>)</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a><span class="ot">ruleParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Rule</span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a>ruleParser <span class="ot">=</span></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a> (<span class="dt">AtomicRule</span> <span class="op"><$></span> atomicRuleParser) <span class="op">P.<++</span> ifRuleParser</span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a><span class="ot">ifRuleParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Rule</span></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a>ifRuleParser <span class="ot">=</span></span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span></span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> (<span class="dt">Comparison</span> <span class="op"><$></span> ratingParser <span class="op"><*></span> cmpOpParser <span class="op"><*></span> intParser)</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> (P.char <span class="ch">':'</span> <span class="op">*></span> atomicRuleParser)</span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a><span class="ot">atomicRuleParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">AtomicRule</span></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a>atomicRuleParser <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a> c <span class="op">:</span> _ <span class="ot"><-</span> P.look</span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a> <span class="ch">'A'</span> <span class="ot">-></span> P.char <span class="ch">'A'</span> <span class="op">$></span> <span class="dt">Accept</span></span>
<span id="cb6-29"><a href="#cb6-29" aria-hidden="true" tabindex="-1"></a> <span class="ch">'R'</span> <span class="ot">-></span> P.char <span class="ch">'R'</span> <span class="op">$></span> <span class="dt">Reject</span></span>
<span id="cb6-30"><a href="#cb6-30" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> (<span class="dt">Jump</span> <span class="op">.</span>) <span class="op">.</span> (<span class="op">:</span>) <span class="op"><$></span> P.char c <span class="op"><*></span> P.many1 (P.satisfy <span class="fu">isAlpha</span>)</span>
<span id="cb6-31"><a href="#cb6-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-32"><a href="#cb6-32" aria-hidden="true" tabindex="-1"></a><span class="ot">cmpOpParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">CmpOp</span></span>
<span id="cb6-33"><a href="#cb6-33" aria-hidden="true" tabindex="-1"></a>cmpOpParser <span class="ot">=</span> P.choice [P.char <span class="ch">'<'</span> <span class="op">$></span> <span class="dt">LT</span>, P.char <span class="ch">'>'</span> <span class="op">$></span> <span class="dt">GT</span>]</span></code></pre></div>
<p>Parsing is straightforward as there are no recursive data types or complicated precedence or associativity rules here. We can exercise it in GHCi (output formatted for clarity):</p>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parse workflowParser <span class="st">"px{a<2006:qkq,m>2090:A,rfg}"</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>Right (</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> Workflow {</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> wName = "px",</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> wRules = [</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> If (Comparison A LT 2006) (Jump "qkq"),</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> If (Comparison M GT 2090) Accept,</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> AtomicRule (Jump "rfg")</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div>
<p>Excellent! We can now combine the part parser and the system parser to parse the problem input:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Input</span> <span class="ot">=</span> <span class="dt">Input</span> <span class="dt">System</span> [<span class="dt">Part</span>] <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ot">inputParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Input</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>inputParser <span class="ot">=</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Input</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> systemParser</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> (P.char <span class="ch">'\n'</span> <span class="op">*></span> partParser <span class="ot">`P.endBy1`</span> P.char <span class="ch">'\n'</span>)</span></code></pre></div>
<p>Before moving on to translating the system to C, let’s write an interpreter so that we can compare the output of our final C program against it for validation.</p>
<h2 data-track-content data-content-name="the-interpreter" data-content-piece="compiling-aoc23-aplenty" id="the-interpreter">The Interpreter</h2>
<p>Each system has a workflow named “in”, where the execution of the system starts. Running the system results in <code class="sourceCode haskell"><span class="dt">True</span></code> if the run ends with an <code class="sourceCode haskell"><span class="dt">Accept</span></code> rule, or in <code class="sourceCode haskell"><span class="dt">False</span></code> if the run ends with a <code class="sourceCode haskell"><span class="dt">Reject</span></code> rule. With this in mind, let’s cook up the interpreter:</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runSystem ::</span> <span class="dt">System</span> <span class="ot">-></span> <span class="dt">Part</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>runSystem (<span class="dt">System</span> system) part <span class="ot">=</span> runRule <span class="op">$</span> <span class="dt">Jump</span> <span class="st">"in"</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a> runRule <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Accept</span> <span class="ot">-></span> <span class="dt">True</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Reject</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Jump</span> wfName <span class="ot">-></span> jump wfName</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> jump wfName <span class="ot">=</span> <span class="kw">case</span> Map.lookup wfName system <span class="kw">of</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> workflow <span class="ot">-></span> runRules <span class="op">$</span> wRules workflow</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Workflow not found in system: "</span> <span class="op"><></span> wfName</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> runRules <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a> (rule <span class="op">:</span> rest) <span class="ot">-></span> <span class="kw">case</span> rule <span class="kw">of</span></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> aRule <span class="ot">-></span> runRule aRule</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> cond aRule <span class="ot">-></span></span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> evalCond cond</span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> runRule aRule</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> runRules rest</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">error</span> <span class="st">"Workflow ended without accept/reject"</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> evalCond <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">Comparison</span> r <span class="dt">LT</span> value <span class="ot">-></span> rating r <span class="op"><</span> value</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Comparison</span> r <span class="dt">GT</span> value<span class="ot">-></span> rating r <span class="op">></span> value</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> rating <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a> <span class="dt">X</span> <span class="ot">-></span> partX part</span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a> <span class="dt">M</span> <span class="ot">-></span> partM part</span>
<span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a> <span class="dt">A</span> <span class="ot">-></span> partA part</span>
<span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a> <span class="dt">S</span> <span class="ot">-></span> partS part</span></code></pre></div>
<p>The interpreter starts by running the rule to jump to the “in” workflow. Running a rule returns <code class="sourceCode haskell"><span class="dt">True</span></code> or <code class="sourceCode haskell"><span class="dt">False</span></code> for <code class="sourceCode haskell"><span class="dt">Accept</span></code> or <code class="sourceCode haskell"><span class="dt">Reject</span></code> rules respectively, or jumps to a workflow for <code class="sourceCode haskell"><span class="dt">Jump</span></code> rules. Jumping to a workflow looks it up in the system’s map of workflows, and sequentially runs each of its rules.</p>
<p>An <code class="sourceCode haskell"><span class="dt">AtomicRule</span></code> is run as previously mentioned. An <code class="sourceCode haskell"><span class="dt">If</span></code> rule evaluates its condition, and either runs the consequent rule if the condition is true, or moves on to running the rest of the rules in the workflow.</p>
<p>That’s it for the interpreter. We can run it on the example input:</p>
<div class="sourceCode" id="cb10" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> inputText <span class="ot"><-</span> <span class="fu">readFile</span> <span class="st">"input.txt"</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Right</span> (<span class="dt">Input</span> system parts) <span class="ot">=</span> parse inputParser inputText</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runSystem system (parts <span class="op">!!</span> <span class="dv">0</span>)</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>True</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runSystem system (parts <span class="op">!!</span> <span class="dv">1</span>)</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a>False</span></code></pre></div>
<p>The AoC problem requires us to return the sum total of the ratings of the parts that are accepted by the system:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">solve ::</span> <span class="dt">Input</span> <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>solve (<span class="dt">Input</span> system parts) <span class="ot">=</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">sum</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">map</span> (\(<span class="dt">Part</span> x m a s) <span class="ot">-></span> x <span class="op">+</span> m <span class="op">+</span> a <span class="op">+</span> s)</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">filter</span> (runSystem system)</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> parts</span></code></pre></div>
<p>Let’s run it for the example input:</p>
<div class="sourceCode" id="cb12" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Right</span> input <span class="ot"><-</span> parse inputParser <span class="op"><$></span> <span class="fu">readFile</span> <span class="st">"exinput.txt"</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> solve input</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>19114</span></code></pre></div>
<p>It returns the correct answer!
Next up, we generate some C code.</p>
<h2 data-track-content data-content-name="the-control-flow-graph" data-content-piece="compiling-aoc23-aplenty" id="the-control-flow-graph">The Control-flow Graph</h2>
<p>But first, a quick digression to graphs. A <a href="https://en.wikipedia.org/wiki/Control-flow_graph" target="_blank" rel="noopener">Control-flow graph</a> or CFG, is a graph of all possible paths that can be taken through a program during its execution. It has many uses in compilers, but for now, we use it to generate more readable C code.</p>
<p>Using the <a href="https://hackage.haskell.org/package/containers/docs/Data-Graph.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Data.Graph</span></code></a> module from the <code>containers</code> package, we write the function to create a control-flow graph for our system/program, and use it to <a href="https://en.wikipedia.org/wiki/Topological_sorting" target="_blank" rel="noopener">topologically sort</a> the workflows:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Graph'</span> a <span class="ot">=</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Graph.Graph</span>, <span class="dt">Graph.Vertex</span> <span class="ot">-></span> (a, [a]), a <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Graph.Vertex</span>)</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="ot">cfGraph ::</span> <span class="dt">Map.Map</span> <span class="dt">WorkflowName</span> <span class="dt">Workflow</span> <span class="ot">-></span> <span class="dt">Graph'</span> <span class="dt">WorkflowName</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>cfGraph system <span class="ot">=</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> graphFromMap</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Map.toList</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">flip</span> Map.map system</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> \(<span class="dt">Workflow</span> _ rules) <span class="ot">-></span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">flip</span> <span class="fu">concatMap</span> rules <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> (<span class="dt">Jump</span> wfName) <span class="ot">-></span> [wfName]</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> _ (<span class="dt">Jump</span> wfName) <span class="ot">-></span> [wfName]</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> []</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a><span class="ot"> graphFromMap ::</span> (<span class="dt">Ord</span> a) <span class="ot">=></span> [(a, [a])] <span class="ot">-></span> <span class="dt">Graph'</span> a</span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> graphFromMap m <span class="ot">=</span></span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (graph, nLookup, vLookup) <span class="ot">=</span></span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> Graph.graphFromEdges <span class="op">$</span> <span class="fu">map</span> (\(f, ts) <span class="ot">-></span> (f, f, ts)) m</span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> (graph, \v <span class="ot">-></span> <span class="kw">let</span> (x, _, xs) <span class="ot">=</span> nLookup v <span class="kw">in</span> (x, xs), vLookup)</span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a><span class="ot">toposortWorkflows ::</span> <span class="dt">Map.Map</span> <span class="dt">WorkflowName</span> <span class="dt">Workflow</span> <span class="ot">-></span> [<span class="dt">WorkflowName</span>]</span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a>toposortWorkflows system <span class="ot">=</span></span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (cfg, nLookup, _) <span class="ot">=</span> cfGraph system</span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="fu">map</span> (<span class="fu">fst</span> <span class="op">.</span> nLookup) <span class="op">$</span> Graph.topSort cfg</span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">Graph'</span></code> is a simpler type for a graph of nodes of type <code>a</code>. The <code>cfGraph</code> function takes a the map from workflow names to workflows — that is, a system — and returns a control-flow graph of workflow names. It does this by finding jumps from workflows to other workflows, and connecting them.</p>
<p>Then, the <code>toposortWorkflows</code> function uses the created CFG to topologically sort the workflows. We’ll see this in action in a bit. Moving on to …</p>
<h2 data-track-content data-content-name="the-compiler" data-content-piece="compiling-aoc23-aplenty" id="the-compiler">The Compiler</h2>
<p>The compiler, for now, simply generates the C code for a given system. We write a <code class="sourceCode haskell"><span class="dt">ToC</span></code> typeclass for convenience:</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">ToC</span> a <span class="kw">where</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> toC ::</span> a <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Part</span> <span class="kw">where</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> toC (<span class="dt">Part</span> x m a s) <span class="ot">=</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"{"</span> <span class="op"><></span> intercalate <span class="st">", "</span> (<span class="fu">map</span> <span class="fu">show</span> [x, m, a, s]) <span class="op"><></span> <span class="st">"}"</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">CmpOp</span> <span class="kw">where</span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> toC <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">LT</span> <span class="ot">-></span> <span class="st">"<"</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">GT</span> <span class="ot">-></span> <span class="st">">"</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Rating</span> <span class="kw">where</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> toC <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">X</span> <span class="ot">-></span> <span class="st">"x"</span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">M</span> <span class="ot">-></span> <span class="st">"m"</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">A</span> <span class="ot">-></span> <span class="st">"a"</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">S</span> <span class="ot">-></span> <span class="st">"s"</span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">AtomicRule</span> <span class="kw">where</span></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a> toC <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Accept</span> <span class="ot">-></span> <span class="st">"return true;"</span></span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">Reject</span> <span class="ot">-></span> <span class="st">"return false;"</span></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">Jump</span> wfName <span class="ot">-></span> <span class="st">"goto "</span> <span class="op"><></span> wfName <span class="op"><></span> <span class="st">";"</span></span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Condition</span> <span class="kw">where</span></span>
<span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a> toC <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-28"><a href="#cb14-28" aria-hidden="true" tabindex="-1"></a> <span class="dt">Comparison</span> rating op val <span class="ot">-></span></span>
<span id="cb14-29"><a href="#cb14-29" aria-hidden="true" tabindex="-1"></a> toC rating <span class="op"><></span> <span class="st">" "</span> <span class="op"><></span> toC op <span class="op"><></span> <span class="st">" "</span> <span class="op"><></span> <span class="fu">show</span> val</span>
<span id="cb14-30"><a href="#cb14-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-31"><a href="#cb14-31" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Rule</span> <span class="kw">where</span></span>
<span id="cb14-32"><a href="#cb14-32" aria-hidden="true" tabindex="-1"></a> toC <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb14-33"><a href="#cb14-33" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> aRule <span class="ot">-></span> toC aRule</span>
<span id="cb14-34"><a href="#cb14-34" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> cond aRule <span class="ot">-></span></span>
<span id="cb14-35"><a href="#cb14-35" aria-hidden="true" tabindex="-1"></a> <span class="st">"if ("</span> <span class="op"><></span> toC cond <span class="op"><></span> <span class="st">") { "</span> <span class="op"><></span> toC aRule <span class="op"><></span> <span class="st">" }"</span></span>
<span id="cb14-36"><a href="#cb14-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-37"><a href="#cb14-37" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Workflow</span> <span class="kw">where</span></span>
<span id="cb14-38"><a href="#cb14-38" aria-hidden="true" tabindex="-1"></a> toC (<span class="dt">Workflow</span> wfName rules) <span class="ot">=</span></span>
<span id="cb14-39"><a href="#cb14-39" aria-hidden="true" tabindex="-1"></a> wfName</span>
<span id="cb14-40"><a href="#cb14-40" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">":\n"</span></span>
<span id="cb14-41"><a href="#cb14-41" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> intercalate <span class="st">"\n"</span> (<span class="fu">map</span> ((<span class="st">" "</span> <span class="op"><></span>) <span class="op">.</span> toC) rules)</span>
<span id="cb14-42"><a href="#cb14-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-43"><a href="#cb14-43" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">System</span> <span class="kw">where</span></span>
<span id="cb14-44"><a href="#cb14-44" aria-hidden="true" tabindex="-1"></a> toC (<span class="dt">System</span> system) <span class="ot">=</span></span>
<span id="cb14-45"><a href="#cb14-45" aria-hidden="true" tabindex="-1"></a> intercalate</span>
<span id="cb14-46"><a href="#cb14-46" aria-hidden="true" tabindex="-1"></a> <span class="st">"\n"</span></span>
<span id="cb14-47"><a href="#cb14-47" aria-hidden="true" tabindex="-1"></a> [ <span class="st">"bool runSystem(int x, int m, int a, int s) {"</span>,</span>
<span id="cb14-48"><a href="#cb14-48" aria-hidden="true" tabindex="-1"></a> <span class="st">" goto in;"</span>,</span>
<span id="cb14-49"><a href="#cb14-49" aria-hidden="true" tabindex="-1"></a> intercalate</span>
<span id="cb14-50"><a href="#cb14-50" aria-hidden="true" tabindex="-1"></a> <span class="st">"\n"</span></span>
<span id="cb14-51"><a href="#cb14-51" aria-hidden="true" tabindex="-1"></a> (<span class="fu">map</span> (toC <span class="op">.</span> (system <span class="op">Map.!</span>)) <span class="op">$</span> toposortWorkflows system),</span>
<span id="cb14-52"><a href="#cb14-52" aria-hidden="true" tabindex="-1"></a> <span class="st">"}"</span></span>
<span id="cb14-53"><a href="#cb14-53" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb14-54"><a href="#cb14-54" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-55"><a href="#cb14-55" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">ToC</span> <span class="dt">Input</span> <span class="kw">where</span></span>
<span id="cb14-56"><a href="#cb14-56" aria-hidden="true" tabindex="-1"></a> toC (<span class="dt">Input</span> system parts) <span class="ot">=</span></span>
<span id="cb14-57"><a href="#cb14-57" aria-hidden="true" tabindex="-1"></a> intercalate</span>
<span id="cb14-58"><a href="#cb14-58" aria-hidden="true" tabindex="-1"></a> <span class="st">"\n"</span></span>
<span id="cb14-59"><a href="#cb14-59" aria-hidden="true" tabindex="-1"></a> [ <span class="st">"#include <stdbool.h>"</span>,</span>
<span id="cb14-60"><a href="#cb14-60" aria-hidden="true" tabindex="-1"></a> <span class="st">"#include <stdio.h>\n"</span>,</span>
<span id="cb14-61"><a href="#cb14-61" aria-hidden="true" tabindex="-1"></a> toC system,</span>
<span id="cb14-62"><a href="#cb14-62" aria-hidden="true" tabindex="-1"></a> <span class="st">"int main() {"</span>,</span>
<span id="cb14-63"><a href="#cb14-63" aria-hidden="true" tabindex="-1"></a> <span class="st">" int parts[][4] = {"</span>,</span>
<span id="cb14-64"><a href="#cb14-64" aria-hidden="true" tabindex="-1"></a> intercalate <span class="st">",\n"</span> (<span class="fu">map</span> ((<span class="st">" "</span> <span class="op"><></span>) <span class="op">.</span> toC) parts),</span>
<span id="cb14-65"><a href="#cb14-65" aria-hidden="true" tabindex="-1"></a> <span class="st">" };"</span>,</span>
<span id="cb14-66"><a href="#cb14-66" aria-hidden="true" tabindex="-1"></a> <span class="st">" int totalRating = 0;"</span>,</span>
<span id="cb14-67"><a href="#cb14-67" aria-hidden="true" tabindex="-1"></a> <span class="st">" for(int i = 0; i < "</span> <span class="op"><></span> <span class="fu">show</span> (<span class="fu">length</span> parts) <span class="op"><></span> <span class="st">"; i++) {"</span>,</span>
<span id="cb14-68"><a href="#cb14-68" aria-hidden="true" tabindex="-1"></a> <span class="st">" int x = parts[i][0];"</span>,</span>
<span id="cb14-69"><a href="#cb14-69" aria-hidden="true" tabindex="-1"></a> <span class="st">" int m = parts[i][1];"</span>,</span>
<span id="cb14-70"><a href="#cb14-70" aria-hidden="true" tabindex="-1"></a> <span class="st">" int a = parts[i][2];"</span>,</span>
<span id="cb14-71"><a href="#cb14-71" aria-hidden="true" tabindex="-1"></a> <span class="st">" int s = parts[i][3];"</span>,</span>
<span id="cb14-72"><a href="#cb14-72" aria-hidden="true" tabindex="-1"></a> <span class="st">" if (runSystem(x, m, a, s)) {"</span>,</span>
<span id="cb14-73"><a href="#cb14-73" aria-hidden="true" tabindex="-1"></a> <span class="st">" totalRating += x + m + a + s;"</span>,</span>
<span id="cb14-74"><a href="#cb14-74" aria-hidden="true" tabindex="-1"></a> <span class="st">" }"</span>,</span>
<span id="cb14-75"><a href="#cb14-75" aria-hidden="true" tabindex="-1"></a> <span class="st">" }"</span>,</span>
<span id="cb14-76"><a href="#cb14-76" aria-hidden="true" tabindex="-1"></a> <span class="st">" printf(\"%d\", totalRating);"</span>,</span>
<span id="cb14-77"><a href="#cb14-77" aria-hidden="true" tabindex="-1"></a> <span class="st">" return 0;"</span>,</span>
<span id="cb14-78"><a href="#cb14-78" aria-hidden="true" tabindex="-1"></a> <span class="st">"}"</span></span>
<span id="cb14-79"><a href="#cb14-79" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>As mentioned before, <code class="sourceCode haskell"><span class="dt">Accept</span></code> and <code class="sourceCode haskell"><span class="dt">Reject</span></code> rules are converted to return <code class="sourceCode c"><span class="kw">true</span></code> and <code class="sourceCode c"><span class="kw">false</span></code> respectively, and <code class="sourceCode haskell"><span class="dt">Jump</span></code> rules are converted to <code class="sourceCode c"><span class="cf">goto</span></code>s. <code class="sourceCode haskell"><span class="dt">If</span></code> rules become <code class="sourceCode c"><span class="cf">if</span></code> statements, and <code class="sourceCode haskell"><span class="dt">Workflow</span></code>s become block labels followed by block statements.</p>
<p>A <code class="sourceCode haskell"><span class="dt">System</span></code> is translated to a function <code>runSystem</code> that takes four parameters, <code>x</code>, <code>m</code>, <code>a</code> and <code>s</code>, and runs the workflows translated to blocks by executing <code class="sourceCode c"><span class="cf">goto</span> in</code>.</p>
<p>Finally, an <code class="sourceCode haskell"><span class="dt">Input</span></code> is converted to a C file with the required includes, and a <code>main</code> function that solves the problem by calling the <code>runSystem</code> function for all parts.</p>
<p>Let’s throw in a <code>main</code> function to put everything together.</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> file <span class="ot"><-</span> <span class="fu">head</span> <span class="op"><$></span> getArgs</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> <span class="fu">readFile</span> file</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> parse inputParser code <span class="kw">of</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> input <span class="ot">-></span> <span class="fu">putStrLn</span> <span class="op">$</span> toC input</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="fu">error</span> err</span></code></pre></div>
<p>The <code>main</code> function reads the input from the file provided as the command line argument, parses it and outputs the generated C code. Let’s run it now.</p>
<h2 data-track-content data-content-name="the-compiler-output" data-content-piece="compiling-aoc23-aplenty" id="the-compiler-output">The Compiler Output</h2>
<p>We compile our compiler and run it to generate the C code for the example problem:</p>
<pre class="plain"><code>$ ghc --make aplenty.hs
$ ./aplenty exinput.txt > aplenty.c</code></pre>
<p>This is the C code it generates:</p>
<div class="sourceCode" id="cb17" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdbool.h></span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im"><stdio.h></span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a><span class="dt">bool</span> runSystem<span class="op">(</span><span class="dt">int</span> x<span class="op">,</span> <span class="dt">int</span> m<span class="op">,</span> <span class="dt">int</span> a<span class="op">,</span> <span class="dt">int</span> s<span class="op">)</span> <span class="op">{</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> in<span class="op">;</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>in<span class="op">:</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op"><</span> <span class="dv">1351</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> px<span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> qqz<span class="op">;</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>qqz<span class="op">:</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op">></span> <span class="dv">2770</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> qs<span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op"><</span> <span class="dv">1801</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> hdj<span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a>qs<span class="op">:</span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op">></span> <span class="dv">3448</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> lnx<span class="op">;</span></span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a>lnx<span class="op">:</span></span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">1548</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a>px<span class="op">:</span></span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>a <span class="op"><</span> <span class="dv">2006</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> qkq<span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">2090</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> rfg<span class="op">;</span></span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a>rfg<span class="op">:</span></span>
<span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op"><</span> <span class="dv">537</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> gd<span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op">></span> <span class="dv">2440</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a>qkq<span class="op">:</span></span>
<span id="cb17-28"><a href="#cb17-28" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op"><</span> <span class="dv">1416</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-29"><a href="#cb17-29" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> crn<span class="op">;</span></span>
<span id="cb17-30"><a href="#cb17-30" aria-hidden="true" tabindex="-1"></a>hdj<span class="op">:</span></span>
<span id="cb17-31"><a href="#cb17-31" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">838</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-32"><a href="#cb17-32" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> pv<span class="op">;</span></span>
<span id="cb17-33"><a href="#cb17-33" aria-hidden="true" tabindex="-1"></a>pv<span class="op">:</span></span>
<span id="cb17-34"><a href="#cb17-34" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>a <span class="op">></span> <span class="dv">1716</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-35"><a href="#cb17-35" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb17-36"><a href="#cb17-36" aria-hidden="true" tabindex="-1"></a>gd<span class="op">:</span></span>
<span id="cb17-37"><a href="#cb17-37" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>a <span class="op">></span> <span class="dv">3333</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-38"><a href="#cb17-38" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb17-39"><a href="#cb17-39" aria-hidden="true" tabindex="-1"></a>crn<span class="op">:</span></span>
<span id="cb17-40"><a href="#cb17-40" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op">></span> <span class="dv">2662</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb17-41"><a href="#cb17-41" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb17-42"><a href="#cb17-42" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb17-43"><a href="#cb17-43" aria-hidden="true" tabindex="-1"></a><span class="dt">int</span> main<span class="op">()</span> <span class="op">{</span></span>
<span id="cb17-44"><a href="#cb17-44" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> parts<span class="op">[][</span><span class="dv">4</span><span class="op">]</span> <span class="op">=</span> <span class="op">{</span></span>
<span id="cb17-45"><a href="#cb17-45" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span><span class="dv">787</span><span class="op">,</span> <span class="dv">2655</span><span class="op">,</span> <span class="dv">1222</span><span class="op">,</span> <span class="dv">2876</span><span class="op">},</span></span>
<span id="cb17-46"><a href="#cb17-46" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span><span class="dv">1679</span><span class="op">,</span> <span class="dv">44</span><span class="op">,</span> <span class="dv">2067</span><span class="op">,</span> <span class="dv">496</span><span class="op">},</span></span>
<span id="cb17-47"><a href="#cb17-47" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span><span class="dv">2036</span><span class="op">,</span> <span class="dv">264</span><span class="op">,</span> <span class="dv">79</span><span class="op">,</span> <span class="dv">2244</span><span class="op">},</span></span>
<span id="cb17-48"><a href="#cb17-48" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span><span class="dv">2461</span><span class="op">,</span> <span class="dv">1339</span><span class="op">,</span> <span class="dv">466</span><span class="op">,</span> <span class="dv">291</span><span class="op">},</span></span>
<span id="cb17-49"><a href="#cb17-49" aria-hidden="true" tabindex="-1"></a> <span class="op">{</span><span class="dv">2127</span><span class="op">,</span> <span class="dv">1623</span><span class="op">,</span> <span class="dv">2188</span><span class="op">,</span> <span class="dv">1013</span><span class="op">}</span></span>
<span id="cb17-50"><a href="#cb17-50" aria-hidden="true" tabindex="-1"></a> <span class="op">};</span></span>
<span id="cb17-51"><a href="#cb17-51" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> totalRating <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb17-52"><a href="#cb17-52" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span><span class="op">(</span><span class="dt">int</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> i <span class="op"><</span> <span class="dv">5</span><span class="op">;</span> i<span class="op">++)</span> <span class="op">{</span></span>
<span id="cb17-53"><a href="#cb17-53" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> x <span class="op">=</span> parts<span class="op">[</span>i<span class="op">][</span><span class="dv">0</span><span class="op">];</span></span>
<span id="cb17-54"><a href="#cb17-54" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> m <span class="op">=</span> parts<span class="op">[</span>i<span class="op">][</span><span class="dv">1</span><span class="op">];</span></span>
<span id="cb17-55"><a href="#cb17-55" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> a <span class="op">=</span> parts<span class="op">[</span>i<span class="op">][</span><span class="dv">2</span><span class="op">];</span></span>
<span id="cb17-56"><a href="#cb17-56" aria-hidden="true" tabindex="-1"></a> <span class="dt">int</span> s <span class="op">=</span> parts<span class="op">[</span>i<span class="op">][</span><span class="dv">3</span><span class="op">];</span></span>
<span id="cb17-57"><a href="#cb17-57" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>runSystem<span class="op">(</span>x<span class="op">,</span> m<span class="op">,</span> a<span class="op">,</span> s<span class="op">))</span> <span class="op">{</span></span>
<span id="cb17-58"><a href="#cb17-58" aria-hidden="true" tabindex="-1"></a> totalRating <span class="op">+=</span> x <span class="op">+</span> m <span class="op">+</span> a <span class="op">+</span> s<span class="op">;</span></span>
<span id="cb17-59"><a href="#cb17-59" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb17-60"><a href="#cb17-60" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb17-61"><a href="#cb17-61" aria-hidden="true" tabindex="-1"></a> printf<span class="op">(</span><span class="st">"</span><span class="sc">%d</span><span class="st">"</span><span class="op">,</span> totalRating<span class="op">);</span></span>
<span id="cb17-62"><a href="#cb17-62" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb17-63"><a href="#cb17-63" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>We see the <code>toposortWorkflows</code> function in action, sorting the blocks in the topological order of jumps between them, as opposed to the <a href="#the-problem">original input</a>. Does this work? Only one way to know:</p>
<pre class="plain"><code>$ gcc aplenty.c -o solution
$ ./solution
19114</code></pre>
<p>Perfect! The solution matches the interpreter output.</p>
<h2 data-track-content data-content-name="the-bonus-optimizations" data-content-piece="compiling-aoc23-aplenty" id="the-bonus-optimizations">The Bonus: Optimizations</h2>
<p>By studying the output C code, we spot some possibilities for optimizing the compiler output. Notice how the <code>lnx</code> block returns same value (<code class="sourceCode c"><span class="kw">true</span></code>) regardless of which branch it takes:</p>
<div class="sourceCode" id="cb19" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a>lnx<span class="op">:</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">1548</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span></code></pre></div>
<p>So, we should be able to replace it with:</p>
<div class="sourceCode" id="cb20" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>lnx<span class="op">:</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span></code></pre></div>
<p>If we do this, the <code>lnx</code> block becomes degenerate, and hence the jumps to the block can be inlined, turning the <code>qs</code> block from:</p>
<div class="sourceCode" id="cb21" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a>qs<span class="op">:</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op">></span> <span class="dv">3448</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> lnx<span class="op">;</span></span></code></pre></div>
<p>to:</p>
<div class="sourceCode" id="cb22" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a>qs<span class="op">:</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op">></span> <span class="dv">3448</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span></code></pre></div>
<p>which makes the <code class="sourceCode c"><span class="cf">if</span></code> statement in the <code>qs</code> block redundant as well. Hence, we can repeat the previous optimization and further reduce the generated code.</p>
<p>Another possible optimization is to inline the blocks to which there are only single jumps from the rest of the blocks, for example the <code>qqz</code> block.</p>
<p>Let’s write these optimizations.</p>
<h3 id="simplify-workflows">Simplify Workflows</h3>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">simplifyWorkflows ::</span> <span class="dt">System</span> <span class="ot">-></span> <span class="dt">System</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>simplifyWorkflows (<span class="dt">System</span> system) <span class="ot">=</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">System</span> <span class="op">$</span> Map.map simplifyWorkflow system</span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> simplifyWorkflow (<span class="dt">Workflow</span> name rules) <span class="ot">=</span></span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Workflow</span> name</span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> foldr'</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> ( \r rs <span class="ot">-></span> <span class="kw">case</span> rs <span class="kw">of</span></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> [r'] <span class="op">|</span> ruleOutcome r <span class="op">==</span> ruleOutcome r' <span class="ot">-></span> rs</span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> r <span class="op">:</span> rs</span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> )</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> [<span class="fu">last</span> rules]</span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> <span class="fu">init</span> rules</span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"></a> ruleOutcome <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> _ aRule <span class="ot">-></span> aRule</span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> aRule <span class="ot">-></span> aRule</span></code></pre></div>
<p><code>simplifyWorkflows</code> goes over all workflows and repeatedly removes the statements from the end of the blocks that has same outcome as the statement previous to them.</p>
<h3 id="inline-redundant-jumps">Inline Redundant Jumps</h3>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">inlineRedundantJumps ::</span> <span class="dt">System</span> <span class="ot">-></span> <span class="dt">System</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>inlineRedundantJumps (<span class="dt">System</span> system) <span class="ot">=</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">System</span> <span class="op">$</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> foldl' (<span class="fu">flip</span> Map.delete) (Map.map inlineJumps system) <span class="op">$</span></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a> Map.keys redundantJumps</span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a> redundantJumps <span class="ot">=</span></span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a> Map.map (\wf <span class="ot">-></span> <span class="kw">let</span> <span class="op">~</span>(<span class="dt">AtomicRule</span> rule) <span class="ot">=</span> <span class="fu">head</span> <span class="op">$</span> wRules wf <span class="kw">in</span> rule)</span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Map.filter (\wf <span class="ot">-></span> <span class="fu">length</span> (wRules wf) <span class="op">==</span> <span class="dv">1</span>)</span>
<span id="cb24-10"><a href="#cb24-10" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> system</span>
<span id="cb24-11"><a href="#cb24-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-12"><a href="#cb24-12" aria-hidden="true" tabindex="-1"></a> inlineJumps (<span class="dt">Workflow</span> name rules) <span class="ot">=</span></span>
<span id="cb24-13"><a href="#cb24-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Workflow</span> name <span class="op">$</span> <span class="fu">map</span> inlineJump rules</span>
<span id="cb24-14"><a href="#cb24-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-15"><a href="#cb24-15" aria-hidden="true" tabindex="-1"></a> inlineJump <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb24-16"><a href="#cb24-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> (<span class="dt">Jump</span> wfName)</span>
<span id="cb24-17"><a href="#cb24-17" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> Map.member wfName redundantJumps <span class="ot">-></span></span>
<span id="cb24-18"><a href="#cb24-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> <span class="op">$</span> redundantJumps <span class="op">Map.!</span> wfName</span>
<span id="cb24-19"><a href="#cb24-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> cond (<span class="dt">Jump</span> wfName)</span>
<span id="cb24-20"><a href="#cb24-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> Map.member wfName redundantJumps <span class="ot">-></span></span>
<span id="cb24-21"><a href="#cb24-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">If</span> cond <span class="op">$</span> redundantJumps <span class="op">Map.!</span> wfName</span>
<span id="cb24-22"><a href="#cb24-22" aria-hidden="true" tabindex="-1"></a> rule <span class="ot">-></span> rule</span></code></pre></div>
<p><code>inlineRedundantJumps</code> find the jumps to degenerate workflows and inlines them. It does this by first going over all workflows and creating a map of degenerate workflow names to the only rule in them, and then replacing the jumps to such workflows with the only rules.</p>
<h3 id="remove-jumps">Remove Jumps</h3>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">removeJumps ::</span> <span class="dt">System</span> <span class="ot">-></span> <span class="dt">System</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>removeJumps (<span class="dt">System</span> system) <span class="ot">=</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> system' <span class="ot">=</span></span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> foldl' (<span class="fu">flip</span> <span class="op">$</span> Map.adjust removeJumpsWithSingleJumper) system <span class="op">$</span></span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> toposortWorkflows system</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="dt">System</span></span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> foldl' (<span class="fu">flip</span> Map.delete) system'</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> (\\ [<span class="st">"in"</span>])</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> workflowsWithNJumpers <span class="dv">0</span> system'</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a> removeJumpsWithSingleJumper (<span class="dt">Workflow</span> name rules) <span class="ot">=</span></span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Workflow</span> name <span class="op">$</span></span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">init</span> rules <span class="op"><></span> <span class="kw">case</span> <span class="fu">last</span> rules <span class="kw">of</span></span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">AtomicRule</span> (<span class="dt">Jump</span> wfName)</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> wfName <span class="ot">`elem`</span> workflowsWithSingleJumper <span class="ot">-></span></span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (<span class="dt">Workflow</span> _ rules') <span class="ot">=</span> system <span class="op">Map.!</span> wfName</span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> rules'</span>
<span id="cb25-18"><a href="#cb25-18" aria-hidden="true" tabindex="-1"></a> rule <span class="ot">-></span> [rule]</span>
<span id="cb25-19"><a href="#cb25-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-20"><a href="#cb25-20" aria-hidden="true" tabindex="-1"></a> workflowsWithSingleJumper <span class="ot">=</span> workflowsWithNJumpers <span class="dv">1</span> system</span>
<span id="cb25-21"><a href="#cb25-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-22"><a href="#cb25-22" aria-hidden="true" tabindex="-1"></a> workflowsWithNJumpers n sys <span class="ot">=</span></span>
<span id="cb25-23"><a href="#cb25-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (cfg, nLookup, _) <span class="ot">=</span> cfGraph sys</span>
<span id="cb25-24"><a href="#cb25-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="fu">map</span> (<span class="fu">fst</span> <span class="op">.</span> nLookup <span class="op">.</span> <span class="fu">fst</span>)</span>
<span id="cb25-25"><a href="#cb25-25" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">filter</span> (\(_, d) <span class="ot">-></span> d <span class="op">==</span> n)</span>
<span id="cb25-26"><a href="#cb25-26" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Array.assocs</span>
<span id="cb25-27"><a href="#cb25-27" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Graph.indegree</span>
<span id="cb25-28"><a href="#cb25-28" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> cfg</span></code></pre></div>
<p><code>removeJumps</code> does two things: first, it finds blocks with only one jumper, and inlines their statements to the jump location. Then it finds blocks to which there are no jumps, and removes them entirely from the program. It uses the <code>workflowsWithNJumpers</code> helper function that uses the control-flow graph of the system to find all workflows to which there are <code>n</code> number of jumps, where <code>n</code> is provided as an input to the function. Note the usage of the <code>toposortWorkflows</code> function here, which makes sure that we remove the blocks in topological order, accumulating as many statements as possible in the final program.</p>
<p>With these functions in place, we write the <code>optimize</code> function:</p>
<div class="sourceCode" id="cb26" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">optimize ::</span> <span class="dt">System</span> <span class="ot">-></span> <span class="dt">System</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a>optimize <span class="ot">=</span></span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a> applyTillUnchanged</span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a> (removeJumps <span class="op">.</span> inlineRedundantJumps <span class="op">.</span> simplifyWorkflows)</span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a><span class="ot"> applyTillUnchanged ::</span> (<span class="dt">Eq</span> a) <span class="ot">=></span> (a <span class="ot">-></span> a) <span class="ot">-></span> a <span class="ot">-></span> a</span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a> applyTillUnchanged f <span class="ot">=</span></span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a> fix (\recurse x <span class="ot">-></span> <span class="kw">if</span> f x <span class="op">==</span> x <span class="kw">then</span> x <span class="kw">else</span> recurse (f x))</span></code></pre></div>
<p>We execute the three optimization functions repeatedly till a <a href="https://en.wikipedia.org/wiki/Fixed-point_combinator" target="_blank" rel="noopener">fixed point</a> is reached for the resultant <code class="sourceCode haskell"><span class="dt">System</span></code>, that is, till there are no further possibilities of optimization.</p>
<p>Finally, we change our <code>main</code> function to apply the optimizations:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a> file <span class="ot"><-</span> <span class="fu">head</span> <span class="op"><$></span> getArgs</span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> <span class="fu">readFile</span> file</span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> parse inputParser code <span class="kw">of</span></span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> (<span class="dt">Input</span> system parts) <span class="ot">-></span></span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="op">.</span> toC <span class="op">$</span> <span class="dt">Input</span> (optimize system) parts</span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="fu">error</span> err</span></code></pre></div>
<p>Compiling the optimized compiler and running it as earlier, generates this C code for the <code>runSystem</code> function now:</p>
<div class="sourceCode" id="cb28" data-lang="c"><pre class="sourceCode numberSource c"><code class="sourceCode c"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="dt">bool</span> runSystem<span class="op">(</span><span class="dt">int</span> x<span class="op">,</span> <span class="dt">int</span> m<span class="op">,</span> <span class="dt">int</span> a<span class="op">,</span> <span class="dt">int</span> s<span class="op">)</span> <span class="op">{</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">goto</span> in<span class="op">;</span></span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a>in<span class="op">:</span></span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op"><</span> <span class="dv">1351</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> px<span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op">></span> <span class="dv">2770</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-6"><a href="#cb28-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op"><</span> <span class="dv">1801</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> hdj<span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-7"><a href="#cb28-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb28-8"><a href="#cb28-8" aria-hidden="true" tabindex="-1"></a>px<span class="op">:</span></span>
<span id="cb28-9"><a href="#cb28-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>a <span class="op"><</span> <span class="dv">2006</span><span class="op">)</span> <span class="op">{</span> <span class="cf">goto</span> qkq<span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-10"><a href="#cb28-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">2090</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-11"><a href="#cb28-11" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>s <span class="op"><</span> <span class="dv">537</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-12"><a href="#cb28-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op">></span> <span class="dv">2440</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-13"><a href="#cb28-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb28-14"><a href="#cb28-14" aria-hidden="true" tabindex="-1"></a>qkq<span class="op">:</span></span>
<span id="cb28-15"><a href="#cb28-15" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op"><</span> <span class="dv">1416</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-16"><a href="#cb28-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>x <span class="op">></span> <span class="dv">2662</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-17"><a href="#cb28-17" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb28-18"><a href="#cb28-18" aria-hidden="true" tabindex="-1"></a>hdj<span class="op">:</span></span>
<span id="cb28-19"><a href="#cb28-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>m <span class="op">></span> <span class="dv">838</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-20"><a href="#cb28-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> <span class="op">(</span>a <span class="op">></span> <span class="dv">1716</span><span class="op">)</span> <span class="op">{</span> <span class="cf">return</span> <span class="kw">false</span><span class="op">;</span> <span class="op">}</span></span>
<span id="cb28-21"><a href="#cb28-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb28-22"><a href="#cb28-22" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>It works well<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. We now have 1.7x fewer lines of code as compared to before<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<h2 data-track-content data-content-name="the-conclusion" data-content-piece="compiling-aoc23-aplenty" id="the-conclusion">The Conclusion</h2>
<p>This was another attempt to solve Advent of Code problems in somewhat unusual ways. This year we learned some basics of compilation. Swing by next year for more weird ways to solve simple problems.</p>
<p>The full code for this post is available <a href="https://abhinavsarkar.net/code/aplenty.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>I love how I have to write XMAS horizontally and vertically a couple of time.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>I’m sure many more optimizations are possible yet. After all, this program is essentially a decision tree.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>For the actual problem input with 522 blocks, the optimizations reduce the LoC by 1.5x.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<strong>“Aplenty” by Compiling</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2024-04-07T00:00:00Z <p>Every year I try to solve some problems from the <a href="https://adventofcode.com" target="_blank" rel="noopener">Advent of Code</a> (AoC) competition in a <a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/">not</a> <a href="https://abhinavsarkar.net/notes/2022-type-level-rps">straightforward</a> <a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/">way</a>. Let’s solve the part one of the day 19 problem <a href="https://adventofcode.com/2023/day/19" target="_blank" rel="noopener">Aplenty</a> by compiling the problem input to an executable file.</p>
https://abhinavsarkar.net/posts/implementing-co-4/ Implementing Co, a Small Language With Coroutines #4: Adding Channels 2023-06-03T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">previous post</a>, we added coroutines to <span class="fancy">Co</span>, the small language we are implementing in this series of posts. In this post, we add channels to it to be able to communicate between coroutines.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<strong>Adding Channels</strong> 👈
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#channel-design">Channel Design</a></li><li><a href="#channel-operations">Channel Operations</a></li><li><a href="#adding-channels">Adding Channels</a></li><li><a href="#wiring-channels">Wiring Channels</a></li><li><a href="#sending-and-receiving">Sending and Receiving</a></li><li><a href="#pubsub-using-channels">Pubsub using Channels</a></li><li><a href="#bonus-round-emulating-actors">Bonus Round: Emulating Actors</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="implementing-co-4" id="introduction">Introduction</h2>
<p>With coroutines, we can now have multiple <em>Threads of Computation</em> (ToCs) in a <span class="fancy">Co</span> program. However, right now these <abbr title="Threads of computation">ToCs</abbr> work completely independent of each other. Often in such concurrent systems, we need to communicate between these <abbr title="Threads of computation">ToCs</abbr>, for example, one coroutine may produce some data that other coroutines may need to consume. Or, one coroutine may need to wait for some other coroutine to complete some task before it can proceed. For that, we need <a href="https://en.wikipedia.org/wiki/Synchronization_(computer_science)#Thread_or_process_synchronization" target="_blank" rel="noopener">Synchonization</a> between coroutines.</p>
<p>There are various ways to synchronize <abbr title="Threads of computation">ToCs</abbr>: <a href="https://en.wikipedia.org/wiki/Lock_(computer_science)" target="_blank" rel="noopener">Locks</a>, <a href="https://en.wikipedia.org/wiki/Semaphore_(programming)" target="_blank" rel="noopener">Semaphores</a>, <a href="https://en.wikipedia.org/wiki/Futures_and_promises" target="_blank" rel="noopener">Promises</a>, <a href="https://en.wikipedia.org/wiki/Actor_model" target="_blank" rel="noopener">Actors</a>, <a href="https://en.wikipedia.org/wiki/Channel_(programming)" target="_blank" rel="noopener">Channels</a>, <a href="https://en.wikipedia.org/wiki/Software_transactional_memory" target="_blank" rel="noopener">Software Transactional Memory</a>, etc. In particular, channels are generally used with coroutines for synchronization in many languages like <a href="https://gobyexample.com/channels" target="_blank" rel="noopener">Go</a>, <a href="https://kotlinlang.org/docs/coroutines-and-channels.html" target="_blank" rel="noopener">Kotlin</a>, <a href="https://docs.python.org/3/library/asyncio-queue.html" target="_blank" rel="noopener">Python</a> etc, and we are going to do the same.</p>
<p><a href="https://en.wikipedia.org/wiki/Channel_(programming)" target="_blank" rel="noopener"><em>Channels</em></a> are a synchronization primitive based on <a href="https://en.wikipedia.org/wiki/Communicating_Sequential_Processes" target="_blank" rel="noopener">Communicating Sequential Processes</a><sup><a href="#ref-Hoare1986-ih" class="citation" title="Hoare, Communicating Sequential Processes.
">@1</a></sup> (CSP). <abbr title="Communicating Sequential Processes">CSP</abbr> is a formal language for describing patterns of interaction between concurrent processes. In <abbr title="Communicating Sequential Processes">CSP</abbr>, processes communicate with each other by sending and receiving messages over channels.</p>
<p>A process can send a message to a channel only if the channel is not full, and blocks otherwise. Similarly, a process can receive a message from a channel only if the channel is not empty, blocking otherwise. Thus, channels provide a way for processes to synchronize with each other, and at the same time, communicate by passing messages.</p>
<p>Before we implement channels, we have to decide how they are going to work.</p>
<h2 data-track-content data-content-name="channel-design" data-content-piece="implementing-co-4" id="channel-design">Channel Design</h2>
<p>There are various design decisions that we need to make while implementing channels. Depending on what we choose, we end up with different kinds. Some of the major design decisions are:</p>
<dl>
<dt><em>Buffered</em> vs <em>Unbuffered</em></dt>
<dd>
<p>A buffered channel has a buffer to store messages. A send operation on a buffered channel succeeds if the buffer is not full, even if there are no pending receive operations. On the other hand, a send operation on an unbuffered channel blocks until the message is received by some other process. For example, in Java <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/LinkedBlockingQueue.html" target="_blank" rel="noopener"><code>LinkedBlockingQueue</code></a> is a buffered channel, while <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/SynchronousQueue.html" target="_blank" rel="noopener"><code>SynchronousQueue</code></a> is an unbuffered channel<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
</dd>
<dt><em>Bounded</em> vs <em>Unbounded</em></dt>
<dd>
<p>A bounded channel has a buffer of fixed capacity, and can hold only a fixed number of messages at maximum. A send operation on a bounded channel blocks if the buffer is full and there are no pending receive operations. An unbounded channel has a buffer with no fixed capacity, and can hold any number of messages. A send operation on an unbounded channel never blocks. For example, in Java <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/ArrayBlockingQueue.html" target="_blank" rel="noopener"><code>ArrayBlockingQueue</code></a> is a bounded channel, while <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/LinkedBlockingQueue.html" target="_blank" rel="noopener"><code>LinkedBlockingQueue</code></a> is an unbounded one.</p>
</dd>
<dt><em>Synchronous</em> vs <em>Asynchronous</em></dt>
<dd>
<p>A synchronous channel blocks on send until the message is received by some other process, even if the channel has an unbounded buffer. An asynchronous channel does not block on send if the channel’s buffer has space. For example, in Java <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/LinkedTransferQueue.html" target="_blank" rel="noopener"><code>LinkedTransferQueue</code></a> is a synchronous channel, while <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/ArrayBlockingQueue.html" target="_blank" rel="noopener"><code>ArrayBlockingQueue</code></a> is an asynchronous channel.</p>
</dd>
<dt><em>Blocking</em> vs <em>Non-blocking</em></dt>
<dd>
<p>A blocking channel blocks on send if the channel’s buffer is full, or on receive if it is empty. A non-blocking channel never blocks on send or receive, and instead returns a sentinel value (usually the <a href="https://en.wikipedia.org/wiki/Null_pointer" target="_blank" rel="noopener">Null</a> value), or throws an error to indicate that the operation could not be executed. For example, in Java <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/BlockingQueue.html#put(E)" target="_blank" rel="noopener"><code>BlockingQueue.put</code></a> is a blocking send operation, while <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/BlockingQueue.html#offer(E)" target="_blank" rel="noopener"><code>BlockingQueue.offer</code></a> is a non-blocking send operation.</p>
</dd>
<dt><em>Fair</em> vs <em>Unfair</em></dt>
<dd>
<p>A fair channel ensures that the order of sends and receives is preserved. That means, if there are multiple pending sends and receives, they are executed in the order they were requested. An unfair channel does not guarantee any order. For example, in Java, <code>ArrayBlockingQueue</code> supports <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/ArrayBlockingQueue.html#%3Cinit%3E(int,boolean)" target="_blank" rel="noopener">fair and unfair modes</a> by passing a boolean flag to its constructor.</p>
</dd>
<dt><em>Locking</em> vs <em>Lock-free</em></dt>
<dd>
<p>A locking channel uses locks to synchronize access to the channel. A lock-free channel uses atomic operations for the same. For example, in Java <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/LinkedBlockingQueue.html" target="_blank" rel="noopener"><code>LinkedBlockingQueue</code></a> is a locking channel, while <a href="https://docs.oracle.com/en/java/javase/20/docs/api/java.base/java/util/concurrent/ConcurrentLinkedQueue.html" target="_blank" rel="noopener"><code>ConcurrentLinkedQueue</code></a> is a lock-free channel.</p>
</dd>
<dt><em>Selectable</em> vs <em>Non-selectable</em></dt>
<dd>
<p>A selectable channel can be used in a <a href="https://en.wikipedia.org/wiki/Select_(Unix)" target="_blank" rel="noopener"><em>Select</em></a> like operation to wait for a message on multiple channels at once. A non-selectable channel cannot be used in such an operation. For example, channels in <a href="https://gobyexample.com/channels" target="_blank" rel="noopener">Go</a> and <a href="https://github.com/clojure/core.async" target="_blank" rel="noopener">Clojure core.async</a> are selectable, while aforementioned channels in Java are not.</p>
</dd>
</dl>
<p>In our implementation for <span class="fancy">Co</span>, we have both buffered and unbuffered channels. The buffered channels are bounded, with a fixed capacity. The channels are asynchronous, blocking, fair, lock-free, and non-selectable.</p>
<p>Enough of theory, let’s see how channels work in <span class="fancy">Co</span>.</p>
<h2 data-track-content data-content-name="channel-operations" data-content-piece="implementing-co-4" id="channel-operations">Channel Operations</h2>
<p>In this section, we explore the various scenarios for send and receive operations on a channel in <span class="fancy">Co</span> using diagrams. These diagrams are for buffered channels. For unbuffered channels, the send operation acts as for a fully buffered channel, and the receive operation acts as for an empty buffered channel.</p>
<p>Each channel has three internal queues: a send queue, a receive queue, and a buffer<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. The send and receive queues are used to store pending send and receive operations (as coroutines) respectively. The buffer is used to store data of the messages. The send and receive queues are always bounded, because otherwise any number of send and receive operations can be blocked on a channel, thus defeating the point of bounded buffer. In extreme cases, it can cause the program to run out of memory.</p>
<p>The invariants we must maintain for the channel operations are:</p>
<ol type="1">
<li>There can never be pending send operations while there are pending receive operations, and vice versa. This is because a send operation will complete immediately if there are pending receive operations, and vice versa.</li>
<li>There can never be pending receive operations while there are messages in the buffer. This is because a receive operation will complete immediately by dequeuing the oldest message in the buffer.</li>
<li>There can never be pending send operations while there is room in the buffer. This is because a send operation will complete immediately by enqueuing the message in the buffer.</li>
</ol>
<p>With these invariants in mind, let’s look at the different scenarios in detail:</p>
<ul>
<li>When a program tries to receive from a channel, and the channel has nothing in its buffer and there are no pending sends, the program blocks. The programs’s continuation is captured as a coroutine, and is enqueued to the receive queue. Note that the coroutine is not queued into the interpreter’s global coroutine queue.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 873 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.7196261682242993" data-src="/images/implementing-co-4/receive1.svg" alt="Receive when no pending sends and buffer empty"></img>
<noscript><img src="/images/implementing-co-4/receive1.svg" class="w-100pct nolink extra-width" alt="Receive when no pending sends and buffer empty"></img></noscript>
<figcaption>Receive when no pending sends and buffer empty</figcaption>
</figure>
<ul>
<li>The corresponding scenario for a send operation is when the channel has pending receives. In this case, the send operation completes immediately, and the first coroutine in the receive queue is dequeued and resumed with the message.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 857 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.6697819314641746" data-src="/images/implementing-co-4/send1.svg" alt="Send when pending receives"></img>
<noscript><img src="/images/implementing-co-4/send1.svg" class="w-100pct nolink extra-width" alt="Send when pending receives"></img></noscript>
<figcaption>Send when pending receives</figcaption>
</figure>
<ul>
<li>When there are no pending receives and the buffer is not full, the message is enqueued to the buffer, and the send operation completes immediately.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 841 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.61993769470405" data-src="/images/implementing-co-4/send2.svg" alt="Send when no pending receives and buffer not full"></img>
<noscript><img src="/images/implementing-co-4/send2.svg" class="w-100pct nolink extra-width" alt="Send when no pending receives and buffer not full"></img></noscript>
<figcaption>Send when no pending receives and buffer not full</figcaption>
</figure>
<ul>
<li>In the corresponding scenario for a receive operation, when there are no pending sends, and there are messages in the buffer, the oldest message is dequeued, and the receive operation completes immediately with it.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 873 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.7196261682242993" data-src="/images/implementing-co-4/receive2.svg" alt="Receive when no pending sends and buffer not empty"></img>
<noscript><img src="/images/implementing-co-4/receive2.svg" class="w-100pct nolink extra-width" alt="Receive when no pending sends and buffer not empty"></img></noscript>
<figcaption>Receive when no pending sends and buffer not empty</figcaption>
</figure>
<ul>
<li>When the buffer is full, the program trying to do a send operation is blocked and its continuation is captured as a coroutine and queued into the send queue. Note that the coroutine is not queued into the interpreter’s global coroutine queue.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 841 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.61993769470405" data-src="/images/implementing-co-4/send3.svg" alt="Send when buffer full"></img>
<noscript><img src="/images/implementing-co-4/send3.svg" class="w-100pct nolink extra-width" alt="Send when buffer full"></img></noscript>
<figcaption>Send when buffer full</figcaption>
</figure>
<ul>
<li>In the corresponding scenario for a receive operation, when the buffer is full, the oldest message is dequeued from the buffer, and the receive operation completes immediately with it. If there are pending sends, the oldest coroutine in the send queue is dequeued and resumed, and its message is enqueued to the buffer.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 1025 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 3.1931464174454827" data-src="/images/implementing-co-4/receive3.svg" alt="Receive when pending sends and buffer full"></img>
<noscript><img src="/images/implementing-co-4/receive3.svg" class="w-100pct nolink extra-width" alt="Receive when pending sends and buffer full"></img></noscript>
<figcaption>Receive when pending sends and buffer full</figcaption>
</figure>
<ul>
<li>When the send queue is full and the buffer is full as well, an error is thrown when trying to do a send operation.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 849 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.6448598130841123" data-src="/images/implementing-co-4/send4.svg" alt="Send when send queue and buffer full"></img>
<noscript><img src="/images/implementing-co-4/send4.svg" class="w-100pct nolink extra-width" alt="Send when send queue and buffer full"></img></noscript>
<figcaption>Send when send queue and buffer full</figcaption>
</figure>
<ul>
<li>Similarly, when the receive queue is full and the buffer is empty, an error is thrown when a receive operation is attempted.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 865 321'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 2.694704049844237" data-src="/images/implementing-co-4/receive4.svg" alt="Receive when receive queue full and buffer empty"></img>
<noscript><img src="/images/implementing-co-4/receive4.svg" class="w-100pct nolink extra-width" alt="Receive when receive queue full and buffer empty"></img></noscript>
<figcaption>Receive when receive queue full and buffer empty</figcaption>
</figure>
<p>That captures all scenarios for send and receive operations on a channel. In the next section, we implement channels in <span class="fancy">Co</span>.</p>
<h2 data-track-content data-content-name="adding-channels" data-content-piece="implementing-co-4" id="adding-channels">Adding Channels</h2>
<p>Let’s start with defining the <code>Channel</code> type:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Channel</span> <span class="ot">=</span> <span class="dt">Channel</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> channelCapacity ::</span> <span class="dt">Int</span>,</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> channelBuffer ::</span> <span class="dt">Queue</span> <span class="dt">Value</span>,</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> channelSendQueue ::</span> <span class="dt">Queue</span> (<span class="dt">Coroutine</span> (), <span class="dt">Value</span>),</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="ot"> channelReceiveQueue ::</span> <span class="dt">Queue</span> (<span class="dt">Coroutine</span> <span class="dt">Value</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="ot">newChannel ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Channel</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>newChannel size <span class="ot">=</span> <span class="dt">Channel</span> size <span class="op"><$></span> newQueue <span class="op"><*></span> newQueue <span class="op"><*></span> newQueue</span></code></pre></div>
<p>A channel has a buffer, a send queue, and a receive queue. The buffer is a queue of <span class="fancy">Co</span> values, the receive queue is a queue of coroutines, and the send queue is a queue of coroutine and value pairs. A channel also has a capacity, which is the capacity of the buffer<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<p>Now, we add <code>Channel</code> to the <code>Value</code> type:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="8-8"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Value</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Null</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Boolean</span> <span class="dt">Bool</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Str</span> <span class="dt">String</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Num</span> <span class="dt">Integer</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Function</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>] <span class="dt">Env</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BuiltinFunction</span> <span class="dt">Identifier</span> <span class="dt">Int</span> ([<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span>)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">Chan</span> <span class="dt">Channel</span></span></span></code></pre></div>
<p>Finally, we introduce some new built-in functions to create channels:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="4-7" data-deemphasize="8-10"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">builtinEnv ::</span> <span class="dt">IO</span> <span class="dt">Env</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>builtinEnv <span class="ot">=</span> Map.fromList <span class="op"><$></span> <span class="fu">traverse</span> (<span class="fu">traverse</span> newIORef) [</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> (<span class="st">"print"</span>, <span class="dt">BuiltinFunction</span> <span class="st">"print"</span> <span class="dv">1</span> executePrint)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> , (<span class="st">"newChannel"</span>,</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"newChannel"</span> <span class="dv">0</span> <span class="op">$</span> <span class="fu">fmap</span> <span class="dt">Chan</span> <span class="op">.</span> <span class="fu">const</span> (newChannel <span class="dv">0</span>))</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> , (<span class="st">"newBufferedChannel"</span>,</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"newBufferedChannel"</span> <span class="dv">1</span> executeNewBufferedChannel)</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"sleep"</span>, <span class="dt">BuiltinFunction</span> <span class="st">"sleep"</span> <span class="dv">1</span> executeSleep)</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"getCurrentMillis"</span>,</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"getCurrentMillis"</span> <span class="dv">0</span> executeGetCurrentMillis)</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>The <code>newChannel</code> function creates an unbuffered channel, and the <code>newBufferedChannel</code> function creates a buffered channel with the given capacity:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">executeNewBufferedChannel ::</span> [<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>executeNewBufferedChannel argEs <span class="ot">=</span> evaluate (<span class="fu">head</span> argEs) <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> capacity <span class="op">|</span> capacity <span class="op">>=</span> <span class="dv">0</span> <span class="ot">-></span> <span class="dt">Chan</span> <span class="op"><$></span> newChannel (<span class="fu">fromIntegral</span> capacity)</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throw <span class="st">"newBufferedChannel call expected a positive number argument"</span></span></code></pre></div>
<h2 data-track-content data-content-name="wiring-channels" data-content-piece="implementing-co-4" id="wiring-channels">Wiring Channels</h2>
<p>Moving on to wiring the channels into the existing interpreter implementation. First we add a new constructor for send statements to the <code>Stmt</code> type:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="11-11"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Stmt</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">ExprStmt</span> <span class="dt">Expr</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">VarStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">AssignStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">IfStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">WhileStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">FunctionStmt</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ReturnStmt</span> (<span class="dt">Maybe</span> <span class="dt">Expr</span>)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">YieldStmt</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">SpawnStmt</span> <span class="dt">Expr</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">SendStmt</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Program</span> <span class="ot">=</span> [<span class="dt">Stmt</span>]</span></code></pre></div>
<p>And another for receive expressions to the <code>Expr</code> type:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="10-10"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">LNull</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LBool</span> <span class="dt">Bool</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LStr</span> <span class="dt">String</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LNum</span> <span class="dt">Integer</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Variable</span> <span class="dt">Identifier</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Binary</span> <span class="dt">BinOp</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Call</span> <span class="dt">Expr</span> [<span class="dt">Expr</span>]</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Lambda</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">Receive</span> <span class="dt">Expr</span></span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Identifier</span> <span class="ot">=</span> <span class="dt">String</span></span></code></pre></div>
<p>We have already written the code to parse these statements and expressions in the <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">first post</a>, so that’s taken care of. We need to modify the <code>execute</code> and <code>evaluate</code> functions to handle these new statements and expressions. Let’s start with <code>execute</code>:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="23-27"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">execute ::</span> <span class="dt">Stmt</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>execute <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ExprStmt</span> expr <span class="ot">-></span> void <span class="op">$</span> evaluate expr</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">VarStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> defineVar name</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">AssignStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> assignVar name</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">IfStmt</span> expr body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> while<span class="op">@</span>(<span class="dt">WhileStmt</span> expr body) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> execute while</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">ReturnStmt</span> mExpr <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> mRet <span class="ot"><-</span> <span class="fu">traverse</span> evaluate mExpr</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> throwError <span class="op">.</span> <span class="dt">Return</span> <span class="op">.</span> fromMaybe <span class="dt">Null</span> <span class="op">$</span> mRet</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">FunctionStmt</span> name params body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> defineVar name <span class="op">$</span> <span class="dt">Function</span> name params body env</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">YieldStmt</span> <span class="ot">-></span> yield</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">SpawnStmt</span> expr <span class="ot">-></span> spawn expr</span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">SendStmt</span> expr chan <span class="ot">-></span> evaluate chan <span class="op">>>=</span> \<span class="kw">case</span></span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">Chan</span> channel <span class="ot">-></span> <span class="kw">do</span></span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> val <span class="ot"><-</span> evaluate expr</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> channelSend val channel</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> v <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot send to a non-channel: "</span> <span class="op"><></span> <span class="fu">show</span> v</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> isTruthy <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> <span class="dt">Null</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> <span class="dt">Boolean</span> b <span class="ot">-></span> b</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">True</span></span></code></pre></div>
<p>To execute a <code>SendStmt</code>, we evaluate its arguments to get the channel and the value to send. Then we call the <code>channelSend</code> function to send the value over the channel.</p>
<p>Similarly, to evaluate a <code>Receive</code> expression, we evaluate its argument to get the channel, and then call the <code>channelReceive</code> function to receive a value from the channel:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="11-13"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evaluate ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>evaluate <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">LNull</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Null</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">LBool</span> bool <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> bool</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">LStr</span> str <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Str</span> str</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">LNum</span> num <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> num</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Variable</span> v <span class="ot">-></span> lookupVar v</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Lambda</span> params body <span class="ot">-></span> <span class="dt">Function</span> <span class="st">"<lambda>"</span> params body <span class="op"><$></span> State.gets isEnv</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> binary<span class="op">@</span><span class="dt">Binary</span> {} <span class="ot">-></span> evaluateBinaryOp binary</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> call<span class="op">@</span><span class="dt">Call</span> {} <span class="ot">-></span> evaluateFuncCall call</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">Receive</span> expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> \<span class="kw">case</span></span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">Chan</span> channel <span class="ot">-></span> channelReceive channel</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> val <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot receive from a non-channel: "</span> <span class="op"><></span> <span class="fu">show</span> val</span></span></code></pre></div>
<p>Now comes the core of the implementation: the <code>channelSend</code> and <code>channelReceive</code> functions. Let’s look into them in detail.</p>
<h2 data-track-content data-content-name="sending-and-receiving" data-content-piece="implementing-co-4" id="sending-and-receiving">Sending and Receiving</h2>
<p>The <code>channelSend</code> function takes a value and a channel, and sends the value over the channel, blocking if necessary.</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">channelSend ::</span> <span class="dt">Value</span> <span class="ot">-></span> <span class="dt">Channel</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>channelSend value <span class="dt">Channel</span> {<span class="op">..</span>} <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> bufferSize <span class="ot"><-</span> queueSize channelBuffer</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> sendQueueSize <span class="ot"><-</span> queueSize channelSendQueue</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> dequeue channelReceiveQueue <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> <span class="co">-- there are pending receives</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> coroutine<span class="op">@</span><span class="dt">Coroutine</span> {<span class="op">..</span>} <span class="ot">-></span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> scheduleCoroutine <span class="op">$</span> coroutine { corCont <span class="ot">=</span> <span class="fu">const</span> <span class="op">$</span> corCont value }</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="co">-- there are no pending receives and the buffer is not full</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="op">|</span> channelCapacity <span class="op">></span> <span class="dv">0</span> <span class="op">&&</span> bufferSize <span class="op"><</span> channelCapacity <span class="ot">-></span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> enqueue value channelBuffer</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> <span class="co">-- there are no pending receives and</span></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> <span class="co">-- (the buffer is full or the channel is unbuffered)</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="op">|</span> sendQueueSize <span class="op"><</span> maxSendQueueSize <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> callCC <span class="op">$</span> \cont <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> coroutine <span class="ot"><-</span> newCoroutine env cont</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> enqueue (coroutine, value) channelSendQueue</span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a> runNextCoroutine</span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the send queue is full</span></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> throw <span class="st">"Channel send queue is full"</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a> maxSendQueueSize <span class="ot">=</span> <span class="dv">4</span></span></code></pre></div>
<p>This is a direct implementation of the algorithm we discussed earlier using diagrams. We try to dequeue a coroutine from the receive queue. Then:</p>
<ul>
<li>If there is a coroutine, we schedule it to be run with the sent value. The send call <strong>does not block</strong>.</li>
<li>If there is no coroutine, and
<ul>
<li>the channel is buffered and the buffer is not full, we enqueue the sent value to the buffer. The send call <strong>does not block</strong>.</li>
<li>the buffer is full, we create a new coroutine with the current continuation, and enqueue the coroutine and the value to the send queue. The send call <strong>blocks</strong>.</li>
</ul></li>
<li>If the send queue is full, we throw an error.</li>
</ul>
<p>Next, let’s write the <code>channelReceive</code> function:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">channelReceive ::</span> <span class="dt">Channel</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>channelReceive <span class="dt">Channel</span> {<span class="op">..</span>} <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> mSend <span class="ot"><-</span> dequeue channelSendQueue</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> mBufferedValue <span class="ot"><-</span> dequeue channelBuffer</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> recieveQueueSize <span class="ot"><-</span> queueSize channelReceiveQueue</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> (mSend, mBufferedValue) <span class="kw">of</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the channel is unbuffered and there are pending sends</span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Just</span> (sendCoroutine, sendValue), <span class="dt">Nothing</span>) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> scheduleCoroutine sendCoroutine</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> sendValue</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the buffer is full and there are pending sends</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Just</span> (sendCoroutine, sendValue), <span class="dt">Just</span> bufferedValue) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> scheduleCoroutine sendCoroutine</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> enqueue sendValue channelBuffer</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> bufferedValue</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the buffer is empty and there are no pending sends</span></span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Nothing</span>, <span class="dt">Nothing</span>) <span class="op">|</span> recieveQueueSize <span class="op"><</span> maxReceiveQueueSize <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a> callCC <span class="op">$</span> \receive <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a> coroutine <span class="ot"><-</span> newCoroutine env receive</span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a> enqueue coroutine channelReceiveQueue</span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a> runNextCoroutine</span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="dt">Null</span></span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the receive queue is full</span></span>
<span id="cb4-29"><a href="#cb4-29" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Nothing</span>, <span class="dt">Nothing</span>) <span class="ot">-></span> throw <span class="st">"Channel receive queue is full"</span></span>
<span id="cb4-30"><a href="#cb4-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-31"><a href="#cb4-31" aria-hidden="true" tabindex="-1"></a> <span class="co">-- the buffer is not empty and there are no pending sends</span></span>
<span id="cb4-32"><a href="#cb4-32" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Nothing</span>, <span class="dt">Just</span> bufferedValue) <span class="ot">-></span> <span class="fu">return</span> bufferedValue</span>
<span id="cb4-33"><a href="#cb4-33" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb4-34"><a href="#cb4-34" aria-hidden="true" tabindex="-1"></a> maxReceiveQueueSize <span class="ot">=</span> <span class="dv">4</span></span></code></pre></div>
<p>This is also a straightforward implementation of the algorithm. We try to dequeue a coroutine and its value from the send queue, and another value from the buffer. Then:</p>
<ul>
<li>If there is a coroutine,
<ul>
<li>but no buffered value, we schedule the coroutine to be resumed, and return its value. The returned value becomes the value that is received from the channel. The receive call <strong>does not block</strong>.</li>
<li>and a buffered value, we schedule the coroutine to be resumed, enqueue its value to the buffer, and return the buffered value. The receive call <strong>does not block</strong>.</li>
</ul></li>
<li>If there is no coroutine and no buffered value, and the receive queue is not full, we create a new coroutine with the current continuation, and enqueue it to the receive queue. The receive call <strong>blocks</strong>.</li>
<li>If the receive queue is full, we throw an error.</li>
</ul>
<p>We hardcode the capacity of the send and receive queues to 4.</p>
<p>That’s it for the implementation of channels. Since we broke down the scenarios for send and receive operations, the implementation is not complicated. Let’s see it in action next.</p>
<h2 data-track-content data-content-name="pubsub-using-channels" data-content-piece="implementing-co-4" id="pubsub-using-channels">Pubsub using Channels</h2>
<p>In this demo, we implement a pubsub system using channels. The pubsub system consists of a server and a set of workers. The server sends messages to the workers over a channel. The workers print the messages and send acks back to the server over another channel. After sending all the messages, the server waits for all the acks from the workers, and then stops the workers.</p>
<p>Diagrammatically, the pubsub system looks like this:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 713 305'%3E%3C/svg%3E" class="lazyload w-100pct nolink" style="--image-aspect-ratio: 2.3377049180327867" data-src="/images/implementing-co-4/pubsub.svg" alt="Pubsub using channels"></img>
<noscript><img src="/images/implementing-co-4/pubsub.svg" class="w-100pct nolink" alt="Pubsub using channels"></img></noscript>
<figcaption>Pubsub using channels</figcaption>
</figure>
<p>The boxes with double borders are <abbr title="Threads of computation">ToCs</abbr>, and the ones with single borders are channels. The arrows show how the <abbr title="Threads of computation">ToCs</abbr> and channels are connected.</p>
<details>
<summary class="print-hide">
Pubsub code
</summary>
<div class="sourceCode" id="cb5" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co">// server sends messages to workers.</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">startServer</span>(messageCount<span class="op">,</span> messageChan) {</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server starting"</span>)<span class="op">;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> messageCount <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server sending: "</span> <span class="op">+</span> i)<span class="op">;</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> i <span class="op">-></span> messageChan<span class="op">;</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server sent: "</span> <span class="op">+</span> i)<span class="op">;</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="co">// workers receive messages over a channel, print them.</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a><span class="co">// and send a ack back to the sender on a channel.</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">worker</span>(name<span class="op">,</span> messageChan<span class="op">,</span> ackChan) {</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"worker "</span> <span class="op">+</span> name <span class="op">+</span> <span class="st">" starting"</span>)<span class="op">;</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> message <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (<span class="kw">true</span>) {</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a> message <span class="op">=</span> <span class="op"><-</span> messageChan<span class="op">;</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"worker "</span> <span class="op">+</span> name <span class="op">+</span> <span class="st">" received: "</span> <span class="op">+</span> message)<span class="op">;</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (message <span class="op">==</span> <span class="kw">null</span>) {</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"worker "</span> <span class="op">+</span> name <span class="op">+</span> <span class="st">" stopped"</span>)<span class="op">;</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"worker "</span> <span class="op">+</span> name <span class="op">+</span> <span class="st">" sending: "</span> <span class="op">+</span> message)<span class="op">;</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a> message <span class="op">-></span> ackChan<span class="op">;</span></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"worker "</span> <span class="op">+</span> name <span class="op">+</span> <span class="st">" sent: "</span> <span class="op">+</span> message)<span class="op">;</span></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a><span class="co">// start workers.</span></span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">startWorkers</span>(workerCount<span class="op">,</span> messageChan<span class="op">,</span> ackChan) {</span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"workers starting"</span>)<span class="op">;</span></span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> workerCount <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span>(name) {</span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">worker</span>(name<span class="op">,</span> messageChan<span class="op">,</span> ackChan)<span class="op">;</span></span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a> }(i)<span class="op">;</span></span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"workers scheduled to be started"</span>)<span class="op">;</span></span>
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a><span class="co">// server waits for acks from workers.</span></span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">waitForWorkers</span>(messageCount<span class="op">,</span> ackChan<span class="op">,</span> doneChan) {</span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server waiting for acks"</span>)<span class="op">;</span></span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> message <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> messageCount <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a> message <span class="op">=</span> <span class="op"><-</span> ackChan<span class="op">;</span></span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server received: "</span> <span class="op">+</span> message)<span class="op">;</span></span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"server received all acks"</span>)<span class="op">;</span></span>
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a> <span class="kw">null</span> <span class="op">-></span> doneChan<span class="op">;</span></span>
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a><span class="co">// stop workers.</span></span>
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">stopWorkers</span>(workerCount<span class="op">,</span> messageChan<span class="op">,</span> doneChan) {</span>
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> done <span class="op">=</span> <span class="op"><-</span> doneChan<span class="op">;</span></span>
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"workers stopping"</span>)<span class="op">;</span></span>
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> workerCount <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a> <span class="kw">null</span> <span class="op">-></span> messageChan<span class="op">;</span></span>
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"workers scheduled to be stopped"</span>)<span class="op">;</span></span>
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> workerCount <span class="op">=</span> <span class="dv">3</span><span class="op">;</span></span>
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> messageCount <span class="op">=</span> <span class="dv">7</span><span class="op">;</span></span>
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> messageBufferSize <span class="op">=</span> <span class="dv">5</span><span class="op">;</span></span>
<span id="cb5-73"><a href="#cb5-73" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> ackBufferSize <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb5-74"><a href="#cb5-74" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> messageChan <span class="op">=</span> <span class="fu">newBufferedChannel</span>(messageBufferSize)<span class="op">;</span></span>
<span id="cb5-75"><a href="#cb5-75" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> ackChan <span class="op">=</span> <span class="fu">newBufferedChannel</span>(ackBufferSize)<span class="op">;</span></span>
<span id="cb5-76"><a href="#cb5-76" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> doneChan <span class="op">=</span> <span class="fu">newChannel</span>()<span class="op">;</span></span>
<span id="cb5-77"><a href="#cb5-77" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-78"><a href="#cb5-78" aria-hidden="true" tabindex="-1"></a><span class="fu">startWorkers</span>(workerCount<span class="op">,</span> messageChan<span class="op">,</span> ackChan)<span class="op">;</span></span>
<span id="cb5-79"><a href="#cb5-79" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">waitForWorkers</span>(messageCount<span class="op">,</span> ackChan<span class="op">,</span> doneChan)<span class="op">;</span></span>
<span id="cb5-80"><a href="#cb5-80" aria-hidden="true" tabindex="-1"></a><span class="fu">startServer</span>(messageCount<span class="op">,</span> messageChan)<span class="op">;</span></span>
<span id="cb5-81"><a href="#cb5-81" aria-hidden="true" tabindex="-1"></a><span class="fu">stopWorkers</span>(workerCount<span class="op">,</span> messageChan<span class="op">,</span> doneChan)<span class="op">;</span></span></code></pre></div>
</details>
<p>Running the program produces this output:</p>
<details>
<summary class="print-hide">
Pubsub output
</summary>
<pre class="plain"><code>workers starting
workers scheduled to be started
server starting
server sending: 1
server sent: 1
server sending: 2
server sent: 2
server sending: 3
server sent: 3
server sending: 4
server sent: 4
server sending: 5
server sent: 5
server sending: 6
worker 1 starting
worker 1 received: 1
worker 1 sending: 1
worker 1 sent: 1
worker 1 received: 2
worker 1 sending: 2
worker 2 starting
worker 2 received: 3
worker 2 sending: 3
worker 3 starting
worker 3 received: 4
worker 3 sending: 4
server waiting for acks
server received: 1
server received: 2
server received: 3
server received: 4
server sent: 6
server sending: 7
server sent: 7
worker 1 sent: 2
worker 1 received: 5
worker 1 sending: 5
worker 1 sent: 5
worker 1 received: 6
worker 1 sending: 6
worker 1 sent: 6
worker 1 received: 7
worker 1 sending: 7
worker 2 sent: 3
worker 3 sent: 4
server received: 5
server received: 6
server received: 7
server received all acks
worker 1 sent: 7
workers stopping
workers scheduled to be stopped
worker 2 received: null
worker 2 stopped
worker 3 received: null
worker 3 stopped
worker 1 received: null
worker 1 stopped</code></pre>
</details>
<p>The output shows how the server and worker coroutines yield control to each other when they are waiting for messages or acks<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<h2 data-track-content data-content-name="bonus-round-emulating-actors" data-content-piece="implementing-co-4" id="bonus-round-emulating-actors">Bonus Round: Emulating Actors</h2>
<p>The <em><a href="https://en.wikipedia.org/wiki/Actor_model" target="_blank" rel="noopener">Actor model</a></em> is a concurrent programming paradigm where computation is carried out by lightweight processes called <em>Actors</em> that can only communicate with each other by sending messages. This makes them ideal for building concurrent and distributed systems.</p>
<p>In this section, we emulate actors in <span class="fancy">Co</span> using channels:</p>
<div class="sourceCode" id="cb7" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">start</span>(<span class="bu">process</span>) {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> inbox <span class="op">=</span> <span class="fu">newChannel</span>()<span class="op">;</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">spawn</span> (<span class="kw">function</span> () {</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> val <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (<span class="kw">true</span>) {</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> val <span class="op">=</span> <span class="op"><-</span> inbox<span class="op">;</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (val <span class="op">==</span> <span class="kw">null</span>) { <span class="cf">return</span><span class="op">;</span> }</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="bu">process</span>(val)<span class="op">;</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> })()<span class="op">;</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">function</span> (message) { message <span class="op">-></span> inbox<span class="op">;</span> }<span class="op">;</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">send</span>(actor<span class="op">,</span> message) { <span class="fu">actor</span>(message)<span class="op">;</span> }</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">stop</span>(actor) { <span class="fu">actor</span>(<span class="kw">null</span>)<span class="op">;</span> }</span></code></pre></div>
<p>Actors are implemented as wrappers around channels. By sending messages to an actor’s channel, we can send messages to the actor. However, we cannot expose the channels directly, so we wrap them in functions.</p>
<p>The <code>start</code> function creates and starts an actor by creating a new channel, and spawning a coroutine that receives messages from the channel in a loop and passes them to the <code>process</code> function taken as a parameter by the <code>start</code> function. Upon receiving a <code>null</code> value, the coroutine returns, which stops the actor.</p>
<p>The <code>start</code> function returns a function to send messages to the actor, which works by sending the messages to the actor’s channel.</p>
<p>The <code>send</code> function is a convenience function to send a message to an actor. The <code>stop</code> function stop an actor by sending it a <code class="sourceCode javascript"><span class="kw">null</span></code> message.</p>
<p>It was easy, wasn’t it? Now let’s use actors in some different ways.</p>
<p>Let’s start with a simple example of an actor that prints the received messages:</p>
<div class="sourceCode" id="cb8" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> printer <span class="op">=</span> <span class="fu">start</span>(print)<span class="op">;</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">send</span>(printer<span class="op">,</span> <span class="st">"world"</span>)<span class="op">;</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(printer<span class="op">,</span> <span class="st">"hello"</span>)<span class="op">;</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="fu">stop</span>(printer)<span class="op">;</span></span></code></pre></div>
<p>The <code>process</code> parameter here is the <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed#print-func"><code>print</code></a> function. Running this program produces the following output:</p>
<pre class="plain"><code>hello
world</code></pre>
<p>Next, let’s write an actor that counts. For that, first we need to create a <em><a href="https://en.wikipedia.org/wiki/2-Tuple" target="_blank" rel="noopener">2-Tuple</a></em> data structure using closures, named <code>Pair</code><a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>:</p>
<div class="sourceCode" id="cb10" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">Pair</span>(first<span class="op">,</span> second) {</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">function</span> (command) {</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"first"</span>) { <span class="cf">return</span> first<span class="op">;</span> }</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"second"</span>) { <span class="cf">return</span> second<span class="op">;</span> }</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> }<span class="op">;</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">first</span>(pair) { <span class="cf">return</span> <span class="fu">pair</span>(<span class="st">"first"</span>)<span class="op">;</span> }</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">second</span>(pair) { <span class="cf">return</span> <span class="fu">pair</span>(<span class="st">"second"</span>)<span class="op">;</span> }</span></code></pre></div>
<p>Now we implement the counter actor:</p>
<div class="sourceCode" id="cb11" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">makeCounter</span>() {</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> value <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">start</span>(<span class="kw">function</span> (message) {</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> command <span class="op">=</span> <span class="fu">first</span>(message)<span class="op">;</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> arg <span class="op">=</span> <span class="fu">second</span>(message)<span class="op">;</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"inc"</span>) { value <span class="op">=</span> value <span class="op">+</span> arg<span class="op">;</span> }</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"get"</span>) { <span class="fu">send</span>(arg<span class="op">,</span> value)<span class="op">;</span> }</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The <code>makeCounter</code> function creates a counter actor. The counter actor is started with a processing function that takes a message as a <code>Pair</code>, extracts the command and the argument from the message, and increments the counter value or sends the counter value back depending on the command. We exercise the counter like this:</p>
<div class="sourceCode" id="cb12" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> printer <span class="op">=</span> <span class="fu">start</span>(print)<span class="op">;</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> counter1 <span class="op">=</span> <span class="fu">makeCounter</span>()<span class="op">;</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter1<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"inc"</span><span class="op">,</span> <span class="dv">1</span>))<span class="op">;</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter1<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"get"</span><span class="op">,</span> printer))<span class="op">;</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter1<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"inc"</span><span class="op">,</span> <span class="dv">2</span>))<span class="op">;</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter1<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"get"</span><span class="op">,</span> printer))<span class="op">;</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a><span class="fu">stop</span>(counter1)<span class="op">;</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> counter2 <span class="op">=</span> <span class="fu">makeCounter</span>()<span class="op">;</span></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter2<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"inc"</span><span class="op">,</span> <span class="dv">5</span>))<span class="op">;</span></span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(counter2<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"get"</span><span class="op">,</span> printer))<span class="op">;</span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="fu">stop</span>(counter2)<span class="op">;</span></span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a><span class="fu">stop</span>(printer)<span class="op">;</span></span></code></pre></div>
<p>The output of the program is:</p>
<pre class="plain"><code>1
3
5</code></pre>
<p>And for the grand finale, let’s reimplement the <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed#cb5-1">ping-pong</a> program using actors:</p>
<div class="sourceCode" id="cb14" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">makePingPonger</span>(name) {</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> self <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span> <span class="fu">pingPong</span>(message) {</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> value <span class="op">=</span> <span class="fu">first</span>(message)<span class="op">;</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> other <span class="op">=</span> <span class="fu">second</span>(message)<span class="op">;</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (value <span class="op">==</span> <span class="st">"done"</span>) {</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">spawn</span> (<span class="kw">function</span> () { <span class="fu">stop</span>(self)<span class="op">;</span> } ())<span class="op">;</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" "</span> <span class="op">+</span> value)<span class="op">;</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (value <span class="op">==</span> <span class="dv">0</span>) {</span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">send</span>(other<span class="op">,</span> <span class="fu">Pair</span>(<span class="st">"done"</span><span class="op">,</span> self))<span class="op">;</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">spawn</span> (<span class="kw">function</span> () { <span class="fu">stop</span>(self)<span class="op">;</span> } ())<span class="op">;</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a> <span class="fu">send</span>(other<span class="op">,</span> <span class="fu">Pair</span>(value <span class="op">-</span> <span class="dv">1</span><span class="op">,</span> self))<span class="op">;</span></span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a> self <span class="op">=</span> <span class="fu">start</span>(pingPong)<span class="op">;</span></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> self<span class="op">;</span></span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>The <code>makePingPonger</code> function creates a ping-ponger actor. The ping-ponger actor is started with a processing function that takes a message as a <code>Pair</code> of the value to print and the other actor to send the next message to. The processing function prints the value, decrements it, and sends it to the other actor. If the value is 0, it sends a <code>done</code> message to the other actor and stops itself. If the value is <code>done</code>, it stops itself.</p>
<p>Upon running it like this:</p>
<div class="sourceCode" id="cb15" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> pinger <span class="op">=</span> <span class="fu">makePingPonger</span>(<span class="st">"ping"</span>)<span class="op">;</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> ponger <span class="op">=</span> <span class="fu">makePingPonger</span>(<span class="st">"pong"</span>)<span class="op">;</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="fu">send</span>(pinger<span class="op">,</span> <span class="fu">Pair</span>(<span class="dv">10</span><span class="op">,</span> ponger))<span class="op">;</span></span></code></pre></div>
<p>It produces the same output as the original ping-pong program:</p>
<pre class="plain"><code>ping 10
pong 9
ping 8
pong 7
ping 6
pong 5
ping 4
pong 3
ping 2
pong 1
ping 0
ping done
pong done</code></pre>
<hr></hr>
<p>In this post, we added channels to <span class="fancy">Co</span>, and used them to create a variety of concurrent programs. We learned about <abbr title="Communicating Sequential Processes">CSP</abbr> and how implement it using coroutines and channels. In the next post, we will add support for <em><a href="https://en.wikipedia.org/wiki/Sleep_(system_call)" target="_blank" rel="noopener">sleep</a></em> to <span class="fancy">Co</span>.</p>
<p>The code for complete <span class="fancy">Co</span> interpreter is available <a href="https://abhinavsarkar.net/code/co-interpreter.html?mtm_campaign=feed">here</a>.</p>
<h2 class="notoc" data-track-content data-content-name="acknowledgements" data-content-piece="implementing-co-4" id="acknowledgements">Acknowledgements</h2>
<p>Many thanks to <a href="https://www.deobald.ca/" target="_blank" rel="noopener">Steven Deobald</a> for reviewing a draft of this article.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
<div id="ref-Hoare1986-ih" class="csl-entry" role="listitem">
Hoare, C A R. <em>Communicating Sequential Processes</em>. Prentice Hall, 1986. <a href="https://doi.org/10.1145/359576.359585" target="_blank" rel="noopener">https://doi.org/10.1145/359576.359585</a>.
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Recently, Java added support for <a href="https://openjdk.org/jeps/444" target="_blank" rel="noopener"><em>Virtual Threads</em></a>, which though are not cooperatively scheduled like coroutines, are scheduled by the JVM, and are very lightweight. With virtual threads, the various Java queues can be considered channels as defined in <abbr title="Communicating Sequential Processes">CSP</abbr>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>The design of channels in <span class="fancy">Co</span> is inspired by the design of channels in <a href="https://cdn.cognitect.com/presentations/2014/insidechannels.pdf" target="_blank" rel="noopener">Clojure core.async</a>. It is a simplified version, not supporting some of the features of core.async, such as <a href="https://clojuredocs.org/clojure.core.async/transduce" target="_blank" rel="noopener">transducers</a>, and <a href="https://clojuredocs.org/clojure.core.async/alt!" target="_blank" rel="noopener">alts</a>.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Recall that the <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed#queue-ds"><code>Queue</code></a> type is an immutable queue data structure wrapped in an <code>IORef</code>, which we manipulate using atomic operations <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html#v:atomicModifyIORef-39-" target="_blank" rel="noopener"><code>atomicModifyIORef'</code></a>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>You can try running the program with different values for the <code>workerCount</code>, <code>messageCount</code>, <code>messageBufferSize</code> and <code>ackBufferSize</code> variables to see how it behaves. You can also try changing the order of the function calls at the end of the program, or prefixing them with <code class="sourceCode"><span class="cf">spawn</span></code> to see how it affects the output. In some cases, the program may deadlock and hang, and in some other cases, it may throw an error. Try to understand why.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>We used the same trick to create a <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed#bonus-round-breadth-first-traversal-without-a-queue">binary tree</a> data structure in the previous post.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<strong>Adding Channels</strong> 👈
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2023-06-03T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-3/">previous post</a>, we added coroutines to <span class="fancy">Co</span>, the small language we are implementing in this series of posts. In this post, we add channels to it to be able to communicate between coroutines.</p>
https://abhinavsarkar.net/posts/implementing-co-3/ Implementing Co, a Small Language With Coroutines #3: Adding Coroutines 2023-02-11T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">previous post</a>, we wrote the interpreter for basic features of <span class="fancy">Co</span>, the small language we are building in this series of posts. In this post, we explore and implement what makes <span class="fancy">Co</span> really interesting: support for lightweight concurrency using Coroutines.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<strong>Adding Coroutines</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<p>In this and next two posts, we add support for the following features to the <span class="fancy">Co</span> interpreter:</p>
<ul>
<li><code class="sourceCode javascript"><span class="kw">yield</span></code> statement to yield the current thread of computation (ToC).</li>
<li><code class="sourceCode"><span class="cf">spawn</span></code> statement to start a new <abbr title="Thread of computation">ToC</abbr>.</li>
<li>First class channels with operators to send and receive values over them.</li>
<li><code class="sourceCode"><span class="cf">sleep</span></code> function to sleep the current <abbr title="Thread of computation">ToC</abbr> for a given number of milliseconds.</li>
</ul>
<p>Let’s <span class="fancy">Co</span>!</p>
<!--balha-->
<nav id="toc"><h3>Contents</h3><ol><li><a href="#coroutines">Coroutines</a></li><li><a href="#coroutines-in-various-languages">Coroutines in Various Languages</a></li><li><a href="#implementing-coroutines">Implementing Coroutines</a></li><li><a href="#continuation-passing-style">Continuation-Passing Style</a><ol><li><a href="#continuation-passing-style-in-haskell">Continuation-Passing Style in Haskell</a></li><li><a href="#call-with-current-continuation">Call with Current Continuation</a></li></ol></li><li><a href="#from-continuations-to-coroutines">From Continuations to Coroutines</a></li><li><a href="#scheduling-coroutines">Scheduling Coroutines</a></li><li><a href="#yield-and-spawn">Yield and Spawn</a><ol><li><a href="#implementation">Implementation</a></li></ol></li><li><a href="#waiting-for-termination">Waiting for Termination</a></li><li><a href="#putting-everything-together">Putting Everything Together</a></li><li><a href="#bonus-round-breadth-first-traversal-without-a-queue">Bonus Round: Breadth-First Traversal without a Queue</a></li></ol></nav>
<h2 data-track-content data-content-name="coroutines" data-content-piece="implementing-co-3" id="coroutines">Coroutines</h2>
<p><a href="https://en.wikipedia.org/wiki/Coroutines" target="_blank" rel="noopener">Coroutines</a><sup><a href="#ref-Knuth1997-rv" class="citation" title="Knuth, “Coroutines.”
">@1</a></sup> are computations that support <a href="https://en.wikipedia.org/wiki/Cooperative_multitasking" target="_blank" rel="noopener"><em>Cooperative multitasking</em></a><span><sup><a href="#ref-Bartel2011-ap" class="citation" title="Bartel, “Non-Preemptive Multitasking.”
">@2</a></sup>.</span> Unlike ordinary <a href="https://en.wikipedia.org/wiki/Subroutines" target="_blank" rel="noopener"><em>Subroutines</em></a> that execute from start to end, and do not hold any state between invocations, coroutines can exit in the middle, and may resume later from the same point while holding state between invocations. They do so by <a href="https://en.wikipedia.org/wiki/Yield_(multithreading)" target="_blank" rel="noopener"><em>yielding</em></a> the control of the current running thread.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 713 337'%3E%3C/svg%3E" class="lazyload w-100pct nolink" style="--image-aspect-ratio: 2.115727002967359" data-src="/images/implementing-co-3/coroutine-vs-subroutine.svg" alt="Subroutines vs. Coroutines"></img>
<noscript><img src="/images/implementing-co-3/coroutine-vs-subroutine.svg" class="w-100pct nolink" alt="Subroutines vs. Coroutines"></img></noscript>
<figcaption>Subroutines vs. Coroutines</figcaption>
</figure>
<p>The above diagram compares the execution of a subroutine and a coroutine, invoked from a caller<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. The rectangles represent instructions, starting at top and ending at bottom. The arrows represent the flow of control.</p>
<p>The subroutine executes from start to end when called. The coroutine can exit in the middle by yielding, and can resume later from the same point. The coroutine state is saved automatically at the point of yielding, and restored when the coroutine resumes. Note that the coroutine may not be resumed, in which case the rest of the coroutine never executes.</p>
<h2 data-track-content data-content-name="coroutines-in-various-languages" data-content-piece="implementing-co-3" id="coroutines-in-various-languages">Coroutines in Various Languages</h2>
<p>Many languages have support for coroutines, either <a href="https://en.wikipedia.org/wiki/Coroutine#Programming_languages_with_native_support" target="_blank" rel="noopener">built-in</a> or <a href="https://en.wikipedia.org/wiki/Coroutine#Implementations" target="_blank" rel="noopener">through libraries or plugins</a>. Here are two examples in <a href="https://kotlinlang.org" target="_blank" rel="noopener">Kotlin</a> and <a href="https://www.python.org/" target="_blank" rel="noopener">Python</a><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>:</p>
<div id="lst:kotlin-coroutine" class="listing numberSource kotlin">
<div class="sourceCode" id="cb1" data-lang="kotlin"><pre class="sourceCode numberSource kotlin"><code class="sourceCode kotlin"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">fun</span> <span class="fu">main</span><span class="op">()</span> <span class="op">=</span> runBlocking <span class="op">{</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> launch <span class="op">{</span> <span class="co">// launch a new coroutine and continue</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> delay<span class="op">(</span><span class="dv">1000L</span><span class="op">)</span> <span class="co">// non-blocking delay for 1 second</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> println<span class="op">(</span><span class="st">"World!"</span><span class="op">)</span> <span class="co">// print after delay</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">}</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> println<span class="op">(</span><span class="st">"Hello"</span><span class="op">)</span> <span class="co">// main coroutine</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="co">// prints "Hello World!"</span></span></code></pre></div>
<p>Coroutines in Kotlin</p>
</div>
<div id="lst:python-coroutine" class="listing numberSource python">
<div class="sourceCode" id="cb2" data-lang="python"><pre class="sourceCode numberSource python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> asyncio</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="cf">async</span> <span class="kw">def</span> say_after(delay, what):</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">await</span> asyncio.sleep(delay)</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(what, end<span class="op">=</span><span class="st">""</span>)</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="cf">async</span> <span class="kw">def</span> main():</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> <span class="cf">await</span> asyncio.gather(</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> say_after(<span class="dv">1</span>, <span class="st">'World!</span><span class="ch">\n</span><span class="st">'</span>),</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> say_after(<span class="dv">0</span>, <span class="st">'Hello '</span>))</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>asyncio.run(main())</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a><span class="co"># prints "Hello World!"</span></span></code></pre></div>
<p>Coroutines in Python</p>
</div>
<p>Now, for a different kind of example, the following <a href="https://en.wikipedia.org/wiki/JavaScript" target="_blank" rel="noopener">JavaScript</a> code prints numbers 11–16 and 1–4 interleaved, using <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Generator" target="_blank" rel="noopener">Generators</a><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>:</p>
<div id="lst:javascript-coroutine" class="listing numberSource javascript">
<div class="sourceCode" id="cb3" data-lang="javascript"><pre class="sourceCode numberSource javascript"><code class="sourceCode javascript"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span><span class="op">*</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> (<span class="kw">let</span> i <span class="op">=</span> start<span class="op">;</span> i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span><span class="op">;</span> i<span class="op">++</span>) {</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">console</span><span class="op">.</span><span class="fu">log</span>(i)<span class="op">;</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">run</span>(<span class="op">...</span>gens) {</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">const</span> queue <span class="op">=</span> [<span class="op">...</span>gens]<span class="op">;</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (queue<span class="op">.</span><span class="at">length</span> <span class="op">!=</span> <span class="dv">0</span>) {</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">const</span> p <span class="op">=</span> queue<span class="op">.</span><span class="fu">shift</span>()<span class="op">;</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (<span class="op">!</span>p<span class="op">.</span><span class="fu">next</span>()<span class="op">.</span><span class="at">done</span>) {</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> queue<span class="op">.</span><span class="fu">push</span>(p)<span class="op">;</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a><span class="fu">run</span>(<span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">,</span> <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>))<span class="op">;</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a><span class="co">// prints numbers 11–16 and 1–4, interleaved.</span></span></code></pre></div>
<p>Generators in JavaScript</p>
</div>
<p>The next example is in <span class="fancy">Co</span>, and it has the same behaviour as the JavaScript example above, except we don’t have to write the function to schedule and run the coroutines. The runtime for <span class="fancy">Co</span>—the <span class="fancy">Co</span> interpreter—does that implicitly for us.</p>
<div id="lst:co-coroutine" class="listing javascript numberSource">
<div class="sourceCode" id="cb4" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">;</span></span></code></pre></div>
<p>Coroutine example in <span class="fancy">Co</span></p>
</div>
<p>So how are coroutines implemented in <span class="fancy">Co</span>? Let’s find out.</p>
<h2 data-track-content data-content-name="implementing-coroutines" data-content-piece="implementing-co-3" id="implementing-coroutines">Implementing Coroutines</h2>
<p>A coroutine is essentially an <em>Environment</em><sup><a href="#ref-Abelson1996-c32" class="citation" title="Abelson, Sussman, and with Julie Sussman, “The Environment Model of
Evaluation.”
">@6</a></sup> and a <em>Continuation</em><span><sup><a href="#ref-Reynolds1993-dc" class="citation" title="Reynolds, “The Discoveries of Continuations.”
">@7</a></sup>.</span> The environment is the state of the executing code at the point of yielding. The continuation is the code to be executed when the coroutine is resumed later. If we can capture the environment and the continuation, we can implement coroutines.</p>
<p>Different implementations of coroutines capture the environment and the continuation in different ways<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>:</p>
<ul>
<li>We can capture the environment as the current stack and the continuation as the pointer to the next instruction to be executed at the level of <a href="https://en.wikipedia.org/wiki/machine_code" target="_blank" rel="noopener">machine code</a>. This is how coroutines are implemented in C and C++.</li>
<li>We can transform the code into a state machine as a large switch statement, and use variables to store the environment. This is how Go threads are implemented in the Clojure <a href="https://clojure.github.io/core.async/" target="_blank" rel="noopener">core.async</a><a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a> library.</li>
<li>We can capture the environment and the continuation as a <a href="https://en.wikipedia.org/wiki/Closure_(computer_programming)" target="_blank" rel="noopener"><em>Closure</em></a>. To do this, we need to first transform the code into <a href="https://en.wikipedia.org/wiki/Continuation-passing_style" target="_blank" rel="noopener"><em>Continuation-passing style</em></a> (CPS), so that we have the handle to the continuation at every point in the code. This is how we are going to implement coroutines in <span class="fancy">Co</span>.</li>
</ul>
<p>Let’s learn what <abbr title="Continuation-passing style">CPS</abbr> is, and how we can use it to implement coroutines.</p>
<h2 data-track-content data-content-name="continuation-passing-style" data-content-piece="implementing-co-3" id="continuation-passing-style">Continuation-Passing Style</h2>
<p>In the usual direct programming style, we write one statement or function call after another, as a sequence of steps to execute. There is another way of thinking about program execution: after returning from executing one statement/function, the rest of the program—which can be thought of as a big statement/function itself—is run. In <abbr title="Continuation-passing style">CPS</abbr>, this is made explicit: each statement/function takes the rest of the program that comes after it as an argument, which it invokes explicitly. For example, if we have a program to get the recommendations for a user and print them, written in direct style like this:</p>
<div class="sourceCode" id="cb5" data-lang="javascript"><pre class="sourceCode numberSource javascript"><code class="sourceCode javascript"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">getUserRecommendations</span>(userId) {</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> user <span class="op">=</span> <span class="fu">getUser</span>(userId)<span class="op">;</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> friends <span class="op">=</span> <span class="fu">getFriends</span>(user)<span class="op">;</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> recommendations <span class="op">=</span> <span class="fu">getRecommendations</span>(friends)<span class="op">;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">recordRecommendations</span>(userId<span class="op">,</span> recommendations)<span class="op">;</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> recommendations<span class="op">;</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">main</span>() {</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> recommendations <span class="op">=</span> <span class="fu">getUserRecommendations</span>(<span class="dv">123</span>)<span class="op">;</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a> <span class="bu">console</span><span class="op">.</span><span class="fu">log</span>(recommendations)<span class="op">;</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>It can be converted to an equivalent <abbr title="Continuation-passing style">CPS</abbr> program like this:</p>
<div class="sourceCode" id="cb6" data-lang="javascript"><pre class="sourceCode numberSource javascript"><code class="sourceCode javascript"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">getUserRecommendationsCPS</span>(userId<span class="op">,</span> cont) {</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">getUserCPS</span>(userId<span class="op">,</span> (user) <span class="kw">=></span> {</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">getFriendsCPS</span>(user<span class="op">,</span> (friends) <span class="kw">=></span> {</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">getRecommendationsCPS</span>(friends<span class="op">,</span> (recommendations) <span class="kw">=></span> {</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">recordRecommendationsCPS</span>(userId<span class="op">,</span> recommendations<span class="op">,</span> () <span class="kw">=></span> {</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">cont</span>(recommendations)<span class="op">;</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">mainCPS</span>() {</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">getUserRecommendationsCPS</span>(<span class="dv">123</span><span class="op">,</span> (recommendations) <span class="kw">=></span> {</span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a> <span class="bu">console</span><span class="op">.</span><span class="fu">log</span>(recommendations)<span class="op">;</span></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a> })<span class="op">;</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>We see how each function takes the rest of the program after it captured as a function, as a parameter, and calls it explicitly to further the flow of the program. Instead of returning the recommendations, the <code>getUserRecommendationsCPS</code> function now takes a function as an additional parameter, which it calls with the recommendations at the end of all the processing. Same for all the other functions invoked in the program. These functions passed as arguments are known as continuations because they <strong>continue</strong> the execution of the programs when called, and hence this style is called the continuation-passing style. The <code>cont</code> function is the continuation here.</p>
<details>
<summary>
The rest of the functions can be written in <abbr title="Continuation-passing style">CPS</abbr> like this:
</summary>
<div class="sourceCode" id="cb7" data-lang="javascript"><pre class="sourceCode numberSource javascript"><code class="sourceCode javascript"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">getUserCPS</span>(userId<span class="op">,</span> cont) {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> user <span class="op">=</span> <span class="fu">getUser</span>(userId)<span class="op">;</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">cont</span>(user)<span class="op">;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">getFriendsCPS</span>(user<span class="op">,</span> cont) {</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> friends <span class="op">=</span> <span class="fu">getFriends</span>(user)<span class="op">;</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">cont</span>(friends)<span class="op">;</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">getRecommendationsCPS</span>(friends<span class="op">,</span> cont) {</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> recommendations <span class="op">=</span> <span class="fu">getRecommendations</span>(friends)<span class="op">;</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">cont</span>(recommendations)<span class="op">;</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">recordRecommendationsCPS</span>(userId<span class="op">,</span> recommendations<span class="op">,</span> cont) {</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">recordRecommendations</span>(userId<span class="op">,</span> recommendations)<span class="op">;</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a> <span class="fu">cont</span>()<span class="op">;</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</details>
<p>So, what is the point of all this? Why transform code into <abbr title="Continuation-passing style">CPS</abbr>? Since, in <abbr title="Continuation-passing style">CPS</abbr> the rest of the program is passed as a function, a program can itself explicitly manipulate the flow of control of the program. This lets us do things like<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>:</p>
<ul>
<li>Returning early from a function by calling the continuation with the return value, and not executing the rest of the function.</li>
<li>Implementing exceptions by passing two continuations: one for the normal flow of the program, and another for the exceptional flow.</li>
<li>Implementing non-deterministic programs by passing continuations for backtracking to previous states of the program.</li>
<li>Converting potentially stack-blowing recursive programs into iterative programs by passing the continuation as a parameter to the recursive function.</li>
<li>Suspending the execution of the program by storing the continuation, and resuming it later.</li>
</ul>
<p>We can now begin to see how <abbr title="Continuation-passing style">CPS</abbr> can be used to implement coroutines.</p>
<h3 id="continuation-passing-style-in-haskell">Continuation-Passing Style in Haskell</h3>
<p>It is straightforward to translate the <a href="#cb5-1">above program</a> into Haskell:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserRecommendations ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> m <span class="dt">Recommendations</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>getUserRecommendations userId <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> user <span class="ot"><-</span> getUser userId</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> friends <span class="ot"><-</span> getFriends user</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> recommendations <span class="ot"><-</span> getRecommendations friends</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> recordRecommendations userId recommendations</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> recommendations</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> getUserRecommendations <span class="dv">123</span> <span class="op">>>=</span> <span class="fu">print</span></span></code></pre></div>
<p>And the <abbr title="Continuation-passing style">CPS</abbr> versions:</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserRecommendationsCPS ::</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> (<span class="dt">Recommendations</span> <span class="ot">-></span> m a) <span class="ot">-></span> m a</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>getUserRecommendationsCPS userId cont <span class="ot">=</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a> getUserCPS userId <span class="op">$</span> \user <span class="ot">-></span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> getFriendsCPS user <span class="op">$</span> \friends <span class="ot">-></span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> getRecommendationsCPS friends <span class="op">$</span> \recommendations <span class="ot">-></span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> recordRecommendationsCPS userId recommendations <span class="op">$</span> \_ <span class="ot">-></span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> cont recommendations</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserCPS ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> (<span class="dt">User</span> <span class="ot">-></span> m a) <span class="ot">-></span> m a</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>getUserCPS userId cont <span class="ot">=</span> getUser userId <span class="op">>>=</span> cont</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a><span class="ot">getFriendsCPS ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">User</span> <span class="ot">-></span> (<span class="dt">Friends</span> <span class="ot">-></span> m a) <span class="ot">-></span> m a</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a>getFriendsCPS user cont <span class="ot">=</span> getFriends user <span class="op">>>=</span> cont</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="ot">getRecommendationsCPS ::</span></span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Friends</span> <span class="ot">-></span> (<span class="dt">Recommendations</span> <span class="ot">-></span> m a) <span class="ot">-></span> m a</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a>getRecommendationsCPS friends cont <span class="ot">=</span></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> getRecommendations friends <span class="op">>>=</span> cont</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a><span class="ot">recordRecommendationsCPS ::</span></span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Recommendations</span> <span class="ot">-></span> (() <span class="ot">-></span> m a) <span class="ot">-></span> m a</span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a>recordRecommendationsCPS userId recommendations cont <span class="ot">=</span></span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> recordRecommendations userId recommendations <span class="op">>></span> cont ()</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a><span class="ot">mainCPS ::</span> <span class="dt">IO</span> ()</span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a>mainCPS <span class="ot">=</span> getUserRecommendationsCPS <span class="dv">123</span> <span class="op">$</span> <span class="fu">print</span></span></code></pre></div>
<p>We can immediately notice a pattern in the type signatures of the functions above: they are all of the form:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">f ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> b <span class="ot">-></span> (a <span class="ot">-></span> m r) <span class="ot">-></span> m r</span></code></pre></div>
<p>It is indeed a known pattern, and is captured by the <code class="sourceCode haskell"><span class="dt">ContT</span></code> type:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">ContT</span> r m a <span class="ot">=</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">ContT</span> {<span class="ot"> runContT ::</span> (a <span class="ot">-></span> m r) <span class="ot">-></span> m r }</span></code></pre></div>
<p>Turns out, the <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Cont.html#t:ContT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ContT</span></code></a> type is a monad transformer, and we can use it to write the above <abbr title="Continuation-passing style">CPS</abbr> program in a more concise way<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserRecommendationsCont ::</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">ContT</span> r m <span class="dt">Recommendations</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>getUserRecommendationsCont userId <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> user <span class="ot"><-</span> getUserCont userId</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> friends <span class="ot"><-</span> getFriendsCont user</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> recommendations <span class="ot"><-</span> getRecommendationsCont friends</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> recordRecommendationsCont userId recommendations</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> recommendations</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserCont ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">ContT</span> r m <span class="dt">User</span></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>getUserCont userId <span class="ot">=</span> <span class="dt">ContT</span> (getUser userId <span class="op">>>=</span>)</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a><span class="ot">getFriendsCont ::</span> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">User</span> <span class="ot">-></span> <span class="dt">ContT</span> r m <span class="dt">Friends</span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a>getFriendsCont user <span class="ot">=</span> <span class="dt">ContT</span> (getFriends user <span class="op">>>=</span>)</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a><span class="ot">getRecommendationsCont ::</span></span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Friends</span> <span class="ot">-></span> <span class="dt">ContT</span> r m <span class="dt">Recommendations</span></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a>getRecommendationsCont friends <span class="ot">=</span></span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">ContT</span> (getRecommendations friends <span class="op">>>=</span>)</span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a><span class="ot">recordRecommendationsCont ::</span></span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Recommendations</span> <span class="ot">-></span> <span class="dt">ContT</span> r m ()</span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a>recordRecommendationsCont userId recommendations <span class="ot">=</span></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">ContT</span> <span class="op">$</span> \cont <span class="ot">-></span></span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a> recordRecommendations userId recommendations <span class="op">>></span> cont ()</span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a><span class="ot">mainCont ::</span> <span class="dt">IO</span> ()</span>
<span id="cb12-28"><a href="#cb12-28" aria-hidden="true" tabindex="-1"></a>mainCont <span class="ot">=</span> runContT (getUserRecommendationsCont <span class="dv">123</span>) <span class="fu">print</span></span></code></pre></div>
<p>So we have come full circle: we started with <a href="#cb8-1">monadic code</a>, and ended with similar <a href="#cb12-1">monadic code</a>, but with a different monad. So what did we gain from this transformation? Well, we can now use the <a href="https://hackage.haskell.org/package/mtl/docs/Control-Monad-Cont.html#v:callCC" target="_blank" rel="noopener"><code class="sourceCode haskell">callCC</code></a> function provided by <code class="sourceCode haskell"><span class="dt">ContT</span></code>.</p>
<h3 id="call-with-current-continuation">Call with Current Continuation</h3>
<p><code>callCC</code>—short for “call with current continuation”—is a function that provides on-demand access to the current continuation at any point in the code, just like we had in the <a href="#cb9-1">CPS version</a> of the program. At the same time, by using <code class="sourceCode haskell"><span class="dt">ConT</span></code> we can write the program again in the concise monadic style<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>.</p>
<p>The following example uses <code>callCC</code> to print the user recommendation twice, instead of once<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">getUserRecommendationsCont2 ::</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span> m <span class="ot">=></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">ContT</span> r m <span class="dt">Recommendations</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>getUserRecommendationsCont2 userId <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> user <span class="ot"><-</span> getUserCont userId</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> friends <span class="ot"><-</span> getFriendsCont user</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> recommendations <span class="ot"><-</span> getRecommendationsCont friends</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> logRecommendationsCont userId recommendations</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> callCC <span class="op">$</span> \cont <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> cont recommendations</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> cont recommendations</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a><span class="ot">mainCont2 ::</span> <span class="dt">IO</span> ()</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>mainCont2 <span class="ot">=</span> runContT (getUserRecommendationsCont2 <span class="dv">123</span>) <span class="fu">print</span></span></code></pre></div>
<p>This is the power of <abbr title="Continuation-passing style">CPS</abbr>: it lets the programs manipulate the flow of control explicitly, and in some cases markedly, as we see in the next section.</p>
<h2 data-track-content data-content-name="from-continuations-to-coroutines" data-content-piece="implementing-co-3" id="from-continuations-to-coroutines">From Continuations to Coroutines</h2>
<p>Since continuations are functions, we can store them in data structures. This lets us pause the execution of a <abbr title="Continuation-passing style">CPS</abbr> program at a certain point, and resume it later from the same point. This is exactly what coroutines do.</p>
<p>To implement coroutines in <span class="fancy">Co</span>, first we enhance the <code class="sourceCode haskell"><span class="dt">Interpreter</span></code> monad to be able to capture the current continuation by adding the <code class="sourceCode haskell"><span class="dt">ContT</span></code> monad transformer in the transformer stack:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="4-5,17-17"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Interpreter</span> a <span class="ot">=</span> <span class="dt">Interpreter</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> runInterpreter ::</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ExceptT</span> <span class="dt">Exception</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> (<span class="dt">ContT</span></span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> (<span class="dt">Either</span> <span class="dt">Exception</span> ())</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> (<span class="dt">StateT</span> <span class="dt">InterpreterState</span> <span class="dt">IO</span>))</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> a</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Functor</span>,</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Applicative</span>,</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span>,</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadIO</span>,</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadBase</span> <span class="dt">IO</span>,</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadState</span> <span class="dt">InterpreterState</span>,</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadError</span> <span class="dt">Exception</span>,</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">MonadCont</span></span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>To be able to pause and resume the <span class="fancy">Co</span> code being interpreted, we need to capture the current interpreter environment as well. The environment contains the bindings that the executing <span class="fancy">Co</span> code sees at any given time. By capturing and later restoring the environment, the code execution resumes with same environment, and hence works correctly.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="4-4,9-9,10:31-10:36"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Coroutine</span> a <span class="ot">=</span> <span class="dt">Coroutine</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> corEnv ::</span> <span class="dt">Env</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> corCont ::</span> a <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> ,<span class="ot"> corReady ::</span> <span class="dt">MVar</span> <span class="dt">TimeSpec</span></span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="ot">newCoroutine ::</span> <span class="dt">Env</span> <span class="ot">-></span> (a <span class="ot">-></span> <span class="dt">Interpreter</span> ()) <span class="ot">-></span> <span class="dt">Interpreter</span> (<span class="dt">Coroutine</span> a)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>newCoroutine env cont <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> ready <span class="ot"><-</span> newMVar <span class="op">=<<</span> currentSystemTime</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Coroutine</span> env cont <span class="deemphasis">ready</span></span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">Coroutine</span></code> data type contains the environment and the continuation. The <code>newCoroutine</code> function creates a new coroutine.</p>
<p>Next, we enhance the interpreter state to keep a queue of coroutines to be run.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="3-3,7:60-7:68"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">InterpreterState</span> <span class="ot">=</span> <span class="dt">InterpreterState</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> isEnv ::</span> <span class="dt">Env</span>,</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="emphasis"><span class="ot"> isCoroutines ::</span> <span class="dt">Queue</span> (<span class="dt">Coroutine</span> ())</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="ot">initInterpreterState ::</span> <span class="dt">IO</span> <span class="dt">InterpreterState</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>initInterpreterState <span class="ot">=</span> <span class="dt">InterpreterState</span> <span class="op"><$></span> builtinEnv <span class="op"><*></span> <span class="emphasis">newQueue</span></span></code></pre></div>
<p>We use an <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html#t:IORef" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IORef</span></code></a> containing a <a href="https://en.wikipedia.org/wiki/min-priority_queue" target="_blank" rel="noopener">min-priority queue</a> to store the coroutines<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. For now, we use it as a simple <a href="https://en.wikipedia.org/wiki/FIFO_(computing_and_electronics)" target="_blank" rel="noopener">FIFO</a> queue, but we will see in a later post how we use it to implement the <code>sleep</code> functionality in our interpreter.</p>
<p><a id="queue-ds"></a></p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="1:46-1:56,5-5,6:12-6:13,6:21-6:27,9:18-9:19,9:26-9:33"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Queue</span> a <span class="ot">=</span> <span class="dt">IORef</span> (<span class="dt">PQ.MinPQueue</span> <span class="dt">TimeSpec</span> a<span class="deemphasis">, <span class="dt">TimeSpec</span></span>)</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="ot">newQueue ::</span> <span class="dt">MonadBase</span> <span class="dt">IO</span> m <span class="ot">=></span> m (<span class="dt">Queue</span> a)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>newQueue <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> now <span class="ot"><-</span> liftBase currentSystemTime</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> newIORef <span class="deemphasis">(</span>PQ.empty<span class="deemphasis">, now)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="ot">queueSize ::</span> <span class="dt">MonadBase</span> <span class="dt">IO</span> m <span class="ot">=></span> <span class="dt">Queue</span> a <span class="ot">-></span> m <span class="dt">Int</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>queueSize <span class="ot">=</span> <span class="fu">fmap</span> <span class="deemphasis">(</span>PQ.size<span class="deemphasis"> <span class="op">.</span> <span class="fu">fst</span>)</span> <span class="op">.</span> readIORef</span></code></pre></div>
<p>Now that we know how coroutines are stored in the interpreter, let’s see how we schedule them.</p>
<h2 data-track-content data-content-name="scheduling-coroutines" data-content-piece="implementing-co-3" id="scheduling-coroutines">Scheduling Coroutines</h2>
<p>First step in scheduling coroutines is to write functions to enqueue and dequeue from a queue:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="2:56-2:57,2:58-2:74,3:4-3:5,3:26-3:27,4-4,5:3-5:5,16:45-16:46,16:47-16:63,18:11-18:12,18:13-18:29,20:14-20:15,20:17-20:33"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueueAt ::</span> <span class="dt">TimeSpec</span> <span class="ot">-></span> a <span class="ot">-></span> <span class="dt">Queue</span> a <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>enqueueAt time val queue <span class="ot">=</span> atomicModifyIORef' queue <span class="op">$</span> \<span class="deemphasis">(</span>q<span class="deemphasis">, maxWakeupTime)</span> <span class="ot">-></span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> (<span class="deemphasis">(</span> PQ.insert time val q<span class="deemphasis">,</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="kw">if</span> time <span class="op">></span> maxWakeupTime <span class="kw">then</span> time <span class="kw">else</span> maxWakeupTime</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="deemphasis"> )</span>, ())</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="ot">enqueue ::</span> a <span class="ot">-></span> <span class="dt">Queue</span> a <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>enqueue val queue <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> now <span class="ot"><-</span> currentSystemTime</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> enqueueAt now val queue</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="ot">currentSystemTime ::</span> <span class="dt">MonadIO</span> m <span class="ot">=></span> m <span class="dt">TimeSpec</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>currentSystemTime <span class="ot">=</span> liftIO <span class="op">$</span> getTime <span class="dt">Monotonic</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="ot">dequeue ::</span> <span class="dt">Queue</span> a <span class="ot">-></span> <span class="dt">Interpreter</span> (<span class="dt">Maybe</span> a)</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a>dequeue queue <span class="ot">=</span> atomicModifyIORef' queue <span class="op">$</span> \<span class="deemphasis">(</span>q<span class="deemphasis">, maxWakeupTime)</span> <span class="ot">-></span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> PQ.null q</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> (<span class="deemphasis">(</span>q<span class="deemphasis">, maxWakeupTime)</span>, <span class="dt">Nothing</span>)</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">let</span> ((_, val), q') <span class="ot">=</span> PQ.deleteFindMin q</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> (<span class="deemphasis">(</span>q'<span class="deemphasis">, maxWakeupTime)</span>, <span class="dt">Just</span> val)</span></code></pre></div>
<p>To use the min-priority queue as a FIFO queue, we use the current system time—which is a monotonically increasing value—as the priority of the values in the queue. This way, the coroutines are scheduled in the order they are enqueued.</p>
<p>The <code>enqueueAt</code> function enqueues the given value at the given time in the queue. The <code>enqueue</code> function enqueues the value at the current time, thus scheduling it to run immediately.</p>
<p>The <code>dequeue</code> function dequeues the value with the lowest priority from the queue, which in this case, is the value that is enqueued first.</p>
<p>The <code>currentSystemTime</code> function returns the monotonically increasing current system time.</p>
<p>Over these queuing primitives, we build the coroutine scheduling functions:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="10-10"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">scheduleCoroutine ::</span> <span class="dt">Coroutine</span> () <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>scheduleCoroutine coroutine <span class="ot">=</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> State.gets isCoroutines <span class="op">>>=</span> enqueue coroutine</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="ot">runNextCoroutine ::</span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>runNextCoroutine <span class="ot">=</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> State.gets isCoroutines <span class="op">>>=</span> dequeue <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> throwError <span class="dt">CoroutineQueueEmpty</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="dt">Coroutine</span> {<span class="op">..</span>} <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> void <span class="op">$</span> takeMVar corReady</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> setEnv corEnv</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> corCont ()</span></code></pre></div>
<p>The <code>scheduleCoroutine</code> function takes a coroutine, and schedules it by enqueuing it in the coroutine queue in the interpreter state.</p>
<p>The <code>runNextCoroutine</code> function dequeues the next coroutine from the queue, and runs it. It first restores the environment of the coroutine in the interpreter state, and then runs the continuation of the coroutine. If the queue is empty, it throws a <code class="sourceCode haskell"><span class="dt">CoroutineQueueEmpty</span></code> exception, which we add in the <code class="sourceCode haskell"><span class="dt">Exception</span></code> data type:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="4-4"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Exception</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Return</span> <span class="dt">Value</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">RuntimeError</span> <span class="dt">String</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">CoroutineQueueEmpty</span></span></span></code></pre></div>
<p>The <code>runNextCoroutine</code> function is the heart of the coroutine scheduling. It is called at the end of every function related to coroutines in the interpreter, and that’s how the coroutines are run one-after-another. Next, we see how we use these functions to implement the <code class="sourceCode javascript"><span class="kw">yield</span></code> and <code class="sourceCode"><span class="cf">spawn</span></code> statements in <span class="fancy">Co</span>.</p>
<h2 data-track-content data-content-name="yield-and-spawn" data-content-piece="implementing-co-3" id="yield-and-spawn">Yield and Spawn</h2>
<p>Let’s recall the program we used to demonstrate coroutines:</p>
<div class="sourceCode" id="cb14" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">;</span></span></code></pre></div>
<p>Running this program with the interpreter produces the following output:</p>
<pre class="plain"><code>11
1
12
2
13
3
14
4
15
16</code></pre>
<p>As we see, the numbers printed by the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)</code> function call are interleaved with the ones printed by the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> call.</p>
<p>This is how the code is interpreted:</p>
<ol type="1">
<li>First, the definition of the function <code class="sourceCode javascript">printNums</code> executes. The function gets stored in the environment as a <code class="sourceCode haskell"><span class="dt">Function</span></code> value.</li>
<li>The <code class="sourceCode javascript">spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> statement executes. The <code class="sourceCode"><span class="cf">spawn</span></code> statement creates a new coroutine for the function call <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> and schedules it.</li>
<li><details>
<summary>
The <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)</code> function call executes, prints <code class="sourceCode javascript"><span class="dv">11</span></code> and yields.
</summary>
<ol type="i">
<li>The <code class="sourceCode javascript"><span class="cf">while</span></code> loop executes, and the <code>print</code> statement prints the value of the variable <code>i</code>, which is <code class="sourceCode javascript"><span class="dv">11</span></code> at this point.</li>
<li>The <code class="sourceCode javascript"><span class="kw">yield</span></code> statement executes. This creates a new coroutine for the rest of the call execution, and schedules it. The call execution suspends at this point.</li>
</ol>
</details></li>
<li><details>
<summary>
The <code>runNextCoroutine</code> function executes, which dequeues the coroutine for the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> call, and runs it. This prints <code class="sourceCode javascript"><span class="dv">1</span></code> and yields.
</summary>
<ol type="i">
<li>The <code class="sourceCode javascript"><span class="cf">while</span></code> loop executes, and the <code>print</code> statement prints the value of the variable <code>i</code>, which is <code class="sourceCode javascript"><span class="dv">1</span></code> at this point.</li>
<li>The <code class="sourceCode javascript"><span class="kw">yield</span></code> statement executes. This creates a new coroutine for the rest of the call execution, and schedules it. The call execution suspends at this point.</li>
</ol>
</details></li>
<li><details>
<summary>
The <code>runNextCoroutine</code> function executes again, which dequeues the coroutine for the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)</code> call, and runs it. This prints <code class="sourceCode javascript"><span class="dv">12</span></code> and yields.
</summary>
<ol type="i">
<li>The call resumes after the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement. The <code class="sourceCode javascript"><span class="cf">while</span></code> loop executes again, and the <code>print</code> statement prints the value of the variable <code>i</code>, which is <code class="sourceCode javascript"><span class="dv">12</span></code> at this point.</li>
<li>The function execution suspends at the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement again.</li>
</ol>
</details></li>
<li><details>
<summary>
The <code>runNextCoroutine</code> function executes again, which dequeues the coroutine for the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> call, and runs it. This prints <code class="sourceCode javascript"><span class="dv">2</span></code> and yields.
</summary>
<ol type="i">
<li>The call resumes after the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement. The <code class="sourceCode javascript"><span class="cf">while</span></code> loop executes again, and the <code>print</code> statement prints the value of the variable <code>i</code>, which is <code class="sourceCode javascript"><span class="dv">2</span></code> at this point.</li>
<li>The function execution suspends at the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement again.</li>
</ol>
</details></li>
<li>This back-and-forth process of suspension and resumption of function executions continues until the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> call returns after printing the number <code class="sourceCode javascript"><span class="dv">4</span></code>.</li>
<li>After that, the call <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)</code> resumes to print the numbers and yields, again and again, until it returns after printing the number <code class="sourceCode javascript"><span class="dv">16</span></code>.</li>
<li>Interpretation ends.</li>
</ol>
<p>The diagram below depicts this process in abstract:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 673 849'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-80pct" style="--image-aspect-ratio: 0.7926972909305064" data-src="/images/implementing-co-3/coroutine-scheduling.svg" alt="Spawning, yielding, and running coroutines"></img>
<noscript><img src="/images/implementing-co-3/coroutine-scheduling.svg" class="w-100pct nolink mw-80pct" alt="Spawning, yielding, and running coroutines"></img></noscript>
<figcaption>Spawning, yielding, and running coroutines</figcaption>
</figure>
<p>With the understanding of how they work, let’s see how to implement the <code class="sourceCode javascript"><span class="kw">yield</span></code> and <code class="sourceCode"><span class="cf">spawn</span></code> statements in <span class="fancy">Co</span>.</p>
<h3 id="implementation">Implementation</h3>
<p>First, we add the <code class="sourceCode haskell"><span class="dt">YieldStmt</span></code> and <code class="sourceCode haskell"><span class="dt">SpawnStmt</span></code> constructors to the <code class="sourceCode haskell"><span class="dt">Stmt</span></code> data type:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="9-10" data-deemphasize="11-11"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Stmt</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">ExprStmt</span> <span class="dt">Expr</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">VarStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">AssignStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">IfStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">WhileStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">FunctionStmt</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ReturnStmt</span> (<span class="dt">Maybe</span> <span class="dt">Expr</span>)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">YieldStmt</span></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> <span class="dt">SpawnStmt</span> <span class="dt">Expr</span></span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">SendStmt</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Program</span> <span class="ot">=</span> [<span class="dt">Stmt</span>]</span></code></pre></div>
<p>Then, we enhance the <code>stmt</code> parser to parse these statements:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="6-7" data-deemphasize="14-14"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">stmt ::</span> <span class="dt">Parser</span> <span class="dt">Stmt</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>stmt <span class="ot">=</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">IfStmt</span> <span class="op"><$></span> (reserved <span class="st">"if"</span> <span class="op">*></span> parens expr) <span class="op"><*></span> braces (many stmt)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">WhileStmt</span> <span class="op"><$></span> (reserved <span class="st">"while"</span> <span class="op">*></span> parens expr) <span class="op"><*></span> braces (many stmt)</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">VarStmt</span> <span class="op"><$></span> (reserved <span class="st">"var"</span> <span class="op">*></span> identifier) <span class="op"><*></span> (symbol <span class="st">"="</span> <span class="op">*></span> expr <span class="op"><*</span> semi)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op"><|></span> <span class="dt">YieldStmt</span> <span class="op"><$</span> (reserved <span class="st">"yield"</span> <span class="op"><*</span> semi)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op"><|></span> <span class="dt">SpawnStmt</span> <span class="op"><$></span> (reserved <span class="st">"spawn"</span> <span class="op">*></span> expr <span class="op"><*</span> semi)</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">ReturnStmt</span> <span class="op"><$></span> (reserved <span class="st">"return"</span> <span class="op">*></span> optional expr <span class="op"><*</span> semi)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">FunctionStmt</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> try (reserved <span class="st">"function"</span> <span class="op">*></span> identifier)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> parens (sepBy identifier <span class="op">$</span> symbol <span class="st">","</span>)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> braces (many stmt)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> try (<span class="dt">AssignStmt</span> <span class="op"><$></span> identifier <span class="op"><*></span> (symbol <span class="st">"="</span> <span class="op">*></span> expr <span class="op"><*</span> semi))</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op"><|></span> try (<span class="dt">SendStmt</span> <span class="op"><$></span> expr <span class="op"><*></span> (symbol <span class="st">"->"</span> <span class="op">*></span> expr <span class="op"><*</span> semi))</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">ExprStmt</span> <span class="op"><$></span> expr <span class="op"><*</span> semi</span></code></pre></div>
<p>Next, we implement the <code>execute</code> function for the <code class="sourceCode haskell"><span class="dt">YieldStmt</span></code> and <code class="sourceCode haskell"><span class="dt">SpawnStmt</span></code> statements:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="21-22" data-deemphasize="23-27"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">execute ::</span> <span class="dt">Stmt</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>execute <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ExprStmt</span> expr <span class="ot">-></span> void <span class="op">$</span> evaluate expr</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">VarStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> defineVar name</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">AssignStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> assignVar name</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">IfStmt</span> expr body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> while<span class="op">@</span>(<span class="dt">WhileStmt</span> expr body) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> execute while</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">ReturnStmt</span> mExpr <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> mRet <span class="ot"><-</span> <span class="fu">traverse</span> evaluate mExpr</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> throwError <span class="op">.</span> <span class="dt">Return</span> <span class="op">.</span> fromMaybe <span class="dt">Null</span> <span class="op">$</span> mRet</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">FunctionStmt</span> name params body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> defineVar name <span class="op">$</span> <span class="dt">Function</span> name params body env</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">YieldStmt</span> <span class="ot">-></span> yield</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">SpawnStmt</span> expr <span class="ot">-></span> spawn expr</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">SendStmt</span> expr chan <span class="ot">-></span> evaluate chan <span class="op">>>=</span> \<span class="kw">case</span></span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Chan</span> channel <span class="ot">-></span> <span class="kw">do</span></span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> val <span class="ot"><-</span> evaluate expr</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> channelSend val channel</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> v <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot send to a non-channel: "</span> <span class="op"><></span> <span class="fu">show</span> v</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> isTruthy <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> <span class="dt">Null</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> <span class="dt">Boolean</span> b <span class="ot">-></span> b</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">True</span></span></code></pre></div>
<p>All the scaffolding is now in place. Next, we implement the <code class="sourceCode javascript"><span class="kw">yield</span></code> and <code class="sourceCode"><span class="cf">spawn</span></code> functions. First comes <code class="sourceCode"><span class="cf">spawn</span></code>:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">spawn ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>spawn expr <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> coroutine <span class="ot"><-</span> newCoroutine env (<span class="fu">const</span> <span class="op">$</span> evaluate expr <span class="op">>></span> runNextCoroutine)</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> scheduleCoroutine coroutine</span></code></pre></div>
<p>The <code class="sourceCode"><span class="cf">spawn</span></code> statement creates a new coroutine for the expression <code class="sourceCode haskell">expr</code> and schedules it. The coroutine captures the current environment, and evaluates the expression <code class="sourceCode haskell">expr</code> when it is run. The <code>runNextCoroutine</code> function is called after the expression is evaluated to run the next coroutine in the queue<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>.</p>
<p>Next up is <code class="sourceCode javascript"><span class="kw">yield</span></code>:</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">yield ::</span> <span class="dt">Interpreter</span> ()</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>yield <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> callCC <span class="op">$</span> \cont <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> newCoroutine env cont <span class="op">>>=</span> scheduleCoroutine</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> runNextCoroutine</span></code></pre></div>
<p>The <code class="sourceCode javascript"><span class="kw">yield</span></code> function is the essence of coroutines in <span class="fancy">Co</span>. This is where we use the continuations that we added to the interpreter. First, we capture the current environment from the interpreter state. Then, we invoke <code class="sourceCode haskell">callCC</code> to get the current continuation. This continuation represents the rest of the program execution that lies in future after the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a>. We create a new coroutine with the captured environment and the continuation, and schedule it. Finally, we run the next coroutine in the queue.</p>
<p>By capturing the environment and the continuation in a coroutine, and scheduling it to be run later, we are able to suspend the current program execution, and resume it later. At the same time, by running the next coroutine in the queue, we cause the interleaved execution of function calls that we saw in the previous section.</p>
<h2 data-track-content data-content-name="waiting-for-termination" data-content-piece="implementing-co-3" id="waiting-for-termination">Waiting for Termination</h2>
<p>There is one last thing we need to implement. If we were to run the following program with the interpreter as we have it now, it would terminate prematurely without printing anything:</p>
<div class="sourceCode" id="cb18" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span></code></pre></div>
<p>That’s because <code class="sourceCode"><span class="cf">spawn</span></code> schedules a coroutine for the <code class="sourceCode javascript"><span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)</code> function call, but the interpreter does not wait for all scheduled coroutines to finish executing before terminating. So, we add a mechanism for the same:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="3:3-3:4,3:14-3:30,4-4,5:32-5:43,6-6,7:5-7:9"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">awaitTermination ::</span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>awaitTermination <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="deemphasis">(</span>coroutines<span class="deemphasis">, maxWakeupTime)</span> <span class="ot"><-</span> readIORef <span class="op">=<<</span> State.gets isCoroutines</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> dur <span class="ot"><-</span> calcSleepDuration maxWakeupTime</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> unless (PQ.null coroutines) <span class="op">$</span><span class="deemphasis"> <span class="kw">if</span> dur <span class="op">></span> <span class="dv">0</span></span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="kw">then</span> sleep dur <span class="op">>></span> awaitTermination</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="kw"><span class="deemphasis">else</span></span> yield <span class="op">>></span> awaitTermination</span></code></pre></div>
<p>The <code>awaitTermination</code> function checks if the coroutine queue is empty. If it is not, it yields and calls itself again to redo the check. Calling <code class="sourceCode javascript"><span class="kw">yield</span></code> causes the next coroutine in the queue to be run. <code>awaitTermination</code> keeps checking the queue, and yielding until the queue is empty. Then, it finally returns.</p>
<h2 data-track-content data-content-name="putting-everything-together" data-content-piece="implementing-co-3" id="putting-everything-together">Putting Everything Together</h2>
<p>Finally, we put everything together in the <code>interpret</code> function:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="5-5,8:34-8:53,12-12"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpret ::</span> <span class="dt">Program</span> <span class="ot">-></span> <span class="dt">IO</span> (<span class="dt">Either</span> <span class="dt">String</span> ())</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>interpret program <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> state <span class="ot"><-</span> initInterpreterState</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> retVal <span class="ot"><-</span> <span class="fu">flip</span> evalStateT state</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">.</span> <span class="fu">flip</span> runContT <span class="fu">return</span></span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runExceptT</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runInterpreter</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> (traverse_ execute program <span class="emphasis"><span class="op">>></span> awaitTermination</span>)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> retVal <span class="kw">of</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">RuntimeError</span> err) <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Left</span> err</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Return</span> _) <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Left</span> <span class="st">"Cannot return from outside functions"</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="dt">Left</span> <span class="dt">CoroutineQueueEmpty</span> <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Right</span> ()</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> _ <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Right</span> ()</span></code></pre></div>
<p>We add <code>awaitTermination</code> at the end of the program to be interpreted so that the interpreter waits for the coroutine queue to be empty before terminating.</p>
<p>We use <code>runContT</code> to run the program with the initial continuation, which is just <code class="sourceCode haskell"><span class="fu">return</span></code>. This causes the interpreter to terminate when the program returns.</p>
<p>Lastly, we catch the <code class="sourceCode haskell"><span class="dt">CoroutineQueueEmpty</span></code> exception, ignore it, and terminate the interpreter.</p>
<p>That’s it! We have implemented coroutines in <span class="fancy">Co</span>. For an interesting example of usage of coroutines, we are going to implement the breadth-first traversal of a binary tree without using a queue in the next section.</p>
<h2 data-track-content data-content-name="bonus-round-breadth-first-traversal-without-a-queue" data-content-piece="implementing-co-3" id="bonus-round-breadth-first-traversal-without-a-queue">Bonus Round: Breadth-First Traversal without a Queue</h2>
<p><a href="https://en.wikipedia.org/wiki/Breadth-first_traversal" target="_blank" rel="noopener">Breadth-first traversal</a> is a common algorithm for traversing a tree. It traverses the tree level-by-level, from left to right. It uses a queue to keep track of the nodes that are yet to be traversed. However, with coroutines, we can implement a breadth-first traversal without using a queue.</p>
<p>First, we need to define a binary tree data structure in <span class="fancy">Co</span>. Remember, however, that <span class="fancy">Co</span> does not have a built-in data structure for trees, neither does it support user-defined data structures. So, we are going to borrow <a href="https://web.archive.org/web/20230211/https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-14.html#%25_sec_2.1.3" target="_blank" rel="noopener">a trick from the Wizard book</a>, and implement it using closures:</p>
<div class="sourceCode" id="cb19" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">BinaryTree</span>(val<span class="op">,</span> left<span class="op">,</span> right) {</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">function</span> (command) {</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"val"</span>) { <span class="cf">return</span> val<span class="op">;</span> }</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"left"</span>) { <span class="cf">return</span> left<span class="op">;</span> }</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (command <span class="op">==</span> <span class="st">"right"</span>) { <span class="cf">return</span> right<span class="op">;</span> }</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> }<span class="op">;</span></span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">binaryTreeVal</span>(tree) { <span class="cf">return</span> <span class="fu">tree</span>(<span class="st">"val"</span>)<span class="op">;</span> }</span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">binaryTreeLeft</span>(tree) { <span class="cf">return</span> <span class="fu">tree</span>(<span class="st">"left"</span>)<span class="op">;</span> }</span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">binaryTreeRight</span>(tree) { <span class="cf">return</span> <span class="fu">tree</span>(<span class="st">"right"</span>)<span class="op">;</span> }</span></code></pre></div>
<p>We define a binary tree as a function that takes a node value, and left and right subtrees as parameters, and returns an anonymous function that takes a command, and returns the corresponding parameter value. The <code class="sourceCode javascript">binaryTreeVal</code>, <code class="sourceCode javascript">binaryTreeLeft</code> and <code class="sourceCode javascript">binaryTreeRight</code> are helper functions that call the returned anonymous function with the appropriate command.</p>
<p>Next, we write a function to generate a <a href="https://en.wikipedia.org/wiki/Binary_tree#perfect" target="_blank" rel="noopener">perfect binary tree</a> given a starting power-of-two number:</p>
<div class="sourceCode" id="cb20" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">generatePowersOfTwoBinaryTree</span>(start) {</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span> <span class="fu">generateTree</span>(start<span class="op">,</span> interval) {</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (start <span class="op">==</span> <span class="dv">1</span>) {</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">BinaryTree</span>(<span class="dv">1</span><span class="op">,</span> <span class="kw">null</span><span class="op">,</span> <span class="kw">null</span>)<span class="op">;</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">BinaryTree</span>(start<span class="op">,</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">generateTree</span>(start <span class="op">-</span> interval<span class="op">/</span><span class="dv">2</span><span class="op">,</span> interval<span class="op">/</span><span class="dv">2</span>)<span class="op">,</span></span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">generateTree</span>(start <span class="op">-</span> interval<span class="op">/</span><span class="dv">2</span><span class="op">,</span> interval<span class="op">/</span><span class="dv">2</span>))<span class="op">;</span></span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">generateTree</span>(start<span class="op">,</span> start)<span class="op">;</span></span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>And, a function to pretty-print a tree node:</p>
<div class="sourceCode" id="cb21" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printTreeNode</span>(val<span class="op">,</span> depth) {</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> padding <span class="op">=</span> <span class="st">"┃━"</span><span class="op">;</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> depth) {</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> padding <span class="op">=</span> padding <span class="op">+</span> <span class="st">"━━━━━━━━"</span><span class="op">;</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(padding <span class="op">+</span> <span class="st">" "</span> <span class="op">+</span> val)<span class="op">;</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>Finally, here’s the function that does the breadth-first traversal, and prints the tree:</p>
<div class="sourceCode" id="cb22" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printBinaryTreeBreadthFirst</span>(tree) {</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span> <span class="fu">traverseTree</span>(tree<span class="op">,</span> depth) {</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (tree <span class="op">==</span> <span class="kw">null</span>) { <span class="cf">return</span><span class="op">;</span> }</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">printTreeNode</span>(<span class="fu">binaryTreeVal</span>(tree)<span class="op">,</span> depth)<span class="op">;</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">traverseTree</span>(<span class="fu">binaryTreeLeft</span>(tree)<span class="op">,</span> depth <span class="op">+</span> <span class="dv">1</span>)<span class="op">;</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">traverseTree</span>(<span class="fu">binaryTreeRight</span>(tree)<span class="op">,</span> depth <span class="op">+</span> <span class="dv">1</span>)<span class="op">;</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">traverseTree</span>(tree<span class="op">,</span> <span class="dv">0</span>)<span class="op">;</span></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>We run it like this:</p>
<div class="sourceCode" id="cb23" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> tree <span class="op">=</span> <span class="fu">generatePowersOfTwoBinaryTree</span>(<span class="dv">16</span>)<span class="op">;</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a><span class="fu">printBinaryTreeBreadthFirst</span>(tree)<span class="op">;</span></span></code></pre></div>
<details>
<summary>
And, we get the following output:
</summary>
<pre class="plain"><code>┃━ 16
┃━━━━━━━━━ 8
┃━━━━━━━━━ 8
┃━━━━━━━━━━━━━━━━━ 4
┃━━━━━━━━━━━━━━━━━ 4
┃━━━━━━━━━━━━━━━━━ 4
┃━━━━━━━━━━━━━━━━━ 4
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━ 2
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1
┃━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1</code></pre>
</details>
<p>The trick here is to use the coroutine queue itself for the breadth-first traversal. After printing a tree node, we spawn a coroutine for traversing each child node. The coroutines are scheduled in the order they are spawned, so the traversal is breadth-first, resulting in the above output.</p>
<hr></hr>
<p>In this post, we added support for coroutines to our <span class="fancy">Co</span> interpreter. We learned about the continuation-passing style, and used it to implement coroutines. In the <a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">next part</a>, we’ll add support for channels to our interpreter, and use them for cross-coroutine communication.</p>
<p>The code for complete <span class="fancy">Co</span> interpreter is available <a href="https://abhinavsarkar.net/code/co-interpreter.html?mtm_campaign=feed">here</a>.</p>
<h2 class="notoc" data-track-content data-content-name="acknowledgements" data-content-piece="implementing-co-3" id="acknowledgements">Acknowledgements</h2>
<p>Many thanks to <a href="https://arunraghavan.net/" target="_blank" rel="noopener">Arun Raghavan</a> for reviewing a draft of this article.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
<div id="ref-Abelson1996-c32" class="csl-entry" role="listitem">
Abelson, Harold, Gerald Jay Sussman, and with Julie Sussman. <span>“The Environment Model of Evaluation.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-21.html#%_sec_3.2" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-21.html#%_sec_3.2</a>.
</div>
<div id="ref-Bartel2011-ap" class="csl-entry" role="listitem">
Bartel, Joe. <span>“<span>Non-Preemptive</span> Multitasking.”</span> <em>The Computer Journal</em>, no. 30 (May 2011): 37–38, 28. <a href="https://cini.classiccmp.org/pdf/HT68K/HT68K%20TCJ30p37.pdf" target="_blank" rel="noopener">https://cini.classiccmp.org/pdf/HT68K/HT68K%20TCJ30p37.pdf</a>.
</div>
<div id="ref-Knuth1997-rv" class="csl-entry" role="listitem">
Knuth, Donald E. <span>“Coroutines.”</span> In <em>The Art of Computer Programming: Volume 1: Fundamental Algorithms</em>, 3rd ed., 193–200. Addison Wesley, 1997.
</div>
<div id="ref-Reynolds1993-dc" class="csl-entry" role="listitem">
Reynolds, John C. <span>“The Discoveries of Continuations.”</span> <em>LISP and Symbolic Computation</em> 6, no. 3-4 (1993): 233–47. <a href="https://www.cs.ru.nl/~freek/courses/tt-2011/papers/cps/histcont.pdf" target="_blank" rel="noopener">https://www.cs.ru.nl/~freek/courses/tt-2011/papers/cps/histcont.pdf</a>.
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>This representation is copied from a <a href="https://dmitrykandalov.com/coroutines-as-threads" target="_blank" rel="noopener">series</a> of articles on coroutines by Dmitry Kandalov. The articles are a great introduction to coroutines, and are highly recommended.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Read the <a href="https://kotlinlang.org/docs/coroutines-guide.html" target="_blank" rel="noopener">Kotlin docs</a> and <a href="https://docs.python.org/3/library/asyncio-task.html" target="_blank" rel="noopener">Python docs</a> to learn more about coroutines in Kotlin and Python respectively.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Generators are similar to coroutines. The main difference is that generators are typically used to produce a sequence of values, while coroutines are used to implement concurrency. But coroutines (as we have them in this post) can be implemented over generators, and generators can be implemented over coroutines and channels. So the difference is mostly of intent.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Coroutines as we have them in <span class="fancy">Co</span>, are asymmetric, non-first-class, and stackful.</p>
<p>In contrast, coroutines in</p>
<ul>
<li>Kotlin are asymmetric, non-first-class and stackless,</li>
<li>Python are asymmetric, first-class and stackless,</li>
<li>Lua are asymmetric, first-class and stackful, and</li>
<li>Zig are symmetric, non-first-class and stackless.</li>
</ul>
<p>See the Wikipedia article on coroutines for more details on the <a href="https://en.wikipedia.org/wiki/Coroutine#Definition_and_Types" target="_blank" rel="noopener">types of coroutines</a> and their <a href="https://en.wikipedia.org/wiki/Coroutine#Implementations" target="_blank" rel="noopener">various implementations</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>The core.async library implements something like coroutines in Clojure, but they are not true coroutines. They have <a href="https://www.clojure.org/guides/core_async_go#_unsupported_constructs_and_other_limitations_in_go_blocks" target="_blank" rel="noopener">various limitations</a> like not being able to yield from a functions called from a <code>go</code> block. This is because core.async is implemented as a macro that transforms the code directly inside a <code>go</code> block into a state machine, but not the functions called from the <code>go</code> block.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>See <a href="https://free.cofree.io/2020/01/02/cps/" target="_blank" rel="noopener">this article</a> by
Ziyang Liu and <a href="https://matt.might.net/articles/programming-with-continuations--exceptions-backtracking-search-threads-generators-coroutines/" target="_blank" rel="noopener">this one</a> by Matt Might for detailed explanations of the various use-cases of <abbr title="Continuation-passing style">CPS</abbr>.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>See <a href="https://blog.poisson.chat/posts/2019-10-26-reasonable-continuations.html" target="_blank" rel="noopener">this article</a> by Li-yao XIA for an introduction to the <code class="sourceCode haskell"><span class="dt">Cont</span></code> monad.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p><a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)" target="_blank" rel="noopener">Scheme</a> was the first language to introduce <code class="sourceCode scheme"><span class="kw">call/cc</span></code>. Since then <a href="https://en.wikipedia.org/wiki/Continuation#Programming_language_support" target="_blank" rel="noopener">many languages</a> have added support for it.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>If we compare the <a href="#cb9-1">CPS version</a> of the program with the <a href="#cb8-1">direct style version</a>, we can see that it is possible to print the recommendations twice in the <abbr title="Continuation-passing style">CPS</abbr> version by calling the continuation twice. However, this is not possible in the direct style version, since the flow of control is implicit in it.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>We use the min-priority queue from the <a href="https://hackage.haskell.org/package/pqueue" target="_blank" rel="noopener">pqueue</a> library.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>It is essential to call <code>runNextCoroutine</code> after the expression in the <code class="sourceCode"><span class="cf">spawn</span></code> statement is evaluated. The evaluation of the expression may or may not yield. If it does, yielding causes the next coroutine to be <a href="#cb17-1">run</a>.</p>
<p>However, if it does not yield, but instead returns, and we do not call <code>runNextCoroutine</code> after it, the flow of control then goes to the end of the previous call to <code>runNextCoroutine</code> called from a previous yield. This causes the program after the previous yield to start executing, but with the interpreter environment set to that of the expression in the <code class="sourceCode"><span class="cf">spawn</span></code> statement, leading to unexpected behavior.</p>
<p>So, calling <code>runNextCoroutine</code> after the expression evaluation is a must to ensure correct execution.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>The coroutines in <span class="fancy">Co</span> are stackful, which means that the <abbr title="Thread of computation">ToC</abbr> can be yielded from anywhere in the program, including nested function calls, and are resumed from the same point. This is in contrast to stackless coroutine implementations, where the <abbr title="Thread of computation">ToC</abbr> can only be yielded from particular functions that are marked as being able to yield, like generators in Python or <code>async</code> functions in JavaScript. Stackless coroutines are more efficient, but they are also more restrictive.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<strong>Adding Coroutines</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2023-02-11T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-2/">previous post</a>, we wrote the interpreter for basic features of <span class="fancy">Co</span>, the small language we are building in this series of posts. In this post, we explore and implement what makes <span class="fancy">Co</span> really interesting: support for lightweight concurrency using Coroutines.</p>
https://abhinavsarkar.net/posts/static-site-generator-using-shake/ Writing a Static Site Generator Using Shake 2022-12-17T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>Static site generators (SSGs) are all rage these days as people realize that plain HTML websites are good enough for most cases. <abbr title="Static site generator">SSGs</abbr> take raw data in various formats—often <a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown</a>, <a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON</a>, and <a href="https://en.wikipedia.org/wiki/YAML" target="_blank" rel="noopener">YAML</a>—and process them to produce the static websites, which can then be hosted easily on any hosting provider, or on personal <abbr title="Virtual private server">VPSes</abbr>. In this post, we write a bespoke <abbr title="Static site generator">SSG</abbr> using the <a href="https://shakebuild.com/" target="_blank" rel="noopener">Shake</a> build system.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/static-site-generator-using-shake/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more-->
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#build-systems">Build Systems</a></li><li><a href="#static-site-structure">Static Site Structure</a></li><li><a href="#main">Main</a></li><li><a href="#build-targets">Build Targets</a></li><li><a href="#build-rules">Build Rules</a></li><li><a href="#utilities">Utilities</a></li><li><a href="#building-the-blog">Building the Blog</a></li><li><a href="#shake-features">Shake Features</a></li><li><a href="#tips-and-tricks">Tips and Tricks</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="static-site-generator-using-shake" id="introduction">Introduction</h2>
<p>In the <a href="https://info.cern.ch/" target="_blank" rel="noopener">beginning</a>, people coded websites by hand, painstakingly writing the HTML tags and CSS styles (and JavaScript code, if they were into <a href="https://en.wikipedia.org/wiki/DHTML" target="_blank" rel="noopener">DHTML</a>). Many <a href="https://en.wikipedia.org/wiki/Weblog" target="_blank" rel="noopener">weblogs</a> were crafted by the hands of passionate individuals, even before the word <em>Blog</em> came into being.</p>
<p>Over time, these websites grew in size and some clever people decided to separate the data for the websites from the presentation and layout. The data moved into databases and <a href="https://en.wikipedia.org/wiki/Common_Gateway_Interface" target="_blank" rel="noopener">CGI</a> scripts were invented to pull the data and create webpages out of them programmatically, on request. Thus began the age of <a href="https://en.wikipedia.org/wiki/Content_management_systems" target="_blank" rel="noopener">Content management systems</a> (CMS) like <a href="https://en.wikipedia.org/wiki/Drupal" target="_blank" rel="noopener">Drupal</a>, and of course, blogging software like <a href="https://en.wikipedia.org/wiki/Wordpress" target="_blank" rel="noopener">Wordpress</a> and <a href="https://en.wikipedia.org/wiki/Blogspot" target="_blank" rel="noopener">Blogspot</a>.</p>
<p>Things eventually came full circle, as people realized that they don’t need the bloated and complicated mess of <abbr title="Content management system">CMSes</abbr> and blogging software, but at the same time appreciated the separation of presentation and data. Thus <a href="https://en.wikipedia.org/wiki/Static_site_generator" target="_blank" rel="noopener">Static site generators</a> were born<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
<p><abbr title="Static site generator">SSGs</abbr> allow users to write blog articles and pages as plain data in various simple formats like <a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown</a> or <a href="https://en.wikipedia.org/wiki/reStructuredText" target="_blank" rel="noopener">reStructuredText</a>, and configurations in <a href="https://en.wikipedia.org/wiki/YAML" target="_blank" rel="noopener">YAML</a>, <a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON</a> or <a href="https://en.wikipedia.org/wiki/TOML" target="_blank" rel="noopener">TOML</a>, and process them to produce static websites in HTML/CSS/JS.</p>
<p>Most <abbr title="Static site generator">SSGs</abbr> allow the user to operate in a default mode where you can follow the conventions of the <abbr title="Static site generator">SSG</abbr>—like writing the blog articles in certain formats, and putting them in certain directories—and the <abbr title="Static site generator">SSG</abbr> takes care of everything else. The user does not need to know any internals.</p>
<p>At the same time, most <abbr title="Static site generator">SSGs</abbr> allow users to customize the output website by creating custom templates, and custom URLs. However, all <abbr title="Static site generator">SSGs</abbr> limit what users can do with them. If you need to do something that goes against the grain of your <abbr title="Static site generator">SSG</abbr>, you are stuck.</p>
<h2 data-track-content data-content-name="build-systems" data-content-piece="static-site-generator-using-shake" id="build-systems">Build Systems</h2>
<p><abbr title="Static site generator">SSGs</abbr> are used to create websites by transforming a set of input files (templates, content, and assets) into a set of output files (HTML, CSS, and JavaScript files). In this sense, <abbr title="Static site generator">SSGs</abbr> can be seen as a type of build system, as they automate the process of building a website by following a set of rules and dependencies.</p>
<p>A build system is a tool for automating the process of building complex projects. Build systems are commonly used in software development to ensure that the correct sequence of steps is followed in order to produce a working version of the software. This typically involves compiling source code, linking libraries, and running tests to ensure that the software is correct. However, build systems can also be used for projects in other domains where a set of input files need to be transformed into a set of output files according to some rules.</p>
<p><a href="https://shakebuild.com/" target="_blank" rel="noopener">Shake</a> is a build system written in the <a href="https://haskell.org" target="_blank" rel="noopener">Haskell</a>. It is flexible and powerful enough for managing the build process of complex software projects like <a href="https://www.haskell.org/ghc/" target="_blank" rel="noopener">GHC</a>, but at the same time, it is simple enough to be used to create an <abbr title="Static site generator">SSG</abbr><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<h3 id="shake">Shake</h3>
<p>In Shake, build targets represent the files or outputs that need to be produced as part of the build process. These could be executable binaries, library files, or any other type of output that is required to complete the build. Build targets are declared in a build script, along with information about their dependencies. For example, if an executable binary depends on a particular library file, the build script would specify this dependency.</p>
<p>Once the build targets and their dependencies have been declared, Shake uses <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#t:Rules" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Rules</span></code></a> to specify how those targets should be built. A rule typically consists of a pattern that matches one or more targets, along with a set of instructions—called build <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#t:Action" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Action</span></code></a>s by Shake—for building them. For example, a rule might specify that a certain type of source code file should be compiled using a particular compiler, with a certain set of flags. When Shake encounters a target that matches the pattern in a rule, it executes the instructions in the rule to build it.</p>
<p>By declaring dependencies between targets and defining rules to build them , Shake is able to figure out the correct order in which to build the targets <a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. Shake also provides a number of features to help users customize and optimize their build process, such as support for parallel builds, on-demand rebuilding, and caching of intermediate results.</p>
<p>In this post, we use Shake to build an <abbr title="Static site generator">SSG</abbr> by defining the build targets and rules for building the website. In addition, we use <a href="https://pandoc.org/" target="_blank" rel="noopener">Pandoc</a> to render Markdown content into HTML, and <a href="https://hackage.haskell.org/package/mustache" target="_blank" rel="noopener">Mustache</a> to render HTML templates.</p>
<h2 data-track-content data-content-name="static-site-structure" data-content-piece="static-site-generator-using-shake" id="static-site-structure">Static Site Structure</h2>
<p>The source of our website is arranged like this:</p>
<pre class="plain w-100pct"><code>shake-blog
├── Site.hs
├── about.md
├── contact.md
├── css
│ └── default.css
├── images
│ └── logo.png
├── posts
│ ├── 2022-08-12-welcome.md
│ ├── 2022-10-07-hello-world.md
└── templates
├── archive.html
├── default.html
├── home.html
├── post-list.html
└── post.html</code></pre>
<p><code>Site.hs</code> contains the Haskell code that we are going to write in this post. <code>about.md</code> and <code>contact.md</code> are two static pages. The <code>css</code> and <code>images</code> directories contain assets for the website. The <code>posts</code> directory contains blog posts, names of which start with the post publication dates in <code>YYYY-mm-dd</code> format. Finally, the <code>templates</code> directory contains the Mustache templates for the website.</p>
<p>The blog posts start with YAML metadata sections that contain the title of the post, name of the author (optional) and a list of tags for the post. For example:</p>
<div id="lst:welcome.md" class="listing numberSource markdown">
<div class="sourceCode" id="cb2" data-lang="markdown"><pre class="sourceCode numberSource markdown"><code class="sourceCode markdown"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="an">title:</span><span class="co"> Welcome to my blog</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="an">author:</span><span class="co"> Abhinav Sarkar</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="an">tags:</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="co"> - brag</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="co"> - note</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="co">---</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>Welcome to my new blog. I wrote the blog generator myself.</span></code></pre></div>
<p><code>posts/2022-08-12-welcome.md</code></p>
</div>
<p>Pages are written in a similar fashion, but have only title in their YAML metadata.</p>
<p>After processing the input above, our <abbr title="Static site generator">SSG</abbr> produces the following file structure:</p>
<pre class="plain w-100pct"><code>_site/
├── about
│ └── index.html
├── archive
│ └── index.html
├── contact
│ └── index.html
├── css
│ └── default.css
├── images
│ └── logo.png
├── index.html
├── posts
│ ├── 2022-08-12-welcome
│ │ └── index.html
│ ├── 2022-10-07-hello-world
│ │ └── index.html
└── tags
├── brag
│ └── index.html
├── note
│ └── index.html
└── programming
└── index.html</code></pre>
<p>The CSS and image assets are copied directly. One <code>index.html</code> file is generated for each page, post, and tag. Additionally, one file is generated for the archive of posts, and one for the home page.</p>
<p>With the input and output described, let’s get started with writing the generator.</p>
<h2 data-track-content data-content-name="main" data-content-piece="static-site-generator-using-shake" id="main">Main</h2>
<p>We are going to write the program in a top-down fashion, starting with the <code>main</code> function. First come the extensions and imports. Other than imports from Shake, Pandoc and Mustache libraries, we also import from <a href="https://hackage.haskell.org/package/aeson" target="_blank" rel="noopener">aeson</a>, <a href="https://hackage.haskell.org/package/text" target="_blank" rel="noopener">text</a>, <a href="https://hackage.haskell.org/package/time" target="_blank" rel="noopener">time</a> and <a href="https://hackage.haskell.org/package/unordered-containers" target="_blank" rel="noopener">unordered-containers</a> libraries<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE ApplicativeDo, DataKinds, DeriveGeneric #-}</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DerivingVia, LambdaCase, TypeApplications #-}</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (forM, void)</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Aeson.Types</span> (<span class="dt">Result</span> (..))</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (nub, sortOn)</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Text</span> (<span class="dt">Text</span>)</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Time</span> (<span class="dt">UTCTime</span>, defaultTimeLocale, formatTime, parseTimeM)</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Deriving.Aeson</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Deriving.Aeson.Stock</span> (<span class="dt">PrefixedSnake</span>)</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Development.Shake</span> (<span class="dt">Action</span>, <span class="dt">Rules</span>, (%>), (|%>), (~>))</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Development.Shake.FilePath</span> ((<.>), (</>))</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Pandoc</span> (<span class="dt">Block</span> (<span class="dt">Plain</span>), <span class="dt">Meta</span> (..), <span class="dt">MetaValue</span> (..), <span class="dt">Pandoc</span> (..))</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Aeson.Types</span> <span class="kw">as</span> <span class="dt">A</span></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.HashMap.Strict</span> <span class="kw">as</span> <span class="dt">HM</span></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Ord</span> <span class="kw">as</span> <span class="dt">Ord</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Text</span> <span class="kw">as</span> <span class="dt">T</span></span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Development.Shake</span> <span class="kw">as</span> <span class="dt">Shake</span></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Development.Shake.FilePath</span> <span class="kw">as</span> <span class="dt">Shake</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.Mustache</span> <span class="kw">as</span> <span class="dt">Mus</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.Mustache.Compile</span> <span class="kw">as</span> <span class="dt">Mus</span></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.Pandoc</span> <span class="kw">as</span> <span class="dt">Pandoc</span></span></code></pre></div>
<p>The <code>main</code> function sets up the top-level Shake build targets, and lets Shake invoke the right one depending on the arguments passed at runtime.</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> Shake.shakeArgs Shake.shakeOptions <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> Shake.withTargetDocs <span class="st">"Build the site"</span> <span class="op">$</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"build"</span> <span class="op">~></span> buildTargets</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> Shake.withTargetDocs <span class="st">"Clean the built site"</span> <span class="op">$</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"clean"</span> <span class="op">~></span> Shake.removeFilesAfter outputDir [<span class="st">"//*"</span>]</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> Shake.withoutTargets buildRules</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a><span class="ot">outputDir ::</span> <span class="dt">String</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>outputDir <span class="ot">=</span> <span class="st">"_site"</span></span></code></pre></div>
<p>There are two top-level build targets:</p>
<ol type="1">
<li><code>build</code>: generates the website.</li>
<li><code>clean</code>: deletes the generated website.</li>
</ol>
<p><code>outputDir</code> is the subdirectory in which the website is generated. Building the <code>clean</code> target deletes all files inside <code>outputDir</code>. The <code>build</code> target runs the <code>buildTargets</code> action that sets up the build targets for generating the site. The <code>buildRules</code> are also included in the Shake setup.</p>
<h2 data-track-content data-content-name="build-targets" data-content-piece="static-site-generator-using-shake" id="build-targets">Build Targets</h2>
<p>The <code>buildTargets</code> function sets up the build targets for the files to be generated by Shake.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="13-16"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">buildTargets ::</span> <span class="dt">Action</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>buildTargets <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> assetPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> assetGlobs</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> (outputDir <span class="op"></></span>) assetPaths</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> indexHtmlOutputPath pagePaths</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> postPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> indexHtmlOutputPath postPaths</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> (outputDir <span class="op"></></span>) [<span class="st">"archive/index.html"</span>, <span class="st">"index.html"</span>]</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> posts <span class="ot"><-</span> forM postPaths readPost</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> Shake.need</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> [ outputDir <span class="op"></></span> <span class="st">"tags"</span> <span class="op"></></span> T.unpack tag <span class="op"></></span> <span class="st">"index.html"</span></span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> post <span class="ot"><-</span> posts, tag <span class="ot"><-</span> postTags post ]</span></span></code></pre></div>
<p>The <a href="https://hackage.haskell.org/package/shake-0.19.6/docs/Development-Shake.html#v:need" target="_blank" rel="noopener"><code class="sourceCode haskell">Shake.need</code></a> function registers one or more targets with Shake.</p>
<p>For assets, we just want them to be copied to the <code>outputDir</code> at the same path.</p>
<p>Page and post target paths in the <code>outputDir</code> are stripped of their extensions and appended with <code>/index.html</code>. So a post sourced from <code>posts/example.md</code> ends up at <code><outputDir>/posts/example/index.html</code>.</p>
<p>We also register two composite targets for the post archive and the home page<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<p>The paths, globs and helper function are shown below:</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">assetGlobs ::</span> [<span class="dt">String</span>]</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>assetGlobs <span class="ot">=</span> [<span class="st">"css/*.css"</span>, <span class="st">"images/*.png"</span>]</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="ot">pagePaths ::</span> [<span class="dt">String</span>]</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>pagePaths <span class="ot">=</span> [<span class="st">"about.md"</span>, <span class="st">"contact.md"</span>]</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="ot">postGlobs ::</span> [<span class="dt">String</span>]</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>postGlobs <span class="ot">=</span> [<span class="st">"posts/*.md"</span>]</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a><span class="ot">indexHtmlOutputPath ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">FilePath</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>indexHtmlOutputPath srcPath <span class="ot">=</span></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> outputDir <span class="op"></></span> Shake.dropExtension srcPath <span class="op"></></span> <span class="st">"index.html"</span></span></code></pre></div>
<p>Now Shake knows what we want it to build. But how does it know how to build them? That’s what the build rules are for.</p>
<h2 data-track-content data-content-name="build-rules" data-content-piece="static-site-generator-using-shake" id="build-rules">Build Rules</h2>
<p>We have one build rule function for each build target type:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">buildRules ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>buildRules <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> assets</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> pages</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> posts</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> archive</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> tags</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> home</span></code></pre></div>
<p>Let’s start with the simplest one, the build rule for assets.</p>
<h3 id="assets">Assets</h3>
<p>In Shake, the build rules are written with <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:-124--37--62-" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="op">|%></span></code></a> or <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:-37--62-" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="op">%></span></code></a> operators. The <code class="sourceCode haskell"><span class="op">|%></span></code> operator takes a list of output globs or paths, and a function from target path to build action. When <code class="sourceCode haskell">Shake.need</code> is called with a file that matches a target glob, the corresponding build action is called with the target path.</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">assets ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>assets <span class="ot">=</span> <span class="fu">map</span> (outputDir <span class="op"></></span>) assetGlobs <span class="op">|%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> src <span class="ot">=</span> Shake.dropDirectory1 target</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> Shake.copyFileChanged src target</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Copied "</span> <span class="op"><></span> target <span class="op"><></span> <span class="st">" from "</span> <span class="op"><></span> src</span></code></pre></div>
<p>In case of assets, we simply get the original source path by dropping the first directory from the target path (that is, <code>outputDir</code>), and copy the source file to the target path if the file has changed<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>.</p>
<h3 id="pages">Pages</h3>
<p>Building pages is a bit more interesting. First, we write a data type to represent a page:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Page</span> <span class="ot">=</span> <span class="dt">Page</span> {<span class="ot">pageTitle ::</span> <span class="dt">Text</span>,<span class="ot"> pageContent ::</span> <span class="dt">Text</span>}</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Generic</span>)</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">ToJSON</span>) via <span class="dt">PrefixedSnake</span> <span class="st">"page"</span> <span class="dt">Page</span></span></code></pre></div>
<p>A page has a title and some text content. We also make <code class="sourceCode haskell"><span class="dt">Page</span></code> data type JSON serializable so that it can be consumed by the Mustache library for filling templates.</p>
<p>Now, the code that builds pages:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pages ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>pages <span class="ot">=</span> <span class="fu">map</span> indexHtmlOutputPath pagePaths <span class="op">|%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> src <span class="ot">=</span> indexHtmlSourcePath target</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> (meta, html) <span class="ot"><-</span> markdownToHtml src</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> page <span class="ot">=</span> <span class="dt">Page</span> (meta <span class="op">HM.!</span> <span class="st">"title"</span>) html</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> applyTemplateAndWrite <span class="st">"default.html"</span> page target</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Built "</span> <span class="op"><></span> target <span class="op"><></span> <span class="st">" from "</span> <span class="op"><></span> src</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a><span class="ot">indexHtmlSourcePath ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">FilePath</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a>indexHtmlSourcePath <span class="ot">=</span></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> Shake.dropDirectory1</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> (<span class="op"><.></span> <span class="st">"md"</span>)</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Shake.dropTrailingPathSeparator</span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Shake.dropFileName</span></code></pre></div>
<p>We get the source path from the target path by passing it through the <code>indexHtmlSourcePath</code> function. We read and render the source file by calling the <code>markdownToHtml</code> function. It returns the page YAML metadata as a <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson-Types.html#t:FromJSON" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">FromJSON</span></code></a>-able value (a <a href="https://hackage.haskell.org/package/unordered-containers/docs/Data-HashMap-Strict.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">HashMap</span></code></a> in this case), and the page HTML text.</p>
<p>Next, we apply the <code class="sourceCode haskell"><span class="dt">Page</span></code> data to the <code>default.html</code> template, and write it to the target path by calling the <code>applyTemplateAndWrite</code> function. This creates the HTML file for the page.</p>
<p>The <code>default.html</code> Mustache template can be seen below:</p>
<details>
<summary class="print-em">
<code>templates/default.html</code>
</summary>
<div class="sourceCode" id="cb14" data-lang="mustache"><pre class="sourceCode numberSource mustache"><code class="sourceCode mustache"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="dt"><!DOCTYPE</span> html<span class="dt">></span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="kw"><html</span><span class="ot"> lang</span><span class="op">=</span><span class="st">"en"</span><span class="kw">></span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="kw"><head></span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="kw"><meta</span><span class="ot"> charset</span><span class="op">=</span><span class="st">"UTF-8"</span><span class="kw">></span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="kw"><meta</span><span class="ot"> name</span><span class="op">=</span><span class="st">"viewport"</span><span class="ot"> content</span><span class="op">=</span><span class="st">"width=device-width, initial-scale=1.0"</span><span class="kw">></span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="kw"><meta</span><span class="ot"> http-equiv</span><span class="op">=</span><span class="st">"X-UA-Compatible"</span><span class="ot"> content</span><span class="op">=</span><span class="st">"ie=edge"</span><span class="kw">></span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="kw"><title></span>My Shake Blog — <span class="sc">{{{</span>title<span class="sc">}}}</span><span class="kw"></title></span></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> <span class="kw"><link</span><span class="ot"> rel</span><span class="op">=</span><span class="st">"stylesheet"</span><span class="ot"> type</span><span class="op">=</span><span class="st">"text/css"</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/css/default.css"</span> <span class="kw">/></span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="kw"></head></span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="kw"><body></span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="kw"><div</span><span class="ot"> id</span><span class="op">=</span><span class="st">"header"</span><span class="kw">></span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> <span class="kw"><div</span><span class="ot"> id</span><span class="op">=</span><span class="st">"logo"</span><span class="kw">></span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/"</span><span class="kw">></span>My Shake Blog<span class="kw"></a></span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> <span class="kw"></div></span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a> <span class="kw"><div</span><span class="ot"> id</span><span class="op">=</span><span class="st">"navigation"</span><span class="kw">></span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/"</span><span class="kw">></span>Home<span class="kw"></a></span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/about/"</span><span class="kw">></span>About<span class="kw"></a></span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/contact/"</span><span class="kw">></span>Contact<span class="kw"></a></span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/archive/"</span><span class="kw">></span>Archive<span class="kw"></a></span></span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a> <span class="kw"></div></span></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a> <span class="kw"></div></span></span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a> <span class="kw"><div</span><span class="ot"> id</span><span class="op">=</span><span class="st">"content"</span><span class="kw">></span></span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a> <span class="kw"><h1></span><span class="sc">{{{</span>title<span class="sc">}}}</span><span class="kw"></h1></span></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a> <span class="sc">{{{</span>content<span class="sc">}}}</span></span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a> <span class="kw"></div></span></span>
<span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a> <span class="kw"><div</span><span class="ot"> id</span><span class="op">=</span><span class="st">"footer"</span><span class="kw">></span></span>
<span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a> Site proudly generated by <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"https://shakebuild.com"</span><span class="kw">></span>Shake<span class="kw"></a></span></span>
<span id="cb14-28"><a href="#cb14-28" aria-hidden="true" tabindex="-1"></a> <span class="kw"></div></span></span>
<span id="cb14-29"><a href="#cb14-29" aria-hidden="true" tabindex="-1"></a><span class="kw"></body></span></span>
<span id="cb14-30"><a href="#cb14-30" aria-hidden="true" tabindex="-1"></a><span class="kw"></html></span></span></code></pre></div>
</details>
<h3 id="posts">Posts</h3>
<p>Building posts is similar to building pages. We have a data type for posts:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Post</span> <span class="ot">=</span> <span class="dt">Post</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> postTitle ::</span> <span class="dt">Text</span>,</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> postAuthor ::</span> <span class="dt">Maybe</span> <span class="dt">Text</span>,</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ot"> postTags ::</span> [<span class="dt">Text</span>],</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="ot"> postDate ::</span> <span class="dt">Maybe</span> <span class="dt">Text</span>,</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="ot"> postContent ::</span> <span class="dt">Maybe</span> <span class="dt">Text</span>,</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ot"> postLink ::</span> <span class="dt">Maybe</span> <span class="dt">Text</span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> } <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Generic</span>)</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">FromJSON</span>, <span class="dt">ToJSON</span>) via <span class="dt">PrefixedSnake</span> <span class="st">"post"</span> <span class="dt">Post</span></span></code></pre></div>
<p>Other than the title and text content, a post also has a date, a list of tags, an optional author, and a permalink. Some of these data come from the post YAML metadata, and some are derived from the post source path. as we see below:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">posts ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>posts <span class="ot">=</span> <span class="fu">map</span> indexHtmlOutputPath postGlobs <span class="op">|%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> src <span class="ot">=</span> indexHtmlSourcePath target</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> post <span class="ot"><-</span> readPost src</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> postHtml <span class="ot"><-</span> applyTemplate <span class="st">"post.html"</span> post</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> page <span class="ot">=</span> <span class="dt">Page</span> (postTitle post) postHtml</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> applyTemplateAndWrite <span class="st">"default.html"</span> page target</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Built "</span> <span class="op"><></span> target <span class="op"><></span> <span class="st">" from "</span> <span class="op"><></span> src</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a><span class="ot">readPost ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Action</span> <span class="dt">Post</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a>readPost postPath <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> date <span class="ot"><-</span> parseTimeM <span class="dt">False</span> defaultTimeLocale <span class="st">"%Y-%-m-%-d"</span></span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">take</span> <span class="dv">10</span></span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Shake.takeBaseName</span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> postPath</span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> formattedDate <span class="ot">=</span></span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a> T.pack <span class="op">$</span> formatTime <span class="op">@</span><span class="dt">UTCTime</span> defaultTimeLocale <span class="st">"%b %e, %Y"</span> date</span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a> (post, html) <span class="ot"><-</span> markdownToHtml postPath</span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Read "</span> <span class="op"><></span> postPath</span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">$</span> post</span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a> { postDate <span class="ot">=</span> <span class="dt">Just</span> formattedDate,</span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a> postContent <span class="ot">=</span> <span class="dt">Just</span> html,</span>
<span id="cb16-25"><a href="#cb16-25" aria-hidden="true" tabindex="-1"></a> postLink <span class="ot">=</span> <span class="dt">Just</span> <span class="op">.</span> T.pack <span class="op">$</span> <span class="st">"/"</span> <span class="op"><></span> Shake.dropExtension postPath <span class="op"><></span> <span class="st">"/"</span></span>
<span id="cb16-26"><a href="#cb16-26" aria-hidden="true" tabindex="-1"></a> }</span></code></pre></div>
<p>We call the <code>readPost</code> function, which parses the post date from the post path, and renders the post text using the <code>markdownToHtml</code> function. We then apply the <code class="sourceCode haskell"><span class="dt">Post</span></code> data to the <code>post.html</code> template to create the templated HTML content. Finally, we create the <code class="sourceCode haskell"><span class="dt">Page</span></code> data from the rendered post, apply it to the <code>default.html</code> template, and write the final HTML file to the target path.</p>
<p>The template for the post page can be seen below:</p>
<details>
<summary class="print-em">
<code>templates/post.html</code>
</summary>
<div class="sourceCode" id="cb17" data-lang="mustache"><pre class="sourceCode numberSource mustache"><code class="sourceCode mustache"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw"><div</span><span class="ot"> class</span><span class="op">=</span><span class="st">"info"</span><span class="kw">></span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a> Posted on <span class="sc">{{{</span>date<span class="sc">}}}</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{#author}}</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> by <span class="sc">{{{</span>author<span class="sc">}}}</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{/author}}</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="kw"></div></span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="kw"><div</span><span class="ot"> class</span><span class="op">=</span><span class="st">"info"</span><span class="kw">></span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> Tags:</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> <span class="kw"><ul</span><span class="ot"> class</span><span class="op">=</span><span class="st">"tags"</span><span class="kw">></span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{#tags}}</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> <span class="kw"><li><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/tags/</span><span class="sc">{{{</span><span class="op">.</span><span class="sc">}}}</span><span class="st">/"</span><span class="kw">></span><span class="sc">{{{</span><span class="op">.</span><span class="sc">}}}</span><span class="kw"></a></li></span></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{/tags}}</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a> <span class="kw"></ul></span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a><span class="kw"></div></span></span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a><span class="sc">{{{</span>content<span class="sc">}}}</span></span></code></pre></div>
</details>
<h3 id="archive">Archive</h3>
<p>The archive page is a bit more involved. We read all the posts, and sort them by date. Then we apply the <code>archive.html</code> template, and then the <code>default.html</code> template to create the final HTML file, as shown below:</p>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">archive ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>archive <span class="ot">=</span> outputDir <span class="op"></></span> <span class="st">"archive/index.html"</span> <span class="op">%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a> postPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a> posts <span class="ot"><-</span> sortOn (<span class="dt">Ord</span><span class="op">.</span><span class="dt">Down</span> <span class="op">.</span> postDate) <span class="op"><$></span> forM postPaths readPost</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a> writeArchive (T.pack <span class="st">"Archive"</span>) posts target</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a><span class="ot">writeArchive ::</span> <span class="dt">Text</span> <span class="ot">-></span> [<span class="dt">Post</span>] <span class="ot">-></span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Action</span> ()</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>writeArchive title posts target <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a> html <span class="ot"><-</span> applyTemplate <span class="st">"archive.html"</span> <span class="op">$</span> HM.singleton <span class="st">"posts"</span> posts</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> applyTemplateAndWrite <span class="st">"default.html"</span> (<span class="dt">Page</span> title html) target</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Built "</span> <span class="op"><></span> target</span></code></pre></div>
<p>The <code>archive.html</code> template transcludes the <code>post-list.html</code> template for reuse with the home page.</p>
<details>
<summary class="print-em">
<code>templates/archive.html</code>
</summary>
<div class="sourceCode" id="cb19" data-lang="mustache"><pre class="sourceCode numberSource mustache"><code class="sourceCode mustache"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a>My posts:</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="va">{{></span><span class="ch"> templates/post-list</span><span class="op">.</span><span class="ch">html </span><span class="va">}}</span></span></code></pre></div>
</details>
<details>
<summary class="print-em">
<code>templates/post-list.html</code>
</summary>
<div class="sourceCode" id="cb20" data-lang="mustache"><pre class="sourceCode numberSource mustache"><code class="sourceCode mustache"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="kw"><ul></span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{#posts}}</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="kw"><li></span></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"</span><span class="sc">{{{link}}}</span><span class="st">"</span><span class="kw">></span><span class="sc">{{{</span>title<span class="sc">}}}</span><span class="kw"></a></span> - <span class="sc">{{{</span>date<span class="sc">}}}</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="kw"></li></span></span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">{{/posts}}</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a><span class="kw"></ul></span></span></code></pre></div>
</details>
<h3 id="tags">Tags</h3>
<p>Now, we build a page for each post tag. Step one is to read all the posts, collect the tags, and add build targets for each tag. We do this in the <code>buildTargets</code> function, as shown in the emphasized code below:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-emphasize="13-16"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">buildTargets ::</span> <span class="dt">Action</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>buildTargets <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> assetPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> assetGlobs</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> (outputDir <span class="op"></></span>) assetPaths</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> indexHtmlOutputPath pagePaths</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> postPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> indexHtmlOutputPath postPaths</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span> <span class="fu">map</span> (outputDir <span class="op"></></span>) [<span class="st">"archive/index.html"</span>, <span class="st">"index.html"</span>]</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> posts <span class="ot"><-</span> forM postPaths readPost</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> Shake.need</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> [ outputDir <span class="op"></></span> <span class="st">"tags"</span> <span class="op"></></span> T.unpack tag <span class="op"></></span> <span class="st">"index.html"</span></span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="emphasis"> <span class="op">|</span> post <span class="ot"><-</span> posts, tag <span class="ot"><-</span> postTags post ]</span></span></code></pre></div>
<p>Next, we implement the build rule for tags:</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">tags ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>tags <span class="ot">=</span> outputDir <span class="op"></></span> <span class="st">"tags/*/index.html"</span> <span class="op">%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> tag <span class="ot">=</span> T.pack <span class="op">$</span> Shake.splitDirectories target <span class="op">!!</span> <span class="dv">2</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> postPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> posts <span class="ot"><-</span> sortOn (<span class="dt">Ord</span><span class="op">.</span><span class="dt">Down</span> <span class="op">.</span> postDate)</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">filter</span> ((tag <span class="ot">`elem`</span>) <span class="op">.</span> postTags)</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> forM postPaths readPost</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> writeArchive (T.pack <span class="st">"Posts tagged "</span> <span class="op"><></span> tag) posts target</span></code></pre></div>
<p>First, we parse the tag from the target path. We then read all the posts, filter them by tag, and render the tag page using the <code>writeArchive</code> function that we use for the archive page.</p>
<h3 id="home">Home</h3>
<p>Finally, we come to the home page. It is quite similar to the archive page, except that we only show the first few posts<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a><a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">home ::</span> <span class="dt">Rules</span> ()</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>home <span class="ot">=</span> outputDir <span class="op"></></span> <span class="st">"index.html"</span> <span class="op">%></span> \target <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> postPaths <span class="ot"><-</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> posts <span class="ot"><-</span> <span class="fu">take</span> <span class="dv">3</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> sortOn (<span class="dt">Ord</span><span class="op">.</span><span class="dt">Down</span> <span class="op">.</span> postDate)</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> forM postPaths readPost</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> html <span class="ot"><-</span> applyTemplate <span class="st">"home.html"</span> <span class="op">$</span> HM.singleton <span class="st">"posts"</span> posts</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> page <span class="ot">=</span> <span class="dt">Page</span> (T.pack <span class="st">"Home"</span>) html</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a> applyTemplateAndWrite <span class="st">"default.html"</span> page target</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Built "</span> <span class="op"><></span> target</span></code></pre></div>
<p>The <code>home.html</code> template also transcludes the <code>post-list.html</code> template:</p>
<details>
<summary class="print-em">
<code>templates/home.html</code>
</summary>
<div class="sourceCode" id="cb23" data-lang="mustache"><pre class="sourceCode numberSource mustache"><code class="sourceCode mustache"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="kw"><h2></span>Welcome<span class="kw"></h2></span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a><span class="kw"><img</span><span class="ot"> src</span><span class="op">=</span><span class="st">"/images/logo.png"</span><span class="ot"> style</span><span class="op">=</span><span class="st">"float: right; margin: 10px;"</span> <span class="kw">/></span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a><span class="kw"><p></span>Welcome to my blog!<span class="kw"></p></span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a><span class="kw"><p></span>My recent posts here for your reading pleasure:<span class="kw"></p></span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a><span class="kw"><h2></span>Posts<span class="kw"></h2></span></span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a><span class="va">{{></span><span class="ch"> templates/post-list</span><span class="op">.</span><span class="ch">html </span><span class="va">}}</span></span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a><span class="kw"><p></span>You can find all posts in the <span class="kw"><a</span><span class="ot"> href</span><span class="op">=</span><span class="st">"/archive/"</span><span class="kw">></span>archives<span class="kw"></a></span>.</span></code></pre></div>
</details>
<p>That’s it for the build rules. We have covered all the targets that we defined in the <code>buildTargets</code> function. Next, we look at the Pandoc and Mustache utilities that we use in the build rules.</p>
<h2 data-track-content data-content-name="utilities" data-content-piece="static-site-generator-using-shake" id="utilities">Utilities</h2>
<p>We use the Pandoc library to render Markdown to HTML. We also use the Mustache library to render the generated HTML with the Mustache templates. We wrap these libraries in a few utility functions, as shown in the next sections.</p>
<h3 id="pandoc">Pandoc</h3>
<p>We wrap Pandoc’s Markdown-to-HTML rendering to make it a Shake build action. We also parse the YAML metadata from the Markdown source, and return it as a <code class="sourceCode haskell"><span class="dt">FromJSON</span></code>-able value<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">markdownToHtml ::</span> <span class="dt">FromJSON</span> a <span class="ot">=></span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Action</span> (a, <span class="dt">Text</span>)</span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>markdownToHtml filePath <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> content <span class="ot"><-</span> Shake.readFile' filePath</span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> Shake.quietly <span class="op">.</span> Shake.traced <span class="st">"Markdown to HTML"</span> <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a> pandoc<span class="op">@</span>(<span class="dt">Pandoc</span> meta _) <span class="ot"><-</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a> runPandoc <span class="op">.</span> Pandoc.readMarkdown readerOptions <span class="op">.</span> T.pack <span class="op">$</span> content</span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a> meta' <span class="ot"><-</span> fromMeta meta</span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a> html <span class="ot"><-</span> runPandoc <span class="op">.</span> Pandoc.writeHtml5String writerOptions <span class="op">$</span> pandoc</span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> (meta', html)</span>
<span id="cb24-10"><a href="#cb24-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb24-11"><a href="#cb24-11" aria-hidden="true" tabindex="-1"></a> readerOptions <span class="ot">=</span></span>
<span id="cb24-12"><a href="#cb24-12" aria-hidden="true" tabindex="-1"></a> Pandoc.def {Pandoc.readerExtensions <span class="ot">=</span> Pandoc.pandocExtensions}</span>
<span id="cb24-13"><a href="#cb24-13" aria-hidden="true" tabindex="-1"></a> writerOptions <span class="ot">=</span></span>
<span id="cb24-14"><a href="#cb24-14" aria-hidden="true" tabindex="-1"></a> Pandoc.def {Pandoc.writerExtensions <span class="ot">=</span> Pandoc.pandocExtensions}</span>
<span id="cb24-15"><a href="#cb24-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-16"><a href="#cb24-16" aria-hidden="true" tabindex="-1"></a> fromMeta (<span class="dt">Meta</span> meta) <span class="ot">=</span></span>
<span id="cb24-17"><a href="#cb24-17" aria-hidden="true" tabindex="-1"></a> A.fromJSON <span class="op">.</span> A.toJSON <span class="op"><$></span> <span class="fu">traverse</span> metaValueToJSON meta <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb24-18"><a href="#cb24-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">Success</span> res <span class="ot">-></span> <span class="fu">pure</span> res</span>
<span id="cb24-19"><a href="#cb24-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> err <span class="ot">-></span> <span class="fu">fail</span> <span class="op">$</span> <span class="st">"json conversion error:"</span> <span class="op"><></span> err</span>
<span id="cb24-20"><a href="#cb24-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-21"><a href="#cb24-21" aria-hidden="true" tabindex="-1"></a> metaValueToJSON <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb24-22"><a href="#cb24-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaMap</span> m <span class="ot">-></span> A.toJSON <span class="op"><$></span> <span class="fu">traverse</span> metaValueToJSON m</span>
<span id="cb24-23"><a href="#cb24-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaList</span> m <span class="ot">-></span> A.toJSONList <span class="op"><$></span> <span class="fu">traverse</span> metaValueToJSON m</span>
<span id="cb24-24"><a href="#cb24-24" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaBool</span> m <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> A.toJSON m</span>
<span id="cb24-25"><a href="#cb24-25" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaString</span> m <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> A.toJSON <span class="op">$</span> T.strip m</span>
<span id="cb24-26"><a href="#cb24-26" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaInlines</span> m <span class="ot">-></span> metaValueToJSON <span class="op">$</span> <span class="dt">MetaBlocks</span> [<span class="dt">Plain</span> m]</span>
<span id="cb24-27"><a href="#cb24-27" aria-hidden="true" tabindex="-1"></a> <span class="dt">MetaBlocks</span> m <span class="ot">-></span></span>
<span id="cb24-28"><a href="#cb24-28" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> (A.toJSON <span class="op">.</span> T.strip)</span>
<span id="cb24-29"><a href="#cb24-29" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runPandoc</span>
<span id="cb24-30"><a href="#cb24-30" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Pandoc.writePlain Pandoc.def</span>
<span id="cb24-31"><a href="#cb24-31" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> <span class="dt">Pandoc</span> <span class="fu">mempty</span> m</span>
<span id="cb24-32"><a href="#cb24-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-33"><a href="#cb24-33" aria-hidden="true" tabindex="-1"></a> runPandoc action <span class="ot">=</span></span>
<span id="cb24-34"><a href="#cb24-34" aria-hidden="true" tabindex="-1"></a> Pandoc.runIO (Pandoc.setVerbosity <span class="dt">Pandoc.ERROR</span> <span class="op">>></span> action)</span>
<span id="cb24-35"><a href="#cb24-35" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">either</span> (<span class="fu">fail</span> <span class="op">.</span> <span class="fu">show</span>) <span class="fu">return</span></span></code></pre></div>
<h3 id="mustache">Mustache</h3>
<p>We wrap Mustache’s template reading and rendering to make them Shake build actions.</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">applyTemplate ::</span> <span class="dt">ToJSON</span> a <span class="ot">=></span> <span class="dt">String</span> <span class="ot">-></span> a <span class="ot">-></span> <span class="dt">Action</span> <span class="dt">Text</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>applyTemplate templateName context <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> tmpl <span class="ot"><-</span> readTemplate <span class="op">$</span> <span class="st">"templates"</span> <span class="op"></></span> templateName</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> Mus.checkedSubstitute tmpl (A.toJSON context) <span class="kw">of</span></span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> ([], text) <span class="ot">-></span> <span class="fu">return</span> text</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> (errs, _) <span class="ot">-></span> <span class="fu">fail</span> <span class="op">$</span></span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> <span class="st">"Error while substituting template "</span> <span class="op"><></span> templateName</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">": "</span> <span class="op"><></span> <span class="fu">unlines</span> (<span class="fu">map</span> <span class="fu">show</span> errs)</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a><span class="ot">applyTemplateAndWrite ::</span> <span class="dt">ToJSON</span> a <span class="ot">=></span> <span class="dt">String</span> <span class="ot">-></span> a <span class="ot">-></span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Action</span> ()</span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a>applyTemplateAndWrite templateName context outputPath <span class="ot">=</span></span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> applyTemplate templateName context</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> Shake.writeFile' outputPath <span class="op">.</span> T.unpack</span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a><span class="ot">readTemplate ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">Action</span> <span class="dt">Mus.Template</span></span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a>readTemplate templatePath <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a> Shake.need [templatePath]</span>
<span id="cb25-18"><a href="#cb25-18" aria-hidden="true" tabindex="-1"></a> eTemplate <span class="ot"><-</span> Shake.quietly</span>
<span id="cb25-19"><a href="#cb25-19" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Shake.traced <span class="st">"Compile template"</span></span>
<span id="cb25-20"><a href="#cb25-20" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> Mus.localAutomaticCompile templatePath</span>
<span id="cb25-21"><a href="#cb25-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> eTemplate <span class="kw">of</span></span>
<span id="cb25-22"><a href="#cb25-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> template <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb25-23"><a href="#cb25-23" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">.</span> Mus.getPartials <span class="op">.</span> Mus.ast <span class="op">$</span> template</span>
<span id="cb25-24"><a href="#cb25-24" aria-hidden="true" tabindex="-1"></a> Shake.putInfo <span class="op">$</span> <span class="st">"Read "</span> <span class="op"><></span> templatePath</span>
<span id="cb25-25"><a href="#cb25-25" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> template</span>
<span id="cb25-26"><a href="#cb25-26" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="fu">fail</span> <span class="op">$</span> <span class="fu">show</span> err</span></code></pre></div>
<p>The <code>readTemplate</code> function specially takes care of marking the template (and its transcluded templates) as dependencies of pages that use them. By doing this, Shake rebuilds the pages if any of the templates change.</p>
<h2 data-track-content data-content-name="building-the-blog" data-content-piece="static-site-generator-using-shake" id="building-the-blog">Building the Blog</h2>
<p>We are now ready to run the build:</p>
<details>
<summary class="print-hide">
Build log
</summary>
<pre class="plain"><code>$ ./Site.hs clean
Build completed in 0.02s
$ ./Site.hs build
Copied _site/images/logo.png from images/logo.png
Copied _site/css/default.css from css/default.css
Read templates/default.html
Built _site/contact/index.html from contact.md
Read templates/default.html
Built _site/about/index.html from about.md
Read posts/2022-10-07-hello-world.md
Read templates/post.html
Read templates/default.html
Built _site/posts/2022-10-07-hello-world/index.html from posts/2022-10-07-hello-world.md
Read posts/2022-08-12-welcome.md
Read templates/post.html
Read templates/default.html
Built _site/posts/2022-08-12-welcome/index.html from posts/2022-08-12-welcome.md
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read templates/home.html
Read templates/default.html
Built _site/index.html
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read templates/archive.html
Read templates/default.html
Built _site/archive/index.html
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read templates/archive.html
Read templates/default.html
Built _site/tags/programming/index.html
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read templates/archive.html
Read templates/default.html
Built _site/tags/note/index.html
Read posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Read templates/archive.html
Read templates/default.html
Built _site/tags/brag/index.html
Build completed in 0.10s</code></pre>
</details>
<p>The logs show that Shake built all the targets that we define in the <code>buildTargets</code> function<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a><a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>.</p>
<p>Next, we look into some helpful Shake specific features.</p>
<h2 data-track-content data-content-name="shake-features" data-content-piece="static-site-generator-using-shake" id="shake-features">Shake Features</h2>
<p>Being a generic build system, Shake has some unique features that are not found in most other <abbr title="Static site generator">SSGs</abbr>. In this section, we look at some of these features.</p>
<h3 id="caching">Caching</h3>
<p>As we see in the build log above, the posts and templates are read multiple times. This is because Shake does not cache the dependencies of build rules by default. However, we can add caching by using the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:newCacheIO" target="_blank" rel="noopener"><code>newCacheIO</code></a> function<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a><a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a>. Once we add caching, the build log shows that the posts and templates are read only once:</p>
<details>
<summary class="print-hide">
Build log
</summary>
<pre class="plain"><code>Copied _site/images/logo.png from images/logo.png
Copied _site/css/default.css from css/default.css
Read templates/default.html
Built _site/contact/index.html from contact.md
Built _site/about/index.html from about.md
Read posts/2022-08-12-welcome.md
Read templates/post.html
Built _site/posts/2022-08-12-welcome/index.html from posts/2022-08-12-welcome.md
Read posts/2022-10-07-hello-world.md
Built _site/posts/2022-10-07-hello-world/index.html from posts/2022-10-07-hello-world.md
Read templates/home.html
Built _site/index.html
Read templates/archive.html
Built _site/archive/index.html
Built _site/tags/programming/index.html
Built _site/tags/note/index.html
Built _site/tags/brag/index.html
Build completed in 0.03s</code></pre>
</details>
<h3 id="parallelism">Parallelism</h3>
<p>Shake can run build actions in parallel. We can enable parallelism by using the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:shakeThreads" target="_blank" rel="noopener"><code>shakeThreads</code></a> configuration option, or by using the <code>--jobs</code> command line option. Enabling parallel builds can reduce build times significantly.</p>
<p>Shake tries to automatically detect which build actions can be run in parallel. However, we can specify it explicitly as well. We explore this in the <a href="#tips-and-tricks">Tips and Tricks</a> section.</p>
<h3 id="fine-grain-dependency-management">Fine-grain Dependency Management</h3>
<p>Using the <code>Shake.need</code> function, we can explicitly specify the dependencies of a build target. For example, we can use it to mark the <code>Site.hs</code> file as a dependency of all targets. This way, Shake rebuilds the site if the build script changes. We have already seen how we can use it to mark the templates as dependencies of pages that use them.</p>
<h3 id="traces-and-reports">Traces and Reports</h3>
<p>Shake can be instructed to generate build traces and reports. These can be used to understand/debug/improve the builds. We can enable these features by using the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:shakeReport" target="_blank" rel="noopener"><code>shakeReport</code></a> configuration option, or by using the <code>--report</code> command line option.</p>
<p>The report generated by Shake shows time taken by each build rule, their dependency graph, and the command plot traced by the <code>Shake.traced</code> function. For example, here is the command plot for a build of the website you are reading right now<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a>:</p>
<details>
<summary class="print-hide">
Command plot of a build of this website
</summary>
<figure>
<a href="https://abhinavsarkar.net/images/static-site-generator-using-shake/command-plot.png" class="img-link"><picture><source srcset="/images/static-site-generator-using-shake/command-plot.webp" type="image/webp"><img src class="lazyload" data-src="/images/static-site-generator-using-shake/command-plot.png" alt="Command plot of a build of this website"></img></picture>
<noscript><picture><source srcset="/images/static-site-generator-using-shake/command-plot.webp" type="image/webp"><img src="/images/static-site-generator-using-shake/command-plot.png" alt="Command plot of a build of this website"></img></picture></noscript></a>
<figcaption>Command plot of a build of this website</figcaption>
</figure>
</details>
<p>The traces can be viewed using a trace viewer like <a href="https://ui.perfetto.dev/" target="_blank" rel="noopener">Perfetto</a>. For example, here is a trace of a build of this website:</p>
<details>
<summary class="print-hide">
Trace of a build of this website
</summary>
<figure>
<a href="https://abhinavsarkar.net/images/static-site-generator-using-shake/trace.png" class="img-link"><picture><source srcset="/images/static-site-generator-using-shake/trace.webp" type="image/webp"><img src class="lazyload" data-src="/images/static-site-generator-using-shake/trace.png" alt="Trace of a build of this website"></img></picture>
<noscript><picture><source srcset="/images/static-site-generator-using-shake/trace.webp" type="image/webp"><img src="/images/static-site-generator-using-shake/trace.png" alt="Trace of a build of this website"></img></picture></noscript></a>
<figcaption>Trace of a build of this website</figcaption>
</figure>
</details>
<h3 id="errors">Errors</h3>
<p>Shake provides detailed error messages when builds fail. For example, here is the error message when a build fails due to a missing template:</p>
<pre class="plain"><code>Error when running Shake build system:
at want, called at src/Development/Shake/Internal/Args.hs:83:67 in shake-0.19.7-IRPInZXX5QOAqz04qHWdHp:Development.Shake.Internal.Args
* Depends on: build
at need, called at Site.hs:54:3 in main:Main
* Depends on: _site/posts/2022-10-07-hello-world/index.html
* Depends on: templates/post.html
at error, called at src/Development/Shake/Internal/Rules/File.hs:179:58 in shake-0.19.7-IRPInZXX5QOAqz04qHWdHp:Development.Shake.Internal.Rules.File
* Raised the exception:
Error, file does not exist and no rule available:
templates/post.html</code></pre>
<p>To learn more about Shake, read the Shake <a href="https://shakebuild.com/manual" target="_blank" rel="noopener">manual</a> and the <a href="https://shakebuild.com/faq" target="_blank" rel="noopener">FAQ</a>.</p>
<h2 data-track-content data-content-name="tips-and-tricks" data-content-piece="static-site-generator-using-shake" id="tips-and-tricks">Tips and Tricks</h2>
<p>Let’s look at some tips and tricks that can be used to improve the build.</p>
<h3 id="explicit-parallelism">Explicit Parallelism</h3>
<p>Shake is a monadic build system. That means, while the build actions are executing for a build target, they can add new dependencies for the target. These dependencies can depend on the result of previous build actions. So, Shake cannot know all the dependencies of a build target before the build actions for it are executed. This makes it difficult for Shake to automatically detect which build actions can be run in parallel.</p>
<p>However, we can explicitly specify it by using the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:parallel" target="_blank" rel="noopener"><code>parallel</code></a>, and <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:forP" target="_blank" rel="noopener"><code>forP</code></a>, and <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:par" target="_blank" rel="noopener"><code>par</code></a> functions<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a>. Additionally, Shake also builds all builds targets specified in a single <code>Shake.need</code> call in parallel. Here is how we can improve the parallelism of our <abbr title="Static site generator">SSG</abbr> using these functions:</p>
<div class="sourceCode" id="cb29" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">buildTargetsParallel ::</span> <span class="dt">Action</span> ()</span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>buildTargetsParallel <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a> (assetPaths, postPaths) <span class="ot"><-</span></span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a> Shake.getDirectoryFiles <span class="st">""</span> assetGlobs</span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a> <span class="ot">`Shake.par`</span> Shake.getDirectoryFiles <span class="st">""</span> postGlobs</span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a> posts <span class="ot"><-</span> Shake.forP postPaths readPost</span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a> void <span class="op">$</span> Shake.parallel [</span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a> Shake.need <span class="op">$</span></span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> (outputDir <span class="op"></></span>)</span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a> (assetPaths <span class="op"><></span> [<span class="st">"archive/index.html"</span>, <span class="st">"index.html"</span>]</span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> [<span class="st">"tags"</span> <span class="op"></></span> T.unpack tag <span class="op"></></span> <span class="st">"index.html"</span></span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> post <span class="ot"><-</span> posts, tag <span class="ot"><-</span> postTags post])</span>
<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a> , Shake.need <span class="op">$</span> <span class="fu">map</span> indexHtmlOutputPath (pagePaths <span class="op"><></span> postPaths)</span>
<span id="cb29-15"><a href="#cb29-15" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<h3 id="faster-builds">Faster Builds</h3>
<p>There are different modes in which we can run our builds depending on the complexity of our generator, and our preferences:</p>
<ol type="1">
<li>Run the build script without compiling it using the <code>runhaskell</code> command.</li>
<li>Compile the build script using <code>ghc</code> or <code>cabal</code> every time we have to run the build, and then run the build using the compiled executable.</li>
<li>Compile the build script using <code>ghc</code> or <code>cabal</code> once, and then run the build using the compiled executable.</li>
</ol>
<p>Mode 1 is good enough for small scripts. However, it is slow for large scripts because it runs the script using an interpreter, which is slower than running a compiled executable.</p>
<p>Mode 2 and 3 speed up the build by compiling the build script. However, they have different tradeoffs: mode 2 is good if we change the build script often, but, it is useless work if the build script stays the same. Mode 3 is good if the build script does not change often. But if we do change it often, we’ll have to remember to recompile it.</p>
<p>If we go with compiling the script, we can use the tips in <a href="https://www.parsonsmatt.org/2019/11/27/keeping_compilation_fast.html" target="_blank" rel="noopener">this article</a> to speed up the compilation. Additionally, hand-writing the JSON instances for data types instead of deriving them also gives a noticeable speedup. We may also want to switch on/off optimizations by passing the <code>-O2</code>/<code>-O0</code> flag to <code>ghc</code> or <code>cabal</code> to speed up the compilation. We may also enable parallel compilation by passing the <code>-j</code> flag.</p>
<p>If we decide to go with mode 2, that is, to compile the build script every time we run the build, we may want to use <a href="https://cabal.readthedocs.io/en/3.6/cabal-project.html#cfg-field-executable-dynamic" target="_blank" rel="noopener">dynamic linking</a> to reduce linking time.</p>
<p>When running the build using a compiled executable, Shake recommends switching on multithreading but switching off idle and parallel garbage collection. Additionally, we may also want to tune the allocation area sizes for the garbage collector.</p>
<p>Putting all this together, we may want to use the following flags to compile the generator in mode 2:</p>
<pre class="plain"><code>-O0 -dynamic -j -threaded -rtsopts "-with-rtsopts=-I0 -qg -N -A32m -n8m"</code></pre>
<p>and these flags for mode 3:</p>
<pre class="plain"><code>-O2 -j -threaded -rtsopts "-with-rtsopts=-I0 -qg -N -A32m -n8m"</code></pre>
<p>However, these flags are suggestions only. We should experiment with them to find the best combination for our build.</p>
<h3 id="watch-and-serve">Watch and Serve</h3>
<p>We can add support for automatically rebuilding the site when the Markdown files or assets change using the <a href="https://hackage.haskell.org/package/fsnotify" target="_blank" rel="noopener">fsnotify</a> package. We can add support for automatic rebuilding for the Haskell source as well using <a href="https://eradman.com/entrproject/" target="_blank" rel="noopener"><code>entr</code></a> to rerun the script, or using <a href="https://github.com/ndmitchell/ghcid" target="_blank" rel="noopener"><code>ghcid</code></a> to re-interpret the script on every change.</p>
<p>We can also add support for serving the site using the <a href="https://hackage.haskell.org/package/warp" target="_blank" rel="noopener">warp</a> and <a href="https://hackage.haskell.org/package/wai-app-static" target="_blank" rel="noopener">wai-app-static</a> packages<a href="#fn16" class="footnote-ref" id="fnref16" role="doc-noteref"><sup>16</sup></a><a href="#fn17" class="footnote-ref" id="fnref17" role="doc-noteref"><sup>17</sup></a>. We can add live reloading on the browser side using the <a href="https://livejs.com/" target="_blank" rel="noopener">livejs</a> JavaScript library.</p>
<p>Together, these features give us a hot-reloading development environment with fast feedback loop for our <abbr title="Static site generator">SSG</abbr>.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="static-site-generator-using-shake" id="conclusion">Conclusion</h2>
<p>In this article, we looked at how we can use Shake to build a static site generator. We also looked at Shake specific features, and some tips and tricks that can be used to improve the build. Shake offers flexibility that is unparalleled by other <abbr title="Static site generator">SSGs</abbr>, but at the cost of writing your own build script. However, if you do want to write your own <abbr title="Static site generator">SSG</abbr>, Shake is a great choice as the foundation for it.</p>
<h2 class="notoc" data-track-content data-content-name="acknowledgements" data-content-piece="static-site-generator-using-shake" id="acknowledgements">Acknowledgements</h2>
<p>Many thanks to <a href="https://arunraghavan.net/" target="_blank" rel="noopener">Arun Raghavan</a> and <a href="https://deobald.ca/" target="_blank" rel="noopener">Steven Deobald</a> for reviewing a draft of this article.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p><a href="https://jekyllrb.com/" target="_blank" rel="noopener">Jekyll</a> was the first modern <abbr title="Static site generator">SSG</abbr> released in 2008. Since then, there has been a <a href="https://staticsitegenerators.net/" target="_blank" rel="noopener">proliferation</a> of <abbr title="Static site generator">SSGs</abbr>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>There are already a few <abbr title="Static site generator">SSGs</abbr> that use Shake as their build system. See <a href="https://github.com/ChrisPenner/slick#" target="_blank" rel="noopener">Slick</a> and <a href="https://github.com/srid/rib" target="_blank" rel="noopener">Rib</a>.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Shake is a monadic and suspending build system. Being monadic here means that while build actions are executing, they can add new dependencies for build targets, and those dependencies can depend on the results of previous build actions. Being suspending means that when a build action requires a dependency that is not yet built, Shake suspends the build action and builds the dependency first. Together, these features make Shake flexible and powerful. Read the detailed and accessible paper <a href="https://www.microsoft.com/en-us/research/uploads/prod/2020/04/build-systems-jfp.pdf" target="_blank" rel="noopener">Build systems à la carte: Theory and practice</a> for a comparison of Shake with other build systems.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>To run the generator directly without writing a separate file for dependency management, we can prepend one of these three <a href="https://en.wikipedia.org/wiki/Shebang_(Unix)" target="_blank" rel="noopener">Shebang</a> snippets to <code>Site.hs</code>.</p>
<div id="lst:nix-shell-shebang" class="listing numberSource bash">
<div class="sourceCode" id="cb4" data-lang="bash"><pre class="sourceCode numberSource bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co">#! /usr/bin/env nix-shell</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="co">#! nix-shell -p "haskellPackages.ghcWithPackages (p: [p.mustache p.pandoc p.shake p.deriving-aeson])"</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co">#! nix-shell -i runhaskell</span></span></code></pre></div>
<p><a href="https://web.archive.org/web/20221217/https://iam.travishartwell.net/2015/06/17/nix-shell-shebang/" target="_blank" rel="noopener">Nix shell</a> shebang snippet</p>
</div>
<div id="lst:cabal-shebang" class="listing numberSource bash">
<div class="sourceCode" id="cb5" data-lang="bash"><pre class="sourceCode numberSource bash"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co">#! /usr/bin/env cabal</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="ex">{-</span> cabal:</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="ex">build-depends:</span> base, aeson, deriving-aeson, mustache, pandoc, shake, text, time, unordered-containers</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="ex">-}</span></span></code></pre></div>
<p><a href="https://cabal.readthedocs.io/en/latest/cabal-commands.html#cabal-run" target="_blank" rel="noopener">Cabal</a> shebang snippet</p>
</div>
<div id="lst:stack-shebang" class="listing numberSource bash">
<div class="sourceCode" id="cb6" data-lang="bash"><pre class="sourceCode numberSource bash"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co">#! /usr/bin/env stack</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="ex">{-</span> stack script</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="ex">--resolver</span> lts-19.28</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="ex">--package</span> <span class="st">"base aeson deriving-aeson mustache pandoc shake text time unordered-containers"</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="ex">-}</span></span></code></pre></div>
<p><a href="https://docs.haskellstack.org/en/latest/scripts/" target="_blank" rel="noopener">Stack</a> shebang snippet</p>
</div>
<p>We need to have the corresponding toolchain (Nix, Cabal or Stack) installed to run the generator. The snippets take care of downloading and/or building the dependencies, and running the generator.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>Since tag page generation is a bit more involved, we have faded out the related code for now. We come back to it a later section.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>We use the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:putInfo" target="_blank" rel="noopener"><code class="sourceCode haskell">Shake.putInfo</code></a> function to print a message to the console. There also exist <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:putWarn" target="_blank" rel="noopener"><code class="sourceCode haskell">putWarn</code></a> and <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:putError" target="_blank" rel="noopener"><code class="sourceCode haskell">putError</code></a> functions for printing warnings and errors respectively.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>If you are familiar with other templating languages like <a href="https://github.com/Shopify/liquid" target="_blank" rel="noopener">Liquid</a>, and are wondering why we are limiting the post count in the Haskell code, and not in the Mustache template, it is because Mustache is a logic-less template engine. It does not have any control flow constructs except check for null values. Hence, we have to do the limiting in the Haskell code.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>If the code accumulates a lot of config options like post count on home page, we can move them to an external JSON/YAML/TOML config file, and read them at the start of the build script. We can wrap the <code class="sourceCode haskell"><span class="dt">Rules</span></code> monad in a <a href="https://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Reader.html#t:ReaderT" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ReaderT</span></code></a> monad transformer to make the config options available to all build rules.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>We use the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake.html#v:traced" target="_blank" rel="noopener"><code>Shake.traced</code></a> function to trace the operations in build actions. It logs the operations to the console, and also records them in traces and reports. See the <a href="#traces-and-reports">Traces and Reports</a> section for more details.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>The full code for the SSG is available <a href="https://abhinavsarkar.net/code/shake-blog.html?mtm_campaign=feed">here</a>.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>When we run the build for the first time, it takes some time to download and/or build the dependencies. Subsequent builds are much faster.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>You can find the code for the <abbr title="Static site generator">SSG</abbr> with caching <a href="https://abhinavsarkar.net/code/shake-blog-with-caching.html?mtm_campaign=feed">here</a>.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn13"><p>If we use the experimental forward build feature, it’s easier to cache the output of build actions using one of the <code>cache*</code> functions in the <a href="https://hackage.haskell.org/package/shake-0.19.7/docs/Development-Shake-Forward.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Development.Shake.Forward</span></code></a> module. However, forward builds require <a href="https://github.com/jacereda/fsatrace" target="_blank" rel="noopener"><code>fsatrace</code></a> to be installed on the system, and it doesn’t work on macOS with <a href="https://support.apple.com/en-us/HT204899" target="_blank" rel="noopener">System Integrity Protection</a> enabled.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn14"><p>That’s right, <a href="https://abhinavsarkar.net?mtm_campaign=feed" target="_blank" rel="noopener">abhinavsarkar.net</a> is also built using Shake! See <a href="https://abhinavsarkar.net/colophon/?mtm_campaign=feed">the website Colophon</a> for more details. It used to be built using <a href="https://jaspervdj.be/hakyll/" target="_blank" rel="noopener">Hakyll</a>, but I switched to Shake after getting frustrated with the opaqueness of Hakyll’s build system, and those pesky <a href="https://hackage.haskell.org/package/hakyll-4.15.1.1/docs/Hakyll-Web-Template-Context.html" target="_blank" rel="noopener">contexts</a>.<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn15"><p>Shake also supports the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/applicative_do.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ApplicativeDo</span></code></a> extension, enabling which causes the compiler to automatically detect the build actions that can be run in parallel. However, it may not detect all cases. Regardless, it is better to enable it to improve the parallelism of the build.<a href="#fnref15" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn16"><p>Don’t forget to add a signal handler to stop the watcher and server threads when the build script is interrupted.<a href="#fnref16" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn17"><p>See how Rib does watch and serve <a href="https://github.com/srid/rib/tree/master/rib-core/src/Rib" target="_blank" rel="noopener">here</a>.<a href="#fnref17" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/static-site-generator-using-shake/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2022-12-17T00:00:00Z <p>Static site generators (SSGs) are all rage these days as people realize that plain HTML websites are good enough for most cases. <abbr title="Static site generator">SSGs</abbr> take raw data in various formats—often <a href="https://en.wikipedia.org/wiki/Markdown" target="_blank" rel="noopener">Markdown</a>, <a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON</a>, and <a href="https://en.wikipedia.org/wiki/YAML" target="_blank" rel="noopener">YAML</a>—and process them to produce the static websites, which can then be hosted easily on any hosting provider, or on personal <abbr title="Virtual private server">VPSes</abbr>. In this post, we write a bespoke <abbr title="Static site generator">SSG</abbr> using the <a href="https://shakebuild.com/" target="_blank" rel="noopener">Shake</a> build system.</p>
https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/ Solving Advent of Code “No Space Left On Device” with Parsers, Zippers and Interpreters 2022-12-09T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In this post, we solve the Advent of Code 2022, <a href="https://adventofcode.com/2022/day/7" target="_blank" rel="noopener">“No Space Left On Device”</a> challenge in Haskell using parsers, zippers and interpreters.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<strong>“No Space Left On Device” with Parsers, Zippers and Interpreters</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#the-challenge">The Challenge</a></li><li><a href="#setup">Setup</a></li><li><a href="#modeling-the-filesystem">Modeling the Filesystem</a></li><li><a href="#parsing-the-browsing-session">Parsing the Browsing Session</a></li><li><a href="#navigating-and-modifying-the-filesystem">Navigating and Modifying the Filesystem</a></li><li><a href="#interpreting-the-browsing-session">Interpreting the Browsing Session</a></li><li><a href="#solving-the-challenge">Solving the Challenge</a></li><li><a href="#running-the-program">Running the Program</a></li></ol></nav>
<h2 data-track-content data-content-name="the-challenge" data-content-piece="parsers-zippers-interpreters-aoc7" id="the-challenge">The Challenge</h2>
<p>Here’s a quick recap of the challenge:</p>
<blockquote>
<p>You browse around the filesystem to assess the situation and save the resulting terminal output (your puzzle input). For example:</p>
</blockquote>
<pre class="plain"><code>$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k</code></pre>
<p>We need to write a program that understands the browsing session, builds a model of the filesystem, and then solves the challenge.</p>
<p>Let’s get started!</p>
<h2 data-track-content data-content-name="setup" data-content-piece="parsers-zippers-interpreters-aoc7" id="setup">Setup</h2>
<p>First, some imports:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase #-}</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">Main</span> <span class="kw">where</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (isDigit)</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (sortOn)</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Map.Strict</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.ParserCombinators.ReadP</span> ((<++))</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.ParserCombinators.ReadP</span> <span class="kw">as</span> <span class="dt">P</span></span></code></pre></div>
<p>We are using the <a href="https://hackage.haskell.org/package/base/docs/Text-ParserCombinators-ReadP.html" target="_blank" rel="noopener"><code>ReadP</code></a> parser combinator library, which is part of the <code>base</code> package. We also use the <code class="sourceCode haskell"><span class="dt">LambdaCase</span></code> extension, which allows us to write <code class="sourceCode haskell"><span class="kw">case</span></code> expressions in a more concise way.</p>
<h2 data-track-content data-content-name="modeling-the-filesystem" data-content-piece="parsers-zippers-interpreters-aoc7" id="modeling-the-filesystem">Modeling the Filesystem</h2>
<p>To start with, we model the filesystem:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">File</span> <span class="ot">=</span> <span class="dt">File</span> {<span class="ot">fName ::</span> <span class="dt">String</span>,<span class="ot"> fSize ::</span> <span class="dt">Int</span>}</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">File</span> <span class="kw">where</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">File</span> name size) <span class="ot">=</span> name <span class="op"><></span> <span class="st">"(file, size="</span> <span class="op"><></span> <span class="fu">show</span> size <span class="op"><></span> <span class="st">")"</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Dir</span> <span class="ot">=</span> <span class="dt">Dir</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> dName ::</span> <span class="dt">String</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> dSize ::</span> <span class="dt">Int</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> dFiles ::</span> <span class="dt">Map.Map</span> <span class="dt">String</span> <span class="dt">File</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> dDirs ::</span> <span class="dt">Map.Map</span> <span class="dt">String</span> <span class="dt">Dir</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Dir</span> <span class="kw">where</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">showsPrec</span> d (<span class="dt">Dir</span> name _ files dirs) <span class="ot">=</span></span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">showString</span> (<span class="fu">concat</span> <span class="op">$</span> <span class="fu">replicate</span> d <span class="st">" "</span>)</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> <span class="st">"- "</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> name</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> <span class="st">" (dir)\n"</span></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> showDirs (sortOn dName <span class="op">$</span> Map.elems dirs)</span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> showFiles (sortOn fName <span class="op">$</span> Map.elems files)</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="ot"> showDirs ::</span> [<span class="dt">Dir</span>] <span class="ot">-></span> <span class="dt">ShowS</span></span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a> showDirs <span class="ot">=</span> <span class="fu">foldr</span> (<span class="op">.</span>) <span class="fu">id</span> <span class="op">.</span> <span class="fu">fmap</span> (<span class="fu">showsPrec</span> <span class="op">$</span> d <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a><span class="ot"> showFiles ::</span> [<span class="dt">File</span>] <span class="ot">-></span> <span class="dt">ShowS</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> showFiles <span class="ot">=</span> <span class="fu">foldr</span> ((<span class="op">.</span>) <span class="op">.</span> showFile) <span class="fu">id</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a><span class="ot"> showFile ::</span> <span class="dt">File</span> <span class="ot">-></span> <span class="dt">ShowS</span></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> showFile (<span class="dt">File</span> name' size) <span class="ot">=</span></span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> <span class="fu">showString</span> (<span class="fu">concat</span> <span class="op">$</span> <span class="fu">replicate</span> (d <span class="op">+</span> <span class="dv">1</span>) <span class="st">" "</span>)</span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> <span class="st">"- "</span></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> name'</span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> <span class="st">" (file, size="</span></span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">shows</span> size</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">showString</span> <span class="st">")\n"</span></span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a><span class="ot">emptyDir ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Dir</span></span>
<span id="cb3-38"><a href="#cb3-38" aria-hidden="true" tabindex="-1"></a>emptyDir name <span class="ot">=</span> <span class="dt">Dir</span> name <span class="dv">0</span> Map.empty Map.empty</span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">File</span></code> type is pretty straightforward. The <code class="sourceCode haskell"><span class="dt">Dir</span></code> type is a bit more complex. It contains a name, a size, and two maps of files and directories for the files and directories contained within respectively. The size is the sum of the sizes of all the files and directories contained in the directory.</p>
<p>The <code class="sourceCode haskell"><span class="dt">Show</span></code> instance for <code class="sourceCode haskell"><span class="dt">Dir</span></code> pretty-prints the directory structure. It uses a recursive function to print the files and directories contained in the directory.</p>
<p>We can check it in GHCi:</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> fs <span class="ot">=</span> <span class="dt">Dir</span> <span class="st">"/"</span> <span class="dv">0</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ( Map.fromList</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> [ (<span class="st">"b.txt"</span>, <span class="dt">File</span> <span class="st">"b.txt"</span> <span class="dv">14848514</span>)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , (<span class="st">"c.dat"</span>, <span class="dt">File</span> <span class="st">"c.dat"</span> <span class="dv">8504156</span>)</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ( Map.fromList</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> [ ( <span class="st">"a"</span>,</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Dir</span> <span class="st">"a"</span> <span class="dv">0</span> (Map.fromList [(<span class="st">"f"</span>, <span class="dt">File</span> <span class="st">"f"</span> <span class="dv">29116</span>)]) Map.empty )</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , ( <span class="st">"d"</span>,</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Dir</span> <span class="st">"d"</span> <span class="dv">0</span> ( Map.fromList [ (<span class="st">"d.log"</span>, <span class="dt">File</span> <span class="st">"d.log"</span> <span class="dv">8033020</span>)</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , (<span class="st">"k"</span>, <span class="dt">File</span> <span class="st">"k"</span> <span class="dv">7214296</span>)</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Map.empty )</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">print</span> fs</span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a>- / (dir)</span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a> - a (dir)</span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a> - f (file, size=29116)</span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a> - d (dir)</span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a> - d.log (file, size=8033020)</span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a> - k (file, size=7214296)</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a> - b.txt (file, size=14848514)</span>
<span id="cb4-28"><a href="#cb4-28" aria-hidden="true" tabindex="-1"></a> - c.dat (file, size=8504156)</span></code></pre></div>
<p>That looks good!</p>
<h2 data-track-content data-content-name="parsing-the-browsing-session" data-content-piece="parsers-zippers-interpreters-aoc7" id="parsing-the-browsing-session">Parsing the Browsing Session</h2>
<p>Let’s first define some data types to model the browsing session:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">CdArg</span> <span class="ot">=</span> <span class="dt">CdDir</span> <span class="dt">String</span> <span class="op">|</span> <span class="dt">CdUp</span> <span class="op">|</span> <span class="dt">CdRoot</span> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Command</span> <span class="ot">=</span> <span class="dt">Cd</span> <span class="dt">CdArg</span> <span class="op">|</span> <span class="dt">Ls</span> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Output</span> <span class="ot">=</span> <span class="dt">OutputFile</span> <span class="dt">File</span> <span class="op">|</span> <span class="dt">OutputDir</span> <span class="dt">Dir</span> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Line</span> <span class="ot">=</span> <span class="dt">LCommand</span> <span class="dt">Command</span> <span class="op">|</span> <span class="dt">LOutput</span> <span class="dt">Output</span> <span class="kw">deriving</span> (<span class="dt">Show</span>)</span></code></pre></div>
<p>Now, we define the parsers:</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">commandParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Command</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>commandParser <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> P.char <span class="ch">'$'</span> <span class="op">*></span> P.skipSpaces <span class="op">*></span> P.choice [<span class="dt">Cd</span> <span class="op"><$></span> cdParser, <span class="dt">Ls</span> <span class="op"><$</span> lsParser]</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> lsParser <span class="ot">=</span> P.string <span class="st">"ls"</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> cdParser <span class="ot">=</span></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> P.string <span class="st">"cd"</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> <span class="op">*></span> P.skipSpaces</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="op">*></span> ((<span class="dt">CdUp</span> <span class="op"><$</span> P.string <span class="st">".."</span>)</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><++</span> (<span class="dt">CdRoot</span> <span class="op"><$</span> P.string <span class="st">"/"</span>)</span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><++</span> (<span class="dt">CdDir</span> <span class="op"><$></span> P.munch1 (<span class="op">/=</span> <span class="ch">' '</span>)))</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="ot">outputParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Output</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a>outputParser <span class="ot">=</span> P.choice [<span class="dt">OutputFile</span> <span class="op"><$></span> fileParser, <span class="dt">OutputDir</span> <span class="op"><$></span> dirParser]</span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a> fileParser <span class="ot">=</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">flip</span> <span class="dt">File</span></span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> (<span class="fu">read</span> <span class="op"><$></span> P.munch1 <span class="fu">isDigit</span>)</span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> (P.skipSpaces <span class="op">*></span> P.munch1 (<span class="op">/=</span> <span class="ch">' '</span>))</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> dirParser <span class="ot">=</span> emptyDir <span class="op"><$></span> (P.string <span class="st">"dir "</span> <span class="op">*></span> P.munch1 (<span class="op">/=</span> <span class="ch">' '</span>))</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a><span class="ot">lineParser ::</span> <span class="dt">P.ReadP</span> <span class="dt">Line</span></span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a>lineParser <span class="ot">=</span> P.choice [<span class="dt">LOutput</span> <span class="op"><$></span> outputParser, <span class="dt">LCommand</span> <span class="op"><$></span> commandParser]</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a><span class="ot">parseLine ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Line</span></span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a>parseLine s <span class="ot">=</span> <span class="kw">case</span> P.readP_to_S lineParser s <span class="kw">of</span></span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true" tabindex="-1"></a> [(l, <span class="st">""</span>)] <span class="ot">-></span> l</span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Failed to parse line: "</span> <span class="op"><></span> s</span></code></pre></div>
<p>The <code>commandParser</code> parses a command:</p>
<ul>
<li>A command starts with a <code>$</code> character, followed by a space, followed by the command name.</li>
<li>The <code>cd</code> command is followed by a space and a directory name, or <code>..</code> to go up, or <code>/</code> to go the root directory.</li>
<li>The <code>ls</code> command takes no arguments.</li>
</ul>
<p>The <code>outputParser</code> parses a line of command output. It can be either a file or a directory:</p>
<ul>
<li>A file is a size followed by a space and a name.</li>
<li>A directory is the string <code>dir</code> followed by a name.</li>
</ul>
<p>The <code>lineParser</code> parses a line of the browsing session. It can be either a command or a line of command output.</p>
<p>Finally, the <code>parseLine</code> function parses a line of the browsing session, and returns a <code class="sourceCode haskell"><span class="dt">Line</span></code> value.</p>
<p>We can try it out in GHCi:</p>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parseLine <span class="st">"$ cd .."</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>LCommand (Cd CdUp)</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parseLine <span class="st">"$ cd /"</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>LCommand (Cd CdRoot)</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parseLine <span class="st">"$ cd a"</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>LCommand (Cd (CdDir "a"))</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parseLine <span class="st">"$ ls"</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>LCommand Ls</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> parseLine <span class="st">"29116 f"</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>LOutput (OutputFile f(file, size=29116))</span></code></pre></div>
<p>Works as expected.</p>
<h2 data-track-content data-content-name="navigating-and-modifying-the-filesystem" data-content-piece="parsers-zippers-interpreters-aoc7" id="navigating-and-modifying-the-filesystem">Navigating and Modifying the Filesystem</h2>
<p>Now, we define functions to navigate and modify the file system. We are going to use a Zipper to navigate the file system.</p>
<p>A <a href="https://en.wikipedia.org/wiki/Zipper_(data_structure)" target="_blank" rel="noopener">Zipper</a> is a data structure that allows us to navigate a data structure, focus on a part of it, and to modify the data structure at the focus.</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">FsZipper</span> <span class="ot">=</span> <span class="dt">FsZipper</span> {<span class="ot">zPath ::</span> [<span class="dt">Dir</span>],<span class="ot"> zCurrent ::</span> <span class="dt">Dir</span>} <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ot">moveUp ::</span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>moveUp <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> [] _ <span class="ot">-></span> <span class="fu">error</span> <span class="st">"Can't move up from root"</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> (d <span class="op">:</span> ds) cur <span class="ot">-></span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> ds <span class="op">$</span> d {dDirs <span class="ot">=</span> Map.insert (dName cur) cur <span class="op">$</span> dDirs d}</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">moveDown ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>moveDown name (<span class="dt">FsZipper</span> ds d) <span class="ot">=</span> <span class="dt">FsZipper</span> (d <span class="op">:</span> ds) <span class="op">$</span> findDir d</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> findDir <span class="dt">Dir</span> {dDirs <span class="ot">=</span> ds'} <span class="ot">=</span> <span class="kw">case</span> Map.lookup name ds' <span class="kw">of</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">error</span> <span class="op">$</span> <span class="st">"Can't find directory "</span> <span class="op"><></span> name</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> d' <span class="ot">-></span> d'</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a><span class="ot">moveToRoot ::</span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a>moveToRoot zipper <span class="ot">=</span> <span class="kw">case</span> zipper <span class="kw">of</span></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> [] _ <span class="ot">-></span> zipper</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> moveToRoot <span class="op">$</span> moveUp zipper</span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a><span class="ot">addFile ::</span> <span class="dt">File</span> <span class="ot">-></span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true" tabindex="-1"></a>addFile f (<span class="dt">FsZipper</span> ds d) <span class="ot">=</span></span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> ds d {dFiles <span class="ot">=</span> Map.insert (fName f) f <span class="op">$</span> dFiles d}</span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true" tabindex="-1"></a><span class="ot">addDir ::</span> <span class="dt">Dir</span> <span class="ot">-></span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true" tabindex="-1"></a>addDir d (<span class="dt">FsZipper</span> ds d') <span class="ot">=</span></span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true" tabindex="-1"></a> <span class="dt">FsZipper</span> ds d' {dDirs <span class="ot">=</span> Map.insert (dName d) d <span class="op">$</span> dDirs d'}</span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true" tabindex="-1"></a><span class="ot">toZipper ::</span> <span class="dt">Dir</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true" tabindex="-1"></a>toZipper <span class="ot">=</span> <span class="dt">FsZipper</span> []</span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true" tabindex="-1"></a><span class="ot">fromZipper ::</span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">Dir</span></span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true" tabindex="-1"></a>fromZipper <span class="ot">=</span> zCurrent <span class="op">.</span> moveToRoot</span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">FsZipper</span></code> type is a zipper for our file system model. It contains the path to the current directory, and the current directory.</p>
<ul>
<li>The <code>moveUp</code> function moves up a directory in the file system. It takes the current directory, and replaces it in the parent directory. It then returns a zipper with the parent directory as the current directory.</li>
<li>The <code>moveDown</code> function moves down a directory in the file system. It takes the name of the directory to move to, and returns a zipper with the new directory as the current directory.</li>
<li>The <code>moveToRoot</code> function moves to the root directory in the file system. It calls the <code>moveUp</code> function repeatedly until it reaches the root directory.</li>
<li>The <code>addFile</code> and <code>addDir</code> functions add a file and a directory to the current directory, respectively.</li>
<li>The <code>toZipper</code> function converts the root directory of the file system to a zipper, and the <code>fromZipper</code> function does the opposite.</li>
</ul>
<p>We can test these functions in GHCi:</p>
<div class="sourceCode" id="cb9" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> fs <span class="ot">=</span> <span class="dt">Dir</span> <span class="st">"/"</span> <span class="dv">0</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ( Map.fromList</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> [ (<span class="st">"b.txt"</span>, <span class="dt">File</span> <span class="st">"b.txt"</span> <span class="dv">14848514</span>)</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , (<span class="st">"c.dat"</span>, <span class="dt">File</span> <span class="st">"c.dat"</span> <span class="dv">8504156</span>)</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ( Map.fromList</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> [ ( <span class="st">"a"</span>,</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Dir</span> <span class="st">"a"</span> <span class="dv">0</span> (Map.fromList [(<span class="st">"f"</span>, <span class="dt">File</span> <span class="st">"f"</span> <span class="dv">29116</span>)]) Map.empty )</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , ( <span class="st">"d"</span>,</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Dir</span> <span class="st">"d"</span> <span class="dv">0</span> ( Map.fromList [ (<span class="st">"d.log"</span>, <span class="dt">File</span> <span class="st">"d.log"</span> <span class="dv">8033020</span>)</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> , (<span class="st">"k"</span>, <span class="dt">File</span> <span class="st">"k"</span> <span class="dv">7214296</span>)</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Map.empty )</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> )</span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">print</span> fs</span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a>- / (dir)</span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> - a (dir)</span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> - f (file, size=29116)</span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> - d (dir)</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> - d.log (file, size=8033020)</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a> - k (file, size=7214296)</span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> - b.txt (file, size=14848514)</span>
<span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a> - c.dat (file, size=8504156)</span>
<span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">import</span> <span class="dt">Data.Function</span> ((&))</span>
<span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> fs</span>
<span id="cb9-32"><a href="#cb9-32" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> toZipper</span>
<span id="cb9-33"><a href="#cb9-33" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> moveDown <span class="st">"a"</span></span>
<span id="cb9-34"><a href="#cb9-34" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> addFile (<span class="dt">File</span> <span class="st">"g"</span> <span class="dv">12345</span>)</span>
<span id="cb9-35"><a href="#cb9-35" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> moveUp</span>
<span id="cb9-36"><a href="#cb9-36" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> moveDown <span class="st">"d"</span></span>
<span id="cb9-37"><a href="#cb9-37" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> addDir (<span class="dt">Dir</span> <span class="st">"e"</span> <span class="dv">0</span> Map.empty Map.empty)</span>
<span id="cb9-38"><a href="#cb9-38" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">&</span> fromZipper</span>
<span id="cb9-39"><a href="#cb9-39" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb9-40"><a href="#cb9-40" aria-hidden="true" tabindex="-1"></a>- / (dir)</span>
<span id="cb9-41"><a href="#cb9-41" aria-hidden="true" tabindex="-1"></a> - a (dir)</span>
<span id="cb9-42"><a href="#cb9-42" aria-hidden="true" tabindex="-1"></a> - f (file, size=29116)</span>
<span id="cb9-43"><a href="#cb9-43" aria-hidden="true" tabindex="-1"></a> - g (file, size=12345)</span>
<span id="cb9-44"><a href="#cb9-44" aria-hidden="true" tabindex="-1"></a> - d (dir)</span>
<span id="cb9-45"><a href="#cb9-45" aria-hidden="true" tabindex="-1"></a> - e (dir)</span>
<span id="cb9-46"><a href="#cb9-46" aria-hidden="true" tabindex="-1"></a> - d.log (file, size=8033020)</span>
<span id="cb9-47"><a href="#cb9-47" aria-hidden="true" tabindex="-1"></a> - k (file, size=7214296)</span>
<span id="cb9-48"><a href="#cb9-48" aria-hidden="true" tabindex="-1"></a> - b.txt (file, size=14848514)</span>
<span id="cb9-49"><a href="#cb9-49" aria-hidden="true" tabindex="-1"></a> - c.dat (file, size=8504156)</span></code></pre></div>
<p>Perfect!</p>
<h2 data-track-content data-content-name="interpreting-the-browsing-session" data-content-piece="parsers-zippers-interpreters-aoc7" id="interpreting-the-browsing-session">Interpreting the Browsing Session</h2>
<p>Now that we have a way to navigate and modify the file system, we define a function to interpret the browsing session.</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpretLine ::</span> <span class="dt">FsZipper</span> <span class="ot">-></span> <span class="dt">Line</span> <span class="ot">-></span> <span class="dt">FsZipper</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>interpretLine zipper <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">LCommand</span> (<span class="dt">Cd</span> <span class="dt">CdUp</span>) <span class="ot">-></span> moveUp zipper</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">LCommand</span> (<span class="dt">Cd</span> <span class="dt">CdRoot</span>) <span class="ot">-></span> moveToRoot zipper</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">LCommand</span> (<span class="dt">Cd</span> (<span class="dt">CdDir</span> name)) <span class="ot">-></span> moveDown name zipper</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">LCommand</span> <span class="dt">Ls</span> <span class="ot">-></span> zipper</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">LOutput</span> (<span class="dt">OutputFile</span> f) <span class="ot">-></span> addFile f zipper</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">LOutput</span> (<span class="dt">OutputDir</span> d) <span class="ot">-></span> addDir d zipper</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="ot">interpret ::</span> [<span class="dt">Line</span>] <span class="ot">-></span> <span class="dt">Dir</span></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a>interpret <span class="ot">=</span> fromZipper <span class="op">.</span> <span class="fu">foldl</span> interpretLine (toZipper <span class="op">$</span> emptyDir <span class="st">"/"</span>)</span></code></pre></div>
<p>The <code>interpretLine</code> function interprets a parsed line of the browsing session, and returns the updated file system zipper. It uses pattern matching to handle each command and output.</p>
<p>The <code>interpret</code> function interprets the entire parsed browsing session, and returns the root directory of the final file system.</p>
<p>We can test the interpreter in GHCi on the test browsing session:</p>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> testInput <span class="ot">=</span> [</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd /"</span>,</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ ls"</span>,</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"dir a"</span>,</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"14848514 b.txt"</span>,</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"8504156 c.dat"</span>,</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"dir d"</span>,</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd a"</span>,</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ ls"</span>,</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"dir e"</span>,</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"29116 f"</span>,</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"2557 g"</span>,</span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"62596 h.lst"</span>,</span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd e"</span>,</span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ ls"</span>,</span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"584 i"</span>,</span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd .."</span>,</span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd .."</span>,</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ cd d"</span>,</span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"$ ls"</span>,</span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"4060174 j"</span>,</span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"8033020 d.log"</span>,</span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"5626152 d.ext"</span>,</span>
<span id="cb11-25"><a href="#cb11-25" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"7214296 k"</span></span>
<span id="cb11-26"><a href="#cb11-26" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> ]</span>
<span id="cb11-27"><a href="#cb11-27" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb11-28"><a href="#cb11-28" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> interpret <span class="op">$</span> <span class="fu">map</span> parseLine testInput</span>
<span id="cb11-29"><a href="#cb11-29" aria-hidden="true" tabindex="-1"></a>- / (dir)</span>
<span id="cb11-30"><a href="#cb11-30" aria-hidden="true" tabindex="-1"></a> - a (dir)</span>
<span id="cb11-31"><a href="#cb11-31" aria-hidden="true" tabindex="-1"></a> - e (dir)</span>
<span id="cb11-32"><a href="#cb11-32" aria-hidden="true" tabindex="-1"></a> - i (file, size=584)</span>
<span id="cb11-33"><a href="#cb11-33" aria-hidden="true" tabindex="-1"></a> - f (file, size=29116)</span>
<span id="cb11-34"><a href="#cb11-34" aria-hidden="true" tabindex="-1"></a> - g (file, size=2557)</span>
<span id="cb11-35"><a href="#cb11-35" aria-hidden="true" tabindex="-1"></a> - h.lst (file, size=62596)</span>
<span id="cb11-36"><a href="#cb11-36" aria-hidden="true" tabindex="-1"></a> - d (dir)</span>
<span id="cb11-37"><a href="#cb11-37" aria-hidden="true" tabindex="-1"></a> - d.ext (file, size=5626152)</span>
<span id="cb11-38"><a href="#cb11-38" aria-hidden="true" tabindex="-1"></a> - d.log (file, size=8033020)</span>
<span id="cb11-39"><a href="#cb11-39" aria-hidden="true" tabindex="-1"></a> - j (file, size=4060174)</span>
<span id="cb11-40"><a href="#cb11-40" aria-hidden="true" tabindex="-1"></a> - k (file, size=7214296)</span>
<span id="cb11-41"><a href="#cb11-41" aria-hidden="true" tabindex="-1"></a> - b.txt (file, size=14848514)</span>
<span id="cb11-42"><a href="#cb11-42" aria-hidden="true" tabindex="-1"></a> - c.dat (file, size=8504156)</span></code></pre></div>
<p>This matches the file system shown in the problem statement.</p>
<h2 data-track-content data-content-name="solving-the-challenge" data-content-piece="parsers-zippers-interpreters-aoc7" id="solving-the-challenge">Solving the Challenge</h2>
<p>Next, we solve the challenge.</p>
<h3 id="part-1">Part 1</h3>
<p>For part 1, we need to return the sum of the sizes of directories smaller than 100000.</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">calcAndSetDirSize ::</span> <span class="dt">Dir</span> <span class="ot">-></span> <span class="dt">Dir</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>calcAndSetDirSize d<span class="op">@</span><span class="dt">Dir</span> {dFiles <span class="ot">=</span> fs, dDirs <span class="ot">=</span> ds} <span class="ot">=</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> ds' <span class="ot">=</span> <span class="fu">fmap</span> calcAndSetDirSize ds</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> d {dSize <span class="ot">=</span> <span class="fu">sum</span> <span class="op">$</span> <span class="fu">fmap</span> fSize fs <span class="op"><></span> <span class="fu">fmap</span> dSize ds', dDirs <span class="ot">=</span> ds'}</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="ot">findDirsSmallerThan ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Dir</span> <span class="ot">-></span> [<span class="dt">Dir</span>]</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>findDirsSmallerThan size d<span class="op">@</span>(<span class="dt">Dir</span> {dSize <span class="ot">=</span> dSize', dDirs <span class="ot">=</span> ds}) <span class="ot">=</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> [d <span class="op">|</span> dSize' <span class="op"><=</span> size] <span class="op"><></span> <span class="fu">concatMap</span> (findDirsSmallerThan size) ds</span></code></pre></div>
<p>First, the <code>calcAndSetDirSize</code> function calculates and sets the size of each directory. It recursively calculates the size of the subdirectories, and sets their sizes. It then sets the size of the current directory to the sum of the sizes of the files and subdirectories.</p>
<p>Then, the <code>findDirsSmallerThan</code> function recursively finds the directories smaller than the given size.</p>
<p>Finally, we write the <code>part1</code> function that solves the first part of the challenge:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">part1 ::</span> <span class="dt">Dir</span> <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>part1 <span class="ot">=</span> <span class="fu">sum</span> <span class="op">.</span> <span class="fu">map</span> dSize <span class="op">.</span> findDirsSmallerThan <span class="dv">100000</span></span></code></pre></div>
<h3 id="part-2">Part 2</h3>
<p>For part 2, we need to return the size of the smallest directory larger than space required for the update.</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">findDirsLargerThan ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Dir</span> <span class="ot">-></span> [<span class="dt">Dir</span>]</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>findDirsLargerThan size d<span class="op">@</span>(<span class="dt">Dir</span> {dSize <span class="ot">=</span> dSize', dDirs <span class="ot">=</span> ds}) <span class="ot">=</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> [d <span class="op">|</span> dSize' <span class="op">>=</span> size] <span class="op"><></span> <span class="fu">concatMap</span> (findDirsLargerThan size) ds</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="ot">part2 ::</span> <span class="dt">Dir</span> <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>part2 fs <span class="ot">=</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> freeSpace <span class="ot">=</span> totalSpace <span class="op">-</span> dSize fs</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> spaceRequired <span class="ot">=</span> updateSpace <span class="op">-</span> freeSpace</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> dirs <span class="ot">=</span> findDirsLargerThan spaceRequired fs</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="fu">minimum</span> <span class="op">$</span> <span class="fu">map</span> dSize dirs</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> totalSpace <span class="ot">=</span> <span class="dv">70000000</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> updateSpace <span class="ot">=</span> <span class="dv">30000000</span></span></code></pre></div>
<p>The <code>findDirsLargerThan</code> function is similar to the <code>findDirsSmallerThan</code> function, except that it finds the directories larger than the given size.</p>
<p>The <code>part2</code> function solves the second part of the challenge by first calculating the amount of free space in the file system, and the amount of space required for the update. It then finds the directories larger than the amount of space required for the update, and returns the size of the smallest directory.</p>
<h2 data-track-content data-content-name="running-the-program" data-content-piece="parsers-zippers-interpreters-aoc7" id="running-the-program">Running the Program</h2>
<p>Finally, we write a <code>main</code> program:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> input <span class="ot"><-</span> <span class="fu">lines</span> <span class="op"><$></span> <span class="fu">getContents</span></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> fs <span class="ot">=</span> calcAndSetDirSize <span class="op">$</span> interpret <span class="op">$</span> <span class="fu">map</span> parseLine input</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span> fs</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span> <span class="op">$</span> part1 fs</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span> <span class="op">$</span> part2 fs</span></code></pre></div>
<p>The <code>main</code> function reads the browsing session from the standard input, parses the browsing session, interprets it, and calculates and sets the size of each directory. Then, it prints the final file system, and the solutions to the two parts of the challenge.</p>
<p>That’s it for this post! I hope you enjoyed it. The full code for this post is available <a href="https://abhinavsarkar.net/code/no-space.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">“Handy Haversacks” in Type-level Haskell</a>
</li>
<li>
<strong>“No Space Left On Device” with Parsers, Zippers and Interpreters</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2022-12-09T00:00:00Z <p>In this post, we solve the Advent of Code 2022, <a href="https://adventofcode.com/2022/day/7" target="_blank" rel="noopener">“No Space Left On Device”</a> challenge in Haskell using parsers, zippers and interpreters.</p>
https://abhinavsarkar.net/posts/implementing-co-2/ Implementing Co, a Small Language With Coroutines #2: The Interpreter 2021-09-21T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">previous post</a>, we wrote the parser for <span class="fancy">Co</span>, the small language we are building in this series of posts. The previous post was all about the syntax of <span class="fancy">Co</span>. In this post we dive into the semantics of <span class="fancy">Co</span>, and write an interpreter for its basic features.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<strong>The Interpreter</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#previously-on">Previously, on …</a></li><li><a href="#running-a-program">Running a Program</a></li><li><a href="#runtime-values">Runtime Values</a></li><li><a href="#environment-model-of-evaluation">Environment Model of Evaluation</a></li><li><a href="#scopes">Scopes</a></li><li><a href="#closures">Closures</a></li><li><a href="#early-returns">Early Returns</a></li><li><a href="#the-interpreter">The Interpreter</a></li><li><a href="#manipulating-environments">Manipulating Environments</a></li><li><a href="#evaluating-expressions">Evaluating Expressions</a></li><li><a href="#executing-statements">Executing Statements</a></li><li><a href="#evaluating-function-calls">Evaluating Function Calls</a></li><li><a href="#interpreting-a-program">Interpreting a Program</a></li></ol></nav>
<h2 data-track-content data-content-name="previously-on" data-content-piece="implementing-co-2" id="previously-on">Previously, on …</h2>
<p>Here’s a quick recap. The basic features of <span class="fancy">Co</span> that we are aiming to implement in this post are:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Dynamic_typing" target="_blank" rel="noopener">Dynamic</a> and <a href="https://en.wikipedia.org/wiki/Strong_typing" target="_blank" rel="noopener">strong</a> typing.</li>
<li>Null, boolean, string and integer literals, and values.</li>
<li>Addition, subtraction, multiplication and integer division arithmetic operations.</li>
<li>String concatenation operation.</li>
<li>Equality and inequality checks on booleans, strings and numbers.</li>
<li>Less-than and greater-than comparison operations on numbers.</li>
<li>Variable declarations, usage and assignments.</li>
<li><code class="sourceCode javascript"><span class="cf">if</span></code> and <code class="sourceCode javascript"><span class="cf">while</span></code> statements.</li>
<li>Function declarations and calls, with support for recursion.</li>
<li>First class functions and anonymous functions.</li>
<li>Mutable closures.</li>
</ul>
<div class="note">
<p>Note that some parts of code snippets in this post have been faded away. These are the part which add support for coroutines and channels. You can safely ignore these parts for now. We’ll go over them in the next post.</p>
</div>
<p>We represent the <span class="fancy">Co</span> <a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Tree" target="_blank" rel="noopener"><em>Abstract Syntax Tree</em></a> (AST) as a pair of Haskell <a href="https://en.wikipedia.org/wiki/Algebraic_data_type" target="_blank" rel="noopener">Algebraic Data Types</a> (ADTs), one for <a href="https://en.wikipedia.org/wiki/Expression_(computer_science)" target="_blank" rel="noopener"><em>Expressions</em></a>:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="10-10"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">LNull</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LBool</span> <span class="dt">Bool</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LStr</span> <span class="dt">String</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LNum</span> <span class="dt">Integer</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Variable</span> <span class="dt">Identifier</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Binary</span> <span class="dt">BinOp</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Call</span> <span class="dt">Expr</span> [<span class="dt">Expr</span>]</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Lambda</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">Receive</span> <span class="dt">Expr</span></span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Identifier</span> <span class="ot">=</span> <span class="dt">String</span></span></code></pre></div>
<p>And another for <a href="https://en.wikipedia.org/wiki/Statement_(computer_science)" target="_blank" rel="noopener"><em>Statements</em></a>:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="9-11"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Stmt</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">ExprStmt</span> <span class="dt">Expr</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">VarStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">AssignStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">IfStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">WhileStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">FunctionStmt</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ReturnStmt</span> (<span class="dt">Maybe</span> <span class="dt">Expr</span>)</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">YieldStmt</span></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">SpawnStmt</span> <span class="dt">Expr</span></span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">SendStmt</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Program</span> <span class="ot">=</span> [<span class="dt">Stmt</span>]</span></code></pre></div>
<p>Also, <code>program</code> is the parser for <span class="fancy">Co</span> programs. To parse code, run the <code>program</code> parser with the <code>runParser</code> function like this:</p>
<div class="sourceCode" id="cb1" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser program <span class="st">"var x = 1 + s;"</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>Right [VarStmt "x" (Binary Plus (LNum 1) (Variable "s"))]</span></code></pre></div>
<p>Now, off to the new stuff.</p>
<h2 data-track-content data-content-name="running-a-program" data-content-piece="implementing-co-2" id="running-a-program">Running a Program</h2>
<p>There are many ways to run a program. If the program is written in <a href="https://en.wikipedia.org/wiki/Machine_Code" target="_blank" rel="noopener"><em>Machine Code</em></a>, you can run it directly on the matching <a href="https://en.wikipedia.org/wiki/CPU" target="_blank" rel="noopener">CPU</a>. But machine code is too <a href="https://en.wikipedia.org/wiki/Low-level_programming_language" target="_blank" rel="noopener">low-level</a>, and writing programs in it is very tedious and error-prone. Thus, programmers prefer to write code in <a href="https://en.wikipedia.org/wiki/high-level_programming_languages" target="_blank" rel="noopener">high-level programming languages</a>, and turn it into machine code to be able to run it<span><sup><a href="#ref-Abelson1996-c4" class="citation" title="Abelson, Sussman, and with Julie Sussman, “Metalinguistic Abstraction.”
">@1</a></sup>.</span> Here’s where different ways of running code come in:</p>
<ul>
<li>We can run the high-level code through a <a href="https://en.wikipedia.org/wiki/Compiler" target="_blank" rel="noopener"><em>Compiler</em></a> to turn it into machine code to be able to run it directly. Example: compiling <a href="https://en.wikipedia.org/wiki/C++" target="_blank" rel="noopener">C++</a> using <a href="https://en.wikipedia.org/wiki/GNU_Compiler_Collection" target="_blank" rel="noopener">GCC</a>.</li>
<li>We can run the code through a compiler which turns it into a relatively lower-level programming language code, and then run that lower-level program through another compiler to turn it into machine code. Example: compiling <a href="https://haskell.org" target="_blank" rel="noopener">Haskell</a> into <a href="https://web.archive.org/web/20210921/https://llvm.org/" target="_blank" rel="noopener">LLVM IR</a> using <a href="https://www.haskell.org/ghc/" target="_blank" rel="noopener">GHC</a>, which can then be run through the LLVM toolchain to generate machine code.</li>
<li>We can run the code through a <a href="https://en.wikipedia.org/wiki/Transpiler" target="_blank" rel="noopener"><em>Transpiler</em></a> (also called <em>Source-to-source compiler</em>) to turn it into code in a programming language that is of similar level, and then run the resultant code with that language’s toolchain. Example: transpiling <a href="https://www.purescript.org/" target="_blank" rel="noopener">Purescript</a> into <a href="https://en.wikipedia.org/wiki/JavaScript" target="_blank" rel="noopener">JavaScript</a>, and running it with <a href="https://nodejs.org/" target="_blank" rel="noopener">node.js</a>.</li>
<li>We can compile the source code to <a href="https://en.wikipedia.org/wiki/Bytecode" target="_blank" rel="noopener"><em>Bytecode</em></a> and run the bytecode on a <a href="https://en.wikipedia.org/wiki/Virtual_Machine" target="_blank" rel="noopener"><em>Virtual Machine</em></a>. Example: <a href="https://en.wikipedia.org/wiki/Java_virtual_machine" target="_blank" rel="noopener">Java virtual machine</a> running <a href="https://en.wikipedia.org/wiki/Java" target="_blank" rel="noopener">Java</a> bytecode compiled from <a href="https://clojure.org/" target="_blank" rel="noopener">Clojure</a> source code by the Clojure compiler.</li>
<li>We can parse the code to an AST, and immediately execute the AST using an <a href="https://en.wikipedia.org/wiki/Interpreter_(computing)#Abstract_syntax_tree_interpreters" target="_blank" rel="noopener"><em>AST Interpreter</em></a>. Example: <a href="https://www.php.net/" target="_blank" rel="noopener">PHP</a> version 3, <a href="https://www.gnu.org/software/bash/" target="_blank" rel="noopener">Bash</a>. <a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></li>
<li>We can also mix-and-match parts of the above options to create hybrids, like <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation" target="_blank" rel="noopener"><em>Just-in-time compilation</em></a> to machine code within a virtual machine.</li>
</ul>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 641 529'%3E%3C/svg%3E" class="lazyload w-100pct nolink" style="--image-aspect-ratio: 1.2117202268431002" data-src="/images/implementing-co-2/source-to-machine.svg" alt="Many ways to run a program"></img>
<noscript><img src="/images/implementing-co-2/source-to-machine.svg" class="w-100pct nolink" alt="Many ways to run a program"></img></noscript>
<figcaption>Many ways to run a program</figcaption>
</figure>
<p>For running <span class="fancy">Co</span> programs, we will implement an AST-walking interpreter. The interpreter implemented in this post will support only the <a href="#previously-on">basic features</a> of <span class="fancy">Co</span>. In the next post, we’ll extend it to support coroutines and channels.</p>
<div class="note">
<p>The complete code for the interpreter is <a href="https://abhinavsarkar.net/code/co-interpreter.html?mtm_campaign=feed">here</a>. You can load it in GHCi using <a href="https://en.wikipedia.org/wiki/Low-level_programming_language" target="_blank" rel="noopener">stack</a> (by running <code>stack co-interpreter.hs</code>), and follow along while reading this article.</p>
</div>
<h2 data-track-content data-content-name="runtime-values" data-content-piece="implementing-co-2" id="runtime-values">Runtime Values</h2>
<p>An AST-walking interpreter takes an AST as its input, and recursively walks down the AST nodes, from top to bottom. While doing this, it evaluates expressions to runtime values, and executes the statements to do their effects.</p>
<p>The runtime values are things that can be passed around in the code during the program run time. Often called “first-class”, these values can be assigned to variables, passed as function arguments, and returned from functions. If <span class="fancy">Co</span> were to support data structures like lists and maps, these values could be stored in them as well. The <code class="sourceCode haskell"><span class="dt">Value</span></code> ADT below represents these values:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="8-8"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Value</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Null</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Boolean</span> <span class="dt">Bool</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Str</span> <span class="dt">String</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Num</span> <span class="dt">Integer</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Function</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>] <span class="dt">Env</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">BuiltinFunction</span> <span class="dt">Identifier</span> <span class="dt">Int</span> ([<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span>)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">Chan</span> <span class="dt">Channel</span></span></span></code></pre></div>
<p>Other than the usual values like <code class="sourceCode javascript"><span class="kw">null</span></code>, booleans, strings, and numbers, we also have functions as first-class runtime values in <span class="fancy">Co</span>. We have a constructor <code class="sourceCode haskell"><span class="dt">Function</span></code> for the functions that programmers define in their <span class="fancy">Co</span> code, and another constructor <code class="sourceCode haskell"><span class="dt">BuiltinFunction</span></code> for built-in functions like <code>print</code><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>We also write instances to show and check equality for these values:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="9-9"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">Value</span> <span class="kw">where</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Null</span> <span class="ot">-></span> <span class="st">"null"</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Boolean</span> b <span class="ot">-></span> <span class="fu">show</span> b</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Str</span> s <span class="ot">-></span> s</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> n <span class="ot">-></span> <span class="fu">show</span> n</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Function</span> name _ _ _ <span class="ot">-></span> <span class="st">"function "</span> <span class="op"><></span> name</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">BuiltinFunction</span> name _ _ <span class="ot">-></span> <span class="st">"function "</span> <span class="op"><></span> name</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Chan</span> <span class="dt">Channel</span> {} <span class="ot">-></span> <span class="st">"Channel"</span></span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Eq</span> <span class="dt">Value</span> <span class="kw">where</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Null</span> <span class="op">==</span> <span class="dt">Null</span> <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Boolean</span> b1 <span class="op">==</span> <span class="dt">Boolean</span> b2 <span class="ot">=</span> b1 <span class="op">==</span> b2</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Str</span> s1 <span class="op">==</span> <span class="dt">Str</span> s2 <span class="ot">=</span> s1 <span class="op">==</span> s2</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Num</span> n1 <span class="op">==</span> <span class="dt">Num</span> n2 <span class="ot">=</span> n1 <span class="op">==</span> n2</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> _ <span class="op">==</span> _ <span class="ot">=</span> <span class="dt">False</span></span></code></pre></div>
<p>Note that only <code class="sourceCode javascript"><span class="kw">null</span></code>, booleans, strings and numbers can be checked for equality in <span class="fancy">Co</span>. Also, only values of same type can be equals. A string can never be equal to a number<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<p>So, how do we go about turning the expressions to values, and executing statements? Before learning that, we must take a detour into some theory of programming languages.</p>
<div class="note">
<p>Readers familiar with the concepts of environments, scopes, closures and early returns can skip the next sections, and jump directly to the <a href="#the-interpreter">implementation</a>.</p>
</div>
<h2 data-track-content data-content-name="environment-model-of-evaluation" data-content-piece="implementing-co-2" id="environment-model-of-evaluation">Environment Model of Evaluation</h2>
<p>Let’s say we have this little <span class="fancy">Co</span> program to run:</p>
<div class="sourceCode" id="cb2" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> a <span class="op">=</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">twice</span>(x) { <span class="cf">return</span> x <span class="op">+</span> x<span class="op">;</span> }</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">twice</span>(a))<span class="op">;</span></span></code></pre></div>
<p>We need to evaluate <code class="sourceCode javascript"><span class="fu">twice</span>(a)</code> to a value to print it. One way to do that is to substitute variables for their values, quite literally. <code>twice</code> is a variable, value of which is a function. And <code>a</code> is another variable, with value <code class="sourceCode javascript"><span class="dv">2</span></code>. We can do repeated substitution to arrive at a resultant value like this:</p>
<div class="sourceCode" id="cb3" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">twice</span>(a))<span class="op">;</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">=></span> <span class="fu">print</span>(<span class="fu">twice</span>(<span class="dv">2</span>))<span class="op">;</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">=></span> <span class="fu">print</span>(<span class="dv">2</span> <span class="op">+</span> <span class="dv">2</span>)<span class="op">;</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a><span class="kw">=></span> <span class="fu">print</span>(<span class="dv">4</span>)<span class="op">;</span></span></code></pre></div>
<p>This is called the <a href="https://web.archive.org/web/20210921/https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-10.html#%25_sec_1.1.5" target="_blank" rel="noopener"><em>Substitution model of evaluation</em></a><span><sup><a href="#ref-Abelson1996-c115" class="citation" title="Abelson, Sussman, and with Julie Sussman, “The Substitution Model for
Procedure Application.”
">@5</a></sup>.</span> This works for the example we have above, and for a large set of programs<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. However, it breaks down as soon as we add mutability to the mix:</p>
<div class="sourceCode" id="cb4" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> a <span class="op">=</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">incA</span>() {</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> b <span class="op">=</span> a <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> b<span class="op">;</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">incA</span>())<span class="op">;</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>a <span class="op">=</span> <span class="dv">3</span><span class="op">;</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">incA</span>())<span class="op">;</span></span></code></pre></div>
<p>Running this with the <span class="fancy">Co</span> interpreter results in the output:</p>
<pre class="plain"><code>3
4</code></pre>
<p>We can’t use the substitution model here because we can’t consider variables like <code>a</code> to be substitutable with single values anymore. Now, we must think of them more as places in which the values are stored. Also, the stored values may change over the lifetime of the program execution. We call this place where the variable values are stored, the <em>Environment</em>, and this understanding of program execution is called the <a href="https://web.archive.org/web/20210921/https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-21.html#%25_sec_3.2" target="_blank" rel="noopener"><em>Environment Model of Evaluation</em></a><span><sup><a href="#ref-Abelson1996-c32" class="citation" title="Abelson, Sussman, and with Julie Sussman, “The Environment Model of
Evaluation.”
">@7</a></sup>.</span></p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 497 65'%3E%3C/svg%3E" class="lazyload w-100pct mw-80pct nolink" style="--image-aspect-ratio: 7.6461538461538465" data-src="/images/implementing-co-2/env-model.svg" alt="Value of a variable may change over time"></img>
<noscript><img src="/images/implementing-co-2/env-model.svg" class="w-100pct mw-80pct nolink" alt="Value of a variable may change over time"></img></noscript>
<figcaption>Value of a variable may change over time</figcaption>
</figure>
<p>A pair of a variable’s name and its value at any particular time is called a <em>Binding</em>. An <em>Environment</em> is a collection of zero-or-more bindings. To fully understand environments, first we have to learn about scopes.</p>
<h2 data-track-content data-content-name="scopes" data-content-piece="implementing-co-2" id="scopes">Scopes</h2>
<p>Let’s consider the <code>twice</code> function again:</p>
<div class="sourceCode" id="cb6" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">twice</span>(x) { <span class="cf">return</span> x <span class="op">+</span> x<span class="op">;</span> }</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">twice</span>(<span class="dv">1</span>))<span class="op">;</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">twice</span>(<span class="dv">2</span>))<span class="op">;</span></span></code></pre></div>
<p>Calling <code>twice</code> with different arguments prints different results. The function seems to forget the value of its parameter <code>x</code> after each call. This may feel very natural to programmers, but how does it really work? The answer is <a href="https://en.wikipedia.org/wiki/Scope_(computer_science)" target="_blank" rel="noopener"><em>Scopes</em></a>.</p>
<p>A scope is a region of the program lifetime during which a variable name-to-value binding is in effect. When the program execution enters a scope, the variables in that scope become defined and available to the executing code<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>. When the program execution exits the scope, the variables become undefined and inaccessible (also known as “going out of scope”).</p>
<p><a href="https://en.wikipedia.org/wiki/Lexical_scoping" target="_blank" rel="noopener"><em>Lexical scoping</em></a> is a specific style of scoping where the structure of the program itself shows where a scope begins and ends<span><sup><a href="#ref-Abelson1996-c118" class="citation" title="Abelson, Sussman, and with Julie Sussman, “Procedures as Black-Box
Abstractions.”
">@9</a></sup>.</span> Like most modern languages, <span class="fancy">Co</span> is lexically scoped. A function in <span class="fancy">Co</span> starts a new scope which extends over the entire function body, and the scope ends when the function ends<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. Functions are the only way of creating new scopes in <span class="fancy">Co</span><a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>.</p>
<p>That’s how repeated invocation of functions don’t remember the values of their parameters across the calls. Every time a new call is started, a new scope is created with the parameter names bound to the value of the arguments of the call. And when the call returns, this new scope is destroyed.</p>
<p>Scopes can be enclosed within other scopes. In <span class="fancy">Co</span>, this can be done by defining a function inside the body of another function. All programs have at least one scope, which is the program’s top-level scope, often called the global scope.</p>
<p>Scopes are intimately related to the environment. In fact, the structure of the environment is how scopes are implemented<span><sup><a href="#ref-Abelson1996-c556" class="citation" title="Abelson, Sussman, and with Julie Sussman, “Lexical Addressing.”
">@12</a></sup>.</span></p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 377 193'%3E%3C/svg%3E" class="lazyload w-100pct mw-70pct nolink" style="--image-aspect-ratio: 1.9533678756476685" data-src="/images/implementing-co-2/twice-scopes.svg" alt="Scopes are implemented by the environment"></img>
<noscript><img src="/images/implementing-co-2/twice-scopes.svg" class="w-100pct mw-70pct nolink" alt="Scopes are implemented by the environment"></img></noscript>
<figcaption>Scopes are implemented by the environment</figcaption>
</figure>
<p>An environment can be thought of as a stack of frames, with one frame per enclosed scope<span><sup><a href="#ref-Abelson1996-c32" class="citation" title="Abelson, Sussman, and with Julie Sussman, “The Environment Model of
Evaluation.”
">@13</a></sup>.</span> A frame contains zero-or-more bindings. The bindings in enclosed scopes (frames higher in the environment stack) hide the bindings (called <a href="https://en.wikipedia.org/wiki/Variable_shadowing" target="_blank" rel="noopener">shadowing</a>) in enclosing scopes (frames lower in the environment stack). Program’s global scope is represented by the lowermost frame in the stack.</p>
<p>The above diagram shows the frames of the two calls to the <code>twice</code> function. The scope of the <code>twice</code> function is enclosed in the global scope. To find the value of a variable inside the function, the interpret first looks into the topmost frame that represents the scope of the <code>twice</code> function. If the binding is not found, then the interpreter goes down the stack of frames, and looks into the frame for the global scope.</p>
<p>What happens when a function body tries to access variables not defined in the function’s scope? We get <em>Closures</em>.</p>
<h2 data-track-content data-content-name="closures" data-content-piece="implementing-co-2" id="closures">Closures</h2>
<p>If a function body refers to variables not defined in the function’s scope, such variables are called <a href="https://en.wikipedia.org/wiki/Free_variables_and_bound_variables" target="_blank" rel="noopener"><em>Free Variables</em></a><span><sup><a href="#ref-Abelson1996-c118" class="citation" title="Abelson, Sussman, and with Julie Sussman, “Procedures as Black-Box
Abstractions.”
">@14</a></sup>.</span> In lexically scoped languages, the value of a free variable is determined from the scope in which the function is defined. A function along with the references to all its free variables, is called a <em><a href="https://en.wikipedia.org/wiki/Closure_(computer_programming)" target="_blank" rel="noopener">Closure</a></em><a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>.</p>
<p>Closures are prevalent in programming languages with first-class functions. <span class="fancy">Co</span>—with its support for first-class functions—also supports closures. Closures in <span class="fancy">Co</span> are mutable, meaning the values of the free variables of a function can change over time, and the changes are reflected in the behavior of the function<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<p>We already saw an example of closures earlier:</p>
<div class="sourceCode" id="cb7" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> a <span class="op">=</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">incA</span>() {</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> b <span class="op">=</span> a <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> b<span class="op">;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">incA</span>())<span class="op">;</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>a <span class="op">=</span> <span class="dv">3</span><span class="op">;</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="fu">incA</span>())<span class="op">;</span></span></code></pre></div>
<p>This is how the frames exist over time for the two invocations of the <code>incA</code> function:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 449 209'%3E%3C/svg%3E" class="lazyload w-100pct mw-80pct nolink" style="--image-aspect-ratio: 2.148325358851675" data-src="/images/implementing-co-2/incA-scopes.svg" alt="a is a free variable of the function incA"></img>
<noscript><img src="/images/implementing-co-2/incA-scopes.svg" class="w-100pct mw-80pct nolink" alt="a is a free variable of the function incA"></img></noscript>
<figcaption><code>a</code> is a free variable of the function <code>incA</code></figcaption>
</figure>
<p>Here, <code>a</code> is a free variable of the function <code>incA</code>. Its value is not present in the scope of <code>incA</code>, but is obtained from the global scope. When its value in the global scope changes later, the value returned by <code>incA</code> changes as well. In other words, <code>incA</code> and <code>a</code> together form a closure.</p>
<p>The following example demonstrates a closure with a mutable free variable and enclosed scopes:</p>
<div class="sourceCode" id="cb8" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">makeCounter</span>(name) {</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> count <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="kw">function</span> () {</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> count <span class="op">=</span> count <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" = "</span> <span class="op">+</span> count)<span class="op">;</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> }<span class="op">;</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> countA <span class="op">=</span> <span class="fu">makeCounter</span>(<span class="st">"a"</span>)<span class="op">;</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> countB <span class="op">=</span> <span class="fu">makeCounter</span>(<span class="st">"b"</span>)<span class="op">;</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="fu">countA</span>()<span class="op">;</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="fu">countA</span>()<span class="op">;</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a><span class="fu">countB</span>()<span class="op">;</span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a><span class="fu">countA</span>()<span class="op">;</span></span></code></pre></div>
<p>Here, both <code>name</code> and <code>count</code> are free variables referred in the returned function. While <code>name</code> is only read, <code>count</code> is changed in the body of the function.</p>
<p>Running the above code prints:</p>
<pre class="plain"><code>a = 1
a = 2
b = 1
a = 3</code></pre>
<p>Note that the two functions <code>countA</code> and <code>countB</code> refer to two different instances of the <code>count</code> variable, and are not affected by each other. In other words, <code>countA</code> and <code>countB</code> are two different closures for the same function.</p>
<p>Now for one last thing to learn about before we jump to the implementation: early returns.</p>
<h2 data-track-content data-content-name="early-returns" data-content-piece="implementing-co-2" id="early-returns">Early Returns</h2>
<p>Statement oriented programming languages often allow returning from a function before the entire function is done executing. This is called an <a href="https://en.wikipedia.org/wiki/Return_statement#Multiple_return_statements" target="_blank" rel="noopener"><em>Early return</em></a>. We saw an example of this in the fibonacci function in the previous post:</p>
<div class="sourceCode" id="cb10" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">fib</span>(n) {</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op"><</span> <span class="dv">2</span>) {</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> n<span class="op">;</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">2</span>)</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> <span class="op">+</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">1</span>)<span class="op">;</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>In the above code, when the input <code>n</code> is less than 2, the code returns early from the function at the line 3.</p>
<p>Expression oriented programming languages, like Haskell, have no early returns. Every function is an expression in Haskell, and has to be evaluated entirely<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a> to get back a value. Since our AST-walking interpreter itself is written in Haskell, we need to figure out how to support early returns in the <span class="fancy">Co</span> code being interpreted. The interpreter should be able to stop evaluating at an AST node representing a <code class="sourceCode javascript"><span class="cf">return</span></code> statement, and jump to the node representing the function’s caller.</p>
<p>One way to implement this is <a href="https://en.wikipedia.org/wiki/Exception_(computer_science)" target="_blank" rel="noopener"><em>Exceptions</em></a>. Exceptions let us abort the execution of code at any point of execution, and resume from some other point in the lower in the function call stack. Although Haskell <a href="https://hackage.haskell.org/package/base-4.12.0.0/docs/Control-Exception.html" target="_blank" rel="noopener">supports</a> exceptions as we know them from languages like Java and Python, it also supports exceptions as values using the <em><a href="https://hackage.haskell.org/package/mtl-2.2.2/docs/Control-Monad-Except.html#t:MonadError" target="_blank" rel="noopener">Error monad</a></em>. That’s what we will leverage for implementing early returns in our interpreter.</p>
<p>Finally, we are really to start implementing the interpreter.</p>
<h2 data-track-content data-content-name="the-interpreter" data-content-piece="implementing-co-2" id="the-interpreter">The Interpreter</h2>
<p>The interpreter is implemented as a Haskell <code class="sourceCode haskell"><span class="kw">newtype</span></code> over a stack of monad using the monad transformers and typeclasses from the <a href="https://hackage.haskell.org/package/mtl" target="_blank" rel="noopener">mtl</a> library:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="4-5,6:41-6:42,16:27-16:28,17-17"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Interpreter</span> a <span class="ot">=</span> <span class="dt">Interpreter</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> runInterpreter ::</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ExceptT</span> <span class="dt">Exception</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> (<span class="dt">ContT</span></span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> (<span class="dt">Either</span> <span class="dt">Exception</span> ())</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> (<span class="dt">StateT</span> <span class="dt">InterpreterState</span> <span class="dt">IO</span>)<span class="deemphasis">)</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> a</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Functor</span>,</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Applicative</span>,</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Monad</span>,</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadIO</span>,</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadBase</span> <span class="dt">IO</span>,</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadState</span> <span class="dt">InterpreterState</span>,</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">MonadError</span> <span class="dt">Exception</span><span class="deemphasis">,</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">MonadCont</span></span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>From bottom to top, the monad stack is comprised of:</p>
<ol type="1">
<li>the <a href="https://hackage.haskell.org/package/base-4.14.0.0/docs/System-IO.html#t:IO" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IO</span></code></a> monad to be able to print to the console,</li>
<li>the <a href="https://hackage.haskell.org/package/mtl-2.2.2/docs/Control-Monad-State-Strict.html#t:State" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">State</span></code></a> monad transformer to track the state of the interpreter, and</li>
<li>the <a href="https://hackage.haskell.org/package/mtl-2.2.2/docs/Control-Monad-Except.html#t:Except" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Except</span></code></a> monad transformer to propagate exceptions while interpreting the code.</li>
</ol>
<p>We model the environment as <a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html#t:Map" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Map</span></code></a> of variable names to <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html#t:IORef" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IORef</span></code></a>s of values:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Env</span> <span class="ot">=</span> <span class="dt">Map.Map</span> <span class="dt">Identifier</span> (<span class="dt">IORef</span> <span class="dt">Value</span>)</span></code></pre></div>
<p>The immutable nature of <code class="sourceCode haskell"><span class="dt">Map</span></code> and the mutable nature of <code class="sourceCode haskell"><span class="dt">IORef</span></code> allow us to correctly model scopes, frames and closures in <span class="fancy">Co</span>, as we see in the later sections of this post.</p>
<p>The interpreter state contains the environment used for interpretation. The state changes as variables come in and go out of scopes.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="2:17-2:18,3-3,7:56-7:68"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">InterpreterState</span> <span class="ot">=</span> <span class="dt">InterpreterState</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> isEnv ::</span> <span class="dt">Env</span><span class="deemphasis">,</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"><span class="ot"> isCoroutines ::</span> <span class="dt">Queue</span> (<span class="dt">Coroutine</span> ())</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="ot">initInterpreterState ::</span> <span class="dt">IO</span> <span class="dt">InterpreterState</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>initInterpreterState <span class="ot">=</span> <span class="dt">InterpreterState</span> <span class="op"><$></span> builtinEnv <span class="deemphasis"><span class="op"><*></span> newQueue</span></span></code></pre></div>
<p>Initial interpreter state contains the built-in environment with bindings for the built-in functions like <code>print</code>. In particular, <code>print</code> is implemented by the <code>executePrint</code> function, which we see in a later section. Note that, <a href="https://en.wikipedia.org/wiki/arity" target="_blank" rel="noopener">arity</a> of built-in functions is also encapsulated in them.</p>
<p><a id="print-func"></a></p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="4-10"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">builtinEnv ::</span> <span class="dt">IO</span> <span class="dt">Env</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>builtinEnv <span class="ot">=</span> Map.fromList <span class="op"><$></span> <span class="fu">traverse</span> (<span class="fu">traverse</span> newIORef) [</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> (<span class="st">"print"</span>, <span class="dt">BuiltinFunction</span> <span class="st">"print"</span> <span class="dv">1</span> executePrint)</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"newChannel"</span>,</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"newChannel"</span> <span class="dv">0</span> <span class="op">$</span> <span class="fu">fmap</span> <span class="dt">Chan</span> <span class="op">.</span> <span class="fu">const</span> (newChannel <span class="dv">0</span>))</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"newBufferedChannel"</span>,</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"newBufferedChannel"</span> <span class="dv">1</span> executeNewBufferedChannel)</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"sleep"</span>, <span class="dt">BuiltinFunction</span> <span class="st">"sleep"</span> <span class="dv">1</span> executeSleep)</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> , (<span class="st">"getCurrentMillis"</span>,</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">BuiltinFunction</span> <span class="st">"getCurrentMillis"</span> <span class="dv">0</span> executeGetCurrentMillis)</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>When trying to interpret wrong code like <code class="sourceCode javascript"><span class="dv">1</span> <span class="op">+</span> <span class="kw">true</span></code>, the interpreter throws runtime errors. We roll these errors along with early returns into an ADT for exceptions:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="4-4"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Exception</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">Return</span> <span class="dt">Value</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">RuntimeError</span> <span class="dt">String</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">|</span> <span class="dt">CoroutineQueueEmpty</span></span></span></code></pre></div>
<p>That’s it for defining the types for the interpreter. Next, we implement the functions to interpret <span class="fancy">Co</span> programs, starting with functions to work with environments.</p>
<h2 data-track-content data-content-name="manipulating-environments" data-content-piece="implementing-co-2" id="manipulating-environments">Manipulating Environments</h2>
<p>In <span class="fancy">Co</span>, variables must be initialized when being defined. Additionally, only the already defined variables can be referenced or assigned.</p>
<p>To define a new variable, we create a new <code class="sourceCode haskell"><span class="dt">IORef</span></code> with the variable’s value, insert it in the current environment map with the variable name as the key, and replace the interpreter state with the new environment map.</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">defineVar ::</span> <span class="dt">Identifier</span> <span class="ot">-></span> <span class="dt">Value</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>defineVar name value <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> Map.member name env</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> throw <span class="op">$</span> <span class="st">"Variable already defined: "</span> <span class="op"><></span> name</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> valueRef <span class="ot"><-</span> newIORef value</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> setEnv <span class="op">$</span> Map.insert name valueRef env</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a><span class="ot">setEnv ::</span> <span class="dt">Env</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>setEnv env <span class="ot">=</span> State.modify' <span class="op">$</span> \is <span class="ot">-></span> is {isEnv <span class="ot">=</span> env}</span></code></pre></div>
<p>Note that trying to redefine an already defined variable throws a runtime error.</p>
<p>We also create a helper function <code>setEnv</code> that we reuse in later sections.</p>
<p>To lookup and assign a variable, we get the current environment, lookup the <code class="sourceCode haskell"><span class="dt">IORef</span></code> in the map by the variable’s name, and then read the <code class="sourceCode haskell"><span class="dt">IORef</span></code> for lookup, or write the new value to it for assignment.</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">lookupVar ::</span> <span class="dt">Identifier</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>lookupVar name <span class="ot">=</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> State.gets isEnv <span class="op">>>=</span> findValueRef name <span class="op">>>=</span> readIORef</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="ot">assignVar ::</span> <span class="dt">Identifier</span> <span class="ot">-></span> <span class="dt">Value</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>assignVar name value <span class="ot">=</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> State.gets isEnv <span class="op">>>=</span> findValueRef name <span class="op">>>=</span> <span class="fu">flip</span> writeIORef value</span></code></pre></div>
<p>We use the helper function <code>findValueRef</code> to lookup a variable name in the environment map. It throws a runtime error if the variable is not already defined.</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">findValueRef ::</span> <span class="dt">Identifier</span> <span class="ot">-></span> <span class="dt">Env</span> <span class="ot">-></span> <span class="dt">Interpreter</span> (<span class="dt">IORef</span> <span class="dt">Value</span>)</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>findValueRef name env <span class="ot">=</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> Map.lookup name env <span class="kw">of</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> ref <span class="ot">-></span> <span class="fu">return</span> ref</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Unknown variable: "</span> <span class="op"><></span> name</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="ot">throw ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Interpreter</span> a</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>throw <span class="ot">=</span> throwError <span class="op">.</span> <span class="dt">RuntimeError</span></span></code></pre></div>
<p>These functions are enough for us to implement the evaluation of expressions and execution of statements.</p>
<h2 data-track-content data-content-name="evaluating-expressions" data-content-piece="implementing-co-2" id="evaluating-expressions">Evaluating Expressions</h2>
<p><span class="fancy">Co</span> expressions are represented by the <a href="#cb1-1"><code class="sourceCode haskell"><span class="dt">Expr</span></code></a> ADT. The <code>evaluate</code> function below shows how they are evaluated to runtime values.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="11-13"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evaluate ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>evaluate <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">LNull</span> <span class="ot">-></span> <span class="fu">pure</span> <span class="dt">Null</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">LBool</span> bool <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> bool</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">LStr</span> str <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Str</span> str</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">LNum</span> num <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> num</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Variable</span> v <span class="ot">-></span> lookupVar v</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Lambda</span> params body <span class="ot">-></span> <span class="dt">Function</span> <span class="st">"<lambda>"</span> params body <span class="op"><$></span> State.gets isEnv</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> binary<span class="op">@</span><span class="dt">Binary</span> {} <span class="ot">-></span> evaluateBinaryOp binary</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> call<span class="op">@</span><span class="dt">Call</span> {} <span class="ot">-></span> evaluateFuncCall call</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Receive</span> expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> \<span class="kw">case</span></span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Chan</span> channel <span class="ot">-></span> channelReceive channel</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> val <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot receive from a non-channel: "</span> <span class="op"><></span> <span class="fu">show</span> val</span></span></code></pre></div>
<p>Literals <code class="sourceCode javascript"><span class="kw">null</span></code>, booleans, strings, and numbers evaluate to themselves. Variables are looked up from the environment using the <code>lookupVar</code> function we wrote earlier. Anonymous functions are evaluated to function values that capture the current environment from the interpreter state. We learn more about function definitions and calls in the next section. Binary operations and function call expressions are handled by helper functions explained below.</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evaluateBinaryOp ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>evaluateBinaryOp <span class="op">~</span>(<span class="dt">Binary</span> op leftE rightE) <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> left <span class="ot"><-</span> evaluate leftE</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> right <span class="ot"><-</span> evaluate rightE</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> errMsg msg <span class="ot">=</span> msg <span class="op"><></span> <span class="st">": "</span> <span class="op"><></span> <span class="fu">show</span> left <span class="op"><></span> <span class="st">" and "</span> <span class="op"><></span> <span class="fu">show</span> right</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> (op, left, right) <span class="kw">of</span></span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Plus</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> <span class="op">$</span> n1 <span class="op">+</span> n2</span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Plus</span>, <span class="dt">Str</span> s1, <span class="dt">Str</span> s2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Str</span> <span class="op">$</span> s1 <span class="op"><></span> s2</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Plus</span>, <span class="dt">Str</span> s1, _) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Str</span> <span class="op">$</span> s1 <span class="op"><></span> <span class="fu">show</span> right</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Plus</span>, _, <span class="dt">Str</span> s2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Str</span> <span class="op">$</span> <span class="fu">show</span> left <span class="op"><></span> s2</span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Plus</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot add or append"</span></span>
<span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Minus</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> <span class="op">$</span> n1 <span class="op">-</span> n2</span>
<span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Minus</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot subtract non-numbers"</span></span>
<span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Slash</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> <span class="op">$</span> n1 <span class="ot">`div`</span> n2</span>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Slash</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot divide non-numbers"</span></span>
<span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Star</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Num</span> <span class="op">$</span> n1 <span class="op">*</span> n2</span>
<span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Star</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot multiply non-numbers"</span></span>
<span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a> (<span class="dt">LessThan</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> <span class="op">$</span> n1 <span class="op"><</span> n2</span>
<span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a> (<span class="dt">LessThan</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot compare non-numbers"</span></span>
<span id="cb15-24"><a href="#cb15-24" aria-hidden="true" tabindex="-1"></a> (<span class="dt">GreaterThan</span>, <span class="dt">Num</span> n1, <span class="dt">Num</span> n2) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> <span class="op">$</span> n1 <span class="op">></span> n2</span>
<span id="cb15-25"><a href="#cb15-25" aria-hidden="true" tabindex="-1"></a> (<span class="dt">GreaterThan</span>, _, _) <span class="ot">-></span> throw <span class="op">$</span> errMsg <span class="st">"Cannot compare non-numbers"</span></span>
<span id="cb15-26"><a href="#cb15-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-27"><a href="#cb15-27" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Equals</span>, _, _) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> <span class="op">$</span> left <span class="op">==</span> right</span>
<span id="cb15-28"><a href="#cb15-28" aria-hidden="true" tabindex="-1"></a> (<span class="dt">NotEquals</span>, _, _) <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">Boolean</span> <span class="op">$</span> left <span class="op">/=</span> right</span></code></pre></div>
<p>To evaluate a binary operation, first we recursively evaluate its left and right operands by calling <code>evaluate</code> on them. Then, depending on the operation and types of the operands, we do different things.</p>
<ul>
<li>The <code class="sourceCode javascript"><span class="op">+</span></code> operation can be used to either add two numbers, or to concat two operands when one or both of them are strings. In all other cases, it throws runtime errors.</li>
<li>The <code class="sourceCode javascript"><span class="op">-</span></code>, <code class="sourceCode haskell"><span class="op">/</span></code>, <code class="sourceCode haskell"><span class="op">*</span></code>, <code class="sourceCode javascript"><span class="op">></span></code>, and <code class="sourceCode javascript"><span class="op"><</span></code> operations can be invoked only on numbers. Other cases throw runtime errors.</li>
<li>The <code class="sourceCode javascript"><span class="op">==</span></code> and <code class="sourceCode javascript"><span class="op">!=</span></code> operations run corresponding Haskell operations on their operands.</li>
</ul>
<p>That’s all for evaluating binary operations. Next, let’s look at how to execute statements. We come back to evaluating function calls after that.</p>
<h2 data-track-content data-content-name="executing-statements" data-content-piece="implementing-co-2" id="executing-statements">Executing Statements</h2>
<p><span class="fancy">Co</span> statements are represented by the <a href="#cb2-1"><code class="sourceCode haskell"><span class="dt">Stmt</span></code></a> ADT. The <code>execute</code> function below uses a case expression to execute the various types of statements in different ways:</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="21-27"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">execute ::</span> <span class="dt">Stmt</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>execute <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ExprStmt</span> expr <span class="ot">-></span> void <span class="op">$</span> evaluate expr</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">VarStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> defineVar name</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">AssignStmt</span> name expr <span class="ot">-></span> evaluate expr <span class="op">>>=</span> assignVar name</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">IfStmt</span> expr body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> while<span class="op">@</span>(<span class="dt">WhileStmt</span> expr body) <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> cond <span class="ot"><-</span> evaluate expr</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> when (isTruthy cond) <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> traverse_ execute body</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> execute while</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">ReturnStmt</span> mExpr <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> mRet <span class="ot"><-</span> <span class="fu">traverse</span> evaluate mExpr</span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> throwError <span class="op">.</span> <span class="dt">Return</span> <span class="op">.</span> fromMaybe <span class="dt">Null</span> <span class="op">$</span> mRet</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">FunctionStmt</span> name params body <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> defineVar name <span class="op">$</span> <span class="dt">Function</span> name params body env</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">YieldStmt</span> <span class="ot">-></span> yield</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">SpawnStmt</span> expr <span class="ot">-></span> spawn expr</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">SendStmt</span> expr chan <span class="ot">-></span> evaluate chan <span class="op">>>=</span> \<span class="kw">case</span></span></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Chan</span> channel <span class="ot">-></span> <span class="kw">do</span></span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> val <span class="ot"><-</span> evaluate expr</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> channelSend val channel</span></span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> v <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot send to a non-channel: "</span> <span class="op"><></span> <span class="fu">show</span> v</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a> isTruthy <span class="ot">=</span> \<span class="kw">case</span></span>
<span id="cb1-30"><a href="#cb1-30" aria-hidden="true" tabindex="-1"></a> <span class="dt">Null</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb1-31"><a href="#cb1-31" aria-hidden="true" tabindex="-1"></a> <span class="dt">Boolean</span> b <span class="ot">-></span> b</span>
<span id="cb1-32"><a href="#cb1-32" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">True</span></span></code></pre></div>
<p>Expressions in expression statements are evaluated by calling <code>evaluate</code> on them, and the resultant values are discarded.</p>
<p>For variable definition and assignment statements, first we evaluate the value expressions, and then define or assign variables with the given variable names and the resultant values.</p>
<p>For <code class="sourceCode javascript"><span class="cf">if</span></code> statements, first we evaluate their conditions, and if conditions yield truthy<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a> values, we recursively execute the statement bodies. <code class="sourceCode javascript"><span class="cf">while</span></code> statements are executed in a similar fashion, except we recursively execute the <code class="sourceCode javascript"><span class="cf">while</span></code> statements again after executing their bodies.</p>
<p>For <code class="sourceCode javascript"><span class="cf">return</span></code> statements, we evaluate their optional return value expressions, and then throw the resultant values as exceptions wrapped with the <code class="sourceCode haskell"><span class="dt">Return</span></code> constructor.</p>
<p>Execution of function statements is more interesting. First thing that we do is to capture the current environment from the interpreter state. Then we define a new variable<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a> with the function’s name and a runtime function value that encapsulates the function’s name, parameter names, and body statements, as well as, the captured environment. This is how closures record the values of functions’ free variables from their definition contexts.</p>
<p>In the next section, we see how the captured environments and returns as exceptions are used to evaluate function calls.</p>
<h2 data-track-content data-content-name="evaluating-function-calls" data-content-piece="implementing-co-2" id="evaluating-function-calls">Evaluating Function Calls</h2>
<p>The capability of defining and calling functions is the cornerstone of abstraction in programming languages. In <span class="fancy">Co</span>, functions are first-class, and are also the means of implementing scopes and closures. Named functions support recursion<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a> as well. Hence, this section is the most important and involved one.</p>
<p>We start by evaluating the callee expression of the function call.</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evaluateFuncCall ::</span> <span class="dt">Expr</span> <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>evaluateFuncCall <span class="op">~</span>(<span class="dt">Call</span> callee argEs) <span class="ot">=</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> evaluate callee <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">BuiltinFunction</span> name arity func <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> checkArgCount name argEs arity</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> func argEs</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> func<span class="op">@</span><span class="dt">Function</span> {} <span class="ot">-></span> evaluateFuncCall' func argEs</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> val <span class="ot">-></span> throw <span class="op">$</span> <span class="st">"Cannot call a non-function: "</span> <span class="op"><></span> <span class="fu">show</span> callee <span class="op"><></span> <span class="st">" is "</span> <span class="op"><></span> <span class="fu">show</span> val</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a><span class="ot">checkArgCount ::</span> <span class="dt">Identifier</span> <span class="ot">-></span> [<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Interpreter</span> ()</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a>checkArgCount funcName argEs arity <span class="ot">=</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> when (<span class="fu">length</span> argEs <span class="op">/=</span> arity) <span class="op">$</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> throw <span class="op">$</span> funcName <span class="op"><></span> <span class="st">" call expected "</span> <span class="op"><></span> <span class="fu">show</span> arity</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">" argument(s) but received "</span> <span class="op"><></span> <span class="fu">show</span> (<span class="fu">length</span> argEs)</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a><span class="ot">executePrint ::</span> [<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a>executePrint argEs <span class="ot">=</span></span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a> evaluate (<span class="fu">head</span> argEs) <span class="op">>>=</span> liftIO <span class="op">.</span> <span class="fu">print</span> <span class="op">>></span> <span class="fu">return</span> <span class="dt">Null</span></span></code></pre></div>
<p>If the resultant value is not a function, we throw a runtime error.</p>
<p>If we get a built-in function, we check that the count of arguments is same as the arity of the function by invoking <code>checkArgCount</code>, failing which we throw a runtime error. Then, we invoke the corresponding implementation function. For <code>print</code>, it is the <code>executePrint</code> function, in which we evaluate the argument and print it using Haskell’s <a href="https://hackage.haskell.org/package/base-4.12.0.0/docs/Prelude.html#v:print" target="_blank" rel="noopener"><code>print</code></a> function.</p>
<p>If we get a user-defined function, we evaluate the function call with the helper function <code>evaluateFuncCall'</code>. But before diving into it, let’s take a look at how the world looks from inside a function.</p>
<div class="sourceCode" id="cb17" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">makeGreeter</span>(greeting) {</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span> <span class="fu">greeter</span>(name) {</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> say <span class="op">=</span> greeting <span class="op">+</span> <span class="st">" "</span> <span class="op">+</span> name<span class="op">;</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(say)<span class="op">;</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> greeter<span class="op">;</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> hello <span class="op">=</span> <span class="fu">makeGreeter</span>(<span class="st">"hello"</span>)<span class="op">;</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> namaste <span class="op">=</span> <span class="fu">makeGreeter</span>(<span class="st">"namaste"</span>)<span class="op">;</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a><span class="fu">hello</span>(<span class="st">"Arthur"</span>)<span class="op">;</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="fu">namaste</span>(<span class="st">"Ford"</span>)<span class="op">;</span></span></code></pre></div>
<p>In the above <span class="fancy">Co</span> code, the function <code>greeter</code> has a free variable <code>greeting</code>, a bound parameter <code>name</code>, and a local variable <code>say</code>. Upon executing the code with the interpreter, we get the following output:</p>
<pre class="plain"><code>hello Arthur
namaste Ford</code></pre>
<p>The output makes sense when we understand the variables <code>hello</code> and <code>namaste</code> are closures over the function <code>greeter</code>. The environment seen from inside <code>greeter</code> when it is being executed is a mix of the scope (and hence, the environment) it is defined in, and the scope it is called in.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 465 241'%3E%3C/svg%3E" class="lazyload w-100pct mw-70pct nolink" style="--image-aspect-ratio: 1.929460580912863" data-src="/images/implementing-co-2/function-eval.svg" alt="Function environment is a mix of its caller and definition environments"></img>
<noscript><img src="/images/implementing-co-2/function-eval.svg" class="w-100pct mw-70pct nolink" alt="Function environment is a mix of its caller and definition environments"></img></noscript>
<figcaption>Function environment is a mix of its caller and definition environments</figcaption>
</figure>
<p>More specifically, the free variables come from the definition scope, and the parameters come from the caller scope. Local variables can be derived from any combinations of free variables and parameters. With this understanding, let’s see how we evaluate function calls:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">evaluateFuncCall' ::</span> <span class="dt">Value</span> <span class="ot">-></span> [<span class="dt">Expr</span>] <span class="ot">-></span> <span class="dt">Interpreter</span> <span class="dt">Value</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>evaluateFuncCall'</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="op">~</span>func<span class="op">@</span>(<span class="dt">Function</span> funcName params body funcDefEnv) argEs <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> checkArgCount funcName argEs (<span class="fu">length</span> params)</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> funcCallEnv <span class="ot"><-</span> State.gets isEnv</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> setupFuncEnv</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> retVal <span class="ot"><-</span> executeBody funcCallEnv</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a> setEnv funcCallEnv</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> retVal</span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a> setupFuncEnv <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a> args <span class="ot"><-</span> <span class="fu">traverse</span> evaluate argEs</span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a> env <span class="ot"><-</span> overrideVar funcDefEnv funcName func</span>
<span id="cb19-14"><a href="#cb19-14" aria-hidden="true" tabindex="-1"></a> env' <span class="ot"><-</span> foldM (<span class="fu">uncurry</span> <span class="op">.</span> overrideVar) env <span class="op">$</span> <span class="fu">zip</span> params args</span>
<span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a> setEnv env'</span>
<span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a> overrideVar env name value <span class="ot">=</span></span>
<span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a> Map.insert name <span class="op"><$></span> newIORef value <span class="op"><*></span> <span class="fu">pure</span> env</span>
<span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a> executeBody funcCallEnv <span class="ot">=</span></span>
<span id="cb19-21"><a href="#cb19-21" aria-hidden="true" tabindex="-1"></a> (traverse_ execute body <span class="op">>></span> <span class="fu">return</span> <span class="dt">Null</span>) <span class="ot">`catchError`</span> \<span class="kw">case</span></span>
<span id="cb19-22"><a href="#cb19-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Return</span> val <span class="ot">-></span> <span class="fu">return</span> val</span>
<span id="cb19-23"><a href="#cb19-23" aria-hidden="true" tabindex="-1"></a> err <span class="ot">-></span> setEnv funcCallEnv <span class="op">>></span> throwError err</span></code></pre></div>
<p>Let’s go over the above code, step by step:</p>
<ol type="1">
<li><code>evaluateFuncCall'</code> is called with the function to evaluate. We get access to the function’s name, its parameter names, body statements, and the environment it is defined in. We also get the argument expressions for the function call. (Line 2–3)</li>
<li>First, we check that the count of arguments is same as the count of the function parameter by invoking <code>checkArgCount</code>, failing which we throw a runtime error. (Line 4)</li>
<li>Then, we capture the current environment from the interpreter state. This is the function’s caller’s environment. (Line 5)</li>
<li>Next, we set up the environment in which the function will be executed (line 6). In <code>setupFuncEnv</code>:
<ol type="a">
<li>We evaluate the argument expressions in the current (caller’s) environment<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a>. (Line 12)</li>
<li>We bind the callee function itself to its name in its own environment. This lets our function to recursively call itself. (Line 13)</li>
<li>We bind the argument values to their parameter names in the function’s environment. This lets the function body access the arguments being called with. (Line 14)</li>
<li>We set the current environment in the interpreter state to the functions’s environment. (Line 15)</li>
</ol></li>
<li>With the function environment set up, we execute the function body in <code>executeBody</code> (line 7):
<ol type="a">
<li>We execute each statement in the body, and return <code class="sourceCode javascript"><span class="kw">null</span></code> in case there was no explicit <code class="sourceCode javascript"><span class="cf">return</span></code> in the function. (Line 21)</li>
<li>If the body contains a <code class="sourceCode javascript"><span class="cf">return</span></code> statement, or if its execution throws a runtime error, we handle the exception in the <code>catchError</code> case statement.
<ol type="i">
<li>For <code class="sourceCode javascript"><span class="cf">return</span></code>, we pass along the return value. (Line 22)</li>
<li>For a runtime error, first we set the current environment back to the caller’s environment that we captured in step 3, and then we throw the error. The error is eventually handled in the <code>interpret</code> function described in the next section. (Line 23)</li>
</ol></li>
<li>We capture the value returned from executing the body. (Line 7)</li>
</ol></li>
<li>We set the current environment back to the caller’s environment that we captured in step 3. (Line 8)</li>
<li>We return the captured return value from <code>evaluateFuncCall'</code>. The function call is now complete. (Line 9)</li>
</ol>
<p>Curious readers may wonder, why do we need to use State monad, <a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html#t:Map" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Map</span></code></a>s, and <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html#t:IORef" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">IORef</span></code></a>s together, when all of them do similar work of storing and mutating variables? Because, together they let us implement function calls, scopes and closures, as described below:</p>
<ol type="1">
<li>State monad lets us swap the current environment for a function’s definition environment when a function call is made, and to restore the calling environment after the call is complete.</li>
<li>Immutable maps are perfect for implementing scopes. Adding variables in an immutable map returns a modified map without changing the original map. This lets us shadow variables defined in outer scopes when entering inner scopes, while also being able to easily restore the shadowed variables by just restoring the original map after the inner scopes end<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a>. There is no need to use a stack of mutable maps, which is how environments are generally implemented in interpreters which do not use immutable maps.</li>
<li>Lastly, putting <code class="sourceCode haskell"><span class="dt">IORef</span></code>s as values of immutable maps lets us implement mutable closures. All closures of same function share the same references to the <code class="sourceCode haskell"><span class="dt">IORef</span></code>s. This allows variable mutations made from one closure to be visible to all others. If we had used just immutable maps, changes made to variable values would not propagate between closures because of immutability.</li>
</ol>
<p>So that’s how function calls—the most crucial part of the interpreter—work. That completes the guts of our interpreter for the <a href="#previously-on">basic features</a> of <span class="fancy">Co</span>. In the next and last section, we put everything together.</p>
<h2 data-track-content data-content-name="interpreting-a-program" data-content-piece="implementing-co-2" id="interpreting-a-program">Interpreting a Program</h2>
<p>We are down to the last step. We interpret a program returned from the parser written in the <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">previous post</a> to run it.</p>
<div id="cb1" class="sourceCode" data-lang="haskell" data-deemphasize="5-5,8:33-8:54,8:7-8:8,12-12"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">interpret ::</span> <span class="dt">Program</span> <span class="ot">-></span> <span class="dt">IO</span> (<span class="dt">Either</span> <span class="dt">String</span> ())</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>interpret program <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> state <span class="ot"><-</span> initInterpreterState</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> retVal <span class="ot"><-</span> <span class="fu">flip</span> evalStateT state</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="op">.</span> <span class="fu">flip</span> runContT <span class="fu">return</span></span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runExceptT</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> runInterpreter</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> <span class="deemphasis">(</span>traverse_ execute program<span class="deemphasis"> <span class="op">>></span> awaitTermination)</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> retVal <span class="kw">of</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">RuntimeError</span> err) <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Left</span> err</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> (<span class="dt">Return</span> _) <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Left</span> <span class="st">"Cannot return from outside functions"</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="deemphasis"> <span class="dt">Left</span> <span class="dt">CoroutineQueueEmpty</span> <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Right</span> ()</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> _ <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="dt">Right</span> ()</span></code></pre></div>
<p>We run the list of statements in the program by running the <code>execute</code> function on them. Then we run the monad transformer stack, layer by layer, to get the return value. Finally, we case match on the return value to catch errors, and we are done.</p>
<p>We package the parser and the interpreter together to create the <code>runFile</code> function that takes a file path, reads and parses the file, and then interprets the AST:</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runFile ::</span> <span class="dt">FilePath</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>runFile file <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> code <span class="ot"><-</span> <span class="fu">readFile</span> file</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser program code <span class="kw">of</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> hPutStrLn stderr err</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> program <span class="ot">-></span> interpret program <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> hPutStrLn stderr <span class="op">$</span> <span class="st">"ERROR: "</span> <span class="op"><></span> err</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">return</span> ()</span></code></pre></div>
<p>Finally, we can run the interpreter on the <span class="fancy">Co</span> files:</p>
<div class="sourceCode" id="cb21" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runFile <span class="st">"fib.co"</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>0</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>1</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>1</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a>2</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a>3</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a>5</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a>0</span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a>1</span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a>1</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a>2</span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a>3</span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a>5</span></code></pre></div>
<hr></hr>
<p>That’s all for now. We implemented the interpreter for the <a href="#previously-on">basic features</a> for <span class="fancy">Co</span>, and learned about how function calls, scopes and closures work. In the <a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">next part</a>, we’ll extend our interpreter to add support for coroutines and channels in <span class="fancy">Co</span>.</p>
<p>The full code for the interpreter can be seen <a href="https://abhinavsarkar.net/code/co-interpreter.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
<div id="ref-Abelson1996-c556" class="csl-entry" role="listitem">
Abelson, Harold, Gerald Jay Sussman, and with Julie Sussman. <span>“Lexical Addressing.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-35.html#%_sec_5.5.6" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-35.html#%_sec_5.5.6</a>.
</div>
<div id="ref-Abelson1996-c4" class="csl-entry" role="listitem">
———. <span>“Metalinguistic Abstraction.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-25.html#%_chap_4" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-25.html#%_chap_4</a>.
</div>
<div id="ref-Abelson1996-c421" class="csl-entry" role="listitem">
———. <span>“Normal Order and Applicative Order.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-27.html#%_sec_4.2.1" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-27.html#%_sec_4.2.1</a>.
</div>
<div id="ref-Abelson1996-c118" class="csl-entry" role="listitem">
———. <span>“Procedures as Black-Box Abstractions.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-10.html#%_sec_1.1.8" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-10.html#%_sec_1.1.8</a>.
</div>
<div id="ref-Abelson1996-c313" class="csl-entry" role="listitem">
———. <span>“The Costs of Introducing Assignment.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-20.html#%_sec_3.1.3" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-20.html#%_sec_3.1.3</a>.
</div>
<div id="ref-Abelson1996-c32" class="csl-entry" role="listitem">
———. <span>“The Environment Model of Evaluation.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-21.html#%_sec_3.2" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-21.html#%_sec_3.2</a>.
</div>
<div id="ref-Abelson1996-c115" class="csl-entry" role="listitem">
———. <span>“The Substitution Model for Procedure Application.”</span> In <em>Structure and Interpretation of Computer Programs</em>, 2nd Editon. MIT Press/McGraw-Hill, 1996. <a href="https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-10.html#%_sec_1.1.5" target="_blank" rel="noopener">https://mitp-content-server.mit.edu/books/content/sectbyfn/books_pres_0/6515/sicp.zip/full-text/book/book-Z-H-10.html#%_sec_1.1.5</a>.
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>It’s hard to find examples of real-world programming languages that are run with AST interpreters. This is because AST interpreters are too slow for real-world usage. However, they are the easiest to understand and implement, and hence are widely using in teaching programming languages theory.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Since the user-defined and built-in functions are first-class, they can be assigned to variables, and passed as arguments to other functions. Thus, <span class="fancy">Co</span> supports <a href="https://en.wikipedia.org/wiki/higher-order_functions" target="_blank" rel="noopener">higher-order functions</a> as well.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>This is called <a href="https://en.wikipedia.org/wiki/Strong_typing" target="_blank" rel="noopener">Strong typing</a> in programming languages parlance. JavaScript, on the other hand, is a weakly typed language. In JavaScript, <code class="sourceCode javascript"><span class="dv">1</span> <span class="op">==</span> <span class="st">'1'</span></code> evaluates to <code class="sourceCode javascript"><span class="kw">true</span></code>, whereas in <span class="fancy">Co</span>, it evaluates to <code class="sourceCode javascript"><span class="kw">false</span></code>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>The property of being able to substitute expressions for their corresponding values without changing the meaning of the program is called <a href="https://en.wikipedia.org/wiki/Referential_transparency" target="_blank" rel="noopener"><em>Referential transparency</em></a><sup><a href="#ref-Abelson1996-c313" class="citation" title="(Abelson, Sussman, and with Julie Sussman, “The Costs of Introducing
Assignment”)
">@6</a></sup>. Pure functions—like <code>twice</code> here—that do not have any side-effects are referentially transparent.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>I’m being a little hand-wavy here because most programmers have at least an intuitive understanding of scopes. Read literature for accurate details.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>This is in contrast to <a href="https://en.wikipedia.org/wiki/Dynamic_scoping" target="_blank" rel="noopener"><em>Dynamic scoping</em></a> where the a variable’s scope is essentially global, and is defined by function’s execution context instead of definition context, as in lexical scoping.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p><a href="https://en.wikipedia.org/wiki/Block_(programming)" target="_blank" rel="noopener"><em>Blocks</em></a> are another widely used structure to support lexical scoping. <span class="fancy">Co</span> doesn’t have blocks in the interest of simplicity of implementation.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>The function is said to close its free variables over its closure. Hence, the name <em>Closure</em>.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Some programming languages like Java support a limited version of closures, which requires values of the free variables of functions to not change over time.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>Well, not entirely, because Haskell is a lazily evaluated language.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>In <span class="fancy">Co</span>, only <code class="sourceCode javascript"><span class="kw">null</span></code> and <code class="sourceCode javascript"><span class="kw">false</span></code> evaluate to false. All other values evaluate to true. This is implemented by the <code>isTruthy</code> function.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>Functions are just variables in <span class="fancy">Co</span>. That is to say, functions definitions and variable definitions share the same namespace. This is how it works in many programming languages like JavaScript and Python. But some languages like <a href="https://en.wikipedia.org/wiki/Common_Lisp" target="_blank" rel="noopener">Common Lisp</a> have <a href="https://en.wikipedia.org/wiki/Common_Lisp#The_function_namespace" target="_blank" rel="noopener">separate namespaces</a> for functions and variables.<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn13"><p><span class="fancy">Co</span> does not support mutual recursion though. This is because a function in <span class="fancy">Co</span> only sees the bindings done before its own definition. This can be fixed by either adding a special syntax for mutually recursive functions, or by hoisting all the bindings in a scope to the top of the scope, like <a href="https://developer.mozilla.org/en-US/docs/Glossary/Hoisting" target="_blank" rel="noopener">how JavaScript does</a>.</p>
<p>Anonymous functions do not support recursion at all, because they do not have names to refer to themselves in their bodies.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn14"><p>Evaluating function arguments before the function body is called the <a href="https://en.wikipedia.org/wiki/Evaluation_strategy#Eager_evaluation" target="_blank" rel="noopener"><em>Strict evaluation strategy</em></a>. Most of the modern programming languages work this way, for example, Java, Python, JavaScript, Ruby etc. This is in contrast to <a href="https://en.wikipedia.org/wiki/Evaluation_strategy#Non-strict_evaluation" target="_blank" rel="noopener"><em>Non-strict evaluation</em></a> in programming languages like Haskell, where the arguments to functions are evaluated only when their values are needed in the function bodies<sup><a href="#ref-Abelson1996-c421" class="citation" title="(Abelson, Sussman, and with Julie Sussman, “Normal Order and Applicative
Order”)
">@21</a></sup>.<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn15"><p>This is what the <code>overrideVar</code> function does in the code above.<a href="#fnref15" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">The Parser</a>
</li>
<li>
<strong>The Interpreter</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2021-09-21T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/implementing-co-1/">previous post</a>, we wrote the parser for <span class="fancy">Co</span>, the small language we are building in this series of posts. The previous post was all about the syntax of <span class="fancy">Co</span>. In this post we dive into the semantics of <span class="fancy">Co</span>, and write an interpreter for its basic features.</p>
https://abhinavsarkar.net/posts/implementing-co-1/ Implementing Co, a Small Language With Coroutines #1: The Parser 2021-04-24T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>Many major programming languages these days support some lightweight concurrency primitives. The most recent popular ones are <a href="https://en.wikipedia.org/wiki/Go_(programming_language)#Concurrency:_goroutines_and_channels" target="_blank" rel="noopener">Goroutines</a> in <a href="https://golang.org/" target="_blank" rel="noopener">Go</a>, <a href="https://kotlinlang.org/docs/coroutines-basics.html" target="_blank" rel="noopener">Coroutines</a> in <a href="https://kotlinlang.org/" target="_blank" rel="noopener">Kotlin</a> and <a href="https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html" target="_blank" rel="noopener">Async</a> in <a href="https://www.rust-lang.org/" target="_blank" rel="noopener">Rust</a>. Let’s explore some of these concepts in detail by implementing a programming language with support for coroutines and Go-style channels.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<strong>The Parser</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#lightweight-concurrency">Lightweight Concurrency</a></li><li><a href="#introducing-co">Introducing <span class="fancy">Co</span></a></li><li><a href="#the-co-interpreter">The <span class="fancy">Co</span> Interpreter</a></li><li><a href="#the-co-ast">The <span class="fancy">Co</span> AST</a><ol><li><a href="#expressions">Expressions</a></li><li><a href="#statements">Statements</a></li></ol></li><li><a href="#parsing">Parsing</a><ol><li><a href="#parsing-expressions">Parsing Expressions</a></li><li><a href="#parsing-statements">Parsing Statements</a></li></ol></li></ol></nav>
<h2 data-track-content data-content-name="lightweight-concurrency" data-content-piece="implementing-co-1" id="lightweight-concurrency">Lightweight Concurrency</h2>
<p><a href="https://en.wikipedia.org/wiki/Light-weight_process" target="_blank" rel="noopener">Lightweight concurrency</a> has been a popular topic among programmers and programming language designers alike in recent times. Many languages created in the last decade have support for them either natively or using libraries. Some example are:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Go_(programming_language)#Concurrency:_goroutines_and_channels" target="_blank" rel="noopener">Goroutines</a> in <a href="https://golang.org/" target="_blank" rel="noopener">Go</a>,</li>
<li><a href="https://kotlinlang.org/docs/coroutines-basics.html" target="_blank" rel="noopener">Coroutines</a> in <a href="https://kotlinlang.org/" target="_blank" rel="noopener">Kotlin</a>,</li>
<li><a href="https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html" target="_blank" rel="noopener">Async</a> in <a href="https://www.rust-lang.org/" target="_blank" rel="noopener">Rust</a>, and</li>
<li><a href="https://clojure.github.io/core.async/" target="_blank" rel="noopener">core.async</a> in <a href="https://clojure.org/" target="_blank" rel="noopener">Clojure</a>.</li>
</ul>
<p>These examples differ in their implementation details but all of them enable programmers to run millions of tasks concurrently. This capability of being able to do multiple tasks at the same time is called <a href="https://en.wikipedia.org/wiki/Computer_multitasking" target="_blank" rel="noopener"><em>Multitasking</em></a>.</p>
<p>Multitasking can be of two types:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Pre-emptive_multitasking" target="_blank" rel="noopener">Pre-emptive multitasking</a> in which the tasks can be <a href="https://en.wikipedia.org/wiki/Preemption_(computing)" target="_blank" rel="noopener">preempted</a> so that other tasks can be run.</li>
<li><a href="https://en.wikipedia.org/wiki/Cooperative_multitasking" target="_blank" rel="noopener">Cooperative multitasking</a><sup><a href="#ref-Bartel2011-ap" class="citation" title="Bartel, “Non-Preemptive Multitasking.”
">@1</a></sup> in which the tasks voluntarily yield control to other tasks to be run.</li>
</ul>
<p><a href="https://en.wikipedia.org/wiki/Coroutines" target="_blank" rel="noopener">Coroutines</a><sup><a href="#ref-Knuth1997-rv" class="citation" title="Knuth, “Coroutines.”
">@2</a></sup> are computations that support cooperative multitasking<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Unlike ordinary <a href="https://en.wikipedia.org/wiki/Subroutines" target="_blank" rel="noopener"><em>Subroutines</em></a> that execute from start to end and do not hold any state between invocations, coroutines can exit in the middle by calling other coroutines and may later resume at the same point. They also hold state between invocations. They do so by <a href="https://en.wikipedia.org/wiki/Yield_(multithreading)" target="_blank" rel="noopener"><em>yielding</em></a> the control of the current running thread so that some other coroutine can be run on the same thread<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>Coroutine implementations often come with support for <a href="https://en.wikipedia.org/wiki/Channel_(programming)" target="_blank" rel="noopener"><em>Channels</em></a> for inter-coroutine communication. One coroutine can send a message over a channel, and another coroutine can receive the message from the same channel. Coroutines and channels together are an implementation of <a href="https://en.wikipedia.org/wiki/Communicating_Sequential_Processes" target="_blank" rel="noopener"><em>Communicating Sequential Processes</em></a> (CSP)<span><sup><a href="#ref-Hoare1986-ih" class="citation" title="Hoare, Communicating Sequential Processes.
">@5</a></sup>,</span> a formal language for describing patterns of interaction in concurrent systems.</p>
<p>In this series of posts, we implement <span class="fancy">Co</span>, a small dynamically typed <a href="https://en.wikipedia.org/wiki/imperative_programming" target="_blank" rel="noopener">imperative programming</a> language with support for coroutines and channels. <a href="https://haskell.org" target="_blank" rel="noopener">Haskell</a> is our choice of language to write an interpreter for <span class="fancy">Co</span>.</p>
<h2 data-track-content data-content-name="introducing-co" data-content-piece="implementing-co-1" id="introducing-co">Introducing <span class="fancy">Co</span></h2>
<p><span class="fancy">Co</span> has these <span id="basic-features">basic features</span> that are found in many programming languages:</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Dynamic_typing" target="_blank" rel="noopener">Dynamic</a> and <a href="https://en.wikipedia.org/wiki/Strong_and_weak_typing" target="_blank" rel="noopener">strong</a> typing.</li>
<li>Null, boolean, string and integer literals, and values.</li>
<li>Addition, subtraction, multiplication and integer division arithmetic operations.</li>
<li>String concatenation operation.</li>
<li>Equality and inequality checks on booleans, strings and numbers.</li>
<li>Less-than and greater-than comparison operations on numbers.</li>
<li>Variable declarations, usage and assignments.</li>
<li><code class="sourceCode javascript"><span class="cf">if</span></code> and <code class="sourceCode javascript"><span class="cf">while</span></code> statements.</li>
<li>Function declarations and calls, with support for recursion.</li>
<li>First class functions and anonymous functions.</li>
<li>Mutable closures.</li>
</ul>
<p>It also has these special features:</p>
<ul>
<li><code class="sourceCode javascript"><span class="kw">yield</span></code> statement to yield the current thread of computation (ToC).</li>
<li><code class="sourceCode"><span class="cf">spawn</span></code> statement to start a new <abbr title="Thread of computation">ToC</abbr>.</li>
<li>First class channels with operators to send and receive values over them.</li>
<li><code class="sourceCode"><span class="cf">sleep</span></code> function to sleep the current <abbr title="Thread of computation">ToC</abbr> for a given number of milliseconds.</li>
</ul>
<p>Let’s see some example code in <span class="fancy">Co</span> for illustration:</p>
<div class="sourceCode" id="cb1" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co">// Fibonacci numbers</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="co">// using a while loop</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> a <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> b <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> j <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> temp <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="cf">while</span> (j <span class="op"><</span> <span class="dv">6</span>) {</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(a)<span class="op">;</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> temp <span class="op">=</span> a<span class="op">;</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> a <span class="op">=</span> b<span class="op">;</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> b <span class="op">=</span> temp <span class="op">+</span> b<span class="op">;</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> j <span class="op">=</span> j <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="co">// Fibonacci numbers</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a><span class="co">// using recursive function call</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">fib</span>(n) {</span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op"><</span> <span class="dv">2</span>) {</span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> n<span class="op">;</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">2</span>)</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a> <span class="op">+</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">1</span>)<span class="op">;</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="cf">while</span> (i <span class="op"><</span> <span class="dv">6</span>) {</span>
<span id="cb1-27"><a href="#cb1-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="fu">fib</span>(i))<span class="op">;</span></span>
<span id="cb1-28"><a href="#cb1-28" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb1-29"><a href="#cb1-29" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<p>As you may notice, <span class="fancy">Co</span>’s syntax is heavily inspired by <a href="https://en.wikipedia.org/wiki/JavaScript" target="_blank" rel="noopener">JavaScript</a>. The code example above computes and prints<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> the first six <a href="https://en.wikipedia.org/wiki/Fibonacci_numbers" target="_blank" rel="noopener">Fibonacci numbers</a> in two different ways, and demonstrates a number of features of <span class="fancy">Co</span>, including variable declarations and assignments, <code class="sourceCode javascript"><span class="cf">while</span></code> loops, <code class="sourceCode javascript"><span class="cf">if</span></code> conditions, and function declarations and calls along with recursion.</p>
<p>We can save the code in a file and run it with the <span class="fancy">Co</span> interpreter<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a> to print this (correct) output:</p>
<pre class="plain"><code>0
1
1
2
3
5
0
1
1
2
3
5</code></pre>
<p>The next example shows the usage of coroutines in <span class="fancy">Co</span>:</p>
<div class="sourceCode" id="cb3" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">;</span></span></code></pre></div>
<p>Running this code with the interpreter prints this output:</p>
<pre class="plain"><code>11
1
12
2
13
3
14
4
15
16</code></pre>
<p>The <code>printNum</code> function prints numbers between the <code>start</code> and <code>end</code> arguments, but yields the <abbr title="Thread of computation">ToC</abbr> after each print. Notice how the prints are interleaved. This is because the two calls to the function <code>printNums</code> run concurrently in two separate coroutines.</p>
<p>The next example show the usage of channels in <span class="fancy">Co</span>:</p>
<div class="sourceCode" id="cb5" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> chan <span class="op">=</span> <span class="fu">newChannel</span>()<span class="op">;</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">player</span>(name) {</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> n <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (<span class="kw">true</span>) {</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> n <span class="op">=</span> <span class="op"><-</span> chan<span class="op">;</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op">==</span> <span class="st">"done"</span>) {</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" "</span> <span class="op">+</span> n)<span class="op">;</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op">==</span> <span class="dv">0</span>) {</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a> <span class="st">"done"</span> <span class="op">-></span> chan<span class="op">;</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> n <span class="op">-</span> <span class="dv">1</span> <span class="op">-></span> chan<span class="op">;</span></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">player</span>(<span class="st">"ping"</span>)<span class="op">;</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">player</span>(<span class="st">"pong"</span>)<span class="op">;</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a><span class="dv">10</span> <span class="op">-></span> chan<span class="op">;</span></span></code></pre></div>
<p>This is the popular <a href="https://erlang.org/doc/getting_started/conc_prog.html" target="_blank" rel="noopener">Ping-pong</a> <a href="https://kotlinlang.org/docs/channels.html#channels-are-fair" target="_blank" rel="noopener">benchmark</a> for communication between <abbr title="Threads of computation">ToCs</abbr>. Here we use channels and coroutines for the same. Running this code prints this output:</p>
<pre class="plain"><code>ping 10
pong 9
ping 8
pong 7
ping 6
pong 5
ping 4
pong 3
ping 2
pong 1
ping 0
ping done
pong done</code></pre>
<p>Lastly, here is <a href="https://www.geeksforgeeks.org/sleep-sort-king-laziness-sorting-sleeping/" target="_blank" rel="noopener">Sleep sort</a> in <span class="fancy">Co</span>:</p>
<div class="sourceCode" id="cb7" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">sleepSort</span>(a<span class="op">,</span> b<span class="op">,</span> c<span class="op">,</span> d<span class="op">,</span> e) {</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">function</span> <span class="fu">printNum</span>(num) {</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">sleep</span>(num)<span class="op">;</span></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(num)<span class="op">;</span></span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">printNum</span>(a)<span class="op">;</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">printNum</span>(b)<span class="op">;</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">printNum</span>(c)<span class="op">;</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">printNum</span>(d)<span class="op">;</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> spawn <span class="fu">printNum</span>(e)<span class="op">;</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="fu">sleepSort</span>(<span class="dv">5</span><span class="op">,</span> <span class="dv">4</span><span class="op">,</span> <span class="dv">3</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">1</span>)<span class="op">;</span></span></code></pre></div>
<p>Running this code prints this output:</p>
<pre class="plain"><code>1
2
3
4
5</code></pre>
<p>We’ll revisit these examples later while building our interpreter, and understand them as well as the code that implements them.</p>
<h2 data-track-content data-content-name="the-co-interpreter" data-content-piece="implementing-co-1" id="the-co-interpreter">The <span class="fancy">Co</span> Interpreter</h2>
<p>The <span class="fancy">Co</span> interpreter works in two stages:</p>
<ol type="1">
<li>Parsing: a parser converts <span class="fancy">Co</span> source code to <a href="https://en.wikipedia.org/wiki/Abstract_Syntax_Tree" target="_blank" rel="noopener"><em>Abstract Syntax Tree</em></a> (AST).</li>
<li>Interpretation: a <a href="https://en.wikipedia.org/wiki/Interpreter_(computing)#Abstract_syntax_tree_interpreters" target="_blank" rel="noopener">tree-walking interpreter</a> walks the AST, executes the instructions and produces the output.</li>
</ol>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 521 65'%3E%3C/svg%3E" class="lazyload w-100pct mw-80pct nolink" style="--image-aspect-ratio: 8.015384615384615" data-src="/images/implementing-co-1/stages.svg" alt="Stages of the Co interpreter"></img>
<noscript><img src="/images/implementing-co-1/stages.svg" class="w-100pct mw-80pct nolink" alt="Stages of the Co interpreter"></img></noscript>
<figcaption>Stages of the <span class="fancy">Co</span> interpreter</figcaption>
</figure>
<p>In this post, we implement the parser for <span class="fancy">Co</span>. In the <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">second part</a>, we create a first cut of the interpreter that supports the <a href="#basic-features">basic features</a> of <span class="fancy">Co</span>. In the third and fourth parts, we extend the interpreter to add support for coroutines and channels.</p>
<div class="note">
<p>The complete code for the parser is <a href="https://abhinavsarkar.net/code/co-parser.html?mtm_campaign=feed">here</a>. You can load it in GHCi using <a href="https://haskellstack.org/" target="_blank" rel="noopener">stack</a> (by running <code>stack co-interpreter.hs</code>), and follow along while reading this article<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
</div>
<p>Let’s start with listing the extensions and imports needed<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>.</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE FlexibleContexts #-}</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GeneralizedNewtypeDeriving #-}</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase #-}</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE RecordWildCards #-}</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">CoInterpreter</span> <span class="kw">where</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Concurrent</span> (forkIO, threadDelay)</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Concurrent.MVar.Lifted</span></span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (foldM, unless, void, when)</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Base</span> (<span class="dt">MonadBase</span>, liftBase)</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Combinators.Expr</span> (<span class="dt">Operator</span> (..), makeExprParser)</span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Cont</span> (<span class="dt">ContT</span>, <span class="dt">MonadCont</span>, callCC, runContT)</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.Except</span> (<span class="dt">ExceptT</span>, <span class="dt">MonadError</span> (..), runExceptT)</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.IO.Class</span> (<span class="dt">MonadIO</span>, liftIO)</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad.State.Strict</span> (<span class="dt">MonadState</span>, <span class="dt">StateT</span>, evalStateT)</span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Control.Monad.State.Strict</span> <span class="kw">as</span> <span class="dt">State</span></span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Foldable</span> (traverse_)</span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.IORef.Lifted</span></span>
<span id="cb10-19"><a href="#cb10-19" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.Map.Strict</span> <span class="kw">as</span> <span class="dt">Map</span></span>
<span id="cb10-20"><a href="#cb10-20" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Maybe</span> (fromMaybe)</span>
<span id="cb10-21"><a href="#cb10-21" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.PQueue.Prio.Min</span> <span class="kw">as</span> <span class="dt">PQ</span></span>
<span id="cb10-22"><a href="#cb10-22" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Time.Clock.POSIX</span> (getPOSIXTime)</span>
<span id="cb10-23"><a href="#cb10-23" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Void</span> (<span class="dt">Void</span>)</span>
<span id="cb10-24"><a href="#cb10-24" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Clock</span> (<span class="dt">Clock</span> (<span class="dt">Monotonic</span>), fromNanoSecs, getTime, <span class="dt">TimeSpec</span>)</span>
<span id="cb10-25"><a href="#cb10-25" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.Environment</span> (getArgs, getProgName)</span>
<span id="cb10-26"><a href="#cb10-26" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">System.IO</span> (hPutStrLn, stderr)</span>
<span id="cb10-27"><a href="#cb10-27" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Megaparsec</span> <span class="kw">hiding</span> (runParser)</span>
<span id="cb10-28"><a href="#cb10-28" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Megaparsec.Char</span></span>
<span id="cb10-29"><a href="#cb10-29" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Text.Megaparsec.Char.Lexer</span> <span class="kw">as</span> <span class="dt">L</span></span>
<span id="cb10-30"><a href="#cb10-30" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Pretty.Simple</span></span>
<span id="cb10-31"><a href="#cb10-31" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">CheckColorTty</span> (<span class="op">..</span>),</span>
<span id="cb10-32"><a href="#cb10-32" aria-hidden="true" tabindex="-1"></a> <span class="dt">OutputOptions</span> (<span class="op">..</span>),</span>
<span id="cb10-33"><a href="#cb10-33" aria-hidden="true" tabindex="-1"></a> defaultOutputOptionsNoColor,</span>
<span id="cb10-34"><a href="#cb10-34" aria-hidden="true" tabindex="-1"></a> pPrintOpt,</span>
<span id="cb10-35"><a href="#cb10-35" aria-hidden="true" tabindex="-1"></a> )</span></code></pre></div>
<p>Next, let’s take a look at the <span class="fancy">Co</span> AST.</p>
<h2 data-track-content data-content-name="the-co-ast" data-content-piece="implementing-co-1" id="the-co-ast">The <span class="fancy">Co</span> AST</h2>
<p>Since <span class="fancy">Co</span> is an <a href="https://en.wikipedia.org/wiki/imperative_programming" target="_blank" rel="noopener">imperative programming</a> language, it is naturally <a href="https://en.wikipedia.org/wiki/Statement_(computer_science)" target="_blank" rel="noopener"><em>Statement</em></a> oriented. A statement describes how an action is to be executed. A <span class="fancy">Co</span> program is a list of top-level statements.</p>
<p>Statements have internal components called <a href="https://en.wikipedia.org/wiki/Expression_(computer_science)" target="_blank" rel="noopener"><em>Expressions</em></a>. Expressions evaluate to values at program run time. Let’s look at them first.</p>
<h3 id="expressions">Expressions</h3>
<p>We represent <span class="fancy">Co</span> expressions as a Haskell <a href="https://en.wikipedia.org/wiki/Algebraic_data_type" target="_blank" rel="noopener"><em>Algebraic data type</em></a> (ADT) with one constructor per expression type:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Expr</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">LNull</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LBool</span> <span class="dt">Bool</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LStr</span> <span class="dt">String</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LNum</span> <span class="dt">Integer</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Variable</span> <span class="dt">Identifier</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Binary</span> <span class="dt">BinOp</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Call</span> <span class="dt">Expr</span> [<span class="dt">Expr</span>]</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Lambda</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Receive</span> <span class="dt">Expr</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Identifier</span> <span class="ot">=</span> <span class="dt">String</span></span></code></pre></div>
<dl>
<dt><code class="sourceCode haskell"><span class="dt">LNull</span></code></dt>
<dd>
<p>The literal <code class="sourceCode javascript"><span class="kw">null</span></code>. Evaluates to the null value.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">LBool</span> <span class="dt">Bool</span></code></dt>
<dd>
<p>The boolean literals, <code class="sourceCode javascript"><span class="kw">true</span></code> and <code class="sourceCode javascript"><span class="kw">false</span></code>. Evaluate to their counterpart boolean values.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">LStr</span> <span class="dt">String</span></code></dt>
<dd>
<p>A string literal like <code class="sourceCode javascript"><span class="st">"towel"</span></code>. Evaluates to a string.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">LNum</span> <span class="dt">Integer</span></code></dt>
<dd>
<p>An integer literal like <code class="sourceCode javascript"><span class="dv">1</span></code> or <code class="sourceCode javascript"><span class="op">-</span><span class="dv">5</span></code>. Evaluates to an integer.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">Variable</span> <span class="dt">Identifier</span></code></dt>
<dd>
<p>A variable named by an identifier like <code>a1</code> or <code>sender</code>. Evaluates to the variable’s value at the point in the execution of code. An <code class="sourceCode haskell"><span class="dt">Identifier</span></code> is a string of alphanumeric characters, starting with an alpha character.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">Binary</span> <span class="dt">Op</span> <span class="dt">Expr</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>A binary operation on two expressions. Example: <code class="sourceCode javascript"><span class="dv">1</span> <span class="op">+</span> <span class="dv">41</span></code> or <code class="sourceCode javascript"><span class="dv">2</span> <span class="op">==</span> <span class="st">"2"</span></code>. Supported binary operations are defined by the <code class="sourceCode haskell"><span class="dt">BinOp</span></code> enum: addition/concatenation, subtraction, multiplication, integer division, equality and inequality checks, and less-than and greater-than comparisons.</p>
</dd>
</dl>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">BinOp</span> <span class="ot">=</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Plus</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Minus</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Slash</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Star</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Equals</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">NotEquals</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">LessThan</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">GreaterThan</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span></code></pre></div>
<dl>
<dt><code class="sourceCode haskell"><span class="dt">Call</span> <span class="dt">Expr</span> [<span class="dt">Expr</span>]</code></dt>
<dd>
<p>Calls the function obtained by evaluating the callee expression, with the given argument expressions. Examples: <code class="sourceCode javascript"><span class="fu">calcDistance</span>(start<span class="op">,</span> end)</code> or <code class="sourceCode javascript"><span class="fu">getResolver</span>(context)(template)</code>.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">Lambda</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</code></dt>
<dd>
<p>A function with the given parameter names and body statements. Example: <code class="sourceCode javascript"><span class="kw">function</span> (a<span class="op">,</span> b) { <span class="cf">return</span> a<span class="op">*</span>a <span class="op">+</span> b<span class="op">*</span>b<span class="op">;</span> }</code>.</p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">Receive</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>Receives a value from a channel. Examples:</p>
<div class="sourceCode" id="cb13" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="co">// receive a value from the channel and prints it</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="fu">print</span>(<span class="op"><-</span> someChannel)<span class="op">;</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="co">// receive a value from the channel and assigns it</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> x <span class="op">=</span> <span class="op"><-</span> someChannel<span class="op">;</span></span></code></pre></div>
</dd>
</dl>
<p>Next, we see how statements are represented in the AST.</p>
<h3 id="statements">Statements</h3>
<p>We represent the <span class="fancy">Co</span> statements as a Haskell ADT with one constructor per statement type:</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Stmt</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="dt">ExprStmt</span> <span class="dt">Expr</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">VarStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">AssignStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">IfStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">WhileStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">FunctionStmt</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">ReturnStmt</span> (<span class="dt">Maybe</span> <span class="dt">Expr</span>)</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">YieldStmt</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">SpawnStmt</span> <span class="dt">Expr</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">SendStmt</span> <span class="dt">Expr</span> <span class="dt">Expr</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Program</span> <span class="ot">=</span> [<span class="dt">Stmt</span>]</span></code></pre></div>
<dl>
<dt><code class="sourceCode haskell"><span class="dt">ExprStmt</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>A statement with just an expression. The expression’s value is thrown away after executing the statement. Example: <code class="sourceCode javascript"><span class="dv">1</span> <span class="op">+</span> <span class="dv">2</span><span class="op">;</span></code></p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">VarStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>Defines a new variable named by an identifier and sets it to an expression’s value. In <span class="fancy">Co</span>, variables must be initialized when being defined. Example: <code class="sourceCode javascript"><span class="kw">var</span> a <span class="op">=</span> <span class="dv">5</span><span class="op">;</span></code></p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">AssignStmt</span> <span class="dt">Identifier</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>Sets an already defined variable named by an identifier to an expression’s value. Example: <code class="sourceCode javascript">a <span class="op">=</span> <span class="dv">55</span> <span class="op">+</span> <span class="st">"hello"</span><span class="op">;</span></code></p>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">IfStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</code></dt>
<dd>
<p>Executes a list of statements if the condition expression evaluates to a truthy value. In <span class="fancy">Co</span>, <code class="sourceCode javascript"><span class="kw">null</span></code> and <code class="sourceCode javascript"><span class="kw">false</span></code> values are non-truthy, and every other value is truthy. Also note that, there are no <code class="sourceCode javascript"><span class="cf">else</span></code> branches for <code class="sourceCode javascript"><span class="cf">if</span></code> statements in <span class="fancy">Co</span>. Example:</p>
<div class="sourceCode" id="cb15" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> (a <span class="op">==</span> <span class="dv">1</span>) {</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="st">"hello"</span>)<span class="op">;</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">WhileStmt</span> <span class="dt">Expr</span> [<span class="dt">Stmt</span>]</code></dt>
<dd>
<p>Executes a list of statements repeatedly while the condition expression evaluates to a truthy value. Example:</p>
<div class="sourceCode" id="cb16" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> n <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="cf">while</span> (n <span class="op"><</span> <span class="dv">5</span>) {</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(n)<span class="op">;</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> n <span class="op">=</span> n <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">FunctionStmt</span> <span class="dt">Identifier</span> [<span class="dt">Identifier</span>] [<span class="dt">Stmt</span>]</code></dt>
<dd>
<p>Defines a function with a name, a list of parameter names, and a list of body statements. Example:</p>
<div class="sourceCode" id="cb17" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">greet</span>(greeting) {</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(greeting <span class="op">+</span> <span class="st">" world"</span>)<span class="op">;</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">ReturnStmt</span> (<span class="dt">Maybe</span> <span class="dt">Expr</span>)</code></dt>
<dd>
<p>Returns from a function, optionally returning an expression’s value. Example:</p>
<div class="sourceCode" id="cb18" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">square</span>(x) {</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> x <span class="op">*</span> x<span class="op">;</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">YieldStmt</span></code></dt>
<dd>
<p>Suspends the currently executing <abbr title="Thread of computation">ToC</abbr> so that some other <abbr title="Thread of computation">ToC</abbr> may run. The current <abbr title="Thread of computation">ToC</abbr> resumes later from statement next to the <code class="sourceCode javascript"><span class="kw">yield</span></code> statement. Example:</p>
<div class="sourceCode" id="cb19" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">SpawnStmt</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>Starts a new <abbr title="Thread of computation">ToC</abbr> in which the given expression is evaluated, which runs concurrently with all other running <abbr title="Threads of computation">ToCs</abbr>. Example:</p>
<div class="sourceCode" id="cb20" data-lang="co"><pre class="sourceCode javascript numberSource"><code class="sourceCode javascript"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">;</span></span></code></pre></div>
</dd>
<dt><code class="sourceCode haskell"><span class="dt">SendStmt</span> <span class="dt">Expr</span> <span class="dt">Expr</span></code></dt>
<dd>
<p>Sends an expressions’s value to a channel. Example:</p>
<div class="sourceCode" id="cb21"><pre class="sourceCode numberSource js"><code class="sourceCode javascript"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="dv">1</span> <span class="op">+</span> <span class="fu">square</span>(<span class="dv">3</span>) <span class="op">-></span> someChannel<span class="op">;</span></span></code></pre></div>
</dd>
</dl>
<p>The <span class="fancy">Co</span> AST is minimal but it serves our purpose. Next, let’s figure out how to actually parse source code to AST.</p>
<h2 data-track-content data-content-name="parsing" data-content-piece="implementing-co-1" id="parsing">Parsing</h2>
<p>Parsing is the process of taking textual input data and converting it to a data structure—often a hierarchal structure like AST—while checking the input for correct syntax. There are many ways of writing parsers<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>, <a href="https://en.wikipedia.org/wiki/Parser_Combinators" target="_blank" rel="noopener"><em>Parser Combinators</em></a><sup><a href="#ref-Hutton1992-vc" class="citation" title="Hutton, “Higher-Order Functions for Parsing.”
">@11</a></sup> being one of them. Parser combinators <a href="https://hackage.haskell.org/packages/#cat:Parsing" target="_blank" rel="noopener">libraries</a> are popular in Haskell because of their ease of use and succinctness. We are going to use one such library, <a href="https://hackage.haskell.org/package/megaparsec" target="_blank" rel="noopener">Megaparsec</a>, to create a parser for <span class="fancy">Co</span>.</p>
<p>Let’s start with writing some basic parsers for the <span class="fancy">Co</span> syntax using the Megaparsec parsers.</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Parser</span> <span class="ot">=</span> <span class="dt">Parsec</span> <span class="dt">Void</span> <span class="dt">String</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a><span class="ot">sc ::</span> <span class="dt">Parser</span> ()</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a>sc <span class="ot">=</span> L.space space1 lineCmnt blockCmnt</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> lineCmnt <span class="ot">=</span> L.skipLineComment <span class="st">"//"</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> blockCmnt <span class="ot">=</span> L.skipBlockComment <span class="st">"/*"</span> <span class="st">"*/"</span></span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="ot">lexeme ::</span> <span class="dt">Parser</span> a <span class="ot">-></span> <span class="dt">Parser</span> a</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>lexeme <span class="ot">=</span> L.lexeme sc</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a><span class="ot">symbol ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a>symbol <span class="ot">=</span> L.symbol sc</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a><span class="ot">reserved ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> ()</span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a>reserved w <span class="ot">=</span> (lexeme <span class="op">.</span> try) <span class="op">$</span> string w <span class="op">*></span> notFollowedBy alphaNumChar</span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a>parens,<span class="ot"> braces ::</span> <span class="dt">Parser</span> a <span class="ot">-></span> <span class="dt">Parser</span> a</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a>parens <span class="ot">=</span> between (symbol <span class="st">"("</span>) (symbol <span class="st">")"</span>)</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a>braces <span class="ot">=</span> between (symbol <span class="st">"{"</span>) (symbol <span class="st">"}"</span>)</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a>semi, identifier,<span class="ot"> stringLiteral ::</span> <span class="dt">Parser</span> <span class="dt">String</span></span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a>semi <span class="ot">=</span> symbol <span class="st">";"</span></span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a>identifier <span class="ot">=</span> lexeme ((<span class="op">:</span>) <span class="op"><$></span> letterChar <span class="op"><*></span> many alphaNumChar)</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a>stringLiteral <span class="ot">=</span> char <span class="ch">'"'</span> <span class="op">>></span> manyTill L.charLiteral (char <span class="ch">'"'</span>) <span class="op"><*</span> sc</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a><span class="ot">integer ::</span> <span class="dt">Parser</span> <span class="dt">Integer</span></span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a>integer <span class="ot">=</span> lexeme (L.signed sc L.decimal)</span></code></pre></div>
<p>In the above code, <code class="sourceCode haskell"><span class="dt">Parser</span></code> is a concise type-alias for our parser type<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>.</p>
<p>The <code>sc</code> parser is the space consumer parser. It dictates what’s ignored while parsing the sources code. For <span class="fancy">Co</span>, we consider the Unicode space character and the control characters—tab, newline, carriage return, form feed, and vertical tab—as whitespaces. We use the <a href="https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec-Char.html#v:space1" target="_blank" rel="noopener"><code>space1</code></a> parser to configure that. <code>sc</code> also lets us configure how to ignore line comments and block comments while parsing.</p>
<p>The <code>lexeme</code> combinator is for parsing <a href="https://en.wikipedia.org/wiki/Lexemes" target="_blank" rel="noopener"><em>Lexemes</em></a> while ignoring whitespaces and comments. It is implemented using the <a href="https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec-Char-Lexer.html#v:lexeme" target="_blank" rel="noopener"><code>lexeme</code></a> combinator from Megaparsec.</p>
<p>The <code>symbol</code> combinator is for parsing symbols, that is, operators like <code class="sourceCode javascript"><span class="op">*</span></code> and <code class="sourceCode javascript"><span class="op">;</span></code>. It is implemented using the <a href="https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec-Char-Lexer.html#v:symbol" target="_blank" rel="noopener"><code>symbol</code></a> combinator.</p>
<p>The <code>reserved</code> combinator is for parsing reserved keywords like <code class="sourceCode javascript"><span class="cf">if</span></code> and <code class="sourceCode javascript"><span class="cf">while</span></code>. It is implemented as a combination of <code>lexeme</code> combinator and <code>string</code> parser, while making sure that the reserved keyword is not a prefix of another word. It uses the <code>try</code> combinator to backtrack if that happens.</p>
<p>The <code>parens</code> and <code>braces</code> combinators are for parsing code surrounded by parentheses <code>(</code> and <code>)</code> and braces <code>{</code> and <code>}</code> respectively.</p>
<p>The <code class="sourceCode haskell">semi</code> parser matches a semicolon <code class="sourceCode javascript"><span class="op">;</span></code>. The <code>identifier</code> parser is for parsing identifiers in <span class="fancy">Co</span>. The <code>stringLiteral</code> parser matches a string literal like <code class="sourceCode javascript"><span class="st">"polyphonic ringtone"</span></code>. <code>integer</code> is the parser for <span class="fancy">Co</span> integers.</p>
<p>Let’s also write some functions to run the parsers and pretty-print the output in GHCi:</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runParser ::</span> <span class="dt">Parser</span> a <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Either</span> <span class="dt">String</span> a</span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>runParser parser code <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> parse parser <span class="st">""</span> code <span class="kw">of</span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Left</span> err <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> errorBundlePretty err</span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Right</span> prog <span class="ot">-></span> <span class="dt">Right</span> prog</span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a><span class="ot">pPrint ::</span> (<span class="dt">MonadIO</span> m, <span class="dt">Show</span> a) <span class="ot">=></span> a <span class="ot">-></span> m ()</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a>pPrint <span class="ot">=</span></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> pPrintOpt <span class="dt">CheckColorTty</span> <span class="op">$</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> defaultOutputOptionsNoColor</span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> { outputOptionsIndentAmount <span class="ot">=</span> <span class="dv">2</span>,</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> outputOptionsCompact <span class="ot">=</span> <span class="dt">True</span>,</span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> outputOptionsCompactParens <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a> }</span></code></pre></div>
<p>That completes our basic setup for parsing. Let’s try them out in GHCi now:</p>
<div class="sourceCode" id="cb24" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser identifier <span class="st">"num1 "</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>Right "num1"</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser stringLiteral <span class="st">"\"val\" "</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a>Right "val"</span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser integer <span class="st">"1 "</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a>Right 1</span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser integer <span class="st">"-12 "</span></span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a>Right (-12)</span></code></pre></div>
<p>They work as expected. Next, off to parsing <span class="fancy">Co</span> expressions.</p>
<h3 id="parsing-expressions">Parsing Expressions</h3>
<p>Parsing <span class="fancy">Co</span> expressions to AST requires us to handle the <a href="https://en.wikipedia.org/wiki/Operator_associativity" target="_blank" rel="noopener"><em>Associativity</em></a> and <a href="https://en.wikipedia.org/wiki/Order_of_operations" target="_blank" rel="noopener"><em>Precedence</em></a> of the operators. Fortunately, Megaparsec makes it easy with the <a href="https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Monad-Combinators-Expr.html#v:makeExprParser" target="_blank" rel="noopener"><code>makeExprParser</code></a> combinator. <code>makeExprParser</code> takes a parser to parse the terms and a table of operators, and creates the expression parser for us.</p>
<ul>
<li>Terms are parts of expressions which cannot be broken down further into sub-expressions. Examples in <span class="fancy">Co</span> are literals, variables, groupings and function calls.</li>
<li>The table of operators is a list of <a href="https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Monad-Combinators-Expr.html#t:Operator" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Operator</span></code></a> <code class="sourceCode haskell"><span class="dt">Parser</span> <span class="dt">Expr</span></code> lists ordered in descending precedence. All operators in one list have the same precedence but may have different associativity.</li>
</ul>
<p>This is a lot to take in but looking at the code makes it clear. First, the operator table:</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">operators ::</span> [[<span class="dt">Operator</span> <span class="dt">Parser</span> <span class="dt">Expr</span>]]</span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>operators <span class="ot">=</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> [ [<span class="dt">Prefix</span> <span class="op">$</span> <span class="dt">Receive</span> <span class="op"><$</span> symbol <span class="st">"<-"</span>],</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> [ binary <span class="dt">Slash</span> <span class="op">$</span> symbol <span class="st">"/"</span>,</span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> binary <span class="dt">Star</span> <span class="op">$</span> symbol <span class="st">"*"</span>],</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> [ binary <span class="dt">Plus</span> <span class="op">$</span> symbol <span class="st">"+"</span>,</span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> binary <span class="dt">Minus</span> <span class="op">$</span> try (symbol <span class="st">"-"</span> <span class="op"><*</span> notFollowedBy (char <span class="ch">'>'</span>))</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> ],</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> [ binary <span class="dt">LessThan</span> <span class="op">$</span> symbol <span class="st">"<"</span>,</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> binary <span class="dt">GreaterThan</span> <span class="op">$</span> symbol <span class="st">">"</span></span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a> ],</span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> [ binary <span class="dt">Equals</span> <span class="op">$</span> symbol <span class="st">"=="</span>,</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a> binary <span class="dt">NotEquals</span> <span class="op">$</span> symbol <span class="st">"!="</span></span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a> binary op symP <span class="ot">=</span> <span class="dt">InfixL</span> <span class="op">$</span> <span class="dt">Binary</span> op <span class="op"><$</span> symP</span></code></pre></div>
<p>The prefix operator <code class="sourceCode javascript"><span class="op"><-</span></code>, for receiving values from channels, is of highest precedence and hence the first in the table. Next are the binary operators <code class="sourceCode haskell"><span class="op">/</span></code> and <code class="sourceCode haskell"><span class="op">*</span></code> for integer division and multiplication respectively. Following it, are the binary <code class="sourceCode javascript"><span class="op">+</span></code> and <code class="sourceCode javascript"><span class="op">-</span></code> operators which are of same precedence. After that come the comparison operators <code class="sourceCode javascript"><span class="op"><</span></code> and <code class="sourceCode javascript"><span class="op">></span></code>. Finally, we have the lowest precedence operators, the equality and inequality checks <code class="sourceCode javascript"><span class="op">==</span></code> and <code class="sourceCode javascript"><span class="op">!=</span></code>.</p>
<p>Each operator also contains the parser to parse the operator symbol in the source code. All of them are self-explanatory except the parser for the <code class="sourceCode javascript"><span class="op">-</span></code> operator. The <code class="sourceCode javascript"><span class="op">-</span></code> parser is slightly complicated because we want it to not parse <code class="sourceCode javascript"><span class="op">-></span></code>, the channel send operator, the first character of which is same as the symbol for the <code class="sourceCode javascript"><span class="op">-</span></code> operator.</p>
<p>Moving on to the term parser next:</p>
<div class="sourceCode" id="cb26" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">term ::</span> <span class="dt">Parser</span> <span class="dt">Expr</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a>term <span class="ot">=</span> primary <span class="op">>>=</span> call</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a> call e <span class="ot">=</span></span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a> ( lookAhead (symbol <span class="st">"("</span>)</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a> <span class="op">>></span> symbol <span class="st">"("</span></span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a> <span class="op">>></span> <span class="dt">Call</span> e <span class="op"><$></span> sepBy expr (symbol <span class="st">","</span>) <span class="op"><*</span> symbol <span class="st">")"</span></span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> call )</span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="fu">pure</span> e</span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a> primary <span class="ot">=</span> <span class="dt">LNull</span> <span class="op"><$</span> reserved <span class="st">"null"</span></span>
<span id="cb26-12"><a href="#cb26-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">LBool</span> <span class="dt">True</span> <span class="op"><$</span> reserved <span class="st">"true"</span></span>
<span id="cb26-13"><a href="#cb26-13" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">LBool</span> <span class="dt">False</span> <span class="op"><$</span> reserved <span class="st">"false"</span></span>
<span id="cb26-14"><a href="#cb26-14" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">LStr</span> <span class="op"><$></span> stringLiteral</span>
<span id="cb26-15"><a href="#cb26-15" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">LNum</span> <span class="op"><$></span> integer</span>
<span id="cb26-16"><a href="#cb26-16" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">Lambda</span></span>
<span id="cb26-17"><a href="#cb26-17" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> (reserved <span class="st">"function"</span> <span class="op">*></span> parens (sepBy identifier <span class="op">$</span> symbol <span class="st">","</span>))</span>
<span id="cb26-18"><a href="#cb26-18" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> braces (many stmt)</span>
<span id="cb26-19"><a href="#cb26-19" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">Variable</span> <span class="op"><$></span> identifier</span>
<span id="cb26-20"><a href="#cb26-20" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> parens expr</span></code></pre></div>
<p>The <code>term</code> parser uses the <code>primary</code> parser to parse the primary terms, and the <code>call</code> parser to parse the function calls. The <code>call</code> parser is recursive because the callee itself can be a chain of more functions calls, for example <code class="sourceCode javascript"><span class="fu">x</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">2</span>)(y)(<span class="st">"blah"</span>)()</code>. It starts with the output of the <code>primary</code> parser, and then tries to parse a function call. If it succeeds, it calls itself again to parse the next function call in the chain. If it fails, it returns the output of the previous round<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<p>The <code>primary</code> parser is a combination of smaller parsers—one for each type of primary terms in <span class="fancy">Co</span>—combined together using the <a href="https://hackage.haskell.org/package/base-4.12.0.0/docs/Control-Applicative.html#t:Alternative" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Alternative</span></code></a> instance of the parsers. It matches each parser one by one from the top until the match succeeds<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. First, it tries to match for literals <code class="sourceCode javascript"><span class="kw">null</span></code>, <code class="sourceCode javascript"><span class="kw">true</span></code>, and <code class="sourceCode javascript"><span class="kw">false</span></code>, failing which it matches for string and integer literals. Then it matches for anonymous functions, variables, and expressions grouped in parentheses—in that order<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>.</p>
<p>That’s it! We finally write the expression parser:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">expr ::</span> <span class="dt">Parser</span> <span class="dt">Expr</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>expr <span class="ot">=</span> makeExprParser term operators</span></code></pre></div>
<p>Let’s play with it in GHCi:</p>
<div class="sourceCode" id="cb28" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> pPrint <span class="op">$</span> runParser expr <span class="st">"1 + a < 9 - <- chan"</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a>Right</span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a> ( Binary LessThan</span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a> ( Binary Plus ( LNum 1 ) ( Variable "a" ) )</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a> ( Binary Minus ( LNum 9 ) ( Receive ( Variable "chan" ) ) ) )</span>
<span id="cb28-6"><a href="#cb28-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> pPrint <span class="op">$</span> runParser expr <span class="st">"funFun(null == \"ss\" + 12, true)"</span></span>
<span id="cb28-7"><a href="#cb28-7" aria-hidden="true" tabindex="-1"></a>Right</span>
<span id="cb28-8"><a href="#cb28-8" aria-hidden="true" tabindex="-1"></a> ( Call "funFun"</span>
<span id="cb28-9"><a href="#cb28-9" aria-hidden="true" tabindex="-1"></a> [ Binary Equals LNull</span>
<span id="cb28-10"><a href="#cb28-10" aria-hidden="true" tabindex="-1"></a> ( Binary Plus ( LStr "ss" ) ( LNum 12 ) )</span>
<span id="cb28-11"><a href="#cb28-11" aria-hidden="true" tabindex="-1"></a> , LBool True ] )</span>
<span id="cb28-12"><a href="#cb28-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> pPrint <span class="op">$</span> runParser expr <span class="st">"-99 - <- chan + funkyFun(a, false, \"hey\")"</span></span>
<span id="cb28-13"><a href="#cb28-13" aria-hidden="true" tabindex="-1"></a>Right</span>
<span id="cb28-14"><a href="#cb28-14" aria-hidden="true" tabindex="-1"></a> ( Binary Plus</span>
<span id="cb28-15"><a href="#cb28-15" aria-hidden="true" tabindex="-1"></a> ( Binary Minus ( LNum ( - 99 ) ) ( Receive ( Variable "chan" ) ) )</span>
<span id="cb28-16"><a href="#cb28-16" aria-hidden="true" tabindex="-1"></a> ( Call "funkyFun" [ Variable "a", LBool False, LStr "hey" ] ) )</span>
<span id="cb28-17"><a href="#cb28-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> pPrint <span class="op">$</span> runParser expr <span class="st">"(function (x) { print(x + 1); })(99)"</span></span>
<span id="cb28-18"><a href="#cb28-18" aria-hidden="true" tabindex="-1"></a>Right</span>
<span id="cb28-19"><a href="#cb28-19" aria-hidden="true" tabindex="-1"></a> ( Call</span>
<span id="cb28-20"><a href="#cb28-20" aria-hidden="true" tabindex="-1"></a> ( Lambda [ "x" ]</span>
<span id="cb28-21"><a href="#cb28-21" aria-hidden="true" tabindex="-1"></a> [ ExprStmt</span>
<span id="cb28-22"><a href="#cb28-22" aria-hidden="true" tabindex="-1"></a> ( Call</span>
<span id="cb28-23"><a href="#cb28-23" aria-hidden="true" tabindex="-1"></a> ( Variable "print" )</span>
<span id="cb28-24"><a href="#cb28-24" aria-hidden="true" tabindex="-1"></a> [ Binary Plus ( Variable "x" ) ( LNum 1 ) ] ) ] )</span>
<span id="cb28-25"><a href="#cb28-25" aria-hidden="true" tabindex="-1"></a> [ LNum 99 ] )</span></code></pre></div>
<p>Done! Onward, to parsing <span class="fancy">Co</span> statements.</p>
<h3 id="parsing-statements">Parsing Statements</h3>
<p>For parsing statements, we reuse the same trick we used for parsing expressions: combine smaller parsers for each statement type using <code class="sourceCode haskell"><span class="dt">Alternative</span></code>.</p>
<div class="sourceCode" id="cb29" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">stmt ::</span> <span class="dt">Parser</span> <span class="dt">Stmt</span></span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>stmt <span class="ot">=</span></span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">IfStmt</span> <span class="op"><$></span> (reserved <span class="st">"if"</span> <span class="op">*></span> parens expr) <span class="op"><*></span> braces (many stmt)</span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">WhileStmt</span> <span class="op"><$></span> (reserved <span class="st">"while"</span> <span class="op">*></span> parens expr) <span class="op"><*></span> braces (many stmt)</span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">VarStmt</span> <span class="op"><$></span> (reserved <span class="st">"var"</span> <span class="op">*></span> identifier) <span class="op"><*></span> (symbol <span class="st">"="</span> <span class="op">*></span> expr <span class="op"><*</span> semi)</span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">YieldStmt</span> <span class="op"><$</span> (reserved <span class="st">"yield"</span> <span class="op"><*</span> semi)</span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">SpawnStmt</span> <span class="op"><$></span> (reserved <span class="st">"spawn"</span> <span class="op">*></span> expr <span class="op"><*</span> semi)</span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">ReturnStmt</span> <span class="op"><$></span> (reserved <span class="st">"return"</span> <span class="op">*></span> optional expr <span class="op"><*</span> semi)</span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">FunctionStmt</span></span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> try (reserved <span class="st">"function"</span> <span class="op">*></span> identifier)</span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> parens (sepBy identifier <span class="op">$</span> symbol <span class="st">","</span>)</span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> braces (many stmt)</span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> try (<span class="dt">AssignStmt</span> <span class="op"><$></span> identifier <span class="op"><*></span> (symbol <span class="st">"="</span> <span class="op">*></span> expr <span class="op"><*</span> semi))</span>
<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> try (<span class="dt">SendStmt</span> <span class="op"><$></span> expr <span class="op"><*></span> (symbol <span class="st">"->"</span> <span class="op">*></span> expr <span class="op"><*</span> semi))</span>
<span id="cb29-15"><a href="#cb29-15" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="dt">ExprStmt</span> <span class="op"><$></span> expr <span class="op"><*</span> semi</span></code></pre></div>
<p>Most of the statements start with keywords like <code class="sourceCode javascript"><span class="cf">if</span></code>, <code class="sourceCode javascript"><span class="kw">var</span></code>, <code class="sourceCode javascript"><span class="kw">function</span></code>, etc, so our parsers look for the keywords first using the <code>reserved</code> parser. If none of the keyword-starting statements match then we try matching for assignments, channel sends, and expression statements, in that order. The <code>stmt</code> parser uses the <code>expr</code> parser to parse expressions within statements. It also uses the combinators <a href="https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Applicative-Combinators.html#v:many" target="_blank" rel="noopener"><code>many</code></a>, <a href="https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Applicative-Combinators.html#v:optional" target="_blank" rel="noopener"><code>optional</code></a>, <a href="https://hackage.haskell.org/package/megaparsec-9.0.1/docs/Text-Megaparsec.html#v:try" target="_blank" rel="noopener"><code>try</code></a> and <a href="https://hackage.haskell.org/package/parser-combinators-1.3.0/docs/Control-Applicative-Combinators.html#v:sepBy" target="_blank" rel="noopener"><code>sepBy</code></a> from Megaparsec.</p>
<p>Finally, we put everything together to create the parser for <span class="fancy">Co</span>:</p>
<div class="sourceCode" id="cb30" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="ot">program ::</span> <span class="dt">Parser</span> <span class="dt">Program</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>program <span class="ot">=</span> sc <span class="op">*></span> many stmt <span class="op"><*</span> eof</span></code></pre></div>
<p>The <span class="fancy">Co</span> parser matches multiple top-level statements, starting with optional whitespace and ending with end-of-file.</p>
<p>It’s demo time. In GHCi, we parse a file and pretty-print the AST:</p>
<div class="sourceCode" id="cb31" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">readFile</span> <span class="st">"coroutines.co"</span> <span class="op">>>=</span> pPrint <span class="op">.</span> runParser program</span>
<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a>Right</span>
<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a> [ FunctionStmt "printNums"</span>
<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a> [ "start", "end" ]</span>
<span id="cb31-5"><a href="#cb31-5" aria-hidden="true" tabindex="-1"></a> [ VarStmt "i"</span>
<span id="cb31-6"><a href="#cb31-6" aria-hidden="true" tabindex="-1"></a> ( Variable "start" )</span>
<span id="cb31-7"><a href="#cb31-7" aria-hidden="true" tabindex="-1"></a> , WhileStmt</span>
<span id="cb31-8"><a href="#cb31-8" aria-hidden="true" tabindex="-1"></a> ( Binary LessThan</span>
<span id="cb31-9"><a href="#cb31-9" aria-hidden="true" tabindex="-1"></a> ( Variable "i" )</span>
<span id="cb31-10"><a href="#cb31-10" aria-hidden="true" tabindex="-1"></a> ( Binary Plus ( Variable "end" ) ( LNum 1 ) ) )</span>
<span id="cb31-11"><a href="#cb31-11" aria-hidden="true" tabindex="-1"></a> [ ExprStmt</span>
<span id="cb31-12"><a href="#cb31-12" aria-hidden="true" tabindex="-1"></a> ( Call "print" [ Variable "i" ] )</span>
<span id="cb31-13"><a href="#cb31-13" aria-hidden="true" tabindex="-1"></a> , YieldStmt</span>
<span id="cb31-14"><a href="#cb31-14" aria-hidden="true" tabindex="-1"></a> , AssignStmt "i"</span>
<span id="cb31-15"><a href="#cb31-15" aria-hidden="true" tabindex="-1"></a> ( Binary Plus ( Variable "i" ) ( LNum 1 ) ) ] ]</span>
<span id="cb31-16"><a href="#cb31-16" aria-hidden="true" tabindex="-1"></a> , SpawnStmt</span>
<span id="cb31-17"><a href="#cb31-17" aria-hidden="true" tabindex="-1"></a> ( Call "printNums" [ LNum 1, LNum 4 ] )</span>
<span id="cb31-18"><a href="#cb31-18" aria-hidden="true" tabindex="-1"></a> , ExprStmt</span>
<span id="cb31-19"><a href="#cb31-19" aria-hidden="true" tabindex="-1"></a> ( Call "printNums" [ LNum 11, LNum 16 ] ) ]</span></code></pre></div>
<p>I’ve gone ahead and created a side-by-side view of the source code and corresponding AST for all three input files, aligned neatly for your pleasure of comparison. If you so wish, feast your eyes on them, and check for yourself that everything works correctly.</p>
<details>
<summary class="print-em">
Source code and AST for <code>fib.co</code>
</summary>
<div class="sbs-code">
<div class="scrollable-table">
<table>
<colgroup>
<col style="width: 18%"></col>
<col style="width: 81%"></col>
</colgroup>
<tbody>
<tr>
<td><div class="sourceCode" id="cb32" data-lang="co"><pre class="sourceCode javascript small noNumberSource"><code class="sourceCode javascript"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="co">// Fibonacci numbers</span></span>
<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a><span class="co">// using a while loop</span></span>
<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> a <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb32-4"><a href="#cb32-4" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> b <span class="op">=</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb32-5"><a href="#cb32-5" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> j <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb32-6"><a href="#cb32-6" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> temp <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb32-7"><a href="#cb32-7" aria-hidden="true" tabindex="-1"></a><span class="cf">while</span> (j <span class="op"><</span> <span class="dv">6</span>) {</span>
<span id="cb32-8"><a href="#cb32-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(a)<span class="op">;</span></span>
<span id="cb32-9"><a href="#cb32-9" aria-hidden="true" tabindex="-1"></a> temp <span class="op">=</span> a<span class="op">;</span></span>
<span id="cb32-10"><a href="#cb32-10" aria-hidden="true" tabindex="-1"></a> a <span class="op">=</span> b<span class="op">;</span></span>
<span id="cb32-11"><a href="#cb32-11" aria-hidden="true" tabindex="-1"></a> b <span class="op">=</span> temp <span class="op">+</span> b<span class="op">;</span></span>
<span id="cb32-12"><a href="#cb32-12" aria-hidden="true" tabindex="-1"></a> j <span class="op">=</span> j <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb32-13"><a href="#cb32-13" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb32-14"><a href="#cb32-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb32-15"><a href="#cb32-15" aria-hidden="true" tabindex="-1"></a><span class="co">// Fibonacci numbers</span></span>
<span id="cb32-16"><a href="#cb32-16" aria-hidden="true" tabindex="-1"></a><span class="co">// using recursive function call</span></span>
<span id="cb32-17"><a href="#cb32-17" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">fib</span>(n) {</span>
<span id="cb32-18"><a href="#cb32-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op"><</span> <span class="dv">2</span>) {</span>
<span id="cb32-19"><a href="#cb32-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> n<span class="op">;</span></span>
<span id="cb32-20"><a href="#cb32-20" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb32-21"><a href="#cb32-21" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">2</span>)</span>
<span id="cb32-22"><a href="#cb32-22" aria-hidden="true" tabindex="-1"></a> <span class="op">+</span> <span class="fu">fib</span>(n <span class="op">-</span> <span class="dv">1</span>)<span class="op">;</span></span>
<span id="cb32-23"><a href="#cb32-23" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb32-24"><a href="#cb32-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb32-25"><a href="#cb32-25" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> i <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb32-26"><a href="#cb32-26" aria-hidden="true" tabindex="-1"></a><span class="cf">while</span> (i <span class="op"><</span> <span class="dv">6</span>) {</span>
<span id="cb32-27"><a href="#cb32-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(<span class="fu">fib</span>(i))<span class="op">;</span></span>
<span id="cb32-28"><a href="#cb32-28" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb32-29"><a href="#cb32-29" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div></td>
<td><div class="sourceCode" id="cb33" data-lang="haskell"><pre class="sourceCode haskell small noNumberSource"><code class="sourceCode haskell"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-2"><a href="#cb33-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-3"><a href="#cb33-3" aria-hidden="true" tabindex="-1"></a>[ <span class="dt">VarStmt</span> <span class="st">"a"</span> ( <span class="dt">LNum</span> <span class="dv">0</span> )</span>
<span id="cb33-4"><a href="#cb33-4" aria-hidden="true" tabindex="-1"></a>, <span class="dt">VarStmt</span> <span class="st">"b"</span> ( <span class="dt">LNum</span> <span class="dv">1</span> )</span>
<span id="cb33-5"><a href="#cb33-5" aria-hidden="true" tabindex="-1"></a>, <span class="dt">VarStmt</span> <span class="st">"j"</span> ( <span class="dt">LNum</span> <span class="dv">0</span> )</span>
<span id="cb33-6"><a href="#cb33-6" aria-hidden="true" tabindex="-1"></a>, <span class="dt">VarStmt</span> <span class="st">"temp"</span> <span class="dt">LNull</span></span>
<span id="cb33-7"><a href="#cb33-7" aria-hidden="true" tabindex="-1"></a>, <span class="dt">WhileStmt</span> ( <span class="dt">Binary</span> <span class="dt">LessThan</span> ( <span class="dt">Variable</span> <span class="st">"j"</span> ) ( <span class="dt">LNum</span> <span class="dv">6</span> ) )</span>
<span id="cb33-8"><a href="#cb33-8" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Variable</span> <span class="st">"a"</span> ] )</span>
<span id="cb33-9"><a href="#cb33-9" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"temp"</span> ( <span class="dt">Variable</span> <span class="st">"a"</span> )</span>
<span id="cb33-10"><a href="#cb33-10" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"a"</span> ( <span class="dt">Variable</span> <span class="st">"b"</span> )</span>
<span id="cb33-11"><a href="#cb33-11" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"b"</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"temp"</span> ) ( <span class="dt">Variable</span> <span class="st">"b"</span> ) )</span>
<span id="cb33-12"><a href="#cb33-12" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"j"</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"j"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) )</span>
<span id="cb33-13"><a href="#cb33-13" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb33-14"><a href="#cb33-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-15"><a href="#cb33-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-16"><a href="#cb33-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-17"><a href="#cb33-17" aria-hidden="true" tabindex="-1"></a>, <span class="dt">FunctionStmt</span> <span class="st">"fib"</span> [ <span class="st">"n"</span> ]</span>
<span id="cb33-18"><a href="#cb33-18" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">IfStmt</span> ( <span class="dt">Binary</span> <span class="dt">LessThan</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LNum</span> <span class="dv">2</span> ) )</span>
<span id="cb33-19"><a href="#cb33-19" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ReturnStmt</span> ( <span class="dt">Just</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) )</span>
<span id="cb33-20"><a href="#cb33-20" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb33-21"><a href="#cb33-21" aria-hidden="true" tabindex="-1"></a> , <span class="dt">ReturnStmt</span> ( <span class="dt">Just</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Call</span> <span class="st">"fib"</span> [ <span class="dt">Binary</span> <span class="dt">Minus</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LNum</span> <span class="dv">2</span> ) ] )</span>
<span id="cb33-22"><a href="#cb33-22" aria-hidden="true" tabindex="-1"></a> ( <span class="dt">Call</span> <span class="st">"fib"</span> [ <span class="dt">Binary</span> <span class="dt">Minus</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) ] ) ) )</span>
<span id="cb33-23"><a href="#cb33-23" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb33-24"><a href="#cb33-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb33-25"><a href="#cb33-25" aria-hidden="true" tabindex="-1"></a>, <span class="dt">VarStmt</span> <span class="st">"i"</span> ( <span class="dt">LNum</span> <span class="dv">0</span> )</span>
<span id="cb33-26"><a href="#cb33-26" aria-hidden="true" tabindex="-1"></a>, <span class="dt">WhileStmt</span> ( <span class="dt">Binary</span> <span class="dt">LessThan</span> ( <span class="dt">Variable</span> <span class="st">"i"</span> ) ( <span class="dt">LNum</span> <span class="dv">6</span> ) )</span>
<span id="cb33-27"><a href="#cb33-27" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Call</span> <span class="st">"fib"</span> [ <span class="dt">Variable</span> <span class="st">"i"</span> ] ] )</span>
<span id="cb33-28"><a href="#cb33-28" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"i"</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"i"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) )</span>
<span id="cb33-29"><a href="#cb33-29" aria-hidden="true" tabindex="-1"></a> ] ]</span></code></pre></div></td>
</tr>
</tbody>
</table>
</div>
</div>
</details>
<details>
<summary class="print-em">
Source code and AST for <code>coroutines.co</code>
</summary>
<div class="sbs-code">
<div class="scrollable-table">
<table>
<colgroup>
<col style="width: 25%"></col>
<col style="width: 75%"></col>
</colgroup>
<tbody>
<tr>
<td><div class="sourceCode" id="cb34" data-lang="co"><pre class="sourceCode javascript small noNumberSource"><code class="sourceCode javascript"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">printNums</span>(start<span class="op">,</span> end) {</span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> i <span class="op">=</span> start<span class="op">;</span></span>
<span id="cb34-3"><a href="#cb34-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (i <span class="op"><</span> end <span class="op">+</span> <span class="dv">1</span>) {</span>
<span id="cb34-4"><a href="#cb34-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(i)<span class="op">;</span></span>
<span id="cb34-5"><a href="#cb34-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">yield</span><span class="op">;</span></span>
<span id="cb34-6"><a href="#cb34-6" aria-hidden="true" tabindex="-1"></a> i <span class="op">=</span> i <span class="op">+</span> <span class="dv">1</span><span class="op">;</span></span>
<span id="cb34-7"><a href="#cb34-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb34-8"><a href="#cb34-8" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb34-9"><a href="#cb34-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb34-10"><a href="#cb34-10" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">printNums</span>(<span class="dv">1</span><span class="op">,</span> <span class="dv">4</span>)<span class="op">;</span></span>
<span id="cb34-11"><a href="#cb34-11" aria-hidden="true" tabindex="-1"></a><span class="fu">printNums</span>(<span class="dv">11</span><span class="op">,</span> <span class="dv">16</span>)<span class="op">;</span></span></code></pre></div></td>
<td><div class="sourceCode" id="cb35" data-lang="haskell"><pre class="sourceCode haskell small noNumberSource"><code class="sourceCode haskell"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a>[ <span class="dt">FunctionStmt</span> <span class="st">"printNums"</span> [ <span class="st">"start"</span>, <span class="st">"end"</span> ]</span>
<span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">VarStmt</span> <span class="st">"i"</span> ( <span class="dt">Variable</span> <span class="st">"start"</span> )</span>
<span id="cb35-3"><a href="#cb35-3" aria-hidden="true" tabindex="-1"></a> , <span class="dt">WhileStmt</span> ( <span class="dt">Binary</span> <span class="dt">LessThan</span> ( <span class="dt">Variable</span> <span class="st">"i"</span> ) ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"end"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) )</span>
<span id="cb35-4"><a href="#cb35-4" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Variable</span> <span class="st">"i"</span> ] )</span>
<span id="cb35-5"><a href="#cb35-5" aria-hidden="true" tabindex="-1"></a> , <span class="dt">YieldStmt</span></span>
<span id="cb35-6"><a href="#cb35-6" aria-hidden="true" tabindex="-1"></a> , <span class="dt">AssignStmt</span> <span class="st">"i"</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"i"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) )</span>
<span id="cb35-7"><a href="#cb35-7" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb35-8"><a href="#cb35-8" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb35-9"><a href="#cb35-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb35-10"><a href="#cb35-10" aria-hidden="true" tabindex="-1"></a>, <span class="dt">SpawnStmt</span> ( <span class="dt">Call</span> <span class="st">"printNums"</span> [ <span class="dt">LNum</span> <span class="dv">1</span>, <span class="dt">LNum</span> <span class="dv">4</span> ] )</span>
<span id="cb35-11"><a href="#cb35-11" aria-hidden="true" tabindex="-1"></a>, <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"printNums"</span> [ <span class="dt">LNum</span> <span class="dv">11</span>, <span class="dt">LNum</span> <span class="dv">16</span> ] ) ]</span></code></pre></div></td>
</tr>
</tbody>
</table>
</div>
</div>
</details>
<details>
<summary class="print-em">
Source code and AST for <code>pingpong.co</code>
</summary>
<div class="sbs-code">
<div class="scrollable-table">
<table>
<colgroup>
<col style="width: 20%"></col>
<col style="width: 79%"></col>
</colgroup>
<tbody>
<tr>
<td><div class="sourceCode" id="cb36" data-lang="co"><pre class="sourceCode javascript small noNumberSource"><code class="sourceCode javascript"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a><span class="kw">var</span> chan <span class="op">=</span> <span class="fu">newChannel</span>()<span class="op">;</span></span>
<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a><span class="kw">function</span> <span class="fu">player</span>(name) {</span>
<span id="cb36-4"><a href="#cb36-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">var</span> n <span class="op">=</span> <span class="kw">null</span><span class="op">;</span></span>
<span id="cb36-5"><a href="#cb36-5" aria-hidden="true" tabindex="-1"></a> <span class="cf">while</span> (<span class="kw">true</span>) {</span>
<span id="cb36-6"><a href="#cb36-6" aria-hidden="true" tabindex="-1"></a> n <span class="op">=</span> <span class="op"><-</span> chan<span class="op">;</span></span>
<span id="cb36-7"><a href="#cb36-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op">==</span> <span class="st">"done"</span>) {</span>
<span id="cb36-8"><a href="#cb36-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb36-9"><a href="#cb36-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb36-10"><a href="#cb36-10" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb36-11"><a href="#cb36-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb36-12"><a href="#cb36-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" "</span> <span class="op">+</span> n)<span class="op">;</span></span>
<span id="cb36-13"><a href="#cb36-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> (n <span class="op">==</span> <span class="dv">0</span>) {</span>
<span id="cb36-14"><a href="#cb36-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">print</span>(name <span class="op">+</span> <span class="st">" done"</span>)<span class="op">;</span></span>
<span id="cb36-15"><a href="#cb36-15" aria-hidden="true" tabindex="-1"></a> <span class="st">"done"</span> <span class="op">-></span> chan<span class="op">;</span></span>
<span id="cb36-16"><a href="#cb36-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span><span class="op">;</span></span>
<span id="cb36-17"><a href="#cb36-17" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb36-18"><a href="#cb36-18" aria-hidden="true" tabindex="-1"></a> n <span class="op">-</span> <span class="dv">1</span> <span class="op">-></span> chan<span class="op">;</span></span>
<span id="cb36-19"><a href="#cb36-19" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb36-20"><a href="#cb36-20" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb36-21"><a href="#cb36-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb36-22"><a href="#cb36-22" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">player</span>(<span class="st">"ping"</span>)<span class="op">;</span></span>
<span id="cb36-23"><a href="#cb36-23" aria-hidden="true" tabindex="-1"></a>spawn <span class="fu">player</span>(<span class="st">"pong"</span>)<span class="op">;</span></span>
<span id="cb36-24"><a href="#cb36-24" aria-hidden="true" tabindex="-1"></a><span class="dv">10</span> <span class="op">-></span> chan<span class="op">;</span></span></code></pre></div></td>
<td><div class="sourceCode" id="cb37" data-lang="haskell"><pre class="sourceCode haskell small noNumberSource"><code class="sourceCode haskell"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>[ <span class="dt">VarStmt</span> <span class="st">"chan"</span> ( <span class="dt">Call</span> <span class="st">"newChannel"</span> [] )</span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>, <span class="dt">FunctionStmt</span> <span class="st">"player"</span> [ <span class="st">"name"</span> ]</span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">VarStmt</span> <span class="st">"n"</span> <span class="dt">LNull</span></span>
<span id="cb37-5"><a href="#cb37-5" aria-hidden="true" tabindex="-1"></a> , <span class="dt">WhileStmt</span> ( <span class="dt">LBool</span> <span class="dt">True</span> )</span>
<span id="cb37-6"><a href="#cb37-6" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">AssignStmt</span> <span class="st">"n"</span> ( <span class="dt">Receive</span> ( <span class="dt">Variable</span> <span class="st">"chan"</span> ) )</span>
<span id="cb37-7"><a href="#cb37-7" aria-hidden="true" tabindex="-1"></a> , <span class="dt">IfStmt</span> ( <span class="dt">Binary</span> <span class="dt">Equals</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LStr</span> <span class="st">"done"</span> ) )</span>
<span id="cb37-8"><a href="#cb37-8" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"name"</span> ) ( <span class="dt">LStr</span> <span class="st">" done"</span> ) ] )</span>
<span id="cb37-9"><a href="#cb37-9" aria-hidden="true" tabindex="-1"></a> , <span class="dt">ReturnStmt</span> <span class="dt">Nothing</span></span>
<span id="cb37-10"><a href="#cb37-10" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb37-11"><a href="#cb37-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-12"><a href="#cb37-12" aria-hidden="true" tabindex="-1"></a> , <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"name"</span> ) ( <span class="dt">LStr</span> <span class="st">" "</span> ) ) ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ] )</span>
<span id="cb37-13"><a href="#cb37-13" aria-hidden="true" tabindex="-1"></a> , <span class="dt">IfStmt</span> ( <span class="dt">Binary</span> <span class="dt">Equals</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LNum</span> <span class="dv">0</span> ) )</span>
<span id="cb37-14"><a href="#cb37-14" aria-hidden="true" tabindex="-1"></a> [ <span class="dt">ExprStmt</span> ( <span class="dt">Call</span> <span class="st">"print"</span> [ <span class="dt">Binary</span> <span class="dt">Plus</span> ( <span class="dt">Variable</span> <span class="st">"name"</span> ) ( <span class="dt">LStr</span> <span class="st">" done"</span> ) ] )</span>
<span id="cb37-15"><a href="#cb37-15" aria-hidden="true" tabindex="-1"></a> , <span class="dt">SendStmt</span> ( <span class="dt">LStr</span> <span class="st">"done"</span> ) <span class="st">"chan"</span></span>
<span id="cb37-16"><a href="#cb37-16" aria-hidden="true" tabindex="-1"></a> , <span class="dt">ReturnStmt</span> <span class="dt">Nothing</span></span>
<span id="cb37-17"><a href="#cb37-17" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb37-18"><a href="#cb37-18" aria-hidden="true" tabindex="-1"></a> , <span class="dt">SendStmt</span> ( <span class="dt">Binary</span> <span class="dt">Minus</span> ( <span class="dt">Variable</span> <span class="st">"n"</span> ) ( <span class="dt">LNum</span> <span class="dv">1</span> ) ) <span class="st">"chan"</span></span>
<span id="cb37-19"><a href="#cb37-19" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb37-20"><a href="#cb37-20" aria-hidden="true" tabindex="-1"></a> ]</span>
<span id="cb37-21"><a href="#cb37-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-22"><a href="#cb37-22" aria-hidden="true" tabindex="-1"></a>, <span class="dt">SpawnStmt</span> ( <span class="dt">Call</span> <span class="st">"player"</span> [ <span class="dt">LStr</span> <span class="st">"ping"</span> ] )</span>
<span id="cb37-23"><a href="#cb37-23" aria-hidden="true" tabindex="-1"></a>, <span class="dt">SpawnStmt</span> ( <span class="dt">Call</span> <span class="st">"player"</span> [ <span class="dt">LStr</span> <span class="st">"pong"</span> ] )</span>
<span id="cb37-24"><a href="#cb37-24" aria-hidden="true" tabindex="-1"></a>, <span class="dt">SendStmt</span> ( <span class="dt">LNum</span> <span class="dv">10</span> ) <span class="st">"chan"</span> ]</span></code></pre></div></td>
</tr>
</tbody>
</table>
</div>
</div>
</details>
<hr></hr>
<p>That’s all for now. In about 200 lines of code, we have implemented the complete parser for <span class="fancy">Co</span>. In the <a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">next part</a>, we’ll implement a tree-walking interpreter for the <span class="fancy">Co</span> AST that supports the <a href="#basic-features">basic features</a>.</p>
<p>The full code for the parser can be seen <a href="https://abhinavsarkar.net/code/co-parser.html?mtm_campaign=feed">here</a>.</p>
<h2 class="notoc" data-track-content data-content-name="acknowledgements" data-content-piece="implementing-co-1" id="acknowledgements">Acknowledgements</h2>
<p>Many thanks to <a href="https://www.deobald.ca/" target="_blank" rel="noopener">Steven Deobald</a> for reviewing a draft of this article.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<div id="refs" class="references csl-bib-body hanging-indent" data-entry-spacing="0" role="list">
<div id="ref-Bartel2011-ap" class="csl-entry" role="listitem">
Bartel, Joe. <span>“<span>Non-Preemptive</span> Multitasking.”</span> <em>The Computer Journal</em>, no. 30 (May 2011): 37–38, 28. <a href="https://cini.classiccmp.org/pdf/HT68K/HT68K%20TCJ30p37.pdf" target="_blank" rel="noopener">https://cini.classiccmp.org/pdf/HT68K/HT68K%20TCJ30p37.pdf</a>.
</div>
<div id="ref-Hoare1986-ih" class="csl-entry" role="listitem">
Hoare, C A R. <em>Communicating Sequential Processes</em>. Prentice Hall, 1986. <a href="https://doi.org/10.1145/359576.359585" target="_blank" rel="noopener">https://doi.org/10.1145/359576.359585</a>.
</div>
<div id="ref-Hutton1992-vc" class="csl-entry" role="listitem">
Hutton, Graham. <span>“Higher-Order Functions for Parsing.”</span> <em>Journal of Functional Programming</em> 2, no. 3 (1992): 323–43. <a href="https://doi.org/10.1017/S0956796800000411" target="_blank" rel="noopener">https://doi.org/10.1017/S0956796800000411</a>.
</div>
<div id="ref-Knuth1997-rv" class="csl-entry" role="listitem">
Knuth, Donald E. <span>“Coroutines.”</span> In <em>The Art of Computer Programming: Volume 1: Fundamental Algorithms</em>, 3rd ed., 193–200. Addison Wesley, 1997.
</div>
<div id="ref-Watson2017" class="csl-entry" role="listitem">
Watson, Des. <span>“Approaches to Syntax Analysis.”</span> In <em>A Practical Approach to Compiler Construction</em>, 75–93. Springer International Publishing, 2017. <a href="https://doi.org/10.1007/978-3-319-52789-5_4" target="_blank" rel="noopener">https://doi.org/10.1007/978-3-319-52789-5_4</a>.
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Coroutines are often conflated with <a href="https://en.wikipedia.org/wiki/Fiber_(computer_science)" target="_blank" rel="noopener"><em>Fibers</em></a>. Some implementations of coroutines call themselves fibers instead. Quoting Wikipedia:</p>
<blockquote>
<p>Fibers describe essentially the same concept as coroutines. The distinction, if
there is any, is that coroutines are a language-level construct, a form of control
flow, while fibers are a systems-level construct, viewed as threads that happen
to not run in parallel. It is contentious which of the two concepts has priority:
fibers may be viewed as an implementation of coroutines, or as a substrate on which
to implement coroutines.</p>
</blockquote>
<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn2"><p><a href="https://en.wikipedia.org/wiki/Green_threads" target="_blank" rel="noopener"><em>Green threads</em></a> are another popular solution for lightweight concurrency. Unlike coroutines, green threads are preemtable. Unlike <a href="https://en.wikipedia.org/wiki/Thread_(computing)" target="_blank" rel="noopener"><em>Threads</em></a>, they are scheduled by the language runtime instead of the operating system.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p><code>print</code> is a built-in function in <span class="fancy">Co</span>. It behaves same as the <code>console.log</code> function in JavaScript, or the <a href="https://hackage.haskell.org/package/base-4.15.0.0/docs/Prelude.html#v:print" target="_blank" rel="noopener"><code>print</code></a> function in Haskell.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>The complete code for the interpreter is <a href="https://abhinavsarkar.net/code/co-interpreter.html?mtm_campaign=feed">here</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>If you are a Cabal user instead, the process is a little longer. First run:</p>
<pre class="plain"><code>cabal repl \
--build-depends "megaparsec ^>= 9.0" \
--build-depends "lifted-base ^>= 0.2" \
--build-depends "clock ^>= 0.8" \
--build-depends "pqueue ^>= 1.4" \
--build-depends "pretty-simple ^>= 4.0"</code></pre>
<p>Then in the started GHCi REPL, run <code>:load co-interpreter.hs</code>. You should be able to use the functions in the interpreter now.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>We are using:</p>
<ul>
<li>the <a href="https://hackage.haskell.org/package/containers" target="_blank" rel="noopener">containers</a> and <a href="https://hackage.haskell.org/package/pqueue" target="_blank" rel="noopener">pqueue</a> libraries for data structures,</li>
<li>the <a href="https://hackage.haskell.org/package/clock" target="_blank" rel="noopener">clock</a> library for getting system time,</li>
<li>the <a href="https://hackage.haskell.org/package/megaparsec" target="_blank" rel="noopener">megaparsec</a> library for parsing,</li>
<li>the <a href="https://hackage.haskell.org/package/mtl" target="_blank" rel="noopener">mtl</a> and <a href="https://hackage.haskell.org/package/lifted-base" target="_blank" rel="noopener">lifted-base</a> libraries for the Monad stack and operations, and</li>
<li>the <a href="https://hackage.haskell.org/package/pretty-simple" target="_blank" rel="noopener">pretty-simple</a> library for pretty printing data structures.</li>
</ul>
<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn7"><p>I explore parsing in more depth by writing a JSON parser from scratch in Haskell in <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">one of my previous posts</a>.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>I followed <a href="https://github.com/mrkkrp/megaparsec-site/blob/master/tutorials/parsing-simple-imperative-language.md" target="_blank" rel="noopener">this</a> (outdated) Megaparsec tutorial for writing a parser for an imperative language. <a href="https://markkarpov.com/tutorial/megaparsec.html" target="_blank" rel="noopener">This tutorial</a> is also helpful in learning the intricacies of Megaparsec.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p><code>term</code> is a <a href="https://en.wikipedia.org/wiki/Recursive_descent_parser" target="_blank" rel="noopener"><em>Recursive descent parser</em></a>. If we directly include call parsing as an alternative in the <code>primary</code> parser, we would run into the <a href="https://en.wikipedia.org/wiki/Left_recursion" target="_blank" rel="noopener"><em>Left recursion</em></a> problem, causing the parser to hang in infinite loop.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>In the parsing parlance this is called <em><a href="https://en.wikipedia.org/wiki/Backtracking" target="_blank" rel="noopener">Backtracking</a></em>. The alternative is to do a <em><a href="https://en.wikipedia.org/wiki/Parsing#Lookahead" target="_blank" rel="noopener">Lookahead</a></em><sup><a href="#ref-Watson2017" class="citation" title="(Watson, “Approaches to Syntax Analysis”)
">@14</a></sup>.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>An earlier version of this post used the <code>symbol</code> parser for parsing keywords instead of the <code>reserved</code> parser. That would cause the parser to fail to parse variable names like <code>nullx</code> that start with a keyword. Using the <code>reserved</code> parser fixes this issue because of the built-in backtracking using <code>try</code>.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Implementing Co, a Small Language With Coroutines</strong>.</p>
<ol>
<li>
<strong>The Parser</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-2/?mtm_campaign=feed">The Interpreter</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-3/?mtm_campaign=feed">Adding Coroutines</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/implementing-co-4/?mtm_campaign=feed">Adding Channels</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/implementing-co-1/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2021-04-24T00:00:00Z <p>Many major programming languages these days support some lightweight concurrency primitives. The most recent popular ones are <a href="https://en.wikipedia.org/wiki/Go_(programming_language)#Concurrency:_goroutines_and_channels" target="_blank" rel="noopener">Goroutines</a> in <a href="https://golang.org/" target="_blank" rel="noopener">Go</a>, <a href="https://kotlinlang.org/docs/coroutines-basics.html" target="_blank" rel="noopener">Coroutines</a> in <a href="https://kotlinlang.org/" target="_blank" rel="noopener">Kotlin</a> and <a href="https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html" target="_blank" rel="noopener">Async</a> in <a href="https://www.rust-lang.org/" target="_blank" rel="noopener">Rust</a>. Let’s explore some of these concepts in detail by implementing a programming language with support for coroutines and Go-style channels.</p>
https://abhinavsarkar.net/posts/type-level-haskell-aoc7/ Solving Advent of Code “Handy Haversacks” in Type-level Haskell 2021-01-04T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>I have been trying to use type-level programming in Haskell to solve interesting problems since I read <a href="https://thinkingwithtypes.com/" target="_blank" rel="noopener">Thinking with Types</a> by <a href="https://reasonablypolymorphic.com/" target="_blank" rel="noopener">Sandy Maguire</a>. Then I found myself solving the problems in <a href="https://adventofcode.com/2020" target="_blank" rel="noopener">Advent of Code 2020</a> and some of them seemed suitable to be solved with type-level programming. So I decided to give it a shot.</p>
<p><p>This post was originally published on <a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<strong>“Handy Haversacks” in Type-level Haskell</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
</p>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#type-level-programming">Type-level Programming</a></li><li><a href="#handy-haversacks">Handy Haversacks</a></li><li><a href="#terms-types-and-kinds">Terms, Types, and Kinds</a></li><li><a href="#type-level-primitives">Type-level Primitives</a></li><li><a href="#type-families">Type Families</a></li><li><a href="#setup">Setup</a></li><li><a href="#consuming-strings-at-type-level">Consuming Strings at Type-level</a></li><li><a href="#parsing-at-type-level">Parsing at Type-level</a></li><li><a href="#how-many-bags">How Many Bags?</a></li></ol></nav>
<h2 data-track-content data-content-name="type-level-programming" data-content-piece="type-level-haskell-aoc7" id="type-level-programming">Type-level Programming</h2>
<p>Type-level programming (TLP) is, simply put, using the type system of a language to solve a problem, or a part of a problem. In a way, we already do TLP when we create the right types to represent our problems and solutions in code. The right types do a lot of work for us by making sure that wrong models and states do not compile, hence reducing the solution-space for us. But in some languages like <a href="https://www.haskell.org/" target="_blank" rel="noopener">Haskell</a> and <a href="https://www.idris-lang.org/" target="_blank" rel="noopener">Idris</a>, we can do much more than just crafting the right types. We can leverage the type-system itself for computation! Recent versions of Haskell have introduced superb support for various extensions and primitives to make TLP in Haskell easier than ever before<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. Let’s use TLP to solve an interesting problem in this post.</p>
<h2 data-track-content data-content-name="handy-haversacks" data-content-piece="type-level-haskell-aoc7" id="handy-haversacks">Handy Haversacks</h2>
<p><a href="https://adventofcode.com/2020/day/7" target="_blank" rel="noopener">Handy Haversacks</a> is the problem for the day seven of Advent of Code 2020<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. In this problem, our input is a set of rules about some bags. The bags have different colors and may contain zero or more bags of other colors. Here are the rules for the example problem:</p>
<pre class="plain"><code>light red bags contain 1 bright white bag, 2 muted yellow bags.
dark orange bags contain 3 bright white bags, 4 muted yellow bags.
bright white bags contain 1 shiny gold bag.
muted yellow bags contain 2 shiny gold bags, 9 faded blue bags.
shiny gold bags contain 1 dark olive bag, 2 vibrant plum bags.
dark olive bags contain 3 faded blue bags, 4 dotted black bags.
vibrant plum bags contain 5 faded blue bags, 6 dotted black bags.
faded blue bags contain no other bags.
dotted black bags contain no other bags.</code></pre>
<p>We are going to solve the part two of the problem: given the color of a bag, find out how many other bags in total that bag contains. Since the bags can contain more bags, this is a recursive problem. For the rules above, a <code>shiny gold</code> bag contains …</p>
<blockquote>
<p>1 dark olive bag (and the 7 bags within it) plus 2 vibrant plum bags (and the 11 bags within each of those): 1 + 1*7 + 2 + 2*11 = 32 bags!</p>
</blockquote>
<p>At this point, many of the readers would have already solved this problem in their heads: just parse the input to a lookup table and use it to recursively calculate the number of bags. Easy, isn’t it? But what if we want to solve it with type-level programming?</p>
<h2 data-track-content data-content-name="terms-types-and-kinds" data-content-piece="type-level-haskell-aoc7" id="terms-types-and-kinds">Terms, Types, and Kinds</h2>
<p>We are used to working in the world of <em>Terms</em>. Terms are “things” that programs manipulate at the runtime, for example, integers, strings, and user-defined data type instances. Terms have <em>Types</em> which are used by the compiler to prevent certain behaviors at compile-time, even before the programs are run. For example, it prevents you from adding a string to an integer.</p>
<p>The compiler works (or computes) with types instead of terms. This chain goes further. Like terms have types, types have <em>Kinds</em>. Kinds can be thought of as “the types of the Types”. The compiler uses kinds to prevent certain behaviors of the types at compile-time. Let’s use GHCi to explore terms, types, and kinds:</p>
<div class="sourceCode" id="cb2" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">True</span> <span class="co">-- a term</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>True</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">True</span> <span class="co">-- and its type</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>True :: Bool</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Bool</span> <span class="co">-- and the kind of the Bool type</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>Bool :: *</span></code></pre></div>
<p>All terms have only one kind: <code class="sourceCode haskell"><span class="op">*</span></code>. That is, all types like <code class="sourceCode haskell"><span class="dt">Int</span></code>, <code class="sourceCode haskell"><span class="dt">String</span></code> and whatever data types we define ourselves, have the kind <code class="sourceCode haskell"><span class="op">*</span></code>.</p>
<p>It’s trivial to create new types in Haskell with <code>data</code> and <code>newtype</code> definitions. How do we go about creating new kinds? The <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/data_kinds.html#extension-DataKinds" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">DataKinds</span></code></a> extension lets us do that:</p>
<div class="sourceCode" id="cb3" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">data</span> <span class="dt">Allow</span> <span class="ot">=</span> <span class="dt">Yes</span> <span class="op">|</span> <span class="dt">No</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">Yes</span> <span class="co">-- Yes is data constructor with type Allow</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>Yes :: Allow</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Allow</span> <span class="co">-- Allow is a type with kind *</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>Allow :: *</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Yes</span> <span class="co">-- Yes is a type too. Its kind is Allow.</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>Yes :: Allow</span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">DataKinds</span></code> extension promotes types to kinds, and data constructors of types to the types of corresponding kinds. In the example above, <code class="sourceCode haskell"><span class="dt">Yes</span></code> and <code class="sourceCode haskell"><span class="dt">No</span></code> are the promoted types of the promoted kind <code class="sourceCode haskell"><span class="dt">Allow</span></code>. Even though the constructors, types, and kinds may have same names, the compiler can tell apart from their context.</p>
<p>Now we know how to create our own kinds. What if we check for the promoted kinds of the built-in types?</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">True</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>True :: Bool</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dv">1</span><span class="ot"> ::</span> <span class="dt">Int</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>1 :: Int :: Int</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="st">"hello"</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>"hello" :: [Char]</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">True</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>True :: Bool</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dv">1</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>1 :: Nat</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="st">"hello"</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>"hello" :: Symbol</span></code></pre></div>
<p>As expected, the <code class="sourceCode haskell"><span class="dt">Bool</span></code> type is promoted to the <code class="sourceCode haskell"><span class="dt">Bool</span></code> kind. But numbers and strings have kinds <code class="sourceCode haskell"><span class="dt">Nat</span></code> and <code class="sourceCode haskell"><span class="dt">Symbol</span></code> respectively. What are these new kinds?</p>
<h2 data-track-content data-content-name="type-level-primitives" data-content-piece="type-level-haskell-aoc7" id="type-level-primitives">Type-level Primitives</h2>
<p>To be able to do useful computation at type-level, we need type-level numbers and strings. We can use <a href="https://wiki.haskell.org/Peano_numbers" target="_blank" rel="noopener">Peano numbers</a> to encode natural numbers as a type and use the <code class="sourceCode haskell"><span class="dt">DataKinds</span></code> extension to <a href="https://www.parsonsmatt.org/2017/04/26/basic_type_level_programming_in_haskell.html#data-kinds" target="_blank" rel="noopener">promote them to type-level</a>. With numbers as types now, we can use them for interesting things like <a href="https://www.parsonsmatt.org/2017/04/26/basic_type_level_programming_in_haskell.html#vectors" target="_blank" rel="noopener">sized vectors</a> with compile-time validation for index bound checks etc. But Peano numbers are awkward to work with because of their verbosity. Fortunately, GHC has built-in support for type-level natural numbers with the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/GHC-TypeLits.html" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">GHC.TypeLits</span></code></a> package.</p>
<div class="sourceCode" id="cb5" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dv">1</span> <span class="co">-- 1 is a type-level number here</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>1 :: Nat</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dv">10000</span> <span class="co">-- kind of all type-level numbers is GHC.TypeLits.Nat</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>10000 :: Nat</span></code></pre></div>
<p>GHC supports type-level strings as well through the same package. Unlike term-level strings which are lists of <code class="sourceCode haskell"><span class="dt">Char</span></code>s, type-level strings are defined as a primitive and their kind is <code class="sourceCode haskell"><span class="dt">Symbol</span></code><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>.</p>
<div class="sourceCode" id="cb6" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="st">"hello at type-level"</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>"hello at type-level" :: Symbol</span></code></pre></div>
<p>GHC also supports type-level lists and tuples. Type-level lists can contain zero or more types of same kind, while type-level tuples can contain zero or more types of possibly different kinds.</p>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind [<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>]</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>[1, 2, 3] :: [Nat]</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind [<span class="st">"hello"</span>, <span class="st">"world"</span>]</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>["hello", "world"] :: [Symbol]</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- prefix the tuple with ' to disambiguate it as a type-level tuple</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind '(<span class="dv">1</span>, <span class="st">"one"</span>)</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>'(1, "one") :: (Nat, Symbol)</span></code></pre></div>
<p>Now we are familiar with the primitives for type-level computations. How exactly do we do these computations though?</p>
<h2 data-track-content data-content-name="type-families" data-content-piece="type-level-haskell-aoc7" id="type-families">Type Families</h2>
<p><em>Type families</em> can be thought of as functions that work at type-level. Just like we use functions to do computations at term-level, we use type families to do computations at type-level. Type families are enabled by the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_families.html#extension-TypeFamilies" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">TypeFamilies</span></code></a> extension<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<p>Let’s write a simple type family to compute the logical conjunction of two type-level booleans:</p>
<div class="sourceCode" id="cb8" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeFamilies</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">+</span>m</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">type</span> <span class="kw">family</span> <span class="dt">And</span> (<span class="ot">x ::</span> <span class="dt">Bool</span>) (<span class="ot">y ::</span> <span class="dt">Bool</span>)<span class="ot"> ::</span> <span class="dt">Bool</span> <span class="kw">where</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">And</span> <span class="dt">True</span> <span class="dt">True</span> <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">And</span> _ _ <span class="ot">=</span> <span class="dt">False</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">And</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>And :: Bool -> Bool -> Bool</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">True</span> <span class="dt">False</span></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>And True False :: Bool</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>= 'False</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">True</span> <span class="dt">True</span></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a>And True True :: Bool</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>= 'True</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">False</span> <span class="dt">True</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a>And False True :: Bool</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a>= 'False</span></code></pre></div>
<p>Kind of <code>And</code> shows that it is a function at type-level. We apply it using the <code>:kind!</code> command in GHCi to see that it indeed works as expected.</p>
<p>GHC comes with some useful type families to do computations on <code class="sourceCode haskell"><span class="dt">Nat</span></code>s and <code class="sourceCode haskell"><span class="dt">Symbol</span></code>s in the <code class="sourceCode haskell"><span class="dt">GHC.TypeLits</span></code> package. Let’s see them in action:</p>
<div class="sourceCode" id="cb9" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">import</span> <span class="dt">GHC.TypeLits</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeOperators</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dv">1</span> <span class="op">+</span> <span class="dv">2</span> <span class="co">-- addition at type-level</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>1 + 2 :: Nat</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>= 3</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">CmpNat</span> <span class="dv">1</span> <span class="dv">2</span> <span class="co">-- comparison at type-level, return lifted Ordering</span></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>CmpNat 1 2 :: Ordering</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>= 'LT</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">AppendSymbol</span> <span class="st">"hello"</span> <span class="st">"world"</span> <span class="co">-- appending two symbols at type-level</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>AppendSymbol "hello" "world" :: Symbol</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>= "helloworld"</span></code></pre></div>
<p>The <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_operators.html#extension-TypeOperators" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">TypeOperators</span></code></a> extension enables us to define and use type families with symbolic names.</p>
<p>We have learned the basics of TLP in Haskell. Next, we can proceed to solve the actual problem.</p>
<h2 data-track-content data-content-name="setup" data-content-piece="type-level-haskell-aoc7" id="setup">Setup</h2>
<p>This post is written in a literate programming style, meaning if you take all the code snippets from the post (excluding the GHCi examples) in the order they appear and put them in a file, you’ll have a real working program. First go the extensions and imports:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DataKinds, TypeFamilies #-}</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TypeOperators, UndecidableInstances #-}</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">AoC7</span> <span class="kw">where</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Proxy</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Symbol.Ascii</span></span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.TypeLits</span></span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Prelude</span> <span class="kw">hiding</span> (words, reverse)</span></code></pre></div>
<p>We have already encountered some of these extensions in the sections above. We’ll learn about the rest of them as we go along.</p>
<h2 data-track-content data-content-name="consuming-strings-at-type-level" data-content-piece="type-level-haskell-aoc7" id="consuming-strings-at-type-level">Consuming Strings at Type-level</h2>
<p>The first capability required for parsing is to consume the input string character-by-character. It’s easy to do that with term-level strings as they are simply lists of characters. But <code class="sourceCode haskell"><span class="dt">Symbol</span></code>s are built-in primitives and cannot be consumed character-by-character using the built-in functionalities. Therefore, the first thing we should do is to figure out how to break a symbol into its constituent characters. Fortunately for us, the <a href="https://hackage.haskell.org/package/symbols-0.3.0.0" target="_blank" rel="noopener"><code>symbols</code></a> library implements just that with the <a href="https://hackage.haskell.org/package/symbols-0.3.0.0/docs/Data-Symbol-Ascii.html#t:ToList" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ToList</span></code></a> type family<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>. It also provide a few more utilities to work with symbols which we use later for solving our problem. Let’s see what <code>ToList</code> gives us:</p>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">import</span> <span class="dt">Data.Symbol.Ascii</span> (<span class="dt">ToList</span>)</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="st">"hello there"</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>"hello there" :: Symbol</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>= "hello there"</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">ToList</span> <span class="st">"hello there"</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>ToList "hello there" :: [Symbol]</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>= '["h", "e", "l", "l", "o", " ", "t", "h", "e", "r", "e"]</span></code></pre></div>
<p>It does what we want. However, for the purpose of parsing rules for this problem, it’s better to have the symbols broken as words. We already have the capability to break a symbol into a list of symbols of its characters. Now, we can combine the character symbols to create a list of word symbols.</p>
<p>We start with solving this problem with a term-level function. It is like the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/Prelude.html#v:words" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="fu">words</span></code></a> function from the Prelude but of type <code class="sourceCode haskell">[<span class="dt">String</span>] <span class="ot">-></span> [<span class="dt">String</span>]</code> instead of <code class="sourceCode haskell"><span class="dt">String</span> <span class="ot">-></span> [<span class="dt">String</span>]</code>.</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">words</span><span class="ot"> ::</span> [<span class="dt">String</span>] <span class="ot">-></span> [<span class="dt">String</span>]</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="fu">words</span> s <span class="ot">=</span> <span class="fu">reverse</span> <span class="op">$</span> words2 [] s</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot">words2 ::</span> [<span class="dt">String</span>] <span class="ot">-></span> [<span class="dt">String</span>] <span class="ot">-></span> [<span class="dt">String</span>]</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>words2 acc [] <span class="ot">=</span> acc</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>words2 [] (<span class="st">" "</span><span class="op">:</span>xs) <span class="ot">=</span> words2 [] xs</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>words2 [] (x<span class="op">:</span>xs) <span class="ot">=</span> words2 [x] xs</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>words2 acc (<span class="st">" "</span><span class="op">:</span>xs) <span class="ot">=</span> words2 (<span class="st">""</span><span class="op">:</span>acc) xs</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a>words2 (a<span class="op">:</span>as) (x<span class="op">:</span>xs) <span class="ot">=</span> words2 ((a <span class="op">++</span> x)<span class="op">:</span>as) xs</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a><span class="fu">reverse</span><span class="ot"> ::</span> [a] <span class="ot">-></span> [a]</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a><span class="fu">reverse</span> l <span class="ot">=</span> rev l []</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="ot">rev ::</span> [a] <span class="ot">-></span> [a] <span class="ot">-></span> [a]</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>rev [] a <span class="ot">=</span> a</span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a>rev (x<span class="op">:</span>xs) a <span class="ot">=</span> rev xs (x<span class="op">:</span>a)</span></code></pre></div>
<p>This code may look unidiomatic Haskell but it’s written this way because we have to translate it to the type families version and type families do not support <code class="sourceCode haskell"><span class="kw">let</span></code> or <code class="sourceCode haskell"><span class="kw">where</span></code> bindings and <code class="sourceCode haskell"><span class="kw">case</span></code> or <code class="sourceCode haskell"><span class="kw">if</span></code> constructs. They support only recursion and pattern matching.</p>
<p><code class="sourceCode haskell"><span class="fu">words</span></code> works as expected:</p>
<div class="sourceCode" id="cb13" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">words</span> [<span class="st">"h"</span>, <span class="st">"e"</span>, <span class="st">"l"</span>, <span class="st">"l"</span>, <span class="st">"o"</span>, <span class="st">" "</span>, <span class="st">"t"</span>, <span class="st">"h"</span>, <span class="st">"e"</span>, <span class="st">"r"</span>, <span class="st">"e"</span>]</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>["hello","there"]</span></code></pre></div>
<p>Translating <code>words</code> to type-level is almost mechanical. Starting with the last function above:</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Rev</span> (<span class="ot">acc ::</span> [<span class="dt">Symbol</span>]) (<span class="ot">chrs ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [<span class="dt">Symbol</span>] <span class="kw">where</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Rev</span> '[] a <span class="ot">=</span> a</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Rev</span> (x <span class="op">:</span> xs) a <span class="ot">=</span> <span class="dt">Rev</span> xs (x <span class="op">:</span> a)</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Reverse</span> (<span class="ot">chrs ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [<span class="dt">Symbol</span>] <span class="kw">where</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Reverse</span> l <span class="ot">=</span> <span class="dt">Rev</span> l '[]</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Words2</span> (<span class="ot">acc ::</span> [<span class="dt">Symbol</span>]) (<span class="ot">chrs ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [<span class="dt">Symbol</span>] <span class="kw">where</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words2</span> acc '[] <span class="ot">=</span> acc</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words2</span> '[] (<span class="st">" "</span> <span class="op">:</span> xs) <span class="ot">=</span> <span class="dt">Words2</span> '[] xs</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words2</span> '[] (x <span class="op">:</span> xs) <span class="ot">=</span> <span class="dt">Words2</span> '[x] xs</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words2</span> acc (<span class="st">" "</span> <span class="op">:</span> xs) <span class="ot">=</span> <span class="dt">Words2</span> (<span class="st">""</span> <span class="op">:</span> acc) xs</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words2</span> (a <span class="op">:</span> as) (x <span class="op">:</span> xs) <span class="ot">=</span> <span class="dt">Words2</span> (<span class="dt">AppendSymbol</span> a x <span class="op">:</span> as) xs</span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Words</span> (<span class="ot">chrs ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [<span class="dt">Symbol</span>] <span class="kw">where</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> <span class="dt">Words</span> s <span class="ot">=</span> <span class="dt">Reverse</span> (<span class="dt">Words2</span> '[] s)</span></code></pre></div>
<p>We need the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/instances.html#extension-UndecidableInstances" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">UndecidableInstances</span></code></a> extension to write these type families. This extension relaxes some rules that GHC places to make sure that type inference in the compiler terminates. In other words, this extension lets us write recursive code at type-level which may never terminate. Since GHC cannot prove that the recursion terminates, it’s up to us programmers to make sure that it does.</p>
<p>Let’s see if it works:</p>
<div class="sourceCode" id="cb15" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">ToList</span> <span class="st">"hello there"</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>ToList "hello there" :: [Symbol]</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>= '["h", "e", "l", "l", "o", " ", "t", "h", "e", "r", "e"]</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Words</span> (<span class="dt">ToList</span> <span class="st">"hello there"</span>)</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>Words (ToList "hello there") :: [Symbol]</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>= '["hello", "there"]</span></code></pre></div>
<p>Great! Now we can move on to the actual parsing of the rules.</p>
<h2 data-track-content data-content-name="parsing-at-type-level" data-content-piece="type-level-haskell-aoc7" id="parsing-at-type-level">Parsing at Type-level</h2>
<p>Here are the rules of the problem as a list of symbols:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Rules</span> <span class="ot">=</span> [</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"light red bags contain 1 bright white bag, 2 muted yellow bags."</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> , <span class="st">"dark orange bags contain 3 bright white bags, 4 muted yellow bags."</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> , <span class="st">"bright white bags contain 1 shiny gold bag."</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> , <span class="st">"muted yellow bags contain 2 shiny gold bags, 9 faded blue bags."</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> , <span class="st">"shiny gold bags contain 1 dark olive bag, 2 vibrant plum bags."</span></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> , <span class="st">"dark olive bags contain 3 faded blue bags, 4 dotted black bags."</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> , <span class="st">"vibrant plum bags contain 5 faded blue bags, 6 dotted black bags."</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> , <span class="st">"faded blue bags contain no other bags."</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> , <span class="st">"dotted black bags contain no other bags."</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> ]</span></code></pre></div>
<p>We can see that the rules always start with the color of the container bag. Then they go on to either say that such-and-such bags “contain no other bags.” or list out the counts of one or more contained colored bags. We capture this model in a new type (and kind!):</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Bag</span> <span class="ot">=</span> <span class="dt">EmptyBag</span> <span class="dt">Symbol</span> <span class="op">|</span> <span class="dt">FilledBag</span> <span class="dt">Symbol</span> [(<span class="dt">Nat</span>, <span class="dt">Symbol</span>)]</span></code></pre></div>
<p>A <code class="sourceCode haskell"><span class="dt">Bag</span></code> is either an <code class="sourceCode haskell"><span class="dt">EmptyBag</span></code> with a color or a <code class="sourceCode haskell"><span class="dt">FilledBag</span></code> with a color and a list of tuples of count of contained bags along with their colors.</p>
<p>Next, we write the parsing logic at type-level which works on word symbols, directly as type families this time:</p>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Parse</span> (<span class="ot">wrds ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> <span class="dt">Bag</span> <span class="kw">where</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse</span> (color1 <span class="op">:</span> color2 <span class="op">:</span> <span class="st">"bags"</span> <span class="op">:</span> <span class="st">"contain"</span> <span class="op">:</span> rest) <span class="ot">=</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse2</span> (<span class="dt">AppendSymbol</span> color1 (<span class="dt">AppendSymbol</span> <span class="st">" "</span> color2)) rest</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Parse2</span> (<span class="ot">color ::</span> <span class="dt">Symbol</span>) (<span class="ot">wrds ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> <span class="dt">Bag</span> <span class="kw">where</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse2</span> color (<span class="st">"no"</span> <span class="op">:</span> _) <span class="ot">=</span> <span class="dt">EmptyBag</span> color</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse2</span> color rest <span class="ot">=</span> <span class="dt">FilledBag</span> color (<span class="dt">Parse3</span> rest)</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">Parse3</span> (<span class="ot">wrds ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [(<span class="dt">Nat</span>, <span class="dt">Symbol</span>)] <span class="kw">where</span></span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse3</span> '[] <span class="ot">=</span> '[]</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parse3</span> (count <span class="op">:</span> color1 <span class="op">:</span> color2 <span class="op">:</span> _ <span class="op">:</span> rest) <span class="ot">=</span></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> ('(<span class="dt">ReadNat</span> count, <span class="dt">AppendSymbol</span> color1 (<span class="dt">AppendSymbol</span> <span class="st">" "</span> color2)) <span class="op">:</span> <span class="dt">Parse3</span> rest)</span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">Parse</span></code> type family parses a list of word symbols into the <code class="sourceCode haskell"><span class="dt">Bag</span></code> type. The logic is straightforward, if a little verbose compared to the equivalent term-level code. We use the <code class="sourceCode haskell"><span class="dt">AppendSymbol</span></code> type family to put word symbols together and the <a href="https://hackage.haskell.org/package/symbols-0.3.0.0/docs/Data-Symbol-Ascii.html#t:ReadNat" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">ReadNat</span></code></a> type family to convert a <code class="sourceCode haskell"><span class="dt">Symbol</span></code> into a <code class="sourceCode haskell"><span class="dt">Nat</span></code>. The rest is pattern matching and recursion. A quick test in GHCi reveals that it works:</p>
<div class="sourceCode" id="cb19" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Parse</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> (<span class="dt">Words</span> (<span class="dt">ToList</span> <span class="st">"light red bags contain 1 bright white bag, 2 muted yellow bags."</span>))</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>Parse (Words (ToList "light red bags contain 1 bright white bag, 2 muted yellow bags.")) :: Bag</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>= 'FilledBag</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> "light red" '[ '(1, "bright white"), '(2, "muted yellow")]</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Parse</span> (<span class="dt">Words</span> (<span class="dt">ToList</span> <span class="st">"bright white bags contain 1 shiny gold bag."</span>))</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>Parse (Words (ToList "bright white bags contain 1 shiny gold bag.")) :: Bag</span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a>= 'FilledBag "bright white" '[ '(1, "shiny gold")]</span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Parse</span> (<span class="dt">Words</span> (<span class="dt">ToList</span> <span class="st">"faded blue bags contain no other bags."</span>))</span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a>Parse (Words (ToList "faded blue bags contain no other bags.")) :: Bag</span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a>= 'EmptyBag "faded blue"</span></code></pre></div>
<p>Finally, we parse all rules into a list of <code class="sourceCode haskell"><span class="dt">Bag</span></code>s:</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">ParseRules</span> (<span class="ot">rules ::</span> [<span class="dt">Symbol</span>])<span class="ot"> ::</span> [<span class="dt">Bag</span>] <span class="kw">where</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">ParseRules</span> '[] <span class="ot">=</span> '[]</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">ParseRules</span> (rule <span class="op">:</span> rest) <span class="ot">=</span> (<span class="dt">Parse</span> (<span class="dt">Words</span> (<span class="dt">ToList</span> rule)) <span class="op">:</span> <span class="dt">ParseRules</span> rest)</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Bags</span> <span class="ot">=</span> <span class="dt">ParseRules</span> <span class="dt">Rules</span></span></code></pre></div>
<p>And validate that it works:</p>
<div class="sourceCode" id="cb21" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">Bags</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>Bags :: [Bag]</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a>= '[ 'FilledBag</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> "light red" '[ '(1, "bright white"), '(2, "muted yellow")],</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> 'FilledBag</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> "dark orange" '[ '(3, "bright white"), '(4, "muted yellow")],</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> 'FilledBag "bright white" '[ '(1, "shiny gold")],</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> 'FilledBag</span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> "muted yellow" '[ '(2, "shiny gold"), '(9, "faded blue")],</span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> 'FilledBag</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a> "shiny gold" '[ '(1, "dark olive"), '(2, "vibrant plum")],</span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> 'FilledBag</span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a> "dark olive" '[ '(3, "faded blue"), '(4, "dotted black")],</span>
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a> 'FilledBag</span>
<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a> "vibrant plum" '[ '(5, "faded blue"), '(6, "dotted black")],</span>
<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a> 'EmptyBag "faded blue", 'EmptyBag "dotted black"]</span></code></pre></div>
<p>On to the final step of solving the problem: calculating the number of contained bags.</p>
<h2 data-track-content data-content-name="how-many-bags" data-content-piece="type-level-haskell-aoc7" id="how-many-bags">How Many Bags?</h2>
<p>We have the list of bags with us now. To calculate the total number of bags contained in a bag of a given color, we need to be able to lookup bags from this list by their colors. So, that’s the first thing we implement:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag</span> (<span class="ot">color ::</span> <span class="dt">Symbol</span>) (<span class="ot">bags ::</span> [<span class="dt">Bag</span>])<span class="ot"> ::</span> <span class="dt">Bag</span> <span class="kw">where</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag</span> color '[] <span class="ot">=</span> <span class="dt">TypeError</span> (<span class="dt">Text</span> <span class="st">"Unknown color: "</span> <span class="op">:<>:</span> <span class="dt">ShowType</span> color)</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag</span> color (<span class="dt">EmptyBag</span> color' <span class="op">:</span> rest) <span class="ot">=</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag2</span> color (<span class="dt">CmpSymbol</span> color color') (<span class="dt">EmptyBag</span> color') rest</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag</span> color (<span class="dt">FilledBag</span> color' contained <span class="op">:</span> rest) <span class="ot">=</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag2</span> color (<span class="dt">CmpSymbol</span> color color') (<span class="dt">FilledBag</span> color' contained) rest</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">LookupBag2</span> (<span class="ot">color ::</span> <span class="dt">Symbol</span>)</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a> (<span class="ot">order ::</span> <span class="dt">Ordering</span>)</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> (<span class="ot">bag ::</span> <span class="dt">Bag</span>)</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a> (<span class="ot">rest ::</span> [<span class="dt">Bag</span>])<span class="ot"> ::</span> <span class="dt">Bag</span> <span class="kw">where</span></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag2</span> _ <span class="dt">EQ</span> bag _ <span class="ot">=</span> bag</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">LookupBag2</span> color _ _ rest <span class="ot">=</span> <span class="dt">LookupBag</span> color rest</span></code></pre></div>
<p>The <code class="sourceCode haskell"><span class="dt">LookupBag</span></code> type family recursively walks through the list of <code class="sourceCode haskell"><span class="dt">Bag</span></code>s, matching each bag’s color to the given color using the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/GHC-TypeLits.html#t:CmpSymbol" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">CmpSymbol</span></code></a> type family. Upon finding a match, it returns the matched bag. If no match is found, it returns a <code class="sourceCode haskell"><span class="dt">TypeError</span></code>. <a href="https://hackage.haskell.org/package/base-4.9.1.0/docs/GHC-TypeLits.html#t:TypeError" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">TypeError</span></code></a> is a type family much like the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/Prelude.html#v:error" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="fu">error</span></code></a> function except it throws a compile time error instead of a runtime error.</p>
<p>Finally, we use <code class="sourceCode haskell"><span class="dt">LookupBag</span></code> to implement the <code class="sourceCode haskell"><span class="dt">BagCount</span></code> type family which does the actual calculation:</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">BagCount</span> (<span class="ot">color ::</span> <span class="dt">Symbol</span>)<span class="ot"> ::</span> <span class="dt">Nat</span> <span class="kw">where</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">BagCount</span> color <span class="ot">=</span> <span class="dt">BagCount2</span> (<span class="dt">LookupBag</span> color <span class="dt">Bags</span>)</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">BagCount2</span> (<span class="ot">bag ::</span> <span class="dt">Bag</span>)<span class="ot"> ::</span> <span class="dt">Nat</span> <span class="kw">where</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">BagCount2</span> (<span class="dt">EmptyBag</span> _) <span class="ot">=</span> <span class="dv">0</span></span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">BagCount2</span> (<span class="dt">FilledBag</span> _ bagCounts) <span class="ot">=</span> <span class="dt">BagCount3</span> bagCounts</span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="kw">family</span> <span class="dt">BagCount3</span> (<span class="ot">a ::</span> [(<span class="dt">Nat</span>, <span class="dt">Symbol</span>)])<span class="ot"> ::</span> <span class="dt">Nat</span> <span class="kw">where</span></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">BagCount3</span> '[] <span class="ot">=</span> <span class="dv">0</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">BagCount3</span> ( '(n, bag) <span class="op">:</span> as) <span class="ot">=</span></span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> n <span class="op">+</span> n <span class="op">GHC.TypeLits.*</span> <span class="dt">BagCount2</span> (<span class="dt">LookupBag</span> bag <span class="dt">Bags</span>) <span class="op">+</span> <span class="dt">BagCount3</span> as</span></code></pre></div>
<p>We use the type-level operators <code class="sourceCode haskell"><span class="op">+</span></code> and <code class="sourceCode haskell"><span class="op">*</span></code> from the <code class="sourceCode haskell"><span class="dt">GHC.TypeLits</span></code> package to do the math on the <code class="sourceCode haskell"><span class="dt">Nat</span></code> numbers. The rest again, is just recursion and pattern matching.</p>
<p>Our work is finished. It’s time to put it all to test in GHCi:</p>
<div class="sourceCode" id="cb24" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">BagCount</span> <span class="st">"shiny gold"</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>BagCount "shiny gold" :: Nat</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a>= 32</span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">BagCount</span> <span class="st">"light red"</span></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a>BagCount "light red" :: Nat</span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a>= 186</span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">BagCount</span> <span class="st">"faded blue"</span></span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a>BagCount "faded blue" :: Nat</span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a>= 0</span></code></pre></div>
<p>It works! We can also convert the type-level counts to term-level using the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/GHC-TypeLits.html#v:natVal" target="_blank" rel="noopener"><code>natVal</code></a> function and the <a href="https://hackage.haskell.org/package/base-4.14.1.0/docs/Data-Proxy.html#t:Proxy" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">Proxy</span></code></a> type with the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_applications.html#extension-TypeApplications" target="_blank" rel="noopener"><code class="sourceCode haskell"><span class="dt">TypeApplications</span></code></a> extension. If we put an invalid color, we get a compilation error instead of a runtime error.</p>
<div class="sourceCode" id="cb25" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeApplications</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> natVal <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>(<span class="dt">BagCount</span> <span class="st">"shiny gold"</span>)</span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a>32</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> natVal <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>(<span class="dt">BagCount</span> <span class="st">"shiny red"</span>)</span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">17</span><span class="op">:</span><span class="dv">1</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> • Unknown color: "shiny red"</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> • In the expression: natVal $ Proxy @(BagCount "shiny red")</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> In an equation for ‘it’:</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> it = natVal $ Proxy @(BagCount "shiny red")</span></code></pre></div>
<hr></hr>
<p>This concludes our little fun experiment with type-level programming in Haskell<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. Though our problem was an easy one, it demonstrated the power of TLP. I hope to see more useful applications of TLP in the Haskell ecosystem going forward.</p>
<p>You can find the complete code for this post <a href="https://abhinavsarkar.net/code/aoc7.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Many modern libraries are increasingly employing TLP for better type-safe APIs. Some examples:</p>
<ul>
<li><a href="https://hackage.haskell.org/package/servant" target="_blank" rel="noopener">servant</a>: Type-safe webservice APIs</li>
<li><a href="https://hackage.haskell.org/package/row-types" target="_blank" rel="noopener">row-types</a>: Type-safe open records</li>
<li><a href="https://hackage.haskell.org/package/polysemy" target="_blank" rel="noopener">polysemy</a>: Type-safe effect system</li>
<li><a href="https://hackage.haskell.org/package/type-of-html" target="_blank" rel="noopener">type-of-html</a>: Type driven HTML generation</li>
<li><a href="https://hackage.haskell.org/package/sized" target="_blank" rel="noopener">sized</a>: Type-safe sized sequence data-types</li>
<li><a href="https://hackage.haskell.org/package/dimensions" target="_blank" rel="noopener">dimensional</a>: Safe type-level dimensionality for multidimensional data</li>
</ul>
<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn2"><p>For the unfamiliar:</p>
<blockquote>
<p>Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like. People use them as a speed contest, interview prep, company training, university coursework, practice problems, or to challenge each other.</p>
</blockquote>
<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn3"><p>Type-level strings have interesting usages like type-safe keys in <a href="https://hackage.haskell.org/package/row-types" target="_blank" rel="noopener">open records</a>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>The type families we use in this post are technically <em>Top-level closed type families</em>. There are other ways of defining type families as described in the <a href="https://wiki.haskell.org/GHC/Type_families" target="_blank" rel="noopener">Haskell wiki</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>The author of the <code>symbols</code> library <a href="https://kcsongor.github.io" target="_blank" rel="noopener">Csongor Kiss</a> has written <a href="https://kcsongor.github.io/symbol-parsing-haskell/" target="_blank" rel="noopener">an excellent post</a> about how <code>ToList</code> is implemented.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>We solve only the example problem in this post but not the actual problem which has a much larger set of rules. This is because it’s just too slow to compile. I suspect it’s because we don’t have a built-in function to break a symbol into its constituent characters and have to resort to complicated type-level hacks for the same.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Solving Advent of Code</strong>.</p>
<ol>
<li>
<strong>“Handy Haversacks” in Type-level Haskell</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/parsers-zippers-interpreters-aoc7/?mtm_campaign=feed">“No Space Left On Device” with Parsers, Zippers and Interpreters</a>
</li>
<li>
<a href="https://abhinavsarkar.net/notes/2022-type-level-rps/?mtm_campaign=feed">“Rock-Paper-Scissors” in Type-level Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/compiling-aoc23-aplenty/?mtm_campaign=feed">“Aplenty” by Compiling</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/solving-aoc20-seating-system/?mtm_campaign=feed">“Seating System” with Comonads and Stencils</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/type-level-haskell-aoc7/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2021-01-04T00:00:00Z <p>I have been trying to use type-level programming in Haskell to solve interesting problems since I read <a href="https://thinkingwithtypes.com/" target="_blank" rel="noopener">Thinking with Types</a> by <a href="https://reasonablypolymorphic.com/" target="_blank" rel="noopener">Sandy Maguire</a>. Then I found myself solving the problems in <a href="https://adventofcode.com/2020" target="_blank" rel="noopener">Advent of Code 2020</a> and some of them seemed suitable to be solved with type-level programming. So I decided to give it a shot.</p>
<p> https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/ JSON Parsing from Scratch in Haskell: Error Reporting—Part 2 2020-09-30T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">previous post</a>, we set out to rewrite the JSON parser we wrote in Haskell in an <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">earlier post</a>, to add support for error reporting. The parser was written very naively: if it failed, it returned nothing. You couldn’t tell what the failure was or where it happened. That’s OK for a toy parser but error reporting is an absolute must requirement for all good parsers. In the previous post, we finished writing the basic framework for the same. In this post, we’ll finish adding simple but useful error reporting capability to our JSON parser.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">JSON Parsing from Scratch in Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">Error Reporting—Part 1</a>
</li>
<li>
<strong>Error Reporting—Part 2</strong> 👈
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#setup">Setup</a></li><li><a href="#jnull-and-jbool-parsers">JNull and JBool Parsers</a></li><li><a href="#jstring-parser">JString Parser</a></li><li><a href="#jnumber-parser">JNumber Parser</a></li><li><a href="#jarray-parser">JArray Parser</a></li><li><a href="#jobject-parser">JObject Parser</a></li><li><a href="#jvalue-parser">JValue Parser</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="setup" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="setup">Setup</h2>
<p>In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">previous post</a>, we implemented a new parser using <a href="https://web.archive.org/web/20200930/https://learnyouahaskell.com/zippers" target="_blank" rel="noopener"><em>Zippers</em></a> which supported error reporting with multiline contextual error messages with the position of the errors. We also wrote some basic parsers using this new parser. Here’s a quick recap of the code:</p>
<p>The <code>JValue</code> data type:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">JValue</span> <span class="ot">=</span> <span class="dt">JNull</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JBool</span> <span class="dt">Bool</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JString</span> <span class="dt">String</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JNumber</span> {<span class="ot"> int ::</span> <span class="dt">Integer</span>,<span class="ot"> frac ::</span> [<span class="dt">Int</span>],<span class="ot"> exponent ::</span> <span class="dt">Integer</span> }</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JArray</span> [<span class="dt">JValue</span>]</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JObject</span> [(<span class="dt">String</span>, <span class="dt">JValue</span>)]</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Generic</span>)</span></code></pre></div>
<p>The <code>ParseResult</code> data type:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ParseResult</span> a <span class="ot">=</span> <span class="dt">Error</span> [<span class="dt">String</span>] <span class="op">|</span> <span class="dt">Result</span> a</span></code></pre></div>
<p>The new <code>TextZipper</code> based parser:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser</span> i o <span class="ot">=</span> <span class="dt">Parser</span> {</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> runParser_ ::</span> <span class="dt">TextZipper</span> i <span class="ot">-></span> <span class="dt">ParseResult</span> (<span class="dt">TextZipper</span> i, o)</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a><span class="ot">runParser ::</span> <span class="dt">Parser</span> <span class="dt">String</span> o <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">ParseResult</span> (<span class="dt">String</span>, o)</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>runParser parser input <span class="ot">=</span></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser_ parser (textZipper <span class="op">$</span> <span class="fu">lines</span> input) <span class="kw">of</span></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="ot">-></span> <span class="dt">Error</span> errs</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (restZ, output) <span class="ot">-></span> <span class="dt">Result</span> (leftOver restZ, output)</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> leftOver tz <span class="ot">=</span> <span class="fu">concat</span> (tzRight tz <span class="op">:</span> tzBelow tz)</span></code></pre></div>
<p>The helper functions to create and throw parse errors:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseError ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">ParseResult</span> a</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>parseError err zipper <span class="ot">=</span> <span class="dt">Error</span> [addPosition err zipper]</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="ot">throw ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> o</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>throw <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">.</span> parseError</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="ot">elseThrow ::</span> <span class="dt">Parser</span> <span class="dt">String</span> o <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> o</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>elseThrow parser err <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser_ parser input <span class="kw">of</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, a) <span class="ot">-></span> <span class="dt">Result</span> (rest, a)</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="ot">-></span> <span class="dt">Error</span> (addPosition err input <span class="op">:</span> errs)</span></code></pre></div>
<p>And finally, the basic parsers written using the new parser:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">lookahead ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>lookahead <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="dt">Result</span> (input, c)</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> parseError <span class="st">"Empty input"</span> input</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a><span class="ot">safeLookahead ::</span> <span class="dt">Parser</span> <span class="dt">String</span> (<span class="dt">Maybe</span> <span class="dt">Char</span>)</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>safeLookahead <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="dt">Result</span> (input, <span class="dt">Just</span> c)</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Result</span> (input, <span class="dt">Nothing</span>)</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="ot">satisfy ::</span> (<span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>satisfy predicate expectation <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> predicate c <span class="ot">-></span> <span class="dt">Result</span> (move input, c)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="fu">flip</span> parseError input <span class="op">$</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a> expectation <span class="op"><></span> <span class="st">", got '"</span> <span class="op"><></span> showCharForErrorMsg c <span class="op"><></span> <span class="st">"'"</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">flip</span> parseError input <span class="op">$</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a> expectation <span class="op"><></span> <span class="st">", but the input is empty"</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a><span class="ot">char ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>char c <span class="ot">=</span> satisfy (<span class="op">==</span> c) <span class="op">$</span> printf <span class="st">"Expected '%v'"</span> <span class="op">$</span> showCharForErrorMsg c</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="ot">digit ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a>digit <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> satisfy <span class="fu">isDigit</span> <span class="st">"Expected a digit"</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a><span class="ot">string ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>string <span class="st">""</span> <span class="ot">=</span> <span class="fu">pure</span> <span class="st">""</span></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>string (c<span class="op">:</span>cs) <span class="ot">=</span> (<span class="op">:</span>) <span class="op"><$></span> char c <span class="op"><*></span> string cs</span></code></pre></div>
<p>Now, we are going to rewrite all the elemental JSON parsers from the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">first post</a> and put them together to create the final JSON parser. We start with the simplest of them.</p>
<h2 data-track-content data-content-name="jnull-and-jbool-parsers" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jnull-and-jbool-parsers">JNull and JBool Parsers</h2>
<p>The <code>jNull</code> and <code>jBool</code> parsers stay the same as seen before.</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jNull ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>jNull <span class="ot">=</span> string <span class="st">"null"</span> <span class="op">$></span> <span class="dt">JNull</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jBool ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>jBool <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">JBool</span> <span class="op"><$></span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> string <span class="st">"true"</span> <span class="op">$></span> <span class="dt">True</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> string <span class="st">"false"</span> <span class="op">$></span> <span class="dt">False</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throw <span class="op">$</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a> errorMsgForChar <span class="st">"Expected: 't' for true or 'f' for false; got '%v'"</span> c</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="ot">errorMsgForChar ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a>errorMsgForChar err c <span class="ot">=</span> printf err <span class="op">$</span> showCharForErrorMsg c</span></code></pre></div>
<h2 data-track-content data-content-name="jstring-parser" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jstring-parser">JString Parser</h2>
<p>JSON strings have a <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#string">complex syntax</a> and the parser for them is going to be the most complex one. First, we write the <code>jsonChar</code> parser to parse a JSON character:</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jsonChar ::</span> <span class="dt">Parser</span> <span class="dt">String</span> (<span class="dt">Char</span>, <span class="dt">Int</span>)</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>jsonChar <span class="ot">=</span> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\\'</span> <span class="ot">-></span> char <span class="ch">'\\'</span> <span class="op">*></span> escapedChar</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> (,<span class="dv">1</span>) <span class="op"><$></span> otherChar</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> escapedChar <span class="ot">=</span> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'"'</span> <span class="ot">-></span> (<span class="ch">'"'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'"'</span></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\\'</span> <span class="ot">-></span> (<span class="ch">'\\'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'\\'</span></span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'/'</span> <span class="ot">-></span> ( <span class="ch">'/'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'/'</span></span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'b'</span> <span class="ot">-></span> (<span class="ch">'\b'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'b'</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> (<span class="ch">'\f'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'f'</span></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a> <span class="ch">'n'</span> <span class="ot">-></span> (<span class="ch">'\n'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'n'</span></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a> <span class="ch">'r'</span> <span class="ot">-></span> (<span class="ch">'\r'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'r'</span></span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> (<span class="ch">'\t'</span>, <span class="dv">2</span>) <span class="op"><$</span> char <span class="ch">'t'</span></span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> <span class="ch">'u'</span> <span class="ot">-></span> (,<span class="dv">6</span>) <span class="op"><$></span> (char <span class="ch">'u'</span> <span class="op">*></span> unicodeChar)</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a> c <span class="ot">-></span> throw <span class="op">$</span> errorMsgForChar <span class="st">"Invalid escaped character: '%v'"</span> c</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a> unicodeChar <span class="ot">=</span></span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> digitsToNumber <span class="dv">16</span> <span class="dv">0</span> <span class="op"><$></span> replicateM <span class="dv">4</span> hexDigit</span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true" tabindex="-1"></a> hexDigit <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true" tabindex="-1"></a> satisfy <span class="fu">isHexDigit</span> <span class="st">"Expected a hex digit"</span></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true" tabindex="-1"></a> otherChar <span class="ot">=</span> satisfy (<span class="fu">not</span> <span class="op">.</span> isQuoteEscapeOrControl)</span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true" tabindex="-1"></a> <span class="st">"Did not except '\"', '\\' or control characters"</span></span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true" tabindex="-1"></a> isQuoteEscapeOrControl c <span class="ot">=</span> c <span class="op">==</span> <span class="ch">'\"'</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'\\'</span> <span class="op">||</span> <span class="fu">isControl</span> c</span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true" tabindex="-1"></a><span class="ot">digitsToNumber ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Integer</span> <span class="ot">-></span> [<span class="dt">Int</span>] <span class="ot">-></span> <span class="dt">Integer</span></span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true" tabindex="-1"></a>digitsToNumber base <span class="ot">=</span></span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true" tabindex="-1"></a> <span class="fu">foldl</span> (\num digit <span class="ot">-></span> num <span class="op">*</span> <span class="fu">fromIntegral</span> base <span class="op">+</span> <span class="fu">fromIntegral</span> digit)</span></code></pre></div>
<p>We do two lookaheads in <code>jsonChar</code><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. The first is for checking if the first character is backslash (<code>\</code>). If so, we consume it and parse the rest of the input for a JSON escaped character. Otherwise, we parse the input for a non escaped and non-control character.</p>
<p>When parsing for escaped characters, we do another lookahead and based on it, we consume one of the known escaped characters. Or if the lookahead gives <code>u</code>, we parse the rest of the input for a unicode character represented with four hex digits.</p>
<p><code>jsonChar</code> also returns the number of characters consumed from the input for the purpose that’ll become clear when we go over the <code>jString</code> parser. We also make sure to throw appropriate errors when parsing fails. Let’s check out <code>jsonChar</code> in GHCi:</p>
<div class="sourceCode" id="cb8" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"a"</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>("",('a',1))</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"\\b"</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>("",('\b',2))</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"\\u0040"</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>("",('@',6))</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"\\g"</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>Invalid escaped character: 'g' at line 1, column 2: \g</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"\\u0x40"</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>Expected a hex digit, got 'x' at line 1, column 4: \u0x40</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jsonChar <span class="st">"\0"</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a>Did not except '"', '\' or control characters, got '\0' at line 1, column 1: \0</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>Both positive and negative tests work fine and the error messages are correct too. Let’s move on to writing the JSON string parser.</p>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jString ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>jString <span class="ot">=</span> <span class="dt">JString</span> <span class="op"><$></span> (char <span class="ch">'"'</span> <span class="op">*></span> jString')</span></code></pre></div>
<p>The <code>jString</code> parser simply consumes the leading double quote character of a JSON string and invokes the ancillary parser <code>jString'</code> that does the rest of the parsing.</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jString' ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>jString' <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead <span class="ot">`elseThrow`</span> <span class="st">"Expected rest of a string"</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> c <span class="op">==</span> <span class="ch">'"'</span></span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> <span class="st">""</span> <span class="op"><$</span> char <span class="ch">'"'</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> jFirstChar</span></code></pre></div>
<p>The <code>jString'</code> parser first does a lookahead, failing which it throws an error. If the lookahead returns the double quote character, then it consumes the character and returns an empty string. Otherwise, it calls the <code>jFirstChar</code> parser to parse the first JSON character.</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jFirstChar ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>jFirstChar <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> (first, count) <span class="ot"><-</span> jsonChar</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> <span class="op">|</span> <span class="fu">not</span> (isSurrogate first) <span class="ot">-></span> (first<span class="op">:</span>) <span class="op"><$></span> jString'</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> isHighSurrogate first <span class="ot">-></span> jSecondChar first</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> pushback count</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> throw</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> errorMsgForChar <span class="st">"Expected a high surrogate character, got '%v'"</span></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> first</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="ot">pushback ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> ()</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a>pushback count <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (<span class="fu">iterate</span> moveBackByOne input <span class="op">!!</span> count, ())</span></code></pre></div>
<p><code>jFirstChar</code> calls the <code>jsonChar</code> parser to get the first JSON character and the count of the characters consumed from the input. If the first JSON character is not a unicode surrogate character then it just calls <code>jString'</code> to parse the rest of the string and returns the first character consed with it<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>If the first character is a high surrogate character then it calls <code>jSecondChar</code> to parse the rest of the string. Else, the first character is a low surrogate character, which is an error case as Unicode surrogate pairs must start with a high surrogate character<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. We can throw an error to report the same but there is a catch here. Since we have already consumed the first JSON character from the input, throwing an error at this point will report the wrong position. To fix this, we need to move back in the input by the number of characters consumed by the first JSON character. For this purpose, we call a special parser <code>pushback</code> which rewinds the cursor in the input text zipper by calling <code>moveBackByOne</code> the right number of times. After invoking the <code>pushback</code> parser, we throw the error.</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jSecondChar ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>jSecondChar first <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> (second, count) <span class="ot"><-</span> jsonChar <span class="ot">`elseThrow`</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> <span class="st">"Expected a second character of a surrogate pair"</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> isLowSurrogate second</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> (combineSurrogates first second <span class="op">:</span>) <span class="op"><$></span> jString'</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="kw">do</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> pushback count</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> throw</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> errorMsgForChar <span class="st">"Expected a low surrogate character, got '%v'"</span></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> second</span></code></pre></div>
<p>The <code>jSecondChar</code> parser is similar to the <code>jFirstChar</code> parser. If it finds a low surrogate JSON character then it combines the high and the low surrogate characters and calls <code>jString'</code> to parse the rest of the string. Else it pushes back the input by the correct number of characters and throw an error<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<p>That completes our most complex parser. Let’s test it out in GHCi:</p>
<div class="sourceCode" id="cb13" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"abc\""</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>("","abc")</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"abc"</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 5: "abc</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>→ Expected rest of a string at line 1, column 5: "abc</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\\uDD1E\""</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a>("","𝄞")</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\""</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a>Did not except '"', '\' or control characters, got '"' at line 1, column 8: \uD834"</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>→ Expected a second character of a surrogate pair at line 1, column 8: \uD834"</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\\u0040\""</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a>Expected a low surrogate character, got '@' at line 1, column 8: \uD834\u0040</span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uDD1E\\uDD1E\""</span></span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a>Expected a high surrogate character, got '?' at line 1, column 2: "\uDD1E</span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>We try out all the success and failure cases. Everything works out right.</p>
<h2 data-track-content data-content-name="jnumber-parser" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jnumber-parser">JNumber Parser</h2>
<p>Numbers in JSON can be in different formats as shown by <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#number">this syntax</a>. Let’s start by writing the parser for a JSON integer.</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jUInt ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>jUInt <span class="ot">=</span> (<span class="ot">`elseThrow`</span> <span class="st">"Expected an unsigned integer"</span>) <span class="op">$</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> <span class="ch">'0'</span> <span class="ot">-></span> <span class="fu">fromIntegral</span> <span class="op"><$></span> digit</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> c <span class="op">|</span> <span class="fu">isDigit</span> c <span class="ot">-></span> digitsToNumber <span class="dv">10</span> <span class="dv">0</span> <span class="op"><$></span> digits</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> c <span class="ot">-></span> throw <span class="op">$</span> printf <span class="st">"Expected a digit, got '%v'"</span> c</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="ot">jInt ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>jInt <span class="ot">=</span> (<span class="ot">`elseThrow`</span> <span class="st">"Expected a signed integer"</span>) <span class="op">$</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'-'</span> <span class="ot">-></span> <span class="fu">negate</span> <span class="op"><$></span> (char <span class="ch">'-'</span> <span class="op">*></span> jUInt)</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> jUInt</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a><span class="ot">digits ::</span> <span class="dt">Parser</span> <span class="dt">String</span> [<span class="dt">Int</span>]</span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a>digits <span class="ot">=</span> ((<span class="op">:</span>) <span class="op"><$></span> digit <span class="op"><*></span> digits') <span class="ot">`elseThrow`</span> <span class="st">"Expected digits"</span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a> digits' <span class="ot">=</span> safeLookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> <span class="fu">isDigit</span> c <span class="ot">-></span> (<span class="op">:</span>) <span class="op"><$></span> digit <span class="op"><*></span> digits'</span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">return</span> []</span></code></pre></div>
<p><code>jUint</code> is a parser for an unsigned JSON integer. It starts with a lookahead and matches the character to see if it is a digit. If so it parses one-or-more characters as digits using the <code>digits</code> parser and converts the list of digits to a number. It handles the case of the first digit being zero specially because only JSON number which can begin with zero is zero itself.</p>
<p><code>jInt</code> adds support for parsing optionally negative signed integers over <code>jUint</code>. Note that there are no positive signed integers in JSON.</p>
<p>The <code>digits</code> parser parses one or more characters into a list of digits as integers. It uses <code>safeLookahead</code> so it stops parsing when it encounters a non-digit character.</p>
<p>Moving on to parsing fractions and exponents:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jFrac ::</span> <span class="dt">Parser</span> <span class="dt">String</span> [<span class="dt">Int</span>]</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>jFrac <span class="ot">=</span> (char <span class="ch">'.'</span> <span class="op">*></span> digits) <span class="ot">`elseThrow`</span> <span class="st">"Expected a fraction"</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jExp ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>jExp c <span class="ot">=</span> (char c <span class="op">*></span> jExp') <span class="ot">`elseThrow`</span> <span class="st">"Expected an exponent"</span></span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> jExp' <span class="ot">=</span> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> <span class="ch">'-'</span> <span class="ot">-></span> <span class="fu">negate</span> <span class="op"><$></span> (char <span class="ch">'-'</span> <span class="op">*></span> jUInt)</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'+'</span> <span class="ot">-></span> char <span class="ch">'+'</span> <span class="op">*></span> jUInt</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> jUInt</span></code></pre></div>
<p>The <code>jFrac</code> parser parses simply for a dot (<code>.</code>) followed by one or more digits.</p>
<p>The <code>jExp</code> parser parses for a exponent symbol character (<code>e</code> or <code>E</code>) followed by a positive or negative or no sign unsigned integer. It uses <code>jUint</code> to do that. The exponent character is provided by the <code>jNumber</code> parser below.</p>
<p>Finally, the number parser <code>jNumber</code> brings all these together to parse any JSON number:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jNumber ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>jNumber <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> i <span class="ot"><-</span> jInt</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> safeLookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="ch">'.'</span> <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> f <span class="ot"><-</span> jFrac</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> safeLookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c' <span class="op">|</span> isExpSym c' <span class="ot">-></span> <span class="dt">JNumber</span> i f <span class="op"><$></span> jExp c' <span class="co">-- int+frac+exp</span></span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">JNumber</span> i f <span class="dv">0</span> <span class="co">-- int+frac</span></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> isExpSym c <span class="ot">-></span> <span class="dt">JNumber</span> i [] <span class="op"><$></span> jExp c <span class="co">-- int+exp</span></span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">pure</span> <span class="op">$</span> <span class="dt">JNumber</span> i [] <span class="dv">0</span> <span class="co">-- int</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> isExpSym c <span class="ot">=</span> c <span class="op">==</span> <span class="ch">'e'</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'E'</span></span></code></pre></div>
<p>The first part of any JSON number is a signed integer, which is followed by an optional fraction part, followed by an optional exponent part. We do two lookaheads to determine if there are following fractional and/or exponent parts and accordingly use the previously defined parsers to parse them. Then we put the parts together depending on the four cases<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<p>Let’s take <code>jNumber</code> for a spin in GHCi:</p>
<div class="sourceCode" id="cb17" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"0"</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>("",0)</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"123"</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>("",123)</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"123.1"</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>("",123.1)</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"123e-22"</span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>("",123e-22)</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"123.11E+3"</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a>("",123.11e3)</span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"01"</span></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>("1",0)</span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"-a"</span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>Expected a digit, got 'a' at line 1, column 2: -a</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a>→ Expected an unsigned integer at line 1, column 2: -a</span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a>→ Expected a signed integer at line 1, column 1: -a</span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"1.a"</span></span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a>Expected a digit, got 'a' at line 1, column 3: 1.a</span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a>→ Expected digits at line 1, column 3: 1.a</span>
<span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a>→ Expected a fraction at line 1, column 2: 1.a</span>
<span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"1ex"</span></span>
<span id="cb17-28"><a href="#cb17-28" aria-hidden="true" tabindex="-1"></a>Expected a digit, got 'x' at line 1, column 3: 1ex</span>
<span id="cb17-29"><a href="#cb17-29" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-30"><a href="#cb17-30" aria-hidden="true" tabindex="-1"></a>→ Expected an unsigned integer at line 1, column 3: 1ex</span>
<span id="cb17-31"><a href="#cb17-31" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-32"><a href="#cb17-32" aria-hidden="true" tabindex="-1"></a>→ Expected an exponent at line 1, column 2: 1ex</span>
<span id="cb17-33"><a href="#cb17-33" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-34"><a href="#cb17-34" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"1.1e+@"</span></span>
<span id="cb17-35"><a href="#cb17-35" aria-hidden="true" tabindex="-1"></a>Expected a digit, got '@' at line 1, column 6: 1.1e+@</span>
<span id="cb17-36"><a href="#cb17-36" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-37"><a href="#cb17-37" aria-hidden="true" tabindex="-1"></a>→ Expected an unsigned integer at line 1, column 6: 1.1e+@</span>
<span id="cb17-38"><a href="#cb17-38" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb17-39"><a href="#cb17-39" aria-hidden="true" tabindex="-1"></a>→ Expected an exponent at line 1, column 4: 1.1e+@</span>
<span id="cb17-40"><a href="#cb17-40" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>Great! We were able to write small parsers for the parts of JSON numbers and combine them together using the <code>Applicative</code> and <code>Monad</code> capabilities to create a parser for any JSON number. This covers all the scalar types in JSON. Coming up are the parser for the two composite types.</p>
<h2 data-track-content data-content-name="jarray-parser" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jarray-parser">JArray Parser</h2>
<p>We start with rewriting the helper functions:</p>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">surroundedBy ::</span> <span class="dt">Parser</span> i a <span class="ot">-></span> <span class="dt">Parser</span> i b <span class="ot">-></span> <span class="dt">Parser</span> i a</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>surroundedBy parser1 parser2 <span class="ot">=</span> parser2 <span class="op">*></span> parser1 <span class="op"><*</span> parser2</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="ot">separatedBy ::</span> <span class="dt">Parser</span> <span class="dt">String</span> v <span class="ot">-></span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> [v]</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>separatedBy parser sepChar errMsg <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> res <span class="ot"><-</span> parser <span class="ot">`elseThrow`</span> errMsg</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> safeLookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> c <span class="op">==</span> sepChar <span class="ot">-></span></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a> (res<span class="op">:</span>) <span class="op"><$></span> (char sepChar <span class="op">*></span> separatedBy parser sepChar errMsg)</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">return</span> [res]</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a><span class="ot">spaces ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>spaces <span class="ot">=</span> safeLookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> isWhitespace c <span class="ot">-></span> (<span class="op">:</span>) <span class="op"><$></span> char c <span class="op"><*></span> spaces</span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">return</span> <span class="st">""</span></span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a> isWhitespace c <span class="ot">=</span> c <span class="op">==</span> <span class="ch">' '</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'\n'</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'\r'</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'\t'</span></span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<p><code>surroundedBy</code> stays the same. <code>separatedBy</code> now takes a separator character instead of a separator parser because we need to match the character with the lookahead. It also takes an error message to throw in case the given parser errors out. Rest of the changes are to convert <code>separatedBy</code> and <code>spaces</code> from <code>Alternative</code> based parsing to lookahead based parsing.</p>
<p>Now, we can rewrite the <code>jArray</code> parser using these helpers:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jArray ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>jArray <span class="ot">=</span> <span class="dt">JArray</span> <span class="op"><$></span> <span class="kw">do</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> _ <span class="ot"><-</span> char <span class="ch">'['</span> <span class="op"><*</span> spaces</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead <span class="ot">`elseThrow`</span> <span class="st">"Expected a JSON value or ']'"</span></span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> <span class="ch">']'</span> <span class="ot">-></span> [] <span class="op"><$</span> char <span class="ch">']'</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> separatedBy jValue <span class="ch">','</span> <span class="st">"Expected a JSON value"</span> <span class="op"><*</span> </span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a> satisfy (<span class="op">==</span> <span class="ch">']'</span>) <span class="st">"Expected ',' or ']'"</span></span></code></pre></div>
<p><code>jArray</code> is now written in a monadic style instead of the earlier applicative style because we need to use lookahead<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. First, it consumes the opening bracket (<code>[</code>) and any spaces that follow. Then it does a lookahead and checks to see if it is the closing bracket (<code>]</code>). If so, it consumes the character and returns an empty array.</p>
<p>Otherwise, it recursively parses for one-or-more JSON values separated by comma using the yet to be defined <code>jValue</code> parser, followed by the closing bracket (<code>]</code>). It also takes care of throwing appropriate errors. Let’s check it out in GHCi:</p>
<div class="sourceCode" id="cb20" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">""</span> <span class="co">-- empty input</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>Expected '[', but the input is empty at line 1, column 1:</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"["</span> <span class="co">-- no closing bracket</span></span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 2: [</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value or ']' at line 1, column 2: [</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[]"</span> <span class="co">-- empty array</span></span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a>("",[])</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[\t \r \n]"</span> <span class="co">-- empty array with whitespace</span></span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a>("",[])</span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[\t\t 0]"</span> <span class="co">-- one element array with whitespace</span></span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a>("",[0])</span>
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[0]"</span> <span class="co">-- one element array without whitespace</span></span>
<span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a>("",[0])</span>
<span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[0,]"</span> <span class="co">-- closing bracket after comma</span></span>
<span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a>Unexpected character: ']' at line 1, column 4: [0,]</span>
<span id="cb20-19"><a href="#cb20-19" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-20"><a href="#cb20-20" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 1, column 4: [0,]</span>
<span id="cb20-21"><a href="#cb20-21" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-22"><a href="#cb20-22" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[0,1]"</span> <span class="co">-- two element array</span></span>
<span id="cb20-23"><a href="#cb20-23" aria-hidden="true" tabindex="-1"></a>("",[0, 1])</span>
<span id="cb20-24"><a href="#cb20-24" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[0,"</span> <span class="co">-- no element after comma</span></span>
<span id="cb20-25"><a href="#cb20-25" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 4: [0,</span>
<span id="cb20-26"><a href="#cb20-26" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-27"><a href="#cb20-27" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 1, column 4: [0,</span>
<span id="cb20-28"><a href="#cb20-28" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-29"><a href="#cb20-29" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[ "</span> <span class="co">-- no closing bracket after whitespace</span></span>
<span id="cb20-30"><a href="#cb20-30" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 8: ······</span>
<span id="cb20-31"><a href="#cb20-31" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-32"><a href="#cb20-32" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value or ']' at line 1, column 8: ······</span>
<span id="cb20-33"><a href="#cb20-33" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb20-34"><a href="#cb20-34" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[0;1]"</span> <span class="co">-- semicolon instead of comma</span></span>
<span id="cb20-35"><a href="#cb20-35" aria-hidden="true" tabindex="-1"></a>Expected ',' or ']', got ';' at line 1, column 3: [0;1]</span>
<span id="cb20-36"><a href="#cb20-36" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>It works as expected for all positive and negative tests. Moving on to the last composite parser for JSON objects.</p>
<h2 data-track-content data-content-name="jobject-parser" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jobject-parser">JObject Parser</h2>
<p>The <code>jObject</code> parser is very similar to the <code>jArray</code> parser: written in a monadic style using lookahead. Only difference is that the surrounding characters are braces (<code>{}</code>) and the recursive parsing is for one-or-more pairs of JSON string and JSON value<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>.</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jObject ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>jObject <span class="ot">=</span> <span class="dt">JObject</span> <span class="op"><$></span> <span class="kw">do</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> _ <span class="ot"><-</span> char <span class="ch">'{'</span> <span class="op"><*</span> spaces</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead <span class="ot">`elseThrow`</span> <span class="st">"Expected a JSON value or '}'"</span></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> <span class="ch">'}'</span> <span class="ot">-></span> [] <span class="op"><$</span> char <span class="ch">'}'</span></span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> separatedBy pair <span class="ch">','</span> <span class="st">"Expected an object key-value pair"</span> <span class="op"><*</span></span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> satisfy (<span class="op">==</span> <span class="ch">'}'</span>) <span class="st">"Expected ',' or '}'"</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> pair <span class="ot">=</span> (\ <span class="op">~</span>(<span class="dt">JString</span> s) j <span class="ot">-></span> (s, j)) <span class="op"><$></span> key <span class="op"><*</span> char <span class="ch">':'</span> <span class="op"><*></span> value</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a> key <span class="ot">=</span> (jString <span class="ot">`surroundedBy`</span> spaces) <span class="ot">`elseThrow`</span> <span class="st">"Expected an object key"</span></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> value <span class="ot">=</span> jValue <span class="ot">`elseThrow`</span> <span class="st">"Expected an object value"</span></span></code></pre></div>
<p>Testing in GHCi:</p>
<div class="sourceCode" id="cb22" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">""</span> <span class="co">-- empty input</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>Expected '{', but the input is empty at line 1, column 1:</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{"</span> <span class="co">-- no closing brace</span></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 2: {</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value or '}' at line 1, column 2: {</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{}"</span> <span class="co">-- empty object</span></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>("",{})</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\t \n}"</span> <span class="co">-- empty object with whitespace</span></span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a>("",{})</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\t \n \"a\": 1}"</span> <span class="co">-- one element object with whitespace</span></span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a>("",{"a": 1})</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\": 1}"</span> <span class="co">-- one element object without whitespace</span></span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a>("",{"a": 1})</span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\": 1,}"</span> <span class="co">-- closing brace after comma</span></span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a>Expected '"', got '}' at line 1, column 9: a":·1,}</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a>→ Expected an object key at line 1, column 9: a":·1,}</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 9: a":·1,}</span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\":}"</span> <span class="co">-- no value in key-value pair</span></span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a>Unexpected character: '}' at line 1, column 6: {"a":}</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a>→ Expected an object value at line 1, column 6: {"a":}</span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 2: {"a":}</span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\": 1,"</span> <span class="co">-- no key after comma</span></span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a>Expected '"', but the input is empty at line 1, column 9: a":·1,</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a>→ Expected an object key at line 1, column 9: a":·1,</span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 9: a":·1,</span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{1}"</span> <span class="co">-- wrong key type</span></span>
<span id="cb22-39"><a href="#cb22-39" aria-hidden="true" tabindex="-1"></a>Expected '"', got '1' at line 1, column 2: {1}</span>
<span id="cb22-40"><a href="#cb22-40" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-41"><a href="#cb22-41" aria-hidden="true" tabindex="-1"></a>→ Expected an object key at line 1, column 2: {1}</span>
<span id="cb22-42"><a href="#cb22-42" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-43"><a href="#cb22-43" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 2: {1}</span>
<span id="cb22-44"><a href="#cb22-44" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-45"><a href="#cb22-45" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\"}"</span> <span class="co">-- no colon after key</span></span>
<span id="cb22-46"><a href="#cb22-46" aria-hidden="true" tabindex="-1"></a>Expected ':', got '}' at line 1, column 5: {"a"}</span>
<span id="cb22-47"><a href="#cb22-47" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-48"><a href="#cb22-48" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 2: {"a"}</span>
<span id="cb22-49"><a href="#cb22-49" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-50"><a href="#cb22-50" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{ "</span> <span class="co">-- no closing brace after whitespace</span></span>
<span id="cb22-51"><a href="#cb22-51" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 8: ······</span>
<span id="cb22-52"><a href="#cb22-52" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-53"><a href="#cb22-53" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value or '}' at line 1, column 8: ······</span>
<span id="cb22-54"><a href="#cb22-54" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-55"><a href="#cb22-55" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\": 1;\"b\": 2}"</span> <span class="co">-- semicolon instead of comma</span></span>
<span id="cb22-56"><a href="#cb22-56" aria-hidden="true" tabindex="-1"></a>Expected ',' or '}', got ';' at line 1, column 8: "a":·1;"b":·</span>
<span id="cb22-57"><a href="#cb22-57" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-58"><a href="#cb22-58" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jObject <span class="st">"{\"a\"# 1}"</span> <span class="co">-- hash instead of semicolon</span></span>
<span id="cb22-59"><a href="#cb22-59" aria-hidden="true" tabindex="-1"></a>Expected ':', got '#' at line 1, column 5: {"a"#·1}</span>
<span id="cb22-60"><a href="#cb22-60" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb22-61"><a href="#cb22-61" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 2: {"a"#·1</span>
<span id="cb22-62"><a href="#cb22-62" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>That’s a lot of tests! But everything works out. On to the final parser which brings everything together.</p>
<h2 data-track-content data-content-name="jvalue-parser" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="jvalue-parser">JValue Parser</h2>
<p>The <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#json-parser">earlier version</a> of the <code>jValue</code> parser was based on backtracking: it tried to parse the input with parsers for each JSON type one-by-one, trying the next parser in case of failure. Since we have eschewed backtracking in this post for the purpose of having correct error message, we need to replace backtracking with lookahead in the <code>jValue</code> parser too.</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jValue ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>jValue <span class="ot">=</span> jValue' <span class="ot">`surroundedBy`</span> spaces</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> jValue' <span class="ot">=</span> lookahead <span class="op">>>=</span> \<span class="kw">case</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> <span class="ch">'n'</span> <span class="ot">-></span> jNull <span class="ot">`elseThrow`</span> <span class="st">"Expected null"</span></span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> jBool <span class="ot">`elseThrow`</span> <span class="st">"Expected true"</span></span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> jBool <span class="ot">`elseThrow`</span> <span class="st">"Expected false"</span></span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\"'</span> <span class="ot">-></span> jString <span class="ot">`elseThrow`</span> <span class="st">"Expected a string"</span></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'['</span> <span class="ot">-></span> jArray <span class="ot">`elseThrow`</span> <span class="st">"Expected an array"</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'{'</span> <span class="ot">-></span> jObject <span class="ot">`elseThrow`</span> <span class="st">"Expected an object"</span></span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> c <span class="op">|</span> c <span class="op">==</span> <span class="ch">'-'</span> <span class="op">||</span> <span class="fu">isDigit</span> c <span class="ot">-></span></span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> jNumber <span class="ot">`elseThrow`</span> <span class="st">"Expected a number"</span></span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> c <span class="ot">-></span> throw <span class="op">$</span> printf <span class="st">"Unexpected character: '%v'"</span></span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> showCharForErrorMsg c</span></code></pre></div>
<p>That turns out to be easier than expected as the JSON syntax is unambiguous. We can always choose the right sub-parser to use by looking at the character lookahead returns. We also sprinkle right error messages for each sub-parser and for the default case. And we are done!</p>
<p>We write the top-level <code>parseJSON</code> function now which runs the <code>jValue</code> parser over the input and return the result as an <code>Either</code>. The <code>printResult</code> function just pretty-prints the result of <code>parseJSON</code>.</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseJSON ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Either</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>parseJSON s <span class="ot">=</span> <span class="kw">case</span> runParser jValue s <span class="kw">of</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (<span class="st">""</span>, j) <span class="ot">-></span> <span class="dt">Right</span> j</span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (i, _) <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="st">"Leftover input: "</span> <span class="op"><></span> i</span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a> err<span class="op">@</span>(<span class="dt">Error</span> _) <span class="ot">-></span> <span class="dt">Left</span> <span class="op">$</span> <span class="fu">show</span> err</span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a><span class="ot">printResult ::</span> <span class="dt">Either</span> <span class="dt">String</span> <span class="dt">JValue</span> <span class="ot">-></span> <span class="dt">IO</span> ()</span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a>printResult <span class="ot">=</span> <span class="fu">putStrLn</span> <span class="op">.</span> <span class="fu">either</span> (<span class="st">"ERROR:\n"</span> <span class="op"><></span>) ((<span class="st">"RESULT:\n"</span> <span class="op"><></span>) <span class="op">.</span> <span class="fu">show</span>)</span></code></pre></div>
<p>Finally, the test case promised at the beginning of the post:</p>
<div class="sourceCode" id="cb25" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> printResult <span class="op">$</span> parseJSON <span class="st">"[{\"c\"\t:\n \n \t[\r\"\\g\"]}]"</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>ERROR:</span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a>Invalid escaped character: 'g' at line 3, column 8: ·\t[\r"\g"]}]</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a>→ Expected a string at line 3, column 6: ··\t[\r"\g"]}</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 3, column 6: ··\t[\r"\g"]}</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a>→ Expected an array at line 3, column 4: ··\t[\r"\g"</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a>→ Expected an object value at line 1, column 8: {"c"\t:\n</span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 3: [{"c"\t:\n</span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a>→ Expected an object at line 1, column 2: [{"c"\t:</span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 1, column 2: [{"c"\t:</span>
<span id="cb25-18"><a href="#cb25-18" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-19"><a href="#cb25-19" aria-hidden="true" tabindex="-1"></a>→ Expected an array at line 1, column 1: [{"c"\t</span>
<span id="cb25-20"><a href="#cb25-20" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb25-21"><a href="#cb25-21" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> printResult <span class="op">$</span> parseJSON <span class="st">"[{\"c\"\t:\n \n \t[\r\"\\n\"]}]"</span></span>
<span id="cb25-22"><a href="#cb25-22" aria-hidden="true" tabindex="-1"></a>RESULT:</span>
<span id="cb25-23"><a href="#cb25-23" aria-hidden="true" tabindex="-1"></a>[{"c": ["\n"]}]</span></code></pre></div>
<p>We also run the property-based tests to make sure that nothing is broken:</p>
<div class="sourceCode" id="cb26" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runTests</span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a>== prop_genParseJString ==</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a>== prop_genParseJNumber ==</span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a>== prop_genParseJArray ==</span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a>== prop_genParseJObject ==</span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a>== prop_genParseJSON ==</span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<p>We have successfully added useful error-reporting capabilities to our JSON parser.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="json-parsing-from-scratch-in-haskell-3" id="conclusion">Conclusion</h2>
<p>Over the course of the last two posts, we rewrote the JSON parser we wrote in the first post to add support for error reporting. While doing that, we learned about Zippers and how to use them to move around within a data structure. We also learned about lookahead based predictive parsing and how it is different from backtracking parsing. The full code for the new JSON parser can be found <a href="https://abhinavsarkar.net/code/jsonparser-2.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>For comparison: <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb34-1">the <code>jsonChar</code> parser</a> which uses backtracking from the first post.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>We are using the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/multiway_if.html#extension-MultiWayIf" target="_blank" rel="noopener">Multi-way If</a> GHC extension here.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>To learn about Unicode surrogate characters and what they have to do with JSON parsing, see the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#jstring-unicode"><code>jString</code> parser section</a> in the first post.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>For comparison: <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb36-1">the <code>jString</code> parser</a> from the first post.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>For comparison: <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#jnumber-parser">the <code>jNumber</code> parser</a> which uses backtracking from the first post.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>For comparison: <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#jarray-parser">the <code>jArray</code> parser</a> from the first post.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>For comparison: <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#jobject-parser">the <code>jObject</code> parser</a> from the first post.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">JSON Parsing from Scratch in Haskell</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">Error Reporting—Part 1</a>
</li>
<li>
<strong>Error Reporting—Part 2</strong> 👈
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2020-09-30T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/">previous post</a>, we set out to rewrite the JSON parser we wrote in Haskell in an <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/">earlier post</a>, to add support for error reporting. The parser was written very naively: if it failed, it returned nothing. You couldn’t tell what the failure was or where it happened. That’s OK for a toy parser but error reporting is an absolute must requirement for all good parsers. In the previous post, we finished writing the basic framework for the same. In this post, we’ll finish adding simple but useful error reporting capability to our JSON parser.</p>
https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/ JSON Parsing from Scratch in Haskell: Error Reporting—Part 1 2020-09-29T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">previous post</a> we wrote a simple but correct JSON parser in Haskell. The parser was written very naively: if it failed, it returned nothing. You couldn’t tell what the failure was or where it happened. That’s OK for a toy parser but error reporting is an absolute must requirement for all good parsers. So in this post and next post, we’ll add simple but useful error reporting capability to our JSON parser.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">JSON Parsing from Scratch in Haskell</a>
</li>
<li>
<strong>Error Reporting—Part 1</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">Error Reporting—Part 2</a>
</li>
</ol>
</section>
<nav id="toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#setup">Setup</a></li><li><a href="#adding-error">Adding Error</a><ol><li><a href="#the-backtracking-problem">The Backtracking Problem</a></li></ol></li><li><a href="#tracking-position">Tracking Position</a><ol><li><a href="#zippers">Zippers</a></li><li><a href="#text-zipper">Text Zipper</a></li><li><a href="#zippered-parser">Zippered Parser</a></li></ol></li><li><a href="#errors-with-position">Errors with Position</a></li><li><a href="#basic-parsers">Basic Parsers</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="introduction">Introduction</h2>
<p>The <a href="https://abhinavsarkar.net/code/jsonparser.html?mtm_campaign=feed">JSON parser</a> we wrote in the previous post works correctly and passes all tests. However, if we run it with an invalid input, it returns <code>Nothing</code>:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseJSON ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">JValue</span></span></code></pre></div>
<p>This is a not a very user-friendly parser. In the real-world, we often have to run potentially invalid input through parsers and we expect the parsers to help us out in figuring what is wrong, and sometimes even to keep going by letting us handle or ignore the errors. Different parsers support error handling and reporting to different degrees. <a href="https://github.com/mrkkrp/megaparsec" target="_blank" rel="noopener">Megaparsec</a> is a Haskell parser library which has good support for it. Here’s how a parsing error looks when working with Megaparsec:</p>
<pre class="plain"><code>1:4:
|
1 | aaacc
| ^
unexpected 'c'
expecting 'a' or 'b'
in foo, in bar</code></pre>
<p>The error report tells us what the parser was expecting and what it got. It also tells us where the error happened. It even tells us the context of the error by telling us that it happened “in foo, in bar”. Such an error report is definitely quite useful to track down the problems with the inputs or even with the parsers<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
<h2 data-track-content data-content-name="setup" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="setup">Setup</h2>
<p>We want to implement similar error reporting for our JSON parser. To be specific, we want the error reporting to tell us:</p>
<ul>
<li>The nature of errors: what is expected and what is wrong.</li>
<li>The position of errors in terms of line and column numbers in the input.</li>
<li>The context of errors in terms of the JSON syntax.</li>
</ul>
<p>Here’s how it will look like when we are done:</p>
<div class="sourceCode" id="cb3" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> printResult <span class="op">$</span> parseJSON <span class="st">"[{\"c\"\t:\n \n \t[\r\"\\g\"]}]"</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>ERROR:</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>Invalid escaped character: 'g' at line 3, column 8: ·\t[\r"\g"]}]</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>→ Expected a string at line 3, column 6: ··\t[\r"\g"]}</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 3, column 6: ··\t[\r"\g"]}</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>→ Expected an array at line 3, column 4: ··\t[\r"\g"</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>→ Expected an object value at line 1, column 8: {"c"\t:\n</span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a>→ Expected an object key-value pair at line 1, column 3: [{"c"\t:\n</span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a>→ Expected an object at line 1, column 2: [{"c"\t:</span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>→ Expected a JSON value at line 1, column 2: [{"c"\t:</span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a>→ Expected an array at line 1, column 1: [{"c"\t</span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>Adding support for error reporting will be a major code change. We will rely on the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb61">property-based tests</a> which we wrote in the previous post to make sure that nothing breaks<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. A lot of code though will stay the same. Instead of showing such parts in this post, I’ll link to the relevant sections in the previous post.</p>
<p>To start with, the imports are below.</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DeriveGeneric, TupleSections #-}</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE LambdaCase, MultiWayIf #-}</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">JSONParser</span> <span class="kw">where</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Applicative</span> (<span class="dt">Alternative</span>(..))</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (replicateM)</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Bits</span> (shiftL)</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (isDigit, isHexDigit, isSpace, chr, ord, digitToInt)</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Functor</span> (($>))</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (intercalate)</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List.Split</span> (split, dropFinalBlank, keepDelimsR, onSublist)</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="kw">qualified</span> <span class="dt">Data.List.NonEmpty</span> <span class="kw">as</span> <span class="dt">NEL</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.Generics</span> (<span class="dt">Generic</span>)</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Numeric</span> (showHex)</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Prelude</span> <span class="kw">hiding</span> (lines)</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Text.Printf</span> (printf)</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.QuickCheck</span> <span class="kw">hiding</span> (<span class="dt">Positive</span>, <span class="dt">Negative</span>)</span></code></pre></div>
<p>Here’s the <code>JValue</code> data type for a refresher:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">JValue</span> <span class="ot">=</span> <span class="dt">JNull</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JBool</span> <span class="dt">Bool</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JString</span> <span class="dt">String</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JNumber</span> {<span class="ot"> int ::</span> <span class="dt">Integer</span>,<span class="ot"> frac ::</span> [<span class="dt">Int</span>],<span class="ot"> exponent ::</span> <span class="dt">Integer</span> }</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JArray</span> [<span class="dt">JValue</span>]</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JObject</span> [(<span class="dt">String</span>, <span class="dt">JValue</span>)]</span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Generic</span>)</span></code></pre></div>
<p>The <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb3-1">instances</a> for <code>JValue</code> and the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#json-generators">JSON generators</a> remain the same.</p>
<h2 data-track-content data-content-name="adding-error" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="adding-error">Adding Error</h2>
<p>The old <code>Parser</code> type we defined in the previous post returned <code>Nothing</code> in case of failures:</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser</span> i o <span class="ot">=</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parser</span> {<span class="ot"> runParser ::</span> i <span class="ot">-></span> <span class="dt">Maybe</span> (i, o) }</span></code></pre></div>
<p>To be able to return errors, we start with creating a new type to capture the possible results of parsing:</p>
<div class="sourceCode" id="cb7" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ParseResult</span> a <span class="ot">=</span> <span class="dt">Error</span> [<span class="dt">String</span>] <span class="op">|</span> <span class="dt">Result</span> a</span></code></pre></div>
<p><code>ParseResult</code>—the result of parsing—is now either an <code>Error</code> with a list of error messages, or a <code>Result</code> with the result of successful parsing.</p>
<p>Let’s quickly write the various typeclass instances for it:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> a <span class="ot">=></span> <span class="dt">Show</span> (<span class="dt">ParseResult</span> a) <span class="kw">where</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">Result</span> res) <span class="ot">=</span> <span class="fu">show</span> res</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">Error</span> errs) <span class="ot">=</span> formatErrors (<span class="fu">reverse</span> errs)</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> formatErrors [] <span class="ot">=</span> <span class="fu">error</span> <span class="st">"No errors to format"</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> formatErrors [err] <span class="ot">=</span> err</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> formatErrors (err<span class="op">:</span>errs) <span class="ot">=</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> err <span class="op"><></span> delim <span class="op"><></span> intercalate delim (<span class="fu">map</span> (<span class="fu">concatMap</span> padNewline) errs)</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> delim <span class="ot">=</span> <span class="st">"\n→ "</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> padNewline <span class="ch">'\n'</span> <span class="ot">=</span> <span class="ch">'\n'</span><span class="op">:</span><span class="fu">replicate</span> (<span class="fu">length</span> delim <span class="op">-</span> <span class="dv">1</span>) <span class="ch">' '</span></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a> padNewline c <span class="ot">=</span> [c]</span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> <span class="dt">ParseResult</span> <span class="kw">where</span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> _ (<span class="dt">Error</span> errs) <span class="ot">=</span> <span class="dt">Error</span> errs</span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> f (<span class="dt">Result</span> res) <span class="ot">=</span> <span class="dt">Result</span> (f res)</span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">ParseResult</span> <span class="kw">where</span></span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> <span class="ot">=</span> <span class="dt">Result</span></span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="op"><*></span> _ <span class="ot">=</span> <span class="dt">Error</span> errs</span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> f <span class="op"><*></span> result <span class="ot">=</span> <span class="fu">fmap</span> f result</span></code></pre></div>
<p>The <code>Show</code> instance shows each error message in the list on its own line, starting with the last message. The results are shown verbatim. The <code>Functor</code> and <code>Applicative</code> instances propagate errors while operating on results as expected.</p>
<p>Let’s print an error in GHCi to get a feel of it:</p>
<div class="sourceCode" id="cb9" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">print</span> <span class="op">$</span> <span class="dt">Error</span> [<span class="st">"something went wrong"</span>, <span class="st">"and we know"</span>]</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>and we know</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>→ something went wrong</span></code></pre></div>
<p>Now we write a new Parser type which returns <code>ParseResult</code> instead of <code>Maybe</code><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>:</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser1</span> i o <span class="ot">=</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parser1</span> {<span class="ot"> runParser1 ::</span> i <span class="ot">-></span> <span class="dt">ParseResult</span> (i, o) }</span></code></pre></div>
<p>And the instances for the new <code>Parser1</code> type:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">Parser1</span> i) <span class="kw">where</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> f parser <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> <span class="fu">fmap</span> (<span class="fu">fmap</span> f) <span class="op">.</span> runParser1 parser</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">Parser1</span> i) <span class="kw">where</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> x <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> <span class="fu">pure</span> <span class="op">.</span> (, x)</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> pf <span class="op"><*></span> pa <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser1 pf input <span class="kw">of</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> err <span class="ot">-></span> <span class="dt">Error</span> err</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, f) <span class="ot">-></span> <span class="fu">fmap</span> f <span class="op"><$></span> runParser1 pa rest</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Alternative</span> (<span class="dt">Parser1</span> i) <span class="kw">where</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> empty <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> <span class="fu">const</span> <span class="op">$</span> <span class="dt">Error</span> [<span class="st">"Unknown error."</span>]</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> parser1 <span class="op"><|></span> parser2 <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> \input <span class="ot">-></span></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser1 parser1 input <span class="kw">of</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> res <span class="ot">-></span> <span class="dt">Result</span> res</span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> _ <span class="ot">-></span> runParser1 parser2 input</span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">Parser1</span> i) <span class="kw">where</span></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a> parser <span class="op">>>=</span> f <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser1 parser input <span class="kw">of</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="ot">-></span> <span class="dt">Error</span> errs</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, output) <span class="ot">-></span> runParser1 (f output) rest</span></code></pre></div>
<p>The instances are similar to the ones from the previous post, with additional <code>Error</code> propagation. Some work is delegated to the instances of <code>ParseResult</code>.</p>
<p>Next, we write some helper functions to return errors while parsing:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseError1 ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">ParseResult</span> a</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>parseError1 err <span class="ot">=</span> <span class="dt">Error</span> [err]</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot">throw1 ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser1</span> <span class="dt">String</span> o</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>throw1 <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">.</span> <span class="fu">const</span> <span class="op">.</span> parseError1</span></code></pre></div>
<p>And finally, we rewrite our old parsers to return errors on failures instead of returning <code>Nothing</code>:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">satisfy1 ::</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a> (<span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> (<span class="dt">Char</span> <span class="ot">-></span> <span class="dt">String</span>) <span class="ot">-></span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>satisfy1 predicate mkError <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> (c<span class="op">:</span>cs) <span class="op">|</span> predicate c <span class="ot">-></span> <span class="dt">Result</span> (cs, c)</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> (c<span class="op">:</span>_) <span class="ot">-></span> parseError1 (mkError c)</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> parseError1 <span class="st">"Empty input"</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a><span class="ot">char1 ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a>char1 c <span class="ot">=</span> satisfy1 (<span class="op">==</span> c) <span class="op">$</span> printf <span class="st">"Expected '%v', got '%v'"</span> c</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a><span class="ot">string1 ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a>string1 <span class="st">""</span> <span class="ot">=</span> <span class="fu">pure</span> <span class="st">""</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>string1 (c<span class="op">:</span>cs) <span class="ot">=</span> (<span class="op">:</span>) <span class="op"><$></span> char1 c <span class="op"><*></span> string1 cs</span></code></pre></div>
<p><code>satisfy1</code> is a “higher-order” parser, it takes a function to create the error message—which <code>char1</code> and other parsers which call <code>satisfy1</code> pass—to create contextual error messages. The <code>string1</code> parser stays the same.</p>
<p>At this point, we can play with these parsers in GHCi:</p>
<div class="sourceCode" id="cb14" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 (string1 <span class="st">"abc"</span>) <span class="st">"abc"</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>("","abc")</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 (string1 <span class="st">"abc"</span>) <span class="st">"abx"</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>Expected 'c', got 'x'</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 (string1 <span class="st">"abc"</span>) <span class="st">""</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>Empty input</span></code></pre></div>
<p>Great, it works! Let’s try it out by rewriting the JSON bool parser from the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb32-1">previous post</a>:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jBool1 ::</span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>jBool1 <span class="ot">=</span> string1 <span class="st">"true"</span> <span class="op">$></span> <span class="dt">JBool</span> <span class="dt">True</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string1 <span class="st">"false"</span> <span class="op">$></span> <span class="dt">JBool</span> <span class="dt">False</span></span></code></pre></div>
<p>Over to GHCi:</p>
<div class="sourceCode" id="cb16" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 jBool1 <span class="st">"trux"</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>Expected 'f', got 't'</span></code></pre></div>
<p>Oops. That error message is wrong. It should have said <code>Expected 'e', got 'x'</code>. Somehow, the message is about <code>t</code> instead of <code>x</code>. What’s going on here?</p>
<h3 id="the-backtracking-problem">The Backtracking Problem</h3>
<p>The problem is with the <a href="https://en.wikipedia.org/wiki/backtracking" target="_blank" rel="noopener">backtracking</a> done in the <code>jBool1</code> parser. When the first parser branch of <code>true</code> is unable to parse the input, it backtracks to the start of the input and tries to parse it with the second branch of <code>false</code>. And then it fails while trying to match <code>f</code> with the first character of the input <code>t</code>, hence the error message. The solution to this is to abandon backtracking and write a <em>Predictive Parser</em> with <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?nosidenotes?mtm_campaign=feed#fn12">lookahead</a>.</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">lookahead1 ::</span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>lookahead1 <span class="ot">=</span> <span class="dt">Parser1</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> input<span class="op">@</span>(c<span class="op">:</span>_) <span class="ot">-></span> <span class="dt">Result</span> (input, c)</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> parseError1 <span class="st">"Empty input"</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="ot">jBool2 ::</span> <span class="dt">Parser1</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>jBool2 <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead1</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">JBool</span> <span class="op"><$></span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> string1 <span class="st">"true"</span> <span class="op">$></span> <span class="dt">True</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> string1 <span class="st">"false"</span> <span class="op">$></span> <span class="dt">False</span></span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> throw1 <span class="op">$</span></span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a> printf <span class="st">"Expected: 't' for true or 'f' for false; got '%v'"</span> c</span></code></pre></div>
<p>The <code>lookahead1</code> function lets us peek at the first character of the input without consuming it. We use it in the <code>jBool2</code> function to choose one of the <code>true</code> or <code>false</code> branches categorically, without any backtracking. If the lookahead is neither <code>t</code> or <code>f</code> then we throw an error.</p>
<p>Let’s see if it works:</p>
<div class="sourceCode" id="cb18" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 jBool2 <span class="st">"trux"</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>Expected 'e', got 'x'</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 jBool2 <span class="st">"falze"</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>Expected 's', got 'z'</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser1 jBool2 <span class="st">"null"</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>Expected: 't' for true or 'f' for false; got 'n'</span></code></pre></div>
<p>It works as expected. But this signals a big change for all our previous JSON parsers. We’ll need to switch from backtracking to lookahead everywhere. Fortunately, the JSON syntax is such that any JSON input can be parsed unambiguously with lookahead of only one character and we’ll not require any radical changes. But first, let’s figure out how to add position tracking to our parser.</p>
<h2 data-track-content data-content-name="tracking-position" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="tracking-position">Tracking Position</h2>
<p>We need to track the position where our parser is currently at—in terms of line and column numbers in the input text—so that we can include that information in our error messages. One obvious way of doing this is to make the parser stateful. We can do this by layering the <a href="https://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-State-Strict.html#t:StateT" target="_blank" rel="noopener"><code>StateT</code></a> monad transformer over the basic <code>Parser</code> monad. Then we can have the current line and column numbers in the state and update them while processing the input in the parsers we write.</p>
<p>But we choose to be more adventurous! We’ll instead use an often talked about but seldom used technique: <em>Zippers</em><a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<h3 id="zippers">Zippers</h3>
<p>Quoting the <a href="https://en.wikipedia.org/wiki/Zipper_(data_structure)" target="_blank" rel="noopener">Wikipedia</a> article on Zippers:</p>
<blockquote>
<p>A zipper is a technique of representing an aggregate data structure so that it is convenient for writing programs that traverse the structure arbitrarily and update its contents, especially in purely functional programming languages.</p>
</blockquote>
<p>Basically, zippers are a special view of data structures, which allow one to navigate and update them easily. A zipper always has a focus or cursor which is the current element of the data structure we are “at”. Alongside, it also captures the rest of the data structure in a way that makes it easy to move around it. We can update the data structure by updating the element at the focus. Let’s take the example of a non-empty list to understand zippers.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 121 97'%3E%3C/svg%3E" class="lazyload w-100pct mw-30pct nolink" style="--image-aspect-ratio: 1.2474226804123711" data-src="/images/json-parsing-from-scratch-in-haskell-2/list-zipper.svg" alt="List zipper"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell-2/list-zipper.svg" class="w-100pct mw-30pct nolink" alt="List zipper"></img></noscript>
<figcaption>List zipper</figcaption>
</figure>
<p>For the above list, when are “at” or interested in the element <code>4</code>, the focus of the list zipper is <code>4</code>. It also contains two lists named <code>left</code> and <code>right</code> to capture the elements of the list left and right of the focus respectively. To move the focus from <code>4</code> to <code>3</code> on its left, we just need to uncons <code>3</code> from the left list, make it the focus and cons <code>4</code> to the right list. Here’s the code for list zipper:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">ListZipper</span> a <span class="ot">=</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">ListZipper</span> {</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> lzLeft ::</span> [a]</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> lzFocus ::</span> a</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> lzRight ::</span> [a]</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> } <span class="kw">deriving</span> (<span class="dt">Show</span>)</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a><span class="ot">list2zipper ::</span> <span class="dt">NEL.NonEmpty</span> a <span class="ot">-></span> <span class="dt">ListZipper</span> a</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>list2zipper list <span class="ot">=</span> <span class="dt">ListZipper</span> [] (NEL.head list) (NEL.tail list)</span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a><span class="ot">lzMoveRight ::</span> <span class="dt">ListZipper</span> a <span class="ot">-></span> <span class="dt">ListZipper</span> a</span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a>lzMoveRight (<span class="dt">ListZipper</span> l f []) <span class="ot">=</span> <span class="dt">ListZipper</span> l f []</span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a>lzMoveRight (<span class="dt">ListZipper</span> l f (x<span class="op">:</span>xs)) <span class="ot">=</span> <span class="dt">ListZipper</span> (f<span class="op">:</span>l) x xs</span>
<span id="cb19-14"><a href="#cb19-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a><span class="ot">lzMoveLeft ::</span> <span class="dt">ListZipper</span> a <span class="ot">-></span> <span class="dt">ListZipper</span> a</span>
<span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a>lzMoveLeft (<span class="dt">ListZipper</span> [] f r) <span class="ot">=</span> <span class="dt">ListZipper</span> [] f r</span>
<span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a>lzMoveLeft (<span class="dt">ListZipper</span> (x<span class="op">:</span>xs) f r) <span class="ot">=</span> <span class="dt">ListZipper</span> xs x (f<span class="op">:</span>r)</span>
<span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a><span class="ot">zipper2list ::</span> <span class="dt">ListZipper</span> a <span class="ot">-></span> <span class="dt">NEL.NonEmpty</span> a</span>
<span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a>zipper2list (<span class="dt">ListZipper</span> l f r) <span class="ot">=</span> NEL.fromList <span class="op">$</span> <span class="fu">reverse</span> l <span class="op">++</span> f<span class="op">:</span>r</span></code></pre></div>
<p>Let’s see it in action in GHCi:</p>
<div class="sourceCode" id="cb20" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lz <span class="ot">=</span> list2zipper <span class="op">$</span> NEL.fromList [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lz</span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [], lzFocus = 1, lzRight = [2,3,4,5,6,7,8,9]}</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveLeft lz</span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [], lzFocus = 1, lzRight = [2,3,4,5,6,7,8,9]}</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveRight lz</span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [1], lzFocus = 2, lzRight = [3,4,5,6,7,8,9]}</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveRight <span class="op">$</span> lzMoveRight lz</span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [2,1], lzFocus = 3, lzRight = [4,5,6,7,8,9]}</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveRight <span class="op">$</span> lzMoveRight <span class="op">$</span> lzMoveRight lz</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [3,2,1], lzFocus = 4, lzRight = [5,6,7,8,9]}</span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lz' <span class="ot">=</span> lzMoveRight <span class="op">$</span> lzMoveRight <span class="op">$</span> lzMoveRight lz</span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveLeft lz'</span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [2,1], lzFocus = 3, lzRight = [4,5,6,7,8,9]}</span>
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveLeft <span class="op">$</span> lzMoveLeft lz'</span>
<span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [1], lzFocus = 2, lzRight = [3,4,5,6,7,8,9]}</span>
<span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> lzMoveLeft <span class="op">$</span> lzMoveLeft <span class="op">$</span> lzMoveLeft lz'</span>
<span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a>ListZipper {lzLeft = [], lzFocus = 1, lzRight = [2,3,4,5,6,7,8,9]}</span>
<span id="cb20-19"><a href="#cb20-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> NEL.toList <span class="op">$</span> zipper2list lz'</span>
<span id="cb20-20"><a href="#cb20-20" aria-hidden="true" tabindex="-1"></a>[1,2,3,4,5,6,7,8,9]</span></code></pre></div>
<p>With the understanding of the list zipper, let’s figure out a zipper for our parser input<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<h3 id="text-zipper">Text Zipper</h3>
<p>Though the input to our parser is a <code>String</code>, for the purpose of error reporting, we should think of it as two-dimensional text with rows of lines from top-to-bottom and columns of characters from left-to-right. For this representation, we can devise a zipper as shown in the diagram below:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 297 321'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 0.9252336448598131" data-src="/images/json-parsing-from-scratch-in-haskell-2/text-zipper.svg" alt="Text zipper"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell-2/text-zipper.svg" class="w-100pct mw-60pct nolink" alt="Text zipper"></img></noscript>
<figcaption>Text zipper</figcaption>
</figure>
<p>If we think of our parser moving through this 2D text one character at a time—as a cursor moving through a text document—this zipper structure makes sense. The character just right of the cursor is the current character that the parser is going to consume next. There are some characters to the left of the cursor in the same line which have already been consumed and there are some to the right which are yet to be consumed. Similarly, there are some line above the current line which the parser has already seen and there are some yet unseen lines below the current line. With this view in mind, we can write the code for <code>TextZipper</code>:</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">TextZipper</span> a <span class="ot">=</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">TextZipper</span> {</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> tzLeft ::</span> a</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> tzRight ::</span> a</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> tzAbove ::</span> [a]</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> tzBelow ::</span> [a]</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> a <span class="ot">=></span> <span class="dt">Show</span> (<span class="dt">TextZipper</span> a) <span class="kw">where</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (<span class="dt">TextZipper</span> left right above below) <span class="ot">=</span></span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a> <span class="st">"TextZipper{left="</span> <span class="op"><></span> <span class="fu">show</span> left</span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">", right="</span> <span class="op"><></span> <span class="fu">show</span> right</span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">", above="</span> <span class="op"><></span> <span class="fu">show</span> above</span>
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">", below="</span> <span class="op"><></span> <span class="fu">show</span> below</span>
<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a> <span class="op"><></span> <span class="st">"}"</span></span>
<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-17"><a href="#cb21-17" aria-hidden="true" tabindex="-1"></a><span class="ot">textZipper ::</span> [<span class="dt">String</span>] <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span></span>
<span id="cb21-18"><a href="#cb21-18" aria-hidden="true" tabindex="-1"></a>textZipper [] <span class="ot">=</span> <span class="dt">TextZipper</span> <span class="st">""</span> <span class="st">""</span> [] []</span>
<span id="cb21-19"><a href="#cb21-19" aria-hidden="true" tabindex="-1"></a>textZipper (first<span class="op">:</span>rest) <span class="ot">=</span> <span class="dt">TextZipper</span> <span class="st">""</span> first [] rest</span>
<span id="cb21-20"><a href="#cb21-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-21"><a href="#cb21-21" aria-hidden="true" tabindex="-1"></a><span class="ot">currentPosition ::</span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> (<span class="dt">Int</span>, <span class="dt">Int</span>)</span>
<span id="cb21-22"><a href="#cb21-22" aria-hidden="true" tabindex="-1"></a>currentPosition zipper <span class="ot">=</span></span>
<span id="cb21-23"><a href="#cb21-23" aria-hidden="true" tabindex="-1"></a> (<span class="fu">length</span> (tzAbove zipper) <span class="op">+</span> <span class="dv">1</span>, <span class="fu">length</span> (tzLeft zipper) <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb21-24"><a href="#cb21-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-25"><a href="#cb21-25" aria-hidden="true" tabindex="-1"></a><span class="ot">currentChar ::</span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Char</span></span>
<span id="cb21-26"><a href="#cb21-26" aria-hidden="true" tabindex="-1"></a>currentChar zipper <span class="ot">=</span> <span class="kw">case</span> tzRight zipper <span class="kw">of</span></span>
<span id="cb21-27"><a href="#cb21-27" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb21-28"><a href="#cb21-28" aria-hidden="true" tabindex="-1"></a> (c<span class="op">:</span>_) <span class="ot">-></span> <span class="dt">Just</span> c</span>
<span id="cb21-29"><a href="#cb21-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-30"><a href="#cb21-30" aria-hidden="true" tabindex="-1"></a><span class="fu">lines</span><span class="ot"> ::</span> <span class="dt">String</span> <span class="ot">-></span> [<span class="dt">String</span>]</span>
<span id="cb21-31"><a href="#cb21-31" aria-hidden="true" tabindex="-1"></a><span class="fu">lines</span> <span class="ot">=</span> (split <span class="op">.</span> dropFinalBlank <span class="op">.</span> keepDelimsR <span class="op">.</span> onSublist) <span class="st">"\n"</span></span></code></pre></div>
<p>Finding the current position of the cursor in <code>TextZipper</code> is trivially easy. The current row number is just the count of lines above the current line plus one. Similarly, the current column number is the count of characters left of the cursor plus one. The <code>currentChar</code> function returns the character just right of the cursor, if there’s one present.</p>
<p>The <code>lines</code> function is a slightly modified version of <code>Prelude</code>’s <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#v:lines" target="_blank" rel="noopener"><code>lines</code></a> function which leaves the newlines (<code>\n</code>) in the output. We do this so that we can report an error if there is a newline at any wrong position like in middle of a JSON string.</p>
<p>Quick trial in GHCi:</p>
<div class="sourceCode" id="cb22" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Prelude.lines <span class="st">"some\nnext\n\nmore"</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>["some","next","","more"]</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">lines</span> <span class="st">"some\nnext\n\nmore"</span></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a>["some\n","next\n","\n","more"]</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text <span class="ot">=</span> <span class="fu">lines</span> <span class="st">"some text\nnext line\nmore lines"</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>["some text\n","next line\n","more lines"]</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz <span class="ot">=</span> textZipper text</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="some text\n", above=[], below=["next line\n","more lines"]}</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> currentPosition tz</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a>(1,1)</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> currentChar tz</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a>Just 's'</span></code></pre></div>
<p>Next, we write functions to move forward and backward in the text zipper:</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">moveByOne ::</span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>moveByOne zipper</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a> <span class="co">-- not at end of line</span></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">not</span> <span class="op">$</span> <span class="fu">null</span> (tzRight zipper) <span class="ot">=</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> zipper { tzLeft <span class="ot">=</span> <span class="fu">head</span> (tzRight zipper) <span class="op">:</span> tzLeft zipper</span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> , tzRight <span class="ot">=</span> <span class="fu">tail</span> <span class="op">$</span> tzRight zipper</span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> <span class="co">-- at end of line but not at end of input</span></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">not</span> <span class="op">$</span> <span class="fu">null</span> (tzBelow zipper) <span class="ot">=</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> zipper { tzAbove <span class="ot">=</span> tzLeft zipper <span class="op">:</span> tzAbove zipper</span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> , tzBelow <span class="ot">=</span> <span class="fu">tail</span> <span class="op">$</span> tzBelow zipper</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> , tzLeft <span class="ot">=</span> <span class="st">""</span></span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> , tzRight <span class="ot">=</span> <span class="fu">head</span> <span class="op">$</span> tzBelow zipper</span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"></a> <span class="co">-- at end of input</span></span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> zipper</span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-18"><a href="#cb23-18" aria-hidden="true" tabindex="-1"></a><span class="ot">move ::</span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span></span>
<span id="cb23-19"><a href="#cb23-19" aria-hidden="true" tabindex="-1"></a>move zipper <span class="ot">=</span> <span class="kw">let</span> zipper' <span class="ot">=</span> moveByOne zipper</span>
<span id="cb23-20"><a href="#cb23-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">case</span> currentChar zipper' <span class="kw">of</span></span>
<span id="cb23-21"><a href="#cb23-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> _ <span class="ot">-></span> zipper'</span>
<span id="cb23-22"><a href="#cb23-22" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> moveByOne zipper'</span></code></pre></div>
<p>The <code>moveByOne</code> function moves forward in the text zipper by one character. It considers three cases:</p>
<ol type="1">
<li>When not at the end of line indicated by <code>tzRight</code> not being empty, it moves the cursor by one character in the same line.</li>
<li>When at the end of line but not at the end of input indicated by <code>tzBelow</code> not being empty, it moves the cursor to the beginning of the next line below.</li>
<li>When at the end of the input, it does nothing.</li>
</ol>
<p>The <code>move</code> function calls <code>moveByOne</code> one or two times to move past end of lines.</p>
<p>The <code>moveBackByOne</code> function is similar to <code>moveByOne</code> except it moves backwards in the zipper:</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">moveBackByOne ::</span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>moveBackByOne zipper</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> <span class="co">-- not at start of line</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">not</span> <span class="op">$</span> <span class="fu">null</span> (tzLeft zipper) <span class="ot">=</span></span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a> zipper { tzLeft <span class="ot">=</span> <span class="fu">tail</span> <span class="op">$</span> tzLeft zipper</span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a> , tzRight <span class="ot">=</span> <span class="fu">head</span> (tzLeft zipper) <span class="op">:</span> tzRight zipper</span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a> <span class="co">-- at start of line but not at start of input</span></span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">not</span> <span class="op">$</span> <span class="fu">null</span> (tzAbove zipper) <span class="ot">=</span></span>
<span id="cb24-10"><a href="#cb24-10" aria-hidden="true" tabindex="-1"></a> zipper { tzAbove <span class="ot">=</span> <span class="fu">tail</span> <span class="op">$</span> tzAbove zipper</span>
<span id="cb24-11"><a href="#cb24-11" aria-hidden="true" tabindex="-1"></a> , tzBelow <span class="ot">=</span> tzRight zipper <span class="op">:</span> tzBelow zipper</span>
<span id="cb24-12"><a href="#cb24-12" aria-hidden="true" tabindex="-1"></a> , tzLeft <span class="ot">=</span> <span class="fu">head</span> <span class="op">$</span> tzAbove zipper</span>
<span id="cb24-13"><a href="#cb24-13" aria-hidden="true" tabindex="-1"></a> , tzRight <span class="ot">=</span> <span class="st">""</span></span>
<span id="cb24-14"><a href="#cb24-14" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb24-15"><a href="#cb24-15" aria-hidden="true" tabindex="-1"></a> <span class="co">-- at start of input</span></span>
<span id="cb24-16"><a href="#cb24-16" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> zipper</span></code></pre></div>
<p>Phew. That was a lot. Let’s try them out in GHCi to build our understanding:</p>
<div class="sourceCode" id="cb25" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text <span class="ot">=</span> <span class="fu">lines</span> <span class="st">"some\nnext\n\nmore"</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text</span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a>["some\n","next\n","\n","more"]</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz <span class="ot">=</span> textZipper text</span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="some\n", above=[], below=["next\n","\n","more"]}</span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- demostrating moveByOne</span></span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveByOne tz <span class="co">-- moves by one char</span></span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a>TextZipper{left="s", right="ome\n", above=[], below=["next\n","\n","more"]}</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> f <span class="ot">`times`</span> n <span class="ot">=</span> (<span class="op">!!</span> n) <span class="op">.</span> <span class="fu">iterate</span> f</span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveByOne <span class="ot">`times`</span> <span class="dv">1</span> <span class="op">$</span> tz <span class="co">-- moves by one char</span></span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a>TextZipper{left="s", right="ome\n", above=[], below=["next\n","\n","more"]}</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveByOne <span class="ot">`times`</span> <span class="dv">4</span> <span class="op">$</span> tz <span class="co">-- moves by four chars</span></span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a>TextZipper{left="emos", right="\n", above=[], below=["next\n","\n","more"]}</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveByOne <span class="ot">`times`</span> <span class="dv">5</span> <span class="op">$</span> tz <span class="co">-- moves by five chars</span></span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a>TextZipper{left="\nemos", right="", above=[], below=["next\n","\n","more"]}</span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveByOne <span class="ot">`times`</span> <span class="dv">6</span> <span class="op">$</span> tz <span class="co">-- moves by six chars</span></span>
<span id="cb25-18"><a href="#cb25-18" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="next\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-19"><a href="#cb25-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- demostrating move</span></span>
<span id="cb25-20"><a href="#cb25-20" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz' <span class="ot">=</span> moveByOne <span class="ot">`times`</span> <span class="dv">6</span> <span class="op">$</span> tz</span>
<span id="cb25-21"><a href="#cb25-21" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz'</span>
<span id="cb25-22"><a href="#cb25-22" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="next\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-23"><a href="#cb25-23" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> move tz' <span class="co">-- moves by one char</span></span>
<span id="cb25-24"><a href="#cb25-24" aria-hidden="true" tabindex="-1"></a>TextZipper{left="n", right="ext\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-25"><a href="#cb25-25" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> move <span class="ot">`times`</span> <span class="dv">1</span> <span class="op">$</span> tz' <span class="co">-- moves by one char</span></span>
<span id="cb25-26"><a href="#cb25-26" aria-hidden="true" tabindex="-1"></a>TextZipper{left="n", right="ext\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-27"><a href="#cb25-27" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> move <span class="ot">`times`</span> <span class="dv">4</span> <span class="op">$</span> tz' <span class="co">-- moves by four chars</span></span>
<span id="cb25-28"><a href="#cb25-28" aria-hidden="true" tabindex="-1"></a>TextZipper{left="txen", right="\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-29"><a href="#cb25-29" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> move <span class="ot">`times`</span> <span class="dv">5</span> <span class="op">$</span> tz' <span class="co">-- moves by six chars, moving past end of line</span></span>
<span id="cb25-30"><a href="#cb25-30" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="\n", above=["\ntxen","\nemos"], below=["more"]}</span>
<span id="cb25-31"><a href="#cb25-31" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- demonstrating moveBackByOne</span></span>
<span id="cb25-32"><a href="#cb25-32" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz' <span class="ot">=</span> moveByOne <span class="ot">`times`</span> <span class="dv">10</span> <span class="op">$</span> tz</span>
<span id="cb25-33"><a href="#cb25-33" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz'</span>
<span id="cb25-34"><a href="#cb25-34" aria-hidden="true" tabindex="-1"></a>TextZipper{left="txen", right="\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-35"><a href="#cb25-35" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveBackByOne tz' <span class="co">-- moves back by one char</span></span>
<span id="cb25-36"><a href="#cb25-36" aria-hidden="true" tabindex="-1"></a>TextZipper{left="xen", right="t\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-37"><a href="#cb25-37" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveBackByOne <span class="ot">`times`</span> <span class="dv">1</span> <span class="op">$</span> tz' <span class="co">-- moves back by one char</span></span>
<span id="cb25-38"><a href="#cb25-38" aria-hidden="true" tabindex="-1"></a>TextZipper{left="xen", right="t\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-39"><a href="#cb25-39" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveBackByOne <span class="ot">`times`</span> <span class="dv">3</span> <span class="op">$</span> tz' <span class="co">-- moves back by three chars</span></span>
<span id="cb25-40"><a href="#cb25-40" aria-hidden="true" tabindex="-1"></a>TextZipper{left="n", right="ext\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-41"><a href="#cb25-41" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveBackByOne <span class="ot">`times`</span> <span class="dv">4</span> <span class="op">$</span> tz' <span class="co">-- moves back by four chars</span></span>
<span id="cb25-42"><a href="#cb25-42" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="next\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb25-43"><a href="#cb25-43" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> moveBackByOne <span class="ot">`times`</span> <span class="dv">5</span> <span class="op">$</span> tz' <span class="co">-- moves back by five chars</span></span>
<span id="cb25-44"><a href="#cb25-44" aria-hidden="true" tabindex="-1"></a>TextZipper{left="\nemos", right="", above=[], below=["next\n","\n","more"]}</span></code></pre></div>
<p>That works as expected. We are now ready to add position tracking to our error reporting parser.</p>
<h3 id="zippered-parser">Zippered Parser</h3>
<p>Adding <code>TextZipper</code> to the parser is simple. We just change the input to be of type <code>TextZipper i</code>.</p>
<div class="sourceCode" id="cb26" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser</span> i o <span class="ot">=</span> <span class="dt">Parser</span> {</span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> runParser_ ::</span> <span class="dt">TextZipper</span> i <span class="ot">-></span> <span class="dt">ParseResult</span> (<span class="dt">TextZipper</span> i, o)</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a><span class="ot">runParser ::</span> <span class="dt">Parser</span> <span class="dt">String</span> o <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">ParseResult</span> (<span class="dt">String</span>, o)</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a>runParser parser input <span class="ot">=</span></span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser_ parser (textZipper <span class="op">$</span> <span class="fu">lines</span> input) <span class="kw">of</span></span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="ot">-></span> <span class="dt">Error</span> errs</span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (restZ, output) <span class="ot">-></span> <span class="dt">Result</span> (leftOver restZ, output)</span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a> leftOver tz <span class="ot">=</span> <span class="fu">concat</span> (tzRight tz <span class="op">:</span> tzBelow tz)</span></code></pre></div>
<p>We also change the <code>runParser</code> function to convert the input string into a text zipper and to convert the text zipper for the leftover input back into a string at the end of parsing.</p>
<p>Finally, we rewrite the instances for <code>Parser</code> without any change in logic:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> f parser <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">fmap</span> (<span class="fu">fmap</span> f) <span class="op">.</span> runParser_ parser</span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> x <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">pure</span> <span class="op">.</span> (, x)</span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a> pf <span class="op"><*></span> pa <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser_ pf input <span class="kw">of</span></span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> err <span class="ot">-></span> <span class="dt">Error</span> err</span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, f) <span class="ot">-></span> <span class="fu">fmap</span> f <span class="op"><$></span> runParser_ pa rest</span>
<span id="cb27-9"><a href="#cb27-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb27-10"><a href="#cb27-10" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb27-11"><a href="#cb27-11" aria-hidden="true" tabindex="-1"></a> parser <span class="op">>>=</span> f <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser_ parser input <span class="kw">of</span></span>
<span id="cb27-12"><a href="#cb27-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> err <span class="ot">-></span> <span class="dt">Error</span> err</span>
<span id="cb27-13"><a href="#cb27-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, o) <span class="ot">-></span> runParser_ (f o) rest</span></code></pre></div>
<p>Notice that there is no <code>Alternative</code> instance of <code>Parser</code> anymore unlike the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#cb31-1">previous post</a>. This is because we are eschewing the backtracking functionality provided by the <code>Alternative</code> instance for our current parser. This also means that we cannot use any convenience functions provided by <code>Alternative</code> like <code>many</code>, <code>some</code> and <code>optional</code>. But that’s okay because we will not need them when using lookahead.</p>
<p>Now that we have the parser with position tracking and error reporting separately, let’s integrate them together.</p>
<h2 data-track-content data-content-name="errors-with-position" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="errors-with-position">Errors with Position</h2>
<p>We want to add positions of the errors to the error messages along with a sample text around the error position. The <code>addPosition</code> function does that:</p>
<div class="sourceCode" id="cb28" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="ot">addPosition ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a>addPosition err zipper <span class="ot">=</span></span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (ln, cn) <span class="ot">=</span> currentPosition zipper</span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a> err' <span class="ot">=</span> printf (err <span class="op"><></span> <span class="st">" at line %d, column %d: "</span>) ln cn</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a> left <span class="ot">=</span> <span class="fu">reverse</span> <span class="op">$</span> tzLeft zipper</span>
<span id="cb28-6"><a href="#cb28-6" aria-hidden="true" tabindex="-1"></a> right <span class="ot">=</span> tzRight zipper</span>
<span id="cb28-7"><a href="#cb28-7" aria-hidden="true" tabindex="-1"></a> left' <span class="ot">=</span> showStr <span class="op">$</span> <span class="fu">drop</span> (<span class="fu">length</span> left <span class="op">-</span> ctxLen) left</span>
<span id="cb28-8"><a href="#cb28-8" aria-hidden="true" tabindex="-1"></a> right' <span class="ot">=</span> showStr <span class="op">$</span> <span class="fu">take</span> ctxLen right</span>
<span id="cb28-9"><a href="#cb28-9" aria-hidden="true" tabindex="-1"></a> line <span class="ot">=</span> left' <span class="op"><></span> right'</span>
<span id="cb28-10"><a href="#cb28-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> printf (err' <span class="op"><></span> <span class="st">"%s\n%s↑"</span>)</span>
<span id="cb28-11"><a href="#cb28-11" aria-hidden="true" tabindex="-1"></a> line</span>
<span id="cb28-12"><a href="#cb28-12" aria-hidden="true" tabindex="-1"></a> (<span class="fu">replicate</span> (<span class="fu">length</span> err' <span class="op">+</span> <span class="fu">length</span> left') <span class="ch">' '</span>)</span>
<span id="cb28-13"><a href="#cb28-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb28-14"><a href="#cb28-14" aria-hidden="true" tabindex="-1"></a> ctxLen <span class="ot">=</span> <span class="dv">6</span></span>
<span id="cb28-15"><a href="#cb28-15" aria-hidden="true" tabindex="-1"></a> showStr <span class="ot">=</span> <span class="fu">concatMap</span> showCharForErrorMsg</span>
<span id="cb28-16"><a href="#cb28-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb28-17"><a href="#cb28-17" aria-hidden="true" tabindex="-1"></a><span class="ot">showCharForErrorMsg ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb28-18"><a href="#cb28-18" aria-hidden="true" tabindex="-1"></a>showCharForErrorMsg c <span class="ot">=</span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb28-19"><a href="#cb28-19" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\b'</span> <span class="ot">-></span> <span class="st">"\\b"</span></span>
<span id="cb28-20"><a href="#cb28-20" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\f'</span> <span class="ot">-></span> <span class="st">"\\f"</span></span>
<span id="cb28-21"><a href="#cb28-21" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\n'</span> <span class="ot">-></span> <span class="st">"\\n"</span></span>
<span id="cb28-22"><a href="#cb28-22" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\r'</span> <span class="ot">-></span> <span class="st">"\\r"</span></span>
<span id="cb28-23"><a href="#cb28-23" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\t'</span> <span class="ot">-></span> <span class="st">"\\t"</span></span>
<span id="cb28-24"><a href="#cb28-24" aria-hidden="true" tabindex="-1"></a> <span class="ch">' '</span> <span class="ot">-></span> <span class="st">"·"</span></span>
<span id="cb28-25"><a href="#cb28-25" aria-hidden="true" tabindex="-1"></a> _ <span class="op">|</span> <span class="fu">isControl</span> c <span class="ot">-></span> <span class="st">"\\"</span> <span class="op"><></span> <span class="fu">show</span> (<span class="fu">ord</span> c)</span>
<span id="cb28-26"><a href="#cb28-26" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> [c]</span></code></pre></div>
<p><code>addPosition</code> takes an error message and a text zipper. It finds the current position in the zipper and gets the text around the current position. Then it adds this text and the position in the error message and returns it. It also takes care of replacing characters in a way that makes the error messages more readable. Let’s see it at work in GHCi:</p>
<div class="sourceCode" id="cb29" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text <span class="ot">=</span> <span class="fu">lines</span> <span class="st">"some\nnext\n\nmore"</span></span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> text</span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a>["some\n","next\n","\n","more"]</span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz <span class="ot">=</span> textZipper text</span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz</span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a>TextZipper{left="", right="some\n", above=[], below=["next\n","\n","more"]}</span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> f <span class="ot">`times`</span> n <span class="ot">=</span> (<span class="op">!!</span> n) <span class="op">.</span> <span class="fu">iterate</span> f</span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz' <span class="ot">=</span> move <span class="ot">`times`</span> <span class="dv">7</span> <span class="op">$</span> tz</span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> tz'</span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a>TextZipper{left="en", right="xt\n", above=["\nemos"], below=["\n","more"]}</span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> addPosition <span class="st">"Something went wrong"</span> tz'</span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a>Something went wrong at line 2, column 3: next\n</span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>It works perfectly. Now we can enhance our error related helper functions to add position info in errors.</p>
<div class="sourceCode" id="cb30" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseError ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">TextZipper</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">ParseResult</span> a</span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>parseError err zipper <span class="ot">=</span> <span class="dt">Error</span> [addPosition err zipper]</span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a><span class="ot">throw ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> o</span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a>throw <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">.</span> parseError</span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a><span class="ot">elseThrow ::</span> <span class="dt">Parser</span> <span class="dt">String</span> o <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> o</span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a>elseThrow parser err <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span></span>
<span id="cb30-9"><a href="#cb30-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser_ parser input <span class="kw">of</span></span>
<span id="cb30-10"><a href="#cb30-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Result</span> (rest, a) <span class="ot">-></span> <span class="dt">Result</span> (rest, a)</span>
<span id="cb30-11"><a href="#cb30-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">Error</span> errs <span class="ot">-></span> <span class="dt">Error</span> (addPosition err input <span class="op">:</span> errs)</span></code></pre></div>
<p>We also define a parser combinator <code>elseThrow</code> which tries to run the parser given to it and returns an error with position info in case the parser fails. We’ll see it in action soon.</p>
<p>Our new parser with position-ful error reporting is complete now. Next, we rewrite all our parsers to use lookahead as we mentioned before, starting with the basic parsers.</p>
<h2 data-track-content data-content-name="basic-parsers" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="basic-parsers">Basic Parsers</h2>
<p>We rewrite the parsers <code>lookahead</code> and <code>satisfy</code> to use our new <code>TextZipper</code> based parser.</p>
<div class="sourceCode" id="cb31" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="ot">lookahead ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a>lookahead <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="dt">Result</span> (input, c)</span>
<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> parseError <span class="st">"Empty input"</span> input</span>
<span id="cb31-5"><a href="#cb31-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-6"><a href="#cb31-6" aria-hidden="true" tabindex="-1"></a><span class="ot">safeLookahead ::</span> <span class="dt">Parser</span> <span class="dt">String</span> (<span class="dt">Maybe</span> <span class="dt">Char</span>)</span>
<span id="cb31-7"><a href="#cb31-7" aria-hidden="true" tabindex="-1"></a>safeLookahead <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb31-8"><a href="#cb31-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="dt">Result</span> (input, <span class="dt">Just</span> c)</span>
<span id="cb31-9"><a href="#cb31-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Result</span> (input, <span class="dt">Nothing</span>)</span>
<span id="cb31-10"><a href="#cb31-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-11"><a href="#cb31-11" aria-hidden="true" tabindex="-1"></a><span class="ot">satisfy ::</span> (<span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb31-12"><a href="#cb31-12" aria-hidden="true" tabindex="-1"></a>satisfy predicate expectation <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> currentChar input <span class="kw">of</span></span>
<span id="cb31-13"><a href="#cb31-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="op">|</span> predicate c <span class="ot">-></span> <span class="dt">Result</span> (move input, c)</span>
<span id="cb31-14"><a href="#cb31-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> c <span class="ot">-></span> <span class="fu">flip</span> parseError input <span class="op">$</span></span>
<span id="cb31-15"><a href="#cb31-15" aria-hidden="true" tabindex="-1"></a> expectation <span class="op"><></span> <span class="st">", got '"</span> <span class="op"><></span> showCharForErrorMsg c <span class="op"><></span> <span class="st">"'"</span></span>
<span id="cb31-16"><a href="#cb31-16" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">flip</span> parseError input <span class="op">$</span></span>
<span id="cb31-17"><a href="#cb31-17" aria-hidden="true" tabindex="-1"></a> expectation <span class="op"><></span> <span class="st">", but the input is empty"</span></span>
<span id="cb31-18"><a href="#cb31-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-19"><a href="#cb31-19" aria-hidden="true" tabindex="-1"></a><span class="ot">char ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb31-20"><a href="#cb31-20" aria-hidden="true" tabindex="-1"></a>char c <span class="ot">=</span> satisfy (<span class="op">==</span> c) <span class="op">$</span> printf <span class="st">"Expected '%v'"</span> <span class="op">$</span> showCharForErrorMsg c</span>
<span id="cb31-21"><a href="#cb31-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-22"><a href="#cb31-22" aria-hidden="true" tabindex="-1"></a><span class="ot">digit ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb31-23"><a href="#cb31-23" aria-hidden="true" tabindex="-1"></a>digit <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> satisfy <span class="fu">isDigit</span> <span class="st">"Expected a digit"</span></span>
<span id="cb31-24"><a href="#cb31-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-25"><a href="#cb31-25" aria-hidden="true" tabindex="-1"></a><span class="ot">string ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb31-26"><a href="#cb31-26" aria-hidden="true" tabindex="-1"></a>string <span class="st">""</span> <span class="ot">=</span> <span class="fu">pure</span> <span class="st">""</span></span>
<span id="cb31-27"><a href="#cb31-27" aria-hidden="true" tabindex="-1"></a>string (c<span class="op">:</span>cs) <span class="ot">=</span> (<span class="op">:</span>) <span class="op"><$></span> char c <span class="op"><*></span> string cs</span></code></pre></div>
<p>We have an additional function <code>safeLookahead</code> which is like <code>lookahead</code> but instead of throwing a parser error on failure, it returns <code>Nothing</code>. Notice that <code>lookahead</code> and <code>safeLookahead</code> only call <code>currentChar</code> but not <code>move</code>, whereas <code>satisfy</code> calls both of them. This means the two lookahead functions do not consume from the input stream but <code>satisfy</code> does. Other parsers are barely changed. We can exercise them in GHCi to see the new functionalities:</p>
<div class="sourceCode" id="cb32" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser lookahead <span class="st">"abc"</span></span>
<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a>("abc",'a')</span>
<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser lookahead <span class="st">""</span></span>
<span id="cb32-4"><a href="#cb32-4" aria-hidden="true" tabindex="-1"></a>Empty input at line 1, column 1:</span>
<span id="cb32-5"><a href="#cb32-5" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb32-6"><a href="#cb32-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser safeLookahead <span class="st">"abc"</span></span>
<span id="cb32-7"><a href="#cb32-7" aria-hidden="true" tabindex="-1"></a>("abc",Just 'a')</span>
<span id="cb32-8"><a href="#cb32-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser safeLookahead <span class="st">""</span></span>
<span id="cb32-9"><a href="#cb32-9" aria-hidden="true" tabindex="-1"></a>("",Nothing)</span>
<span id="cb32-10"><a href="#cb32-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (string <span class="st">"abh"</span>) <span class="st">"abhinav"</span></span>
<span id="cb32-11"><a href="#cb32-11" aria-hidden="true" tabindex="-1"></a>("inav","abh")</span>
<span id="cb32-12"><a href="#cb32-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (string <span class="st">"abc"</span>) <span class="st">"abhinav"</span></span>
<span id="cb32-13"><a href="#cb32-13" aria-hidden="true" tabindex="-1"></a>Expected 'c', got 'h' at line 1, column 3: abhinav</span>
<span id="cb32-14"><a href="#cb32-14" aria-hidden="true" tabindex="-1"></a> ↑</span>
<span id="cb32-15"><a href="#cb32-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser digit <span class="st">"12s"</span></span>
<span id="cb32-16"><a href="#cb32-16" aria-hidden="true" tabindex="-1"></a>("2s",1)</span>
<span id="cb32-17"><a href="#cb32-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser digit <span class="st">"abhinav"</span></span>
<span id="cb32-18"><a href="#cb32-18" aria-hidden="true" tabindex="-1"></a>Expected a digit, got 'a' at line 1, column 1: abhina</span>
<span id="cb32-19"><a href="#cb32-19" aria-hidden="true" tabindex="-1"></a> ↑</span></code></pre></div>
<p>We get correct results and correct error messages with right error positions.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="json-parsing-from-scratch-in-haskell-2" id="conclusion">Conclusion</h2>
<p>We are in the process of rewriting the JSON parser we wrote in the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">previous post</a> to add support for error reporting. In this post, we rewrote the basic parser framework to support throwing errors with multiline contextual messages and error positions. In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">next post</a>, we rewrite all JSON parsers using our new basic parsers to use lookahead instead of backtracking.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>Megaparsec provides a lot of other facilities for working with errors. You can define custom errors with custom messages which seamlessly work with the built-in pretty printing. You can catch errors in your parsers and choose to do another thing. You can even report multiple errors in a single run, which is useful when you are writing some sort of validation/inspection tool. <a href="https://markkarpov.com/tutorial/megaparsec.html" target="_blank" rel="noopener">This tutorial</a> goes in full depth about all the capabilities of Megaparsec.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Having unit tests before refactoring your code is always a good idea. Refactoring should be a behavior-preserving code change. Running the tests continuously ensures that the refactoring steps have not changed any behavior of the code. Property-based tests go a step further ahead by literally capturing the behaviors of your code as tests, no hand-written test data required. Though it is not shown in this post, the tests written in the previous post were a huge help to me and caught many edge-cases while adding the error reporting capabilities.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>The suffix <code>1</code> has been added to the <code>Parser</code> type and the functions in the following code because this is not the final form of the parser we are going to write. The final ones will not have suffixes.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Let me reiterate that the parser implementation in this series of posts is for illustrative and learning/teaching purposes only. So it’s okay to have some fun using an interesting technique. A production grade parser will certainly not use Zippers like the way we do.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>The <a href="https://web.archive.org/web/20200929/https://learnyouahaskell.com/zippers" target="_blank" rel="noopener"><em>Zippers</em></a> chapter from the <em>Learn you a Haskell</em> book is a great resource to learn about zippers in detail.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">JSON Parsing from Scratch in Haskell</a>
</li>
<li>
<strong>Error Reporting—Part 1</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">Error Reporting—Part 2</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2020-09-29T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/">previous post</a> we wrote a simple but correct JSON parser in Haskell. The parser was written very naively: if it failed, it returned nothing. You couldn’t tell what the failure was or where it happened. That’s OK for a toy parser but error reporting is an absolute must requirement for all good parsers. So in this post and next post, we’ll add simple but useful error reporting capability to our JSON parser.</p>
https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/ JSON Parsing from Scratch in Haskell 2020-05-04T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p><a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON</a> is probably the most used standard file format for storing and transmitting data on the Internet in recent times. Though it was historically derived from <a href="https://en.wikipedia.org/wiki/JavaScript" target="_blank" rel="noopener">JavaScript</a>, it is a programming language independent format and is now supported by almost all languages. JSON has a simple syntax specification with only four scalar data types and two composite data types. So, writing a parser for JSON is a great exercise for learning the basics of parsing. Let’s write one from scratch in Haskell.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<strong>JSON Parsing from Scratch in Haskell</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">Error Reporting—Part 1</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">Error Reporting—Part 2</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#json-syntax">JSON Syntax</a></li><li><a href="#parsing">Parsing</a></li><li><a href="#setup">Setup</a></li><li><a href="#json-data-type">JSON Data Type</a></li><li><a href="#json-generators">JSON Generators</a></li><li><a href="#parser">Parser</a></li><li><a href="#char-parser">Char Parser</a></li><li><a href="#digit-parser">Digit Parser</a></li><li><a href="#string-parser">String Parser</a></li><li><a href="#jnull-and-jbool-parsers">JNull and JBool Parsers</a></li><li><a href="#jstring-parser">JString Parser</a></li><li><a href="#jnumber-parser">JNumber Parser</a></li><li><a href="#jarray-parser">JArray Parser</a></li><li><a href="#jobject-parser">JObject Parser</a></li><li><a href="#json-parser">JSON Parser</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="json-parsing-from-scratch-in-haskell" id="introduction">Introduction</h2>
<p>JSON (JavaScript Object Notation) grew out of JavaScript as a way to exchange data between browsers and servers. <a href="https://en.wikipedia.org/wiki/Douglas_Crockford" target="_blank" rel="noopener">Douglas Crockford</a>, an American computer programmer and author of the popular book <a href="https://web.archive.org/web/20200504/https://www.amazon.com/dp/0596517742/wrrrldwideweb" target="_blank" rel="noopener"><em>JavaScript: The Good Parts</em></a>, wrote the first specification of the JSON format. After seeing wide adoption as a general data-exchange format, <a href="https://en.wikipedia.org/wiki/IETF" target="_blank" rel="noopener">IETF</a> standardized JSON as an <a href="https://en.wikipedia.org/wiki/Internet_Standard" target="_blank" rel="noopener">Internet Standard</a> with <a href="https://tools.ietf.org/html/rfc7159" target="_blank" rel="noopener">RFC 7159</a> and later with <a href="https://tools.ietf.org/html/rfc8259" target="_blank" rel="noopener">RFC 8259</a>. Now JSON is the lingua franca of the <a href="https://en.wikipedia.org/wiki/Web_Service" target="_blank" rel="noopener">Web Service</a> world, both <a href="https://en.wikipedia.org/wiki/Remote_procedure_call" target="_blank" rel="noopener">RPC</a> and <a href="https://en.wikipedia.org/wiki/REST" target="_blank" rel="noopener">REST</a> varieties. It has also become a commonly used configuration file format and database storage format. With such broad uses, it is no wonder that almost all programming languages support JSON in some form or another. As of the time of writing this post, the <a href="https://www.json.org/json-en.html" target="_blank" rel="noopener">json.org</a> website lists 167 JSON libraries across 60 languages.</p>
<p>Since JSON came out of JavaScript—a <a href="https://en.wikipedia.org/wiki/Dynamic_typing" target="_blank" rel="noopener">dynamically typed</a>, <a href="https://en.wikipedia.org/wiki/Prototype-based_programming" target="_blank" rel="noopener">prototype-based</a> <a href="https://en.wikipedia.org/wiki/Object-oriented_programming" target="_blank" rel="noopener">object-oriented</a> language with a <a href="https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Curly-bracket_languages" target="_blank" rel="noopener">curly bracket</a> syntax—it heavily borrows the data types and syntax from JavaScript. It has only four scalar data types:</p>
<ol type="1">
<li>Null: a <a href="https://en.wikipedia.org/wiki/Nullable_type" target="_blank" rel="noopener">null</a> value.</li>
<li>Boolean: a <a href="https://en.wikipedia.org/wiki/Boolean_datatype" target="_blank" rel="noopener">boolean</a> value.</li>
<li>String: a <a href="https://en.wikipedia.org/wiki/String_(computer_science)" target="_blank" rel="noopener">string</a> value, a sequence of zero or more <a href="https://en.wikipedia.org/wiki/Unicode" target="_blank" rel="noopener">Unicode</a> characters.</li>
<li>Number: a numeric value representing integral and real numbers with support for <a href="https://en.wikipedia.org/wiki/scientific_notation" target="_blank" rel="noopener">scientific notation</a>.</li>
</ol>
<p>Along with these four scalar data types, JSON supports only two composite data types:</p>
<ol type="1">
<li>Array: an ordered list of values.</li>
<li>Object: a collection of name-value pairs.</li>
</ol>
<h2 data-track-content data-content-name="json-syntax" data-content-piece="json-parsing-from-scratch-in-haskell" id="json-syntax">JSON Syntax</h2>
<p>Let’s see how these types are represented syntactically in JSON.</p>
<h3 id="null-and-boolean">Null and Boolean</h3>
<p>The Null value is represented simply by the exact string <code>null</code>. Boolean data is either truthy or falsey, represented by the exact strings <code>true</code> and <code>false</code> respectively.</p>
<h3 id="string">String</h3>
<p>A String in JSON is sequence of zero or more Unicode characters (except <a href="https://en.wikipedia.org/wiki/Control_characters" target="_blank" rel="noopener">Control characters</a>), wrapped in double quotes (<code>"</code>). Some special characters can be escaped using backslashes. Additionally, all characters can also be represented with their four hex-digit codes prefixed with <code>\u</code>. This transition diagram depicts the String syntax:</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 417 577'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 0.7227036395147314" data-src="/images/json-parsing-from-scratch-in-haskell/string.svg" alt="JSON String syntax"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/string.svg" class="w-100pct mw-60pct nolink" alt="JSON String syntax"></img></noscript>
<figcaption>JSON String syntax</figcaption>
</figure>
<h3 id="number">Number</h3>
<p>A Number in JSON is represented as a combination of an integral part, a fractional part and an exponent. All these parts are optional but they must follow some rules. For example, these numbers are invalid in JSON:</p>
<dl>
<dt>012</dt>
<dd>
integral part cannot start with 0
</dd>
<dt>1.</dt>
<dd>
fractional part cannot be empty
</dd>
<dt>.123</dt>
<dd>
integral part cannot be empty
</dd>
<dt>1.23e</dt>
<dd>
exponent part cannot be empty
</dd>
</dl>
<p>And these numbers are valid: 0, 1234, 1.23, 0.222, 1e5, 5E-45, 1.23e9, 1.77E-9.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 385 609'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 0.632183908045977" data-src="/images/json-parsing-from-scratch-in-haskell/number.svg" alt="JSON Number syntax"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/number.svg" class="w-100pct mw-60pct nolink" alt="JSON Number syntax"></img></noscript>
<figcaption>JSON Number syntax</figcaption>
</figure>
<h3 id="whitespace">Whitespace</h3>
<p>In JSON, whitespace is a string of zero or more valid whitespace characters which are space ( ), newline (<code>\n</code>), return (<code>\r</code>) and tab (<code>\t</code>).</p>
<h3 id="array">Array</h3>
<p>A JSON Array is an ordered list of zero or more JSON values separated by commas (<code>,</code>). An array begins with a left bracket (<code>[</code>) and ends with a right bracket (<code>]</code>) and may contain whitespace between them if empty.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 401 145'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 2.7655172413793103" data-src="/images/json-parsing-from-scratch-in-haskell/array.svg" alt="JSON Array syntax"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/array.svg" class="w-100pct mw-60pct nolink" alt="JSON Array syntax"></img></noscript>
<figcaption>JSON Array syntax</figcaption>
</figure>
<h3 id="object">Object</h3>
<p>A JSON Object is a collection of zero or more name-value pairs separated by commas (<code>,</code>). An object begins with a left brace (<code>{</code>) and ends with a right brace (<code>}</code>) and may contain whitespace between them if empty. Names and values are separated by colons (<code>:</code>) and are optionally surrounded by whitespace.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 401 241'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 1.6639004149377594" data-src="/images/json-parsing-from-scratch-in-haskell/object.svg" alt="JSON Object syntax"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/object.svg" class="w-100pct mw-60pct nolink" alt="JSON Object syntax"></img></noscript>
<figcaption>JSON Object syntax</figcaption>
</figure>
<h3 id="value">Value</h3>
<p>Finally, a JSON value is a string, or a number, or a boolean, or null, or an object, or an array, surrounded by optional whitespace. As you may have noticed, we referred to JSON values in the <a href="#array">Array</a> and <a href="#object">Object</a> sections above. Hence, the definition of the JSON syntax is recursive.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 417 337'%3E%3C/svg%3E" class="lazyload w-100pct mw-60pct nolink" style="--image-aspect-ratio: 1.2373887240356083" data-src="/images/json-parsing-from-scratch-in-haskell/value.svg" alt="JSON Value syntax"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/value.svg" class="w-100pct mw-60pct nolink" alt="JSON Value syntax"></img></noscript>
<figcaption>JSON Value syntax</figcaption>
</figure>
<h2 data-track-content data-content-name="parsing" data-content-piece="json-parsing-from-scratch-in-haskell" id="parsing">Parsing</h2>
<p><em>Parsing</em><a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> is the process of taking textual input data and converting it to a <a href="https://en.wikipedia.org/wiki/data_structure" target="_blank" rel="noopener">data structure</a>—often a hierarchal structure like a <a href="https://en.wikipedia.org/wiki/parse_tree" target="_blank" rel="noopener">parse tree</a>—while checking the input for correct syntax. Parsing is an important first step<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> in compilers and interpreters for programming languages to check and convert source text files into internal representations which are used by later steps in the processes. However, parsing is also used for other purposes like converting data from one format to other, for <a href="https://en.wikipedia.org/wiki/linting" target="_blank" rel="noopener">linting</a>, and for <a href="https://en.wikipedia.org/wiki/pretty-printing" target="_blank" rel="noopener">pretty-printing</a>. Our use-case here is converting textual JSON data into Haskell’s internal data structures.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 529 305'%3E%3C/svg%3E" class="lazyload w-100pct mw-80pct nolink" style="--image-aspect-ratio: 1.7344262295081967" data-src="/images/json-parsing-from-scratch-in-haskell/parse-tree.svg" alt="Parse tree for JSON data {"a": 1, "b": [false, null]}"></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/parse-tree.svg" class="w-100pct mw-80pct nolink" alt="Parse tree for JSON data {"a": 1, "b": [false, null]}"></img></noscript>
<figcaption>Parse tree for JSON data <code>{"a": 1, "b": [false, null]}</code></figcaption>
</figure>
<p>The syntax of a <em><a href="https://en.wikipedia.org/wiki/Formal_language" target="_blank" rel="noopener">Language</a></em> is defined by a set of rules. This set of rules is called the <em><a href="https://en.wikipedia.org/wiki/Formal_grammar" target="_blank" rel="noopener">Grammar</a></em> of the language. If an input does not adhere to a language’s grammar, it is considered incorrect. JSON’s grammar is a <a href="https://en.wikipedia.org/wiki/Deterministic_context-free_grammar" target="_blank" rel="noopener"><em>Deterministic context-free grammar</em></a> which is a subset of <a href="https://en.wikipedia.org/wiki/Context-free_grammar" target="_blank" rel="noopener"><em>Context-free grammar</em></a><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>. Many programming languages, markup languages and data-definition languages can be described by context-free grammars<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>. Being deterministic, JSON grammar will never allow multiple parse trees for the same input.</p>
<p>Parsing is a widely studied field and as such, multiple parsing algorithms have been invented over time. These algorithms differ in the kind of grammars they can parse and their performances. In general, there are three broad categories of parsing algorithms for context-free grammars:</p>
<ol type="1">
<li>Universal parsing algorithms like <a href="https://en.wikipedia.org/wiki/Earley_parser" target="_blank" rel="noopener">Earley’s algorithm</a> which can parse any grammar. However, these algorithms are generally too slow to be used in real-world settings.</li>
<li><a href="https://en.wikipedia.org/wiki/Top-down_parsing" target="_blank" rel="noopener">Top-down parsing</a> algorithms which start at the root level of parse tree and work down to leaf nodes.</li>
<li><a href="https://en.wikipedia.org/wiki/Bottom-up_parsing" target="_blank" rel="noopener">Bottom-up parsing</a> algorithms which start at the leaf nodes of parse tree and work up to the root node.</li>
</ol>
<p>Both top-down and bottom-up parsers are widely used. In this post, we will implement a <a href="https://en.wikipedia.org/wiki/Recursive_descent_parser" target="_blank" rel="noopener"><em>Recursive descent parser</em></a> which is a top-down parser which executes a set of mutually recursive functions to process its input.</p>
<p>Lastly, there are different ways of writing parsers. We can write the entire parser by hand: read the input character-by-character and call different functions depending on the characters read, parsing the input. Or we can use a <a href="https://en.wikipedia.org/wiki/Parser_generator" target="_blank" rel="noopener">Parser generator</a> program to generate the code for a parser by providing the language’s grammar.</p>
<p>Alternatively, we can use a <a href="https://en.wikipedia.org/wiki/Parser_combinator" target="_blank" rel="noopener"><em>Parser combinator</em></a> system. A parser combinator is a way of combining smaller parsers using higher-order functions to create larger parsers. Let’s say we start with a simple parser to parse one digit. We can then combining it with itself to create a parser to parse a natural number. In the same way, we can start with parsers for constituent parts of a language’s grammar and combine them by following the grammar to create a parser for the whole language. Haskell, with its support for higher-order functional programming, has many good parser combinator <a href="https://hackage.haskell.org/packages/#cat:Parsing" target="_blank" rel="noopener">libraries</a> but we are going to write one from scratch here<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>. Here we go!</p>
<h2 data-track-content data-content-name="setup" data-content-piece="json-parsing-from-scratch-in-haskell" id="setup">Setup</h2>
<p>We are going to write a simple but correct JSON parser from scratch in Haskell, as a Parser combinator. This parser will be for illustrative and learning/teaching purposes only and will not be for production usage<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. As such, we will not care about error handling and reporting, performance or ease of use. Our purpose here is to learn about some basics of parsing, nuances of the JSON syntax, and parser combinators and property-based testing in Haskell. We will use the <a href="https://tools.ietf.org/html/rfc8259" target="_blank" rel="noopener">RFC 8259</a> document as the reference for the JSON language specification.</p>
<p>To test our parser for correctness, we will use the <a href="https://hackage.haskell.org/package/QuickCheck" target="_blank" rel="noopener">QuickCheck</a> library. QuickCheck is a <em>Property-based Testing</em> framework. The key idea of property-based testing is to write properties of our code that hold true for any input and then, to automatically generate arbitrary inputs and make sure that that the properties are indeed true for them<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. Since we are writing a JSON parser—or rather, several of them for small parts of the JSON syntax—we will generate arbitrary textual data which are valid JSON and we will throw them at our parsers and assert that they work correctly. We will use GHCi, the interactive Haskell <a href="https://en.wikipedia.org/wiki/REPL" target="_blank" rel="noopener">REPL</a>, to run the tests. Since we will be writing the parser from scratch, we will not use any libraries other than the <a href="https://hackage.haskell.org/package/base" target="_blank" rel="noopener">base</a> library.</p>
<p>Let’s start by writing the required imports:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DeriveGeneric, TupleSections, LambdaCase #-}</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">JSONParser</span> <span class="kw">where</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Applicative</span> (<span class="dt">Alternative</span>(..), optional)</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Control.Monad</span> (replicateM)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Bits</span> (shiftL)</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Char</span> (isDigit, isHexDigit, isSpace, chr, ord, digitToInt)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Functor</span> (($>))</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.List</span> (intercalate)</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">GHC.Generics</span> (<span class="dt">Generic</span>)</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Numeric</span> (showHex)</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Test.QuickCheck</span> <span class="kw">hiding</span> (<span class="dt">Positive</span>, <span class="dt">Negative</span>)</span></code></pre></div>
<h2 data-track-content data-content-name="json-data-type" data-content-piece="json-parsing-from-scratch-in-haskell" id="json-data-type">JSON Data Type</h2>
<p>The data type for JSON in Haskell <code>JValue</code> directly reflects the JSON data types:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">JValue</span> <span class="ot">=</span> <span class="dt">JNull</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JBool</span> <span class="dt">Bool</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JString</span> <span class="dt">String</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JNumber</span> {<span class="ot"> int ::</span> <span class="dt">Integer</span>,<span class="ot"> frac ::</span> [<span class="dt">Int</span>],<span class="ot"> exponent ::</span> <span class="dt">Integer</span> }</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JArray</span> [<span class="dt">JValue</span>]</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">JObject</span> [(<span class="dt">String</span>, <span class="dt">JValue</span>)]</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Eq</span>, <span class="dt">Generic</span>)</span></code></pre></div>
<p>The JSON <code>null</code> type is represented by a singleton value type with the constructor <code>JNull</code> with no parameters. The JSON boolean type is just a wrapper constructor <code>JBool</code> over the Haskell <code>Bool</code> type. Similarly, the JSON string type is a wrapper over the Haskell <code>String</code> type. The JSON number type is represented as a collection of its integer, fraction and exponent parts. The integer and exponent parts are Haskell <code>Integer</code>s which are signed integers of unbounded size. Whereas the fraction part is a list of digits represented as Haskell <code>Int</code>s. A list of digits is needed here because the fraction part can have leading zeros.</p>
<p>The JSON array type is a wrapped Haskell list, with its elements being of any JSON data types. Likewise, the JSON object type is an <a href="https://en.wikipedia.org/wiki/Association_list" target="_blank" rel="noopener">Association list</a> of Haskell <code>String</code> and any JSON data type.</p>
<p>Let’s write a <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Show" target="_blank" rel="noopener"><code>Show</code></a> instance for the JSON type so that we can easily inspect its values:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> <span class="dt">JValue</span> <span class="kw">where</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> value <span class="ot">=</span> <span class="kw">case</span> value <span class="kw">of</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">JNull</span> <span class="ot">-></span> <span class="st">"null"</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">JBool</span> <span class="dt">True</span> <span class="ot">-></span> <span class="st">"true"</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">JBool</span> <span class="dt">False</span> <span class="ot">-></span> <span class="st">"false"</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">JString</span> s <span class="ot">-></span> showJSONString s</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">JNumber</span> s [] <span class="dv">0</span> <span class="ot">-></span> <span class="fu">show</span> s</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">JNumber</span> s f <span class="dv">0</span> <span class="ot">-></span> <span class="fu">show</span> s <span class="op">++</span> <span class="st">"."</span> <span class="op">++</span> <span class="fu">concatMap</span> <span class="fu">show</span> f</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">JNumber</span> s [] e <span class="ot">-></span> <span class="fu">show</span> s <span class="op">++</span> <span class="st">"e"</span> <span class="op">++</span> <span class="fu">show</span> e</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">JNumber</span> s f e <span class="ot">-></span> <span class="fu">show</span> s <span class="op">++</span> <span class="st">"."</span> <span class="op">++</span> <span class="fu">concatMap</span> <span class="fu">show</span> f <span class="op">++</span> <span class="st">"e"</span> <span class="op">++</span> <span class="fu">show</span> e</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">JArray</span> a <span class="ot">-></span> <span class="st">"["</span> <span class="op">++</span> intercalate <span class="st">", "</span> (<span class="fu">map</span> <span class="fu">show</span> a) <span class="op">++</span> <span class="st">"]"</span></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">JObject</span> o <span class="ot">-></span> <span class="st">"{"</span> <span class="op">++</span> intercalate <span class="st">", "</span> (<span class="fu">map</span> showKV o) <span class="op">++</span> <span class="st">"}"</span></span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-14"><a href="#cb3-14" aria-hidden="true" tabindex="-1"></a> showKV (k, v) <span class="ot">=</span> showJSONString k <span class="op">++</span> <span class="st">": "</span> <span class="op">++</span> <span class="fu">show</span> v</span>
<span id="cb3-15"><a href="#cb3-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-16"><a href="#cb3-16" aria-hidden="true" tabindex="-1"></a><span class="ot">showJSONString ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb3-17"><a href="#cb3-17" aria-hidden="true" tabindex="-1"></a>showJSONString s <span class="ot">=</span> <span class="st">"\""</span> <span class="op">++</span> <span class="fu">concatMap</span> showJSONChar s <span class="op">++</span> <span class="st">"\""</span></span>
<span id="cb3-18"><a href="#cb3-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-19"><a href="#cb3-19" aria-hidden="true" tabindex="-1"></a><span class="fu">isControl</span><span class="ot"> ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb3-20"><a href="#cb3-20" aria-hidden="true" tabindex="-1"></a><span class="fu">isControl</span> c <span class="ot">=</span> c <span class="ot">`elem`</span> [<span class="ch">'\0'</span> <span class="op">..</span> <span class="ch">'\31'</span>]</span>
<span id="cb3-21"><a href="#cb3-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-22"><a href="#cb3-22" aria-hidden="true" tabindex="-1"></a><span class="ot">showJSONChar ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb3-23"><a href="#cb3-23" aria-hidden="true" tabindex="-1"></a>showJSONChar c <span class="ot">=</span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb3-24"><a href="#cb3-24" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\''</span> <span class="ot">-></span> <span class="st">"'"</span></span>
<span id="cb3-25"><a href="#cb3-25" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\"'</span> <span class="ot">-></span> <span class="st">"\\\""</span></span>
<span id="cb3-26"><a href="#cb3-26" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\\'</span> <span class="ot">-></span> <span class="st">"\\\\"</span></span>
<span id="cb3-27"><a href="#cb3-27" aria-hidden="true" tabindex="-1"></a> <span class="ch">'/'</span> <span class="ot">-></span> <span class="st">"\\/"</span></span>
<span id="cb3-28"><a href="#cb3-28" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\b'</span> <span class="ot">-></span> <span class="st">"\\b"</span></span>
<span id="cb3-29"><a href="#cb3-29" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\f'</span> <span class="ot">-></span> <span class="st">"\\f"</span></span>
<span id="cb3-30"><a href="#cb3-30" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\n'</span> <span class="ot">-></span> <span class="st">"\\n"</span></span>
<span id="cb3-31"><a href="#cb3-31" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\r'</span> <span class="ot">-></span> <span class="st">"\\r"</span></span>
<span id="cb3-32"><a href="#cb3-32" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\t'</span> <span class="ot">-></span> <span class="st">"\\t"</span></span>
<span id="cb3-33"><a href="#cb3-33" aria-hidden="true" tabindex="-1"></a> _ <span class="op">|</span> <span class="fu">isControl</span> c <span class="ot">-></span> <span class="st">"\\u"</span> <span class="op">++</span> showJSONNonASCIIChar c</span>
<span id="cb3-34"><a href="#cb3-34" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> [c]</span>
<span id="cb3-35"><a href="#cb3-35" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-36"><a href="#cb3-36" aria-hidden="true" tabindex="-1"></a> showJSONNonASCIIChar c <span class="ot">=</span></span>
<span id="cb3-37"><a href="#cb3-37" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> a <span class="ot">=</span> <span class="st">"0000"</span> <span class="op">++</span> showHex (<span class="fu">ord</span> c) <span class="st">""</span> <span class="kw">in</span> <span class="fu">drop</span> (<span class="fu">length</span> a <span class="op">-</span> <span class="dv">4</span>) a</span></code></pre></div>
<p>We want this <code>Show</code> instance to show the JSON values as they appear in JSON text data. We do this so that we can reuse this instance to convert JSON values to text, to test our parsers later. Most of the cases are straightforward. For numbers, we handle the empty fraction and zero exponent cases separately, omitting those in the text form. JSON strings however, require some special handing for possible escape sequences and control characters. Note that we do not use <a href="https://hackage.haskell.org/package/base/docs/Data-Char.html#v:isControl" target="_blank" rel="noopener"><code>Data.Char.isControl</code></a> function here to detect the control characters, instead we write our own. This is because the JSON definition of control characters is different from the Haskell one. We show control characters as their four hex-digit representations prefixed by <code>\u</code><a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. Also note that JSON strings are shown with surrounding double-quotes (<code>"</code>).</p>
<p>A quick test in GHCi confirms that it works fine:</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> json <span class="ot">=</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">JObject</span> [(<span class="st">"a"</span>, <span class="dt">JNumber</span> <span class="dv">1</span> [] <span class="dv">0</span>), (<span class="st">"b"</span>, <span class="dt">JArray</span> [<span class="dt">JBool</span> <span class="dt">False</span>, <span class="dt">JString</span> <span class="st">"\20A"</span>])]</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">print</span> json</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>{"a": 1, "b": [false, "\u0014A"]}</span></code></pre></div>
<h2 data-track-content data-content-name="json-generators" data-content-piece="json-parsing-from-scratch-in-haskell" id="json-generators">JSON Generators</h2>
<p>As mentioned earlier, let’s write some QuickCheck generators to generate arbitrary JSON text data to use with property-based testing later. The plan is to generate arbitrary values of type <code>JValue</code> and convert them to text using the <code>Show</code> instance we wrote earlier. QuickCheck has the typeclass <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#t:Arbitrary" target="_blank" rel="noopener"><code>Arbitrary</code></a> for the types for which it can generate random values. We can implement this typeclass for the <code>JValue</code> type but the problem with that is, we can’t have different generators for JSON numbers and strings and other cases. So instead, we write functions to directly create generators for different JSON value types.</p>
<div class="note">
<p>You may skip this section and jump to the <a href="#parser">Parser</a> section if you wish. You can come back here and read it when we start implementing tests for our parsers.</p>
</div>
<h3 id="scalar-generators">Scalar Generators</h3>
<p>The <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#t:Gen" target="_blank" rel="noopener"><code>Gen</code></a> monad lets us write generators by combining the built-in generators. We use the existing generators of <code>Bool</code>, <code>Integer</code> and list types in QuickCheck to write the generators for <code>JNull</code>, <code>JBool</code> and <code>JNumber</code> values.</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jNullGen ::</span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>jNullGen <span class="ot">=</span> <span class="fu">pure</span> <span class="dt">JNull</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jBoolGen ::</span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>jBoolGen <span class="ot">=</span> <span class="dt">JBool</span> <span class="op"><$></span> arbitrary</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="ot">jNumberGen ::</span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>jNumberGen <span class="ot">=</span> <span class="dt">JNumber</span> <span class="op"><$></span> arbitrary <span class="op"><*></span> listOf (choose (<span class="dv">0</span>, <span class="dv">9</span>)) <span class="op"><*></span> arbitrary</span></code></pre></div>
<p>Here, the <a href="https://hackage.haskell.org/package/base/docs/Data-Functor.html#v:-60--36--62-" target="_blank" rel="noopener"><code><$></code></a> operator is the infix symbolic form of the <code>fmap</code> function, and the <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#v:-60--42--62-" target="_blank" rel="noopener"><code><*></code></a> operator is the applicative apply function.</p>
<p>The JSON string generator is bit more complicated because we need to generate strings with both unescaped and escaped characters.</p>
<div class="sourceCode" id="cb6" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jsonStringGen ::</span> <span class="dt">Gen</span> <span class="dt">String</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>jsonStringGen <span class="ot">=</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">concat</span> <span class="op"><$></span> listOf (oneof [ vectorOf <span class="dv">1</span> arbitraryUnicodeChar</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> , escapedUnicodeChar ])</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> escapedUnicodeChar <span class="ot">=</span> (<span class="st">"\\u"</span> <span class="op">++</span>) <span class="op"><$></span> vectorOf <span class="dv">4</span> (elements hexDigitLetters)</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> hexDigitLetters <span class="ot">=</span> [<span class="ch">'0'</span><span class="op">..</span><span class="ch">'9'</span>] <span class="op">++</span> [<span class="ch">'a'</span><span class="op">..</span><span class="ch">'f'</span>] <span class="op">++</span> [<span class="ch">'A'</span><span class="op">..</span><span class="ch">'F'</span>]</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a><span class="ot">jStringGen ::</span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>jStringGen <span class="ot">=</span> <span class="dt">JString</span> <span class="op"><$></span> jsonStringGen</span></code></pre></div>
<p>Let’s test them with the <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:generate" target="_blank" rel="noopener"><code>generate</code></a> function from QuickCheck in GHCi:</p>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate jNullGen</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>null</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate jBoolGen</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>true</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate jNumberGen</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>2.76e-2</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate jStringGen</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>"\\uB6dC\\u365E\\u5085\\uCF54\\u47d8\\u17CA\\u10Fd𣲙\\uE62a𡅪"</span></code></pre></div>
<p>Note that <code>jStringGen</code> may generate strings with any Unicode character so the generated string may not be renderable entirely on terminals or on browsers.</p>
<h3 id="composite-generators">Composite Generators</h3>
<p>The generators for composite values—Arrays and Object—take an <code>Int</code> parameter to control the size of the generated values. They invoke <code>jValueGen</code> which we are yet to define, to generate the component values recursively.</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jArrayGen ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>jArrayGen <span class="ot">=</span> <span class="fu">fmap</span> <span class="dt">JArray</span> <span class="op">.</span> scale (<span class="ot">`div`</span> <span class="dv">2</span>) <span class="op">.</span> listOf <span class="op">.</span> jValueGen <span class="op">.</span> (<span class="ot">`div`</span> <span class="dv">2</span>)</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jObjectGen ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>jObjectGen <span class="ot">=</span> <span class="fu">fmap</span> <span class="dt">JObject</span> <span class="op">.</span> scale (<span class="ot">`div`</span> <span class="dv">2</span>) <span class="op">.</span> listOf <span class="op">.</span> objKV <span class="op">.</span> (<span class="ot">`div`</span> <span class="dv">2</span>)</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> objKV n <span class="ot">=</span> (,) <span class="op"><$></span> jsonStringGen <span class="op"><*></span> jValueGen n</span></code></pre></div>
<p><code>`div` 2</code> used twice is to reduce the size of generated values which are otherwise too large and take a long time to generate. Trial in GHCi:</p>
<div class="sourceCode" id="cb9" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate <span class="op">$</span> jArrayGen <span class="dv">6</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>[true, "\\u8d78\\uC2C0", null, {}, null, ["\\uD28b", null, null, null], "\\uaC63\\u3Fec\\u55Fa\\uaB47\\uEea0\\u3BB5", false, null]</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate <span class="op">$</span> jObjectGen <span class="dv">6</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>{"\\uB2c5\\uB6f4\\udee6\\u3E6F𨔂\\u6037": [[[true]]], "\\uf57b\\ua499\\uE936": null, "\\u9D5a": -7.3310625010e-10}</span></code></pre></div>
<p>And finally, we have the generator for any <code>JValue</code>. It also takes a parameter to control the size. For small values of the parameter it tends towards generating more scalar values and does the opposite for larger values. It does so by calling the generators we defined earlier.</p>
<div class="sourceCode" id="cb10" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jValueGen ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Gen</span> <span class="dt">JValue</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>jValueGen n <span class="ot">=</span> <span class="kw">if</span> n <span class="op"><</span> <span class="dv">5</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> frequency [(<span class="dv">4</span>, oneof scalarGens), (<span class="dv">1</span>, oneof (compositeGens n))]</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> frequency [(<span class="dv">1</span>, oneof scalarGens), (<span class="dv">4</span>, oneof (compositeGens n))]</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> scalarGens <span class="ot">=</span> [jNullGen , jBoolGen , jNumberGen , jStringGen]</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> compositeGens n <span class="ot">=</span> [jArrayGen n, jObjectGen n]</span></code></pre></div>
<p>Quick trial again:</p>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate <span class="op">$</span> jValueGen <span class="dv">2</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>9.99546402304186496400162205e-13</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate <span class="op">$</span> jValueGen <span class="dv">6</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>{"\\u8F1D\\ua32E\\u8d8D𰜢𱽹\\u1b21": false, "\\u56dd\\uCEbb\\uED43": 13e3, "\\u0de3\\uFFB6颮\\ufb8A\\uFCBa\\u03fa": 5.546567497889e3, "\\u631e\\u9d95\\u2Bb8": {"\\u3a0B𧭯\\ue05E𱋫쀦": -5.397e-1, "\\u9BcD\\u3dbd": false, "": "\\uD65b"}, "\\u0BDb\\ufdEB\\u0749\\ucc92\\u9da3\\u9079\\uDCF1\\udcF3": null, "憝\\udB70\\u8E9a\\ud3a4": true, "\\ubF82\\uf8bD\\u29E0\\uC60A": "\\ub5D7\\u98Ea\\uec7E\\uB27A\\u6bb2\\uFc4C\\uB9cC\\uDEC9", "\\u2fde𛃀\\uF490\\uaC02": true}</span></code></pre></div>
<p>We use <code>jValueGen</code> to write the <code>Arbitrary</code> instance for <code>JValue</code>:</p>
<div class="sourceCode" id="cb12" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Arbitrary</span> <span class="dt">JValue</span> <span class="kw">where</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a> arbitrary <span class="ot">=</span> sized jValueGen</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> shrink <span class="ot">=</span> genericShrink</span></code></pre></div>
<p>The <a href="https://hackage.haskell.org/package/base/docs/GHC-Generics.html#t:Generic" target="_blank" rel="noopener"><code>Generic</code></a> instance derivation for <code>JValue</code> lets us use the <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:genericShrink" target="_blank" rel="noopener"><code>genericShrink</code></a> function from QuickCheck to automatically shrink test input on test failure.</p>
<p>And finally, one last missing piece:</p>
<h3 id="adding-whitespace">Adding Whitespace</h3>
<p>The JSON grammar allows whitespaces around many of its parts as depicted in the transition diagrams in the <a href="#json-syntax">JSON syntax</a> section. But our current implementation of the <code>Show</code> instance for <code>JValue</code> does not add any extra whitespace around anything. This is because the <code>show</code> function is pure, and hence cannot generate arbitrary amount of whitespaces. But the <code>Gen</code> monad can! So let’s write a function to “stringify” <code>JValue</code> with arbitrary whitespace:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jsonWhitespaceGen ::</span> <span class="dt">Gen</span> <span class="dt">String</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>jsonWhitespaceGen <span class="ot">=</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> scale (<span class="fu">round</span> <span class="op">.</span> <span class="fu">sqrt</span> <span class="op">.</span> <span class="fu">fromIntegral</span>)</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> listOf</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> elements</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> [<span class="ch">' '</span> , <span class="ch">'\n'</span> , <span class="ch">'\r'</span> , <span class="ch">'\t'</span>]</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a><span class="ot">stringify ::</span> <span class="dt">JValue</span> <span class="ot">-></span> <span class="dt">Gen</span> <span class="dt">String</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a>stringify <span class="ot">=</span> pad <span class="op">.</span> go</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> surround l r j <span class="ot">=</span> l <span class="op">++</span> j <span class="op">++</span> r</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> pad gen <span class="ot">=</span> surround <span class="op"><$></span> jsonWhitespaceGen <span class="op"><*></span> jsonWhitespaceGen <span class="op"><*></span> gen</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> commaSeparated <span class="ot">=</span> pad <span class="op">.</span> <span class="fu">pure</span> <span class="op">.</span> intercalate <span class="st">","</span></span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> go value <span class="ot">=</span> <span class="kw">case</span> value <span class="kw">of</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">JArray</span> elements <span class="ot">-></span></span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">mapM</span> (pad <span class="op">.</span> stringify) elements</span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">fmap</span> (surround <span class="st">"["</span> <span class="st">"]"</span>) <span class="op">.</span> commaSeparated</span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">JObject</span> kvs <span class="ot">-></span></span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">mapM</span> stringifyKV kvs <span class="op">>>=</span> <span class="fu">fmap</span> (surround <span class="st">"{"</span> <span class="st">"}"</span>) <span class="op">.</span> commaSeparated</span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">return</span> <span class="op">$</span> <span class="fu">show</span> value</span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a> stringifyKV (k, v) <span class="ot">=</span></span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a> surround <span class="op"><$></span> pad (<span class="fu">pure</span> <span class="op">$</span> showJSONString k) <span class="op"><*></span> stringify v <span class="op"><*></span> <span class="fu">pure</span> <span class="st">":"</span></span></code></pre></div>
<p><code>jsonWhitespaceGen</code> is a generator for valid JSON whitespace only strings. We use it in the <code>stringify</code> function to traverse over the <code>JValue</code> structure and recursively show parts of it and pad them with arbitrary whitespace.</p>
<p>With everything in place now, we can generate arbitrary JSON text in GHCi:</p>
<div class="sourceCode" id="cb14" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> generate <span class="op">$</span> jValueGen <span class="dv">6</span> <span class="op">>>=</span> stringify</span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>"\t\n\t\t\r[\t \tnull \t\n\t\r, \r \rtrue,\r\t \n\r\t\"\22981\93569\34480\873701\689193\476209\\\\ubacc\\\\u794A\\\\u1C30\"\t \n ,\r \n \t [ \n\n\n \t\t\n\t\ntrue\n\r\t\n\n\n\n,\t\r\t\r \r{\r\r\n \t \n\"\\\\u8a1F\\\\uCcDc\895076\"\r\r \r:\r\t\t\r\tfalse \n\t\r , \"\248461\"\r: \t\t\n {\r\n} \t\r\t\t},\t \n \t\t\r\t{\"\"\t:\r 3e-3\t,\n\"\\\\u5F81\\\\uc031\"\t\n:\"\803844\"\t\t\t ,\"\\\\u29b1\"\n:\r null\t\r\t\r\t \t\t\t\t}\r\n ,\n\t \rtrue\r\t \t,\n \t\t\r{\r\"\\\\u2fA6\759074\"\r\t\t:\t\n[\n \r\r\rnull]\n\t,\n\r\n\n \"\\\\uEee3\\\\u5Dab\61593\" : \n\tnull\n\n \t\r\n\r\r}\n \n \r\r\r,\r \r \n\"\951294\\\\u9dd3\\\\u0B39\"\t\n\t,\n\n \t\n \nfalse\r\r\n\r\n \t \r]\n \n\t\r, \r\t \n\t \"\16324\\\\uf6DE\733261\\\\u8b38\\\\ueBa2\382636\474586\\\\uCDDc\\\\u49ee\\\\ua989\"\n ,\n\t\r\rnull \r\r\n\n\t ]\r\r"</span></code></pre></div>
<p>You can go over the output and verify that it indeed is a valid JSON text data.</p>
<h2 data-track-content data-content-name="parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="parser">Parser</h2>
<p>With the generators set up, let’s write the parsers now. So what exactly is a <em>Parser</em>? A parser takes some input, reads some part of it and maybe “parses” it into some relevant data structure. And it leaves the rest of the input to be potentially parsed later. That sounds like a function! Let’s write it down:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser</span> i o <span class="ot">=</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parser</span> {<span class="ot"> runParser ::</span> i <span class="ot">-></span> <span class="dt">Maybe</span> (i, o) }</span></code></pre></div>
<p>As per our definition, a parser is just a wrapper over the function type <code>i -> Maybe (i, o)</code><a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>. If a parser succeeds in parsing then it returns the rest of the input and the output it parsed the input to, else it returns nothing. This definition is simple but it will do for our purpose.</p>
<p>Let’s write our first parser to illustrate this type.</p>
<h2 data-track-content data-content-name="char-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="char-parser">Char Parser</h2>
<p>We are starting simple. We are going to write a parser to match the first character of the input with a given character.</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">char1 ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>char1 c <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> (x<span class="op">:</span>xs) <span class="op">|</span> x <span class="op">==</span> c <span class="ot">-></span> <span class="dt">Just</span> (xs, x)</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span></code></pre></div>
<p><code>char1</code> parser matches the given character with the input string and succeeds only if the input starts with the given character<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. It returns the rest of the input and the matched character on success. Let’s exercise this on GHCi:</p>
<div class="sourceCode" id="cb18" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (char1 <span class="ch">'a'</span>) <span class="st">"abhinav"</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>Just ("bhinav",'a')</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (char1 <span class="ch">'a'</span>) <span class="st">"sarkar"</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>Great! We just wrote and ran our first parser. We can generalized this parser by extracting the predicate satisfaction clause out:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">satisfy ::</span> (a <span class="ot">-></span> <span class="dt">Bool</span>) <span class="ot">-></span> <span class="dt">Parser</span> [a] a</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>satisfy predicate <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> (x<span class="op">:</span>xs) <span class="op">|</span> predicate x <span class="ot">-></span> <span class="dt">Just</span> (xs, x)</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a><span class="ot">char ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>char c <span class="ot">=</span> satisfy (<span class="op">==</span> c)</span></code></pre></div>
<h2 data-track-content data-content-name="digit-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="digit-parser">Digit Parser</h2>
<p>Moving on, let’s write a parser to parse a digit:</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">digit1 ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>digit1 <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="kw">case</span> runParser (satisfy <span class="fu">isDigit</span>) i <span class="kw">of</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (i', o) <span class="ot">-></span> <span class="dt">Just</span> (i', <span class="fu">digitToInt</span> o)</span></code></pre></div>
<p>We simply use the <code>satisfy</code> parser with the <a href="https://hackage.haskell.org/package/base/docs/Data-Char.html#v:isDigit" target="_blank" rel="noopener"><code>isDigit</code></a> function to parse a character which is a digit (0–9) and then run the <a href="https://hackage.haskell.org/package/base/docs/Data-Char.html#v:digitToInt" target="_blank" rel="noopener"><code>digitToInt</code></a> function on the output character to convert it to an <code>Int</code>. Trial run:</p>
<div class="sourceCode" id="cb21" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser digit1 <span class="st">"123"</span></span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>Just ("23",1)</span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser digit1 <span class="st">"abc"</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>However, we can do some refactoring:</p>
<div class="sourceCode" id="cb22" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">digit2 ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a>digit2 <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="kw">case</span> runParser (satisfy <span class="fu">isDigit</span>) i <span class="kw">of</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (i', o) <span class="ot">-></span> <span class="dt">Just</span> <span class="op">.</span> <span class="fu">fmap</span> <span class="fu">digitToInt</span> <span class="op">$</span> (i', o)</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="ot">digit3 ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>digit3 <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="fu">fmap</span> (<span class="fu">fmap</span> <span class="fu">digitToInt</span>) <span class="op">.</span> runParser (satisfy <span class="fu">isDigit</span>) <span class="op">$</span> i</span></code></pre></div>
<p>Hmm, it is staring to look like …</p>
<h3 id="parser-is-a-functor"><code>Parser</code> is a <code>Functor</code></h3>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Functor</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">fmap</span> f parser <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">fmap</span> (<span class="fu">fmap</span> f) <span class="op">.</span> runParser parser</span></code></pre></div>
<p>Now we can rewrite the digit parser as:</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">digit ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>digit <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> satisfy <span class="fu">isDigit</span></span></code></pre></div>
<h2 data-track-content data-content-name="string-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="string-parser">String Parser</h2>
<p>Let’s write a parser to parse out a given string from the input:</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">string1 ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>string1 s <span class="ot">=</span> <span class="kw">case</span> s <span class="kw">of</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> <span class="st">""</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="dt">Just</span> (i, <span class="st">""</span>)</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> (c<span class="op">:</span>cs) <span class="ot">-></span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="kw">case</span> runParser (char c) i <span class="kw">of</span></span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (rest, _) <span class="ot">-></span> <span class="kw">case</span> runParser (string1 cs) rest <span class="kw">of</span></span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (rest', _) <span class="ot">-></span> <span class="dt">Just</span> (rest', c<span class="op">:</span>cs)</span></code></pre></div>
<p>The <code>string1</code> parser is written recursively. As the base case, if the given string is empty, we simply return the input and an empty string as the result. Otherwise, we match the first character of the given string with the input by parsing it with the <code>char</code> parser. If it fails, the <code>string1</code> parser fails. Else, we recursively run the <code>string1</code> parser with the rest of the given string against the rest of the input. If the parsing succeeds, we cons the first parsed character with the rest of the parsed characters. Trying in GHCi:</p>
<div class="sourceCode" id="cb26" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (string1 <span class="st">"hello"</span>) <span class="st">"hello world"</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a>Just (" world","hello")</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser (string1 <span class="st">"hello"</span>) <span class="st">"help world"</span></span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>Let’s refactor this a bit:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">string2 ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>string2 s <span class="ot">=</span> <span class="kw">case</span> s <span class="kw">of</span></span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a> <span class="st">""</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">pure</span> <span class="op">.</span> (, <span class="st">""</span>)</span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a> (c<span class="op">:</span>cs) <span class="ot">-></span> <span class="dt">Parser</span> <span class="op">$</span> \i <span class="ot">-></span> <span class="kw">case</span> runParser (char c) i <span class="kw">of</span></span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (rest, c) <span class="ot">-></span> <span class="fu">fmap</span> (c<span class="op">:</span>) <span class="op"><$></span> runParser (string2 cs) rest</span></code></pre></div>
<p>If you squint a little bit, what do you think that looks like? Yes, you are right …</p>
<h3 id="parser-is-an-applicative"><code>Parser</code> is an <code>Applicative</code></h3>
<div class="sourceCode" id="cb28" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Applicative</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">pure</span> x <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">pure</span> <span class="op">.</span> (, x)</span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a> pf <span class="op"><*></span> po <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser pf input <span class="kw">of</span></span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (rest, f) <span class="ot">-></span> <span class="fu">fmap</span> f <span class="op"><$></span> runParser po rest</span></code></pre></div>
<p>Take a minute to read and digest this. With the <code>Applicative</code> instance, we can now rewrite the <code>string1</code> parser as:</p>
<div class="sourceCode" id="cb29" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">string ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>string <span class="st">""</span> <span class="ot">=</span> <span class="fu">pure</span> <span class="st">""</span></span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a>string (c<span class="op">:</span>cs) <span class="ot">=</span> (<span class="op">:</span>) <span class="op"><$></span> char c <span class="op"><*></span> string cs</span></code></pre></div>
<p>The <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Functor" target="_blank" rel="noopener"><code>Functor</code></a> and <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Applicative" target="_blank" rel="noopener"><code>Applicative</code></a> instances for <code>Parser</code> make it really powerful. With the <code>Functor</code> instance, we can lift pure functions to do operations on parsers. With the <code>Applicative</code> instance, we can combine multiple parsers together with <code>Applicative</code> functions (like <code><*></code>) to create new parsers. Now we are ready to write our first JSON parsers.</p>
<h2 data-track-content data-content-name="jnull-and-jbool-parsers" data-content-piece="json-parsing-from-scratch-in-haskell" id="jnull-and-jbool-parsers">JNull and JBool Parsers</h2>
<p>The parser for <code>JNull</code> is merely a <code>string</code> parser for the string <code>null</code>:</p>
<div class="sourceCode" id="cb30" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jNull ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a>jNull <span class="ot">=</span> string <span class="st">"null"</span> <span class="op">$></span> <span class="dt">JNull</span></span></code></pre></div>
<p>We use the <a href="https://hackage.haskell.org/package/base/docs/Data-Functor.html#v:-36--62-" target="_blank" rel="noopener"><code>$></code></a> operator to discard the parsed string and return the <code>JNull</code> value.</p>
<p>The parser for JSON boolean values needs to parse for the string <code>true</code>, falling back on parsing for the string <code>false</code> if failed. This is called <em><a href="https://en.wikipedia.org/wiki/Backtracking" target="_blank" rel="noopener">Backtracking</a></em> in parsing parlance. To achieve this easily in Haskell, we have to make it so that …</p>
<h3 id="parser-is-an-alternative">Parser is an <code>Alternative</code></h3>
<div class="sourceCode" id="cb31" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Alternative</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a> empty <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> <span class="fu">const</span> empty</span>
<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a> p1 <span class="op"><|></span> p2 <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> runParser p1 input <span class="op"><|></span> runParser p2 input</span></code></pre></div>
<p>The <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#t:Alternative" target="_blank" rel="noopener"><code>Alternative</code></a> typeclass does exactly what is sounds like. The <code><|></code> function lets you choose a different alternative if the first option fails, hence allowing backtracking. With this, we can write the <code>JBool</code> parser simply as:</p>
<div class="sourceCode" id="cb32" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jBool ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a>jBool <span class="ot">=</span> string <span class="st">"true"</span> <span class="op">$></span> <span class="dt">JBool</span> <span class="dt">True</span></span>
<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"false"</span> <span class="op">$></span> <span class="dt">JBool</span> <span class="dt">False</span></span></code></pre></div>
<p>Over to GHCi:</p>
<div class="sourceCode" id="cb33" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNull <span class="st">"null"</span></span>
<span id="cb33-2"><a href="#cb33-2" aria-hidden="true" tabindex="-1"></a>Just ("",null)</span>
<span id="cb33-3"><a href="#cb33-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNull <span class="st">"dull"</span></span>
<span id="cb33-4"><a href="#cb33-4" aria-hidden="true" tabindex="-1"></a>Nothing</span>
<span id="cb33-5"><a href="#cb33-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jBool <span class="st">"true"</span></span>
<span id="cb33-6"><a href="#cb33-6" aria-hidden="true" tabindex="-1"></a>Just ("",true)</span>
<span id="cb33-7"><a href="#cb33-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jBool <span class="st">"false"</span></span>
<span id="cb33-8"><a href="#cb33-8" aria-hidden="true" tabindex="-1"></a>Just ("",false)</span>
<span id="cb33-9"><a href="#cb33-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jBool <span class="st">"truth"</span></span>
<span id="cb33-10"><a href="#cb33-10" aria-hidden="true" tabindex="-1"></a>Nothing</span>
<span id="cb33-11"><a href="#cb33-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jBool <span class="st">"falsities"</span></span>
<span id="cb33-12"><a href="#cb33-12" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>These two parsers were pretty simple. The next one is going to be a tad more complicated.</p>
<h2 data-track-content data-content-name="jstring-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="jstring-parser">JString Parser</h2>
<p>Before writing the JSON string parser, we need a parser to parse JSON characters.</p>
<p>As explained in the <a href="#string">String part</a> of the JSON syntax section, the JSON spec allows characters in JSON strings to escaped with some special sequences or with a <code>\u</code> prefix and characters’ hex-digit codes. Also, JSON control characters cannot be written directly in JSON strings. So we write the JSON character parser as a combination of all these alternatives:</p>
<div class="sourceCode" id="cb34" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jsonChar ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a>jsonChar <span class="ot">=</span> string <span class="st">"\\\""</span> <span class="op">$></span> <span class="ch">'"'</span></span>
<span id="cb34-3"><a href="#cb34-3" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\\\"</span> <span class="op">$></span> <span class="ch">'\\'</span></span>
<span id="cb34-4"><a href="#cb34-4" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\/"</span> <span class="op">$></span> <span class="ch">'/'</span></span>
<span id="cb34-5"><a href="#cb34-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\b"</span> <span class="op">$></span> <span class="ch">'\b'</span></span>
<span id="cb34-6"><a href="#cb34-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\f"</span> <span class="op">$></span> <span class="ch">'\f'</span></span>
<span id="cb34-7"><a href="#cb34-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\n"</span> <span class="op">$></span> <span class="ch">'\n'</span></span>
<span id="cb34-8"><a href="#cb34-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\r"</span> <span class="op">$></span> <span class="ch">'\r'</span></span>
<span id="cb34-9"><a href="#cb34-9" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> string <span class="st">"\\t"</span> <span class="op">$></span> <span class="ch">'\t'</span></span>
<span id="cb34-10"><a href="#cb34-10" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> unicodeChar</span>
<span id="cb34-11"><a href="#cb34-11" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> satisfy (\c <span class="ot">-></span> <span class="fu">not</span> (c <span class="op">==</span> <span class="ch">'\"'</span> <span class="op">||</span> c <span class="op">==</span> <span class="ch">'\\'</span> <span class="op">||</span> <span class="fu">isControl</span> c))</span>
<span id="cb34-12"><a href="#cb34-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb34-13"><a href="#cb34-13" aria-hidden="true" tabindex="-1"></a> unicodeChar <span class="ot">=</span></span>
<span id="cb34-14"><a href="#cb34-14" aria-hidden="true" tabindex="-1"></a> <span class="fu">chr</span> <span class="op">.</span> <span class="fu">fromIntegral</span> <span class="op">.</span> digitsToNumber <span class="dv">16</span> <span class="dv">0</span></span>
<span id="cb34-15"><a href="#cb34-15" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> (string <span class="st">"\\u"</span> <span class="op">*></span> replicateM <span class="dv">4</span> hexDigit)</span>
<span id="cb34-16"><a href="#cb34-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb34-17"><a href="#cb34-17" aria-hidden="true" tabindex="-1"></a> hexDigit <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> satisfy <span class="fu">isHexDigit</span></span>
<span id="cb34-18"><a href="#cb34-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb34-19"><a href="#cb34-19" aria-hidden="true" tabindex="-1"></a><span class="ot">digitsToNumber ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Integer</span> <span class="ot">-></span> [<span class="dt">Int</span>] <span class="ot">-></span> <span class="dt">Integer</span></span>
<span id="cb34-20"><a href="#cb34-20" aria-hidden="true" tabindex="-1"></a>digitsToNumber base <span class="ot">=</span></span>
<span id="cb34-21"><a href="#cb34-21" aria-hidden="true" tabindex="-1"></a> <span class="fu">foldl</span> (\num d <span class="ot">-></span> num <span class="op">*</span> <span class="fu">fromIntegral</span> base <span class="op">+</span> <span class="fu">fromIntegral</span> d)</span></code></pre></div>
<p>Note that the order of the alternative clauses is important here. The most eager clause is the last one.</p>
<p>The <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#v:-42--62-" target="_blank" rel="noopener"><code>*></code></a> function from the <code>Applicative</code> typeclass lets us run the parser on its left, discard the parsed value on success and run the parser on its right. <code>replicateM</code> runs the given parser <code>n</code> times, gathering the results in a list. The <code>digitsToNumber</code> function takes a list of digits as <code>Int</code>s and combines them to create a number with the given base. We use these functions to write the <code>unicodeChar</code> parser which parses the <code>\u</code> prefix character representations.</p>
<div id="jstring-unicode">
<p>Now that we have the JSON character parser, it should be really easy to parse a JSON string, right? After all, a string is just a list of characters. Wrong! Quoting from the <a href="https://tools.ietf.org/html/rfc8259#section-7" target="_blank" rel="noopener">String section</a> of RFC 8259:</p>
</div>
<blockquote>
<p>To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a 12-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
<code>"\\uD834\\uDD1E"</code>.</p>
</blockquote>
<p>Now it’s time for short detour to the Unicode Land.</p>
<h3 id="unicode-planes-and-surrogate-characters">Unicode Planes and Surrogate Characters</h3>
<p>In the Unicode standard, a plane is a contiguous group of 2<sup>16</sup> <a href="https://en.wikipedia.org/wiki/code_point" target="_blank" rel="noopener">code points</a>. The first of these planes which covers most of the commonly used characters is called the <a href="https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane" target="_blank" rel="noopener">Basic Multilingual Plane</a> (BMP).</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 750 500'%3E%3C/svg%3E" class="lazyload nolink full-width" style="--image-aspect-ratio: 1.5" data-src="/images/json-parsing-from-scratch-in-haskell/Roadmap_to_Unicode_BMP.svg" alt="The map of the Basic Multilingual Plane. From Wikipedia."></img>
<noscript><img src="/images/json-parsing-from-scratch-in-haskell/Roadmap_to_Unicode_BMP.svg" class="nolink full-width" alt="The map of the Basic Multilingual Plane. From Wikipedia."></img></noscript>
<figcaption>The map of the Basic Multilingual Plane. From <a href="https://en.wikipedia.org/wiki/File:Roadmap_to_Unicode_BMP.svg" target="_blank" rel="noopener">Wikipedia</a>.</figcaption>
</figure>
<p>The characters which are not in the BMP can be encoded into it using the code points from the High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) blocks of the BMP. A pair of a High Surrogate and a Low Surrogate code points can be used to encode a non-BMP character. A lone surrogate code point cannot be a valid character. <a href="https://en.wikipedia.org/wiki/Clef#G-clefs" target="_blank" rel="noopener">G clef</a> 𝄞 residing in the <a href="https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Supplementary_Multilingual_Plane" target="_blank" rel="noopener">Plane 1</a>, is an example character with code point U+1D11E and surrogate representation (U+D834, U+DD1E).</p>
<p>So, to parse a JSON string, we need to work by character pairs and not just one character at a time. Our current abstractions of <code>Functor</code> and <code>Applicative</code> are not powerful enough for this because they work with only one element at a time. We need something more powerful. We need to learn that …</p>
<h3 id="parser-is-a-monad">Parser is a <code>Monad</code></h3>
<div class="sourceCode" id="cb35" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Monad</span> (<span class="dt">Parser</span> i) <span class="kw">where</span></span>
<span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a> p <span class="op">>>=</span> f <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \input <span class="ot">-></span> <span class="kw">case</span> runParser p input <span class="kw">of</span></span>
<span id="cb35-3"><a href="#cb35-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb35-4"><a href="#cb35-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (rest, o) <span class="ot">-></span> runParser (f o) rest</span></code></pre></div>
<p>The <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Monad" target="_blank" rel="noopener"><code>Monad</code></a> typeclass lets us sequence operations in a context so that the second operation can depend on the result of the first operation. Let’s use it to write the JSON string parser:</p>
<div class="sourceCode" id="cb36" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jString ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a>jString <span class="ot">=</span> <span class="dt">JString</span> <span class="op"><$></span> (char <span class="ch">'"'</span> <span class="op">*></span> jString') <span class="co">-- 1</span></span>
<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb36-4"><a href="#cb36-4" aria-hidden="true" tabindex="-1"></a> jString' <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb36-5"><a href="#cb36-5" aria-hidden="true" tabindex="-1"></a> optFirst <span class="ot"><-</span> optional jsonChar <span class="co">-- 2</span></span>
<span id="cb36-6"><a href="#cb36-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> optFirst <span class="kw">of</span></span>
<span id="cb36-7"><a href="#cb36-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="st">""</span> <span class="op"><$</span> char <span class="ch">'"'</span> <span class="co">-- 3</span></span>
<span id="cb36-8"><a href="#cb36-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> first <span class="op">|</span> <span class="fu">not</span> (isSurrogate first) <span class="ot">-></span> <span class="co">-- 4</span></span>
<span id="cb36-9"><a href="#cb36-9" aria-hidden="true" tabindex="-1"></a> (first<span class="op">:</span>) <span class="op"><$></span> jString' <span class="co">-- 5</span></span>
<span id="cb36-10"><a href="#cb36-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> first <span class="ot">-></span> <span class="kw">do</span> <span class="co">-- 6</span></span>
<span id="cb36-11"><a href="#cb36-11" aria-hidden="true" tabindex="-1"></a> second <span class="ot"><-</span> jsonChar <span class="co">-- 7</span></span>
<span id="cb36-12"><a href="#cb36-12" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> isHighSurrogate first <span class="op">&&</span> isLowSurrogate second <span class="co">-- 8</span></span>
<span id="cb36-13"><a href="#cb36-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> (combineSurrogates first second <span class="op">:</span>) <span class="op"><$></span> jString' <span class="co">-- 9</span></span>
<span id="cb36-14"><a href="#cb36-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> empty <span class="co">-- 10</span></span></code></pre></div>
<p>This code is quite dense so let’s look at it line-by-line. Match the number cues in the code comments with the step numbers below:</p>
<ol type="1">
<li>Parse the starting double-quote (<code>"</code>) and run the rest of the string through the ancillary parser <code>jString'</code>. Also wrap the returned result with the <code>JString</code> constructor at the end.</li>
<li>Parse and get the optional first character using the <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#v:optional" target="_blank" rel="noopener"><code>optional</code></a> function.</li>
<li>If there is no first character, the input is empty. Try to match the ending double-quote (<code>"</code>) and return an empty string as output<a href="#fn11" class="footnote-ref" id="fnref11" role="doc-noteref"><sup>11</sup></a>.</li>
<li>If there is a first character and it is not a surrogate then:</li>
<li>Run the <code>jString'</code> parser recursively on the rest of the input and return this character consed with the rest of the output just as it was done in the <a href="#string-parser">String parser</a>.</li>
<li>Else, that is, if the first character <strong>is</strong> a surrogate then:</li>
<li>Parse and get the second character. Note that this is not an optional operation like step 2 because there can be no lone surrogates.</li>
<li>If the first character is a High Surrogate and the second character is a Low Surrogate, that is, if we have a valid surrogate pair:</li>
<li>Combine the two surrogates, parse the rest of the string with the <code>jString'</code> parser recursively, cons the combined character with the rest of the output and return it.</li>
<li>Else fail because the surrogate pair is invalid.</li>
</ol>
<p>In summary, we read two characters from the input instead of one and see if we can find a valid surrogate pair.</p>
<p>The helper functions are:</p>
<div class="sourceCode" id="cb37" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a>highSurrogateLowerBound,<span class="ot"> highSurrogateUpperBound ::</span> <span class="dt">Int</span></span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a>highSurrogateLowerBound <span class="ot">=</span> <span class="bn">0xD800</span></span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a>highSurrogateUpperBound <span class="ot">=</span> <span class="bn">0xDBFF</span></span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-5"><a href="#cb37-5" aria-hidden="true" tabindex="-1"></a>lowSurrogateLowerBound,<span class="ot"> lowSurrogateUpperBound ::</span> <span class="dt">Int</span></span>
<span id="cb37-6"><a href="#cb37-6" aria-hidden="true" tabindex="-1"></a>lowSurrogateLowerBound <span class="ot">=</span> <span class="bn">0xDC00</span></span>
<span id="cb37-7"><a href="#cb37-7" aria-hidden="true" tabindex="-1"></a>lowSurrogateUpperBound <span class="ot">=</span> <span class="bn">0xDFFF</span></span>
<span id="cb37-8"><a href="#cb37-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-9"><a href="#cb37-9" aria-hidden="true" tabindex="-1"></a>isHighSurrogate, isLowSurrogate,<span class="ot"> isSurrogate ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb37-10"><a href="#cb37-10" aria-hidden="true" tabindex="-1"></a>isHighSurrogate a <span class="ot">=</span></span>
<span id="cb37-11"><a href="#cb37-11" aria-hidden="true" tabindex="-1"></a> <span class="fu">ord</span> a <span class="op">>=</span> highSurrogateLowerBound <span class="op">&&</span> <span class="fu">ord</span> a <span class="op"><=</span> highSurrogateUpperBound</span>
<span id="cb37-12"><a href="#cb37-12" aria-hidden="true" tabindex="-1"></a>isLowSurrogate a <span class="ot">=</span></span>
<span id="cb37-13"><a href="#cb37-13" aria-hidden="true" tabindex="-1"></a> <span class="fu">ord</span> a <span class="op">>=</span> lowSurrogateLowerBound <span class="op">&&</span> <span class="fu">ord</span> a <span class="op"><=</span> lowSurrogateUpperBound</span>
<span id="cb37-14"><a href="#cb37-14" aria-hidden="true" tabindex="-1"></a>isSurrogate a <span class="ot">=</span> isHighSurrogate a <span class="op">||</span> isLowSurrogate a</span>
<span id="cb37-15"><a href="#cb37-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-16"><a href="#cb37-16" aria-hidden="true" tabindex="-1"></a><span class="ot">combineSurrogates ::</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Char</span></span>
<span id="cb37-17"><a href="#cb37-17" aria-hidden="true" tabindex="-1"></a>combineSurrogates a b <span class="ot">=</span> <span class="fu">chr</span> <span class="op">$</span></span>
<span id="cb37-18"><a href="#cb37-18" aria-hidden="true" tabindex="-1"></a> ((<span class="fu">ord</span> a <span class="op">-</span> highSurrogateLowerBound) <span class="ot">`shiftL`</span> <span class="dv">10</span>)</span>
<span id="cb37-19"><a href="#cb37-19" aria-hidden="true" tabindex="-1"></a> <span class="op">+</span> (<span class="fu">ord</span> b <span class="op">-</span> lowSurrogateLowerBound) <span class="op">+</span> <span class="bn">0x10000</span></span></code></pre></div>
<p>The <code>do</code> syntax is a syntactic-sugar on top of the monadic bind operation <code>>>=</code> which allows us to sequence monadic operations. That’s how we are able to read the first character and choose to do different things depending on whether it is a surrogate or not<a href="#fn12" class="footnote-ref" id="fnref12" role="doc-noteref"><sup>12</sup></a>. This cannot be done without the <code>Monad</code> instance of <code>Parser</code>.</p>
<p>Let’s give <code>jString</code> a try in GHCi:</p>
<div class="sourceCode" id="cb39" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"abhinav\""</span></span>
<span id="cb39-2"><a href="#cb39-2" aria-hidden="true" tabindex="-1"></a>Just ("","abhinav")</span>
<span id="cb39-3"><a href="#cb39-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\u1234\""</span></span>
<span id="cb39-4"><a href="#cb39-4" aria-hidden="true" tabindex="-1"></a>Just ("","ሴ")</span>
<span id="cb39-5"><a href="#cb39-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\\uDD1E\""</span></span>
<span id="cb39-6"><a href="#cb39-6" aria-hidden="true" tabindex="-1"></a>Just ("","𝄞")</span>
<span id="cb39-7"><a href="#cb39-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\""</span> <span class="co">-- lone surrogate is invalid</span></span>
<span id="cb39-8"><a href="#cb39-8" aria-hidden="true" tabindex="-1"></a>Nothing</span>
<span id="cb39-9"><a href="#cb39-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jString <span class="st">"\"\\uD834\\uE000\""</span> <span class="co">-- \uEOOO is not a surrogate</span></span>
<span id="cb39-10"><a href="#cb39-10" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>It seems to work but we can’t be sure yet. Let’s write our first QuickCheck property to test it throughly:</p>
<div class="sourceCode" id="cb40" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_genParseJString ::</span> <span class="dt">Property</span></span>
<span id="cb40-2"><a href="#cb40-2" aria-hidden="true" tabindex="-1"></a>prop_genParseJString <span class="ot">=</span></span>
<span id="cb40-3"><a href="#cb40-3" aria-hidden="true" tabindex="-1"></a> forAllShrink jStringGen shrink <span class="op">$</span> \js <span class="ot">-></span></span>
<span id="cb40-4"><a href="#cb40-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser jString (<span class="fu">show</span> js) <span class="kw">of</span></span>
<span id="cb40-5"><a href="#cb40-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb40-6"><a href="#cb40-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (_, o) <span class="ot">-></span> o <span class="op">==</span> js</span></code></pre></div>
<p>We use the JSON string generator <code>jStringGen</code> which we wrote in the <a href="#scalar-generators">Scalar Generators</a> section to generate arbitrary JSON strings. Then we parse them using the <code>jString</code> parser and equate the parsed result with the generated value for confirming that the parser works. The <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:forAllShrink" target="_blank" rel="noopener"><code>forAllShrink</code></a> function from QuickCheck takes care of input generation and input shrinking in case of failures, automatically. We test this property in GHCi using the <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:quickCheck" target="_blank" rel="noopener"><code>quickCheck</code></a> function:</p>
<div class="sourceCode" id="cb41" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> quickCheck prop_genParseJString</span>
<span id="cb41-2"><a href="#cb41-2" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<p>Brilliant! That was a complicated parser to write. Let’s move on to parsing numbers.</p>
<h2 data-track-content data-content-name="jnumber-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="jnumber-parser">JNumber Parser</h2>
<p>Numbers in JSON can be in different formats. They can be an integer, or a real number with a integral and fractional part, or in scientific notation with or without a fractional part. We will write separate parsers for each of these formats and then combine them to create the number parser. We start with the integer parser:</p>
<div class="sourceCode" id="cb42" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb42-1"><a href="#cb42-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jUInt ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb42-2"><a href="#cb42-2" aria-hidden="true" tabindex="-1"></a>jUInt <span class="ot">=</span> (\d ds <span class="ot">-></span> digitsToNumber <span class="dv">10</span> <span class="dv">0</span> (d<span class="op">:</span>ds)) <span class="op"><$></span> digit19 <span class="op"><*></span> digits</span>
<span id="cb42-3"><a href="#cb42-3" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="fu">fromIntegral</span> <span class="op"><$></span> digit</span>
<span id="cb42-4"><a href="#cb42-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb42-5"><a href="#cb42-5" aria-hidden="true" tabindex="-1"></a><span class="ot">digit19 ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Int</span></span>
<span id="cb42-6"><a href="#cb42-6" aria-hidden="true" tabindex="-1"></a>digit19 <span class="ot">=</span> <span class="fu">digitToInt</span> <span class="op"><$></span> satisfy (\x <span class="ot">-></span> <span class="fu">isDigit</span> x <span class="op">&&</span> x <span class="op">/=</span> <span class="ch">'0'</span>)</span>
<span id="cb42-7"><a href="#cb42-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb42-8"><a href="#cb42-8" aria-hidden="true" tabindex="-1"></a><span class="ot">digits ::</span> <span class="dt">Parser</span> <span class="dt">String</span> [<span class="dt">Int</span>]</span>
<span id="cb42-9"><a href="#cb42-9" aria-hidden="true" tabindex="-1"></a>digits <span class="ot">=</span> some digit</span>
<span id="cb42-10"><a href="#cb42-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb42-11"><a href="#cb42-11" aria-hidden="true" tabindex="-1"></a><span class="ot">jInt' ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb42-12"><a href="#cb42-12" aria-hidden="true" tabindex="-1"></a>jInt' <span class="ot">=</span> signInt <span class="op"><$></span> optional (char <span class="ch">'-'</span>) <span class="op"><*></span> jUInt</span>
<span id="cb42-13"><a href="#cb42-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb42-14"><a href="#cb42-14" aria-hidden="true" tabindex="-1"></a><span class="ot">signInt ::</span> <span class="dt">Maybe</span> <span class="dt">Char</span> <span class="ot">-></span> <span class="dt">Integer</span> <span class="ot">-></span> <span class="dt">Integer</span></span>
<span id="cb42-15"><a href="#cb42-15" aria-hidden="true" tabindex="-1"></a>signInt (<span class="dt">Just</span> <span class="ch">'-'</span>) i <span class="ot">=</span> <span class="fu">negate</span> i</span>
<span id="cb42-16"><a href="#cb42-16" aria-hidden="true" tabindex="-1"></a>signInt _ i <span class="ot">=</span> i</span></code></pre></div>
<p><code>jUInt</code> is a parser for unsigned integers. Integers in JSON cannot start with leading zeros. So if there are multiple digits, <code>jUInt</code> makes sure that the first digit is 1–9. Alternatively, there can be one digit in range 0–9. <code>digitsToNumber</code> is used to combine parsed digits into an <code>Integer</code>. <code>jInt'</code> add support for an optional <code>-</code> sign over <code>jUInt</code>.</p>
<p>We use the <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#v:some" target="_blank" rel="noopener"><code>some</code></a> function here for writing the <code>digits</code> function. <code>some</code> runs the given parser one or more times and returns gathering the results in a list.</p>
<p>Parsers for the fractional and exponent parts are simple:</p>
<div class="sourceCode" id="cb43" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jFrac ::</span> <span class="dt">Parser</span> <span class="dt">String</span> [<span class="dt">Int</span>]</span>
<span id="cb43-2"><a href="#cb43-2" aria-hidden="true" tabindex="-1"></a>jFrac <span class="ot">=</span> char <span class="ch">'.'</span> <span class="op">*></span> digits</span>
<span id="cb43-3"><a href="#cb43-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb43-4"><a href="#cb43-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jExp ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Integer</span></span>
<span id="cb43-5"><a href="#cb43-5" aria-hidden="true" tabindex="-1"></a>jExp <span class="ot">=</span> (char <span class="ch">'e'</span> <span class="op"><|></span> char <span class="ch">'E'</span>)</span>
<span id="cb43-6"><a href="#cb43-6" aria-hidden="true" tabindex="-1"></a> <span class="op">*></span> (signInt <span class="op"><$></span> optional (char <span class="ch">'+'</span> <span class="op"><|></span> char <span class="ch">'-'</span>) <span class="op"><*></span> jUInt)</span></code></pre></div>
<p>Now we can combine these parsers to create a parser for various number formats:</p>
<div class="sourceCode" id="cb44" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb44-1"><a href="#cb44-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jInt ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb44-2"><a href="#cb44-2" aria-hidden="true" tabindex="-1"></a>jInt <span class="ot">=</span> <span class="dt">JNumber</span> <span class="op"><$></span> jInt' <span class="op"><*></span> <span class="fu">pure</span> [] <span class="op"><*></span> <span class="fu">pure</span> <span class="dv">0</span></span>
<span id="cb44-3"><a href="#cb44-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb44-4"><a href="#cb44-4" aria-hidden="true" tabindex="-1"></a><span class="ot">jIntExp ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb44-5"><a href="#cb44-5" aria-hidden="true" tabindex="-1"></a>jIntExp <span class="ot">=</span> <span class="dt">JNumber</span> <span class="op"><$></span> jInt' <span class="op"><*></span> <span class="fu">pure</span> [] <span class="op"><*></span> jExp</span>
<span id="cb44-6"><a href="#cb44-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb44-7"><a href="#cb44-7" aria-hidden="true" tabindex="-1"></a><span class="ot">jIntFrac ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb44-8"><a href="#cb44-8" aria-hidden="true" tabindex="-1"></a>jIntFrac <span class="ot">=</span> (\i f <span class="ot">-></span> <span class="dt">JNumber</span> i f <span class="dv">0</span>) <span class="op"><$></span> jInt' <span class="op"><*></span> jFrac</span>
<span id="cb44-9"><a href="#cb44-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb44-10"><a href="#cb44-10" aria-hidden="true" tabindex="-1"></a><span class="ot">jIntFracExp ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb44-11"><a href="#cb44-11" aria-hidden="true" tabindex="-1"></a>jIntFracExp <span class="ot">=</span> (\ <span class="op">~</span>(<span class="dt">JNumber</span> i f _) e <span class="ot">-></span> <span class="dt">JNumber</span> i f e) <span class="op"><$></span> jIntFrac <span class="op"><*></span> jExp</span></code></pre></div>
<p>And finally, the <code>jNumber</code> parser is a combination of all the format parser alternatives, ordered from most eager to least eager:</p>
<div class="sourceCode" id="cb45" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jNumber ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb45-2"><a href="#cb45-2" aria-hidden="true" tabindex="-1"></a>jNumber <span class="ot">=</span> jIntFracExp <span class="op"><|></span> jIntExp <span class="op"><|></span> jIntFrac <span class="op"><|></span> jInt</span></code></pre></div>
<p>We can verify it in GHCi:</p>
<div class="sourceCode" id="cb46" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb46-1"><a href="#cb46-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"01"</span></span>
<span id="cb46-2"><a href="#cb46-2" aria-hidden="true" tabindex="-1"></a>Just ("1",0)</span>
<span id="cb46-3"><a href="#cb46-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"4"</span></span>
<span id="cb46-4"><a href="#cb46-4" aria-hidden="true" tabindex="-1"></a>Just ("",4)</span>
<span id="cb46-5"><a href="#cb46-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"44"</span></span>
<span id="cb46-6"><a href="#cb46-6" aria-hidden="true" tabindex="-1"></a>Just ("",44)</span>
<span id="cb46-7"><a href="#cb46-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"44.0"</span></span>
<span id="cb46-8"><a href="#cb46-8" aria-hidden="true" tabindex="-1"></a>Just ("",44.0)</span>
<span id="cb46-9"><a href="#cb46-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"44.45"</span></span>
<span id="cb46-10"><a href="#cb46-10" aria-hidden="true" tabindex="-1"></a>Just ("",44.45)</span>
<span id="cb46-11"><a href="#cb46-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"0.444"</span></span>
<span id="cb46-12"><a href="#cb46-12" aria-hidden="true" tabindex="-1"></a>Just ("",0.444)</span>
<span id="cb46-13"><a href="#cb46-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"44E4"</span></span>
<span id="cb46-14"><a href="#cb46-14" aria-hidden="true" tabindex="-1"></a>Just ("",44e4)</span>
<span id="cb46-15"><a href="#cb46-15" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"44.3e-7"</span></span>
<span id="cb46-16"><a href="#cb46-16" aria-hidden="true" tabindex="-1"></a>Just ("",44.3e-7)</span>
<span id="cb46-17"><a href="#cb46-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jNumber <span class="st">"0.35e+34"</span></span>
<span id="cb46-18"><a href="#cb46-18" aria-hidden="true" tabindex="-1"></a>Just ("",0.35e34)</span></code></pre></div>
<p>Nice. But to be sure, let’s write a QuickCheck property for <code>jNumber</code>:</p>
<div class="sourceCode" id="cb47" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_genParseJNumber ::</span> <span class="dt">Property</span></span>
<span id="cb47-2"><a href="#cb47-2" aria-hidden="true" tabindex="-1"></a>prop_genParseJNumber <span class="ot">=</span></span>
<span id="cb47-3"><a href="#cb47-3" aria-hidden="true" tabindex="-1"></a> forAllShrink jNumberGen shrink <span class="op">$</span> \jn <span class="ot">-></span></span>
<span id="cb47-4"><a href="#cb47-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> runParser jNumber (<span class="fu">show</span> jn) <span class="kw">of</span></span>
<span id="cb47-5"><a href="#cb47-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb47-6"><a href="#cb47-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (_, o) <span class="ot">-></span> o <span class="op">==</span> jn</span></code></pre></div>
<p>And run it:</p>
<div class="sourceCode" id="cb48" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb48-1"><a href="#cb48-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> quickCheck prop_genParseJNumber</span>
<span id="cb48-2"><a href="#cb48-2" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<p>That concludes the parsers for the scalar JSON types.</p>
<h2 data-track-content data-content-name="jarray-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="jarray-parser">JArray Parser</h2>
<p>A JSON array can contain zero or more items of any JSON types separated by commas (<code>,</code>). So it’s natural that the array parser will be recursive in nature. Arrays can also contain any amount of JSON whitespace around the items. First we write some helper functions to ease our parser implementation:</p>
<div class="sourceCode" id="cb49" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="ot">surroundedBy ::</span></span>
<span id="cb49-2"><a href="#cb49-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parser</span> <span class="dt">String</span> a <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> b <span class="ot">-></span> <span class="dt">Parser</span> <span class="dt">String</span> a</span>
<span id="cb49-3"><a href="#cb49-3" aria-hidden="true" tabindex="-1"></a>surroundedBy p1 p2 <span class="ot">=</span> p2 <span class="op">*></span> p1 <span class="op"><*</span> p2</span>
<span id="cb49-4"><a href="#cb49-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb49-5"><a href="#cb49-5" aria-hidden="true" tabindex="-1"></a><span class="ot">separatedBy ::</span> <span class="dt">Parser</span> i v <span class="ot">-></span> <span class="dt">Parser</span> i s <span class="ot">-></span> <span class="dt">Parser</span> i [v]</span>
<span id="cb49-6"><a href="#cb49-6" aria-hidden="true" tabindex="-1"></a>separatedBy v s <span class="ot">=</span> (<span class="op">:</span>) <span class="op"><$></span> v <span class="op"><*></span> many (s <span class="op">*></span> v)</span>
<span id="cb49-7"><a href="#cb49-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> <span class="fu">pure</span> []</span>
<span id="cb49-8"><a href="#cb49-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb49-9"><a href="#cb49-9" aria-hidden="true" tabindex="-1"></a><span class="ot">spaces ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">String</span></span>
<span id="cb49-10"><a href="#cb49-10" aria-hidden="true" tabindex="-1"></a>spaces <span class="ot">=</span> many (char <span class="ch">' '</span> <span class="op"><|></span> char <span class="ch">'\n'</span> <span class="op"><|></span> char <span class="ch">'\r'</span> <span class="op"><|></span> char <span class="ch">'\t'</span>)</span></code></pre></div>
<p>We are using the previously introduced operators <code><$></code> from the <code>Functor</code> typeclass, <code>*></code>, <code><*></code> and <code><*</code> from the <code>Applicative</code> typeclass, and <code><|></code> from the <code>Alternative</code> typeclass. The function <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#v:many" target="_blank" rel="noopener"><code>many</code></a> is like <code>some</code> except it runs the given parser zero or more times. Let’s see these in action:</p>
<div class="sourceCode" id="cb50" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb50-1"><a href="#cb50-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jArray ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb50-2"><a href="#cb50-2" aria-hidden="true" tabindex="-1"></a>jArray <span class="ot">=</span> <span class="dt">JArray</span> <span class="op"><$></span></span>
<span id="cb50-3"><a href="#cb50-3" aria-hidden="true" tabindex="-1"></a> (char <span class="ch">'['</span></span>
<span id="cb50-4"><a href="#cb50-4" aria-hidden="true" tabindex="-1"></a> <span class="op">*></span> (jValue <span class="ot">`separatedBy`</span> char <span class="ch">','</span> <span class="ot">`surroundedBy`</span> spaces)</span>
<span id="cb50-5"><a href="#cb50-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><*</span> char <span class="ch">']'</span>)</span></code></pre></div>
<p>It’s amazing how this definition almost reads like the spec for JSON array itself. We use the yet undefined <code>jValue</code> parser here to recursively parse any JSON value. Let’s try this out:</p>
<div class="sourceCode" id="cb51" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb51-2"><a href="#cb51-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray</span>
<span id="cb51-3"><a href="#cb51-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="st">"[1, \"hello\", \n3.5, null, [false,true]]"</span></span>
<span id="cb51-4"><a href="#cb51-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb51-5"><a href="#cb51-5" aria-hidden="true" tabindex="-1"></a>Just ("",[1, "hello", 3.5, null, [false, true]])</span>
<span id="cb51-6"><a href="#cb51-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[ [ [ true ] ] ]"</span></span>
<span id="cb51-7"><a href="#cb51-7" aria-hidden="true" tabindex="-1"></a>Just ("",[[[true]]])</span>
<span id="cb51-8"><a href="#cb51-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runParser jArray <span class="st">"[123"</span></span>
<span id="cb51-9"><a href="#cb51-9" aria-hidden="true" tabindex="-1"></a>Nothing</span></code></pre></div>
<p>Let’s write the QuickCheck property for testing the parser:</p>
<div class="sourceCode" id="cb52" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb52-1"><a href="#cb52-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_genParseJArray ::</span> <span class="dt">Property</span></span>
<span id="cb52-2"><a href="#cb52-2" aria-hidden="true" tabindex="-1"></a>prop_genParseJArray <span class="ot">=</span></span>
<span id="cb52-3"><a href="#cb52-3" aria-hidden="true" tabindex="-1"></a> forAllShrink (sized jArrayGen) shrink <span class="op">$</span> \ja <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb52-4"><a href="#cb52-4" aria-hidden="true" tabindex="-1"></a> jas <span class="ot"><-</span> <span class="fu">dropWhile</span> <span class="fu">isSpace</span> <span class="op"><$></span> stringify ja</span>
<span id="cb52-5"><a href="#cb52-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">.</span> counterexample (<span class="fu">show</span> jas) <span class="op">$</span> <span class="kw">case</span> runParser jArray jas <span class="kw">of</span></span>
<span id="cb52-6"><a href="#cb52-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb52-7"><a href="#cb52-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (_, o) <span class="ot">-></span> o <span class="op">==</span> ja</span></code></pre></div>
<p>We generate arbitrary JSON arrays using the <code>sized jArrayGen</code> generator, we <code>stringify</code> the arrays and parse the text data to equate them with the original generated arrays. Since the <code>jArray</code> parser does not deal with leading whitespace, we need to discard it before parsing the text. <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:sized" target="_blank" rel="noopener"><code>sized</code></a> lets QuickCheck control the size of generated values. We also add additional info to the QuickCheck error reports using the <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:counterexample" target="_blank" rel="noopener"><code>counterexample</code></a> function.</p>
<p>Running the test<a href="#fn13" class="footnote-ref" id="fnref13" role="doc-noteref"><sup>13</sup></a>:</p>
<div class="sourceCode" id="cb53" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> quickCheck prop_genParseJArray</span>
<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<h2 data-track-content data-content-name="jobject-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="jobject-parser">JObject Parser</h2>
<p>On to the final piece, the JSON object parser almost writes itself after all we have learned till now:</p>
<div class="sourceCode" id="cb54" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb54-1"><a href="#cb54-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jObject ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb54-2"><a href="#cb54-2" aria-hidden="true" tabindex="-1"></a>jObject <span class="ot">=</span> <span class="dt">JObject</span> <span class="op"><$></span></span>
<span id="cb54-3"><a href="#cb54-3" aria-hidden="true" tabindex="-1"></a> (char <span class="ch">'{'</span> <span class="op">*></span> pair <span class="ot">`separatedBy`</span> char <span class="ch">','</span> <span class="ot">`surroundedBy`</span> spaces <span class="op"><*</span> char <span class="ch">'}'</span>)</span>
<span id="cb54-4"><a href="#cb54-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb54-5"><a href="#cb54-5" aria-hidden="true" tabindex="-1"></a> pair <span class="ot">=</span> (\ <span class="op">~</span>(<span class="dt">JString</span> s) j <span class="ot">-></span> (s, j))</span>
<span id="cb54-6"><a href="#cb54-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><$></span> (jString <span class="ot">`surroundedBy`</span> spaces)</span>
<span id="cb54-7"><a href="#cb54-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><*</span> char <span class="ch">':'</span></span>
<span id="cb54-8"><a href="#cb54-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><*></span> jValue</span></code></pre></div>
<p>The property for testing is quite similar to that of the <code>jArray</code> parser:</p>
<div class="sourceCode" id="cb55" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb55-1"><a href="#cb55-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_genParseJObject ::</span> <span class="dt">Property</span></span>
<span id="cb55-2"><a href="#cb55-2" aria-hidden="true" tabindex="-1"></a>prop_genParseJObject <span class="ot">=</span></span>
<span id="cb55-3"><a href="#cb55-3" aria-hidden="true" tabindex="-1"></a> forAllShrink (sized jObjectGen) shrink <span class="op">$</span> \jo <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb55-4"><a href="#cb55-4" aria-hidden="true" tabindex="-1"></a> jos <span class="ot"><-</span> <span class="fu">dropWhile</span> <span class="fu">isSpace</span> <span class="op"><$></span> stringify jo</span>
<span id="cb55-5"><a href="#cb55-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">.</span> counterexample (<span class="fu">show</span> jos) <span class="op">$</span> <span class="kw">case</span> runParser jObject jos <span class="kw">of</span></span>
<span id="cb55-6"><a href="#cb55-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="dt">False</span></span>
<span id="cb55-7"><a href="#cb55-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (_, o) <span class="ot">-></span> o <span class="op">==</span> jo</span></code></pre></div>
<p>And the test:</p>
<div class="sourceCode" id="cb56" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb56-1"><a href="#cb56-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> quickCheck prop_genParseJObject</span>
<span id="cb56-2"><a href="#cb56-2" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<h2 data-track-content data-content-name="json-parser" data-content-piece="json-parsing-from-scratch-in-haskell" id="json-parser">JSON Parser</h2>
<p>Finally, it’s time to put all the puzzle pieces together to write <strong>the</strong> JSON parser:</p>
<div class="sourceCode" id="cb57" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jValue ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb57-2"><a href="#cb57-2" aria-hidden="true" tabindex="-1"></a>jValue <span class="ot">=</span> jValue' <span class="ot">`surroundedBy`</span> spaces</span>
<span id="cb57-3"><a href="#cb57-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb57-4"><a href="#cb57-4" aria-hidden="true" tabindex="-1"></a> jValue' <span class="ot">=</span> jNull</span>
<span id="cb57-5"><a href="#cb57-5" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> jBool</span>
<span id="cb57-6"><a href="#cb57-6" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> jString</span>
<span id="cb57-7"><a href="#cb57-7" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> jNumber</span>
<span id="cb57-8"><a href="#cb57-8" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> jArray</span>
<span id="cb57-9"><a href="#cb57-9" aria-hidden="true" tabindex="-1"></a> <span class="op"><|></span> jObject</span></code></pre></div>
<p>That was easier than expected<a href="#fn14" class="footnote-ref" id="fnref14" role="doc-noteref"><sup>14</sup></a>! Now we can write the <code>parseJSON</code> function to … parse JSON:</p>
<div class="sourceCode" id="cb59" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb59-1"><a href="#cb59-1" aria-hidden="true" tabindex="-1"></a><span class="ot">parseJSON ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">JValue</span></span>
<span id="cb59-2"><a href="#cb59-2" aria-hidden="true" tabindex="-1"></a>parseJSON s <span class="ot">=</span> <span class="kw">case</span> runParser jValue s <span class="kw">of</span></span>
<span id="cb59-3"><a href="#cb59-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> (<span class="st">""</span>, j) <span class="ot">-></span> <span class="dt">Just</span> j</span>
<span id="cb59-4"><a href="#cb59-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span></code></pre></div>
<p>And now we write the final property which just straight-up generates arbitrary JSON values, <code>stringify</code>s them and matches the parsed values with the original generated values:</p>
<div class="sourceCode" id="cb60" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb60-1"><a href="#cb60-1" aria-hidden="true" tabindex="-1"></a><span class="ot">prop_genParseJSON ::</span> <span class="dt">Property</span></span>
<span id="cb60-2"><a href="#cb60-2" aria-hidden="true" tabindex="-1"></a>prop_genParseJSON <span class="ot">=</span> forAllShrink (sized jValueGen) shrink <span class="op">$</span> \value <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb60-3"><a href="#cb60-3" aria-hidden="true" tabindex="-1"></a> json <span class="ot"><-</span> stringify value</span>
<span id="cb60-4"><a href="#cb60-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">return</span> <span class="op">.</span> counterexample (<span class="fu">show</span> json) <span class="op">.</span> (<span class="op">==</span> <span class="dt">Just</span> value) <span class="op">.</span> parseJSON <span class="op">$</span> json</span></code></pre></div>
<p>Let’s skip testing this property and instead write a test to test all of them:</p>
<div class="sourceCode" id="cb61" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb61-1"><a href="#cb61-1" aria-hidden="true" tabindex="-1"></a><span class="ot">runTests ::</span> <span class="dt">IO</span> ()</span>
<span id="cb61-2"><a href="#cb61-2" aria-hidden="true" tabindex="-1"></a>runTests <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb61-3"><a href="#cb61-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="st">"== prop_genParseJString =="</span></span>
<span id="cb61-4"><a href="#cb61-4" aria-hidden="true" tabindex="-1"></a> quickCheck prop_genParseJString</span>
<span id="cb61-5"><a href="#cb61-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb61-6"><a href="#cb61-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="st">"== prop_genParseJNumber =="</span></span>
<span id="cb61-7"><a href="#cb61-7" aria-hidden="true" tabindex="-1"></a> quickCheck prop_genParseJNumber</span>
<span id="cb61-8"><a href="#cb61-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb61-9"><a href="#cb61-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="st">"== prop_genParseJArray =="</span></span>
<span id="cb61-10"><a href="#cb61-10" aria-hidden="true" tabindex="-1"></a> quickCheck prop_genParseJArray</span>
<span id="cb61-11"><a href="#cb61-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb61-12"><a href="#cb61-12" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="st">"== prop_genParseJObject =="</span></span>
<span id="cb61-13"><a href="#cb61-13" aria-hidden="true" tabindex="-1"></a> quickCheck prop_genParseJObject</span>
<span id="cb61-14"><a href="#cb61-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb61-15"><a href="#cb61-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">putStrLn</span> <span class="st">"== prop_genParseJSON =="</span></span>
<span id="cb61-16"><a href="#cb61-16" aria-hidden="true" tabindex="-1"></a> quickCheck prop_genParseJSON</span></code></pre></div>
<p>As the tradition goes, let’s do a final run for all the tests<a href="#fn15" class="footnote-ref" id="fnref15" role="doc-noteref"><sup>15</sup></a>:</p>
<div class="sourceCode" id="cb62" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb62-1"><a href="#cb62-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> runTests</span>
<span id="cb62-2"><a href="#cb62-2" aria-hidden="true" tabindex="-1"></a>== prop_genParseJString ==</span>
<span id="cb62-3"><a href="#cb62-3" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb62-4"><a href="#cb62-4" aria-hidden="true" tabindex="-1"></a>== prop_genParseJNumber ==</span>
<span id="cb62-5"><a href="#cb62-5" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb62-6"><a href="#cb62-6" aria-hidden="true" tabindex="-1"></a>== prop_genParseJArray ==</span>
<span id="cb62-7"><a href="#cb62-7" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb62-8"><a href="#cb62-8" aria-hidden="true" tabindex="-1"></a>== prop_genParseJObject ==</span>
<span id="cb62-9"><a href="#cb62-9" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span>
<span id="cb62-10"><a href="#cb62-10" aria-hidden="true" tabindex="-1"></a>== prop_genParseJSON ==</span>
<span id="cb62-11"><a href="#cb62-11" aria-hidden="true" tabindex="-1"></a>+++ OK, passed 100 tests.</span></code></pre></div>
<p>Hurray! We have written a simple but correct JSON parser from scratch.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="json-parsing-from-scratch-in-haskell" id="conclusion">Conclusion</h2>
<p>In the short span of thirty minutes, we have learned how to write a JSON parser from scratch in Haskell. We have also learned some basics of parsing and a great deal of details about the JSON syntax. We also gained some understanding of how to write Property-based tests with QuickCheck. I hope all these things will be useful to you. The full code for the JSON parser can be seen <a href="https://abhinavsarkar.net/code/jsonparser.html?mtm_campaign=feed">here</a>.</p>
<h2 class="notoc" data-track-content data-content-name="acknowledgements" data-content-piece="json-parsing-from-scratch-in-haskell" id="acknowledgements">Acknowledgements</h2>
<p>Many thanks to <a href="https://ankursethi.in/" target="_blank" rel="noopener">Ankur Sethi</a> and <a href="https://nirbheek.in/" target="_blank" rel="noopener">Nirbheek Chauhan</a> for helping me understand the intricacies of Unicode, and to <a href="https://www.deobald.ca/" target="_blank" rel="noopener">Steven Deobald</a> for reviewing a draft of this article.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p><strong>parse</strong>: verb [with object]</p>
<p>resolve (a sentence) into its component parts and describe their syntactic roles</p>
<p>Origin: mid 16th century: perhaps from Middle English pars ‘parts of speech’, from Old French pars ‘parts’ (influenced by Latin pars ‘part’).<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>Parsing is often preceded by a <em>Lexing</em> step which converts raw text input into a sequence of tokens which may be annotated with some extra syntactic information. Alternatively, Lexing can be skipped as a separate step and combined with the Parsing step.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Context-free grammars are a part of the <a href="https://en.wikipedia.org/wiki/Chomsky_hierarchy" target="_blank" rel="noopener">Chomsky hierarchy</a> of formal grammars.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p><a href="https://en.wikipedia.org/wiki/C++" target="_blank" rel="noopener">C++</a> famously has a non-context-free grammar, aka a <a href="https://en.wikipedia.org/wiki/Context-sensitive_grammar" target="_blank" rel="noopener">Context-sensitive grammar</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>For comparison, <a href="https://github.com/antlr/grammars-v4/blob/master/json/JSON.g4" target="_blank" rel="noopener">here</a> is the grammar of JSON to be used by the popular parser generator framework <a href="https://www.antlr.org/" target="_blank" rel="noopener">ANTLR</a>. And <a href="https://wesleytsai.io/2015/06/13/a-json-parser/" target="_blank" rel="noopener">this post</a> shows how to implement a JSON parser by hand in Javascript.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>For real production usage, I recommend using the <a href="https://hackage.haskell.org/package/aeson" target="_blank" rel="noopener">Aeson</a> library.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>Actually, QuickCheck does not generate entirely arbitrary inputs. It generates arbitrary inputs with increasing complexity—where the complexity is defined by the user—and asserts the properties on these inputs. When a test fails for a particular input, QuickCheck also tries to simplify the culprit input and tries to find the simplest input for which the test fails. This process is called <em>Shrinking</em> in QuickCheck parlance. QuickCheck then shows this simplest input to the user for them to use it to debug their code.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>If you are confused by so many backslashes (<code>\</code>) in the <code>showJSONChar</code> function, they are there because to write a backslash in Haskell code or in JSON text data, you need to escape it with … another backslash.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Real-world parsers are of course more complicated than our definition. They need to report proper error messages and often need to be stateful. For example, the parser type in the popular <a href="https://hackage.haskell.org/package/attoparsec" target="_blank" rel="noopener">Attoparsec</a> parser library looks like this:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">Parser</span> i o <span class="ot">=</span> <span class="dt">Parser</span> {</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> runParser ::</span> <span class="kw">forall</span> r<span class="op">.</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">State</span> i</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a> <span class="ot">-></span> <span class="dt">Pos</span></span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a> <span class="ot">-></span> <span class="dt">More</span></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a> <span class="ot">-></span> <span class="dt">Failure</span> i (<span class="dt">State</span> i) r</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a> <span class="ot">-></span> <span class="dt">Success</span> i (<span class="dt">State</span> i) o r</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="ot">-></span> <span class="dt">IResult</span> i r</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a>}</span></code></pre></div>
<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn10"><p>In case you are confused by the dangling <code>\case</code>, we are using the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/lambda_case.html#extension-LambdaCase" target="_blank" rel="noopener">LambdaCase</a> GHC extension here.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn11"><p>As you may have guessed, <a href="https://hackage.haskell.org/package/base/docs/Data-Functor.html#v:-60--36-" target="_blank" rel="noopener"><code><$</code></a> is a flipped version of <code>$></code>. It discards the parser result on left and returns the value on right.<a href="#fnref11" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn12"><p>Essentially, what the <code>Monad</code> abstraction gives our parsers is <a href="https://en.wikipedia.org/wiki/Parsing#Lookahead" target="_blank" rel="noopener"><em>Lookahead</em></a>, the capability to “look ahead” in the input and make decisions accordingly. With the lookahead capability, we can write a <em>Predictive parser</em>, which is a Recursive decent parser without the need to do backtracking. Predictive parsing is faster than backtracking because we don’t need to go over the input multiple times. With the lookahead of one character, we can write an alternative version of the <code>jBool</code> parser like this:</p>
<div class="sourceCode" id="cb38" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a><span class="ot">lookahead ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">Char</span></span>
<span id="cb38-2"><a href="#cb38-2" aria-hidden="true" tabindex="-1"></a>lookahead <span class="ot">=</span> <span class="dt">Parser</span> <span class="op">$</span> \<span class="kw">case</span></span>
<span id="cb38-3"><a href="#cb38-3" aria-hidden="true" tabindex="-1"></a> i<span class="op">@</span>(x<span class="op">:</span>_) <span class="ot">-></span> <span class="dt">Just</span> (i, x)</span>
<span id="cb38-4"><a href="#cb38-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb38-5"><a href="#cb38-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb38-6"><a href="#cb38-6" aria-hidden="true" tabindex="-1"></a><span class="ot">jBoolAlt ::</span> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb38-7"><a href="#cb38-7" aria-hidden="true" tabindex="-1"></a>jBoolAlt <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb38-8"><a href="#cb38-8" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead</span>
<span id="cb38-9"><a href="#cb38-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">JBool</span> <span class="op"><$></span> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb38-10"><a href="#cb38-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> string <span class="st">"true"</span> <span class="op">$></span> <span class="dt">True</span></span>
<span id="cb38-11"><a href="#cb38-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> string <span class="st">"false"</span> <span class="op">$></span> <span class="dt">False</span></span>
<span id="cb38-12"><a href="#cb38-12" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> empty</span></code></pre></div>
<a href="#fnref12" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn13"><p>You can also run the tests with the <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#v:verboseCheck" target="_blank" rel="noopener"><code>verboseCheck</code></a> function to see all the values being generated by QuickCheck.<a href="#fnref13" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn14"><p>Alternatively, we can write a predictive parser for JSON:</p>
<div class="sourceCode" id="cb58" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb58-1"><a href="#cb58-1" aria-hidden="true" tabindex="-1"></a><span class="ot">jValueAlt ::</span></span>
<span id="cb58-2"><a href="#cb58-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">Parser</span> <span class="dt">String</span> <span class="dt">JValue</span></span>
<span id="cb58-3"><a href="#cb58-3" aria-hidden="true" tabindex="-1"></a>jValueAlt <span class="ot">=</span></span>
<span id="cb58-4"><a href="#cb58-4" aria-hidden="true" tabindex="-1"></a> jValue' <span class="ot">`surroundedBy`</span> spaces</span>
<span id="cb58-5"><a href="#cb58-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb58-6"><a href="#cb58-6" aria-hidden="true" tabindex="-1"></a> jValue' <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb58-7"><a href="#cb58-7" aria-hidden="true" tabindex="-1"></a> c <span class="ot"><-</span> lookahead</span>
<span id="cb58-8"><a href="#cb58-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> c <span class="kw">of</span></span>
<span id="cb58-9"><a href="#cb58-9" aria-hidden="true" tabindex="-1"></a> <span class="ch">'n'</span> <span class="ot">-></span> jNull</span>
<span id="cb58-10"><a href="#cb58-10" aria-hidden="true" tabindex="-1"></a> <span class="ch">'t'</span> <span class="ot">-></span> jBool</span>
<span id="cb58-11"><a href="#cb58-11" aria-hidden="true" tabindex="-1"></a> <span class="ch">'f'</span> <span class="ot">-></span> jBool</span>
<span id="cb58-12"><a href="#cb58-12" aria-hidden="true" tabindex="-1"></a> <span class="ch">'\"'</span> <span class="ot">-></span> jString</span>
<span id="cb58-13"><a href="#cb58-13" aria-hidden="true" tabindex="-1"></a> <span class="ch">'['</span> <span class="ot">-></span> jArray</span>
<span id="cb58-14"><a href="#cb58-14" aria-hidden="true" tabindex="-1"></a> <span class="ch">'{'</span> <span class="ot">-></span> jObject</span>
<span id="cb58-15"><a href="#cb58-15" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> jNumber</span></code></pre></div>
<a href="#fnref14" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn15"><p>Our astute readers may have noticed that all the QuickCheck properties we wrote are for <em>Positive Tests</em>, that is, we only generate valid JSON text to test our parsers. They do not guarantee that the parsers fail for invalid inputs. Since the point of this post is not to learn testing in details, I’ll leave the <a href="https://en.wikipedia.org/wiki/Negative_Testing" target="_blank" rel="noopener"><em>Negative Testing</em></a> as an exercise for the readers. However, I’ve ascertained the correctness of this JSON parser by running it against <a href="https://github.com/nst/JSONTestSuite" target="_blank" rel="noopener">this test suite</a> written by the author of the famous <a href="https://seriot.ch/parsing_json.php" target="_blank" rel="noopener">Parsing JSON is a Minefield</a> article.<a href="#fnref15" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>JSON Parsing from Scratch in Haskell</strong>.</p>
<ol>
<li>
<strong>JSON Parsing from Scratch in Haskell</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-2/?mtm_campaign=feed">Error Reporting—Part 1</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell-3/?mtm_campaign=feed">Error Reporting—Part 2</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/json-parsing-from-scratch-in-haskell/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2020-05-04T00:00:00Z <p><a href="https://en.wikipedia.org/wiki/JSON" target="_blank" rel="noopener">JSON</a> is probably the most used standard file format for storing and transmitting data on the Internet in recent times. Though it was historically derived from <a href="https://en.wikipedia.org/wiki/JavaScript" target="_blank" rel="noopener">JavaScript</a>, it is a programming language independent format and is now supported by almost all languages. JSON has a simple syntax specification with only four scalar data types and two composite data types. So, writing a parser for JSON is a great exercise for learning the basics of parsing. Let’s write one from scratch in Haskell.</p>
https://abhinavsarkar.net/posts/twt-notes-1/ Notes for ‘Thinking with Types: Type-level Programming in Haskell’, Chapters 1–5 2020-03-18T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p><a href="https://www.haskell.org" target="_blank" rel="noopener">Haskell</a>—with its powerful type system—has a great support for type-level programming and it has gotten much better in the recent times with the new releases of the <a href="https://www.haskell.org/ghc/" target="_blank" rel="noopener">GHC</a> compiler. But type-level programming remains a daunting topic even with seasoned haskellers. <em><a href="https://thinkingwithtypes.com/" target="_blank" rel="noopener">Thinking with Types: Type-level Programming in Haskell</a></em> by <a href="https://sandymaguire.me/about/" target="_blank" rel="noopener">Sandy Maguire</a> is a book which attempts to fix that. I’ve taken some notes to summarize my understanding of the same.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/twt-notes-1/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more-->
<nav id="toc"><h3>Contents</h3><ol><li><a href="#introduction">Introduction</a></li><li><a href="#chapter-1.-the-algebra-behind-types">Chapter 1. The Algebra Behind Types</a><ol><li><a href="#isomorphisms-and-cardinalities">Isomorphisms and Cardinalities</a></li><li><a href="#sum-product-and-exponential-types">Sum, Product and Exponential Types</a></li><li><a href="#the-curry-howard-isomorphism">The Curry-Howard Isomorphism</a></li><li><a href="#canonical-representations">Canonical Representations</a></li></ol></li><li><a href="#chapter-2.-terms-types-and-kinds">Chapter 2. Terms, Types and Kinds</a><ol><li><a href="#the-kind-system">The Kind System</a></li><li><a href="#data-kinds">Data Kinds</a></li><li><a href="#promotion-of-built-in-types">Promotion of Built-In Types</a></li><li><a href="#type-level-functions">Type-level Functions</a></li></ol></li><li><a href="#chapter-3.-variance">Chapter 3. Variance</a></li><li><a href="#chapter-4.-working-with-types">Chapter 4. Working with Types</a></li><li><a href="#chapter-5.-constraints-and-gadts">Chapter 5. Constraints and GADTs</a><ol><li><a href="#constraints">Constraints</a></li><li><a href="#gadts">GADTs</a></li><li><a href="#heterogeneous-lists">Heterogeneous Lists</a></li><li><a href="#creating-new-constraints">Creating New Constraints</a></li></ol></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="introduction" data-content-piece="twt-notes-1" id="introduction">Introduction</h2>
<ul>
<li>Type-level Programming (TLP) is writing programs that run at compile-time, unlike term-level programming which is writing programs that run at run-time.</li>
<li>TLP should be used in moderation.</li>
<li>TLP should be mostly used
<ul>
<li>for programs that are catastrophic to get wrong (finance, healthcare, etc).</li>
<li>when it simplifies the program API massively.</li>
<li>when power-to-weight ratio of adding TLP is high.</li>
</ul></li>
<li>Types are not a silver bullet for fixing all errors:
<ul>
<li>Correct programs can be not well-typed.</li>
<li>It can be hard to assign type for useful programs. e.g. <code>printf</code> from C.</li>
</ul></li>
<li>Types can turn possible runtime errors into compile-time errors.</li>
</ul>
<h2 data-track-content data-content-name="chapter-1.-the-algebra-behind-types" data-content-piece="twt-notes-1" id="chapter-1.-the-algebra-behind-types">Chapter 1. The Algebra Behind Types</h2>
<h3 id="isomorphisms-and-cardinalities">Isomorphisms and Cardinalities</h3>
<ul>
<li><em>Cardinality</em> of a type is the number of values it can have ignoring bottoms. The values of a type are also called the <em>inhabitants</em> of the type.</li>
</ul>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Void</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="co">-- no possible values. cardinality: 0</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Unit</span> <span class="ot">=</span> <span class="dt">Unit</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="co">-- only one possible value. cardinality: 1</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Bool</span> <span class="ot">=</span> <span class="dt">True</span> <span class="op">|</span> <span class="dt">False</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="co">-- only two possible values. cardinality: 2</span></span></code></pre></div>
<ul>
<li>Cardinality is written using notation: <code>|Void| = 0</code></li>
<li>Two types are said to be <em>Isomorphic</em> if they have same cardinality.</li>
<li>An <em>isomorphism</em> between types <code>a</code> and <code>b</code> is a pair of functions <code>to</code> and <code>from</code> such that:</li>
</ul>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">to ::</span> a <span class="ot">-></span> b</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="ot">from ::</span> b <span class="ot">-></span> a</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>to <span class="op">.</span> from <span class="ot">=</span> <span class="fu">id</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>from <span class="op">.</span> to <span class="ot">=</span> <span class="fu">id</span></span></code></pre></div>
<h3 id="sum-product-and-exponential-types">Sum, Product and Exponential Types</h3>
<ul>
<li><code class="sourceCode haskell"><span class="dt">Either</span> a b</code> is a <em>Sum</em> type. Its number of inhabitants is sum of the number of inhabitants of type <code>a</code> and <code>b</code> like so: <code>|a|</code> possible values with the <code class="sourceCode haskell"><span class="dt">Left</span></code> constructor and <code>|b|</code> possible values with the <code class="sourceCode haskell"><span class="dt">Right</span></code> constructor. Formally:</li>
</ul>
<pre class="plain"><code>|Either a b| = |a| + |b|</code></pre>
<ul>
<li><code>(a, b)</code> is a <em>Product</em> type. Its number of inhabitant is the product of the number of inhabitants of types <code>a</code> and <code>b</code>. Formally:</li>
</ul>
<pre class="plain"><code>|(a, b)| = |a| * |b|</code></pre>
<ul>
<li>Some more examples:</li>
</ul>
<pre class="plain"><code>|Maybe a| = |Nothing| + |Just a| = 1 + |a|
|[a]| = 1 + |a| + |a|^2 + |a|^3 + ...
|Either a Void| = |a| + 0 = |a|
|Either Void a| = 0 + |a| = |a|
|(a, Unit)| = |a| * 1 = |a|
|(Unit, a)| = 1 * |a| = |a|</code></pre>
<ul>
<li>Function types are exponentiation types.</li>
</ul>
<pre class="plain"><code>|a -> b| = |b|^|a|</code></pre>
<p>For every value in domain <code>a</code> there can be <code>|b|</code> possible values in the range <code>b</code>. And there are <code>|a|</code> possible values in domain <code>a</code>. So:</p>
<pre class="plain"><code>|a -> b|
= |b| * |b| * ... * |b| -- (|a| times)
= |b|^|a|</code></pre>
<ul>
<li>Data can be represented in many possible isomorphic types. Some of them are more useful than others. Example:</li>
</ul>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">TicTacToe1</span> a <span class="ot">=</span> <span class="dt">TicTacToe1</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> {<span class="ot"> topLeft ::</span> a</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> topCenter ::</span> a</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> topRight ::</span> a</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> middleLeft ::</span> a</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> middleCenter ::</span> a</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> middleRight ::</span> a</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> bottomLeft ::</span> a</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> bottomCenter ::</span> a</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> ,<span class="ot"> bottomRight ::</span> a</span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a> }</span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="op">|</span><span class="dt">TicTacToe1</span> a<span class="op">|</span></span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="op">|</span>a<span class="op">|</span> <span class="op">*</span> <span class="op">|</span>a<span class="op">|</span> <span class="op">*</span> <span class="op">...</span> <span class="op">*</span> <span class="op">|</span>a<span class="op">|</span> <span class="co">-- 9 times</span></span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="op">|</span>a<span class="op">|^</span><span class="dv">9</span></span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a><span class="ot">emptyBoard1 ::</span> <span class="dt">TicTacToe1</span> (<span class="dt">Maybe</span> <span class="dt">Bool</span>)</span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true" tabindex="-1"></a>emptyBoard1 <span class="ot">=</span></span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">TicTacToe1</span> <span class="dt">Nothing</span> <span class="dt">Nothing</span> <span class="dt">Nothing</span></span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="dt">Nothing</span> <span class="dt">Nothing</span></span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="dt">Nothing</span> <span class="dt">Nothing</span></span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true" tabindex="-1"></a><span class="co">-- Alternatively</span></span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Three</span> <span class="ot">=</span> <span class="dt">One</span> <span class="op">|</span> <span class="dt">Two</span> <span class="op">|</span> <span class="dt">Three</span></span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">TicTacToe2</span> a <span class="ot">=</span></span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true" tabindex="-1"></a> <span class="dt">TicTacToe2</span> (<span class="dt">Three</span> <span class="ot">-></span> <span class="dt">Three</span> <span class="ot">-></span> a)</span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true" tabindex="-1"></a><span class="op">|</span><span class="dt">TicTacToe2</span> a<span class="op">|</span> <span class="ot">=</span> <span class="op">|</span>a<span class="op">|^</span>(<span class="op">|</span><span class="dt">Three</span><span class="op">|</span> <span class="op">*</span> <span class="op">|</span><span class="dt">Three</span><span class="op">|</span>)</span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="op">|</span>a<span class="op">|^</span>(<span class="dv">3</span><span class="op">*</span><span class="dv">3</span>)</span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true" tabindex="-1"></a> <span class="ot">=</span> <span class="op">|</span>a<span class="op">|^</span><span class="dv">9</span></span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true" tabindex="-1"></a><span class="ot">emptyBoard2 ::</span> <span class="dt">TicTacToe2</span> (<span class="dt">Maybe</span> <span class="dt">Bool</span>)</span>
<span id="cb8-34"><a href="#cb8-34" aria-hidden="true" tabindex="-1"></a>emptyBoard2 <span class="ot">=</span></span>
<span id="cb8-35"><a href="#cb8-35" aria-hidden="true" tabindex="-1"></a> <span class="dt">TicTacToe2</span> <span class="op">$</span> <span class="fu">const</span> <span class="op">$</span> <span class="fu">const</span> <span class="dt">Nothing</span></span></code></pre></div>
<h3 id="the-curry-howard-isomorphism">The Curry-Howard Isomorphism</h3>
<ul>
<li>Every logic statement can be expressed as an equivalent computer program.</li>
<li>Helps us analyze mathematical theorems through programming.</li>
</ul>
<h3 id="canonical-representations">Canonical Representations</h3>
<ul>
<li>Since multiple equivalent representations of a type are possible, the representation in form of sum of products is considered the canonical representation of the type. Example:</li>
</ul>
<div class="sourceCode" id="cb9" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="dt">Either</span> a (<span class="dt">Either</span> b (c, d)) <span class="co">-- canonical</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>(a, <span class="dt">Bool</span>) <span class="co">-- not canonical</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="dt">Either</span> a a</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="co">-- same cardinality as above but canonical</span></span></code></pre></div>
<h2 data-track-content data-content-name="chapter-2.-terms-types-and-kinds" data-content-piece="twt-notes-1" id="chapter-2.-terms-types-and-kinds">Chapter 2. Terms, Types and Kinds</h2>
<h3 id="the-kind-system">The Kind System</h3>
<ul>
<li><em>Terms</em> are things manipulated at runtime. <em>Types</em> of terms are used by compiler to prove “things” about the terms.</li>
<li>Similarly, <em>Types</em> are things manipulated at compile-time. <em>Kinds</em> of types are used by the compiler to prove “things” about the types.</li>
<li>Kinds are “the types of the Types”.</li>
<li>Kind of things that can exist at runtime (terms) is <code class="sourceCode haskell"><span class="op">*</span></code>. That is, kind of <code class="sourceCode haskell"><span class="dt">Int</span></code>, <code class="sourceCode haskell"><span class="dt">String</span></code> etc is <code class="sourceCode haskell"><span class="op">*</span></code>.</li>
</ul>
<div class="sourceCode" id="cb10" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">True</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>True :: Bool</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Bool</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a>Bool :: *</span></code></pre></div>
<ul>
<li>There are kinds other than <code class="sourceCode haskell"><span class="op">*</span></code>. For example:</li>
</ul>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Show</span> <span class="dt">Int</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>Show Int :: Constraint</span></code></pre></div>
<ul>
<li>Higher-kinded types have <code class="sourceCode haskell">(<span class="ot">-></span>)</code> in their kind signature:</li>
</ul>
<div class="sourceCode" id="cb12" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Maybe</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>Maybe :: * -> *</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Maybe</span> <span class="dt">Int</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>Maybe Int :: *</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">Control.Monad.Trans.Maybe.MaybeT</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>Control.Monad.Trans.Maybe.MaybeT</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> :: m (Maybe a) -> Control.Monad.Trans.Maybe.MaybeT m a</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Control.Monad.Trans.Maybe.MaybeT</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a>Control.Monad.Trans.Maybe.MaybeT :: (* -> *) -> * -> *</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Control.Monad.Trans.Maybe.MaybeT</span> <span class="dt">IO</span> <span class="dt">Int</span></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a>Control.Monad.Trans.Maybe.MaybeT IO Int :: *</span></code></pre></div>
<h3 id="data-kinds">Data Kinds</h3>
<ul>
<li><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/data_kinds.html#extension-DataKinds" target="_blank" rel="noopener"><code>DataKinds</code></a> extension lets us create new kinds.</li>
<li>It lifts data constructors into type constructors and types into kinds.</li>
</ul>
<div class="sourceCode" id="cb13" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">data</span> <span class="dt">Allow</span> <span class="ot">=</span> <span class="dt">Yes</span> <span class="op">|</span> <span class="dt">No</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">Yes</span> <span class="co">-- Yes is data constructor</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>Yes :: Allow</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">Allow</span> <span class="co">-- Allow is a type</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>Allow :: *</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">'Yes</span> <span class="co">-- 'Yes is a type too. Its kind is 'Allow.</span></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a>'Yes :: Allow</span></code></pre></div>
<ul>
<li>Lifted constructors and types are written with a preceding <code>'</code> (called <em>tick</em>).</li>
</ul>
<h3 id="promotion-of-built-in-types">Promotion of Built-In Types</h3>
<ul>
<li><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/data_kinds.html#extension-DataKinds" target="_blank" rel="noopener"><code>DataKinds</code></a> extension promotes built-in types too.</li>
<li>Strings are promoted to the kind <code>Symbol</code>.</li>
<li>Natural numbers are promoted to the kind <code>Nat</code>.</li>
</ul>
<div class="sourceCode" id="cb14" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="st">"hi"</span> <span class="co">-- "hi" is a type-level string</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>"hi" :: GHC.Types.Symbol</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dv">123</span> <span class="co">-- 123 is a type-level natural number</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>123 :: GHC.Types.Nat</span></code></pre></div>
<ul>
<li>We can do type level operations on <code class="sourceCode haskell"><span class="dt">Symbol</span></code>s and <code class="sourceCode haskell"><span class="dt">Nat</span></code>s.</li>
</ul>
<div class="sourceCode" id="cb15" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>m <span class="op">+</span><span class="dt">GHC.TypeLits</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">AppendSymbol</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>AppendSymbol :: Symbol -> Symbol -> Symbol</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">AppendSymbol</span> <span class="st">"hello "</span> <span class="st">"there"</span></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>AppendSymbol "hello " "there" :: Symbol</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>= "hello there"</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeOperators</span></span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> (<span class="dv">1</span> <span class="op">+</span> <span class="dv">2</span>) <span class="op">^</span> <span class="dv">7</span></span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>(1 + 2) ^ 7 :: Nat</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a>= 2187</span></code></pre></div>
<ul>
<li><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_operators.html#extension-TypeOperators" target="_blank" rel="noopener"><code>TypeOperators</code></a> extension is needed for applying type-level functions with symbolic identifiers.</li>
<li>There are type-level lists and tuples:</li>
</ul>
<div class="sourceCode" id="cb16" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind '[ <span class="dt">'True</span> ]</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>'[ 'True ] :: [Bool]</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind '[<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">3</span>]</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>'[1,2,3] :: [Nat]</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind '[<span class="st">"abc"</span>]</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>'["abc"] :: [Symbol]</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">'False</span> '<span class="op">:</span> <span class="dt">'True</span> '<span class="op">:</span> '[]</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>'False ': 'True ': '[] :: [Bool]</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind '(<span class="dv">6</span>, <span class="st">"x"</span>, <span class="dt">'False</span>)</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a>'(6, "x", 'False) :: (Nat, Symbol, Bool)</span></code></pre></div>
<h3 id="type-level-functions">Type-level Functions</h3>
<ul>
<li>With the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_families.html#extension-TypeFamilies" target="_blank" rel="noopener"><code>TypeFamilies</code></a> extension, it’s possible to write new type-level functions as closed type families:</li>
</ul>
<div class="sourceCode" id="cb17" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeFamilies</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">type</span> <span class="kw">family</span> <span class="dt">And</span> (<span class="ot">x ::</span> <span class="dt">Bool</span>) (<span class="ot">y ::</span> <span class="dt">Bool</span>)<span class="ot"> ::</span> <span class="dt">Bool</span> <span class="kw">where</span></span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">And</span> <span class="dt">'True</span> <span class="dt">'True</span> <span class="ot">=</span> <span class="dt">'True</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">And</span> _ _ <span class="ot">=</span> <span class="dt">'False</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind <span class="dt">And</span></span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>And :: Bool -> Bool -> Bool</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">'True</span> <span class="dt">'False</span></span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a>And 'True 'False :: Bool</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>= 'False</span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">'True</span> <span class="dt">'True</span></span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>And 'True 'True :: Bool</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a>= 'True</span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">And</span> <span class="dt">'False</span> <span class="dt">'True</span></span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a>And 'False 'True :: Bool</span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a>= 'False</span></code></pre></div>
<div class="page-break">
</div>
<h2 data-track-content data-content-name="chapter-3.-variance" data-content-piece="twt-notes-1" id="chapter-3.-variance">Chapter 3. Variance</h2>
<ul>
<li>There are three types of <em>Variance</em> (<code class="sourceCode haskell"><span class="dt">T</span></code> here a type of kind <code>* -> *</code>):
<ul>
<li>Covariant: any function of type <code class="sourceCode haskell">a <span class="ot">-></span> b</code> can be lifted into a function of type <code class="sourceCode haskell"><span class="dt">T</span> a <span class="ot">-></span> <span class="dt">T</span> b</code>. Covariant types are instances of the <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Functor" target="_blank" rel="noopener"><code>Functor</code></a> typeclass:</li>
</ul>
<div class="sourceCode" id="cb18" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Functor</span> f <span class="kw">where</span></span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> fmap ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> f a <span class="ot">-></span> f b</span></code></pre></div>
<ul>
<li>Contravariant: any function of type <code class="sourceCode haskell">a <span class="ot">-></span> b</code> can be lifted into a function of type <code class="sourceCode haskell"><span class="dt">T</span> b <span class="ot">-></span> <span class="dt">T</span> a</code>. Contravariant functions are instances of the <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Contravariant.html" target="_blank" rel="noopener"><code>Contravariant</code></a> typeclass:</li>
</ul>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Contravariant</span> f <span class="kw">where</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> contramap ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> f b <span class="ot">-></span> f a</span></code></pre></div>
<ul>
<li>Invariant: no function of type <code class="sourceCode haskell">a <span class="ot">-></span> b</code> can be lifted into a function of type <code class="sourceCode haskell"><span class="dt">T</span> a</code>. Invariant functions are instances of the <a href="https://hackage.haskell.org/package/invariant/docs/Data-Functor-Invariant.html#t:Invariant" target="_blank" rel="noopener"><code>Invariant</code></a> typeclass:</li>
</ul>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> <span class="dt">Invariant</span> f <span class="kw">where</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="ot"> invmap ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> (b <span class="ot">-></span> a) <span class="ot">-></span> f a <span class="ot">-></span> f b</span></code></pre></div></li>
<li>Variance of a type <code class="sourceCode haskell"><span class="dt">T</span></code> is specified with respect to a particular type parameter. A type <code class="sourceCode haskell"><span class="dt">T</span></code> with two parameters <code>a</code> and <code>b</code> could be covariant wrt. <code>a</code> and contravariant wrt. <code>b</code>.</li>
<li>Variance of a type <code class="sourceCode haskell"><span class="dt">T</span></code> wrt. a particular type parameter is determined by whether the parameter appears in positive or negative <em>position</em>s.
<ul>
<li>If a type parameter appears on the left-hand side of a function, it is said to be in a negative position. Else it is said to be in a positive position.</li>
<li>If a type parameter appears only in positive positions then the type is covariant wrt. that parameter.</li>
<li>If a type parameter appears only in negative positions then the type is contravariant wrt. that parameter.</li>
<li>If a type parameter appears in both positive and negative positions then the type is invariant wrt. that parameter.</li>
<li>positions follow the laws of multiplication for their <em>signs</em>.</li>
</ul></li>
</ul>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">a</th>
<th style="text-align: left;">b</th>
<th style="text-align: left;">a * b</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">+</td>
<td style="text-align: left;">+</td>
<td style="text-align: left;">+</td>
</tr>
<tr>
<td style="text-align: left;">+</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">-</td>
</tr>
<tr>
<td style="text-align: left;">-</td>
<td style="text-align: left;">+</td>
<td style="text-align: left;">-</td>
</tr>
<tr>
<td style="text-align: left;">-</td>
<td style="text-align: left;">-</td>
<td style="text-align: left;">+</td>
</tr>
</tbody>
</table>
</div>
<ul>
<li>Examples:</li>
</ul>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">T1</span> a <span class="ot">=</span> <span class="dt">T1</span> (<span class="dt">Int</span> <span class="ot">-></span> a)</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="co">-- a is in +ve position, T1 is covariant wrt. a.</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">T2</span> a <span class="ot">=</span> <span class="dt">T2</span> (a <span class="ot">-></span> <span class="dt">Int</span>)</span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a><span class="co">-- a is in -ve position, T2 is contravariant wrt. a.</span></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">T3</span> a <span class="ot">=</span> <span class="dt">T3</span> (a <span class="ot">-></span> a)</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a><span class="co">-- a is in both -ve and +ve position. T3 is invariant wrt. a.</span></span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">T4</span> a <span class="ot">=</span> <span class="dt">T4</span> ((<span class="dt">Int</span> <span class="ot">-></span> a) <span class="ot">-></span> <span class="dt">Int</span>)</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="co">-- a is in +ve position but (Int -> a) is in -ve position.</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a><span class="co">-- So a is in -ve position overall. T4 is contravariant wrt. a.</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a><span class="kw">newtype</span> <span class="dt">T5</span> a <span class="ot">=</span> <span class="dt">T5</span> ((a <span class="ot">-></span> <span class="dt">Int</span>) <span class="ot">-></span> <span class="dt">Int</span>)</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a><span class="co">-- a is in -ve position but (a -> Int) is in -ve position.</span></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a><span class="co">-- So a is in +ve position overall. T5 is covariant wrt. a.</span></span></code></pre></div>
<ul>
<li>Covariant parameters are said to be <em>produced</em> or <em>owned</em> by the type.</li>
<li>Contravariant parameters are said to be <em>consumed</em> by the type.</li>
<li>A type that has two parameters and is covariant in both of them is an instance of <a href="https://hackage.haskell.org/package/base/docs/Data-Bifunctor.html#t:Bifunctor" target="_blank" rel="noopener"><code>BiFunctor</code></a>.</li>
<li>A type that has two parameters and is contravariant in first parameter and covariant in second parameter is an instance of <a href="https://hackage.haskell.org/package/profunctors/docs/Data-Profunctor.html#t:Profunctor" target="_blank" rel="noopener"><code>Profunctor</code></a>.</li>
</ul>
<h2 data-track-content data-content-name="chapter-4.-working-with-types" data-content-piece="twt-notes-1" id="chapter-4.-working-with-types">Chapter 4. Working with Types</h2>
<ul>
<li>Standard Haskell has no notion of scopes for types.</li>
<li><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/scoped_type_variables.html#extension-ScopedTypeVariables" target="_blank" rel="noopener"><code>ScopedTypeVariables</code></a> extension lets us bind type variables to a scope. It requires an explicitly <code>forall</code> quantifier in type signatures.</li>
</ul>
<div class="sourceCode" id="cb22" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- This does not compile.</span></span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a><span class="ot">> comp ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> (b <span class="ot">-></span> c) <span class="ot">-></span> a <span class="ot">-></span> c</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> comp f g a <span class="ot">=</span> go f</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">where</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="ot">> go ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> c</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> go f' <span class="ot">=</span> g (f' a)</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">11</span><span class="op">:</span><span class="dv">11</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> • Couldn't match expected type ‘c1’ with actual type ‘c’</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a> ‘c1’ is a rigid type variable bound by</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a> go :: forall a1 b1 c1. (a1 -> b1) -> c1</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a> at <interactive>:10:3-21</span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a> ‘c’ is a rigid type variable bound by</span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a> comp :: forall a b c. (a -> b) -> (b -> c) -> a -> c</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a> at <interactive>:7:1-38</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a> • In the expression: g (f' a)</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">11</span><span class="op">:</span><span class="dv">14</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a> • Couldn't match expected type ‘b’ with actual type ‘b1’</span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a> ‘b1’ is a rigid type variable bound by</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a> go :: forall a1 b1 c1. (a1 -> b1) -> c1</span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a> at <interactive>:10:3-21</span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a> ‘b’ is a rigid type variable bound by</span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a> comp :: forall a b c. (a -> b) -> (b -> c) -> a -> c</span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a> at <interactive>:7:1-38</span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a> • In the first argument of ‘g’, namely ‘(f' a)’</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">11</span><span class="op">:</span><span class="dv">17</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a> • Couldn't match expected type ‘a1’ with actual type ‘a’</span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a> ‘a1’ is a rigid type variable bound by</span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a> go :: forall a1 b1 c1. (a1 -> b1) -> c1</span>
<span id="cb22-39"><a href="#cb22-39" aria-hidden="true" tabindex="-1"></a> at <interactive>:10:3-21</span>
<span id="cb22-40"><a href="#cb22-40" aria-hidden="true" tabindex="-1"></a> ‘a’ is a rigid type variable bound by</span>
<span id="cb22-41"><a href="#cb22-41" aria-hidden="true" tabindex="-1"></a> the type signature for:</span>
<span id="cb22-42"><a href="#cb22-42" aria-hidden="true" tabindex="-1"></a> comp :: forall a b c. (a -> b) -> (b -> c) -> a -> c</span>
<span id="cb22-43"><a href="#cb22-43" aria-hidden="true" tabindex="-1"></a> at <interactive>:7:1-38</span>
<span id="cb22-44"><a href="#cb22-44" aria-hidden="true" tabindex="-1"></a> • In the first argument of ‘f'’, namely ‘a’</span>
<span id="cb22-45"><a href="#cb22-45" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-46"><a href="#cb22-46" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- But this does.</span></span>
<span id="cb22-47"><a href="#cb22-47" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XScopedTypeVariables</span></span>
<span id="cb22-48"><a href="#cb22-48" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb22-49"><a href="#cb22-49" aria-hidden="true" tabindex="-1"></a><span class="ot">> comp ::</span> <span class="kw">forall</span> a b c<span class="op">.</span> (a <span class="ot">-></span> b) <span class="ot">-></span> (b <span class="ot">-></span> c) <span class="ot">-></span> a <span class="ot">-></span> c</span>
<span id="cb22-50"><a href="#cb22-50" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> comp f g a <span class="ot">=</span> go f</span>
<span id="cb22-51"><a href="#cb22-51" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">where</span></span>
<span id="cb22-52"><a href="#cb22-52" aria-hidden="true" tabindex="-1"></a><span class="ot">> go ::</span> (a <span class="ot">-></span> b) <span class="ot">-></span> c</span>
<span id="cb22-53"><a href="#cb22-53" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> go f' <span class="ot">=</span> g (f' a)</span>
<span id="cb22-54"><a href="#cb22-54" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span></code></pre></div>
<ul>
<li><a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/type_applications.html#extension-TypeApplications" target="_blank" rel="noopener"><code>TypeApplications</code></a> extension lets us directly apply types to expressions:</li>
</ul>
<div class="sourceCode" id="cb23" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeApplications</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span></span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a>traverse</span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a> :: (Traversable t, Applicative f) =></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a> (a -> f b) -> t a -> f (t b)</span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span> <span class="op">@</span><span class="dt">Maybe</span></span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a>traverse @Maybe</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> :: Applicative f =></span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> (a -> f b) -> Maybe a -> f (Maybe b)</span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span> <span class="op">@</span><span class="dt">Maybe</span> <span class="op">@</span>[]</span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a>traverse @Maybe @[]</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> :: (a -> [b]) -> Maybe a -> [Maybe b]</span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span> <span class="op">@</span><span class="dt">Maybe</span> <span class="op">@</span>[] <span class="op">@</span><span class="dt">Int</span></span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a>traverse @Maybe @[] @Int</span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"></a> :: (Int -> [b]) -> Maybe Int -> [Maybe b]</span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span> <span class="op">@</span><span class="dt">Maybe</span> <span class="op">@</span>[] <span class="op">@</span><span class="dt">Int</span> <span class="op">@</span><span class="dt">String</span></span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"></a>traverse @Maybe @[] @Int @String</span>
<span id="cb23-18"><a href="#cb23-18" aria-hidden="true" tabindex="-1"></a> :: (Int -> [String]) -> Maybe Int -> [Maybe String]</span></code></pre></div>
<ul>
<li>Types are applied in the order they appear in the type signature. It is possible to avoid applying types by using a type with an underscore: <code>@_</code></li>
</ul>
<div class="sourceCode" id="cb24" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="fu">traverse</span> <span class="op">@</span><span class="dt">Maybe</span> <span class="op">@</span>_ <span class="op">@</span>_ <span class="op">@</span><span class="dt">String</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>traverse @Maybe @_ @_ @String</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> :: Applicative w1 =></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> (w2 -> w1 String) -> Maybe w2 -> w1 (Maybe String)</span></code></pre></div>
<ul>
<li>Sometimes the compiler cannot infer the type of an expression. <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/ambiguous_types.html#extension-AllowAmbiguousTypes" target="_blank" rel="noopener"><code>AllowAmbiguousTypes</code></a> extension allow such programs to compile.</li>
</ul>
<div class="sourceCode" id="cb25" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XScopedTypeVariables</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a><span class="ot">> f ::</span> <span class="kw">forall</span> a<span class="op">.</span> <span class="dt">Show</span> a <span class="ot">=></span> <span class="dt">Bool</span></span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> f <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">7</span><span class="op">:</span><span class="dv">6</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a> • Could not deduce (Show a0)</span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> from the context: Show a</span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> bound by the type signature for:</span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a> f :: forall a. Show a => Bool</span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> at <interactive>:7:6-29</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a> The type variable ‘a0’ is ambiguous</span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a> • In the ambiguity check for ‘f’</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a> To defer the ambiguity check to use sites, enable AllowAmbiguousTypes</span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> In the type signature: f :: forall a. Show a => Bool</span></code></pre></div>
<ul>
<li><code class="sourceCode haskell"><span class="dt">Proxy</span></code> is a type isomorphic to <code>()</code> except with a phantom type parameter:</li>
</ul>
<div class="sourceCode" id="cb26" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Proxy</span> a <span class="ot">=</span> <span class="dt">Proxy</span></span></code></pre></div>
<ul>
<li>With all the three extensions enabled, it is possible to get a term-level representation of types using the <a href="https://hackage.haskell.org/package/base/docs/Data-Typeable.html" target="_blank" rel="noopener"><code>Data.Typeable</code></a> module:</li>
</ul>
<div class="sourceCode" id="cb27" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XScopedTypeVariables</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeApplications</span></span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XAllowAmbiguousTypes</span></span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>m <span class="op">+</span><span class="dt">Data.Typeable</span></span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a><span class="ot">> typeName ::</span> <span class="kw">forall</span> a<span class="op">.</span> <span class="dt">Typeable</span> a <span class="ot">=></span> <span class="dt">String</span></span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> typeName <span class="ot">=</span> <span class="fu">show</span> <span class="op">.</span> typeRep <span class="op">$</span> <span class="dt">Proxy</span> <span class="op">@</span>a</span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb27-9"><a href="#cb27-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> typeName <span class="op">@</span><span class="dt">String</span></span>
<span id="cb27-10"><a href="#cb27-10" aria-hidden="true" tabindex="-1"></a>"[Char]"</span>
<span id="cb27-11"><a href="#cb27-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> typeName <span class="op">@</span>(<span class="dt">IO</span> <span class="dt">Int</span>)</span>
<span id="cb27-12"><a href="#cb27-12" aria-hidden="true" tabindex="-1"></a>"IO Int"</span></code></pre></div>
<h2 data-track-content data-content-name="chapter-5.-constraints-and-gadts" data-content-piece="twt-notes-1" id="chapter-5.-constraints-and-gadts">Chapter 5. Constraints and GADTs</h2>
<h3 id="constraints">Constraints</h3>
<ul>
<li><em>Constraints</em> are a kind different than the types (<code class="sourceCode haskell"><span class="op">*</span></code>).</li>
<li>Constraints are what appear on the left-hand side on the fat context arrow <code class="sourceCode haskell"><span class="ot">=></span></code>, like <code class="sourceCode haskell"><span class="dt">Show</span> a</code>.</li>
</ul>
<div class="sourceCode" id="cb28" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>k <span class="dt">Show</span></span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a>Show :: * -> Constraint</span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>k <span class="dt">Show</span> <span class="dt">Int</span></span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a>Show Int :: Constraint</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>k (<span class="dt">Show</span> <span class="dt">Int</span>, <span class="dt">Eq</span> <span class="dt">String</span>)</span>
<span id="cb28-6"><a href="#cb28-6" aria-hidden="true" tabindex="-1"></a>(Show Int, Eq String) :: Constraint</span></code></pre></div>
<ul>
<li>Type equalities <code class="sourceCode haskell">(<span class="dt">Int</span> <span class="op">~</span> a)</code> are another way of creating Constraints. <code class="sourceCode haskell">(<span class="dt">Int</span> <span class="op">~</span> a)</code> says <code>a</code> is same as <code class="sourceCode haskell"><span class="dt">Int</span></code>.</li>
<li>Type equalities are
<ul>
<li>reflexive: <code>a ~ a</code> always</li>
<li>symmetrical: <code>a ~ b</code> implies <code>b ~ a</code></li>
<li>transitive: <code>a ~ b</code> and <code>b ~ c</code> implies <code>a ~ c</code></li>
</ul></li>
</ul>
<h3 id="gadts">GADTs</h3>
<ul>
<li><em>GADTs</em> are Generalized Algebraic DataTypes. They allow writing explicit type signatures for data constructors. Here is the code for a length-typed list using GADTs:</li>
</ul>
<div class="sourceCode" id="cb29" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XGADTs</span></span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XKindSignatures</span></span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeOperators</span></span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>m <span class="op">+</span><span class="dt">GHC.TypeLits</span></span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">data</span> <span class="dt">List</span> (<span class="ot">a ::</span> <span class="op">*</span>) (<span class="ot">n ::</span> <span class="dt">Nat</span>) <span class="kw">where</span></span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Nil</span><span class="ot"> ::</span> <span class="dt">List</span> a <span class="dv">0</span></span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a><span class="ot">> (:~) ::</span> a <span class="ot">-></span> <span class="dt">List</span> a n <span class="ot">-></span> <span class="dt">List</span> a (n <span class="op">+</span> <span class="dv">1</span>)</span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">infixr</span> <span class="dv">5</span> <span class="op">:~</span></span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">Nil</span></span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a>Nil :: List a 0</span>
<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="ch">'a'</span> <span class="op">:~</span> <span class="dt">Nil</span></span>
<span id="cb29-15"><a href="#cb29-15" aria-hidden="true" tabindex="-1"></a>'a' :~ Nil :: List Char 1</span>
<span id="cb29-16"><a href="#cb29-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="ch">'b'</span> <span class="op">:~</span> <span class="ch">'a'</span> <span class="op">:~</span> <span class="dt">Nil</span></span>
<span id="cb29-17"><a href="#cb29-17" aria-hidden="true" tabindex="-1"></a>'b' :~ 'a' :~ Nil :: List Char 2</span>
<span id="cb29-18"><a href="#cb29-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">True</span> <span class="op">:~</span> <span class="ch">'a'</span> <span class="op">:~</span> <span class="dt">Nil</span></span>
<span id="cb29-19"><a href="#cb29-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb29-20"><a href="#cb29-20" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">1</span><span class="op">:</span><span class="dv">9</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb29-21"><a href="#cb29-21" aria-hidden="true" tabindex="-1"></a> • Couldn't match type ‘Char’ with ‘Bool’</span>
<span id="cb29-22"><a href="#cb29-22" aria-hidden="true" tabindex="-1"></a> Expected type: List Bool 1</span>
<span id="cb29-23"><a href="#cb29-23" aria-hidden="true" tabindex="-1"></a> Actual type: List Char (0 + 1)</span>
<span id="cb29-24"><a href="#cb29-24" aria-hidden="true" tabindex="-1"></a> • In the second argument of ‘(:~)’, namely ‘'a' :~ Nil’</span>
<span id="cb29-25"><a href="#cb29-25" aria-hidden="true" tabindex="-1"></a> In the expression: True :~ 'a' :~ Nil</span></code></pre></div>
<ul>
<li>GADTs are just syntactic sugar for ADTs with type equalities. The above definition is equivalent to:</li>
</ul>
<div class="sourceCode" id="cb30" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XGADTs</span></span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XKindSignatures</span></span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeOperators</span></span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>m <span class="op">+</span><span class="dt">GHC.TypeLits</span></span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">data</span> <span class="dt">List</span> (<span class="ot">a ::</span> <span class="op">*</span>) (<span class="ot">n ::</span> <span class="dt">Nat</span>)</span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="ot">=</span> (n <span class="op">~</span> <span class="dv">0</span>) <span class="ot">=></span> <span class="dt">Nil</span></span>
<span id="cb30-9"><a href="#cb30-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">|</span> a <span class="op">:~</span> <span class="dt">List</span> a (n <span class="op">-</span> <span class="dv">1</span>)</span>
<span id="cb30-10"><a href="#cb30-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">infixr</span> <span class="dv">5</span> <span class="op">:~</span></span>
<span id="cb30-11"><a href="#cb30-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb30-12"><a href="#cb30-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="ch">'a'</span> <span class="op">:~</span> <span class="dt">Nil</span></span>
<span id="cb30-13"><a href="#cb30-13" aria-hidden="true" tabindex="-1"></a>'a' :~ Nil :: List Char 1</span>
<span id="cb30-14"><a href="#cb30-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="ch">'b'</span> <span class="op">:~</span> <span class="ch">'a'</span> <span class="op">:~</span> <span class="dt">Nil</span></span>
<span id="cb30-15"><a href="#cb30-15" aria-hidden="true" tabindex="-1"></a>'b' :~ 'a' :~ Nil :: List Char 2</span></code></pre></div>
<ul>
<li>Type-safety of this list can be used to write a safe <code class="sourceCode haskell"><span class="fu">head</span></code> function which does not compile for an empty list:</li>
</ul>
<div class="sourceCode" id="cb31" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb31-1"><a href="#cb31-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb31-2"><a href="#cb31-2" aria-hidden="true" tabindex="-1"></a><span class="ot">> safeHead ::</span> <span class="dt">List</span> a (n <span class="op">+</span> <span class="dv">1</span>) <span class="ot">-></span> a</span>
<span id="cb31-3"><a href="#cb31-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> safeHead (x <span class="op">:~</span> _) <span class="ot">=</span> x</span>
<span id="cb31-4"><a href="#cb31-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb31-5"><a href="#cb31-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> safeHead (<span class="ch">'a'</span> <span class="op">:~</span> <span class="ch">'b'</span> <span class="op">:~</span> <span class="dt">Nil</span>)</span>
<span id="cb31-6"><a href="#cb31-6" aria-hidden="true" tabindex="-1"></a>'a'</span>
<span id="cb31-7"><a href="#cb31-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> safeHead <span class="dt">Nil</span></span>
<span id="cb31-8"><a href="#cb31-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb31-9"><a href="#cb31-9" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">21</span><span class="op">:</span><span class="dv">10</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb31-10"><a href="#cb31-10" aria-hidden="true" tabindex="-1"></a> • Couldn't match type ‘1’ with ‘0’</span>
<span id="cb31-11"><a href="#cb31-11" aria-hidden="true" tabindex="-1"></a> Expected type: List a (0 + 1)</span>
<span id="cb31-12"><a href="#cb31-12" aria-hidden="true" tabindex="-1"></a> Actual type: List a 0</span>
<span id="cb31-13"><a href="#cb31-13" aria-hidden="true" tabindex="-1"></a> • In the first argument of ‘safeHead’, namely ‘Nil’</span>
<span id="cb31-14"><a href="#cb31-14" aria-hidden="true" tabindex="-1"></a> In the expression: safeHead Nil</span>
<span id="cb31-15"><a href="#cb31-15" aria-hidden="true" tabindex="-1"></a> In an equation for ‘it’: it = safeHead Nil</span></code></pre></div>
<h3 id="heterogeneous-lists">Heterogeneous Lists</h3>
<p>We can use GADTs to build heterogeneous lists which can store values of different types and are type-safe to use.</p>
<p>First, the required extensions and imports:</p>
<div class="sourceCode" id="cb32" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE KindSignatures #-}</span></span>
<span id="cb32-2"><a href="#cb32-2" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE DataKinds #-}</span></span>
<span id="cb32-3"><a href="#cb32-3" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TypeOperators #-}</span></span>
<span id="cb32-4"><a href="#cb32-4" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE GADTs #-}</span></span>
<span id="cb32-5"><a href="#cb32-5" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE FlexibleInstances #-}</span></span>
<span id="cb32-6"><a href="#cb32-6" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE FlexibleContexts #-}</span></span>
<span id="cb32-7"><a href="#cb32-7" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE TypeApplications #-}</span></span>
<span id="cb32-8"><a href="#cb32-8" aria-hidden="true" tabindex="-1"></a><span class="ot">{-# LANGUAGE ScopedTypeVariables #-}</span></span>
<span id="cb32-9"><a href="#cb32-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb32-10"><a href="#cb32-10" aria-hidden="true" tabindex="-1"></a><span class="kw">module</span> <span class="dt">HList</span> <span class="kw">where</span></span>
<span id="cb32-11"><a href="#cb32-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb32-12"><a href="#cb32-12" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="dt">Data.Typeable</span></span></code></pre></div>
<p><code class="sourceCode haskell"><span class="dt">HList</span></code> is defined as a GADT:</p>
<div class="sourceCode" id="cb33" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb33-1"><a href="#cb33-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">HList</span> (<span class="ot">ts ::</span> [<span class="op">*</span>]) <span class="kw">where</span></span>
<span id="cb33-2"><a href="#cb33-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">HNil</span><span class="ot"> ::</span> <span class="dt">HList</span> '[]</span>
<span id="cb33-3"><a href="#cb33-3" aria-hidden="true" tabindex="-1"></a><span class="ot"> (:#) ::</span> t <span class="ot">-></span> <span class="dt">HList</span> ts <span class="ot">-></span> <span class="dt">HList</span> (t '<span class="op">:</span> ts)</span>
<span id="cb33-4"><a href="#cb33-4" aria-hidden="true" tabindex="-1"></a><span class="kw">infixr</span> <span class="dv">5</span> <span class="op">:#</span></span></code></pre></div>
<p>Example usage:</p>
<div class="sourceCode" id="cb34" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">HNil</span></span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a>HNil :: HList '[]</span>
<span id="cb34-3"><a href="#cb34-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb34-4"><a href="#cb34-4" aria-hidden="true" tabindex="-1"></a>'a' :# HNil :: HList '[Char]</span>
<span id="cb34-5"><a href="#cb34-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span><span class="kw">type</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb34-6"><a href="#cb34-6" aria-hidden="true" tabindex="-1"></a>True :# 'a' :# HNil :: HList '[Bool, Char]</span></code></pre></div>
<p>We can write operations on <code class="sourceCode haskell"><span class="dt">HList</span></code>:</p>
<div class="sourceCode" id="cb35" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb35-1"><a href="#cb35-1" aria-hidden="true" tabindex="-1"></a><span class="ot">hLength ::</span> <span class="dt">HList</span> ts <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb35-2"><a href="#cb35-2" aria-hidden="true" tabindex="-1"></a>hLength <span class="dt">HNil</span> <span class="ot">=</span> <span class="dv">0</span></span>
<span id="cb35-3"><a href="#cb35-3" aria-hidden="true" tabindex="-1"></a>hLength (x <span class="op">:#</span> xs) <span class="ot">=</span> <span class="dv">1</span> <span class="op">+</span> hLength xs</span>
<span id="cb35-4"><a href="#cb35-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb35-5"><a href="#cb35-5" aria-hidden="true" tabindex="-1"></a><span class="ot">hHead ::</span> <span class="dt">HList</span> (t '<span class="op">:</span> ts) <span class="ot">-></span> t</span>
<span id="cb35-6"><a href="#cb35-6" aria-hidden="true" tabindex="-1"></a>hHead (t <span class="op">:#</span> _) <span class="ot">=</span> t</span></code></pre></div>
<p>Example usage:</p>
<div class="sourceCode" id="cb36" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> hLength <span class="op">$</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb36-2"><a href="#cb36-2" aria-hidden="true" tabindex="-1"></a>2</span>
<span id="cb36-3"><a href="#cb36-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> hHead <span class="op">$</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb36-4"><a href="#cb36-4" aria-hidden="true" tabindex="-1"></a>True</span>
<span id="cb36-5"><a href="#cb36-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> hHead <span class="dt">HNil</span></span>
<span id="cb36-6"><a href="#cb36-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb36-7"><a href="#cb36-7" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">7</span><span class="op">:</span><span class="dv">7</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb36-8"><a href="#cb36-8" aria-hidden="true" tabindex="-1"></a> • Couldn't match type ‘'[]’ with ‘t : ts0’</span>
<span id="cb36-9"><a href="#cb36-9" aria-hidden="true" tabindex="-1"></a> Expected type: HList (t : ts0)</span>
<span id="cb36-10"><a href="#cb36-10" aria-hidden="true" tabindex="-1"></a> Actual type: HList '[]</span>
<span id="cb36-11"><a href="#cb36-11" aria-hidden="true" tabindex="-1"></a> • In the first argument of ‘hHead’, namely ‘HNil’</span>
<span id="cb36-12"><a href="#cb36-12" aria-hidden="true" tabindex="-1"></a> In the expression: hHead HNil</span>
<span id="cb36-13"><a href="#cb36-13" aria-hidden="true" tabindex="-1"></a> In an equation for ‘it’: it = hHead HNil</span>
<span id="cb36-14"><a href="#cb36-14" aria-hidden="true" tabindex="-1"></a> • Relevant bindings include it :: t (bound at <interactive>:7:1)</span></code></pre></div>
<p>We need to define instances of typeclasses like <code class="sourceCode haskell"><span class="dt">Eq</span></code>, <code class="sourceCode haskell"><span class="dt">Ord</span></code> etc. for <code class="sourceCode haskell"><span class="dt">HList</span></code> because GHC cannot derive them automatically yet:</p>
<div class="sourceCode" id="cb37" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb37-1"><a href="#cb37-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Eq</span> (<span class="dt">HList</span> '[]) <span class="kw">where</span></span>
<span id="cb37-2"><a href="#cb37-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">HNil</span> <span class="op">==</span> <span class="dt">HNil</span> <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb37-3"><a href="#cb37-3" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> (<span class="dt">Eq</span> t, <span class="dt">Eq</span> (<span class="dt">HList</span> ts))</span>
<span id="cb37-4"><a href="#cb37-4" aria-hidden="true" tabindex="-1"></a> <span class="ot">=></span> <span class="dt">Eq</span> (<span class="dt">HList</span> (t '<span class="op">:</span> ts)) <span class="kw">where</span></span>
<span id="cb37-5"><a href="#cb37-5" aria-hidden="true" tabindex="-1"></a> (x <span class="op">:#</span> xs) <span class="op">==</span> (y <span class="op">:#</span> ys) <span class="ot">=</span></span>
<span id="cb37-6"><a href="#cb37-6" aria-hidden="true" tabindex="-1"></a> x <span class="op">==</span> y <span class="op">&&</span> xs <span class="op">==</span> ys</span>
<span id="cb37-7"><a href="#cb37-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-8"><a href="#cb37-8" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Ord</span> (<span class="dt">HList</span> '[]) <span class="kw">where</span></span>
<span id="cb37-9"><a href="#cb37-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">HNil</span> <span class="ot">`compare`</span> <span class="dt">HNil</span> <span class="ot">=</span> <span class="dt">EQ</span></span>
<span id="cb37-10"><a href="#cb37-10" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> (<span class="dt">Ord</span> t, <span class="dt">Ord</span> (<span class="dt">HList</span> ts))</span>
<span id="cb37-11"><a href="#cb37-11" aria-hidden="true" tabindex="-1"></a> <span class="ot">=></span> <span class="dt">Ord</span> (<span class="dt">HList</span> (t '<span class="op">:</span> ts)) <span class="kw">where</span></span>
<span id="cb37-12"><a href="#cb37-12" aria-hidden="true" tabindex="-1"></a> (x <span class="op">:#</span> xs) <span class="ot">`compare`</span> (y <span class="op">:#</span> ys) <span class="ot">=</span></span>
<span id="cb37-13"><a href="#cb37-13" aria-hidden="true" tabindex="-1"></a> x <span class="ot">`compare`</span> y <span class="op"><></span> xs <span class="ot">`compare`</span> ys</span>
<span id="cb37-14"><a href="#cb37-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb37-15"><a href="#cb37-15" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Show</span> (<span class="dt">HList</span> '[]) <span class="kw">where</span></span>
<span id="cb37-16"><a href="#cb37-16" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> <span class="dt">HNil</span> <span class="ot">=</span> <span class="st">"[]"</span></span>
<span id="cb37-17"><a href="#cb37-17" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> (<span class="dt">Typeable</span> t, <span class="dt">Show</span> t, <span class="dt">Show</span> (<span class="dt">HList</span> ts))</span>
<span id="cb37-18"><a href="#cb37-18" aria-hidden="true" tabindex="-1"></a> <span class="ot">=></span> <span class="dt">Show</span> (<span class="dt">HList</span> (t '<span class="op">:</span> ts)) <span class="kw">where</span></span>
<span id="cb37-19"><a href="#cb37-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> (x <span class="op">:#</span> xs) <span class="ot">=</span></span>
<span id="cb37-20"><a href="#cb37-20" aria-hidden="true" tabindex="-1"></a> <span class="fu">show</span> x </span>
<span id="cb37-21"><a href="#cb37-21" aria-hidden="true" tabindex="-1"></a> <span class="op">++</span> <span class="st">"@"</span> <span class="op">++</span> <span class="fu">show</span> (typeRep (<span class="dt">Proxy</span> <span class="op">@</span>t))</span>
<span id="cb37-22"><a href="#cb37-22" aria-hidden="true" tabindex="-1"></a> <span class="op">++</span> <span class="st">" :# "</span> <span class="op">++</span> <span class="fu">show</span> xs</span></code></pre></div>
<p>The instances are defined recursively: one for the base case and one for the inductive case.</p>
<p>Example usage:</p>
<div class="sourceCode" id="cb38" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span> <span class="op">==</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb38-2"><a href="#cb38-2" aria-hidden="true" tabindex="-1"></a>True</span>
<span id="cb38-3"><a href="#cb38-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span> <span class="op">==</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'b'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb38-4"><a href="#cb38-4" aria-hidden="true" tabindex="-1"></a>False</span>
<span id="cb38-5"><a href="#cb38-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span> <span class="op">==</span> <span class="dt">True</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb38-6"><a href="#cb38-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb38-7"><a href="#cb38-7" aria-hidden="true" tabindex="-1"></a><span class="ot"><</span>interactive<span class="op">>:</span><span class="dv">17</span><span class="op">:</span><span class="dv">24</span><span class="op">:</span> <span class="fu">error</span><span class="op">:</span></span>
<span id="cb38-8"><a href="#cb38-8" aria-hidden="true" tabindex="-1"></a> • Couldn't match type ‘'[]’ with ‘'[Char]’</span>
<span id="cb38-9"><a href="#cb38-9" aria-hidden="true" tabindex="-1"></a> Expected type: HList '[Bool, Char]</span>
<span id="cb38-10"><a href="#cb38-10" aria-hidden="true" tabindex="-1"></a> Actual type: HList '[Bool]</span>
<span id="cb38-11"><a href="#cb38-11" aria-hidden="true" tabindex="-1"></a> • In the second argument of ‘(==)’, namely ‘True :# HNil’</span>
<span id="cb38-12"><a href="#cb38-12" aria-hidden="true" tabindex="-1"></a> In the expression: True :# 'a' :# HNil == True :# HNil</span>
<span id="cb38-13"><a href="#cb38-13" aria-hidden="true" tabindex="-1"></a> In an equation for ‘it’: it = True :# 'a' :# HNil == True :# HNil</span>
<span id="cb38-14"><a href="#cb38-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">show</span> <span class="op">$</span> <span class="dt">True</span> <span class="op">:#</span> <span class="ch">'a'</span> <span class="op">:#</span> <span class="dt">HNil</span></span>
<span id="cb38-15"><a href="#cb38-15" aria-hidden="true" tabindex="-1"></a>"True@Bool :# 'a'@Char :# []"</span></code></pre></div>
<h3 id="creating-new-constraints">Creating New Constraints</h3>
<ul>
<li>Type families can be used to create new Constraints:</li>
</ul>
<div class="sourceCode" id="cb39" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XKindSignatures</span></span>
<span id="cb39-2"><a href="#cb39-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XDataKinds</span></span>
<span id="cb39-3"><a href="#cb39-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeOperators</span></span>
<span id="cb39-4"><a href="#cb39-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XTypeFamilies</span></span>
<span id="cb39-5"><a href="#cb39-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>m <span class="op">+</span><span class="dt">Data.Constraint</span></span>
<span id="cb39-6"><a href="#cb39-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb39-7"><a href="#cb39-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">type</span> <span class="kw">family</span> <span class="dt">AllEq</span> (<span class="ot">ts ::</span> [<span class="op">*</span>])<span class="ot"> ::</span> <span class="dt">Constraint</span> <span class="kw">where</span></span>
<span id="cb39-8"><a href="#cb39-8" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">AllEq</span> '[] <span class="ot">=</span> ()</span>
<span id="cb39-9"><a href="#cb39-9" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">AllEq</span> (t '<span class="op">:</span> ts) <span class="ot">=</span> (<span class="dt">Eq</span> t, <span class="dt">AllEq</span> ts)</span>
<span id="cb39-10"><a href="#cb39-10" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb39-11"><a href="#cb39-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>kind<span class="op">!</span> <span class="dt">AllEq</span> '[<span class="dt">Bool</span>, <span class="dt">Char</span>]</span>
<span id="cb39-12"><a href="#cb39-12" aria-hidden="true" tabindex="-1"></a>AllEq '[Bool, Char] :: Constraint</span>
<span id="cb39-13"><a href="#cb39-13" aria-hidden="true" tabindex="-1"></a>= (Eq Bool, (Eq Char, () :: Constraint))</span></code></pre></div>
<ul>
<li><code class="sourceCode haskell"><span class="dt">AllEq</span></code> is a type-level function from a list of types to a constraint.</li>
<li>With the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/constraint_kind.html#extension-ConstraintKinds" target="_blank" rel="noopener"><code>ConstraintKinds</code></a> extension, <code class="sourceCode haskell"><span class="dt">AllEq</span></code> can be made polymorphic over all constraints instead of just <code class="sourceCode haskell"><span class="dt">Eq</span></code>:</li>
</ul>
<div class="sourceCode" id="cb40" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>set <span class="op">-</span><span class="dt">XConstraintKinds</span></span>
<span id="cb40-2"><a href="#cb40-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb40-3"><a href="#cb40-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="kw">type</span> <span class="kw">family</span> <span class="dt">All</span> (<span class="ot">c ::</span> <span class="op">*</span> <span class="ot">-></span> <span class="dt">Constraint</span>)</span>
<span id="cb40-4"><a href="#cb40-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> (<span class="ot">ts ::</span> [<span class="op">*</span>])<span class="ot"> ::</span> <span class="dt">Constraint</span> <span class="kw">where</span></span>
<span id="cb40-5"><a href="#cb40-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">All</span> c '[] <span class="ot">=</span> ()</span>
<span id="cb40-6"><a href="#cb40-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">All</span> c (t '<span class="op">:</span> ts) <span class="ot">=</span> (c t, <span class="dt">All</span> c ts)</span>
<span id="cb40-7"><a href="#cb40-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span></code></pre></div>
<ul>
<li>With <code class="sourceCode haskell"><span class="dt">All</span></code>, instances for <code class="sourceCode haskell"><span class="dt">HList</span></code> can be written non-recursively:</li>
</ul>
<div class="sourceCode" id="cb41" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb41-1"><a href="#cb41-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">All</span> <span class="dt">Eq</span> ts <span class="ot">=></span> <span class="dt">Eq</span> (<span class="dt">HList</span> ts) <span class="kw">where</span></span>
<span id="cb41-2"><a href="#cb41-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">HNil</span> <span class="op">==</span> <span class="dt">HNil</span> <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb41-3"><a href="#cb41-3" aria-hidden="true" tabindex="-1"></a> (a <span class="op">:#</span> as) <span class="op">==</span> (b <span class="op">:#</span> bs) <span class="ot">=</span> a <span class="op">==</span> b <span class="op">&&</span> as <span class="op">==</span> bs</span></code></pre></div>
<h2 data-track-content data-content-name="conclusion" data-content-piece="twt-notes-1" id="conclusion">Conclusion</h2>
<p>I’m still in the process of reading the book and I’ll post the notes for the rest of the chapters in a later post. The complete code for <code>HList</code> can be found <a href="https://abhinavsarkar.net/code/hlist.html?mtm_campaign=feed">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p><p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/twt-notes-1/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2020-03-18T00:00:00Z <p><a href="https://www.haskell.org" target="_blank" rel="noopener">Haskell</a>—with its powerful type system—has a great support for type-level programming and it has gotten much better in the recent times with the new releases of the <a href="https://www.haskell.org/ghc/" target="_blank" rel="noopener">GHC</a> compiler. But type-level programming remains a daunting topic even with seasoned haskellers. <em><a href="https://thinkingwithtypes.com/" target="_blank" rel="noopener">Thinking with Types: Type-level Programming in Haskell</a></em> by <a href="https://sandymaguire.me/about/" target="_blank" rel="noopener">Sandy Maguire</a> is a book which attempts to fix that. I’ve taken some notes to summarize my understanding of the same.</p>
https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/ Fast Sudoku Solver in Haskell #3: Picking the Right Data Structures 2018-08-13T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">previous part</a> in this series of posts, we optimized the simple Sudoku solver by implementing a new strategy to prune cells, and were able to achieve a speedup of almost 200x. Afterwards, we profiled the solution and found that there were bottlenecks in the program, leading to a slowdown. In this post, we are going to follow the profiler and use the right <em>Data Structures</em> to improve the solution further and make it <strong>faster</strong>.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">A Simple Solution</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">A 200x Faster Solution</a>
</li>
<li>
<strong>Picking the Right Data Structures</strong> 👈
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#quick-recap">Quick Recap</a></li><li><a href="#profile-twice-code-once">Profile Twice, Code Once</a></li><li><a href="#a-set-for-all-occasions">A Set for All Occasions</a></li><li><a href="#bit-by-bit-we-get-faster">Bit by Bit, We Get Faster</a></li><li><a href="#back-to-the-profiler">Back to the Profiler</a></li><li><a href="#vectors-of-speed">Vectors of Speed</a></li><li><a href="#revenge-of-the">Revenge of the <code>(==)</code></a></li><li><a href="#one-function-to-prune-them-all">One Function to Prune Them All</a></li><li><a href="#rise-of-the-mutables">Rise of the Mutables</a></li><li><a href="#comparison-of-implementations">Comparison of Implementations</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="quick-recap" data-content-piece="fast-sudoku-solver-in-haskell-3" id="quick-recap">Quick Recap</h2>
<p><a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> is a number placement puzzle. It consists of a 9x9 grid which is to be filled with digits from 1 to 9 such that each row, each column and each of the nine 3x3 sub-grids contain all the digits. Some of the cells of the grid come pre-filled and the player has to fill the rest.</p>
<p>In the previous post, we improved the performance of the simple Sudoku solver by implementing a new strategy to prune cells. This <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed#a-little-forward-a-little-backward">new strategy</a> found the digits which occurred uniquely, in pairs, or in triplets and fixed the cells to those digits. It led to a speedup of about 200x over our original naive solution. This is our current run<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> time for solving all the 49151 <a href="https://abhinavsarkar.net/files/sudoku17.txt.bz2?mtm_campaign=feed">17-clue puzzles</a>:</p>
<pre class="plain"><code>$ cat sudoku17.txt | time stack exec sudoku > /dev/null
258.97 real 257.34 user 1.52 sys</code></pre>
<p>Let’s try to improve this time.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<h2 data-track-content data-content-name="profile-twice-code-once" data-content-piece="fast-sudoku-solver-in-haskell-3" id="profile-twice-code-once">Profile Twice, Code Once</h2>
<p>Instead of trying to guess how to improve the performance of our solution, let’s be methodical about it. We start with profiling the code to find the bottlenecks. Let’s compile and run the code with profiling flags:</p>
<pre class="plain"><code>$ stack build --profile
$ head -1000 sudoku17.txt | stack exec -- sudoku +RTS -p > /dev/null</code></pre>
<p>This generates a <code>sudoku.prof</code> file with the profiling output. Here are the top seven <em>Cost Centres</em><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> from the file (cleaned for brevity):</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">Sudoku.hs:(49,1)-(62,26)</td>
<td style="text-align: right;">18.9</td>
<td style="text-align: right;">11.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed.pruneCell</code></td>
<td style="text-align: left;">Sudoku.hs:(75,5)-(76,36)</td>
<td style="text-align: right;">17.7</td>
<td style="text-align: right;">30.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:55:38-70</td>
<td style="text-align: right;">11.7</td>
<td style="text-align: right;">20.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>fixM.\</code></td>
<td style="text-align: left;">Sudoku.hs:13:27-65</td>
<td style="text-align: right;">10.7</td>
<td style="text-align: right;">0.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>==</code></td>
<td style="text-align: left;">Sudoku.hs:15:56-57</td>
<td style="text-align: right;">5.6</td>
<td style="text-align: right;">0.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneGrid'</code></td>
<td style="text-align: left;">Sudoku.hs:(103,1)-(106,64)</td>
<td style="text-align: right;">5.0</td>
<td style="text-align: right;">6.7</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed</code></td>
<td style="text-align: left;">Sudoku.hs:(71,1)-(76,36)</td>
<td style="text-align: right;">4.5</td>
<td style="text-align: right;">5.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\</code></td>
<td style="text-align: left;">Sudoku.hs:58:36-68</td>
<td style="text-align: right;">3.4</td>
<td style="text-align: right;">2.5</td>
</tr>
</tbody>
</table>
</div>
<p><em>Cost Centre</em> points to a function, either named or anonymous. <em>Src</em> gives the line and column numbers of the source code of the function. <em>%time</em> and <em>%alloc</em> are the percentages of time spent and memory allocated in the function, respectively.</p>
<p>We see that <code>exclusivePossibilities</code> and the nested functions inside it take up almost 34% time of the entire run time. Second biggest bottleneck is the <code>pruneCell</code> function inside the <code>pruneCellsByFixed</code> function.</p>
<p>We are going to look at <code>exclusivePossibilities</code> later. For now, it is easy to guess the possible reason for <code>pruneCell</code> taking so much time. Here’s the code for reference:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCellsByFixed ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Maybe</span> [<span class="dt">Cell</span>]</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>pruneCellsByFixed cells <span class="ot">=</span> <span class="fu">traverse</span> pruneCell cells</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> fixeds <span class="ot">=</span> [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> cells]</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> pruneCell (<span class="dt">Possible</span> xs) <span class="ot">=</span> makeCell (xs <span class="dt">Data.List</span><span class="op">.</span>\\ fixeds)</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> pruneCell x <span class="ot">=</span> <span class="dt">Just</span> x</span></code></pre></div>
<p><code>pruneCell</code> uses <code>Data.List.\\</code> to find the difference of the cell’s possible digits and the fixed digits in the cell’s block. In Haskell, lists are implemented as <a href="https://en.wikipedia.org/wiki/Linked_list#Singly_linked_list" target="_blank" rel="noopener">singly linked lists</a>. So, finding the difference or intersection of two lists is <span class="math inline">\(O(n^2)\)</span>, that is, quadratic <a href="https://en.wikipedia.org/wiki/Asymptotic_complexity" target="_blank" rel="noopener">asymptotic complexity</a>. Let’s tackle this bottleneck first.</p>
<h2 data-track-content data-content-name="a-set-for-all-occasions" data-content-piece="fast-sudoku-solver-in-haskell-3" id="a-set-for-all-occasions">A Set for All Occasions</h2>
<p>What is a efficient data structure for finding differences and intersections? Why, a <a href="https://en.wikipedia.org/wiki/Set_(abstract_data_type)" target="_blank" rel="noopener"><em>Set</em></a> of course! A Set stores unique values and provides fast operations for testing membership of its elements. If we use a Set to represent the possible values of cells instead of a List, the program should run faster. Since the possible values are already unique (<code>1</code>–<code>9</code>), it should not break anything.</p>
<p>Haskell comes with a bunch of Set implementations:</p>
<ul>
<li><a href="https://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Set.html" target="_blank" rel="noopener"><code>Data.Set</code></a> which is a generic data structure implemented as <a href="https://en.wikipedia.org/wiki/Self-balancing_binary_search_tree" target="_blank" rel="noopener">self-balancing binary search tree</a>.</li>
<li><a href="https://hackage.haskell.org/package/unordered-containers-0.2.9.0/docs/Data-HashSet.html" target="_blank" rel="noopener"><code>Data.HashSet</code></a> which is a generic data structure implemented as <a href="https://en.wikipedia.org/wiki/Hash_array_mapped_trie" target="_blank" rel="noopener">hash array mapped trie</a>.</li>
<li><a href="https://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-IntSet.html" target="_blank" rel="noopener"><code>Data.IntSet</code></a> which is a specialized data structure for integer values, implemented as <a href="https://en.wikipedia.org/wiki/Radix_tree" target="_blank" rel="noopener">radix tree</a>.</li>
</ul>
<p>However, a much faster implementation is possible for our particular use-case. We can use a <a href="https://en.wikipedia.org/wiki/Bitset" target="_blank" rel="noopener"><em>BitSet</em></a>.</p>
<p>A BitSet uses <a href="https://en.wikipedia.org/wiki/Bit" target="_blank" rel="noopener">bits</a> to represent unique members of a Set. We map values to particular bits using some function. If the bit corresponding to a particular value is set to 1 then the value is present in the Set, else it is not. So, we need as many bits in a BitSet as the number of values in our domain, which makes is difficult to use for generic problems. But, for our Sudoku solver, we need to store only the digits <code>1</code>–<code>9</code> in the Set, which make BitSet very suitable for us. Also, the Set operations on BitSet are implemented using bit-level instructions in hardware, making them much faster than those on the other data structure listed above.</p>
<p>In Haskell, we can use the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Word.html" target="_blank" rel="noopener"><code>Data.Word</code></a> module to represent a BitSet. Specifically, we can use the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Word.html#t:Word16" target="_blank" rel="noopener"><code>Data.Word.Word16</code></a> type which has sixteen bits because we need only nine bits to represent the nine digits. The bit-level operations on <code>Word16</code> are provided by the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Bits.html" target="_blank" rel="noopener"><code>Data.Bits</code></a> module.</p>
<h2 data-track-content data-content-name="bit-by-bit-we-get-faster" data-content-piece="fast-sudoku-solver-in-haskell-3" id="bit-by-bit-we-get-faster">Bit by Bit, We Get Faster</h2>
<p>First, we replace List with <code>Word16</code> in the <code>Cell</code> type and add a helper function:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">Fixed</span> <span class="dt">Data.Word.Word16</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Possible</span> <span class="dt">Data.Word.Word16</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="ot">setBits ::</span> <span class="dt">Data.Word.Word16</span> <span class="ot">-></span> [<span class="dt">Data.Word.Word16</span>] <span class="ot">-></span> <span class="dt">Data.Word.Word16</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>setBits <span class="ot">=</span> Data.List.foldl' (<span class="op">Data.Bits..|.</span>)</span></code></pre></div>
<p>Then we replace <code>Int</code> related operations with bit related ones in the read and show functions:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">readGrid ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>readGrid s</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">length</span> s <span class="op">==</span> <span class="dv">81</span> <span class="ot">=</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">traverse</span> (<span class="fu">traverse</span> readCell) <span class="op">.</span> Data.List.Split.chunksOf <span class="dv">9</span> <span class="op">$</span> s</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> allBitsSet <span class="ot">=</span> <span class="dv">1022</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> readCell <span class="ch">'.'</span> <span class="ot">=</span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Possible</span> allBitsSet</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> readCell c</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> Data.Char.isDigit c <span class="op">&&</span> c <span class="op">></span> <span class="ch">'0'</span> <span class="ot">=</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> <span class="op">.</span> <span class="dt">Fixed</span> <span class="op">.</span> Data.Bits.bit <span class="op">.</span> Data.Char.digitToInt <span class="op">$</span> c</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a><span class="ot">showGrid ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a>showGrid <span class="ot">=</span> <span class="fu">unlines</span> <span class="op">.</span> <span class="fu">map</span> (<span class="fu">unwords</span> <span class="op">.</span> <span class="fu">map</span> showCell)</span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Fixed</span> x) <span class="ot">=</span> <span class="fu">show</span> <span class="op">.</span> Data.Bits.countTrailingZeros <span class="op">$</span> x</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a> showCell _ <span class="ot">=</span> <span class="st">"."</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="ot">showGridWithPossibilities ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a>showGridWithPossibilities <span class="ot">=</span> <span class="fu">unlines</span> <span class="op">.</span> <span class="fu">map</span> (<span class="fu">unwords</span> <span class="op">.</span> <span class="fu">map</span> showCell)</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Fixed</span> x) <span class="ot">=</span> (<span class="fu">show</span> <span class="op">.</span> Data.Bits.countTrailingZeros <span class="op">$</span> x) <span class="op">++</span> <span class="st">" "</span></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Possible</span> xs) <span class="ot">=</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a> <span class="st">"["</span> <span class="op">++</span></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> (\i <span class="ot">-></span> <span class="kw">if</span> Data.Bits.testBit xs i</span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> Data.Char.intToDigit i</span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> <span class="ch">' '</span>)</span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a> <span class="op">++</span> <span class="st">"]"</span></span></code></pre></div>
<p>We set the same bits as the digits to indicate the presence of the digits in the possibilities. For example, for digit <code>1</code>, we set the bit 1 so that the resulting <code>Word16</code> is <code>0000 0000 0000 0010</code> or 2. This also means, for fixed cells, the value is <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Bits.html#v:countTrailingZeros" target="_blank" rel="noopener">count of the zeros from right</a>.</p>
<p>The change in the <code>exclusivePossibilities</code> function is pretty minimal:</p>
<div class="sourceCode" id="cb6" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="st">-exclusivePossibilities :: [Cell] -> [[Int]]</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="va">+exclusivePossibilities :: [Cell] -> [Data.Word.Word16]</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> exclusivePossibilities row =</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> row</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> & zip [1..9]</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> & filter (isPossible . snd)</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> & Data.List.foldl'</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> (\acc ~(i, Possible xs) -></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a><span class="st">- Data.List.foldl'</span></span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a><span class="st">- (\acc' x -> Map.insertWith prepend x [i] acc')</span></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a><span class="st">- acc</span></span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a><span class="st">- xs)</span></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a><span class="va">+ Data.List.foldl'</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a><span class="va">+ (\acc' x -> if Data.Bits.testBit xs x</span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a><span class="va">+ then Map.insertWith prepend x [i] acc'</span></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a><span class="va">+ else acc')</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a><span class="va">+ acc</span></span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a><span class="va">+ [1..9])</span></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a> Map.empty</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a> & Map.filter ((< 4) . length)</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a> & Map.foldlWithKey' (\acc x is -> Map.insertWith prepend is [x] acc) Map.empty</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true" tabindex="-1"></a> & Map.filterWithKey (\is xs -> length is == length xs)</span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true" tabindex="-1"></a> & Map.elems</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true" tabindex="-1"></a><span class="va">+ & map (Data.List.foldl' Data.Bits.setBit Data.Bits.zeroBits)</span></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true" tabindex="-1"></a> prepend ~[y] ys = y:ys</span></code></pre></div>
<p>In the nested folding step, instead of folding over the possible values of the cells, now we fold over the digits from <code>1</code> to <code>9</code> and insert the entry in the map if the bit corresponding to the digit is set in the possibilities. And as the last step, we convert the exclusive possibilities to <code>Word16</code> by folding them, starting with zero. As example in the <em>REPL</em> should be instructive:</p>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> poss <span class="ot">=</span> Data.List.foldl' Data.Bits.setBit Data.Bits.zeroBits</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> row <span class="ot">=</span> [<span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">4</span>,<span class="dv">6</span>,<span class="dv">9</span>], <span class="dt">Fixed</span> <span class="op">$</span> poss [<span class="dv">1</span>], <span class="dt">Fixed</span> <span class="op">$</span> poss [<span class="dv">5</span>], <span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">6</span>,<span class="dv">9</span>], <span class="dt">Fixed</span> <span class="op">$</span> poss [<span class="dv">7</span>], <span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">6</span>,<span class="dv">8</span>,<span class="dv">9</span>], <span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">6</span>,<span class="dv">9</span>], <span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">6</span>,<span class="dv">8</span>,<span class="dv">9</span>], <span class="dt">Possible</span> <span class="op">$</span> poss [<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">6</span>,<span class="dv">8</span>,<span class="dv">9</span>]]</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities [row]</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>[ 4 6 9] 1 5 [ 6 9] 7 [ 23 6 89] [ 6 9] [ 23 6 89] [ 23 6 89]</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> exclusivePossibilities row</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>[16,268]</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> [poss [<span class="dv">4</span>], poss [<span class="dv">8</span>,<span class="dv">3</span>,<span class="dv">2</span>]]</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>[16,268]</span></code></pre></div>
<p>This is the same example row as the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed#a-little-forward-a-little-backward">last time</a>. And it returns same results, excepts as a list of <code>Word16</code> now.</p>
<p>Now, we change <code>makeCell</code> to use bit operations instead of list ones:</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">makeCell ::</span> <span class="dt">Data.Word.Word16</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Cell</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>makeCell ys</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> ys <span class="op">==</span> Data.Bits.zeroBits <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> Data.Bits.popCount ys <span class="op">==</span> <span class="dv">1</span> <span class="ot">=</span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Fixed</span> ys</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Possible</span> ys</span></code></pre></div>
<p>And we change cell pruning functions too:</p>
<div class="sourceCode" id="cb9" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a> pruneCellsByFixed :: [Cell] -> Maybe [Cell]</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> pruneCellsByFixed cells = traverse pruneCell cells</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="st">- fixeds = [x | Fixed x <- cells]</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="va">+ fixeds = setBits Data.Bits.zeroBits [x | Fixed x <- cells]</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="st">- pruneCell (Possible xs) = makeCell (xs Data.List.\\ fixeds)</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="va">+ pruneCell (Possible xs) =</span></span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a><span class="va">+ makeCell (xs Data.Bits..&. Data.Bits.complement fixeds)</span></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> pruneCell x = Just x</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> pruneCellsByExclusives :: [Cell] -> Maybe [Cell]</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a> pruneCellsByExclusives cells = case exclusives of</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> [] -> Just cells</span>
<span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a> _ -> traverse pruneCell cells</span>
<span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> exclusives = exclusivePossibilities cells</span>
<span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a><span class="st">- allExclusives = concat exclusives</span></span>
<span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a><span class="va">+ allExclusives = setBits Data.Bits.zeroBits exclusives</span></span>
<span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a> pruneCell cell@(Fixed _) = Just cell</span>
<span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> pruneCell cell@(Possible xs)</span>
<span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a> | intersection `elem` exclusives = makeCell intersection</span>
<span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> | otherwise = Just cell</span>
<span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a><span class="st">- intersection = xs `Data.List.intersect` allExclusives</span></span>
<span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a><span class="va">+ intersection = xs Data.Bits..&. allExclusives</span></span></code></pre></div>
<p>Notice how the list difference and intersection functions are replaced by <code>Data.Bits</code> functions. Specifically, list difference is replace by bitwise-and of the bitwise-complement, and list intersection is replaced by bitwise-and.</p>
<p>We make a one-line change in the <code>isGridInvalid</code> function to find empty possible cells using bit ops:</p>
<div class="sourceCode" id="cb10" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a> isGridInvalid :: Grid -> Bool</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> isGridInvalid grid =</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> any isInvalidRow grid</span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> || any isInvalidRow (Data.List.transpose grid)</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> || any isInvalidRow (subGridsToRows grid)</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> isInvalidRow row =</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> let fixeds = [x | Fixed x <- row]</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a><span class="st">- emptyPossibles = [x | Possible x <- row, null x]</span></span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a><span class="va">+ emptyPossibles = [() | Possible x <- row, x == Data.Bits.zeroBits]</span></span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> in hasDups fixeds || not (null emptyPossibles)</span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a> hasDups l = hasDups' l []</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a> hasDups' [] _ = False</span>
<span id="cb10-16"><a href="#cb10-16" aria-hidden="true" tabindex="-1"></a> hasDups' (y:ys) xs</span>
<span id="cb10-17"><a href="#cb10-17" aria-hidden="true" tabindex="-1"></a> | y `elem` xs = True</span>
<span id="cb10-18"><a href="#cb10-18" aria-hidden="true" tabindex="-1"></a> | otherwise = hasDups' ys (y:xs)</span></code></pre></div>
<p>And finally, we change the <code>nextGrids</code> functions to use bit operations:</p>
<div class="sourceCode" id="cb11" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">nextGrids ::</span> <span class="dt">Grid</span> <span class="ot">-></span> (<span class="dt">Grid</span>, <span class="dt">Grid</span>)</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>nextGrids grid <span class="ot">=</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (i, first<span class="op">@</span>(<span class="dt">Fixed</span> _), rest) <span class="ot">=</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> fixCell</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Data.List.minimumBy (<span class="fu">compare</span> <span class="ot">`Data.Function.on`</span> (possibilityCount <span class="op">.</span> <span class="fu">snd</span>))</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">filter</span> (isPossible <span class="op">.</span> <span class="fu">snd</span>)</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">zip</span> [<span class="dv">0</span><span class="op">..</span>]</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">concat</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> grid</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> (replace2D i first grid, replace2D i rest grid)</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> possibilityCount (<span class="dt">Possible</span> xs) <span class="ot">=</span> Data.Bits.popCount xs</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> possibilityCount (<span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dv">1</span></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> fixCell <span class="op">~</span>(i, <span class="dt">Possible</span> xs) <span class="ot">=</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> x <span class="ot">=</span> Data.Bits.countTrailingZeros xs</span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="kw">case</span> makeCell (Data.Bits.clearBit xs x) <span class="kw">of</span></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">error</span> <span class="st">"Impossible case"</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> cell <span class="ot">-></span> (i, <span class="dt">Fixed</span> (Data.Bits.bit x), cell)</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a><span class="ot"> replace2D ::</span> <span class="dt">Int</span> <span class="ot">-></span> a <span class="ot">-></span> [[a]] <span class="ot">-></span> [[a]]</span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a> replace2D i v <span class="ot">=</span></span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (x, y) <span class="ot">=</span> (i <span class="ot">`quot`</span> <span class="dv">9</span>, i <span class="ot">`mod`</span> <span class="dv">9</span>) <span class="kw">in</span> replace x (replace y (<span class="fu">const</span> v))</span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a> replace p f xs <span class="ot">=</span> [<span class="kw">if</span> i <span class="op">==</span> p <span class="kw">then</span> f x <span class="kw">else</span> x <span class="op">|</span> (x, i) <span class="ot"><-</span> <span class="fu">zip</span> xs [<span class="dv">0</span><span class="op">..</span>]]</span></code></pre></div>
<p><code>possibilityCount</code> now uses <code>Data.Bits.popCount</code> to count the number of bits set to 1. <code>fixCell</code> now chooses the first set bit from right as the digit to fix. Rest of the code stays the same. Let’s build and run it:</p>
<pre class="plain"><code>$ stack build
$ cat sudoku17.txt | time stack exec sudoku > /dev/null
69.44 real 69.12 user 0.37 sys</code></pre>
<p>Wow! That is almost 3.7x faster than the previous solution. It’s a massive win! But let’s not be content yet. To the profiler again<a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>!</p>
<h2 data-track-content data-content-name="back-to-the-profiler" data-content-piece="fast-sudoku-solver-in-haskell-3" id="back-to-the-profiler">Back to the Profiler</h2>
<p>Running the profiler again gives us these top six culprits:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">Sudoku.hs:(57,1)-(74,26)</td>
<td style="text-align: right;">25.2</td>
<td style="text-align: right;">16.6</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:64:23-96</td>
<td style="text-align: right;">19.0</td>
<td style="text-align: right;">32.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>fixM.\</code></td>
<td style="text-align: left;">Sudoku.hs:15:27-65</td>
<td style="text-align: right;">12.5</td>
<td style="text-align: right;">0.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed</code></td>
<td style="text-align: left;">Sudoku.hs:(83,1)-(88,36)</td>
<td style="text-align: right;">5.9</td>
<td style="text-align: right;">7.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneGrid'</code></td>
<td style="text-align: left;">Sudoku.hs:(115,1)-(118,64)</td>
<td style="text-align: right;">5.0</td>
<td style="text-align: right;">8.6</td>
</tr>
</tbody>
</table>
</div>
<p>Hurray! <code>pruneCellsByFixed.pruneCell</code> has disappeared from the list of top bottlenecks. Though <code>exclusivePossibilities</code> still remains here as expected.</p>
<p><code>exclusivePossibilities</code> is a big function. The profiler does not really tell us which parts of it are the slow ones. That’s because by default, the profiler only considers functions as <em>Cost Centres</em>. We need to give it hints for it to be able to find bottlenecks inside functions. For that, we need to insert <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/profiling.html#inserting-cost-centres-by-hand" target="_blank" rel="noopener"><em>Cost Centre</em> annotations</a> in the code:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">exclusivePossibilities ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> [<span class="dt">Data.Word.Word16</span>]</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>exclusivePossibilities row <span class="ot">=</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> row</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.zip" #-}</span> <span class="fu">zip</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>])</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.filter" #-}</span> <span class="fu">filter</span> (isPossible <span class="op">.</span> <span class="fu">snd</span>))</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.foldl" #-}</span> Data.List.foldl'</span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> (\acc <span class="op">~</span>(i, <span class="dt">Possible</span> xs) <span class="ot">-></span></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> Data.List.foldl'</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> (\acc' n <span class="ot">-></span> <span class="kw">if</span> Data.Bits.testBit xs n</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> Map.insertWith prepend n [i] acc'</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> acc')</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> acc</span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>])</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> Map.empty)</span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.Map.filter1" #-}</span> Map.filter ((<span class="op"><</span> <span class="dv">4</span>) <span class="op">.</span> <span class="fu">length</span>))</span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.Map.foldl" #-}</span></span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> Map.foldlWithKey'</span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> (\acc x is <span class="ot">-></span> Map.insertWith prepend is [x] acc)</span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> Map.empty)</span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.Map.filter2" #-}</span></span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a> Map.filterWithKey (\is xs <span class="ot">-></span> <span class="fu">length</span> is <span class="op">==</span> <span class="fu">length</span> xs))</span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.Map.elems" #-}</span> Map.elems)</span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> (<span class="ot">{-# SCC "EP.map" #-}</span></span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">map</span> (Data.List.foldl' Data.Bits.setBit Data.Bits.zeroBits))</span>
<span id="cb13-25"><a href="#cb13-25" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb13-26"><a href="#cb13-26" aria-hidden="true" tabindex="-1"></a> prepend <span class="op">~</span>[y] ys <span class="ot">=</span> y<span class="op">:</span>ys</span></code></pre></div>
<p>Here, <code>{-# SCC "EP.zip" #-}</code> is a <em>Cost Centre</em> annotation. <code>"EP.zip"</code> is the name we choose to give to this <em>Cost Centre</em>.</p>
<p>After profiling the code again, we get a different list of bottlenecks:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:(64,23)-(66,31)</td>
<td style="text-align: right;">19.5</td>
<td style="text-align: right;">31.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>fixM.\</code></td>
<td style="text-align: left;">Sudoku.hs:15:27-65</td>
<td style="text-align: right;">13.1</td>
<td style="text-align: right;">0.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed</code></td>
<td style="text-align: left;">Sudoku.hs:(85,1)-(90,36)</td>
<td style="text-align: right;">5.4</td>
<td style="text-align: right;">6.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneGrid'</code></td>
<td style="text-align: left;">Sudoku.hs:(117,1)-(120,64)</td>
<td style="text-align: right;">4.8</td>
<td style="text-align: right;">8.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.zip</code></td>
<td style="text-align: left;">Sudoku.hs:59:27-36</td>
<td style="text-align: right;">4.3</td>
<td style="text-align: right;">10.7</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.Map.filter1</code></td>
<td style="text-align: left;">Sudoku.hs:70:35-61</td>
<td style="text-align: right;">4.2</td>
<td style="text-align: right;">0.5</td>
</tr>
<tr>
<td style="text-align: left;"><code>chunksOf</code></td>
<td style="text-align: left;">Data/List/Split/Internals.hs:(514,1)-(517,49)</td>
<td style="text-align: right;">4.1</td>
<td style="text-align: right;">7.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\</code></td>
<td style="text-align: left;">Sudoku.hs:71:64-96</td>
<td style="text-align: right;">4.0</td>
<td style="text-align: right;">3.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.filter</code></td>
<td style="text-align: left;">Sudoku.hs:60:30-54</td>
<td style="text-align: right;">2.9</td>
<td style="text-align: right;">3.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.foldl</code></td>
<td style="text-align: left;">Sudoku.hs:(61,29)-(69,15)</td>
<td style="text-align: right;">2.8</td>
<td style="text-align: right;">1.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">Sudoku.hs:(57,1)-(76,26)</td>
<td style="text-align: right;">2.7</td>
<td style="text-align: right;">1.9</td>
</tr>
<tr>
<td style="text-align: left;"><code>chunksOf.splitter</code></td>
<td style="text-align: left;">Data/List/Split/Internals.hs:(516,3)-(517,49)</td>
<td style="text-align: right;">2.5</td>
<td style="text-align: right;">2.7</td>
</tr>
</tbody>
</table>
</div>
<p>So almost one-fifth of the time is actually going in this nested one-line anonymous function inside <code>exclusivePossibilities</code>:</p>
<div class="sourceCode" id="cb14" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>(\acc' n <span class="ot">-></span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> <span class="kw">if</span> Data.Bits.testBit xs n <span class="kw">then</span> Map.insertWith prepend n [i] acc' <span class="kw">else</span> acc')</span></code></pre></div>
<p>But we are going to ignore it for now.</p>
<p>If we look closely, we also find that around 17% of the run time now goes into list traversal and manipulation. This is in the functions <code>pruneCellsByFixed</code>, <code>pruneGrid'</code>, <code>chunksOf</code> and <code>chunksOf.splitter</code>, where the first two are majorly list traversal and transposition, and the last two are list splitting. Maybe it is time to get rid of lists altogether?</p>
<h2 data-track-content data-content-name="vectors-of-speed" data-content-piece="fast-sudoku-solver-in-haskell-3" id="vectors-of-speed">Vectors of Speed</h2>
<p><a href="https://hackage.haskell.org/package/vector-0.12.0.1" target="_blank" rel="noopener">Vector</a> is a Haskell library for working with arrays. It implements very performant operations for integer-indexed array data. Unlike the lists in Haskell which are implemented as <a href="https://en.wikipedia.org/wiki/Linked_list#Singly_linked_list" target="_blank" rel="noopener">singly linked lists</a>, vectors are stored in a contiguous set of memory locations. This makes random access to the elements a constant time operation. The memory overhead per additional item in vectors is also much smaller. Lists allocate memory for each item in the heap and have pointers to the memory locations in nodes, leading to a lot of wasted memory in holding pointers. On the other hand, operations on lists are lazy, whereas, operations on vectors are strict, and this may need to useless computation depending on the use-case<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a>.</p>
<p>In our current code, we represent the grid as a list of lists of cells. All the pruning operations require us to traverse the grid list or the row lists. We also need to transform the grid back-and-forth for being able to use the same pruning operations for rows, columns and sub-grids. The pruning of cells and the choosing of pivot cells also requires us to replace cells in the grid with new ones, leading to a lot of list traversals.</p>
<p>To prevent all this linear-time list traversals, we can replace the nested list of lists with a single vector. Then all we need to do it to go over the right parts of this vector, looking up and replacing cells as needed. Since both lookups and updates on vectors are constant time, this should lead to a speedup.</p>
<p>Let’s start by changing the grid to a vector of cells.:</p>
<div class="sourceCode" id="cb15" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">Fixed</span> <span class="dt">Data.Word.Word16</span></span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="dt">Possible</span> <span class="dt">Data.Word.Word16</span></span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Grid</span> <span class="ot">=</span> <span class="dt">Data.Vector.Vector</span> <span class="dt">Cell</span></span></code></pre></div>
<p>Since we plan to traverse different parts of the same vector, let’s define these different parts first:</p>
<div class="sourceCode" id="cb16" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">CellIxs</span> <span class="ot">=</span> [<span class="dt">Int</span>]</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="ot">fromXY ::</span> (<span class="dt">Int</span>, <span class="dt">Int</span>) <span class="ot">-></span> <span class="dt">Int</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>fromXY (x, y) <span class="ot">=</span> x <span class="op">*</span> <span class="dv">9</span> <span class="op">+</span> y</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>allRowIxs, allColIxs,<span class="ot"> allSubGridIxs ::</span> [<span class="dt">CellIxs</span>]</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>allRowIxs <span class="ot">=</span> [getRow i <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>]]</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span> getRow n <span class="ot">=</span> [ fromXY (n, i) <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>] ]</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a>allColIxs <span class="ot">=</span> [getCol i <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>]]</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span> getCol n <span class="ot">=</span> [ fromXY (i, n) <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>] ]</span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a>allSubGridIxs <span class="ot">=</span> [getSubGrid i <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>]]</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span> getSubGrid n <span class="ot">=</span> <span class="kw">let</span> (r, c) <span class="ot">=</span> (n <span class="ot">`quot`</span> <span class="dv">3</span>, n <span class="ot">`mod`</span> <span class="dv">3</span>)</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> [ fromXY (<span class="dv">3</span> <span class="op">*</span> r <span class="op">+</span> i, <span class="dv">3</span> <span class="op">*</span> c <span class="op">+</span> j) <span class="op">|</span> i <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">2</span>], j <span class="ot"><-</span> [<span class="dv">0</span><span class="op">..</span><span class="dv">2</span>] ]</span></code></pre></div>
<p>We define a type for cell indices as a list of integers. Then we create three lists of cell indices: all row indices, all column indices, and all sub-grid indices. Let’s check these out in the <em>REPL</em>:</p>
<div class="sourceCode" id="cb17" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Control.Monad.mapM_ <span class="fu">print</span> allRowIxs</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>[0,1,2,3,4,5,6,7,8]</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>[9,10,11,12,13,14,15,16,17]</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>[18,19,20,21,22,23,24,25,26]</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>[27,28,29,30,31,32,33,34,35]</span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>[36,37,38,39,40,41,42,43,44]</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>[45,46,47,48,49,50,51,52,53]</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>[54,55,56,57,58,59,60,61,62]</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>[63,64,65,66,67,68,69,70,71]</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a>[72,73,74,75,76,77,78,79,80]</span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Control.Monad.mapM_ <span class="fu">print</span> allColIxs</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>[0,9,18,27,36,45,54,63,72]</span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a>[1,10,19,28,37,46,55,64,73]</span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>[2,11,20,29,38,47,56,65,74]</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a>[3,12,21,30,39,48,57,66,75]</span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a>[4,13,22,31,40,49,58,67,76]</span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a>[5,14,23,32,41,50,59,68,77]</span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a>[6,15,24,33,42,51,60,69,78]</span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a>[7,16,25,34,43,52,61,70,79]</span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a>[8,17,26,35,44,53,62,71,80]</span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> Control.Monad.mapM_ <span class="fu">print</span> allSubGridIxs</span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a>[0,1,2,9,10,11,18,19,20]</span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a>[3,4,5,12,13,14,21,22,23]</span>
<span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a>[6,7,8,15,16,17,24,25,26]</span>
<span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a>[27,28,29,36,37,38,45,46,47]</span>
<span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a>[30,31,32,39,40,41,48,49,50]</span>
<span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a>[33,34,35,42,43,44,51,52,53]</span>
<span id="cb17-28"><a href="#cb17-28" aria-hidden="true" tabindex="-1"></a>[54,55,56,63,64,65,72,73,74]</span>
<span id="cb17-29"><a href="#cb17-29" aria-hidden="true" tabindex="-1"></a>[57,58,59,66,67,68,75,76,77]</span>
<span id="cb17-30"><a href="#cb17-30" aria-hidden="true" tabindex="-1"></a>[60,61,62,69,70,71,78,79,80]</span></code></pre></div>
<p>We can verify manually that these indices are correct.</p>
<p>Read and show functions are easy to change for vector:</p>
<div class="sourceCode" id="cb18" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a> readGrid :: String -> Maybe Grid</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a> readGrid s</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="st">- | length s == 81 = traverse (traverse readCell) . Data.List.Split.chunksOf 9 $ s</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="va">+ | length s == 81 = Data.Vector.fromList <$> traverse readCell s</span></span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a> | otherwise = Nothing</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a> allBitsSet = 1022</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a> readCell '.' = Just $ Possible allBitsSet</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> readCell c</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> | Data.Char.isDigit c && c > '0' =</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> Just . Fixed . Data.Bits.bit . Data.Char.digitToInt $ c</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a> | otherwise = Nothing</span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a> showGrid :: Grid -> String</span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a><span class="st">-showGrid = unlines . map (unwords . map showCell)</span></span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a><span class="va">+showGrid grid =</span></span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a><span class="va">+ unlines . map (unwords . map (showCell . (grid !))) $ allRowIxs</span></span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a> showCell (Fixed x) = show . Data.Bits.countTrailingZeros $ x</span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a> showCell _ = "."</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a> showGridWithPossibilities :: Grid -> String</span>
<span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a><span class="st">-showGridWithPossibilities = unlines . map (unwords . map showCell)</span></span>
<span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"></a><span class="va">+showGridWithPossibilities grid =</span></span>
<span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"></a><span class="va">+ unlines . map (unwords . map (showCell . (grid !))) $ allRowIxs</span></span>
<span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"></a> showCell (Fixed x) = (show . Data.Bits.countTrailingZeros $ x) ++ " "</span>
<span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"></a> showCell (Possible xs) =</span>
<span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"></a> "[" ++</span>
<span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"></a> map (\i -> if Data.Bits.testBit xs i</span>
<span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"></a> then Data.Char.intToDigit i</span>
<span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"></a> else ' ')</span>
<span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"></a> [1..9]</span>
<span id="cb18-35"><a href="#cb18-35" aria-hidden="true" tabindex="-1"></a> ++ "]"</span></code></pre></div>
<p><code>readGrid</code> simply changes to work on a single vector of cells instead of a list of lists. Show functions have a pretty minor change to do lookups from a vector using the row indices and the <a href="https://hackage.haskell.org/package/vector-0.12.0.1/docs/Data-Vector.html#v:-33-" target="_blank" rel="noopener"><code>(!)</code></a> function. The <code>(!)</code> function is the vector indexing function which is similar to the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Prelude.html#v:-33--33-" target="_blank" rel="noopener"><code>(!!)</code></a> function, except it executes in constant time.</p>
<p>The pruning related functions are rewritten for working with vectors:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">replaceCell ::</span> <span class="dt">Int</span> <span class="ot">-></span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Grid</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>replaceCell i c g <span class="ot">=</span> g <span class="op">Data.Vector.//</span> [(i, c)]</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCellsByFixed ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">CellIxs</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>pruneCellsByFixed grid cellIxs <span class="ot">=</span></span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a> Control.Monad.foldM pruneCell grid <span class="op">.</span> <span class="fu">map</span> (\i <span class="ot">-></span> (i, grid <span class="op">!</span> i)) <span class="op">$</span> cellIxs</span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a> fixeds <span class="ot">=</span> setBits Data.Bits.zeroBits [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> <span class="fu">map</span> (grid <span class="op">!</span>) cellIxs]</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a> pruneCell g (_, <span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a> pruneCell g (i, <span class="dt">Possible</span> xs)</span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> xs' <span class="op">==</span> xs <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="fu">flip</span> (replaceCell i) g <span class="op"><$></span> makeCell xs'</span>
<span id="cb19-14"><a href="#cb19-14" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a> xs' <span class="ot">=</span> xs <span class="op">Data.Bits..&.</span> Data.Bits.complement fixeds</span>
<span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCellsByExclusives ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">CellIxs</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a>pruneCellsByExclusives grid cellIxs <span class="ot">=</span> <span class="kw">case</span> exclusives <span class="kw">of</span></span>
<span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">Just</span> grid</span>
<span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> Control.Monad.foldM pruneCell grid <span class="op">.</span> <span class="fu">zip</span> cellIxs <span class="op">$</span> cells</span>
<span id="cb19-21"><a href="#cb19-21" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-22"><a href="#cb19-22" aria-hidden="true" tabindex="-1"></a> cells <span class="ot">=</span> <span class="fu">map</span> (grid <span class="op">!</span>) cellIxs</span>
<span id="cb19-23"><a href="#cb19-23" aria-hidden="true" tabindex="-1"></a> exclusives <span class="ot">=</span> exclusivePossibilities cells</span>
<span id="cb19-24"><a href="#cb19-24" aria-hidden="true" tabindex="-1"></a> allExclusives <span class="ot">=</span> setBits Data.Bits.zeroBits exclusives</span>
<span id="cb19-25"><a href="#cb19-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-26"><a href="#cb19-26" aria-hidden="true" tabindex="-1"></a> pruneCell g (_, <span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb19-27"><a href="#cb19-27" aria-hidden="true" tabindex="-1"></a> pruneCell g (i, <span class="dt">Possible</span> xs)</span>
<span id="cb19-28"><a href="#cb19-28" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> intersection <span class="op">==</span> xs <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb19-29"><a href="#cb19-29" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> intersection <span class="ot">`elem`</span> exclusives <span class="ot">=</span></span>
<span id="cb19-30"><a href="#cb19-30" aria-hidden="true" tabindex="-1"></a> <span class="fu">flip</span> (replaceCell i) g <span class="op"><$></span> makeCell intersection</span>
<span id="cb19-31"><a href="#cb19-31" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb19-32"><a href="#cb19-32" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-33"><a href="#cb19-33" aria-hidden="true" tabindex="-1"></a> intersection <span class="ot">=</span> xs <span class="op">Data.Bits..&.</span> allExclusives</span>
<span id="cb19-34"><a href="#cb19-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-35"><a href="#cb19-35" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCells ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">CellIxs</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb19-36"><a href="#cb19-36" aria-hidden="true" tabindex="-1"></a>pruneCells grid cellIxs <span class="ot">=</span></span>
<span id="cb19-37"><a href="#cb19-37" aria-hidden="true" tabindex="-1"></a> fixM (<span class="fu">flip</span> pruneCellsByFixed cellIxs) grid</span>
<span id="cb19-38"><a href="#cb19-38" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> fixM (<span class="fu">flip</span> pruneCellsByExclusives cellIxs)</span></code></pre></div>
<p>All the three functions now take the grid and the cell indices instead of a list of cells, and use the cell indices to lookup the cells from the grid. Also, instead of using the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Traversable.html#v:traverse" target="_blank" rel="noopener"><code>traverse</code></a> function as earlier, now we use the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Control-Monad.html#v:foldM" target="_blank" rel="noopener"><code>Control.Monad.foldM</code></a> function to fold over the cell-index-and-cell tuples in the context of the <code>Maybe</code> monad, making changes to the grid directly.</p>
<p>We use the <code>replaceCell</code> function to replace cells at an index in the grid. It is a simple wrapper over the vector update function <code>Data.Vector.//</code>. Rest of the code is same in essence, except a few changes to accommodate the changed function parameters.</p>
<p><code>pruneGrid'</code> function does not need to do transpositions and back-transpositions anymore as now we use the cell indices to go over the right parts of the grid vector directly:</p>
<div class="sourceCode" id="cb20" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneGrid' ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a>pruneGrid' grid <span class="ot">=</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a> Control.Monad.foldM pruneCells grid allRowIxs</span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">flip</span> (Control.Monad.foldM pruneCells) allColIxs</span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">flip</span> (Control.Monad.foldM pruneCells) allSubGridIxs</span></code></pre></div>
<p>Notice that the <code>traverse</code> function here is also replaced by the <code>Control.Monad.foldM</code> function.</p>
<p>Similarly, the grid predicate functions change a little to go over a vector instead of a list of lists:</p>
<div class="sourceCode" id="cb21" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a> isGridFilled :: Grid -> Bool</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a><span class="st">-isGridFilled grid = null [ () | Possible _ <- concat grid ]</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="va">+isGridFilled = not . Data.Vector.any isPossible</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> isGridInvalid :: Grid -> Bool</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> isGridInvalid grid =</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a><span class="st">- any isInvalidRow grid</span></span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="st">- || any isInvalidRow (Data.List.transpose grid)</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a><span class="st">- || any isInvalidRow (subGridsToRows grid)</span></span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a><span class="va">+ any isInvalidRow (map (map (grid !)) allRowIxs)</span></span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a><span class="va">+ || any isInvalidRow (map (map (grid !)) allColIxs)</span></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a><span class="va">+ || any isInvalidRow (map (map (grid !)) allSubGridIxs)</span></span></code></pre></div>
<p>And finally, we change the <code>nextGrids</code> function to replace the list related operations with the vector related ones:</p>
<div class="sourceCode" id="cb22" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a> nextGrids :: Grid -> (Grid, Grid)</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a> nextGrids grid =</span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a> let (i, first@(Fixed _), rest) =</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a> fixCell</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="st">- . Data.List.minimumBy</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="va">+ . Data.Vector.minimumBy</span></span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a> (compare `Data.Function.on` (possibilityCount . snd))</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a><span class="st">- . filter (isPossible . snd)</span></span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="st">- . zip [0..]</span></span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a><span class="st">- . concat</span></span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a><span class="va">+ . Data.Vector.imapMaybe</span></span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a><span class="va">+ (\j cell -> if isPossible cell then Just (j, cell) else Nothing)</span></span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a> $ grid</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a><span class="st">- in (replace2D i first grid, replace2D i rest grid)</span></span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a><span class="va">+ in (replaceCell i first grid, replaceCell i rest grid)</span></span></code></pre></div>
<p>We also switch the <code>replace2D</code> function which went over the entire list of lists of cells to replace a cell, with the vector-based <code>replaceCell</code> function.</p>
<p>All the required changes are done. Let’s do a run:</p>
<pre class="plain"><code>$ stack build
$ cat sudoku17.txt | time stack exec sudoku > /dev/null
88.53 real 88.16 user 0.41 sys</code></pre>
<p>Oops! Instead of getting a speedup, our vector-based code is actually 1.3x slower than the list-based code. How did this happen? Time to bust out the profiler again!</p>
<h2 data-track-content data-content-name="revenge-of-the" data-content-piece="fast-sudoku-solver-in-haskell-3" id="revenge-of-the">Revenge of the <code>(==)</code></h2>
<p>Profiling the current code gives us the following hotspots:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>>>=</code></td>
<td style="text-align: left;">Data/Vector/Fusion/Util.hs:36:3-18</td>
<td style="text-align: right;">52.2</td>
<td style="text-align: right;">51.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>basicUnsafeIndexM</code></td>
<td style="text-align: left;">Data/Vector.hs:278:3-62</td>
<td style="text-align: right;">22.2</td>
<td style="text-align: right;">20.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">Sudoku.hs:(75,1)-(93,26)</td>
<td style="text-align: right;">6.8</td>
<td style="text-align: right;">8.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:83:23-96</td>
<td style="text-align: right;">3.8</td>
<td style="text-align: right;">8.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed.fixeds</code></td>
<td style="text-align: left;">Sudoku.hs:105:5-77</td>
<td style="text-align: right;">2.0</td>
<td style="text-align: right;">1.7</td>
</tr>
</tbody>
</table>
</div>
<p>We see a sudden appearance of <code>(>>=)</code> from the <code>Data.Vector.Fusion.Util</code> module at the top of the list, taking more than half of the run time. For more clues, we dive into the detailed profiler report and find this bit:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>pruneGrid</code></td>
<td style="text-align: left;">Sudoku.hs:143:1-27</td>
<td style="text-align: right;">0.0</td>
<td style="text-align: right;">0.0</td>
</tr>
<tr>
<td style="text-align: left;"> <code>fixM</code></td>
<td style="text-align: left;">Sudoku.hs:16:1-65</td>
<td style="text-align: right;">0.1</td>
<td style="text-align: right;">0.0</td>
</tr>
<tr>
<td style="text-align: left;"> <code>fixM.\</code></td>
<td style="text-align: left;">Sudoku.hs:16:27-65</td>
<td style="text-align: right;">0.2</td>
<td style="text-align: right;">0.1</td>
</tr>
<tr>
<td style="text-align: left;"> <code>==</code></td>
<td style="text-align: left;">Data/Vector.hs:287:3-50</td>
<td style="text-align: right;">1.0</td>
<td style="text-align: right;">1.4</td>
</tr>
<tr>
<td style="text-align: left;"> <code>>>=</code></td>
<td style="text-align: left;">Data/Vector/Fusion/Util.hs:36:3-18</td>
<td style="text-align: right;">51.9</td>
<td style="text-align: right;">50.7</td>
</tr>
<tr>
<td style="text-align: left;"> <code>basicUnsafeIndexM</code></td>
<td style="text-align: left;">Data/Vector.hs:278:3-62</td>
<td style="text-align: right;">19.3</td>
<td style="text-align: right;">20.3</td>
</tr>
</tbody>
</table>
</div>
<p>Here, the indentation indicated nesting of operations. We see that both the <code>(>>=)</code> and <code>basicUnsafeIndexM</code> functions—which together take around three-quarter of the run time—are being called from the <code>(==)</code> function in the <code>fixM</code> function<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. It seems like we are checking for equality too many times. Here’s the usage of the <code>fixM</code> for reference:</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCells ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">CellIxs</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>pruneCells grid cellIxs <span class="ot">=</span></span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> fixM (<span class="fu">flip</span> pruneCellsByFixed cellIxs) grid</span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> fixM (<span class="fu">flip</span> pruneCellsByExclusives cellIxs)</span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneGrid ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a>pruneGrid <span class="ot">=</span> fixM pruneGrid'</span></code></pre></div>
<p>In <code>pruneGrid</code>, we run <code>pruneGrid'</code> till the resultant grid settles, that is, the grid computed in a particular iteration is <strong>equal to</strong> the grid in the previous iteration. Interestingly, we do the same thing in <code>pruneCells</code> too. We equate <strong>the whole grid</strong> to check for settling of each block of cells. This is the reason of the slowdown.</p>
<h2 data-track-content data-content-name="one-function-to-prune-them-all" data-content-piece="fast-sudoku-solver-in-haskell-3" id="one-function-to-prune-them-all">One Function to Prune Them All</h2>
<p>Why did we add <code>fixM</code> in the <code>pruneCells</code> function at all? Quoting from the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed#fn6">previous post</a>,</p>
<blockquote>
<p>We need to run <code>pruneCellsByFixed</code> and <code>pruneCellsByExclusives</code> repeatedly using <code>fixM</code> because an unsettled row can lead to wrong solutions.</p>
<p>Imagine a row which just got a <code>9</code> fixed because of <code>pruneCellsByFixed</code>. If we don’t run the function again, the row may be left with one non-fixed cell with a <code>9</code>. When we run this row through <code>pruneCellsByExclusives</code>, it’ll consider the <code>9</code> in the non-fixed cell as a <em>Single</em> and fix it. This will lead to two <code>9</code>s in the same row, causing the solution to fail.</p>
</blockquote>
<p>So the reason we added <code>fixM</code> is that, we run the two pruning strategies one-after-another. That way, they see the cells in the same block in different states. If we were to merge the two pruning functions into a single one such that they work in lockstep, we would not need to run <code>fixM</code> at all!</p>
<p>With this idea, we rewrite <code>pruneCells</code> as a single function:</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCells ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">CellIxs</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a>pruneCells grid cellIxs <span class="ot">=</span> Control.Monad.foldM pruneCell grid cellIxs</span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> cells <span class="ot">=</span> <span class="fu">map</span> (grid <span class="op">!</span>) cellIxs</span>
<span id="cb25-5"><a href="#cb25-5" aria-hidden="true" tabindex="-1"></a> exclusives <span class="ot">=</span> exclusivePossibilities cells</span>
<span id="cb25-6"><a href="#cb25-6" aria-hidden="true" tabindex="-1"></a> allExclusives <span class="ot">=</span> setBits Data.Bits.zeroBits exclusives</span>
<span id="cb25-7"><a href="#cb25-7" aria-hidden="true" tabindex="-1"></a> fixeds <span class="ot">=</span> setBits Data.Bits.zeroBits [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> cells]</span>
<span id="cb25-8"><a href="#cb25-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-9"><a href="#cb25-9" aria-hidden="true" tabindex="-1"></a> pruneCell g i <span class="ot">=</span></span>
<span id="cb25-10"><a href="#cb25-10" aria-hidden="true" tabindex="-1"></a> pruneCellByFixed g (i, g <span class="op">!</span> i) <span class="op">>>=</span> \g' <span class="ot">-></span> pruneCellByExclusives g' (i, g' <span class="op">!</span> i)</span>
<span id="cb25-11"><a href="#cb25-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-12"><a href="#cb25-12" aria-hidden="true" tabindex="-1"></a> pruneCellByFixed g (_, <span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-13"><a href="#cb25-13" aria-hidden="true" tabindex="-1"></a> pruneCellByFixed g (i, <span class="dt">Possible</span> xs)</span>
<span id="cb25-14"><a href="#cb25-14" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> xs' <span class="op">==</span> xs <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-15"><a href="#cb25-15" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="fu">flip</span> (replaceCell i) g <span class="op"><$></span> makeCell xs'</span>
<span id="cb25-16"><a href="#cb25-16" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-17"><a href="#cb25-17" aria-hidden="true" tabindex="-1"></a> xs' <span class="ot">=</span> xs <span class="op">Data.Bits..&.</span> Data.Bits.complement fixeds</span>
<span id="cb25-18"><a href="#cb25-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-19"><a href="#cb25-19" aria-hidden="true" tabindex="-1"></a> pruneCellByExclusives g (_, <span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-20"><a href="#cb25-20" aria-hidden="true" tabindex="-1"></a> pruneCellByExclusives g (i, <span class="dt">Possible</span> xs)</span>
<span id="cb25-21"><a href="#cb25-21" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">null</span> exclusives <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-22"><a href="#cb25-22" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> intersection <span class="op">==</span> xs <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-23"><a href="#cb25-23" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> intersection <span class="ot">`elem`</span> exclusives <span class="ot">=</span></span>
<span id="cb25-24"><a href="#cb25-24" aria-hidden="true" tabindex="-1"></a> <span class="fu">flip</span> (replaceCell i) g <span class="op"><$></span> makeCell intersection</span>
<span id="cb25-25"><a href="#cb25-25" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb25-26"><a href="#cb25-26" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb25-27"><a href="#cb25-27" aria-hidden="true" tabindex="-1"></a> intersection <span class="ot">=</span> xs <span class="op">Data.Bits..&.</span> allExclusives</span></code></pre></div>
<p>We have merged the two pruning functions almost blindly. The important part here is the nested <code>pruneCell</code> function which uses monadic bind <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Control-Monad.html#v:-62--62--61-" target="_blank" rel="noopener"><code>(>>=)</code></a> to ensure that cells fixed in the first step are seen by the next step. Merging the two functions ensures that both strategies will see same <em>Exclusives</em> and <em>Fixeds</em>, thereby running in lockstep.</p>
<p>Let’s try it out:</p>
<pre class="plain"><code>$ stack build
$ cat sudoku17.txt | time stack exec sudoku > /dev/null
57.67 real 57.12 user 0.46 sys</code></pre>
<p>Ah, now it’s faster than the list-based implementation by 1.2x<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. Let’s see what the profiler says:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:82:23-96</td>
<td style="text-align: right;">15.7</td>
<td style="text-align: right;">33.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells</code></td>
<td style="text-align: left;">Sudoku.hs:(101,1)-(126,53)</td>
<td style="text-align: right;">9.6</td>
<td style="text-align: right;">6.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.pruneCell</code></td>
<td style="text-align: left;">Sudoku.hs:(108,5)-(109,83)</td>
<td style="text-align: right;">9.5</td>
<td style="text-align: right;">2.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>basicUnsafeIndexM</code></td>
<td style="text-align: left;">Data/Vector.hs:278:3-62</td>
<td style="text-align: right;">9.4</td>
<td style="text-align: right;">0.5</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.pruneCell.\</code></td>
<td style="text-align: left;">Sudoku.hs:109:48-83</td>
<td style="text-align: right;">7.6</td>
<td style="text-align: right;">2.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.cells</code></td>
<td style="text-align: left;">Sudoku.hs:103:5-40</td>
<td style="text-align: right;">7.1</td>
<td style="text-align: right;">10.9</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\</code></td>
<td style="text-align: left;">Sudoku.hs:87:64-96</td>
<td style="text-align: right;">3.5</td>
<td style="text-align: right;">3.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.Map.filter1</code></td>
<td style="text-align: left;">Sudoku.hs:86:35-61</td>
<td style="text-align: right;">3.0</td>
<td style="text-align: right;">0.6</td>
</tr>
<tr>
<td style="text-align: left;"><code>>>=</code></td>
<td style="text-align: left;">Data/Vector/Fusion/Util.hs:36:3-18</td>
<td style="text-align: right;">2.8</td>
<td style="text-align: right;">2.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>replaceCell</code></td>
<td style="text-align: left;">Sudoku.hs:59:1-45</td>
<td style="text-align: right;">2.5</td>
<td style="text-align: right;">1.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>EP.filter</code></td>
<td style="text-align: left;">Sudoku.hs:78:30-54</td>
<td style="text-align: right;">2.4</td>
<td style="text-align: right;">3.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>primitive</code></td>
<td style="text-align: left;">Control/Monad/Primitive.hs:195:3-16</td>
<td style="text-align: right;">2.3</td>
<td style="text-align: right;">6.5</td>
</tr>
</tbody>
</table>
</div>
<p>The double nested anonymous function mentioned before is still the biggest culprit but <code>fixM</code> has disappeared from the list. Let’s tackle <code>exclusivePossibilities</code> now.</p>
<h2 data-track-content data-content-name="rise-of-the-mutables" data-content-piece="fast-sudoku-solver-in-haskell-3" id="rise-of-the-mutables">Rise of the Mutables</h2>
<p>Here’s <code>exclusivePossibilities</code> again for reference:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">exclusivePossibilities ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> [<span class="dt">Data.Word.Word16</span>]</span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>exclusivePossibilities row <span class="ot">=</span></span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a> row</span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> <span class="fu">zip</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> <span class="fu">filter</span> (isPossible <span class="op">.</span> <span class="fu">snd</span>)</span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Data.List.foldl'</span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a> (\acc <span class="op">~</span>(i, <span class="dt">Possible</span> xs) <span class="ot">-></span></span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a> Data.List.foldl'</span>
<span id="cb27-9"><a href="#cb27-9" aria-hidden="true" tabindex="-1"></a> (\acc' n <span class="ot">-></span> <span class="kw">if</span> Data.Bits.testBit xs n</span>
<span id="cb27-10"><a href="#cb27-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">then</span> Map.insertWith prepend n [i] acc'</span>
<span id="cb27-11"><a href="#cb27-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">else</span> acc')</span>
<span id="cb27-12"><a href="#cb27-12" aria-hidden="true" tabindex="-1"></a> acc</span>
<span id="cb27-13"><a href="#cb27-13" aria-hidden="true" tabindex="-1"></a> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>])</span>
<span id="cb27-14"><a href="#cb27-14" aria-hidden="true" tabindex="-1"></a> Map.empty</span>
<span id="cb27-15"><a href="#cb27-15" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.filter ((<span class="op"><</span> <span class="dv">4</span>) <span class="op">.</span> <span class="fu">length</span>)</span>
<span id="cb27-16"><a href="#cb27-16" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.foldlWithKey'(\acc x is <span class="ot">-></span> Map.insertWith prepend is [x] acc) Map.empty</span>
<span id="cb27-17"><a href="#cb27-17" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.filterWithKey (\is xs <span class="ot">-></span> <span class="fu">length</span> is <span class="op">==</span> <span class="fu">length</span> xs)</span>
<span id="cb27-18"><a href="#cb27-18" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.elems</span>
<span id="cb27-19"><a href="#cb27-19" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> <span class="fu">map</span> (Data.List.foldl' Data.Bits.setBit Data.Bits.zeroBits)</span>
<span id="cb27-20"><a href="#cb27-20" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb27-21"><a href="#cb27-21" aria-hidden="true" tabindex="-1"></a> prepend <span class="op">~</span>[y] ys <span class="ot">=</span> y<span class="op">:</span>ys</span></code></pre></div>
<p>Let’s zoom into lines 6–14. Here, we do a fold with a nested fold over the non-fixed cells of the given block to accumulate the mapping from the digits to the indices of the cells they occur in. We use a <a href="https://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Map-Strict.html" target="_blank" rel="noopener"><code>Data.Map.Strict</code></a> map as the accumulator. If a digit is not present in the map as a key then we add a singleton list containing the corresponding cell index as the value. If the digit is already present in the map then we prepend the cell index to the list of indices for the digit. So we end up “mutating” the map repeatedly.</p>
<p>Of course, it’s not actual mutation because the map data structure we are using is immutable. Each change to the map instance creates a new copy with the addition, which we thread through the fold operation, and we get the final copy at the end. This may be the reason of the slowness in this section of the code.</p>
<p>What if, instead of using an immutable data structure for this, we used a mutable one? But how can we do that when we know that Haskell is a pure language? Purity means that all code must be <a href="https://en.wikipedia.org/wiki/Referential_transparency" target="_blank" rel="noopener">referentially transparent</a>, and mutability certainly isn’t. It turns out, there is an escape hatch to mutability in Haskell. Quoting the relevant section from the book <a href="https://book.realworldhaskell.org/read/advanced-library-design-building-a-bloom-filter.html#id680273" target="_blank" rel="noopener">Real World Haskell</a>:</p>
<blockquote>
<p>Haskell provides a special monad, named <code>ST</code>, which lets us work safely with mutable state. Compared to the <code>State</code> monad, it has some powerful added capabilities.</p>
<ul>
<li>We can <em>thaw</em> an immutable array to give a mutable array; modify the mutable array in place; and freeze a new immutable array when we are done.</li>
<li>We have the ability to use <em>mutable references</em>. This lets us implement data structures that we can modify after construction, as in an imperative language. This ability is vital for some imperative data structures and algorithms, for which similarly efficient purely functional alternatives have not yet been discovered.</li>
</ul>
</blockquote>
<p>So if we use a mutable map in the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Control-Monad-ST.html" target="_blank" rel="noopener"><code>ST</code> monad</a>, we may be able to get rid of this bottleneck. But, we can actually do better! Since the keys of our map are digits <code>1</code>–<code>9</code>, we can use a <a href="https://hackage.haskell.org/package/vector-0.12.0.1/docs/Data-Vector-Mutable.html" target="_blank" rel="noopener">mutable vector</a> to store the indices. In fact, we can go one step even further and store the indices as a BitSet as <code>Word16</code> because they also range from 1 to 9, and are unique for a block. This lets us use an <a href="https://hackage.haskell.org/package/vector-0.12.0.1/docs/Data-Vector-Unboxed-Mutable.html" target="_blank" rel="noopener">unboxed mutable vector</a>. What is <em>unboxing</em> you ask? Quoting from the <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/primitives.html#unboxed-types" target="_blank" rel="noopener">GHC docs</a>:</p>
<blockquote>
<p>Most types in GHC are boxed, which means that values of that type are represented by a pointer to a heap object. The representation of a Haskell <code>Int</code>, for example, is a two-word heap object. An unboxed type, however, is represented by the value itself, no pointers or heap allocation are involved.</p>
</blockquote>
<p>When combined with vector, unboxing of values means the whole vector is stored as single byte array, avoiding pointer redirections completely. This is more memory efficient and allows better usage of caches<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>. Let’s rewrite <code>exclusivePossibilities</code> using <code>ST</code> and unboxed mutable vectors.</p>
<p>First we write the core of this operation, the function <code>cellIndicesList</code> which take a list of cells and returns the digit to cell indices mapping. The mapping is returned as a list. The zeroth value in this list is the indices of the cells which have <code>1</code> as a possible digit, and so on. The indices themselves are packed as BitSets. If the bit 1 is set then the first cell has a particular digit. Let’s say it returns <code>[0,688,54,134,0,654,652,526,670]</code>. In 10-bit binary it is:</p>
<pre class="plain"><code>[0000000000, 1010110000, 0000110110, 0010000110, 0000000000, 1010001110, 1010001100, 1000001110, 1010011110]</code></pre>
<p>We can arrange it in a table for further clarity:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: right;">Digits</th>
<th style="text-align: right;">Cell 9</th>
<th style="text-align: right;">Cell 8</th>
<th style="text-align: right;">Cell 7</th>
<th style="text-align: right;">Cell 6</th>
<th style="text-align: right;">Cell 5</th>
<th style="text-align: right;">Cell 4</th>
<th style="text-align: right;">Cell 3</th>
<th style="text-align: right;">Cell 2</th>
<th style="text-align: right;">Cell 1</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: right;">2</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: right;">4</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: right;">5</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: right;">6</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: right;">7</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
</tr>
<tr>
<td style="text-align: right;">8</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: right;">9</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">0</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1</td>
</tr>
</tbody>
</table>
</div>
<p>If the value of the intersection of a particular digit and a particular cell index in the table is set to 1, then the digit is a possibility in the cell, else it is not. Here’s the code:</p>
<div class="sourceCode" id="cb29" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb29-1"><a href="#cb29-1" aria-hidden="true" tabindex="-1"></a><span class="ot">cellIndicesList ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> [<span class="dt">Data.Word.Word16</span>]</span>
<span id="cb29-2"><a href="#cb29-2" aria-hidden="true" tabindex="-1"></a>cellIndicesList cells <span class="ot">=</span></span>
<span id="cb29-3"><a href="#cb29-3" aria-hidden="true" tabindex="-1"></a> Data.Vector.Unboxed.toList <span class="op">$</span> Control.Monad.ST.runST <span class="op">$</span> <span class="kw">do</span></span>
<span id="cb29-4"><a href="#cb29-4" aria-hidden="true" tabindex="-1"></a> vec <span class="ot"><-</span> Data.Vector.Unboxed.Mutable.replicate <span class="dv">9</span> Data.Bits.zeroBits</span>
<span id="cb29-5"><a href="#cb29-5" aria-hidden="true" tabindex="-1"></a> ref <span class="ot"><-</span> Data.STRef.newSTRef (<span class="dv">1</span><span class="ot"> ::</span> <span class="dt">Int</span>)</span>
<span id="cb29-6"><a href="#cb29-6" aria-hidden="true" tabindex="-1"></a> Control.Monad.forM_ cells <span class="op">$</span> \cell <span class="ot">-></span> <span class="kw">do</span></span>
<span id="cb29-7"><a href="#cb29-7" aria-hidden="true" tabindex="-1"></a> i <span class="ot"><-</span> Data.STRef.readSTRef ref</span>
<span id="cb29-8"><a href="#cb29-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> cell <span class="kw">of</span></span>
<span id="cb29-9"><a href="#cb29-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Fixed</span> _ <span class="ot">-></span> <span class="fu">return</span> ()</span>
<span id="cb29-10"><a href="#cb29-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">Possible</span> xs <span class="ot">-></span> Control.Monad.forM_ [<span class="dv">0</span><span class="op">..</span><span class="dv">8</span>] <span class="op">$</span> \d <span class="ot">-></span></span>
<span id="cb29-11"><a href="#cb29-11" aria-hidden="true" tabindex="-1"></a> Control.Monad.when (Data.Bits.testBit xs (d<span class="op">+</span><span class="dv">1</span>)) <span class="op">$</span></span>
<span id="cb29-12"><a href="#cb29-12" aria-hidden="true" tabindex="-1"></a> Data.Vector.Unboxed.Mutable.unsafeModify vec (<span class="ot">`Data.Bits.setBit`</span> i) d</span>
<span id="cb29-13"><a href="#cb29-13" aria-hidden="true" tabindex="-1"></a> Data.STRef.writeSTRef ref (i<span class="op">+</span><span class="dv">1</span>)</span>
<span id="cb29-14"><a href="#cb29-14" aria-hidden="true" tabindex="-1"></a> Data.Vector.Unboxed.unsafeFreeze vec</span></code></pre></div>
<p>The whole mutable code runs inside the <code>runST</code> function. <code>runST</code> take an operation in <code>ST</code> monad and executes it, making sure that the mutable references created inside it cannot escape the scope of <code>runST</code>. This is done using a type-system trickery called <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/exts/rank_polymorphism.html" target="_blank" rel="noopener">Rank-2 types</a>.</p>
<p>Inside the <code>ST</code> operation, we start with creating a mutable vector of <code>Word16</code>s of size 9 with all its values initially set to zero. We also initialize a mutable reference to keep track of the cell index we are on. Then we run two nested for loops, going over each cell and each digit <code>1</code>–<code>9</code>, setting the right bit of the right index of the mutable vector. During this, we mutate the vector directly using the <code>Data.Vector.Unboxed.Mutable.unsafeModify</code> function. At the end of the <code>ST</code> operation, we freeze the mutable vector to return an immutable version of it. Outside <code>runST</code>, we convert the immutable vector to a list. Notice how this code is quite similar to how we’d write it in <a href="https://en.wikipedia.org/wiki/Imperative_programming" target="_blank" rel="noopener">imperative programming</a> languages like C or Java<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a>.</p>
<p>It is easy to use this function now to rewrite <code>exclusivePossibilities</code>:</p>
<div class="sourceCode" id="cb30" data-lang="diff"><pre class="sourceCode numberSource diff"><code class="sourceCode diff"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a> exclusivePossibilities :: [Cell] -> [Data.Word.Word16]</span>
<span id="cb30-2"><a href="#cb30-2" aria-hidden="true" tabindex="-1"></a> exclusivePossibilities row =</span>
<span id="cb30-3"><a href="#cb30-3" aria-hidden="true" tabindex="-1"></a> row</span>
<span id="cb30-4"><a href="#cb30-4" aria-hidden="true" tabindex="-1"></a><span class="st">- & zip [1..9]</span></span>
<span id="cb30-5"><a href="#cb30-5" aria-hidden="true" tabindex="-1"></a><span class="st">- & filter (isPossible . snd)</span></span>
<span id="cb30-6"><a href="#cb30-6" aria-hidden="true" tabindex="-1"></a><span class="st">- & Data.List.foldl'</span></span>
<span id="cb30-7"><a href="#cb30-7" aria-hidden="true" tabindex="-1"></a><span class="st">- (\acc ~(i, Possible xs) -></span></span>
<span id="cb30-8"><a href="#cb30-8" aria-hidden="true" tabindex="-1"></a><span class="st">- Data.List.foldl'</span></span>
<span id="cb30-9"><a href="#cb30-9" aria-hidden="true" tabindex="-1"></a><span class="st">- (\acc' n -> if Data.Bits.testBit xs n</span></span>
<span id="cb30-10"><a href="#cb30-10" aria-hidden="true" tabindex="-1"></a><span class="st">- then Map.insertWith prepend n [i] acc'</span></span>
<span id="cb30-11"><a href="#cb30-11" aria-hidden="true" tabindex="-1"></a><span class="st">- else acc')</span></span>
<span id="cb30-12"><a href="#cb30-12" aria-hidden="true" tabindex="-1"></a><span class="st">- acc</span></span>
<span id="cb30-13"><a href="#cb30-13" aria-hidden="true" tabindex="-1"></a><span class="st">- [1..9])</span></span>
<span id="cb30-14"><a href="#cb30-14" aria-hidden="true" tabindex="-1"></a><span class="st">- Map.empty</span></span>
<span id="cb30-15"><a href="#cb30-15" aria-hidden="true" tabindex="-1"></a><span class="va">+ & cellIndicesList</span></span>
<span id="cb30-16"><a href="#cb30-16" aria-hidden="true" tabindex="-1"></a><span class="va">+ & zip [1..9]</span></span>
<span id="cb30-17"><a href="#cb30-17" aria-hidden="true" tabindex="-1"></a><span class="st">- & Map.filter ((< 4) . length)</span></span>
<span id="cb30-18"><a href="#cb30-18" aria-hidden="true" tabindex="-1"></a><span class="st">- & Map.foldlWithKey' (\acc x is -> Map.insertWith prepend is [x] acc) Map.empty</span></span>
<span id="cb30-19"><a href="#cb30-19" aria-hidden="true" tabindex="-1"></a><span class="st">- & Map.filterWithKey (\is xs -> length is == length xs)</span></span>
<span id="cb30-20"><a href="#cb30-20" aria-hidden="true" tabindex="-1"></a><span class="va">+ & filter (\(_, is) -> let p = Data.Bits.popCount is in p > 0 && p < 4)</span></span>
<span id="cb30-21"><a href="#cb30-21" aria-hidden="true" tabindex="-1"></a><span class="va">+ & Data.List.foldl' (\acc (x, is) -> Map.insertWith prepend is [x] acc) Map.empty</span></span>
<span id="cb30-22"><a href="#cb30-22" aria-hidden="true" tabindex="-1"></a><span class="va">+ & Map.filterWithKey (\is xs -> Data.Bits.popCount is == length xs)</span></span>
<span id="cb30-23"><a href="#cb30-23" aria-hidden="true" tabindex="-1"></a> & Map.elems</span>
<span id="cb30-24"><a href="#cb30-24" aria-hidden="true" tabindex="-1"></a> & map (Data.List.foldl' Data.Bits.setBit Data.Bits.zeroBits)</span>
<span id="cb30-25"><a href="#cb30-25" aria-hidden="true" tabindex="-1"></a> where</span>
<span id="cb30-26"><a href="#cb30-26" aria-hidden="true" tabindex="-1"></a> prepend ~[y] ys = y:ys</span></code></pre></div>
<p>We replace the nested two-fold operation with <code>cellIndicesList</code>. Then we replace some map related function with the corresponding list ones because <code>cellIndicesList</code> returns a list. We also replace the <code>length</code> function call on cell indices with <code>Data.Bits.popCount</code> function call as the indices are represented as <code>Word16</code> now.</p>
<p>That is it. Let’s build and run it now:</p>
<pre class="plain"><code>$ stack build
$ cat sudoku17.txt | time stack exec sudoku > /dev/null
35.04 real 34.84 user 0.24 sys</code></pre>
<p>That’s a 1.6x speedup over the map-and-fold based version. Let’s check what the profiler has to say:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Centre</th>
<th style="text-align: left;">Src</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>cellIndicesList.\.\</code></td>
<td style="text-align: left;">Sudoku.hs:(88,11)-(89,81)</td>
<td style="text-align: right;">10.7</td>
<td style="text-align: right;">6.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>primitive</code></td>
<td style="text-align: left;">Control/Monad/Primitive.hs:195:3-16</td>
<td style="text-align: right;">7.9</td>
<td style="text-align: right;">6.9</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells</code></td>
<td style="text-align: left;">Sudoku.hs:(113,1)-(138,53)</td>
<td style="text-align: right;">7.5</td>
<td style="text-align: right;">6.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>cellIndicesList</code></td>
<td style="text-align: left;">Sudoku.hs:(79,1)-(91,40)</td>
<td style="text-align: right;">7.4</td>
<td style="text-align: right;">10.1</td>
</tr>
<tr>
<td style="text-align: left;"><code>basicUnsafeIndexM</code></td>
<td style="text-align: left;">Data/Vector.hs:278:3-62</td>
<td style="text-align: right;">7.3</td>
<td style="text-align: right;">0.5</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.pruneCell</code></td>
<td style="text-align: left;">Sudoku.hs:(120,5)-(121,83)</td>
<td style="text-align: right;">6.8</td>
<td style="text-align: right;">2.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">Sudoku.hs:(94,1)-(104,26)</td>
<td style="text-align: right;">6.5</td>
<td style="text-align: right;">9.7</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.pruneCell.\</code></td>
<td style="text-align: left;">Sudoku.hs:121:48-83</td>
<td style="text-align: right;">6.1</td>
<td style="text-align: right;">2.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>cellIndicesList.\</code></td>
<td style="text-align: left;">Sudoku.hs:(83,42)-(90,37)</td>
<td style="text-align: right;">5.5</td>
<td style="text-align: right;">3.5</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCells.cells</code></td>
<td style="text-align: left;">Sudoku.hs:115:5-40</td>
<td style="text-align: right;">5.0</td>
<td style="text-align: right;">10.4</td>
</tr>
</tbody>
</table>
</div>
<p>The run time is spread quite evenly over all the functions now and there are no hotspots anymore. We stop optimizating at this point<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a>. Let’s see how far we have come up.</p>
<h2 data-track-content data-content-name="comparison-of-implementations" data-content-piece="fast-sudoku-solver-in-haskell-3" id="comparison-of-implementations">Comparison of Implementations</h2>
<p>Below is a table showing the speedups we got with each new implementation:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Implementation</th>
<th style="text-align: right;">Run Time (s)</th>
<th style="text-align: right;">Incremental Speedup</th>
<th style="text-align: right;">Cumulative Speedup</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">Simple</td>
<td style="text-align: right;">47450</td>
<td style="text-align: right;">1x</td>
<td style="text-align: right;">1x</td>
</tr>
<tr>
<td style="text-align: left;">Exclusive Pruning</td>
<td style="text-align: right;">258.97</td>
<td style="text-align: right;">183.23x</td>
<td style="text-align: right;">183x</td>
</tr>
<tr>
<td style="text-align: left;">BitSet</td>
<td style="text-align: right;">69.44</td>
<td style="text-align: right;">3.73x</td>
<td style="text-align: right;">683x</td>
</tr>
<tr>
<td style="text-align: left;">Vector</td>
<td style="text-align: right;">57.67</td>
<td style="text-align: right;">1.20x</td>
<td style="text-align: right;">823x</td>
</tr>
<tr>
<td style="text-align: left;">Mutable Vector</td>
<td style="text-align: right;">35.04</td>
<td style="text-align: right;">1.65x</td>
<td style="text-align: right;">1354x</td>
</tr>
</tbody>
</table>
</div>
<p>The first improvement over the simple solution got us the most major speedup of 183x. After that, we followed the profiler, fixing bottlenecks by using the right data structures. We got quite significant speedup over the naive list-based solution, leading to drop in the run time from 259 seconds to 35 seconds. In total, we have done more than a thousand times improvement in the run time since the first solution!</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="fast-sudoku-solver-in-haskell-3" id="conclusion">Conclusion</h2>
<p>In this post, we improved upon our list-based Sudoku solution from the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">last time</a>. We profiled the code at each step, found the bottlenecks and fixed them by choosing the right data structure for the case. We ended up using BitSets and Vectors—both immutable and mutable varieties—for the different parts of the code. Finally, we sped up our program by 7.4 times. Can we go even faster? How about using all those other CPU cores which have been lying idle? Come back for the next post in this series where we’ll explore the parallel programming facilities in Haskell. The code till now is available <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/4a9a1531d5780e7abc7d5ab2a26dccbf34382031?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>All the runs were done on my MacBook Pro from 2014 with 2.2 GHz Intel Core i7 CPU and 16 GB memory.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>A lot of the code in this post references the code from the previous posts, including showing diffs. So, please read the previous posts if you have not already done so.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>Notice the British English spelling of the word “Centre”. GHC was originally developed in <a href="https://en.wikipedia.org/wiki/University_of_Glasgow" target="_blank" rel="noopener">University of Glasgow</a> in Scotland.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>The code for the BitSet based implementation can be found <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/5a3044e09cd86dd6154bc50760095c4b38c48c6a?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p><a href="https://www.schoolofhaskell.com/user/commercial/content/vector" target="_blank" rel="noopener">This article</a> on School of Haskell goes into details about performance of vectors vs. lists. There are also <a href="https://github.com/haskell-perf/sequences/blob/master/README.md" target="_blank" rel="noopener">these</a> benchmarks for sequence data structures in Haskell: lists, vectors, seqs, etc.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>We see Haskell’s laziness at work here. In the code for the <code>fixM</code> function, the <code>(==)</code> function is nested inside the <code>(>>=)</code> function, but because of laziness, they are actually evaluated in the reverse order. The evaluation of parameters for the <code>(==)</code> function causes the <code>(>>=)</code> function to be evaluated.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>The code for the vector based implementation can be found <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/a320a7874c6fa0c39665151cc8e073532cc750a1?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8"><p>Unboxed vectors have some <a href="https://hackage.haskell.org/package/vector-0.12.0.1/docs/Data-Vector-Unboxed.html#t:Unbox" target="_blank" rel="noopener">restrictions</a> on the kind of values that can be put into them but <code>Word16</code> already follows those restrictions so we are good.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9"><p>Haskell can be a pretty good imperative programming language using the <code>ST</code> monad. <a href="https://vaibhavsagar.com/blog/2017/05/29/imperative-haskell/" target="_blank" rel="noopener">This article</a> shows how to implement some algorithms which require mutable data structures in Haskell.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn10"><p>The code for the mutable vector based implementation can be found <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/4a9a1531d5780e7abc7d5ab2a26dccbf34382031?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.<a href="#fnref10" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">A Simple Solution</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">A 200x Faster Solution</a>
</li>
<li>
<strong>Picking the Right Data Structures</strong> 👈
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2018-08-13T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/">previous part</a> in this series of posts, we optimized the simple Sudoku solver by implementing a new strategy to prune cells, and were able to achieve a speedup of almost 200x. Afterwards, we profiled the solution and found that there were bottlenecks in the program, leading to a slowdown. In this post, we are going to follow the profiler and use the right <em>Data Structures</em> to improve the solution further and make it <strong>faster</strong>.</p>
https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/ Fast Sudoku Solver in Haskell #2: A 200x Faster Solution 2018-07-11T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p>In the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">first part</a> of this series of posts, we wrote a simple <a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> solver in <a href="https://www.haskell.org/" target="_blank" rel="noopener">Haskell</a>. It used a <a href="https://en.wikipedia.org/wiki/Constraint_satisfaction_problem" target="_blank" rel="noopener">constraint satisfaction</a> algorithm with <a href="https://en.wikipedia.org/wiki/Depth-first_search" target="_blank" rel="noopener">backtracking</a>. The solution worked well but was very slow. In this post, we are going to improve it and make it <strong>fast</strong>.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">A Simple Solution</a>
</li>
<li>
<strong>A 200x Faster Solution</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed">Picking the Right Data Structures</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#quick-recap">Quick Recap</a></li><li><a href="#constraints-and-corollaries">Constraints and Corollaries</a></li><li><a href="#singles-twins-and-triplets">Singles, Twins and Triplets</a></li><li><a href="#a-little-forward-a-little-backward">A Little Forward, a Little Backward</a></li><li><a href="#pruning-the-cells-exclusively">Pruning the Cells, Exclusively</a></li><li><a href="#faster-than-a-speeding-bullet">Faster than a Speeding Bullet!</a><ol><li><a href="#update">Update</a></li></ol></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="quick-recap" data-content-piece="fast-sudoku-solver-in-haskell-2" id="quick-recap">Quick Recap</h2>
<p><a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> is a number placement puzzle. It consists of a 9x9 grid which is to be filled with digits from 1 to 9 such that each row, each column and each of the nine 3x3 sub-grids contain all the digits. Some of the cells of the grid come pre-filled and the player has to fill the rest.</p>
<p>In the previous post, we implemented a simple Sudoku solver without paying much attention to its performance characteristics. We ran<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> some of <a href="https://abhinavsarkar.net/files/sudoku17.txt.bz2?mtm_campaign=feed">17-clue puzzles</a><a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> through our program to see how fast it was:</p>
<pre class="plain"><code>$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
116.70 real 198.09 user 94.46 sys</code></pre>
<p>So, it took about 117 seconds to solve one hundred puzzles. At this speed, it would take about 16 hours to solve all the 49151 puzzles contained in the file. This is way too slow. We need to find ways to make it faster. Let’s go back to the drawing board.</p>
<h2 data-track-content data-content-name="constraints-and-corollaries" data-content-piece="fast-sudoku-solver-in-haskell-2" id="constraints-and-corollaries">Constraints and Corollaries</h2>
<p>In a Sudoku puzzle, we have a partially filled 9x9 grid which we have to fill completely while following the constraints of the game.</p>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 201 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-30pct" style="--image-aspect-ratio: 0.9617224880382775" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku01.svg" alt="A sample puzzle"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku01.svg" class="w-100pct nolink mw-30pct" alt="A sample puzzle"></img></noscript>
<figcaption>A sample puzzle</figcaption>
</figure>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 201 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-30pct" style="--image-aspect-ratio: 0.9617224880382775" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku02.svg" alt="And its solution"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku02.svg" class="w-100pct nolink mw-30pct" alt="And its solution"></img></noscript>
<figcaption>And its solution</figcaption>
</figure>
<p>Earlier, we followed a simple pruning algorithm which removed all the solved (or <em>fixed</em>) digits from neighbours of the fixed cells. We repeated the pruning till the fixed and non-fixed values in the grid stopped changing (or the grid <em>settled</em>). Here’s an example of a grid before pruning:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku1.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku1.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>And here’s the same grid when it settles after repeated pruning:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku4.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku4.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>We see how the possibilities conflicting with the fixed values are removed. We also see how some of the non-fixed cells turn into fixed ones as all their other possible values get eliminated.</p>
<p>This simple strategy follows directly from the constraints of Sudoku. But, are there more complex strategies which are implied indirectly?</p>
<h2 data-track-content data-content-name="singles-twins-and-triplets" data-content-piece="fast-sudoku-solver-in-haskell-2" id="singles-twins-and-triplets">Singles, Twins and Triplets</h2>
<p>Let’s have a look at this sample row captured from a solution in progress:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line1.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line1.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>Notice how the sixth cell is the only one with <code>1</code> as a possibility in it. It is obvious that we should fix the sixth cell to <code>1</code> as we cannot place <code>1</code> in any other cell in the row. Let’s call this the <em>Singles</em><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> scenario.</p>
<p>But, our current solution will not fix the sixth cell to <code>1</code> till one of these cases arise:</p>
<ol type="a">
<li>all other possibilities of the cell are pruned away, or,</li>
<li>the cell is chosen as pivot in the <code>nextGrids</code> function and <code>1</code> is chosen as the value to fix.</li>
</ol>
<p>This may take very long and lead to a longer solution time. Let’s assume that we recognize the Singles scenario while pruning cells and fix the cell to <code>1</code> right then. That would cut down the search tree by a lot and make the solution much faster.</p>
<p>It turns out, we can generalize this pattern. Let’s check out this sample row from middle of a solution:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line2.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line2.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>It is a bit difficult to notice with the naked eye but there’s something special here too. The digits <code>5</code> and <code>7</code> occur only in the third and the ninth cells. Though they are accompanied by other digits in those cells, they are not present in any other cells. This means, we can place <code>5</code> and <code>7</code> either in the third or the ninth cell and no other cells. This implies that we can prune the third and ninth cells to have only <code>5</code> and <code>7</code> like this:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line3.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line3.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>This is the <em>Twins</em> scenario. As we can imagine, this pattern extends to groups of three digits and beyond. When three digits can be found only in three cells in a block, it is the <em>Triplets</em> scenario, as in the example below:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line4.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line4.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>In this case, the triplet digits are <code>3</code>, <code>8</code> and <code>9</code>. And as before, we can prune the block by fixing these digits in their cells:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line5.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line5.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>Let’s call these three scenarios <em>Exclusives</em> in general.</p>
<p>We can extend this to <em>Quadruplets</em> scenario and further. But such scenarios occur rarely in a 9x9 Sudoku puzzle. Trying to find them may end up being more computationally expensive than the benefit we may get in solution time speedup by finding them.</p>
<p>Now that we have discovered these new strategies to prune cells, let’s implement them in Haskell.</p>
<h2 data-track-content data-content-name="a-little-forward-a-little-backward" data-content-piece="fast-sudoku-solver-in-haskell-2" id="a-little-forward-a-little-backward">A Little Forward, a Little Backward</h2>
<p>We can implement the three new strategies to prune cells as one function for each. However, we can actually implement all these strategies in a single function. But, this function is a bit more complex than the previous pruning function. So first, let’s try to understand its working using tables. Let’s take this sample row:</p>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 49'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 18.79591836734694" data-src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line6.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-2/sudoku-line6.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<p>First, we make a table mapping the digits to the cells in which they occur, excluding the fixed cells:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Digit</th>
<th style="text-align: right;">Cells</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">2</td>
<td style="text-align: right;">6, 8, 9</td>
</tr>
<tr>
<td style="text-align: left;">3</td>
<td style="text-align: right;">6, 8, 9</td>
</tr>
<tr>
<td style="text-align: left;">4</td>
<td style="text-align: right;">1</td>
</tr>
<tr>
<td style="text-align: left;">6</td>
<td style="text-align: right;">1, 4, 6, 7, 8, 9</td>
</tr>
<tr>
<td style="text-align: left;">8</td>
<td style="text-align: right;">6, 8, 9</td>
</tr>
<tr>
<td style="text-align: left;">9</td>
<td style="text-align: right;">1, 4, 6, 7, 8, 9</td>
</tr>
</tbody>
</table>
</div>
<p>Then, we flip this table and collect all the digits that occur in the same set of cells:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cells</th>
<th style="text-align: right;">Digits</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">1</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td style="text-align: left;">6, 8, 9</td>
<td style="text-align: right;">2, 3, 8</td>
</tr>
<tr>
<td style="text-align: left;">1, 4, 6, 7, 8, 9</td>
<td style="text-align: right;">6, 9</td>
</tr>
</tbody>
</table>
</div>
<p>And finally, we remove the rows of the table in which the count of the cells is not the same as the count of the digits:</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cells</th>
<th style="text-align: right;">Digits</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;">1</td>
<td style="text-align: right;">4</td>
</tr>
<tr>
<td style="text-align: left;">6, 8, 9</td>
<td style="text-align: right;">2, 3, 8</td>
</tr>
</tbody>
</table>
</div>
<p>Voilà! We have found a Single <code>4</code> and a set of Triplets <code>2</code>, <code>3</code> and <code>8</code>. You can go over the puzzle row and verify that this indeed is the case.</p>
<p>Translating this logic to Haskell is quite easy now:</p>
<div class="sourceCode" id="cb2" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="ot">isPossible ::</span> <span class="dt">Cell</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>isPossible (<span class="dt">Possible</span> _) <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>isPossible _ <span class="ot">=</span> <span class="dt">False</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="ot">exclusivePossibilities ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> [[<span class="dt">Int</span>]]</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>exclusivePossibilities row <span class="ot">=</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> <span class="co">-- input</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> row</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> <span class="co">-- [Possible [4,6,9], Fixed 1, Fixed 5, Possible [6,9], Fixed 7, Possible [2,3,6,8,9],</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> <span class="co">-- Possible [6,9], Possible [2,3,6,8,9], Possible [2,3,6,8,9]]</span></span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 1</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> <span class="fu">zip</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a> <span class="co">-- [(1,Possible [4,6,9]),(2,Fixed 1),(3,Fixed 5),(4,Possible [6,9]),(5,Fixed 7),</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a> <span class="co">-- (6,Possible [2,3,6,8,9]),(7,Possible [6,9]),(8,Possible [2,3,6,8,9]),</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a> <span class="co">-- (9,Possible [2,3,6,8,9])]</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 2</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> <span class="fu">filter</span> (isPossible <span class="op">.</span> <span class="fu">snd</span>)</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a> <span class="co">-- [(1,Possible [4,6,9]),(4,Possible [6,9]),(6,Possible [2,3,6,8,9]),</span></span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a> <span class="co">-- (7,Possible [6,9]), (8,Possible [2,3,6,8,9]),(9,Possible [2,3,6,8,9])]</span></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 3</span></span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Data.List.foldl'</span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a> (\acc <span class="op">~</span>(i, <span class="dt">Possible</span> xs) <span class="ot">-></span></span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a> Data.List.foldl' (\acc' x <span class="ot">-></span> Map.insertWith prepend x [i] acc') acc xs)</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a> Map.empty</span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a> <span class="co">-- fromList [(2,[9,8,6]),(3,[9,8,6]),(4,[1]),(6,[9,8,7,6,4,1]),(8,[9,8,6]),</span></span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true" tabindex="-1"></a> <span class="co">-- (9,[9,8,7,6,4,1])]</span></span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-31"><a href="#cb2-31" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 4</span></span>
<span id="cb2-32"><a href="#cb2-32" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.filter ((<span class="op"><</span> <span class="dv">4</span>) <span class="op">.</span> <span class="fu">length</span>)</span>
<span id="cb2-33"><a href="#cb2-33" aria-hidden="true" tabindex="-1"></a> <span class="co">-- fromList [(2,[9,8,6]),(3,[9,8,6]),(4,[1]),(8,[9,8,6])]</span></span>
<span id="cb2-34"><a href="#cb2-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-35"><a href="#cb2-35" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 5</span></span>
<span id="cb2-36"><a href="#cb2-36" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.foldlWithKey'(\acc x is <span class="ot">-></span> Map.insertWith prepend is [x] acc) Map.empty</span>
<span id="cb2-37"><a href="#cb2-37" aria-hidden="true" tabindex="-1"></a> <span class="co">-- fromList [([1],[4]),([9,8,6],[8,3,2])]</span></span>
<span id="cb2-38"><a href="#cb2-38" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-39"><a href="#cb2-39" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 6</span></span>
<span id="cb2-40"><a href="#cb2-40" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.filterWithKey (\is xs <span class="ot">-></span> <span class="fu">length</span> is <span class="op">==</span> <span class="fu">length</span> xs)</span>
<span id="cb2-41"><a href="#cb2-41" aria-hidden="true" tabindex="-1"></a> <span class="co">-- fromList [([1],[4]),([9,8,6],[8,3,2])]</span></span>
<span id="cb2-42"><a href="#cb2-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-43"><a href="#cb2-43" aria-hidden="true" tabindex="-1"></a> <span class="co">-- step 7</span></span>
<span id="cb2-44"><a href="#cb2-44" aria-hidden="true" tabindex="-1"></a> <span class="op">&</span> Map.elems</span>
<span id="cb2-45"><a href="#cb2-45" aria-hidden="true" tabindex="-1"></a> <span class="co">-- [[4],[8,3,2]]</span></span>
<span id="cb2-46"><a href="#cb2-46" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb2-47"><a href="#cb2-47" aria-hidden="true" tabindex="-1"></a> prepend <span class="op">~</span>[y] ys <span class="ot">=</span> y<span class="op">:</span>ys</span></code></pre></div>
<p>We extract the <code>isPossible</code> function to the top level from the <code>nextGrids</code> function for reuse. Then we write the <code>exclusivePossibilities</code> function which finds the Exclusives in the input row. This function is written using the reverse application operator <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Function.html#v:-38-" target="_blank" rel="noopener"><code>(&)</code></a><a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a> instead of the usual <code>($)</code> operator so that we can read it from top to bottom. We also show the intermediate values for a sample input after every step in the function chain.</p>
<p>The nub of the function lies in step 3 (pun intended). We do a nested fold over all the non-fixed cells and all the possible digits in them to compute the map<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a> which represents the first table. Thereafter, we filter the map to keep only the entries with length less than four (step 4). Then we flip it to create a new map which represents the second table (step 5). Finally, we filter the flipped map for the entries where the cell count is same as the digit count (step 6) to arrive at the final table. The step 7 just gets the values in the map which is the list of all the Exclusives in the input row.</p>
<h2 data-track-content data-content-name="pruning-the-cells-exclusively" data-content-piece="fast-sudoku-solver-in-haskell-2" id="pruning-the-cells-exclusively">Pruning the Cells, Exclusively</h2>
<p>To start with, we extract some reusable code from the previous <code>pruneCells</code> function and rename it to <code>pruneCellsByFixed</code>:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">makeCell ::</span> [<span class="dt">Int</span>] <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Cell</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>makeCell ys <span class="ot">=</span> <span class="kw">case</span> ys <span class="kw">of</span></span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> [y] <span class="ot">-></span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Fixed</span> y</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Possible</span> ys</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCellsByFixed ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Maybe</span> [<span class="dt">Cell</span>]</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>pruneCellsByFixed cells <span class="ot">=</span> <span class="fu">traverse</span> pruneCell cells</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> fixeds <span class="ot">=</span> [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> cells]</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a> pruneCell (<span class="dt">Possible</span> xs) <span class="ot">=</span> makeCell (xs <span class="dt">Data.List</span><span class="op">.</span>\\ fixeds)</span>
<span id="cb3-13"><a href="#cb3-13" aria-hidden="true" tabindex="-1"></a> pruneCell x <span class="ot">=</span> <span class="dt">Just</span> x</span></code></pre></div>
<p>Now we write the <code>pruneCellsByExclusives</code> function which uses the <code>exclusivePossibilities</code> function to prune the cells:</p>
<div class="sourceCode" id="cb4" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCellsByExclusives ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Maybe</span> [<span class="dt">Cell</span>]</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>pruneCellsByExclusives cells <span class="ot">=</span> <span class="kw">case</span> exclusives <span class="kw">of</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">Just</span> cells</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> _ <span class="ot">-></span> <span class="fu">traverse</span> pruneCell cells</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> exclusives <span class="ot">=</span> exclusivePossibilities cells</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> allExclusives <span class="ot">=</span> <span class="fu">concat</span> exclusives</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> pruneCell cell<span class="op">@</span>(<span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dt">Just</span> cell</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> pruneCell cell<span class="op">@</span>(<span class="dt">Possible</span> xs)</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> intersection <span class="ot">`elem`</span> exclusives <span class="ot">=</span> makeCell intersection</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Just</span> cell</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> intersection <span class="ot">=</span> xs <span class="ot">`Data.List.intersect`</span> allExclusives</span></code></pre></div>
<p><code>pruneCellsByExclusives</code> works exactly as shown in the examples above. We first find the list of Exclusives in the given cells. If there are no Exclusives, there’s nothing to do and we just return the cells. If we find any Exclusives, we <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Traversable.html#v:traverse" target="_blank" rel="noopener"><code>traverse</code></a> the cells, pruning each cell to only the intersection of the possible digits in the cell and Exclusive digits. That’s it! We reuse the <code>makeCell</code> function to create a new cell with the intersection.</p>
<p>As the final step, we rewrite the <code>pruneCells</code> function by combining both the functions.</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">fixM ::</span> (<span class="dt">Eq</span> t, <span class="dt">Monad</span> m) <span class="ot">=></span> (t <span class="ot">-></span> m t) <span class="ot">-></span> t <span class="ot">-></span> m t</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>fixM f x <span class="ot">=</span> f x <span class="op">>>=</span> \x' <span class="ot">-></span> <span class="kw">if</span> x' <span class="op">==</span> x <span class="kw">then</span> <span class="fu">return</span> x <span class="kw">else</span> fixM f x'</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCells ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Maybe</span> [<span class="dt">Cell</span>]</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>pruneCells cells <span class="ot">=</span> fixM pruneCellsByFixed cells <span class="op">>>=</span> fixM pruneCellsByExclusives</span></code></pre></div>
<p>We have extracted <code>fixM</code> as a top level function from the <code>pruneGrid</code> function. Just like the <code>pruneGrid'</code> function, we need to use monadic bind (<a href="https://hackage.haskell.org/package/base-4.10.1.0/docs/Control-Monad.html#v:-62--62--61-" target="_blank" rel="noopener"><code>>>=</code></a>) to chain the two pruning steps. We also use <code>fixM</code> to apply each step repeatedly till the pruned cells settle<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>.</p>
<p>No further code changes are required. It is time to check out the improvements.</p>
<h2 data-track-content data-content-name="faster-than-a-speeding-bullet" data-content-piece="fast-sudoku-solver-in-haskell-2" id="faster-than-a-speeding-bullet">Faster than a Speeding Bullet!</h2>
<p>Let’s build the program and run the exact same number of puzzles as before:</p>
<pre class="plain"><code>$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
0.53 real 0.58 user 0.23 sys</code></pre>
<p>Woah! It is way faster than before. Let’s solve all the puzzles now:</p>
<pre class="plain"><code>$ cat sudoku17.txt | time stack exec sudoku > /dev/null
282.98 real 407.25 user 109.27 sys</code></pre>
<p>So it is took about 283 seconds to solve all the 49151 puzzles. The speedup is about 200x<a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>. That’s about 5.8 milliseconds per puzzle.</p>
<p>Let’s do a quick profiling to see where the time is going:</p>
<pre class="plain"><code>$ stack build --profile
$ head -n1000 sudoku17.txt | stack exec -- sudoku +RTS -p > /dev/null</code></pre>
<p>This generates a file named <code>sudoku.prof</code> with the profiling results. Here are the top five most time-taking functions (cleaned for brevity):</p>
<div class="scrollable-table">
<table>
<thead>
<tr>
<th style="text-align: left;">Cost Center</th>
<th style="text-align: left;">Source</th>
<th style="text-align: right;">%time</th>
<th style="text-align: right;">%alloc</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities</code></td>
<td style="text-align: left;">(49,1)-(62,26)</td>
<td style="text-align: right;">17.6</td>
<td style="text-align: right;">11.4</td>
</tr>
<tr>
<td style="text-align: left;"><code>pruneCellsByFixed.pruneCell</code></td>
<td style="text-align: left;">(75,5)-(76,36)</td>
<td style="text-align: right;">16.9</td>
<td style="text-align: right;">30.8</td>
</tr>
<tr>
<td style="text-align: left;"><code>exclusivePossibilities.\.\</code></td>
<td style="text-align: left;">55:38-70</td>
<td style="text-align: right;">12.2</td>
<td style="text-align: right;">20.3</td>
</tr>
<tr>
<td style="text-align: left;"><code>fixM.\</code></td>
<td style="text-align: left;">13:27-65</td>
<td style="text-align: right;">10.0</td>
<td style="text-align: right;">0.0</td>
</tr>
<tr>
<td style="text-align: left;"><code>==</code></td>
<td style="text-align: left;">15:56-57</td>
<td style="text-align: right;">7.2</td>
<td style="text-align: right;">0.0</td>
</tr>
</tbody>
</table>
</div>
<p>Looking at the report, my guess is that a lot of time is going into list operations. Lists are known to be inefficient in Haskell so maybe we should switch to some other data structures?</p>
<h3 id="update">Update</h3>
<p>As per the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed#comment-97ca7640-8531-11e8-a1d5-1fd7d3dbc496">comment</a> by Chris Casinghino, I ran both the versions of code without the <code>-threaded</code>, <code>-rtsopts</code> and <code>-with-rtsopts=-N</code> options. The time for previous post’s code:</p>
<pre class="plain"><code>$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
96.54 real 95.90 user 0.66 sys</code></pre>
<p>And the time for this post’s code:</p>
<pre class="plain"><code>$ cat sudoku17.txt | time stack exec sudoku > /dev/null
258.97 real 257.34 user 1.52 sys</code></pre>
<p>So, both the versions run about 10% faster without the threading options. I suspect this has something to do with GHC’s parallel GC as described in <a href="https://web.archive.org/web/20180711/https://inner-haven.net/posts/2017-05-08-speed-up-haskell-programs-weird-trick.html" target="_blank" rel="noopener">this post</a>. So for now, I’ll keep threading disabled.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="fast-sudoku-solver-in-haskell-2" id="conclusion">Conclusion</h2>
<p>In this post, we improved upon our simple Sudoku solution from the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">last time</a>. We discovered and implemented a new strategy to prune cells, and we achieved a 200x speedup. But profiling shows that we still have many possibilities for improvements. We’ll work on that and more in the upcoming posts in this series. The code till now is available <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/9d6eb18229f905c52cb4c98b569abb70757ba022?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>All the runs were done on my MacBook Pro from 2014 with 2.2 GHz Intel Core i7 CPU and 16 GB memory.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p>At least 17 cells must be pre-filled in a Sudoku puzzle for it to have a unique solution. So 17-clue puzzles are the most difficult of all puzzles. <a href="https://arxiv.org/pdf/1201.0749v2.pdf" target="_blank" rel="noopener">This paper</a> by McGuire, Tugemann and Civario gives the proof of the same.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3"><p>“Single” as in <a href="https://en.wikipedia.org/wiki/Single_child" target="_blank" rel="noopener">“Single child”</a><a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4"><p>Reverse application operation is not used much in Haskell. But it is the preferred way of function chaining in some other functional programming languages like <a href="https://clojuredocs.org/clojure.core/-%3E" target="_blank" rel="noopener">Clojure</a>, <a href="https://en.wikibooks.org/wiki/F_Sharp_Programming/Higher_Order_Functions#The_%7C%3E_Operator" target="_blank" rel="noopener">FSharp</a>, and <a href="https://hexdocs.pm/elixir/Kernel.html#%7C%3E/2" target="_blank" rel="noopener">Elixir</a>.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5"><p>We use <a href="https://hackage.haskell.org/package/containers-0.6.0.1/docs/Data-Map-Strict.html" target="_blank" rel="noopener">Data.Map.Strict</a> as the map implementation.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6"><p>We need to run <code>pruneCellsByFixed</code> and <code>pruneCellsByExclusives</code> repeatedly using <code>fixM</code> because an unsettled row can lead to wrong solutions.</p>
<p>Imagine a row which just got a <code>9</code> fixed because of <code>pruneCellsByFixed</code>. If we don’t run the function again, the row may be left with one non-fixed cell with a <code>9</code>. When we run this row through <code>pruneCellsByExclusives</code>, it’ll consider the <code>9</code> in the non-fixed cell as a Single and fix it. This will lead to two <code>9</code>s in the same row, causing the solution to fail.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7"><p>Speedup calculation: 116.7 / 100 * 49151 / 282.98 = 202.7<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">A Simple Solution</a>
</li>
<li>
<strong>A 200x Faster Solution</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed">Picking the Right Data Structures</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2018-07-11T00:00:00Z <p>In the <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/">first part</a> of this series of posts, we wrote a simple <a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> solver in <a href="https://www.haskell.org/" target="_blank" rel="noopener">Haskell</a>. It used a <a href="https://en.wikipedia.org/wiki/Constraint_satisfaction_problem" target="_blank" rel="noopener">constraint satisfaction</a> algorithm with <a href="https://en.wikipedia.org/wiki/Depth-first_search" target="_blank" rel="noopener">backtracking</a>. The solution worked well but was very slow. In this post, we are going to improve it and make it <strong>fast</strong>.</p>
https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/ Fast Sudoku Solver in Haskell #1: A Simple Solution 2018-06-28T00:00:00Z Abhinav Sarkar https://abhinavsarkar.net/about/ abhinav@abhinavsarkar.net <p><a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> is a number placement puzzle. It consists of a 9x9 grid which is to be filled with digits from 1 to 9. Some of the cells of the grid come pre-filled and the player has to fill the rest.</p>
<p><a href="https://www.haskell.org/" target="_blank" rel="noopener">Haskell</a> is a purely functional programming language. It is a good choice to solve Sudoku given the problem’s <a href="https://en.wikipedia.org/wiki/Combinatorics" target="_blank" rel="noopener">combinatorial</a> nature. The aim of this series of posts is to write a <strong>fast</strong> Sudoku solver in Haskell. We’ll focus on both implementing the solution and making it efficient, step-by-step, starting with a slow but simple solution in this post<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>
<p>This post was originally published on <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed">abhinavsarkar.net</a>.</p><!--more--><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<strong>A Simple Solution</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">A 200x Faster Solution</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed">Picking the Right Data Structures</a>
</li>
</ol>
</section>
<nav id="toc" class="right-toc"><h3>Contents</h3><ol><li><a href="#constraint-satisfaction-problem">Constraint Satisfaction Problem</a></li><li><a href="#setting-up">Setting up</a></li><li><a href="#pruning-the-cells">Pruning the Cells</a></li><li><a href="#pruning-the-grid">Pruning the Grid</a></li><li><a href="#making-the-choice">Making the Choice</a></li><li><a href="#solving-the-puzzle">Solving the Puzzle</a></li><li><a href="#conclusion">Conclusion</a></li></ol></nav>
<h2 data-track-content data-content-name="constraint-satisfaction-problem" data-content-piece="fast-sudoku-solver-in-haskell-1" id="constraint-satisfaction-problem">Constraint Satisfaction Problem</h2>
<p>Solving Sudoku is a <a href="https://en.wikipedia.org/wiki/Constraint_satisfaction_problem" target="_blank" rel="noopener">constraint satisfaction problem</a>. We are given a partially filled grid which we have to fill completely such that each of the following constraints are satisfied:</p>
<ol type="1">
<li>Each of the nine rows must have all the digits, from 1 to 9.</li>
<li>Each of the nine columns must have all the digits, from 1 to 9.</li>
<li>Each of the nine 3x3 sub-grids must have all the digits, from 1 to 9.</li>
</ol>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 201 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-30pct" style="--image-aspect-ratio: 0.9617224880382775" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku01.svg" alt="A sample puzzle"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku01.svg" class="w-100pct nolink mw-30pct" alt="A sample puzzle"></img></noscript>
<figcaption>A sample puzzle</figcaption>
</figure>
<figure>
<img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 201 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink mw-30pct" style="--image-aspect-ratio: 0.9617224880382775" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku02.svg" alt="And its solution"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku02.svg" class="w-100pct nolink mw-30pct" alt="And its solution"></img></noscript>
<figcaption>And its solution</figcaption>
</figure>
<p>Each cell in the grid is member of one row, one column and one sub-grid (called <em>block</em> in general). Digits in the pre-filled cells impose constraints on the rows, columns, and sub-grids they are part of. For example, if a cell contains <code>1</code> then no other cell in that cell’s row, column or sub-grid can contain <code>1</code>. Given these constraints, we can devise a simple algorithm to solve Sudoku:</p>
<ol type="1">
<li>Each cell contains either a single digit or has a set of possible digits. For example, a grid showing the possibilities of all non-filled cells for the sample puzzle above:</li>
</ol>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku1.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku1.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<ol start="2" type="1">
<li>If a cell contains a digit, remove that digit from the list of the possible digits from all its neighboring cells. Neighboring cells are the other cells in the given cell’s row, column and sub-grid. For example, the grid after removing the fixed value <code>4</code> of the row-2-column-1 cell from its neighboring cells:</li>
</ol>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku2.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku2.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<ol start="3" type="1">
<li>Repeat the previous step for all the cells that are have been solved (or <em>fixed</em>), either pre-filled or filled in the previous iteration of the solution. For example, the grid after removing all fixed values from all non-fixed cells:</li>
</ol>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku3.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku3.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<ol start="4" type="1">
<li>Continue till the grid <em>settles</em>, that is, there are no more changes in the possibilities of any cells. For example, the settled grid for the current iteration:</li>
</ol>
<div class="scrollable-img">
<p><img src="data:image/svg+xml,%3Csvg xmlns='https://www.w3.org/2000/svg' viewBox='0 0 921 209'%3E%3C/svg%3E" class="lazyload w-100pct nolink extra-width" style="--image-aspect-ratio: 4.4066985645933014" data-src="/images/fast-sudoku-solver-in-haskell-1/sudoku4.svg"></img>
<noscript><img src="/images/fast-sudoku-solver-in-haskell-1/sudoku4.svg" class="w-100pct nolink extra-width"></img></noscript></p>
</div>
<ol start="5" type="1">
<li>Once the grid settles, choose one of the non-fixed cells following some strategy. Select one of the digits from all the possibilities of the cell, and fix (assume) the cell to have that digit. Go back to step 1 and repeat.</li>
<li>The elimination of possibilities may result in inconsistencies. For example, you may end up with a cell with no possibilities. In such a case, discard that branch of solution, and backtrack to last point where you fixed a cell. Choose a different possibility to fix and repeat.</li>
<li>If at any point the grid is completely filled, you’ve found the solution!</li>
<li>If you exhaust all branches of the solution then the puzzle is unsolvable. This can happen if it starts with cells pre-filled wrongly.</li>
</ol>
<p>This algorithm is actually a <a href="https://en.wikipedia.org/wiki/Depth-first_search" target="_blank" rel="noopener">Depth-First Search</a> on the <a href="https://en.wikipedia.org/wiki/State_space_search" target="_blank" rel="noopener">state space</a> of the grid configurations. It guarantees to either find a solution or prove a puzzle to be unsolvable.</p>
<h2 data-track-content data-content-name="setting-up" data-content-piece="fast-sudoku-solver-in-haskell-1" id="setting-up">Setting up</h2>
<p>We start with writing types to represent the cells and the grid:</p>
<div class="sourceCode" id="cb1" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">data</span> <span class="dt">Cell</span> <span class="ot">=</span> <span class="dt">Fixed</span> <span class="dt">Int</span> <span class="op">|</span> <span class="dt">Possible</span> [<span class="dt">Int</span>] <span class="kw">deriving</span> (<span class="dt">Show</span>, <span class="dt">Eq</span>)</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Row</span> <span class="ot">=</span> [<span class="dt">Cell</span>]</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">type</span> <span class="dt">Grid</span> <span class="ot">=</span> [<span class="dt">Row</span>]</span></code></pre></div>
<p>A cell is either fixed with a particular digit or has a set of digits as possibilities. So it is natural to represent it as a <a href="https://en.wikipedia.org/wiki/Algebraic_data_type" target="_blank" rel="noopener">sum type</a> with <code>Fixed</code> and <code>Possible</code> constructors. A row is a list of cells and a grid is a list of rows.</p>
<p>We’ll take the input puzzle as a string of 81 characters representing the cells, left-to-right and top-to-bottom. An example is:</p>
<pre class="plain"><code>.......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6...</code></pre>
<p>Here, <code>.</code> represents an non-filled cell. Let’s write a function to read this input and parse it to our <code>Grid</code> data structure:</p>
<div class="sourceCode" id="cb3" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="ot">readGrid ::</span> <span class="dt">String</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>readGrid s</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">length</span> s <span class="op">==</span> <span class="dv">81</span> <span class="ot">=</span> <span class="fu">traverse</span> (<span class="fu">traverse</span> readCell) <span class="op">.</span> Data.List.Split.chunksOf <span class="dv">9</span> <span class="op">$</span> s</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> readCell <span class="ch">'.'</span> <span class="ot">=</span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Possible</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> readCell c</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> Data.Char.isDigit c <span class="op">&&</span> c <span class="op">></span> <span class="ch">'0'</span> <span class="ot">=</span> <span class="dt">Just</span> <span class="op">.</span> <span class="dt">Fixed</span> <span class="op">.</span> Data.Char.digitToInt <span class="op">$</span> c</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> <span class="dt">Nothing</span></span></code></pre></div>
<p><code>readGrid</code> return a <code>Just grid</code> if the input is correct, else it returns a <code>Nothing</code>. It parses a <code>.</code> to a <code>Possible</code> cell with all digits as possibilities, and a digit char to a <code>Fixed</code> cell with that digit. Let’s try it out in the <em>REPL</em>:</p>
<div class="sourceCode" id="cb4" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span> readGrid <span class="st">".......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">mapM_</span> <span class="fu">print</span> grid</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 1,Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>[Fixed 4,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Fixed 2,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 5,Possible [1,2,3,4,5,6,7,8,9],Fixed 4,Possible [1,2,3,4,5,6,7,8,9],Fixed 7]</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 8,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 3,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 1,Possible [1,2,3,4,5,6,7,8,9],Fixed 9,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>[Fixed 3,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 4,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 2,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Fixed 5,Possible [1,2,3,4,5,6,7,8,9],Fixed 1,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>[Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Fixed 8,Possible [1,2,3,4,5,6,7,8,9],Fixed 6,Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9],Possible [1,2,3,4,5,6,7,8,9]]</span></code></pre></div>
<p>The output is a bit unreadable but correct. We can write a few functions to clean it up:</p>
<div class="sourceCode" id="cb5" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">showGrid ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>showGrid <span class="ot">=</span> <span class="fu">unlines</span> <span class="op">.</span> <span class="fu">map</span> (<span class="fu">unwords</span> <span class="op">.</span> <span class="fu">map</span> showCell)</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Fixed</span> x) <span class="ot">=</span> <span class="fu">show</span> x</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> showCell _ <span class="ot">=</span> <span class="st">"."</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a><span class="ot">showGridWithPossibilities ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">String</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>showGridWithPossibilities <span class="ot">=</span> <span class="fu">unlines</span> <span class="op">.</span> <span class="fu">map</span> (<span class="fu">unwords</span> <span class="op">.</span> <span class="fu">map</span> showCell)</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Fixed</span> x) <span class="ot">=</span> <span class="fu">show</span> x <span class="op">++</span> <span class="st">" "</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a> showCell (<span class="dt">Possible</span> xs) <span class="ot">=</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a> (<span class="op">++</span> <span class="st">"]"</span>)</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Data.List.foldl' (\acc x <span class="ot">-></span> acc <span class="op">++</span> <span class="kw">if</span> x <span class="ot">`elem`</span> xs <span class="kw">then</span> <span class="fu">show</span> x <span class="kw">else</span> <span class="st">" "</span>) <span class="st">"["</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> [<span class="dv">1</span><span class="op">..</span><span class="dv">9</span>]</span></code></pre></div>
<p>Back to the <em>REPL</em> again:</p>
<div class="sourceCode" id="cb6" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span> readGrid <span class="st">".......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> showGrid grid</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>. . . . . . . 1 .</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>4 . . . . . . . .</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>. 2 . . . . . . .</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>. . . . 5 . 4 . 7</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>. . 8 . . . 3 . .</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>. . 1 . 9 . . . .</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>3 . . 4 . . 2 . .</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>. 5 . 1 . . . . .</span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>. . . 8 . 6 . . .</span></code></pre></div>
<div class="sourceCode" id="cb7" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStrLn</span> <span class="op">$</span> showGridWithPossibilities grid</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>[123456789] [123456789] [123456789] [123456789] [123456789] [123456789] [123456789] 1 [123456789]</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>4 [123456789] [123456789] [123456789] [123456789] [123456789] [123456789] [123456789] [123456789]</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>[123456789] 2 [123456789] [123456789] [123456789] [123456789] [123456789] [123456789] [123456789]</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>[123456789] [123456789] [123456789] [123456789] 5 [123456789] 4 [123456789] 7</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>[123456789] [123456789] 8 [123456789] [123456789] [123456789] 3 [123456789] [123456789]</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>[123456789] [123456789] 1 [123456789] 9 [123456789] [123456789] [123456789] [123456789]</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>3 [123456789] [123456789] 4 [123456789] [123456789] 2 [123456789] [123456789]</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>[123456789] 5 [123456789] 1 [123456789] [123456789] [123456789] [123456789] [123456789]</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>[123456789] [123456789] [123456789] 8 [123456789] 6 [123456789] [123456789] [123456789]</span></code></pre></div>
<p>The output is more readable now. We see that, at the start, all the non-filled cells have all the digits as possible values. We’ll use these functions for debugging as we go forward. We can now start solving the puzzle.</p>
<div class="page-break">
</div>
<h2 data-track-content data-content-name="pruning-the-cells" data-content-piece="fast-sudoku-solver-in-haskell-1" id="pruning-the-cells">Pruning the Cells</h2>
<p>We can remove the digits of fixed cells from their neighboring cells, one cell as a time. But, it is faster to find all the fixed digits in a row of cells and remove them from the possibilities of all the non-fixed cells of the row, at once. Then we can repeat this <em>pruning</em> step for all the rows of the grid (and columns and sub-grids too! We’ll see how).</p>
<div class="sourceCode" id="cb8" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneCells ::</span> [<span class="dt">Cell</span>] <span class="ot">-></span> <span class="dt">Maybe</span> [<span class="dt">Cell</span>]</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>pruneCells cells <span class="ot">=</span> <span class="fu">traverse</span> pruneCell cells</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> fixeds <span class="ot">=</span> [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> cells]</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> pruneCell (<span class="dt">Possible</span> xs) <span class="ot">=</span> <span class="kw">case</span> xs <span class="dt">Data.List</span><span class="op">.</span>\\ fixeds <span class="kw">of</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> [] <span class="ot">-></span> <span class="dt">Nothing</span></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a> [y] <span class="ot">-></span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Fixed</span> y</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a> ys <span class="ot">-></span> <span class="dt">Just</span> <span class="op">$</span> <span class="dt">Possible</span> ys</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a> pruneCell x <span class="ot">=</span> <span class="dt">Just</span> x</span></code></pre></div>
<p><code>pruneCells</code> prunes a list of cells as described before. We start with finding the fixed digits in the list of cells. Then we go over each non-fixed cells, removing the fixed digits we found, from their possible values. Two special cases arise:</p>
<ul>
<li>If pruning results in a cell with no possible digits, it is a sign that this branch of search has no solution and hence, we return a <code>Nothing</code> in that case.</li>
<li>If only one possible digit remains after pruning, then we turn that cell into a fixed cell with that digit.</li>
</ul>
<p>We use the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Traversable.html#v:traverse" target="_blank" rel="noopener"><code>traverse</code></a> function for pruning the cells so that a <code>Nothing</code> resulting from pruning one cell propagates to the entire list.</p>
<p>Let’s take it for a spin in the <em>REPL</em>:</p>
<div class="sourceCode" id="cb9" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities <span class="op">$</span> [<span class="fu">head</span> grid] <span class="co">-- first row of the grid</span></span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>6 [123456789] [123456789] [123456789] [123456789] [123456789] [123456789] 1 [123456789]</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities [fromJust <span class="op">$</span> pruneCells <span class="op">$</span> <span class="fu">head</span> grid] <span class="co">-- same row after pruning</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>6 [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] 1 [ 2345 789]</span></code></pre></div>
<p>It works! <code>6</code> and <code>1</code> are removed from the possibilities of the other cells. Now we are ready for …</p>
<h2 data-track-content data-content-name="pruning-the-grid" data-content-piece="fast-sudoku-solver-in-haskell-1" id="pruning-the-grid">Pruning the Grid</h2>
<p>Pruning a grid requires us to prune each row, each column and each sub-grid. Let’s try to solve it in the <em>REPL</em> first:</p>
<div class="sourceCode" id="cb10" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> <span class="fu">traverse</span> pruneCells grid</span>
<span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>6 [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] [ 2345 789] 1 [ 2345 789]</span>
<span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a>4 [123 56789] [123 56789] [123 56789] [123 56789] [123 56789] [123 56789] [123 56789] [123 56789]</span>
<span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a>[1 3456789] 2 [1 3456789] [1 3456789] [1 3456789] [1 3456789] [1 3456789] [1 3456789] [1 3456789]</span>
<span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a>[123 6 89] [123 6 89] [123 6 89] [123 6 89] 5 [123 6 89] 4 [123 6 89] 7</span>
<span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a>[12 4567 9] [12 4567 9] 8 [12 4567 9] [12 4567 9] [12 4567 9] 3 [12 4567 9] [12 4567 9]</span>
<span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a>[ 2345678 ] [ 2345678 ] 1 [ 2345678 ] 9 [ 2345678 ] [ 2345678 ] [ 2345678 ] [ 2345678 ]</span>
<span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a>3 [1 56789] [1 56789] 4 [1 56789] [1 56789] 2 [1 56789] [1 56789]</span>
<span id="cb10-14"><a href="#cb10-14" aria-hidden="true" tabindex="-1"></a>[ 234 6789] 5 [ 234 6789] 1 [ 234 6789] [ 234 6789] [ 234 6789] [ 234 6789] [ 234 6789]</span>
<span id="cb10-15"><a href="#cb10-15" aria-hidden="true" tabindex="-1"></a>[12345 7 9] [12345 7 9] [12345 7 9] 8 [12345 7 9] 6 [12345 7 9] [12345 7 9] [12345 7 9]</span></code></pre></div>
<p>By <code>traverse</code>-ing the grid with <code>pruneCells</code>, we are able to prune each row, one-by-one. Since pruning a row doesn’t affect another row, we don’t have to pass the resulting rows between each pruning step. That is to say, <code>traverse</code> is enough for us, we don’t need <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-Foldable.html#v:foldl" target="_blank" rel="noopener"><code>foldl</code></a> here.</p>
<p>How do we do the same thing for columns now? Since our representation for the grid is rows-first, we first need to convert it to a columns-first representation. Luckily, that’s what <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Data-List.html#v:transpose" target="_blank" rel="noopener"><code>Data.List.transpose</code></a> function does:</p>
<div class="sourceCode" id="cb11" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span> readGrid <span class="st">"693784512487512936125963874932651487568247391741398625319475268856129743274836159"</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>6 9 3 7 8 4 5 1 2</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a>4 8 7 5 1 2 9 3 6</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>1 2 5 9 6 3 8 7 4</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a>9 3 2 6 5 1 4 8 7</span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>5 6 8 2 4 7 3 9 1</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a>7 4 1 3 9 8 6 2 5</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a>3 1 9 4 7 5 2 6 8</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>8 5 6 1 2 9 7 4 3</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a>2 7 4 8 3 6 1 5 9</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid <span class="op">$</span> Data.List.transpose grid</span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a>6 4 1 9 5 7 3 8 2</span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a>9 8 2 3 6 4 1 5 7</span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a>3 7 5 2 8 1 9 6 4</span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a>7 5 9 6 2 3 4 1 8</span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>8 1 6 5 4 9 7 2 3</span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a>4 2 3 1 7 8 5 9 6</span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a>5 9 8 4 3 6 2 7 1</span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a>1 3 7 8 9 2 6 4 5</span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a>2 6 4 7 1 5 8 3 9</span></code></pre></div>
<p>Pruning columns is easy now:</p>
<div class="sourceCode" id="cb12" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> <span class="fu">fmap</span> Data.List.transpose <span class="op">.</span> <span class="fu">traverse</span> pruneCells <span class="op">.</span> Data.List.transpose <span class="op">$</span> grid</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>6 [1 34 6789] [ 234567 9] [ 23 567 9] [1234 678 ] [12345 789] [1 56789] 1 [123456 89]</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>4 [1 34 6789] [ 234567 9] [ 23 567 9] [1234 678 ] [12345 789] [1 56789] [ 23456789] [123456 89]</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a>[12 5 789] 2 [ 234567 9] [ 23 567 9] [1234 678 ] [12345 789] [1 56789] [ 23456789] [123456 89]</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a>[12 5 789] [1 34 6789] [ 234567 9] [ 23 567 9] 5 [12345 789] 4 [ 23456789] 7</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>[12 5 789] [1 34 6789] 8 [ 23 567 9] [1234 678 ] [12345 789] 3 [ 23456789] [123456 89]</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a>[12 5 789] [1 34 6789] 1 [ 23 567 9] 9 [12345 789] [1 56789] [ 23456789] [123456 89]</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a>3 [1 34 6789] [ 234567 9] 4 [1234 678 ] [12345 789] 2 [ 23456789] [123456 89]</span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a>[12 5 789] 5 [ 234567 9] 1 [1234 678 ] [12345 789] [1 56789] [ 23456789] [123456 89]</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>[12 5 789] [1 34 6789] [ 234567 9] 8 [1234 678 ] 6 [1 56789] [ 23456789] [123456 89]</span></code></pre></div>
<p>First, we <code>transpose</code> the grid to convert the columns into rows. Then, we prune the rows by <code>traverse</code>-ing <code>pruneCells</code> over them. And finally, we turn the rows back into columns by <code>transpose</code>-ing the grid back again. The last <code>transpose</code> needs to be <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Prelude.html#v:fmap" target="_blank" rel="noopener"><code>fmap</code></a>-ped because <code>traverse pruneCells</code> returns a <code>Maybe</code>.</p>
<p>Pruning sub-grids is a bit trickier. Following the same idea as pruning columns, we need two functions to transform the sub-grids into rows and back. Let’s write the first one:</p>
<div class="sourceCode" id="cb13" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="ot">subGridsToRows ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Grid</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>subGridsToRows <span class="ot">=</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">concatMap</span> (\rows <span class="ot">-></span> <span class="kw">let</span> [r1, r2, r3] <span class="ot">=</span> <span class="fu">map</span> (Data.List.Split.chunksOf <span class="dv">3</span>) rows</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> <span class="fu">zipWith3</span> (\a b c <span class="ot">-></span> a <span class="op">++</span> b <span class="op">++</span> c) r1 r2 r3)</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Data.List.Split.chunksOf <span class="dv">3</span></span></code></pre></div>
<p>And try it out:</p>
<div class="sourceCode" id="cb14" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span> readGrid <span class="st">"693784512487512936125963874932651487568247391741398625319475268856129743274836159"</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid</span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>6 9 3 7 8 4 5 1 2</span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>4 8 7 5 1 2 9 3 6</span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>1 2 5 9 6 3 8 7 4</span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>9 3 2 6 5 1 4 8 7</span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>5 6 8 2 4 7 3 9 1</span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>7 4 1 3 9 8 6 2 5</span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>3 1 9 4 7 5 2 6 8</span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>8 5 6 1 2 9 7 4 3</span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a>2 7 4 8 3 6 1 5 9</span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid <span class="op">$</span> subGridsToRows grid</span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a>6 9 3 4 8 7 1 2 5</span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a>7 8 4 5 1 2 9 6 3</span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a>5 1 2 9 3 6 8 7 4</span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a>9 3 2 5 6 8 7 4 1</span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a>6 5 1 2 4 7 3 9 8</span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a>4 8 7 3 9 1 6 2 5</span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a>3 1 9 8 5 6 2 7 4</span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a>4 7 5 1 2 9 8 3 6</span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a>2 6 8 7 4 3 1 5 9</span></code></pre></div>
<p>You can go over the code and the output and make yourself sure that it works. Also, it turns out that we don’t need to write the back-transform function. <code>subGridsToRows</code> is its own back-transform:</p>
<div class="sourceCode" id="cb15" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid</span>
<span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>6 9 3 7 8 4 5 1 2</span>
<span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>4 8 7 5 1 2 9 3 6</span>
<span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>1 2 5 9 6 3 8 7 4</span>
<span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>9 3 2 6 5 1 4 8 7</span>
<span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>5 6 8 2 4 7 3 9 1</span>
<span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a>7 4 1 3 9 8 6 2 5</span>
<span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a>3 1 9 4 7 5 2 6 8</span>
<span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>8 5 6 1 2 9 7 4 3</span>
<span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a>2 7 4 8 3 6 1 5 9</span>
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid <span class="op">$</span> subGridsToRows <span class="op">$</span> subGridsToRows <span class="op">$</span> grid</span>
<span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a>6 9 3 7 8 4 5 1 2</span>
<span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a>4 8 7 5 1 2 9 3 6</span>
<span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a>1 2 5 9 6 3 8 7 4</span>
<span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a>9 3 2 6 5 1 4 8 7</span>
<span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a>5 6 8 2 4 7 3 9 1</span>
<span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a>7 4 1 3 9 8 6 2 5</span>
<span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a>3 1 9 4 7 5 2 6 8</span>
<span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a>8 5 6 1 2 9 7 4 3</span>
<span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a>2 7 4 8 3 6 1 5 9</span></code></pre></div>
<p>Nice! Now writing the sub-grid pruning function is easy:</p>
<div class="sourceCode" id="cb16" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> <span class="fu">fmap</span> subGridsToRows <span class="op">.</span> <span class="fu">traverse</span> pruneCells <span class="op">.</span> subGridsToRows <span class="op">$</span> grid</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>6 [1 3 5 789] [1 3 5 789] [123456789] [123456789] [123456789] [ 23456789] 1 [ 23456789]</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>4 [1 3 5 789] [1 3 5 789] [123456789] [123456789] [123456789] [ 23456789] [ 23456789] [ 23456789]</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a>[1 3 5 789] 2 [1 3 5 789] [123456789] [123456789] [123456789] [ 23456789] [ 23456789] [ 23456789]</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a>[ 234567 9] [ 234567 9] [ 234567 9] [1234 678 ] 5 [1234 678 ] 4 [12 56 89] 7</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a>[ 234567 9] [ 234567 9] 8 [1234 678 ] [1234 678 ] [1234 678 ] 3 [12 56 89] [12 56 89]</span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a>[ 234567 9] [ 234567 9] 1 [1234 678 ] 9 [1234 678 ] [12 56 89] [12 56 89] [12 56 89]</span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a>3 [12 4 6789] [12 4 6789] 4 [ 23 5 7 9] [ 23 5 7 9] 2 [1 3456789] [1 3456789]</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a>[12 4 6789] 5 [12 4 6789] 1 [ 23 5 7 9] [ 23 5 7 9] [1 3456789] [1 3456789] [1 3456789]</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a>[12 4 6789] [12 4 6789] [12 4 6789] 8 [ 23 5 7 9] 6 [1 3456789] [1 3456789] [1 3456789]</span></code></pre></div>
<p>It works well. Now we can string together these three steps to prune the entire grid. We also have to make sure that result of pruning each step is fed into the next step. This is so that the fixed cells created into one step cause more pruning in the further steps. We use monadic bind (<a href="https://hackage.haskell.org/package/base-4.10.1.0/docs/Control-Monad.html#v:-62--62--61-" target="_blank" rel="noopener"><code>>>=</code></a>) for that. Here’s the final code:</p>
<div class="sourceCode" id="cb17" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneGrid' ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>pruneGrid' grid <span class="ot">=</span></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">traverse</span> pruneCells grid</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">fmap</span> Data.List.transpose <span class="op">.</span> <span class="fu">traverse</span> pruneCells <span class="op">.</span> Data.List.transpose</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> <span class="op">>>=</span> <span class="fu">fmap</span> subGridsToRows <span class="op">.</span> <span class="fu">traverse</span> pruneCells <span class="op">.</span> subGridsToRows</span></code></pre></div>
<p>And the test:</p>
<div class="sourceCode" id="cb18" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> pruneGrid' grid</span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>6 [ 3 789] [ 3 5 7 9] [ 23 5 7 9] [ 234 78 ] [ 2345 789] [ 5 789] 1 [ 2345 89]</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>4 [1 3 789] [ 3 5 7 9] [ 23 567 9] [123 678 ] [123 5 789] [ 56789] [ 23 56789] [ 23 56 89]</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>[1 5 789] 2 [ 3 5 7 9] [ 3 567 9] [1 34 678 ] [1 345 789] [ 56789] [ 3456789] [ 3456 89]</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a>[ 2 9] [ 3 6 9] [ 23 6 9] [ 23 6 ] 5 [123 8 ] 4 [ 2 6 89] 7</span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 9] [ 4 67 9] 8 [ 2 67 ] [12 4 67 ] [12 4 7 ] 3 [ 2 56 9] [12 56 9]</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 ] [ 34 67 ] 1 [ 23 67 ] 9 [ 234 78 ] [ 56 8 ] [ 2 56 8 ] [ 2 56 8 ]</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>3 [1 6789] [ 67 9] 4 7 [ 5 7 9] 2 [ 56789] [1 56 89]</span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a>[ 2 789] 5 [ 2 4 67 9] 1 [ 23 7 ] [ 23 7 9] [ 6789] [ 34 6789] [ 34 6 89]</span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a>[12 7 9] [1 4 7 9] [ 2 4 7 9] 8 [ 23 7 ] 6 [1 5 7 9] [ 345 7 9] [1 345 9]</span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid</span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a>6 . . . . . . 1 .</span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a>4 . . . . . . . .</span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a>. 2 . . . . . . .</span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a>. . . . 5 . 4 . 7</span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a>. . 8 . . . 3 . .</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a>. . 1 . 9 . . . .</span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a>3 . . 4 . . 2 . .</span>
<span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a>. 5 . 1 . . . . .</span>
<span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"></a>. . . 8 . 6 . . .</span>
<span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid'</span>
<span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"></a>6 . . . . . . 1 .</span>
<span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"></a>4 . . . . . . . .</span>
<span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"></a>. 2 . . . . . . .</span>
<span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"></a>. . . . 5 . 4 . 7</span>
<span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"></a>. . 8 . . . 3 . .</span>
<span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"></a>. . 1 . 9 . . . .</span>
<span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"></a>3 . . 4 7 . 2 . .</span>
<span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"></a>. 5 . 1 . . . . .</span>
<span id="cb18-35"><a href="#cb18-35" aria-hidden="true" tabindex="-1"></a>. . . 8 . 6 . . .</span></code></pre></div>
<p>We can clearly see the massive pruning of possibilities all around the grid. We also see a <code>7</code> pop up in the row-7-column-5 cell. This means that we can prune the grid further, until it settles. If you are familiar with Haskell, you may recognize this as trying to find a <a href="https://en.wikipedia.org/wiki/Fixed_point_%28mathematics%29" target="_blank" rel="noopener">fixed point</a> for the <code>pruneGrid'</code> function, except in a monadic context. It is simple to implement:</p>
<div class="sourceCode" id="cb19" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="ot">pruneGrid ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>pruneGrid <span class="ot">=</span> fixM pruneGrid'</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a> fixM f x <span class="ot">=</span> f x <span class="op">>>=</span> \x' <span class="ot">-></span> <span class="kw">if</span> x' <span class="op">==</span> x <span class="kw">then</span> <span class="fu">return</span> x <span class="kw">else</span> fixM f x'</span></code></pre></div>
<p>The crux of this code is the <code>fixM</code> function. It takes a monadic function <code>f</code> and an initial value, and recursively calls itself till the return value settles. Let’s do another round in the <em>REPL</em>:</p>
<div class="sourceCode" id="cb20" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> pruneGrid grid</span>
<span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a>6 [ 3 789] [ 3 5 7 9] [ 23 5 7 9] [ 234 8 ] [ 2345 789] [ 5 789] 1 [ 2345 89]</span>
<span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a>4 [1 3 789] [ 3 5 7 9] [ 23 567 9] [123 6 8 ] [123 5 789] [ 56789] [ 23 56789] [ 23 56 89]</span>
<span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a>[1 5 789] 2 [ 3 5 7 9] [ 3 567 9] [1 34 6 8 ] [1 345 789] [ 56789] [ 3456789] [ 3456 89]</span>
<span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a>[ 2 9] [ 3 6 9] [ 23 6 9] [ 23 6 ] 5 [123 8 ] 4 [ 2 6 89] 7</span>
<span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 9] [ 4 67 9] 8 [ 2 67 ] [12 4 6 ] [12 4 7 ] 3 [ 2 56 9] [12 56 9]</span>
<span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 ] [ 34 67 ] 1 [ 23 67 ] 9 [ 234 78 ] [ 56 8 ] [ 2 56 8 ] [ 2 56 8 ]</span>
<span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a>3 [1 6 89] [ 6 9] 4 7 [ 5 9] 2 [ 56 89] [1 56 89]</span>
<span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a>[ 2 789] 5 [ 2 4 67 9] 1 [ 23 ] [ 23 9] [ 6789] [ 34 6789] [ 34 6 89]</span>
<span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a>[12 7 9] [1 4 7 9] [ 2 4 7 9] 8 [ 23 ] 6 [1 5 7 9] [ 345 7 9] [1 345 9]</span></code></pre></div>
<p>We see that <code>7</code> in the row-7-column-5 cell is eliminated from all its neighboring cells. We can’t prune the grid anymore. Now it is time to make the choice.</p>
<h2 data-track-content data-content-name="making-the-choice" data-content-piece="fast-sudoku-solver-in-haskell-1" id="making-the-choice">Making the Choice</h2>
<p>One the grid is settled, we need to choose a non-fixed cell and make it fixed by assuming one of its possible values. This gives us two grids, next in the state-space of the solution search:</p>
<ul>
<li>one which has this chosen cell fixed to this chosen digit, and,</li>
<li>the other in which the chosen cell has all the other possibilities except the one we chose to fix.</li>
</ul>
<p>We call this function, <code>nextGrids</code>:</p>
<div class="sourceCode" id="cb21" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="ot">nextGrids ::</span> <span class="dt">Grid</span> <span class="ot">-></span> (<span class="dt">Grid</span>, <span class="dt">Grid</span>)</span>
<span id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a>nextGrids grid <span class="ot">=</span></span>
<span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (i, first<span class="op">@</span>(<span class="dt">Fixed</span> _), rest) <span class="ot">=</span></span>
<span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a> fixCell</span>
<span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> Data.List.minimumBy (<span class="fu">compare</span> <span class="ot">`Data.Function.on`</span> (possibilityCount <span class="op">.</span> <span class="fu">snd</span>))</span>
<span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">filter</span> (isPossible <span class="op">.</span> <span class="fu">snd</span>)</span>
<span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">zip</span> [<span class="dv">0</span><span class="op">..</span>]</span>
<span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a> <span class="op">.</span> <span class="fu">concat</span></span>
<span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a> <span class="op">$</span> grid</span>
<span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> (replace2D i first grid, replace2D i rest grid)</span>
<span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb21-12"><a href="#cb21-12" aria-hidden="true" tabindex="-1"></a> isPossible (<span class="dt">Possible</span> _) <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb21-13"><a href="#cb21-13" aria-hidden="true" tabindex="-1"></a> isPossible _ <span class="ot">=</span> <span class="dt">False</span></span>
<span id="cb21-14"><a href="#cb21-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-15"><a href="#cb21-15" aria-hidden="true" tabindex="-1"></a> possibilityCount (<span class="dt">Possible</span> xs) <span class="ot">=</span> <span class="fu">length</span> xs</span>
<span id="cb21-16"><a href="#cb21-16" aria-hidden="true" tabindex="-1"></a> possibilityCount (<span class="dt">Fixed</span> _) <span class="ot">=</span> <span class="dv">1</span></span>
<span id="cb21-17"><a href="#cb21-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-18"><a href="#cb21-18" aria-hidden="true" tabindex="-1"></a> fixCell (i, <span class="dt">Possible</span> [x, y]) <span class="ot">=</span> (i, <span class="dt">Fixed</span> x, <span class="dt">Fixed</span> y)</span>
<span id="cb21-19"><a href="#cb21-19" aria-hidden="true" tabindex="-1"></a> fixCell (i, <span class="dt">Possible</span> (x<span class="op">:</span>xs)) <span class="ot">=</span> (i, <span class="dt">Fixed</span> x, <span class="dt">Possible</span> xs)</span>
<span id="cb21-20"><a href="#cb21-20" aria-hidden="true" tabindex="-1"></a> fixCell _ <span class="ot">=</span> <span class="fu">error</span> <span class="st">"Impossible case"</span></span>
<span id="cb21-21"><a href="#cb21-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb21-22"><a href="#cb21-22" aria-hidden="true" tabindex="-1"></a><span class="ot"> replace2D ::</span> <span class="dt">Int</span> <span class="ot">-></span> a <span class="ot">-></span> [[a]] <span class="ot">-></span> [[a]]</span>
<span id="cb21-23"><a href="#cb21-23" aria-hidden="true" tabindex="-1"></a> replace2D i v <span class="ot">=</span></span>
<span id="cb21-24"><a href="#cb21-24" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (x, y) <span class="ot">=</span> (i <span class="ot">`quot`</span> <span class="dv">9</span>, i <span class="ot">`mod`</span> <span class="dv">9</span>) <span class="kw">in</span> replace x (replace y (<span class="fu">const</span> v))</span>
<span id="cb21-25"><a href="#cb21-25" aria-hidden="true" tabindex="-1"></a> replace p f xs <span class="ot">=</span> [<span class="kw">if</span> i <span class="op">==</span> p <span class="kw">then</span> f x <span class="kw">else</span> x <span class="op">|</span> (x, i) <span class="ot"><-</span> <span class="fu">zip</span> xs [<span class="dv">0</span><span class="op">..</span>]]</span></code></pre></div>
<p>We choose the non-fixed cell with least count of possibilities as the pivot. This strategy make sense intuitively, as with a cell with fewest possibilities, we have the most chance of being right when assuming one. Fixing a non-fixed cell leads to one of the two cases:</p>
<ol type="a">
<li>the cell has only two possible values, resulting in two fixed cells, or,</li>
<li>the cell has more than two possible values, resulting in one fixed and one non-fixed cell.</li>
</ol>
<p>Then all we are left with is replacing the non-fixed cell with its fixed and fixed/non-fixed choices, which we do with some math and some list traversal. A quick check on the <em>REPL</em>:</p>
<div class="sourceCode" id="cb22" data-lang="ghci"><pre class="sourceCode lhs numberSource small overflow"><code class="sourceCode literatehaskell"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>{</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> readGrid <span class="st">"6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span></span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="op">:</span>}</span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> pruneGrid grid</span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid'</span>
<span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a>6 [ 3 789] [ 3 5 7 9] [ 23 5 7 9] [ 234 8 ] [ 2345 789] [ 5 789] 1 [ 2345 89]</span>
<span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a>4 [1 3 789] [ 3 5 7 9] [ 23 567 9] [123 6 8 ] [123 5 789] [ 56789] [ 23 56789] [ 23 56 89]</span>
<span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a>[1 5 789] 2 [ 3 5 7 9] [ 3 567 9] [1 34 6 8 ] [1 345 789] [ 56789] [ 3456789] [ 3456 89]</span>
<span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a>[ 2 9] [ 3 6 9] [ 23 6 9] [ 23 6 ] 5 [123 8 ] 4 [ 2 6 89] 7</span>
<span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 9] [ 4 67 9] 8 [ 2 67 ] [12 4 6 ] [12 4 7 ] 3 [ 2 56 9] [12 56 9]</span>
<span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 ] [ 34 67 ] 1 [ 23 67 ] 9 [ 234 78 ] [ 56 8 ] [ 2 56 8 ] [ 2 56 8 ]</span>
<span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a>3 [1 6 89] [ 6 9] 4 7 [ 5 9] 2 [ 56 89] [1 56 89]</span>
<span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a>[ 2 789] 5 [ 2 4 67 9] 1 [ 23 ] [ 23 9] [ 6789] [ 34 6789] [ 34 6 89]</span>
<span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a>[12 7 9] [1 4 7 9] [ 2 4 7 9] 8 [ 23 ] 6 [1 5 7 9] [ 345 7 9] [1 345 9]</span>
<span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- the row-4-column-1 cell is the first cell with only two possibilities, [2, 9].</span></span>
<span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="co">-- it is chosen as the pivot cell to find the next grids.</span></span>
<span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> (grid1, grid2) <span class="ot">=</span> nextGrids grid'</span>
<span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid1</span>
<span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a>6 [ 3 789] [ 3 5 7 9] [ 23 5 7 9] [ 234 8 ] [ 2345 789] [ 5 789] 1 [ 2345 89]</span>
<span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a>4 [1 3 789] [ 3 5 7 9] [ 23 567 9] [123 6 8 ] [123 5 789] [ 56789] [ 23 56789] [ 23 56 89]</span>
<span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a>[1 5 789] 2 [ 3 5 7 9] [ 3 567 9] [1 34 6 8 ] [1 345 789] [ 56789] [ 3456789] [ 3456 89]</span>
<span id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a>2 [ 3 6 9] [ 23 6 9] [ 23 6 ] 5 [123 8 ] 4 [ 2 6 89] 7</span>
<span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 9] [ 4 67 9] 8 [ 2 67 ] [12 4 6 ] [12 4 7 ] 3 [ 2 56 9] [12 56 9]</span>
<span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 ] [ 34 67 ] 1 [ 23 67 ] 9 [ 234 78 ] [ 56 8 ] [ 2 56 8 ] [ 2 56 8 ]</span>
<span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a>3 [1 6 89] [ 6 9] 4 7 [ 5 9] 2 [ 56 89] [1 56 89]</span>
<span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a>[ 2 789] 5 [ 2 4 67 9] 1 [ 23 ] [ 23 9] [ 6789] [ 34 6789] [ 34 6 89]</span>
<span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a>[12 7 9] [1 4 7 9] [ 2 4 7 9] 8 [ 23 ] 6 [1 5 7 9] [ 345 7 9] [1 345 9]</span>
<span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGridWithPossibilities grid2</span>
<span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a>6 [ 3 789] [ 3 5 7 9] [ 23 5 7 9] [ 234 8 ] [ 2345 789] [ 5 789] 1 [ 2345 89]</span>
<span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a>4 [1 3 789] [ 3 5 7 9] [ 23 567 9] [123 6 8 ] [123 5 789] [ 56789] [ 23 56789] [ 23 56 89]</span>
<span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a>[1 5 789] 2 [ 3 5 7 9] [ 3 567 9] [1 34 6 8 ] [1 345 789] [ 56789] [ 3456789] [ 3456 89]</span>
<span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a>9 [ 3 6 9] [ 23 6 9] [ 23 6 ] 5 [123 8 ] 4 [ 2 6 89] 7</span>
<span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 9] [ 4 67 9] 8 [ 2 67 ] [12 4 6 ] [12 4 7 ] 3 [ 2 56 9] [12 56 9]</span>
<span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a>[ 2 5 7 ] [ 34 67 ] 1 [ 23 67 ] 9 [ 234 78 ] [ 56 8 ] [ 2 56 8 ] [ 2 56 8 ]</span>
<span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a>3 [1 6 89] [ 6 9] 4 7 [ 5 9] 2 [ 56 89] [1 56 89]</span>
<span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a>[ 2 789] 5 [ 2 4 67 9] 1 [ 23 ] [ 23 9] [ 6789] [ 34 6789] [ 34 6 89]</span>
<span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a>[12 7 9] [1 4 7 9] [ 2 4 7 9] 8 [ 23 ] 6 [1 5 7 9] [ 345 7 9] [1 345 9]</span></code></pre></div>
<h2 data-track-content data-content-name="solving-the-puzzle" data-content-piece="fast-sudoku-solver-in-haskell-1" id="solving-the-puzzle">Solving the Puzzle</h2>
<p>We have implemented parts of our algorithm till now. Now we’ll put everything together to solve the puzzle. First, we need to know if we are done or have messed up:</p>
<div class="sourceCode" id="cb23" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="ot">isGridFilled ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>isGridFilled grid <span class="ot">=</span> <span class="fu">null</span> [ () <span class="op">|</span> <span class="dt">Possible</span> _ <span class="ot"><-</span> <span class="fu">concat</span> grid ]</span>
<span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a><span class="ot">isGridInvalid ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Bool</span></span>
<span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a>isGridInvalid grid <span class="ot">=</span></span>
<span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">any</span> isInvalidRow grid</span>
<span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> <span class="op">||</span> <span class="fu">any</span> isInvalidRow (Data.List.transpose grid)</span>
<span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> <span class="op">||</span> <span class="fu">any</span> isInvalidRow (subGridsToRows grid)</span>
<span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a> isInvalidRow row <span class="ot">=</span></span>
<span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> fixeds <span class="ot">=</span> [x <span class="op">|</span> <span class="dt">Fixed</span> x <span class="ot"><-</span> row]</span>
<span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a> emptyPossibles <span class="ot">=</span> [x <span class="op">|</span> <span class="dt">Possible</span> x <span class="ot"><-</span> row, <span class="fu">null</span> x]</span>
<span id="cb23-13"><a href="#cb23-13" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> hasDups fixeds <span class="op">||</span> <span class="fu">not</span> (<span class="fu">null</span> emptyPossibles)</span>
<span id="cb23-14"><a href="#cb23-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-15"><a href="#cb23-15" aria-hidden="true" tabindex="-1"></a> hasDups l <span class="ot">=</span> hasDups' l []</span>
<span id="cb23-16"><a href="#cb23-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb23-17"><a href="#cb23-17" aria-hidden="true" tabindex="-1"></a> hasDups' [] _ <span class="ot">=</span> <span class="dt">False</span></span>
<span id="cb23-18"><a href="#cb23-18" aria-hidden="true" tabindex="-1"></a> hasDups' (y<span class="op">:</span>ys) xs</span>
<span id="cb23-19"><a href="#cb23-19" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> y <span class="ot">`elem`</span> xs <span class="ot">=</span> <span class="dt">True</span></span>
<span id="cb23-20"><a href="#cb23-20" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span> hasDups' ys (y<span class="op">:</span>xs)</span></code></pre></div>
<p><code>isGridFilled</code> returns whether a grid is filled completely by checking it for any <code>Possible</code> cells. <code>isGridInvalid</code> checks if a grid is invalid because it either has duplicate fixed cells in any block or has any non-fixed cell with no possibilities.</p>
<p>Writing the <code>solve</code> function is almost trivial now:</p>
<div class="sourceCode" id="cb24" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="ot">solve ::</span> <span class="dt">Grid</span> <span class="ot">-></span> <span class="dt">Maybe</span> <span class="dt">Grid</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>solve grid <span class="ot">=</span> pruneGrid grid <span class="op">>>=</span> solve'</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a> <span class="kw">where</span></span>
<span id="cb24-4"><a href="#cb24-4" aria-hidden="true" tabindex="-1"></a> solve' g</span>
<span id="cb24-5"><a href="#cb24-5" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> isGridInvalid g <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb24-6"><a href="#cb24-6" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> isGridFilled g <span class="ot">=</span> <span class="dt">Just</span> g</span>
<span id="cb24-7"><a href="#cb24-7" aria-hidden="true" tabindex="-1"></a> <span class="op">|</span> <span class="fu">otherwise</span> <span class="ot">=</span></span>
<span id="cb24-8"><a href="#cb24-8" aria-hidden="true" tabindex="-1"></a> <span class="kw">let</span> (grid1, grid2) <span class="ot">=</span> nextGrids g</span>
<span id="cb24-9"><a href="#cb24-9" aria-hidden="true" tabindex="-1"></a> <span class="kw">in</span> solve grid1 <span class="op"><|></span> solve grid2</span></code></pre></div>
<p>We prune the grid as before and pipe it to the helper function <code>solve'</code>. <code>solve'</code> bails with a <code>Nothing</code> if the grid is invalid, or returns the solved grid if it is filled completely. Otherwise, it finds the next two grids in the search tree and solves them recursively with backtracking by calling the <code>solve</code> function. Backtracking here is implemented by the using the <a href="https://hackage.haskell.org/package/base-4.11.1.0/docs/Control-Applicative.html#g:2" target="_blank" rel="noopener"><code>Alternative</code></a> (<code><|></code>) implementation of the <code>Maybe</code> type<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. It takes the second branch in the computation if the first branch returns a <code>Nothing</code>.</p>
<p>Whew! That took us long. Let’s put it to the final test now:</p>
<div class="sourceCode" id="cb26" data-lang="ghci"><pre class="sourceCode lhs numberSource"><code class="sourceCode literatehaskell"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid <span class="ot">=</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a> readGrid "6......1.4.........2...........5.4.7..8...3....1.9....3..4..2...5.1........8.6..."</span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid</span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a>6 . . . . . . 1 .</span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a>4 . . . . . . . .</span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a>. 2 . . . . . . .</span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a>. . . . 5 . 4 . 7</span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a>. . 8 . . . 3 . .</span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a>. . 1 . 9 . . . .</span>
<span id="cb26-10"><a href="#cb26-10" aria-hidden="true" tabindex="-1"></a>3 . . 4 . . 2 . .</span>
<span id="cb26-11"><a href="#cb26-11" aria-hidden="true" tabindex="-1"></a>. 5 . 1 . . . . .</span>
<span id="cb26-12"><a href="#cb26-12" aria-hidden="true" tabindex="-1"></a>. . . 8 . 6 . . .</span>
<span id="cb26-13"><a href="#cb26-13" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="dt">Just</span> grid' <span class="ot">=</span> solve grid</span>
<span id="cb26-14"><a href="#cb26-14" aria-hidden="true" tabindex="-1"></a><span class="ot">></span> <span class="fu">putStr</span> <span class="op">$</span> showGrid grid'</span>
<span id="cb26-15"><a href="#cb26-15" aria-hidden="true" tabindex="-1"></a>6 9 3 7 8 4 5 1 2</span>
<span id="cb26-16"><a href="#cb26-16" aria-hidden="true" tabindex="-1"></a>4 8 7 5 1 2 9 3 6</span>
<span id="cb26-17"><a href="#cb26-17" aria-hidden="true" tabindex="-1"></a>1 2 5 9 6 3 8 7 4</span>
<span id="cb26-18"><a href="#cb26-18" aria-hidden="true" tabindex="-1"></a>9 3 2 6 5 1 4 8 7</span>
<span id="cb26-19"><a href="#cb26-19" aria-hidden="true" tabindex="-1"></a>5 6 8 2 4 7 3 9 1</span>
<span id="cb26-20"><a href="#cb26-20" aria-hidden="true" tabindex="-1"></a>7 4 1 3 9 8 6 2 5</span>
<span id="cb26-21"><a href="#cb26-21" aria-hidden="true" tabindex="-1"></a>3 1 9 4 7 5 2 6 8</span>
<span id="cb26-22"><a href="#cb26-22" aria-hidden="true" tabindex="-1"></a>8 5 6 1 2 9 7 4 3</span>
<span id="cb26-23"><a href="#cb26-23" aria-hidden="true" tabindex="-1"></a>2 7 4 8 3 6 1 5 9</span></code></pre></div>
<p>It works! Let’s put a quick <code>main</code> wrapper around <code>solve</code> to call it from the command line:</p>
<div class="sourceCode" id="cb27" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="ot">main ::</span> <span class="dt">IO</span> ()</span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>main <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a> inputs <span class="ot"><-</span> <span class="fu">lines</span> <span class="op"><$></span> <span class="fu">getContents</span></span>
<span id="cb27-4"><a href="#cb27-4" aria-hidden="true" tabindex="-1"></a> Control.Monad.forM_ inputs <span class="op">$</span> \input <span class="ot">-></span></span>
<span id="cb27-5"><a href="#cb27-5" aria-hidden="true" tabindex="-1"></a> <span class="kw">case</span> readGrid input <span class="kw">of</span></span>
<span id="cb27-6"><a href="#cb27-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">putStrLn</span> <span class="st">"Invalid input"</span></span>
<span id="cb27-7"><a href="#cb27-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> grid <span class="ot">-></span> <span class="kw">case</span> solve grid <span class="kw">of</span></span>
<span id="cb27-8"><a href="#cb27-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="ot">-></span> <span class="fu">putStrLn</span> <span class="st">"No solution found"</span></span>
<span id="cb27-9"><a href="#cb27-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">Just</span> grid' <span class="ot">-></span> <span class="fu">putStrLn</span> <span class="op">$</span> showGrid grid'</span></code></pre></div>
<p>And now, we can invoke it from the command line:</p>
<pre class="plain"><code>$ echo ".......12.5.4............3.7..6..4....1..........8....92....8.....51.7.......3..." | stack exec sudoku
3 6 4 9 7 8 5 1 2
1 5 2 4 3 6 9 7 8
8 7 9 1 2 5 6 3 4
7 3 8 6 5 1 4 2 9
6 9 1 2 4 7 3 8 5
2 4 5 3 8 9 1 6 7
9 2 3 7 6 4 8 5 1
4 8 6 5 1 2 7 9 3
5 1 7 8 9 3 2 4 6</code></pre>
<p>And, we are done.</p>
<p>If you want to play with different puzzles, the file <a href="https://abhinavsarkar.net/files/sudoku17.txt.bz2?mtm_campaign=feed">here</a> lists some of the toughest ones. Let’s run<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> some of them through our program to see how fast it is:</p>
<pre class="plain"><code>$ head -n100 sudoku17.txt | time stack exec sudoku
... output omitted ...
116.70 real 198.09 user 94.46 sys</code></pre>
<p>It took about 117 seconds to solve a hundred puzzles, so, about 1.2 seconds per puzzle. This is pretty slow but we’ll get around to making it faster in the subsequent posts.</p>
<h2 data-track-content data-content-name="conclusion" data-content-piece="fast-sudoku-solver-in-haskell-1" id="conclusion">Conclusion</h2>
<p>In this rather verbose article, we learned how to write a simple Sudoku solver in Haskell step-by-step. In the later parts of this series, we’ll delve into profiling the solution and figuring out better algorithms and data structures to solve Sudoku more efficiently. The code till now is available <a href="https://code.abhinavsarkar.net/abhin4v/hasdoku/src/commit/0ef77341a10fcc25926301ee47b931d92959c0fa?mtm_campaign=feed" target="_blank" rel="noopener">here</a>.</p>
<p class="like-msg">
If you have any questions or comments, please leave a comment below. If you liked this post, please share it. Thanks for reading!
</p>
<section id="footnotes" class="footnotes footnotes-end-of-document" role="doc-endnotes">
<hr></hr>
<ol>
<li id="fn1"><p>This exercise was originally done as a part of <a href="https://github.com/pratul/haskell-classes/" target="_blank" rel="noopener">the</a> <a href="https://github.com/ford-prefect/haskell-classes/" target="_blank" rel="noopener">Haskell</a> <a href="https://github.com/bnvinay92/haskell-classes/" target="_blank" rel="noopener">classes</a> I taught at <a href="https://nilenso.com" target="_blank" rel="noopener">nilenso</a>.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2"><p><code>Alternative</code> implementation of <code>Maybe</code>:</p>
<div class="sourceCode" id="cb25" data-lang="haskell"><pre class="sourceCode numberSource haskell"><code class="sourceCode haskell"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="kw">instance</span> <span class="dt">Alternative</span> <span class="dt">Maybe</span> <span class="kw">where</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a> empty <span class="ot">=</span> <span class="dt">Nothing</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">Nothing</span> <span class="op"><|></span> r <span class="ot">=</span> r</span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a> l <span class="op"><|></span> _ <span class="ot">=</span> l</span></code></pre></div>
<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></li>
<li id="fn3"><p>All the runs were done on my MacBook Pro from 2014 with 2.2 GHz Intel Core i7 CPU and 16 GB memory.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section><section class="series-info">
<p>This post is a part of the series: <strong>Fast Sudoku Solver in Haskell</strong>.</p>
<ol>
<li>
<strong>A Simple Solution</strong> 👈
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-2/?mtm_campaign=feed">A 200x Faster Solution</a>
</li>
<li>
<a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-3/?mtm_campaign=feed">Picking the Right Data Structures</a>
</li>
</ol>
</section>
<p>If you liked this post, please <a href="https://abhinavsarkar.net/posts/fast-sudoku-solver-in-haskell-1/?mtm_campaign=feed#syndications">leave a comment</a>.</p><img referrerpolicy="no-referrer-when-downgrade" src="https://anna.abhinavsarkar.net/matomo.php?idsite=1&rec=1" style="border:0" alt="" /> 2018-06-28T00:00:00Z <p><a href="https://en.wikipedia.org/wiki/Sudoku" target="_blank" rel="noopener">Sudoku</a> is a number placement puzzle. It consists of a 9x9 grid which is to be filled with digits from 1 to 9. Some of the cells of the grid come pre-filled and the player has to fill the rest.</p>
<p><a href="https://www.haskell.org/" target="_blank" rel="noopener">Haskell</a> is a purely functional programming language. It is a good choice to solve Sudoku given the problem’s <a href="https://en.wikipedia.org/wiki/Combinatorics" target="_blank" rel="noopener">combinatorial</a> nature. The aim of this series of posts is to write a <strong>fast</strong> Sudoku solver in Haskell. We’ll focus on both implementing the solution and making it efficient, step-by-step, starting with a slow but simple solution in this post<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>.</p>