Domain-Theoretic Liftings
The domain-theoretic approach to non-termination is to model computations as maps between sets with an additional element. Thus we define a lifting operation which takes a set and adds an element to it
\[M A = A + 1\]which is the set of computations that either return an element of type \(A\) or do not terminate.
Of course, without proper restrictions on the functions that we can apply to it, this monad allows one to “decide non-termination”: one can write a function \(f : M A \to \{\textbf{True}, \textbf{False}\}\) which returns \(\textbf{True}\) if the program does not terminate and \(\textbf{False}\) otherwise. This clearly is not what we are trying to model.
To avoid this problem, in domain theory a set \(A\) is endowed with a complete partial order (\(\sqsubseteq\)) in which non-termination is modelled as the least element (\(\bot\)). The operation \(A \mapsto A_\bot\) which adds a least element to a CPO is called the lifting of the CPO.
Moreover, functions have to respect a continuity condition: they must preserve least upper bounds of \(\omega\)-chains,
\[f\Big(\bigsqcup_{i \in \omega} x_i\Big) = \bigsqcup_{i \in \omega} f(x_i)\]
Essentially this means that applying \(f\) to the best approximation of a subset can be computed locally on each element of that subset. One consequence of this fact is that \(f\) is monotonic: it preserves the order of the CPO. A key feature of this category is that every continuous map \(A_\bot \xrightarrow{\text{cont}} A_\bot\) has a least fixed point, via the Kleene Fixed-Point Theorem:
\[\text{fix}(f) = \bigsqcup_{n \in \omega} f^n(\bot)\]which is given by the least upper bound of the \(\omega\)-chain
\[\bot \sqsubseteq f(\bot) \sqsubseteq f^2(\bot) \sqsubseteq \dots \sqsubseteq f^n(\bot) \sqsubseteq \dots\]To go back to our original problem: since \(\bot \sqsubseteq a\) for all \(a \in A\), we cannot define a continuous map \(A_\bot \to 2_\bot\) such as the one above, because in the codomain of this function the elements \(\textbf{True}\) and \(\textbf{False}\) are unrelated.
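Incidentally, this least fixed point is exactly what underlies general recursion in lazy languages like Haskell, whose standard semantics is domain-theoretic. A minimal sketch (the definitions `fix` and `fact` are mine, for illustration):

```haskell
-- least fixed point: fix f = ⊔ fⁿ(⊥), definable directly by recursion
fix :: (a -> a) -> a
fix f = f (fix f)

-- factorial obtained as the fixed point of a non-recursive functional
fact :: Integer -> Integer
fact = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))
```

Note that `fix id` is the divergent computation \(\bot\), available at every type.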
Remark. When formalising mathematics in a proof assistant, one usually distinguishes two approaches:
- implementing all the theory inside the prover’s logic, or
- creating a new synthetic language whose structure is interpreted inside the mathematical theory we want to work with
The second approach is the one used, for example, in HoTT, where types are interpreted as certain topological spaces and functions as continuous maps.
Formalising domain theory becomes complicated when the proof assistant is based on type theory: the problem is that a type is not really a set.
On the other hand, doing things synthetically would mean that recursion is somewhat spread across the whole language. What I mean is that, since every continuous function has a fixed point, non-termination can happen at every type, making the internal language of this category effectively inconsistent when viewed as a logic. Hence the need for treating recursion as an effect.
The Coinductive Lifting (Capretta)
One solution proposed by Capretta is to take the coinductive solution to the following domain equation
\[D A \cong A + D A\]In other words, \(DA\) is the set coinductively generated by the constructors \(\text{now} : A \to D A\) and \(\text{delay} : DA \to DA\). Intuitively, \(\text{now}(x)\) is a terminating computation which returns an element \(x \in A\) in \(0\) steps, while \(\text{delay}(c)\) takes a computation \(c \in DA\) and delays it by adding one computational step to it. For example, \(\text{delay}(\text{delay}(\text{delay}(\text{now}(10))))\) is a computation which returns the number \(10\) in three steps.
Remark. \(D\) can be given the structure of an \(\omega\)-CPO with \(\bot\).
First, the non-terminating computation \(\bot\) can be defined coinductively as
\[\bot = \text{delay}(\bot)\]which is clearly a productive definition. Intuitively, \(\bot\) corresponds to the never-ending stream of delays:
\[\bot = \text{delay}(\text{delay}(\text{delay}\dots))\]Clearly, we cannot produce a function which discriminates between a terminating computation and a non-terminating one. Capretta proves that \(D\) is a domain (up to bisimilarity): he defines a partial order \(\sqsubseteq_D\) on \(D\) which leads to a notion of least upper bound for \(\omega\)-chains, written \(\bigsqcup_{n\in \omega} d_n\) for
\[d_0 \sqsubseteq_D d_1 \sqsubseteq_D d_2 \sqsubseteq_D \dots \sqsubseteq_D d_n \sqsubseteq_D \dots\]It can then be proven that every continuous function on \(D\) has a fixed point, similarly to the construction in domain theory.
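If we let Haskell's laziness stand in for coinduction, Capretta's lifting can be sketched as follows (the observer `runFor` is my own addition, for illustration):

```haskell
-- Capretta's coinductive delay monad: D A ≅ A + D A
data Delay a = Now a | Later (Delay a)

-- the divergent computation: an infinite, but productive, stream of delays
never :: Delay a
never = Later never

-- observe a computation for at most n steps; we can never decide
-- non-termination, only give up after a finite budget
runFor :: Int -> Delay a -> Maybe a
runFor _ (Now a)   = Just a
runFor n (Later d) = if n <= 0 then Nothing else runFor (n - 1) d
```

Here `runFor 5 (Later (Later (Later (Now 10))))` yields `Just 10`, while `runFor n never` yields `Nothing` for every budget `n`: a finite observer cannot distinguish "not yet" from "never".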
Considerations. Now that recursion is isolated into an effect we have solved one problem. However, programming with this monad in practice is far from easy, as one has to
- prove that each function on \(DA\) one defines is continuous
- work with a coinductive bisimilarity relation rather than equality
- ensure productivity of definitions
Metric Lifting Monad (Martin Escardó)
Escardó’s metric lifting models partiality using metric spaces rather than coinduction, but the idea is not that different from Capretta’s. The metric lifting of a set \(A\), written \(LA\), is defined as
\[LA = (A \times \mathbb{N}) \cup \{\infty\}\]together with a distance function \(d : LA \times LA \to [0, \infty]\) where equal computations have distance \(0\), a terminating computation \((a,k)\) and the non-terminating one have distance \((1/2)^k\), and distinct terminating computations \((a,k)\) and \((b,l)\) have distance \((1/2)^{\min(k,l)}\). Intuitively, \((a,k)\) is a computation which returns \(a\) in \(k\) steps and \(\infty\) is the divergent computation.
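This distance can be transcribed directly; the Haskell representation below is my own, for illustration:

```haskell
-- metric lifting: L A = (A × ℕ) ∪ {∞}
data L a = Ret a Int | Diverge

-- the ultrametric distance on L A
dist :: Eq a => L a -> L a -> Double
dist Diverge   Diverge   = 0
dist (Ret _ k) Diverge   = 0.5 ^ k
dist Diverge   (Ret _ k) = 0.5 ^ k
dist (Ret a k) (Ret b l)
  | a == b && k == l     = 0             -- equal computations
  | otherwise            = 0.5 ^ min k l -- distinct terminating ones
```

Note how adding a delay pushes a computation one step further and thereby halves all distances; this is exactly why \(\delta_A\) is contractive.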
Remark. \(LA\) is a complete bounded ultrametric space.
The unit of the monad \(LA\) is defined by \(\eta_A(a) = (a,0)\) and the delay operation is defined by
\[\delta_A(a,n) = (a, n + 1) \qquad \delta_A(\infty) = \infty\]In metric spaces terminology, a function is non-expansive if it does not expand the space relative to a distance function \(d\), but possibly contracts it:
\[d(f(x), f(y)) \le d(x,y)\]On the other hand, a contractive map is a map which contracts the space:
\[d(f(x), f(y)) \le c \cdot d(x,y)\]for some constant \(c < 1\). By Banach's fixed-point theorem, every contractive map on a non-empty complete metric space has a unique fixed point, so it is possible to define a fixed-point operator
\[\text{fix} : (LA \to LA) \to LA\]which sends every non-expansive map \(f\) to the unique fixed point of \(\delta_A \circ f\), a map that is contractive because \(\delta_A\) is contractive. At this point the divergent computation is defined as
\[\bot_A = \text{fix}(id_{LA})\]Considerations. This approach does not suffer from the issues of coinduction, but it still requires the programmer to prove that the functions they define are non-expansive.
Guarded Lifting (Atkey & McBride)
The coinductive lifting monad suffers from productivity and equality issues, while both the coinductive and metric liftings need additional structure on the maps defined on them to work properly with fixed-points.
In guarded type theory, however, maps are always non-expansive and contractiveness is enforced at the type level. In particular, a contractive map is a function of type \(\triangleright X \to X\), and there is a fixed-point operator at every type \(X\):
\[\text{fix}_g : (\triangleright X \to X) \to X\]sending a map \(f : (\triangleright X \to X)\) to the unique fixed-point of \(f \circ \text{next}\). The guarded lifting is defined as the unique solution to the domain equation
\[L_g A = A + \triangleright L_g A\]There is an obvious unit of the monad \(\eta_A : A \to L_g A\) and delay map which has type
\[\delta_A : \triangleright L_g A \to L_g A\]Conceptually, this monad can be seen as Capretta’s lifting monad with an explicit notion of time or delay built into the type theory. At this point the divergent computation \(\bot_A : L_{g} A\) is defined as the guarded fixed-point of \(\delta\):
\[\bot_A = \text{fix}_g (\delta_A)\]Now we can check from the fixed-point property that \(\bot_A = \delta_A (\text{next}(\bot_A))\). Here, the term \(\delta_A \circ \text{next}\) corresponds to the delay operation which adds one step to the computation.
Conclusion
The Synthetic Approach. What I personally find amazing about the guarded lifting is that this monad is truly synthetic. There is no need for additional structure as in Capretta's lifting, no need for checking continuity or non-expansiveness of maps. Furthermore, using the model of guarded type theory one can show that (in a certain sense) it corresponds to Escardó's metric lifting on one side and to Capretta's monad on the other. I will probably need another post to explain this point.
Intensionality. To be honest, the only problem arising from the use of guarded recursion is that computations are modelled intensionally: two computations that return the same output given the same input are not necessarily equal if they take a different number of steps to terminate. This issue has to be solved, once again, by quotienting the monad, which is another problem entirely.
Nevertheless, these problems also arise in coinductive and metric approaches. At present, the only extensional model of general recursion I am aware of is based on domain theory.
Consistency. Naturally, one might wonder why we need guarded recursion if domain theory already lets us model all of this extensionally. The answer is that, while domain theory is extremely powerful for modelling recursion extensionally, it does not yield a logically consistent model suitable for type theory. As noted in the introduction, this makes domain-theoretic models ill-suited as foundations for type-theoretic languages, where logical soundness is essential.
- a lax monoidal functor
- a monoid in a Day-monoidal category
- a morphism of lax-algebras for the free monoid 2-monad, and
- a codistributive law with the tensor product?
Well, none. Let's see why.
To keep this post as concise as humanly possible I will assume knowledge of (symmetric) monoidal categories, Kan extensions and enriched categories.
We show informally the following proposition.
Proposition. Let \((\mathcal{C}, \otimes_{\mathcal{C}}, I_{\mathcal{C}})\) be a small monoidal closed category enriched in a monoidal closed category \((\mathcal{D}, \otimes_{\mathcal{D}}, I_{\mathcal{D}})\) and let \(F : \mathcal{C} \to \mathcal{D}\) be a functor. The following statements for \(F\) are equivalent:
- It is a lax monoidal functor
- It is a monoid in the monoidal category \(([\mathcal{C}, \mathcal{D}], \otimes_\text{Day}, y(I_{\mathcal{C}}))\)
- It is a homomorphism of pseudo algebras for the free monoid 2-monad
- It is an \(\mathbb{N}\)-indexed family of (co)distributive laws
\[\text{Nat}(\otimes^{n}_{\mathcal{D}} \circ F^{n}, F \circ \otimes^{n}_{\mathcal{C}})\]where \(\otimes^{n}_{\mathcal{C}} : \mathcal{C}^{n} \to \mathcal{C}\) and \(\otimes^{n}_{\mathcal{D}} : \mathcal{D}^{n} \to \mathcal{D}\) are the \(n\)-fold tensors
Let us assume the hypothesis of the proposition.
Proof (sketch). (1) \(\Leftrightarrow\) (2).
A lax monoidal functor is a functor which lax-preserves the monoidal structure of \(\mathcal{C}\), that is, there is a morphism
\[u : I_{\mathcal{D}} \to F I_{\mathcal{C}}\]and a family of morphisms
\[\circledast_{X,Y} : F X \otimes_{\mathcal{D}} F Y \to F (X \otimes_{\mathcal{C}} Y)\]indexed by \(X,Y\) and natural therein, subject to some coherence conditions.
On the other hand, the Day convolution provides a natural way to define a monoidal structure on the category of functors. In other words, the task is to turn the category of functors \([\mathcal{C}, \mathcal{D}]\) into a monoidal category by equipping it with a tensor product and a unit. Hence, for two functors \(F, G : \mathcal{C} \to \mathcal{D}\) the Day convolution \(\otimes_\text{Day}\) is defined as follows:
\[\begin{align*} (F \otimes_\text{Day} G) C & := \int^{X,Y \in \mathcal{C}} \mathcal{C}(X \otimes_{\mathcal{C}} Y, C) \otimes_{\mathcal{D}} F X \otimes_{\mathcal{D}} G Y\\ & = \text{Lan}_{\otimes_{\mathcal{C}}}(\otimes_{\mathcal{D}} \circ (F \times G)) \end{align*}\]while the unit of \([\mathcal{C}, \mathcal{D}]\) is given by the Yoneda embedding applied to the unit \(I_\mathcal{C}\), that is \(y(I_\mathcal{C}) = \mathcal{C}(I_\mathcal{C},-)\).
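For the special case \(\mathcal{C} = \mathcal{D} = \textbf{Hask}\), the coend collapses to an existential type, and a well-known instance of the proposition is that Day-monoids on \(\textbf{Hask}\) are exactly Applicative functors. A sketch (mirroring the encoding in the kan-extensions package):

```haskell
{-# LANGUAGE GADTs #-}

-- Day convolution over Hask: an existential pair of payloads
-- together with a continuation out of their product
data Day f g a where
  Day :: f x -> g y -> (x -> y -> a) -> Day f g a

-- a Day-monoid multiplication exists exactly when f is Applicative
mu :: Applicative f => Day f f a -> f a
mu (Day fx fy op) = op <$> fx <*> fy
```

For instance, `mu (Day [1,2] [10] (+))` evaluates to `[11,12]`: the multiplication of the list Applicative.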
Now, a monoid in \(([\mathcal{C}, \mathcal{D}], \otimes_\text{Day}, y(I))\) is called a Day-monoid. This is a functor \(F : \mathcal{C} \to \mathcal{D}\) together with a unit and multiplication map.
- The unit map \(\eta : y(I) \to F\) is obtained from the unit of the lax monoidal functor (and vice versa) via the enriched Yoneda lemma
- The multiplication map \(\mu : F \otimes_{\text{Day}} F \to F\) is obtained from \(\circledast\) (and vice versa) by using the adjunction \(\text{Lan}_J \dashv - \circ J\) with \(J = \otimes_{\mathcal{C}}\)
It remains to prove that the laws of the unit and multiplication of the monoid imply the lax monoidal properties of \(u\) and \(\circledast\) (left as exercise to the reader).
\((2) \Leftrightarrow (3)\).
This is a rather easy statement which generalises the free monoid construction to 2-categories.
In particular, the cheapest way of turning a set \(A\) into a monoid is to take the set of words over \(A\), namely \(A^*\). This is the free monoid over \(A\): the empty word is the unit and concatenation is the multiplication. Moreover, the category of Eilenberg-Moore algebras for the free monoid monad \((-)^*\) is equivalent to the category of monoids.
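In Haskell terms, the free-monoid monad on sets is the list monad, and an Eilenberg-Moore algebra for it is a monoid structure on its carrier; a small sketch (names are mine):

```haskell
-- unit and multiplication of the free-monoid (list) monad
etaM :: a -> [a]
etaM a = [a]

muM :: [[a]] -> [a]
muM = concat

-- an Eilenberg-Moore algebra on Integer: the additive monoid,
-- folding a word of integers down to a single one
alg :: [Integer] -> Integer
alg = sum
```

The algebra laws say that `alg . etaM = id` and that `alg . muM` equals `alg . map alg`, which is exactly associativity and unitality of addition.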
Similarly, given a category \(\mathcal{C}\), the cheapest way of turning it into a monoid (here, a monoidal category) is to send \(\mathcal{C}\) to the category of finite sequences of objects \((A_1, \dots, A_n)\) with componentwise sequences of morphisms of \(\mathcal{C}\). In other words, \(T\) is the free monoid 2-monad on \(\textbf{Cat}\), defined as the \(\mathbb{N}\)-indexed coproduct of the powers \(\mathcal{C}^n\), that is
\[T\mathcal{C} = \sum_{n : \mathbb{N}} \mathcal{C}^{n}\]Now, similarly to what happens in the 1-category case, we have the following equivalence
\[\textbf{Cat}^T \simeq 2\text{-Mon}\]where \(\textbf{Cat}^T\) is the 2-category of algebras for a 2-monad \(T\) and \(T\)-algebra homomorphisms and \(2\)-Mon is the 2-category of monoidal categories and monoidal functors (monoids in \(\textbf{Cat}\)). Hence (pseudo) \(T\)-algebra homomorphisms are (lax) monoidal functors.
\((3) \Leftrightarrow (4)\).
Clearly, if \(T\) is the free monoid 2-monad, an algebra for \(T\) is a map
\[a : \sum_{n : \mathbb{N}} \mathcal{C}^n \to \mathcal{C}\]The previous point states that this is a monoidal category where \(A \otimes_\mathcal{C} B := a (A,B)\) and \(I_\mathcal{C} = a()\), thus \(a\) sends \((A_1, \dots, A_n)\) to \(A_1 \otimes_\mathcal{C} \dots \otimes_\mathcal{C} A_n\).
A lax monoidal functor \(F\) is a lax \(T\)-algebra homomorphism, thus it comes with a family of maps
\[F(A_1) \otimes_\mathcal{D} \dots \otimes_\mathcal{D} F(A_n) \to F(A_1 \otimes_\mathcal{C} \dots \otimes_\mathcal{C} A_n)\]defined at all \(n\) and all \(A_i\), hence a (co)distributive law
\[\text{Nat}(\otimes^{n}_{\mathcal{D}} \circ F^{n}, F \circ \otimes^{n}_{\mathcal{C}})\]

One example of this phenomenon arises when considering CCS with the choice operator. In this language we can define two processes \(P\) and \(Q\) as follows
\[P = \text{pay}.(\text{coffee}. 0 + \text{tea}. 0)\] \[Q = (\text{pay}.\text{coffee}.0 + \text{pay}.\text{tea}. 0)\]Now the trace semantics of the CCS processes can be defined by a function
\[[\![ \cdot ]\!] : \text{CCS} \to \mathcal{P}_\text{fin}(\text{Str } L)\]where \(L\) is the finite set of actions and, for a generic set \(A\), the set \(\text{Str } A = 1 + A \times \text{Str }A\) is the set of possibly finite streams over a set \(A\).
For the processes above we have that the semantics of \(P\) is \([\![P]\!] = \{\text{pay}.\text{coffee}, \text{pay}.\text{tea}\}\) and the semantics of \(Q\) is \([\![ Q ]\!] = \{\text{pay}.\text{coffee}, \text{pay}.\text{tea}\}\) and thus the trace semantics of \(P\) and \(Q\) indicate that these processes should be equal.
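One can check this computation mechanically; here is a small sketch in Haskell, with a syntax for finite CCS processes of my own devising:

```haskell
-- finite CCS processes with prefixing and binary choice
data CCS = Nil | Act String CCS | Choice CCS CCS

-- the set of completed traces of a finite process
traces :: CCS -> [[String]]
traces Nil          = [[]]
traces (Act a p)    = map (a :) (traces p)
traces (Choice p q) = traces p ++ traces q

p, q :: CCS
p = Act "pay" (Choice (Act "coffee" Nil) (Act "tea" Nil))
q = Choice (Act "pay" (Act "coffee" Nil)) (Act "pay" (Act "tea" Nil))
```

Both `traces p` and `traces q` evaluate to `[["pay","coffee"],["pay","tea"]]`, so the trace semantics identifies the two processes.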
However, consider the simulation relation \(P \lesssim Q\), read "\(Q\) simulates \(P\)", defined as
\[P \lesssim Q \Leftrightarrow \forall a, P'. \text{ if } P \xrightarrow{a} P' \text{ then } \exists Q'. Q \xrightarrow{a} Q' \text{ s.t. } P' \lesssim Q'\]Mutual simulation then gives \(P \approx Q \Leftrightarrow P \lesssim Q \text{ and } Q \lesssim P\). (Strictly speaking, mutual similarity is coarser than bisimilarity in general, but it suffices for this example.)
The above is a standard example in concurrency theory: bisimulation can distinguish processes that the trace semantics regards as equal, and this is why bisimulations turn out to be the more useful relations for comparing processes.
Using the example above we can prove that \(Q \lesssim P\). Let’s define half-evaluated processes as
\[P' = \text{coffee}. 0 + \text{tea}. 0\] \[P'_{1} = \text{coffee}. 0\] \[P'_{2} = \text{tea}. 0\] \[Q_{1} = \text{pay}.\text{coffee}.0\] \[Q_{2} = \text{pay}.\text{tea}.0\] \[Q'_{1} = \text{coffee}.0\] \[Q'_{2} = \text{tea}.0\]Now, for each transition of \(Q\), we have to show that \(P\) simulates it. The first one is \(Q \xrightarrow{\text{pay}} Q'_{1}\). Obviously \(P \xrightarrow{\text{pay}} P'\), so we have to show that \(Q'_{1} \lesssim P'\), which clearly holds: the only transition of \(Q'_{1}\) is \(Q'_{1} \xrightarrow{\text{coffee}} 0\), which \(P'\) matches with \(P' \xrightarrow{\text{coffee}} 0\). This works similarly if \(Q\) decides to take the other route and produce tea in the end.
All right, but \(P \lesssim Q\) does not hold. Since \(P\) makes the transition \(P \xrightarrow{\text{pay}} P'\), we are forced to select which branch of \(Q\) simulates this behaviour, and no matter which one we choose we get stuck. Say \(Q \xrightarrow{\text{pay}} Q'_{1}\): we then have to show \(P' \lesssim Q'_{1}\), but this does not hold because \(P'\) can make two different transitions while \(Q'_{1}\) can only make one.
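For finite processes the simulation game terminates, so the argument above can be checked mechanically; a minimal Haskell sketch (syntax and names mine):

```haskell
-- finite processes with prefixing and binary choice
data Proc = Stop | Pre String Proc | Plus Proc Proc

-- the one-step transitions of a process
steps :: Proc -> [(String, Proc)]
steps Stop       = []
steps (Pre a p)  = [(a, p)]
steps (Plus p q) = steps p ++ steps q

-- sim p q checks p ≲ q: every move of p is matched by some move of q
sim :: Proc -> Proc -> Bool
sim p q =
  and [ or [ a == b && sim p' q' | (b, q') <- steps q ]
      | (a, p') <- steps p ]

pP, qP :: Proc
pP = Pre "pay" (Plus (Pre "coffee" Stop) (Pre "tea" Stop))
qP = Plus (Pre "pay" (Pre "coffee" Stop)) (Pre "pay" (Pre "tea" Stop))
```

Running this, `sim qP pP` evaluates to `True` while `sim pP qP` evaluates to `False`, matching the argument above.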
Corecursion Schemes and Traces
Consider now the unfold function, which takes a seed function and produces a trace by running the seed at each step
unfold :: (x -> (L, x)) -> x -> Str L
unfold seed x = let (l, x') = seed x in l : unfold seed x'

Notice that the seed function \(X \to L \times X\) can be viewed as a Labeled Transition System (LTS) where the set of states is \(X\) and the function implements the transitions.
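Since Str L is in general infinite, we can only observe finite prefixes of the unfold; here is a runnable list-based variant with a toy deterministic LTS (both of my own devising):

```haskell
-- unfold into an ordinary (lazy, hence possibly infinite) list of labels
unfoldL :: (x -> (l, x)) -> x -> [l]
unfoldL seed x = let (a, x') = seed x in a : unfoldL seed x'

-- a toy LTS: states are integers, labels are their parities
parity :: Int -> (Bool, Int)
parity n = (even n, n + 1)
```

Then `take 4 (unfoldL parity 0)` gives `[True,False,True,False]`: the trace of the LTS starting from state 0.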
It is a very well-known fact that unfold is fully abstract, in the sense that if we consider the notion of bisimilarity above and set \([\![ \cdot ]\!]\) to be unfold seed, then we have the following theorem
Full abstraction. For all \(t_{1}, t_{2}\): \(t_{1} \approx t_{2} \Leftrightarrow [\![ t_{1} ]\!] = [\![ t_{2}]\!]\).
This is also backed by the fact that, when programming in proof assistants like Agda – since coinductive data types are not really final coalgebras – it is common practice to just add the following axiom to the type theory
Axiom \(\text{ for all } (s_{1}, s_{2} : \text{Str L}). s_{1} \approx s_{2} \to s_{1} = s_{2}\) .
Even more so, in some proof assistants like Isabelle coinductive data types are real final coalgebras and so the above axiom is actually a true fact in the prover’s logic.
Notice that the other direction is obvious, and thus the axiom implies that bisimilarity is logically equivalent to equality.
So why does bisimulation in the CCS example above not correspond to equality?
The reason is that the behaviour functor for CCS+choice is not \(BX = L \times X\) but \(BX = \mathcal{P}_\text{fin}(L \times X)\).
In fact, the seed function describing the LTS of CCS+choice has the following type
opsem :: CCS -> [(L, CCS)]

where we use lists [-] as a (rough) implementation of finite powersets.
At this point the LTS for CCS+choice can be defined roughly like this
...
opsem (p + q) = [(l, p') | (l, p') <- opsem p ] ++ [(l, q') | (l, q') <- opsem q ]

And now the unfold on this LTS will yield a fully abstract semantics
unfold opsem :: CCS -> Trees L

where \(\text{Trees}\; L = \mathcal{P}_\text{fin} (L \times \text{Trees}\; L)\).
Say that you want to do denotational semantics for a simply typed calculus with a unary constructor \(\textsf{R}\) which has the following typing rule
\[\frac{\Gamma \vdash t : A}{\Gamma \vdash \textsf{R}(t) : B}\]The task is to give a semantic interpretation \([\![ \cdot ]\!]\) for the language by induction on the typing judgment \(\Gamma \vdash t : A\) such that terms are interpreted as morphisms \([\![\Gamma ]\!] \xrightarrow{[\![ t ]\!]} [\![ A ]\!]\), assuming of course that \([\![ \cdot ]\!]\) is also defined separately for contexts and types.
To interpret the rule above we do induction on the typing judgment. Thus we assume there exists a morphism \([\![ \Gamma ]\!] \xrightarrow{[\![ t ]\!]} [\![ A ]\!]\) and we construct a morphism \([\![ \Gamma ]\!] \xrightarrow{[\![ \textsf{R}(t) ]\!] } [\![ B ]\!]\).
For simplicity we drop the semantic brackets, writing, for example, \(A\) for the interpretation \([\![ A ]\!]\), \(t : \Gamma \to A\) for the interpretation of \(t\), and so on.
Back to the problem we are trying to solve. It can be quite tricky sometimes to figure out what the semantics of \(\textsf{R}(t)\) should be, since there is some plumbing needed to pass around the context. A particular instantiation of the Yoneda lemma states that given a morphism \(t : \Gamma \to A\) and a morphism \(R : A \to B\) there is a canonical way to construct a morphism \(\Gamma \xrightarrow{R(t)} B\).
To show this we instantiate the contravariant Yoneda lemma with \(F = \mathbb{C}(-, B)\). Then for every object \(A\) we have
\[\mathbb{C}(A, B) \cong \text{Nat}(\mathbb{C}(-, A), \mathbb{C}(-, B))\]Let \(R : A \to B\) be the interpretation of \(\textsf{R}\); one side of the isomorphism is \(\phi (R, t) = F(t)(R) = \mathbb{C}(t, B)(R) = R \circ t\). In other words, the interpretation of \(\textsf{R}(t)\) is simply \(R \circ t\).
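Reading morphisms as Haskell functions, the canonical construction the Yoneda lemma hands us is nothing but post-composition (a sketch; the name `interpR` is mine):

```haskell
-- interpret R(t) in context gamma: given ⟦t⟧ : Γ → A and ⟦R⟧ : A → B,
-- the Yoneda-induced morphism Γ → B is just composition
interpR :: (gamma -> a) -> (a -> b) -> (gamma -> b)
interpR t r = r . t
```

For instance, `interpR (+1) show 4` evaluates to `"5"`: first the term, then the constructor.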
Assume \(\Lambda_X\) is the set of closed well-typed STLC (Simply Typed \(\lambda\)-calculus) terms. Clearly, STLC can be interpreted into any Cartesian Closed Category (CCC) by defining an interpretation function \([\![\cdot]\!] : \Lambda_X \to \mathcal{C}\) such that for any term \(t \in \Lambda_X\), \([\![t]\!] \in \mathcal{C}(1, [\![\sigma]\!])\) where \(\sigma\) is the type of \(t\). We will only consider well-typed interpretations here. Moreover, it can be proved that the interpretation function is sound and complete. The completeness statement reads as follows. For all terms \(t_1\) and \(t_2\),
\[t_1 \equiv_{\beta\eta} t_2 \text{ iff } [\![t_1]\!] = [\![t_2]\!]\]where the \((\Rightarrow)\) direction is soundness whereas \((\Leftarrow)\) is completeness of the interpretation.
This statement is certainly true. If two terms are \(\beta\eta\) equivalent they are equal in the model, i.e. the semantics is agnostic to \(\beta\eta\)-step reductions. Conversely, all equations that hold for any two STLC-denotable terms also hold in the syntax.
However, completeness of a model is a slightly different statement:
\[t_1 \equiv_{\beta\eta} t_2 \text{ iff for all } [\![ \cdot ]\!] : \Lambda_X \to \mathcal{C},\ [\![t_1]\!] = [\![t_2]\!]\]This one states that, for a fixed category \(\mathcal{C}\), \(\beta\eta\)-equivalence between terms holds if and only if the two terms are equal under every possible interpretation.
In this sense, CCCs are not complete models. The counterexample is given by a preorder category \(\mathcal{P}\) with CCC structure. A preorder category has at most one morphism (\(\sqsubseteq\)) between any two objects. If this category has a greatest element \(\top\), binary meets (\(\wedge\)) and Heyting implications (\(\to\)) then \(\mathcal{P}\) is a CCC.
Now the problem is that, in a thin category, every (well-typed) interpretation sends two programs of the same type to morphisms with the same domain and codomain, and since the category is thin these two morphisms are always equal. For example, consider the projection maps \(x \wedge x \xrightarrow{\pi_1} x\) and \(x \wedge x \xrightarrow{\pi_2} x\) out of the product, whose codomains coincide. In \(\mathcal{P}\) these are the same map, i.e. \(\pi_1 = \pi_2\).
Now the right-hand side of the completeness theorem is satisfied, since for all well-typed interpretations \([\![\cdot]\!]\) we have \([\![\pi_1]\!] = [\![\pi_2]\!]\) (when the codomain of the two is the same). However, the projections \(\pi_1\) and \(\pi_2\) in the syntax are definitely not \(\beta\eta\)-equivalent.
I will refer the reader to the original paper for more details.
First off, I do not consider myself an expert on set theory, but after having this kind of conversation with mathematicians and computer scientists I found there are some misconceptions around this axiom and the reasons why it is needed.
For example, as you will see, while it is indeed true that the axiom of choice is connected with the existential quantifier, it is not true that we cannot pick an element out of the existential because the logic is classical.
In my mind there are two problems: the first is that
the existential quantifier does not ensure there exists one element with a particular property in the domain of discourse
and the second is that
we would need to create an infinite proof that uses Existential Instantiation for each element of the indexing set
However, in order to fully understand what is going on we need to be more precise. So first let’s begin with what is the axiom of choice.
The axiom of choice (AC)
The original formulation of the AC is the following.
Given a set \(X\) and a family of non-empty sets \(\{A_x\}_{x \in X}\) over \(X\), the infinite product of these sets, namely \(\Pi_{x \in X}. A_{x}\) is non-empty
For the record, the infinite product is defined as follows
\[\Pi_{x \in X}. A_{x} = \{ f : X \to \bigcup_{x \in X} A_{x} \mid \forall x \in X.\ f(x) \in A_{x} \}\]However, this statement is a little more packed than we would like it to be. An equivalent statement is skolemization.
Skolemization (Sk)
Skolemization is what allows one to turn an existentially quantified formula into a function. Formally, skolemization is the following statement
Given a relation \(R \subseteq X \times Y\), if \(\forall x \in X. \exists y \in Y. R(x,y)\) then \(\exists f \in X \to Y. \forall x \in X. R (x, f(x))\)
The AC is equivalent to skolemization. A full discussion of this fact can be found here.
To prove that Sk \(\Rightarrow\) AC, given a family of sets \(\{A_{x}\}_{x \in X}\), we define the relation \(R(x,y) = y \in A_{x}\). For the other direction, we assume a relation \(R \subseteq X \times Y\) and construct the family of sets \(\{A_{x}\}_{x \in X}\) with \(A_{x} = \{ y \mid y \in Y \text{ and } R(x,y)\}\).
The existential
Set theory is a first-order logic together with a set of axioms (9 of them exactly including the AC) postulating the existence of certain sets. Besides the propositional fragment of first-order logic there is also the predicate fragment formed by universal quantification (\(\forall\)) and existential quantification (\(\exists\)).
The Existential Instantiation rule states that if we know there exists an \(x\) that satisfies the property \(P\) and we can construct a proof from a fresh \(t\) that satisfies that property to a proposition \(R\) then we can obtain \(R\)
\[\frac{\exists x.\, P \qquad P[t/x] \vdash R }{R}\]with \(t\) fresh and free for \(x\) in \(P\), and not occurring in \(R\).
So here we have to treat \(t\) carefully in that it is a fresh \(t\) that satisfies \(P\), but “we do not know what it is!”.
The reason why I put this sentence in quotes is because this is the explanation that many people would use. However, to me the real reason is that we do not know how many other elements in the universe exist with such a property. There is certainly one, but there may be more.
The problem with producing a choice function
To prove Sk we have to assume \(\forall x \in X. \exists y \in Y. R(x, y)\) and then prove \(\exists f : X \to Y. \forall x \in X. R (x , f (x))\). Here \(f : X \to Y\) really means a relation \(f \subseteq X \times Y\) which is a function, i.e. for every \(x \in X\) there exists exactly one \(y \in Y\) such that \((x,y) \in f\).
Now first we try to construct this relation \(f\). A first naive attempt is to use the axiom of comprehension as follows
\[f = \{(x, y) \mid x \in X \wedge y \in Y \wedge R(x, y)\}\]The problem is that \(f\) is clearly not a function, since there may be more than one \(y\) per \(x\) in \(R\). Notice that the above statement is very similar to the one where we include the existential
\[f = \{(x, y) \mid x \in X \wedge \exists y'. y = y' \wedge R(x, y)\}\]But this does not change much, since we know there is at least one \(y\) for every \(x\) but we do not know how many. Clearly, we can prove that for all \(x \in X\) we have \(R(x, f(x))\); however, we cannot prove that \(f\) is a function, in particular that each \(x \in X\) is mapped to a unique \(y \in Y\).
Now the question is, couldn’t we just have picked one \(y\) for each \(x\)?
We could do this if we were able to use Existential Instantiation for each \(x \in X\). If \(X\) were finite then we could certainly do that: we pick an \(n \in \mathbb{N}\) and assume that \(X = \{x_0, x_1, \dots, x_n \}\).
Now we can construct a set of pairs \((x_i, y_i)_{i\in \{0,\dots,n\}}\) such that every \((x_i, y_i) \in R\) by repeatedly using Existential Instantiation. Once the set is created we can take \(f\) to be it.
However, when \(X\) is not finite, we cannot simply write down the set by hand. Instead we have to create a formula and then use set comprehension. However, there is no (open) formula of the form
\[(x_0,y_0) \in R \wedge (x_1,y_1) \in R \wedge \dots \wedge (x_n, y_n) \in R \wedge \dots\]This is because formulas and proofs in set theory are finite, and the one above is an infinite formula which would need a (potentially) infinite number of applications of the Existential Instantiation rule.
Conclusions
Hopefully this untangles some confusion around the axiom of choice.
On the other hand, AC is derivable in type theory simply because we have access to the proof that for every \(x\) there exists a \(y\) such that \(R(x,y)\): inhabitants of the dependent product \(\forall\) are already functions, which pick out one witness for each \(x\).
See the code below.
choice : ∀ (A B : Set) → ∀ (R : A → B → Set) → (∀ (x : A) → Σ B (λ y → R x y)) → Σ (A → B) (λ f → ∀ x → R x (f x))
choice A B R r = (λ x → proj₁ (r x)) , (λ x → proj₂ (r x))

If you have any comment about this please feel free to drop me an email or something; I would be very happy to know more (especially if I said something wrong).
Here the word trivial means that every object \(A\) in the category is isomorphic to the terminal object \(1\).
To do this proof we make use of the fixed-point operator, which exists at all types.
We know that for all endomaps \(f : A \to A\) in the category there exists a map \(\text{fix}_{f} : 1 \to A\) such that \(f \circ \text{fix}_{f} = \text{fix}_{f}\). Thus, we can use the unique endomap on the initial object, namely the identity map \(id_{0}: 0 \to 0\), to get a map \(\text{fix}_{id_{0}} : 1 \to 0\). But now, because \(0\) is initial (and \(1\) is terminal), we also have a unique map into the terminal object, namely \(! : 0 \to 1\). It is easy to see that \(\text{fix}_{id_{0}}\) and \(!\) are inverses of each other, hence they form an isomorphism \(0 \cong 1\). In particular, \(\text{fix}_{id_{0}} \circ ! : 0 \to 0\) is \(id_{0}\) by initiality and \(! \circ \text{fix}_{id_{0}} : 1 \to 1\) is \(id_{1}\) by finality.
Now we compute as follows. For every object \(A\) in the category \(1 \cong 0 \cong 0 \times A \cong 1 \times A \cong A\) and the proof is concluded.
This result was shown to hold also in the case where, instead of the initial object, we postulate a natural numbers object \(\mathbb{N}\).
A natural question to ask now is:
is every model of PCF trivial?
To answer this question we take as a model of PCF the category of Scott domains. This category consists of pointed directed complete partial orders (dCPPO) as objects and continuous functions as arrows (just following Thomas Streicher’s book to avoid any misunderstanding).
Now, we would like to prove that this category is cartesian closed (which we know), has a fixed-point map (which it has) and that it has an initial object. However,
there is no initial object in the category of Scott domains
This is because if this category had an initial object \(0\), it would have at least a bottom element \(\bot_0\). Notice that the subset \(\{\bot_0\}\) is directed and its supremum \(\bigsqcup \{\bot_0\}\) is \(\bot_0\) itself. Now if we take any other dCPPO \(X\), a continuous function \(f : 0 \to X\) may map \(\bot_{0}\) to any element \(x \in X\) and still satisfy the continuity equation
\[f \Big(\bigsqcup \{\bot_0\}\Big) = \bigsqcup f \{\bot_0\}\]because, for any \(x \in X\) we choose for \(f(\bot_0)\) (even the bottom element), \(\bigsqcup f \{\bot_0\} = \bigsqcup \{x\} = x\). Hence maps out of \(0\) are not unique.
The only way this category could have an initial object is if the arrows were strict, namely if they preserved \(\bot\) elements, but, as we have seen, continuous functions do not necessarily preserve them.
Is this just a coincidence that Scott’s model is not trivial?
Not really: if it were trivial it would break computational adequacy, which is the statement that for every pair of well-typed terms \(\Gamma \vdash t : A\) and \(\Gamma \vdash t' : A\)
if \([\![ t ]\!] = [\![ t' ]\!]\) then \(t \approx t'\)
where \(\approx\) is contextual equivalence of programs.
But if the model were trivial then all pairs of PCF-denotable terms (pairs of maps into something isomorphic to \(1\)) would be equal (by finality) and therefore operationally equivalent.
What does this all mean for the Haskell programmer?
Well nothing, because Haskell does not have a formal model.
But let’s say we make a big leap and take the fragment of Haskell consisting of “inductive data types” and recursion. Now I can craft a program that resembles what I just said above
{-# LANGUAGE GADTs #-}

data Empty where

data Unit = One ()

y :: (a -> a) -> a
y f = f (y f)

empty :: Empty -> Empty
empty x = x

-- informal "these two are equal" combinator, used only to annotate reasoning
(===) :: a -> a -> a
a === _ = a

endoEmpty :: Unit -> Empty
endoEmpty = y id === id (y id) -- by the fixed-point property y f = f (y f)

Is this a problem? No, because y id is the infinite computation. In other words, endoEmpty sends the unit element to \(\bot\). But since Haskell functions need not be strict, I can send the \(\bot\) element of Empty to One (). So this map is not an isomorphism.
Conclusions
This is probably a very convoluted way of saying
There is no initial object (or natural numbers object) in PCF (or other “PCF-like” languages like Haskell)
this is because Empty actually contains the bottom element \(\bot\).
For the same reasons, if we now consider System F with a polymorphic fixed-point operator and define the \(0\) object by setting \(0 = \forall X.\, X\):
This object has actually an inhabitant: the non-terminating computation. Thus, it is not the initial object.