I’ll start by rationalizing an example of “old” umbral calculus from Wikipedia. |\newcommand{\pair}[2]{\langle{#1}\mid{#2}\rangle}| |\newcommand{\bigpair}[2]{\left\langle{#1}\ \middle|\ {#2}\right\rangle}| |\newcommand{\pseq}[2]{\{#1\}_{#2 \in \mathbb N}}| |\newcommand{\ucomp}[2]{#1_n(\underline #2(x))}|
We know |(x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k|. We then “infer” that |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k| where |B_n(x)| are the Bernoulli polynomials. Actually, the “old” style would be even more dubious. You’d have a “rule” like representing |B_n(x+y)| as |(b+y)^n|, then expand like usual and replace |b^k = (b + 0)^k| with |B_k(x)|. The variables like |b| were the “shadow” or “umbral” variables.
Let’s rationalize it using techniques I’ll describe below. Let |\varepsilon_y| be the linear operator on polynomials satisfying |\varepsilon_y p(x) = p(x + y)|. Since |D_x[\varepsilon_y p(x)] = \varepsilon_y D_x p(x)| for all |y| where |D_x| is differentiation by |x|, |\varepsilon_y| is induced from a formal power series in |D_x|. In particular, |\varepsilon_y = e^{yD_x}|.
Let |T| be the linear operator (the Sheffer operator) characterized by mapping |x^n| to |B_n(x)| representing a change of basis on the vector space of polynomials. We can apply |T| to the equation \[ \varepsilon_y(x^n) = (x+y)^n = \sum_{k=0}^n {n \choose k} x^{n-k} y^k \] to get via linearity \[ T(\varepsilon_y(x^n)) = T((x+y)^n) = \sum_{k=0}^n {n \choose k} T(x^{n-k}) y^k = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k \]
The key property we then need is |T(\varepsilon_y(x^n)) = \varepsilon_y T(x^n) = B_n(x + y)| which can be reduced to |D_xT(x^n) = T(D_x x^n)|. This is just the statement that |D_x T(x^n) = D_x B_n(x) = nB_{n-1}(x) = nT(x^{n-1}) = T(nx^{n-1}) = T(D_x x^n)| using a well-known property of Bernoulli (and other) polynomials. In fact, this relation implies that |T| is itself induced by a formal power series and thus commutes with any other linear operator so induced.
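As a quick sanity check of the identity we just derived, here is a small Python sketch (my own illustration, not from the original; it assumes the standard recurrence for the Bernoulli numbers, with |B_1 = -1/2|) verifying |B_n(x+y) = \sum_{k=0}^n {n \choose k} B_{n-k}(x) y^k| exactly in rational arithmetic for small |n|:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(N):
    # B_0, ..., B_N via the recurrence sum_{k=0}^{n-1} C(n+1, k) B_k = -(n+1) B_n.
    B = [Fraction(1)]
    for n in range(1, N + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / Fraction(n + 1))
    return B

def bernoulli_poly(n, x):
    # B_n(x) = sum_{k=0}^n C(n, k) B_k x^{n-k}
    B = bernoulli_numbers(n)
    return sum(comb(n, k) * B[k] * x ** (n - k) for k in range(n + 1))

x, y = Fraction(2, 3), Fraction(5, 7)
for n in range(8):
    lhs = bernoulli_poly(n, x + y)
    rhs = sum(comb(n, k) * bernoulli_poly(n - k, x) * y ** k for k in range(n + 1))
    assert lhs == rhs
```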
Ultimately, the only properties we needed for this were that we had a (linear) change of basis from the monomial basis to the polynomial sequence, which we’ll have for any polynomial sequence whose |n|th element has degree |n|, and that the change of basis commuted with the |D_x| operator. The latter is more stringent, but there are various ways we can expand the scope of the argument.
First, |D_x x^n = \frac{c_n}{c_{n-1}} x^{n-1}| with |c_n = n!| completely characterizes (with |D_x x^0 = 0|) the differentiation operation on polynomials. Choosing a different sequence |c| leads to different notions of “differentiation”. This will change |\varepsilon_y| and lead to different formulas, but they will be structurally similar.
In a different direction, we can ask for other formal power series and polynomial sequences that relate to each other the way |D_x| and |x^n| do. We say that a polynomial sequence |s_n(x)| is Sheffer for a pair of formal power series |(g(t), f(t))| with |\deg g = 0| and |\deg f = 1| when |\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n|. (This inner-product-like notation will be defined later, but the key thing is that this mirrors |\pair{t^k}{x^n} = c_n\delta_k^n|.) This has |g(t)f(t)^k| taking the place of |t^k| and |s_n(x)| taking the place of |x^n|. This would let us transfer identities involving the |s_n(x)| to any linear operator that commutes with |g(t)f(t)|. While changing |c| changes our notion of “differentiation”, using a Sheffer sequence allows us to consider other “differential operators” using the same notion of “differentiation”.
This is based primarily on works by Steven Roman and Gian-Carlo Rota. It closely follows The Theory of the Umbral Calculus I by Steven Roman (1982). See also The Umbral Calculus by Steven Roman (1984).
I’ll include proofs below to illustrate that each bit of reasoning is fairly straightforward.
- Overview
- Conventions
- Formal Power Series
- Linear Functionals
- Linear Operators
- Polynomial Sequences
- Recurrence Formulas
- Transfer Formulas
- Umbral Composition and Transfer Operators
- Example: Chebyshev Polynomials
- Summary
Overview
One of the key ideas of umbral calculus is the relation of four vector spaces: the space of polynomials, its dual space, i.e. linear functionals on the space of polynomials, linear operators, i.e. endomorphisms, on the space of polynomials, and the space of formal power series. It will turn out that the space of formal power series and the space of linear functionals on the space of polynomials are isomorphic not just as vector spaces but as algebras. Further, we can embed the space of formal power series as a commutative sub-algebra of the space of linear operators on the space of polynomials. The latter perspective views formal power series as differential operators on polynomials. This gives us three perspectives on formal power series and two ways they interact with polynomials.
We’ll also be interested in linear operators on the space of polynomials that aren’t induced by formal power series such as the transfer or umbral operators and umbral shift operators. A transfer operator is essentially a “well-behaved” change of basis from the monomial basis. Umbral shift operators generalize the “multiply by |x|” operator. These operators will have adjoints that are linear operators on the space of formal power series.
We’ll find that a linear operator on the space of formal power series is an adjoint of a linear operator on the space of polynomials if and only if it is continuous in a sense to be defined. Surjective derivations on the space of formal power series will be exactly those adjoint to umbral shifts. Finally, continuous algebra automorphisms on the algebra of formal power series will be exactly those adjoint to transfer operators.
Beyond these structural aspects, we’ll also derive many results for working with Sheffer sequences and polynomial sequences along the way.
Conventions
Fix a field |\mathbb K| of characteristic |0|. I’ll write |\mathscr F = \mathbb K[\![t]\!]| for the |\mathbb K|-algebra of univariate formal power series (with indeterminate |t|) and |P = \mathbb K[x]| for the |\mathbb K|-algebra of univariate polynomials (with indeterminate |x|). Further, |\mathrm{Hom}(X, Y)| will be the |\mathbb K|-vector space of |\mathbb K|-linear maps from a |\mathbb K|-vector space |X| to a |\mathbb K|-vector space |Y|. In particular, |X^* = \mathrm{Hom}(X,\mathbb K)| is the dual space of |X|.
The four main |\mathbb K|-vector spaces we’ll be focused on are |\mathscr F|, |P|, |P^*|, and |\mathrm{Hom}(P, P)|. The first two are additionally |\mathbb K|-algebras, and we’ll find that the third is as well and is, in fact, isomorphic to |\mathscr F| which is arguably a key enabling fact of umbral calculus. In particular, the |\mathbb K|-algebra structure induced on |P^*| via |\mathscr F| is called the umbral algebra.
Given the above, unsurprisingly, we’ll be talking a lot about formal power series and polynomials. To save a bit of space and typing for me, unless otherwise specified, if I say |f| is a formal power series or |f \in \mathscr F|, then that will also define the sequence |\pseq{f_n}{n}| such that |f(t) = \sum_{k=0}^\infty f_k t^k|. Generally, when I state something is a sequence it will be a function |\mathbb N \to X| for some |X| and the parameter will be written as a subscript. So the formal power series |f| also defines a sequence, also called |f|, from |\mathbb N \to \mathbb K|. (In this case, we could literally identify formal power series with these sequences.) Similarly, stating |p| is a polynomial or |p \in P| defines a sequence also called |p| such that |p(x) = \sum_{n=0}^{\deg p}p_n x^n| where |\deg p| is the degree of the polynomial, i.e. the largest value of |n| such that |p_n \neq 0|. These are also sequences |\mathbb N \to \mathbb K|, and we could identify polynomials with such sequences that are eventually always |0|. We also have the degree of a formal power series written |\deg f| which is the smallest |k| such that |f_k \neq 0|. Note that this is dual to the notion for polynomials. It’s clear that |\deg(fg) = \deg f + \deg g|. Occasionally it will be useful to have |\deg 0 = -\infty| for polynomials and |\deg 0 = \infty| for formal power series.
Of course, sometimes I will explicitly state something like |f(t) = \sum_{k=0}^\infty a_k t^k| in which case the sequence |f| is not defined. Usually, this will arise with a formal power series expression, e.g. |(f \circ g)(t)| so there shouldn’t be any ambiguity. As is typical, I’ll often say “the formal power series |f(t)|” as opposed to “the formal power series |f|”. Finally, as has been illustrated, I’ll endeavor to use |k| as the indexing variable for formal power series and |n| for polynomials, but that won’t always be possible.
The Kronecker delta is written |\delta_k^n| and defined by |\delta_n^n = 1| and |\delta_k^n = 0| for |k \neq n|. This should typically be thought of as a way of forcing |n| and |k| to be equal, i.e. |f(n)\delta_k^n = f(k)\delta_k^n| and |\delta_{f(k)}^n = \delta_k^{f^{-1}(n)}| or, equivalently, |\delta_k^n = \delta_{f(k)}^{f(n)}| for an invertible function |f|.
Formal Power Series
I’ll assume familiarity with the basic algebra of formal power series. This lecture gives a nice in-depth and more technical overview, though it goes far beyond what we’ll need. I’ll recall a few results and fix the terminology and notation that will be used in this article, which largely follows Roman, though there are many variations in the literature.
Theorem (id:wewp): |f \in \mathscr F| has a multiplicative inverse, written |f^{-1}|, if and only if |\deg f = 0|.
Proof:
|f(t)g(t) = \sum_{k=0}^\infty c_k t^k| where |c_k = \sum_{i+j=k} f_i g_j|. Clearly, |c_0 = f_0 g_0| which makes it clear a multiplicative inverse to |f| can only exist if |f_0 \neq 0|. It also makes it clear that for |g| to be |f^{-1}|, |g_0 = 1/f_0|. A simple calculation shows that |g_k = -f_0^{-1} \sum_{n=1}^k f_n g_{k-n}| for |k > 0| which gives a recurrence computing all the remaining coefficients of |f^{-1}|. |\square|
Thus, |f| being invertible will be synonymous with |f| having degree |0|.
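Here is a minimal Python sketch of that recurrence (my own illustration, not from Roman; coefficient lists run from degree |0| upward), computing the first several coefficients of |f^{-1}|:

```python
from fractions import Fraction

def inverse(f, N):
    # First N+1 coefficients of 1/f(t), via g_0 = 1/f_0 and, for k > 0,
    # g_k = -f_0^{-1} * sum_{n=1}^{k} f_n g_{k-n}.
    assert f[0] != 0, "a multiplicative inverse requires deg f = 0"
    g = [Fraction(1) / f[0]]
    for k in range(1, N + 1):
        g.append(-sum(f[n] * g[k - n] for n in range(1, k + 1) if n < len(f)) / f[0])
    return g

# f(t) = 1 - t, so f^{-1} is the geometric series 1 + t + t^2 + ...
print(inverse([Fraction(1), Fraction(-1)], 5))  # all coefficients are 1
```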
It’s worth noting that |f/g| can be defined even when |g| isn’t invertible, e.g. |t/t = 1|. If |\deg f \geq \deg g|, then we can cancel out common factors of |t| until the denominator is invertible.
Suppose |g : \mathbb N \to \mathscr F|, which we’ll write as |g_k(t) \in \mathscr F|, such that |\deg g_k \to \infty| as |k \to \infty|. Given any |a : \mathbb N \to \mathbb K|, the sum |\sum_{k=0}^\infty a_k g_k(t)| is a well-defined element of |\mathscr F|. In particular, if |\deg g > 0|, then |g_k(t) = g(t)^k| satisfies the conditions. If we have |\deg g_k = k|, which is the case when |g_k(t) = g(t)^k| for |g| with degree |1| for example, then |g| forms a pseudobasis1 for |\mathscr F| meaning for any |f \in \mathscr F|, there exists a unique sequence |a| such that |f(t) = \sum_{k=0}^\infty a_k g_k(t)|. A series |f| with |\deg f = 1| is called a delta series. Every delta series gives rise to a pseudobasis of |\mathscr F|.
If |f, g \in \mathscr F| and |\deg g > 0|, then we can thus form the composition |f(g(t)) = \sum_{k=0}^\infty f_k g(t)^k|. It’s clear that |\deg(f\circ g) = \deg f\deg g|.
Theorem (id:cjme): A series, |f|, has a compositional inverse, written |\bar f|, meaning |f(\bar f(t)) = t = \bar f(f(t))|, if and only if |\deg f = 1|.
Proof:
Suppose |g \in \mathscr F| such that |f(g(t)) = t|. By taking degrees, we immediately see that |f| (and |g|) need to be of degree exactly one to have any chance for |g| to be a compositional inverse to |f|. If |g(t)^k = \sum_{n=0}^\infty b_{k,n} t^n|, then clearly we need |f_1 b_{1,1} = 1|. |g(t)^{k+1} = g(t)g(t)^k| implies |b_{k+1,n} = \sum_{i+j=n} b_{1,i}b_{k,j}|. But note that |b_{k,n} = 0| for all |n < k| since |\deg g(t)^k = k| so this sum always has |k \leq j < n| and thus |1 \leq i \leq n-k|. Because the |n|-th coefficient of |f(g(t))| is |0| for |n > 1|, \[\sum_{k=1}^n f_k b_{k,n} = 0 = f_1 b_{1,n} + \sum_{k=2}^n f_k b_{k,n}\] Expanding |b_{k,n}| in the last sum shows that it only involves |b_{1,i}| for |i < n|. Since |\deg f = 1|, |f_1 \neq 0|, and we can thus solve for |b_{1,n}| iteratively.
Alternatively, simply note that if |f(t) = th(t)| and |g(t) = tk(t)| then |h| and |k| have degree |0| and |t = f(g(t)) = t k(t)h(g(t))|. So |k(t) = h(g(t))^{-1} = h(tk(t))^{-1}|. |h| is invertible and |tk(t)| clearly has degree greater than |0| so the composition is well-defined and invertible. Unfolding this expression for |k(t)| shows that the |i|th coefficient only depends on earlier coefficients. |\square|
A useful result linking these two together is if |\deg f = 1|, then \[ 1 = t’ = [\bar f(f(t))]' = \bar f’(f(t))f’(t) \] where |f’(t)| is differentiation of formal power series. In other words, |f’(t)^{-1} = \bar f’(f(t))|.
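The coefficient-by-coefficient solve in the proof is easy to mechanize. A small Python sketch (again my own illustration; it exploits the fact that the |t^n| coefficient of |f(g(t))| depends on |g_n| only through the linear term |f_1 g_n|):

```python
from fractions import Fraction

def compose(f, g, N):
    # First N+1 coefficients of f(g(t)), assuming g[0] == 0.
    out = [Fraction(0)] * (N + 1)
    power = [Fraction(1)] + [Fraction(0)] * N  # g(t)^0
    for fk in f[: N + 1]:
        for n in range(N + 1):
            out[n] += fk * power[n]
        power = [sum(power[i] * g[n - i] for i in range(n + 1)) for n in range(N + 1)]
    return out

def comp_inverse(f, N):
    # Coefficients of the compositional inverse of a delta series f, up to t^N,
    # solved degree by degree from f(fbar(t)) = t.
    assert f[0] == 0 and f[1] != 0, "a compositional inverse requires deg f = 1"
    fbar = [Fraction(0), Fraction(1) / f[1]]
    for n in range(2, N + 1):
        fbar.append(Fraction(0))
        err = compose(f, fbar, n)[n]  # t^n coefficient so far; f_1*fbar_n must cancel it
        fbar[n] = -err / f[1]
    return fbar

# f(t) = t/(1-t) = t + t^2 + ... has compositional inverse t/(1+t) = t - t^2 + t^3 - ...
print(comp_inverse([Fraction(0)] + [Fraction(1)] * 6, 4))  # 0, 1, -1, 1, -1
```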
Linear Functionals
For |L \in P^*| and |p \in P|, we’ll write |\pair{L}{p(x)}| for the action of |L| on |p|. Any such |L| is uniquely defined by its values on |x^n| for all |n\in\mathbb N|.
If |c : \mathbb N \to \mathbb K\setminus \{0\}|, we can define for each |f \in \mathscr F| a linear functional which we’ll also write as |f| or |f(t)| via \[\pair{f(t)}{x^n} = c_n f_n\] Really, we should write something like |\pair{f(t)}{p(x)}_c| to indicate the dependence on |c|. This play on notation is unambiguous since |f(t) = g(t)| if and only if |\pair{f(t)}{x^n} = \pair{g(t)}{x^n}| for all |n|, i.e. |f| and |g| are equal as power series if and only if the induced linear functionals are equal.
Notable choices for |c| are:
- |c_n = n!| is the most traditional case.
- |c_n = 1|
- |c_n = 1/{\lambda \choose n}| for |\lambda| not a nonnegative integer.
The definition of the linear functional induced by |f \in \mathscr F| implies that |\pair{t^k}{x^n} = c_n\delta_k^n|. This leads to \[\bigpair{\sum_{n=0}^\infty a_n t^n}{p(x)} = \sum_{n=0}^\infty a_n \pair{t^n}{p(x)} \] where the right-hand side is well-defined because only finitely many of the terms of the sum will be non-zero. (We can generalize to allow Laurent series with only finitely many negative powers on the left and Laurent series with only finitely many positive powers on the right.)
We can articulate L’Hôpital’s rule with this notation as: if |\deg f \geq \deg g > 0|, then \[\pair{f(t)/g(t)}{x^0} = \pair{f’(t)/g’(t)}{x^0} \]
We can explicitly write the formal power series, |f_L|, corresponding to the linear functional, |L|, as \[f_L(t) = \sum_{k=0}^\infty \frac{\pair{L}{x^k}}{c_k}t^k\] It is trivial to verify that the linear functional induced by |f_L| is |L|. This gives an isomorphism |\mathscr F \cong P^*| as |\mathbb K|-vector spaces. However, the algebra structure on |\mathscr F| then induces an algebra structure on |P^*|. We can compute \[\pair{f(t)g(t)}{x^n} = \sum_{i+j=n}\frac{c_n}{c_i c_j}\pair{f(t)}{x^i}\pair{g(t)}{x^j}\]
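To make the pairing concrete, a short Python sketch (mine, with |c_n = n!| and coefficient lists from degree |0| up) implementing |\pair{f(t)}{p(x)} = \sum_n c_n f_n p_n| and checking the product formula above at |x^3|:

```python
from fractions import Fraction
from math import factorial

c = lambda n: Fraction(factorial(n))

def pair(f, p):
    # <f(t) | p(x)> = sum_n c_n f_n p_n over the overlapping coefficients.
    return sum(c(n) * f[n] * p[n] for n in range(min(len(f), len(p))))

def mul(f, g):
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

f = [Fraction(v) for v in (1, 2, 3, 4)]
g = [Fraction(v) for v in (5, 0, 1, 2)]
n = 3
xi = lambda i: [Fraction(0)] * i + [Fraction(1)]  # the monomial x^i
lhs = pair(mul(f, g), xi(n))
rhs = sum(c(n) / (c(i) * c(n - i)) * pair(f, xi(i)) * pair(g, xi(n - i))
          for i in range(n + 1))
assert lhs == rhs
```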
We’ll call |L| a delta/invertible functional when it corresponds to a delta/invertible power series.
Some Properties
Proposition (id:yiyi): If |\deg f > \deg p| then |\pair{f(t)}{p(x)} = 0|.
Proof: Let |N = \deg p|.
|\pair{f(t)}{p(x)} = \sum_{n=0}^N p_n\pair{f(t)}{x^n} = \sum_{n=0}^N p_n f_n c_n| but
|f_n = 0| for all |n < \deg f|. |\square|
Proposition (id:pove): If |\deg p_n(x) = n| and |\pair{f(t)}{p_n(x)} = 0| for all |n \in \mathbb N|, then |f(t) = 0|.
Proof: If |f(t) = \sum_{k=K}^\infty f_k t^k|, then
|\pair{f(t)}{p_K(x)} = p_{K,K}\pair{f(t)}{x^K} = p_{K,K} f_K c_K = 0| so |f_K = 0| and
we can repeat with |K+1|. |\square|
Proposition (id:fpoz): |\pair{f(at)}{p(x)} = \pair{f(t)}{p(ax)}|
Proof: Follows immediately from
|\pair{a^n t^n}{x^n} = a^n c_n = \pair{t^n}{a^n x^n}|. |\square|
Proposition (id:nxqu): |p(x) = \sum_{n=0}^\infty \frac{\pair{t^n}{p(x)}}{c_n}x^n|
Proof: Just expand |p(x)| as |\sum_{n=0}^{\deg p} p_n x^n| on both sides and simplify.
|\square|
Proposition (id:rnhv): If |\deg f_k = k| and |\pair{f_k(t)}{p(x)} = 0| for all |k \in \mathbb N|, then |p(x) = 0|.
Proof: If |p(x) = \sum_{n=0}^k p_n x^n| and |f_k(t) = \sum_{j=k}^\infty a_j t^j|,
then |\pair{f_k(t)}{p(x)} = \pair{a_k t^k}{p_k x^k} = a_k p_k c_k = 0| so |p_k = 0|.
|\square|
Propositions (id:pove) and (id:rnhv) will often be invoked tacitly to show that two formal power series or polynomials are equal. For example, choose |p(x) = r(x) - s(x)| in proposition (id:rnhv). In particular, we will often just prove something like |\pair{t^k}{r(x)} = \pair{t^k}{s(x)}| with |k| implicitly being arbitrary to conclude |r(x) = s(x)|.
Evaluation Functional
We always have the evaluation functional |\varepsilon_y| for |y \in \mathbb K| defined by \[\pair{\varepsilon_y(t)}{p(x)} = p(y)\] Note that this definition doesn’t depend on the choice of |c|. We quickly compute |\pair{\varepsilon_y(t)}{x^n} = y^n| so \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{y^k}{c_k}t^k\]
When |c_n = n!|, then |\varepsilon_y(t) = e^{yt}|.
Formal Derivative
The formal derivative of |f \in \mathscr F|, written |\partial_t f(t)|, is defined as \[\partial_t t^k = \begin{cases} \frac{c_k}{c_{k-1}}t^{k-1}, & k > 0 \\ 0, & k = 0 \end{cases}\] which leads to the key property |\pair{\partial_t f(t)}{p(x)} = \pair{f(t)}{xp(x)}|.
As an example, we immediately compute that |\partial_t\varepsilon_y(t) = y\varepsilon_y(t)|.
We will also use the ordinary derivative of formal power series which we’ll notate with |f’(t)|. The formal derivative and the ordinary derivative coincide when |c_n = n!| as suggested by the previous example.
Linear Operators
We’ve identified formal power series with linear functionals on |P|. Next, we want to identify them with linear operators on |P|. We’re clearly not going to get an isomorphism in this case as multiplication (i.e. composition) of linear operators doesn’t commute in general, while multiplication of formal power series does. Nevertheless, we will derive simple characterizations of which linear operators are of this form.
One of the most important properties we will want is the following adjointness property: \[ \pair{f(t)g(t)}{p(x)} = \pair{f(t)}{g(t)p(x)} \] where |g(t) p(x)| is the action of the linear operator induced by |g(t)| on |p(x)|. We can derive what the induced linear operator must be to satisfy this property.
If |k \leq m| and |k \leq n|, then
\[\begin{align} \pair{t^{m-k}}{t^k x^n} & = \pair{t^{m-k} t^k}{x^n} \\ & = \pair{t^m}{x^n} \\ & = c_n \delta_m^n \\ & = c_n \delta_{m-k}^{n-k} \\ & = c_n \frac{\pair{t^{m-k}}{x^{n-k}}}{c_{n-k}} \\ & = \bigpair{t^{m-k}}{\frac{c_n}{c_{n-k}} x^{n-k}} \end{align}\] so |t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k}| for |k \leq n| and |0| otherwise. Thus, \[f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k} \] which can be extended to all polynomials by linearity.
This should look familiar. |tx^n = \frac{c_n}{c_{n-1}}x^{n-1}| which is exactly the same formula as the one for |\partial_t| except this operates on polynomials while |\partial_t| operates on formal power series. In particular, when |c_n = n!|, |t| behaves exactly like the derivative of polynomials with respect to |x|, and we see that the formal power series pick out a special class of differential operators on polynomials.
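As a sketch of this action (mine, not Roman’s; |c_n = n!|, coefficient lists from degree |0| up), the formula |f(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} f_k x^{n-k}| with |f(t) = t| indeed differentiates, and with |f = \varepsilon_y| it shifts, anticipating the Evaluation Operator section below:

```python
from fractions import Fraction
from math import factorial

c = lambda n: Fraction(factorial(n))

def apply_series(f, p):
    # (f(t) p)(x) via f(t) x^n = (c_n / c_{n-k}) f_k x^{n-k}, with c_n = n!.
    out = [Fraction(0)] * len(p)
    for n, pn in enumerate(p):
        for k, fk in enumerate(f[: n + 1]):
            out[n - k] += c(n) / c(n - k) * fk * pn
    return out

# t acts as d/dx: t x^3 = 3 x^2.
print(apply_series([Fraction(0), Fraction(1)], [0, 0, 0, Fraction(1)]))

# eps_2(t) = e^{2t} acts as the shift p(x) |-> p(x + 2):
# applied to x^3 - 8 it gives (x+2)^3 - 8 = x^3 + 6x^2 + 12x.
eps = [Fraction(2) ** k / c(k) for k in range(5)]
print(apply_series(eps, [Fraction(-8), 0, 0, Fraction(1)]))
```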
A simple calculation shows that |(t^j t^k) x^n = t^j (t^k x^n)| which lifts to a general associativity law: |(f(t) g(t)) p(x) = f(t) (g(t) p(x))|. The adjointness property also immediately implies that the induced linear operators commute.
As before, we will say delta/invertible operator when the linear operator is induced by a delta/invertible formal power series.
Define |Dx^n = nx^{n-1}|, |D^{-1}x^n = \frac{1}{n+1}x^{n+1}|, and |x^{-1} x^n = \begin{cases}x^{n-1}, & n > 0 \\ 0, & n = 0\end{cases}|. Then for various choices of |c|, |t| behaves as the following linear operators:
- |c_n = n!| implies |t = D|
- |c_n = 1| implies |t = x^{-1}|
- |c_n = (n!)^{m+1}| implies |t = (Dx)^m D|
- |c_n = 1/(-\lambda)_{(n)}| implies |t = -(\lambda + xD)^{-1} x^{-1}|
- |c_n = 1/n!| implies |t = x^{-1} D^{-1} x^{-1}|
- |c_n = 1/{-\lambda \choose n}| implies |t = -(\lambda + xD)^{-1} D|
- |c_n = 2^{2n}(1+\alpha)^{(n)}/(1 + \alpha + \beta)^{(2n)}| implies |t = 4(1 + \alpha + \beta + 2xD)^{-1} (2 + \alpha + \beta + 2xD)^{-1} x^{-1} (\alpha + xD)|
- |c_n = (1-q)^{-n} \prod_{k=1}^n (1-q^k)| implies |tp(x) = (p(qx) - p(x))/(qx - x)|
Here |x_{(n)}| and |x^{(n)}| are the falling and rising factorials respectively.
A linear operator |T| on |\mathscr F| is continuous if given a sequence of formal power series |\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have |\deg T(f_k) \to \infty| as |k \to \infty|.
Theorem (id:fqys): If |T| is a continuous linear operator on |\mathscr F|, then \[ T\left(\sum_{k=0}^\infty a_k f_k(t)\right) = \sum_{k=0}^\infty a_k T(f_k(t)) \] for all sequences |\pseq{a_k}{k}| in |\mathbb K| and |\pseq{f_k}{k}| in |\mathscr F| for which |\deg f_k \to \infty| as |k \to \infty|. In particular, a continuous linear operator is completely determined by its action on the elements of a pseudobasis.
Proof:
By the assumptions, both |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n}| involve only finitely many terms of the sum. That is, for every |n| there is some |N| such that for all |m > N|, |\pair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} = \pair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n}| and |\pair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} = \pair{\sum_{k=0}^m a_k T(f_k(t))}{x^n}|. But \[\begin{align} \bigpair{T\left(\sum_{k=0}^\infty a_k f_k(t)\right)}{x^n} & = \bigpair{T\left(\sum_{k=0}^m a_k f_k(t)\right)}{x^n} \\ & = \bigpair{\sum_{k=0}^m a_k T(f_k(t))}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty a_k T(f_k(t))}{x^n} \end{align}\] so these two linear functionals agree on the monomial basis and thus are the same, which implies the formal power series are the same as well. |\square|
This can be cast as an instance of topological continuity, but I won’t describe that here.
Evaluation Operator
Unlike the linear functional case, the linear operator induced by the formal power series corresponding to the evaluation functional does depend on the choice of |c|. In general, we have: \[\varepsilon_y(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k} c_k} y^k x^{n-k} \]
For |c_n = n!|, |\varepsilon_y(t) p(x) = p(x + y)|. For |c_n = 1|, |\varepsilon_y(t) x^n = \frac{x^{n+1} - y^{n+1}}{x-y}|.
Characterizing Linear Operators Induced from Formal Power Series
Theorem (id:lgtb): Let |U| be a linear operator on |P|. There is an |f \in \mathscr F| such that |Up(x) = f(t) p(x)| for all |p \in P| if and only if |U| commutes with the operator |t|, i.e. |Utp(x) = tUp(x)| for all |p \in P|.
Proof:
The only if direction is obvious. For the other direction, first note that |\deg Ux^n \leq n| because |t^k Ux^n = Ut^k x^n = 0| if |k > n| so |\pair{t^k}{Ux^n} = 0| for all |k > n|. Now define \[f(t) = \sum_{k=0}^\infty \frac{\pair{t^0}{Ux^k}}{c_k}t^k \] Then, \[\begin{align} f(t) x^n & = \sum_{k=0}^n \frac{\pair{t^0}{Ux^k}}{c_k}t^k x^n \\ & = \sum_{k=0}^n \frac{c_n}{c_k c_{n-k}} \pair{t^0}{Ux^k} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{Ut^{n-k} x^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^0}{t^{n-k} Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^{n-k}}{Ux^n}}{c_{n-k}} x^{n-k} \\ & = \sum_{k=0}^n \frac{\pair{t^k}{Ux^n}}{c_k} x^k \\ & = Ux^n \end{align}\] The last equality relies on the degree of |Ux^n| being less than or equal to |n|. |\square|
Corollary (id:viti): A linear operator on |P| has the form of |f(t)| for an |f \in \mathscr F| if and only if it commutes with any specific delta operator.
Proof:
The sequence of powers of the formal power series associated with the specific delta operator form a pseudobasis which means we can write |t| as an infinite linear combination of them. Thus the linear operator commutes with |t| and we can apply the theorem. |\square|
Corollary (id:qllw): A linear operator on |P| has the form of |f(t)| for an |f \in \mathscr F| if and only if it commutes with |\varepsilon_y(t)| for all |y \in \mathbb K|.
Proof: For |y \neq 0|, |\varepsilon_y(t) - c_0^{-1} t^0| is a delta operator. |\square|
Polynomial Sequences
When we say |\pseq{p_n(x)}{n}| is a (polynomial) sequence, that will always mean that |\deg p_n(x) = n|.
Theorem (id:cgsr): Let |f| be a delta series and |g| be an invertible series, then there is a unique polynomial sequence |\pseq{s_n(x)}{n}| such that \[\pair{g(t)f(t)^k}{s_n(x)} = c_n\delta_k^n \] holds for all |n,k \in \mathbb N|.
Proof:
Uniqueness follows easily by considering |\pair{g(t)f(t)^k}{s_n(x) - r_n(x)} = 0| where |\pseq{r_n(x)}{n}| is another sequence satisfying the same property.
For existence, we can just brute force it. If |g(t)f(t)^k = \sum_{i=k}^\infty b_{k,i} t^i| and we set |s_n(x) = \sum_{j=0}^n a_{n,j} x^j|, then we want to solve for the |a_{n,i}| in the following triangular system of linear equations: \[\begin{align} c_n\delta_k^n & = \pair{g(t)f(t)^k}{s_n(x)} \\ & = \bigpair{\sum_{i=k}^\infty b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \bigpair{\sum_{i=k}^n b_{k,i} t^i}{\sum_{j=0}^n a_{n,j} x^j} \\ & = \sum_{i=k}^n \sum_{j=0}^n b_{k,i} a_{n,j} \pair{t^i}{x^j} \\ & = \sum_{i=k}^n c_i b_{k,i} a_{n,i} \end{align}\] |b_{k,k} \neq 0| since |\deg g(t)f(t)^k = k| and |c_n \neq 0| by assumption. Therefore, the diagonal entries of the triangular matrix corresponding to this system of linear equations are non-zero, and thus the matrix is invertible. |\square|
We’ll say |\pseq{s_n(x)}{n}| is the Sheffer sequence or is Sheffer for the pair |(g, f)|. When |g(t) = 1|, then we say that the corresponding Sheffer sequence is the associated sequence to |f|. When |f(t) = t|, then we say that the corresponding Sheffer sequence is the Appell sequence for |g|. Often I’ll use |\pseq{p_n(x)}{n}| for associated sequences, i.e. when |g(t) = 1|, and reserve |\pseq{s_n(x)}{n}| for the general case. The idea is that if |g(t)f(t)^k| takes the place of |t^k|, then |s_n(x)| takes the place of |x^n|. This is illustrated by the defining property and the following theorems.
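The brute-force construction translates directly into code. A Python sketch (my own; |c_n = n!|, coefficient lists from degree |0| up) that back-substitutes the triangular system and, as a spot check, recovers the falling factorials as the associated sequence of |f(t) = e^t - 1|:

```python
from fractions import Fraction
from math import factorial

c = lambda n: Fraction(factorial(n))

def mul(a, b, N):
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a[: N + 1]):
        for j, bj in enumerate(b[: N + 1 - i]):
            out[i + j] += ai * bj
    return out

def sheffer(g, f, N):
    # s_0, ..., s_N for the pair (g, f) by back-substituting the triangular
    # system sum_{i=k}^n c_i b_{k,i} a_{n,i} = c_n delta_k^n, where
    # g(t) f(t)^k = sum_i b_{k,i} t^i.
    b, gfk = [], g[: N + 1] + [Fraction(0)] * max(0, N + 1 - len(g))
    for k in range(N + 1):
        b.append(gfk)
        gfk = mul(gfk, f, N)
    s = []
    for n in range(N + 1):
        a = [Fraction(0)] * (n + 1)
        for k in range(n, -1, -1):
            rhs = (c(n) if k == n else Fraction(0))
            rhs -= sum(c(i) * b[k][i] * a[i] for i in range(k + 1, n + 1))
            a[k] = rhs / (c(k) * b[k][k])
        s.append(a)
    return s

# Associated sequence of f(t) = e^t - 1 (take g = 1): the falling factorials.
f = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, 8)]
for p in sheffer([Fraction(1)], f, 4):
    print(p)  # e.g. s_3 = x(x-1)(x-2) = 2x - 3x^2 + x^3
```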
Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]
Proof:
|\pseq{s_n(x)}{n}|, like any polynomial sequence, is a basis for |P|, so two linear functionals are equal if they agree on all |s_n(x)|. Clearly, \[\begin{align} \bigpair{\sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} & = \bigpair{\sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k}{s_n(x)} \\ & = \sum_{k=0}^n \frac{\pair{h(t)}{s_k(x)}}{c_k}\pair{g(t)f(t)^k}{s_n(x)} \\ & = \pair{h(t)}{s_n(x)} \end{align}\] |\square|
Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then for any |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]
Proof:
Choose |h = \varepsilon_y| in the Expansion Theorem, then apply it as a linear functional to |p|. (Note this proof relies on |\mathbb K| having characteristic |0| so that |p(y) = q(y)| as functions |\mathbb K \to \mathbb K| implies |p = q| as polynomials.) |\square|
Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.
Proof:
For the forward implication, using the Expansion Theorem we have: \[ \varepsilon_y(t) = \sum_{k=0}^\infty \frac{\pair{\varepsilon_y}{s_k(x)}}{c_k} g(t) f(t)^k = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} g(t) f(t)^k \] Substituting |\bar f(t)| for |t| and dividing both sides by |g(\bar f(t))| gives the result.
For the reverse implication, if |\pseq{r_n(x)}{n}| is the Sheffer sequence for |(g, f)| then we immediately get from the forward implication \[ \sum_{k=0}^\infty \frac{r_k(y)}{c_k} t^k = \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] and applying both sides to |x^n| then gives |s_n(y) = r_n(y)| for all |y \in \mathbb K|. (Again, this proof relies on the characteristic of |\mathbb K| being |0|.) |\square|
Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]
Proof:
Applying |\varepsilon_y| to both sides gives \[\begin{align} s_n(y) & = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} y^k \\ & = \bigpair{\sum_{k=0}^n \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \bigpair{\sum_{k=0}^\infty \frac{y^k}{c_k}g(\bar f(t))^{-1}\bar f(t)^k}{x^n} \\ & = \pair{g(\bar f(t))^{-1}\varepsilon_y(\bar f(t))}{x^n} \\ \end{align}\] but \[ s_n(y) = \bigpair{\sum_{k=0}^\infty \frac{s_k(y)}{c_k}t^k}{x^n} \] so we can apply the Generating Function theorem. |\square|
Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.
Proof:
\[\begin{align} \pair{t^k}{g(t/\alpha) s_n(\alpha x)} & = \pair{t^k g(t/\alpha)}{s_n(\alpha x)} \\ & = \pair{(\alpha t)^k g(\alpha t/\alpha)}{s_n(x)} \tag{proposition (id:fpoz)} \\ & = \alpha^k \pair{g(t) t^k}{s_n(x)} \\ & = \alpha^k c_n \delta_k^n \\ & = \alpha^n c_n \delta_k^n \\ & = \alpha^n \pair{g(t)t^k}{s_n(x)} \\ & = \pair{t^k}{\alpha^n g(t) s_n(x)} \end{align}\] so |g(t/\alpha) s_n(\alpha x) = \alpha^n g(t) s_n(x)|. |\square|
Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.
Proof: Just apply adjointness to the definition. |\square|
Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.
Proof:
|f(t)^0 = t^0| implies the first condition. For the second condition, if |\pseq{p_n(x)}{n}| is associated to |f|, then for |k > 0| \[\begin{align} \bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} & = \frac{c_n}{c_{n-1}}\pair{f(t)^{k-1}}{p_{n-1}(x)} \\ & = c_n \delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \\ & = \pair{f(t)^{k-1}}{f(t)p_n(x)} \end{align}\] and |\pseq{f(t)^k}{k}| is a pseudobasis.
Conversely, assuming (1) and (2) hold, then \[\begin{align} \pair{f(t)^k}{p_n(x)} & = \pair{t^0}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{t^0}{p_{n-k}(x)} \tag{(2) k times} \\ & = \frac{c_n}{c_{n-k}} c_0 \delta_{n-k}^0 \tag{(1)} \\ & = c_n \delta_k^n \end{align}\] |\square|
Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.
Proof:
For the forward implication, simply apply the previous theorem to |\pseq{g(t)s_n(x)}{n}| which is associated to |f|, then apply |g(t)^{-1}| to the resulting recurrence equation.
For the reverse implication, let |\pseq{p_n(x)}{n}| be the associated sequence for |f| and |U| be the linear operator defined by sending |s_n(x)| to |p_n(x)|. Then we have \[ Uf(t)s_n(x) = \frac{c_n}{c_{n-1}}Us_{n-1}(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x) = f(t)p_n(x) = f(t)Us_n(x) \] Since |\pseq{s_n(x)}{n}| is a basis, we see that |U| commutes with a delta operator and thus must be of the form |g(t)| for some |g| which is invertible because |U| preserves degree. Thus |\pseq{g(t)s_n(x)}{n}| is associated to |f| which is equivalent to |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)|. |\square|
Iterating |k| times gives |f(t)^k s_n(x) = \frac{c_n}{c_{n-k}}s_{n-k}(x)|, and this implies \[ h(f(t)) s_n(x) = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k s_{n-k}(x) \]
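For instance (my own check, with |c_n = n!|), the falling factorials are associated to |f(t) = e^t - 1|, so by theorem (id:hvdt) |f(t)| should act on them as |f(t)x_{(n)} = nx_{(n-1)}|. A sketch reusing the operator action from earlier:

```python
from fractions import Fraction
from math import factorial

c = lambda n: Fraction(factorial(n))

def apply_series(f, p):
    # f(t) x^n = (c_n / c_{n-k}) f_k x^{n-k}, extended linearly; c_n = n!.
    out = [Fraction(0)] * len(p)
    for n, pn in enumerate(p):
        for k, fk in enumerate(f[: n + 1]):
            out[n - k] += c(n) / c(n - k) * fk * pn
    return out

f = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, 6)]  # e^t - 1
x3_falling = [Fraction(0), Fraction(2), Fraction(-3), Fraction(1)]    # x(x-1)(x-2)
print(apply_series(f, x3_falling))  # 3 * x(x-1) = -3x + 3x^2: [0, -3, 3, 0]
```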
Corollary (id:tvdx): \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]
Proof:
Start by expanding |ts_n(x)| via the Polynomial Expansion Theorem \[\begin{align} ts_n(x) & = \sum_{k=0}^\infty \frac{\pair{g(t)f(t)^k}{ts_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{\pair{t}{g(t)f(t)^k s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n-1} \frac{c_n}{c_{n-k} c_k} \pair{t}{g(t) s_{n-k}(x)} s_k(x) \tag{theorem (id:hvdt)} \\ & = \sum_{k=0}^{n-1} \frac{c_n}{c_{n-k} c_k} \pair{t}{p_{n-k}(x)} s_k(x) \tag{theorem (id:qqes)} \end{align}\] |\square|
Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.
Proof:
First, we’ll establish that \[ \varepsilon_y(t) p_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x) \]
Applying |f(t)^k| to both sides of the equation leads to \[\begin{align} \pair{f(t)^k}{\varepsilon_y(t) p_n(x)} & = \pair{\varepsilon_y(t)}{f(t)^k p_n(x)} \\ & = \frac{c_n}{c_{n-k}} \pair{\varepsilon_y(t)}{p_{n-k}(x)} \\ & = \frac{c_n}{c_{n-k}} p_{n-k}(y) \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_{j-k}} p_i(y) \pair{t^0}{p_{j-k}(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{t^0}{f(t)^k p_j(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) \pair{f(t)^k}{p_j(x)} \\ & = \bigpair{f(t)^k}{\sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) p_j(x)} \end{align}\] which is most easily read from outwards in.
Doing the same trick as the previous proof, we let |U| be a linear operator defined by sending |s_n(x)| to |p_n(x)|. If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some |g|, we can choose |U = g(t)| and applying |U| to both sides of the above equation gives the forward direction since |g(t)| commutes with |\varepsilon_y(t)|. Conversely, if we assume the equation then we’ll see that |U| must commute with |\varepsilon_y(t)| which implies it’s of the form |g(t)|. |\square|
As an example, we see that the Bernoulli polynomials |B_n(x)| are Sheffer for |(g, t)| for some invertible |g| with |c_n = n!| for which the associated polynomials are |\pseq{x^n}{n}|. The Sheffer Identity is thus the one mentioned in the Introduction.
Theorem (id:jtbs): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{p_n(x)}{n}| be associated to |f|. For all |h, l \in \mathscr F|, \[ \pair{h(t)l(t)}{s_n(x)} = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{h(t)}{p_i(x)} \pair{l(t)}{s_j(x)} \]
Proof:
Use the Expansion Theorem on |h| with respect to |\pseq{p_n(x)}{n}| and |l| with respect to |\pseq{s_n(x)}{n}|. The |g(t)f(t)^k| component of |h(t)l(t)| will then be \[ \sum_{i+j=k} \left(\frac{\pair{h(t)}{p_i(x)}}{c_i} f(t)^i\right) \left(\frac{\pair{l(t)}{s_j(x)}}{c_j} g(t)f(t)^j\right) = \left[\sum_{i+j=k} \frac{\pair{h(t)}{p_i(x)}\pair{l(t)}{s_j(x)}}{c_i c_j}\right] g(t)f(t)^k \] Applying this to |s_n(x)| gives the result. |\square|
See the paper for an interesting alternative proof of this result.
Recurrence Formulas
Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]
We can readily compute that: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]
Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.
Proof:
Let |\pseq{f_k(t)}{k}| be such that |\deg f_k \to \infty| as |k \to \infty|. Given |n|, let |K_n| be an index such that |\deg f_k > \max_{i \leq n} \deg \mu x^i| for all |k \geq K_n|. Then |\pair{f_k(t)}{\mu x^m} = 0| for all |m \leq n| and |k \geq K_n|, so, using the formula above, |\deg \mu^* f_k > n| for all |k \geq K_n|. |\square|
In fact, a linear operator on |\mathscr F| is an adjoint to one on |P| if and only if it is continuous.
Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.
Proof:
Define |\mu x^n = \sum_{k=0}^\infty \frac{\pair{Tt^k}{x^n}}{c_k} x^k| which is well-defined because |T| being continuous means only finitely many |\pair{Tt^k}{x^n}| are non-zero. By construction, we see that |\pair{t^k}{\mu x^n} = \pair{Tt^k}{x^n}|. |\square|
If |\pseq{p_n(x)}{n}| is associated to |f|, then the umbral shift |\theta_f| associated to |f| is the linear operator on |P| defined by \[ \theta_f p_n(x) = \frac{(n+1)c_n}{c_{n+1}} p_{n+1}(x) \] for all |n \in \mathbb N|. In the case where |p_n(x) = x^n| and |c_n = n!| the umbral shift is just multiplication by |x|. Since, famously, multiplication by |x| does not commute with differentiation, |Dx - xD = 1| as operators, the umbral shift isn’t induced as a linear operator by a formal power series. We’ll see that this is generally the case below.
A derivation on an algebra |A| is a linear operator |\partial| on |A| satisfying \[ \partial(ab) = (\partial a)b + a\partial b \] for all |a, b \in A|.
Lemma (id:lxnr): A continuous linear operator |\partial| on |\mathscr F| is a continuous derivation if and only if |\partial 1 = 0| and, for any delta series |f|, there is some |g| such that |\partial f(t)^k = kf(t)^{k-1}g(t)| for all |k \in \mathbb N|.
Proof:
A continuous derivation satisfies |\partial h(t) = h’(t)\partial t| from which the result follows with |g(t) = f’(t)\partial t|. A continuous linear operator satisfying these laws satisfies |\partial h(t) = h’(t)f’(t)^{-1}g(t)| which we can see by expanding |h(t)| in terms of the pseudobasis |\pseq{f(t)^k}{k}| and using continuity to push the |\partial| into the sum. Given this \[\begin{align} \partial (h(t)k(t)) & = (h(t)k(t))'f’(t)^{-1}g(t) \\ & = k(t)h’(t)f’(t)^{-1}g(t) + h(t)k’(t)f’(t)^{-1}g(t) \\ & = (\partial h(t))k(t) + h(t)\partial k(t) \end{align}\] |\square|
Theorem (id:oqyq): An operator |\theta| on |P| is the umbral shift for the delta series |f| if and only if its adjoint |\theta^*| is a derivation on |\mathscr F| and \[ \theta^* f(t)^k = kf(t)^{k-1} \] for all |k \in \mathbb N|.
Proof:
If |\theta| is the umbral shift for |f| associated to |\pseq{p_n(x)}{n}|, then \[\begin{align} \pair{\theta^* f(t)^k}{p_n(x)} & = \pair{f(t)^k}{\theta p_n(x)} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = kc_n \delta_k^{n+1} \\ & = kc_n \delta_{k-1}^n \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \end{align}\] We of course have |\pair{\theta^* t^0}{p_n(x)} = \pair{t^0}{\theta p_n(x)} = \frac{(n+1)c_n}{c_{n+1}} \pair{f(t)^0}{p_{n+1}(x)} = \frac{(n+1)c_n c_0}{c_{n+1}} \delta_0^{n+1} = 0| since |n+1| is never |0|. Since |\theta^*| is an adjoint, it’s continuous, and we can apply the previous lemma to conclude that it is a derivation.
If |\theta^*| is a derivation satisfying |\theta^* f(t)^k = kf(t)^{k-1}| then we can rearrange the equations of the first result to get: \[\begin{align} \pair{f(t)^k}{\theta p_n(x)} & = \pair{\theta^* f(t)^k}{p_n(x)} \\ & = \pair{kf(t)^{k-1}}{p_n(x)} \\ & = kc_n \delta_{k-1}^n \\ & = kc_n \delta_k^{n+1} \\ & = (n+1)c_n \delta_k^{n+1} \\ & = \frac{(n+1)c_n}{c_{n+1}}\pair{f(t)^k}{p_{n+1}(x)} \end{align}\] |\square|
In the particular case where |f(t) = t|, the above states that |\theta_t^* t^k = kt^{k-1}|, i.e. |\theta_t^* f(t) = f’(t)|. (Not to be confused with |\partial_t| which is only the true derivative when |c_n = n!|.) We can easily compute that |\theta_t t = xD|. Notably, this does not depend on the choice of |c|.
Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]
Proof:
Start with the Polynomial Expansion Theorem. \[\begin{align} \theta_t s_n(x) & = \sum_{k=0}^{n+1} \frac{\pair{g(t)f(t)^k}{\theta_t s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n+1} \frac{\pair{\theta_t^*(g(t)f(t)^k)}{s_n(x)}}{c_k} s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{\pair{g’(t)f(t)^k + kg(t)f(t)^{k-1}f’(t)}{s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{\pair{g’(t)}{f(t)^k s_n(x)} + \pair{kg(t)f’(t)}{f(t)^{k-1} s_n(x)}}{c_k}\right] s_k(x) \\ & = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}}\pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}}\pair{g(t)f’(t)}{s_{n-k+1}(x)}\right] s_k(x) \end{align}\] |\square|
Lemma (id:golc): A surjective derivation on |\mathscr F| is continuous.
Proof:
For any derivation |\partial|, we have |\partial 1 = \partial 1 + \partial 1| so |\partial 1 = 0|. We also have |\partial t^k = kt^{k-1}\partial t|. Since |\deg \partial t^0 = \infty| and |\deg \partial t^k = \deg \partial t + k - 1|, the only way to get something of degree |0| and have |\partial| be surjective is if |\deg \partial t = 0|. In general, if |\deg f = k| then |f(t) = t^k g(t)| where |\deg g = 0|. Therefore, |\deg \partial f(t) = \deg (t^k \partial g(t) + kt^{k-1}g(t)\partial t)| and the right-hand side has degree |k-1| implying |\partial| is continuous. |\square|
Theorem (id:gxkw): A surjective derivation on |\mathscr F| is adjoint to an umbral shift and vice versa.
Proof:
For the reverse direction, since |f| is a delta series in theorem (id:oqyq) and |\theta^* f(t)^k = kf(t)^{k-1}|, |\pseq{\theta^* f(t)^k}{k}| is a pseudobasis and so |\theta^*| is surjective.
For the forward direction, if |\partial| is a surjective derivation, then it is an adjoint of a linear operator on |P| because it is continuous. We want to find the delta series |f| for which the adjoint is the umbral shift. Generally, we have |\partial f(t)^k = kf(t)^{k-1} f’(t)\partial t|, and we want |\partial f(t)^k = k f(t)^{k-1}| for theorem (id:oqyq). Thus we want |f’(t)\partial t = 1| or |f’(t) = (\partial t)^{-1}| where |\partial t| is invertible because |\partial| is surjective. We can solve this differential equation with |f(0) = 0|, i.e. |\deg f = 1|, to determine the delta series |f|. |\square|
Lemma (id:woxc): If |f| and |g| are delta series, then \[\theta_f^* = (\theta_f^* g(t))\theta_g^* \]
Proof:
For any derivation |\partial|, we have |\partial a^k = (\partial a)ka^{k-1}| and |\theta_f^*| is a derivation. Therefore \[ \theta_f^* g(t)^k = (\theta_f^* g(t))kg(t)^{k-1} = (\theta_f^* g(t)) \theta_g^* g(t)^k \] and |g(t)^k| for |k\in \mathbb N| is a pseudobasis so this suffices to show the operators are equal. |\square|
Theorem (id:dxii): If |\theta_f| and |\theta_g| are umbral shifts, then \[ \theta_f = \theta_g \circ (\theta_g^* f(t))^{-1} \]
Proof:
\[\begin{align} \pair{t^k}{\theta_f p(x)} & = \pair{\theta_f^* t^k}{p(x)} \\ & = \pair{(\theta_g^* f(t))^{-1}\theta_g^* t^k}{p(x)} \\ & = \pair{\theta_g^* t^k}{(\theta_g^* f(t))^{-1} p(x)} \\ & = \pair{t^k}{\theta_g (\theta_g^* f(t))^{-1} p(x)} \end{align}\] |\square|
Theorem (id:ouma): If |\pseq{p_n(x)}{n}| is associated to |f|, then \[ p_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_t(f’(t))^{-1} p_n(x) \]
Proof: This is just the previous theorem applied to |p_n(x)| with |g(t) = t|. |\square|
Lemma (id:xnvj): Let |\theta_f| be the umbral shift for |f|. Then \[ \theta_f^*(h(t)) = h(t) \theta_f - \theta_f h(t) \] for all |h \in \mathscr F|. The left-hand side is the linear operator on |P| induced by the formal power series that is the output of |\theta_f^*|.
Proof:
\[\begin{align} \pair{t^k}{\theta_f^*(h(t))x^n + \theta_f h(t) x^n} & = \pair{t^k}{\theta_f^*(h(t))x^n} + \pair{t^k}{\theta_f h(t) x^n} \\ & = \pair{\theta_f^*(h(t)) t^k}{x^n} + \pair{h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t)) t^k + h(t)\theta_f^*(t^k)}{x^n} \\ & = \pair{\theta_f^*(h(t) t^k)}{x^n} \\ & = \pair{t^k}{h(t) \theta_f x^n} \end{align}\] |\square|
This lemma shows that no umbral shift has the form |g(t)| for a formal power series |g|.
Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]
Proof:
We have that |\pseq{g(t)s_n(x)}{n}| is associated to |f| leading to \[ g(t)s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n} \theta_f g(t)s_n(x) \] The previous lemma leads to \[\begin{align} s_{n+1}(x) & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}\theta_f g(t)s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} g(t)^{-1}(g(t)\theta_f - \theta_f^*(g(t)))s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f - g(t)^{-1}\theta_f^*(g(t)))s_n(x) \\ & = \frac{c_{n+1}}{(n+1)c_n} (\theta_f + g(t)\theta_f^*(g(t)^{-1}))s_n(x) \end{align}\] where the final equality comes from \[ 0 = \theta_f^*(1) = \theta_f^*(g(t)g(t)^{-1}) = g(t)^{-1}\theta_f^*(g(t)) + g(t)\theta_f^*(g(t)^{-1}) \] |\square|
Theorem (id:kusb): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}\left(\theta_t - \frac{g’(t)}{g(t)}\right) \frac{1}{f’(t)} s_n(x) \]
Proof:
Starting from the previous theorem \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]
Theorem (id:dxii) gives |\theta_f = \theta_t f’(t)^{-1}|. Since |\theta_f^* h(t) = h’(t)\theta_f^* t| for any |h|, we first note that using theorem (id:oqyq), |1 = \theta_f^* f(t) = f’(t) \theta_f^* t| or |\theta_f^* t = f’(t)^{-1}|. Then \[ g(t)\theta_f^*(g(t)^{-1}) = -\frac{g(t) g’(t)}{g(t)^2} \theta_f^* t = -\frac{g’(t)}{g(t)} \theta_f^* t = -\frac{g’(t)}{g(t)} f’(t)^{-1} \] |\square|
Theorem (id:vxhh): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. If \[ T = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)} = \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \] then \[ Ts_n(x) = ns_n(x) \] In other words, |s_n(x)| is an eigenfunction for |T| with eigenvalue |n|.
Proof:
The equality for |T| just involves inserting a |t/t| in the middle. \[ Ts_n(x) = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}f(t)s_n(x) = \frac{c_n}{c_{n-1}}\left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{1}{f’(t)}s_{n-1}(x) = n s_n(x) \] where theorem (id:kusb) and theorem (id:hvdt) have been used. |\square|
Transfer Formulas
Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.
Proof:
We verify that the right-hand side meets the conditions of theorem (id:cutg). Condition (2) is easily verified: \[\begin{align} f(t) p_n(x) & = f(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = f’(t)\left(\frac{t}{f(t)}\right)^n t x^n \\ & = \frac{c_n}{c_{n-1}} f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1} \\ & = \frac{c_n}{c_{n-1}} p_{n-1}(x) \end{align}\]
For condition (1), we start with a small trick by writing |f’(t) = [t(f(t)/t)]’|. \[\begin{align} \bigpair{t^0}{f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n} & = \bigpair{\left(t\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n + t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \tag{product rule} \\ & = \bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \end{align}\]
We proceed from there by cases. In the |n = 0| case, we have \[ \bigpair{t^0}{x^0} + \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)}{x^0} = \pair{t^0}{x^0} = c_0 \]
For the |n > 0| case, to simplify the expressions we’ll show that \[ \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} = -\bigpair{\left(\frac{t}{f(t)}\right)^n}{x^n} \] We proceed as follows \[\begin{align} \bigpair{t\left(\frac{f(t)}{t}\right)'\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} & = \bigpair{\left(\frac{f(t)}{t}\right)'\left(\frac{f(t)}{t}\right)^{-n-1}}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left[\left(\frac{f(t)}{t}\right)^{-n}\right]'}{tx^n} \\ & = -\frac{1}{n}\bigpair{\theta_t^*\left[\left(\frac{f(t)}{t}\right)^{-n}\right]}{tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{\theta_t tx^n} \\ & = -\frac{1}{n}\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{xDx^n} \\ & = -\bigpair{\left(\frac{f(t)}{t}\right)^{-n}}{x^n} \end{align}\] |\square|
Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.
Proof:
\[\begin{align} \bigpair{f(t)^k}{\frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1}} & = \frac{c_n}{nc_{n-1}}\bigpair{\theta_t^*[f(t)^k]}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}f’(t)}{\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{kc_n}{nc_{n-1}}\bigpair{f(t)^{k-1}}{f’(t)\left(\frac{t}{f(t)}\right)^n x^{n-1}} \\ & = \frac{k}{n}\bigpair{f(t)^{k-1}}{\frac{c_n}{c_{n-1}}p_{n-1}(x)} \tag{Transfer Formula} \\ & = \frac{k}{n}\pair{f(t)^{k-1}}{f(t)p_n(x)} \tag{theorem (id:cutg)} \\ & = \frac{k}{n}\pair{f(t)^k}{p_n(x)} \\ & = \frac{k}{n}c_n\delta_k^n \\ & = c_n\delta_k^n \\ & = \pair{f(t)^k}{p_n(x)} \end{align}\] |\square|
Corollary (id:hcem): Let |\pseq{p_n(x)}{n}| be associated to |f| and |\pseq{q_n(x)}{n}| be associated to |gf| with |g| invertible. Then \[ q_n(x) = \theta_t g(t)^{-n} \theta_t^{-1} p_n(x) \] where |\theta_t^{-1} x^{n+1} = (c_{n+1}/((n+1)c_n))x^n| and |\theta_t^{-1} 1 = 0|.
Proof: Note that |\theta_t^{-1}\theta_t = 1| and write |q_n(x)| using the alternate form of the Transfer Formula. |\square|
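To see the Transfer Formula produce something recognizable, here’s a Python sketch (mine; |c_n = n!|, fixed truncation |N|, coefficient lists from degree |0| up) computing |p_3(x) = f’(t)(t/f(t))^4 x^3| for |f(t) = e^t - 1|, which should be the falling factorial |x(x-1)(x-2)|:

```python
from fractions import Fraction
from math import factorial

N = 8  # truncation order for all series arithmetic
c = lambda n: Fraction(factorial(n))

def mul(a, b):
    out = [Fraction(0)] * N
    for i, ai in enumerate(a[:N]):
        for j, bj in enumerate(b[: N - i]):
            out[i + j] += ai * bj
    return out

def inv(a):
    # Multiplicative inverse (needs a[0] != 0), by the recurrence from earlier.
    out = [Fraction(1) / a[0]] + [Fraction(0)] * (N - 1)
    for k in range(1, N):
        out[k] = -sum(a[n] * out[k - n] for n in range(1, k + 1) if n < len(a)) / a[0]
    return out

def apply_series(f, p):
    # f(t) acting on the polynomial p via f(t) x^n = (c_n / c_{n-k}) f_k x^{n-k}.
    out = [Fraction(0)] * len(p)
    for n, pn in enumerate(p):
        for k, fk in enumerate(f[: n + 1]):
            out[n - k] += c(n) / c(n - k) * fk * pn
    return out

f = [Fraction(0)] + [Fraction(1, factorial(k)) for k in range(1, N)]  # e^t - 1
fprime = [Fraction(k + 1) * a for k, a in enumerate(f[1:])] + [Fraction(0)]  # e^t
t_over_f = inv(f[1:] + [Fraction(0)])  # t/f(t) = 1/(f(t)/t): shift f down by one

op = fprime
for _ in range(4):  # multiply by (t/f)^{n+1} with n = 3
    op = mul(op, t_over_f)
print(apply_series(op, [Fraction(0)] * 3 + [Fraction(1)]))  # [0, 2, -3, 1]
```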
Theorem (id:jyhq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|, and let |h| and |l| be invertible series. Then the sequence |r_n(x) = h(t)l(t)^n s_n(x)| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \]
Proof:
Apply |g(t)| to both sides getting \[ g(t)r_n(x) = h(t)l(t)^n g(t)s_n(x) \] and noting that |\pseq{g(t)s_n(x)}{n}| is associated to |f| and applying the Transfer Formula, we get: \[\begin{align} g(t)r_n(x) & = h(t)l(t)^n f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} l(t)^{n+1} \left(\frac{t}{f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = h(t)\frac{f’(t)}{l(t)} \frac{[l(t)^{-1} f(t)]'}{[l(t)^{-1} f(t)]'} \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ & = \frac{h(t)f’(t)}{[l(t)^{-1} f(t)]'l(t)} [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \\ \end{align} \] So we get \[ \frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)} l(t)g(t)r_n(x) = [l(t)^{-1} f(t)]' \left(\frac{t}{l(t)^{-1} f(t)}\right)^{n+1} x^n \] which by the Transfer Formula means these are associated to |l(t)^{-1}f(t)| and thus |\pseq{r_n(x)}{n}| is Sheffer for \[ \left(\frac{[l(t)^{-1} f(t)]'}{f’(t)h(t)}l(t)g(t), l(t)^{-1} f(t) \right) \] |\square|
Umbral Composition and Transfer Operators
Let |\pseq{p_n(x)}{n}| be the associated sequence to |f|. The transfer or umbral2 operator for |\pseq{p_n(x)}{n}| (or |f|) is the linear operator |\lambda_f| on |P| defined by \[ \lambda_f x^n = p_n(x) \] This implies the adjoint operator is \[ \lambda_f^* g(t) = \sum_{k=0}^\infty \frac{\pair{g(t)}{p_k(x)}}{c_k} t^k \]
The Transfer Formula gives us the action of a transfer operator at each monomial. Note, this doesn’t imply that a transfer operator is induced by a formal power series, since the Transfer Formula uses a different formal power series for each monomial.
Lemma (id:oony): A |\mathbb K|-algebra homomorphism of |\mathscr F| is an automorphism if and only if it preserves degree.
Proof:
For the forward direction, assume |T| is an automorphism. Let |f(t) = T^{-1}(t)| with |\deg f = k| so |f(t) = t^kg(t)| with |g(t)| invertible. |k > 0| since |\mathbb K|-algebra homomorphisms send constants to themselves but then |T(f(t)) = t = T(t)^k T(g(t))| and the degrees can only line up if |k = 1| and |\deg T(t) = 1|. |T| thus can’t reduce degree and by the same logic neither can |T^{-1}|, so |T| must preserve degree.
For the reverse direction, a degree preserving linear operator is continuous. We have that |\pseq{T(t)^k}{k}| is a pseudobasis, so for any |f \in \mathscr F|, we can write \[ f(t) = \sum_{k=0}^\infty a_k T(t)^k = T\left(\sum_{k=0}^\infty a_k t^k\right) \] so |g(t) = \sum_{k=0}^\infty a_k t^k| satisfies |f = T(g)| and for every |f| we can find such a |g| making |T| surjective. The uniqueness of the |\pseq{a_k}{k}| implies |T| is injective and thus bijective. For abstract nonsense reasons this is enough for it to be an automorphism. |\square|
Lemma (id:lfum): If |T| is a continuous |\mathbb K|-algebra homomorphism on |\mathscr F| and |f, g \in \mathscr F| with |\deg f > 0|, then |T(g(f(t))) = g(T(f(t)))|. In particular, taking |f(t) = t|, |T(g(t)) = g(T(t))|.
Proof:
\[\begin{align} T(g(f(t))) & = T\left(\sum_{k=0}^\infty g_k f(t)^k\right) \\ & = \sum_{k=0}^\infty g_k T(f(t))^k \\ & = g(T(f(t))) \end{align}\] using continuity and the homomorphism property. |\square|
Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f^*(g(t)) = g(\bar f(t))|.
Proof:
For the forward direction, if |\lambda| is a transfer operator for |f| then we immediately get \[ \lambda^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{p_k(x)}}{c_k} t^k = \sum_{k=0}^\infty \delta_1^k t^k = t \] and \[\begin{align} \pair{\lambda^*(g(t)h(t))}{x^n} & = \pair{g(t)h(t)}{\lambda x^n} \\ & = \pair{g(t)h(t)}{p_n(x)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{g(t)}{p_i(x)}\pair{h(t)}{p_j(x)} \tag{theorem (id:jtbs)} \\ & = \sum_{i+j=n} \frac{c_n}{c_i c_j} \pair{\lambda^* g(t)}{x^i}\pair{\lambda^* h(t)}{x^j} \\ & = \pair{(\lambda^* g(t))(\lambda^* h(t))}{x^n} \end{align}\]
For the reverse direction, \[\begin{align} \pair{f(t)^k}{\lambda x^n} & = \pair{\lambda^*(f(t)^k)}{x^n} \\ & = \pair{\lambda^*(f(t))^k}{x^n} \\ & = \pair{t^k}{x^n} \\ & = c_n \delta_k^n \end{align}\] so |\lambda x^n| has the characteristic property of |p_n(x)|. |\square|
Corollary (id:chiw): A continuous |\mathbb K|-algebra automorphism on |\mathscr F| is the adjoint of a transfer operator.
Proof: By theorem (id:fdui) a continuous linear operator is adjoint to some linear operator on |P|, and being an automorphism there is some |f \in \mathscr F| that gets mapped to |t| by the automorphism so the previous theorem applies. |\square|
Summarizing some results of this form: There’s a bijection between continuous linear operators on |\mathscr F| and linear operators on |P| via adjointness. Further, there’s a bijection between continuous surjective derivations on |\mathscr F| and umbral shifts, and a bijection between continuous |\mathbb K|-algebra automorphisms on |\mathscr F| and transfer operators.
Corollary (id:ocfq): Transfer operators form a group with |(\lambda_f^*)^{-1} = \lambda_{\bar f}^*| and |\lambda_f^* \circ \lambda_g^* = \lambda_{f\circ g}^*|.
Proof: This readily follows from |\lambda_f^* g(t) = g(\bar f(t))|. |\square|
Theorem (id:ydci):
- A transfer operator maps associated sequences to associated sequences.
- If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, and |\lambda| is a linear operator which maps |p_n(x)| to |q_n(x)|, then |\lambda^* g(t) = f(t)|.
- Additionally, |\lambda| is a transfer operator.
Proof:
For all of the following let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be associated to |f| and |g| respectively and |\lambda_f|, |\lambda_g| be the transfer operators.
For 1, \[ \pair{(\lambda_g^*)^{-1}(g(t)^k)}{\lambda_g q_n(x)} = \pair{\lambda_g^*(\lambda_g^*)^{-1}(g(t)^k)}{q_n(x)} = \pair{g(t)^k}{q_n(x)} = c_n \delta_k^n \] so |\lambda_g q_n(x)| is associated to |(\lambda_g^*)^{-1}(g(t))|.
For 2, \[ \pair{\lambda^*(g(t))}{p_n(x)} = \pair{g(t)}{\lambda p_n(x)} = \pair{g(t)}{q_n(x)} = c_n \delta_1^n = \pair{f(t)}{p_n(x)} \]
For 3, the same logic as 2 gives |\lambda^*(g(t)^k) = f(t)^k| and thus, by linearity and continuity, |\lambda^*(h(g(t))) = h(f(t))| for any |h|. In particular, |\lambda^*(\bar f(g(t))^k) = \bar f(f(t))^k = t^k|, so |\pair{\bar f(g(t))^k}{\lambda x^n} = \pair{t^k}{x^n} = c_n\delta_k^n|, implying that |\lambda| is the transfer operator for |\bar f(g(t))|. |\square|
Let |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| be two polynomial sequences. The umbral composition of |q| with |p| is written and defined as \[ \ucomp{q}{p} = \sum_{k=0}^n q_{n,k}p_k(x) \] If |\lambda| is the transfer operator for |\pseq{p_n(x)}{n}|, then we have \[ \ucomp{q}{p} = \lambda q_n(x) \]
Theorem (id:xvse): If |\pseq{p_n(x)}{n}| and |\pseq{q_n(x)}{n}| are associated to |f| and |g| respectively, then |\pseq{\ucomp{q}{p}}{n}| is associated to |g(f(t))|.
Proof:
\[\begin{align} \pair{g(f(t))^k}{\ucomp{q}{p}} & = \pair{g(f(t))^k}{\lambda_f q_n(x)} \\ & = \pair{\lambda_f^*(g(f(t)))^k}{q_n(x)} \\ & = \pair{g(\lambda_f^*(f(t)))^k}{q_n(x)} \\ & = \pair{g(t)^k}{q_n(x)} \\ & = c_n \delta_k^n \\ \end{align}\] |\square|
Corollary (id:xajh): Umbral composition makes the set of associated sequences into a group.
Proof: It follows from the group structure of transfer operators. |\square|
A Sheffer operator is the linear operator |\mu_{g,f}| defined by |\mu_{g,f}x^n = s_n(x)| where |\pseq{s_n(x)}{n}| is Sheffer for |(g,f)|. By considering the associated sequence induced by a Sheffer sequence, i.e. |\pseq{g(t)s_n(x)}{n}|, we readily get |\mu_{g,f} = g(t)^{-1}\lambda_f| so Sheffer operators can be reduced to transfer operators.
We can immediately read off the |g| for Bernoulli polynomials from their representation via a differential operator. Namely, |g(t) = \frac{e^t - 1}{t} = \frac{\varepsilon_1(t) - 1}{t}|. This leads to the identity |\Delta B_n(x) = Dx^n = nx^{n-1}| where |\Delta| is the forward difference operator.
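As a quick sanity check of that identity (a small case I'm adding; it isn't worked out in the text): |B_2(x) = x^2 - x + \frac{1}{6}|, and \[ \Delta B_2(x) = \left((x+1)^2 - (x+1) + \tfrac{1}{6}\right) - \left(x^2 - x + \tfrac{1}{6}\right) = 2x = Dx^2 \]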
Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.
Proof: (click to expand)
\[\begin{align} \pair{g(t)h(f(t))l(f(t))^k}{\ucomp{r}{s}} & = \pair{g(t)h(f(t))l(f(t))^k}{\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{g(t)\mu_{g,f} r_n(x)} \\ & = \pair{h(f(t))l(f(t))^k}{\lambda_f r_n(x)} \\ & = \pair{\lambda_f^*(h(f(t))l(f(t))^k)}{r_n(x)} \\ & = \pair{h(\lambda_f^*(f(t)))l(\lambda_f^*(f(t)))^k}{r_n(x)} \\ & = \pair{h(t)l(t)^k}{r_n(x)} \\ & = c_n \delta_k^n \end{align}\] |\square|Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]
Proof: Immediate from definition of |\mu_{g,f}| and theorem (id:stji). |\square|
Given two polynomial sequences |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| related by \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] the connection-constants problem is to determine the constants |a_{n,k}|. When |\pseq{r_n(x)}{n}| and |\pseq{s_n(x)}{n}| are Sheffer for given formal power series pairs, we can solve this problem as follows.
Theorem (id:hqtd): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. If \[ r_n(x) = \sum_{k=0}^n a_{n,k} s_k(x) \] then the sequence |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is Sheffer for \[ \left(\frac{h(\bar f(t))}{g(\bar f(t))}, l(\bar f(t)) \right) \]
Proof: (click to expand)
Clearly, |r_n(x) = \ucomp{t}{s}|, so we just apply theorem (id:pnci) and solve for the |u, v \in \mathscr F| such that |\pseq{t_n(x)}{n}| is Sheffer for |(u, v)|. |\square|Corollary (id:ixkb): Let |\pseq{p_n(x)}{n}| be associated to |f| and |\pseq{q_n(x)}{n}| be associated to |l| and \[ q_n(x) = \sum_{k=0}^n a_{n,k} p_k(x) \] then |t_n(x) = \sum_{k=0}^n a_{n,k} x^k| is associated to |l(\bar f(t))|.
Proof: Immediate from the previous theorem. |\square|
Transfer operators give us a concise proof of the Lagrange Inversion Formula for computing compositional inverses. The usual formula arises from taking |g(t) = t| below.
Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.
Proof: (click to expand)
From theorem (id:stji), |\lambda_f^*(g(t)) = g(\bar f(t))|.
We conclude with a use of the alternate form of the Transfer Formula with |\pseq{p_n(x)}{n}| being associated to |f|. \[\begin{align} \pair{g(\bar f(t))}{x^n} & = \pair{\lambda_f^*(g(t))}{x^n} \\ & = \pair{g(t)}{\lambda_f x^n} \\ & = \pair{g(t)}{p_n(x)} \\ & = \bigpair{g(t)}{\theta_t\left(\frac{t}{f(t)}\right)^n x^{n-1}} \tag{Transfer Formula, alternate form} \\ & = \bigpair{\theta_t^*(g(t))\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \\ & = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \end{align}\] |\square|Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]
Proof: Do the same proof as the previous corollary just with the first form of the Transfer Formula. |\square|
Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]
Proof: Apply the previous corollary with |g(t) = h(t)\frac{f(t)}{tf’(t)}|. |\square|
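As a quick illustration of the first form (a standard example I'm adding; it isn't worked out above): with |c_n = n!|, take |f(t) = te^{-t}| and |g(t) = t|. Then |t/f(t) = e^t| and |g’(t) = 1|, so \[ \pair{\bar f(t)}{x^n} = \bigpair{e^{nt}}{x^{n-1}} = n^{n-1} \] which says |\bar f(t) = \sum_{n=1}^\infty n^{n-1}\frac{t^n}{n!}|, the exponential generating function for rooted labeled trees.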
Example: Chebyshev Polynomials
While the “classical” umbral calculus generally used |c_n = n!|, one interesting (orthogonal) polynomial sequence that benefits from the extra flexibility is Chebyshev polynomials where we’ll use |c_n = (-1)^n|. (This can be viewed as a special case of Gegenbauer polynomials which use |c_n = {-\lambda \choose n}^{-1}| which reduces to the Chebyshev case when |\lambda = 1|.)
The book “The Umbral Calculus” mentioned in the introduction primarily covers the “classical” case and has many examples of it. It covers the “non-classical” case as well, albeit as a bit of an afterthought.
|tx^n = -x^{n-1}| so |tp(x) = -x^{-1}(p(x) - p(0))|
|\pseq{T_n(x)}{n}| is Sheffer for |(g, f)| where |g(t) = (1-t^2)^{-2}| and |f(t) = \frac{\sqrt{1-t^2} - 1}{t} = \frac{-t}{1 + \sqrt{1 - t^2}}|. |\bar f(t) = \frac{-2t}{1+t^2}| and |f’(t) = \frac{-1}{\sqrt{1-t^2}(1 + \sqrt{1 - t^2})} = \frac{f(t)}{t\sqrt{1-t^2}}|.
|f(t)s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)| from theorem (id:hvdt) gives the recurrence \[\frac{c_n}{c_{n+1}}ts_{n+1}(x) + \frac{c_n}{c_{n-1}}ts_{n-1}(x) + 2s_n(x) = 0 \] for any Sheffer sequence with |f(t)| as its delta series. (One can check from the closed form of |f| that |tf(t)^2 + 2f(t) + t = 0|; applying this operator identity to |s_{n+1}(x)| and scaling by |c_n/c_{n+1}| yields the recurrence.) For the Chebyshev polynomials, this simplifies to \[ 2xT_n(x) + T_{n+1}(x) + T_{n-1}(x) = 0 \]
|\theta_t x^n = -(n+1)x^{n+1}| which we can compute from |\theta_t t = xD| which takes the form |\theta_t t x^n = -\theta_t x^{n-1} = nx^n|. This leads to |\theta_t = -x(1 + xD)|.
From theorem (id:vxhh), we get |TT_n(x) = nT_n(x)| where
\[ T = \left(\theta_t - \frac{g’(t)}{g(t)}\right)\frac{f(t)}{f’(t)} = \left(xD - \frac{tg’(t)}{g(t)}\right)\frac{f(t)}{tf’(t)} \]
This leads to |nT_n(x) = (xD - (1-t^2)^{-1})\sqrt{1-t^2}T_n(x)|.
Summary
A formal power series |f| is invertible iff |\deg f = 0| and a delta series iff |\deg f = 1|. We call a linear functional/operator induced by an invertible/delta series an invertible/delta functional/operator.
The defining property of |\pseq{s_n(x)}{n}| being Sheffer for |(g, f)| where |g| is an invertible series and |f| is a delta series is \[ \pair{g(t)f(t)^k}{s_n(x)} = \pair{t^k}{x^n} = c_n \delta_k^n \]
If |g(t) = 1|, then we say |\pseq{s_n(x)}{n}| is the associated sequence for |f|, and usually we’ll use |\pseq{p_n(x)}{n}| instead.
If |f(t) = t|, then we say |\pseq{s_n(x)}{n}| is the Appell sequence for |g|.
Formal power series as operators \[ t^k x^n = \frac{c_n}{c_{n-k}} x^{n-k} \] and generally, \[h(t) x^n = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k x^{n-k} \]
We can generalize this to Sheffer sequences a la \[h(f(t)) s_n(x) = \sum_{k=0}^n \frac{c_n}{c_{n-k}} h_k s_{n-k}(x) \] where |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g \in \mathscr F|.
A linear operator |T| on |\mathscr F| is continuous if given |\pseq{f_k}{k}| such that |\deg f_k \to \infty| as |k \to \infty|, we have |\deg T(f_k) \to \infty| as |k \to \infty|.
Given a linear operator |\mu| on |P|, the adjoint |\mu^*| is a linear operator on |\mathscr F| characterized by: \[ \pair{\mu^* f(t)}{p(x)} = \pair{f(t)}{\mu p(x)} \]
Its adjoint expansion is: \[ \mu^* f(t) = \sum_{k=0}^\infty \frac{\pair{f(t)}{\mu x^k}}{c_k} t^k \]
This applies generally, but in particular to umbral shifts and transfer operators.
Theorem (id:ahdu): The adjoint to a linear operator on |P| is continuous.
Theorem (id:fdui): If |T| is a continuous linear operator on |\mathscr F|, then there exists a linear operator |\mu| on |P| such that |T = \mu^*|.
Theorem (Generating Function): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ \frac{1}{g(\bar f(t))} \varepsilon_y(\bar f(t)) = \sum_{k=0}^\infty \frac{s_k(y)}{c_k} t^k \] for all |y \in \mathbb K|.
Theorem (Sheffer Identity): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if \[ \varepsilon_y(t) s_n(x) = \sum_{i+j=n} \frac{c_n}{c_i c_j} p_i(y) s_j(x) \] for all |y \in \mathbb K| where |\pseq{p_n(x)}{n}| is associated to |f|.
Theorem (id:cutg): A sequence |\pseq{p_n(x)}{n}| is the associated sequence for |f| if and only if 1) |\pair{t^0}{p_n(x)} = c_0 \delta_n^0| for all |n \in \mathbb N|, and 2) |f(t) p_n(x) = \frac{c_n}{c_{n-1}}p_{n-1}(x)| for all |n \in \mathbb N_+|.
Theorem (Transfer Formula): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = f’(t)\left(\frac{t}{f(t)}\right)^{n+1} x^n \] for all |n \in \mathbb N|.
Theorem (Transfer Formula, alternate form): If |\pseq{p_n(x)}{n}| is the associated sequence of |f|, then \[ p_n(x) = \frac{c_n}{nc_{n-1}} \theta_t \left(\frac{t}{f(t)}\right)^n x^{n-1} \] for all |n \geq 1|.
Theorem (id:qqes): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if |\pseq{g(t)s_n(x)}{n}| is the associated sequence for |f|.
Theorem (id:hvdt): A sequence |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| for some invertible |g| if and only if |f(t) s_n(x) = \frac{c_n}{c_{n-1}}s_{n-1}(x)|.
Theorem (id:omlq): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)|. Then if |\theta_f| is the umbral shift for |f|, \[ s_{n+1}(x) = \frac{c_{n+1}}{(n+1)c_n}(g(t)\theta_f^*(g(t)^{-1}) + \theta_f)s_n(x) \]
Theorem (Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for any |h \in \mathscr F|, \[ h(t) = \sum_{k=0}^\infty \frac{\pair{h(t)}{s_k(x)}}{c_k}g(t)f(t)^k \]
Corollary (Polynomial Expansion Theorem): Let |\pseq{s_n(x)}{n}| be a Sheffer for |(g, f)|. Then for a |p \in P|, \[ p(x) = \sum_{n=0}^\infty \frac{\pair{g(t)f(t)^n}{p(x)}}{c_n} s_n(x) \]
Theorem (id:stji): A linear operator |\lambda| on |P| is the transfer operator for |f \in \mathscr F| if and only if its adjoint |\lambda^*| is a |\mathbb K|-algebra automorphism of |\mathscr F| for which |\lambda^* f(t) = t|. This makes |\lambda_f^* g(t) = g(\bar f(t))|.
Theorem (id:pnci): Let |\pseq{s_n(x)}{n}| be Sheffer for |(g, f)| and let |\pseq{r_n(x)}{n}| be Sheffer for |(h, l)|. Then |\ucomp{r}{s}| is Sheffer for the pair |(g(t)h(f(t)), l(f(t)))|.
Corollary (id:fony): \[ \pair{h(t)}{\mu_{g,f} q_n(x)} = \pair{\mu_{g,f}^*(h(t))}{q_n(x)} = \pair{g(\bar f(t))^{-1} h(\bar f(t))}{q_n(x)} \]
Theorem (id:kscz): If |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)|, then \[ \theta_t s_n(x) = \sum_{k=0}^{n+1} \left[\frac{c_n}{c_k c_{n-k}} \pair{g’(t)}{s_{n-k}(x)} + \frac{kc_n}{c_k c_{n-k+1}} \pair{g(t)f’(t)}{s_{n-k+1}(x)} \right] s_k(x) \]
Corollary (id:tvdx): \[ ts_n(x) = \sum_{k=0}^{n-1} \frac{c_n}{c_k c_{n-k}} \pair{t}{p_{n-k}(x)} s_k(x) \]
Theorem (Conjugate Representation): |\pseq{s_n(x)}{n}| is Sheffer for |(g, f)| if and only if \[ s_n(x) = \sum_{k=0}^n \frac{\pair{g(\bar f(t))^{-1}\bar f(t)^k}{x^n}}{c_k} x^k \]
Theorem (Multiplication Theorem): Let |\pseq{s_n(x)}{n}| be Appell for |g|, then \[ s_n(\alpha x) = \alpha^n \frac{g(t)}{g(t/\alpha)} s_n(x) \] for |\alpha\neq 0|.
Corollary (Lagrange Inversion Formula): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g’(t)\left(\frac{t}{f(t)}\right)^n}{x^{n-1}} \] Of course, |\pair{g(\bar f(t))}{x^0} = \pair{g(t)}{x^0}| since |\deg \bar f = 1|.
Corollary (Lagrange Inversion Formula, alternate form): Let |f, g \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \pair{g(\bar f(t))}{x^n} = \bigpair{g(t)f’(t)\left(\frac{t}{f(t)}\right)^{n+1}}{x^n} \]
Corollary (Lagrange Inversion Formula, Hermite form): Let |f, h \in \mathscr F| with |\deg f = 1| and |c_n = n!|, then for |n > 0| \[ \bigpair{\frac{th(\bar f(t))}{\bar f(t)f’(\bar f(t))}}{x^n} = \bigpair{h(t)\left(\frac{t}{f(t)}\right)^n}{x^n} \]
This isn’t a basis because we’re allowing countably infinite sums of the elements of the pseudobasis, whereas for a basis, even with infinite elements, we’d be claiming that every element is a finite sum of the basis elements.↩︎
As far as I can tell, “umbral operator” is used in the “classical” |c_n = n!| case while “transfer operator” is more general. I’m sure people use “umbral operator” generally too, though.↩︎
I want to talk about one of the many pretty areas of number theory. This involves the notion of an arithmetic function and related concepts. A few relatively simple concepts will allow us to produce a variety of useful functions and theorems. This provides only a glimpse of the start of the field of analytic number theory, though many of these techniques are used in other places as we’ll also start to see.
(See the end for a summary of identities and results.)
- Prelude
- Arithmetic Functions
- Dirichlet Series
- Dirichlet Convolution
- Euler Product Formula
- Möbius Inversion
  - Lambert Series
  - Inclusion-Exclusion
  - |\varphi|
- Combinatorial Species
- Derivative of Dirichlet series
- Dirichlet Inverse
- More Examples
  - |\lambda| and |\gamma|
  - Indicator Functions
- Summatory Functions
  - Mellin Transform
- Summary
Prelude
As some notation, I’ll write |\mathbb N_+| for the set of positive naturals, and |\mathbb P| for the set of primes. |\mathbb N| will contain |0|. Slightly atypically, I’ll write |[n]| for the set of numbers from |1| to |n| inclusive, i.e. |a \in [n]| if and only if |1 \leq a \leq n|.
I find that the easiest way to see results in number theory is to view a positive natural number as a multiset of primes which is uniquely given by factorization. Coprime numbers are ones where these multisets are disjoint. Multiplication unions the multisets. The greatest common divisor is multiset intersection. |n| divides |m| if and only if |n| corresponds to a sub-multiset of |m|, in which case |m/n| corresponds to the multiset difference. The multiplicity of an element of a multiset is the number of occurrences. For a multiset |P|, |\mathrm{dom}(P)| is the set of elements of the multiset |P|, i.e. those with multiplicity greater than |0|. For a finite multiset |P|, |\vert P\vert| will be the sum of the multiplicities of the distinct elements, i.e. the number of elements (with duplicates) in the multiset.
We can represent a multiset of primes as a function |\mathbb P \to \mathbb N| which maps an element to its multiplicity. A finite multiset would then be such a function that is |0| at all but finitely many primes. Alternatively, we can represent the multiset as a partial function |\mathbb P \rightharpoonup \mathbb N_+|. It will be finite when it is defined for only finitely many primes. Equivalently, when it is a finite subset of |\mathbb P\times\mathbb N_+| (which is also a functional relation).
Unique factorization provides a bijection between finite multisets of primes and positive natural numbers. Given a finite multiset |P|, the corresponding positive natural number is |n_P = \prod_{(p, k) \in P} p^k|.
I will refer to this view often in the following.
Arithmetic Functions
An arithmetic function is just a function defined on the positive naturals. Usually, they’ll land in (not necessarily positive) natural numbers, but that isn’t required.
In most cases, we’ll be interested in the specific subclass of multiplicative arithmetic functions. An arithmetic function, |f|, is multiplicative if |f(1) = 1| and |f(ab) = f(a)f(b)| whenever |a| and |b| are coprime. We also have the notion of a completely multiplicative arithmetic function for which |f(ab) = f(a)f(b)| always. Obviously, completely multiplicative functions are multiplicative. Analogously, we also have a notion of (completely) additive where |f(ab) = f(a) + f(b)|. Warning: In other mathematical contexts, “additive” means |f(a+b)=f(a)+f(b)|. An obvious example of a completely additive function is the logarithm. Exponentiating an additive function will produce a multiplicative function.
For an additive function, |f|, we automatically get |f(1) = 0| since |f(1) = f(1\cdot 1) = f(1) + f(1)|.
Lemma: The product of two multiplicative functions |f| and |g| is multiplicative.
Proof: For |a| and |b| coprime, |f(ab)g(ab) = f(a)f(b)g(a)g(b) = f(a)g(a)f(b)g(b)|. |\square|
A parallel statement holds for completely multiplicative functions.
It’s also clear that a completely multiplicative function is entirely determined by its action on prime numbers. Since |p^n| is coprime to |q^m| whenever |p| and |q| are distinct primes, we see that a multiplicative function is entirely determined by its action on powers of primes. To this end, I’ll often define multiplicative/additive functions by their action on prime powers and completely multiplicative/additive functions by their action on primes.
Multiplicative functions aren’t closed under composition, but we do have that if |f| is completely multiplicative and |g| is multiplicative, then |f \circ g| is multiplicative when that composite makes sense.
Here are some examples. Not all of these will be used in the sequel.
- The power function |({-})^z| for any |z|, not necessarily an integer, is completely multiplicative.
- Choosing |z=0| in the previous, we see the constantly one function |\bar 1(n) = 1| is completely multiplicative.
- The identity function is clearly completely multiplicative and is also the |z=1| case of the above.
- The unit function, i.e. the indicator function for |1|, is |\varepsilon(n) = \begin{cases}1, & n = 1 \\ 0, & n \neq 1\end{cases}| and is completely multiplicative.
- Define a multiplicative function via |\mu(p^n) = \begin{cases} -1, & n = 1 \\ 0, & n > 1\end{cases}| where |p| is prime. This is the Möbius function. More holistically, |\mu(n)| is |0| if |n| has any square factors, otherwise |\mu(n) = (-1)^k| where |k| is the number of (distinct) prime factors.
- Define a completely multiplicative function via |\lambda(p) = -1|. |\lambda(n) = \pm 1| depending on whether there is an even or odd number of prime factors (including duplicates). This function is known as the Liouville function.
- |\lambda(n) = (-1)^{\Omega(n)}| where |\Omega(n)| is the completely additive function which counts the number of prime factors of |n| including duplicates. |\Omega(n_P) = \vert P\vert|.
- Define a multiplicative function via |\gamma(p^n) = -1|. |\gamma(n) = \pm 1| depending on whether there is an even or odd number of distinct prime factors.
- |\gamma(n) = (-1)^{\omega(n)}| where |\omega(n)| is the additive function which counts the number of distinct prime factors of |n|. See Prime omega function. We also see that |\omega(n_P) = \vert\mathrm{dom}(P)\vert|.
- The completely additive function for |q\in\mathbb P|, |\nu_q(p) = \begin{cases}1,&p=q\\0,&p\neq q\end{cases}| is the p-adic valuation.
- It follows that the |p|-adic absolute value |\vert r\vert_p = p^{-\nu_p(r)}| is completely multiplicative. It can be characterized on naturals by |\vert p\vert_q = \begin{cases}p^{-1},&p=q\\1,&p\neq q\end{cases}|.
- |\gcd({-}, k)| for a fixed |k| is multiplicative. Given any multiplicative function |f|, |f \circ \gcd({-},k)| is multiplicative. This essentially “restricts” |f| to only see the prime powers that divide |k|. Viewing the finite multiset of primes |P| as a function |\mathbb P\to\mathbb N|, |f(\gcd(p^n,n_P)) = \begin{cases}f(p^n),&n\leq P(p)\\f(p^{P(p)}),&n>P(p)\end{cases}|.
- The multiplicative function characterized by |a(p^n) = p(n)| where |p(n)| is the partition function counts the number of abelian groups of a given order. That this function is multiplicative is a consequence of the fundamental theorem of finite abelian groups.
- The Jacobi symbol |\left(\frac{a}{n}\right)| where |a\in\mathbb Z| and |n| is an odd positive integer is a completely multiplicative function with either |a| or |n| fixed. When |n| is an odd prime, it reduces to the Legendre symbol. For |p| an odd prime, we have |(\frac{a}{p}) = a^{\frac{p-1}{2}} \pmod p|. This will always be in |\{-1, 0, 1\}| and can be alternately defined as |\left(\frac{a}{p}\right) = \begin{cases}0,&p\mid a\\1,&p\nmid a\text{ and }\exists x.x^2\equiv a\pmod p\\-1,&\not\exists x.x^2\equiv a\pmod p\end{cases}|. Therefore, |\left(\frac{a}{p}\right)=1| (|=0|) when |a| is a (trivial) quadratic residue mod |p|.
- An interesting example which is not multiplicative nor additive is the arithmetic derivative. Let |p\in\mathbb P|. Define |\frac{\partial}{\partial p}(n)| via |\frac{\partial}{\partial p}(p) = 1|, |\frac{\partial}{\partial p}(q) = 0| for |q\neq p| and |q\in\mathbb P|, and |\frac{\partial}{\partial p}(nm) = \frac{\partial}{\partial p}(n)m + n\frac{\partial}{\partial p}(m)|. We then have |D_S = \sum_{p\in S}\frac{\partial}{\partial p}| for non-empty |S\subseteq\mathbb P| which satisfies the same product rule identity. This perspective views a natural number (or, more generally, a rational number) as a monomial in infinitely many variables labeled by prime numbers.
- A Dirichlet character of modulus |m| is, by definition, a completely multiplicative function |\chi| satisfying |\chi(n + m) = \chi(n)| and |\chi(n)| is non-zero if and only if |n| is coprime to |m|. The Jacobi symbol |\left(\frac{({-})}{m}\right)| is a Dirichlet character of modulus |m|. |\bar 1| is the Dirichlet character of modulus |1|.
Dirichlet Series
Given an arithmetic function |f|, we define the Dirichlet series:
\[\mathcal D[f](s) = \sum_{n=1}^\infty \frac{f(n)}{n^s} = \sum_{n=1}^\infty f(n)n^{-s}\]
When |f| is a Dirichlet character, |\chi|, this is referred to as the (Dirichlet) |L|-series of the character, and the analytic continuation is the (Dirichlet) |L|-function and is written |L(s, \chi)|.
We’ll not focus much on when such a series converges. See this section of the above Wikipedia article for more details. Alternatively, we could talk about formal Dirichlet series. We can clearly see that if |s = 0|, then we get the sum |\sum_{n=1}^\infty f(n)| which clearly won’t converge for, say, |f = \bar 1|. We can say that if |f| is asymptotically bounded by |n^k| for some |k|, i.e. |f \in O(n^k)|, then the series will converge absolutely when the real part of |s| is greater than |k+1|. For |\bar 1|, it follows that |\mathcal D[\bar 1](x + iy)| is defined when |x > 1|. We can use analytic continuation to go beyond these limits.
See A Catalog of Interesting Dirichlet Series for a more reference-like listing. Beware differences in notation.
Dirichlet Convolution
Why is this interesting in this context? Let’s consider two arithmetic functions |f| and |g| and multiply their corresponding Dirichlet series. We’ll get:
\[\mathcal D[f](s)\mathcal D[g](s) = \sum_{n=1}^\infty h(n)n^{-s} = \mathcal D[h](s)\]
where now we need to figure out what |h(n)| is. But |h(n)| is going to be the sum of all the terms of the form |f(a)a^{-s}g(b)b^{-s} = f(a)g(b)(ab)^{-s}| where |ab = n|. We can thus write: \[h(n) = \sum_{ab=n} f(a)g(b) = \sum_{d\mid n} f(d)g(n/d)\] We’ll write this more compactly as |h = f \star g| which we’ll call Dirichlet convolution. We have thus shown a convolution theorem of the form \[\mathcal D[f]\mathcal D[g] = \mathcal D[f \star g]\]
The unit function serves as the unit for this operation, which is reflected by |\mathcal D[\varepsilon](s) = 1|.
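Since |\star| drives everything that follows, here is a minimal Python sketch of it (my own illustration, using a naive divisor scan that is fine for small |n|). It checks that |\varepsilon| acts as a unit and computes |\bar 1 \star \bar 1|, which, as discussed below, counts divisors:

```python
def star(f, g):
    """Dirichlet convolution: (f ⋆ g)(n) = sum of f(d)g(n/d) over divisors d of n."""
    def h(n):
        return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)
    return h

def eps(n):   # the unit function: indicator of 1
    return 1 if n == 1 else 0

def one(n):   # the constantly-one function, written 1̄ in the text
    return 1

d = star(one, one)   # counts the divisors of n
assert [d(n) for n in range(1, 13)] == [1, 2, 2, 3, 2, 4, 2, 4, 3, 4, 2, 6]
assert all(star(eps, d)(n) == d(n) for n in range(1, 50))   # ε ⋆ d = d
```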
In the same way we can view a sum of the form |\sum_{a+b=n}f(a)g(b)| that arises in “normal” convolution as a sum along the line |y = n - x|, we can view the sum |\sum_{ab=n}f(a)g(b)| as a sum along a hyperbola of the form |y = n/x|. For all of |\sum_{n=1}^\infty\sum_{k=1}^\infty f(n)g(k)|, |\sum_{n=1}^\infty\sum_{k=1}^n f(k)g(n-k)|, and |\sum_{n=1}^\infty\sum_{k\mid n}f(k)g(n/k)| we’re including |f(a)g(b)| for every |(a,b)\in\mathbb N_+\times\mathbb N_+| in the sum exactly once. The difference is whether we’re grouping the internal sum by rows, diagonals, or hyperbolas. This idea of summing hyperbolas can be expanded to a computational technique for sums of multiplicative functions called the Dirichlet hyperbola method.
Since we will primarily be interested in multiplicative functions, we should check that |f \star g| is a multiplicative function when |f| and |g| are.
Lemma: Assume |a| and |b| are coprime, and |f| and |g| are multiplicative. Then |(f \star g)(ab) = (f \star g)(a)(f \star g)(b)|.
Proof: Since |a| and |b| are coprime, they share no divisors besides |1|. This means every |d| such that |d \mid ab| factors as |d = d_a d_b| where |d_a \mid a| and |d_b \mid b|. More strongly, write |D_n = \{ d \in \mathbb N_+ \mid d \mid n\}|, then for any coprime pair of numbers |i| and |j|, we have |D_{ij} \cong D_i \times D_j| and that every pair |(d_i, d_j) \in D_i \times D_j| is coprime¹. Thus,
\[\begin{flalign} (f \star g)(ab) & = \sum_{d \in D_{ab}} f(d)g((ab)/d) \tag{by definition} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a d_b)g((ab)/(d_a d_b)) \tag{via the bijection} \\ & = \sum_{(d_a, d_b) \in D_a \times D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{f and g are multiplicative} \\ & = \sum_{d_a \in D_a} \sum_{d_b \in D_b} f(d_a)f(d_b)g(a/d_a)g(b/d_b) \tag{sum over a Cartesian product} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) \sum_{d_b \in D_b} f(d_b)g(b/d_b) \tag{undistributing} \\ & = \sum_{d_a \in D_a} f(d_a)g(a/d_a) (f \star g)(b) \tag{by definition} \\ & = (f \star g)(b) \sum_{d_a \in D_a} f(d_a)g(a/d_a) \tag{undistributing} \\ & = (f \star g)(b) (f \star g)(a) \tag{by definition} \\ & = (f \star g)(a) (f \star g)(b) \tag{commutativity of multiplication} \end{flalign}\] |\square|
It is not the case that the Dirichlet convolution of two completely multiplicative functions is completely multiplicative.
We can already start to do some interesting things with this. First, we see that |\mathcal D[\bar 1] = \zeta|, the Riemann zeta function. Now consider |(\bar 1 \star \bar 1)(n) = \sum_{k \mid n} 1 = d(n)|. |d(n)| is the divisor function which counts the number of divisors of |n|. We see that |\mathcal D[d](s) = \zeta(s)^2|. A simple but useful fact is |\zeta(s - z) = \mathcal D[(-)^z](s)|. This directly generalizes the result for |\mathcal D[\bar 1]| and also implies |\mathcal D[\operatorname{id}](s) = \zeta(s - 1)|.
Generalizing in a different way, we get the family of functions |\sigma_k = ({-})^k \star \bar 1|. |\sigma_k(n) = \sum_{d \mid n} d^k|. From the above, we see |\mathcal D[\sigma_k](s) = \zeta(s - k)\zeta(s)|.
Lemma: Given a completely multiplicative function |f|, we get |f(n)(g \star h)(n) = (fg \star fh)(n)|.
Proof: \[\begin{flalign}
(fg \star fh)(n)
& = \sum_{d \mid n} f(d)g(d)f(n/d)h(n/d) \\
& = \sum_{d \mid n} f(d)f(n/d)g(d)h(n/d) \\
& = \sum_{d \mid n} f(n)g(d)h(n/d) \\
& = f(n)\sum_{d \mid n} g(d)h(n/d) \\
& = f(n)(g \star h)(n)
\end{flalign}\]
|\square|
As a simple corollary, for a completely multiplicative |f|, |f \star f = f(\bar 1 \star \bar 1) = fd|.
Euler Product Formula
However, the true power of this is unlocked by the following theorem:
Theorem (Euler product formula): Given a multiplicative function |f| which doesn’t grow too fast, e.g. is |O(n^k)| for some |k > 0|, \[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \] where the series converges.
Proof: The last equality is simply using the fact that |f(p^0)p^0 = f(1) = 1| because |f| is multiplicative. The idea for the main part is similar to how we derived Dirichlet convolution. When we start to distribute out the infinite product, each term will correspond to the product of selections of a term from each series. When all but finitely many of those selections select the |1| term, we get |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s}| where |P| is some finite multiset of primes induced by those selections. By multiplicativity, |\prod_{(p, k) \in P}f(p^k)(p^k)^{-s} = f(n_P)n_P^{-s}|. Thus, by unique factorization, |f(n)n^{-s}| for every positive natural |n| occurs exactly once in the sum produced by distributing the right-hand side.
In the case where |P| is not a finite multiset, we’ll have \[ \frac{\prod_{(p, k) \in P}f(p^k)}{\left(\prod_{(p, k) \in P}p^k\right)^s}\]
The denominator of this expression goes to infinity when the real part of |s| is greater than |0|. As long as the numerator doesn’t grow faster than the denominator (perhaps after restricting the real part of |s| to be greater than some bound), this product goes to |0|. Therefore, the only terms that remain are those corresponding to the Dirichlet series on the left-hand side. |\square|
If we assume |f| is completely multiplicative, we can further simplify Euler’s product formula via the usual sum of a geometric series, |\sum_{n=0}^\infty x^n = (1-x)^{-1}|, to:
\[ \sum_{n=1}^\infty f(n)n^{-s} = \prod_{p \in \mathbb P}\sum_{n=0}^\infty (f(p)p^{-s})^n = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]
Now let’s put this to work. The first thing we can see is |\zeta(s) = \mathcal D[\bar 1](s) = \prod_{p\in\mathbb P}(1 - p^{-s})^{-1}|. But this lets us write |1/\zeta(s) = \prod_{p\in\mathbb P}(1 - p^{-s})|. If we look for a multiplicative function that would produce the right-hand side, we see that it must send a prime |p| to |-1| and |p^n| for |n > 1| to |0|. In other words, it’s the Möbius function |\mu| we defined before. So |\mathcal D[\mu](s) = 1/\zeta(s)|.
Using |\mathcal D[d](s) = \zeta(s)^2|, we see that \[\begin{flalign} \zeta(s)^2 & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty p^{-ns}\right)^2 \\ & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty (n+1)p^{-ns}\right) \\ & = \prod_{p\in\mathbb P}\left(\sum_{n=0}^\infty d(p^n)p^{-ns}\right) \\ & = \mathcal D[d](s) \end{flalign}\] Therefore, |d(p^n) = n + 1|. This intuitively makes sense because the only divisors of |p^n| are |p^k| for |k = 0, \dots, n|, and for |a| and |b| coprime |d(ab) = \vert D_{ab} \vert = \vert D_a \times D_b\vert = \vert D_a\vert\vert D_b\vert = d(a)d(b)|.
Another result leveraging the theorem: given any multiplicative function |f|, we can define a new multiplicative function via |f^{[k]}(p^n) = \begin{cases}f(p^m), & km = n\textrm{ for }m\in\mathbb N \\ 0, & k \nmid n\end{cases}|.
Lemma: The operation just defined has the property that |\mathcal D[f^{[k]}](s) = \mathcal D[f](ks)|.
Proof:
\[\begin{flalign}
\mathcal D[f^{[k]}](s)
& = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^n)p^{-ns} \\
& = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f^{[k]}(p^{kn})p^{-nks} \\
& = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-nks} \\
& = \mathcal D[f](ks)
\end{flalign}\]
|\square|
Möbius Inversion
We can write a sum over some function, |f|, of the divisors of a given natural |n| as |(f \star \bar 1)(n) = \sum_{d \mid n} f(d)|. Call this |g(n)|. But then we have |\mathcal D[f \star \bar 1] = \mathcal D[f]\mathcal D[\bar 1] = \mathcal D[f]\zeta| and thus |\mathcal D[f] = \mathcal D[f]\zeta/\zeta = \mathcal D[(f \star \bar 1) \star \mu]|. Therefore, if we only have the sums |g(n) = \sum_{d \mid n} f(d)| for some unknown |f|, we can recover |f| via |f(n) = (g \star \mu)(n) = \sum_{d\mid n}g(d)\mu(n/d)|. This is Möbius inversion.
Formally:
\[g(n) = \sum_{d\mid n} f(d) \iff f(n) = \sum_{d \mid n} \mu(d)g(n/d)\]
As a simple example, we clearly have |\zeta(s)/\zeta(s) = 1 = \mathcal D[\varepsilon](s)| so |\bar 1 \star \mu = \varepsilon| or |\sum_{d \mid n}\mu(d) = 0| for |n > 1| and |1| when |n = 1|.
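Here's a small numeric check of Möbius inversion (a sketch of mine, with |\mu| computed by trial division; these helpers reappear in later sketches):

```python
def mobius(n):
    """μ(n): 0 if a square divides n, else (-1)^(number of distinct prime factors)."""
    k, p = 0, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0      # p^2 divided the original n
            k += 1
        p += 1
    if n > 1:                  # a leftover prime factor
        k += 1
    return (-1) ** k

def divisor_sum(f, n):
    return sum(f(d) for d in range(1, n + 1) if n % d == 0)

f = lambda n: n * n + 1                      # an arbitrary arithmetic function
g = lambda n: divisor_sum(f, n)              # g = f ⋆ 1̄
f_recovered = lambda n: sum(mobius(d) * g(n // d)
                            for d in range(1, n + 1) if n % d == 0)
assert all(f_recovered(n) == f(n) for n in range(1, 200))
assert all(divisor_sum(mobius, n) == (1 if n == 1 else 0)   # 1̄ ⋆ μ = ε
           for n in range(1, 200))
```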
We also get generalized Möbius inversion via |\varepsilon(n) = \varepsilon(n)n^k = (\mu\star\bar 1)(n)n^k = (({-})^k\mu\star({-})^k)(n)|. Which is to say if |g(n) = \sum_{d\mid n}d^k f(n/d)| then |f(n) = \sum_{d\mid n} \mu(d)d^kg(n/d)|.
By considering logarithms, we also get a multiplicative form of (generalized) Möbius inversion: \[g(n) = \prod_{d\mid n}f(n/d)^{d^k} \iff f(n) = \prod_{d\mid n}g(n/d)^{\mu(d)d^k}\]
Theorem: As another guise of Möbius inversion, given any completely multiplicative function |h|, let |g(m) = \sum_{n=1}^\infty f(mh(n))|. Assuming these sums make sense, we can recover |f(k)| via |f(k) = \sum_{m=1}^\infty \mu(m)g(kh(m))|.
Proof: \[\begin{align} \sum_{m=1}^\infty \mu(m)g(kh(m)) & = \sum_{m=1}^\infty \mu(m)\sum_{n=1}^\infty f(kh(m)h(n)) \\ & = \sum_{N=1}^\infty \sum_{N=mn} \mu(m)f(kh(N)) \\ & = \sum_{N=1}^\infty f(kh(N)) \sum_{N=nm} \mu(m) \\ & = \sum_{N=1}^\infty f(kh(N)) (\mu\star\bar 1)(N) \\ & = \sum_{N=1}^\infty f(kh(N)) \varepsilon(N) \\ & = f(k) \end{align}\] |\square|
This will often show up in the form of |r(x^{1/n})| or |r(x^{1/n})/n|, i.e. with |h(n)=n^{-1}| and |f_x(k) = r(x^k)| or |f_x(k) = kr(x^k)|. Typically, we’ll then be computing |f_x(1) = r(x)|.
Lambert Series
As a brief aside, it’s worth mentioning Lambert Series.
Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty a(n) \sum_{k=1}^\infty x^{kn} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]
This leads to: \[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\] and, writing |\varphi| for Euler’s totient function discussed below: \[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]
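Both identities are easy to confirm on truncated power series. A sketch of mine, reusing `mobius` from the previous sketch and taking for granted (it's established below) that |\varphi(n)| counts the integers in |[n]| coprime to |n|:

```python
from math import gcd

def phi(n):   # Euler's totient: how many k in [1, n] have gcd(k, n) = 1
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def lambert_coeffs(a, N):
    """Coefficients of x^1 .. x^N in sum_n a(n) x^n/(1 - x^n)."""
    c = [0] * (N + 1)
    for n in range(1, N + 1):
        for k in range(n, N + 1, n):   # x^n/(1-x^n) = x^n + x^{2n} + x^{3n} + ...
            c[k] += a(n)
    return c[1:]

assert lambert_coeffs(mobius, 30) == [1] + [0] * 29      # = x
assert lambert_coeffs(phi, 30) == list(range(1, 31))     # = x/(1-x)^2
```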
Inclusion-Exclusion
The Möbius and |\zeta| functions can be generalized to incidence algebras where this form comes from the incidence algebra induced by the divisibility order². A notable and relevant example of a Möbius function for another, closely related, incidence algebra arises when we consider the incidence algebra induced by finite multisets with the inclusion ordering. For |T| a finite multiset, we get |\mu(T) = \begin{cases}0,&T\text{ has repeated elements}\\(-1)^{\vert T\vert},&T\text{ is a set}\end{cases}|. Since we can view a natural number as a finite multiset of primes, and we can always relabel the elements of a finite multiset with distinct primes, this is equivalent to the Möbius function we’ve been using.
This leads to a nice and compact way of describing the principle of inclusion-exclusion. Let |A| and |S| be (finite) multisets with |S \subseteq A| and assume we have |f| and |g| defined on the set of sub-multisets of |A|. If \[g(A) = \sum_{S\subseteq A} f(S)\] then \[f(A) = \sum_{S\subseteq A}\mu(A\setminus S)g(S)\] and this is Möbius inversion for this notion of Möbius function. We can thus take a different perspective on Möbius inversion. If |P| is a finite multiset of primes, then \[g(n_P) = \sum_{Q\subseteq P}f(n_Q) \iff f(n_P) = \sum_{Q\subseteq P}\mu(P\setminus Q)g(n_Q)\] recalling that |Q\subseteq P \iff n_Q \mid n_P| and |n_{P\setminus Q} = n_P/n_Q| when |Q\subseteq P|.
We get traditional inclusion-exclusion by noting that |\mu(T)=(-1)^{\vert T\vert}| when |T| is a set, i.e. all elements have multiplicity at most |1|. Let |I| be a finite set and assume we have a family of finite sets, |\{T_i\}_{i\in I}|. Write |T = \bigcup_{i\in I}T_i| and define |\bigcap_{i\in\varnothing}T_i = T|.
Define \[f(J) = \left\vert\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i\right\vert\] for |J\subseteq I|. In particular, |f(I) = 0|. |f(J)| is then the number of elements shared by all |T_i| for |i\notin J| and no |T_j| for |j\in J|. Every |x \in \bigcup_{i\in I}T_i| is thus associated to exactly one such subset of |I|, namely |\{j\in I\mid x\notin T_j\}|. Formally, |x \in \bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i \iff J = \{j\in I\mid x\notin T_j\}| so each |\bigcap_{i\in I\setminus J}T_i\setminus\bigcup_{i \in J}T_i| is disjoint and \[g(J) = \sum_{S\subseteq J}f(S) = \left\vert\bigcup_{S\subseteq J}\left(\bigcap_{i\in I\setminus S}T_i\setminus\bigcup_{i \in S}T_i\right)\right\vert = \left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] for |J \subseteq I|. In particular, |g(I) = \vert\bigcup_{i\in I}T_i\vert|.
By the Möbius inversion formula for finite sets, we thus have: \[f(J) = \sum_{S\subseteq J}(-1)^{\vert J\vert - \vert S\vert}g(S)\] which for |J = I| gives: \[ 0 = \sum_{J\subseteq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \left\vert\bigcup_{i\in I}T_i\right\vert + \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert \] which is equivalent to the more usual form: \[\left\vert\bigcup_{i\in I}T_i\right\vert = \sum_{J\subsetneq I}(-1)^{\vert I\vert - \vert J\vert - 1}\left\vert\bigcap_{i\in I\setminus J}T_i\right\vert = \sum_{\varnothing\neq J\subseteq I}(-1)^{\vert J\vert + 1}\left\vert\bigcap_{i\in J}T_i\right\vert \]
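A brute-force check of this final form on a random family of sets (my own sketch; nothing about it is specific to primes):

```python
from itertools import combinations
from random import randrange, seed

seed(0)
family = [{randrange(20) for _ in range(8)} for _ in range(4)]

union_size = len(set().union(*family))
incl_excl = sum((-1) ** (len(J) + 1) * len(set.intersection(*J))
                for r in range(1, len(family) + 1)
                for J in combinations(family, r))
assert union_size == incl_excl
```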
|\varphi|
An obvious thing to explore is to apply Möbius inversion to various arithmetic functions. A fairly natural starting point is to apply Möbius inversion to the identity function. From the above results, we know that the resulting function, which we’ll call |\varphi|, will satisfy |\mathcal D[\varphi](s) = \zeta(s-1)/\zeta(s) = \mathcal D[\operatorname{id}\star\mu](s)|. We also immediately have the property that |n = \sum_{d \mid n}\varphi(d)|. Using Euler’s product formula we have: \[\begin{flalign} \zeta(s-1)/\zeta(s) & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - p^{-s+1}} \\ & = \prod_{p \in \mathbb P} \frac{1 - p^{-s}}{1 - pp^{-s}} \\ & = \prod_{p \in \mathbb P} (1 - p^{-s})\sum_{n=0}^\infty p^n p^{-ns} \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-s} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(\sum_{n=0}^\infty p^n p^{-ns}\right) - \left(\sum_{n=0}^\infty p^n p^{-(n + 1)s}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty p^n p^{-ns}\right) - \left(\sum_{n=1}^\infty p^{n-1} p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty (p^n - p^{n-1}) p^{-ns}\right) \\ & = \prod_{p \in \mathbb P} \left(1 + \sum_{n=1}^\infty \varphi(p^n) p^{-ns}\right) \\ & = \mathcal D[\varphi](s) \end{flalign}\]
So |\varphi| is the multiplicative function defined by |\varphi(p^n) = p^n - p^{n-1}|. For |p^n|, we can see that this counts the number of positive integers less than or equal to |p^n| which are coprime to |p^n|. There are |p^n| positive integers less than or equal to |p^n|, and every |p|th one is a multiple of |p| so |p^n/p = p^{n-1}| are not coprime to |p^n|. All the remainder are coprime to |p^n| since they don’t have |p| in their prime factorizations and |p^n| only has |p| in its. We need to verify that this interpretation is multiplicative. To be clear, we know that |\varphi| is multiplicative and that this interpretation works for |p^n|. The question is whether |\varphi(n)| for general |n| meets the above description, i.e. whether the count of positive integers at most |n| that are coprime to |n| is multiplicative in |n|.
Theorem: The number of positive integers at most |n| that are coprime to |n| is a multiplicative function of |n| and is equal to |\varphi(n)|.
Proof: |\varphi = \mu\star\operatorname{id}|. We have:
\[\begin{flalign} \varphi(n_P) & = \sum_{d\mid n_P}\mu(d)\frac{n_P}{d} \\ & = \sum_{Q\subseteq P}\mu(Q)\frac{n_P}{n_Q} \\ & = \sum_{Q\subseteq \mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{flalign}\]
We can see an inclusion-exclusion pattern. Specifically, let |C_k = \{ c \in [k] \mid \gcd(c, k) = 1\}| be the numbers less than or equal to |k| and coprime to |k|. Let |S_{k,m} = \{ c \in [k] \mid m \mid c\}|. We have |S_{k,a} \cap S_{k,b} = S_{k,\operatorname{lcm}(a,b)}|. Also, when |c \mid k|, then |\vert S_{k,c}\vert = k/c|. |C_{n_P} = [n_P] \setminus \bigcup_{p \in \mathrm{dom}(P)} S_{n_P,p}| because every number not coprime to |n_P| shares some prime factor with it. Applying inclusion-exclusion to the union yields \[\begin{align} \vert C_{n_P}\vert & = n_P - \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert+1}\left\vert \bigcap_{p\in Q}S_{n_P,p}\right\vert \\ & = n_P + \sum_{\varnothing\neq Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{\prod_{p\in Q}p} \\ & = \sum_{Q\subseteq\mathrm{dom}(P)}(-1)^{\vert Q\vert}\frac{n_P}{n_Q} \end{align}\] |\square|
Many of you will already have recognized that this is Euler’s totient function.
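The theorem gives two independent ways to compute |\varphi|: count coprime residues directly, or use multiplicativity with |\varphi(p^n) = p^n - p^{n-1}|. A sketch of mine checking them against each other and against |n = \sum_{d\mid n}\varphi(d)|:

```python
from math import gcd

def phi_count(n):    # direct count of integers in [1, n] coprime to n
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def phi_mult(n):     # multiplicative form: phi(p^k) = p^k - p^(k-1)
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            pk = 1
            while n % p == 0:
                n //= p
                pk *= p
            result *= pk - pk // p
        p += 1
    if n > 1:                 # a leftover prime factor, to the first power
        result *= n - 1
    return result

assert all(phi_count(n) == phi_mult(n) for n in range(1, 500))
assert all(sum(phi_mult(d) for d in range(1, n + 1) if n % d == 0) == n
           for n in range(1, 200))    # n = sum of phi(d) over d | n
```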
Combinatorial Species
The book Combinatorial Species and Tree-Like Structures has many examples where Dirichlet convolutions and Möbius inversion come up³. A combinatorial species is a functor |\operatorname{Core}(\mathbf{FinSet})\to\mathbf{FinSet}|. Any permutation on a finite set can be decomposed into a collection of cyclic permutations. Let |U| be a finite set of cardinality |n| and |\pi : U \cong U| a permutation of |U|. For any |u\in U|, there is a smallest |k\in\mathbb N_+| such that |\pi^k(u) = u| where |\pi^{k+1} = \pi \circ \pi^k| and |\pi^0 = \operatorname{id}|. The |k| elements |\mathcal O(u)=\{\pi^{i-1}(u)\mid i\in[k]\}| make up a cycle of length |k|, and |\pi| restricted to |U\setminus \mathcal O(u)| is a permutation on this smaller set. We can just inductively pull out another cycle until we run out of elements. Write |\pi_k| for the number of cycles of length |k| in the permutation |\pi|. We clearly have |n = \sum_{k=1}^\infty k\pi_k| as every cycle of length |k| has |k| elements in it.
Write |\operatorname{fix}\pi| for the number of fixed points of |\pi|, i.e. the cardinality of the set |\{u\in U\mid \pi(u) = u\}|. Clearly, every element that is fixed by |\pi^k| needs to be in a cycle whose length divides |k|. This leads to the equation:
\[ \operatorname{fix}\pi^k = \sum_{d\mid k} d\pi_d = ((d \mapsto d\pi_d) \star \bar 1)(k)\]
Since |F(\pi^k) = F(\pi)^k| for a combinatorial species |F|, Möbius inversion, as explicitly stated in Proposition 2.2.3 of Combinatorial Species and Tree-Like Structures, leads to:
\[k(F(\pi))_k = \sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = (\mu\star(d\mapsto \operatorname{fix}F(\pi^d)))(k) \]
If we Dirichlet convolve both sides of this with |\operatorname{id}|, replacing |F(\pi)| with |\beta| as it doesn’t matter that this permutation comes from an action of a species, we get:
\[\sum_{d\mid m} d\beta_d \frac{m}{d} = m\sum_{d\mid m} \beta_d = (\varphi\star(d\mapsto \operatorname{fix}\beta^d))(m)\]
This is just using |\varphi = \operatorname{id}\star\mu|. If we choose |m| such that |\beta^m = \operatorname{id}|, then we get |\sum_{d\mid m} \beta_d = \sum_{k=1}^\infty \beta_k| because |\beta_k| will be |0| for all the |k| which don’t divide |m|. This makes the previous equation into equation 2.2 (34) in the book.
Since we know |n = \sum_{k=1}^\infty k\pi_k| for any permutation |\pi|, we also get: \[\vert F([n])\vert = \sum_{k=1}^\infty\sum_{d\mid k}\mu\left(\frac{k}{d}\right)\operatorname{fix}F(\pi^d) = \sum_{k=1}^\infty(\mu\star(d\mapsto\operatorname{fix}F(\pi^d)))(k)\]
These equations give us a way to compute some of these divisor sums by looking at the number of fixed points and cycles of the action of species and vice versa. For example, 2.3 (49) is a series of Dirichlet convolutions connected to weighted species.
Example 12 from this book presents a nice and perhaps surprising identity. The core of it can be written as: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|. We can rewrite this definition as the characterization |k\rho_k(a) = (\varphi\star a^{({-})})(k)|. Recalling that |\varphi = \mu \star \operatorname{id}| and |\ln(1-x) = -\sum_{n=1}^\infty x^n/n|, we get the following derivation:
Theorem: \[\sum_{k=1}^\infty\ln(1-ax^k) = \sum_{k=1}^\infty\rho_k(a)\ln(1-x^k)\] where |\rho_k(a) = k^{-1}\sum_{d\mid k}\varphi(k/d)a^d|.
Proof: \[\begin{flalign} \sum_{k=1}^\infty\ln(1-ax^k) & = -\sum_{k=1}^\infty\sum_{n=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{n=1}^\infty\sum_{k=1}^\infty \frac{a^n x^{nk}}{n} \\ & = -\sum_{N=1}^\infty\sum_{k\mid N} \frac{a^{N/k} x^N}{N/k} \tag{N=nk} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N} ka^{N/k} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\varepsilon\star\operatorname{id}\star a^{({-})})(N) \tag{the trick} \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\mu\star\operatorname{id}\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}(\bar 1\star\varphi\star a^{({-})})(N) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}(\varphi\star a^{({-})})(k) \\ & = -\sum_{N=1}^\infty\frac{x^N}{N}\sum_{k\mid N}k\rho_k(a) \\ & = -\sum_{k=1}^\infty\rho_k(a)\sum_{n=1}^\infty\frac{x^{nk}}{n} \tag{N=nk again} \\ & = \sum_{k=1}^\infty\rho_k(a) \ln(1-x^k) \end{flalign}\] |\square|
Derivative of Dirichlet series
We can easily compute the derivative of a Dirichlet series (assuming sufficiently strong convergence so we can push the differentiation into the sum):
\[\begin{flalign} \mathcal D[f]’(s) & = \frac{d}{ds}\sum_{n=1}^\infty f(n)n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}n^{-s} \\ & = \sum_{n=1}^\infty f(n)\frac{d}{ds}e^{-s\ln n} \\ & = \sum_{n=1}^\infty -f(n)\ln n e^{-s\ln n} \\ & = -\sum_{n=1}^\infty f(n)\ln n n^{-s} \\ & = -\mathcal D[f\ln](s) \end{flalign}\]
This leads to the identity |\frac{d}{ds}\ln\mathcal D[f](s) = \mathcal D[f]’(s)/\mathcal D[f](s) = -\mathcal D[f\ln \star f^{-1}](s)| where |f^{-1}| is the Dirichlet inverse of |f| discussed below. For example, we have |-\zeta’(s)/\zeta(s) = \mathcal D[\ln \star \mu](s)|. Using the Euler product formula, we have |\ln\zeta(s) = -\sum_{p\in\mathbb P}\ln(1-p^{-s})|. Differentiating this gives \[\begin{flalign} \frac{d}{ds}\ln\zeta(s) & = -\sum_{p\in\mathbb P} p^{-s}\ln p/(1 - p^{-s}) \\ & = -\sum_{p\in\mathbb P} \sum_{k=1}^\infty \ln p (p^k)^{-s} \\ & = -\sum_{n=1}^\infty \Lambda(n) n^{-s} \\ & = -\mathcal D[\Lambda](s) \end{flalign}\] where |\Lambda(n) = \begin{cases}\ln p,&p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases}|. |\Lambda|, which is not a multiplicative nor an additive function, is known as the von Mangoldt function. Just to write it explicitly, the above implies |\Lambda = \ln \star \mu|, i.e. |\Lambda| is the Möbius inversion of |\ln|. This can be generalized for arbitrary completely multiplicative functions besides |\bar 1| to get |\mathcal D[f]’/\mathcal D[f] = -\mathcal D[f\Lambda]|.
We now have multiple perspectives on |\Lambda| which is a kind of “indicator function” for prime powers.
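A direct check that the prime-power description of |\Lambda| agrees with |\mu \star \ln| (my illustration, reusing `mobius` from the Möbius inversion sketch):

```python
from math import log, isclose

def mangoldt(n):
    """Λ(n) = log p when n = p^k for a prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return log(p) if n == 1 else 0.0   # pure power of p, or not
        p += 1
    return log(n)                              # n itself is prime

def mangoldt_conv(n):   # (μ ⋆ ln)(n) = sum of μ(n/d) log d over d | n
    return sum(mobius(n // d) * log(d) for d in range(1, n + 1) if n % d == 0)

assert all(isclose(mangoldt(n), mangoldt_conv(n), abs_tol=1e-9)
           for n in range(1, 300))
```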
Dirichlet Inverse
Let’s say we’re given an arithmetic function |f|, and we want to find an arithmetic function |g| such that |f \star g = \varepsilon| which we’ll call the Dirichlet inverse of |f|. We immediately get |(f \star g)(1) = f(1)g(1) = 1 = \varepsilon(1)|. So, supposing |f(1)\neq 0|, we can define |g(1) = 1/f(1)|. We then get a recurrence relation for all the remaining values of |g| via: \[0 = (f \star g)(n) = f(1)g(n) + \sum_{d \mid n, d\neq 1} f(d)g(n/d)\] for |n > 1|. Solving for |g(n)|, we have: \[g(n) = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d)\] where the right-hand side only requires |g(k)| for |k < n|. If |f| is multiplicative, then |f(1) = 1| and the inverse of |f| exists.
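The recurrence translates directly into code. A sketch of mine using exact rationals; the last line anticipates the fact noted next, that the inverse of the completely multiplicative |\operatorname{id}| is |\mu\operatorname{id}|:

```python
from fractions import Fraction

def dirichlet_inverse(f, N):
    """Tabulate g(1), ..., g(N) with f ⋆ g = ε via the recurrence above."""
    g = {1: 1 / Fraction(f(1))}
    for n in range(2, N + 1):
        acc = sum(f(d) * g[n // d] for d in range(2, n + 1) if n % d == 0)
        g[n] = -acc / f(1)
    return g

f = lambda n: n                      # completely multiplicative
g = dirichlet_inverse(f, 100)
assert all(sum(f(d) * g[n // d] for d in range(1, n + 1) if n % d == 0)
           == (1 if n == 1 else 0) for n in range(1, 101))   # f ⋆ g = ε
assert all(g[n] == mobius(n) * n for n in range(1, 101))     # g = μ·id
```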
If |f| is completely multiplicative, its Dirichlet inverse is |\mu f|. This follows easily from |f \star \mu f = (\bar 1 \star \mu)f = \varepsilon f = \varepsilon|. As an example, |({-})^z| is completely multiplicative so its inverse is |({-})^z\mu|. Since the inverse of a Dirichlet convolution is the convolution of the inverses, we get |\varphi^{-1}(n) = \sum_{d\mid n}d\mu(d)|. Not to be confused with |\varphi(n) = (\operatorname{id}\star\mu)(n) = \sum_{d\mid n} d\mu(n/d)|.
Less trivially, the inverse of a multiplicative function is also a multiplicative function. We can prove it by complete induction on |\mathbb N_+| using the formula for |g| from above.
Theorem: If |f\star g = \varepsilon|, then |g| is multiplicative when |f| is.
Proof: Let |n = ab| where |a| and |b| are coprime. If |a| (or, symmetrically, |b|) is equal to |1|, then since |g(1) = 1/f(1) = 1|, we have |g(1n) = g(1)g(n) = g(n)|. Now assume neither |a| nor |b| are |1| and, as the induction hypothesis, assume that |g| is multiplicative on all numbers less than |n|. We have: \[\begin{flalign} g(ab) & = -\sum_{d\mid ab,d\neq 1}f(d)g(ab/d) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_ad_b)g(ab/(d_ad_b)) \\ & = -\sum_{d_a \mid a}\sum_{d_b \mid b,d_a d_b \neq 1}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\ & = -\sum_{d_b \mid b,d_b \neq 1}f(d_b)g(a)g(b/d_b) - \sum_{d_a \mid a,d_a \neq 1}\sum_{d_b \mid b}f(d_a)f(d_b)g(a/d_a)g(b/d_b) \\ & = -g(a)\sum_{d \mid b,d \neq 1}f(d)g(b/d) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a)\sum_{d_b \mid b}f(d_b)g(b/d_b) \\ & = g(a)g(b) - \sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) (f \star g)(b) \\ & = g(a)g(b) - \varepsilon(b)\sum_{d_a \mid a,d_a \neq 1}f(d_a)g(a/d_a) \\ & = g(a)g(b) \end{flalign}\] |\square|
Assuming |f| has a Dirichlet inverse, we also have: \[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] immediately from the convolution theorem.
More Examples
Given a multiplicative function |f|:
\[\begin{align} \mathcal D[f(\gcd({-},n_P))](s) & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\sum_{n=0}^\infty f(p^{\min(k,n)})p^{-ns}\right) \\ & = \zeta(s)\prod_{(p,k)\in P}(1 - p^{-s})\left(\frac{f(p^k)p^{-(k+1)s}}{1 - p^{-s}} + \sum_{n=0}^k f(p^n)p^{-ns}\right) \end{align}\]
As an example, |\eta(s) = (1 - 2^{1-s})\zeta(s) = \mathcal D[f](s)| where |f(n) = \begin{cases}-1,&2\mid n\\1,&2\nmid n\end{cases}|, i.e. |f(n) = (-1)^{n-1}|.
Alternatively, |f(n) = \mu(\gcd(n, 2))| and we can apply the above formula to see: \[\begin{flalign} \mathcal D[\mu(\gcd({-},2))] & = \zeta(s)(1-2^{-s})\left(\frac{\mu(2)2^{-2s}}{1 - 2^{-s}} + \sum_{n=0}^1 \mu(2^n)2^{-ns}\right) \\ & = \zeta(s)(1-2^{-s})\left(\frac{-2^{-2s}}{1 - 2^{-s}} + 1 - 2^{-s}\right) \\ & = \zeta(s)(-2^{-2s} + (1 - 2^{-s})^2) \\ & = \zeta(s)(1 - 2^{1-s}) \end{flalign}\]
|\lambda| and |\gamma|
Recalling, |\lambda| is completely multiplicative and is characterized by |\lambda(p) = -1|.
We can show that |\mathcal D[\lambda](s) = \zeta(2s)/\zeta(s)| which is equivalent to saying |\bar 1^{[2]} \star \mu = \lambda| or |\lambda\star\bar 1 = \bar 1^{[2]}| where |\bar 1^{[2]}| is the indicator function of perfect squares.
\[\begin{flalign} \zeta(2s)/\zeta(s) & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{1-(p^{-s})^2} \\ & = \prod_{p\in\mathbb P} \frac{1-p^{-s}}{(1-p^{-s})(1+p^{-s})} \\ & = \prod_{p\in\mathbb P} (1 + p^{-s})^{-1} \\ & = \prod_{p\in\mathbb P} (1 - \lambda(p)p^{-s})^{-1} \\ & = \mathcal D[\lambda](s) \end{flalign}\]
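Numerically, |\lambda\star\bar 1 = \bar 1^{[2]}| says that summing |\lambda| over the divisors of |n| detects perfect squares, which is easy to confirm (a sketch of mine):

```python
from math import isqrt

def liouville(n):
    """λ(n) = (-1)^Ω(n), with Ω counting prime factors including multiplicity."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    if n > 1:
        count += 1
    return (-1) ** count

def is_square(n):
    return 1 if isqrt(n) ** 2 == n else 0

assert all(sum(liouville(d) for d in range(1, n + 1) if n % d == 0)
           == is_square(n) for n in range(1, 500))
```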
We have |\lambda\mu = \vert\mu\vert = \mu\mu| is the inverse of |\lambda| so |\mathcal D[\vert\mu\vert](s) = \zeta(s)/\zeta(2s)|.
Recalling, |\gamma| is multiplicative and is characterized by |\gamma(p^n) = -1|.
\[\begin{flalign} \mathcal D[\gamma](s) & = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty \gamma(p^n)p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \sum_{n=1}^\infty p^{-ns}\right) \\ & = \prod_{p \in \mathbb P}\left(1 - \left(\sum_{n=0}^\infty p^{-ns} - 1\right)\right) \\ & = \prod_{p \in \mathbb P}\frac{2(1 - p^{-s}) - 1}{1 - p^{-s}} \\ & = \prod_{p \in \mathbb P}\frac{1 - 2p^{-s}}{1 - p^{-s}} \end{flalign}\]
This implies that |(\gamma\star\mu)(p^n) = \begin{cases}-2, & n=1 \\ 0, & n > 1 \end{cases}|.
Indicator Functions
Let |1_{\mathbb P}| be the indicator function for the primes. We have |\omega = 1_{\mathbb P}\star\bar 1| or |1_{\mathbb P} = \omega\star\mu|. Directly, |\mathcal D[1_{\mathbb P}](s) = \sum_{p\in\mathbb P}p^{-s}| so we have |\mathcal D[\omega](s)/\zeta(s) = \sum_{p\in\mathbb P} p^{-s}|.
Lemma: |\mathcal D[1_{\mathbb P}](s)=\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)|
Proof: We proceed as follows:
\[\begin{align}
\sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\zeta(ns)
& = \sum_{n=1}^\infty \frac{\mu(n)}{n}\ln\left(\prod_{p\in\mathbb P}(1 - p^{-ns})^{-1}\right) \\
& = -\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{p\in\mathbb P}\ln(1 - p^{-ns}) \\
& = \sum_{p\in\mathbb P}\sum_{n=1}^\infty \frac{\mu(n)}{n}\sum_{k=1}^\infty p^{-kns}/k \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \sum_{N=kn} \frac{\mu(n)}{N}p^{-Ns} \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\mu(n) \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\mu\star\bar 1)(N) \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\varepsilon(N) \\
& = \sum_{p\in\mathbb P} p^{-s} \\
& = \mathcal D[1_{\mathbb P}](s)
\end{align}\] |\square|
Let |1_{\mathcal P}| be the indicator function for prime powers. |\Omega = 1_{\mathcal P}\star\bar 1| or |1_{\mathcal P} = \Omega\star\mu|. |\mathcal D[1_{\mathcal P}](s) = \sum_{p\in\mathbb P}(p^s - 1)^{-1}| so we have |\mathcal D[\Omega](s)/\zeta(s) = \sum_{p\in\mathbb P}(p^s - 1)^{-1}|.
Lemma: |\mathcal D[1_{\mathcal P}](s)=\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)|
Proof: This is quite similar to the previous proof.
\[\begin{align}
\sum_{n=1}^\infty \frac{\varphi(n)}{n}\ln\zeta(ns)
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}\sum_{N=kn}\varphi(n) \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N}(\varphi\star\bar 1)(N) \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty \frac{p^{-Ns}}{N} N \\
& = \sum_{p\in\mathbb P}\sum_{N=1}^\infty p^{-Ns} \\
& = \mathcal D[1_{\mathcal P}](s)
\end{align}\] |\square|
Summatory Functions
One thing we’ve occasionally been taking for granted is that the operator |\mathcal D| is injective. That is, |\mathcal D[f] = \mathcal D[g]| if and only if |f = g|. To show this, we’ll use the fact that we can (usually) invert the Mellin transform which can be viewed roughly as a version of |\mathcal D| that operates on continuous functions.
Before talking about the Mellin transform, we’ll talk about summatory functions as this will ease our later discussion.
We will turn a sum into a function of a real variable via a zero-order hold, i.e. we will take the floor of the input. Thus |\sum_{n\leq x} f(n)| is constant on any interval of the form |[k,k+1)|. It then (potentially) has jump discontinuities at integer values. The beginning of the sum is at |n=1| so for all |x<1|, the sum up to |x| is |0|. We will need a slight tweak to better deal with these discontinuities. This will be indicated by a prime on the summation sign.
For non-integer values of |x|, we have: \[\sum_{n \leq x}’ f(n) = \sum_{n \leq x} f(n)\]
For |m| an integer, we have: \[ \sum_{n \leq m}’ f(n) = \frac{1}{2}\left(\sum_{n<m} f(n) + \sum_{n \leq m} f(n)\right) = \sum_{n\leq m} f(n) - f(m)/2 \]
This kind of thing should be familiar to those who’ve worked with things like Laplace transforms of discontinuous functions. (Not for no reason…)
One reason for introducing these summatory functions is that they are a little easier to work with. Arguably, we want something like |\frac{d}{dx}\sum_{n\leq x}f(n) = \sum_{n=1}^\infty f(n)\delta(n-x)|, but that means we end up with a bunch of distribution nonsense and even more improper integrals. The summatory function may be discontinuous, but it at least has a finite value everywhere. Of course, another reason for introducing these functions is that they often are the values we’re interested in.
Several important functions arise as such “sums” of arithmetic functions:
- Mertens function: |M(x) = \sum_{n\leq x}’ \mu(n)|
- Chebyshev function: |\vartheta(x) = \sum_{p\leq x, p\in\mathbb P}’ \ln p = \sum_{n\leq x} 1_{\mathbb P}(n)\ln n|
- Second Chebyshev function: |\psi(x) = \sum_{n\leq x}’ \Lambda(n) = \sum_{n=1}^\infty \vartheta(x^{1/n})|
- The prime-counting function: |\pi(x) = \sum_{n\leq x}’ 1_{\mathbb P}(n)|
- Riemann’s prime-power counting function: |\Pi_0(x) = \sum_{n\leq x} \frac{\Lambda(n)}{\ln n} = \sum_{n=1}^\infty \sum_{p^n\leq x,p\in\mathbb P}’ n^{-1} = \sum_{n=1}^\infty\pi(x^{1/n})n^{-1}|
- |D(x) = \sum_{n\leq x}d(n)|
These are interesting in how they relate to the prime-counting function.
Let’s consider the arithmetic function |\Lambda/\ln| whose Dirichlet series is |\ln\zeta|.
We have the summation function |\sum_{n\leq x}’ \Lambda(n)/\ln(n)|, but |\Lambda(n)| is |0| except when |n=p^k| for some |p\in\mathbb P| and |k\in\mathbb N_+|. Therefore, we have \[\begin{align} \sum_{n\leq x}’ \frac{\Lambda(n)}{\ln(n)} & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\Lambda(p^k)}{\ln(p^k)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{\ln(p)}{k\ln(p)} \\ & = \sum_{k=1}^\infty\sum_{p^k\leq x, p\in\mathbb P}’ \frac{1}{k} \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p^k\leq x, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{1}{k} \sum_{p\leq x^{1/k}, p\in\mathbb P}’ 1 \\ & = \sum_{k=1}^\infty \frac{\pi(x^{1/k})}{k} \\ \end{align}\]
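For any fixed |x| this identity is a finite sum, since the terms vanish once |x^{1/k} < 2|, so we can check it directly. A sketch of mine; |x| is kept off the integers so the primed sums coincide with the plain ones:

```python
from math import isqrt

def primes_upto(m):
    sieve = [True] * (m + 1)
    sieve[0:2] = [False, False]
    for p in range(2, isqrt(m) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, m + 1) if sieve[p]]

def prime_pi(x):    # the prime-counting function π(x)
    return len(primes_upto(int(x)))

def Pi0(x):         # sum of Λ(n)/ln n over n ≤ x: each prime power p^k adds 1/k
    total = 0.0
    for p in primes_upto(int(x)):
        pk, k = p, 1
        while pk <= x:
            total += 1 / k
            pk, k = pk * p, k + 1
    return total

x = 1000.5
rhs = sum(prime_pi(x ** (1 / k)) / k for k in range(1, 11))  # x^(1/k) < 2 past k = 9
assert abs(Pi0(x) - rhs) < 1e-9
```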
|\ln\zeta(s) = s\mathcal M[\Pi_0](-s)=\mathcal D[\Lambda/\ln](s)| where |\mathcal M| is the Mellin transform, and the connection to Dirichlet series is described in the following section.
Mellin Transform
The definition of the Mellin transform and its inverse are:
\[\mathcal M[f](s) = \int_0^\infty x^s\frac{f(x)}{x}dx\] \[\mathcal M^{-1}[\varphi](x) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} x^{-s}\varphi(s)ds\]
The contour integral is intended to mean the vertical line with real part |c| traversed from negative to positive imaginary values. Modulo the opposite sign of |s| and the extra factor of |x|, this is quite similar to a continuous version of a Dirichlet series.
The Mellin transform is closely related to the two-sided Laplace transform.
\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]
Using Mellin transform properties, particularly the one for transforming the derivative, we can write the following.
\[\begin{align} \mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s) & \iff \mathcal D[f](1-s) = -(s-1)\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](s-1) \\ & \iff \mathcal D[f](1-s) = \mathcal M\left[x\mapsto \frac{d}{dx}\sum_{n\leq x}’ f(n)\right](s) \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\frac{d}{dx}\sum_{n\leq x}’ f(n)dx \\ & \iff \mathcal D[f](1-s) = \int_0^\infty x^{s-1}\sum_{n=1}^\infty f(n)\delta(x-n)dx \\ & \iff \mathcal D[f](1-s) = \sum_{n=1}^\infty f(n)n^{s-1} \\ & \iff \mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s} \end{align}\]
This leads to Perron’s formula
\[\begin{align} \sum_{n\leq x}’ f(n) & = \mathcal M^{-1}[s\mapsto -\mathcal D[f](-s)/s](x) \\ & = \frac{1}{2\pi i}\int_{-c-i\infty}^{-c+i\infty}\frac{\mathcal D[f](-s)}{-s} x^{-s} ds \\ & = -\frac{1}{2\pi i}\int_{c+i\infty}^{c-i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \\ & = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds \end{align}\]
for which we need to take the Cauchy principal value to get something defined. (See also Abel summation.)
There are side conditions on the convergence of |\mathcal D[f]| for these formulas to be justified. See the links.
Many of the operations we’ve described on Dirichlet series follow from Mellin transform properties. For example, we have |\mathcal M[f]’(s) = \mathcal M[f\ln](s)| generally.
Summary
Properties
Dirichlet Convolution
Dirichlet convolution is |(f\star g)(n) = \sum_{d\mid n} f(d)g(n/d) = \sum_{mk=n} f(m)g(k)|.
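To make the definition concrete, here is a naive Haskell transcription (the name dConv is mine, and the divisor search is brute force):
dConv :: (Int -> Integer) -> (Int -> Integer) -> (Int -> Integer)
dConv f g n = sum [f d * g (n `div` d) | d <- [1..n], n `mod` d == 0]
For example, dConv (const 1) (const 1) is the divisor-count function |d|.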
Dirichlet convolution makes the arithmetic functions into a commutative ring, with |\star| as the multiplication, |\varepsilon| as the multiplicative unit, and the usual pointwise additive structure. This is to say that Dirichlet convolution is commutative, associative, unital, and bilinear.
For |f| completely multiplicative, |f(g\star h) = fg \star fh|.
Dirichlet Inverse
For any |f| such that |f(1)\neq 0|, there is a |g| such that |f\star g = \varepsilon|; such |f| thus form a group under |\star|. The set of multiplicative functions forms a subgroup of this group, i.e. the Dirichlet convolution (and Dirichlet inverse) of multiplicative functions is multiplicative.
If |f(1) \neq 0|, then |f \star g = \varepsilon| where |g| is defined by the following recurrence:
\[\begin{flalign} g(1) & = 1/f(1) \\ g(n) & = -f(1)^{-1}\sum_{d\mid n,d\neq 1}f(d)g(n/d) \end{flalign}\]
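Transcribing the recurrence naively into Haskell (the name dInverse is mine; without memoization this is exponentially slow, but it is a direct reading of the recurrence):
dInverse :: (Int -> Rational) -> (Int -> Rational)
dInverse f = g where
  g 1 = recip (f 1)
  g n = negate (recip (f 1)) * sum [f d * g (n `div` d) | d <- [2..n], n `mod` d == 0]
As a sanity check, dInverse (const 1) agrees with the Möbius function |\mu|, matching |\varepsilon = \bar 1 \star \mu| below.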
For a completely multiplicative |f|, its Dirichlet inverse is |\mu f|.
Convolution Theorem
\[\mathcal D[f](s)\mathcal D[g](s) = \mathcal D[f\star g](s)\]
Möbius Inversion
\[\varepsilon = \bar 1 \star \mu\]
This means from a divisor sum |g(n)=\sum_{d\mid n}f(d) = (f\star\bar 1)(n)| for each |n|, we can recover |f| via |g\star\mu = f\star\bar 1\star\mu = f|. Which is to say |f(n)=\sum_{d\mid n}g(d)\mu(n/d)|. For example, since |\sum_{d\mid n}\varphi(d) = n|, inversion immediately gives |\varphi(n) = \sum_{d\mid n}\mu(d)(n/d)|.
This can be generalized via |({-})^k\mu\star({-})^k = \varepsilon|. In sums, this means when |g(n)=\sum_{d\mid n}d^k f(n/d)|, then |f(n)=\sum_{d\mid n}\mu(d)d^k g(n/d)|.
Let |h| be a completely multiplicative function. Given |g(m) = \sum_{n=1}^\infty f(mh(n))|, then |f(n) = \sum_{m=1}^\infty \mu(m)g(nh(m))|.
Using the Möbius function for finite multisets and their inclusion ordering, we can recast Möbius inversion of naturals as Möbius inversion of finite multisets (of primes) a la: \[n_P = \sum_{Q\subseteq P}\mu(P\setminus Q)n_Q = \sum_{Q\subseteq P}\mu(n_P/n_Q)n_Q = \sum_{d\mid n_P}\mu(n_P/d)d \]
As a nice result, we have: \[\sum_{n=1}^\infty\ln(1-ax^n) = \sum_{n=1}^\infty\rho_n(a)\ln(1-x^n)\] where |n\rho_n(a) = (\varphi \star a^{({-})})(n)|.
Dirichlet Series
\[\mathcal D[f](s) = \sum_{n=1}^\infty f(n)n^{-s}\]
\[\mathcal D[n\mapsto f(n)n^k](s) = \mathcal D[f](s - k)\]
\[\mathcal D[f^{-1}](s) = \mathcal D[f](s)^{-1}\] where the inverse on the left is the Dirichlet inverse.
\[\mathcal D[f]’(s) = -\mathcal D[f\ln](s)\]
For a completely multiplicative |f|, \[\mathcal D[f]’(s)/\mathcal D[f](s) = -\mathcal D[f\Lambda](s)\] and: \[\ln\mathcal D[f](s) = \mathcal D[f\Lambda/\ln](s)\]
Dirichlet series as a Mellin transform:
\[\mathcal D[f](s) = s\mathcal M\left[x\mapsto \sum_{n\leq x}’ f(n)\right](-s)\]
The corresponding inverse Mellin transform statement is called Perron’s Formula:
\[\sum_{n\leq x}’ f(n) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\frac{\mathcal D[f](s)}{s} x^s ds\]
Euler Product Formula
Assuming |f| is multiplicative, we have:
\[\mathcal D[f](s) = \prod_{p \in \mathbb P}\sum_{n=0}^\infty f(p^n)p^{-ns} = \prod_{p \in \mathbb P}\left(1 + \sum_{n=1}^\infty f(p^n)p^{-ns}\right) \]
When |f| is completely multiplicative, this can be simplified to:
\[\mathcal D[f](s) = \prod_{p \in \mathbb P}(1 - f(p)p^{-s})^{-1} \]
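For example, |f = \bar 1| recovers the classic |\zeta(s) = \prod_{p\in\mathbb P}(1 - p^{-s})^{-1}|, while |f = \lambda| gives |\prod_{p\in\mathbb P}(1 + p^{-s})^{-1} = \zeta(2s)/\zeta(s)|.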
Lambert Series
Given an arithmetic function |a|, these are series of the form: \[ \sum_{n=1}^\infty a(n) \frac{x^n}{1-x^n} = \sum_{n=1}^\infty (a \star \bar 1)(n) x^n \]
\[\sum_{n=1}^\infty \mu(n) \frac{x^n}{1-x^n} = x\]
\[\sum_{n=1}^\infty \varphi(n) \frac{x^n}{1-x^n} = \frac{x}{(1-x)^2}\]
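Both are instances of the general form: |\mu\star\bar 1 = \varepsilon| collapses the first series to |\sum_{n=1}^\infty \varepsilon(n)x^n = x|, while |\varphi\star\bar 1 = \operatorname{id}| turns the second into |\sum_{n=1}^\infty n x^n = x/(1-x)^2|.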
Arithmetic function definitions
|f(p^n)=\cdots| implies a multiplicative/additive function, while |f(p)=\cdots| implies a completely multiplicative/additive function.
|p^z| for |z\in\mathbb C| is completely multiplicative. This includes the identity function (|z=1|) and |\bar 1| (|z=0|). For any multiplicative |f|, |f\circ \gcd({-},k)| is multiplicative.
|\ln| is completely additive.
Important but neither additive nor multiplicative are the indicator functions for primes |1_{\mathbb P}| and prime powers |1_{\mathcal P}|.
The following functions are (completely) multiplicative unless otherwise specified.
\[\begin{flalign} \varepsilon(p) & = 0 \tag{unit function} \\ \bar 1(p) & = 1 = p^0 \\ \mu(p^n) & = \begin{cases}-1, & n = 1 \\ 0, & n > 1\end{cases} \tag{Möbius function} \\ \Omega(p) & = 1 \tag{additive} \\ \lambda(p) & = -1 = (-1)^{\Omega(p)} \tag{Liouville function} \\ \omega(p^n) & = 1 \tag{additive} \\ \gamma(p^n) & = -1 = (-1)^{\omega(p^n)} \\ a(p^n) & = p(n) \tag{p(n) is the partition function} \\ \varphi(p^n) & = p^n - p^{n-1} = p^n(1 - 1/p) = J_1(p^n) \tag{Euler totient function} \\ \sigma_k(p^n) & = \sum_{m=0}^n p^{km} = \sum_{d\mid p^n} d^k = \frac{p^{k(n+1)}-1}{p^k - 1} \tag{last only works for k>0} \\ d(p^n) & = n + 1 = \sigma_0 \\ f^{[k]}(p^n) & = \begin{cases}f(p^m),& km=n\\0,& k\nmid n\end{cases} \tag{f multiplicative} \\ \Lambda(n) & = \begin{cases}\ln p,&p\in\mathbb P\land\exists k\in\mathbb N_+.n=p^k \\ 0, & \text{otherwise}\end{cases} \tag{not multiplicative} \\ J_k(p^n) & = p^{kn} - p^{k(n-1)} = p^{kn}(1 - p^{-k}) \tag{Jordan totient function} \\ \psi_k(p^n) & = p^{kn} + p^{k(n-1)} = p^{kn}(1 + p^{-k}) = J_{2k}(p^n)/J_k(p^n) \tag{Dedekind psi function} \\ \end{flalign}\]
Dirichlet convolutions
\[\begin{flalign} \varepsilon & = \bar 1 \star \mu \\ \varphi & = \operatorname{id}\star\mu \\ \sigma_z & = ({-})^z \star \bar 1 = \psi_z \star \bar 1^{(2)} \\ \sigma_1 & = \varphi \star d \\ d & = \sigma_0 = \bar 1 \star \bar 1 \\ f \star f & = fd \tag{f completely multiplicative} \\ f\Lambda & = f\ln \star f\mu = f\ln \star f^{-1} \tag{f completely multiplicative, Dirichlet inverse} \\ \lambda & = \bar 1^{(2)} \star \mu \\ \vert\mu\vert & = \lambda^{-1} = \mu\lambda \tag{Dirichlet inverse} \\ 2^\omega & = \vert\mu\vert \star \bar 1 \\ \psi_z & = ({-})^z \star \vert\mu\vert \\ \operatorname{fix} \pi^{(-)} & = \bar 1 \star (k \mapsto k\pi_k) \tag{for a permutation} \\ ({-})^k & = J_k \star \bar 1 \end{flalign}\]
More Dirichlet convolution identities are here, though many are trivial consequences of the earlier properties.
Dirichlet series
\[\begin{array}{l|ll} f(n) & \mathcal D[f](s) & \\ \hline \varepsilon(n) & 1 & \\ \bar 1(n) & \zeta(s) & \\ n & \zeta(s-1) & \\ n^z & \zeta(s-z) & \\ \sigma_z(n) & \zeta(s-z)\zeta(s) & \\ \mu(n) & \zeta(s)^{-1} & \\ \vert\mu(n)\vert & \zeta(s)/\zeta(2s) & \\ \varphi(n) & \zeta(s-1)/\zeta(s) & \\ d(n) & \zeta(s)^2 & \\ \mu(\gcd(n, 2)) & \eta(s) = (1-2^{1-s})\zeta(s) & \\ \lambda(n) & \zeta(2s)/\zeta(s) & \\ \gamma(n) & \prod_{p \in \mathbb P}\frac{1-2p^{-s}}{1-p^{-s}} & \\ f^{[k]}(n) & \mathcal D[f](ks) & \\ f(n)\ln n & -\mathcal D[f]’(s) & \\ \Lambda(n) & -\zeta’(s)/\zeta(s) & \\ \Lambda(n)/\ln(n) & \ln\zeta(s) & \\ 1_{\mathbb P}(n) & \sum_{m=1}^\infty \frac{\mu(m)}{m}\ln\zeta(ms) & \\ 1_{\mathcal P}(n) & \sum_{m=1}^\infty \frac{\varphi(m)}{m}\ln\zeta(ms) & \\ \psi_k(n) & \zeta(s)\zeta(s - k)/\zeta(2s) & \\ J_k(n) & \zeta(s - k)/\zeta(s) & \end{array}\]
Viewing natural numbers as multisets, |D_n| is the set of all sub-multisets of |n|. The isomorphism described is then simply the fact that given any sub-multiset of the union of two disjoint multisets, we can sort the elements into their original multisets producing two sub-multisets of the disjoint multisets.↩︎
Incidence algebras are a decategorification of the notion of a category algebra.↩︎
In a different but related vein, we can provide a combinatorial interpretation of Dirichlet series via Dirichlet species as described in On the arithmetic product of combinatorial species.↩︎
Purely functional list concatenation, xs ++ ys in Haskell syntax, is well known to be linear time
in the length of the first input and constant time in the length of the second, i.e. xs ++ ys is
O(length xs). This leads to quadratic complexity if we have a bunch of left associated uses of
concatenation.
The ancient trick to resolve this is to, instead of producing lists, produce list-to-list functions
a la [a] -> [a] or ShowS = String -> String = [Char] -> [Char]. “Concatenation” of “lists”
represented this way is just function composition which is a constant time operation. We can lift a
list xs to this representation via the section (xs ++). This will still lead to O(length xs)
amount of work to apply this function, but a composition of such functions applied to a list will
always result in a fully right associated expression even if the function compositions aren’t
right associated.
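In Haskell, the whole trick is only a few lines (fromList and toList are my names for the conversions):
fromList :: String -> ShowS
fromList xs = (xs ++)

toList :: ShowS -> String
toList f = f ""

-- Even a left-associated composition applies (++) fully right-associated:
example :: String
example = toList ((fromList "foo" . fromList "bar") . fromList "baz")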
In the last several years, it has become popular to refer to this technique as “difference lists”. Often no justification is given for this name. When it is given, it is usually a reference to the idea of difference lists in logic programming. Unfortunately, other than both techniques giving rise to efficient concatenation, they have almost no similarities.
Functional Lists
To start, I want to do a deeper analysis of the “functional lists” approach, because I think what it is doing is a bit misunderstood and, consequently, oversold1. Let’s see how we would model this approach in an OO language without higher-order functions, such as early Java. I’ll use strings for simplicity, but it would be exactly the same for generic lists.
interface PrependTo {
String prependTo(String end);
}
class Compose implements PrependTo {
private PrependTo left;
private PrependTo right;
public Compose(PrependTo left, PrependTo right) {
this.left = left; this.right = right;
}
public String prependTo(String end) {
return this.left.prependTo(this.right.prependTo(end));
}
}
class Prepend implements PrependTo {
private String s;
public Prepend(String s) { this.s = s; }
public String prependTo(String end) {
return this.s + end;
}
}
This is just a straight, manual implementation of closures for (.) and (++) (specialized to
strings). Other lambdas not of the above two forms would lead to other implementations of
PrependTo. Let’s say, however, these are the only two forms that actually occur, which is mostly
true in Haskell practice, then another view on this OO code (to escape back to FP) is that it is an
OOP encoding of the algebraic data type:
data PrependTo = Compose PrependTo PrependTo | Prepend String
prependTo :: PrependTo -> String -> String
prependTo (Compose left right) end = prependTo left (prependTo right end)
prependTo (Prepend s) end = s ++ end
We could have also arrived at this by defunctionalizing a typical example of the technique. Modulo
some very minor details (that could be resolved by using the Church-encoded version of this), this
does accurately reflect what’s going on in the technique. Compose is clearly constant time. Less
obviously, applying these functional lists requires traversing this tree of closures – made
into an explicit tree here. In fact, this reveals that this representation could require arbitrarily
large amounts of work for a given size of output. This is due to the fact that prepending an empty
string doesn’t increase the output size but still increases the size of the tree. In practice,
it’s a safe assumption that, on average, at least one character will be prepended per leaf of the
tree which makes the overhead proportional to the size of the output.
This tree representation is arguably better than the “functional list” representation. It’s less
flexible for producers, but that’s arguably a good thing because we didn’t really want arbitrary
String -> String functions. It’s more flexible for consumers. For example, getting the head of
the list is a relatively efficient operation compared to applying a “functional list” and taking
the head of the result even in an eager language. (Laziness makes both approaches comparably
efficient.) Getting the last element is just the same for the tree version, but, even with laziness,
is much worse for the functional version. More to the point, this concrete representation allows
the concatenation function to avoid adding empty nodes to the tree whereas (.) can’t pattern
match on whether a function is the identity function or not.
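For instance, assuming these are the only two constructors, both claims are easy to realize on the tree (headPT and the smart constructor compose are hypothetical names, not from any library):
import Control.Applicative ( (<|>) )
import Data.Maybe ( listToMaybe )

-- Head of the list the tree contributes (i.e. of prependTo t ""), without
-- building the whole string.
headPT :: PrependTo -> Maybe Char
headPT (Compose l r) = headPT l <|> headPT r
headPT (Prepend s)   = listToMaybe s

-- The concatenation (.) can't express: drop empty chunks by pattern matching.
compose :: PrependTo -> PrependTo -> PrependTo
compose (Prepend "") r = r
compose l (Prepend "") = l
compose l r = Compose l r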
This view makes it very clear what the functional version is doing.
Difference Lists in Prolog
List append is the archetypal example of a Prolog program due to the novelty of its “invertibility”.
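The standard definition is:
append([], Ys, Ys).
append([X|Xs], Ys, [X|Zs]) :- append(Xs, Ys, Zs).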
For our purposes, viewing this as a function of the first two arguments, this is exactly the usual functional implementation of list concatenation with exactly the same problems. We could, of course, encode the defunctionalized version of the functional approach into (pure) Prolog. This would produce:
prepend_to(compose(Xs, Ys), End, Zs) :- prepend_to(Ys, End, End2), prepend_to(Xs, End2, Zs).
prepend_to(prepend(Xs), End, Zs) :- append(Xs, End, Zs).
(I’ll be ignoring the issues that arise due to Prolog’s untyped nature.)
However, this being a logic programming language means we have additional tools available to use that functional languages lack. Namely, unification variables. For an imperative (destructive) implementation of list concatenation, the way we’d support efficient append of linked lists is we’d keep pointers to the start and end of the list. To append two lists, we’d simply use the end pointer of the first to update the end of the first list to point at the start of the second. We’d then return a pair consisting of the start pointer of the first and the end pointer of the second.
This is exactly how Prolog difference lists work, except instead of pointers, we use unification
variables which are more principled. Concretely, we represent a list as a pair of lists, but the
second list will be represented by an unbound unification variable and the first list contains
that same unification variable as a suffix. This pair is often represented using the infix
operator (“functor” in Prolog terminology), -, e.g. Xs - Ys. We could use diff(Xs, Ys) or
some other name. - isn’t a built-in operator, it’s just a binary constructor essentially.
At the level of logic, there are no unification variables. The constraints above mean that Xs - Ys
is a list Xs which contains Ys as a suffix.
The name “difference list” is arguably motivated by the definition of concatenation in this representation.
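% the whole definition is a single clause
concat(Xs - Ys, Ys - Zs, Xs - Zs).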
This looks a lot like |Xs - Ys + Ys - Zs = Xs - Zs|. If the suffix component of the first argument
is unbound, like it’s supposed to be, then this is a constant-time operation of binding that
component to Ys. If it is bound, then we need to unify which, in the worst-case, is O(length Ys)
where the length is up to either nil or an unbound variable tail2.
We also have the unit of concat, i.e. the empty
list via3:
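empty(Xs - Xs).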
See the footnote, but this does in some way identify Xs - Ys with the “difference” of Xs and
Ys.
We get back to a “normal” list via:
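to_list(Xs - [], Xs).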
to_list is a constant-time operation, no matter what. Note, to_list binds the suffix component
of the difference list. This means that the first input no longer meets our condition to be a
difference list. In other words, to_list (and prepend_to) consumes the difference list.
More precisely, it constrains the possible suffixes the list could be.
Indeed, any operation that binds the suffix component of a difference list consumes it. For example,
concat consumes its first argument.
Of course, it still makes logical sense to work with the difference list when its suffix component
is bound, it’s just that its operational interpretation is different. More to the point, given a
difference list, you cannot prepend it (via prepend_to or concat) to two different lists to get
two different results.
Converting from a list does require traversing the list since we need to replace the nil node, i.e.
[], with a fresh unbound variable. Luckily, this is exactly what append does.
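from_list(Xs, Ys - Zs) :- append(Xs, Zs, Ys).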
from_list also suggests this “difference list” idea. If all of Xs, Ys, and Zs are ground terms, then from_list(Xs, Ys - Zs) holds when append(Xs, Zs, Ys) holds, and that is exactly when our invariant is maintained, i.e. when Zs is a suffix of Ys. Writing these relations more functionally and writing append as addition, we’d have:
\[\mathtt{from\_list}(Xs) = Ys - Zs \iff Xs + Zs = Ys\]
If we did want to “duplicate” a difference list, we’d essentially need to convert it to a (normal)
list with to_list, and then we could use from_list multiple times on that result. This would,
of course, still consume the original difference list. We’d also be paying O(length Xs) for every
duplicate, including to replace the one we just consumed4.
That said, we can prepend a list to a difference list without consuming it. We can perform other actions with the risk of (partially) consuming the list, e.g. indexing into the list. Indexing into the list would force the list to be at least a certain length, but still allow prepending to any list that will result in a final list at least that long.
Comparison
I’ll start the comparison with a massive discrepancy that we will ignore going forward. Nothing
enforces that a value of type ShowS actually just appends something to its input. We could use
abstract data type techniques or the defunctionalized version to avoid this. To be fair, difference
lists also need an abstraction barrier to ensure their invariants, though their failure modes are
different. A difference list can’t change what it is based on what it is prepended to.
| Functional Representation | Difference Lists |
|---|---|
| constant-time concatenation | constant-time concatenation |
| constant-time conversion from a list (though you pay for it later) | O(n) conversion from a list |
| persistent | non-persistent, requires linear use |
| represented by a tree of closures | represented by a pair of a list and a unification variable |
| O(n) (or worse!) conversion to a list | constant-time conversion to a list |
| defunctionalized version can be implemented in pretty much any language | requires at least single-assignment variables |
| unclear connection to being the difference of two lists (which two lists?) | mathematical, if non-obvious, connection to being the difference of two (given) lists |
As an illustration of the difference between persistent and non-persistent uses, the function:
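-- something like:
double :: ShowS -> ShowS
double f = f . f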
is a perfectly sensible function on ShowS values that behaves exactly as you’d expect. On the
other hand:
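% something like:
double(In, Out) :- concat(In, In, Out).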
is nonsense that will fail the occurs check (if it is enabled, otherwise it will create a cyclic
list) except for when In is the empty difference list.
Conclusion
I hope I’ve illustrated that the functional representation is not just not difference lists, but is, in fact, wildly different from difference lists.
This functional representation is enshrined into Haskell via the ShowS type and related functions,
but I’d argue the concrete tree representation is actually clearer and better. The functional
representation is more of a cute trick that allows us to reuse existing functions. Really, ShowS
should have been an abstract type.
Difference lists are an interesting example of how imperative ideas can be incorporated into a declarative language. That said, difference lists come with some of the downsides of an imperative approach, namely the lack of persistence.
As far as I’m aware, there isn’t an unambiguous and widely accepted name for this functional representation. Calling it “functional lists” or something like that is, in my opinion, very ambiguous and potentially misleading. I think the lack of a good name for this is why “difference lists” started becoming popular. As I’ve argued, using “difference list” in this context is even more misleading and confusing.
If people really want a name, one option might be “delta list”. I don’t think this term is used. It keeps the intuitive idea that the functional representation represents some “change” to a list, a collection of deltas that will all be applied at once, but it doesn’t make any false reference to difference lists. I’m not super into this name; I just want something that isn’t “difference list” or otherwise misleading.
To be clear, it’s still much, much, better than using plain concatenation.↩︎
Such a length relation couldn’t be written in pure Prolog but can in actual Prolog.↩︎
For those algebraically minded, this almost makes concat and empty into another monoid except concat is partial, but such a partial monoid is just a category! In other words, we have a category whose objects are lists and whose homsets are, at most, singletons containing Xs - Ys for Hom(Xs, Ys). If we maintain our invariant that we have Xs - Ys only when Ys is a suffix of Xs, this thin category is exactly the category corresponding to the reflexive, transitive “has suffix” relation. We could generalize this to any monoid via a “factors through” relation, i.e. |\mathrm{Hom}(m, n)| is inhabited if and only if |\exists p. m = pn| which you can easily prove is a reflexive, transitive relation given the monoid axioms. However, for a general monoid, we can have a (potentially) non-thin category by saying |p \in \mathrm{Hom}(m,n)| if and only if |m = pn|. The category will be thin if and only if the monoid is cancellative. This is exactly the slice category of the monoid viewed as a one-object category.↩︎
Again, in actual Prolog, we could make a duplicate without consuming the original, though it would still take O(length Xs) time using the notion of length mentioned before.↩︎
Classical First-Order Logic (Classical FOL) has an absolutely central place in traditional logic, model theory, and set theory. It is the foundation upon which ZF(C), which is itself often taken as the foundation of mathematics, is built. When classical FOL was being established there was a lot of study and debate around alternative options. There are a variety of philosophical and metatheoretic reasons supporting classical FOL as The Right Choice.
This all happened, however, well before category theory was even a twinkle in Mac Lane’s and Eilenberg’s eyes, and when type theory was taking its first stumbling steps.
My focus in this article is on what classical FOL looks like to a modern categorical logician. This can be neatly summarized as “classical FOL is the internal logic of a Boolean First-Order Hyperdoctrine.” Each of the three words in this term, “Boolean”, “First-Order”, and “Hyperdoctrine”, suggest a distinct axis in which to vary the (class of categorical models of the) logic. All of them have compelling categorical motivations to be varied.
Boolean
The first and simplest is the term “Boolean”. This is what differentiates the categorical semantics of classical (first-order) logic from constructive (first-order) logic. Considering arbitrary first-order hyperdoctrines would give us a form of intuitionistic first-order logic.
It is fairly rare that the categories categorists are interested in are Boolean. For example, most toposes, all of which give rise to first-order hyperdoctrines, are not Boolean. The assumption that they are tends to correspond to a kind of “discreteness” that’s often at odds with the purpose of the topos. For example, a category of sheaves on a topological space is Boolean if and only if every open subset of that space is closed. This implies, for example, that such a space is extremally disconnected.
First-Order
The next term is the term “first-order”. As the name suggests, a first-order hyperdoctrine has the necessary structure to interpret first-order logic. The question, then, is what kind of categories have this structure and only this structure. The answer, as far as I’m aware, is not many.
Many (classes of) categories have the structure to be first-order hyperdoctrines, but often they have additional structure as well that it seems odd to ignore. The most notable and interesting example is toposes. All elementary toposes (which includes all Grothendieck toposes) have the structure to give rise to a first-order hyperdoctrine. But, famously, they also have the structure to give rise to a higher order logic. Even more interesting, while Grothendieck toposes, being elementary toposes, technically do support the necessary structure for first-order logic, the natural morphisms of Grothendieck toposes, geometric morphisms, do not preserve that structure, unlike the logical functors between elementary toposes.
The natural internal logic for Grothendieck toposes turns out to be geometric logic. This is a logic that lacks universal quantification and implication (and thus negation) but does have infinitary disjunction. This leads to a logic that is, at least superficially, incomparable to first-order logic. Closely related logics are regular logic and coherent logic which are sub-logics of both geometric logic and first-order logic.
We see, then, just from the examples of the natural logics of toposes, none of them are first-order logic, and we get examples that are more powerful, less powerful, and incomparable to first-order logic. Other common classes of categories give other natural logics, such as the cartesian logic from left exact categories, and monoidal categories give rise to (ordered) linear logics. We get the simply typed lambda calculus from cartesian closed categories which leads to the next topic.
Hyperdoctrine
A (posetal) hyperdoctrine essentially takes a category and, for each object in that category, assigns to it a poset of “predicates” on that object. In many cases, this takes the form of the Sub functor assigning to each object its poset of subobjects. Various versions of hyperdoctrines will require additional structure on the source category, these posets, and/or the functor itself to interpret various logical connectives. For example, a regular hyperdoctrine requires the source category to have finite limits, the posets to be meet-semilattices, and the functor to give rise to monotonic functions with left adjoints satisfying certain properties. This notion of hyperdoctrines is suitable for regular logic.
It’s very easy to recognize that these functors are essentially indexed |(0,1)|-categories. This immediately suggests that we should consider higher categorical versions or at the very least normal indexed categories.
What this means for the logic is that we move from proof-irrelevant logic to proof-relevant logic. We now have potentially multiple ways a “predicate” could “entail” another “predicate”. We can present the simply typed lambda calculus in this indexed category manner. This naturally leads/connects to the categorical semantics of type theories.
Pushing forward to |(\infty, 1)|-categories is also fairly natural, as it’s natural to want to talk about an entailment holding for distinct but “equivalent” reasons.
Summary
Moving in all three of these directions simultaneously leads pretty naturally to something like Homotopy Type Theory (HoTT). HoTT is a naturally constructive (but not anti-classical) type theory aimed at being an internal language for |(\infty, 1)|-toposes.
Why Classical FOL?
Okay, so why did people pick classical FOL in the first place? It’s not like the concept of, say, a higher-order logic wasn’t considered at the time.
Classical versus intuitionistic logic was debated at the time, but primarily as a philosophical argument, and the defense of Intuitionism was not very compelling (to me, and evidently to people at the time). The focus would probably have been more on (classical) FOL versus second- (or higher-)order logic.
Oversimplifying, the issue with second-order logic is fairly evident from the semantics. There are two main approaches: Henkin-semantics and full (or standard) semantics. Henkin-semantics keeps the nice properties of (classical) FOL but fails to get the nice properties, namely categoricity properties, of second-order logic. This isn’t surprising as Henkin-semantics can be encoded into first-order logic. It’s essentially syntactic sugar. Full semantics, however, states that the interpretation of predicate sorts is power sets of (cartesian products of) the domain1. This leads to massive completeness problems as our metalogical set theory has many, many ways of building subsets of the domain. There are metatheoretic results that state that there is no computable set of logical axioms that would give us a sound and complete theory for second-order logic with respect to full semantics. This aspect is also philosophically problematic, because we don’t want to need set theory to understand the very formulation of set theory. Thus Quine’s comment that “second-order logic [was] set theory in sheep’s clothing”.
On the more positive and (meta-)mathematical side, we have results like Lindström’s theorem which states that classical FOL is the strongest logic that simultaneously satisfies (downward) Löwenheim-Skolem and compactness. There’s also a syntactic result by Lindström which characterizes first-order logic as the only logic having a recursively enumerable set of tautologies and satisfying Löwenheim-Skolem2.
The Catch
There’s one big caveat to the above. All of the above results are formulated in traditional model theory which means there are various assumptions built in to their statements. In the language of categorical logic, these assumptions can basically be summed up in the statement that the only category of semantics that traditional model theory considers is Set.
This is an utterly bizarre thing to do from the standpoint of categorical logic.
The issues with full semantics follow directly from this choice. If, as categorical logic would have us do, we considered every category with sufficient structure as a potential category of semantics, then our theory would not be forced to follow every nook and cranny of Set’s notion of subset to be complete. Valid formulas would need to be true not only in Set but in wildly different categories, e.g. every (Boolean) topos.
These traditional results are also often very specific to classical FOL. Dropping this constraint of classical logic would lead to an even broader class of models.
Categorical Perspective on Classical First-Order Logic
A Boolean category is just a coherent category where every subobject has a complement. Since coherent functors preserve complements, we have that the category of Boolean categories is a full subcategory of the category of coherent categories.
One nice thing about, specifically, classical first-order logic from the perspective of category theory is the following. First, coherent logic is the fragment of geometric logic with only finitary disjunction. Via Morleyization, we can encode classical first-order logic into coherent logic such that the categories of models of each are equivalent. This implies that a classical FOL formula is valid if and only if its encoding is. Morleyization allows us to analyze classical FOL using the tools of classifying toposes. On the one hand, this once again suggests the importance of coherent logic, but it also means that we can use categorical tools with classical FOL.
Conclusion
There are certain things that I and, I believe, most logicians take as table stakes for a (foundational) logic3. For example, checking a proof should be computably decidable. For these reasons, I am in complete accord with early (formal) logicians that classical second-order logic with full semantics is an unacceptably worse alternative to classical first-order logic.
However, when it comes to statements about the specialness of FOL, a lot of them seem to be more statements about traditional model theory than FOL itself, and also statements about the philosophical predilections of the time. I feel that philosophical attitudes among logicians and mathematicians have shifted a decent amount since the beginning of the 20th century. We have different philosophical predilections today than then, but they are informed by another hundred years of thought, and they are more relevant to what is being done today.
Martin-Löf type theory (MLTT) and its progeny also present an alternative path with their own philosophical and metalogical justifications. I mention this to point out actual cases of foundational frameworks that a (very) superficial reading of traditional model theory results would seem to have “ruled out”. Even if one thinks that FOL+ZFC (or whatever) is the better foundation, I think it is unreasonable to assert that MLTT derivatives are unworkable as a foundation.
It’s worth mentioning that this is exactly what categorical logic would suggest: our syntactic power objects should be mapped to semantic power objects.↩︎
While nice, it’s not clear that compactness and, especially, Löwenheim-Skolem are sacrosanct properties that we’d be unwilling to do without. Lindström’s first theorem is thus a nice abstract characterization theorem for classical FOL, but it doesn’t shut the door on considering alternatives even in the context of traditional model theory.↩︎
I’m totally fine thinking about logics that lack these properties, but I would never put any of them forward as an acceptable foundational logic.↩︎
In 1983, Mark Overmars described global rebuilding in The Design of Dynamic Data Structures. The problem it was aimed at solving was turning the amortized time complexity bounds of batched rebuilding into worst-case bounds. In batched rebuilding we perform a series of updates to a data structure which may cause the performance of operations to degrade, but occasionally we expensively rebuild the data structure back into an optimal arrangement. If the updates don’t degrade performance too much before we rebuild, then we can achieve our target time complexity bounds in an amortized sense. An update that doesn’t degrade performance too much is called a weak update.
Taking an example from Okasaki’s Purely Functional Data Structures, we can consider a binary search tree where deletions occur by simply marking the deleted nodes as deleted. Then, once about half the tree is marked as deleted, we rebuild the tree into a balanced binary search tree and clean out the nodes marked as deleted at that time. In this case, the deletions count as weak updates because leaving the deleted nodes in the tree even when it corresponds to up to half the tree can only mildly impact the time complexity of other operations. Specifically, assuming the tree was balanced at the start, then deleting half the nodes could only reduce the tree’s depth by about 1. On the other hand, naive inserts are not weak updates as they can quickly increase the tree’s depth.
The idea of global rebuilding is relatively straightforward, though how you would actually realize it in any particular example is not. The overall idea is simply that instead of waiting until the last moment and then rebuilding the data structure all at once, we’ll start the rebuild sooner and work at it incrementally as we perform other operations. If we update the new version faster than we update the original version, we’ll finish it by the time we would have wanted to perform a batched rebuild, and we can just switch to this new version.
More concretely, though still quite vaguely, global rebuilding involves, when a threshold is reached, rebuilding by creating a new “empty” version of the data structure called the shadow copy. The original version is the working copy. Work on rebuilding happens incrementally as operations are performed on the data structure. During this period, we service queries from the working copy and continue to update it as usual. Each update needs to make more progress on building the shadow copy than it worsens the working copy. For example, an insert should insert more nodes into the shadow copy than the working copy. Once the shadow copy is built, we may still have more work to do to incorporate changes that occurred after we started the rebuild. To this end, we can maintain a queue of update operations performed on the working copy since the start of a rebuild, and then apply these updates, also incrementally, to the shadow copy. Again, we need to apply the updates from the queue at a fast enough rate so that we will eventually catch up. Of course, all of this needs to happen fast enough so that 1) the working copy doesn’t get too degraded before the shadow copy is ready, and 2) we don’t end up needing to rebuild the shadow copy before it’s ready to do any work.
Coroutines
Okasaki passingly mentions that global rebuilding “can be usefully viewed as running the rebuilding transformation as a coroutine”. Also, the situation described above is quite reminiscent of garbage collection. There the classic half-space stop-the-world copying collector is naturally the batched rebuilding version. More incremental versions often have read or write barriers and break the garbage collection into incremental steps. Garbage collection is also often viewed as two processes coroutining.
The goal of this article is to derive global rebuilding-based data structures from
an expression of them as two coroutining processes. Ideally, we should be able to
take a data structure implemented via batched rebuilding and simply run the batch
rebuilding step as a coroutine. Modifying the data structure’s operations and the
rebuilding step should, in theory, just be a matter of inserting appropriate yield
statements. Of course, it won’t be that easy since the batched version of rebuilding
doesn’t need to worry about concurrent updates to the original data structure.
In theory, such a representation would be a perfectly effective way of articulating the global rebuilding version of the data structure. That said, I will be using the standard power move of CPS transforming and defunctionalizing to get a more data structure-like result.
I’ll implement coroutines as a very simplified case of modeling cooperative concurrency with continuations. In that context, a “process” written in continuation-passing style “yields” to the scheduler by passing its continuation to a scheduling function. Normally, the scheduler would place that continuation at the end of a work queue and then pick up a continuation from the front of the work queue and invoke it resuming the previously suspended “process”. In our case, we only have two “processes” so our “work queue” can just be a single mutable cell. When one “process” yields, it just swaps its continuation into the cell and the other “process’” out and invokes the continuation it read.
Since the rebuilding process is always driven by the main process, the pattern is a bit more like generators. This has the benefit that only the rebuilding process needs to be written in continuation-passing style. The following is a very quick and dirty set of functions for this.
module Coroutine ( YieldFn, spawn ) where
import Control.Monad ( join )
import Data.IORef ( IORef, newIORef, readIORef, writeIORef )
type YieldFn = IO () -> IO ()
yield :: IORef (IO ()) -> IO () -> IO ()
yield = writeIORef
resume :: IORef (IO ()) -> IO ()
resume = join . readIORef
terminate :: IORef (IO ()) -> IO ()
terminate yieldRef = writeIORef yieldRef (ioError $ userError "Subprocess completed")
spawn :: (YieldFn -> IO () -> IO ()) -> IO (IO ())
spawn process = do
yieldRef <- newIORef undefined
writeIORef yieldRef $ process (yield yieldRef) (terminate yieldRef)
return (resume yieldRef)
A simple example of usage is:
process :: YieldFn -> Int -> IO () -> IO ()
process _ 0 k = k
process yield i k = do
putStrLn $ "Subprocess: " ++ show i
yield $ process yield (i-1) k
example :: IO ()
example = do
resume <- spawn $ \yield -> process yield 10
forM_ [(1 :: Int) .. 10] $ \i -> do
putStrLn $ "Main process: " ++ show i
resume
putStrLn "Main process done"
with output:
Main process: 1
Subprocess: 10
Main process: 2
Subprocess: 9
Main process: 3
Subprocess: 8
Main process: 4
Subprocess: 7
Main process: 5
Subprocess: 6
Main process: 6
Subprocess: 5
Main process: 7
Subprocess: 4
Main process: 8
Subprocess: 3
Main process: 9
Subprocess: 2
Main process: 10
Subprocess: 1
Main process done
Queues
I’ll use queues since they are very simple and Purely Functional Data Structures describes Hood-Melville Real-Time Queues in Figure 8.1 as an example of global rebuilding. We’ll end up with something quite similar which could be made more similar by changing the rebuilding code. Indeed, the differences are just an artifact of specific, easily changed details of the rebuilding coroutine, as we’ll see.
The examples I’ll present are mostly imperative, not purely functional. There
are two reasons for this. First, I’m not focused on purely functional data structures
and the technique works fine for imperative data structures. Second, it is arguably
more natural to talk about coroutines in an imperative context. In this case,
it’s easy to adapt the code to a purely functional version since it’s not much
more than a purely functional data structure stuck in an IORef.
For a more imperative structure with mutable linked structure and/or in-place array updates, it would be more challenging to produce a purely functional version. The techniques here could still be used, though there are more “concurrency” concerns. While I don’t include the code here, I did a similar exercise for a random-access stack (a fancy way of saying a growable array). There the “concurrency” concern is that the elements you are copying to the new array may be popped and potentially overwritten before you switch to the new array. In this case, it’s easy to solve, since if the head pointer of the live version reaches the source offset for copy, you can just switch to the new array immediately.
Nevertheless, I can easily imagine scenarios where it may be beneficial, if
not necessary, for the coroutines to communicate more and/or for there to be
multiple “rebuild” processes. The approach used here could be easily adapted
to that. It’s also worth mentioning that even in simpler cases, non-constant-time
operations will either need to invoke resume multiple times or need more
coordination with the “rebuild” process to know when it can do more than a
constant amount of work. This could be accomplished by the “rebuild” process
simply recognizing this from the data structure state, or some state could
be explicitly set to indicate this, or the techniques described earlier
could be used, e.g. a different process for non-constant-time operations.
The code below uses the extensions BangPatterns, RecordWildCards, and GADTs.
Batched Rebuilding Implementation
We start with the straightforward, amortized constant-time queues where we push to a stack representing the back of the queue and pop from a stack representing the front. When the front stack is empty, we need to expensively reverse the back stack to make a new front stack.
I intentionally separate out the reverse step as an explicit rebuild function.
module BatchedRebuildingQueue ( Queue, new, enqueue, dequeue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef )
data Queue a = Queue {
queueRef :: IORef ([a], [a])
}
new :: IO (Queue a)
new = do
queueRef <- newIORef ([], [])
return Queue { .. }
dequeue :: Queue a -> IO (Maybe a)
dequeue q@(Queue { .. }) = do
(front, back) <- readIORef queueRef
case front of
(x:front') -> do
writeIORef queueRef (front', back)
return (Just x)
[] -> case back of
[] -> return Nothing
_ -> rebuild q >> dequeue q
enqueue :: a -> Queue a -> IO ()
enqueue x (Queue { .. }) =
modifyIORef queueRef (\(front, back) -> (front, x:back))
rebuild :: Queue a -> IO ()
rebuild (Queue { .. }) =
modifyIORef queueRef (\([], back) -> (reverse back, []))
Global Rebuilding Implementation
This step is where a modicum of thought is needed. We need to make the
rebuild step from the batched version incremental. This is straightforward,
if tedious, given the coroutine infrastructure. In this case, we incrementalize
the reverse by reimplementing reverse in CPS with some yield calls
inserted. Then we need to incrementalize append. Since we’re not waiting
until front is empty, we’re actually computing front ++ reverse back.
Incrementalizing append is hard, so we actually reverse front and then
use an incremental reverseAppend (which is basically what the incremental
reverse does anyway1).
One of the first things to note about this code is that the actual operations are
largely unchanged other than inserting calls to resume. In fact, dequeue
is even simpler than in the batched version as we can just assume that front
is always populated when the queue is not empty. dequeue is freed from the
responsibility of deciding when to trigger a rebuild. Most of the bulk of
this code is from reimplementing a reverseAppend function (twice).
The parts of this code that require some deeper thought are 1) knowing when
a rebuild should begin, 2) knowing how “fast” the incremental operations
should go2
(e.g. incrementalReverse does two steps at a time and the
Hood-Melville implementation has an explicit exec2 that does two steps
at a time), and 3) dealing with “concurrent” changes.
For the last, Overmars describes a queue of deferred operations to perform
on the shadow copy once it finishes rebuilding. This kind of suggests a
situation where the “rebuild” process can reference some “snapshot” of
the data structure. In our case, that is the situation we’re in, since
our data structures are essentially immutable data structures in an IORef.
However, it can easily not be the case, e.g. the random-access stack.
Also, this operation queue approach can easily be inefficient and inelegant.
None of the implementations below will have this queue of deferred operations.
It is easier, more efficient, and more elegant to just not copy over parts of
the queue that have been dequeued, rather than have an extra phase of the
rebuilding that just pops off the elements of the front stack that we just
pushed. A similar situation happens for the random-access stack.
The use of drop could probably be easily eliminated. (I’m not even sure it’s
still necessary.) It is mostly an artifact of (not) dealing with off-by-one issues.
module GlobalRebuildingQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
import Coroutine ( YieldFn, spawn )
data Queue a = Queue {
resume :: IO (),
frontRef :: IORef [a],
backRef :: IORef [a],
frontCountRef :: IORef Int,
backCountRef :: IORef Int
}
new :: IO (Queue a)
new = do
frontRef <- newIORef []
backRef <- newIORef []
frontCountRef <- newIORef 0
backCountRef <- newIORef 0
resume <- spawn $ const . rebuild frontRef backRef frontCountRef backCountRef
return Queue { .. }
dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
resume q
front <- readIORef (frontRef q)
case front of
[] -> return Nothing
(x:front') -> do
modifyIORef' (frontCountRef q) pred
writeIORef (frontRef q) front'
return (Just x)
enqueue :: a -> Queue a -> IO ()
enqueue x q = do
modifyIORef (backRef q) (x:)
modifyIORef' (backCountRef q) succ
resume q
rebuild :: IORef [a] -> IORef [a] -> IORef Int -> IORef Int -> YieldFn -> IO ()
rebuild frontRef backRef frontCountRef backCountRef yield = let k = go k in go k where
go k = do
frontCount <- readIORef frontCountRef
backCount <- readIORef backCountRef
if backCount > frontCount then do
back <- readIORef backRef
front <- readIORef frontRef
writeIORef backRef []
writeIORef backCountRef 0
incrementalReverse back [] $ \rback ->
incrementalReverse front [] $ \rfront ->
incrementalRevAppend rfront rback 0 backCount k
else do
yield k
incrementalReverse [] acc k = k acc
incrementalReverse [x] acc k = k (x:acc)
incrementalReverse (x:y:xs) acc k = yield $ incrementalReverse xs (y:x:acc) k
incrementalRevAppend [] front !movedCount backCount' k = do
writeIORef frontRef front
writeIORef frontCountRef $! movedCount + backCount'
yield k
incrementalRevAppend (x:rfront) acc !movedCount backCount' k = do
currentFrontCount <- readIORef frontCountRef
if currentFrontCount <= movedCount then do
-- This drop count should be bounded by a constant.
writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
writeIORef frontCountRef $! currentFrontCount + backCount'
yield k
else if null rfront then
incrementalRevAppend [] (x:acc) (movedCount + 1) backCount' k
else
yield $! incrementalRevAppend rfront (x:acc) (movedCount + 1) backCount' k
Defunctionalized Global Rebuilding Implementation
This step is completely mechanical.
There’s arguably no reason to defunctionalize. It produces a result that is more data-structure-like, but, unless you need the code to work in a first-order language, there’s nothing really gained by doing this. It does lead to a result that is more directly comparable to other implementations.
For some data structures, having the continuation be analyzable would provide a simple means for the coroutines to communicate. The main process could directly look at the continuation to determine its state, e.g. if a rebuild is in-progress at all. The main process could also directly manipulate the stored continuation to change the “rebuild” process’ behavior. That said, doing this would mean that we’re not deriving the implementation. Still, the opportunity for additional optimizations and simplifications is nice.
As a minor aside, while it is, of course, obvious from looking at the
previous version of the code, it’s neat how the Kont data type
implies that the call stack is bounded and that most calls are tail calls.
REVERSE_STEP is the only constructor that contains a Kont argument,
but its type means that that argument can’t itself be a REVERSE_STEP.
Again, I just find it neat how defunctionalization makes this concrete
and explicit.
module DefunctionalizedQueue ( Queue, new, dequeue, enqueue ) where
import Data.IORef ( IORef, newIORef, readIORef, writeIORef, modifyIORef, modifyIORef' )
data Kont a r where
IDLE :: Kont a ()
REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
REV_APPEND_START :: [a] -> !Int -> Kont a [a]
REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()
applyKont :: Queue a -> Kont a r -> r -> IO ()
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
incrementalRevAppend q rfront acc movedCount backCount
rebuildLoop :: Queue a -> IO ()
rebuildLoop q@(Queue { .. }) = do
frontCount <- readIORef frontCountRef
backCount <- readIORef backCountRef
if backCount > frontCount then do
back <- readIORef backRef
front <- readIORef frontRef
writeIORef backRef []
writeIORef backCountRef 0
incrementalReverse q back [] $ REVERSE_FRONT front backCount
else do
writeIORef resumeRef IDLE
incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> IO ()
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = writeIORef (resumeRef q) $ REVERSE_STEP xs (y:x:acc) k
incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> IO ()
incrementalRevAppend (Queue { .. }) [] front !movedCount backCount' = do
writeIORef frontRef front
writeIORef frontCountRef $! movedCount + backCount'
writeIORef resumeRef IDLE
incrementalRevAppend q@(Queue { .. }) (x:rfront) acc !movedCount backCount' = do
currentFrontCount <- readIORef frontCountRef
if currentFrontCount <= movedCount then do
-- This drop count should be bounded by a constant.
writeIORef frontRef $! drop (movedCount - currentFrontCount) acc
writeIORef frontCountRef $! currentFrontCount + backCount'
writeIORef resumeRef IDLE
else if null rfront then
incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
else
writeIORef resumeRef $! REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount'
resume :: Queue a -> IO ()
resume q = do
kont <- readIORef (resumeRef q)
applyKont q kont ()
data Queue a = Queue {
resumeRef :: IORef (Kont a ()),
frontRef :: IORef [a],
backRef :: IORef [a],
frontCountRef :: IORef Int,
backCountRef :: IORef Int
}
new :: IO (Queue a)
new = do
frontRef <- newIORef []
backRef <- newIORef []
frontCountRef <- newIORef 0
backCountRef <- newIORef 0
resumeRef <- newIORef IDLE
return Queue { .. }
dequeue :: Queue a -> IO (Maybe a)
dequeue q = do
resume q
front <- readIORef (frontRef q)
case front of
[] -> return Nothing
(x:front') -> do
modifyIORef' (frontCountRef q) pred
writeIORef (frontRef q) front'
return (Just x)
enqueue :: a -> Queue a -> IO ()
enqueue x q = do
modifyIORef (backRef q) (x:)
modifyIORef' (backCountRef q) succ
resume q
Functional Defunctionalized Global Rebuilding Implementation
This is just a straightforward reorganization of the previous code into purely functional code. This produces a persistent queue with worst-case constant time operations.
It is, of course, far uglier and more ad-hoc than Okasaki’s extremely elegant real-time queues, but the methodology to derive it was simple-minded. The result is also quite similar to the Hood-Melville Queues even though I did not set out to achieve that. That said, I’m pretty confident you could derive pretty much exactly the Hood-Melville queues with just minor modifications to Global Rebuilding Implementation.
module FunctionalQueue ( Queue, empty, dequeue, enqueue ) where
data Kont a r where
IDLE :: Kont a ()
REVERSE_STEP :: [a] -> [a] -> Kont a [a] -> Kont a ()
REVERSE_FRONT :: [a] -> !Int -> Kont a [a]
REV_APPEND_START :: [a] -> !Int -> Kont a [a]
REV_APPEND_STEP :: [a] -> [a] -> !Int -> !Int -> Kont a ()
applyKont :: Queue a -> Kont a r -> r -> Queue a
applyKont q IDLE _ = rebuildLoop q
applyKont q (REVERSE_STEP xs acc k) _ = incrementalReverse q xs acc k
applyKont q (REVERSE_FRONT front backCount) rback =
incrementalReverse q front [] $ REV_APPEND_START rback backCount
applyKont q (REV_APPEND_START rback backCount) rfront =
incrementalRevAppend q rfront rback 0 backCount
applyKont q (REV_APPEND_STEP rfront acc movedCount backCount) _ =
incrementalRevAppend q rfront acc movedCount backCount
rebuildLoop :: Queue a -> Queue a
rebuildLoop q@(Queue { .. }) =
if backCount > frontCount then
let q' = q { back = [], backCount = 0 } in
incrementalReverse q' back [] $ REVERSE_FRONT front backCount
else
q { resumeKont = IDLE }
incrementalReverse :: Queue a -> [a] -> [a] -> Kont a [a] -> Queue a
incrementalReverse q [] acc k = applyKont q k acc
incrementalReverse q [x] acc k = applyKont q k (x:acc)
incrementalReverse q (x:y:xs) acc k = q { resumeKont = REVERSE_STEP xs (y:x:acc) k }
incrementalRevAppend :: Queue a -> [a] -> [a] -> Int -> Int -> Queue a
incrementalRevAppend q [] front' !movedCount backCount' =
q { front = front', frontCount = movedCount + backCount', resumeKont = IDLE }
incrementalRevAppend q (x:rfront) acc !movedCount backCount' =
if frontCount q <= movedCount then
-- This drop count should be bounded by a constant.
let !front = drop (movedCount - frontCount q) acc in
q { front = front, frontCount = frontCount q + backCount', resumeKont = IDLE }
else if null rfront then
incrementalRevAppend q [] (x:acc) (movedCount + 1) backCount'
else
q { resumeKont = REV_APPEND_STEP rfront (x:acc) (movedCount + 1) backCount' }
resume :: Queue a -> Queue a
resume q = applyKont q (resumeKont q) ()
data Queue a = Queue {
resumeKont :: !(Kont a ()),
front :: [a],
back :: [a],
frontCount :: !Int,
backCount :: !Int
}
empty :: Queue a
empty = Queue { resumeKont = IDLE, front = [], back = [], frontCount = 0, backCount = 0 }
dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q =
case front of
[] -> (Nothing, q)
(x:front') ->
(Just x, q' { front = front', frontCount = frontCount - 1 })
where q'@(Queue { .. }) = resume q
enqueue :: a -> Queue a -> Queue a
enqueue x q@(Queue { .. }) = resume (q { back = x:back, backCount = backCount + 1 })
Hood-Melville Implementation
This is just the Haskell code from Purely Functional Data Structures adapted to the interface of the other examples.
This code is here mostly for comparison. The biggest difference, other than some code structuring differences, is that the front and back lists are reversed in parallel, while my code does them sequentially. As mentioned before, getting a structure like that would simply be a matter of defining a parallel incremental reverse back in the Global Rebuilding Implementation.
Again, Okasaki’s real-time queue, which can be seen as an application of the lazy rebuilding and scheduling techniques described in his thesis and book, is a better implementation than this in pretty much every way.
{-# LANGUAGE BangPatterns #-}
module HoodMelvilleQueue (Queue, empty, dequeue, enqueue) where

data RotationState a
  = Idle
  | Reversing !Int [a] [a] [a] [a]
  | Appending !Int [a] [a]
  | Done [a]

data Queue a = Queue !Int [a] (RotationState a) !Int [a]

exec :: RotationState a -> RotationState a
exec (Reversing ok (x:f) f' (y:r) r') = Reversing (ok+1) f (x:f') r (y:r')
exec (Reversing ok [] f' [y] r') = Appending ok f' (y:r')
exec (Appending 0 f' r') = Done r'
exec (Appending ok (x:f') r') = Appending (ok-1) f' (x:r')
exec state = state

invalidate :: RotationState a -> RotationState a
invalidate (Reversing ok f f' r r') = Reversing (ok-1) f f' r r'
invalidate (Appending 0 f' (x:r')) = Done r'
invalidate (Appending ok f' r') = Appending (ok-1) f' r'
invalidate state = state

exec2 :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
exec2 !lenf f state lenr r =
  case exec (exec state) of
    Done newf -> Queue lenf newf Idle lenr r
    newstate -> Queue lenf f newstate lenr r

check :: Int -> [a] -> RotationState a -> Int -> [a] -> Queue a
check !lenf f state !lenr r =
  if lenr <= lenf then exec2 lenf f state lenr r
  else let newstate = Reversing 0 f [] r []
       in exec2 (lenf+lenr) f newstate 0 []

empty :: Queue a
empty = Queue 0 [] Idle 0 []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue _ [] _ _ _) = (Nothing, q)
dequeue (Queue lenf (x:f') state lenr r) =
  let !q' = check (lenf-1) f' (invalidate state) lenr r in
  (Just x, q')

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue lenf f state lenr r) = check lenf f state (lenr+1) (x:r)

Okasaki’s Real-Time Queues
Just for completeness. This implementation crucially relies on lazy evaluation. Our queues are of
the form Queue f r s. If you look carefully, you’ll notice that the only place we consume s is
in the first clause of exec, and there we discard its elements. In other words, we only care about
the length of s. s gets “decremented” each time we enqueue until it’s empty at which point we
rotate r to f in the second clause of exec. The key thing is that f and s are initialized
to the same value in that clause. That means each time we “decrement” s we are also forcing a bit
of f. Forcing a bit of f/s means computing a bit of rotate. rotate xs ys a is an
incremental version of xs ++ reverse ys ++ a (where we use the invariant
length ys = 1 + length xs for the base case).
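As a quick check of that claim (my own unfolding, using the rotate defined in the module below):

rotate [1,2] [5,4,3] []
  = 1 : rotate [2] [4,3] [5]
  = 1 : 2 : rotate [] [3] [4,5]
  = [1,2,3,4,5]
  = [1,2] ++ reverse [5,4,3] ++ []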
Using Okasaki’s terminology, rotate illustrates a simple form of lazy rebuilding where we use
lazy evaluation rather than explicit or implicit coroutines to perform work “in parallel”. Here, we
interleave the evaluation of rotate with enqueue and dequeue via forcing the conses of
f/s. However, lazy rebuilding itself may not lead to worst-case optimal times (assuming it is
amortized optimal). We need to use Okasaki’s other technique of scheduling to strategically
force the thunks incrementally rather than all at once. Here s is a schedule telling us when to
force parts of f. (As mentioned, s also serves as a counter telling us when to perform a
rebuild.)
{-# LANGUAGE BangPatterns #-}
module OkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] ![a] [a]

empty :: Queue a
empty = Queue [] [] []

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _) = (Nothing, q)
dequeue (Queue (x:f) r s) = (Just x, exec f r s)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> [a] -> [a] -> Queue a
exec f !r (_:s) = Queue f r s
exec f !r [] = let f' = rotate f r [] in Queue f' [] f'

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f r s) = exec f (x:r) s

It’s instructive to compare the above to the following implementation which doesn’t use a schedule.
This implementation is essentially the Banker’s Queue from Okasaki’s book, except we use lazy
rebuilding to spread the xs ++ reverse ys (particularly the reverse part) over multiple
dequeues via rotate. The following implementation performs extremely well in my benchmark, but
the operations are subtly not constant-time. Specifically, after a long series of enqueues, a
dequeue will do work proportional to the logarithm of the number of enqueues. Essentially, f
will be a nested series of rotate calls, one for every doubling of the length of the queue. Even
if we change let f' to let !f', that will only make the first dequeue cheap. The second will
still be expensive.
{-# LANGUAGE BangPatterns #-}
module UnscheduledOkasakiQueue ( Queue, empty, dequeue, enqueue ) where

data Queue a = Queue [a] !Int [a] !Int

empty :: Queue a
empty = Queue [] 0 [] 0

dequeue :: Queue a -> (Maybe a, Queue a)
dequeue q@(Queue [] _ _ _) = (Nothing, q)
dequeue (Queue (x:f) lenf r lenr) = (Just x, exec f (lenf - 1) r lenr)

rotate :: [a] -> [a] -> [a] -> [a]
rotate [] (y: _) a = y:a
rotate (x:xs) (y:ys) a = x:rotate xs ys (y:a)

exec :: [a] -> Int -> [a] -> Int -> Queue a
exec f !lenf !r !lenr | lenf >= lenr = Queue f lenf r lenr
exec f !lenf !r !lenr = let f' = rotate f r [] in Queue f' (lenf + lenr) [] 0

enqueue :: a -> Queue a -> Queue a
enqueue x (Queue f lenf r lenr) = exec f lenf (x:r) (lenr + 1)

Empirical Evaluation
I won’t reproduce the evaluation code as it’s not very sophisticated or interesting. It randomly generated a sequence of enqueues and dequeues with an 80% chance to produce an enqueue over a dequeue so that the queues would grow. It measured the average time of an enqueue and a dequeue, as well as the maximum time of any single dequeue.
The main thing I wanted to see was relatively stable average enqueue and dequeue times with only the batched implementation having a growing maximum dequeue time. This is indeed what I saw, though it took about 1,000,000 operations (or really a queue of a couple hundred thousand elements) for the numbers to stabilize.
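Though I’m not reproducing the actual harness, its shape was roughly the following. This is a minimal sketch of my own approximation, not the real evaluation code; the per-operation timing and statistics plumbing is omitted, and the queue operations come from whichever implementation is under test.

-- Import whichever implementation is being measured, e.g.:
import FunctionalQueue (Queue, empty, enqueue, dequeue)
import System.Random (randomRIO)

-- Sketch of the operation generator: an 80% chance to enqueue, 20% to
-- dequeue, so the queue tends to grow. The real harness timed each
-- operation, tracking averages and the maximum dequeue time.
runOps :: Int -> Queue Int -> IO (Queue Int)
runOps 0 q = pure q
runOps n q = do
  r <- randomRIO (1, 10 :: Int)
  if r <= 8
    then runOps (n - 1) (enqueue n q)
    else let (_, q') = dequeue q in runOps (n - 1) q'

-- e.g. runOps 1000000 empty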
The results were mostly unsurprising. In overall time, the batched
implementation won. Its enqueue is also, obviously, the fastest. (Indeed, there’s
a good chance my measurement of its average enqueue time was largely a measurement
of the timer’s resolution.) The operations’ average times were stable, illustrating their
constant (amortized) time. At large enough sizes, the ratio of the maximum dequeue
time versus the average stabilized around 7000 to 1, except, of course, for the
batched version which grew linearly to millions to 1 ratios at queue sizes of tens
of millions of elements. This illustrates the worst-case time complexity of all the
other implementations, and the merely amortized time complexity of the batched one.
While the batched version was best in overall time, the difference wasn’t that great.
The worst implementations were still less than 1.4x slower. All the worst-case optimal
implementations performed roughly the same, but there were still some clear winners
and losers. Okasaki’s real-time queue is almost on-par with the batched
implementation in overall time and handily beats the other implementations in average
enqueue and dequeue times. The main surprise for me was that the loser was the
Hood-Melville queue. My guess is that this is due to invalidate, which seems like it
would do more work and produce more garbage than the approach taken in my functional
version.
Conclusion
The point of this article was to illustrate the process of deriving a deamortized data structure from an amortized one that utilizes batched rebuilding, by explicitly modeling global rebuilding as a coroutine.
The point wasn’t to produce the fastest queue implementation, though I am pretty happy with the results. While this is an extremely simple example, it was still nice that each step was very easy and natural. It’s especially nice that this derivation approach produced a better result than the Hood-Melville queue.
Of course, my advice is to use Okasaki’s real-time queue if you need a purely functional queue with worst-case constant-time operations.
This code could definitely be refactored to leverage this similarity to reduce code. Alternatively, one could refunctionalize the Hood-Melville implementation at the end.↩︎
Going “too fast”, so long as it’s still a constant amount of work for each step, isn’t really an issue asymptotically, so you can just crank the knobs if you don’t want to think too hard about it. That said, going faster than you need to will likely give you worse worst-case constant factors. In some cases, going faster than necessary could reduce constant factors, e.g. by better utilizing caches and disk I/O buffers.↩︎
Morleyization is a fairly important operation in categorical logic for which it is hard to find readily accessible references to a statement and proof. Most refer to D1.5.13 of “Sketches of an Elephant” which is not an accessible text. 3.2.8 of “Accessible Categories” by Makkai and Paré is another reference, and “Accessible Categories” is more accessible but still a big ask for just a single theorem.
Here I reproduce the statement and proof from “Accessible Categories” albeit with some notational and conceptual adaptations as well as some commentary. This assumes some basic familiarity with the ideas and notions of traditional model theory, e.g. what structures, models, and |\vDash| are.
Preliminaries
The context of the theorem is infinitary, classical (multi-sorted) first-order logic. |L| will stand for a language, a.k.a. a signature, i.e. sorts, function symbols, and predicate symbols as usual, except that if we’re allowing infinitary quantification we may have function or predicate symbols of infinite arity. We write |L_{\kappa,\lambda}| for the corresponding classical first-order logic where we allow conjunctions and disjunctions indexed by sets of cardinality less than the regular (infinite) cardinal |\kappa| while allowing quantification over sets of variables of (infinite) cardinality less than |\lambda \leq \kappa|. |\lambda=\varnothing| is also allowed, to indicate a propositional logic. If |\kappa| or |\lambda| are |\infty|, that means conjunctions/disjunctions or quantifications over arbitrary sets. |L_{\omega,\omega}| would be normal finitary, classical first-order logic. Geometric logic would be a fragment of |L_{\infty,\omega}|. The theorem will focus on |L_{\infty,\infty}|, but inspection of the proof shows that the theorem would hold for any reasonable choice of |\kappa| and |\lambda|.
As a note, infinitary logics can easily have a proper class of formulas. Thus, it will make sense to talk about small subclasses of formulas, i.e. ones which are sets.
Instead of considering logics with different sets of connectives, Makkai and Paré introduce the fairly standard notion of a positive existential formula, which is a formula that uses only atomic formulas, conjunctions, disjunctions, and existential quantification. That is, no implication, negation, or universal quantification. They then define a basic sentence as “a conjunction of a set of sentences, i.e. closed formulas, each of which is of the form |\forall\vec x(\phi\to\psi)| where |\phi| and |\psi| are [positive existential] formulas”.
It’s clear the component formulas of a basic sentence correspond to sequents of the form |\phi\vdash\psi| for open positive existential formulas. A basic sentence corresponds to what is often called a theory, i.e. a set of sequents. Infinitary logic lets us smash a theory down to a single formula, but I think the theory concept is clearer, though I’m sure there are benefits to having a single formula. Instead of talking about basic sentences, we can talk about a theory in the positive existential fragment of the relevant logic. This has the benefit that we don’t need to introduce connectives or infinitary versions of connectives just for structural reasons. I’ll call a theory that corresponds to a basic sentence a positive existential theory for conciseness.
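Concretely, the correspondence just conjoins the universal closures of a theory’s sequents: \[\{\phi_i \vdash \psi_i\}_{i \in I} \quad\rightsquigarrow\quad \bigwedge_{i \in I}\forall\vec x_i(\phi_i\to\psi_i)\] where |\vec x_i| lists the free variables of |\phi_i| and |\psi_i|.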
Makkai and Paré also define |L_{\kappa,\lambda}^*| “for the class of formulas |L_{\kappa,\lambda}| which are conjunctions of formulas in each of which the only conjunctions occurring are of cardinality |< \lambda|”. For us, the main significance of this is that geometric theories correspond to basic sentences in |L_{\infty,\omega}^*| as this limits the conjunctions to the finitary case. Indeed, Makkai and Paré include the somewhat awkward sentence: “Thus, a geometric theory is the same as a basic sentence in |L_{\infty,\omega}^*|, and a coherent theory is a conjunction of basic sentences in |L_{\omega,\omega}|.” Presumably, the ambiguous meaning of “conjunction” leads to the differences in how these are stated, i.e. a basic sentence is already a “conjunction” of formulas.
The standard notion of an |L|-structure and model are used, and I won’t give a precise definition here. An |L|-structure assigns meaning (sets, functions, and relations) to all the sorts and symbols of |L|, and a model of a formula (or theory) is an |L|-structure which satisfies the formula (or all the formulas of the theory). We’ll write |Str(L)| for the category of |L|-structures and homomorphisms. In categorical logic, an |L|-structure would usually be some kind of structure preserving (fibred) functor usually into |\mathbf{Set}|, and a homomorphism is a natural transformation. A formula would be mapped to a subobject, and a model would require these subobjects to satisfy certain factoring properties specified by the theory. A sequent |\varphi \vdash \psi| in the theory would require a model to have the interpretation of |\varphi| factor through the interpretation of |\psi|, i.e. for the former to be a subset of the latter when interpreting into |\mathbf{Set}|.
Theorem Statement
|\mathcal F \subseteq L_{\infty,\infty}| is called a fragment of |L_{\infty,\infty}| if:
- it contains all atomic formulas of |L|,
- it is closed under substitution,
- if a formula is in |\mathcal F| then so are all its subformulas,
- if |\forall\vec x\varphi \in \mathcal F|, then so is |\neg\exists\vec x\neg\varphi|, and
- if |\varphi\to\psi \in \mathcal F|, then so is |\neg\varphi\lor\psi|.
Basically, and the motivation for this will become clear shortly, formulas in |\mathcal F| are like “compound atomic formulas” with the caveat that we must include the classically equivalent versions of |\forall| and |\to| in terms of |\neg| and |\exists| or |\lor| respectively.
Given |\mathcal F|, we define an |\mathcal F|-basic sentence exactly like a basic sentence except that we allow formulas from |\mathcal F| instead of just atomic formulas as the base case. In theory language, an |\mathcal F|-basic sentence is a theory, i.e. set of sequents, using only the connectives |\bigwedge|, |\bigvee|, and |\exists|, except within subformulas contained in |\mathcal F| which may use any (first-order) connective. We’ll call such a theory a positive existential |\mathcal F|-theory. Much of the following will be double-barrelled as I try to capture the proof as stated in “Accessible Categories” and my slight reformulation using positive existential theories.
|\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| for a theory |\mathbb T| (or |\mathrm{Mod}^{(\mathcal F)}(\sigma)| for a basic sentence |\sigma|) is the category whose objects are |L|-structures that are models of |\mathbb T| (or |\sigma|), and whose arrows are the |\mathcal F|-elementary mappings. An |\mathcal F|-elementary mapping |h : M \to N|, for |\mathcal F| any subset of formulas of |L_{\infty,\infty}|, is a mapping of |L|-structures which preserves the meaning of all formulas in |\mathcal F|. That is, |M \vDash \varphi(\vec a)| implies |N \vDash \varphi(h(\vec a))| for all formulas |\varphi \in \mathcal F| and appropriate sequences |\vec a|. We can define the elementary mappings for a language |L’| as the |\mathcal F’|-elementary mappings where |\mathcal F’| consists of (only) the atomic formulas of |L’|. |\mathrm{Mod}^{(L’)}(\mathbb T’)| (or |\mathrm{Mod}^{(L’)}(\sigma’)|) can then be defined as |\mathrm{Mod}^{(\mathcal F’)}(\mathbb T’)| (or |\mathrm{Mod}^{(\mathcal F’)}(\sigma’)|) for the |\mathcal F’| determined this way.
Here’s the theorem as stated in “Accessible Categories”.
Theorem (Proposition 3.2.8): Given any small fragment |\mathcal F| and an |\mathcal F|-basic sentence |\sigma|, the category |\mathrm{Mod}^{(\mathcal F)}(\sigma)| is equivalent to |\mathrm{Mod}^{(L’)}(\sigma’)| for some other language |L’| and basic sentence |\sigma’| over |L’|, hence, by 3.2.1, to the category of models of a small sketch as well.
We’ll replace the |\mathcal F|-basic sentences |\sigma| and |\sigma’| with positive existential |\mathcal F|-theories |\mathbb T| and |\mathbb T’|.
Implied is that |\mathcal F \subseteq L_{\infty,\infty}|, i.e. that |L| and |L’| may be distinct and usually will be. As the proof will show, they agree on sorts and function symbols, but we have different predicate symbols in |L’|.
I’ll be ignoring the final comment referencing Theorem 3.2.1. Theorem 3.2.1 is the main theorem of the section and states that every small sketch gives rise to a language |L| and theory |\mathbb T| (or basic sentence |\sigma|) and vice versa such that the category of models of the sketch are equivalent to models of |\mathbb T| (or |\sigma|). Thus, the final comment is an immediate corollary.
For us, the interesting part of 3.2.8 is that it takes a classical first-order theory, |\mathbb T|, and produces a positive existential theory, as represented by |\mathbb T’|, that has an equivalent, in fact isomorphic, category of models. This positive existential theory is called the Morleyization of the first-order theory.
In particular, if we have a finitary classical first-order theory, then we get a coherent theory with the same models. This means to study models of classical first-order theories, it’s enough to study models of coherent theories via the Morleyization of the classical first-order theories. This allows many techniques for geometric and coherent theories to be applied, e.g. (pre)topos theory and classifying toposes. As stated before, the theorem statement doesn’t actually make it clear that the result holds for a restricted degree of “infinitariness”, but this is obvious from the proof.
Proof
I’ll quote the first few sentences of the proof to which I have nothing to add.
The idea is to replace each formula in |\mathcal F| by a new predicate. Let the sorts of the language |L’| be the same as those of |L|, and similarly for the [function] symbols.
The description of the predicate symbols is complicated by their (potential) infinitary nature. I’ll quote the proof here as well as I have nothing to add and am not as interested in this case. The finitary quantifiers case would be similar, just slightly less technical. It would be even simpler if we defined formulas in a given (ordered) variable context as is typical in categorical logic.
With any formula |\phi(\vec x)| in |\mathcal F|, with |\vec x| the repetition free sequence |\langle x_\beta\rangle_{\beta<\alpha}| of exactly the free variables of |\phi| in a once and for all fixed order of variables, let us associate the new [predicate] symbol |P_\phi| of arity |a : \alpha \to \mathrm{Sorts}| such that |a(\beta) = x_\beta|. The [predicate] symbols of |L’| are the |P_\phi| for all |\phi\in\mathcal F|.
The motivation of |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories should now be totally clear. The |\mathcal F|-basic sentences / positive existential |\mathcal F|-theories are literally basic sentences / positive existential theories in the language of |L’| if we replace all occurrences of subformulas in |\mathcal F| with their corresponding predicate symbol in |L’|.
We can extend any |L|-structure |M| to an |L’|-structure |M^\sharp| such that they agree on all the sorts and function symbols of |L|, and |M^\sharp| satisfies |M^\sharp \vDash P_\varphi(\vec a)| if and only if |M \vDash \varphi(\vec a)|. Which is to say, we define the interpretation of |P_\varphi| to be the subset of the interpretation of its domain specified by |M \vDash \varphi(\vec a)| for all |\vec a| in the domain. In more categorical language, we define the subobject that |P_\varphi| gets sent to to be the subobject |\varphi|.
We can define an |L|-structure, |N^\flat|, for |N| an |L’|-structure by, again, requiring it to do the same thing to sorts and function symbols as |N|, and defining the interpretation of the predicate symbols as |N^\flat \vDash R(\vec a)| if and only if |N \vDash P_{R(\vec x)}(\vec a)|.
We immediately have |(M^\sharp)^\flat = M|.
We can extend this to |L’|-formulas. Let |\psi| be an |L’|-formula, then |\psi^\flat| is defined by a connective-preserving operation for which we only need to specify the action on predicate symbols. We define that by declaring |P_\varphi(\vec t)^\flat| gets mapped to |\varphi(\vec t)|. We extend |\flat| to theories via |\mathbb T’^\flat \equiv \{ \varphi^\flat \vdash \psi^\flat \mid (\varphi\vdash\psi) \in \mathbb T’\}|. A similar induction allows us to prove \[M\vDash\psi^\flat(\vec a)\iff M^\sharp\vDash\psi(\vec a)\] for all |L|-structures |M| and appropriate |\vec a|.
We have |\mathbb T = \mathbb T’^\flat| for a positive existential theory |\mathbb T’| over |L’| (or |\sigma = \rho^\flat| for a basic |L’|-sentence |\rho|) and thus |\varphi^\flat \vDash_M \psi^\flat \iff \varphi \vDash_{M^\sharp}\psi| for all |\varphi\vdash\psi \in \mathbb T’| (or |M \vDash\sigma \iff M^\sharp\vDash\rho|). We want to make it so that any |L’|-structure |N| that is a model of |\mathbb T’| (or |\rho|) is of the form |N = M^\sharp| for some |M|. Right now that doesn’t happen because, while the definition of |M^\sharp| forces it to respect the logical connectives in the formula |\varphi| associated to the |L’| predicate symbol |P_\varphi|, this isn’t required for an arbitrary model |N|. For example, nothing requires |N \vDash P_\top| to hold.
The solution is straightforward. In addition to |\mathbb T’| (or |\rho|) representing the theory |\mathbb T| (or |\sigma|), we add in an additional set of axioms |\Phi| that capture the behavior of the (encoded) logical connectives of the formulas associated to the predicate symbols.
These axioms are largely structural with a few exceptions that I’ll address separately. I’ll present this as a collection of sequents for a theory, but we can replace |\vdash| and |\dashv \vdash| with |\to| and |\leftrightarrow| for the basic sentence version. |\varphi \dashv\vdash \psi| stands for two sequents going opposite directions.
\[\begin{align} \varphi(\vec x) & \dashv\vdash P_\varphi(\vec x) \tag{for atomic $\varphi$} \\ P_{R(\vec x)}(\vec t) & \dashv\vdash P_{R(\vec t)}(\vec y) \tag{for terms $\vec t$ with free variables $\vec y$} \\ P_{\bigwedge\Sigma}(\vec x) & \dashv\vdash \bigwedge_{\varphi \in \Sigma} P_\varphi(\vec x_\varphi) \tag{$\vec x_\varphi$ are the free variables of $\varphi$} \\ P_{\bigvee\Sigma}(\vec x) & \dashv\vdash \bigvee_{\varphi \in \Sigma} P_\varphi(\vec x_\varphi) \tag{$\vec x_\varphi$ are the free variables of $\varphi$} \\ P_{\exists\vec y.\varphi(\vec x,\vec y)}(\vec x) & \dashv\vdash \exists\vec y.P_{\varphi(\vec x,\vec y)}(\vec x,\vec y) \end{align}\]
We then have two axiom schemas that eliminate the |\forall| and |\to| by leveraging the defining property of |\mathcal F| being a fragment.
\[\begin{align} P_{\forall\vec y.\varphi(\vec x,\vec y)}(\vec x) & \dashv\vdash P_{\neg\exists\vec y.\neg\varphi(\vec x,\vec y)}(\vec x) \\ P_{\varphi\to\psi}(\vec x) & \dashv\vdash P_{\neg\varphi}(\vec x) \lor P_\psi(\vec x) \end{align}\]
We avoid needing negation by axiomatizing that |P_{\neg\varphi}| is the complement to |P_\varphi|. This is arguably the key idea. Once we can simulate the behavior of negation without actually needing it, then it is clear that we can embed all the other non-positive-existential connectives.
\[\begin{align} & \vdash P_{\neg\varphi}(\vec x) \lor P_\varphi(\vec x) \\ P_{\neg\varphi}(\vec x) \land P_\varphi(\vec x) & \vdash \bot \end{align}\]
|\Phi| is the set of all these sequents. (For the basic sentence version, |\Phi| is the set of universal closures of all these formulas for all |\varphi,\psi \in \mathcal F|.)
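As a tiny concrete instance (my own illustration): if |\neg R(x)| is in |\mathcal F| for an atomic |R|, then |\Phi| includes \[\begin{align} R(x) & \dashv\vdash P_{R(x)}(x) \\ & \vdash P_{\neg R(x)}(x) \lor P_{R(x)}(x) \\ P_{\neg R(x)}(x) \land P_{R(x)}(x) & \vdash \bot \end{align}\] which forces |P_{\neg R(x)}| to be interpreted as the complement of |R| in any model of |\Phi|.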
Another straightforward structural induction over the subformulas of |\varphi\in\mathcal F| shows that \[N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)\] for any |L’|-structure |N| which is a model of |\Phi|. The only interesting case is the negation case. Here, the induction hypothesis states that |N^\flat\vDash\varphi(\vec a)| agrees with |N\vDash P_\varphi(\vec a)| and the axioms state that |N\vDash P_{\neg\varphi}(\vec a)| is the complement of the latter which thus agrees with the complement of the former which is |N^\flat\vDash\neg\varphi(\vec a)|.
From this, it follows that |N = M^\sharp| for |M = N^\flat| or, equivalently, |N = (N^\flat)^\sharp|.
|({-})^\sharp| and |({-})^\flat| thus establish a bijection between the objects of |\mathrm{Mod}^{(\mathcal F)}(\mathbb T)| (or |\mathrm{Mod}^{(\mathcal F)}(\sigma)|) and |\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|). The morphisms of these two categories would each be subclasses of the morphisms of |Str(L_0)| where |L_0| is the language consisting of only the sorts and function symbols of |L| and thus |L’|. We can show that they are identical subclasses, which basically comes down to showing that an elementary mapping of |\mathrm{Mod}^{(L’)}(\mathbb T’\cup\Phi)| (or |\mathrm{Mod}^{(L’)}(\bigwedge(\{\rho\}\cup\Phi))|) is an |\mathcal F|-elementary mapping.
The idea is that such a morphism is a map |h : N \to N’| in |Str(L_0)| which must satisfy \[N \vDash P_\varphi(\vec a) \implies N’ \vDash P_\varphi(h(\vec a))\] for all |\varphi \in \mathcal F| and appropriate |\vec a|. However, since |N = (N^\flat)^\sharp| and |P_\varphi(\vec a)^\flat = \varphi(\vec a)|, we have |N^\flat \vDash \varphi(\vec a) \iff N \vDash P_\varphi(\vec a)| and similarly for |N’|. Thus \[N^\flat \vDash \varphi(\vec a) \implies N’^\flat \vDash \varphi(h(\vec a))\] for all |\varphi \in \mathcal F|, and every such |h| corresponds to an |\mathcal F|-elementary mapping. Choosing |N = M^\sharp| allows us to show the converse for any |\mathcal F|-elementary mapping |g : M \to M’|. |\square|
Commentary
The proof doesn’t particularly care that we’re interpreting the models into |\mathbf{Set}| and would work just as well if we interpreted into some other category with the necessary structure. The amount of structure required would vary with how much “infinitariness” we actually used, though it would need to be a Boolean category. In particular, the proof works as stated (in its theory form) without any infinitary connectives being implied for mapping finitary classical first-order logic to coherent logic.
We could simplify the statement and the proof by first eliminating |\forall| and |\to| and then considering the proof over classical first-order logic with the connectives |\{\bigwedge,\bigvee,\exists,\neg\}|. This would simplify the definition of fragment and remove some cases in the proof.
To reiterate, the key is how we handle negation.
Defunctionalization
Morleyization is related to defunctionalization1. For simplicity, I’ll only consider the finitary, propositional case, i.e. |L_{\omega,\varnothing}|.
In this case, we can consider each |P_\varphi| to be a new data type. In most cases, it would be
a newtype to use Haskell terminology. The only non-trivial case is |P_{\neg\varphi}|. Now, the
computational interpretation of classical propositional logic would use control operators to handle
negation. Propositional coherent logic, however, has a straightforward (first-order) functional
interpretation. Here, a negated formula, |\neg\varphi|, is represented by a primitive type
|P_{\neg\varphi}|.
The |P_{\neg\varphi} \land P_\varphi \vdash \bot| sequent is the apply
function for the defunctionalized continuation (of type |\varphi|). Even more clearly, this
is interderivable with |P_{\neg\varphi} \land \varphi’ \vdash \bot| where |\varphi’| is
the same as |\varphi| except the most shallow negated subformulas are replaced with the corresponding
predicate symbols. In particular, if |\varphi| contains no negated subformulas, then |\varphi’=\varphi|.
We have no way of creating new values of |P_{\neg\varphi}| other than via whatever sequents have been given.
We can, potentially, get a value of |P_{\neg\varphi}| by case analyzing on |\vdash \mathsf{lem}_\varphi : P_{\neg\varphi}\lor P_\varphi|.
What this corresponds to is a first-order functional language with a primitive type for each negated formula. Any semantics/implementation for this will need to decide if the primitive type |P_{\neg\varphi}| is empty or not, and then implement |\mathsf{lem}_\varphi| appropriately (or allow inconsistency). A programmer writing a program in this signature, however, cannot assume either way whether |P_{\neg\varphi}| is empty unless they can create a program with that type.
As a very slightly non-trivial example, let’s consider implementing |A \to P_{\neg\neg A}|, corresponding to double negation introduction. Using Haskell-like syntax, the program looks like:
proof :: A -> NotNotA
proof a = case lem_NotA of
  Left notNotA -> notNotA
  Right notA -> absurd (apply_NotA (notA, a))

where lem_NotA :: Either NotNotA NotA, apply_NotA :: (NotA, A) -> Void, and absurd :: Void -> a is the eliminator for |\bot| where |\bot| is represented by Void.
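To make this concrete, here is one self-contained way to fill in that signature. This is a sketch of my own: the names above are kept, and I arbitrarily choose A to be inhabited, so NotA must be empty and NotNotA behaves like ().

{-# LANGUAGE EmptyCase #-}

data Void                     -- represents |\bot|

absurd :: Void -> a           -- the eliminator for |\bot|
absurd v = case v of {}

data A = A                    -- arbitrarily chosen to be inhabited here
data NotA                     -- hence NotA is empty...
newtype NotNotA = NotNotA ()  -- ...and NotNotA behaves like ()

-- The apply function for the defunctionalized continuation of type A.
apply_NotA :: (NotA, A) -> Void
apply_NotA (notA, _) = case notA of {}

-- Excluded middle for NotA, implemented according to the choice made for A.
lem_NotA :: Either NotNotA NotA
lem_NotA = Left (NotNotA ())

proof :: A -> NotNotA
proof a = case lem_NotA of
  Left notNotA -> notNotA
  Right notA -> absurd (apply_NotA (notA, a))

Choosing A to be empty instead would make NotA behave like () and lem_NotA return Right.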
Normally in defunctionalization we’d also be adding constructors to our new types for all the
occurrences of lambdas (or maybe |\mu|s would be better in this case). However, since the only
thing we can do (in general) with NotA is use apply_NotA on it, no information can be extracted
from it. Either it’s inhabited and behaves like (), i.e. |\top|, or it’s not inhabited and
behaves like Void, i.e. |\bot|. We can even test for this by case analyzing on lem_A, which
makes sense because in the classical logic this formula was decidable.
Bonus: Grothendieck toposes as categories of models of sketches
The main point of this section of “Accessible Categories” is to show that we can equivalently view categories of models of sketches as categories of models of theories. In particular, models of geometric sketches, those whose cone diagrams are finite but cocone diagrams are arbitrary, correspond to models of geometric theories.
We can view a site, |(\mathcal C, J)|, for a Grothendieck topos as the data of a geometric sketch. In particular, |\mathcal C| becomes the underlying category of the sketch, we add cones to capture all finite limits, and the coverage, |J|, specifies the cocones. These cocones have a particular form as the quotient of the kernel of a sink as specified by the sieves in |J|. (We need to use the apex of the cones representing pullbacks instead of actual pullbacks.)
Lemma 3.2.2 shows the sketch-to-theory implication. The main thing I want to note about its proof is that it illustrates how infinitely large cones would require infinitary (universal) quantification (in addition to the unsurprising need for infinitary conjunction), but infinitely large cocones do not (but they do require infinitary disjunction). I’ll not reproduce it here, but it comes down to writing out the normal set-theoretic constructions of limits and colimits (in |\mathbf{Set}|), but instead of using some first-order theory of sets, like ZFC, uses of sets would be replaced with (infinitary) logical operations. The “infinite tuples” of an infinite limit become universal quantification over an infinitely large number of free variables. For the colimits, though, the most complex use of quantifiers is an infinite disjunction of increasingly deeply nested quantifiers to represent the transitive closure of a relation, but no single disjunct is infinitary. Figuring out the infinitary formulas is a good exercise.
An even more direct connection to defunctionalization is the fact that geometric logic is the internal logic of Grothendieck toposes, but Grothendieck toposes are elementary toposes and so have the structure to model implication and universal quantification. It’s just that those connectives aren’t preserved by geometric morphisms. For implication, the idea is that |A \to B| is represented by |\bigvee\{\bigwedge\Gamma\mid \Gamma,A\vdash B\}| where |\Gamma| is finite. We can even see how a homomorphism that preserved geometric logic structure will fail to preserve this definition of |\to|. Specifically, there could be additional contexts not in the image of the homomorphism that should be included in the image of the disjunction for it to lead to |\to| in the target but won’t be.↩︎
Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.
Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.
So this is a proof of the pullback lemma from the perspective of representability.
Preliminaries
The key thing we need here is a characterization of pullbacks in terms of representability. To jump right to the end: for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is the pullback of |f| and |g| if and only if it represents the functor \[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]
That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]
We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B| where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A| and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and |p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a natural isomorphism.)
One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to the typical universal-property-based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.
Proof
The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]
Rewriting the diagram as equations, we have:
Theorem: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if |(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.
Proof: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.
\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]
The second isomorphism uses that |B| is a pullback: |u_2| is an arrow into |B|, so it’s necessarily of the form |\langle v_1, v_2\rangle|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1| mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.
This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.
We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|
The simplicity and compactness of this proof demonstrates why I like representability.
Introduction
It is not uncommon for universal quantification to be described as (potentially) infinite conjunction1. Quoting Wikipedia’s Quantifier_(logic) page (my emphasis):
For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).
The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. For infinite domains of discourse, the equivalences are similar.
While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.
What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.
So what is the difference between universal quantification and infinite conjunction? Well, the most obvious difference is that infinite conjunction is indexed by some (meta-theoretic) set that doesn’t have anything to do with the domain the universal quantifier quantifies over. However, even if these sets happened to coincide2, there are still differences between universal quantification and infinite conjunction. The key is that universal quantification requires the predicate being quantified over to hold uniformly, while infinite conjunction does not. It just so happens that for the standard set-theoretic semantics of classical first-order logic this “uniformity” constraint is degenerate. However, even for classical first-order logic, this notion of uniformity will be relevant.
Classical Semantic View
I want to start in the context where this identification is closest to being true, so I can show where the idea comes from. The summary of this section is that the standard, classical, set-theoretic semantics of universal quantification is equivalent to an infinitary generalization of the semantics of conjunction. The issue is “infinitary generalization of the semantics of conjunction” isn’t the same as “semantics of infinitary conjunction”.
The standard set-theoretic semantics of classical first-order logic interprets each formula, |\varphi|, as a subset of |D^{\mathsf{fv}(\varphi)}| where |D| is a given domain set and |\mathsf{fv}| computes the (necessarily finite) set of free variables of |\varphi|. Traditionally, |D^{\mathsf{fv}(\varphi)}| would be identified with |D^n| where |n| is the cardinality of |\mathsf{fv}(\varphi)|. This involves an arbitrary mapping of the free variables of |\varphi| to the numbers |1| to |n|. The semantics of a formula then becomes an |n|-ary set-theoretic relation.
The interpretation of binary conjunction is straightforward:
\[\den{\varphi \land \psi} = \den{\varphi} \cap \den{\psi}\]
where |\den{\varphi}| stands for the interpretation of the formula |\varphi|. To be even more explicit, I should index this notation by a structure which specifies the domain, |D|, as well as the interpretations of any predicate or function symbols, but we’ll just consider this fixed but unspecified.
The interpretation of universal quantification is more complicated but still fairly straightforward:
\[\den{\forall x.\varphi} = \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\}\]
Set-theoretically, we have:
\[\begin{align} \bar z \in \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \iff & \forall d \in D. \bar z \in \left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \\ \iff & \forall d \in D. \exists \bar y \in \den{\varphi}. \bar z = \bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \land \bar y(x) = d \\ \iff & \forall d \in D. \bar z[x \mapsto d] \in \den{\varphi} \end{align}\]
where |f[x \mapsto c]| extends a function |f \in D^{S}| to a function in |D^{S \cup \{x\}}| via |f[x \mapsto c](v) = \begin{cases}c, &\textrm{ if }v = x \\ f(v), &\textrm{ if }v \neq x\end{cases}|. The final |\iff| arises because |\bar z[x \mapsto d]| is the unique function which extends |\bar z| to the desired domain such that |x| is mapped to |d|. Altogether, this illustrates our desired semantics of the interpretation of |\forall x.\varphi| being the interpretations of |\varphi| which hold when |x| is interpreted as any element of the domain.
This demonstrates the summary that the semantics of quantification is an infinitary version of the semantics of conjunction, as |\bigcap| is an infinitary version of |\cap|. But even here there are substantial cracks in this perspective.
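Before examining those cracks, here is a small executable rendering of the semantics just described. This is a sketch of my own: environments are represented as functions, only the connectives discussed are included, and the names are mine.

-- Interpreting formulas as their sets of satisfying environments over a
-- finite domain. (Illustrative sketch; not from the original.)
type Env a = String -> a    -- assignment of domain elements to variables
type Den a = Env a -> Bool  -- a formula's interpretation: a predicate on
                            -- environments, i.e. a set of environments

-- Binary conjunction is intersection of interpretations.
denAnd :: Den a -> Den a -> Den a
denAnd p q env = p env && q env

-- Universal quantification is a domain-indexed intersection: env[x := d]
-- must land in the interpretation for every d in the domain.
denForall :: [a] -> String -> Den a -> Den a
denForall dom x p env = all (\d -> p (extend env x d)) dom
  where extend e v d u = if u == v then d else e u

-- e.g. denForall [0..3] "x" (\env -> env "x" >= (0 :: Int)) (const 0) == True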
Infinitary Logic
The first problem is that we don’t have an infinitary conjunction, so saying universal quantification is essentially infinitary conjunction doesn’t make sense. However, it’s easy enough to formulate the syntax and semantics of infinitary conjunction (assuming we have a meta-theoretic notion of sets).
Syntactically, for a (meta-theoretic) set |I| and an |I|-indexed family of formulas |\{\varphi_i\}_{i \in I}|, we have the infinitary conjunction |\bigwedge_{i \in I} \varphi_i|.
The set-theoretic semantics of this connective is a direct generalization of the binary conjunction case:
\[\bigden{\bigwedge_{i \in I}\varphi_i} = \bigcap_{i \in I}\den{\varphi_i}\]
If |I = \{1,2\}|, we recover exactly the binary conjunction case.
Equipped with a semantics of actual infinite conjunction, we can compare to the semantics of universal quantification case and see where things go wrong.
The first problem is that it makes no sense to choose |I| to be |D|. The formula |\bigwedge_{i \in I} \varphi_i| can be interpreted with respect to many different domains. So any particular choice of |D| would be wrong for most semantics. This is assuming that our syntax’s meta-theoretic sets were the same as our semantics’ meta-theoretic sets, which need not be the case at all3.
An even bigger problem is that infinitary conjunction expects a family of formulas while universal quantification has just one. This is one facet of the uniformity I mentioned. Universal quantification has one formula that is interpreted a single way (with respect to the given structure). The infinitary intersection expression is computing a set out of this singular interpretation. Infinitary conjunction, on the other hand, has a family of formulas which need have no relation to each other. Each of these formulas is independently interpreted, and then all those separate interpretations are combined with an infinitary intersection. The problem we have is that there’s generally no way to take a formula |\varphi| with free variable |x| and an element |d \in D| and make a formula |\varphi_d| with |x| not free such that |\bar y[x \mapsto d] \in \den{\varphi} \iff \bar y \in \den{\varphi_d}|. A simple cardinality argument shows that there are only countably many (finitary) formulas, but there are plenty of uncountable domains. This is why |\omega|-inconsistency is possible. We can easily have elements in the domain which cannot be captured by any formula.
Syntactic View
Instead of taking a semantic view, let’s take a syntactic view of universal quantification and infinitary conjunction, i.e. let’s compare the rules that characterize them. As before, the first problem we have is that traditional first-order logic does not have infinitary conjunction, but we can easily formulate what the rules would be.
The elimination rules are superficially similar but have subtle but important distinctions:
\[\frac{\Gamma \vdash \forall x.\varphi}{\Gamma \vdash \varphi[x \mapsto t]}\forall E,t \qquad \frac{\Gamma \vdash \bigwedge_{i \in I} \varphi_i}{\Gamma \vdash \varphi_j}{\wedge}E,j\] where |t| is a term, |j| is an element of |I|, and |\varphi[x \mapsto t]| corresponds to syntactically substituting |t| for |x| in |\varphi| in a capture-avoiding way. A first, not-so-subtle distinction is that if |I| is an infinite set, then |\bigwedge_{i \in I}\varphi_i| is an infinitely large formula. Another pretty obvious issue is that universal quantification is restricted to instantiating terms, while |I| stands for either an arbitrary (meta-theoretic) set or it may stand for some particular (meta-theoretic) set, e.g. |\mathbb N|. Either way, it is typically not the set of terms of the logic.
Arguably, this isn’t an issue since the claim isn’t that every infinite conjunction corresponds to a universal quantification, but only that universal quantification corresponds to some infinite conjunction. The set of terms is a possible choice for |I|, so that shouldn’t be a problem. Well, whether it’s a problem or not depends on how you set up the syntax of the language. In my preferred way of handling the syntax of logical formulas, I index each formula by the set of free variables that may occur in that formula. This means the set of terms varies with the set of possible free variables. Writing |\vdash_V \varphi| to mean |\varphi| is well-formed and provable in a context with free variables |V|, then we would want the following rule:
\[\frac{\vdash_V \varphi}{\vdash_U \varphi}\] where |V \subseteq U|. This simply states that if a formula is provable, it should remain provable even if we add more (unused) free variables. This causes a problem with having an infinitary conjunction indexed by terms. Writing |\mathsf{Term}(V)| for the set of terms with (potential) free variables in |V|, then while |\vdash_V \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t| might be okay, the rule above would also give us |\vdash_U \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t|, which would still hold but would no longer correspond to universal quantification in a context with free variables in |U|. This really makes a difference. For example, for many theories, such as the usual presentation of ZFC, |\mathsf{Term}(\varnothing) = \varnothing|, i.e. there are no closed terms. As such, |\vdash_\varnothing \forall x.\bot| is neither provable (which we wouldn’t expect it to be) nor refutable without additional axioms. On the other hand, |\bigwedge_{i \in \varnothing}\bot| is |\top| and thus trivially provable. If we consider |\vdash_{\{y\}} \forall x.\bot| next, it becomes refutable. This doesn’t contradict our earlier rule about adding free variables because |\vdash_\varnothing \forall x.\bot| wasn’t provable and so the rule says nothing. On the other hand, that rule does require |\vdash_{\{y\}} \bigwedge_{i \in \varnothing}\bot| to be provable, and it is. Of course, it no longer corresponds to |\forall x.\bot| with this set of free variables. The putative corresponding formula would be |\bigwedge_{i \in \{y\}}\bot| which is indeed refutable.
With the setup above, we can’t get the elimination rule for |\bigwedge| to correspond to the elimination rule for |\forall|, because there isn’t a singular set of terms. However, a more common if less clean approach is to allow all free variables all the time, i.e. to fix a single countably infinite set of variables once and for all. This would “resolve” this problem.
The differences in the introduction rules are more stark. The rules are:
\[\frac{\Gamma \vdash \varphi \quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\forall I \qquad \frac{\left\{\Gamma \vdash \varphi_i \right\}_{i \in I}}{\Gamma \vdash \bigwedge_{i \in I}\varphi_i}{\wedge}I\]
Again, the most blatant difference is that (when |I| is infinite) |{\wedge}I| corresponds to an infinitely large derivation. Again, the uniformity aspects show through. |\forall I| requires a single derivation that will handle all terms, whereas |{\wedge}I| allows a different derivation for each |i \in I|.
We don’t run into the same issue as in the semantic view with needing to turn elements of the domain into terms/formulas. Given a formula |\varphi| with free variable |x|, we can easily make a formula |\varphi_t| for every term |t|, namely |\varphi_t = \varphi[x \mapsto t]|. We won’t have the issue that leads to |\omega|-inconsistency because |\forall x.\varphi| is derivable from |\bigwedge_{t \in \mathsf{Term}(V)}\varphi[x \mapsto t]|. Of course, the reason this is true is that one of the terms in |\mathsf{Term}(V)| will be a variable not occurring in |\Gamma|, allowing us to derive the premise of |\forall I|. On the other hand, if we choose |I = \mathsf{Term}(\varnothing)|, i.e. only consider closed terms, which is what the |\omega| rule in arithmetic is doing, then we definitely can get |\omega|-inconsistency-like situations, most notably in the case of theories, like ZFC, which have no closed terms.
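For reference, the |\omega| rule just mentioned is exactly such a closed-term-indexed rule for arithmetic, where the closed terms are the numerals |\bar n|: \[\frac{\{\Gamma \vdash \varphi[x \mapsto \bar n]\}_{n \in \mathbb N}}{\Gamma \vdash \forall x.\varphi}\omega\]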
Constructive View
A constructive perspective allows us to accentuate the contrast between universal quantification and infinitary conjunction even more as well as bring more clarity to the notion of uniformity.
We’ll start with the BHK interpretation of Intuitionistic logic and specifically a realizability interpretation. For this, we’ll allow infinitary conjunction only for |I = \mathbb N|.
I’ll write |n\textbf{ realizes }\varphi| for the statement that the natural number |n| realizes the formula |\varphi|. As in the linked articles, we’ll need a computable pairing function which computably encodes a pair of natural numbers as a natural number. I’ll just write this using normal pairing notation, i.e. |(n,m)|. We’ll also need Gödel numbering to computably map a natural number |n| to a computable function |f_n|.
\[\begin{align} (n_0, n_1)\textbf{ realizes }\varphi_0 \land \varphi_1 \quad & \textrm{if and only if} \quad n_0\textbf{ realizes }\varphi_0\textrm{ and } n_1\textbf{ realizes }\varphi_1 \\ n\textbf{ realizes }\forall x.\varphi \quad & \textrm{if and only if}\quad \textrm{for all }m, f_n(m)\textbf{ realizes }\varphi[x \mapsto m] \\ (k, n_k)\textbf{ realizes }\varphi_0 \lor \varphi_1 \quad & \textrm{if and only if} \quad k \in \{0, 1\}\textrm{ and }n_k\textbf{ realizes }\varphi_k \\ n\textbf{ realizes }\neg\varphi \quad & \textrm{if and only if} \quad\textrm{there is no }m\textrm{ such that }m\textbf{ realizes }\varphi \end{align}\]
I included disjunction and negation in the above so I could talk about the Law of the Excluded Middle. Via the above interpretation, given any formula |\varphi| with free variable |x|, the meaning of |\forall x.(\varphi\lor\neg\varphi)| would be a computable function which for each natural number |m| produces a bit indicating whether or not |\varphi[x \mapsto m]| holds. The Law of Excluded Middle holding would thus mean every such formula is computationally decidable, which we know isn’t the case. For example, choose |\varphi| to be the formula which asserts that the |x|-th Turing machine halts.
This example illustrates the uniformity constraint. Assuming a traditional, classical meta-language, e.g. ZFC, then it is the case that |(\varphi\lor\neg\varphi)[x \mapsto m]| is realized for each |m| in the case where |\varphi| is asserting the halting of the |x|-th Turing machine4. But this interpretation of universal quantification requires not only that the quantified formula holds for all naturals, but also that we can computably find this out.
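Spelling out that example in code (a sketch of my own; halts is a hypothetical oracle, which is exactly what cannot be implemented):

-- A realizer of |\forall x.(\varphi\lor\neg\varphi)|, with |\varphi(x)| =
-- "the x-th Turing machine halts", would be a total computable function of
-- this type: for each n it picks the disjunct that holds.
decideHalting :: Integer -> Either () ()
decideHalting n =
  if halts n then Left ()   -- evidence for \varphi(n)
             else Right ()  -- (degenerate) evidence for \neg\varphi(n)
  where
    -- Hypothetical halting oracle; no total computable definition exists,
    -- which is why this realizer cannot exist.
    halts :: Integer -> Bool
    halts = error "uncomputable"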
It’s clear that trying to formulate a notion of infinitary conjunction with regards to realizability would require using something other than natural numbers as realizers if we just directly generalize the finite conjunction case. For example, we might use potentially infinite sequences of natural numbers as realizers. Regardless, the discussion of the previous example makes it clear an interpretation of infinitary conjunction can’t be done in standard computability5, while, obviously, universal quantification can.
Categorical View
The categorical semantics of universal quantification and conjunction are quite different which also suggests that they are not related, at least not in some straightforward way.
One way to get to categorical semantics is to restate traditional, set-theoretic semantics in categorical terms. Traditionally, the semantics of a formula is a subset of a product of the domain set, with one factor for each free variable. Categorically, that suggests we want finite products, and the categorical semantics of a formula should be a subobject of a product of some object representing the domain.
Conjunction is traditionally represented via intersection of subsets, and categorically we form the intersection of subobjects via pulling back. So to support finite conjunctions, we need our category to additionally have finite pullbacks of monomorphisms. Infinitary conjunctions simply require infinitely wide pullbacks of monomorphisms. However, we can start to see some cracks here. What does it mean for a pullback to be infinitely wide? It means the obvious thing; namely, that we have an infinite set of monomorphisms sharing a codomain, and we’ll take the limit of this diagram. The key here, though, is “set”. Regardless of whatever the objects of our semantic category are, the infinitary conjunctions are indexed by a set.
To talk about the categorical semantics of universal quantification, we need to bring to the foreground some structure that we have been leaving – and traditionally accounts do leave – in the background. Before, I said the semantics of a formula, |\varphi|, depends on the free variables in that formula, e.g. if |D| is our domain object, then the semantics of a formula with three free variables would be a subobject of |\prod_{v \in \mathsf{fv}(\varphi)}D \cong D\times D \times D| which I’ll continue to write as |D^{\mathsf{fv}(\varphi)}| though now it will be interpreted as a product rather than a function space, i.e. we interpret this notation as a power. For |\mathbf{Set}|, this makes no difference. It would be more accurate to say that a formula can be given semantics in any product of the domain object indexed by any superset of the free variables. This is just to say that a formula doesn’t need to use every free variable that is available. Nevertheless, even if it is induced by the same formula, a subobject of |D^{\mathsf{fv}(\varphi)}| is a different subobject than a subobject of |D^{\mathsf{fv}(\varphi) \cup \{u\}}| where |u| is a variable not free in |\varphi|, so we need a way of relating the semantics of formulas considered with respect to different sets of free variables.
To do this, we will formulate a category of contexts and index our semantics by it. Fix a category |\mathcal C| and an object |D| of |\mathcal C|. Our category of contexts, |\mathsf{Ctx}|, will be the full subcategory of |\mathcal C| with objects of the form |D^S| where |S| is a finite subset of |V|, a fixed set of variables. We’ll assume these products exist, though typically we’ll just assume that |\mathcal C| has all finite products. From here, we use the |\mathsf{Sub}| functor. |\mathsf{Sub} : \mathsf{Ctx}^{op} \to \mathbf{Pos}| maps an object of |\mathsf{Ctx}| to the poset of its subobjects as objects of |\mathcal C|6. Now an arrow |f : D^{\{x,y,z,w\}} \to D^{\{x,y,z\}}| would induce a monotonic function |\mathsf{Sub}(f) : \mathsf{Sub}(D^{\{x,y,z\}}) \to \mathsf{Sub}(D^{\{x,y,z,w\}})|. This is defined for each subobject by pulling back a representative monomorphism of that subobject along |f|. Arrows of |\mathsf{Ctx}| are the semantic analogues of substitutions, and |\mathsf{Sub}(f)| applies these “substitutions” to the semantics of formulas.
Universal quantification is then characterized as the (indexed) right adjoint (Galois connection in this context) of |\mathsf{Sub}(\pi^x)| where |\pi^x : D^S \to D^{S \setminus \{x\}}| is just projection. The indexed nature of this adjoint leads to Beck-Chevalley conditions reflecting the fact universal quantification should respect substitution. |\mathsf{Sub}(\pi^x)| corresponds to adding |x| as a new, unused free variable to a formula. Let |U| be a subobject of |D^{S \setminus \{x\}}| and |V| a subobject of |D^S|. Furthermore, write |U \sqsubseteq U’| to indicate that |U| is a subobject of the subobject |U’|, i.e. that the monos that represent |U| factor through the monos that represent |U’|. The adjunction then states: \[\mathsf{Sub}(\pi^x)(U) \sqsubseteq V\quad \textrm{if and only if}\quad U \sqsubseteq \forall_x(V)\] The |\implies| direction is a fairly direct semantic analogue of the |\forall I| rule: \[\frac{\Gamma \vdash \varphi\quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\] Indeed, it is easy to show that the converse of this rule is derivable with |\forall E| validating the semantic “if and only if”. To be clear, the full adjunction is natural in |U| and |V| and indexed, effectively, in |S|.
Incidentally, we’d also want the semantics of infinite conjunctions to respect substitution, so they too have a Beck-Chevalley condition they satisfy and give rise to an indexed right adjoint.
It’s hard to even compare the categorical semantics of infinitary conjunction and universal quantification, let alone conflate them, even when |\mathcal C = \mathbf{Set}|. This isn’t too surprising as these semantics work just fine for constructive logics where, as illustrated earlier, these can be semantically distinct. As mentioned, both of these constructs can be described by indexed right adjoints. However, they are adjoints between very different indexed categories. If |\mathcal M| is our indexed category (above it was |\mathsf{Sub}|), then we’ll have |I|-indexed products if |\Delta_{\mathcal M} : \mathcal M \to [DI, -] \circ \mathcal M| has an indexed right adjoint where |D : \mathbf{Set} \to \mathbf{cat}| is the discrete (small) category functor. For |\mathcal M| to have universal quantification, we need an indexed right adjoint to an indexed functor |\mathcal M \circ \mathsf{cod} \circ \iota \to \mathcal M \circ \mathsf{dom} \circ \iota| where |\iota : s(\mathsf{Ctx}) \hookrightarrow \mathsf{Ctx}^{\to}| is the full subcategory of the arrow category |\mathsf{Ctx}^{\to}| consisting of just the projections.
Conclusion
My hope is that the preceding makes it abundantly clear that viewing universal quantification as some kind of special “infinite conjunction” is not sensible even approximately. To do so is to seriously misunderstand universal quantification. Most discussions “equating” them involve significant conflations of syntax and semantics where a specific choice of domain is fixed and elements of that specific domain are used as terms.
A secondary goal was to illustrate an aspect of logic from a variety of perspectives and illustrate some of the concerns in meta-logical reasoning. For example, quantifiers and connectives are syntactical concepts and thus can’t depend on the details of the semantic domain. As another example, better perspectives on quantifiers and connectives are more robust to weakening the logic. I’d say this is especially true when going from classical to constructive logic. Structural proof theory and categorical semantics are good at formulating logical concepts modularly so that they still make sense in very weak logics.
Unfortunately, the traditional trend towards minimalism strongly pushes in the other direction, leading to the exploitation of every symmetry and coincidence a stronger logic (namely classical logic) provides and producing definitions that don’t survive even mild weakening of the logic⁷. The attempt to identify universal quantification with infinite conjunction here takes that impulse too far and, as demonstrated, doesn’t even work in classical logic. While there’s certainly value in recognizing redundancy, I personally find minimizing logical assumptions far more important and valuable than minimizing (primitive) logical connectives.
“Universal statements are true if they are true for every individual in the world. They can be thought of as an infinite conjunction,” from some random AI lecture notes. You can find many others.↩︎
The domain doesn’t even need to be a set.↩︎
For example, we may formulate our syntax in a second-order arithmetic identifying our syntax’s meta-theoretic sets with unary predicates, while our semantics is in ZFC. Just from cardinality concerns, we know that there’s no way of injectively mapping every ZFC set to a set of natural numbers.↩︎
It’s probably worth pointing out that not only will this classical meta-language not tell us whether it’s |\varphi[x \mapsto m]| or |\neg\varphi[x \mapsto m]| that holds for every specific |m|, but it’s easy to show (assuming consistency of ZFC) that |\varphi[x \mapsto m]| is independent of ZFC for specific values of |m|. For example, it’s easy to make a Turing machine that halts if and only if it finds a contradiction in the theory of ZFC.↩︎
Interestingly, for some models of computation, e.g. ones based on Turing machines, infinitary disjunction, or, specifically, |\mathbb N|-ary disjunction, is not problematic. Given an infinite sequence of halting Turing machines, we can interleave their execution such that every Turing machine in the sequence will halt at some finite time. Accordingly, extending the definition of disjunction in realizability to the |\mathbb N|-ary case does not run into any of the issues that |\mathbb N|-ary conjunction has: we just let |k| be an arbitrary natural number instead of an element of |\{0, 1\}|.↩︎
This is a place we could generalize the categorical semantics further. There’s no reason we need to consider this particular functor. We could consider other functors from |\mathsf{Ctx}^{op} \to \mathbf{Pos}|, i.e. other indexed |(0,1)|-categories. This setup is called a hyperdoctrine.↩︎
The most obvious example of this is defining quantifiers and connectives in terms of other connectives particularly when negation is involved. A less obvious example is the overwhelming focus on |\mathbf 2|-valued semantics when classical logic naturally allows arbitrary Boolean-algebra-valued semantics.↩︎
The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.
Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.
If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.
Groups as Models of a Lawvere Theory
A group is a model of an equational theory. This means a group is described by a set equipped with a collection of operations that must satisfy some equations. So we’d have a set, |G|, and operations |\mathtt{e} : () \to G|, |\mathtt{i} : G \to G|, and |\mathtt{m} : G \times G \to G|. These operations satisfy the equations, \[ \begin{align} \mathtt{m}(\mathtt{m}(x, y), z) = \mathtt{m}(x, \mathtt{m}(y, z)) \\ \mathtt{m}(\mathtt{e}(), x) = x = \mathtt{m}(x, \mathtt{e}()) \\ \mathtt{m}(\mathtt{i}(x), x) = \mathtt{e}() = \mathtt{m}(x, \mathtt{i}(x)) \end{align} \] universally quantified over |x|, |y|, and |z|.
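As a brief aside for programmers, this packaging of a carrier with operations satisfying equations is exactly what a lawful type class interface looks like. Here’s a minimal sketch in Haskell (the class is my own illustration for this article, not a standard library class):

```haskell
-- A group: a carrier type g with interpretations of e, i, and m.
class Group g where
  e :: g            -- the 0-ary operation e : () -> G
  i :: g -> g       -- the unary operation i : G -> G
  m :: g -> g -> g  -- the binary operation m : G * G -> G

-- The equations become laws that instances must satisfy
-- (GHC does not and cannot check these):
--   m (m x y) z == m x (m y z)
--   m e x == x          and   m x e == x
--   m (i x) x == e      and   m x (i x) == e
```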
These equations can easily be represented by commutative diagrams, i.e. equations of compositions of arrows, in any category with finite products of an object, |G|, with itself. For example, the left inverse law becomes: \[ \mathtt{m} \circ (\mathtt{i} \times id_G) = \mathtt{e} \circ {!}_G \] where |{!}_G : G \to 1| is the unique arrow into the terminal object corresponding to the |0|-ary product of copies of |G|.
One nice thing about this categorical description is that we can now talk about a group object in any category with finite products. Even better, we can make this pattern describing what a group is first-class. The (Lawvere) theory of a group is a (small) category, |\mathcal{T}_{\mathbf{Grp}}|, whose objects are an object |\mathsf{G}| and all its powers, |\mathsf{G}^n|, where |\mathsf{G}^0 = 1| and |\mathsf{G}^{n+1} = \mathsf{G} \times \mathsf{G}^n|. The arrows consist of the relevant projection and tupling operations, the three arrows above, |\mathsf{m} : \mathsf{G}^2 \to \mathsf{G}^1|, |\mathsf{i} : \mathsf{G}^1 \to \mathsf{G}^1|, |\mathsf{e} : \mathsf{G}^0 \to \mathsf{G}^1|, and all composites that could be made with these arrows. See my previous article for a more explicit description of this, but it should be fairly intuitive.
An actual group is then, simply, a finite-product-preserving functor |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}|. It must be finite-product-preserving so that the image of |\mathsf{m}| actually gets sent to a binary function on the carrier and not a function with some arbitrary domain. The category, |\mathbf{Grp}|, of groups and group homomorphisms is equivalent to the category |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Grp}}}| which is defined to be the full subcategory of the category of functors |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}| consisting of the functors which preserve finite products. While we’ll not explore it more here, we could use any category with finite products as the target, not just |\mathbf{Set}|. For example, we’ll show that |\mathbf{Grp}| has finite products, and in fact all limits and colimits, so we can talk about the models of the theory of groups in the category of groups. This turns out to be equivalent to the category of Abelian groups via the well-known Eckmann-Hilton argument.
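Concretely, finite product preservation forces a model’s value at |\mathsf{G}^n| to be the |n|-th power of its value at |\mathsf{G}^1|. Here’s a sketch of the integers-under-addition model with the generating arrows landing as functions between tuples (the names are mine):

```haskell
-- The model sends G^0, G^1, G^2 to (), Integer, and (Integer, Integer).
interpE :: () -> Integer                  -- image of e : G^0 -> G^1
interpE () = 0

interpI :: Integer -> Integer             -- image of i : G^1 -> G^1
interpI = negate

interpM :: (Integer, Integer) -> Integer  -- image of m : G^2 -> G^1
interpM (x, y) = x + y
```

Without product preservation, nothing would force the image of |\mathsf{G}^2| to be |(Integer, Integer)|, and the image of |\mathsf{m}| could be a function from some unrelated set.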
A Bit of Organization
First, a construction that will become even more useful later. Given any category, |\mathcal{C}|, we define |\mathcal{C}^{\times}|, or, more precisely, an inclusion |\sigma : \mathcal{C} \hookrightarrow \mathcal{C}^{\times}| to be the free category-with-finite-products generated from |\mathcal{C}|. Its universal property is: given any functor |F : \mathcal{C} \to \mathcal{E}| into a category-with-finite-products |\mathcal E|, there exists a unique finite-product-preserving functor |\bar{F} : \mathcal{C}^{\times} \to \mathcal E| such that |F = \bar{F} \circ \sigma|.
An explicit construction of |\mathcal{C}^{\times}| is the following. Its objects consist of (finite) lists of objects of |\mathcal{C}| with concatenation as the categorical product and the empty list as the terminal object. The arrows are tuples with a component for each object in the codomain list. Each component is a pair of an index into the domain list and an arrow from the corresponding object in the domain list to the object in the codomain list for this component. For example, the arrow |[A, B] \to [B, A]| would be |((1, id_B), (0, id_A))|. The idea is that |((k_1, f_1), \dots, (k_n, f_n))| will be interpreted as |\langle f_1 \circ \pi_{k_1}, \dots, f_n \circ \pi_{k_n}\rangle| where |\pi_{k_i}| is the projection onto the |k_i|-th component of the input. Identity and composition are straightforward. |\sigma| then maps each object to a singleton list and each arrow |f| to |((0, f))|.
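This description transcribes directly into code. Here’s a sketch in Haskell (representation and names mine); it’s untyped in the sense that nothing enforces that the indices and base arrows line up with actual object lists:

```haskell
-- An arrow of C^x between object lists: one (index into the domain
-- list, base arrow) pair per component of the codomain list.
type FPArr arr = [(Int, arr)]

-- The identity on a domain list of length n, given an identity arrow
-- for each position.
fpId :: (Int -> arr) -> Int -> FPArr arr
fpId idAt n = [ (j, idAt j) | j <- [0 .. n - 1] ]

-- Composition: each component of g points at a component of f; follow
-- the index and compose the base arrows.
fpCompose :: (arr -> arr -> arr)  -- base category composition, g `o` f
          -> FPArr arr            -- g, applied second
          -> FPArr arr            -- f, applied first
          -> FPArr arr
fpCompose o g f = [ (k, h `o` fj) | (j, h) <- g, let (k, fj) = f !! j ]
```

For example, the swap arrow |[A, B] \to [B, A]| above is |[(1, idB), (0, idA)]| in this encoding.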
Like most free constructions, this construction completely ignores any finite products the original category may have had. In particular, we want the category |\mathcal{T}_{\mathbf{Set}} = \mathbf{1}^{\times}|, called the theory of a set. The fact that the one object of the category |\mathbf{1}| is terminal has nothing to do with its image via |\sigma| which is not the terminal object.
We now define the general notion of a (Lawvere) theory as a small category with finite products, |\mathcal{T}|, equipped with a finite-product-preserving, identity-on-objects functor |\mathcal{T}_{\mathbf{Set}} \to \mathcal{T}|. A morphism of (Lawvere) theories is a finite-product-preserving functor that preserves these inclusions a la: \[ \xymatrix { & \mathcal{T}_{\mathbf{Set}} \ar[dl] \ar[dr] & \\ \mathcal{T}_1 \ar[rr] & & \mathcal{T}_2 } \]
The identity-on-objects aspect of the inclusion of |\mathcal{T}_{\mathbf{Set}}| along with finite-product-preservation ensures that the only objects in |\mathcal{T}| are powers of a single object which we’ll generically call |\mathsf{G}|. This is sometimes called the “generic object”, though the term “generic object” has other meanings in category theory. To be clear, if |F| is an identity-on-objects functor, we’re not just saying |FX = X| for every object |X|, but that the object part of the functor is the identity function, i.e. if |F : \mathcal C \to \mathcal D|, then |\mathcal C| and |\mathcal D| have exactly the same objects.
A model of a theory (in |\mathbf{Set}|) is then simply a finite-product-preserving functor into |\mathbf{Set}|. |\mathbf{Mod}_{\mathcal{T}}| is the full subcategory of functors from |\mathcal{T} \to \mathbf{Set}| which preserve finite products. The morphisms of models are simply the natural transformations. As an exercise, you should show that for a natural transformation |\tau : M \to N| where |M| and |N| are two models of the same theory, |\tau_{\mathsf{G}^n} = \tau_{\mathsf{G}}^n|.
The Easy Categorical Constructions
This relatively simple definition of model already gives us a large swathe of results. An easy result in basic category theory is that (co)limits in functor categories are computed pointwise whenever the corresponding (co)limits exist in the codomain category. In our case, |\mathbf{Set}| has all (co)limits, so all categories of |\mathbf{Set}|-valued functors have all (co)limits and they are computed pointwise.
However, the (co)limit of finite-product-preserving functors into |\mathbf{Set}| may not be finite-product-preserving, so we don’t immediately get that |\mathbf{Mod}_{\mathcal{T}}| has all (co)limits (and that they are computed pointwise). That said, finite products are limits and limits commute with each other, so we do get that |\mathbf{Mod}_{\mathcal{T}}| has all limits and they are computed pointwise. Similarly, sifted colimits, which are colimits that commute with finite products in |\mathbf{Set}|, also exist and are computed pointwise in |\mathbf{Mod}_{\mathcal{T}}|. Sifted colimits include the better-known filtered colimits, which commute with all finite limits.
I’ll not elaborate on sifted colimits. We’re here for (finite) coproducts, and, as you’ve probably already guessed, coproducts are not sifted colimits.
When the Coproduct of Groups is Easy
There is one class of groups whose coproduct is easy to compute for general reasons: the free groups. The free group construction, like most “free constructions”, is a left adjoint and left adjoints preserve colimits, so the coproduct of two free groups is just the free group on the coproduct, i.e. disjoint union, of their generating sets. We haven’t defined the free group yet, though.
Normally, the free group construction would be defined as left adjoint to the underlying set functor. We have a very straightforward way to define the underlying set functor. Define |U : \mathbf{Mod}_{\mathcal T} \to \mathbf{Set}| as |U(M) = M(\mathsf{G}^1)| and |U(\tau) = \tau_{\mathsf{G}^1}|. Identifying |\mathsf{G}^1| with the functor |\mathsf G : \mathbf{1} \to \mathcal{T}| we have |U(M) = M \circ \mathsf{G}| giving a functor |\mathbf{1} \to \mathbf{Set}| which we identify with a set. The left adjoint to precomposition by |\mathsf{G}| is the left Kan extension along |\mathsf{G}|.
We then compute |F(S) = \mathrm{Lan}_{\mathsf{G}}(S) \cong \int^{{*} : \mathbf{1}} \mathcal{T}(\mathsf{G}({*}), {-}) \times S({*}) \cong \mathcal{T}(\mathsf{G}^1, {-}) \times S|. This is the left Kan extension and does form an adjunction, but not with the category of models, because the functor |F(S)| does not preserve finite products. We should have |F(S)(\mathsf{G}^n) \cong F(S)(\mathsf{G})^n|, but substituting in the definition of |F(S)| clearly does not satisfy this. For example, |F(\varnothing)(\mathsf{G}^0) \cong \mathcal{T}(\mathsf{G}^1, \mathsf{G}^0) \times \varnothing = \varnothing|, whereas preserving the empty product would require it to be a singleton.
We can and will show that the left Kan extension of a functor into |\mathbf{Set}| preserves finite products when the original functor did. Once we have that result, we can correct our definition of the free construction. We simply replace |S : \mathbf{1} \to \mathbf{Set}| with a functor that does preserve finite products, namely |\bar{S} : \mathbf{1}^{\times} \to \mathbf{Set}|. Of course, |\mathbf{1}^{\times}| is exactly our definition of |\mathcal{T}_{\mathbf{Set}}|. We see now that a model of |\mathcal{T}_{\mathbf{Set}}| is the same thing as a set, hence the name. Indeed, we have an equivalence of categories between |\mathbf{Set}| and |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Set}}}|. (More generally, this theory is called “the theory of an object” as we may consider models in categories other than |\mathbf{Set}|, and we’ll still have this relation.)
The correct definition of |F| is |F(S) = \mathrm{Lan}_{\iota}(\bar S) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times \bar{S}(\mathsf{G}^n) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times S^n| where |\iota : \mathcal{T}_{\mathbf{Set}} \to \mathcal{T}| is the inclusion we give as part of the definition of a theory. We can also see |\iota| as |\bar{\mathsf{G}}|.
We can start to see the term algebra in this definition. An element of |F(S)| is a choice of |n|, an |n|-tuple of elements of |S|, and a (potentially compound) |n|-ary operation. We can think of an element of |\mathcal{T}(\mathsf{G}^n, {-})| as a term with |n| free variables which we’ll label with the elements of |S^n| in |F(S)|. The equivalence relation in the explicit construction of the coend allows us to swap projections and tupling morphisms from the term to the tuple of labels. For example, it equates a unary term paired with one label with a binary term paired with two labels but where the binary term immediately discards one of its inputs. Essentially, if you are given a unary term and two labels, you can either discard one of the labels or you can make the unary term binary by precomposing with a projection. Similarly for tupling.
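For the theory of groups specifically, these “terms with |n| free variables” have the obvious syntactic presentation. Here’s a sketch in Haskell reusing the |Group| class from earlier; the coend’s quotient by the group equations and by relabeling is left implicit:

```haskell
-- Group terms over variables drawn from s. An element of F(S) is an
-- equivalence class of such terms with variables labeled by S.
data Term s = Var s | E | I (Term s) | M (Term s) (Term s)

-- Freeness: a labeling of the variables in any group extends to an
-- evaluation of whole terms, i.e. a homomorphism out of F(S).
eval :: Group g => (s -> g) -> Term s -> g
eval var (Var s) = var s
eval _   E       = e
eval var (I t)   = i (eval var t)
eval var (M t u) = m (eval var t) (eval var u)
```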
It’s still not obvious this definition produces a functor which preserves finite products. As a lemma to help in the proof of that fact, we have a bit of coend calculus.
Lemma 1: Let |F \dashv U : \mathcal{D} \to \mathcal{C}| and |H : \mathcal D^{op} \times \mathcal{C} \to \mathcal{E}|. Then, |\int^C H(FC, C) \cong \int^D H(D, UD)| when one, and thus both, exist. Proof: \[ \begin{align} \mathcal{E}\left(\int^C H(FC, C), {-}\right) & \cong \int_C \mathcal{E}(H(FC, C), {-}) \tag{continuity} \\ & \cong \int_C \int_D [\mathcal{D}(FC, D), \mathcal{E}(H(D, C), {-})] \tag{Yoneda} \\ & \cong \int_C \int_D [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{adjunction} \\ & \cong \int_D \int_C [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{Fubini} \\ & \cong \int_D \mathcal{E}(H(D, UD), {-}) \tag{Yoneda} \\ & \cong \mathcal{E}\left(\int^D H(D, UD), {-}\right) \tag{continuity} \\ & \square \end{align} \]
Using the adjunctions |\Delta \dashv \times : \mathcal{C} \times \mathcal{C}\to \mathcal{C}| and |{!}_1 \dashv 1 : \mathbf{1} \to \mathcal{C}|, where we’re treating |1| as the functor |\mathbf{1}\to\mathcal{C}| which picks out a terminal object of |\mathcal{C}|, gives the following corollary.
Corollary 2: For any |H : \mathcal{C}^{op} \times \mathcal{C}^{op} \times \mathcal{C} \to \mathcal{E}|, \[\int^{C} H(C, C, C) \cong \int^{C_1}\int^{C_2} H(C_1, C_2, C_1 \times C_2)\] when both exist, and for any |H’ : \mathcal{C} \to\mathcal{E}|, |H’(1) \cong \int^C H’(C)|. The former allows us to combine two (co)ends into one. The latter reproduces a standard result about colimits over diagrams whose index category has a terminal object.
Now our theorem.
Theorem 3: Let |F : \mathcal{T}_1 \to \mathbf{Set}| and |J : \mathcal{T}_1 \to \mathcal{T}_2| where |\mathcal{T}_1| and |\mathcal{T}_2| have finite products. Then |\mathrm{Lan}_J(F)| preserves finite products if |F| does.
Proof: \[ \begin{flalign} \mathrm{Lan}_J(F)(X \times Y) & \cong \int^A \mathcal{T}_2(J(A), X \times Y) \times F(A) \tag{coend formula for left Kan extension} \\ & \cong \int^A \mathcal{T}_2(J(A), X) \times \mathcal{T}_2(J(A), Y) \times F(A) \tag{continuity} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1 \times A_2) \tag{Corollary 2} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1) \times F(A_2) \tag{finite product preservation} \\ & \cong \left(\int^{A_1}\mathcal{T}_2(J(A_1), X) \times F(A_1) \right) \times \left(\int^{A_2}\mathcal{T}_2(J(A_2), Y) \times F(A_2)\right) \tag{commutativity and cocontinuity of $\times$} \\ & \cong \mathrm{Lan}_J(F)(X) \times \mathrm{Lan}_J(F)(Y) \tag{coend formula for left Kan extension} \end{flalign} \] and for the 0-ary product case: \[ \begin{flalign} \mathrm{Lan}_J(F)(1) & \cong \int^A \mathcal{T}_2(J(A), 1) \times F(A) \tag{coend formula for left Kan extension} \\ & \cong \int^A 1 \times F(A) \tag{continuity} \\ & \cong 1 \times F(1) \tag{Corollary 2} \\ & \cong 1 \times 1 \tag{finite product preservation} \\ & \cong 1 \tag{1 is unit to $\times$} \end{flalign} \] |\square|
The Coproduct of Groups
To get general coproducts (and all colimits), we’ll show that |\mathbf{Mod}_{\mathcal{T}}| is a reflective subcategory of |[\mathcal{T}, \mathbf{Set}]|. Write |\iota : \mathbf{Mod}_{\mathcal{T}} \hookrightarrow [\mathcal{T}, \mathbf{Set}]|. If we had a functor |R| such that |R \dashv \iota|, then |\iota| being full and faithful implies |\varepsilon : R \circ \iota \cong Id| which allows us to quickly produce colimits in the subcategory via |\int^I D(I) \cong R\int^I \iota D(I)|. It’s easy to verify that |R\int^I \iota D(I)| has the appropriate universal property to be |\int^I D(I)|.
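Spelling out that verification: for any model |M|, \[ \begin{align} \mathbf{Mod}_{\mathcal{T}}\left(R\int^I \iota D(I), M\right) & \cong [\mathcal{T}, \mathbf{Set}]\left(\int^I \iota D(I), \iota M\right) \tag{adjunction} \\ & \cong \int_I [\mathcal{T}, \mathbf{Set}](\iota D(I), \iota M) \tag{continuity} \\ & \cong \int_I \mathbf{Mod}_{\mathcal{T}}(D(I), M) \tag{$\iota$ full and faithful} \end{align} \] which is exactly the universal property required of |\int^I D(I)| in |\mathbf{Mod}_{\mathcal{T}}|.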
We’ll compute |R| by composing two adjunctions. First, we have |\bar{({-})} \dashv \iota({-}) \circ \sigma : \mathbf{Mod}_{\mathcal{T}^{\times}} \to [\mathcal T, \mathbf{Set}]|. This is essentially the universal property of |\mathcal{T}^{\times}|. When |\mathcal{T}| has finite products, which, of course, we’re assuming, we can use the universal property of |\mathcal{T}^{\times}| to factor |Id_{\mathcal{T}}| into |Id = \bar{Id} \circ \sigma|. The second adjunction is then |\mathrm{Lan}_{\bar{Id}} \dashv {-} \circ \bar{Id} : \mathbf{Mod}_{\mathcal{T}} \to \mathbf{Mod}_{\mathcal{T}^{\times}}|. To verify that these are well-defined, i.e. they produce finite-product-preserving functors, we argue as follows. The left adjoint sends finite-product-preserving functors to finite-product-preserving functors via Theorem 3. The right adjoint is the composition of finite-product-preserving functors.
The composite of the right adjoints is |\iota({-} \circ \bar{Id}) \circ \sigma = \iota({-}) \circ \bar{Id} \circ \sigma = \iota({-})|. The composite of the left adjoints is \[ \begin{align} R(F) & = \mathrm{Lan}_{\bar{Id}}(\bar{F}) \\ & \cong \int^X \mathcal{T}(\bar{Id}(X), {-}) \times \bar{F}(X) \\ & \cong \int^X \mathcal{T}\left(\prod_{i=1}^{\lvert X\rvert} X_i, {-}\right) \times \prod_{i=1}^{\lvert X \rvert} F(X_i) \end{align} \] where we view the list |X : \mathcal{T}^{\times}| as a |\lvert X\rvert|-tuple with components |X_i|.
This construction of the reflector, |R|, is quite similar to the free construction. The main difference is that here we factor |Id| via |\mathcal{T}^{\times}| where there we factored |\mathsf{G} : \mathbf{1} \to \mathcal{T}| via |\mathbf{1}^{\times} = \mathcal{T}_{\mathbf{Set}}|.
Let’s now explicitly describe the coproducts via |R|. As a warm-up, we’ll consider the initial object, i.e. the nullary coproduct. We consider |R(\Delta 0)|. Because |0 \times S = 0|, the only summands of the coend that aren’t |0| are those with |\lvert X \rvert = 0|, so the underlying set of the coend reduces to |\mathcal{T}(\mathsf{G}^0, \mathsf{G}^1)|, i.e. the nullary terms. For groups, this is just the unit element. For bounded lattices, it would be the two element set consisting of the top and bottom elements. For lattices without bounds, it would be the empty set. Of course, |R(\Delta 0)| matches |F(0)|, i.e. the free model on |0|.
Next, we consider two models |G| and |H|. First, we compute the coproduct of |G| and |H| as (plain) functors, which is just computed pointwise, i.e. |(G+H)(\mathsf{G}^n) = G(\mathsf{G}^n)+H(\mathsf{G}^n) \cong G(\mathsf{G^1})^n + H(\mathsf{G^1})^n|. Considering the case where |X_i = \mathsf{G}^1| for all |i| and where |\lvert X \rvert = n|, which subsumes all the other cases, we see we have a term with |n| free variables, each labelled by either an element of |G| or an element of |H|. If we normalized the term into a list of variables representing a product of variables, then we’d essentially have a word as described on the Wikipedia page for the free product. If we then only considered quotienting by the equivalences induced by projection and tupling, we’d have the free group on the disjoint union of the underlying sets of |G| and |H|. However, for |R|, we quotient also by the action of the other operations. The lists of objects with |X_i \neq \mathsf{G}^1| come in here to support equating non-unary operations. For example, a pair of the binary term |\mathsf{m}| and the 2-tuple of elements |(g_1, g_2)| for |g_1, g_2 \in U(G)| will be equated with the pair of the unary term |id| and the 1-tuple of elements |(g)| where |g = g_1 g_2| in |G|. Similarly for |H| and the other operations (and terms generally). Ultimately, the quotient identifies every element with a pair of a term that is a fully right-associated sequence of multiplications ending in a unit and a tuple of labels drawn from |U(G)| and |U(H)| in an alternating fashion. These are the reduced words in the Wikipedia article.
This, perhaps combined with a more explicit spelling out of the equivalence relation, should make it clear that this construction does actually correspond to the usual free product construction. The name “free product” is also made a bit clearer, as we are essentially building the free group on the disjoint union of the underlying sets of the inputs, and then quotienting that to get the result. While there are some categorical treatments of normalization, the normalization arguments used above were not guided by the category theory. The (underlying sets of the) models produced by the above |F| and |R| functors are big equivalence classes of “terms”. The above constructions provide no guidance for finding “good” representatives of those equivalence classes.
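As a concrete supplement (my own sketch, not part of the construction above), here is the multiplication of reduced words for the free product, reusing the |Group| class from earlier. It assumes both inputs are already reduced, i.e. alternating between the two sides and containing no unit letters:

```haskell
-- A letter comes from G or from H; a reduced word is an alternating
-- list of non-unit letters. The empty list is the unit of G * H.
data Letter g h = L g | R h

-- Multiply two reduced words by concatenating and cancelling at the seam.
mulWord :: (Group g, Eq g, Group h, Eq h)
        => [Letter g h] -> [Letter g h] -> [Letter g h]
mulWord xs ys = go (reverse xs) ys
  where
    -- The first argument is the left word reversed, so its head is at
    -- the seam. Merge same-side letters; drop them if they cancel.
    go (L a : rx) (L b : ys') =
      let c = m a b
      in if c == e then go rx ys' else reverse rx ++ (L c : ys')
    go (R a : rx) (R b : ys') =
      let c = m a b
      in if c == e then go rx ys' else reverse rx ++ (R c : ys')
    go rx ys' = reverse rx ++ ys'
```

The catch-all case covers mismatched sides (and empty words), and a cancellation recurses because it may expose another same-side pair at the seam.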
Conclusions
This was, of course, a very complex and round-about way of answering the title question. Obviously the real goal was illustrating these ideas and illustrating how “abstract” categorical reasoning can lead to relatively “concrete” results. Of course, these concrete constructions are derived from other concrete constructions, usually concrete constructions of limits and colimits in |\mathbf{Set}|. That said, category theory allows you to get a lot from a small collection of relatively simple concrete constructions. Essentially, category theory is like a programming language with a small set of primitives. You can write “abstract” programs in terms of that language, but once you provide an “implementation” for those primitives, all those “abstract” programs can be made concrete.
I picked (finite) coproducts, in particular, as they are where a bunch of complexity suddenly arises when studying algebraic objects categorically, but (finite) coproducts are still fairly simple.
For Lawvere theories, one thing to note is that the Lawvere theory is independent of the presentation. Any presentation of the axioms of a group would give rise to the same Lawvere theory. Of course, explicitly describing the category would end up requiring a presentation of the category anyway. Beyond Lawvere theories lie algebraic theories and algebraic categories, and further out essentially algebraic theories and categories. These extend to the multi-sorted case and then to the finite-limit-preserving case. The theory of categories, for example, cannot be presented as a Lawvere theory but is an essentially algebraic theory. There’s much more that can be said even about specifically Lawvere theories, both from a theoretical perspective, starting with monadicity, and from practical perspectives like algebraic effects.
Familiarity with the properties of functor categories, and especially categories of (co)presheaves, was behind many of these results, and many that I only mentioned in passing. It is always useful to learn more about categories of presheaves. That said, most of the theory works in an enriched context and often without too many assumptions. The fact that all we need to talk about models is for the codomains of the functors to have finite products allows quite broad application. We can talk about algebraic objects almost anywhere. For example, sheaves of rings, groups, etc. can equivalently be described as models of the theories of rings, groups, etc. in sheaves of sets.
Kan extensions unsurprisingly played a large role, as they almost always do when you’re talking about (co)presheaves. One of the motivations for me to make this article was a happy confluence of things I was reading leading to a nice, coend calculus way of describing and proving finite-product-preservation for free models.
Thinking about what exactly was going on around finite-product-preservation was fairly interesting. The incorrect definition of the free model functor could be corrected in a different (though, of course, ultimately equivalent) way. The key is to remember that the coend formula for the left Kan extension generally involves a copower and not a cartesian product. The copower for |\mathbf{Set}|-valued functors is different from the copower for finite-product-preserving |\mathbf{Set}|-valued functors. For a category with (arbitrary) coproducts, the copower corresponds to the coproduct of a constant family. We get |F(S) \cong \coprod_{S} \mathcal T(\mathsf{G}^1, {-})|, as is immediately evident from |F| being a left adjoint and a set |S| being the coproduct of |S|-many copies of |1|. For the purposes of this article, this would have been less than satisfying, as figuring out what coproducts were was the nominal point.
That said, it isn’t completely unsatisfying as this defines the free model in terms of a coproduct of, specifically, representables and those are more tractable. In particular, an easy and neat exercise is to work out what |\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-})| is. Just use Yoneda and work out what must be true of the mapping out property, and remember that the object you’re mapping into preserves finite products. Once you have finite coproducts described, you can get all the rest via filtered colimits.
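For the curious, the computation the exercise is after runs as follows (spoiler): for any model |M|, \[ \begin{align} \mathbf{Mod}_{\mathcal{T}}(\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-}), M) & \cong M(\mathsf{G}^n) \times M(\mathsf{G}^m) \tag{coproduct, then Yoneda} \\ & \cong M(\mathsf{G}^1)^n \times M(\mathsf{G}^1)^m \cong M(\mathsf{G}^{n+m}) \tag{finite product preservation} \\ & \cong \mathbf{Mod}_{\mathcal{T}}(\mathcal{T}(\mathsf{G}^{n+m}, {-}), M) \tag{Yoneda} \end{align} \] so, naturally in |M|, the mapping-out properties agree and |\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-}) \cong \mathcal{T}(\mathsf{G}^{n+m}, {-})|.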
This is a brief article about the notions of preserving, reflecting, and creating limits and, by duality, colimits. Preservation is relatively intuitive, but the distinction between reflection and creation is subtle.
Preservation of Limits
A functor, |F|, preserves limits when it takes limiting cones to limiting cones. As often happens in category theory texts, the notation focuses on the objects. You’ll often see things like |F(X \times Y) \cong FX \times FY|, but implied is that one direction of this isomorphism is the canonical morphism |\langle F\pi_1, F\pi_2\rangle|. To put it yet another way, in this example we require |F(X \times Y)| to satisfy the universal property of a product with the projections |F\pi_1| and |F\pi_2|.
Other than that subtlety, preservation is fairly intuitive.
Reflection of Limits versus Creation of Limits
A functor, |F|, reflects limits when whenever the image of a cone is a limiting cone, then the original cone was a limiting cone. For products this would mean that if we had a wedge |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B| with projections |p| and |q|.
A functor, |F|, creates limits when whenever the image of a diagram has a limit, then the diagram itself has a limit and |F| preserves the limiting cones. For products this would mean if |FX| and |FY| had a product, |FX \times FY|, then |X| and |Y| have a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.
Creation of limits implies reflection of limits since we can just ignore the apex of the cone. While creation is more powerful, often reflection is enough in practice as we usually have a candidate limit, i.e. a cone. Again, this is often not made too explicit.
Example
Consider the posets:
$$\xymatrix{ & & & c \\ X\ar@{}[r]|{\Large{=}} & a \ar[r] & b \ar[ur] \ar[dr] & \\ & & & d \save "1,2"."3,4"*+[F]\frm{} \restore } \qquad \xymatrix{ & & c \\ Y\ar@{}[r]|{\Large{=}} & b \ar[ur] \ar[dr] & \\ & & d \save "1,2"."3,3"*+[F]\frm{} \restore } \qquad \xymatrix{ & c \\ Z\ar@{}[r]|{\Large{=}} & \\ & d \save "1,2"."3,2"*+[F]\frm{} \restore }$$
Failure of reflection
Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d| mapping to |Y=\{b, c, d\}| where |a \mapsto b| and every other element maps to itself. Reflection fails because the cone |c \leftarrow a \to d| maps to the limiting cone |c \leftarrow b \to d|, i.e. |a| maps to the meet of |c| and |d|, but |a| is not itself that meet in |X| (|b| is).
Failure of creation
If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet in the image but not in the source. Reflection holds, though, but only vacuously: |Z| has no cones over the pair |c|, |d| at all (no element is below both), so there is no cone whose image could be a limiting cone.
In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be a monotonic function. |F| reflects limits if every lower bound that |F| maps to a meet is already a meet. |F| creates limits if whenever |F[U]| has a meet for |U \subseteq X|, then |U| already had a meet and |F| sends the meet of |U| to the meet of |F[U]|.
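Since meets in finite posets can be found by brute force, these conditions are mechanically checkable. Here’s a small sketch in Haskell (the representation and all names are mine), specialized to binary meets:

```haskell
import Data.Maybe (isJust)

-- A finite poset: its carrier and its order relation.
type Poset a = ([a], a -> a -> Bool)

-- The greatest lower bound of a finite subset, if one exists.
meet :: Poset a -> [a] -> Maybe a
meet (xs, leq) us =
  case [ g | g <- lbs, all (`leq` g) lbs ] of
    (g:_) -> Just g
    []    -> Nothing
  where lbs = [ x | x <- xs, all (x `leq`) us ]

-- Reflection: any lower bound sent to the meet of the image pair must
-- already be the meet of the pair.
reflectsMeets :: (Eq a, Eq b) => Poset a -> Poset b -> (a -> b) -> Bool
reflectsMeets p@(xs, leqX) q f = and
  [ meet p [u, v] == Just z
  | u <- xs, v <- xs, z <- xs, z `leqX` u, z `leqX` v
  , meet q [f u, f v] == Just (f z) ]

-- Creation: if the image pair has a meet, the pair itself must have a
-- meet, and it must be sent to the meet of the image pair.
createsMeets :: (Eq a, Eq b) => Poset a -> Poset b -> (a -> b) -> Bool
createsMeets p@(xs, _) q f = and
  [ maybe False (\z -> meet q [f u, f v] == Just (f z)) (meet p [u, v])
  | u <- xs, v <- xs, isJust (meet q [f u, f v]) ]
```

On the examples above, the map |X \to Y| fails |reflectsMeets| at the cone with apex |a|, and the inclusion |Z \to Y| fails |createsMeets| on the pair |c|, |d| while passing |reflectsMeets| vacuously.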