Theory Lunch, Institute of Cybernetics, Tallinn

A Remarkable Property of Real-Valued Functions on Intervals of the Real Line

Silvio Capobianco, 2019-10-17 (https://theorylunch.wordpress.com/?p=1809)

Today, 17 October 2019, I discussed a very remarkable fixed point theorem discovered by the Ukrainian mathematician Oleksandr Mykolayovych Sharkovsky.

We recall that a periodic point of period $n\geq1$ for a function $f:X\to{X}$ is a point $x_n$ such that $f^n(x_n)=x_n$ . With this definition, a periodic point of period $n$ is also periodic of period $m$ for every $m$ which is a multiple of $n$ . If $f^n(x_n)=x_n$ but $f^k(x_n)\neq{x_n}$ for every $k$ from 1 to $n-1$ , we say that $n$ is the least period of $x_n$ .
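As a small illustration of these definitions, the sketch below computes the least period of a point under a map of a finite set. The lookup table `table` and the helper name `least_period` are made up for the example and are not part of the talk.

```python
def least_period(f, x, max_n=1000):
    """Return the smallest n >= 1 with f^n(x) == x, or None if none is found up to max_n."""
    y = x
    for n in range(1, max_n + 1):
        y = f(y)
        if y == x:
            return n
    return None


# Hypothetical example: a map on {0, 1, 2, 3, 4} given as a lookup table.
# 0 -> 1 -> 2 -> 0 is a cycle of length 3, 3 is fixed, and 4 falls into the cycle.
table = {0: 1, 1: 2, 2: 0, 3: 3, 4: 0}
f = table.__getitem__

print(least_period(f, 0))            # 3
print(least_period(f, 3))            # 1
print(least_period(f, 4, max_n=50))  # None: 4 never comes back to itself
```

The point 0 is also periodic of period 6, 9, and so on, but 3 is its least period.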

Theorem 1. (Sharkovsky’s “little” theorem) Let $I\subseteq\mathbb{R}$ be an interval and let $f:I\to\mathbb{R}$ be a continuous function. If $f$ has a point of least period 3, then it has points of arbitrary least period; in particular, it has a fixed point.

Note that no hypothesis is made on $I$ being open or closed, bounded or unbounded.

Our proof of Sharkovsky’s “little” theorem follows the one given in (Sternberg, 2010), and could even be given in a Calculus 1 course: the most advanced result will be the intermediate value theorem.

Lemma 1. Let $I=[a,b]$ be a compact interval of the real line and let $f:I\to\mathbb{R}$ be a continuous function. Suppose that $I\subseteq{J}\subseteq{f(I)}$ for some compact interval $J$. Then $f$ has a fixed point in $J$.

Proof. Let $m$ and $M$ be the minimum and the maximum of $f$ in $I$, respectively. As $I\subseteq{f(I)}$, we have $m\leq{a}$ and $M\geq{b}$. Choose $u,v\in{I}$ such that $f(u)=m$ and $f(v)=M$. Then $g(x)=f(x)-x$ is nonpositive at $x=u$ and nonnegative at $x=v$. By the intermediate value theorem applied to $g$, $f$ must have a fixed point in the closed and bounded interval (possibly reduced to a single point) delimited by $u$ and $v$, which is a subset of $I$ and hence of $J$. $\Box$
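The proof of Lemma 1 only says that a fixed point exists, but the sign change of $g$ also suggests how to approximate one numerically. The following is a minimal sketch using bisection, which is a computational counterpart of the intermediate value theorem rather than part of the proof. The function name `fixed_point_by_bisection` and the example map $f(x)=4x(1-x)$ on $[0,1]$ are assumptions chosen for illustration.

```python
def fixed_point_by_bisection(f, u, v, tol=1e-12):
    """Approximate a fixed point of f between u and v, assuming f(u) <= u and f(v) >= v."""
    g = lambda x: f(x) - x
    if g(u) > 0 or g(v) < 0:
        raise ValueError("need f(u) <= u and f(v) >= v")
    p, q = u, v  # invariant: g(p) <= 0 <= g(q); p may lie to the right of q
    while abs(p - q) > tol:
        mid = (p + q) / 2
        if g(mid) <= 0:
            p = mid
        else:
            q = mid
    return (p + q) / 2


# Assumed example: f(x) = 4x(1-x) maps I = [0, 1] onto itself.
# The minimum 0 is attained at u = 1 and the maximum 1 at v = 1/2.
f = lambda x: 4 * x * (1 - x)
x = fixed_point_by_bisection(f, u=1.0, v=0.5)
print(x, f(x))  # both values are close to 3/4
```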

Lemma 2. In the hypotheses of Lemma 1, let $K$ be a closed and bounded interval contained in $f(I)$ . Then there exists a closed and bounded subinterval $J$ of $I$ such that $f(J)=K$ .

Proof. Let $K=[c,d]$ . We may suppose $c<d$ , otherwise the statement is trivial. Let $u\in{I}$ be the largest such that $f(u)=c$ . Two cases are possible.

There exists $x\in(u,b]$ such that $f(x)=d$. Let $v$ be the smallest such $x$, and let $J=[u,v]$. Then surely $f(J)\supseteq{K}$, but if for some $x\in(u,v)$ we had either $f(x)<c$ or $f(x)>d$, then by the intermediate value theorem, for some $y\in(u,v)$ we would also have either $f(y)=c$ or $f(y)=d$, against our choice of $u$ and $v$.
$f(x)<d$ for every $x\in(u,b]$. Let then $w$ be the largest $x\in[a,u]$ such that $f(x)=d$, let $u'$ be the smallest $x\in[w,u]$ such that $f(x)=c$, and let $J=[w,u']$. Then $f(J)=K$ for reasons similar to those of the previous point.

$\Box$

Proof of Sharkovsky’s “little” theorem. Let $a,b,c\in\mathbb{R}$ be such that $f(a)=b$, $f(b)=c$, and $f(c)=a$. Up to cycling between these three values and replacing $f(x)$ with $-f(-x)$, we may suppose $a<b<c$. Fix a positive integer $n$: we will prove that there exists $x_{n}\in{I}$ such that $f^n(x_{n})=x_{n}$ and $f^i(x_{n})\neq{x_{n}}$ for every positive $i<n$.

Let $L=[a,b]$ and $R=[b,c]$ be the “left” and “right” side of the closed and bounded interval $[a,c]$ : then $R\subseteq{f(L)}$ and $L\cup{R}\subseteq{f(R)}$ by the intermediate value theorem. In particular, $R\subseteq{f(R)}$ , and Lemma 1 immediately tells us that $f$ has a fixed point $x_{1}$ in $R$ . Also, $L\subseteq{f(R)}\subseteq{f^2(L)}$ , so $f$ also has a point of period 2 in $L$ , again by Lemma 1: call it $x_{2}$ . This point $x_{2}$ cannot be a fixed point, because then it would also belong to $R$ as $L\subseteq{f(R)}$ , but $L\cap{R}=\{b\}$ which has period 3. As we can obviously take $x_{3}=b$ , we only need to consider the case $n\geq4$ .

By Lemma 2, there exists a closed and bounded subinterval $A_1$ of $R$ such that $f(A_1)=R$ . In turn, as $A_1\subseteq{R}$ , there also exists a closed and bounded subinterval $A_2$ of $A_1$ such that $f(A_2)=A_1$ , again by Lemma 2: but then, $f^2(A_2)=f(A_1)=R$ . By iterating the procedure, we find a sequence of closed and bounded intervals $A_i$ such that, for every $i\geq1$ , $A_{i+1}\subseteq{A_i}$ and $f^i(A_i)=R$ .

We stop at $i=n-2$ and recall that $R\subseteq{f(L)}$ : we are still in the situation of Lemma 2, with $A_{n-2}$ in the role of $K$ . So we choose $A_{n-1}$ as a closed and bounded subinterval not of $A_{n-2}$ , but of $L$ , such that $f(A_{n-1})=A_{n-2}$ . In turn, as $L\subseteq{f(R)}$ , there exists a closed and bounded subinterval $A_n$ of $R$ such that $f(A_n)=A_{n-1}$ . Following the chain of inclusions we obtain $f^n(A_n)=R$ . By Lemma 1, $f^n$ has a fixed point $x_n$ in $A_n$ , which is a periodic point of period $n$ for $f$ .

Can the least period of $x_n$ for $f$ be smaller than $n$? No, it cannot, for the following reason. If $x_{n}$ has least period $m\leq{n}$, then so has $y=f(x_{n})$, and in addition $n$ is divisible by $m$. But $f(x_n)\in{L}$ while $f^i(x_n)\in{R}$ for every $i\in[2:n]$: consequently, if $x_{n}$ has least period $m<n$, then $y=f^{m+1}(x_{n})$ with $2\leq{m+1}\leq{n}$, so $y\in{L}\cap{R}=\{b\}$. But this is impossible, because $f^{2}(y)=f^{3}(x_{n})\in{R}$ by construction as $n\geq4$, while $f^{2}(b)=a\not\in{R}$. $\Box$
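The theorem can be checked by hand on a concrete map. The sketch below uses the tent map on $[0,1]$, which is an assumption of this example and not part of the talk: it is continuous and has the 3-cycle $2/7\to4/7\to6/7\to2/7$, so Theorem 1 predicts points of every least period. Exact rational arithmetic avoids rounding issues; candidates with denominators $2^n\pm1$ are tried because the fixed points of the $n$-th iterate of this particular map have that form.

```python
from fractions import Fraction


def tent(x):
    """Tent map on [0, 1]: continuous, with the 3-cycle 2/7 -> 4/7 -> 6/7."""
    return 2 * x if x <= Fraction(1, 2) else 2 - 2 * x


def least_period(f, x, max_n):
    """Smallest n in 1..max_n with f^n(x) == x, or None."""
    y = x
    for n in range(1, max_n + 1):
        y = f(y)
        if y == x:
            return n
    return None


def point_of_least_period(n):
    """Search rationals 2k/(2^n - 1) and 2k/(2^n + 1) in [0, 1] for least period n."""
    for denom in (2**n - 1, 2**n + 1):
        for k in range(denom // 2 + 1):
            x = Fraction(2 * k, denom)
            if least_period(tent, x, n) == n:
                return x
    return None


for n in range(1, 9):
    print(n, point_of_least_period(n))
```

For $n=3$ the search returns $2/7$, the point of the 3-cycle mentioned above.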

Theorem 1 is a special case of a much more general, and complex, result also due to Sharkovsky. Before stating it, we need to define a special ordering on positive integers.

Definition. The Sharkovsky ordering $\rhd$ between positive integers is defined as follows:

Identify the number $n=2^k\cdot{m}$ , with $m$ odd integer, with the pair $(k,m)$ .
Sort the pairs with $m>1$ in lexicographic order.
That is: first, list all the odd numbers, in increasing order; then, all the doubles of the odd numbers, in increasing order; then, all the quadruples of the odd numbers, in increasing order; and so on.
For example, $17\rhd243$ and $4095\rhd6$.
Set $(k,m)\rhd(h,1)$ for every $m>1$ and $k,h\geq0$.
That is: the powers of 2 follow, in the Sharkovsky ordering, any number which has an odd factor greater than 1.
For example, $17000000000000\rhd2$ .
Sort the pairs of the form $(k,1)$ —i.e., the powers of 2—in reverse order.

The set of positive integers with the Sharkovsky ordering then has the form:

$3\rhd5\rhd7\rhd\ldots\rhd6\rhd10\rhd14\rhd\ldots\rhd12\rhd20\rhd28\rhd\ldots\rhd8\rhd4\rhd2\rhd1$

Note that $\rhd$ is a total ordering.
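The definition translates directly into a sort key. The following sketch is one possible encoding; the names `sharkovsky_key` and `precedes` are chosen for this example. Numbers with an odd part $m>1$ are compared lexicographically on $(k,m)$, and the powers of 2 come last in reverse order.

```python
def sharkovsky_key(n):
    """Sort key: a smaller key means earlier in the Sharkovsky ordering."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    m = n  # odd part of the original number
    if m > 1:
        return (0, k, m)  # lexicographic order on (k, m)
    return (1, -k)        # powers of 2 come last, larger exponents first


def precedes(a, b):
    """True if a comes strictly before b in the Sharkovsky ordering."""
    return sharkovsky_key(a) < sharkovsky_key(b)


print(sorted(range(1, 13), key=sharkovsky_key))
# [3, 5, 7, 9, 11, 6, 10, 12, 8, 4, 2, 1]
print(precedes(17, 243), precedes(4095, 6), precedes(17000000000000, 2))
# True True True
```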

Theorem 2. (Sharkovsky’s “great” theorem) Let $I$ be an interval on the real line and let $f:I\to\mathbb{R}$ be a continuous function.

If $f$ has a point of least period $m$, and $m\rhd{n}$, then $f$ has a point of least period $n$. In particular, if $f$ has a periodic point, then it has a fixed point.
For every $m\geq1$ integer it is possible to choose $I$ and $f$ so that $f$ has a point of least period $m$ and no points of least period $k$ for any $k\rhd{m}$. In particular, there are functions whose only periodic points are fixed.

Bibliography:

Keith Burns and Boris Hasselblatt. The Sharkovsky theorem: A natural direct proof. The American Mathematical Monthly 118(3) (2011), 229–244. doi:10.4169/amer.math.monthly.118.03.229
Robert L. Devaney, An Introduction to Chaotic Dynamical Systems, Second Edition, Westview Press 2003.
Shlomo Sternberg, Dynamical Systems, Dover 2010.

Positive expansivity is impossible for reversible cellular automata

Silvio Capobianco, 2019-08-30 (https://theorylunch.wordpress.com/?p=1806)

On Thursday 29 August 2019 I gave a talk about expansivity. I focused on positive expansivity and discussed a general statement which has a most remarkable consequence for cellular automata theory.

Find the talk on my personal blog HERE

A crash course in subadditivity, part 1

Silvio Capobianco, 2018-03-01 (https://theorylunch.wordpress.com/?p=1799)

Today, the 1st of March 2018, I gave what ended up being the first of a series of Theory Lunch talks about subadditive functions. The idea is to give an introduction to the subject, following Hille’s and Lind and Marcus’s textbooks, and stating an important theorem by the Hungarian mathematician Mihály Fekete; then, discuss some extensions to the case of many variables and their implications in the theory of cellular automata, referring to two of my papers from 2008, one of them with Tommaso Toffoli and Patrizia Mentrasti.

Let’s start from the beginning:

Definition 1. Let $(S, \cdot)$ be a semigroup. A function $f : S \to \mathbb{R}$ is subadditive if:

$f(x \cdot y) \leq f(x) + f(y)$ for every $x, y \in S$ (1)

$\Diamond$

If $S$ is a group, with an identity element $e$ and a multiplicative inverse $x^{-1}$ for every element $x$ , then (1) is equivalent to:

$f(x) \geq f(y) - f(x^{-1} \cdot y)$ for every $x, y \in S$

Usually, we will have $S$ be one of the sets $\mathbb{Z}$ and $\mathbb{R}$ of integers and reals, respectively, or one of the sets $\mathbb{Z}_{+}$ and $\mathbb{R}_{+}$ of positive integers and positive reals, respectively. All these sets will be considered as semigroups with respect to addition.

Examples of subadditive functions are:

The Heaviside function $H : \mathbb{R} \to \mathbb{R}$ defined by $H(x)=1$ if $x \geq 0$ and $H(x)=0$ if $x<0$ . This function is subadditive, because if $x$ and $y$ are both negative, then the left-hand side of (1) is 0, and if one of them is nonnegative, then the right-hand side is either 1 or 2.
Let $U \subseteq S$ and let $f : S \to \mathbb{R}$ be defined by $f(x)=1$ if $x \in U$ and $f(x)=2$ if $x \not \in U$ . Then $f$ is subadditive, because the left-hand side of (1) is either 1 or 2, and the right-hand side is either 2, 3, or 4. For $S=\mathbb{R}$ and $U=\mathbb{Q}$ this shows that a subadditive function can be discontinuous at every point.
Let $A$ be a finite nonempty set and let $A^\ast$ be the free monoid on $A$ , that is, the set of words on $A$ with concatenation as the binary operation and the empty word $\lambda$ as the identity element. The length of a word is a subadditive (actually, additive) function on $A^\ast$ .
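Inequality (1) can be tested mechanically on finite samples. The sketch below is only a sanity check: a finite test can refute subadditivity but cannot prove it. The helper name `is_subadditive_on` and the choice of $U$ as the even integers are assumptions made for the example.

```python
from itertools import product


def is_subadditive_on(f, samples, op=lambda x, y: x + y):
    """Check f(x . y) <= f(x) + f(y) for all pairs drawn from a finite sample."""
    return all(f(op(x, y)) <= f(x) + f(y) for x, y in product(samples, repeat=2))


heaviside = lambda x: 1 if x >= 0 else 0
two_valued = lambda x: 1 if x % 2 == 0 else 2  # U = even integers, an arbitrary choice
words = ["", "a", "ab", "bba"]

print(is_subadditive_on(heaviside, range(-5, 6)))       # True
print(is_subadditive_on(two_valued, range(-5, 6)))      # True
print(is_subadditive_on(len, words))                    # True: + concatenates strings
print(is_subadditive_on(lambda x: x * x, range(1, 6)))  # False: (1+1)^2 > 1 + 1
```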

If $S$ is a subsemigroup of either $\mathbb{R}$ or $\mathbb{Z}$, a useful trick is that $f(x)$ is subadditive on $-S$ if and only if $f(-x)$ is subadditive on $S$. This, for example, allows one to “dualize” proofs on $\mathbb{R}_{+}$ to make them work on $\mathbb{R}_{-}$, the additive semigroup of negative reals.

Fekete’s lemma. Let $f : S \to \mathbb{R}$ be a subadditive function.

If $S=\mathbb{R}_{+}$ or $S=\mathbb{Z}_{+}$ , then $\ell_{+} = \lim_{x \to +\infty} f(x)/x = \inf_{x > 0} f(x)/x$ .
Dually, if $S=\mathbb{R}_{-}$ or $S=\mathbb{Z}_{-}$ , then $\ell_{-} = \lim_{x \to -\infty} f(x)/x = \sup_{x < 0} f(x)/x$ .
Finally, if $S=\mathbb{R}$ or $S=\mathbb{Z}$ , then $\ell_{-} \leq \ell_{+}$ , and both are finite.

Note that $\ell_{+}$ cannot be $+\infty$, but can be $-\infty$: for example, $f(x) = -x^2$ is subadditive on $\mathbb{R}_{+}$ and $\lim_{x \to +\infty} f(x)/x = -\infty$. Dually, $\ell_{-}$ cannot be $-\infty$, but can be $+\infty$. Note that $f(x)/x$ itself need not be subadditive: for example, $f(x)=-x$ is subadditive on $\mathbb{R}_{+}$, but $f(x)/x=-1$ is not.

Proof of point 1 with $S = \mathbb{Z}_{+}$ : Fix a positive integer $t$ . Every positive integer $x$ large enough can be written in the form $x = qt + r$ with $q$ positive integer and (attention!) $1 \leq r \leq t$ . By subadditivity,

$\dfrac{f(x)}{x} \leq \dfrac{f(qt) + f(r)}{x} \leq \dfrac{q}{x} f(t) + \dfrac{f(r)}{x}$ .

But by construction, $\lim_{x \to \infty} q/x = 1/t$: since there are no more than $t$ possible values for $f(r)$, the term $f(r)/x$ vanishes as $x \to \infty$, and by taking the upper limits we get

$\limsup_{x \to +\infty} \dfrac{f(x)}{x} \leq \dfrac{f(t)}{t}$ .

This holds for every positive integer $t$ , so we can conclude:

$\limsup_{x \to \infty} \dfrac{f(x)}{x} \leq \inf_{x > 0} \dfrac{f(x)}{x} \leq \liminf_{x \to \infty} \dfrac{f(x)}{x}$ .

$\Box$

A key ingredient of the proof is that $f$ is bounded on $\{1, \ldots, t\}$. The argument can be modified to work on $\mathbb{R}_{+}$, but requires that $f$ be bounded on every bounded interval of the form $[1,t+1)$: this is actually true if $f$ is subadditive and measurable, but proving this fact would go beyond the scope of our talks.

An immediate consequence of Fekete’s lemma is that, as it was intuitively true from the definition, a subadditive function defined on $\mathbb{R}_{+}$ or $\mathbb{Z}_{+}$ can go to $+\infty$ for $x \to +\infty$ at most linearly. On the other hand, an everywhere negative subadditive function defined on positive reals or positive integers can go to $-\infty$ for $x \to +\infty$ arbitrarily fast. Indeed, the following holds:

Lemma 1. Let $S$ be either $\mathbb{R}_{+}$ or $\mathbb{Z}_{+}$ and let $f, g : S \to \mathbb{R}$ . If $f$ is negative and subadditive and $g$ is positive and nondecreasing, then $f(x)g(x)$ is subadditive (and negative).

Proof: If $a<0$ and $0 < b \leq c$, then $ac \leq ab$. Then the following chain of inequalities holds:

$f(x+y) g(x+y) \leq \left( f(x) + f(y) \right) g(x+y) \leq f(x) g(x) + f(y) g(y)$ .

$\Box$

Hence, if $f(x) = -x$ and $g(x)$ is positive and nondecreasing, then $f(x)g(x)$ is subadditive and $|f(x)g(x)| = \Omega(g(x))$ .

To see an application of Fekete’s lemma in the context of the theory of dynamical systems, we introduce the notions of subshift and of cellular automaton. We will first do so in dimension 1, then expand to arbitrary dimension in later talks.

Definition 2. Let $A$ be a finite nonempty set. A subset $X$ of the set $A^\mathbb{Z}$ of bi-infinite words is a (one-dimensional) subshift if there exists a set of forbidden words $\mathcal{F} \subseteq A^\ast$ such that the following holds: for every $x \in A^\mathbb{Z}$ , it is $x \in X$ if and only if for no $i \in \mathbb{Z}$ and $n \in \mathbb{Z}_{+}$ it is $x_i \ldots x_{i+n-1} \in \mathcal{F}$ . The set $\mathcal{L}(X)$ of the words over $A$ which appear in elements of $X$ is called the language of the subshift $X$ . $\Diamond$

Examples of subshifts are:

The full shift $X = A^\mathbb{Z}$ . In this case, $\mathcal{F} = \emptyset$ .
The golden mean shift on the binary alphabet, where $\mathcal{F} = \{ 11 \}$ .
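For small lengths, the words that avoid a finite set of forbidden words can simply be enumerated. The sketch below does this by brute force; the helper names are made up for the example. One caveat: in general a finite word avoiding $\mathcal{F}$ need not extend to a bi-infinite element of $X$, but for the two examples above it always does (pad with the symbol 0), so here the enumeration gives exactly the words of the language.

```python
from itertools import product


def avoids(word, forbidden):
    """True if no forbidden word occurs as a factor of `word`."""
    return not any(w in word for w in forbidden)


def words_of_length(n, alphabet, forbidden):
    """All words of length n over `alphabet` that avoid every forbidden word."""
    candidates = ("".join(t) for t in product(alphabet, repeat=n))
    return [w for w in candidates if avoids(w, forbidden)]


print(words_of_length(3, "01", {"11"}))
# ['000', '001', '010', '100', '101']
print([len(words_of_length(n, "01", {"11"})) for n in range(1, 7)])
# [2, 3, 5, 8, 13, 21]
print([len(words_of_length(n, "01", set())) for n in range(1, 5)])
# [2, 4, 8, 16]
```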

Let $X$ be a subshift on $A$ and let $u$ and $v$ be words on $A$ of length $m$ and $n$ , respectively. If $uv$ is an allowed word for $X$ , then so must be $u$ and $v$ : that is, there cannot be more allowed words of length $m+n$ than juxtapositions of an allowed word of length $m$ and an allowed word of length $n$ . Switching to logarithms, we see that

$f(n) = \log |\mathcal{L}(X) \cap A^n|$

is a subadditive function of the positive integer variable $n$ : Fekete’s lemma then tells us that

$h(X) = \lim_{n \to \infty} \dfrac{\log |\mathcal{L}(X) \cap A^n|}{n}$ (2)

exists. The quantity (2) is called the entropy of the subshift $X$ , and is a measure of the quantity of information it can convey.
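For the golden mean shift, the limit (2) can be observed numerically. The sketch below counts allowed words with a two-term recurrence (words ending in 0 and words ending in 1) instead of enumerating them, and compares $f(n)/n$ with $\log\frac{1+\sqrt5}{2}$, the known value of the entropy of this shift. The function names are assumptions of the example, and natural logarithms are used.

```python
import math


def count_golden_mean_words(n):
    """Number of binary words of length n >= 1 with no factor 11."""
    end0, end1 = 1, 1  # words of length 1 ending in 0 and in 1
    for _ in range(n - 1):
        # a 0 can follow anything, a 1 can only follow a 0
        end0, end1 = end0 + end1, end0
    return end0 + end1


def f(n):
    return math.log(count_golden_mean_words(n))


golden_ratio = (1 + math.sqrt(5)) / 2
for n in (1, 10, 100, 1000):
    print(n, f(n) / n)
print("log of golden ratio:", math.log(golden_ratio))
```

In line with Fekete’s lemma, the ratios decrease toward the limit, which is also their infimum.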

(As a funny note, the use of the uncapitalized letter $h$ to indicate entropy apparently originates from Claude Shannon, after John von Neumann suggested that he call “entropy” a similar information-theoretical quantity. Shannon decided to uncapitalize the letter $H$ used by Ludwig Boltzmann… which, however, was a capitalized $\eta$.)

Definition 3. A one-dimensional cellular automaton is a triple $\mathcal{A} = \langle Q, r, \delta \rangle$ where $Q$ is a finite nonempty set of states, $r$ is a nonnegative integer radius, and $\delta : Q^{2r+1} \to Q$ is a local update rule. $\Diamond$

A cellular automaton $\mathcal{A} = \langle Q, r, \delta \rangle$ induces a global transition function $G : Q^\mathbb{Z} \to Q^\mathbb{Z}$ by synchronous update:

$G(x)_i = \delta(x_{i-r} \ldots x_{i+r})$ for every $i \in \mathbb{Z}$ (3)

If $G$ is the global transition function of a cellular automaton with set of states $Q$ , then $G \left( Q^\mathbb{Z} \right)$ is a subshift. Not every subshift can be obtained this way: those that can, belong to the special class of sofic shifts, a term suggested by Benjamin Weiss coming from the Hebrew word for “finite”.
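Formula (3) is easy to run on a computer once the configuration is finite. The sketch below applies one synchronous update to a cyclic configuration; the periodic boundary is an assumption made here because a bi-infinite word cannot be stored, and the local rule used in the example (the XOR of the two neighbours, known as rule 90) is just an illustration.

```python
def ca_step(config, r, delta):
    """One synchronous update of a radius-r cellular automaton on a cyclic configuration."""
    n = len(config)
    return [
        delta(tuple(config[(i + j) % n] for j in range(-r, r + 1)))
        for i in range(n)
    ]


# Example local rule with Q = {0, 1} and r = 1: XOR of left and right neighbour.
xor_rule = lambda window: window[0] ^ window[2]

x = [0, 0, 1, 0, 0]
print(ca_step(x, 1, xor_rule))                       # [0, 1, 0, 1, 0]
print(ca_step(ca_step(x, 1, xor_rule), 1, xor_rule))  # [1, 0, 0, 0, 1]
```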

In the upcoming talk (or talks) we will examine the case of several variables, and correspondingly, subshifts and cellular automata in higher dimension. In particular, we will discuss a generalization of Fekete’s lemma to arbitrarily many positive integer variables.

Bibliography:

Silvio Capobianco. Multidimensional cellular automata and generalization of Fekete’s lemma. Discrete Mathematics and Theoretical Computer Science 10 (2008), 95–104.
Einar Hille. Functional Analysis and Semigroups. American Mathematical Society, 1948.
Douglas Lind and Brian Marcus. An Introduction to Symbolic Dynamics and Coding. Cambridge University Press 1995.
Tommaso Toffoli, Silvio Capobianco, and Patrizia Mentrasti. When—and how—can a cellular automaton be rewritten as a lattice gas? Theoretical Computer Science 403 (2008), 71–88.

Nonuniversality in computation: A proof by semantic shift?

Silvio Capobianco, 2016-09-08 (https://theorylunch.wordpress.com/?p=1759)

Today, the 8th of September 2016, we had a very interesting discussion about a theorem, due to Selim G. Akl, pointed out to me in a tweet by Andy Adamatzky. This theorem has, according to Akl, the consequence that the Church-Turing thesis, a basic tenet of theoretical computer science, is false. Of course, surprising statements require solid arguments: are Akl’s solid enough?

First of all, let us recall what the Church-Turing thesis is, and what it is not. Its statement, as reported by the Stanford Encyclopedia of Philosophy, goes as follows:

A function of positive integers is effectively calculable only if recursive.

Here, for a calculation procedure to be “effective” means the following:

it has a finite description;
it always returns the correct output, given any valid input;
it can be “carried on by pencil and paper” by a human being; and
it requires no insight or ingenuity on the human’s behalf.

One model of effective procedures is given by the recursive functions; another one, by the functions computable by Turing machines; a third one, by the functions which are representable in Church’s $\lambda$ -calculus. Alan Turing and Stephen Cole Kleene proved that the three classes coincide: thus, in ordinary practice, the Church-Turing thesis is often stated with “Turing-computable” in place of “recursive”.

The class of Turing machines has the advantage of containing a universal element: there exist a special Turing machine and an encoding from the set of Turing machines to the set of natural numbers such that, when the special Turing machine is provided the encoding of an arbitrary Turing machine and a valid input for the latter, it will return the value of the encoded Turing machine on the provided input.
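The idea of a universal element can be illustrated, loosely, by an interpreter that receives a machine as data. In the sketch below a Python dictionary plays the role of the encoding and the function `run_turing_machine` plays the role of the universal machine; both names, the table format, and the step budget are assumptions of this example, and a Python function is of course not itself a Turing machine.

```python
def run_turing_machine(delta, start, halt, tape_input, blank="_", max_steps=10_000):
    """Interpret the transition table `delta` on `tape_input` and return the final tape."""
    tape = dict(enumerate(tape_input))
    state, head, steps = start, 0, 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step budget exhausted")
        symbol = tape.get(head, blank)
        # each table entry gives: next state, symbol to write, head move
        state, tape[head], move = delta[(state, symbol)]
        head += 1 if move == "R" else -1
        steps += 1
    if not tape:
        return ""
    cells = (tape.get(i, blank) for i in range(min(tape), max(tape) + 1))
    return "".join(cells).strip(blank)


# Two hypothetical machines, given purely as data.
successor = {("scan", "1"): ("scan", "1", "R"), ("scan", "_"): ("halt", "1", "R")}
flip = {
    ("flip", "0"): ("flip", "1", "R"),
    ("flip", "1"): ("flip", "0", "R"),
    ("flip", "_"): ("halt", "_", "R"),
}

print(run_turing_machine(successor, "scan", "halt", "111"))  # 1111
print(run_turing_machine(flip, "flip", "halt", "0110"))      # 1001
```

The same interpreter runs both machines: only the table and the input change.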

Now that we have written down what the Church-Turing thesis is, we can examine Akl’s theorem.

In his 2005 paper, Akl defines a universal computer as a system $\mathcal{U}$ having the following features:

It has means of communicating with the outside world, so as to receive its input and send its output.
It can perform all elementary arithmetic and logic operations.
It can be programmed, according to the two previous rules.
It has unlimited memory to use for input, output, and temporary values.
It can only execute finitely many operations (evaluating input, producing output, performing an elementary operation, etc.) at each time step.
It can simulate any computation performed by any other model of computation.

The statement of the theorem, which does not appear explicitly in the original paper but is written down in the one from 2015 which clarifies the idea and addresses criticism, is hereby reported verbatim:

Nonuniversality in Computation Theorem (NCT): No computer is universal if it is capable of exactly $T(i)$ operations during time unit $i$ of computation, where $i$ is a positive integer, and $T(i)$ is finite and fixed once and for all.

The main argument is that no such computer can perform a computation which requires more than $T(i)$ operations at some time $i$ . Explicit examples happen in parallel computation, a field Akl is a master of, where the number of operations that can be performed in a time unit grows linearly with the number of processors: for instance, reading $n$ values in input can be done in time $T(i)=1$ by a parallel machine with $n$ processors, but not by any machine with $m<n$ processors.

Such a requirement, however, does not appear in the notion of universality at the base of the original, and actual, Church-Turing thesis. There, to “simulate” a machine or algorithm means to be able to always reproduce the same output as the algorithm, given any valid input for it, up to an encoding of the input and the output. But no hypothesis on how the output is achieved from the input is made: a simulation in linear time, such that each step of the simulated algorithm is reproduced by exactly $k$ operations of the Turing machine, is as good as one where the simulation of the $i$th step takes $17^{i^2-i}$ operations from the Turing machine, or where no such regularity appears.
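The difference between the two notions of simulation can be made concrete with a toy model. In the sketch below, a machine that can perform at most `ops_per_tick` operations per time unit applies an operation to $n$ values; the function name and the model itself are assumptions of this example, not Akl’s formalism. The output never depends on the budget, only the number of time units does.

```python
def run_in_ticks(values, op, ops_per_tick):
    """Apply op to every value, doing at most ops_per_tick operations per time unit."""
    results, ticks = [], 0
    for start in range(0, len(values), ops_per_tick):
        batch = values[start:start + ops_per_tick]
        results.extend(op(v) for v in batch)
        ticks += 1
    return results, ticks


values = list(range(8))
square = lambda v: v * v

print(run_in_ticks(values, square, ops_per_tick=8))  # all 8 operations in 1 time unit
print(run_in_ticks(values, square, ops_per_tick=3))  # same results, 3 time units
print(run_in_ticks(values, square, ops_per_tick=1))  # same results, 8 time units
```

A machine with a smaller budget cannot match the step-by-step timing of the larger one, but it reproduces the same input-output behaviour, which is all that the Church-Turing notion of simulation asks for.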

Among the (counter)examples provided by Akl are:

Computations with time-varying variables.
Computations with time-varying computational complexity.
Computations whose complexity depends on their placement on a schedule.
Computations with interacting variables, e.g., states of entangled electrons.
Computations with uncertain time constraints.

None of these, however, respect the definition of computation from the model of recursive functions: where the values of the variables are given once and for all, and can possibly change for recursive calls, but not for the original call. They can be seen as instances of unconventional models of computation: but by doing this, one changes the very notion of computation, which ceases to be the one at the basis of the Church-Turing thesis.

So my guess is that Akl’s statement about the falsity of the Church-Turing thesis actually falls in the following category, as reported in the humorous list by Dana Angluin:

Proof by semantic shift: Some standard but inconvenient definitions are changed for the statement of the result.

Actually, if we go back to Akl’s definition of a universal computer, it appears to be fine until the very last: the first two points agree with the definition of effective computation at the basis of the actual Church-Turing thesis, the next three are features of any universal Turing machine. The problem comes from the last point, which has at least two weak spots: the first one being that it does not define precisely what a model of computation is, which can be accepted as Akl is talking of unconventional computation, and it is wiser to be open to other possibilities. But there is a more serious one, in that it is not clear

what the expression “to simulate” means.

Note that the Stanford Encyclopedia of Philosophy reports the following variant of the Church-Turing thesis, attributed to David Deutsch:

Every finitely realizable physical system can be perfectly simulated by a universal model computing machine operating by finite means.

Deutsch’s thesis, however, does not coincide with the Church-Turing thesis! (This, notwithstanding Deutsch’s statement that “[t]his formulation is both better defined and more physical than Turing’s own way of expressing it”.) Plus, there is another serious ambiguity, which is of the same kind as the one in Akl’s definition:

what is “perfectly simulated” supposed to mean?

Does it mean that every single step performed by the system can be reproduced in real time? In this case, Akl is perfectly right in disproving it under the constraint of boundedly many operations at each time unit. Or does it mean that the simulation of each elementary step of the process (e.g., one performed in a quantum of time) ends with the correct result if the correct initial conditions are given? In this case, the requirement to reproduce exactly what happens between the reading of the input and the writing of the output is null and void.

Worse still, there is a vulgarized form of the Church-Turing thesis, which is reported by Akl himself (on page 172 of his 2005 paper!) and goes as follows:

Any computable function can be computed on a Turing machine.

If one calls that “the Church-Turing thesis”, then Akl’s NCT is absolutely correct in disproving it. But that is not the actual Church-Turing thesis! It is actually a rewording of what in the Stanford Encyclopedia of Philosophy is called “Thesis M”, and explicitly stated not to be equivalent to the original Church-Turing thesis—and also false. Again, the careful reader will have noticed that, in the statement above, being “computable by a Turing machine” is a well defined property, but “computable” tout court definitely not so.

At the end of this discussion, my thesis is that Akl’s proof is correct, but NCT’s consequences and interpretation might not be what Akl means, or (inclusive disjunction) what his critics understand. As for my personal interpretation of NCT, here it goes:

No computer which is able to perform a predefined, finite number of operations at each finite time step, is universal across all the different models of computation, where the word “computation” may be taken in a different meaning than that of the Church-Turing thesis.

Is mine an interpretation by semantic shift? Discussion is welcome.

References:

Selim G. Akl. The Myth of Universal Computation. Parallel Numerics ’05, 167–192.
Selim G. Akl. Nonuniversality explained. International Journal of Parallel, Emergent and Distributed Systems 31:3, 201–219. doi:10.1080/17445760.2015.1079321
The Church-Turing Thesis. Stanford Encyclopedia of Philosophy. First published January 8, 1997; substantive revision August 19, 2002. https://plato.stanford.edu/entries/church-turing/
Dana Angluin’s List of Proof Techniques. https://www.cs.northwestern.edu/~riesbeck/proofs.html

Second-order theories should not be taken lightly

Silvio Capobianco, 2016-01-14 (https://theorylunch.wordpress.com/?p=1686)

First-order formal logic is a standard topic in computer science. Not so for second-order logic: which, though used by default in fields of mathematics such as topology and analysis, is usually not treated in standard courses in mathematical logic. For today’s Theory Lunch I discussed some classical theorems that hold for first-order logic, but not for second-order logic: my talk was based on Boolos’ classical textbook.

We consider languages made of symbols that represent either objects, or functions, or relations: in particular, unary relations, or equivalently, sets. A sentence on such a language is a finite sequence of symbols from the language and from the standard logical connectives and quantifiers ($\wedge$ for conjunction, $\vee$ for disjunction, $\sim$ for negation, etc.) according to the usual rules, such that every variable is bound by some quantifier. A first-order sentence only has quantifiers on objects, while a second-order sentence can have quantifiers on functions and relations (in particular, sets) as well.

For example, the set $\mathbf{Q}$ is made of the following first-order sentences on the language $\{ \mathbf{0}, {}', +, \cdot, < \}$ :

$\forall x . \sim (x' = \mathbf{0})$
$\forall x y . x' = y' \to x = y$
$\forall x . x + \mathbf{0} = x$
$\forall x y . x + y' = (x + y) '$
$\forall x . x \cdot \mathbf{0} = \mathbf{0}$
$\forall x y . x \cdot y' = x \cdot y + x$
$\forall x . \sim (x < \mathbf{0})$
$\forall x y. x < y' \iff (x < y \vee x = y)$
$\forall x . \mathbf{0} < x \iff \sim (x = \mathbf{0})$
$\forall x y . x' < y \iff (x < y \wedge y \neq x')$

Of course, second-order logic is much more expressive than first order logic. The natural question is: how much?

The answer is: possibly, too much more than we would like.

To discuss how it is so, we recall the notion of model. Informally, a model of a set of sentences is a “world” where all the sentences in the set are true. For instance, the set $\mathbb{N}$ of natural numbers with the usual zero, successor, addition, multiplication, and ordering is a model of $\mathbf{Q}$ . A model for a set of sentences is also a model for every theorem of that set, i.e., every sentence that can be derived in finitely many steps from those of the given set by applying the standard rules of logic.
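That the natural numbers satisfy the ten sentences of $\mathbf{Q}$ can be spot-checked on a finite sample. The sketch below encodes each axiom as a Boolean function of $x$ and $y$, reading the successor $x'$ as `x + 1`; the encoding and the sample size are choices made for this example, and a finite check is of course a sanity test, not a proof.

```python
from itertools import product

succ = lambda x: x + 1
iff = lambda p, q: p == q

# The ten sentences of Q, in the order listed above.
axioms = [
    lambda x, y: succ(x) != 0,
    lambda x, y: succ(x) != succ(y) or x == y,
    lambda x, y: x + 0 == x,
    lambda x, y: x + succ(y) == succ(x + y),
    lambda x, y: x * 0 == 0,
    lambda x, y: x * succ(y) == x * y + x,
    lambda x, y: not (x < 0),
    lambda x, y: iff(x < succ(y), x < y or x == y),
    lambda x, y: iff(0 < x, x != 0),
    lambda x, y: iff(succ(x) < y, x < y and y != succ(x)),
]

sample = range(30)
print(all(ax(x, y) for ax in axioms for x, y in product(sample, repeat=2)))  # True
```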

For sets of first-order sentences, the following four results are standard:

Compactness theorem. (Tarski and Mal’tsev) Given a set $\Gamma$ of first-order sentences, if every finite subset of $\Gamma$ has a model, then $\Gamma$ has a model.

Upwards Löwenheim-Skolem theorem. If a set of first-order sentences has a model of infinite cardinality $\alpha$ , then it also has models of every cardinality $\beta>\alpha$ .

Downwards Löwenheim-Skolem theorem. If a set of first-order sentences on a finite or countable language has a model, then it also has a finite or countable model.

Completeness theorem. (Gödel) Given a set $\Gamma$ of first-order sentences, if a first-order sentence $A$ is true in every model of $\Gamma$ , then $A$ is a theorem of $\Gamma$ .

All of these facts fail for second-order theories. Let us see how:

We start by considering the following second-order sentence:

$\mathbf{Denum} \equiv \exists x \exists f \forall S . ( S(x) \wedge \forall y . (S(y) \to S(f(y))) ) \to \forall y . S(y)$

Lemma 1. The sentence $\mathbf{Denum}$ is true in a model $\mathcal{M}$ if and only if the universe of $\mathcal{M}$ is at most countable.

The informal reason is that $\mathbf{Denum}$ intuitively means:

the universe is a monoid on a single generator

Let us now consider the following second-order sentence:

$\mathbf{Infin} \equiv \exists x \exists f . (\forall y . x \neq f(y)) \wedge (\forall y z . f(y) = f(z) \to y = z)$

Lemma 2. The sentence $\mathbf{Infin}$ is true in a model $\mathcal{M}$ if and only if the universe of $\mathcal{M}$ is infinite.

The informal reason is that $\mathbf{Infin}$ intuitively means:

the universe contains a copy of the natural numbers
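On a small finite universe, second-order quantifiers can be evaluated by brute force: $\exists f$ ranges over all functions from the universe to itself and $\forall S$ over all of its subsets. The sketch below checks both sentences this way; the function names are made up for the example. In agreement with Lemmas 1 and 2, $\mathbf{Denum}$ holds and $\mathbf{Infin}$ fails on every finite universe tried.

```python
from itertools import combinations, product


def functions(universe):
    """All maps from the universe to itself, as dictionaries."""
    for values in product(universe, repeat=len(universe)):
        yield dict(zip(universe, values))


def subsets(universe):
    for size in range(len(universe) + 1):
        for combo in combinations(universe, size):
            yield set(combo)


def denum_holds(universe):
    """Exists x, f such that every S containing x and closed under f is the whole universe."""
    whole = set(universe)
    for x in universe:
        for f in functions(universe):
            if all(
                not (x in S and all(f[y] in S for y in S)) or S == whole
                for S in subsets(universe)
            ):
                return True
    return False


def infin_holds(universe):
    """Exists x, f such that x is not in the image of f and f is injective."""
    for x in universe:
        for f in functions(universe):
            not_in_image = all(f[y] != x for y in universe)
            injective = len(set(f.values())) == len(universe)
            if not_in_image and injective:
                return True
    return False


for n in range(1, 4):
    universe = list(range(n))
    print(n, denum_holds(universe), infin_holds(universe))
```

The number of functions and subsets grows very quickly with the size of the universe, so this only works for very small $n$.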

Theorem 1. Both Löwenheim-Skolem theorems fail for sets of second-order sentences.

Proof. $\mathbf{Infin} \wedge \mathbf{Denum}$ only has countably infinite models, against the upwards theorem. $\mathbf{Infin} \wedge \sim \mathbf{Denum}$ only has uncountably infinite models, against the downwards theorem. $\Box$

Let us now consider the set $\mathbf{PA2}$ of all the sentences of $\mathbf{Q}$ together with the following second-order sentence:

$\mathbf{Ind} \equiv \forall S . (S(\mathbf{0}) \wedge \forall x. (S(x) \to S(x'))) \to \forall x . S(x)$

Clearly, $\mathbf{Ind}$ is the induction principle, which is a single axiom in second-order Peano arithmetic, but only an axiom scheme in first-order PA.

Lemma 3. Every model of $\mathbf{PA2}$ is isomorphic to the set of natural numbers with zero, successor, addition, multiplication, and ordering.

The informal reason is that $\mathbf{Q}$ , though finite, is powerful enough to tell numbers from each other: therefore, in every model of $\mathbf{PA2}$ , each numeral $\mathbf{n}$ ( $n$ th iteration of the successor, starting from $\mathbf{0}$ ) can be denoted by at most one item in the universe of the model. On the other hand, $\mathbf{Ind}$ is powerful enough to reconstruct every numeral.

Theorem 2. The compactness theorem fails for sets of second-order sentences.

Proof. Let $c$ be a constant outside the language of $\mathbf{PA2}$ . Consider the set $\Gamma$ made of all the sentences from $\mathbf{PA2}$ and all the sentences of the form $X_n \equiv c \neq \mathbf{n}$ . Then every finite subset $\Gamma_0$ of $\Gamma$ has a model, which can be obtained from the set of natural numbers by interpreting $c$ as some number $M$ strictly greater than all of the values $n$ such that $X_n \in \Gamma_0$ . However, a model of $\Gamma$ is also a model of $\mathbf{PA2}$ , and must be isomorphic to the set of natural numbers: but no interpretation of the constant $c$ is possible within such a model. $\Box$

We can now prove

Theorem 3. The completeness theorem does not hold for second-order sentences.

In other words, second-order logic is semantically inadequate: it is not true anymore that all “unequivocally true” sentences are theorems. The proof will be based on the following two facts:

Fact 1. (Gödel) The set of the first-order formulas which are true in every model of $\mathbf{Q}$ is recursively enumerable.

Fact 2. (Tarski) The set of first-order formulas which are true in $\mathbb{N}$ is not recursively enumerable.

Fact 1 is actually a consequence of the completeness theorem: the set of first-order formulas which are true in every model of $\mathbf{Q}$ is the same as the set of first-order sentences that are provable from $\mathbf{Q}$ , and that set is recursively enumerable by producing every possible proof! To prove Theorem 3 it will thus be sufficient to prove that Fact 1 does not hold for second-order sentences.

Proof of Theorem 3. We identify $\mathbf{PA2}$ with the conjunction of all its formulas, which are finitely many.

Let $A$ be a first-order sentence in the language of $\mathbf{Q}$ . Because of what we saw while discussing the compactness theorem, $A$ is true in $\mathbb{N}$ if and only if it is true in every model of $\mathbf{PA2}$ : this, in turn, is the same as saying that $\mathbf{PA2} \to A$ is true in every model of $\mathbf{Q}$ . Indeed, let $\mathcal{M}$ be a model of $\mathbf{Q}$ : if $\mathcal{M}$ is isomorphic to $\mathbb{N}$ , then $\mathbf{PA2} \to A$ is true in $\mathcal{M}$ if and only if $A$ is true in $\mathcal{M}$ ; if $\mathcal{M}$ is not isomorphic to $\mathbb{N}$ , then $\mathbf{PA2}$ is false in $\mathcal{M}$ , which makes $\mathbf{PA2} \to A$ true in $\mathcal{M}$ . This holds whatever $A$ is.

Fix a Gödel numbering for sentences. There exists a recursive function that, for every sentence $A$ , transforms the Gödel number of the first-order sentence $A$ into the Gödel number of the second-order sentence $\mathbf{PA2} \to A$ .

Suppose now, for the sake of contradiction, that the set of second-order sentences that are true in every model of $\mathbf{Q}$ is recursively enumerable. Then we could get a recursive enumeration of the set of first-order sentences which are true in the standard model of $\mathbf{Q}$ by taking the Gödel number of such a sentence $A$ , turning it into that of $\mathbf{PA2} \to A$ via the aforementioned recursive function, and feeding the latter number to the semialgorithm for second-order sentences that are true in every model of $\mathbf{Q}$ . But because of Tarski’s result, no such recursive enumeration exists. $\Box$

Bibliography:

George S. Boolos et al. Computability and Logic. Fifth Edition. Cambridge University Press, 2007

]]> 2 Silvio Capobianco https://anotherblogonca.wordpress.com <![CDATA[A concrete piece of evidence for incompleteness]]> https://theorylunch.wordpress.com/?p=1596 2015-05-05T16:23:41Z 2015-05-04T16:34:41Z

Continue reading →]]>

On Thursday, the 25th of March 2015, Venanzio Capretta gave a Theory Lunch talk about Goodstein’s theorem. Later, on the 9th of March, Wolfgang Jeltsch talked about ordinal numbers, which are at the base of Goodstein’s proof. Here, I am writing down a small recollection of their arguments.

Given a base $b \geq 2$ , consider the base- $b$ writing of the nonnegative integer

$n = b^m \cdot a_m + b^{m-1} \cdot a_{m-1} + \ldots + b \cdot a_1 + a_0$

where each $a_i$ is an integer between $0$ and $b-1$ . The Cantor base- $b$ writing of $n$ is obtained by iteratively applying the base- $b$ writing to the exponents as well, until the only values appearing are integers between $0$ and $b$ . For example, for $b = 2$ and $n = 49$ , we have

$49 = 32 + 16 + 1 = 2^{2^2 + 1} + 2^{2^2} + 1$

and also

$49 = 27 + 9 \cdot 2 + 3 + 1 = 3^3 + 3^2 \cdot 2 + 3 + 1$
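The two expansions of $49$ above can be reproduced mechanically. The function below is a minimal sketch, not part of the original talks: the name `cantor` and the output format (with `^` for powers and `*` for the digit) are choices made for this illustration.

```python
def cantor(n, b):
    """Return the Cantor (hereditary) base-b writing of n as a string."""
    if n == 0:
        return "0"
    terms = []
    exponent = 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            if exponent == 0:
                term = str(digit)
            else:
                if exponent == 1:
                    power = str(b)
                else:
                    inner = cantor(exponent, b)  # exponents are expanded too
                    if not inner.isdigit():
                        inner = "(" + inner + ")"
                    power = f"{b}^{inner}"
                term = power if digit == 1 else f"{power}*{digit}"
            terms.append(term)
        exponent += 1
    return " + ".join(reversed(terms))


print(cantor(49, 2))  # 2^(2^2 + 1) + 2^(2^2) + 1
print(cantor(49, 3))  # 3^3 + 3^2*2 + 3 + 1
```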

Given a nonnegative integer $n$ , consider the Goodstein sequence defined for $i \geq 2$ by putting $x_2 = n$ , and by constructing $x_{i+1}$ from $x_i$ as follows:

Take the Cantor base- $i$ representation of $x_i$ .
Convert each $i$ into $i+1$ , getting a new number.
If the value obtained at the previous point is positive, then subtract $1$ from it.
(This is called the woodworm’s trick.)

Goodstein’s theorem. Whatever the initial value $x_2$ , the Goodstein sequence ultimately reaches the value $0$ in finitely many steps.
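The three steps of the construction can be sketched in a few lines. The names `bump` and `goodstein` are hypothetical, and the sequence is cut off after a chosen number of terms, because for most starting values it grows far beyond what can be printed before it eventually falls to $0$ .

```python
def bump(n, b):
    """Write n in Cantor base b and replace every occurrence of b by b + 1."""
    total = 0
    exponent = 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            # The exponent is itself rewritten hereditarily.
            total += digit * (b + 1) ** bump(exponent, b)
        exponent += 1
    return total


def goodstein(n, max_terms):
    """Return the terms x_2, x_3, ... starting from x_2 = n.

    Stops when 0 is reached or after max_terms terms."""
    terms = []
    x, base = n, 2
    while len(terms) < max_terms:
        terms.append(x)
        if x == 0:
            break
        x = bump(x, base) - 1  # change the base, then subtract 1
        base += 1
    return terms


print(goodstein(3, 10))   # [3, 3, 3, 2, 1, 0]
print(goodstein(49, 2))   # [49, 30502389939948]
```

Starting from $3$ the sequence reaches $0$ after five steps; starting from $49$ the second term already has fourteen digits.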

Goodstein’s proof relies on the use of ordinal arithmetic. Recall the definition: an ordinal number is an equivalence class of well-ordered sets modulo order isomorphisms, i.e., order-preserving bijections. Observe that such an order isomorphism between well-ordered sets, if it exists, is unique: if $(X, \leq_X)$ and $(Y, \leq_Y)$ are well-ordered sets, and $f, g : X \to Y$ are two distinct order isomorphisms, then the set $\{ x \in X \mid f(x) \neq g(x) \}$ is nonempty and has a minimum $m$ . If, say, $f(m) <_Y g(m)$ , then $f(m)$ is not in the image of $g$ , because $g(x) = f(x) <_Y f(m)$ for $x <_X m$ and $g(x) \geq_Y g(m) >_Y f(m)$ for $x \geq_X m$ : this contradicts the surjectivity of $g$ . The case $g(m) <_Y f(m)$ is symmetric.

An initial interval in a well-ordered set $(X, \leq)$ is a subset of the form $[0, y) = \{ x \in X \mid x < y \}$ .

Fact 1. Given any two well-ordered sets, either they are order-isomorphic, or one of them is order-isomorphic to an initial interval of the other.

In particular, every ordinal $\alpha$ is order-isomorphic to the interval $[0, \alpha)$ .

All ordinal numbers can be obtained via von Neumann’s classification:

The zero ordinal is $0 = \emptyset$ , which is trivially well-ordered as it has no nonempty subsets.
A successor ordinal is an ordinal of the form $\alpha + 1 = \alpha \sqcup \{\alpha\}$ , with every object in $\alpha$ being smaller than the new element $\alpha$ in $\alpha + 1$ .
For instance, $N + 1$ can be seen as $N \sqcup \{N\}$ .
A limit ordinal is a nonzero ordinal which is not a successor. Such an ordinal must be the least upper bound of the collection of all the ordinals below it.
For instance, the smallest transfinite ordinal $\omega$ is the limit of the collection of the finite ordinals.

Observe that, with this convention, each ordinal is an element of every ordinal strictly greater than itself.

Fact 2. Every set of ordinal numbers is well-ordered with respect to the relation: $\alpha < \beta$ if and only if $\alpha \in \beta$ .

Operations between ordinal numbers are defined as follows: (up to order isomorphisms)

$\alpha + \beta$ is a copy of $\alpha$ followed by a copy of $\beta$ , with every object in $\alpha$ being strictly smaller than any object in $\beta$ .
If $M$ and $N$ are finite ordinals, then $M+N$ has the intuitive meaning. On the other hand, $1 + \omega = \omega$ , as a copy of $1$ followed by a copy of $\omega$ is order-isomorphic to $\omega$ : but $\omega + 1$ is strictly larger than $\omega$ , as the latter is an initial interval of the former.
$\alpha \cdot \beta$ is a stack of $\beta$ copies of $\alpha$ , with each object in each layer being strictly smaller than any object of any layer above.
If $M$ and $N$ are finite ordinals, then $M \cdot N$ has the intuitive meaning. On the other hand, $2 \cdot \omega$ is a stack of $\omega$ copies of $2$ , which is order-isomorphic to $\omega$ : but $\omega \cdot 2$ is a stack of $2$ copies of $\omega$ , which is order-isomorphic to $\omega + \omega$ .
$\alpha^\beta$ is $1$ if $\beta = 0$ , $\alpha^\gamma \cdot \alpha$ if $\beta$ is the successor of $\gamma$ , and the least upper bound of the ordinals of the form $\alpha^x$ with $x < \beta$ if $\beta$ is a limit ordinal.
If $M$ and $N$ are finite ordinals, then $M^N$ has the intuitive meaning. On the other hand, $2^\omega$ is the least upper bound of all the ordinals of the form $2^N$ where $N$ is a finite ordinal, which is precisely $\omega$ : but $\omega^2 = \omega \cdot \omega$ .
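The non-commutativity of ordinal addition can be checked on a small model. The sketch below is an illustration only and covers addition alone. It assumes that ordinals below $\varepsilon_0$ are stored in Cantor normal form as nested tuples of (exponent, coefficient) pairs with strictly decreasing exponents; with that encoding, Python's built-in tuple ordering coincides with the ordering of the ordinals. The names `nat`, `OMEGA` and `add` are made up here.

```python
ZERO = ()  # the ordinal 0: an empty sum


def nat(n):
    """The finite ordinal n, i.e. omega^0 * n."""
    return ((ZERO, n),) if n else ZERO


OMEGA = ((nat(1), 1),)  # omega^1 * 1


def add(a, b):
    """Ordinal sum a + b of two ordinals in Cantor normal form."""
    if not b:
        return a
    exponent, coeff = b[0]
    # Terms of a below the leading term of b are absorbed by it.
    kept = tuple(term for term in a if term[0] > exponent)
    # A term of a with the same exponent merges with the leading term of b.
    coeff += sum(c for e, c in a if e == exponent)
    return kept + ((exponent, coeff),) + b[1:]


print(add(nat(1), OMEGA) == OMEGA)   # True:  1 + omega = omega
print(add(OMEGA, nat(1)) > OMEGA)    # True:  omega + 1 > omega
print(add(nat(2), nat(3)) == nat(5)) # True:  finite sums are the usual ones
```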

Proof of Goodstein’s theorem: To each integer value $x_i$ we associate an ordinal number $y_i$ by replacing each $i$ (which, let’s not forget, is the base $x_i$ is written in) with $\omega$ . For example, if $x_2 = 49 = 2^{2^2 + 1} + 2^{2^2} + 1$ , then

$y_2 = \omega^{\omega^\omega + 1} + \omega^{\omega^\omega} + 1$

and $x_3 = 3^{3^3 + 1} + 3^{3^3}$ (which, incidentally, equals $30,502,389,939,948$ ) so that

$y_3 = \omega^{\omega^\omega + 1} + \omega^{\omega^\omega}$

We notice that, in our example, $x_3 > x_2$ , but $y_3 < y_2$ . Why is it so? Is it just a coincidence, or is there a rule behind this?

At each step $i \geq 2$ where $x_i > 0$ , consider the writing $x_i = i^m \cdot a_m + \ldots + i \cdot a_1 + a_0$ . Three cases are possible:

$m = 0$ .
Then $y_i = a_0$ , $x_{i+1} = a_0 - 1$ as $a_0 < i$ , and $y_{i+1} = a_0 - 1 < y_i$ .
$m > 0$ and $a_0 > 0$ .
Then $y_i = \alpha + a_0$ for a transfinite ordinal $\alpha$ , and $y_{i+1} = \alpha + (a_0 - 1) < y_i$ .
$m > 0$ and $a_0 = 0$ .
Then $x_i = i^m \cdot a_m + \ldots + i^p \cdot a_p$ for some $p > 0$ with $a_p > 0$ , and $x_{i+1} = (i+1)^m \cdot a_m + \ldots + (i+1)^p \cdot a_p - 1$ is a number whose $p$ th digit in base $i+1$ is $a_p - 1$ and whose lower digits are all equal to $i$ : correspondingly, the rightmost term in $y_i$ will be replaced by a smaller ordinal in $y_{i+1}$ .

It is then clear that the sequence $y_i$ is strictly decreasing as long as $x_i > 0$ . But the collection of all ordinals not larger than $y_2$ is a well-ordered set, and every nonincreasing sequence in a well-ordered set is ultimately constant: hence, there must be a value $i$ such that $y_i = y_{i+1}$ . But the only way it can be so is when $y_i = 0$ : in turn, the only option for $y_i$ to be zero is that $x_i$ is zero as well. This proves the theorem. $\Box$
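The interplay between the integers $x_i$ and the ordinals $y_i$ can be observed on small cases. The following sketch is not the proof, only a numerical illustration under an assumed encoding: an ordinal below $\varepsilon_0$ is a nested tuple of (exponent, coefficient) pairs in Cantor normal form, so that Python's tuple comparison agrees with the ordinal ordering. The names `bump`, `to_ordinal` and `trace` are hypothetical.

```python
def bump(n, b):
    """Replace every b by b + 1 in the Cantor base-b writing of n."""
    total, exponent = 0, 0
    while n > 0:
        n, digit = divmod(n, b)
        if digit:
            total += digit * (b + 1) ** bump(exponent, b)
        exponent += 1
    return total


def to_ordinal(x, b):
    """Replace every b by omega in the Cantor base-b writing of x."""
    terms, exponent = [], 0
    while x > 0:
        x, digit = divmod(x, b)
        if digit:
            terms.append((to_ordinal(exponent, b), digit))
        exponent += 1
    return tuple(reversed(terms))  # largest exponent first


def trace(n, max_terms):
    """Pairs (x_i, y_i) for i = 2, 3, ... starting from x_2 = n."""
    pairs, x, base = [], n, 2
    while len(pairs) < max_terms:
        pairs.append((x, to_ordinal(x, base)))
        if x == 0:
            break
        x, base = bump(x, base) - 1, base + 1
    return pairs


(x2, y2), (x3, y3) = trace(49, 2)
print(x3 > x2, y3 < y2)  # True True

ordinals = [y for _, y in trace(3, 10)]
print(all(later < earlier for earlier, later in zip(ordinals, ordinals[1:])))  # True
```

For the starting value $3$ the ordinals run through $\omega + 1$ , $\omega$ , $3$ , $2$ , $1$ , $0$ while the integers read $3, 3, 3, 2, 1, 0$ .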

So why is it that Goodstein’s theorem is not provable in first-order Peano arithmetic? The intuitive reason is that the nested exponentiations can be arbitrarily many, which requires having available all the ordinals up to

$\varepsilon_0 = \left. \omega^{\omega^{\omega^{\cdot^{\cdot^{\cdot}}}}} \right\} \omega \ \mathrm{times} \; = \sup_{n < \omega} \left. \omega^{\omega^{\cdot^{\cdot^{\cdot^{\omega}}}}} \right\} n \ \mathrm{times} \, :$

this, however, is impossible if induction only allows finitely many steps, as is the case for first-order Peano arithmetic. A full discussion of a counterexample, however, would greatly exceed the scope of this post.

]]> 0 Silvio Capobianco https://anotherblogonca.wordpress.com <![CDATA[Limit languages of cellular automata]]> https://theorylunch.wordpress.com/?p=1591 2015-03-12T14:32:41Z 2015-03-12T14:07:03Z

At today’s Theory Lunch I discussed limit languages of cellular automata, and Lyman Hurd’s example of a CA whose limit language is not regular. I wrote about this on my other blog.

Link: https://anotherblogonca.wordpress.com/2015/03/12/more-evidence-that-regularity-is-not-preserved-up-to-infinity/

]]> 0 Silvio Capobianco https://anotherblogonca.wordpress.com <![CDATA[Proving the beauty of a mind]]> https://theorylunch.wordpress.com/?p=1443 2014-06-05T11:25:17Z 2014-06-05T11:25:17Z

Continue reading →]]>

In the previous Theory Lunch talk we introduced the notion of Nash equilibrium for games in normal form. Today, we went through the proof of Nash’s theorem of existence of mixed strategy Nash equilibria for finite games in normal form.

Let us recall the basic notions. In a game in normal form we have:

A set $N$ of players.
A set $S_i$ of strategies for each player.
A collection of utility functions $\{u_i\}_{i \in N}$ which associate to each strategic profile $s \in S = \prod_{i \in N} S_i$ a real number, such that $u_i(s)$ is the utility player $i$ gets from the strategic profile $s$ .

A Nash equilibrium for a game in normal form is a strategic profile $s$ such that, for every player $i$ and every strategy $s'_i$ feasible for player $i$ , it is the case that $u_i(s_i \mid s_{-i}) \geq u_i(s'_i \mid s_{-i})$ . We had seen that not every finite game in normal form admits a pure strategy Nash equilibrium: so, we introduced randomization.

A mixed strategy for player $i$ is a probability distribution $\mu_i : \mathcal{P}(S_i) \to [0,1]$ . If $S_i$ is finite, this is the same as assigning values $p_{i,j} = \mu_i(s_{i,j})$ for $j = 1, \ldots, |S_i|$ . A mixed strategy profile is a collection $\mu = \{\mu_i\}_{i \in N}$ of mixed strategies for each player. A mixed strategy Nash equilibrium is a mixed strategy profile $\mu = \{\mu_i\}_{i \in N}$ such that, for every player $i$ and every mixed strategy $\mu'_i$ feasible for player $i$ ,

$\mathbb{E}(u_i \mid \mu_i, \mu_{-i}) \geq \mathbb{E}(u_i \mid \mu'_i, \mu_{-i})$ .

The idea behind Nash’s proof goes as follows. If the game is finite, then a mixed strategy for player $i$ is identified with a point of

$\Delta_i = \left\{ x \in \mathbb{R}^{|S_i|} \mid x_j \geq 0 \, \forall j \, , \, \sum_{j=1}^{|S_i|} x_j = 1 \right\} \, :$

therefore, mixed strategy profiles can be identified with points of

$\Delta = \prod_{i=1}^{|N|} \Delta_i \subseteq \mathbb{R}^{|S_1| + |S_2| + \ldots + |S_{|N|}|} \, ,$

which is compact and convex as all of its components are. Mixed strategy Nash equilibria are those points of $\Delta$ where each pure strategy $s_{i,j}$ , $i \in N$ , $j = 1, \ldots, |S_i|$ , is used in the most efficient way: by relaxing the condition and allowing a small “slack” with respect to such most efficient way, it is possible to define a continuous transformation of mixed strategy profiles into mixed strategy profiles, which will have a fixed point because of the Brouwer fixed-point theorem. By gradually reducing the slack, a mixed strategy Nash equilibrium is found as a limit point of such approximations.

Suppose player $i$ has available the pure strategies $s_{i,j}$ for $j = 1, \ldots, |S_i|$ . Let $\mu$ be an arbitrary mixed strategy profile and $k \geq 1$ be an arbitrary integer. Consider the following quantities:

$u_{i,j}(\mu) = \mathbb{E}(u_i \mid s_{i,j}, \mu_{-i})$ .
$w_i(\mu) = \max_j u_{i,j}(\mu)$ .
$\Psi_{i,j}(\mu, k) = u_{i,j}(\mu) - w_i(\mu) + \dfrac{1}{k}$ .
$\Psi_{i,j}^+(\mu, k) = \max \left( \Psi_{i,j}(\mu, k), 0 \right)$ .

Given $i \in N$ , the sum $\sum_{j=1}^{|S_i|} \Psi_{i,j}^+ (\mu, k)$ is bounded from below by $\max_{j \in \{1, \ldots, |S_i|\}} \Psi_{i,j}^+ (\mu,k) = 1/k$ , hence the functions

$P_{i,j}(\mu,k) = \dfrac{\Psi_{i,j}^+(\mu, k)}{\sum_{\ell=1}^{|S_i|} \Psi_{i,\ell}^+ (\mu, k)}$

are continuous and nonnegative and satisfy $\sum_j P_{i,j}(\mu,k) = 1$ whatever $i \in N$ and $k \geq 1$ are. As a consequence, the functions

$F_k = \lambda (\mu : \Delta) \,.\, (P_{i,j}(\mu,k) \, \mathtt{for} \, j \, \mathtt{in} \, S_i \, \mathtt{for} \, i \, \mathtt{in} \, N) \,,$

that is,

$(F_k(\mu))_i(s_{i,j}) = P_{i,j}(\mu,k) \; \forall i \in N \, \forall j \in \{1, \ldots, |S_i|\} \,,$

are continuous transformations of $\Delta$ into itself. Let $\mu_k$ be a fixed point of $F_k$ , whose existence is ensured by the Brouwer fixed-point theorem: as $\Delta$ is compact, the sequence $\{\mu_k\}_{k \geq 1}$ has a limit point $\overline{\mu}$ .
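To see the map $F_k$ at work, here is a minimal sketch restricted, as a simplifying assumption, to two players. The payoff table is a dictionary from pairs of strategy indices to pairs of utilities, a mixed strategy profile is a pair of probability lists, and the names `pure_utility` and `nash_map` are invented for this illustration. The sketch only evaluates $F_k$ ; it does not search for a fixed point, whose existence comes from Brouwer's theorem and not from iteration.

```python
from fractions import Fraction


def pure_utility(payoff, i, j, mu):
    """u_{i,j}(mu): expected utility of player i when playing the pure
    strategy j while the other player follows the profile mu."""
    other = 1 - i
    total = Fraction(0)
    for t, prob in enumerate(mu[other]):
        profile = (j, t) if i == 0 else (t, j)
        total += prob * payoff[profile][i]
    return total


def nash_map(payoff, mu, k):
    """F_k(mu): the profile whose entries are the values P_{i,j}(mu, k)."""
    image = []
    for i in (0, 1):
        u = [pure_utility(payoff, i, j, mu) for j in range(len(mu[i]))]
        w = max(u)
        psi_plus = [max(x - w + Fraction(1, k), 0) for x in u]
        total = sum(psi_plus)  # at least 1/k, hence nonzero
        image.append([x / total for x in psi_plus])
    return image


# A 2x2 example with made-up utilities; index 0 and 1 are the two strategies.
payoff = {(0, 0): (5, 0), (0, 1): (1, 3), (1, 0): (3, 4), (1, 1): (7, 1)}
mu = [[Fraction(1, 3), Fraction(2, 3)], [Fraction(1, 4), Fraction(3, 4)]]
image = nash_map(payoff, mu, 10)
print(image)
print([sum(row) for row in image])  # each row sums to 1
```

Every row of the output is nonnegative and sums to $1$ , so the image is again a point of $\Delta$ , as the argument above requires.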

Suppose, for the sake of contradiction, that $\overline{\mu}$ is not a mixed strategy Nash equilibrium. Then there must be a player $i$ and a mixed strategy $\mu_i$ such that $\mathbb{E}(u_i \mid \mu_i, \overline{\mu}_{-i}) > \mathbb{E}(u_i \mid \overline{\mu})$ . The only way this may happen is that some pure strategy $s_{i,j}$ is used suboptimally by $\overline{\mu}$ , that is, it is played with positive probability although it is not a best response:

$\overline{\mu}_i(s_{i,j}) > 0 \;\; \mathrm{and} \;\; u_{i,j}(\overline{\mu}) < w_i(\overline{\mu}) \,.$

Choose $\varepsilon > 0$ and $k \geq 1$ so that:

$\mu_k$ belongs to a subsequence converging to $\overline{\mu}$ .
$u_{i,j}(\overline{\mu}) - w_i(\overline{\mu}) < -\varepsilon$ .
$\left| \left( u_{i,j}(\mu_k) - w_i(\mu_k) \right) - \left( u_{i,j}(\overline{\mu}) - w_i(\overline{\mu}) \right) \right| < \varepsilon/2$ .
$1/k < \varepsilon/2$ .

Points 2 and 3 tell us that $u_{i,j}(\mu_k) - w_i(\mu_k)$ is strictly smaller than $-\varepsilon/2$ : this, together with point 4, yields $\Psi_{i,j}(\mu_k, k) < 0$ , thus $\Psi_{i,j}^+(\mu_k, k) = 0$ . But $\mu_k$ is a fixed point for $F_k$ , so

$(\mu_k)_i(s_{i,j}) = (F_k(\mu_k))_i(s_{i,j}) = \dfrac{\Psi_{i,j}^+(\mu_k, k)}{\sum_{\ell=1}^{|S_i|} \Psi_{i,\ell}^+(\mu_k, k)} = 0$ :

and as $k$ may be taken arbitrarily large and $|\mu_k - \overline{\mu}|$ be made arbitrarily small, we must conclude that $\overline{\mu}_i(s_{i,j}) = 0$ too. This is a contradiction.

]]> 0 Silvio Capobianco https://anotherblogonca.wordpress.com <![CDATA[Playing with a beautiful mind]]> https://theorylunch.wordpress.com/?p=1307 2014-05-29T11:21:40Z 2014-05-29T11:21:40Z

Continue reading →]]>

Today’s talk’s topic is an idea so important in game theory, and with so many applications in different fields including computer science, that it earned its discoverer, together with Reinhard Selten and John Harsanyi, the 1994 Nobel Memorial Prize in Economic Sciences.

To introduce this idea, together with other basic game-theoretic notions, we resort to some examples. Here goes the first one:

Alice and Bob are planning an evening at the cinema. Alice would like to watch the romantic movie, while Bob would like to watch the action movie. Neither of them likes much the other’s favored movie: however, should they split, the sadness for being alone would be so big, that neither of them would enjoy his or her movie!

This is the kind of situation modeled by a game in normal form, where we have:

A set $N$ of players.
A set $S_i$ of strategies for each player.
A collection of utility functions $\{u_i\}_{i \in N}$ which associate to each strategic profile $s \in S = \prod_{i \in N} S_i$ a real number, such that $u_i(s)$ is the utility player $i$ gets from the strategic profile $s$ .

In the case of Alice and Bob, this may be summarized with a table such as the following:

	Romantic	Action
Romantic	$(4,1)$	$(0,0)$
Action	$(0,0)$	$(1,4)$

Such tables represent games in normal form between two players, where the rows of the table are labeled with the strategies suitable for the first player, and the columns of the table are labeled with the strategies suitable for the second player: the entries of the table indicate the values of the utility functions when the first player plays the corresponding row and the second player plays the corresponding column. When we want to emphasize the role of player $i$ in contrast to the others, we write $u_i(s)$ as $u_i(s_i \mid s_{-i})$ , and talk about the strategy $s_i$ of player $i$ given the strategic profile $s_{-i}$ of the other players.

Suppose that Alice is the first player, and Bob is the second player: then the table tells us that, if they both choose the romantic movie, Alice will enjoy it a lot (utility value $u_1(R,R) = 4$ ) and Bob not very much (utility value $u_2(R,R) = 1$ ). However, if Bob defects from this strategic profile and goes watch the action movie, he will ultimately not enjoy it, because he will be sad for not being together with Alice—which was the entire point about organizing the evening at the movies!

Let us consider another game (a rather serious one indeed) where the players are a lion and a gazelle. The lion wants to catch the gazelle; the gazelle wants to avoid being caught by the lion. To do this, they may choose between being on the move, or staying more or less in the same place. It turns out, from observation in the field, that the table for the lion-and-gazelle situation is similar to the one below:

	Move	Stay
Move	$(5,3)$	$(7,0)$
Stay	$(3,1)$	$(1,4)$

We observe that, for the lion, the most profitable strategy is to move. Indeed, if the gazelle moves, then the utility for the lion is $5$ if he moves, which is more than the $3$ he gets if he stays; on the other hand, if the gazelle stays, then the utility for the lion is $7$ if he moves, which is more than the $1$ he gets if he stays. A strategy such as this, which always gives the best possible result independently of the other players’ strategies, is called a dominant strategy. Such strategies are quite rare: neither Alice nor Bob from the previous game had a dominant strategy, nor has the gazelle here, as they can maximize their own profit only by choosing the same strategy as the other player.
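The check for dominant strategies is mechanical on a two-player table. The sketch below assumes that a table is a dictionary from (row strategy, column strategy) to (row utility, column utility); the function names are made up for this illustration, and "dominant" is read as "at least as good as any alternative against every choice of the opponent".

```python
def dominant_rows(table):
    """Row strategies that are a best response to every column."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    return [r for r in rows
            if all(table[r, c][0] >= table[r2, c][0]
                   for c in cols for r2 in rows)]


def dominant_cols(table):
    """Column strategies that are a best response to every row."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    return [c for c in cols
            if all(table[r, c][1] >= table[r, c2][1]
                   for r in rows for c2 in cols)]


alice_bob = {("Romantic", "Romantic"): (4, 1), ("Romantic", "Action"): (0, 0),
             ("Action", "Romantic"): (0, 0), ("Action", "Action"): (1, 4)}
lion_gazelle = {("Move", "Move"): (5, 3), ("Move", "Stay"): (7, 0),
                ("Stay", "Move"): (3, 1), ("Stay", "Stay"): (1, 4)}

print(dominant_rows(lion_gazelle), dominant_cols(lion_gazelle))  # ['Move'] []
print(dominant_rows(alice_bob), dominant_cols(alice_bob))        # [] []
```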

So, what if we relax the requirement, and just demand that every player chooses the most favorable strategy, given the strategies of the other players? This is the basic intuition under the concept of Nash equilibrium, formalized and studied by John Nash in his 1950 doctoral thesis.

Definition 1. A Nash equilibrium for a game in normal form is a strategic profile $s$ such that, for every player $i$ and every strategy $s'_i$ feasible for player $i$ , it is the case that $u_i(s_i \mid s_{-i}) \geq u_i(s'_i \mid s_{-i})$ .

The situation when both the lion and the gazelle are on the move, is a Nash equilibrium: and is the only Nash equilibrium in the corresponding game. (By definition, every dominant strategy enters every Nash equilibrium.) The situation when both Alice and Bob go watch the romantic movie, is a Nash equilibrium: and so is the one when they go watch the action movie.

So, does every game have a Nash equilibrium?

Actually, no.

Indeed, suppose that the predator and the prey, instead of being large mammals such as the lion and the gazelle, are small insects such as a dragonfly and a mosquito. It then turns out, after careful observation, that the table for the predator-prey game gets more similar to the following:

	Move	Stay
Move	$(5,0)$	$(1,3)$
Stay	$(3,4)$	$(7,1)$

In this situation, the dragonfly maximizes its utility if it does the same as the mosquito. In turn, however, the mosquito maximizes its own utility if it does the opposite of the dragonfly! In such a situation there can be no such thing as a Nash equilibrium as defined above.
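Definition 1 can be tested exhaustively on the three tables seen so far. This is a small sketch for two-player games only; the dictionary layout and the name `pure_nash` are assumptions of the illustration.

```python
def pure_nash(table):
    """All pure strategy Nash equilibria of a two-player table.

    `table` maps (row strategy, column strategy) to (row utility, column utility)."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    return [(r, c) for r in rows for c in cols
            # the row player cannot gain by changing row ...
            if all(table[r, c][0] >= table[r2, c][0] for r2 in rows)
            # ... and the column player cannot gain by changing column
            and all(table[r, c][1] >= table[r, c2][1] for c2 in cols)]


alice_bob = {("Romantic", "Romantic"): (4, 1), ("Romantic", "Action"): (0, 0),
             ("Action", "Romantic"): (0, 0), ("Action", "Action"): (1, 4)}
lion_gazelle = {("Move", "Move"): (5, 3), ("Move", "Stay"): (7, 0),
                ("Stay", "Move"): (3, 1), ("Stay", "Stay"): (1, 4)}
dragonfly_mosquito = {("Move", "Move"): (5, 0), ("Move", "Stay"): (1, 3),
                      ("Stay", "Move"): (3, 4), ("Stay", "Stay"): (7, 1)}

print(pure_nash(alice_bob))           # [('Action', 'Action'), ('Romantic', 'Romantic')]
print(pure_nash(lion_gazelle))        # [('Move', 'Move')]
print(pure_nash(dragonfly_mosquito))  # []
```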

Where determinism fails, however, randomization may help.

Definition 2. A mixed strategy for the player $i$ in a game in normal form is a probability distribution $\mu_i$ on the space $S_i$ of the strategies for player $i$ . A mixed strategy profile is a collection $\mu = \{\mu_i\}_{i \in N}$ of mixed strategies for each player.

For example, the dragonfly might decide to move with probability $p$ , and stay still with probability $1-p$ ; similarly, the mosquito might decide to move with probability $q$ , and stay still with probability $1-q$ .

With mixed strategies, the important value for player $i$ to take into account is the expected utility given the strategic profile

$\mathbb{E}(u_i \mid \mu) = \sum_{s \in S} \mu(s) u_i(s)$

which we may write $\mathbb{E}(u_i \mid \mu_i, \mu_{-i})$ when we want to emphasize the role of player $i$ .
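As a worked instance, take the table for the dragonfly (player $1$ ) and the mosquito (player $2$ ), and assume that the two insects randomize independently, so that $\mu(s)$ is the product of the probabilities each of them assigns to its own component of $s$ . With $p$ and $q$ the probabilities of moving introduced above, the sum expands into

$\mathbb{E}(u_1 \mid \mu) = 5pq + p(1-q) + 3(1-p)q + 7(1-p)(1-q)$

$\mathbb{E}(u_2 \mid \mu) = 3p(1-q) + 4(1-p)q + (1-p)(1-q)$

where each coefficient is the corresponding entry of the table, and the term of $\mathbb{E}(u_2 \mid \mu)$ for both insects moving vanishes because that entry is $0$ .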

Now, suppose that the dragonfly sets its own parameter $p$ so that the expected utility of the mosquito does not change whether the mosquito decides to move or to stay: then the mosquito has no reason to prefer either choice, and cannot gain anything by changing its own mixed strategy. Our table tells us that this corresponds to

$4(1-p) = 3p + (1-p)$

which has solution $p = 1/2$ . In turn, if the mosquito sets its own parameter $q$ so that the expected utility of the dragonfly does not change whether the dragonfly decides to move or to stay, then

$5q + (1-q) = 3q + 7(1-q)$

which has solution $q = 3/4$ . The situation where the dragonfly moves with probability $p = 1/2$ and the mosquito moves with probability $q = 3/4$ is a situation neither of the two insects has any advantage to change on its own part, given the choice of the other.
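The indifference computation can be carried out with exact rationals. The sketch below assumes a $2 \times 2$ table with an interior mixed equilibrium (nonzero denominators, solutions strictly between $0$ and $1$ ); the names `indifference` and `expected` are made up for this illustration. It solves, for each insect, the mixing probability that leaves the opponent indifferent, and then checks that no pure deviation changes the deviating player's expected utility.

```python
from fractions import Fraction

# (dragonfly utility, mosquito utility); index 0 = Move, index 1 = Stay
TABLE = {(0, 0): (5, 0), (0, 1): (1, 3), (1, 0): (3, 4), (1, 1): (7, 1)}


def indifference(table):
    """Return (p, q): p makes the column player indifferent between its two
    columns, q makes the row player indifferent between its two rows."""
    a, b = table[0, 0][1], table[0, 1][1]   # column player's utilities
    c, d = table[1, 0][1], table[1, 1][1]
    p = Fraction(d - c, a - b - c + d)      # a p + c (1-p) = b p + d (1-p)
    A, B = table[0, 0][0], table[0, 1][0]   # row player's utilities
    C, D = table[1, 0][0], table[1, 1][0]
    q = Fraction(D - B, A - B - C + D)      # A q + B (1-q) = C q + D (1-q)
    return p, q


def expected(table, player, p, q):
    """Expected utility when the row player picks row 0 with probability p
    and the column player picks column 0 with probability q."""
    pr, pc = (p, 1 - p), (q, 1 - q)
    return sum(pr[r] * pc[c] * table[r, c][player]
               for r in (0, 1) for c in (0, 1))


p, q = indifference(TABLE)
print(p, q)                                               # 1/2 3/4
print(expected(TABLE, 0, 1, q), expected(TABLE, 0, 0, q)) # dragonfly: move vs stay
print(expected(TABLE, 1, p, 1), expected(TABLE, 1, p, 0)) # mosquito: move vs stay
```

Since the expected utility of a mixed strategy is an average of the utilities of the pure strategies it uses, equal utilities for the two pure deviations mean that no mixed deviation can help either.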

Definition 3. A mixed strategy Nash equilibrium for a game in normal form is a mixed strategy profile $\mu = \{\mu_i\}_{i \in N}$ such that, for every player $i$ and every mixed strategy $\mu'_i$ feasible for player $i$ , it is the case that $\mathbb{E}(u_i \mid \mu_i, \mu_{-i}) \geq \mathbb{E}(u_i \mid \mu'_i, \mu_{-i})$ .

And here comes Nash’s great result:

Nash’s theorem. Every game in normal form that allows at most finitely many pure strategic profiles admits at least one, possibly mixed strategy, Nash equilibrium.

It is actually sufficient to prove Nash’s theorem (as he did in his doctoral thesis) when there are only finitely many players, and each of them only has finitely many pure strategies: such limitation is only apparent, because the condition that pure strategy profiles are finitely many means that all players have finitely many pure strategies, and at most finitely many of them have more than one.

The idea of the proof, which we might go through in a future Theory Lunch talk, goes as follows:

Identify the space of mixed strategic profiles with a compact and convex set $\Delta \subseteq \mathbb{R}^n$ for suitable $n$ .
For $k \geq 1$ define a family of continuous transformations $F_k : \Delta \to \Delta$ .
By the Brouwer fixed-point theorem, for every $k \geq 1$ there exists a mixed strategic profile $\mu_k$ such that $F_k(\mu_k) = \mu_k$ .
As $\Delta$ is compact, the sequence $\{\mu_k\}_{k \geq 1}$ has a limit point $\overline{\mu}$ .
By supposing that $\overline{\mu}$ is not a mixed strategy Nash equilibrium we reach a contradiction.

We remark that Nash equilibria are not optimal solutions: they are, at most, lesser evils for everyone given the circumstances. To better explain this we illustrate a classic problem in decision theory, called the prisoner’s dilemma. The police have arrested two people, who are suspects in a bank robbery: however, the only evidence is about carrying firearms without a license, which is a minor crime leading to a sentence of one year, compared to the ten years for bank robbery. So, while interrogating each suspect, they propose a bargain: if the person will testify against the other person for bank robbery, the police will drop the charges for carrying firearms without a license. The table for the prisoner’s dilemma thus has the following form:

	Quiet	Speak
Quiet	$(-1,-1)$	$(-11,0)$
Speak	$(0,-11)$	$(-10,-10)$

Then the situation where both suspects testify against each other is the only pure strategy Nash equilibrium: however, it is very far from being optimal…
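How far from optimal the equilibrium is can be made explicit by listing the profiles that are better for both suspects. The sketch below is an illustration with invented names (`pure_nash`, `better_for_both`), using the same dictionary layout as a two-player table: (row strategy, column strategy) mapped to (row utility, column utility).

```python
TABLE = {("Quiet", "Quiet"): (-1, -1), ("Quiet", "Speak"): (-11, 0),
         ("Speak", "Quiet"): (0, -11), ("Speak", "Speak"): (-10, -10)}


def pure_nash(table):
    """Profiles from which neither player gains by deviating alone."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    return [(r, c) for r in rows for c in cols
            if all(table[r, c][0] >= table[r2, c][0] for r2 in rows)
            and all(table[r, c][1] >= table[r, c2][1] for c2 in cols)]


def better_for_both(table, profile):
    """Profiles where nobody is worse off than in `profile`
    and at least one player is strictly better off."""
    u = table[profile]
    return [other for other, v in table.items()
            if v != u and v[0] >= u[0] and v[1] >= u[1]]


equilibria = pure_nash(TABLE)
print(equilibria)                             # [('Speak', 'Speak')]
print(better_for_both(TABLE, equilibria[0]))  # [('Quiet', 'Quiet')]
```

Both suspects would spend nine years less in prison by staying quiet, yet each of them gains one year by defecting from that profile alone, which is why it is not an equilibrium.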

]]> 1 Silvio Capobianco https://anotherblogonca.wordpress.com <![CDATA[Many choices from few parameters]]> https://theorylunch.wordpress.com/?p=1304 2014-05-15T12:48:59Z 2014-05-15T12:48:20Z

Continue reading →]]>

At the end of March I gave a talk about how to obtain a wide range of Bernoulli distributions with as few parameters as possible. This originates from the needs of simulations of complex systems with cellular automata machines. The solution I described comes from Mark Smith’s doctoral thesis, written under the supervision of Tommaso Toffoli.

I wrote a post on this on my blog on cellular automata.

Link: https://anotherblogonca.wordpress.com/2014/05/15/random-settings-in-cellular-automata-machines/

]]> 0
