6 Numbers from logic

6 Numbers from logic

The axioms that we have given so far generate numbers from numbers using the arithmetic operations, logic from numbers using relations, and logic from logic using the connectives and quantifiers. However, for the logic to be of any practical use, we must complete the circle by recovering numbers from it. The idea is to define an integer or a real number by giving some propositional expression that fixes it uniquely. We shall see that the various “limiting” operations in analysis can be derived from this, and that all computation is done in the arena of logic.

In the case of ℕ, we characterise those propositions that ought to be of the form φ n≡(n=_ℕa), where a:ℕ is the number to be defined. This is a rule of logic that is easily overlooked; it was first stated correctly (including our Lemma 6.3) by Giuseppe Peano, who wrote ι for our “the” [Pea97, §22].

Definition 6.1 A description is a propositional expression φ n containing a variable of overt discrete type N (such as ℕ or ℚ but not ℝ) such that

⊤ ⇔ ∃ n.φ n and φ n ∧φ m =⇒ (n=_N m).

For example, for any given a:ℕ, the proposition φ n ≡(n=_N a) is a description.

Axiom 6.2 For any description φ, there is term, written a≡the n.φ n , of type N, for which φ n ⇔(n=_N a). For example, the n.(n³=_ℕ8) is 2.

The following is a syntactical result about this new symbol:

Lemma 6.3 Any expression of type ℕ or ℚ is provably equivalent to one of the form the n.φ n , in which the description operator is not used in the syntax of φ, whilst any real or logical term be rewritten without using it at all.

Proof Any term of type ℕ or ℚ may be wrapped in a description operator,

a = the n.(n=_N a),

so we just have to show how to eliminate it from within propositional sub-expressions. From Lemma 5.14 and the Axiom we have

θ(the n.φ n) ⇔ ∃ m.θ m∧(m=_Nthe n.φ n) ⇔ ∃ m.θ m∧ φ m

and we may use this to rewrite each description operator within the expression. ▫

Lemma 6.4 A predicate α x is called decidable or complemented if there is another predicate ω x such that

α x∧ω x ⇔ ⊥ and α x∨ω x ⇔ ⊤.

They define a unique function f:X→2 for which α x≡(f x=0) and ω x≡(f x=1).

Proof The expression θ n x ≡ (n=0∧α x)∨(n=1∧ω x) satisfies

(∃ n.θ n x) ⇔ (α x∨ω x) ⇔ ⊤

and θ n x∧θ m x =⇒ (α x∧ω x)∨(m=n=0)∨(m=n=1) =⇒ (m=n).

Hence θ is a description (of n, parametrically in x) and defines a map f:X→2⊂ℕ. Since (f x=0)⇔θ 0 x⇔α x and (f x=1)⇔θ 1 x⇔ω x, the correspondence is bijective (see also Section B 11). ▫

In particular, ℕ has decidable equality, because both = and ≠ are defined for it.

Remark 6.5 Propositional expressions that have parameters may be used to embed computation in our calculus. First consider a function f:N→2 such that ∃ n.f n=1, so there are predicates α n≡(f n=1) and ω n≡(f n=0) with

…, n:ℕ ⊢ α n ∧ω n ⇔⊥, α n ∨ω n ⇔⊤ and … ⊢ ∃ n.α n ⇔⊤.

Then there is a third predicate φ n, defined by

…, n:ℕ ⊢ φ n ≡ α n ∧ ∀ m< n.ω m,

that has the uniqueness property too, so it is a description, and the n.φ n is the least n such that α n.

This way of obtaining a number n from a function f is called general recursion or minimalisation. It is known that there are general recursive functions that grow faster than any that can be defined using primitive recursion (Axiom 4.2) alone, so Lemma 6.3 cannot eliminate all descriptions: they properly extend the calculus. Minimalisation over partial functions may be formulated using pairs of predicates α and ω that need only satisfy the first of the three conditions above; [D] interprets them in ASD.

These results are proved as constructions within the ASD calculus and so hold in any model of it. However, the converse operation, which you may think of as a very limited choice principle, depends on syntactical analysis, so it is a property of the free model:

Remark 6.6 The calculus obeys the existence property:

if ⊢∃ n.φ n ⇔⊤ is provable then so is ⊢φ m ⇔⊤, for some numeral ⊢ m:ℕ,

where ⊢ with nothing on the left means that no parameters are allowed. This may be shown by methods of proof theory that go outside the calculus (e.g. logical relations, cf. Remark 3.23): the proof of ⊢φ n ⇔⊤ can be manipulated to derive the numeral ⊢ m:ℕ.

The meaning of ∃ was the battleground between classical and intuitionistic mathematics in the 1920s. Brouwer’s philosophy that in order to say that something exists we must be able to construct it was later justified by this property when intuitionism became formalised . However, our version is weaker because it only applies to integer expressions without parameters.

The search for a witness m can be made by a machine, using logic programming, and it doesn’t even require the proof as an input: the search terminates (possibly after a very long time) if it exists, but diverges or aborts if it doesn’t. However, the details of the computation depend on a careful syntactic analysis of the whole calculus, which is way beyond the scope of this paper [N].

Beware, also, that there is no lattice dual of this result: there is a propositional expression φ n that is not equivalent to ⊤, but such that ⊢φ m ⇔⊤ for each numeral ⊢ m:ℕ .

Turning to real numbers, we cannot define them by descriptions because equality of real numbers is not a proposition (Definition 4.7). We could use inequality instead (Exercise I 14.14), but it is more natural to characterise the propositional expressions that are of the form δ d⇔(d< a) and υ u⇔(a< u).

Definition 6.7 A Dedekind cut (δ,υ) is a pair of propositional expressions δ d and υ u, with real or rational arguments d and u, such that

∃ e. (d< e) ∧ δ e	⇔	δ d	∃ t. υ t ∧ (t< u)	⇔	υ u
∃ d. δ d	⇔	⊤	∃ u. υ u	⇔	⊤
δ d ∧ υ u	⇒	(d< u)	(δ d ∨ υ u)	⇐	(d< u).

The top two statements say that δ and υ are respectively ascending and descending reals, cf. Definition 3.3; we call such a pair (δ,υ) a pseudo-cut. The middle two say that they are bounded; and the bottom ones that they are disjoint and located, cf. Axiom 4.9. The real line itself is constructed from Dedekind cuts of ℚ in Section I 6.

Analogously to Axiom 6.2 for ℕ, we assert

Axiom 6.8 The type ℝ is Dedekind complete in the sense that every cut is of the form

δ d ⇔ (d< a) and υ u ⇔ (a< u) for some unique a:ℝ.

Just as we introduced the n.φ n as a new piece of syntax for the integer defined by a description, we sometimes also write cut (δ,υ) for this new real number a. However, the analogue of Lemma 6.3 for cuts relies on additional axioms, and will be given in Corollary 10.3.

In order to print the result of a real-valued expression in the traditional decimal or binary notation, we first need another axiom to say that it is equivalent to a Dedekind cut of the (decimal or dyadic) rationals, and not infinite or infinitesimal:

Axiom 6.9 The Archimedean principle says that, for x,ε:ℚ or ℝ:

ε> 0 =⇒ ∃ k:ℤ. ε(k−1)< x< ε(k+1).

We often use ε≡ 2⁻ⁿ in this, cf. Proposition I 11.15 and Lemma I 14.2:

Lemma 6.10 Between any two real numbers lies a dyadic rational:

x< y=⇒∃ n,k:ℤ. x< k· 2⁻ⁿ< y.

On either side of any real number there are arbitrarily close dyadic rationals,

x:ℝ, n:ℤ ⊢ ∃ k:ℤ. (k−1)· 2⁻ⁿ< x<(k+1)· 2⁻ⁿ. ▫

Remark 6.11 Applying the existence property (Remark 6.6) to k, we may therefore evaluate any real-valued expression ⊢ a:ℝ (without parameters) to any prescribed number ⊢ n:ℤ of binary digits: a≈ k· 2⁻ⁿ. More generally, given any possibility operator ⊢◊:Σ^ℝ→Σ (without parameters) that is true on some interval, so ⊢◊(a,b)⇔⊤, we may find k:ℤ such that

⊢◊((k−1)· 2⁻ⁿ,(k+1)· 2⁻ⁿ)⇔⊤,

cf. Theorem 2.8. The query that we raised about the existential quantifier ∃ x_n in Theorem 1.7 may also be settled in this way.

We stress that this proof-theoretic computation operates on an entire expression, and that it needs to be given the required precision n explicitly (say, 1000). Our sketchy mathematical discussion does not claim to say anything about how to implement these ideas practically on a computer, so you cannot judge from what we have said here how fast or slow it is. There is a syntactic translation that eliminates real numbers from our calculus in favour of dyadic intervals [K]. These can then be manipulated using constraint logic programming [Cle87], which can solve equations as rapidly as Newton’s algorithm does so. For some preliminary work on real computation within our calculus, see [Bau08][N].

Also, even in the case of a single real number a, it is subject to the usual caveat that there may be an error of as much as one unit (not just a half) in the last place, essentially because of Remark 1.3 again. In other words, the result k is non-determinate. This helps to explain how the computation can jump from one solution to another when we vary the coefficients of, for example, a cubic equation.

Nevertheless, returning from computation to analysis, we may abstract from this result by replacing the fixed numeral n with a (variable) parameter, and k· 2⁻ⁿ with a real-valued expression a_n that depends in a definite way on n. (Bolzano [Bol17, §§7,12] did this before Cauchy [Cau21, Chapître VI].)

Definition 6.12 A Cauchy sequence is an expression a_n:ℝ containing a variable n (albeit subscripted) of type ℕ and maybe other parameters such that

…, n,m:ℕ ⊢ n≤ m =⇒ |a_n−a_m| < 2⁻ⁿ.

This sequence converges to a limit a:ℝ if

…, n:ℕ ⊢ ⊤ ⇔ (|a_n−a| < 2¹⁻ⁿ) ≡ (a_n−2¹⁻ⁿ< a< a_n+2¹⁻ⁿ).

Remark 6.13 Here 2⁻ⁿ is called the modulus of convergence, and may be replaced by another function µ(n), for example Bishop used µ(n)≡1/n+1. Specifying the modulus makes the difference between

the classical definition, where any finite number of terms may be altered or deleted without affecting the limit, which we only find out retrospectively; and
our constructive one, for which the limit is determined to any prescribed accuracy (2⁻ⁿ) by finitely many (n) terms, where n is known in advance.

The constructive definition is consistent with the fact that the result of any computation can only depend on a finite part of the data (Remark 3.20), but the classical definition conflicts with this.

Addition, multiplication and substitution of sequences incur an administrative overhead, because they need to be re-parametrised in order to satisfy the modulus of convergence. Commonly occurring sequences such as the sums of power series do not march obediently towards their limits at the exponential pace that we have ordered. But analysts who regularly deal with series are skilled in re-bracketing and re-arranging them: there is no reason why the nth member of the sequence need be the sum of exactly n terms of the series.

Now we shall find the limit of any Cauchy sequence. Given that we are about to prove our first Theorem of analysis in ASD, we adopt a more formal style, to illustrate the way in which the rules of the previous section are used. Nevertheless, plenty of logical details have been abbreviated, so it would be a very good exercise to make them explicit, in particular formalising the idiomatic use of “there exists” using Axiom 5.10 and [Tay99, §1.6].

Theorem 6.14 Every Cauchy sequence in ℝ converges to a unique limit.

Proof The limit a will be defined as a Dedekind cut (δ,υ), where

δ d ≡ ∃ n:ℕ.(d< a_n−2⁻ⁿ) and υ u ≡ ∃ n:ℕ.(a_n+2⁻ⁿ< u).

These are easily seen to be lower and upper respectively (using transitivity of < and the Frobenius law, Lemma 5.12) and rounded (using interpolation). They are bounded by any d< a₀−1< a₀+1< u. For disjointness, given d,u, we have

δ d∧υ u ≡ ∃ n n′:ℕ.(d< a_n−2⁻ⁿ) ∧ (a_n′+2^−n′< u),

so let m≡min(n,n′), whence |a_n−a_n′| < 2^−m < 2⁻ⁿ+2^−n′ and u−d> 0.

Locatedness is usually the most difficult thing to prove, as we found when constructing ℝ itself and defining addition and multiplication on it [I]. We need the Archimedean principle (Axiom 6.9), in the form that

(d< u) =⇒ ∃ n:ℕ. (u−d> 2²⁻ⁿ).

Besides expanding the definitions of δ, υ and ≤ (Example 4.6), the proof relies on the meaning of ∃ (Axiom 5.10) and the Phoa principle (Axioms 5.6):

…, (δ d∨υ u)⇔⊥	⊢	(∃ n.d< a_n−2⁻ⁿ∨ a_n+2⁻ⁿ< u) ⇒ ⊥
…, (δ d∨υ u)⇔⊥, n:ℕ	⊢	(d< a_n−2⁻ⁿ∨ a_n+2⁻ⁿ< u) ⇒ ⊥
…, (δ d∨υ u)⇔⊥, n:ℕ	⊢	d≥ a_n−2⁻ⁿ and a_n+2⁻ⁿ≥ u
…, (δ d∨υ u)⇔⊥, n:ℕ	⊢	u−d ≤ (a_n+2⁻ⁿ)−(a_n−2⁻ⁿ) < 2²⁻ⁿ
…, (δ d∨υ u)⇔⊥, n:ℕ	⊢	(u−d> 2²⁻ⁿ) ⇒ ⊥
…, (δ d∨υ u)⇔⊥	⊢	(d< u) ⇒ (∃ n.u−d> 2²⁻ⁿ) ⇒ ⊥
…	⊢	(d< u) ⇒ (δ d∨υ u).

Hence, by Dedekind completeness, δ d⇔(d< a) and υ u⇔(a< u) for some unique a:ℝ. This is a limit because

…, n,m:ℕ ⊢ n< m =⇒ a_n−a_m < 2⁻ⁿ < 2¹⁻ⁿ−2^−m =⇒ a_n − 2¹⁻ⁿ < a_m −2^−m,

so, putting m≡ n+1,

…, n:ℕ ⊢ ⊤ ⇔ (∃ m.a_n − 2¹⁻ⁿ < a_m −2^−m) ≡ δ(a_n − 2¹⁻ⁿ) ≡ (a_n − 2¹⁻ⁿ < a),

and similarly a< a_n + 2¹⁻ⁿ. Finally, it is unique because, if …⊢ a,b:ℝ satisfy

…, n:ℕ ⊢ ⊤ ⇔ a_n − 2¹⁻ⁿ < a,b < a_n + 2¹⁻ⁿ ⇒ |a−b|< 2²⁻ⁿ,

then

… ⊢ (∃ n.|a−b|> 2²⁻ⁿ) ⇔ ⊥,

so a=b by the Archimedean principle and Definition 4.6. ▫

Remark 6.15 Other accounts of constructive analysis, most notably Errett Bishop’s [BB85], define real numbers as Cauchy sequences of rationals, but this approach leads to a heavily metrical treatment of analysis. By contrast, in ASD we can do things topologically, as we shall see. Indeed, in the drafting of this paper and [I], I had developed the analysis as far as the intermediate value theorem using the order on ℚ and ℝ before considering their arithmetic at all (Section I 11–13).

We can also take a more topological view than Remark 6.11 of the translation from Dedekind cuts to Cauchy sequences. Bearing in mind the unavoidable non-determinacy, one popular representation, called signed binary, is ∑_n d_n 2⁻ⁿ, where d_n∈{−1,0,+1}≡3. We want the evaluation map 3^ℕ→[−2,+2] to be surjective, rather than an isomorphism. In fact, it is a proper surjection, i.e. the inverse image of any a:[−2,+2] is an occupied compact subspace (Definition 8.6 and Theorem 8.9) of 3^ℕ. But the calculus does not provide function-spaces like this (essentially, Cantor space) for free, so we have first to construct it [L].

Returning to computation,

Theorem 6.16 Any computable continuous function f:ℝ→ℝ in the classical world can be represented in ASD.

Proof (Sketch) Since it is continuous, f is uniquely determined by the open subset of rational triplets (d,q,u) such that d< f q < u. On the other hand, we have a program that approximates f, which it may do in any of the senses that we have discussed in this section. The point is that this program can be adapted to one that

terminates on input (d,q,u) iff d< f q < u.

General recursion, and indeed the denotational semantics of any modern programming language, can be formulated in ASD, so this modified program can also be represented. Finally, since it yields Dedekind cuts, it defines a term x:ℝ⊢ g x:ℝ in our calculus, whose denotation back in the classical world is the given function f. ▫

The other two limiting operations of elementary analysis, derivatives and integrals, are also naturally defined as Dedekind cuts, and will be considered briefly in Section 10. Section 12 shows that ℝ is complete in an order-theoretic sense by constructing the maximum of a non-empty compact overt subspace. Definition by description and Dedekind completeness may be seen as examples of a more general categorical idea called sobriety: see Section I 14 and [A].