Computational values

1 Computational values

What does it mean for a computation to yield a value?

If the computational object is a function, or a database measured in terabytes, we may only obtain parts of its value, by querying it with arguments or search-terms. It is usual to say that if the type of the object is simple then the object is directly observable, but for complex types we must perform some computational experiment in order to access the value.

Typically, ℕ is regarded as an observable type [Plo77], but, as Alan Turing had already observed [Tur35, Section 8], if we are given two numbers with a lot of digits, say 9999999999999999 and 999999999999999, we may only determine whether or not they are equal by carefully comparing them digit by digit. For very large numbers, it may not even be feasible to print out all of the digits, so we are back in the situation of merely being prepared to print (or, indeed, to compute) whichever of the digits are actually required. Recursion theory traditionally regards the contents of a database as a huge number too.

So much for integers. What does it mean to define a real number? It is no good writing it out in decimal notation — even overlooking the ambiguity between 0.99999... and 1.00000... — because such an expression is necessarily finite, and therefore defines a rational number. For me to give you a real number in this notation, you have first to tell me how many decimal digits you require.

This interactive manner of obtaining mathematical values goes back to Weierstrass’s definition of continuity of f:ℝ→ℝ at u,

∀є> 0.∃δ> 0.∀ u′. |u′−u|<δ⇒|f(u′)−f(u)|<є.

We ask the consumer of f(u) how much accuracy (є) is required, and pass this information back to the producer of u as our own demand (δ) on the input.

Remark 1.1 In all of these examples, the value can only be elucidated by being ready to use it in situations that ultimately result in an observable value. In general, the best I can do is to be prepared (with a program) to provide as much information as you actually require.

The theme of this paper is that, once we have loosened our control over computational values to this extent, we open the floodgates to many more of them.

As ℕ is too big a type to be observable, maybe we should use 2, the type of bits? But no, this assumes that all computations terminate, so we need a type that’s simpler still. The type Σ of semi-bits is the only observable type that we need: such a value may be present (“yes”), or may never appear (“wait”). Σ is like the type that is called unit in Ml, but void in C and Java. A program of this type returns no useful information besides the fact that it has terminated, but it need not even do that. The results of many such programs may be used in parallel to light up a dot-matrix display, and thereby give an answer that is intelligible to humans.

Remark 1.2 Abstractly, it is therefore enough to consider a single program of type Σ, so long as we allow processing to go on in parallel.

A computation φ[x] of type Σ is an affirmative property of x, that is, a property that will (eventually) announce its truth, if it is true. Steven Vickers has given a nice account of properties that are affirmative but false, refutative but true, etc., showing how the algebra of affirmative properties has finite conjunctions and infinitary disjunctions, just like the lattice of open subsets of a topological space [Vic88, Chapter 2].

Indeed, by running two processes in parallel and waiting for one or both of them to terminate, this algebra admits binary conjunction and disjunction, whilst there are trivial programs that denote ⊥ and ⊤. The other possibility is to start off (one at a time) a lot of copies of the same program, giving them each the input 0, 1, 2, ..., and wait to see if one of them terminates. If the nth program terminates after N steps, we ignore (silently kill off) the n+N−1 other processes that we have started, and don’t bother to start process number n+N+1 that’s next in the queue to go. This existential quantifier is similar to the search or minimalisation operator in general recursion, though in fact it is simpler, and general recursion can be defined from it.

Definition 1.3 Mathematically, these constructions make Σ into a lattice with infinitary joins, over which meets (⊤,∧) distribute. It is convenient to consider finite (⊥,∨) and directed (⋁^↑) joins separately. Allowing joins of arbitrary families, as is required in traditional point-set topology, such a lattice is called a frame.

For computation, the joins must be recursively defined, and in particular countable. It is one of the objectives of the programme (Abstract Stone Duality) to which this paper is an introduction to re-formulate topology to agree with computation.

Because of the halting problem, there is no negation or implication. Nor is a predicate of the form ∀ n:ℕ.φ[n] affirmative, as we never finish testing φ[n]s. Whilst we can use proof theory to investigate stronger logics, we can only talk about them: the connectives ∧ and ∨, and the quantifier ∃ n, constitute the logic that we can do. In particular, we can do the pattern-matching and searching that proof theory needs by using ∧, ∨ and ∃.

We write Σ^X for the type (lattice) of observations that can be made about values of type X, because λ-abstraction and application conveniently express the formal and actual roles of the value in the process of observation. Observations, being computational objects, are themselves values that we can access only by making observations of them. The type of meta-observations is called Σ^{Σ^X}, and of course there are towers of any height you please.

There is a duality between values and observations.

Remark 1.4 One special way of making a meta-observation P:Σ^{Σ^X} about an observation φ:Σ^X is to apply it to a particular value p:X. We write

P ≡ η_X(p) for the meta-observation with P(φ) ≡ φ(p).

Thus P is a summary of the results φ(p) of all of the (possible) observations φ that we could make about p. (Being itself a computational object, the value of P can only be accessed by making observations ...)

If someone gives us a P, they are allegedly telling us the results of all of the observations that we might make about some value p, but to which they are giving us no direct access. Must we accept their word that there really is some value p behind P?

First, there are certain “healthiness” conditions that P must satisfy [Dij76, Chapter 3]. These are rather like testing the plausibility of someone’s alibis: was it really possible for someone to have been in these places at these times?

Remark 1.5 The application of observations to a value p respects the lattice operations on the algebra of observations:

truth:: If the observation (λ x.⊤) applied to the value p were not to terminate, this would mean that the computation of p did not terminate.
falsity:: If (λ x.⊥)p were to terminate, this would mean that the code to make the observation (λ x.⊥) had never been executed: somehow the computation of p has hijacked the output channel of the observation. This is indeed done in various programming languages, with a command abort, halt, etc., or by throwing an exception that is not caught. This raises the question of the scope of the exception: how far out of the execution environment is it actually caught, given that it doesn’t bring the World to an end?
Hayo Thielecke calls values that respect these constant observations discardable, though the point is that it is safe to calculate them, even when we may not need to use them.
binary conjunction:: If we can observe both φ(p) and ψ(p) separately, then we can also observe φ(p)∧ψ(p). A computation p that changes the state or consumes some resource can fail this property by using angelic non-determinism: in the execution of φ(p), it so makes its internal choices as to make φ(p) more likely to succeed, but as it might want to make different choices for ψ(p), it may not be able to win twice.
binary disjunction:: If we can observe φ(p)∨ψ(p), then we could instead observe either φ(p) or ψ(p). A program p can fail this property by using demonic non-determinism. Suppose that φ(p) and ψ(p), run individually, are unsuccessful, i.e. they don’t terminate. However, if instead we run them in parallel, the way in which each of them changes the state amounts to communication between them, with the effect that the combined computation φ(p)∨ψ(p) may follow a different and successful computation path.
directed joins:: If we can observe ⋁_i^↑φ_i(p) then we can already observe some φ_i(p). This argument about finiteness of computation is familiar from the Rice–Shapiro theorem and 1970s domain theory.

See also [Smy94, Section 4.4] for further discussion of the relationship between non-determinism and the preservation of lattice structure.

Examples 1.6 Here are some programs that violate the properties above, i.e. which respectively fail to preserve the logical connectives on the left, although we are mis-using “the” here (Section 9).

⊤	force(λφ.⊥)	the n.⊥
⊥	force(λφ.⊤)
∧	force(λφ.φ[0]∨φ[1])	the n.(n=0∨ n=1)
∨	force(λφ.φ[0]∧φ[1])

Definition 1.7 A subset P of a frame is called a filter if it contains ⊤, it is closed upwards, and also under finite meets (∧). A completely coprime filter is one such that, if ⋁ U∈ P, then already u∈ P for some u∈ U. Classically, the complement of such a filter is a prime ideal, I, which is closed downwards and under infinitary joins, ⊤∉ I, and if u∧ v∈ I then either u∈ I or v∈ I.

Remark 1.8 The motivations that we have given were translated from topology, using the dictionary

point	value
open subset	observation
open neighbourhood	observation of a value.

This view of general topology is more akin to Felix Hausdorff’s approach [Hau14] in terms of the family of open neighbourhoods of each point than to the better known Bourbaki axiomatisation of the lattice of all open subsets of the space [Bou66]. Beware that we only consider open neighbourhoods, whereas for Bourbaki any subset is a neighbourhood so long as it contains an open subset around the point. Bourbaki writes B(x) for the collection of such neighbourhoods of x.

Remark 1.9 The family P=η_X(p) of open neighbourhoods of a point p∈ X is also a completely coprime filter in the frame Σ^X of open subsets of X:

it is closed upwards, i.e. if φ∈ P and φ≤ψ then ψ∈ P;
⊤∈ P;
it is closed under intersection: if φ,ψ∈ P then φ∧ψ∈ P;
it is coprime: ⊥∉ P, and if φ∨ψ∈ P then either φ∈ P or ψ∈ P;
it is Scott-open: if ⋁_i^↑φ_i∈ P then φ_i∈ P for some i.

Remark 1.10 In Theorem 5.12 we make use of something that fails some of these conditions, namely the collection {U ∣ K⊂ U} of open neighbourhoods of a compact subspace K⊂ X. This is still a Scott-open filter (it respects ⊤, ∧ and ⋁^↑), but is only coprime if K is a singleton. (At least, that is the situation for T₁-spaces: the characterisation is more complicated in general. When we use this idea in Theorem 5.12, K must also be an upper subset in the specialisation order.)

Remark 1.11 So far we have only discussed computations that run on their own, without any input. In general, a program will take inputs u₁:U₁, ..., u_k:U_k over certain types, and we conventionally use Γ to name this list of typed variables. For the moment, we take k=1.

Suppose that P(u):Σ^{Σ^X} is a meta-observation of type X that satisfies the conditions that we have described, for each input value u∈ U. If φ:Σ^X is an observation of the output type X then P(u)(φ) is an observation of the input u, which we call ψ(u)≡ H(φ)(u). Then the lattice-theoretic properties of P(u) transfer to H:

H(⊤_Σ^X)	=	⊤_Σ^U		H(φ₁∧φ₂)	=	H(φ₁)∧ H(φ₂)
H(⊥_Σ^X)	=	⊥_Σ^U		H(φ₁∨φ₂)	=	H(φ₁)∨ H(φ₂)

together with the infinitary version, H(⋁^↑φ_i)=⋁^↑ H(φ_i).

Remarks 1.12

Such an H:Σ^X→Σ^U is called a frame homomorphism, and any continuous function f:U→ X defines such a homomorphism by Σ^f:φ↦λ u.φ(f u).
This is the Bourbaki definition of continuity: for every open subset φ⊂ X of the output, the inverse image, ψ ≡ Σ^fφ ≡ f⁻¹(φ) ≡ {u ∣ f u∈φ}⊂ U, is an open subset of the input.
In particular, for f:ℝ→ℝ, u∈ℝ and є> 0, let φ≡{x ∣ |x−f u|<є} be an open interval around f u, then ψ is an open neighbourhood of u iff it contains the open interval {u′ ∣ |u′−u|<δ } for some δ> 0. This is Weierstrass’s definition.
Computationally, if you tell me what you’re going to do with my output (your continuation φ from my procedure), I can tell you what (ψ) we’re going to do with your input.

So computations are given in the same contravariant way as continuous functions are defined in general topology.

Remark 1.13 Since we only access values via their observations,

if φ[a] ⇔ φ[b] for all observations φ:Σ^X then a=b:X.

This is a Leibniz principle for values. The corresponding property for points and open subsets of a topological space is known as the T₀ separation axiom. An equality such as φ[a] ⇔ φ[b] of two terms of type Σ means that one program terminates if and only if the other does. This equality is not itself an observable computation, as we cannot see the programs (both) failing to terminate.

Remark 1.14 Now suppose that the system P of observations does satisfy the consistency conditions that we have stated, i.e. it is a completely coprime filter. Must there now be some point p∈ X such that P=η_X(p)?

Sobriety says that there is — and then T₀ says that it’s unique.

In the parametric version, every frame homomorphism H:Σ^X→Σ^U is given by Σ^f for some unique continuous function f:U→ X. [Sobriety means existence and uniqueness.]

We shall show in this paper that the lattice-theoretic way in which we have introduced sobriety is equivalent to an equational one in the λ-calculus. In Section 9 we return to a lattice-theoretic view of ℕ, where the corresponding notion is that of a description, i.e. a predicate that provably has exactly one witness. Then sobriety produces that witness, i.e. the number that is defined by the description. Taking the same idea a little further, we obtain the search operation in general recursion.

In topology, sobriety says that spaces are determined (up to isomorphism) by their frames of open subsets, just as points are determined (up to equality) by their neighbourhoods. Sobriety is therefore a Leibniz principle for spaces. The next step is to say that not only the spaces but the entire category of spaces and continuous functions is determined by the category of frames and homomorphisms — a Leibniz principle for categories. This is developed in [B], for which we set up the preliminaries here.

Another idea, called repleteness, was investigated in synthetic domain theory [Hyl91, Tay91]. This played the same role in the theory as sobriety (cf. Remark 10.9), but it is technically weaker in some concrete categories.

Remark 1.15 We have stressed that a meta-observation P:Σ^{Σ^X} only defines a value of type X when certain conditions are satisfied. Indeed, we justified those conditions by excluding certain kinds of programs that have non-trivial computational effects.

Since fire burns, we adopt precautions for avoiding it or putting it out — that is the point of view of this paper. On the other hand, fire is useful for cooking and heating, so we also learn how to use it safely.

The mathematical techniques discussed in this paper are closely related to those that have been used by Hayo Thielecke [Thi97a, Thi97b], Carsten Führmann [Füh99] and Peter Selinger [Sel01] to study computational effects. More practically, Guy Steele [Ste78] and Andrew Appel [App92] showed how an ordinary functional program f:U→ X (without jumps, etc.) may be compiled very efficiently by regarding it as a continuation-transformer Σ^X→Σ^U. This is called the continuation-passing style. It may be extended to handle imperative idioms such as jumps, exceptions and co-routines by breaking the rules that we lay down. As in Remark 1.5, programs may hijack their continuations — altering them, not running them at all, or even calling them twice! We discuss this briefly in Remark 11.4.

Theoretical computer science often displays this ambiguity of purpose — are we applying mathematics to computation or vice versa? It is important to understand, of this and each other study, which it is trying to do.

The development of mathematics before Georg Cantor was almost entirely about the employment of computation in the service of mathematical ideas, but in an age of networks mathematics must now also be the servant of the science of complex systems, with non-determinism and computational effects. This paper and the programme that it introduces seek to use computational ideas as a foundation for conceptual mathematics. The science of systems is a travelling companion, but our destinations are different. This does not mean that our objectives conflict, because the new mathematics so obtained will be better suited than Cantor’s to the denotational foundations of high-level computation.