This page reproduces the content of http://www.slideshare.net/csclub/20130928-automated-theoremprovingharrison.


- A Survey of Automated Theorem Proving

John Harrison

Intel Corporation

28–29 September 2013 - 1: Background, history and propositional logic

John Harrison, Intel Corporation

Computer Science Club, St. Petersburg

Sat 28th September 2013 (17:20–18:55) - What I will talk about

Aim is to cover some of the most important approaches to

computer-aided proof in classical logic.

This is usually called ‘automated theorem proving’ or ‘automated

reasoning’, though we interpret “automated” quite broadly.

1. Background and propositional logic

2. First-order logic, with and without equality

3. Decidable problems in logic and algebra

4. Interactive theorem proving

5. Applications to mathematics and computer verification - What I won’t talk about

Temporal logic, model checking etc.

Higher-order logic and type theory

Constructive logic, modal logic, other nonclassical logics - For more details

An introductory survey of many central results in automated

reasoning, together with actual OCaml model implementations

http://www.cl.cam.ac.uk/~jrh13/atp/index.html - What is automated reasoning?

Attempting to perform logical reasoning in an automatic and

algorithmic way. An old dream:

Hobbes (1651): “Reason . . . is nothing but reckoning (that is,

adding and subtracting) of the consequences of general names

agreed upon, for the marking and signifying of our thoughts.”

Leibniz (1685): “When there are disputes among persons, we

can simply say: Let us calculate [calculemus], without further

ado, to see who is right.”

Nowadays, by ‘automatic and algorithmic’ we mean ‘using a

computer program’. - What does automated reasoning involve?

There are two steps to performing automated reasoning, as

anticipated by Leibniz:

Express statement of theorems in a formal language.

(Leibniz’s characteristica universalis.)

Use automated algorithmic manipulations on those formal

expressions. (Leibniz’s calculus ratiocinator).

Is that really possible? - Theoretical and practical limitations

Limitative results in logic (Gödel, Tarski, Church-Turing,

Matiyasevich) imply that not even elementary number theory

can be done completely automatically.

There are formal proof systems (e.g. first-order set theory)

and semi-decision procedures that will in principle find the

proof of anything provable in ‘ordinary’ mathematics.

In practice, because of time or space limits, these automated

procedures are not all that useful, and we may prefer an

interactive arrangement where a human guides the machine. - Why automated reasoning?

For general intellectual interest? It is a fascinating field that helps

to understand the real nature of mathematical creativity. Or more

practically:

To check the correctness of proofs in mathematics,

supplementing or even replacing the existing ‘social process’ of

peer review etc. with a more objective criterion.

To extend rigorous proof from pure mathematics to the

verification of computer systems (programs, hardware

systems, protocols etc.), supplementing or replacing the usual

testing process.

These are currently the two main drivers of progress in the field. - Theorem provers vs. computer algebra systems

Both systems for symbolic computation, but rather different:

Theorem provers are more logically flexible and rigorous

CASs are generally easier to use and more efficient/powerful

Some systems like MathXpert, Theorema blur the distinction

somewhat . . . - Limited expressivity in CASs

Often limited to conditional equations like

√(x²) = x if x ≥ 0, and √(x²) = −x if x ≤ 0

whereas using logic we can say many interesting (and highly

undecidable) things

∀x ∈ R. ∀ε > 0. ∃δ > 0. ∀x′. |x − x′| < δ ⇒ |f (x) − f (x′)| < ε - Unclear expressions in CASs

Consider an equation (x² − 1)/(x − 1) = x + 1 from a CAS. What

does it mean?

Universally valid identity (albeit not quite valid)?

Identity true when both sides are defined

Identity over the field of rational functions

. . . - Lack of rigour in many CASs

CASs often apply simplifications even when they are not strictly

valid.

Hence they can return wrong results.

Consider the evaluation of this integral in Maple:

∫₀^∞ e^(−(x−1)²)/√x dx

We try it two different ways: - An integral in Maple

> int(exp(-(x-t)^2)/sqrt(x), x=0..infinity);

[a lengthy closed-form answer involving the Bessel functions K₃/₄(t²/2) and K₇/₄(t²/2)]

> subs(t=1,%);

[the same closed form with t = 1]

> evalf(%);

0.4118623312

> evalf(int(exp(-(x-1)^2)/sqrt(x), x=0..infinity));

1.973732150 - Early research in automated reasoning
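The inconsistency in the Maple session above (0.4118623312 from numerically evaluating the symbolic answer, versus 1.973732150 from direct numerical integration) can be checked independently. The following is an illustrative Python sketch, not part of the slides: substituting x = u² turns the integral into 2∫₀^∞ e^(−(u²−1)²) du, which removes the 1/√x singularity at 0 and makes a composite Simpson's rule straightforward.

```python
import math

def integrand(u):
    # After substituting x = u**2 the integrand becomes 2*exp(-(u**2-1)**2),
    # which is smooth at u = 0 (no 1/sqrt(x) singularity).
    return 2.0 * math.exp(-((u * u - 1.0) ** 2))

def simpson(f, a, b, n=10_000):
    # Composite Simpson's rule with n (even) subintervals.
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += f(a + i * h) * (4 if i % 2 else 2)
    return s * h / 3.0

# The tail beyond u = 6 is negligible: there (u**2 - 1)**2 > 1200.
value = simpson(integrand, 0.0, 6.0)
print(value)  # close to Maple's direct evalf result 1.973732150
```

This agrees with the direct `evalf(int(...))` result, confirming that the symbolic closed form Maple produced is the wrong one.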

Most early theorem provers were fully automatic, even though

there were several different approaches:

Human-oriented AI style approaches (Newell-Simon,

Gelerntner)

Machine-oriented algorithmic approaches (Davis, Gilmore,

Wang, Prawitz)

Modern work dominated by machine-oriented approach but some

successes for AI approach. - A theorem in geometry (1)

Example of AI approach in action:

[diagram: isosceles triangle with apex A and base BC]

If the sides AB and AC are equal (i.e. the triangle is isosceles),

then the angles ABC and ACB are equal. - A theorem in geometry (2)

Drop a perpendicular meeting BC at a point D:

[diagram: triangle ABC with the foot D of the perpendicular from A on BC]

and then use the fact that the triangles ABD and ACD are

congruent. - A theorem in geometry (3)

Originally found by Pappus but not in many books:

[diagram: the same isosceles triangle ABC]

Simply, the triangles ABC and ACB are congruent. - The Robbins Conjecture (1)

Huntington (1933) presented the following axioms for a Boolean

algebra:

x + y = y + x

(x + y) + z = x + (y + z)

n(n(x) + y) + n(n(x) + n(y)) = x

Herbert Robbins conjectured that the Huntington equation can be

replaced by a simpler one:

n(n(x + y) + n(x + n(y))) = x - The Robbins Conjecture (2)

This conjecture went unproved for more than 50 years, despite

being studied by many mathematicians, even including Tarski.

It became a popular target for researchers in automated reasoning.

In October 1996, a (key lemma leading to) a proof was found by

McCune’s program EQP.

The successful search took about 8 days on an RS/6000 processor

and used about 30 megabytes of memory. - What can be automated?

Validity/satisfiability in propositional logic is decidable (SAT).

Validity/satisfiability in many temporal logics is decidable.

Validity in first-order logic is semidecidable, i.e. there are

complete proof procedures that may run forever on invalid

formulas

Validity in higher-order logic is not even semidecidable (or

anywhere in the arithmetical hierarchy). - Some specific theories

We are often interested in validity w.r.t. some suitable background

theory.

Linear theory of N or Z is decidable. Nonlinear theory not

even semidecidable.

Linear and nonlinear theory of R is decidable, though

complexity is very bad in the nonlinear case.

Linear and nonlinear theory of C is decidable. Commonly used

in geometry.

Many of these naturally generalize known algorithms like

linear/integer programming and Sturm’s theorem. - Propositional Logic

We probably all know what propositional logic is.

English       Standard   Boolean   Other
false         ⊥          0         F
true          ⊤          1         T
not p         ¬p         p̄         −p, ∼p
p and q       p ∧ q      pq        p&q, p · q
p or q        p ∨ q      p + q     p | q, p or q
p implies q   p ⇒ q      p ≤ q     p → q, p ⊃ q
p iff q       p ⇔ q      p = q     p ≡ q, p ∼ q

In the context of circuits, it’s often referred to as ‘Boolean

algebra’, and many designers use the Boolean notation. - Is propositional logic boring?
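The semantics behind this notation can be made concrete with a naive truth-table checker that enumerates all 2ⁿ assignments. An illustrative Python sketch, not from the slides:

```python
from itertools import product

def implies(a, b):
    # Material implication: a => b is (not a) or b.
    return (not a) or b

def tautology(atoms, formula):
    # Evaluate `formula` (a function of a truth-assignment dict) under
    # every assignment to `atoms` -- the 2**n rows of the truth table.
    return all(formula({a: v for a, v in zip(atoms, vals)})
               for vals in product([False, True], repeat=len(atoms)))

# Peirce's law ((p => q) => p) => p is a classic tautology.
peirce = lambda v: implies(implies(implies(v['p'], v['q']), v['p']), v['p'])
print(tautology(['p', 'q'], peirce))                             # True
print(tautology(['p', 'q'], lambda v: implies(v['p'], v['q'])))  # False
```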

Traditionally, propositional logic has been regarded as fairly boring.

There are severe limitations to what can be said with

propositional logic.

Propositional logic is trivially decidable in theory.

Propositional satisfiability (SAT) is the original NP-complete

problem, so seems intractable in practice.

But . . . - No!

The last decade or so has seen a remarkable upsurge of interest in

propositional logic.

Why the resurgence?

There are many interesting problems that can be expressed in

propositional logic

Efficient algorithms can often decide large, interesting

problems of real practical relevance.

The many applications almost turn the ‘NP-complete’ objection on

its head. - Logic and circuits

The correspondence between digital logic circuits and propositional

logic has been known for a long time.

Digital design    Propositional Logic
circuit           formula
logic gate        propositional connective
input wire        atom
internal wire     subexpression
voltage level     truth value

Many problems in circuit design and verification can be reduced to

propositional tautology or satisfiability checking (‘SAT’).

For example optimization correctness: φ ⇔ φ′ is a tautology. - Combinatorial problems

Many other apparently difficult combinatorial problems can be

encoded as Boolean satisfiability, e.g. scheduling, planning,

geometric embeddability, even factorization.

¬( (out₀ ⇔ x₀ ∧ y₀)∧

(out₁ ⇔ (x₀ ∧ y₁ ⇔ ¬(x₁ ∧ y₀)))∧

(v₂² ⇔ (x₀ ∧ y₁) ∧ x₁ ∧ y₀)∧

(u₀² ⇔ ((x₁ ∧ y₁) ⇔ ¬v₂²))∧

(u₁² ⇔ (x₁ ∧ y₁) ∧ v₂²)∧

(out₂ ⇔ u₀²) ∧ (out₃ ⇔ u₁²)∧

¬out₀ ∧ out₁ ∧ out₂ ∧ ¬out₃)

Read off the factorization 6 = 2 × 3 from a refuting assignment. - Efficient methods
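For this tiny 2-bit × 2-bit multiplier, the refuting assignment can also be found by brute force over the input bits instead of by a SAT solver. An illustrative sketch (not the slides' encoding, which constrains the circuit clauses directly):

```python
from itertools import product

# Brute-force the 2-bit x 2-bit multiplier: find the bit-vectors whose
# product has output bits (out3 out2 out1 out0) = 0110, i.e. equals 6.
# (A SAT solver finds the same assignments without exhaustive search.)
solutions = set()
for x1, x0, y1, y0 in product([0, 1], repeat=4):
    x = 2 * x1 + x0
    y = 2 * y1 + y0
    if x * y == 6:
        solutions.add((x, y))

print(solutions)  # the two factorizations (2, 3) and (3, 2), in some order
```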

The naive truth table method is quite impractical for formulas with

more than a dozen primitive propositions.

Practical use of propositional logic mostly relies on one of the

following algorithms for deciding tautology or satisfiability:

Binary decision diagrams (BDDs)

The Davis-Putnam method (DP, DPLL)

Stålmarck’s method

We’ll sketch the basic ideas behind Davis-Putnam. - DP and DPLL

Actually, the original Davis-Putnam procedure is not much used

now.

What is usually called the Davis-Putnam method is actually a later

refinement due to Davis, Loveland and Logemann (hence DPLL).

We formulate it as a test for satisfiability. It has three main

components:

Transformation to conjunctive normal form (CNF)

Application of simplification rules

Splitting - Normal forms

In ordinary algebra we can reach a ‘sum of products’ form of an

expression by:

Eliminating operations other than addition, multiplication and

negation, e.g. x − y → x + −y .

Pushing negations inwards, e.g. −(−x ) → x and

−(x + y ) → −x + −y .

Distributing multiplication over addition, e.g.

x (y + z) → xy + xz.

In logic we can do exactly the same, e.g. p ⇒ q → ¬p ∨ q,

¬(p ∧ q) → ¬p ∨ ¬q and p ∧ (q ∨ r ) → (p ∧ q) ∨ (p ∧ r ).

The first two steps give ‘negation normal form’ (NNF).

Following with the last (distribution) step gives ‘disjunctive normal

form’ (DNF), analogous to a sum-of-products. - Conjunctive normal form
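The NNF transformation just described can be sketched on a small formula datatype. Python here purely for illustration (the book's model implementations are in OCaml):

```python
# Formulas as nested tuples: ('atom', p), ('not', f), ('and', f, g),
# ('or', f, g), ('imp', f, g).
def nnf(f):
    op = f[0]
    if op == 'atom':
        return f
    if op == 'imp':                        # p => q  ~~>  ~p \/ q
        return nnf(('or', ('not', f[1]), f[2]))
    if op in ('and', 'or'):
        return (op, nnf(f[1]), nnf(f[2]))
    # op == 'not': push the negation inwards
    g = f[1]
    if g[0] == 'atom':
        return f
    if g[0] == 'not':                      # ~~p  ~~>  p
        return nnf(g[1])
    if g[0] == 'and':                      # de Morgan
        return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
    if g[0] == 'or':
        return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
    if g[0] == 'imp':                      # ~(p => q)  ~~>  p /\ ~q
        return ('and', nnf(g[1]), nnf(('not', g[2])))

p, q = ('atom', 'p'), ('atom', 'q')
print(nnf(('not', ('imp', p, q))))
# ('and', ('atom', 'p'), ('not', ('atom', 'q')))
```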


Conjunctive normal form (CNF) is the dual of DNF, where we

reverse the roles of ‘and’ and ‘or’ in the distribution step to reach a

‘product of sums’:

p ∨ (q ∧ r )

→ (p ∨ q) ∧ (p ∨ r )

(p ∧ q) ∨ r

→ (p ∨ r ) ∧ (q ∨ r )

Reaching such a CNF is the first step of the Davis-Putnam

procedure.

Unfortunately the naive distribution algorithm can cause the size of

the formula to grow exponentially — not a good start. Consider

for example:

(p1 ∧ p2 ∧ · · · ∧ pn) ∨ (q1 ∧ q2 ∧ · · · ∧ qn) - Definitional CNF

A cleverer approach is to introduce new variables for subformulas.

Although this isn’t logically equivalent, it does preserve

satisfiability.

(p ∨ (q ∧ ¬r )) ∧ s

introduce new variables for subformulas:

(p1 ⇔ q ∧ ¬r ) ∧ (p2 ⇔ p ∨ p1) ∧ (p3 ⇔ p2 ∧ s) ∧ p3

then transform to (3-)CNF in the usual way:

(¬p1 ∨ q) ∧ (¬p1 ∨ ¬r ) ∧ (p1 ∨ ¬q ∨ r )∧

(¬p2 ∨ p ∨ p1) ∧ (p2 ∨ ¬p) ∧ (p2 ∨ ¬p1)∧

(¬p3 ∨ p2) ∧ (¬p3 ∨ s) ∧ (p3 ∨ ¬p2 ∨ ¬s) ∧ p3 - Clausal form
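This definitional transformation can be sketched mechanically, here for the slide's example (p ∨ (q ∧ ¬r)) ∧ s with integer literals (negation is arithmetic negation). An illustrative Python sketch, not the book's code:

```python
from itertools import count

class DefCNF:
    # Definitional (Tseitin-style) CNF: one fresh propositional variable
    # per binary connective.  Satisfiability is preserved; logical
    # equivalence is not.
    def __init__(self):
        self.fresh = count(1)
        self.var = {}          # atom name -> variable number
        self.clauses = []

    def lit(self, name):
        if name not in self.var:
            self.var[name] = next(self.fresh)
        return self.var[name]

    def translate(self, f):
        if isinstance(f, str):
            return self.lit(f)
        if f[0] == 'not':
            return -self.translate(f[1])
        a, b = self.translate(f[1]), self.translate(f[2])
        d = next(self.fresh)   # fresh variable naming this subformula
        if f[0] == 'and':      # clauses for d <=> a /\ b
            self.clauses += [[-d, a], [-d, b], [d, -a, -b]]
        else:                  # clauses for d <=> a \/ b
            self.clauses += [[-d, a, b], [d, -a], [d, -b]]
        return d

cnf = DefCNF()
top = cnf.translate(('and', ('or', 'p', ('and', 'q', ('not', 'r'))), 's'))
cnf.clauses.append([top])     # assert the whole formula
print(len(cnf.clauses))       # 10 clauses, each with at most 3 literals
```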

It’s convenient to think of the CNF form as a set of sets:

Each disjunction p1 ∨ · · · ∨ pn is thought of as the set

{p1, . . . , pn}, called a clause.

The overall formula, a conjunction of clauses C1 ∧ · · · ∧ Cm is

thought of as a set {C1, . . . , Cm}.

Since ‘and’ and ‘or’ are associative, commutative and idempotent,

nothing of logical significance is lost in this interpretation.

Special cases: an empty clause means ⊥ (and is hence

unsatisfiable) and an empty set of clauses means ⊤ (and is hence

satisfiable). - Simplification rules

At the core of the Davis-Putnam method are two transformations

on the set of clauses:

I The 1-literal rule: if a unit clause p appears, remove ¬p from

other clauses and remove all clauses including p.

II The affirmative-negative rule: if p occurs only negated, or

only unnegated, delete all clauses involving p.

These both preserve satisfiability of the clause set. - Splitting
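The two rules can be sketched on clause sets represented as lists of integer literals (negation is arithmetic negation). An illustrative Python sketch, not the book's implementation:

```python
def one_literal(clauses):
    # Rule I: find a unit clause {p}; delete every clause containing p
    # and remove ~p from the remaining clauses.
    for c in clauses:
        if len(c) == 1:
            p = c[0]
            return [[l for l in d if l != -p]
                    for d in clauses if p not in d]
    return None     # no unit clause to use

def affirmative_negative(clauses):
    # Rule II: delete every clause containing a literal that occurs
    # with only one sign ("pure").
    lits = {l for c in clauses for l in c}
    pure = {l for l in lits if -l not in lits}
    if not pure:
        return None
    return [c for c in clauses if not (pure & set(c))]

cls = one_literal([[1], [-1, 2], [-2, 3, -4], [4, 5]])
print(cls)                         # [[2], [-2, 3, -4], [4, 5]]
print(affirmative_negative(cls))   # 3 and 5 are pure: [[2]]
```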

In general, the simplification rules will not lead to a conclusion.

We need to perform case splits.

Given a clause set ∆, simply choose a variable p, and consider the

two new sets ∆ ∪ {p} and ∆ ∪ {¬p}.

[diagram: ∆ splits into ∆ ∪ {p} and ∆ ∪ {¬p}; applying rules I and II to each branch yields ∆₀ and ∆₁]

In general, these case-splits need to be nested. - DPLL completeness

Each time we perform a case split, the number of unassigned

literals is reduced, so eventually we must terminate. Either

For all branches in the tree of case splits, the empty clause is

derived: the original formula is unsatisfiable.

For some branch of the tree, we run out of clauses: the

formula is satisfiable.

In the latter case, the decisions leading to that leaf give rise to a

satisfying assignment. - Modern SAT solvers

Much of the improvement in SAT solver performance in recent

years has been driven by several improvements to the basic DPLL

algorithm:

Non-chronological backjumping, learning conflict clauses

Optimization of the basic ‘constraint propagation’ rules

(“watched literals” etc.)

Good heuristics for picking ‘split’ variables, and even

restarting with different split sequence

Highly efficient data structures

Some well-known SAT solvers are Chaff, MiniSat and PicoSAT. - Backjumping motivation

Suppose we have clauses

¬p1 ∨ ¬p10 ∨ p11

¬p1 ∨ ¬p10 ∨ ¬p11

If we split over variables in the order p1,. . . ,p10, assuming first

that they are true, we then get a conflict.

Yet none of the assignments to p2,. . . ,p9 are relevant.

We can backjump to the decision on p1 and assume ¬p10 at once.

Or backtrack all the way and add ¬p1 ∨ ¬p10 as a deduced

‘conflict’ clause. - Stålmarck’s algorithm

Stålmarck’s ‘dilemma’ rule attempts to avoid nested case splits by

feeding back common information from both branches.

[diagram: from ∆, both branches ∆ ∪ {p} and ∆ ∪ {¬p} are explored; applying the rule set R to each yields ∆ ∪ ∆₀ and ∆ ∪ ∆₁, and the common consequences ∆ ∪ (∆₀ ∩ ∆₁) are fed back] - 2: First-order logic with and without equality

John Harrison, Intel Corporation

Computer Science Club, St. Petersburg

Sat 28th September 2013 (19:05–20:40) - First-order logic

Start with a set of terms built up from variables and constants

using function application:

x + 2 · y

≡ +(x, ·(2(), y ))

Create atomic formulas by applying relation symbols to a set of

terms

x > y

≡ > (x, y )

Create complex formulas using quantifiers

∀x. P[x] — for all x, P[x]

∃x. P[x] — there exists an x such that P[x] - Quantifier examples

The order of quantifier nesting is important. For example

∀x. ∃y . loves(x, y ) — everyone loves someone

∃x. ∀y . loves(x, y ) — somebody loves everyone

∃y . ∀x. loves(x, y ) — someone is loved by everyone

This says that a function f : R → R is continuous:

∀ε. ε > 0 ⇒ ∀x. ∃δ. δ > 0 ∧ ∀x′. |x′ − x| < δ ⇒ |f (x′) − f (x)| < ε

while this one says it is uniformly continuous, an important

distinction

∀ε. ε > 0 ⇒ ∃δ. δ > 0 ∧ ∀x. ∀x′. |x′ − x| < δ ⇒ |f (x′) − f (x)| < ε - Skolemization

Skolemization relies on this observation (related to the axiom of

choice):

(∀x . ∃y . P[x , y ]) ⇔ ∃f . ∀x . P[x , f (x )]

For example, a function is surjective (onto) iff it has a right

inverse:

(∀x . ∃y . g (y ) = x ) ⇔ (∃f . ∀x . g (f (x )) = x )

Can’t quantify over functions in first-order logic.

But we get an equisatisfiable formula if we just introduce a new

function symbol.

∀x1, . . . , xn. ∃y . P[x1, . . . , xn, y ]

→ ∀x1, . . . , xn. P[x1, . . . , xn, f (x1, . . . , xn)]

Now we just need a satisfiability test for universal formulas. - First-order automation

The underlying domains can be arbitrary, so we can’t do an

exhaustive analysis, but must be slightly subtler.

We can reduce the problem to propositional logic using the

so-called Herbrand theorem and compactness theorem, together

implying:

Let ∀x1, . . . , xn. P[x1, . . . , xn] be a first order formula

with only the indicated universal quantifiers (i.e. the

body P[x1, . . . , xn] is quantifier-free). Then the formula

is satisfiable iff all finite sets of ‘ground instances’

P[t1, . . . , tn] that arise by replacing the variables by

arbitrary variable-free terms made up from functions and

constants in the original formula are propositionally

satisfiable.

Still only gives a semidecision procedure, a kind of proof search. - Example

Suppose we want to prove the ‘drinker’s principle’

∃x. ∀y . D(x) ⇒ D(y )

Negate the formula, and prove negation unsatisfiable:

¬(∃x. ∀y . D(x) ⇒ D(y ))

Convert to prenex normal form: ∀x . ∃y . D(x ) ∧ ¬D(y )

Skolemize: ∀x . D(x ) ∧ ¬D(f (x ))

Enumerate set of ground instances, first D(c) ∧ ¬D(f (c)) is not

unsatisfiable, but the next is:

(D(c) ∧ ¬D(f (c))) ∧ (D(f (c)) ∧ ¬D(f (f (c)))) - Instantiation versus unification
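The propositional check on these ground instances can be made explicit by treating the atoms D(c), D(f(c)), D(f(f(c))) as booleans and enumerating assignments. An illustrative Python sketch, not from the slides:

```python
from itertools import product

def satisfiable(instances):
    # instances: predicates of (dc, dfc, dffc); test whether some truth
    # assignment to the three atoms makes them all true at once.
    return any(all(inst(*vals) for inst in instances)
               for vals in product([False, True], repeat=3))

inst1 = lambda dc, dfc, dffc: dc and not dfc    # D(c) /\ ~D(f(c))
inst2 = lambda dc, dfc, dffc: dfc and not dffc  # D(f(c)) /\ ~D(f(f(c)))

print(satisfiable([inst1]))         # True: the first instance alone is satisfiable
print(satisfiable([inst1, inst2]))  # False: together they are unsatisfiable
```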

The first automated theorem provers actually used that approach.

It was to test the propositional formulas resulting from the set of

ground-instances that the Davis-Putnam method was developed.

Humans tend to find instantiations intelligently based on some

understanding of the problem.

Even for the machine, instantiations can be chosen more

intelligently by a syntax-driven process of unification.

For example, choose instantiation for x and y so that D(x ) and

¬(D(f (y ))) are complementary. - Unification

Given a set of pairs of terms

S = {(s1, t1), . . . , (sn, tn)}

a unifier of S is an instantiation σ such that each

σsi = σti

If a unifier exists there is a most general unifier (MGU), of which

any other is an instance.
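A minimal sketch of such a recursive unification algorithm, with terms as nested tuples and variables as strings (Python for illustration; the book's implementations are in OCaml):

```python
def walk(t, subst):
    # Follow variable bindings to a representative term.
    while isinstance(t, str) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Occurs check: does variable v occur in term t under subst?
    t = walk(t, subst)
    if t == v:
        return True
    return not isinstance(t, str) and any(occurs(v, a, subst) for a in t[1])

def bind(v, t, subst):
    if occurs(v, t, subst):
        return None                 # e.g. x against f(x): no unifier
    return {**subst, v: t}

def unify(s, t, subst=None):
    # Return an MGU of s and t extending `subst`, or None if none exists.
    # Variables are strings; compound terms are (symbol, (arg, ...)).
    if subst is None:
        subst = {}
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):
        return bind(s, t, subst)
    if isinstance(t, str):
        return bind(t, s, subst)
    (f, sargs), (g, targs) = s, t
    if f != g or len(sargs) != len(targs):
        return None
    for a, b in zip(sargs, targs):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# Unify D(x) with D(f(y)): the MGU maps x to f(y).
print(unify(('D', ('x',)), ('D', (('f', ('y',)),))))  # {'x': ('f', ('y',))}
```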

MGUs can be found by a straightforward recursive algorithm. - Unification-based theorem proving

Many theorem-proving algorithms based on unification exist:

Tableaux

Resolution / inverse method / superposition

Model elimination

Connection method

. . .

Roughly, you can take a propositional decision procedure and “lift”

it to a first-order one by adding unification, though there are

subtleties:

Distinction between top-down and bottom-up methods

Need for factoring in resolution - Resolution

Propositional resolution is the rule:

p ∨ A    ¬p ∨ B
---------------
A ∨ B

and full first-order resolution is the generalization

P ∨ A    Q ∨ B
---------------
σ(A ∨ B)

where σ is an MGU of the literal sets P and Q⁻ (the negations of the literals in Q). - Factoring
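The propositional base rule can be sketched as naive saturation on clauses-as-sets (exponential in general; an illustration, not a practical prover):

```python
def resolvents(c1, c2):
    # All propositional resolvents of two clauses (frozensets of int
    # literals, negation = minus).
    for p in c1:
        if -p in c2:
            yield (c1 - {p}) | (c2 - {-p})

def refute(clauses):
    # Returns True iff the empty clause is derivable, i.e. the clause
    # set is unsatisfiable.  Terminates: clauses draw on a finite set
    # of literals, so saturation must stop.
    known = {frozenset(c) for c in clauses}
    while True:
        new = {frozenset(r)
               for a in known for b in known for r in resolvents(a, b)}
        if frozenset() in new:
            return True
        if new <= known:
            return False          # saturated without the empty clause
        known |= new

print(refute([{1}, {-1, 2}, {-2}]))  # True: p, p => q, ~q is contradictory
print(refute([{1, 2}]))              # False
```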

Pure propositional resolution is (refutation) complete in itself, but

in the first-order context we may in general need ‘factoring’.

Idea: we may need to make an instantiation more special to

collapse a set of literals into a smaller set.

Example: there does not exist a barber who shaves exactly the

people who do not shave themselves:

∃b. ∀x. shaves(b, x) ⇔ ¬shaves(x, x)

If we reduce to clauses we get the following to refute:

{¬shaves(x, x) ∨ ¬shaves(b, x)}, {shaves(x, x) ∨ shaves(b, x)}

and resolution doesn’t derive useful consequences without

factoring. - Adding equality

We often want to restrict ourselves to validity in normal models

where ‘equality means equality’.

Add extra axioms for equality and use non-equality decision

procedures

Use other preprocessing methods such as Brand

transformation or STE

Use special rules for equality such as paramodulation or

superposition - Equality axioms

Given a formula p, let the equality axioms be equivalence:

∀x. x = x

∀x y . x = y ⇒ y = x

∀x y z. x = y ∧ y = z ⇒ x = z

together with congruence rules for each function and predicate in

p:

∀x1 . . . xn y1 . . . yn. x1 = y1 ∧ · · · ∧ xn = yn ⇒ f (x1, . . . , xn) = f (y1, . . . , yn)

∀x1 . . . xn y1 . . . yn. x1 = y1 ∧ · · · ∧ xn = yn ⇒ R(x1, . . . , xn) ⇒ R(y1, . . . , yn) - Brand transformation

Adding equality axioms has a bad reputation in the ATP world.

Simple substitutions like x = y ⇒ f (y ) + f (f (x )) = f (x ) + f (f (y ))

need many applications of the rules.

Brand’s transformation uses a different translation to build in

equality, involving ‘flattening’

(x · y ) · z = x · (y · z)

x · y = w1 ⇒ w1 · z = x · (y · z)

x · y = w1 ∧ y · z = w2 ⇒ w1 · z = x · w2

Still not conclusively better. - Paramodulation and related methods

Often better to add special rules such as paramodulation:

C ∨ s = t    D ∨ P[s′]
----------------------
σ(C ∨ D ∨ P[t])

where σ is an MGU of s and s′.

Works best with several restrictions including the use of orderings

to orient equations.

Easier to understand for pure equational logic. - Normalization by rewriting

Use a set of equations left-to-right as rewrite rules to simplify or

normalize a term:

Use some kind of ordering (e.g. lexicographic path order) to

ensure termination

Difficulty is ensuring confluence - Failure of confluence

Consider these axioms for groups:

(x · y ) · z = x · (y · z)

1 · x = x

i (x ) · x = 1

They are not confluent because we can rewrite

(i (x ) · x ) · y −→ i (x ) · (x · y )

(i (x ) · x ) · y −→ 1 · y - Knuth-Bendix completion

Key ideas of Knuth-Bendix completion:

Use unification to identify most general situations where

confluence fails (‘critical pairs’)

Add critical pairs, suitably oriented, as new equations and

repeat

This process completes the group axioms, deducing some

non-trivial consequences along the way. - Completion of group axioms

i (x · y ) = i (y ) · i (x )

i (i (x )) = x

i (1) = 1

x · i (x ) = 1

x · i (x ) · y = y

x · 1 = x

i (x ) · x · y = y

1 · x = x

i (x ) · x = 1

(x · y ) · z = x · y · z - Decidable fragments of F.O.L.

Validity in first-order logic is only semidecidable (Church-Turing).

However, there are some interesting special cases where it is

decidable, e.g.

AE formulas: no function symbols, universal quantifiers before

existentials in prenex form

Monadic formulas: no function symbols, only unary predicates

All ‘syllogistic’ reasoning can be reduced to the monadic fragment:

If all M are P, and all S are M, then all S are P

can be expressed as the monadic formula:

(∀x . M(x ) ⇒ P(x )) ∧ (∀x . S (x ) ⇒ M(x )) ⇒ (∀x . S (x ) ⇒ P(x )) - Why AE is decidable


The negation of an AE formula is an EA formula to be refuted:

∃x1, . . . , xn. ∀y1, . . . , ym. P[x1, . . . , xn, y1, . . . , ym]

and after Skolemization we still have no functions:

∀y1, . . . , ym. P[c1, . . . , cn, y1, . . . , ym]

So there are only finitely many ground instances to check for

satisfiability.

Since the equality axioms are purely universal formulas, adding

those doesn’t disturb the AE/EA nature, so we get Ramsey’s

decidability result. - The finite model property

Another way of understanding decidability results is that fragments

like AE and monadic formulas have the finite model property:

If the formula in the fragment has a model it has a finite

model.

Any fragment with the finite model property is decidable: search

for a model and a disproof in parallel.

Often we even know the exact size we need to consider: e.g. size 2ⁿ

for a monadic formula with n predicates.

In practice, we quite often find finite countermodels to false

formulas. - Failures of the FMP


However many formulas with simple quantifier prefixes don’t have

the FMP:

(∀x . ¬R(x , x )) ∧ (∀x . ∃z. R(x , z))∧

(∀x y z. R(x , y ) ∧ R(y , z) ⇒ R(x , z))

(∀x . ¬R(x , x )) ∧ (∀x . ∃y . R(x , y ) ∧ ∀z. R(y , z) ⇒ R(x , z))

¬( (∀x. ¬F (x, x))∧

(∀x y . F (x , y ) ⇒ F (y , x ))∧

(∀x y . ¬(x = y ) ⇒ ∃z. F (x , z) ∧ F (y , z)∧

∀w . F (x, w ) ∧ F (y , w ) ⇒ w = z)

⇒ ∃u. ∀v . ¬(v = u) ⇒ F (u, v )) - The theory of equality

Even equational logic is undecidable, but the purely universal

(quantifier-free) fragment is decidable. For example:

∀x. f (f (f (x))) = x ∧ f (f (f (f (f (x))))) = x ⇒ f (x) = x

after negating and Skolemizing we need to test a ground formula

for satisfiability:

f (f (f (c))) = c ∧ f (f (f (f (f (c))))) = c ∧ ¬(f (c) = c)

Two well-known algorithms:

Put the formula in DNF and test each disjunct using one of

the classic ‘congruence closure’ algorithms.

Reduce to SAT by introducing a propositional variable for

each equation between subterms and adding constraints. - Current first-order provers
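For the example above, the congruence-closure check can be sketched with a union-find over the terms c, f(c), . . . , f⁵(c), indexed by their f-nesting depth. An illustration, not an industrial congruence closure algorithm:

```python
# Terms c, f(c), ..., f^5(c) are represented by their depths 0..5.
N = 6
parent = list(range(N))

def find(i):
    # Union-find: follow parent links to the class representative.
    while parent[i] != i:
        i = parent[i]
    return i

def union(i, j):
    parent[find(i)] = find(j)

union(3, 0)   # the hypothesis f^3(c) = c
union(5, 0)   # the hypothesis f^5(c) = c

# Congruence propagation: whenever a ~ b, also f(a) ~ f(b); repeat to a
# fixpoint (both successors must still lie among our terms 0..5).
changed = True
while changed:
    changed = False
    for a in range(N - 1):
        for b in range(N - 1):
            if find(a) == find(b) and find(a + 1) != find(b + 1):
                union(a + 1, b + 1)
                changed = True

print(find(1) == find(0))   # True: f(c) = c follows, refuting ~(f(c) = c)
```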

There are numerous competing first-order theorem provers

Vampire

E

SPASS

Prover9

LeanCop

and many specialist equational solvers like Waldmeister and EQP.

There are annual theorem-proving competitions where they are

tested against each other, which has helped to drive progress. - 3: Decidable problems in logic and algebra

John Harrison, Intel Corporation

Computer Science Club, St. Petersburg

Sun 29th September 2013 (11:15–12:50) - Decidable theories

More useful in practical applications are cases not of pure validity,

but validity in special (classes of) models, or consequence from

useful axioms, e.g.

Does a formula hold over all rings (Boolean rings,

non-nilpotent rings, integral domains, fields, algebraically

closed fields, . . . )

Does a formula hold in the natural numbers or the integers?

Does a formula hold over the real numbers?

Does a formula hold in all real-closed fields?

. . .

Because arithmetic comes up in practice all the time, there’s

particular interest in theories of arithmetic. - Theories

These can all be subsumed under the notion of a theory, a set of

formulas T closed under logical validity. A theory T is:

Consistent if we never have p ∈ T and (¬p) ∈ T .

Complete if for closed p we have p ∈ T or (¬p) ∈ T .

Decidable if there’s an algorithm to tell us whether a given

closed p is in T

Note that a complete theory generated by an r.e. axiom set is also

decidable. - Quantifier elimination

Often, a quantified formula is T -equivalent to a quantifier-free one:

C |= (∃x . x² + 1 = 0) ⇔ ⊤

R |= (∃x . ax² + bx + c = 0) ⇔ a ≠ 0 ∧ b² ≥ 4ac ∨ a = 0 ∧ (b ≠ 0 ∨ c = 0)

Q |= (∀x . x < a ⇒ x < b) ⇔ a ≤ b

Z |= (∃k x y . ax = (5k + 2)y + 1) ⇔ ¬(a = 0)

We say a theory T admits quantifier elimination if every formula

has this property.

Assuming we can decide variable-free formulas, quantifier

elimination implies completeness.

And then an algorithm for quantifier elimination gives a decision

method. - Important arithmetical examples

Presburger arithmetic: arithmetic equations and inequalities

with addition but not multiplication, interpreted over Z or N.

Tarski arithmetic: arithmetic equations and inequalities with

addition and multiplication, interpreted over R (or any

real-closed field)

Complex arithmetic: arithmetic equations with addition and

multiplication interpreted over C (or other algebraically closed

field of characteristic 0).

However, arithmetic with multiplication over Z is not even

semidecidable, by Gödel’s theorem.

Nor is arithmetic over Q (Julia Robinson), nor just solvability of

equations over Z (Matiyasevich). Equations over Q unknown. - Word problems

Want to decide whether one set of equations implies another in a

class of algebraic structures:

∀x. s1 = t1 ∧ · · · ∧ sn = tn ⇒ s = t

For rings, we can assume it’s a standard polynomial form

∀x. p1(x) = 0 ∧ · · · ∧ pn(x) = 0 ⇒ q(x) = 0 - Word problem for rings

∀x. p1(x) = 0 ∧ · · · ∧ pn(x) = 0 ⇒ q(x) = 0

holds in all rings iff

q ∈ Id_Z ⟨p1, . . . , pn⟩

i.e. there exist ‘cofactor’ polynomials q1, . . . , qn with integer

coefficients such that

p1 · q1 + · · · + pn · qn = q - Special classes of rings

Torsion-free: x + · · · + x = 0 ⇒ x = 0 (n summands, n ≥ 1)

Characteristic p: 1 + · · · + 1 = 0 (n summands) iff p | n

Integral domains: x · y = 0 ⇒ x = 0 ∨ y = 0 (and 1 ≠ 0). - Special word problems

∀x. p1(x) = 0 ∧ · · · ∧ pn(x) = 0 ⇒ q(x) = 0

Holds in all rings iff q ∈ Id_Z ⟨p1, . . . , pn⟩

Holds in all torsion-free rings iff q ∈ Id_Q ⟨p1, . . . , pn⟩

Holds in all integral domains iff q^k ∈ Id_Z ⟨p1, . . . , pn⟩ for some k ≥ 0

Holds in all integral domains of characteristic 0 iff q^k ∈ Id_Q ⟨p1, . . . , pn⟩ for some k ≥ 0 - Embedding in field of fractions

[diagram: an integral domain embeds in its field of fractions]

Universal formula in the language of rings holds in all integral

domains [of characteristic p] iff it holds in all fields [of

characteristic p]. - Embedding in algebraic closure

[Diagram: a field embeds, up to isomorphism, in an algebraically closed field, its algebraic closure.]

Universal formula in the language of rings holds in all fields [of

characteristic p] iff it holds in all algebraically closed fields [of

characteristic p] - Connection to the Nullstellensatz

Also, algebraically closed fields of the same characteristic are

elementarily equivalent.

For a universal formula in the language of rings, all these are

equivalent:

It holds in all integral domains of characteristic 0

It holds in all fields of characteristic 0

It holds in all algebraically closed fields of characteristic 0

It holds in any given algebraically closed field of characteristic

0

It holds in C

Penultimate case is basically the Hilbert Nullstellensatz. - Gröbner bases

Can solve all these ideal membership goals in various ways.

The most straightforward uses Gröbner bases.

Use polynomial m1 + m2 + · · · + mp = 0 as a rewrite rule

m1 = −m2 − · · · − mp for a ‘head’ monomial according to

ordering.

Perform operation analogous to Knuth-Bendix completion to get

expanded set of equations that is confluent, a Gröbner basis. - Geometric theorem proving
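The head-monomial rewriting step just described can be sketched as a toy reduction procedure (illustrative Python assuming lex ordering and head coefficient 1; it does no completion, and is not the book's OCaml implementation):

```python
# Toy reduction by head-monomial rewriting. Polynomials are dicts from
# exponent tuples to nonzero integer coefficients.

def head(p):
    return max(p)  # leading monomial in lex order

def divides(m, n):
    return all(a <= b for a, b in zip(m, n))

def reduce_once(p, rules):
    for r in rules:
        h = head(r)
        for m in sorted(p, reverse=True):
            if divides(h, m):
                shift = tuple(a - b for a, b in zip(m, h))
                c = p[m]
                # p := p - c * x^shift * r, cancelling the monomial c*m
                # (assumes the head coefficient of r is 1)
                q = dict(p)
                for mr, cr in r.items():
                    mm = tuple(a + b for a, b in zip(shift, mr))
                    q[mm] = q.get(mm, 0) - c * cr
                    if q[mm] == 0:
                        del q[mm]
                return q, True
    return p, False

def reduce_full(p, rules):
    changed = True
    while p and changed:
        p, changed = reduce_once(p, rules)
    return p

# Reduce x^2 + x by the rule x^2 -> 1 (from x^2 - 1 = 0): remainder x + 1.
print(reduce_full({(2,): 1, (1,): 1}, [{(2,): 1, (0,): -1}]))  # {(1,): 1, (0,): 1}
```

For a Gröbner basis the rule set is first completed so that such reduction decides ideal membership.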

In principle can solve most geometric problems by using coordinate

translation then Tarski’s real quantifier elimination.

Example: A, B, C are collinear iff

(Ax − Bx)(By − Cy) = (Ay − By)(Bx − Cx)
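The condition is trivially checkable on concrete coordinates (a small illustrative sketch, not from the slides):

```python
# The algebraic collinearity test above, on concrete integer points.
def collinear(A, B, C):
    (ax, ay), (bx, by), (cx, cy) = A, B, C
    return (ax - bx) * (by - cy) == (ay - by) * (bx - cx)

print(collinear((0, 0), (1, 1), (2, 2)))  # True: all on the line y = x
print(collinear((0, 0), (1, 1), (2, 3)))  # False
```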

In practice, it’s much faster to use decision procedures for complex

numbers. Remarkably, many geometric theorems remain true in

this more general context.

As well as Gröbner bases, Wu pioneered the approach using

characteristic sets (Ritt-Wu triangulation). - Degenerate cases

Many simple and not-so-simple theorems can be proved using a

straightforward algebraic reduction, but we may encounter

problems with degenerate cases, e.g.

The parallelogram theorem: If ABCD is a parallelogram,

and the diagonals AC and BD meet at E , then

|AE | = |CE |.

This is ‘true’ but fails when ABC is collinear.

A major strength of Wu’s method is that it can actually derive

many such conditions automatically without the user’s having to

think of them. - Quantifier elimination for real-closed fields

Take a first-order language:

All rational constants p/q

Operators of negation, addition, subtraction and multiplication

Relations ‘=’, ‘<’, ‘≤’, ‘>’, ‘≥’

We’ll prove that every formula in the language has a quantifier-free

equivalent, and will give a systematic algorithm for finding it. - Applications

In principle, this method can be used to solve many non-trivial

problems.

Kissing problem: how many disjoint n-dimensional

spheres can be packed into space so that they touch a

given unit sphere?

Pretty much any geometrical assertion can be expressed in this

theory.

If the theorem holds for complex values of the coordinates, then

simpler methods are available (Gröbner bases, Wu-Ritt

triangulation, . . . ).

1930: Tarski discovers quantifier elimination procedure for this

theory.

1948: Tarski’s algorithm published by RAND

1954: Seidenberg publishes simpler algorithm

1975: Collins develops and implements cylindrical algebraic

decomposition (CAD) algorithm

1983: Hörmander publishes very simple algorithm based on

ideas by Cohen.

1990: Vorobjov improves complexity bound to doubly

exponential in number of quantifier alternations.

We’ll present the Cohen-Hörmander algorithm. - Current implementations

There are quite a few simple versions of real quantifier elimination,

even in computer algebra systems like Mathematica.

Among the more heavyweight implementations are:

qepcad —

http://www.cs.usna.edu/~qepcad/B/QEPCAD.html

REDLOG — http://www.fmi.uni-passau.de/~redlog/ - One quantifier at a time

For a general quantifier elimination procedure, we just need one for

a formula

∃x. P[a1, . . . , an, x]

where P[a1, . . . , an, x] involves no other quantifiers but may involve

other variables.

Then we can apply the procedure successively inside to outside,

dealing with universal quantifiers via (∀x . P[x ]) ⇔ (¬∃x . ¬P[x ]). - Forget parametrization for now

First we’ll ignore the fact that the polynomials contain variables

other than the one being eliminated.

This keeps the technicalities a bit simpler and shows the main

ideas clearly.

The generalization to the parametrized case will then be very easy:

Replace polynomial division by pseudo-division

Perform case-splits to determine signs of coefficients - Sign matrices

Take a set of univariate polynomials p1(x), . . . , pn(x).

A sign matrix for those polynomials is a division of the real line

into alternating points and intervals:

(−∞, x1), x1, (x1, x2), x2, . . . , xm−1, (xm−1, xm), xm, (xm, +∞)

and a matrix giving the sign of each polynomial on each interval:

Positive (+)

Negative (−)

Zero (0) - Sign matrix example

The polynomials p1(x) = x^2 − 3x + 2 and p2(x) = 2x − 3 have the

following sign matrix:

Point/Interval   p1   p2
(−∞, x1)          +    −
x1                0    −
(x1, x2)          −    −
x2                −    0
(x2, x3)          −    +
x3                0    +
(x3, +∞)          +    + - Using the sign matrix
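The matrix for this example can be spot-checked numerically by sampling each point and interval (an illustrative sketch; the real algorithm of course works symbolically, without knowing the roots):

```python
# Spot-check of the sign matrix for p1 = x^2 - 3x + 2 and p2 = 2x - 3.
# Here x1, x2, x3 = 1, 3/2, 2 (the roots of p1 and p2), and we sample one
# rational point inside each interval.
from fractions import Fraction as F

def sign(v):
    return '+' if v > 0 else '-' if v < 0 else '0'

def p1(x): return x * x - 3 * x + 2
def p2(x): return 2 * x - 3

samples = [F(0), F(1), F(5, 4), F(3, 2), F(7, 4), F(2), F(3)]
rows = [(sign(p1(x)), sign(p2(x))) for x in samples]
print(rows)
# [('+', '-'), ('0', '-'), ('-', '-'), ('-', '0'), ('-', '+'), ('0', '+'), ('+', '+')]
```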

Using the sign matrix for all polynomials appearing in P[x ] we can

answer any quantifier elimination problem: ∃x . P[x ]

Look to see if any row of the matrix satisfies the formula

(hence dealing with existential)

For each row, just see if the corresponding set of signs

satisfies the formula.

We have replaced the quantifier elimination problem with sign

matrix determination - Finding the sign matrix

For constant polynomials, the sign matrix is trivial (2 has sign ‘+’

etc.)

To find a sign matrix for p, p1, . . . , pn it suffices to find one for

p′, p1, . . . , pn, r0, r1, . . . , rn, where

p0 ≡ p′ is the derivative of p

ri = rem(p, pi)

(Remaindering means we have some qi so p = qi · pi + ri .)

Taking p to be the polynomial of highest degree we get a simple

recursive algorithm for sign matrix determination. - Details of recursive step
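The remainder operation rem(p, pi) driving the recursion is ordinary univariate polynomial division; a sketch over Q (illustrative only; the list-of-coefficients representation and the helper name divmod_poly are my own):

```python
# Division with remainder for univariate polynomials over Q.
# Polynomials are coefficient lists, lowest degree first.
from fractions import Fraction as F

def divmod_poly(p, d):
    """Return (q, r) with p = q*d + r and deg r < deg d."""
    p = [F(c) for c in p]
    q = [F(0)] * max(1, len(p) - len(d) + 1)
    while len(p) >= len(d) and any(p):
        while p and p[-1] == 0:      # strip leading (highest-degree) zeros
            p.pop()
        if len(p) < len(d):
            break
        shift = len(p) - len(d)
        c = p[-1] / d[-1]
        q[shift] = c
        for i, dc in enumerate(d):   # subtract c * x^shift * d from p
            p[shift + i] -= c * dc
    return q, p

# x^2 + 1 = (x + 1)*(x - 1) + 2: expect quotient x + 1, remainder 2.
q, r = divmod_poly([1, 0, 1], [-1, 1])
print(q, r)
```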

So, suppose we have a sign matrix for p′, p1, . . . , pn, r0, r1, . . . , rn.

We need to construct a sign matrix for p, p1, . . . , pn.

May need to add more points and hence intervals for roots of

p

Need to determine signs of p1, . . . , pn at the new points and

intervals

Need the sign of p itself everywhere. - Step 1

Split the given sign matrix into two parts, but keep all the points

for now:

M for p′, p1, . . . , pn

M′ for r0, r1, . . . , rn

We can infer the sign of p at all the ‘significant’ points of M as

follows:

p = qi pi + ri

and for each of our points, one of the pi is zero, so p = ri there

and we can read off p’s sign from ri ’s. - Step 2

Now we’re done with M′ and we can throw it away.

We also ‘condense’ M by eliminating points that are not roots of

one of the p′, p1, . . . , pn.

Note that the sign of any of these polynomials is stable on the

condensed intervals, since they have no roots there.

We know the sign of p at all the points of this matrix.

However, p itself may have additional roots, and we don’t

know anything about the intervals yet. - Step 3

There can be at most one root of p in each of the existing

intervals, because otherwise p′ would have a root there.

We can tell whether there is a root by checking the signs of p

(determined in Step 1) at the two endpoints of the interval.

Insert a new point precisely if p has strictly opposite signs at the

two endpoints (simple variant for the two end intervals).

None of the other polynomials change sign over the original

interval, so just copy the values to the point and subintervals.

Throw away p and we’re done! - Multivariate generalization

In the multivariate context, we can’t simply divide polynomials.

Instead of

p = pi · qi + ri

we get

a^k · p = pi · qi + ri

where a is the leading coefficient of pi .

The same logic works, but we need case splits to fix the sign of a. - Real-closed fields

With more effort, all the ‘analytical’ facts can be deduced from the

axioms for real-closed fields.

Usual ordered field axioms

Existence of square roots: ∀x. x ≥ 0 ⇒ ∃y. x = y^2

Solvability of odd-degree equations:

∀a0, . . . , an. an ≠ 0 ⇒ ∃x. an·x^n + an−1·x^(n−1) + · · · + a1·x + a0 = 0

Examples include computable reals and algebraic reals. So this

already gives a complete theory, without a stronger completeness

axiom. - Need for combinations

In applications we often need to combine decision methods from

different domains.

x − 1 < n ∧ ¬(x < n) ⇒ a[x ] = a[n]

An arithmetic decision procedure could easily prove

x − 1 < n ∧ ¬(x < n) ⇒ x = n

but could not make the additional final step, even though it looks

trivial. - Most combinations are undecidable

Adding almost any new symbols, especially uninterpreted ones, to

the usual decidable arithmetic theories destroys decidability.

Some exceptions like BAPA (‘Boolean algebra + Presburger

arithmetic’).

This formula over the reals constrains P to define the integers:

(∀n. P(n + 1) ⇔ P(n)) ∧ (∀n. 0 ≤ n ∧ n < 1 ⇒ (P(n) ⇔ n = 0))

and this one in Presburger arithmetic defines squaring:

(∀n. f (−n) = f (n)) ∧ (f (0) = 0)∧

(∀n. 0 ≤ n ⇒ f (n + 1) = f (n) + n + n + 1)

and so we can define multiplication. - Quantifier-free theories
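It is easy to check that the recurrence above does force f(n) = n^2 (an illustrative sketch, not part of the slides):

```python
# The recurrence f(0) = 0, f(n+1) = f(n) + n + n + 1 pins f down to squaring,
# which is why adding such an f to Presburger arithmetic yields multiplication.
def f(n):
    v = 0
    for k in range(n):
        v = v + k + k + 1
    return v

print([f(n) for n in range(6)])  # [0, 1, 4, 9, 16, 25]
```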

However, if we stick to so-called ‘quantifier-free’ theories, i.e.

deciding universal formulas, things are better.

Two well-known methods for combining such decision procedures:

Nelson-Oppen

Shostak

Nelson-Oppen is more general and conceptually simpler.

Shostak seems more efficient where it does work, and only recently

has it really been understood. - Nelson-Oppen basics

Key idea is to combine theories T1, . . . , Tn with disjoint

signatures. For instance

T1: numerical constants, arithmetic operations

T2: list operations like cons, head and tail.

T3: other uninterpreted function symbols.

The only common function or relation symbol is ‘=’.

This means that we only need to share formulas built from

equations among the component decision procedure, thanks to the

Craig interpolation theorem. - The interpolation theorem

Several slightly different forms; we’ll use this one (by compactness,

generalizes to theories):

If |= φ1 ∧ φ2 ⇒ ⊥ then there is an ‘interpolant’ ψ, whose

only free variables and function and predicate symbols are

those occurring in both φ1 and φ2, such that |= φ1 ⇒ ψ

and |= φ2 ⇒ ¬ψ.

This is used to assure us that the Nelson-Oppen method is

complete, though we don’t need to produce general interpolants in

the method.

In fact, interpolants can be found quite easily from proofs,

including Herbrand-type proofs produced by resolution etc. - Nelson-Oppen I

Proof by example: refute the following formula in a mixture of

Presburger arithmetic and uninterpreted functions:

f (v − 1) − 1 = v + 1 ∧ f (u) + 1 = u − 1 ∧ u + 1 = v

First step is to homogenize, i.e. get rid of atomic formulas

involving a mix of signatures:

u + 1 = v ∧ v1 + 1 = u − 1 ∧ v2 − 1 = v + 1 ∧ v2 = f (v3) ∧ v1 =

f (u) ∧ v3 = v − 1

so now we can split the conjuncts according to signature:

(u + 1 = v ∧ v1 + 1 = u − 1 ∧ v2 − 1 = v + 1 ∧ v3 = v − 1)∧

(v2 = f (v3) ∧ v1 = f (u)) - Nelson-Oppen II

If the entire formula is contradictory, then there’s an interpolant ψ

such that in Presburger arithmetic:

Z |= u + 1 = v ∧ v1 + 1 = u − 1 ∧ v2 − 1 = v + 1 ∧ v3 = v − 1 ⇒ ψ

and in pure logic:

|= v2 = f (v3) ∧ v1 = f (u) ∧ ψ ⇒ ⊥

We can assume it only involves variables and equality, by the

interpolant property and disjointness of signatures.

Subject to a technical condition about finite models, the pure

equality theory admits quantifier elimination.

So we can assume ψ is a propositional combination of equations

between variables. - Nelson-Oppen III

In our running example, u = v3 ∧ ¬(v1 = v2) is one suitable

interpolant, so

Z |= u + 1 = v ∧ v1 + 1 = u − 1 ∧ v2 − 1 = v + 1 ∧ v3 = v − 1 ⇒

u = v3 ∧ ¬(v1 = v2)

in Presburger arithmetic, and in pure logic:

|= v2 = f (v3) ∧ v1 = f (u) ∧ u = v3 ∧ ¬(v1 = v2) ⇒ ⊥

The component decision procedures can deal with those, and the

result is proved. - Nelson-Oppen IV

Could enumerate all significantly different potential interpolants.

Better: case-split the original problem over all possible equivalence

relations between the variables (5 in our example).

T1, . . . , Tn |= φ1 ∧ · · · ∧ φn ∧ ar (P) ⇒ ⊥

So by interpolation there’s a C with

T1 |= φ1 ∧ ar (P) ⇒ C

T2, . . . , Tn |= φ2 ∧ · · · ∧ φn ∧ ar (P) ⇒ ¬C

Since ar (P) ⇒ C or ar (P) ⇒ ¬C , we must have one theory with

Ti |= φi ∧ ar (P) ⇒ ⊥. - Nelson-Oppen V

Still, there are quite a lot of possible equivalence relations

(Bell(5) = 52), leading to large case-splits.
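The number of equivalence relations on n variables is the nth Bell number, computable via the Bell triangle (an illustrative sketch; the helper name bell is mine):

```python
# Bell numbers via the Bell triangle: each row begins with the last entry of
# the previous row; each later entry adds its left neighbour to the entry
# above it. B(n) = number of equivalence relations on an n-element set.
def bell(n):
    row = [1]
    for _ in range(n - 1):
        new = [row[-1]]
        for v in row:
            new.append(new[-1] + v)
        row = new
    return row[-1]

print([bell(n) for n in range(1, 7)])  # [1, 2, 5, 15, 52, 203]
```

The rapid growth of these numbers is exactly why the naive case-split is expensive.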

An alternative formulation is to repeatedly let each theory deduce

new disjunctions of equations, and case-split over them.

Ti |= φi ⇒ x1 = y1 ∨ · · · ∨ xn = yn

This allows two important optimizations:

If theories are convex, need only consider pure equations, no

disjunctions.

Component procedures can actually produce equational

consequences rather than waiting passively for formulas to

test. - Shostak’s method

Can be seen as an optimization of Nelson-Oppen method for

common special cases. Instead of just a decision method each

component theory has a

Canonizer — puts a term in a T-canonical form

Solver — solves systems of equations

Shostak’s original procedure worked well, but the theory was flawed

on many levels. In general his procedure was incomplete and

potentially nonterminating.

It’s only recently that a full understanding has (apparently) been

reached. - SMT

Recently the trend has been to use a SAT solver as the core of

combined decision procedures (SMT = “satisfiability modulo

theories”).

Use SAT to generate propositionally satisfiable assignments

Use underlying theory solvers to test their satisfiability in the

theories.

Feed back conflict clauses to SAT solver

Mostly justified using same theory as Nelson-Oppen, and likewise a

modular structure where new theories can be added

Arrays

Machine words

... - 4: Interactive theorem proving and proof checking

John Harrison, Intel Corporation

Computer Science Club, St. Petersburg

Sun 29th September 2013 (13:00–14:35) - Interactive theorem proving (1)

In practice, most interesting problems can’t be automated

completely:

They don’t fall in a practical decidable subset

Pure first order proof search is not a feasible approach with,

e.g. set theory

In practice, we need an interactive arrangement, where the user

and machine work together.

The user can delegate simple subtasks to pure first order proof

search or one of the decidable subsets.

However, at the high level, the user must guide the prover. - Interactive theorem proving (2)

The idea of a more ‘interactive’ approach was already anticipated

by pioneers, e.g. Wang (1960):

[...] the writer believes that perhaps machines may more

quickly become of practical use in mathematical research,

not by proving new theorems, but by formalizing and

checking outlines of proofs, say, from textbooks to

detailed formalizations more rigorous than Principia

[Mathematica], from technical papers to textbooks, or

from abstracts to technical papers.

However, constructing an effective and programmable combination

is not so easy. - SAM

First successful family of interactive provers were the SAM systems:

Semi-automated mathematics is an approach to

theorem-proving which seeks to combine automatic logic

routines with ordinary proof procedures in such a manner

that the resulting procedure is both efficient and subject

to human intervention in the form of control and

guidance. Because it makes the mathematician an

essential factor in the quest to establish theorems, this

approach is a departure from the usual theorem-proving

attempts in which the computer unaided seeks to

establish proofs.

SAM V was used to settle an open problem in lattice theory. - Three influential proof checkers

AUTOMATH (de Bruijn, . . . ) — Implementation of type

theory, used to check non-trivial mathematics such as

Landau’s Grundlagen

Mizar (Trybulec, . . . ) — Block-structured natural deduction

with ‘declarative’ justifications, used to formalize large body

of mathematics

LCF (Milner et al) — Programmable proof checker for Scott’s

Logic of Computable Functions written in new functional

language ML.

Ideas from all these systems are used in present-day systems.

(Corbineau’s declarative proof mode for Coq . . . ) - Sound extensibility

Ideally, it should be possible to customize and program the

theorem-prover with domain-specific proof procedures.

However, it’s difficult to allow this without compromising the

soundness of the system.

A very successful way to combine extensibility and reliability was

pioneered in LCF.

Now used in Coq, HOL, Isabelle, Nuprl, ProofPower, . . . . - Key ideas behind LCF

Implement in a strongly-typed functional programming

language (usually a variant of ML)

Make thm (‘theorem’) an abstract data type with only simple

primitive inference rules

Make the implementation language available for arbitrary

extensions. - First-order axioms (1)

p ⇒ (q ⇒ p)

(p ⇒ q ⇒ r ) ⇒ (p ⇒ q) ⇒ (p ⇒ r )

((p ⇒ ⊥) ⇒ ⊥) ⇒ p

(∀x . p ⇒ q) ⇒ (∀x . p) ⇒ (∀x . q)

p ⇒ ∀x. p [Provided x ∉ FV(p)]

(∃x. x = t) [Provided x ∉ FVT(t)]

t = t

s1 = t1 ⇒ ... ⇒ sn = tn ⇒ f (s1, .., sn) = f (t1, .., tn)

s1 = t1 ⇒ ... ⇒ sn = tn ⇒ P(s1, .., sn) ⇒ P(t1, .., tn) - First-order axioms (2)

(p ⇔ q) ⇒ p ⇒ q

(p ⇔ q) ⇒ q ⇒ p

(p ⇒ q) ⇒ (q ⇒ p) ⇒ (p ⇔ q)

⊤ ⇔ (⊥ ⇒ ⊥)

¬p ⇔ (p ⇒ ⊥)

p ∧ q ⇔ ((p ⇒ q ⇒ ⊥) ⇒ ⊥)

p ∨ q ⇔ ¬(¬p ∧ ¬q)

(∃x . p) ⇔ ¬(∀x . ¬p) - First-order rules

Modus Ponens rule:

p ⇒ q

p

q

Generalization rule:

p

∀x. p - LCF kernel for first order logic (1)

Define type of first order formulas:

type term = Var of string | Fn of string * term list;;

type formula = False

| True

| Atom of string * term list

| Not of formula

| And of formula * formula

| Or of formula * formula

| Imp of formula * formula

| Iff of formula * formula

| Forall of string * formula

| Exists of string * formula;; - LCF kernel for first order logic (2)

Define some useful helper functions:

let mk_eq s t = Atom("=",[s;t]);;

let rec occurs_in s t =

s = t or

match t with

Var y -> false

| Fn(f,args) -> exists (occurs_in s) args;;

let rec free_in t fm =

match fm with

False | True -> false

| Atom(p,args) -> exists (occurs_in t) args

| Not(p) -> free_in t p

| And(p,q) | Or(p,q) | Imp(p,q) | Iff(p,q) -> free_in t p or free_in t q

| Forall(y,p) | Exists(y,p) -> not(occurs_in (Var y) t) & free_in t p;; - LCF kernel for first order logic (3)

module Proven : Proofsystem =

struct type thm = formula

let axiom_addimp p q = Imp(p,Imp(q,p))

let axiom_distribimp p q r = Imp(Imp(p,Imp(q,r)),Imp(Imp(p,q),Imp(p,r)))

let axiom_doubleneg p = Imp(Imp(Imp(p,False),False),p)

let axiom_allimp x p q = Imp(Forall(x,Imp(p,q)),Imp(Forall(x,p),Forall(x,q)))

let axiom_impall x p =

if not (free_in (Var x) p) then Imp(p,Forall(x,p)) else failwith "axiom_impall"

let axiom_existseq x t =

if not (occurs_in (Var x) t) then Exists(x,mk_eq (Var x) t) else failwith "axiom_existseq"

let axiom_eqrefl t = mk_eq t t

let axiom_funcong f lefts rights =

itlist2 (fun s t p -> Imp(mk_eq s t,p)) lefts rights (mk_eq (Fn(f,lefts)) (Fn(f,rights)))

let axiom_predcong p lefts rights =

itlist2 (fun s t p -> Imp(mk_eq s t,p)) lefts rights (Imp(Atom(p,lefts),Atom(p,rights)))

let axiom_iffimp1 p q = Imp(Iff(p,q),Imp(p,q))

let axiom_iffimp2 p q = Imp(Iff(p,q),Imp(q,p))

let axiom_impiff p q = Imp(Imp(p,q),Imp(Imp(q,p),Iff(p,q)))

let axiom_true = Iff(True,Imp(False,False))

let axiom_not p = Iff(Not p,Imp(p,False))

let axiom_or p q = Iff(Or(p,q),Not(And(Not(p),Not(q))))

let axiom_and p q = Iff(And(p,q),Imp(Imp(p,Imp(q,False)),False))

let axiom_exists x p = Iff(Exists(x,p),Not(Forall(x,Not p)))

let modusponens pq p =

match pq with Imp(p',q) when p = p' -> q | _ -> failwith "modusponens"

let gen x p = Forall(x,p)

let concl c = c

end;; - Derived rules


The primitive rules are very simple. But using the LCF technique

we can build up a set of derived rules. The following derives p ⇒ p:

let imp_refl p = modusponens (modusponens (axiom_distribimp p (Imp(p,p)) p)

(axiom_addimp p (Imp(p,p))))

(axiom_addimp p p);;
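The same derivation can be mimicked in a toy kernel to see the LCF discipline at work (a purely illustrative Python sketch; the real systems use an ML abstract type rather than this token trick):

```python
# Toy LCF-style kernel: Thm values can only be created via the axiom and rule
# functions below, so every Thm is provable by construction.

_KERNEL = object()  # private token playing the role of ML type abstraction

def Imp(p, q):
    return ('Imp', p, q)

class Thm:
    def __init__(self, fm, token=None):
        assert token is _KERNEL, "theorems only arise from inference rules"
        self.concl = fm

def axiom_addimp(p, q):         # |- p ==> (q ==> p)
    return Thm(Imp(p, Imp(q, p)), _KERNEL)

def axiom_distribimp(p, q, r):  # |- (p ==> q ==> r) ==> (p ==> q) ==> (p ==> r)
    return Thm(Imp(Imp(p, Imp(q, r)), Imp(Imp(p, q), Imp(p, r))), _KERNEL)

def modusponens(pq, p):         # from |- p ==> q and |- p, conclude |- q
    tag, p2, q = pq.concl
    assert tag == 'Imp' and p2 == p.concl, "modusponens"
    return Thm(q, _KERNEL)

def imp_refl(p):                # derived rule |- p ==> p, as in the OCaml above
    return modusponens(
        modusponens(axiom_distribimp(p, Imp(p, p), p),
                    axiom_addimp(p, Imp(p, p))),
        axiom_addimp(p, p))

print(imp_refl('A').concl)  # ('Imp', 'A', 'A')
```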

While this process is tedious at the beginning, we can quickly

reach the stage of automatic derived rules that

Prove propositional tautologies

Perform Knuth-Bendix completion

Prove first order formulas by standard proof search and

translation - Fully-expansive decision procedures

Real LCF-style theorem provers like HOL have many powerful

derived rules.

Mostly just mimic standard algorithms like rewriting but by

inference. For cases where this is difficult:

Separate certification (my previous lecture)

Reflection (Tobias’s lectures) - Proof styles

Directly invoking the primitive or derived rules tends to give proofs

that are procedural.

A declarative style (what is to be proved, not how) can be nicer:

Easier to write and understand independent of the prover

Easier to modify

Less tied to the details of the prover, hence more portable

Mizar pioneered the declarative style of proof.

Recently, several other declarative proof languages have been

developed, as well as declarative shells round existing systems like

HOL and Isabelle.

Finding the right style is an interesting research topic. - Procedural proof example

let NSQRT_2 = prove

(`!p q. p * p = 2 * q * q ==> q = 0`,

MATCH_MP_TAC num_WF THEN REWRITE_TAC[RIGHT_IMP_FORALL_THM] THEN

REPEAT STRIP_TAC THEN FIRST_ASSUM(MP_TAC o AP_TERM `EVEN`) THEN

REWRITE_TAC[EVEN_MULT; ARITH] THEN REWRITE_TAC[EVEN_EXISTS] THEN

DISCH_THEN(X_CHOOSE_THEN `m:num` SUBST_ALL_TAC) THEN

FIRST_X_ASSUM(MP_TAC o SPECL [`q:num`; `m:num`]) THEN

ASM_REWRITE_TAC[ARITH_RULE

`q < 2 * m ==> q * q = 2 * m * m ==> m = 0 <=>

(2 * m) * 2 * m = 2 * q * q ==> 2 * m <= q`] THEN

ASM_MESON_TAC[LE_MULT2; MULT_EQ_0; ARITH_RULE `2 * x <= x <=> x = 0`]);; - Declarative proof example

let NSQRT_2 = prove

(`!p q. p * p = 2 * q * q ==> q = 0`,

suffices_to_prove

`!p. (!m. m < p ==> (!q. m * m = 2 * q * q ==> q = 0))

==> (!q. p * p = 2 * q * q ==> q = 0)`

(wellfounded_induction) THEN

fix [`p:num`] THEN

assume("A") `!m. m < p ==> !q. m * m = 2 * q * q ==> q = 0` THEN

fix [`q:num`] THEN

assume("B") `p * p = 2 * q * q` THEN

so have `EVEN(p * p) <=> EVEN(2 * q * q)` (trivial) THEN

so have `EVEN(p)` (using [ARITH; EVEN_MULT] trivial) THEN

so consider (`m:num`,"C",`p = 2 * m`) (using [EVEN_EXISTS] trivial) THEN

cases ("D",`q < p \/ p <= q`) (arithmetic) THENL

[so have `q * q = 2 * m * m ==> m = 0` (by ["A"] trivial) THEN

so we're finished (by ["B"; "C"] algebra);

so have `p * p <= q * q` (using [LE_MULT2] trivial) THEN

so have `q * q = 0` (by ["B"] arithmetic) THEN

so we're finished (algebra)]);; - Is automation even more declarative?

let LEMMA_1 = SOS_RULE

`p EXP 2 = 2 * q EXP 2

==> (q = 0 \/ 2 * q - p < p /\ ~(p - q = 0)) /\

(2 * q - p) EXP 2 = 2 * (p - q) EXP 2`;;

let NSQRT_2 = prove

(`!p q. p * p = 2 * q * q ==> q = 0`,

REWRITE_TAC[GSYM EXP_2] THEN MATCH_MP_TAC num_WF THEN MESON_TAC[LEMMA_1]);; - The Seventeen Provers of the World (1)

ACL2 — Highly automated prover for first-order number

theory without explicit quantifiers, able to do induction proofs

itself.

Alfa/Agda — Prover for constructive type theory integrated

with dependently typed programming language.

B prover — Prover for first-order set theory designed to

support verification and refinement of programs.

Coq — LCF-like prover for constructive Calculus of

Constructions with reflective programming language.

HOL (HOL Light, HOL4, ProofPower) — Seminal LCF-style

prover for classical simply typed higher-order logic.

IMPS — Interactive prover for an expressive logic supporting

partially defined functions. - The Seventeen Provers of the World (2)

Isabelle/Isar — Generic prover in LCF style with a newer

declarative proof style influenced by Mizar.

Lego — Well-established framework for proof in constructive

type theory, with a similar logic to Coq.

Metamath — Fast proof checker for an exceptionally simple

axiomatization of standard ZF set theory.

Minlog — Prover for minimal logic supporting practical

extraction of programs from proofs.

Mizar — Pioneering system for formalizing mathematics,

originating the declarative style of proof.

Nuprl/MetaPRL — LCF-style prover with powerful graphical

interface for Martin-Löf type theory with new constructs. - The Seventeen Provers of the World (3)

Omega — Unified combination in modular style of several

theorem-proving techniques including proof planning.

Otter/IVY — Powerful automated theorem prover for pure

first-order logic plus a proof checker.

PVS — Prover designed for applications with an expressive

classical type theory and powerful automation.

PhoX — prover for higher-order logic designed to be relatively

simple to use in comparison with Coq, HOL etc.

Theorema — Ambitious integrated framework for theorem

proving and computer algebra built inside Mathematica.

For more, see Freek Wiedijk, The Seventeen Provers of the World,

Springer Lecture Notes in Computer Science vol. 3600, 2006. - Certification of decision procedures

We might want a decision procedure to produce a ‘proof’ or

‘certificate’

Doubts over the correctness of the core decision method

Desire to use the proof in other contexts

This arises in at least two real cases:

Fully expansive (e.g. ‘LCF-style’) theorem proving.

Proof-carrying code - Certifiable and non-certifiable

The most desirable situation is that a decision procedure should

produce a short certificate that can be checked easily.

Factorization and primality is a good example:

Certificate that a number is not prime: the factors! (Others

are also possible.)

Certificate that a number is prime: Pratt, Pocklington,

Pomerance, . . .

This means that primality checking is in NP ∩ co-NP (we now

know it’s in P). - Certifying universal formulas over C
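The compositeness half is easy to illustrate: the certificate is a nontrivial factor, and checking it is a single division (illustrative sketch):

```python
# A factor is a short, easily checked certificate of compositeness.
def check_composite(n, factor):
    return 1 < factor < n and n % factor == 0

print(check_composite(91, 7))  # True: 91 = 7 * 13
print(check_composite(97, 7))  # False: 7 does not divide 97
```

Primality certificates in the style of Pratt or Pocklington are checkable in the same spirit, though the checker is a little more involved.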

Use the (weak) Hilbert Nullstellensatz:

The polynomial equations p1(x1, . . . , xn) = 0, . . . ,

pk (x1, . . . , xn) = 0 in an algebraically closed field have no common

solution iff there are polynomials q1(x1, . . . , xn), . . . , qk (x1, . . . , xn)

such that the following polynomial identity holds:

q1(x1, . . . , xn)·p1(x1, . . . , xn)+· · ·+qk (x1, . . . , xn)·pk (x1, . . . , xn) = 1

All we need to certify the result is the cofactors qi (x1, . . . , xn),

which we can find by an instrumented Gröbner basis algorithm.

The checking process involves just algebraic normalization (maybe

still not totally trivial. . . ) - Certifying universal formulas over R

There is a similar but more complicated Nullstellensatz (and

Positivstellensatz) over R.

The general form is similar, but it’s more complicated because of

all the different orderings.

It inherently involves sums of squares (SOS), and the certificates

can be found efficiently using semidefinite programming (Parillo

. . . )

Example: easy to check

∀a b c x. a·x^2 + b·x + c = 0 ⇒ b^2 − 4ac ≥ 0

via the following SOS certificate:

b^2 − 4ac = (2ax + b)^2 − 4a(ax^2 + bx + c) - Less favourable cases
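Because the certificate is a polynomial identity, it can be spot-checked by evaluation at sample points (an illustrative sketch; a genuine checker would normalize symbolically):

```python
# The SOS certificate b^2 - 4ac = (2ax + b)^2 - 4a(ax^2 + bx + c) is a
# polynomial identity. Given a*x^2 + b*x + c = 0, it exhibits b^2 - 4ac
# as the square (2ax + b)^2 >= 0.
from itertools import product

def identity_holds(a, b, c, x):
    p = a * x * x + b * x + c
    return b * b - 4 * a * c == (2 * a * x + b) ** 2 - 4 * a * p

print(all(identity_holds(a, b, c, x)
          for a, b, c, x in product(range(-3, 4), repeat=4)))  # True
```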

Unfortunately not all decision procedures seem to admit a nice

separation of proof from checking.

Then if a proof is required, there seems no significantly easier way

than generating proofs along each step of the algorithm.

Example: Cohen-Hörmander algorithm implemented in HOL Light

by McLaughlin (CADE 2005).

Works well, useful for small problems, but about 1000× slowdown

relative to non-proof-producing implementation.

Should we use reflection, i.e. verify the code itself? - 5: Applications to mathematics and computer verification

John Harrison, Intel Corporation

Computer Science Club, St. Petersburg

Sun 29th September 2013 (15:35–17:10) - 100 years since Principia Mathematica

Principia Mathematica was the first sustained and successful actual

formalization of mathematics.

This practical formal mathematics was to forestall objections

to Russell and Whitehead’s ‘logicist’ thesis, not a goal in itself.

The development was difficult and painstaking, and has

probably been studied in detail by very few.

Subsequently, the idea of actually formalizing proofs has not

been taken very seriously, and few mathematicians do it today.

But thanks to the rise of the computer, the actual formalization of

mathematics is attracting more interest.

The importance of computers for formal proof

Computers can both help with formal proof and give us new

reasons to be interested in it:

Computers are expressly designed for performing formal

manipulations quickly and without error, so can be used to

check and partly generate formal proofs.

Correctness questions in computer science (hardware,

programs, protocols etc.) generate a whole new array of

difficult mathematical and logical problems where formal proof

can help.

Because of these dual connections, interest in formal proofs is

strongest among computer scientists, but some ‘mainstream’

mathematicians are becoming interested too.

Russell was an early fan of mechanized formal proof

Newell, Shaw and Simon in the 1950s developed a ‘Logic Theory

Machine’ program that could prove some of the theorems from

Principia Mathematica automatically.

“I am delighted to know that Principia Mathematica can

now be done by machinery [...] I am quite willing to

believe that everything in deductive logic can be done by

machinery. [...] I wish Whitehead and I had known of

this possibility before we wasted 10 years doing it by

hand.” [letter from Russell to Simon]

Newell and Simon’s paper on a more elegant proof of one result in

PM was rejected by JSL because it was co-authored by a machine.

Formalization in current mathematics

Traditionally, we understand formalization to have two

components, corresponding to Leibniz’s characteristica universalis

and calculus ratiocinator.

Express statements of theorems in a formal language, typically

in terms of primitive notions such as sets.

Write proofs using a fixed set of formal inference rules, whose

correct form can be checked algorithmically.

Correctness of a formal proof is an objective question,

algorithmically checkable in principle.

Mathematics is reduced to sets

The explication of mathematical concepts in terms of sets is now

quite widely accepted (see Bourbaki).

A real number is a set of rational numbers . . .

A Turing machine is a quintuple (Σ, A, . . .)

Statements in such terms are generally considered clearer and more

objective. (Consider pathological functions from real analysis . . . )

Symbolism is important

The use of symbolism in mathematics has been steadily increasing

over the centuries:

“[Symbols] have invariably been introduced to make

things easy. [. . . ] by the aid of symbolism, we can make

transitions in reasoning almost mechanically by the eye,

which otherwise would call into play the higher faculties

of the brain. [. . . ] Civilisation advances by extending the

number of important operations which can be performed

without thinking about them.” (Whitehead, An

Introduction to Mathematics)

Formalization is the key to rigour

Formalization now has an important conceptual role in principle:

“. . . the correctness of a mathematical text is verified by

comparing it, more or less explicitly, with the rules of a

formalized language.” (Bourbaki, Theory of Sets)

“A Mathematical proof is rigorous when it is (or could

be) written out in the first-order predicate language L(∈)

as a sequence of inferences from the axioms ZFC, each

inference made according to one of the stated rules.”

(Mac Lane, Mathematics: Form and Function)

What about in practice?

Mathematicians don’t use logical symbols

Variables were used in logic long before they appeared in

mathematics, but logical symbolism is rare in current mathematics.

Logical relationships are usually expressed in natural language, with

all its subtlety and ambiguity.

Logical symbols like ‘⇒’ and ‘∀’ are used ad hoc, mainly for their

abbreviatory effect.

“as far as the mathematical community is concerned

George Boole has lived in vain” (Dijkstra)

Mathematicians don’t do formal proofs . . .

The idea of actual formalization of mathematical proofs has not

been taken very seriously:

“this mechanical method of deducing some mathematical

theorems has no practical value because it is too

complicated in practice.” (Rasiowa and Sikorski, The

Mathematics of Metamathematics)

“[. . . ] the tiniest proof at the beginning of the Theory of

Sets would already require several hundreds of signs for

its complete formalization. [. . . ] formalized mathematics

cannot in practice be written down in full [. . . ] We shall

therefore very quickly abandon formalized mathematics”

(Bourbaki, Theory of Sets)

. . . and the few people that do end up regretting it

“my intellect never quite recovered from the strain of

writing [Principia Mathematica]. I have been ever since

definitely less capable of dealing with difficult

abstractions than I was before.” (Russell, Autobiography)

However, now we have computers to check and even automatically

generate formal proofs.

Our goal is now not so much philosophical, but to achieve a real,

practical, useful increase in the precision and accuracy of

mathematical proofs.

Are proofs in doubt?

Mathematical proofs are subjected to peer review, but errors often

escape unnoticed.

“Professor Offord and I recently committed ourselves to

an odd mistake (Annals of Mathematics (2) 49, 923,

1.5). In formulating a proof a plus sign got omitted,

becoming in effect a multiplication sign. The resulting

false formula got accepted as a basis for the ensuing

fallacious argument. (In defence, the final result was

known to be true.)” (Littlewood, Miscellany)

A book by Lecat gave 130 pages of errors made by major

mathematicians up to 1900.

A similar book today would no doubt fill many volumes.

Even elegant textbook proofs can be wrong

“The second edition gives us the opportunity to present

this new version of our book: It contains three additional

chapters, substantial revisions and new proofs in several

others, as well as minor amendments and improvements,

many of them based on the suggestions we received. It

also misses one of the old chapters, about the “problem

of the thirteen spheres,” whose proof turned out to need

details that we couldn’t complete in a way that would

make it brief and elegant.” (Aigner and Ziegler, Proofs

from the Book)

Most doubtful informal proofs

What are the proofs where we do in practice worry about

correctness?

Those that are just very long and involved. Classification of

finite simple groups, Seymour-Robertson graph minor theorem

Those that involve extensive computer checking that cannot

in practice be verified by hand. Four-colour theorem, Hales’s

proof of the Kepler conjecture

Those that are about very technical areas where complete

rigour is painful. Some branches of proof theory, formal

verification of hardware or software

4-colour Theorem

Early history indicates fallibility of the traditional social process:

Proof claimed by Kempe in 1879

Flaw only pointed out in print by Heawood in 1890

Later proof by Appel and Haken was apparently correct, but gave

rise to a new worry:

How to assess the correctness of a proof where many explicit

configurations are checked by a computer program?

Most worries finally dispelled by Gonthier’s formal proof in Coq.

Recent formal proofs in pure mathematics

Four notable recent formal proofs in pure mathematics:

Prime Number Theorem — Jeremy Avigad et al

(Isabelle/HOL), John Harrison (HOL Light)

Jordan Curve Theorem — Tom Hales (HOL Light), Andrzej

Trybulec et al. (Mizar)

Four-colour theorem — Georges Gonthier (Coq)

Odd order theorem — Georges Gonthier and others (Coq)

These indicate that highly non-trivial results are within reach.

However, these all required months/years of work.

The Kepler conjecture

The Kepler conjecture states that no arrangement of identical balls

in ordinary 3-dimensional space has a higher packing density than

the obvious ‘cannonball’ arrangement.

Hales, working with Ferguson, arrived at a proof in 1998:

300 pages of mathematics: geometry, measure, graph theory

and related combinatorics, . . .

40,000 lines of supporting computer code: graph enumeration,

nonlinear optimization and linear programming.

Hales submitted his proof to Annals of Mathematics . . .

The response of the reviewers

After a full four years of deliberation, the reviewers returned:

“The news from the referees is bad, from my perspective.

They have not been able to certify the correctness of the

proof, and will not be able to certify it in the future,

because they have run out of energy to devote to the

problem. This is not what I had hoped for.

Fejes Toth thinks that this situation will occur more and

more often in mathematics. He says it is similar to the

situation in experimental science — other scientists

acting as referees can’t certify the correctness of an

experiment, they can only subject the paper to

consistency checks. He thinks that the mathematical

community will have to get used to this state of affairs.”

The birth of Flyspeck

Hales’s proof was eventually published, and no significant error has

been found in it. Nevertheless, the verdict is disappointingly

lacking in clarity and finality.

As a result of this experience, the journal changed its editorial

policy on computer proof so that it will no longer even try to check

the correctness of computer code.

Dissatisfied with this state of affairs, Hales initiated a project

called Flyspeck to completely formalize the proof.

Flyspeck

Flyspeck = ‘Formal Proof of the Kepler Conjecture’.

“In truth, my motivations for the project are far more

complex than a simple hope of removing residual doubt

from the minds of few referees. Indeed, I see formal

methods as fundamental to the long-term growth of

mathematics.” (Hales, The Kepler Conjecture)

The formalization effort has been running for a few years now with

a significant group of people involved, some doing their PhD on

Flyspeck-related formalization.

In parallel, Hales has simplified the informal proof using ideas from

Marchal, significantly cutting down on the formalization work.

Flyspeck: current status

Almost all the ordinary mathematics has been formalized in

HOL Light: Euclidean geometry, measure theory, hypermaps,

fans, results on packings.

Many of the linear programs have been verified in

Isabelle/HOL by Steven Obua. Alexey Solovyev has recently

developed a faster HOL Light formalization.

The graph enumeration process has been verified (and

improved in the process) by Tobias Nipkow in Isabelle/HOL

Some initial work by Roland Zumkeller on nonlinear part using

Bernstein polynomials. Solovyev has been working on

formalizing this in HOL Light.

Formal verification

In most software and hardware development, we lack even informal

proofs of correctness.

Correctness of hardware, software, protocols etc. is routinely

“established” by testing.

However, exhaustive testing is impossible and subtle bugs often

escape detection until it’s too late.

The consequences of bugs in the wild can be serious, even deadly.

Formal verification (proving correctness) seems the most

satisfactory solution, but gives rise to large, ugly proofs.

Recent formal proofs in computer system verification

Some successes for verification using theorem proving technology:

Microcode algorithms for floating-point division, square root

and several transcendental functions on Intel Itanium

processor family (John Harrison, HOL Light)

CompCert verified compiler from significant subset of the C

programming language into PowerPC assembler (Xavier Leroy

et al., Coq)

Designed-for-verification version of L4 operating system

microkernel (Gerwin Klein et al., Isabelle/HOL).

Again, these indicate that complex and subtle computer systems

can be verified, but significant manual effort was needed, perhaps

tens of person-years for L4.

A diversity of activities

Intel is best known as a hardware company, and hardware is still the

core of the company’s business. However, this entails much more:

Microcode

Firmware

Protocols

Software

If the Intel Software and Services Group (SSG) were split off as a

separate company, it would be in the top 10 software companies

worldwide.

A diversity of verification problems

This gives rise to a corresponding diversity of verification problems,

and of verification solutions.

Propositional tautology/equivalence checking (FEV)

Symbolic simulation

Symbolic trajectory evaluation (STE)

Temporal logic model checking

Combined decision procedures (SMT)

First order automated theorem proving

Interactive theorem proving

Most of these techniques (trading automation for generality /

efficiency) are in active use at Intel.

A spectrum of formal techniques

Traditionally, formal verification has been focused on complete

proofs of functional correctness.

But recently there have been notable successes elsewhere for

‘semi-formal’ methods involving abstraction or more limited

property checking.

Airbus A380 avionics

Microsoft SLAM/SDV

One can also consider applying theorem proving technology to

support testing or other traditional validation methods like path

coverage.

These are all areas of interest at Intel.

Models and their validation

We have the usual concerns about validating our specs, but also

need to pay attention to the correspondence between our models

and physical reality.

Actual requirements
↑
Formal specification
↑
Design model
↑

Actual system

Physical problems

Chips can suffer from physical problems, usually due to overheating

or particle bombardment (‘soft errors’).

In 1978, Intel encountered problems with ‘soft errors’ in some

of its DRAM chips.

The cause turned out to be alpha particle emission from the

packaging.

The factory producing the ceramic packaging was on the

Green River in Colorado, downstream from the tailings of an

old uranium mine.

However, these are rare and apparently well controlled by existing

engineering best practice.

The FDIV bug

Formal methods are more useful for avoiding design errors such as

the infamous FDIV bug:

Error in the floating-point division (FDIV) instruction on some

early Intel Pentium processors

Very rarely encountered, but was hit by a mathematician

doing research in number theory.

Intel eventually set aside US $475 million to cover the costs.

This did at least considerably improve investment in formal

verification.

Layers of verification

If we want to verify from the level of software down to the

transistors, then it’s useful to identify and specify intermediate

layers.

Implement high-level floating-point algorithm assuming

addition works correctly.

Implement a cache coherence protocol assuming that the

abstract protocol ensures coherence.

Many similar ideas all over computing: protocol stack, virtual

machines etc.

If this clean separation starts to break down, we may face much

worse verification problems. . .

How some of our verifications fit together

For example, the fma behavior is the assumption for my

verification, and the conclusion for someone else’s.

sin correct
↑
fma correct
↑
gate-level description

But this is not quite trivial when the verifications use different

formalisms!

Our work

We have formally verified correctness of various floating-point

algorithms.

Division and square root (Markstein-style, using fused

multiply-add to do Newton-Raphson or power series

approximation with delicate final rounding).

Transcendental functions like log and sin (table-driven

algorithms using range reduction and core polynomial
approximations).

Proofs use the HOL Light prover

http://www.cl.cam.ac.uk/users/jrh/hol-light

Our HOL Light proofs

The mathematics we formalize is mostly:

Elementary number theory and real analysis

Floating-point numbers, results about rounding etc.

Needs several special-purpose proof procedures, e.g.

Verifying solution set of some quadratic congruences

Proving primality of particular numbers

Proving bounds on rational approximations

Verifying errors in polynomial approximations

Example: tangent algorithm

The input number X is first reduced to r with approximately
|r| ≤ π/4 such that X = r + Nπ/2 for some integer N. We
now need to calculate ±tan(r) or ±cot(r) depending on N
modulo 4.

If the reduced argument r is still not small enough, it is
separated into its leading few bits B and the trailing part
x = r − B, and the overall result computed from tan(x) and
pre-stored functions of B, e.g.

tan(B + x) = tan(B) + ((1/(sin(B)cos(B))) · tan(x)) / (cot(B) − tan(x))
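The reconstruction identity above is exact in real arithmetic, which can be spot-checked numerically (an illustrative sketch of ours, not the verified implementation; the function name is invented):

```python
import math

def tan_reconstructed(B: float, x: float) -> float:
    """Recombine tan(B + x) from tan(B), cot(B) and tan(x) via
    tan(B + x) = tan(B) + ((1/(sin(B)cos(B))) * tan(x)) / (cot(B) - tan(x))."""
    cot_B = 1.0 / math.tan(B)
    numerator = (1.0 / (math.sin(B) * math.cos(B))) * math.tan(x)
    return math.tan(B) + numerator / (cot_B - math.tan(x))

# Both sides agree to roundoff on sample points (B playing the role of the
# 'leading few bits', x a small remainder).
for B, x in [(0.7, 0.01), (0.3, -0.02), (1.2, 0.005)]:
    assert abs(tan_reconstructed(B, x) - math.tan(B + x)) < 1e-12
```

A spot check like this is no substitute for the formal proof, of course; it only illustrates the algebraic fact being relied on.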

Now a power series approximation is used for tan(r ), cot(r ) or

tan(x) as appropriate.

Overview of the verification

To verify this algorithm, we need to prove:

The range reduction to obtain r is done accurately.

The mathematical facts used to reconstruct the result from

components are applicable.

Stored constants such as tan(B) are sufficiently accurate.

The power series approximation does not introduce too much

error in approximation.

The rounding errors involved in computing with floating point

arithmetic are within bounds.

Most of these parts are non-trivial. Moreover, some of them

require more pure mathematics than might be expected. - Why mathematics?

Controlling the error in range reduction becomes difficult when the reduced argument X − Nπ/2 is small.

To check that the computation is accurate enough, we need to know: how close can a floating point number be to an integer multiple of π/2?
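To see why this question matters, consider the double nearest to N·π/2 for some large N (N = 10⁶ is an arbitrary choice here): its true reduced argument is at most half an ulp, so computing it accurately requires π/2 to far more than double precision. An illustrative Python sketch, not part of the actual verification:

```python
# How close can a floating point number be to a multiple of pi/2?
# Take x = the double nearest to N * pi/2: the exact reduced argument
# x - N*(pi/2) is then at most half an ulp of x (~1.2e-10 here), so a
# naive reduction with a 16-digit pi/2 loses almost all its accuracy.
from decimal import Decimal, getcontext

getcontext().prec = 50  # work to 50 significant digits
HALF_PI = Decimal("3.14159265358979323846264338327950288419716939937511") / 2

N = 1_000_000
x = float(HALF_PI * N)               # double nearest to N * pi/2

r_exact = Decimal(x) - N * HALF_PI   # reduced argument, to ~50 digits
# |r_exact| <= ulp(x)/2 = 2^-33, yet x itself is ~1.57e6: obtaining r
# correctly needs pi/2 to well over 16 + 7 significant digits.
```

This is why serious range-reduction schemes either store π/2 to extended precision or bound, via continued fractions, how close any representable float can come to a multiple of π/2.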

Even deriving the power series (for 0 < |x| < π)

cot(x) = 1/x − x/3 − x³/45 − 2x⁵/945 − · · ·

is much harder than you might expect.

Why HOL Light?

We need a general theorem proving system with:

- High standard of logical rigor and reliability
- Ability to mix interactive and automated proof
- Programmability for domain-specific proof tasks
- A substantial library of pre-proved mathematics

Other theorem provers such as ACL2, Coq and PVS have also been used for verification in this area.
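Returning to the cotangent expansion quoted earlier: while its formal derivation is hard, it is easy to probe numerically. A small Python sanity check (the coefficients, including the first omitted term x⁷/4725, are taken from the standard expansion, not from the slides):

```python
# Sanity check of the cotangent expansion quoted earlier:
#   cot(x) = 1/x - x/3 - x^3/45 - 2x^5/945 - ...
# The first omitted term is x^7/4725, so the truncation error of the
# four-term approximation should behave like x^7/4725 for small x.
import math

def cot_series(x):
    return 1 / x - x / 3 - x**3 / 45 - 2 * x**5 / 945

for x in [0.1, 0.2, 0.3, 0.4, 0.5]:
    err = cot_series(x) - math.cos(x) / math.sin(x)
    # error positive and dominated by the omitted x^7/4725 term
    assert 0 < err < 1.1 * x**7 / 4725
```

Such numerical checks build confidence but prove nothing; turning the expansion into a theorem with explicit error bounds is exactly the kind of pure mathematics the verification required.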