Flyspecking Flyspeck
Mark Adams
Radboud University / Proof Technologies Ltd
Tom’s 60th Birthday
Happy Birthday Tom!
Overview
● Part 1: Brief History of Tom’s Proof
● Part 2: The Flyspeck Proof
● Part 3: Possible Pitfalls in Flyspeck
● Part 4: Proof Auditing Flyspeck
PART 1
Brief History of Tom’s Proof
The Conjecture
● Kepler Conjecture (1611):
– The best sphere packing is the Face-Centered Cubic (FCC) packing
– Density = π/√18 ≈ 74.048%
● In other words:
– Any cuboid box of same-sized balls has enough space to hold at least 25.951% liquid
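As a check on the density figure, the FCC value follows from counting spheres in one cubic unit cell. This is the standard calculation, sketched with r for the sphere radius and a for the cell edge (both symbols introduced here for illustration). The cell holds 4 spheres, and spheres touch along a face diagonal, so 4r = a√2.

```latex
\[
a = 2\sqrt{2}\,r, \qquad
\text{density}
= \frac{4 \cdot \tfrac{4}{3}\pi r^{3}}{a^{3}}
= \frac{\tfrac{16}{3}\pi r^{3}}{16\sqrt{2}\,r^{3}}
= \frac{\pi}{3\sqrt{2}}
= \frac{\pi}{\sqrt{18}} \approx 0.74048
\]
```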
Outline Proof
● Part A (Main Text):
– Show that the problem can be reduced to considering spheres within a locality of k (>2) from a central sphere.
– For a given packing, consider the graph formed by projecting the spheres’ centres onto the surface of the central sphere, where edges connect centres within a locality of k of each other. Show that, for sufficiently small k, the graph is planar.
– Show that, to do better than an FCC packing, the planar graph must be “tame”.
Outline Proof
● Part B (Graph Enumeration):
– Show that there are only ≈3,000 possible tame planar graphs (modulo isomorphism)
● Part C (Non-linear Inequalities):
– For each tame graph, find a set of non-linear inequalities that must hold, and reduce each of these down to a set of linear inequalities.
● Part D (Linear Inequalities):
– Show that each linear inequality set is unsatisfiable.
● Thus, by contradiction, there is no packing better than FCC
Proof Size
● The main text (Part A):
– 300 pages of “traditional” mathematical text
● Graph enumeration (Part B):
– 2,000 lines of Java code
● Non-linear inequalities (Part C):
– Thousands of lines of C / C++
● Linear inequalities (Part D):
– 100,000 inequalities in 200 variables
– Solved using CPlex and Mathematica
Publication Review
● Submitted to the Annals of Mathematics
● 5 years of reviewing (1998-2003)
● Referees found no significant error
● But only a qualified acceptance: “99%” certain of overall correctness
Formalisation
● Flyspeck project (2003 – 2014)
● Proof completely re-expressed in terms of formal logic
– Using highly trustworthy theorem prover software
● Largest ever formalisation by most measures
– 20-30 person-years of effort
– 450,000 lines of proof script + 50,000 lines of theorem prover extension
– An estimated 2,000,000,000,000 primitive inferences
– Around 20 contributors
Outsourcing
● Main text (Part A) outsourced to Vietnam
● 2009 workshop
– Background mathematics and overview of proof
– How to use HOL Light
– English lessons
● Proof chopped up into around 700 lemmas
● Bounty awarded upon completion of a lemma
PART 2
The Flyspeck Proof
Theorem Provers (Proof Assistants)
● Software for performing mechanised formal proof, rigidly adhering to a given formal logic
● User guides the proof using a proof script
– Steps roughly correspond to “human” steps
● Breaks down proof script steps into tiny primitive inferences of the formal logic
– So each step is justified in terms of the formal logic
● Manages joining the proof script steps together to prove a given lemma
– Ensures that the intended result is proved by the proof script
Theorem Prover Components
● Inference kernel (LCF-style systems only)
– Small collection of inference rules and definition commands that can create theorems
– ‘thm’ is made a private datatype, and the ML type system ensures that all theorems must ultimately be created via the kernel
● Parser
– For turning concrete syntax into abstract syntax
● Pretty printer
– For turning abstract syntax into concrete syntax
● Derived inference rules
● Environment for managing a proof script
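The kernel idea can be sketched in a few lines of OCaml. This is a toy kernel for an implication-only logic, not HOL Light's actual kernel; the names `term`, `Kernel`, `assume`, `disch`, `mp` and `imp_refl` are chosen purely for illustration. The point it tries to show is that `thm` is abstract in the module signature, so code outside the module can only obtain theorems by calling the exported rules.

```ocaml
(* Toy LCF-style kernel for implication-only propositional logic.
   All names are illustrative; this is not HOL Light's kernel. *)
type term = Var of string | Imp of term * term

module Kernel : sig
  type thm                          (* abstract: no constructor is exported *)
  val hyps : thm -> term list
  val concl : thm -> term
  val assume : term -> thm          (* {p} |- p *)
  val disch : term -> thm -> thm    (* A |- q  gives  A \ {p} |- p ==> q *)
  val mp : thm -> thm -> thm        (* A |- p ==> q, B |- p  gives  A u B |- q *)
end = struct
  type thm = Thm of term list * term
  let hyps (Thm (hs, _)) = hs
  let concl (Thm (_, c)) = c
  let union xs ys = List.sort_uniq compare (xs @ ys)
  let assume p = Thm ([p], p)
  let disch p (Thm (hs, q)) =
    Thm (List.filter (fun h -> h <> p) hs, Imp (p, q))
  let mp (Thm (hs1, c1)) (Thm (hs2, c2)) =
    match c1 with
    | Imp (p, q) when p = c2 -> Thm (union hs1 hs2, q)
    | _ -> failwith "mp: theorems do not match"
end

(* A derived rule lives outside the kernel and can only call kernel rules. *)
let imp_refl p = Kernel.disch p (Kernel.assume p)    (* |- p ==> p *)

let () =
  let th = imp_refl (Var "p") in
  assert (Kernel.hyps th = []);
  assert (Kernel.concl th = Imp (Var "p", Var "p"))
  (* Writing  Kernel.Thm ([], Var "p")  here is rejected by the type checker. *)
```

In this sketch the trusted code is only the body of `Kernel`; a bug in `imp_refl` could make it fail, but could not make it return a theorem the kernel rules do not justify.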
Example Proof Script Extract
let MUL_POW2 = REAL_ARITH` (a*b) pow 2 = a pow 2 * b pow 2 `;;
let x = Some(0, `x pow 4 = x pow 2`) ;;
let COMPUTE_SIN_DIVH_POW2 = prove(`! (v0: real^N) va vb vc.
let betaa = dihV v0 vc va vb in
let a = arcV v0 vc vb in
let b = arcV v0 vc va in
let c = arcV v0 va vb in
let p =
&1 - cos a pow 2 - cos b pow 2 - cos c pow 2 +
&2 * cos a * cos b * cos c in
~collinear {v0, vc, va} /\ ~collinear {v0, vc, vb} ==>
( sin betaa ) pow 2 = p / ((sin a * sin b) pow 4) `,
REPEAT STRIP_TAC THEN MP_TAC (SPEC_ALL RLXWSTK ) THEN
REPEAT LET_TAC THEN SIMP_TAC[SIN_POW2_EQ_1_SUB_COS_POW2 ] THEN
REPEAT STRIP_TAC THEN REPLICATE_TAC 2 (FIRST_X_ASSUM MP_TAC) THEN
NHANH (NOT_COLLINEAR_IMP_NOT_SIN0) THEN
EXPAND_TAC "a" THEN EXPAND_TAC "b" THEN PHA THEN
SIMP_TAC[REAL_FIELD` ~( a = &0 ) /\ ~ ( b = &0 ) ==>
&1 - ( x / ( a * b )) pow 2 = (( a * b ) pow 2 - x pow 2 ) / (( a * b ) pow 2 )`;
eval "x"] THEN
ASM_SIMP_TAC[] THEN STRIP_TAC THEN
MATCH_MP_TAC (MESON[]` a = b ==> a / x = b / x `) THEN
EXPAND_TAC "p" THEN SIMP_TAC[MUL_POW2; SIN_POW2_EQ_1_SUB_COS_POW2] THEN
REAL_ARITH_TAC);;
HOL Light
● Theorem prover for the HOL logic
● Simple logical core
– 10 primitive inference rules
– 3 axioms
– 2 commands for conservative definition
● Parser and pretty printer for concrete syntax
● Theory library of 2,000 theorems
● Overall around 800 lines of trusted code
● Logical core has been proved correct
Formalisation Stages
1. Prepare the proof
– Re-express in a “formalisable” form
– Symbolic; No big steps; Coherent foundation
2. Prepare the theorem prover
– Ensure the library supports the proof’s foundation
3. Only then can proving start
– Translate the formalisable proof into proof script
Flyspeck Preparation
● Tom made significant changes to the original proof to make it more formalisable
– Changed partition of geometric space (Voronoi → Marshall)
– Used hypermaps instead of planar graphs
– Detail added in various places
– Parts B/C/D were adjusted (20,000 tame hypermaps)
● John Harrison added HOL Light Multivariate library
– Vectors, Determinants, Topology, Integration, Measure, …
– 190,000 lines of proof script
Formalised Proof
● Main text (Part A) proved in 450,000 lines of HOL Light proof script
● Tame planar graphs (Part B) generated by a program verified in Isabelle/HOL
● Non-linear inequalities (Part C) automatically proved by 25,000 lines of HOL Light extension
– 5,000 hours of processing
● Linear inequalities (Part D) automatically proved by a few thousand lines of HOL Light extension
PART 3
Possible Pitfalls in Flyspeck
Is the Formal Statement Correct?
● Informal statement:
A / B ≤ π / √18
where A is the volume occupied by a packing of same-sized spheres within a containing sphere of volume B, as the radius of B tends to infinity
● Formal statement in HOL Light:
!V. packing V
    ==> (?c. !r. &1 <= r
        ==> &(CARD(V INTER ball(vec 0,r))) <=
            pi * r pow 3 / sqrt(&18) + c * r pow 2)
● Are these precisely equivalent?
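One way to relate the two statements is sketched below. It assumes that `packing V` means the points of V are the centres of non-overlapping unit-radius spheres, which is not stated on the slide, and it writes N(r) for `CARD(V INTER ball(vec 0,r))` (a symbol introduced here). Each unit sphere has volume 4π/3 and the containing ball has volume 4πr³/3, so dividing the formal bound by r³ gives a density bound.

```latex
\[
\frac{N(r)\cdot\tfrac{4}{3}\pi}{\tfrac{4}{3}\pi r^{3}}
= \frac{N(r)}{r^{3}}
\le \frac{\pi}{\sqrt{18}} + \frac{c}{r},
\qquad\text{hence}\qquad
\limsup_{r\to\infty} \frac{N(r)}{r^{3}} \le \frac{\pi}{\sqrt{18}}
\]
```

This counts every sphere whose centre lies in the ball as a whole sphere, whereas the informal statement measures occupied volume. The mismatch is confined to a shell near the boundary, which is plausibly what the c·r² term is there to absorb. The sketch indicates why the formal bound is a reasonable rendering of the informal one. It does not by itself settle whether the two are precisely equivalent.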
Multiple Sessions
● The proof is spread over multiple sessions
– One HOL Light session for Parts A and D
– 600 parallel HOL Light sessions for Part C
– One Isabelle/HOL session for Part B to generate a list manually transcribed for use in Part A
● How can we be sure that these sessions fit together coherently without introducing inconsistency?
● Was the process of transcribing the list correct?
● Cross checks were done
– But this is not policed by theorem proving
Other Concerns
● Does the proof actually run without falling over?
● Was ‘new_axiom’ used in any session?
– If used, this can introduce inconsistency
● Were any HOL Light unsoundnesses exploited?
– Can use mutable strings to rename constants
– Can use Obj.magic to subvert the OCaml type system
● Were any Isabelle/HOL unsoundnesses exploited?
– Known soundness bugs in some versions
● Is the display of formulae correct?
– HOL Light and Isabelle/HOL have known problems
– Can mean that a definition or the top-level theorem isn’t what it seems
● [Demo ...]
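The `Obj.magic` concern can be illustrated with a toy example. This is a sketch, not HOL Light code: the `Kernel` module below is hypothetical and, for brevity, represents a theorem simply as its conclusion string. `Obj.magic` is a real OCaml standard library function of type `'a -> 'b`, so it can relabel any value as a `thm` without calling a kernel rule.

```ocaml
(* Toy kernel with a single rule; names are illustrative only. *)
module Kernel : sig
  type thm
  val concl : thm -> string
  val refl : string -> thm         (* |- x = x *)
end = struct
  type thm = string                (* a theorem is just its conclusion here *)
  let concl th = th
  let refl x = x ^ " = " ^ x
end

(* An honest theorem, created through the kernel. *)
let honest = Kernel.refl "x"

(* A forged "theorem": Obj.magic bypasses the abstraction barrier,
   so no kernel rule was ever asked to justify this conclusion. *)
let forged : Kernel.thm = Obj.magic "0 = 1"

let () =
  assert (Kernel.concl honest = "x = x");
  assert (Kernel.concl forged = "0 = 1")
```

Nothing in the type of `forged` reveals how it was made, so the protection offered by the private `thm` type holds only if the script never uses such escape hatches.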
Pedigree of Formalisation Team
● Cannot rely on pedigree of formalisation team
● Although innocent error is unlikely, it cannot be dismissed as impossible
– 20 contributors
– 450,000 lines of proof script
– 50,000 lines of complex automatic extension performing over a trillion proof steps
● Malicious error cannot be dismissed either
– Outsourcing makes this more likely
● All it takes is one tiny exploit!
Can anyone see the exploit in the proof script extract?
(I maliciously doctored it!)
PART 4
Proof Auditing Flyspeck
Proof Auditing
● We propose the rigorous, independent assessment of important formalisation projects
– Would result in EITHER a robust justification that a complete and correct formal proof of the original informal theorem has been performed OR exposure of flaws in the project
● Should aim to make the justification as simple as possible
● Should not assume the good intentions of the project team
Proposed Auditing Process
1. Replay original project
– Run each of the sessions of the formal proof
2. Port the proof to a trusted target system
– Use proof porting in the original system(s) to export proof objects
– Consolidate the proof objects into a single session in target system
3. Examine the final state of the target system
– Examine the display settings
– Examine the list of axioms
– Review the statement of the top-level theorem, and its dependency graph of supporting definitions
Requirements for the Process
● Proof porting software
– Must be able to efficiently and reliably record and export proofs from the original system
– Must be able to handle proofs of very large scale
● Trusted target system
– Ideally want a system that is widely trusted not to suffer from soundness issues or display issues
Common HOL
● A standard for basic HOL system functionality
● Enables portability of proofs and source code between HOL systems
– HOL4, HOL Light, ProofPower HOL, Isabelle/HOL, HOL Zero, hol90
– Currently only implemented for HOL Light and HOL Zero
● Consists of:
– Application Programming Interface (API)
– Standard HOL theory
– Adapted versions of various HOL systems for the API/theory
– Proof object exporter
– Proof object importer
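The exporter/importer pair can be pictured as recording each primitive inference as data and later re-running it in the target system. The sketch below uses a hypothetical proof-object format for a toy implication logic. It is not the Common HOL file format, and `step`, `replay` and the other names are invented for illustration.

```ocaml
(* Hypothetical proof-object format for a toy implication logic;
   not the actual Common HOL exporter/importer format. *)
type term = Var of string | Imp of term * term

(* One recorded primitive inference; integers refer to earlier steps. *)
type step =
  | Assume of term          (* {p} |- p *)
  | Disch of term * int     (* discharge p from step i *)
  | Mp of int * int         (* modus ponens on steps i and j *)

(* The target system's own theorem representation. *)
type thm = { hyps : term list; concl : term }

let union xs ys = List.sort_uniq compare (xs @ ys)

(* Replay a proof object: every step is re-checked by the target's rules,
   so nothing is taken on trust from the system that exported it. *)
let replay (steps : step list) : thm =
  let proved = Hashtbl.create 16 in
  let get i =
    match Hashtbl.find_opt proved i with
    | Some th -> th
    | None -> failwith "replay: reference to a step not yet proved"
  in
  let run i s =
    let th =
      match s with
      | Assume p -> { hyps = [p]; concl = p }
      | Disch (p, j) ->
          let t = get j in
          { hyps = List.filter (fun h -> h <> p) t.hyps;
            concl = Imp (p, t.concl) }
      | Mp (j, k) ->
          let ab = get j and a = get k in
          (match ab.concl with
           | Imp (p, q) when p = a.concl ->
               { hyps = union ab.hyps a.hyps; concl = q }
           | _ -> failwith "replay: MP premises do not match")
    in
    Hashtbl.replace proved i th
  in
  List.iteri run steps;
  match steps with
  | [] -> failwith "replay: empty proof object"
  | _ -> get (List.length steps - 1)

(* Exported proof of  |- p ==> p. *)
let proof = [ Assume (Var "p"); Disch (Var "p", 0) ]

let () =
  let th = replay proof in
  assert (th.hyps = []);
  assert (th.concl = Imp (Var "p", Var "p"))
```

A malformed or doctored proof object makes `replay` fail rather than yield a theorem, which is the property an audit relies on.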
Common HOL API
● Interface of around 450 ML functions/values
– Functional programming library (100)
– Type, term & theorem utilities (150)
– Theory extension & listing commands (40)
– Inference rules (100)
– Parsing & pretty printing (20)
– Theorems (55)
● Enables fast and reliable porting of source code between HOL systems
Common HOL: Axioms
Axiom           hol90  HOL4     ProofPower  HOL Light  HOL Zero
IMP_ANTISYM_AX  axiom  derived  axiom       -          axiom
ETA_AX          axiom  axiom    axiom       axiom      axiom
SELECT_AX       axiom  axiom    axiom       axiom      axiom
BOOL_CASES_AX   axiom  axiom    axiom       derived    derived
INFINITY_AX     axiom  axiom    axiom       axiom      axiom
Common HOL: Inference Rules
Rule                    hol90        HOL4       ProofPower   HOL Light  HOL Zero
ABS                     prim         prim       prim         prim       prim
ASSUME                  prim         prim       prim         prim       prim
BETA (not in platform)  -            -          -            prim       -
BETA_CONV               prim         prim       prim         derived    prim
DISCH                   prim         prim       prim         derived    prim
DEDUCT_ANTISYM_RULE     -            -          -            prim       derived
EQ_MP                   k-derived    k-derived  k-derived    prim       prim
INST                    {k-derived}  k-derived  {k-derived}  prim       derived
INST_TYPE               {prim}       {prim}     {prim}       {prim}     prim
MK_COMB                 k-derived    k-derived  k-derived    prim       prim
MP                      prim         prim       prim         derived    prim
REFL                    prim         prim       prim         prim       prim
SUBST                   prim         prim       prim         derived    derived
Common HOL: Term Utilities
hol90              HOL4               ProofPower     HOL Light          HOL Zero
type_of            type_of            type_of        type_of            type_of
type_vars_in_term  type_vars_in_term  {term_tyvars}  type_vars_in_term  term_tyvars
aconv              aconv              (~=$)          aconv              alpha_eq
-                  rename_bvar        -              {alpha}            rename_bvar
free_vars          free_vars          frees          frees              free_vars
free_varsl         free_varsl         -              freesl             list_free_vars
-                  var_occurs         is_free_in     {vfree_in}         var_free_in
{free_in}          free_in            -              free_in            term_free_in
all_vars           -                  -              variables          all_vars
all_varsl          -                  -              -                  list_all_vars
inst               {inst}             {inst}         {inst}             tyvar_inst
-                  -                  {var_subst}    vsubst             var_inst
{subst}            {subst}            subst          subst              subst
variant            variant            variant        {variant}          variant
Using Common HOL
● Successfully used to record and port Flyspeck Parts A/D between two HOL Light sessions
– 1.4 billion primitive inferences
– Recording/exporting overhead is around 40% of execution time
– Requires around 300MB of RAM
– Proof objects occupy around 200MB of .tgz files
– Replays in about 15% of time for original proof
● Scope for improvement
HOL Zero
● Another implementation of the HOL logic
● Designed for proof checking, not interactive proof
● Simple LCF-style implementation
● Extensive code comments
● Carefully designed concrete syntax and pretty printer
● Avoids OCaml exploits
● No known flaws
● $100 soundness bounty!
● [Demo …]
Using HOL Zero
● Successfully used Common HOL to port Flyspeck Parts A/D to HOL Zero
– First-time port did fail, but due to subtle error in implementation of Common HOL API for HOL Light
● Significantly slower than HOL Light
– 6 hours vs 33 minutes
– Due to highly conservative architecture of HOL Zero
– Could perhaps speed up
● No issues found in Parts A/D
Modus Ponens
● In Common HOL:
A ⟝ P ⇒ Q    B ⟝ P
------------------
A ∪ B ⟝ Q
● In HOL Light:
A ⟝ P ⇒ Q    B ⟝ P
------------------
(A \ {P}) ∪ B ⟝ Q
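The difference between the two rules lies only in the hypothesis set of the result, which the following sketch makes concrete. It models hypothesis sets as sets of strings and follows the two rules as stated on the slide. The function names `mp_common_hol` and `mp_hol_light` are invented here and are not part of either system's API.

```ocaml
module S = Set.Make (String)

(* Hypothesis sets only; formulas are named by strings for illustration.
   Inputs stand for  A |- P ==> Q  and  B |- P. *)

(* Rule as stated for Common HOL: the result has hypotheses A u B. *)
let mp_common_hol (a : S.t) (b : S.t) : S.t = S.union a b

(* Rule as stated for HOL Light: P is first removed from A. *)
let mp_hol_light (p : string) (a : S.t) (b : S.t) : S.t =
  S.union (S.remove p a) b

let () =
  let a = S.of_list ["P"; "R"] and b = S.of_list ["S"] in
  assert (S.elements (mp_common_hol a b) = ["P"; "R"; "S"]);
  assert (S.elements (mp_hol_light "P" a b) = ["R"; "S"]);
  (* The two rules agree whenever P is not among the hypotheses in A. *)
  let a' = S.of_list ["R"] in
  assert (S.equal (mp_common_hol a' b) (mp_hol_light "P" a' b))
```

The results differ only when P is itself a hypothesis of the implication theorem, a corner case that a port between systems has to reproduce exactly.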
The Future
● Full audit of Flyspeck is feasible
– Parts A/B/C/D all in a single HOL session
● Two remaining challenges
– Recording/replaying Non-Linear Inequalities (Part C)
– Manual port of Graph Enumeration (Part B) to HOL Light
● Need to speed up HOL Zero
● Need to improve proof porting
● Need to port fixity as well as definitions/proofs
Try It Yourself
● Proof Technologies website contains downloads for replaying the Main Text (Part A)
– Follow link Various / Flyspeck / Flyspeck Replay
– HOL Zero and HOL SuperLight target systems
– 200MB of proof modules
● [Demo …]