Certification of Computational Results Greg Bronevetsky

Certification of Computational Results

Greg Bronevetsky

Background

• Technique proposed by • Gregory F. Sullivan• Dwight S. Wilson• Gerald B. Masson

• All from Johns Hopkins CS Department.

Overview

• Trying to do fault detection without the severe overhead of replication.

• Certification Trails are a manual approach that has that programmer provide additional code to have the program check itself.

• A program generates a certification trail that details its work.

• A checker program can use this trail to verify that the output is correct in asymptotically less time.

• Several examples provided. No automation.

Roadmap

• We will cover some algorithms to which the Certification Trails technique has been applied• Sorting• Convex Hull• Heap Data Structures

• The addition of Certification Trails and the creation of the Checker is done manually by the programmer in all cases.

Trail for Sorting

• In order to verify the output of a sorting algorithm we must check that• The sorted items are a permutation of the

original input items.• The sorted items appear in a non-decreasing

order in the sorter's output.

• Thus, the trail should contain all the items in their original order, each labeled with its location in the sorted list.

Sorting Checker

• A Sorting Checker must:• Use the labels to place all elements into their

sorted spots and verify that this results in a non-decreasing order.

• Verify that no two elements are placed in the same location in the ordered list.

• The Sorter takes O(n2) or O(n log n) time.• The Checker takes O(n) time.• Checker is asymptotically faster than Sorter.

Convex Hull Problem

Given a set of points on a 2D plane, find a subset of points that forms a convex hull around all the points.

Convex Hull: Step 1

P1 is the

point with the least x-coordinate.

P6

P2

P8

P3

P5

P1

P7

P4

Points sorted in order of increasing slope relative to P1

Convex Hull: Invariant

P6

P2

P8

P3

P5

P1

P7

P4

All the points not on the Hull are inside a triangle formed by P1 and two

successive points on the Hull.


P6

P2

P8

P3

P5

P1

P7

P4

We know that P3 is not a Hull point because the clockwise angle between

lines and

is ≥ 180º.

≥ 180º

P2P3 P3P4


P6

P2

P8

P3

P5

P1

P7

P4

< 180º

Note that if clockwise angle between lines and

is < 180º, then P

3 is a Hull pointP2P3 P3P4

Convex Hull Algorithm

• Add P1, P

2 and P

3 to the Hull.

(Note: P1, P

2 and P

n must be on the Hull.)

• For Pk = P

4 to P

n• ... trying to add P

k to the Hull ...

• Let QA and Q

B be the two points most recently

added to the Hull:• While the angle formed by Q

A, Q

B and P

k ≥180

• remove QB from the Hull since it is inside the triangle: P1, QA, Pk.

• Add Pk to the Hull.

Trail for Convex Hull

• Augment Program to • Output {q

1, q

2, ..., q

m} = the indexes of the points

on the hull.• Output a proof of correctness for {x

1, x

2, ..., x

r} =

all points not on the Hull in the form of the triangle that contains it.

Point not on Convex Hull 3 Surrounding PointsP

3P

1, P

2, P

4

P7

P1, P

6, P

8

Convex Hull Checker

Checker must check that:• There is a 1-1 correspondence between input

points and {q1, q

2, ..., q

m} U {x

1, x

2, ..., x

r}.

• All points in the triangle proofs correspond to input points.

• Each point in in the triangle proofs actually lies in the given triangle.

• Every triple of supposed Hull points forms a convex angle.

• There is a unique locally maximal point on the hull.

Asymptotic Runtimes

• Original Convex Hull Algorithm takes O(n log n) time to sort and the Hull construction loop takes only O(n) time.O(n log n)-time total.

• Convex Hull Checker runs thru the set of points once for each check.O(n)-time total.

• Checker asymptotically faster than Original.

Certification Trails for Data Structures

• Lets have a data structure for storing value/key pairs, ordered lexicographically:(key, val) < (key', val') iff val<val' or (val=val' and key<key')

• Operations:• member(key): returns whether key is mapped to

some val.• insert(key, val): inserts a pair (key, val) into the

data structure.• delete(key): deletes the pair that contains key.

Data Structure Specs

• Data Structure Operations• changekey(key, newval): executed when the pair

(key, oldval) exists in the data structure. Removes this pair and inserts the pair (key, newval)

• deletemin(): deletes the smallest pair (according to the ordering). Returns “empty” if the data structure contains no pairs.

• predecessor(key): returns the key of the pair thatimmediately precedes key's pair or “smallest”if there is no such pair.

• empty(): returns whether the data structure is empty.

Data Structure Implementation

• Such a Data Structure can be implemented via an AVL tree, a red-black tree or a b-tree.

• Most operations will take O(log n) time.• We can augement implementations to

generate a certification trail:• insert(key, val): output the key of the

predecessor of the newly inserted pair (key, val). If there is no predecessor, output “smallest”.

• changekey(key, newval): output predecessor of the new pair (key, newval). If there is no predecessor, output “smallest”.

Data Structure Checker

• A Checker for any program using the above data structure can use the certification trail to implement a much faster data structure.

• All operations can be done in O(1) time.

• Resulting program will be faster than original program. Maybe asymptotically faster.

Optimized Data Structure

• A doubly linked list of (key, val) pairs, sorted according to the pair ordering relation.

• An array indexed by keys, containing pointers to (key, val) pairs corresponding to the indexes.

• The first pair (with key=0) contains value=sm, which is defined to be smaller than any other possible value.


• Optimized data structure operations:• insert(key, val):

• Read from trail prec_key = the key of the pair preceding the new (key, val) pair.

• Check that it is a valid index.• Look at the pair pointed to by array[prec_key].

• Verify that it is ≠null.• Place the (key, val) pair at index key, following the

(prec_key, prec_val) pair. • Check that before the insert() array[key] was =null.• Ensure that (key, val) is greater than its

predecessor and less than its successor.

Optimized Insert Example

Result of the call insert(5, 62)


• Optimized data structure operations:• delete(key): Remove the pair pointed to by array[key].

• Ensure that array[key]≠null.• changekey(key, newval): Call delete(key), followed by

insert(key, newval). These calls will check all necessary conditions.

• deletemin(): Look at the pair that follows the pair (0,sm) (pointed to by array[0]).

• If no such pair, return “empty”. • Else, if there exists pair (key, val), then remove it and

set array[key] to null.• empty(): Return whether there is a pair following the pair

(0,sm).


• Optimized data structure operations:• member(key): return whether array[key]=null.• predecessor(key):

• Look at the pair pointed to by array[key].• Follow its backward link to its predecessor pair.• If the predecessor pair is (0,sm) then return “smallest”.• Else, return the key field of that pair.

• Note that all the operations can be done in O(1) time.

Shortest Path

• A Shortest Path algorithm was implemented using the above algorithm.

• The original program used the original data structure that produced a certification trail.

• The checker version was identical to the original except that its data structure was the optimized version that used the trail.

• Original runtime = O(m•log n)• Checker runtime = O(m)• (m=number of edges, n=number of nodes)

Performance: Sort

• Basic Algorithm – Sorting algorithm with no certification trails.

• 1st Execution – Sorter that produces certification trail.• 2nd Execution – Checking algorithm that uses the trail.• Speedup – factor of improvement of 2nd vs Basic.• %Savings – of 1st + 2nd trails execution over running

Basic twice.

Size Basic Speedup % SavingsAlgorithm (Generates Trail) (Uses Trail)

10000 0.28 0.30 0.04 7.00 39.29%50000 1.80 1.90 0.19 9.47 41.94%

100000 3.96 4.08 0.41 9.66 43.31%500000 23.95 24.69 2.14 11.19 43.99%1000000 50.23 51.57 4.38 11.47 44.31%

1st Execution 2nd Execution

Performance: Sort

Size Basic Speedup % SavingsAlgorithm (Generates Trail) (Uses Trail)

10000 0.28 0.30 0.04 7.00 39.29%50000 1.80 1.90 0.19 9.47 41.94%

100000 3.96 4.08 0.41 9.66 43.31%500000 23.95 24.69 2.14 11.19 43.99%1000000 50.23 51.57 4.38 11.47 44.31%


Performance: Convex HullSize Basic Speedup % Savings

Algorithm (Generates Trail) (Uses Trail)5000 0.61 0.62 0.07 8.73 43.62%10000 1.33 1.34 0.14 9.56 44.54%25000 3.68 3.68 0.36 10.22 45.12%50000 7.68 7.74 0.71 10.75 44.94%

100000 16.23 16.30 1.43 11.35 45.39%200000 33.93 34.37 2.84 11.94 45.16%


Performance: Shortest PathSize Basic Speedup % Savings(n,m) Algorithm (Generates Trail) (Uses Trail)

100,1000 0.04 0.05 0.02 2.00 12.50%250,2500 0.15 0.16 0.06 2.50 26.67%500,5000 0.31 0.33 0.11 2.82 29.03%

100,10000 0.70 0.76 0.23 3.04 29.29%2000,20000 1.58 1.67 0.45 3.51 32.91%2500,25000 2.06 2.15 0.55 3.75 34.47%


Summary of Experiments

• The overhead of generating a certification trail is about 2%.

• The checker run is much faster than the original. It can be run on much slower hardware or use a formally verified language.

Application to Byzantine Failures

• Current technique is completely manual. No known way to automatically convert a program to generate a trail.

• We may develop libraries that use the Certification Trails technique, allowing us to catch errors in a large fraction of a program.

• Door open to Failure Recovery: when an error is detected the checker goes back to using original code to redo the work.

Documents

Certification of Computational Results Greg Bronevetsky