Exploiting the Search Process
John A Clark Dept. of Computer Science
University of York, [email protected]
A Talk with a Title I Cannot Remember but with Content Much as Advertised So Don’t Worry Too Much
Overview
Some initial motivation.
The cost function matters.
The cost function doesn’t matter.
Profiling optimisation – why it pays to watch paint dry.
Further ideas for heuristic optimisation.
Early Cryptanalysis
Use of optimisation techniques typically uses a cost function that seeks to provide a frequency profile close to that expected from average text:

cost(key) = Σ_ij |O_ij − E_ij|^R

where O_ij and E_ij are the observed and expected frequencies (e.g. of bigrams ij). In every paper I have seen the value of R is 1 – but why?
A hang-over from pencil and paper days? Does it matter?
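The R-parameterised cost above can be sketched in Python; the symbol set, frequencies, and function name here are illustrative assumptions, not taken from any particular paper. A unigram version is shown for brevity:

```python
def frequency_cost(observed_counts, expected_freqs, total, R=1.0):
    """Sum over symbols of |O_i - E_i|^R, where O_i is the observed
    relative frequency under a candidate key and E_i the expected
    frequency in average text. R = 1 is the traditional choice."""
    cost = 0.0
    for sym, expected in expected_freqs.items():
        observed = observed_counts.get(sym, 0) / total
        cost += abs(observed - expected) ** R
    return cost
```

Varying R changes how strongly large deviations are punished, which is exactly the knob the question above is asking about.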
The Cost Function Matters
Different cost functions give different ‘results’. Not very surprising in itself.
Most optimisation work ‘fiddles around’ with various cost functions until one is obtained that works well.
Effectiveness of a cost function may depend on the search technique used (e.g. genetic algorithms or simulated annealing).
Different Cost Functions Give Different Results I
How to Plant a Trapdoor (and Why You Might Not Get Away With It)
or…
Secret Agents Leave Big Footprints
Boolean Function Design
A Boolean function f: {0,1}^n → {0,1}, with polarity form f̂(x) = (−1)^f(x). For n = 3:

x:     000 001 010 011 100 101 110 111
f(x):    1   0   0   0   1   0   1   1
f̂(x):  −1   1   1   1  −1   1  −1  −1

Can use non-linearity as the cost function:

N_f = 2^(n−1) − (1/2)·max_ω |F(ω)|

Or minimise a new cost function:

Cost(f) = Σ_ω ||F(ω)| − (2^(n/2) + K)|^R
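A minimal sketch of computing the Walsh values F(ω) and the non-linearity for a Boolean function given as a truth table. This is plain Python, exponential in n, so only suitable for small n:

```python
def walsh_values(truth_table, n):
    """F(w) = sum over x of (-1)^(f(x) XOR w.x), for every w."""
    values = []
    for w in range(2 ** n):
        total = 0
        for x in range(2 ** n):
            dot = bin(w & x).count("1") & 1   # parity of bitwise AND = w.x
            total += (-1) ** (truth_table[x] ^ dot)
        values.append(total)
    return values

def nonlinearity(truth_table, n):
    """N_f = 2^(n-1) - (1/2) * max_w |F(w)|."""
    return 2 ** (n - 1) - max(abs(v) for v in walsh_values(truth_table, n)) // 2
```

An affine function scores 0 (worst), while a bent function attains the maximum possible non-linearity.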
Uses and Abuses
Can use optimisation to maximise the non-linearity, minimise autocorrelation elements etc. These are publicly recognised good properties that we might wish to demonstrate to others.
From an optimisation point of view one way of satisfying these requirements is as good as another. But for a malicious designer this may not be the case. Who says that optimisation has to be used honestly?
What’s to stop me creating Boolean functions or S-boxes with good public properties but with hidden (unknown) properties?
Planting Trapdoors
Can use these techniques to generate cryptographic elements with good public properties using an honest cost function honestCost(x).
But can also try to hide useful (but privately known) properties using a malicious cost function trapCost(x).
Now take a combination and do both at the same time:
Cost(x) = (1 − λ)·honestCost(x) + λ·trapCost(x)
λ is the ‘malice factor’. Want λ as high as you can get away with for the next N years! The result must still possess the required good properties.
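The combined objective is just a convex blend of the two costs; a trivial sketch (λ written as `lam`):

```python
def combined_cost(x, honest_cost, trap_cost, lam):
    """lam in [0, 1] is the 'malice factor': lam = 0 is the honest
    cost alone; higher lam trades public quality for trapdoor bias."""
    return (1 - lam) * honest_cost(x) + lam * trap_cost(x)
```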
Planting Trapdoors
Carried out some experiments to generate highly non-linear Boolean functions:
– wanted a technique that allowed new ‘trapdoor functionality’ to be inserted
– didn’t know what new trapdoor functionality would look like (it would be specific to the rest of the cipher in which the function were used)
– clearly needs to be something other than closeness to a linear function (since this is diametrically opposite to the public property)
A good way to test the technique was to generate a random Boolean function and require the eventual solution to be ‘close’ to it.
Cost(f) = (1 − λ)·Σ_ω ||F(ω)| − (2^(8/2) − 4)|^3 + λ·Σ_{x=0..255} f̂(x)·ĝ(x)

g is the trapdoor function, λ is the malice factor.
Planting Trapdoors
30 runs at each malice factor level λ. Each table counts solutions by autocorrelation (rows) and non-linearity (columns: 112, 114, 116). MeanTrap = average dot product of the derived function with the particular trapdoor function.

λ = 0.0, MeanTrap = 12.8:
  64: 0 0 0 | 56: 0 0 0 | 48: 0 0 1 | 40: 0 3 4 | 32: 2 7 12 | 24: 0 0 1

λ = 0.2, MeanTrap = 198.9:
  64: 0 0 0 | 56: 0 1 0 | 48: 0 7 0 | 40: 0 16 0 | 32: 0 6 0 | 24: 0 0 0

λ = 0.4, MeanTrap = 213.1:
  64: 0 1 0 | 56: 0 2 0 | 48: 1 6 0 | 40: 2 17 0 | 32: 0 0 1 | 24: 0 0 0

λ = 0.6, MeanTrap = 222.1:
  64: 0 1 0 | 56: 1 2 0 | 48: 5 7 0 | 40: 2 12 0 | 32: 0 0 0 | 24: 0 0 0

λ = 0.8, MeanTrap = 232.3:
  64: 1 0 0 | 56: 4 1 0 | 48: 19 1 0 | 40: 3 1 0 | 32: 0 0 0 | 24: 0 0 0
Planting Trapdoors
Publicly good solutions, e.g. Boolean functions with the same very high non-linearity:
– some found by annealing with the honest cost function alone
– some with high trapdoor bias, found by annealing with the combined honest and trapdoor cost functions
There appears nothing to distinguish the sets of solutions obtained – unless you know what form the trapdoor takes!
Or is there…
Vector Representations
Any design can be written as a vector of ±1 entries, e.g. (+1, −1, +1, +1, −1, +1, −1, −1).
Different cost functions may give similar goodness results but may do so in radically different ways.
Results using honest and dishonest cost functions cluster in different parts of the design space
Basically distinguish using discriminant analysis.
If you don’t have an alternative hypothesis then you can generate a family of honest results and ask how probable the offered one is.
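That "generate a family of honest results" idea can be sketched as a simple empirical test. The projection statistic and both function names are illustrative assumptions, not from the talk:

```python
def project(design, direction):
    """Projection of a +/-1 design vector onto a chosen discriminant direction."""
    return sum(d * w for d, w in zip(design, direction))

def empirical_pvalue(offered_stat, honest_stats):
    """How probable is the offered design's statistic, judged against a
    family of honestly generated designs? Small values are suspicious."""
    extreme = sum(1 for s in honest_stats if abs(s) >= abs(offered_stat))
    return extreme / len(honest_stats)
```

In practice one would generate many honest designs with the honest cost function, project all of them (and the offered design) onto a discriminant direction, and flag the offered design if its statistic is rare among the honest family.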
Games People Play
It seems possible to tell that something has been going on. And we don’t need to know precisely what has been going on.
Since any design has a binary vector representation the technique is general.
Currently have only looked at simple properties of vector projections. More complex tests easily possible.
Myriad of further games you can play… if you know the form of discriminant tests used you can build that knowledge into your dishonest cost function – develop an artefact with some dishonest bias but which passes the envisaged tests.
More Games People Play
Some honest cost function families may give different characteristics for malicious as well as normal use – e.g. the QUT use of non-linearity and my recent cost function. Both are plausibly (obviously) honest.
Could people with a large amount of computing power use this to get an honest cost function that facilitates a particular malicious cost function?
I used power factor R = 3.0. Why not 2.95?
More Games People Play
Said you can try to build non-detection into your cost functions. This assumes that you know the discriminant tests used.
But the verifier has an arbitrary choice, e.g. projection onto random discriminant vectors. Passing a discriminant test is a random variable; a malicious designer cannot protect against arbitrary choices.
Note we are looking to detect malicious insertion. Cannot protect against accidental possession of a malicious property.
More Games People Play
If you have a better optimisation technique than anyone else… keep quiet about it – you can trade off the additional capability for an increased malice factor.
Last Slide on This (Honest)
An optimisation-based design process may be open and reproducible.
Optimisation can be used and abused.
Optimisation produces results with some regularity of structure. Designs developed against different criteria just look different.
The games do not stop.
(Same and) Different Cost Functions Give Different Results II
Serious Cryptanalysis with (Poor) Cost Functions
Cryptanalysis: Pointcheval’s Scheme
Zero-knowledge protocol based on an NP-hard problem (the Permuted Perceptron Problem):

A ∈ {−1, 1}^(m×n), s ∈ {−1, 1}^n, b = As, with a_ij = ±1, s_i = ±1, b_i ≥ 0

Example (5×5): for a suitable ±1 matrix A and secret s, b = As = (1, 5, 1, 1, 3), giving the histogram (h(1), h(3), h(5)) = (3, 1, 1).

A and the histogram are public. If you can recover the secret s then the system is broken. Some suggested values for (m, n) are (101, 117) and (131, 147).
Pointcheval’s Scheme
Need a cost function to indicate how good a candidate vector y (for the secret s) is. Examples of factors we might like to consider: non-negativity of the Ay elements, and histogram agreement.

Suppose Ay = (−3, 1, 3, −1, 1), so hist(Ay) = (2, 1, 0) against the target hist(As) = (3, 1, 1).
Could give a negativity punishment of costNeg(y) = |−3| + |−1| = 4.
Could give a histogram punishment of costHist(y) = |3 − 2| + |1 − 1| + |1 − 0| = 2.
Now take a weighted sum of these costs:
cost(y) = w1·costNeg(y) + w2·costHist(y)
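The two punishments can be sketched directly in Python. The histogram here counts the positive entries of Ay taking the values 1, 3, 5, matching the worked example on this slide; the function names are this sketch's own:

```python
def cost_neg(Ay):
    """Punish negative entries of A*y (all must be >= 0 for a solution)."""
    return sum(abs(v) for v in Ay if v < 0)

def hist(Ay, values=(1, 3, 5)):
    """Count how many entries of A*y equal each target value."""
    return tuple(sum(1 for v in Ay if v == k) for k in values)

def cost_hist(Ay, target):
    """Total disagreement with the public histogram."""
    return sum(abs(t - h) for t, h in zip(target, hist(Ay)))

def cost(Ay, target, w1, w2):
    """Weighted sum of the two punishments."""
    return w1 * cost_neg(Ay) + w2 * cost_hist(Ay, target)
```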
Outline of Annealing 1
Improving moves always accepted.
Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter Temp. Loosely:
– the worse the move, the less likely it is to be accepted
– a worsening move is less likely to be accepted the cooler the temperature
The temperature starts high and is gradually cooled as the search progresses. Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Outline of Annealing 2
Current solution is x. Generate neighbouring solution y.
Cost difference Δ = f(y) − f(x).
If Δ < 0 then accept the move (current = y),
else accept if exp(−Δ/T) > U(0,1), where U(0,1) is a uniform (0,1) random variable,
else reject.
Outline of Annealing 3
Temperature cycles: try 10000 moves at T = 100, then 10000 moves at T = 80, then at T = 64, and so on, cooling down towards T0.
The search finishes when no progress has been made for some number QT of temperature cycles or some maximum number of cycles has been executed.
Simulated Annealing
A local search technique. Current candidate x.

    current x = x0
    Temp = Temp0
    Until frozen do:
        Do 1000 times:
            y = generateNeighbour(x)
            Δ = f(y) − f(x)
            if (Δ < 0) accept (current x = y)
            else if (exp(−Δ/Temp) > U(0,1)) accept (current x = y)
            else reject
        Temp = 0.95 × Temp
    Solution is best x so far

At each temperature consider 1000 moves.
Always accept improving moves.
Accept worsening moves probabilistically – it gets harder to do this the worse the move, and harder as Temp decreases.
Each pass of the inner loop is one temperature cycle.
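The outline above translates almost line-for-line into Python. The parameter names and the simple "frozen" test (a fixed number of cycles without improvement) are assumptions of this sketch:

```python
import math
import random

def anneal(x0, f, neighbour, temp0=100.0, alpha=0.95,
           moves_per_cycle=1000, max_cycles=100, patience=5):
    """Generic simulated annealing: geometric cooling; `patience`
    temperature cycles without improvement counts as frozen."""
    x, temp = x0, temp0
    best, best_cost = x0, f(x0)
    stale = 0
    for _ in range(max_cycles):
        improved = False
        for _ in range(moves_per_cycle):
            y = neighbour(x)
            delta = f(y) - f(x)
            # always accept improving moves; accept worsening ones
            # with probability exp(-delta/temp)
            if delta < 0 or math.exp(-delta / temp) > random.random():
                x = y
                if f(x) < best_cost:
                    best, best_cost = x, f(x)
                    improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
        temp *= alpha
    return best, best_cost
```

For example, minimising f(x) = (x − 3)² over the integers with a ±1-step neighbour move converges quickly to x = 3.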
Profiling Annealing (Analysis of Repeated Runs)
Simulated annealing can make progress with this scheme, typically getting solutions with around 80% of the vector entries correct (but you don’t know which 80%!).
Some efforts have been made to look at repeated runs of the annealing process, looking for commonality of elements in the results. Hopefully where the solutions agree they are correct (Knudsen and Meier).
(Figure: the actual secret compared with runs 1–6, marking the positions where all runs agree.)
Profiling Annealing (Analysis of Repeated Runs)
The runs may agree correctly. The runs may agree incorrectly.
(Figure: the actual secret against the positions where all runs agree – some agree rightly, some agree wrongly.)
Profiling Annealing (Analysis of Repeated Runs)
Knudsen and Meier use repeated runs, fix the commonly agreed elements, and get a new series of runs to obtain a new set of commonly agreed bits, etc.
At the end some bits will be fixed wrongly, but the problem of finding them is now within computational range for the smallest (101, 117) problem.
Viewpoint Analysis
But look again at the cost function template:
cost(x) = w1·costNeg(x) + w2·costHist(x)
It’s as before but with two honest components. Different weights w1 and w2 will give different results, yet the resulting cost functions all seem plausibly well-motivated.
We can view different choices of weights as different viewpoints on the problem.
Radical Viewpoint Analysis
Take different viewpoints on the same problem, i.e. different cost functions:
cost1(y) = 5·costNeg(y) + 1·costHist(y)
cost2(y) = 3·costNeg(y) + 3·costHist(y)
cost3(y) = 1·costNeg(y) + 5·costHist(y)
The cost surface is now different in each case but we still have cost = 0 => problem solved.
Now use these to converge on candidate solutions. For suitably chosen functions results typically have between 75–92% correct values.
Now consider those values on which they agree. By taking a large number of different cost functions you can reduce the number of values on which they agree wrongly almost to 0 (e.g. 30 cost functions get about 25% of the key right with almost no bits wrong).
Additional cost functions remove incorrect agreement (but may also reduce correct agreement).
Random Viewpoint Analysis
But what’s the rationale behind the choices for the weights?
cost1(y) = 5·costNeg(y) + 1·costHist(y)
cost2(y) = 3·costNeg(y) + 3·costHist(y)
cost3(y) = 1·costNeg(y) + 5·costHist(y)
They were chosen by me because the various sets looked different.
Actually, since the cost functions don’t need to be good in themselves, they may as well be random…
cost1(y) = w11·costNeg(y) + w12·costHist(y)
cost2(y) = w21·costNeg(y) + w22·costHist(y)
cost3(y) = w31·costNeg(y) + w32·costHist(y)
……
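The agreement step common to all these viewpoint analyses can be sketched as:

```python
def agreed_positions(solutions):
    """Given candidate +/-1 vectors from runs under different cost
    functions, return {index: value} for positions where every run
    agrees -- the entries we hope are correct."""
    n = len(solutions[0])
    return {i: solutions[0][i] for i in range(n)
            if all(s[i] == solutions[0][i] for s in solutions)}
```

Adding more viewpoints shrinks the returned set, but (as noted above) drives the wrongly agreed positions towards zero.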
Thermo-statistical Annealing
Suppose now you have a binary 0-1 (+1,−1) problem of, say, 100 bits. Assume the move strategy is simply a bit flip. In a temperature cycle with 10000 moves each bit will be given on average 100 opportunities to change value.
Some strange things happen if you watch the values taken by the variables within a temperature cycle. As the process cools some variables seem increasingly keen to take on particular values (either 0 or 1). E.g. the first bit variable may spend 95% of the cycle taking the value 1. Thus, it seems reluctant to take the value 0 and when it does so seems very ready to swap back to 1.
For various binary problems it is found that if a variable exhibits this behaviour it will generally take the preferred value at the end of the search.
Accept the inevitable and fix the variable at the preferred value when the 95% threshold is achieved. Now spend the rest of the time changing the other non-fixed variables.
Thermo-statistical Annealing
Intended primarily as a way of achieving annealing more efficiently – the number of moves within a temperature cycle can be reduced as variables are fixed. I find it better to simply use the extra time on the remaining variables (i.e. get closer to thermal equilibrium).
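The fixing rule can be sketched by sampling the bit vector during a temperature cycle and checking each variable against the 95% threshold. The sampling scheme and function name are assumptions of this sketch:

```python
def preferred_values(samples, threshold=0.95):
    """samples: bit vectors observed during one temperature cycle.
    Returns {index: value} for variables that spend at least
    `threshold` of the cycle at one value -- candidates to be fixed."""
    n, m = len(samples[0]), len(samples)
    fixed = {}
    for i in range(n):
        ones = sum(v[i] for v in samples)
        if ones >= threshold * m:
            fixed[i] = 1
        elif ones <= (1 - threshold) * m:
            fixed[i] = 0
    return fixed
```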
But Why?
Why does this work? Why should a variable exhibit clear tendency to a preferred value? Obvious answer is because it is very difficult for it not to do so. There is
something about the problem instance that drives it in this direction.
Could it be that this is because it is the correct value?
Thermo-statistical Trajectories
Yes. The search process wants to take those values because THEY ARE
THE CORRECT ONES. With certain cost functions and problems the FIRST 50% OF
VARIABLE VALUES FIXED IN THIS WAY ARE CORRECT. Thus, within a few minutes you have half the key. Not always this
successful but most cost functions and problems I have used give 25%+ initial correctness.
Can use about 8 different cost functions and typically one of those 8 will have 40%+ initially fixed bits correct (but you do not know which one).
Evolving Protocols
Recent IEEE S&P Oakland paper using genetic algorithms to evolve abstract protocols (with proofs!).
Fitness function is based on number of stated goals met at each message.
Random bit strings can be decoded as protocols expressed in BAN-logic formalism and executed.
When a receiver gets a message he uses BAN inference rules to update his belief state according to what he knows already and what is in the message.
this is a form of abstract execution.
Quantum Search
Problem: you are asked to maximise f(x) = x over 0..1000000. Do you:
– use hill climbing
– use quantum search
– say, “you are obviously an academic, the answer is obvious”?
Quantum search is an awfully inefficient way to get to the answer because it does not exploit structure. It is essentially a form of spruced-up brute force search: find a solution x such that predicate(x) holds, calculating predicate(x) in parallel for all x.
Quantum Search
If there are many solutions that satisfy the predicate then quantum search will find one much quicker and will select randomly between them.
Virtually all QS uses are blunt brute force. But why not use the structure in a problem? Get QS to produce solutions efficiently that are good in some way and then hill-climb.
More Optimisation Work
Do Genetic Cryptanalysis Programming – evolve programs to leak information (not just static approximations).
Do Genetic Quantum Programming:
Unitary transformations are the programming language statements for quantum computing, represented by unitary matrices. Can evolve strings of matrices to represent a computation and simulate for small machines. Use this as a means of learning new quantum algorithms.
Possibilities to evolve new quantum cryptanalytic approaches.
Try to plant keyed trapdoors in more complex artefacts.
Statistical profiling of traditional optimisation techniques – potentially a very rich seam to mine (both in analysis and design).