Exploiting the Search Process
John A Clark Dept. of Computer Science
University of York, [email protected]
A Talk with a Title I Cannot Remember but with Content Much as Advertised So Don’t Worry Too Much
Overview
Some initial motivation.
The cost function matters.
The cost function doesn’t matter.
Profiling optimisation – why it pays to watch paint dry.
Further ideas for heuristic optimisation.
Early Cryptanalysis
Use of optimisation techniques typically uses a cost function that seeks to provide a frequency profile close to that expected from average text:

cost(key) = Σ_ij |O_ij − E_ij|^R

where O_ij and E_ij are the observed and expected frequencies (e.g. of bigrams ij). In every paper I have seen the value of R is 1 – but why?
A hang-over from pencil and paper days? Does it matter?
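The R-parameterised cost above can be sketched in Python; the symbol set, frequencies, and function name here are illustrative assumptions, not taken from any particular paper. A unigram version is shown for brevity:

```python
def frequency_cost(observed_counts, expected_freqs, total, R=1.0):
    """Sum over symbols of |O_i - E_i|^R, where O_i is the observed
    relative frequency under a candidate key and E_i the expected
    frequency in average text. R = 1 is the traditional choice."""
    cost = 0.0
    for sym, expected in expected_freqs.items():
        observed = observed_counts.get(sym, 0) / total
        cost += abs(observed - expected) ** R
    return cost
```

Varying R changes how strongly large deviations are punished, which is exactly the knob the question above is asking about.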
The Cost Function Matters
Different cost functions give different ‘results’. Not very surprising in itself.
Most optimisation work ‘fiddles around’ with various cost functions until one is obtained that works well.
Effectiveness of a cost function may depend on the search technique used (e.g. genetic algorithms or simulated annealing).
Different Cost Functions Give Different Results I
How to Plant a Trapdoor (and Why You Might Not Get Away With It)
or…
Secret Agents Leave Big Footprints
Boolean Function Design
A Boolean function f: {0,1}^n → {0,1}, with polarity form f̂(x) = (−1)^f(x). For n = 3:

x:     000 001 010 011 100 101 110 111
f(x):    1   0   0   0   1   0   1   1
f̂(x):  −1   1   1   1  −1   1  −1  −1

Can use non-linearity as the cost function:

N_f = 2^(n−1) − (1/2)·max_ω |F(ω)|

Or minimise a new cost function:

Cost(f) = Σ_ω ||F(ω)| − (2^(n/2) + K)|^R
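A minimal sketch of computing the Walsh values F(ω) and the non-linearity for a Boolean function given as a truth table. This is plain Python, exponential in n, so only suitable for small n:

```python
def walsh_values(truth_table, n):
    """F(w) = sum over x of (-1)^(f(x) XOR w.x), for every w."""
    values = []
    for w in range(2 ** n):
        total = 0
        for x in range(2 ** n):
            dot = bin(w & x).count("1") & 1   # parity of bitwise AND = w.x
            total += (-1) ** (truth_table[x] ^ dot)
        values.append(total)
    return values

def nonlinearity(truth_table, n):
    """N_f = 2^(n-1) - (1/2) * max_w |F(w)|."""
    return 2 ** (n - 1) - max(abs(v) for v in walsh_values(truth_table, n)) // 2
```

An affine function scores 0 (worst), while a bent function attains the maximum possible non-linearity.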
Uses and Abuses
Can use optimisation to maximise the non-linearity, minimise autocorrelation elements etc. These are publicly recognised good properties that we might wish to demonstrate to others.
From an optimisation point of view one way of satisfying these requirements is as good as another. But for a malicious designer this may not be the case. Who says that optimisation has to be used honestly?
What’s to stop me creating Boolean functions or S-boxes with good public properties but with hidden (unknown) properties?
Planting Trapdoors
Can use these techniques to generate cryptographic elements with good public properties using an honest cost function honestCost(x).
But can also try to hide useful (but privately known) properties using a malicious cost function trapCost(x).
Now take a combination and do both at the same time:
Cost(x) = (1 − λ)·honestCost(x) + λ·trapCost(x)
λ is the ‘malice factor’. Want λ as high as you can get away with for the next N years! The result must still possess the required good properties.
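The combined objective is just a convex blend of the two costs; a trivial sketch (λ written as `lam`):

```python
def combined_cost(x, honest_cost, trap_cost, lam):
    """lam in [0, 1] is the 'malice factor': lam = 0 is the honest
    cost alone; higher lam trades public quality for trapdoor bias."""
    return (1 - lam) * honest_cost(x) + lam * trap_cost(x)
```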
Planting Trapdoors
Carried out some experiments to generate highly non-linear Boolean functions:
– wanted a technique that allowed new ‘trapdoor functionality’ to be inserted
– didn’t know what new trapdoor functionality would look like (it would be specific to the rest of the cipher in which the function were used)
– clearly needs to be something other than closeness to a linear function (since this is diametrically opposite to the public property)
A good way to test the technique was to generate a random Boolean function and require the eventual solution to be ‘close’ to it.
Cost(f) = (1 − λ)·Σ_ω ||F(ω)| − (2^(8/2) − 4)|^3 + λ·Σ_{x=0..255} f̂(x)·ĝ(x)

g is the trapdoor function, λ is the malice factor.
Planting Trapdoors
30 runs at each malice factor level λ. Each table counts solutions by autocorrelation (rows) and non-linearity (columns: 112, 114, 116). MeanTrap = average dot product of the derived function with the particular trapdoor function.

λ = 0.0, MeanTrap = 12.8:
  64: 0 0 0 | 56: 0 0 0 | 48: 0 0 1 | 40: 0 3 4 | 32: 2 7 12 | 24: 0 0 1

λ = 0.2, MeanTrap = 198.9:
  64: 0 0 0 | 56: 0 1 0 | 48: 0 7 0 | 40: 0 16 0 | 32: 0 6 0 | 24: 0 0 0

λ = 0.4, MeanTrap = 213.1:
  64: 0 1 0 | 56: 0 2 0 | 48: 1 6 0 | 40: 2 17 0 | 32: 0 0 1 | 24: 0 0 0

λ = 0.6, MeanTrap = 222.1:
  64: 0 1 0 | 56: 1 2 0 | 48: 5 7 0 | 40: 2 12 0 | 32: 0 0 0 | 24: 0 0 0

λ = 0.8, MeanTrap = 232.3:
  64: 1 0 0 | 56: 4 1 0 | 48: 19 1 0 | 40: 3 1 0 | 32: 0 0 0 | 24: 0 0 0
Planting Trapdoors
Publicly good solutions, e.g. Boolean functions with the same very high non-linearity:
– some found by annealing with the honest cost function alone
– some with high trapdoor bias, found by annealing with the combined honest and trapdoor cost functions
There appears nothing to distinguish the sets of solutions obtained – unless you know what form the trapdoor takes!
Or is there…
Vector Representations
Any design can be written as a vector of ±1 entries, e.g. (+1, −1, +1, +1, −1, +1, −1, −1).
Different cost functions may give similar goodness results but may do so in radically different ways.
Results using honest and dishonest cost functions cluster in different parts of the design space
Basically distinguish using discriminant analysis.
If you don’t have an alternative hypothesis then you can generate a family of honest results and ask how probable the offered one is.
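That "generate a family of honest results" idea can be sketched as a simple empirical test. The projection statistic and both function names are illustrative assumptions, not from the talk:

```python
def project(design, direction):
    """Projection of a +/-1 design vector onto a chosen discriminant direction."""
    return sum(d * w for d, w in zip(design, direction))

def empirical_pvalue(offered_stat, honest_stats):
    """How probable is the offered design's statistic, judged against a
    family of honestly generated designs? Small values are suspicious."""
    extreme = sum(1 for s in honest_stats if abs(s) >= abs(offered_stat))
    return extreme / len(honest_stats)
```

In practice one would generate many honest designs with the honest cost function, project all of them (and the offered design) onto a discriminant direction, and flag the offered design if its statistic is rare among the honest family.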
Games People Play
It seems possible to tell that something has been going on. And we don’t need to know precisely what has been going on.
Since any design has a binary vector representation the technique is general.
Currently have only looked at simple properties of vector projections. More complex tests easily possible.
Myriad of further games you can play… if you know the form of discriminant tests used you can build that knowledge into your dishonest cost function – develop an artefact with some dishonest bias but which passes the envisaged tests.
More Games People Play
Some honest cost function families may give different characteristics for malicious as well as normal use – e.g. the QUT use of non-linearity and my recent cost function. Both are plausibly (obviously) honest.
Could people with a large amount of computing power use this to get an honest cost function that facilitates a particular malicious cost function?
I used power factor R = 3.0. Why not 2.95?
More Games People Play
Said you can try to build non-detection into your cost functions. This assumes that you know the discriminant tests used.
But the verifier has an arbitrary choice, e.g. projection onto random discriminant vectors. Passing a discriminant test is a random variable; a malicious designer cannot protect against arbitrary choices.
Note we are looking to detect malicious insertion. Cannot protect against accidental possession of a malicious property.
More Games People Play
If you have a better optimisation technique than anyone else… keep quiet about it – you can trade off the additional capability for an increased malice factor.
Last Slide on This (Honest)
An optimisation-based design process may be open and reproducible.
Optimisation can be used and abused.
Optimisation produces results with some regularity of structure. Designs developed against different criteria just look different.
The games do not stop.
(Same and) Different Cost Functions Give Different Results II
Serious Cryptanalysis with (Poor) Cost Functions
Cryptanalysis: Pointcheval’s Scheme
Zero-knowledge protocol based on an NP-hard problem (the Permuted Perceptron Problem):

A ∈ {−1, 1}^(m×n), s ∈ {−1, 1}^n, b = As, with a_ij = ±1, s_i = ±1, b_i ≥ 0

Example (5×5): for a suitable ±1 matrix A and secret s, b = As = (1, 5, 1, 1, 3), giving the histogram (h(1), h(3), h(5)) = (3, 1, 1).

A and the histogram are public. If you can recover the secret s then the system is broken. Some suggested values for (m, n) are (101, 117) and (131, 147).
Pointcheval’s Scheme
Need a cost function to indicate how good a candidate vector y (for the secret s) is. Examples of factors we might like to consider: non-negativity of the Ay elements, and histogram agreement.

Suppose Ay = (−3, 1, 3, −1, 1), so hist(Ay) = (2, 1, 0) against the target hist(As) = (3, 1, 1).
Could give a negativity punishment of costNeg(y) = |−3| + |−1| = 4.
Could give a histogram punishment of costHist(y) = |3 − 2| + |1 − 1| + |1 − 0| = 2.
Now take a weighted sum of these costs:
cost(y) = w1·costNeg(y) + w2·costHist(y)
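The two punishments can be sketched directly in Python. The histogram here counts the positive entries of Ay taking the values 1, 3, 5, matching the worked example on this slide; the function names are this sketch's own:

```python
def cost_neg(Ay):
    """Punish negative entries of A*y (all must be >= 0 for a solution)."""
    return sum(abs(v) for v in Ay if v < 0)

def hist(Ay, values=(1, 3, 5)):
    """Count how many entries of A*y equal each target value."""
    return tuple(sum(1 for v in Ay if v == k) for k in values)

def cost_hist(Ay, target):
    """Total disagreement with the public histogram."""
    return sum(abs(t - h) for t, h in zip(target, hist(Ay)))

def cost(Ay, target, w1, w2):
    """Weighted sum of the two punishments."""
    return w1 * cost_neg(Ay) + w2 * cost_hist(Ay, target)
```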
Outline of Annealing 1
Improving moves always accepted.
Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter Temp. Loosely:
– the worse the move, the less likely it is to be accepted
– a worsening move is less likely to be accepted the cooler the temperature
The temperature starts high and is gradually cooled as the search progresses. Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Outline of Annealing 2
Current solution is x. Generate neighbouring solution y.
Cost difference Δ = f(y) − f(x).
If Δ < 0 then accept the move (current = y),
else accept if exp(−Δ/T) > U(0,1), where U(0,1) is a uniform (0,1) random variable,
else reject.
Outline of Annealing 3
Temperature cycles: try 10000 moves at T = 100, then 10000 moves at T = 80, then at T = 64, and so on, cooling down towards T0.
The search finishes when no progress has been made for some number QT of temperature cycles or some maximum number of cycles has been executed.
Simulated Annealing
A local search technique. Current candidate x.

    current x = x0
    Temp = Temp0
    Until frozen do:
        Do 1000 times:
            y = generateNeighbour(x)
            Δ = f(y) − f(x)
            if (Δ < 0) accept (current x = y)
            else if (exp(−Δ/Temp) > U(0,1)) accept (current x = y)
            else reject
        Temp = 0.95 × Temp
    Solution is best x so far

At each temperature consider 1000 moves.
Always accept improving moves.
Accept worsening moves probabilistically – it gets harder to do this the worse the move, and harder as Temp decreases.
Each pass of the inner loop is one temperature cycle.
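The outline above translates almost line-for-line into Python. The parameter names and the simple "frozen" test (a fixed number of cycles without improvement) are assumptions of this sketch:

```python
import math
import random

def anneal(x0, f, neighbour, temp0=100.0, alpha=0.95,
           moves_per_cycle=1000, max_cycles=100, patience=5):
    """Generic simulated annealing: geometric cooling; `patience`
    temperature cycles without improvement counts as frozen."""
    x, temp = x0, temp0
    best, best_cost = x0, f(x0)
    stale = 0
    for _ in range(max_cycles):
        improved = False
        for _ in range(moves_per_cycle):
            y = neighbour(x)
            delta = f(y) - f(x)
            # always accept improving moves; accept worsening ones
            # with probability exp(-delta/temp)
            if delta < 0 or math.exp(-delta / temp) > random.random():
                x = y
                if f(x) < best_cost:
                    best, best_cost = x, f(x)
                    improved = True
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
        temp *= alpha
    return best, best_cost
```

For example, minimising f(x) = (x − 3)² over the integers with a ±1-step neighbour move converges quickly to x = 3.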
Profiling Annealing (Analysis of Repeated Runs)
Simulated annealing can make progress with this scheme, typically getting solutions with around 80% of the vector entries correct (but you don’t know which 80%!).
Some efforts have been made to look at repeated runs of the annealing process, looking for commonality of elements in the results. Hopefully where the solutions agree they are correct (Knudsen and Meier).
(Figure: the actual secret compared with runs 1–6, marking the positions where all runs agree.)
Profiling Annealing (Analysis of Repeated Runs)
The runs may agree correctly. The runs may agree incorrectly.
(Figure: the actual secret against the positions where all runs agree – some agree rightly, some agree wrongly.)
Profiling Annealing (Analysis of Repeated Runs)
Knudsen and Meier use repeated runs, fix the commonly agreed elements, and get a new series of runs to obtain a new set of commonly agreed bits, etc.
At the end some bits will be fixed wrongly, but the problem of finding them is now within computational range for the smallest (101, 117) problem.
Viewpoint Analysis
But look again at the cost function template:
cost(x) = w1·costNeg(x) + w2·costHist(x)
It’s as before but with two honest components. Different weights w1 and w2 will give different results, yet the resulting cost functions all seem plausibly well-motivated.
We can view different choices of weights as different viewpoints on the problem.
Radical Viewpoint Analysis
Take different viewpoints on the same problem, i.e. different cost functions:
cost1(y) = 5·costNeg(y) + 1·costHist(y)
cost2(y) = 3·costNeg(y) + 3·costHist(y)
cost3(y) = 1·costNeg(y) + 5·costHist(y)
The cost surface is now different in each case but we still have cost = 0 => problem solved.
Now use these to converge on candidate solutions. For suitably chosen functions results typically have between 75–92% correct values.
Now consider those values on which they agree. By taking a large number of different cost functions you can reduce the number of values on which they agree wrongly almost to 0 (e.g. 30 cost functions get about 25% of the key right with almost no bits wrong).
Additional cost functions remove incorrect agreement (but may also reduce correct agreement).
Random Viewpoint Analysis
But what’s the rationale behind the choices for the weights?
cost1(y) = 5·costNeg(y) + 1·costHist(y)
cost2(y) = 3·costNeg(y) + 3·costHist(y)
cost3(y) = 1·costNeg(y) + 5·costHist(y)
They were chosen by me because the various sets looked different.
Actually, since the cost functions don’t need to be good in themselves, they may as well be random…
cost1(y) = w11·costNeg(y) + w12·costHist(y)
cost2(y) = w21·costNeg(y) + w22·costHist(y)
cost3(y) = w31·costNeg(y) + w32·costHist(y)
……
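The agreement step common to all these viewpoint analyses can be sketched as:

```python
def agreed_positions(solutions):
    """Given candidate +/-1 vectors from runs under different cost
    functions, return {index: value} for positions where every run
    agrees -- the entries we hope are correct."""
    n = len(solutions[0])
    return {i: solutions[0][i] for i in range(n)
            if all(s[i] == solutions[0][i] for s in solutions)}
```

Adding more viewpoints shrinks the returned set, but (as noted above) drives the wrongly agreed positions towards zero.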
Thermo-statistical Annealing
Suppose now you have a binary 0-1 (+1,−1) problem of, say, 100 bits. Assume the move strategy is simply a bit flip. In a temperature cycle with 10000 moves each bit will be given on average 100 opportunities to change value.
Some strange things happen if you watch the values taken by the variables within a temperature cycle. As the process cools some variables seem increasingly keen to take on particular values (either 0 or 1). E.g. the first bit variable may spend 95% of the cycle taking the value 1. Thus, it seems reluctant to take the value 0 and when it does so seems very ready to swap back to 1.
For various binary problems it is found that if a variable exhibits this behaviour it will generally take the preferred value at the end of the search.
Accept the inevitable and fix the variable at the preferred value when the 95% threshold is achieved. Now spend the rest of the time changing the other non-fixed variables.
Thermo-statistical Annealing
Intended primarily as a way of achieving annealing more efficiently – the number of moves within a temperature cycle can be reduced as variables are fixed. I find it better to simply use the extra time on the remaining variables (i.e. get closer to thermal equilibrium).
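The fixing rule can be sketched by sampling the bit vector during a temperature cycle and checking each variable against the 95% threshold. The sampling scheme and function name are assumptions of this sketch:

```python
def preferred_values(samples, threshold=0.95):
    """samples: bit vectors observed during one temperature cycle.
    Returns {index: value} for variables that spend at least
    `threshold` of the cycle at one value -- candidates to be fixed."""
    n, m = len(samples[0]), len(samples)
    fixed = {}
    for i in range(n):
        ones = sum(v[i] for v in samples)
        if ones >= threshold * m:
            fixed[i] = 1
        elif ones <= (1 - threshold) * m:
            fixed[i] = 0
    return fixed
```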
But Why?
Why does this work? Why should a variable exhibit clear tendency to a preferred value? Obvious answer is because it is very difficult for it not to do so. There is
something about the problem instance that drives it in this direction.
Could it be that this is because it is the correct value?
Thermo-statistical Trajectories
Yes. The search process wants to take those values because THEY ARE
THE CORRECT ONES. With certain cost functions and problems the FIRST 50% OF
VARIABLE VALUES FIXED IN THIS WAY ARE CORRECT. Thus, within a few minutes you have half the key. Not always this
successful but most cost functions and problems I have used give 25%+ initial correctness.
Can use about 8 different cost functions and typically one of those 8 will have 40%+ initially fixed bits correct (but you do not know which one).
Evolving Protocols
Recent IEEE S&P Oakland paper using genetic algorithms to evolve abstract protocols (with proofs!).
Fitness function is based on number of stated goals met at each message.
Random bit strings can be decoded as protocols expressed in BAN-logic formalism and executed.
When a receiver gets a message he uses BAN inference rules to update his belief state according to what he knows already and what is in the message.
this is a form of abstract execution.
Quantum Search
Problem: you are asked to maximise f(x) = x over 0..1000000. Do you:
– use hill climbing
– use quantum search
– say, “you are obviously an academic, the answer is obvious”?
Quantum search is an awfully inefficient way to get to the answer because it does not exploit structure. It is essentially a form of spruced-up brute force search: find a solution x such that predicate(x) holds, calculating predicate(x) in parallel for all x.
Quantum Search
If there are many solutions that satisfy the predicate then quantum search will find one much quicker and will select randomly between them.
Virtually all QS uses are blunt brute force. But why not use the structure in a problem? Get QS to produce solutions efficiently that are good in some way and then hill-climb.
More Optimisation Work
Do Genetic Cryptanalysis Programming – evolve programs to leak information (not just static approximations).
Do Genetic Quantum Programming:
Unitary transformations are the programming language statements for quantum computing, represented by unitary matrices. Can evolve strings of matrices to represent a computation and simulate for small machines. Use this as a means of learning new quantum algorithms.
Possibilities to evolve new quantum cryptanalytic approaches.
Try to plant keyed trapdoors in more complex artefacts.
Statistical profiling of traditional optimisation techniques – potentially a very rich seam to mine (both in analysis and design).