Evolutionary computation in cryptography and security
SoSySec seminar: Artificial Intelligence and Security
Rennes, 25.4.2017.
Domagoj Jakobović ([email protected]) FER, University of Zagreb
http://gp.zemris.fer.hr
http://ecf.zemris.fer.hr
gp.zemris.fer.hr 2/38
Overview
• What about EC?
• Cryptographic motivation
• Genetic programming
• Boolean functions
• S-boxes
• One-class intrusion detection
• Software

Evolutionary computation
• EC: a research area within computer science that draws inspiration from the process of natural evolution
• Evolutionary algorithms: a subset of EC; population-based metaheuristic optimization methods that use biology-inspired mechanisms such as selection, crossover and mutation
  • Genetic Algorithm (GA), Holland, 1975
  • Tree-based Genetic Programming (GP), Koza, 1992
  • Cartesian Genetic Programming (CGP), Miller, 1999
  • Evolution Strategy (ES), Rechenberg, Schwefel, 1970s
• applied in numerous fields
• the topic today: cryptography and security

Optimization of cryptographic primitives
• A cryptographic primitive is a part of a cryptographic tool used to provide information security (a low-level cryptographic component that is frequently used)
• modern cryptography relies mostly on definitions and proofs, but many primitives in use today nevertheless lack rigorous proofs
• examples of primitives:
  • Boolean functions
  • S-boxes (substitution boxes)
  • PRNGs (pseudo-random number generators)
  • addition chains
• primitives are designed/optimized for information security and resilience to attacks

Side channel attacks
• Implementation attacks: all attacks that aim not at weaknesses of the algorithm itself, but at its actual implementations on cryptographic devices
  • sources: power, sound, light, electromagnetic radiation, etc.
  • among the most powerful known attacks against cryptographic devices
  • common types: side channel attacks and fault injection attacks
• Side channel attacks are passive and non-invasive attacks
  • examples: power analysis attacks to infer the key or plaintext
  • properties may be known that increase resilience to these attacks

Problem 1: Boolean functions
• important cryptographic primitive, often used in stream ciphers as the source of nonlinearity
• in cryptography, a Boolean function needs to fulfill a number of properties:
  • to be used in filter generators: balancedness, high nonlinearity, high algebraic degree, high algebraic immunity, high fast algebraic immunity
  • to be used in combiner generators: additionally, a good value of correlation immunity
  • as part of a side-channel attack countermeasure (masking): low Hamming weight and high correlation immunity
• to be of practical importance: at least 13 inputs
• three main design options: algebraic constructions, random search, heuristics

Boolean function optimization
• search space size: 2^(2^n)
• How to represent a function?
  • truth table form: string of bits of length 2^n
  • Boolean function with 8 inputs: search space size is 2^256
  • larger inputs: very hard to optimize in truth table form
  • the best results: using genetic programming (GP)
[Figure: Boolean function with 2 inputs]
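
The truth table form and the search space size can be illustrated with a short Python sketch. The helper names `truth_table` and `search_space_size` are hypothetical (not taken from the slides or from ECF), and the lexicographic input order is an assumption made for this example.

```python
from itertools import product

def truth_table(f, n):
    """Truth table of an n-input Boolean function f as a list of 2**n bits.

    Assumption: inputs are enumerated in lexicographic order,
    (0,...,0), (0,...,0,1), ..., (1,...,1).
    """
    return [f(*bits) & 1 for bits in product((0, 1), repeat=n)]

def search_space_size(n):
    """Number of distinct n-input Boolean functions: 2^(2^n)."""
    return 2 ** (2 ** n)

# A Boolean function with 2 inputs (XOR) in truth table form.
xor_tt = truth_table(lambda a, b: a ^ b, 2)
print(xor_tt)

# For 8 inputs the truth table has 256 bits, so there are 2^256 candidates.
print(search_space_size(8) == 2 ** 256)
```

The truth table doubles in length with every added input, which is one way to see why optimizing directly in this form becomes hard for larger functions.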

Digression: Genetic programming
• What is GP?
  • an attempt at automatic programming
• How does it work?
  • maintains a set (population) of possible solutions: programs (individuals)
  • every individual has a quality assessment: the fitness
• What does it do?
  • simulates evolution: worse individuals are eliminated, better ones survive
  • simulates genetic material exchange: better individuals make new ones
  • with time, the population gets better and better
• When does it end?
  • when a good enough solution is found
  • when we're out of time

Solution representation
• most common: tree based
• tree elements:
  • leaves (terminals): input variables (as given), constants, actions (turn, move, operate…)
  • tree value: program output
  • inner nodes (functions): need to be chosen/defined!
• function examples: arithmetic (+, -, *, /, sin, cos, log, sqrt, pow, exp…), logical (AND, OR, NOT…), conditional (ifgte, IF…), loops...
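
A minimal sketch of the tree representation, assuming a nested-tuple encoding chosen only for this example: a tuple is an inner node `(function, child, ...)`, a string is an input variable, and a number is a constant. The `FUNCTIONS` table and `evaluate` are hypothetical names.

```python
import math
import operator

# Inner-node functions: these have to be chosen per problem.
FUNCTIONS = {
    "+": operator.add,
    "*": operator.mul,
    "cos": math.cos,
}

def evaluate(node, env):
    """Recursively compute the value of a tree (the program output)."""
    if isinstance(node, tuple):          # inner node: apply function to children
        name, *children = node
        return FUNCTIONS[name](*(evaluate(c, env) for c in children))
    if isinstance(node, str):            # leaf: input variable
        return env[node]
    return node                          # leaf: constant

# Illustrative individual: (x + 1) * cos(x)
tree = ("*", ("+", "x", 1), ("cos", "x"))
print(evaluate(tree, {"x": 0.0}))
```

Tuples are immutable, so genetic operators built on this encoding would create new trees instead of modifying parents in place.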

Initial population
• most often: created randomly
[Figure: example of a randomly created tree, (x + 1) * cos(x)]

Evolution!
• each solution is evaluated according to the fitness function
  • programs are usually simulated over a number of test cases
• apply selection
  • many variants, same idea: better solutions have a greater probability of surviving
• and then the most important element: genetic operators
• crossover: creating new solutions from existing ones
  • combining (good?) parts of individuals
  • intensification: exploiting promising regions of the search space
• mutation: random change
  • diversification: finding new regions of the search space
• many, many variants of both operators
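
The evaluate/select/vary cycle can be sketched as follows. This is one simple variant among many, not the specific algorithm used in the talk: a steady-state loop with 3-tournament elimination, shown on bit strings instead of trees to stay short. All names (`evolve`, `init`, `one_point`, `flip`) and parameter values are assumptions for illustration.

```python
import random

def evolve(init, fitness, crossover, mutate,
           pop_size=20, generations=200, seed=0):
    """Steady-state evolution: in each step, pick 3 random individuals,
    eliminate the worst and replace it with a mutated child of the other two."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        picked = rng.sample(range(pop_size), 3)
        picked.sort(key=lambda i: fitness(population[i]))  # worst first
        worst, p1, p2 = picked
        child = crossover(population[p1], population[p2], rng)
        population[worst] = mutate(child, rng)
    return max(population, key=fitness)

# Toy problem (assumption): maximize the number of ones in a 16-bit string.
def init(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip(ind, rng):
    child = ind[:]
    child[rng.randrange(len(child))] ^= 1
    return child

best = evolve(init, sum, one_point, flip)
print(best, sum(best))
```

Because only the worst of the three sampled individuals is ever overwritten, the best fitness in the population cannot decrease in this variant.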

Crossover: create something new
• at least two individuals (parents) combine and make a new solution (child)
• most often: exchange randomly selected subtrees (subtree crossover)

Mutation: can it get any better
• most often: replace a randomly selected subtree (subtree mutation)
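
Subtree crossover and subtree mutation can be sketched on trees stored as nested tuples `(function, child, ...)` with strings as leaves. The encoding and every function name here are assumptions for illustration; this variant produces a single child and selects nodes uniformly at random.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get(tree, path):
    """Return the subtree found by following the path."""
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of the tree with the subtree at the path replaced."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(a, b, rng):
    """Child: parent a with a random subtree replaced by a random subtree of b."""
    donor = get(b, rng.choice(list(paths(b))))
    return replace(a, rng.choice(list(paths(a))), donor)

def subtree_mutation(tree, random_tree, rng):
    """Replace a random subtree with a freshly generated one."""
    return replace(tree, rng.choice(list(paths(tree))), random_tree(rng))

a = ("AND", "x1", ("NOT", "x2"))
b = ("XOR", "x3", "x4")
rng = random.Random(1)
print(subtree_crossover(a, b, rng))
print(subtree_mutation(a, lambda r: r.choice(["x1", "x2", "x3"]), rng))
```

Practical GP systems usually add a depth or size limit to these operators to keep trees from growing without bound; that is omitted here.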

GP for Boolean functions
• terminals: input variables (x1, ..., xn)
• inner nodes: Boolean primitives (AND, OR, NOT, XOR, IF, ...)
• set the desired cryptographic properties
  • a problem of its own: designing an efficient fitness function
• repeat many times!
• beats any other representation
  • at least for a reasonable number of inputs
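
Two of the properties a fitness function would typically measure, balancedness and nonlinearity, can be computed from a truth table with the Walsh-Hadamard transform. The sketch below uses the standard definition NL = 2^(n-1) - max|W|/2; the function names are hypothetical, and how the properties are combined into a single fitness value is deliberately left open.

```python
def walsh_hadamard(tt):
    """Fast Walsh-Hadamard transform of a truth table (0/1 list of length 2^n)."""
    w = [1 - 2 * b for b in tt]          # map bit 0 -> +1, bit 1 -> -1
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                w[j], w[j + h] = w[j] + w[j + h], w[j] - w[j + h]
        h *= 2
    return w

def is_balanced(tt):
    """Balanced: exactly as many ones as zeros in the truth table."""
    return 2 * sum(tt) == len(tt)

def nonlinearity(tt):
    """Minimum Hamming distance to the set of affine functions."""
    return len(tt) // 2 - max(abs(v) for v in walsh_hadamard(tt)) // 2

# 4-input example: x1*x2 XOR x3*x4, a classical bent function.
bent_tt = []
for i in range(16):
    x1, x2, x3, x4 = (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1
    bent_tt.append((x1 & x2) ^ (x3 & x4))

print(nonlinearity(bent_tt), is_balanced(bent_tt))
```

The example also shows why fitness design involves trade-offs: the bent function reaches the maximum nonlinearity for 4 inputs but is not balanced.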

Evolving constructions of Boolean functions
• Could we use evolutionary computation to evolve algebraic constructions?
• Example: evolve secondary algebraic constructions that result in bent (maximum nonlinearity) Boolean functions
  • take 4 existing bent functions of 4 inputs (easy to construct)
  • add two inputs
  • combine them into a function with 6 inputs in total (slightly less easy to construct)
• an example construction:
  • ((((v1 XNOR f0) OR (f3 AND f0)) XOR ((f1 XOR v0) XNOR v1)) AND2 ((v0 AND2 f2) AND2 ((f0 XNOR f3) XOR (f1 AND2 v1))))
• optimize the evolved construction to maximize nonlinearity
• and show that it holds for any existing input functions!
• and for any number of inputs!
  • verified empirically for up to 24 inputs

Problem 2: S-boxes
• natural extension from the Boolean function case
• S-boxes (Substitution Boxes): vectorial Boolean functions
• often used in block ciphers as a source of nonlinearity
• design problem: much more difficult!
• an S-box of dimension m x n has m inputs and n output Boolean functions
[Figure: 2 x 2 S-box]
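
The view of an S-box as n output Boolean functions can be sketched by splitting a lookup table into one truth table per output bit. The 2 x 2 table below is a made-up example, not the one from the figure, and `coordinate_functions` is a hypothetical helper name.

```python
def coordinate_functions(sbox, n_out):
    """Split an S-box lookup table into n_out truth tables,
    one per output bit (bit 0 is the least significant)."""
    return [[(y >> j) & 1 for y in sbox] for j in range(n_out)]

# Hypothetical 2 x 2 S-box: input x in 0..3 maps to sbox[x].
sbox = [0, 3, 1, 2]
for j, tt in enumerate(coordinate_functions(sbox, 2)):
    print("output bit", j, tt)
```

Each resulting truth table can be analyzed with the same tools as a single Boolean function, although S-box properties are defined over all nonzero combinations of the outputs and not only over the individual bits.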

S-box properties
• many properties of interest: balancedness, high nonlinearity, low δ-uniformity, high algebraic degree, etc.
• there are properties that algebraic constructions do not consider
  • properties related to side-channel resistance usually have poor values if S-boxes are built with algebraic constructions
• the task: evolve S-boxes that have good side-channel resistance while keeping the other properties optimal
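
As an illustration of one listed property, δ-uniformity (differential uniformity) can be computed by brute force from the lookup table using its standard definition. The function names are hypothetical; the table used as a test vector is the 4 x 4 S-box of the PRESENT cipher, which is not mentioned in the slides and serves only as a familiar example.

```python
from collections import Counter

def is_bijective(sbox):
    """True if the lookup table is a permutation of 0..len(sbox)-1."""
    return sorted(sbox) == list(range(len(sbox)))

def differential_uniformity(sbox):
    """Largest number of inputs x with S(x) XOR S(x XOR a) = b,
    over all nonzero input differences a and all output differences b.
    Assumes len(sbox) is a power of two."""
    worst = 0
    for a in range(1, len(sbox)):
        counts = Counter(sbox[x] ^ sbox[x ^ a] for x in range(len(sbox)))
        worst = max(worst, max(counts.values()))
    return worst

# 4 x 4 S-box of PRESENT (external test vector, see lead-in).
present = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
print(is_bijective(present), differential_uniformity(present))
```

Lower values are better; the identity mapping, by contrast, has the worst possible value, equal to the table size.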

S-boxes: side channel related properties
• Transparency order: cryptographic property of S-boxes introduced by Prouff in 2005
  • the higher the transparency order, the lower the resistance of the S-box to DPA attacks
  • new definition in 2015!
• Confusion coefficient
  • low confusion coefficient values (also referred to as high collision values) make side-channel attacks harder, i.e. an attack may require a greater number of traces or a higher SNR to yield the correct key candidate

S-box properties
• we are also interested in implementation properties like power, area, and latency
• algebraic constructions usually do not consider such properties
• evolve S-boxes with good cryptographic properties that are hardware-friendly
• multiobjective problems (trade-offs between different properties)

S-box optimization
• when m = n, we can represent S-boxes as permutations, i.e., with all values between 0 and 2^n - 1 (where n is the dimension of the S-box)
  • the S-box is always bijective, so we need not be concerned with the balancedness property
• when m > n, permutation encoding is not adequate
• GP to the rescue: an m x n S-box is represented as n independent trees
  • good results when m >> n
• another variant: CGP (Cartesian GP)
  • instead of a tree, the solution is represented as a graph
  • offers multiple outputs: a natural mapping to an S-box
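
The permutation encoding for the m = n case can be sketched as follows. A swap mutation is one common operator for permutations and is an assumption here, not necessarily the operator used in the talk; the names `random_sbox` and `swap_mutation` are hypothetical.

```python
import random

def random_sbox(n, rng):
    """Random n x n S-box encoded as a permutation of 0 .. 2^n - 1."""
    perm = list(range(2 ** n))
    rng.shuffle(perm)
    return perm

def swap_mutation(sbox, rng):
    """Exchange two table entries; the result is still a permutation,
    so bijectivity never has to be checked or repaired."""
    child = sbox[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

rng = random.Random(42)
parent = random_sbox(4, rng)
child = swap_mutation(parent, rng)
print(parent)
print(child)
```

The encoding restricts the search to bijective S-boxes by construction, which is why balancedness drops out of the fitness function.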

CGP structure
• resulting genotype: 0 0 1 1 0 0 1 3 1 2 0 1 0 4 4 2 5 4 2 5 7 3

Cellular automata defined S-boxes
• another approach: evolve S-boxes in the form of cellular automata (CA) rules
  • also used in practice (Keccak)
• GP evolves a Boolean function that is used as a local CA rule
  • example rule: v_i(t+1) = v_i(t) OR v_(i-1)(t)
• better results than permutation and basic GP/CGP for some S-box sizes (5x5, 6x6, 7x7)
• additional benefit: optimize for a smaller number of gates (smaller area, power, latency)
[Figure: CA cells v0 v1 v2 v3 with state 0100 at time t and 1101 at time t+1]

• example evolved rule: ((v2 NOR NOT(v4)) XOR v1)
  • 5x5 rule with optimal nonlinearity and differential uniformity
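
A sketch of how a local rule could define an S-box: the same rule is applied at every cell of the state, and the resulting next state is the S-box output. The conventions below are assumptions made for this example and may differ from the authors' setup: periodic boundary, the rule at cell i sees the state rotated so that v[0] is cell i, and v0 is the most significant bit of the table index.

```python
def ca_step(state, rule):
    """Apply the local rule at every cell of a cyclic state (tuple of bits)."""
    n = len(state)
    return tuple(
        rule([state[(i + k) % n] for k in range(n)]) & 1
        for i in range(n)
    )

def ca_sbox(rule, n):
    """Lookup table of the n x n S-box given by one CA step."""
    table = []
    for x in range(2 ** n):
        state = tuple((x >> (n - 1 - i)) & 1 for i in range(n))
        out = ca_step(state, rule)
        table.append(int("".join(map(str, out)), 2))
    return table

# Simple rule from the slides: v_i(t+1) = v_i(t) OR v_(i-1)(t)
or_rule = lambda v: v[0] | v[-1]

# Evolved rule from the slides: ((v2 NOR NOT(v4)) XOR v1)
evolved = lambda v: (1 - (v[2] | (1 - v[4]))) ^ v[1]

print(ca_step((0, 1, 0, 0), or_rule))
print(ca_sbox(evolved, 5))
```

The cryptographic quality of the resulting table depends on the neighbourhood convention, so the properties claimed in the slides should be checked against the authors' own definition and not against this sketch.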

Problem 3: Security application: intrusion detection
• Intrusion detection: the process of monitoring the events occurring in a computer system or network and analyzing them for intrusions
  • intrusions: attempts to bypass the security mechanisms of a computer or network
• common approaches: supervised/unsupervised classification (machine learning)
• GP can be used in classification, with either
  • a decision tree classifier
  • a regression tree classifier
• our example: use regression GP as a one-class classifier

Digression: Symbolic regression
Problem example: physical process modelling
• electronic circuit response model (Arbitrary-Angle Unmitered Microstrip Bend)
• short term load forecasting
• cryptographic element response
• if the model is known:
  • choose/optimize model parameters
  • somewhat simpler problem
• what if the model is not known?
  • surrogate model: neural net, SVM, expert system...
• building a model with genetic programming: symbolic regression

The symbolic regression problem
• task: discover the symbolic form of the model
  • no assumptions about the unknown function! (right…)
[Figure: data points of an unknown f(x) plotted against x, with x from -10 to 10 and f(x) from -20 to 20; given x, what is f(x)?]

Symbolic regression with GP
• evolve individuals using arithmetic functions as tree elements
• individuals (models) are evaluated on input data
  • measures: MSE, RMSE, MAE, MAPE...
• many uses and application examples
  • popular software packages: Eureqa, DTReg, different solvers
  • example: SRM application
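
The listed error measures follow their usual textbook definitions and can be sketched in a few lines. The function names and the sample data are made up for illustration; MAPE is undefined when a target value is zero, which this sketch does not guard against.

```python
import math

def mse(targets, outputs):
    """Mean squared error."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)

def rmse(targets, outputs):
    """Root mean squared error."""
    return math.sqrt(mse(targets, outputs))

def mae(targets, outputs):
    """Mean absolute error."""
    return sum(abs(t - o) for t, o in zip(targets, outputs)) / len(targets)

def mape(targets, outputs):
    """Mean absolute percentage error (targets must be nonzero)."""
    return 100 * sum(abs((t - o) / t) for t, o in zip(targets, outputs)) / len(targets)

targets = [1.0, 2.0, 4.0]   # measured f(x), illustrative values
outputs = [1.0, 3.0, 2.0]   # outputs of a candidate model
print(mse(targets, outputs), rmse(targets, outputs),
      mae(targets, outputs), mape(targets, outputs))
```

In a GP run, any of these values computed over the training data could serve as the fitness to be minimized.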

One-class classification for intrusion detection
• use regression GP as a one-class classifier
• assumption: only 'normal' class data available for training
• learn a model (function) that forces the output into a certain output range
  • e.g. [1, 2], [4, 5], [8, 9]; same range for all normal examples
  • also, penalize 'trivial' models
  • reward the use of all features
• test the model on unseen data containing anomalies (intrusions)
  • outputs falling outside the defined range are classified as anomalies!
• results: comparable to one-class SVM (mainly)
  • median of F1 measure ~0.82 (this weekend)
  • improve reliability with ensembles
  • under heavy construction
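
The range-based idea can be sketched as follows, assuming the target range [1, 2] from the example above. The model is a hand-written stand-in for an evolved expression, and the names `range_error`, `training_error` and `is_anomaly` are hypothetical. The penalty for trivial models and the reward for using all features are not modelled here.

```python
def range_error(output, lo, hi):
    """Distance from the output to the interval [lo, hi]; zero inside it."""
    return max(lo - output, 0.0, output - hi)

def training_error(model, normal_samples, lo=1.0, hi=2.0):
    """Fitness to minimize: total range violation over 'normal' training data."""
    return sum(range_error(model(x), lo, hi) for x in normal_samples)

def is_anomaly(model, x, lo=1.0, hi=2.0):
    """Outputs falling outside the range are classified as anomalies."""
    return not (lo <= model(x) <= hi)

# Stand-in for an evolved model over one feature (assumption).
model = lambda x: 1.5 + 0.1 * x[0]

normal = [(0.0,), (1.0,), (-2.0,)]
print(training_error(model, normal))
print(is_anomaly(model, (1.0,)), is_anomaly(model, (10.0,)))
```

A constant model such as `lambda x: 1.5` would also reach zero training error, which is why trivial models have to be penalized in practice.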

Available software
How do I test all these examples myself?
• ECF: Evolutionary Computation Framework
• ECF is a C++ framework intended for the application of any type of evolutionary computation:
  • http://ecf.zemris.fer.hr/
• project concerning evolutionary computation and cryptology:
  • http://evocrypt.zemris.fer.hr/ (under constant development)

Instead of a conclusion...
• EC has proved successful in cryptography:
  • when no other, specialized approaches exist
  • to include new properties of interest to optimize
  • to assess the quality of some other method
  • to produce "good-enough" solutions
• not a magic wand: requires both experience and some knowledge of the problem to produce competitive results
• outlook: combination with machine learning approaches for security applications

Acknowledgements
Thanks to:
• Stjepan Picek
• Annelie Heuser
• EC team at FER Zagreb

References
• General
  • Evolutionary Computation Framework (http://ecf.zemris.fer.hr/)
  • EC group at FER/UNIZG (http://gp.zemris.fer.hr/)
• Boolean functions
  • Picek, Stjepan; Carlet, Claude; Guilley, Sylvain; Miller, Julian F.; Jakobović, Domagoj. Evolutionary Algorithms for Boolean Functions in Diverse Domains of Cryptography. Evolutionary Computation 24(4), 2016, 667-694.
  • Picek, Stjepan; Jakobović, Domagoj; Miller, Julian; Batina, Lejla; Čupić, Marko. Cryptographic Boolean functions: One output, many design criteria. Applied Soft Computing 40, 2016, 635-653.
  • Picek, Stjepan; Batina, Lejla; Jakobović, Domagoj. Evolving DPA-Resistant Boolean Functions. Lecture Notes in Computer Science 8672, 2014, 812-821.
  • Picek, Stjepan; Jakobović, Domagoj. Evolving Algebraic Constructions for Designing Bent Boolean Functions. Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2016). ACM, 781-788.
  • Picek, Stjepan; Marchiori, Elena; Batina, Lejla; Jakobović, Domagoj. Combining Evolutionary Computation and Algebraic Constructions to Find Cryptography-Relevant Boolean Functions. Lecture Notes in Computer Science 8672, 2014, 822-831.
• S-boxes
  • Picek, Stjepan; Čupić, Marko; Rotim, Leon. A New Cost Function for Evolution of S-boxes. Evolutionary Computation 24(4), 2016, 695-718.
  • Picek, Stjepan; Jakobović, Domagoj; Miller, Julian; Batina, Lejla. Cartesian Genetic Programming Approach for Generating Substitution Boxes of Different Sizes. Genetic and Evolutionary Computation Conference (GECCO 2015), 1457-1458.
  • Picek, Stjepan; Ege, Baris; Papagiannopoulos, Kostas; Batina, Lejla; Jakobović, Domagoj. Optimality and beyond: The case of 4×4 S-boxes. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST 2014), 80-83.
  • Picek, Stjepan; Ege, Baris; Batina, Lejla; Jakobović, Domagoj; Chmielewski, Lukasz; Golub, Marin. On Using Genetic Algorithms for Intrinsic Side-channel Resistance: The Case of AES S-box. Proceedings of the First Workshop on Cryptography and Security in Computing Systems. ACM, 13-18.
  • Picek, Stjepan; Mariot, Luca; Jakobović, Domagoj; Leporati, Alberto. Evolving S-boxes Based on Cellular Automata with Genetic Programming. GECCO 2017 (accepted).
  • Picek, Stjepan; Mariot, Luca; Jakobović, Domagoj; Yang, Bohan; Mentens, Nele. Design of S-boxes Defined with Cellular Automata Rules. MAL-IoT 2017 (accepted).
  • Chakraborty, K.; Sarkar, S.; Maitra, S.; Mazumdar, B.; Mukhopadhyay, D.; Prouff, E. Redefining the transparency order. Coding and Cryptography, International Workshop, 2015.