Evolutionary computation in cryptography and security
SoSySec seminar: Artificial Intelligence and Security, Rennes, 25.4.2017.
Domagoj Jakobović ([email protected]), FER, University of Zagreb
http://gp.zemris.fer.hr
http://ecf.zemris.fer.hr



gp.zemris.fer.hr 2/38

Overview
• What about EC?
• Cryptographic motivation
• Genetic programming
• Boolean functions
• S-boxes
• One-class intrusion detection
• Software

Evolutionary computation
• EC: a research area within computer science that draws inspiration from the process of natural evolution
• Evolutionary algorithms: a subset of EC; population-based metaheuristic optimization methods that use biology-inspired mechanisms such as selection, crossover, and mutation
  • Genetic Algorithm (GA), Holland, 1975
  • Tree-based Genetic Programming (GP), Koza, 1992
  • Cartesian Genetic Programming (CGP), Miller, 1999
  • Evolution Strategy (ES), Rechenberg, Schwefel, 1970s
• have found application in numerous fields
• the topic today: cryptography and security

Optimization of cryptographic primitives
• A cryptographic primitive is a part of a cryptographic tool used to provide information security (a low-level cryptographic component that is frequently used)
• modern cryptography relies mostly on definitions and proofs, but many primitives used today nevertheless lack rigorous proofs
• examples of primitives:
  • Boolean functions
  • S-boxes (substitution boxes)
  • PRNGs (pseudo-random number generators)
  • addition chains
• primitives are designed/optimized for information security and resilience to attacks

Side channel attacks
• Implementation attacks: all attacks that target not the weaknesses of the algorithm itself but its actual implementations on cryptographic devices
  • sources: power, sound, light, electromagnetic radiation, etc.
  • among the most powerful known attacks against cryptographic devices
  • common types: side channel attacks and fault injection attacks
• Side channel attacks are passive and non-invasive attacks
  • examples: power analysis attacks to infer the key or plaintext
• properties may be known that increase resilience to these attacks

Problem 1: Boolean functions
• important cryptographic primitive, often used in stream ciphers as the source of nonlinearity
• in cryptography, a Boolean function needs to fulfill a number of properties:
  • to be used in filter generators: balancedness, high nonlinearity, high algebraic degree, high algebraic immunity, high fast algebraic immunity
  • to be used in combiner generators: additionally, a good value of correlation immunity
  • as part of a side-channel attack countermeasure (masking): low Hamming weight and high correlation immunity
• to be of practical importance: at least 13 inputs
• three main design options: algebraic constructions, random search, heuristics

Boolean function optimization
• search space size: 2^(2^n)
• How to represent a function?
  • truth table form: string of bits of length 2^n
  • Boolean function with 8 inputs: search space size is 2^(256)
  • larger inputs: very hard to optimize in truth table form
  • the best results: using genetic programming (GP)

[Figure: Boolean function with 2 inputs]
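
The truth table encoding and the nonlinearity property mentioned earlier can be illustrated with a small sketch. It assumes the usual definition of nonlinearity through the Walsh-Hadamard spectrum; the function names are illustrative and are not taken from the talk or from ECF.

```python
from typing import List


def walsh_hadamard(truth_table: List[int]) -> List[int]:
    """Fast Walsh-Hadamard transform of a 0/1 truth table of length 2^n."""
    # Map bits to signs: 0 -> +1, 1 -> -1.
    spectrum = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < len(spectrum):
        for start in range(0, len(spectrum), 2 * step):
            for i in range(start, start + step):
                a, b = spectrum[i], spectrum[i + step]
                spectrum[i], spectrum[i + step] = a + b, a - b
        step *= 2
    return spectrum


def is_balanced(truth_table: List[int]) -> bool:
    """A balanced function outputs as many ones as zeros."""
    return 2 * sum(truth_table) == len(truth_table)


def nonlinearity(truth_table: List[int]) -> int:
    """Distance to the closest affine function: 2^(n-1) - max|W| / 2."""
    peak = max(abs(w) for w in walsh_hadamard(truth_table))
    return len(truth_table) // 2 - peak // 2


def search_space_size(n_inputs: int) -> int:
    """Number of distinct Boolean functions with n inputs: 2^(2^n)."""
    return 2 ** (2 ** n_inputs)


# Two-input examples, truth tables listed for inputs 00, 01, 10, 11.
XOR_TABLE = [0, 1, 1, 0]   # linear, so its nonlinearity is 0
AND_TABLE = [0, 0, 0, 1]   # unbalanced, nonlinearity 1

print(nonlinearity(XOR_TABLE), nonlinearity(AND_TABLE), search_space_size(2))
```

Even for 8 inputs `search_space_size` returns 2^256, which is why exhaustive search over truth tables is out of the question.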

Digression: Genetic programming
• What is GP?
  • an attempt at automatic programming
• How does it work?
  • maintains a set (population) of possible solutions – programs (individuals)
  • every individual has a quality assessment – the fitness
• What does it do?
  • simulates evolution: worse individuals are eliminated, better ones survive
  • simulates genetic material exchange: better individuals make new ones
  • with time, the population gets better and better
• When does it end?
  • when a good enough solution is found
  • when we're out of time

Solution representation
• most common: tree based
• tree elements:
  • leaves (terminals) – input variables (as given), constants, actions (turn, move, operate…)
  • tree value – program output
  • inner nodes (functions) – need to be chosen/defined!
• function examples:
  • arithmetic (+, -, *, /, sin, cos, log, sqrt, pow, exp…), logical (AND, OR, NOT…), conditional (ifgte, IF…), loops...
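
A tree-based individual can be sketched with nested tuples, assuming a small logical function set. The encoding (function name first, children after) and the names `FUNCTIONS`, `evaluate` and `truth_table` are choices made for this illustration only.

```python
from itertools import product

# Hypothetical function set: name -> implementation on 0/1 values.
FUNCTIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NOT": lambda a: 1 - a,
}


def evaluate(tree, inputs):
    """Recursively compute the tree value (the program output)."""
    if isinstance(tree, str):      # leaf: input variable
        return inputs[tree]
    if isinstance(tree, int):      # leaf: constant
        return tree
    name, *children = tree         # inner node: function applied to subtrees
    return FUNCTIONS[name](*(evaluate(child, inputs) for child in children))


def truth_table(tree, variables):
    """Evaluate the tree on every input combination, in lexicographic order."""
    return [
        evaluate(tree, dict(zip(variables, bits)))
        for bits in product((0, 1), repeat=len(variables))
    ]


# (x0 AND x1) XOR (NOT x1)
EXAMPLE_TREE = ("XOR", ("AND", "x0", "x1"), ("NOT", "x1"))
print(truth_table(EXAMPLE_TREE, ["x0", "x1"]))
```

The resulting truth table is what a fitness function for a Boolean function would inspect, while the search itself works on the tree.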

Initial population
• most often: created randomly

[Figure: randomly generated expression trees built from *, +, cos, x and 1]

Evolution!
• each solution is evaluated according to the fitness function
  • programs are usually simulated over a number of test cases
• apply selection
  • many variants, same idea: better solutions have a greater probability of surviving
• and then the most important element: genetic operators
  • crossover: creating new solutions from existing ones
    • combining (good?) parts of individuals
    • intensification: exploiting promising regions of the search space
  • mutation: random change
    • diversification: finding new regions of the search space
  • many, many variants for both operators
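
The evaluate, select, recombine, mutate cycle can be sketched as a short loop. This is only one of the many variants: a steady-state scheme in which three random individuals are drawn and the worst is replaced by a child of the other two. Bit strings and the toy fitness (count of ones) are assumptions that keep the example short; they stand in for real individuals and a real cryptographic fitness function.

```python
import random


def fitness(individual):
    """Hypothetical toy objective: number of ones in the bit string."""
    return sum(individual)


def crossover(parent_a, parent_b, rng):
    """One-point crossover: prefix of one parent, suffix of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(individual, rng, rate=0.05):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in individual]


def evolve(pop_size=30, length=40, evaluations=3000, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    initial_best = max(map(fitness, population))
    for _ in range(evaluations):
        # Selection: draw three individuals, the worst one is eliminated.
        trio = rng.sample(range(pop_size), 3)
        trio.sort(key=lambda index: fitness(population[index]))
        worst, parent_a, parent_b = trio
        child = crossover(population[parent_a], population[parent_b], rng)
        population[worst] = mutate(child, rng)
    return initial_best, max(map(fitness, population))


initial_best, final_best = evolve()
print(initial_best, final_best)
```

Because only the worst of each trio is overwritten, the best fitness in the population never decreases in this variant.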

Crossover: create something new
• at least two individuals (parents) combine and make a new solution (child)
• most often: exchange randomly selected subtrees (subtree crossover)

Mutation: can it get any better
• most often: replace a randomly selected subtree (subtree mutation)
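
Subtree crossover and subtree mutation can be sketched on trees stored as nested tuples. The tuple encoding, the small arithmetic function set and all helper names are assumptions made for this example; this variant produces a single child by copying a random subtree of the second parent into the first.

```python
import random


def subtree_paths(tree, path=()):
    """Return the path (tuple of child indices) to every node of the tree."""
    paths = [path]
    if isinstance(tree, tuple):
        for index, child in enumerate(tree[1:], start=1):
            paths.extend(subtree_paths(child, path + (index,)))
    return paths


def get_subtree(tree, path):
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new_subtree):
    """Return a copy of the tree with the node at `path` replaced."""
    if not path:
        return new_subtree
    index = path[0]
    replaced = replace_subtree(tree[index], path[1:], new_subtree)
    return tree[:index] + (replaced,) + tree[index + 1:]


def random_tree(rng, depth, terminals=("x", "1"),
                functions=(("+", 2), ("*", 2), ("cos", 1))):
    """Grow a random tree of at most the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(terminals)
    name, arity = rng.choice(functions)
    return (name,) + tuple(random_tree(rng, depth - 1, terminals, functions)
                           for _ in range(arity))


def subtree_crossover(parent_a, parent_b, rng):
    """Replace a random subtree of parent_a with a random subtree of parent_b."""
    target = rng.choice(subtree_paths(parent_a))
    donor = get_subtree(parent_b, rng.choice(subtree_paths(parent_b)))
    return replace_subtree(parent_a, target, donor)


def subtree_mutation(parent, rng, max_depth=2):
    """Replace a random subtree with a freshly generated random one."""
    target = rng.choice(subtree_paths(parent))
    return replace_subtree(parent, target, random_tree(rng, max_depth))


PARENT_A = ("*", ("+", "x", "1"), ("cos", "x"))
PARENT_B = ("+", "x", ("*", "x", "x"))
rng = random.Random(42)
print(subtree_crossover(PARENT_A, PARENT_B, rng))
print(subtree_mutation(PARENT_A, rng))
```

Practical GP systems usually add a depth or size limit to the offspring, which is omitted here.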

GP for Boolean functions
• terminals: input variables (x1, ..., xn)
• inner nodes: Boolean primitives (AND, OR, NOT, XOR, IF, ...)
• set the desired cryptographic properties
  • a problem of its own: designing an efficient fitness function
• repeat many times!
• beats any other representation
  • at least for a reasonable number of inputs

• Boolean function optimization example

Evolving constructions of Boolean functions
• Could we use evolutionary computation to evolve algebraic constructions?
• Example: evolve secondary algebraic constructions that result in bent (max. nonlinearity) Boolean functions
  • take 4 existing bent functions of 4 inputs (easy to construct)
  • add two inputs
  • combine into a function with 6 inputs in total (slightly less easy to construct)
  • an example construction:
    • ((((v1 XNOR f0) OR (f3 AND f0)) XOR ((f1 XOR v0) XNOR v1)) AND2 ((v0 AND2 f2) AND2 ((f0 XNOR f3) XOR (f1 AND2 v1))))
• optimize the evolved construction to maximize nonlinearity
• and show that it holds for any existing input functions!
• and for any number of inputs!
  • empirically proven for up to 24 inputs
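
How a candidate construction could be checked for bentness is sketched below. The slides do not define the AND2 primitive, so the evolved construction is not reproduced; the sketch instead plugs in the textbook direct sum f0 XOR (v0 AND v1), which is known to preserve bentness. The helper names and the `rule` signature are assumptions made for this illustration.

```python
import math


def walsh_hadamard(truth_table):
    """Fast Walsh-Hadamard transform of a 0/1 truth table."""
    spectrum = [1 - 2 * bit for bit in truth_table]
    step = 1
    while step < len(spectrum):
        for start in range(0, len(spectrum), 2 * step):
            for i in range(start, start + step):
                a, b = spectrum[i], spectrum[i + step]
                spectrum[i], spectrum[i + step] = a + b, a - b
        step *= 2
    return spectrum


def nonlinearity(truth_table):
    peak = max(abs(w) for w in walsh_hadamard(truth_table))
    return len(truth_table) // 2 - peak // 2


def is_bent(truth_table):
    """Bent: every Walsh coefficient has absolute value 2^(n/2)."""
    flat = math.isqrt(len(truth_table))
    return all(abs(w) == flat for w in walsh_hadamard(truth_table))


def construct(functions, rule):
    """Combine four n-input truth tables and two new inputs v0, v1."""
    size = len(functions[0])
    table = []
    for index in range(4 * size):
        x = index % size
        v0 = (index // size) & 1
        v1 = (index // (2 * size)) & 1
        f0, f1, f2, f3 = (f[x] for f in functions)
        table.append(rule(v0, v1, f0, f1, f2, f3))
    return table


def direct_sum(v0, v1, f0, f1, f2, f3):
    """Stand-in rule (not the evolved one): f0 XOR (v0 AND v1)."""
    return f0 ^ (v0 & v1)


# 4-input bent function x0*x1 XOR x2*x3, with input i encoded by its bits.
BENT4 = [((i & 1) & (i >> 1 & 1)) ^ ((i >> 2 & 1) & (i >> 3 & 1))
         for i in range(16)]
BENT6 = construct([BENT4] * 4, direct_sum)
print(nonlinearity(BENT4), nonlinearity(BENT6), is_bent(BENT6))
```

An evolved rule would be passed to `construct` in place of `direct_sum`, and `nonlinearity` of the result would serve as the fitness.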

• Boolean function construction evolution example

Problem 2: S-boxes
• natural extension from the Boolean function case
• S-boxes (Substitution Boxes): vectorial Boolean functions
• often used in block ciphers as a source of nonlinearity
• design problem: much more difficult!
• an S-box of dimension m x n has m inputs and n output Boolean functions

[Figure: 2 x 2 S-box]

S-box properties
• many properties of interest: balancedness, high nonlinearity, low δ-uniformity, high algebraic degree, etc.
• there are properties that algebraic constructions do not consider
  • properties related to side-channel resistance will usually have poor values if S-boxes are built with algebraic constructions
• the task: evolve S-boxes that have good side-channel resistance while keeping the other properties optimal
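
Of the listed properties, δ-uniformity is easy to compute directly from a lookup table. The sketch below assumes the S-box is given as a list of 2^m output values; the PRESENT S-box is used only as a familiar 4x4 example and is not mentioned in the talk.

```python
def differential_uniformity(sbox):
    """Largest number of inputs x with S(x) XOR S(x XOR a) = b, over a != 0 and all b."""
    size = len(sbox)            # assumed to be a power of two
    worst = 0
    for a in range(1, size):
        counts = {}
        for x in range(size):
            b = sbox[x] ^ sbox[x ^ a]
            counts[b] = counts.get(b, 0) + 1
        worst = max(worst, max(counts.values()))
    return worst


def is_bijective(sbox):
    """For m = n, bijectivity is equivalent to balancedness."""
    return sorted(sbox) == list(range(len(sbox)))


IDENTITY_SBOX = list(range(16))
PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

print(differential_uniformity(IDENTITY_SBOX))
print(differential_uniformity(PRESENT_SBOX))
```

The identity mapping gives the worst possible value (every input pair with difference a maps to output difference a), while the PRESENT S-box is commonly cited as 4-uniform. Lower values are better.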

S-boxes side channel related properties
• Transparency order: cryptographic property of S-boxes introduced by Prouff in 2005
  • the higher the transparency order, the lower the S-box resistance to DPA attacks
  • new definition in 2015!
• Confusion coefficient
  • low confusion coefficient values (also referred to as high collision values) make side-channel attacks harder, i.e. they may require a greater number of traces or a higher SNR to yield the correct key candidate

S-box properties
• we are also interested in implementation properties like power, area, and latency
• algebraic constructions usually do not consider such properties
• evolve S-boxes with good cryptographic properties that are hardware-friendly
• multiobjective problems (trade-offs between different properties)

S-box optimization
• when m = n, we can represent S-boxes as permutations, i.e., with all values between 0 and 2^n - 1 (where n is the dimension of the S-box)
  • the S-box is always bijective and we do not need to concern ourselves with the balancedness property
• when m > n, permutation encoding is not adequate
• GP to the rescue: m x n S-box represented as n independent trees
  • good results when m >> n
• another variant: CGP – Cartesian GP
  • instead of a tree, the solution is represented as a graph
  • offers multiple outputs – natural mapping to S-box
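
The permutation encoding for m = n can be sketched as follows. Swap mutation is one common operator for permutations and is assumed here for illustration; the talk does not say which operators were used.

```python
import random


def random_sbox(n_bits, rng):
    """Random n x n S-box as a permutation of 0 .. 2^n - 1."""
    values = list(range(2 ** n_bits))
    rng.shuffle(values)
    return values


def swap_mutation(sbox, rng):
    """Exchange two table entries; the result is still a permutation."""
    child = list(sbox)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def is_bijective(sbox):
    return sorted(sbox) == list(range(len(sbox)))


rng = random.Random(0)
parent = random_sbox(4, rng)
child = swap_mutation(parent, rng)
print(is_bijective(parent), is_bijective(child))
```

Since every operator maps permutations to permutations, bijectivity (and with it balancedness) never has to be checked or repaired during the search.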

CGP structure
• resulting genotype: 0 0 1 1 0 0 1 3 1 2 0 1 0 4 4 2 5 4 2 5 7 3
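
Since the figure for this slide is lost, the layout of the genotype has to be guessed. The sketch below assumes the common CGP layout: two program inputs (indices 0 and 1), six nodes of three genes each (function, first input, second input) numbered from 2, and four output genes at the end. The function set (0 = AND, 1 = OR, 2 = XOR) is hypothetical.

```python
GENOTYPE = [0, 0, 1,  1, 0, 0,  1, 3, 1,  2, 0, 1,  0, 4, 4,  2, 5, 4,
            2, 5, 7, 3]

# Hypothetical function set, indexed by the function gene.
FUNCTIONS = [lambda a, b: a & b, lambda a, b: a | b, lambda a, b: a ^ b]


def run_cgp(genotype, inputs, n_outputs, functions):
    """Evaluate all nodes in order, then read the output genes."""
    n_nodes = (len(genotype) - n_outputs) // 3
    values = list(inputs)                       # indices 0 .. n_inputs - 1
    for node in range(n_nodes):
        func, in_a, in_b = genotype[3 * node: 3 * node + 3]
        values.append(functions[func](values[in_a], values[in_b]))
    return [values[gene] for gene in genotype[-n_outputs:]]


def active_nodes(genotype, n_inputs, n_outputs):
    """Nodes that actually contribute to some output."""
    stack = [g for g in genotype[-n_outputs:] if g >= n_inputs]
    active = set()
    while stack:
        node = stack.pop()
        if node in active:
            continue
        active.add(node)
        start = 3 * (node - n_inputs)
        _, in_a, in_b = genotype[start: start + 3]
        stack.extend(src for src in (in_a, in_b) if src >= n_inputs)
    return sorted(active)


print(run_cgp(GENOTYPE, [1, 0], 4, FUNCTIONS))
print(active_nodes(GENOTYPE, 2, 4))
```

Under these assumptions node 6 is not connected to any output. Such inactive genes are a characteristic feature of CGP genotypes.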

CGP structure

(CGP example)

Cellular automata defined S-boxes
• another approach: evolve S-boxes in the form of cellular automata (CA) rules
  • also used in practice (e.g. in Keccak)
• GP evolves a Boolean function that is used as a local CA rule
  • example rule: v_i(t+1) = v_i(t) OR v_{i-1}(t)
• better results than permutation and basic GP/CGP for some S-box sizes (5x5, 6x6, 7x7)
• additional benefit: optimize for a smaller number of gates (smaller area, power, latency)

[Figure: CA cell states v0 v1 v2 v3; rows as extracted: t+1 = 1101, t = 0100]
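
Turning a local CA rule into an S-box lookup table can be sketched as below. Periodic boundary conditions and the bit ordering (cell i is bit i of the state) are assumptions. Two rules are shown: the OR rule from the slide, and the rule of the χ step of Keccak, a ^ (NOT b AND c), which is a standard example of a bijective 5-cell rule.

```python
def ca_sbox(n_cells, rule, offsets):
    """Lookup table of the n x n S-box obtained by applying `rule` to every cell."""
    table = []
    for state in range(2 ** n_cells):
        cells = [(state >> i) & 1 for i in range(n_cells)]
        next_state = 0
        for i in range(n_cells):
            # Periodic boundary: neighbour indices wrap around.
            neighbourhood = [cells[(i + off) % n_cells] for off in offsets]
            next_state |= rule(*neighbourhood) << i
        table.append(next_state)
    return table


def or_rule(left, centre):
    """Slide example: v_i(t+1) = v_i(t) OR v_{i-1}(t)."""
    return centre | left


def chi_rule(a, b, c):
    """Keccak chi: a XOR ((NOT b) AND c) on cells i, i+1, i+2."""
    return a ^ ((1 - b) & c)


OR_BOX = ca_sbox(4, or_rule, offsets=(-1, 0))
CHI_BOX = ca_sbox(5, chi_rule, offsets=(0, 1, 2))

print(len(set(OR_BOX)), len(set(CHI_BOX)))
```

The OR rule does not give a bijective S-box (several states collapse onto the all-ones state), whereas the χ rule on 5 cells does. In the evolutionary setting, GP would supply `rule` and the resulting table would be scored on properties such as nonlinearity and differential uniformity.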

• example evolved rule: ((v2 NOR NOT(v4)) XOR v1)

[Figure: 5x5 rule with optimal nonlinearity and differential uniformity]

• CA rule evolution

[Figure: Evolved CA rule for the 5x5 S-box]

Problem 3: Security application – intrusion detection
• Intrusion detection: process of monitoring the events occurring in a computer system or network and analyzing them for intrusions
  • intrusions: attempts to bypass the security mechanisms of a computer or network
• common approaches: supervised/unsupervised classification (machine learning)
• GP can be used in classification, with either
  • decision tree classifier
  • regression tree classifier
• our example: use regression GP as a one-class classifier

Digression: Symbolic regression
Problem example: physical process modelling
• electronic circuit response model (Arbitrary-Angle Unmitered Microstrip Bend)
• short term load forecasting
• cryptographic element response
• if the model is known:
  • choose/optimize model parameters
  • somewhat simpler problem
• what if the model is not known?
  • surrogate model: neural net, SVM, expert system...
• building a model with genetic programming → symbolic regression

The symbolic regression problem
• task: discover the symbolic form of the model
  • no assumptions about the unknown function! (right…)

[Figure: plot of f(x) against x, with x from -10 to 10 and f(x) from -20 to 20; given x, what is f(x)?]

Symbolic regression with GP
• evolve individuals using arithmetic functions as tree elements
• individuals (models) are evaluated on input data
  • measures: MSE, RMSE, MAE, MAPE...
• many uses and application examples
• popular software packages: Eureqa, DTReg, different solvers
• example: SRM application
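
The listed error measures can be written out in a few lines, using their standard definitions. The sample data and the candidate model x*x are made up for this illustration; MAPE is assumed to be reported in percent and requires nonzero target values.

```python
import math


def mse(targets, predictions):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def rmse(targets, predictions):
    """Root mean squared error."""
    return math.sqrt(mse(targets, predictions))


def mae(targets, predictions):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def mape(targets, predictions):
    """Mean absolute percentage error, in percent (targets must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical data sampled from x*x + 1, scored against the candidate x*x.
XS = [0, 1, 2, 3]
TARGETS = [x * x + 1 for x in XS]
PREDICTIONS = [x * x for x in XS]

print(mse(TARGETS, PREDICTIONS), rmse(TARGETS, PREDICTIONS),
      mae(TARGETS, PREDICTIONS), mape(TARGETS, PREDICTIONS))
```

In symbolic regression, one of these values, computed for the expression encoded by an individual, is the fitness to be minimized.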

One-class classification for intrusion detection
• use regression GP as a one-class classifier
• assumption: only 'normal' class data available for training
• learn a model (function) that forces the output into a certain output range
  • e.g. [1, 2], [4, 5], [8, 9]; same range for all normal examples
  • also, penalize 'trivial' models
  • reward the use of all features
• test the model on unseen data containing anomalies (intrusions)
  • outputs falling outside the defined range are classified as anomalies!
• results: comparable to one-class SVM (mainly)
  • median of F1 measure ~0.82 (this weekend)
  • improve reliability with ensembles
• under heavy construction
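
The range-based decision rule can be sketched as follows. The `model` function is a hand-written stand-in for an evolved expression, and the two features (packet count, failed logins) are invented for the example. The penalties for trivial models and the reward for using all features are not modelled here.

```python
def range_fitness(model, normal_samples, low, high):
    """Fraction of 'normal' training samples whose output misses [low, high].

    Lower is better; 0.0 means every normal sample lands inside the range.
    """
    misses = sum(1 for sample in normal_samples
                 if not low <= model(sample) <= high)
    return misses / len(normal_samples)


def classify(model, sample, low, high):
    """Outputs outside the target range are reported as anomalies."""
    return "normal" if low <= model(sample) <= high else "anomaly"


def model(sample):
    """Hypothetical stand-in for an evolved regression expression."""
    packets, failed_logins = sample
    return 1.0 + packets / 100.0 + failed_logins


NORMAL_TRAINING = [(20, 0), (50, 0), (80, 0)]
LOW, HIGH = 1.0, 2.0

print(range_fitness(model, NORMAL_TRAINING, LOW, HIGH))
print(classify(model, (30, 5), LOW, HIGH))
```

During evolution, `range_fitness` would be evaluated for every individual on the normal-only training set, and `classify` would be applied afterwards to unseen traffic.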

Available software
How do I test all these examples myself?
• ECF – Evolutionary Computation Framework
• ECF is a C++ framework intended for the application of any type of evolutionary computation:
  • http://ecf.zemris.fer.hr/
• project concerning evolutionary computation and cryptology:
  • http://evocrypt.zemris.fer.hr/ (under constant development)

Instead of a conclusion...
• EC has proved successful in cryptography:
  • when no other, specialized approaches exist
  • to include new properties of interest to optimize
  • to assess the quality of some other method
  • to produce "good-enough" solutions
• not a magic wand – requires both experience and some knowledge of the problem to produce competitive results
• outlook: combination with machine learning approaches for security applications

Acknowledgements
Thanks to:
• Stjepan Picek
• Annelie Heuser
• EC team at FER Zagreb

References
• General
  • Evolutionary Computation Framework (http://ecf.zemris.fer.hr/)
  • EC group at FER/UNIZG (http://gp.zemris.fer.hr/)
• Boolean functions
  • Picek, Stjepan; Carlet, Claude; Guilley, Sylvain; Miller, Julian F.; Jakobović, Domagoj. Evolutionary Algorithms for Boolean Functions in Diverse Domains of Cryptography // Evolutionary Computation. 24 (2016), 4; 667-694
  • Picek, Stjepan; Jakobović, Domagoj; Miller, Julian; Batina, Lejla; Čupić, Marko. Cryptographic Boolean functions: One output, many design criteria // Applied Soft Computing. 40 (2016); 635-653
  • Picek, Stjepan; Batina, Lejla; Jakobović, Domagoj. Evolving DPA-Resistant Boolean Functions // Lecture Notes in Computer Science. 8672 (2014); 812-821
  • Picek, Stjepan; Jakobović, Domagoj. Evolving Algebraic Constructions for Designing Bent Boolean Functions // Proceedings of the Genetic and Evolutionary Computation Conference GECCO 2016. ACM; 781-788
  • Picek, Stjepan; Marchiori, Elena; Batina, Lejla; Jakobović, Domagoj. Combining Evolutionary Computation and Algebraic Constructions to Find Cryptography-Relevant Boolean Functions // Lecture Notes in Computer Science. 8672 (2014); 822-831

• S-boxes
  • Picek, Stjepan; Čupić, Marko; Rotim, Leon. A New Cost Function for Evolution of S-boxes // Evolutionary Computation. 24 (2016), 4; 695-718
  • Picek, Stjepan; Jakobović, Domagoj; Miller, Julian; Batina, Lejla. Cartesian Genetic Programming Approach for Generating Substitution Boxes of Different Sizes // Genetic and Evolutionary Computation Conference GECCO 2015. 1457-1458
  • Picek, Stjepan; Ege, Baris; Papagiannopoulos, Kostas; Batina, Lejla; Jakobović, Domagoj. Optimality and beyond: The case of 4×4 S-boxes // IEEE International Symposium on Hardware-Oriented Security and Trust (HOST 2014). 80-83
  • Picek, Stjepan; Ege, Baris; Batina, Lejla; Jakobović, Domagoj; Chmielewski, Lukasz; Golub, Marin. On Using Genetic Algorithms for Intrinsic Side-channel Resistance: The Case of AES S-box // Proceedings of the First Workshop on Cryptography and Security in Computing Systems. ACM; 13-18
  • Picek, Stjepan; Mariot, Luca; Jakobović, Domagoj; Leporati, Alberto. Evolving S-boxes Based on Cellular Automata with Genetic Programming // GECCO 2017 (accepted)
  • Picek, Stjepan; Mariot, Luca; Jakobović, Domagoj; Yang, Bohan; Mentens, Nele. Design of S-boxes Defined with Cellular Automata Rules // MAL-IoT 2017 (accepted)
  • Chakraborty, K.; Sarkar, S.; Maitra, S.; Mazumdar, B.; Mukhopadhyay, D.; Prouff, E. Redefining the transparency order // Coding and Cryptography, International Workshop, 2015