Generic Side-Channel Countermeasures for Reconfigurable Devices
CHES 2011, Nara, Japan
Tim Güneysu, Amir Moradi
Horst Görtz Institute for IT-Security, Ruhr-University Bochum, Germany
29/09/2011
2 CHES 2011 | Nara, Japan | Tim Güneysu
Agenda
Introduction and Motivation
Design Proposals for FPGAs
Noise Generation
Clock Disalignment
Memory Masking
Evaluations
Conclusions
Introduction and Motivation
All cryptographic implementations need countermeasures (CM) against side-channel attacks
Designing and deploying a CM on a dedicated platform is costly
Development time (e.g., balanced routing for logic styles)
Execution time (e.g., additional time for random dummy cycles)
Physical resources (e.g., more logic for masked data paths/S-boxes)
For strong protection, several CMs need to be combined (and these are often very cipher-dependent)
Ideally: a set of generic and efficient CMs is available to establish (basic) SCA protection on a specific processing platform
This talk: proposing countermeasures for FPGAs
Generically usable with most (symmetric) cryptosystems
Applicable to many (Xilinx) FPGA devices
Predesigned as (hard) macros just to be added to an application
Portfolio of countermeasures:
FPGA-specific noise generators (using registers, memories, short circuits)
Clock disalignment using Digital Clock Managers (DCM)
Memory masking in dual-ported memories
Detector for input clock manipulations (prevent down-clocking)
Implementing Noise Generators in FPGAs
Common design: application including cryptographic core
Noise generation strategy
Configure remaining, routable slices (flip-flops) as cyclic shift registers
Preload sequence "01" into shift registers
Run noise generator in sync with crypto core
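[Figure: FPGA floorplan with the crypto core and application logic; the remaining flip-flops (FF) form shift-register chains preloaded with 1/0 and gated by a clock enable (CE)]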
Proposal #1: Using Shift Register LUTs (SRL)
Logic elements consist of LUTs and FFs
Special (alternative) LUT function: Shift Register LUT (SRL)
n-bit register length (n=16 or n=64)
[Figure: a slice with LUTs, carry logic, and flip-flops (D, Q, CE, PRE, CLR); a LUT configured as a Shift Register LUT with inputs D, CE, CLK, A[3:0] and output Q]
Preload SRL with "01" combination
Create r cyclic rings using s cascaded SRLs
SRLs are clocked according to free-running RNG
[Figure: r cyclic rings, each built from s cascaded SRLs (SRL 1, SRL 2, ..., SRL s) preloaded with alternating bits; an RNG drives the clock enables CE 1 ... CE r]
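The ring construction above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `SrlRing`, the function `noise_profile`, and the use of the per-cycle bit-toggle count as a stand-in for dynamic power are all hypothetical and not taken from the talk, and Python's pseudo-random generator merely stands in for the free-running hardware RNG.

```python
import random


class SrlRing:
    """One cyclic ring of cascaded SRLs, preloaded with the alternating "01" pattern."""

    def __init__(self, num_srls, srl_bits=16):
        length = num_srls * srl_bits
        # Alternating preload: 0, 1, 0, 1, ...
        self.bits = [i % 2 for i in range(length)]

    def shift(self):
        """Rotate the ring by one position and return how many bits toggled."""
        old = self.bits
        new = [old[-1]] + old[:-1]
        toggles = sum(a != b for a, b in zip(old, new))
        self.bits = new
        return toggles


def noise_profile(rings, cycles, rng):
    """Per clock cycle, let the RNG enable a random subset of rings (CE 1 ... CE r)."""
    profile = []
    for _ in range(cycles):
        toggles = 0
        for ring in rings:
            if rng.getrandbits(1):  # clock enable of this ring for this cycle
                toggles += ring.shift()
        profile.append(toggles)
    return profile


if __name__ == "__main__":
    rings = [SrlRing(num_srls=2) for _ in range(4)]
    print(noise_profile(rings, 10, random.Random(1)))
```

Because the preload alternates, every enabled shift flips every bit of the ring, which is why this pattern maximizes switching activity per enabled ring in the model.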
Proposal #2: Write Collisions in BRAMs
A write collision occurs when both ports concurrently write data to the same address of a dual-ported memory (BRAM)
Opposite driving directions in the inverter pair result in an uncertain outcome [GP09,G10]
Likely to exhibit higher power consumption
Idea for noise generation:
Create a write collision generator
Create collisions according to the output of an RNG
[Figure: write collision generator BN i: ports A and B of a BRAM (data width s = 36) write to the same address, with the write enables gated by CE from the RNG; r such units BN 1, BN 2, ..., BN r are enabled by CE 1, CE 2, ..., CE r]
Proposal #3: Short Circuits in FPGAs
Short circuits (SC) can be created in the FPGA's routing network [BKT10]
SCs in output multiplexers of switch boxes
Power restrictions limit currents to < 100 µA
Establishing controlled SCs requires manual routing (via XDL)
[Figure: CLB with slices and a switch box; an output multiplexer whose configuration connects inputs carrying 1 and 0 to the same output, creating a short circuit (SC)]
Controlled SC with 3 CLBs
[Figure: Design of a Controlled Short Circuit, showing the switch box and the active slice driving 1 against 1/0]
Package controlled SC into hard macro
Instantiate r controlled SC units on FPGA
Distribute SCs among different power domains to distribute load
[Figure: SC unit i built from SLICE 1, SLICE 2, SLICE 3 and a switch box, with inputs I A, I B and CLK; units SC 1, SC 2, ..., SC r receive I A,1 / I B,1 ... I A,r / I B,r from the RNG]
Proposal #4: Clock Disalignment using DCMs
Digital Clock Managers (DCM) support concurrent phase-shift channels
Clock buffers can be configured as glitch-free clock multiplexers
Cascading clock muxes results in a randomly delayed, phase-shifted clock
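[Figure: two DCMs (DCM A, and DCM B with an additional +45° phase shift) derive eight phases of CLKI: 0°, 90°, 180°, 270° and 45°, 135°, 225°, 315°; cascaded clock multiplexers A, B, C with RNG-driven selects S0, S1, S2 produce CLKCiph., while CLKFSM clocks the control logic]
[Figure: Clock Output Waveform showing CLKFSM, CLKI, A, B, C and CLKCiph.]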
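The effect of randomly selecting one of the eight phases can be sketched in software. This is a deliberately simplified model, not the circuit from the slides: it assumes the three select bits S2, S1, S0 directly index one of the eight phases, it ignores how the glitch-free multiplexers actually hand over between clocks, and the names `select_phase` and `edge_times_ns` are hypothetical.

```python
import random

# The eight phase-shifted copies of the input clock named on the slide.
PHASES_DEG = [0, 45, 90, 135, 180, 225, 270, 315]


def select_phase(s2, s1, s0):
    """Assumed mapping: the three select bits index one of the eight phases."""
    return PHASES_DEG[(s2 << 2) | (s1 << 1) | s0]


def edge_times_ns(num_cycles, clk_mhz, rng):
    """Rising-edge time of the cipher clock in each input-clock cycle (idealized)."""
    period = 1000.0 / clk_mhz  # clock period in nanoseconds
    times = []
    for cycle in range(num_cycles):
        s2, s1, s0 = (rng.getrandbits(1) for _ in range(3))
        phase = select_phase(s2, s1, s0)
        times.append(cycle * period + period * phase / 360.0)
    return times


if __name__ == "__main__":
    for t in edge_times_ns(5, 24.0, random.Random(3)):
        print(f"{t:8.2f} ns")
```

In this model each cipher operation lands at one of eight offsets within the clock period, so recorded traces no longer line up sample by sample, which is the misalignment the countermeasure aims for.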
Proposal #5: Data Masking with BRAMs
Round functions often have a linear and a non-linear part (S-box in memory)
CM: implement masking on data path
Implementation idea:
Push masking scheme into dual-ported memory (S-box)
Perform mask update by concurrent process
Simplification: use same random mask for (few) consecutive rounds (first-order SCA-resistant only!)
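[Figure: round structure without and with masking]
(Isolated) round $i$: $x_i \oplus k_i$ passes through the S-box $S(x)$ and the linear transformation $L(x)$, giving $y_i$
Masked round $i$: the S-box is replaced by $R(x) = S(x) \oplus m$; after $L(x)$ the output $y_i$ carries the mask $\hat{m} = L(m)$
Masked round $i+1$ (explicit correction): $x_{i+1}$ arrives masked with $\hat{m}$; $k_{i+1}$ and $\hat{m}$ are added before $R(x) = S(x) \oplus m$ and $L(x)$, so $y_{i+1}$ again carries $\hat{m}$
Masked round $i+1$ (correction folded into the table): only $k_{i+1}$ is added, and the table $Q(x) = R(x \oplus \hat{m})$ is used before $L(x)$
S-box in memory: $Q(x) = S(x \oplus L(m)) \oplus m$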
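The relation Q(x) = S(x ⊕ L(m)) ⊕ m can be checked with a toy model. The sketch below assumes a one-byte state, a randomly shuffled 8-bit table as a stand-in S-box, and a rotate-and-XOR map as a stand-in linear layer; none of these is the cipher from the talk, and all function names are hypothetical. It only illustrates why the mask stays consistent from round to round when the same m is reused.

```python
import random


def rotl8(x, r):
    """Rotate an 8-bit value left by r positions."""
    return ((x << r) | (x >> (8 - r))) & 0xFF


def L(x):
    """Toy GF(2)-linear transformation, so that L(a ^ b) == L(a) ^ L(b)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 3)


def make_sbox(seed=1):
    """Stand-in 8-bit S-box: a fixed pseudo-random permutation."""
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return table


def make_qbox(sbox, m):
    """Masked table stored in memory: Q(x) = S(x ^ L(m)) ^ m."""
    lm = L(m)
    return [sbox[x ^ lm] ^ m for x in range(256)]


def encrypt_plain(x, keys, sbox):
    """Unmasked reference: y = L(S(x ^ k)) in every round."""
    for k in keys:
        x = L(sbox[x ^ k])
    return x


def encrypt_masked(x, keys, sbox, m):
    """Same rounds, but the state always carries the mask L(m)."""
    qbox = make_qbox(sbox, m)
    lm = L(m)
    state = x ^ lm                 # mask the input once
    for k in keys:
        state = L(qbox[state ^ k])  # output is y ^ L(m) again
    return state ^ lm              # remove the mask at the very end


if __name__ == "__main__":
    sbox = make_sbox()
    keys = [0x2B, 0x7E, 0x15]
    print(encrypt_plain(0x42, keys, sbox) == encrypt_masked(0x42, keys, sbox, 0xA7))
```

The Q-box removes the incoming mask L(m) inside the lookup and applies m to the S-box output, so linearity of L turns m back into L(m) and no unmasked intermediate value appears on the data path in this model.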
Dual-ported BRAM allows simultaneous use and mask update of the Q-box
Active context (Q-box #1) is used by the cipher operation
Inactive context (Q-box #2) is re-masked by a concurrent process
Context switch once both the mask update and the cipher process have finished
[Figure: dual-ported BRAM holding two contexts. Context A (active S-box) is read by the cipher through ADDR A / OUT A; context B (S-box under scrambling) is rewritten through ADDR B / IN B / WE B by a scrambler fed with the RNG output, the mask m and L(x); an FSM selects the active context. Shown state: Q-box #1 (active, using m1), Q-box #2 (inactive, using m2)]
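The context switching can be modelled as double buffering. The following sketch is an assumption-laden illustration: the class `DualContextQBox`, the toy linear map, the shuffled stand-in S-box, and the sequential "rescramble, then switch" order (which in hardware would overlap with encryption on the second memory port) are all hypothetical, and a seeded pseudo-random generator replaces the hardware RNG.

```python
import random


def rotl8(x, r):
    return ((x << r) | (x >> (8 - r))) & 0xFF


def linear(x):
    """Toy GF(2)-linear layer standing in for L(x)."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 3)


class DualContextQBox:
    """Two masked tables in one memory: one is read while the other is rewritten."""

    def __init__(self, sbox, rng):
        self.sbox, self.rng = sbox, rng
        self.masks = [rng.randrange(256), rng.randrange(256)]
        self.banks = [self._build(m) for m in self.masks]
        self.active = 0

    def _build(self, m):
        lm = linear(m)
        return [self.sbox[x ^ lm] ^ m for x in range(256)]

    def lookup(self, x):
        return self.banks[self.active][x]      # port A: cipher reads

    def active_mask(self):
        return self.masks[self.active]

    def rescramble_inactive(self):
        idle = 1 - self.active                 # port B: scrambler writes
        self.masks[idle] = self.rng.randrange(256)
        self.banks[idle] = self._build(self.masks[idle])

    def switch(self):
        self.active = 1 - self.active


def encrypt_plain(x, keys, sbox):
    for k in keys:
        x = linear(sbox[x ^ k])
    return x


def encrypt_stream(blocks, keys, qbox):
    out = []
    for x in blocks:
        lm = linear(qbox.active_mask())
        state = x ^ lm
        for k in keys:
            state = linear(qbox.lookup(state ^ k))
        out.append(state ^ lm)
        qbox.rescramble_inactive()  # concurrent with encryption in hardware
        qbox.switch()               # only after both have finished
    return out
```

Each block in `encrypt_stream` is processed under a different mask, yet the results match the unmasked reference, which is the property the context switch has to preserve.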
[Animation: after each context switch the roles alternate. Next state: Q-box #2 (active, using m2), Q-box #3 (inactive, using m3), current mask m: F439AD0B8C…; following state: Q-box #3 (active, using m3), Q-box #4 (inactive, using m4), current mask m: 4E9A25C321…]
Evaluation based on AES T-Table Implementation
AES-128 T-Table implementation/128-bit data path (16 T-Tables, 21 cycles)
SASEBO board populated with Xilinx Virtex-II Pro FPGA (XC2VP7)
Measurement setup: differential probe on a LeCroy WP715Zi oscilloscope (1.5 GHz bandwidth, 2 GS/s)
Correlation Power Analysis (CPA) using Hamming Weight (HW) model
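[Figure: AES T-table data path. The 128-bit state is processed in columns 0 to 3; in each column, 8-bit inputs (IN 0 ... IN 3) address the tables T0, T1, T2, T3 held in BRAM, and their 32-bit outputs are combined with the round-key words k_i ... k_{i+3}]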
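A correlation power analysis with the Hamming weight model can be sketched on simulated data. Everything here is an assumption for illustration: the traces are synthetic single-sample leakages (Hamming weight of a table output plus Gaussian noise), the table is a shuffled stand-in rather than an AES T-table, and the function names are hypothetical. It does not reproduce the measurement setup or results of the talk.

```python
import random


def hamming_weight(v):
    return bin(v).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(sbox, key, n, noise_sigma, rng):
    """Synthetic leakage: HW of the table output plus Gaussian noise."""
    plaintexts = [rng.randrange(256) for _ in range(n)]
    traces = [hamming_weight(sbox[p ^ key]) + rng.gauss(0.0, noise_sigma)
              for p in plaintexts]
    return plaintexts, traces


def cpa_recover_key(sbox, plaintexts, traces):
    """Return the key-byte guess whose HW prediction correlates best."""
    best_key, best_corr = 0, -1.0
    for guess in range(256):
        hypothesis = [hamming_weight(sbox[p ^ guess]) for p in plaintexts]
        corr = abs(pearson(hypothesis, traces))
        if corr > best_corr:
            best_key, best_corr = guess, corr
    return best_key, best_corr


if __name__ == "__main__":
    rng = random.Random(2011)
    sbox = list(range(256))
    rng.shuffle(sbox)
    pts, traces = simulate_traces(sbox, 0x3C, 400, 1.0, rng)
    print(cpa_recover_key(sbox, pts, traces))
```

Raising `noise_sigma` in this model lowers the correlation of the correct guess, so more traces are needed before it stands out, which is the qualitative effect the noise generators aim for.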
Evaluations: CPA on individual CMs
Plain AES-128 @ 24 MHz:
10^4 measurements
3,000 traces required
Individual/all noise generators combined:
Parameters used: r=16 (instances), s=36 (width)
5×10^4 measurements
8,000 traces required
Clock disalignment
8 phase shift steps
10^7 measurements
3,000,000 traces required
Memory masking with dual-ported BRAMs
10^8 measurements
Not successful (using first-order attack)
Evaluations: Efficiency and Resources
To achieve higher SCA protection, combine several countermeasures
CMs are quite efficient (parameters used: r=16 (instances), s=36 (unit width))
(FF = Flip-Flop, LUT = Look-Up Table, CB = Clock Buffer, DCM = Digital Clock Manager, BRAM = Block RAM)
[Table: resource overhead for the individual countermeasures; values not preserved]
Conclusions
Proposed five generic countermeasures specific to FPGAs
(implemented using resources that are usually wasted otherwise)
Noise generators
Clock disalignment and manipulation detection
Memory masking using dual-ported BRAMs
Memory masking method provides solid protection against first-order attacks
Combining countermeasures might also provide protection against higher-order attacks (still needs to be evaluated!)
For third-party evaluation, PROM files for SASEBO are provided (available after next week at http://www.emsec.rub.de/research/publications)
The end. Thank you!