SCAs against Embedded Crypto Devices · attackschool.di.uminho.pt/slides/slides_fxs1.pdf · 2014. 11....


UCL Crypto Group · Microelectronics Laboratory · SCAs against Embedded Crypto Devices - L1 1

SCAs against Embedded Crypto Devices

F.-X. Standaert

UCL Crypto Group, Université catholique de Louvain

Lecture 1 - Hardware Implementations

Outline

- Different types of computing devices
- Two key concepts
  - Hardware performance indicators
  - Implementation tradeoffs
- Technology scaling
- Design tradeoffs
- FPGAs
- Application to block ciphers
- Further readings

Different types of computing devices

- General-purpose computers (e.g. microprocessors)
  - Software-programmed
- Reconfigurable devices (e.g. FPGAs)
- Application-Specific Integrated Circuits (e.g. a dedicated AES chip)
  - Hard-coded
- Tradeoff: flexibility vs. performance

Sequential logic

- 1 cycle: read from memory, operate, store in memory
- The operation delay $T_{op}$ must be larger than the critical-path delay $T_{ph}$ (in s)
- Operation frequency $f_{op} = 1/T_{op}$ (in Hz)
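A minimal numeric sketch of this timing rule, assuming a hypothetical 10 ns critical path (not a value from the slides); `max_operation_frequency` and `meets_timing` are illustrative helper names:

```python
def max_operation_frequency(t_ph_s: float) -> float:
    """Upper bound on f_op (Hz): the clock period T_op must exceed the critical-path delay T_ph."""
    if t_ph_s <= 0:
        raise ValueError("critical-path delay must be positive")
    return 1.0 / t_ph_s


def meets_timing(t_op_s: float, t_ph_s: float) -> bool:
    """True if one clock period leaves enough time to read, operate and store."""
    return t_op_s > t_ph_s


# Hypothetical critical path of 10 ns.
t_ph = 10e-9
f_max = max_operation_frequency(t_ph)
print(f"f_op < {f_max / 1e6:.0f} MHz")
print(meets_timing(12e-9, t_ph), meets_timing(8e-9, t_ph))
```

With $T_{ph}$ = 10 ns, any clock slower than 100 MHz leaves enough time for one read-operate-store cycle; a 12 ns period passes and an 8 ns period fails.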

Abstraction levels (for memory & operations)

Hardware performance indicators

- Hardware cost (in gates, transistors or circuit size)
- Operation frequency (in Hz)
- Data throughput (in bit/s)
- Data latency (in clock cycles)
- Power and energy (in W and J)
  - Not equivalent, e.g.:
    - Power matters for RFID devices
    - Energy matters for battery-supplied devices
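To illustrate why power and energy are not equivalent, here is a sketch comparing two hypothetical designs (all numbers are made up): a byte-serial design taking 176 cycles per block and a round-based design taking 11. Energy per block is power multiplied by the time spent on that block, i.e. $P \cdot \text{latency} / f_{op}$. The `Design` class is an illustrative helper, not from the slides:

```python
from dataclasses import dataclass


@dataclass
class Design:
    """Hypothetical block-cipher design described by a few hardware performance indicators."""
    name: str
    block_bits: int        # bits processed per block
    latency_cycles: int    # clock cycles per block (no pipelining assumed)
    f_op_hz: float         # operation frequency
    power_w: float         # average power while encrypting

    def throughput_bps(self) -> float:
        # One block of block_bits every latency_cycles clock cycles.
        return self.block_bits * self.f_op_hz / self.latency_cycles

    def energy_per_block_j(self) -> float:
        # Energy = power x time spent on one block.
        return self.power_w * self.latency_cycles / self.f_op_hz


# Made-up numbers: a small serialized design vs. a larger round-based design.
small = Design("small", 128, latency_cycles=176, f_op_hz=10e6, power_w=1e-3)
large = Design("large", 128, latency_cycles=11, f_op_hz=10e6, power_w=5e-3)

for d in (small, large):
    print(d.name, f"{d.power_w * 1e3:.0f} mW", f"{d.energy_per_block_j() * 1e9:.1f} nJ/block")
```

Here the serialized design draws less power (1 mW vs. 5 mW) but spends more energy per block (17.6 nJ vs. 5.5 nJ), so which one is preferable depends on whether the device is power- or energy-constrained.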

Implementation tradeoffs

- $T_{ph} \propto \frac{L_D \cdot C_L \cdot V_{dd}}{I_{on}}$, with:
  - $L_D$ the operation logic depth (in gates)
  - $C_L$ the load capacitance (in F)
  - $V_{dd}$ the circuit supply voltage
  - $I_{on}$ the MOSFET drain current in the ON state

“$T_{ph}$ decreases with larger $V_{dd}$”
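The reason is that $I_{on}$ grows faster than linearly with $V_{dd}$. A sketch using one common model, the alpha-power law $I_{on} \propto (V_{dd} - V_{th})^{a}$ (an assumption here, not from the slides; the exponent is written $a$ to avoid a clash with the activity factor $\alpha$). The values $V_{th}$ = 0.3 V and $a$ = 1.3 are illustrative only:

```python
def relative_delay(v_dd: float, l_d: int = 10, c_l: float = 1.0,
                   v_th: float = 0.3, a: float = 1.3) -> float:
    """T_ph up to a constant: L_D * C_L * V_dd / I_on, with I_on ~ (V_dd - V_th)**a.

    v_th and a are illustrative assumptions (alpha-power-law model), not measured values.
    """
    if v_dd <= v_th:
        raise ValueError("model only valid above threshold")
    i_on = (v_dd - v_th) ** a
    return l_d * c_l * v_dd / i_on


for v in (0.6, 0.8, 1.0, 1.2):
    print(f"V_dd = {v:.1f} V -> relative T_ph = {relative_delay(v):.1f}")
```

Because the exponent exceeds 1, the numerator $V_{dd}$ grows more slowly than $I_{on}$, so the computed delay shrinks monotonically as $V_{dd}$ rises.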

Implementation tradeoffs (II)

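- Sources of power and energy consumption:
  - $P_{tot} = P_{dyn} + P_{stat}$
  - $P_{dyn} \propto N_{gates} \cdot C_L \cdot V_{dd}^2 \cdot f_{op} \cdot \alpha \cdot (1 + \beta_{sc})$
    - $\alpha$: activity factor, $\beta_{sc}$: short-circuit factor
  - $P_{stat} \propto I_{leak} \cdot V_{dd}$
    - with $I_{leak}$ increasing with smaller $V_{dd}$

“Minimum energy best trades $P_{dyn}$ and $P_{stat}$” (here with $T_{op} = T_{ph}$, the energy per operation being $E = P_{tot} \cdot T_{op}$)

⇒ there is a frequency/energy tradeoff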
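A toy sketch of the minimum-energy point, in arbitrary units. It assumes (not from the slides) the alpha-power-law delay $T_{ph} \propto V_{dd}/(V_{dd}-V_{th})^{a}$ with illustrative $V_{th}$ and $a$, a dynamic energy per operation $\propto C \cdot V_{dd}^2$, and a constant $I_{leak}$ (a simplification; the slide notes it actually grows at smaller $V_{dd}$). Static energy per operation still grows at low $V_{dd}$ because each operation takes longer:

```python
def energy_per_op(v_dd: float, v_th: float = 0.3, a: float = 1.3,
                  c_eff: float = 1.0, i_leak: float = 1.0) -> float:
    """Energy of one operation (arbitrary units) with T_op = T_ph.

    Toy assumptions: T_ph ~ V_dd / (V_dd - V_th)**a, E_dyn ~ C_eff * V_dd**2,
    E_stat = P_stat * T_op ~ I_leak * V_dd * T_op, with I_leak held constant.
    """
    t_op = v_dd / (v_dd - v_th) ** a
    e_dyn = c_eff * v_dd ** 2
    e_stat = i_leak * v_dd * t_op
    return e_dyn + e_stat


# Sweep the supply voltage from just above V_th up to 1.2 V in 10 mV steps.
grid = [0.35 + 0.01 * i for i in range(86)]
best_v = min(grid, key=energy_per_op)
print(f"minimum-energy V_dd ~ {best_v:.2f} V (toy model)")
```

The sweep finds an interior minimum: lowering $V_{dd}$ first saves dynamic energy, then static energy takes over as operations slow down.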

Technology scaling

- $P_{dyn}$ dominates in older technologies (down to 0.1 µm)
- $P_{stat}$ becomes significant in nanoscale devices
- Inter-device variability also increases with scaling!

Design tradeoffs

- Resource sharing, e.g. with the AES ByteSub (SubBytes) step
  - Low-cost design: 1 S-box, 16 cycles
  - Fast design: 16 S-boxes, 1 cycle
- Low cost implies more control logic ⇒ less efficient
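A behavioral sketch of the two datapaths that counts clock cycles only; it does not model the extra multiplexing and control logic the shared version needs. The S-box is the standard AES one, computed from its FIPS-197 definition (inversion in GF(2^8) followed by an affine map); the function names are illustrative:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254 (0 maps to 0, as in the AES S-box)."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r if a else 0


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def aes_sbox(x: int) -> int:
    """AES S-box: field inversion followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [aes_sbox(x) for x in range(256)]


def sub_bytes_shared(state):
    """Low-cost datapath: one S-box instance, reused once per clock cycle."""
    out, cycles = [], 0
    for byte in state:          # one byte per cycle through the single S-box
        out.append(SBOX[byte])
        cycles += 1
    return out, cycles


def sub_bytes_parallel(state):
    """Fast datapath: 16 S-box instances, all bytes in one clock cycle."""
    return [SBOX[byte] for byte in state], 1


state = list(range(16))
print(sub_bytes_shared(state)[1], sub_bytes_parallel(state)[1])
```

Both versions return the same bytes; the shared version trades 15 S-box instances for 15 extra cycles.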

Design tradeoffs (II)

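- Inner pipelining, e.g. with the AES round
  - Frequency: ideally $f_{op} \times 2$ (usually worse in practice)
  - Latency: 11 → 22 (cycles)
  - Throughput? (128 bits / 11 cycles) · $f_{op}$ → (256 bits / 22 cycles, i.e. two blocks in flight) · $f'_{op}$, with ideally $f'_{op} = 2 f_{op}$ ⇒ ideally ×2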
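A sketch of the throughput arithmetic above, with a hypothetical base frequency of 100 MHz. The 2× frequency gain is the ideal case from the slide; the last computation shows that without it the doubled latency cancels the benefit of having two blocks in flight:

```python
def throughput_bps(blocks_in_flight: int, block_bits: int,
                   latency_cycles: int, f_op_hz: float) -> float:
    """Bits per second when `blocks_in_flight` blocks complete every `latency_cycles` cycles."""
    return blocks_in_flight * block_bits * f_op_hz / latency_cycles


f_op = 100e6  # hypothetical frequency of the unpipelined round loop

# One register stage per round: 1 block in flight, 11 cycles per block.
base = throughput_bps(1, 128, 11, f_op)
# Inner pipelining: 2 stages per round, so 2 blocks in flight and 22 cycles;
# ideally the halved critical path doubles the clock frequency.
piped = throughput_bps(2, 128, 22, 2 * f_op)
# If the critical path does not shrink at all, the gain vanishes.
piped_no_speedup = throughput_bps(2, 128, 22, f_op)

print(f"{base / 1e9:.2f} -> {piped / 1e9:.2f} Gbit/s (ideal), {piped_no_speedup / 1e9:.2f} Gbit/s (no speedup)")
```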

Design tradeoffs (III)

- Further improvements of the throughput ($f_{op}$ fixed):
  - Parallelism (left) is less efficient than outer pipelining (right)

FPGAs

- “Sea” of programmable logic blocks
- Connected with programmable routing
- Functionality determined by configuration bits
- Different technologies
  - 0.18 µm → 45 nm
- Several manufacturers

FPGAs (II)

FPGAs (III)

- Logic blocks
  - From 3-input Look-Up Tables... to 8-bit Arithmetic and Logic Units
  - The granularity of the device influences both the design performance and the configuration time
- Routing blocks
  - Structured according to the interconnect length
  - Major impact on the final performance
- Embedded blocks
  - Memories, multipliers, ...
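A behavioral sketch (not any vendor's actual logic-block circuit) of how configuration bits determine functionality: a k-input LUT is a $2^k$-bit memory addressed by its inputs, and two 4-input LUTs plus a 1-bit MUX driven by a fifth input can realize any 5-input function (Shannon decomposition). The 5-input majority function and all helper names are arbitrary examples:

```python
from typing import Callable, List, Sequence


def lut_config(func: Callable[..., int], k: int) -> List[int]:
    """Configuration bits of a k-input LUT: the truth table of func, indexed by the input bits."""
    bits = []
    for index in range(2 ** k):
        inputs = [(index >> i) & 1 for i in range(k)]  # input i = bit i of the address
        bits.append(func(*inputs) & 1)
    return bits


def lut_eval(config: Sequence[int], inputs: Sequence[int]) -> int:
    """A LUT is just a small memory: the inputs form the address of one configuration bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return config[address]


def two_luts_and_mux(lut_lo: Sequence[int], lut_hi: Sequence[int], inputs: Sequence[int]) -> int:
    """Two 4-input LUTs plus a 1-bit MUX (selected by the 5th input) give any 5-input function."""
    *low4, select = inputs
    return lut_eval(lut_hi if select else lut_lo, low4)


def majority5(a, b, c, d, e):
    return int(a + b + c + d + e >= 3)


# Shannon decomposition on the 5th input: one LUT for e = 0, one for e = 1.
lo = lut_config(lambda a, b, c, d: majority5(a, b, c, d, 0), 4)
hi = lut_config(lambda a, b, c, d: majority5(a, b, c, d, 1), 4)

ok = all(
    two_luts_and_mux(lo, hi, [(n >> i) & 1 for i in range(5)]) == majority5(*[(n >> i) & 1 for i in range(5)])
    for n in range(32)
)
print(len(lo), len(hi), ok)
```

Reprogramming the 32 bits in `lo` and `hi` changes the function without touching the "hardware", which is the essence of FPGA configuration.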

(How to use) FPGAs (IV)

- Compared to ASICs: fabrication and packaging are replaced by configuration (i.e. sending a programming file to the chip to determine the functionality of its “gates”)

FPGAs (V)

- Example: Xilinx logic block

Application to block ciphers

- Target FPGA 1 has logic blocks (LB1) made of:
  - Two 4-input LUTs
  - One 1-bit MUX to combine the LUTs
  - Two registers
- Target FPGA 2 has logic blocks (LB2) made of:
  - Four 6-input LUTs
  - Three 1-bit MUXes to combine the LUTs
  - Four registers
- Embedded memory, with each block (MB) made of:
  - A 4096-bit RAM memory
  - Dual-ported (i.e. 2 R/W operations per cycle)
  - Configurable (4096 × 1, 2048 × 2, ...)

S-box implementations

- “Minimum memory” cost (in bits) of S1/S2? { . . . }
- Cost of S1/S2 in LB1/LB2? { . . . }
- Would you use the memory to implement S1/S2?
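S1 and S2 are defined in a figure that is not reproduced here, so the following is only a generic counting sketch for an n-bit to m-bit S-box, evaluated on hypothetical 4-bit and 8-bit cases. The LUT count is a naive upper bound from Shannon decomposition (each output bit treated as an arbitrary n-input function); synthesis tools may do better on structured S-boxes. Helper names are illustrative:

```python
def min_memory_bits(n_in: int, m_out: int) -> int:
    """Size of the full lookup table of an n-bit to m-bit S-box: 2^n words of m bits."""
    return (2 ** n_in) * m_out


def naive_lut_count(n_in: int, m_out: int, k: int) -> int:
    """Upper bound on k-input LUTs for an n-to-m S-box, using Shannon decomposition.

    Each output bit is an arbitrary n-input function; when n > k it splits into
    2^(n-k) k-input LUTs, recombined by multiplexers (counted separately below).
    """
    per_output_bit = 2 ** max(0, n_in - k)
    return m_out * per_output_bit


def naive_mux_count(n_in: int, m_out: int, k: int) -> int:
    """2-to-1 MUXes recombining the LUTs of each output bit (a binary tree per bit)."""
    return m_out * (2 ** max(0, n_in - k) - 1)


for n, m in ((4, 4), (8, 8)):
    print(f"{n}-to-{m} S-box: {min_memory_bits(n, m)} bits, "
          f"{naive_lut_count(n, m, 4)} 4-LUTs (+{naive_mux_count(n, m, 4)} MUXes), "
          f"{naive_lut_count(n, m, 6)} 6-LUTs (+{naive_mux_count(n, m, 6)} MUXes)")
```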

Block cipher design

- Consider an AES-like cipher with the following round:
- To be implemented in FPGA 1 with S-box S2
- With MixColumn in 256 LUTs and a logic depth of 2 LUTs
- And the full cipher iterating 11 rounds

Block cipher design

- What is the cost of one round in LUTs?
- Design and evaluate the cost (in LUTs and registers) of:
  - A 1-round loop architecture without pipeline
  - A 1-round loop architecture with maximum pipeline
- What is the latency (in cycles) of these architectures?
- Assume $T_{LUT}$ = 10 ns: what is the throughput achieved by these architectures (in bit/s)?
  - Is this assumption realistic (physically speaking)?
- “Ideally”, what would happen if we move to a 2-round loop architecture, or a 32-bit loop architecture?
- { . . . }
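A generic model for exploring these questions, under loudly stated assumptions: the round depth of 4 LUTs and the 128-bit block are hypothetical (the real values depend on S2 and on the round diagram, not reproduced here), routing and register delays are ignored, and pipeline stages are perfectly balanced. Only $T_{LUT}$ = 10 ns and the 11 rounds come from the exercise; `loop_architecture` is an illustrative helper:

```python
import math


def loop_architecture(round_depth_luts: int, pipeline_stages: int, n_rounds: int,
                      block_bits: int, t_lut_s: float):
    """Latency, frequency and throughput of a 1-round loop architecture (idealized).

    Assumptions: the round's critical path is round_depth_luts * t_lut_s (routing and
    register delays ignored), registers split it into `pipeline_stages` balanced
    stages, and `pipeline_stages` independent blocks are processed concurrently.
    """
    stage_depth = math.ceil(round_depth_luts / pipeline_stages)
    f_op = 1.0 / (stage_depth * t_lut_s)
    latency_cycles = n_rounds * pipeline_stages
    throughput = pipeline_stages * block_bits * f_op / latency_cycles
    return latency_cycles, f_op, throughput


T_LUT = 10e-9          # from the exercise
ROUND_DEPTH = 4        # hypothetical logic depth of one round, in LUTs
for stages in (1, ROUND_DEPTH):  # no pipeline vs. one register after every LUT level
    latency, f_op, tp = loop_architecture(ROUND_DEPTH, stages, n_rounds=11,
                                          block_bits=128, t_lut_s=T_LUT)
    print(f"{stages} stage(s): latency {latency} cycles, "
          f"f_op {f_op / 1e6:.0f} MHz, throughput {tp / 1e6:.0f} Mbit/s")
```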

Examples

- FPGA implementations of the AES Rijndael

| Index | Enc./Dec. | Key sched. | Feedback? | Device | Architecture |
|---|---|---|---|---|---|
| 1 | E only | on-the-fly | no | Virtex-E | 128-bit unrolled |
| 2 | E only | on-the-fly | no | Virtex-E | 128-bit loop |
| 3 | E/D | precomputed | yes | Virtex-II | 32-bit loop |
| 4 | E/D | precomputed | yes | Spartan-II | 8-bit loop |
| 5 | E/D | precomputed | yes | Spartan-II | PicoBlaze |

| Index | LUTs | Regs. | Slices | RAMBs | Freq. | Throughput |
|---|---|---|---|---|---|---|
| 1 | 3516 | 3840 | 2784 | 100 | 92 MHz | 11.7 Gbit/s |
| 2 | 3846 | 2517 | 2257 | 0 | 169 MHz | 2 Gbit/s |
| 3 | 288 | 113 | 146 | 3 | 123 MHz | 358 Mbit/s |
| 4 | - | - | 124 | 2 | 67 MHz | 2.2 Mbit/s |
| 5 | - | - | 119 | 2 | 90 MHz | 710 kbit/s |
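A quick consistency check on the table, assuming 128-bit AES blocks: row 1 should deliver one block per cycle, and for row 2 we assume 11 cycles per block (the round count used earlier in this lecture; the table does not state it). For row 3, the code derives the cycles per block implied by the reported figures rather than assuming them. Helper names are illustrative:

```python
def unrolled_throughput(block_bits: int, f_op_hz: float) -> float:
    """Fully unrolled and pipelined: one block enters and one leaves every cycle."""
    return block_bits * f_op_hz


def loop_throughput(block_bits: int, f_op_hz: float, cycles_per_block: int) -> float:
    """Loop architecture: one block every cycles_per_block cycles."""
    return block_bits * f_op_hz / cycles_per_block


def implied_cycles_per_block(block_bits: int, f_op_hz: float, throughput_bps: float) -> float:
    """Cycles per block implied by a reported frequency and throughput."""
    return block_bits * f_op_hz / throughput_bps


print(unrolled_throughput(128, 92e6) / 1e9)         # row 1, Gbit/s
print(loop_throughput(128, 169e6, 11) / 1e9)        # row 2, Gbit/s (11 cycles assumed)
print(implied_cycles_per_block(128, 123e6, 358e6))  # row 3, cycles per block
```

Row 1 gives 11.776 Gbit/s and row 2 about 1.97 Gbit/s, matching the table's rounded values; row 3 implies roughly 44 cycles per block.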

Summarizing

- Specialized hardware implementations (ASICs, FPGAs) can be used to reach high performance
- Many different metrics exist (cost, speed, ...)
- Hardware design optimization (e.g. sharing, pipelining) depends on algorithmic features
- Technology scaling can have a high impact too!

Further readings

- International Technology Roadmap for Semiconductors, http://www.itrs.net/
- F. Rodríguez-Henríquez, N.A. Saqib, A. Díaz Pérez, Ç.K. Koç, Cryptographic Algorithms on Reconfigurable Hardware, Springer, 2007.
- H. Kaeslin, Digital Integrated Circuit Design, Cambridge University Press, 2008.
- J.M. Rabaey, Digital Integrated Circuits: A Design Perspective, second edition, Prentice Hall, 2003.

Thanks
