24
An Optimized S-Box Circuit Architecture for Low Power AES Design IBM Japan Ltd. Tokyo Research Laboratory Sumio Morioka and Akashi Satoh

IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

An Optimized S-Box Circuit Architecture for Low Power

AES Design

IBM Japan Ltd.Tokyo Research Laboratory

Sumio Morioka and Akashi Satoh

Page 2: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

� Background

� Power Analysis of Conventional S-Boxes

� Multi-Stage PPRM S-Box for Low-Power H/W

� Conclusion

Contents

Page 3: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Background

Page 4: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Back Ground

� In 2001, NIST selected Rijndael as the new symmetric key standard cipher AES

� AES H/W will be integrated in various applications � Low-power feature is important not only in low-end

applications but also in high-end servers� A 10-Gbps AES H/W chip consumes several Watts

in 0.13-µm CMOS � Need for power analysis and development of low-

power architectures for AES H/W

Page 5: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Power Analysis Method

PowerCalculation

� Power analysis is based on timing simulation� Gate switching including dynamic hazard is evaluated� Quite accurate estimation compared with static

analysis

TimingSimulation

BUFFER 0101010101MUX21I 0011001101XOR2 0100101101NAND2N1 0100101010XNOR2 1111000111XOR3 0101010100XOR2 1110001111NAND2N1 0111000111XOR3 0011100011

Wave Form

Test Vector

DQ

Synthesis

Net Listlibrary ieee;use ieee.std_logic;

entity XNOR is port (A : in std_logic; B : in std_logic; Y : out std_logic;end XNOR;

architecture RTL of XNOR isbegin signal T : std_logic; T <= A xor B; Y <= not T;end RTL

VHDL Code

Page 6: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

� 128-bit bus 11-round Loop Architecture� Table-lookup-based S-Box using SOP logic

Power Analysis in AES H/W

128

AddRoundKey

Data Input

Data Output

Data Register

SubBytes

ShiftRows

MixColumns

3:1

2:11 %

10 %

1 %

75 %

15%OthersOthers

Page 7: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

For Low-Power AES H/W

� Reducing S-Box power is most effective� There are various S-Box H/W architectures� Which architecture is suitable for low power ?

� Investigate performance of all conventional S-Boxes

� Develop a low-power S-Box architecture

Objectives

Page 8: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Power Analysis ofConventional S-Boxes

Page 9: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

S-Box Definition

� Nonlinear byte substitution function� Multiplicative Inversion on GF(28) + affine

transformation.

1 0 0 0 1 1 1 11 1 0 0 0 1 1 1 1 1 1 0 0 0 1 11 1 1 1 0 0 0 11 1 1 1 1 0 0 00 1 1 1 1 1 0 00 0 1 1 1 1 1 00 0 0 1 1 1 1 1

1 1000110

+-1a=b

Affine trans.8x8 XOR matrix

InversionComplicated math. logic

Page 10: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

S-Box Architectures

� Generate black box circuit from a truth table� High-speed� Rather large� S-Box and S-Box-1 cannot be merged

� Various mathematical techniques can be applied� Rather slow� Compact implementation� S-Box and S-Box-1 can be merged

GF inverter + affine

Direct mapping

Page 11: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

GF Inverter + Affine� Apply hieratical structure of composite field GF(((22)2)2)

� Map elements on GF(28) onto GF(((22)2)2) by isomorphism� Each component is constructed using AND-XOR logic

isomorphism

-1

affine trans.

isomorphism

inversion

����

����

GF((2

GF(2)

)2 )2GF(((2 )2 )2 )2

)2GF(2

GF(28)

GF(((22)2 2) )

GF(28)φφφφ2

2

2x-1x

2

2

4

4

2x-1x

4

4

λλλλ

isomorphism

-1

affine trans.

isomorphism

inversion

����

����

GF((2

GF(2)

)2 )2GF(((2 )2 )2 )2

)2GF(2

GF(28)

GF(((22)2 2) )

GF(28)merged

Page 12: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Direct Mapping

ANDmatrix

8-bit input

8-bit output

OR / XORmatrix

Severalhundred

wires

� 2-Level Logic� SOP (Sum of Products) : (NOT)-AND-OR� POS (Product of Sums) : (NOT)-OR-AND� PPRM (Positive Polarity Reed-Muller) : AND-XOR

Page 13: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Direct Mapping

in0 in1 in2 in3 in4 in5 in6 in7

out0

out7

out1

2:1MUX

0

1

0

1

in0 in1 in2 in3 in4 in5 in6 in7

out0

out7

out1

� Selector-Based Logic� BDD (Binary Decision Diagram)� Twisted BDD will appear at ICCD 2002 in Sep.

BDDTwisted

BDD

Page 14: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Power vs. Gate Count

� Smaller circuits do not always consume less power

0

50

100

150

200

250

300

350

400

0 0.5 1 1.5 2 2.5 3

Gate Count (Kgate)

Pow

er (u

W)

Composite Field SOP Table

Lookup

TwistedBDD

BDDPPRM

0.13-µm 1.5-V CMOS @ 10 MHzNominal conditions

POS

Page 15: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Analysis

� Power consumption of S-Box is greatly influenced by the number of dynamic hazards

� Differences of signal arrival times at each gate� Propagation probability of signal transitions

Caused by

Page 16: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Analysis

� Composite field S-Box consumes a lot of power in spite of having the smallest size

� It has many crossing and branched signal paths

2x-1x

λλλλ

Switch many times

Multiple signalpaths

Differences of signal arrival times at each gate

Page 17: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Analysis

� An XOR gate transfers signal transitions from input to output with probability 100%

� For AND, OR gates, the probability is 50%� So power for SOP is lower than PPRM

100% Probability

AND

OR

0

1

50% Probability

1

0 1

0

1XOR0

Propagation probability of signal transitions

Page 18: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Multi-Stage PPRM S-Box for Low-Power H/W

Page 19: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Approach for Low Power S-Box

0.5

ANDarray

XORarray

1.0Transition Probability

0.5 1.0 0.5 1.0 0.5 1.0

� Use composite field S-Box� Reduce gate counts

� Divide combination logic into multiple stages� Reduce probability of signal transitions

� Adjust the signal timing between each stage using 2-level logic� Reduce the number of dynamic hazards

Page 20: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Multi-Stage PPRM� Composite field S-Box is divided into several blocks� Each block is designed using PPRM logic suitable for

GF operations� Adjust signal timing by using 2-level (AND-XOR) logic

ANDarray

XORarray

28 ANDarray

XORarray

12 ANDarray

XORarray

324 4

4

4

4

4

Delay chain

88

4

4

2x-1x

4

4

-1����

����44

4

4

88 +affine

λλλλ

3-stage PPRM

Page 21: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Multi-Stage PPRM� PPRM S-Boxes can be divided many ways (1~6-stages)

4

4

2x-1x

4

4

λλλλ -1����

����44

4

4

88 +affine

XOR AND-XOR AND-XOR AND-XOR

AND-XOR AND-XOR AND-XOR

φφφφ2

2

2x-1x

2

2

Page 22: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Power vs. Gate Count� 3-stage PPRM is the most effective architecture� Multi-stage SOP is not suitable for GF operation

0

100

200

300

400

500

600

700

800

0 1 2 3 4 5 6 7Gate Count (Kgate)

Pow

er (u

W)

PPRMSOP

1-stage(normal PPRM)

2-

3-4-6-

2-

1-stage(normal SOP)

3-0.13-µm 1.5-V CMOS @ 10 MHzNominal conditions

Page 23: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

0

50

100

150

200

250

300

350

400

0 0.5 1 1.5 2 2.5 3

Gate Count (Kgate)

Pow

er (u

W)

Composite Field SOP Table

Lookup

TwistedBDD

BDDPPRM

3-stage PPRM

0.13-µm 1.5-V CMOS @ 10 MHzNominal conditions

POS

Power vs. Gate Count� 3-stage PPRM is the most effective architecture� Multi-stage SOP is not suitable for GF operation

1/3 power with 1/2 gates

Page 24: IBM Japan Ltd. AES Design · PPRM 3-stage PPRM 0.13-µ m 1.5-V CMOS @ 10 MHz Nominal conditions POS Power vs. Gate Count 3-stage PPRM is the most effective architecture Multi-stage

Conclusion

� The AES S-Box (SOP logic) consumes 75% of the power

� Dynamic hazards boost power needs of S-Boxes� A multi-stage PPRM architecture based on a

composite field S-Box was developed� 3-stage PPRM / SOP : Power = 1/3, Size = 1/2� This architecture can be applied to other S-Boxes

defined over Galois fields