
On-The-Fly AES Key Expansion For All Key Sizes on ASIC

P. V. Sriniwas Shastry (1), M. S. Sutaone (2)

(1) Cummins College of Engineering for Women, Pune, (2) College of Engineering, Pune

pvs.shastry@cumminscollege.in

Abstract

This paper proposes the design and implementation of On-The-Fly (OTF) computation of the round keys of the Advanced Encryption Standard (AES) for all key sizes. The OTF architecture generates 128-bit round keys for input cipher key sizes of 128, 192 and 256 bits. The implementation targets 180nm CMOS technology using standard cell libraries. The key expansion unit is designed so that it can be used for both encryption and decryption in AES. The design was clocked at 179MHz and generates 128-bit round keys at a throughput of 22.912Gbps.

Key words: On-The-Fly Key Expansion, AES, Very Large Scale Integration (VLSI), All key sizes.

1. Introduction

Advanced Encryption Standard (AES) is a symmetric key, block cryptographic algorithm [1]. The rapidly growing need for secure data communication on mobile computing platforms and portable devices has led to increasing demand for hardware implementations of stronger encryption standards like AES. A hardware implementation of AES is more reliable and offers more security against attacks. The need for higher speed and higher security has led many researchers to implement cryptosystem algorithms on FPGA and ASIC platforms. Researchers have implemented AES using rolled, pipelined and sub-pipelined architectures. To date, several published AES implementations have targeted very low area, while others have targeted high throughput. Rolled architectures have resulted in minimum silicon area and low power, whereas pipelined architectures have achieved high throughput of several tens of Gbps. Further improvements were achieved in these same architectures by optimizing the substitute box and mix column operations of AES.

In OTF computation, the round keys required by the encryption or decryption block are computed in the key expansion unit without needing memory to store the keys [2]. Instead of dedicated key expansion units for different key lengths, an architecture which supports different key lengths, combined with key generation for encryption as well as decryption, can significantly reduce the hardware cost of full key length AES [3]. Computing the substitute byte on the fly with composite field arithmetic reduces the complexity of computing the multiplicative inverse in GF(2^8), which has further reduced power consumption and helped increase speed [2][3]. The implementation of the substitute byte function involves handling the nonlinearity of the multiplicative inverse computation of an input byte. The substitute byte operation is a byte function, hence an AES implementation with a 128-bit data path requires sixteen such functions operating concurrently. The substitute byte operation is also needed concurrently while performing the key expansion.

In this paper we present an OTF architecture for round key generation for all cipher key sizes. The substitute byte operation is also performed using a combinational circuit and hence does not require memory elements. The design uses limited resources, with merely one 256-bit register for all key sizes. The rest of the paper is organized as follows: Section 2 describes the key expansion unit, Section 3 presents our proposed architecture, and Section 4 gives the results and compares them with those of others. Lastly, Section 5 concludes this work.

2. Key Expansion for AES

The key expansion unit of AES takes a cipher key and conducts a key expansion routine to generate the round keys required, the number of which depends on the size of the original cipher key. The key expansion routine generates the 128-bit round keys required by the AddRoundKey operation of encryption or the InvAddRoundKey operation of decryption from a 128-bit, 192-bit or 256-bit input cipher key. The number of rounds (Nr) to be performed depends on the key size and is given in Table I. Nb is the number of 32-bit words in each round key. The key expansion unit performs the RotWord, SubWord and XOR with RCON operations. Each of these sub-operations is explained below.

The RotWord operation is a cyclic rotation of the bytes within a word to the left. This operation is applied only to the least significant word of the cipher key. Let the 4-byte word be represented as w[i], with i in the range 0 ≤ i < Nb(Nr+1); then the RotWord operation is performed on the word w[ik - 1], where the condition {ik mod Nk = 0} is satisfied. The value of Nk is 4, 6 or 8 for 128-bit, 192-bit or 256-bit cipher keys respectively.

P V Sriniwas Shastry et al, Int.J.Computer Technology & Applications, Vol 5 (1), 217-221. IJCTA | Jan-Feb 2014. ISSN: 2229-6093

SubWord is the SubstituteByte transformation applied independently to each byte of a word w[ik - 1]. For 128-bit and 192-bit cipher keys it is applied after the RotWord operation, where the condition {ik mod Nk = 0} is satisfied. For a 256-bit cipher key, the SubWord operation is performed on the word w[ik - 1] both where the condition {ik mod Nk = 0} is satisfied (after RotWord) and where the condition {ik mod Nk = 4} is satisfied (without RotWord).

RCON is the round constant word which is XORed with the substituted word after the SubWord operation. The values of the RCON array, [x^(i-1), {00}h, {00}h, {00}h], are defined for i, where the initial value of i is 1 and not 0. The values x^(i-1) are powers of x, where x is denoted as {02}h in GF(2^8). Every following word w[i] is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. Refer to Figure 1.
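The word recurrence described above can be sketched in Python. This is an illustrative software model that follows FIPS-197 [1], not the hardware design of this paper; the function names (gf_mul, sub_byte, expand_key) are assumptions chosen for this example. The S-box is computed rather than read from a table, loosely mirroring a memory-free SubWord, although it uses plain exponentiation in GF(2^8) rather than composite field arithmetic.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sub_byte(a):
    """AES S-box: multiplicative inverse (a^254, with 0 -> 0), then affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, a)
    y = 0x63
    for s in range(5):  # inv XOR its left rotations by 1..4 bits
        y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return y


def sub_word(w):
    """Apply sub_byte to each of the four bytes of a 32-bit word."""
    return int.from_bytes(bytes(sub_byte(b) for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    """Cyclic left rotation of a 32-bit word by one byte."""
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def expand_key(key):
    """Return the list of Nb*(Nr+1) 32-bit words for a 16-, 24- or 32-byte key."""
    nk = len(key) // 4          # 4, 6 or 8
    nr = nk + 6                 # 10, 12 or 14
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 1                    # x^(i-1), starting at i = 1
    for i in range(nk, 4 * (nr + 1)):
        t = w[i - 1]
        if i % nk == 0:
            t = sub_word(rot_word(t)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 2)
        elif nk == 8 and i % nk == 4:
            t = sub_word(t)     # extra SubWord, 256-bit keys only
        w.append(w[i - nk] ^ t)
    return w


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
print(f"{expand_key(key)[4]:08x}")  # a0fafe17, as listed in FIPS-197 Appendix A.1
```

Each consecutive group of four words in the returned list forms one 128-bit round key, regardless of the cipher key size.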

The key expansion may process 128, 192 or 256 bits in each iteration, but the round key supplied to the AddRoundKey operation in encryption or the InvAddRoundKey operation in decryption is always 128 bits wide. This is because the encryption or decryption data path is always 128 bits wide, while the width of the key expansion path differs for different key sizes.

Figure 1. Computations of key expansions

TABLE I. KEY EXPANSION COMBINATIONS

| Cipher Key Size | Number of Rounds (Nr) | Number of Words per expansion (Nk) | Number of Words per round key (Nb) |
|---|---|---|---|
| 128-bit | 10 | 4 | 4 |
| 192-bit | 12 | 6 | 4 |
| 256-bit | 14 | 8 | 4 |

The derivation of round keys from the expanded keys is illustrated in Figure 2. In all there are Ne key expansions, depending upon the key size, where the value of Ne is computed as shown in equation (1). Substituting the values of Nr, Nb and Nk from Table I, the value of Ne is 10, 8 and 7 for 128-bit, 192-bit and 256-bit keys respectively.

Ne = (Nr * Nb)/Nk (1)
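As a quick arithmetic check of equation (1), the values of Ne can be recomputed from the Table I parameters. The names below (KEY_PARAMS, expansions) are illustrative assumptions, not identifiers from the paper.

```python
NB = 4  # words per 128-bit round key (Table I)

# Cipher key size in bits -> (Nr, Nk), taken from Table I.
KEY_PARAMS = {128: (10, 4), 192: (12, 6), 256: (14, 8)}


def expansions(key_bits):
    """Number of key expansion iterations, Ne = (Nr * Nb) / Nk."""
    nr, nk = KEY_PARAMS[key_bits]
    return (nr * NB) // nk  # the division is exact for all three key sizes


for bits in (128, 192, 256):
    print(bits, expansions(bits))  # 128 10, 192 8, 256 7
```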

The round keys are required in the reverse order in the decryption data path. Hence the round keys expanded during encryption are normally stored in memory so that they can be retrieved in the reverse order during decryption.

Figure 2. Key expansion for different key sizes (a) 128-bit key (b) 192-bit key (c) 256-bit key

Computations of Figure 1:

For 128-bit and 192-bit keys:
w[i-1]* = SubWord(RotWord(w[i-1]))
w[i] = (w[i-1]* ⊕ RCON[i/Nk]) ⊕ w[i-Nk], for i mod Nk = 0
w[i] = w[i-1] ⊕ w[i-Nk], for other values of i

For 256-bit key:
w[i-1]* = SubWord(RotWord(w[i-1]))
w[i] = (w[i-1]* ⊕ RCON[i/Nk]) ⊕ w[i-Nk], for i mod Nk = 0
w[i] = SubWord(w[i-1]) ⊕ w[i-Nk], for i mod Nk = 4
w[i] = w[i-1] ⊕ w[i-Nk], for other values of i

3. Proposed Architecture for OTF Key Expansion

Our proposed architecture makes use of a 256-bit register, which temporarily holds the round keys. The size of the register is chosen so as to accommodate the expanded key for all three key sizes. The round keys are generated over multiple iterations, and after every iteration the 128-bit round key needed for the AddRoundKey operation is placed in the upper half of the register. As shown in Figure 3, a multiplexer swaps the key expanded in the earlier iteration so as to place the round key in the upper half of the 256-bit register.

A common architecture is designed for all three key sizes. The most critical part of the architecture is managing the different number of expansion iterations for each key size while keeping the round key size at 128 bits. We assume that the encryption and decryption data path is implemented using a rolled architecture, so that every clock event to the data path results in one round of encryption or decryption. Hence the key expansion unit also has to generate one round key per clock cycle, and this condition applies to all three key sizes.

As shown in Figure 1, specific words are processed with RotWord and SubWord and then XORed with RCON. The round key generation per clock cycle is based on the 128-bit key expansion procedure. In order to match the timing for different key sizes, the original key as well as the subsequent round keys are shuffled after every clock cycle. The advantage of data shuffling is that only four data processing elements are required to complete the key expansion for all three key sizes [7]. Figure 3 shows this arrangement and the key expansion architecture. In our architecture the control signals which select the multiplexer data lines are generated by a sequential machine, and no processor is employed, unlike in [7].

A round counter is maintained so as to generate the select lines for the multiplexers. In the case of 128-bit key expansion, each clock cycle generates one round key through one expansion iteration. Hence a total of 10 clock cycles are needed to generate the round keys using 128-bit expansion. In the case of 192-bit key expansion, every three clock cycles generate three round keys through two expansion iterations, therefore we require 12 clock cycles. While expanding 256-bit keys, every two consecutive clock cycles generate two round keys through one expansion iteration, resulting in a total of 14 clock cycles. These iterations and their required number of clock cycles exactly match those of the encryption or decryption data path.
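The relation between round keys and expansion iterations described above can be illustrated with a short sketch. It only does index arithmetic on word positions and makes no claim about the multiplexer wiring of the proposed design; the function name round_key_blocks is an assumption for this example.

```python
NB = 4  # words per 128-bit round key


def round_key_blocks(nk):
    """For each round key 0..Nr, list the Nk-word blocks its four words come from.

    Block 0 is the cipher key itself; block e >= 1 is produced by the e-th
    expansion iteration.
    """
    nr = nk + 6
    return [
        sorted({word // nk for word in range(NB * r, NB * (r + 1))})
        for r in range(nr + 1)
    ]


for nk in (4, 6, 8):
    print(f"Nk={nk}:", round_key_blocks(nk))
```

For Nk = 4 each round key comes from exactly one iteration, for Nk = 8 each iteration supplies two round keys, and for Nk = 6 some round keys straddle two iterations, which is why three round keys consume two iterations.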

In Figure 3(a), the swapping of the words is shown for 192-bit and 256-bit key expansion. In the case of 128-bit key expansion, no swapping of words is needed and hence the data lines connect vertically straight down to the corresponding word. While performing 128-bit key expansion, the words w4, w8, w12, w16, etc. undergo the extra computations of RotWord, SubWord and XOR with RCON. Similarly, the words w6, w12, w18, w24, etc. in 192-bit expansion undergo the same extra computations as in 128-bit expansion. In the case of 256-bit expansion, the words w8, w16, w24, w32, ..., w56 undergo RotWord, SubWord and XOR with RCON, while the words w12, w20, w28, w36, ..., w52 undergo only the SubWord operation.

The word multiplexers in Figure 3(b) select either the initial input cipher key or the swapped data word from the previous key expansion iteration, based on the swapping strategy shown in Figure 3(a). The architecture also performs the reverse expansion of the round keys for the decryption data path.
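The paper does not spell out how the reverse expansion is computed. One common approach, sketched here as an assumption rather than as the authors' circuit, inverts the recurrence: since w[i] = w[i-Nk] ⊕ f(w[i-1]), the earlier word is w[i-Nk] = w[i] ⊕ f(w[i-1]). All names below are illustrative. The sketch fills a full list for clarity; a hardware unit would keep only the most recent Nk words.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sub_byte(a):
    """AES S-box: multiplicative inverse (a^254, with 0 -> 0), then affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):
            inv = gf_mul(inv, a)
    y = 0x63
    for s in range(5):
        y ^= ((inv << s) | (inv >> (8 - s))) & 0xFF
    return y


def sub_word(w):
    return int.from_bytes(bytes(sub_byte(b) for b in w.to_bytes(4, "big")), "big")


def rot_word(w):
    return ((w << 8) | (w >> 24)) & 0xFFFFFFFF


def rcon(j):
    """Round constant byte x^(j-1) in GF(2^8), for j >= 1."""
    r = 1
    for _ in range(j - 1):
        r = gf_mul(r, 2)
    return r


def reverse_expand(last_words, nk):
    """Rebuild all key schedule words from the final Nk words, working backwards."""
    total = 4 * (nk + 7)                     # Nb * (Nr + 1), with Nr = Nk + 6
    w = [0] * (total - nk) + list(last_words)
    for i in range(total - 1, nk - 1, -1):
        t = w[i - 1]                         # already known at this point
        if i % nk == 0:
            t = sub_word(rot_word(t)) ^ (rcon(i // nk) << 24)
        elif nk == 8 and i % nk == 4:
            t = sub_word(t)
        w[i - nk] = w[i] ^ t
    return w


# Final round key of the FIPS-197 128-bit example, used as the starting point.
last = [0xD014F9A8, 0xC9EE2589, 0xE13F0CC8, 0xB6630CA6]
print([f"{x:08x}" for x in reverse_expand(last, 4)[:4]])
```

Starting from the last round key of the FIPS-197 128-bit example, the first four recovered words are the original cipher key 2b7e1516 28aed2a6 abf71588 09cf4f3c.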

Figure 3. (a) Data swapping strategy (b) All key size key expansion architecture


In our proposed architecture, splitting the 256-bit data shuffling multiplexers of [7] into word multiplexers has reduced the power consumption, because the unselected multiplexers remain inactive, resulting in lower dynamic power consumption. The multiplexer input at port index '0' is for the cipher key given by the user. The input at port index '1' is for 128-bit expansion, the input at port index '2' is for 192-bit expansion and the input at port index '3' is for 256-bit key expansion.

The design was synthesized using Cadence RTL Compiler with 180nm standard cell libraries. The design was successfully clocked at 179MHz with a worst case slack of 495ps. Irrespective of the key size, every clock cycle generates one 128-bit round key, giving a throughput of 22.912Gbps. The throughput is calculated using equation (2).

Throughput = 128 * Clock Frequency (2)
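Equation (2) can be checked against the clock frequencies quoted in this paper. The sketch below is plain arithmetic; the function name throughput_gbps is an assumption for this example.

```python
ROUND_KEY_BITS = 128  # one round key is produced per clock cycle


def throughput_gbps(clock_mhz):
    """Throughput in Gbps = 128 bits * clock frequency (given in MHz)."""
    return ROUND_KEY_BITS * clock_mhz / 1000


# Clock frequencies (MHz) as reported in Table III for [4], [7] and this design.
for label, mhz in (("[4]", 66), ("[7]", 102), ("Ours", 179)):
    print(label, throughput_gbps(mhz))  # 8.448, 13.056, 22.912
```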

The synthesis results are presented in Table II. The physical layout on 180nm was performed using Cadence SoC Encounter. The complete design fits into an area of 61153 µm², with a core density of 70%.

4. Results and Comparison

We have implemented the OTF key expansion for all key sizes using TSMC 180nm cell libraries. We compare our implementation results in Table III. The design in [7] is a similar implementation, clocked at 102MHz and achieving approximately 13.056Gbps. The design in [8] also implemented an OTF key expansion unit, but only for the 128-bit key size. Another similar implementation for different key sizes was proposed in [4], but it was implemented on 250nm technology and consumed 26,639 gates, which is considerably higher than our gate count. The design proposed in [6] was also implemented on 180nm technology, but used a pipelined architecture for 128-bit OTF key expansion. This design also used a 32-bit data path and achieved 10.656Gbps.

5. Conclusion

We have presented a new optimization method for implementing On-The-Fly key expansion for all key sizes on 180nm technology: splitting the multiplexers into word multiplexers and keeping them inactive when not in use, particularly while 128-bit or 192-bit key expansion is performed. This has not only reduced the number of gates required but also reduced the dynamic power consumption.

6. References

[1] “Advanced Encryption Standard (AES)”, Federal Information Processing Standards Publications (FIPS PUBS) Publication 197, November 2001.

[2] Qingfu Cao, Shuguo Li, “A high throughput cost-effective ASIC implementation of the AES algorithm”, Proc. IEEE 8th International Conference on ASIC (ASICON) 2009, pp. 805-808.

[3] Po-Chun Lie, Chang Hsie-Chia, Chen-Yi Lee, “A 1.69Gbps area-efficient AES Crypto Core with compact on the fly key expansion unit”, Proc. ESSCIRC 2009, pp. 404-407.

[4] Chih-Pin Su, Chia-Lung Horng, Chih-Tsun Huang and Cheng-Wen Wu, “A configurable AES processor for enhanced security”, Proc. ASP-DAC 2005, pp. 361-366.

[5] Shen-Fu Hsiao, Ming-Chih Chen, Chia-Shin Tu, “Memory-free low cost designs of advanced encryption standard using common subexpression elimination for subfunctions in transformations”, IEEE Trans. Circuits and Systems-I: Regular Papers, Vol. 53, No. 3, March 2006, pp. 615-626.

[6] P Saravanan, N Renukadevi, G Swathi, P Kalpana, “A high-throughput ASIC implementation of configurable advanced encryption standard (AES)”, Proc. IJCA special issue on “Network Security and Cryptography” NSC, 2011.

Table II. Synthesis result

| Particulars | Values |
|---|---|
| Number of Standard Cell Instances | 3231 |
| Standard Cell Area | 16273 |
| Power dissipation | 1.79mW |
| Slack | 495ps |
| Clock Frequency | 179MHz |
| Physical Area (Physical Layout) | 61153 µm² |

Table III. Implementation Comparison

| Particulars | [4] | [7] | Ours |
|---|---|---|---|
| CMOS Technology | 250nm | 180nm | 180nm |
| Frequency (MHz) | 66 | 102 | 179 |
| Throughput (Gbps) | 8.448 | 13.056 | 22.912 |
| Gates | 26,639 | 26,639 | 16,284 |
| Key sizes | 128, 192 and 256-bit | 128, 192 and 256-bit | 128, 192 and 256-bit |
| Data path depth | 128 bits | 128 bits | 128 bits |


[7] Mao-Yin Wang, Chih-Pin Su, Chia-Lung Horng, Chen-Wen Wu, Chih-Tsun Huang, “Single and multi-core configurable AES architectures for flexible security”, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 4, April 2010, pp. 541-551.

[8] A Alma'aitah, Zine-Eddine Abid, “Area efficient-high throughput sub-pipelined design of the AES in CMOS 180nm”, Proc. 5th International Design and Test Workshop (IDT), 2010, pp. 31-36.
