On-The-Fly AES Key Expansion For All Key Sizes on ASIC
P. V. Sriniwas Shastry (1), M. S. Sutaone (2)
(1) Cummins College of Engineering for Women, Pune; (2) College of Engineering, Pune
Abstract
This paper proposes the design and implementation of On-The-Fly (OTF) computation of the round keys of the Advanced Encryption Standard (AES) for all key sizes. The OTF architecture generates round keys of 128 bits each for input cipher key sizes of 128, 192 and 256 bits. The implementation targets 180nm CMOS technology using standard cell libraries. The key expansion unit is designed so that it can be used for both encryption and decryption in AES. The design was clocked at 179MHz to generate 128-bit round keys at a throughput of 22.912Gbps.
Key words: On-The-Fly Key Expansion, AES, Very Large Scale Integration (VLSI), All key sizes.
1. Introduction
The Advanced Encryption Standard (AES) is a symmetric key, block cryptographic algorithm [1]. The rapidly growing need for secure data communication on mobile computing platforms and portable devices has led to increasing demand for hardware implementations of stronger encryption standards like AES. A hardware implementation of AES is more reliable and offers more security against attacks. The need for higher speed of operation and higher security has led many researchers to implement crypto-system algorithms on FPGA and ASIC platforms. Researchers have implemented AES using rolled, pipelined and sub-pipelined architectures. To date, several published AES implementations have targeted very low area, while others have targeted high throughput. Rolled architectures have resulted in minimum use of silicon area and low power, whereas pipelined architectures have achieved high throughput of several tens of Gbps. Still better results were achieved in these same architectures by optimizing the substitute box and mix column operations of AES.
The OTF computation of the round keys required by the encryption or decryption block is performed in the key expansion unit without needing memory to store the keys [2]. Instead of dedicated key expansion units for different key lengths, an architecture which supports different key lengths, combined with a key generating process for encryption as well as decryption, can significantly reduce the hardware cost of full key length AES [3]. Computing the substitute byte on the fly with composite field arithmetic reduces the complexity of computing the multiplicative inverse in GF(2^8), which has further reduced the power consumption and helped in increasing the speed [2][3]. The implementation of the substitute byte function involves handling the nonlinearity of the multiplicative inverse computation of an input byte. The substitute byte operation is a byte function; hence an AES implementation with a 128-bit data path requires sixteen such functions operating concurrently. The substitute byte operation is also needed concurrently while performing the key expansion.
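As a software reference for the substitute byte function described above, the following is a minimal sketch in Python. It computes the multiplicative inverse in GF(2^8) by direct exponentiation and then applies the AES affine transform from FIPS-197 [1]. It is not the composite field arithmetic used in hardware in [2][3]; the function names `gf_mul`, `gf_inverse` and `sub_byte` are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES field polynomial
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 as in AES."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1
        result = gf_mul(result, a)
    return result


def sub_byte(a):
    """Substitute one byte: field inverse followed by the affine transform."""
    inv = gf_inverse(a)
    out = inv
    for shift in range(1, 5):
        # XOR in the inverse rotated left by 1, 2, 3 and 4 bit positions
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


print(hex(sub_byte(0x53)))  # prints 0xed
```

A 128-bit data path would need sixteen instances of this byte function per round, plus four more for SubWord in the key expansion.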
In this paper we present an OTF architecture for round key generation for all cipher key sizes. The substitute byte operation is performed using a combinational circuit and hence does not require memory elements. The design uses limited resources, with merely one 256-bit register for all key sizes. The rest of the paper is organized as follows: Section 2 describes the key expansion unit, Section 3 presents our proposed architecture, and Section 4 gives the results and compares them with those of others. Lastly, Section 5 concludes this work.
2. Key Expansion for AES
The key expansion unit of AES takes a cipher key and runs a key expansion routine to generate the round keys required, whose number depends on the size of the original cipher key. The key expansion routine generates the 128-bit round keys required by the AddRoundKey operation of encryption or the InvAddRoundKey operation of decryption from a 128-bit, 192-bit or 256-bit input cipher key. The number of rounds (Nr) to be performed depends on the key size and is given in Table I. Nb is the number of 32-bit words in each round key. The key expansion unit performs the RotWord and SubWord operations and an XOR with RCON. Each of these sub-operations is explained below.
The RotWord operation is a cyclic rotation of the bytes within a word to the left. This operation is applied only to the least significant word of the cipher key. Let the 4-byte word be represented as w[i], with i in the range 0 ≤ i < Nb(Nr+1); the RotWord operation is then performed on the word w[ik-1], where the condition {ik mod Nk = 0} is satisfied. The value of Nk is 4, 6 or 8 for 128-bit, 192-bit or 256-bit cipher keys respectively.
P V Sriniwas Shastry et al., Int. J. Computer Technology & Applications, Vol 5 (1), 217-221
IJCTA | Jan-Feb 2014
ISSN: 2229-6093
SubWord is the SubstituteByte transformation applied independently to each byte of a word w[ik-1] after the RotWord operation is performed. The 256-bit cipher key is a special case: SubWord is applied to the word w[ik-1] both where the condition {ik mod Nk = 0} is satisfied (after RotWord) and where the condition {ik mod Nk = 4} is satisfied (without RotWord).
RCON is the round constant word which is XORed with the substituted word after the SubWord operation. The RCON array consists of the words [x^(i-1), {00}h, {00}h, {00}h], where the index i starts at '1' and not '0'. The values x^(i-1) are powers of x, where x is denoted as {02}h in GF(2^8). Every other word w[i] is equal to the XOR of the previous word, w[i-1], and the word Nk positions earlier, w[i-Nk]. Refer to Figure 1.
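The RotWord, SubWord and RCON steps described above can be put together as a software reference model. The following Python sketch follows the key expansion routine of FIPS-197 [1]; it stores every word in a list for clarity and is therefore not the on-the-fly hardware architecture of this paper. The names `expand_key`, `rot_word` and `sub_word` are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B  # reduce by the AES field polynomial
        b >>= 1
    return result


def sbox_entry(a):
    """SubstituteByte for one byte: field inverse, then the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):  # a^254 equals a^-1 in GF(2^8)
            inv = gf_mul(inv, a)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]


def rot_word(word):
    """Cyclic left rotation of a 32-bit word by one byte."""
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def sub_word(word):
    """Apply the S-box to each of the four bytes of a word."""
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def expand_key(key):
    """Return the words w[0] .. w[Nb*(Nr+1)-1] for a 16-, 24- or 32-byte key."""
    if len(key) not in (16, 24, 32):
        raise ValueError("key must be 128, 192 or 256 bits long")
    nk = len(key) // 4
    nr, nb = nk + 6, 4
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01  # x^(i-1) for i = 1
    for i in range(nk, nb * (nr + 1)):
        temp = w[i - 1]
        if i % nk == 0:
            temp = sub_word(rot_word(temp)) ^ (rcon << 24)
            rcon = gf_mul(rcon, 0x02)  # next power of x
        elif nk == 8 and i % nk == 4:
            temp = sub_word(temp)  # extra SubWord for 256-bit keys only
        w.append(w[i - nk] ^ temp)
    return w


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
words = expand_key(key)
print(f"{words[4]:08x}")  # prints a0fafe17
```

The key used here is the 128-bit example key from Appendix A.1 of FIPS-197, which lists a0fafe17 as w[4].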
The key expansion may process 128, 192 or 256 bits in each iteration, but the round key supplied to the AddRoundKey operation in encryption or the InvAddRoundKey operation in decryption is always 128 bits wide. This is because the encryption or decryption data path is always 128 bits wide, while the width of the key expansion path differs for different key sizes.
Figure 1. Computations of key expansions
TABLE I. KEY EXPANSION COMBINATIONS
Cipher Key Size   Number of Rounds (Nr)   Number of Words per expansion (Nk)   Number of Words per round key (Nb)
128-bit           10                      4                                    4
192-bit           12                      6                                    4
256-bit           14                      8                                    4
The derivation of round keys from the expanded keys is illustrated in Figure 2. In all there are Ne key expansions, depending upon the key size, where the value of Ne is computed as shown in equation (1). Substituting the values of Nr, Nb and Nk from Table I gives Ne = 10, 8 and 7 for 128-bit, 192-bit and 256-bit keys respectively.
Ne = (Nr * Nb)/Nk (1)
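Equation (1) can be checked with a few lines of arithmetic. The sketch below takes Nr, Nb and Nk from Table I; the dictionary `TABLE_I` and the function `expansions_needed` are illustrative names, not part of the original design.

```python
# (Nr, Nb, Nk) per cipher key size in bits, as listed in Table I
TABLE_I = {128: (10, 4, 4), 192: (12, 4, 6), 256: (14, 4, 8)}


def expansions_needed(key_bits):
    """Evaluate Ne = (Nr * Nb) / Nk from equation (1)."""
    nr, nb, nk = TABLE_I[key_bits]
    total_words = nr * nb  # words needed for the Nr round keys after the first
    if total_words % nk:
        raise ValueError("Nr * Nb is not a multiple of Nk")
    return total_words // nk


for bits in (128, 192, 256):
    print(bits, expansions_needed(bits))  # prints 128 10, 192 8, 256 7
```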
The round keys are required in the reverse order in the decryption data path. Hence the round keys expanded during encryption are normally stored in memory so that they can be retrieved in the reverse order during decryption.
Figure 2. Key expansion for different key sizes: (a) 128-bit key, (b) 192-bit key, (c) 256-bit key
Computations shown in Figure 1:
For 128-bit and 192-bit keys:
w[i-1]* = SubWord(RotWord(w[i-1]))
w[i] = (w[i-1]* ⊕ RCON[i/Nk]) ⊕ w[i-Nk]   for i mod Nk = 0
     = w[i-1] ⊕ w[i-Nk]                    for other values of i
For 256-bit key:
w[i-1]* = SubWord(RotWord(w[i-1]))
w[i] = (w[i-1]* ⊕ RCON[i/Nk]) ⊕ w[i-Nk]   for i mod Nk = 0
     = SubWord(w[i-1]) ⊕ w[i-Nk]           for i mod Nk = 4
     = w[i-1] ⊕ w[i-Nk]                    for other values of i

3. Proposed Architecture for OTF Key Expansion
Our proposed architecture makes use of a 256-bit register, which temporarily holds the round keys. The size of the register is chosen so as to accommodate the expanded key for all three sizes. The round keys are generated over multiple iterations, and after every iteration the 128-bit round key needed for the AddRoundKey operation is placed in the upper half of the register. As shown in Figure 3, a multiplexer swaps the key expanded in the earlier iteration so as to place the round key in the upper half of the 256-bit register.
A common architecture is designed for all three key sizes. The most critical part of the architecture is managing the different number of expansion iterations for each size while keeping the round key size at 128 bits. We assume that the encryption and decryption data path is implemented using a rolled architecture, so that every clock event to the data path results in one round of encryption or decryption. Hence the key expansion unit also has to generate one round key per clock cycle, and this condition applies to all three key sizes.
As shown in Figure 1, specific words are processed with RotWord, SubWord and then XOR with RCON. The round key generation per clock cycle is based on the 128-bit key expansion procedure. In order to match the timing for different key sizes, the original key as well as the subsequent round keys are shuffled after every clock cycle. The advantage of data shuffling is that only four data processing elements are required to complete the key expansion for the three key sizes [7]. Figure 3 shows this arrangement and the key expansion architecture. In our architecture the control signals which select the multiplexer data lines are generated by a sequential machine, and no processor is employed, unlike in [7].
A round counter is maintained so as to generate the select lines for the multiplexers. In the case of 128-bit key expansion, each clock cycle generates one round key through one expansion iteration. Hence a total of 10 clock cycles are needed to generate the round keys using 128-bit expansion. In the case of 192-bit key expansion, every three clock cycles generate three round keys through two expansion iterations; therefore 12 clock cycles are required. While expanding 256-bit keys, every two consecutive clock cycles generate two round keys through one expansion iteration, resulting in a total of 14 clock cycles. These iterations and their required numbers of clock cycles exactly match those of the encryption or decryption data paths.
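The relation between expansion iterations and round keys can be illustrated with simple index arithmetic. The sketch below assumes that round key r consists of the words w[4r] to w[4r+3] and that word w[i] belongs to expansion iteration i div Nk, with iteration 0 being the cipher key itself. The function name `round_key_sources` is hypothetical, and the model does not describe the register swapping of Figure 3.

```python
def round_key_sources(nk):
    """For each round key, list the expansion iterations its four words come from."""
    nr = nk + 6  # 10, 12 or 14 rounds for Nk = 4, 6 or 8
    schedule = []
    for r in range(nr + 1):
        word_indices = range(4 * r, 4 * r + 4)
        iterations = sorted({i // nk for i in word_indices})
        schedule.append((r, iterations))
    return schedule


for nk in (4, 6, 8):
    schedule = round_key_sources(nk)
    # One round key per clock cycle after the initial key: Nr cycles in total
    print(f"Nk={nk}: {len(schedule) - 1} clock cycles")
    print("  first rounds:", schedule[:4])
```

For Nk = 6 the pattern repeats every three round keys, which span two expansion iterations; for Nk = 8 every expansion iteration supplies two round keys.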
In Figure 3(a), the swapping of the words is shown for 192-bit and 256-bit key expansion. In the case of 128-bit key expansion, no swapping of words is needed and hence the data lines connect directly down to the corresponding words. While performing 128-bit key expansion, the words w4, w8, w12, w16, etc., require the extra computations of RotWord, SubWord and XOR with RCON. Similarly, the words w6, w12, w18, w24, etc., in 192-bit expansion require the same extra computations as in 128-bit expansion. In the case of 256-bit expansion, the words w8, w16, w24, w32, ..., w56 require RotWord, SubWord and XOR with RCON, while the words w12, w20, w28, w36, ..., w52 require only the SubWord operation.
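The word indices listed above follow directly from the conditions i mod Nk = 0 and, for 256-bit keys, i mod Nk = 4. A minimal sketch that enumerates them (the function name `special_words` is an illustrative assumption):

```python
def special_words(nk):
    """Return (indices needing RotWord+SubWord+RCON, indices needing SubWord only)."""
    nr = nk + 6
    total = 4 * (nr + 1)  # Nb * (Nr + 1) words in the expanded key
    full = [i for i in range(nk, total) if i % nk == 0]
    sub_only = [i for i in range(nk, total) if nk == 8 and i % nk == 4]
    return full, sub_only


for nk in (4, 6, 8):
    full, sub_only = special_words(nk)
    print(f"Nk={nk}: RotWord/SubWord/RCON at {full}, SubWord only at {sub_only}")
```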
The word multiplexers in Figure 3(b) select either the first input cipher key or the swapped data word from the previous key expansion iteration, based on the swapping strategy shown in Figure 3(a). The architecture also performs the reverse expansion of the round keys for the decryption data path.
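Reverse expansion is possible because the recurrence w[i] = w[i-Nk] ⊕ f(w[i-1]) can be solved for w[i-Nk] once w[i] and w[i-1] are known. The following Python sketch illustrates this in software, starting from the last Nk words of the expanded key. It is a reference model under that assumption and does not reproduce the multiplexer structure of Figure 3; the names `expand_forward`, `expand_reverse` and `g` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(a):
    """SubstituteByte for one byte: field inverse, then the affine map."""
    inv = 0
    if a:
        inv = 1
        for _ in range(254):  # a^254 equals a^-1 in GF(2^8)
            inv = gf_mul(inv, a)
    out = inv
    for shift in range(1, 5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out ^ 0x63


SBOX = [sbox_entry(a) for a in range(256)]
RCON = [0x00]  # index 0 is unused; RCON[i] holds x^(i-1)
value = 0x01
for _ in range(10):
    RCON.append(value)
    value = gf_mul(value, 0x02)


def rot_word(word):
    return ((word << 8) | (word >> 24)) & 0xFFFFFFFF


def sub_word(word):
    return int.from_bytes(bytes(SBOX[b] for b in word.to_bytes(4, "big")), "big")


def g(prev, i, nk):
    """Value that is XORed with w[i-Nk] to obtain w[i]."""
    if i % nk == 0:
        return sub_word(rot_word(prev)) ^ (RCON[i // nk] << 24)
    if nk == 8 and i % nk == 4:
        return sub_word(prev)
    return prev


def expand_forward(key_words):
    """Forward expansion from the Nk cipher key words."""
    nk = len(key_words)
    total = 4 * (nk + 7)  # Nb * (Nr + 1) with Nr = Nk + 6
    w = list(key_words)
    for i in range(nk, total):
        w.append(w[i - nk] ^ g(w[i - 1], i, nk))
    return w


def expand_reverse(last_words):
    """Reverse expansion from the last Nk words back to the cipher key."""
    nk = len(last_words)
    total = 4 * (nk + 7)
    w = [0] * (total - nk) + list(last_words)
    for i in range(total - 1, nk - 1, -1):
        # w[i] and w[i-1] are already known when index i is reached
        w[i - nk] = w[i] ^ g(w[i - 1], i, nk)
    return w


key_words = [0x2B7E1516, 0x28AED2A6, 0xABF71588, 0x09CF4F3C]
forward = expand_forward(key_words)
recovered = expand_reverse(forward[-4:])
print(recovered[:4] == key_words)  # prints True
```

Because each step needs only the current window of Nk words, a hardware unit can generate the decryption round keys in reverse order without storing the whole expanded key.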
Figure 3. (a) Data swapping strategy (b) All key size key expansion architecture
In our proposed architecture, splitting the 256-bit data shuffling multiplexers [7] into word multiplexers has reduced the power consumption, because the unselected multiplexers remain inactive, resulting in lower dynamic power consumption. The multiplexer input at the port indexed '0' is for the cipher key given by the user. The input at port '1' is for 128-bit expansion, the input at port '2' is for 192-bit expansion, and the input at port '3' is for 256-bit key expansion.
The design was synthesized using Cadence RTL Compiler with 180nm standard cell libraries. The design was successfully clocked at 179MHz with a worst-case slack of 495ps. Irrespective of the key size, every clock cycle generates one 128-bit round key, at a throughput of 22.912Gbps. The throughput is calculated using equation (2).
Throughput = 128 * Clock Frequency (2)
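Equation (2) can be evaluated with integer arithmetic for the clock frequencies reported in this paper. The sketch below works in MHz and Mbps to avoid rounding; the function name `throughput_mbps` is an illustrative assumption.

```python
def throughput_mbps(clock_mhz, bits_per_cycle=128):
    """Equation (2): one 128-bit round key per clock cycle."""
    return bits_per_cycle * clock_mhz


# Clock frequencies in MHz for [4], [7] and this work
for label, mhz in (("[4]", 66), ("[7]", 102), ("Ours", 179)):
    mbps = throughput_mbps(mhz)
    print(f"{label}: {mbps} Mbps = {mbps // 1000}.{mbps % 1000:03d} Gbps")
```

At 179MHz this gives 22912 Mbps, that is, 22.912Gbps.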
The synthesis results are presented in Table II. The physical layout design in 180nm was performed using Cadence SoC Encounter. The complete design fit into an area of 61153 µm², with a core density of 70%.
4. Results and Comparison
We have implemented the OTF key expansion for all key sizes using TSMC 180nm cell libraries. We compare our implementation results with others in Table III. The design in [7] is a similar implementation, clocked at 102MHz and achieving approximately 13.056Gbps. The design in [8] also implemented an OTF key expansion unit, but only for the 128-bit key size. Another similar implementation for different key sizes was proposed in [4], but it was implemented in 250nm technology and consumed 26,639 gates, considerably more than our gate count. The design proposed in [6] was also implemented in 180nm technology, but used a pipelined architecture for the 128-bit OTF key expansion. That design used a 32-bit data path and achieved 10.656Gbps.
5. Conclusion
We have presented a new optimization method for implementing On-The-Fly key expansion for all key sizes in 180nm technology: the multiplexers are split into word multiplexers, which are kept inactive when not in use, particularly while 128-bit or 192-bit key expansion is performed. This has reduced not only the number of gates required but also the dynamic power consumption.
6. References
[1] "Advanced Encryption Standard (AES)", Federal Information Processing Standards Publications (FIPS PUBS) Publication 197, November 2001.
[2] Qingfu Cao, Shuguo Li, "A high throughput cost-effective ASIC implementation of the AES algorithm", Proc. IEEE 8th International Conference on ASIC (ASICON) 2009, pp. 805-808.
[3] Po-Chun Lie, Chang Hsie-Chia, Chen-Yi Lee, "A 1.69Gbps area-efficient AES Crypto Core with compact on the fly key expansion unit", Proc. ESSCIRC 2009, pp. 404-407.
[4] Chih-Pin Su, Chia-Lung Horng, Chih-Tsun Huang and Cheng-Wen Wu, "A configurable AES processor for enhanced security", Proc. ASP-DAC 2005, pp. 361-366.
[5] Shen-Fu Hsiao, Ming-Chih Chen, Chia-Shin Tu, "Memory-free low cost designs of advanced encryption standard using common subexpression elimination for subfunctions in transformations", IEEE Trans. Circuits and Systems-I: Regular Papers, Vol. 53, No. 3, March 2006, pp. 615-626.
[6] P Saravanan, N Renukadevi, G Swathi, P Kalpana, "A high-throughput ASIC implementation of configurable advanced encryption standard (AES)", Proc. IJCA special issue on "Network Security and Cryptography" NSC, 2011.
Table II. Synthesis result
Particulars                          Values
Number of Standard Cell Instances    3231
Standard Cell Area                   16273
Power dissipation                    1.79mW
Slack                                495ps
Clock Frequency                      179MHz
Physical Area (Physical Layout)      61153 µm²
Table III. Implementation Comparison
Particulars          [4]                    [7]                    Ours
CMOS Technology      250nm                  180nm                  180nm
Frequency (MHz)      66                     102                    179
Throughput (Gbps)    8.448                  13.056                 22.912
Gates                26,639                 26,639                 16,284
Key sizes            128, 192 and 256-bit   128, 192 and 256-bit   128, 192 and 256-bit
Data path depth      128 bits               128 bits               128 bits
[7] Mao-Yin Wang, Chih-Pin Su, Chia-Lung Horng, Chen-Wen Wu, Chih-Tsun Huang, "Single and multi-core configurable AES architectures for flexible security", IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 4, April 2010, pp. 541-551.
[8] A Alma'aitah, Zine-Eddine Abid, "Area efficient-high throughput sub-pipelined design of the AES in CMOS 180nm", Proc. 5th International Design and Test Workshop (IDT), 2010, pp. 31-36.