Maximizing the hash function of authentication codes

IEEE POTENTIALS, MARCH/APRIL 2006. 0278-6648/06/$20.00 © 2006 IEEE

A DESIGN APPROACH to create small-sized, high-speed implementations of the keyed-hash message authentication code (HMAC) is the focus of this article. The goal of this approach is to increase the HMAC throughput to a level that can be used in modern telecommunication applications such as virtual private networks (VPNs) and the upcoming 802.11n. We focus on increasing the maximum operating frequency; compared to commercially available IP cores, the resulting improvement ranges from 30% to 390%. The proposed implementation doesn't introduce a significant area penalty. More specifically, the overall increase in lookup tables required by our implementation is less than 10% compared to that of other implementations.

Implementing hash functions

Hash functions are common and critical cryptographic primitives. Their primary application is combined use with public-key cryptosystems in digital signature schemes. Hash functions compress a string of arbitrary length to a string of fixed length. Their main purpose is to produce a fingerprint of a message or some other block of data that will provide a high level of security for communication protocols.

Implementing a hash function in hardware presents numerous advantages. Hardware implementations offer higher throughput than software and are thus better suited to high-speed applications. They operate without interruption, unlike software implementations in a multitasking environment. Hardware also provides a higher level of security than software against hacking attempts.

The most widespread functions are secure hash algorithm-1 (SHA-1) and message digest (MD5). These two hash functions are widely known for being used in the HMAC, which is used in numerous communication applications to address authentication issues.

The SHA-1 hash function is selected for the digital signature algorithm (DSA) as specified in the digital signature standard whenever a secure hash algorithm is required for federal applications. The SHA and MD-family hash functions are used widely in the field of communications, where, until recently, throughput of the cryptographic systems was not required to be high. However, since the adoption of the HMAC in IPSec, e-payment, and VPN applications, the throughput of the cryptographic system has to be as high as possible, especially for the server. In applications with high transmission and reception rates, any latency or delay in calculating the digital signature of the data packet decreases the network's quality of service. Software implementations present unacceptable performance for high-speed applications such as e-commerce, e-health, and video conferences. HMAC IP cores with poor performance and bulky implementations are currently on the market; the Intron and Ocean Logic implementations are examples.

These facts were a strong motivation to propose a novel hardware implementation of the HMAC. We aim to provide a low-cost design approach, compared to the solutions proposed by both academia and industry, to satisfy the requirements of the new communication applications. It introduces a negligible area penalty, increasing the throughput and keeping the area small enough for most portable communication devices. The main contribution of this work is the design approach to optimize performance without introducing extra area.

IOANNIS I. YIAKOUMIS, MARKOS E. PAPADONIKOLAKIS, HARRIS E. MICHAIL, ATHANASIOS P. KAKAROUNTAS, AND COSTAS E. GOUTIS

The HMAC algorithm

The HMAC standard defines a special mechanism that guarantees message authentication for transmission through a nonsecure communication channel. The main design approach for this mechanism is to use a cryptographic hash function (usually MD5 or SHA-1). The purpose of the HMAC is to authenticate both the source of a message and its integrity. The main parameters of the HMAC are the message input and the secret key, which is known only to the message originator and the intended receiver. The main function of the HMAC is the generation of a value (the MAC), formed by condensing the message input and the secret key. The MAC value is sent along with the message, and the receiver has to verify that the received message generates the received MAC value using the secret key, which is agreed upon by the message originator and the receiver. The final MAC value is given by the expression shown in (1), where text is the plain text of the message, K is the secret key, K0 is K appended with zeros to form a key of the hash function's block length, ipad and opad are predefined constants, ‖ is concatenation, and ⊕ is bitwise XOR.

$$\mathrm{HMAC}(K, \mathit{text}) = H\big((K_0 \oplus \mathit{opad}) \,\|\, H\big((K_0 \oplus \mathit{ipad}) \,\|\, \mathit{text}\big)\big) \tag{1}$$
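The nested construction of the HMAC can be illustrated in software. The following is a minimal sketch, assuming SHA-1 as the hash H, a 64-byte block length, and the pad constants 0x36 (ipad) and 0x5C (opad) from the HMAC standard; the function name `hmac_sha1` is hypothetical and is not part of the proposed hardware design.

```python
import hashlib
import hmac

BLOCK_BYTES = 64  # block length of SHA-1 and MD5, in bytes


def hmac_sha1(key: bytes, text: bytes) -> bytes:
    """Compute HMAC-SHA1 as two nested hash calls (illustrative sketch)."""
    # Keys longer than one block are hashed first, as the standard requires.
    if len(key) > BLOCK_BYTES:
        key = hashlib.sha1(key).digest()
    # K0: the key appended with zeros up to the block length.
    k0 = key.ljust(BLOCK_BYTES, b"\x00")
    k_ipad = bytes(b ^ 0x36 for b in k0)
    k_opad = bytes(b ^ 0x5C for b in k0)
    # Inner hash covers the message; outer hash covers the inner digest.
    inner = hashlib.sha1(k_ipad + text).digest()
    return hashlib.sha1(k_opad + inner).digest()


if __name__ == "__main__":
    mac = hmac_sha1(b"secret key", b"message to authenticate")
    print(mac.hex())
    # Cross-check against the Python standard library implementation.
    print(mac == hmac.new(b"secret key", b"message to authenticate", hashlib.sha1).digest())
```

The receiver repeats the same computation with the shared key and accepts the message only if the recomputed MAC matches the received one.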

Proposed HMAC implementation

The architecture of the proposed HMAC offers a significant benefit concerning the maximum achieved operating frequency. The critical path is located in the hash core block, where the hash functions are implemented. This allows the design effort to be focused on the hash core and the optimization of the hash functions' critical path. The two hash functions are presented next, along with some optimizations of the critical path. Solutions are offered for applications that require either HMAC-MD5, HMAC-SHA-1, or the combined HMAC-MD5-SHA-1 function.

SHA-1 hash function

The SHA-1 hash function is an iterative algorithm that requires 80 transformation steps to generate the final hash value or message digest (MD). In each transformation step, a hash operation is performed that takes as inputs five 32-b variables (a, b, c, d, e) and two extra 32-b words (one is the message schedule W_t that is provided by the padding unit, and the other word is a constant K_t predefined by the standard). The calculations taking place in each operation (clock cycle t) are described in (2), where ROTL^x(y) represents rotation of word y to the left by x b and f_t(z, w, v) represents the nonlinear function associated to clock cycle t

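$$
\begin{aligned}
e_t &= d_{t-1} \\
d_t &= c_{t-1} \\
c_t &= \mathrm{ROTL}^{30}(b_{t-1}) \\
b_t &= a_{t-1} \\
a_t &= \mathrm{ROTL}^{5}(a_{t-1}) + f_t(b_{t-1}, c_{t-1}, d_{t-1}) + e_{t-1} + K_t + W_t
\end{aligned}
\tag{2}
$$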

The nonlinear function f_t changes every 20 cycles. Thus, SHA-1 is divided into four rounds of 20 identical operations, based on the nonlinear function used. The hash value resulting from the 80 iterations is a 160-b MD.

From (2), since the first four operations are hardwired logic (thus introducing no delay), it is easy to determine that the critical path is located in the calculation of a_t and is equal to the delay of three carry-propagate adders (CPAs). However, there is a design approach that exploits the characteristics of the carry-save adder (CSA) to minimize the critical path.

The proposed design approach to optimize the critical path exploits the fact that a_t is calculated using the inputs of cycle t − 1. Thus, we can precompute some intermediate values, store them in a register, and use them without introducing any delay. So, we transform (2) into (3) to reduce the critical path. The new operation block of the SHA-1, as a result of the application of the precomputation stage, is illustrated in Fig. 1. Some observations can be made analyzing (3) and Fig. 1. First, the new data path is assembled by the final calculation block followed by the precomputation stage (from register output to register input). The critical path is observed in the calculation of a_t (or e′_{t−1}, labeled e*_{t−1} in Fig. 1) and presents a delay of two adders, synthesized as a CSA and a CPA. Second, the introduced area penalty is a single register that stores the intermediate value g_{t−1}. Additionally, power dissipation is kept low and almost the same as that of the initial implementation. The extra power dissipation is that of the read/write operations of the introduced register. On the other hand, the paths are shortened and balanced, reducing the glitches and the dynamic power dissipation on the circuit's wires. The introduction of this precomputation stage is a novel design approach.

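$$
\begin{aligned}
e'_{t-1} &= e_{t-1} + K_t + W_t, & e_t &= d'_{t-1} \\
d'_{t-1} &= d_{t-1}, & d_t &= c'_{t-1} \\
c'_{t-1} &= c_{t-1}, & c_t &= \mathrm{ROTL}^{30}(b'_{t-1}) \\
b'_{t-1} &= b_{t-1}, & b_t &= a'_{t-1} \\
a'_{t-1} &= a_{t-1}, & a_t &= \mathrm{ROTL}^{5}(a'_{t-1}) + e'_{t-1} + g_{t-1} \\
g_{t-1} &= f_t(b_{t-1}, c_{t-1}, d_{t-1}).
\end{aligned}
\tag{3}
$$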
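To make the transformation from (2) to (3) concrete, the sketch below models both forms of the SHA-1 operation block in software and checks that they produce the same digest. It is a behavioral illustration only, not the authors' VHDL: the function names (`rounds_direct`, `rounds_precomputed`, `sha1_single_block`) are hypothetical, the example is restricted to messages that fit in one 512-b block, and the round constants and initial values are those of the SHA-1 standard.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
K_CONST = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)


def rotl(x, n):
    """Rotate a 32-b word to the left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t, b, c, d):
    """Nonlinear function f_t; it changes every 20 cycles."""
    if t < 20:
        return (b & c) | (~b & d)
    if t < 40 or t >= 60:
        return b ^ c ^ d
    return (b & c) | (b & d) | (c & d)


def schedule(block):
    """Expand one 64-byte block into the 80 message-schedule words W_t."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w


def rounds_direct(state, w):
    """Operation block as in (2): four additions feed a_t in one cycle."""
    a, b, c, d, e = state
    for t in range(80):
        a_new = (rotl(a, 5) + f(t, b, c, d) + e + K_CONST[t // 20] + w[t]) & MASK
        a, b, c, d, e = a_new, a, rotl(b, 30), c, d
    return a, b, c, d, e


def rounds_precomputed(state, w):
    """Operation block as in (3): e' and g are prepared one cycle ahead."""
    a, b, c, d, e = state
    e_pre = (e + K_CONST[0] + w[0]) & MASK  # e' for the first cycle
    g = f(0, b, c, d)                       # g for the first cycle
    for t in range(80):
        # Final calculation: only two additions remain on the path to a_t.
        a_new = (rotl(a, 5) + e_pre + g) & MASK
        a, b, c, d, e = a_new, a, rotl(b, 30), c, d
        # Precomputation stage for the next cycle (stored in registers).
        if t < 79:
            e_pre = (e + K_CONST[(t + 1) // 20] + w[t + 1]) & MASK
            g = f(t + 1, b, c, d)
    return a, b, c, d, e


def sha1_single_block(msg, rounds):
    """SHA-1 of a message shorter than 56 bytes, using the given round model."""
    assert len(msg) < 56, "sketch handles single-block messages only"
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack(">Q", 8 * len(msg))
    out = rounds(H_INIT, schedule(block))
    return struct.pack(">5I", *[(h + x) & MASK for h, x in zip(H_INIT, out)])


if __name__ == "__main__":
    reference = hashlib.sha1(b"abc").digest()
    print(sha1_single_block(b"abc", rounds_direct) == reference)
    print(sha1_single_block(b"abc", rounds_precomputed) == reference)
```

In the precomputed form, the values `e_pre` and `g` play the role of the registered intermediate values, so the addition chain that determines a_t is shorter than in the direct form.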

Fig. 1 The modified SHA-1 operation block, separated in two calculation phases (final calculation and precomputation)

MD5 hash function

MD5 is an improved version of MD4, which addresses several known successful attacks on MD4. As in SHA-1, MD5 focuses on the transformation of an initial input through iterative operations. MD5 produces a 128-b MD instead of the 160-b hash value of SHA-1. Additionally, there are still four rounds, consisting of 16 operations each. There are four 32-b inputs (a, b, c, d) and two extra 32-b values that are transformed iteratively to produce the final MD. One is the message schedule M_t that is provided by the padding unit, and the other is a constant L_t predefined by the standard. The calculations that take place in each operation (clock cycle t) are described in (4), where fn_t(z, w, v) represents the nonlinear function associated to clock cycle t. Rotation in (4) is performed for s positions, which vary from cycle to cycle and are predefined by the standard

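$$
\begin{aligned}
d_t &= c_{t-1} \\
c_t &= b_{t-1} \\
b_t &= b_{t-1} + \mathrm{ROTL}^{s}\big(a_{t-1} + \mathit{fn}_t(b_{t-1}, c_{t-1}, d_{t-1}) + M_t + L_t\big) \\
a_t &= d_{t-1}
\end{aligned}
\tag{4}
$$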

The critical path is located in the calculation of a single output, b_t. As with SHA-1, a precomputation stage can be applied to this hash function to reduce the critical path.
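The MD5 operation in (4) can be modeled in a few lines of software. The sketch below is an illustration under stated assumptions, not the proposed hardware: the function names are hypothetical, only single-block messages are handled, and the rotation amounts, the constants L_t (derived from the sine function), and the message word order are those of RFC 1321.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Rotation positions s for each of the 64 cycles (four rounds of 16).
SHIFTS = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
# Constants L_t predefined by the standard.
L_CONST = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotl(x, n):
    """Rotate a 32-b word to the left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def fn(t, b, c, d):
    """Nonlinear function of cycle t; it changes every 16 cycles."""
    if t < 16:
        return (b & c) | (~b & d)
    if t < 32:
        return (d & b) | (~d & c)
    if t < 48:
        return b ^ c ^ d
    return (c ^ (b | ~d)) & MASK


def message_index(t):
    """Index of the message word M_t used in cycle t."""
    if t < 16:
        return t
    if t < 32:
        return (5 * t + 1) % 16
    if t < 48:
        return (3 * t + 5) % 16
    return (7 * t) % 16


def md5_single_block(msg):
    """MD5 of a message shorter than 56 bytes (single 512-b block)."""
    assert len(msg) < 56, "sketch handles single-block messages only"
    block = msg + b"\x80" + b"\x00" * (55 - len(msg)) + struct.pack("<Q", 8 * len(msg))
    m = struct.unpack("<16I", block)
    a, b, c, d = INIT
    for t in range(64):
        # Operation (4): only b_t requires computation; the rest is wiring.
        total = (a + fn(t, b, c, d) + m[message_index(t)] + L_CONST[t]) & MASK
        a, b, c, d = d, (b + rotl(total, SHIFTS[t])) & MASK, b, c
    return struct.pack("<4I", *[(h + x) & MASK for h, x in zip(INIT, (a, b, c, d))])


if __name__ == "__main__":
    print(md5_single_block(b"abc").hex())
    print(md5_single_block(b"abc") == hashlib.md5(b"abc").digest())
```

As in the SHA-1 case, the sum inside the rotation depends only on values of the previous cycle, which is what makes a precomputation stage applicable.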

HMAC implementation scenarios

As already mentioned, HMAC can be implemented using one hash function or two hash functions combined to operate when selected. Also, the SHA-1 and MD5 hash functions share a common characteristic: they both have four discrete rounds. Together, these options offer a wide range of characteristics for the HMAC implementation that, if exploited wisely, can give solutions depending on the nature of the application.

If the critical parameter is a small area, a rolling loop technique can be applied. As illustrated in Fig. 1, the output of the operation block is fed back to the input through the precomputation stage. Notice that the main benefit of the insertion of the precomputation stage is that a_t, which is the output of the final calculation block, enters the precomputation stage as the new a_{t−1}, which is a wire directly connected to the register. This technique allows small-sized implementations through reuse of the same configurable operation block. The configuration logic has to select the correct nonlinear function for both hash functions and the rotate positions in the case of MD5.

If the critical design parameter is performance, with a more relaxed area constraint, then pipelining can be applied. As already mentioned, a common characteristic of the two hash functions is the four rounds. Thus, applying a pipeline stage to every round quadruples the achieved throughput. This technique exploits small-sized implementations based on the rolling loop and the characteristic of the four rounds to result in relatively small-sized implementations, achieving throughput four times higher than the limit imposed by the design of the operation block of the hash function.

In the case of implementing HMAC-MD5 or HMAC-SHA1, the throughput is directly associated to the maximum operating frequency of the hash function's operation block. The proposed modifications of the two hash functions significantly reduce the critical path. The implementation of HMAC-MD5 or HMAC-SHA1, using the precomputation stage, achieves a 30% increase in throughput if no pipeline is applied.

In many applications, there is a need for the selective use of SHA-1 or MD5. There are two design approaches for coexistence of the two hash functions. The first is the implementation of the two hash functions as separate cores and selection through a multiplexer. Although this approach presents low design complexity, it is not optimal for small-area requirements. Power dissipation is also considerably high. The second design approach is the exploration of the two hash functions to locate resources that can be used by both functions. In this case, area requirements are reduced and extra power dissipation is a factor of the latter approach only.

Implementation and results

Considering the aforementioned implementation scenarios, we implemented several HMAC designs to verify and evaluate the value of the presented design approach. The designs were captured in VHDL and were fully simulated and verified using commercial tools. The Xilinx Virtex field programmable gate array (FPGA) technologies were selected as the target technologies, and the designs were synthesized for the Virtex-II and Virtex-E device families. We used these device families to exploit the different characteristics offered by each. More specifically, Virtex-E is appropriate for area-optimized designs, offering compactness and area efficiency, while Virtex-II offers performance efficiency.

Results of the implementations

In Table 1, the characteristics of the proposed HMAC implementations are offered. Only implementations of the Virtex-E FPGA family were fully verified, and numbers reflect experimental results. The results of the FPGA technologies are reported from the synthesis tool. The implementation of the combined hash functions is considered for two target design parameters: performance optimized, which uses implementation of two separate cores and selection through a multiplexer, and area optimized, which exploits commonly reused primitives. The reported throughput corresponds to a design approach with the rolling loop technique applied but without pipeline. If the pipeline technique is applied, throughput is quadrupled, and the area is increased by a factor of 3.21 on average. This is the first time that an implementation without pipeline exceeds 1 Gb/s in Virtex-II FPGA technology. As illustrated, synthesizing the area-optimized design for Virtex-E results in even smaller area requirements, while the performance-optimized design takes further advantage of the Virtex-II device family. It can be observed that HMAC-SHA1 is more efficient, in terms of performance, than HMAC-MD5. The combined HMAC-SHA1-MD5 doesn't present the performance benefits of SHA1, because the maximum operating frequency is limited to that of the slower function, which in this case is MD5. However, this solution offers reduced area requirements by reusing the same resources for both SHA1 and MD5.

Table 1. Characteristics of the proposed HMAC implementations for the targeted FPGA technologies.

HMAC                  Slices   Op. frequency (MHz)   Throughput (Mb/s)

Xilinx Virtex-II (−6)
SHA-1                    854                   162   1024.0
MD5                      797                    96   756.0
SHA-1 MD5 (perf.)       1357                    96   606.8 / 756.0
SHA-1 MD5 (area)         982                    81   512.0 / 638.0

Xilinx Virtex-E (−8)
SHA-1                    686                   111   701.6
MD5                      612                    65   512.0
SHA-1 MD5 (perf.)       1100                    65   410.8 / 512.0
SHA-1 MD5 (area)         780                    61   385.5 / 480.4
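The throughput figures of Table 1 can be cross-checked from the operating frequency with a short calculation. The sketch below assumes that one 512-b message block is processed every 81 clock cycles for SHA-1 and every 65 clock cycles for MD5; these cycle counts are not stated in the article and are an assumption chosen because they reproduce the tabulated values to within rounding. Under the same assumption, the two values in each combined row correspond to SHA-1 and MD5, respectively.

```python
BLOCK_BITS = 512      # message block size of both SHA-1 and MD5
SHA1_CYCLES = 81      # assumed cycles per block (not stated in the article)
MD5_CYCLES = 65       # assumed cycles per block (not stated in the article)


def throughput_mbps(freq_mhz, cycles_per_block, block_bits=BLOCK_BITS):
    """Throughput in Mb/s for a rolling-loop design without pipeline."""
    return block_bits * freq_mhz / cycles_per_block


# (label, frequency in MHz, assumed cycles per block, throughput from Table 1)
TABLE_1 = [
    ("Virtex-II SHA-1", 162, SHA1_CYCLES, 1024.0),
    ("Virtex-II MD5", 96, MD5_CYCLES, 756.0),
    ("Virtex-II combined (perf.), SHA-1", 96, SHA1_CYCLES, 606.8),
    ("Virtex-II combined (area), MD5", 81, MD5_CYCLES, 638.0),
    ("Virtex-E SHA-1", 111, SHA1_CYCLES, 701.6),
    ("Virtex-E MD5", 65, MD5_CYCLES, 512.0),
    ("Virtex-E combined (area), SHA-1", 61, SHA1_CYCLES, 385.5),
    ("Virtex-E combined (area), MD5", 61, MD5_CYCLES, 480.4),
]

if __name__ == "__main__":
    for label, freq, cycles, reported in TABLE_1:
        estimate = throughput_mbps(freq, cycles)
        print(f"{label}: estimated {estimate:.1f} Mb/s, reported {reported} Mb/s")
```

With a four-stage pipeline, the same formula would be multiplied by four, which is the quadrupling described in the text.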

In Table 2, the characteristics of the commercial HMAC IP cores are reported for comparison. Every stand-alone hash function implementation (not an HMAC) is marked with a * at the start. These designs are offered as a reference because the maximum operating frequency of the HMAC depends explicitly on the critical path of the hash function used. As illustrated in Table 2, MD5 is not exploiting the high-speed performance efficiencies of Virtex-II device families, focusing on an optimum area design, while much better performance could be obtained with only a small area tradeoff. Moreover, MD5 presents even worse characteristics, in terms of both performance and area, as it neither exploits the Virtex-II high performance efficiency nor makes the effort for an optimum area design. Analyzing the performance of the implementations presented in Table 2, it can be observed that the throughput of the proposed HMAC implementations exceeds that of the available commercial IP cores by up to 390%.

Conclusions

A novel design approach for the development of small-sized and high-speed HMACs was presented in this article. The approach showed that the critical path can be further reduced by exploiting special properties of the included hash functions. A significant design effort was made to keep the area low. The experimental results showed that a negligible area penalty was introduced for achieving an increase in throughput of up to 390% compared to the competing implementations. Finally, the design was fully tested and verified for the Xilinx Virtex-E FPGA family using a prototype board.

Read more about it

• Secure Hash Standard, NIST, FIPS Pub. 180-2, 2002.

• IETF Network Working Group, RFC 1321, 1992.

• The Keyed-Hash Message Authentication Code (HMAC) (Standard). NIST, FIPS Pub. 198 Standard, 2002.

• Digital Signature, NIST, FIPS Pub. 186-2, 2000.

• IP Security Protocol Charter, Internet Drafts for IPSec. [Online]. Available: http://www.ietf.org/htmlcharters/ipsec-charter.html

• S. Dominikus, “A hardware implementation of MD-4 family hash algorithms,” in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, 2002, pp. 1143–1146.

• N. Sklavos, G. Dimitroulakos, and O. Koufopavlou, “An ultra high speed architecture for VLSI implementation of hash functions,” in Proc. IEEE Int. Conf. Electronics, Circuits and Systems, 2003, pp. 990–993.

• T. Grembowski, R. Lien, K. Gaj, N. Nguyen, P. Bellows, J. Flidr, T. Lehman, and B. Schott, “Comparative analysis of the hardware implementations of hash functions SHA-1 and SHA-512,” in Proc. Information Security Conf., Springer-Verlag, Berlin, Germany, 2002, pp. 75–89.

• B. den Boer and A. Bosselaers, “An attack on the last two rounds of MD4,” in Proc. CRYPTO ’91, Advances in Cryptology, Springer-Verlag, Berlin, Germany, 1992, pp. 194–203.

• ALMA Technologies. Available: http://www.alma-tech.com

• Bisquare Systems Private Ltd. Available: http://www.bisquare.com

• Helion Technology Ltd. Available: http://www.heliontech.com

• Intron, Ltd. Available: http://www.lviv.uar.net/~intron/

• Ocean Logic Ltd. Available: http://www.ocean-logic.com

• Amphion. Available: http://www.amphion.com/index.html

About the authors

Ioannis I. Yiakoumis is a student of electrical and computer engineering at the University of Patras, Greece. He is a Student Member of the IEEE. His research interests include hardware design, computer security, wireless networks, and embedded systems programming.

Markos E. Papadonikolakis is a student of electrical and computer engineering at the University of Patras, Greece. He is a Student Member of the IEEE. His research includes computer security, hardware design, and image encoding.

Harris E. Michail is a researcher of electrical and computer engineering at the University of Patras, Greece. He is a Member of the IEEE, the Technical Chamber of Greece, and the Greek Electrical Engineering Society. His research includes computer security, hardware design, and reconfigurable architectures.

Athanasios P. Kakarountas is with the electrical and computer engineering department, University of Patras, Greece. He is a Member of the IEEE.

Costas E. Goutis is with the electrical and computer engineering department, University of Patras, Greece. He is a Member of the IEEE.

Table 2. Characteristics of other HMAC implementations for the targeted FPGA technologies.

HMAC                  Slices   Op. frequency (MHz)   Throughput (Mb/s)

Xilinx Virtex-II (−6)
*SHA-1 [12]              573                   140   874.0
*SHA-1 [14]              612                    79   498.1
*MD5 [12]                613                    96   744.0
*MD5 [14]                614                    62   488.3
*MD5 [15]                844                    60   472.0
SHA1 & MD5 [12]          888                    95   593.0 / 736.0

Xilinx Virtex-E (−8)
*SHA-1 [13]              716                    71   449.0
*SHA-1 [14]              612                    72   451.9
*MD5 [14]                605                    50   393.8
SHA-1 [11]               579                    66   422.4
MD5 [11]                 324                    50   400.0

The designs marked with a ‘*’ are implementations of the described hash functions, not HMAC implementations.

0278-6648/06/$20.00 © 2006 IEEE, MARCH/APRIL 2006

ELLIPTIC CURVE CRYPTOGRAPHY is a public key cryptosystem that is becoming increasingly popular. Implementations of cryptographic algorithms should not only be fast, compact, and power efficient, they should also resist side channel attacks. One of the side channels is the electromagnetic radiation emitted by an integrated circuit. Hence, it is very important to assess the vulnerability of implementations of cryptosystems against these attacks. A simple electromagnetic analysis (SEMA) attack on an unprotected implementation can find all the key bits with only one measurement. We also describe a differential electromagnetic analysis (DEMA) attack on an improved implementation and demonstrate that a correlation analysis requires 1,000 measurements to find the key bits.

Cryptographic algorithms and protocols hold the key

Keeping information secret and authentic is a very old concern, but the exponential growth of technology exacerbates the need for secure communication. Cryptographic algorithms and protocols are essential in protecting the confidentiality and authenticity of data; they replace the problem of protecting information with the problem of protecting short cryptographic keys.

Ironically, the very same technology that forms the basis for the higher demand in security has a few annoying side effects. The use of side channels to break a cryptosystem was introduced by P. Kocher. In this context, a side channel is a physical property that can be measured externally during the execution of a cryptographic algorithm to derive information on secret keys. Examples are the execution time of the algorithm on the chip or the power consumption of implementations of cryptosystems. With this idea, cryptanalysis no longer focuses exclusively on the mathematical aspects but also evaluates weaknesses of implementations. Three main physical properties of cryptographic modules can be exploited in side channel attacks: power consumption, timing, and electromagnetic radiation. Others such as sound and heat are currently being explored but seem less promising.

Elliptic curve cryptography (ECC) was proposed independently by Miller and Koblitz in the 1980s. Since then a considerable amount of research has been performed on secure and efficient ECC implementations.

This article reports on the first implementation of an electromagnetic analysis (EMA) attack on a hardware implementation of an elliptic curve (EC) processor with a key length of 160 b. Earlier work is either theoretical or presents attacks on software implementations for 8-b smart cards. The main difference between our implementation of an EC processor and these software implementations is that, in our hardware, all operations are done in parallel. Hence, the number of bit transitions during every clock cycle can be up to 160, compared to eight for a smart card. This implies that predicting the transitions is much harder. To detect the effect of any bit changes, we have to increase the number of measurements by a factor of 20 or more.

The U.S. government has been aware of electromagnetic leakage since the 1950s. The resulting standards are called TEMPEST and are partially available at <http://cryptome.org/nsa-tempest.htm>. The first published papers are the work of J. Quisquater and D. Samyde and the Gemplus team. According to D. Agrawal, there are two types of radiation: intentional and unintentional. Later on, information from different side channels was combined in so-called multichannel attacks, in which the side channels are not necessarily of a different kind.

Until now, most papers on EMA applied techniques similar to those of power analysis, while apparently much more information is available to be explored. It is likely that future work will also deal with combinations of EMA with other side channel attacks.

Measuring the vulnerability of cryptographic algorithms

ELKE DE MULDER, PIETER BUYSSCHAERT, SIDDIKA B. ÖRS, PETER DELMOTTE, BART PRENEEL, GUY VANDENBOSCH, AND INGRID VERBAUWHEDE

Elliptic curves over GF(p)

The public key cryptosystem implemented on the field programmable gate array (FPGA) is the elliptic curve cryptosystem. An elliptic curve E is expressed in terms of the Weierstrass equation: y^2 = x^3 + ax + b, where a, b ∈ GF(p). The points on this curve can be added to each other, and the resulting point is again a point on the same curve. The point at infinity O plays a role analogous to that of the number 0 in ordinary addition. Thus, P + O = P and P + (−P) = O for all points P. With these properties, it is straightforward to introduce the point or scalar multiplication as the main operation for ECC, i.e., kP = P + P + ... + P (k times). This operation can be calculated by using the double-and-add algorithm as shown in Algorithm 1.

Algorithm 1: Elliptic Curve Point Multiplication
Require: EC point P = (x, y), integer k, 0 < k < M, k = (k_{l−1}, k_{l−2}, ..., k_0)_2, k_{l−1} = 1
Ensure: Q = (x′, y′) = [k]P
1. Q ← P
2. for i from l−2 downto 0 do
3.   Q ← 2Q
4.   if k_i = 1 then
5.     Q ← Q + P
6.   end if
7. end for

The goal is to guess the key bits k_i because by finding them, the algorithm is broken.
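Algorithm 1 can be tried out on a toy example. The sketch below is a software illustration only: the curve y^2 = x^3 + 2x + 3 over GF(97) and the base point (3, 6) are arbitrary small values chosen for readability and are unrelated to the 160-b curve of the EC processor, and the function names are hypothetical. The point at infinity O is represented by `None`. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
P_MOD = 97        # toy prime field GF(p); illustrative value only
A, B = 2, 3       # toy curve y^2 = x^3 + 2x + 3; illustrative values only
BASE = (3, 6)     # a point on the toy curve: 6^2 = 36 = 27 + 6 + 3


def ec_add(p1, p2):
    """Add two points of the curve; None stands for the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = O
    if p1 == p2:
        # Point doubling: slope of the tangent.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        # Point addition: slope of the chord.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def double_and_add(k, point):
    """Algorithm 1: scan the key bits from k_{l-2} down to k_0."""
    bits = bin(k)[2:]               # most significant bit k_{l-1} is 1
    q = point                       # step 1
    for bit in bits[1:]:            # step 2
        q = ec_add(q, q)            # step 3: always double
        if bit == "1":              # step 4: key-dependent branch
            q = ec_add(q, point)    # step 5: add only for a 1 bit
    return q


def repeated_addition(k, point):
    """Reference: kP computed as P + P + ... + P (k times)."""
    q = None
    for _ in range(k):
        q = ec_add(q, point)
    return q


if __name__ == "__main__":
    for k in (1, 2, 5, 12, 0b11001100):
        print(k, double_and_add(k, BASE), double_and_add(k, BASE) == repeated_addition(k, BASE))
```

The branch in step 4 is executed only for key bits equal to 1, which is the property that the attacks described in this article exploit.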

Electromagnetic analysis attack

Nowadays, complementary metal-oxide semiconductor (CMOS) is by far the most commonly used technology to implement digital integrated circuits. A CMOS gate consists of a pull-up network with p-MOS transistors and a pull-down network with n-MOS transistors. Those networks are complementary. When the input is stable, only one of the two networks conducts. The simplest logic gate is an inverter; its power consumption is representative for all logic gates and gives a general image of the power consumption in a CMOS circuit. During the functioning of the inverter, three types of power consumption can be distinguished: the leakage current, the current that flows from the power source to the ground during the switching from 0 to 1 (short-circuit current), and the current used to charge and discharge the different capacitors in a digital network (dynamic power consumption). The last one causes the biggest power consumption in present designs because of all the wiring in an FPGA on which this algorithm is implemented. These capacitors, however, are necessary to maintain the two different logic levels. Charging and discharging a capacitor differ, so switching an inverter from 0 to 1 consumes a different amount of power than switching it from 1 to 0. In addition, the capacitors of the individual gates differ, which results in a different power consumption of the different gates according to the data being processed. This is partially illustrated in Fig. 1: (a) shows charging the load capacitor of the inverter, (b) shows the discharging phase, and (c) and (d) do not show any change in charge because the input voltage is not altered.

The sudden current pulse that occurs during the transition of the output of a CMOS gate causes a variation of the electromagnetic field surrounding the chip. This can be monitored, for example, by inductive probes that are particularly sensitive to the related impulse. When using a loop antenna, the voltage induced by the current equals

$$V = -\frac{d\phi}{dt}, \qquad \phi = \int \vec{B} \cdot d\vec{A}$$

where V is the probe's output voltage, φ is the magnetic flux sensed by the probe, t is the time, B is the magnetic field, and A is the area that it penetrates.

Two types of electromagnetic analysis attacks are distinguished. In a SEMA attack, an attacker uses the information from one electromagnetic radiation measurement directly to determine (parts of) the secret key. In a DEMA attack, many measurements are used to filter out noise, and the key is derived using a statistical analysis. A SEMA attack is typically used when there is a conditional branch in the algorithm that results in a different radiation pattern whenever the branch is taken. A DEMA attack uses the property that processing different data needs a distinct amount of power and radiates a different field.

A simple analysis

In a simple power analysis attack, an attacker uses the side-channel information from one measurement directly to determine parts of the secret key. These attacks are possible because of differences between executed instructions. These differences need to have a simple relationship with the key, as is the case, for example, with key-dependent branches. The algorithm under attack in this article can be implemented with key-dependent branches, as shown in line four of Algorithm 1. If the instruction in this branch has a power consumption graph that is distinguishable from other instructions in the algorithm, it will be very easy to gain some information about the key. An attacker measures the power trace of the cryptodevice and tries to relate his knowledge about the implementation to the power trace.

Fig. 1 Possible switching events for a CMOS inverter (from left to right, from top to bottom: (a)–(b), (c)–(d))

A differential analysis

A differential analysis works according to the following procedure and is shown in the flowchart. In the first step, a number of side-channel measurements are taken while the cryptographic device executes the same algorithm with the same key but different inputs. In the second step, the attacker chooses a point of attack where the intermediate results at that time only depend on a part of the key and calculates the hypothetical side-channel value for every input based on all guesses for the subkey according to a model of the side channel. In the next step, the attacker uses statistical techniques to verify which hypothesis about the key is correct by looking for a relationship between the hypothetical side-channel values and the real ones.

As explained earlier, when there is some switching from 0 to 1 or 1 to 0, more power is used than when there is no switching activity. This relates the power consumption to the number of bit toggles, giving us a model to mimic power as a side channel. When an attacker chooses a point of attack for which he can calculate the intermediate values of the algorithm with knowledge of the input and a partial guess of the key, he can relate the power consumption of the measurements with different inputs to the hypothetical number of bit toggles. If the statistical analysis reveals such a relationship, the key guess was correct. If not, the key guess was probably incorrect and a new one should be made.

Correlation analysis

In DEMA, an attacker uses a hypothetical model of the attacked device. The quality of this model is dependent on the knowledge of the attacker. The model is used to predict several values for the electromagnetic radiation of a device, which are compared to the real, measured electromagnetic radiation of the device. Comparisons are performed by applying statistical methods to the data. The most popular are the distance-of-mean test and the correlation analysis. For the correlation analysis, the model predicts the amount of side channel leakage at a certain moment of time in the execution. These predictions are correlated to the real electromagnetic radiation. The correlation can be measured using the Pearson correlation coefficient. Let t_i denote the i-th measurement data (i.e., the i-th trace) and T the set of traces. Let p_i denote the prediction of the model for the i-th trace and P the set of such predictions. Then we calculate

$$C(T, P) = \frac{E(T \cdot P) - E(T) \cdot E(P)}{\sqrt{\mathrm{Var}(T) \cdot \mathrm{Var}(P)}}, \qquad -1 \le C(T, P) \le 1$$

where E(T) denotes the expectation (average) trace of the set of traces T and Var(T) denotes the variance of a set of traces T. T and P are said to be uncorrelated if C(T, P) equals zero. Otherwise, they are said to be correlated. If their correlation is high, i.e., if C(T, P) is close to +1 or −1, it is usually assumed that the prediction of the model, and thus the key hypothesis, is correct.
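The correlation analysis can be illustrated with a small simulation. The sketch below is a toy model and not the attack on the EC processor: it assumes a device that leaks the number of set bits of (input XOR key byte) plus Gaussian noise, and every name, the secret value 0x3C, the noise level, and the number of traces are hypothetical choices. The coefficient is computed exactly as in the formula above, with one scalar value per trace.

```python
import random


def pearson(t, p):
    """C(T, P) = (E(T.P) - E(T).E(P)) / sqrt(Var(T).Var(P))."""
    n = len(t)
    e_t = sum(t) / n
    e_p = sum(p) / n
    e_tp = sum(x * y for x, y in zip(t, p)) / n
    var_t = sum(x * x for x in t) / n - e_t ** 2
    var_p = sum(y * y for y in p) / n - e_p ** 2
    if var_t <= 0 or var_p <= 0:
        return 0.0  # a constant series carries no information
    return (e_tp - e_t * e_p) / (var_t * var_p) ** 0.5


def hamming_weight(x):
    """Number of bits set to 1, used here as the leakage model."""
    return bin(x).count("1")


def simulate_traces(secret, count, noise, rng):
    """Toy device: leaks the Hamming weight of (input XOR secret) plus noise."""
    inputs = [rng.randrange(256) for _ in range(count)]
    traces = [hamming_weight(x ^ secret) + rng.gauss(0.0, noise) for x in inputs]
    return inputs, traces


def correlation_attack(inputs, traces):
    """Correlate the measurements with the predictions of every key guess."""
    scores = {}
    for guess in range(256):
        predictions = [hamming_weight(x ^ guess) for x in inputs]
        scores[guess] = pearson(traces, predictions)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = random.Random(1)
    inputs, traces = simulate_traces(secret=0x3C, count=1000, noise=1.0, rng=rng)
    best, scores = correlation_attack(inputs, traces)
    print(f"best guess: {best:#04x}, correlation: {scores[best]:.3f}")
```

In this toy model, the correct guess yields the highest correlation, while guesses that differ from the key in more bits yield progressively lower, and eventually negative, values.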

Measurement setup

Figure 2 shows the most important part of our measurement setup: the VIRTEX FPGA that is under attack. Because the field surrounding the chip is mainly a magnetic field in the near field, a loop antenna is used to pick up the variations of the field. Our setup consists essentially of two boards. The main board is responsible for interfacing with the PC via the parallel port. It is connected to the XILINX parallel cable to program the VIRTEX FPGA and provides some LEDs, switches, and buttons for testing purposes. The daughter board carries the VIRTEX FPGA. It gives access to some pins for triggering and allows the power consumption of the VIRTEX FPGA to be measured in a convenient way.

Fig. 2 The FPGA and the handmade antenna

Fig. 3 Electromagnetic radiation trace of a 160-b EC point multiplication with double-and-add algorithm (horizontal axis: sample, ×10^6; vertical axis: electromagnetic emanation, ×10^4)


SEMA attack on an FPGA implementation of an EC processor

The EM radiation trace of a 160-b EC point multiplication is shown in Fig. 3. The SEMA attack is implemented on the EC processor published by S.B. Örs et al., which uses Algorithm 1 for EC point multiplication. It can be derived from Fig. 3 that the key used during this measurement is 11001100, because there is a difference between the EM radiation traces of the EC point addition and doubling. The SEMA attack was successful because of the conditional branch in Step 4 of Algorithm 1.

As a countermeasure to this attack, we implemented the EC point multiplication using the always double and add algorithm published by J.S. Coron. Algorithm 2 shows that the EC point addition is executed independently of the value of the key bits. A single EM radiation measurement will therefore not reveal the key bits.

Algorithm 2: Elliptic Curve Point Multiplication
Require: EC point P = (x, y), integer k, 0 < k < M, k = (k_{l−1}, k_{l−2}, ..., k_0)_2, k_{l−1} = 1
Ensure: Q = (x′, y′) = [k]P
1. Q ← P
2. for i from l−2 downto 0 do
3.   Q1 ← 2Q
4.   Q2 ← Q1 + P
5.   if k_i = 1 then
6.     Q ← Q2
7.   else
8.     Q ← Q1
9.   end if
10. end for
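The structure of Algorithm 2 can be sketched in Python as follows. This is an illustration only, not the FPGA design: the toy curve y^2 = x^3 + 2x + 3 over GF(97), the base point (3, 6), and all function names are assumptions chosen to keep the example small. Python's conditional selection is also not constant-time, so the sketch mirrors the control flow rather than the side-channel behavior.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); illustrative parameters only.
P_MOD = 97
A = 2
B = 3


def on_curve(point):
    """Check the curve equation; None stands for the point at infinity."""
    if point is None:
        return True
    x, y = point
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def ec_add(p1, p2):
    """Add two affine points with coordinates reduced modulo P_MOD."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) is the point at infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def always_double_and_add(k, point):
    """Compute [k]point with one doubling and one addition per key bit."""
    if k <= 0:
        raise ValueError("k must be positive")
    bits = bin(k)[2:]                   # k_{l-1} ... k_0 with k_{l-1} = 1
    q = point                           # step 1
    for bit in bits[1:]:                # step 2: i from l-2 down to 0
        q1 = ec_add(q, q)               # step 3: Q1 <- 2Q
        q2 = ec_add(q1, point)          # step 4: Q2 <- Q1 + P (always executed)
        q = q2 if bit == "1" else q1    # steps 5 to 9
    return q


G = (3, 6)  # 6^2 = 36 = 3^3 + 2*3 + 3, so G lies on the toy curve
print(always_double_and_add(0b11001100, G))
```

Because the addition in step 4 runs for every key bit, the sequence of operations no longer depends on the key; only the choice of which result is kept does.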

DEMA attack on an FPGA implementation of an EC processor

The target for our DEMA attack is the second most significant bit (MSB) of the key, k_{l−2}, in Algorithm 2. If k_{l−2} = 0, then Q will be updated with 2P; otherwise it will be updated with 3P at step 5 in Algorithm 2.

In the first step of our attack, we produced a so-called EM radiation file. For this purpose, N random points on the EC and one fixed but random key were chosen. We let the FPGA execute N point multiplications on the N EC points P_i, i = 1 ... N, with the same key k, as Q_i = [k]P_i. We attack the circuit at the time the coordinates of Q1 are updated for the second time at step 3 of Algorithm 2. These measurements produce an N × 2,000,000 matrix, M1. We applied a preprocessing technique to reduce the amount of measurement data: we found the maximum value of the measurement data in each clock cycle and stored these values in matrix M2. Because the clock frequency of the function generator used in our experiments varied slightly during the measurements, the number of points in one clock cycle, D_i, has to be found. To compute D_i, we have to know the exact clock frequency. For this, we calculated the discrete Fourier transform of each measurement. Figure 4 shows the first measurement after taking the maximum value in every clock cycle.
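The preprocessing step of keeping one maximum per clock cycle can be sketched as below. This is a simplified illustration: it assumes the number of samples per clock cycle is a known integer, whereas the article derives D_i from the discrete Fourier transform of each measurement. The function name and sample values are hypothetical.

```python
def max_per_clock_cycle(trace, samples_per_cycle):
    """Reduce a trace to the largest sample of each clock cycle.

    samples_per_cycle plays the role of D_i and is assumed here to be a
    known positive integer. A trailing partial cycle is kept as well.
    """
    if samples_per_cycle <= 0:
        raise ValueError("samples_per_cycle must be positive")
    return [
        max(trace[start:start + samples_per_cycle])
        for start in range(0, len(trace), samples_per_cycle)
    ]


# Made-up trace with three samples per clock cycle.
trace = [0.1, 0.9, 0.3, 0.2, 0.4, 0.8, 0.7, 0.1, 0.0]
print(max_per_clock_cycle(trace, 3))  # prints [0.9, 0.8, 0.7]
```

Applying this to every row of M1 would yield the rows of the much smaller matrix M2.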

We have implemented the EC point multiplication with Algorithm 2 in the C programming language. The C program computes N EC point multiplications with N EC points and the key. The EC points and the key are the same as the ones given to the FPGA. During the execution of the EC point multiplications, the C program computes the number of bits that change from 0 to 1 and from 1 to 0 in some registers at the steps corresponding to the five spikes shown in Fig. 4. The number of transitions is used as the EM radiation prediction.

We have predicted the EM radiation of the events that correspond to the five spikes shown in Fig. 4 for k_{l−2} = 0 and k_{l−2} = 1 for each measurement and stored the predictions in M3. Now, we can learn the right value of k_{l−2} by finding the correlations between M3 and M2. There are two values for each spike: one for the guess that the key bit is 0 and one for the guess that the key bit is 1. The correlations for spike five give us the correct key bit using only 1,000 measurements. The correlation for the guess that the key bit is 1 is much higher than the correlation for the other guess, as shown in Fig. 5.

After 1,000 measurements, the correlation for the k_{l−2} = 1 guess starts to differ from the correlation for the k_{l−2} = 0 guess. The correlation for the k_{l−2} = 1 guess starts to rise, while the correlation for the k_{l−2} = 0 guess stays around 0.
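The decision between the two key-bit guesses can be illustrated with the sketch below. All data here are synthetic: the predictions are random integers standing in for toggle counts, and the "measurements" are generated from the key bit = 1 predictions plus Gaussian noise. None of the names, noise levels, or values come from the article's experiment.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)


def guess_key_bit(measured, predicted_if_0, predicted_if_1):
    """Return the key-bit guess whose predictions correlate more strongly
    with the measurements, together with both correlations."""
    c0 = correlation(measured, predicted_if_0)
    c1 = correlation(measured, predicted_if_1)
    return (1 if c1 > c0 else 0), c0, c1


# Synthetic stand-ins for one column of M2 and two columns of M3.
rng = random.Random(1)
n = 2000
pred0 = [rng.randint(0, 32) for _ in range(n)]  # toggle counts if bit = 0
pred1 = [rng.randint(0, 32) for _ in range(n)]  # toggle counts if bit = 1
# Hypothetical device whose true key bit is 1: leakage follows pred1 plus noise.
measured = [p + rng.gauss(0, 20) for p in pred1]

bit, c0, c1 = guess_key_bit(measured, pred0, pred1)
print(bit, round(c0, 3), round(c1, 3))
```

With enough traces, the correlation for the wrong guess stays near zero while the one for the right guess stands out, which is the behavior described for Fig. 5.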

Fig. 4 The EM trace of the first measurement after taking the maximum value in every clock cycle (horizontal axis: clock cycle; vertical axis: electromagnetic radiation in mV; the first through fifth spikes are marked)

Fig. 5 Correlation as a function of the number of measurements for the fifth peak (horizontal axis: number of measurements; vertical axis: correlation; curves: guess key bit = 0 and guess key bit = 1)

Conclusions

In this article, we have presented SEMA and DEMA electromagnetic analysis attacks on an FPGA implementation of an elliptic curve processor. With a SEMA attack on an unprotected implementation, we can find all the key bits using just one measurement. We have conducted a DEMA attack on the always double and add implementation and have shown that it is possible to find the key bits by making more measurements and using correlation analysis. Our attacks show that electromagnetic attacks form a realistic threat to a broad range of cryptographic hardware implementations. Further work is necessary to optimize these attacks using more sophisticated antennas and signal processing techniques. On the other hand, system designers and cryptographers should jointly develop, implement, and evaluate additional countermeasures against side-channel attacks. These can consist of frequent key updates and various masking and decorrelation approaches.

Read more about it

• P. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems,” in Advances in Cryptology: Proc. CRYPTO’96, Springer-Verlag, vol. 1109, 1996, pp. 104–113.

• P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Advances in Cryptology: Proc. CRYPTO’99, vol. 1666, 1999, pp. 388–397.

• S.-M. Kang and Y. Leblebici, CMOS Digital Integrated Circuits: Analysis and Design. New York: McGraw Hill, 2002.

• NSA, “NSA TEMPEST Documents,” [Online]. Available: http://www.cryptome.org/nsa-tempest.htm.

• J.J. Quisquater and D. Samyde, “Electromagnetic analysis (EMA): Measures and counter-measures for smart cards,” in Proc. Int. Conf. Research Smart Cards: Smart Card Programming and Security (E-smart), I. Attali and T. Jensen, Eds., 2001, pp. 200–210.

• D. Agrawal, B. Archambeault, J.R. Rao, and P. Rohatgi, “The EM side-channel(s): Attacks and assessment methodologies,” in Proc. 4th Int. Workshop Cryptographic Hardware and Embedded Systems (CHES), B.S. Kaliski Jr., Ç.K. Koç, and C. Paar, Eds., 2002, pp. 29–45.

• D. Agrawal, B. Archambeault, S. Chari, J.R. Rao, and P. Rohatgi, “Advances in side-channel cryptanalysis,” RSA Lab. Cryptobytes, vol. 6, no. 1, pp. 20–32, 2003.

• D. Agrawal, J.R. Rao, and P. Rohatgi, “Multi-channel attacks,” in Proc. 5th Int. Workshop Cryptographic Hardware and Embedded Systems (CHES), C. Walter, Ç.K. Koç, and C. Paar, Eds., 2003, pp. 2–16.

• S.B. Örs, E. Oswald, and B. Preneel, “Power-analysis attacks on an FPGA, First experimental results,” in Proc. 5th Int. Workshop Cryptographic Hardware and Embedded Systems (CHES), C. Walter, Ç.K. Koç, and C. Paar, Eds., 2003, pp. 35–50.

• S.B. Örs, L. Batina, B. Preneel, and J. Vandewalle, “Hardware implementation of an elliptic curve processor over GF(p),” in Proc. IEEE 14th Int. Conf. Application-Specific Systems, Architectures and Processors, 2003, pp. 433–443.

• J.S. Coron, “Resistance against differential power analysis for elliptic curve cryptosystems,” in Proc. 1st Int. Workshop Cryptographic Hardware and Embedded Systems (CHES), Ç.K. Koç and C. Paar, Eds., 1999, vol. 1717, pp. 292–302.

About the authors

Elke De Mulder is a Ph.D. student at the Katholieke Universiteit Leuven, Belgium.

Pieter Buysschaert is a project leader at Belgocontrol, Belgium.

Saddika B. Örs is a research assistant at the Istanbul Technical University.

Peter Delmotte works at Proximus, Belgium.

Bart Preneel is a professor in the Electrical Engineering Department of the Katholieke Universiteit Leuven in Belgium.

Guy Vandenbosch is a professor in the Electrical Engineering Department of the Katholieke Universiteit Leuven in Belgium.

Ingrid Verbauwhede is a professor in the Electrical Engineering Department of the Katholieke Universiteit Leuven in Belgium.
