Protecting AES on TriCore against Power Analysis AttacksDeclaration I hereby declare that this submission is my own work and that, to the best of my knowledgeandbelief,itcontainsnomaterialpreviouslypublishedorwrittenbyanother

EMSEC

Protecting AES on TriCoreagainst Power Analysis Attacks

Robert Spielmann

Master’s Thesis. August 27, 2013.Chair for Embedded Security – Prof. Dr.-Ing. Christof PaarAdvisor: Dr.-Ing. Timo Kasper

Abstract

Embedded Systems are increasingly influencing modern everyday life. Mobile phonesnow have more computing power than a single computer did some years ago. Vehicles areequipped with a perpetually growing number of electronic units that control everythingfrom electric windows down to engine management. It becomes evident that embeddedsecurity is required more than ever before. With our thesis we aim at laying the foundationfor a solution to one small part of the issues arising from embedded computing. Wecreate a number of AES implementations for the Infineon TriCore TC1797. The basicimplementation features no protection against power analysis attacks. We analyze currentmasking schemes and create AES implementations protected with two of those schemes.We analyze the resistance of the protected implementations against 1𝑠𝑡-order CorrelationPower Analysis (CPA) and give an outlook towards potential areas of future work.

Acknowledgements

In October 2010, I set out on a journey towards a Master’s degree in Applied IT Security.At that time, I had already been working full-time for three and a half years and I didnot consider it an option to give up on my day job. Therefore I decided to enrol in adistance learning programme, accepting the fact that I would have to cope with highwork load and little free time over the course of two to three years. Looking back, I cometo realize that being successful at work and at university in parallel would have beenimpossible without the tremendous amount of help and support I received.

My employer, codecentric AG, played an essential part in my success. The companywas generous enough to pay my tuition fees. In addition, both my managers and mycoworkers trusted me to the highest possible extent. Based on this, I was able to achievethe optimal balance between project-related work and exams. During my work on thisthesis, I was allowed to spend a number of weeks at the chair’s laboratory in Bochum,which was essential regarding the technical parts of this work.

The decision to write my thesis at the Chair for Embedded Security at Ruhr-UniversitätBochum (EMSEC) was probably the best I could have made. Timo Kasper entrustedme with the non-trivial subject of this thesis. He advised and supported me during allphases of my work. Falk Schellenberg and David Oswald gave me valuable input andfeedback, and helped me overcome a number of technical challenges. Christof Paar andAmir Moradi took the time to discuss some mathematical aspects. All EMSEC staff Imet were always ready to help, even in extremely busy times.

For some specific topics, I turned to external entities. I wish to thank Matthieu Rivainfor a very helpful e-mail exchange revolving around countermeasures. Louis Goubinand Vincent Rijmen kindly replied to mails as well. Elisabeth Oswald deserves thanksfor allowing me to use some of the example data from the supplementary materials of[MOP07]. Werner Schindler discussed characteristics of random number generators withme in a spontaneous phone call. I found it highly satisfying to realize that members ofthe scientific community took me and my questions seriously and that they helped mewithout knowing me personally.

Looking at the private aspects of life, I am almost at a loss for words of appreciation.I am deeply indebted to my girlfriend who invested incredible amounts of energy andpatience in order to keep me on track. Thank you for having my back no matter what!

Also in the private context, I thank my parents, my brothers, and all of my friendsfor being reliable partners in every condition of life. I had my head in the clouds for along time. You kept faith with me even though I was often hard to reach. Thank you forbelieving in me!

I am happy to see that my journey towards the Master’s degree is now coming to agood ending. From my point of view, it was worth while every single step. While I donot have a fitting quote at hand to conclude my writing, insiders know what I meanwhen I say this:

Praise the Sun!

DeclarationI hereby declare that this submission is my own work and that, to the best of myknowledge and belief, it contains no material previously published or written by anotherperson nor material which to a substantial extent has been accepted for the award of anyother degree or diploma of the university or other institute of higher learning, exceptwhere due acknowledgement has been made in the text.

ErklärungHiermit versichere ich, dass ich die vorliegende Arbeit selbstständig verfasst und keineanderen als die angegebenen Quellen und Hilfsmittel benutzt habe, dass alle Stellen derArbeit, die wörtlich oder sinngemäß aus anderen Quellen übernommen wurden, als solchekenntlich gemacht sind und dass die Arbeit in gleicher oder ähnlicher Form noch keinerPrüfungsbehörde vorgelegt wurde.

Robert Spielmann

Contents

1 Introduction 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Our Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Organization of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Background: AES 52.1 The AES Contest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 The Rijndael Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.1 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.2.2 Mathematical Foundation . . . . . . . . . . . . . . . . . . . . . . . . 62.2.3 Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2.4 Decryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.2.5 Implementation Outlook . . . . . . . . . . . . . . . . . . . . . . . . 12

3 Background: Power Analysis Attacks 133.1 Simple Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.2 Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 143.3 Correlation Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 153.4 Other Types of Power Analysis Attacks . . . . . . . . . . . . . . . . . . . 19

3.4.1 Inferential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . 193.4.2 Mutual Information Analysis . . . . . . . . . . . . . . . . . . . . . . 21

3.5 Order of Power Analysis Attacks . . . . . . . . . . . . . . . . . . . . . . . 213.6 Selecting one Type of Power Analysis . . . . . . . . . . . . . . . . . . . . 22

4 Target Platform: TriCore TC1797 234.1 The TriCore Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234.2 Selecting a TriCore Product . . . . . . . . . . . . . . . . . . . . . . . . . . 234.3 The TriCore Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.3.1 Registers and Instructions . . . . . . . . . . . . . . . . . . . . . . . 244.3.2 Memory Layout and Addressing . . . . . . . . . . . . . . . . . . . . 254.3.3 Calling Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4 Relevant Features of the TC1797 . . . . . . . . . . . . . . . . . . . . . . . 254.4.1 CPU Cores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264.4.2 Memory Sections and Caching . . . . . . . . . . . . . . . . . . . . . 264.4.3 Serial Communication . . . . . . . . . . . . . . . . . . . . . . . . . . 264.4.4 Digital Input / Output (I/O) . . . . . . . . . . . . . . . . . . . . . . 27

ii Contents

5 Working Environment 295.1 The EMSEC TriCore Board . . . . . . . . . . . . . . . . . . . . . . . . . . 295.2 Communicating with the Board . . . . . . . . . . . . . . . . . . . . . . . . 30

5.2.1 JTAG Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.2.2 USB Connection to the Host Computer . . . . . . . . . . . . . . . . 32

5.3 Measurement Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325.4 Software Stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5.4.1 Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.4.2 Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345.4.3 Evaluation and Attack . . . . . . . . . . . . . . . . . . . . . . . . . 35

6 Software Countermeasures 376.1 Different Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

6.1.1 Masking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386.1.2 Hiding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6.2 Recently Proposed Masking Schemes . . . . . . . . . . . . . . . . . . . . . 416.2.1 Rivain-Prouff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426.2.2 Rivain-Prouff without Mask Refreshing . . . . . . . . . . . . . . . . 456.2.3 Kim-Hong-Lim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466.2.4 Kim-Hong-Lim without Mask Refreshing . . . . . . . . . . . . . . . 516.2.5 Goubin-Martinelli . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536.2.6 Overall Masking of the AES Encryption . . . . . . . . . . . . . . . . 54

6.3 Complexity and Resource Comparison . . . . . . . . . . . . . . . . . . . . 576.3.1 Plain AES according to FIPS-197 . . . . . . . . . . . . . . . . . . . 576.3.2 32-bit Optimized AES . . . . . . . . . . . . . . . . . . . . . . . . . . 586.3.3 CPRR13 - Rivain-Prouff without Mask Refreshing . . . . . . . . . . 596.3.4 KHL11 - Kim-Hong-Lim . . . . . . . . . . . . . . . . . . . . . . . . 606.3.5 SKHL13 - Kim-Hong-Lim without Mask Refreshing . . . . . . . . . 626.3.6 Comparing the Estimates . . . . . . . . . . . . . . . . . . . . . . . . 62

6.4 Selecting Candidates for Implementation . . . . . . . . . . . . . . . . . . 62

7 Implementation 657.1 Random Number Generation . . . . . . . . . . . . . . . . . . . . . . . . . 657.2 Implementing AES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

7.2.1 Straightforward AES by the Book . . . . . . . . . . . . . . . . . . . 677.2.2 AES Optimized for the TriCore Architecture . . . . . . . . . . . . . 747.2.3 AES Protected with CPRR13 . . . . . . . . . . . . . . . . . . . . . 787.2.4 AES Protected with SKHL13 . . . . . . . . . . . . . . . . . . . . . . 817.2.5 Verification Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7.3 Findings on Execution Timing . . . . . . . . . . . . . . . . . . . . . . . . 847.3.1 Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 847.3.2 Compiler Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 857.3.3 Memory Management Tricks . . . . . . . . . . . . . . . . . . . . . . 87

7.4 Key Takeaways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Contents iii

8 Side-Channel Analysis 898.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

8.1.1 Technical Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 898.1.2 Practical Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 89

8.2 Attacking the Implementations . . . . . . . . . . . . . . . . . . . . . . . . 908.2.1 Initial Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . 908.2.2 Common parameters for all attacks . . . . . . . . . . . . . . . . . . 918.2.3 Unprotected 8-bit AES . . . . . . . . . . . . . . . . . . . . . . . . . 928.2.4 Unprotected 32-bit AES . . . . . . . . . . . . . . . . . . . . . . . . 958.2.5 AES protected with SKHL13 . . . . . . . . . . . . . . . . . . . . . . 968.2.6 AES protected with CPRR13 . . . . . . . . . . . . . . . . . . . . . . 99

8.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

9 Conclusion 1039.1 Breaking AES on TriCore using Power Analysis Attacks . . . . . . . . . . 1039.2 Protecting AES on TriCore against Power Analysis Attacks . . . . . . . . 103

10 Future Work 10510.1 TC1797 Profiling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10510.2 Optimizing the Implementations . . . . . . . . . . . . . . . . . . . . . . . 10510.3 Additional Countermeasures . . . . . . . . . . . . . . . . . . . . . . . . . 10610.4 Additional Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10610.5 Extended Software Tooling . . . . . . . . . . . . . . . . . . . . . . . . . . 10610.6 Real-World Scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10610.7 Newest Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

A Acronyms 109

B Appendix 113B.1 Lookup Tables for the Kim-Hong-Lim Scheme . . . . . . . . . . . . . . . 113

List of Figures 115

List of Tables 117

List of Algorithms 118

List of Listings 121

Bibliography 123

1 Introduction

In recent years the world has seen an increasing trend towards ubiquitous computing.Many areas of everyday life are now pervaded by embedded electronic devices. Forexample, many humans have become accustomed to using smart phones and tablets.Such devices are built from complex and powerful hardware components. Softwarerunning on those devices provides the user with a broad range of features like telephony,photography, or internet access. At the same time many embedded devices lack even themost basic protection against eavesdropping and manipulation. Generally speaking, thetrend towards pervasive computing carries along the need for ubiquitous cryptography.When applied correctly, cryptography can mitigate many of the risks modern technologybears.

Other industries aside from the mobile phone sector are affected by identical trends.The automotive industry poses as a great example. Some thirty years ago, a car usedto be a mechanical object that could be repaired using mechanical tools. Starting inthe early 1990s, an increasing amount of electronic units was built into all kinds ofmodels ranging from the simple budget car to top of the line luxury models. Somemodern upper class vehicles contain more than 100 Electronic Control Units (ECUs)[MNBSL10]. The ECUs control a broad range of infotainment, comfort, and safetysystems. Infotainment features include navigation systems, radio units, and full mediasystems for passengers. In the area of safety we are used to systems like Antilock BrakingSystem (ABS) and Electronic Stability Control (ESC). Comfort features like adaptivecruise control or Volvo’s Blind Spot Information System (BLIS)1 round off the picture.Today, almost all engines are fully controlled and driven by electronic units, just like thebrakes, the lights, the doors, and many other components of the vehicle. Most of theECUs exchange data over bus systems because they cannot work fully by themselves.Some recent car models even communicate with the outside world by means of mobileInternet connections. It seems clear that an increasing need for protection is arising.Two main goals are of interest with regard to the automotive setting. Firstly, data beingexchanged between components inside the car or with the environment outside the carshould be protected against eavesdropping and manipulation. Secondly, manufacturersmight be interested in tamper resistance to thwart the manipulation of firmware runningon ECUs. More individual needs may already exist or be identified in the near future.The need for automotive-related security research is clearly underlined by projects likeSCAAS (https://www.emsec.rub.de/research/projects/SCAAS/) and events like theannual escar (Embedded Security in Cars) conference.

1BLIS uses cameras inside the exterior rear-view mirrors to warn the driver of vehicles in the blind spot.A short explanatory video is available at http://www.volvocars.com/uk/top/my_volvo/videos/pages/volvo-blis.aspx.

https://www.emsec.rub.de/research/projects/SCAAS/http://www.volvocars.com/uk/top/my_volvo/videos/pages/volvo-blis.aspxhttp://www.volvocars.com/uk/top/my_volvo/videos/pages/volvo-blis.aspx

2 1 Introduction

1.1 Motivation

According to Infineon, TriCore products are used in every second car that is manufacturedtoday. For example the TC1797 is used in a broad range of engine control unitsmanufactured by Bosch which can then be found in models like Audi RS5, Land RoverDefender, and Mercedes-Benz G65 AMG. In fact a simple Internet search featuring termslike “TriCore” and “ECU” directly reveals detailed instructions on how to perform chiptuning by means of ECU manipulation. Some search results even include images thatshow pictures of the contact points required to boot the respective TriCore product.

We mentioned earlier that the automotive industry might be in need of means to protectECU firmware or inter-ECU communication. Employing cryptography could be oneapproach to providing the required security. The Advanced Encryption Standard (AES)is in widespread use for encryption and the TC1797 is one of the top TriCore productsavailable as of today. Therefore we felt motivated to analyze the chances of mountingpower analysis attacks against AES on the TC1797. We felt that the mere recovery ofAES keys by means of power analysis would prove nothing new. Thus we decided toaim at creating AES implementations that are protected against 1𝑠𝑡-order power analysisattacks.

1.2 Related Work

In this section we give a short overview of related work. We present details of thementioned publications later in this thesis.

We know of exactly one scientific work that is directly related to the topic of ourthesis. In June 2009 Andreas Hoheisel submitted his Master’s Thesis [Hoh09] in which hedevelops a side-channel resistant implementation of AES for a TriCore TC1796. While hisimplementation uses masking based on table recomputation, Hoheisel never performed anypower consumption analysis but focused on potential timing attacks instead. Nevertheless,we are able to relate our work to selected parts of his thesis.

AES itself is defined by [FIP01]. The standard was published on November 26, 2001by the National Institute of Standards and Technology (NIST). It specifies the Rijndaelalgorithm and names it “the AES algorithm”. Rijndael was developed by Joan Daemenand Vincent Rijmen. A book titled “The Design of Rijndael” [DR02] provides detailedinsight into design decisions and inner workings of AES.

In the area of power analysis attacks, [KJJ98] is the seminal whitepaper in which theauthors introduce the notion of both Differential Power Analysis (DPA) and SimplePower Analysis (SPA). A third form of power analysis, called CPA, was introducedlater [BCO04]. We additionally refer to [MOP07] as a source of profound theoreticalknowledge about attacks and countermeasures. We explain DPA, SPA, and CPA indetail in Chap. 3.

When it comes to practical countermeasures, four papers are of vital importance forour thesis. All of those papers deal with software countermeasures against power analysisattacks. Firstly, we refer to [RP10a] and [CPRR13]. The former introduces the masking

1.3 Our Contribution 3

scheme itself while the latter exhibits a security flaw in this proposal and provides a fixfor that flaw. Secondly, we found that [KHL11] poses as a viable alternative to [RP10a].Finally, we analyzed the ideas presented in [GM11] in order to have a greater numberof potential implementation alternatives. We give precise explanations of the distinctcountermeasures, along with additional references, in Chap. 6.

1.3 Our ContributionTo the best of our knowledge there has not been much public research involving thecombination of Side-Channel Analysis (SCA) and products from the TriCore family. Weonly know of [Hoh09]. We intend to improve this situation by conducting basic workon the software development for, and special features of, one product from the TriCorefamily: The TC1797. We put up the following two propositions in accordance with thesubject of our thesis:

Proposition 1.3.1 The key of an unprotected AES implementation running on theTC1797 can be recovered using power analysis.

as well as the complementary

Proposition 1.3.2 It is possible to protect AES on the TC1797 against first-order poweranalysis attacks.

We intend to verify the substance of both propositions in the course of our work. Weassume that the results we find will serve as suitable starting points for further researchor industrial projects.

1.4 Organization of this ThesisThis thesis is divided into distinct chapters where each chapter deals with one part ofour overall work. In Chapter 1 we describe our goal and our motivation. We give anoverview of related work and define general terms that span across the whole thesis. InChapter 2 we present the algorithms comprising AES and their mathematical foundation.In Chapter 3 we give an introduction to the concept of power analysis attacks. Wepresent different approaches and select one of them for our work. In Chapter 4 weintroduce the reader to the target platform we selected for this thesis. We describe itsfeatures and the design philosophy behind the platform’s architecture. In Chapter 5we give a detailed description of our working environment. We cover all hardware andsoftware components comprising our development and testing environment and concludethe chapter with a report about our measurement setup. In Chapter 6 we look at generalprinciples of software countermeasures against side-channel attacks. Subsequently wepresent recent publications proposing such countermeasures. We compare the proposalsand select candidates for implementation. In Chapter 7 we give an in-depth report ofhow we created unprotected and protected implementations of AES. We explain whathas to be taken into account when writing code for a microcontroller and summarize

4 1 Introduction

the key takeaways at the end of the chapter. In Chapter 8 we show how we attackedour AES implementations and which observations we made during this practical securityanalysis. In Chapter 9 we explain the conclusions we drew from the results we found.In Chapter 10 we present our opinion regarding potentially interesting areas of furtherresearch. We also discuss some general ideas we have around SCA, attacks and protection,and software used in scientific settings.

2 Background: AES

The term “Advanced Encryption Standard”, in short AES, does not provide muchinformation about itself to the reader. The original algorithm that became standardizedas the AES is called Rijndael. It was named after its developers, Joan Daemen andVincent Rijmen. In this chapter, we recall how the Rijndael algorithm became the currentde-facto standard for symmetric encryption. Subsequently, we describe the algorithmand its mathematical foundation.

2.1 The AES Contest

The term “Advanced” in the name of the standard hints towards a predecessor, which inour case is represented by the Data Encryption Standard (DES). The DES was specifiedwith the release of FIPS-46 by the National Bureau of Standards (NBS)1 in 1977. Thestandard was revalidated in 1983, 1988, 1993, and finally in 1999. This final revalidation,FIPS 46-3 [FIP99], declared 3DES the preferred implementation and restricted plainDES to legacy systems.

Between 1997 and 1999, RSA Security put up a total of four challenges to break DESencryption using brute force attacks. While the first challenge took 96 days to solve, thefourth challenge was solved within less than 24 hours. Additionally, Eli Biham and AdiShamir had published multiple papers on differential cryptanalysis. With the publicationof [BS92], a theoretic attack with lower complexity than brute force existed. It becameevident that the time had come to find a new algorithm to be used as both governmentand industry standard for data encryption.

In 1997, NIST started the AES contest. A total of 15 candidate algorithms weresubmitted. The contest consisted of multiple rounds and narrowed the candidates downto five finalists:

∙ Rijndael, designed by Vincent Rijmen and Joan Daemen.

∙ Twofish, designed by Bruce Schneier, Niels Ferguson, John Kelsey, Doug Whiting,David Wagner, and Chris Hall.

∙ Serpent, designed by Ross Anderson, Eli Biham, and Lars Knudsen.

∙ RC6, designed by Ron Rivest.

∙ MARS, designed by a team of IBM researchers, including Don Coppersmith.1known as NIST since 1988.

6 2 Background: AES

In October 2000, NIST announced Rijndael as the winner of the AES contest. Fol-lowing a drafting and request-for-comments phase, NIST published FIPS-197 [FIP01]on November 26, 2001. The standard became effective on May 26, 2002. Since thisday, Rijndael is the de-facto standard for symmetric encryption. Its predecessor, DES,co-existed within clearly set boundaries until NIST declared FIPS-46-3 invalid in 2005.

2.2 The Rijndael AlgorithmIn this section, we dive into the internals of Rijndael. The algorithm specified as theAES fixes certain parameters like the key length, which makes it a slight modification ofthe original (more generic) Rijndael algorithm. Henceforth, we use the term AES:

Definition 2.2.1 Throughout this thesis, we use the term AES to indicate the algorithmstandardized in FIPS-197, not the document itself.

For the scope of this thesis, we decided to work with a key length of 128 bits. In otherwords, this means that we focus on AES-128. Where applicable, we provide algorithmdescriptions tailored to this specific configuration.

2.2.1 Basic PropertiesAES is a block cipher based on a Substitution-Permutation Network (SPN). Input andoutput blocks have a fixed length of 128 bits. The internal state of the cipher consists of128 bits as well, arranged as 𝑁𝑏 = 4 columns, where each column contains four bytes (one32-bit word). The key length can be chosen from 128, 192, and 256 bits. As mentionedbefore, we fix the key length at 𝑛 = 128 bits or 𝑁𝑘 = 4 32-bit words. The combinationof 𝑁𝑏 = 4 and 𝑁𝑘 = 4 determines that the number of rounds amounts to 𝑁𝑟 = 10.

In terms of data, AES operates on bytes. One block of input, the state, and the outputare each represented by 16 bytes (four 32-bit words). The same holds for the key in ourcase. Next, we describe the cipher.

2.2.2 Mathematical FoundationEach byte manipulated in the AES during encryption or decryption represents an elementof GF(28). This finite field is constructed over GF(2) using the irreducible polynomial

𝑃 (𝑥) = 𝑥8 + 𝑥4 + 𝑥3 + 𝑥 + 1 . (2.1)

Each element of GF(28) is represented by a polynomial

𝑎(𝑥) =7∑︁

𝑖=0𝑎𝑖𝑥

𝑖 = 𝑎7𝑥7 + 𝑎6𝑥6 + · · ·+ 𝑎1𝑥 + 𝑎0

where all coefficients 𝑎𝑖 ∈ GF(2). The degree and the binary coefficients make iteasy to encode such a polynomial as a byte by mapping the coefficients to distinct bit

2.2 The Rijndael Algorithm 7

positions. For example, the polynomial 𝑎(𝑥) = 𝑥7 + 𝑥5 + 𝑥2 + 1 has 𝑎7, 𝑎5, 𝑎2, 𝑎0 = 1and 𝑎6, 𝑎4, 𝑎3, 𝑎1 = 0. This gives the binary notation 101001012 which is equal to 0xA5in hexadecimal notation. Whenever we refer to an element of GF(28) in hexadecimalnotation, we denote this by using curly braces, for example, {A5}.

Over GF(28), addition and multiplication are defined. We recall the rules for bothoperations in the following.

Addition is performed coefficient-wise. The coefficients are added modulo 2 becausethey come from GF(2). Bitwise addition modulo 2 corresponds to calculating theExclusive Or (XOR) of both values. This gives

𝑎(𝑥) + 𝑏(𝑥) =7∑︁

𝑖=0𝑎𝑖𝑥

𝑖 +7∑︁

𝑖=0𝑏𝑖𝑥

𝑖 =7∑︁

𝑖=0(𝑎𝑖 ⊕ 𝑏𝑖)𝑥𝑖

where ⊕ denotes the XOR operation. The XOR calculation is not performed for singlebits at a time but instead it is applied to whole bytes. Because the coefficients are fromGF(2), addition is technically equivalent to subtraction. The neutral element is given by{00} ∈ GF(28).

Multiplication is performed like regular polynomial multiplication followed by thereduction of the product modulo 𝑃 (𝑥) as given in (2.1). While the product withoutmodular reduction would still be equivalent to an element of GF(28), it is technicallyimportant to perform the modular reduction because only then can the result stillbe represented as a single byte. The neutral element for multiplication is given by{01} ∈ GF(28). A multiplicative inverse exists for all non-zero elements of GF(28)because 𝑃 (𝑥) is irreducible. The multiplicative inverse can be found in different ways,one of them being the Extended Euclidean Algorithm (EEA). We discuss alternatives tothis approach in Chap. 6.

2.2.3 Encryption

In order to describe the cipher, we first reproduce the pseudo code from [FIP01]. Wesubstitute concrete numbers for the symbolic parameters. Algorithm 2.2.1 shows theresulting steps for a single encryption. First, the state is initialized with the input.Next, an initial AddRoundKey operation is performed. Then, the same four operationsare repeated nine times in the same order: SubBytes, ShiftRows, MixColumns, andAddRoundKey. The 10𝑡ℎ and final round is almost the same except for the missingMixColumns operation. Finally, the state is copied into the output. In the following, wetake a look at the four elementary operations defined in AES.

SubBytes

SubBytes substitutes each individual byte of the state by a new value. StraightforwardAES implementations usually resort to a static lookup table for performance reasons. Incontrast, implementations resistant to side-channel attacks often compute the SubBytesoperation dynamically. Therefore we describe the computation in detail.

8 2 Background: AES

Algorithm 2.2.1: AES-128 Cipher – EncryptionData: bytes 𝑖𝑛[16], words 𝑤[44]Result: bytes 𝑜𝑢𝑡[16]

1 begin2 byte 𝑠𝑡𝑎𝑡𝑒[16]3 𝑠𝑡𝑎𝑡𝑒← 𝑖𝑛45 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[0..3])6 for 𝑟𝑜𝑢𝑛𝑑← 1 to 9 do7 SubBytes(𝑠𝑡𝑎𝑡𝑒)8 ShiftRows(𝑠𝑡𝑎𝑡𝑒)9 MixColumns(𝑠𝑡𝑎𝑡𝑒)

10 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[𝑟𝑜𝑢𝑛𝑑 * 4, (𝑟𝑜𝑢𝑛𝑑 + 1) * 4])1112 SubBytes(𝑠𝑡𝑎𝑡𝑒)13 ShiftRows(𝑠𝑡𝑎𝑡𝑒)14 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[40..43])1516 𝑜𝑢𝑡← 𝑠𝑡𝑎𝑡𝑒

Computing the output of SubBytes consists of two steps. First, the multiplicativeinverse of the input is computed over GF(28), where {00} is mapped to itself. Second,the resulting byte is subjected to an affine transformation. In matrix notation, thistransformation is denoted as⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

𝑏′0𝑏′1𝑏′2𝑏′3𝑏′4𝑏′5𝑏′6𝑏′7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

1 0 0 0 1 1 1 11 1 0 0 0 1 1 11 1 1 0 0 0 1 11 1 1 1 0 0 0 11 1 1 1 1 0 0 00 1 1 1 1 1 0 00 0 1 1 1 1 1 00 0 0 1 1 1 1 1

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠·

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

𝑏0𝑏1𝑏2𝑏3𝑏4𝑏5𝑏6𝑏7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠+

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

11000110

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠(2.2)

where 𝑏0 is the Least Significant Bit (LSB) and 𝑏7 is the Most Significant Bit (MSB).The result of the affine transformation serves as the substitute for the input value.

ShiftRows

ShiftRows performs a cyclical byte-wise left shift on the rows of the state. The first row isnot shifted. The second row is shifted by one byte, the third row is shifted by two bytes,


and the fourth row is shifted by three bytes. This gives⎡⎢⎢⎢⎣𝑠0,0 𝑠0,1 𝑠0,2 𝑠0,3𝑠1,0 𝑠1,1 𝑠1,2 𝑠1,3𝑠2,0 𝑠2,1 𝑠2,2 𝑠2,3𝑠3,0 𝑠3,1 𝑠3,2 𝑠3,3

⎤⎥⎥⎥⎦ 123

⎡⎢⎢⎢⎣𝑠0,0 𝑠0,1 𝑠0,2 𝑠0,3𝑠1,1 𝑠1,2 𝑠1,3 𝑠1,0𝑠2,2 𝑠2,3 𝑠2,0 𝑠2,1𝑠3,3 𝑠3,0 𝑠3,1 𝑠3,2

⎤⎥⎥⎥⎦where the matrix on the right represents the new state after ShiftRows has completed.

MixColumns

MixColumns operates on each column of the state. A column is interpreted as a four-termpolynomial with coefficients from GF(28). Each column is multiplied modulo 𝑥4 + 1 withthe fixed polynomial

𝑎(𝑥) = {03}𝑥3 + {01}𝑥2 + {01}𝑥 + {02} .

This multiplication can be written in matrix form as⎛⎜⎜⎜⎝𝑠′0,𝑐𝑠′1,𝑐𝑠′2,𝑐𝑠′3,𝑐

⎞⎟⎟⎟⎠ =⎛⎜⎜⎜⎝

02 03 01 0101 02 03 0101 01 02 0303 01 01 02

⎞⎟⎟⎟⎠ ·⎛⎜⎜⎜⎝

𝑠0,𝑐𝑠1,𝑐𝑠2,𝑐𝑠3,𝑐

⎞⎟⎟⎟⎠ (2.3)where 0 ≤ 𝑐 < 4 indexes the column being processed.

AddRoundKey

Finally, AddRoundKey adds one word of the key schedule to each column of the state bymeans of an XOR operation. That is⎛⎜⎜⎜⎝

𝑠′0,𝑐𝑠′1,𝑐𝑠′2,𝑐𝑠′3,𝑐

⎞⎟⎟⎟⎠ =⎛⎜⎜⎜⎝


⎞⎟⎟⎟⎠⊕ 𝑤[4 * 𝑟 + 𝑐]where 𝑤 is an array of four-byte words that is indexed based on the round 𝑟 and thecolumn 𝑐 being processed. We explain the meaning of the array 𝑤 in the following.

Key Expansion

Up to now, we have not clarified what the key schedule is and how it is created. We seefrom Alg. 2.2.1 that one full encryption requires a total of 11 AddRoundKey invocations.The term “round key” refers to the use of one individual key in each round. Thoseindividual keys comprise what is called the key schedule of the cipher. The round keysare derived from the cipher key based on a specific procedure called key expansion.

10 2 Background: AES

Algorithm 2.2.2: AES-128 Key ExpansionData: bytes 𝑘𝑒𝑦[16]Result: words 𝑤[44]

1 begin2 word 𝑡𝑒𝑚𝑝34 𝑤[0..3]← 𝑘𝑒𝑦[0..15]56 𝑖← 47 while 𝑖 < 44 do8 𝑡𝑒𝑚𝑝 = 𝑤[𝑖− 1]9 if 𝑖 mod 4 = 0 then

10 𝑡𝑒𝑚𝑝← SubWord(RotWord(𝑡𝑒𝑚𝑝))⊕ Rcon[𝑖/4]11 𝑤[𝑖]← 𝑤[𝑖− 4]⊕ 𝑡𝑒𝑚𝑝12 𝑖← 𝑖 + 1

In our case, the cipher key consists of 128 bits, and we need a total of 11 round keysof the same individual length. We provide the key expansion specification, tailored toAES-128, as Alg. 2.2.2. Executing this algorithm yields the array 𝑤 we mentioned earlier.

We see from the algorithm that the first four 32-bit words of the key schedule are filledwith the cipher key. Inside the loop, SubWord simply applies the S-box to each bytein the given word while RotWord cyclically rotates the word one byte to the left, i.e.,[𝑎0, 𝑎1, 𝑎2, 𝑎3] becomes [𝑎1, 𝑎2, 𝑎3, 𝑎0]. Rcon is an array that contains the value of 𝑥𝑖−1 inGF(28), starting at 𝑖 = 1. We see from the loop index 𝑖 that we require only the first tenvalues of Rcon. This fact strongly reduces the amount of memory required for the Rconlookup table.

2.2.4 DecryptionDecryption works by performing inverse operations in a modified order as depicted inAlg. 2.2.3. In the following, we take a look at the inverse operations InvShiftRows,InvSubBytes, and InvMixColumns.

InvShiftRows

InvShiftRows does exactly the opposite of ShiftRows by shifting rows of the state cyclicallyto the right. The first row is not shifted. The second row is shifted by one byte, the thirdrow is shifted by two bytes, and the fourth row is shifted by three bytes. This gives⎡⎢⎢⎢⎣

𝑠0,0 𝑠0,1 𝑠0,2 𝑠0,3𝑠1,0 𝑠1,1 𝑠1,2 𝑠1,3𝑠2,0 𝑠2,1 𝑠2,2 𝑠2,3𝑠3,0 𝑠3,1 𝑠3,2 𝑠3,3

⎤⎥⎥⎥⎦ �1�2�3

⎡⎢⎢⎢⎣𝑠0,0 𝑠0,1 𝑠0,2 𝑠0,3𝑠1,3 𝑠1,0 𝑠1,1 𝑠1,2𝑠2,2 𝑠2,3 𝑠2,0 𝑠2,1𝑠3,1 𝑠3,2 𝑠3,3 𝑠3,0

⎤⎥⎥⎥⎦


Algorithm 2.2.3: AES-128 Cipher – DecryptionData: bytes 𝑖𝑛[16], words 𝑤[44]Result: bytes 𝑜𝑢𝑡[16]

1 begin2 byte 𝑠𝑡𝑎𝑡𝑒[16]3 𝑠𝑡𝑎𝑡𝑒← 𝑖𝑛45 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[40..43])6 for 𝑟𝑜𝑢𝑛𝑑← 9 to 1 do7 InvShiftRows(𝑠𝑡𝑎𝑡𝑒)8 InvSubBytes(𝑠𝑡𝑎𝑡𝑒)9 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[𝑟𝑜𝑢𝑛𝑑 * 4, (𝑟𝑜𝑢𝑛𝑑 + 1) * 4])

10 InvMixColumns(𝑠𝑡𝑎𝑡𝑒)1112 InvShiftRows(𝑠𝑡𝑎𝑡𝑒)13 InvSubBytes(𝑠𝑡𝑎𝑡𝑒)14 AddRoundKey(𝑠𝑡𝑎𝑡𝑒, 𝑤[0..3])1516 𝑜𝑢𝑡← 𝑠𝑡𝑎𝑡𝑒

where the matrix on the right represents the new state after ShiftRows has completed.

InvSubBytes

InvSubBytes applies the inverse S-box individually to all bytes of the state by computingthe inverse of the affine function given in (2.2) followed by inversion in GF(28). Theinverse of the affine function is given in [Gla07] as

𝑏′𝑖 = 𝑏(𝑖+2) mod 8 ⊕ 𝑏(𝑖+5) mod 8 ⊕ 𝑏(𝑖+7) mod 8 ⊕ 𝑑𝑖with 𝑑 = {05}. We can write this in matrix form as⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

𝑏′0𝑏′1𝑏′2𝑏′3𝑏′4𝑏′5𝑏′6𝑏′7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠=

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

0 0 1 0 0 1 0 11 0 0 1 0 0 1 00 1 0 0 1 0 0 11 0 1 0 0 1 0 00 1 0 1 0 0 1 00 0 1 0 1 0 0 11 0 0 1 0 1 0 00 1 0 0 1 0 1 0

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠·

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

𝑏0𝑏1𝑏2𝑏3𝑏4𝑏5𝑏6𝑏7

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠+

⎛⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝

10100000

⎞⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠. (2.4)

The same remark as for SubBytes holds for InvSubBytes: The inverse S-box is usuallyimplemented as a lookup table, but is often computed in a secure way in the contextof side-channel countermeasures. Therefore, we provide the detailed description of theinverse affine transformation here.

12 2 Background: AES

InvMixColumns

InvMixColumns is the inverse of MixColumns. The columns of the state are multipliedmodulo 𝑥4 + 1 by the fixed polynomial

𝑎−1(𝑥) = {0𝑏}𝑥3 + {0𝑑}𝑥2 + {09}𝑥 + {0𝑒} .

This multiplication can again be written in matrix form as⎛⎜⎜⎜⎝𝑠′0,𝑐𝑠′1,𝑐𝑠′2,𝑐𝑠′3,𝑐

⎞⎟⎟⎟⎠ =⎛⎜⎜⎜⎝

0𝑒 0𝑏 0𝑑 0909 0𝑒 0𝑏 0𝑑0𝑑 09 0𝑒 0𝑏0𝑏 0𝑑 09 0𝑒

⎞⎟⎟⎟⎠ ·⎛⎜⎜⎜⎝


⎞⎟⎟⎟⎠where 0 ≤ 𝑐 < 4 indexes the column being processed.

2.2.5 Implementation OutlookThe internal data structures of AES primarily facilitate software implementations of thecipher. The input, the state, the output, and the key schedule are easy to representas arrays that contain bytes. AES can be implemented in a straight-forward manneron 8-bit processors by simply converting the pseudo code from the specification intocompilable code in a language like C.

In our case, the target system features a 32-bit Central Processing Unit (CPU). Itis not efficient to process single bytes at a time on a 32-bit processor because doing soleaves 24 bits of the affected registers unused. It is highly beneficial to manipulate full32-bit words whenever this is possible. To our advantage, many of the AES operationscan be optimized in such a way that they use the full potential of the 32-bit architecture.Firstly, some operations like AddRoundKey can operate on full four-byte words at onetime. Secondly, there are cases in which it is possible to combine up to four individual8-bit operations in one 32-bit register by using special instructions. We describe thetarget platform in detail in Chap. 4, and discuss AES optimization opportunities andlimits in Chap. 7.

3 Background: Power Analysis Attacks

Cryptographic algorithms have always been under attack by adversaries who intend toreveal the secret message, the secret key, or both. Many attacks exist, ranging from plainbrute force to the exploitation of statistical properties in cipher texts or simply designflaws in the targeted algorithm. Those attacks represent classic forms of cryptanalysis.

Other forms of cryptanalysis include linear cryptanalysis [MY92] where the goal is tofind affine approximations of a cipher, and differential cryptanalysis [BS90] where theresults of subtle changes to the input of a cipher are studied in order to discover potentialweaknesses. Linear and differential cryptanalysis have evolved from plain attack formsinto scientific methods which are applied during the design phase of new cryptographicalgorithms.

One of the most prevalent cryptanalytic techniques today is called side channel analysis.Instead of mounting a direct attack on a cryptographic algorithm, the attacker attemptsto gain useful information by observing a physical implementation of the cipher. Thedesired information flows from the cryptographic device to the attacker through what iscalled a side channel. Common types of side channels include

∙ Sounds produced by mechanical devices like the Enigma, lock dials, or evencomputers [ST04],

∙ Time required to perform a cryptographic operation [Koc96],

∙ Electromagnetic Fields caused by cryptographic devices (see for example [Kas11]),

∙ Light emitted from devices or distinct parts (see for example [SNK+12]), and

∙ Power Consumption of a cryptographic device during the execution of a crypto-graphic algorithm [KJJ98].

Information gathered from side channels is usually stored and then analyzed in orderto validate or invalidate one or more hypotheses put up by the attacker. We are primarilyinterested in power analysis attacks, that is, attacks using the power consumption of acryptographic device as the side channel. In the following, we describe three types ofpower analysis attacks. We subsequently select one of the three approaches for the restof this thesis.

3.1 Simple Power AnalysisSimple Power Analysis (SPA) was introduced by Kocher et al. in [KJJ98]. In an SPAattack, recorded power traces are directly interpreted, for example by visual inspection.

14 3 Background: Power Analysis Attacks

Square and Multiply Exponentiation is a well-known example when it comes to SPA. Ifsquarings and multiplications can be clearly distinguished in a power trace, an attackercan derive the secret exponent directly from the plotted trace. But also block ciphers arepotentially vulnerable to SPA. When [KJJ98] was published, DES was the standard fordata encryption. The paper shows power traces of a DES encryption as examples of theinformation an attacker can uncover. In the case of AES some SPA attacks have beenlaunched against the key schedule [BS99, Man02].

SPA attacks become possible when the timing or the order of operations (i.e., conditionalbranches) depend on the data being processed. It is thus important to ensure, as faras possible, that an implementation of a cryptographic algorithm features a constantexecution path, effectively suppressing most SPA-enabling characteristics.

3.2 Differential Power AnalysisIn Chapter 4 of [KJJ98], the authors note that “In addition to large-scale power varia-tions due to the instruction sequence, there are effects correlated to data values beingmanipulated.” Even in the presence of noise and measurement errors, those smaller effectscan be exploited using Differential Power Analysis (DPA).

Because we want to recover a secret AES key, we adapt the attack on DES proposedby Kocher et al. to AES. We choose a naive approach and try to attack the intermediatevalue after the first S-box in the first round. In order to do so, we first record 1000 powertraces of an unprotected implementation, sending random plaintexts to the device. Westore the plaintexts in a separate file. We know that the targeted intermediate dependsonly on the first plaintext byte and on the first key byte. We select the least significantbit of the intermediate as our target. The DPA selection function is then

𝐷(𝑃, 𝑏, 𝐾)

where the plaintext byte 𝑃 is known, the bit index 𝑏 is 0 because we target the leastsignificant bit of the intermediate, and the value of the hypothetical key byte 𝐾 rangesfrom 0 to 255. We know that the intermediate value 𝐼 is given as

𝐼 = 𝑆(𝑃 ⊕𝐾)

where 𝑆(. . . ) denotes the S-box. The concrete selection function becomes

𝐷(𝑃, 𝑏, 𝐾) = LSB(𝐼) = LSB(𝑆(𝑃 ⊕𝐾)) .

We calculate a running average of the traces where 𝐷 = 1 and a running average of thetraces where 𝐷 = 0. After all traces have been processed, we subtract the two averagesand get a differential trace Δ𝐷. If the guessed key byte is correct, the differential tracewill show a clearly visible spike while it will be mostly flat if the guess was incorrect.This is due to the fact that parts of the trace that are unrelated to 𝐷 get smaller with

1√𝑚

, m being the number of measurements, while the parts of the trace that are directlyrelated (correlated) to 𝐷 stay in place and add up with increasing 𝑚.

3.3 Correlation Power Analysis 15

To give an impression of differential traces we have created three plots. We used theMatlab examples accompanying [MOP07] with kind permission from Elisabeth Oswald.The data consists of 200 power traces at 5000 samples per trace. The plots show thedifference traces resulting from an attack on the first S-box in the first round of AES. Asingle-bit power model targeting the LSB of the S-box output was used to predict powerconsumption. The differential traces were computed as the difference between the sum ofthose power traces where the LSB of the predicted power consumption is 1 and the sumof those power traces where the LSB of the predicted power consumption is 0. Althoughthis computation is simplified in comparison to the original formula given in [KJJ98], wekeep the notation and call the resulting differential traces Δ𝐷.

In the following, we describe the plots in order to clarify their meaning. Figure 3.1shows the differential trace that results from the wrong key guess 𝐾 = 6. It is easy to seethat there are no significant spikes in the plot. Figure 3.2 shows the differential trace forthe correct key guess 𝐾 = 43. In contrast to Fig. 3.1, the spikes are immediately visible.To show a downside of DPA, we present Fig. 3.3. This differential trace results fromthe wrong key guess 𝐾 = 1. The curve roughly spans the range from -3000 to +3000while the first two curves reached maximal values around ±400 and ±650 respectively.This is bad for at least two reasons. Firstly, the overall height of the three plots we havecreated is not identical. This hinders comparison of the curves by visual inspection. Onecould try to remedy this problem by creating all plots with identical y axis limits. Inthis case the problem would remain the same because the ±650 spikes would hardly bevisible relative to a ±3000 y axis. Secondly, an automated decision about the correct keyhypothesis seems impossible in the case of our example. If the software that is supposedto make this decision simply looked for the highest peak, it would by no means select𝐾 = 43 as the best key candidate.

The idea of DPA can be summarized as the analysis of the difference in power con-sumption caused by changes to the plaintext or ciphertext, depending on the cipherround being attacked.

3.3 Correlation Power Analysis

DPA can be used to quickly mount an attack on any physical implementation. It requiresrather low amounts of computational effort and the power model is simple because itusually takes exactly one bit of an intermediate value into account. DPA does in factexhibit the correlation between input, key, and power consumption, but the difference ofmeans is not a very strong metric. There is however a more flexible and more powerful toolfor the evaluation phase of a power analysis attack. In [BCO04] Brier, Clavier, and Olivierintroduced an approach called Correlation Power Analysis (CPA). They propose to usean established and well-researched method to measure the linear correlation between twovariables: The Pearson Correlation Coefficient1. Papers like [Man04] explicitly endorsethe usage of the correlation coefficient as the mathematical tool in power analysis attacks.

1In short “correlation coefficient”.


0 1,000 2,000 3,000 4,000 5,000

−500

0

500

Samples

Δ𝐷

Figure 3.1: Differential trace for the wrong guess 𝐾 = 6

0 1,000 2,000 3,000 4,000 5,000

−500

0

500

Samples

Δ𝐷

Figure 3.2: Differential trace for the correct guess 𝐾 = 43

0 1,000 2,000 3,000 4,000 5,000

−2,000

0

2,000

Samples

Δ𝐷

Figure 3.3: Differential trace for the wrong guess 𝐾 = 1

3.3 Correlation Power Analysis 17

The correlation 𝜌 of two variables 𝑋 and 𝑌 is defined as

𝜌(𝑋, 𝑌 ) = 𝐶𝑜𝑣(𝑋, 𝑌 )√︀𝑉 𝑎𝑟(𝑋) · 𝑉 𝑎𝑟(𝑌 )

where 𝐶𝑜𝑣(𝑋, 𝑌 ) is the covariance and 𝑉 𝑎𝑟(𝑋) resp. 𝑉 𝑎𝑟(𝑌 ) is the variance of 𝑋 and𝑌 . When we attack a physical implementation using power analysis, we do not look atrandom populations sharing a common distribution but instead we analyze samples. ThePearson correlation coefficient for samples, also called empirical correlation coefficient, isdefined as

𝑟𝑥,𝑦 =∑︀𝑁

𝑛=1 (𝑥𝑛 − �̄�)(𝑦𝑛 − 𝑦)√︁∑︀𝑁𝑛=1 (𝑥𝑛 − �̄�)2

√︁∑︀𝑁𝑛=1 (𝑦𝑛 − 𝑦)2

(3.1)

where 𝑥 and 𝑦 denote the two sets of samples, �̄� and 𝑦 denote the mean of each set, and𝑁 is the number of samples.

We must now clarify between which sets we want to calculate a linear correlation. ACPA attack generally consists of five important steps which are explained in [MOP07]and in university courses like “Implementation of Cryptographic Schemes 1” [Paa12]. Webriefly recall the steps in the following. To perform an attack the attacker has to:

Select an intermediate value to be attacked. We can use any intermediate variablethat is used during the execution of the algorithm. Some intermediates are suited betterthan others. The output of the first S-box in the first round of AES is often used as thetargeted intermediate.

Measure the power consumption. The attacker makes the target device encrypt ordecrypt 𝐷 different inputs. She measures the power consumption during those 𝐷 runs ofthe algorithm. The measurements are usually taken with a digital oscilloscope. The oscil-loscope samples the power consumption at a fixed frequency, resulting in 𝑇 samples perpower trace. Measuring the power consumption of 𝐷 runs of the algorithm at 𝑇 samplesper trace results in a matrix T of size 𝐷 × 𝑇 . The individual traces are usually not keptin memory but stored to the disk of a computer instead. The computer interfaces withthe target device (it sends the input and receives the output) and controls the oscilloscope.

Calculate hypothetical intermediate values. The goal of the attack is to recover thesecret key based on the power consumption measured during the 𝐷 runs of the algorithmfrom the previous step. The key is unknown to the attacker. It is thus necessary tocreate a set of hypotheses that can then be compared to the physical behavior of thedevice observed in terms of power consumption during the measurement phase. The firststep consists in defining the possible key values. Attacking a full 128-bit AES key at oncewould be too resource intensive. Instead a single byte of the key is usually attacked. Thislimits the choices for the key byte 𝑘 to the range (0, ..., 255). The next step is to calculatethe hypothetical value of the intermediate selected at the beginning of the attack. Theintermediate value depends on the key byte 𝑘 and on the corresponding byte 𝑑 of theinput used in each individual run of the algorithm. As an example, if we selected the


output of the S-box in the first round as the target for the attack, the intermediate valuewould be

𝑣 = 𝑓(𝑑, 𝑘) = 𝑆(𝑑⊕ 𝑘)

where 𝑑 is one byte of the input, 𝑘 is one hypothetical byte of the key, and 𝑆(. . . ) denotesthe S-box. If we further decided to attack the first byte of the key we would calculate 𝑣for each individual first byte of the 𝐷 inputs and each possible choice for 𝑘. This resultsin a matrix V of size 𝐷×𝐾 that contains the hypothetical intermediate values. Formallywe have

V𝑖,𝑗 = 𝑓(𝑑𝑖, 𝑘𝑗) 𝑖 = 1, . . . , 𝐷 𝑗 = 1, . . . , 𝐾 .

The next step is to map the hypothetical intermediate values to hypothetical powerconsumption values.

Map hypothetical intermediate values to hypothetical power consumption.The hypothetical intermediate values cannot be compared to the measured power con-sumption because they lie in different domains. To make the comparison possible it isnecessary to map each hypothetical intermediate value to the amount of power the devicemight consume while processing the respective value. Due to noise and other effects itis impossible to predict a precise power consumption value that would be sampled bythe oscilloscope during the collection of power traces. Instead the power consumption ofthe device is described by a power model. A power model is always an approximationbecause it is nearly impossible to create a perfect model without intimate knowledge ofthe internal design of the device under attack. It is however possible to create powermodels that fit the observed behavior of the device very well. One very popular model iscalled the Hamming Weight Model. It assumes that the power consumption of a devicedepends on the Hamming weight of the data being processed. The (n-bit) Hammingweight HW(𝑣) is defined as the number of bits equal to 1 in the binary representation of𝑣. For example, HW(4216) = HW(010000102) = 2. Other models range from a simple BitModel taking only one bit of a hypothetical intermediate into account to more complexmodels like the Hamming Distance Model which calculates the Hamming distance, definedas HD(𝑣1, 𝑣2) = HW(𝑣1 ⊕ 𝑣2), for example between two subsequent values stored in aregister of the device under attack. Detailed simulation and modeling techniques aredescribed in Chapter 3 of [MOP07].

The attacker applies the selected power model to each individual entry of V andreceives a matrix H of size 𝐷 ×𝐾 that contains hypothetical power consumption values.

Compare the hypothetical power consumption with the power traces. As thefinal step of the attack, the attacker has to analyze the relation between the calculatedhypothetical power consumption values and the measured power traces. This is where thecorrelation coefficient from (3.1) is applied as a measure. The linear correlation betweenhypothetical and observed power consumption is the criterion enabling the attacker todecide which key hypothesis was most likely the correct one at the end of the attack.

3.4 Other Types of Power Analysis Attacks 19

The result of this five-step process is a correlation trace that shows clearly distin-guishable peaks both for the correct hypothesis and for the correct point in time. Thistype of trace results from difference of means evaluation as well, but using the Pearsoncorrelation coefficient has two advantages. Firstly the underlying mathematical theoryis well established and well studied. Secondly, in contrast to the difference of means,all values of the Pearson correlation coefficient lie between −1 and 1. Different attackresults are therefore very easy to compare. To give an impression of correlation traceswe have prepared three plots again. We plotted the correlation traces for the same keyguesses that we used in the previous section. Figure 3.4 shows the correlation trace forthe wrong guess 𝐾 = 6. The trace ist mostly flat. Figure 3.5 shows the correlationplot for the correct key 𝐾 = 43, and Fig. 3.6 depicts the correlation for the wrong key𝐾 = 6. In the DPA case, the latter had the extremely unfortunate property that itmade the comparison of differential traces nearly impossible due to its overall amplitude(see Fig. 3.3). In the CPA case we see that it simply leads to a flat correlation trace.Moreover, the distinct plots are easy to compare because the y axis ranges exactly from-1 to 1 due to the nature of the Pearson correlation coefficient.

3.4 Other Types of Power Analysis Attacks

There are two more approaches to power analysis attacks that we wish to note. The firstis called Inferential Power Analysis (IPA) while the other goes by the name of MutualInformation Analysis (MIA).

3.4.1 Inferential Power Analysis

Inferential Power Analysis (IPA) was presented by Fahn and Pearson [FP99] as analternative to DPA. Both DPA and CPA require that the attacker has access to eitherplaintexts or ciphertexts. Attacks are based on the power consumption of computationsthat directly combine a small part of the secret key and some part of the input or outputof the cipher. However, there are scenarios in which the attacker does not have accessto inputs or outputs of the cipher, which makes it impossible to mount a DPA or CPAattack. In contrast, IPA requires no knowledge of the inputs or outputs of the cipher.Recorded power traces are first subjected to a profiling stage. The profiling stage consistsof preprocessing steps that attempt to identify moments in which the key is involved inthe cipher computation. The profiling stage is followed by the key extraction stage inwhich the secret key is recovered based on the results of the profiling stage. IPA allowsan attacker to target the whole cipher, for example one of the inner rounds, becauseno control over plaintexts or ciphertexts is required. From the attacker’s point of viewthere is another major advantage to this type of attack. As soon as a profile for onecryptographic device has been created, the key extraction can be applied to any otherdevice that follows the same design, for example identical smart card terminal models.


0 1,000 2,000 3,000 4,000 5,000−1

−0.5

0

0.5

1

Samples

Cor

rela

tion

Figure 3.4: Correlation trace for the wrong guess 𝐾 = 6

0 1,000 2,000 3,000 4,000 5,000−1

−0.5

0

0.5

1

Samples

Cor

rela

tion

Figure 3.5: Correlation trace for the correct guess 𝐾 = 43

0 1,000 2,000 3,000 4,000 5,000−1

−0.5

0

0.5

1

Samples

Cor

rela

tion

Figure 3.6: Correlation trace for the wrong guess 𝐾 = 1

3.5 Order of Power Analysis Attacks 21

3.4.2 Mutual Information Analysis

DPA and CPA employ statistical methods to accept or reject a given hypothesis. Gier-lichs, Batina, Tuyls, and Preneel introduced Mutual Information Analysis (MIA) as analternative approach [GBTP08]. In simple terms, mutual information describes the extentto which two random variables depend on each other. The mutual information 𝐼(𝑋; 𝑌 )is zero if 𝑋 and 𝑌 are independent and goes up to the Shannon entropy 𝐻(𝑋) if 𝑋 isfully dependent on 𝑌 . The authors of [GBTP08] build a side-channel distinguisher basedon two cascaded channels. The first channel exists between words W processed insidea cryptographic device and leakage L dependant on those words. The second channelleads from the leakage L to the physical observation O. In our case, O is representedby power consumption measurements. The attacker measures O(t) in order to recordpower traces and then tries to deduce information about W from the leakage informationL contained in O. To enable this, Gierlichs et al. describe methods of estimating themutual information between O and L under a given key hypothesis. The result is similarto that of CPA in that it yields a clearly distinguishable peak in value for the correctkey hypothesis. For the fully detailed distinction between MIA and other approaches, werefer to [GBTP08], especially to Section 5.

The approach by Gierlichs and his colleagues has gained much attention in the scientificcommunity. Amongst others, Rivain and Prouff [PR09] have picked up on the idea ofMIA. MIA is also becoming a prominent tool for theoretic security evaluation, see forexample [CPRR13].

3.5 Order of Power Analysis Attacks

In our description of DPA and CPA, we discussed how a single point in time within apower trace serves as the target of the attack. As an example, we selected the moment intime at which the result of the first S-box lookup in the first round of AES is processed bythe cryptographic device we are attacking. We described how a selection function is used(DPA) and how power consumption is predicted (CPA). Kocher and his colleagues statedin their original paper that “More sophisticated selection functions may also be used. Ofparticular importance are high-order DPA functions that combine multiple samples fromwithin a trace.” [KJJ98]. With regard to this statement, using a single sample from apower trace (and thereby a single intermediate value) means that a so-called 1𝑠𝑡-orderattack is being mounted. A 2𝑛𝑑-order attack would be based on a combination functionthat takes two samples from a power trace (two intermediate values) into account, and soforth. Higher-Order DPA (HODPA) and suitable countermeasures are subjects of activeresearch [SVCO+10]. In the scope of our work, we are almost exclusively interested in1𝑠𝑡-order attacks and suitable countermeasures. Nevertheless, it is important to knowthat higher-order attacks exist: If an implementation is secure against 𝑑𝑡ℎ-order attacks,it might still be breakable by attacks of (𝑑 + 1)𝑡ℎ or even higher order.


3.6 Selecting one Type of Power AnalysisThe initial proposal of DPA proved to be a powerful tool that could be used to recoversecret keys used in cryptographic devices. DPA reveals a correlation between thekey, the chosen intermediate, and the power consumption of the cryptographic device.Nevertheless, the scientific community has improved the original idea by using the Pearsoncorrelation coefficient as the mathematical tool of choice. Mangard argues that thePearson correlation coefficient is favorable “because there exists a well-established theoryon measuring correlations this way” [Man04]. Our experience is that this statement holdsand that CPA is both more precise and more efficient than DPA. MIA is an interestingapproach that exploits the full amount of information available from the observed powerconsumption, but apart from the papers we mentioned, we have not seen or heard ofany practical MIA-based attacks. Thus we decided to use 1𝑠𝑡-order CPA to conductside-channel analysis of our AES implementations.

4 Target Platform: TriCore TC1797

Power analysis attacks aim at exploiting information gathered from the power consumptionof some hardware executing one concrete implementation of a cryptographic algorithm.The scope of our thesis encompasses software implementations of AES which implies theneed for a target platform on which we can run those implementations. In this chapterwe present the target platform we selected for our work. We give an overview of theplatform’s architecture and of those features bearing the highest relevance with regard toour goal of mounting and hindering power analysis attacks against AES.

4.1 The TriCore Concept

The term TriCore was conceived by Infineon Technologies for a family of powerful 32-bitmicrocontrollers. The central concept behind TriCore products is that they combine aReduced Instruction Set Computer (RISC) architecture with Microcontroller Unit (MCU)and Digital Signal Processing (DSP) features on a single chip, a property that explainsthe origin of the name TriCore. The first generation of TriCore products was launched in1999 under the name “AUtomotive unifieD processOr”, in short AUDO. Automotive usecases like engine management or safety systems are the main area targeted by TriCoreproducts. Nevertheless, Infineon states that they are also suited for industrial scenarioslike solar panel or wind turbine related requirements. For details we refer to [ITA12]. Wewish to note that the term TriCore must not be confused with computer-related termslike DualCore or QuadCore. TriCore does not mean that multiple CPU cores operate inparallel.

4.2 Selecting a TriCore Product

For our work we selected the Infineon TriCore TC1797 as the target platform. TheTC1797 resides in the high end section of the third AUDO generation, the “AUDOFuture” product family. It is manufactured using a 130 nm process. The maximum clockfrequency amounts to 180 MHz and there are 4 MB of on-chip program memory. TheTC1797 offers 221 digital I/O lines and 48 Analog-to-Digital Converter (ADC) channels,along with 118 timed I/O channels for purposes like Pulse Width Modulation (PWM)and other applications that require timed execution of code. For external communicationit can natively deal with up to four Controller Area Network (CAN) nodes. Additionallyit features the following interfaces by twos:

∙ Asynchronous Serial Channel (ASC)

24 4 Target Platform: TriCore TC1797

∙ Synchronous Serial Channel (SSC)

∙ Micro Second Channel (MSC)

∙ Micro Link Interface (MLI)

∙ FlexRay

This list of features makes it clear that the TC1797 is a very powerful and at the sametime a highly complex device. It seems reasonable to ask why such a piece of hardwarewould be interesting with regard to side-channel resistance. As we mentioned in theintroduction, TriCore products are used in a lot of modern vehicles. Topics like securecommunication between automotive components and tamper resistance are attractingincreasing amounts of interest from both vendors and the research community. Thisjustifies the idea of evaluating side-channel attacks against software running on TriCoreproducts.

4.3 The TriCore Architecture

Common features and properties of multiple TriCore microcontroller models are definedby the TriCore architecture. As of today two versions of the TriCore architecture exist:TriCore version 1.3 (referred to as v1.3 or TC1.3), and TriCore version 2.0 (referred toas v2.0 or TC2) [ITA02]. Individual models can differ in terms of periphery, amount ofbuilt-in memory, maximum CPU frequency, and other characteristics. Nevertheless, theymust all fulfill the requirements of the architecture version they implement. The TC1797implements v1.3.1 which we investigate in the following.

4.3.1 Registers and Instructions

The v1.3 architecture is described in detail in [ITA08a]. According to the specification,a TriCore microcontroller is a 32-bit system. All registers and instruction opcodes are32 bits wide except for 16-bit instructions which can be used to reduce the overall codesize. There are two register files containing 16 32-bit registers each. One register filecontains data registers D0 through D15. D15 acts as an implicit data register. The otherregister file contains address registers A0 through A15. Not all address registers canbe used freely because three of them fulfill a special purpose: A10 contains the StackPointer (SP), A11 contains the Return Address (RA), and A15 is an implicit addressregister. The registers used implicitly (D15, A15) are of special importance to 16-bitinstructions. For those instructions the implicit register is hard coded into the opcodewhich reduces the overall code size.

Most operations are executed within one clock cycle. TriCore devices contain twofour-stage pipelines and one separate pipeline for loop execution. Up to three instructionscan be executed in a single clock cycle due to the fact that the pipelines run in parallel[ITA02, p. 40].

4.4 Relevant Features of the TC1797 25

4.3.2 Memory Layout and Addressing

The architecture further specifies that memory is addressed using 32-bit addresses. Thisimplies that up to 4 GBytes of RAM can be addressed. The address space is logicallydivided into 16 segments of 256 MBytes each, denoted by the hexadecimal numbers0𝐻 through 𝐹𝐻 . Depending on the segment used, the same physical memory may beaccessed in different technical ways. For example, the segments 8𝐻 and 𝐴𝐻 point to thesame physical memory locations but addressing via the segment 8𝐻 allows cached accesswhile addressing via 𝐴𝐻 allows non-cached access [ITA09b, ch. 8].

4.3.3 Calling Conventions

The TC1797 is a RISC microcontroller. RISC architectures normally make use of a smallinstruction set and a big number of registers. The TC1797 features 16 data and 16 addressregisters. A task being executed on the TC1797 runs inside a so-called context that “isdivided into the upper context and the lower context”. The upper context comprisesthe registers A[10] through A[15] as well as D[8] through D[15], the processor statusword, and information about the previous context. We recall that A[10] holds the SP,A[11] holds the RA, and A[15] is used for implicit addresses just as D[15] is used forimplicit data. The lower context consists of A[2] through A[7], D[0] through D[7], andthe program counter stored in A[11].

The conventions for function calls are described in the TriCore Embedded ApplicationsBinary Interface (EABI) documentation [ITA07]. The upper context is automaticallysaved when a function is called and restored when the function returns. There is noautomatic handling of the lower context but it can be saved and restored manually usingthe SVLCX and RSLCX instructions.

Due to the TriCore architecture function arguments are passed in registers and not viathe stack. Non-pointer arguments are passed to the called function in the registers D[4]through D[7] while pointer arguments are passed in the registers A[4] through A[7]. Ifmore than four arguments per type are required they are put onto the stack. Specialcases like 64-bit pointers or struct arguments are not relevant for our work and thusskipped here. 32-bit scalar return values are generally passed back to the caller insideD[2] while 32-bit pointers are returned in A[2]. When the function returns, the previouslysaved upper context is restored.

As a side note we wish to point out that the TriCore compiler is able to generatemachine code that will always use the stack to handle parameters and return values.This feature is only provided for backward compatibility and the instruction set containsno PUSH and POP mnemonics. This fact underlines the strict adherence to the load/storearchitecture.

4.4 Relevant Features of the TC1797

While many shared properties of TriCore products are specified by the TriCore architec-ture, we only worked with the specific model called TC1797. The TC1797 implements

26 4 Target Platform: TriCore TC1797

version 1.3.1 of the TriCore architecture. In Sect. 4.2, we gave a general overview ofthe TC1797’s features. The product brochure [ITA12], the data sheet [ITA09a], andthe user’s manual [ITA09b] give a good impression of how feature-rich and complex theTC1797 is. For our work we need only a small subset of the long list of features. Weshortly name the relevant features in the following and refer to [ITA09b] for the fulldetails.

4.4.1 CPU CoresThe TC1797 contains a 32-bit CPU that is supported by a 32-bit Peripheral ControlProcessor (PCP). The PCP is a fully self-contained processor with its own program anddata memory. It is intended for complex management tasks regarding peripheral units.For our work only the CPU is of interest because we perform no complex peripheraltasks.

4.4.2 Memory Sections and CachingRunning a binary on a microcontroller unavoidably requires some memory to store codeand data. The Program Memory Unit (PMU) of the TC1797 features 4 MB of programmemory and 64 kB of data memory. We mentioned earlier that the address space isdivided into 16 segments and that memory is addressed using 32-bit values. The overalladdress space therefore covers a range of 4 GB. Each of the 16 segments spans 256 MB.

Related to our work we are interested in accessing the program and data memories.Cached access to those memories is possible via segment 8 while non-cached accessis allowed via segment A. We might also encounter cases in which we wish to readfrom or write to the Local Data RAM (LDRAM) which is part of the Data MemoryInterface (DMI). The DMI handles data requests from the CPU and the Local MemoryBus (LMB). Access to the LDRAM is allowed via segment D.

The remaining segments are either reserved (0-7, 9, B) or irrelevant for our work.Knowledge about the segments is an essential prerequisite for the memory configurationeach binary requires. We found that the HighTec toolchain offers default configurations,but they are tailored to specific products like the Infineon TriBoard. Therefore we hadto customize the memory settings for our binaries. For the most part we chose segmentA in order to enforce non-cached memory access. We come back to caching in Chap. 7.One more problem with the default configurations lies in the fact that they assume theexistence of external memory. In our case no external memory is present. The TC1797immediately raises a debug trap if the memory configuration is wrong. Thankfully, FalkSchellenberg of EMSEC kindly provided us with a working memory map file.

4.4.3 Serial CommunicationWe need a way to exchange data like plaintexts and ciphertexts with the TC1797. Forthis communication we use one of two ASC interfaces provided by the TC1797. Thehardware implementation of suitable peripheral components was already finished whenwe started working on our thesis. Therefore we did not require to learn many details

4.4 Relevant Features of the TC1797 27

about the ASC interface. Nevertheless it is interesting to know that the ASC features abuilt-in baud rate generator whose internal clock frequency depends on the ASC clock,which in turn depends on the core frequency. This imposes some limits to possible baudrates. We mention one such case in Chap. 8.

4.4.4 Digital I/OAside from the CPU, working memory maps, and serial I/O, we require one moremechanism: Digital I/O. We need to supply a trigger signal to the outside worldfor power trace recording. The TC1797 offers a total of 215 digital General PurposeI/O (GPIO) lines grouped into 16 ports. A digital I/O line suits the need for a triggersignal perfectly because the signal is binary. No states other than “on” (high voltage)and “off” (low voltage) are required for the oscilloscope trigger. We chose to use one pinof port 2 for the trigger signal.

5 Working Environment

We introduced different types of power analysis attacks in Chap. 3. In order to conduct apractical attack we must meet some technical requirements. We discussed the most basicrequirement in Chap. 4 where we described which target platform we use four our work. Inthis chapter, we present the essential parts of our overall working environment. As a startwe explain the kind of TriCore evaluation board we work with and the communicationchannels between the board and surrounding components. Subsequently we describe themeasurement setup and the software we used to develop, test, and attack AES on theTC1797.

5.1 The EMSEC TriCore BoardInfineon offers a starter kit for the TC1797. The so-called TriBoard is a PCB equippedwith a TC1797, some external RAM, a number of connectors including JTAG, four 80-pinmale and four 80-pin female I/O connectors, and some additional components includingbus transceivers and LEDs. The author of [Hoh09] used a TriBoard for his work but thisevaluation board has two major downsides regarding power analysis attacks:

1. There is exactly one power supply for the entire board which in turn implies thatthere is exactly one electrical ground.

2. The TC1797 is soldered to the PCB. Thus the chip can not be exchanged.

The problem with the central power supply and ground connection is obvious. Whenperforming power analysis attacks, it is important to have the greatest possible amountof control over power supply and electrical ground connections. The TC1797 is a highlycomplex device featuring a total of 80 digital ground (VSS) pins complemented by anadditional eight distinct ground pins serving as oscillator ground, ADC reference voltageand analog part ground, and Fast Analog-to-Digital Converter (FADC) reference voltageand logic ground. Selective clustering of ground connections enhances the number ofpotential measurement points and in turn makes it easier to find the optimal approachto recording power traces.

The fact that the TC1797 is soldered to the board leads to more potential problemswhen it comes to experiments in a laboratory. Both constructive and destructive scenariosmust be considered. From the constructive point of view one might wish to downloadthe same binary to multiple TC1797 chips in order to use them in a team of researchers.It might also be necessary to try a different TriCore model for other experiments. Fromthe destructive point of view, a chip or its periphery can theoretically break at any time

30 5 Working Environment

Figure 5.1: The TriCore SCA board developed at EMSEC with the TC1797 socketed inthe middle, measurement probes on the left and at the bottom, and a JTAGcable on the right

during lab experiments. Both scenarios make it clear that physical exchangeability ofthe chip and full control over the Printed Circuit Board (PCB) seem desirable.

Based on those two criteria the TriBoard is obviously not the optimal device to conductpower analysis experiments with TriCore products in a research environment. Thankfully,David Oswald and Falk Schellenberg of EMSEC designed a custom board targeted at alltypes of side-channel attack against TriCore products. We use version 1.0 of this boardfor our attacks. The board is depicted in Fig. 5.1.

The two requirements we mentioned previously, namely fine-grained control over groundconnections and physical exchangeability of the chip, are fulfilled by the EMSEC board.The board offers distinct ground connectors for all major ground pin groups used by theTC1797. Direct electrical connections to the ground plane were replaced with sockets.The ground connection can be closed using jumper plugs for regular operation of theboard. For power consumption measurements it is easy to equip the socket with a resistorinstead of a jumper plug. We give details on ground connections and related issues inSect. 5.3 where we describe our measurement setup. Before we get to the measurementsetup we describe the development and debugging process.

5.2 Communicating with the Board

The SCA board itself would be quite useless if it was unable to communicate with otherdevices, in our case primarily with the host computer. We need to download binariesto the TC1797 in order to execute and debug our implementations. For this purpose

5.2 Communicating with the Board 31

Host PC

TriCore boardPicoScope UAD2

USB (Power, RS232)

USB USB

A

B JTAG

Figure 5.2: Schematic picture of the laboratory setup

we use the Joint Test Action Group (JTAG) capabilities of the TC1797. The boardfeatures a JTAG connector to interface with the respective hardware which we describelater in this section. In addition to flashing and debugging we also need to send keysand plaintexts to the TriCore and retrieve ciphertexts in exchange. For this purpose weuse the ASC0 interface of the TC1797. The board features an FTDI chip that performsRS-232 communication over Universal Serial Bus (USB). Figure 5.2 shows a schematicoverview of the laboratory setup. We describe the distinct components in the following.

5.2.1 JTAG Hardware

In order to interface with the JTAG connector on the board we require hardware thatcan do so because the host computer does not offer a JTAG port. For this purpose weuse the Universal Access Device 2 (UAD2) offered by PLS.1 The UAD2 is a small boxwith a serial and a JTAG connector on the one side and a USB connector on the other.The USB port allows for a connection to the host computer while the other ports areused to connect the UAD2 to the target device.

The UAD2 is powered externally and features a ground socket. The manual mandatesthat all devices are connected to a common ground domain using this socket in orderto prevent damage caused by Electrostatic Discharge (ESD). The ground connectionshould always be established first. The UAD2 is not specifically designed for the TC1797but supports a broader range of microcontrollers so that it can be reused across differentprojects. We use the UAD2 for flash programming and debugging.

Flash Programming

When a binary has been assembled it must be downloaded to the flash memory of theTC1797. During our work we are in a phase of “active development” in which frequentreprogramming of the target device is required. The UAD2 enables us to download codeto the TriCore via the JTAG interface. Whenever new code is downloaded to the target,the flash memory is first erased. It is then reprogrammed using the provided binary.Each programming cycle ends with a checksum computation by which the integrity ofthe downloaded binary data is asserted.

1PLS Programmierbare Logik & Systeme GmbH, http://www.pls-mc.com.

http://www.pls-mc.com


Debugging

Active development also brings along the need for debugging. Even though live debuggingshould be the last resort, it is sometimes unavoidable. Additionally, some effects likepointer arithmetic bugs only surface while testing code on the target platform. TheTC1797 offers On-Chip Debug Support (OCDS) which is described in [ITA09a, pp. 56-58]and in more detail in [ITA09b, ch. 15]. OCDS uses the JTAG port which in this caseserves as the communication link between the TC1797 and external debugging hardware.There are two levels of OCDS: Level 1 (L1) offers software debugging features likememory access, breakpoints, register inspection, and single-step execution. Level 3 (L3)offers more complex features like tracing but requires a special emulator. We only needthe features offered by OCDS L1. Combining the UAD2 and an accompanying softwarecalled Universal Debug Engine (UDE) enables us to set breakpoints and inspect registersand memory. In Sect. 5.4 we give more information on the software stack we use for ourwork.

5.2.2 USB Connection to the Host Computer

The board is connected to the host computer via USB. This connection powers the boardand serves as the serial communication link. The host computer detects the FTDI chip asa plug-in device and creates a virtual serial port accordingly. Data can then be exchangedwith the board and effectively with the TC1797 via this serial connection. This USBconnection completes the two required communication lines between the TC1797 (or theentire board) and the host computer.

5.3 Measurement Setup

Power traces are usually recorded using a digital oscilloscope. We use a PicoScope 5203as depicted in Fig. 5.3. The PicoScope 5203 is a USB oscilloscope with no built-in display.Sampled data is handled on the host computer. The oscilloscope can be controlled in twoways. Firstly, a Graphical User Interface (GUI) application is provided free of charge byPicoTech. This application allows a user to select settings like sampling mode, samplingresolution, and trigger thresholds. The GUI is a comfortable replacement for the internalscreen of classic oscilloscopes, especially because it offers more complex features likeserial data decoding and special measurement modes. We use v6.7.21.2 of the PicoScopesoftware.

Secondly, the oscilloscope can be controlled by custom software through an ApplicationProgramming Interface (API). This is the preferred approach for the actual recording ofsampled data because the GUI has no capabilities to do so. In the context of our attackswe need more functionality than visual inspection of a live signal. We must interact withthe TriCore board to supply the cipher with inputs, retrieve the outputs, and at the sametime record the power consumption. For this purpose we use a framework developed atEMSEC which we describe in Sect. 5.4.2.

5.4 Software Stack 33

Figure 5.3: The PicoScope 5203 with connectors on the front panel.Source: http://www.pc-oscilloscopes.com/images/ps5000-6low.jpg

From the technical point of view, the PicoScope 5203 features two channels namedA and B. It can sample each channel at up to 500 MHz. Samples are taken at an 8-bitresolution and represented as 16-bit values where the second byte includes special flagslike an overflow indicator. We use Channel A to measure the power consumption whileChannel B serves as the trigger that starts a sampling run. The measurement probe forChannel A is connected to a measurement resistor that we can freely insert into any ofthe ground sockets available on the TriCore board. As an additional tool we have a set ofwires that joins multiple jumper plugs together so that we can combine multiple groundsockets through the central measurement resistor. The probe for Channel B is connectedto an I/O pin of the TC1797. We configured the trigger in such a way that a rising edgeat this pin indicates the start of a measurement run.

5.4 Software StackIn order to develop software destined to be run on the TC1797 we need a number of tools.We also need some software for measurements and evaluation during our side-channelattacks. We describe the toolchain in the following.

5.4.1 Development

We develop software for the TC1797 using the Integrated Development Environment(IDE) provided by PLS. We use v3.4.7.3 which comes as Eclipse Helios SR1 augmentedwith some plugins. C and C++ development support is provided by the C/C++Development Tooling (CDT) plugin. The Eclipse Modeling Framework (EMF) acts asthe basis for features from the PLS development tools. Those are installed in version1.1.9.201101141645 and constitute the main plugin.

http://www.pc-oscilloscopes.com/images/ps5000-6low.jpg


The general integration of C and C++ compilers into Eclipse is managed by CDT.PLS provides a full GNU-based C and C++ compiler toolchain that has been customizedto fit the needs of TriCore development. We use v4.6.2.0 of the toolchain. Binariescreated are ELF compatible. Many well-known GNU tools like make are included inthe toolchain. A version of objdump is also provided which is of great value for binaryanalysis.

The PLS development tools contribute a HighTec and a UDE perspective to Eclipse.The HighTec perspective works like a regular C/C++ development perspective. Inaddition, it provides a tailored project explorer and some modeling integration for targetdevice memory configurations. The UDE perspective integrates the UDE workbenchinto Eclipse which can also be run as a standalone tool. We use v3.00.07 of UDE for allpurposes of interaction with the TC1797 via the UAD2. From UDE we can downloadbinaries to the TC1797 as well as debug them at runtime.

The last development tool we present comes directly from Infineon and is called DigitalApplication Virtual Engineer (DAVE). We use DAVE 2.2 r2. DAVE enables us to createa system configuration for the TC1797 and other TriCore models in a visual manner. Asimple GUI shows a graphical representation of TriCore components like system timer,serial ports, and memory configuration. The

Documents

Protecting AES on TriCore against Power Analysis AttacksDeclaration I hereby declare that this submission is my own work and that, to the best of my knowledgeandbelief,itcontainsnomaterialpreviouslypublishedorwrittenbyanother