31

Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Generi Disassembler Using Pro essor ModelsA Thesis Submittedin Partial Ful�llment of the Requirementsfor the Degree ofMaster of Te hnology

byPrithvi Pal Singh Bisht

to theDepartment of Computer S ien e & EngineeringIndian Institute of Te hnology, KanpurFebruary, 2002

Page 2: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Certi� ateThis is to ertify that the work ontained in the thesis entitled � Generi Dis-assembler Using Pro essor Models �, by Prithvi Pal Singh Bisht, has been arriedout under my supervision and that this work has not been submitted elsewhere for adegree.

February, 2002 (Prof. Rajat Moona)Department of Computer S ien e & Engineering,Indian Institute of Te hnology,Kanpur.

Page 3: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Abstra tThe generi disassembler, produ es symboli relo atable disassembly of an obje t�le in Exe utable and Linking Format (ELF). It uses a pro essor model that on-tains instru tion set des ription of the pro essors in a language alled Sim-nML.Sim-nML is simple, elegant and powerful language to express the behavior of pro- essors at instru tion level. It uses synthesized attributes to represent timing in-formation, instru tion semanti s, assembly language syntax and binary representa-tion of instru tions. Generi Disassembler fa ilitates disassembly of programs in aGNU- ompatible format. For identifying the instru tions, depth �rst sear h andba ktra king is used on a tree like stru ture of the Sim-nML instru tion set des rip-tion. Sin e the attributes of an instru tion are s attered in various subtrees, syntaxfor the instru tion is olle ted from the subtrees sele ted during traversal. Di�erentparts of a single instru tion may be mat hed with di�erent subtrees. Symboli andrelo atable disassembly is a hieved by using relo ation and symbol information fromthe obje t �le and analyzing the ode to identify basi blo ks.

Page 4: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

A knowledgmentsMy sin ere thanks goes to my thesis supervisor Prof. Rajat Moona. His stream-lined view and to-the-point approa h is the ba kbone of this work. In addition tote hni al aspe ts, I obtained a strong sense of professionalism from him. In short,it was a rewarding experien e for me to work under him.Work done in this thesis is a part of the resear h done under CARES proje tin Caden e Resear h Center at IIT Kanpur. Spe ial thanks to Dr. Sanjeev KumarAggarwal and Dr. Deepak Gupta for their a tive involvements in CARES team.Mr Souvik Basu, senior student in CARES proje t, did his best in helping mebegin my work in this area. I thank him for all the time he spent for me. I wouldlike to thank Mr R. Ravindran, PhD student, for his in-time helps.I am thankful to the MTe h 2000 and 2001 bat hes for their signi� ant nearness.Spe ial thanks goes to Pijush, Asheesh and Jayadeep for giving "in-home" and"granted" feeling ba k at hostel. Girish, Rajesh, Pradeep and Varada will holdseparate and important pla es in my memory.I am indebted to my parents, brothers and sisters for their love, are and un-derstanding in a hard and ru ial phase of life. I express my deep gratitude to myelder brother Brijesh da for his love, understanding and inspirations. Without himI would have been washed o� to a di�erent land.Before losing this se tion I would like to express my deep feelings for Vinod. Iwill always remember him for providing the reasons to go on.

i

Page 5: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Contents1 Introdu tion 11.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Sim-nML 52.1 Hierar hi al Stru ture . . . . . . . . . . . . . . . . . . . . . . . . . . 52.2 Sim-nML grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62.3 Resour e Usage Model . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Generi Disassembler 103.1 Approa h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103.2.1 Identi� ation of Code Blo ks . . . . . . . . . . . . . . . . . . . 133.2.2 Generation of Symboli Names . . . . . . . . . . . . . . . . . 153.2.3 Disassembly . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154 Con lusions 18Bibliography 19A User's Manual 23A.1 IR Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23A.2 Usage of Generi Disassembler . . . . . . . . . . . . . . . . . . . . . . 24A.3 Con�guration �le . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

ii

Page 6: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

List of Figures2.1 OR-Rule and AND-Rule formats . . . . . . . . . . . . . . . . . . . . 62.2 Addressing modes and �avor of add and bran h instru tions in Sim-nML 73.1 Approa h for Tool Design with Sim-nML . . . . . . . . . . . . . . . . 113.2 Identifying Code Blo ks . . . . . . . . . . . . . . . . . . . . . . . . . 123.3 Algorithm for identifying obje t and patterns in data se tions . . . . 143.4 Algorithm for expressing relo ation information . . . . . . . . . . . . 16

iii

Page 7: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Chapter 1Introdu tionWith development of an Appli ation Spe i� Pro essor (ASP), spe i� tools e.g.disassembler, assembler, simulator, ompiler ba k end, et . are also required. Su htools provide an interfa e for appli ation development on this pro essor. Non-availability of su h tools in time, may ause the whole design to fail. Problem gets ompli ated in ase many design alternatives are to be onsidered. In a traditionalapproa h, for every alternative design model, su h tools are redesigned. Develop-ment of these tools is a tedious and error-prone pro ess. CARES proje t group at IITKanpur is developing an interfa e that automates the pro ess of tools development.In our approa h we use pro essor models ontaining instru tion set des ription ofpro essors in Sim-nML. From these pro essor models we develop various tools us-ing tool generators. Pro essor models of SPARC, Motorola 68HC11, PowerPC603,MIPS R10000, ARM, ADSP2105, 8085 and a few other pro essors have been devel-oped by us and used for the generation of various tools. Generi disassembler is oneof the tools in the tool hain developed at IIT Kanpur. Other developed tools areGeneri Assembler Generator[4℄, Fun tional Simulator Generator[5℄, RetargetableCa he Simulator and Code Instrumentor[6℄ and Compiler ba k-end generator[7℄.1.1 Related WorkThere are various related works reported in the literature.1

Page 8: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

CBC/SIGH/SIM framework onsists of a retargetable ode generator CBC[8℄ forC ompilers and the instru tion set simulator SIGH/SIM[9℄. This framework usesnML[3℄ language for pro essor models. nML permits on ise, hierar hi al pro essordes ription in a behavioral style.Aviv[11℄ is a retargetable ode generator whi h produ es the optimized ma hine ode for target pro essors with various instru tion set ar hite tures. It uses Instru -tion Set Des ription Language (ISDL[10℄) as a high level language for des ribing thepro essor models.Cy le a urate models of pipelined pro essor ar hite tures require a pipeline-a urate behavioral des ription. In nML or ISDL, the semanti s of the language donot provide means for des ribing the y le a urate models.EXPRESSION[12℄ language uses a mixed behavioral and stru tural representa-tion approa h for pro essor modeling. SIMPRESS[13℄, based on EXPRESSION, is aretargetable simulator generator for pro essor-memory ar hite tures and EXPRESSis a retargetable ompiler for embedded system-on- hip systems.New Jersey Ma hine-Code toolkit[15℄ helps programmers write appli ations thatpro ess ma hine ode e.g. assemblers, disassemblers, ode generators, tra ers, pro-�lers, and debuggers. It uses pro essor models written in Spe i� ation Languagefor En oding and De oding (SLED[14℄). SLED des ribes abstra t, binary and as-sembly language representations of ma hine instru tions. With the help of toolkitretargetable debugger, retargetable optimizing linker and disassembler for SPARChave been developed.BUILDABONG (Building Spe ial Computer Ar hite tures based on Ar hite -ture and Compiler Co-Generation[16℄) dis usses the automati generation of instru -tion set simulators and the orresponding retargeted ompilers. The methodologyis based on ASMs (Abstra t State Ma hines) as the formal model for des ribing apro essor's behavior. ASM is a mathemati al model of omputation based on the on ept of universes, fun tions and updates of fun tions. BUILDABONG proje tgroup[31℄ aims at ar hite ture synthesis and ompiler generation for the ar hite -tures. XASM[18℄ is ASM spe i� ation language. Gem-Mex[19℄ tool automati allygenerate a debugging and simulation environment for a given XASM spe i� ation.2

Page 9: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

MIMOLA (Ma hine Independent Mi roprogramming Language[20℄) is a stru -ture level des ription language. RECORD[21℄ system based on MIMOLA aims atautomati ode generation for �xed point DSPs with a �xed instru tion word length.The PlayDoh[22℄ ar hite ture of HP Laboratories is based on ma hine des rip-tions spe i�ed in a high level textual language MDES[23℄. It aims at developingperforman e oriented ompilers for the VLIW and super-s alar pro essors.LISA[24℄ pro essor modeling language fa ilitates pipeline des ription and expli it ontrol spe i� ation. Lisa Pro essor Design Platform (LPDP[32℄) tool-suit uses pro- essor models in LISA to generate software development tools in luding C- ompiler,assembler, linker, instru tion set simulator and debugger front-end. RADL[25℄, an-other spe i� ation language is derived from LISA, fo uses on detailed pipeline be-havior and is used to generate instru tion set simulators.CHESS/CHECKERS[26℄ environment onsists of a C ompiler alled Chess, alinker alled Bridge, an instru tion set simulator alled Che kers and an assemblerand disassembler alled Darts. Pro essor models are des ribed using nML[3℄. Chessreads the nML des ription to generate binary ode from a C program. Similarly,Che kers uses the nML des ription to a ept the binary ode and simulate its exe- ution on the target pro essor. Chess/Che kers is a retargetable environment.The FlexWare[27℄ framework onsists of a retargetable ode generator CODESYNand an instru tion set simulator INSULIN. CODESYN takes one or more algorithmsexpressed in a high-level language and maps them onto a user de�ned instru tionset to produ e optimized ma hine ode. It an be used to develop ode for pro- essors in luding Appli ation Spe i� Pro essors (ASPs). INSULIN is based on are on�gurable VHDL model of a generi instru tion set pro essor.CASTLE[28℄ is a o-design platform whi h provides a number of design tools for on�guring appli ation spe i� design �ows. The design �ow starts with a C/C++program and gradually derives a register-transfer level des ription of a pro essorhardware, as well as the orresponding ompiler for generating the pro essor op ode.Visualization Based Mi ro-ar hite ture Workben h (VMW[29℄) fa ilitates spe i-� ation of mi ro-ar hite ture of pro essors and automati generation of performan esimulators. It de�nes a set of ma hine des ription �les. These ma hine des ription3

Page 10: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

�les an be ompiled into a working performan e simulator for a spe i� target pro- essor. Visualization apabilities are in orporated to allow monitoring of simulationpro ess. With the help of VMW performan e simulators for PowerPC 601 and 620,DEC Alpha AXP 21064 and 21164 and IBM RS/6000 mi ropro essors have beengenerated and exe uted.

4

Page 11: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Chapter 2Sim-nMLSim-nML is an extensible formalism designed to spe ify generi single pro essor mod-els. It is a language used to des ribe the instru tion set ar hite ture of a pro essorwith the minimal knowledge about its mi ro-ar hite ture. The design of Sim-nMLis highly in�uen ed by that of the nML[3℄.The pro essor models in Sim-nML are des ribed using attribute grammar1 ina hierar hi al manner. To fa ilitate this, Sim-nML de�nes two kind of primitiverules, namely op-rules and mode-rules. Op rules are generally used for des ribingthe instru tions while mode rules are used for des ribing the addressing modes.The Sim-nML des ription given in �gure 2.2 des ribes a simple pro essor withfour type of instru tions - add, bran h, load and store. In a pipelined pro essor,several instru tions may oexist at the same time. In su h a pro essor, a singleregister su h as PC annot spe ify the address of an instru tion in �ight. Sim-nMLsupports a spe ial token, $, whi h is used to denote the memory address of theinstru tion in the de�nition of various attributes of an instru tion.2.1 Hierar hi al Stru tureIn Sim-nML based des riptions, the instru tion set is des ribed in a hierar hi almanner with fragments of ea h of the attribute being distributed over the whole1An attribute grammar is a ontext free grammar in whi h ea h non-terminal have a �xed setof attributes and for ea h produ tion a set of semanti rules is given.5

Page 12: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Sample OR Ruleop n0= n1 |n2 |n3 |n4Sample AND Ruleop n1 ( p1 : t1, p2 : t2, p3 : t3, ...)a1 = e1, a2 = e2, a3 = e3, ...where ea h ni is a non-terminal, ea h ti a token.Ea h ai is an attribute name and ei their respe tive de�nitions.Figure 2.1: OR-Rule and AND-Rule formatsspe i� ation tree. The ommon behavior of a lass of instru tions is aptured at thetop level of the tree. The spe ialized behavior of the instru tions is aptured in thesubsequent lower levels.For example in �gure 2.2, two instru tions, represented by addRtoR and addItoR,have a ommon part of the syntax as "ADD". Similarly these instru tions share a ommon part of the image as "00000001". This ommon behavior of both theinstru tions is aptured by spe i� ation tree node addinst, the an estor of addRToRand addIToR nodes.2.2 Sim-nML grammarThe root of the spe i� ation tree in the Sim-nML is represented by a �xed symbol alled instru tion. There are two type of onstru tions supported in the Sim-nMLnamely OR-Rule and AND-Rule, (�gure 2.1). The and-rule onstru tions are usedfor the terminal symbol de�nition and represent the leaf nodes in the spe i� ationtree. The or-rules are non terminals whi h an expand to further and-rules or or-rules or both. Both primitive rules, i.e. mode and op rules an be onstru tedusing Or or And onstru tions. The Sim-nML grammar prede�nes four attributes- syntax, image, a tion and uses. The syntax attribute des ribes the assemblylanguage format of the instru tion while the image attribute des ribes the binary oding of the orresponding instru tion. Similarly, a tion attribute des ribes the6

Page 13: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

syntax = format("%d",addr)image = format("%16b",addr)

uses = if "rand"() < 0.95 then else endif

#{1} #{10}

syntax = format("%d",x)image = format("%16b",x)

syntax = format("R%d",i)image = format("0%2b",i)

uses = s.uses, wb#{2}syntax = format("STORE %s,%s",r.syntax,s.syntax)image = format("00000100",r.image,s.image)action = { s.action; M[tmp] = r<7..0>; M[tmp+1] = r<15..8>;}

uses = ifu#{1}, r.uses, wb#{1}syntax = format("%s",r.syntax)image = format("00010%s",r.image)action = { PC = PC + 2; tmp = r;}

uses = ifu#{2}, wb#{1}syntax = format("%s",m.syntax)image = format("00001%s",m.image)action = { PC = PC + 4; tmp = m;}

type byte = card(8)reg R[4,word]reg PC[1,word]mem M[2**16,byte]resource ifu, bu, alu[2], wb

// Addressing modes

mode immediate(x:word)= x

op inst_type = add_inst | branch | load_store

op addr_type = addRToR | addIToR

M[R[i]]::M[R[i]+1] mode reg_indirect( i : card ( 2 ) ) =

mode direct(addr:word)=M[addr]::M[addr+1]

op instruction(x:inst_type)

op add_inst(x:addr_type)

op addRToR(R2:register,R3:register)

mode register( i : card ( 2 ) ) =R[i]

op addIToR(x:immediate,R:register)

op branch(x:branchtype)

op branchtype = branchrelative | branchabsolute

op load_store = load | storevar tmp[1,word]

op loadmode = load_direct | load_indirect

op branchrelative( target : card ( 16 ) )

op branchabsolute( target : card ( 16 ) )

op load(r:register,l:loadmode)

op load_direct(m:direct)

uses = l.usessyntax = format("LOAD %s,%s",r.syntax,l.syntax)

action= { l.action; r = tmp;}

type word = card(16)

op load_indirect(r:reg_indirect)

uses = ifu#{2}, m.uses, wb#{1}syntax = format("%s",m.syntax)image = format("01000%16b",m.image)action = { PC = PC + 4; tmp = m;}

syntax = format("ADD %s",x.syntax)image = format("00000001%s",x.image)uses = x.usesaction = { x.action ; }

uses = x.usessyntax = x.syntaximage = x.imageaction = { x.action ; }

syntax = format("(R%d)",i)image = format("1%2b",i)

uses = if "rand"() < 0.95 then else endif

#{1} #{10}

syntax = format("%s,%s",R3.syntax,R2.syntax)image = format("00%s%s",R2.image,R3.image)

uses = ifu#{1}, alu#{1}, wb#{1}

action = { R3 = R3 + R2; PC = PC + 2;}

uses = ifu#{2}, alu#{1}, bu#{1}, wb#{1}syntax = format("JMP %s",x.syntax)

action = { x.action ; }

uses = ifu#{2}, alu#{1}, wb#{1}syntax = format("%s,%s",R.syntax,x.syntax)

action = { R = R + x; PC = PC + 4;}

image = format("00000010%s",x.image)

image = format("00000011%s%s",r.image,l.image)

image = format("01%s%s",R.image,x.image)

}

syntax = format("%d", target)

action = { PC = target;

syntax = format("%d",target)

action={ PC = target;}

op storemode = store_direct | store_indirect

op store(r:register,s:storemode)

op store_direct(m:immediate)

op store_indirect(r:reg_indirect)

uses = ifu#{1}, r.uses, wb#{1}syntax = format("%s",r.syntax)image = format("00%s",r.image)action = { PC = PC + 2; tmp = r;}

image = format("00000001%16b", $+target)

image = format("00000010%16b",target)

Figure 2.2: Addressing modes and �avor of add and bran h instru tions in Sim-nML7

Page 14: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

semanti s of the a tion and the uses attribute des ribes the resour e usage model.2.3 Resour e Usage ModelThe mi ro-ar hite ture details of the pro essor an be spe i�ed using the resour eusage model. In Sim-nML spe i� ation, the entities within the pro essor su h asfun tional units, ALUs, pipeline stages, registers, ports et ., onstitute a set ofresour es. Every instru tion utilizes several resour es ea h for an amount of timewhi h may be onstant or dependent upon only the instru tion. The resour e usagemodel is based on the assumption that at any instant, an instru tion in exe utionholds a set of resour es and performs ertain a tion. The next set of resour es mustbe a quired before performing the orresponding a tion. The resour es held by theinstru tion and the a tion taken hange progressively.In this model instru tions ontend for resour es and wait, if the resour es are notavailable. Thus the �ow of the instru tions in the mi ro-ar hite ture is essentiallya way of resolving resour e on�i ts. When two instru tions wait simultaneouslyfor a single resour e, the on�i t is resolved in FIFO order of the instru tions. Theuses attribute in Sim-nML des riptions des ribes the resour e usage model for aninstru tion.An example model for a super-s alar pro essor is shown in �gure 2.2. All instru -tions are read by a resour e IFU (Instru tion Fet h Unit). The instru tions havevariable length (16 bits or 32 bits) and a ordingly take one or two units of time atthe IFU. In addition there is a BU (Bran h Unit), a resour e that is used by thebran h instru tions only. There are two ALUs, both alike, and are used by the ADDinstru tion (and other ALU instru tions) and by the bran h instru tions. Finallythere is a resour e alled WB (Write Ba k) unit. It is used by the instru tions thatneed to write their results ba k into the registers.In some ases, the resour e uses depends on a ondition. For example, time formemory a ess depends on whether it was a hit in the a he or not. The onditionaluse an be spe i�ed by the if onstru tion of uses attribute. For example, onstru -tion if "rand"() < 0.95 #{1} else #{10} represents that in ase of a a he hit (hit8

Page 15: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

ratio is 0.95), delay of 1 time unit o urs while delay is of 10 time units in aseof a miss. This onditional onstru tion also shows that in Sim-nML models, it ispossible to spe ify omponents of uses attribute without any resour e to representjust the time delay.In Sim-nML models, the ar hite tural hanges are relatively easy to in orporate.For example a three way super-s alar pro essor with 3 ALU units an be obtainedby appropriately hanging the resour e de laration in �gure 2.2.

9

Page 16: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Chapter 3Generi Disassembler3.1 Approa hThe approa h for the generi disassembler is similar to the other tools based on Sim-nML pro essor models (�gure 3.1). Sim-nML pro essor models are �rst onvertedto a ompa t IR (Intermediate Representation) by a tool IRG (IR-Generator). ThisIR along with the other required inputs is used by di�erent tools.IR aptures all the information in the Sim-nML des ription and makes it easyto a ess the information by various tools. In our approa h, IR is a olle tion oftables ea h of �xed size re ords. A `table of index' in the beginning of the IR �leprovides the size and lo ations of other tables in the �le. The generi disassembleruses a simple on�guration �le in addition to the IR to onvert an ELF binary to orresponding assembly language program.3.2 AlgorithmWhile disassembling an obje t �le, the obje t �le is �rst read in the memory. Thepro ess of disassembly for relo atable and exe utable obje t �le di�ers only in theaspe t that exe utable �le ontains absolute addresses while the relo atable �le ontains the relo atable addresses. This di�eren e is handled while onsidering thestart addresses of the various se tions in the obje t �le.10

Page 17: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Disassembler

DisassembledProgram

IRG

Prcoessor Model in Sim−nML

Processor Modelintermediate form

AssemblerGenerator

Functional simulator generator

Functional SimulatorAssembler

Assemblerprogram

ELF Binary

ELF Binary

Binary program

CONFIG file

Figure 3.1: Approa h for Tool Design with Sim-nMLIn order to disassemble an instru tion, the bit pattern from the obje t �le isused to identify the assembly language instru tion. Algorithm for identifying an in-stru tion is based on a ombination of depth �rst sear h and ba ktra king. Startingfrom the root node instru tion in the Sim-nML spe i� ation tree, images of di�erentnodes are mat hed with the given input bit string in depth �rst order. Sin e theinstru tion spe i� ations may be distributed over the spe i� ation tree, nodes ina path from the root to the leaf ontribute in de�ning an instru tion. The imageformed by olle ting the images of all su h nodes in a path should mat h in order toidentify the instru tion. On e su h a path is identi�ed, syntax attribute of all nodesin the path are used to get the syntax of the identi�ed instru tion. The mat h is arried out using DFS and when it fails, ba ktra king is used to sear h in anotherpath.The disassembler also analyzes the obje t �le in various ways to make the disas-sembler output readable. In parti ular the disassembler uses the symbol informationfrom the ELF �le and also generates symbols whenever needed. These symbols areused in pla e of addresses in the instru tions. The pro ess of disassembly is divided11

Page 18: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

1 from the binary ELF, extra t the addresses of various fun tions;2 put these addresses in start_ ode_blo k list;3 i = 0;4 while start_ ode_blo k list is not empty {a. // take out the address from the list.address = listout(start_ ode_blo k);b. for j = 0 to i-1 { // if address is already tra ed, ignore it.if address � ode_blo k_start[j℄ &&address � ode_blo k_end[j℄ {goto step 4;}} . ode_blo k_start[i℄ = address;d. follow binary till an instru tion that hanges PC;e. if the instru tion is un onditional bran h { ode_blo k_end[i℄ = address of the urrent instru tion;append the target address in the start_ ode_blo k list;i = i + 1;goto step 4;}f. if the instru tion is un onditional return { ode_blo k_end[i℄ = address of the urrent instru tion;i = i + 1;goto step 4;}g. if the instru tion is some other PC hanging instru tion {append the target address in the start_ ode_blo k list;goto step 4.d;}}5 merge the adja ent ode blo ks;Figure 3.2: Identifying Code Blo ks12

Page 19: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

into following steps.3.2.1 Identi� ation of Code Blo ksThe �rst kind of analysis on the obje t �le is to �nd out the ode and data area. Thegeneri disassembler �rst �nds out the ode blo ks (see the algorithm given in �gure3.2). The algorithm works as follows. From the ELF binary, addresses of variousfun tions in the program are read, if available. This information is stored in thesymbol table se tion of the binary obje t �le. Assuming that ea h of these will bestarting/entry point of a ode blo k, the ode is tra ed till a bran h un onditionalor return un onditional instru tion is found. Bran h un onditional or return un on-ditional instru tions are used to denote the end of a ode blo k. While tra ing the ode blo ks, if a all or onditional bran h instru tion is en ountered, the target ad-dress of the instru tion de�nes another ode blo k. If the entry point for the targetaddress is already tra ed, tra ing is ontinued with other untra ed starting points.The tra ing pro ess is repeated as long as an untra ed starting point is there. Callinstru tions may refer to the system library fun tions. Tra ing is not done for entrypoints orresponding to su h fun tions. With the help of symbol table of the ELFbinary su h starting points are identi�ed and marked as untra eable. At the end of ode blo k analysis, all the adja ent ode blo ks are merged into one.For this analysis, the disassembler needs to know the type of the instru tion(bran h vs. non-bran h). The instru tion type is determined with the help of a on�guration �le. Instru tions an be one of the four types - bran h, all, return andsimple. The on�guration �le ontains entries for the �rst three instru tion types.Following line is taken from a sample on�guration �le. It spe i�es that the subtreerooted at node with label bran h_un ond ontains all the bran h un onditionalinstru tions.% BRANCH_UNCOND bran h_un ondAfter the ode blo k analysis, data analysis is done. Data se tions onsist ofdata obje ts de�ned in the program, strings et . used in the program. Symboltable of the ELF binary ontains information about the obje ts in data se tions,if present. We need to identify the beginning of obje ts in data se tions so that13

Page 20: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

data_se tion = ontents of the data se tion from the ELF binary;o�set = 0;while o�set is not equal to the length of data se tion {if an obje t is found in the symbol table of the binary ELF for this o�set {get the size/obje t_name/alignment information from the symbol table;o�set = o�set + size_of_obje t;}if an obje t is not found in the symbol table {if data_se tion[o�set℄ is a non-printable hara ter {use pseudo op .byte for this o�set;o�set = o�set + 1;}if data_se tion[o�set℄ is a printable hara ter {�nd length of the string of printable hara ters starting at o�set;if the identi�ed string is NULL terminateduse pseudo op .as iz for the identi�ed string;if the identi�ed string is not NULL terminateduse pseudo op .as ii for the identi�ed string;o�set = o�set + length_of_string ;}}} Figure 3.3: Algorithm for identifying obje t and patterns in data se tions14

Page 21: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

appropriate information (size, s ope, label et .) an be written with data ontentsof the data obje ts. Algorithm used for identifying obje ts and patterns is givenin �gure 3.3. It works as follows. For every o�set of the data se tion, a sear h ismade in the symbol table, to he k if it is the beginning of an obje t. If an obje tis found, symbol table is used to get the obje t s ope (global/lo al), label and sizeinformation. In ase no obje t is found, depending upon the a ess pattern of thedata it is de�ned as follows. Array of bytes (.byte pseudo op), if data is a sequen eof non-printable hara ters or a string of printable hara ters (.as ii/.as iz pseudoop). If the string of printable hara ters terminate with a NULL, .as iz pseudo opis used otherwise .as ii pseudo op is used. O�set is in reased by the size of obje tsidenti�ed in symbol table or the patterns identi�ed by the disassembler internally.3.2.2 Generation of Symboli NamesIn the se ond part of obje t �le analysis, an asso iation is made between the uniquesymboli names and the addresses of the instru tions referred to by other instru -tions. The generi disassembler generates symboli names for the starting addressesof new ode blo ks, if the orresponding symboli name is not already available inthe symbol table of the ELF binary. These generated symboli names are enteredin a symbol table maintained by the generi disassembler. The symbol table isinitialized with the symbol table information read from the ELF binary.The symbol table thus generated is used in the pro ess of disassembly as de-s ribed in the next se tion. In our approa h, the unique symbols are generated as.L0, .L1, .L2 et . (i.e. by pre�xing a string ".L" to a ounter value).3.2.3 DisassemblyThe �nal part of the obje t ode analysis onsists of generation of the assemblylanguage program from the ELF binary. At this point symboli names and odeblo k information is available. This step starts with writing general information inthe output su h as the sour e �le name from whi h obje t �le has been generated(.�le pseudo op). 15

Page 22: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

for i = 0 to total_relo _entries � 1 {if relo _table[i℄.o�set is equal to target_address {relo ated_symbol = relo _table[i℄.symbol ;relo ation_type = relo _table[i℄.type;if operator is determined for relo ation_typewrite relo ation information with operator and relo ated_symbol;if operator is not determined for relo ation_typewrite �Relo (relo ated_symbol,relo ation_type);}} Figure 3.4: Algorithm for expressing relo ation informationLabels are written for the �rst instru tion of ea h ode blo k, in the beginning ofits disassembly. The symboli label for the address is obtained from the symbol tablemaintained by the generi disassembler. If the instru tion is the �rst instru tion ofa fun tion then orresponding to its label, other pseudo instru tions su h as .type,.align, .size, .globl or .lo al are also written.After generation of the label, the instru tion is disassembled as per the algo-rithm given earlier. The disassembled instru tion may ontain numeri addresses.In our approa h, we onvert these numeri addresses to the symbols. Symboli names orresponding to numeri addresses are found in the symbol table main-tained by the generi disassembler. The target address of an instru tion mightbe relo atable. If the target address is found in the relo ation table, algorithmgiven in �gure 3.4 is used to express the relo ation information. Relo ation tableentry for the target address determines the relo ation type and the relo ated sym-bol. Relo ation type is spe i� to a pro essor. Target pro essors an be uniquelyidenti�ed with the information present in the header stru ture of the ELF binary.The relo ation information is expressed using an operator on the relo ated symbol.For example, with relo ation type R_PPC_ADD16_HA and relo ated symbol .L0for PowerPC603, the information is written as .L0�ha where �ha is the opera-tor on the symbol .L0. If the orresponding operator annot be determined for16

Page 23: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

a relo ation type, disassembler writes relo ation information in the generi format�Relo (relo ated_symbol,relo ation type).Finally, the ontents of data se tions are written to the output. In the earliersteps information like obje ts, patterns et . is already olle ted. For ea h obje tfound in the symbol table, information su h as its se tion, alignment, size, s ope,label et . is written before the ontents, using appropriate pseudo-ops. This infor-mation is taken from the symbol table of the binary ELF. For the patterns identi�edby the generi disassembler, pseudo ops su h as .byte/.as iz/.as ii are written alongwith the ontents.

17

Page 24: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Chapter 4Con lusionsThe generi disassembler is tested for PowerPC603, Motorola 68HC11 and MIPSR10000 pro essor models. Output produ ed by the generi disassembler is ompat-ible to the GNU assemblers and semanti ally similar to the output produ ed by theGNU C ompilers. The output di�ers from the g - ompiled assembly program inthe names of the labels. In addition, depending upon the spe i� ations, alternateinstru tions are generated by the generi disassembler. For example, in pla e of theinstru tion m�r 0 produ ed by the GNU C ompiler for PowerPC603, the generi disassembler produ es a semanti ally equivalent instru tion mfpr 0,256. Similarly,instru tions in the pairs su h as mfspr 0,256 and m�r 0; ori 0,0,0 and nop; addi0,0,0 and li 0,0; or 31,1,1 and mr 31,1 et . are semanti ally equivalent and havethe same ma hine oding.In the output produ ed by the native ompiler, redundant labels are generatedthat are not referred to by any instru tion. All su h redundant labels are notpresent in the disassembly produ ed by the generi disassembler. Sin e the generi disassembler generates the labels by ode analysis, these labels are di�erent in the ompiler generated assembly program. The labels are however positioned at thesame lo ation. Therefore, though the labels are di�erent, e�e t is the same.18

Page 25: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Bibliography[1℄ Exe utable and Linking Format (ELF). Tools Interfa e Standard (TIS), PortableFormats Spe i� ations. Unix System Laboratories, Version1.1.[2℄ V. Rajesh and R. Moona. Pro essor modeling for Software Hardware o-design.In Pro . of Int. Conf. on VLSI Design, pp. 132-137, Jan. 1999, Goa, India.[3℄ M. Freeri k. The nML Ma hine Des ription Formalism. Te h. Rep. 1991/15, TUBerlin, Fa hberei h Informatik, 1991, Berlin.[4℄ Sarika Kumari. Generation of Assemblers using High Level Pro essor Models.MTe h Thesis, Department of CSE, Indian Institute of Te hnology Kanpur, Feb.2000, Kanpur[5℄ Subhash Chandra and Rajat Moona. Retargetable Fun tional Simulator UsingHigh Level Pro essor Models. In Pro . of 13th Int. Conf. on VLSI Design, pp.424-429, Jan. 3-7, 2000, Cal utta, India.[6℄ Rajiv A. R. and R. Moona. Retargetable Ca he Simulation Using High LevelPro essor Models. In Pro . of 6th Australasian Computer Syst. Ar hite ture Conf.2001, pp. 114-121, Jan. 29-30, 2001, GoldCoast, Australia[7℄ Shishir Mondal. Compiler Ba k End Generation from nML Ma hine Des ription.MTe h Thesis, Department of CSE, Indian Institute of Te hnology Kanpur, Jun.1999, Kanpur.[8℄ A. Fauth, A. Knoll. Automated Generation of DSP Program Development ToolsUsing a Ma hine Des ription Formalism. In Pro . IEEE ICASSP-93, pp. 457-460,Apr., 1993, Minneapolis. 19

Page 26: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

[9℄ F. Lohr, A. Fauth, M. Freeri ks SIGH/SIM - An Environment for RetargetableInstru tion Set Simulation. Te h. Rep. 1993/43, TU Berlin, Fe hberei h Infor-matik, Berlin, 1993.[10℄ G. Hadjiyiannis, S. Hanono, and S. Devadas. ISDL: An Instru tion Set De-s ription Language for Retargetability. in Pro . of the 34th Design AutomationConf., pp. 299-302, Jun. 9-13, 1997, Anaheim, California, USA.[11℄ Silvina Zimi Hanono. Aviv: A Retargetable Code Generator for Embedded Pro- essors. PhD Thesis, Department of EECS, MIT, Jun. 1999, USA.[12℄ A. Halambi, P. Grun, et al. EXPRESSION: A Language for ar hite ture explo-ration through ompiler/simulator retargetability. In Pro . of the European Conf.on Design, Automation and Test (DATE), pp. 485-490, Mar. 9-12, 1999, Muni h,Germany.[13℄ Asheesh Khare. SIMPRESS: A Simulator Generation Environment for System-On-Chip Exploration. MS Thesis, Department of Information and Computer S i-en e, University of California Irvine, 1999, Irvine.[14℄ Norman Ramsey and Mary F. Fernandez. Spe ifying representation of ma hineinstru tions. ACM Trans. Program. Lang. Syst., volume 19, number 3, pp. 492-524, Jan. 1997.[15℄ Norman Ramsey and Mary F. Fernandez. The New Jersey Ma hine-CodeToolkit. In Pro . of the Usenix Te hni al Conf. 1995, 1995, pp. 289-301, Jan.16-20, 1995, New Orleans, Louisiana.[16℄ J. Tei h, R. Weper, D. Fis her, and S. Trinkert. BUILDABONG: A RapidPrototyping Environment for ASIPs. In Pro . DSP-Deuts hland, pp. 153-162,O t. 2000, Muni h, Germany.[17℄ Y. Gurevi h. Evolving Algeras 1993: Lipari Guide Spe i� ation and ValidationMethods, Oxford University Press, pp. 9-36, 1995.20

Page 27: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

[18℄ M. Analau�. XASM: An Extensible, omponent-based abstra t state ma hinelanguage. In Pro . of Int. Workshop on Abstra t State Ma hines ASM 2000, Le -ture Notes on Computer S ien e, Volume 1912, pp. 69-90, Mar. 19-24, 2000,Springer.[19℄ M. Anlau�, P. Kutter, and A. Pierantonio. Formal aspe ts of and developmentenvironments for Montages. In 2nd Int. Workshop on the Theory and Pra ti e ofAlgebrai Spe i� ations, Workshops in Computing, Sep. 25-26, 1997, Springer-Verlag, Amsterdam, The Netherlands.[20℄ S. Bashford, U. Bieker, B. Harking, R. Leupers, P. Marwedel, A. Nenmann,and D. Vogenaner. The MIMOLA Language - Version4.1. Te h. Rep., LehrstuhlInformatik XII, University of Dortmund, Sep., 1994.[21℄ R. Leupers. Retargetable Code Generation for Digital Signal Pro essors. Jun.,1997, First Edition, Kluwer A ademi Publishers, Dordre ht, Netherlands[22℄ V Kathail, M. S hlansker, and B. Rau. HPL PlayDoh Ar hite ture Spe i� ation:Version 1.0. Te h. Rep. HPL-93-80, HP laboratories, Mar. 1994.[23℄ The MDES User Manual http://www.trimaran.org/do s/mdes_manual.pdf.Trimaran Release, 1998.[24℄ V. Zivojnovi , S. Pees, and H. Meyr. LISA: ma hine des ription language andgeneri ma hine model for HW/SW o-design. In Pro . of IEEE Workshop onVLSI Signal Pro essing, pp. 127-136, O t., 1996, San Fran is o, California.[25℄ C. Siska. A Pro essor Des ription Language Supporting Retargetable Multi-Pipeline DSP program Development Tools. In Pro . on 11th Int. Symposium onSyst. Synthesis, pp. 31-36, De . 2-4, 1998, Taiwan, China.[26℄ D. Lanneer, J. Van Praet, A. Ki�, K. S hoofs, W. Geurts, F. Thoen, and G.Goossens. Chess: Retargetable Code Generation for embedded DSP pro essors. InP. Marwedel, G. Goossens Code Generation for Embedded Pro essors, pp. 85-102,Kluwer A ademi Publishers, 1995. 21

Page 28: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

[27℄ P. Paulin, C. Liem, T. May, and S. Sutarwala. FlexWare: A Flexible �rmwaredevelopment environment for embedded systems. In P. Marwedel, G. Goossens,Code Generation for Embedded Pro essors, pp. 67-84, Kluwer A ademi Publish-ers, 1995.[28℄ M. Theissinger, P. Stravers, and H. Veit CASTLE: An Intera tive Environmentfor HW-SW Co-Design . In Pro . of the 3rd Int. Workshop on Hardware-SoftwareCo-design, pp. 203-209, Sep. 5-9, 1994, Grenoble, Fran e.[29℄ Trung A. D. and John Paul S. VMW: A Visualization based Mi ro-ar hite tureWorkben h. IEEE Computer, Volume 28, Number 12, pp. 57-64, De . 1995.[30℄ Rajat Moona. Pro essor Models for Retargetable Tools. In Pro . of 11th IEEEInt. Workshop on Rapid Systems Prototyping (RSP 2000), pp. 34-39, Jun. 21-23,2000, Paris, Fran e.[31℄ The BUILDABONG Proje t.http://www-date.upb.de/RESEARCH/BUILDABONG/buildabong.html.[32℄ Lisa Pro essor Design Platform.http://klaus.ert.rwth-aa hen.de/lisa/lpdp.html.

22

Page 29: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

Appendix AUser's ManualSim-nML spe i� ation of the target pro essor is �rst hanged to Intermediate Rep-resentation with the help of IR-Generator tool.A.1 IR GenerationIR-Generator tool is available with the sour e ode of disassembler tool. Usage ofIR-Generator are as followsUse : irg [-d level℄ [-h℄ [-w℄ -o ir_�le Sim-nML_�le-d : To get debug information in debug.tmp at di�erent levels (1..4) of detailValue 1 means minimum detailed information and 4 means maximum detailedinformation-h : to get this message-o : Intermediate ode will be in �le ir_�le otherwise default �le name is IR-w : to get warning messages. Default is no warning.Sim-nML_�le : input �le having Sim-nML spe i� ation of target pro essor23

Page 30: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

A.2 Usage of Generi DisassemblerDisassembler tool is ompiled with make in the sour e dire tory. Usage of the dis-assembler tool are as followsUse : Disassembler_exe [-h℄ [-o output℄ -i ir_�le - on�g obj�le-h : to get this help message-o : Disassembler writes the disassembly to the �le output.If the output �le name is not spe i�ed disassembler appends _symdis.s to ob-je t �le name to produ e the output �le name. e.g. if obje t �le name is 8q.o andoutput �le has not been spe i�ed, output is written to �le 8q_symdis.s.-i : ir_�le is generated with the Sim-nML spe i� ation of the target pro essor.- : on�g is the on�guration �le for pro essor spe i� ation in ir_�le. This�le ontains information spe i� to Sim-nML pro essor spe i� ation. It helps indetermining the instru tion type and identifying the immediate modes.obj_�le : It is the ELF obje t �le to be disassembled.Disassembler_exe is the exe utable �le for the disassembler tool. With the helpof make�le generator s ript genmake, available with disassembler tool, tool name an be on�gured a ording to the target pro essor e.g. for PowerPC603, exe utabledisassembler tool �le is named pp disa.For Elf �le, native ompiler for the target pro essor is required. Obje t �le isprodu ed with the help of ompiler, using appropriate ompilation �ags.A.3 Con�guration �leSim-nML spe i� ation is again used for produ ing the on�guration �le. SampleCon�guration �le for PowerPC603 Sim-nML des ription is given below.%IMM_MODE IMM16, IMM24, SIMM, UIMM16%BRANCH_UNCOND bran h_un ond%BRANCH_COND bran h_ ond 24

Page 31: Generic - Indian Institute of Technology Kanpur · 2007. 5. 27. · Generic Disassem bler Using Pro cessor Mo dels A Thesis Submitted in P artial F ul llmen t of the Requiremen ts

%CALL_UNCOND all_un ond%CALL_COND all_ ond%RETURN_UNCOND bran_ ond_lr%RETURN_COND ret_ ondHere bran h_un ond is the label of the subtree whi h ontains all the bran hun onditional instru tions. All the labels should be written, omma separated, in ase instru tions for one ategory are in di�erent subtrees. For writing a new on-�guration �le, hierar hy of the Sim-nML pro essor spe i� ation should be analyzed.Labels of all the subtrees, whi h ontain instru tions for a parti ular type should bewritten, omma separated, in front of that instru tion type. One an also write theop odes of instru tions in pla e of subtree labels.

25