How Main Functions Works

Embed Size (px)

Citation preview

  • 8/12/2019 How Main Functions Works

    1/12

    Linux x86 Program Start Up

    or - How the heck do we get to main()?

    by Patrick Horgan

    (Back to debugging.)

    Click to show Table of Contents

    ho!" thi" #or?

    This is for people who want to understand how programs get loaded under linux. In particular it talks about

    dynamically loaded x86 ELF files. The information you learn will let you understand how to debug problems

    that occur in your program before main starts up. Eerything I tell you is true! but some things will be glossed

    oer since they don"t take us toward our goal. Further! if you link statically! some of the details will be different.

    I won"t coer that at all. #y the time you"re done with this though! you"ll know enough to figure that out for

    yourself if you need to.

    $hi" i" what we!%% co&er (pretty picture brought to you by dot - #i%ter #or drawing directed graph")

    $hen we"re done! you"ll understand this.

    How did we get to main?

    $e"re going to build the simplest % program possible! an empty main! and then we"re going to look at thedisassembly of it to see how we get to main. $e"ll see that the first thing that"s run is a function linked to eery

    program named &start which eentually leads to your program"s main being run.

    'ae a copy of this as prog(.c if you want! and follow along. The first thing I"ll do is to build it like this.

    HOME TUTORIALS PHOTOGRAPHY DEBUGGING STUFF

    Page 1 of 12Linux x86 Program Start Up

    11/16/21!"ttp#//dbp$con%u&ting.com/tutoria&%/debugging/&inuxProgramStartup."tm&

  • 8/12/2019 How Main Functions Works

    2/12

    #efore we try to debug a later ersion of this )prog*+! in gdb! we"re going to look at the disassembly of it and

    learn a few things about how our program starts up. I"m going to show the output of ! but

    I"m not going to show it in the order it would be dumped by ob,dump! but rather in the order it would be

    executed. )#ut you"re perfectly welcome to dump it yourself. 'omething like

    will sae a copy for you! and then you can use your faorite editor to look at it. )#ut -/0I 1 -eal en /se 0I2+

    But first, how do we get to _start?

    $hen you run a program! the shell or gui calls which executes the linux system call . If you

    want more information about then you can simply type from your shell. It will comefrom section * of man where all the system calls are. To summari3e! it will set up a stack for you! and push

    onto it ! ! and . The file descriptions 4! (! and *! )stdin! stdout! stderr+! are left to whateer the

    shell set them to. The loader does much work for you setting up your relocations! and as we"ll see much later!

    calling your preinitiali3ers. $hen eerything is ready! control is handed to your program by calling

    5ere from is the section with &start.

    _start is, oddly enough, where we start

    xor of anything with itself sets it to 3ero. so the sets to 3ero. This is suggested by the #I

    )pplication #inary Interface specification+! to mark the outermost frame. 7ext we pop off the top of the

    stack. n entry we hae ! and on the stack! so the pop makes go into . $e"re ,ust

    going to sae it and push it back on the stack in a minute. 'ince we popped off ! is now pointing at

    . The puts into without moing the stack pointer. Then we the stack pointer with a mask

    that clears off the bottom four bits. 9epending on where the stack pointer was it will moe it lower! by 4 to (:

    bytes. In any case it will make it aligned on an een multiple of (6 bytes. This alignment is done so that all ofthe stack ariables are likely to be nicely aligned for memory and cache efficiency! in particular! this is

    re;uired for ''E )'treaming 'I9 Extensions+! instructions that can work on ectors of single precision floating

    point simultaneously. In a particular run! the was on entry to . fter we popped

    off the stack! was . It moed up to a higher address )putting things on the stack moes down

    in memory! taking things off moes up in memory+. fter the the stack pointer is back at .

    Now set up for calling __libc_start_main

    'o now we start pushing arguments for onto the stack. The first one! is garbage

    pushed onto the stack ,ust because < things are going to be pushed on the stack and they needed an 8th

    one to keep the (61byte alignment. It"s neer used for anything. is linked in from glibc. In

    the source tree for glibc! it lies in csu=libc1start.c. is specified like

    'o we expect &start to push those arguments on the stack in reerse order before the call to

    &&libc&start&main.

    Page 2 of 12Linux x86 Program Start Up

    11/16/21!"ttp#//dbp$con%u&ting.com/tutoria&%/debugging/&inuxProgramStartup."tm&

  • 8/12/2019 How Main Functions Works

    3/12

    Stack contents just before call of __libc_start_main

    is linked into our code from glibc! and lies in the source tree in csu=elf1init.c. It"s our

    program"s % leel destructor! and I"ll look at it later in the white paper.

    Hey! Where's the environment variables?

    9id you notice that we didn"t get enp!

    the pointer to our enironment ariables

    off the stack> It"s not one of the

    arguments to !

    either. #ut we know that is called

    so what"s up>

    $ell! calls ! who immediately uses secret inside information to find the

    enironment ariables ,ust after the terminating null of the argument ector and then sets a global ariable

    which uses thereafter wheneer it needs it including when it calls . fter

    the is established! then &&libc&start&main uses the same trick and surprise!?ust past the terminating null

    at the end of the enp array! there"s anotherector! the ELF auxiliary ector the loader uses to pass some

    information to the process. n easy way to see what"s in there is to set the enironment ariable

    before running the program. 5ere"s the result for our prog(.

    Isn"t that interesting. ll sorts of

    information. The T&E7T-@ is the address

    of &start! there"s our userid! our effectie

    userid! and our groupid. $e know we"re

    a 686! times)+ fre;uency is (44! clock1

    ticks=s> I"ll hae to inestigate this. The

    T&A59- is the location of the ELF

    program header that has information

    about the location of all the segments

    of the program in memory and about

    relocation entries! and anything else a

    loader needs to know. T&A5E7T is ,ust

    the number of bytes in a header entry.

    $e won"t chase down this path ,ust now!

    since we don"t need thatmuch

    information about the loading of a file

    to be an effectie program debugger.

    __libc_start_main in general

    That"s about as much as I"m going to get into the nitty1gritty details of how ! but in general!

    it

    B Takes care of some security problems with setuid setgid programs

    B 'tarts up threading

    B -egisters the )our program+! and )run1time loader+ arguments to get run by to

    run the program"s and the loader"s cleanup routines

    Page ! of 12Linux x86 Program Start Up

    11/16/21!"ttp#//dbp$con%u&ting.com/tutoria&%/debugging/&inuxProgramStartup."tm&

  • 8/12/2019 How Main Functions Works

    4/12

    B %alls the argument

    B %alls the with the and arguments passed to it and with the global &&eniron argument as

    detailed aboe.

    B %alls with the return alue of main

    Calling the argument

    The argument! to ! is set to which is also linked into our code. It"s

    compiled from a % program which lies in the glibc source tree in csu=elf1init.c and linked into our program.The % code is similar to )but with a lot more Cifdefs+!

    This is our program's constructor

    It"s pretty important to our program

    because it"s our executable"s

    constructor. D$aitD! you say! DThis isn"t

    %D. @es that"s true! but the concept of

    constructors and destructors doesn"t

    belong to %! and preceeded %

    ur executable! and eery other

    executable gets a % leel constructor

    and a % leel

    destructor! . Inside the

    constructor! as you"ll see! theexecutable will look for global % leel constructors and call any that it finds. It"s possible for a % program to

    also hae these! and I"ll demonstrate it before this paper is through. If it makes you more comfortable though!

    you can call them initiali3ers and finali3ers. 5ere"s the assembler generated for .

    What the

    heck is a

    thunk?

    7ot

    much to

    talk

    about

    here! but

    I thought

    you"dwant to

    see it.

    The

    get&pc&thunk thing is a little interesting. It"s used for position independent code. They"re setting up for position

    independent code to be able to work. In order for it to work! the base pointer needs to hae the address of

    the GL#L&FF'ET&T#LE. The code had something likeH

    'o! look

    closely

    at what

    Page 4 of 12Linux x86 Program Start Up

    11/16/2013http://dbp-on!u"ting#om/tutoria"!/debugging/"inuxProgramStartup#htm"

  • 8/12/2019 How Main Functions Works

    5/12

    happens. The call to ! like all other calls! pushes onto the stack the address of the next

    instruction! so that when we return! the execution continues at the next consecutie instruction. In this case!

    what we really want is that address. 'o in ! we copy the return address from the stack into

    . $hen we return! the next instruction adds to it &GL#L&FF'ET&T#LE& which resoles to the difference

    between the current address and the global offset table used by position independent code. That table

    keeps a set of pointers to data that we want to access! and we ,ust hae to know offsets into the table. The

    loader fixes up the address in the table for us. There is a similar table for accessing procedures. It could be

    really tedious to program this way in assembler! but you can ,ust write % or % and pass the 1pic argument

    to the compiler and it will do it automagically. 'eeing this code in the assembler tells you that the sourcecode was compiled with the 1pic flag.

    But what is that loop?

    The loop from will be discussed in a minute after we discuss the init)+ call that really calls

    . For now! ,ust remember that it calls any % leel initiali3ers for our program.

    _init gets the call

    k! the loader handed control to ! who called who called who

    now calls .

    It starts with the regular C calling convention

    If you want to know more about the % calling conention! ,ust look at #asic ssembler 9ebugging with G9#.

    The short story is that we sae our caller"s base pointer on the stack and point our base pointer at the top of

    the stack and then sae space for a byte local of some sort. n interesting thing is the first call. It"s purpose is

    ;uite similar to that call to get&pc&thunk that we saw earlier. If you look closely! the call is to the next

    se;uential address That gets you to the next address as if you"d ,ust continued! but with the side effect that

    the address is now on the stack. It gets popped into Jebx and then used to set up for access to the global

    access table.

    Show me your best profile

    Then we grab the address of . If it"s 3ero then we don"t call it! instead we ,ump past it. therwise!

    we call it to set up profiling. It runs a routine to start profiling! and calls at&exit to schedule another routine to

    run later to write gmon.out at the end of execution.

    This guy's no dummy! e's been framed!

    In either case! next we call frame&dummy. The intention is to call &&register&frame&info! but frame&dummy is

    called to set up the arguments to it. The purpose of this is to set up for unwinding stack frames for exception

    handling. It"s interesting! but not a part of this discussion! so I"ll leae it for another tutorial perhaps. )9on"t be

    too disappointed! in our case! it doesn"t get run anyway.+

    inally we're getting constructive!

    Finally we call &do&global&ctors&aux. If you hae a problem with your program that occurs before main starts!

    this is probably where you"ll need to look. f course! constructors for global % ob,ects are put in here but it"s

    possible for other things to be in here as well.

    Page 5 of 12Linux x86 Program Start Up

    11/16/2013http://dbp-on!u"ting#om/tutoria"!/debugging/"inuxProgramStartup#htm"

  • 8/12/2019 How Main Functions Works

    6/12

    Let's set up an example

    Let's modify our prog1 and make a prog2. The exciting part is the that tells

    gcc that the linker should stick a pointer to this in the table used by . As you can see,

    our fake constructor gets run. (!"#$T% is filled in by the compiler ith the name of the function. %t's gcc

    magic.

    prog2's _init, much the same as prog1

    %n a minute e'll drop into gdb and see it happen. )e'll be going into prog2's init.

    As you can see, the addresses are slightly different than in prog1. The extra bit of data seems to ha*e shifted

    things 2+ bytes. o, there's the name of the to functions, -aconstructor- (1 bytes ith null terminator, and

    -main- (/ bytes ith null terminator and the to format strings, -0sn- (2 bytes ith the neline as 1

    character and the null terminator, so 1 3 / 3 3 4 256 7mmm off by one somehere. %t's 8ust a guess

    anyay, % didn't go and look. Anyay, e're going to break on the call to doglobalctorsaux, and then

    single step and atch hat happens.

    And here's the code that will get called

    9ust to help, here's the $ source code for out of the gcc source code here it li*es in

    a file .

    As you can see, it initiali:es from a

    global *ariable and

    subtracts 1 from it. ;emember this is

    pointer arithmetic though and the

    pointer points at a function, so in this

    case, that

  • 8/12/2019 How Main Functions Works

    7/12

    Here's the same in assembler

    7ere's the assembler that corresponds to it from ob8dump

  • 8/12/2019 How Main Functions Works

    8/12

    )e ran it in the debugger, turned on, so that it ill alays sho us the disassembly

    for the line of code that is about to be executed, and set a breakpoint at the line in here e're about

    to call .

    % typed r to run the program and hit the breakpoint. Fy next command to gdb as , step instruction, to tell

    gdb to single step one instruction. )e'*e no entered . As e go along you'll see

    times hen it seems that % entered no command to gdb. That's because, if you simply press return, gdb ill

    repeat the last instruction. o if % press enter no, %'ll do another si.

    &k, no e'*e finished the preamble, and the real code is about to start.

    % as curious after loading the pointer so % told gdb hich means print as hexadecimal the contents

    of the register . %t's not

  • 8/12/2019 How Main Functions Works

    9/12

    #o this is *ery interesting. )e'*e single stepped into the call. #o e're in our function, .

    ince gdb has the source code for it, it shos us the $ source for the next line. ince % turned on

    , it ill also gi*e us the assembler that corresponds to that line. %n this case, it's the

    preamble for the function that corresponds to the declaration of the function, so e get all three lines of the

    preamble. %sn't that interesting6 #o %'m going to sitch o*er to the command n (for next because our printf

    is coming up. The first n ill skip the preamble, the second the printf, and the third the epilogue. %f you'*e e*er

    ondered hy you ha*e to do an extra step at the beginning and end of a function hen single stepping

    ith gdb, no you kno the anser.

    )e mo*ed the address of the string -aconstructor- onto the stack as an argument for , but it calls

    since the compiler as smart enough to see that as all e needed.

    ince e're tracing the program, it is, of course running, so e see print out abo*e. The

    closing brace (G corresponds to the epilogue so that prints out no. 9ust a note, if you don't kno about the

    instruction it does exactly the same as

    &ne more step and e exit the function and return, %'ll ha*e to sitch back to si.

    Hot curious and checked again. This time, our function pointer is

  • 8/12/2019 How Main Functions Works

    10/12

    #otice e 8umped back up into , and that's hen % typed @ to @uite the debugger. That's all

    the debugging % promised you. #o that e're back in libccsuinit there's another loop to deal ith,

    and %'m not going to step through it, but % am about to talk about it.

    Back up to

    ince e'*e spent a long tedious time dealing ith a loop in assembler and the assembler for this one is e*en

    more tedious, %'ll lea*e it to you to figure it out if you ant. 9ust to remind you, here it is in $.

    Here's another function call loop

    )hat is this initarray6 % thought you'd ne*er ask. >ou can ha*e code run at this stage as ell. ince this is

    8ust after returning from running hich ran our constructors, that means anything in this array ill run

    after constructors are done. >ou can tell the compiler you ant a function to run at this phase. The functionill recei*e the same arguments as main.

    )e on't do it, yet, because there's more things like that. Lets 8ust return from . Io you

    remember here that ill take us6

    We'll be all the way back in

    7e calls our main no, and then passes the result to exit(.

    exit() runs somemoreloops of functions

    exit( runs the functions registered ith atexit run in the order they ere added. Then he runs another loop of

    functions, this time, functions in the fini array. After that he runs another loop of functions, this time destructors.

    (%n reality, he's in a nested loop dealing ith an array of lists of functions, but trust me this is the order they

    come out in. 7ere, %'ll sho you.

    This program, hooks.c ties it all together

    Page 10 of 12Linux x86 Program Start Up

    11/16/2013http://dbp-on!u"ting#om/tutoria"!/debugging/"inuxProgramStartup#htm"

  • 8/12/2019 How Main Functions Works

    11/12

    If you build and run this, (I call it hooks.c), the output is

    The End

    I'll give you a last look at how far we've come. This time it should all be familiar territory to you.

    Page 11 of 12Linux x86 Program Start Up

    11/16/2013http://dbp-conu!ting"com/tutoria!/debugging/!inuxProgramStartup"htm!

  • 8/12/2019 How Main Functions Works

    12/12

    (Back to debugging.)

    Page 12 of 12Linux x86 Program Start Up