Copyright 2013 – Noah Mendelsohn Compiling C Programs Noah Mendelsohn Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah COMP 40: Machine Structure and Assembly Language Programming (Fall 2014)


Copyright 2013 – Noah Mendelsohn

Compiling C Programs

Noah Mendelsohn
Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah

COMP 40: Machine Structure and Assembly Language Programming (Fall 2014)


© 2010 Noah Mendelsohn

How do we get from source to executable program?


Executable files

Executable file:

– A single file with all code ready to run at a fixed address in memory

– Typically the same address for all programs

Requirements

– Code divided into multiple source files (.c files and .h files)

– Functions in shared .c files need to show up in lots of executables

– Often we want to share only the compiled versions (.o files) [you don’t have the source for printf() but you use it all the time]

The challenge

– In different executables using the same shared code…

– … the same functions and global variables may wind up at different addresses …

– … but we still need to make references work across source files


Resolving external references

two_plus_one.c:

    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

arith.c:

    int sum(int a, int b) {
        return a+b;
    }

two_plus_one (executable): contains the call to sum(1,2) and the code for sum()

How do we know where sum() wound up?


From source code to executable (simplified)

two_plus_one.c:

    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

gcc -c two_plus_one.c

two_plus_one.o: relocatable object code for main()

arith.c:

    int sum(int a, int b) {
        return a+b;
    }

gcc -c arith.c

arith.o: relocatable object code for sum()


Relocatable .o files

• Contain machine code
• References within the file are resolved
• References to other files are not resolved
• Some address fields may need adjusting later, depending on final location in executable program
• Include lists of:
  1) Names and addresses of defined externals
  2) Names and referents of things needing relocation


Linking .o files to create executable

gcc -o two_plus_one two_plus_one.o arith.o

two_plus_one.o: relocatable object code for main()

arith.o: relocatable object code for sum()

two_plus_one: executable program


gcc actually runs a program named “ld” to create the executable.


To create the executable:

– Code from all .o files is collected in one executable

– A fixed load address is assumed

– All references are resolved: code and variables are updated


The executable contains all the code, with references resolved, loadable at a fixed address. It is ready to be invoked using the exec() family of system calls or from the command line [which uses exec()].


The default name for an executable is a.out, so programmers sometimes informally refer to any executable as an “a.out”.


We left out two important steps!


Preprocessor

    #include <stdio.h>
    #define TWO 2
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,TWO));
    }

Before the compiler even sees the code, the preprocessor rewrites it, handling all #define, #include and #ifdef directives and performing macro substitution.

These directives are gone before the compiler sees the code.


Preprocessor used for sharing declarations

two_plus_one.c:

    #include <stdio.h>
    #include "arith.h"
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

arith.c:

    #include "arith.h"
    int sum(int a, int b) {
        return a+b;
    }

arith.h:

    int sum(int a, int b);

Caller and callee agree on function prototype for sum()


We also left out the assembler step

The object code in a .o is binary (not human-readable)

Assembly language is a human-readable form of machine code

– Symbolic names for machine instructions

– Symbolic labels for addresses (like variables and branch targets in code)

– Etc.

When you run gcc -c it actually does three steps:

– Run the preprocessor

– Run the compiler itself to create an assembler file

– Run the assembler to create a .o

– Normally, we do these steps together, but you can use switches to run them separately


Common invocations of gcc

gcc -c two_plus_two.c
    Runs preprocessor, compiler & assembler to make two_plus_two.o

gcc -c arith.c
    Same: makes arith.o

gcc -o two_plus_two two_plus_two.o arith.o
    Uses ld to link .o files + system libraries to make the two_plus_two executable

gcc -E two_plus_two.c
    Runs just the preprocessor

gcc -S two_plus_two.c
    Runs just the preprocessor & compiler, produces assembler in a .s file

gcc -c two_plus_two.s
    Notices .s extension, runs assembler


Putting it All Together


Compiling a program

Source file containing main():

    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

Preprocessor (cpp) -> preprocessed source -> Compiler (cc1) -> assembler source -> Assembler (as) -> .o file

Source file containing sum():

    int sum(int a, int b) {
        return a+b;
    }

Preprocessor (cpp) -> preprocessed source -> Compiler (cc1) -> assembler source -> Assembler (as) -> .o file

Both .o files -> Linker (ld) -> two_plus_two (executable)


Shared Libraries (not required for COMP 40)

(These slides on shared libraries were used in COMP 111; you may find them interesting to read.)


Ooops! Where does printf come from?

gcc -o two_plus_one two_plus_one.o arith.o libc.a

two_plus_one.o: relocatable object code for main()

arith.o: relocatable object code for sum()

two_plus_one: executable program

Routines like printf live in libraries.


These are created with the “ar” command, which packages up several .o files together into a “.a” archive or library. You can list the .a along with your separate .o files and ld will pull from it any .o files it needs.


printf used to live in the system library named libc.a, which the compiler links automatically into the executable (so you don’t have to list it).


Why shared libraries?

Problem: if printf is linked from libc.a, then we get a separate copy in each program that uses printf

Idea: what if we could have one copy and use memory mapping to put it into every executable that needs it?

Challenges:

– We can’t link it when ld builds the rest of the executable: we can just note we need it

– The same copy is likely to be mapped at different addresses in different programs


Solution: compiler, linker and OS work together to support shared libraries

– gcc -fPIC -c printf.c generates “position-independent code” that can load at any address

– gcc -shared -o libc.so printf.o xxx.o obj3.o creates shared library

– gcc -o two_plus_one two_plus_one.o arith.o libc.so

We’ll use printf as an example even though it’s built in to the system…

Compile the source with -fPIC to make a position-independent .o file.


Link that printf.o and any other files with the -shared option to create a shared library (.so) file.


The linker recognizes .so files: instead of including the code, it leaves a little stub that tells the OS to find and map the shared copy of the .so file when exec loads the program.

(Actually, libc.so is so widely used that it’s automatically linked, so you don’t need to list it as you would your own .so libraries).

Memory mapping allows sharing of .so libraries

[Figure: CPU and main memory. Main memory holds the operating system and three programs (Angry Birds, Play Video, Browser). The Angry Birds and Browser address spaces each contain: Stack (call stack), Text (code), Static initialized data, Static uninitialized data, Heap (malloc’d), argv and environ, and a mapping of libc.so.]

libc.so (with printf code) shows up at different locations in the two programs

Only one copy lives in memory… everyone shares it!

Memory mapping hardware can do this…

Code must be position-independent!