Copyright 2013 – Noah Mendelsohn The Life and Death of A Process Noah Mendelsohn Tufts University Email: [email protected] Web: http://www.cs.tufts.edu/~noah Based on a presentation by Professor Alva Couch COMP 111: Operating Systems (Fall 2013)


© 2010 Noah Mendelsohn

Today

How processes are created, managed and terminated

Sharing the computer (redux)

From file.c to a.out to running image

Library routines and shared libraries


Review: Processes & the Kernel

Operating systems do two things for us:

• They make the computer easier to use

• They facilitate sharing of the computer by multiple programs and users

…actually, Unix & Linux have one more goal:

• To facilitate running the same program (and OS!) on different types of computer

The protected OS “Kernel”

[Figure: CPU and main memory; the operating system kernel sits in memory alongside multiple programs running at once (Angry Birds, Play Video, Browser).]

The operating system is a special, privileged program with its own code and data. We call the protected, shared part of the OS the “kernel”.

We need help from the hardware to protect the kernel!

[Figure: CPU and main memory holding the OS kernel and several programs (Angry Birds, Play Video, Browser).]

The hardware has memory mapping features that the OS can use to:

• Hide the kernel from other programs

• Hide programs from each other

• Convince each program it’s got its own private memory starting at address zero

Privileged instructions only the OS can use

[Figure: CPU and main memory holding the OS kernel and several programs (Angry Birds, Play Video, Browser).]

The hardware has special instructions that only the kernel can use to:

* Initiate I/O

* Set clocks and timers

* Control memory mapping

The kernel runs in “privileged” or “kernel” or “supervisor” state. Ordinary programs run in “user mode”.

If a user program tries a privileged operation, the hardware will tell the kernel!

A process is an instance of a running program

[Figure: CPU and memory holding the OS kernel and several processes (Angry Birds, Play Video, Browser), with devices: disk, printer, keyboard, mouse and display.]


How does this process get started?

How does the OS know what code to run?


Cloning a Process with fork


Process creation in Unix/Linux

Each process starts life as a clone of its parent

– Use the fork() system call to create a clone

When it’s born, each process inherits from its parent

– Open files (related processes share a file pointer)

– Environment variables

– Many other things: e.g. current directory

– An exact copy of all memory segments from the parent

– The same code running at the same place, i.e. dropping through the fork!

Each copy can tell whether it is parent or child

– Parent gets the child’s process ID as the return value from fork

– Child gets zero

– If fork fails, the parent gets -1 and no child is created


Example of fork() system call

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    pid_t child_pid;    /* child’s process id or zero */

    fprintf(stderr, "PARENT: Parent has started\n");
    child_pid = fork();

    if (child_pid < 0) {               /* fork failed: no child was created */
        perror("fork");
        return 1;
    }
    if (child_pid) {
        fprintf(stderr, "PARENT: my pid is %d and my parent is %d\n",
                getpid(), getppid());
        waitpid(child_pid, NULL, 0);   /* wait for this child to die and reap it */
        fprintf(stderr, "PARENT: my child with pid=%d has died\n", child_pid);
    } else {
        fprintf(stderr, "CHILD: my pid is %d and my parent is %d\n",
                getpid(), getppid());
    }
    return 0;
}

Using stderr instead of stdout because it’s unbuffered…output is never delayed


Print startup message and fork into two processes – a parent and a child


The parent gets the child’s process ID; the child gets zero.


In the parent…

• Print a message

• Wait for the child to complete its work

• Announce that the child has been “reaped”

• Exit (drop through)


In the child…

• Print a message

• Exit


Remember:

On a multi-core machine, the parent and the child may really be running at the same time!


OUTPUT

$ fork1
PARENT: Parent has started
PARENT: my pid is 26928 and my parent is 24979
CHILD: my pid is 26930 and my parent is 26928
PARENT: my child with pid=26930 has died
$


Question: what’s the parent’s parent?


$ ps
  PID TTY          TIME CMD
24979 pts/22   00:00:00 tcsh
26834 pts/22   00:00:00 ps
$ fork1
PARENT: Parent has started
PARENT: my pid is 26928 and my parent is 24979
CHILD: my pid is 26930 and my parent is 26928
PARENT: my child with pid=26930 has died
$

Most commands have the shell as a parent!


Some things to note about fork

Code for parent and child is the same

…we haven’t learned to run another program yet

Parent and child run in parallel

Child gets a copy of variables – changes are not seen by the parent

There is a tree of processes rooted at a special system “init” process, which is always pid=1 … every process except init has a parent!

Parent must wait for the child to die, or the dead child becomes a “zombie”


Zombies, process IDs and the process table

Every process has an ID (you’ve seen that)

Inside the kernel, there is a data structure known as a process descriptor for each process that hasn’t been “reaped” by a wait call from the parent

On 32-bit Linux systems, there are typically 32767 process IDs…if they are all in use, the system can’t create new processes

Process IDs and descriptors can’t be reused until the parent has waited

Therefore: be sure to wait for the death of every process you create!

What happens if the parent dies without waiting?

– Children get inherited by init, which will reap them

– The big problem is if you keep running and don’t reap your children!


Killing a process

Send the process a kill signal to ask/tell it to die

How?

– From another process: kill(victim_pid, SIGKILL) system call

– From the console: kill -9 victim_pid

SIGKILL (signal number 9, hence kill -9) is a magic signal that the victim cannot catch or ignore: it will kill the process immediately

Other signals are used for many purposes...covered next week

The parent still must wait/reap or the result is a zombie…the shell will reap processes of commands it launches


Brief Interlude…What’s a Signal?


Review: How can your process “call” the kernel?

[Figure: CPU and memory holding several processes (Angry Birds, Play Video, Browser) above the OS kernel, which provides the filesystem, graphics system, window system, TCP/IP networking, etc., and manages the disk, printer, keyboard, mouse and display.]

Your program can use system calls to ask the kernel for service (e.g. read, kill, etc.)


Signals: How the kernel can call your program!

[Figure: a parent and a child process in memory above the OS kernel.]

The kernel can cause a preset signal handler in your program to run…this alerts your program to some news from the kernel.

Before a signal can be caught, the receiving process must call signal() to identify the handler function:

signal(SIGNAME, handler_function)

If there is no handler, the OS supplies a default behavior.


Signal handling example

/* sigalarm.c */
#include <stdbool.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>

int sleeping = false;

void timeisup(int sig) {
    fprintf(stderr, "Quit bothering me, I was sleeping!\n");
    sleeping = false;
}

int main(void) {
    signal(SIGALRM, timeisup);
    alarm(5);                 /* Please wake me in 5 seconds */

    sleeping = true;
    while (sleeping) {
        printf("zzz...\n");
        sleep(1);
    }

    fprintf(stderr, "Dang, you woke me up!\n");
    return 0;
}

Tell the OS to call the timeisup function when the alarm signal arrives.

timeisup is the signal handler


Tell the OS to send SIGALRM in 5 seconds


Loop printing “zzz”

QUESTION: does sleeping ever become false?


YES!! When the signal arrives!

Advanced topic

A more careful version of the example declares the flag as:

volatile sig_atomic_t sleeping = false;

On most machines, the program will work fine if you declare sleeping as an int, however…

There are two issues in principle:

1) The compiler working on main() needs to know that sleeping could get updated by code it’s not seeing (the handler – volatile warns it)

2) For some data types, updating a value takes multiple instructions, and the alarm could ring while the data is in an inconsistent state. That is very unlikely for an int, but sig_atomic_t is guaranteed to be updated atomically

Glad you asked?

Lesson: asynchronous programming is tricky, and OSs do it all the time!


One process can ask kernel to signal another

[Figure: a parent and a child process in memory above the OS kernel.]


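Before a signal can be caught, the child must call:

signal(SIG_XXXX, handler_function)

If there is no handler, the OS supplies a default behavior.

The parent asks the kernel to send the signal with kill(child_pid, SIG_XXXX), e.g. kill(child_pid, SIGTSTP); the kernel then calls the child’s handler_function().

Oddly, kill is used not just for kill signals, but for all signals!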


By the way, the shell has a kill command you can use to send signals to any of your processes (or other people’s processes if you have permission). Use “man 1 kill” for more info on the shell command, and “man 2 kill” & “man 2 signal” for the system calls.


Background and suspended processes

From most shells:

– emacs myfile.txt : the shell (parent process) stays busy while Emacs runs

– emacs myfile.txt & : the & says run in the background, so the shell keeps running while Emacs runs

If you start a program in the foreground and want to do something else:

– emacs myfile.txt : the shell (parent process) stays busy while Emacs runs

– CTRL-Z: suspend Emacs and let the shell run

– Choices after CTRL-Z: fg puts the job back in the foreground; bg resumes it in the background

To find out about running background jobs:

– Run the jobs command

– Each job is named: %1, %2, etc.

– You can do things like kill -9 %1 or fg %2

Most of this is implemented with signals (e.g. CTRL-Z sends SIGTSTP, which by default pauses the process), so you can do this from a program as well as from the shell


Summary from Prof. Couch: “typing ./a.out in the shell is an explicit wait. typing ./a.out & in the shell is a background execution.”


Running a new Program with fork and exec


Using exec to launch a new program

Fork creates parallel copies of the same program

Exec replaces the code for a process with a brand new program and calls its “main” function

Common idiom: to run a new program

– fork() /* to create a child process */

– exec() /* have the child replace itself with the program to be run */

– [ optional: continue to do work in the parent while the child runs ]

– wait(): /* in the parent for the new program to complete */

Ever wondered where the status you pass to exit() goes?

– It is available to the parent via: pid = wait(&child_exit_status), where child_exit_status is an int; WEXITSTATUS(child_exit_status) extracts the value passed to exit()

– So, the parent can find out if the child returned success or an error


Examples of fork/exec

Example: running a "cat" command in the foreground:

– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait1.c

Example: running a "cat" command in the foreground with explicit wait:

– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait2.c

Example: running a "cat" command in the background with implicit wait:

– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/wait3.c

Example: running a user-typed command in the foreground without arguments:

– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/shell.c

Example: running a user-typed command in the background without arguments:

– http://www.cs.tufts.edu/comp/111/examples/The_Visible_OS/shell2.c


These examples are from Prof. Couch’s lecture


Some things to watch with exec

Read the man page to find out the arguments it takes – there are several flavors

As with fork, the new program retains:

– Open files, environment variables, current working directory, owner, etc., etc.

All data and variables from the caller are replaced – if you have buffered I/O, that will be lost


Consider the following:

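#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("this won't get seen at all...");
    execl("/bin/cat", "cat", "/dev/null", (char *)NULL);
    return 1;    /* only reached if execl fails */
}

/* prints nothing at all, because the execl erases the unwritten line buffer! */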


#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("this won't get seen at all...");
    fprintf(stderr, "this will get seen, because stderr "
                    "flushes buffers on each write...");
    execl("/bin/cat", "cat", "/dev/null", (char *)NULL);
    return 1;    /* only reached if execl fails */
}

/* prints “this will get seen, because stderr flushes buffers on each write” */
/* the execl erases the unwritten stdout buffer, but the stderr output is already done */

By default, stderr does not buffer its output…as soon as you print, it goes out. Try it!


Running, Runnable & Waiting Processes


Sharing the CPU

[Figure: main memory holding the operating system and multiple programs running at once (Angry Birds, Play Video, Browser), all sharing one CPU.]

The CPU is shared…it can only do one thing at a time*

*Modern multi-core CPUs can run one process per core at a time


Process scheduling

If processes are ready to run, the OS picks one and runs it

– The chosen process is in the running state

– Processes have priorities, and higher-priority processes are run more often

– The others are marked as ready (or runnable), i.e. they’d like to run but need to wait their turn

Some processes are healthy but waiting for something

– Reasons: sleep(), waiting for I/O, wait(), select(), page fault

– These processes are in the blocked state and they aren’t scheduled until that changes

Multicore CPUs: exactly the same, but we can have one running process on each core!


(For now, assume we have a simple one core CPU)

Designing process schedulers is an art. The strategy that gives good interactive response on a shared server may not be what you need for a massive database system!


The five state process model

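[Figure: states New, Ready, Running, Blocked and Exit. Transitions: New → Ready (Admit); Ready → Running (Dispatch); Running → Ready (Timeout); Running → Blocked (Event wait); Blocked → Ready (Event occurs); Running → Exit (Release). Ready is the “run queue” (processes in line for the CPU).]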

See: Stallings 7th Edition Page 118


In some OSs, a blocked process can die without running any cleanup.


Sharing Memory

[Figure: main memory holding the operating system and multiple programs running at once (Angry Birds, Play Video, Browser).]

All programs share memory

Memory shortage

[Figure: the operating system, Angry Birds, Play Video and Browser in main memory; a C compiler, another Browser and Emacs also need memory.]

What if we need more memory than we have?

Swapping

[Figure: some processes (e.g. Play Video) are moved out to disk so that others (C compiler, Browser, Emacs) fit in main memory.]

The OS can “swap” some process memory to disk


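The seven state process model

[Figure: the five-state model extended with two states for processes whose memory has been swapped out to disk: Ready / Suspend and Blocked / Suspend. Transitions shown: Admit, Dispatch, Timeout, Event wait, Event occurs and Release as before, plus an Event that moves a process from Blocked / Suspend to Ready / Suspend. Ready is the “run queue” (processes in line for the CPU).]

See: Stallings 7th Edition Page 118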


Processes that have been swapped to disk aren’t scheduled to run even if they’re otherwise ready


Summary of paging and swapping

30 years ago, systems swapped whole processes

Today, page-size chunks of memory are moved and mapped individually

– A process is often partially resident

– A process is blocked (not scheduled) while a page it needs is on disk

– On our systems, pagesize = 4096 bytes (try command: “getconf PAGESIZE”)

Special hardware is needed to make this work

– “Real” memory pages can be mapped to arbitrary locations in one or more virtual memories

– The hardware faults (tells the kernel) if a needed page is referenced but not mapped

– Pages can be mapped “read-only”: the hardware faults if a write is attempted

The system tends to run well if the “working sets” of pages that programs reference a lot fit together in memory


Historical reference: Denning, P.J. (1968), The working set model for program behavior. Communications of the ACM, 5/1968, Volume 11, pp. 323-333


“Copy on Write”: A Classic OS Optimization


Each process has its own virtual memory

[Figure: Angry Birds Virtual Memory. Alongside main memory (CPU, operating system, Angry Birds, Play Video, Browser), the Angry Birds process sees its own address space running from 0 to 0xFFF…FFFF, containing: Text (Angry Birds code), Static initialized data, Static uninitialized data, Heap (malloc’d), Stack (Angry Birds call stack), and argv, environ.]


Consider what happens when we fork…

…we’ll need two complete copies of the whole process memory!

Fork needs to copy the virtual memory

[Figure: after fork, memory holds two complete copies of the Angry Birds virtual memory, each with its own stack, text (code), static initialized and uninitialized data, heap, and argv, environ.]


Can we find a way to make the copy cheap?

Yes!


Map all the pages in both VMs to the same actual pages in memory…


…and mark them all read only…


If either process changes data, the OS will get the write fault and copy just the updated page!


Shared executables

By the way: the same mapping tricks allow us to share a single copy of each executable (e.g. Emacs, gcc, firefox)

This works even if they are launched using unrelated exec calls

The system typically loads at most one copy of a given executable…any later exec calls just map it!

We can also share copies of libraries by using something called “shared libraries”…we’ll study those later today


Summary of copy on write

A classic optimization used in many systems

– Hardware must support read-only mapping at the page level

The time to clone a process or other data is small and often independent of the size

– Some work is necessary to set up the new maps: depends on the hardware

Read-only data is never copied (e.g. the program code!)

Writable data is still not copied until it’s actually updated


Sharing executable programs

A similar trick is used to share the code for programs

– We’ve seen how that works with fork(), but…

Even if lots of copies of a program are loaded independently using exec(), there’s typically only one copy in memory or swapped to disk

This is a huge savings. Think of how many times our Halligan “homework” server runs:

– bash, tcsh, gcc, make, emacs, vim

– There’s typically just one loaded copy of each, no matter how many users

– Launching a new copy is very quick


How do we get from source to executable program?


From source code to executable

two_plus_one.c:

#include <stdio.h>

int sum(int a, int b);    /* defined in arith.c */

int main(int argc, char *argv[]) {
    printf("The sum is %d\n", sum(1, 2));
    return 0;
}

gcc -c two_plus_one.c   →   two_plus_one.o (relocatable object code for main())

arith.c:

int sum(int a, int b) {
    return a + b;
}

gcc -c arith.c   →   arith.o (relocatable object code for sum())


Relocatable .o files

• Contain machine code

• References within the file are resolved

• References to code or data in other files are not resolved

• Some address fields may need adjusting depending on the final location in the executable program


Linking .o files to create executable

gcc -o two_plus_one two_plus_one.o arith.o

[Figure: two_plus_one.o (relocatable object code for main()) and arith.o (relocatable object code for sum()) are linked into the executable program two_plus_one.]

gcc actually runs a program named “ld” to create the executable.


The executable contains all the code, with references resolved. It is ready to be invoked using the exec family of system calls (execl, execv, etc.).


The default name for an executable (when gcc is given no -o option) is a.out, so programmers sometimes informally refer to any executable as an “a.out”.
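As a sketch of that last step, assuming a POSIX system: the usual way to run a finished executable is to fork a child, have the child call one of the exec functions (execv here), and have the parent wait for it. The helper name run_program is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the executable at `path` with argument vector `argv` (argv[0] is the
   program name; the array must end with NULL). Returns the child's exit
   status, 127 if exec itself failed, or -1 if fork/wait failed or the
   child did not exit normally. */
static int run_program(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */
    if (pid == 0) {
        execv(path, argv);         /* on success this replaces the child's image... */
        _exit(127);                /* ...so we only get here if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* wait for (and reap) the child */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, `char *const args[] = {"two_plus_one", NULL};` followed by `run_program("./two_plus_one", args)` would be expected to run the program built above, which prints its sum and exits with status 0.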




Oops! Where does printf come from?


gcc -o two_plus_one two_plus_one.o arith.o

two_plus_one.o: relocatable object code for main()

arith.o: relocatable object code for sum()

Executable program: two_plus_one

Routines like printf live in libraries.

These are created with the “ar” command, which packages up several .o files together into a “.a” archive or library. You can list the .a along with your separate .o files and ld will pull from it any .o files it needs.

printf used to live in the system library named libc.a, which the compiler links automatically into the executable (so you don’t have to list it).
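"ld will pull from it any .o files it needs" hides a small fixed-point loop. Below is a toy model of the general idea, using hypothetical structures rather than the real ar format or any particular linker's exact rules: keep track of symbols that are needed but not yet defined, extract any archive member that defines one of them, and repeat, because an extracted member may itself need more symbols.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16        /* toy capacity of a symbol set */
#define MAX_PER_MEMBER 4   /* toy capacity of a member's symbol lists */

/* Hypothetical archive member: one .o file packaged inside a .a library. */
struct member {
    const char *name;                      /* e.g. "printf.o" */
    const char *defines[MAX_PER_MEMBER];   /* symbols it defines (NULL-padded) */
    const char *needs[MAX_PER_MEMBER];     /* symbols it references (NULL-padded) */
    bool extracted;                        /* copied into the executable yet? */
};

struct symset { const char *names[MAX_SYMS]; size_t n; };

static bool has(const struct symset *s, const char *name) {
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->names[i], name) == 0)
            return true;
    return false;
}

static void add(struct symset *s, const char *name) {
    if (!has(s, name) && s->n < MAX_SYMS)   /* toy model: ignores overflow */
        s->names[s->n++] = name;
}

/* Extract every member that defines a symbol that is needed but not yet
   defined, repeating until nothing changes (an extracted member can create
   new needs). Updates both sets; returns how many members were extracted. */
static size_t pull_members(struct member *ar, size_t nmembers,
                           struct symset *needed, struct symset *defined) {
    size_t pulled = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t m = 0; m < nmembers; m++) {
            if (ar[m].extracted)
                continue;
            bool wanted = false;
            for (size_t d = 0; d < MAX_PER_MEMBER && ar[m].defines[d]; d++)
                if (has(needed, ar[m].defines[d]) && !has(defined, ar[m].defines[d]))
                    wanted = true;
            if (!wanted)
                continue;
            ar[m].extracted = true;
            pulled++;
            changed = true;
            for (size_t d = 0; d < MAX_PER_MEMBER && ar[m].defines[d]; d++)
                add(defined, ar[m].defines[d]);
            for (size_t k = 0; k < MAX_PER_MEMBER && ar[m].needs[k]; k++)
                add(needed, ar[m].needs[k]);
        }
    }
    return pulled;
}
```

With a made-up archive of vfprintf.o, printf.o, write.o and qsort.o (printf needing vfprintf, vfprintf needing write), and printf as the program's only undefined symbol, this extracts printf.o, vfprintf.o and write.o but never qsort.o: only code the program actually uses ends up in the executable.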



Why shared libraries?

Problem: if printf is linked from libc.a, then we get a separate copy in each program that uses printf

Idea: what if we could keep one copy and use memory mapping to put it into every process that needs it?

Challenges:

– We can’t link it when ld builds the rest of the executable: we can just note we need it

– The same copy is likely to be mapped at different addresses in different programs

Solution: compiler, linker and OS work together to support shared libraries

– gcc -c -fPIC printf.c generates “position-independent code” that can load at any address

– gcc -shared -o libc.so printf.o xxx.o obj3.o creates a shared library

– gcc -o two_plus_one two_plus_one.o arith.o libc.so links the program against the shared library


We’ll use printf as an example even though it’s built into the system…

Compile the source with -c -fPIC to make a position-independent .o file.
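Why does position-independence matter? Here is a rough analogy in plain C rather than machine code: data that stores an absolute pointer to another part of itself breaks when its bytes are copied to a new address, while data that stores an offset relative to its own location keeps working. PIC code relies on the same idea, computing addresses relative to the current instruction or through tables filled in at load time. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A tiny "image" (hypothetical) that refers to its own data in two ways. */
struct image {
    int value;            /* the data the reference should reach */
    const int *absolute;  /* absolute address: correct only where it was built */
    ptrdiff_t relative;   /* distance from this field to `value`, in bytes */
};

/* Build the image in place, filling in both kinds of reference. */
static void image_init(struct image *img, int v) {
    img->value = v;
    img->absolute = &img->value;
    img->relative = (ptrdiff_t)offsetof(struct image, value)
                  - (ptrdiff_t)offsetof(struct image, relative);
}

/* Follow the self-relative reference from wherever the image now lives,
   much as PC-relative code computes addresses from the current instruction. */
static const int *image_lookup_relative(const struct image *img) {
    return (const int *)((const char *)&img->relative + img->relative);
}
```

If an image built with value 42 is copied elsewhere with memcpy, the copy's relative reference finds the copy's own value, while its absolute pointer still points back at the original location.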


Link printf.o and any other object files with the -shared option to create a shared library (.so) file.


The linker recognizes .so files… instead of including the code, it leaves a small note in the executable saying that the library is needed. When exec loads the program, the dynamic loader uses that note to find the shared copy of the .so file and map it into the process.

(Actually, libc.so is so widely used that it’s linked automatically, so you don’t need to list it as you would your own .so libraries.)


Memory mapping allows sharing of .so libraries

[Figure: the CPU and main memory, which holds the operating system and several running programs (Angry Birds, Play Video, Browser). The Angry Birds and Browser address spaces are each shown with their stack, text (code), static initialized data, static uninitialized data, heap (malloc’d), argv/environ, and a mapped libc.so.]

libc.so (with the printf code) shows up at different locations in the two programs.
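On Linux you can see these mappings directly: the kernel lists every region mapped into a process in the file /proc/self/maps, including the code of libc and other .so files. The sketch below is Linux-specific and its function name is hypothetical; it prints and counts the mapped regions whose pathname contains ".so". Running it in two different programs (or twice in one, with address-space randomization enabled) would typically show the shared libraries at different addresses.

```c
#include <stdio.h>
#include <string.h>

/* Print (to `out`, if non-NULL) and count the lines of /proc/self/maps whose
   pathname contains ".so", e.g. libc.so.6 or ld-linux-x86-64.so.2.
   Returns -1 if the file can't be opened (e.g. no /proc filesystem).
   Toy simplification: assumes each line fits in the buffer. */
static int show_shared_objects(FILE *out) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    char line[512];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, ".so") != NULL) {
            if (out != NULL)
                fputs(line, out);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Calling show_shared_objects(stdout) prints the matching lines; each gives an address range, permissions (r-xp for code), file offset, device, inode and pathname.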


[Figure: the same diagram, now with a single libc.so in main memory mapped into both programs.]

Only one copy of libc.so lives in memory… everyone shares it!


Memory mapping hardware can do this… but the code must be position-independent!


Wrapup


Summary of today’s topics

Processes are cloned using fork

To run a new program: fork then exec

Kernel-to-program communication:

– A process calls the kernel using a system call (trap)

– The kernel calls a process using a signal

Processes form a tree, and dead processes must be “reaped”

The OS scheduler chooses high priority processes to run

Process memory can be swapped or paged to disk when memory is tight

Memory mapping & copy-on-write are used:

– To make fork quick

– To save memory by sharing executables and shared libraries across processes
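A short sketch tying several of these points together, assuming a POSIX system: after fork, parent and child start with identical memory (shared copy-on-write behind the scenes), but when the child writes to a variable it gets its own private copy, so the parent's value is unchanged. The parent then reaps the child with waitpid. Copy-on-write itself is invisible to the program; what you can observe is the separate copies. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 100;   /* one variable, soon to exist in two processes */

/* Fork a child that adds `delta` to counter and exits with the result
   (only the low 8 bits of an exit status survive). The parent reaps the
   child, stores its exit status in *child_result, and returns the parent's
   own counter, or -1 on error. */
static int fork_and_modify(int delta, int *child_result) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        counter += delta;            /* the write gives the child its own copy */
        _exit(counter & 0xff);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;                   /* waitpid also reaps the dead child */
    *child_result = WEXITSTATUS(status);
    return counter;                  /* the parent's copy is untouched */
}
```

With delta 5, the child exits with 105 while the parent still sees 100.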


Thank you!