Improving IPC by Kernel Design
Jochen Liedtke
Shane Matthews
Portland State University
3/12/2004 Portland State University
Summary
• Review
• Performance improved
  – Architecture Level
  – Algorithmic Level
  – Interface Level
  – Coding Level
Micro-kernels
• Minimal OS, providing a set of primitives used to implement thread/address space management and IPC [1]
• Everything else is moved to user space (servers)
Terminology (L3)
• Dataspace
  – Memory object, mapped into address space
• Task
  – Composed of threads, dataspaces, and an address space
• Message
  – String/memory object
L3 Architecture & IPC
• Active components communicate via messages
• Applies to:
  – Device drivers
    • Implemented as user-level tasks
  – Hardware interrupts
    • Interrupt message from micro-kernel to thread
L3 Redesign Principles
• IPC performance is the master
  – Security and performance must not be affected
• Synergetic effects taken into consideration
  – (Think combined effects)
  – May lead to reinforcement or diminution
• Design must aim at performance goal
  – Per short message transfer
  – 350 cycles (7 microseconds)
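The two figures in the goal agree if the clock runs at 50 MHz, which is the speed of the 486 DX-50 listed under the test system later in these slides. A minimal sketch of the conversion (the function name is illustrative, not from the slides):

```python
def cycles_to_microseconds(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds for a clock given in MHz."""
    # A clock of f MHz executes f cycles per microsecond.
    return cycles / clock_mhz


# Assumption: 50 MHz clock, taken from the "Intel 486 DX-50" test system.
goal_us = cycles_to_microseconds(350, 50.0)
print(goal_us)
```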
Architectural Level
• Messages
• Process Structure
• Control Blocks
Compound Messages
• Multiple send/receive -> 1 send/receive
• Messages consist of direct strings, indirect strings, and memory objects
Twofold message copy
• [A space] -> [kernel] -> [B space]
• Roughly 20 + 0.75n cycles, where n is the message size in bytes
• Good for small messages
• Need something better as n grows
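To see why the copy cost matters as n grows, the estimate above can be compared with the 350-cycle short-message goal. This is a rough sketch that takes the 20 + 0.75n figure at face value and ignores cache and TLB effects; the function and constant names are illustrative.

```python
def copy_cycles(n_bytes: int) -> float:
    """Estimated copy cost in cycles, using the 20 + 0.75n figure from the slide."""
    return 20 + 0.75 * n_bytes


GOAL_CYCLES = 350  # short-message target from the redesign principles

for n in (8, 64, 512, 4096):
    cost = copy_cycles(n)
    # Show the cost both in cycles and as a multiple of the whole IPC budget.
    print(f"{n:5d} bytes: {cost:7.1f} cycles, {cost / GOAL_CYCLES:5.2f}x the goal")
```

Under this estimate an 8-byte message costs 26 cycles to copy, while a 4096-byte message costs 3092 cycles, almost nine times the entire short-message budget.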
LRPC and SRC RPC
• Client/server share user-level memory
  – sender -> shared buffer
• Problems
  – When server to client is 1 to many, shared regions of address space become critical resources
  – Shared regions require explicit opens (unlike L3)
  – Message may change during or after checking
Direct Message Copy Via Windows
• L3's method
  – Destination mapped into window
  – Message copied to window
• Window
  – Per address space
  – Accessed exclusively by kernel
Communication Windows
• Problems
  – Must be fast
  – Different threads coexisting within address space
• L3 Implementation
  – Copy one page directory word from B to A
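The window mapping can be pictured as copying a single page-directory entry. The sketch below is a toy model, not L3 code: page directories are plain lists, `WINDOW_SLOT` is a hypothetical slot reserved for the window, and the TLB handling a real kernel needs is left out.

```python
ENTRIES = 1024       # a 486 page directory has 1024 entries, each covering 4 MB
WINDOW_SLOT = 1000   # hypothetical slot reserved for the communication window


class AddressSpace:
    def __init__(self, name):
        self.name = name
        # Each element stands in for one page-directory word (None = unmapped).
        self.page_directory = [None] * ENTRIES


def map_window(sender, receiver, dest_slot):
    """Expose the receiver's region in the sender's window by copying one word."""
    sender.page_directory[WINDOW_SLOT] = receiver.page_directory[dest_slot]


def unmap_window(space):
    """Drop the temporary mapping once the message has been copied."""
    space.page_directory[WINDOW_SLOT] = None


a = AddressSpace("A")
b = AddressSpace("B")
b.page_directory[3] = "page table holding B's receive buffer"
map_window(a, b, dest_slot=3)   # the kernel can now copy straight into B's buffer
```

The point of the design is that the mapping cost is independent of message size: one word is copied, however long the message is.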
Process Structure
• Threads running in kernel mode have 1 kernel stack per thread
  – Efficient, since interrupts, page faults, and IPC already save state on the kernel stack
• Continuations
  – Pro:
    • Reduce kernel stack
  – Cons:
    • Require additional copies between kernel and continuation
    • Interfere with other optimizations
Thread Control Blocks
• Implemented as large array in kernel
  – Fast TCB access
    • Array base + TCB # × TCB size
  – Saves TLB misses (IPC)
    • Kernel stacks of sender and receiver located in TCB page
  – Locking done by unmapping the TCB
Algorithmic Level
• Thread Identifier
• Lazy Scheduling
• Short Messages Via Registers
Thread Identifier
• Thread addressed by 64-bit UID in user-mode
• Thread number in lower 32 bits of UID
  – AND with bit mask, add to TCB's array base
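The mask-and-add lookup can be sketched as follows. Every constant here is hypothetical, and the layout is an assumption of this sketch: the thread number is stored already multiplied by the TCB size, so no multiplication is needed at lookup time.

```python
TCB_ARRAY_BASE = 0xE000_0000   # hypothetical kernel virtual address of the TCB array
TCB_SIZE = 0x800               # hypothetical TCB size; must be a power of two
MAX_THREADS = 1 << 16          # hypothetical thread limit

# Selects the pre-scaled thread number and discards every other UID bit.
TCB_MASK = (MAX_THREADS - 1) * TCB_SIZE


def make_uid(thread_no: int, upper_bits: int) -> int:
    """Build a 64-bit UID: arbitrary upper 32 bits, pre-scaled thread number below."""
    return (upper_bits << 32) | (thread_no * TCB_SIZE)


def tcb_address(uid: int) -> int:
    """One AND and one ADD turn a UID into the address of its TCB."""
    return TCB_ARRAY_BASE + (uid & TCB_MASK)


uid = make_uid(thread_no=5, upper_bits=0x1234)
print(hex(tcb_address(uid)))
```

The result equals array base + thread number × TCB size, the same address an explicit array index would give.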
Lazy Scheduling
• IPC operation call or reply & receive next
  – Delete sending thread from ready queue
  – Insert into waiting queue
  – Delete receiving thread from waiting queue
  – Insert into ready queue
• Too many queue operations!
Lazy Scheduling cont.
• L3 queue invariants
  – Ready queue contains all ready threads
  – Waiting queue contains at least all threads waiting
• TCB contains thread's state (ready/waiting)
• Scheduler removes all threads not belonging to queue during queue parsing
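The invariants above can be illustrated with a toy ready queue. This is a simplified sketch, not L3's implementation: only the ready queue is modeled, and the class, field, and function names are made up for the example. The IPC path changes only the state stored in the TCB; stale queue entries are discarded later, when the scheduler scans the queue.

```python
from collections import deque

READY, WAITING = "ready", "waiting"


class Thread:
    def __init__(self, name):
        self.name = name
        self.state = WAITING          # authoritative state, as kept in the TCB
        self.in_ready_queue = False   # whether a (possibly stale) entry exists


ready_queue = deque()


def make_ready(thread):
    """Mark a thread ready; touch the queue only if it has no entry yet."""
    thread.state = READY
    if not thread.in_ready_queue:
        ready_queue.append(thread)
        thread.in_ready_queue = True


def ipc_block(thread):
    """IPC path: flip the state only, leaving the queue entry where it is."""
    thread.state = WAITING


def schedule():
    """Scan the queue, dropping stale entries; return the next ready thread."""
    while ready_queue:
        thread = ready_queue.popleft()
        if thread.state == READY:
            ready_queue.append(thread)   # round robin: requeue at the tail
            return thread
        thread.in_ready_queue = False    # stale entry, discard it
    return None


client, server = Thread("client"), Thread("server")
make_ready(client)
make_ready(server)
ipc_block(client)    # client calls the server: no queue operation
make_ready(client)   # reply arrives before the scheduler ran: still no queue operation
```

In this run the call and the reply cost two state changes and no queue insertions or deletions, which is the saving the slide is after.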
Short Messages Via Registers
• High proportion of messages are short
  – E.g. driver ack/error, hardware interrupts
• 486
  – 7 general registers
  – 3 needed for sender ID and result code
  – 4 available
• 8-byte messages using coding scheme
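The slides do not describe the coding scheme itself. As a stand-in, the sketch below splits an 8-byte message into two 32-bit values, one per register; the little-endian split and the function names are assumptions made for illustration only.

```python
import struct


def pack_short_message(payload: bytes) -> tuple:
    """Split an 8-byte message into two 32-bit register values (little-endian)."""
    if len(payload) != 8:
        raise ValueError("this sketch handles exactly 8 bytes")
    return struct.unpack("<II", payload)


def unpack_short_message(reg_lo: int, reg_hi: int) -> bytes:
    """Rebuild the 8-byte message on the receiving side."""
    return struct.pack("<II", reg_lo, reg_hi)


# Hypothetical driver acknowledgement carried entirely in registers.
regs = pack_short_message(b"ACK\x00\x01\x00\x00\x00")
message = unpack_short_message(*regs)
```

Because the payload never touches memory, a transfer of this kind needs no copy and no window mapping.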
Interface Level
• Simple RPC stubs
  – Load registers, system call, check success
  – Compiler generates stubs inline
• Parameter Passing
  – Use registers when possible
Coding Level
• Reduce cache and TLB misses
  – Short kernel code
    • Short jumps, use registers, short address displacements
  – IPC kernel code in one page
  – Handle save/restore of coprocessor lazily
    • Delayed until a different thread needs to use it
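Lazy coprocessor handling can be sketched with a toy model in which a single float stands in for the coprocessor register file. The class and function names are illustrative. On real x86 hardware the same effect is typically achieved by setting the TS flag in CR0 and doing the save/restore in the device-not-available exception handler; here `use_fpu` plays that role.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.saved_fpu = 0.0       # this thread's saved coprocessor state


class Cpu:
    def __init__(self):
        self.current = None        # thread currently running
        self.fpu_owner = None      # thread whose state sits in the coprocessor
        self.fpu_regs = 0.0        # stands in for the coprocessor register file
        self.swaps = 0             # number of save/restore operations performed


def switch_to(cpu, thread):
    """Context switch: coprocessor state is deliberately left untouched."""
    cpu.current = thread


def use_fpu(cpu, value):
    """Run when the current thread executes a coprocessor instruction."""
    thread = cpu.current
    if cpu.fpu_owner is not thread:
        if cpu.fpu_owner is not None:
            cpu.fpu_owner.saved_fpu = cpu.fpu_regs   # save the previous owner's state
        cpu.fpu_regs = thread.saved_fpu              # restore this thread's state
        cpu.fpu_owner = thread
        cpu.swaps += 1
    cpu.fpu_regs += value
    return cpu.fpu_regs


cpu = Cpu()
a, b = Thread("a"), Thread("b")
switch_to(cpu, a)
use_fpu(cpu, 1.5)
switch_to(cpu, b)   # b never touches the coprocessor
switch_to(cpu, a)   # so a's state is still loaded and nothing is saved or restored
```

Threads that exchange messages without using the coprocessor, the common case for IPC, never pay for a save or restore.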
Results
• An increase of 100% means IPC time doubles
• Removing all of the optimizations increases IPC time by 134% for an 8-byte message
Results
• L3 vs. Mach
• System
  – Intel 486 DX-50
  – 256 KB external cache
  – 16 MB memory
Results cont.
Conclusions
• IPC improved by applying
  – Performance-based reasoning
  – Synergetic effects
  – Architecture -> coding
References
• [1] http://en.wikipedia.org/wiki/Micro_kernel
• [2] Improving IPC by Kernel Design - Jochen Liedtke