User-Level Interprocess Communication for Shared Memory Multiprocessors Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy. Presented by: Tim Fleck


Page 1:

User-Level Interprocess Communication for Shared Memory Multiprocessors

Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy.

Presented by: Tim Fleck

Page 2:

Interprocess Communication (IPC)

User-Level Remote Procedure Call (URPC)

URPC Design Rationale
◦ Processor Reallocation
◦ Data Transfer Using Shared Memory
◦ Thread Management

URPC Performance
◦ Latency
◦ Throughput

Related Work

Conclusion

OUTLINE

Page 3:

Central to the design of contemporary operating systems

Encourages system decomposition across address space boundaries
◦ Fault isolation
◦ Extensibility
◦ Modularity

Provides communication between different address spaces on the same machine

Interprocess Communication

Page 4:

The extent to which separate address spaces are usable depends on the performance of the communication primitives

IPC has been the responsibility of the kernel, which raises two significant issues:
◦ Architectural performance barriers
 Performance of kernel-based synchronous communication is limited by the cost of invoking the kernel and reallocating a processor between address spaces
 In prior work on LRPC, 70% of the overhead can be attributed to the kernel's mediation of the cross-address space call
◦ Interaction between kernel-based communication and high-performance user-level threads
 For satisfactory performance, medium- and fine-grained parallel applications need user-level thread management
 The costs (performance and system complexity) of partitioning strongly interdependent communication and thread management across protection boundaries are high

Interprocess Communication

Page 5:

Eliminate the kernel from the path of cross-address space communication

User-Level Remote Procedure Call improves performance because:
◦ Messages are sent directly between address spaces without invoking the kernel
◦ Unnecessary processor reallocation is eliminated
◦ When processor reallocation is needed, its cost can be amortized over multiple independent calls
◦ Exploiting the inherent parallelism in message sending and receiving improves performance

Solution

Page 6:

In many contemporary OSs, applications communicate via narrow channels, or ports

Only a few operations are available: create, send, receive, destroy

Ports permit program-to-program communication across address space boundaries, or even machine to machine

Messages are powerful, but they represent a control and data structure alien to traditional Algol-like languages

Messages Review

Page 7:

Almost every mature OS supports RPC, which lets messages do their work behind a procedure call interface

RPC provides synchronous, language-level transfer of control between programs in different address spaces

Communication occurs through a narrow channel whose specific operation is left undefined

Remote Procedure Call (RPC)

Page 8:

URPC exploits the lack of definition of the RPC channel in two ways:
◦ Messages are passed between address spaces through logical channels kept in memory shared between client and server
◦ Thread management is implemented at the user level and handles messages without kernel involvement on a call or reply

URPC presents synchronous, typed messages to the programmer, hiding the asynchronous, untyped machinery below the thread management layer

User-Level Remote Procedure Call (URPC)
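The layering on this slide can be sketched in a few lines of plain Python (a toy model with invented names such as `Channel` and `call`; the real URPC is C code operating on shared memory): the stub enqueues an untyped request with an ordinary memory write, keeps the processor busy with other work while polling, and returns only when the typed reply appears.

```python
from collections import deque

class Channel:
    """Toy stand-in for a URPC logical channel: two one-way queues
    kept in memory shared by client and server (invented names)."""
    def __init__(self):
        self.requests = deque()   # client -> server
        self.replies = deque()    # server -> client

def call(channel, proc, args, do_other_work):
    """Synchronous-looking stub over the asynchronous channel.
    Instead of trapping to the kernel, the caller polls the reply
    queue and runs other ready work while it waits."""
    channel.requests.append((proc, args))   # send: just a memory write
    while not channel.replies:
        do_other_work()                     # a user-level thread switch would go here
    return channel.replies.popleft()        # receive the typed reply

def serve_one(channel, procedures):
    """Server side: drain one request and post the reply."""
    proc, args = channel.requests.popleft()
    channel.replies.append(procedures[proc](*args))

# Usage: an invented 'Add' procedure exported by a toy server.
ch = Channel()
other_work_done = []

def pump():
    # Stand-in for 'the client has other work to do'; here it also
    # gives the server a turn so the demo terminates.
    other_work_done.append(1)
    if ch.requests:
        serve_one(ch, {"Add": lambda a, b: a + b})

result = call(ch, "Add", (2, 3), pump)
print(result)  # 5
```

The point of the sketch is only the control flow: the caller never blocks in the kernel, and all synchrony is manufactured above the asynchronous queue.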

Page 9:

URPC provides safe and efficient communication between address spaces on the same machine without kernel mediation

Isolates the three components of interprocess communication: processor reallocation, thread management, and data transfer

Kernel involvement is limited to processor reallocation

Control transfer is handled by thread management and processor reallocation

A simple procedure call with URPC has a latency of 93 µs, compared to LRPC's 157 µs

User-Level Remote Procedure Call (URPC)

Page 10:

Designed on the observation that a cross-address space call has several independent components

The main components are:
◦ Processor Reallocation
 Ensuring that there is a physical processor to handle the client's call in the server and the server's reply in the client
◦ Data Transfer Using Shared Memory
 Moving arguments between the client and server address spaces
◦ Thread Management
 Blocking the caller's thread, running a thread through the procedure in the server's address space, and resuming the caller's thread on return

URPC Design Rationale

Page 11:

The aim is to reduce the frequency of processor reallocations with an optimistic reallocation policy

Optimistic assumptions:
◦ The client has other work to do
◦ The server will soon have a processor with which to service a message

Some situations call for abandoning optimism and invoking the kernel for a reallocation:
◦ Single-threaded applications
◦ High-latency I/O
◦ Real-time applications
◦ Priority invocations

Processor Reallocation
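The pessimistic cases above amount to a small predicate. The sketch below is illustrative only (the field names on `caller` are invented; the paper describes the policy, not this code):

```python
def should_invoke_kernel(caller):
    """Decide whether to fall back on kernel processor reallocation
    instead of the optimistic user-level path.  All fields of
    `caller` are invented for illustration."""
    if caller.thread_count == 1:      # single-threaded: no other work to overlap
        return True
    if caller.high_latency_io:        # the call will block for a long time
        return True
    if caller.real_time:              # a latency bound must be met, not amortized
        return True
    if caller.priority_invocation:    # an urgent call should not sit in a queue
        return True
    return False                      # optimistic: keep the processor and poll

# Usage with throwaway records.
from types import SimpleNamespace

busy = SimpleNamespace(thread_count=4, high_latency_io=False,
                       real_time=False, priority_invocation=False)
lone = SimpleNamespace(thread_count=1, high_latency_io=False,
                       real_time=False, priority_invocation=False)
print(should_invoke_kernel(busy))  # False
print(should_invoke_kernel(lone))  # True
```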

Page 12:

The kernel handles processor reallocation to underpowered address spaces

Invoked via Processor.Donate, which identifies the receiving address space to the kernel

The receiver is given the identity of the caller by the kernel

The voluntary return of the processor is not guaranteed

Processor Reallocation
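A toy model of the Donate semantics described above (Python with invented names; the real Processor.Donate is a kernel primitive): the kernel moves a processor to the named address space, tells the receiver who donated it, and nothing forces the receiver to give it back.

```python
class ToyKernel:
    """Minimal model of the kernel's only job in URPC: moving
    processors between address spaces (names are invented)."""
    def __init__(self, assignment):
        # processor id -> address space currently holding it
        self.assignment = dict(assignment)
        self.donor_of = {}   # address space -> who last donated to it

    def donate(self, processor, caller, receiver):
        """Processor.Donate: `caller` hands `processor` to `receiver`.
        The kernel records the caller's identity for the receiver.
        Return of the processor is voluntary, so nothing here is
        scheduled to take it back."""
        assert self.assignment[processor] == caller
        self.assignment[processor] = receiver
        self.donor_of[receiver] = caller

# Usage, reusing the Editor/WinMgr example from the next slide.
kernel = ToyKernel({"P0": "Editor", "P1": "Editor"})
kernel.donate("P0", "Editor", "WinMgr")
print(kernel.assignment["P0"])    # WinMgr
print(kernel.donor_of["WinMgr"])  # Editor
```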

Page 13:

Three applications, each in its own address space:
◦ Editor as the client
◦ WinMgr as a server
◦ FCMgr as a server

Two available processors

Two threads, T1 and T2, in the client

Sample Execution

Page 14:

In URPC, each client-server pairing is bound to a pair-wise mapped logical channel in shared memory

Mapping occurs once, before the first call

Applications access URPC through the stubs layer

Safety of the communication is the responsibility of the stubs

Unlike traditional RPC, the kernel is NOT invoked to copy data from one address space to another

Data Transfer Using Shared Memory

Page 15:

Data flows over a bidirectional shared memory queue with non-spinning test-and-set locks on either end

Data Transfer Using Shared Memory
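The queue discipline can be sketched as follows (Python with invented names; the original uses test-and-set instructions on Firefly shared memory). Each end is guarded by a lock that is only ever tried, never spun on: if the test-and-set fails, the caller returns immediately and can run another thread instead of busy-waiting.

```python
import threading
from collections import deque

class NonSpinningQueue:
    """One direction of the bidirectional message queue.  The lock
    models a test-and-set word: acquire(blocking=False) is the
    test-and-set, and a failed attempt returns rather than spins."""
    def __init__(self):
        self.items = deque()
        self.lock = threading.Lock()

    def try_enqueue(self, msg):
        if not self.lock.acquire(blocking=False):  # test-and-set failed
            return False                           # caller switches threads
        try:
            self.items.append(msg)
            return True
        finally:
            self.lock.release()

    def try_dequeue(self):
        if not self.lock.acquire(blocking=False):
            return None                            # busy: come back later
        try:
            return self.items.popleft() if self.items else None
        finally:
            self.lock.release()

q = NonSpinningQueue()
print(q.try_enqueue("call: Write(fd, buf)"))  # True
print(q.try_dequeue())                        # call: Write(fd, buf)
print(q.try_dequeue())                        # None (empty, not an error)
```

The design choice the slide is making: because neither end ever spins or traps, contention costs at most one failed memory operation plus a user-level thread switch.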

Page 16:

The calling semantics of a cross-address space procedure call are synchronous with respect to the calling thread

Each communication function (send, receive) has a corresponding thread management function (start, stop)

This close interaction between threads and communication can be exploited by a user-level implementation to achieve good performance for both

Thread Management

Page 17:

Thread overhead, points of reference:
◦ Heavyweight: the kernel makes no distinction between a thread and its address space
◦ Middleweight: kernel-managed, but decoupled from the address space to allow multiple threads
◦ Lightweight: managed at the user level via libraries that execute in the context of weightier threads

Lightweight thread usage implies two-level scheduling:
◦ Lightweight threads are scheduled at the user level on heavier threads
◦ Heavier threads are scheduled by the kernel

Thread Management
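The two-level structure above can be sketched as follows (a toy with invented names; the lower, kernel tier is not modeled). The user-level scheduler multiplexes many lightweight threads over the few kernel-scheduled carriers it has been given.

```python
from collections import deque

class UserLevelScheduler:
    """Upper tier of two-level scheduling.  We simply assume the
    kernel has granted this address space `num_carriers` heavier
    threads; this class round-robins lightweight work over them."""
    def __init__(self, num_carriers):
        self.carriers = [f"kthread-{i}" for i in range(num_carriers)]
        self.ready = deque()          # runnable lightweight threads

    def spawn(self, name):
        self.ready.append(name)

    def run_once(self):
        """One scheduling round: each carrier picks the next
        lightweight thread.  A context switch here is a few
        user-level instructions, not a kernel trap."""
        ran = []
        for carrier in self.carriers:
            if not self.ready:
                break
            ran.append((carrier, self.ready.popleft()))
        return ran

# Usage: three lightweight threads, two kernel-granted carriers.
sched = UserLevelScheduler(num_carriers=2)
for t in ("T1", "T2", "T3"):
    sched.spawn(t)
print(sched.run_once())  # [('kthread-0', 'T1'), ('kthread-1', 'T2')]
print(sched.run_once())  # [('kthread-0', 'T3')]
```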

Page 18:

Cost of thread management actions, compared between URPC and Topaz threads

Breakdown of the time taken by each component when no processor reallocation is needed

URPC Performance

Page 19:

C: client processors
S: server processors
T: runnable client threads

Time for T threads to make 100,000 "Null" procedure calls

Latency is measured from the call into the Null stub until control returns from the stub

URPC Performance - Latency

Page 20:

C: client processors
S: server processors
T: runnable client threads

Time for T threads to make 100,000 "Null" procedure calls

URPC Performance - Throughput

Page 21:

URPC represents an appropriate division of responsibility between the user level and the system kernel on shared memory multiprocessors

Performance improves over kernel-mediated message methods

URPC demonstrates the advantages of designing system facilities for the capabilities of a multiprocessor machine, and of distinguishing between a multiprocessor OS and a uniprocessor OS that merely runs on a multiprocessor

Conclusion