Inter-Processor Communication for Heterogeneous Dual Core Systems Chun-Ming Huang, Ph.D. National Chip Implementation Center (CIC) [email protected] 2006/09/27


C. M. Huang / SLDC-IPC / 09.2006 2

Agenda
– IPC Overview
– IPC Schemes
– Nokia DSP Gateway
– TI DSP/BIOS Link
– IPC Hardware Architecture
– Conclusions

IPC Overview


What is IPC?
– Inter-Process Communication
– Inter-Processor Communication

[Figure: processes (P1, P2, P3) distributed over single-chip and multi-chip systems built from single-core and multi-core processors (cores C1 to C4)]

How to provide inter-process communication services for multi-core systems?


Independent & Cooperating Processes

Processes executing concurrently in a multitasking environment may be either independent processes or cooperating processes.

A process is independent if it cannot affect or be affected by the other processes executing in the system; any process that does not share data with any other process is independent

A process is cooperating if it can affect or be affected by the other processes executing in the system; any process that shares data with other processes is a cooperating process

Silberschatz, et al., Operating System Principles, Seventh Edition


Why Allow Process Cooperation?
– Information sharing
– Computation speedup
– Modularity
– Convenience

Cooperating processes require an inter-process communication (IPC) mechanism that allows them to exchange data and information.



IPC Example: Unix pipe

  $ ls -l / | grep 2005 | wc
        2      19      98

The grep utility searches text files for a pattern and prints all lines that contain that pattern. The wc utility displays a count of lines, words, and characters in a text file. The pipe provides both data exchange and synchronization between the three processes.


Operating System Kernel Components
– Process scheduler: determines when and for how long a process executes on a processor
– Memory manager: determines when and how memory is allocated to processes and what to do when memory becomes full
– I/O manager: services input and output requests from and to hardware devices
– Inter-process communication (IPC) manager: allows processes to communicate with one another
– File system manager: organizes named collections of data on storage devices and provides an interface for accessing data on those devices

Deitel, et al., Operating Systems, Third Edition


Linux Kernel 2.6.17.11

Top-level source directories:
drwxr-xr-x arch
drwxr-xr-x block
drwxr-xr-x crypto
drwxr-xr-x drivers
drwxr-xr-x fs
drwxr-xr-x include
drwxr-xr-x init
drwxr-xr-x ipc
drwxr-xr-x kernel
drwxr-xr-x lib
drwxr-xr-x mm
drwxr-xr-x net
drwxr-xr-x scripts
drwxr-xr-x security
drwxr-xr-x sound
drwxr-xr-x usr

Contents of ipc/:
-rw-r--r-- Makefile
-rw-r--r-- compat.c
-rw-r--r-- compat_mq.c
-rw-r--r-- mqueue.c
-rw-r--r-- msg.c
-rw-r--r-- msgutil.c
-rw-r--r-- sem.c
-rw-r--r-- shm.c
-rw-r--r-- util.c
-rw-r--r-- util.h

http://www.kernel.org


Machine-Independent SW in the FreeBSD Kernel

Category                          Lines of Code   Percentage of Kernel (%)
headers                                  38,158     4.8
initialization                            1,663     0.2
kernel facilities                        53,805     6.7
generic interfaces                       22,191     2.8
interprocess communication               10,019     1.3
terminal handling                         5,798     0.7
virtual memory                           24,714     3.1
vnode memory                             22,764     2.9
local filesystem                         28,067     3.5
miscellaneous filesystems (19)           58,753     7.4
network filesystem                       22,436     2.8
network communication                    46,570     5.8
Internet V4 protocols                    41,220     5.2
Internet V6 protocols                    45,527     5.7
IPsec                                    17,956     2.2
netgraph                                 74,338     9.3
cryptographic support                     7,515     0.9
GEOM layer                               11,563     1.4
CAM layer                                41,805     5.2
ATA layer                                14,192     1.8
ISA bus                                  10,984     1.4
PCI bus                                  72,366     9.1
pccard bus                                6,916     0.9
Linux compatibility                      10,474     1.3
Total machine independent               689,794    86.4

McKusick & Neville-Neil, The Design and Implementation of the FreeBSD Operating System


Homogeneous vs. Heterogeneous

[Figure: TI OMAP 5910 and Sun processor examples]


Multiprocessor OS Organizations

Systems can be classified by how processors share operating system responsibilities. Three types:
– Master/slave
– Separate kernels
– Symmetrical organization

Deitel, et al., Operating Systems, Third Edition


Master/Slave

Master/slave organization
– Master processor executes the operating system
– Slaves execute only user processes
– Hardware asymmetry
– Low fault tolerance
– Good for computationally intensive jobs
– Example: nCUBE system



Separate Kernels

Separate kernels organization
– Each processor executes its own operating system
– Some globally shared operating system data
– Loosely coupled
– Catastrophic failure unlikely, but failure of one processor results in termination of processes on that processor
– Little contention over resources
– Example: Tandem system



Symmetrical Organization

Symmetrical organization
– Operating system manages a pool of identical processors
– High amount of resource sharing
– Need for mutual exclusion
– Highest degree of fault tolerance of any organization
– Some contention for resources
– Example: BBN Butterfly



Memory Access Architectures

Memory access
– Can classify multiprocessors based on how processors share memory
– Goal: fast memory access from all processors to all memory
  • Contention in large systems makes this impractical



Uniform Memory Access

Uniform memory access (UMA) multiprocessor
– All processors share all memory
– Access to any memory page is nearly the same for all processors and all memory modules (disregarding cache hits)
– Typically uses shared bus or crossbar-switch matrix
– Also called symmetric multiprocessing (SMP)
– Small multiprocessors (typically two to eight processors)



Uniform Memory Access



Non-Uniform Memory Access

Non-uniform memory access (NUMA) multiprocessor

– Each node contains a few processors and a portion of system memory, which is local to that node

– Access to local memory faster than access to global memory (rest of memory)

– More scalable than UMA (fewer bus collisions)



Non-Uniform Memory Access



Cache-Only Memory Architecture

Cache-only memory architecture (COMA) multiprocessor
– Physically interconnected as a NUMA is
  • Local memory vs. global memory
– Main memory is viewed as a cache and called an attraction memory (AM)
  • Allows system to migrate data to the node that most often accesses it, at the granularity of a memory line (more efficient than a memory page)
  • Reduces the number of cache misses serviced remotely
– Overhead
  • Duplicated data items
  • Complex protocol to ensure all updates are received at all processors



Cache-Only Memory Architecture



No Remote Memory Access

No-remote-memory-access (NORMA) multiprocessor
– Does not share physical memory
– Some implement the illusion of shared physical memory: shared virtual memory (SVM)
– Loosely coupled
– Communication through explicit messages
– Distributed systems
– Not networked system



No Remote Memory Access



Four Possible Cases

                       Symmetrical OSs              Asymmetrical OSs
Homogeneous cores      CPU_A(OS_X), CPU_A(OS_X)     CPU_A(OS_X), CPU_A(OS_Y)
Heterogeneous cores    CPU_A(OS_X), CPU_B(OS_X)     CPU_A(OS_X), CPU_B(OS_Y)

IPC Schemes


Communication via Files

Communication via files is in fact the oldest way of exchanging data between programs. Program A writes data to a file and Program B reads it. In a system in which only one program can run at any given time, this does not present any problem.

In a multitasking system, however, both programs can run as processes at least quasi-parallel to each other. Race conditions then usually produce inconsistencies in the file data, which result from one program reading a data area before the other has finished modifying it, or from both processes modifying the same area of the file at the same time.


Communication via Files

Locking entire files
– lock file
– fcntl() (POSIX), flock() (BSD 4.3)

Locking file areas (record locking)
– Deadlock

[Figure: deadlock in record locking, where Process 1 and Process 2 interleave read and write operations on records 1, 2, ...]
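A minimal sketch of record locking with the POSIX fcntl() call listed above: a writer takes an exclusive lock on only the bytes of one fixed-size record, updates it, and releases the lock, leaving other records available to other processes. The helper names lock_region() and update_record() and the fixed-record file layout are illustrative assumptions. F_SETLKW waits until the lock is granted; two processes that lock records in opposite orders with it can produce exactly the deadlock pictured above (POSIX allows F_SETLKW to fail with EDEADLK when it detects such a cycle).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Apply a lock of the given type (F_RDLCK, F_WRLCK or F_UNLCK) to the
 * byte range [start, start + len) of fd. F_SETLKW waits until the lock
 * can be granted. Returns 0 on success, -1 on error (errno is set). */
static int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;              /* 0 would mean "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Hypothetical helper: overwrite record number `index` (each record is
 * reclen bytes) while holding a write lock on that record only.
 * Returns 0 on success, -1 on error. */
static int update_record(int fd, int index, const char *rec, size_t reclen)
{
    off_t off = (off_t)index * (off_t)reclen;

    if (lock_region(fd, F_WRLCK, off, (off_t)reclen) == -1)
        return -1;
    ssize_t n = pwrite(fd, rec, reclen, off);
    lock_region(fd, F_UNLCK, off, (off_t)reclen);
    return n == (ssize_t)reclen ? 0 : -1;
}
```

Locking the whole file instead is the same call with l_start = 0 and l_len = 0.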


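Process Communication Models
– Message passing
– Shared memory

[Figure: in message passing, message M travels from Process A through the kernel to Process B (steps 1 and 2); in shared memory, Process A and Process B exchange M through a shared region (steps 1 and 2)]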



IPC for Linux

Linux IPC
– Many IPC mechanisms derived from traditional UNIX IPC
  • Allow processes to exchange information
– Some are better suited for particular applications
  • For example, those that communicate over a network or exchange short messages with other local applications



IPC for Linux
– Signal
– Pipe
– Message queue
– Shared memory
– System V semaphores
– Sockets


Signals

Signals
– One of the first interprocess communication mechanisms available in UNIX systems
– Kernel uses them to notify processes when certain events occur
– Do not allow processes to specify more than a word of data to exchange with other processes
– Created by the kernel in response to interrupts and exceptions, and sent to a process or thread
  • as a result of executing an instruction (such as a segmentation fault)
  • from another process (such as when one process terminates another)
  • from an asynchronous event



POSIX Signals



Signals

A process/thread can handle a signal in one of three ways:
1. Ignore the signal: processes can ignore all signals except SIGSTOP and SIGKILL.
2. Catch the signal: when a process catches a signal, it invokes its signal handler to respond to the signal.
3. Execute the default action that the kernel defines for that signal.

Default actions
– Abort: terminate immediately
– Memory dump: copy the execution context before exiting
– Ignore
– Stop (i.e., suspend)
– Continue (i.e., resume)
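The three choices above map directly onto the POSIX sigaction() call: SIG_IGN ignores, a function pointer catches, and SIG_DFL restores the kernel's default action. A minimal sketch, using SIGUSR1/SIGUSR2 as example signals; set_disposition() and on_usr1() are illustrative names, not part of any standard API.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written from the handler; volatile sig_atomic_t is the type the C
 * standard guarantees can be safely assigned in a signal handler. */
static volatile sig_atomic_t usr1_caught = 0;

static void on_usr1(int signo)
{
    (void)signo;
    usr1_caught = 1;             /* keep handlers minimal and async-signal-safe */
}

/* Install a disposition for signo: SIG_IGN (ignore), SIG_DFL (default
 * action) or a handler function (catch). Returns 0 or -1. */
static int set_disposition(int signo, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);    /* no extra signals blocked; the caught
                                    signal itself is still blocked while
                                    its handler runs */
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```

Trying to ignore or catch SIGKILL (or SIGSTOP) fails: sigaction() returns -1 with errno set to EINVAL.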



Signals

Signal blocking
– A process or thread can block a signal
  • Signal is not delivered until the process/thread stops blocking it
– While a signal handler is running, signals of that type are blocked by default
  • Still possible to receive signals of a different type
– Common signals are not queued
  • Real-time signals provide signal queuing
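A sketch of the blocking rules above using POSIX sigprocmask() and sigpending(): SIGUSR1 is blocked, generated twice, observed as pending, and then delivered when it is unblocked. Because SIGUSR1 is a standard (non-real-time) signal, the two occurrences collapse into one pending instance. The result of exactly one handler run after unblocking is Linux behavior; POSIX leaves it implementation-defined whether a repeated standard signal is delivered more than once. The struct and function names are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_count = 0;

static void count_usr1(int signo)
{
    (void)signo;
    usr1_count++;                /* only this handler writes the counter */
}

struct block_report {
    int pending_while_blocked;   /* sigismember() result while blocked */
    int count_while_blocked;     /* handler runs before unblocking */
    int count_after_unblock;     /* handler runs after unblocking */
};

/* Block SIGUSR1, generate it twice, inspect the pending set, then
 * unblock it. Returns 0 on success, -1 if a call fails. */
static int blocking_demo(struct block_report *r)
{
    struct sigaction sa;
    sigset_t set, pending;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = count_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) == -1)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &set, NULL) == -1)
        return -1;

    raise(SIGUSR1);              /* held pending, not delivered */
    raise(SIGUSR1);              /* standard signal: not queued twice */
    sigpending(&pending);
    r->pending_while_blocked = sigismember(&pending, SIGUSR1);
    r->count_while_blocked = usr1_count;

    /* A pending, now-unblocked signal is delivered before this returns. */
    if (sigprocmask(SIG_UNBLOCK, &set, NULL) == -1)
        return -1;
    r->count_after_unblock = usr1_count;
    return 0;
}
```

With a real-time signal such as SIGRTMIN instead, both occurrences would be queued and the handler would run twice.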



Pipes

Pipes
– Producer process writes data to the pipe, after which the consumer process reads data from the pipe in first-in-first-out order
– When a pipe is created, an inode that points to the pipe buffer (page of data) is created
– Access to pipes is controlled by file descriptors
  • Can be passed between related processes (e.g., parent and child)
– Named pipes (FIFOs)
  • Can be accessed via the directory tree
– Limitation: fixed-size buffer
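The producer/consumer pattern above, sketched with the POSIX pipe() and fork() calls: the descriptors returned by pipe() are inherited by the child, which is what lets an anonymous pipe connect related processes. The function name pipe_roundtrip() is illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The child (producer) writes msg into a pipe; the parent (consumer)
 * reads it into out, which must hold at least 1 byte. Returns the number
 * of bytes read, or -1 if the pipe or the child could not be created. */
static ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fd[2];                        /* fd[0]: read end, fd[1]: write end */
    if (outlen == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {                   /* child: producer */
        close(fd[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fd[1], msg, len);
        close(fd[1]);                 /* reader will then see end-of-file */
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    /* Parent: consumer. Its copy of the write end must be closed, or
     * read() would never return 0 (end-of-file). */
    close(fd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < outlen - 1 &&
           (n = read(fd[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)n;           /* bytes arrive in FIFO order */
    out[total] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

If the producer writes more than the pipe's fixed-size buffer holds, write() blocks until the consumer drains it, which is the synchronization a pipe provides for free.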



Message Queues

Message queues
– Allow processes to transmit information that is composed of a message type and a variable-length data area
  • Stored in message queues, remain until a process is ready to receive them
  • Related processes can search for a message queue identifier in a global array of message queue descriptors
– Message queue descriptor contains
  » Queue of pending messages
  » Queue of processes waiting for messages
  » Queue of processes waiting to send messages
  » Data describing the size and contents of the message queue
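The "message type plus variable-length data area" layout above is what the System V calls msgget(), msgsnd(), msgrcv() and msgctl() work with. A minimal sketch: two typed messages are queued and the receiver selects by type, taking the newer type-2 message ahead of the older type-1 one. The struct and function names are illustrative; the 64-byte data area is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* A System V message: a positive type followed by the data area. */
struct demo_msg {
    long mtype;
    char mtext[64];
};

static int send_text(int qid, long type, const char *text)
{
    struct demo_msg m;
    m.mtype = type;
    snprintf(m.mtext, sizeof m.mtext, "%s", text);
    /* The size argument counts only the data area, not mtype. */
    return msgsnd(qid, &m, strlen(m.mtext) + 1, 0);
}

/* Queue a type-1 and a type-2 message, then receive only type 2.
 * Returns the received length (including the NUL) or -1. */
static ssize_t msgq_demo(char *out, size_t outlen)
{
    int qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid == -1)
        return -1;

    ssize_t n = -1;
    if (send_text(qid, 1, "low") == 0 && send_text(qid, 2, "high") == 0) {
        struct demo_msg r;
        n = msgrcv(qid, &r, sizeof r.mtext, 2, 0);   /* msgtyp = 2 */
        if (n > 0)
            snprintf(out, outlen, "%s", r.mtext);
    }
    msgctl(qid, IPC_RMID, NULL);   /* queues persist until explicitly removed */
    return n;
}
```

A receiver that calls msgrcv() on an empty queue (without IPC_NOWAIT) is put on the queue's list of waiting processes until a matching message arrives.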



Shared Memory

Shared memory [protection schemes]
– Advantages
  • Improves performance for processes that frequently access shared data
  • Processes can share as much data as they can address
– Standard interfaces
  • System V shared memory
  • POSIX shared memory
– Does not allow processes to change privileges for a segment of shared memory



System V Shared Memory System Calls
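A sketch of the System V shared memory call sequence (shmget, shmat, shmdt, shmctl with IPC_RMID): a forked child inherits the attachment and writes into the segment, and the parent reads the same bytes without any copy through the kernel. The only synchronization is waitpid(), an assumption that suits this one-shot demo; real producers and consumers would pair the segment with a semaphore. The 4096-byte size and the function name are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create and attach a segment, let a child write into it, copy the
 * result out in the parent, then detach and remove. Returns 0 or -1. */
static int shm_demo(char *out, size_t outlen)
{
    int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    char *seg = shmat(id, NULL, 0);
    if (seg == (char *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    seg[0] = '\0';

    pid_t pid = fork();              /* the child inherits the attachment */
    if (pid == 0) {
        strcpy(seg, "written by child");
        shmdt(seg);
        _exit(0);
    }
    int rc = -1;
    if (pid > 0) {
        waitpid(pid, NULL, 0);       /* the only synchronization used here */
        snprintf(out, outlen, "%s", seg);
        rc = 0;
    }
    shmdt(seg);
    shmctl(id, IPC_RMID, NULL);      /* segment freed after the last detach */
    return rc;
}
```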



Shared Memory

Shared memory implementation
– Treats region of shared memory as a file
– Shared memory page frames are freed when file is deleted
– Tmpfs (temporary file system) stores such files
  • Tmpfs pages are swappable
  • Permissions can be set
  • File system does not require formatting



System V Semaphores

System V semaphores
– Designed for user processes to access via the system call interface

Semaphore arrays
– Protect a group of related resources
– Before a process can access resources protected by a semaphore array, the kernel requires that there be sufficient available resources to satisfy the process’s request
– Otherwise, kernel blocks requesting process until resources become available

Preventing deadlock
– When a process exits, the kernel reverses all the semaphore operations it performed to allocate its resources

Deitel, et al., Operating Systems, Third Edition
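The deadlock-prevention rule above corresponds to the SEM_UNDO flag of semop(): the kernel records each adjustment made with SEM_UNDO and reverses it when the process exits. A minimal sketch with a one-element semaphore array guarding a single resource unit; a child acquires the unit and exits without releasing it, and the parent then sees the value restored to 1. The function names are illustrative; union semun must be defined by the caller on Linux (glibc and musl do not provide it).

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

/* Caller-defined argument type for semctl() on Linux. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add delta to semaphore 0 of the set. A negative delta blocks until
 * the value is large enough; SEM_UNDO makes the kernel reverse the
 * change automatically if the process exits without doing so. */
static int sem_adjust(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

/* Returns the semaphore value the parent sees after the child exits
 * while still holding the resource, or -1 on error. */
static int sem_undo_demo(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union semun arg;
    arg.val = 1;                                   /* one resource unit */
    int result = -1;
    if (semctl(semid, 0, SETVAL, arg) == 0) {
        pid_t pid = fork();
        if (pid == 0)                              /* acquire, never release */
            _exit(sem_adjust(semid, -1) == 0 ? 0 : 1);
        int status;
        if (pid > 0 && waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = semctl(semid, 0, GETVAL);     /* restored by SEM_UNDO */
    }
    semctl(semid, 0, IPC_RMID);
    return result;
}
```

Without SEM_UNDO the parent would read 0, and any later process waiting on the resource would block forever.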


Sockets

Sockets
– Allow pairs of processes to exchange data by establishing direct bidirectional communication channels
– Primarily used for bidirectional communication between processes on different systems, but can also be used for processes on the same system
– Stored internally as files
– File name used as socket’s address, accessed via the VFS



Sockets

Stream sockets
– Implement the traditional client/server model
– Data is transferred as a stream of bytes
– Use TCP to communicate, so they are more appropriate for reliable communication

Datagram sockets
– Faster, but less reliable communication
– Data is transferred using datagram packets

Socketpairs
– Pair of connected, unnamed sockets
– Limited to use by processes that share file descriptors
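A socketpair shows the bidirectional property on a single connection: in this sketch the parent sends a request on one end, the child upper-cases it and replies on the other end. The parent uses shutdown(SHUT_WR) to signal end-of-request while keeping its own read direction open, which avoids relying on message boundaries in a byte stream. The function names, the upper-casing "service" and the 256-byte request limit are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read until end-of-file or until cap bytes are stored. */
static ssize_t read_to_eof(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n == -1)
            return -1;
        if (n == 0)
            break;                      /* peer closed or shut down writing */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Parent sends req (up to 256 bytes); the child replies with it in
 * upper case over the same connection. Returns reply length or -1. */
static ssize_t socketpair_upper(const char *req, char *reply, size_t replylen)
{
    int sv[2];
    if (replylen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                     /* child talks on sv[1] */
        char buf[256];
        close(sv[0]);
        ssize_t n = read_to_eof(sv[1], buf, sizeof buf);
        for (ssize_t i = 0; i < n; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        if (n > 0 && write(sv[1], buf, (size_t)n) != n)
            _exit(1);
        close(sv[1]);
        _exit(0);
    }

    close(sv[1]);                       /* parent talks on sv[0] */
    ssize_t n = -1;
    size_t len = strlen(req);
    if (write(sv[0], req, len) == (ssize_t)len &&
        shutdown(sv[0], SHUT_WR) == 0)  /* EOF for child; we can still read */
        n = read_to_eof(sv[0], reply, replylen - 1);
    reply[n > 0 ? n : 0] = '\0';
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```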



sf01a:cmhuang[/] ipcs
IPC status from <running system> as of Thu Sep 21 14:35:30 CST 2006
T         ID   KEY          MODE          OWNER     GROUP
Message Queues:
Shared Memory:
m          1   0x50000d1d   --rw-r--r--   root      root
m          2   0xabbaca01   --rw-rw-rw-   pc62      TR
m       3103   0            --rw-rw-rw-   cmhuang   DSD
m       1404   0            --rw-rw-rw-   root      root
Semaphores:
s          0   0x1          --ra-ra-ra-   root      root
s    2031617   0            --ra-ra-ra-   cmhuang   DSD
s     917506   0            --ra-ra-ra-   cmhuang   DSD


IPC for WinXP

Data oriented
– Pipes
– Mailslots (message queues)
– Shared memory

Procedure oriented / object oriented
– Remote procedure calls
– Microsoft COM objects
– Clipboard
– GUI drag-and-drop capability



Pipes

Manipulated with file system calls
– Read
– Write
– Open

Pipe server
– Process that creates pipe

Pipe clients
– Processes that connect to pipe

Modes
– Read: pipe server receives data from pipe clients
– Write: pipe server sends data to pipe clients
– Duplex: pipe server sends and receives data



Pipes

Anonymous pipes
– Unidirectional
– Between local processes
– Synchronous
– Pipe handles, usually passed through inheritance

Named pipes
– Unidirectional or bidirectional
– Between local or remote processes
– Synchronous or asynchronous
– Opened by name
– Byte stream vs. message stream
– Default mode vs. write-through mode



Mailslots

Mailslot server: creates mailslot
Mailslot clients: send messages to mailslot
Communication
– Unidirectional
– No acknowledgement of receipt
– Local or remote communication
– Implemented as files
– Two modes
  • Datagram: for small messages
  • Server Message Block (SMB): for large messages



Shared Memory

File mapping
– Processes map their virtual memory to same page frames in physical memory
– Multiple processes access same file
– No synchronization guaranteed

File mapping object
– Maps file to main memory

File view
– Maps a process’s virtual memory to main memory mapped by file mapping object

Deitel, et al., Operating Systems, Third Edition

Nokia DSP Gateway


Nokia DSP Gateway Overview

Supports TI OMAP1510, 1610, 5910, 5912, 2410, and 2412.

GPP side
– Linux kernel 2.6.6
– Linux device driver
– Access DSP through normal system calls such as read() and write()

DSP side
– TI DSP/BIOS
– DSP kernel library (tokliBIOS) and API

http://dspgateway.sourceforge.net/pub/index.php


Nokia DSP Gateway Overview

Current version: 3.3.1 (2006-09-13)
Open source software
Current license state:

Release   License
1.0       GPL
2.X       GPL
3.X       ARM pack: GPL; DSP pack: BSD


TI OMAP 1610


Summary of changes from v2.6.5 to v2.6.6 ============================================

<[email protected].(none)> [ARM PATCH] 1777/1: Add TI OMAP support to ARM core files

Patch from Tony Lindgren

This patch updates the ARM Linux core files to add support for Texas Instruments OMAP-1510, 1610, and 730 processors.

OMAP is an embedded ARM processor with integrated DSP.

OMAP-1610 has hardware support for USB OTG, which might be of interest to Linux developers. OMAP-1610 could be easily be used as development platform to add USB OTG support to Linux.

This patch is an updated version of an earlier patch 1767/1 with the dummy Kconfig added for OMAP as suggested by Russell King here:

http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=1767/1

This patch is brought to you by various linux-omap developers.

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.6


TI DSP/BIOS
– Scalable real-time kernel
– Real-time scheduling and synchronization
– Host-to-target communication
– Real-time instrumentation
– Preemptive multi-threading
– Hardware abstraction
– Real-time analysis and configuration tools
– Application programs use DSP/BIOS by making calls to the API
– All DSP/BIOS modules provide C-callable interfaces


DSP Gateway System Architecture


Mailbox in OMAP1

Each set of mailbox registers consists of two 16-bit registers and a 1-bit flag register. The interrupting processor can use one 16-bit register to pass a data word to the interrupted processor and the other 16-bit register to pass a command word.
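The mailbox protocol above can be modeled in software to show the handshake: the sender fills the data and command words and raises the flag; the receiver's interrupt handler reads both words and clears the flag, after which the next message may be posted. This is a hypothetical model only: the struct layout and function names are not the OMAP register map or any TI/Nokia API, and the flag stands in for the interrupt line. A real driver would access the memory-mapped registers through volatile pointers and use one mailbox set per direction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one OMAP1-style mailbox set: two 16-bit
 * registers and a 1-bit flag. Not the real register layout. */
struct mailbox {
    uint16_t cmd;    /* command word */
    uint16_t data;   /* data word */
    uint8_t  flag;   /* 1 = message pending, i.e. interrupt asserted */
};

/* Interrupting processor: refuse if the previous message is unread,
 * otherwise fill both words and raise the flag last. */
static bool mbox_send(struct mailbox *mb, uint16_t cmd, uint16_t data)
{
    if (mb->flag)
        return false;
    mb->data = data;
    mb->cmd = cmd;
    mb->flag = 1;
    return true;
}

/* Interrupted processor (in its interrupt handler): read both words
 * and clear the flag so the sender may post the next message. */
static bool mbox_receive(struct mailbox *mb, uint16_t *cmd, uint16_t *data)
{
    if (!mb->flag)
        return false;
    *cmd = mb->cmd;
    *data = mb->data;
    mb->flag = 0;
    return true;
}
```

Which register access actually asserts and clears the interrupt on real OMAP1 silicon is a hardware detail to confirm in TI's technical reference manual.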


Mailbox in OMAP2

OMAP2 provides 6 sets of mailbox registers, and each message register can carry 32 bits of data. Two mailbox queues are reserved: MAILBOX_0 for the ARM-to-DSP direction and MAILBOX_1 for the DSP-to-ARM direction.


Mailbox Command and Data Registers

[Figure: command register bit definitions]
[Figure: data register bit definitions]


Mailbox Command Definition


Mailbox Command Sequence

Configuration sequence
– System configuration
– Task configuration
– Task add/delete

Data transfer sequence
– ARM to DSP transfer
– DSP to ARM transfer
– Task control
– Read/write DSP register
– Read/write DSP system parameters


System Configuration Sequence


DSPCFG Command


ARM to DSP Passive Word Receiving


ARM to DSP Active Word Receiving


ARM to DSP Passive Block Receiving


IPC Buffer

It is unrealistic to transfer a large amount of data between two processors with only mailbox registers. Therefore, IPBUF (Inter-Processor Buffer) is introduced for large block data transfers.

There are three types of IPBUFs:
– Global IPBUF
– Private IPBUF
– System IPBUF


Global IPBUF

The Global IPBUFs are defined for block data transfer between ARM and DSP. The Global IPBUF lines are identified by a BID (Buffer ID), and all tasks can use them in common. The maximum line size is 64k words (128k bytes).
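The point of Global IPBUFs is that the bulk data stays in memory both processors can reach, and only the small BID has to travel through a mailbox register. A hypothetical sketch of that idea: the pool size, line size, ownership field and function names are all illustrative assumptions, not DSP Gateway's actual data structures; each line holds 16-bit words, consistent with 64k words being 128k bytes.

```c
#include <stdint.h>
#include <string.h>

#define IPBUF_LINES      4     /* hypothetical pool size */
#define IPBUF_LINE_WORDS 256   /* hypothetical; the stated maximum is 64k words */

enum ipbuf_owner { IPBUF_FREE, IPBUF_ARM, IPBUF_DSP };

struct ipbuf_line {
    enum ipbuf_owner owner;
    uint16_t count;                       /* valid 16-bit words */
    uint16_t words[IPBUF_LINE_WORDS];
};

/* Stands in for memory visible to both ARM and DSP; index = BID. */
static struct ipbuf_line global_ipbuf[IPBUF_LINES];

/* ARM side: claim a free line, fill it, hand it to the DSP and return
 * its BID, the only value that then needs to go through a mailbox.
 * Returns -1 if the block is too large or no line is free. */
static int arm_send_block(const uint16_t *src, uint16_t n)
{
    if (n > IPBUF_LINE_WORDS)
        return -1;
    for (int bid = 0; bid < IPBUF_LINES; bid++) {
        struct ipbuf_line *l = &global_ipbuf[bid];
        if (l->owner != IPBUF_FREE)
            continue;
        l->owner = IPBUF_ARM;             /* claimed while being filled */
        memcpy(l->words, src, n * sizeof *src);
        l->count = n;
        l->owner = IPBUF_DSP;             /* ownership passes to the receiver */
        return bid;
    }
    return -1;
}

/* DSP side: consume the line named by a BID (here: sum its words) and
 * release it. Returns -1 for an invalid BID or a line it does not own. */
static long dsp_consume_block(int bid)
{
    if (bid < 0 || bid >= IPBUF_LINES || global_ipbuf[bid].owner != IPBUF_DSP)
        return -1;
    long sum = 0;
    for (uint16_t i = 0; i < global_ipbuf[bid].count; i++)
        sum += global_ipbuf[bid].words[i];
    global_ipbuf[bid].owner = IPBUF_FREE;
    return sum;
}
```

In a two-core system the ownership hand-off would also need to be made visible to the other core (memory barriers or uncached memory) before the BID is posted.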


Global IPBUF


DSP Gateway Linux Device Interfaces


DSP Gateway Linux APIs


Passive Receiving Task


Active Receiving Task

TI DSP/BIOS Link


TI DSP/BIOS Link

For TI OMAP5910/5912, DaVinci, and DM642 devices. DSP/BIOS Link is a no-charge, royalty-free product and is provided in C source code form.
– Current version: 1.30.06 (Nov. 22, 2005)
– Portable across different operating systems: OS (GPP) + DSP/BIOS (DSP)

http://focus.ti.com/dsp/docs/dspsupportatn.tsp?sectionId=3&tabId=477&familyId=44&toolTypeId=5


DSP/BIOS Link Supported Platforms
– DaVinci running MontaVista Linux Pro 4.0 or PrKernel v4.1 on ARM
– OMAP5912 running MontaVista Linux Pro 3.1 on ARM
– DA300 running PrKernel v4.1 on ARM
– DM642 connected to a PC running Red Hat Linux 9.0 or Red Hat Enterprise Linux 4.0


Software Architecture of DSP/BIOS Link


On the GPP Side

The OS ADAPTATION LAYER encapsulates the generic OS services that are required by the other components of DSP/BIOS LINK. This component exports a generic API that insulates the other components from the specifics of an OS. All other components use this API instead of direct OS calls. This makes DSP/BIOS LINK portable across different operating systems.

The LINK DRIVER encapsulates the low-level control operations on the physical link between the GPP and DSP. This module is responsible for controlling the execution of the DSP and for data transfer using a defined protocol across the GPP-DSP boundary.


On the GPP Side

The PROCESSOR MANAGER maintains book-keeping information for all components. It also allows different boot-loaders to be plugged into the system. It exposes the control operations provided by the LINK DRIVER to the user through the API layer.

The DSP/BIOS LINK API is the interface for all clients on the GPP side. This is a very thin component and usually does no more processing than parameter validation. The API layer can be considered as the ‘skin’ on the ‘muscle’ mass contained in the PROCESSOR MANAGER and LINK DRIVER.


On the DSP Side

The LINK DRIVER is one of the drivers in DSP/BIOS. This driver specializes in communicating with the GPP over the physical link.

There is no specific DSP/BIOS LINK API on the DSP. The communication (data/message transfer) is done using the DSP/BIOS modules SIO/GIO/MSGQ.


DSP/BIOS Link Key Components

PROC
– This component represents the DSP processor in the application space.
– This component provides services to:
  • Initialize the DSP & make it available for access from the GPP.
  • Load code on the DSP.
  • Start execution from the run address specified in the executable.
  • Read from or write to DSP memory.
  • Stop execution.
  • Additional platform-specific control actions.
– In the current version, only one processor is supported. However, the APIs are designed to support multiple DSPs and hence they accept a processorID argument to support this future enhancement.


DSP/BIOS Link Key Components

CHNL
– This component represents a logical data transfer channel in the application space.
– CHNL is responsible for the data transfer across the GPP and DSP.
– CHNL is an acronym for ‘channel’.
– A channel (when referred to in the context of DSP/BIOS LINK) is:
  • A means of transferring data across GPP and DSP.
  • A logical entity mapped over a physical connectivity between the GPP and DSP.
  • Uniquely identified by a number within the range of channels for a specific physical link towards a DSP.
  • Unidirectional. The direction of a channel is decided at run time based on the attributes passed to the corresponding API.


DSP/BIOS Link Key Components

MSGQ
– This component represents queue-based messaging.
– This component is responsible for exchanging short messages of variable length between the GPP and DSP clients. It is based on the MSGQ module in DSP/BIOS.
– The messages are sent and received through message queues.
– A reader gets the message from the queue and a writer puts the message on a queue. A message queue can have only one reader and many writers. A task may read from and write to multiple message queues.


DSP/BIOS Link Key Components

POOL
– This component provides APIs to open and close memory pools, which are used by the CHNL and MSGQ components for allocating the buffers used in data transfer and messaging, respectively.
– This component is responsible for providing a uniform view of different memory pool implementations, which may be specific to the hardware architecture or OS on which DSP/BIOS LINK is ported. This component is based on the POOL interface in DSP/BIOS.


Initialization Phase API
– PROC: PROC_Setup(), PROC_Attach(), PROC_Load()
– CHNL: CHNL_Create(), CHNL_AllocateBuffer()
– MSGQ: MSGQ_TransportOpen(), MSGQ_Open(), MSGQ_SetErrorHandler(), MSGQ_Locate()
– POOL: POOL_Open()


Execution Phase API
– PROC: PROC_Start(), PROC_Read(), PROC_Write(), PROC_Stop()
– CHNL: CHNL_Issue(), CHNL_Reclaim()
– MSGQ: MSGQ_Alloc(), MSGQ_Put(), MSGQ_Get(), MSGQ_GetSrcQueue(), MSGQ_Free()


Finalization Phase API
– PROC: PROC_Detach(), PROC_Destroy()
– CHNL: CHNL_FreeBuffer(), CHNL_Delete()
– MSGQ: MSGQ_Release(), MSGQ_TransportClose(), MSGQ_Close()
– POOL: POOL_Close()

IPC Hardware Architecture


Tightly Coupled vs. Loosely Coupled Systems

Tightly coupled systems
– Processors share most resources including memory
– Communicate over shared buses using shared physical memory

Loosely coupled systems
– Processors do not share most resources
– Most communication through explicit messages or shared virtual memory (although not shared physical memory)

Comparison
– Loosely coupled systems: more flexible, fault tolerant, scalable
– Tightly coupled systems: more efficient, less burden to operating system programmers

Deitel, et al., Operating Systems, Third Edition


Tightly Coupled Systems



Loosely Coupled Systems



Processor Interconnection Schemes

Interconnection scheme
– Describes how the system’s components, such as processors and memory modules, are connected
– Consists of nodes (components or switches) and links (connections)
– Parameters used to evaluate interconnection schemes
  • Node degree
  • Bisection width
  • Network diameter
  • Cost of the interconnection scheme
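The evaluation parameters above have standard closed forms for two of the topologies pictured in this section, the d-dimensional hypercube and the k x k 4-connected 2-D mesh. The sketch below computes them; using the total number of links as the cost measure is an assumption (real cost also depends on wiring and switch hardware), and the mesh formulas assume no wraparound links and an even k.

```c
/* Evaluation parameters for an interconnection scheme. */
struct topo_metrics {
    unsigned nodes;
    unsigned node_degree;      /* links per node (maximum, for the mesh) */
    unsigned diameter;         /* longest shortest path, in hops */
    unsigned bisection_width;  /* links cut when splitting into two halves */
    unsigned links;            /* used here as a simple proxy for cost */
};

/* d-dimensional hypercube: 2^d nodes; each node links to the d nodes
 * whose binary address differs from its own in exactly one bit. */
static struct topo_metrics hypercube(unsigned d)
{
    struct topo_metrics m;
    m.nodes = 1u << d;
    m.node_degree = d;
    m.diameter = d;                             /* fix one address bit per hop */
    m.bisection_width = d ? 1u << (d - 1) : 0;  /* split on one address bit */
    m.links = d * (m.nodes / 2);
    return m;
}

/* k x k 4-connected 2-D mesh without wraparound; assumes k even, k >= 2. */
static struct topo_metrics mesh2d(unsigned k)
{
    struct topo_metrics m;
    m.nodes = k * k;
    m.node_degree = k >= 3 ? 4 : 2;     /* interior nodes have 4 neighbours */
    m.diameter = 2 * (k - 1);           /* corner to opposite corner */
    m.bisection_width = k;              /* one link per row is cut */
    m.links = 2 * k * (k - 1);          /* k - 1 links per row and per column */
    return m;
}
```

For the same 16 nodes, a 4-dimensional hypercube gives diameter 4 and bisection width 8 with 32 links, while a 4 x 4 mesh gives diameter 6 and bisection width 4 with 24 links: the hypercube buys shorter paths and more cross-section bandwidth with more links per node.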



Shared bus multiprocessor organization.

Processor Interconnection Schemes



Crossbar-switch matrix multiprocessor organization.




4-connected 2-D mesh network.




3- and 4-dimensional hypercubes.




Multistage baseline network.




A Simple IPC Architecture
1. ARM writes a command in shared memory
2. ARM interrupts DSP
3. DSP responds to the interrupt and reads the command in shared memory
4. DSP executes a task based on the command
5. DSP interrupts ARM upon completion of the task

TMS320DM644x DMSoC ARM Subsystem Reference Guide (SPRUE14)
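The five steps of the simple architecture above can be modeled in one process as a sketch. Assumptions: the shared_cmd layout, the command codes CMD_SQUARE and CMD_DOUBLE, and all function names are hypothetical, and raising an interrupt is modeled as a direct call to the other side's handler. On real hardware the command block would sit in memory that both cores can address (accessed through volatile pointers, with cache maintenance where needed), and the ARM would wait for the completion interrupt rather than observe it immediately.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command block in memory that both cores can address. */
struct shared_cmd {
    uint32_t cmd;       /* written by the ARM */
    uint32_t arg;
    uint32_t result;    /* written by the DSP */
};

enum { CMD_SQUARE = 1, CMD_DOUBLE = 2 };

static struct shared_cmd shared;      /* stands in for shared memory */
static bool arm_task_done;            /* set by the ARM's interrupt handler */

/* ARM interrupt handler: completion notice from the DSP. */
static void arm_isr(void) { arm_task_done = true; }

/* DSP interrupt handler: steps 3 to 5. */
static void dsp_isr(void)
{
    switch (shared.cmd) {              /* 3. read the command */
    case CMD_SQUARE: shared.result = shared.arg * shared.arg; break;  /* 4. */
    case CMD_DOUBLE: shared.result = shared.arg * 2u;         break;
    default:         shared.result = 0;                        break;
    }
    arm_isr();                         /* 5. interrupt the ARM */
}

/* In this single-threaded model, raising an interrupt is a direct call. */
static void interrupt_dsp(void) { dsp_isr(); }

/* ARM side: steps 1 and 2, then collect the result after completion. */
static bool arm_run(uint32_t cmd, uint32_t arg, uint32_t *result)
{
    shared.cmd = cmd;                  /* 1. write the command */
    shared.arg = arg;
    arm_task_done = false;
    interrupt_dsp();                   /* 2. interrupt the DSP */
    if (!arm_task_done)
        return false;
    *result = shared.result;
    return true;
}
```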


TI OMAP5910


OMAP5910 IPC Architecture

Mailbox registers
– Each direction: 32-bit x 2
– Interrupt occurrence

MPU interface (MPUI)
– MPU accesses DSP memory space directly

Shared memory
– Arrangement with the Traffic Controller
– 3 types of memory
– Best suited to sharing large amounts of data


Traffic Controller (TC)

The IMIF allows access to the 192K bytes of on-chip SRAM. The EMIFS interface provides 16-bit-wide access to asynchronous or synchronous memories. The EMIFF interface provides 16-bit-wide access to standard SDRAM memories.

The TC provides the functions of
– arbitrating contending accesses to the same memory interface from different initiators (MPU, DSP, System DMA, Local Bus),
– synchronizing accesses, since the initiators and the memory interfaces run at different clock rates,
– and buffering data, allowing burst access for more efficient multiplexing of transfers from multiple initiators to the memory interfaces.

The TC’s architecture allows simultaneous transfers between initiators and different memory interfaces without penalty. For instance, if the MPU is accessing the EMIFF while the DSP is accessing the IMIF, the transfers may occur simultaneously since there is no contention for resources.


ARM IPCM Module

The IPCM provides up to 32 mailboxes with control logic and interrupt generation to support inter-processor communication.

An AHB interface enables access from source and destination cores.

The IPCM:
– sends interrupts to other cores
– passes small amounts of data to other cores.

A source core can have multiple mailboxes and send messages in parallel (multitasking).

PrimeCell Inter-Processor Communications Module Technical Reference Manual


IPCM Components

1-32 programmable mailboxes, each comprising:
– a single 1-32-bit Mailbox Source Register
– a single 1-32-bit Mailbox Destination Register
– a single 2-bit Mailbox Mode Register
– a single 1-32-bit Mailbox Mask Register
– a single 2-bit Mailbox Send Register
– 0-7 32-bit data registers to store the message.

1-32 sets of read-only interrupt status registers, one for each interrupt, each comprising:
– 1-32-bit Raw Interrupt Status Register (each bit corresponds to each mailbox)
– 1-32-bit Masked Interrupt Status Register (each bit corresponds to each mailbox).

A 32-bit Configuration Status Register


IPCM Functional Block

PrimeCell Inter-Processor Communications Module Technical Reference Manual


IPCM Example


IPCM Example

Core0 has a message to send to Core1. Core0 claims the mailbox by setting bit 0 in the Mailbox Source Register. Core0 then sets bit 1 in the Mailbox Destination Register, enables the interrupts and programs the message into the Mailbox Data Registers. Finally, Core0 sends the message by writing 01 to the Mailbox Send Register. This asserts the interrupt to Core1.

When Core1 is interrupted, it reads the Masked Interrupt Status Register for IPCMINT[1] to determine which mailbox contains the message. Core1 reads the message in that mailbox, then clears the interrupt and asserts the acknowledge interrupt by writing 10 to the Mailbox Send Register.

Core0 is interrupted with the acknowledge message, completing the operation. Core0 then decides whether to retain the mailbox to send another message or release the mailbox, freeing it up for other cores in the system to use it.
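The example sequence above can be traced with a small software model. It follows the register names and values from the text (Source, Destination, Mask, Send with 01 for send and 10 for acknowledge, and the data registers), but the struct layout, the fixed core numbering, the number of mailboxes and data registers, and the rule used to derive the interrupt status are assumptions of this model, not the PrimeCell register map; the Mode Register is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_MAILBOXES 4    /* the IPCM supports 1 to 32 */
#define NUM_DATA_REGS 7    /* the IPCM supports 0 to 7 */

#define SEND_MSG 0x1u      /* Send Register = 01: message to destination */
#define SEND_ACK 0x2u      /* Send Register = 10: acknowledge to source */

/* Simplified model of one mailbox. */
struct ipcm_mailbox {
    uint32_t source;                 /* one bit per core; 0 = mailbox free */
    uint32_t dest;                   /* one bit per destination core */
    uint32_t mask;                   /* one bit per core: interrupt enabled */
    uint32_t send;
    uint32_t data[NUM_DATA_REGS];
};

static struct ipcm_mailbox mbox[NUM_MAILBOXES];

/* Masked interrupt status for a core: bit m is set when mailbox m is
 * signalling that core and the core's mask bit is set. */
static uint32_t ipcm_masked_status(unsigned core)
{
    uint32_t status = 0, bit = 1u << core;
    for (unsigned m = 0; m < NUM_MAILBOXES; m++) {
        const struct ipcm_mailbox *mb = &mbox[m];
        bool raw = (mb->send == SEND_MSG && (mb->dest & bit)) ||
                   (mb->send == SEND_ACK && (mb->source & bit));
        if (raw && (mb->mask & bit))
            status |= 1u << m;
    }
    return status;
}

/* Claiming succeeds only if no core owns the mailbox. */
static bool ipcm_claim(unsigned m, unsigned core)
{
    if (mbox[m].source != 0)
        return false;
    mbox[m].source = 1u << core;
    return true;
}

/* Core0 sends n words to Core1 through mailbox m. */
static bool core0_send(unsigned m, const uint32_t *msg, unsigned n)
{
    if (n > NUM_DATA_REGS || !ipcm_claim(m, 0))
        return false;
    mbox[m].dest = 1u << 1;                  /* destination: Core1 */
    mbox[m].mask = (1u << 0) | (1u << 1);    /* enable both interrupts */
    memcpy(mbox[m].data, msg, n * sizeof *msg);
    mbox[m].send = SEND_MSG;                 /* asserts Core1's interrupt */
    return true;
}

/* Core1's interrupt handler: find the mailbox, copy the message, then
 * acknowledge. Returns the mailbox index, or -1 if none is pending. */
static int core1_isr(uint32_t *out, unsigned n)
{
    uint32_t status = ipcm_masked_status(1);
    for (unsigned m = 0; m < NUM_MAILBOXES; m++) {
        if (!(status & (1u << m)))
            continue;
        memcpy(out, mbox[m].data, (n < NUM_DATA_REGS ? n : NUM_DATA_REGS) * sizeof *out);
        mbox[m].send = SEND_ACK;             /* clears Core1's, raises Core0's */
        return (int)m;
    }
    return -1;
}

/* Core0, after the acknowledge: release the mailbox for other cores. */
static void ipcm_release(unsigned m)
{
    memset(&mbox[m], 0, sizeof mbox[m]);
}
```

Instead of calling ipcm_release(), Core0 could keep the mailbox and immediately post another message, which corresponds to the "retain the mailbox" option in the example.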

Conclusions


Conclusions
– IPC schemes for supporting many cores
– Performance and power consumption analysis for different IPC schemes
– IPC API schemes

Thanks for Your Attention!
