
Page 1: ECE 526 – Network Processing Systems Design

ECE 526 – Network Processing Systems Design

Programming Model

Chapter 21: D. E. Comer

Page 2: ECE 526 – Network Processing Systems Design

Overview

• Recall
  ─ Network processors are complicated, heterogeneous architectures
  ─ They are hard to program
    • Programmers need to understand fine details of the architecture
    • The current approach is assembly or a subset of the C language

• Programming model
  ─ Fills the gap between application and architecture
  ─ Natural interface for the programmer (e.g., a domain-specific language)
  ─ Abstraction of the underlying hardware
    • Enough architectural detail to write efficient code
    • Not too complicated for the programmer

• Two models
  ─ Hardware-specific model: IXP Programming Model
  ─ General models: NP-Click and ADAG

Page 3: ECE 526 – Network Processing Systems Design

IXP Programming Model

• What kind of software abstractions are used on the IXP?
• Active Computing Element (ACE):
  ─ Fundamental software building block
  ─ Used to construct a packet processing system
  ─ Runs on the XScale, the microengines (uEs), or the host
  ─ Handles control-plane processing and fast- or slow-path packet processing
  ─ Coordinates and synchronizes with other ACEs
  ─ Can have multiple outputs
  ─ Can serve as part of a pipeline
• Protocol processing is implemented by combining multiple ACEs

Page 4: ECE 526 – Network Processing Systems Design

ACE Terminology

• Library ACE:
  ─ ACE provided by Intel for basic functions
• Conventional ACE or standard ACE:
  ─ ACE built by the customer
  ─ Might make use of Intel's action service libraries
• MicroACE:
  ─ ACE with two components:
    • Core component (runs on the XScale)
    • Microblock component (runs on a microengine)
• Terminology for microblocks:
  ─ Source microblock: initial point that receives packets
  ─ Transform microblock: intermediate point that accepts and forwards packets
  ─ Sink microblock: last point that sends packets

Page 5: ECE 526 – Network Processing Systems Design

ACE Parts

• An ACE contains four conceptual parts (sketched in code below):
• Initialization:
  ─ Initialize data structures and variables before code execution
• Classification:
  ─ The ACE classifies each packet on arrival
  ─ The classification can be chosen explicitly or a default is used
• Actions:
  ─ Based on the classification, an action is invoked
• Message and event management:
  ─ An ACE can generate or handle messages
  ─ Used to communicate with another ACE or with hardware
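As a rough illustration of how these four parts fit together, here is a minimal C sketch; the type and function names are invented for this example and are not the Intel IXA/ACE API:

```c
/* Hypothetical sketch of the four conceptual ACE parts.
 * All names are illustrative, not Intel's actual ACE API. */
#include <stddef.h>

struct packet;                                     /* opaque packet handle     */
struct message;                                    /* opaque inter-ACE message */

#define MAX_CLASSES 8

typedef int  (*classify_fn)(struct packet *);      /* returns a class id       */
typedef void (*action_fn)(struct packet *);        /* action for one class     */
typedef void (*msg_handler_fn)(struct message *);  /* message/event handling   */

struct ace {
    const char     *name;
    classify_fn     classify;                 /* part 2: classification        */
    action_fn       actions[MAX_CLASSES];     /* part 3: actions               */
    msg_handler_fn  on_message;               /* part 4: message management    */
};

/* part 1: initialization of data structures before any packet is handled */
void ace_init(struct ace *a)
{
    a->classify   = NULL;
    a->on_message = NULL;
    for (int i = 0; i < MAX_CLASSES; i++)
        a->actions[i] = NULL;
}

/* per-packet path: classify on arrival, then invoke the matching action */
void ace_handle_packet(struct ace *a, struct packet *p)
{
    int cls = a->classify ? a->classify(p) : 0;    /* 0 acts as the default class */
    if (cls >= 0 && cls < MAX_CLASSES && a->actions[cls])
        a->actions[cls](p);
}
```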

Page 6: ECE 526 – Network Processing Systems Design

ACE Binding

• ACEs can be bound together to implement protocol processing
• Binding happens when an ACE is loaded into the NP
• Bindings can be changed dynamically
• Unbound targets perform a silent discard (see the sketch below)
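A minimal sketch of binding, continuing the hypothetical types above (again, not the real IXA calls): output targets live in a small table that is filled when the ACE is loaded, can be rewritten at run time, and an output with no bound target silently discards the packet.

```c
/* Hypothetical sketch of ACE output binding; names are illustrative only. */
#include <stddef.h>

struct packet;
struct ace;                                    /* as sketched above */
void ace_handle_packet(struct ace *a, struct packet *p);

#define MAX_OUTPUTS 4

struct ace_binding {
    struct ace *target[MAX_OUTPUTS];           /* NULL means "unbound" */
};

/* bind (or rebind) an output; called when the ACE is loaded into the NP,
 * and may be called again later to change the binding dynamically */
void ace_bind(struct ace_binding *b, int output, struct ace *next)
{
    if (output >= 0 && output < MAX_OUTPUTS)
        b->target[output] = next;
}

/* forward a packet along one output; an unbound target silently discards */
void ace_emit(struct ace_binding *b, int output, struct packet *p)
{
    if (output < 0 || output >= MAX_OUTPUTS || b->target[output] == NULL)
        return;                                /* silent discard */
    ace_handle_packet(b->target[output], p);
}
```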

Page 7: ECE 526 – Network Processing Systems Design

ACE Division

Page 8: ECE 526 – Network Processing Systems Design

Microengine Assignment

• Packet processing involves several microblocks
• How should microblocks be allocated to microengines?
  ─ One microblock per microengine
  ─ Multiple microblocks per microengine (in a pipeline)
  ─ Multiple pipelines on multiple microengines
• What are the pros and cons?
  ─ Passing packets between microengines incurs overhead
  ─ Pipelining causes inefficiencies if the blocks are not equal in size
  ─ Multiple blocks per microengine cause contention and require more instruction storage
• Intel terminology: "microblock group"
  ─ The set of microblocks running on one microengine (see the sketch below)
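Purely as a way to picture these alternatives (the types are invented, not Intel SDK definitions), a microblock group can be viewed as the ordered list of microblocks that one microengine's threads execute; loading the same group onto several microengines gives the replicated variant.

```c
/* Hypothetical view of a microblock group: the microblocks assigned to one
 * microengine. Types and names are illustrative, not Intel SDK definitions. */
struct packet;
typedef int (*microblock_fn)(struct packet *);   /* returns a dispatch code */

#define MAX_BLOCKS_PER_GROUP 4

struct microblock_group {
    microblock_fn blocks[MAX_BLOCKS_PER_GROUP];  /* run in order on this engine */
    int           count;
};

/* Example assignments corresponding to the options above:
 *   one microblock per microengine:   count = 1  (e.g., only rx)
 *   pipeline on one microengine:      count = 3  (e.g., rx, ipv4, tx)
 *   replicated pipelines:             the same group loaded on several engines
 */
```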

Page 9: ECE 526 – Network Processing Systems Design

Microblock Groups

• Microblock groups can be replicated to increase parallelism

Page 10: ECE 526 – Network Processing Systems Design

Microblock Group Replication

• Performance-critical groups can be replicated

Page 11: ECE 526 – Network Processing Systems Design

Control of Packet Flow

• Different packets require different processing blocks
  ─ IP requires different microblocks than ARP
  ─ Special packets get handed off to the core
• A "dispatch loop" controls packet flow among microblocks
  ─ Each thread runs its own dispatch loop
  ─ An infinite loop that grabs packets and hands them to microblocks
  ─ The return value from a microblock determines the next step
• Invoking a microblock is similar to a function call

Page 12: ECE 526 – Network Processing Systems Design

Dispatch Loop

• Example:
  ─ Three microblocks: ingress, IP, egress (a sketch follows below)
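A minimal C-style sketch of such a dispatch loop for the ingress/IP/egress example; the function names and return codes are invented for illustration, and a real microengine-C dispatch loop differs in detail:

```c
/* Hypothetical dispatch loop for three microblocks: ingress (source),
 * IP (transform), and egress (sink). Names and return codes are illustrative. */

struct packet;

enum next_step { TO_EGRESS, TO_CORE, DROP };

struct packet *ingress_get(void);                 /* source: receive a packet    */
enum next_step ip_process(struct packet *p);      /* transform: IP processing    */
void           egress_send(struct packet *p);     /* sink: transmit the packet   */
void           send_to_core(struct packet *p);    /* hand exceptions to the core */
void           drop_packet(struct packet *p);

void dispatch_loop(void)
{
    for (;;) {                                    /* each thread loops forever   */
        struct packet *p = ingress_get();         /* grab the next packet        */
        if (p == NULL)
            continue;

        switch (ip_process(p)) {                  /* return value picks next step */
        case TO_EGRESS: egress_send(p);  break;   /* normal fast-path forwarding  */
        case TO_CORE:   send_to_core(p); break;   /* exception packets to XScale  */
        case DROP:      drop_packet(p);  break;
        }
    }
}
```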

Page 13: ECE 526 – Network Processing Systems Design

Click Model of IPv4

NP-Click: A Programming Model for the Intel IXP1200, by Niraj Shah et al., UC Berkeley

Page 14: ECE 526 – Network Processing Systems Design

My Approach: ADAG

• Architecture-independent workload representation
• ADAG: Annotated Directed Acyclic Graph (data structures sketched below)
  ─ Node: a processing task
    • Annotated with a 3-tuple: the number of instructions, memory reads, and memory writes
  ─ Edge: a dependency between tasks
    • Edge weight: the amount of data communicated between the nodes
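A small C sketch of this representation (the field names are mine; the actual data structures in the ADAG work may differ): each node carries the 3-tuple of annotations and each edge carries the amount of data communicated.

```c
/* Hypothetical ADAG data structures; field names are illustrative. */

struct adag_node {
    int      id;
    unsigned instructions;     /* annotation 1: number of instructions  */
    unsigned mem_reads;        /* annotation 2: number of memory reads  */
    unsigned mem_writes;       /* annotation 3: number of memory writes */
};

struct adag_edge {
    int      from, to;         /* dependency: 'to' depends on 'from'    */
    unsigned bytes;            /* edge weight: data communicated        */
};

struct adag {
    struct adag_node *nodes;
    struct adag_edge *edges;
    unsigned          num_nodes, num_edges;
};
```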

Page 15: ECE 526 – Network Processing Systems Design

Profiling: Trace Generation

• PacketBench [Ramaswamy 2003]
• Data dependencies between registers and memories
• Control dependencies for conditional branches (a possible trace record is sketched below)
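As one possible, purely hypothetical shape for a trace record that captures these dependencies (this is not PacketBench's actual trace format), each executed instruction could be logged with the registers and memory locations it reads and writes, plus a flag for conditional branches:

```c
/* Hypothetical per-instruction trace record; only an illustration of the
 * recorded data and control dependencies, not PacketBench's real format. */
#include <stdint.h>

#define MAX_OPERANDS 4

struct trace_entry {
    uint32_t pc;                           /* instruction address             */
    uint8_t  regs_read[MAX_OPERANDS];      /* registers read (data deps)      */
    uint8_t  regs_written[MAX_OPERANDS];   /* registers written (data deps)   */
    uint64_t mem_read_addr;                /* memory address read, if any     */
    uint64_t mem_write_addr;               /* memory address written, if any  */
    int      is_conditional_branch;        /* control dependency marker       */
};
```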

Page 16: ECE 526 – Network Processing Systems Design

Clustering Algorithm

• Ratio Cut [Wei 1991]
  ─ Identifies natural clusters without a priori knowledge of the final number of clusters
  ─ Clusters nodes together such that the cut ratio r_ij is minimized
  ─ Top-down approach
  ─ NP-complete
• MLRC (Maximum Local Ratio Cut)
  ─ Bottom-up
  ─ Merges the nodes that should be least separated and recursively applies the process
  ─ Computational complexity O(n³) (a rough sketch follows below)
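Below is a rough C sketch of the bottom-up idea. The merge criterion used here (inter-cluster edge weight divided by the product of cluster sizes, merging the highest-ratio, i.e. least separated, pair) and the stopping rule are my assumptions for illustration; the exact definitions are in [Wei 1991] and the ADAG work, and this naive version is not tuned to the stated O(n³) complexity.

```c
/* Rough, hypothetical sketch of MLRC-style bottom-up clustering: repeatedly
 * merge the pair of clusters that is least separated. The ratio and the
 * stopping rule are assumptions; this naive version is also slower than
 * the O(n^3) stated for MLRC. */
#define N 8                       /* number of ADAG nodes (example size) */

static unsigned w[N][N];          /* symmetric edge-weight matrix        */
static int      cluster_of[N];    /* cluster id for each node            */

/* inter-cluster weight between clusters a and b, normalized by their sizes */
static double pair_ratio(int a, int b, const int size[])
{
    unsigned cut = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (cluster_of[i] == a && cluster_of[j] == b)
                cut += w[i][j];
    return (double)cut / ((double)size[a] * (double)size[b]);
}

void mlrc(int target_clusters)
{
    int size[N];
    for (int i = 0; i < N; i++) { cluster_of[i] = i; size[i] = 1; }

    int nclusters = N;
    while (nclusters > target_clusters) {
        int best_a = -1, best_b = -1;
        double best = -1.0;
        for (int a = 0; a < N; a++)                /* scan all cluster pairs */
            for (int b = a + 1; b < N; b++) {
                if (size[a] == 0 || size[b] == 0)
                    continue;
                double r = pair_ratio(a, b, size);
                if (r > best) { best = r; best_a = a; best_b = b; }
            }
        /* merge the least-separated (highest-ratio) pair */
        for (int i = 0; i < N; i++)
            if (cluster_of[i] == best_b)
                cluster_of[i] = best_a;
        size[best_a] += size[best_b];
        size[best_b] = 0;
        nclusters--;
    }
}
```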

Page 17: ECE 526 – Network Processing Systems Design

ADAG Mapping onto NPs

• Goal: generate a high-performance schedule
• Mapping is an NP-complete problem
• Use randomized mapping to solve this NP-complete problem
• Evaluate each randomized mapping with an analytical performance model (sketched after the figure below)

B. A. Malloy, E. L. Lloyd, and M. L. Soffa. Scheduling DAG's for asynchronous multiprocessor execution. IEEE Transactions on Parallel and Distributed Systems, 5(5):498–508, May 1994.

[Figure: ADAG nodes mapped onto a grid of processing elements (PEs) labeled A0–A7, B0–B7, C0–C7, D0–D7, and E0–E7; legend: PE, ADAG node.]
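A hedged C sketch of the randomized mapping step: draw many random assignments of ADAG nodes to PEs, score each with the analytical performance model, and keep the best. The function evaluate_mapping() is a placeholder here; the actual model from the referenced work is not reproduced.

```c
/* Hypothetical sketch of randomized ADAG-to-PE mapping. evaluate_mapping()
 * stands in for the analytical performance model, which is not shown. */
#include <stdlib.h>

#define NUM_NODES  16
#define NUM_PES     8
#define NUM_TRIALS  100000

/* placeholder: returns the estimated throughput of a mapping (higher is better) */
double evaluate_mapping(const int map[NUM_NODES]);

void randomized_mapping(int best_map[NUM_NODES])
{
    int    map[NUM_NODES];
    double best_score = -1.0;

    for (int t = 0; t < NUM_TRIALS; t++) {
        for (int n = 0; n < NUM_NODES; n++)        /* random node-to-PE assignment */
            map[n] = rand() % NUM_PES;

        double score = evaluate_mapping(map);      /* analytical performance model */
        if (score > best_score) {
            best_score = score;
            for (int n = 0; n < NUM_NODES; n++)
                best_map[n] = map[n];
        }
    }
}
```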

Page 18: ECE 526 – Network Processing Systems Design

Mapping Quality I

• Simulation setup: pipeline depth 1, width 8
• Performance model of ideal mapping:

Page 19: ECE 526 – Network Processing Systems Design

Mapping Quality II

• Exhaustive search: enumerates all possible mappings
• Randomized search: randomly chooses a mapping

Page 20: ECE 526 – Network Processing Systems Design

Summary

• Programming NPs for high performance is a hard problem
• A programming model is a solution
  ─ Intel ACE
  ─ NP-Click
  ─ ADAGs

Page 21: ECE 526 – Network Processing Systems Design

For Next Class and Reminder

• Read Chapter 23
• Lab 3
• Project