
Page 1: ECE 526 – Network Processing Systems Design

ECE 526 – Network Processing Systems Design

Programming Model

Chapter 21: D. E. Comer

Page 2: ECE 526 – Network Processing Systems Design

Overview

• Recall
  ─ Network processors are complicated, heterogeneous architectures
  ─ They are hard to program
    • Programmers need to understand fine details of the architecture
    • The current approach is assembly or a subset of the C language

• Programming model
  ─ Fills the gap between application and architecture
  ─ Natural interface for the programmer (e.g., a domain-specific language)
  ─ Abstraction of the underlying hardware
    • Enough architectural detail to write efficient code
    • Not too complicated for the programmer

• Two models
  ─ Hardware-specific model: IXP Programming Model
  ─ General models: NP-Click and ADAG

Page 3: ECE 526 – Network Processing Systems Design

IXP Programming Model

• What kind of software abstractions are used on the IXP?
• Active Computing Element (ACE):
  ─ Fundamental software building block
  ─ Used to construct a packet processing system
  ─ Runs on the XScale, the microengines (uEs), or the host
  ─ Handles control-plane processing and fast- or slow-path packet processing
  ─ Coordinates and synchronizes with other ACEs
  ─ Can have multiple outputs
  ─ Can serve as part of a pipeline
• Protocol processing is implemented by combining multiple ACEs

Page 4: ECE 526 – Network Processing Systems Design

ACE Terminology

• Library ACE:
  ─ ACE provided by Intel for basic functions
• Conventional ACE or standard ACE:
  ─ ACE built by the customer
  ─ Might make use of Intel's action service libraries
• MicroACE:
  ─ ACE with two components:
    • Core component (runs on the XScale)
    • Microblock component (runs on a microengine)
• Terminology for microblocks:
  ─ Source microblock: initial point that receives packets
  ─ Transform microblock: intermediate point that accepts and forwards packets
  ─ Sink microblock: last point that sends packets

Page 5: ECE 526 – Network Processing Systems Design

ACE Parts

• An ACE contains four conceptual parts (sketched in code below):
• Initialization:
  ─ Initialize data structures and variables before code execution
• Classification:
  ─ The ACE classifies each packet on arrival
  ─ The classification can be chosen explicitly or a default is used
• Actions:
  ─ Based on the classification, an action is invoked
• Message and event management:
  ─ An ACE can generate or handle messages
  ─ Used to communicate with another ACE or with hardware
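As a rough illustration of how these four parts fit together, here is a minimal C sketch; the type and function names are invented for this example and are not the Intel IXA/ACE API:

```c
/* Hypothetical sketch of the four conceptual ACE parts.
 * All names are illustrative, not Intel's actual ACE API. */
#include <stddef.h>

struct packet;                                     /* opaque packet handle     */
struct message;                                    /* opaque inter-ACE message */

#define MAX_CLASSES 8

typedef int  (*classify_fn)(struct packet *);      /* returns a class id       */
typedef void (*action_fn)(struct packet *);        /* action for one class     */
typedef void (*msg_handler_fn)(struct message *);  /* message/event handling   */

struct ace {
    const char     *name;
    classify_fn     classify;                 /* part 2: classification        */
    action_fn       actions[MAX_CLASSES];     /* part 3: actions               */
    msg_handler_fn  on_message;               /* part 4: message management    */
};

/* part 1: initialization of data structures before any packet is handled */
void ace_init(struct ace *a)
{
    a->classify   = NULL;
    a->on_message = NULL;
    for (int i = 0; i < MAX_CLASSES; i++)
        a->actions[i] = NULL;
}

/* per-packet path: classify on arrival, then invoke the matching action */
void ace_handle_packet(struct ace *a, struct packet *p)
{
    int cls = a->classify ? a->classify(p) : 0;    /* 0 acts as the default class */
    if (cls >= 0 && cls < MAX_CLASSES && a->actions[cls])
        a->actions[cls](p);
}
```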

Page 6: ECE 526 – Network Processing Systems Design

ACE Binding

• ACEs can be bound together to implement protocol processing
• Binding happens when an ACE is loaded into the NP
• Bindings can be changed dynamically
• Unbound targets perform a silent discard (see the sketch below)
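A minimal sketch of binding, continuing the hypothetical types above (again, not the real IXA calls): output targets live in a small table that is filled when the ACE is loaded, can be rewritten at run time, and an output with no bound target silently discards the packet.

```c
/* Hypothetical sketch of ACE output binding; names are illustrative only. */
#include <stddef.h>

struct packet;
struct ace;                                    /* as sketched above */
void ace_handle_packet(struct ace *a, struct packet *p);

#define MAX_OUTPUTS 4

struct ace_binding {
    struct ace *target[MAX_OUTPUTS];           /* NULL means "unbound" */
};

/* bind (or rebind) an output; called when the ACE is loaded into the NP,
 * and may be called again later to change the binding dynamically */
void ace_bind(struct ace_binding *b, int output, struct ace *next)
{
    if (output >= 0 && output < MAX_OUTPUTS)
        b->target[output] = next;
}

/* forward a packet along one output; an unbound target silently discards */
void ace_emit(struct ace_binding *b, int output, struct packet *p)
{
    if (output < 0 || output >= MAX_OUTPUTS || b->target[output] == NULL)
        return;                                /* silent discard */
    ace_handle_packet(b->target[output], p);
}
```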

Page 7: ECE 526 – Network Processing Systems Design

ACE Division

Page 8: ECE 526 – Network Processing Systems Design

Microengine Assignment

• Packet processing involves several microblocks
• How should microblocks be allocated to microengines?
  ─ One microblock per microengine
  ─ Multiple microblocks per microengine (in a pipeline)
  ─ Multiple pipelines on multiple microengines
• What are the pros and cons?
  ─ Passing packets between microengines incurs overhead
  ─ Pipelining causes inefficiencies if the blocks are not equal in size
  ─ Multiple blocks per microengine cause contention and require more instruction storage
• Intel terminology: "microblock group"
  ─ The set of microblocks running on one microengine (see the sketch below)
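Purely as a way to picture these alternatives (the types are invented, not Intel SDK definitions), a microblock group can be viewed as the ordered list of microblocks that one microengine's threads execute; loading the same group onto several microengines gives the replicated variant.

```c
/* Hypothetical view of a microblock group: the microblocks assigned to one
 * microengine. Types and names are illustrative, not Intel SDK definitions. */
struct packet;
typedef int (*microblock_fn)(struct packet *);   /* returns a dispatch code */

#define MAX_BLOCKS_PER_GROUP 4

struct microblock_group {
    microblock_fn blocks[MAX_BLOCKS_PER_GROUP];  /* run in order on this engine */
    int           count;
};

/* Example assignments corresponding to the options above:
 *   one microblock per microengine:   count = 1  (e.g., only rx)
 *   pipeline on one microengine:      count = 3  (e.g., rx, ipv4, tx)
 *   replicated pipelines:             the same group loaded on several engines
 */
```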

Page 9: ECE 526 – Network Processing Systems Design

Microblock Groups

• Microblock groups can be replicated to increase parallelism

Page 10: ECE 526 – Network Processing Systems Design

Microblock Group Replication

• Performance-critical groups can be replicated

Page 11: ECE 526 – Network Processing Systems Design

Control of Packet Flow

• Different packets require different processing blocks
  ─ IP requires different microblocks than ARP
  ─ Special packets get handed off to the core
• A "dispatch loop" controls packet flow among microblocks
  ─ Each thread runs its own dispatch loop
  ─ An infinite loop that grabs packets and hands them to microblocks
  ─ The return value from a microblock determines the next step
• Invoking a microblock is similar to a function call

Page 12: ECE 526 – Network Processing Systems Design

Dispatch Loop

• Example:
  ─ Three microblocks: ingress, IP, egress (a sketch follows below)
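A minimal C-style sketch of such a dispatch loop for the ingress/IP/egress example; the function names and return codes are invented for illustration, and a real microengine-C dispatch loop differs in detail:

```c
/* Hypothetical dispatch loop for three microblocks: ingress (source),
 * IP (transform), and egress (sink). Names and return codes are illustrative. */

struct packet;

enum next_step { TO_EGRESS, TO_CORE, DROP };

struct packet *ingress_get(void);                 /* source: receive a packet    */
enum next_step ip_process(struct packet *p);      /* transform: IP processing    */
void           egress_send(struct packet *p);     /* sink: transmit the packet   */
void           send_to_core(struct packet *p);    /* hand exceptions to the core */
void           drop_packet(struct packet *p);

void dispatch_loop(void)
{
    for (;;) {                                    /* each thread loops forever   */
        struct packet *p = ingress_get();         /* grab the next packet        */
        if (p == NULL)
            continue;

        switch (ip_process(p)) {                  /* return value picks next step */
        case TO_EGRESS: egress_send(p);  break;   /* normal fast-path forwarding  */
        case TO_CORE:   send_to_core(p); break;   /* exception packets to XScale  */
        case DROP:      drop_packet(p);  break;
        }
    }
}
```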

Page 13: ECE 526 – Network Processing Systems Design

Click Model of IPv4

NP-Click: A Programming Model for the Intel IXP1200, by Niraj Shah et al., UC Berkeley

Page 14: ECE 526 – Network Processing Systems Design

My Approach: ADAG

• Architecture-independent workload representation
• ADAG: Annotated Directed Acyclic Graph (data structures sketched below)
  ─ Node: a processing task
    • Annotated with a 3-tuple: the number of instructions, memory reads, and memory writes
  ─ Edge: a dependency between tasks
    • Edge weight: the amount of data communicated between the nodes
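A small C sketch of this representation (the field names are mine; the actual data structures in the ADAG work may differ): each node carries the 3-tuple of annotations and each edge carries the amount of data communicated.

```c
/* Hypothetical ADAG data structures; field names are illustrative. */

struct adag_node {
    int      id;
    unsigned instructions;     /* annotation 1: number of instructions  */
    unsigned mem_reads;        /* annotation 2: number of memory reads  */
    unsigned mem_writes;       /* annotation 3: number of memory writes */
};

struct adag_edge {
    int      from, to;         /* dependency: 'to' depends on 'from'    */
    unsigned bytes;            /* edge weight: data communicated        */
};

struct adag {
    struct adag_node *nodes;
    struct adag_edge *edges;
    unsigned          num_nodes, num_edges;
};
```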

Page 15: ECE 526 – Network Processing Systems Design

Profiling: Trace Generation

• PacketBench [Ramaswamy 2003]
• Data dependencies between registers and memories
• Control dependencies for conditional branches (a possible trace record is sketched below)
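As one possible, purely hypothetical shape for a trace record that captures these dependencies (this is not PacketBench's actual trace format), each executed instruction could be logged with the registers and memory locations it reads and writes, plus a flag for conditional branches:

```c
/* Hypothetical per-instruction trace record; only an illustration of the
 * recorded data and control dependencies, not PacketBench's real format. */
#include <stdint.h>

#define MAX_OPERANDS 4

struct trace_entry {
    uint32_t pc;                           /* instruction address             */
    uint8_t  regs_read[MAX_OPERANDS];      /* registers read (data deps)      */
    uint8_t  regs_written[MAX_OPERANDS];   /* registers written (data deps)   */
    uint64_t mem_read_addr;                /* memory address read, if any     */
    uint64_t mem_write_addr;               /* memory address written, if any  */
    int      is_conditional_branch;        /* control dependency marker       */
};
```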

Page 16: ECE 526 – Network Processing Systems Design

Clustering Algorithm

• Ratio Cut [Wei 1991]
  ─ Identifies natural clusters without a priori knowledge of the final number of clusters
  ─ Clusters nodes together such that the cut ratio r_ij is minimized
  ─ Top-down approach
  ─ NP-complete
• MLRC (Maximum Local Ratio Cut)
  ─ Bottom-up
  ─ Merges the nodes that should be least separated and recursively applies the process
  ─ Computational complexity O(n³) (a rough sketch follows below)
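Below is a rough C sketch of the bottom-up idea. The merge criterion used here (inter-cluster edge weight divided by the product of cluster sizes, merging the highest-ratio, i.e. least separated, pair) and the stopping rule are my assumptions for illustration; the exact definitions are in [Wei 1991] and the ADAG work, and this naive version is not tuned to the stated O(n³) complexity.

```c
/* Rough, hypothetical sketch of MLRC-style bottom-up clustering: repeatedly
 * merge the pair of clusters that is least separated. The ratio and the
 * stopping rule are assumptions; this naive version is also slower than
 * the O(n^3) stated for MLRC. */
#define N 8                       /* number of ADAG nodes (example size) */

static unsigned w[N][N];          /* symmetric edge-weight matrix        */
static int      cluster_of[N];    /* cluster id for each node            */

/* inter-cluster weight between clusters a and b, normalized by their sizes */
static double pair_ratio(int a, int b, const int size[])
{
    unsigned cut = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            if (cluster_of[i] == a && cluster_of[j] == b)
                cut += w[i][j];
    return (double)cut / ((double)size[a] * (double)size[b]);
}

void mlrc(int target_clusters)
{
    int size[N];
    for (int i = 0; i < N; i++) { cluster_of[i] = i; size[i] = 1; }

    int nclusters = N;
    while (nclusters > target_clusters) {
        int best_a = -1, best_b = -1;
        double best = -1.0;
        for (int a = 0; a < N; a++)                /* scan all cluster pairs */
            for (int b = a + 1; b < N; b++) {
                if (size[a] == 0 || size[b] == 0)
                    continue;
                double r = pair_ratio(a, b, size);
                if (r > best) { best = r; best_a = a; best_b = b; }
            }
        /* merge the least-separated (highest-ratio) pair */
        for (int i = 0; i < N; i++)
            if (cluster_of[i] == best_b)
                cluster_of[i] = best_a;
        size[best_a] += size[best_b];
        size[best_b] = 0;
        nclusters--;
    }
}
```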

Page 17: ECE 526 – Network Processing Systems Design

ADAG Mapping onto NPs

• Goal: generate a high-performance schedule
• Mapping is an NP-complete problem
• Use randomized mapping to solve this NP-complete problem
• Evaluate each randomized mapping with an analytical performance model (sketched after the figure below)

B. A. Malloy, E. L. Lloyd, and M. L. Soffa. Scheduling DAG's for asynchronous multiprocessor execution. IEEE Transactions on Parallel and Distributed Systems, 5(5):498–508, May 1994.

[Figure: ADAG nodes mapped onto a grid of processing elements (PEs) labeled A0–A7, B0–B7, C0–C7, D0–D7, and E0–E7; legend: PE, ADAG node.]
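A hedged C sketch of the randomized mapping step: draw many random assignments of ADAG nodes to PEs, score each with the analytical performance model, and keep the best. The function evaluate_mapping() is a placeholder here; the actual model from the referenced work is not reproduced.

```c
/* Hypothetical sketch of randomized ADAG-to-PE mapping. evaluate_mapping()
 * stands in for the analytical performance model, which is not shown. */
#include <stdlib.h>

#define NUM_NODES  16
#define NUM_PES     8
#define NUM_TRIALS  100000

/* placeholder: returns the estimated throughput of a mapping (higher is better) */
double evaluate_mapping(const int map[NUM_NODES]);

void randomized_mapping(int best_map[NUM_NODES])
{
    int    map[NUM_NODES];
    double best_score = -1.0;

    for (int t = 0; t < NUM_TRIALS; t++) {
        for (int n = 0; n < NUM_NODES; n++)        /* random node-to-PE assignment */
            map[n] = rand() % NUM_PES;

        double score = evaluate_mapping(map);      /* analytical performance model */
        if (score > best_score) {
            best_score = score;
            for (int n = 0; n < NUM_NODES; n++)
                best_map[n] = map[n];
        }
    }
}
```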

Page 18: ECE 526 – Network Processing Systems Design

Mapping Quality I

• Simulation setup: pipeline depth 1, width 8
• Performance model of ideal mapping:

Page 19: ECE 526 – Network Processing Systems Design

Mapping Quality II

• Exhaustive search: enumerates all possible mappings
• Randomized search: randomly chooses a mapping

Page 20: ECE 526 – Network Processing Systems Design

Summary

• Programming NPs for high performance is a hard problem
• A programming model is a solution
  ─ Intel ACE
  ─ NP-Click
  ─ ADAGs

Page 21: ECE 526 – Network Processing Systems Design

For Next Class and Reminder

• Read Chapter 23
• Lab 3
• Project