ECE 526 – Network Processing Systems Design
Programming Model
Chapter 21: D. E. Comer
Ning Weng ECE 526 2
Overview
• Recall
  ─ Network processors are complicated, heterogeneous architectures
  ─ They are hard to program
    • Need to understand fine details of the architecture
    • Current approach: assembly or a subset of the C language
• Programming model
  ─ Fills the gap between application and architecture
  ─ Natural interface for the programmer (e.g., a domain-specific language)
  ─ Abstraction of the underlying hardware
    • Enough architectural detail to write efficient code
    • Not too complicated for the programmer
• Two models
  ─ Hardware-specific model: IXP programming model
  ─ General models: NP-Click and ADAG
IXP Programming Model
• What kind of software abstractions are used on the IXP?
• Active Computing Element (ACE):
  ─ Fundamental software building block
  ─ Used to construct packet processing systems
  ─ Runs on the XScale, a microengine (uE), or the host
  ─ Handles control plane and fast- or slow-path packet processing
  ─ Coordinates and synchronizes with other ACEs
  ─ Can have multiple outputs
  ─ Can serve as part of a pipeline
• Protocol processing is implemented by combining multiple ACEs
ACE Terminology
• Library ACE:
  ─ ACE provided by Intel for basic functions
• Conventional ACE or Standard ACE:
  ─ ACE built by the customer
  ─ Might make use of Intel’s Action Service Libraries
• Micro ACE
  ─ ACE with two components:
    • Core component (runs on the XScale)
    • Microblock component (runs on a uE)
• Terminology for microblocks:
  ─ Source microblock: initial point that receives packets
  ─ Transform microblock: intermediate point that accepts and forwards packets
  ─ Sink microblock: last point that sends packets
ACE Parts
• An ACE contains four conceptual parts:
• Initialization:
  ─ Initializes data structures and variables before code execution
• Classification:
  ─ ACE classifies each packet on arrival
  ─ The classification can be chosen, or a default is used
• Actions:
  ─ An action is invoked based on the classification
• Message and event management:
  ─ ACE can generate or handle messages
  ─ Used for communication with another ACE or with hardware
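The four conceptual parts can be pictured with a small Python sketch. `ToyAce`, its fields, `default_classifier`, and the dict-based packet are hypothetical names chosen for illustration; they are not part of Intel's ACE framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Packet = Dict[str, int]  # hypothetical packet: header fields in a dict


def default_classifier(pkt: Packet) -> str:
    """Fallback classification used when no classifier is chosen."""
    return "default"


@dataclass
class ToyAce:
    """Illustrative stand-in for an ACE (an assumption, not Intel's API)."""
    name: str
    actions: Dict[str, Callable[[Packet], Packet]]         # action part
    classify: Callable[[Packet], str] = default_classifier  # classification part
    inbox: List[str] = field(default_factory=list)          # message/event part
    stats: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Initialization part: set up data structures before packets arrive.
        self.stats = {label: 0 for label in self.actions}

    def handle_packet(self, pkt: Packet) -> Packet:
        label = self.classify(pkt)       # classify the packet on arrival
        self.stats[label] += 1
        return self.actions[label](pkt)  # invoke the action for that class

    def handle_message(self, msg: str) -> None:
        self.inbox.append(msg)           # message from another ACE or hardware


ace = ToyAce(
    name="l3",
    actions={
        "ipv4": lambda p: {**p, "ttl": p["ttl"] - 1},
        "other": lambda p: p,
    },
    classify=lambda p: "ipv4" if p.get("ethertype") == 0x0800 else "other",
)
out = ace.handle_packet({"ethertype": 0x0800, "ttl": 64})
ace.handle_message("link-up")
```

The classifier and the action table are passed in separately, which mirrors the slide's point that classification selects which action runs.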
ACE Binding
• ACEs can be bound together to implement protocol processing:
• Binding happens when an ACE is loaded into the NP
• Binding can be changed dynamically
• Unbound targets perform a silent discard
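A minimal sketch of binding, assuming each ACE exposes named output targets that are bound to downstream ACEs. The class names `BindableAce` and `SinkAce` and the methods `bind`, `unbind`, and `receive` are hypothetical and do not reproduce the Intel SDK interface.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, int]  # hypothetical packet representation


class BindableAce:
    """Hypothetical ACE with named output targets (not Intel's API)."""

    def __init__(self, name: str,
                 process: Callable[[Packet], Tuple[str, Packet]]) -> None:
        self.name = name
        self.process = process  # returns (target name, packet)
        self.targets: Dict[str, "BindableAce"] = {}

    def bind(self, target: str, ace: "BindableAce") -> None:
        self.targets[target] = ace      # may be called again later to rebind

    def unbind(self, target: str) -> None:
        self.targets.pop(target, None)

    def receive(self, pkt: Packet) -> None:
        target, out = self.process(pkt)
        nxt = self.targets.get(target)
        if nxt is not None:
            nxt.receive(out)
        # An unbound target silently discards the packet.


class SinkAce(BindableAce):
    """Collects packets so the example can be inspected."""

    def __init__(self, name: str) -> None:
        super().__init__(name, lambda p: ("", p))
        self.delivered: List[Packet] = []

    def receive(self, pkt: Packet) -> None:
        self.delivered.append(pkt)


classifier = BindableAce(
    "classifier",
    lambda p: ("ipv4" if p["ethertype"] == 0x0800 else "arp", p),
)
ip_out = SinkAce("ip_out")
classifier.bind("ipv4", ip_out)
classifier.receive({"ethertype": 0x0800})  # delivered to ip_out
classifier.receive({"ethertype": 0x0806})  # "arp" is unbound: discarded

arp_out = SinkAce("arp_out")
classifier.bind("arp", arp_out)            # binding changed at run time
classifier.receive({"ethertype": 0x0806})  # now delivered to arp_out
```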
ACE Division
Microengine Assignment
• Packet processing involves several microblocks
• How should microblocks be allocated to microengines?
  ─ One microblock per microengine
  ─ Multiple microblocks per microengine (in a pipeline)
  ─ Multiple pipelines on multiple microengines
• What are the pros and cons?
  ─ Passing packets between microengines incurs overhead
  ─ Pipelining causes inefficiencies if blocks are not equal in size
  ─ Multiple blocks per microengine cause contention and require more instruction storage
• Intel terminology: “microblock group”
  ─ Set of microblocks running on one microengine
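The trade-off between the allocation options can be made concrete with a toy cost model. The sketch below assumes fixed per-block cycle counts and a fixed hand-off cost between microengines, and it ignores memory contention and instruction-store limits. The function names and the numbers are illustrative assumptions, not measurements.

```python
from typing import Sequence


def pipeline_cycles_per_packet(block_cycles: Sequence[int],
                               handoff_cycles: int) -> int:
    """One microblock per microengine: the slowest stage, plus the cost of
    handing the packet to the next microengine, sets the packet rate."""
    return max(block_cycles) + handoff_cycles


def pipeline_utilization(block_cycles: Sequence[int],
                         handoff_cycles: int) -> float:
    """Fraction of microengine cycles spent on useful work in the pipeline."""
    period = pipeline_cycles_per_packet(block_cycles, handoff_cycles)
    return sum(block_cycles) / (period * len(block_cycles))


def replicated_cycles_per_packet(block_cycles: Sequence[int],
                                 num_engines: int) -> float:
    """All microblocks in one group, replicated on every microengine:
    each packet costs the sum of the blocks, and packets run in parallel."""
    return sum(block_cycles) / num_engines


blocks = [100, 300, 100]  # hypothetical cycle counts for three microblocks
pipe = pipeline_cycles_per_packet(blocks, handoff_cycles=20)
util = pipeline_utilization(blocks, handoff_cycles=20)
repl = replicated_cycles_per_packet(blocks, num_engines=3)
print(pipe, round(util, 3), round(repl, 1))
```

With these unbalanced blocks the pipeline leaves roughly half of the microengine cycles idle, while the replicated group avoids the hand-off cost but needs all three blocks in every instruction store.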
Microblock Groups
• Microblock groups can be replicated to increase parallelism
Microblock Group Replication
• Performance-critical groups can be replicated
Control of Packet Flow
• Packets require different processing blocks
  ─ IP requires different microblocks than ARP
  ─ Special packets get handed off to the core
• A “dispatch loop” controls packet flow among microblocks
  ─ Each thread runs its own dispatch loop
  ─ Infinite loop that grabs packets and hands them to microblocks
  ─ The return value from a microblock determines the next step
• Invocation of a microblock is similar to a function call
Dispatch Loop
• Example:
  ─ Three microblocks
  ─ Ingress, IP, egress
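A dispatch loop over the three microblocks can be sketched in Python as follows. Real microblocks are written in microengine assembly or microengine C, so this only illustrates the control flow. The step names (`"ip"`, `"egress"`, `"tx"`, `"core"`, `"drop"`) and the packet fields are assumptions made for the example, and the loop stops when the receive queue is empty so that it terminates, whereas a real dispatch loop runs forever.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Packet = Dict[str, int]  # hypothetical packet: header fields in a dict
ETHERTYPE_IPV4 = 0x0800


def ingress(pkt: Packet) -> str:
    # IP packets continue on the fast path; everything else goes to the core.
    return "ip" if pkt["ethertype"] == ETHERTYPE_IPV4 else "core"


def ip(pkt: Packet) -> str:
    pkt["ttl"] -= 1
    return "egress" if pkt["ttl"] > 0 else "drop"


def egress(pkt: Packet) -> str:
    return "tx"


MICROBLOCKS: Dict[str, Callable[[Packet], str]] = {
    "ingress": ingress,
    "ip": ip,
    "egress": egress,
}


def dispatch_loop(rx: Deque[Packet], tx: List[Packet],
                  core: List[Packet]) -> None:
    while rx:                          # a real dispatch loop is infinite
        pkt = rx.popleft()
        step = "ingress"
        while step in MICROBLOCKS:     # return value selects the next block
            step = MICROBLOCKS[step](pkt)
        if step == "tx":
            tx.append(pkt)
        elif step == "core":
            core.append(pkt)           # special packet handed off to the core
        # any other value (e.g. "drop") discards the packet


rx = deque([
    {"ethertype": 0x0800, "ttl": 64},  # forwarded
    {"ethertype": 0x0806},             # ARP: handed to the core
    {"ethertype": 0x0800, "ttl": 1},   # TTL expires: dropped
])
tx: List[Packet] = []
core: List[Packet] = []
dispatch_loop(rx, tx, core)
```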
Click Model of IPv4
NP-Click: A Programming Model for the Intel IXP1200, by Niraj Shah et al., UC Berkeley
My Approach: ADAG
• Architecture-independent workload representation
• ADAG (Annotated Directed Acyclic Graph)
  ─ Node: processing task
    • 3-tuple: number of instructions, number of memory reads, number of memory writes
  ─ Edge: dependency
    • Edge weight: amount of data communicated between nodes
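One possible in-memory representation of an ADAG is sketched below. The class and method names (`NodeCost`, `Adag`, `add_node`, `add_edge`, `topological_order`) and the choice of bytes as the edge-weight unit are assumptions for illustration; the slides do not prescribe a data structure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeCost:
    """The 3-tuple annotating a processing task."""
    instructions: int
    mem_reads: int
    mem_writes: int


@dataclass
class Adag:
    nodes: Dict[str, NodeCost] = field(default_factory=dict)
    edges: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def add_node(self, name: str, instructions: int,
                 mem_reads: int, mem_writes: int) -> None:
        self.nodes[name] = NodeCost(instructions, mem_reads, mem_writes)

    def add_edge(self, src: str, dst: str, data_bytes: int) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges[(src, dst)] = data_bytes  # unit (bytes) is an assumption

    def topological_order(self) -> List[str]:
        """Kahn's algorithm; raises ValueError if the graph has a cycle."""
        indegree = {n: 0 for n in self.nodes}
        for (_, dst) in self.edges:
            indegree[dst] += 1
        ready = deque(n for n, d in indegree.items() if d == 0)
        order: List[str] = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for (src, dst) in self.edges:
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError("graph contains a cycle, so it is not a DAG")
        return order


g = Adag()
g.add_node("parse", instructions=120, mem_reads=4, mem_writes=1)
g.add_node("lookup", instructions=300, mem_reads=12, mem_writes=0)
g.add_node("forward", instructions=80, mem_reads=2, mem_writes=3)
g.add_edge("parse", "lookup", data_bytes=20)
g.add_edge("lookup", "forward", data_bytes=4)
order = g.topological_order()
```

The topological order doubles as an acyclicity check, which is the property that distinguishes an ADAG from a general annotated graph.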
Profiling: Trace Generation
• PacketBench [Ramaswamy 2003]
• Data dependencies between registers and memories
• Control dependency for conditional branches
Clustering Algorithm
• Ratio Cut [Wei 1991]
  ─ Identifies the natural clusters without a priori knowledge of the final number of clusters
  ─ Clusters nodes together such that r_ij is minimized
  ─ Top-down approach
  ─ NP-complete
• MLRC (Maximum Local Ratio Cut)
  ─ Bottom-up approach
  ─ Merges the nodes that should be least separated and recursively applies the process
  ─ Computational complexity: O(n^3)
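The bottom-up merging idea can be sketched as follows. The slide does not define r_ij, so this sketch assumes the standard ratio-cut form: total edge weight between clusters i and j divided by the product of their sizes. It treats the pair with the largest ratio as the one that "should be least separated". The function name `mlrc_sketch` and the explicit target cluster count are illustrative assumptions, and this naive version makes no attempt to meet the O(n^3) bound.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

Cluster = FrozenSet[str]


def mlrc_sketch(nodes: Iterable[str],
                edge_weights: Dict[Tuple[str, str], float],
                num_clusters: int) -> List[Cluster]:
    """Greedy bottom-up clustering: repeatedly merge the pair of clusters
    with the largest local ratio cut (assumed definition, see lead-in)."""
    clusters: List[Cluster] = [frozenset([n]) for n in nodes]

    def cut(a: Cluster, b: Cluster) -> float:
        # Total weight of edges with one endpoint in each cluster.
        return sum(w for (u, v), w in edge_weights.items()
                   if (u in a and v in b) or (u in b and v in a))

    def ratio(pair: Tuple[Cluster, Cluster]) -> float:
        a, b = pair
        return cut(a, b) / (len(a) * len(b))

    while len(clusters) > max(num_clusters, 1):
        a, b = max(combinations(clusters, 2), key=ratio)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


weights = {("a", "b"): 10.0, ("b", "c"): 1.0, ("c", "d"): 8.0}
result = mlrc_sketch(["a", "b", "c", "d"], weights, num_clusters=2)
```

In this example the heavily communicating pairs (a, b) and (c, d) end up together, and the light edge between b and c becomes the cut.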
ADAG Mapping onto NPs
• Goal: generate a high-performance schedule
• Mapping is an NP-complete problem
• Randomized mapping is used to address this NP-complete problem
• Randomized mappings are evaluated with an analytical performance model
B. A. Malloy, E. L. Lloyd, and M. L. Soffa. Scheduling DAG’s for asynchronous multiprocessor execution. IEEE Transactions on Parallel and Distributed Systems, 5(5):498–508, May 1994.
[Figure: ADAG nodes (A0–A7, B0–B7, C0–C7, D0–D7, E0–E7) mapped onto processing elements (PE)]
Mapping Quality I
• Simulation setup: pipeline depth 1, width 8
• Performance model of ideal mapping:
Mapping Quality II
• Exhaustive search: enumerates all possible mappings
• Randomized search: randomly chooses a mapping
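The difference between the two search strategies can be illustrated with a toy example. The cost model below (the load of the busiest processing element, ignoring dependencies and communication) is a stand-in chosen for brevity and is not the analytical performance model from the slides. The task costs, function names, and the fixed random seed are assumptions.

```python
import itertools
import random
from typing import Sequence, Tuple

Mapping = Tuple[int, ...]  # mapping[i] = processing element assigned to task i


def makespan(mapping: Mapping, task_cost: Sequence[int], num_pes: int) -> int:
    """Toy cost: load of the busiest processing element."""
    load = [0] * num_pes
    for task, pe in enumerate(mapping):
        load[pe] += task_cost[task]
    return max(load)


def exhaustive_search(task_cost: Sequence[int], num_pes: int) -> Mapping:
    """Enumerates all num_pes ** len(task_cost) mappings; tiny inputs only."""
    candidates = itertools.product(range(num_pes), repeat=len(task_cost))
    return min(candidates, key=lambda m: makespan(m, task_cost, num_pes))


def randomized_search(task_cost: Sequence[int], num_pes: int,
                      trials: int, seed: int = 0) -> Mapping:
    """Draws `trials` random mappings (trials >= 1) and keeps the best one."""
    rng = random.Random(seed)
    best = tuple(rng.randrange(num_pes) for _ in task_cost)
    for _ in range(trials - 1):
        cand = tuple(rng.randrange(num_pes) for _ in task_cost)
        if makespan(cand, task_cost, num_pes) < makespan(best, task_cost, num_pes):
            best = cand
    return best


costs = [4, 3, 3, 2, 2]  # hypothetical task costs
optimal = exhaustive_search(costs, num_pes=2)
sampled = randomized_search(costs, num_pes=2, trials=10)
print(makespan(optimal, costs, 2), makespan(sampled, costs, 2))
```

Exhaustive search is guaranteed to find the best mapping but grows exponentially with the number of tasks. Randomized search evaluates a fixed number of candidates, so its result can only match or fall short of the exhaustive optimum.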
Summary
• Programming NPs for high performance is a hard problem
• A programming model is the solution
  ─ Intel ACE
  ─ NP-Click
  ─ ADAG
For Next Class and Reminder
• Read Chapter 23
• Lab 3
• Project