A Methodology for Evaluating Runtime Support in Network Processors
Xin Huang and Tilman Wolf
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst
{xhuang,wolf}@ecs.umass.edu

Runtime Support in Network Processors

Network processor (NP)
• Multi-core system-on-chip
• Programmability and high packet processing rate

Heterogeneous resources
• Control processors
• Multiple packet processors
• Co-processors
• Memory hierarchy
• Interconnection

Runtime support
• Dynamic task allocation

[Figure: IXP 2800 block diagram with receive and transmit, scratchpad, hash unit, 16 microengines (μE), SRAM and DRAM interface, and XScale control processor]

General Operation of Runtime Support in NP

Input
• Hardware resources
• Workload

Mapping method

Output
• Task allocation

Dynamic adaptation
• Different runtime support systems
• Difficult to compare

[Figure: NP hardware resources (the IXP 2800 block diagram plus SRAM, Flash, memory-mapped I/O, and SDRAM) and the workload feed the runtime mapping, which produces the task allocation on the processors (AP1, AP2, AP3)]

Contributions

Evaluation methodology
• Traffic representation
• Analytical system model based on queuing networks
• Results

Specific: 3 example runtime support systems
I. Ideal Allocation
II. Full Processor Allocation
• R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003
III. Partitioned Application Allocation
• T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005

Outline

Introduction
Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model
Results
Summary
Workload
NP workload is characterized by applications and traffic
How to represent workload?

Dynamic Workload Model

Workload graph: $W = (T, U)$
• Application/Task: $T$
• Traffic: $U_t$, $R_t$
• Processing requirement: $D(t_i)$

[Figure: example workload graph]

Processing requirement:
• R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
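
A minimal sketch of how such a workload graph could be held in code. It assumes, beyond what the slide states, that each edge in U carries the fraction of packets passed from one task to the next, that R_t is the packet rate at the entry task, and that D(t_i) is measured in instructions per packet. The task names, numbers, and the helpers `task_rates` and `required_mips` are hypothetical.

```python
# Hypothetical workload graph W = (T, U); all names and numbers are illustrative.
# demand[t]     : D(t_i), assumed to be instructions per packet for task t
# edges[(a, b)] : assumed fraction of task a's packets forwarded to task b


def task_rates(demand, edges, entry, packet_rate):
    """Propagate the entry packet rate through an acyclic task graph.

    Tasks in `demand` must be listed in topological order.
    """
    rates = {task: 0.0 for task in demand}
    rates[entry] = float(packet_rate)
    for task in demand:  # dicts preserve insertion order
        for (src, dst), fraction in edges.items():
            if src == task:
                rates[dst] += rates[src] * fraction
    return rates


def required_mips(demand, rates):
    """Total processing demand in millions of instructions per second."""
    return sum(rates[t] * demand[t] for t in demand) / 1e6


demand = {"classify": 200, "forward": 500, "encrypt": 3000}
edges = {("classify", "forward"): 0.8, ("classify", "encrypt"): 0.2}
rates = task_rates(demand, edges, entry="classify", packet_rate=100_000)
total_mips = required_mips(demand, rates)
```

Changing `packet_rate` or the edge fractions over time gives one simple way to represent a dynamic workload under these assumptions.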

Runtime System Model

Unified approach for all runtime systems
• Queuing networks
• Specific solution for each runtime system
• Runtime mapping: $M_t : T_t \rightarrow P$
• Graph: $S = (P, Q)$
• Packet arrival rate: $\lambda_{i,j}^{t}$
• Service time: $D(t_i, p_j)$

Metrics for all runtime systems
• Processor utilization: $\rho$
• Average number of packets in the system: $K$

Three Example Runtime Support Systems

System I: Ideal Allocation
System II: Full Processor Allocation
System III: Partitioned Application Allocation

[Figure: a workload with tasks T1 and T2 mapped to processors under Ideal Allocation (each processor runs T1 & T2), Full Processor Allocation (each processor runs either T1 or T2), and Partitioned Application Allocation (each processor runs subtasks T1_k and T2_k, k = 1..4)]

Example Evaluation Model – System I

Ideal Allocation
• All processors can process all packets completely
• Unrealistic, but provides a baseline

Model: M/G/m FCFS single station

M/G/m Single Station Queuing System

Cosmetatos approximation

$$W_{M/G/m} \approx c_B^2\, W_{M/M/m} + (1 - c_B^2)\, W_{M/D/m},$$

where

$$W_{M/M/m} = \frac{P_m}{m\mu(1-\rho)}; \quad P_m = \frac{(m\rho)^m}{m!\,(1-\rho)}\,\pi_0; \quad \pi_0 = \left[\sum_{k=0}^{m-1}\frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!}\cdot\frac{1}{1-\rho}\right]^{-1},$$

and

$$W_{M/D/m} = \frac{1}{2}\,\frac{1}{nc_{Dm}}\,W_{M/M/m}; \quad nc_{Dm} = \left(1 + (1-\rho)(m-1)\,\frac{\sqrt{4+5m}-2}{16\rho m}\right)^{-1}$$

Evaluation metrics

$$\rho = \frac{\lambda}{m\mu}; \quad K = \lambda\, W_{M/G/m} + m\rho$$

G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976
G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
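
The approximation can be evaluated numerically. The following is a minimal sketch, assuming the standard M/M/m (Erlang C) waiting time and the Cosmetatos correction as given in Bolch et al.; the function names and the parameter `cb2` (the squared coefficient of variation of the service time, $c_B^2$) are chosen here for illustration.

```python
from math import factorial, sqrt


def mmm_wait(lam, mu, m):
    """Mean queueing delay W_{M/M/m} for arrival rate lam and per-server rate mu."""
    rho = lam / (m * mu)
    if not 0.0 <= rho < 1.0:
        raise ValueError("need 0 <= rho < 1 for a steady state")
    a = m * rho
    tail = a**m / (factorial(m) * (1.0 - rho))
    pi0 = 1.0 / (sum(a**k / factorial(k) for k in range(m)) + tail)
    p_m = tail * pi0  # probability that an arriving packet has to wait
    return p_m / (m * mu * (1.0 - rho))


def mgm_wait(lam, mu, m, cb2):
    """Cosmetatos approximation of W_{M/G/m}; cb2 is c_B^2."""
    rho = lam / (m * mu)
    w_mmm = mmm_wait(lam, mu, m)
    if rho == 0.0:
        return 0.0  # no traffic, no waiting
    nc = 1.0 / (1.0 + (1.0 - rho) * (m - 1) * (sqrt(4 + 5 * m) - 2) / (16 * rho * m))
    w_mdm = 0.5 * w_mmm / nc
    return cb2 * w_mmm + (1.0 - cb2) * w_mdm


def mgm_metrics(lam, mu, m, cb2):
    """Return (rho, K): utilization and mean number of packets in the system."""
    rho = lam / (m * mu)
    return rho, lam * mgm_wait(lam, mu, m, cb2) + m * rho
```

With `m = 1` the sketch reduces to the familiar single-server results: `cb2 = 1` gives the M/M/1 value and `cb2 = 0` gives the M/D/1 value, which is a quick sanity check on the reconstruction.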

Example Evaluation Model – System II

Full Processor Allocation
• Allocate entire tasks to subsets of processors
• Allocate as few processors as possible to save power
• Each processor runs one type of task
• Reallocation is triggered by queue length

Model: BCMP M/M/1-FCFS (Jackson network)
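
A minimal sketch of the sizing idea. It assumes that a task's packets are split evenly over n identical M/M/1 processors and that "as few as possible" means the smallest n whose mean number of packets per processor, ρ/(1−ρ), stays at or below a threshold. The threshold `max_packets`, the even split, and the function name are assumptions for illustration, not the reallocation rule of the cited system.

```python
def processors_needed(arrival_rate, service_rate, max_packets):
    """Smallest n so that each of n M/M/1 queues holds at most max_packets on average.

    arrival_rate : packets per second arriving for this task (assumed split evenly)
    service_rate : packets per second one processor can handle for this task
    max_packets  : hypothetical threshold on the mean number of packets per processor
    """
    if arrival_rate < 0 or service_rate <= 0 or max_packets <= 0:
        raise ValueError("rates and threshold must be positive")
    n = 1
    while True:
        rho = arrival_rate / (n * service_rate)
        # M/M/1 mean number of packets in the system is rho / (1 - rho)
        if rho < 1.0 and rho / (1.0 - rho) <= max_packets:
            return n
        n += 1


n_light = processors_needed(50.0, 100.0, 4.0)
n_heavy = processors_needed(250.0, 100.0, 4.0)
```

Re-running the calculation whenever the observed queue length crosses the threshold gives allocation that changes with traffic, at the cost of running each allocated processor at high utilization.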

BCMP Network

BCMP: Baskett, Chandy, Muntz, and Palacios

Characteristics:
• Open, closed, and mixed queuing networks
• Several job classes
• Four types of nodes: M/M/m–FCFS (class-independent service time), M/G/1–PS, M/G/∞–IS, and M/G/1–LCFS PR

Product-form steady-state solution:

$$\pi(s_1, \ldots, s_N) = \frac{1}{G(K)}\, d(s) \prod_{i=1}^{N} f_i(s_i),$$

Open M/M/1-FCFS BCMP queuing network:

$$\pi(k_1, \ldots, k_N) = \prod_{i=1}^{N} \pi_i(k_i), \quad \pi_i(k_i) = (1 - \rho_i)\,\rho_i^{k_i}$$

Evaluation metrics:

$$\rho_i = \sum_{r=1}^{C} \rho_{ir} = \sum_{r=1}^{C} \frac{\lambda_r e_{ir}}{\mu_{ir}}; \quad K_i = \sum_{r=1}^{C} K_{ir} = \frac{\rho_i}{1 - \rho_i}$$

F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248-260, April 1975
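
A minimal sketch of how these per-node metrics could be computed for an open network of M/M/1-FCFS nodes. The argument layout (`arrival[r]` for the class arrival rate, `visits[i][r]` for the visit ratio, `service_rate[i][r]` for the service rate) and the example numbers are assumptions made for illustration.

```python
def open_mm1_metrics(arrival, visits, service_rate):
    """Per-node (rho_i, K_i) for an open product-form network of M/M/1-FCFS nodes.

    arrival[r]         : arrival rate of class r
    visits[i][r]       : visit ratio of class r at node i
    service_rate[i][r] : service rate of class r at node i
    """
    metrics = []
    for e_i, mu_i in zip(visits, service_rate):
        rho_i = sum(lam * e / mu for lam, e, mu in zip(arrival, e_i, mu_i))
        if rho_i >= 1.0:
            raise ValueError("node is overloaded (rho_i >= 1)")
        metrics.append((rho_i, rho_i / (1.0 - rho_i)))
    return metrics


def state_probability(rhos, ks):
    """Product-form probability of seeing ks[i] packets at node i."""
    prob = 1.0
    for rho, k in zip(rhos, ks):
        prob *= (1.0 - rho) * rho**k
    return prob


# Two classes, two nodes; class 1 (index 1) never visits node 1.
arrival = [1.0, 2.0]
visits = [[1.0, 1.0], [1.0, 0.0]]
service_rate = [[10.0, 10.0], [4.0, 4.0]]
metrics = open_mm1_metrics(arrival, visits, service_rate)
```

Summing the second component over all nodes gives the average number of packets in the whole system, which is the metric compared across the three runtime systems.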

Example Evaluation Model – System III

Partitioned Application Allocation
• Tasks are partitioned across multiple processors
• Synchronized pipelines
• Allocate tasks equally across all processors to maximize throughput
• Reallocate at fixed time intervals

Model: BCMP M/M/1-FCFS (Jackson network)

Equations for evaluation metrics are the same as for System II.
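
A minimal sketch of why an uneven partition costs throughput in a synchronized pipeline: every stage advances at the pace of the slowest one. It assumes each stage cost is given in instructions per packet and that every stage runs on its own 100 MIPS engine (the engine speed used in the setup of this deck); the stage costs and function names are made up for illustration.

```python
def pipeline_throughput(stage_costs, mips=100.0):
    """Packets per second of a synchronized pipeline, limited by its slowest stage.

    stage_costs : instructions per packet for each stage (hypothetical values)
    mips        : speed of each processing engine in millions of instructions/s
    """
    return mips * 1e6 / max(stage_costs)


def balance_efficiency(stage_costs):
    """Fraction of the allocated processing capacity that does useful work."""
    return sum(stage_costs) / (len(stage_costs) * max(stage_costs))


stages = [400, 500, 300, 800]  # an unbalanced four-stage partition
throughput = pipeline_throughput(stages)
efficiency = balance_efficiency(stages)
```

For this partition the slowest stage needs 800 instructions per packet, so the pipeline is capped at 125,000 packets per second and only 62.5% of the four engines' capacity is used. A perfectly balanced split of the same 2,000 instructions would reach 200,000 packets per second.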

Setup

System
• 16 processing engines at 100 MIPS each
• Queue lengths are infinite

Workload

Other assumptions
• Partition applications into 7-15 subtasks

Processor Allocation Over Time

Ideal:
• 16 processors

Full Processor:
• Changes with traffic

Partitioned Application:
• 16 processors

[Figure: processor allocation over time for the full processor allocation system]

Processor Utilization Over Time

Ideal:
• Lowest processor utilization

Full Processor:
• Highest processor utilization because it uses fewer processors

Partitioned Application:
• Low processor utilization
• Not equal to the ideal case due to unbalanced task allocation and pipeline overhead

Packets in System Over Time

Ideal:
• Fewest packets

Full Processor:
• Packets queue up due to its high processor utilization

Partitioned Application:
• Most packets, due to unbalanced task allocation and pipeline overhead
• More stable performance because of finer processor allocation granularity

Performance for Different Data Rates

Ideal:
• Smooth increase

Full Processor:
• Periodic peaks

Partitioned Application:
• Smooth increase

Maximum data rate supported by each system
• Ideal: 100%
• Full Processor: 79.6%
• Partitioned Application: 75.1%

Implications of the Results

Ideal Allocation
• Provides a baseline

Full Processor Allocation
• Allocate as few processors as possible to save power
• Use an entire processor as the allocation granularity
• Good: high processor utilization
• Bad: high performance variance

Partitioned Application Allocation
• Distribute tasks equally across all processors
• Finer processor allocation granularity
• Good: stable performance
• Bad: difficult to find an optimized solution => pipeline synchronization overhead

Summary

Analytical methodology for evaluating different NP runtime support systems
Dynamic workload model and runtime system model
Results: 3 example runtime support systems
• Quantitative metrics
• Tradeoffs

Questions?