Design & Co-design of Embedded Systems
Distributed System Co-synthesis (2)
Maziar Goudarzi
Fall 2005 Design & Co-design of Embedded Systems
Today Program
• Introduction
• Preliminaries
• Hardware/Software Partitioning
• Distributed System Co-Synthesis (part 2)
References:
Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.
W. Wolf, “An architectural co-synthesis algorithm for distributed, embedded computing systems,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, no. 2, pp. 218-229, 1997.
Topics
• Introduction
• An Integer Linear Programming Model
• A Heuristic Algorithm
  • On ordinary task graphs
  • On an Object-Oriented model
Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm on Ordinary Task Graphs
Wolf’s Heuristic Algorithm
As ever, the topics of importance:
• System Specification Language/Model
• Target Architecture
• Functionality (Allocation/Scheduling) Quantum
• Allocation Strategy
• Scheduling Strategy
• Cost Estimation
• Performance Estimation
• Algorithm Details
Wolf’s Heuristic Algorithm (cont’d)
• System Specification Language/Model
  • Algorithm input: single-rate task graph
• Target Architecture
  • Heterogeneous multiprocessor architecture
• Allocation
  • Primal approach: performance is the major objective
• Scheduling?
• Functionality Quantum
  • Processes in a single-rate task graph
Wolf’s Heuristic Algorithm (cont’d)
• Performance Estimation
  • Component Technology Library
  • Run-time of each process on each available PE is supposed to be known
• Cost Estimation
  • Component Technology Library
  • $\text{Total Cost} = \sum_i \text{Cost}(PE_i) + \sum_j \text{Cost}(Device_j) + \sum_k \text{Cost}(\text{Comm. Channel}_k)$
• Algorithm Details
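The cost estimate above is a plain sum over the allocated components. A minimal sketch follows; the `Component` type, the component names, and the cost figures are hypothetical and only stand in for entries of a component technology library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One entry of a (hypothetical) component technology library."""
    name: str
    cost: float


def total_cost(pes, devices, channels):
    """Total cost = sum of PE costs + device costs + channel costs."""
    return (sum(c.cost for c in pes)
            + sum(c.cost for c in devices)
            + sum(c.cost for c in channels))


# Illustrative figures only, not taken from the lecture.
pes = [Component("cpu_a", 10.0), Component("cpu_b", 4.0)]
devices = [Component("timer", 1.0)]
channels = [Component("bus", 2.0)]
print(total_cost(pes, devices, channels))
```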
Wolf’s Heuristic Algorithm: Details
• Four major steps in co-design
  • Partitioning: dividing the specification into smaller parts (e.g. processes)
  • Allocation: assigning each process to a multiprocessor node (PE)
  • Scheduling: serializing the processes assigned to each PE
  • Mapping: selecting a particular component for each PE
• Problem: these steps (especially allocation, scheduling, and mapping) have a circular relationship
• Solution: break the loop
Wolf’s Heuristic Algorithm: Details (cont’d)
• Wolf:
  1. Give an initial allocation
  2. Refine it to reduce cost
• Order of satisfying design criteria:
  1. Satisfy all deadlines
  2. Minimize PE cost
  3. Minimize comm. port cost
  4. Minimize device cost
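One way to read the priority order above is as a lexicographic comparison: a design is better if it misses fewer deadlines, then if its PE cost is lower, and so on. The sketch below illustrates that reading with Python tuple ordering; the `design_key` helper and the candidate figures are hypothetical, and the slides do not state that the algorithm is implemented this way.

```python
def design_key(missed_deadlines, pe_cost, port_cost, device_cost):
    """Tuples compare element by element, so earlier criteria dominate later ones."""
    return (missed_deadlines, pe_cost, port_cost, device_cost)


# Illustrative candidates only.
candidates = {
    "A": design_key(0, 12, 3, 2),
    "B": design_key(0, 10, 5, 4),  # cheaper PEs win despite costlier ports
    "C": design_key(1, 6, 1, 1),   # cheapest overall, but misses a deadline
}

best = min(candidates, key=candidates.get)
print(best)
```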
Wolf’s Heuristic Algorithm: Details (cont’d)
• First ignore communication costs; take them into account later.
• Steps:
  1. Create an initial feasible solution and perform an initial scheduling on it.
     • Initial feasible solution: assign each process to a separate PE
  2. Reallocate processes to PEs to minimize total PE cost.
     • Possibly eliminate PEs from the initial feasible solution
  3. Reallocate processes again to minimize the amount of communication required between PEs.
  4. Allocate communication channels.
  5. Allocate I/O devices (internal or external to PEs).
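Step 1 can be sketched as follows. The slides only say that each process gets its own PE; choosing the fastest PE type for each process is an assumption made here so that the starting point is as likely as possible to meet deadlines. The `RUNTIME` table and the PE type names are hypothetical.

```python
# Hypothetical run-time table: RUNTIME[process][pe_type].
RUNTIME = {
    "p1": {"fast": 2, "slow": 5},
    "p2": {"fast": 3, "slow": 4},
    "p3": {"fast": 4, "dsp": 1},
}


def initial_allocation(runtime):
    """Give every process its own PE instance (one PE per process)."""
    allocation = {}
    for index, process in enumerate(sorted(runtime)):
        per_type = runtime[process]
        # Assumption: start from the PE type with the shortest run-time.
        fastest_type = min(per_type, key=per_type.get)
        allocation[process] = (f"pe{index}", fastest_type)
    return allocation


allocation = initial_allocation(RUNTIME)
print(allocation)
```

The later steps then try to shrink this deliberately over-provisioned solution.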
Wolf’s Heuristic Algorithm: Details (cont’d)
• The most important step: 2. Initial reallocation
  • Reason: PE cost is the dominant hardware cost
• Initial reallocation
  1. PE cost reduction:
     1.1 Scan the PEs, starting with the least-utilized PE.
     1.2 Try to reallocate that PE’s processes to other existing PEs.
     1.3 If no process is left on the PE, eliminate it; otherwise replace the PE with a suitable lower-cost one.
  2. Pair-wise merge: merge a pair of PEs into a single, more powerful one.
  3. Load balancing
Wolf’s Heuristic Algorithm: Details (cont’d)
• Initial reallocation (cont’d)
  • The “PE cost reduction” phase tries to reallocate multiple processes at a time
  • The above three phases are repeated as far as possible
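The PE cost reduction phase can be illustrated with a simplified sketch. It assumes every process has the same run-time on every PE and that a PE is feasible when the run-times placed on it sum to at most one shared period; both are simplifications made here, not details from the slides. The function name and the data are hypothetical. All processes of a PE are moved together or not at all, which mirrors the multiple-move idea.

```python
def reduce_pe_count(allocation, runtime, period):
    """Scan PEs from least to most utilized and try to empty each one."""
    alloc = {pe: list(procs) for pe, procs in allocation.items()}

    def load(pe):
        return sum(runtime[p] for p in alloc[pe])

    for pe in sorted(alloc, key=load):        # least-utilized PE first
        extra = {other: 0 for other in alloc if other != pe}
        moves = {}
        for proc in alloc[pe]:
            # First other PE that still has room for this process.
            target = next((o for o in extra
                           if load(o) + extra[o] + runtime[proc] <= period),
                          None)
            if target is None:
                break                         # cannot empty this PE: keep it
            moves[proc] = target
            extra[target] += runtime[proc]
        else:
            for proc, target in moves.items():
                alloc[target].append(proc)
            del alloc[pe]                     # no process left: eliminate PE
    return alloc


runtime = {"a": 2, "b": 3, "c": 4, "d": 1}
start = {"pe0": ["a"], "pe1": ["b"], "pe2": ["c"], "pe3": ["d"]}
result = reduce_pe_count(start, runtime, period=6)
print(result)
```

Replacing a surviving PE with a cheaper type (step 1.3) is left out of this sketch.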
Wolf’s Heuristic Algorithm: Experimental Results

Example  #processes  Period  Impl. cost      CPU time (sec)
                             Wolf    P&P     Wolf    P&P
pp1      4           2.5     14      14      0.05    11
                     3       14      13      0.05    24
                     4       7       7       0.05    28
                     7       5       5       0.05    37
pp2      9           5       15      15      0.7     3732
                     6       12      12      1.1     26710
                     7       8       8       1.6     32320
                     8       8       7       1.0     4511
                     15      5       5       1.1     385012
Wolf’s Heuristic Algorithm: Experimental Results (cont’d)
• Finds optimal solutions to most of the ILP-solved examples
• Finds near-optimal solutions for the remaining examples
• Showed good results on larger examples
• Requires very little run-time
  • Due to the multiple-move strategy during the PE cost minimization phase
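The first two claims can be checked against the implementation-cost columns of the results table. The sketch below copies those figures and treats the P&P column as the ILP reference, which is an assumption based on the phrase "ILP-solved examples".

```python
# (example, period, Wolf cost, P&P cost), copied from the results table.
rows = [
    ("pp1", 2.5, 14, 14),
    ("pp1", 3, 14, 13),
    ("pp1", 4, 7, 7),
    ("pp1", 7, 5, 5),
    ("pp2", 5, 15, 15),
    ("pp2", 6, 12, 12),
    ("pp2", 7, 8, 8),
    ("pp2", 8, 8, 7),
    ("pp2", 15, 5, 5),
]

# Rows where the heuristic reaches the reference cost.
matches = sum(1 for _, _, wolf, pp in rows if wolf == pp)
# Largest cost difference in the remaining rows.
worst_gap = max(wolf - pp for _, _, wolf, pp in rows)

print(f"{matches} of {len(rows)} rows match; worst gap = {worst_gap}")
```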
Co-Synthesis Algorithms: Distributed System Co-Synthesis
Wolf’s Heuristic Algorithm for Object-Oriented Models
Introduction
• Target
  • Co-synthesis of a distributed system out of an Object-Oriented specification
• Significance
  • OO is a promising approach to designing embedded systems at ESL
Reference:
W. Wolf, “Object-Oriented Co-Synthesis of Distributed Embedded Systems,” ACM Transactions on Design Automation of Electronic Systems, pp. 301-314, 1996.
OO Co-Synthesis Algorithm
Again, our eight topics:
• System Specification Language/Model
• Target Architecture
• Functionality (Allocation/Scheduling) Quantum
• Allocation Strategy
• Scheduling Strategy
• Cost Estimation
• Performance Estimation
• Algorithm Details
OO Co-Synthesis Algorithm (cont’d)
• System Specification Model/Language
  • An Object-Oriented specification as input
  • Method dataflow graph as model

Object O1
  method m1: variables v1, v2
  method m2: variables v2, v3
Object O2
  method m4: variables v10, v20
Object O3
  method m3: variables v8, v9
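The example specification above can be captured in a small data structure. This is a minimal sketch: the `Method` and `Obj` classes and the `shared_variables` helper are hypothetical names, and only the object, method, and variable names come from the slide. The helper shows why the methods of O1 are natural to keep together: they touch a common variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    variables: frozenset


@dataclass(frozen=True)
class Obj:
    name: str
    methods: tuple


spec = [
    Obj("O1", (Method("m1", frozenset({"v1", "v2"})),
               Method("m2", frozenset({"v2", "v3"})))),
    Obj("O2", (Method("m4", frozenset({"v10", "v20"})),)),
    Obj("O3", (Method("m3", frozenset({"v8", "v9"})),)),
]


def shared_variables(obj):
    """Variables used by more than one method of the same object."""
    seen, shared = set(), set()
    for method in obj.methods:
        shared |= seen & method.variables
        seen |= method.variables
    return shared


for obj in spec:
    print(obj.name, sorted(shared_variables(obj)))
```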
OO Co-Synthesis Algorithm (cont’d)
• Target Architecture
  • Distributed system: an arbitrary-topology network of PEs
• Functionality Quantum
  • Methods of objects in an OO specification
  • As far as possible, keeps all methods of an object together
  • Partitioning is done during algorithm execution
OO Co-Synthesis Algorithm (cont’d)
• Cost and Performance Estimation
  • Pre-specified: a technology description of the available components is input to the algorithm
• Allocation, Scheduling, and Algorithm Details
  • Much like Wolf’s previous heuristic algorithm
  • Includes modifications in order to:
    • handle large sets of methods
    • consider the effects of splitting objects across PEs
OO Co-Synthesis Algorithm (cont’d)
• Allocation, Scheduling, and Algorithm Details
  1. Initial allocation and scheduling. Allocate processes to PEs such that all tasks are placed on PEs fast enough to ensure that all deadlines are met, keeping objects together as much as possible.
  2. Minimize PE cost. Reallocate processes to PEs to minimize PE cost, splitting objects when necessary.
  3. Minimize communication. Reallocate processes again to minimize inter-PE communication, taking into account the traffic generated by splitting objects across PEs.
OO Co-Synthesis Algorithm (cont’d)
• Allocation, Scheduling, and Algorithm Details (cont’d)
  4. Allocate channels. Allocate communication channels.
  5. Allocate devices. Allocate devices either as on-chip devices or as external devices on communication channels.
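Step 3 has to account for the traffic created when an object is split. A rough way to see that traffic: two methods that share a variable need inter-PE communication once they are placed on different PEs. The sketch below lists such pairs; the `split_traffic` function and the two placements are hypothetical, and a real estimate would also weight each shared variable by its size and access rate.

```python
from itertools import combinations

# Methods and the variables they use (m1 and m2 belong to the same object).
methods = {
    "m1": {"v1", "v2"},
    "m2": {"v2", "v3"},
    "m3": {"v8", "v9"},
}


def split_traffic(methods, placement):
    """List method pairs whose shared variables cross a PE boundary."""
    crossings = []
    for a, b in combinations(sorted(methods), 2):
        common = methods[a] & methods[b]
        if common and placement[a] != placement[b]:
            crossings.append((a, b, sorted(common)))
    return crossings


together = {"m1": "pe0", "m2": "pe0", "m3": "pe1"}  # object kept intact
split = {"m1": "pe0", "m2": "pe1", "m3": "pe1"}     # object split across PEs

print(split_traffic(methods, together))
print(split_traffic(methods, split))
```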
OO Co-synthesis Details
• Step 1 (initial allocation)
  • One PE per object
• Step 2 (minimize PE cost)
  • oo_balance_load(): tries to redistribute methods to better balance the system load
  • PE_replacement(): uses a cheaper PE without disturbing the allocation
  • oo_pairwise_merge(): tries to eliminate a PE by moving its methods to other PEs
  • Step 2 is done repeatedly
  • Methods are re-scheduled after each new allocation
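The control flow of step 2 can be sketched as a loop that applies the passes in turn, re-schedules after each new allocation, and stops when no pass lowers the cost. Everything below is a toy stand-in: `PERIOD`, the load figures, the uniform PE price (`cost=len`), and the pass bodies are hypothetical, and only the merge pass is actually modelled.

```python
PERIOD = 10  # hypothetical deadline shared by all PEs


def pairwise_merge(alloc):
    """Toy pass: merge the two least-loaded PEs when their load fits."""
    if len(alloc) < 2:
        return alloc
    a, b = sorted(alloc, key=lambda pe: sum(alloc[pe]))[:2]
    if sum(alloc[a]) + sum(alloc[b]) > PERIOD:
        return alloc
    merged = {pe: list(loads) for pe, loads in alloc.items() if pe != a}
    merged[b] = merged[b] + alloc[a]
    return merged


def no_change(alloc):
    """Placeholder for passes not modelled here."""
    return alloc


def minimize_pe_cost(alloc, passes, cost, reschedule):
    """Repeat the passes until none of them lowers the cost."""
    best = cost(alloc)
    improved = True
    while improved:
        improved = False
        for refine in passes:
            candidate = refine(alloc)
            reschedule(candidate)  # methods are re-scheduled every time
            if cost(candidate) < best:
                alloc, best, improved = candidate, cost(candidate), True
    return alloc


schedules = []  # records every allocation handed to the scheduler
result = minimize_pe_cost(
    {"pe0": [4], "pe1": [3], "pe2": [2], "pe3": [6]},
    passes=[no_change, no_change, pairwise_merge],
    cost=len,
    reschedule=schedules.append,
)
print(result)
```

Because the scheduler runs inside the innermost loop, its efficiency dominates the overall run-time.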
OO Co-synthesis Details (cont’d)
Note: This operation may cause “hidden communication”.
OO Co-Synthesis Algorithm (cont’d)
• Experimental Results
  • Algorithm implemented in C++
    • Using the NIH class library
    • 8600 lines of code
    • Executed on an SGI Indigo workstation
  • Algorithm applied to examples from software engineering books on OO design

Example  #objects/methods  CPU Time
cfuge    2/3               0.05
dye      3/15              2.0
juice    3/4               0.05
train    5/6               0.05

Reason for the highest CPU time (dye):
• It has the most methods, and scheduling is required in each inner loop of step 2
• This implementation had a simple, inefficient scheduler
OO Co-Synthesis Algorithm (cont’d)
• Main contribution
  • An OO specification is an important aid to automatic partitioning
    • The specification is naturally divided into two levels of granularity
      • A system is composed of objects
      • Objects are composed of data members and methods
  • The heuristic: preserve the specification’s partitioning as much as possible
What we learned today
• Distributed System Co-Synthesis
  • A heuristic approach
    • Non-OO algorithm
    • Customization to OO specifications
    • Heuristic: first minimize the PE cost, since it is the dominant factor