A New Approach for Task Level Computational Resource Bi-Partitioning Gang Wang, Wenrui Gong, Ryan Kastner Express Lab, Dept. of ECE, University of California,


University of California, Santa Barbara

Overview
- Resource Partitioning Problem
- Ant System (AS) Heuristic
- AS for Task Level Resource Partitioning
- Experiment Results
- Future Work

Resource Partitioning Problem (1)
- Heterogeneous architectures are becoming increasingly popular
- Partitioning is a fundamental challenge:
  - Automatically assign the application onto different computational resources
  - Optimize system performance under constraints
- Two-resource case: hardware/software co-design

Resource Partitioning Problem (2)
- NP-hard
- Different heuristic methods have been developed:
  - Simulated Annealing
  - Genetic Algorithms
  - Tabu Search
  - Expert Systems
  - Kernighan/Lin



Ant System Heuristic (1)
- First introduced for optimization problems by [Dorigo et al. 1996]
- Inspired by ethological studies of the behavior of ants [Goss et al. 1989]
- A metaheuristic
- A multi-agent cooperative search method
- A new way of combining global and local heuristics

Ant System Heuristic (2)









Key Observations
- Autocatalytic effect
- Indirect communication (stigmergy):
  - Ants deposit pheromones on the ground to differentiate the quality of the paths
  - Pheromone trails encode a long-term global memory about the search process
  - When ants reach a decision, they are biased (possibly probabilistically) by the amount of pheromone



AS Algorithm for HW/SW Co-Design
- Problem: for a given application, find the optimal resource partition under certain system constraints
  - Task-level abstraction
  - Each task can be mapped to either the GPP or the configurable logic
  - Prior knowledge about the computational resources is available

Modeling the Task/Resource Partitioning Problem
- The application is modeled as a task graph (DAG)
- Sequential scheduling (not pipelined)

[Figure: example task graph with virtual source node t0, task nodes t1 to t8, and virtual sink node tn]
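A minimal sketch of how such a task graph might be stored as successor lists and put into topologically sorted order (Kahn's algorithm). The edges in `TASK_GRAPH` are hypothetical, since the figure's wiring does not survive; only the node names t0, t1 to t8, and tn come from the slide.

```python
from collections import deque
from typing import Dict, List


def topological_order(succ: Dict[str, List[str]]) -> List[str]:
    """Kahn's algorithm: return the nodes of a DAG in topologically sorted order."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    indegree = {v: 0 for v in nodes}
    for vs in succ.values():
        for v in vs:
            indegree[v] += 1
    # Start from nodes with no predecessors (sorted for a deterministic result).
    ready = deque(sorted(v for v in nodes if indegree[v] == 0))
    order: List[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle, so it is not a DAG")
    return order


# Hypothetical wiring: illustrative only, not the edges of the original figure.
TASK_GRAPH = {
    "t0": ["t1"],
    "t1": ["t2", "t3"],
    "t2": ["t4", "t5"],
    "t3": ["t5"],
    "t4": ["t6"],
    "t5": ["t6"],
    "t6": ["t7"],
    "t7": ["t8"],
    "t8": ["tn"],
}

print(topological_order(TASK_GRAPH))
```

Visiting nodes in this order is what later guarantees that every inbound edge of a node has been examined before the node itself is reached.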

Partitioning as Graph Bi-coloring
- Tasks 1, 2, 7, and 8 are assigned to the GPP
- Tasks 3, 4, and 6 are assigned to the configurable logic
- The inbound edges are colored accordingly
- We don't care about the coloring of the virtual nodes t0 and tn
- We don't care about the coloring of edge e8n

[Figure: the example task graph bi-colored; legend: GPP, color c1; configurable logic, color c2]

Partitioning as Graph Bi-coloring
- Each computing resource is assigned a color c_k
- Each edge e_ij is associated with a set of global heuristics (pheromone trails) τ_ij(k) indicating the favorableness of coloring t_j with c_k
- A coherent coloring is defined as one in which:
  - Each task node in the DAG is colored
  - All the inbound edges of a task node have the same color as that task node
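The two coherence conditions can be checked mechanically. A minimal sketch, assuming node colors and edge colors are kept in separate dicts (`node_color`, `edge_color`, and `is_coherent` are hypothetical names) and that edges into the virtual sink are skipped because their coloring is a don't-care:

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def is_coherent(succ: Dict[str, List[str]],
                node_color: Dict[str, int],
                edge_color: Dict[Edge, int],
                virtual: Tuple[str, ...] = ("t0", "tn")) -> bool:
    """Return True if (1) every non-virtual task node is colored and
    (2) every inbound edge of such a node carries the node's color."""
    nodes = set(succ) | {v for vs in succ.values() for v in vs}
    tasks = nodes - set(virtual)
    if any(t not in node_color for t in tasks):
        return False
    for u, vs in succ.items():
        for v in vs:
            if v in virtual:
                continue  # edges into the virtual sink are don't-cares
            if edge_color.get((u, v)) != node_color[v]:
                return False
    return True
```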

AS Algorithm for Resource Partitioning (1)
1. Initially, assign every edge in the task graph a fixed pheromone τ0 for both colors c1 and c2, where c1 corresponds to the GPP and c2 to the configurable logic;
2. Put m ants on t0;
3. Each ant traverses the task graph to create a feasible bi-coloring solution s_i, where i = 1, ..., m;
4. Evaluate all m solutions. The quality of a solution s is measured by its overall execution time time(s). Among all solutions, find the best solution s_best, which has the minimum execution time and satisfies the configurable logic area constraint;
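Step 4 reduces to a filtered minimum. A minimal sketch, assuming each solution is a task-to-color map and that `time_of` and `area_of` are user-supplied cost functions (hypothetical names; the slides do not specify the timing model). The toy additive cost model below is also an assumption and ignores concurrency:

```python
from typing import Callable, Dict, List, Optional

Coloring = Dict[str, int]  # task -> color (1 = GPP, 2 = configurable logic)


def select_best(solutions: List[Coloring],
                time_of: Callable[[Coloring], float],
                area_of: Callable[[Coloring], float],
                area_limit: float) -> Optional[Coloring]:
    """Keep solutions that fit the configurable-logic area budget and return
    the one with the smallest execution time (None if none fits)."""
    feasible = [s for s in solutions if area_of(s) <= area_limit]
    return min(feasible, key=time_of, default=None)


# Hypothetical toy cost model: per-task times summed over all tasks,
# area counted only for tasks placed on the configurable logic (color 2).
TASK_TIME = {("a", 1): 4.0, ("a", 2): 1.0, ("b", 1): 3.0, ("b", 2): 1.0}
TASK_AREA = {"a": 5.0, "b": 5.0}


def toy_time(s: Coloring) -> float:
    return sum(TASK_TIME[(t, c)] for t, c in s.items())


def toy_area(s: Coloring) -> float:
    return sum(TASK_AREA[t] for t, c in s.items() if c == 2)


ants = [{"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 2, "b": 2}]
print(select_best(ants, toy_time, toy_area, area_limit=6.0))  # {'a': 2, 'b': 1}
```

The all-hardware solution is fastest but exceeds the area budget, so the mixed partition wins.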

AS Algorithm for Resource Partitioning (2)
5. Update the pheromone for each color on the edges as follows:

$$\tau_{ij}(k) \leftarrow (1 - \rho)\,\tau_{ij}(k) + \Delta\tau_{ij}(k) \qquad (1)$$

where 0 < ρ < 1 is the evaporation ratio, which helps the search escape from local minima, k = 1 or 2, and

$$\Delta\tau_{ij}(k) = \begin{cases} Q / \mathrm{time}(s_{best}) & \text{if } e_{ij} \text{ is colored with } c_k \text{ in } s_{best} \\ 0 & \text{otherwise} \end{cases}$$

6. If the ending condition is reached, stop and report the best solution found. Otherwise, go to step 2.
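Equation (1) maps directly onto a loop over the pheromone table. A minimal sketch, assuming pheromones are stored in a dict keyed by (edge, color) and that s_best is given as an edge-to-color map; the function name and the defaults for ρ (`rho`) and Q (`q`) are placeholders, not values from the slides:

```python
from typing import Dict, Tuple

Edge = Tuple[str, str]
Pheromone = Dict[Tuple[Edge, int], float]  # (edge e_ij, color k) -> tau_ij(k)


def update_pheromone(tau: Pheromone,
                     best_edge_colors: Dict[Edge, int],
                     best_time: float,
                     rho: float = 0.1,
                     q: float = 1.0) -> None:
    """Apply tau_ij(k) <- (1 - rho) * tau_ij(k) + delta_ij(k) in place, with
    delta_ij(k) = Q / time(s_best) if e_ij has color c_k in s_best, else 0."""
    if not 0.0 < rho < 1.0:
        raise ValueError("evaporation ratio must satisfy 0 < rho < 1")
    deposit = q / best_time
    for (edge, k), value in list(tau.items()):
        delta = deposit if best_edge_colors.get(edge) == k else 0.0
        tau[(edge, k)] = (1.0 - rho) * value + delta
```

Every entry evaporates, but only the colors used by s_best are reinforced, and faster solutions (smaller time(s_best)) deposit more.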

Step 3: How to Construct an Individual Coloring
- Each ant traverses the graph in topologically sorted order
  - This guarantees that every inbound edge of the current node has already been examined
- At each node, the ant will:
  - Make guesses for the coloring of the successor nodes
  - Make a decision on the coloring of the current node
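A minimal sketch of one ant's traversal, assuming the graph is given as successor lists plus a topological order, and that `guess_prob(i, j)` is a hypothetical callback returning the per-color guess probabilities of equation (2). Since each immediate predecessor posts exactly one guess, drawing one of the collected guesses uniformly at random yields the decision probabilities of equation (4):

```python
import random
from typing import Callable, Dict, List, Sequence


def construct_coloring(order: List[str],
                       succ: Dict[str, List[str]],
                       guess_prob: Callable[[str, str], Sequence[float]],
                       rng: random.Random,
                       colors: Sequence[int] = (1, 2)) -> Dict[str, int]:
    """One ant's pass over the task graph.

    `order` must be topological, so when a node is entered all of its
    immediate predecessors have already posted their guesses for it."""
    guesses: Dict[str, List[int]] = {v: [] for v in order}
    coloring: Dict[str, int] = {}
    for node in order:
        if guesses[node]:
            # Decide the current node: picking one predecessor's guess uniformly
            # selects color k with probability (#guesses for k) / (#predecessors).
            coloring[node] = rng.choice(guesses[node])
        for nxt in succ.get(node, []):
            # Guess the successor's color with the probabilities from guess_prob.
            probs = guess_prob(node, nxt)
            guesses[nxt].append(rng.choices(list(colors), weights=probs, k=1)[0])
    return coloring
```

The virtual source t0 receives no guesses and stays uncolored; the sink tn does receive a color, which can be ignored since its coloring is a don't-care. Each inbound edge implicitly takes the color of the node it enters.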

Make Guesses for the Successor Task Nodes
- At task node t_i, the ant guesses the coloring of each successor node t_j:
  - τ_ij(k): global heuristic on coloring t_j with c_k
  - η_j(k): local heuristic on coloring t_j with c_k

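$$p_{ij}(k) = \frac{\tau_{ij}^{\alpha}(k)\,\eta_{j}^{\beta}(k)}{\sum_{l=1,2}\tau_{ij}^{\alpha}(l)\,\eta_{j}^{\beta}(l)} \qquad (2)$$

$$\eta_{j}(k) = \frac{1}{\mathrm{cost}(j,k)} = \frac{1}{w_t\,\mathrm{time}(j,k) + w_a\,\mathrm{area}(j,k)} \qquad (3)$$

where α and β set the relative influence of the global and local heuristics, and w_t and w_a weight the execution time and the area of running t_j on c_k.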
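Equations (2) and (3) can be sketched as two small functions. The default values of α, β, w_t, and w_a below are assumptions for illustration; the slides do not give them. Because the inputs are per-color lists, the same code also covers the multi-way case:

```python
from typing import List, Sequence


def local_heuristic(time: float, area: float,
                    w_t: float = 1.0, w_a: float = 1.0) -> float:
    """eta_j(k) = 1 / cost(j, k) = 1 / (w_t * time(j, k) + w_a * area(j, k))."""
    cost = w_t * time + w_a * area
    if cost <= 0:
        raise ValueError("cost must be positive")
    return 1.0 / cost


def guess_probabilities(tau: Sequence[float], eta: Sequence[float],
                        alpha: float = 1.0, beta: float = 1.0) -> List[float]:
    """p_ij(k) proportional to tau_ij(k)^alpha * eta_j(k)^beta, normalised over
    all colors l; tau[k] and eta[k] refer to the same color c_k."""
    weights = [(t ** alpha) * (e ** beta) for t, e in zip(tau, eta)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical numbers: t_j takes 4 time units on the GPP (no area) and
# 1 time unit plus 3 area units on the configurable logic.
eta = [local_heuristic(4.0, 0.0), local_heuristic(1.0, 3.0)]
print(guess_probabilities([1.0, 3.0], eta))  # [0.25, 0.75]
```

Here the local heuristic is equal for both colors, so the pheromone ratio 1:3 alone decides the guess probabilities.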

Make a Decision on the Coloring of the Current Node
- Upon entering a new task node t_i, the ant decides the coloring of t_i:
  - Probabilistically, based on the guesses made by all the immediate predecessors of t_i
  - The inbound edges are colored correspondingly once this decision is made

$$p_{i}(k) = \frac{\text{count of guesses of } c_k \text{ for } t_i}{\text{count of immediate predecessors of } t_i} \qquad (4)$$
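Equation (4) normalizes vote counts. A minimal sketch, assuming the guesses a node received are collected in a list of color indices (`decision_probabilities` is a hypothetical name):

```python
from collections import Counter
from typing import Dict, List, Sequence


def decision_probabilities(guesses: List[int],
                           colors: Sequence[int] = (1, 2)) -> Dict[int, float]:
    """p_i(k) = (# guesses of c_k for t_i) / (# immediate predecessors of t_i).

    Each immediate predecessor contributes exactly one guess, so
    len(guesses) equals the number of immediate predecessors."""
    if not guesses:
        raise ValueError("node has no immediate predecessors")
    counts = Counter(guesses)
    return {k: counts[k] / len(guesses) for k in colors}
```

For example, a node whose three predecessors guessed c1, c2, c1 is colored c1 with probability 2/3.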

[Figure sequence: ants traverse the example task graph (t0 to tn) step by step, building solutions S1 ... SL. Find the best solution and update the pheromone trails based on the solution's quality. Next iteration.]

Extensibility
- Easy to extend to multi-way partitioning
- Different performance/constraint pairs
- Different task-level cost models



Experiment System (1)
- The target system contains:
  - One GPP (PowerPC 405 RISC)
  - One configurable logic device (Xilinx Virtex II with 1232 CLBs)
- Sequential scheduling:
  - Precedence levels have to be respected
  - Tasks without precedence constraints can run concurrently if the resource partitioning allows it

Experiment System (2)
- Testing benchmark:
  - DAGs of different sizes are generated randomly with an average branching factor of 5
  - Real functions (in C/C++) extracted from the MediaBench suite are mapped onto the task nodes
  - Tasks are analyzed using the SUIF and Machine SUIF tools to obtain a detailed CDFG-level description
  - Simplified communication interface between tasks
- Goal: find the optimal resource partition that achieves the best worst-case execution time under the FPGA area constraint
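The slides do not describe the DAG generator itself. Purely as an illustration, one simple way to build a random DAG with a target branching factor is to let each task pick successors only among later tasks, which rules out cycles by construction; `random_dag` and its parameters are hypothetical:

```python
import random
from typing import Dict, List


def random_dag(n_tasks: int, branching: float,
               rng: random.Random) -> Dict[str, List[str]]:
    """Hypothetical generator (not the one used in the experiments).

    Tasks t1..t<n_tasks> are kept in a fixed order and each task draws its
    successors only among later tasks, so no cycle can form. Out-degrees are
    drawn around `branching`, but tasks near the end have few candidates, so
    the realized average is somewhat lower. A virtual source t0 feeds every
    task without a predecessor, and a virtual sink tn collects every task
    without a successor."""
    names = [f"t{i}" for i in range(1, n_tasks + 1)]
    succ: Dict[str, List[str]] = {v: [] for v in names}
    for i, u in enumerate(names):
        later = names[i + 1:]
        k = min(len(later), max(0, round(rng.gauss(branching, 1.0))))
        succ[u] = sorted(rng.sample(later, k), key=names.index)
    has_pred = {v for vs in succ.values() for v in vs}
    succ["t0"] = [v for v in names if v not in has_pred]
    for u in names:
        if not succ[u]:
            succ[u] = ["tn"]
    return succ


graph = random_dag(25, 5, random.Random(0))
```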

Evaluating the AS Algorithm
- Compare the AS results with:
  - Brute-force search
    - Offers a definitive measurement of quality
  - Theoretical performance of random sampling
    - Helps to filter out easy test cases
  - Simulated annealing
    - Popularly used
    - Allows much bigger problem sizes
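For a random-sampling baseline, the standard calculation is: if g of the N possible partitions count as a hit, then n independent uniform samples find at least one hit with probability 1 - (1 - g/N)^n. A minimal sketch (the function name is hypothetical, and the slides do not state exactly how their random-sampling figures were computed):

```python
import math


def random_sampling_hit_probability(good: int, space: int, samples: int) -> float:
    """P(at least one of `samples` uniform draws, with replacement, from
    `space` candidate partitions is one of the `good` ones)
    = 1 - (1 - good/space) ** samples, computed with log1p/expm1 so that
    tiny probabilities are not lost to floating-point rounding."""
    if not 0 <= good <= space:
        raise ValueError("need 0 <= good <= space")
    if good == space:
        return 1.0
    return -math.expm1(samples * math.log1p(-good / space))
```

A test case on which this probability is already high for a realistic sample budget is "easy" in the sense of the slide: random sampling alone would solve it.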

Experiment Settings
- Each DAG has 25 task nodes: over 33 million (2^25 = 33,554,432) possible assignments
- 50 test instances were generated originally
- After filtering out the "easy" cases using brute-force search, 25 difficult test cases were left
- The number of ants is set to 5, equal to the average branching factor of the task graph
- The AS algorithm is forced to stop after 100 iterations in each run
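Brute-force search, the definitive baseline here, simply enumerates every assignment. A minimal sketch, assuming user-supplied `time_of` and `area_of` cost functions (hypothetical names). For 25 tasks and 2 colors the loop visits 2^25 = 33,554,432 assignments, and passing `colors=(1, 2, 3)` gives the 3-way case:

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Coloring = Dict[str, int]


def brute_force_best(tasks: List[str],
                     time_of: Callable[[Coloring], float],
                     area_of: Callable[[Coloring], float],
                     area_limit: float,
                     colors: Sequence[int] = (1, 2)
                     ) -> Tuple[Optional[Coloring], int]:
    """Enumerate all len(colors) ** len(tasks) assignments; return the fastest
    one within the area budget and the number of assignments examined."""
    best: Optional[Coloring] = None
    best_time = float("inf")
    examined = 0
    for combo in product(colors, repeat=len(tasks)):
        examined += 1
        s = dict(zip(tasks, combo))
        if area_of(s) <= area_limit:
            t = time_of(s)
            if t < best_time:
                best, best_time = s, t
    return best, examined
```

The exhaustive count grows as 2^n, which is why this baseline is only practical for the 25-node benchmarks and not for larger problem sizes.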

Typical ant search run

Result Quality Assessment (I)
- 91.7% of the AS results are within the top 3%
- 77% of the AS results are within the top 2%
- 63.5% of the AS results are within the top 0.1%

Result Quality Assessment (II)
- The absolute performance of the majority of the results found by AS is within 10% of the optimal

Result Quality Assessment (III)
- Ability to find one of the optimal partitions: 460 times in 2,500 instances (18.4%)
  - By comparison, a random sampling approach with the same computation time has only an 8.5E-7 chance
- For a significant portion (>20%) of the tested examples, AS discovers the optimal partition with probability > 1/2

Result Quality Assessment (IV): Multi-way & SA
- Extended to the 3-way partitioning problem
- 33 difficult test cases
- 3^25 possible partitions
- SA-50 has a run time comparable to AS
- SA-500 and SA-1000 take 10 and 20 times as long, respectively

Contributions
- For the first time, introduced the AS heuristic to the HW/SW co-design problem
- Constructed a novel AS algorithm that achieves robust results, qualitatively close to the optimal, with minor computational cost on the test benchmark
- Provided a definitive quality assessment by comparing the proposed algorithm with theoretical random-sampling results
- Experiments show that the proposed algorithm surpasses the popularly used SA heuristic

Future Work
- Extend to the multi-way resource partitioning problem
- More comprehensive comparison with other heuristic methods (such as GA and Tabu Search)
- Hybrid approaches (e.g., AS followed by SA)
- Apply to more realistic and complex system models, e.g., a more realistic communication model
- Extend AS from static partitioning to the dynamic partitioning problem (truly reconfigurable)

Thank you for your attention. Questions?
