R. Arce-Nazario, M. Jimenez, and D. Rodriguez Electrical and Computer Engineering University of Puerto Rico – Mayagüez WALSAIP


Motivation and Objective

Discrete Signal Transforms (DSTs)
- DFT, DCT; lots of applications
- Hardware accelerated, but at high area cost

Distributed (dedicated) hardware architectures (DHAs)
- Cost-effective
- Partitioning plays a key role

Objective: Use inherent properties of DSTs to improve their hardware partitioning to distributed hardware architectures.

[Figure: DST partitioning onto a DHA]


Previous Work

Automated partitioning of DSTs to DHAs
- DSTs treated as any other algorithm/benchmark [Srinivasan01][Bringmann00]
- Converted to a high-level or structural DFG and treated as such

Manual partitioning and automated code generation
- DST-specific properties exploited [Kumhom01]
- New formulations developed to exploit architectural features [VanLoan92]
- SPIRAL and FFTW: code generation platforms exploring the space of equivalent algorithms ([Pueschel05], [Frigo05])

[Arce05]: Automated partitioning methodology that incorporates DST features and formulation exploration


Partitioning Methodology

[Figure: block diagram of the methodology. Inputs: KPA DST Formulation, Architectural Description. Blocks: Formulation Manipulator, Formulation to DFG, Partition/Placement, Estimators, Heuristic Control. Labels on the connections: KPA Formulation, DFG, Hypergraph Representation, Cost and Indicators, Rule Selection. Output: High-level partition solution.]


DSTs – General Concepts

General formula for a d-dimensional DST:

$$X[k_1, \ldots, k_d] = \sum_{n_1} \cdots \sum_{n_d} x[n_1, \ldots, n_d]\, \alpha_1(n_1, k_1) \cdots \alpha_d(n_d, k_d)$$

The α's determine the type of transform, e.g. DFT: $\alpha_i(n_i, k_i) = e^{-j 2\pi n_i k_i / N_i}$

- Essentially a vector-matrix multiplication
- Fast versions exist, using divide-and-conquer techniques
  - Highly regular
  - Highly connected
- Rules can be applied at the formulation level: permutation, index-set, ..

Example: $F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$

[Figure: dataflow graph of $F_8$ with stages $R_8$, $(I_4 \otimes F_2)$, $(I_2 \otimes F_2 \otimes I_2)\, T_0$, $(F_2 \otimes I_4)\, T_1$]
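As a minimal sketch of how the kernel α selects the transform, the 1-D case of the general formula can be evaluated directly as a vector-matrix product. The names `transform`, `dft_kernel`, and `dct2_kernel` are illustrative only, and the unnormalized DCT-II kernel is a standard textbook choice assumed here, not one given on the slide.

```python
import cmath
import math

def dft_kernel(n, k, size):
    """DFT kernel: alpha(n, k) = exp(-j*2*pi*n*k/N)."""
    return cmath.exp(-2j * cmath.pi * n * k / size)

def dct2_kernel(n, k, size):
    """Unnormalized DCT-II kernel (assumed): alpha(n, k) = cos(pi*(n + 1/2)*k/N)."""
    return math.cos(math.pi * (n + 0.5) * k / size)

def transform(x, kernel):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * alpha(n, k)."""
    size = len(x)
    return [sum(x[n] * kernel(n, k, size) for n in range(size))
            for k in range(size)]

# The DFT of a unit impulse is flat; the DCT-II of a constant has only a k = 0 term.
print(transform([1, 0, 0, 0], dft_kernel))
print(transform([1, 1, 1, 1], dct2_kernel))
```

Swapping the kernel changes the transform while the summation structure stays the same, which is the property the formulation-level rules rely on.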


Kronecker Algebra

$F_{4 \times 4} = F_4 \otimes F_4$

$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$

[Figure: dataflow graph of the factorization, with blocks labeled F4, F2, and W]
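The factorization $F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$ can be checked numerically. The sketch below uses only the standard library and assumes the usual Cooley-Tukey definitions, which the slides do not spell out: $T_{n,p}$ is taken to be the diagonal twiddle matrix and $P_{n,p}$ the stride permutation. All function names are illustrative.

```python
import cmath

def dft(n):
    """n-point DFT matrix F_n with entries exp(-j*2*pi*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def twiddle(n, p):
    """Assumed T_{n,p}: diagonal entry w**(i*j) at position i*m + j, w = exp(-j*2*pi/n)."""
    m = n // p
    diag = [cmath.exp(-2j * cmath.pi * i * j / n) for i in range(p) for j in range(m)]
    return [[diag[r] if r == c else 0 for c in range(n)] for r in range(n)]

def stride_perm(n, p):
    """Assumed P_{n,p}: output position i*m + j reads input position j*p + i."""
    m = n // p
    mat = [[0] * n for _ in range(n)]
    for i in range(p):
        for j in range(m):
            mat[i * m + j][j * p + i] = 1
    return mat

def cooley_tukey(n, p):
    """Right-hand side (F_p (x) I_m) T_{n,p} (I_p (x) F_m) P_{n,p}, with n = p*m."""
    m = n // p
    result = kron(dft(p), identity(m))
    for factor in (twiddle(n, p), kron(identity(p), dft(m)), stride_perm(n, p)):
        result = matmul(result, factor)
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# The difference should be rounding error only.
print(max_abs_diff(cooley_tukey(8, 4), dft(8)))
```

The same check with `cooley_tukey(16, 8)` exercises the $F_{16}$ factorization used later in the formulation-exploration slides.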


Target topology

Similar to existing platforms in the market and academia:
- Annapolis Micro Systems (Wildforce)
- Gidel (PROC20KE)
- Berkeley Emulation Engine (BEE), being proposed as a cost-effective alternative to traditional high-performance computing systems

[Figure: target topology with blocks M0/D0, M1/D1, ..., Mk-1/Dk-1 and a Crossbar]


DST properties in our methodology

- Incorporated graph considerations into the partitioning/placement process (Partition/Placement block)
- Exploration of equivalent formulations (Formulation Manipulator block)

[Figure: methodology block diagram showing Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, and Estimators, connected by KPA Formulation, DFG, Cost and Indicators, and Rule Selection]


Graph partitioning considerations

- Focus on horizontal partitioning schemes (SIMD-like implementation)
- Initial solution = balanced horizontal linear partitioning
- Scheduling consideration: swap only nodes from the same computational stage

[Figure: Kernighan-Lin bipartitioning compared with heterogeneous-channel k-way partitioning on the target topology (M0/D0, M1/D1, ..., Mk-1/Dk-1, Crossbar)]
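The two graph considerations can be sketched on a simplified model. The code below assumes the DFG is a regular grid of nodes `(stage, row)`, that the number of rows is a multiple of the number of devices, and that a horizontal partition assigns contiguous bands of rows to devices; the function names are illustrative and the actual partitioner is not shown on the slides.

```python
from itertools import combinations

def initial_partition(stages, rows, devices):
    """Balanced horizontal linear partition: contiguous bands of rows per device."""
    band = rows // devices  # assumes rows is a multiple of devices
    return {(s, r): r // band for s in range(stages) for r in range(rows)}

def same_stage_swaps(partition):
    """Candidate moves: pairs of nodes in the same stage placed on different devices."""
    return [(a, b) for a, b in combinations(sorted(partition), 2)
            if a[0] == b[0] and partition[a] != partition[b]]

part = initial_partition(stages=3, rows=8, devices=4)
print(part[(0, 0)], part[(2, 7)])   # 0 3
print(len(same_stage_swaps(part)))  # 72
```

Restricting swaps to nodes of the same stage shrinks the move set compared with all cross-device pairs, which is consistent with the faster convergence reported in the conclusions.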


Formulation exploration

Formulation Manipulator
- Applies permutation and factorization rules to the Kronecker formulation of DSTs to obtain equivalent formulations

Rule (with $n = pm$): $F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}$

Example, before and after applying the rule to $F_8$:

$F_{16} = (F_8 \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$

$F_{16} = (((F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}) \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$

- The number of possible reformulations grows exponentially with DST size
- Before designing the heuristic control method, first answer two questions:
  - Do reformulations have an effect on solution quality?
  - How can we effectively explore the equivalent formulation space to find more apt formulations?
- Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence

[Figure: exploration loop with blocks Formulation Manipulator, Formulation to DFG, Partition/Placement, and Heuristic Control, connected by KPA Formulation, DFG, Cost and Indicators, and Rule Selection]
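A small symbolic sketch of the factorization rule $F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}$ follows. It only renders formulas as text, with `(x)` standing for the Kronecker product; the function `apply_rule` and its `left`/`right` parameters are hypothetical and are not the Formulation Manipulator's interface.

```python
def apply_rule(n, p, left=None, right=None):
    """Render F_n = (F_p (x) I_m) T_{n,p} (I_p (x) F_m) P_{n,p} as text.

    `left` / `right` optionally replace F_p / F_m by an already expanded formula.
    """
    m = n // p
    assert p * m == n, "p must divide n"
    a = f"({left})" if left else f"F_{p}"
    b = f"({right})" if right else f"F_{m}"
    return f"({a} (x) I_{m}) T_{{{n},{p}}} (I_{p} (x) {b}) P_{{{n},{p}}}"

coarse = apply_rule(16, 8)
fine = apply_rule(16, 8, left=apply_rule(8, 2))
print(coarse)  # (F_8 (x) I_2) T_{16,8} (I_8 (x) F_2) P_{16,8}
print(fine)    # (((F_2 (x) I_4) T_{8,2} (I_2 (x) F_4) P_{8,2}) (x) I_2) T_{16,8} (I_8 (x) F_2) P_{16,8}
```

Each nested call is one rule application, so the choice of `p` at every level is what spans the space of equivalent formulations.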


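Measuring quality of solution

Cost is defined over the weighted loads of the m channels, $W_0 R_0, W_1 R_1, \ldots, W_{m-1} R_{m-1}$, where
- $W_i$ = 'weight' of channel i
- $R_i$ = required communications through channel i

Example: W01 = W12 = W23 = 1, WXBAR = 2, giving the cost terms 4, 4, 4, 8

[Figure: two arrangements of devices D0, D1, D2, D3]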
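The arithmetic of the example can be sketched as follows. The channel weights are the ones on the slide; the communication counts are hypothetical, chosen only so that the per-channel products reproduce the terms 4, 4, 4, 8. The sketch stops at the per-channel terms and does not assume how they are combined into a single cost value.

```python
# Channel weights from the slide's example.
weights = {"01": 1, "12": 1, "23": 1, "XBAR": 2}
# Hypothetical communication counts, chosen so that W_i * R_i gives 4, 4, 4, 8.
required = {"01": 4, "12": 4, "23": 4, "XBAR": 4}

def weighted_loads(weights, required):
    """Per-channel term W_i * R_i, in the order the channels are listed."""
    return [weights[ch] * required[ch] for ch in weights]

print(weighted_loads(weights, required))  # [4, 4, 4, 8]
```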


Experiment #1 – Inter-stage permutations

- Since Cooley-Tukey's FFT, several common formulations have become available, e.g.
  $F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$
  [Pease formulation here]
- Experiment: several sizes of 5 common formulations were partitioned.
- Inter-stage permutations (ISPs) have an effect on solution quality, yet there is no clear winning formulation.

[Figure: results for the five formulations: Stockham, Transposed Stockham, Cooley-Tukey, Gentleman-Sande, Pease]


Experiment #2 – Granularity

Granularity: the weight of the nodes for the various computational stages of the transform.

Coarser: $F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes F_4)\, P_{16,4}$

Finer: $F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes ((F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}))\, P_{16,4}$

[Figure: dataflow graphs of the two formulations, built from F4 nodes (coarser) and from F4 and F2 nodes (finer)]


Experiment #2 – Granularity

Decomposition rules: a large DST is a combination of smaller DSTs, analogous to node clustering.

$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$
$\phantom{F_8} = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}$
$\phantom{F_8} = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes ((F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}))\, P_{8,2}$

        Array 4              Ring 4               Array 8                  Ring 8
Size    Cost  Formulation    Cost  Formulation    Cost  Formulation        Cost  Formulation
32      11    2/2/2/4*       7     2/2/2/4        32    8/2/2*             16    2/4/2/2
64      22    8/2/4*         14    2/2/8*         48    2/2/2/2/4          20    4/2/2/4
128     43    8/2/8*         26    16/2/2/2*      92    2/2/2/2/2/4        32    2/2/2/2/2/4
256     86    4/2/32*        55    16/8/2*        132   4/2/2/2/2/4        58    2/2/2/2/2/2/4
512     171   4/2/64*        106   64/4/2*        276   2/2/2/2/2/2/4/2    116   2/2/2/2/2/2/8

* Multiple formulations achieved the best cost. The coarsest granularity is shown.

- Effect of topology: Ring vs. Linear: 57% cost reduction
- Finest granularity is not necessarily best.
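To give a feel for the size of the granularity search space, the sketch below enumerates candidate formulation labels for a power-of-two size. It assumes that a label such as 8/2/4 lists the sizes of the smaller DSTs, in order, whose product is the transform size (8 × 2 × 4 = 64); that reading of the table's Formulation column is an assumption, and the function names are illustrative.

```python
def formulations(n):
    """All ordered factorizations of n (a power of two) into power-of-two factors >= 2."""
    if n == 1:
        return [[]]
    result = []
    factor = 2
    while factor <= n:
        result += [[factor] + rest for rest in formulations(n // factor)]
        factor *= 2
    return result

def label(factors):
    """Format a factor list the way the table does, e.g. [8, 2, 4] -> '8/2/4'."""
    return "/".join(str(f) for f in factors)

labels = [label(f) for f in formulations(32)]
print(len(labels))          # 16
print("2/2/2/4" in labels)  # True

# The number of candidates doubles with each doubling of the transform size.
for n in (32, 64, 128, 256, 512):
    print(n, len(formulations(n)))
```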


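Experiment #3 – Breakdown strategy

- Breakdown strategy: the order and divisors with which a transform is decomposed.
- Split trees: a common graphical representation of the breakdown strategy.

Example: two split trees for a DFT of size 64.

(a) $F_{64} = (((F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}) \otimes I_8)\, T_{64,8}\, (I_8 \otimes F_8)\, P_{64,8}$

(b) $F_{64} = (F_2 \otimes I_{32})\, T_{64,2}\, (I_2 \otimes ((F_2 \otimes I_{16})\, T_{32,2}\, (I_2 \otimes F_{16})\, P_{32,2}))\, P_{64,2}$

[Figure: split trees, with nodes labeled by the base-2 logarithm of the transform size. (a) Root 6 with children 3 and 3; one 3 has children 2 and 1. (b) Root 6 with children 1 and 5; the 5 has children 1 and 4.]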


Experiment #3 – Results

Procedure
- Exhaustive generation of split trees for DFT sizes n = 16 to 256
- Formulations partitioned for various topologies
- Observation of split tree decisions that lead to 'partition friendly' formulations
- Generation of n > 256 formulations using rules
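The exhaustive generation step can be sketched as a recursive enumeration over exponents. The code assumes sizes are powers of two, represents a split tree as either a leaf `k` (the transform of size 2^k left unexpanded) or a pair of subtrees whose exponents add up to `k`, and allows leaves of any size; these modeling choices and the name `split_trees` are assumptions, not the authors' generator.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def split_trees(k):
    """All split trees for a transform of size 2**k.

    A tree is either the leaf k or a pair (left, right) of subtrees
    whose exponents add up to k.
    """
    trees = [k]
    for i in range(1, k):
        trees += [(left, right)
                  for left in split_trees(i)
                  for right in split_trees(k - i)]
    return tuple(trees)

# Number of trees for n = 16 ... 256.
for k in range(4, 9):
    print(2 ** k, len(split_trees(k)))

tree_a = ((2, 1), 3)  # 64 = 8 * 8, with the first 8 = 4 * 2
tree_b = (1, (1, 4))  # 64 = 2 * 32, with 32 = 2 * 16
print(tree_a in split_trees(6), tree_b in split_trees(6))  # True True
```

Under these assumptions the count grows quickly with size, which is why formulations beyond n = 256 are generated from observed rules rather than exhaustively.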


Conclusions and Future Work

Methodology for partitioning of DSTs to DHAs:
- DST graph considerations
- Formulation exploration

Graph considerations
- Generating a linear initial partition provides better results than a random one.
- Limiting node moves gives faster convergence.

Exploration at the algorithmic level: experiments
- Isolated features such as permutations and granularity
  - An effect was evidenced, but it is hard to establish a relation to solution quality.
  - Coarse granularity = better convergence, good solution quality
- Breakdown strategy: 'partition friendly' formulations generated.

Current work
- Experimentation with DCTs
- Experimentation with other properties to define an overall exploration strategy


Acknowledgements

- Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)
- WALSAIP - Wide-Area Large Scale Automated Information Project
- Puerto Rico NASA Space Grant

QUESTIONS?