Ajay K. Verma, Philip Brisk and Paolo Ienne Processor Architecture Laboratory (LAP) & Centre for...

Ajay K. Verma, Philip Brisk and Paolo Ienne

Processor Architecture Laboratory (LAP) & Centre for Advanced Digital Systems (CSDA)

Ecole Polytechnique Fédérale de Lausanne (EPFL)

Fast, Quasi-Optimal, and Pipelined Instruction-Set Extensions

Custom ISE Identification

Register File

ALU MUL LD/ST

Data Memory

AFU

out1 = F (in1, in2, in3, in4)

out2 = G (in1, in2, in3, in4)

Limited number of I/O ports

Outline

Problem formulation: ISE selection, I/O serialisation

Related work

Non-optimality of earlier work

Integer Linear Programming (ILP) formulation

Results

Conclusions

Problem Formulation

Given

a dataflow graph

a set of forbidden nodes

Find a subgraph S which is convex and free of forbidden nodes

and which has the largest gain

M (S) = Nexec * (SW (S) – HW (S))


Convex Subgraph

In order to execute the AFU we need the output of node b

Computation of node b requires the output of AFU

A non-convex AFU cannot be scheduled without creating a deadlock
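The deadlock argument above amounts to a reachability test: a subgraph is non-convex exactly when some outside node is both fed by the subgraph and feeds back into it. The following is a minimal sketch of that test, assuming the dataflow graph is acyclic and stored as a dictionary from each node to its successors. The function names and the three-node example graph are illustrative and do not come from the slides.

```python
def reachable(graph, starts):
    """Return all nodes reachable from `starts` by following one or more edges."""
    seen = set()
    stack = [succ for node in starts for succ in graph.get(node, ())]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def is_convex(graph, subgraph):
    """True iff no path between two nodes of `subgraph` passes outside it."""
    subgraph = set(subgraph)
    # Build the reversed graph so that ancestors can be found the same way.
    reverse = {}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    descendants = reachable(graph, subgraph)
    ancestors = reachable(reverse, subgraph)
    # An outside node that is both a descendant and an ancestor of the
    # subgraph needs the AFU output and is needed by the AFU: a deadlock.
    return not ((descendants & ancestors) - subgraph)


# Hypothetical graph: a -> b -> c and a -> c
example = {"a": ["b", "c"], "b": ["c"], "c": []}
print(is_convex(example, {"a", "b"}))  # b is inside, so no outside path exists
print(is_convex(example, {"a", "c"}))  # b lies on a path from a to c
```

In the second call, node b plays the role described on the slide: it needs a result produced inside the AFU, and the AFU needs its output.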

I/O Serialisation

2 inputs, 4 outputs

Available I/O ports: (1, 2)

ISE Merit Estimation

M (S) = Nexec * (SW (S) – HW (S))


Related Work

ISE identification under I/O constraints

Search space pruning using I/O and convexity constraints [Atasu03, Clark03, Yu04, Pozzi06, Yu07, Chen07]

ILP-based approach [Atasu05]

Pseudo-polynomial time algorithm [Bonzini07]

ISE identification under relaxed I/O constraints

Restricted search space exploration [Pozzi05]

Generation of a semi-compact set of connected ISEs [Pothineni07]

I/O serialisation

Exponential time algorithms [Pozzi05, Pothineni07]

Algorithms for specific processor models

Single-issue RISC processor model [Verma07]

Earlier Work

ISE Selection: Atasu03, Chen07, Bonzini07

Optimal ISE selection under various I/O constraints

I/O Serialisation: Pozzi05, Pothineni07

Exponential time I/O serialisation algorithm

Non-Optimality of Earlier Work

cycle saved:

cycle saved: 066

cycle saved: 112

Our Contributions

Optimal ILP formulation for a large class of processor models

Earlier work considers a RISC processor model only

Single run

In earlier work, ISE selection was done separately for various I/O constraints

ISE selection and I/O scheduling together

Treating them separately is another source of non-optimality in earlier work

Integer Linear Programming

Objective function

Linear constraints

ILP Formulation

Linear constraints

No forbidden nodes

Convexity constraints

I/O serialisation based constraints

I/O access per cycle based constraints

Objective function

Saving in cycles should be maximum

ISE Selection Constraints (1 of 2)

Variable: For each node ni, a Boolean variable xi

xi is true iff node ni is in the selected ISE

Constraint: No forbidden node should be in the ISE

If ni is a forbidden node, then xi = 0

Variable: For each node ni, two Boolean variables pi and si

pi (si) is true iff at least one predecessor (successor) of ni is in the selected ISE

Constraint: The subgraph corresponding to the selected ISE must be convex

If pi and si are both true, then xi must be true (i.e., pi + si – xi ≤ 1)

ISE Selection Constraints (2 of 2)

Relationship between pi, si and xi

pi = 0, if ni has no children

pi = ∪ (xj ∪ pj) otherwise, where the nj are the children of ni

si = 0, if ni has no parents

si = ∪ (xj ∪ sj) otherwise, where the nj are the parents of ni
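The recurrences for pi and si, together with the convexity constraint pi + si – xi ≤ 1, can be checked on a small graph. The sketch below evaluates them for a fixed selection x instead of solving an ILP, and it assumes that the "children" of a node are the nodes feeding it (its operands), which is one reading of the slide. The function names and the example graph are hypothetical.

```python
def convexity_flags(children, x):
    """Compute p_i and s_i for every node from the Boolean selection x.

    children[i] lists the nodes feeding node i (assumed meaning of "children").
    Every node must appear as a key; the graph must be acyclic.
    """
    parents = {node: [] for node in children}
    for node, kids in children.items():
        for kid in kids:
            parents[kid].append(node)

    p, s = {}, {}

    def p_of(i):
        # p_i = OR over children n_j of (x_j OR p_j); False when no children
        if i not in p:
            p[i] = any(x[j] or p_of(j) for j in children[i])
        return p[i]

    def s_of(i):
        # s_i = OR over parents n_j of (x_j OR s_j); False when no parents
        if i not in s:
            s[i] = any(x[j] or s_of(j) for j in parents[i])
        return s[i]

    for node in children:
        p_of(node)
        s_of(node)
    return p, s


def satisfies_convexity(children, x):
    """Check the linear constraint p_i + s_i - x_i <= 1 at every node."""
    p, s = convexity_flags(children, x)
    return all(int(p[i]) + int(s[i]) - int(x[i]) <= 1 for i in children)


# Hypothetical graph: n2 consumes n1, n3 consumes n1 and n2
children = {1: [], 2: [1], 3: [1, 2]}
print(satisfies_convexity(children, {1: True, 2: True, 3: False}))
print(satisfies_convexity(children, {1: True, 2: False, 3: True}))
```

In the second selection, n2 is left out although nodes on both sides of it are selected, so p2 + s2 – x2 = 2 and the constraint rejects it.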

I/O Serialisation Based Constraints (1 of 3)

Variable: An integer variable intDelayi

Denotes the cycle in which node ni is executed, e.g.,

intDelay1 = 0, intDelay4 = 1, intDelay5 = 2

Variable: A real variable fractionalDelayi

Denotes the smallest time, after intDelayi cycles, at which the output of ni is available, e.g.,

fractionalDelay3 = HW (n3), fractionalDelay4 = HW (n3) + HW (n4)

Variable: An integer variable ρij

Denotes the number of stages across the edge between the nodes ni and nj, e.g.,

ρ13 = 1, ρ34 = 0, ρ25 = 2

Constraint: The difference between the cycles of a predecessor node and its successor node is the same as the number of latches on the edge connecting them, e.g.,

intDelay4 = intDelay3 + ρ34

intDelay5 = intDelay2 + ρ25

Constraint: The total number of stages R is the same as the last cycle in which an output node is computed, e.g.,

R = intDelay5 + ρ57

R = intDelay2 + ρ26

Extra latches on output edges are created in order to realize an imaginary sink node

Constraint: The fractionalDelay of a node depends on the fractionalDelay of its predecessor nodes, e.g.,

Case 1: if the node is the first node in the cycle

fractionalDelay3 = HW (n3)

Case 2: if the node is not the first node in the cycle

fractionalDelay4 = fractionalDelay3 + HW (n4)

Constraint: The fractionalDelay of a node should never exceed the cycle time λ, e.g.,

fractionalDelay3 ≤ λ

fractionalDelay4 ≤ λ
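The delay constraints can be illustrated by evaluating them for a fixed choice of ρ values. The sketch below is not the ILP itself: it takes the ρ values as given and derives intDelay and fractionalDelay from them. It assumes that nodes are numbered in topological order, that nodes without predecessors execute in cycle 0, and that a node chains combinationally only with predecessors reached through unlatched edges (ρ = 0). The ρ values are the ones from the slide example; the HW latencies and the cycle time are made up for illustration.

```python
def pipeline_delays(edges, hw, cycle_time):
    """Derive intDelay and fractionalDelay from fixed latch counts.

    edges: {(i, j): rho_ij}; hw: {node: HW latency as a fraction of a cycle}.
    Raises ValueError if a constraint from the slides is violated.
    """
    int_delay, frac_delay = {}, {}
    for node in sorted(hw):  # assumes node numbers follow a topological order
        preds = [(i, rho) for (i, j), rho in edges.items() if j == node]
        # intDelay_j = intDelay_i + rho_ij must agree over all incoming edges.
        cycles = {int_delay[i] + rho for i, rho in preds} or {0}
        if len(cycles) != 1:
            raise ValueError(f"inconsistent latch counts at node {node}")
        int_delay[node] = cycles.pop()
        # Case 1 (first in its cycle): no unlatched predecessor, delay = HW.
        # Case 2: add HW to the slowest unlatched predecessor.
        chained = [frac_delay[i] for i, rho in preds if rho == 0]
        frac_delay[node] = max(chained, default=0.0) + hw[node]
        if frac_delay[node] > cycle_time:
            raise ValueError(f"node {node} exceeds the cycle time")
    return int_delay, frac_delay


edges = {(1, 3): 1, (3, 4): 0, (2, 5): 2}        # rho values from the slides
hw = {1: 0.0, 2: 0.0, 3: 0.4, 4: 0.5, 5: 0.3}    # hypothetical latencies
int_delay, frac_delay = pipeline_delays(edges, hw, cycle_time=1.0)
print(int_delay)
print(frac_delay)
```

With these ρ values the derived cycles agree with the slide example (intDelay1 = 0, intDelay4 = 1, intDelay5 = 2), and fractionalDelay4 is the sum of the latencies of n3 and n4 because the edge between them carries no latch.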

I/O Access Per Cycle Based Constraints

Variable: Boolean variables cikIN and cikOUT

cikIN is true iff ni is an input of the ISE and is accessed in the kth stage of execution (similarly for cikOUT)

Constraint: In each stage, no more than m inputs should be accessed and no more than n outputs should be written back, i.e., for each k

∑ cikIN ≤ m

∑ cikOUT ≤ n

cikIN and cikOUT can be computed using the intDelay and fractionalDelay of nodes and the ρ values of the incoming and outgoing edges of the AFU
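The per-stage port limits can be checked directly once each ISE input and output has been assigned to a stage. The sketch below assumes such an assignment is already available as two dictionaries, so it does not derive cikIN and cikOUT from the delays. The function name, the operand names, and the port counts in the example are hypothetical.

```python
from collections import Counter


def respects_port_limits(input_stage, output_stage, m, n):
    """Check that every stage reads at most m inputs and writes at most n outputs.

    input_stage / output_stage map each ISE input / output to its stage k,
    i.e. input_stage[i] == k corresponds to c_ik^IN being true.
    """
    reads = Counter(input_stage.values())    # sum over i of c_ik^IN, per k
    writes = Counter(output_stage.values())  # sum over i of c_ik^OUT, per k
    return (all(count <= m for count in reads.values())
            and all(count <= n for count in writes.values()))


# Hypothetical AFU with 4 inputs and 2 outputs on 2 read ports and 1 write port
inputs = {"in1": 0, "in2": 0, "in3": 1, "in4": 1}
outputs = {"out1": 2, "out2": 3}
print(respects_port_limits(inputs, outputs, m=2, n=1))
```

Moving out2 to stage 2 as well would put two write-backs in the same stage and violate the limit n = 1.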

Objective Function

Saving in cycles should be maximized

SW (S) – HW (S) should be maximum

SW (S) = ∑ xi SW (ni)

HW (S) = R

Any processor model where SW (S) and HW (S) can be computed using linear inequalities can be handled using ILP
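As a worked instance of the objective, the sketch below evaluates M (S) = Nexec * (SW (S) – HW (S)) for a fixed selection, with SW (S) = ∑ xi SW (ni) and HW (S) = R as defined above. The function name, the software latencies, the stage count and the execution count are all made-up values for illustration.

```python
def merit(x, sw, num_stages, n_exec=1):
    """Evaluate M(S) = Nexec * (SW(S) - HW(S)) for a fixed selection x.

    x: {node: bool}, sw: {node: software latency in cycles},
    num_stages: R, the number of pipeline stages of the AFU (HW(S) = R).
    """
    sw_total = sum(sw[i] for i, selected in x.items() if selected)
    return n_exec * (sw_total - num_stages)


# Hypothetical values: two selected nodes costing 1 and 3 cycles in software,
# implemented as a 2-stage AFU and executed 100 times.
x = {1: True, 2: True, 3: False}
sw = {1: 1, 2: 3, 3: 1}
print(merit(x, sw, num_stages=2, n_exec=100))  # 100 * (4 - 2) = 200
```

Both terms are linear in the ILP variables (the xi and R), which is what allows the same expression to serve as the ILP objective.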

Experimental Setup

Input dataflow graph

ISE selection: Atasu03

ILP method

I/O serialisation: Pozzi05

No serialisation

exp / subopt

exp / opt

Results (1 of 3)

Benchmarks: viterbi, adpcmdecoder, adpcmcoder

Methods compared: no pipelining, Pozzi’s algorithm, ILP method

Pozzi’s algorithm takes several hours on this benchmark, and produces inferior results

Benchmark: aes

Biggest dataflow graph: 703

After 3 minutes After an hour

The best AFU with 22 inputs and 22 outputs

Conclusions

ISE Selection: Atasu03, Chen07, Bonzini07

I/O Serialisation: Pozzi05, Pothineni07

The methodology can be generalized for a large class of processor models

Optimal, single-run algorithm
