10.Synthesis 1(1)


  • 8/13/2019 10.Synthesis 1(1)


    Mehrdad Nourani

    Dept. of EE

    Univ. of Texas at Dallas

    EEDG/CE 6301: Advanced Digital Logic


    Synthesis and Design Automation

    Session 10


    Architectural Synthesis


    Synthesis

    Transform a behavioral view into a structural view.

    Architectural-level synthesis

    Architectural abstraction level.

    Determine macroscopic structure.

    Example: major building blocks such as adders, registers, and muxes.

    Logic-level synthesis

    Logic abstraction level.

    Determine microscopic structure.

    Example: logic gate interconnection.


    Synthesis and Optimization


    Architectural-Level Synthesis Motivation

    Raise input abstraction level.

    Reduce specification of details.

    Extend designer base.

    Self-documenting design specifications.

    Ease modifications and extensions.

    Reduce design time.

    Explore and optimize macroscopic structure

    Series/parallel execution of operations.


    Architectural-Level Synthesis

    Translate HDL models into sequencing graphs.

    Behavioral-level optimization

    Optimize abstract models independently from the implementation parameters.

    Architectural synthesis and optimization

    Create macroscopic structure: data-path and control-unit.

    Consider area and delay information of the implementation.


    Example - Pseudo Code

    Second-order differential equation solver
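
    The listing on this slide did not survive extraction. As a minimal sketch, assuming the usual textbook form of this example (forward-Euler integration of y'' + 3xy' + 3y = 0), the loop could look as follows in Python. The names `diffeq`, `x`, `y`, `u`, `dx` and `a` are illustrative assumptions, not taken from the slide.

```python
def diffeq(x, y, u, dx, a):
    """Integrate y'' + 3*x*y' + 3*y = 0 from x up to a in steps of dx.

    u holds the first derivative y'. The loop body runs at least once.
    """
    while True:
        x1 = x + dx                               # 1 addition
        u1 = u - (3 * x * u * dx) - (3 * y * dx)  # 5 multiplications, 2 subtractions
        y1 = y + u * dx                           # 1 multiplication, 1 addition
        x, u, y = x1, u1, y1
        if not (x1 < a):                          # 1 comparison
            break
    return y


result = diffeq(0.0, 1.0, 0.0, 0.5, 1.0)
```

    One loop iteration contains six multiplications and five ALU-type operations (additions, subtractions, one comparison), which matches the eleven operations v1 to v11 used in the scheduling examples.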


    Example - VHDL


    Example - Verilog


    Dataflow Graphs

    Behavioral views of architectural models.

    Useful to represent data-paths.

    Graph

    Vertices = operations.

    Edges = dependencies.

    Dependencies arise due to

    Input to an operation is the result of another operation.

    Serialization constraints in the specification.

    Two tasks share the same resource.


    Dataflow Graphs (cont.)

    Assume the existence of variables that store information required and generated by operations.

    Each variable has a lifetime, which is the interval from birth to death.

    Variable birth is the time at which the value is generated.

    Variable death is the latest time at which the value is referenced as input to an operation.

    Values must be preserved during the lifetime.
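
    The birth and death definitions above can be sketched in a few lines of Python. This is only an illustration: the three-operation schedule, the operation names and the helper `lifetimes` are hypothetical, and birth is taken here as the step at which the producing operation finishes.

```python
def lifetimes(start, delay, edges):
    """Return {producer: (birth, death)} for every value that has a consumer.

    birth = step at which the producer finishes (start + delay)
    death = latest start step of any operation that reads the value
    """
    life = {}
    for producer in start:
        consumers = [c for (p, c) in edges if p == producer]
        if consumers:
            birth = start[producer] + delay[producer]
            death = max(start[c] for c in consumers)
            life[producer] = (birth, death)
    return life


# Hypothetical schedule: "mul" feeds "add" and "cmp"; "add" feeds "cmp".
start = {"mul": 1, "add": 2, "cmp": 4}
delay = {"mul": 1, "add": 1, "cmp": 1}
edges = [("mul", "add"), ("mul", "cmp"), ("add", "cmp")]
life = lifetimes(start, delay, edges)
```

    Here the result of "mul" must be kept from step 2 through step 4, so it needs storage for longer than the result of "add".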


    Sequencing Graphs

    Useful to represent data-path and control.

    Extended dataflow graphs

    Control Data Flow Graphs (CDFGs).

    Polar: source and sink.

    Operation serialization.

    Hierarchy.

    Control-flow commands: branching and iteration.

    Paths in the graph represent concurrent streams of operations.


    Behavioral-level optimization

    Tree-height reduction using commutative and associative properties

    x = a + b * c + d => x = (a + d) + b * c

    Tree-height reduction using distributive property

    x = a * (b * c * d + e) => x = a * b * c * d + a * e
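
    To make the effect of the two rewrites concrete, the sketch below represents each expression as a binary tree and compares heights before and after. The tuple encoding, the helper names and the particular tree shapes (left-to-right evaluation before, balanced products after) are assumptions made for illustration.

```python
def height(expr):
    """Height of an expression tree; a leaf (variable name) has height 0."""
    if isinstance(expr, str):
        return 0
    _op, left, right = expr
    return 1 + max(height(left), height(right))


def evaluate(expr, env):
    """Evaluate a tree built from '+' and '*' nodes."""
    if isinstance(expr, str):
        return env[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


# x = a + b * c + d  =>  x = (a + d) + b * c
before1 = ("+", ("+", "a", ("*", "b", "c")), "d")
after1 = ("+", ("+", "a", "d"), ("*", "b", "c"))

# x = a * (b * c * d + e)  =>  x = a * b * c * d + a * e
before2 = ("*", "a", ("+", ("*", ("*", "b", "c"), "d"), "e"))
after2 = ("+", ("*", ("*", "a", "b"), ("*", "c", "d")), ("*", "a", "e"))

env = {"a": 2, "b": 3, "c": 5, "d": 7, "e": 11}
heights = [height(t) for t in (before1, after1, before2, after2)]
```

    With these shapes the heights drop from 3 to 2 and from 4 to 3, while both versions of each expression evaluate to the same value. The distributive rewrite pays for the shorter tree with one extra multiplication.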


    Architectural Synthesis and Optimization

    Synthesize macroscopic structure in terms of building-blocks.

    Explore area/performance trade-offs

    Maximum performance implementations subject to area constraints.

    Minimum area implementations subject to performance constraints.

    Determine an optimal implementation.

    Create logic model for data-path and control.


    Circuit Specification for Architectural Synthesis

    Circuit behavior

    Sequencing graphs.

    Building blocks

    Resources.

    Functional resources: process data (e.g. ALU).

    Memory resources: store data (e.g. register).

    Interface resources: support data transfer (e.g. MUX and buses).

    Constraints

    Interface constraints

    Format and timing of I/O data transfers.

    Implementation constraints

    Timing and resource usage.

    Area

    Cycle-time and latency


    Resources

    Functional resources: perform operations on data.

    Example: arithmetic and logic blocks.

    Standard resources

    Existing macro-cells.

    Well characterized (area/delay).

    Example: adders, multipliers, ALUs, Shifters, ...

    Application-specific resources

    Circuits for specific tasks.

    Yet to be synthesized.

    Example: instruction decoder.

    Memory resources: store data.

    Example: memory and registers.

    Interface resources

    Example: buses and ports.


    Resources and Circuit Families

    Resource-dominated circuits.

    Area and performance depend on few, well-characterized blocks.

    Example: DSP circuits.

    Non resource-dominated circuits.

    Area and performance are strongly influenced by sparse logic, control and wiring.

    Example: some ASIC circuits.


    Synthesis in the Temporal Domain: Scheduling

    Scheduling

    Associate a start-time with each operation.

    Satisfy all sequencing (timing and resource) constraints.

    Goal

    Determine area/latency trade-off.

    Determine latency and parallelism of the implementation.

    Scheduled sequencing graph

    Sequencing graph with start-time annotation.

    Unconstrained scheduling.

    Scheduling with timing constraints

    Scheduling with resource constraints.


    Tradeoff in Scheduling

    Figure panels: 4 multipliers, 2 ALUs; 1 multiplier, 1 ALU


    Tradeoff in Scheduling (cont.)

    Figure panels: 2 multipliers, 3 ALUs; 2 multipliers, 2 ALUs


    Synthesis in the Spatial Domain: Binding

    Binding

    Associate each operation with a resource of the same type.

    Determine area of the implementation.

    Sharing

    Bind a resource to more than one operation.

    Operations must not execute concurrently.

    Bound sequencing graph

    Sequencing graph with resource annotation.


    Example: Bound Sequencing Graph


    Performance and Area Estimation

    Resource-dominated circuits

    Area = sum of the areas of the resources bound to the operations.

    Determined by binding.

    Latency = start time of the sink operation (minus start time of the source operation).

    Determined by scheduling.

    Non resource-dominated circuits

    Area also affected by

    registers, steering logic, wiring and control.

    Cycle-time also affected by

    steering logic, wiring and (possibly) control.
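
    For the resource-dominated case, the two estimates above reduce to simple bookkeeping. The sketch below assumes a scheduled and bound version of the eleven-operation example with unit delays; the resource names (`mul_a`, `alu_b`, ...), the particular binding and the helper functions are hypothetical, and the unit areas (multiplier 5, ALU 1) are borrowed from the dual ILP example.

```python
AREA = {"mul": 5, "alu": 1}

# Assumed schedule (start step per operation) and binding (resource per operation).
START = {"v1": 1, "v2": 1, "v3": 2, "v6": 2, "v7": 3, "v8": 3,
         "v10": 1, "v11": 2, "v4": 3, "v5": 4, "v9": 4}
BINDING = {"v1": "mul_a", "v3": "mul_a", "v7": "mul_a",
           "v2": "mul_b", "v6": "mul_b", "v8": "mul_b",
           "v10": "alu_a", "v11": "alu_a", "v4": "alu_a", "v5": "alu_a",
           "v9": "alu_b"}
RESOURCE_TYPE = {"mul_a": "mul", "mul_b": "mul", "alu_a": "alu", "alu_b": "alu"}


def sharing_is_valid(start, binding):
    """Operations sharing a resource must not execute in the same step (unit delays)."""
    used = set()
    for op, resource in binding.items():
        slot = (resource, start[op])
        if slot in used:
            return False
        used.add(slot)
    return True


def area(binding, resource_type, unit_area):
    """Area = sum of the areas of the distinct resources that appear in the binding."""
    return sum(unit_area[resource_type[r]] for r in set(binding.values()))


def latency(start, delay=1, source_time=1):
    """Latency = start time of the sink minus start time of the source."""
    sink_time = max(t + delay for t in start.values())
    return sink_time - source_time


valid = sharing_is_valid(START, BINDING)
total_area = area(BINDING, RESOURCE_TYPE, AREA)
total_latency = latency(START)
```

    Under these assumptions the binding uses two multipliers and two ALUs, for an area of 12 and a latency of 4 steps. For non resource-dominated circuits this figure would only be a lower estimate, since registers, steering logic, wiring and control are ignored.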


    Scheduling Algorithms


    Outline

    The scheduling problem

    Scheduling without constraints

    Scheduling under timing constraints

    Relative scheduling

    Scheduling under resource constraints

    The ILP model

    Heuristic methods

    List scheduling

    Force-directed scheduling


    Scheduling

    Circuit model

    Sequencing graph.

    Cycle-time is given.

    Operation delays expressed in cycles.

    Scheduling

    Determine the start times for the operations.

    Satisfy all sequencing (timing and resource) constraints.

    Goal

    Determine area/latency trade-off.

    Scheduling affects

    Area: maximum number of concurrent operations of the same type is a lower bound on required hardware resources.

    Performance: concurrency of resulting implementation.


    Scheduling Example

    Figure panels: sequencing graph; a scheduled DFG


    Scheduling Models

    Unconstrained scheduling.

    Scheduling with timing constraints

    Latency

    Detailed timing constraints

    Scheduling with resource constraints.

    Simplest scheduling model

    All operations have bounded delays.

    All delays are in cycles.

    Cycle-time is given

    No constraints - no bounds on area.

    Goal

    Minimize latency


    Minimum-Latency Unconstrained Scheduling

    Given a set of operations $V$ with integer delays $D$ and a partial order on the operations $E$.

    Find an integer labeling of the operations $\varphi: V \to Z^+$ such that $t_i = \varphi(v_i)$,

    $t_i \ge t_j + d_j \;\; \forall i, j$ s.t. $(v_j, v_i) \in E$, and $t_n$ is minimum.

    Unconstrained scheduling is used when

    Dedicated resources are used.

    Operations differ in type.

    Operation cost is marginal when compared to that of steering logic, registers, wiring, and control logic.

    Binding is done before scheduling: resource conflicts are solved by serializing operations sharing the same resource.

    Deriving bounds on latency for constrained problems.


    ASAP Scheduling Algorithm

    Denote by $t^S$ the start times computed by the as soon as possible (ASAP) algorithm.

    Yields minimum values of start times.
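
    The algorithm listing on this slide is not recoverable, so the following is only a sketch of the usual ASAP recurrence: the source starts at step 1 and every other operation starts as soon as its last predecessor has finished. The edge list is an assumption reconstructed from the sequencing inequalities of the ILP example (v0 and vn are the zero-delay source and sink); the function and variable names are illustrative.

```python
# Assumed sequencing graph as (predecessor, successor) pairs.
EDGES = [("v0", "v1"), ("v0", "v2"), ("v0", "v6"), ("v0", "v8"), ("v0", "v10"),
         ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11"),
         ("v5", "vn"), ("v9", "vn"), ("v11", "vn")]
NODES = sorted({v for edge in EDGES for v in edge})
DELAY = {v: 0 if v in ("v0", "vn") else 1 for v in NODES}


def asap(edges, delay, source_time=1):
    """Earliest start times: t_i = max over predecessors j of (t_j + d_j)."""
    preds = {}
    for j, i in edges:
        preds.setdefault(i, []).append(j)
        preds.setdefault(j, [])
    start = {}

    def visit(v):
        if v not in start:
            # A vertex without predecessors (the source) starts at source_time.
            start[v] = max((visit(p) + delay[p] for p in preds[v]),
                           default=source_time)
        return start[v]

    for v in preds:
        visit(v)
    return start


t_asap = asap(EDGES, DELAY)
latency = t_asap["vn"] - t_asap["v0"]
```

    On this graph the sink starts at step 5, so the unconstrained minimum latency is 4.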


    ALAP Scheduling Algorithm

    Denote by $t^L$ the start times computed by the as late as possible (ALAP) algorithm.

    Yields maximum values of start times.

    Latency upper bound $\bar{\lambda}$ (i.e. on $t_n - t_0$)
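
    As with ASAP, the slide's listing is missing, so this is a hedged sketch of the usual ALAP recurrence: the sink is placed at the latency bound and every other operation starts as late as its earliest successor allows. The edge list (reconstructed from the ILP example's sequencing inequalities), the bound of 4 and all names are assumptions for illustration.

```python
# Assumed sequencing graph as (predecessor, successor) pairs; v0/vn have zero delay.
EDGES = [("v0", "v1"), ("v0", "v2"), ("v0", "v6"), ("v0", "v8"), ("v0", "v10"),
         ("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11"),
         ("v5", "vn"), ("v9", "vn"), ("v11", "vn")]
NODES = sorted({v for edge in EDGES for v in edge})
DELAY = {v: 0 if v in ("v0", "vn") else 1 for v in NODES}


def alap(edges, delay, latency_bound, source_time=1):
    """Latest start times: t_i = (min over successors j of t_j) - d_i."""
    succs = {}
    for i, j in edges:
        succs.setdefault(i, []).append(j)
        succs.setdefault(j, [])
    start = {}

    def visit(v):
        if v not in start:
            # A vertex without successors (the sink) is pinned to the latency bound.
            latest_finish = min((visit(s) for s in succs[v]),
                                default=source_time + latency_bound)
            start[v] = latest_finish - delay[v]
        return start[v]

    for v in succs:
        visit(v)
    return start


t_alap = alap(EDGES, DELAY, latency_bound=4)
feasible = t_alap["v0"] >= 1  # a source pushed before step 1 means the bound is too tight
```

    With a bound of 4 the source still lands on step 1, so the bound is achievable; with a bound of 3 it would be pushed to step 0.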


    Latency-Constrained Scheduling

    ALAP solves a latency-constrained problem.

    Latency bound can be set to the latency computed by the ASAP algorithm.

    Mobility

    Defined for each operation.

    Difference between the ALAP and ASAP start times.

    Zero mobility implies that an operation can be started only at one given time step.

    Mobility greater than 0 measures the span of the time interval in which an operation may start.

    Slack on the start time.


    Example

    Operations with zero mobility

    $\{v_1, v_2, v_3, v_4, v_5\}$

    Critical path

    Operations with mobility one

    $\{v_6, v_7\}$

    Operations with mobility two

    $\{v_8, v_9, v_{10}, v_{11}\}$
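
    The grouping above follows directly from subtracting ASAP from ALAP start times. In the sketch below the two tables are written out as literals; they are assumed values for this example with a latency bound of 4 (the same windows appear in the "operations start only once" constraints of the ILP example), and the helper name is illustrative.

```python
# Assumed ASAP and ALAP start times for a latency bound of 4.
T_ASAP = {"v1": 1, "v2": 1, "v3": 2, "v4": 3, "v5": 4, "v6": 1,
          "v7": 2, "v8": 1, "v9": 2, "v10": 1, "v11": 2}
T_ALAP = {"v1": 1, "v2": 1, "v3": 2, "v4": 3, "v5": 4, "v6": 2,
          "v7": 3, "v8": 3, "v9": 4, "v10": 3, "v11": 4}


def mobility_groups(t_asap, t_alap):
    """Group operations by mobility = ALAP start time - ASAP start time."""
    groups = {}
    for v in t_asap:
        groups.setdefault(t_alap[v] - t_asap[v], []).append(v)
    return groups


groups = mobility_groups(T_ASAP, T_ALAP)
```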


    Scheduling under Resource Constraints

    Classical scheduling problem.

    Fix area bound - minimize latency.

    The amount of available resources affects the achievable latency.

    Dual problem

    Fix latency bound - minimize resources.

    Assumption

    All delays bounded and known.


    Minimum Latency Resource-Constrained Scheduling

    Given a set of ops $V$ with integer delays $D$, a partial order on the operations $E$, and upper bounds $\{a_k;\ k = 1, 2, \ldots, n_{res}\}$.

    Find an integer labeling of the operations $\varphi: V \to Z^+$ such that

    $t_i = \varphi(v_i)$,

    $t_i \ge t_j + d_j \;\; \forall i, j$ s.t. $(v_j, v_i) \in E$,

    and $t_n$ is minimum.

    Number of operations of any given type in any schedule step does not exceed bound.

    $\mathcal{T}: V \to \{1, 2, \ldots, n_{res}\}$ (type of each operation)


    Scheduling under Resource Constraints

    Intractable problem

    Algorithms

    Approximate

    List scheduling

    Force-directed scheduling

    Exact

    Integer linear program

    Hu (restrictive assumptions)


    List Scheduling Algorithms

    Heuristic method for

    Minimum latency subject to resource bound.

    Minimum resource subject to latency bound.

    Greedy strategy.

    Priority list heuristics.

    Assign a weight to each vertex indicating its scheduling priority

    Longest path to sink.

    Longest path to timing constraint.

    Figure: labeled sequencing graph

    Priority of $v_i$ = label of $v_i$ = $\alpha_i$ = number of edges in the longest path from $v_i$ to $v_n$


    List Scheduling Algorithm for Minimum Latency

    Based on a priority metric (e.g. mobility, labeling, etc.)


    List Scheduling for Minimum Latency (cont.)

    Candidate operations $U_{l,k}$

    Operations of type $k$ whose predecessors are scheduled and completed at a time step before $l$

    $U_{l,k} = \{v_i \in V : \mathcal{T}(v_i) = k \text{ and } t_j + d_j \le l \;\; \forall j : (v_j, v_i) \in E\}$

    Unfinished operations $T_{l,k}$ are operations of type $k$ that started at earlier cycles and whose execution is not finished at time $l$

    $T_{l,k} = \{v_i \in V : \mathcal{T}(v_i) = k \text{ and } t_i + d_i > l\}$

    Note that when execution delays are 1, $T_{l,k}$ is empty.


    Example I

    Assumptions

    $a_1 = 2$ multipliers with delay 1.

    $a_2 = 2$ ALUs with delay 1.

    First step

    $U_{1,1} = \{v_1, v_2, v_6, v_8\}$ with labels (priorities) 4, 4, 3, 2; select $\{v_1, v_2\}$

    $U_{1,2} = \{v_{10}\}$; selected

    Second step

    $U_{2,1} = \{v_3, v_6, v_8\}$ with labels (priorities) 3, 3, 2; select $\{v_3, v_6\}$

    $U_{2,2} = \{v_{11}\}$; selected

    Third step

    $U_{3,1} = \{v_7, v_8\}$ with labels (priorities) 2, 2; select $\{v_7, v_8\}$

    $U_{3,2} = \{v_4\}$; selected

    Fourth step

    $U_{4,2} = \{v_5, v_9\}$; selected
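
    The steps of this example can be replayed with a small list scheduler. This is a sketch under stated assumptions rather than the slide's own listing: the predecessor table is reconstructed from the ILP example's sequencing inequalities, source and sink are left out, the priority is the longest-path label, and ties are broken by operation order. All identifiers are illustrative.

```python
PREDS = {"v1": [], "v2": [], "v3": ["v1", "v2"], "v4": ["v3"], "v5": ["v4", "v7"],
         "v6": [], "v7": ["v6"], "v8": [], "v9": ["v8"], "v10": [], "v11": ["v10"]}
TYPE = {v: "mul" if v in ("v1", "v2", "v3", "v6", "v7", "v8") else "alu"
        for v in PREDS}
DELAY = {v: 1 for v in PREDS}
BOUNDS = {"mul": 2, "alu": 2}  # a_1 = 2 multipliers, a_2 = 2 ALUs


def labels(preds):
    """Number of edges on the longest path to the sink (the edge into the sink counts)."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    memo = {}

    def label(v):
        if v not in memo:
            memo[v] = 1 + max((label(s) for s in succs[v]), default=0)
        return memo[v]

    return {v: label(v) for v in preds}


def list_schedule(preds, op_type, delay, bounds, priority):
    """Greedy minimum-latency list scheduling under per-type resource bounds."""
    start = {}
    step = 1
    while len(start) < len(preds):
        for k, a_k in bounds.items():
            # Unfinished operations of type k still occupy a resource in this step.
            busy = sum(1 for v in start
                       if op_type[v] == k and start[v] + delay[v] > step)
            # Candidates: unscheduled, of type k, all predecessors completed.
            ready = [v for v in preds
                     if v not in start and op_type[v] == k
                     and all(p in start and start[p] + delay[p] <= step
                             for p in preds[v])]
            ready.sort(key=lambda v: -priority[v])  # stable: ties keep operation order
            for v in ready[:max(0, a_k - busy)]:
                start[v] = step
        step += 1
    return start


PRIORITY = labels(PREDS)
schedule = list_schedule(PREDS, TYPE, DELAY, BOUNDS, PRIORITY)
latency = max(schedule[v] + DELAY[v] for v in schedule) - 1
```

    Under these assumptions the scheduler makes the same selections as the four steps above and finishes in 4 steps.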


    Example II

    Assumptions

    $a_1 = 3$ multipliers with delay 2.

    $a_2 = 1$ ALU with delay 1.


    List Scheduling for Minimum Resource Usage

    ALAP does not exist (latency bound is too tight).

    Lower slack means higher urgency/priority.


    Example (i)

    Assume $\bar{\lambda} = 4$

    Let $a = [1, 1]^T$

    First step

    $U_{1,1} = \{v_1, v_2, v_6, v_8\}$; operations with zero slack $\{v_1, v_2\}$; $a = [2, 1]^T$

    $U_{1,2} = \{v_{10}\}$

    Second step

    $U_{2,1} = \{v_3, v_6, v_8\}$; operations with zero slack $\{v_3, v_6\}$

    $U_{2,2} = \{v_{11}\}$

    Third step

    $U_{3,1} = \{v_7, v_8\}$; operations with zero slack $\{v_7, v_8\}$

    $U_{3,2} = \{v_4\}$

    Fourth step

    $U_{4,2} = \{v_5, v_9\}$; both have zero slack; $a = [2, 2]^T$
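
    A sketch of the resource-minimizing variant, under the same assumptions as the example: unit delays, a latency bound of 4, and one resource of each type to begin with. Zero-slack candidates are always scheduled, growing the resource vector when needed; other candidates only fill resources that are already allocated. The graph, the ALAP table and all identifiers are assumptions for illustration, not the slide's listing.

```python
PREDS = {"v1": [], "v2": [], "v3": ["v1", "v2"], "v4": ["v3"], "v5": ["v4", "v7"],
         "v6": [], "v7": ["v6"], "v8": [], "v9": ["v8"], "v10": [], "v11": ["v10"]}
TYPE = {v: "mul" if v in ("v1", "v2", "v3", "v6", "v7", "v8") else "alu"
        for v in PREDS}
DELAY = {v: 1 for v in PREDS}
# Assumed ALAP start times for a latency bound of 4.
ALAP = {"v1": 1, "v2": 1, "v3": 2, "v4": 3, "v5": 4, "v6": 2,
        "v7": 3, "v8": 3, "v9": 4, "v10": 3, "v11": 4}


def list_schedule_min_resources(preds, op_type, delay, alap, types=("mul", "alu")):
    """Greedy list scheduling that grows the resource vector only for zero-slack ops."""
    a = {k: 1 for k in types}
    start = {}
    step = 1
    while len(start) < len(preds):
        for k in types:
            busy = sum(1 for v in start
                       if op_type[v] == k and start[v] + delay[v] > step)
            ready = [v for v in preds
                     if v not in start and op_type[v] == k
                     and all(p in start and start[p] + delay[p] <= step
                             for p in preds[v])]
            # Slack = ALAP start time - current step; zero slack cannot wait.
            urgent = [v for v in ready if alap[v] - step <= 0]
            for v in urgent:
                start[v] = step
            a[k] = max(a[k], busy + len(urgent))
            # Remaining candidates use spare resources only, most urgent first.
            spare = a[k] - busy - len(urgent)
            others = sorted((v for v in ready if v not in urgent),
                            key=lambda v: alap[v] - step)
            for v in others[:spare]:
                start[v] = step
        step += 1
    return start, a


schedule, resources = list_schedule_min_resources(PREDS, TYPE, DELAY, ALAP)
```

    On this input the resource vector grows to two multipliers in the first step and to two ALUs in the fourth, as in the walkthrough above.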


    Example (ii)

    Assume $\bar{\lambda} = 8$

    Let $a = [1, 1]^T$

    Figure panels: ALAP schedule; final schedule


    Scheduling Based on Integer Linear Programming (ILP)


    ILP Solution

    Use standard ILP packages.

    Transform into LP problem [Gebotys].

    Advantages

    Exact method.

    Other constraints can be incorporated easily

    Maximum and minimum timing constraints

    Disadvantages

    Works well only up to a few thousand variables.


    ILP Formulation

    Binary decision variables

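    $X = \{x_{il};\ i = 1, 2, \ldots, n;\ l = 1, 2, \ldots, \bar{\lambda} + 1\}$.

    $x_{il}$ is TRUE (1) only when operation $v_i$ starts in step $l$ of the schedule (i.e. $l = t_i$).

    $\bar{\lambda}$ is an upper bound on latency.

    Start time of operation $v_i$

    $t_i = \sum_l l \cdot x_{il}$

    Operations start only once

    $\sum_l x_{il} = 1$ for each operation $v_i$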


    ILP Formulation (cont.)

    Sequencing relations must be satisfied

    Resource bounds must be satisfied


    ILP Formulation (cont.)

    Minimize $c^T t$ such that
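
    The constraint formulas on the ILP slides did not survive extraction. The block below is a reconstruction, offered as an assumption, in the standard form of this kind of 0-1 formulation. It uses the symbols already defined ($x_{il}$, $d_j$, $a_k$, $\bar{\lambda}$, and $\mathcal{T}$ for the operation type) and agrees with the inequalities written out in the worked example: with unit delays, the sequencing constraint for the edge $(v_1, v_3)$ becomes $2x_{3,2} - x_{1,1} \ge 1$.

```latex
\begin{align*}
\text{minimize}\quad   & c^T t, \qquad t_i = \sum_{l} l \, x_{il} \\
\text{subject to}\quad & \sum_{l} x_{il} = 1
                         && \text{for each operation } v_i \\
                       & \sum_{l} l \, x_{il} - \sum_{l} l \, x_{jl} \ge d_j
                         && \forall (v_j, v_i) \in E \\
                       & \sum_{i : \mathcal{T}(v_i) = k} \;\; \sum_{m = l - d_i + 1}^{l} x_{im} \le a_k
                         && k = 1, \ldots, n_{res}; \;\; l = 1, \ldots, \bar{\lambda} + 1 \\
                       & x_{il} \in \{0, 1\}
\end{align*}
```

    The inner sum in the resource constraint counts an operation in every step during which it executes, so a multi-cycle operation occupies its resource for all $d_i$ steps.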


    ILP Formulation (cont.)

    About the objective function ($c^T t$)

    $c = [0, 0, \ldots, 0, 1]^T$ corresponds to minimizing the latency of the schedule.

    $c = [1, 1, \ldots, 1, 1]^T$ corresponds to finding the earliest start times of all operations under the given constraints.


    Example

    Resource constraints

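    2 ALUs; 2 multipliers.

    $a_1 = 2$; $a_2 = 2$.

    Single-cycle operation.

    $d_i = 1 \;\; \forall i$.

    $\bar{\lambda} = 4$ is given as upper bound

    Objective function: no need to consider the terms $x_{1,1} + x_{2,1} + 2x_{3,2} + 3x_{4,3} + 4x_{5,4}$, because operations 1, 2, 3, 4 and 5 have zero mobility, so these terms are constant.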


    Example (cont.)

    Resource constraints

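    2 ALUs; 2 multipliers.

    $a_1 = 2$; $a_2 = 2$.

    Single-cycle operation.

    $d_i = 1 \;\; \forall i$.

    Operations start only once

    $x_{0,1} = 1$; $x_{1,1} = 1$; $x_{2,1} = 1$; $x_{3,2} = 1$

    $x_{4,3} = 1$; $x_{5,4} = 1$

    $x_{6,1} + x_{6,2} = 1$

    $x_{7,2} + x_{7,3} = 1$

    $x_{8,1} + x_{8,2} + x_{8,3} = 1$

    $x_{9,2} + x_{9,3} + x_{9,4} = 1$

    $x_{10,1} + x_{10,2} + x_{10,3} = 1$

    $x_{11,2} + x_{11,3} + x_{11,4} = 1$

    $x_{n,5} = 1$

    Consider $t_i^S$ and $t_i^L$ of the operations (i.e. the mobility)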


    Example (cont.)

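    Sequencing relations must be satisfied (operation dependency)

    $2x_{3,2} - x_{1,1} \ge 1$

    $2x_{3,2} - x_{2,1} \ge 1$

    $2x_{7,2} + 3x_{7,3} - x_{6,1} - 2x_{6,2} \ge 1$

    $2x_{9,2} + 3x_{9,3} + 4x_{9,4} - x_{8,1} - 2x_{8,2} - 3x_{8,3} \ge 1$

    $2x_{11,2} + 3x_{11,3} + 4x_{11,4} - x_{10,1} - 2x_{10,2} - 3x_{10,3} \ge 1$

    $4x_{5,4} - 2x_{7,2} - 3x_{7,3} \ge 1$

    $4x_{5,4} - 3x_{4,3} \ge 1$

    $5x_{n,5} - 2x_{9,2} - 3x_{9,3} - 4x_{9,4} \ge 1$

    $5x_{n,5} - 2x_{11,2} - 3x_{11,3} - 4x_{11,4} \ge 1$

    $5x_{n,5} - 4x_{5,4} \ge 1$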


    Example (cont.)

    Resource bounds must be satisfied:

    Any set of start times satisfying the constraints provides a feasible solution.

    Any feasible solution is optimum since the sink ($x_{n,5} = 1$) mobility is 0.


    Relative Timing Constraints in ILP

    Relative timing constraints can be considered by adding a new set of inequalities.

    Example:


    Dual ILP Formulation

    Minimize resource usage under latency constraint.

    Same constraints as previous formulation.

    Additional constraint

    Latency bound must be satisfied.

    Resource usage is unknown in the constraints.

    Resource usage is the objective to minimize.

    Minimize $c^T a$

    Vector $a$ represents resource usage

    Vector $c$ represents resource costs


    Example

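    Multiplier area = 5; ALU area = 1.

    Objective function: $5a_1 + a_2$

    Latency constraint: $\bar{\lambda} = 4$

    Start time constraints same.

    Sequencing dependency constraints same.

    Resource constraints

    $x_{1,1} + x_{2,1} + x_{6,1} + x_{8,1} - a_1 \le 0$

    $x_{3,2} + x_{6,2} + x_{7,2} + x_{8,2} - a_1 \le 0$

    $x_{7,3} + x_{8,3} - a_1 \le 0$

    $x_{10,1} - a_2 \le 0$

    $x_{9,2} + x_{10,2} + x_{11,2} - a_2 \le 0$

    $x_{4,3} + x_{9,3} + x_{10,3} + x_{11,3} - a_2 \le 0$

    $x_{5,4} + x_{9,4} + x_{11,4} - a_2 \le 0$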
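
    This instance is small enough to check by exhaustive search. The sketch below does not call an ILP solver; as a stand-in, it enumerates every assignment of start times inside the mobility windows, discards those violating a sequencing constraint, and keeps the one with the lowest cost $5a_1 + a_2$. The window table and edge list are taken from the constraints of this example; the function names are illustrative.

```python
from itertools import product

# Start-time windows [ASAP, ALAP] for a latency bound of 4.
WINDOW = {"v1": (1, 1), "v2": (1, 1), "v3": (2, 2), "v4": (3, 3), "v5": (4, 4),
          "v6": (1, 2), "v7": (2, 3), "v8": (1, 3), "v9": (2, 4),
          "v10": (1, 3), "v11": (2, 4)}
# Sequencing edges as (predecessor, successor); all delays are 1.
EDGES = [("v1", "v3"), ("v2", "v3"), ("v3", "v4"), ("v4", "v5"),
         ("v6", "v7"), ("v7", "v5"), ("v8", "v9"), ("v10", "v11")]
MULS = {"v1", "v2", "v3", "v6", "v7", "v8"}
COST = {"mul": 5, "alu": 1}


def usage(t):
    """Peak number of operations of each type executing in the same step."""
    peak = {"mul": 0, "alu": 0}
    for step in range(1, 5):
        now = [v for v in t if t[v] == step]
        n_mul = sum(v in MULS for v in now)
        peak["mul"] = max(peak["mul"], n_mul)
        peak["alu"] = max(peak["alu"], len(now) - n_mul)
    return peak


def solve():
    """Return (cost, resource usage, start times) of a cheapest feasible schedule."""
    ops = list(WINDOW)
    best = None
    for times in product(*(range(lo, hi + 1) for lo, hi in WINDOW.values())):
        t = dict(zip(ops, times))
        if any(t[j] + 1 > t[i] for j, i in EDGES):  # violates t_i >= t_j + 1
            continue
        a = usage(t)
        cost = COST["mul"] * a["mul"] + COST["alu"] * a["alu"]
        if best is None or cost < best[0]:
            best = (cost, a, t)
    return best


best_cost, best_usage, best_times = solve()
```

    The search covers 324 assignments and finds a minimum cost of 12 with two multipliers and two ALUs. Two multipliers are unavoidable because $v_1$ and $v_2$ both start in step 1, and two ALUs are unavoidable because five ALU operations must fit into four steps.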