Lecture 11, CS 380C

Last Time

• Interactions of scheduling and register usage

Today

• Interactions of scheduling and instruction level parallelism

Shape of Expressions

• Proebsting & Fischer assume a fixed expression tree

• Hunt et al. reorganize commutative and associative operations in expression trees to
– Increase ILP
– Decrease critical path length
– Group constants
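
As a small illustration of the critical path point, the sketch below compares a left-leaning chain for a + b + c + d with a balanced form of the same sum. The nested-tuple encoding and the `height` helper are assumptions made for this example; they do not come from either paper.

```python
# Expression trees as nested tuples: ("+", left, right); leaves are strings.
def height(tree):
    """Number of operations on the longest leaf-to-root path."""
    if isinstance(tree, str):
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

# ((a + b) + c) + d: every add waits for the previous one.
chain = ("+", ("+", ("+", "a", "b"), "c"), "d")

# (a + b) + (c + d): the two inner adds are independent of each other.
balanced = ("+", ("+", "a", "b"), ("+", "c", "d"))

chain_height = height(chain)        # 3 dependent adds
balanced_height = height(balanced)  # 2 levels of adds
```

Both trees compute the same value when the operator is associative, but the balanced one has a shorter dependence chain and exposes two adds that can issue together.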

Motivation

• Long pipelines and fine-grain parallel processors (e.g., superscalar RISC, VLIW & EDGE) benefit from instruction level parallelism.

• Decreasing critical path length improves loop performance.

• Grouping constants improves constant propagation.

Example

Let M denote intermediate values we need to preserve.

Let I denote associative operations whose intermediate values we do not need to preserve.

Example

What should we do to balance this tree?

Balancing M3

Balancing M1

Baer & Bovet: Balance Subtree Approach

• Given a tree of associative and commutative operators, and other operators

• Rearrange the tree to make it more balanced

• Caveats
– Preserve intermediate values in the expression tree that are used elsewhere
– Preserve subtrees rooted by non-associative operations
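
The sketch below illustrates the idea of rebalancing under these caveats. It is a simplified illustration, not Baer & Bovet's actual algorithm: the `Op` class, the `preserved` flag, the `ASSOCIATIVE` set, and the pairwise `pair_up` strategy are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from functools import reduce

@dataclass(frozen=True)
class Op:
    op: str
    left: object
    right: object
    preserved: bool = False  # True if the value is used elsewhere

ASSOCIATIVE = {"+", "*"}  # assumed associative and commutative operators

def height(node):
    if not isinstance(node, Op):
        return 0
    return 1 + max(height(node.left), height(node.right))

def operands(node, op, is_root=True):
    """Flatten a chain of `op`; stop at leaves, other operators, and preserved nodes."""
    same_chain = isinstance(node, Op) and node.op == op
    if same_chain and (is_root or not node.preserved):
        return operands(node.left, op, False) + operands(node.right, op, False)
    return [node]

def pair_up(items, op):
    """Combine neighbours level by level until one tree remains."""
    while len(items) > 1:
        paired = [Op(op, items[i], items[i + 1]) for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # odd one out moves up a level
        items = paired
    return items[0]

def balance(node):
    if not isinstance(node, Op):
        return node
    if node.op not in ASSOCIATIVE:
        # Keep the shape of non-associative nodes; only balance beneath them.
        return replace(node, left=balance(node.left), right=balance(node.right))
    parts = [balance(x) for x in operands(node, node.op)]
    return replace(pair_up(parts, node.op), preserved=node.preserved)

def left_chain(first, rest):
    return reduce(lambda acc, leaf: Op("+", acc, leaf), rest, first)

chain = left_chain("a", "bcdefgh")                   # a+b+...+h as a left-leaning chain
m = replace(left_chain("a", "bcd"), preserved=True)  # preserved value a+b+c+d
with_preserved = left_chain(m, "efgh")               # m+e+f+g+h
```

In this sketch, `balance(chain)` reduces the height from 7 to 3. For `with_preserved`, the preserved subtree is balanced internally (height 2) but is then paired like any single leaf, so the result has height 5.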

Problem - unbalanced

• Although each preserved node has a balanced sub-tree, the whole tree isn’t very balanced.

• Note that preserved nodes with many leaves can be closer to the root.

Solution – Huffman Coding

• Give constants weight 0

• Give other leaves weight 1

• Give interior nodes weight by summing their leaves

• Put them all in a sorted worklist

• Take two lowest weight nodes out of the worklist until the worklist is a singleton
– Combine them in a subtree
– Weigh this interior node by summing its leaves, insert it in the worklist

• Weigh preserved nodes by summing subtrees

• Guarantees optimally balanced tree
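
The worklist steps above can be sketched with a priority queue. This is a minimal illustration under stated assumptions: trees are nested tuples `(op, left, right)`, numeric leaves stand for constants, string leaves stand for other values, and a preserved subtree is passed in as an already built tuple. The names `leaf_weight`, `huffman_balance`, and `height` are hypothetical helpers, not taken from the lecture.

```python
import heapq
from itertools import count

def leaf_weight(operand):
    """Constants weigh 0, other leaves weigh 1, subtrees weigh the sum of their leaves."""
    if isinstance(operand, tuple):
        _, left, right = operand
        return leaf_weight(left) + leaf_weight(right)
    return 0 if isinstance(operand, (int, float)) else 1

def huffman_balance(op, operands):
    tie = count()  # tie-breaker so the heap never has to compare trees
    worklist = [(leaf_weight(x), next(tie), x) for x in operands]
    heapq.heapify(worklist)
    while len(worklist) > 1:
        w1, _, a = heapq.heappop(worklist)  # lowest weight
        w2, _, b = heapq.heappop(worklist)  # second lowest weight
        heapq.heappush(worklist, (w1 + w2, next(tie), (op, a, b)))
    return worklist[0][2]

def height(tree):
    if not isinstance(tree, tuple):
        return 0
    _, left, right = tree
    return 1 + max(height(left), height(right))

# A preserved subtree with four leaves (weight 4) next to four plain leaves.
m = ("+", ("+", "a", "b"), ("+", "c", "d"))
with_preserved = huffman_balance("+", [m, "e", "f", "g", "h"])

# Constants have weight 0, so they are combined with each other first.
with_constants = huffman_balance("+", ["x", 2, "y", 3])
```

Here `with_preserved` places the heavy preserved subtree directly under the root, next to a balanced subtree of e, f, g, h, for an overall height of 3. In `with_constants`, the two constants end up as siblings in a single `("+", 2, 3)` subtree that a later pass could fold.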

After marking the weights

Balancing according to Huffman

Is this better?

Another Example

Balanced

After constant propagation

Results

• Mixed

• Improves a few programs by a lot, but not a lot of programs on TRIPS simulator

• Huffman minimizes the sum of the tree

• Baer and Bovet minimize the length of the critical path

• In practice, they often attain the same result for expression reduction

• For software fanout trees, Huffman seems to tolerate unknown latencies through the program better than Hartley and Casavant, which minimizes the length of the critical path given non-unit weights
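
To make the two objectives concrete, the sketch below computes both for two shapes of the same expression. It rests on two assumptions: "the sum of the tree" is read as the sum over leaves of weight times depth, which is the quantity Huffman's algorithm is known to minimize, and each leaf's weight is also treated as the latency before its value is ready. The helper names and the weights are made up for this example.

```python
def leaf_depths(tree, depth=0):
    """Yield (leaf, depth) pairs for a nested-tuple tree ("+", left, right)."""
    if isinstance(tree, tuple):
        _, left, right = tree
        yield from leaf_depths(left, depth + 1)
        yield from leaf_depths(right, depth + 1)
    else:
        yield tree, depth

def weighted_sum(tree, weight):
    """Sum over leaves of weight * depth (assumed reading of 'sum of the tree')."""
    return sum(weight[leaf] * d for leaf, d in leaf_depths(tree))

def critical_path(tree, weight):
    """Longest leaf-to-root path, counting the leaf's weight as its latency."""
    return max(weight[leaf] + d for leaf, d in leaf_depths(tree))

# One heavy operand m (weight 4) and four light ones (weight 1 each).
weight = {"m": 4, "e": 1, "f": 1, "g": 1, "h": 1}

# Heavy operand next to the root.
heavy_high = ("+", "m", ("+", ("+", "e", "f"), ("+", "g", "h")))
# Heavy operand paired like any other leaf, so it sits deep in the tree.
heavy_low = ("+", ("+", ("+", "m", "e"), ("+", "f", "g")), "h")

sums = (weighted_sum(heavy_high, weight), weighted_sum(heavy_low, weight))
paths = (critical_path(heavy_high, weight), critical_path(heavy_low, weight))
```

For these weights, `sums` is `(16, 22)` and `paths` is `(5, 7)`: the shape that places the heavy operand near the root is better under both measures, which is consistent with the observation that the two objectives often agree in practice.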

Summary

• Reorganize trees of commutative and associative operations.
– Use Huffman coding to produce an overall balanced tree
– Improve ILP
– Decrease critical path length
– Group constants

Next Time

• P. Briggs, Register Allocation via Graph Coloring, PhD dissertation, Rice University, April 1992, Chapters 1, 2, 3, 6, 7, 8 & 9

• Skim and/or cherry pick depending on your interests