Lecture 11 CS 380C
Last Time
• Interactions of scheduling and register usage
Today
• Interactions of scheduling and instruction level parallelism
Shape of Expressions
• Proebsting & Fischer assume a fixed expression tree
• Hunt et al. reorganize commutative and associative operations in expression trees to
– Increase ILP
– Decrease critical path length
– Group constants
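To make the critical-path point concrete, here is a minimal sketch that compares a left-leaning chain of additions with a balanced tree over the same operands. The tuple representation `(op, left, right)` and the helper names `left_chain`, `balanced`, and `height` are illustrative assumptions, not part of the lecture.

```python
def left_chain(leaves):
    """Build ((a + b) + c) + ... : every add depends on the previous one."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = ("+", tree, leaf)
    return tree


def balanced(leaves):
    """Build a balanced tree by splitting the operand list in half."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return ("+", balanced(leaves[:mid]), balanced(leaves[mid:]))


def height(tree):
    """Number of dependent operations on the longest path (the critical path)."""
    if not isinstance(tree, tuple):
        return 0  # a leaf needs no operation
    return 1 + max(height(tree[1]), height(tree[2]))


leaves = list("abcdefgh")
print(height(left_chain(leaves)))  # 7
print(height(balanced(leaves)))    # 3
```

Both trees compute the same sum because addition is associative and commutative, but the balanced form exposes independent adds that a wide machine can issue in parallel.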
Motivation
• Long pipelines and fine-grain parallel processors (e.g., superscalar RISC, VLIW & EDGE) benefit from instruction level parallelism.
• Decreasing critical path length improves loop performance.
• Grouping constants improves constant propagation.
Example
Let M denote intermediate values we need to preserve.
Let I denote associative operations whose intermediate values we do not need to preserve.
Example
What should we do to balance this tree?
Balancing M3
Balancing M1
Baer & Bovet: Balance Subtree Approach
• Given a tree of associative and commutative operators, and other operators
• Rearrange the tree to make it more balanced
• Caveats
– Preserve intermediate values in the expression tree that are used elsewhere
– Preserve subtrees rooted by non-associative operations
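The subtree-balancing idea and its two caveats can be sketched as follows. This is not Baer & Bovet's published algorithm, only a minimal illustration under stated assumptions: the `Node` class, the `ASSOC` operator set, and the helpers `collect`, `build`, `rebalance`, and `height` are all hypothetical names, and the rebuild step simply splits the operand list in half.

```python
from dataclasses import dataclass
from typing import Optional

ASSOC = {"+", "*"}  # assumed set of associative and commutative operators


@dataclass
class Node:
    op: str                          # operator, or a leaf name such as "a"
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    preserved: bool = False          # value is used elsewhere, so keep this node


def collect(node, op, out):
    """Gather the operands of a run of `op`, stopping at leaves,
    at other operators, and at preserved nodes."""
    if node.left is not None and node.op == op and not node.preserved:
        collect(node.left, op, out)
        collect(node.right, op, out)
    else:
        out.append(rebalance(node))  # balance below the stopping point


def build(op, operands):
    """Rebuild a balanced tree over the collected operands."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return Node(op, build(op, operands[:mid]), build(op, operands[mid:]))


def rebalance(node):
    if node.left is None:
        return node                  # leaf
    if node.op not in ASSOC:
        # Non-associative root: keep it, only rebalance its children.
        return Node(node.op, rebalance(node.left), rebalance(node.right),
                    node.preserved)
    operands = []
    collect(node.left, node.op, operands)
    collect(node.right, node.op, operands)
    root = build(node.op, operands)  # always a fresh node (two or more operands)
    root.preserved = node.preserved
    return root


def height(node):
    if node.left is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


# Left-leaning chain a + b + ... + h, with (a + b) + c marked as preserved.
chain = Node("a")
for name in "bcdefgh":
    chain = Node("+", chain, Node(name))
    if name == "c":
        chain.preserved = True

print(height(chain))             # 7
print(height(rebalance(chain)))  # 4
```

In this sketch the preserved node keeps its own balanced subtree of height 2 and is then treated as a single operand of the enclosing sum, which is why the result has height 4 rather than the 3 that eight freely rearrangeable leaves would allow.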
Problem - unbalanced
• Although each preserved node has a balanced sub-tree, the whole tree isn’t very balanced.
• Note that preserved nodes with many leaves can be closer to the root.
Solution – Huffman Coding
• Give constants weight 0
• Give other leaves weight 1
• Give interior nodes weight by summing their leaves
• Put them all in a sorted worklist
• Take two lowest weight nodes out of the worklist until the worklist is a singleton
– Combine them in a subtree
– Weigh this interior node by summing its leaves, insert it in the worklist
• Weigh preserved nodes by summing subtrees
• Guarantees optimally balanced tree
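The worklist procedure above can be sketched with a binary heap as the sorted worklist. This is a minimal illustration under assumptions: trees are tuples `(op, left, right)`, integer leaves stand for constants, the preserved subtree `M` is passed in as a single operand with its summed weight, and the names `huffman_balance` and `leaf_weight` are hypothetical.

```python
import heapq
from itertools import count


def leaf_weight(leaf):
    """Constants (ints here, by assumption) weigh 0; other leaves weigh 1."""
    return 0 if isinstance(leaf, int) else 1


def huffman_balance(op, operands):
    """operands: list of (weight, tree) pairs. Returns (weight, tree)."""
    tie = count()  # unique tie-breaker so the heap never compares trees
    heap = [(w, next(tie), t) for w, t in operands]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two lowest-weight nodes and combine them in a subtree.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (op, t1, t2)))
    weight, _, tree = heap[0]
    return weight, tree


# A preserved subtree with three leaves enters the worklist with weight 3.
M = ("+", "x", ("+", "y", "z"))
operands = [(leaf_weight(v), v) for v in ["a", 2, "b", 3, "c"]] + [(3, M)]

weight, tree = huffman_balance("+", operands)
print(weight)  # 6
print(tree)
```

In this run the two constants are combined first, so they end up as siblings that constant propagation can fold, and the heavy preserved subtree `M` is merged last, which places it directly under the root.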
After marking the weights
Balancing according to Huffman
Is this better?
Another Example
Balanced
After constant propagation
Results
• Mixed
• Improves a few programs by a lot, but does not improve many programs, on the TRIPS simulator
• Huffman minimizes the weighted sum of leaf depths over the tree
• Baer and Bovet minimize the length of the critical path
• In practice, they often attain the same result for expression reduction
• For software fanout trees, Huffman seems to tolerate unknown latencies through the program better than Hartley and Casavant, which minimizes the length of the critical path given non-unit weights
Summary
• Reorganize trees of commutative and associative operations.
– Use Huffman coding to produce an overall balanced tree
– Increase ILP
– Decrease critical path length
– Group constants
Next Time
• P. Briggs, Register Allocation via Graph Coloring, PhD dissertation, Rice University, April 1992, Chapters 1, 2, 3, 6, 7, 8 & 9
• Skim and/or cherry pick depending on your interests