OOE vs. EPIC

Emily Evans

Prashant Nagaraddi

Lin Gu

Objective

Our objective is to evaluate the claims and counterclaims about out-of-order execution (OOE) and EPIC made in:
– “Is Out-of-Order Out of Date?” by William S. Worley and Jerry Huck (WW & JH below)
– “A Critical Look at IA-64” by Martin Hopkins (MH below)

Outline

– Analysis of ILP
– Analysis of Code Size
– Analysis of Hardware Complexity
– Analysis of Compiler Complexity
– Analysis of Power Consumption
– Comparison Methodology
– Conclusion

What is EPIC?

“One of our goals for EPIC was to retain VLIW's philosophy of statically constructing the POE [plan of execution], but to augment it with features, akin to those in a superscalar processor, that would permit it to better cope with these dynamic factors. The EPIC philosophy has the following key aspects to it.”

– “Providing the ability to design the desired POE at compile-time.”
– “Providing features that permit the compiler to "play" the statistics.”
– “Providing the ability to communicate the POE to the hardware.”

*From “EPIC: An architecture for instruction-level parallel processors” by Michael S. Schlansker and B. Ramakrishna Rau.

Analysis of ILP

MH: Hardware provides good ILP because it dynamically adjusts the instruction schedule based on the actual execution path and cache misses, with the use of:
– Large reorder buffers
– Register renaming
– Branch prediction
– Alias detection

WW & JH: The compiler can exploit ILP more effectively with the use of:
– Massive resources: a large register set and more functional units
– Predication
– Speculation
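
The predication item above can be illustrated with a small sketch. This is a toy Python model, not IA-64 code: the function names are hypothetical, and the arithmetic select only stands in for predicated instructions whose results are discarded when their predicate is false.

```python
def abs_diff_branchy(a: int, b: int) -> int:
    """Control-flow version: a branch decides which single path executes."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    """If-converted version: both arms are computed and a predicate
    selects the result, so no branch appears in the instruction stream."""
    p = int(a > b)        # the compare produces a predicate (1 or 0)
    t1 = a - b            # arm that would be guarded by p
    t2 = b - a            # arm that would be guarded by the complement of p
    return p * t1 + (1 - p) * t2  # keep only the arm whose predicate is true
```

The predicated version always performs both subtractions. That removes a branch the hardware might mispredict, at the cost of spending execution resources on work that is thrown away.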

Analysis of ILP (cont.)

Our observation:
– From the Hennessy and Patterson (H&P) book:
  – The SPECint benchmark shows that the Alpha 21264 and Pentium 4 considerably outperform the Itanium.
  – The SPECfp benchmark shows that the Itanium slightly outperforms the Alpha 21264 and Pentium 4.
– These benchmark results are not an absolute measurement of the performance of OOE and EPIC.
  – Different implementations of the architectures may perform differently.
  – As EPIC compilers improve over time, these performance figures will change.

Analysis of Code Size

MH: Code size for IA-64 could be as much as 4 times that of x86 to perform the same work.

WW & JH: Code size will be larger, but the instruction stream will contain fewer branches. Also, there are mechanisms to efficiently deliver instructions to the processor.

Analysis of Code Size (cont.)

Our observation:
– Both sides agree that code size increases overall; however, they disagree on the extent to which it affects performance.
– EPIC code size will expand dramatically in some cases.
– EPIC code size can also be smaller than OOE code size in some cases.
– We expect that a mature optimizing compiler will be able to deliver code of reasonable size. In any case, code size does not translate linearly into performance loss.
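
One source of the code growth discussed above is instruction encoding. IA-64 packs three 41-bit instruction slots and a 5-bit template into a 128-bit bundle, so any slot the compiler fills with a nop still occupies space. The sketch below works through that arithmetic; the function name and the example slot-fill rates are illustrative assumptions, not measurements.

```python
BUNDLE_BITS = 128      # size of one IA-64 instruction bundle
SLOTS_PER_BUNDLE = 3   # instruction slots per bundle
SLOT_BITS = 41         # bits per instruction slot
TEMPLATE_BITS = 5      # bits for the bundle template

# The three slots plus the template exactly fill the bundle.
assert SLOTS_PER_BUNDLE * SLOT_BITS + TEMPLATE_BITS == BUNDLE_BITS


def bytes_per_useful_instruction(useful_slots: float) -> float:
    """Average code bytes per useful (non-nop) instruction, given the
    average number of useful instructions per bundle (hypothetical input)."""
    if not 0 < useful_slots <= SLOTS_PER_BUNDLE:
        raise ValueError("useful_slots must be in (0, 3]")
    return (BUNDLE_BITS / 8) / useful_slots


full = bytes_per_useful_instruction(3.0)       # every slot does useful work
half_full = bytes_per_useful_instruction(1.5)  # half the slots are nops
```

With every slot used, each instruction costs 16/3 bytes (about 5.3); if only half the slots hold useful work, the cost doubles. How often that happens depends on the compiler and the program, which is exactly where the two sides disagree.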

Analysis of Hardware Complexity

MH: To support features for greater ILP, EPIC hardware will be quite complex.
– Predication requires more functional units
– NaT (Not a Thing) bits to allow deferring exceptions
– ALAT (Advanced Load Address Table) to allow loads before stores

WW & JH: IA-64 makes the hardware less complex because it is not responsible for detecting and scheduling the parallelism.
– It can do without a reorder buffer, register renaming, etc.
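
To make the ALAT item above concrete, here is a toy model of the idea behind it. The class and method names are hypothetical and the model is heavily simplified: it matches whole addresses only and consumes an entry when it is checked. In IA-64 terms, `advanced_load` loosely corresponds to `ld.a` and `check` to `ld.c` / `chk.a`.

```python
class ToyALAT:
    """Toy advanced load address table: tracks loads that were hoisted
    above stores and notices when a later store hits the same address."""

    def __init__(self):
        self.entries = {}  # target register name -> address that was loaded

    def advanced_load(self, reg, addr, memory):
        """Load early (before preceding stores) and remember the address."""
        self.entries[reg] = addr
        return memory[addr]

    def store(self, addr, value, memory):
        """Perform a store and invalidate any advanced load it aliases."""
        memory[addr] = value
        stale = [r for r, a in self.entries.items() if a == addr]
        for r in stale:
            del self.entries[r]

    def check(self, reg):
        """Return True if the early-loaded value is still valid.
        False means the load must be redone (recovery code)."""
        return self.entries.pop(reg, None) is not None
```

A load to address A hoisted above a store to address B passes the check when A and B differ, and fails it when they are the same. The table and its matching logic are the extra hardware that MH counts against EPIC's claimed simplicity.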

Analysis of Hardware Complexity (cont.)

Our observation:
– Is an EPIC processor more complex than an OOE processor?
  – Example: Alpha 21264, two stages fewer (but more stages don't necessarily mean more complexity)
– As mentioned in the H&P book, good techniques from the ‘enemy camp’ are often borrowed. EPIC processors are expected to be simple. However, to support better ILP, they also rely on hardware support, which makes them more complex than expected.

Analysis of Compiler Complexity

MH: It is very difficult to write a good EPIC compiler. Profiling is also a burden:
– Not welcomed by programmers
– Hard to get and maintain a test suite
– Formidable task for large programs

WW & JH: OOE compilers are difficult to write as well.
– OOE processors still need good compilers to ensure performance gains.
– OOE compiler writers must understand the limitations of the hardware and figure out how to work around them.
– Code profiling is only “slightly” more important for EPIC processors.

Analysis of Compiler Complexity (cont.)

Our observation:
– An optimizing compiler can help performance for both OOE and EPIC processors.
– Profiling, which is a non-trivial task, adds complexity to the compiler.
– An EPIC compiler has much more responsibility than an OOE compiler, so it is likely to be more complex.
– The EPIC philosophy aims to trade compiler complexity for hardware simplicity. Whether this is a critical disadvantage must be considered in the context of overall system complexity and performance.

Analysis of Power Consumption

MH: Massive resources consume lots of power.
– “Thus, IA-64 gambles that, in the future, power will not be the critical limitation, …”

WW & JH: They left this issue out, perhaps because they do not think it is a big problem.

Analysis of Power Consumption (cont.)

Our observation:
– The use of massive resources is likely to consume more power.
– Whether or not this will be a problem depends on the target application area of the EPIC technology.
  – For servers and high-end workstations, power consumption is not as important.
  – For embedded systems, power consumption is likely a very critical issue.
– For EPIC to really be a ‘general purpose’ technology, power consumption control must be considered.

Comparison Methodology

MH: Accumulating “facts” supporting a skeptical view of EPIC.
– Example: EPIC stalls when OOE proceeds

WW & JH: Accumulating “facts” supporting an optimistic view of EPIC.
– Example: Dynamic translation

Architecture design is a balance of CPI, frequency, instruction count, application limitation, and cost. There are always cases and countercases for every solution. They need to be considered in an integrated context.
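
The balance among CPI, frequency, and instruction count mentioned above follows the standard relation: execution time = instruction count × CPI / clock frequency. The sketch below shows why no single factor settles the argument. The function name and all numbers are made up for illustration and do not describe any real processor.

```python
def execution_time(instruction_count: float, cpi: float, frequency_hz: float) -> float:
    """Execution time in seconds: instructions x cycles per instruction / clock rate."""
    return instruction_count * cpi / frequency_hz


# Hypothetical design A: fewer instructions, higher CPI.
time_a = execution_time(instruction_count=1.0e9, cpi=1.0, frequency_hz=1.0e9)

# Hypothetical design B: 30% more instructions (e.g. from predication or
# speculation overhead), but a lower CPI at the same clock rate.
time_b = execution_time(instruction_count=1.3e9, cpi=0.7, frequency_hz=1.0e9)
```

In this made-up case design B finishes first even though it executes more instructions. Changing the CPI or the clock rate slightly would reverse the outcome, which is why isolated examples from either side prove little.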

Comparison Methodology (cont.)

EPIC stalls when OOE proceeds
– This will happen in some cases.
– But we must determine how much this case actually hurts performance.
  – A cache miss is not the common case.
  – Speculation makes this case even less common.
  – On a cache miss, OOE is also not expected to proceed far enough to hide the whole miss.
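
The argument above can be turned into a rough back-of-the-envelope model. The sketch below uses the usual memory-stall form, CPI = base CPI + memory references per instruction × miss rate × exposed miss penalty, and adds an `overlap` parameter for the fraction of a miss that the processor hides with other work. The `overlap` parameter, the function name, and every number are assumptions for illustration, not data from either paper.

```python
def effective_cpi(base_cpi: float, mem_refs_per_instr: float,
                  miss_rate: float, miss_penalty: float,
                  overlap: float = 0.0) -> float:
    """CPI including cache-miss stalls. overlap is the fraction of each
    miss penalty hidden by useful work (0.0 means a full stall)."""
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    exposed_penalty = miss_penalty * (1.0 - overlap)
    return base_cpi + mem_refs_per_instr * miss_rate * exposed_penalty


# Hypothetical in-order machine that stalls for the whole miss.
cpi_stall = effective_cpi(1.0, 0.3, 0.02, 100.0, overlap=0.0)

# Hypothetical out-of-order machine that hides a quarter of each miss.
cpi_overlap = effective_cpi(1.0, 0.3, 0.02, 100.0, overlap=0.25)
```

The gap between the two results shrinks as the miss rate falls or as the overlap gets smaller, which is the substance of the three sub-points above. Settling the question requires measured values for these parameters.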

Comparison Methodology (cont.)

Dynamic translation
– It rarely gives much performance gain with highly optimized code.
– Dynamo example:

Conclusion

– Both papers make claims about the EPIC architecture without providing any quantitative evidence.
– Quantitative evidence is necessary to conclude that one architecture is superior to another.
– EPIC is a useful effort in the exploration of higher ILP.
  – When evaluating it, we need to separate the usefulness of the architectural approach from any single specific implementation of it.

The idea behind EPIC is good, but more time, effort, and calm calculation are needed to know whether it works.