31
Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Embed Size (px)

DESCRIPTION

#1 – Reporting Results Reporting “peak” performance (Capable of 4 Instructions Per Cycle) Solution: Report actual performance on actual programs

Citation preview

Page 1: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Performance9 ways to fool the public

Old Chapter 4New Chapter 1.4

Page 2: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#1 – Reporting Results

• Reporting “peak” performance (Capable of 4 Instructions Per Cycle)

Page 3: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#1 – Reporting Results

• Reporting “peak” performance (Capable of 4 Instructions Per Cycle)

• Solution: Report actual performance on actual programs

Page 4: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#2 - Choosing Applications

• Choose programs that run quickly but do not represent real world– That you know run faster on your machine

than your competitor’s

Page 5: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#2 - Choosing Applications

• Choose programs that run quickly but do not represent real world

• Solution: Form a committee that creates benchmarks

• Benchmarks should:– Representative of real world– Expose performance aspects of machine

Page 6: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#3 - Choosing Applications

• Stuff the benchmark body with company members and do not invite your competitor

Page 7: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#3 - Choosing Applications

• Stuff the benchmark body with company members and do not invite your competitor– Intel did this to AMD

• Solution: Make sure lots of companies are represented on the committee

Page 8: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Benchmark suites

• SPEC-fp – scientific codes• SPEC-int – integer codes (irregular, small

codes)• TPC – database benchmarks• EEMBC – embedded proc benchmarks• Many, many others (Java, Networks,

media, …)

Page 9: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#4 - Running Applications

• Include flags in your compiler that hand-optimize specific benchmarks

• Solution:

Page 10: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#4 - Running Applications

• Include flags in your compiler that hand-optimize specific benchmarks

• Solution: Specify how much optimization is allowed and/or have independent bodies perform tests– http://www.tomshardware.com– http://www.anandtech.com

Page 11: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#5 – Reporting Results

• Selectively reporting results

• Solution:

Page 12: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#5 – Reporting Results

• Selectively reporting results– Apple vs Intel

• Solution: Report suite performance, not just individual benchmarks

• Speedup of X over Y =

Page 13: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#5 – Reporting Results

• Selectively reporting results– Apple vs Intel

• Solution: Report suite performance, not just individual benchmarks

• Speedup of X over Y = ExecutionTimeY / ExecutionTimeX

Page 14: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Speedup

Comp A Comp B Comp C

P1 1 10 20P2 1000 100 20

Total 1001 110 40

• On P1, A is ____ x’s as fast as B• On P1, B is ____ x’s as fast as C

Page 15: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Speedup

Comp A Comp B Comp C

P1 1 10 20P2 1000 100 20

• On P1, A is 10 x’s as fast as B• On P1, B is 2 x’s as fast as C

Page 16: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#6 – Floating 0 on graph

60

65

70

75

80

85

90

1st Qtr 2nd Qtr 3rd Qtr 4th Qtr

EastWestNorth

Page 17: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#6 – Normalized speedup, 0 to 1…

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1st Qtr 2nd Qtr 3rd Qtr 4th Qtr

EastWestNorth

All execution times are divided by North’s execution time to obtain speedup

Page 18: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#7 – Reporting Results

• Manipulating average results

• Solution:

Page 19: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

SpeedupComp A Comp B Comp C

P1 1 10 20P2 1000 100 20

Arithmetic Mean

Geometric Mean

Weighted Mean

(60/40)

Page 20: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#7 – Reporting Results

• Manipulating average results

• Solution: Use geometric mean or weighted means rather than arithmetic means (average)

Page 21: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#8 – Reporting Results

• Reporting speedup of a small portion of code

• Solution:

Page 22: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#8 – Reporting Results

• Reporting speedup of a small portion of code

• Solution: Use Amdahl’s Law

Page 23: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Amdahl’s Law

• Speedup =

Page 24: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Amdahl’s Law

• Helps target design effort• Optimize the common case• Allow rare cases to suffer

Page 25: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Amdahl’s Law ExampleSuppose multiplication operations constitute

80% of the execution of a program. How much improvement do we need in the multiply to get a 3x speedup in the program?

What is the most possible speedup by improving only the multiply?

Page 26: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#9 – Misusing alternate metrics

• IPC = Instructions per Cycle• CPI = Cycles per Instruction• MIPS = Millions of Instructions per Second

Page 27: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

CPU time =

• #Insts * Clock cycle time * CPI• #Inst * CPI / Clock rate

• (go to board)

Page 28: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Using Execution Time

• Our old 1GHz machine runs our program in 20 seconds. We are designing a new machine, and the target execution time is 10 seconds. Through various architectural innovations, we can increase clock rate. Unfortunately, this increase comes at a price – we can pump the clock rate up to 2GHz, but the new execution takes 1.2 times as many cycles. Is this enough? What is the clock rate necessary for our performance target?

Page 29: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

#9 – Misusing alternate metrics

• IPC = Instructions per Cycle• CPI = Cycles per Instruction• MIPS = Millions of Instructions per Second• Clock Rate = Cycles / Second

• Solution: Only use alternate metrics when other elements of execute time equation are equal.

Page 30: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Using CPIWe have two code sequences involving

recalculation of the same values. The compiler writer can choose between having a code sequence of 7 instructions to recalculate three values, or temporarily storing them in the stack.

What are the total cycles of the code sequences?

Which should the compiler do?

Page 31: Performance 9 ways to fool the public Old Chapter 4 New Chapter 1.4

Code Sequence Loads Stores ALU Ops

1 2 1 15

2 5 4 8

Instruction Class Average CPI for Instruction ClassLoads 2Stores 1.5

ALU Ops 1

Machine performance characteristics:

Program characterstics