Power calculation for transistor operation
• What will cause power consumption to increase?
CS2710 Computer Organization 1
$$P = \text{CapacitiveLoad} \times \text{Voltage}^2 \times \text{ClockFrequency}$$
or
$$P = C V^2 f_{clock}$$
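As a rough illustration of the power formula, the sketch below evaluates P = C V² f for a baseline and then doubles the frequency and the voltage in turn. The capacitance, voltage, and frequency values are assumptions chosen only for illustration; they are not measurements from these slides.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Dynamic power in watts: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz

# Hypothetical baseline values, chosen only for illustration.
C = 1e-9   # capacitive load in farads (assumed)
V = 5.0    # supply voltage in volts (assumed)
F = 16e6   # clock frequency in hertz (assumed)

base = dynamic_power(C, V, F)
double_f = dynamic_power(C, V, 2 * F)  # frequency doubled, voltage unchanged
double_v = dynamic_power(C, 2 * V, F)  # voltage doubled, frequency unchanged

print(f"baseline power:  {base:.3f} W")
print(f"2x frequency ->  {double_f / base:.1f}x power")
print(f"2x voltage   ->  {double_v / base:.1f}x power")
```

Power grows linearly with capacitive load and clock frequency but with the square of the voltage, so voltage is the strongest lever of the three.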
[Figure: ATMEGA 32 Current versus Crystal Frequency. x-axis: Crystal Frequency (Hz); y-axis: Microprocessor Current (mA). Series: Current, with linear fit f(x) = 1.37064407211569E-06 x + 21.4060648531567, R² = 0.992074254384158]
$$P = C V^2 f_{clock}$$
Also
$$P = I V$$
Thus
$$I = C V f_{clock}$$
Measuring the current drawn by the ATmega microprocessor shows a linear relationship between current and clock frequency.
Note: V = 5 V in this case.
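The linear fit reported on the plot can be combined with I = C V f to estimate an effective capacitive load. The sketch below is a rough reading of the fit and not part of the original measurement: it assumes the slope of the fit equals C·V, and the 8 MHz evaluation point is an arbitrary example frequency. The fit's nonzero intercept (about 21.4 mA) is not explained by I = C V f and is left as-is.

```python
# Trend-line coefficients from the plot: current (mA) versus crystal frequency (Hz).
SLOPE_MA_PER_HZ = 1.37064407211569e-06
INTERCEPT_MA = 21.4060648531567
V_SUPPLY = 5.0  # volts, as noted on the slide

def fitted_current_ma(f_hertz):
    """Current in mA predicted by the linear fit at a given clock frequency."""
    return SLOPE_MA_PER_HZ * f_hertz + INTERCEPT_MA

# Assumption: slope = C * V (from I = C * V * f), so C = slope / V.
# The slope is converted from mA/Hz to A/Hz before dividing by the voltage.
effective_c_farads = (SLOPE_MA_PER_HZ * 1e-3) / V_SUPPLY

print(f"fitted current at 8 MHz: {fitted_current_ma(8e6):.2f} mA")
print(f"effective capacitive load: {effective_c_farads * 1e12:.0f} pF")
```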
[Figure: Power versus Microprocessor Voltage. x-axis: Microprocessor Voltage; y-axis: Microprocessor Power (mW). Series: Power, with power-law fit f(x) = 3.42804303590267 x^2.58744148306454, R² = 0.998420723836924]
What effect does increasing voltage to a microprocessor have on power? On speed?
Below around 2.5 V (for this microprocessor), the transistors simply stop working.
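To get a feel for the voltage question, the sketch below evaluates the power-law trend line reported on the power-versus-voltage plot and compares it with the ideal V² scaling from P = C V² f. The choice of 2.5 V and 5 V as comparison points is an assumption made for illustration.

```python
# Trend-line coefficients from the plot: power (mW) versus supply voltage (V).
COEFF = 3.42804303590267
EXPONENT = 2.58744148306454

def fitted_power_mw(v_volts):
    """Power in mW predicted by the power-law fit at a given supply voltage."""
    return COEFF * v_volts ** EXPONENT

# Doubling the voltage from 2.5 V to 5 V: fitted scaling versus ideal V^2 scaling.
ratio_fitted = fitted_power_mw(5.0) / fitted_power_mw(2.5)
ratio_ideal = (5.0 / 2.5) ** 2

print(f"fitted power at 5 V: {fitted_power_mw(5.0):.0f} mW")
print(f"power ratio for 2x voltage (fit):   {ratio_fitted:.2f}")
print(f"power ratio for 2x voltage (ideal): {ratio_ideal:.2f}")
```

The fitted exponent is above 2, so on this part the measured power rises somewhat faster with voltage than the V² term alone would predict.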
The Power Wall: Why haven’t clock rates continued to increase at historical rates?
Manufacturers have turned to multi-core architectures to bypass the Power Wall
Clock speeds decrease, but overall performance increases
Lecture Objectives:
Benchmarking
1) Explain the SPEC benchmarks.
2) Define Amdahl's law.
3) Define MIPS.
Amdahl’s Law (p51)
• The performance enhancement possible with a given improvement is limited by the amount that the improved feature is used
$$\text{ExecutionTime}_{Improved} = \frac{\text{ExecutionTime}_{AffectedByImprovement}}{\text{AmountOfImprovement}} + \text{ExecutionTime}_{Unaffected}$$
Amdahl’s Law Applied
• A program spends 40 seconds performing network transfers and 60 seconds generating reports.
– Suppose we could rewrite the report generator to make it more efficient.
– What improvement in performance in the report generator would be necessary to increase the overall speed of the program by a factor of 2?
– How about by a factor of 3?
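One way to work this exercise is to solve Amdahl's law for the amount of improvement. The sketch below does that for the 40 s / 60 s split; the function name `required_improvement` is a hypothetical helper introduced for illustration.

```python
def required_improvement(affected_s, unaffected_s, target_speedup):
    """Factor by which the affected part must speed up to reach the target
    overall speedup, or None if the target cannot be reached."""
    target_time = (affected_s + unaffected_s) / target_speedup
    remaining = target_time - unaffected_s  # time budget left for the improved part
    if remaining <= 0:
        return None  # the unaffected part alone already exceeds the target time
    return affected_s / remaining

two_x = required_improvement(60, 40, 2)    # report generator: 60 s, network: 40 s
three_x = required_improvement(60, 40, 3)
max_speedup = (60 + 40) / 40               # limit as the report time approaches 0

print("improvement needed for 2x overall:", two_x)
print("improvement needed for 3x overall:", three_x)
print("upper bound on overall speedup:", max_speedup)
```

A 2x overall speedup needs the report generator to run 6 times faster (50 s = 40 s + 60 s / 6). A 3x overall speedup is out of reach, because the 40 s of network transfers alone exceed the 33.3 s target; the best possible overall speedup is 2.5x.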
A Performance Metric: MIPS
Units: millions of instructions per second
$$\text{MIPS} = \frac{\text{InstructionCount}}{\text{ExecutionTime} \times 10^6}$$
Issues with MIPS metrics
1. Measures instruction execution rate, but doesn’t consider the complexity of the instructions performed
2. Average instruction complexity varies between programs executing on a single computer
3. Different microprocessors implement instructions of differing complexities
• MIPS may vary independently from performance
• We cannot compare computers with different instruction sets using MIPS!
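The points above can be illustrated with a small calculation. In this sketch the two machines, their instruction counts, and their execution times are all hypothetical numbers invented for illustration: machine X runs the task with many simple instructions, machine Y with fewer, more complex ones.

```python
def mips(instruction_count, execution_time_s):
    """MIPS = InstructionCount / (ExecutionTime * 10^6)."""
    return instruction_count / (execution_time_s * 1e6)

# Hypothetical figures for the same task on two machines with different instruction sets.
x_instructions, x_time_s = 10e9, 5.0  # machine X: many simple instructions (assumed)
y_instructions, y_time_s = 4e9, 4.0   # machine Y: fewer, complex instructions (assumed)

x_mips = mips(x_instructions, x_time_s)
y_mips = mips(y_instructions, y_time_s)

print(f"machine X: {x_mips:.0f} MIPS, finishes in {x_time_s} s")
print(f"machine Y: {y_mips:.0f} MIPS, finishes in {y_time_s} s")
```

Machine X reports twice the MIPS rating of machine Y, yet machine Y finishes the task sooner, which is why MIPS alone cannot rank computers with different instruction sets.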
Benchmarking: How do you decide which computer to buy?
SPEC Benchmark
• A set of programs used to measure performance
– Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
– Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
– Elapsed time to execute a selection of programs
  • Negligible I/O, so focuses on CPU performance
– Normalize relative to reference machine
– Summarize as geometric mean of performance ratios
  • CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
Geometric vs. Arithmetic Mean
• Arithmetic mean: $$\frac{1}{n}\sum_{i=1}^{n} x_i$$
• Geometric mean: $$\sqrt[n]{\prod_{i=1}^{n} x_i}$$
Which computer has better overall performance?
            Computer A   Computer B   Computer C
Program 1            1           10           20
Program 2         1000          100           20
Which computer has better overall performance?
                  Computer A   Computer B   Computer C
Program 1                  1           10           20
Program 2               1000          100           20
Arithmetic mean        500.5           55           20
Geometric mean    31.622 . . . 31.622 . . .         20
A is fastest via Arithmetic mean.
A and B are tied via Geometric mean.
Geometric mean is the appropriate mean when the ranges of the values being compared vary significantly.
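The two means in the table can be reproduced with a few lines of code. This is a minimal sketch; the dictionary layout and function names are illustrative choices, and the values are treated as performance scores where higher is better, as the slides do.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1 / len(values))

# Performance values for Program 1 and Program 2 on each computer.
performance = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}

means = {
    name: (arithmetic_mean(vals), geometric_mean(vals))
    for name, vals in performance.items()
}

for name, (am, gm) in means.items():
    print(f"Computer {name}: arithmetic mean = {am:.1f}, geometric mean = {gm:.3f}")
```

The single large value of 1000 dominates A's arithmetic mean, while the geometric mean weighs the two programs by their ratios and puts A and B level.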
Benchmarking often computes performance relative to a standard reference
            Computer A   Computer B   Computer C
Program 1            1           10           20
Program 2         1000          100           20
Let’s say A is the “reference” computer. We adjust all performance values by dividing each value by the reference computer’s value. In this example, we divide all results for Program 2 by the reference computer’s performance value of 1000, giving:
            Computer A (reference)   Computer B   Computer C
Program 1                        1           10           20
Program 2                        1          0.1         0.02
Scaling the results in this manner is called normalization. Note that no normalization was needed for Program 1 since the reference computer’s value was already 1.
Arithmetic and Geometric means based on the normalized values:
                  Computer A   Computer B   Computer C
Program 1                  1           10           20
Program 2                  1          0.1         0.02
Arithmetic mean            1         5.05        10.01
Geometric mean             1            1   0.632 . . .
Now C is fastest via Arithmetic mean!
A and B are still tied via Geometric mean.
Now consider computer B to be the “reference” computer and normalize A and C w.r.t. B
                  Computer A   Computer B (reference)   Computer C
Program 1                0.1                        1            2
Program 2                 10                        1          0.2
Arithmetic mean         5.05                        1          1.1
Geometric mean             1                        1        0.632
Now A is fastest via Arithmetic mean!
A and B are still tied via Geometric mean.
The Geometric mean is consistent regardless of normalization!
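The consistency claim can be checked numerically. The sketch below normalizes the same performance table against each computer in turn and recomputes both means; the helper names are hypothetical, and using C as a third reference goes beyond the two cases shown on the slides.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

def normalize(performance, reference):
    """Divide each program's value by the reference computer's value for that program."""
    ref = performance[reference]
    return {
        name: [v / r for v, r in zip(vals, ref)]
        for name, vals in performance.items()
    }

performance = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}

geo = {}    # geometric means, keyed by reference computer
arith = {}  # arithmetic means, keyed by reference computer
for reference in performance:
    normalized = normalize(performance, reference)
    geo[reference] = {n: geometric_mean(v) for n, v in normalized.items()}
    arith[reference] = {n: arithmetic_mean(v) for n, v in normalized.items()}
    best = max(arith[reference], key=arith[reference].get)
    print(f"reference {reference}: best by arithmetic mean = {best}, "
          f"geometric C/A ratio = {geo[reference]['C'] / geo[reference]['A']:.3f}")
```

The winner by arithmetic mean changes with the reference, while the ratio between any two geometric means stays the same (A and B tied, C at about 0.632 of either), which is the sense in which the geometric mean is consistent.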
The SPECjvm2008 application
– SPECjvm2008 is a benchmark suite for measuring the performance of a Java Runtime Environment (JRE), containing several real-life applications and benchmarks focusing on core Java functionality.
– The SPECjvm2008 workload mimics a variety of common general purpose application computations.
CINT2006 integer performance benchmarks for the Opteron X4 2356
• Highlight the high cache misses of a few, and how this impacts the overall performance
• Highlight how different apps might be useful for different teams
Name        Description                    IC×10⁹    CPI  Tc (ns)  Exec time (s)  Ref time (s)  SPECratio
perl        Interpreted string processing   2,118   0.75     0.40            637         9,777       15.3
bzip2       Block-sorting compression       2,389   0.85     0.40            817         9,650       11.8
gcc         GNU C Compiler                  1,050   1.72     0.40            724         8,050       11.1
mcf         Combinatorial optimization        336  10.00     0.40          1,345         9,120        6.8
go          Go game (AI)                    1,658   1.09     0.40            721        10,490       14.6
hmmer       Search gene sequence            2,783   0.80     0.40            890         9,330       10.5
sjeng       Chess game (AI)                 2,176   0.96     0.40            837        12,100       14.5
libquantum  Quantum computer simulation     1,623   1.61     0.40          1,047        20,720       19.8
h264avc     Video compression               3,102   0.80     0.40            993        22,130       22.3
omnetpp     Discrete event simulation         587   2.94     0.40            690         6,250        9.1
astar       Games/path finding              1,082   1.79     0.40            773         7,020        9.1
xalancbmk   XML parsing                     1,058   2.70     0.40          1,143         6,900        6.0
Geometric mean                                                                                       11.7
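The columns of the table are related by Exec time = IC × CPI × Tc and SPECratio = Ref time / Exec time, and the final row is the geometric mean of the SPECratio column. The sketch below checks these relations using the perl row and the listed SPECratio values; small differences from the table are expected because the table entries are rounded.

```python
import math

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

# SPECratio column of the CINT2006 table.
spec_ratios = {
    "perl": 15.3, "bzip2": 11.8, "gcc": 11.1, "mcf": 6.8,
    "go": 14.6, "hmmer": 10.5, "sjeng": 14.5, "libquantum": 19.8,
    "h264avc": 22.3, "omnetpp": 9.1, "astar": 9.1, "xalancbmk": 6.0,
}

# perl row: instruction count, cycles per instruction, clock cycle time in seconds.
perl_exec_time_s = 2118e9 * 0.75 * 0.40e-9  # IC * CPI * Tc
perl_spec_ratio = 9777 / 637                # Ref time / Exec time, both as listed

overall = geometric_mean(list(spec_ratios.values()))

print(f"perl execution time from IC*CPI*Tc: {perl_exec_time_s:.1f} s (table lists 637)")
print(f"perl SPECratio from listed times:   {perl_spec_ratio:.1f}")
print(f"geometric mean of SPECratios:       {overall:.1f}")
```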
SPEC and power: ssj_ops (server-side Java operations/sec)
• Power consumption of server at different workload levels
– Performance: ssj_ops/sec
– Power: Watts (Joules/sec)
$$\text{Overall ssj\_ops per Watt} = \left(\sum_{i=0}^{10} \text{ssj\_ops}_i\right) \Big/ \left(\sum_{i=0}^{10} \text{power}_i\right)$$
A Power benchmark: SPEC Power versus load
SPECpower_ssj2008 for X4
Target Load %       Performance (ssj_ops/sec)   Average Power (Watts)
100%                                  231,867                     295
90%                                   211,282                     286
80%                                   185,803                     275
70%                                   163,427                     265
60%                                   140,160                     256
50%                                   118,324                     246
40%                                    92,035                     233
30%                                    70,500                     222
20%                                    47,126                     206
10%                                    23,066                     180
0%                                          0                     141
Overall sum                         1,283,590                   2,605
∑ssj_ops / ∑power                                                 493
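The overall figure is the sum of the performance column divided by the sum of the power column. The sketch below recomputes it from the table; the 40% load entry is read as 92,035, which is the value consistent with the stated overall sum of 1,283,590.

```python
# Performance (ssj_ops/sec) and average power (watts) at target loads 100%, 90%, ..., 0%.
ssj_ops = [231867, 211282, 185803, 163427, 140160, 118324,
           92035, 70500, 47126, 23066, 0]
power_w = [295, 286, 275, 265, 256, 246, 233, 222, 206, 180, 141]

total_ops = sum(ssj_ops)
total_power = sum(power_w)
overall_ops_per_watt = total_ops / total_power

print(f"sum of ssj_ops: {total_ops:,}")
print(f"sum of power:   {total_power:,} W")
print(f"overall ssj_ops per watt: {overall_ops_per_watt:.0f}")
```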
Low power at low usage? No!
• Look back at X4 power benchmark
– At 100% load: 295W
– At 50% load: 246W (83%)
– At 10% load: 180W (61%)
• Google data center
– Mostly operates at 10% – 50% load
– At 100% load less than 1% of the time
• Future research/development: Design processors to make power proportional to load
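The gap between measured power and load-proportional power can be made concrete with the X4 figures quoted above. In this sketch the "proportional" column is a hypothetical ideal in which power scales directly with load; it is not a measurement.

```python
PEAK_W = 295  # measured power at 100% load

# Measured average power (watts) at selected target loads (percent).
measured_w = {100: 295, 50: 246, 10: 180, 0: 141}

def share_of_peak(load_percent):
    """Measured power at this load as a percentage of peak power."""
    return 100 * measured_w[load_percent] / PEAK_W

for load in measured_w:
    ideal_w = PEAK_W * load / 100  # hypothetical power if it were proportional to load
    print(f"load {load:3d}%: measured {measured_w[load]} W "
          f"({share_of_peak(load):.0f}% of peak), proportional ideal {ideal_w:.0f} W")
```

Even when idle, the server draws close to half of its peak power, so a machine that spends most of its time at 10% to 50% load uses far more energy than a load-proportional design would.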