CAECW 2008 -- Salt Lake City -- Veazey & Gaither
Varying Memory Size with TPC-C
Performance and Resource Effects
Jay Veazey and Blaine Gaither
Hewlett-Packard
[email protected]@hp.com
Motivation --- why is this interesting?
• More memory increases performance
  → How much?
  → Why exactly?
  → Reveal and quantify the underlying causes
• Focus is R&D tradeoffs
  → Performance, cost, schedule, power
  → How much memory to design into a commercial server?
  → Is memory latency more important than memory size?
Experimental Design
• Vary memory 32-192 GBytes
  → Measure
    • Throughput
    • Resource utilization
      – CPU, disk I/O, memory BW, CPI, OS context switches
• HP Integrity rx6600
  → Itanium 2 9050 CPUs (2S/4C)
  → About 750 disk drives
• TPC-C
  → Resource intensive
  → Standard, “coin of the realm”…easy to communicate
  → Unofficial results
Throughput
[Figure: Throughput vs Memory Size -- TPC-C SQL. x-axis: GBytes Memory (0 to 256); y-axis: throughput (120000 to 240000)]
• Throughput increases 48% as memory grows from 32 to 192 GBytes
Resource Utilization
• I/O reduction accounts for 20% of the 48% throughput improvement.
• Where’s the rest of it?
Disk I/O and CPU utilization

GB Mem   thruput   CPU Util.   IOs / sec   Relative thruput   approx. % insts. devoted to I/O
    32   149,934       99.7%      71,068               1.00                               31%
    64   173,017       99.0%      58,907               1.15                               24%
    96   184,716       99.7%      50,574               1.23                               20%
   128   196,521       99.5%      44,397               1.31                               17%
   192   221,289       99.9%      29,422               1.48                               11%
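The relative-throughput column can be re-derived from the raw figures. The sketch below uses only the throughput and IOs / sec values from the table; the names `ROWS` and `relative_to_baseline` are illustrative and do not come from the slides.

```python
# Rows copied from the "Disk I/O and CPU utilization" table:
# (GB memory, throughput, I/Os per second)
ROWS = [
    (32, 149_934, 71_068),
    (64, 173_017, 58_907),
    (96, 184_716, 50_574),
    (128, 196_521, 44_397),
    (192, 221_289, 29_422),
]


def relative_to_baseline(rows):
    """Return (GB, relative throughput, relative I/O rate) vs. the first row."""
    _, base_tput, base_ios = rows[0]
    return [(gb, tput / base_tput, ios / base_ios) for gb, tput, ios in rows]


if __name__ == "__main__":
    for gb, rel_tput, rel_ios in relative_to_baseline(ROWS):
        print(f"{gb:>4} GB  throughput x{rel_tput:.2f}  I/O rate x{rel_ios:.2f}")
```

Rounded to two places, the throughput ratios match the table's relative-thruput column, and the I/O rate at 192 GBytes comes out at about 0.41 of the 32 GByte rate.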
CPI and Memory
• As memory is added, CPU cycles are used more efficiently
• But this is an effect, not a cause---why does CPI fall?
[Figure: CPI vs Memory Size. x-axis: GBytes Memory (0 to 256); y-axis: CPI (1.30 to 1.60)]
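One way to see why CPI matters for throughput is a standard identity, given here as a sketch and not as the authors' own model. The symbols are assumptions introduced for illustration: $T$ is throughput, $f$ the clock rate, $N$ the number of cores, $U$ the CPU utilization, and $P$ the instruction pathlength per transaction.

```latex
T = \frac{f \, N \, U}{\mathrm{CPI} \cdot P}
\qquad\Longrightarrow\qquad
\frac{T_2}{T_1} = \frac{\mathrm{CPI}_1}{\mathrm{CPI}_2} \cdot \frac{P_1}{P_2}
\quad \text{(when } f,\ N,\ U \text{ are unchanged)}
```

Because measured CPU utilization stays above 99% at every memory size, a throughput gain under this identity splits into a CPI factor and a pathlength factor.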
CPI and Memory Bandwidth
• CPI can change for many reasons, most irrelevant here
• Memory accesses are relevant
  – When a load misses cache, the delay counts toward CPI
[Figure: Memory Size vs Bus BW. x-axis: GBytes Memory (0 to 256); y-axis: Bus BW -- Mbytes / sec (2,600 to 3,200)]
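The way cache-miss delay counts toward CPI can be written with a common textbook decomposition. This is a simplified sketch that ignores any overlap between misses and useful execution; the symbols are assumptions for illustration: $\mathrm{CPI}_{\text{core}}$ is the CPI with a perfect memory system, $m_i$ is misses per instruction at cache level $i$, and $p_i$ is the stall penalty in cycles for a miss at that level.

```latex
\mathrm{CPI} = \mathrm{CPI}_{\text{core}} + \sum_{i} m_i \, p_i
```

Under this model, fewer misses per instruction at any level lowers CPI directly, with misses that go all the way to memory carrying the largest penalty.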
Caches Stabilize with Increasing Memory
• Units normalized for throughput
  – accesses (or misses) / sec / CPU / tpmC
• L1 accesses imply that the registers also stabilize
memory   L1 accesses   L1 misses   L2 misses   L3 misses
    32          6901        1549         183          22
    64          6219        1377         155          19
    96          5943        1297         139          17
   128          5683        1232         127          16
   192          5122        1095         109          14
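The size of the drop in each column can be quantified directly from the table. The sketch below computes the percent reduction between the 32 and 192 GByte rows; the names `CACHE_ROWS`, `COLUMNS`, and `percent_reduction` are illustrative and not from the slides.

```python
# Rows copied from the cache table, in the slide's normalized units:
# (GB memory, L1 accesses, L1 misses, L2 misses, L3 misses)
CACHE_ROWS = [
    (32, 6901, 1549, 183, 22),
    (64, 6219, 1377, 155, 19),
    (96, 5943, 1297, 139, 17),
    (128, 5683, 1232, 127, 16),
    (192, 5122, 1095, 109, 14),
]
COLUMNS = ("L1 accesses", "L1 misses", "L2 misses", "L3 misses")


def percent_reduction(rows):
    """Percent drop in each column between the first and last rows."""
    first, last = rows[0], rows[-1]
    return {
        name: 100.0 * (1.0 - last[i] / first[i])
        for i, name in enumerate(COLUMNS, start=1)
    }


if __name__ == "__main__":
    for name, pct in percent_reduction(CACHE_ROWS).items():
        print(f"{name:<12} down {pct:.1f}%")
```

Every column falls by roughly a quarter or more over the memory range, with the L2 and L3 miss counts dropping proportionally more than L1 accesses.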
OS Thread Switches and Memory
• Reduced thread switches are probably the cause of register / cache stabilization --- working sets stay around longer
[Figure: Thread Switches vs. Memory Size. x-axis: GBytes Memory (0 to 256); y-axis: thread switches / tpmC (3000 to 8000)]
Summary and Conclusions
• Adding memory increases performance significantly
• I/O is reduced, as well as I/O instruction pathlength
• Context switches are reduced as a result of less I/O
  – Fewer memory accesses
  – Lower CPI
  – More stable caches and registers