OLTP on Hardware Islands
Danica Porobic, Ippokratis Pandis*, Miguel Branco, Pınar Tözün, Anastasia Ailamaki
Data-Intensive Application and Systems Lab, EPFL*IBM Research - Almaden
Hardware topologies have changed
Topology and core-to-core latency:
• CMP: low (~10ns)
• SMP of CMPs (Islands): variable (10-100ns)
• SMP: high (~100ns)
Variable latencies affect performance & predictability
Should we favor shared-nothing software configurations?
The Dreaded Distributed Transactions
[Figure: throughput vs. % multisite transactions for shared-everything and shared-nothing]
• Shared-nothing OLTP configurations need to execute distributed transactions
– Extra lock-holding time
– Extra system calls for communication
– Extra logging
– Extra 2PC logic/instructions
• Shared-everything at least provides robust performance
– Some opportunities for constructive interference (e.g. Aether logging)
Shared-nothing configurations are not a panacea!
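The extra messages, log writes, and lock-holding time listed above all stem from two-phase commit. A minimal coordinator-side sketch (hypothetical `Participant` class, not Shore-MT's actual code) makes the overhead count explicit:

```python
# Two-phase-commit coordinator sketch (illustrative only, not Shore-MT's
# implementation). Every message round and forced log write below is work
# that a single-site transaction never pays.

class Participant:
    """One partition touched by the distributed transaction."""
    def __init__(self, name):
        self.name = name
        self.log = []                    # stand-in for the participant's WAL

    def prepare(self):
        self.log.append("prepare")       # extra log record, force-flushed
        return True                      # vote yes

    def commit(self):
        self.log.append("commit")        # second extra log record

    def abort(self):
        self.log.append("abort")


def two_phase_commit(participants):
    # Phase 1: one message round; all locks stay held while we wait.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: second message round; only now can locks be released.
    for p in participants:
        p.commit() if decision == "commit" else p.abort()
    return decision
```

For a transaction touching N instances this is two message rounds and 2N log writes, which is why throughput drops as the multisite percentage grows.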
Deploying OLTP on Hardware Islands
HW + Workload -> Optimal OLTP configuration
Shared-everything
✓ Single address space, easy to program
✗ Random read-write accesses
✗ Many synchronization points
✗ Sensitive to latency
Shared-nothing
✓ Thread-local accesses
✗ Distributed transactions
✗ Sensitive to skew
[Figure: performance ranges from best to worst depending on thread-to-core assignment]
Outline
• Introduction
• Hardware Islands
• OLTP on Hardware Islands
• Conclusions and future work
Multisocket multicores
Communication latency varies significantly
[Figure: two sockets of a multisocket multicore, each with per-core L1/L2 caches, a shared L3, and a memory controller, connected by inter-socket links. Communication costs <10 cycles between cores sharing a cache, ~50 cycles within a socket, and ~500 cycles across sockets.]
Placement of application threads
[Figure: counter microbenchmark throughput (Mtps) and TPC-C Payment throughput (Ktps) on the 8-socket x 10-core and 4-socket x 6-core machines, comparing spread, island, and OS-chosen thread placement. Island placement wins by 39-47%; OS placement is unpredictable.]
Islands-aware placement matters
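Islands-aware placement amounts to pinning a worker's threads to the cores of one socket instead of letting the OS spread them. A sketch, assuming sockets own contiguous core IDs (real machines may interleave IDs, so production code should query the topology, e.g. via hwloc or /sys):

```python
import os

def island_cores(socket_id, cores_per_socket):
    """Cores belonging to one socket, assuming contiguous core numbering.
    This numbering is an assumption; real topologies should be queried."""
    start = socket_id * cores_per_socket
    return list(range(start, start + cores_per_socket))

def pin_to_island(socket_id, cores_per_socket):
    """Restrict the calling process/thread to one island's cores."""
    cores = island_cores(socket_id, cores_per_socket)
    if hasattr(os, "sched_setaffinity"):     # available on Linux
        os.sched_setaffinity(0, cores)       # 0 = the calling thread
    return cores
```

Pinning one instance per island keeps all of its cache-coherence traffic on the cheap intra-socket paths.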
Impact of sharing data among threads
[Figure: throughput (log scale) of the counter microbenchmark (Mtps) and TPC-C Payment, local-only (Ktps), on the 8-socket x 10-core and 4-socket x 6-core machines, for a single counter, a counter per socket, and a counter per core. A counter per socket is 18.7x faster than a single counter (vs. 8x sockets), a counter per core is 516.8x faster (vs. 80x cores), and shared-nothing beats shared-everything on Payment by 4.5x.]
Fewer sharers lead to higher performance
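The counter experiment can be reproduced in miniature: one lock-protected counter shared by all threads (shared-everything) versus one counter per thread, summed at the end (shared-nothing). A toy sketch, not the paper's benchmark:

```python
import threading

N_THREADS = 4
INCREMENTS = 10_000

def run_shared():
    """Single counter, single lock: every increment is a sync point."""
    total = [0]
    lock = threading.Lock()
    def work():
        for _ in range(INCREMENTS):
            with lock:
                total[0] += 1
    threads = [threading.Thread(target=work) for _ in range(N_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

def run_per_thread():
    """One counter per thread: no sharing; aggregate once at the end."""
    counters = [0] * N_THREADS
    def work(i):
        c = 0
        for _ in range(INCREMENTS):
            c += 1
        counters[i] = c
    threads = [threading.Thread(target=work, args=(i,))
               for i in range(N_THREADS)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(counters)
```

Both variants compute the same total, but the per-thread version has no lock traffic or cache-line ping-pong; that gap is where the 18.7x and 516.8x differences come from on real hardware. (In CPython the GIL masks much of the effect; the sketch only illustrates the structure.)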
Outline
• Introduction
• Hardware Islands
• OLTP on Hardware Islands
– Experimental setup
– Read-only workloads
– Update workloads
– Impact of skew
• Conclusions and future work
Experimental setup
• Shore-MT
– Top-of-the-line open-source storage manager
– Enabled shared-nothing capability
• Multisocket servers
– 4-socket, 6-core Intel Xeon E7530, 64GB RAM
– 8-socket, 10-core Intel Xeon E7-L8867, 192GB RAM
• Disabled hyper-threading
• OS: Red Hat Enterprise Linux 6.2, kernel 2.6.32
• Profiler: Intel VTune Amplifier XE 2011
• Workloads: TPC-C, microbenchmarks
Microbenchmark workload
• Single-site version
– Probe/update N rows from the local partition
• Multi-site version
– Probe/update 1 row from the local partition
– Probe/update N-1 rows uniformly from any partition
– Partitions may reside on the same instance
• Input size: 10 000 rows/core
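The single-site/multi-site split above can be sketched as a row-selection routine (a hypothetical stand-in; the real benchmark runs inside Shore-MT):

```python
import random

ROWS_PER_PARTITION = 10_000   # input size: 10 000 rows/core

def pick_rows(n_rows, local_partition, n_partitions, multisite, rng=random):
    """Return (partition, row) pairs for one microbenchmark transaction.

    Single-site: all N rows come from the local partition.
    Multi-site:  1 row is local; the other N-1 are drawn uniformly from
                 any partition, which may again land on the local instance.
    """
    if not multisite:
        return [(local_partition, rng.randrange(ROWS_PER_PARTITION))
                for _ in range(n_rows)]
    picks = [(local_partition, rng.randrange(ROWS_PER_PARTITION))]
    for _ in range(n_rows - 1):
        picks.append((rng.randrange(n_partitions),
                      rng.randrange(ROWS_PER_PARTITION)))
    return picks
```

Because remote rows are drawn uniformly, larger N drags more distinct instances into each transaction, which drives the cost curves in the later slides.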
Software system configurations
• 1 Island: shared-everything
• 4 Islands
• 24 Islands: shared-nothing
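On the 24-core (4-socket x 6-core) machine, the three configurations correspond to grouping the cores into 1, 4, or 24 instances. A sketch of that grouping (assuming contiguous core numbering):

```python
def islands(n_cores, n_instances):
    """Split n_cores cores into n_instances equal-sized islands.
    Assumes contiguous core numbering and even divisibility."""
    assert n_cores % n_instances == 0
    size = n_cores // n_instances
    return [list(range(i * size, (i + 1) * size))
            for i in range(n_instances)]
```

`islands(24, 1)` is shared-everything, `islands(24, 4)` puts one instance per socket, and `islands(24, 24)` is per-core shared-nothing.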
Increasing % of multisite xcts: reads
[Figure: throughput vs. % multisite read transactions for 1, 4, and 24 Islands. 1 Island pays contention for shared data; 24 Islands run without locks or latches but suffer messaging overhead as multisite transactions grow; 4 Islands need fewer messages per transaction.]
Finer-grained configurations are more sensitive to distributed transactions
Where are the bottlenecks? Read case
[Figure: time per transaction (µs) at 0%, 50%, and 100% multisite transactions for 4 Islands, 10 rows, broken down into logging, locking, communication, transaction management, and transaction execution.]
Communication overhead dominates
Increasing size of multisite xct: read case
[Figure: time per transaction (µs) vs. number of rows retrieved (0-100) for 1, 4, 12, and 24 Islands. Cost grows as more instances participate in each transaction and levels off once all instances are involved; 1 Island shows physical contention.]
Communication costs rise until all instances are involved in every transaction
Increasing % of multisite xcts: updates
[Figure: throughput (KTps) vs. % multisite update transactions for 1, 4, and 24 Islands. Shared-nothing instances need no latches, but multisite updates add two rounds of messages, extra logging, and locks held longer.]
Distributed update transactions are more expensive
Where are the bottlenecks? Update case
[Figure: time per transaction (µs) at 0%, 50%, and 100% multisite transactions for 4 Islands, 10 rows, broken down into logging, locking, communication, transaction management, and transaction execution.]
Communication overhead dominates
Increasing size of multisite xct: update case
[Figure: time per transaction (µs) vs. number of rows updated (0-100) for 1, 4, 12, and 24 Islands. More instances per transaction raise cost and contention; 1 Island benefits from efficient logging with Aether*.]
*R. Johnson, et al: Aether: a scalable approach to logging, VLDB 2010
Shared-everything exposes constructive interference
Effects of skewed input
[Figure: throughput (KTps) vs. skew factor (0-1) for 1, 4, and 24 Islands, local-only and 50% multisite. Under skew, a few instances are highly loaded and hot data becomes contended; larger instances can balance the load.]
4 Islands effectively balance skew and contention
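The skew factor on the x-axis is a Zipf parameter: partition i is requested with probability proportional to 1/i^s. A sketch of the resulting per-instance load (an assumed request model, not the paper's exact generator):

```python
import random

def zipf_weights(n, s):
    """Unnormalized Zipf weights for n partitions with skew factor s.
    s = 0 is uniform; larger s concentrates requests on few partitions."""
    return [1.0 / (i ** s) for i in range(1, n + 1)]

def load_per_instance(n_partitions, n_instances, s, n_requests, rng):
    """Draw skewed partition requests and tally load per instance,
    where each instance owns a contiguous block of partitions."""
    weights = zipf_weights(n_partitions, s)
    parts_per_inst = n_partitions // n_instances
    load = [0] * n_instances
    for _ in range(n_requests):
        p = rng.choices(range(n_partitions), weights=weights)[0]
        load[p // parts_per_inst] += 1
    return load
```

With s = 0 the load is even across instances. As s grows, a per-core configuration (24 Islands) leaves one instance hot while the rest idle, whereas a per-socket configuration (4 Islands) mixes hot and cold partitions inside each instance and so balances better.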
OLTP systems on Hardware Islands
• Shared-everything: stable, but non-optimal
• Shared-nothing: fast, but sensitive to workload
• OLTP Islands: a robust middle ground
– Runs on close cores
– Small instances limit contention between threads
– Few instances simplify partitioning
• Future work:
– Automatically choose and set up the optimal configuration
– Dynamically adjust to workload changes
Thank you!