© 2009 IBM Corporation
Architectures for Massive Parallel Data Base Clusters providing Linear Scale-Out and Fault Tolerance on Commodity Hardware for OLTP Workloads
Lightning Talk: XLDB Workshop 2013 @CERN, 28.05.2013
Romeo Kienzler, IBM Innovation Center Zurich
IBM Presentation Template Full Version
Shared Disk vs. Shared Nothing
Shared Disk                         | Shared Nothing
Centralized Locking                 | Distributed Locking
Compute Node Fault Tolerance        | Partition Replication
Ad-Hoc Load Balancing               | Data Partitioning, Data Skew
Resource-Starvation on Disk System  | Linear Scale-Out for Writes
Write-Limited                       | Write-Limited for Distributed Two Phase Commit
Requires Distributed Buffering      | Effectiveness of Local Buffer Pools
Inherent Data-Shipping support      | Performance Impact on Data-Shipping
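To make the "Write-Limited for Distributed Two Phase Commit" entry concrete, here is a minimal in-process sketch of two-phase commit across shared-nothing partitions. It is a toy model, not code from the talk: the `Partition` class, the `two_phase_commit` function, and the message count (one request and one reply per partition per phase) are illustrative assumptions.

```python
class Partition:
    """Hypothetical shared-nothing partition that stages writes until told to commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # set to False to simulate a "no" vote
        self.data = {}
        self._staged = None

    def prepare(self, writes):
        """Phase 1: stage the writes and vote yes, or vote no."""
        if not self.can_commit:
            return False
        self._staged = dict(writes)
        return True

    def commit(self):
        """Phase 2 (commit): make the staged writes visible."""
        self.data.update(self._staged or {})
        self._staged = None

    def abort(self):
        """Phase 2 (abort): discard the staged writes."""
        self._staged = None


def two_phase_commit(writes_by_partition):
    """Run 2PC over (partition, writes) pairs; return (committed, message_count)."""
    messages = 0
    votes = []
    for partition, writes in writes_by_partition:
        votes.append(partition.prepare(writes))
        messages += 2  # prepare request + vote
    decision = all(votes)
    for partition, _ in writes_by_partition:
        if decision:
            partition.commit()
        else:
            partition.abort()
        messages += 2  # decision + acknowledgement
    return decision, messages


p1, p2 = Partition("p1"), Partition("p2")
committed, messages = two_phase_commit([(p1, {"a": 1}), (p2, {"b": 2})])
print(committed, messages)
```

In this model every write that spans several partitions costs two message round trips per participant, which is one way to read the write limitation named in the table.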
Show-Stopper for Shared-Nothing
Partition-Skew for Random Access Patterns
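One way to quantify partition skew is the ratio between the busiest partition's load and the mean load. The sketch below is an illustration under stated assumptions, not material from the talk: keys are assigned to partitions by `key % num_partitions`, the skewed workload follows a Zipf-like popularity curve, and all function names and parameters are hypothetical.

```python
import random


def partition_load(keys, num_partitions):
    """Count how many accesses land on each partition (key modulo partition count)."""
    load = [0] * num_partitions
    for key in keys:
        load[key % num_partitions] += 1
    return load


def skew(load):
    """Busiest partition's load divided by the mean load; 1.0 means perfectly balanced."""
    mean = sum(load) / len(load)
    return max(load) / mean


def zipf_like_keys(num_keys, num_accesses, rng, exponent=1.2):
    """Draw keys so that the key of rank r is chosen with weight 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    return rng.choices(range(num_keys), weights=weights, k=num_accesses)


rng = random.Random(42)  # fixed seed so the run is repeatable
uniform = [rng.randrange(10_000) for _ in range(100_000)]
hot = zipf_like_keys(10_000, 100_000, rng)

uniform_skew = skew(partition_load(uniform, 16))
hot_skew = skew(partition_load(hot, 16))
print(f"uniform access skew: {uniform_skew:.2f}")
print(f"hot-key access skew: {hot_skew:.2f}")
```

With uniformly spread keys the partitions stay close to balanced, while a few hot keys push most of the load onto the partitions that own them, so adding nodes no longer adds proportional capacity.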
BUT
Large-Scale Shared-Disk Systems introduce Bottlenecks
IDEA
Cluster File System
GPFS Declustered RAID
GPFS - Example
IDEA
Compute Nodes without Disks
Problem: No Data Locality
200K Disks => 60 ms
IDEA
Point-To-Point Connections
Switching Fabric
Network Bottleneck Problem Solved
IDEA
Centralized Lock Management
Centralized Locking
Infiniband
● Low Latency
● Up to 60 Gbit/s
● RDMA
Source: http://thetechjournal.com
Source: http://www.mellanox.co.jp
Centralized Buffer Pool
IDEA
Centralized Lock Management
Switching Fabric
Compute Nodes
Clients
Cluster File System
Centralized Buffer Pool
DB2 pureScale – General Concepts
● Based on the DB2 for z/OS Parallel Sysplex concept¹
● Shared disk concept
● Multiple DB2 worker nodes
● Single GPFS file system
● Centralized buffer pool and lock management
¹For example, Toronto Dominion Bank (TD Bank) has had 100 percent availability of customer information for 10 consecutive years, including two DB2 for z/OS upgrades during that timeframe.
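The combination of a centralized buffer pool and centralized lock management can be pictured with a small in-process model. This is only a sketch under assumptions: `ClusterFacility`, `update_balance`, and the dictionary standing in for the shared file system are hypothetical names, and the real product involves details (local caches, RDMA transport, lock modes) that are not modelled here.

```python
class ClusterFacility:
    """Toy central component: one lock table and one buffer pool shared by all members."""

    def __init__(self, shared_fs):
        self.shared_fs = shared_fs  # stand-in for the single shared file system
        self.buffer_pool = {}       # page_id -> current page contents
        self.lock_table = {}        # page_id -> member holding the exclusive lock
        self.fs_reads = 0           # how often a page had to come from the file system

    def lock(self, member, page_id):
        """Grant the lock if the page is free or already held by this member."""
        return self.lock_table.setdefault(page_id, member) == member

    def unlock(self, member, page_id):
        if self.lock_table.get(page_id) == member:
            del self.lock_table[page_id]

    def read(self, page_id):
        """Serve the page from the central pool, loading it from the file system once."""
        if page_id not in self.buffer_pool:
            self.buffer_pool[page_id] = self.shared_fs[page_id]
            self.fs_reads += 1
        return self.buffer_pool[page_id]

    def write(self, member, page_id, data):
        if self.lock_table.get(page_id) != member:
            raise PermissionError(f"{member} does not hold the lock on {page_id}")
        self.buffer_pool[page_id] = data


def update_balance(facility, member, page_id, delta):
    """Lock, read, modify, write, unlock; returns False if the lock is unavailable."""
    if not facility.lock(member, page_id):
        return False
    try:
        facility.write(member, page_id, facility.read(page_id) + delta)
    finally:
        facility.unlock(member, page_id)
    return True


facility = ClusterFacility({"account-1": 100})
ok_a = update_balance(facility, "member-A", "account-1", +50)
ok_b = update_balance(facility, "member-B", "account-1", -30)
print(ok_a, ok_b, facility.read("account-1"), facility.fs_reads)
```

Because both members go through the same pool, the second member sees the first member's update without another trip to the file system, and the single lock table serializes the two updates.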
DB2 pureScale – Operation Model
Infiniband, RDMA
Infiniband, 10 Gbit/s Ethernet, 8 Gbit/s SAN
DB2 pureScale – Fault Tolerance
● Active-active concept
● Clean pages don't need to be recovered -> GPFS reliability
● Dirty pages are known to the CF
● CF locks dirty pages
● Recovery DB2 instance flushes dirty pages to GPFS
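The recovery steps listed above can be traced in a toy model. The sketch assumes the CF keeps a table of dirty pages and a table of page locks, and that recovery means writing a failed member's dirty pages to shared storage and then releasing its locks; `CachingFacility`, `recover`, and the dictionary standing in for GPFS are hypothetical, not the product's interfaces.

```python
class CachingFacility:
    """Toy stand-in for the CF: knows every dirty page and who holds its lock."""

    def __init__(self):
        self.dirty = {}  # page_id -> data not yet written to shared storage
        self.locks = {}  # page_id -> member holding the page lock

    def write(self, member, page_id, data):
        """Register a dirty page; fails while another member holds the lock."""
        holder = self.locks.setdefault(page_id, member)
        if holder != member:
            raise RuntimeError(f"page {page_id} is locked by {holder}")
        self.dirty[page_id] = data

    def recover(self, failed_member, shared_disk):
        """Flush the failed member's dirty pages to shared storage, then release its locks."""
        recovered = []
        for page_id, holder in list(self.locks.items()):
            if holder != failed_member:
                continue
            if page_id in self.dirty:
                shared_disk[page_id] = self.dirty.pop(page_id)
                recovered.append(page_id)
            del self.locks[page_id]
        return recovered


shared_disk = {"p1": "clean-v1", "p2": "old-v1"}  # stand-in for the GPFS file system
cf = CachingFacility()
cf.write("member-A", "p2", "new-v2")
cf.write("member-B", "p3", "b-data")

# member-A fails here: its dirty page stays locked, so nobody else can change it.
try:
    cf.write("member-B", "p2", "conflicting")
    blocked = False
except RuntimeError:
    blocked = True

recovered = cf.recover("member-A", shared_disk)
print(blocked, recovered, shared_disk)
```

Only the failed member's dirty page is touched during recovery; the clean page `p1` and the surviving member's page `p3` are left alone, which matches the active-active idea that the rest of the cluster keeps working.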
DB2 pureScale – Recovery Performance
DB2 pureScale - Scale-Out
Summary
● Linear Scale-Out
● Fault Tolerance
● Commodity Hardware
● Support for OLTP Workloads