The ScaleDB Storage Engine
Enabling high performance and scalability, using a Multi-Table Index and a Shared-Disk Clustering Architecture
Moshe Shadmon
Agenda
Overview
ScaleDB’s Clustering Architecture
o Shared-Disk vs. Shared-Nothing
o MySQL and a Shared-Disk Storage Engine
o ScaleDB Installation
o Demo
ScaleDB’s Indexing Technology
o Multi-Table Index
o Enabling Multi-Table Index in MySQL
o Demo
Summary
ScaleDB Status & Product Availability
Overview
Plug-in Storage Engine for MySQL
Main Features:
o Shared-Disk Architecture
o Innovative Multi-Table Indexing
o Transactional
o Row-Level Locking
o ACID Compliant
o Atomicity: Either all tasks of a transaction are performed or none of them are.
o Consistency: The database is in a consistent state before and after the transaction.
o Isolation: Data is not available in an intermediate state during a transaction.
o Durability: When a transaction completes, the transaction’s data will persist.
o Disk-Based Storage Engine
Shared-Disk vs. Shared-Nothing
Manageability
Adaptability
Availability/Fault-Tolerance
Scalability
Performance
Total Cost of Ownership (TCO)
Shared-Nothing: Vertical Partitioning
[Figure: Database Instance 1 holding Tables A, B, and C, next to the partitioned layout in which Tables A, B, and C are spread across Database Instances 1, 2, and 3.]
Shared Nothing: Partitioning Your Data…How
Predict usage patterns, application evolution, data growth patterns…all are moving targets
Avoid data skew: bottlenecks caused by frequently accessed data on just a few nodes
Avoid data shipping between nodes
Avoid delays from distributed 2-phase commit
Searches outside the partition column require participation by all nodes
Scaling becomes an exercise in fire fighting
Shared-Nothing: Horizontal Partitioning
Logical View
name    age  salary
Bob     20   10K
Shideh  18   35K
Ted     50   60K
Kevin   62   120K
Angela  55   140K
Mike    45   90K
Physical View: Partitioned by Salary (Horizontal Partitioning – Salary % 3)
Partition with salary % 3 = 0
name    age  salary
Ted     50   60K
Kevin   62   120K
Mike    45   90K
Partition with salary % 3 = 1
name    age  salary
Bob     20   10K
Partition with salary % 3 = 2
name    age  salary
Shideh  18   35K
Angela  55   140K
Shared-Nothing: Horizontal Partitioning Pitfalls
Selections with equality predicates referencing the partitioning attribute are directed to a single node:
o Retrieve Emp where salary = 60K
SELECT * FROM Emp WHERE salary = 60000;
Equality predicates referencing a non-partitioning attribute, and all range predicates, are directed to all nodes:
o Retrieve Emp where age = 20
o Retrieve Emp where salary < 20K
SELECT * FROM Emp WHERE salary < 20000;
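The routing rule above can be sketched in a few lines of Python. This is only a toy model of the Salary % 3 example from the slides, not any engine's real router: the node numbering (0 to 2), the function names, and the choice to store salaries in thousands are all assumptions made for illustration.

```python
# Toy model of the "Salary % 3" horizontal partitioning example.
# Assumptions: salaries are stored in thousands (60 means 60K) and
# nodes are numbered 0..2 by the remainder of salary / 3.
NUM_NODES = 3

EMP = [
    ("Bob", 20, 10), ("Shideh", 18, 35), ("Ted", 50, 60),
    ("Kevin", 62, 120), ("Angela", 55, 140), ("Mike", 45, 90),
]


def node_for(salary):
    """Partitioning function: the node that stores a row with this salary."""
    return salary % NUM_NODES


def build_partitions(rows):
    """Distribute rows across nodes by the partitioning attribute (salary)."""
    partitions = {node: [] for node in range(NUM_NODES)}
    for row in rows:
        partitions[node_for(row[2])].append(row)
    return partitions


def nodes_for_predicate(column, op, value):
    """Return the nodes that must take part in a single-predicate selection."""
    if column == "salary" and op == "=":
        # Equality on the partitioning attribute: exactly one node.
        return [node_for(value)]
    # Any other column, or a range predicate: every node participates.
    return list(range(NUM_NODES))


partitions = build_partitions(EMP)
print({node: [r[0] for r in rows] for node, rows in partitions.items()})
print(nodes_for_predicate("salary", "=", 60))   # single node
print(nodes_for_predicate("age", "=", 20))      # all nodes
print(nodes_for_predicate("salary", "<", 20))   # all nodes
```

Even with six rows the partitions are uneven (three rows, one row, two rows), which is a small-scale picture of the data skew mentioned earlier.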
Shared-Disk: No Partitioning, Full Access to Data
[Figure: DB Cluster Nodes 1, 2, and 3 connected through a high-speed interconnect to a shared disk subsystem holding Tables A, B, and C.]
Scalability & Availability: Shared-Nothing
[Figure: Database Instance 1 holding Tables A, B, and C, with Slaves A, B, and C.]
Scalability & Availability: Shared-Disk
[Figure: Nodes A, B, C, D, and E (MySQL Servers with ScaleDB Engine) all accessing the same Data.]
Shared-Disk: Summarizing Shared-Disk Benefits
Grow by simply adding nodes to the cluster
o Servers can be added and removed dynamically according to your needs
o No interruption to your application
High-Availability with dynamic failover
o Existing nodes automatically take over
Significantly reduced maintenance costs
o Can be built on low-cost commodity hardware
o No data partitioning
o No need for slaves
Low Total Cost of Ownership (TCO)
Shared-Disk: Making it work with MySQL
[Figure: Node 1 runs Server Instance A with ScaleDB Engine Instance A; Node 2 runs Server Instance B with ScaleDB Engine Instance B. Each engine instance contains a Cluster Manager, a Buffer Manager, and a Comm. Layer. Both nodes are linked by the Cluster Interconnect and attached to the Shared Disk Sub-system.]
Shared-Disk: Insert New Row
[Figure: the same two-node layout, illustrating an insert of a new row.]
Shared-Disk: Select
[Figure: the same two-node layout, illustrating a select.]
Shared-Disk: Create Table
[Figure: the same two-node layout, illustrating a create table, with Table A Meta-Data shown twice.]
ScaleDB Installation
Define cluster = true in the ScaleDB config file.
ScaleDB.cnf is in the same directory as my.cnf.
Cluster params:
o cluster = true
o nodes_in_cluster = 2
o node_id = 1
o this_machine_port = 100
o next_machine_ip_address = 192.168.0.101
o next_machine_port = 100
o log_directory = /share/logs/
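As a rough illustration, the parameters above can be sanity-checked with a short script before starting a node. This is a sketch under stated assumptions: the slides do not specify the exact file syntax, so the one-`key = value`-per-line layout, the `#` comment handling, and the rule that node_id falls between 1 and nodes_in_cluster are guesses, and the helper names are hypothetical.

```python
# Sketch: parse and sanity-check the cluster parameters listed on the slide.
# Assumed (not confirmed by the slides): one "key = value" pair per line,
# "#" starts a comment, and node_id ranges from 1 to nodes_in_cluster.
SAMPLE = """\
cluster = true
nodes_in_cluster = 2
node_id = 1
this_machine_port = 100
next_machine_ip_address = 192.168.0.101
next_machine_port = 100
log_directory = /share/logs/
"""


def parse_params(text):
    """Turn "key = value" lines into a dict of strings."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def check_cluster_params(params):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if params.get("cluster") != "true":
        problems.append("cluster is not set to true")
    nodes = int(params.get("nodes_in_cluster", "0"))
    node_id = int(params.get("node_id", "0"))
    if not 1 <= node_id <= nodes:
        problems.append("node_id is outside 1..nodes_in_cluster")
    return problems


params = parse_params(SAMPLE)
print(check_cluster_params(params))
```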
Demo - Sysbench
o ScaleDB cluster – one node – show throughput
o ScaleDB cluster – 2nd node – show throughput
ScaleDB: Multi-Table Indexing
B-tree: Only indexes the data in tables
[Figure: five tables, #1 through #5, each with its own index, Index #1 through Index #5.]
ScaleDB: Indexes the data and relationships
[Figure: a single ScaleDB Index spanning tables #1 through #5.]
Advantages:
o Faster
o Smaller
o Referential integrity
Example Scenario
Select information that is spread across 3 tables: Colleges, Students and Enrollment
Relationships: Students are enrolled in courses within departments of colleges
SELECT c1.CollName, s.StudName, c2.CourseName , e.Grade
FROM College AS c1
JOIN Student AS s
JOIN Enrollment AS e
JOIN Course AS c2
ON ( c1.CollNo = s.CollNo AND
s.CollNo = e.CollNo AND
s.StudentNo = e.StudentNo AND
e.CollNo = c2.CollNo AND
e.DeptNo = c2.DeptNo AND
e.CourseNum = c2.CourseNum )
WHERE c1.CollNo = X
AND s.StudentNo = Y ;
Option #1: Conventional Joins
College Table
ID   College                  Students
234  Institute of Technology  1,334
167  High Tech Institute      5,742
85   Golden State College     2,119
298  Kaplan College           12,323
510  California College       1,926
Students Table
ID    Student Name     SS#          Phone
1220  Bruce Chizen     422-72-8495  (650) 234-2234
6778  Naomi Seligman   533-99-1234  (279) 331-2345
4435  Raymond Bingham
8872  Reed Hastings    412-44-5567  (312) 676-8812
1129  Maria Klawe
1123  Bernard Vergnes
Enrollment Table
College  ID    Course Name    Student  Grade
510      C67   Mathematics    4435     87
167      C123  History 1      1129     70
167      C14   Photography 1  1120     88
Search enrollment by College & Student
Get Student information
Get College information
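The three steps of the conventional join can be mimicked with plain Python lookups. This is a sketch only: the dictionaries hold a few rows copied from the tables above, the function name is hypothetical, and the lookup counter is just a way to make the number of separate accesses visible.

```python
# Sketch of Option #1: one search in Enrollment, then one lookup each in
# Students and College for every matching enrollment row.
COLLEGES = {510: "California College", 167: "High Tech Institute"}
STUDENTS = {4435: "Raymond Bingham", 1129: "Maria Klawe"}
ENROLLMENT = [
    # (college id, course id, course name, student id, grade)
    (510, "C67", "Mathematics", 4435, 87),
    (167, "C123", "History 1", 1129, 70),
    (167, "C14", "Photography 1", 1120, 88),
]


def conventional_join(college_id, student_id):
    """Return (rows, lookups) for one college/student pair."""
    lookups = 1  # step 1: search enrollment by college and student
    matches = [e for e in ENROLLMENT
               if e[0] == college_id and e[3] == student_id]
    rows = []
    for coll, _course_id, course_name, stud, grade in matches:
        lookups += 1  # step 2: get student information
        student_name = STUDENTS.get(stud)
        lookups += 1  # step 3: get college information
        college_name = COLLEGES.get(coll)
        rows.append((college_name, student_name, course_name, grade))
    return rows, lookups


print(conventional_join(510, 4435))
```

Each matching enrollment row costs two further lookups in separate structures, which is the overhead the later options try to avoid.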
Option #2: Materialized View
ID   College                  Students  ID    Course Name    ID    Student Name
234  Institute of Technology  1,334     C134  Mathematics    1145  John Cheechoo …
234  Institute of Technology  1,334     C134  Mathematics    1837  Ryane Clowe …
234  Institute of Technology  1,334     C134  Mathematics    2256  Patrick Marleau …
234  Institute of Technology  1,334     C134  Mathematics    2277  Jamie McGinn …
234  Institute of Technology  1,334     C134  Mathematics    4113  Torrey Mitchell …
234  Institute of Technology  1,334     C134  Mathematics    1145  …
385  Golden State College     2,224     G85   World History  7783  Joe Pavelski …
385  Golden State College     2,224     G85   World History  2234  Jeremy Roenick …
385  Golden State College     2,224     G85   World History  1177  Devin Setoguchi …
385  Golden State College     2,224     G85   World History  4113  Torrey Mitchell …
. . .
Colleges
Coll_ID#  Coll_Name     Coll_Budget  Coll_Description
001       Agriculture   $1,234,567   Nice place to visit
002       Arts          $5,432,567   Sports not so good
003       Business      $9,999,666   Cool logo
004       Education     $3,234,567   Ugh Worcester
005       Engineering   $8,238,568   Serious work
006       Law           $7,237,767   Jumpy students
007       Liberal Arts  $9,898,777   Pretty campus
008       Medicine      $5,987,004   In Texas
Students
Student_ID#  College_ID#  Student_Name   Student_Desc
56-8033      008          Mike Hogan     Caucasian
56-8045      008          Moshe Smith    Caucasian
56-8044      008          Sally Shadmon  Native American
56-8055      008          Billy Fleegle  African American
56-8037      008          Saul Goode     African American
56-8122      008          Tim Collins    Polynesian
56-8233      008          Sam Gee        Asian
56-8334      008          Rod Paulino    Asian
Enrollment
College_ID#  Dept_ID#  Student_ID#  Grade
008          4455      56-8037      B+
008          4455      56-8033      C
008          4455      56-8045      B+
008          4456      56-8044      A-
008          4456      56-8122      B-
008          4454      56-8233      C
008          4455      56-8334      F
008          4454      56-8055      D
Option #3: Multi-Table Index
[Figure: the ScaleDB Multi-Table Index linking the College, Students, Enrollment, Departments, and Courses tables.]
Mapping Foreign Keys to Data Views
Create Students Table
o Foreign key – College
Create Enrollment Table
o Foreign key – Students
Create Course Table
o Foreign key – Department
Create Department Table
o Foreign key – College
Create College Table
The Parent-Child tables are created in MySQL so that MySQL is able to operate over the new tables
The data of the Parent-Child tables is assembled on the fly from the source tables
Mapping Foreign Keys to Data Views
[Figure: parent-child hierarchy of College, Department, Course, Students, and Enrollment.]
Physical files:
1. College
2. Department
3. Student
4. Course
5. Enrollment
ScaleDB Meta-Data Tables:
1. College
2. College-Dept
3. College-Dept-Course
4. College-Students
5. College-Students-Enrollment
6. Department
7. Students
8. Course
9. Enrollment
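One way to see where the nine meta-data tables come from is to walk the foreign keys from each table up to its root parent. The sketch below is an illustration of that idea, not ScaleDB's actual mapping code: it assumes each table has at most one parent and no cycles, the function names are hypothetical, and it spells out "Department" where the slide abbreviates to "Dept".

```python
# Sketch: derive parent-child view names from the foreign keys on the slide.
# Assumptions: at most one parent per table, no cycles in the hierarchy.
FOREIGN_KEYS = {
    "College": None,            # root table, no foreign key
    "Department": "College",
    "Course": "Department",
    "Students": "College",
    "Enrollment": "Students",
}


def parent_child_path(table, foreign_keys):
    """Follow foreign keys up to the root and join the names root-first."""
    path = [table]
    while foreign_keys[path[-1]] is not None:
        path.append(foreign_keys[path[-1]])
    return "-".join(reversed(path))


def metadata_tables(foreign_keys):
    """Parent-child views first, then any base table not already listed."""
    paths = [parent_child_path(t, foreign_keys) for t in foreign_keys]
    base = [t for t in foreign_keys if t not in paths]
    return paths + base


for name in metadata_tables(FOREIGN_KEYS):
    print(name)
```

With five physical tables this yields five parent-child paths plus the four non-root base tables, nine names in total, which matches the count of meta-data tables listed above.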
Enabling the MySQL optimizer to use a Multi-Table Index
SELECT c1.CollName, s.StudName,
c2.CourseName, e.Grade
FROM College AS c1
JOIN Student AS s
JOIN Enrollment AS e
JOIN Course AS c2
ON ( c1.CollNo = s.CollNo AND
s.CollNo = e.CollNo AND
s.StudentNo = e.StudentNo AND
e.CollNo = c2.CollNo AND
e.DeptNo = c2.DeptNo AND
e.CourseNum = c2.CourseNum )
WHERE c1.CollNo = X
AND s.StudentNo = Y ;
CREATE TABLE sdb_view_college_course_student (
L1_CollNo INT NOT NULL,
L1_CollName CHAR(32) NOT NULL,
L1_CollBudget INT NOT NULL,
L1_CollDescription CHAR(60) NOT NULL,
-- … Table College columns
L2_StudNo INT NOT NULL,
L2_StudName CHAR(48) NOT NULL,
-- … Table Student columns
L3_CourseNum CHAR(9) NOT NULL,
L3_Grade CHAR(2) NOT NULL,
-- … Table Enrollment columns
PRIMARY KEY (L1_CollNo, L2_StudNo, L3_CourseNum))
ENGINE = SCALEDB;
SELECT L1_CollName, L2_StudName, L3_CourseName, L3_Grade
FROM sdb_view_college_course_student
WHERE L1_CollNo = X AND L2_StudNo = Y;
The Multi-Table Index
Multi-Table Index appears to MySQL as a data table
ScaleDB does not maintain a data file associated with the Multi-Table Index
For a query using a virtual table, ScaleDB assembles the rows on the fly using the Multi-Table Index
ScaleDB indexes are different from B-tree indexes
ScaleDB indexes provide the same functionality as B-tree, plus…
o They maintain referential integrity with minimal overhead
o They allow you to search for the data and relationships
o They are much smaller in size
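The idea of a virtual table whose rows are assembled on the fly can be illustrated with a nested dictionary keyed by college, then student, then enrollment. This is a toy model only and says nothing about ScaleDB's real index layout: the structure, the function names, and the use of the second Enrollment column from the earlier sample data as a stand-in for the course key are all assumptions.

```python
# Toy multi-table index: child entries hang under their parent entry, so a
# joined row is assembled by walking one path instead of joining tables.
COLLEGES = [("008", "Medicine")]
STUDENTS = [("56-8037", "008", "Saul Goode"), ("56-8033", "008", "Mike Hogan")]
ENROLLMENT = [("008", "4455", "56-8037", "B+"), ("008", "4455", "56-8033", "C")]


def build_index(colleges, students, enrollment):
    """Build college -> student -> enrollment nesting from the source rows."""
    index = {}
    for coll_no, coll_name in colleges:
        index[coll_no] = {"row": (coll_no, coll_name), "students": {}}
    for stud_no, coll_no, stud_name in students:
        # A student can only be attached under an existing college entry;
        # a missing parent raises KeyError here.
        index[coll_no]["students"][stud_no] = {
            "row": (stud_no, stud_name), "enrollment": {}}
    for coll_no, course_key, stud_no, grade in enrollment:
        student = index[coll_no]["students"][stud_no]
        student["enrollment"][course_key] = (course_key, grade)
    return index


def virtual_rows(index, coll_no, stud_no):
    """Assemble virtual-table rows for one college/student pair on the fly."""
    college = index[coll_no]
    student = college["students"][stud_no]
    for course_key in sorted(student["enrollment"]):
        yield college["row"] + student["row"] + student["enrollment"][course_key]


index = build_index(COLLEGES, STUDENTS, ENROLLMENT)
print(list(virtual_rows(index, "008", "56-8037")))
```

In this toy version no joined rows are stored anywhere, and a child row cannot be inserted without its parent, which loosely mirrors the "no data file" and referential-integrity points above.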
Demo
o Query with join
o Query with Multi-Table Index
o 2nd node virtual table
Benchmarking ScaleDB Index
[Figure: bar chart of Queries/Sec (axis from 0 to 60) comparing Engine X Join, ScaleDB MTI, and ScaleDB 2 Nodes.]
Summary
ScaleDB Cluster
o Multiple ScaleDB instances share the same physical data.
o Connecting to the cluster is similar to connecting to a single node.
o For the application, the cluster appears as a single node.
o Transparent application failover
o Transparent Scalability
ScaleDB Indexes
o Provide the B-tree functionality
o High performance
o Map relationships
o Maintain referential integrity
o Smaller footprint
o Independent of the key size
ScaleDB Status and Product Availability
Started Beta Process
o We are looking for beta companies
Product launch is scheduled for the June timeframe
Please talk to us if you are a developer interested in working with ScaleDB