Solving performance problems in MySQL without denormalization

DESCRIPTION

As operational database schemas become complex, users resort to denormalization to handle performance issues. This includes a range of techniques from materialized views to using MySQL as a key-value store for blobs containing full objects. While denormalization solves immediate bottlenecks, it comes at a hefty price. In this presentation Ari will explore common denormalization approaches and tradeoffs using real world examples. He will then present a solution under development at Akiban Technologies to alleviate these same problems much more efficiently, and allow users to get the best of both worlds.

Page 1: Solving performance problems in MySQL without denormalization

RENORMALIZE

Akiban Technologies, Inc. Confidential & Proprietary

Solving Performance Problems in MySQL Without Denormalization

Page 2

Problem Statement

Schemas scale out

Data volume grows

Joins become a real bottleneck

Page 3

Two Common Manifestations

SQL joins: queries become slower as more tables are joined.

Application object creation: constructing an object is as expensive as SELECTing the sum of its parts.

Denormalize. Problem solved.

Page 4

Application Growing Pains

[Figure: timeline (x-axis: Time; curves: Customers, Complexity & Cost) showing the architecture growing from Web Server + MySQL to MySQL slaves, a cache server, sharded MySQL, and finally a "Rip & Replace Database Architecture ?"]

V1 Release: Get Customers!

V2 Release: De-normalize DB

V3 Release: Replicate DB

V4 Release: Add Caching

V5 Release: Shard Database

V6 Release: Rip & Replace

Page 5

De·nor·mal·ize [de-nawr-muh-lahyze]

verb, -ized, -iz·ing.

–verb (used with object)

1. The process of attempting to optimize the read performance of a database by adding redundant data or by grouping data (Wikipedia)

2. Denormalize means to allow redundancy in a table so that the table can remain flat (UCSD Blink)

3. The process of restructuring a normalized data model to accommodate operational constraints or system limitations (celiang.tongji.edu.cn)

Page 6

Materialized Views

• A persistent database object that contains the results of a query
• Stores summary and pre-joined tables
• Requires maintenance/refresh for dynamic data

SELECT DISTINCT n.nid, n.sticky, n.title, n.created
FROM node n
INNER JOIN term_node tn0 ON n.vid = tn0.vid
WHERE n.status = 1
  AND tn0.tid IN (77)
ORDER BY n.sticky DESC, n.created DESC
LIMIT 0, 25;

Result: using where, using filesort

Page 7

Drupal Materialized View Project

CREATE TABLE `mv_drupalorg_node_by_term` (
  `entity_type` varchar(64) NOT NULL,
  `entity_id` int(10) unsigned NOT NULL DEFAULT '0',
  `term_tid` int(10) unsigned NOT NULL DEFAULT '0',
  `node_sticky` int(11) NOT NULL DEFAULT '0',
  `last_node_activity` int(11) NOT NULL DEFAULT '0',
  `node_created` int(11) NOT NULL DEFAULT '0',
  `node_title` varchar(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`entity_type`,`entity_id`,`term_tid`),
  KEY `activity` (`term_tid`,`node_sticky`,`last_node_activity`,`node_created`),
  KEY `creation` (`term_tid`,`node_sticky`,`node_created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SELECT DISTINCT entity_id AS nid, node_sticky AS sticky, node_title AS title,
       node_created AS created
FROM mv_drupalorg_node_by_term
WHERE term_tid IN (77)
ORDER BY node_sticky DESC, node_created DESC
LIMIT 0, 25;

Result: using where, using temporary table
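The slides leave open how the pre-joined table is kept current. Below is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for MySQL, trimmed-down versions of the Drupal tables, and made-up rows; `refresh_view` is a hypothetical full-rebuild helper, not part of the Drupal project. It shows the join query and the view query agreeing after a refresh, and the view going stale as soon as a base table changes.

```python
import sqlite3

# SQLite stands in for MySQL; tables keep only the columns the queries use.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (nid INTEGER PRIMARY KEY, vid INTEGER, status INTEGER,
                   sticky INTEGER, title TEXT, created INTEGER);
CREATE TABLE term_node (vid INTEGER, tid INTEGER);
CREATE TABLE mv_drupalorg_node_by_term (
    entity_id INTEGER, term_tid INTEGER, node_sticky INTEGER,
    node_created INTEGER, node_title TEXT,
    PRIMARY KEY (entity_id, term_tid));
CREATE INDEX creation ON mv_drupalorg_node_by_term (term_tid, node_sticky, node_created);
""")
conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 0, "Install guide", 100),
    (2, 2, 1, 1, "Security advisory", 200),
    (3, 3, 0, 0, "Unpublished draft", 300),   # status 0: not yet published
])
conn.executemany("INSERT INTO term_node VALUES (?, ?)", [(1, 77), (2, 77), (3, 77)])

JOIN_QUERY = """
SELECT DISTINCT n.nid, n.sticky, n.title, n.created
FROM node n INNER JOIN term_node tn0 ON n.vid = tn0.vid
WHERE n.status = 1 AND tn0.tid IN (77)
ORDER BY n.sticky DESC, n.created DESC LIMIT 0, 25"""

MV_QUERY = """
SELECT DISTINCT entity_id AS nid, node_sticky AS sticky, node_title AS title,
       node_created AS created
FROM mv_drupalorg_node_by_term WHERE term_tid IN (77)
ORDER BY node_sticky DESC, node_created DESC LIMIT 0, 25"""

def refresh_view():
    """Rebuild the pre-joined table from scratch (hypothetical helper)."""
    with conn:
        conn.execute("DELETE FROM mv_drupalorg_node_by_term")
        conn.execute("""
            INSERT INTO mv_drupalorg_node_by_term
            SELECT DISTINCT n.nid, tn.tid, n.sticky, n.created, n.title
            FROM node n JOIN term_node tn ON n.vid = tn.vid
            WHERE n.status = 1""")

refresh_view()
assert conn.execute(MV_QUERY).fetchall() == conn.execute(JOIN_QUERY).fetchall()

# Publish node 3: the base table changes, but the view does not follow on its own.
conn.execute("UPDATE node SET status = 1 WHERE nid = 3")
stale = conn.execute(MV_QUERY).fetchall()
refresh_view()
fresh = conn.execute(MV_QUERY).fetchall()
print(len(stale), len(fresh))  # 2 3
```

A production setup would more likely refresh incrementally, for example from triggers or application hooks, but either way the application owns the synchronization.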

Page 8

Denormalization Technique Listing

Technique | Pros | Cons

Materialized views | Faster queries (no joins) | Data explosion; must be kept in sync manually

Store object as blob | Fast object get | No modeling or querying

Denormalize 1NF: fold parent-child into the parent table | Data in one row | Limited number of child rows; hard to query (UNION hell)

Denormalize 2NF to 1NF: repeat columns from the 1 table in the M table (double writing) | Avoids a join | Data explosion; must be kept in sync manually

Adding derived columns | Avoids joins and aggregation | Must be kept in sync manually

Property bag (RDF) | Schema flexibility | Schema managed in the application; hard to index or tune for performance
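To make the "Store object as blob" row concrete: a minimal sketch using Python's `sqlite3` and `json` modules, with a hypothetical `user_blob` table and helper functions, and sample values borrowed from the Users example later in the deck. Fetching an object is a single key lookup, but any question about the data inside the blobs has to be answered by the application.

```python
import json
import sqlite3

# Hypothetical key-value table: each user object is serialized whole into one column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_blob (uid INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

def put_user(user):
    conn.execute("INSERT OR REPLACE INTO user_blob VALUES (?, ?)",
                 (user["uid"], json.dumps(user)))

def get_user(uid):
    """Pro: the whole object comes back from one primary-key lookup."""
    row = conn.execute("SELECT doc FROM user_blob WHERE uid = ?", (uid,)).fetchone()
    return json.loads(row[0]) if row else None

def users_with_role(rid):
    """Con: SQL treats doc as opaque text here, so the app reads and parses every object."""
    matches = []
    for uid, doc in conn.execute("SELECT uid, doc FROM user_blob ORDER BY uid"):
        if rid in json.loads(doc)["roles"]:
            matches.append(uid)
    return matches

put_user({"uid": 1, "name": "rriegel", "roles": [1, 2]})
put_user({"uid": 2, "name": "twegner", "roles": [1]})
print(get_user(2)["name"])   # twegner
print(users_with_role(2))    # [1]
```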

Page 9

Renormalization

Join for free
- Improved performance: 10-100x!
- Retrieve an object in one request

Page 10

Introduction to Table-Groups

Traditional SQL: Schema → Table → Column

Akiban newSQL: Schema → GROUP → Table → Column

Table-Groups are first-class citizens

Page 11

Typical Relational DB Schema

Page 12

Typical Schema: Grouped

[Figure: the typical schema partitioned into table-groups: Block Group, User Group, Node Group]

Page 13

Table-Groups Eliminate Joins

Logical: Users, Users_Roles, and Sessions, each stored in its own Table bTree

Users
uid | name | pass
1 | rriegel | ***
2 | twegner | ***

Users_Roles
id | rid
1 | 1
1 | 2
2 | 1

Sessions
id | timestamp | sid
1 | 2011-10-01-06:02.00 | 19390
2 | 2011-10-04-22:32.10 | 22828
1 | 2011-10-04-16:07.30 | 49377

Physical: the same rows stored together in a single Group bTree
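As a rough illustration of the physical layout, here is a sketch in Python that interleaves the sample rows above into one sorted structure, with a sorted list plus `bisect` standing in for the group bTree. The `(uid, table_ordinal, child_key)` key encoding and the `fetch_user_group` helper are assumptions made for illustration; the slides do not describe Akiban's actual on-disk format.

```python
from bisect import bisect_left

# Hypothetical key layout: (uid, table ordinal, child key). Sorting by this key
# places each user's child rows immediately after the user row.
TABLE_ORDINAL = {"Users": 0, "Users_Roles": 1, "Sessions": 2}

users = [(1, "rriegel", "***"), (2, "twegner", "***")]
users_roles = [(1, 1), (1, 2), (2, 1)]
sessions = [(1, "2011-10-01-06:02.00", 19390),
            (2, "2011-10-04-22:32.10", 22828),
            (1, "2011-10-04-16:07.30", 49377)]

# One sorted list of (key, table, row) entries stands in for the group bTree.
group = []
for uid, name, password in users:
    group.append(((uid, TABLE_ORDINAL["Users"], ()), "Users", (uid, name, password)))
for uid, rid in users_roles:
    group.append(((uid, TABLE_ORDINAL["Users_Roles"], (rid,)), "Users_Roles", (uid, rid)))
for uid, ts, sid in sessions:
    group.append(((uid, TABLE_ORDINAL["Sessions"], (sid,)), "Sessions", (uid, ts, sid)))
group.sort(key=lambda entry: entry[0])
keys = [entry[0] for entry in group]

def fetch_user_group(uid):
    """Return a user and all of its child rows with one contiguous range scan."""
    start = bisect_left(keys, (uid,))      # (uid,) sorts before every (uid, ...) key
    end = bisect_left(keys, (uid + 1,))
    return [(table, row) for _, table, row in group[start:end]]

for table, row in fetch_user_group(1):
    print(table, row)
```

In this model, user 1's roles and sessions sit right next to the user row, so reading them is one seek plus a short sequential scan rather than a lookup per table followed by a join.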

Page 14

Benefits of Table-grouping

SQL join operations are fast
- Table-group access is equivalent to a single-table access. Joins are free!
- Performance increases 10-100x

Applications do not change
- Maintain the same tables and SQL
- Objects (e.g., ORM) fetched in one request
- Akiban uses standard MySQL replication

Page 15

Design Partner Sample Query

SELECT t1.id, t3.c1, t3.c2, t3.c3, t3.c4
FROM t1
INNER JOIN t2 ON t2.id = t1.id
LEFT JOIN t3 ON t1.id = t3.id
WHERE t2.region IN (1297789)
  AND t1.c1 = '0'
ORDER BY t1.latestLogin DESC
LIMIT 500;

Page 16

Typical MySQL EXPLAIN Plan

[Figure: EXPLAIN plan for the sample query, 10 numbered steps]

3 Index Accesses

2 Table Accesses

2 Joins

Temp Table

Sort

Project Results

Page 17

Efficiency for Speed and Scale

Typical MySQL EXPLAIN (10 steps): 3 index accesses, 2 table accesses, 2 joins, temp table, sort, project results

With table-groups (3 steps): 1 group index access, 1 group access, project results

No joins, temp tables, or sorts!
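The slides name a "group index" without defining it. One plausible reading, sketched here as an assumption rather than Akiban's documented design: a single sorted index whose key mixes columns from different tables in the group, here `(t2.region, t1.c1, t1.latestLogin DESC)`, so the sample query becomes one range scan whose results already come out in `ORDER BY` order. The rows, the `group_index` list, and `sample_query` are made up; a sorted Python list stands in for the index bTree.

```python
from bisect import bisect_left, insort

# Made-up rows for the anonymized tables; t2 and t3 are children of t1 via id.
t1 = {10: {"c1": "0", "latestLogin": 500}, 11: {"c1": "0", "latestLogin": 900},
      12: {"c1": "1", "latestLogin": 950}, 13: {"c1": "0", "latestLogin": 700},
      14: {"c1": "0", "latestLogin": 600}}
t2 = {10: {"region": 1297789}, 11: {"region": 1297789}, 12: {"region": 1297789},
      13: {"region": 42}, 14: {"region": 1297789}}
t3 = {10: {"c1": "a", "c2": "b", "c3": "c", "c4": "d"},
      11: {"c1": "e", "c2": "f", "c3": "g", "c4": "h"}}   # no t3 row for 12, 13, 14

# Hypothetical group index keyed on (t2.region, t1.c1, -t1.latestLogin, t1.id).
# Negating latestLogin makes ascending key order match ORDER BY latestLogin DESC.
group_index = []
for id_, row in t1.items():
    if id_ in t2:   # INNER JOIN t2: rows without a t2 child are never returned
        insort(group_index, (t2[id_]["region"], row["c1"], -row["latestLogin"], id_))

def sample_query(region, c1, limit=500):
    """One range scan over the group index; no join, temp table, or sort."""
    results = []
    i = bisect_left(group_index, (region, c1))
    while i < len(group_index) and len(results) < limit:
        r, c, _, id_ = group_index[i]
        if (r, c) != (region, c1):
            break                      # left the (region, c1) range: done
        extra = t3.get(id_, {})        # LEFT JOIN t3: missing row gives NULLs
        results.append((id_, extra.get("c1"), extra.get("c2"),
                        extra.get("c3"), extra.get("c4")))
        i += 1
    return results

print(sample_query(1297789, "0"))
```

Because the sort order is part of the key, `LIMIT 500` simply stops the scan early.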

Page 18

Design Partner Acceleration: 27x

[Chart; x-axis: Concurrent Connections]

Page 19

Object Creation Query Stream

SELECT * FROM t1 WHERE uid = 1387;
SELECT * FROM t2 WHERE uid = 1387;
SELECT * FROM t3 WHERE uid = 1387;
SELECT * FROM t4 WHERE uid = 1387;
SELECT * FROM t5 WHERE uid = 1387;
SELECT * FROM t6 WHERE uid = 1387;
...
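A sketch of the query stream above: an ORM-style loader that issues one `SELECT` per table. It uses `sqlite3` as a stand-in for MySQL; the tables, columns, and the `query`/`load_object` helpers are hypothetical, and the counter stands in for network round trips, which an in-process SQLite database does not actually incur.

```python
import sqlite3

# SQLite stands in for MySQL; t1..t3 are hypothetical tables that each hold
# one part of the same application object, all keyed by uid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t2 (uid INTEGER, rid INTEGER);
CREATE TABLE t3 (uid INTEGER, sid INTEGER);
INSERT INTO t1 VALUES (1387, 'example');
INSERT INTO t2 VALUES (1387, 1), (1387, 2);
INSERT INTO t3 VALUES (1387, 19390);
""")

round_trips = 0

def query(sql, params):
    """Run one statement; against a MySQL server each call is a network round trip."""
    global round_trips
    round_trips += 1
    return conn.execute(sql, params).fetchall()

def load_object(uid):
    """ORM-style loading: one SELECT per table that contributes to the object."""
    obj = {"t1": query("SELECT * FROM t1 WHERE uid = ?", (uid,))}
    for child in ("t2", "t3"):   # fixed table names, so the f-string is safe
        obj[child] = query(f"SELECT * FROM {child} WHERE uid = ?", (uid,))
    return obj

user = load_object(1387)
print(round_trips)   # 3
```

The statement count grows with every table the object spans, six or more in the stream above.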

Page 20

Becomes a Single ORM Request

SELECT *,
       (SELECT * FROM t2 WHERE t2.uid = u.uid),
       (SELECT * FROM t3 WHERE t3.uid = u.uid),
       ...
FROM t1 u
WHERE u.uid = 1387;

Or simply: get my_schema:t1:uid=1387
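On the receiving side, a single request only pays off if the client can turn one result stream into an object without further queries. A minimal sketch, assuming the server returns the parent row first followed by its child rows, each tagged with its table name (a hypothetical format; the slides do not specify one): `assemble` folds that stream into a nested dict in one pass. It handles one level of nesting only.

```python
import json

# Hypothetical result of one request for uid=1387, in group (parent-first) order;
# each row is tagged with the table it came from.
stream = [
    ("t1", {"uid": 1387, "name": "example"}),
    ("t2", {"uid": 1387, "rid": 1}),
    ("t2", {"uid": 1387, "rid": 2}),
    ("t3", {"uid": 1387, "sid": 19390}),
]

def assemble(rows, root="t1"):
    """Fold a parent-first row stream into one nested object in a single pass."""
    table, parent = rows[0]
    if table != root:
        raise ValueError(f"stream must start with a {root} row, got {table}")
    obj = dict(parent)
    for table, row in rows[1:]:
        obj.setdefault(table, []).append(row)   # children grouped by table name
    return obj

print(json.dumps(assemble(stream), indent=2))
```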

Page 21

Object Access in One Request

Page 22

Application Integration

[Figure: write operations go to the MySQL Master (MyISAM / InnoDB storage), which replicates its data to the Akiban Server; problem queries are sent to the Akiban Server through its MySQL adapter; HA redirect enabled]

Fully independent server

Data replicated to Akiban
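The slide names the pieces but not how the application decides where each statement goes. A minimal sketch of one possible routing layer, assuming the application flags its problem queries explicitly; `Backend`, `Router`, and `problem_query` are hypothetical names, replication itself is not modeled, and the fallback to MySQL reflects one reading of "HA Redirect Enabled".

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    """Stand-in for a database connection that records what it receives."""
    name: str
    available: bool = True
    received: list = field(default_factory=list)

    def run(self, sql):
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self.received.append(sql)
        return self.name

class Router:
    """Hypothetical routing layer: the MySQL master takes all writes, Akiban takes
    flagged problem queries, and MySQL serves those reads if Akiban is down."""

    def __init__(self, mysql, akiban):
        self.mysql = mysql
        self.akiban = akiban

    def execute(self, sql, problem_query=False):
        is_read = sql.lstrip().upper().startswith("SELECT")
        if problem_query and is_read:
            try:
                return self.akiban.run(sql)
            except ConnectionError:
                pass   # redirect: MySQL is the replication source, so it can answer too
        return self.mysql.run(sql)

mysql, akiban = Backend("mysql"), Backend("akiban")
router = Router(mysql, akiban)
print(router.execute("INSERT INTO t1 (uid) VALUES (1387)"))          # mysql
print(router.execute("SELECT * FROM t1 JOIN t2 USING (uid)", True))   # akiban
akiban.available = False
print(router.execute("SELECT * FROM t1 JOIN t2 USING (uid)", True))   # mysql
```

Because the Akiban copy is fed by asynchronous replication, a read routed there can lag the master slightly; sending only read-only problem queries to it keeps every write on MySQL.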

Page 23

Akiban is looking for Design Partners!

Do you have:
• Slow multi-join read queries?
• User concurrency or data volume challenges?

http://www.akiban.com/design-partner-program

Page 24

Ah, so you’re…

Denormalizing…no.
- Schema doesn’t change
- Data is stored once, more efficiently

Materializing views…no.
- No triggers or post-processing
- No secondary logical objects

Introducing write latency…no.
- A previous design partner showed a 2x write improvement

Page 25

Table-Grouping: A Closer Look

Each table maintains its own bTree

Artist (Table bTree)
id | name | gender
1 | Lennon | M
2 | Joplin | F

Indexes add their own bTrees:
• Covering index
• Index on frequently joined columns
• Index on common sort order

[Figure: the Artist table bTree alongside separate Covering Index, Join Cols Index, and Sort Order Index bTrees]

How many indexes do you maintain?
• Slow updates == reduced concurrency
• More resources == more overhead
• Ongoing maintenance == high TCO
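To see that each index really is a separate B-tree, here is a small sketch using SQLite, whose `sqlite_master` catalog lists the root page of every table and index B-tree. InnoDB likewise keeps the clustered table and each secondary index in its own B+tree, but does not expose them this simply. The `artist_album` table and the three index definitions are hypothetical, chosen to mirror the three index kinds on the slide.

```python
import sqlite3

# Each table and each index below gets its own B-tree, identified by a root page.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
CREATE INDEX artist_covering ON artist (name, gender);    -- covering index
CREATE INDEX artist_sort ON artist (gender, name);         -- common sort order
CREATE TABLE artist_album (artist_id INTEGER, title TEXT);
CREATE INDEX album_join ON artist_album (artist_id);      -- frequently joined column
""")

btrees = conn.execute(
    "SELECT type, name, rootpage FROM sqlite_master ORDER BY rootpage").fetchall()
for kind, name, root in btrees:
    print(f"{kind:5} {name:15} root page {root}")

# An INSERT into artist must modify its table B-tree plus both of its index B-trees.
print(sum(1 for kind, _, _ in btrees if kind == "index"), "index B-trees to maintain")
```

Each extra index is one more B-tree that every INSERT, and every UPDATE touching its columns, has to modify.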