Addressing vendor weaknesses in user space (Robert Treat)

Addressing Vendor Weaknesses in User-Space

ROBERT TREAT, OmniTI

Highload++ 2011

@robtreat2

xzilla.net

+Robert Treat

Monday, October 3, 11

Who Am I?

OmniTI - Internet Scalability Consultants

Lead Database Operations

“Large Scale”

High Transactions

TB+ Data

Mission Critical

Who Am I?

Database Operations @OmniTI

Postgres, MySQL, Oracle & More

Postgres for Scalability

Traditional RDBMS
Highly Extensible
Runs Everywhere
Talks To Everything
“BSD” Licensed
15+ Years Development
Open Development Community

The Bloat Problem

Data Footprint Can Be Critical To Performance

Size On Disk Affects The Needs Of RAM, Disk Speed, Storage

“Bloat” is unused, wasted disk space, taken up by the database, but not needed for actual data storage

Why?

MVCC Architecture

Multiversion Concurrency Control (MVCC) allows Postgres to offer high concurrency even during significant database read/write activity. MVCC specifically offers behavior where "readers never block writers, and writers never block readers".

MVCC Architecture

• Oracle

• MySQL (InnoDB)

• Informix

• Firebird

• MSSQL (optional)

• CouchDB

“Bloat” Manifests Differently, But Is Common

• MongoDB (deletes, some updates)
  • dump/restore
  • mongod --repair
  • db.runCommand( { compact : 'mycollectionname' } )
• Lucene (updates)
• Hadoop / HDFS (small files)

Postgres MVCC Architecture

• Implemented Postgres 6.5
  • 1999, Vadim Mikheev
• MVCC Unmasked
  • http://momjian.us/main/writings/pgsql/mvcc.pdf

Postgres MVCC Architecture

• Postgres maintains global transaction counters
• Keeps track of transaction counter per row for
  • creating transaction
  • removing transaction
• Using these counters, Postgres allows different transactions to see different rows, based on visibility rules.

Transaction Reading An Old Row Doesn’t Block Transaction Writing A Row

MVCC Architecture

INSERT: user_id X42, Create 32, Expire

DELETE: user_id X42, Create 32, Expire 38

MVCC Architecture

UPDATE

OLD (delete): user_id X69, Create 43, Expire 56

NEW (insert): user_id X69, Create 43, Expire

MVCC Architecture

Clean Up / Bloat

user_id X69, Create 43, Expire 56 <~~ DEAD ROW

user_id X69, Create 43, Expire <~~ VISIBLE ROW

Speed Up SQL Commands By Dealing With Clean Up Later
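The create/expire counters and the deferred cleanup can be sketched in a few lines of Python. This is a simplified model, not Postgres's actual visibility code: it ignores commit status and in-progress transactions, and the names `RowVersion`, `is_visible`, `update`, and `dead_rows` are made up for illustration. It also assumes the new row version is stamped with the updating transaction's counter (56 here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    user_id: int
    create_xid: int                    # transaction that created this version
    expire_xid: Optional[int] = None   # transaction that removed it, if any


def is_visible(row: RowVersion, snapshot_xid: int) -> bool:
    """Simplified rule: created at or before the snapshot and not yet expired for it."""
    created = row.create_xid <= snapshot_xid
    expired = row.expire_xid is not None and row.expire_xid <= snapshot_xid
    return created and not expired


def update(heap: list, old: RowVersion, xid: int) -> RowVersion:
    """An UPDATE expires the old version and appends a new one; nothing is cleaned up."""
    old.expire_xid = xid
    new = RowVersion(old.user_id, create_xid=xid)
    heap.append(new)
    return new


def dead_rows(heap: list, oldest_active_xid: int) -> list:
    """Versions expired before every active snapshot can never be seen again."""
    return [r for r in heap
            if r.expire_xid is not None and r.expire_xid < oldest_active_xid]


heap = [RowVersion(user_id=69, create_xid=43)]
update(heap, heap[0], xid=56)

seen_at_50 = [is_visible(r, 50) for r in heap]   # a reader that started before the update
seen_at_60 = [is_visible(r, 60) for r in heap]   # a reader that started after it
bloat = dead_rows(heap, oldest_active_xid=61)
```

The reader at 50 keeps seeing the old version while the writer adds the new one, which is the "readers never block writers" behavior. Once no snapshot older than 56 remains, the old version is dead weight until something cleans it up.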

How Postgres Deals With Bloat

• Heap-Only-Tuples (HOT)
  • On-The-Fly, Per Page Cleanup
  • Marks Given Row’s Space Reusable
  • Update Only
• VACUUM
  • Non-Blocking Bulk Cleanup
  • Removes End-Of-File Pages
  • “autovacuum” Process Monitors Tables

Problems With Automatic Cleanup

• HOT
  • Update Only
  • Doesn’t Work When Changing Index Data
• VACUUM
  • Must Wait For Long Transactions To Complete
  • Costs I/O, Can Only Work So Fast
  • Can’t Remove Non End-Of-File Pages
  • Leaves A “High Water Mark”

Dealing With Bloat - The Hard Way

• VACUUM FULL / CLUSTER
  • The Good
    • Reclaims All “Dead Rows”
  • The Bad
    • Exclusive Lock
    • Rewrite All Data In Tables
    • Needs Working Space
    • Heavy I/O

Monitoring Your Bloat

• check_postgres.pl
  • Nagios plugin
  • Compares physical size to row size estimates
  • http://bucardo.org/wiki/Check_postgres
• “bloat report”
  • Script to measure table/index bloat
  • Compares physical size to row size estimates
  • http://labs.omniti.com/labs/pgtreats/browser/trunk/tools/
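The idea of comparing physical size to row size estimates can be illustrated with some arithmetic. This is only a rough sketch, not the query either tool actually runs: the function name `estimate_bloat` and its parameters are hypothetical, `avg_row_bytes` is assumed to already include per-row overhead, and the 8192-byte page with a 24-byte page header reflects a default Postgres build.

```python
import math

PAGE_SIZE = 8192  # bytes; the default Postgres block size


def estimate_bloat(actual_pages: int, live_rows: int, avg_row_bytes: int,
                   page_overhead: int = 24) -> dict:
    """Compare the pages a table occupies with the pages its live rows should need."""
    usable = PAGE_SIZE - page_overhead
    rows_per_page = max(1, usable // avg_row_bytes)
    expected_pages = math.ceil(live_rows / rows_per_page)
    wasted_pages = max(0, actual_pages - expected_pages)
    return {
        "expected_pages": expected_pages,
        "wasted_pages": wasted_pages,
        "wasted_bytes": wasted_pages * PAGE_SIZE,
        "bloat_ratio": actual_pages / expected_pages if expected_pages else 0.0,
    }


# Hypothetical table: 10,000 pages on disk, 400,000 live rows of about 100 bytes each.
report = estimate_bloat(actual_pages=10_000, live_rows=400_000, avg_row_bytes=100)
```

In this made-up example roughly half the table is wasted space. Because the inputs are estimates from statistics, a report like this is a signal to investigate rather than an exact measurement.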

Dealing With Bloat In Userspace

• Solving MVCC Bloat Is A “Hard Problem”
  • Even a good solution would be hard to implement in core
• Can we build a tool in user space?
  • Develop solution quicker
  • Easier to deploy and maintain
  • Provide a prototype for future development

Dealing With Bloat Redux

• Updating A Row Rewrites Data To New Location
• Use Vacuum To Mark Old Rows “Reusable”
• Update Row To Rewrite Data At “Front” Of Page
• Use Vacuum To Reclaim Space From End Of File
• Put A Script On It
  • https://labs.omniti.com/pgtreats/trunk/tools/compact_table
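The update-then-vacuum trick can be modeled with a toy table made of pages and slots. This is a sketch of the idea only, not the compact_table script: the functions `vacuum_truncate`, `first_free_slot`, and `compact` are invented for illustration, and the model assumes an update always lands in the earliest free slot, which a real server does not guarantee.

```python
def vacuum_truncate(pages):
    """Like plain VACUUM in this model: only trailing, completely empty pages go away."""
    while pages and all(slot is None for slot in pages[-1]):
        pages.pop()
    return len(pages)


def first_free_slot(pages, before_page):
    """Find a reusable slot in a page earlier than `before_page`."""
    for p in range(before_page):
        for s, slot in enumerate(pages[p]):
            if slot is None:
                return p, s
    return None


def compact(pages):
    """Re-write rows from the tail into free space near the front, then truncate."""
    for p in range(len(pages) - 1, 0, -1):
        for s, row in enumerate(pages[p]):
            if row is None:
                continue
            target = first_free_slot(pages, before_page=p)
            if target is None:
                return vacuum_truncate(pages)   # no room left up front
            tp, ts = target
            pages[tp][ts] = row      # the "update" writes a new row version up front
            pages[p][s] = None       # old version is dead; vacuum makes it reusable
    return vacuum_truncate(pages)


# None marks space that an earlier vacuum already made reusable.
table = [
    ["a", None, None, "b"],
    [None, None, None, None],
    [None, "c", None, None],
    [None, None, None, "d"],
]
pages_before = vacuum_truncate(table)   # the last page still holds "d"
pages_after = compact(table)
```

Before compaction, vacuum cannot shrink the file because a single live row sits in the last page (the "high water mark"). After the tail rows are rewritten toward the front, the trailing pages are empty and can be truncated.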

Dealing With Bloat Redux

• “Compact Table”
  • Requires Lots of Time, I/O
  • Often Causes Heavy Index Bloat
  • Heavy Concurrency Bloats Faster Than We Can Recover It

Dealing With Bloat For Real!

• Enter “pg_reorg”
  • Vacuum / Cluster Replacement
  • Command Line Tool
  • Online Table Rewrite
  • Uses Minimal Locking
  • Developed By NTT
  • BSD Licensed
  • C Code
  • http://pgfoundry.org/projects/reorg/

How pg_reorg Works

• Create a log table for changes
• Create triggers on the old table to log changes (I/U/D)
• Create a new table with a copy of all data in the old table
• Create all indexes on the new table
• Apply all changes from the log table to the new table
• MODIFY THE SYSTEM CATALOGS INFORMATION ABOUT THE TABLE FILES (!!!)
• Drop the old table, leaving the new table in its place
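The sequence of steps can be walked through with an in-memory model. This is not pg_reorg's implementation, which does the work in SQL and C against real tables and takes locks around the swap. The function `online_rebuild`, the dict standing in for the catalog, and the tuple format for writes are all assumptions made for this sketch.

```python
def online_rebuild(catalog, name, concurrent_writes):
    """Rebuild catalog[name] while replaying writes that arrive during the copy."""
    old = catalog[name]
    log = []                                   # 1. log table for changes

    def trigger(op, key, value=None):          # 2. trigger: write to old table, log it
        if op == "D":
            old.pop(key, None)
        else:                                  # "I" or "U"
            old[key] = value
        log.append((op, key, value))

    new = dict(old)                            # 3. copy of all data in the old table
    # 4. (indexes would be built on the new table here)

    for write in concurrent_writes:            # traffic hitting the old table meanwhile
        trigger(*write)

    for op, key, value in log:                 # 5. apply logged changes to the new table
        if op == "D":
            new.pop(key, None)
        else:
            new[key] = value

    catalog[name] = new                        # 6. swap what the catalog points at
    old.clear()                                # 7. drop the old table's storage
    return new


catalog = {"orders": {1: "new", 2: "paid", 3: "new"}}
writes = [("U", 1, "paid"), ("D", 3), ("I", 4, "new")]
rebuilt = online_rebuild(catalog, "orders", writes)
```

Steps 1 through 5 use only ordinary tables and triggers. Step 6 is the one that reaches into the system catalogs, which is why that step gets the warning.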

Dealing With Bloat For Real!

Open Source Code

The Power Is In Your Hands

Look At Code

Examine the SQL

(User Space Is Really Visible)

TEST!

Dealing With Bloat For Real!

What Does Testing Look Like?

Create Some Tables, Create Artificial Bloat,

run pg_reorg

WIN!

Dealing With Bloat For Real!

Test In “Prod”

Find Some Bloated Tables, Make Backup Of Tables,

Cross Fingers, pg_reorg

WIN!

Dealing With Bloat For Real!

Eventually You Have To Use It On Something That Matters

pg_reorg In The Real World

• Production Database (OLTP)
  • 540GB Size
  • 2000 TPS (off-peak time, multiple statements)
  • Largest Table (pre-reorg) 127GB
• Rebuild Stats
  • 5.75 Hours To Rebuild
  • Reclaimed 52GB Disk Space
  • No outages reported for Website/APIs

pg_reorg In The Real World

YAY!


Return Of The Jedi


“your overconfidence is your weakness.”

-Luke Skywalker


“your faith in your friends is yours.”

-Emperor Palpatine


Sometimes You Can Have Both

Trust in NTT’s Code == faith in friends

Success in production == overconfidence

When Good pg_reorgs Go Bad!

WARNING:  unexpected attrdef record found for attr 61 of rel orders

WARNING:  1 attrdef record(s) missing for rel orders

Yes, On A Production System

Yes, Trying To Take 1000s of Orders Per Second

When Good pg_reorgs Go Bad!

create table test ( a int4, b int4 default 2112, c bool);

Postgres internals track defaults / constraints based on column position “2”, not column name “b”

If you drop column “a” and then run pg_reorg, column “c” becomes column “2”, and the default 2112 ends up on a boolean

This Is Fair - pg_reorg hacks the system tables
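The mismatch is easy to reproduce in a small model where defaults are keyed by column position. This is a simulation of the failure mode as described above, not a reading of the real catalogs or of pg_reorg's code. The helpers `rebuild_columns` and `resolve_defaults` are hypothetical.

```python
def rebuild_columns(columns, dropped):
    """A rebuilt table only contains the surviving columns, renumbered from 1."""
    survivors = [c for c in columns if c["name"] not in dropped]
    return [{"pos": i, "name": c["name"], "type": c["type"]}
            for i, c in enumerate(survivors, start=1)]


def resolve_defaults(columns, defaults_by_position):
    """Look up each default by column position, not by column name."""
    by_pos = {c["pos"]: c for c in columns}
    return {by_pos[pos]["name"]: (by_pos[pos]["type"], value)
            for pos, value in defaults_by_position.items() if pos in by_pos}


# create table test ( a int4, b int4 default 2112, c bool);
original = [
    {"pos": 1, "name": "a", "type": "int4"},
    {"pos": 2, "name": "b", "type": "int4"},
    {"pos": 3, "name": "c", "type": "bool"},
]
defaults = {2: 2112}          # keyed by position 2, not by the name "b"

before = resolve_defaults(original, defaults)
rebuilt = rebuild_columns(original, dropped={"a"})
after = resolve_defaults(rebuilt, defaults)   # the defaults were never renumbered
```

Before the rebuild the default belongs to the int4 column "b". Afterwards the same position points at the boolean column "c", which matches the warnings about unexpected and missing attrdef records.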

When Good pg_reorgs Go Bad!

Basic Fix: Drop All Defaults And Recreate

Alternative Fix: Hack System Catalogs Some More

Haven’t we had enough system catalog hacking for now?

When Good pg_reorgs Go Bad!

“now, if you'll excuse me, I'll go away and have a heart attack.”


What Next?

Report Problem To Mailing List

Submit A Patch

Ultimately The Problem Is Fixed

Everyone’s Happy?

Hackers Discussion

Postgres Development Community Is Funny

Sometimes Hard To Get Them To Recognize Problems

Not Everyone Sees Online Rebuild As A Big Problem

In All Fairness, Not Everyone Has This Problem

Hackers Discussion

Hackers Meeting 2011, Discussion On Internal Queuing System

Could Be Used As Underlying Basis For On-Line Rebuilding

Until Then...


pg_reorg Is A Great Tool!

Best Option For Difficult Situation

Just Be Careful!

THANKS!

Highload++, NTT, OmniTI, Postgres Community

Momjian, Depesz, Patel, Kocoloski

xzilla.net

@robtreat2

+ Robert Treat