How to break a legacy application with 15 years of tech debt into two databases.
Logically Sharding a Growing PostgreSQL Database
The Breakup
Introductions
Students hate us.
Introductions
Turnitin.com
Samantha: database
@mzsamantha
Fred: code
@phredmoyer
The Seven Stages Of Grief Scaling
1. Shock and Denial
2. Pain and Guilt
3. Anger and Bargaining
4. Depression & Reflection
5. The Upward Turn
6. Reconstruction
7. Acceptance and Hope
The Seven Stages Of Grief Scaling
1. Shock and Denial: Monolithic Scaling
2. Pain and Guilt: Hardware is Expensive
3. Anger and Bargaining: If We Do It This Way...
4. Depression & Reflection: We Are So *%@#!&
5. The Upward Turn: Down To 150 Bugs!
6. Reconstruction: Release Day
7. Acceptance and Hope: Beer & Therapy (beerapy?)
The Problem
● The ability to back up and restore efficiently
● The amount of RAM required to keep indexes in memory
● Resource contention causing the query planner to make sub-optimal choices
● Aged data increasing query resource usage and execution time
● Overlap in existing ID spaces
● No account crossover between shards, i.e. Tii-UK and Tii require separate accounts
Stage 2: Options
● Account based sharding
  o Difficult to split account usage evenly across shards.
● Geographical based sharding
  o Currently have one geographical shard (UK).
  o Added deployment, poor resource utilization.
● Oracle RAC ($$$)
  o Oracle OpenWorld is Sunday in SF. No bacon there.
● Horizontal sharding
  o Move fast growing tables to separate physical hosts.
  o Break relational constraints.
  o Good path to a service oriented architecture node.
Stage 2: Options
Why Did We Discuss All That Before Phase 1?
Stage 2: Options
Objective Expertise.
Please Step Away From the Application.
Triage
What is going to kill us first?
Stage 1: Diagnose
[Chart: table size in gigs]
Stage 1: Diagnose
Database size: 507 GB
m_object_paper: 94 GB
gm3_mark: 71 GB
m_object: 53 GB
m_report_stats: 35 GB
Four tables account for half the bulk of the entire database.
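The "half the bulk" claim can be checked with a few lines of arithmetic. This is only a sketch using the sizes quoted on the slide; the variable names are illustrative.

```python
# Sizes in GB, copied from the slide above.
database_gb = 507
largest_tables_gb = {
    "m_object_paper": 94,
    "gm3_mark": 71,
    "m_object": 53,
    "m_report_stats": 35,
}

top_four_gb = sum(largest_tables_gb.values())  # combined size of the four tables
share = top_four_gb / database_gb              # fraction of the whole database

print(f"{top_four_gb} GB of {database_gb} GB ({share:.1%})")
```

The four tables add up to 253 GB, just under half of the 507 GB total.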
Stage 1: Diagnose
What About Table Sharding?
Stage 2: Options
Three Part Two Year Proposal: Short, Mid, and Long term Goals.
Short: 3 Months
Query Partition and Refactor
Removal of ‘Leaf Service’: Marks
Stage 2: Options
Three Part Two Year Proposal: Short, Mid, and Long term Goals.
Mid: 9 Months
ID Reconciliation Between Shards
Table Partitioning
Stage 2: Options
Three Part Two Year Proposal: Short, Mid, and Long term Goals.
Long: 12 Months
Create DAL
Removal of Large Tables
Global Statistics and Reporting
Stage 2: Options
Short Term: 12 Months Later
I do not think it means what you think it means.
Stage 3: Scoping The Solution - Database
Main
Marks
Stage 3: Scoping The Solution - Database
Data Up Approach:
Start with the schema
Isolate direct links
Slow, Tedious, and Painful
Stage 3: Scoping The Solution - Database
Foreign-key constraints:
"$1" FOREIGN KEY (source) REFERENCES m_object(id)
"$2" FOREIGN KEY (reader) REFERENCES m_user(id)
"m_dg_read_pm_review_set_fkey" FOREIGN KEY (pm_review_set) REFERENCES pm_review_set(id)
Referenced by:
TABLE "gm_mark" CONSTRAINT "$1" FOREIGN KEY (read) REFERENCES m_dg_read(id)
TABLE "erater_read_filter" CONSTRAINT "erater_read_filter_read_fkey" FOREIGN KEY (read) REFERENCES m_dg_read(id) ON DELETE CASCADE
TABLE "gm3_mark" CONSTRAINT "gm3_mark_read_fkey" FOREIGN KEY (read) REFERENCES m_dg_read(id) ON DELETE CASCADE
TABLE "gm3_rubric_scoring" CONSTRAINT "gm3_rubric_scoring_read_fkey" FOREIGN KEY (read) REFERENCES m_dg_read(id)
TABLE "r_mark_criterion" CONSTRAINT "mark_criterion_read_fkey" FOREIGN KEY (read) REFERENCES m_dg_read(id) ON DELETE CASCADE
TABLE "pm_review" CONSTRAINT "pm_review_id_fkey" FOREIGN KEY (id) REFERENCES m_dg_read(id)
TABLE "r_read_audio" CONSTRAINT "r_read_audio_read_id_fkey" FOREIGN KEY (read_id) REFERENCES m_dg_read(id)
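The "isolate direct links" step can be sketched as a small graph walk over the foreign keys listed above. This is a hypothetical illustration, not the process the speakers used: it assumes m_dg_read is the root of the set being moved, and the function names are made up for the example. In practice, which tables move (pm_review_set, for instance) is a judgment call.

```python
from collections import defaultdict, deque

# (child_table, parent_table) pairs transcribed from the constraint listing above.
FOREIGN_KEYS = [
    ("m_dg_read", "m_object"),
    ("m_dg_read", "m_user"),
    ("m_dg_read", "pm_review_set"),
    ("gm_mark", "m_dg_read"),
    ("erater_read_filter", "m_dg_read"),
    ("gm3_mark", "m_dg_read"),
    ("gm3_rubric_scoring", "m_dg_read"),
    ("r_mark_criterion", "m_dg_read"),
    ("pm_review", "m_dg_read"),
    ("r_read_audio", "m_dg_read"),
]

def dependents_of(root, foreign_keys):
    """Return root plus every table that references it, directly or transitively."""
    children = defaultdict(set)
    for child, parent in foreign_keys:
        children[parent].add(child)
    seen, queue = {root}, deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def crossing_constraints(moved, foreign_keys):
    """Foreign keys with exactly one end in the moved set; these cannot survive the split."""
    return [(c, p) for c, p in foreign_keys if (c in moved) != (p in moved)]

moved = dependents_of("m_dg_read", FOREIGN_KEYS)       # candidate tables for the new database
broken = crossing_constraints(moved, FOREIGN_KEYS)     # constraints to drop and enforce in code
```

With this input, `moved` holds m_dg_read and the seven tables that reference it, and `broken` holds the three constraints pointing back at m_object, m_user, and pm_review_set.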
Stage 3: Scoping The Solution - Database
Original: 236 tables
New main database (192 tables)
New marks database (40 tables)
Stage 3: Scoping The Solution - Code
Option 1 - Data Access Layer (DAL)
o Separate codebase encapsulating new set of tables
o Written in Golang, an HTTP based REST service
o Avoids carrying forward existing technical debt
o Requires detailed knowledge of existing product features
o Unit tests are very helpful, but coverage is never 100%
o 14 years of business logic (dark matter)
o In long lived web apps, tribal knowledge is authoritative
Stage 3: Scoping The Solution - Code
Option 2 - Add additional database handles to new db
o Perceived as a safer approach (deciding factor,
known risks).
o Requires paying interest on existing technical debt.
o Refactoring is less risky than rewriting.
o Take advantage of existing business logic and tribal
knowledge.
o Preserve sacred cows.
Stage 3: Scoping The Solution - Hardware
"We can use smaller hardware because we are splitting off part of the database"
➢ This is somewhat of a fallacy
➢ You might need smaller storage
➢ You might need slightly less CPU
➢ Stick with close to the same amount of RAM
Stage 4: Implementation - Rollback
S: “What if this fails?”
F: “We roll back the code, restore the database,
and look for new jobs.”
Stage 4: Implementation - Rollback
Q: How do you bifurcate a database and
roll back without data loss?
A: Slony.
Stage 4: Implementation - Rollback
Timelines matter. Prepare in advance.
Split Replication Well In Advance.
Test Process, Then Test It Again.
Stage 4: Implementation - Archaeology
● What is this table? That service doesn’t exist anymore?
  ○ Let’s Drop it!
● What’s that table? It’s an old version still in use?
  ○ Let’s Drop it!
● What’s that one over there?
  ○ Let’s Drop it!
Stage 4: Implementation - Archaeology
Wait… old version still in use?
Stage 4: Implementation - Archaeology
● Fourteen years of application development.
● Five major codebases, dozens of support utilities.
● Hundreds of codepoints for database connections.
● A dozen different ORMs.
● Dynamically generated SQL joining tables.
● Technical debt (code with high maintenance costs).
● Best practices of 10 years ago are now liabilities.
How do you change all of the electrical sockets in an
(old) office building?
Stage 4: Implementation - Archaeology
EMPATHY
Stage 4: Implementation - Archaeology
EMPATHY
put yourself in the mind of the author
Stage 4: Implementation - Archaeology
# this code is critical to our workflow, don’t remove it!!
# for details talk to jamesb <> who sits near the elevator
# $foo = $object->flocculate( key => $cfg->secret_key );
# return $foo;
return;
James left 8 years ago. The elevator is in the old building. They tore down the old building to build a Target.
Stage 4: Implementation - Archaeology
Bob is still here, though. Bob is a little particular about his code (we all are to some degree).
Now you’re in there meddling with Bob’s code. How would you feel if you were Bob?
A little empathy goes a long way towards getting Bob to help you get his code ported to the new dual database schema.
Stage 4: Implementation - Queries
Main database - Marks database
SELECT count(m.*)
FROM gm3_mark m, gm3_qm_template qmt
WHERE m.read IN (
    SELECT dgr.id
    FROM m_dg_read dgr
    JOIN m_object_paper mop ON (mop.id = dgr.source AND mop.owner = ?)
    JOIN m_assignment ma ON (ma.id = mop.assignment AND ma.class = ?)
    WHERE reader = ?
)
AND m.qm_template = qmt.id
AND qmt.id = ?
Main Database - grab ids to pass to marks database.
SELECT p.id
FROM m_object_paper p
JOIN m_assignment a ON a.id = p.assignment
WHERE a.class = ? AND p.owner = ?
Stage 4: Implementation - Queries
Marks database - pass former FK ids to an IN clause.
SELECT count(m.*)
FROM gm3_mark m
JOIN gm3_qm_template qmt ON qmt.id = m.qm_template
JOIN m_dg_read dgr ON dgr.id = m.read
WHERE dgr.source IN (?, ?, ?)
AND qmt.id = ?
AND dgr.reader = ?
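The two-step pattern above (fetch ids from main, pass them to an IN clause on marks) might look roughly like this in application code. It is a sketch only: Python's sqlite3 stands in for the two PostgreSQL handles so the example is self-contained, the function name count_marks is hypothetical, and count(*) replaces count(m.*) because SQLite does not accept the latter.

```python
import sqlite3

def count_marks(main_db, marks_db, class_id, owner_id, reader_id, template_id):
    """Count marks for a reader using one query per database instead of a cross-database join."""
    # Step 1 (main database): collect the paper ids that used to be joined directly.
    paper_ids = [row[0] for row in main_db.execute(
        "SELECT p.id FROM m_object_paper p "
        "JOIN m_assignment a ON a.id = p.assignment "
        "WHERE a.class = ? AND p.owner = ?",
        (class_id, owner_id),
    )]
    if not paper_ids:
        return 0  # an empty IN () list is invalid SQL, and there is nothing to count

    # Step 2 (marks database): one placeholder per id, so values are still bound, not interpolated.
    placeholders = ", ".join("?" for _ in paper_ids)
    sql = (
        "SELECT count(*) FROM gm3_mark m "
        "JOIN gm3_qm_template qmt ON qmt.id = m.qm_template "
        "JOIN m_dg_read dgr ON dgr.id = m.read "
        f"WHERE dgr.source IN ({placeholders}) AND qmt.id = ? AND dgr.reader = ?"
    )
    return marks_db.execute(sql, (*paper_ids, template_id, reader_id)).fetchone()[0]
```

Only the placeholder string is built dynamically; the ids themselves are bound as parameters. Very long id lists would need batching, which this sketch does not attempt.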
Stage 4: Implementation - Transactions
Single database transactions are easy.
eval {
    $db->do("INSERT INTO foo (name) VALUES ('bar')");
    # do() returns a row count, so fetch the sequence value explicitly
    my ($id) = $db->selectrow_array("SELECT CURRVAL('foo')");
    $db->do("INSERT INTO fee (foo_id) VALUES (?)", undef, $id);
};
if ($@) {             # catch exception
    $db->rollback;    # roll transaction back
} else {
    $db->commit;      # commit transaction
}
Stage 4: Implementation - Transactions
Dual database transactions are harder.
eval {
    # insert into foo in main db, grab last value
    $main_db->do("INSERT INTO foo VALUES ('bar')");
    ($foo_id) = $main_db->selectrow_array("SELECT CURRVAL('foo')");

    # insert foo id into marks db, grab last value
    $marks_db->do("INSERT INTO fee VALUES (?)", undef, $foo_id);
    ($fee_id) = $marks_db->selectrow_array("SELECT CURRVAL('fee')");
};
Stage 4: Implementation - Transactions
Roll back both handles on exception, commit both on success.
if ($@) {                   # catch exception
    $main_db->rollback;     # roll main_db back
    $marks_db->rollback;    # roll marks_db back
} else {
    $main_db->commit;       # commit main_db
    $marks_db->commit;      # commit marks_db
}
Stage 4: Implementation - Transactions
What if the commit fails?
if ($@) {                   # catch exception
    $main_db->rollback;     # roll main_db back
    $marks_db->rollback;    # roll marks_db back
} else {
    eval { $main_db->commit };
    if ($@) {
        $main_db->rollback;
        $marks_db->rollback;
    }
    eval { $marks_db->commit };
    ...
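The dual-handle pattern can be sketched end to end as follows. This is an illustration under assumptions, not the talk's actual code: sqlite3 stands in for the two PostgreSQL handles, and dual_insert and the foo/fee tables are placeholders borrowed from the slides. It also shows the gap that cannot be closed without two-phase commit: once main has committed, a failed marks commit can only be reported, not undone.

```python
import sqlite3

def dual_insert(main_db, marks_db, name):
    """Insert a parent row in main and a child row in marks.

    Returns (foo_id, fully_committed). Errors before the first commit roll
    back both handles and re-raise.
    """
    try:
        cur = main_db.execute("INSERT INTO foo (name) VALUES (?)", (name,))
        foo_id = cur.lastrowid
        marks_db.execute("INSERT INTO fee (foo_id) VALUES (?)", (foo_id,))
    except sqlite3.Error:
        main_db.rollback()   # nothing has been committed yet, so both sides
        marks_db.rollback()  # can still be undone cleanly
        raise

    main_db.commit()
    try:
        marks_db.commit()
    except sqlite3.Error:
        # main is already durable and cannot be rolled back; the caller has
        # to repair the orphaned foo row later.
        marks_db.rollback()
        return foo_id, False
    return foo_id, True
```

A caller that receives `False` knows a foo row exists in main with no matching fee row in marks, and can queue it for cleanup.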
Stage 4: Implementation - Transactions
CAP (Brewer’s Theorem)
Stage 4: Implementation - Transactions
Consistency or Availability?
Stage 4: Implementation - Transactions
9 out of 10 users prefer availability
So does customer support.
You can fix consistency.
Stage 4: Implementation - ORMs
ORMs are full of pain
● They hide away db connection details.
● They make it hard to break models apart.
● They make writing code easy…
● But debugging is much more difficult.
Stage 4: Implementation - ORMs
ORMs are full of pain
Back in my day we used SQL, and we liked it.
$classes = $c->classes->search(
    $select_hash,
    {
        '+select' => 'source.id',
        '+as'     => 'src_id',
        'join'    => [
            { 'user_rights_class' => { 'user_role' => 'owner' } },
            'source'
        ],
        'rows' => 200,
        'page' => 1
    }
);
Stage 4: Implementation - Juggling
Talking to two databases is easy, right?
Stage 4: Implementation - Juggling
Talking to two databases is easy, right?
Not as easy as it seems.
Stage 4: Implementation - Juggling
Main database - Marks database
Are you talking to me?
Stage 4: Implementation - Juggling
Main database - Marks database
I think he was talking to me.
Stage 4: Implementation - Config
● Main Database: One master, two slaves (2)
● Marks Database: One master, two slaves (2)
● ASP application: write user, read only user (2)
● Catalyst Application: write user, read only user (2)
● REST Application: write user, read only user (2)
● dev, qa, staging, production, sandbox, uk (6)
Stage 4: Implementation - Config
● Database hosts and users: 2*5 = 10
● Stages: 10 * 6 = 60
● Config managed in version control, no discovery.
● Config deployed via RPM with application.
● Get one wrong? Start all over again.
● Configuration is full of pain and suffering.
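The counts above can be reproduced by enumerating the matrix. The identifiers below are illustrative labels, not the real config keys, and reading each "(2)" as two variants per component is an assumption.

```python
from itertools import product

# Five components with two variants each, as listed on the slides.
COMPONENTS = {
    "main_db": ("master", "slave"),
    "marks_db": ("master", "slave"),
    "asp_app": ("write_user", "read_only_user"),
    "catalyst_app": ("write_user", "read_only_user"),
    "rest_app": ("write_user", "read_only_user"),
}
STAGES = ("dev", "qa", "staging", "production", "sandbox", "uk")

# 2 * 5 = 10 entries per stage
per_stage = [
    (name, variant)
    for name, variants in COMPONENTS.items()
    for variant in variants
]

# 10 * 6 = 60 entries overall, each one a place to get a host or user wrong
all_entries = [
    (stage, name, variant)
    for stage, (name, variant) in product(STAGES, per_stage)
]
```

Every one of the 60 tuples corresponds to a value that has to be correct in version control before the RPM ships.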
Stage 4: Implementation - Config
Yes, we are moving to Chef.
Stage 4: Implementation - Tech Debt
How much tech debt do you have?
Stage 4: Implementation - Tech Debt
How much tech debt do you have?
More than you think.
Stage 4: Implementation - Tech Debt
How much of it will you have to deal with?
Stage 4: Implementation - Tech Debt
How much of it will you have to deal with?
More than you think.
Stage 4: Implementation - Tech Debt
Our legacy app:
● 5 ORMs
● No unit tests (many integration tests)
● Two template frameworks
● 9 different log files
● Code is generally pretty readable!
Stage 5: Release
Planned 8 hour Maintenance Window
15 People + support
2.5 Hours Main Service
1.5 Hours UK
2 Hours Sandbox + Cat Videos
Stage 6: Cleanup
Patch Flavors:
How Did That Get There?
That’s a bug.
It worked fine in dev.
Stage 6: Cleanup
“Sometimes the query planner does dumb things”
o People forget why you embarked on this effort.
o People forget the successes and risk mitigation.
o People won’t forget the visceral reactions to
service degradations.
Stage 6: Cleanup
How to bring your site to a halt:
1. Start transaction to database 1
2. Start transaction to database 2
3. Wait for database 1 to finish
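One possible way to avoid the three steps above is to finish and commit the work on the first database before opening anything on the second, so the second handle never sits idle inside a transaction. The sketch below assumes the two writes do not need to be atomic (availability over consistency); sqlite3 again stands in for the real handles, and insert_sequentially and the foo/fee tables are hypothetical.

```python
import sqlite3

def insert_sequentially(main_db, marks_db, name):
    """Commit the main-database work first, then run a short marks transaction."""
    # Step 1: all main work, committed before the marks handle is touched.
    try:
        cur = main_db.execute("INSERT INTO foo (name) VALUES (?)", (name,))
        foo_id = cur.lastrowid
        main_db.commit()
    except sqlite3.Error:
        main_db.rollback()
        raise

    # Step 2: the marks transaction opens only now and is kept as short as possible.
    try:
        marks_db.execute("INSERT INTO fee (foo_id) VALUES (?)", (foo_id,))
        marks_db.commit()
    except sqlite3.Error:
        marks_db.rollback()  # main stays committed; the mismatch is repaired later
        raise
    return foo_id
```

The trade-off is that a failure in step 2 leaves a foo row without its fee row, which has to be fixed after the fact.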
Stage 6: Cleanup
PANIC
Stage 6: Cleanup
Gone in 60 seconds
Stage 6: Cleanup
Where Do We Golang From here?
Back To Plan A.
Most of the heavy lifting is done.
“The first split is the hardest” - Some Guy Here
The End
So long SurgeCon!
And thanks for the bacon.