tyrone-hinderson
Data Migration at Scale
MOVING THE ELEPHANT IN THE ROOM
· BDPA Los Angeles Chapter
· 4-year HSCC participant
· Columbia University, CC ’14
· Conductor, Inc.
· linkedin.com/in/calltyrone
WHO AM I?
· Web Presence Management
· SaaS
· Big data
· Collect 6 TB of raw web data a week
· Scalable Collection & ETL pipelines
· Final Product: reports
· 6 years running
· Tons of data!
CONDUCTOR, INC.
· Growth
· More users
· More data
· Systems have to keep up!
WHY WE CARE ABOUT SCALABILITY
HORIZONTAL SCALING
VERTICAL SCALING
· Yesterday’s solution is tomorrow’s problem
· Under-prioritized
· It’s hard!
· Can require massive changes
· No cure-all
SCALABILITY IN THE REAL WORLD
· Save money
· Improve performance
· Clear the way for progress
WHY REPLACE AN UNSCALABLE SYSTEM?
· If it ain’t broke…
· Significant Resource Investment
  · Time
  · Money
· Software Downtime
· Data Quality Concerns
WHY NOT?
1. Identify an unscalable system
2. Discover and vet a suitable successor
3. Replace the legacy system with the new system
   · while minimizing risk and cost
Simple, no???
YOUR TASK, AT A GLANCE
TALKING ABOUT THE ELEPHANT
Identifying an Unscalable System
· MySQL
· Normalized data model
· Helpful for initial modeling of our problem space
· Hosted by a single, very powerful machine
CASE STUDY: LEGACY REPORTING DATABASE
Overview
Talking about the Elephant: Diagnosing an Unscalable System
· Powerful hardware isn’t cheap.
· Vertical Scaling
· Obsolete Schema
· Difficult to back up
· Queries aren’t getting any faster.
CASE STUDY: LEGACY REPORTING DATABASE
Unsustainable
· If your solution…
  · Scales vertically
  · Prevents progress
  · Can’t perform at scale
  · Is difficult/slow/expensive to upgrade
…It’s time for a change!
SEE FOR YOURSELF
FINDING A BIGGER ROOM
Vetting Scalable Alternatives
· Price-efficient
· Easy to maintain
· Scales Horizontally
WHAT TO LOOK FOR
Finding a Bigger Room: Vetting Scalable Alternatives
· Write once, read many
· De-normalized reports
· High storage capacity
· High Availability
CASE STUDY: AWS S3 DATASTORE
Our Use Case
· Write once, read many
  · Decent write performance, great read performance
· De-normalized reports
  · Flat files
· High storage capacity
  · No defined space limit
· High Availability
  · Configurable file replication
CASE STUDY: AWS S3 DATASTORE
Technical Overview
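A minimal sketch of what "write once, read many" with de-normalized flat files could look like. The `WriteOnceStore` class is an in-memory stand-in for a bucket, not an AWS API, and the key layout, column names, and sample rows are all assumptions made for illustration.

```python
import csv
import io


class WriteOnceStore:
    """In-memory stand-in for an object-store bucket (hypothetical, not an AWS API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, body):
        # Reports are written once; a second write to the same key is rejected.
        if key in self._objects:
            raise KeyError(f"{key} already written")
        self._objects[key] = body

    def get(self, key):
        return self._objects[key]


def report_key(account_id, time_period, report_name):
    # Assumed key layout: one flat file per account, period, and report.
    return f"reports/account={account_id}/period={time_period}/{report_name}.csv"


def to_flat_file(rows, columns):
    # Serialize de-normalized rows to CSV text; no joins are needed at read time.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample data for illustration only.
rows = [
    {"keyword": "data migration", "rank": 3, "url": "example.com/a"},
    {"keyword": "scalability", "rank": 7, "url": "example.com/b"},
]
store = WriteOnceStore()
key = report_key(42, "2014-W10", "keyword_rankings")
store.put(key, to_flat_file(rows, ["keyword", "rank", "url"]))
```

Because every report is a self-contained file under a predictable key, readers only need the key to fetch it, which is what makes the read path cheap.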
· Cheap
· Cloud-based
· Architecture facilitates testing
· Easy to back up
CASE STUDY: AWS S3 DATASTORE
Benefits
· “Eventual Consistency”
· Switching to non-relational storage is nontrivial
  · Application code must change
  · Migration path gets complicated
CASE STUDY: AWS S3 DATASTORE
Caveats
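One way application code might have to change under eventual consistency: a read issued right after a write can miss, so the reader retries with backoff. This is a sketch only. `EventuallyConsistentStore` is a toy that simulates delayed visibility, and the retry counts and delays are assumptions.

```python
import time


class EventuallyConsistentStore:
    """Toy store: a new key becomes readable only after `lag_reads` failed reads."""

    def __init__(self, lag_reads=2):
        self._data = {}
        self._pending = {}
        self._lag_reads = lag_reads

    def put(self, key, value):
        self._data[key] = value
        self._pending[key] = self._lag_reads

    def get(self, key):
        # Simulate a write that has not yet propagated to the reader.
        if self._pending.get(key, 0) > 0:
            self._pending[key] -= 1
            raise KeyError(key)
        return self._data[key]


def read_with_retry(store, key, attempts=5, base_delay=0.01):
    # Retry missed reads with exponential backoff; re-raise after the last attempt.
    for attempt in range(attempts):
        try:
            return store.get(key)
        except KeyError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


store = EventuallyConsistentStore(lag_reads=2)
store.put("reports/42.csv", "keyword,rank\n")
body = read_with_retry(store, "reports/42.csv")
```

A relational database would never need this wrapper, which is part of why the switch is nontrivial.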
MOVING THE ELEPHANT
Migrating Legacy Data to the New System
· Time Frame
  · Scheduling Constraints
· Operational Cost
  · Resource Constraints
· Standards for data parity
INITIAL CONSIDERATIONS
Moving the Elephant: Migrating Legacy Data to the New System
· Two-month finish line
· Developed COGS models
· Built data validation software
CASE STUDY: OUR UPFRONT PLANNING
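The slides do not describe how the data validation software worked. As a rough sketch of one possible parity check, the code below hashes each row on both sides and reports rows that are missing, unexpected, or different. The function names, the `id` key, and the sample rows are hypothetical.

```python
import hashlib


def row_digest(row, columns):
    # Canonicalize the chosen columns so both sides hash identically.
    canonical = "\x1f".join(str(row[c]) for c in columns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parity_report(legacy_rows, migrated_rows, key, columns):
    # Index both sides by primary key, then compare digests.
    legacy = {r[key]: row_digest(r, columns) for r in legacy_rows}
    migrated = {r[key]: row_digest(r, columns) for r in migrated_rows}
    shared = legacy.keys() & migrated.keys()
    return {
        "missing": sorted(legacy.keys() - migrated.keys()),
        "unexpected": sorted(migrated.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != migrated[k]),
    }


# Made-up rows for illustration only.
legacy = [{"id": 1, "rank": 3}, {"id": 2, "rank": 7}, {"id": 3, "rank": 9}]
migrated = [{"id": 1, "rank": 3}, {"id": 2, "rank": 8}, {"id": 4, "rank": 1}]
report = parity_report(legacy, migrated, key="id", columns=["id", "rank"])
```

An empty report for a partition is one concrete way to state a "standard for data parity" before the migration starts.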
· Can be scaled up or down
  · Speed up to save time
  · Slow down to save resources
· Can be run in a testing capacity
  · Configurable data sources/sinks
  · Configurable hardware resource use
IDEAL MIGRATION SOFTWARE CHARACTERISTICS
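These characteristics can be pictured as a small configuration object. The sketch below is illustrative only: `MigrationConfig`, its field names, the source and sink labels, and all numbers are assumptions, and the estimate assumes partitions take equal time and run perfectly in parallel.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationConfig:
    source: str            # where data is read from (production or a QA copy)
    sink: str              # where data is written to (production or a test target)
    workers: int           # partitions migrated concurrently (hardware use)
    dry_run: bool = False  # rehearsal mode for testing


def estimated_wall_hours(config, partitions, minutes_per_partition):
    # Workers process partitions in waves; more workers means fewer waves.
    waves = math.ceil(partitions / config.workers)
    return waves * minutes_per_partition / 60


# Hypothetical settings: same job, different time/resource trade-offs.
fast = MigrationConfig("legacy-db", "prod-bucket", workers=50)
slow = MigrationConfig("legacy-db", "prod-bucket", workers=10)
rehearsal = MigrationConfig("qa-db", "qa-bucket", workers=2, dry_run=True)

fast_hours = estimated_wall_hours(fast, partitions=300, minutes_per_partition=30)
slow_hours = estimated_wall_hours(slow, partitions=300, minutes_per_partition=30)
```

Under these assumed numbers the fast setting finishes in 3 hours using 50 workers at once, while the slow one takes 15 hours but never needs more than 10.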
· Oozie and Hive
· Controllable time/resource tradeoff
· Testable in a QA environment
OUR MIGRATION SOFTWARE
· Easy to track progress
· Enables concurrency
· Dilutes failure risks
· E.g. Conductor “Time Periods”
AN INCREMENTAL MIGRATION: PARTITIONING DATA
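A sketch of how partitioning gives these three properties, using Python threads rather than the Oozie and Hive tooling the talk mentions. The weekly labels stand in for "Time Periods" and are invented, as are `migrate_partitions` and the simulated failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_partitions(partitions, migrate_one, max_workers=4):
    """Migrate partitions concurrently; one failure does not stop the others."""
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(migrate_one, p): p for p in partitions}
        for future in as_completed(futures):
            partition = futures[future]
            try:
                future.result()
                done.append(partition)      # progress is tracked per partition
            except Exception as exc:
                failed[partition] = exc     # failure is confined to one partition
    return sorted(done), failed


def fake_migrate(period):
    # Placeholder for the real copy step; one period fails on purpose.
    if period == "2013-W02":
        raise RuntimeError("simulated failure")


periods = ["2013-W01", "2013-W02", "2013-W03", "2013-W04"]
done, failed = migrate_partitions(periods, fake_migrate)
```

Only the failed partitions need a rerun, so a single bad period costs one retry instead of the whole migration.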
· Limit client exposure to subtler bugs
· Incorporate customer feedback
· Demonstrate progress early
· E.g. Conductor Searchlight 3.0 Beta Program
AN INCREMENTAL RELEASE
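The slides do not say how accounts were chosen for the beta. One common approach, sketched here purely as an assumption, is to hash each account ID into a stable bucket so the same accounts stay in the beta as the rollout percentage grows. `in_beta` and the account IDs are hypothetical.

```python
import hashlib


def in_beta(account_id, rollout_percent):
    # Hash the account ID into a stable bucket in the range 0..99.
    digest = hashlib.sha256(str(account_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    # Accounts whose bucket falls below the threshold read from the new system.
    return bucket < rollout_percent


accounts = range(1, 1001)
beta_at_10 = {a for a in accounts if in_beta(a, 10)}
beta_at_50 = {a for a in accounts if in_beta(a, 50)}
```

Raising the percentage only ever adds accounts, so early beta users are not bounced back to the legacy system between releases.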
YOU CAN DO IT!
QUESTIONS?
Thanks for Listening!
(We’re Hiring!)