A Case Study of NoSQL Adoption: What Drove Wordnik Non-Relational?

Why Wordnik went Non-Relational

Tony Tam, @fehguy

What this Talk is About

•5 key reasons why Wordnik migrated to a non-relational database

•Process for selection, migration

•Optimizations and tips from living survivors of the battlefield

Why Should You Care?

•MongoDB user for 2 years

•Lessons learned, analysis, benefits from process

•We migrated from MySQL to MongoDB with no downtime

•We have interesting/challenging data needs, likely relevant to you

More on Wordnik

•World’s fastest updating English dictionary

• Based on input of text up to 8k words/second

• Word Graph as basis to our analysis

• Synchronous & asynchronous processing

•Tens of billions of documents in non-relational (NR) storage

•20M daily REST API calls, billions served

• Powered by Swagger OSS API framework

• See swagger.wordnik.com

Architectural History

•2008: Wordnik was born as a LAMP stack on AWS EC2

•2009: Introduced public REST API, powered wordnik.com, partner APIs

•2009: Drank the NoSQL Kool-Aid

•2010: Scala

•2011: Micro SOA

Non-relational by Necessity

•Moved to NR because of “4S”

• Speed

• Stability

• Scaling

• Simplicity

•But…

• MySQL can go a LONG way

• Takes right team, right reasons (+ patience)

• NR offerings simply too compelling to focus on scaling MySQL

Wordnik’s 5 Whys for NoSQL

Why #1: Speed bumps with MySQL

•Inserting data fast (50k recs/second) caused MySQL mayhem

• Maintaining indexes largely to blame

• Operations for consistency unnecessary but “cannot be turned off”

•Devised twisted schemes to avoid client blocking

• Aka the “master/slave tango”

Why #2: Retrieval Complexity

•Objects typically mapped to tables

• Object Hierarchy always => inner + outer joins

•Lots of static data, so why join?

• “Noun” is not getting renamed in my code’s lifetime!

• Knowledge like this probably already lives in application logic

•Since storage is cheap

• I’ll choose speed


One definition = 10+ joins

50 requests per second!


•Embedding objects in rows “sort of works”

• Filtering gets really nasty

• Native XML in MySQL?

• If a full table-scan is OK…

•OK, then cache it!

• Layers of caching introduced layers of complexity

• Stale data/corruption

• Object versionitis

• Cache stampedes
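To make the join cost concrete, here is a minimal sketch in plain JavaScript (the language of the mongo shell). The table names, field names, and sample data are hypothetical, not Wordnik's schema, and the arrays stand in for real tables and collections. It contrasts re-joining static lookup data on every read with copying that data into the document once, at write time.

```javascript
// Hypothetical normalized rows, as they might sit in three relational tables.
const words = [{ id: 1, text: "cat" }];
const partsOfSpeech = [{ id: 10, name: "noun" }];
const definitions = [
  { id: 100, wordId: 1, posId: 10, text: "A small domesticated feline." },
];

// Relational style: every read re-joins the static part-of-speech lookup.
function definitionViaJoins(wordText) {
  const word = words.find((w) => w.text === wordText);
  return definitions
    .filter((d) => d.wordId === word.id)
    .map((d) => ({
      word: word.text,
      partOfSpeech: partsOfSpeech.find((p) => p.id === d.posId).name,
      text: d.text,
    }));
}

// Document style: the static value is embedded, trading storage for speed.
const dictionaryEntries = [
  {
    word: "cat",
    definitions: [
      { partOfSpeech: "noun", text: "A small domesticated feline." },
    ],
  },
];

function definitionViaDocument(wordText) {
  const entry = dictionaryEntries.find((e) => e.word === wordText);
  return entry.definitions.map((d) => ({ word: entry.word, ...d }));
}

console.log(definitionViaJoins("cat"));
console.log(definitionViaDocument("cat"));
```

Both functions return the same shape; the second one never touches a lookup table, which is the "storage is cheap, choose speed" trade described above.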

Why #3: Object Modeling

•Object models being compromised for the sake of persistence

• This is backwards!

• Extra abstraction for the wrong reason

•OK, then performance suffers

• In-application joins across objects

• “Who ran the fetch-all query against production?!” (any sysadmin)

•“My zillionth ORM layer that only I understand” (and can maintain)

Why #4: Scaling

•Needed "cloud friendly storage"

• Easy up, easy down!

• Startup: Sync your data, and announce to clients when ready for business

• Shutdown: Announce your departure and leave

•Adding MySQL instances was a dance

• Snapshot + bin files

mysql> CHANGE MASTER TO
    ->   MASTER_HOST='db1',
    ->   MASTER_USER='xxx',
    ->   MASTER_PASSWORD='xxx',
    ->   MASTER_LOG_FILE='master-relay.000431',
    ->   MASTER_LOG_POS=1035435402;

Why #4: Scaling

•What about those VMs?

• So convenient! But… they kind of suck

• Can the database succeed on a VM?

•VM Performance:

• Memory, CPU or I/O—Pick only one

• Can your database really reduce CPU or disk I/O with lots of RAM?

Why #5: Big Picture

•BI tools use relational constraints for discovery

• Is this the right reason for them?

• Can we work around this?

• Let’s have a BI tool revolution, too!

•True service architecture makes relational constraints impractical/impossible

•Distributed sharding makes relational constraints impractical/impossible

Why #5: Big Picture

•Is your app smarter than your database?

• The logic line is probably blurry!

•What does count(*) really mean when you add 5k records/sec?

• Maybe eventual consistency is not so bad…

•2PC? Do some reading and decide!

http://eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf

Ok, I’m in!

•I thought deciding was easy!?

• Many quickly maturing products

• Divergent features tackle different needs

•Wordnik spent 8 weeks researching and testing NoSQL solutions

• This is a long time! (for a startup)

• Wrote ODM classes and migrated our data

•Surprise! There were surprises

• Be prepared to compromise

Choice Made, Now What?

•We went with MongoDB

• Fastest to implement

• Most reliable

• Best community

•Why?

• Why #1: Fast loading/retrieval

• Why #2: Fast ODM (50 tps => 1000 tps!)

• Why #3: Document Models === Object models

• Why #4: Memory-mapped files (MMF) => kernel-managed memory, plus replica sets (RS)

• Why #5: It’s 2011, is there no progress?

More on Why MongoDB

•Testing, testing, testing

• Used our migration tools to load test

• Read from MySQL, write to MongoDB

• We loaded 5+ billion documents, many times over

•In the end, one server could…

• Insert 100k records/sec sustained

• Read 250k records/sec sustained

• Support concurrent loading/reading

Migration & Testing

•Iterated ODM mapping multiple times

• Some issues

• Type safety:

cur.next.get("iWasAnIntOnce").asInstanceOf[Long]

• Dates as strings:

obj.put("a_date", "2011-12-31") != obj.put("a_date", new Date("2011-12-31"))

• Storage size:

obj.put("very_long_field_name", true) >> obj.put("vsfn", true)
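The date and storage-size issues can be illustrated with a small sketch in plain JavaScript. It is not the original Scala mapping code; the sample values are made up, and JSON length is used only as a rough proxy for stored size (an assumption, since the real on-disk format is BSON).

```javascript
// Dates as strings: comparisons are lexicographic, so a stray format
// silently gives wrong range results.
const storedAsString = "12/31/2011"; // hypothetical value written by one loader
const stringSaysBefore2011 = storedAsString < "2011-01-01"; // compares "1" with "2"

// The same day stored as a real Date compares chronologically.
const storedAsDate = new Date(Date.UTC(2011, 11, 31)); // month is zero-based
const dateSaysBefore2011 = storedAsDate < new Date(Date.UTC(2011, 0, 1));

// Storage size: field names are stored inside every document, so the
// name length is paid once per document. JSON length is only a proxy.
function approxSize(doc) {
  return JSON.stringify(doc).length;
}

const bytesSavedPerDoc =
  approxSize({ very_long_field_name: true }) - approxSize({ vsfn: true });

// Scale the per-document saving to a collection of `docCount` documents.
function bytesSaved(docCount) {
  return bytesSavedPerDoc * docCount;
}

console.log({ stringSaysBefore2011, dateSaysBefore2011, bytesSavedPerDoc });
```

The string comparison claims December 31, 2011 falls before the start of 2011, while the Date comparison does not. On the size side, the shorter name saves 16 characters per document in this proxy, which adds up quickly at billions of documents.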

Migration & Testing

•Expect data model iterations

• Wordnik migrated each table to a Mongo collection “as-is”

• Easier to migrate, test

• _id field used the same value as the MySQL PK

• Auto Increment?

• Used MySQL to “check-out” sequences

• One row per mongo collection

• Run out of sequences => get more

• Need exclusive locks here!
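The "as-is" mapping can be sketched in a few lines of plain JavaScript. This is an illustration only: `rowToDocument` and the sample row are hypothetical names, not part of Wordnik's migration tools. Each row becomes one document, and the primary key column is renamed to `_id` so existing ids keep working.

```javascript
// Convert one relational row (a flat object) into a document whose _id
// is the row's primary key. All other columns are copied unchanged.
function rowToDocument(row, pkColumn) {
  const { [pkColumn]: primaryKey, ...otherColumns } = row;
  return { _id: primaryKey, ...otherColumns };
}

// Hypothetical row, as a MySQL driver might return it.
const sampleRow = { id: 18727353, word: "cat", source: "cmu" };

console.log(rowToDocument(sampleRow, "id"));
```

Keeping the same key on both sides also makes it easy to compare a row with its migrated document during testing.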

Migration & Testing

•Sequence generator in-process

SequenceGenerator.checkout("doc_metadata,100")

•Sequence generator as web service

• Centralized UID management
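The block check-out idea can be sketched as follows, in plain JavaScript. A `Map` stands in for the MySQL sequence table, and `checkoutBlock` and `LocalSequence` are hypothetical names rather than the API of the generator shown above. Each process reserves a block of ids in one step, hands them out locally, and only goes back to the central store when the block runs out.

```javascript
// Stand-in for the central sequence table: one entry per collection.
// Against a real database this read-modify-write needs an exclusive lock.
const sequenceTable = new Map();

// Reserve `count` consecutive ids for `name`; returns the inclusive range.
function checkoutBlock(name, count) {
  const first = sequenceTable.has(name) ? sequenceTable.get(name) : 1;
  sequenceTable.set(name, first + count);
  return { first, last: first + count - 1 };
}

// In-process generator: serves ids from its block, refills when empty.
class LocalSequence {
  constructor(name, blockSize) {
    this.name = name;
    this.blockSize = blockSize;
    this.next = 1;
    this.last = 0; // empty block forces a check-out on first use
  }

  nextId() {
    if (this.next > this.last) {
      const block = checkoutBlock(this.name, this.blockSize);
      this.next = block.first;
      this.last = block.last;
    }
    return this.next++;
  }
}

// Two "processes" sharing one sequence never hand out the same id.
const loaderA = new LocalSequence("doc_metadata", 100);
const loaderB = new LocalSequence("doc_metadata", 100);
console.log(loaderA.nextId(), loaderB.nextId(), loaderA.nextId());
```

The trade-off is that ids are unique but not strictly ordered across processes, and unused ids in a block are lost if a process exits early.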

Migration & Testing

•Expect data access pattern iterations

• So much more flexibility!

• Reach into objects:

> db.dictionary_entry.find({"hdr.sr":"cmu"})

• Access to a whole object tree at query time

• Overwrite a whole object at once… when desired

• Not always! This clobbers the whole record:

> db.foo.save({_id:18727353,foo:"bar"})

• Update a single field:

> db.foo.update({_id:18727353},{$set:{foo:"bar"}})
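The difference between replacing a document and setting one field can be shown without a running database. The sketch below is plain JavaScript with a `Map` standing in for the collection; `getPath`, `save`, and `updateSet` are hypothetical helpers that only imitate the behavior of the shell commands above, and the sample document is made up.

```javascript
// Toy stand-in for a collection, keyed by _id.
const fooCollection = new Map();
fooCollection.set(18727353, { _id: 18727353, foo: "old", hdr: { sr: "cmu" } });

// Resolve a dotted path such as "hdr.sr" against a nested document.
function getPath(doc, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// Imitates save(): the stored document is replaced wholesale.
function save(coll, doc) {
  coll.set(doc._id, { ...doc });
}

// Imitates update(query, {$set: fields}): only the named fields change.
function updateSet(coll, id, fields) {
  const existing = coll.get(id);
  if (existing !== undefined) coll.set(id, { ...existing, ...fields });
}

updateSet(fooCollection, 18727353, { foo: "bar" });
const afterSet = getPath(fooCollection.get(18727353), "hdr.sr"); // nested data survives

save(fooCollection, { _id: 18727353, foo: "bar" });
const afterSave = getPath(fooCollection.get(18727353), "hdr.sr"); // nested data is gone

console.log({ afterSet, afterSave });
```

After the field-level update the nested `hdr.sr` value is still present; after the whole-document save it is not, because the replacement document never contained it.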

Flip the Switch

•Migrate production with zero downtime

• We temporarily halted loading data

• Added a switch to flip between MySQL/MongoDB

• Instrument, monitor, flip it, analyze, flip back

•Profiling your code is key

• What is slow?

• Build this in your app from day 1

Flip the Switch

•Storage selected at runtime

val h = shouldUseMongoDb match {
  case true => new MongoDbSentenceDAO
  case _ => new MySQLDbSentenceDAO
}
h.find(...)

•Hot-swappable storage via configuration

• It worked!
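The "instrument, flip, analyze, flip back" loop can be sketched like this in plain JavaScript. It is not the Scala code from the slide: the DAO classes are stubs, and `config`, `timings`, and `timedFind` are hypothetical names. The point is that every call goes through one switch and records which backend served it and how long it took.

```javascript
// Stub DAOs sharing one interface; real ones would query a database.
class MySqlSentenceDao {
  find(word) {
    return { source: "mysql", word };
  }
}

class MongoDbSentenceDao {
  find(word) {
    return { source: "mongodb", word };
  }
}

// The runtime switch; in practice this would come from configuration.
const config = { useMongoDb: false };

// Per-backend latency samples in milliseconds (Date.now() is coarse).
const timings = { mysql: [], mongodb: [] };

function sentenceDao() {
  return config.useMongoDb ? new MongoDbSentenceDao() : new MySqlSentenceDao();
}

// Route the call through the switch and record its duration.
function timedFind(word) {
  const startedAt = Date.now();
  const result = sentenceDao().find(word);
  timings[result.source].push(Date.now() - startedAt);
  return result;
}

timedFind("cat");          // served by MySQL
config.useMongoDb = true;  // flip
timedFind("cat");          // served by MongoDB
config.useMongoDb = false; // flip back if the numbers look wrong
```

Because the switch is read on every call, flipping it needs no restart, and the two timing arrays can be compared side by side.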

Then What?

•Watched our deployment, made many iterations to the mapping layer

• Settled on in-house, type-safe mapper https://github.com/fehguy/mongodb-benchmark-tools

•Some gotchas (of course)

• Locking issues on long-running updates (more in a minute)

•We want more of this!

• Migrated shared files to Mongo GridFS

• Easy-IT

Performance + Optimization

•Loading data is fast!

• Fixed collection padding, similarly-sized records

• Tail of collection is always in memory

• Append faster than MySQL in every case tested

•But... random access started getting slow

• Indexes in RAM? Yes

• Data in RAM? No, > 2TB per server

• Limited by disk I/O and seek performance

• EC2 + EBS for storage?


•Moved to physical data center

• DAS & 72GB RAM => great uncached performance

•Good move? Depends on use case

• If “access anything anytime”, not many options

• You want to support this?


•Inserts are fast, how about updates?

• Well… update => find object, update it, save

• Lock acquired at “find”, released after “save”

• If hitting disk, lock time could be large

•Easy answer, pre-fetch on update

• Oh, and NEVER do “update all records” against a large collection
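The pre-fetch trick can be modeled with a toy collection in plain JavaScript. Everything here is a simplification under stated assumptions: `ToyCollection` is hypothetical, a document is either "hot" (in RAM) or "cold" (on disk), and a cold update counts as one disk seek performed while the write lock is held. Reading the document first pays that seek outside the lock.

```javascript
// Toy model: tracks which documents are in memory and how many updates
// had to fetch a cold document while (conceptually) holding the lock.
class ToyCollection {
  constructor(docs) {
    this.docs = new Map(docs.map((d) => [d._id, d]));
    this.hot = new Set();
    this.coldUpdates = 0;
  }

  // A read pages the document in without taking the write lock.
  findOne(id) {
    this.hot.add(id);
    return this.docs.get(id);
  }

  // An update on a cold document seeks to disk with the lock held.
  update(id, fields) {
    if (!this.hot.has(id)) this.coldUpdates += 1;
    this.hot.add(id);
    Object.assign(this.docs.get(id), fields);
  }
}

// Pre-fetch on update: warm the document first, then modify it.
function prefetchThenUpdate(coll, id, fields) {
  coll.findOne(id);
  coll.update(id, fields);
}

const sentences = new ToyCollection([
  { _id: 1, views: 0 },
  { _id: 2, views: 0 },
]);

sentences.update(1, { views: 1 });             // cold: seek under the lock
prefetchThenUpdate(sentences, 2, { views: 1 }); // warm: no seek under the lock
console.log(sentences.coldUpdates);
```

The total work is the same in both cases; what changes is whether other writers are blocked while the disk seek happens.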


•Indexes

• Can't always keep indexes in RAM. MMF "does its thing"

• Right-balanced B-tree keeps the necessary part of the index hot

• Indexes hit disk => mute your pager

More Mongo, Please!

•We modeled our word graph in mongo

• 50M nodes

• 80M edges

• 80 ms edge fetch

More Mongo, Please!

•Analytics rolled-up from aggregation jobs

• Send to Hadoop, load to mongo for fast access

What’s next

•Liberate our models

• Stop worrying about how to store them (for the most part)

•New features almost always NR

•Some MySQL left

• Less with each release

Questions?

• See more about Wordnik APIs: http://developer.wordnik.com

• Migrating from MySQL to MongoDB: http://www.slideshare.net/fehguy/migrating-from-mysql-to-mongodb-at-wordnik

• Maintaining your MongoDB Installation: http://www.slideshare.net/fehguy/mongo-sv-tony-tam

• Swagger API Framework: http://swagger.wordnik.com

• Mapping Benchmark: https://github.com/fehguy/mongodb-benchmark-tools

• Wordnik OSS Tools: https://github.com/wordnik/wordnik-oss
