MongoDB Use Cases
Healthcare, CMS, Analytics
Thomas O’Rourke
Upstream Innovations Ltd.
Oulu / Seattle
www.dashwire.com
Dashwire Dashconfig
• Users configure their mobile phones on a PC.
  o Email accounts, wallpapers, ringtones, bookmarks, contacts, etc.
  o Generates a lot of data!
• Wanted: Google Analytics + Splunk + BI.
  o Sensitive data:
    • Can’t send out => No Google Analytics.
  o Many sources
    • (Server log files, SQS, Web analytics, etc.)
  o Internal error reports &
    • UI issues (powerful paradigm)
  o Real time vs. Reports/Enterprise
    • ~500,000 events a day
  o Store for a year
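As a rough sizing check on the figures above, the following sketch works out what "~500,000 events a day, stored for a year" implies. The constant names are illustrative and not taken from the original system.

```ruby
# Back-of-the-envelope sizing from the slide's figures.
EVENTS_PER_DAY  = 500_000        # "~500,000 events a day"
RETENTION_DAYS  = 365            # "store for a year"
SECONDS_PER_DAY = 24 * 60 * 60

events_per_year    = EVENTS_PER_DAY * RETENTION_DAYS
avg_events_per_sec = EVENTS_PER_DAY.to_f / SECONDS_PER_DAY

puts "events retained per year: #{events_per_year}"
puts "average events per second: #{avg_events_per_sec.round(1)}"
```

That is roughly 180 million retained events and an average of about 6 events per second. Real traffic is bursty, so peak rates will be higher than this average.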
Solution
• Eco-system in Mongo
  o Evolved
• Layered architecture
  o L1. Store - De-duplication.
    • Streaming live (syslog)
    • Playback of log files
  o L2. Parsing into key/value pairs.
  o L3. Processing.
  o L4. Reports.
• Trade-offs for real-time
  o Reconciler
  o Trade-offs for real time and offline
Tools
• MongoDB
• Ruby
• Sinatra
• Ruby driver
  o (Connection pooling, multithreaded, replica set support)
• EventMachine + em-mongo
• ZeroMQ
• Sinatra/Rack/Thin
• Mixpanel
• Server Density
• Excel
• Highcharts
• SoftLayer
Integrity Checks
Once a day
Eco system
Parsing logs
"2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- {\"analytics\":{\"scenario\":\"three\",\"initial scenario\":\"three\",\"phone\":\"Cool Phone\",\"name\":\"Facebook\",\"time\":\"2012-08-17 18:08:11.399 UTC\",\"event\":\"Bookmark Added\",\"browser_tracking_id\":\"857b307a4d1xxxxx08ebca70f6\",\"browser_time\":\"2012-08-17 18:08:14.794 UTC\",\"browser_event\":1,\"session_id\":\"68528379d5xxxxxxxcda27fd625fe\"}}"
{ scenario: "three", phone : "Cool Phone", event : "Bookmark Added", session_id : ... }
JSON.parse( )
Collection = Event_Bookmark_Added
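The parsing step can be sketched as follows. This is a minimal illustration, not the original code: the helper name `parse_analytics_line` is hypothetical, and it assumes the JSON payload starts at the first `{` in the line and that the collection name is `Event_` plus the event name with spaces replaced by underscores.

```ruby
require 'json'

# Hypothetical helper: find the JSON payload in a syslog line, parse it,
# and derive the target collection name from the event name.
def parse_analytics_line(line)
  json_start = line.index('{')
  return nil unless json_start
  payload = JSON.parse(line[json_start..-1])
  event = payload['analytics']
  return nil unless event && event['event']
  collection = 'Event_' + event['event'].gsub(/\s+/, '_')
  [collection, event]
rescue JSON::ParserError
  nil   # not a well-formed analytics line; skip it
end

line = '2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- ' \
       '{"analytics":{"scenario":"three","phone":"Cool Phone",' \
       '"event":"Bookmark Added"}}'

collection, event = parse_analytics_line(line)
puts collection        # Event_Bookmark_Added
puts event['phone']    # Cool Phone
```

Returning nil for lines without a valid payload lets the caller skip ordinary server log lines that share the same stream.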
De-duplication
• Compound unique index (multiple keys)
  o Integers perform well
    • MD5 of entire log line as string (only use half of result)
    • Unix time stamp (seconds)
    • Fraction of second (if one is present)
    • Better to use milliseconds, but not required
# Unique compound index: a replayed log line with the same key is rejected.
@collections[collection].create_index(
  [ [:ts,      Mongo::ASCENDING],
    [:ts_frac, Mongo::ASCENDING],
    [:dhash,   Mongo::ASCENDING] ],
  { :unique => true, :drop_dups => true } )
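A sketch of how the three key fields might be derived from a raw log line. The helper name `dedup_key` is hypothetical. It assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, treats that timestamp as UTC (the slides do not state the time zone, and for de-duplication only consistency matters), stores the fraction as milliseconds, and interprets "half of result" as the first 16 hex characters of the MD5 digest.

```ruby
require 'digest/md5'

# Hypothetical helper: build the de-duplication key fields for one log line.
def dedup_key(line)
  m = line.match(/\A(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?/)
  raise ArgumentError, 'no leading timestamp' unless m
  y, mo, d, h, mi, s = m.captures[0, 6].map(&:to_i)
  {
    :ts      => Time.utc(y, mo, d, h, mi, s).to_i,       # whole seconds
    :ts_frac => m[7] ? (m[7] + '000')[0, 3].to_i : 0,    # milliseconds, 0 if absent
    :dhash   => Digest::MD5.hexdigest(line)[0, 16]       # half of the 32-char digest
  }
end

line = '2012-08-17 13:08:11 app02 Passngr[20167]: I script(www-data) -- {}'
key = dedup_key(line)
puts key.inspect
puts dedup_key(line) == key   # true: a replayed line maps to the same key
```

Because the key depends only on the line itself, the same line arriving once via live syslog and again via log file playback produces the same index entry.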
Process pattern
Pre-allocate "processed : 0" at insert time (creation)
Index (no dup)
process
@collections[collection].insert( doc )
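The pattern above can be illustrated without a database. The sketch below uses a hypothetical in-memory `FakeCollection` as a stand-in for a MongoDB collection with the unique index; it is not the driver's API. With a real collection, a duplicate insert would be reported as a duplicate-key error in safe mode rather than as a `false` return value.

```ruby
# In-memory stand-in for a collection with a unique index on
# (ts, ts_frac, dhash). Hypothetical class, for illustration only.
class FakeCollection
  def initialize
    @docs = {}
  end

  # Returns true if stored, false if the unique key already exists.
  def insert(doc)
    key = [doc[:ts], doc[:ts_frac], doc[:dhash]]
    return false if @docs.key?(key)
    # Pre-allocate the flag at creation so a later update does not grow the doc.
    @docs[key] = { :processed => 0 }.merge(doc)
    true
  end

  # Hands each unprocessed document to the block, then marks it processed.
  def process_pending
    @docs.each_value do |doc|
      next unless doc[:processed] == 0
      yield doc
      doc[:processed] = 1
    end
  end
end

events = FakeCollection.new
# :ts is an arbitrary Unix time; :dhash is a placeholder value.
doc = { :ts => 1345208891, :ts_frac => 0, :dhash => 'abc', :event => 'Bookmark Added' }
puts events.insert(doc)   # true
puts events.insert(doc)   # false, duplicate rejected
seen = []
events.process_pending { |d| seen << d[:event] }
events.process_pending { |d| seen << d[:event] }   # nothing left to process
puts seen.length          # 1
```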
Reports
• Needed both Real time and Enterprise (Excel Reports)
  o We use MongoDB for both and all intermediate tables
• Reports
  o Map/Reduce for Reports and Graphs
  o Considered MySQL but rejected as unnecessary
  o Write Excel (*.xlsx) directly using Ruby and accessing MongoDB.
    • https://github.com/randym/axlsx
• Real-time
  o Incremental Map/Reduce is fast enough to drive real-time graphs.
    • http://www.highcharts.com
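The idea behind incremental Map/Reduce can be shown in plain Ruby, with no MongoDB involved. In this sketch the function names and the 'Contact Added' event are made up for illustration. The key property is that the reduce step can be applied to its own earlier output, so a new batch is folded into existing totals instead of recomputing everything.

```ruby
# map emits [key, value] pairs for one event.
def map_event(event)
  [[event[:event], 1]]
end

# reduce sums values; it also works on previously reduced totals (re-reduce).
def reduce_counts(values)
  values.inject(0) { |sum, v| sum + v }
end

# Folds a new batch into the existing totals.
def incremental_map_reduce(totals, batch)
  emitted = Hash.new { |h, k| h[k] = [] }
  batch.each { |e| map_event(e).each { |k, v| emitted[k] << v } }
  emitted.each do |key, values|
    partial = reduce_counts(values)
    totals[key] = totals.key?(key) ? reduce_counts([totals[key], partial]) : partial
  end
  totals
end

totals = {}
incremental_map_reduce(totals, [{ :event => 'Bookmark Added' }, { :event => 'Contact Added' }])
incremental_map_reduce(totals, [{ :event => 'Bookmark Added' }])
puts totals.inspect
```

In MongoDB itself this presumably corresponds to running mapReduce over only the new documents and merging into an existing output collection with the `reduce` output mode.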
Server Density
PART 2
Technical Discussion
• Performance
• Durability
• Replica sets
• Maintenance
• Transactions
• Drivers and Languages
• Demos
Performance
• ~3000 inserts a second for unsafe mode.
• < 1000 for safe mode.
• Indexes = memory.
• Use slaves when possible for reads (note: consistency)
• Your driver makes a HUGE difference.
• Pre-allocate for updates!
• Safe mode is much slower
  o Not everything is required to be 100% safe
  o Not everything is unsafe.
  o Think! ARCHITECT your durability where you need it!
Durability
SAFE/SLOWER
FAST/UNSAFE
Replica set uses
• Redundancy
  o Data is at multiple nodes
  o A node kept n seconds behind is an ‘ass’ saver (it’s very easy to accidentally drop a collection!)
• Failover
  o Sleep at night
• Maintenance
  o Backup slaves
  o Build indexes on slaves and promote them
• Load balancing
  o Reads on slaves
@collection.insert(doc, :safe => { :w => "majority" } )
Journal + replicate: the journal applies only to the primary, but it guarantees that the rollback data will be available if the primary fails before replication.
Maintenance
• Backup/Maintenance
  o Backup by stopping slave, copy files, start slave
    • /data/*
    • Can be copied and backed up and compressed
    • Compression is high! (Can be 70%!) because field names are stored uncompressed in every document
  o mongodump/mongorestore (BSON export and import) can be run while the database is running
  o Server Density
    • Node health
    • Slave lag - time behind
    • Index size
    • Etc.
Transactions
• findAndModify().
  o Atomically updates a document and returns it in the same operation
• Upserts and indexes.
• Plan for failure; don’t assume transactions.
Driver and language
• Driver and Language
  o Use a dynamic language! Ruby, Python, etc.
  o Driver support for replica sets and connection pooling preferred.
  o A simple ORM/Mapper, etc. works great.
    • Mongoid
    • MongoMapper
    • Or even just plain driver (Mongo Ruby driver)
  o Learn JavaScript!
    • Shell JavaScript commands and Ruby driver methods are very similar
      o findOne vs find_one
    • Map/Reduce is always JavaScript
    • Everything is a Map/Reduce: get used to it.
    • (It’s not difficult for these purposes!)
Demos
• https://github.com/tomjoro/mongo_browser
  o jQuery tree view
  o Sinatra
  o Mongo
• Cool
  o Integrating R with MongoDB
  o Highcharts
• Contact information:
  o http://www.linkedin.com/in/tomjoro