Page 1: Breaking the oracle tie

Breaking the Oracle tie; High Performance OLTP and analytics using MongoDB

Alexandros Giamas
Senior Software Engineer

Page 2: Breaking the oracle tie

Persado: Proven Value - Worldwide

Skype

• $500 million incremental revenue
• 30+ premium brands
• 20+ worldwide languages
• 40+ countries
• 500M+ engaged consumers
• 100% average conversion lift

Page 3: Breaking the oracle tie

Can you afford to leave half the opportunity on the table?

You won't believe it
Pick an Online Number!
Why you'll love your Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
2.07%

You won't believe it
Pick an Online Number!
Why get an Online Number:
1. Your friends without VoIP can call you
2. You answer on VoIP
3. You also have voicemail included
I like that!
1.42%

You won't believe it
They dial, you answer on VoIP!
Why you'll love your Online Number:
1. Family & friends without VoIP can call you
2. You answer on VoIP
3. And you can use it from anywhere in the world
I like that!
1.11%

…another 16 million+ combinations

Page 4: Breaking the oracle tie

The Marketing Communication Suite

We generate the marketing messages that work best.
For any customer, any product, at any time.

Page 5: Breaking the oracle tie

Persado History

Oracle shop

Page 6: Breaking the oracle tie

Persado History

Page 7: Breaking the oracle tie

Persado History

• Exponentially growing dataset
• Data value/KB?

Page 8: Breaking the oracle tie

Persado History

Not anymore...

Page 9: Breaking the oracle tie

Persado History

Transactional Data and Analytics

Page 10: Breaking the oracle tie

Transaction (Re)-defined

Social, Mobile, Email, Web, Display, Search

Which one stands out?

Page 11: Breaking the oracle tie

Conversational and Transactional Properties
Web based channels

Page 12: Breaking the oracle tie

Mobile Text Messaging

Conversational and Transactional Properties

Page 13: Breaking the oracle tie

Flexi-structured data

One User across campaigns and mediums

{
  "_id" : ObjectId("511e3cbea9f1fd01fbd51c67"),
  "domain" : DBRef("Domain", NumberLong(3)),
  "locale" : "en",
  "msisdn" : "59210000000",
  "email" : "[email protected]",
  "mobileInfo" : { "State" : "CA" },
  "emailInfo" : { "referral" : "www.google.com" },
  "expclk" : { "h" : 0.05, "d" : 0.02 }
}
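Because channel-specific sub-documents such as mobileInfo and emailInfo are only present when they apply, code that reads a user has to tolerate missing fields. A minimal sketch in plain JavaScript follows; the helper name channelsOf, the mapping of fields to channel names, and the placeholder email address are assumptions for illustration, not part of the original design.

```javascript
// Hypothetical helper: list the channels a flexi-structured user document has data for.
// Field names (msisdn, email, mobileInfo, emailInfo) come from the sample document;
// which fields imply which channel is an assumption made for this sketch.
function channelsOf(user) {
  const channels = [];
  if (user.msisdn || user.mobileInfo) channels.push("mobile");
  if (user.email || user.emailInfo) channels.push("email");
  return channels;
}

// A user known only through mobile campaigns.
const mobileOnly = { locale: "en", msisdn: "59210000000", mobileInfo: { State: "CA" } };

// The same user after an email campaign added email fields (placeholder address).
const both = {
  ...mobileOnly,
  email: "user@example.com",
  emailInfo: { referral: "www.google.com" },
};

console.log(channelsOf(mobileOnly)); // [ 'mobile' ]
console.log(channelsOf(both));       // [ 'mobile', 'email' ]
```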

Page 14: Breaking the oracle tie

Overall Architecture - Data flow

Page 15: Breaking the oracle tie

Sizing transactional data

☛ User Terminated data
☛ User Originated data
☛ Metadata (state for User per campaign and globally)
☛ Must hold data in memory, or at least indexes
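The memory requirement can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration: every figure in it is a made-up placeholder, and fitsInMemory is a hypothetical helper. On a real deployment the inputs would more likely come from collection statistics such as count, avgObjSize and totalIndexSize.

```javascript
// Rough working-set check: does the data, or at least the indexes, fit in RAM?
// All numbers passed in below are hypothetical placeholders, not figures from the talk.
function fitsInMemory({ docCount, avgDocBytes, indexBytes, ramBytes }) {
  const dataBytes = docCount * avgDocBytes;
  return {
    dataBytes,
    dataAndIndexesFit: dataBytes + indexBytes <= ramBytes,
    indexesFit: indexBytes <= ramBytes,
  };
}

const GiB = 1024 ** 3;

const estimate = fitsInMemory({
  docCount: 100e6,     // hypothetical: 100 million documents
  avgDocBytes: 512,    // hypothetical average document size
  indexBytes: 6 * GiB, // hypothetical total index size
  ramBytes: 32 * GiB,  // hypothetical RAM available to mongod
});

// With these placeholders the data alone is about 47.7 GiB, so only the indexes fit.
console.log(estimate);
```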

Page 16: Breaking the oracle tie

ETL for OLAP

Offline / Online processing
• Going online is mostly simpler
• Offline must take into account data irregularities (data validation policy driven by business needs)

Page 17: Breaking the oracle tie

ETL for OLAP

☛ Custom data transformation
☛ Custom “continueOnError” implementation
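One way to picture a custom “continueOnError” pass is a loop that transforms each record, sets aside the ones that fail validation, and keeps going instead of aborting the batch. The sketch below is plain JavaScript under stated assumptions: transform, runEtl and the record shape are hypothetical and not taken from the original ETL code.

```javascript
// Hypothetical transformation: reject records without an msisdn, default the locale.
function transform(record) {
  if (typeof record.msisdn !== "string" || record.msisdn.length === 0) {
    throw new Error("missing msisdn");
  }
  return { msisdn: record.msisdn, locale: record.locale || "en" };
}

// "Continue on error": a bad record is recorded with its reason and skipped,
// so one irregular row does not stop the rest of the batch from loading.
function runEtl(records, transformFn) {
  const loaded = [];
  const rejected = [];
  for (const record of records) {
    try {
      loaded.push(transformFn(record));
    } catch (err) {
      rejected.push({ record, reason: err.message });
    }
  }
  return { loaded, rejected };
}

const result = runEtl(
  [{ msisdn: "59210000000", locale: "en" }, { locale: "es" }, { msisdn: "59210000001" }],
  transform
);

console.log(result.loaded.length, result.rejected.length); // 2 1
```

Keeping the rejected records together with a reason makes it possible to apply the business-driven validation policy afterwards rather than inside the loop.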

Page 18: Breaking the oracle tie

Analytics

First cut: custom server-side JavaScript using $where
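A $where clause takes a JavaScript function that the server evaluates once per document, with `this` bound to that document. The sketch below emulates that locally so it can run without a database; the predicate, the field names sentAt and repliedAt, and the collection name in the comment are hypothetical.

```javascript
// Hypothetical $where predicate: did the user reply within one day of the message?
// Subtracting two Date objects yields the difference in milliseconds.
function respondedWithinOneDay() {
  return this.repliedAt - this.sentAt <= 24 * 60 * 60 * 1000;
}

// In the mongo shell this could be passed as (collection name is an assumption):
//   db.messages.find({ $where: respondedWithinOneDay })

// Local emulation: call the predicate with `this` bound to each sample document.
const sample = [
  { sentAt: new Date("2013-02-15T10:00:00Z"), repliedAt: new Date("2013-02-15T12:00:00Z") },
  { sentAt: new Date("2013-02-15T10:00:00Z"), repliedAt: new Date("2013-02-17T10:00:00Z") },
];

const matches = sample.filter((doc) => respondedWithinOneDay.call(doc));
console.log(matches.length); // 1
```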

Page 19: Breaking the oracle tie

Analytics

GWL: Global Write Lock

Page 20: Breaking the oracle tie

Analytics In the real world

Page 21: Breaking the oracle tie

Your own mini transactions

Break down Spring Batch steps into idempotent and non-idempotent ones
• For idempotent steps, just replay them
• For non-idempotent steps, replace the current state with the last known good state from before the latest Spring Batch step invocation (undo log) and retry the step
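The replay and undo-log idea can be sketched in a few lines. This is plain JavaScript rather than Spring Batch, and the step shape ({ idempotent, run }), runStep and the in-memory state object are all hypothetical; in practice the state would live in the database.

```javascript
// Deep copy used to snapshot state; adequate for plain JSON-like data only.
function deepCopy(value) {
  return JSON.parse(JSON.stringify(value));
}

// Run one step with retries. Idempotent steps are simply replayed.
// Non-idempotent steps first restore the last known good state (the "undo log").
function runStep(state, step, maxAttempts = 3) {
  const lastKnownGood = step.idempotent ? null : deepCopy(state);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      step.run(state);
      return attempt; // number of attempts it took
    } catch (err) {
      if (!step.idempotent) {
        // Roll back any partial changes before retrying.
        Object.keys(state).forEach((key) => delete state[key]);
        Object.assign(state, deepCopy(lastKnownGood));
      }
      if (attempt === maxAttempts) throw err;
    }
  }
}

// Demo: a non-idempotent step that changes state and then fails once.
let failuresLeft = 1;
const incrementSent = {
  idempotent: false,
  run(state) {
    state.sent += 1; // partial change made before the failure
    if (failuresLeft-- > 0) throw new Error("simulated failure");
  },
};

const state = { sent: 10 };
const attempts = runStep(state, incrementSent);
console.log(attempts, state.sent); // 2 11
```

Without the rollback, the retried step would have counted the same message twice and ended at 12.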

Page 22: Breaking the oracle tie

Your own mini transactions
Issues
• 16MB document size limit...
• Slow to replay
• Hard to test using Selenium

Page 23: Breaking the oracle tie

Analytics In the real world

Map Reduce Implementation

Page 24: Breaking the oracle tie

Analytics In the real world

Caching layers
✓ Caching in collections

Page 25: Breaking the oracle tie

Analytics In the real world

Caching layers
✓ Caching in ehcache

Page 26: Breaking the oracle tie

Analytics using the Aggregation Framework

{$project: {
    "rdd": {
        $isoDate: {
            year: {$year: "$_id.receivedDateHour"},
            month: {$month: "$_id.receivedDateHour"},
            dayOfMonth: {$dayOfMonth: "$_id.receivedDateHour"},
            hour: {$hour: "$_id.receivedDateHour"}
        }
    },
    "value.diffDaysSum.0": 1,
    "value.diffDaysSum.1": 1,
    "value.diffDaysSum.2": 1
} },
{$project: {
    rdd: 1,
    diffDaysSum: {$add: ["$value.diffDaysSum.0", "$value.diffDaysSum.1", "$value.diffDaysSum.2"]}
} },
{$group: {
    _id: "$rdd",
    totalSumPerDay: {$sum: "$diffDaysSum"}
} }

Page 27: Breaking the oracle tie

Analytics using the Aggregation Framework

Double project phase, followed by grouping results
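To make the three stages concrete, here is a plain-JavaScript emulation on made-up sample data. It is not the aggregation framework itself: the sample documents, the assumption that diffDaysSum is indexable by 0, 1 and 2, and the use of UTC when truncating to the hour are all illustrative choices.

```javascript
// Made-up input documents shaped like the fields the pipeline reads.
const docs = [
  { _id: { receivedDateHour: new Date("2013-02-15T10:05:00Z") }, value: { diffDaysSum: [1, 2, 3, 40] } },
  { _id: { receivedDateHour: new Date("2013-02-15T10:45:00Z") }, value: { diffDaysSum: [4, 5, 6, 50] } },
  { _id: { receivedDateHour: new Date("2013-02-15T11:00:00Z") }, value: { diffDaysSum: [7, 8, 9, 60] } },
];

// First project: rebuild the date from year/month/day/hour and keep buckets 0..2.
const stage1 = docs.map((d) => {
  const t = d._id.receivedDateHour;
  return {
    rdd: new Date(Date.UTC(t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), t.getUTCHours())),
    buckets: d.value.diffDaysSum.slice(0, 3),
  };
});

// Second project: add the three kept buckets into one number per document.
const stage2 = stage1.map((d) => ({
  rdd: d.rdd,
  diffDaysSum: d.buckets.reduce((a, b) => a + b, 0),
}));

// Group: sum diffDaysSum per distinct rdd value.
const totals = new Map();
for (const d of stage2) {
  const key = d.rdd.toISOString();
  totals.set(key, (totals.get(key) || 0) + d.diffDaysSum);
}

console.log(Object.fromEntries(totals));
// { '2013-02-15T10:00:00.000Z': 21, '2013-02-15T11:00:00.000Z': 24 }
```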

Page 28: Breaking the oracle tie

Analytics using the Aggregation Framework

Pros:
✓ More flexible than it sounds
✓ Rapid development
✓ Easy debugging

Cons:
✘ No custom JS supported
✘ Memory limitation
✘ API still evolving

Page 29: Breaking the oracle tie

Fine grained write semantics and asynchronous magic

Fine grained write semantics
• WriteConcern.SAFE for most writes
• WriteConcern.REPLICAS_SAFE for writes that are costly to recompute in case of failure

ReactiveMongo
• Asynchronous and non-blocking Scala driver for MongoDB
• Async writes with WriteConcern.SAFE and callback retry policy in case of error
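The retry policy for asynchronous acknowledged writes can be sketched as follows. This is plain JavaScript with promises, not ReactiveMongo or Scala; writeWithRetry, the linear backoff, and the fake insert function standing in for a driver call are all hypothetical.

```javascript
// Retry an asynchronous write a bounded number of times, waiting a little longer
// after each failure. writeFn is any function returning a promise.
async function writeWithRetry(writeFn, maxAttempts = 3, delayMs = 10) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await writeFn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }
  throw lastError; // every attempt failed
}

// Stand-in for a driver call: fails twice, then reports an acknowledged write.
let calls = 0;
async function fakeAcknowledgedInsert() {
  calls += 1;
  if (calls < 3) throw new Error("simulated write error");
  return { acknowledged: true, attempts: calls };
}

const pending = writeWithRetry(fakeAcknowledgedInsert);
pending.then((res) => console.log(res)); // { acknowledged: true, attempts: 3 }
```

Blind retries are only safe when the write itself is idempotent, for example an upsert keyed on a known _id; otherwise a write that succeeded but whose acknowledgement was lost could be applied twice.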

Page 30: Breaking the oracle tie

Lessons Learned

Use
• Replica sets
• Journaling
• Aggregation Framework
• MMS

Don't use
• Development versions across the team
• Unbounded datasets that can't fit in memory
• MapReduce if you don't need to

Page 31: Breaking the oracle tie

MongoDB on EC2

4 nodes with 6 mongod processes

Page 32: Breaking the oracle tie

MongoDB on EC2 using LVM

http://goo.gl/8NbV7

For high performance, use LVM with RAID 0 or 10
Have your guerrilla team ready:

Page 33: Breaking the oracle tie

MongoDB on EC2 Lessons Learned

Unix level tweaks:
• Raise ulimit
• Raise TCP timeout
• Mount with noatime, nodiratime
• Use XFS or ext4
• Use LVM for snapshotting

Use journaling