
Page 1: M6d cassandrapresentation


the prospect engine for brands.

Cassandra in Online Advertising: Real Time Bidding

Page 2: M6d cassandrapresentation

Who are we?
Costa Sevdinoglou & Edward Capriolo

Page 3: M6d cassandrapresentation

Impressions look like…

Page 4: M6d cassandrapresentation

A High Level look at RTB

1. Browsers visit Publishers and create impressions.

2. Publishers sell impressions via Exchanges.

3. Exchanges serve as auction houses for the impressions.

4. On behalf of the marketer, m6d bids on the impressions via the auction house. If m6d wins, we display our ad to the browser.

Page 5: M6d cassandrapresentation

Performance and Data

• Billions and billions of bid requests a day
• A single request can result in multiple Cassandra operations!
• One cluster is just under 10TB and growing
• Low latency requirement: below 120 ms typical
• Limited data available to m6d via the exchange

Page 6: M6d cassandrapresentation

Segment Data

Segments are how we assign product or service affinity to a group of users. Users we consider like-minded with respect to a given brand are placed in the same segment.

Segment Data is just one component of our overarching data model.

Segments help to reduce the number of calculations we do in real time.
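
As a rough sketch of how segment membership can be modeled in Cassandra: one column family keyed by the browser's cookie id, with one column per segment the user belongs to. Everything below (the keyspace, column family, and example names, and the cassandra-cli statements) is an illustrative assumption in the 0.8/1.0-era CLI, not our actual schema, and may need extra options such as a replication strategy depending on version.

# Hypothetical segment store: row key = cookie id, one column per segment
cat > segments.cli <<'EOF'
create keyspace SegmentKS;
use SegmentKS;
create column family UserSegments
    with comparator = UTF8Type
    and key_validation_class = UTF8Type
    and default_validation_class = LongType;
set UserSegments['cookie-abc123']['segment:coffee-brand'] = 1351700000;
EOF
cassandra-cli -h cassandra01 -p 9160 -f segments.cli

With a layout like this, adding a user to a segment is a single column insert, which is what makes the real-time updates described on the next slides cheap.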

Page 7: M6d cassandrapresentation

Old Approach for Segment Data

Limitations
• Periodically updated
• Only a subsection of the data
• Cluster performance is affected during a data push

Page 8: M6d cassandrapresentation

Cassandra Approach for Segment Data

Better!
• Updating in real time now possible
• Distributed, not duplicated
• Less complexity to manage
• Storing more information
• We can now bid on users sooner!

Page 9: M6d cassandrapresentation
Page 10: M6d cassandrapresentation

During waking hours: Dr. Realtime

• User traffic is at peak
• Applications need low latency operations
• High volume of read and write operations
• Desire high cache hit rate to limit disk IO
• Dr. Realtime conducts 'experiments' on optimization

Page 11: M6d cassandrapresentation

Experiment: Active Set, VFS, cache size tuning

• Cluster optimization is a topic that must be revisited periodically
• User base and requests are perpetually growing
• Amount of physical data stored grows
• New features typically result in new data and more requests
• How to tune your environment is application and hardware dependent

Page 12: M6d cassandrapresentation

Physical data directory

• The sstable holds the data
• The index holds offsets to avoid disk seeks
• The bloom filter is a probabilistic lookup structure
• There is also a per-sstable statistics table (see the listing below)
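
As an illustration, the files for a single sstable of a hypothetical column family look roughly like this; the version letter ('g' here) and generation number vary by Cassandra release, and the keyspace/column family names are placeholders:

# Pre-1.1 layout: sstable files live directly in the keyspace directory
ls /var/lib/cassandra/data/SegmentKS/
UserSegments-g-1-Data.db         # the sstable: rows and their columns
UserSegments-g-1-Index.db        # row key -> offset into Data.db, avoids extra seeks
UserSegments-g-1-Filter.db       # serialized bloom filter over the row keys
UserSegments-g-1-Statistics.db   # per-sstable statistics and histograms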

Page 13: M6d cassandrapresentation

When RAM > Data Size

• If you can afford to keep your data set in RAM:
• It is fast from the VFS cache
• That's it. You're optimized.
• However, you do not usually need this much RAM

Page 14: M6d cassandrapresentation

When RAM < Data Size

• The OS will cache the most active portions of disk

• The write/compact model causes the cache to churn

• User requests cause the cache to churn

Page 15: M6d cassandrapresentation

Understanding Active set with a hypothetical example

Webmail service (Coldmail):
• I have had an account for 10 years; I never log in more than twice a month
• I have 1,000,000 items in my inbox
• Not in the active set

Social networking (Chirper):
• I am logged in every day
• I commonly read and get updates from my friends
• In the active set

Page 16: M6d cassandrapresentation

$60,000 Question

How do you determine what the active set of your application and user base is?

Page 17: M6d cassandrapresentation

Set up instruments for testing

Page 18: M6d cassandrapresentation

Turn on a cache

• JMX allows you to tune only a single node for side by side comparisons

• Set the size very large for the key cache (be more careful with the row cache; see the sketch below)
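
A minimal sketch using the per-column-family nodetool command from the 0.7/1.0 era; the host, keyspace, and column family names are placeholders, and 8585 is the JMX port used elsewhere in this deck:

# On ONE node only: make the key cache effectively unbounded for the test,
# and leave the row cache at 0 to be safe
nodetool -h cassandra01 -p 8585 setcachecapacity SegmentKS UserSegments 10000000 0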

Page 19: M6d cassandrapresentation

Analysis

• 8:30: hit rate 91%, 1.2 million entries

• 10:30: hit rate ~93%, 1.7 million entries

• Beyond roughly 1.2 million entries, the cache memory might be better spent elsewhere
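
Samples like the ones above can be collected by polling cfstats on the test node over the day; a rough sketch (the grep patterns are assumptions about the cfstats output of that era, and the host/column family names are placeholders):

# Log the key cache size and hit rate every 30 minutes
while true; do
    date
    nodetool -h cassandra01 -p 8585 cfstats \
        | grep -A 20 'Column Family: UserSegments' \
        | egrep 'Key cache (size|hit rate)'
    sleep 1800
done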

Page 20: M6d cassandrapresentation

Active set conclusions

• Determine the sweet spot for hit rate and cache size
• Do not try to cache the long tail of requests
• When all other things are equal, dedicate more cache to the most-read column family
• Use the row cache only if rows are a predictable size
• Large row caches cannot be saved, so they are cold on restart

Page 21: M6d cassandrapresentation

read_repair_chance – Cassandra's version of an ethical dilemma

• Read Repair generates additional reads across the cluster for each user read

• Read Repair Chance controls the probability of Read Repair occurring

• If data is write-once or write-rarely, Read Repair may be unnecessary (sketch below)
  – data whose read ratio is much larger than its write ratio
  – data that does not need strict consistency

• In 1.0, hinted handoff no longer needs to wait on the failure detector, and the Read Repair Chance default has dropped from 100% to 10%
  – CASSANDRA-2045, thanks ntelford and co!
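
Lowering the chance on a read-heavy, write-rarely column family is a one-line schema change; a sketch with the cassandra-cli of that era (the keyspace and column family names are placeholders):

# Drop read repair to 10% for a write-rarely column family
cat > rrc.cli <<'EOF'
use SegmentKS;
update column family UserSegments with read_repair_chance = 0.1;
EOF
cassandra-cli -h cassandra01 -p 9160 -f rrc.cli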

Page 22: M6d cassandrapresentation

Analysis for RRC 'test subjects'

Candidate: Many reads, few writes

Inside story: This data used to take 2 days to arrive; a few ms of inconsistency... Come on, man!

Candidate: Many writes

Inside story: This is used for frequency capping, so a higher % is justified

Page 23: M6d cassandrapresentation

Experiment: Test the limits of NoSQL science with YCSB

YCSB is a distributed load generator that comes in handy!
• Before our upgrade from 0.6.X to 0.7.X
  – All the benchmarks were better
  – But good to kick the tires
• Prototyping a new Column Family
  – Time to write 500 million records
  – How many reads/second on 50GB of data

Page 24: M6d cassandrapresentation

Create a mixed workload

java -cp $CP com.yahoo.ycsb.Client \
  -db com.yahoo.ycsb.db.CassandraClient7 \
  -P workloads/workloadb -t \
  -threads 10 \
  -p recordcount=75000000 \
  -p operationcount=1000000 \
  -p readproportion=0.33 \
  -p updateproportion=0.33 \
  -p scanproportion=0 \
  -p insertproportion=0.33
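
The mixed run assumes the records already exist; YCSB's load phase is run first with the same client. A sketch (the thread count is illustrative, and connection details such as the target hosts are assumed to come from the workload/property files):

# Load phase: insert the records before running the mixed workload
java -cp $CP com.yahoo.ycsb.Client -load \
  -db com.yahoo.ycsb.db.CassandraClient7 \
  -P workloads/workloadb \
  -threads 10 \
  -p recordcount=75000000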

Page 25: M6d cassandrapresentation

Round 1 Results

RunTime: 410 Seconds Throughput: 2437 Operations/Second

Shared the results on #cassandra IRC. Suggestion! Try: -threads 30

Page 26: M6d cassandrapresentation

Trying it again…

Original Results:
-threads 10
RunTime: 410 Seconds
Throughput: 2437 Operations/Second

New Results:
-threads 30
RunTime: 196 Seconds
Throughput: 5088 Operations/Second

Page 27: M6d cassandrapresentation

Cassandra writes fast! (duh)

• Read path
  – Row, Key, and VFS caches
  – With enough data and read ops, disks become the bottleneck
• Write path
  – Structured log writes are linear to disk and fast
  – Compaction merges sstables in the background
• Many threads maximize write capability
• Many threads also stop a read blocking on IO from limiting write potential

Page 28: M6d cassandrapresentation

Night falls and Dr. Realtime transforms...

/etc/cron.d/mr_batch_dr_realtime:
# turn into Mr. Batch at night
0 0 * * * root nodetool -h `hostname` setcompactionthroughput 999
# turn back into Dr. Realtime for the day
0 6 * * * root nodetool -h `hostname` setcompactionthroughput 16

Setting throughput ensures:
• During the day most iops are free to serve traffic
• At night we can rip through compactions

Page 29: M6d cassandrapresentation

Mr Batch ravages data creating tombstones

• If a user clears their cookies, they vanish forever
• In actuality, they return as a new user
• Data has very high turnover
• We need to enforce a retention policy on the data
• TTL columns do not meet our requirements :(
• The cleanup daemon is a throttled range scanner
• The cleanup daemon also produces histograms every cycle

Page 30: M6d cassandrapresentation

Mr. Batch 'kills' rows while you sleep

Page 31: M6d cassandrapresentation

A note about different workloads

• Structured log format of C* has deep implications

• Many factors affect performance and disk size:

• Write once data

• Wide rows (many columns)

• Wide rows over time (fragmented)

• Application read write profile

• Deletion/update percentage

• The LevelDB-inspired compaction in 1.0 has a different profile than the current tiered compaction

Page 32: M6d cassandrapresentation

Tombstones have costs

• Tombstones physically live on disk
• They bloat the data, index, and bloom filters
• Tombstones live for a grace period and are then eligible to be removed

Page 33: M6d cassandrapresentation

Caching after (major) compaction

• In our case (lots of churn), major compaction shrinks data significantly

• Rows fragmented over many sstables are joined

• Tombstones and related data columns are removed

• All files should be smaller
• Smaller files mean better VFS caching

Page 34: M6d cassandrapresentation

Simple compaction scheduler

HOSTS=$1
for i in $HOSTS ; do
    nodetool -h cassandra${i} -p 8585 compact KS1
    nodetool -h cassandra${i} -p 8585 compact KS2
    nodetool -h cassandra${i} -p 8585 compact KS3
done

30 23 * * * /root/compacto/spool.sh "01 02 03 04"
30 23 * * * /root/compacto/spool.sh "05 06 07 08"

Page 35: M6d cassandrapresentation

Questions
