
Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product


DESCRIPTION

Data modeling, cluster sizing, and planning can be difficult when transitioning an existing product to Cassandra. Especially when the new Cassandra deployment needs to handle millions of operations per second on day one! In this talk I'll discuss our strategy for data modeling, cluster sizing, and our novel approach to data replication across data centers.


Page 1: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Integrated with 100,000+ publishers
Connected to 35+ DSPs
Partnerships with Industry-Leading Trading Desks
10,000+ Brand Name Advertisers

Who We Are

•  Holistic video advertising platform for publishers

•  Most transparent global marketplace for sellers

•  Founded in 2007, 180+ employees globally

•  First to market with video RTB in 2010

•  Integrated with over half of comScore top 100 pubs

2+ Billion ad decisions per day
Serving impressions in 100+ Countries
Reaching 335+ Million uniques every month

Page 2: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Over 2 billion ad auctions per day
●  Each auction generates an average of 20-30 "records"
   ●  Audience data
   ●  Bid data
   ●  Event tracking
●  A "record everything" approach would result in approximately 50 billion records per day
   ●  Normalized: ~1.5 TB / day uncompressed
   ●  Denormalized: ~5 TB / day uncompressed
●  Possibly up to 150 TB of data per month
●  We are not currently using a "record everything" approach, but we want to get there

How Big is Our Data?
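A quick back-of-envelope check of the volumes above (a sketch in Python; the 25-records-per-auction midpoint and the implied per-record sizes are inferences from the slide's own totals, not numbers stated in the deck):

auctions_per_day = 2e9
records_per_auction = 25                              # assumed midpoint of the 20-30 range
records_per_day = auctions_per_day * records_per_auction
print(records_per_day)                                # 5e10, i.e. ~50 billion records per day

normalized_bytes_per_day = 1.5e12                     # ~1.5 TB/day uncompressed
denormalized_bytes_per_day = 5e12                     # ~5 TB/day uncompressed
print(normalized_bytes_per_day / records_per_day)     # ~30 bytes per normalized record
print(denormalized_bytes_per_day / records_per_day)   # ~100 bytes per denormalized record

print(denormalized_bytes_per_day * 30 / 1e12)         # ~150 TB per month, denormalized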

Page 3: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

How Fast Does Our Data Grow?

[Chart: "Growth Curve", auctions per day, 10/9/13 through 9/9/14, y-axis 0 to 2.5 billion]

●  Typically our numbers double every 6 months
●  We expect more rapid growth over the next year or two

Page 4: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

How Fast Does Our Data Grow?

[Chart: "Growth Curve", auctions per day, y-axis 0 to 12 billion]

●  Typically our numbers double every 6 months
●  We expect more rapid growth over the next year or two

Page 5: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Over 10 billion ad auctions per day
●  Each auction generates an average of 30-40 "records"
   ●  Audience data
   ●  Bid data
   ●  Event tracking
●  A "record everything" approach would result in approximately 350 billion records per day
   ●  Normalized: ~10.5 TB / day uncompressed
   ●  Denormalized: ~35 TB / day uncompressed
●  Possibly up to 1 PB of data per month

How Big Might Our Data Get in a Year?

Page 6: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Excited to see how we’re using Cassandra for all this?

Page 7: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Too bad, we aren’t (yet)!

Page 8: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Where Do We Start?

Page 9: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Information about the people that are viewing ads
   ●  Segment data (demographics, browsing history, etc.)
   ●  Ads viewed
   ●  ID syncing
●  Used by advertisers to reach their target audience
   ●  "My product is relevant only to bald, left-handed, highly educated immigrants from Uzbekistan."
●  Historically stored in cookies
●  Technology advancement necessitates abandoning the cookie strategy
   ●  Track users on multiple devices
   ●  Mobile devices and connected TVs don't typically support cookies
●  Offline availability of data provides analytics opportunities
   ●  Discover trends
   ●  Look-alike segments

Audience Data

Page 10: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Cookie-based Workflow (Browser, SpotXchange, Data Partner)

1. Browser requests an ad via HTTP.
2. Server responds with an ad; the ad payload includes data partner URLs.
3. Browser requests the partner URL; the request payload includes the partner's cookies.
4. The data provider replies with a redirect containing segment information; the browser redirects to us.
5. We respond with our own cookies containing their segment data.
6. Browser requests an ad via HTTP, now including our cookies.
7. Server responds with an ad targeted at audience segments.

Page 11: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Cookies are overly constraining and getting worse
   ●  Limited to desktop traffic
   ●  Payload is expensive
      ●  Bandwidth
      ●  Processing (encryption and encoding)
   ●  Impossible to run deep analytics
   ●  Impossible to perform server-to-server synchronization
●  Newer identification standards are emerging
   ●  Apple IDFA, Android ID, UIDH
   ●  Facebook/Google ID
   ●  Device fingerprinting
●  Moving audience data onto the server allows data to be associated with any identifier, and even allows multiple identifiers to be tied together

Moving Away from Cookies

Page 12: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Server-side Storage Workflow (Browser, SpotXchange, Data Partner)

1. Browser requests an ad via HTTP.
2. Server responds with an ad; the ad payload includes data partner URLs.
3. Browser requests the partner URL with the SpotX audience ID attached.
4. The data provider replies with a redirect containing segment information and the partner audience ID; the browser redirects to us.
5. We store the segment information on the server.
6. Browser requests an ad via HTTP.
7. Server responds with an ad targeted at audience segments.
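A minimal sketch (Python) of the "store segment information on the server" step, assuming a simple key/value write path; the handler name, query-parameter names, and the store interface are hypothetical, not SpotXchange's actual code:

# Hypothetical handler for the redirect back from the data partner.
# Assumes parameters like spotx_id, partner_id, partner_audience_id and a
# segments string such as "123:1,456:3" (illustrative format only).
def handle_partner_redirect(params, store):
    audience_id = params["spotx_id"]               # our audience ID, echoed back by the partner
    partner_id = params["partner_id"]
    partner_audience_id = params.get("partner_audience_id")
    segments = params.get("segments", "")

    # One small write per segment value
    for pair in filter(None, segments.split(",")):
        segment_id, value = pair.split(":")
        store.write(audience_id, "segment", segment_id, value)

    # Remember the partner's own ID for this user so future syncs can be matched
    if partner_audience_id:
        store.write(audience_id, "foreign_id", partner_id, partner_audience_id)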

Page 13: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Additional Capabilities (Browser, SpotXchange, Data Partner)

1. User visits a site that provides the partner new data about that user.
2. The provider recognizes that they have synced this user with us in the past.
3. The partner calls us server-to-server with the user information, including our ID and the new data.
4. We store the new information.
5. Browser requests an ad via HTTP.
6. Server responds with an ad targeted at the new audience segments.
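Under the same assumptions as the earlier sketch, the server-to-server path could reuse that write interface; the payload shape below is hypothetical:

# Hypothetical handler for a partner's server-to-server update.
# Assumes the partner sends back our audience ID from an earlier sync plus new segments.
def handle_partner_s2s_update(payload, store):
    audience_id = payload["spotx_audience_id"]
    for segment_id, value in payload.get("segments", {}).items():
        store.write(audience_id, "segment", segment_id, str(value))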

Page 14: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Storing Audience Data in Cassandra

●  Data Modeling
●  Cluster Sizing
●  Replication Strategy

Page 15: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Data Modeling

Page 16: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Solution must minimize latency
●  Attempt to constrain to one read or one write per event whenever possible

{
  "audience_id" : "12345678-1234-1234-1234-123456789012",
  "segments" : {"123": 1, "456": 3, "789": 1},
  "foreign_ids" : {
    "7180" : "967992447104804725",
    "7347" : "bWv2-HOyJD8y6D",
    "6960" : "404_53e3bfa26d377"
  },
  "pacing" : {
    "2235" : 1412892591
  }
}

Data Modeling

Page 17: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Ad auctioning requires reading nearly all of the data at once
●  Most events write to one and only one data type (segments, IDs, etc.)

(Same example record as on the previous slide.)

Data Modeling

Page 18: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Store an entire user record in one row so it can be read all at once
●  All data can be represented as a tuple with a unique identifier

CREATE TABLE audience_data (
    audience_id uuid,
    type int,
    key text,
    value text,
    PRIMARY KEY (audience_id, type, key)
);

SELECT * FROM audience_data WHERE
    audience_id = 12345678-1234-1234-1234-123456789012;

SELECT * FROM audience_data WHERE
    audience_id = 12345678-1234-1234-1234-123456789012 AND
    type = 1;

INSERT INTO audience_data (audience_id, type, key, value) VALUES
    (12345678-1234-1234-1234-123456789012, 1, '123', '1');

Data Modeling
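To make the mapping concrete, here is a sketch (Python) of how the example JSON record from the earlier slides could be flattened into (audience_id, type, key, value) rows for this table. The INSERT above implies that type 1 holds segments; the other type codes used here are assumptions for illustration only:

# Assumed type codes: 1 = segment (matches the INSERT above), 2 = foreign ID, 3 = pacing.
TYPE_SEGMENT, TYPE_FOREIGN_ID, TYPE_PACING = 1, 2, 3

def flatten_audience_record(record):
    # Yield one (audience_id, type, key, value) row per element of the record.
    aid = record["audience_id"]
    for segment_id, value in record.get("segments", {}).items():
        yield (aid, TYPE_SEGMENT, segment_id, str(value))
    for partner_id, foreign_id in record.get("foreign_ids", {}).items():
        yield (aid, TYPE_FOREIGN_ID, partner_id, foreign_id)
    for campaign_id, timestamp in record.get("pacing", {}).items():
        yield (aid, TYPE_PACING, campaign_id, str(timestamp))

# e.g. ('12345678-1234-1234-1234-123456789012', 1, '123', '1') matches the INSERT above.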

Page 19: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Cluster Sizing

Page 20: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Distributed a modified version of our implementation to production
   ●  Replaced Cassandra calls with writes to a log file
●  Created a spreadsheet detailing each operation and how much load to expect during peak times
●  Used peak load to size the cluster for each data center
   ●  Used a formula provided by Aaron Morton at The Last Pickle

ops/sec = (system_constant * #cores * #nodes) / replication_factor

ops = 1 read or write to one row (cluster in a partition)

system_constant = 3000 for AWS
                  4000 for spinning disk
                  7,000-12,000 for SSD

Cluster Sizing
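A worked example of that formula (a sketch in Python; the replication factor of 2 is an assumption, chosen because it reproduces the node counts on the sizing slide that follows, and it is not stated in the deck):

import math

def nodes_required(peak_ops, replication_factor, system_constant, cores=8):
    # Rearranged from: ops/sec = (system_constant * cores * nodes) / replication_factor
    return math.ceil(peak_ops * replication_factor / (system_constant * cores))

# den01 at peak migration load (263,889 ops/sec), assuming replication_factor = 2
print(nodes_required(263889, 2, 4000))   # 17 nodes on spinning disk
print(nodes_required(263889, 2, 7000))   # 10 nodes on SSD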

Page 21: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Typically clusters start small and grow as product adoption grows
●  Our cluster will be working hardest when we first turn it on
   ●  Existing cookie data needs to migrate to Cassandra
   ●  As data migrates, the load will decrease, normalize, and then increase slowly over the next few months
   ●  Don't expect to match the original load for nearly a year

Our Backwards Scenario

[Chart: Peak OPS over time, y-axis 0 to 140,000]

Page 22: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Cluster Sizing

                                         den01      iad02      lon01      hkg01
% of total traffic                         40%        40%        13%         7%
Normal tag rate                            0.1        0.1        0.1        0.1
Migration tag rate                        0.75       0.75       0.75       0.75

SELECT                     DC   Avg     46,296     46,296     15,046      8,102
                                Peak   138,889    138,889     45,139     24,306
                           FE   Avg        126        263        684        675
                                Peak        377        789      2,052      2,025

UPDATE tag (typical load)  DC   Avg      4,630      4,630      1,505        810
                                Peak     13,889     13,889      4,514      2,431
                           FE   Avg         13         26         68         68
                                Peak         38         79        205        203

UPDATE tag (migration)     DC   Avg     30,093     30,093      9,780      5,266
                                Peak     90,278     90,278     29,340     15,799
                           FE   Avg         82        171        445        439
                                Peak        245        513      1,334      1,317

Total DC ops (normal load)      Avg     51,389     51,389     16,701      8,993
                                Peak    154,167    154,167     50,104     26,979

Total DC ops (migration)        Avg     87,963     87,963     28,588     15,394
                                Peak    263,889    263,889     85,764     46,181

Nodes required (8 core)     Constant
  Spinning disk                 4000         17         17          6          3
  SSD                           7000         10         10          4          3

Page 23: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Cluster Sizing

                           den01      iad02      lon01      hkg01
% of total traffic        40.00%     40.00%     13.00%      7.00%

Tag          Daily GB        0.9        0.9        0.3        0.2
             Total GB         84         84         27         15

Frqcap       Daily GB        0.6        0.6        0.2        0.1
             Total GB        3.9        3.9        1.3        0.7

Partner      Daily GB          8          8          3          1
             Total GB       1509       1509        490        264

Total GB                    3193       3193       1038        559
Per Node GB                  456        456        346        186

Page 24: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

Replication Strategy

Page 25: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Typical Cassandra replication is expensive
   ●  Each write is replicated to all data centers
   ●  Each cluster must be approximately the same size
   ●  Need a large pipe between data centers
●  3.7 million columns updated per second at peak load
●  The amount of replication needed increases with each new data center

Replication Strategy

Page 26: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Alternate strategies suggested:
   ●  Offline copying of SSTables
   ●  Maintain a log of changed records and run a process to copy those periodically
●  We realized that this data doesn't need to be available in all places at all times
   ●  People don't often move far enough to switch data centers
   ●  Data integrity is of fairly low importance
      ●  If our data isn't replicated, the user will appear to be new when they switch data centers, but that only has a minor short-term impact on application performance
●  Other replication strategies we considered:
   ●  None
   ●  Just-in-time
   ●  Queued

Replication Strategy

Page 27: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Don't replicate at all
●  Each data center has its own completely self-contained cluster
●  Advantage: Simplicity
●  Disadvantage: Limits our ability to target users when they move or when we reassign regions to a different data center

Replication Strategy: None

Page 28: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Each data center has its own completely self-contained cluster
●  The user's identifier cookie contains a data center identifier
●  When an incoming request's cookie says it's from a different data center, read from that data center in real time and replicate on the fly to the local data center
   ●  Reassign the cookie using the new data center
●  Advantage
   ●  Audience data is (almost) always available (99.99%)
●  Disadvantages
   ●  Additional latency while waiting for user data
   ●  In cookie-less situations we'd need to query all data centers if the local data center has no data

Replication Strategy: Just-In-Time

Page 29: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

if (cookie != null) {
    audience_id = cookie[id]
    audience_dc_id = cookie[dc_id]
}
else {
    audience_id = some other identifier   // e.g. a device ID when no cookie is present
}

if (audience_dc_id == local_dc) {
    audience_data = local_dc->cassandra->fetch(audience_id)
}
else {
    // No home data center hint: fall back to asking every data center
    other_dcs = audience_dc_id != null ? {audience_dc_id} : {dc1, dc2, dc3}
    for dc in other_dcs {
        audience_data = dc->cassandra->fetch(audience_id)
        if (audience_data != null) {
            // Replicate on the fly into the local cluster
            local_dc->cassandra->write(audience_id, audience_data)
            break
        }
    }
}

Replication Strategy: Just-In-Time

Page 30: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

●  Each data center has its own completely self-contained cluster
●  When a fetch attempt misses, the user ID is added to a queue for reconciliation
   ●  Treat the user as a new user and store their data locally
   ●  A background process consumes IDs from the queue and attempts to fetch data from other data centers for reconciliation
●  Advantages
   ●  Audience data is mostly available (98%)
   ●  Minimal additional latency introduced
●  Disadvantages
   ●  Additional operational complexity
   ●  Occasional data misses

Replication Strategy: Queued

Page 31: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

if (cookie != null) {
    audience_id = cookie[id]
    audience_dc_id = cookie[dc_id]
}
else {
    audience_id = some other identifier
}

audience_data = local_dc->cassandra->fetch(audience_id)
if (audience_data == null) {
    // Local miss: treat the user as new and queue the ID for background reconciliation
    local_dc->cassandra->queue_for_migration(audience_id, audience_dc_id)
}

Replication Strategy: Queued

Page 32: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

// Background process: drain the reconciliation queue and copy records from other data centers
audience_migrations = local_dc->fetch_from_queue()
for {audience_id, audience_dc_id} in audience_migrations {
    other_dcs = audience_dc_id != null ? {audience_dc_id} : {dc1, dc2, dc3}
    for dc in other_dcs {
        audience_data = dc->cassandra->fetch(audience_id)
        if (audience_data != null) {
            local_dc->cassandra->write(audience_id, audience_data)
            break
        }
    }
}

Replication Strategy: Queued

Page 33: Cassandra Day Denver 2014: Transitioning to Cassandra for an Already Giant Product

THANK YOU

Andrew Ku0g

Send questions, deck requests, complaints, cat videos, and resumes to: [email protected]