DESCRIPTION
At MediaMath, we deal with billions of records every day. One of our biggest challenges is hourly reporting of attribution data - the joining of billions of records to millions of events. How did we solve this hourly attribution reporting issue? We will walk through our evaluation, testing, and fine tuning of a variety of tools including Netezza, Hive, and Pig, and how we ultimately chose Cloudera's Impala.
How MediaMath Solved a Critical Reporting Problem with Impala
©2014 MEDIAMATH INC.
The Cloudera Sessions
June 18, 2014. Ram Narayanan, Senior Director of Database Architecture & Operations
Digital Marketing Pioneer
• Founded in 2007
• Global technology company
• Invented first Demand Side Platform (DSP) for online ads
• Conducts online advertising through real-time bidding & programmatic buying
About MediaMath
About MediaMath: Overview of Real-Time Bidding
[Diagram: in a real-time auction, the advertiser (client) bids within <30 ms to show the user an ad on www.cnn.com]
About MediaMath: Overview of Real-Time Bidding
[Diagram: the user later purchases on www.shoes.com after seeing its ad on www.cnn.com; the purchase ($$) is recorded in the event logs]
• Ad opportunities: 80-100 billion per day → 1.2 million opportunities per second at peak
• We bid on 30-40 billion ads per day
• We serve 1-2 billion ads per day
• 15-20 million events (click, sale, online sign-up) per hour
• 2 TB of data daily (compressed)
Note: this counts only our wins. If we count losses, we easily reach petabytes.
About MediaMath
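The per-second peak rate quoted above follows from the daily volume. A quick sanity check (illustrative arithmetic only, not from the deck):

```python
# Sanity check of the slide's rates: 80-100 billion ad opportunities per
# day implies an *average* of roughly 0.9-1.2 million per second, which is
# consistent with the quoted 1.2 million opportunities/second at peak.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

low_avg = 80e9 / SECONDS_PER_DAY    # ~0.93 million/second
high_avg = 100e9 / SECONDS_PER_DAY  # ~1.16 million/second
print(f"avg rate: {low_avg / 1e6:.2f}M - {high_avg / 1e6:.2f}M per second")
```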
Which ad (impression) led to which action, like a sale or online sign-up?
• 35-40 billion recorded impressions served every 30 days
• 15-20 million events per hour
• Need to join events with impressions 2x per hour
→ Find matching records
→ Perform complex sequencing & allocation logic
→ Run aggregations on results
→ Send data to data marts
→ Provide hourly reporting to clients
The Reporting Attribution Problem
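The event-to-impression join above can be sketched in miniature. This is a hedged illustration only: SQLite stands in for a distributed engine like Impala, and the table names, columns, and last-touch rule are assumptions, not MediaMath's actual schema or allocation logic.

```python
import sqlite3

# Tiny stand-in for the attribution join: impressions (ads shown) joined
# to events (clicks/sales), crediting each event to an earlier impression.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE impressions (user_id TEXT, ad_id TEXT, shown_at INTEGER);
    CREATE TABLE events (user_id TEXT, event_type TEXT, occurred_at INTEGER);
    INSERT INTO impressions VALUES ('u1', 'ad_shoes', 100), ('u1', 'ad_hats', 200);
    INSERT INTO events VALUES ('u1', 'sale', 250);
""")

# "Find matching records" + sequencing logic: credit each event to the
# most recent impression for that user that preceded it (last-touch).
rows = con.execute("""
    SELECT e.user_id, e.event_type,
           (SELECT i.ad_id FROM impressions i
             WHERE i.user_id = e.user_id AND i.shown_at < e.occurred_at
             ORDER BY i.shown_at DESC LIMIT 1) AS credited_ad
      FROM events e
""").fetchall()
print(rows)  # -> [('u1', 'sale', 'ad_hats')]
```

At MediaMath's scale the same logical join runs over billions of impression rows and millions of event rows, which is why engine choice and tuning dominate the rest of the deck.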
Incumbent Architecture: Appliance-Based (Netezza)
• Cost: expensive
• Scale: non-incremental scalability
• Performance: reporting lag
• Reporting inflexibility
• Product feature constrained
To build a data warehouse architecture that can perform hourly reporting of attribution data at scale, while remaining affordable and easy to manage.
Our goal
• Scalability: handle 10-50x scale
• Capability: ability to perform big-data joins at scale
• Performance: complete aggregation in <60 minutes
• Cost-effective: cheaper than appliance-based solutions
Evaluation Criteria:
• Hive — run time: 5-6 hours; stability: high
• Pig — run time: 4-5 hours; stability: high
• Impala Beta (0.6) — run time: 2-3 hours; stability: low
Evaluated Options: Round 1
• Hive, post-tuning (map joins, bucketing, split size, etc.) — run time: 2-3 hours; stability: high
• Impala GA (1.0) (LZO compression, slicing, tuning, hardware upgrade) — run time: 30 minutes; stability: high
Evaluated Options: Round 2
Data Warehouse Architecture: 2011 (Netezza)
[Diagram: bid logs, pixel logs, and metadata are loaded via ELT into Netezza, which performs attribution and aggregation and feeds reports and several reporting data marts]
Data Warehouse Architecture: 2013 (Netezza + Hadoop)
[Diagram: the same inputs (bid logs, pixel logs, metadata) are loaded via ELT, but attribution now runs in Hadoop alongside Netezza; both systems perform aggregation and produce reports feeding the reporting data marts]
• December 2013, peak season → the new architecture accommodated 2x data volume with unprecedented scalability & stability
• Present: we are planning to add more features → considering moving some part of aggregation into Hadoop
Proof:
• Process ONLY the required data
• Compress your data
• “Divide & conquer” your data (i.e., slice and dice)
Lessons Learned & Best Practices
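The "divide & conquer" lesson above means slicing a large join into independent partitions (for example, by hashing the join key) so each slice can be processed separately and in parallel. A minimal sketch, where the slice count and the choice of `user_id` as the key are illustrative assumptions:

```python
import hashlib

# "Divide & conquer": partition records into independent slices by hashing
# the join key, so each slice can be joined and aggregated separately.
N_SLICES = 4

def slice_of(user_id: str) -> int:
    """Stable hash so the same user always lands in the same slice."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % N_SLICES

records = [{"user_id": f"u{i}"} for i in range(10)]
slices = [[] for _ in range(N_SLICES)]
for rec in records:
    slices[slice_of(rec["user_id"])].append(rec)

# Every record lands in exactly one slice, so one big join becomes N
# smaller per-slice joins whose inputs contain only matching keys.
print([len(s) for s in slices])
```

Because both sides of a join are sliced by the same key, matching rows always fall in the same slice, which shrinks each join's working set and lets slices run concurrently.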
THANK YOU