R&D to Product Pipeline Using Apache Spark in AdTech: Spark Summit East talk by: Maximo...

R&D To Product Pipeline Using Apache Spark in Adtech

Maximo Gurmendez, Dr. Sunanda Parthasarathy, Dr. Saket Mengle
DataXu Inc.

What to expect from this session
• Who is DataXu?
• Why Apache Spark?
• From R&D to product using Apache Spark: demo
• Analytics using Apache Spark

DataXu: Make Marketing Smarter Through Data Science

Who
• Spun out of MIT labs
• A petabyte-scale digital marketing platform
• One of the fastest-growing companies in the Inc. 5000

What
• Help the world's most valuable brands understand and engage with their consumers
• Maximize ROI

Quick Statistics
• Billions of ads served per month
• ~10 ms round-trip response time
• 130+ TB of logs per day
• 3,000+ servers powering the platform
• 13 regions, 24x7

Real Time Bidding

DataXu Machine Learning

[Diagram: machine-learning loop with the stages Learn Models, Calibrate, and Evaluate; inputs Impressions, Clicks, and Activities; Models, S3, and Real Time Bidding.]
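The diagram names three stages (learn, calibrate, evaluate) but not how each is implemented. As a rough, hypothetical illustration of what such a daily loop computes, the plain-Python sketch below assumes logistic regression for *learn*, Platt scaling for *calibrate*, and average log loss for *evaluate*. These technique choices and every function name here are assumptions, not DataXu's pipeline; at production scale each stage would run as a distributed Spark job rather than in one Python process.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def raw_score(model, x):
    # Linear score w.x + b (an uncalibrated log-odds)
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def learn(rows, epochs=200, lr=0.5):
    """Learn (assumed): logistic regression fitted by batch gradient descent.
    rows is a list of (feature_vector, label) pairs with label in {0, 1}."""
    dim = len(rows[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in rows:
            err = sigmoid(raw_score((w, b), x)) - y  # gradient of log loss w.r.t. the score
            for i, xi in enumerate(x):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, gw)]
        b -= lr * gb / len(rows)
    return w, b


def calibrate(model, holdout):
    """Calibrate (assumed): Platt scaling, a one-feature logistic regression on the raw score."""
    return learn([([raw_score(model, x)], y) for x, y in holdout])


def predict(model, platt, x):
    # Calibrated probability: sigmoid(a * raw_score + c)
    return sigmoid(raw_score(platt, [raw_score(model, x)]))


def log_loss(probs_and_labels):
    eps = 1e-12  # clip to avoid log(0)
    total = 0.0
    for p, y in probs_and_labels:
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs_and_labels)


def evaluate(model, platt, test):
    """Evaluate (assumed): average log loss of calibrated predictions; lower is better."""
    return log_loss([(predict(model, platt, x), y) for x, y in test])


def make_synthetic(n, seed):
    """Hypothetical demo data: two features, labels drawn from a known logistic model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        y = 1 if rng.random() < sigmoid(1.5 * x[0] - x[1] - 2.0) else 0
        rows.append((x, y))
    return rows
```

Typical use is `model = learn(train)`, `platt = calibrate(model, holdout)`, then `evaluate(model, platt, test)` on three disjoint splits. A separate calibration step matters when a predicted probability feeds directly into a bid price, since a model can rank impressions well yet still systematically over- or under-estimate rates.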

Why is this hard?

Huge scale
• 2.7 million bid decisions per second
• 3 PB of data processed daily
• Runs 24x7 on 5 continents
• Thousands of ML models trained per day

Unattended operation
• Model training and deployment run automatically every day

Changing industry
• Need the ability to adapt quickly to new customer requirements

Demo

Benchmarks

[Chart: Training Time Comparison. Training time (in sec, 0 to 500) vs. number of training records (0 to 15,000,000) for Logistic Regression, Decision Tree, and Random Forest, each with a linear trend line.]
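The training-time chart pairs each algorithm with a linear trend line. A minimal sketch of how such timings could be collected and summarized follows; the training function and data generator are caller-supplied and hypothetical, and the fit is ordinary least squares of seconds against record count.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_training(train_fn: Callable, make_data: Callable[[int], object],
                  sizes: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (record_count, wall-clock seconds) for one training run per size."""
    points = []
    for n in sizes:
        data = make_data(n)  # data generation is excluded from the timing
        start = time.perf_counter()
        train_fn(data)
        points.append((n, time.perf_counter() - start))
    return points


def fit_linear_trend(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares: seconds ~ slope * records + intercept."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("record counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Multiplying the slope by 1,000,000 gives the approximate extra seconds per million records. A straight line is only a summary: the intercept absorbs fixed job overheads, and real distributed training times can bend away from linear at larger sizes.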

[Chart: Avg. Bidding Latency (milliseconds, 0 to 1.6), Current DataXu Model vs. Spark Random Forest.]

[Chart: Model Size in Memory (KB, 0 to 140) for Random Forests, Logistic Regression, Naive Bayes, Decision Trees, and the DataXu Model.]
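The latency and model-size charts compare models held in memory and scored at bid time. One common way to keep a tree ensemble compact and quick to score is to store each tree as flat parallel arrays and average the tree outputs. The sketch below assumes that layout; `FlatTree`, `forest_predict`, and `size_bytes` are hypothetical names, not Spark's own model format.

```python
from array import array
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FlatTree:
    """One decision tree stored as parallel typed arrays, one slot per node."""
    feature: array    # 'i': feature index tested at each node, -1 marks a leaf
    threshold: array  # 'd': split threshold for internal nodes
    left: array       # 'i': child index taken when x[feature] <= threshold
    right: array      # 'i': child index taken otherwise
    value: array      # 'd': leaf output, e.g. an estimated probability

    def predict(self, x: Sequence[float]) -> float:
        i = 0  # start at the root
        while self.feature[i] != -1:
            f = self.feature[i]
            i = self.left[i] if x[f] <= self.threshold[i] else self.right[i]
        return self.value[i]

    def size_bytes(self) -> int:
        # Raw array storage only; ignores Python object overhead
        return sum(a.itemsize * len(a) for a in
                   (self.feature, self.threshold, self.left, self.right, self.value))


def forest_predict(trees: List[FlatTree], x: Sequence[float]) -> float:
    """Random-forest score: the average of the individual tree outputs."""
    return sum(t.predict(x) for t in trees) / len(trees)
```

Scoring cost grows with tree depth times the number of trees, and memory grows with the total node count, which is one reason the number and depth of trees matter when a bid must be returned within a few milliseconds.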


Why Apache Spark for Advanced Analytics?

• Makes advanced analytics a reality: accelerated queries, graph processing, streaming analytics
• Speaks multiple languages (Python, Scala, SQL)
• Makes it easy compared to the complexity of Java/Hadoop
• Accelerates the analyst/data scientist workflow

[Diagram: Real Time Bidding Engine, Adv. Analytics Engine, and S3 (metadata).]

Advanced Analytics at DataXu

[Diagram: Real Time Bidding Engine data combined with Partner/Client Data in the Analytics Engine, feeding Dashboarding/Reporting.]
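The analytics diagram combines bidding data with partner/client data before dashboarding and reporting. The kind of aggregate-and-join step involved can be sketched in plain Python as below; in Spark the same logic would typically be a DataFrame `groupBy(...).agg(...)` plus a `join`, or an equivalent SQL query. The record fields (`campaign_id`, `won`, `clicked`, `client`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def campaign_report(bid_log: Iterable[dict], partner_data: Dict[str, dict]) -> List[dict]:
    """Per-campaign impressions, clicks, and CTR, enriched with client attributes."""
    # Aggregate won impressions and clicks per campaign (the "groupBy" step)
    totals = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for row in bid_log:
        if not row["won"]:
            continue  # a lost auction produces no impression
        t = totals[row["campaign_id"]]
        t["impressions"] += 1
        t["clicks"] += int(row["clicked"])
    # Attach partner/client attributes (a left "join"; missing clients become "unknown")
    report = []
    for campaign_id, t in sorted(totals.items()):
        client = partner_data.get(campaign_id, {}).get("client", "unknown")
        report.append({
            "campaign_id": campaign_id,
            "client": client,
            "impressions": t["impressions"],
            "clicks": t["clicks"],
            "ctr": t["clicks"] / t["impressions"],
        })
    return report
```

Aggregating before joining keeps the join small: only one row per campaign has to be matched against the partner/client table.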

Analytics Demo

Thank You.

smengle@dataxu.com, mgurmendez@dataxu.com, sparthasarathy@dataxu.com

!! We're hiring !! Data Scientists and Data Science Engineers (FTEs and interns)
