Real-Time Applications At Terabyte Scale

Isaac Mosquera, VP Engineering, Data & Insights

You’ve probably seen our sharing tools...

But that’s not all we do...

WE MAKE SOCIAL DATA ACTIONABLE

Over 1B social signals are processed monthly by the ShareThis Social Intelligence Platform™ to generate insights about your brand, industry and events.

ENGAGEMENT: Users consume and share content across web and mobile

TARGETING: Desktop and mobile targeting at scale

INSIGHTS: Actionable cross-device insights

DATA: 1B+ first-party social actions monthly

ENGAGEMENT

TARGETING

INSIGHTS

DATA

• Lookalike Audiences

• Audience Segments

“Wow small SUVs are fuel efficient!” (User #12345)

• Automotive Study

• Car Buying Infographic

Why Is Real-Time Important?

Sharing Interest Decays With Time

[Chart, x-axis: Time]

The Previous Architecture

Previous Architecture Problems

Duplicated Data

[Diagram: Share Data feeding separate Query Engines for Insights, Ad Tech, Consumer Engagement, and Data Science]

Fragmented & Siloed Data Sources

[Diagram: the same four teams and Query Engines, with additional sources: Campaign, RTB, Conversion, Summarization, 3rd Party, Trends, Studies]

Generating Reports From Old Platform

[Diagram components: Raw Data, PreAggregation, Staged Data, Query, Results, Rest API, Consumers]

New Report Type
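One reading of this slide is that reports were served only from pre-aggregated, staged data, so every new report type needed a new aggregation job over the raw data. The sketch below illustrates that reading under stated assumptions: the event fields, the report names, and the functions `build_staged_data` and `query` are all hypothetical and do not come from the old platform.

```python
from collections import Counter

raw_events = [
    {"domain": "example.com", "channel": "twitter"},
    {"domain": "example.com", "channel": "email"},
    {"domain": "example.org", "channel": "twitter"},
]

# Each report type has its own pre-aggregation job (names are hypothetical).
PREAGGREGATIONS = {
    "shares_by_domain": lambda event: event["domain"],
}


def build_staged_data(events, preaggregations):
    """Run every pre-aggregation over the raw data and stage the counts."""
    return {
        report: dict(Counter(key_fn(event) for event in events))
        for report, key_fn in preaggregations.items()
    }


def query(staged, report):
    """Serve a report from staged data only; raw data is never queried here."""
    if report not in staged:
        raise KeyError(f"no staged data for report type {report!r}")
    return staged[report]


staged = build_staged_data(raw_events, PREAGGREGATIONS)
by_domain = query(staged, "shares_by_domain")

# A new report type needs a new job and a rebuild before it can be served.
PREAGGREGATIONS["shares_by_channel"] = lambda event: event["channel"]
staged = build_staged_data(raw_events, PREAGGREGATIONS)
by_channel = query(staged, "shares_by_channel")
print(by_domain, by_channel)
```

In this toy version the rebuild is instant; over terabytes of raw data the same step would be the slow part of shipping a new report.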

Why Focus On These Problems?

Faster Iterations

Data Science

New Applications

Business Value

Targeting

The Birth of a New Team

Data Team’s Mission

Making our data easily accessible

Our Data Vision

Centralize Data Sources

Data Quality & Trust

Reliable Infrastructure

Real Time All The Things

[Diagram: Raw Social Data, DLX, Geo, Device Mappings, Sentiment, and Social Keywords feeding Downstream Applications]

Kafka Architecture

Producers: Application, Logs

Brokers

Consumers: Data Loaders, Annotations, Filters

Destinations: Big Query, Social Ad Tech
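The producer, broker, consumer, destination flow on this slide can be sketched without a Kafka cluster. What follows is a minimal in-memory stand-in, not Kafka client code: `InMemoryBroker`, `annotate`, `keep`, the topic name, and the event fields are hypothetical names chosen for illustration, and the `warehouse` list merely stands in for a destination such as Big Query.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def publish(self, topic, event):
        self._logs[topic].append(event)

    def read(self, topic, offset=0):
        # Consumers choose the offset they read from, as with a real log.
        return self._logs[topic][offset:]


def annotate(event):
    """Consumer stage: return a copy of the event with a derived field."""
    enriched = dict(event)
    enriched["is_share"] = event.get("action") == "share"
    return enriched


def keep(event):
    """Consumer stage: drop events that carry no user id."""
    return event.get("user_id") is not None


def run_consumer(broker, topic, destination):
    """Read the topic, annotate and filter each event, load the survivors."""
    for event in broker.read(topic):
        enriched = annotate(event)
        if keep(enriched):
            destination.append(enriched)
    return destination


broker = InMemoryBroker()
# Producers: an application and a log shipper publish to the same topic.
broker.publish("social-events", {"user_id": 12345, "action": "share"})
broker.publish("social-events", {"user_id": None, "action": "click"})
broker.publish("social-events", {"user_id": 5432, "action": "click"})

warehouse = run_consumer(broker, "social-events", [])
print(len(warehouse))
```

Because producers only talk to the broker, new consumers (another loader, annotation, or filter) can be added without touching the applications that emit events.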

Integrate Campaign

[Diagram: Social Data, DLX, Geo, Device Mappings, Sentiment, Social Keywords, RTB Bid Data, and Campaign Data feeding Downstream Applications]

Build An Active Warehouse

3 Trillion Row Interactive Query Engine

[Diagram: Google Big Query, with Share Data and RTB Impressions & Clicks + RT. Internal: Data Science, Ad Tech, Consumer Engagement, Sales Strategy, Insights. External: ATDs, DMPs, DSPs]

Reliability

Add redundancy and robustness to our data pipeline to protect against data loss.

Unified Monitoring

Centralizing monitoring gives us a single definition of “data quality”.

Monitoring Infrastructure

[Diagram: Producer App and Consumer App, each with the Metrics Library; Graphite; Seyren; Slack; Dev Team; Dashboards]
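A shared metrics library of the kind shown here could be as small as a counter that renders Graphite's plaintext protocol, one `path value timestamp` line per metric. The sketch below is an assumption about shape, not the library from the talk: the `Metrics` class, the `pipeline.producer` prefix, and the counter name are hypothetical. Thresholds and Slack notifications would be configured in Seyren, which checks Graphite, rather than in this code.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric as a Graphite plaintext line ending in a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


class Metrics:
    """Hypothetical metrics library shared by producer and consumer apps."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.counts = {}

    def incr(self, name, amount=1):
        self.counts[name] = self.counts.get(name, 0) + amount

    def flush_lines(self, timestamp=None):
        """Render all counters as plaintext lines, then reset them."""
        lines = [
            graphite_line(f"{self.prefix}.{name}", count, timestamp)
            for name, count in sorted(self.counts.items())
        ]
        self.counts = {}
        return lines


def send(lines, host, port=2003):
    """Ship lines to a Carbon plaintext listener (2003 is the usual default)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall("".join(lines).encode("ascii"))


producer_metrics = Metrics("pipeline.producer")
producer_metrics.incr("events_sent", 3)
lines = producer_metrics.flush_lines(timestamp=1400000000)
print(lines[0], end="")
```

Using the same library in every producer and consumer keeps metric names consistent, which is what makes a single definition of data quality checkable in one place.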

Defining Data Quality

Expected Field Distribution

Data Loss

Business KPIs
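Two of these definitions lend themselves to simple automated checks. The sketch below compares an observed field distribution against an expected baseline and computes a loss ratio from produced and consumed counts. The field name, the baseline, the counts, and both thresholds are illustrative assumptions; business KPI checks are left out because they depend on definitions the slides do not give.

```python
from collections import Counter


def field_distribution(events, field):
    """Return the share of events taking each value of `field`."""
    counts = Counter(event.get(field) for event in events)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def distribution_drift(observed, expected):
    """Largest absolute gap between observed and expected shares."""
    values = set(observed) | set(expected)
    return max(
        (abs(observed.get(v, 0.0) - expected.get(v, 0.0)) for v in values),
        default=0.0,
    )


def loss_ratio(produced, consumed):
    """Fraction of produced events that never reached the consumer."""
    if produced == 0:
        return 0.0
    return max(produced - consumed, 0) / produced


events = [
    {"device": "mobile"},
    {"device": "mobile"},
    {"device": "desktop"},
    {"device": "mobile"},
]
expected = {"mobile": 0.5, "desktop": 0.5}  # assumed baseline

observed = field_distribution(events, "device")
drift = distribution_drift(observed, expected)
loss = loss_ratio(produced=1000, consumed=990)

# Thresholds are illustrative assumptions, not values from the talk.
alerts = []
if drift > 0.2:
    alerts.append("field distribution drift: device")
if loss > 0.005:
    alerts.append("data loss")
print(alerts)
```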

What’s Next?

Dynamic Stream Filter

You Want This

But You Get This

Stream Sources

Filter Application

Data Filter UI

Filter Definitions

Data Stream Filter Prototype

Real Time Pipeline

• shares from top 100 domains

• user actions in north east region

• users who recently bought a car

• users likely to buy a car soon

• actions from user ids in (1234, 5432, 9999)

External Customers

Internal Teams

Predictive Algorithms

Dynamically create filters based on customer’s needs. These can be created instantly on-demand.
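On-demand filters suggest treating each filter definition as data that can be registered while the stream is running. The sketch below shows one way that might look; the clause format, the two operators, and the names `compile_filter` and `FilterRegistry` are assumptions for illustration, not the prototype's actual design. The two example filters mirror the user-id and region examples from the prototype slide.

```python
import operator

# Comparison operators allowed in filter definitions (hypothetical set).
OPERATORS = {
    "eq": operator.eq,
    "in": lambda value, allowed: value in allowed,
}


def compile_filter(definition):
    """Turn a list of {field, op, value} clauses into a predicate.

    An event matches only if every clause matches.
    """
    clauses = [
        (clause["field"], OPERATORS[clause["op"]], clause["value"])
        for clause in definition["clauses"]
    ]

    def predicate(event):
        return all(
            field in event and op(event[field], value)
            for field, op, value in clauses
        )

    return predicate


class FilterRegistry:
    """Holds named filters that can be added while the stream is running."""

    def __init__(self):
        self._filters = {}

    def register(self, definition):
        self._filters[definition["name"]] = compile_filter(definition)

    def route(self, event):
        """Return the names of every filter the event satisfies."""
        return [name for name, pred in self._filters.items() if pred(event)]


registry = FilterRegistry()
registry.register({
    "name": "selected-users",
    "clauses": [{"field": "user_id", "op": "in", "value": [1234, 5432, 9999]}],
})
registry.register({
    "name": "north-east-actions",
    "clauses": [{"field": "region", "op": "eq", "value": "north east"}],
})

stream = [
    {"user_id": 1234, "region": "north east", "action": "share"},
    {"user_id": 42, "region": "south", "action": "click"},
    {"user_id": 9999, "region": "west", "action": "share"},
]
matches = [registry.route(event) for event in stream]
print(matches)
```

Keeping definitions as plain data means a UI can create them and the filter application can load them without a code deploy.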

Questions?

Isaac Mosquera

twitter: imosquera

e-mail: [email protected]