An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith, if(we) Inc.
@jssmith, github.com/ifwe
August 11, 2015, KDD, Sydney, Australia

Page 2: An Architecture for Agile Machine Learning in Real-Time Applications

• Profitable startup actively pursuing big opportunities in social apps

• Millions of users on existing products

• Thousands of social contacts per second

Page 3: An Architecture for Agile Machine Learning in Real-Time Applications

Overview

• Agile machine learning can be difficult—but brings big benefits

• Key challenges in deployment and feature engineering

• Solution: a single path to data

Page 5: An Architecture for Agile Machine Learning in Real-Time Applications

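[Diagram, production vs. development: the production cycle comprises data collection, model updates, and serving personalized recommendations; the development cycle comprises study & understand, design new models & features, and train & backtest.]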

Page 7: An Architecture for Agile Machine Learning in Real-Time Applications

[Diagram: between train & backtest and model updates:]

• write spec
• e-mail model to engineers
• request engineering
• "why did we want this?"
• QA
• bug fixes
• meetings
• wait
• export to Excel
• check parameters
• Java development
• new database schema

Page 8: An Architecture for Agile Machine Learning in Real-Time Applications

[Diagram: model updates and train & backtest]

• Shared path to data
• Shared feature definition code

Page 10: An Architecture for Agile Machine Learning in Real-Time Applications

Recommendation Engine for Dating Product

• >10 million candidates
• >1000 updates/sec
• Must be responsive to current activity
• Users expect instant query results

Page 13: An Architecture for Agile Machine Learning in Real-Time Applications

Model

• Decompose likelihood of match between vote outcomes and vote occurrence

• Logistic regression

• Real-time personalization through feature vector evolution

• Model parameters trained offline by data scientists

• Consider 1000s of features, select 50-100
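
A minimal sketch of how a logistic regression model of this kind could score and rank candidates. The feature names, weights, and the `LogisticRanking` object are hypothetical, not taken from the talk, and the decomposition into vote occurrence and vote outcome is not modeled. It only shows a single logistic score over a feature vector, which is the part that would change in real time as features evolve while the trained weights stay fixed.

```scala
object LogisticRanking {
  // Hypothetical model: an intercept plus one weight per named feature.
  // In the talk, parameters are trained offline; any values used here are made up.
  final case class Model(intercept: Double, weights: Map[String, Double])

  // Standard logistic function.
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability estimate for one candidate's current feature vector.
  // Features missing from the vector contribute zero.
  def score(model: Model, features: Map[String, Double]): Double = {
    val z = model.weights.foldLeft(model.intercept) {
      case (acc, (name, w)) => acc + w * features.getOrElse(name, 0.0)
    }
    sigmoid(z)
  }

  // Rank candidate ids by descending score.
  def rank(model: Model, candidates: Map[String, Map[String, Double]]): Seq[String] =
    candidates.toSeq
      .map { case (id, f) => (id, score(model, f)) }
      .sortBy { case (_, s) => -s }
      .map(_._1)
}
```

Because the weights are plain data, a retrained model could in principle be swapped in without touching the scoring code.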

Page 21: An Architecture for Agile Machine Learning in Real-Time Applications

[Architecture diagram spanning production and development: Application APIs & Business Logic, RDBMS, Data Warehouse / Hadoop, Streaming Logs, Exploratory Analysis, Training & Backtesting, Batch Predictions, Predictive Services / Ranking.]

Page 24: An Architecture for Agile Machine Learning in Real-Time Applications

[Diagram: events along a time axis feed aggregations, which produce the machine learning input. Aggregation functions shown: first( ), last( ), count( ), sum( ), max( ), avg( ), min( ).]
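
A minimal sketch of the aggregation idea, assuming a hypothetical `Event` with a timestamp and a numeric value. The names `Aggregations` and `Agg` are illustrative only. Each aggregate is updated one event at a time, so the same code could run over a historical replay or a live stream.

```scala
object Aggregations {
  // Hypothetical event: a timestamp and a numeric value.
  final case class Event(time: Long, value: Double)

  // Running aggregates that can be updated one event at a time.
  final case class Agg(
    count: Long,
    sum: Double,
    min: Double,
    max: Double,
    first: Option[Double],
    last: Option[Double]
  ) {
    // Average is undefined until at least one event has been seen.
    def avg: Option[Double] = if (count == 0) None else Some(sum / count)

    def update(e: Event): Agg = Agg(
      count = count + 1,
      sum = sum + e.value,
      min = math.min(min, e.value),
      max = math.max(max, e.value),
      first = first.orElse(Some(e.value)),
      last = Some(e.value)
    )
  }

  val empty: Agg =
    Agg(0L, 0.0, Double.PositiveInfinity, Double.NegativeInfinity, None, None)

  // Fold events into the aggregates in time order.
  def aggregate(events: Seq[Event]): Agg =
    events.sortBy(_.time).foldLeft(empty)(_.update(_))
}
```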

Page 28: An Architecture for Agile Machine Learning in Real-Time Applications

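Event History API

trait EventHistory {
  // Record a new event.
  def publishEvent(e: Event): Unit

  // Deliver events in the given time range that pass the filter to the handler.
  def getEvents(
    startTime: Date,
    endTime: Date, // +∞ for real-time streaming
    eventFilter: EventFilter,
    eventHandler: EventHandler
  ): Unit
}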
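
A minimal single-threaded sketch of how an API with this shape could serve both history and streaming from one call. Everything here is an assumption for illustration: the `Event` fields, the function-typed `EventFilter` and `EventHandler`, the use of `Long` timestamps instead of `Date`, and `Long.MaxValue` standing in for +∞. It is not the if(we) or Antelope implementation.

```scala
import scala.collection.mutable.ArrayBuffer

object InMemoryHistory {
  // Hypothetical stand-ins for the talk's Event, EventFilter, and EventHandler.
  final case class Event(time: Long, kind: String, userId: String)
  type EventFilter = Event => Boolean
  type EventHandler = Event => Unit

  // Assumption: timestamps are Longs and Long.MaxValue plays the role of +∞.
  val Forever: Long = Long.MaxValue

  final class History {
    private val stored = ArrayBuffer.empty[Event]
    private val live = ArrayBuffer.empty[(Long, EventFilter, EventHandler)]

    // Store the event, then push it to any live subscription it matches.
    def publishEvent(e: Event): Unit = {
      stored += e
      live.foreach { case (start, filter, handler) =>
        if (e.time >= start && filter(e)) handler(e)
      }
    }

    // Replay stored events with startTime <= time < endTime in time order.
    // With endTime == Forever, also keep delivering events published later.
    def getEvents(startTime: Long, endTime: Long,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit = {
      stored.toList
        .filter(e => e.time >= startTime && e.time < endTime && eventFilter(e))
        .sortBy(_.time)
        .foreach(eventHandler)
      if (endTime == Forever) live += ((startTime, eventFilter, eventHandler))
    }
  }
}
```

A caller that passes a finite end time gets a bounded replay, which suits backtesting. A caller that passes `Forever` gets the same replay followed by live delivery, which suits production state updates.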

Page 31: An Architecture for Agile Machine Learning in Real-Time Applications

[Diagram: events along a time axis update the online feature state, which provides the machine learning input. Events shown, in time order:]

• Alice updates profile
• Bob opens app
• Bob sees Alice in recommendations
• Bob swipes yes on Alice
• Alice receives push notification
• Alice sees Bob in recommendations
• Alice sends message to Bob
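
A minimal sketch of online feature state driven by a timeline like the one above. The event classes, the `UserState` fields, and the derived features are all hypothetical. Only some of the listed events are modeled, and "opens app" and "receives push notification" are left out for brevity.

```scala
object OnlineFeatures {
  // Hypothetical event types modeled on the example timeline.
  sealed trait Event { def time: Long }
  final case class ProfileUpdated(time: Long, user: String) extends Event
  final case class Recommended(time: Long, viewer: String, candidate: String) extends Event
  final case class VotedYes(time: Long, voter: String, candidate: String) extends Event
  final case class MessageSent(time: Long, from: String, to: String) extends Event

  // Hypothetical per-user feature state, kept small for illustration.
  final case class UserState(
    lastProfileUpdate: Option[Long] = None,
    timesShown: Int = 0,
    yesVotesReceived: Int = 0,
    messagesSent: Int = 0
  )

  type State = Map[String, UserState]

  private def modify(s: State, user: String)(f: UserState => UserState): State =
    s.updated(user, f(s.getOrElse(user, UserState())))

  // Apply one event to the state; events are assumed to arrive in time order.
  def update(s: State, e: Event): State = e match {
    case ProfileUpdated(t, u)      => modify(s, u)(_.copy(lastProfileUpdate = Some(t)))
    case Recommended(_, _, c)      => modify(s, c)(x => x.copy(timesShown = x.timesShown + 1))
    case VotedYes(_, _, c)         => modify(s, c)(x => x.copy(yesVotesReceived = x.yesVotesReceived + 1))
    case MessageSent(_, sender, _) => modify(s, sender)(x => x.copy(messagesSent = x.messagesSent + 1))
  }

  // Machine learning input for one user, read from the current state.
  def features(s: State, user: String): Map[String, Double] = {
    val u = s.getOrElse(user, UserState())
    Map(
      "timesShown" -> u.timesShown.toDouble,
      "yesRate" -> (if (u.timesShown == 0) 0.0 else u.yesVotesReceived.toDouble / u.timesShown),
      "profileUpdated" -> (if (u.lastProfileUpdate.isDefined) 1.0 else 0.0)
    )
  }
}
```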

Page 41: An Architecture for Agile Machine Learning in Real-Time Applications

[Architecture diagram spanning production and development: Application APIs & Business Logic, RDBMS, Event History Repository, Real-Time State Updates, Ranking, Monitoring, State Updates, Exploratory Analysis, Training & Backtesting.]

Page 42: An Architecture for Agile Machine Learning in Real-Time Applications

Event History API

• Single path to data for real-time streaming and history

• Shared feature engineering code for development and production

• Team shares access to code and data

• Fine-grained alignment of feature state and prediction outcomes

• Temporally accurate modeling ensured (no looking ahead)
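
A minimal sketch of how replaying time-ordered events can keep training data temporally accurate. The `Vote` event, the `Counts` state, and the single `candidateYesRate` feature are hypothetical. The point is the ordering: each training row is taken from the feature state before the outcome event is applied, so a row can never include its own label or anything that happened later.

```scala
object TrainingRows {
  // Hypothetical vote event: voter judges candidate at a given time.
  final case class Vote(time: Long, voter: String, candidate: String, yes: Boolean)

  // Hypothetical feature state: yes votes and total votes received per user.
  final case class Counts(yes: Int = 0, total: Int = 0) {
    def yesRate: Double = if (total == 0) 0.0 else yes.toDouble / total
  }

  // One training example: the feature as it stood just before the outcome.
  final case class Row(time: Long, candidateYesRate: Double, label: Boolean)

  // Replay votes in time order. Each row is built from the state BEFORE the
  // vote is applied, so no example sees its own outcome or any later one.
  def build(votes: Seq[Vote]): Vector[Row] = {
    val init = (Map.empty[String, Counts], Vector.empty[Row])
    val (_, rows) = votes.sortBy(_.time).foldLeft(init) {
      case ((state, acc), v) =>
        val before = state.getOrElse(v.candidate, Counts())
        val row = Row(v.time, before.yesRate, v.yes)
        val after = Counts(before.yes + (if (v.yes) 1 else 0), before.total + 1)
        (state.updated(v.candidate, after), acc :+ row)
    }
    rows
  }
}
```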

Page 44: An Architecture for Agile Machine Learning in Real-Time Applications

15 new models released and tested within 6 months
>30% cumulative improvement in usage shown in A/B testing

[Chart: Daily Unique Users (axis 0 to 2,000,000) from Apr 2013 to Apr 2014, with series for Matchers and Voters and markers for "New model released" and "A/B test updated".]

Page 45: An Architecture for Agile Machine Learning in Real-Time Applications

• Open source implementation derived from if(we)’s proprietary platform

• Provides Scala DSL for building online features from event history

• Examples include dating recommendations and product search with learning to rank

• Not yet ready for scale or production

• Seeking collaborators

Page 46: An Architecture for Agile Machine Learning in Real-Time Applications

Antelope Open Source Vision

[Layered architecture diagram, as far as the labels show:]

• Production Serving: Ranking
• Data Science: R, Matlab, Python
• Feature Engineering
• Event History API
• Streaming data: Kafka, Storm
• Historical data: S3, NFS, HDFS

Page 47: An Architecture for Agile Machine Learning in Real-Time Applications

Agile Machine Learning with Event History

• Solving deployment yields quick product cycles
    • All data saved and retrieved as time-ordered events
    • Single path to data for both historical and real-time access
    • Same feature engineering code used in development and production

• Agile success
    • Team shares access to code and data
    • Production product iterations measured in days rather than months

github.com/ifwe/antelope
@jssmith