An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith, if(we) Inc.
[email protected] · @jssmith · github.com/ifwe
KDD, Sydney, Australia · August 11, 2015
• Profitable startup actively pursuing big opportunities in social apps
• Millions of users on existing products
• Thousands of social contacts per second
Overview
• Agile machine learning can be difficult, but it brings big benefits
• Key challenges lie in deployment and feature engineering
• Solution: a single path to data
[Figure: the machine learning cycle. Production: serve personalized recommendations, data collection, model updates. Development: study & understand, design new models & features, train & backtest.]
[Figure: the same cycle when the step from "train & backtest" to "model updates" requires a handoff: write spec, e-mail model to engineers, request engineering, QA, bug fixes, meetings, wait, export to Excel, check parameters, Java development, new database schema, "why did we want this?"]
• Shared path to data
• Shared feature definition code
Recommendation Engine for Dating Product
• >10 million candidates
• >1,000 updates/sec
• Must be responsive to current activity
• Users expect instant query results

Model
• Decompose the likelihood of a match into vote occurrence and vote outcomes
• Logistic regression
• Real-time personalization through feature vector evolution
• Model parameters trained offline by data scientists
• Consider 1000s of features, select 50-100
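A minimal Scala sketch of how such a model might score a candidate. The decomposition is our reading and is not spelled out on the slide: the match likelihood is assumed to factor as P(vote occurs) × P(vote is yes | vote occurs), with each term modeled by logistic regression. The weights stand in for parameters trained offline, and only the feature vector changes as events arrive. `LogisticModel`, `matchLikelihood`, and any feature names are hypothetical.

```scala
import scala.math.exp

// Hypothetical logistic regression model. The weights are assumed to be
// trained offline and held fixed while serving.
final case class LogisticModel(bias: Double, weights: Map[String, Double]) {
  // Standard logistic function 1 / (1 + e^(-z)).
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + exp(-z))

  // Score a sparse feature vector. Features missing from `weights` contribute 0.
  def predict(features: Map[String, Double]): Double = {
    val z = features.foldLeft(bias) { case (acc, (name, value)) =>
      acc + weights.getOrElse(name, 0.0) * value
    }
    sigmoid(z)
  }
}

// Assumed decomposition: P(match) = P(vote occurs) * P(vote is yes | vote occurs).
def matchLikelihood(occurrence: LogisticModel, outcome: LogisticModel,
                    features: Map[String, Double]): Double =
  occurrence.predict(features) * outcome.predict(features)
```

Because the weights stay fixed between offline training runs, real-time personalization in this sketch comes entirely from updating `features` as the user's activity changes.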
[Figure: conventional architecture, built up step by step. Components: Application APIs & Business Logic, RDBMS, Streaming Logs, Data Warehouse / Hadoop, Batch Predictions, and Predictive Services / Ranking, with Exploratory Analysis and Training & Backtesting on the development side.]
[Figure: events along a time axis are aggregated with first(), last(), count(), sum(), max(), min(), and avg() to produce machine learning input.]
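The aggregation step can be sketched in Scala as summary statistics computed over a time window. This is a minimal illustration under stated assumptions: each event carries a numeric value and a `Long` timestamp, events arrive sorted by time, and the window is half-open [from, until). `TimedValue`, `WindowStats`, and `aggregate` are hypothetical names, not part of any if(we) or Antelope API.

```scala
// Hypothetical timestamped event carrying one numeric value.
final case class TimedValue(time: Long, value: Double)

// Aggregates over a non-empty window of events.
final case class WindowStats(first: Double, last: Double, count: Int,
                             sum: Double, max: Double, min: Double) {
  def avg: Double = sum / count
}

// Returns None when no events fall in [from, until).
def aggregate(events: Seq[TimedValue], from: Long, until: Long): Option[WindowStats] = {
  // Events are assumed to be sorted by time, so first/last follow event order.
  val window = events.filter(e => e.time >= from && e.time < until).map(_.value)
  if (window.isEmpty) None
  else Some(WindowStats(window.head, window.last, window.size,
                        window.sum, window.max, window.min))
}
```

Returning `Option` keeps `avg` well defined: it is only available for windows that contain at least one event.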
Event History API

    import java.util.Date

    trait EventHistory {
      // Append an event to the time-ordered history
      def publishEvent(e: Event): Unit

      // Deliver events between startTime and endTime that pass eventFilter to eventHandler
      def getEvents(startTime: Date, endTime: Date,
                    eventFilter: EventFilter, eventHandler: EventHandler): Unit
    }

endTime = +∞ for real-time streaming
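To show how one interface can serve both history and streaming, here is a minimal in-memory sketch of the trait. The following are assumptions and are not taken from the slides:
- `Event`, `EventFilter`, and `EventHandler` are hypothetical definitions.
- The time window is half-open.
- Events are published in time order.
- +∞ is represented by `new Date(Long.MaxValue)`.

A real repository would also need persistence and more careful concurrency handling.

```scala
import java.util.Date
import scala.collection.mutable.ArrayBuffer

// Hypothetical event and callback types; the slides do not define them.
final case class Event(time: Date, userId: Long, kind: String)
trait EventFilter { def accept(e: Event): Boolean }
trait EventHandler { def handle(e: Event): Unit }

trait EventHistory {
  def publishEvent(e: Event): Unit
  def getEvents(startTime: Date, endTime: Date,
                eventFilter: EventFilter, eventHandler: EventHandler): Unit
}

// Assumed stand-in for +∞: the latest representable Date.
val Forever = new Date(Long.MaxValue)

// In-memory sketch; events are assumed to be published in time order.
class InMemoryEventHistory extends EventHistory {
  private val log = ArrayBuffer.empty[Event]
  // Subscribers whose window is open-ended (endTime == Forever).
  private val live = ArrayBuffer.empty[(EventFilter, EventHandler)]

  def publishEvent(e: Event): Unit = synchronized {
    log += e
    // Push the new event to every open-ended subscriber whose filter accepts it.
    for ((f, h) <- live if f.accept(e)) h.handle(e)
  }

  def getEvents(startTime: Date, endTime: Date,
                eventFilter: EventFilter, eventHandler: EventHandler): Unit = synchronized {
    // Replay stored events in the half-open window [startTime, endTime).
    for (e <- log if !e.time.before(startTime) && e.time.before(endTime) && eventFilter.accept(e))
      eventHandler.handle(e)
    // An unbounded window keeps receiving events published from now on.
    if (endTime == Forever) live += ((eventFilter, eventHandler))
  }
}
```

With `endTime = Forever`, a caller first receives the stored history and then each newly published event through the same handler. Feature code written against the handler therefore does not need to distinguish replay from live traffic.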
[Figure: a time-ordered stream of events, the online feature state derived from it, and the resulting machine learning input.]
Events, in time order:
• Alice updates profile
• Bob opens app
• Bob sees Alice in recommendations
• Bob swipes yes on Alice
• Alice receives push notification
• Alice sees Bob in recommendations
• Alice sends message to Bob
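A minimal Scala sketch of online feature state driven by an event sequence like the one above. The event kinds, the three features, and the names `UserEvent`, `Features`, and `FeatureState` are all hypothetical. The sketch only illustrates that each event incrementally updates per-user state, and that the same update code could consume replayed history or a live stream.

```scala
import scala.collection.mutable

// Hypothetical event shape for the example sequence.
final case class UserEvent(time: Long, actor: String, kind: String,
                           target: Option[String] = None)

// Hypothetical per-user online features, maintained incrementally.
final case class Features(appOpens: Int = 0, yesVotesReceived: Int = 0,
                          messagesSent: Int = 0)

class FeatureState {
  private val byUser =
    mutable.Map.empty[String, Features].withDefaultValue(Features())

  def apply(user: String): Features = byUser(user)

  // Fold one event into the state; the same code can consume replayed
  // history (development) or the live stream (production).
  def onEvent(e: UserEvent): Unit = e.kind match {
    case "open_app" =>
      val a = byUser(e.actor)
      byUser(e.actor) = a.copy(appOpens = a.appOpens + 1)
    case "send_message" =>
      val a = byUser(e.actor)
      byUser(e.actor) = a.copy(messagesSent = a.messagesSent + 1)
    case "swipe_yes" =>
      // A yes vote changes the state of the user who received it.
      e.target.foreach { t =>
        val f = byUser(t)
        byUser(t) = f.copy(yesVotesReceived = f.yesVotesReceived + 1)
      }
    case _ => () // other kinds do not affect these three features
  }
}
```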
[Figure: event-history architecture, built up step by step. Application APIs & Business Logic and the RDBMS sit alongside an Event History Repository. Production: Real-Time State Updates and Ranking. Development: State Updates, Exploratory Analysis, and Training & Backtesting. One build also shows Monitoring.]
Event History API
• Single path to data for real-time streaming and history
• Shared feature engineering code for development and production
• Team shares access to code and data
• Fine-grained alignment of feature state and prediction outcomes
• Temporally accurate modeling ensured (no looking ahead)
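The last two points can be made concrete by generating training examples from a replay of the event history. Events are walked in time order, and each impression's feature snapshot is taken at impression time, so votes that arrive afterwards cannot leak into it. This is a hypothetical illustration: `Ev`, `Example`, `buildExamples`, the event kinds, and the single `candidateYesVotes` feature are invented here. Treating impressions with no later yes vote as negatives is a simplifying assumption.

```scala
import scala.collection.mutable

// Hypothetical event: `actor` acts on an optional `target` at time `time`.
final case class Ev(time: Long, actor: String, kind: String, target: Option[String])

// One training example: features as of the impression, plus its later outcome.
final case class Example(viewer: String, candidate: String,
                         candidateYesVotes: Int, label: Boolean)

def buildExamples(events: Seq[Ev]): Seq[Example] = {
  val yesVotes = mutable.Map.empty[String, Int].withDefaultValue(0) // online feature state
  // Open impressions keyed by (viewer, candidate), holding the feature snapshot.
  val pending = mutable.LinkedHashMap.empty[(String, String), Int]
  val labeled = mutable.ArrayBuffer.empty[Example]

  for (e <- events.sortBy(_.time); t <- e.target) {
    e.kind match {
      case "impression" =>
        // Snapshot the state now, before any later event can change it.
        pending((e.actor, t)) = yesVotes(t)
      case "swipe_yes" =>
        // Label the open impression (if any) with the pre-vote snapshot.
        pending.remove((e.actor, t)).foreach(f => labeled += Example(e.actor, t, f, label = true))
        yesVotes(t) += 1 // update the state only after the outcome is recorded
      case _ => ()
    }
  }
  // Impressions never followed by a yes vote become negative examples.
  labeled ++= pending.map { case ((v, c), f) => Example(v, c, f, label = false) }
  labeled.toList
}
```

In this sketch, a yes vote that Alice receives between Bob's impression and Bob's own vote is excluded from Bob's example. This is exactly the kind of look-ahead that computing features from current database state would introduce.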
• 15 new models released and tested within 6 months
• >30% cumulative improvement in usage shown in A/B testing
[Figure: daily unique users (0 to 2,000,000), Apr 2013 to Apr 2014, plotted for Matchers and Voters, annotated where a new model was released and the A/B test updated.]
Antelope Open Source Vision
• Open source implementation derived from if(we)'s proprietary platform
• Provides Scala DSL for building online features from event history
• Examples include dating recommendations and product search with learning to rank
• Not yet ready for scale or production
• Seeking collaborators
[Figure: Antelope stack. Production serving (Ranking) and data science (R, Matlab, Python) share Feature Engineering, built on the Event History API, which draws on streaming data (Kafka, Storm) and historical data (S3, NFS, HDFS).]
Agile Machine Learning with Event History
• Solving deployment yields quick product cycles
    • All data saved and retrieved as time-ordered events
    • Single path to data for both historical and real-time access
    • Same feature engineering code used in development and production
• Agile success
    • Team shares access to code and data
    • Production product iterations measured in days rather than months
github.com/ifwe/antelope · @jssmith