building intelligent data products
who am i? what does Ravelin do?
building intelligent data products
things to think about when building them
stephen whitworth
2 years at Hailo as a data scientist/jack of some trades, straight out of university
product and marketplace analytics, agent-based modelling, data engineering, stream processing services
data science/engineering at ravelin, specifically focused on our detection capabilities
what is ravelin?
online fraud detection and prevention platform
stream data to us
we give fraud probability instantly + beautiful data visualisation to understand your customers
backed by techstars/passion/playfair/amadeus/indeed.com founder/wonga founder amongst other great investors
fraud?
$14B lost in card-not-present fraud in 2014
a dollar for every year the universe has existed
same-day delivery, on-demand services
‘victimless crime’
police ill-equipped to handle
low barrier to entry from dark net
3D Secure - conversion killer
traditional: human generated rules, born of deep expertise
order-centric view of the world
hybrid: augment expertise by learning rules from data
cards don’t commit fraud, people do
stop the customer before they even get to ordering
‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’
measure and optimise for the right thing(s) in your data product
account for the fact that your customers are at different stages from one another, and optimise for different things
precision: of all of my predictions, what % was I correct?
recall: out of all of the fraudsters, what % did I catch?
implicit tradeoff between conversion and fraud loss
‘accuracy’ a useless metric for fraud
99.9% ACCURATE
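The accuracy trap is easy to show with numbers. A minimal sketch, using hypothetical counts (1,000 orders, a 1% fraud rate) rather than any real Ravelin data: a model that flags nothing is 99% accurate yet catches zero fraudsters, which is why precision and recall are the metrics that matter.

```python
# Sketch: why 'accuracy' is a useless metric for fraud detection.
# Hypothetical numbers: 1,000 orders, 10 of them fraudulent (1% fraud rate).

def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from a confusion matrix."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0   # of my predictions, % correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # of all fraudsters, % caught
    return accuracy, precision, recall

# "Always genuine" model: never flags anything.
acc, prec, rec = metrics(tp=0, fp=0, fn=10, tn=990)
print(acc, prec, rec)   # 99% accurate, 0% recall - catches nobody

# A useful model: flags 12 orders, 8 of which really are fraud.
acc, prec, rec = metrics(tp=8, fp=4, fn=2, tn=986)
print(acc, prec, rec)   # lower-sounding precision, but 80% of fraudsters caught
```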
use tools that make you disproportionately productive
shameless fans of BigQuery
our analysis stack: BigQuery, JupyterHub, pandas, scikit-learn
Google's internal network is super fast, so it's wise to co-locate compute with your data
enable fast iteration by keeping model interfaces simple
hide arbitrarily complex transformations behind them
expose them over REST or a queue
version control them, roll backwards/forwards/sideways
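The shape of that interface can be sketched in a few lines. This is a toy illustration, not Ravelin's implementation: the registry, version names and scoring rules are all invented. The point is that callers only ever see `score(customer)`, so models can be rolled backwards, forwards or sideways by swapping the registered version.

```python
# Sketch of a simple, versioned model interface (hypothetical names).
MODELS = {}            # version -> scoring callable
ACTIVE_VERSION = None

def register(version, fn):
    """Register a model version and make it the active one."""
    global ACTIVE_VERSION
    MODELS[version] = fn
    ACTIVE_VERSION = version

def score(customer, version=None):
    """Return a fraud probability for a customer dict.
    All feature transformation stays hidden behind this call."""
    model = MODELS[version or ACTIVE_VERSION]
    return model(customer)

# v1: a toy rule hidden behind the interface.
register("v1", lambda c: 0.9 if c.get("chargebacks", 0) > 0 else 0.1)
# v2: a different (arbitrarily more complex) transformation, same interface.
register("v2", lambda c: min(1.0, 0.05 + 0.3 * c.get("chargebacks", 0)))

print(score({"chargebacks": 2}))        # active version (v2)
print(score({"chargebacks": 2}, "v1"))  # roll back to v1 on demand
```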
q: do you always trade performance for explainability? a: no
if someone’s neck is on the line for your decision, allow them to understand how you came to it
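One simple way to make a decision explainable is to return reason codes alongside the score. The features and weights below are invented for illustration; the idea is just that the person whose neck is on the line sees *which* signals fired, strongest first.

```python
# Sketch: surfacing reasons alongside a fraud score so an analyst can
# see how a decision was reached (hypothetical features and weights).
WEIGHTS = {
    "new_account": 0.25,
    "mismatched_billing_country": 0.35,
    "high_order_velocity": 0.30,
}

def score_with_reasons(features):
    """Return (score, reasons): the triggered signals, strongest first."""
    hits = [(name, w) for name, w in WEIGHTS.items() if features.get(name)]
    hits.sort(key=lambda kv: kv[1], reverse=True)
    score = min(1.0, sum(w for _, w in hits))
    return score, [name for name, _ in hits]

s, reasons = score_with_reasons(
    {"new_account": True, "high_order_velocity": True})
print(s, reasons)   # 0.55 ['high_order_velocity', 'new_account']
```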
RANDOM FORESTS
MONITORING
always be monitoring, probing for edge cases
dogfood - use robot customers
run strategies in ‘dark mode’ to determine performance
many ways things could break - be paranoid
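'Dark mode' can be sketched as: score every order with both the live and the candidate strategy, act only on the live score, and log disagreements for later review. Everything here (the two toy models, the threshold) is an invented stand-in for real strategies.

```python
# Sketch of running a candidate strategy in 'dark mode' (hypothetical models).
def live_model(order):
    return 0.8 if order.get("chargebacks", 0) > 0 else 0.1

def candidate_model(order):
    return 0.8 if order.get("amount", 0) > 500 else 0.1

disagreements = []   # logged for offline review, never acted on

def handle(order, threshold=0.5):
    live = live_model(order)
    dark = candidate_model(order)            # scored, but never acted on
    if (live > threshold) != (dark > threshold):
        disagreements.append((order, live, dark))
    return live > threshold                  # only the live model decides

handle({"amount": 900})                      # candidate would block; live wouldn't
handle({"chargebacks": 1, "amount": 20})     # live blocks; candidate wouldn't
print(len(disagreements))                    # 2 cases to investigate
```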
‘machine learning: the high interest credit card of technical debt’ - Google
in beta and signing up clients
looking for on-demand services/marketplaces, payment service providers that are facing fraud problems
talk to me afterwards
obligatory: we are hiring!
junior machine learning engineers/data scientists
[email protected] or talk to me after
[email protected] - @sjwhitworth
www.ravelin.com - @ravelinhq