Building Intelligent Data Products (Applied AI)


building intelligent data products

who am i? what does Ravelin do?

things to think about when building them

stephen whitworth

2 years at Hailo as a data scientist/jack of some trades, straight out of university

product and marketplace analytics, agent-based modelling, data engineering, stream processing services

data science/engineering at ravelin, specifically focused on our detection capabilities

what is ravelin?

online fraud detection and prevention platform

stream data to us

we give fraud probability instantly + beautiful data visualisation to understand your customers

backed by techstars/passion/playfair/amadeus/indeed.com founder/wonga founder amongst other great investors

fraud?

$14B lost in card-not-present fraud in 2014

a dollar for every year the universe has existed

same-day delivery

on-demand services

‘victimless crime’

police ill-equipped to handle

low barrier to entry from dark net

3D Secure - a conversion killer

traditional: human generated rules, born of deep expertise

order-centric view of the world

hybrid: augment expertise by learning rules from data

cards don’t commit fraud, people do

stop the customer before they even get to ordering
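
as a sketch of the 'learning rules from data' idea above: a shallow decision tree, trained on labelled orders, spits out human-readable rules a fraud expert can review. the data and feature names below are made up for illustration:

```python
# a shallow tree learns legible if/else rules from labelled orders
# (synthetic data; feature names are hypothetical)
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)
X = rng.rand(5_000, 2)                                # order_value, account_age
y = ((X[:, 0] > 0.9) & (X[:, 1] < 0.1)).astype(int)   # the pattern to recover

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# prints a nested rule listing an analyst can sanity-check
# against their own expertise
print(export_text(tree, feature_names=["order_value", "account_age"]))
```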

‘a random forest is like a room full of experts who have seen different cases of fraud from different perspectives’
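
the quote maps onto scikit-learn almost literally. a minimal sketch (synthetic data, not Ravelin's actual model): each tree trains on its own bootstrap sample and considers a random subset of features at each split, so every 'expert' has seen different cases from different perspectives:

```python
# a minimal sketch, not Ravelin's model (synthetic, imbalanced data)
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(10_000, 5)                      # hypothetical customer features
y = (rng.rand(10_000) < 0.01).astype(int)    # ~1% fraud, heavily imbalanced

forest = RandomForestClassifier(
    n_estimators=200,         # 200 experts in the room
    max_features="sqrt",      # each split sees a random subset of features
    class_weight="balanced",  # compensate for the rare fraud class
    random_state=0,
).fit(X, y)                   # each tree also gets its own bootstrap sample

# the forest's fraud probability is the experts' averaged vote
fraud_probability = forest.predict_proba(X[:1])[0, 1]
```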

measure and optimise for the right thing(s) in your data product

account for the fact that your customers are at different stages to one another, and optimise for different things

precision: of everything I flagged as fraud, what % actually was?

recall: out of all of the fraudsters, what % did I catch?

implicit tradeoff between conversion and fraud loss

‘accuracy’ is a useless metric for fraud

99.9% ACCURATE (the score for never flagging anything, when ~1 order in 1,000 is fraud)
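
a tiny illustration of why, with made-up numbers: when only 1 order in 1,000 is fraud, a model that never flags anything scores 99.9% accuracy while catching nobody:

```python
# illustrative numbers only: 1 fraudster in 1,000 orders
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [0] * 999 + [1]
y_pred = [0] * 1000            # a 'model' that never flags anything

print(accuracy_score(y_true, y_pred))                    # 0.999 - looks great
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred))                      # 0.0 - caught nobody
```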

use tools that make you disproportionately productive

shameless fans of BigQuery

our analysis stack: BigQuery, JupyterHub, pandas, scikit-learn

the internal Google network is super fast, so it's wise to co-locate your compute with your data
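
for flavour, a sketch of that stack in action - the project and table names here are hypothetical:

```python
# hypothetical project/table; pulls BigQuery results straight into pandas,
# ready for scikit-learn - run it near the data (e.g. JupyterHub on GCP)
import pandas as pd

query = """
SELECT customer_id, COUNT(*) AS orders, SUM(is_fraud) AS fraud_orders
FROM my_dataset.orders
GROUP BY customer_id
"""
df = pd.read_gbq(query, project_id="my-gcp-project")
```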

enable fast iteration by keeping model interfaces simple

hide arbitrarily complex transformations behind it

expose it over REST or a queue

version control them, roll backwards/forwards/sideways
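
a minimal sketch of such an interface, using Flask (our choice for illustration, not necessarily Ravelin's): callers see 'event in, fraud probability out' plus a model version, and every transformation stays hidden behind it:

```python
# a minimal sketch using Flask; 'fraud_model.pkl' is a hypothetical
# pre-trained scikit-learn classifier loaded at startup
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("fraud_model.pkl")
MODEL_VERSION = "2015-11-01"   # versioned, so you can roll back/forward

def extract_features(event):
    # arbitrarily complex transformations live here, invisible to callers
    return [[event.get("order_value", 0.0), event.get("account_age_days", 0.0)]]

@app.route("/score", methods=["POST"])
def score():
    features = extract_features(request.get_json())
    probability = float(model.predict_proba(features)[0, 1])
    return jsonify({"fraud_probability": probability, "model": MODEL_VERSION})
```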

q: do you always trade performance for explainability? a: no

if someone’s neck is on the line for your decision, allow them to understand how you came to it

RANDOM FORESTS
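
one concrete way a forest stays explainable (a sketch with hypothetical feature names and synthetic data): feature importances show whoever's neck is on the line which signals drove the decisions:

```python
# self-contained sketch: rank features by how much they reduce
# impurity across the forest's trees
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["order_value", "account_age_days", "orders_last_hour",
                 "distinct_cards", "billing_shipping_distance"]

rng = np.random.RandomState(0)
X = rng.rand(10_000, len(feature_names))
# synthetic label that depends mostly on order_value, a little on orders_last_hour
y = ((0.8 * X[:, 0] + 0.2 * X[:, 2]) > 0.9).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for name, score in sorted(zip(feature_names, forest.feature_importances_),
                          key=lambda pair: -pair[1]):
    print("{:28s} {:.3f}".format(name, score))
```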

MONITORING

always be monitoring, probing for edge cases

dogfood - use robot customers

run strategies in ‘dark mode’ to determine performance

many ways things could break - be paranoid
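
a sketch of the 'dark mode' idea above - the model names and logging setup are assumptions for illustration: the candidate strategy scores every event and is logged, but only the live strategy acts:

```python
# model names and logging setup are assumptions for illustration
import logging

log = logging.getLogger("dark_mode")

def handle_event(event, features, live_model, dark_model):
    live_p = live_model.predict_proba(features)[0, 1]
    dark_p = dark_model.predict_proba(features)[0, 1]

    # record both scores so the candidate can be evaluated offline
    # before it is ever allowed to block a real customer
    log.info("event=%s live=%.3f dark=%.3f", event["id"], live_p, dark_p)

    return live_p > 0.9    # only the live model's decision takes effect
```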

‘machine learning: the high interest credit card of technical debt’ - Google

in beta and signing up clients

looking for on-demand services/marketplaces and payment service providers that are facing fraud problems

talk to me afterwards

obligatory: we are hiring!

junior machine learning engineers/data scientists

stephen.whitworth@ravelin.com or talk to me after

stephen.whitworth@ravelin.com - @sjwhitworth

www.ravelin.com - @ravelinhq
