THE OTHER 99% OF A DATA SCIENCE PROJECT
Open Data Science Conference
Santa Clara | November 4-6th, 2016
Eugene Mandel
@eugmandel
ABOUT ME
∎ @eugmandel
∎ lead of data science at Directly
∎ formerly:
□ data science team at Jawbone
□ co-founder Qualaroo, Jaxtr
DATA SCIENCE NEEDS PRODUCT MANAGEMENT
The success of a data science project has as much to do with product management as with data science.
2 KINDS OF DATA SCIENCE
A: ANALYZE
B: BUILD
PAY FOR PARKING WITH YOUR PHONE
DON’T YOU KNOW ME?!
SMART PRODUCTS
∎ “don’t you know me?!” -> “you get me!”
∎ get smarter with every interaction
∎ reduce search space
SMART PRODUCTS
BUT NOT THAT SMART...
SMART PRODUCTS GO PROBABILISTIC
THE OTHER 99%
algorithms
PARKING APP
ON DEMAND CUSTOMER SUPPORT
LOOKING FOR OPPORTUNITIES
PROBLEM: choose support tickets that expert users can resolve
LOOKING FOR OPPORTUNITIES
CHOOSE RESOLVABLE TICKETS WITH MACHINE LEARNING
GETTING THE DATA
GETTING ALLIES
GETTING THE DATA
CLEAN YOUR DATA
∎ Automated bug reports
∎ Surveys
∎ Bounced emails
∎ Internal tickets
∎ Email metadata
∎ Email threads
∎ ...
GUYS CLEAN A DATASET, GET RICH
FEATURE ENGINEERING
TRAINING - COLD START PROBLEM
all tickets
tickets seen by expert
TRAINING - GET LABELS
“Is there a cat in this picture?” “Is this support ticket resolvable?”
TRAINING - GET LABELS
∎ label manually
∎ derive labels from user behavior
∎ derive labels from external sources
∎ mix
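A minimal sketch of the "mix" option, assuming each ticket is a dict. The field names (`manual_label`, `expert_attempted`, `resolved`, `rating`) are hypothetical, invented for illustration, and not taken from any real system. A manual label wins when present; otherwise the label is derived from what happened when an expert tried the ticket.

```python
def derive_label(ticket):
    """Return True/False, or None when no label can be derived.

    All field names are hypothetical and only illustrate the idea.
    """
    # 1. A manual label, when someone provided one, takes precedence.
    manual = ticket.get("manual_label")
    if manual is not None:
        return manual
    # 2. Otherwise derive the label from observed behavior: an expert
    #    tried the ticket, resolved it, and did not get a negative rating.
    if ticket.get("expert_attempted"):
        return bool(ticket.get("resolved")) and ticket.get("rating") != "negative"
    # 3. Nobody has seen this ticket yet, so it stays unlabeled.
    return None


tickets = [
    {"id": 1, "manual_label": False},
    {"id": 2, "expert_attempted": True, "resolved": True, "rating": "positive"},
    {"id": 3, "expert_attempted": True, "resolved": True, "rating": "negative"},
    {"id": 4},
]
labels = [derive_label(t) for t in tickets]
```

Tickets that come back as `None` are the cold start case: never seen by an expert, so there is no behavior to learn from.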
My favorite data science algorithm is division.
Monica Rogati
Former VP of Data, Jawbone & LinkedIn data scientist
MODEL
∎ Tokenization
∎ Bag of words (BOW)
∎ Tf–idf
∎ Random Forest Classifier
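To make the first three steps of the MODEL slide concrete, here is a rough standard-library sketch of tokenization, bag of words and tf-idf. It is not the talk's actual pipeline: the tokenizer regex, the tf-idf variant (tf = count / document length, idf = ln(N / df)) and the sample tickets are all assumptions. The random forest stage is left out; in practice a library classifier would be trained on these weights.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)


def tfidf(documents):
    """Return one {token: weight} dict per document.

    Uses tf = count / document length and idf = ln(N / df), one common
    textbook variant; libraries usually add smoothing on top.
    """
    bags = [bag_of_words(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: in how many documents does each token appear?
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            tok: (count / total) * math.log(n_docs / doc_freq[tok])
            for tok, count in bag.items()
        })
    return weights


# Hypothetical ticket texts, invented for illustration.
tickets = [
    "How do I reset my password?",
    "App crashes when I reset my phone",
    "Refund for a double charge",
]
vectors = tfidf(tickets)
```

Words that appear in only one ticket ("password", "refund") get the highest weights, while words shared across tickets ("reset", "my") are discounted.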
DEVELOPMENT
PLAYING WELL WITH ENGINEERING
∎ gaining trust
∎ development process
POINTS OF INTEGRATION
online or offline?
DEVELOPMENT
integration - broad APIs
“NAPKIN ARCHITECTURE”
IS IT WORKING?
evaluating data products
Image source: https://themouseandthewindmill.wordpress.com
EVALUATION METRICS
∎ accuracy
∎ precision/recall
∎ driven by business
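For reference, accuracy, precision and recall can all be computed from the four confusion-matrix counts. The sketch below uses made-up labels (1 = resolvable, 0 = not). Which metric to optimize is the business decision; the code only shows the arithmetic.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, precision and recall; 0.0 when a ratio is undefined."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up example: 3 resolvable tickets out of 8.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = evaluate(y_true, y_pred)
```

Here precision is the share of tickets sent to experts that were really resolvable, and recall is the share of resolvable tickets that were actually sent.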
IS IT WORKING?
QA’ing data products
PLAYING WELL WITH DEVOPS
BRIDGING TECHSTACKS
IN PRODUCTION
THE KNOBS: HOW TO CONTROL THE PRODUCT
∎ on/off switch per customer
∎ prediction threshold
∎ exclusions
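One way the three knobs could fit together, as a sketch only. `CustomerConfig` and `should_route_to_expert` are hypothetical names, and treating exclusions as a keyword blocklist is an assumption; the slide does not say what is excluded.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerConfig:
    """Per-customer knobs (hypothetical structure, for illustration)."""
    enabled: bool = False                 # on/off switch per customer
    threshold: float = 0.5                # prediction threshold
    excluded_keywords: set = field(default_factory=set)  # exclusions


def should_route_to_expert(score, ticket_text, config):
    """Apply the knobs in order: switch, exclusions, then threshold."""
    if not config.enabled:
        return False
    text = ticket_text.lower()
    # Assumption: an exclusion is a keyword that must never be routed.
    if any(keyword in text for keyword in config.excluded_keywords):
        return False
    return score >= config.threshold


config = CustomerConfig(enabled=True, threshold=0.7,
                        excluded_keywords={"refund"})
decisions = [
    should_route_to_expert(0.9, "Cannot log in", config),
    should_route_to_expert(0.6, "Cannot log in", config),
    should_route_to_expert(0.9, "Refund please", config),
]
```

Keeping the knobs outside the model means the product can be tuned or switched off for one customer without retraining anything.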
NAMING THINGS
“... SMART ...”
“... AI ...”
“... MACHINE LEARNING ...”
“... INTELLIGENT ...”
UPDATING THE MODEL
∎ input data changes
∎ user behavior changes
∎ dataset grows
NEGATIVE SAMPLING
send a small % of predicted negatives as if they were positive
predicted positive
NEGATIVE LABELING
send a small % of predicted negatives for manual labeling
predicted positive
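Both ideas come down to the same routing rule: predicted positives go to experts, and a small random share of predicted negatives is diverted so the model keeps receiving labels for tickets it would otherwise never see. A minimal sketch, assuming a score in [0, 1]; the function name, the 5% rate and the "explore" label are illustrative choices.

```python
import random


def route_ticket(score, threshold=0.5, explore_rate=0.05, rng=random):
    """Return 'expert', 'explore' or 'skip' for one scored ticket.

    'explore' marks a predicted negative picked for exploration: it is
    either sent to experts as if positive (negative sampling) or sent to
    a human for labeling (negative labeling).
    """
    if score >= threshold:
        return "expert"
    if rng.random() < explore_rate:
        return "explore"
    return "skip"


# Simulate 10,000 predicted negatives with a seeded generator.
rng = random.Random(0)
routes = [route_ticket(0.1, rng=rng) for _ in range(10_000)]
explore_share = routes.count("explore") / len(routes)
```

Without this exploration, the training data only ever contains tickets the current model already liked, so mistakes on the negative side are never corrected.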
LABELING - HOW TO PHRASE THE QUESTION?
∎ “Would you be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully without getting a negative rating?”
MESSAGING
∎ customers
∎ sales
∎ account managers
∎ marketing
∎ execs
CUSTOMER ENGAGEMENT PLAYBOOK
DATA ETHICS
INTERPRETABILITY
Image source: https://en.wikipedia.org/wiki/File:Blue_Poles_(Jackson_Pollock_painting).jpg
THANKS!
Eugene Mandel
@eugmandel
CREDITS
∎ Presentation template by SlidesCarnival
∎ Images:
□ http://jedismedicine.blogspot.com/
□ Jawbone
□ Directly
□ Wikipedia
□ https://themouseandthewindmill.wordpress.com
□ http://www.imdb.com/