When Recommendation
Systems Go Bad
Evan Estola, 3/31/17
About Me
● Evan Estola
● Staff Machine Learning Engineer, Data Team Lead @ Meetup
● @estola
Meetup
● Do more
● 270,000 Meetup Groups
● 30 Million Members
● 180 Countries
Why Recs at Meetup are Hard
● Cold Start
● Sparsity
● Lies
Recommendation Systems: Collaborative Filtering
Recommendation Systems: Rating Prediction
● Netflix prize
● How many stars would user X give movie Y
● Ineffective!
Recommendation Systems: Learning To Rank
● Treat recommendations as a supervised ranking problem
● Easy mode:
○ Positive samples - joined a Meetup
○ Negative samples - didn’t join a Meetup
○ Logistic Regression, use output/confidence for ranking (see the sketch below)
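A minimal sketch of this "easy mode" setup, assuming scikit-learn; the feature names and numbers are hypothetical, not Meetup's real features:

# Minimal learning-to-rank sketch. Each row is one (member, Meetup group)
# pair; feature names and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Label 1 = the member joined the group, 0 = shown but did not join.
X_train = np.array([
    [0.9, 2.1, 1.0],   # e.g. [interest overlap, distance (km), friends in group]
    [0.1, 8.0, 0.0],
    [0.7, 1.5, 2.0],
    [0.2, 9.5, 0.0],
])
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression()
model.fit(X_train, y_train)

# Rank candidate groups for a member by the model's confidence
# (predicted probability of joining), highest first.
candidates = np.array([
    [0.8, 3.0, 1.0],
    [0.3, 6.0, 0.0],
    [0.6, 2.0, 3.0],
])
scores = model.predict_proba(candidates)[:, 1]
print(np.argsort(-scores))   # candidate indices, best first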
You just wanted a kitchen scale, now Amazon thinks you’re a drug dealer
● “Black-sounding” names 25% more likely to be served an ad suggesting a criminal record
● Fake profiles, track ads
● Career coaching for “200k+” executive jobs ad
● Male group: 1852 impressions
● Female group: 318 impressions
● Twitter bot
● “Garbage in, garbage out”
● Responsibility?
“In the span of 15 hours Tay referred to feminism as a "cult" and a "cancer," as well as noting "gender equality = feminism" and "i love feminism now." Tweeting "Bruce Jenner" at the bot got similar mixed response, ranging from "caitlyn jenner is a hero & is a stunning, beautiful woman!" to the transphobic "caitlyn jenner isn't a real woman yet she won woman of the year?"”
Tay.ai
Know your data
● Outliers can matter
● The real world is messy
● Some people will mess with you
● Not everyone looks like you
○ Airbags (designed around male crash-test dummies, putting women at greater risk)
● More important than ever with more impactful applications
○ Example: Medical data
Keep it simple
● Interpretable models
● Feature interactions
○ Using features against someone in unintended ways
○ Work experience is good up until a point?
○ Consequences of location?
○ Combining gender and interests?
● When you must get fancy, combine grokable models
Ensemble Model, Data Segregation
[Diagram] Model 1 is trained on Interests, Searches, Friends, and Location. Model 2 is trained on Gender, Friends, and Location. The final model takes only Model 1's prediction and Model 2's prediction as inputs and produces the final prediction.
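A sketch of the segregated ensemble in the diagram, assuming scikit-learn and fabricated data; the point is the structure, not the numbers. The sensitive feature (gender) lives only in Model 2, and the final model sees nothing but the two sub-model predictions, so gender can never interact with interests or searches inside a single model:

# Data-segregation ensemble sketch (hypothetical features and labels).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
interests = rng.random((n, 2))        # interest / search features
friends = rng.random((n, 1))
location = rng.random((n, 1))
gender = rng.integers(0, 2, (n, 1)).astype(float)  # sensitive feature
y = rng.integers(0, 2, n)             # joined / didn't join (fake labels)

# Model 1: interests, searches, friends, location.
X1 = np.hstack([interests, friends, location])
model1 = LogisticRegression().fit(X1, y)

# Model 2: gender, friends, location -- sensitive data segregated here.
X2 = np.hstack([gender, friends, location])
model2 = LogisticRegression().fit(X2, y)

# Final model: sees only the two sub-model predictions.
stacked = np.column_stack([
    model1.predict_proba(X1)[:, 1],
    model2.predict_proba(X2)[:, 1],
])
final = LogisticRegression().fit(stacked, y)

In a real system the sub-model predictions fed to the final model would come from held-out folds to avoid leaking training labels into the combiner; this sketch only shows the segregation structure.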
Diversity Controlled Testing
● CMU - AdFisher
○ Crawls ads with simulated user profiles
● Same technique can work to find bias in your own models!
○ Generate Test Data
■ Randomize sensitive feature in real data set
○ Run Model
■ Evaluate for unacceptable biased treatment
● Florian Tramèr
○ FairTest
https://research.google.com/bigpicture/attacking-discrimination-in-ml/
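One hedged sketch of the "randomize the sensitive feature" test from the list above, assuming a fitted scikit-learn-style classifier; all names are hypothetical. Because the sensitive column is randomly permuted, the other features are independent of group membership, so any remaining score gap between groups measures the model's direct response to that feature:

# Diversity-controlled test sketch: permute the sensitive column, re-score,
# and compare mean scores between the (randomly assigned) groups.
import numpy as np

def diversity_test(model, X, sensitive_col, n_trials=100, seed=0):
    rng = np.random.default_rng(seed)
    gaps = []
    for _ in range(n_trials):
        X_rand = X.copy()
        X_rand[:, sensitive_col] = rng.permutation(X_rand[:, sensitive_col])
        scores = model.predict_proba(X_rand)[:, 1]
        group = X_rand[:, sensitive_col] == 1
        gaps.append(scores[group].mean() - scores[~group].mean())
    # A mean gap far from zero is evidence of biased treatment.
    return float(np.mean(gaps)), float(np.std(gaps))

# Usage, with any fitted binary classifier `model` and feature matrix X:
#   gap_mean, gap_std = diversity_test(model, X, sensitive_col=0)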
Human Problems
● Auto-ethics
○ Defining un-ethical features
○ Who decides to look for fairness in the first place?
● By restricting or removing certain features, aren't you sacrificing performance?
● Isn't it actually adding bias if you decide which features to put in or not?
● If the data shows that there is a relationship between X and Y, isn't that your ground truth? Isn't that sub-optimal?
It’s always a human problem
● “All Models are wrong, but some are useful”
● Your model is already biased
Bad Features
● Not all features are ok!
○ ‘Time travelling’: features that only exist after the outcome being predicted
■ Rating a movie => watched the movie
■ Cancer surgery => had cancer
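A minimal sketch of one guard against time travelling, assuming a hypothetical feature log where each value records when it became known; features that postdate the prediction moment are dropped:

# "Time travel" guard sketch: keep only features whose values were known
# before the moment the prediction would actually have been made.
from datetime import datetime

def features_known_at(feature_log, prediction_time):
    """feature_log maps feature name -> (value, timestamp when known)."""
    return {
        name: value
        for name, (value, known_at) in feature_log.items()
        if known_at < prediction_time
    }

log = {
    "watched_trailer": (1, datetime(2017, 1, 5)),
    "rated_movie":     (5, datetime(2017, 2, 1)),  # only exists after watching
}
# Predicting on Jan 10 whether the user will watch: the rating is excluded.
print(features_known_at(log, datetime(2017, 1, 10)))  # {'watched_trailer': 1}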
Misguided Models
● “It’s difficult to make predictions, especially about the future”
○ Offline performance != Online performance
○ Predicting past behavior != Influencing behavior
○ Example: Clicks vs. buy behavior in ads
Asking the right questions
● Need a human
○ Choosing features
○ Choosing the right target variable
■ Value-added ML
“Computers are useless, they can only give you answers”
Bad Questions
● Questionable real-world applications
○ Screen job applications
○ Screen college applications
○ Predict salary
○ Predict recidivism
● Features?
○ Race
○ Gender
○ Age
Correlating features
● Name -> Gender
● Name -> Age
● Grad Year -> Age
● Zip -> Socioeconomic Class
● Zip -> Race
● Likes -> Age, Gender, Race, Sexual Orientation...
● Credit score, SAT score, College prestigiousness...
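One way to surface such proxies, sketched here with fabricated data: try to predict the sensitive attribute from the supposedly innocent features. If that works well above chance, the features encode the attribute, and a model trained on them can discriminate even after the attribute itself is removed:

# Proxy-feature check sketch (synthetic data; names are hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
race = rng.integers(0, 2, n)              # sensitive attribute
zip_code = race * 0.8 + rng.random(n)     # zip correlates with race (by construction)
other = rng.random(n)                     # an unrelated feature
X = np.column_stack([zip_code, other])

acc = cross_val_score(LogisticRegression(), X, race, cv=5).mean()
print(f"sensitive attribute predictable with accuracy {acc:.2f}")
# Accuracy well above 0.5 means zip code acts as a proxy for race.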
At your job...
Not everyone will have the same ethical values, but you don't have to accept 'optimality' as an argument against doing the right thing.
You know racist computers are a bad idea
Don’t let your company invent racist computers
@estola