Data Science and Goodhart’s Law Kyle Polich Data Science, Inc.

Page 1: Data Science and Goodhart's Law

Data Science and Goodhart’s Law

Kyle Polich
Data Science, Inc.

Page 2: Data Science and Goodhart's Law

Goodhart’s Law

When a measure becomes a target, it ceases to be a good measure

Page 3: Data Science and Goodhart's Law

Sales Rep Compensation Example

• Base pay + variable commission
• For monthly <50k, commission = 3%
• For monthly 50-99k, commission = 5%
• For monthly 100k+, commission = 7%
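
The tiers above can be turned into a small worked example of how the measure invites gaming. This is a sketch under stated assumptions, not the plan's actual rules: the slide does not say whether a tier's rate applies to the whole monthly total or only to the amount above the threshold, so the code assumes the whole total, and it treats anything under 100k as the 5% tier. The function name `monthly_commission` and the sales figures are hypothetical.

```python
def monthly_commission(sales):
    """Commission in whole dollars for one month's sales.

    Assumption: the tier's rate applies to the entire monthly total,
    and sales below 100,000 fall in the 5% tier.
    """
    if sales < 50_000:
        rate_percent = 3
    elif sales < 100_000:
        rate_percent = 5
    else:
        rate_percent = 7
    return sales * rate_percent // 100


# Same 150k of total sales over two months, booked two different ways.
as_sold = monthly_commission(90_000) + monthly_commission(60_000)
shifted = monthly_commission(100_000) + monthly_commission(50_000)

print("booked as sold:", as_sold)
print("10k moved into month one:", shifted)
```

Under these assumptions, moving 10k of deals from one month to the other raises the payout without any additional sales, which is the kind of behavior Goodhart's Law predicts once monthly sales becomes the target.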

Page 4: Data Science and Goodhart's Law

Some Examples

• Spam filtering arms race
• Search engine ranking
• Clearing cookies to get better airline prices
• Keep account open to manipulate FICO score
• Retail discounting/couponing strategies
• Bidding in AdTech marketplaces

Page 5: Data Science and Goodhart's Law

Measuring with Cross Validation

Cross Validation
• You should be doing this anyway!
• Set production performance expectation
• Measure post deployment
• Total deviation = deviation due to overfit + deviation due to incomplete training + deviation due to Goodhart’s Law
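
A minimal sketch of the bookkeeping this implies: use the cross-validation fold scores as the production expectation, then compare against the score measured after deployment. The function name `deviation_report`, the example scores, and the "two fold standard deviations" flag are all illustrative assumptions. The code only measures the total deviation; it does not attribute it to overfit, incomplete training, or Goodhart's Law.

```python
from statistics import mean, stdev


def deviation_report(cv_scores, production_score):
    """Compare post-deployment performance with the cross-validation expectation."""
    expected = mean(cv_scores)       # production performance expectation
    fold_spread = stdev(cv_scores)   # how much the folds disagree with each other
    total_deviation = expected - production_score
    return {
        "expected": expected,
        "fold_spread": fold_spread,
        "total_deviation": total_deviation,
        # Assumed rule of thumb: flag deviations larger than 2 fold std devs.
        "outside_cv_spread": abs(total_deviation) > 2 * fold_spread,
    }


# Hypothetical accuracy per fold, and accuracy measured after deployment.
report = deviation_report([0.90, 0.92, 0.91, 0.89, 0.93], production_score=0.80)
for key, value in report.items():
    print(key, value)
```

A flagged deviation is a prompt to investigate which of the three sources is responsible, not evidence for any one of them.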

Page 6: Data Science and Goodhart's Law

Measuring via Homogeneity Assumption

Can you train a model to accurately predict the date at which the observation was created?
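
One way to try the question above, sketched with the standard library only. It simplifies "predict the date" to "predict whether the observation came from the earlier or the later half of the data", uses a single feature, and uses a one-threshold decision stump as the model; all of those are assumptions made for illustration, as are the function names and the synthetic data. If held-out accuracy is well above 50%, the homogeneity assumption looks doubtful.

```python
import random


def predict(x, threshold, polarity):
    """Return 1 (later period) or 0 (earlier period) for a single value."""
    return 1 if polarity * (x - threshold) > 0 else 0


def fit_stump(xs, ys):
    """Pick the threshold and direction with the best training accuracy."""
    best_threshold, best_polarity, best_hits = 0.0, 1, -1
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            hits = sum(predict(x, threshold, polarity) == y for x, y in zip(xs, ys))
            if hits > best_hits:
                best_threshold, best_polarity, best_hits = threshold, polarity, hits
    return best_threshold, best_polarity


def period_predictability(values):
    """values are in creation order; returns held-out accuracy of early/late prediction."""
    n = len(values)
    labels = [0 if i < n // 2 else 1 for i in range(n)]
    # Even positions train the stump, odd positions are held out.
    threshold, polarity = fit_stump(values[0::2], labels[0::2])
    test_x, test_y = values[1::2], labels[1::2]
    hits = sum(predict(x, threshold, polarity) == y for x, y in zip(test_x, test_y))
    return hits / len(test_x)


rng = random.Random(0)
stable = [rng.gauss(0, 1) for _ in range(800)]
drifted = [rng.gauss(0, 1) for _ in range(400)] + [rng.gauss(3, 1) for _ in range(400)]

stable_acc = period_predictability(stable)
drifted_acc = period_predictability(drifted)
print("stable data:", stable_acc)
print("drifted data:", drifted_acc)
```

On the stable series the stump should do little better than a coin flip; on the drifted series it should separate the two periods easily.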

Page 7: Data Science and Goodhart's Law

Measuring Drift

Page 8: Data Science and Goodhart's Law

Measuring Drift

Typical failure from a web application release

Page 9: Data Science and Goodhart's Law

Measuring Drift

Possible failure from a web application release

Page 10: Data Science and Goodhart's Law

Dealing with it

• Detection is key
• Experimentation is required
• Agile methods for model deployment

Page 11: Data Science and Goodhart's Law

Causal Impact

• An approach to estimating the causal effect of a designed intervention on a time series.
• Predicts the counterfactual (how the response likely would have evolved absent the intervention)
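
The counterfactual idea can be sketched in a few lines. This is a deliberately simplified stand-in, not the Causal Impact method itself: it swaps in ordinary least squares on a single control series for the time-series model, and it reports no uncertainty interval. The function names, the control series, and the injected lift of 5 are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_effect(control, response, intervention_index):
    """Average gap between the observed response and its predicted counterfactual."""
    # Learn the control -> response relationship from the pre-intervention period only.
    slope, intercept = fit_line(control[:intervention_index], response[:intervention_index])
    # Counterfactual: what the response would have been had that relationship continued.
    counterfactual = [slope * x + intercept for x in control[intervention_index:]]
    observed = response[intervention_index:]
    pointwise = [o - c for o, c in zip(observed, counterfactual)]
    return sum(pointwise) / len(pointwise)


# Synthetic data: the response tracks the control exactly, then gains 5 units
# from position 6 onward (the assumed intervention).
control = [10, 12, 11, 13, 14, 12, 15, 13, 14, 16]
response = [2 * x + 1 for x in control]
for i in range(6, len(response)):
    response[i] += 5

avg_effect = estimate_effect(control, response, intervention_index=6)
print("estimated average effect:", avg_effect)
```

The control series must itself be unaffected by the intervention for the estimate to mean anything; that requirement carries over to the full method as well.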

Page 12: Data Science and Goodhart's Law

Self Fulfilling Prophecies

• Beware!
• Case study: lead qualification
  – Try to predict leads that will close
  – Relearn the bias of your training
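
One way such a loop can arise, as a toy calculation rather than a reconstruction of the case study: suppose every lead has the same underlying chance of closing, but only leads the model flags get worked by sales, and being worked raises the close rate. The rates, the counts, and the function name `observed_close_rates` are all made-up assumptions.

```python
def observed_close_rates(n_flagged, n_unflagged, base_rate_pct, worked_rate_pct):
    """Expected close rates when only flagged leads receive sales attention.

    Assumption: flagged and unflagged leads are otherwise identical.
    """
    flagged_closes = n_flagged * worked_rate_pct // 100
    unflagged_closes = n_unflagged * base_rate_pct // 100
    return flagged_closes / n_flagged, unflagged_closes / n_unflagged


flagged_rate, unflagged_rate = observed_close_rates(
    n_flagged=500, n_unflagged=500, base_rate_pct=10, worked_rate_pct=30
)
print("flagged leads close at:", flagged_rate)
print("unflagged leads close at:", unflagged_rate)
```

A model retrained on these outcomes would see flagged leads closing three times as often and conclude that its earlier flags were right, even though in this setup the difference comes entirely from where sales spent its time.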

Page 13: Data Science and Goodhart's Law

Fast Iterations

• Outside normal SWLC release cycle
  – State updates
  – Parameter tuning
• Run experiments

Page 14: Data Science and Goodhart's Law

Explanatory power

• Goodhart’s law will often manifest on only a subset of (possibly significant) instances.

• Model interpretability for affected instances is key
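
A small sketch of why the subset matters: overall accuracy can look healthy while one segment has degraded badly. The segment names, counts, and the function name `accuracy_by_segment` are hypothetical; in practice the hard part is finding a segmentation that isolates the affected instances.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, predicted, actual) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, predicted, actual in records:
        totals[segment] += 1
        hits[segment] += int(predicted == actual)
    overall = sum(hits.values()) / sum(totals.values())
    per_segment = {segment: hits[segment] / totals[segment] for segment in totals}
    return overall, per_segment


# Made-up outcomes: 95 "organic" instances and 5 "coupon" instances.
records = (
    [("organic", 1, 1)] * 90
    + [("organic", 1, 0)] * 5
    + [("coupon", 1, 1)] * 1
    + [("coupon", 1, 0)] * 4
)

overall, per_segment = accuracy_by_segment(records)
print("overall accuracy:", overall)
for segment, accuracy in per_segment.items():
    print(segment, accuracy)
```

Here the overall figure is 91% while the small segment sits at 20%, so an average-only dashboard would hide the problem.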

Page 15: Data Science and Goodhart's Law

Interpretable Models

Page 16: Data Science and Goodhart's Law

Interpretable Models

Page 17: Data Science and Goodhart's Law

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier

Ribeiro, Singh, Guestrin

Model Interpretability

Page 18: Data Science and Goodhart's Law

Summary

• Goodhart’s law: When a measure becomes a target, it ceases to be a good measure
• As a data scientist, if your work is meaningful, you will encounter it
• Try to measure it in the data
• Work on explanatory models to mitigate
• Don’t let the average case blind you

Page 19: Data Science and Goodhart's Law

DataScience

facebook.com/datascience

@DataSkeptic
@datascienceinc

linkedin.com/company/datascience-inc

(310) 579 - 6200