seattle-daml-meetup
Interactive and Interpretable Machine Learning Models
for Human Machine Collaboration
Been Kim, Nov 2015
Vision
Harness the relative strengths of humans and machine learning models
[Figure: Human <-> Machine Learning Models; image: http://blogs.teradata.com/]
Research objectives
Develop machine learning models inspired by how humans think that can…
• infer decisions of humans
• make sense to humans
• interact with humans
1. Infer human team decisions from team planning conversation
2. Communication from machine to human: provide intuitive explanations
3. Communication from human to machine: incorporate feedback
Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
Mirror the way humans think
• Humans' tactical decisions are based on exemplar-based reasoning (matching and prototyping) [Cohen 96, Newell 72]
• Skilled firefighters use recognition-primed decision making: a situation is matched to typical cases [Klein 89]
• Machines can better support people's decision-making by representing data in the same way
Case-based reasoning and interpretable models

Case-based reasoning
• Applied to various applications thanks to its intuitive power [Aamodt 94, Slade 91, Bekkerman 06]
Limitations
• Always requires labels (supervised)
• Does not scale to complex problems
• Does not leverage global patterns in the data

Interpretable models
• Decision trees [De'ath 00]
• Sparse linear classifiers [Tibshirani 96, Ustun 14]
• Prototype-based classifiers [Graf 09]
Limitations
• Sparsity is not enough [Freitas 14]
• Linear models, or supervised only
Our approach: Bayesian Case Model (BCM)
Bayesian generative models + case-based reasoning = Bayesian Case Model (BCM)
• Leverages the power of examples (prototypes) and subspaces to explain machine learning results
• Explains complicated concepts using examples
[Kim, Rudin, Shah NIPS 2014]
Bayesian Case Model (BCM)
• A general framework for Bayesian case-based reasoning
• Joint inference on prototypes, subspaces and cluster labels
[Figure: clusters A, B, C, …, each with its prototype, subspace and cluster labels]
Explanations provided by the Bayesian Case Model (BCM)
• Prototype: the quintessential observation that best represents the cluster
• Subspace: the set of important features in characterizing the cluster

Cluster A: Taco
  subspace: salsa, sour cream, avocado
  other ingredients: salt, pepper, taco shell, lettuce, oil
Cluster B: Basic crepe
  subspace: flour, egg
  other ingredients: water, salt, milk, butter
Cluster C: Chocolate berry tart
  subspace: chocolate, strawberry
  other ingredients: pie crust, whipping cream, kirsch, almonds
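A hedged sketch of how such a BCM explanation could be represented in code; the dictionary layout and the `explain` helper are my own naming, illustrating the prototype-plus-subspace summaries from the recipe example:

```python
# Illustrative representation (my naming, not the paper's code) of a BCM
# explanation: each cluster is summarized by a prototype data point plus
# a subspace of important features.
clusters = {
    "A": {"prototype": "Taco",
          "subspace": ["salsa", "sour cream", "avocado"]},
    "B": {"prototype": "Basic crepe",
          "subspace": ["flour", "egg"]},
    "C": {"prototype": "Chocolate berry tart",
          "subspace": ["chocolate", "strawberry"]},
}

def explain(cluster_id):
    """Render a human-readable summary for one cluster."""
    c = clusters[cluster_id]
    return f"{c['prototype']}: defined by {', '.join(c['subspace'])}"

print(explain("A"))  # Taco: defined by salsa, sour cream, avocado
```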
Bayesian Case Model (BCM)
The model has two parts: 1. clustering and 2. learning explanations (prototypes, subspaces and cluster labels).

1. Clustering part
• Admixture model for modeling the underlying distributions
• Each data point gets a vector of cluster labels, e.g. mexican_crepe = [A, B, A]: "It is a crepe, since it has flour and egg. It is inspired by Mexican food, because it has avocado, salsa and sour cream."
• Similarly, chocolate_crepe = [B, C, C]: "It is a crepe, since it has flour and egg. It is a sweet crepe that is like a chocolate and berry dessert."
• The cluster distributions plus supervised classification methods can be used to evaluate clustering performance [1]
• The concentration hyperparameter controls how many different cluster labels appear within one data point
[1] D. Blei, A. Ng, M. Jordan 2003
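A minimal sketch of the concentration parameter's effect, using a symmetric Dirichlet over per-point cluster distributions (the function name and parameter values are mine; this illustrates the general admixture idea, not the paper's exact model):

```python
# With a small concentration parameter, each data point's cluster
# distribution concentrates on one label; with a large one, labels mix.
import numpy as np

rng = np.random.default_rng(0)
n_clusters = 3

def mean_peakedness(alpha, n_points=200):
    """Average probability mass on each point's dominant cluster."""
    theta = rng.dirichlet(np.full(n_clusters, alpha), size=n_points)
    return theta.max(axis=1).mean()

# Small alpha -> dominant-cluster mass near 1 (one label per point);
# large alpha -> mass spread out (many labels per point).
print(mean_peakedness(0.1), mean_peakedness(10.0))
```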
Bayesian Case Model (BCM)
2. Learning explanation part
• Each cluster is characterized by a prototype and a subspace
• Prototype: the quintessential observation that best represents the cluster. A prototype is an actual data point that exists in the dataset.
• Subspace: the set of important features in characterizing the cluster, encoded as binary variables (1 for important features)
• Any similarity measure can be used. For example, using a loss function, feature j of cluster s is an important feature (i.e., part of the subspace) when its value is identical to the value of the prototype of cluster s.
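One way the subspace criterion could look in code, as a hedged sketch assuming a simple 0/1 agreement loss and a majority threshold (both choices are mine, not necessarily the paper's):

```python
# A feature counts as "important" (in the subspace) when cluster members
# tend to agree with the prototype's value on that feature.
import numpy as np

def subspace_indicator(cluster_points, prototype, threshold=0.75):
    # 0/1 agreement per feature: fraction of members matching the prototype.
    agreement = (cluster_points == prototype).mean(axis=0)
    return (agreement >= threshold).astype(int)

members = np.array([[1, 0, 2],
                    [1, 1, 2],
                    [1, 0, 2],
                    [1, 2, 2]])
proto = np.array([1, 0, 2])
print(subspace_indicator(members, proto))  # features 0 and 2 form the subspace
```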
Results: challenges of interpretable models
1. Do the learned prototypes and subspaces make sense?
2. Are we sacrificing performance for interpretability?
3. Do learned prototypes and subspaces help humans' understanding?
1. Do the learned prototypes and subspaces make sense?
BCM on recipe data
• Unsupervised clustering on a subset of recipe data
• Data from the computer cooking contest: liris.cnrs.fr/ccc/ccc2014
[Figure: learned recipe clusters with prototype recipes and subspace ingredients, e.g. sesame]

BCM on digit data
• Data: http://www.cs.nyu.edu/~roweis/data.html
[Figure: learned cluster D over Gibbs sampling iterations]
2. Are we sacrificing anything for the interpretability?
Maintain accuracy
[Figures: sensitivity analysis of BCM on the handwritten-digit and 20 Newsgroups datasets]
2. Are we sacrificing anything for the interpretability?
Joint inference on prototypes, subspaces and cluster labels is the key
[Figure: level sets of the posterior distribution. One solution clusters the data well; another clusters it equally well and has better interpretability. BCM gives the latter a higher score.]
Collapsed Gibbs sampling for inference
• Observed to converge quickly in admixture models
• Integrating out parameters for efficient inference
[Kim, Rudin, Shah NIPS 2014]
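The flavor of collapsed Gibbs sampling can be sketched on a toy mixture. This is a generic illustration with fixed, well-separated cluster means and the mixing weights integrated out (so each reassignment depends only on the other points' counts times a likelihood term); it is not BCM's actual conditionals:

```python
# Generic collapsed-Gibbs sweep over cluster assignments for a toy
# 1-D mixture. Mixture weights are integrated out: the prior term for
# cluster k is just (count of other points in k) + alpha.
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0
means = np.array([-4.0, 0.0, 4.0])          # fixed toy cluster means
x = np.concatenate([rng.normal(m, 0.5, 10) for m in means])
z = rng.integers(3, size=x.size)            # random initial assignments

def gauss(xi, mu, sigma=0.5):
    """Unnormalized Gaussian likelihood of xi under each mean in mu."""
    return np.exp(-0.5 * ((xi - mu) / sigma) ** 2)

for sweep in range(20):
    for i in range(x.size):
        counts = np.bincount(np.delete(z, i), minlength=3)
        p = (counts + alpha) * gauss(x[i], means)   # prior-from-counts x likelihood
        z[i] = rng.choice(3, p=p / p.sum())

# After a few sweeps, points generated from the same mean share a label.
print(z)
```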
3. Does the model make sense to humans? An objective measure of human understanding
Accuracy of the human classifier
• Participants' task: assign the ingredients of a specific dish (a new data point) to a cluster
• Each cluster is explained using either BCM (ingredients of the prototype recipe) or LDA (representative ingredients of each cluster)
• 384 classification questions asked of 24 people
• Statistically significantly better performance with BCM (85.9% vs. 71.3%)
[Kim, Rudin, Shah NIPS 2014]
Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
Why interactive?
Related work on interactive machine learning
• Interact via multiple model parameter settings [Patel 10, Amershi 15]
• Design smart interfaces [Amershi 11] and visualizations [Chaney 12, Gou 03]
• Interact via a simplified medium of interaction [Kapoor 10, Ware 01]
Our medium of interaction: prototypes and subspaces!
Interactive BCM (iBCM)
[Figure: graphical models of BCM vs. iBCM. Double-circled nodes represent interacted latent variables: nodes that get information both from user feedback and from the data points.]
iBCM internal mechanism
Key: balance between what the data indicates and what makes the most sense to the user
1. Listen to the users
2. Propagate user feedback to accelerate inference
3. Listen to the data
Our approach: decompose the Gibbs sampling steps to
1) adjust the feedback propagation depending on the user's confidence
2) accelerate the inference by rearranging latent variables
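One plausible form of the confidence-weighted balance between data and user: mix the two signals in a latent variable's conditional. The mixing rule and names here are my assumption for illustration, not the paper's exact update:

```python
# Blend the data-driven conditional for a latent variable with user
# feedback, weighted by the user's confidence c.
import numpy as np

def blended_conditional(data_probs, feedback_probs, c):
    """c in [0, 1]: 0 = listen only to the data, 1 = listen only to the user."""
    p = (1 - c) * np.asarray(data_probs) + c * np.asarray(feedback_probs)
    return p / p.sum()

data = [0.7, 0.2, 0.1]   # what the data indicates for one cluster variable
user = [0.0, 1.0, 0.0]   # the user pinned cluster B
print(blended_conditional(data, user, c=0.5))
```

With moderate confidence the user's preferred cluster becomes most likely without erasing what the data says; with c=0 the data-driven conditional is returned unchanged.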
User's workflow with iBCM (abstract domain)
• Click to change an item's cluster assignment
• Click to promote any item to be a prototype
Experiment procedure (24 participants, 192 questions)
1. Subjects are asked how they want to group items
2. Subjects view results from BCM (essentially one of the optimal clusterings)
3. Subjects indicate how well the results matched their preferred clustering
4. Subjects interact with iBCM
5. Subjects indicate how well the results match what they want

Experiment results
• Participants agreed more strongly that the final clusters matched their preferences, compared to the initial clusters (Wilcoxon signed-rank test)
iBCM for introductory programming education
• Why education?
  • Teachers' current workflow for creating a grading rubric: randomly pick 4-5 assignments, then "hodgepodge grading" [Cross 99]
  • Understanding the variation across submissions is important for providing appropriate, tailored feedback to students [Basu 13, Huang 13]
• What are the challenges?
  • Extracting the right features: OverCode [Glassman 15]

iBCM + OverCode system
• Submissions from MIT introductory Python classes
iBCM + OverCode system
• Select/unselect subspaces (keywords)
• Promote/demote prototypes
• Click to get a new grouping

iBCM experiment with domain experts
Task: explore the full spectrum of students' submissions and write down a "discovery list" for a recitation, with iBCM vs. without
Experiment with domain experts: results
• 48 problems explored by 12 subjects who had previously taught an introductory Python class
• Compared to BCM, participants agreed more strongly (p < 0.001, Wilcoxon signed-rank test) that with iBCM they:
  • were more satisfied
  • better explored the full spectrum of students' submissions
  • better identified important features to expand the discovery list
  • found the important features and prototypes useful
• "[iBCM enabled me to] go in depth as to how students could do"
• "[iBCM] is useful with large datasets where brute-force would not be practical."
Summary
• Infer decisions of humans: infer human team decisions from team planning conversation
  [Kim, Chacha, Shah AAAI 13] [Kim, Chacha, Shah JAIR 15]
• Make sense to humans: communication from machine to human, providing intuitive explanations
  Inspiration: how humans make decisions
  Approach: case-based Bayesian model
  Results: provided intuitive explanations while maintaining performance
  [Kim, Rudin, Shah NIPS 2014] [Kim, Patel, Rostamizadeh, Shah AAAI 2015]
• Interact with humans: communication from human to machine, incorporating feedback
  Approach: enable interaction by decomposing the sampling inference steps
  Results: implemented and validated the approach in the education domain
  [Kim, Glassman, Johnson, Shah submitted*]
Next steps
• Interpretability for data exploration: visualization
• Domain-specific interpretability: learning features that distinguish clusters
• Interactive machine learning for debugging models or hyperparameter exploration, e.g. inspecting misclassified data (Doc id #24, predicted: politics, true label: medicine)
[Kim, Patel, Rostamizadeh, Shah AAAI 2015] [Kim, Doshi-Velez, Shah NIPS 2015]
Next steps at AI2
• Extend interpretability to initially uninterpretable features (neural nets)
• Example domain: 4th-grade science exam questions
Q&A
AI2 is hiring research interns any time of the year. Shoot me an email if interested! [email protected]