seattle-daml-meetup
Interactive and Interpretable Machine Learning Models
for Human Machine Collaboration
Been Kim, Nov 2015
Vision
Harness the relative strengths of humans and machine learning models
[Figure: Human <-> Machine Learning Models; image: http://blogs.teradata.com/]
Research objectives
Develop machine learning models inspired by how humans think that can…
• infer decisions of humans
• make sense to humans
• interact with humans
1. Infer human team decisions from team planning conversation
2. Communication from machine to human: provide intuitive explanations
3. Communication from human to machine: incorporate feedback
Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
Mirror the way humans think
• Humans' tactical decisions are based on exemplar-based reasoning (matching and prototyping) [Cohen 96, Newell 72]
• Skilled firefighters use recognition-primed decision making: a situation is matched to typical cases [Klein 89]
• Machines can better support people's decision-making by representing data in the same way
Case-based reasoning and interpretable models

Case-based reasoning
• Applied to various applications thanks to its intuitive power [Aamodt 94, Slade 91, Bekkerman 06]
Limitations
• Always requires labels (supervised)
• Does not scale to complex problems
• Does not leverage global patterns in the data

Interpretable models
• Decision trees [De'ath 00]
• Sparse linear classifiers [Tibshirani 96, Ustun 14]
• Prototype-based classifiers [Graf 09]
Limitations
• Sparsity is not enough [Freitas 14]
• Linear models, or supervised only
Our approach: Bayesian Case Model (BCM)
Bayesian generative models + case-based reasoning = Bayesian Case Model (BCM)
• Leverages the power of examples (prototypes) and subspaces to explain machine learning results
• Explains complicated concepts using examples
[Kim, Rudin, Shah NIPS 2014]
Bayesian Case Model (BCM)
• A general framework for Bayesian case-based reasoning
• Joint inference on prototypes, subspaces and cluster labels
[Figure: clusters A, B, C, …, each with its prototype, subspace and cluster labels]
Explanations provided by the Bayesian Case Model (BCM)
• Prototype: the quintessential observation that best represents the cluster
• Subspace: the set of important features in characterizing the cluster

Cluster A: Taco
  subspace: salsa, sour cream, avocado
  other ingredients: salt, pepper, taco shell, lettuce, oil
Cluster B: Basic crepe
  subspace: flour, egg
  other ingredients: water, salt, milk, butter
Cluster C: Chocolate berry tart
  subspace: chocolate, strawberry
  other ingredients: pie crust, whipping cream, kirsch, almonds
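A hedged sketch of how such a BCM explanation could be represented in code; the dictionary layout and the `explain` helper are my own naming, illustrating the prototype-plus-subspace summaries from the recipe example:

```python
# Illustrative representation (my naming, not the paper's code) of a BCM
# explanation: each cluster is summarized by a prototype data point plus
# a subspace of important features.
clusters = {
    "A": {"prototype": "Taco",
          "subspace": ["salsa", "sour cream", "avocado"]},
    "B": {"prototype": "Basic crepe",
          "subspace": ["flour", "egg"]},
    "C": {"prototype": "Chocolate berry tart",
          "subspace": ["chocolate", "strawberry"]},
}

def explain(cluster_id):
    """Render a human-readable summary for one cluster."""
    c = clusters[cluster_id]
    return f"{c['prototype']}: defined by {', '.join(c['subspace'])}"

print(explain("A"))  # Taco: defined by salsa, sour cream, avocado
```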
Bayesian Case Model (BCM)
The model has two parts: 1. clustering and 2. learning explanations (prototypes, subspaces and cluster labels).

1. Clustering part
• Admixture model for modeling the underlying distributions
• Each data point gets a vector of cluster labels, e.g. mexican_crepe = [A, B, A]: "It is a crepe, since it has flour and egg. It is inspired by Mexican food, because it has avocado, salsa and sour cream."
• Similarly, chocolate_crepe = [B, C, C]: "It is a crepe, since it has flour and egg. It is a sweet crepe that is like a chocolate and berry dessert."
• The cluster distributions plus supervised classification methods can be used to evaluate clustering performance [1]
• The concentration hyperparameter controls how many different cluster labels appear within one data point
[1] D. Blei, A. Ng, M. Jordan 2003
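A minimal sketch of the concentration parameter's effect, using a symmetric Dirichlet over per-point cluster distributions (the function name and parameter values are mine; this illustrates the general admixture idea, not the paper's exact model):

```python
# With a small concentration parameter, each data point's cluster
# distribution concentrates on one label; with a large one, labels mix.
import numpy as np

rng = np.random.default_rng(0)
n_clusters = 3

def mean_peakedness(alpha, n_points=200):
    """Average probability mass on each point's dominant cluster."""
    theta = rng.dirichlet(np.full(n_clusters, alpha), size=n_points)
    return theta.max(axis=1).mean()

# Small alpha -> dominant-cluster mass near 1 (one label per point);
# large alpha -> mass spread out (many labels per point).
print(mean_peakedness(0.1), mean_peakedness(10.0))
```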
Bayesian Case Model (BCM)
2. Learning explanation part
• Each cluster is characterized by a prototype and a subspace
• Prototype: the quintessential observation that best represents the cluster. A prototype is an actual data point that exists in the dataset.
• Subspace: the set of important features in characterizing the cluster, encoded as binary variables (1 for important features)
• Any similarity measure can be used. For example, using a loss function, feature j of cluster s is an important feature (i.e., part of the subspace) when its value is identical to the value of the prototype of cluster s.
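One way the subspace criterion could look in code, as a hedged sketch assuming a simple 0/1 agreement loss and a majority threshold (both choices are mine, not necessarily the paper's):

```python
# A feature counts as "important" (in the subspace) when cluster members
# tend to agree with the prototype's value on that feature.
import numpy as np

def subspace_indicator(cluster_points, prototype, threshold=0.75):
    # 0/1 agreement per feature: fraction of members matching the prototype.
    agreement = (cluster_points == prototype).mean(axis=0)
    return (agreement >= threshold).astype(int)

members = np.array([[1, 0, 2],
                    [1, 1, 2],
                    [1, 0, 2],
                    [1, 2, 2]])
proto = np.array([1, 0, 2])
print(subspace_indicator(members, proto))  # features 0 and 2 form the subspace
```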
Results: challenges of interpretable models
1. Do the learned prototypes and subspaces make sense?
2. Are we sacrificing performance for interpretability?
3. Do learned prototypes and subspaces help humans' understanding?
1. Do the learned prototypes and subspaces make sense?
BCM on recipe data
• Unsupervised clustering on a subset of recipe data
• Data from the computer cooking contest: liris.cnrs.fr/ccc/ccc2014
[Figure: learned recipe clusters with prototype recipes and subspace ingredients, e.g. sesame]

BCM on digit data
• Data: http://www.cs.nyu.edu/~roweis/data.html
[Figure: learned cluster D over Gibbs sampling iterations]
2. Are we sacrificing anything for the interpretability?
Maintain accuracy
[Figures: sensitivity analysis of BCM on the handwritten-digit and 20 Newsgroups datasets]
2. Are we sacrificing anything for the interpretability?
Joint inference on prototypes, subspaces and cluster labels is the key
[Figure: level sets of the posterior distribution. One solution clusters the data well; another clusters it equally well and has better interpretability. BCM gives the latter a higher score.]
Collapsed Gibbs sampling for inference
• Observed to converge quickly in admixture models
• Integrating out parameters for efficient inference
[Kim, Rudin, Shah NIPS 2014]
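The flavor of collapsed Gibbs sampling can be sketched on a toy mixture. This is a generic illustration with fixed, well-separated cluster means and the mixing weights integrated out (so each reassignment depends only on the other points' counts times a likelihood term); it is not BCM's actual conditionals:

```python
# Generic collapsed-Gibbs sweep over cluster assignments for a toy
# 1-D mixture. Mixture weights are integrated out: the prior term for
# cluster k is just (count of other points in k) + alpha.
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0
means = np.array([-4.0, 0.0, 4.0])          # fixed toy cluster means
x = np.concatenate([rng.normal(m, 0.5, 10) for m in means])
z = rng.integers(3, size=x.size)            # random initial assignments

def gauss(xi, mu, sigma=0.5):
    """Unnormalized Gaussian likelihood of xi under each mean in mu."""
    return np.exp(-0.5 * ((xi - mu) / sigma) ** 2)

for sweep in range(20):
    for i in range(x.size):
        counts = np.bincount(np.delete(z, i), minlength=3)
        p = (counts + alpha) * gauss(x[i], means)   # prior-from-counts x likelihood
        z[i] = rng.choice(3, p=p / p.sum())

# After a few sweeps, points generated from the same mean share a label.
print(z)
```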
3. Does the model make sense to humans? An objective measure of human understanding
Accuracy of the human classifier
• Participants' task: assign the ingredients of a specific dish (a new data point) to a cluster
• Each cluster is explained using either BCM (ingredients of the prototype recipe) or LDA (representative ingredients of each cluster)
• 384 classification questions asked of 24 people
• Statistically significantly better performance with BCM (85.9% vs. 71.3%)
[Kim, Rudin, Shah NIPS 2014]
Road map
1. Infer human team decisions from team planning conversation (infer decisions of humans)
2. Communication from machine to human: provide intuitive explanations (make sense to humans)
3. Communication from human to machine: incorporate feedback (interact with humans)
Why interactive?
Related work on interactive machine learning
• Interact via multiple model parameter settings [Patel 10, Amershi 15]
• Design smart interfaces [Amershi 11] and visualizations [Chaney 12, Gou 03]
• Interact via a simplified medium of interaction [Kapoor 10, Ware 01]
Our medium of interaction: prototypes and subspaces!
Interactive BCM (iBCM)
[Figure: graphical models of BCM vs. iBCM. Double-circled nodes represent interacted latent variables: nodes that get information both from user feedback and from the data points.]
iBCM internal mechanism
Key: balance between what the data indicates and what makes the most sense to the user
1. Listen to the users
2. Propagate user feedback to accelerate inference
3. Listen to the data
Our approach: decompose the Gibbs sampling steps to
1) adjust the feedback propagation depending on the user's confidence
2) accelerate the inference by rearranging latent variables
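One plausible form of the confidence-weighted balance between data and user: mix the two signals in a latent variable's conditional. The mixing rule and names here are my assumption for illustration, not the paper's exact update:

```python
# Blend the data-driven conditional for a latent variable with user
# feedback, weighted by the user's confidence c.
import numpy as np

def blended_conditional(data_probs, feedback_probs, c):
    """c in [0, 1]: 0 = listen only to the data, 1 = listen only to the user."""
    p = (1 - c) * np.asarray(data_probs) + c * np.asarray(feedback_probs)
    return p / p.sum()

data = [0.7, 0.2, 0.1]   # what the data indicates for one cluster variable
user = [0.0, 1.0, 0.0]   # the user pinned cluster B
print(blended_conditional(data, user, c=0.5))
```

With moderate confidence the user's preferred cluster becomes most likely without erasing what the data says; with c=0 the data-driven conditional is returned unchanged.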
User's workflow with iBCM (abstract domain)
• Click to change an item's cluster assignment
• Click to promote any item to be a prototype
Experiment procedure (24 participants, 192 questions)
1. Subjects are asked how they want to group items
2. Subjects view results from BCM (essentially one of the optimal clusterings)
3. Subjects indicate how well the results matched their preferred clustering
4. Subjects interact with iBCM
5. Subjects indicate how well the results match what they want

Experiment results
• Participants agreed more strongly that the final clusters matched their preferences, compared to the initial clusters (Wilcoxon signed-rank test)
iBCM for introductory programming education
• Why education?
  • Teachers' current workflow for creating a grading rubric: randomly pick 4-5 assignments, then "hodgepodge grading" [Cross 99]
  • Understanding the variation across submissions is important for providing appropriate, tailored feedback to students [Basu 13, Huang 13]
• What are the challenges?
  • Extracting the right features: OverCode [Glassman 15]

iBCM + OverCode system
• Submissions from MIT introductory Python classes
iBCM + OverCode system
• Select/unselect subspaces (keywords)
• Promote/demote prototypes
• Click to get a new grouping

iBCM experiment with domain experts
Task: explore the full spectrum of students' submissions and write down a "discovery list" for a recitation, with iBCM vs. without
Experiment with domain experts: results
• 48 problems explored by 12 subjects who had previously taught an introductory Python class
• Compared to BCM, participants agreed more strongly (p < 0.001, Wilcoxon signed-rank test) that with iBCM they:
  • were more satisfied
  • better explored the full spectrum of students' submissions
  • better identified important features to expand the discovery list
  • found the important features and prototypes useful
• "[iBCM enabled me to] go in depth as to how students could do"
• "[iBCM] is useful with large datasets where brute-force would not be practical."
Summary
• Infer decisions of humans: infer human team decisions from team planning conversation
  [Kim, Chacha, Shah AAAI 13] [Kim, Chacha, Shah JAIR 15]
• Make sense to humans: communication from machine to human, providing intuitive explanations
  Inspiration: how humans make decisions
  Approach: case-based Bayesian model
  Results: provided intuitive explanations while maintaining performance
  [Kim, Rudin, Shah NIPS 2014] [Kim, Patel, Rostamizadeh, Shah AAAI 2015]
• Interact with humans: communication from human to machine, incorporating feedback
  Approach: enable interaction by decomposing the sampling inference steps
  Results: implemented and validated the approach in the education domain
  [Kim, Glassman, Johnson, Shah submitted*]
Next steps
• Interpretability for data exploration: visualization
• Domain-specific interpretability: learning features that distinguish clusters
• Interactive machine learning for debugging models or hyperparameter exploration, e.g. inspecting misclassified data (Doc id #24, predicted: politics, true label: medicine)
[Kim, Patel, Rostamizadeh, Shah AAAI 2015] [Kim, Doshi-Velez, Shah NIPS 2015]
Next steps at AI2
• Extend interpretability to initially uninterpretable features (neural nets)
• Example domain: 4th-grade science exam questions
Q&A
AI2 is hiring research interns any time of the year. Shoot me an email if interested! [email protected]