Decision-making Bias in Instance Matching Model Selection

Mayank Kejriwal, Daniel P. Miranker

kejriwalresearch.azurewebsites.net/pdf/iswc16-slides.pdf

Acknowledgements: US National Science Foundation, Microsoft Research

Instance Matching

50+ year old Artificial Intelligence problem

When do two entities refer to the same underlying entity?

“Record linkage: making maximum use of the discriminating power of identifying information.” Newcombe and Kennedy (1962)

Numerous surveys by Winkler (2006), Rahm et al. (2010) etc.
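To make the matching question concrete, here is a minimal sketch that treats two product titles as a match when their token sets overlap enough. The Jaccard measure, the `is_match` helper and the 0.5 threshold are assumptions chosen for illustration; they are not the method used in this talk.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty records are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """Declare a match when similarity reaches a (hypothetical) threshold."""
    return jaccard(a, b) >= threshold


if __name__ == "__main__":
    left = "Apple iPhone 6 16GB"
    right = "iphone 6 16gb apple smartphone"
    print(jaccard(left, right), is_match(left, right))
```

A learned classifier such as an MLP replaces the fixed threshold with a decision function over many such similarity features.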


Machine learning

Classifier example: feedforward multilayer perceptron (MLP)

“Machine Learning: an artificial intelligence approach.” Michalski, Carbonell and Mitchell (2013)

Supervised machine learning

▪ Requires a (manually) labeled set for both training and validation
▪ Typically acquired through sampling a ground-truth
▪ Training: fits classifier parameters (e.g. edge weights of MLP)
▪ Validation: tunes classifier hyperparameters (e.g. number of layers, nodes, learning rate...)
▪ Also requires model selection decisions:
  ▪ Which training algorithm?
  ▪ What sampling technique?
  ▪ How to split the data for training/validation?
▪ Not obvious



Model Selection Exercise

What percentage of labeled data should I use for training and what percentage for validation?

What do other people do?

Most common approach in the literature is a ten-fold split (and less often, two-fold)

What if I care more about one performance metric (say recall, versus precision) within reasonable constraints?

What if I have sampled and labeled a lot of data (say 90% of the estimated ground-truth)?

Should answers to these questions (and others) bias my decision?

“Semi-supervised instance matching using boosted classifiers.” Kejriwal and Miranker (2015)
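As a rough illustration of what the two splits imply, the sketch below assumes the standard k-fold convention: the labeled data is cut into k equal folds and one fold is held out for validation. Under that assumption a ten-fold split trains on 90% of the labeled data per fold and a two-fold split on 50%. The function names are hypothetical.

```python
def k_fold_splits(n_items: int, k: int):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def train_fraction(n_items: int, k: int) -> float:
    """Fraction of labeled items used for training in the first fold."""
    train, _ = next(k_fold_splits(n_items, k))
    return len(train) / n_items


if __name__ == "__main__":
    for k in (10, 2):
        print(f"{k}-fold: {train_fraction(100, k):.0%} of labeled data used for training")
```

In practice the indices would be shuffled (or stratified by label) before splitting; that step is left out to keep the sketch short.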


Let’s do an experiment

Split            Labeled data (% of ground-truth)   Precision   Recall
Ten-fold split   10%                                54.13%      25.77%
Ten-fold split   50%                                61.51%      28.77%
Ten-fold split   90%                                73.27%      27.69%
Two-fold split   10%                                45.47%      35.64%
Two-fold split   50%                                55.50%      34.92%
Two-fold split   90%                                66.67%      36.92%

Results for the Amazon-GoogleProducts benchmark, using MLP

Consistent results across two other benchmarks, and several experimental controls...

Concluding the exercise

What if I care more about recall than precision?

I should choose a two-fold split (unlike what the literature would suggest)

What if I have sampled and labeled a lot of data (say 90% of the estimated ground-truth)?

An irrelevant concern, once the metric is specified

Takeaway: Some model selection decisions can bias other model selection decisions, not always in an obvious way
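The conclusion can be checked directly against the reported numbers. The sketch below encodes the Amazon-GoogleProducts results and asks which split scores higher on a chosen metric at each labeling level. It assumes the first three result rows belong to the ten-fold split and the last three to the two-fold split, following the order of the labels on the slide; `preferred_split` is a hypothetical helper.

```python
# (labeled data %, precision %, recall %) as reported for Amazon-GoogleProducts.
# Assumption: rows are grouped ten-fold first, then two-fold.
RESULTS = {
    "ten-fold": [(10, 54.13, 25.77), (50, 61.51, 28.77), (90, 73.27, 27.69)],
    "two-fold": [(10, 45.47, 35.64), (50, 55.50, 34.92), (90, 66.67, 36.92)],
}
METRIC_COLUMN = {"precision": 1, "recall": 2}


def preferred_split(metric: str, labeled_pct: int) -> str:
    """Return the split with the higher value of `metric` at a labeling level."""
    column = METRIC_COLUMN[metric]

    def score(split: str) -> float:
        row = next(r for r in RESULTS[split] if r[0] == labeled_pct)
        return row[column]

    return max(RESULTS, key=score)


winners = {
    metric: {pct: preferred_split(metric, pct) for pct in (10, 50, 90)}
    for metric in METRIC_COLUMN
}

if __name__ == "__main__":
    for metric, by_pct in winners.items():
        print(metric, by_pct)
```

Under that assumption the preferred split depends only on the metric: two-fold wins on recall and ten-fold wins on precision at all three labeling levels, which is why the amount of labeled data stops mattering once the metric is fixed.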

How do we make informed model selection decisions?

Decision-making and Model Selection

Cognitive psychology has shown (empirically) that human beings are neither logical nor rational

Wason Selection Task

Prospect Theory (awarded the 2002 Nobel Prize for Economics)

“Reasoning about a rule.” Wason (1968)

“The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task.” Cosmides (1989)

“Prospect theory: an analysis of decision under risk.” Kahneman and Tversky (1979)

One systematic method is to start by...

Visualizing decision-making biases through capturing influences between decisions

[Figure: nodes labeled Labeling budget, Computational resources, Training/Validation split, Performance Metric]

Concise approach: bipartite graphs

“Bipartite graphs and their applications.” Asratian et al. (1998)

[Figure: bipartite graph over Labeling budget, Computational resources, Training/Validation split and Performance Metric; legend: Decision, Node of influence]

The interpretation of the nodes and edges is abstract (we don’t impose strict requirements)
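One possible way to hold such a graph in code is sketched below. The slide does not say which of the four nodes are decisions and which are nodes of influence, nor which edges exist, so the partition and the edge list here are purely hypothetical placeholders; only the node labels come from the figure.

```python
from collections import defaultdict

# Hypothetical partition of the four labeled nodes into the two sides.
INFLUENCES = {"labeling budget", "computational resources"}
DECISIONS = {"training/validation split", "performance metric"}

# Hypothetical edges (node of influence -> decision), for illustration only.
EDGES = [
    ("labeling budget", "training/validation split"),
    ("computational resources", "training/validation split"),
    ("labeling budget", "performance metric"),
]


def build_bipartite(influences, decisions, edges):
    """Map each decision to the sorted list of nodes that influence it.

    Raises ValueError if an edge does not run from the influence side
    to the decision side, which would break the bipartite structure.
    """
    incoming = defaultdict(set)
    for source, target in edges:
        if source not in influences or target not in decisions:
            raise ValueError(f"edge {source!r} -> {target!r} is not bipartite")
        incoming[target].add(source)
    return {decision: sorted(incoming[decision]) for decision in sorted(decisions)}


influences_on = build_bipartite(INFLUENCES, DECISIONS, EDGES)

if __name__ == "__main__":
    for decision, sources in influences_on.items():
        print(f"{decision} <- {sources}")
```

Adding or removing a tuple in `EDGES` corresponds to hypothesizing that a bias should or should not exist.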

Hypothesizing about biases

The art in model selection: are there edges we should consider removing/adding?

In the paper, we form at least four hypotheses that directly translate to recommendations

[Figure: bipartite graph over Labeling budget, Computational resources, Training/Validation split and Performance Metric]

Experimental platform

Collected over 25 GB of data on the Microsoft Azure ML platform

Used three publicly available benchmarks

Efficiency Recommendation 1

Validation is usually much faster than training, especially for expressive classifiers

Run-time reductions of almost 70% with proportionally less loss in effectiveness

Recommendation: consider favoring more validation over training if speed is an important concern

Efficiency Recommendation 2

Validation is usually much faster than training, especially for expressive classifiers

Grid search is no more effective than random search for default hyperparameter values

Mean difference less than 0.99% and not statistically significant

Recommendation: Favor random search in your hyperparameter optimization as it is much faster (over 90% run-time decrease)
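The cost difference between the two strategies comes from how many configurations each one evaluates. The sketch below contrasts them on a small, hypothetical MLP search space with a toy scoring function standing in for a real validation run; none of the values or the trial count come from the experiments, and it does not reproduce the reported run-time figures.

```python
import itertools
import random

# Hypothetical search space; the candidate values are illustrative only.
SPACE = {
    "hidden_layers": [1, 2, 3],
    "nodes_per_layer": [10, 50, 100, 200],
    "learning_rate": [0.001, 0.01, 0.1],
}


def toy_score(config: dict) -> float:
    """Stand-in for a validation metric; NOT a real model evaluation."""
    return -abs(config["hidden_layers"] - 2) - abs(config["learning_rate"] - 0.01)


def grid_search(space: dict, score):
    """Evaluate every combination; return (best config, evaluations made)."""
    keys = list(space)
    configs = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    return max(configs, key=score), len(configs)


def random_search(space: dict, score, n_trials: int, seed: int = 0):
    """Evaluate n_trials randomly drawn configs; return (best, evaluations made)."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]
    return max(configs, key=score), len(configs)


if __name__ == "__main__":
    best_grid, n_grid = grid_search(SPACE, toy_score)
    best_rand, n_rand = random_search(SPACE, toy_score, n_trials=4)
    print(f"grid:   {n_grid} evaluations, best {best_grid}")
    print(f"random: {n_rand} evaluations, best {best_rand}")
```

Each evaluation corresponds to training and validating one model, so the number of evaluations is a reasonable proxy for run time; whether the smaller random sample loses effectiveness is the empirical question the recommendation addresses.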


Concluding notes

Hard problems (e.g. instance matching) require an ingenious combination of heuristics, biases and models

Understanding decision-making biases can help us do better model selection

Can also help to identify experimental confounds!

There are many proposals to visualize decision-making, but not decision-making bias

We proposed a bipartite graph as a good candidate

The visualization is not just a pedantic exercise

About 25 GB of data shows that it can also be useful

Many future directions!

kejriwalresearch.azurewebsites.net

https://sites.google.com/a/utexas.edu/mayank-kejriwal/projects/semantics-and-model-selection

What biases go into your model selection process?
