9.94 The cognitive science of intuitive theories
J. Tenenbaum, T. Lombrozo,
L. Schulz, R. Saxe
Plan for the class
• Goal: Introduce you to an exciting field of research unfolding at MIT and in the broader world.
• An integrated series of talks and discussions
– Today: Josh Tenenbaum (MIT), “Computational models of theory-based inductive inference”
– Tuesday: Tania Lombrozo (Harvard/Berkeley) “Explanation in intuitive theories”
– Wednesday: Rebecca Saxe (Harvard/MIT), “Understanding other minds”
– Thursday: Laura Schulz (MIT), “Theories and evidence”
– Friday: Special Mystery Guest, “When theories fail”
• Requirements for credit (pass/fail, 3 units)
– Attend the classes
– Participate in discussions
– Take-home quiz:
• Emailed to you this weekend (after the class)
• Due back to me by email on Wednesday, Feb. 1
If you are not registered on the list below, make sure to register and send me an email message at: [email protected]
Class list
Azeez,Zainab O
Belova,Nadezhda
Brenman,Stephanie V
Cherian,Tharian S
Clark,Abigail M
Clark,Samuel D
Curry,Justin M
Dai,Jizi
Dean,Clare
Liu,Yicong
McGuire,Salafa'anius
Ovadya,Aviv
Poon,Samuel H
Pradhan,Nikhil T
Ren,Danan T
Rotter,Juliana C
Slagle,Amy M
Tang,Di
Ferranti,Darlene E
Frazier,Jonathan J
Gordon,Matthew A
Green,Delbert A
Huhn,Anika M
Hunt,Beatrice P.
Kamrowski,Kaitlin M
Kanaga,Noelle J
Kwon,Jennifer
Taub,Daniel M
Tung,Roland
Voelbel,Kathleen
Vosoughi,Soroush
Willmer,Anjuli J
Ye,Diana F
Yuen,Grace J
Zhao,Bo
Scheduling
• Today: go to 4:30 (with a break in the middle)?
• Friday: an hour earlier, 1:00 – 3:00?
The big problem of intelligence
How does the mind get so much out of so little?
Three-dimensional:
Two-dimensional:
The big problem of intelligence
How can we generalize new concepts reliably from just one or a few examples? – Learning word meanings
“horse” “horse” “horse”
The objects of planet Gazoob: "tufa" "tufa" "tufa"
The big problem of intelligence
How can we generalize new concepts reliably from just one or a few examples? – Learning about new properties of categories
Cows have T4 hormones. Bees have T4 hormones. Salmon have T4 hormones. → Humans have T4 hormones.
Cows have T4 hormones. Goats have T4 hormones. Sheep have T4 hormones. → Humans have T4 hormones.
The big problem of intelligence
How do we use concepts in ways that go beyond our experience?
• “dog”
• Is it still a dog if you…
– Put a penguin costume on it?
– Surgically alter it until it looks just like a penguin?
– Pre-natally inject a substance that causes it to look just like a penguin? … and it can mate with penguins and produce penguin offspring?
The big problem of intelligence
How do we use concepts in ways that go beyond our experience?
• Two cars were reported stolen by the Groveton police yesterday.
• The judge sentenced the killer to die in the electric chair for the second time.
• No one was injured in the blast, which was attributed to a buildup of gas by one town official.
• One witness told the commissioners that she had seen sexual intercourse taking place between two parked cars in front of her house.
The big problem of intelligence
How do we use concepts in ways that go beyond our experience?
Consider a man named Boris. – Is the mother of Boris’s father his grandmother?
– Is the mother of Boris’s sister his mother?
– Is the son of Boris’s sister his son?
(Note: Boris and his family were stranded on a desert island when he was a young boy.)
What makes us so smart?
• Memory?
• Logical inference?
What makes us so smart?
• Memory? No.
– The difference between a test that you can pass on rote memory and a test that shows whether you "actually learned something".
• Logical inference? No.
– The difference between deductive inference and inductive inference.
Modes of inference
• Deductive inference:
All mammals have biotinic acid in their blood. Horses are mammals. → Horses have biotinic acid in their blood.
• Inductive inference:
Horses have biotinic acid in their blood. Horses are mammals. → All mammals have biotinic acid in their blood.
What makes us so smart?
• Intuitive theories
– Systems of concepts that are in some important respects like scientific theories.
– Abstract knowledge that supports prediction, explanation, exploration, and decision-making for an infinite range of situations that we have not previously encountered.
Some questions about intuitive theories
• What is their content?
• How are they represented in the mind or brain?
• How are they used to generalize to new situations?
• How are they acquired?
• Can they be described in computational terms?
• In what essential ways are they similar to or different from scientific theories?
• How good (accurate, comprehensive, rich) are they, under what circumstances? What can we learn from their failures?
What can we learn from perceptual or cognitive illusions?
• Goal of visual perception is to recover world structure from visual images.
• Why the problem is hard: many world structures can produce the same visual input.
[Diagram: many scene hypotheses are consistent with the same image data]
• Illusions reveal the visual system’s implicit theories of the physical world and the process of image formation.
Computational models of theory-based inductive inference
Josh TenenbaumDepartment of Brain and Cognitive Sciences
Computer Science and Artificial Intelligence Laboratory
MIT
Plan for today
• A general framework for solving under-constrained inference problems
– Bayesian inference
• Applications in perception and cognition
– lightness perception
– predicting the future (with Tom Griffiths)
– learning about properties of natural species (with Charles Kemp)
Modes of inference
• Deductive inference (logic):
All mammals have biotinic acid in their blood. Horses are mammals. → Horses have biotinic acid in their blood.
• Inductive inference (probability):
Horses have biotinic acid in their blood. Horses are mammals. → All mammals have biotinic acid in their blood.
Bayesian inference
• Definition of conditional probability:
P(A, B) = P(A) P(B|A) = P(B) P(A|B)
• Bayes' rule:
P(h|d) = P(d|h) P(h) / Σᵢ P(d|hᵢ) P(hᵢ)
• "Posterior probability": P(h|d)
• "Prior probability": P(h)
• "Likelihood": P(d|h)
Bayesian inference
• Bayes' rule: P(h|d) = P(d|h) P(h) / Σᵢ P(d|hᵢ) P(hᵢ)
• An example
– Data: John is coughing
– Some hypotheses:
1. John has a cold
2. John has emphysema
3. John has a stomach flu
– Prior P(h) favors 1 and 3 over 2
– Likelihood P(d|h) favors 1 and 2 over 3
– Posterior P(h|d) favors 1 over 2 and 3
Bayesian inference
• Bayes' rule: P(h|d) = P(d|h) P(h) / Σᵢ P(d|hᵢ) P(hᵢ)
• What makes a good scientific argument? P(h|d) is high if:
– Hypothesis is plausible: P(h) is high
– Hypothesis strongly predicts the observed data: P(d|h) is high
– Data are surprising: P(d) = Σᵢ P(d|hᵢ) P(hᵢ) is low
Coin flipping
HHTHT
HHHHH
What process produced these sequences?
Comparing two simple hypotheses
• Contrast simple hypotheses:
– H1: "fair coin", P(H) = 0.5
– H2: "always heads", P(H) = 1.0
• Bayes' rule: P(H|D) = P(D|H) P(H) / P(D)
• With two hypotheses, use the odds form:
P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
Comparing two simple hypotheses
D: HHTHT
H1: "fair coin"; H2: "always heads"
P(D|H1) = 1/2^5; P(H1) = ?
P(D|H2) = 0; P(H2) = 1 - ?
P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] × [P(H1) / P(H2)]
Comparing two simple hypotheses
D: HHTHT
H1: "fair coin"; H2: "always heads"
P(D|H1) = 1/2^5; P(H1) = 999/1000
P(D|H2) = 0; P(H2) = 1/1000
P(H1|D) / P(H2|D) = [(1/32) / 0] × [999 / 1] = infinity
Comparing two simple hypotheses
D: HHHHH
H1: "fair coin"; H2: "always heads"
P(D|H1) = 1/2^5; P(H1) = 999/1000
P(D|H2) = 1; P(H2) = 1/1000
P(H1|D) / P(H2|D) = [(1/32) / 1] × [999 / 1] = 999/32 ≈ 31
Comparing two simple hypotheses
D: HHHHHHHHHH
H1: "fair coin"; H2: "always heads"
P(D|H1) = 1/2^10; P(H1) = 999/1000
P(D|H2) = 1; P(H2) = 1/1000
P(H1|D) / P(H2|D) = [(1/1024) / 1] × [999 / 1] = 999/1024 ≈ 1
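The three posterior-odds calculations above are easy to check numerically. A minimal Python sketch (my own illustration, not part of the slides), using the same priors of 999/1000 and 1/1000:

```python
def posterior_odds(sequence, p_h1=0.999, p_h2=0.001):
    """Posterior odds of H1 ("fair coin") over H2 ("always heads")."""
    like_h1 = 0.5 ** len(sequence)                         # fair coin: (1/2)^n
    like_h2 = 1.0 if all(c == "H" for c in sequence) else 0.0
    if like_h2 == 0.0:
        return float("inf")                                # any tails refutes H2
    return (like_h1 * p_h1) / (like_h2 * p_h2)

print(posterior_odds("HHTHT"))   # inf
print(posterior_odds("HHHHH"))   # 999/32 = 31.21875
print(posterior_odds("H" * 10))  # 999/1024, roughly even odds
```

A single tails observation refutes "always heads" outright; five heads in a row still leaves the fair coin favored about 31 to 1, and only around ten heads do the two hypotheses pull even.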
The role of intuitive theories
The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
– Easy to imagine how a trick all-heads coin could work: high prior probability.
– Hard to imagine how a trick "HHTHT" coin could work: low prior probability.
Plan for today
• A general framework for solving under-constrained inference problems
– Bayesian inference
• Applications in perception and cognition
– lightness perception
– predicting the future (with Tom Griffiths)
– learning about properties of natural species (with Charles Kemp)
Gelb / Gilchrist demo
Explaining the illusion
• The problem of lightness constancy
– Separating the intrinsic reflectance ("color") of a surface from the intensity of the illumination.
• Anchoring heuristic:
– Assume that the brightest patch in each scene is white.
• Questions:
– Is this really right?
– Why (and when) is it a good solution to the problem of lightness constancy?
Why is lightness constancy hard?
• The physics of light reflection:
L = I x R
L: luminance (light emitted from surface)
I: intensity of illumination in the world
R: reflectance of surface in the world
• The problem: Given L, solve for I and R.
Why is lightness constancy hard?
• The physics of light reflection:
L1 = I x R1
L2 = I x R2
...
Ln = I x Rn
• The problem: Given L1, …, Ln, solve for I and R1, …, Rn.
Why is lightness constancy hard?
Image data: L = {2, 4, 5, 9}
Scene hypotheses (L = I × R):
– I = 10, R = {0.2, 0.4, 0.5, 0.9}
– I = 100, R = {0.02, 0.04, 0.05, 0.09}
– I = 15, R = {0.13, 0.26, 0.33, 0.60}
A simplified theory of the visual world
• Really bright illuminants are rare. [Plot: P(I) falls off as I grows]
• Any surface color is equally likely. [Plot: P(Ri) uniform from 0 (black) to 1 (white)]
• Observed luminances, Li = I × Ri, are a random sample from 0 to I. [Plot: P(Li|I) uniform on [0, I]; a brighter illuminant I′ spreads the same probability over a wider range]
Image data: L = {9}
[Diagram: scene hypotheses h1 (I = 10), h2 (I = 15), h3 (I = 100); priors P(h1) high, P(h2) medium, P(h3) low]
P(h|d) = P(d|h) P(h) / Σᵢ P(d|hᵢ) P(hᵢ)

Image data: L = {2, 4, 5, 9}
[Same diagram] Prior probability alone can't explain how inference changes with more data.

For n luminances sampled uniformly on [0, I], the likelihood is P(d|h) = (1/I)^n.

Image data: L = {9}: likelihoods 1/10, 1/15, 1/100
Image data: L = {2, 4, 5, 9}: likelihoods (1/10)^4, (1/15)^4, (1/100)^4
Graphing the likelihood
[Plot: p(L = l | I) as a function of l for I = 10 and I = 15; the single observation l1 = 9 is marked]
p({l1} | I = 10) ~ p({l1} | I = 15)

Graphing the likelihood
[Plot: the same curves with observations l = 2, 4, 5, 9 marked]
p({l1, l2, l3, l4} | I = 10) >> p({l1, l2, l3, l4} | I = 15)
Explaining lightness constancy
• Anchoring heuristic: Assume that the brightest patch in each scene is white.
– Is this really right?
– Why (and when) is it a good solution to the problem?
• Bayesian analysis
– Explains the computational basis for inference.
– Explains why confidence in "brightest = white" increases as more samples are observed.
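This analysis can be sketched in code. The (1/I)^n likelihood follows the uniform-sampling theory above; the exponential prior over I and its scale are my own illustrative assumptions:

```python
import math

def posterior_over_I(luminances, candidates=(10, 15, 100), scale=20.0):
    """Posterior over illuminant intensity I given observed luminances."""
    n = len(luminances)
    scores = {}
    for I in candidates:
        if I < max(luminances):
            scores[I] = 0.0                   # I cannot produce a brighter luminance
        else:
            prior = math.exp(-I / scale)      # really bright illuminants are rare
            likelihood = (1.0 / I) ** n       # uniform samples on [0, I]
            scores[I] = prior * likelihood
    z = sum(scores.values())
    return {I: s / z for I, s in scores.items()}

one = posterior_over_I([9])
four = posterior_over_I([2, 4, 5, 9])
print(one[10] < four[10])  # True: more samples sharpen belief in I = 10
```

With a single luminance the prior does most of the work; with four samples the likelihood term dominates, which is exactly why confidence in "brightest = white" grows with more observed patches.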
Applications to cognition
• Predicting the future (with Tom Griffiths)
• Learning about properties of natural species (with Charles Kemp)
Everyday prediction problems
• You read about a movie that has made $60 million to date. How much money will it make in total?
• You see that something has been baking in the oven for 34 minutes. How long until it’s ready?
• You meet someone who is 78 years old. How long will they live?
• Your friend quotes to you from line 17 of his favorite poem. How long is the poem?
• You see taxicab #107 pull up to the curb in front of the train station. How many cabs in this city?
Making predictions
• You encounter a phenomenon that has existed for t_past units of time. How long will it continue into the future? (i.e., what's t_total?)
• We could replace “time” with any other variable that ranges from 0 to some unknown upper limit (c.f. lightness).
Bayesian inference
P(t_total | t_past) ∝ P(t_past | t_total) P(t_total)
posterior ∝ likelihood × prior
Bayesian inference
P(t_total | t_past) ∝ P(t_past | t_total) P(t_total)
P(t_total | t_past) ∝ (1/t_total) P(t_total)
posterior ∝ likelihood × prior
Assume a random sample (0 < t_past < t_total), so the likelihood is P(t_past | t_total) = 1/t_total.
Bayesian inference
P(t_total | t_past) ∝ P(t_past | t_total) P(t_total)
P(t_total | t_past) ∝ (1/t_total) (1/t_total)
posterior ∝ likelihood × prior
"Uninformative" prior: P(t_total) ∝ 1/t_total
Assume random sample (0 < t_past < t_total)
How about the maximal value of P(t_total | t_past)?
Bayesian inference
P(t_total | t_past) ∝ (1/t_total) (1/t_total)
What is the best guess for t_total?
[Plot: the posterior P(t_total | t_past) is maximal at t_total = t_past and falls off as t_total grows]
Random sampling; "uninformative" prior.
Bayesian inference
What is the best guess for t_total? Instead, compute t such that P(t_total > t | t_past) = 0.5:
P(t_total | t_past) ∝ (1/t_total) (1/t_total)
[Plot: the posterior with its median marked]
Random sampling; "uninformative" prior.
Bayesian inference
What is the best guess for t_total? Compute t such that P(t_total > t | t_past) = 0.5.
This yields Gott's Rule: P(t_total > t | t_past) = 0.5 when t = 2 × t_past,
i.e., the best guess for t_total is 2 × t_past.
P(t_total | t_past) ∝ (1/t_total) (1/t_total)
Random sampling; "uninformative" prior.
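Gott's Rule can be checked numerically. The sketch below (my own check, not from the slides) discretizes the posterior P(t_total | t_past) ∝ 1/t_total² on a log-spaced grid and reads off its median:

```python
import math

def posterior_median(t_past, t_max=1e6, steps=200_000):
    """Median of the posterior proportional to 1/t^2 on [t_past, t_max]."""
    lo, hi = math.log(t_past), math.log(t_max)
    dlog = (hi - lo) / steps
    ts = [math.exp(lo + (i + 0.5) * dlog) for i in range(steps)]
    # Midpoint rule with change of variables: dt = t * dlog
    weights = [(1.0 / t ** 2) * t * dlog for t in ts]
    half = sum(weights) / 2
    run = 0.0
    for t, w in zip(ts, weights):
        run += w
        if run >= half:
            return t
    return t_max

print(posterior_median(34.0))  # ~68.0 = 2 * 34: Gott's Rule
```

The cutoff t_max only truncates a vanishing tail, so the numeric median lands at twice t_past, matching the analytic rule.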
Evaluating Gott’s Rule
• You read about a movie that has made $78 million to date. How much money will it make in total?
– "$156 million" seems reasonable.
• You meet someone who is 35 years old. How long will they live?
– "70 years" seems reasonable.
• Not so simple:– You meet someone who is 78 years old. How long will they live?
– You meet someone who is 6 years old. How long will they live?
The effects of priors
• Different kinds of priors P(t_total) are appropriate in different domains.
Gott: P(t_total) ∝ 1/t_total
[Plots: power-law priors (e.g., wealth, contacts) vs. Gaussian priors (e.g., height, lifespan)]
Evaluating human predictions
• Different domains with different priors:
– A movie has made $60 million
– Your friend quotes from line 17 of a poem
– You meet a 78 year old man
– A movie has been running for 55 minutes
– A U.S. congressman has served for 11 years
– A cake has been in the oven for 34 minutes
• Use 5 values of t_past for each.
• People predict t_total.
You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh's reign. How long did he reign?
How long did the typical pharaoh reign in ancient Egypt?
Assumptions guiding inference
• Random sampling
• Strong prior knowledge
– Form of the prior (power-law or exponential)
– Specific distribution given that form (parameters)
– Non-parametric distribution when necessary.
• With these assumptions, strong predictions can be made from a single observation.
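The age examples above show how a non-power-law prior changes the rule. A sketch with a Gaussian lifespan prior (the mean of 75 and spread of 16 years are my illustrative assumptions, not values from the slides):

```python
import math

def lifespan_median(t_past, mean=75.0, sd=16.0):
    """Posterior median of t_total under a Gaussian prior, given age t_past."""
    # Grid posterior: P(t_total | t_past) is proportional to
    # (1/t_total) * Normal(t_total; mean, sd), restricted to t_total > t_past.
    ts = [t_past + 0.1 * i for i in range(1, 1000)]
    weights = [(1.0 / t) * math.exp(-((t - mean) ** 2) / (2 * sd * sd)) for t in ts]
    half = sum(weights) / 2
    run = 0.0
    for t, w in zip(ts, weights):
        run += w
        if run >= half:
            return t
    return ts[-1]

print(lifespan_median(6))   # near the prior mean: far more than 2 * 6 = 12
print(lifespan_median(78))  # a bit past 78: far less than 2 * 78 = 156
```

Under a Gaussian prior the prediction is pulled toward the prior mean rather than doubling t_past, which is why "twice the current age" fails for both the 6-year-old and the 78-year-old.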
Applications to cognition
• Predicting the future (with Tom Griffiths)
• Learning about properties of natural species (with Charles Kemp)
Which argument is stronger?
Cows have biotinic acid in their blood. Horses have biotinic acid in their blood. Rhinos have biotinic acid in their blood. → All mammals have biotinic acid in their blood.
Cows have biotinic acid in their blood. Dolphins have biotinic acid in their blood. Squirrels have biotinic acid in their blood. → All mammals have biotinic acid in their blood.
“Diversity phenomenon”
Osherson, Smith, Wilkie, Lopez, Shafir (1990):
• 20 subjects rated the strength of 45 arguments:
X1 have property P.
X2 have property P.
X3 have property P.
All mammals have property P.
• 40 different subjects rated the similarity of all pairs of 10 mammals.
Traditional psychological models
Osherson et al. consider two similarity-based models:
• Sum-Similarity: P(all mammals | X) ∝ Σ_{j ∈ mammals} Σ_{i ∈ X} sim(i, j)
• Max-Similarity: P(all mammals | X) ∝ Σ_{j ∈ mammals} max_{i ∈ X} sim(i, j)
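Both models are a few lines of code. In this sketch (the similarity values are invented for illustration) Max-sim rewards a diverse premise pair while Sum-sim can prefer the less diverse one:

```python
# Toy symmetric similarities among four mammals (values invented).
SIM = {
    ("cow", "cow"): 1.0, ("horse", "horse"): 1.0,
    ("dolphin", "dolphin"): 1.0, ("squirrel", "squirrel"): 1.0,
    ("cow", "horse"): 0.9, ("cow", "dolphin"): 0.2, ("cow", "squirrel"): 0.3,
    ("horse", "dolphin"): 0.2, ("horse", "squirrel"): 0.3,
    ("dolphin", "squirrel"): 0.2,
}

def sim(a, b):
    return SIM.get((a, b), SIM.get((b, a)))

CATEGORIES = ["cow", "horse", "dolphin", "squirrel"]

def sum_sim(X):
    # Sum over categories of the summed similarity to the premise set X
    return sum(sim(i, j) for j in CATEGORIES for i in X)

def max_sim(X):
    # Sum over categories of the best match in the premise set X
    return sum(max(sim(i, j) for i in X) for j in CATEGORIES)

diverse, close = ["cow", "dolphin"], ["cow", "horse"]
print(max_sim(diverse) > max_sim(close))  # True: Max-sim shows the diversity effect
print(sum_sim(diverse) > sum_sim(close))  # False: Sum-sim reverses it
```

The close pair racks up redundant similarity under Sum-sim, while Max-sim only credits the best match per category, which is one intuition for why Max-sim fits human judgments better.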
Data vs. models
[Scatter plots: model predictions vs. human ratings of argument strength]
Each point represents one argument:
X1 have property P. X2 have property P. X3 have property P. → All mammals have property P.
Open questions
• Explaining similarity:
– Why does Max-sim fit so well? When worse?
– Why does Sum-sim fit so poorly? When better?
• Explaining Max-sim:
– Is there some rational computation that Max-sim implements or approximates?
– What theory about this task and domain is implicit in Max-sim?
(c.f., analysis of lightness constancy)
A simplified theory of biology
• Species generated by an evolutionary branching process.
– A tree-structured taxonomy of species.
• Taxonomy also central in folkbiology (Atran).
Theory-based Bayesian model
Begin by reconstructing the intuitive taxonomy from similarity judgments:
[Dendrogram from hierarchical clustering over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal]
Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of the novel property.
[The same tree with clusters marked h1, h3, h6, h17, …, up to h0: "all mammals"]
Theory-based Bayesian model
[Taxonomy over the ten mammals]
p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)
p(X | h) = 1 / size(h)^n if x1, …, xn are all in h; 0 if any xi is not in h
h0: "all mammals"
p(h): uniform
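A minimal sketch of this taxonomic Bayesian model (the cluster inventory below is invented; the real hypothesis space contains every cluster in the tree), showing the diversity effect fall out of the size-based likelihood:

```python
# Sketch: uniform prior p(h) plus the size-based likelihood
# p(X|h) = (1/size(h))^n when X is contained in h, else 0.
MAMMALS = {"chimp", "gorilla", "horse", "cow", "elephant",
           "rhino", "mouse", "squirrel", "dolphin", "seal"}
H = [MAMMALS,                                    # h0: "all mammals"
     {"horse", "cow", "elephant", "rhino"},      # a "large herbivores" cluster
     {"chimp", "gorilla"},
     {"mouse", "squirrel"},
     {"dolphin", "seal"}]

def p_all_mammals(X):
    """Posterior probability of h0 given that all species in X have the property."""
    def likelihood(h):
        return (1.0 / len(h)) ** len(X) if X <= h else 0.0
    prior = 1.0 / len(H)                         # uniform p(h)
    scores = [likelihood(h) * prior for h in H]
    return scores[0] / sum(scores)               # H[0] is h0

diverse = p_all_mammals({"cow", "dolphin", "squirrel"})
close = p_all_mammals({"cow", "horse", "rhino"})
print(diverse > close)  # True: diverse premises give the stronger argument
```

The close premise set fits inside the small "large herbivores" cluster, which the size-based likelihood strongly favors, so the "all mammals" conclusion gets little posterior mass; the diverse set rules out every cluster but h0.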
How taxonomy constrains induction
• Atran (1998): "Fundamental principle of systematic induction" (Warburton 1967, Bock 1973)
– Given a property found among members of any two species, the best initial hypothesis is that the property is also present among all species that are included in the smallest higher-order taxon containing the original pair of species.
[Tree: the smallest taxon containing cows, dolphins, and squirrels is "all mammals"]
Cows have property P. Dolphins have property P. Squirrels have property P. → All mammals have property P.
Strong (0.76 [max = 0.82])
[Tree: cows, horses, and rhinos fall under the smaller taxon "large herbivores"]
Cows have property P. Dolphins have property P. Squirrels have property P. → All mammals have property P. (Strong: 0.76 [max = 0.82])
Cows have property P. Horses have property P. Rhinos have property P. → All mammals have property P. (Weak: 0.17 [min = 0.14])
[Tree: seals and dolphins sit close together, so they add little premise diversity]
Cows have property P. Dolphins have property P. Squirrels have property P. → All mammals have property P. (Strong: 0.76 [max = 0.82])
Seals have property P. Dolphins have property P. Squirrels have property P. → All mammals have property P. (Weak: 0.30 [min = 0.14])
[Scatter plots: human judgments vs. Bayes (taxonomic), Max-sim, and Sum-sim; conclusion kinds "all mammals" and "horses", with 1, 2, or 3 examples]
A simplified theory of biology
• Species generated by an evolutionary branching process.– A tree-structured taxonomy of species.
• Features generated by stochastic mutation process and passed on to descendants. – Novel features can appear anywhere in tree, but
some distributions are more likely than others.
Hypothesis space H: each taxonomic cluster is a possible hypothesis for the extension of a novel feature.
[Tree over chimp, gorilla, horse, cow, elephant, rhino, mouse, squirrel, dolphin, seal, with clusters marked h1, h3, h6, h17, …, up to h0: "all mammals"]
Theory-based Bayesian model
Generate hypotheses for the novel feature F via a (Poisson arrival) mutation process over branches b:
p(F develops along branch b) = 1 - e^(-λ|b|)
[Tree over the ten mammals, with mutations arising on branches and inherited by all descendants]
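The mutation process over branches can be simulated directly. In this sketch the toy tree, the branch lengths, and the rate λ = 0.7 are my own assumptions:

```python
import math
import random
from collections import Counter

# Toy taxonomy: child -> (parent, branch_length); leaves are species.
EDGES = {"primates": ("root", 0.5), "aquatic": ("root", 0.5),
         "chimp": ("primates", 1.0), "gorilla": ("primates", 1.0),
         "dolphin": ("aquatic", 0.2), "seal": ("aquatic", 0.2)}
LEAVES = {"chimp", "gorilla", "dolphin", "seal"}

def leaves_below(node):
    if node in LEAVES:
        return {node}
    return set().union(*(leaves_below(c) for c, (p, _) in EDGES.items() if p == node))

def sample_extension(lam, rng):
    """Sample the set of species carrying a novel feature F."""
    ext = set()
    for child, (_, b) in EDGES.items():
        if rng.random() < 1.0 - math.exp(-lam * b):  # mutation arises on this branch
            ext |= leaves_below(child)               # and is inherited below it
    return frozenset(ext)

rng = random.Random(0)
counts = Counter(sample_extension(0.7, rng) for _ in range(20000))
mono = counts[frozenset({"chimp", "gorilla"})]
poly = counts[frozenset({"chimp", "dolphin"})]
print(mono > poly)  # True: fewer cuts, on longer branches, get more prior mass
```

Monophyletic extensions like {chimp, gorilla} need only one mutation on a long branch, while polyphyletic ones like {chimp, dolphin} need two independent mutations, so the sampled prior concentrates on tree-respecting labelings.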
Samples from the prior
• Labelings that cut the data along longer branches are more probable:
[Two labeled trees over the ten mammals: the labeling that cuts a longer branch has higher prior probability]
Samples from the prior
• Labelings that cut the data along fewer branches are more probable:
[Two labeled trees: a "monophyletic" labeling (one cut) has higher prior probability than a "polyphyletic" labeling (several cuts)]
[Taxonomy over the ten mammals]
p(all mammals | X) = p(X | h0) p(h0) / Σ_{h ∈ H} p(X | h) p(h)
p(X | h) = 1 / size(h)^n if x1, …, xn are all in h; 0 if any xi is not in h
h0: "all mammals"
p(h): "evolutionary" process (mutation + inheritance)
[Scatter plots: human judgments vs. Bayes (taxonomic), Max-sim, and Sum-sim; conclusion kinds "all mammals" and "horses", with 1, 2, or 3 examples]
[Scatter plots: the same comparison for Bayes (taxonomy + mutation)]
Explaining similarity
• Why does Max-sim fit so well?
– An efficient and accurate approximation to the Bayesian (evolutionary) model.
– Correlation (r) with Bayes on three-premise general arguments, over 100 simulated tree structures: mean r = 0.94.
– There's also a theorem.
Biology: Summary
• Theory-based statistical inference explains inductive reasoning in folk biology.
• Mathematical modeling reveals people's implicit theories about the world.
– Category structure: taxonomic tree.
– Feature distribution: stochastic mutation process + inheritance.
• Clarifies traditional psychological models.
– Why Max-sim over Sum-sim?
Beyond taxonomic similarity
• Generalization based on known dimensions (Smith et al., 1993; Blok et al., 2002):
Poodles can bite through wire. → German shepherds can bite through wire.
Dobermans can bite through wire. → German shepherds can bite through wire.
• Generalization based on causal relations (Medin et al., 2004; Shafto & Coley, 2003):
Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.
Grizzly bears carry E. Spirus bacteria. → Salmon carry E. Spirus bacteria.
Predicate type → Generative theory:
– "has T4 hormones" → taxonomic tree + mutation
– "can bite through wire" → directed chain + unknown threshold
– "carries E. Spirus bacteria" → directed network + noisy transmission
[Diagrams: hypothesis spaces over classes A–G generated by a taxonomic tree, a directed chain, and a directed network]
Island ecosystem
[Two structures over kelp, herring, tuna, mako shark, sand shark, dolphin, human: a taxonomy and a food web]
(Shafto, Kemp, Baraff, Coley, Tenenbaum)
Datasets vs. models (correlation r):

                                     Bayes (food web)   Bayes (tree)   Max-sim
Mammal ecosystem: disease                 0.75             -0.15         0.07
Mammal ecosystem: genetic property        0.25              0.92         0.87
Island ecosystem: disease                 0.79              0.01         0.17
Island ecosystem: genetic property        0.31              0.89         0.86
Assumptions guiding inferences
• Qualitatively different priors are appropriate for different domains of inductive generalization.
• In each domain, a prior that matches the world’s structure fits people’s inductive judgments better than alternative priors.
• A common framework for representing people’s domain models: a graph structure defined over entities or classes, and a probability distribution for predicates over that graph.
Conclusion
• The hard problem of intelligence: how do we "go beyond the information given"?
• The solution:
– Bayesian statistical inference: P(h|d) = P(d|h) P(h) / Σᵢ P(d|hᵢ) P(hᵢ)
– Implicit theories about the structure of the world, generating P(h) and P(d|h).
Example: Cows have property P. Dolphins have property P. Squirrels have property P. → All mammals have property P.
Discussion
• How is this intuitive theory of biology like or not like a scientific theory?
• In what sense does the visual system have a theory of the world? How is it like or not like a cognitive theory of biology, or a scientific theory?