Commonsense Knowledge Acquisition and Applications
Niket Tandon
Ph.D. Supervisor: Gerhard Weikum
Max Planck Institute for Informatics
Towards Commonsense Enriched Machines
Humans: Climbing a rock
• Rock: hard, brown (property)
• Hand, leg: part of Person
• Scene: Adventurous Activity
• Climber is a Person
Machines: 1 Rock, 2 Hands, 2 Legs, 1 Person
Human-Machine Knowledge Gap
• Commonsense of objects
• Commonsense of relationships
• Commonsense of interactions
How will machines be smarter if we fill this knowledge gap?
• Smarter robots: "Get me a coffee" (where?)
• Smarter vision: better classifiers, e.g. "Monitor or TV?" given mouse, keyboard
• Smarter IR: "Adventurous activities"
Encyclopedic Knowledge: facts about instances/events
• Facts about instances: (A. Honnold, married, Lisa Honnold)
• Their events: (A. Honnold, married on, 19.08.2016)
Commonsense Knowledge: facts about classes/activities
Can we fill the human-machine knowledge gap using existing encyclopedic KBs like Freebase?
Encyclopedic Knowledge: facts about instances
1. EKB acquisition: unimodal
2. EKB curation: textual verification
3. EKB completion: negative training assumptions hold
   If $(e_i, r_k, e_j)$ holds, then $(e_i, r_k, e_j')$ with $e_j' \neq e_j$ is negative.
   Example: (A. Honnold, bornIn, USA) holds, so (A. Honnold, bornIn, UK) is negative.
Commonsense Knowledge: facts about classes
1. CKB acquisition: multimodal
2. CKB curation: textual + visual
3. CKB completion: negative training assumptions fail
   Example: climber, at location, {mountain, university}
EKBs have several functional relations, hence the assumption holds.
[Figure: bar chart of functional vs. non-functional relations for EKB and CKB; y-axis from 0 to 1]
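The negative-training assumption for functional relations can be illustrated with a minimal Python sketch. It treats every relation as functional (one object per subject and relation). The function name `functional_negatives` and the candidate object lists are illustrative assumptions, not part of the original work.

```python
def functional_negatives(positives, candidate_objects):
    """Derive negative triples, assuming every relation is functional.

    If (e_i, r_k, e_j) is a known positive, every (e_i, r_k, e_j')
    with e_j' != e_j is emitted as a negative, unless it is itself known.
    """
    known = set(positives)
    negatives = []
    for subj, rel, obj in positives:
        for other in candidate_objects:
            triple = (subj, rel, other)
            if other != obj and triple not in known:
                negatives.append(triple)
    return negatives


# Encyclopedic case: bornIn is functional, so the derived negative is correct.
ekb_negatives = functional_negatives(
    [("A. Honnold", "bornIn", "USA")], ["USA", "UK"]
)

# Commonsense case: atLocation is not functional. A climber can be at a
# mountain and at a university, so the derived "negative" is actually wrong.
ckb_negatives = functional_negatives(
    [("climber", "atLocation", "mountain")], ["mountain", "university"]
)

print(ekb_negatives)
print(ckb_negatives)
```

In the commonsense case the sketch wrongly marks (climber, atLocation, university) as negative, which is the failure mode the slide points out.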
Commonsense knowledge acquisition is different and harder
Humans hardly express the obvious: Scarce & Implicit
Spread across multiple modalities: Multimodal
Unusual reported more than usual: Reporting Bias
Culture specific, Location specific: Contextual
KBs possessing commonsense knowledge
Need: automatically constructed, semantically organized commonsense KB
KB | Supervision | Pros | Cons
Cyc | manually curated | accuracy | cost, coverage
ConceptNet | semi-automated | coverage | accuracy, less organized
Tandon et al., AAAI'11 | bootstrapped using ConceptNet | coverage | noise, less organized
Desiderata | minimal supervision | organized, high accuracy (> 80%), high coverage (> 10M) | ---
Need: robust techniques to automatically construct semantically organized Commonsense KB
Three research questions: investigate robust techniques to acquire:
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.
RQ 2. Commonsense of relationships between objects: part-whole relation, comparative relation, …
RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.
Research question 1
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.
Previous work:
• lumps these properties together
• does not distinguish the meanings of the words
• has low coverage
Input: large text corpus, containing e.g. "summit is crisp"
Output: triples $\langle w1_n^s, r, w2_a^s \rangle$, where
• $w1_n^s$ is a disambiguated noun $n$ (sense 1, 2, 3, …)
• $w2_a^s$ is a disambiguated adjective $a$ (sense 1, 2, 3, …)
• $r \in R$ is a fine-grained relation: hasAppearance, hasSound, hasTaste, hasTemperature, evokesEmotion, …
Example: $\langle \text{summit}_n^2, \text{hasTemperature}, \text{crisp}_a^3 \rangle$
Our approach
1. Extract generic hasProperty triples over the input, using patterns with verb [adv], e.g. "summit is crisp": (summit, crisp), (mountain, cold), (chili, hot)
2. Disambiguate the arguments and classify each triple. This typically requires training data.
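Extract generic hasProperty triples over input: $\langle w1_n, w2_a \rangle$, e.g. (summit, crisp)
Disambiguate args and classify triple: $\langle w1_n^s, r, w2_a^s \rangle$
Suppose $r$ = hasTemperature. This yields three inference problems: domain($r$), range($r$), assertion($r$) inference.
• range($r$) inference: $\text{crisp}_a^3$, $\text{hot}_a^1$, $\text{cold}_a^1$, $\text{icy}_a^2$, …
• domain($r$) inference, $\langle w1_n^s, r, * \rangle$: $\text{beach}_n^3$, $\text{summit}_n^2$, $\text{metal}_n^1$, $\text{metal}_n^2$, …
• assertion($r$) inference: $\langle \text{summit}_n^2, \text{crisp}_a^3 \rangle$, $\langle \text{beach}_n^1, \text{hot}_a^1 \rangle$, …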
An instance of the problem: range($r$)
Noisy, surface-form candidates for $r$ → graph construction → graph inference
Co-occurrence counts of surface forms:
      | summit | mountain | dancer
cold  | 20     | 50       | 3
hot   | 30     | 40       | 10
crisp | 15     | 15       | 1
Candidate senses:
• $\text{crisp}_a^1$: clearly defined
• $\text{crisp}_a^3$: cold and invigorating temperature
• $\text{cold}_a^1$: low or inadequate temperature
sense #1 sense #2 sense #3
1/2 1/3 1/4
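As a rough illustration of graph construction from these inputs, the sketch below builds a small weighted graph in Python. The edge-weighting choices are assumptions made for the example: word-word edges use cosine similarity over the co-occurrence counts, and word-sense edges use the prior 1/(rank + 1), which reproduces the 1/2, 1/3, 1/4 values shown for senses #1 to #3. The sense counts per adjective and the node naming are also hypothetical.

```python
import math

# Co-occurrence counts from the table (adjective rows; columns are the
# nouns summit, mountain, dancer).
COUNTS = {
    "cold": [20, 50, 3],
    "hot": [30, 40, 10],
    "crisp": [15, 15, 1],
}


def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sense_prior(rank):
    """Weight of the edge from a word to its sense with 1-based rank."""
    return 1.0 / (rank + 1)


def build_graph(counts, senses_per_word):
    """Return weighted edges: word-word (cosine) and word-sense (prior)."""
    edges = {}
    words = sorted(counts)
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            edges[(w1, w2)] = cosine(counts[w1], counts[w2])
    for word, n_senses in senses_per_word.items():
        for rank in range(1, n_senses + 1):
            edges[(word, f"{word}-a{rank}")] = sense_prior(rank)
    return edges


# Assumed number of adjective senses per word, for illustration only.
graph = build_graph(COUNTS, {"cold": 1, "hot": 1, "crisp": 3})
for edge, weight in sorted(graph.items()):
    print(edge, round(weight, 3))
```

Any similarity measure could replace the cosine here; the point is only that noisy surface-form statistics and sense priors end up as edge weights in a single graph over words and senses.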
Label propagation for graph inference, given few seeds
• Label per node = in / not in range of hasTemperature
• Similar nodes ⇒ similar labels
• But: limited training data, e.g. (summit, crisp), (mountain, cold), (salsa, hot)
Label propagation: loss function (Talukdar et al., 2009)
• Seed label loss
• Similar node, different label loss
• Label prior loss (high-degree nodes are noise)
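The three loss terms can be made concrete with a simplified Python sketch. It minimizes a quadratic objective with one term per loss (seed agreement, smoothness across edges, pull towards a prior) using Jacobi-style updates. This is a simplification for illustration and not the exact algorithm of Talukdar et al.: in particular, it uses a single global prior weight instead of down-weighting high-degree nodes individually. The graph, edge weights, seed choice, and parameter names are all made up.

```python
def propagate(edges, seeds, mu_seed=1.0, mu_smooth=1.0, mu_prior=0.1,
              prior=0.0, iters=100):
    """Score each node in [0, 1]: 1 = in range of the relation, 0 = not.

    Minimizes  mu_seed   * sum over seeds v of (y_v - seed_v)^2
             + mu_smooth * sum over edges (u, v) of w_uv * (y_u - y_v)^2
             + mu_prior  * sum over nodes v of (y_v - prior)^2
    by repeatedly setting each y_v to its closed-form minimizer.
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    scores = {node: seeds.get(node, prior) for node in neighbors}
    for _ in range(iters):
        updated = {}
        for node, nbrs in neighbors.items():
            numerator = mu_prior * prior
            denominator = mu_prior
            if node in seeds:
                numerator += mu_seed * seeds[node]
                denominator += mu_seed
            for other, weight in nbrs.items():
                numerator += mu_smooth * weight * scores[other]
                denominator += mu_smooth * weight
            updated[node] = numerator / denominator
        scores = updated
    return scores


# Hypothetical sense graph: crisp-a3 is close to the positive seed cold-a1,
# crisp-a1 is close to the negative seed sharp-a1.
edges = [
    ("cold-a1", "crisp-a3", 0.9),
    ("crisp-a1", "sharp-a1", 0.9),
    ("crisp-a3", "crisp-a1", 0.1),
]
seeds = {"cold-a1": 1.0, "sharp-a1": 0.0}

scores = propagate(edges, seeds)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

With these made-up weights, crisp-a3 ends up with a high score and crisp-a1 with a low one, so only the temperature sense of "crisp" would be placed in the range of hasTemperature.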
WebChild: model recap
Noisy, surface-form candidates for $r$ → graph construction → graph inference → clean, disambiguated triples in $r$
Resulting KB
• Domain (hasShape): mountain-n1, leaf-n1, ...
• Range (hasShape): triangular-a1, tapered-a1, ...
• Assertions (hasShape): (lens-n1, spherical-a2), (palace-n2, domed-a1), ...
WebChild: large (~5 million), semantically organized, accurate (0.82 sampled precision)
Summary of property commonsense
WebChild: first commonsense KB with fine-grained relations and disambiguated arguments; 4.6 million assertions, including domain and range, for 19 relations.
Take-away message: transductive methods help overcome the sparsity of commonsense in text.
Research question 3
RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.
Previous work:
• largely discusses events, but activities only at small scale
• does not organize the attributes of the activities
• does not distinguish the meanings of the attribute values
An activity frame
{Climb up a mountain, Hike up a hill}
• Participants: climber, boy, rope
• Location: camp, forest, sea shore
• Time: day, holiday
• Visuals
Semantic organization of activity frames
• Parent activity: Go up an elevation
• Previous activity: Get to village
• Next activity: Reach the top
Contain events but not activity knowledge
May contain activities but no visuals and varying granularity of scene boundaries, transitions.
Hollywood narratives are good
Semantic parsing of scripts → graph construction
Input: text in a scene taken from a semi-structured movie script, e.g. "He began to shoot a video on the summit"
Output: disambiguated semantic roles, e.g.
• the man: agent
• began to shoot: action
• a video: patient
• summit: location
SRL systems are computationally expensive and domain specific.
State-of-the-art WSD customized for phrases
Phrases: the man | began to shoot | a video (syntactic pattern: NP VP NP)
Candidate WordNet senses:
• the man: man.1, man.2
• shoot: shoot.1, shoot.4
• a video: video.1
VerbNet contains curated semantic roles for verbs, with selectional restrictions:
• shoot.vn.1 (NP VP NP): agent.animate, patient.animate
• shoot.vn.3 (NP VP NP): agent.animate, patient.inanimate
Can we use two different information sources to perform SRL given no training data?
Jointly leverage:
• State-of-the-art WSD customized for phrases
• Syntactic and semantic role semantics from VerbNet
• WordNet class hierarchy (e.g. Thing / inanimate)
• WordNet-VerbNet (WN-VN) linkage
Ingredients of the joint model:
• Binary decision variable
• WSD prior, WN prior
• Sense, VN syntactic match score
• Sense, VN semantic match score
Joint WSD and SRL
$x_{ij}$ = binary decision variable for word $i$ mapped to WN sense $j$
Objective terms:
• WSD prior, WN prior
• Word, VN match score
• Selectional restriction score
Constraints:
• One VN sense per verb
• WN, VN sense consistency
• Selectional restriction constraints
• Binary decision variables
Joint WSD and SRL output for "the man began to shoot a video":
• Agent: man.1
• Action: shoot.4
• Patient: video.1
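The joint decision can be sketched on the running example "the man began to shoot a video". Instead of an ILP solver, the Python sketch below simply enumerates all sense assignments, discards those that violate the constraints, and keeps the best-scoring one. All scores, the animacy table, and the WordNet-to-VerbNet mapping are made-up values for illustration, not the values used in the original system.

```python
from itertools import product

# Candidate WordNet senses per phrase, with made-up prior scores.
CANDIDATES = {
    "the man": {"man.1": 0.6, "man.2": 0.4},
    "shoot": {"shoot.1": 0.6, "shoot.4": 0.4},
    "a video": {"video.1": 1.0},
}
# Assumed animacy per noun sense (stand-in for the WordNet class hierarchy).
ANIMATE = {"man.1": True, "man.2": True, "video.1": False}
# Assumed WordNet-to-VerbNet linkage for the verb senses.
WN_TO_VN = {"shoot.1": "shoot.vn.1", "shoot.4": "shoot.vn.3"}
# Selectional restrictions per VerbNet sense: must the role filler be animate?
VN_FRAMES = {
    "shoot.vn.1": {"agent": True, "patient": True},
    "shoot.vn.3": {"agent": True, "patient": False},
}


def feasible(agent, verb, patient):
    """Check the selectional restrictions of the VerbNet sense linked to verb."""
    frame = VN_FRAMES[WN_TO_VN[verb]]
    return (ANIMATE[agent] == frame["agent"]
            and ANIMATE[patient] == frame["patient"])


def joint_wsd_srl():
    """Pick one sense per phrase, maximizing the summed priors."""
    best, best_score = None, float("-inf")
    for agent, verb, patient in product(CANDIDATES["the man"],
                                        CANDIDATES["shoot"],
                                        CANDIDATES["a video"]):
        if not feasible(agent, verb, patient):
            continue
        score = (CANDIDATES["the man"][agent]
                 + CANDIDATES["shoot"][verb]
                 + CANDIDATES["a video"][patient])
        if score > best_score:
            best = {"Agent": agent, "Action": verb, "Patient": patient}
            best_score = score
    return best


result = joint_wsd_srl()
print(result)
```

Note that shoot.1 has the higher prior in this toy setup, yet shoot.4 is selected because only its linked VerbNet sense accepts an inanimate patient. That interaction between the two resources is what a joint formulation is meant to capture; enumeration is only practical for tiny inputs like this one.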
Semantic parsing of scripts → graph construction
Activity frames produced by semantic parsing:
Climb up a mountain
• Participants: climber, rope
• Location: summit, forest
• Time: day
Hike up a hill
• Participants: climber
• Location: sea shore
• Time: holiday
Go up an elevation
.. ..
Reach top
.. ..
Construct a graph of activity frames with three edge types:
• Similar: S(a, b)
• Previous: P(a, b)
• TypeOf: T(a, b)
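A graph of activity frames with the three edge types could be represented as follows. This is a minimal Python sketch; the class names `ActivityFrame` and `ActivityGraph`, the field names, and the choice to store Similar edges symmetrically are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ActivityFrame:
    """An activity with its semantic attributes."""
    name: str
    participants: frozenset = frozenset()
    locations: frozenset = frozenset()
    times: frozenset = frozenset()


@dataclass
class ActivityGraph:
    """Edges between activity frames, grouped by edge type."""
    edges: dict = field(default_factory=lambda: {
        "similar": set(), "previous": set(), "type_of": set(),
    })

    def add_edge(self, kind, a, b):
        if kind not in self.edges:
            raise ValueError(f"unknown edge type: {kind}")
        self.edges[kind].add((a.name, b.name))
        if kind == "similar":
            # Assumption: similarity is symmetric, so store both directions.
            self.edges[kind].add((b.name, a.name))


climb = ActivityFrame(
    "climb up a mountain",
    participants=frozenset({"climber", "rope"}),
    locations=frozenset({"summit", "forest"}),
    times=frozenset({"day"}),
)
hike = ActivityFrame(
    "hike up a hill",
    participants=frozenset({"climber"}),
    locations=frozenset({"sea shore"}),
    times=frozenset({"holiday"}),
)
go_up = ActivityFrame("go up an elevation")
reach = ActivityFrame("reach top")

graph = ActivityGraph()
graph.add_edge("similar", climb, hike)    # S(climb, hike)
graph.add_edge("type_of", climb, go_up)   # T(climb, go up an elevation)
graph.add_edge("previous", reach, climb)  # P(reach top, climb)

for kind, pairs in graph.edges.items():
    print(kind, sorted(pairs))
```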
Similarity: S(climb up a mountain, hike up a hill)
Attribute similarity + activity similarity
Climb up a mountain
• Participants: climber, rope
• Location: forest
• Time: day
Hike up a hill
• Participants: climber
• Location: woods
• Time: holiday
TypeOf: T(climb up a mountain, go up an elevation)
Attribute hypernymy + activity hypernymy
Climb up a mountain
• Participants: climber, rope
• Location: forest
• Time: day
Go up an elevation
• Participants: Person
• Location: Exterior
• Time: day
Previous: P(reach the top, climb up a mountain)
• Allow gaps between activities within one scene.
• PMI-style counting to suppress generic activities.
Scene:
Carrie and Big start out early to head to the village. They climb up the beautiful mountain, which felt as if they were in a different world. After several hours they eventually reach the top.
…
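PMI-style counting with gaps can be sketched in a few lines of Python. The sketch counts ordered activity pairs within each scene, allowing up to `max_gap` intervening activities, and scores each pair by pointwise mutual information so that activities occurring everywhere are discounted. The function name, the gap parameter, and the toy scenes are assumptions for illustration; the original system may normalize differently.

```python
import math
from collections import Counter


def previous_edge_pmi(scenes, max_gap=1):
    """Score Previous edges, keyed as (later, earlier) like P(later, earlier).

    scenes: list of scenes, each an ordered list of activity names.
    max_gap: how many activities may lie between the two activities.
    """
    activity_counts = Counter()
    pair_counts = Counter()
    for scene in scenes:
        activity_counts.update(scene)
        for i, earlier in enumerate(scene):
            for later in scene[i + 1: i + 2 + max_gap]:
                pair_counts[(later, earlier)] += 1

    total_activities = sum(activity_counts.values())
    total_pairs = sum(pair_counts.values())
    pmi = {}
    for (later, earlier), count in pair_counts.items():
        p_pair = count / total_pairs
        p_later = activity_counts[later] / total_activities
        p_earlier = activity_counts[earlier] / total_activities
        pmi[(later, earlier)] = math.log(p_pair / (p_later * p_earlier))
    return pmi


# Toy scenes: "walk" is a generic activity that precedes many things.
scenes = [
    ["get to village", "climb up a mountain", "reach the top"],
    ["walk", "climb up a mountain", "reach the top"],
    ["walk", "talk"],
    ["walk", "eat"],
    ["walk", "sleep"],
]

pmi = previous_edge_pmi(scenes)
print(round(pmi[("reach the top", "climb up a mountain")], 3))
print(round(pmi[("climb up a mountain", "walk")], 3))
```

In the toy data, "climb up a mountain" scores higher as the previous activity of "reach the top" than the generic "walk" does as the previous activity of "climb up a mountain", even though both pairs are observed.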
Resulting KB: Knowlywood
Knowlywood statistics
• Scenes: 1,708,782
• Activity synsets: 505,788
• Accuracy: 0.85 ± 0.01
• Images from scenes: 30,000
Summary of activity commonsense
Knowlywood: first organized commonsense activity KB with activity attributes and disambiguated values, containing nearly 1 million activities with visuals.
Take-away message: jointly leveraging different annotated resources helps overcome the sparsity of training data.
The overall KB: WebChild KB
> 3M concepts, > 18M triples, >1000 relations
Conclusions and take-home messages: knowledge to make machines smarter can be acquired with robust techniques that jointly leverage global information.
• RQ1, Properties (WSDM'14): range, domain, assertions of fine-grained relations
• RQ2, Comparatives, part-whole (AAAI'14, AAAI'16): fine-grained comparative, part-whole relations
• RQ3, Activities (WWW'15, CIKM'15): activity frames with semantic attributes
WebChild KB applications (CVPR'15, ACL'15, ISWC'16, ...)
ML + NLP community
• limited training data can be overcome by jointly leveraging multiple cues
Computer Vision community
• commonsense helps computer vision
• vision helps commonsense acquisition
AI community
• semantically organized knowledge is a step towards filling the human-machine gap