
Linguistic Knowledge Representation

Scott Farrar
Department of Linguistics
[email protected]


Problems to Overcome
1. Specifying the relationship between linguistic and other forms of knowledge.


Inference from Knowledge

John’s hand is in his pocket.

John owns the hand.

The hand is physically attached to John.

Hand is a body-part, not a person.

The hand is physically contained in the pocket, not the other way around.

John wants his hand to be in his pocket.

This event is occurring now.

John’s hand is not in Bill’s pocket.

A pocket is a container in clothing.

A hand is smaller than a pocket.

Knowledge source needed for each inference, top to bottom: L | CS, V | L, CS | V | CS | CS | L | CS | CS
L = Linguistic, CS = Commonsense, V = Visual


Problems to Overcome
1. Specifying the relationship between linguistic and other forms of knowledge.
2. Dealing with ambiguity and other issues of natural language processing (NLP).


What is Meaning?
Symbols, Representation, Extensions
Figure: the word "moon" and the bit string [010011101]


What is Meaning?

Conceptual structure hypothesis:
The meaning of a word is the corresponding mental representation in the mind of an agent.


The Lexicon links Linguistic Knowledge to the Rest of Cognition

Figure (Jackendoff’s Model): Conceptual Structure, Spatial Structure, Syntactic Structure; vision input, haptic input, auditory input, motor output


The Computational Lexicon
form: what data structures comprise the lexicon?
organization: how are the data structures organized?
content: what information is contained in the data structures?


Form of the Lexicon
Features, Katz and Fodor (1963): knowledge is a conjunction of features (monadic predicates)
bachelor(x) → unmarried(x) & male(x) & young(x)


Form of the Lexicon
Frames, Minsky (1975): knowledge is organized around concepts

give:
<agent Person>
<recipient Person>
<theme PhysicalObject>
(each pair is a slot and its value)
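The `give` frame can be sketched as a record of named slots whose fillers must fall under the slot's value type. This is a minimal illustration, assuming a small hypothetical is-a table (`IS_A`, with made-up subtypes `King`, `People`, `Bread`) and a hypothetical `fill` method; the defaults and procedural attachments of Minsky's frames are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical is-a links, used only to type-check slot fillers.
IS_A = {
    "King": "Person",
    "People": "Person",
    "Bread": "PhysicalObject",
    "Person": "PhysicalObject",
}


def is_a(concept, target):
    """Walk upward through IS_A from concept, looking for target."""
    while concept is not None:
        if concept == target:
            return True
        concept = IS_A.get(concept)
    return False


@dataclass
class Frame:
    name: str
    slots: dict                                  # slot name -> required value type
    fillers: dict = field(default_factory=dict)  # slot name -> filler

    def fill(self, slot, value, value_type):
        """Fill a slot, rejecting fillers whose type is not under the slot's type."""
        required = self.slots[slot]              # KeyError for an unknown slot
        if not is_a(value_type, required):
            raise TypeError(f"{slot} needs a {required}, got a {value_type}")
        self.fillers[slot] = value


give = Frame("give", {"agent": "Person", "recipient": "Person", "theme": "PhysicalObject"})
give.fill("agent", "the king", "King")
give.fill("recipient", "the people", "People")
give.fill("theme", "bread", "Bread")
print(give.fillers)
```

Trying `give.fill("agent", "bread", "Bread")` raises `TypeError`, since Bread is not under Person in the assumed table.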


Form of the Lexicon
Attribute-value matrices (feature structures)
From the knowledge engineering (AI) community
e.g., Head-Driven Phrase Structure Grammar

<see example>
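Attribute-value matrices are usually combined by unification. The following is a minimal sketch, assuming feature structures are nested Python dicts with atomic leaves. It ignores structure sharing (reentrancy), on which HPSG relies, and the feature names `HEAD`, `AGR`, `NUM`, `PER` are only illustrative.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts with atomic leaves).

    Returns a new merged structure, or None when some feature has clashing values.
    Structure sharing (reentrancy) is not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None          # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    return a if a == b else None         # atoms unify only with identical atoms


noun = {"HEAD": {"POS": "noun", "AGR": {"NUM": "pl"}}}
subject_requirement = {"HEAD": {"AGR": {"NUM": "pl", "PER": 3}}}
print(unify(noun, subject_requirement))
# {'HEAD': {'POS': 'noun', 'AGR': {'NUM': 'pl', 'PER': 3}}}
print(unify(noun, {"HEAD": {"AGR": {"NUM": "sg"}}}))
# None
```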


Organization of the Lexicon
Semantic network, Quillian (1966): knowledge is interconnected
Figure: nodes animal, bird, tweety, lungs, feathers; arcs labeled Sub, Inst, and has-part (twice)
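A semantic network supports inheritance: a node acquires the parts of every node above it. The arcs below are an assumption, since the slide's exact links are not recoverable (bird Sub animal, tweety Inst bird, animal has-part lungs, bird has-part feathers), and `inherited_parts` is a hypothetical helper.

```python
# Assumed arcs for the animal/bird/tweety network: (source, label, target).
EDGES = [
    ("bird", "Sub", "animal"),
    ("tweety", "Inst", "bird"),
    ("animal", "has-part", "lungs"),
    ("bird", "has-part", "feathers"),
]


def parents(node):
    """Nodes reached from node by one Sub or Inst arc."""
    return [t for s, r, t in EDGES if s == node and r in ("Sub", "Inst")]


def inherited_parts(node):
    """Collect has-part targets from node and every node above it."""
    parts, stack, seen = set(), [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parts.update(t for s, r, t in EDGES if s == current and r == "has-part")
        stack.extend(parents(current))
    return parts


print(sorted(inherited_parts("tweety")))  # ['feathers', 'lungs']
```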


Hierarchy of Concepts
Artifact
  Machine
    Motorized-Machine
    Non-motorized-Machine
  Tool
(leaf concepts shown: automobile, drill, loom, windmill, … …)
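Checking whether one concept falls under another in such a hierarchy amounts to walking parent links. A minimal sketch; where the four leaf concepts attach is an assumption (automobile and drill under Motorized-Machine, loom and windmill under Non-motorized-Machine).

```python
# Parent links for the hierarchy; the placement of the leaves is assumed.
PARENT = {
    "Machine": "Artifact",
    "Tool": "Artifact",
    "Motorized-Machine": "Machine",
    "Non-motorized-Machine": "Machine",
    "automobile": "Motorized-Machine",
    "drill": "Motorized-Machine",
    "loom": "Non-motorized-Machine",
    "windmill": "Non-motorized-Machine",
}


def ancestors(concept):
    """Return the chain of more general concepts, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(specific, general):
    """True if general is specific itself or one of its ancestors."""
    return specific == general or general in ancestors(specific)


print(ancestors("windmill"))  # ['Non-motorized-Machine', 'Machine', 'Artifact']
```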


Problems to Overcome
1. Specifying the relationship between linguistic and other forms of knowledge.
2. Dealing with ambiguity and other issues of natural language processing (NLP).
3. Determining what role visual knowledge of objects and events has in the disambiguation process.


Formalism for Ambiguity Resolution (9/13/02)

Scott Farrar
Department of Linguistics
[email protected]


Goals of the Present Research
Build a theoretical system that can construct a visual scene from English text input.
Focus on the problem of lexical ambiguity.
Access and use the visual knowledge linked to lexical items.
Argue for a knowledge-rich approach to natural language processing (lexicon).


Lexical Ambiguity
When one linguistic form has multiple meanings:
The book is on the edge of the table. [area]
The edge of the table is sharp. [line]
The park is five blocks away. [large dimension]
Kids like to play with blocks. [small dimension]
The middle (of the bench) is wet. [center-part]
Put the pan in the middle (between the bowls). [space-between]
To vote “YES”, check the upper box. [2d]
Put your hand in the box. [3d]


Natural Language Processing
Example input: “The king gave the people bread.”
syntax: The/DT king/N gave/VB the/DT people/N bread/N; chunks: (the king) (gave) (the people) (bread)
semantics: give: tense: past; agent: the king; recipient: the people; theme: bread
other knowledge: The people ate the bread. The people have bread.
(Figure labels: Grammar, lexicon, syntax, semantics, other knowledge)
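The pipeline on this slide can be imitated end to end on the example sentence. This toy sketch assumes the tags are already given, uses a naive determiner-plus-noun chunker, maps the four chunks onto the give frame's slots, and applies one hypothetical commonsense rule (after give(x, y, z), y has z). The weaker inference “The people ate the bread” is not modeled, and `VERBS`, `chunk`, `to_frame`, and `infer` are illustrative names.

```python
# Tagged input from the slide; the chunker and the inference rule are toy assumptions.
TAGGED = [("The", "DT"), ("king", "N"), ("gave", "VB"),
          ("the", "DT"), ("people", "N"), ("bread", "N")]

# Hypothetical mini-lexicon: surface verb form -> (predicate, tense).
VERBS = {"gave": ("give", "past"), "gives": ("give", "present")}


def chunk(tagged):
    """Close a chunk at each noun (NP) or verb (V); determiners attach to the next noun."""
    chunks, current = [], []
    for word, tag in tagged:
        current.append(word.lower())
        if tag in ("N", "VB"):
            chunks.append((" ".join(current), "NP" if tag == "N" else "V"))
            current = []
    return chunks


def to_frame(chunks):
    """Map the ditransitive pattern NP V NP NP onto the give frame's slots."""
    labels = [label for _, label in chunks]
    if labels != ["NP", "V", "NP", "NP"]:
        raise ValueError(f"unsupported chunk pattern: {labels}")
    (agent, _), (verb, _), (recipient, _), (theme, _) = chunks
    pred, tense = VERBS[verb]
    return {"pred": pred, "tense": tense, "agent": agent,
            "recipient": recipient, "theme": theme}


def infer(frame):
    """One assumed commonsense rule: after give(x, y, z), y has z."""
    if frame["pred"] == "give":
        return [("have", frame["recipient"], frame["theme"])]
    return []


chunks = chunk(TAGGED)
frame = to_frame(chunks)
print(chunks)
print(frame)
print(infer(frame))  # [('have', 'the people', 'bread')]
```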


How much visual knowledge does the lexicon have access to?

Type of concept              Example
a. a-spatial                 fear, hour, duration
b. extrinsically spatial     animal, robot, instrument
c. intrinsically spatial     horse, man, violin, my leg
d. strictly spatial          square, margin, height

(Bierwisch 1996: 52)


Content of the Lexicon
Purely linguistic:
dog
“dog”, [da:g]
dog collar, not *collar dog
dog+PL = dogs, not *doges


Content of the Lexicon
Purely visual:
dog
shape:
size:
color:
texture:


Content of the Lexicon
Commonsense/other/visual:
dog
has-part(dog, tail)
makes-noise(dog, “bark”)
disjoint(dog, cat)
likes(Scott, terrier)


Knowledge Components
Figure: three components, L, V, and CS
Knowledge is distributed yet interoperable.
L = Linguistic, CS = Commonsense, V = Visual


Formalization of the Problem
input: a list of well-formed English utterances $U$, where $U = \{u_1, u_2, u_3, \ldots, u_n\}$, $|U| \geq 1$, and $U$ can be interpreted as a complete visual scene.


Examples of $U$
{John is standing on the bridge. John has his hands in his pockets. John is wearing a cap…}
{The table is in the middle of the room. There is a ball on the edge of the table…}
not
{Mary loves John. She has known him for four years…}


Formalization of the Problem (cont.)
output: $V_U$, such that $V_U$ is a visual scene based on $U$ consisting of a 3-tuple $\langle I, O, R \rangle$, where $I$ is a set of icons, $O$ is a set of orientations for the icons, and $R$ is a set of relations among the icons.
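The output triple $\langle I, O, R \rangle$ can be held in a small record. This is a sketch, assuming orientations are encoded as one heading in degrees per icon and relations as (name, icon, icon) triples; `VisualScene` and `well_formed` are hypothetical names, and the scene built below is only one possible rendering of the table-and-ball example.

```python
from dataclasses import dataclass


@dataclass
class VisualScene:
    """V_U = <I, O, R> under an assumed encoding."""
    icons: frozenset      # I: icon names
    orientations: dict    # O: icon -> heading in degrees (assumed encoding)
    relations: frozenset  # R: (relation, icon_a, icon_b) triples

    def well_formed(self):
        """Every icon has an orientation, and relations only mention icons in I."""
        oriented = set(self.orientations) == set(self.icons)
        closed = all(a in self.icons and b in self.icons for _, a, b in self.relations)
        return oriented and closed


scene = VisualScene(
    icons=frozenset({"room", "table", "ball"}),
    orientations={"room": 0, "table": 0, "ball": 0},
    relations=frozenset({("in-middle-of", "table", "room"),
                         ("on-edge-of", "ball", "table")}),
)
print(scene.well_formed())  # True
```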


A Solution Approach
Represent all knowledge in pure first-order logic:
a knowledge base KB consists of axioms and facts about the domain, with no distinctions made between types of knowledge
a forward-chaining algorithm is used to generate a visual scene
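Forward chaining repeatedly applies rules to the known facts until nothing new follows. A minimal sketch over ground triples, assuming rules are (premises, conclusion) pairs with `?`-prefixed variables. The contacts-implies-near rule is the visual axiom stated in these slides; the on-implies-contacts and symmetry rules are hypothetical additions.

```python
# Facts are ground triples; rule variables start with "?".
RULES = [
    ([("on", "?x", "?y")], ("contacts", "?x", "?y")),    # hypothetical
    ([("contacts", "?x", "?y")], ("near", "?x", "?y")),  # visual axiom from the slides
    ([("near", "?x", "?y")], ("near", "?y", "?x")),      # hypothetical: near is symmetric
]


def match(pattern, fact, env):
    """Extend bindings env so that pattern equals fact; return None on failure."""
    if len(pattern) != len(fact):
        return None
    env = dict(env)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def all_matches(premises, facts, env):
    """Yield every binding under which all premises are among the facts."""
    if not premises:
        yield env
        return
    for fact in facts:
        extended = match(premises[0], fact, env)
        if extended is not None:
            yield from all_matches(premises[1:], facts, extended)


def forward_chain(facts, rules):
    """Fire every rule on the current facts until a fixpoint (no new facts) is reached."""
    facts = set(facts)
    while True:
        derived = {tuple(env.get(t, t) for t in conclusion)
                   for premises, conclusion in rules
                   for env in all_matches(premises, facts, {})}
        new = derived - facts
        if not new:
            return facts
        facts |= new


result = forward_chain({("on", "ball", "table")}, RULES)
print(sorted(result))
```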


Formalization of the Lexicon
A lexicon L is at least the 4-tuple <F, R, G, C>, where:
F is the set of linguistic forms.
R is the set of formal relations among members of F.
G is the grammatical information relevant to F.
C is the conceptual content (the meaning).
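The 4-tuple can be written down directly as a record. A minimal sketch, assuming forms are strings, R holds (relation, form, form) triples, G maps forms to feature dicts, and C maps forms to concept names. The irregular pair mouse/mice and the regular -s default are illustrative, chosen to echo dog+PL = dogs.

```python
from dataclasses import dataclass, field


@dataclass
class Lexicon:
    """A lexicon as the 4-tuple <F, R, G, C> (illustrative encoding)."""
    F: set = field(default_factory=set)    # linguistic forms
    R: set = field(default_factory=set)    # formal relations among forms: (relation, form, form)
    G: dict = field(default_factory=dict)  # grammatical information, per form
    C: dict = field(default_factory=dict)  # conceptual content: form -> concept

    def plural(self, form):
        """Use a stored plural relation if R has one; otherwise assume regular -s."""
        for relation, base, derived in self.R:
            if relation == "plural" and base == form:
                return derived
        return form + "s"


lexicon = Lexicon(
    F={"dog", "dogs", "mouse", "mice"},
    R={("plural", "mouse", "mice")},
    G={"dog": {"cat": "N", "pron": "[da:g]"}, "mouse": {"cat": "N"}},
    C={"dog": "Dog", "mouse": "Mouse"},
)
print(lexicon.plural("dog"), lexicon.plural("mouse"))  # dogs mice
```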


Commonsense Knowledge
A ball will not remain stationary on an inclined surface.
A jar can be a container.
Unsupported objects fall.


Formalization of Commonsense Knowledge
KB is the tuple <C, R, I, O>, where:
C is the (possibly infinite) set of concepts.
R is the set of relations over C.
I is the set of individuals.
O is an ontology specifying the precise formalization of C, R, and I.


Visual Knowledge
Concepts: AbstractShapes = {Circle, Line, Sphere, …}
Relations: SpatialRelations = {In, On, Contains, …}
Axioms:
Peas are smaller than landmines.
If object A contacts object B, then A is near B.


Formalization of Visual Knowledge
Visual knowledge V is a subset of KB.


So Far So Good
lexicon L = <F, R, G, C>
knowledge base KB = <C, R, I, O>
visual knowledge V ⊆ KB
* Well-understood inferencing procedures exist for FOL knowledge bases: theorem proving (Prolog) and forward chaining (CLIPS).


So Far So Intractable
* If knowledge is represented in pure first-order (or higher-order) logic, the problem will eventually become computationally intractable (entailment in full first-order logic is undecidable in general), depending on the scope of the domain and, for NLP, on the ambiguity of the word in question (compare ‘edge’ to ‘hand’).


Alternatives
Use a logical system that is well understood and known to be complete and tractable:
Description Logic (Brachman 1979)


Description Logic
A KR formalism (like frames, semantic nets, and production systems)
A way to build a conceptualization of the domain
The basic structure is the concept (a structured entity)
Intuitively appealing
Incorporates a subset of FOL
Expressive syntax and decidable inference procedures
e.g., KL-ONE (Brachman 1979), KRYPTON (Brachman, Fikes, and Levesque 1983), LOOM, CLASSIC…


Description Logic Components
Syntax
The KB
Semantics
Reasoning Procedures
Reasoning Tasks


Syntax: Atoms
concepts (unary predicates)
roles (binary predicates)
individuals (constants)


Syntax: Concepts
Round, Flat, Long, Hole, Person, Event


Syntax: Roles
On, In, Strike, Touch, Neighbor, Father


Syntax: Individuals
TABLE-1, JOHN, MY-HAND, ROLLING-EVENT-3


The Knowledge Base of a Description Logic
Terminology (TBox): hierarchy of concepts and roles
Assertions (ABox): axioms for individual objects


Benefits of Dividing the KB
Reasoning is tractable due to a sacrifice of expressiveness (Brachman and Levesque 1985)
Philosophically ‘clean’:
TBox (intensional knowledge: always true, doesn’t change, a priori)
ABox (extensional knowledge: can change, a posteriori)
Satisfiability of the conceptualization (domain) is determined easily when only the TBox is considered
Conceptual modeling appears more intuitive


Syntax: Constructors
intersection ($C \sqcap D$): $Round \sqcap Flat$, $Round \sqcap Flat \sqcap Light$
value restriction ($\forall R.C$): $\forall hasHole.Container$
limited existential quantification ($\exists R.\top$): $\exists hasHole.\top$
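The constructors have a set-theoretic reading: each concept denotes a set of individuals and each role a set of pairs. A minimal sketch over a small, made-up interpretation (the individuals and the memberships in `CONCEPTS` and `ROLES` are hypothetical). Expressions are nested tuples, with `"TOP"` standing for ⊤; the limited form ∃R.⊤ is `("some", R, "TOP")`.

```python
# A small, made-up interpretation: individuals, concept extensions, role extensions.
DOMAIN = {"jar", "plate", "cup", "coin", "hole1", "hole2"}
CONCEPTS = {
    "Round": {"jar", "plate", "cup", "coin"},
    "Flat": {"plate", "coin"},
    "Light": {"cup", "coin"},
    "Container": {"hole1"},
}
ROLES = {"hasHole": {("jar", "hole1"), ("cup", "hole2")}}


def ext(expr):
    """Return the extension (set of individuals) of a concept expression.

    Expressions: an atomic name, "TOP", ("and", C, D), ("all", R, C), ("some", R, C).
    """
    if expr == "TOP":                          # the whole domain
        return set(DOMAIN)
    if isinstance(expr, str):                  # atomic concept
        return set(CONCEPTS[expr])
    op = expr[0]
    if op == "and":                            # C ⊓ D: intersection
        return ext(expr[1]) & ext(expr[2])
    _, role, filler = expr
    allowed = ext(filler)
    successors = {x: {y for (s, y) in ROLES[role] if s == x} for x in DOMAIN}
    if op == "all":                            # ∀R.C: every R-successor in C (vacuous if none)
        return {x for x in DOMAIN if successors[x] <= allowed}
    if op == "some":                           # ∃R.C: at least one R-successor in C
        return {x for x in DOMAIN if successors[x] & allowed}
    raise ValueError(f"unknown constructor {op!r}")


print(sorted(ext(("and", "Round", "Flat"))))                     # ['coin', 'plate']
print(sorted(ext(("and", ("and", "Round", "Flat"), "Light"))))   # ['coin']
print(sorted(ext(("some", "hasHole", "TOP"))))                   # ['cup', 'jar']
print(sorted(ext(("all", "hasHole", "Container"))))              # ['coin', 'hole1', 'hole2', 'jar', 'plate']
```

Note that ∀hasHole.Container also holds of individuals with no holes at all, which is the standard (vacuous) reading of value restriction.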


Syntax: TBox Axioms
Definitions:
$Ball \equiv Sphere \sqcap Toy$
$Box \equiv Container \sqcap Cube$
$Biped \equiv Animal \sqcap (=2\, hasLegs)$
Definitions are the basic operation in the TBox for deriving new concepts (other concepts are primitive).
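When definitions are acyclic conjunctions, a defined concept can be unfolded into its primitive parts, and subsumption reduces to a subset test on those parts (structural subsumption). A minimal sketch under that assumption: the number restriction is treated as an opaque atom, told subsumption axioms such as Sphere ⊑ 3DShape are ignored, and `BouncyBall` is a hypothetical extra definition.

```python
# Acyclic definitions as conjunctions; names not defined here are primitive.
DEFS = {
    "Ball": ["Sphere", "Toy"],
    "Box": ["Container", "Cube"],
    "Biped": ["Animal", "=2 hasLegs"],   # number restriction kept as an opaque atom
    "BouncyBall": ["Ball", "Bouncy"],    # hypothetical extra definition
}


def unfold(concept):
    """Expand defined names recursively until only primitive atoms remain."""
    if concept not in DEFS:
        return frozenset([concept])
    atoms = set()
    for part in DEFS[concept]:
        atoms |= unfold(part)
    return frozenset(atoms)


def subsumes(general, specific):
    """For conjunctions of primitives: general subsumes specific iff its atoms are a subset."""
    return unfold(general) <= unfold(specific)


print(sorted(unfold("BouncyBall")))                            # ['Bouncy', 'Sphere', 'Toy']
print(subsumes("Sphere", "Ball"), subsumes("Ball", "Sphere"))  # True False
```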


Syntax: TBox Axioms (cont.)
subsumption operator: ‘$\sqsubseteq$’
$Table \sqsubseteq Furniture$
$Human \sqsubseteq Animal$
$Sphere \sqsubseteq 3DShape$
Subsumption axioms provide structure for the TBox.


Syntax: ABox Assertions
Concept assertions C(a):
Person(JOHN)
Table(TABLE-1)
Tool(HAMMER-23)
Role assertions R(a,b):
Likes(JOHN, HAMMER-23)
On(HAMMER-23, TABLE-1)
Assertions serve to link individuals in the ABox to concepts in the TBox.


Note on the TBox/ABox Relation
The TBox imposes selection restrictions on ABox assertions; e.g., Eat(JOHN, TABLE-1) is not possible, because Edible(Table) is not in the TBox.
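Selection restrictions like the Eat example can be approximated by giving a role a domain and a range and checking the types of its arguments. A minimal sketch, assuming a hypothetical TBox fragment (`SUBSUMED_BY`, `ROLE_SIGNATURE`) and an ABox typing table; `BREAD-1` is a made-up individual.

```python
# Hypothetical TBox fragment: told subsumptions and a (domain, range) for the role Eat.
SUBSUMED_BY = {"Person": "Animal", "Table": "Furniture", "Bread": "Edible"}
ROLE_SIGNATURE = {"Eat": ("Animal", "Edible")}
ABOX_TYPES = {"JOHN": "Person", "TABLE-1": "Table", "BREAD-1": "Bread"}


def is_a(concept, target):
    """Follow told subsumptions upward from concept, looking for target."""
    while concept is not None:
        if concept == target:
            return True
        concept = SUBSUMED_BY.get(concept)
    return False


def admissible(role, a, b):
    """Accept R(a, b) only if a falls under R's domain and b under its range."""
    domain, rng = ROLE_SIGNATURE[role]
    return is_a(ABOX_TYPES[a], domain) and is_a(ABOX_TYPES[b], rng)


print(admissible("Eat", "JOHN", "BREAD-1"))  # True
print(admissible("Eat", "JOHN", "TABLE-1"))  # False: Table is not under Edible
```

This is a type-checking reading of the slide. In a standard DL reasoner a range axiom would not reject Eat(JOHN, TABLE-1); it would instead infer that TABLE-1 is Edible, and an inconsistency arises only if the TBox also declares Table and Edible disjoint.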


Informal Semantics
A concept C is a set of individuals {a, b, …}.
A role R is a binary relation, i.e., a set of pairs of individuals.
A conjoined concept C ⊓ D is both C and D (the intersection of the two sets).


Reasoning Tasks
TBox: categorization, satisfiability
ABox: consistency checking w.r.t. the TBox


Ambiguity in a DL System
More than one referent concept in the TBox:
if a word W represents concept C and concept D, and C and D are disjoint, then W is ambiguous
Represents(‘edge’, LinearBoundary)
Represents(‘edge’, 2DSurface)
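The ambiguity criterion translates into a disjointness test over the concepts a word represents. A minimal sketch; the `REPRESENTS` and `DISJOINT` tables are assumptions, in particular the single concept BodyPart for ‘hand’.

```python
from itertools import combinations

# Assumed lexicon-to-TBox links and disjointness axioms.
REPRESENTS = {
    "edge": {"LinearBoundary", "2DSurface"},
    "box": {"2DObject", "3DObject"},
    "hand": {"BodyPart"},
}
DISJOINT = {
    frozenset({"LinearBoundary", "2DSurface"}),
    frozenset({"2DObject", "3DObject"}),
}


def ambiguous(word):
    """W is ambiguous if it represents two concepts C and D that are declared disjoint."""
    return any(frozenset(pair) in DISJOINT
               for pair in combinations(sorted(REPRESENTS[word]), 2))


for w in ("edge", "box", "hand"):
    print(w, ambiguous(w))
# edge True
# box True
# hand False
```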


Consistency Checking for Determining Lexical Ambiguity
Recall that ‘box’ is ambiguous, as in ‘Check the box if you are a student.’
2DObject(Box-34)
3DObject(Box-34)
Disjoint(2DObject, 3DObject)
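The Box-34 assertions can be checked mechanically: an ABox is inconsistent if some individual is asserted to belong to two disjoint concepts. This sketch looks only at told concept assertions and told disjointness; a real DL reasoner would also consider types inferred through the TBox.

```python
# ABox concept assertions C(a) as (concept, individual) pairs, plus told disjointness.
ABOX = [("2DObject", "Box-34"), ("3DObject", "Box-34")]
DISJOINT = [("2DObject", "3DObject")]


def conflicts(abox, disjoint):
    """List (individual, C, D) wherever an individual is asserted to be in disjoint C and D."""
    types = {}
    for concept, individual in abox:
        types.setdefault(individual, set()).add(concept)
    return [(ind, c, d)
            for ind, asserted in types.items()
            for c, d in disjoint
            if c in asserted and d in asserted]


print(conflicts(ABOX, DISJOINT))  # [('Box-34', '2DObject', '3DObject')]
```

A non-empty result means the two readings of ‘box’ cannot both hold of Box-34, so one of them has to be chosen.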


Consistency Checking for Resolving Ambiguity
The ball is on the edge of the table.
The ball can’t be in two places.
Only one semantic representation obtains from the TBox.
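The same consistency test can select a reading. A minimal sketch, assuming a hypothetical TBox fragment in which the range of On is 2DSurface and 2DSurface is disjoint from LinearBoundary. Each sense of ‘edge’ is tried in turn and kept only if the edge individual's types stay consistent.

```python
# Assumed TBox fragment: the range of On is 2DSurface, which is disjoint from LinearBoundary.
ROLE_RANGE = {"On": "2DSurface"}
DISJOINT = {frozenset({"LinearBoundary", "2DSurface"})}
SENSES = {"edge": ["LinearBoundary", "2DSurface"]}


def consistent(types):
    """A set of types for one individual is consistent if it contains no disjoint pair."""
    return not any(pair <= types for pair in DISJOINT)


def resolve(word, role):
    """Keep the senses of word under which it can fill the object position of role."""
    readings = []
    for sense in SENSES[word]:
        # Asserted sense plus the type that the role's range forces on the object.
        types = {sense, ROLE_RANGE[role]}
        if consistent(types):
            readings.append(sense)
    return readings


# "The ball is on the edge of the table": On(BALL-1, EDGE-1), with 'edge' still ambiguous.
print(resolve("edge", "On"))  # ['2DSurface']
```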


Conclusion
Word sense disambiguation is a challenge to NLP.
NLP can benefit from a knowledge-rich approach.
A combination of visual and other commonsense assertions can enrich the lexicon.
Description logics provide a computationally tractable solution.