Lexical Semantics for the Semantic Web

Patrick Hanks

Masaryk University, Brno, Czech Republic

hanks@fi.muni.cz

UFAL, Mathematics Faculty, Charles University in Prague

Outline of the talk

• A neglected aspect of Tim Berners-Lee’s vision:
– Introducing semantics to the semantic web
– Computing meaning and inferences in free text

• Patterns in text and how to use them

• Building a resource that encodes patterns
– linking meanings (implicatures) to patterns (not to words)
– A “pattern dictionary”
– What does the pattern dictionary look like?
– The role of an ontology in a pattern dictionary

Aims of the Semantic Web

• “To enable computers to manipulate data meaningfully”

• “Most of the Web's content today is designed for humans to read, not for computer programs to manipulate meaningfully.”

— Berners-Lee et al., Scientific American, 2001

A neglected aspect of Berners-Lee’s vision

• “Web technology must not discriminate between the scribbled draft and the polished performance.” —T. Berners-Lee et al., Scientific American, 2001

• The vision includes being able to process the meaning and implicatures of free text

• not just pre-processed tagged texts – Wikis, names, addresses, appointments, and suchlike.

A paradox

• “Traditional KR systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as 'parent' or 'vehicle'.”

– Berners-Lee et al. 2001.
– Implying that SW is more tolerant?
– Apparently not:

• “Human languages thrive when using the same term to mean somewhat different things, but automation does not.” --Ibid.

The root of the problem

• Scientists from Leibniz to the present have wanted word meaning to be precise and certain.
– But it isn’t. Meaning in natural language is vague and probabilistic.

• Some theoretical linguists (and CL researchers), not liking fuzziness in data, have preferred to disregard data in order to preserve theory.

Do not allow SW research to fall into this trap.

To fulfil Berners-Lee’s dream, we need to be able to compute the meaning of un-pre-processed documents.

What NOT to do for the SW

• The meaning of the English noun second is vague: “a short unit of time” or “1/60 of a minute”.
– Wait a second.
– He looked at her for a second.

• It is also a very precisely defined technical term in certain scientific contexts – the basic SI unit of time:
– “the duration of 9,192,631,770 cycles of radiation corresponding to the transition between two hyperfine levels of the ground state of an atom of caesium 133.”

• If we try to stipulate a precise meaning for all terms in advance of using them, we’ll never be able to fulfil the dream – and we will invent an unusable language

Precision and vagueness

• Stipulating a precise definition for an ordinary word such as second removes it from ordinary language.

• When it is given a precise, stipulative definition, an ordinary word becomes a technical term.

• “An adequate definition of a vague concept must aim not at precision but at vagueness; it must aim at precisely that level of vagueness which characterizes the concept itself.”

– Wierzbicka 1985, pp.12-13

The paradox of natural language

• Word meaning may be vague and fuzzy, but people use words to make very precise statements

• This can be done because text meaning is holistic, e.g.
– “fire” in isolation is very ambiguous;
– But “He fired the bullet that was recovered from the girl's body” is not at all ambiguous
– “Ithaca” is ambiguous;
– But “Ithaca, NY” is much less ambiguous.

Even the tiniest bit of (relevant) context helps.

What is to be done?

• Process only the (strictly defined) mark-up of documents, not their linguistic content?
– And so abandon the dream of enabling computers to manipulate linguistic content?

• Force humans to conform to formal requirements when writing documents?
– Not a serious practical possibility

• Teach computers to deal with natural language in all its fearful fuzziness?
– Maybe this is what we need to do

Hypertext and relevance

• “The power of hypertext is that anything can link to anything.”
– Berners-Lee et al., 2001

• Yes, but we need procedures for determining (automatically) what counts as a relevant link, e.g.
– Firing a person is relevant to employment law.
– Firing a gun is relevant to warfare and armed robbery.

How do we know who is doing what to whom?

• Through context (a standard, uncontroversial answer)

• But teasing out relevant context is tricky:
– Firing a person: [[Person]] MUST be mentioned
– Whereas firing a gun occurs in patterns where neither [[Firearm]] nor [[Projectile]] is mentioned, e.g.
– The police fired into the crowd/over their heads/wide.

• Negative evidence can be important:
– “He fired” cannot mean he dismissed someone from employment

• Relevant context is cumulative
– So correlations among arguments are often needed

How to compute meaning for the Semantic Web

STEP 1. Identify all the normal patterns of normal utterances by data analysis

STEP 2. Develop a resource that says precisely what the basic implicatures of each pattern are, e.g.

[[Human]] fire [Adv[Direction]] = [[Human]] causes [[Firearm]] to discharge [[Projectile]]

STEP 3. Populate the semantic types in an ontology

STEP 4. Develop a linguistic theory that distinguishes norms from exploitations
Abandon the received theories of speculative linguists

STEP 5. Develop procedures for finding best matches between a free text statement and a pattern.
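
As a rough illustration of STEPs 2, 3 and 5, the sketch below stores two patterns for the verb fire, a toy ontology, and a scoring function that picks the best-matching pattern for a pre-parsed clause. Everything here is an assumption made for illustration: the `Pattern` class, the clause dictionaries, the scoring rule, the tiny lexicon, and the wording of the "dismiss" implicature. It is not the CPA implementation, and a real system would need a parser and probabilistic matching.

```python
from dataclasses import dataclass
from typing import Optional

# Toy ontology (STEP 3): semantic type -> nouns that populate it.
# The members are illustrative assumptions, not CPA data.
ONTOLOGY = {
    "Human": {"police", "manager", "assistant", "he", "she"},
}

@dataclass(frozen=True)
class Pattern:
    verb: str
    subject: str          # semantic type expected in the subject slot
    obj: Optional[str]    # semantic type required in the object slot, if any
    adverbial: bool       # True if a directional adverbial is expected
    implicature: str      # meaning attached to the pattern, not to the word

# STEP 2: patterns paired with implicatures. The first is taken from the
# slide above; the second is a hypothetical wording for the "dismiss" sense.
PATTERNS = [
    Pattern("fire", "Human", None, True,
            "[[Human]] causes [[Firearm]] to discharge [[Projectile]]"),
    Pattern("fire", "Human", "Human", False,
            "[[Human]] dismisses [[Human]] from employment"),
]

def has_type(noun, sem_type):
    """True if the noun is listed under the semantic type in the toy ontology."""
    return noun is not None and noun in ONTOLOGY.get(sem_type, set())

def score(pattern, clause):
    """Count how many slots of the clause agree with the pattern."""
    if clause["verb"] != pattern.verb:
        return 0
    obj = clause.get("object")
    # Negative evidence: a pattern that requires an object cannot match
    # a clause that has none ("He fired" cannot mean "dismissed").
    if pattern.obj is not None and obj is None:
        return 0
    points = 0
    if has_type(clause.get("subject"), pattern.subject):
        points += 1
    if pattern.obj is None:
        points += 1 if obj is None else 0
    elif has_type(obj, pattern.obj):
        points += 1
    has_adverbial = clause.get("adverbial") is not None
    if pattern.adverbial == has_adverbial:
        points += 1
    return points

def best_match(clause):
    """STEP 5: return the pattern with the highest score for this clause."""
    return max(PATTERNS, key=lambda p: score(p, clause))
```

For example, `best_match({"verb": "fire", "subject": "police", "adverbial": "into the crowd"})` selects the firearm pattern, because the clause has no direct object and the dismissal pattern requires one.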

The double helix of language: norms and exploitations

• A natural language consists of TWO kinds of rule-governed behaviour:
– Using words normally
– Exploiting the norms

• We don’t even know what the norms of any language are, still less the exploitation rules

• People have assumed that norms of usage are obvious
– But only some of the things that are obvious are true
– We need to identify the norms by painstaking empirical analysis of evidence

• There is not a sharp dividing line between ‘norm’ and ‘exploitation’

• Today’s norm is tomorrow’s exploitation

Corpus Pattern Analysis (CPA)

1. Identifies normal usage patterns for each word
– Each pattern includes a verb, its valencies, and the semantic type(s) of each argument (valence)

2. Associates a meaning (“implicature”) with each pattern (NOT with each word)

3. Provides a basis for matching occurrences of target words in unseen texts to their nearest pattern (“norm”)

4. CPA is the basis for a “Pattern Dictionary” (demo):
– http://nlp.fi.muni.cz/projekty/cpa/
– Click on “web access” in line 1

Focusing arguments by semantic-type alternation

• You can calm a person, calm a horse, calm someone’s nerves, fears, or anxiety.
– These all activate the same meaning of the verb calm. Anxiety does not have the required semantic type (anxiety is not [[Animate]])
– However, the expected animate argument is present – but only as a possessive. And even if there is no possessive, being an attribute of [[Animate]] is part of the meaning of nerves, fear, anxiety, etc.

• Regular alternations such as these have a focusing function. They do not activate different senses.

• Other examples:
– Repair a car, repair the engine (of a car), repair the damage
– Treat a person, treat her injuries, treat her injured arm
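
One way to picture this alternation is to resolve an argument to the semantic type it stands in for before any pattern matching takes place. The sketch below does that with two lookup tables. The tables `DIRECT_TYPE` and `ATTRIBUTE_OF`, their contents, and the function names are hypothetical and exist only to illustrate the idea; treating "damage" as an attribute of [[Artefact]] is in particular a simplifying assumption.

```python
# Hypothetical toy data: nouns that belong to a semantic type directly,
# and nouns that are attributes or parts of a member of that type.
DIRECT_TYPE = {"person": "Animate", "horse": "Animate", "car": "Artefact"}
ATTRIBUTE_OF = {
    "nerves": "Animate", "fears": "Animate", "anxiety": "Animate",
    "engine": "Artefact", "damage": "Artefact",
}

def focus_type(noun):
    """Return (semantic_type, focused) for a noun in an argument slot.

    focused is True when the noun is an attribute or part standing in
    for the expected type rather than a direct member of it.
    """
    if noun in DIRECT_TYPE:
        return DIRECT_TYPE[noun], False
    if noun in ATTRIBUTE_OF:
        return ATTRIBUTE_OF[noun], True
    return None, False

def same_sense(noun_a, noun_b):
    """Two arguments activate the same verb sense if they resolve to one type."""
    type_a, _ = focus_type(noun_a)
    type_b, _ = focus_type(noun_b)
    return type_a is not None and type_a == type_b
```

Under these assumptions, "calm a horse" and "calm her anxiety" resolve to the same type, so no separate sense of calm is needed; the `focused` flag records only that the second phrase narrows attention to an attribute.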

Ontologies

• The arguments of CPA patterns are expressed as semantic types, related to a shallow semantic ontology.

• The term ontology is – has become – highly ambiguous:

• SW ontologies are, typically, interlinked networks of things like address lists, dates, events, and websites, with HTML mark-up showing attributes and values

• They differ from philosophical ontologies, which are theories about the nature of all the things in the universe that exist

• They also differ from lexical ontologies such as WordNet, which are networks of words with supposed conceptual relations

• The CPA shallow ontology is a device for grouping semantically similar words together to facilitate meaning processing

The CPA Shallow Ontology

• The CPA Shallow Ontology is a “bag of bags of words”

• Developed, bottom-up, by cluster analysis of corpora:

– The nouns that NORMALLY occur in the same syntagmatic slot in relation to a given verb are grouped into a cluster

– A cluster of different nouns activates the same meaning of the verb

– The cluster is named with a semantic type, e.g. [[Human]], [[Event]], [[Abstract]], [[Artefact]], etc.

– Each cluster is compared with similar clusters occurring with other verbs. Each combination of clusters constitutes a lexical set.

– Identically named clusters contain slightly different members (lexical items)

– Therefore, lexical sets “shimmer”.
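
The bottom-up grouping described above can be sketched as follows. A simple frequency threshold stands in for real cluster analysis, and the corpus counts, the `min_count` parameter, and the function names are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical corpus counts: (verb, slot, noun) -> frequency.
OBSERVATIONS = {
    ("arrange", "object", "meeting"): 40,
    ("arrange", "object", "conference"): 25,
    ("arrange", "object", "funeral"): 12,
    ("attend", "object", "meeting"): 60,
    ("attend", "object", "conference"): 30,
    ("attend", "object", "lecture"): 18,
    ("attend", "object", "funeral"): 2,
}

def build_clusters(observations, min_count=5):
    """Group nouns that normally occur in the same slot of a given verb.

    "Normally" is approximated here by a frequency threshold, which is
    an assumption of this sketch.
    """
    clusters = defaultdict(set)
    for (verb, slot, noun), count in observations.items():
        if count >= min_count:
            clusters[(verb, slot)].add(noun)
    return dict(clusters)

def lexical_set(clusters, keys):
    """Combine the clusters found at several verbs into one lexical set."""
    combined = set()
    for key in keys:
        combined |= clusters.get(key, set())
    return combined
```

With this toy data, both object clusters would be named [[Event]], yet the one at arrange contains "funeral" while the one at attend contains "lecture" instead. That difference in membership between identically named clusters is the "shimmer".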

The Predictive Power of Lexical Sets

• EXAMPLE: A noun, ‘meeting’, has been classified with semantic type [[Event]] at both arrange and attend

• Suppose ‘meeting’ is found in the direct object slot after leave or run—but not frequently enough to have been included in a cluster for those verbs in the Ontology

• However, the patterns “[[Human]] leave [[Event]]” and “[[Human]] run [[Event]]” will be found in the Pattern Dictionary
– Then there is a high probability that ‘meeting’ belongs there (even though not listed as typical), activating probable implicatures:
– leave = “go away from”
– run = “organize and cause to function efficiently”
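
The back-off reasoning in this example can be written down in a few lines. The sketch below first looks for the noun in the cluster of the verb at hand and, failing that, falls back on the semantic type the noun has received at other verbs. The `CLUSTERS` and `PATTERNS` tables and the function names are hypothetical; only the two implicatures are taken from the slide, and a real system would return a probability rather than a single answer.

```python
# Hypothetical toy data. CLUSTERS records which nouns were typed at which
# verb; PATTERNS maps (verb, object type) to the pattern's implicature.
CLUSTERS = {
    ("arrange", "Event"): {"meeting", "conference"},
    ("attend", "Event"): {"meeting", "lecture"},
    ("leave", "Event"): {"party"},
    ("run", "Event"): {"campaign"},
}
PATTERNS = {
    ("leave", "Event"): "go away from",
    ("run", "Event"): "organize and cause to function efficiently",
}

def types_elsewhere(noun):
    """Semantic types assigned to the noun at any verb in the ontology."""
    return {sem_type for (_, sem_type), nouns in CLUSTERS.items() if noun in nouns}

def predict_implicature(verb, noun):
    """Return the probable implicature of verb + noun, or None if unknown."""
    # Direct hit: the noun is already listed as typical for this verb.
    for (listed_verb, sem_type), nouns in CLUSTERS.items():
        if listed_verb == verb and noun in nouns:
            return PATTERNS.get((verb, sem_type))
    # Back-off: reuse the type the noun has at other verbs.
    for sem_type in sorted(types_elsewhere(noun)):
        if (verb, sem_type) in PATTERNS:
            return PATTERNS[(verb, sem_type)]
    return None
```

Here "meeting" is not listed at leave or run, but its [[Event]] typing at arrange and attend is enough to select the [[Event]] patterns for both verbs.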

Phraseology in Computational Linguistics

• Computational linguists are turning away from word-by-word analysis (the “Lego bricks” method, inherited from Frege) to phraseological analysis. E.g.
– Marine Carpuat and Dekai Wu. 2007. How phrase sense disambiguation outperforms word sense disambiguation for statistical machine translation. In Proceedings, Conference on Theoretical and Methodological Issues in Machine Translation (TMI 2007). Skövde, Sweden

• The Pattern Dictionary provides an inventory of patterns
– A benchmark for NLP researchers using patterns
– A benchmark for introducing semantics to the Semantic Web

The English Pattern Dictionary: current status

• Focuses on verbs
– Specifically, the correlations among the lexical and semantic values of the arguments of each sense of each verb

• 700 verbs analysed so far
– 400 verbs complete, finalized, checked and released
– 300 more are work in progress, awaiting checking
– There are approximately 6000 verbs in English, so we have done about 10%

• Shallow ontology in development

• New lexically driven theory of language, which is precise about the vague phenomenon of language
– Hanks (forthcoming): Analysing the Lexicon: Norms and Exploitations. MIT Press

The English Pattern Dictionary: the future

• 5,400 more verbs to analyse (then the adjectives)

• Develop a different procedure for nouns (noun-y nouns)

• Finalize the CPA shallow ontology and populate it

• Pattern dictionaries for other languages
– Czech
– German (A. Geyken, Berlin)
– Italian (E. Jezek, U. of Pavia)

• Theoretical work:
– Typology of exploitations
– Implications of CPA for parsing theory
– Alternation of semantic types in arguments
– Relationship between semantic types and semantic roles
– Links between the Pattern Dictionary and FrameNet

Conclusions

To enable computers to “manipulate data meaningfully” (the raw data itself, not just tags added to the data), we need:

• an inventory of patterns of normal usage for each word
– a “pattern dictionary”

• a theory that distinguishes normal usage from exploitations of norms for rhetorical, poetic, and other purposes

• pattern-matching procedures: text <-> pattern dictionary

• a statistical, probabilistic approach to identifying meaning.

– Only then will computers be able to compute the “meaning” of texts, understand the implicatures, translate them, retrieve data from them, and manipulate them in other ways

– At that point, we shall be a little closer to realizing Berners-Lee’s 2001 dream