Page 1: Limbengtatt25july2002

Building a Primitive-based Lexical Consultation System

prepared by Lim Beng Tat

Supervisors: Dr. Tang Enya Kong and Dr. Guo Cheng Ming

Page 2: Limbengtatt25july2002

Abstract

This research describes the design of a semantic-primitive-based lexical consultation system and the processes that can be performed on a machine-readable dictionary (MRD) and a corpus to automatically produce a machine-tractable dictionary (MTD) and a tractable corpus. Linguistic tools, such as a sense tagger, and resources are created during or after these processes. The research also shows how to apply an unsupervised word sense disambiguation method, using the newly constructed MTD, to samples of unrestricted text from various prospective application areas. This matters for applications that need lexical semantics, such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing.

Page 3: Limbengtatt25july2002

Outline

  Introduction
  Problem
  Objective
  Lexical Consultation System
  System design and architecture
  Example applications
    Bilingual Knowledge Bank

Page 4: Limbengtatt25july2002

Introduction

Dictionaries
  Supply knowledge (of language and of the world)
  E.g. Collins English Dictionary (CED), Longman's Dictionary of Contemporary English (LDOCE) and Webster's 9th Dictionary (W9)

word     sn   pos  definition
be       10   n    spend or use time
english  2    n    people of england
...      ...  ...  ...

Page 5: Limbengtatt25july2002

Introduction (Cont)

Explicit information (POS)
Implicit information / semantic information
  Hypernym/hyponym relations (class/subclass)
  Synonymy/antonymy relations
  Meronym/holonym relations (part/whole, ...)
  Collocational relations (compounds, idioms, ...), etc.

Page 6: Limbengtatt25july2002

Introduction (Cont)

Problem: how can semantic information be extracted from a dictionary? 2 methods:

Defining patterns
  Identify significant recurring phrases
  E.g. “a member of” + NP
    hand: a member of a ship's crew… [W9]

Extraction of a semantic hierarchy
  Extraction of hyponyms, e.g.
    dipper: a ladle used for dipping... [CED]
    ladle: a long-handled spoon... [CED]
    spoon: a metal, wooden, or plastic utensil... [CED]
  Resulting hierarchy (most general first): utensil > spoon > ladle > dipper
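The hierarchy extraction above can be sketched as a walk along genus terms. This is a minimal illustration only: the `GENUS` table and the `hypernym_chain` helper are hypothetical names, and the sketch assumes the genus term (head noun) of each definition has already been identified, which the slides do not describe.

```python
# Headword -> genus term (head noun of its definition), following the CED examples.
# Assumption: genus terms were identified beforehand; this table is illustrative.
GENUS = {
    "dipper": "ladle",
    "ladle": "spoon",
    "spoon": "utensil",
}

def hypernym_chain(word, genus=GENUS):
    """Follow genus terms upward until a word has no recorded genus."""
    chain = [word]
    seen = {word}
    while chain[-1] in genus:
        parent = genus[chain[-1]]
        if parent in seen:  # stop if the definitions are circular
            break
        chain.append(parent)
        seen.add(parent)
    return chain

print(" -> ".join(hypernym_chain("dipper")))  # dipper -> ladle -> spoon -> utensil
```

The `seen` set is there because dictionary definitions can be circular, so a naive walk might never terminate.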

Page 7: Limbengtatt25july2002

Introduction (Cont)

Disadvantages:
  Circularity, e.g.
    tool: an implement, such as a hammer... [CED]
    implement: a piece of equipment; tool or utensil. [CED]
    utensil: an implement, tool or container... [CED]
  Inconsistency in dictionaries, e.g.
    corkscrew: a pointed spiral piece of metal... [W9]
    dinner service: a complete set of plates and dishes... [LDOCE]
  Dictionaries are written for human use

Other methods: semantic primitives and word sense disambiguation

Page 8: Limbengtatt25july2002

Semantic Primitive

A semantic primitive refers to a “core” meaning that cannot be further analyzed
E.g. bachelor and red
  bachelor means someone who is a man and is not married
  What does red mean?
  red represents a semantic primitive (a basic meaning), while bachelor does not.

Page 9: Limbengtatt25july2002

Semantic Primitive (Cont)

2 types of semantic primitives: prescriptive and descriptive

Prescriptive semantic primitives
  A set of pre-defined primitives
  E.g. father marry couple
    marry : [human, human]
    father : [human]
    couple : [human, thing]
  Used to choose the correct sense of ‘couple’
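One way to read the example above is as a type check: 'marry' expects two human arguments, so the sense of 'couple' whose primitive is human is preferred. The sketch below assumes that reading. The dictionaries and the `pick_sense` helper are hypothetical and not part of the original system.

```python
# Hypothetical encoding of the slide's example.
# Assumption: marry : [human, human] gives the primitives required of its two arguments,
# and couple : [human, thing] lists one primitive per sense of 'couple'.
ARGUMENT_TYPES = {"marry": ("human", "human")}
SENSE_TYPES = {"father": ["human"], "couple": ["human", "thing"]}

def pick_sense(noun, required_type):
    """Return the 1-based index of the first sense whose primitive matches, or None."""
    for index, primitive in enumerate(SENSE_TYPES[noun], start=1):
        if primitive == required_type:
            return index
    return None

subject_type, object_type = ARGUMENT_TYPES["marry"]
print(pick_sense("father", subject_type), pick_sense("couple", object_type))  # 1 1
```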

Page 10: Limbengtatt25july2002

Semantic Primitive (Cont)

Prescriptive semantic primitives
  Problem: always need to be extended

Descriptive semantic primitives
  A set of semantic primitives derived from a natural source of data such as a dictionary
  E.g.
    father5 - a term#5 of address for priest#2 in some church especially roman#7 or orthodox#3 catholic
    marry3 - perform#1 a marriage#4 ceremony
    couple1 - a pair#5 of people#5 who live#7 together#2
  Uniquely identify each definition of an entry
  Avoid circularity

Page 11: Limbengtatt25july2002

Word Sense Disambiguation (WSD)

Documents are collections of sentences containing words

Some words have more than one meaning. These meanings are often called word senses.

Goal: assign meanings to words in some context according to some lexical resource.

Page 12: Limbengtatt25july2002

Objective

Producing a Machine-Tractable Dictionary (MTD) from a Machine-Readable Dictionary (MRD) using descriptive semantic primitives and WSD

Producing a tractable database/corpus from a database/corpus

Page 13: Limbengtatt25july2002

Linguistic Resources

Machine-tractable dictionary
  Encoded with information extracted from the MRD
  Usable format and highly structured semantic information for NLP tasks

word     sn   sp   pos  definition
be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english  2    N    n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 1]
...      ...  ...  ...  ...

LCDD
  Determining the relatedness or closeness among word senses in a dictionary
  (figure label: LCDD = 0.1 %)

Descriptive semantic primitives

Page 14: Limbengtatt25july2002

Lexical Consultation System

Components: Semantic Primitive Extractor, LCDD Generator, WSD

Page 15: Limbengtatt25july2002

Semantic Primitive Extractor

Searching for self-reference circles in definitions. For example:

sense_1 [def] [sense_2 sense_5 sense_6]
sense_2 [def] [sense_3 sense_2]
sense_3 [def] [sense_1 sense_2]
sense_4 [def] [sense_5]
sense_5 [def] [sense_2 sense_4]
sense_6 [def] [sense_5 sense_4]

=> sense_1 is a semantic primitive
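A self-reference circle can be found with a plain graph search over the definitions. The sketch below uses the six toy senses from this slide. It only reports one circle that returns to a given sense. The slides do not state the exact rule for deciding which sense on a circle becomes the primitive, so that part is left out. `find_circle` is a hypothetical helper name.

```python
# Each sense maps to the senses used in its definition (toy data from the slide).
DEFINITIONS = {
    "sense_1": ["sense_2", "sense_5", "sense_6"],
    "sense_2": ["sense_3", "sense_2"],
    "sense_3": ["sense_1", "sense_2"],
    "sense_4": ["sense_5"],
    "sense_5": ["sense_2", "sense_4"],
    "sense_6": ["sense_5", "sense_4"],
}

def find_circle(start, definitions):
    """Return one path start -> ... -> start found by depth-first search, or None."""
    stack = [(start, [start])]
    visited = set()
    while stack:
        node, path = stack.pop()
        for nxt in definitions.get(node, []):
            if nxt == start:          # the definition chain came back to where it began
                return path + [start]
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

print(find_circle("sense_1", DEFINITIONS))
```

In this toy data every sense lies on some circle, so a selection rule beyond simple detection is needed in practice.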

Page 16: Limbengtatt25july2002

Semantic Primitive Extractor (cont)

Step 1: Expanding the dictionary

Before expansion:
  abandon 1   a feeling of extreme emotional intensity
  abandon 2   leave behind
  ...
  betray 2    abandon

After expansion:
  abandon 1   a feeling of extreme emotional intensity
  abandon 2   leave behind
  ...
  betray 2    abandon1 abandon2
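The expansion step replaces each definition word that is itself a headword with all of its senses, as in "betray 2: abandon" becoming "abandon1 abandon2". A minimal sketch on the toy entries from this slide follows. The data layout and the `expand` helper are assumptions made for illustration.

```python
# Toy dictionary keyed by (headword, sense number); definitions are lists of plain words.
DICTIONARY = {
    ("abandon", 1): ["a", "feeling", "of", "extreme", "emotional", "intensity"],
    ("abandon", 2): ["leave", "behind"],
    ("betray", 2): ["abandon"],
}

def expand(dictionary):
    """Replace each definition word that is a headword by all of its senses."""
    senses_of = {}
    for word, number in dictionary:
        senses_of.setdefault(word, []).append(number)
    expanded = {}
    for entry, definition in dictionary.items():
        tokens = []
        for word in definition:
            if word in senses_of:
                tokens.extend(f"{word}{n}" for n in sorted(senses_of[word]))
            else:
                tokens.append(word)  # not a headword in this toy dictionary: keep as is
        expanded[entry] = tokens
    return expanded

print(expand(DICTIONARY)[("betray", 2)])  # ['abandon1', 'abandon2']
```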

Page 17: Limbengtatt25july2002

Semantic Primitive Extractor (cont)

Step 2: Identify semantic primitives using self-reference circles

Example: extracting primitives from the pre-release WordNet distributed during SENSEVAL-2
  Pre-release WordNet 1.7 = 192,460 entries
  Extracted primitives = 9368 entries (around 5% of pre-release WordNet 1.7 entries)

Page 18: Limbengtatt25july2002

LCDD generator

Identify the definition layers of the word senses

First layer for forecast2 and fixed6
  forecast2 : predict1 in advance3
  fixed6 : specify1 in advance3

Second layer for forecast2 and fixed6
  predict1 : make3 a prediction1 about
  specify1 : be specific1 about
  advance3 : a change1 for the better2 progress4

Page 19: Limbengtatt25july2002

LCDD generator (Cont)

LCDD(forecast2, fixed6) = a*70% + (b + c + d)/3*30%

Depth-First Method

[Figure: layer 1 and layer 2 of forecast2 compared with layer 1 and layer 2 of fixed6; the four comparisons are labelled a, b, c and d]

Layer 1 for forecast2: predict1 in advance3
Layer 1 for fixed6: specify1 in advance3

a = 1/[(2+2)/2] = 0.5
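The value of a can be reproduced under one plausible reading of the formula: the number of senses shared by the two first layers (here advance3) divided by the average number of tagged senses per layer (2 and 2). This reading is an assumption, as are the helper names `overlap` and `lcdd`. The slides do not give b, c and d, so no full LCDD value is computed here.

```python
def overlap(senses_a, senses_b):
    """Shared senses divided by the average number of senses in the two layers."""
    shared = len(set(senses_a) & set(senses_b))
    average_size = (len(senses_a) + len(senses_b)) / 2
    return shared / average_size

def lcdd(a, deeper_scores, first_weight=0.7):
    """Weighted combination a*70% + mean(deeper scores)*30%, as written on the slide."""
    deeper_mean = sum(deeper_scores) / len(deeper_scores)
    return a * first_weight + deeper_mean * (1 - first_weight)

# Layer 1 of forecast2 and fixed6, keeping only the sense-tagged words.
a = overlap(["predict1", "advance3"], ["specify1", "advance3"])
print(a)  # 0.5
```

Given values for b, c and d from the second-layer comparisons, `lcdd(a, [b, c, d])` would apply the 70%/30% weighting.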

Page 20: Limbengtatt25july2002

WSD

Simple Summation Algorithm

For example, assume a sentence containing ‘father’, ‘marry’ and ‘couple’, where each word has only two senses. The candidate sense combinations are:

father1 marry1 couple1
father1 marry1 couple2
father1 marry2 couple1
father1 marry2 couple2
father2 marry1 couple1
father2 marry1 couple2
father2 marry2 couple1
father2 marry2 couple2

Dynamic programming techniques

Repetitive Calculation

[Figure: lattice of sense combinations with edge weights (15.0, 15.0, 35.0, 10.0) and path totals 40.0, 40.0, 60.0, 45.0, 35.0, 24.0, 40.0 and 40.0]

The best combination of word senses: father1 marry2 couple1
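The simple summation idea can be sketched by enumerating every sense combination, summing the pairwise relatedness scores, and keeping the highest total. The scores in `RELATEDNESS` are hypothetical values chosen for illustration, not figures from the slides. In the real system they would come from the LCDD.

```python
from itertools import combinations, product

SENSES = {"father": [1, 2], "marry": [1, 2], "couple": [1, 2]}

# Hypothetical relatedness scores between sense pairs; pairs not listed score 0.
RELATEDNESS = {
    (("father", 1), ("marry", 2)): 15.0,
    (("marry", 2), ("couple", 1)): 35.0,
    (("father", 1), ("couple", 1)): 10.0,
    (("father", 2), ("marry", 1)): 5.0,
}

def score(assignment):
    """Sum the relatedness of every pair of chosen senses (pairs kept in sentence order)."""
    return sum(RELATEDNESS.get((x, y), 0.0) for x, y in combinations(assignment, 2))

def best_combination(words):
    """Try every combination of senses and return the one with the highest total."""
    candidates = product(*[[(w, s) for s in SENSES[w]] for w in words])
    return max(candidates, key=score)

best = best_combination(["father", "marry", "couple"])
print(best, score(best))
```

Exhaustive enumeration grows exponentially with sentence length, which is why the slide points to dynamic programming and to avoiding repetitive calculation.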

Page 21: Limbengtatt25july2002

System Design

[Figure: system design. Boxes around the Lexical Consultation System: General Dictionary (MTD) + Domain MRD, Domain MTD for WSD, Domain Database/Corpus, Tractable Domain Database/Corpus]

Page 22: Limbengtatt25july2002

System Architecture

[Figure: system architecture. Boxes: General Dictionary (MTD), Domain MRD, Domain Semantic Primitive (MTD), Domain MTD, Domain Database/Corpus, Tractable Domain Database/Corpus, Papillon Dictionaries or FEM, Bilingual Knowledge Bank (BKB)]

Processing steps shown with Domain Semantic Primitive (MTD):
  1. Part-of-speech tagging (Auto)
  2. Semantic Primitive (SP) identification
  3. SP WSD (Auto)
  4. SP LCDD generator (Auto)

Processing steps shown with Domain MTD:
  1. Part-of-speech tagging (Auto)
  2. WSD (Auto)
  3. LCDD generation (Auto)

Processing steps shown with Domain Database/Corpus and Tractable Domain Database/Corpus:
  1. Part-of-speech tagging (Auto)
  2. WSD (Auto)

LCDD=10%

word     sn   sp   pos  definition
be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english  2    N    n    [the, D, -] [people, N, ?] [of, -, -] [england, N, ?]
people   1    Y    n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
...      ...  ...  ...  ...

LCDD=0.3%

word     sn   sp   pos  definition
be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
english  2    N    n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 3]
people   1    Y    n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
...      ...  ...  ...  ...

Page 23: Limbengtatt25july2002

Tractable Bilingual Knowledge Bank (BKB)

[Figure: tractable BKB entries drawn as aligned Malay/English tree pairs. Each node is labelled word(sense)[POS](word interval/subtree interval). Labels are reproduced as they appear on the slide.]

Example 1 (1E, 1M)
  Malay: dia kutip bola itu (0-1 3-4 2-3 3-4)
    kutip(1)[v](3-4/3-4)
    dia(1)[n](0-1/0-1)
    bola(1)[n](2-3/2-4)
    itu(1)[det](3-4/3-4)
  English: he pick the ball up (0-1 3-4 2-3 3-4 7-8)
    pick(1)[v] up(1)[p](3-4+7-8/3-4)
    he(1)[n](0-1/0-1)
    ball(1)[n](3-4/2-4)
    the(1)[det](2-3/2-3)
  Correspondences: (0-5,0-4) (0-1,0-1) (2-4,2-4) (2-3,3-4) and (2-3,3-4) (3-4,2-3) (0-1,0-1)

Sub-entry
  he(1)[n](0-1/0-1) aligned with dia(1)[n](0-1/0-1)
  Correspondences: (0-1,0-1) and (0-1,0-1)

Example 2
  Malay: 0 lelaki 1 tua 2 itu 3 kutip 4 bola 5 itu 6
    kutip(2)[v](3-4/3-4)
    lelaki(3)[n](0-1/0-3)
    tua(2)[adj](1-2/1-2)
    itu(1)[det](2-3/2-3)
    bola(1)[n](2-3/2-4)
    itu(1)[det](3-4/3-4)
  English: 0 the 1 old 2 man 3 pick 4 the 5 ball 6 up 7
    pick(1)[v] up(1)[p](3-4+7-8/3-4)
    man(4)[n](2-3/0-3)
    the(2)[det](0-1/0-1)
    old(3)[adj](1-2/1-2)
    ball(1)[n](3-4/2-4)
    the(2)[det](2-3/2-3)

Page 24: Limbengtatt25july2002

Thank you

Please send any comments to [email protected]

Page 25: Limbengtatt25july2002

Semantic Primitive Extractor (cont)

Step 2: Compute the frequency of each sense entry in the dictionary according to its appearances in definition text, then sort the list by frequency
  An entry with high frequency => high probability that the entry is a primitive

Sense     frequency
be10      40
english2  20

Problems:
  Empty definitions
  Possibility of selecting wrong semantic primitives based on the self-reference method
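The frequency step amounts to counting how often each sense appears inside definitions and ranking the result. A minimal sketch follows. The sense-tagged toy definitions and the `sense_frequencies` helper are illustrative assumptions, not data from the actual dictionary.

```python
from collections import Counter

# Toy sense-tagged definitions; the sense tags are illustrative only.
DEFINITIONS = {
    "english2": ["people1", "england1"],
    "people1": ["body2", "citizen1", "state1", "country2"],
    "citizen1": ["people1", "state1"],
}

def sense_frequencies(definitions):
    """Count appearances of each sense in definition text, most frequent first."""
    counts = Counter()
    for tokens in definitions.values():
        counts.update(tokens)
    return counts.most_common()

for sense, frequency in sense_frequencies(DEFINITIONS):
    print(sense, frequency)
```

Senses at the top of the ranking are the candidates for semantic primitives. Where to cut the list is not specified in the slides.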

Page 26: Limbengtatt25july2002

WSD (Cont)

Improving the quality of a number of Natural Language Processing tasks:

Machine Translation

Information Extraction

Internet Search Engines

Page 27: Limbengtatt25july2002

WSD (Cont)

Path                     Difference  Path value
father1 marry1 couple1   P1          P1
father1 marry1 couple2   D1          P2 = P1 + D1
father1 marry2 couple1   D2          P3 = P2 + D2
father1 marry2 couple2   D3          P4 = P3 + D3
father2 marry1 couple1   D4          P5 = P4 + D4
father2 marry1 couple2   D5          P6 = P5 + D5
father2 marry2 couple1   D6          P7 = P6 + D6
father2 marry2 couple2   D7          P8 = P7 + D7

Path value = previous path value + difference between the two consecutive paths
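The incremental scheme above can be sketched as follows: the first path is scored in full, and each later path adds only the change caused by the senses that differ from the previous path. How the original system computes each difference is not stated, so the `difference` function below is an assumption, and the `TABLE` scores are hypothetical.

```python
from itertools import combinations, product

WORDS = ["father", "marry", "couple"]
# Hypothetical pairwise relatedness scores; pairs not listed score 0.
TABLE = {
    (("father", 1), ("marry", 2)): 15.0,
    (("marry", 2), ("couple", 1)): 35.0,
    (("father", 1), ("couple", 1)): 10.0,
    (("father", 2), ("marry", 1)): 5.0,
}
PATHS = list(product(*[[(w, s) for s in (1, 2)] for w in WORDS]))

def pair_score(x, y, table):
    return table.get((x, y), 0.0)

def full_score(path, table):
    """Score a path from scratch by summing all of its sense pairs."""
    return sum(pair_score(x, y, table) for x, y in combinations(path, 2))

def difference(prev, curr, table):
    """Score change between consecutive paths, touching only pairs with a changed sense."""
    changed = {i for i, (p, c) in enumerate(zip(prev, curr)) if p != c}
    delta = 0.0
    for i, j in combinations(range(len(curr)), 2):
        if i in changed or j in changed:
            delta += pair_score(curr[i], curr[j], table) - pair_score(prev[i], prev[j], table)
    return delta

def path_values(paths, table):
    values = [full_score(paths[0], table)]  # P1 is computed in full
    for prev, curr in zip(paths, paths[1:]):
        values.append(values[-1] + difference(prev, curr, table))  # Pk = Pk-1 + Dk-1
    return values

print(path_values(PATHS, TABLE))
```

With only three words the saving is small, but for longer sentences most pairs are unchanged between consecutive paths and are never recomputed.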