Building a Primitive-based Lexical Consultation System
prepared by Lim Beng Tat
Supervisor: Dr Tang Enya Kong
Dr. Guo Cheng Ming
Abstract
This research describes the design of a semantic-primitive-based lexical consultation system and the processes applied to a machine-readable dictionary (MRD) and a corpus to automatically produce a machine-tractable dictionary (MTD) and a tractable corpus. Linguistic tools and resources, such as a sense tagger, are created during or after these processes. The research also shows how to apply an unsupervised word sense disambiguation method, using the newly constructed MTD, to samples of unrestricted text from various prospective application areas. This is important for applications that need lexical semantics, such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing.
Outline
  Introduction
    Problem
    Objective
  Lexical Consultation System
    System design and architecture
  Example applications
    Bilingual Knowledge Bank
Introduction
  Dictionaries
    Supply knowledge (language and world)
    E.g. Collins English Dictionary (CED), Longman's Dictionary of Contemporary English (LDOCE) and Webster's 9th Dictionary (W9)

  word     sn   pos  definition
  be       10   n    spend or use time
  english  2    n    people of england
  ...      ...  ...  ...
Introduction (Cont)
  Explicit information (POS)
  Implicit information / semantic information
    Hypernym/hyponym relations (class/subclass)
    Synonymy/antonymy relations
    Meronym/holonym relations (part/whole, ...)
    Collocational relations (compounds, idioms, ...), etc.
Introduction (Cont)
  Problem: how to extract semantic information from a dictionary?
  2 methods:
    Defining pattern
      Identify significant recurring phrases
      E.g. “A member of” - NP
        hand: a member of a ship's crew… [W9]
    Extraction of semantic hierarchy
      Extraction of hyponyms, e.g.
        dipper: a ladle used for dipping... [CED]
        ladle: a long-handled spoon... [CED]
        spoon: a metal, wooden, or plastic utensil... [CED]
      Resulting hierarchy, most general first:
        utensil
          spoon
            ladle
              dipper
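The hierarchy extraction above can be sketched in a few lines of Python. This is only an illustration, assuming the genus term of each definition (the head noun, e.g. "ladle" for dipper) has already been identified; the `genus_of` mapping and the `hypernym_chain` helper are hypothetical names, not part of the system described in the slides.

```python
def hypernym_chain(word, genus_of):
    """Follow genus terms upward until no parent is known or a loop is found."""
    chain = [word]
    seen = {word}
    while chain[-1] in genus_of:
        parent = genus_of[chain[-1]]
        if parent in seen:
            # Circular definitions would loop forever, so stop here.
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Genus terms hand-picked from the CED definitions quoted on the slide.
genus_of = {"dipper": "ladle", "ladle": "spoon", "spoon": "utensil"}
chain = hypernym_chain("dipper", genus_of)
print(chain)  # ['dipper', 'ladle', 'spoon', 'utensil']
```

The `seen` set is a design choice of this sketch: it keeps the walk finite when two entries define each other.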
Introduction (Cont)
  Disadvantages:
    Circularity
      E.g. tool: an implement, such as a hammer... [CED]
           implement: a piece of equipment; tool or utensil. [CED]
           utensil: an implement, tool or container... [CED]
    Inconsistency in dictionaries
      E.g. corkscrew: a pointed spiral piece of metal... [W9]
           dinner service: a complete set of plates and dishes... [LDOCE]
    Dictionaries are written for human use
  Other methods: semantic primitives and word sense disambiguation
Semantic Primitive
  A semantic primitive is a “core” meaning that cannot be analyzed further
  E.g. bachelor and red
    bachelor means a man who is not married
    What does red mean?
    red represents a semantic primitive (a basic meaning), while bachelor does not.
Semantic Primitive (Cont)
  2 types of semantic primitives: prescriptive and descriptive
  Prescriptive semantic primitives
    A set of pre-defined primitives
    E.g. father marry couple
      marry : [human, human]
      father : [human]
      couple : [human, thing]
    Used to choose the correct sense of ‘couple’
    Problem: the set always needs to be extended
  Descriptive semantic primitives
    A set of semantic primitives derived from a natural source of data such as a dictionary
    E.g.
      father5 - a term#5 of address for priest#2 in some church especially roman#7 or orthodox#3 catholic
      marry3 - perform#1 a marriage#4 ceremony
      couple1 - a pair#5 of people#5 who live#7 together#2
Semantic Primitive (Cont)
  Uniquely identify the definition of each entry
  Avoid circularity
Word Sense Disambiguation (WSD)
  Documents are collections of sentences containing words
  Some words have more than one meaning. These meanings are often called word senses.
  Goal: assign meanings to words in some context according to some lexical resource.
Objective
  Produce a Machine-Tractable Dictionary (MTD) from a Machine-Readable Dictionary (MRD) using descriptive semantic primitives and WSD
  Produce a tractable database/corpus from a database/corpus

  word     sn   sp   pos  definition
  be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
  english  2    N    n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 1]
  ...      ...  ...  ...  ...

  LCDD = 0.1%
Linguistic Resources
  Machine-tractable dictionary
    Encoded with information extracted from the MRD
    Usable format and highly structured semantic information for NLP tasks
    Determining the relatedness or closeness among word senses in a dictionary
  Descriptive semantic primitives
Lexical Consultation System
  Semantic Primitive Extractor
  LCDD Generator
  WSD
Searching for a self-reference circle in the definitions
For example:
  sense_1 [def] [sense_2 sense_5 sense_6]
  sense_2 [def] [sense_3 sense_2]
  sense_3 [def] [sense_1 sense_2]
  sense_4 [def] [sense_5]
  sense_5 [def] [sense_2 sense_4]
  sense_6 [def] [sense_5 sense_4]
  => sense_1 is a semantic primitive
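The slide does not spell out the exact test behind the self-reference circle. One possible reading, sketched below under that assumption, is that a sense counts as self-referential when following the senses used in its definition eventually leads back to it. The function name `reaches_itself` and the extra entry `sense_7` are hypothetical and only serve the illustration.

```python
def reaches_itself(definitions, start):
    """Return True if `start` can be reached again by following definitions."""
    stack = list(definitions.get(start, []))
    seen = set()
    while stack:
        sense = stack.pop()
        if sense == start:
            return True
        if sense in seen:
            continue
        seen.add(sense)
        stack.extend(definitions.get(sense, []))
    return False


# Definition graph copied from the slide example.
definitions = {
    "sense_1": ["sense_2", "sense_5", "sense_6"],
    "sense_2": ["sense_3", "sense_2"],
    "sense_3": ["sense_1", "sense_2"],
    "sense_4": ["sense_5"],
    "sense_5": ["sense_2", "sense_4"],
    "sense_6": ["sense_5", "sense_4"],
    # Hypothetical extra entry: it uses sense_1 but nothing refers back to it.
    "sense_7": ["sense_1"],
}

print(reaches_itself(definitions, "sense_1"))  # True
print(reaches_itself(definitions, "sense_7"))  # False
```

Under this reading sense_1 closes a circle through sense_2 and sense_3, which agrees with the conclusion on the slide.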
Semantic Primitive Extractor
Step 1: Expanding the dictionary
  Before expansion:
    abandon 1   a feeling of extreme emotional intensity
    abandon 2   leave behind
    ...
    betray 2    abandon
  After expansion:
    abandon 1   a feeling of extreme emotional intensity
    abandon 2   leave behind
    ...
    betray 2    abandon1 abandon2
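A minimal sketch of this expansion step, assuming definitions are plain whitespace-separated strings and that a word without its own entry is left unchanged (the slide does not say how such words are handled). The function name `expand_definitions` and the dictionary layout are illustrative assumptions.

```python
def expand_definitions(dictionary):
    """Replace every defining word by the list of all its sense labels."""
    senses_of = {}
    for word, sense_number in dictionary:
        senses_of.setdefault(word, []).append(f"{word}{sense_number}")

    expanded = {}
    for (word, sense_number), definition in dictionary.items():
        tokens = []
        for token in definition.split():
            # Assumption: words with no entry of their own are kept as they are.
            tokens.extend(senses_of.get(token, [token]))
        expanded[f"{word}{sense_number}"] = tokens
    return expanded


# Entries taken from the slide example.
dictionary = {
    ("abandon", 1): "a feeling of extreme emotional intensity",
    ("abandon", 2): "leave behind",
    ("betray", 2): "abandon",
}
expanded = expand_definitions(dictionary)
print(expanded["betray2"])  # ['abandon1', 'abandon2']
```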
Semantic Primitive Extractor (cont)
Step 2: Identify semantic primitives using the self-reference circle
  Example: extracting primitives from the pre-release WordNet during SENSEVAL-2
    Pre-release WordNet 1.7 = 192,460 entries
    Extracted primitives = 9,368 entries (around 5% of the pre-release WordNet 1.7 entries)
Semantic Primitive Extractor (cont)
LCDD generator
  Identify the word senses' definition layers
  First layer for forecast2 and fixed6
    forecast2 : predict1 in advance3
    fixed6 : specify1 in advance3
  Second layer for forecast2 and fixed6
    forecast2
      predict1 : make3 a prediction1 about
      advance3 : a change1 for the better2; progress4
    fixed6
      specify1 : be specific1 about
      advance3 : a change1 for the better2; progress4
LCDD generator (Cont)
  LCDD(forecast2, fixed6) = a * 70% + ((b + c + d) / 3) * 30%
  Depth-First Method
  [Figure: Layer 1 and Layer 2 of forecast2 compared with Layer 1 and Layer 2 of fixed6; the comparisons are labelled a, b, c and d]
    Layer 1 of forecast2: predict1 in advance3
    Layer 1 of fixed6: specify1 in advance3
    a = 1 / [(2 + 2) / 2] = 0.5
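The weighting can be checked with a short Python sketch. It assumes that a layer comparison is the number of shared senses divided by the average layer size, which is one way to read a = 1/[(2+2)/2]; the slide does not define the measure explicitly. The values used for b, c and d are made up, since the slide gives none, and the function names are hypothetical.

```python
def layer_overlap(layer_x, layer_y):
    """Shared senses divided by the average size of the two layers (assumed measure)."""
    shared = len(set(layer_x) & set(layer_y))
    average_size = (len(layer_x) + len(layer_y)) / 2
    return shared / average_size


def lcdd(a, others, top_weight=0.7):
    """Weight the first comparison at 70% and the mean of the rest at 30%."""
    return top_weight * a + (1 - top_weight) * sum(others) / len(others)


# Layer 1 of forecast2 and fixed6, sense-tagged words only.
a = layer_overlap(["predict1", "advance3"], ["specify1", "advance3"])
print(a)  # 0.5

# b, c, d are hypothetical values chosen only to exercise the formula.
score = lcdd(a, [0.2, 0.1, 0.3])
print(round(score, 2))  # 0.41
```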
WSD
  Simple Summation Algorithm
  For example, assume a sentence contains the words ‘father’, ‘marry’ and ‘couple’, and each word has only two senses. The possible sense combinations are:
    father1 marry1 couple1
    father1 marry1 couple2
    father1 marry2 couple1
    father1 marry2 couple2
    father2 marry1 couple1
    father2 marry1 couple2
    father2 marry2 couple1
    father2 marry2 couple2
  Dynamic programming techniques avoid repetitive calculation
  [Figure: lattice of sense combinations. Path scores shown: 40.0, 40.0, 60.0, 45.0, 35.0, 24.0, 40.0, 40.0; other values shown: 15.0, 15.0, 35.0, 10.0]
The best combination of word senses: father1 marry2 couple1
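A brute-force version of the summation can be sketched as follows. It assumes the score of a combination is the sum of a pairwise relatedness value over every pair of senses in it; the slides do not state which pairs are summed. The scores in `TOY_SCORES` are invented for the illustration and are not the values from the figure.

```python
from itertools import combinations, product


def best_sense_combination(sense_lists, relatedness):
    """Score every sense combination and return the highest-scoring one."""
    best_path, best_score = None, float("-inf")
    for path in product(*sense_lists):
        score = sum(relatedness(x, y) for x, y in combinations(path, 2))
        if score > best_score:
            best_path, best_score = path, score
    return best_path, best_score


# Hypothetical pairwise scores; unlisted pairs get a low default.
TOY_SCORES = {
    frozenset(("father1", "marry2")): 3.0,
    frozenset(("marry2", "couple1")): 2.0,
    frozenset(("father1", "couple1")): 1.0,
}


def toy_relatedness(x, y):
    return TOY_SCORES.get(frozenset((x, y)), 0.5)


senses = [["father1", "father2"], ["marry1", "marry2"], ["couple1", "couple2"]]
best, score = best_sense_combination(senses, toy_relatedness)
print(best, score)  # ('father1', 'marry2', 'couple1') 6.0
```

With two senses per word this enumerates 2 x 2 x 2 = 8 combinations; the count grows exponentially with sentence length, which is why repeated calculation becomes a concern.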
System Design
  [Diagram: Lexical Consultation System]
    Domain MRD + General Dictionary (MTD) -> Domain MTD for WSD
    Domain Database/Corpus -> Tractable Domain Database/Corpus
System Architecture
  [Diagram: three processing stages; resources shown are the General Dictionary (MTD) and the Domain MRD]
  Stage producing the Domain Semantic Primitive (MTD):
    1. Part-of-speech tagging (Auto)
    2. Semantic Primitive (SP) identification
    3. SP WSD (Auto)
    4. SP LCDD generator (Auto)
  Stage producing the Domain MTD:
    1. Part-of-speech tagging (Auto)
    2. WSD (Auto)
    3. LCDD generation (Auto)
  Stage turning the Domain Database/Corpus into the Tractable Domain Database/Corpus:
    1. Part-of-speech tagging (Auto)
    2. WSD (Auto)
LCDD = 10%

  word     sn   sp   pos  definition
  be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
  english  2    N    n    [the, D, -] [people, N, ?] [of, -, -] [england, N, ?]
  people   1    Y    n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
  ...      ...  ...  ...  ...

LCDD = 0.3%

  word     sn   sp   pos  definition
  be       10   Y    n    [spend, V, 1] [or, C, -] [use, V, 2] [time, N, 1]
  english  2    N    n    [the, D, -] [people, N, 1] [of, P, -] [england, N, 3]
  people   1    Y    n    [the, D, -] [body, N, 2] [of, P, -] [citizen, N, 1] [of, P, -] [a, D, -] [state, N, 1] [or, P, -] [country, N, 2]
  ...      ...  ...  ...  ...
Papillon Dictionaries or FEM
Bilingual Knowledge Bank (BKB)
Tractable Bilingual Knowledge Bank (BKB)

[Figure: aligned Malay-English example pairs (1E : 1M). Each node is annotated as word(sense)[POS](interval/interval); intervals are reproduced as extracted.]

Pair 1
  Malay: dia kutip bola itu (intervals: 0-1 3-4 2-3 3-4)
    dia(1)[n](0-1/0-1)
    kutip(1)[v](3-4/3-4)
    bola(1)[n](2-3/2-4)
    itu(1)[det](3-4/3-4)
  English: he pick the ball up (intervals: 0-1 3-4 2-3 3-4 7-8)
    he(1)[n](0-1/0-1)
    pick(1)[v] up(1)[p](3-4+7-8/3-4)
    the(1)[det](2-3/2-3)
    ball(1)[n](3-4/2-4)
  Correspondences:
    (0-5,0-4)(0-1,0-1)(2-4,2-4)(2-3,3-4)
    (2-3,3-4)(3-4,2-3)(0-1,0-1)

Sub-pair
  English: he(1)[n] (0-1)
  Malay: dia(1)[n](0-1/0-1)
  Correspondences: (0-1,0-1)(0-1,0-1)

Pair 2
  Malay: 0 lelaki 1 tua 2 itu 3 kutip 4 bola 5 itu 6
    lelaki(3)[n](0-1/0-3)
    tua(2)[adj](1-2/1-2)
    itu(1)[det](2-3/2-3)
    kutip(2)[v](3-4/3-4)
    bola(1)[n](2-3/2-4)
    itu(1)[det](3-4/3-4)
  English: 0 the 1 old 2 man 3 pick 4 the 5 ball 6 up 7
    the(2)[det](0-1/0-1)
    old(3)[adj](1-2/1-2)
    man(4)[n](2-3/0-3)
    pick(1)[v] up(1)[p](3-4+7-8/3-4)
    the(2)[det](2-3/2-3)
    ball(1)[n](3-4/2-4)
Thank you
Please send any comments to
Semantic Primitive Extractor (cont)
Step 2: Compute the frequency of each sense entry in the dictionary according to its appearance in definition text, then sort the list by frequency
  An entry with high frequency => high probability that the entry is a primitive
  Problems:
    Empty definitions
    Possibility of selecting wrong semantic primitives based on the self-reference method

  Sense     frequency
  be10      40
  english2  20
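The frequency step can be sketched with `collections.Counter`. The input is assumed to be a mapping from each sense to the list of sense labels in its definition; the three toy entries below are hypothetical and much smaller than a real dictionary.

```python
from collections import Counter


def sense_frequencies(expanded_definitions):
    """Count how often each sense label appears in definition text, most frequent first."""
    counts = Counter()
    for tokens in expanded_definitions.values():
        counts.update(tokens)
    return counts.most_common()


# Hypothetical sense-tagged definitions, for illustration only.
toy_definitions = {
    "spend1": ["be10", "time1"],
    "briton1": ["be10", "english2"],
    "exist1": ["be10"],
}
ranking = sense_frequencies(toy_definitions)
print(ranking[0])  # ('be10', 3)
```

Senses at the top of the ranking are the candidates for primitives; an entry with an empty definition contributes nothing to the counts, which is one of the problems the slide lists.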
WSD (Cont)
  Improves the quality of a number of Natural Language Processing tasks:
    Machine Translation
    Information Extraction
    Internet Search Engines
WSD (Cont)
  Path                     Difference  Path value
  father1 marry1 couple1   P1          P1
  father1 marry1 couple2   D1          P2 = P1 + D1
  father1 marry2 couple1   D2          P3 = P2 + D2
  father1 marry2 couple2   D3          P4 = P3 + D3
  father2 marry1 couple1   D4          P5 = P4 + D4
  father2 marry1 couple2   D5          P6 = P5 + D5
  father2 marry2 couple1   D6          P7 = P6 + D6
  father2 marry2 couple2   D7          P8 = P7 + D7

  Path value = previous path value + difference between the two consecutive paths
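The incremental update can be sketched as below. The sketch assumes a path value is the sum of pairwise relatedness over all sense pairs in the path, so the difference between two consecutive paths only involves pairs that contain a changed sense. The relatedness scores are invented, and `incremental_path_values` is a hypothetical name.

```python
from itertools import combinations, product


def path_value(path, relatedness):
    """Full recomputation: sum relatedness over every pair of senses in the path."""
    return sum(relatedness(x, y) for x, y in combinations(path, 2))


def incremental_path_values(sense_lists, relatedness):
    """Compute P1 in full, then each later value as previous value + difference."""
    values = []
    previous_path, previous_value = None, None
    for path in product(*sense_lists):
        if previous_path is None:
            value = path_value(path, relatedness)
        else:
            changed = {i for i, (old, new) in enumerate(zip(previous_path, path)) if old != new}
            difference = 0.0
            for i, j in combinations(range(len(path)), 2):
                if i in changed or j in changed:
                    # Only pairs touching a changed sense contribute to the difference.
                    difference += relatedness(path[i], path[j])
                    difference -= relatedness(previous_path[i], previous_path[j])
            value = previous_value + difference
        values.append((path, value))
        previous_path, previous_value = path, value
    return values


# Hypothetical scores, for illustration only.
TOY = {frozenset(("father1", "marry2")): 3.0, frozenset(("marry2", "couple1")): 2.0}


def toy_rel(x, y):
    return TOY.get(frozenset((x, y)), 0.5)


words = [["father1", "father2"], ["marry1", "marry2"], ["couple1", "couple2"]]
table = incremental_path_values(words, toy_rel)
for path, value in table:
    print(" ".join(path), value)
```

For three words the saving is small; the benefit of reusing the previous path value grows with the number of words, because most pairs are unchanged between consecutive paths.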