Click here to load reader
Upload
dmitry-kan
View
1.180
Download
0
Embed Size (px)
Citation preview
METHOD FOR AN AUTOMATIC GENERATION OF A SEMANTIC-LEVEL CONTEXTUAL TRANSLATIONAL DICTIONARY
St. Petersburg State University
Dmitry Kan Faculty of Applied Mathematics and Control Processes, Department of Technology of Programming, Peterhof, Russia
Abstract Word alignment Translational dictionary
In this paper we demonstrate the semantic
feature machine translation (MT) system
as a combination of two fundamental
approaches, where the rule-based side is
supported by the functional model of the
Russian language and the statistical side
utilizes statistical word alignment. The
MT system relies on a semantic-level
contextual translational dictionary as its
key component. We will present the
method for an automatic generation of the
dictionary where disambiguation is done
on a semantic level.
Desperate to hold onto power , Pervez
Musharraf has
discarded Pakistan ' s constitutional
framework and
declared a state of emergency .
NULL ({20}) В ({})
отчаянном ({1 3 4})
стремлении ({2}) удержать ({}) власть
({5}) ,
({6}) Первез ({7}) Мушарраф ({8}) от-
верг ({9 10})
конституционную ({14 15})
систему ({})
Пакистана ({11 12 13}) и ({16})
объявил ({17}) о ({18})
введении ({})
чрезвычайного ({19 21})
положения ({}) . ({22})
Table 1: Word alignment for English and Russian sentences
Parallel corpus: UMC 0.1 86000 pairs of sentences 1,3 million phrase pairs ~18000 resulting dictionary entries
В Y1>HabU(Y1:,ПРЕД:Z1)
\\ <149>--->Within
В Y1>Loc(Y1:,ВНУТРИ$12/313/05
(ПРЕД:Z1))
\\ <146>--->at
В Y1>Loc(Y1:,Oper01(#,ПРЕД:Z1))
\\ <208>--->In
В Y1>Loc(Y1:,ПРЕД:Z1)
\\ <224>--->Throughout
...
НА Y1>Direkt(Y1:,ВЕРХ$12/141/05
(ВИН:Z1))
\\ <67>--->at
НА Y1>Direkt(Y1:,РОД:Z1) \\ <100>-
-->on
НА Y1>Direkt(Y1:,РОД:Z1) \\ <69>--
->for
...
ОБРАЗ (РОД:Z1) \\ <2>--->a way
ОБЩЕМИРОВОЙ A1>Rel
(A1:НЕЧТО$1,ПОЛНЫЙ$12/207/05
(МИР$1227))
\\ <1>--->global
...
Computer semantics theory
Thesis 1. Language is an algebraic system
{f1, .., fn, M}, where fi is basis function and
M is data structure (set of basis concepts) of
a natural language L.
Thesis 2. Each word in a sentence S is the
name of its semantic function.
Thesis 3. Grammar links with semantics and
can be incorporated into semantics
dictionary
mjhiww
wwfwwfFS
hmij
nlnnk
,,
)),,...,(),...,,...,(( 11111
Russian English
NULL of
отчаянном Desperate to hold
стремлении to
власть power
, ,
Первез Pervez
Мушарраф Musharraf
отверг has discarded
конституционную constitutional framework
Пакистана Pakistan ´ s
и and
объявил declared
о a
чрезвычайного state emergency
. .
Features of Machine Translation System dictionary entries contain semantic attributes of the Russian words
each entry represents a sample of a context extracted using statistical
word alignment and coded with the corresponding semantic formula;
the MT system is automatically extendable through acquiring new par-
allel corpora and applying the method of word alignment with semantic
analysis of sentences on source language side
Semantic Machine Translation Model
where
i
lk
s
i
ml
mk
S
ni
P
ttm
tti
SMTM
),(maxarg),...,1
(maxarg
,2
1,1,1
ML
lt
kt
MLl
tk
t
lt
kt
Si
2,0
2,1
),(