Click here to load reader

Processing of Russian by the ETAP-3 Linguistic Processor

  • View
    42

  • Download
    0

Embed Size (px)

DESCRIPTION

Processing of Russian by the ETAP-3 Linguistic Processor. Igor Boguslavsky Institute for Information Transmission Problems , Russian Academy of Sciences / Universidad Politécnica de Madrid. ETAP-3. - PowerPoint PPT Presentation

Text of Processing of Russian by the ETAP-3 Linguistic Processor

De la estructura semntica de la palabra a la estructura semntica de la oracin.

Processing of Russian by the ETAP-3 Linguistic ProcessorIgor BoguslavskyInstitute for Information Transmission Problems, Russian Academy of Sciences / Universidad Politcnica de Madrid1ETAP-3A multipurpose NLP environment developed in the Institute for Information Transmission Problems, Russian Academy of SciencesTheoretical foundations: Igor Meluk. Meaning Text Linguistic Theory Jury Apresjan. Systematic Lexicography and the Theory of Integral Description of LanguageETAP-3 is a multipurpose NLP environment developed in the Institute for Information Transmission Problems, Russian Academy of Sciences (Apresjan et al. 2003). The theoretical foundation of ETAP-3 is the Meaning Text linguistic model by Igor' Mel'uk and the Integral Theory of Language by Jurij Apresjan. MotivationA non-commercial environment, primarily oriented at linguistic research, rather than the creation of a marketable software product.Main focus: computational modelling of natural languages. NB: grammatical vs. non-grammaticalLargest coverage: Russian and EnglishETAP-3 is a non-commercial environment, primarily oriented at linguistic research, rather than the creation of a marketable software product. The main focus of the research pursued with ETAP-3 is computational modelling of natural languages. The language for which the model offers the largest coverage and has been developed deepest of all is Russian, a language very rich in morphology. The model relies on a comprehensive grammar, which is used in several rule-based components that perform dependency parsing, deep-structure construction, and generation.Ideally, it should distinguish between correct and faulty. Obviously, the desire to distinguish between correct Russian and faulty Russian is more important theoretically than practically. As far as practical applications are concerned, we sometimes relax grammar rules, in order to increase robustness Major featuresRule-based (with some statistical support of the parsing)Strict stratificationDependency syntactic structureLexicalistic approachSelf-tuning to the text processedMaximum reusability of modules and resources in different applicationsETAP-3 disposes of a more than 130,000 lemmas-strong morphological dictionary that accounts for inflection, and a so-called combinatorial dictionary of nearly the same size. Many types of info.

5General Architecture of Translation Process5DictionariesMorphological dictionary (130,000+)Combinatorial dictionaryLemma nameSyntactic features (out of 200+)Semantic features (out of 40+)Subcategorization frame (morphological, syntactic, semantic, lexical constraints)Values of Lexical FunctionsRules of various typesA combinatorial dictionary entry contains, in addition to the lemma name, information on syntactic and semantic features of the word, its subcategorization frame, a default translation (into English), rules of various types, and values of lexical functions (in the sense of Meluk) for which the lemma is the keyword. The word's syntactic features characterize its ability/non-ability to participate in specific syntactic constructions. A word can have several syntactic features selected from a total of more than 200 items. Semantic features are needed to check semantic agreement between the words in a sentence and formulate word sense disambiguation rules. The subcategorization frame shows the surface marking of the words arguments (in terms of case, prepositions, conjunctions, etc.). Rules are an essential part of the dictionary entry. As a matter of fact, all rules operating in ETAP-3 are distributed between the grammar and the dictionary. Grammar rules are more general and apply to large classes of words, whereas rules listed or simply referred to in the dictionary are restricted in their scope and only apply to small classes of words or even individual words. This organization of the rules ensures self-tuning of the system to the processing of each particular sentence. In processing a sentence, only those dictionary rules are activated that are present, or explicitly referred to in the dictionary entries of the words making up the sentence.

Self-Tuning: Grammar vs. DictionaryGeneral regularities: general rules applied to large lexical classes and resorted to in every sentence processingSimple examples: Agreement Adj + N, N + VLess general regularities: Specific rules applied to closed lexical classes. Simple example: composite numerals77ETAP optionsMachine Translation Ontology-based semantic analysisInterlingua-based analysis and generation (UNL)Paraphrasing on the basis of LFs Speech synthesis support Syntactic annotation of textsGrammar checking8Machine Translation Russian EnglishRussian-French PrototypeRussian-German PrototypeRussian-Korean PrototypeSpanish-English PrototypeArabic-English Prototype99Multiple parsing and translation facilityInteractive disambiguationExtensive use of Lexical Functions for parsing, disambiguation, translation, paraphrasing. Some salient features Multiple Parsing/TranslationThey made a general reply that

(a) they replied in a general way that

(b) they forced a general to reply that1111Syntactic Dependency Tree: First Option12

12Translation: First Option13 , . 13Syntactic Dependency Tree: Second Option14

14Translation: Second Option15 , . 15Interactive disambiguationWe know that automatic WSD and syntactic disambiguation are not very efficient so far. Therefore it is useful for some applications to interact with the user, when the model does not have enough information for reliable disambiguation. Scenario. MT for the author. User: understands SL

Lexical FunctionsLFs {R, X, Y} are widely spread linguistically relevant meanings that are expresed differently in different languages.These meanings are language-independent. Their correlates are language-specific.MAGN (disease) = graveMAGN (rain) = heavy MAGN ( disease) = , lit. heavy MAGN ( rain) = , lit. strong

19where R is a certain general semantic relation obtaining between the argument lexeme X (the keyword) and some other lexeme Y which is the value of R with regard to X

Substitute LFs= those which replace the keyword in the given utterance without substantially changing its meaning or changing it in a strictly predictable way.Synonyms, hypernyms, antonymsConverse terms: buy sell, right left, Derivatives: encourage encouragement,to build buildernominate nomineeteach student

Paraphrasing with converse termsShe bought a computer for 500 dollars from a retail dealer A retail dealer sold her a computer for 500 dollars She paid 500 dollars to the retail dealer for a computer The retail dealer got 500 dollars from her for a computer.Collocate LFs= those which appear in an utterance alongside the keyword. Adjectival LFs, such as MAGNSupport verbs of the OPER/FUNC/LABOR family: play a leading role in paraphrasing

Paraphrasing based on collocatesHe respects [X] his teachers He has [Oper1(S0(X))] respect [S0(X)] for his teachers He treats [Labor1-2(S0(X))] his teachers with respect His teachers enjoy [Oper2(S0(X))] his respect.

Rules for the previous exampleX Oper1(X) + S0(X)X Oper2(X) + S0(X)X Labor12(X) + S0(X)Some other rulesX Copul + S1(X) He taught me at school He was my teacher at schoolX Func0 + S0(X) They are arguing heatedly A heated argument between them is onX Func1 + S0(X)He is afraid Fear possesses him IncepOper1 + S0(X) IncepOper2 + S0(X) He conceived a dislike for her She caused his dislikeFinOper1 + S0(X) FinOper2 + S0(X) England lost control of this territory This territory went out of Englands control Some more rules LiquOper1 + S0(X) LiquOper2 + S0(X) The government deprived the monopolies of control over the prices The government took the prices out of the monopolies controlLiquOper1 + S0(X) LiquFunc1 + S0(X) We freed him of this burden We lifted this burden from himX IncepOper1 + Sres(X) IncepFunc1 + Sres(X).He learned physics He acquired the knowledge of physics.

Some moreX CausOper1 + Sres(X) etc. He taught me physics He gave me the knowledge of physics. LiquOper1 + Sinit (X) LiquFunc1 + Sinit(X) etc. A sudden bell woke him up A sudden bell interrupted his sleep.CausFact0-M + X / CausFact1-M + X / CausReal1-M + X IncepFact0-M + X / IncepReal1-M + X etc. They sent him on leave for a few days He went on leave for a few days. Some moreLiquFact0-M + X / LiquFact1-M + X / LiquReal1-M + X FinFact0-M + X / FinReal1-M + X etc. He was deprived of his last chance to win in this event He lost his last chance to win in this eventAnti1Fact0-M(X) + X = negFact0-M(X) + X etc. The plans of pacifying the aggressor failed The plans of pacifying the aggressor did not succeed;Anti1Real1-M(X) + X negReal1-M(X) + X etc. The board of directors declined the compromise The board of directors did not accept the compromise,

Some moreAnti1Real2-M(X) + X negReal2-M(X) + X etc. He swallowed up the insult He did not avenge the insultAnti1Real3-M(X) + X negReal3-M(X) + X etc. The lecturer ignored the questions of the audience The lecturer did not answer the questions of the audience, He neglected my advice He did not follow my advice, Any soldier who violates the order is subject to court martial Any soldier who does not obey the order is subject to court martial. X + Y Anti1 + Anti2He stopped violating the rules He began observing the rules

Lexicographic supportA paraphrasing system of this kind requires a good lexicographic source from which the appropriate LF values of words could be extracted. Such a source is provided by the combinatorial dictionaryCombinatorial dictionaries of English and Russian: an inventory of 120+ LFs LFs have a strong potential for NLP applications. LFs are used for: Lexical and syntactic disambiguation Adequate word selecti

Search related