Prague Arabic Dependency Treebank
Development in Data and Tools

Jan Hajič, Otakar Smrž, Petr Zemánek, Jan Šnaidauf, Emanuel Beška

Faculty of Mathematics and Physics
Faculty of Philosophy and Arts
Charles University in Prague
September 23, 2004 Prague Arabic Dependency Treebank: Development in Data and Tools
Project Release

PADT 1.0, December 2004, Linguistic Data Consortium
148 000 Morpho, 113 500 Syntax

AFP   13 000   N/A      France Presse   Penn ATB 1
UMH   38 500   N/A      Ummah Press     Penn ATB 2
XIN   13 500   N/A      Xinhua News     Arabic Gigaword
ALH   10 000   73 500   Al-Hayat News   Arabic Gigaword
ANN   12 500   25 500   An-Nahar News   Arabic Gigaword
XIA   26 500   49 500   Xinhua News     Arabic Gigaword
Open-Source Tools

TrEd Tree Editor
  Multi-purpose annotation environment
  Suite of programming utilities
Netgraph Search Engine
  Server/Client system architecture
  Easy-to-learn query language
Encode::Arabic Perl Module
  Extension for processing of Arabic script
  ArabTeX, Buckwalter, Unicode, …
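
To illustrate what conversion between the Buckwalter notation and Unicode involves, here is a minimal Python sketch. It does not use or reproduce the Encode::Arabic interface. The names `BUCKWALTER_TO_UNICODE`, `buckwalter_to_unicode` and `unicode_to_buckwalter` are assumptions made for this example, and the table covers only a handful of symbols from the standard Buckwalter scheme.

```python
# Illustrative subset of the Buckwalter scheme (hypothetical helper table,
# not the full mapping and not the Encode::Arabic implementation).
BUCKWALTER_TO_UNICODE = {
    "A": "\u0627",  # alif
    "b": "\u0628",  # ba
    "t": "\u062A",  # ta
    "k": "\u0643",  # kaf
    "m": "\u0645",  # mim
    "n": "\u0646",  # nun
    "a": "\u064E",  # fatha (short a)
    "u": "\u064F",  # damma (short u)
    "i": "\u0650",  # kasra (short i)
    "~": "\u0651",  # shadda (gemination mark)
}

# Reverse table, valid because the subset above is one-to-one.
UNICODE_TO_BUCKWALTER = {v: k for k, v in BUCKWALTER_TO_UNICODE.items()}


def buckwalter_to_unicode(text: str) -> str:
    """Replace each known Buckwalter symbol; unknown characters pass through."""
    return "".join(BUCKWALTER_TO_UNICODE.get(ch, ch) for ch in text)


def unicode_to_buckwalter(text: str) -> str:
    """Inverse direction over the same subset; unknown characters pass through."""
    return "".join(UNICODE_TO_BUCKWALTER.get(ch, ch) for ch in text)
```

Because the mapping is character by character and one-to-one, a round trip such as `unicode_to_buckwalter(buckwalter_to_unicode("kitAb"))` returns the input unchanged. Symbols outside the subset are left as they are.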
PADT Functional Views

Functional Generative Description
  Theory of linguistic meaning and its expression
  Prague Dependency Treebank for Czech
Independence of representation levels
  Tectogrammatical: linguistic meaning
  Analytical: surface dependency syntax
  Morphological: categories and lexical units
Abstraction of the relations across levels
  Strict distinction between form and function
  Different units of description on each level
Functional Morphology

Provides syntax levels with their abstract language, not just giving letters in tokens
Revives multiple senses of categories
Completeness of generation
Strict modeling of grammatical control
MorphoTrees: ‘human tagging’
Successful prototype feature-based tagger
Syntactic Levels of Description

Analytical level
  Pragmatically motivated, close to surface syntax
  Every single token resulting from the morphological level forms one node
  Tree-like dependency structure for every sentence
Tectogrammatical level
  Linguistic (literal) meaning, deep relations, TFA (topic-focus articulation)
  Initial structures transformed from the analytical level
  Nodes for autosemantic words only
  Decisive role of valency frames
Logic of Analytical Trees

Concepts of dependency and valency
Reduction: sentence must retain grammatical correctness if leaves (terminal nodes) are chopped off
Trees: clause components, clauses, sentences, paragraphs, etc.
  Subtrees of clauses exchangeable for non-clauses
Nodes: words, tokenized parts of words, punctuation marks; each marked by a function
Edges: syntactic relations between a governing node and a dependent node/subtree
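
The node, edge and reduction notions above can be sketched as a small Python data structure. This is only an illustration under stated assumptions, not the PADT or TrEd data model: the class `Node`, the helpers `surface`, `chop_leaf` and `build_example`, and the attachments chosen in the example (a shortened form of the verbal sentence used elsewhere in this talk) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare by identity so list.remove() hits the exact node
class Node:
    form: str       # token as produced by the morphological level
    function: str   # analytical function, e.g. "Pred", "Sb", "Adv"
    position: int   # surface order of the token in the sentence
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Attach child as a dependent of this (governing) node."""
        child.parent = self
        self.children.append(child)
        return child

    def is_leaf(self) -> bool:
        return not self.children


def subtree(node: Node) -> Iterator[Node]:
    """Yield node and all of its descendants."""
    yield node
    for child in node.children:
        yield from subtree(child)


def surface(root: Node) -> List[str]:
    """Token forms of the tree in surface order."""
    return [n.form for n in sorted(subtree(root), key=lambda n: n.position)]


def chop_leaf(leaf: Node) -> None:
    """Reduction step: remove one terminal node from its governor."""
    if not leaf.is_leaf() or leaf.parent is None:
        raise ValueError("only non-root leaves can be chopped off")
    leaf.parent.children.remove(leaf)
    leaf.parent = None


def build_example() -> Node:
    # Assumed attachments: subject and adverbial depend on the predicate,
    # the pronoun depends on the noun it modifies.
    root = Node("dAma", "Pred", 1)
    subject = root.add(Node("iqtirAHu", "Sb", 2))
    subject.add(Node("-hu", "Atr", 3))
    root.add(Node("sAEatayni", "Adv", 4))
    return root
```

Chopping the `Adv` leaf from the example leaves the predicate, its subject and the attribute in place. Whether the remaining token sequence is still grammatical is the linguistic judgment the reduction test relies on; the code only performs the structural step.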
Some Syntax Issues of Arabic

Non-verbal predication of several types
Subordinate non-verbal clauses / modification
Verb-like behavior of many nominal forms
Mostly VSO in verbal sentences, but…
  vice-versa in non-verbal clauses
  different, depending on context boundness
Compound verbs, fixed composite prepositions
Grammatical co-reference, accusative of inner object, complex referencing, etc.
Problem I: Predication

Head node of tree: PREDICATE
  Why? Steady role in sentence, cannot be omitted
Verbal predicate: I-go to school
Non-verbal predicate
  Nominal: The-house a-big (= the house is big)
  Existential: There a-city (= there is a city)
  Prepositional
    Possessive: For him a-house (= he has a house)
    Adverbial: The-mosque in the-city (= … is …)
  Conjunctional: The-problem that (= … is that)
Predication Types in Trees

Verbal
  dAma [Pred] lasted
  iqtirAHu [Sb] proposal
  -hu [Atr] his
  al-EamalIyata [Obj] the-operation [acc.]
  EalA [AuxP] on
  zumalA’i [Obj] colleagues
  -hi [Atr] his
  sAEatayni [Adv] two-hours [acc.]
  Verb-like behavior (object of noun?)
Nominal
  al-baytu [Sb] the-house [nom.]
  kabIrun [Pnom] a-big [nom.]
Existential
  vam~ata [PredE] there-is
  madInatun [Sb] a-city [nom.]
Prepositional (possessive)
  la- [PredP] for
  -hu [Obj] him
  baytun [Sb] a-house [nom.]
Prepositional (adverbial, locative)
  fI [PredP] in
  al-madInati [Adv] the-city [gen.]
  al-jAmiEu [Sb] the-mosque [nom.]
Problem II: Clauses & Co-reference

Recursiveness: subordinate clause is contained as subtree in place of simple element
  Head-node of clause gets the same function
  Problem: non-verbal structures, clauses or not?
  Compound verbs (mA zAla etc.) treated equally
Grammatical co-reference: personal pronoun formally required by another element
  Pronoun must be marked to be treated as such
  Target of reference is unambiguously identifiable
  Often in subordinate clauses, mostly attributive
  Ex.: He-wrote a-book number its-pages hundred
Clauses & Co-reference in Trees

Nodes
  naHwu [Sb] grammar [nom.]
  jumalan [Sb] sentences [acc.]
  fI [Atr_PredP] in
  kataba [Pred] he-wrote
  SafHatin [Atr] pages [gen.]
  kitAban [Obj] a-book
  mi’atu [Sb] hundred [nom.]
  zAlat [Pred] she-stopped
  tuHis~u [Atv] she-feels
  anna [AuxC] that
  -hA [Atr_Ref] their
  -hA [Obj] her
  wADiHun [Atr_Pnom] clear [nom.]
  tuEjibu [Obj_Pred] they-impress
  al-rajulu [Sb] the-man [nom.]
  zaynabu [Sb] Zaynab
  mA [AuxM] not
  -hi [Adv_Ref] it
Callouts
  Attributive clause, prepositional predicate (adverbial)
  Objective clause, verbal predicate
  Compound verb, formed as main verb and its complement
  Referencing pronoun, as attribute in clause
  Attributive clause, nominal predicate
  Referencing pronoun, as adverbial in clause
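
The compound labels on this slide (Atr_PredP, Obj_Pred, Atr_Pnom, Atr_Ref, Adv_Ref) appear to combine the function a node has in its governing structure with either a predicate type, for the head of a clause, or a co-reference mark, for a referencing pronoun. A minimal Python sketch of that reading follows. The convention is inferred from the callouts above rather than taken from an annotation manual, and `Label`, `PREDICATE_TYPES` and `parse_label` are hypothetical names.

```python
from typing import NamedTuple, Optional

# Predicate labels that occur in this talk (assumed to be the possible suffixes).
PREDICATE_TYPES = {"Pred", "PredE", "PredP", "Pnom"}


class Label(NamedTuple):
    function: str                  # role within the governing structure
    predicate_type: Optional[str]  # set when the node heads a clause
    is_reference: bool             # True for a referencing pronoun


def parse_label(label: str) -> Label:
    """Split a label such as 'Atr_PredP' or 'Adv_Ref' into its parts."""
    base, _, suffix = label.partition("_")
    if not suffix:
        return Label(base, None, False)
    if suffix == "Ref":
        return Label(base, None, True)
    if suffix in PREDICATE_TYPES:
        return Label(base, suffix, False)
    raise ValueError(f"unrecognized label suffix: {suffix!r}")
```

Under this reading, `parse_label("Atr_PredP")` describes an attributive clause whose head is a prepositional predicate, and `parse_label("Adv_Ref")` describes a referencing pronoun that functions as an adverbial inside its clause.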
Future Prospects

Implementation of Functional Morphology
Tectogrammatical annotation
Lexicons of valency frames
Re-training the feature-based tagger on MorphoTrees
Machine-learning on the treebank data for various purposes
Thank you
Questions welcome!
http://ckl.mff.cuni.cz/padt/