Upload
others
View
5
Download
0
Embed Size (px)
Citation preview
. . . . . .
.
.. ..
.
.
Parsing Sanskrit Texts: Some relation specificissues
Amba Kulkarni1 and K V Ramakrishnamacharyulu2
Department of Sanskrit Studies,University of Hyderabad,
Hyderabad
Department of Vyakarana,Rashtriya Sanskrit Vidyapeetha,
Tirupati
1 / 39
. . . . . .
.. Sentence Level Analysis
Parsing: a linear string of words – > a structure showing therelations between words.
Structure: Constituency / Dependency
2 / 39
. . . . . .
.. Constituency Structure: Eng Sentence
For positional languages such as English, constituency structuremore or less leads to the dependency relations.
Figure: Constituency Parse of an English Sentence
3 / 39
. . . . . .
.. Constituency Structure: Sanskrit CompoundIn case of Sanskrit compounds, such a constitutional structure isvery useful in determining the parse.
Figure: Constituency Structure of a Sanskrit Compound
4 / 39
. . . . . .
.. Structure of a Sanskrit Sentence
Sanskrit:Morphologically richFree word order to a large extent
Dependency structure makes more sense than the constituencystructure.
5 / 39
. . . . . .
.. Dependency Structure: Sanskrit Sentence
Dependency Parse: rājā viprāya gām dadāti
6 / 39
. . . . . .
.. Sentence Level Analysis
Traditional Approaches:Daṇdānvyayaḥ [Anvyayamukhī]Khaṇdānvyayaḥ [Kathambhūtinī]
7 / 39
. . . . . .
.. Sentence Level Analysis ...
Modern TimesDependency Parsing: credited to Tesniére [1959]
8 / 39
. . . . . .
.. Tagset
Dependency Relations:Compilation of all relations: 90 [KVRK, 2009]Proposed a tagset
9 / 39
. . . . . .
.. Tagset ...
Is this tagset suitable?
For Manual AnnotationFor Statistical ParsingFor Rule Based Parsing
10 / 39
. . . . . .
.. Tagset ...
Is this tagset suitable?For Manual AnnotationFor Statistical ParsingFor Rule Based Parsing
10 / 39
. . . . . .
.. Tagset ...
Concerns related to Manual Annotation / design of a StatisticalParser
The inter annotator agreementThe grey / fussy boundaries between semantics of tags lead toerrors in annotation.
Concerns related to a design of a Rule based ParserWhether the relations can be decided purely on the basis ofmorphological and syntactic information?
11 / 39
. . . . . .
.. Annotation Convention
1 rAmaH kartA,32 vanam karma,33 gacchati
12 / 39
. . . . . .
.. Clues for extracting the relations
Abhihitatva (property of being expressed)VibhaktiIndeclinables (avyaya)Sāmānādhikaraṇya (being in g-n-p agreement)Nitya sambandhaḥ
13 / 39
. . . . . .
.. Clues: Abhitatva
tiṅrāmaḥ vanam gacchati.marks the kartā relationrāmeṇa vanam gamyate.marks the karma relation
kṛtdhāvan aśvaḥ.marks the kartā.
taddhitaḥsamāsaḥ
14 / 39
. . . . . .
.. Clues: Vibhakti
kāraka vibhaktiMarks noun-verb relations.e.g. kartā, karma, etc.upapada vibhaktiMarks noun-noun relations through the upapadas.More of a morphological requirement.E.g. rāmeṇa saha sītā vanam gacchati.special vibhakti
kriyā-viśeṣaṇamvegena dhāvatiaṅgavikāraḥakṣṇā kāṇaḥnirdhāraṇanareṣu śreṣṭaḥ
15 / 39
. . . . . .
.. Clues: Indeclinables
negationrāmaḥ gṛham na gacchati.emphasisrāmaḥ eva tatra upaviṣati.
16 / 39
. . . . . .
.. Clues: sāmānādhikaraṇya
śvetaḥ aśvaḥ dhāvati.aśvaḥ śvetaḥ asti.
17 / 39
. . . . . .
.. Clues: Nitya Sambandhaḥ
Relation between yat-tat pairyadā - tadāyatra - tatra
18 / 39
. . . . . .
.. Choice of the relations
Should the inflectional suffixes and derivational suffixes betreated at par?How to treat the function words? Should they be treated as anode in a tree or an edge?How to represent the inter sentential relations?Should anaphoric resolution be part of this annotation?
19 / 39
. . . . . .
.. Basic Principles
...1 Preserve one-one mapping between the nodes of a tree andthe words in a sentence.
...2 In case of derived nouns, consider only the inflectional suffixfor establishing the relations.
...3 In case of derived indeclinables, use the derived suffix to markthe relations.
...4 A suffix or a word can represent one and only one relation.
20 / 39
. . . . . .
.. Which relation to mark?
Example: dhāvantam aśvam paśya.
21 / 39
. . . . . .
.. Which relation to mark? Contd...
Example: dhāvantam aśvam paśya.
A loop destroys the nice tree structure of a parse. Hence mark onlyrelations indicated through the inflectional suffixes and not theones indicated through the derivational suffixes.
22 / 39
. . . . . .
.. Which relation? .. contd
What about the faithfulness?
Information is available through the kṛt suffix, which is availablefor marking the relation of kartā at a later stage.
23 / 39
. . . . . .
.. Treating indeclinables:
(1) kṛdanta avyayas:rāmaḥ dugdham pītvā śālām gacchati.
24 / 39
. . . . . .
.. Treating indeclinables:
(1) kṛdanta avyayas:rāmaḥ dugdham pītvā śālām gacchati.
25 / 39
. . . . . .
.. Treating indeclinables:Content / function ?
(2) upapada avyayassītā rāmeṇa saha vanam gacchati.
26 / 39
. . . . . .
.. Treating indeclinables:Content / function ?
(3) Rest of the avyayaseva – avadhāraṇāna – niṣedhya
Number of relations explode.
27 / 39
. . . . . .
.. Treating indeclinables:Content / function ?
(3) Rest of the avyayaseva – avadhāraṇāna – niṣedhya
Number of relations explode.
28 / 39
. . . . . .
.. Inter Sentential Connectivesyadi tvam icchasi tarhi aham bhavataḥ gṛham āgacchāmi.
29 / 39
. . . . . .
.. Treatment of Anaphoras
yatra nāryaḥ pūjyante ramante tatra devatāḥ
30 / 39
. . . . . .
31 / 39
. . . . . .
.. Treatment of Conjunction and Disjunctionrāmaḥ sītā lakṣmaṇaḥ ca vanam gacchanti
32 / 39
. . . . . .
.. GranularityCriterion for Granularity:If one can tell one relation from the other purely on the basis ofsyntax or morphology, then the two relations may be treated asdistinctTradition clasifies kartā into the following subcategories.
anubhavī kartāEx: ghaṭo naśyatiamūrtaḥ kartāEx: krodhaḥ āgacchatiprayojaka kartāEx: devadattaḥ viṣṇumitreṇa pācayati.prayojya kartāEx: devadattaḥ viṣṇumitreṇa pācayati.madhyastha kartāEx: devadattaḥ yajñadattena viṣṇumitreṇa pācayati.abhipreraka / utpreraka kartāEx: modakaḥ rocate.
33 / 39
. . . . . .
.. Granularity ...
karma-kartṛEx: kāṣṭhaḥ svayameva bhidyate.karaṇa-kartṛEx: asiḥ chinatti.ṣaṣṭhī kartāEx: ācāryasya anuśāsanam
34 / 39
. . . . . .
Necessary conditions forprayojaka kartāprayojya kartāṣaṣṭhī kartā
areMorphologySyntax
devadattena annam pācayati. (prayojya kartā)devadattaḥ agninā annam pācayati. (karaṇa)
35 / 39
. . . . . .
asiḥ chinattiHere asiḥ is karaṇakartṛ of chinn only because it is a karaṇa forchinn.
Semantics is involved in the decision.
31 relations: Necessary information for deciding the possibilityinvolves only morphology and Syntactic information.
36 / 39
. . . . . .
.. Set of Relations
kartā prayojyakartāprayojakakartā karmakaraṇam sampradānamapādānam ṣaṣṭhīsambandhaḥadhikaraṇam sambodhanasūcakamsambodhyaḥhetuḥ prayojanamtādarthya niṣedhyaḥkriyāviśeṣaṇam viśeṣaṇamnirdhāraṇam upapadasambandhaḥpratiyogī anuyogīsamuccitam anyataraḥkartṛsamānādhikaraṇam karmasamānādhikaraṇamsamānakālaḥ anantarakālaḥpūrvakālaḥ vīpsāśeṣasambandhaḥ sambandhaḥ
37 / 39
. . . . . .
.. Towards a more useful parse
Treat upapadas as function words rather than content wordsShow the co-indexing for anaphora resolutionShow the sharing of kārakas
38 / 39
. . . . . .
DEMO
39 / 39