The Descriptive Grammar as a (Meta)Database

Jeff Good
University of Pittsburgh and Max Planck Institute for Evolutionary Anthropology

Structure of Presentation

•Discuss major features found in four descriptive grammars surveyed for this paper

•Discuss special features particular to each of the grammars

•Propose a conceptual model for the structure of the information found in descriptive grammars

•Propose a basic XML representation of that model

The Grammars Surveyed

•A “best practice” grammar

Haspelmath’s (1993) Lezgian Grammar (Northeast Caucasian/Nakh-Daghestanian)

•A subcommunity grammar

Maganga and Schadeberg’s (1992) Kinyamwezi Grammar (Bantu)

•A Lingua Questionnaire grammar

Huttar and Huttar’s (1994) Ndyuka Grammar (Atlantic creole)

•A “legacy” grammar

Williamson’s (1965) Ijaw Grammar (Niger-Congo)

Four common features

•Nested, labeled sections

•Descriptive prose

•Exemplars

•Reference to multiple ontologies

•A fifth feature: “Structured description”

•See Penton, Bow, Bird, and Hughes

•“Towards a General Model for Linguistic Paradigms”

Sections

•Like most relatively long documents, grammars are divided into marked sections

•Sections can be nested inside other sections

•The content of sections tends to be partially standardized (e.g., most grammars will have sections on consonants, vowels, basic sentence structure, etc.)

•Sections typically are associated with a label and often with a title

•Sectioning can also be sensitive to ontologies
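A minimal sketch of nested, labeled sections, assuming a simple tree in which each section has a title and a list of subsections. The class `Section` and the function `numbered` are hypothetical names introduced here for illustration. The titles come from the Lezgian outline shown on the following slides, and the nesting is one reading of that outline's indentation; the numeric labels are generated by the sketch and are not the labels used in the grammar itself.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    """A grammar section with a title and any number of nested subsections."""
    title: str
    children: List["Section"] = field(default_factory=list)


def numbered(section: Section, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (label, title) pairs, deriving labels like '3.1' from nesting."""
    for index, child in enumerate(section.children, start=1):
        label = f"{prefix}.{index}" if prefix else str(index)
        yield label, child.title
        yield from numbered(child, label)


# Assumed nesting; labels are generated, not copied from the grammar.
verbal_inflection = Section("Verbal inflection", [
    Section("Introduction"),
    Section("The three stems of strong verbs"),
    Section("Verbal inflectional categories", [
        Section("Forms derived from the Masdar stem"),
        Section("Forms derived from the Imperfective stem"),
        Section("Forms derived from the Aorist stem"),
    ]),
])

outline = list(numbered(verbal_inflection))
for label, title in outline:
    print(label, title)
```

Deriving labels from position keeps a label and its place in the hierarchy from drifting apart, which matters once other parts of a grammar refer to sections by label.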

Ndyuka Grammar Sections

Lezgian Grammar Sections

Verbal inflection

Introduction

The three stems of strong verbs

Verbal inflectional categories

       Forms derived from the Masdar stem

       Forms derived from the Imperfective stem

       Forms derived from the Aorist stem

       Secondary verbal categories

       Prefixal negation and the Periphrasis forms

Illustrative verbal paradigms

Irregular verbs

       The copulas

       Verbs lacking a Masdar and Aorist stem

       Secondary verbal categories

       Verbs with root in ä(g)-

Functions of basic tense-aspect categories

       Imperfective

       Future

       Aorist

       Perfect

       Continuative Imperfective and Continuative Perfect

       Past

Functions of non-indicative finite verb forms

       Imperative

       Prohibitive

       Hortative

       Optative

       Conditional

       Interrogative...

Morphological criteria / Semantic criteria

Ontologies

•Descriptive grammars make extensive use of multiple ontologies (i.e., structured sets of categories)

•Three kinds of ontologies

•General (assumed to be understood by the entire linguistics community)

•Subcommunity (assumed to be understood by a well-defined subcommunity of linguists)

•Local (only taken to be meaningful in the context of the particular language being described)

General and Local Ontologies in Ijaw Grammar

Arbitrary labels: Local Ontology

Terms from a General Ontology

Subcommunity Ontology in Kinyamwezi Grammar

Numbering scheme consistent throughout Bantu

Ontologies

•In the description, the use of different ontologies often overlaps

•In Ijaw, the labels I, II, III, IV, and V are drawn from a local ontology—but these labels designate tone classes, a concept drawn from a general ontology

•Furthermore, the terms from local and subcommunity ontologies are often explicitly defined using terms from general ontologies (e.g., a Bantu extension is defined as a type of suffix)
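The layering of ontologies can be sketched as a small data structure, assuming each term records its level and, optionally, the term it is defined as a kind of. The names `Level`, `Term`, `build_index`, and `general_anchor` are hypothetical; the four sample terms follow the two cases mentioned above (the Ijaw tone-class labels and the Bantu extension), with "II" standing in for any of the labels I to V.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Level(Enum):
    GENERAL = "general"
    SUBCOMMUNITY = "subcommunity"
    LOCAL = "local"


@dataclass(frozen=True)
class Term:
    name: str
    level: Level
    defined_as: Optional[str] = None  # name of the term this one is a kind of


def build_index(terms: List[Term]) -> Dict[str, Term]:
    """Index terms by name for lookup."""
    return {t.name: t for t in terms}


def general_anchor(name: str, index: Dict[str, Term]) -> Optional[Term]:
    """Follow defined_as links until a general-ontology term is reached."""
    seen = set()
    current = index.get(name)
    while current is not None and current.name not in seen:
        if current.level is Level.GENERAL:
            return current
        seen.add(current.name)  # guards against circular definitions
        current = index.get(current.defined_as) if current.defined_as else None
    return None


TERMS = build_index([
    Term("suffix", Level.GENERAL),
    Term("tone class", Level.GENERAL),
    Term("extension", Level.SUBCOMMUNITY, defined_as="suffix"),  # Bantu
    Term("II", Level.LOCAL, defined_as="tone class"),            # Ijaw label
])

print(general_anchor("extension", TERMS).name)
print(general_anchor("II", TERMS).name)
```

A term with no path to a general term returns `None`, which would flag a local or subcommunity label that the description never grounds in generally understood vocabulary.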

Descriptive Prose

•Descriptive prose forms the heart of the traditional grammar

•In addition to free-form prose, it can contain

•References to lexical items

•References to other sections

•References to exemplar data

•References to terms drawn from ontologies

Descriptive Prose

•The various references can have a standardized format

•References to lexical items, for example, can have an orthographic_form ‘gloss’ format

•References to other sections make use of a (typically numeric) label like “1.2.1”

•References to exemplar data also make use of a label of some kind like “(3a)”

•General practice appears to be that references to ontologies are implicit
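Because these reference formats are fairly regular, they can in principle be picked out of running prose mechanically. The following is a rough sketch using regular expressions, assuming exactly the three formats listed above; the pattern names, the function `find_references`, and the sample sentence (including the placeholder form "abc") are invented for illustration. The patterns are deliberately naive: a decimal number would match the section pattern, and a parenthesized year such as "(1993)" would match the exemplar pattern.

```python
import re

# Numeric section labels with at least one dot, e.g. "1.2.1".
SECTION_REF = re.compile(r"\b\d+(?:\.\d+)+\b")
# Exemplar labels: a number, optionally followed by a letter, e.g. "(3a)".
EXEMPLAR_REF = re.compile(r"\(\d+[a-z]?\)")
# Lexical items in the orthographic_form ‘gloss’ format.
LEXICAL_REF = re.compile(r"(\w+) ‘([^’]+)’")


def find_references(prose: str) -> dict:
    """Collect the three kinds of explicit references found in prose."""
    return {
        "sections": SECTION_REF.findall(prose),
        "exemplars": EXEMPLAR_REF.findall(prose),
        "lexical_items": LEXICAL_REF.findall(prose),
    }


# Invented sentence; "abc" is a placeholder, not a real lexical item.
sample = "The form abc ‘placeholder gloss’ (see 1.2.1) is illustrated in (3a)."
refs = find_references(sample)
print(refs)
```

References to ontologies have no counterpart here, since they are usually implicit and leave no surface format to match.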

Descriptive Prose in Lezgian Grammar

Reference to lexical item

Reference to exemplar data

Implicit grouping of prose and exemplar data

Exemplar Data

•I use the term exemplar for data specifically selected as an example of some phenomenon in a descriptive grammar

•There appear to be two major classes of exemplars

•Lexical exemplars (often arranged in a paradigm)

•Textual exemplars (typically in the form of interlinear text)

Exemplar Data

•Some features of exemplar data

•It is typically associated with a label (most commonly a number, optionally followed by a letter)

•Exemplars can be grouped together

•Data may deviate from standard presentation format for illustrative purposes

Exemplar Data from Lezgian Grammar

Lexical Exemplars (part of an exemplar group)

Textual Exemplars

Syntactic bracketing

References to external set of texts

Comparison forms

Structured Description

•I use the term structured description to refer to description, typically in tabular format, covering a particular, coherent domain of a language’s grammar

•Structured description, as understood here, is broader than the notion of a paradigm, as discussed in the paper by Penton, Bow, Bird, and Hughes at this conference

•However, there is a large degree of overlap between the two

Schematic Structured Description from Kinyamwezi Grammar

Schematic tone patterns

Particular features: Lezgian

•A subject index with conventions for explicitly indicating the lack of a grammatical phenomenon

•An example index indicating what examples, anywhere in the grammar, exemplify a given phenomenon

•A typographic distinction between language-particular morphological categories and universal and semantic categories (and, by extension, terms drawn from local and general ontologies)

Subject Index from Lezgian Grammar

“Negative” Indexation

Example Index from Lezgian Grammar

Particular features: Lezgian

• Terms referring to language-specific categories are capitalized

• Ergative case

• Involuntary Agent construction

• “Universal” and semantic categories are not capitalized

• complement clause

• adverbial modifier

• The choice of labels for language-specific categories often implies a default mapping to a universal category
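The capitalization convention is simple enough to state as a one-line rule. The sketch below assumes the convention is applied consistently and looks only at the first character of a term; the function name `category_scope` is hypothetical, and the four terms are the ones listed above. A term capitalized only because it begins a sentence would be misclassified.

```python
def category_scope(term: str) -> str:
    """Classify a term by the capitalization convention described above."""
    first = term.strip()[:1]
    return "language-specific" if first.isupper() else "universal"


for term in ["Ergative case", "Involuntary Agent construction",
             "complement clause", "adverbial modifier"]:
    print(term, "->", category_scope(term))
```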

Particular features

•Ndyuka: Based on Comrie and Smith’s 1977 Lingua “Questionnaire”

•Kinyamwezi: Extensive use of a subcommunity ontology

•Ijaw: Extensive use of “legacy” formalisms

Excerpt from Lingua “Questionnaire”, Basis of Ndyuka Grammar

Legacy Formalism in Ijaw Grammar

Towards a Model

• A descriptive grammar can be understood as a series of annotations over a lexicon and set of texts for a given language—that is, as a type of metadatabase over more “primary” linguistic data
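One way to picture the metadatabase idea is as annotations that hold prose plus pointers into separately stored primary data. The sketch below is an illustration under that assumption, not the model proposed in the slides; `Annotation`, `Grammar`, `dangling_refs`, and all identifiers and entries in the usage example are placeholders invented here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Annotation:
    """Descriptive prose plus references into the primary data."""
    id: str
    prose: str
    lexicon_refs: List[str] = field(default_factory=list)
    text_refs: List[str] = field(default_factory=list)


@dataclass
class Grammar:
    lexicon: Dict[str, str]  # lexical id -> entry (primary data)
    texts: Dict[str, str]    # text id -> text (primary data)
    annotations: List[Annotation] = field(default_factory=list)

    def dangling_refs(self) -> List[str]:
        """Return references that point at no lexicon entry or text."""
        missing: List[str] = []
        for a in self.annotations:
            missing += [f"{a.id}:{r}" for r in a.lexicon_refs
                        if r not in self.lexicon]
            missing += [f"{a.id}:{r}" for r in a.text_refs
                        if r not in self.texts]
        return missing


# Placeholder data only; none of these ids or entries come from a grammar.
grammar = Grammar(
    lexicon={"lex_1": "placeholder entry"},
    texts={"text_1": "placeholder text"},
    annotations=[
        Annotation("annotation_1", "Placeholder prose.",
                   lexicon_refs=["lex_1"], text_refs=["text_1", "text_9"]),
    ],
)
dangling = grammar.dangling_refs()
print(dangling)
```

Keeping the lexicon and texts outside the annotations means the same primary data can be cited by many annotations, and a check like `dangling_refs` can confirm that every citation still resolves.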

Towards a Model

•The structure of an annotation

Towards a Model

•Relationships among annotations

Towards an XML representation

•Grammar

•Ontologies

•Annotations

•Descriptive Prose

•Exemplar Set

•Lexicon

•Texts

•Various positions: Ontology References

<grammar>

    <ontology id="GOLD" level="general">
        An internal general ontology, or a reference to an external general ontology would be placed in this element.
    </ontology>

    <annotation title="Sample annotation" id="annotation_1">

        <ontRef ontologyName="GOLD" ref="some_GOLD_id">
            An annotation can be associated with a reference to an ontology.
        </ontRef>

        <descProse>
            Descriptive prose for an annotation would be placed here. In addition, there could be inline references to a lexical item via an element like the following <lexRef ref="some_lexicon_id"/>. There can also be an exemplar set using the markup immediately below. The descriptive prose could also draw a term from an ontology by using an ontology reference as follows <ontRef ontologyName="GOLD" ref="some_other_GOLD_id"/>.
        </descProse>

...

Towards an XML representation

...

        <exSet id="exemplar_set_1">

            <textEx id="some_text_id">

                <ontRef ontologyName="MyLang" ref="some_localOnt_id">
                    Ontology references can also be directly related to exemplars.
                </ontRef>

            </textEx>

        </exSet>

    </annotation>

    <lexicon>
        An internal lexicon, or reference to an external lexicon.
    </lexicon>

    <texts>
        An internal set of texts, or reference to an external set of texts.
    </texts>

</grammar>
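To show how such a representation might be consumed, here is a minimal sketch that parses a condensed copy of the sample with Python's standard `xml.etree.ElementTree` module and lists the references each annotation makes. The element and attribute names are those of the sample above; the explanatory text content has been dropped for brevity, and the function `summarize` and its output keys are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Condensed copy of the sample markup, without its explanatory text.
SAMPLE = """
<grammar>
  <ontology id="GOLD" level="general"/>
  <annotation title="Sample annotation" id="annotation_1">
    <ontRef ontologyName="GOLD" ref="some_GOLD_id"/>
    <descProse>Prose mentioning <lexRef ref="some_lexicon_id"/> and
      <ontRef ontologyName="GOLD" ref="some_other_GOLD_id"/>.</descProse>
    <exSet id="exemplar_set_1">
      <textEx id="some_text_id">
        <ontRef ontologyName="MyLang" ref="some_localOnt_id"/>
      </textEx>
    </exSet>
  </annotation>
  <lexicon/>
  <texts/>
</grammar>
"""


def summarize(xml_text: str) -> list:
    """List, per annotation, its lexical and ontology references."""
    root = ET.fromstring(xml_text.strip())
    declared = {o.get("id") for o in root.findall("ontology")}
    summary = []
    for ann in root.findall("annotation"):
        # iter() walks nested elements too, in document order.
        ont_refs = [(r.get("ontologyName"), r.get("ref"))
                    for r in ann.iter("ontRef")]
        summary.append({
            "id": ann.get("id"),
            "lex_refs": [r.get("ref") for r in ann.iter("lexRef")],
            "ont_refs": ont_refs,
            "undeclared": sorted({name for name, _ in ont_refs} - declared),
        })
    return summary


result = summarize(SAMPLE)
print(result)
```

In this condensed copy the ontology named "MyLang" is reported as undeclared because only "GOLD" has an `<ontology>` element; the original sample elides material with "...", so a declaration for the local ontology may simply sit in the part not shown.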

Towards an XML representation

Future Research

•Representations for structured description other than paradigms

•Representations for special annotations on exemplar data

•Incorporating machine-readable formal representations of phenomena into annotations

•Development of methods to transform XML representation into human-readable forms