The Descriptive Grammar as a (Meta)Database
Jeff Good
University of Pittsburgh and
Max Planck Institute for Evolutionary Anthropology
Structure of Presentation
•Discuss major features found in four descriptive grammars surveyed for this paper
•Discuss special features particular to each of the grammars
•Propose a conceptual model for the structure of the information found in descriptive grammars
•Propose a basic XML representation of that model
The Grammars Surveyed
•A “best practice” grammar
Haspelmath’s (1993) Lezgian Grammar (Northeast Caucasian/Nakh-Daghestanian)
•A subcommunity grammar
Maganga and Schadeberg’s (1992) Kinyamwezi Grammar (Bantu)
•A Lingua Questionnaire grammar
Huttar and Huttar’s (1994) Ndyuka Grammar (Atlantic creole)
•A “legacy” grammar
Williamson’s (1965) Ijaw Grammar (Niger-Congo)
Four common features
•Nested, labeled sections
•Descriptive prose
•Exemplars
•Reference to multiple ontologies
•A fifth feature: “Structured description”
•See Penton, Bow, Bird, and Hughes
•“Towards a General Model for Linguistic Paradigms”
Sections
•Like most relatively long documents, grammars are divided into marked sections
•Sections can be nested inside other sections
•The content of sections tends to be partially standardized (e.g., most grammars will have sections on consonants, vowels, basic sentence structure, etc.)
•Sections typically are associated with a label and often with a title
•Sectioning can also be sensitive to ontologies
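The nesting and labeling of sections described above can be sketched as a small tree structure. This is only an illustration: the `Section` class, the `labeled` helper, and the top-level titles "Phonology" and "Syntax" are assumptions made for the example, not taken from any of the grammars surveyed.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    """A titled section that may contain nested subsections."""
    title: str
    children: List["Section"] = field(default_factory=list)


def labeled(section: Section, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (label, title) pairs, numbering subsections like "1.2.1"."""
    for i, child in enumerate(section.children, start=1):
        label = f"{prefix}{i}"
        yield label, child.title
        # Recurse so that nested sections extend the parent's label.
        yield from labeled(child, prefix=label + ".")


# Hypothetical outline using the partially standardized topics mentioned above.
grammar = Section("Grammar", [
    Section("Phonology", [Section("Consonants"), Section("Vowels")]),
    Section("Syntax", [Section("Basic sentence structure")]),
])

for label, title in labeled(grammar):
    print(label, title)
```

Deriving labels from position, rather than storing them, is one possible design choice; a grammar with fixed legacy numbering would probably need to store each label explicitly.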
Lezgian Grammar Sections
Verbal inflection
Introduction
The three stems of strong verbs
Verbal inflectional categories
Forms derived from the Masdar stem
Forms derived from the Imperfective stem
Forms derived from the Aorist stem
Secondary verbal categories
Prefixal negation and the Periphrasis forms
Illustrative verbal paradigms
Irregular verbs
The copulas
Verbs lacking a Masdar and Aorist stem
Secondary verbal categories
Verbs with root in ä(g)-
Functions of basic tense-aspect categories
Imperfective
Future
Aorist
Perfect
Continuative Imperfective and Continuative Perfect
Past
Functions of non-indicative finite verb forms
Imperative
Prohibitive
Hortative
Optative
Conditional
Interrogative...
Morphological criteria
Semantic criteria
Ontologies
•Descriptive grammars make extensive use of multiple ontologies (i.e., structured sets of categories)
•Three kinds of ontologies
•General (assumed to be understood by the entire linguistics community)
•Subcommunity (assumed to be understood by a well-defined subcommunity of linguists)
•Local (only taken to be meaningful in the context of the particular language being described)
General and Local Ontologies in Ijaw Grammar
Arbitrary labels: Local Ontology
Terms from a General Ontology
Ontologies
•In the description, the use of different ontologies often overlaps
•In Ijaw, the labels I, II, III, IV, and V are drawn from a local ontology—but these labels designate tone classes, a concept drawn from a general ontology
•Furthermore, the terms from local and subcommunity ontologies are often explicitly defined using terms from general ontologies (e.g., a Bantu extension is defined as a type of suffix)
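One way to picture how local and subcommunity terms are anchored in a general ontology is a lookup that follows each term's definition until it reaches a general term. The sketch below is hypothetical: the `Term` class, the `defined_as` field, and the `resolve_to_general` function are names invented for illustration, and the four terms are just the examples mentioned on these slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Term:
    name: str
    level: str                        # "general", "subcommunity", or "local"
    defined_as: Optional[str] = None  # name of the term used to define this one


# Illustrative entries only, based on the examples above.
TERMS: Dict[str, Term] = {t.name: t for t in [
    Term("suffix", "general"),
    Term("tone class", "general"),
    Term("extension", "subcommunity", defined_as="suffix"),  # Bantu extension
    Term("I", "local", defined_as="tone class"),             # Ijaw class label
]}


def resolve_to_general(name: str, terms: Dict[str, Term] = TERMS) -> Optional[str]:
    """Follow definitions until a general term is reached, or return None."""
    term = terms[name]
    seen = set()
    while term.level != "general":
        if term.defined_as is None or term.name in seen:
            return None  # no definition given, or a circular definition
        seen.add(term.name)
        term = terms[term.defined_as]
    return term.name


print(resolve_to_general("extension"))
print(resolve_to_general("I"))
```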
Descriptive Prose
•Descriptive prose forms the heart of the traditional grammar
•In addition to free-form prose, it can contain
•References to lexical items
•References to other sections
•References to exemplar data
•References to terms drawn from ontologies
Descriptive Prose
•The various references can have a standardized format
•References to lexical items, for example, can have an orthographic_form ‘gloss’ format
•References to other sections make use of a (typically numeric) label like “1.2.1”
•References to exemplar data also make use of a label of some kind like “(3a)”
•General practice appears to be that references to ontologies are implicit
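Because these reference formats are fairly regular, they can in principle be detected automatically in legacy prose. The following is a rough sketch under stated assumptions: the three regular expressions are guesses at the formats described above, and the sample sentence, including the form "xyz", is a made-up placeholder rather than data from any grammar.

```python
import re

# Section labels such as "1.2.1": digits separated by at least one period.
SECTION_REF = re.compile(r"\b\d+(?:\.\d+)+\b")
# Exemplar labels such as "(3a)" or "(4)": a number, optionally plus a letter.
EXEMPLAR_REF = re.compile(r"\((\d+[a-z]?)\)")
# Lexical items in the format: orthographic_form ‘gloss’.
LEXICAL_REF = re.compile(r"(\w+) ‘([^’]+)’")

# Placeholder prose; "xyz" is not a real lexical item.
prose = ("As shown in (3a) and (4), the form xyz ‘house’ follows "
         "the pattern described in 1.2.1.")

section_refs = SECTION_REF.findall(prose)
exemplar_refs = EXEMPLAR_REF.findall(prose)
lexical_refs = LEXICAL_REF.findall(prose)

print(section_refs)
print(exemplar_refs)
print(lexical_refs)
```

Patterns like these would likely over- or under-match in real grammars (for example, decimal numbers or dates in running text), so they are better treated as a first pass than as a reliable parser. References to ontologies, being implicit, cannot be found this way at all.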
Descriptive Prose in Lezgian Grammar
Reference to lexical item
Reference to exemplar data
Implicit grouping of prose and exemplar data
Exemplar Data
•I use the term exemplar for data specifically selected as an example of some phenomenon in a descriptive grammar
•There appear to be two major classes of exemplars
•Lexical exemplars (often arranged in a paradigm)
•Textual exemplars (typically in the form of interlinear text)
Exemplar Data
•Some features of exemplar data
•It is typically associated with a label (most commonly a number, or a number followed by a letter)
•Exemplars can be grouped together
•Data may deviate from standard presentation format for illustrative purposes
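The labeling and grouping of exemplars can be sketched as a pair of record types with a label index. This is an assumed structure for illustration only: `Exemplar`, `ExemplarGroup`, and `index_by_label` are hypothetical names, and the content strings are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Exemplar:
    label: str    # e.g. "3a"
    kind: str     # "lexical" or "textual"
    content: str  # placeholder for a paradigm cell or interlinear text


@dataclass
class ExemplarGroup:
    label: str    # e.g. "3", shared by the members "3a", "3b", ...
    members: List[Exemplar] = field(default_factory=list)


def index_by_label(groups: Iterable[ExemplarGroup]) -> Dict[str, Exemplar]:
    """Map each exemplar label to its exemplar, rejecting duplicate labels."""
    index: Dict[str, Exemplar] = {}
    for group in groups:
        for exemplar in group.members:
            if exemplar.label in index:
                raise ValueError(f"duplicate exemplar label: {exemplar.label}")
            index[exemplar.label] = exemplar
    return index


group = ExemplarGroup("3", [
    Exemplar("3a", "textual", "<interlinear text placeholder>"),
    Exemplar("3b", "textual", "<interlinear text placeholder>"),
])
index = index_by_label([group])
print(index["3a"].kind)
```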
Exemplar Data from Lezgian Grammar
Lexical Exemplars (part of an exemplar group)
Textual Exemplars
Syntactic bracketing
References to external set of texts
Comparison forms
Structured Description
•I use the term structured description to refer to description, typically in tabular format, covering a particular, coherent domain of a language’s grammar
•Structured description, as understood here, is broader than the notion of a paradigm, as discussed in the Penton, Bow, Bird, and Hughes paper at this conference
•However, there is a large degree of overlap between the two
Particular features: Lezgian
•A subject index with conventions for explicitly indicating the lack of a grammatical phenomenon
•An example index indicating which examples, anywhere in the grammar, exemplify a given phenomenon
•A typographic distinction between language-particular morphological categories and universal and semantic categories (and, by extension, terms drawn from local and general ontologies)
Particular features: Lezgian
• Terms referring to language-specific categories are capitalized
• Ergative case
• Involuntary Agent construction
• “Universal” and semantic categories are not capitalized
• complement clause
• adverbial modifier
• The choice of labels for language-specific categories often implies a default mapping to a universal category
Particular features
•Ndyuka: Based on Comrie and Smith’s 1977 Lingua “Questionnaire”
•Kinyamwezi: Extensive use of a subcommunity ontology
•Ijaw: Extensive use of “legacy” formalisms
Towards a Model
• A descriptive grammar can be understood as a series of annotations over a lexicon and set of texts for a given language—that is, as a type of metadatabase over more “primary” linguistic data
Towards an XML representation
•Grammar
•Ontologies
•Annotations
•Descriptive Prose
•Exemplar Set
•Lexicon
•Texts
•Ontology References (can appear in various positions)
<grammar>
  <ontology id="GOLD" level="general"> An internal general ontology, or a reference to an external general ontology would be placed in this element. </ontology>
  <annotation title="Sample annotation" id="annotation_1">
    <ontRef ontologyName="GOLD" ref="some_GOLD_id"> An annotation can be associated with a reference to an ontology. </ontRef>
    <descProse> Descriptive prose for an annotation would be placed here. In addition, there could be inline references to a lexical item via an element like the following <lexRef ref="some_lexicon_id"/>. There can also be an exemplar set using the markup immediately below. The descriptive prose could also draw a term from an ontology by using an ontology reference as follows <ontRef ontologyName="GOLD" ref="some_other_GOLD_id"/>. </descProse>
    <exSet id="exemplar_set_1">
      <textEx id="some_text_id">
        <ontRef ontologyName="MyLang" ref="some_localOnt_id"> Ontology references can also be directly related to exemplars. </ontRef>
      </textEx>
    </exSet>
  </annotation>
  <lexicon> An internal lexicon, or reference to an external lexicon. </lexicon>
  <texts> An internal set of texts, or reference to an external set of texts. </texts>
</grammar>
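To illustrate how a representation like this could be queried, the sketch below parses an abbreviated copy of the sample with Python's standard `xml.etree.ElementTree` module and collects the ontology and lexicon references. The element and attribute names come from the sample above; the check that each `ontologyName` matches the `id` of an `<ontology>` element is an assumption about how references might be resolved, not part of the proposal itself.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the sample: same elements and attributes, prose shortened.
SAMPLE = """<grammar>
  <ontology id="GOLD" level="general"/>
  <annotation title="Sample annotation" id="annotation_1">
    <ontRef ontologyName="GOLD" ref="some_GOLD_id"/>
    <descProse>Prose with a lexical reference <lexRef ref="some_lexicon_id"/>
      and an ontology term <ontRef ontologyName="GOLD" ref="some_other_GOLD_id"/>.
    </descProse>
    <exSet id="exemplar_set_1">
      <textEx id="some_text_id">
        <ontRef ontologyName="MyLang" ref="some_localOnt_id"/>
      </textEx>
    </exSet>
  </annotation>
  <lexicon/>
  <texts/>
</grammar>"""

root = ET.fromstring(SAMPLE)

# Ontologies declared inside the grammar, keyed by their id attribute.
declared = {ont.get("id") for ont in root.iter("ontology")}

# Every ontology reference, wherever it occurs, in document order.
ont_refs = [(ref.get("ontologyName"), ref.get("ref")) for ref in root.iter("ontRef")]

# Assumed check: names referenced without a matching <ontology> element.
undeclared = sorted({name for name, _ in ont_refs if name not in declared})

# Inline references from descriptive prose into the lexicon.
lex_refs = [ref.get("ref") for ref in root.iter("lexRef")]

print(ont_refs)
print(undeclared)
print(lex_refs)
```

Under that assumed check, the local ontology "MyLang" is reported as undeclared, since the sample contains no `<ontology>` element for it. A fuller version of the representation might declare local and subcommunity ontologies alongside the general one.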
Future Research
•Representations for structured description other than paradigms
•Representations for special annotations on exemplar data
•Incorporating machine-readable formal representations of phenomena into annotations
•Development of methods to transform XML representation into human-readable forms