Manual annotation in a functional-typological grammar study
(A study on the Javanese dialect of Kudus, Indonesia)
Noor Malihah
Location
[Map showing Kudus, Yogyakarta, and Solo]
The grammar of Javanese
• SVO (verbal clauses).
• Javanese NPs lack number marking; plurality is indicated by a numeral.
• No tenses.
• Verbs may be combined with aspectual markers and modals.
Goal
• To manually annotate the spoken and written corpus of JDK (the Javanese dialect of Kudus)
• To use the annotation in a functional-typological grammar study, especially on the passive, the applicative, and the causative.
Why passive, applicative, and causative?
a. Many scholars have discussed the passive, applicative, and causative in the Austronesian languages at length.
b. They are the same kind of phenomenon: valency-changing constructions.
c. In JDK, they have distinctive features compared to standard Javanese.
d. Applicatives and causatives have the same morphological markers in Javanese.
Passive
• It contrasts with another construction, the active;
• The subject of the active corresponds to a non-obligatory oblique phrase in the passive, or is not overtly expressed;
• The subject of the passive, if there is one, corresponds to the direct object of the active;
• The construction is pragmatically restricted relative to the active;
• The construction displays special morphological marking of the verb.
(Siewierska, 2005)
Example of passive in English
a. John bought the book.
b. The book was bought by John.
The applicative
• A sentence where an extra object is added.
• Haspelmath (2001): the applicative is a valency-increasing phenomenon in which a direct object is added to a verb.
• English shows a comparable alternation (Gropen et al. 1989: 204):
a. John gave a gift to Mary.
b. John gave Mary a gift.
Example of an applicative in JDK
a. FS:03:M:A:C: 136
Lha otomatise asu iku mau kan yo nyedak-i bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach-APPL turtle that DEF
b. Non-applicative (manipulated example)
Lha otomatise asu iku mau kan yo nyedak ning bulus iku mau
EMPH automatically dog that DEF EMPH also ACT.approach to turtle that DEF
‘Huh, automatically, that dog also approached that turtle.’
Causative
• Causativization creates a new predicate with an agent causer added.
• Somebody makes someone do something.
• Talmy (2000) and Shibatani (1976) define a causative situation as one that can be analyzed into two sub-events: a causing event and a caused event. The caused event must follow causally from the causing event:
a. The caused event would not occur if the causing event did not occur;
b. The caused event does indeed occur.
Example in English
(1) a. The children danced. b. The teacher made the children dance.
(2) a. The robber died.
b. The policeman killed the robber.
General ideas
• A relatively small data collection
• The data were manually annotated for various grammatical features
• The tags are used to examine the correlation between one code and another
Data collection
• Type of data: elicited narrative, spontaneous speech, written data.
• Period: September 2010 – January 2011 (five months of data collection)
• Place: Kudus regency, Central Java, Indonesia
Manual annotation
• Goal: to produce a corpus for a grammar study. I am not producing the perfect corpus for future generations, but a workable corpus for my own use. The annotated corpus will be used to analyze the JDK grammar. The manual annotation adds linguistically rich information to the JDK data, ranging from morphology through syntax to semantics.
Why manual annotation?
• The data set is small (see Table 1):
a. Recordings from 49 JDK native speakers
b. Written data from six articles from a local newspaper
Table 1. The distribution of informants, clauses, and words across the data sources

Corpus | Number of informants | Number of clauses | Number of words
Narrative (frog story) | 41 | 2,431 | 37,716
Spontaneous speech | 8 | 1,045 | 6,103
Written data | 6 | 586 | 3,547
TOTAL | 55 | 4,062 | 47,366
Preparation
• A Word document is used to transcribe and annotate.
• An Excel sheet is used to record the quantitative results.
Step 1
• Decided on the codes used to annotate, including:
a. Type of clause (active, passive, and ergative-like);
b. Applicatives and causatives;
c. Transitivity of the verb base;
d. Grammatical relations;
e. Semantic features of the nouns;
f. Semantic roles of the nouns;
g. POS;
h. Data sources.
Step 2
• Read through and annotated every single clause.
• Explicitly added information on each clause and word in each text in the corpora.
• These tags were used to look at the correlation between a particular grammatical feature and the others.
Rules:
• A single clause:
- Indicates a single situation, action, or event
- Is a dependency between a predicate and an argument (Ewing, 1998: 14)
• Annotation: each annotation is placed in angle brackets; the position of the tags varies.
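A minimal Python sketch of how angle-bracket tags of this kind could be read back from an annotated clause. It assumes that each tag belongs to the word immediately before it; the function name parse_clause is hypothetical and is not part of the original workflow, which was done by hand in Word.

```python
import re

# A token is either a tag in angle brackets or a plain word.
# Tags may contain a space, e.g. <DEF NP>.
TOKEN = re.compile(r"<([^<>]+)>|([^\s<>]+)")


def parse_clause(text):
    """Return [(word, [tags]), ...]; each tag is attached to the preceding word.

    Assumption: clause-level tags placed at the end of a clause
    end up on the last word.
    """
    words = []
    for match in TOKEN.finditer(text):
        tag, word = match.group(1), match.group(2)
        if word is not None:
            words.append((word, []))
        elif words:
            words[-1][1].append(tag)  # kept verbatim, so stray spaces stay visible
    return words


if __name__ == "__main__":
    for word, tags in parse_clause("terus kui bocah-bocah kui mancing <INT2>"):
        print(word, tags)
```

Tags are kept exactly as typed, so a later check can still spot slips such as an extra space inside the brackets.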
Step 3: Code for data sources
• My transcripts were coded to indicate information about the speakers who produced each clause.
• Each clause is labeled using a uniform format.
• The ID code preceding each clause identifies the type of data; the sex, age, and place of residence of the speaker; and the clause number.
How to use the codes for data sources
• A combination of codes serves as a unique identifier for a particular clause.
• No two clauses have the same string.
• Example: FS:01:F:A:C: 008 refers to data elicited using the frog story method, narrated by informant number one, who is female, an adult, and lives in an urban area; this is clause number eight in the transcript.
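A minimal Python sketch of how such an ID string could be split into its parts. The field names and the function parse_clause_id are hypothetical. The three-field branch is an assumption based on the written-data IDs that appear in the examples (such as WR:07: 042), which carry no speaker fields.

```python
from typing import NamedTuple, Optional


class ClauseID(NamedTuple):
    source: str               # type of data, e.g. "FS" (frog story)
    number: int               # informant number (or text number, by assumption)
    sex: Optional[str]        # e.g. "F"
    age: Optional[str]        # e.g. "A" (adult)
    residence: Optional[str]  # e.g. "C" (urban area)
    clause: int               # clause number in the transcript


def parse_clause_id(code: str) -> ClauseID:
    """Split an ID such as 'FS:01:F:A:C: 008' into named fields."""
    parts = [p.strip() for p in code.split(":")]
    if len(parts) == 6:
        source, number, sex, age, residence, clause = parts
    elif len(parts) == 3:
        # Assumed short form without speaker information.
        source, number, clause = parts
        sex = age = residence = None
    else:
        raise ValueError(f"unexpected ID format: {code!r}")
    return ClauseID(source, int(number), sex, age, residence, int(clause))


if __name__ == "__main__":
    print(parse_clause_id("FS:01:F:A:C: 008"))
```

Because every clause has a different ID string, the parsed tuple can serve as a dictionary key when results are collected per clause.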
Codes applied to verbs

Codes | Information | Position
TR1, TR2, INT1, or INT2 | Active transitive/intransitive verbs. Each TR or INT is identified as 1 (for verbs with the nasal prefix) or 2 (for verbs without the nasal prefix). | Immediately after the verb
PASS1, PASS2, or PASS3 | Passive type 1, 2, or 3. The classification is based on the presence of agent, patient, and preposition in a clause. | Immediately after the verb
UNMARKED | Passive without morphology | Immediately after PASS1, PASS2, or PASS3
ERGL1 or ERGL2 | ERGL1 labels an ergative-like clause where the agent is the first person singular pronoun; ERGL2 labels an ergative-like clause where the agent is the second person pronoun. | Immediately after the verb
APPL1, APPL2, or APPL3 | APPL1 labels a verb with -(a)ke; APPL2 a verb with -na; APPL3 a verb with -i. | Immediately after TR1, TR2, PASS1, PASS2, PASS3, ERGL1, or ERGL2
CAUS1, CAUS2, or CAUS3 | CAUS1 labels a verb with -(a)ke; CAUS2 a verb with -na; CAUS3 a verb with -i. | Immediately after TR1, TR2, PASS1, PASS2, PASS3, ERGL1, or ERGL2
ADV | Adversative passive | Immediately after the verb
ANS | Active clause without subject | Immediately after TR1 or TR2
PNS | Passive clause without subject | Immediately after PASS1, PASS2, or PASS3
Examples
(1) FS:01:M:A:C: 003
terus kui bocah-bocah kui mancing <INT2>
then that child-child that ACT.go.fishing
‘Then those children went fishing.’
(2) WR:07: 042
Suplo ngagetna <TR2> <CAUS1> paklike lan mboklike
Suplo ACT.surprise.CAUS uncle and aunty
‘Suplo caused his uncle and his aunt to be surprised.’
Codes applied to clauses

Codes | Information | Position
NOM1 or NOM2 | NOM1 indicates a non-verbal clause; NOM2 labels an existential clause. | At the end of the clause
IMPER | Imperative clause | At the end of the clause
REL | Relative marker | Immediately after the Javanese relative marker sing or kang

Examples
(1) FS:02:M:A:C: 006
nanging Budi orak kuat <NOM1>
but Budi NEG strong
‘But Budi was not strong.’
(2) FS:03:M:A:C: 025
Loh kok malah ono bulus <NOM2>
Huh EMP actually exist turtle
‘Huh, actually there was a turtle.’
Codes applied to nouns 1: Indicating semantic features of the nouns

Codes | Information | Position
HUM or NONH | Human or non-human noun | Immediately after a noun
ANIM or INA | Animate or inanimate | Immediately after the label HUM or NONH
DEF NP or INDEF NP | Definite or indefinite noun phrase | Immediately after the label ANIM or INA. Only for common nouns. NAME is used instead when the noun is a name; 1, 2, or 3 is used instead when the noun is a first, second, or third person pronoun.
1, 2, or 3 | First, second, or third person pronoun | Immediately after the label ANIM or INA
S or P | Singular or plural | Immediately after DEF NP, INDEF NP, NAME, 1, 2, or 3

Example
(1) SS:02:F:A:C: 305
Setange iki <NONH> <INA> <DEF NP> <S> hurung dibenak-benakke <PASS3> <CAUS2>
Steering this NEG PASS.fix-fix.CAUS
‘This steering has not been fixed.’
Codes indicating semantic roles

Codes | Information | Position
AGT | Agent | After the codes indicating the semantic features of the noun, immediately after S or P
PAT | Patient | After the codes indicating the semantic features of the noun, immediately after S or P
BEN | Benefactive | After the codes indicating the semantic features of the noun, immediately after S or P
REC | Recipient | After the codes indicating the semantic features of the noun, immediately after S or P
LOC | Location | After the codes indicating the semantic features of the noun, immediately after S or P
INST | Instrument | After the codes indicating the semantic features of the noun, immediately after S or P
GOAL | Goal | After the codes indicating the semantic features of the noun, immediately after S or P

Examples
• SS:02:F:A:C: 305
setange iki <INA> <NONH> <DEF NP> <S> <PAT> hurung dibenak-benakke <PASS3> <CAUS2>
‘This steering has not been fixed.’
• FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> marani <TR2> <APPL3> buluse <GOAL> karo njegogi <TR2> <APPL3>
‘Budi’s dog approached the turtle and barked.’
Codes indicating the grammatical relations of the nouns

Codes | Information | Position
SUBJ | Subject of the clause | After the code indicating the semantic role of the noun
OBJ | Object of the clause | After the code indicating the semantic role of the noun
IO | Indirect object of the clause | After the code indicating the semantic role of the noun

Examples
• SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung dibenak-benakke <PASS3> <CAUS2>
‘This steering has not been fixed.’
• FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3>
‘Budi’s dog approached the turtle and barked.’
Codes indicating the lexical and morphosyntactic features of the dialect

Code | Information | Position
JDK | Lexical or morphosyntactic features of JDK. The features need to be coded so that I can demonstrate that the clauses were originally produced by native speakers of JDK. I will only analyze texts containing clauses with JDK features. | Immediately after the feature in the clause

Examples
• SS:02:F:A:C: 305
Setange iki <INA> <NONH> <DEF NP> <S> <PAT> <SUBJ> hurung <JDK> dibenak-benakke <PASS3> <CAUS2>
‘This steering has not been fixed.’
• FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
‘Budi’s dog approached the turtle and barked.’
Results
• An annotated dataset containing the information needed to answer my research questions
• Quantitative results are obtained by counting the co-occurrences of particular features in the dataset.
Continued
• From these tags, I can describe a particular construction in a given clause, for example:
a. The type of clause
b. The transitivity of the verb base
c. The animacy of the subject
d. The animacy of the promoted argument
e. The semantic role of the promoted argument
Example
• FS:02:M:A:C: 010
Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>
a. The clause in FS:02:M:A:C: 010 is an applicative type 3.
b. The agent is the subject and is non-human animate (an animal).
c. The promoted argument, the object, is also non-human animate (an animal), and it is a goal.
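The reading of this example can be sketched in Python as follows. This is only an illustration under two assumptions: a run of adjacent tags belongs to one word, and the run containing SUBJ or OBJ carries the features of that argument. The function name describe is hypothetical.

```python
import re

TAG = re.compile(r"<([^<>]+)>")
TAG_RUN = re.compile(r"(?:<[^<>]+>\s*)+")  # one or more adjacent tags


def describe(clause):
    """Summarize construction type and subject/object features of a clause."""
    runs = [TAG.findall(run) for run in TAG_RUN.findall(clause)]
    flat = [tag for run in runs for tag in run]

    def features(relation):
        # Features of the argument whose tag run contains the relation code.
        for run in runs:
            if relation in run:
                return [tag for tag in run if tag != relation]
        return None

    construction = next(
        (tag for tag in flat if tag.startswith(("APPL", "CAUS"))), None
    )
    return {
        "construction": construction,
        "subject": features("SUBJ"),
        "object": features("OBJ"),
    }


if __name__ == "__main__":
    clause = (
        "Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> "
        "buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>"
    )
    print(describe(clause))
```

For clause FS:02:M:A:C: 010 the summary gives APPL3 as the construction, a non-human animate agent as subject, and a non-human animate goal as object, matching points a to c.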
How to use the results
• Combine one piece of information with another to answer questions about the use of a particular grammatical construction.
• For example: information about the semantic role of a noun phrase can be combined with the applicative tags to show which semantic roles the promoted argument has with applicative type 1.
How to use the tags (1)
• Search for the occurrences of a particular construction, for example the applicative.
• Highlight all entries with an applicative (APPL1, APPL2, APPL3).
• Put the entries for a particular construction in a separate file. For example, when I searched for applicatives, I ended up with four separate files: one each for APPL1, APPL2, and APPL3, and one for all applicatives together.
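The search-and-separate step could be sketched in Python as below. This is an illustration, not the procedure actually used (which was done by hand in Word). The function name group_applicatives and the "ALL" key are hypothetical; the groups are kept in memory rather than written to four files.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<([^<>]+)>")
APPLICATIVES = ("APPL1", "APPL2", "APPL3")


def group_applicatives(clauses):
    """Group clause IDs by applicative tag, plus an 'ALL' group.

    `clauses` is an iterable of (clause_id, annotated_text) pairs.
    """
    groups = defaultdict(list)
    for clause_id, text in clauses:
        tags = set(TAG.findall(text))
        hits = [tag for tag in APPLICATIVES if tag in tags]
        for tag in hits:
            groups[tag].append(clause_id)
        if hits:
            groups["ALL"].append(clause_id)
    return dict(groups)


if __name__ == "__main__":
    data = [
        ("FS:02:M:A:C: 010",
         "Kirike budi <AGT> <SUBJ> marani <TR2> <APPL3> buluse <GOAL> <OBJ>"),
        ("SS:02:F:A:C: 305",
         "Setange iki <PAT> <SUBJ> hurung dibenak-benakke <PASS3> <CAUS2>"),
    ]
    print(group_applicatives(data))
```

Each group could then be written to its own file, which mirrors the four-file layout described above.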
Continued
• At the same time, I used an Excel sheet for several purposes: to list the verbs or other information needed, to record the quantitative results, and to create graphs based on the quantitative results.
[Excel sheet screenshots: list of verbs in APPL1; quantitative results; graph]

Figure. The distribution of subject animacy with the different applicative markers (%)

Marker | Animate subject | Inanimate subject
-na | 64.9 | 35.1
-(a)ke | 80.0 | 20.0
-i | 76.2 | 23.8
All applicatives | 73.8 | 26.2
Baseline | 78.3 | 21.7
How to use the tags (2)
• To examine the transitivity of the verb bases in the applicative constructions, I looked at the tags on the verbs (TR1, TR2, INT1, INT2, ERGL1, or ERGL2).
• To see the animacy of the subject in the applicatives, I used the tags ANIM or INA together with SUBJ.
Continued
• To see the animacy of the promoted argument in the applicatives, I looked at the tags ANIM or INA together with OBJ (the promoted argument).
• To investigate the semantic role of the promoted argument in the applicative, I examined the tags for semantic roles (PAT, BEN, INST, LOC, GOAL, or REC).
Continued
• I also used these tags to count how frequently the grammatical phenomena co-occur.
• For example, to examine which affixes are used to promote each semantic role.
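A minimal Python sketch of such a co-occurrence count, pairing the applicative affix with the semantic role of the promoted argument (the noun tagged OBJ). The function names are hypothetical, and the affix mapping follows the APPL1 to APPL3 definitions in the verb code table. As a simplification, a clause with several different applicative verbs would have every affix paired with every promoted role.

```python
import re
from collections import Counter

TAG = re.compile(r"<([^<>]+)>")
TAG_RUN = re.compile(r"(?:<[^<>]+>\s*)+")  # one or more adjacent tags
AFFIX = {"APPL1": "-(a)ke", "APPL2": "-na", "APPL3": "-i"}
ROLES = {"PAT", "BEN", "INST", "LOC", "GOAL", "REC"}


def affix_by_role(clauses):
    """Count (role of promoted argument, applicative affix) pairs per clause."""
    counts = Counter()
    for text in clauses:
        runs = [TAG.findall(run) for run in TAG_RUN.findall(text)]
        affixes = {AFFIX[tag] for run in runs for tag in run if tag in AFFIX}
        roles = {tag for run in runs if "OBJ" in run for tag in run if tag in ROLES}
        for role in roles:
            for affix in affixes:
                counts[(role, affix)] += 1
    return counts


def percentages(counts):
    """Convert counts to the percentage of each affix within each role."""
    totals = Counter()
    for (role, _affix), n in counts.items():
        totals[role] += n
    return {key: round(100 * n / totals[key[0]], 1) for key, n in counts.items()}


if __name__ == "__main__":
    clause = (
        "Kirike budi <NONH> <ANIM> <DEF NP> <S> <AGT> <SUBJ> marani <TR2> <APPL3> "
        "buluse <NONH> <ANIM> <GOAL> <OBJ> karo njegogi <TR2> <APPL3> <JDK>"
    )
    print(percentages(affix_by_role([clause])))
```

Percentages are computed within each semantic role, so the affix shares for one role add up to 100.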
Example

Figure. The distribution of the affixes used to promote each semantic role (%)

Semantic role | -na | -(a)ke | -i
Benefactive | 62.7 | 37.3 | 0.0
Recipient | 0.0 | 0.0 | 100.0
Location | 0.0 | 0.0 | 100.0
Goal | 18.3 | 6.1 | 75.6
Instrument | 100.0 | 0.0 | 0.0
Patient | 73.5 | 26.5 | 0.0
Challenge 1
• Deciding on appropriate codes for the annotation that are relevant to the main research questions. The annotation should make it possible to search for specific information in the data set.
• For example: whether to adopt INT or INTR for an intransitive verb, or S or SUBJ for the subject of a clause.
Challenge 2
• Consistency
• For example: adopting clear criteria for what counts as an animate or inanimate noun, and for other grammatical terms.
• Is sikile asu ‘the dog’s leg’ an animate or an inanimate noun?
Challenge 3
• High accuracy
• For example:
a. Mistyped tag: <APPL1> → <APLL1>
b. Extra space: <ANIM> → < ANIM>
c. Human mistakes: <HUM> → <NONH>
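Slips of types (a) and (b) could be caught mechanically. The Python sketch below checks every tag against the code list from the tables above; the set KNOWN_TAGS is my reading of those tables and the function name find_bad_tags is hypothetical. A slip of type (c), a valid but wrong code, cannot be detected this way and still needs human checking.

```python
import re

# Code inventory as read from the code tables (an assumption; adjust as needed).
KNOWN_TAGS = {
    "TR1", "TR2", "INT1", "INT2", "PASS1", "PASS2", "PASS3", "UNMARKED",
    "ERGL1", "ERGL2", "APPL1", "APPL2", "APPL3", "CAUS1", "CAUS2", "CAUS3",
    "ADV", "ANS", "PNS", "NOM1", "NOM2", "IMPER", "REL",
    "HUM", "NONH", "ANIM", "INA", "DEF NP", "INDEF NP", "NAME",
    "1", "2", "3", "S", "P",
    "AGT", "PAT", "BEN", "REC", "LOC", "INST", "GOAL",
    "SUBJ", "OBJ", "IO", "JDK",
}
TAG = re.compile(r"<([^<>]*)>")


def find_bad_tags(clause):
    """Return (raw_tag, problem) pairs for tags that look mistyped."""
    problems = []
    for raw in TAG.findall(clause):
        if raw != raw.strip():
            problems.append((raw, "extra space"))
        elif raw not in KNOWN_TAGS:
            problems.append((raw, "unknown tag"))
    return problems


if __name__ == "__main__":
    print(find_bad_tags("marani <TR2> <APLL1> buluse < ANIM> <NONH>"))
```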
Challenge 4
• Many files
• Each particular construction is saved in a separate file.
• For example: for the applicative, there were at least five files: one for the whole dataset, one for all applicatives together, and one each for applicative types 1, 2, and 3.
Challenge 5
• Time-consuming
• Why? The analysis is entered manually.
• When one piece of information changed, the whole dataset needed revising, starting the tagging again from the beginning.
Summary
• Manual annotation is feasible in a functional-typological grammar study
• Some good points • Some challenges