www.uni-stuttgart.de
[A Recursive Annotation Scheme [for Referential Information Status] ]
Arndt Riester(1), David Lorenz(2), Nina Seemann(1)
(1) Institute for Natural Language Processing (IMS) & SFB 732, University of Stuttgart
(2) English Department, University of Freiburg
19.5.2010, LREC, Malta
Information Status
Describes the cognitive activation of nominal expressions
Distinguishes between GIVEN and NEW items
or between GIVEN, ACCESSIBLE and NEW items (Chafe 1976, 1994)
or between EVOKED, INFERRABLE and NEW items (Prince 1981)
or: e.g. Prince (1992), Nissim et al. (2004), Dipper et al. (2007)
[Figure: category labels used in these schemes, including BRAND-NEW ANCHORED, BRAND-NEW UNANCHORED, HEARER NEW, DISCOURSE NEW, DISCOURSE OLD, UNUSED, CONTAINING INFERRABLE, BRIDGING, TEXTUALLY EVOKED, SITUATIONALLY EVOKED, OLD-IDENTITY, OLD-RELATIVE, OLD-GENERIC, OLD-ID-GENERIC, OLD-GENERAL, OLD-EVENT, MEDIATED-SITUATION, MEDIATED-PART, MEDIATED-GENERAL, MEDIATED-AGGREGATED, MEDIATED-FUNC_VALUES, MEDIATED-POSSESSIVE, MEDIATED-EVENT, ACCESSIBLE-INFERABLE, ACCESSIBLE-SITUATION, ACCESSIBLE-GENERAL]
Desiderata
A simple scheme based on clear theoretical assumptions
Good inter-coder agreement for different textual genres
Full coverage of all nominal expressions
Capable of dealing with recursive embeddings
(1) [the red gem [in [the Queen’s]A crown]B ]C
3 referents
3 nested labels for information status
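Example (1) can be pictured as a tree in which each referent is a node with its own information-status slot. The following Python sketch assumes a hypothetical `Markable` class (not part of any tool named in these slides); A, B and C are the placeholder labels from (1), not real classes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Markable:
    """A referring expression; its parts may contain further markables."""
    label: str                         # information-status slot (placeholder here)
    parts: list[Union[str, Markable]]  # words and embedded markables, in order

    def text(self) -> str:
        """Surface string covered by this markable, embedded ones included."""
        return " ".join(p if isinstance(p, str) else p.text() for p in self.parts)

    def flatten(self, depth: int = 0) -> list[tuple[int, str, str]]:
        """(depth, label, text) for this markable and, recursively, all embedded ones."""
        rows = [(depth, self.label, self.text())]
        for p in self.parts:
            if isinstance(p, Markable):
                rows.extend(p.flatten(depth + 1))
        return rows


# Example (1): [the red gem [in [the Queen's]A crown]B ]C
queen = Markable("A", ["the", "Queen's"])
crown = Markable("B", ["in", queen, "crown"])
gem = Markable("C", ["the", "red", "gem", crown])

for depth, label, text in gem.flatten():
    print("  " * depth + f"{label}: {text}")
```

Three referents yield three rows, one label each, which is exactly what a flat (non-recursive) scheme cannot represent.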
Two levels of givenness
Givenness of words: repetition, synonymy, hypernymy
(2) {On my way home, I saw a poodle.}
a. It reminded me of Anna’s poodle.
b. It reminded me of Anna’s dog.
Givenness of referents: coreference
(3) {On my way home, I saw a poodle.}
a. The poodle / It tried to bite me.
b. The stupid beast tried to bite me.
Keep the two apart!
In the following: GIVEN ≡ coreferential
But see Baumann & Riester (2010) for a two-level scheme
(importance for prosody)
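The two levels can be told apart mechanically once each mention carries both a head word and a coreference ID. This is a minimal Python sketch under stated assumptions: the `Mention` class, both functions and the toy `RESUMES` lexicon (which lists only dog as resuming poodle) are hypothetical illustrations, not part of the scheme.

```python
from __future__ import annotations

from dataclasses import dataclass

# Toy lexicon (assumption): word -> earlier words it can resume through
# synonymy or hypernymy. Only "dog" resuming "poodle" is listed.
RESUMES = {"dog": {"poodle"}}


@dataclass
class Mention:
    head: str    # head noun of the nominal expression
    entity: int  # coreference (referent) ID


def word_given(m: Mention, context: list[Mention]) -> bool:
    """Givenness of words: repetition, synonymy or hypernymy of an earlier head."""
    earlier = {c.head for c in context}
    return m.head in earlier or bool(RESUMES.get(m.head, set()) & earlier)


def referent_given(m: Mention, context: list[Mention]) -> bool:
    """Givenness of referents: coreference with an earlier mention (GIVEN here)."""
    return any(c.entity == m.entity for c in context)


context = [Mention("poodle", entity=1)]  # "On my way home, I saw a poodle."
annas_dog = Mention("dog", entity=2)      # (2b) Anna's dog
the_beast = Mention("beast", entity=1)    # (3b) the stupid beast

print(word_given(annas_dog, context), referent_given(annas_dog, context))  # True False
print(word_given(the_beast, context), referent_given(the_beast, context))  # False True
```

(2b) comes out word-given but not GIVEN in the sense used here, while (3b) is GIVEN although, with this toy lexicon, its head word is new.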
Context Theory
discourse context (e.g. DRT; Kamp & Reyle 1993): what has been explicitly stated before
utterance context (indexicality; e.g. Kaplan 1989): speaker, location, time; entities in the visual environment
frame contexts (e.g. Fillmore 1985): plausible protagonists in a scenario
encyclopaedic context (e.g. Kamp, to appear): world knowledge of an expected audience
A Simple Rule for Definite Expressions
Definite descriptions, demonstratives, proper names and pronouns trigger the presupposition that their referent should be identified in “the” context (e.g. Heim 1983; van der Sandt 1992).
Claim: Information status classes should directly reflect the four context components.
Definite identified in     Information status class
discourse context          GIVEN
utterance context          SITUATIVE
frame context              BRIDGING
encyclopaedic context      UNUSED
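The table amounts to a lookup from context component to class. A minimal Python sketch, assuming the annotator already knows in which component(s) the referent was found; the `Context` enum and the tie-breaking order (first match in table order) are assumptions, since the slides do not say how to rank components.

```python
from __future__ import annotations

from enum import Enum


class Context(Enum):
    DISCOURSE = "discourse"
    UTTERANCE = "utterance"
    FRAME = "frame"
    ENCYCLOPAEDIC = "encyclopaedic"


# The table above, in table order.
CLASS_FOR_CONTEXT = {
    Context.DISCOURSE: "GIVEN",
    Context.UTTERANCE: "SITUATIVE",
    Context.FRAME: "BRIDGING",
    Context.ENCYCLOPAEDIC: "UNUSED",
}


def classify_definite(found_in: set[Context]) -> str:
    """Return the class of the first component (in table order) containing the referent.

    Assumption: the slides do not rank components, so table order is used.
    Assumption: a referent found in no component has to be accommodated and
    is treated as UNUSED here.
    """
    for ctx, label in CLASS_FOR_CONTEXT.items():  # dicts keep insertion order
        if ctx in found_in:
            return label
    return "UNUSED"
```

For instance, [next week] in the news example is anchored in the utterance time, so `classify_definite({Context.UTTERANCE})` returns `SITUATIVE`.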
Annotating Hearer Knowledge (UNUSED)
Prince (1981): the choice of referring expression reflects the speaker’s/writer’s assumptions concerning the hearer’s knowledge (assumed familiarity)
No access to the speaker’s mind
Simplification: as an annotator, decide on the basis of your own expectations whether a (non-GIVEN) item is known to an intended audience
Will they know this?
YES → UNUSED-KNOWN (encyclopaedic knowledge), e.g. “Barack Obama”
NO → UNUSED-UNKNOWN (accommodation), e.g. “the woman Max went out with last night”
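The yes/no question above can be written as a tiny decision function. In this Python sketch, `audience_knows` is a hypothetical stand-in for the annotator's own expectations, and the one-item set of known names is invented for illustration.

```python
from typing import Callable


def unused_subclass(expression: str, is_given: bool,
                    audience_knows: Callable[[str], bool]) -> str:
    """Answer "Will they know this?" for a non-GIVEN item."""
    if is_given:
        raise ValueError("the question is only asked for non-GIVEN items")
    return "UNUSED-KNOWN" if audience_knows(expression) else "UNUSED-UNKNOWN"


def news_audience_knows(expression: str) -> bool:
    """Hypothetical stand-in for one annotator's expectations about the audience."""
    return expression in {"Barack Obama"}


print(unused_subclass("Barack Obama", False, news_audience_knows))  # UNUSED-KNOWN
print(unused_subclass("the woman Max went out with last night", False,
                      news_audience_knows))                          # UNUSED-UNKNOWN
```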
News Example (USA Today, 17.5.10)
[...] [Protestants]INDEF-RESUMPTIVE still account [for about 55% [of the 111th Congress]UNUSED-UNKNOWN]INDEF-PARTITIVE-CONTAINED, but [a recent flurry of Catholic and Jewish appointments]INDEF-NEW has turned [them]GIVEN-PRONOUN [into a minority of one [on the Supreme Court]BRIDGING]INDEF-NEW(PREDICATE). Should [Kagan]GIVEN-SHORT be confirmed [next week]SITUATIVE, [[the nation’s]GIVEN-EPITHET highest court]GIVEN-EPITHET would be [a Protestant-free zone]INDEF-GENERIC [for the first time since [John Jay, [the nation’s]GIVEN-REPEATED first chief justice (and an Episcopalian)]UNUSED-UNKNOWN]UNUSED-UNKNOWN, banged [[his]GIVEN-PRONOUN gavel]UNUSED-UNKNOWN [in 1790]UNUSED-KNOWN.
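Because labels are attached directly after the closing bracket, bracketed text like the example above can be read back with a simple stack. The Python sketch below assumes that labels consist of upper-case parts joined by hyphens, optionally followed by a parenthesised qualifier such as (PREDICATE); this label grammar, the `Span` class and both functions are illustrative assumptions, not the format of the authors' tools. Unlabelled brackets such as the elision [...] are skipped.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass

# Assumed label grammar, e.g. GIVEN-PRONOUN or INDEF-NEW(PREDICATE).
LABEL = re.compile(r"[A-Z]+(?:-[A-Z]+)*(?:\([A-Z]+\))?")


@dataclass
class Span:
    text: str   # surface string, nested brackets and labels removed
    label: str  # full information-status label
    depth: int  # number of enclosing labelled or unlabelled brackets


def parse_markables(annotated: str) -> list[Span]:
    """Read bracketed, labelled markables (innermost first) with a stack."""
    spans: list[Span] = []
    opened: list[int] = []  # start offsets (in `plain`) of currently open brackets
    plain: list[str] = []   # characters of the unannotated text
    i = 0
    while i < len(annotated):
        ch = annotated[i]
        if ch == "[":
            opened.append(len(plain))
            i += 1
        elif ch == "]":
            if not opened:
                raise ValueError(f"unbalanced ']' at offset {i}")
            start = opened.pop()
            m = LABEL.match(annotated, i + 1)
            if m:  # skip unlabelled brackets such as "[...]"
                text = "".join(plain[start:]).strip()
                spans.append(Span(text, m.group(0), len(opened)))
                i = m.end()
            else:
                i += 1
        else:
            plain.append(ch)
            i += 1
    if opened:
        raise ValueError("unclosed '['")
    return spans


def coarse_counts(spans: list[Span]) -> Counter:
    """Tally main classes, i.e. the part of each label before the first hyphen."""
    return Counter(s.label.split("-")[0] for s in spans)


snippet = ("Should [Kagan]GIVEN-SHORT be confirmed [next week]SITUATIVE, "
           "[[the nation’s]GIVEN-EPITHET highest court]GIVEN-EPITHET would be "
           "[a Protestant-free zone]INDEF-GENERIC")
for s in parse_markables(snippet):
    print(s.depth, s.label, s.text)
print(coarse_counts(parse_markables(snippet)))
```

The coarse tally collapses subclasses such as GIVEN-SHORT and GIVEN-EPITHET into their main class, which is also how a core-scheme view of the annotation can be obtained.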
Data
Transcripts from German radio news bulletins (three full days of hourly news)
About 3000 sentences
Parsed with XLE / German LFG grammar (Rohrer & Forst 2006)
Annotated with the SALTO tool (Burchardt et al. 2006), extended
TigerXML format
Two annotators; verification and final decision by a third annotator
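In TigerXML, each sentence is a graph of terminals (`t`) and nonterminals (`nt`) linked by `edge` elements, so candidate nominal markables can be read off as the yield of NP nodes. The Python sketch below uses a made-up German sentence; the category names follow TIGER conventions, and whether the XLE-derived trees and the authors' information-status extension look like this is an assumption. It does not attempt to read information-status labels.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Made-up sentence "sagte der argentinische Staatschef" in TigerXML form.
SAMPLE = """<corpus><body>
<s id="s1"><graph root="s1_502">
 <terminals>
  <t id="s1_1" word="sagte" pos="VVFIN"/>
  <t id="s1_2" word="der" pos="ART"/>
  <t id="s1_3" word="argentinische" pos="ADJA"/>
  <t id="s1_4" word="Staatschef" pos="NN"/>
 </terminals>
 <nonterminals>
  <nt id="s1_500" cat="NP">
   <edge label="NK" idref="s1_2"/>
   <edge label="NK" idref="s1_3"/>
   <edge label="NK" idref="s1_4"/>
  </nt>
  <nt id="s1_502" cat="S">
   <edge label="SB" idref="s1_500"/>
   <edge label="HD" idref="s1_1"/>
  </nt>
 </nonterminals>
</graph></s>
</body></corpus>"""


def candidate_markables(xml_text: str, cats: tuple[str, ...] = ("NP",)) -> list[str]:
    """Return the word string covered by every nonterminal whose cat is in `cats`."""
    root = ET.fromstring(xml_text)
    result: list[str] = []
    for graph in root.iter("graph"):
        words: dict[str, str] = {}  # terminal id -> word form
        order: dict[str, int] = {}  # terminal id -> position in the sentence
        for position, t in enumerate(graph.iter("t")):
            words[t.get("id")] = t.get("word")
            order[t.get("id")] = position
        # nonterminal id -> ids of its daughters (terminals or nonterminals)
        daughters = {nt.get("id"): [e.get("idref") for e in nt.iter("edge")]
                     for nt in graph.iter("nt")}

        def terminals_under(node_id: str) -> list[str]:
            if node_id in words:
                return [node_id]
            return [tid for d in daughters[node_id] for tid in terminals_under(d)]

        for nt in graph.iter("nt"):
            if nt.get("cat") in cats:
                ids = sorted(terminals_under(nt.get("id")), key=order.__getitem__)
                result.append(" ".join(words[i] for i in ids))
    return result


print(candidate_markables(SAMPLE))
```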
Annotation using SALTO (Burchardt et al. 2006)
[Screenshot of SALTO with the examples “...said Kirchner in Cordoba...” and “... the Argentinian head of state...”]
Inter-Annotator Agreement (Cohen 1960)
Evaluation performed on a subset of 1149 nominal expressions, which the annotators had to identify themselves
1100 expressions identified by both annotators, 757 of them labeled identically
Agreement: κ = .66 (full scheme: 21 subclasses)
κ = .78 (core scheme with 6 classes: GIVEN, SITUATIVE, BRIDGING, UNUSED, INDEF, OTHER)
Comparison:
Dipper et al. (2007): κ = .55 (newspaper commentaries)
Nissim et al. (2004): κ = .79 (full), κ = .85 (core) (dialogue)
(fewer embeddings; pre-exclusion of “difficult” cases)
(Source: Ritz et al. 2008)
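Cohen's kappa corrects the observed agreement p_o for the chance agreement p_e expected from each annotator's label distribution: κ = (p_o − p_e) / (1 − p_e). On the figures above, raw agreement over the 1100 jointly identified expressions is 757/1100 ≈ .69. The Python sketch below implements the formula and shows, on six invented label pairs (not corpus data), why collapsing the full scheme to the core classes can raise κ: disagreements between subclasses of the same class disappear. The `to_core` mapping is an assumption that sends any label outside the five named classes to OTHER.

```python
from __future__ import annotations

from collections import Counter

CORE = {"GIVEN", "SITUATIVE", "BRIDGING", "UNUSED", "INDEF"}


def to_core(label: str) -> str:
    """Collapse a full label such as GIVEN-EPITHET to one of the six core classes."""
    main = label.split("-")[0]
    return main if main in CORE else "OTHER"


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's (1960) kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n              # observed agreement
    freq_a, freq_b = Counter(a), Counter(b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2  # chance agreement
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Invented labels for six items (not corpus data).
ann1 = ["GIVEN-PRONOUN", "GIVEN-EPITHET", "BRIDGING", "UNUSED-KNOWN", "INDEF-NEW", "INDEF-GENERIC"]
ann2 = ["GIVEN-PRONOUN", "GIVEN-SHORT", "BRIDGING-TEXT", "UNUSED-KNOWN", "INDEF-NEW", "INDEF-PARTITIVE"]

full = cohens_kappa(ann1, ann2)
core = cohens_kappa([to_core(x) for x in ann1], [to_core(x) for x in ann2])
print(f"full: {full:.3f}  core: {core:.3f}")  # full: 0.455  core: 1.000
```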
Conclusion
The scheme enables fast, comprehensible and reliable annotation of nested expressions in arbitrary text genres
Useful for
a. Computational linguists: e.g. creating a gold standard for anaphora resolution and related tasks
b. Theoretical linguists: empirical data for investigations into the form of referring expressions, (non-)restrictivity of modification, word order, grammatical role, discourse structure etc.
c. Phoneticians: investigating prosody in spoken corpora
Learn more: http://www.ims.uni-stuttgart.de/~arndt
Thank you!
Details: GIVEN
Subclasses: PRONOUN, REFLEXIVE, SHORT, REPEATED, EPITHET
(1) Both had the blessings of Dr. Richard Klausner. But even [Klausner]GIVEN-SHORT had to be persuaded at first.
(2) Before the European Union’s ban on incandescent lightbulbs went into effect on Sept. 1, consumers across Europe raided stores to stockpile [the familiar bulbs]GIVEN-EPITHET.
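Which GIVEN subclass applies depends mostly on the form of the expression relative to its antecedent. As a rough illustration only, here is a hypothetical Python heuristic; the word lists and the token-subset test for SHORT are assumptions, not the authors' guidelines, which leave the decision to the annotator.

```python
# Closed word lists are illustrative assumptions (English only, incomplete).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them", "his", "its", "their"}
REFLEXIVES = {"himself", "herself", "itself", "themselves"}


def given_subclass(anaphor: str, antecedent: str) -> str:
    """Guess the GIVEN subclass of a coreferential expression (heuristic sketch)."""
    ana = anaphor.lower().split()
    ante = antecedent.lower().split()
    if len(ana) == 1 and ana[0] in REFLEXIVES:
        return "GIVEN-REFLEXIVE"
    if len(ana) == 1 and ana[0] in PRONOUNS:
        return "GIVEN-PRONOUN"
    if ana == ante:
        return "GIVEN-REPEATED"
    if set(ana) < set(ante):
        return "GIVEN-SHORT"    # e.g. a surname resuming a full name
    return "GIVEN-EPITHET"      # a different description of the same referent


print(given_subclass("Klausner", "Dr. Richard Klausner"))               # GIVEN-SHORT
print(given_subclass("the familiar bulbs", "incandescent lightbulbs"))  # GIVEN-EPITHET
```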
Details: BRIDGING
Subclasses: 0, TEXT, CONTAINED
(1) Germany lost the football match against England because [the audience]BRIDGING was against them.
(2) United were trailing 3-1 when Fletcher was felled [in the area]BRIDGING-TEXT by Aleksei Berezutski. The Scotland midfielder was then yellow-carded by [the referee]BRIDGING-TEXT.
Details: BRIDGING-CONTAINED vs. UNUSED-UNKNOWN
(1) The Republicans won [the governorship of Virginia]BRIDGING-CONTAINED.
(expected / prototypical relationship)
(2) He was convicted of helping to organise [the seizure [of Osama Moustafa Nasr]]UNUSED-UNKNOWN from a Milan street in February 2003.
(non-prototypical relationship, can’t be separated)
(3) # Speaking of Osama Moustafa Nasr, [the seizure] happened in 2003.
Details: INDEF
Subclasses: NEW, GENERIC, PARTITIVE, RESUMPTIVE
(1) [A man]INDEF-NEW came in. He bought a pair of shoes.
(2) [Serious beer drinkers]INDEF-GENERIC should head straight to this 550-year-old institution.
(3) At violent clashes between the police and demonstrating Kurds, [three demonstrators]INDEF-PARTITIVE were injured.
(4) That’s close to how a cancer vaccine works, but not precisely. Most experts see [cancer vaccines]INDEF-RESUMPTIVE as a hybrid of treatment and prevention.
Other
EXPLETIVE
NULL: nobody, nothing
RELATIVE: non-restrictive relative clause
CATAPHOR: can be indefinite or definite