RTMBank: Capturing Verbs with Reichenbach's Tense Model

Leon Derczynski and Robert Gaizauskas

University of Sheffield

20 July 2011

Outline

1 Introduction

2 Reichenbach’s Model of Tense

3 RTMML

4 Building the corpus

5 Summary

Introduction to RTMBank

We present RTMBank, an annotated corpus of tensed verbs and their relations, which helps us to process temporal information in text automatically.

Temporal Annotation

Why annotate temporal information?

Relating time is an essential part of discourse.

To automatically process time information in documents, we can record data with a temporal annotation.

Data annotated might include: times, events, the relations between them.

Existing standards - TIMEX, TimeML

Existing corpora - TimeBank, AQUAINT TimeML corpus, WikiWars

We focus on two sub-tasks: relating events, and processing temporal expressions.

Relating events

How do we relate things in time?

Temporal expressions and events may be seen as intervals.

Defining relations between two intervals permits temporal ordering.

Human readers can usually temporally relate events in a discourse using cues.

Automatic labelling of temporal links is a difficult research problem.

Temporal expressions

What is a temporal expression?

A temporal expression is text that describes a time or period.

Wednesday; December 4, 1997; for two weeks

The process of mapping temporal expressions to an absolute calendar is normalisation.

Wednesday ⇒ 2011/01/12 T 00:00:00 - 2011/01/12 T 23:59:59
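As a minimal sketch of normalisation, the Python below maps a weekday name to a one-day interval. The function name `normalise_weekday`, the anchor date, and the rule "take the first matching day on or after the anchor" are assumptions made for illustration; a real normaliser has to decide from context which Wednesday is meant.

```python
from datetime import date, datetime, time, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

def normalise_weekday(expression, anchor):
    """Return (start, end) for the first such weekday on or after `anchor`.

    The "on or after" rule is an assumption, not part of any standard.
    """
    target = WEEKDAYS.index(expression.lower())
    offset = (target - anchor.weekday()) % 7
    day = anchor + timedelta(days=offset)
    start = datetime.combine(day, time(0, 0, 0))
    end = datetime.combine(day, time(23, 59, 59))
    return start, end

# Anchor chosen so that the result matches the slide's example date.
start, end = normalise_weekday("Wednesday", date(2011, 1, 10))
fmt = "%Y/%m/%d T %H:%M:%S"
print(start.strftime(fmt), "-", end.strftime(fmt))
```

The anchor plays the role of the document creation time: the same expression normalises to a different interval when the anchor moves.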

The Model

When describing an event, we can call the time of the event E.

The event is described at speech time S.

“I will walk the dog.” ⇒ S < E

Reichenbach introduces a reference point R, an abstract time from which events are viewed.

“I ran home” ⇒ E < S

“I had run home” ⇒ E < R < S
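The orderings above can be tabulated. The sketch below encodes a few English tenses as the relation of E to R and of R to S. The names `TENSES` and `event_vs_speech` are illustrative, and the simple tenses use the usual Reichenbachian reading in which E coincides with R, which is more specific than the E < S and S < E shown above.

```python
# (E versus R, R versus S) for a few English tenses.
TENSES = {
    "past perfect":    ("<", "<"),   # E < R < S
    "simple past":     ("=", "<"),   # E = R < S
    "present perfect": ("<", "="),   # E < R = S
    "simple present":  ("=", "="),   # E = R = S
    "simple future":   ("=", ">"),   # S < R = E
    "future perfect":  ("<", ">"),   # E < R and S < R; E versus S is open
}

def event_vs_speech(tense):
    """Derive the relation of E to S, or None when the tense leaves it open."""
    er, rs = TENSES[tense]
    if er == "=":
        return rs          # E sits on R, so it inherits R's relation to S
    if rs == "=":
        return er          # R sits on S, so E relates to S as it does to R
    if er == rs:
        return er          # both steps point the same way
    return None            # E < R but R > S: E may precede or follow S

for name in TENSES:
    print(name, event_vs_speech(name))
```

The `None` for the future perfect reflects that "I will have left" does not say whether the leaving happens before or after the moment of speech.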

Special properties of the reference point

permanence: when sentences or clauses are combined, grammatical rules constrain the set of available tenses. These rules work such that R has the same position in all cases.

“I had run home when I heard the noise”

positional use: when a time is found in the same clause as a verbal event, R is bound to the time.

“It was six o’clock and John had prepared” – the preparation, E, occurs before six o’clock, R.
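Permanence is what lets two events be ordered through a shared reference point. The sketch below is a hypothetical helper, not part of RTMML: given how each of two events stands to the same R, it returns their mutual order where R decides it.

```python
def relate_events(er_first, er_second):
    """Relate E1 to E2 given each event's relation to a shared R.

    Each argument is '<', '=' or '>' and states how that event stands to R.
    Returns '<', '=' or '>' for E1 versus E2, or None if R does not decide it.
    """
    rank = {"<": -1, "=": 0, ">": 1}
    a, b = rank[er_first], rank[er_second]
    if a == b:
        # Both at R: the events coincide. Both before (or both after) R:
        # the shared reference point says nothing about their order.
        return "=" if a == 0 else None
    return "<" if a < b else ">"

# "I had run home when I heard the noise":
# running is before R (past perfect), hearing is at R (simple past).
print(relate_events("<", "="))
```

For the example sentence this places the running before the hearing, which matches the intuitive reading.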

Motivation for using the model

How will this model help temporal annotation?

At least 10% of normalisation errors are due to an incorrect R [1].

There is a “need to develop sophisticated methods for temporal focus tracking if we are to extend current time-stamping technologies” [2].

If we can relate the S and R of multiple event verbs, we can reason about and label the relation between event times.

Tracking S, E and R according to Reichenbach’s model helps generate correct tense and aspect for NLG [3].

[1] Mani & Wilson, 2000

[2] Mazur & Dale, 2010

[3] Elson & McKeown, 2010

What to annotate?

RTMML is intended to describe Reichenbach’s verbal event structure.

First primitive: verb groups describing an event (had snowed, is running, will have given).

Second primitive: text describing a time (next two weeks, last April).

We are more concerned with the relations between times than the absolute position of the times.

Annotation Schema

What does the markup look like?

RTMML is an XML-based standoff annotation standard that may be standalone or integrated with TimeML.

<doc> defines document creation/utterance time, which is also the default speech time.

<verb> denotes verb groups, including tense, and relations between S, E and R. These points may be expressed in terms of other times in the annotation or new labels.

<timerefx> denotes a time-referring expression, which can optionally be specified in TIMEX3 format.

Example markup

<rtmml>
Yesterday, John ate well.
<seg type="token" />
<doc time="now" />
<timerefx xml:id="t1" target="#token0" />
<verb xml:id="v1" target="#token3"
      view="simple" tense="past"
      sr=">" er="=" se=">"
      r="t1" s="doc" />
</rtmml>
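To show how such markup might be read programmatically, the sketch below parses the example with Python's standard `xml.etree.ElementTree`. The tokenisation rule (words and punctuation marks as separate, zero-based tokens) and the helper `resolve` are assumptions chosen so that `#token0` and `#token3` land on "Yesterday" and "ate"; RTMML tooling may segment text differently.

```python
import re
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

RTMML = """<rtmml>
Yesterday, John ate well.
<seg type="token" />
<doc time="now" />
<timerefx xml:id="t1" target="#token0" />
<verb xml:id="v1" target="#token3"
      view="simple" tense="past"
      sr=">" er="=" se=">"
      r="t1" s="doc" />
</rtmml>"""

def resolve(target, tokens):
    """Map a '#tokenN' pointer to the N-th token (assumed zero-based)."""
    return tokens[int(target[len("#token"):])]

root = ET.fromstring(RTMML)
# Assumed tokenisation: each word and each punctuation mark is one token.
tokens = re.findall(r"\w+|[^\w\s]", root.text)

verb = root.find("verb")
timerefx = root.find("timerefx")
summary = {
    "verb": resolve(verb.get("target"), tokens),
    "time": resolve(timerefx.get("target"), tokens),
    "tense": verb.get("tense"),
    "relations": {name: verb.get(name) for name in ("sr", "er", "se")},
    "r_bound_to": verb.get("r"),
    "time_id": timerefx.get(XML_ID),
}
print(summary)
```

Here the verb's `r` attribute names the `<timerefx>` with the same identifier, so the reference point of "ate" is bound to "Yesterday".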

RTMML links

Although we can relate primitives with the current syntax, three common types of link are catered for with <rtmlink>:

positions, describing positional use of the reference point;

same timeframe, where events share a commonly situated reference point;

reports, for reported speech.

<rtmlink xml:id="l1" type="POSITIONS">
  <link source="#t1" />
  <link target="#v1" />
</rtmlink>
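A link of this shape can also be generated rather than written by hand. The sketch below builds one with the standard `xml.etree.ElementTree` module; `make_rtmlink` is a hypothetical helper, and only the element and attribute names shown in the example above are taken from RTMML.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # expanded name of xml:id

def make_rtmlink(link_id, link_type, source, target):
    """Build an <rtmlink> element with one source and one target <link>."""
    rtmlink = ET.Element("rtmlink", {XML_ID: link_id, "type": link_type})
    ET.SubElement(rtmlink, "link", {"source": "#" + source})
    ET.SubElement(rtmlink, "link", {"target": "#" + target})
    return rtmlink

# Link the time expression t1 to the verb v1 whose reference point it positions.
link = make_rtmlink("l1", "POSITIONS", "t1", "v1")
serialised = ET.tostring(link, encoding="unicode")
print(serialised)
```

Building links from identifiers keeps the `#` pointers consistent with the `xml:id` values assigned to `<timerefx>` and `<verb>` elements.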

Annotation Method

How did we create the corpus?

Collect 50 documents that overlap with TimeML text (TimeBank / ATC)

Pre-process with GATE to identify VG (verb groups)

Pre-process with TERNIP to identify times

Add “reporting” links for some verbs (e.g. said)

Curate annotations per-document

Annotation Guidelines

What edge cases frequently emerged?

Infinitives: ignore these, and use the dominating verb (they want to find out)

Times as durations: only include those that can be absolutely normalised

Modals: only include these in a verb annotation for future tense

Corpus Statistics

What did we end up with?

50 RTMML annotated documents, comprising

22 000 tokens; in which are described

5 000 events, and

1 000 times; these are connected by

temporal linkages.

Context

How do events tend to relate to each other in discourse?

Emmanuel had said “This will explode!”, but later changed his mind.

Positions of S and R persist throughout each context.

Changing the S − R relationship inside a context leads to awkward text: “By the time I ran, John will have arrived”.
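The persistence of S and R within a context suggests a simple check. The sketch below flags a context whose verbs disagree on the S − R relation. It assumes each verb carries an `sr` value read as the position of S relative to R, as in the `sr` attribute of the earlier markup example; the function name and the second, natural-sounding sentence are illustrative additions.

```python
def consistent_context(verbs):
    """Return True if every verb in the context has the same S-R relation.

    `verbs` is a list of dicts whose 'sr' value is '<', '=' or '>',
    read as S relative to R (an assumed reading of the sr attribute).
    """
    return len({verb["sr"] for verb in verbs}) <= 1

# "By the time I ran, John will have arrived."
awkward = [
    {"text": "ran", "sr": ">"},                # past: R before S
    {"text": "will have arrived", "sr": "<"},  # future perfect: S before R
]

# "By the time I ran, John had arrived."
natural = [
    {"text": "ran", "sr": ">"},
    {"text": "had arrived", "sr": ">"},        # past perfect: R before S
]

print(consistent_context(awkward), consistent_context(natural))
```

A check like this only detects the clash; deciding where one context ends and the next begins (for example at reported speech) is a separate problem.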

Contexts in natural language

What can we use context for?

Sets of events and times are often found in “pools”, all belonging to the same temporal context

These pools each describe a temporally-close set of events in the same realis or irrealis world

Identifying the nature of the pools helps us perform temporal annotation (conditionals, reported speech)

Summary

Have we accomplished our goals?

We have created a corpus annotated with Reichenbach’s tense model

We can use this corpus to observe temporal links in text in detail

Future Work

Automatic annotation of events and links using Reichenbach’s model and RTMBank as training data

Resolution of RTMML annotations for temporal reasoning

Adapting the model to work outside of English, especially for languages with more complex aspectual systems (e.g. Russian)

Conclusion

We have presented RTMBank, a corpus containing temporal annotations based on Reichenbach’s model of tense.

Questions

Thank you for your time. Are there any questions?
