“Mini” Tutorial: Recognition and Normalization of Time Expressions
Matteo Negri
1st ONTOTEXT Project Workshop, Trento, 25/11/2004
25/11/2004 Recognition and Normalization of Time Expressions
Outline
Reasoning about Time
TIMEX2
The TERN Experience
CHRONOS and ONTOTEXT
Reasoning About Time
• Crucial step towards automatic language comprehension
– Broad variety of related issues (e.g. What is the basic temporal unit? How to represent the temporal meaning of an expression? Which parts of a text convey a temporal meaning? How to capture the relation between time and events? …)
• What this talk IS about: the interpretation of the meaning of expressions that refer to time
– i.e. expressions telling us when something happened, how long something lasted, or how often something occurs
• What this talk IS NOT about: event expressions, temporal anchoring of events, dependencies between events and times, etc.
Interpreting the Meaning of Temporal Expressions
• What we want to do:
Place any temporal expression present in a given input text over a timeline (a discrete, unlimited, and totally ordered sequence of points)
– points (“2004”)
– intervals (“four years”)
– sets of times (“every 3 years”)
Another four years of excavation, then two more devoted solely to restoration. The Ronchi di Mattarello quarry, which for half a century has been used to extract limestone for the building industry, is about to end its life cycle. At yesterday's session, the provincial environment committee decided to grant the six-year extension requested by Cava di Ronchi srl, which a few months ago took over from a real-estate company the business that once belonged to Pedrotti asfalti. Quarrying proper, however, may continue only until 2008. After that, the company will have to concentrate solely on the care and restoration of the site.
The extension granted yesterday, which must now be confirmed by the provincial government, is the last one. The resolution will say so explicitly. A few decades ago that kind of activity raised no major objections in that area, but now the city and the suburb have grown, and the dust and the coming and going of trucks are no longer considered compatible. This is also because the hollow that hosts the quarry lies right next to the Casteller nature area, managed by the Federazione cacciatori, which the Province would like to develop with substantial public funding.
Even after the quarry closes, the company will still be able to continue its other activity, asphalt production. Every three years, the operating cycle involves excavating the material and sorting it for sale. The hole is then progressively filled with material from demolitions, which first passes through the separation section for recovery and recycling and then through the screen and the crusher. Here too, a portion is directed to asphalt production.
[Figure: the temporal expressions of the text above placed on a timeline of years, 1993 to 2004]
Interpreting the Meaning of Temporal Expressions cont.
• What we need to do:
1. Detect the temporal expressions present in a text and determine their extension
2. Model the temporal context in which they occur
• Human language is full of context-dependent time expressions (e.g. “today”, “next week”, “November 25”) which refer to a particular temporal location (day, month, year)
3. Provide a normalization framework in order to encode the same meaning (e.g. “November 25, 2004”, “25/11/2004”, “2004/11/25”) in the same way (e.g. “2004-11-25” in ISO 8601 format)
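The normalization step in point 3 can be sketched as follows. This is a minimal illustration, not the approach of any system described in this talk: the function name `to_iso` is hypothetical, only the three surface forms quoted above are handled, and slash-separated dates starting with the day are assumed to be day/month/year.

```python
import re

# English month names mapped to their number (1..12).
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def to_iso(text):
    """Return an ISO 8601 date (YYYY-MM-DD) for a few fully specified forms, else None."""
    text = text.strip()
    # "November 25, 2004"
    m = re.fullmatch(r"([A-Za-z]+) (\d{1,2}), (\d{4})", text)
    if m and m.group(1).lower() in MONTHS:
        month = MONTHS[m.group(1).lower()]
        return f"{m.group(3)}-{month:02d}-{int(m.group(2)):02d}"
    # "25/11/2004" (assumption: day comes first)
    m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
    if m:
        return f"{m.group(3)}-{int(m.group(2)):02d}-{int(m.group(1)):02d}"
    # "2004/11/25"
    m = re.fullmatch(r"(\d{4})/(\d{1,2})/(\d{1,2})", text)
    if m:
        return f"{m.group(1)}-{int(m.group(2)):02d}-{int(m.group(3)):02d}"
    return None


for surface in ["November 25, 2004", "25/11/2004", "2004/11/25"]:
    print(surface, "->", to_iso(surface))
```

All three surface forms map to the same string, which is the point of a normalization framework: equal meanings get equal encodings.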
Motivation: Question Answering
• Question reformulation
Q: “Is Bill Clinton currently the President of the United States?”
“currently” → November 2004
Q’: “Is Bill Clinton the President of the United States in November 2004?”
Q’’: “Who is the President of the United States in November 2004?”
Motivation: Question Answering
• Answer selection
Q: “When did J.R.R. Tolkien retire from his professorship at Oxford?”
In 1957, Tolkien was to travel to the United States to accept honorary degrees from Marquette, Harvard, and several other universities, but the trip was cancelled due to the ill health of his wife Edith. He retired two years later (=1959) from his professorship at Oxford
“The Adventures of Tom Bombadil” was published in 1962, three years after (=1959) Tolkien retired his professorship at Oxford.
…Tolkien makes a brief allusion to the future of Middle-earth in a letter written in 1958. The following year (=1959), after his retirement from teaching at Oxford, he …
Occurrences of each normalized year in the passages: 1957: 1; 1958: 1; 1959: 3; 1962: 1
A: 1959
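The answer selection above amounts to a vote over normalized dates. A minimal sketch, assuming the three passages have already been processed so that every time expression carries its normalized year (the function name `select_answer` is hypothetical):

```python
from collections import Counter


def select_answer(normalized_years):
    """Return the most frequent normalized year together with all counts."""
    counts = Counter(normalized_years)
    year, _ = counts.most_common(1)[0]
    return year, counts


# Normalized years from the three passages, in reading order:
# passage 1: "In 1957", "two years later" (=1959)
# passage 2: "in 1962", "three years after" (=1959)
# passage 3: "in 1958", "The following year" (=1959)
candidates = ["1957", "1959", "1962", "1959", "1958", "1959"]

answer, counts = select_answer(candidates)
print(dict(counts))
print("A:", answer)
```

Without normalization the three mentions of 1959 would be three different strings, and no candidate would stand out.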
Motivation: Question Answering
• Advanced reasoning
Q: “Could Mozart and Beethoven meet in Vienna?”
“In 1784 Beethoven was able to deputize for his teacher. Three years later, recognizing his talent, Prince Maximilian Franz sent him to Vienna to further his education. He would soon return less than four months later on the news that his mother was dying. She passed away on July 17th 1787.”
“Mozart went to Munich to compose the opera late in 1780. Soon after, he was summoned from Munich to Vienna, where the Salzburg court was in residence on the accession of a new emperor. Mozart lived in Vienna for the rest of his life, until he died in 1791.”
Beethoven in Vienna: 1787
Mozart in Vienna: 1780 to 1791
A: YES, in 1787
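Once both stays are placed on the timeline, the question reduces to an interval intersection. A minimal sketch at year granularity, using the interval bounds shown above (the function name `overlap` is hypothetical):

```python
def overlap(a, b):
    """Intersect two closed year intervals (start, end); return None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


beethoven_in_vienna = (1787, 1787)
mozart_in_vienna = (1780, 1791)

common = overlap(beethoven_in_vienna, mozart_in_vienna)
print("A:", "YES, in %d" % common[0] if common else "NO")
```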
Motivation: other NLP areas
• Information Retrieval
“Give me the articles from the press one week after the election day”
• Summarization
“Give me a short biography on Mozart, in chronological order”
Need to know when events occur, to avoid inappropriate merging of distinct events
• …
TIMEX2
• Annotation standard for temporal expressions
• Extends MUC’s definition of the TIMEX named entity category by:
– Including a broader variety of expressions (e.g. “daily”, “three years later”, “now”, “18-year-old”)
– Replacing the TYPE (DATE vs TIME) categorization attribute with a set of attributes expressing the normalized, intended meaning of a temporal expression
About TIMEX and the MUC Named Entity task:
http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
TIMEX2: Annotation Format
• Temporal expressions are annotated by inserting a special SGML tag around the text string, as in:
<TIMEX2>Christmas</TIMEX2>
• In addition, the TIMEX2 tag may contain one or more attributes, as in:
<TIMEX2 val="2005-11-25TAF" mod="START">early this afternoon</TIMEX2>
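To show how such in-line annotations can be read back, here is a minimal sketch that extracts the tagged strings and their attributes with regular expressions. It is an illustration only: `parse_timex2` is a hypothetical helper, attribute values are assumed to use straight double quotes, and nested TIMEX2 tags are not handled.

```python
import re

# Opening tag with optional attributes, tagged text, closing tag.
TIMEX2_RE = re.compile(r"<TIMEX2([^>]*)>(.*?)</TIMEX2>", re.DOTALL)
# One attribute of the form name="value".
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_timex2(text):
    """Return a list of (tagged string, attribute dict) pairs."""
    results = []
    for match in TIMEX2_RE.finditer(text):
        attrs = dict(ATTR_RE.findall(match.group(1)))
        results.append((match.group(2).strip(), attrs))
    return results


sample = ('We met <TIMEX2 val="2005-11-25TAF" mod="START">early this afternoon'
          '</TIMEX2> and will meet again at <TIMEX2>Christmas</TIMEX2>.')
for string, attrs in parse_timex2(sample):
    print(repr(string), attrs)
```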
TIMEX2: Markable Expressions
• Features of a markable expression:
– The syntactic head of the expression must be an appropriate lexical trigger, or a pronoun that co-refers with a markable time expression
– Each lexical trigger is a word or numeric expression whose meaning conveys a temporal unit or concept
– To be a trigger, the referent must be able to be oriented on a timeline with a relation to a time (past, present, future)
For details, see Ferro et al.: TIDES 2003 Standard for the Annotation of Temporal Expressions, September 2003
TIMEX2: Normalization Attributes
• Designed to consistently capture the semantics of markable expressions in the annotations
Attribute | Function | Example
VAL | Contains a normalized form of the date/time (ISO 8601 format) | VAL="2004-11-25"
MOD | Captures temporal modifiers | MOD="APPROX"
ANCHOR_VAL | Contains a normalized form of an anchoring date/time | ANCHOR_VAL="2004-11-24"
ANCHOR_DIR | Captures the relative direction/orientation between VAL and ANCHOR_VAL | ANCHOR_DIR="BEFORE"
SET | Identifies expressions denoting sets of times | SET="YES"
NON_SPECIFIC | Identifies non-specific expressions | NON_SPECIFIC="YES"
COMMENT | Contains any comment the annotator wants to add | COMMENT="any string"
TIMEX2: Some Examples
1. <TIMEX2 val="2004-11-25" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">today</TIMEX2>
2. <TIMEX2 val="XXXX-XX-XX" mod="" set="YES" non_specific="YES" anchor_val="" anchor_dir="" comment="">daily</TIMEX2>
3. <TIMEX2 val="19" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">the 20th century</TIMEX2>
4. <TIMEX2 val="PRESENT_REF" mod="" set="" non_specific="" anchor_val="1998-10-03" anchor_dir="AS_OF" comment="">now</TIMEX2>
5. <TIMEX2 val="P.66CE" mod="LESS_THAN" set="" non_specific="" anchor_val="1998" anchor_dir="ENDING" comment="">nearly two-thirds of a century</TIMEX2>
6. <TIMEX2 val="P13Y" mod="" set="" non_specific="" anchor_val="2000" anchor_dir="ENDING" comment="">the 13 years of Milosevic's rule</TIMEX2>
7. <TIMEX2 val="1998-W46" mod="START" set="" non_specific="" anchor_val="" anchor_dir="" comment="">Early this week</TIMEX2>
8. <TIMEX2 val="" mod="" set="" non_specific="" anchor_val="" anchor_dir="" comment="">Cold War-era</TIMEX2>
9. <TIMEX2 val="P27Y" mod="" set="" non_specific="" anchor_val="2000" anchor_dir="ENDING" comment="">27-year-old</TIMEX2>
Outline
Reasoning about Time
TIMEX2
The TERN Experience
– Overview
– CHRONOS Architecture
– Detection and Bracketing
– Normalization
– Results
CHRONOS and ONTOTEXT
Time Expression Recognition and Normalization (TERN)
• Task: detect and normalize (with TIMEX2 tags) all the temporal expressions occurring in the source data
• Time span: April-September 2004
• Organizers: NIST, MITRE Corp.
• Sponsor: Automatic Content Extraction (ACE) program
– Started in September 1999
– Administered by NSA, NIST, and CIA
– ACE’s objective: develop NLP technology to support automatic understanding of textual data
For further information:
TERN: http://timex2.mitre.org/tern.html
ACE: http://itl.nist.gov/iaui/894.01/tests/ace/
TERN-2004
• Two source languages: English and Chinese
• Two separate tasks:
– Detection
– Detection + Normalization
• Input:
– Broadcast news and newswire texts
• Output:
– In-line annotation with TIMEX2 tags
• Evaluation figures:
– Correct, incorrect, misses, spurious, undergeneration, overgeneration, substitution, error, precision, recall, F-measure
TERN-2004: Detection
• Participants:
– English: 6; Chinese: 4
• All the participants viewed the task as a supervised learning problem, and used the available annotated data to train their systems (SVM, Maximum Entropy, HMM)
• Features considered:
– LEXICAL: tokens, n-grams, prefixes and suffixes, capitalization, digits, punctuation
– SYNTACTIC: parts of speech, chunks, syntactic patterns, patterns of numerical date expressions
– TASK SPECIFIC: timex dictionary, other taggers
TERN-2004: English Detection + Normalization
• Participants:
– English: 6; Chinese: 0
• All the participants addressed the task with rather similar rule-based approaches
– 2-step strategy:
• detection and bracketing
• normalization
– Similar linguistic preprocessing: POS-tagging, chunking
– Normalization is still considered out of the reach of ML
The CHRONOS system
• Extension of our multilingual (Eng/Ita) NER system
– Rule-based approach to recognize: <PERSON>, <LOCATION>, <ORGANIZATION>, <MEASURE>, <MONEY>, <CARDINAL>, <PERCENT>; <DATE>, <DURATION>, <TIME> mapped to <TIMEX2>
– Proper names (“Galileo Galilei”) and trigger words (“astronomer”) are mined from the WordNet hierarchy
Advantages:
1) reduced effort to create/maintain reliable gazetteers (261 proper name hyponyms of calendar_day#1)
2) useful basis for multilinguality
CHRONOS: Architecture
Plain English Text
→ Detection and Bracketing
1. Tokenization, POS Tagging & Multiwords Recognition
2. Basic Rules Application
3. Composition Rules Application
→ Intermediate Annotation
→ Normalization
1. Anchors Selection
2. Dates Normalization
3. Attributes Normalization
→ Tagged Text
Detection and Bracketing alone is OK for the (non-normalized) <PERSON>, <LOCATION>, <ORGANIZATION>, <MEASURE>, <MONEY>, <CARDINAL>, <PERCENT> tags; TIMEX2 also requires Normalization.
Detection & Bracketing: Basic Rules
• ~1500 hand-crafted rules (~1.5 person-months)
– Regular expressions checking for word senses, parts of speech, symbols, words satisfying specific predicates
• Detection
– Markable expressions are detected considering the presence in the input text of lexical triggers
• “year”, “Seventies”, “Friday”, “Christmas”, “today”, “daily”, “09/23/2004”, “1970s”, etc.
• Bracketing
– Considers the context surrounding the detected triggers
• “beginning”, “end”, “previous”, “next”, “ago”, “later”, “before”, “during”, “nearly”, “almost”, “3”, “sixth”, etc.
Detection & Bracketing: Basic Rules cont.
• Information gathering
– Goal: mine relevant information for normalization
– Considers triggers + context to fill:
TIMEX2 attributes
MOD: “more than”, “approximately” …
SET: “every”, “twice a” …
ANCHOR_DIR: “before”, “ago”, “during” ...
TEMPORARY attributes
type: [T-ABS | T-REL]
t-cat: [second, minute, hour, day, …, millennium]
op: [=, +, -]
quant: [n≥0]
Example: “Nearly three years later”
MOD: LESS_THAN
ANCHOR_DIR: ENDING
type: T-REL
t-cat: YEAR
op: +
quant: 3
Basic Rules: an Example
A basic rule matching “Nearly three years later”:
PATTERN t1 t2 t3 t4
t1: [pred = approx-p]
t2: [pred = number-p]
t3: [lemma = “year”]
t4: [lemma = “later”]
OUTPUT (intermediate annotation)
<TIMEX2 val="?" anchor_val="?" mod="LESS_THAN" anchor_dir="ENDING" type="T-REL" t-cat="year" quant="t2" op="+">t1 t2 t3 t4</TIMEX2>
TIMEX2 attributes: mod, anchor_dir
Temporary attributes: type, t-cat, quant, op
Values to be determined: val, anchor_val
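The rule above can be sketched in code as follows. This is an illustration under stated assumptions, not the CHRONOS implementation: the word lists behind `approx_p` and `number_p` and the crude `lemma` function are hypothetical stand-ins for the system's predicates and linguistic preprocessing.

```python
# Hypothetical word lists standing in for the approx-p and number-p predicates.
APPROX_WORDS = {"nearly", "almost"}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def approx_p(token):
    return token.lower() in APPROX_WORDS


def number_p(token):
    return token.isdigit() or token.lower() in NUMBER_WORDS


def lemma(token):
    """Very crude lemmatizer (assumption): lowercase and strip a plural 's'."""
    token = token.lower()
    return token[:-1] if token.endswith("s") else token


def match_rule(tokens):
    """Apply the pattern t1 t2 t3 t4; return the intermediate annotation or None."""
    if len(tokens) != 4:
        return None
    t1, t2, t3, t4 = tokens
    if approx_p(t1) and number_p(t2) and lemma(t3) == "year" and lemma(t4) == "later":
        quant = int(t2) if t2.isdigit() else NUMBER_WORDS[t2.lower()]
        return {
            "val": "?", "anchor_val": "?",            # values to be determined
            "mod": "LESS_THAN", "anchor_dir": "ENDING",  # TIMEX2 attributes
            "type": "T-REL", "t-cat": "year",          # temporary attributes
            "quant": quant, "op": "+",
            "text": " ".join(tokens),
        }
    return None


print(match_rule("Nearly three years later".split()))
```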
Detection & Bracketing: Composition Rules
• Handle conflicts between possible multiple taggings
“I traveled for the whole Monday night”
Monday
the whole Monday
Monday night
the whole Monday night
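One simple way to resolve such conflicts is to keep the longest candidate and drop everything that overlaps it. The slide does not state which policy the composition rules apply, so the longest-match policy below is an assumption, and `resolve` is a hypothetical helper working on token spans (start index inclusive, end index exclusive).

```python
def resolve(candidates):
    """Keep the longest spans first; discard any span overlapping a kept one."""
    chosen = []
    # Longest spans first; ties broken by leftmost start.
    for start, end in sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= c_start or start >= c_end for c_start, c_end in chosen):
            chosen.append((start, end))
    return sorted(chosen)


tokens = "I traveled for the whole Monday night".split()
candidates = [
    (5, 6),  # Monday
    (3, 6),  # the whole Monday
    (5, 7),  # Monday night
    (3, 7),  # the whole Monday night
]

for start, end in resolve(candidates):
    print(" ".join(tokens[start:end]))
```

A policy of this kind never outputs a time expression embedded in a larger one.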
Normalization
• Anchors Selection (only for T-RELs)
– Goal: connect each T-REL to an anchor
– 2 heuristics
CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “NYT20001025.1839.0279.sgm”)
PR-DATE: connects a T-REL to the nearest time expression with a compatible granularity (a t-cat with at least the same degree of specificity)
e.g. for t-cat = “month”: “month”, “week” and “day” are compatible; “decade” is not
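The PR-DATE heuristic can be sketched as a search for the nearest compatible expression. The sketch makes two assumptions that the slide does not confirm: granularities are ranked in the fixed order below, and "nearest" is read as the nearest preceding expression in the text. The names `compatible` and `pr_date` are hypothetical.

```python
# Granularities from coarsest to finest (assumed ordering).
GRANULARITY = ["millennium", "century", "decade", "year", "month",
               "week", "day", "hour", "minute", "second"]
RANK = {name: i for i, name in enumerate(GRANULARITY)}


def compatible(rel_cat, candidate_cat):
    """A candidate anchor must be at least as specific as the T-REL's t-cat."""
    return RANK[candidate_cat] >= RANK[rel_cat]


def pr_date(rel_position, rel_cat, expressions):
    """Pick the VAL of the nearest preceding compatible expression, or None.

    expressions: list of (position in text, t-cat, normalized VAL).
    """
    best = None
    for position, cat, val in expressions:
        if position < rel_position and compatible(rel_cat, cat):
            if best is None or position > best[0]:
                best = (position, cat, val)
    return best[2] if best else None


expressions = [(3, "day", "2004-11-20"), (10, "decade", "199")]
# A T-REL with t-cat "month" at position 12: the decade at position 10 is
# nearer but too coarse, so the day at position 3 is selected.
print(pr_date(12, "month", expressions))
```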
Normalization cont.
HEURISTIC | trigger | trigger+context
PR-DATE | former, then, that, it | following+trigger, previous+trigger, same+trigger, that+trigger, trigger+before, trigger+later
CR-DATE | yesterday, today, tonight, now, Monday, …, Sunday, January, …, December | this+trigger, last+trigger, next+trigger, past+trigger, the+trigger, trigger+ago
Normalization cont.
• Dates Normalization
– Goal: fill the VAL attribute of each detected time expression
T-ABS: regular expressions considering their superficial form (“1990s” → “199”)
T-REL: rewriting rules considering
– the anchor (e.g. “2001”)
– the operator (“OP”) to be applied (e.g. “+”)
– the quantity (“QUANT”) to be added/subtracted (e.g. “3”)
“three years later”: “2001” “+” “3” → “2004”
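Both cases can be sketched in a few lines. This is a minimal illustration restricted to year granularity; the function names are hypothetical, and a real normalizer would also need calendar arithmetic for months, weeks and days.

```python
import re


def normalize_abs(text):
    """T-ABS: derive VAL from the surface form (decades and plain years only)."""
    match = re.fullmatch(r"(\d{3})0s", text)   # "1990s" -> "199"
    if match:
        return match.group(1)
    if re.fullmatch(r"\d{4}", text):           # "2004" -> "2004"
        return text
    return None


def normalize_rel(anchor_year, op, quant):
    """T-REL: apply the operator and quantity to the anchor year."""
    year = int(anchor_year)
    if op == "+":
        year += quant
    elif op == "-":
        year -= quant
    elif op != "=":
        raise ValueError("unknown operator: %r" % op)
    return "%04d" % year


print(normalize_abs("1990s"))
print(normalize_rel("2001", "+", 3))   # "three years later" anchored to 2001
```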
Normalization cont.
• Attributes Normalization
– Goal: produce the final tagged text
• Removes temporary attributes
• Introduces the normalized attributes “ANCHOR_VAL” and “ANCHOR_DIR”
TERN-2004: English Detection
[Figure: bar chart of F-measure (TIMEX2 and TEXT) for the 6 participating sites: CU, IBM, LingPipe, MetaCarta, Sheffield, Amsterdam. Values labelled on the chart: 0.841, 0.716, 0.944, 0.872]
TERN-2004: English Detection
[Figure: F-measure of the IBM system on Broadcast News vs. Newswire, for TIMEX2 and TEXT]
• Similar results between the two sources
• The same trend holds among most systems
TERN-2004: Chinese Detection
[Figure: bar chart of F-measure (TIMEX2 and TEXT) for the 4 participating sites: CU, LingPipe, PolyU, Sheffield. Values labelled on the chart: 0.724, 0.567, 0.927, 0.825]
• Minimal drop-off from English (94%) to Chinese (93%) for TIMEX2
TERN-2004: English Detection + Normalization
[Figure: bar chart of F-measure (TIMEX2 and TEXT) for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0.95 (0.944), 0.849 (0.872)]
• Comparable performance of the two top systems with respect to English detection
TERN-2004: English Detection + Normalization
[Figure: bar chart of F-measure (VAL, MOD, SET) for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0.872, 0.864, 0.774]
• Three systems score above 85% for the VAL attribute
TERN-2004: English Detection + Normalization
[Figure: bar chart of F-measure (A-DIR, A-VAL) for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam. Values labelled on the chart: 0.760, 0.726]
TERN-2004 Results: CHRONOS
TAG                POSS  ACT   CORR  INCO  MISS  SPUR  PREC   REC    F
TIMEX2             1828  1648  1609  0     219   39    0.976  0.880  0.926
TIMEX2:ANCHOR_DIR  351   294   245   26    80    23    0.833  0.698  0.760
TIMEX2:ANCHOR_VAL  351   398   272   56    23    70    0.683  0.775  0.726
TIMEX2:MOD         50    43    36    1     13    6     0.837  0.720  0.774
TIMEX2:SET         39    25    22    0     17    3     0.880  0.564  0.688
TIMEX2:TEXT        1828  1648  1458  151   219   39    0.885  0.798  0.839
TIMEX2:VAL         1569  1560  1365  190   14    5     0.875  0.870  0.872
Row groups: Detection (TIMEX2), Bracketing (TIMEX2:TEXT), Normalization (attribute rows)
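The PREC, REC and F columns are consistent with the usual definitions: precision = CORR / ACT, recall = CORR / POSS, and F as their harmonic mean. A minimal sketch that recomputes two rows from their counts (the function name `scores` is hypothetical):

```python
def scores(poss, act, corr):
    """Return (precision, recall, F-measure), each rounded to three decimals."""
    precision = corr / act    # correct tags among those the system produced
    recall = corr / poss      # correct tags among those in the reference
    f_measure = 2 * precision * recall / (precision + recall)
    return round(precision, 3), round(recall, 3), round(f_measure, 3)


print("TIMEX2:           ", scores(poss=1828, act=1648, corr=1609))
print("TIMEX2:ANCHOR_DIR:", scores(poss=351, act=294, corr=245))
```

The counts are also internally consistent: ACT = CORR + INCO + SPUR and POSS = CORR + INCO + MISS in every row.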
TERN-2004: Wrap Up
• What worked
– The rule-based approach (competitive + easy to develop, maintain, and extend)
• What did not work
– Relatively high number of missing tags (219: 12% of the total detectable time expressions in the reference)
– Poor recall performance on specific attributes: SET: 0.564; ANCHOR_DIR: 0.698
TERN-2004 Wrap Up: Future Directions
• Conflict resolution (impact on detection)
– Our implementation of composition rules ignores embedded time expressions (e.g. “The eve of the new year”, “Sixty years ago today”)
• Anaphoric expressions (impact on detection)
– Pronouns are not recognized by the system as possible triggers (e.g. “Evelyn has seen 80 winters. This, she says, was the coldest”)
• Apparent dates (spurious taggings)
– A stopword list of proper names (e.g. “USA Today”, “Daily Telegraph”, “20th Century Fox”) is just a partial solution (intrinsically incomplete)
• Reported speech (impact on normalization)
– The system’s ANCHOR_VAL selection heuristic fails with reported speech fragments:
“He concluded the 1998 annual meeting saying: ‘The next year will be the eve of a new era for our company’.”
CHRONOS and ONTOTEXT: Application Scenarios
• Given a topic (e.g. “Lorenzo Dellai”), present the user with related information in chronological order
…
1990-XX-XX: at just 31, Lorenzo Dellai becomes the youngest mayor of a regional capital
1995-XX-XX: Dellai re-elected directly by the citizens with an absolute majority of the votes
2003-09-27: Dellai forcefully steps into the internal affairs of the DS list, who after an initial flash of pride let themselves be humiliated and accept the role of satellites.
2004-10-26: The outgoing president of the provincial government, Lorenzo Dellai (centre-left), is the first 'governor' in the history of the Autonomous Province of Trento, elected with 169,913 votes, equal to 60.82%.
…
2004-11-25: INFORMATION CONFERENCE on the state of the industrial sector in Trentino […] Lorenzo Dellai closes the debate
25/11/2004 Recognition and Normalization of Time Expressions
CHRONOS and ONTOTEXT: Application Scenarios cont.
• Given a topic and a year (e.g. “Cantina La-vis”, “2003”), present related information in chronological order
2003-06-14: According to Fausto Peratoner, director of Cantina La-vis, “Chardonnay should not be seen only as a grape variety of globalized viticulture. In Trentino, it is a ‘naturalized variety’.”
2003-08-25: With the marriage between Cantina La-Vis and Cantina Val di Cembra, the third pole of Trentino viticulture is born today.
2003-12-03: the director of Cantina La-Vis recalled the 5 million bottles, plus another million of Cesarini Sforza sparkling wine, the more than 40 million in turnover, and the 1,300 hectares of vineyards.
• Given a topic and a date (e.g. “Gianni Marangoni”, “2004-11-20”), retrieve related news articles from the DB
Yesterday in Rovereto, shortly after 7 p.m., five men armed with pistols and knives raided the villa of Gianni Marangoni
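The scenarios above come down to sorting and filtering snippets by their normalized date value. The following is a minimal sketch of that step, not the CHRONOS implementation: the record list, the function names and the choice to sort unknown fields ("XX") before known ones are all assumptions made for illustration, and the snippets are shortened paraphrases of the slide examples.

```python
import re

# Hypothetical records: (normalized date value, shortened snippet).
# "XX" marks an unknown month or day, as in the timeline examples.
RECORDS = [
    ("2004-10-26", "Dellai elected 'governor' of the Province"),
    ("1995-XX-XX", "Dellai re-elected"),
    ("2003-09-27", "Dellai and the DS list"),
    ("1990-XX-XX", "Dellai becomes mayor"),
]

VAL_PATTERN = re.compile(r"^(\d{4})-(\d{2}|XX)-(\d{2}|XX)$")


def sort_key(val):
    """Map a date value to a sortable tuple; unknown fields count as 0 (assumption)."""
    match = VAL_PATTERN.match(val)
    if match is None:
        raise ValueError(f"unsupported value: {val!r}")
    year, month, day = match.groups()
    return (
        int(year),
        0 if month == "XX" else int(month),
        0 if day == "XX" else int(day),
    )


def timeline(records, year=None, date=None):
    """Return records in chronological order, optionally filtered by year or exact date."""
    selected = [
        (val, text)
        for val, text in records
        if (year is None or val.startswith(f"{year}-"))
        and (date is None or val == date)
    ]
    return sorted(selected, key=lambda record: sort_key(record[0]))
```

Calling `timeline(RECORDS)` covers the topic-only scenario, `timeline(RECORDS, year="2003")` the topic-plus-year scenario, and `timeline(RECORDS, date="2004-10-26")` the topic-plus-date lookup.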
CHRONOS and ONTOTEXT: What we have done (Sept. 2004 to now)
• A new detection and bracketing component, based on ML
• SVMlight was used
• Features considered: PoS, token, lemma, punctuation, capitalization, hyphenation, collocations of words and tokens
  – A specific gazetteer of “temporal terms” has been mined from WordNet and will be used for further improvements
• Performance is close to the state of the art: 0.83 F-measure on the TEXT attribute (best system: 0.87; average system: 0.72; 3rd rank)
For details: Gliozzo et al.: Instance Pruning by Filtering Uninformative Words: an Information Extraction Case Study, to appear at CICLing 2005
[Chart: F-measure (0 to 1) by system: CU, IBM, LingPipe, MetaCarta, Sheffield, Amsterdam, CHRONOS-SVM]
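To make the feature list concrete, here is a minimal sketch of per-token feature extraction for a detection and bracketing classifier. It is an illustration under stated assumptions, not the actual CHRONOS feature set: the function name and feature keys are hypothetical, PoS and lemma are omitted because they need an external tagger, neighbouring tokens stand in for the collocation features, and the gazetteer is passed in as a plain set.

```python
import string


def token_features(tokens, i, gazetteer=frozenset()):
    """Build a feature dict for tokens[i] (hypothetical feature names)."""
    token = tokens[i]
    return {
        "token": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "is_punctuation": bool(token) and all(ch in string.punctuation for ch in token),
        "has_hyphen": "-" in token,
        # Assumption: a simple membership test against a set of temporal terms.
        "in_gazetteer": token.lower() in gazetteer,
        # Neighbouring tokens as a stand-in for collocation features.
        "prev_token": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_token": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
```

Each dict would still have to be converted to the sparse numeric format expected by the learner; that step is left out here.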
CHRONOS and ONTOTEXT: Roadmap
• Short-term: Porting to Italian (ongoing activity)
  – Rewriting basic rules
• Mid-term: CHRONOS2 (ongoing activity)
  – Modularization
  – Integration with NERD
• Long-term: events
  – Temporal anchoring
  – Temporal ordering
The end
TERN-2004: Chinese Detection
[Chart: F-measure (0.1 to 1) for TIMEX2 and TEXT on Broadcast News and Newswire; site: Polytechnic University]
TIMEX2: Lexical Triggers
• Noun
  – Triggers: minute, afternoon, midnight, day, weekend, month, summer, season, quarter, era, period, future, past, ...
  – Non-triggers: instant, episode, occasion, timetable, reign, …
• Proper name
  – Triggers: Monday, January, Christmas, etc.
• Specialized time patterns
  – Triggers: 8:00, 12/2/2000, 1994, 1960s, …
• Adjective
  – Triggers: recent, former, current, future, daily, semiannual, biannual, daytime, ago, preseason, …
  – Non-triggers: early, ahead, next, subsequent, frequent, later, contemporary, …
• Adverb
  – Triggers: currently, lately, hourly, daily, monthly, ago, …
  – Non-triggers: earlier, immediately, instantly, meanwhile, next, following, later, soon, eventually, …
• Time noun/adverb
  – Triggers: now, today, tomorrow, …
• Number
  – Triggers: 3, three, third, Sixties, …
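A trigger list like this one can be used as a first-pass filter for candidate time expressions. The sketch below is a simplified illustration, not the TIMEX2 guidelines or any TERN system: the word sets are small excerpts from the lists above, the regular expressions are rough stand-ins for the "specialized time patterns", numbers are left out because they only act as triggers in context, and all names are hypothetical.

```python
import re

# Small excerpts of the trigger and non-trigger lists.
TRIGGERS = {
    "minute", "afternoon", "day", "month", "recent", "daily", "ago",
    "now", "today", "tomorrow", "monday", "january", "christmas",
}
NON_TRIGGERS = {"instant", "episode", "occasion", "early", "next", "later", "soon"}

# Simplified stand-ins for the specialized time patterns.
TIME_PATTERNS = [
    re.compile(r"^\d{1,2}:\d{2}$"),          # 8:00
    re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),  # 12/2/2000
    re.compile(r"^\d{4}$"),                  # 1994 (also matches any 4-digit number)
    re.compile(r"^\d{4}s$"),                 # 1960s
]


def is_trigger(token):
    """Return True if the token is a lexical trigger under this toy lexicon."""
    word = token.lower()
    if word in NON_TRIGGERS:  # explicit non-triggers never start a time expression
        return False
    if word in TRIGGERS:
        return True
    return any(pattern.match(token) for pattern in TIME_PATTERNS)


def find_triggers(text):
    """Tokenize naively and keep only the trigger tokens, in order."""
    tokens = re.findall(r"[\w:/]+", text)
    return [token for token in tokens if is_trigger(token)]
```

For example, `find_triggers("The meeting starts at 8:00 on Monday, not later.")` keeps "8:00" and "Monday" and drops the non-trigger "later".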
TERN: Evaluation Figures
• For each item in the aligned reference/system output:
  – Corr: the two items are identical
  – Inco: the two items are not identical
  – Miss: a reference item has no system output aligned with it
  – Spur: a system output has no reference item aligned with it
• Given a set of CORR, INCO, MISS and SPUR counts:
  – Possible (POSS): CORR + INCO + MISS
  – Actual (ACT): CORR + INCO + SPUR
  – Undergeneration: MISS / POSS
  – Overgeneration: SPUR / ACT
  – Substitution: INCO / (CORR + INCO)
  – Error rate: (INCO + SPUR + MISS) / (CORR + INCO + SPUR + MISS)
  – Precision (P): CORR / ACT
  – Recall (R): CORR / POSS
  – F-measure: 2 * P * R / (P + R)
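As a worked example, the figures can be computed directly from the four counts. This is a minimal sketch, not the official TERN scorer: the function name and the returned keys are hypothetical, the F-measure is taken to be the balanced harmonic mean of precision and recall, and ratios with a zero denominator are reported as 0.0 by assumption.

```python
def tern_scores(corr, inco, miss, spur):
    """Compute the evaluation figures from CORR, INCO, MISS and SPUR counts."""

    def ratio(numerator, denominator):
        # Assumption: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    poss = corr + inco + miss  # items in the reference
    act = corr + inco + spur   # items in the system output
    precision = ratio(corr, act)
    recall = ratio(corr, poss)
    return {
        "possible": poss,
        "actual": act,
        "undergeneration": ratio(miss, poss),
        "overgeneration": ratio(spur, act),
        "substitution": ratio(inco, corr + inco),
        "error_rate": ratio(inco + spur + miss, corr + inco + spur + miss),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }
```

With 80 correct, 10 incorrect, 10 missing and 20 spurious items, POSS is 100 and ACT is 110, so recall is 0.8 and precision is 80/110.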
Notes
• Training data (annotated with TIMEX2 tags)
  – English: 862 files (306K words)
  – Chinese: 503 files (158K words)
• Evaluation corpus
  – English: 50K words
  – Chinese: 50K words
• Humans are always aware of their temporal location (day, month, year) and use context-dependent time expressions (e.g. “today”, “next week”)
• Given a temporal expression, interpreting its meaning amounts to finding its correct position on a timeline
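Context-dependent expressions can only be placed on the timeline relative to an anchor, typically the date of the document or utterance. The sketch below illustrates the idea for a handful of English expressions; it is not the CHRONOS normalizer, the function name and the covered expressions are assumptions, and the output strings follow ISO 8601 conventions (calendar dates, week numbers, years).

```python
from datetime import date, timedelta


def resolve(expression, anchor):
    """Return an ISO-style value for a few context-dependent expressions."""
    expr = expression.lower().strip()
    if expr == "today":
        return anchor.isoformat()
    if expr == "yesterday":
        return (anchor - timedelta(days=1)).isoformat()
    if expr == "tomorrow":
        return (anchor + timedelta(days=1)).isoformat()
    if expr == "next week":
        # isocalendar() yields (ISO year, ISO week number, ISO weekday).
        iso_year, iso_week, _ = (anchor + timedelta(weeks=1)).isocalendar()
        return f"{iso_year}-W{iso_week:02d}"
    if expr == "next year":
        return str(anchor.year + 1)
    raise ValueError(f"expression not covered by this sketch: {expression!r}")
```

The same expression resolves differently under different anchors: `resolve("next year", date(1998, 6, 1))` gives "1999", while with an anchor in 2004 it gives "2005".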
TERN-2004: English Detection + Normalization
[Chart: F-measure (0 to 1) per attribute (TIMEX2, A-DIR, A-VAL, MOD, SET, TEXT, VAL) for the 6 participating sites: CLAC, Cymfony, ITC-irst, Lockheed, Alicante, Amsterdam]
Temporal Ordering: related issues
• In news, events aren’t usually described in the (narrative) order in which they occur
  – Temporal structure dictated by perceived news value
    • Latest news usually presented first
  – News sometimes expresses multiple viewpoints, with commentaries, eyewitness recapitulations, etc.
• Temporal ordering appears to involve a variety of knowledge sources
  – Tense & aspect
    • Max entered the room. Mary stood up/was seated on the desk.
  – Temporal adverbials
    • Simpson made the call at 3. Later, he was spotted driving towards Westwood.
  – Rhetorical relations and World Knowledge
    • Narration: Max stood up. John greeted him.
    • Cause/Explanation: Max fell. John pushed him.
    • Background: Boutros-Ghali Sunday opened a meeting in Nairobi. He arrived in Nairobi from South Africa.