Dealing with Italian Temporal Expressions: the ITA-Chronos System
Matteo Negri
Fondazione Bruno Kessler - IRST, Trento - Italy
negri@itc.it
EVALITA 2007 - Evaluation of NLP Tools for Italian
Rome - Italy
September 10, 2007
EVALITA’07 - 09/10/2007 - M. Negri
Outline
• Chronos: a multilingual system for TE recognition/normalization
• System description
• Some examples
• Results at EVALITA 2007
Chronos
• Multilingual (ITA/ENG) tool for TE recognition and normalization according to the TIMEX2 standard
• Approach
– Rule-based system
• ENG-Chronos: 1500 rules
• ITA-Chronos: 981 rules
– Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
• ENG-Chronos participated in TERN-04 with good results on the “Recognition+Normalization Task”
– Ranked 2nd, with 76% TERN-Value (best system: 78%)
ITA-Chronos: System Architecture
Input: Plain Text
– Preprocessing: Tokenization, POS Tagging, Multiwords Recognition
– Detection: Basic Tagging Rules
– Bracketing: Composition Rules
– Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext
Intermediate Annotation
– Anchors Selection
– Dates Normalization
– Attributes Normalization
Output: Tagged Text
Macro-stages: Detection and Bracketing (Plain Text to Intermediate Annotation), Normalization (Intermediate Annotation to Tagged Text)
STEP1: Preprocessing
• The first phase of the process performs:
– Tokenization
– POS tagging
– Multiwords recognition
• The preprocessed input text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.
STEP2: Detection
• Markable expressions are detected through the presence of lexical triggers in the input text
– “anno”, “oggi”, “Venerdì”, “Natale”, “quotidianamente”, “10/09/2007”, “1982”, etc.
• Basic Tagging Rules
– Regular expressions checking for word senses, parts of speech, symbols, or words satisfying specific predicates
PATTERN t1 t2 t3
t1 [pos=“E”]
t2 [pos=“N”]
t3 [pred=TimeUnit-p]
OUTPUT <TIMEX2>t1 t2 t3</TIMEX2>
Tagging rule matching “Fra tre giorni” (“in three days”)
– “E” = preposition
– “N” = numeral
– TimeUnit-p is satisfied by “secondo”, “minuto”, “ora”, “giorno”, “settimana”, “mese”, etc.
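The rule above can be read as a three-token window test. The following is a minimal Python sketch of that reading, not the actual Chronos rule engine: the `Token` class, the `TIME_UNITS` lexicon (with plural forms added), and the POS tags “S” and “V” used for the non-matching tokens are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str  # "E" = preposition, "N" = numeral (tags from the slide)


# Hypothetical lexicon backing the TimeUnit-p predicate (singular and plural).
TIME_UNITS = {
    "secondo", "secondi", "minuto", "minuti", "ora", "ore",
    "giorno", "giorni", "settimana", "settimane", "mese", "mesi",
}


def is_time_unit(tok):
    return tok.text.lower() in TIME_UNITS


# PATTERN t1 t2 t3: one test per token position.
RULE = [
    lambda t: t.pos == "E",   # t1 [pos="E"]
    lambda t: t.pos == "N",   # t2 [pos="N"]
    is_time_unit,             # t3 [pred=TimeUnit-p]
]


def apply_rule(tokens, rule=RULE):
    """Wrap every token window that satisfies the rule in a TIMEX2 tag."""
    out, i = [], 0
    while i < len(tokens):
        window = tokens[i:i + len(rule)]
        if len(window) == len(rule) and all(
            test(tok) for test, tok in zip(rule, window)
        ):
            out.append("<TIMEX2>" + " ".join(t.text for t in window) + "</TIMEX2>")
            i += len(rule)
        else:
            out.append(tokens[i].text)
            i += 1
    return " ".join(out)


# "S" and "V" are placeholder tags for tokens the rule does not constrain.
tokens = [Token("Fra", "E"), Token("tre", "N"), Token("giorni", "S"), Token("parto", "V")]
print(apply_rule(tokens))
```

Keeping each position as a separate predicate mirrors the PATTERN/constraint layout of the rule, so a new rule is just a new list of tests.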
STEP3: Bracketing
• Considers the context surrounding the detected triggers
– “inizio”, “fine”, “prima”, “dopo”, “fa”, “successivo”, “precedente”, “durante”, “circa”, “almeno”, “3”, “sesto”, etc.
• Composition rules
– In charge of handling conflicts between multiple possible taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more detected TEs)
PATTERN T-EXP1 T-EXP2
T-EXP1 [start = n] [end = m]
T-EXP2 [start = o, n ≤ o < m] [end = p, o < p ≤ m]
OUTPUT T-EXP1
T-EXP1 [start = n] [end = m]
Composition rule for handling inclusions
Example: “Tutta la notte di sabato” (“all Saturday night”)
– Candidate taggings: “Tutta la notte”, “la notte”, “la notte di sabato”, “sabato”
– After composition: “Tutta la notte di sabato”
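One way to picture the effect of composition on the example is as span merging over token offsets. The sketch below is a simplification under stated assumptions: spans are hypothetical `(start, end)` pairs with an exclusive end, and every contained, overlapping, or adjacent pair is merged unconditionally, whereas the real composition rules are presumably more selective.

```python
def compose(spans):
    """Merge spans that contain, overlap, or are adjacent to one another.

    Each span is a (start, end) pair of token offsets, end exclusive.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Contained, overlapping, or adjacent: extend the previous span.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


tokens = ["Tutta", "la", "notte", "di", "sabato"]
# "Tutta la notte", "la notte", "la notte di sabato", "sabato"
candidates = [(0, 3), (1, 3), (1, 5), (4, 5)]

for start, end in compose(candidates):
    print(" ".join(tokens[start:end]))
```

The inclusion rule shown on the slide corresponds to the case where the second span lies entirely inside the first, so the merged span equals the first one.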
STEP4: Information gathering
• Goal: mine relevant information for normalization
• Considers triggers + context to assign values to
– TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
– TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
• This is done by running separate sets of specialized tagging rules
• The gathered information is stored in the Intermediate Annotation, which is the input to the normalization component
Information Gathering: Example
TIMEX2 attributes
MOD: “più di”, “circa”, “oltre” …
SET: “ogni”, “tutti” …
ANCHOR_DIR: “prima”, “durante”, “dopo”...
TEMPORARY attributes
type: [T-ABS | T-REL]
t-cat: [second, minute, hour, day,…]
op: [=, +, -]
quant: [n≥0]
heur: [CR-DATE | PR-DATE]
Information Gathering: Example
Detected TE: “oltre tre anni dopo” (“more than three years later”)
TIMEX2 attributes
MOD: “più di”, “circa”, “oltre” … → MORE_THAN
SET: “ogni”, “tutti” … → (none)
ANCHOR_DIR: “prima”, “durante”, “dopo”... → ENDING
TEMPORARY attributes
type: [T-ABS | T-REL] → T-REL
t-cat: [second, minute, hour, day,…] → YEAR
op: [=, +, -] → +
quant: [n≥0] → 3
heur: [CR-DATE | PR-DATE] → PR-DATE
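The attribute assignment for “oltre tre anni dopo” can be sketched as a set of trigger lookups. This is an illustrative sketch only: the four small lexicons are hypothetical and contain just the entries needed here, and tying `op` and `heur` to the trigger “dopo” is an assumption, since the slides do not say which rule sets those two attributes.

```python
# Hypothetical trigger lexicons, limited to what the example needs.
MOD_TRIGGERS = {"oltre": "MORE_THAN"}
# trigger -> (ANCHOR_DIR, op, heur); the op/heur pairing is an assumption.
DIRECTION_TRIGGERS = {"dopo": ("ENDING", "+", "PR-DATE")}
NUMERALS = {"uno": 1, "due": 2, "tre": 3}
UNITS = {"anno": "YEAR", "anni": "YEAR", "mese": "MONTH", "mesi": "MONTH"}


def gather(te):
    """Collect TIMEX2 and temporary attributes from the words of a detected TE."""
    attrs = {}
    for word in te.lower().split():
        if word in MOD_TRIGGERS:
            attrs["MOD"] = MOD_TRIGGERS[word]
        elif word in DIRECTION_TRIGGERS:
            attrs["ANCHOR_DIR"], attrs["op"], attrs["heur"] = DIRECTION_TRIGGERS[word]
            attrs["type"] = "T-REL"  # a direction trigger marks a relative TE
        elif word in NUMERALS:
            attrs["quant"] = NUMERALS[word]
        elif word in UNITS:
            attrs["t-cat"] = UNITS[word]
    return attrs


print(gather("oltre tre anni dopo"))
```

In the system described here each attribute is filled by its own set of specialized tagging rules; the single loop above only collapses that into one pass for brevity.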
Intermediate Annotation: Example
Document: adige20041007_id413938
Plain Text:
“…Così il 31 Luglio del 2002, quindi oltre tre anni dopo l’incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo…”
Intermediate Annotation (output of Detection and Bracketing):
…quindi <TIMEX2 MOD=“MORE_THAN” ANCHOR_DIR=“ENDING” type=“T-REL” t-cat=“YEAR” op=“+” quant=“3” heur=“PR-DATE”>oltre tre anni dopo</TIMEX2> l’incidente…
STEP5: Anchors Selection
• Goal: connect each detected T-REL to an appropriate anchor date
– While the meaning of T-ABSs (“13 Marzo 2005”) is context-independent, T-RELs (“tre anni dopo”) can only be interpreted with respect to a reference TE
• The “heur” attribute is used for this purpose
– 2 heuristics:
CR-DATE: connects a T-REL to the document’s creation date (found at the beginning of the document, or induced from the document’s name, e.g. “adige20041007_…”)
PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a “t-cat” with at least the same degree of specificity)
t-cat = “month” → compatible anchors: “month”, “week”, “day”; not compatible: “century”
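The two heuristics can be sketched as follows. This is a hedged illustration rather than the Chronos implementation: the `TE` record, the coarse-to-fine `GRANULARITY` ordering, the 8-digit date pattern for document names, the use of token distance for “nearest”, and the restriction to TEs that already carry a value are all assumptions.

```python
import re
from dataclasses import dataclass

# Granularities ordered from coarse to fine; the ordering is an assumption.
GRANULARITY = ["century", "year", "month", "week", "day", "hour", "minute", "second"]


@dataclass
class TE:
    position: int   # token offset of the expression in the document
    t_cat: str      # granularity, lowercase here for simplicity
    heur: str = ""  # "CR-DATE" or "PR-DATE" for T-RELs
    val: str = ""   # filled in for TEs that are already resolved


def creation_date(doc_name):
    """Induce a creation date from a name such as 'adige20041007_id413938'."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    return "-".join(m.groups()) if m else None


def compatible(anchor_cat, rel_cat):
    """The anchor must be at least as specific as the T-REL's t-cat."""
    return GRANULARITY.index(anchor_cat) >= GRANULARITY.index(rel_cat)


def select_anchor(rel, tes, doc_name):
    if rel.heur == "CR-DATE":
        return creation_date(doc_name)
    # PR-DATE: nearest resolved TE with a compatible granularity.
    candidates = [
        t for t in tes
        if t is not rel and t.val and compatible(t.t_cat, rel.t_cat)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: abs(t.position - rel.position)).val


doc = "adige20041007_id413938"
date = TE(position=3, t_cat="day", val="2002-07-31")   # "31 Luglio del 2002"
rel = TE(position=9, t_cat="year", heur="PR-DATE")     # "oltre tre anni dopo"
print(select_anchor(rel, [date, rel], doc))
```

With `heur="CR-DATE"` the same call would return the date induced from the document name instead of the nearby TE.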
STEP6: Dates Normalization
• Goal: fill the VAL attribute of each detected TE
– T-ABSs: regular expressions considering their surface form (“1990s” → “199”)
– T-RELs: rewriting rules considering
the anchor (e.g. “2002”)
the operator (“OP”) to be applied (e.g. “+”)
the quantity (“QUANT”) to be added/subtracted (e.g. “3”)
Example: “tre anni dopo”: “2002” “+” “3” → “2005”
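Both normalization paths can be illustrated with a few lines of Python. This is a minimal sketch, assuming year-level granularity only: the function names are hypothetical, and the real rewriting rules cover many more surface forms and units.

```python
import re


def normalize_abs(text):
    """Return a VAL for a few absolute surface forms, or None if unknown."""
    m = re.fullmatch(r"(\d{3})0s", text)   # decade: keep the first three digits
    if m:
        return m.group(1)
    if re.fullmatch(r"\d{4}", text):       # plain four-digit year
        return text
    return None


def normalize_rel(anchor, op, quant):
    """Apply the operator and quantity to the year of the anchor value."""
    year = int(anchor[:4])
    if op == "+":
        year += quant
    elif op == "-":
        year -= quant
    # op "=" leaves the anchor year unchanged
    return f"{year:04d}"


print(normalize_abs("1990s"))          # decade form from the slide
print(normalize_rel("2002", "+", 3))   # "tre anni dopo" anchored to 2002
```

Only the year component of the anchor is used here, so an anchor such as “2002-07-31” yields the same result as “2002”.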
ITA-Chronos at EVALITA 2007
• Results over the EVALITA-07 test set (27’15’’ computation time, ~50 words/sec)
Task         Value   Precision   Recall   F-Measure
Rec.         85.7    95.7        89.8     92.6
Rec.+Norm.   61.9    68.5        66.3     67.4
• Higher scores on MOD and SET attributes
– Activated by the presence of triggers that are easy to identify
• Lower scores on ANCHOR_VAL and ANCHOR_DIR
– Require the analysis of a larger context, e.g. including verb tense
Web Demo
http://www.qallme.itc.it/server/chronos/italian