NYU: Description of the Proteus/PET System as Used for MUC-7 ST Roman Yangarber & Ralph Grishman Presented by Jinying Chen 10/04/2002

Page 1: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Roman Yangarber & Ralph Grishman

Presented by Jinying Chen

10/04/2002

Page 2: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Outline

• Introduction

• Proteus IE System

• PET User Interface

• Performance on the Launch Scenario

Page 3: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Introduction

• Problem: portability and customization of IE engines at the scenario level

• To address this problem

– NYU built a set of tools that allow the user to adapt the system to new scenarios rapidly through example-based learning

– The present system operates on two tiers: Proteus and PET

Page 4: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Introduction (Cont.)

• Proteus

– Core extraction engine, an enhanced version of the one employed at MUC-6

• PET

– GUI front end through which the user interacts with Proteus

– The user provides the system with examples of events in text and examples of the associated database entries to be created

Page 5: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Proteus IE System

• Modular design

– Control is encapsulated in immutable, domain-independent core components

– Domain-specific information resides in the knowledge bases

Page 6: NYU: Description of the Proteus/PET System as Used for MUC-7 ST
Page 7: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Proteus IE System (Cont.)

• Lexical Analysis

– Assigns each token a reading or a list of alternative readings by consulting a set of on-line dictionaries

• Name Recognition

– Identifies proper names in the text by using local contextual cues

Page 8: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Proteus IE System (Cont.)

• Partial Syntax

– Finds small syntactic units, such as basic NPs and VPs

– Marks each phrase with semantic information, e.g. the semantic class of the head of the phrase

• Scenario Patterns

– Find higher-level syntactic constructions using local semantic information: apposition, prepositional phrase attachment, limited conjunctions, and clausal constructions

Page 9: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Proteus IE System (Cont.)

• Note:

– The above three modules are pattern-matching phases: they operate by deterministic, bottom-up partial parsing, or pattern matching

– The output is a sequence of logical forms (LFs) corresponding to the entities, relationships, and events encountered in the analysis

Page 10: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Figure 2: LF for the NP: “a satellite built by Loral Corp. of New York for Intelsat”

Page 11: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Proteus IE System (Cont.)

• Reference Resolution (RefRes)

– Links anaphoric pronouns to their antecedents and merges other co-referring expressions

• Discourse Analysis

– Uses higher-level inference rules to build more complex event structures

– E.g. a rule that merges a Mission entity with a corresponding Launch event

• Output Generation

Page 12: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

PET User Interface

• A disciplined method for customizing the knowledge bases, and the pattern base in particular

• Organization of Patterns

– The pattern base is organized in layers

– Proteus treats the patterns at the different levels differently

– It acquires the most specific patterns directly from the user, on a per-scenario basis

Page 13: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Figure: layers of the pattern base, from most specific to most general

– Clausal patterns that capture events (scenario-specific)

– Patterns that find relationships among entities, such as between persons and organizations

– Patterns that perform partial syntactic analysis

– The most general patterns, which capture the most basic constructs, such as proper names, temporal expressions, etc.

Figure labels: Domain-dependent; Domain-independent; Core part of System; user; Pattern Lib

Page 14: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

PET User Interface (Cont.)

• Pattern Acquisition

– Enter an example

– Choose an event template

– Apply existing patterns (step 3)

– Tune pattern elements (step 4)

– Fill event slots (step 5)

– Build pattern

– Syntactic generalization

Page 15: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Step 3

Step 4

Step 5

Page 16: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Performance on the Launch Scenario

• Scenario Patterns

– Basically two types: launch events and mission events

– In cases where there is no direct connection between these two events, the post-processing inference rules attempted to tie the mission to a launch event

• Inference Rules

– Involve many-to-many relations (e.g. multiple payloads correspond to a single event)

– The inference rule set was extended with heuristics, e.g. to find the date and site

Page 17: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Conclusion:

• Example-based pattern acquisition is appropriate for ST-level tasks, especially when training data is quite limited

• Pattern editing tools are useful and effective

Page 18: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

NYU: Description of the MENE Named Entity System as Used in MUC-7

Andrew Borthwick, John Sterling, et al.

Presented by Jinying Chen

10/04/2002

Page 19: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Outline

• Maximum Entropy

• MENE’s Feature Classes

• Feature Selection

• Decoding

• Results

• Conclusion

Page 20: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Maximum Entropy

• Problem Definition

The problem of named entity recognition can be reduced to the problem of assigning one of 4*n+1 tags to each token

– n: the number of name categories, such as company, product, etc. For MUC-7, n=7

– 4 states: x_start, x_continue, x_end, x_unique

– other: not part of a named entity
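The tag inventory described above can be enumerated in a few lines. This is a minimal sketch, not MENE's own code; the category names are an assumption (the seven MUC-7 entity types), since the slide does not list them.

```python
# Assumed category names: the seven MUC-7 named-entity types (not listed on the slide).
CATEGORIES = ["person", "organization", "location", "date", "time", "money", "percent"]
STATES = ["start", "continue", "end", "unique"]


def build_tag_set(categories):
    """Return the 4*n+1 tags: four states per category plus 'other'."""
    tags = [f"{category}_{state}" for category in categories for state in STATES]
    tags.append("other")
    return tags


TAGS = build_tag_set(CATEGORIES)
print(len(TAGS))  # 4*7 + 1 = 29
```

Under this scheme a two-token name would be tagged x_start, x_end, and a one-token name x_unique.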

Page 21: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Maximum Entropy (cont.)

• Maximum Entropy Solution

– Compute p(f | h), where f is the prediction among the 4*n+1 tags and h is the history

– The computation of p(f | h) depends on a set of binary-valued features, e.g.
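A binary-valued feature of this kind can be written as a function of the history h and the prediction f. The sketch below is illustrative only: the dictionary representation of the history and the function name are hypothetical, chosen to show the shape of such a feature.

```python
def g_capitalized_location_start(history, future):
    """Binary feature: 1 if the current token is capitalized and the
    predicted tag is location_start, otherwise 0."""
    token = history["current_token"]  # hypothetical history representation
    return 1 if token[:1].isupper() and future == "location_start" else 0


h = {"current_token": "York"}
print(g_capitalized_location_start(h, "location_start"))  # 1
print(g_capitalized_location_start(h, "other"))           # 0
```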

Page 22: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Maximum Entropy (cont.)

– Given a set of features and some training data, the maximum entropy estimation process produces a model:
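As a rough illustration, the sketch below uses the standard product form of a maximum entropy model, p(f | h) = prod_i alpha_i^g_i(h, f) / Z(h), where Z(h) normalizes over all tags. The tag subset, the two features, and the weights are made-up values for demonstration, not trained parameters from MENE.

```python
import math

TAGS = ["location_start", "person_unique", "other"]  # toy subset of the 4*n+1 tags


def g_cap_loc(h, f):
    return 1 if h["token"][:1].isupper() and f == "location_start" else 0


def g_lower_other(h, f):
    return 1 if h["token"][:1].islower() and f == "other" else 0


# (feature, weight alpha_i) pairs; the weights are invented for this example.
MODEL = [(g_cap_loc, 3.0), (g_lower_other, 5.0)]


def p(f, h):
    """p(f | h) = prod_i alpha_i ** g_i(h, f) / Z(h)."""
    def score(tag):
        return math.prod(alpha ** g(h, tag) for g, alpha in MODEL)

    z = sum(score(tag) for tag in TAGS)  # normalizer Z(h)
    return score(f) / z


h = {"token": "Paris"}
print({tag: round(p(tag, h), 3) for tag in TAGS})
# {'location_start': 0.6, 'person_unique': 0.2, 'other': 0.2}
```

A feature that does not fire contributes a factor of 1, so only the active features change the score of a tag.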

Page 23: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

MENE’s Feature Classes

• Binary Features

• Lexical Features

• Section Features

• Dictionary Features

• External Systems Features

Page 24: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Binary Features

• Features whose “history” can be considered to be either on or off for a given token.

• Examples:

– The token begins with a capital letter

– The token is a four-digit number
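The two example tests can be sketched as simple token predicates. The function and key names are hypothetical; only the two tests themselves come from the slide.

```python
import re


def begins_with_capital(token):
    return token[:1].isupper()


def is_four_digit_number(token):
    return re.fullmatch(r"\d{4}", token) is not None


def binary_properties(token):
    """Return the on/off properties of one token as a dict."""
    return {
        "init_cap": begins_with_capital(token),
        "four_digit_num": is_four_digit_number(token),
    }


print(binary_properties("Intelsat"))  # {'init_cap': True, 'four_digit_num': False}
print(binary_properties("1998"))      # {'init_cap': False, 'four_digit_num': True}
```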

Page 25: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Lexical Features

• Example:

Page 26: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Section Features

• Features that make predictions based on the current section of the article, such as “Date”, “Preamble”, and “Text”.

• They play a key role by establishing the background probability of the occurrence of the different futures (predictions).

Page 27: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Dictionary Features

• Make use of a broad array of dictionaries of useful single or multi-word terms such as first names, company names, and corporate suffixes.

• Require no manual editing
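One plausible way to turn a dictionary of single or multi-word terms into per-token evidence is to mark each token by its position inside a matched entry. The sketch below assumes a longest-match lookup and the same start/continue/end/unique states as the tag set; the toy dictionary and the matching strategy are assumptions, not a description of MENE's implementation.

```python
def dictionary_tags(tokens, entries):
    """Tag each token by its position inside a matched dictionary entry.

    entries: set of tuples of lowercase words. The longest match wins.
    """
    tags = ["other"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(e) for e in entries), default=0)
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in entries:
                match = n
                break
        if match == 1:
            tags[i] = "unique"
        elif match > 1:
            tags[i] = "start"
            tags[i + match - 1] = "end"
            for j in range(i + 1, i + match - 1):
                tags[j] = "continue"
        i += max(match, 1)
    return tags


LOCATIONS = {("new", "york"), ("paris",)}  # toy dictionary, not MENE's data
print(dictionary_tags(["Loral", "Corp.", "of", "New", "York"], LOCATIONS))
# ['other', 'other', 'other', 'start', 'end']
```

A dictionary feature could then fire on the pair (dictionary tag, predicted tag), leaving it to training to decide how much each dictionary should be trusted.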

Page 28: NYU: Description of the Proteus/PET System as Used for MUC-7 ST
Page 29: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

External Systems Features

• MENE incorporates the outputs of three NE taggers

– A significantly enhanced version of the traditional, hand-coded “Proteus” named-entity tagger

– Manitoba

– IsoQuest

Page 30: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

• Example:

Page 31: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Feature Selection

• Simple

• Select all features which fire at least 3 times in the training corpus
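The count cutoff amounts to a frequency filter. A minimal sketch, assuming each firing of a feature in the training corpus has been logged as one entry in a list (the feature names are hypothetical):

```python
from collections import Counter


def select_features(firings, min_count=3):
    """Keep the features that fire at least min_count times.

    firings: iterable of feature names, one entry per firing in the training corpus.
    """
    counts = Counter(firings)
    return {name for name, count in counts.items() if count >= min_count}


observed = ["init_cap", "init_cap", "init_cap", "four_digit_num", "four_digit_num"]
print(select_features(observed))  # {'init_cap'}
```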

Page 32: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Decoding

• Simple

– For each token, check all the active features for this token and compute p(f | h)

– Run a Viterbi search to find the highest-probability coherent path through the lattice of conditional probabilities
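The decoding step can be sketched as a Viterbi search in which incoherent transitions are simply disallowed. The coherence rules below are an assumption inferred from the four tag states (for example, x_start must be followed by x_continue or x_end of the same category); the probabilities are invented, and the input is assumed to be non-empty.

```python
import math

NEG_INF = float("-inf")


def state(tag):
    return tag.rpartition("_")[2]


def allowed(prev, cur):
    """True if tag `cur` may directly follow tag `prev` on a coherent path."""
    prev_open = state(prev) in ("start", "continue")  # a name is still open
    cur_inside = state(cur) in ("continue", "end")    # cur continues an open name
    if prev_open:
        return cur_inside and prev.rpartition("_")[0] == cur.rpartition("_")[0]
    return not cur_inside


def viterbi(probs, tags):
    """probs[t][tag] = p(tag | h_t). Returns the best coherent tag path."""
    def logp(t, tag):
        value = probs[t].get(tag, 0.0)
        return math.log(value) if value > 0 else NEG_INF

    # A path cannot begin inside a name.
    best = [{tag: logp(0, tag) if state(tag) not in ("continue", "end") else NEG_INF
             for tag in tags}]
    back = []
    for t in range(1, len(probs)):
        scores, pointers = {}, {}
        for cur in tags:
            score, prev = max((best[-1][p], p) for p in tags if allowed(p, cur))
            scores[cur] = score + logp(t, cur)
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # A path cannot finish with a name still open.
    finals = [tag for tag in tags if state(tag) not in ("start", "continue")]
    path = [max(finals, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["person_start", "person_continue", "person_end", "person_unique", "other"]
probs = [
    {"person_start": 0.4, "person_unique": 0.5, "other": 0.1},  # "John"
    {"person_end": 0.6, "person_unique": 0.3, "other": 0.1},    # "Smith"
    {"other": 0.9, "person_unique": 0.1},                       # "said"
]
print(viterbi(probs, tags))  # ['person_start', 'person_end', 'other']
```

Picking the most probable tag per token independently would give person_unique followed by person_end, which is not coherent; the search instead returns the best path that respects the transition rules.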

Page 33: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Results

• Training set: 350 aviation disaster articles (about 270,000 words)

• Test set:

– Dry run: within-domain corpus

– Formal run: out-of-domain corpus

Page 34: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Results (cont.)

Page 35: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Results (cont.)

Page 36: NYU: Description of the Proteus/PET System as Used for MUC-7 ST

Conclusion

• A new, still immature system. Performance can be improved by:

– Adding long-range reference-resolution features

– Exploring compound features

– Using more sophisticated methods of feature selection

• Highly portable

• An efficient method for combining NE systems