
Information extraction for Free Text


Page 1: Information extraction for Free Text

Plain Text Information Extraction (based on Machine Learning)
Chia-Hui Chang
Department of Computer Science & Information Engineering, National Central University
[email protected]
9/24/2002

Page 2: Information extraction for Free Text

Introduction

Plain Text Information Extraction
- The task of locating specific pieces of data in a natural language document
- The goal: obtain useful structured information from unstructured text
- Driven by DARPA's MUC program

The extraction rules are based on a syntactic analyzer and a semantic tagger.

Page 3: Information extraction for Free Text

Related Work

On-line documents:
- SRV, AAAI-1998 (D. Freitag)
- RAPIER, ACL-1997, AAAI-1999 (M. E. Califf)
- WHISK, ML-1999 (S. Soderland)

Free-text documents:
- PALKA, MUC-5, 1993
- AutoSlog, AAAI-1993 (E. Riloff)
- LIEP, IJCAI-1995 (S. Huffman)
- CRYSTAL, IJCAI-1995, KDD-1997 (S. Soderland)

Page 4: Information extraction for Free Text

SRV
Information Extraction from HTML: Application of a General Machine Learning Approach

Dayne Freitag

[email protected]

AAAI-98

Page 5: Information extraction for Free Text

Introduction

SRV
- A general-purpose relational learner
- A top-down relational algorithm for IE
- Relies on a set of token-oriented features

Extraction pattern:
- First-order logic extraction patterns with predicates based on attribute-value tests

Page 6: Information extraction for Free Text

Extraction as Text Classification
- Identify the boundaries of field instances
- Treat each fragment as a bag of words
- Find the relations from the surrounding context
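
As a concrete, deliberately simplified illustration of this view, the Python sketch below enumerates candidate fragments and filters them with a stand-in `score` function; the scorer and the `max_len` limit are assumptions for illustration, not SRV's actual machinery.

```python
# A minimal sketch (not SRV itself) of extraction as classification over
# candidate fragments; `score` stands in for any bag-of-words classifier.
def candidate_fragments(tokens, max_len=5):
    """Yield (start, end) spans of up to max_len tokens."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            yield start, end

def extract(tokens, score, threshold=0.5):
    """Return the fragments the classifier rates above the threshold."""
    hits = []
    for start, end in candidate_fragments(tokens):
        fragment = tokens[start:end]
        if score(fragment) > threshold:
            hits.append(" ".join(fragment))
    return hits

# Toy usage with a trivial scorer that likes all-capitalized fragments:
toy_score = lambda frag: 1.0 if all(t[:1].isupper() for t in frag) else 0.0
print(extract("Dayne Freitag works at CMU".split(), toy_score))
```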

Page 7: Information extraction for Free Text

Relational Learning

Inductive Logic Programming (ILP)
- Input: class-labeled instances
- Output: classifier for unlabeled instances

Typical covering algorithm:
- Attribute-value tests are added greedily to a rule
- The number of positive examples covered is heuristically maximized while the number of negative examples covered is heuristically minimized
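
The covering loop itself can be sketched as follows; the gain heuristic here (positives kept minus negatives kept) is a simplification of what real ILP learners such as FOIL use, and the helper names are hypothetical.

```python
# A toy sketch of the greedy covering step described above.
def learn_rule(positives, negatives, candidate_tests):
    """Greedily conjoin tests, keeping positives and shedding negatives."""
    rule, pos, neg = [], list(positives), list(negatives)
    while neg and pos:
        # Heuristic: prefer the test keeping the most positives and the
        # fewest negatives among the examples still covered.
        best = max(candidate_tests,
                   key=lambda t: sum(map(t, pos)) - sum(map(t, neg)))
        new_neg = [x for x in neg if best(x)]
        if len(new_neg) == len(neg):
            break  # no remaining test excludes another negative
        rule.append(best)
        pos = [x for x in pos if best(x)]
        neg = new_neg
    return rule
```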

Page 8: Information extraction for Free Text

Simple Features

Features on individual tokens:
- Length (e.g. single letter or multiple letters)
- Character type (e.g. numeric or alphabetic)
- Orthography (e.g. capitalized)
- Part of speech (e.g. verb)
- Lexical meaning (e.g. geographical_place)

Page 9: Information extraction for Free Text

Individual Predicates

Individual predicates:
- Length(=3): accepts only fragments containing three tokens
- Some(?A [] capitalizedp true): the fragment contains some token that is capitalized
- Every(numericp false): every token in the fragment is non-numeric
- Position(?A fromfirst <2): the token bound to ?A is either first or second in the fragment
- Relpos(?A ?B =1): the token bound to ?A immediately precedes the token bound to ?B
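
A hedged rendering of these predicate semantics in Python, assuming a fragment is a plain list of token strings (SRV's tokens actually carry richer feature structures):

```python
# Token-level feature tests used by the predicates below.
capitalizedp = lambda tok: tok[:1].isupper()
numericp = lambda tok: tok.isdigit()

def length_eq(fragment, n):                  # Length(=n)
    return len(fragment) == n

def some(fragment, feature, value):          # Some(?A [] feature value)
    return any(feature(t) == value for t in fragment)

def every(fragment, feature, value):         # Every(feature value)
    return all(feature(t) == value for t in fragment)

def relpos(fragment, i, j, d):               # Relpos(?A ?B =d)
    return j - i == d

frag = ["National", "Central", "University"]
assert length_eq(frag, 3)
assert some(frag, capitalizedp, True)
assert every(frag, numericp, False)
assert relpos(frag, 0, 1, 1)   # token 0 immediately precedes token 1
```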

Page 10: Information extraction for Free Text

Relational Features

Relational feature types:
- Adjacency (next_token)
- Linguistic syntax (subject_verb)

Page 11: Information extraction for Free Text

Example

Page 12: Information extraction for Free Text

Search

Adding predicates greedily, attempting to cover as many positive and as few negative examples as possible.

At every step in rule construction, all documents in the training set are scanned and every text fragment of appropriate size counted.

Every legal predicate is assessed in terms of the number of positive and negative examples it covers.

A position-predicate is not legal unless a some-predicate is already part of the rule.

Page 13: Information extraction for Free Text

Relational Paths

Relational features are used only in the Path argument to the some-predicate.

Some(?A [prev_token prev_token] capitalizedp true): the fragment contains some token preceded by a capitalized token two tokens back.
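
A small sketch of path traversal, assuming a hypothetical token graph in which each token links to its neighbors by relation name (not SRV's internal representation):

```python
def follow_path(token, path):
    """Walk a chain of relational features; return None if the path falls off."""
    for relation in path:
        token = token.get(relation) if token else None
    return token

# Toy token graph linked by prev_token:
t0 = {"text": "Dr.", "prev_token": None}
t1 = {"text": "Alice", "prev_token": t0}
t2 = {"text": "spoke", "prev_token": t1}

# Two prev_token hops back from "spoke" reach "Dr.":
print(follow_path(t2, ["prev_token", "prev_token"])["text"])  # Dr.
```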

Page 14: Information extraction for Free Text

Validation

Training phase:
- 2/3 of the training data for learning, 1/3 for validation

Testing:
- Rule confidence is estimated with Bayesian m-estimates
- All rules matching a given fragment are used to assign a confidence score
- Combined confidence: C = 1 − ∏ᵢ (1 − cᵢ), where the cᵢ are the confidences of the matching rules
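
A one-line check of this (reconstructed) noisy-or combination, assuming per-rule confidences in [0, 1]:

```python
import math

def combined_confidence(confidences):
    """Noisy-or combination: C = 1 - prod(1 - c_i) over matching rules."""
    return 1.0 - math.prod(1.0 - c for c in confidences)

print(combined_confidence([0.6, 0.5]))  # 0.8
```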

Page 15: Information extraction for Free Text

Adapting SRV for HTML

Page 16: Information extraction for Free Text

Experiments

Data source:
- Four university computer science departments: Cornell, U. of Texas, U. of Washington, U. of Wisconsin

Data set:
- Course pages (105): title, number, instructor
- Project pages (96): title, member

Two experiments:
- Random: 5-fold cross-validation
- LOUO (leave one university out): 4-fold experiments

Page 17: Information extraction for Free Text

OPD

Coverage: each rule has its own confidence.

Page 18: Information extraction for Free Text

MPD

Page 19: Information extraction for Free Text

Baseline Strategies (evaluated under both OPD and MPD):
- A rote learner that simply memorizes field instances
- A random guesser

Page 20: Information extraction for Free Text

Conclusions

- Increased modularity and flexibility: domain-specific information is kept separate from the underlying learning algorithm
- Top-down induction: from general to specific
- Accuracy-coverage trade-off: a confidence score is associated with each prediction
- Critique: single-slot extraction rules

Page 21: Information extraction for Free Text

RAPIER
Relational Learning of Pattern-Match Rules for Information Extraction

M.E. Califf and R.J. Mooney

ACL-97, AAAI-1999

Page 22: Information extraction for Free Text

Rule Representation

Single-slot extraction patterns using:
- Syntactic information (part-of-speech tagger)
- Semantic class information (WordNet)

Page 23: Information extraction for Free Text

The Learning Algorithm

A specific-to-general search. Initial rules:
- The pre-filler pattern contains an item for each word
- The filler pattern has one item for each word in the filler
- The post-filler pattern has one item for each word

Compress the rules for each slot:
- Generate the least general generalization (LGG) of each pair of rules
- When the LGG of two constraints is a disjunction, create two alternatives: (1) the disjunction, (2) removal of the constraints
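
A toy sketch of that LGG step for word constraints, modeled here as sets of allowed words (an assumption; RAPIER generalizes POS-tag and semantic-class constraints analogously):

```python
def lgg_alternatives(a, b):
    """LGG of two word constraints. Equal constraints stay; differing ones
    yield the two alternatives named above: their disjunction, or dropping
    the constraint entirely (represented as None)."""
    if a == b:
        return [a]
    return [a | b, None]

print(lgg_alternatives({"atlanta"}, {"kansas", "city"}))
# [{'atlanta', 'kansas', 'city'}, None]
```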

Page 24: Information extraction for Free Text

Example
- Located in Atlanta, Georgia.
- Offices in Kansas City, Missouri.


Page 25: Information extraction for Free Text

Example:

Assume there is a semantic class for states, but not one for cities.

- Located in Atlanta, Georgia.
- Offices in Kansas City, Missouri.

Page 26: Information extraction for Free Text
Page 27: Information extraction for Free Text

Experimental Evaluation

- 300 computer-related job postings
- 17 slots, including employer, location, salary, job requirements, language, and platform

Page 28: Information extraction for Free Text

Experimental Evaluation

- 485 seminar announcements
- 4 slots: speaker, location, start time, end time

Page 29: Information extraction for Free Text

WHISK: Learning Information Extraction Rules for Semi-structured and Free Text

S. Soderland

University of Washington

Machine Learning journal, 1999

Page 30: Information extraction for Free Text

Semi-structured Text

Page 31: Information extraction for Free Text

Free Text

(Figure: a free-text sentence annotated with person name, position, and verb stem.)

Page 32: Information extraction for Free Text

WHISK Rule Representation

For Semi-structured IE

Page 33: Information extraction for Free Text

WHISK Rule Representation For Free Text IE

(Figure: a free-text rule annotated with person name, position, and verb stem.)

Skip only within the same syntactic field.

Page 34: Information extraction for Free Text

Example – Tagged by Users

Page 35: Information extraction for Free Text

The WHISK Algorithm

Page 36: Information extraction for Free Text

Creating a Rule from a Seed Instance

Top-down rule induction:
- Start from an empty rule
- Add terms within the extraction boundary (Base_1)
- Add terms just outside the extraction (Base_2)
- Repeat until the seed is covered
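
An illustrative sketch of the two anchoring choices, using Python regexes rather than WHISK's own pattern syntax (the patterns below are hypothetical stand-ins for Base_1 and Base_2 on one seed):

```python
import re

# Base_1 anchors on terms inside the extraction boundaries;
# Base_2 anchors on terms just outside them.
base_1 = re.compile(r"(Atlanta), (Georgia)")   # terms inside the slots
base_2 = re.compile(r"in (\S+), (\w+)\.")      # context around the slots

seed = "Located in Atlanta, Georgia."
for rule in (base_1, base_2):
    print(rule.search(seed).groups())   # ('Atlanta', 'Georgia') both times
```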

Page 37: Information extraction for Free Text

Example

Page 38: Information extraction for Free Text
Page 39: Information extraction for Free Text
Page 40: Information extraction for Free Text


Page 41: Information extraction for Free Text

AutoSlog: Automatically Constructing a Dictionary for Information Extraction Tasks

Ellen Riloff
Dept. of Computer Science, University of Massachusetts
AAAI-93

Page 42: Information extraction for Free Text

AutoSlog

Purpose: automatically construct a domain-specific dictionary for IE.

Extraction patterns (concept nodes):
- Conceptual anchor: a trigger word
- Enabling conditions: constraints

Page 43: Information extraction for Free Text

Concept Node Example

Physical target slot of a bombing template

Page 44: Information extraction for Free Text

Construction of Concept Nodes
1. Given a target piece of information,
2. AutoSlog finds the first sentence in the text that contains the string.
3. The sentence is handed to CIRCUS, which generates a conceptual analysis of the sentence.
4. The first clause in the sentence is used.
5. A set of heuristics is applied to suggest a good conceptual anchor point for a concept node.
6. If none of the heuristics is satisfied, AutoSlog searches for the next sentence and goes to step 3.
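
A toy rendition of step 5, with a hypothetical clause representation standing in for CIRCUS's much richer analysis; the heuristic table is a small invented sample in the spirit of AutoSlog's patterns:

```python
# Each heuristic maps a clause type to a concept-node pattern template.
HEURISTICS = {
    "passive_verb": "<target> was {verb}",     # e.g. "<victim> was kidnapped"
    "active_verb":  "{verb} <target>",         # e.g. "bombed <target>"
    "noun_prep":    "{noun} against <target>", # e.g. "attack against <target>"
}

def propose_concept_node(clause, target_slot):
    """Return a concept node if some heuristic matches the clause type."""
    template = HEURISTICS.get(clause["type"])
    if template is None:
        return None  # step 6: move on to the next sentence
    trigger = clause.get("verb") or clause.get("noun")
    return {"trigger": trigger,
            "slot": target_slot,
            "pattern": template.format(**clause)}

clause = {"type": "passive_verb", "verb": "kidnapped"}
print(propose_concept_node(clause, "victim"))
# {'trigger': 'kidnapped', 'slot': 'victim', 'pattern': '<target> was kidnapped'}
```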

Page 45: Information extraction for Free Text

Conceptual Anchor Point Heuristics

Page 46: Information extraction for Free Text

Background Knowledge

Concept node construction:
- Slot: the slot of the answer key; hard and soft constraints
- Type: uses template types such as bombing, kidnapping
- Enabling condition: heuristic pattern

Domain specification:
- The type of a template
- The constraints for each template slot

Page 47: Information extraction for Free Text

Another good concept node definition: the perpetrator slot from a perpetrator template.

Page 48: Information extraction for Free Text

A bad concept node definition: the victim slot from a kidnapping template.

Page 49: Information extraction for Free Text

Empirical Results

Input:
- An annotated corpus of texts in which the targeted information is marked and annotated with semantic tags denoting the type of information (e.g., victim) and the type of event (e.g., kidnapping)
- 1500 texts whose 1258 answer keys contain 4780 string fillers

Output:
- 1237 concept node definitions
- Human intervention: 5 user-hours to sift through all generated concept nodes; 450 definitions are kept

Performance:

Page 50: Information extraction for Free Text

Conclusion

- In 5 person-hours, AutoSlog creates a dictionary that achieves 98% of the performance of a hand-crafted dictionary
- Each concept node is a single-slot extraction pattern
- Reasons for bad definitions:
  - A sentence contains the targeted string but does not describe the event
  - A heuristic proposes the wrong conceptual anchor point
  - CIRCUS incorrectly analyzes the sentence

Page 51: Information extraction for Free Text

CRYSTAL: Inducing a Conceptual Dictionary

S. Soderland, D. Fisher, J. Aseltine, W. Lehnert

University of Massachusetts

IJCAI’95

Page 52: Information extraction for Free Text

Concept Nodes (CN)

A CN definition specifies:
- CN-type
- Subtype
- Extracted syntactic constituents
- Linguistic patterns
- Constraints on syntactic constituents

Page 53: Information extraction for Free Text

The CRYSTAL Induction Tool

Creating initial CN definitions:
- One definition for each instance

Inducing generalized CN definitions by relaxing the constraints of highly similar definitions:
- Word constraints: intersect the strings of words
- Class constraints: move up the semantic hierarchy

Page 54: Information extraction for Free Text
Page 55: Information extraction for Free Text

Inducing Generalized CN Definitions
1. Start from a CN definition, D.
2. Assume we have found a second definition D' that is similar to D:
   a) Create a new definition U that unifies D and D'.
   b) Delete from the dictionary all definitions covered by U, e.g. D and D'.
   c) Test whether U extracts only marked information:
      - If yes, go to step 2 with D = U.
      - If no, start over with another definition as D.
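
The same loop, compressed into a Python sketch with similarity search, unification, and error testing injected as callables (all hypothetical; CRYSTAL's actual control flow has more bookkeeping):

```python
def induce(dictionary, find_similar, unify, error_rate, tolerance=0.0):
    """Bottom-up generalization: merge similar definitions while the merged
    definition's error rate stays within the tolerance."""
    work = list(dictionary)
    while work:
        d = work.pop()
        if d not in dictionary:
            continue                        # already subsumed by a merge
        d2 = find_similar(d, dictionary)
        if d2 is None:
            continue
        u = unify(d, d2)                    # relax constraints to cover both
        if u != d and error_rate(u) <= tolerance:
            dictionary -= {d, d2}           # drop definitions U covers
            dictionary.add(u)
            work.append(u)                  # keep generalizing from U
    return dictionary
```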

Page 56: Information extraction for Free Text
Page 57: Information extraction for Free Text

Implementation Issues

Finding similar definitions:
- Index CN definitions by verbs and by extraction buffers
- Similarity metric: intersecting classes or intersecting strings of words

Testing the error rate of a generalized definition:
- A database of instances segmented by the sentence analyzer is constructed
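
The verb-indexing idea in miniature, with a hypothetical CN representation: find_similar then only needs to compare definitions that share a verb.

```python
from collections import defaultdict

def index_by_verb(definitions):
    """Bucket CN definitions by their trigger verb."""
    index = defaultdict(list)
    for d in definitions:
        index[d["verb"]].append(d)
    return index

defs = [{"verb": "kidnapped", "slot": "victim"},
        {"verb": "kidnapped", "slot": "perpetrator"},
        {"verb": "bombed",    "slot": "target"}]
print([d["slot"] for d in index_by_verb(defs)["kidnapped"]])
# ['victim', 'perpetrator']
```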

Page 58: Information extraction for Free Text

Experimental Results
- 385 annotated hospital discharge reports
- 14,719 training instances
- The error-tolerance parameter is used to trade precision off against recall
- Output CN definitions: 194 with coverage ≥ 10, 527 with 2 < coverage < 10

Page 59: Information extraction for Free Text

Comparison

Bottom-up (from specific to general):
- CRYSTAL [Soderland, 1996]
- RAPIER [Califf & Mooney, 1997]

Top-down (from general to specific):
- SRV [Freitag, 1998]
- WHISK [Soderland, 1999]

Page 60: Information extraction for Free Text

References

I. Muslea. Extraction Patterns for Information Extraction Tasks: A Survey. AAAI-99 Workshop on Machine Learning for Information Extraction, 1999.

E. Riloff. Automatically Constructing a Dictionary for Information Extraction Tasks. AAAI-93, pp. 811-816, 1993.

S. Soderland, D. Fisher, J. Aseltine, and W. Lehnert. CRYSTAL: Inducing a Conceptual Dictionary. IJCAI-95, 1995.

D. Freitag. Information Extraction from HTML: Application of a General Machine Learning Approach. AAAI-98, 1998.

M. E. Califf and R. J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. AAAI-99, Orlando, FL, pp. 328-334, July 1999.

S. Soderland. Learning Information Extraction Rules for Semi-structured and Free Text. Machine Learning, 1999.