A Metric-based Framework for Automatic Taxonomy Induction
Hui Yang and Jamie Callan
Language Technologies Institute, Carnegie Mellon University
ACL2009, Singapore
ROADMAP
Introduction
Related Work
Metric-Based Taxonomy Induction Framework
The Features
Experimental Results
Conclusions
INTRODUCTION
Semantic taxonomies, such as WordNet, play an important role in solving knowledge-rich problems
Limitations of manually created taxonomies:
Rarely complete
Difficult to include new terms from emerging or changing domains
Time-consuming to create, which may make them infeasible for specialized domains and personalized tasks
INTRODUCTION
Automatic taxonomy induction is a solution to:
Augment existing resources
Quickly produce new taxonomies for specialized domains and personalized tasks
Subtasks in automatic taxonomy induction:
Term extraction
Relation formation
This paper focuses on relation formation
RELATED WORK
Pattern-based approaches:
Define lexical-syntactic patterns for relations, and use these patterns to discover instances
Have been applied to extract is-a, part-of, sibling, synonym, causal, etc. relations
Strength: highly accurate
Weakness: sparse coverage of patterns
Clustering-based approaches:
Hierarchically cluster terms based on similarities of their meanings, usually represented by a feature vector
Have only been applied to extract is-a and sibling relations
Strength: allows discovery of relations that do not explicitly appear in text; higher recall
Weaknesses: generally fail to produce coherent clusters for small corpora [Pantel and Pennacchiotti 2006]; hard to label non-leaf nodes
A UNIFIED SOLUTION
Combine strengths of both approaches in a unified framework:
Flexibly incorporate heterogeneous features
Use lexical-syntactic patterns as one type of feature in a clustering framework
Metric-based Taxonomy Induction
THE FRAMEWORK
A novel framework, which:
Incrementally clusters terms
Transforms taxonomy induction into a multi-criteria optimization using heterogeneous features
Optimization based on two criteria:
Minimization of taxonomy structures (the Minimum Evolution assumption)
Modeling of term abstractness (the Abstractness assumption)
LET’S BEGIN WITH SOME IMPORTANT DEFINITIONS
A taxonomy is a data model consisting of:
A concept set
A relationship set
A domain
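As a rough sketch, the data model above can be written down in Python; the class and field names here are illustrative, not from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class Taxonomy:
    """Illustrative data model: a taxonomy as a domain label,
    a concept set, and a relationship set of (parent, child) pairs."""
    domain: str
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)

    def add_relation(self, parent, child):
        # Adding a relation also registers both concepts.
        self.concepts.update({parent, child})
        self.relations.add((parent, child))

t = Taxonomy(domain="game equipment")
t.add_relation("game equipment", "ball")
t.add_relation("ball", "basketball")
```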
MORE DEFINITIONS
A full taxonomy (figure: a tree rooted at "game equipment", with children "ball" and "table" and leaves for the specific equipment terms):
AssignedTermSet = {game equipment, ball, table, basketball, volleyball, soccer, table-tennis table, snooker table}
UnassignedTermSet = {}
MORE DEFINITIONS
A partial taxonomy (figure: the same tree with only some terms attached):
AssignedTermSet = {game equipment, ball, table, basketball, volleyball}
UnassignedTermSet = {soccer, table-tennis table, snooker table}
MORE DEFINITIONS
Ontology metric: a distance function d(cx, cy) between two concepts, defined over the taxonomy structure
(figure: the example tree with edge distances 1.5, 2, 1, and 1, and example pairwise distances d(·,·) = 2, 1, and 4.5 obtained by summing edge distances along the connecting path)
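One plausible reading of the figure is that d(·,·) sums edge distances along the path between two concepts; a sketch under that assumption, with hypothetical edge weights mirroring the slide:

```python
# Illustrative edge weights for the example tree; the values mirror the
# slide (root edges 1.5 and 2, leaf edges 1), but the exact pairings
# are assumptions.
parent = {
    "ball": ("game equipment", 1.5),
    "table": ("game equipment", 2.0),
    "basketball": ("ball", 1.0),
    "snooker table": ("table", 1.0),
}

def path_to_root(c):
    """(node, cumulative distance) pairs from c up to the root."""
    out, d = [(c, 0.0)], 0.0
    while c in parent:
        c, w = parent[c]
        d += w
        out.append((c, d))
    return out

def distance(x, y):
    """Ontology metric as the sum of edge weights on the x-y path."""
    dx = dict(path_to_root(x))
    for node, dy in path_to_root(y):
        if node in dx:  # first shared ancestor is the lowest one
            return dx[node] + dy
    raise ValueError("no common ancestor")
```

For example, distance("basketball", "snooker table") walks up through "ball" and "game equipment" and down to "table", giving 1 + 1.5 + 2 + 1 = 5.5 under these assumed weights.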
ASSUMPTIONS
Minimum Evolution assumption: the optimal ontology is the one that introduces the least information change!
ILLUSTRATION
Minimum Evolution assumption (animation: terms such as "ball" and "table" are attached one by one under "game equipment", and the structure that introduces the least information change is kept)
ASSUMPTIONS
Abstractness assumption: each abstraction level has its own information function
(figure: the "game equipment" / "ball" / "table" example tree, with one level per degree of abstractness)
MULTI-CRITERIA OPTIMIZATION
The Minimum Evolution objective function and the Abstractness objective function are combined through a scalarization variable
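The scalarized combination described above might be written as follows; the symbols $O_{\mathrm{ME}}$, $O_{\mathrm{abs}}$, and $\lambda$ are illustrative, not the paper's exact notation:

```latex
% Hedged sketch of the scalarized multi-criteria objective.
\hat{T} \;=\; \arg\min_{T}\;
  \bigl[\, \lambda \, O_{\mathrm{ME}}(T)
  \;+\; (1-\lambda) \, O_{\mathrm{abs}}(T) \,\bigr],
\qquad \lambda \in [0, 1]
```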
ESTIMATING ONTOLOGY METRIC
Assume ontology metric is a linear interpolation of some underlying feature functions
Ridge regression is used to estimate and predict the ontology metric
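A minimal sketch of this estimation step, assuming synthetic feature values and distances (the paper's actual feature functions and data differ):

```python
import numpy as np

# Synthetic setup: 50 term pairs, each described by 4 feature functions,
# with distances generated from a known linear combination plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                # feature values per term pair
w_true = np.array([0.5, -0.2, 0.0, 1.0])    # assumed "true" weights
y = X @ w_true + 0.01 * rng.normal(size=50)  # observed distances

# Closed-form ridge regression: w = (X^T X + alpha I)^-1 X^T y
alpha = 0.1
w = np.linalg.solve(X.T @ X + alpha * np.eye(4), X.T @ y)

def predict_distance(features, weights=w):
    """Predicted ontology-metric distance for one term pair."""
    return float(features @ weights)
```

The ridge penalty `alpha` keeps the weight estimates stable when feature functions are correlated, which matters when heterogeneous features overlap in what they measure.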
THE FEATURES
Our framework allows a wide range of features to be used
Input for the feature functions: two terms
Output: a numeric score measuring the semantic distance between the two terms
We can use the following types of feature functions, but are not restricted to only these:
Contextual features
Term co-occurrence
Lexical-syntactic patterns
Syntactic dependency features
Word length difference
Definition overlap, etc.
EXPERIMENTAL RESULTS
Task: reconstruct taxonomies from WordNet and ODP (not the entire WordNet or ODP, but fragments)
Ground Truth: 50 hypernym taxonomies from WordNet; 50 hypernym taxonomies from ODP; 50 meronym taxonomies from WordNet.
Auxiliary Datasets: 1000 Google documents per term or per term pair; 100 Wikipedia documents per term.
Evaluation Metrics: F1-measure (averaged by Leave-One-Out Cross Validation).
DATASETS
(table of dataset statistics)
PERFORMANCE OF TAXONOMY INDUCTION
Compare our system (ME) with other state-of-the-art systems:
HE: 6 is-a patterns [Hearst 1992]
GI: 3 part-of patterns [Girju et al. 2003]
PR: a probabilistic framework [Snow et al. 2006]
ME: our metric-based framework
PERFORMANCE OF TAXONOMY INDUCTION
Our system (ME) consistently gives the best F1 for all three tasks.
Systems using heterogeneous features (ME and PR) achieve a significant absolute F1 gain (>30%)
FEATURES VS. RELATIONS
This is the first study of the impact of using different features on taxonomy induction for different relations
Co-occurrence and lexical-syntactic patterns are good for is-a, part-of, and sibling relations
Contextual and syntactic dependency features are only good for the sibling relation
FEATURES VS. ABSTRACTNESS
This is the first study of the impact of using different features on taxonomy induction for terms at different abstraction levels
Contextual, co-occurrence, lexical-syntactic patterns, and syntactic dependency features work well for concrete terms;
Only co-occurrence works well for abstract terms
CONCLUSIONS
This paper presents a novel metric-based taxonomy induction framework, which:
Combines strengths of pattern-based and clustering-based approaches
Achieves better F1 than 3 state-of-the-art systems
The first study on the impact of using different features on taxonomy induction for different types of relations and for terms at different abstraction levels
CONCLUSIONS
This work is a general framework, which:
Allows a wider range of features
Allows different metric functions at different abstraction levels
This work has the potential to learn more complex taxonomies than previous approaches
THANK YOU AND QUESTIONS
[email protected]@cs.cmu.edu
EXTRA SLIDES
FORMAL FORMULATION OF TAXONOMY INDUCTION
The task of taxonomy induction: the construction of a full ontology T given a set of concepts C and an initial partial ontology T0 (note that T0 may be empty)
Keep adding concepts from C into T0 until a full ontology is formed
GOAL OF TAXONOMY INDUCTION
Find the optimal full ontology such that the information change since T0 is least
Note that this follows from the Minimum Evolution assumption
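A hedged LaTeX reconstruction of this goal (the slide states it only in words, so the notation here is illustrative):

```latex
% T(C) denotes the set of full ontologies over concept set C;
% Info(.) is the information in a taxonomy.
\hat{T} \;=\; \arg\min_{T \in \mathcal{T}(C)}\;
  \bigl|\, \mathrm{Info}(T) - \mathrm{Info}(T_0) \,\bigr|
```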
GET TO THE GOAL
Since the optimal set of concepts is always C, concepts are added incrementally
Plug in the definition of information change, and transform into a minimization problem: the Minimum Evolution objective function
EXPLICITLY MODEL ABSTRACTNESS
Model abstractness for each level by a least-squares fit
Plug in the definition of the amount of information for an abstraction level: the Abstractness objective function
THE OPTIMIZATION ALGORITHM
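The algorithm itself is not reproduced on this slide; below is a greedy-insertion sketch in the spirit of the framework (attach, at each step, the term and position that increase the objective the least). The names `objective` and `candidate_positions` are illustrative stand-ins, not the paper's interface:

```python
def induce_taxonomy(partial, unassigned, candidate_positions, objective):
    """Greedy incremental insertion: an illustration, not the paper's
    exact algorithm. `partial` maps child -> parent."""
    taxonomy = dict(partial)
    remaining = set(unassigned)
    while remaining:
        best = None
        # Try every (term, attachment point) and keep the cheapest.
        for term in remaining:
            for parent in candidate_positions(taxonomy, term):
                trial = dict(taxonomy)
                trial[term] = parent
                cost = objective(trial)
                if best is None or cost < best[0]:
                    best = (cost, term, parent)
        _, term, parent = best
        taxonomy[term] = parent
        remaining.remove(term)
    return taxonomy

# Toy usage: the objective penalizes attachments that disagree with a
# small gold standard, so the greedy loop recovers it.
partial = {"ball": "game equipment", "table": "game equipment"}
gold = {"basketball": "ball", "snooker table": "table"}

def positions(taxonomy, term):
    return ["ball", "table"]

def obj(taxonomy):
    return sum(1 for c, p in taxonomy.items() if c in gold and gold[c] != p)

result = induce_taxonomy(partial, ["basketball", "snooker table"], positions, obj)
```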
MORE DEFINITIONS
Information in a taxonomy T
(figure: the example tree with edge distances 1.5, 2, 1, and 1, and pairwise distances d(·,·) = 2, 1, and 4.5)
MORE DEFINITIONS
Information in a level L
(figure: example pairwise distances d(·,·) = 2, 1, and 1)
EXAMPLES OF FEATURES
Contextual features:
Global context KL-divergence = KL-divergence(1000 Google documents for Cx, 1000 Google documents for Cy)
Local context KL-divergence = KL-divergence(left two and right two words for Cx, left two and right two words for Cy)
Term co-occurrence: point-wise mutual information (PMI), where the counts are the # of sentences containing the term(s), or the # of documents containing the term(s), or n as in "Results 1-10 of about n for …" in Google
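The KL-divergence and PMI features above can be sketched as follows; all counts and distributions here are made-up toy values, and real usage would smooth the distributions first:

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) over a shared vocabulary; assumes q[w] > 0 for all w in p."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def pmi(n_x, n_y, n_xy, total):
    """Point-wise mutual information from co-occurrence counts:
    log of P(x, y) / (P(x) P(y))."""
    return math.log((n_xy / total) / ((n_x / total) * (n_y / total)))

# Toy values: two identical context distributions, and sentence counts
# for a hypothetical term pair.
p = {"game": 0.5, "ball": 0.5}
kl_same = kl_divergence(p, p)           # identical distributions -> 0
score = pmi(n_x=200, n_y=50, n_xy=30, total=10_000)
```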
EXAMPLES OF FEATURES
Syntactic dependency features:
Minipar syntactic distance = average length of syntactic paths in syntactic parse trees for sentences containing the terms
Modifier overlap = # of overlaps between modifiers of the terms; e.g., red apple, red pear
Object overlap = # of overlaps between objects of the terms when the terms are subjects; e.g., A dog eats apple; A cat eats apple
Subject overlap = # of overlaps between subjects of the terms when the terms are objects; e.g., A dog eats apple; A dog eats pear
Verb overlap = # of overlaps between verbs of the terms when the terms are subjects/objects; e.g., A dog eats apple; A cat eats pear
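Each overlap feature above reduces to counting shared dependents of the two terms; a toy sketch with the dependents given directly rather than produced by a parser (the slides mention Minipar for that step):

```python
def overlap(deps_x, deps_y):
    """# of dependents (modifiers, subjects, objects, or verbs)
    shared by two terms."""
    return len(set(deps_x) & set(deps_y))

# Modifier overlap: "red apple" vs. "red pear" share the modifier "red".
modifier_overlap = overlap(["red"], ["red", "juicy"])

# Verb overlap: "A dog eats apple" vs. "A cat eats pear" share "eats".
verb_overlap = overlap(["eats"], ["eats"])
```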
EXAMPLES OF FEATURES
Lexical-Syntactic Patterns
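As an illustration of lexical-syntactic patterns, classic Hearst-style is-a patterns can be written as regular expressions; the three patterns below are examples only, not necessarily the exact patterns used in this work:

```python
import re

# Illustrative Hearst-style is-a patterns with named capture groups.
PATTERNS = [
    re.compile(r"(?P<hyper>\w[\w ]*?) such as (?P<hypo>\w[\w ]*)"),
    re.compile(r"(?P<hypo>\w[\w ]*?) and other (?P<hyper>\w[\w ]*)"),
    re.compile(r"(?P<hyper>\w[\w ]*?) including (?P<hypo>\w[\w ]*)"),
]

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs matched in one sentence."""
    pairs = []
    for pat in PATTERNS:
        for m in pat.finditer(sentence):
            pairs.append((m.group("hypo").strip(), m.group("hyper").strip()))
    return pairs

pairs = extract_isa("game equipment such as balls")
```

In practice such patterns are matched over a large corpus and their hit counts feed into the feature vector, alongside the co-occurrence and contextual features.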
EXAMPLES OF FEATURES
Miscellaneous features:
Definition overlap = # of non-stopword overlaps between the definitions of two terms
Word length difference
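The definition-overlap feature can be sketched as follows; the stopword list and the example definitions are illustrative:

```python
# A tiny illustrative stopword list; real systems use a standard one.
STOPWORDS = {"a", "an", "the", "of", "for", "used", "in", "is", "to"}

def definition_overlap(def_x, def_y):
    """# of non-stopword word overlaps between two definitions."""
    wx = {w for w in def_x.lower().split() if w not in STOPWORDS}
    wy = {w for w in def_y.lower().split() if w not in STOPWORDS}
    return len(wx & wy)

score = definition_overlap(
    "a round object used in games",
    "a spherical object used in ball games",
)  # shares the non-stopwords "object" and "games"
```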