
JHU WORKSHOP - 2003
July 30th, 2003

Semantic Annotation – Week 3

Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Bontcheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev

Fred Jelinek


Our Hypotheses

● A transformation of a corpus to replace words and phrases with coarse semantic categories will help overcome the data sparseness problem encountered in language modeling

● Semantic category information will also help improve machine translation

● An initially noun-centric approach will allow bootstrapping to other syntactic categories


An Example

● Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday

● Humans aboard space_vehicle dodge satellite timeref.
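A minimal sketch of this transformation (not from the original slides): a toy lexicon stands in for the LDOCE-derived category assignments, and known head words are replaced by their coarse class.

```python
# Toy illustration of the corpus transformation: head words known to
# the lexicon are replaced by coarse semantic categories, reducing
# data sparseness. The lexicon below is a hypothetical stand-in for
# the LDOCE-derived assignments; the real pipeline also chunks noun
# phrases and drops function words, yielding the compact form above.
lexicon = {
    "astronauts": "Humans",
    "endeavor": "space_vehicle",
    "friday": "timeref",
}

def transform(tokens):
    """Replace tokens found in the lexicon with their semantic class."""
    return [lexicon.get(tok.lower(), tok) for tok in tokens]

sentence = ("Astronauts aboard the space shuttle Endeavor were forced "
            "to dodge a derelict Air Force satellite Friday").split()
print(" ".join(transform(sentence)))
```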


Our Progress – Preparing the Data (Pre-Workshop)

● Identify a tag set

● Create a human-annotated corpus

● Create a doubly annotated corpus

● Process all data for named entity and noun phrase recognition using GATE Tools

● Develop algorithms for mapping target categories to WordNet synsets, to support the tag set assessment
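One plausible shape for such a mapping check, sketched with NLTK's WordNet interface; the seed synsets per category are illustrative assumptions, not the workshop's actual mapping.

```python
# Sketch: decide whether a noun falls under a coarse target category
# by checking its WordNet hypernym paths against seed synsets.
# The seed synsets below are illustrative assumptions, not the
# workshop's actual mapping. Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

SEEDS = {
    "H": {wn.synset("person.n.01")},       # Human
    "A": {wn.synset("animal.n.01")},       # Animal
    "P": {wn.synset("plant.n.02")},        # Plant (living organism)
    "T": {wn.synset("abstraction.n.06")},  # Abstract
}

def categories(noun):
    """Return the target categories whose seeds dominate some sense."""
    cats = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        ancestors = {s for path in sense.hypernym_paths() for s in path}
        for cat, seeds in SEEDS.items():
            if seeds & ancestors:
                cats.add(cat)
    return cats

print(categories("astronaut"))  # expected: {'H'}
```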


The Semantic Classes for Annotators

● A subset of the classes available in the electronic version of the Longman Dictionary of Contemporary English (LDOCE)

● Rationale:

The number of semantic classes is small

The classes are somewhat reliable, since a team of lexicographers used them to code:

noun senses, adjective preferences, verb preferences


Semantic Classes

[Diagram: the LDOCE class hierarchy used as target classes. Concrete (C) divides into Animate (Q) and Inanimate (I); Animate into Plant (P), Animal (A), and Human (H); Inanimate into Liquid (L), Gas (G), and Solid (S); Solid into Movable (N) and Non-movable (J); Abstract (T) stands apart, and the codes B, D, F, M also appear. Two further target classes were added on the basis of annotated evidence: PhysQuant (4) and Organic (5).]


More Categories

● U: Collective
● K: Male
● R: Female
● W: Not animate
● X: Not concrete or animal
● Z: Unmarked

We allowed annotators to choose “none of the above” (shown as “?” in the slides that follow)


Our Progress – Data Preparation

● Assess the annotation format, define uniform descriptions for irregular phenomena, and normalize them

● Determine the distribution of the tag set in the training corpus

● Analyze inter-annotator agreement

● Determine a reliable set of tags – T

● Parse all training data


Doubly Annotated Data

● Instances (headwords): 10,960

● 8,950 instances without question marks.

● 8,446 of those are marked the same.

● Inter-annotator agreement is 94% (83% including question marks)

Recall – these are non-named-entity noun phrases
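The agreement figures can be re-derived from the counts above; the 634 agreeing instances that involve a “?” are inferred from the 5.79% slice on the next slide.

```python
# Re-deriving the inter-annotator agreement figures from the counts.
total = 10_960    # doubly annotated instances (headwords)
no_q  = 8_950     # instances where neither annotation used "?"
same  = 8_446     # of those, marked identically

print(f"{same / no_q:.0%}")            # 94%, agreement excluding "?"
# Agreeing pairs that involve "?" add roughly 634 instances
# (5.79% of the total, per the distribution on the next slide):
print(f"{(same + 634) / total:.0%}")   # 83%, agreement including "?"
```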


Distribution of the doubly annotated data:

agree w/o ?: 77.06%
agree with ?: 5.79%
disagree with ?: 12.55%
disagree w/o ?: 4.60%


Agreement of doubly marked instances

agree w/o ?: 94%
disagree w/o ?: 6%


Inter-annotator agreement – for each category

[Bar chart: per-category inter-annotator agreement on a 0–1 scale, for the categories A, C, G, H, I, J, K, L, N, P, Q, R, S, 4, 5, T, U, W, X.]


Category distribution among agreed part

[Pie chart: distribution of the categories W, K, Q, G, I, P, C, L, X, R, 5, 4, A, U, N, J, S, H, T among the instances on which the annotators agreed; the largest slice accounts for 69%.]


A few statistics on the human-annotated data

● Total annotated: 262,230 instances, of which 48,175 with ?

● 214,055 with a category; of those:

Z: 0.5%
W and X: 0.5%
4, 5: 1.6%


Our Progress – Baselines

● Determine baselines for automatic tagging of noun phrases

● Baselines for tagging observed words in new contexts (new instances of known words)

● Baselines for tagging unobserved words:

Unseen words – not in the training material, but in the dictionary

Novel words – not in the training material nor in the dictionary/WordNet
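A minimal sketch of the seen/unseen/novel distinction, with placeholder word sets standing in for the training vocabulary and the LDOCE/WordNet dictionary:

```python
# Sketch of the seen / unseen / novel word distinction defined above.
# The example sets are placeholders for the training vocabulary and
# the LDOCE/WordNet dictionary.
def word_status(word, training_vocab, dictionary):
    if word in training_vocab:
        return "seen"    # observed in the training material
    if word in dictionary:
        return "unseen"  # in the dictionary, not in training
    return "novel"       # in neither resource

training_vocab = {"satellite", "shuttle"}
dictionary = {"satellite", "shuttle", "astronaut"}
for w in ("satellite", "astronaut", "teleportation"):
    print(w, "->", word_status(w, training_vocab, dictionary))
```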


Overlap of dictionary and head nouns (in the BNC)

● 85% of NP instances covered

● but only 33% of the NP vocabulary (words both in LDOCE and in WordNet) covered


Preparation of the test environment

● Selected the blind portion of the human-annotated data for later evaluation

● Divided the remaining corpus into training and held-out portions:

Random division of files

Unambiguous words for training – ambiguous for testing


Baselines using only (target) words

Error rate | Unseen words marked with | Method | Valid training instances | Blame
15.1% | the first class | MaxEntropy | count 3 | Klaus
12.6% | most frequent class | MaxEntropy | count 3 | Jerry
16% | most frequent class | VFI | all | Fabio
13% | most frequent class | NaiveBayes | all | Fabio
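Several of these rows mark unseen words with the most frequent class; a minimal sketch of that baseline follows, on toy data, with the backoff to the globally most frequent class as an assumed detail.

```python
# Sketch of the word-only most-frequent-class baseline: tag each test
# instance with the class most often seen for that word in training;
# the backoff to the globally most frequent class for unseen words is
# an assumed detail. Toy data, not workshop data.
from collections import Counter, defaultdict

def train(pairs):
    """pairs: iterable of (word, semantic_class) training instances."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, cls in pairs:
        per_word[word][cls] += 1
        overall[cls] += 1
    return per_word, overall.most_common(1)[0][0]

def tag(word, per_word, fallback):
    counts = per_word.get(word)
    return counts.most_common(1)[0][0] if counts else fallback

per_word, fallback = train([("water", "L"), ("water", "L"), ("dog", "A")])
print(tag("water", per_word, fallback))  # "L"
print(tag("idea", per_word, fallback))   # unseen -> global fallback "L"
```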


Baselines using only (target) words and preceding adjectives

Error rate | Unseen words marked with | Method | Valid training instances | Blame
13% | most frequent class | MaxEntropy | count 3 | Jerry
13.2% | most frequent class | MaxEntropy | all | Jerry
12.7% | most frequent class | MaxEntropy | count 3 | Jerry


Baselines using multiple knowledge sources

● Experiments in Sheffield

● Unambiguous tagger (assigns the only available semantic category)

● Bag-of-words tagger (IR-inspired): window size of 50 words, nouns and verbs only (see the sketch after this list)

● Frequency-based tagger (assigns the most frequent semantic category)
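A sketch of the IR-inspired bag-of-words tagger described above; the overlap scoring and the symmetric 50-token window are assumptions about details the slides leave open.

```python
# Sketch of an IR-inspired bag-of-words tagger: collect a bag of
# context words per semantic category from training instances, then
# tag a new instance by overlap between its context and each bag.
# Window: 50 tokens to each side (one reading of "window size 50").
# POS filtering to nouns and verbs is omitted for brevity.
from collections import Counter, defaultdict

WINDOW = 50

def context(tokens, i, window=WINDOW):
    """Tokens around position i, excluding the target itself."""
    lo, hi = max(0, i - window), i + window + 1
    return [t for j, t in enumerate(tokens[lo:hi], lo) if j != i]

def train(instances):
    """instances: iterable of (tokens, target_index, category)."""
    profiles = defaultdict(Counter)
    for tokens, i, cat in instances:
        profiles[cat].update(context(tokens, i))
    return profiles

def tag(tokens, i, profiles):
    """Pick the category whose profile overlaps the context most."""
    ctx = Counter(context(tokens, i))
    return max(profiles, key=lambda c: sum((ctx & profiles[c]).values()))
```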


Baselines using multiple knowledge sources (cont’d)

● Frequency-based tagger: 16–18% error rate

● Bag-of-words tagger: 17% error rate

● Combined architecture: 14.5–15% error rate


Bootstrapping to Unseen Words

● Problem: Automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data

● Basic idea: we use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words.


Bootstrapping: statistics

6,656 different unambiguous lemmas occur in the (visible) human-tagged corpus

...these account for 166,249 instances of data

...134,777 of those instances were considered correct by the annotators

Observation: unambiguous words can be used in the corpus in an “unforeseen” way


Bootstrapping baselines

Method | % correctly labelled instances
Most frequent semantic tag (i.e. Abstract) | 52%
1 previous word (Adj, N, or V), Naive Bayes, reliably tagged instances only | 45%
1 previous word (Adj, N, or V), Naive Bayes, all instances | 44.3%
1 previous + 1 following word (Adj, N, or V), Naive Bayes, reliably tagged instances only | 46.8%
1 previous + 1 following word (Adj, N, or V), Naive Bayes, all instances | 44.5%

● Test instances (instances of ambiguous words): 62,853
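A sketch of the Naive Bayes classifier over the previous/following word features used in these rows; add-one smoothing and the toy vocabulary size are assumed details.

```python
# Sketch of the bootstrapping classifier: Naive Bayes over two
# features (previous and following content word), trained on
# instances of unambiguous words. Add-one smoothing and the toy
# vocabulary size are assumed details.
import math
from collections import Counter, defaultdict

def train(instances):
    """instances: iterable of (prev_word, next_word, category)."""
    prior = Counter()
    feats = defaultdict(Counter)
    for prev, nxt, cat in instances:
        prior[cat] += 1
        feats[cat][("prev", prev)] += 1
        feats[cat][("next", nxt)] += 1
    return prior, feats

def classify(prev, nxt, prior, feats, vocab_size):
    def score(cat):
        n = sum(feats[cat].values())
        s = math.log(prior[cat])
        for f in (("prev", prev), ("next", nxt)):
            s += math.log((feats[cat][f] + 1) / (n + vocab_size))
        return s
    return max(prior, key=score)

prior, feats = train([("derelict", "orbited", "I"),
                      ("veteran", "walked", "H")])
print(classify("derelict", "orbited", prior, feats, vocab_size=1000))
```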


Metrics for Intrinsic Evaluation

● Need to take into account the hierarchical structure of the target semantic categories

● Two fuzzy measures based on:

dominance between categories

edge distance in the category tree/graph (a sketch follows this list)

● Results w.r.t. inter-annotator agreement are almost identical to exact match
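A sketch of the edge-distance variant; the parent table is an illustrative fragment of the category tree, and the 1/(1+distance) credit function is an assumption, not the workshop's exact measure.

```python
# Sketch of an edge-distance fuzzy agreement score: credit decays with
# the number of edges between the two categories in the category tree.
# The parent table is an illustrative fragment, not the full hierarchy.
PARENT = {"Q": "C", "I": "C", "H": "Q", "A": "Q", "P": "Q",
          "L": "I", "G": "I", "S": "I", "C": None, "T": None}

def path_to_root(cat):
    path = []
    while cat is not None:
        path.append(cat)
        cat = PARENT.get(cat)
    return path

def edge_distance(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    if not common:
        return len(pa) + len(pb)          # no shared ancestor
    join = min(common, key=pa.index)      # lowest common ancestor
    return pa.index(join) + pb.index(join)

def fuzzy_credit(a, b):
    return 1.0 / (1 + edge_distance(a, b))   # 1.0 on exact match

print(fuzzy_credit("H", "H"), fuzzy_credit("H", "A"), fuzzy_credit("H", "T"))
```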


What’s next

● Investigate respective contribution of (independent) features

● Incorporate syntactic information

● Refine some coarse categories

Using subject codes

Using genus terms

Re-mapping via WordNet


What’s next (cont’d)

● Reduce the number of features/values via external resources:

lexical vs. semantic models of the context

use selectional preferences

● Concentrate on complex cases (e.g. unseen words)

● Prepare test data for extrinsic evaluation (MT)