
Using Machine Learning to Annotate Data for NLP Tasks Semi-Automatically

Gerhard B. van Huyssteen, Martin J. Puttkammer, Suléne Pilon and Hendrik J. Groenewald

Centre for Text Technology (CTexT)
Research Unit: Languages and Literature in the South African Context
North-West University, Potchefstroom Campus (PUK), South Africa
{Gerhard.VanHuyssteen; Martin.Puttkammer; Sulene.Pilon; Handre.Groenewald}@nwu.ac.za

30 September 2007; Borovets


Overview

• Introduction

• End-User Requirements

• Solution: Design & Implementation

• Evaluation

• Conclusion


Human Language Technologies

• HLTs depend on the availability of linguistic data
  – Specialized lexicons
  – Annotated and raw corpora
  – Formalized grammar rules
• Creating such resources is expensive and protracted
  – Especially for less-resourced languages


Less-resourced Languages

• "languages for which few digital resources exist; and thus, languages whose computerization poses unique challenges. [They] are languages with limited financial, political, and legal resources… " (Garrett, 2006)

• Implicit in this definition:– Lacks human resources (little attention in research or discussions)– Lacks computational linguists working on these languages

• Research question:– How could one facilitate development of linguistic data by

enabling non-experts to collaborate in the computerization of less-resourced languages?


Methodology I

• Empower linguists and mother-tongue speakers to deliver annotated data
  – of high quality
  – in the shortest possible time
• Accelerate the annotation of linguistic data by mother-tongue speakers through
  – user-friendly environments
  – bootstrapping
  – machine learning instead of rule-based techniques


Methodology II

• The general idea:
  – development of gold standards
  – development of annotated data
  – bootstrapping
• With the click of a button:
  – annotate data
  – train a machine-learning algorithm


Central Point of Departure I

• Annotators are invaluable resources
• Based on our experience with less-resourced languages, annotators
  – mostly have word-processing skills
  – are used to a GUI-based environment
  – usually have limited skills in a computational or programming environment
• In the worst cases, annotators have difficulties with
  – file management
  – unzipping
  – proper encoding of text files


Central Point of Departure II

• Aim of this project: enabling annotators to focus on what they are good at, namely enriching data with expert linguistic knowledge
• Training the machine learner occurs automatically


End-user Requirements I

• Unstructured interviews with four annotators:
  1. What do you find unpleasant about your work as an annotator?
  2. What will make your life as an annotator easier?


End-user Requirements II

1. What do you find unpleasant about your work as an annotator?
  – Repetitiveness
    • leads to a lack of concentration/motivation
  – Feeling "useless"
    • annotators do not see results


End-user Requirements III

2. What will make your life as an annotator easier?
  – A friendly environment (i.e. GUI-based, not lists of words)
  – Bite-sized chunks of data rather than endless lists
  – Correcting data rather than annotating from scratch
    • the program should already suggest a possible annotation
    • click or drag to annotate
  – Reference works need to be available
  – Automatic data management


Solution: TurboAnnotate

• A user-friendly annotation environment
  – bootstrapping with machine learning
  – for creating gold standards/annotated lists
• Inspired by DictionaryMaker (Davel and Peche, 2006) and Alchemist (University of Chicago, 2004)


DictionaryMaker


Alchemist


[Figure: Simplified Workflow of TurboAnnotate — Start → Make Training Set → Make Gold Standard; then, in a loop: Auto-Make Classifier → Auto-Evaluate (against the Gold Standard) → Auto-Make Annotated Set → Verify Annotated Set → Continue? (Yes: verified data extend the Training Set and the loop repeats; No: End with the final Annotated Set)]


Step 1: Create Gold Standard

• Create a gold standard
  – an independent test set for evaluating classifier performance
  – 1000 random instances are used
  – the annotator only has to select one data file (a sampling sketch follows below)
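A minimal Python sketch of this step: the 1,000-instance sample size comes from the slide, while the file handling, function name and fixed seed are assumptions for illustration.

    import random

    GOLD_SIZE = 1000  # slide: 1000 random instances form the gold standard

    def make_gold_standard(base_list_path, gold_path, rest_path, seed=0):
        # Read the raw word list the annotator selected (one word per line).
        with open(base_list_path, encoding="utf-8") as f:
            words = [line.strip() for line in f if line.strip()]
        # Draw a random sample as the held-out gold standard; it is used
        # only for evaluation, never as training data (see Accuracy slide).
        random.seed(seed)
        random.shuffle(words)
        gold, rest = words[:GOLD_SIZE], words[GOLD_SIZE:]
        with open(gold_path, "w", encoding="utf-8") as f:
            f.write("\n".join(gold) + "\n")
        with open(rest_path, "w", encoding="utf-8") as f:
            f.write("\n".join(rest) + "\n")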



Step 2: Verify Annotations

• New data are sourced from the base list
  – automatically annotated by the classifier
  – presented to the annotator in the "Annotate" tab


TurboAnnotate: Annotation Environment


[Screenshot: the TurboAnnotate annotation environment — an "Annotate" tab with a queue of incoming words (e.g. vreeslik, ontstellend, misvormd, swierig, voortreflik), the current suggestion spog*ge*rig with an Accept button, already-verified items (lel*ik, prag*tig, vies*lik, ou*lik, fan*tas*ties), Incoming/Done/To Do counters, and Project, Search/Edit, Results, Options, Help, Cancel & Exit, Save & Exit and Save & Train controls]



Step 3: Verify Annotated Set

• Bootstrapping, inspired by DictionaryMaker
• 200 words per chunk; the classifier is trained in the background
• The annotator verifies each suggestion
  – click "Accept" or correct the instance
• Verified data serve as training data
• The process iterates until the desired results are reached (see the loop sketch below)
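The loop below is a minimal Python skeleton of this bootstrapping cycle, not TurboAnnotate's actual code; the 200-word chunk size is from the slide, while train, classify, verify and evaluate are assumed callbacks (in TurboAnnotate these would wrap TiMBL and the GUI).

    CHUNK = 200  # slide: 200 words per chunk

    def bootstrap(base_list, training_set, gold, train, classify, verify,
                  evaluate, target=0.99):
        # Iterate until the base list is exhausted or accuracy is acceptable.
        while base_list:
            classifier = train(training_set)          # trained in the background
            chunk, base_list = base_list[:CHUNK], base_list[CHUNK:]
            suggestions = [classify(classifier, w) for w in chunk]
            verified = verify(suggestions)            # annotator accepts or corrects
            training_set.extend(verified)             # verified data become training data
            if evaluate(classifier, gold) >= target:  # auto-evaluate on the gold standard
                break
        return training_set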


The Machine Learning System I

• Tilburg Memory-Based Learner (TiMBL)
  – wide success and applicability in the field of natural language processing
  – available for research purposes
  – relatively easy to use
• On the downside
  – it performs best with large quantities of data
• For the tasks of hyphenation and compound analysis, however, TiMBL performs well even with small quantities of data


The Machine Learning System II

• Default parameter settings are used
• Task-specific feature selection
• Performance is evaluated against the gold standard
  – for hyphenation and compound analysis, accuracy is determined at word level, not per instance (a sketch follows below)
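A sketch of the word-level measure, under the assumption that per-character classifications have already been reassembled into whole annotated words: a word only counts as correct if every split in it matches the gold standard.

    def word_level_accuracy(gold_words, predicted_words):
        # A word is correct only when the whole annotated form matches,
        # i.e. one wrong character-level decision spoils the entire word.
        pairs = list(zip(gold_words, predicted_words))
        correct = sum(1 for gold, pred in pairs if gold == pred)
        return 100.0 * correct / len(pairs)

    # e.g. word_level_accuracy(["spog-ge-rig", "ou-lik"],
    #                          ["spog-ge-rig", "oul-ik"]) == 50.0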


Features I

• All input words are converted to feature vectors
  – a splitting window
  – context of 3 positions to the left and right
• Class
  – hyphenation: indicates whether a break follows
  – compound analysis: 3 possible classes
    • + indicates a word boundary
    • _ indicates a valence morpheme
    • = indicates no break


Features II


• Example: eksamenlokaal 'examination room' (eksamen+lokaal); a feature-extraction sketch follows below
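A hedged Python sketch of how such feature vectors could be built: the window of 3 left/right context positions and the +/_/= classes are from the slides, while the padding symbol, the exact instance layout and the boundary encoding for this word are assumptions.

    CONTEXT = 3   # slide: context of 3 positions left and right
    PAD = "#"     # assumed padding symbol for word edges

    def instances(word, classes):
        # One instance per character: 3 left-context characters, the focus
        # character, 3 right-context characters, and the class that says what
        # follows the focus (+ word boundary, _ valence morpheme, = no break).
        padded = PAD * CONTEXT + word + PAD * CONTEXT
        for i, cls in enumerate(classes):
            window = padded[i:i + 2 * CONTEXT + 1]
            yield ",".join(list(window) + [cls])

    # eksamen+lokaal: assume a word boundary after the 7th character.
    word = "eksamenlokaal"
    classes = ["="] * len(word)
    classes[6] = "+"
    for line in instances(word, classes):
        print(line)  # comma-separated, one instance per line, class last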


Parameter Optimisation I

• Large variations in accuracy occur when the parameter settings of MBL algorithms are changed
• Finding the best combination of parameters
  – exhaustive searches are undesirable: slow and computationally expensive


Parameter Optimisation II

• Alternative: Paramsearch (Van den Bosch, 2005)
  – delivers combinations of algorithmic parameters that are estimated to perform well
• PSearch
  – our own modification of Paramsearch
  – only run after all data have been annotated
  – ensures the best possible classifier (a sketch of the idea follows below)
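To illustrate the idea (not Paramsearch's actual interface): score candidate TiMBL-style settings on held-out data and keep the winner; Paramsearch itself prunes this search with progressive sampling rather than trying every combination on all the data. The parameter names and the train_and_score callback are assumptions.

    from itertools import product

    # Assumed stand-ins for typical TiMBL parameters.
    GRID = {
        "k": [1, 3, 5, 11],                               # nearest neighbours
        "weighting": ["gain_ratio", "info_gain", "none"], # feature weighting
        "metric": ["overlap", "mvdm"],                    # distance metric
    }

    def best_parameters(train_and_score, grid=GRID):
        # Enumerate candidate settings and keep the combination whose
        # held-out accuracy (returned by the callback) is highest.
        candidates = [dict(zip(grid, values)) for values in product(*grid.values())]
        return max(candidates, key=train_and_score)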


Criteria

• Two criteria
  – accuracy
  – human effort (time)
• Evaluated on the tasks of hyphenation and compound analysis for Afrikaans and Setswana
• Four human annotators
  – two well experienced in annotating
  – two considered novices in the field


Accuracy

• Two kinds of accuracy
  – classifier accuracy
  – human accuracy
• Expressed as the percentage of correctly annotated words over the total number of words
• The gold standard is excluded from the training data


Classifier Accuracy (Hyphenation)

# Words in Training Data   Accuracy: Afrikaans   Accuracy: Setswana
 200                       38.60%                94.50%
 600                       54.00%                98.30%
1000                       58.30%                98.80%
2000                       68.50%                98.90%


Human Accuracy

• Two separate unseen datasets of 200 words for each language
  – the first dataset annotated in an ordinary text editor
  – the second dataset annotated with TurboAnnotate


Human Accuracy


Annotation Tool             Accuracy (Hyph)   Time (s) (Hyph)   Accuracy (CA)   Time (s) (CA)
Text Editor (200 words)     93.25%            1325              91.50%          802
TurboAnnotate (200 words)   98.34%            1258              94.00%          748


Human Effort I

• Two questions
  – Is it faster to annotate with TurboAnnotate?
  – What would the predicted saving on human effort be on a large dataset?


Human Effort II


# Words in Training Set   Time (s) (Hyph)   Time (s) (CA)
   0                      1258              748
 600                       663              614
2000                       573              582


Human Effort III

• It is about 1 minute faster to annotate 200 words with TurboAnnotate
• On a larger dataset (40,000 words) this amounts to a difference of only circa 3.5 uninterrupted human hours
• The picture changes when the effect of bootstrapping is considered: extrapolating to 42,967 words yields
  – a saving of 51 hours (68%) for hyphenation
  – a saving of 9 hours (41%) for compound analysis
  (a back-of-the-envelope sketch of this extrapolation follows below)
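A back-of-the-envelope Python sketch of the shape of this extrapolation, using the hyphenation timings from the Human Effort II slide. It interpolates per-chunk annotation time as the training set grows and assumes the time stays flat beyond 2,000 training words; since the classifier actually keeps improving past that point, this simplified model understates the 51-hour (68%) saving reported on the slide.

    # Seconds per 200-word chunk (hyphenation) at given training-set sizes.
    MEASURED = [(0, 1258), (600, 663), (2000, 573)]
    EDITOR = 1325               # s per 200 words in a plain text editor
    CHUNK, TOTAL = 200, 42967   # extrapolation target from the slide

    def chunk_time(trained):
        # Piecewise-linear interpolation; assumed flat after the last point.
        for (x0, y0), (x1, y1) in zip(MEASURED, MEASURED[1:]):
            if trained <= x1:
                return y0 + (y1 - y0) * (trained - x0) / (x1 - x0)
        return MEASURED[-1][1]

    chunks = TOTAL // CHUNK
    turbo = sum(chunk_time(i * CHUNK) for i in range(chunks))
    editor = EDITOR * chunks
    print(f"saving ~{(editor - turbo) / 3600:.0f} h ({1 - turbo / editor:.0%})")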


Conclusion

• TurboAnnotate helps to increase the accuracy of human annotators
• It saves human effort


Future Work

• Other lexical annotation tasks
  – creating lexicons for spelling checkers
  – creating data for morphological analysis
    • stemming
    • lemmatization
• Improve the GUI
• Network solution
• Active learning
• Experiment with C5.0


TurboAnnotate

• Requirements:
  – Linux
  – Perl 5.8
  – Gtk+ 2.10
  – TiMBL 5.1
• Open source
• Available at http://www.nwu.ac.za/ctext


Acknowledgements

• This work was supported by a grant from the South African National Research Foundation (GUN: FA2004042900059).
• We also acknowledge the inputs and contributions of
  – Ansu Berg
  – Pieter Nortjé
  – Rigardt Pretorius
  – Martin Schlemmer
  – Wikus Slabbert
