Using Machine Learning to Annotate Data for NLP Tasks Semi-Automatically

Gerhard B van Huyssteen, Martin J Puttkammer, Suléne Pilon and Hendrik J Groenewald

Centre for Text Technology (CTexT), Research Unit: Languages and Literature in the South African Context
North-West University, Potchefstroom Campus (PUK), South Africa
{Gerhard.VanHuyssteen; Martin.Puttkammer; Sulene.Pilon; Handre.Groenewald}@nwu.ac.za

30 September 2007; Borovets
Overview
• Introduction
• End-User Requirements
• Solution: Design & Implementation
• Evaluation
• Conclusion
Human Language Technologies
• HLTs depend on the availability of linguistic data
  – Specialized lexicons
  – Annotated and raw corpora
  – Formalized grammar rules
• Creating such resources is expensive and protracted
  – Especially for less-resourced languages
Less-resourced Languages
• "languages for which few digital resources exist; and thus, languages whose computerization poses unique challenges. [They] are languages with limited financial, political, and legal resources… " (Garrett, 2006)
• Implicit in this definition:
  – They lack human resources (little attention in research or discussions)
  – They lack computational linguists working on these languages
• Research question: How can one facilitate the development of linguistic data by enabling non-experts to collaborate in the computerization of less-resourced languages?
Methodology I
• Empower linguists and mother-tongue speakers to deliver annotated data
  – Of high quality
  – In the shortest possible time
• Accelerate the annotation of linguistic data by mother-tongue speakers through
  – User-friendly environments
  – Bootstrapping
  – Machine-learning rather than rule-based techniques
Methodology II
• The general idea:
  – Development of gold standards
  – Development of annotated data
  – Bootstrapping
• With the click of a button:
  – Annotate data
  – Train the machine-learning algorithm
Central Point of Departure I
• Annotators are invaluable resources
• Based on our experience with less-resourced languages:
  – Annotators mostly have word-processing skills
  – They are used to a GUI-based environment
  – They usually have limited skills in a computational or programming environment
• In the worst cases, annotators have difficulties with
  – File management
  – Unzipping
  – Proper encoding of text files
Central Point of Departure II
• Aim of this project: enable annotators to focus on what they are good at — enriching data with expert linguistic knowledge
• Training the machine learner occurs automatically
End-user Requirements I
• Unstructured interviews with four annotators:
  1. What do you find unpleasant about your work as an annotator?
  2. What will make your life as an annotator easier?
End-user Requirements II
1. What do you find unpleasant about your work as an annotator?
  – Repetitiveness
    • Lack of concentration/motivation
  – Feeling "useless"
    • They do not see results
End-user Requirements III
2. What will make your life as an annotator easier?
  – A friendly environment (i.e. GUI-based, not lists of words)
  – Bite-sized chunks of data rather than endless lists
  – Correcting data rather than annotating from scratch
    • The program should already suggest a possible annotation
  – Click or drag interaction
  – Reference works need to be available
  – Automatic data management
Solution: TurboAnnotate
• A user-friendly annotation environment
  – Bootstrapping with machine learning
  – For creating gold standards/annotated lists
• Inspired by DictionaryMaker (Davel and Peche, 2006) and Alchemist (University of Chicago, 2004)
DictionaryMaker
Alchemist
Simplified Workflow of TurboAnnotate:

Start → Make Gold Standard → Make Training Set → [auto] Make Classifier
→ [auto] Evaluate against Gold Standard → [auto] Make Annotated Set
→ Verify Annotated Set → Continue?
  – Yes: fold the verified Annotated Set back into the Training Set and repeat
  – No: End
Step 1: Create Gold Standard
• Create a gold standard
  – An independent test set for evaluating performance
  – 1000 random instances are used
  – The annotator only has to select one data file
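The sampling step can be sketched as follows. This is an illustration, not TurboAnnotate's actual code: the function name and the stand-in base list are hypothetical, and only the sample size of 1000 comes from the slide.

```python
import random

def sample_gold_standard(words, n=1000, seed=0):
    """Draw n unique random instances from the base word list.

    These instances are hand-annotated once and then held out as an
    independent gold standard for evaluating the classifier.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(words, min(n, len(words)))

base_list = [f"word{i}" for i in range(5000)]  # stand-in for a real base list
gold = sample_gold_standard(base_list)
```

Because the gold standard is drawn once and held out, it never overlaps with the data the classifier is later trained on.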
Step 2: Verify Annotations
• New data sourced from the base list
  – Automatically annotated by the classifier
  – Presented to the annotator in the "Annotate" tab
TurboAnnotate: Annotation Environment

[Screenshot of the TurboAnnotate GUI, "Annotate" tab: incoming words (vreeslik, ontstellend, misvormd, swierig, voortreflik); the current word with suggested breaks (spog*ge*rig) next to an Accept button; completed words (lel*ik, prag*tig, vies*lik, ou*lik, fan*tas*ties); Incoming/Done/To Do counters; and Help, Cancel & Exit, Save & Exit, and Save & Train buttons.]
Step 3: Verify Annotated Set
• Bootstrapping, inspired by DictionaryMaker
• 200 words per chunk; training runs in the background
• The annotator verifies each suggestion
  – Click "Accept" or correct the instance
• Verified data serve as training data
• The process iterates until the desired results are reached
The Machine Learning System I
• Tilburg Memory-Based Learner (TiMBL)
  – Wide success and applicability in the field of natural language processing
  – Available for research purposes
  – Relatively easy to use
• On the downside:
  – Performs best with large quantities of data
• For the tasks of hyphenation and compound analysis, however, TiMBL performs well with small quantities of data
IntroductionEnd-User Requirements
Solution: Design & ImplementationEvaluationConclusion
Functional Specifications & SolutionsTechnical Specifications & SolutionsUser Instructions
30 September 2007; Borovets Van Huyssteen, Puttkammer, Pilon & Groenewald
The Machine Learning System II
• Default parameter settings are used
• Task-specific feature selection
• Performance is evaluated against the gold standard
  – For hyphenation and compound analysis, accuracy is determined at word level, not per instance
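Word-level scoring can be illustrated as below (a sketch, not the actual evaluation code): a word counts as correct only if its entire annotation matches the gold standard, so a single wrong break fails the whole word. The example words are taken from the annotation-environment screenshot.

```python
def word_accuracy(gold, predicted):
    """Percentage of words whose full annotation matches the gold standard.

    One wrong break inside a word makes that whole word incorrect, so
    word-level accuracy is stricter than per-instance accuracy.
    """
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * correct / len(gold)

gold = ["spog*ge*rig", "lel*ik", "prag*tig", "vies*lik"]
pred = ["spog*ge*rig", "lelik", "prag*tig", "vies*lik"]  # one word wrong
acc = word_accuracy(gold, pred)  # 75.0
```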
Features I
• All input words are converted to feature vectors
  – Splitting window
  – Context of 3 positions (left and right)
• Class
  – Hyphenation: indicates a break
  – Compound analysis: 3 possible classes
    • + indicates a word boundary
    • _ indicates a valence morpheme
    • = indicates no break
Features II
• Example: eksamenlokaal ('examination room')
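The windowing scheme can be sketched as follows (an illustration, not the actual feature-extraction code): each character becomes one instance, described by three characters of left context, the focus character, and three characters of right context, padded with '_' at the word edges; the class says what follows the focus character ('+' for the word boundary in eksamen+lokaal, '=' elsewhere).

```python
def instances(word, boundaries, context=3):
    """One feature vector per character position in the word.

    `boundaries` maps a character index to the class of the break that
    follows it ('+' word boundary, '_' valence morpheme); every other
    position gets '=' (no break).
    """
    pad = "_" * context
    padded = pad + word + pad
    vectors = []
    for i in range(len(word)):
        j = i + context  # index of the focus character in the padded word
        features = padded[j - context:j] + padded[j] + padded[j + 1:j + 1 + context]
        vectors.append((features, boundaries.get(i, "=")))
    return vectors

# eksamen+lokaal: the word boundary follows 'n' at index 6
vecs = instances("eksamenlokaal", {6: "+"})
# vecs[6] == ("amenlok", "+"); vecs[0] == ("___eksa", "=")
```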
Parameter Optimisation I
• Large variations in accuracy occur when the parameter settings of MBL algorithms are changed
• Finding the best combination of parameters:
  – Exhaustive searches are undesirable
  – Slow and computationally expensive
Parameter Optimisation II
• Alternative: Paramsearch (Van den Bosch, 2005)
  – Delivers combinations of algorithmic parameters that are estimated to perform well
• PSearch
  – Our own modification of Paramsearch
  – Only applied after all data have been annotated
  – Ensures the best possible classifier
Criteria
• Two criteria:
  – Accuracy
  – Human effort (time)
• Evaluated on the tasks of hyphenation and compound analysis for Afrikaans and Setswana
• Four human annotators
  – Two well experienced in annotating
  – Two considered novices in the field
Accuracy
• Two kinds of accuracy:
  – Classifier accuracy
  – Human accuracy
• Expressed as the percentage of correctly annotated words over the total number of words
• The gold standard is excluded from the training data
Classifier Accuracy (Hyphenation)

# Words in Training Data | Accuracy: Afrikaans | Accuracy: Setswana
                     200 |              38.60% |             94.50%
                     600 |              54.00% |             98.30%
                    1000 |              58.30% |             98.80%
                    2000 |              68.50% |             98.90%
Human Accuracy
• Measured on two separate unseen datasets of 200 words for each language
  – The first dataset was annotated in an ordinary text editor
  – The second dataset was annotated with TurboAnnotate
Human Accuracy
Annotation Tool           | Accuracy (Hyph) | Time (s) (Hyph) | Accuracy (CA) | Time (s) (CA)
Text Editor (200 words)   |          93.25% |            1325 |        91.50% |           802
TurboAnnotate (200 words) |          98.34% |            1258 |        94.00% |           748
Human Effort I
• Two questions:
  – Is it faster to annotate with TurboAnnotate?
  – What would the predicted saving in human effort be on a large dataset?
Human Effort II
# Words in Training Set | Time (s) (Hyph) | Time (s) (CA)
                      0 |            1258 |           748
                    600 |             663 |           614
                   2000 |             573 |           582
Human Effort III
• Annotating 200 words is roughly 1 minute faster with TurboAnnotate
• On a larger dataset (40,000 words)
  – A difference of only circa 3.5 uninterrupted human hours
• The picture changes when the effect of bootstrapping is considered
  – Extrapolating to 42,967 words:
    • A saving of 51 hours (68%) for hyphenation
    • A saving of 9 hours (41%) for compound analysis
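As a rough sanity check, one can extrapolate from the measured per-200-word hyphenation times. This back-of-the-envelope sketch assumes a single steady-state TurboAnnotate rate (the figure reached with a 2000-word training set), so it deliberately simplifies the bootstrapping curve and does not reproduce the exact savings reported on the slide.

```python
WORDS = 42_967   # size of the full dataset (from the slide)
CHUNK = 200      # words per measured timing
EDITOR_S = 1325  # text editor, hyphenation, seconds per 200 words
TURBO_S = 573    # TurboAnnotate with a 2000-word training set

chunks = WORDS / CHUNK
editor_hours = EDITOR_S * chunks / 3600    # ~79 h in a plain text editor
turbo_hours = TURBO_S * chunks / 3600      # ~34 h with TurboAnnotate
saving_hours = editor_hours - turbo_hours  # ~45 h under this simplification
```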
Conclusion
• TurboAnnotate helps to increase the accuracy of human annotators
• It saves human effort
Future Work
• Other lexical annotation tasks
  – Creating lexicons for spelling checkers
  – Creating data for morphological analysis
    • Stemming
    • Lemmatization
• Improve the GUI
• A network solution
• Active learning
• Experiment with C5.0
TurboAnnotate
• Requirements:
  – Linux
  – Perl 5.8
  – Gtk+ 2.10
  – TiMBL 5.1
• Open source
• Available at http://www.nwu.ac.za/ctext
Acknowledgements
• This work was supported by a grant from the South African National Research Foundation (GUN: FA2004042900059).
• We also acknowledge the inputs and contributions of
  – Ansu Berg
  – Pieter Nortjé
  – Rigardt Pretorius
  – Martin Schlemmer
  – Wikus Slabbert