NAACL HLT 2010 d-Confidence

D-Confidence: an active learning strategy which efficiently identifies small classes

Learning from Incomplete Specifications

Nuno Filipe Escudeiro nfe@isep.ipp.pt Alípio Mário Jorge amjorge@fc.up.pt

NAACL HLT, June 6, 2010

Outline

1. Motivations

2. D-Confidence

3. Evaluation

4. Conclusions

5. Future Work

• Fraud detection

• Medical data, disease detection

• Web page classification

• Mail categorization

• …

Motivations | D-Confidence | Evaluation | Conclusions | Future Work

Automatic resource organization

• Large corpora

• Unlabeled text documents

• Labeling is expensive

Need to identify exemplary cases for all labels to learn… fast (with few labels)

Collecting and annotating exemplary cases

– Critical

– Costly

Labeling effort related to:

– Number of labels to learn

– Class distribution in the working set

– Sample representativeness

Learning settings

– Supervised: high labeling effort

– Unsupervised: low expressiveness

– Semi-supervised: unable to deal with incomplete specifications

– Active learning: judicious selection of cases to label

• Minimize error

• Availability of pre-labeled examples on all classes

Active Learning

Accuracy at low cost

from a complete specification

D-Confidence

Accuracy and Representativeness at low cost

from incomplete specification

D-Confidence

– Active learning strategy selecting queries with:

• Low confidence

– exploitation / accuracy

• High distance to known classes

– exploration / representativeness

Intuition

Combines low-confidence with high-distance to produce a bias towards cases from unknown classes located in unexplored regions in case space

$$\mathrm{dConf}(u) = \max_k \frac{\mathrm{conf}(c_k \mid u)}{\mathrm{median}\left(\mathrm{dist}(u, x_{lab_k})\right)}$$
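As a rough illustration of the criterion, the sketch below scores each unlabeled case by the largest ratio, over the known classes, of classifier confidence to the median distance from that case to the labeled cases of the class. The function names `d_confidence` and `select_query`, the array layout, and the rule of querying the case with the lowest score are assumptions made for this example; it also assumes every known class has at least one labeled case and that all distances are positive.

```python
import numpy as np

def d_confidence(conf, dist, labels):
    """Score unlabeled cases (hypothetical helper, not the authors' code).

    conf:   (n_unlabeled, n_classes) confidence conf(c_k | u) per known class
    dist:   (n_unlabeled, n_labeled) distances from unlabeled to labeled cases
    labels: (n_labeled,) class index (0..n_classes-1) of each labeled case
    """
    conf = np.asarray(conf, dtype=float)
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    ratios = np.empty_like(conf)
    for k in range(conf.shape[1]):
        # Median distance from each unlabeled case to the labeled cases of class k.
        median_dist = np.median(dist[:, labels == k], axis=1)
        ratios[:, k] = conf[:, k] / median_dist
    # Keep the largest ratio over the known classes.
    return ratios.max(axis=1)

def select_query(conf, dist, labels):
    """Assumed selection rule: query the case with the lowest score."""
    return int(np.argmin(d_confidence(conf, dist, labels)))

# Two unlabeled cases, two known classes, one labeled case per class.
conf = [[0.9, 0.1], [0.6, 0.4]]
dist = [[1.0, 2.0], [4.0, 8.0]]
labels = [0, 1]
scores = d_confidence(conf, dist, labels)
query = select_query(conf, dist, labels)
```

In this toy input the second case is both less confidently classified and farther from every labeled case, so it receives the lower score and is the one queried.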

[Figure: Effect on (SVM) confidence. Horizontal axis: signed distance to dividing hyperplane, from -5 to 6.]

D-Confidence

– Repository (UCI) datasets

– Text corpora

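Class distribution

Dataset | # | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11
Iris | 150 | 50 | 50 | 50
Cleveland | 298 | 161 | 53 | 36 | 35 | 13
Vowels | 330 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30
SatImg | 500 | 125 | 48 | 96 | 46 | 67 | 118
Poker | 500 | 270 | 170 | 34 | 12 | 4 | 3 | 3 | 2 | 1 | 1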

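1st hit

Dataset | Active learning | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Class 8 | Class 9 | Class 10 | Class 11
iris | Conf | 1 | 7 | 3
iris | dConf | 1 | 3 | 1
cleveland | Conf | 3 | 7 | 8 | 19 | 40
cleveland | dConf | 3 | 15 | 8 | 5 | 8
vowels | Conf | 3 | 10 | 14 | 31 | 12 | 27 | 29 | 15 | 31 | 18 | 24
vowels | dConf | 2 | 12 | 19 | 16 | 24 | 26 | 23 | 2 | 26 | 3 | 23
satimg | Conf | 12 | 28 | 34 | 23 | 32 | 5
satimg | dConf | 9 | 1 | 4 | 10 | 3 | 10
poker | Conf | 1 | 3 | 20 | 43 | 113 | 112 | 147 | 223 | 279 | 277
poker | dConf | 3 | 2 | 5 | 9 | 45 | 97 | 98 | 68 | 100 | 65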
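One way to read the first-hit measure is as the number of queries issued before a case of a given class is labeled for the first time. The sketch below computes it from the sequence of labels returned by the oracle; the function name `first_hits` and the example sequence are hypothetical, and the slides do not state whether the reported figures are single runs or averages.

```python
def first_hits(queried_labels):
    """Map each class to the 1-based query count at which it first appears.

    queried_labels: labels returned by the oracle, in query order
    (hypothetical input format for this illustration).
    """
    hits = {}
    for n_queries, label in enumerate(queried_labels, start=1):
        # Record only the first occurrence of each class.
        hits.setdefault(label, n_queries)
    return hits

# Toy query sequence: class "c" is only discovered at the fifth query.
example = first_hits(["a", "a", "b", "a", "c"])
```

Under this reading, a lower first hit for a class means the strategy spent fewer labels before discovering that class.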

D-Confidence

– Text corpora

Text corpora

20 Newsgroups

• 500 cases, 20 classes

• most frequent class 35

• least frequent class 20

Reuters-21578

• 1000 cases, 52 classes

• most frequent class 435

• least frequent class 2

• 42 out of 52 classes with frequency below 10

[Figure legend: Confidence, Farthest First, d-Confidence]

– D-Confidence identifies classes faster (lower cost)

– This gain is bigger for minority classes

– D-Confidence performs better in imbalanced data

– Error may increase

• Exploration / exploitation

• Representativeness / accuracy

– Semi-supervised D-Confidence

– Retrieve cases when representativeness assumption fails

– Scalability

Thank you!

Nuno Filipe Escudeiro nfe@isep.ipp.pt Alípio Mário Jorge amjorge@fc.up.pt

D-Confidence

– Simulated datasets

– Text corpora

Levels (refer to training set properties)

Factor | 1 (+) | 0 (-)
Colinearity | colinear centroids | non-colinear centroids
Balancing | imbalanced class distribution | balanced class distribution
Cohesion | isomorphic classes | polymorphic classes
Overlapping | overlapping | separable

Response

ErrorGain = gen.error(dConfidence) – gen.error(Confidence)

Simulated datasets

Colinear Imbalanced Isomorphic Overlapping

1 (+) 1 (+) 1 (+) 1 (+)

1 (+) 1 (+) 1 (+) 0 (-)

1 (+) 1 (+) 0 (-) 1 (+)

1 (+) 1 (+) 0 (-) 0 (-)

1 (+) 0 (-) 1 (+) 1 (+)

1 (+) 0 (-) 1 (+) 0 (-)

1 (+) 0 (-) 0 (-) 1 (+)

1 (+) 0 (-) 0 (-) 0 (-)

0 (-) 1 (+) 1 (+) 1 (+)

0 (-) 1 (+) 1 (+) 0 (-)

0 (-) 1 (+) 0 (-) 1 (+)

0 (-) 1 (+) 0 (-) 0 (-)

0 (-) 0 (-) 1 (+) 1 (+)

0 (-) 0 (-) 1 (+) 0 (-)

0 (-) 0 (-) 0 (-) 1 (+)

0 (-) 0 (-) 0 (-) 0 (-)
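The sixteen rows above are the full two-level factorial design over the four factors. A minimal sketch of how such a design can be enumerated is shown here; the variable names are chosen for illustration only.

```python
from itertools import product

# Factor names as they appear in the design table.
factors = ["Colinear", "Imbalanced", "Isomorphic", "Overlapping"]

# Every combination of the two levels, 1 (+) and 0 (-), for the four factors.
# Iterating (1, 0) reproduces the row order used in the table.
design = [dict(zip(factors, levels))
          for levels in product((1, 0), repeat=len(factors))]

for run in design:
    print(" ".join(f"{run[name]} ({'+' if run[name] else '-'})" for name in factors))
```

Each dictionary in `design` describes one simulated dataset configuration, for which the ErrorGain response would then be measured.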

Colinearity Imbalanced Isomorphic Overlapping

4,241 -3,835 -15,459 1,296

Finding cases from all classes

Meta-Learning

Colinearity

– correlation coefficient, r, among cluster centroids

– colinear when |r| ~ 1

Balancing

– variance of nk

– balanced when var(nk) ~ 0

Cohesion

– #classes divided by #clusters

– cohesive when ~ 1

– representativeness fails (or highly overlapping clusters) when > 1

Overlapping

– inter-cluster inertia divided by intra-cluster inertia

– separable when >> 1
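The four measures can be sketched for a clustered, two-dimensional dataset as follows. This is only an illustration under stated assumptions: the function name `meta_features` is hypothetical, nk is taken to be the size of cluster k, colinearity is computed as the Pearson correlation between the two coordinates of the centroids (which only makes sense in 2-D), and inertia is taken as a sum of squared Euclidean distances.

```python
import numpy as np

def meta_features(X, clusters, n_classes):
    """Hypothetical helper computing the four measures for 2-D data."""
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    ids = np.unique(clusters)
    centroids = np.array([X[clusters == c].mean(axis=0) for c in ids])
    sizes = np.array([(clusters == c).sum() for c in ids])

    # Colinearity: correlation r between centroid coordinates (assumed 2-D).
    r = np.corrcoef(centroids[:, 0], centroids[:, 1])[0, 1]
    # Balancing: variance of the cluster sizes n_k (assumed meaning of nk).
    balancing = sizes.var()
    # Cohesion: number of classes divided by number of clusters.
    cohesion = n_classes / len(ids)
    # Overlapping: inter-cluster inertia divided by intra-cluster inertia.
    grand_mean = X.mean(axis=0)
    inter = (sizes * ((centroids - grand_mean) ** 2).sum(axis=1)).sum()
    intra = sum(((X[clusters == c] - centroids[i]) ** 2).sum()
                for i, c in enumerate(ids))
    return {"r": r, "balancing": balancing,
            "cohesion": cohesion, "overlapping": inter / intra}

# Three equal-sized, well-separated clusters whose centroids lie on a line.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X = np.vstack([square, square + 5, square + 10])
clusters = np.repeat([0, 1, 2], 4)
features = meta_features(X, clusters, n_classes=3)
```

For this toy data the centroids are colinear (|r| close to 1), the clusters are balanced (variance 0), cohesion is 1, and the inertia ratio is far above 1, which the slide criteria would read as separable.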
