A Report on the First Native Language Identification Shared Task

Joel Tetreault (Nuance Communications), Daniel Blanchard (Educational Testing Service), Aoife Cahill (Educational Testing Service)


Native Language Identification

Task of automatically identifying a speaker’s first language based solely on the speaker’s writing in another language

Applications:
◦ Authorship profiling (Estival et al., 2007)
◦ Education: more targeted feedback to language learners (Leacock et al., 2010)


Sample Essay 1

No risk no fun

I agree the statement "Successful people try new things and take risk".In my mind it is so, to. When you thing you like do new stuff you need a liddelbit the kick. That is the big point what I need. For exsample I like to go to a big city like New York. I was never in this town I dont no from the city. But I like go to the city. Thats fun I stay every time for proplems. I need eat a hood offer my head. The ather side I can go dow. I dont gat waht I need…Next exsample the wall street you put money in funds, well you this make a good job. Dont for get the risk look like lose money.

German


Sample Essay 2

For example, if you take a look at an ordinary school, you have different teachers for every subject. Your calculus teacher is different than your literature teacher. Each teacher must specialize in a specific subject in order to convey suffiecient and proper information to the students. However, that doesn't mean that the teacher is narrow-minded and has a limited perspective in life because to specialize in one subject doesn't hinder you or stop you from exploring other subjects.

Arabic


Motivation

Lots of work in NLI, but it has been hard to compare different approaches:

1. ICLEv2 (Granger et al., 2009): the de facto train/test data is small and has NLI-unfriendly idiosyncrasies

2. No consensus on evaluation:
- Which L1s, and how many?
- Train/test splits?
- Best features?


Contributions

Goal: unify the community and help the field progress

Provide a larger, more NLI-friendly corpus that improves upon ICLEv2

Common evaluation framework
◦ Everyone evaluates using the same train/dev/test splits and the same L1s

Corpus and scripts to be made public to further promote the field


Outline
◦ Prior Work
◦ Data
◦ Shared Task Overview
◦ Results
◦ NLI Shared Task in the Future


Prior Work

Treat NLI as a classification task:
◦ Koppel et al. (2005): POS n-grams, content and function words, spelling and grammatical errors
◦ Syntactic features (Wong and Dras, 2011)
◦ Tree Substitution Grammars (Swanson and Charniak, 2012)
◦ Adaptor Grammars (Wong et al., 2012)
◦ Data size effects (Brooke and Hirst, 2012)
◦ Word n-grams (Bykh and Meurers, 2012)
◦ LMs and ensemble classifiers (Tetreault et al., 2012)
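Treating NLI as classification can be illustrated with a small multiclass perceptron over word-unigram and character-trigram counts. This is a minimal sketch under stated assumptions: the feature set, the names `features` and `MulticlassPerceptron`, and the toy training sentences (taken from the sample essays) are hypothetical, and a perceptron is used only for brevity; it is not the method of any work cited above.

```python
from collections import Counter


def features(text):
    """Hypothetical feature set: lowercase word unigrams plus character trigrams."""
    lowered = text.lower()
    feats = Counter(f"w={w}" for w in lowered.split())
    padded = f" {lowered} "
    feats.update(f"c={padded[i:i + 3]}" for i in range(len(padded) - 2))
    return feats


class MulticlassPerceptron:
    """One sparse weight vector per L1; predict the L1 whose weights score highest."""

    def __init__(self, labels):
        self.labels = sorted(labels)
        self.weights = {y: {} for y in self.labels}

    def _score(self, feats, label):
        w = self.weights[label]
        return sum(w.get(f, 0.0) * v for f, v in feats.items())

    def _best(self, feats):
        # max() keeps the first maximum, so ties go to the alphabetically first L1.
        return max(self.labels, key=lambda y: self._score(feats, y))

    def predict(self, text):
        return self._best(features(text))

    def train(self, data, epochs=50):
        """data: list of (essay_text, gold_L1) pairs."""
        for _ in range(epochs):
            mistakes = 0
            for text, gold in data:
                feats = features(text)
                pred = self._best(feats)
                if pred != gold:
                    mistakes += 1
                    for f, v in feats.items():
                        # Move weights toward the gold L1 and away from the wrong guess.
                        self.weights[gold][f] = self.weights[gold].get(f, 0.0) + v
                        self.weights[pred][f] = self.weights[pred].get(f, 0.0) - v
            if mistakes == 0:  # every training essay classified correctly; stop early
                break
        return self


train = [
    ("I agree the statement", "GER"),
    ("I dont no from the city", "GER"),
    ("take a look at an ordinary school", "ARA"),
    ("different than your literature teacher", "ARA"),
]
model = MulticlassPerceptron({"ARA", "GER"}).train(train)
```

Calling `model.predict(...)` on a new essay returns one of the training L1s; real systems would use the full 11-way label set and far richer features.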


Data: TOEFL11 Corpus

12,100 essays from the ETS Test of English as a Foreign Language (TOEFL)

11 L1s:
◦ Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, Turkish
◦ 900 train / 100 dev / 100 test essays per L1

Sampled for equal representation of L1s across topics as much as possible

Includes a 3-tier proficiency level

Public release via LDC this summer?
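The split sizes above are per L1, which is where the corpus total comes from: 11 × (900 + 100 + 100) = 12,100 essays. The sketch below computes the per-split totals and checks that a split index is balanced across L1s. The three-letter L1 codes (some of which appear on later slides) and the `(essay_id, l1, split)` row format are assumptions for illustration, not the corpus's documented layout.

```python
from collections import Counter

# Assumed three-letter codes for the 11 TOEFL11 L1s.
L1_CODES = ["ARA", "CHI", "FRE", "GER", "HIN", "ITA", "JPN", "KOR", "SPA", "TEL", "TUR"]
ESSAYS_PER_L1 = {"train": 900, "dev": 100, "test": 100}


def split_sizes(l1s=L1_CODES, per_l1=ESSAYS_PER_L1):
    """Total essays in each split when every L1 contributes the same number."""
    return {split: n * len(l1s) for split, n in per_l1.items()}


def is_balanced(rows, per_l1=ESSAYS_PER_L1):
    """rows: (essay_id, l1, split) tuples from a hypothetical index file.

    True if every L1 present has exactly the expected count in every split.
    """
    counts = Counter((l1, split) for _, l1, split in rows)
    l1s = {l1 for l1, _ in counts}
    return all(counts[(l1, s)] == n for l1 in l1s for s, n in per_l1.items())


sizes = split_sizes()
print(sizes)  # {'train': 9900, 'dev': 1100, 'test': 1100}
print(sum(sizes.values()))  # 12100
```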


Shared Task Description: 3 Sub-tasks

1. Closed-Training: 11-way classification task using only TOEFL11-TRAIN and DEV
2. Open-Training-1: use of any amount or type of training data excluding TOEFL11
3. Open-Training-2: use of any amount or type of training data combined with TOEFL11

* All sub-tasks use TOEFL11-TEST for the final evaluation set


Shared Task Description

Each team was allowed to submit up to 5 different systems per task

Teams submitted a CSV file for each system to the NLI organizers

An evaluation script automatically compares each prediction file to the gold standard and creates a performance report and contingency tables
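An evaluation step of this kind can be sketched as follows: read the predictions and the gold labels, then report accuracy, per-L1 precision/recall/F1, and a contingency table of (gold, predicted) counts. This is an illustration, not the organizers' script; the two-column `essay_id,L1` CSV layout and the names `read_labels` and `evaluate` are hypothetical.

```python
import csv
import io
from collections import Counter


def read_labels(f):
    """Map essay id -> L1 from a two-column CSV (hypothetical layout: id,L1)."""
    return {row[0]: row[1] for row in csv.reader(f) if row}


def evaluate(pred_file, gold_file):
    pred = read_labels(pred_file)
    gold = read_labels(gold_file)
    missing = gold.keys() - pred.keys()
    if missing:
        raise ValueError(f"{len(missing)} test essays have no prediction")
    # Contingency table: (gold L1, predicted L1) -> number of essays.
    table = Counter((gold[k], pred[k]) for k in gold)
    accuracy = sum(n for (g, p), n in table.items() if g == p) / len(gold)
    labels = sorted(set(gold.values()) | set(pred.values()))
    per_l1 = {}
    for y in labels:
        tp = table[(y, y)]
        fp = sum(table[(g, y)] for g in labels if g != y)
        fn = sum(table[(y, p)] for p in labels if p != y)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_l1[y] = (prec, rec, f1)
    return accuracy, per_l1, table


# Toy usage with in-memory CSV files instead of real submission files.
gold_csv = io.StringIO("e1,GER\ne2,ARA\ne3,GER\ne4,ARA\n")
pred_csv = io.StringIO("e1,GER\ne2,GER\ne3,GER\ne4,ARA\n")
accuracy, per_l1, table = evaluate(pred_csv, gold_csv)
print(accuracy)  # 0.75
```

Keying the table by (gold, predicted) pairs keeps it sparse; an 11 × 11 grid can be printed from it by iterating over `labels` twice.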


29 Teams

Bobicev, Chonger, CMU-Haifa, Cologne-Nijmegen, CoRAL Lab @ UAB, CUNI (Charles University), Cywu, Dartmouth, Eurac, HAUTCS, ItaliaNLP, Jarvis, Kyle et al., LIMSI, LTRC IIIT Hyderabad, Michigan, MITRE “Carnie”, MQ, NAIST, NRC, Oslo NLI, Toronto, Tuebingen, Ualberta, UKP, Unibuc, UNT, UTD, VTEX


RESULTS


Sub-Task Participation Statistics

Sub-task | # Teams Competing | # Submissions
Closed | 29 | 116
Open-1 | 3 | 13
Open-2 | 4 | 15


Closed Sub-Task

See Table 3 of the report for full results

No statistically significant differences between the top 5 teams

Team Name | Abbreviation | Overall Accuracy
Jarvis | JAR | 0.836
Oslo NLI | OSL | 0.834
Unibuc | BUC | 0.827
MITRE “Carnie” | CAR | 0.826
Tuebingen | TUE | 0.822
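The top-5 spread (0.836 vs. 0.822) amounts to only about 15 of the 1,100 test essays, which is why a significance test matters. The slide does not say which test was used; McNemar's test is one standard choice for comparing two classifiers on the same test items. The sketch below is an exact (binomial) version for illustration only; `mcnemar_exact` is a hypothetical name and this is not necessarily the organizers' procedure.

```python
from math import comb


def mcnemar_exact(gold, pred_a, pred_b):
    """Two-sided exact McNemar test for two systems scored on the same essays.

    Only discordant essays matter: those that exactly one system gets right.
    Under the null hypothesis each discordant essay is equally likely to favor
    either system, so the smaller discordant count follows Binomial(n, 0.5).
    """
    only_a = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a == g and b != g)
    only_b = sum(1 for g, a, b in zip(gold, pred_a, pred_b) if a != g and b == g)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the systems never differ in correctness
    k = min(only_a, only_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if system A is right and system B wrong on 5 essays, and they agree everywhere else, the p-value is 2 × (1/32) = 0.0625, still above a 0.05 threshold.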


Open Sub-tasks

Challenge: finding new data to cover each L1

Data sources for HIN & TEL:
◦ ICNALE Pakistani essays for HIN (TUE team)
◦ Bilingual blogs (TOR & TUE teams)

Corpus | Description
ICLE | All L1s except ARA, HIN, TEL
FCE | All L1s except ARA, HIN, TEL
ICNALE | CHI, JPN, KOR essays only
Lang8 | All L1s, but mostly Asian L1s


Discussion of Approaches: Machine Learning

◦ SVM was overwhelmingly the most popular approach
◦ 4 teams also tried ensemble classifiers
◦ String kernels (BUC) using character-level n-grams
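A string kernel compares two texts directly through their shared substrings. The simplest example is the p-spectrum kernel: the dot product of the two texts' character p-gram count vectors. The sketch below is a hedged illustration; the slide does not specify which string kernel BUC used, and `spectrum_kernel`, `normalized_kernel`, and `gram_matrix` are hypothetical names.

```python
from collections import Counter
from math import sqrt


def spectrum(text, p):
    """Counts of every character p-gram (contiguous substring of length p)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p=3):
    """p-spectrum kernel: dot product of the two p-gram count vectors."""
    a, b = spectrum(s, p), spectrum(t, p)
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(n * b[g] for g, n in a.items())  # Counter returns 0 for missing keys


def normalized_kernel(s, t, p=3):
    """Cosine-normalized kernel, so essay length does not dominate the score."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0


def gram_matrix(texts, p=3):
    """Pairwise kernel values, e.g. for an SVM that accepts a precomputed kernel."""
    return [[normalized_kernel(s, t, p) for t in texts] for s in texts]
```

Because the kernel only needs pairwise similarities, it can be handed to any SVM implementation that accepts a precomputed Gram matrix, without ever building an explicit feature vector.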


Discussion of Approaches: Features

◦ N-grams: word, POS, character, function-word
◦ Syntactic features: dependencies, TSG, CF productions, Adaptor Grammars
◦ Spelling features

4 of the top 5 teams used n-grams up to at least order 4; some went up to 9-grams

2 of the top 10 teams used syntactic features
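N-gram features of several orders can come from one generic routine applied to word, POS-tag, or character sequences. This is a minimal sketch; `ngram_features` and its feature-naming scheme are hypothetical, and the teams' actual tokenization, weighting (e.g. binary vs. counts), and frequency cutoffs are not described on the slide.

```python
from collections import Counter


def ngram_features(tokens, min_n=1, max_n=4, prefix="w", sep=" "):
    """Counts of all n-grams of order min_n..max_n over a token sequence.

    tokens may be words, POS tags, or the characters of a string (use sep="").
    """
    feats = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[f"{prefix}{n}={sep.join(tokens[i:i + n])}"] += 1
    return feats


# Word uni- and bigrams from a sample-essay sentence.
word_feats = ngram_features("I agree the statement".split(), 1, 2)
# Character n-grams up to order 9, the highest order mentioned above.
char_feats = ngram_features(list("risk risk"), 1, 9, prefix="c", sep="")
```

The number of distinct features grows quickly with the maximum order, so higher-order n-grams are usually paired with frequency cutoffs or a classifier that handles sparse, high-dimensional input well.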


Future of the NLI Shared Task

Ideas to expand the scope of the task:
◦ Use a new set of TOEFL essays for test
◦ Expand genres: blogs? Tweets?
◦ Number of L1s
◦ Use a different L2
  ItaliaNLP: preparing an Italian NLI corpus with CNR Pisa
  Also a Finnish corpus with L1 information (University of Turku)
◦ Add Slavic languages

Logistics:
◦ Hold another shared task in 2014? Or 2015?
◦ Merge with the PAN Shared Task?

Tell us your thoughts!


Acknowledgments

Derrick Higgins (ETS), ETS TOEFL, Patrick Houghton (ETS), BEA8 Organizers, all the NLI participants!