Transcript
Preposition Usage Errors by English as a Second Language (ESL) learners: “They ate by* their hands.” The writer used by instead of with.

This work is supported by a grant from the U.S. Department of Education.

•A multi-class classifier is trained; each class corresponds to a distinct preposition.
•Usually, one considers the top 9-34 English prepositions.
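A minimal sketch of such a multi-class classifier, using an averaged perceptron (the learner named in the Experiments section). The feature strings and the three-preposition label set are hypothetical toy values, and this is not the LBJ implementation; it only illustrates one class per preposition.

```python
from collections import defaultdict

def predict(weights, features, classes):
    # Score each candidate preposition; ties go to the first class listed.
    scores = {c: sum(weights.get((c, f), 0.0) for f in features) for c in classes}
    return max(classes, key=lambda c: scores[c])

def train_averaged_perceptron(examples, classes, epochs=5):
    """examples: list of (feature_list, gold_preposition). Returns averaged weights."""
    weights = defaultdict(float)  # (preposition, feature) -> current weight
    totals = defaultdict(float)   # sum of the weights after every training step
    steps = 0
    for _ in range(epochs):
        for features, gold in examples:
            guess = predict(weights, features, classes)
            if guess != gold:
                # Standard perceptron update: promote gold, demote the wrong guess.
                for f in features:
                    weights[(gold, f)] += 1.0
                    weights[(guess, f)] -= 1.0
            for key, value in weights.items():
                totals[key] += value
            steps += 1
    # Averaging the weights over all steps is what makes the perceptron "averaged".
    return {key: total / steps for key, total in totals.items()}

# Hypothetical toy data: context features around the preposition slot.
CLASSES = ["at", "by", "with"]
EXAMPLES = [
    (["prev=ate", "next=their"], "with"),
    (["prev=arrived", "next=the"], "at"),
    (["prev=made", "next=hand"], "by"),
]
MODEL = train_averaged_perceptron(EXAMPLES, CLASSES)
```

With real data, the label set would be the chosen preposition inventory and the features would come from the sentence context.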

Generating Confusion Sets for Context-Sensitive Error Correction

Alla Rozovskaya and Dan Roth
{rozovska, danr}@illinois.edu

•A Confusion Set (candidate set) for preposition pi: the prepositions considered as corrections for pi.
•Standard confusion sets: every participating preposition is viewed as a valid correction for pi (De Felice & Pulman ’08, Tetreault & Chodorow ’08, Gamon et al. ’08).

•To narrow down the candidates, we need knowledge about which prepositions can serve as valid corrections.
•Narrow down candidates by L1 (the writer’s first language).

Example: “They ate by* their hands.”

(1) Standard Conf. sets – 10 correction candidates for by: {about, on, p3, …, p10}.

(2) L1-dependent Conf. sets – exclude candidates not seen as corrections for by in the ESL texts: Russian {by, with, of}; Chinese {by, with, in}.

(3) L1-dependent Weighted Conf. sets – enhanced with a probability for each candidate.
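The three kinds of confusion sets can be sketched as plain data structures. The Russian and Chinese sets for by are taken from the example above; the 10-preposition inventory and the confusion counts are illustrative assumptions, since the poster does not list them.

```python
# Assumed inventory of 10 prepositions (illustrative; not the poster's list).
PREPOSITIONS = ["about", "at", "by", "for", "from", "in", "of", "on", "to", "with"]

def standard_confusion_set(source):
    # (1) Standard: every participating preposition is a candidate for `source`.
    return set(PREPOSITIONS)

# (2) L1-dependent: only candidates observed as corrections in the ESL texts.
L1_CONFUSION_SETS = {
    ("Russian", "by"): {"by", "with", "of"},
    ("Chinese", "by"): {"by", "with", "in"},
}

def l1_confusion_set(l1, source):
    # Fall back to the standard set when nothing is known for this (L1, source) pair.
    return L1_CONFUSION_SETS.get((l1, source), standard_confusion_set(source))

def weighted_confusion_set(correction_counts):
    # (3) Weighted: turn observed correction counts into probabilities.
    total = sum(correction_counts.values())
    return {prep: n / total for prep, n in correction_counts.items()}

# Hypothetical counts of how often "by" was kept or corrected for one L1.
HYPOTHETICAL_COUNTS = {"by": 90, "with": 7, "of": 3}
WEIGHTED = weighted_confusion_set(HYPOTHETICAL_COUNTS)
```

Restricting by to three candidates instead of ten is the kind of narrowing the approach relies on.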

Experimental Results

Preposition Errors and ESL

•Preposition errors are very common with ESL learners.
•Preposition errors are influenced by L1.
•Not all preposition confusions are equally likely (Han et al. ’10, Rozovskaya & Roth ’10a).

The Annotated ESL Corpus

•63,000 words of ESL writing, annotated for article and preposition errors and for other grammar and lexical errors (Rozovskaya & Roth ’10a).
•Data from speakers of 9 first languages.
•4185 prepositions, 352 (8.4%) erroneous.

Source language    Total preps.    Incorrect preps.    Error rate
Chinese            953             144                 15.1%
Czech              627             28                  4.5%
Italian            687             43                  6.3%
Russian            1210            85                  7.0%
Spanish            708             52                  7.3%
All                4185            352                 8.4%

Preposition errors in the ESL data.
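The error rates in the table follow directly from the counts (incorrect prepositions divided by total prepositions). A short check using only the figures reported above:

```python
# (total prepositions, incorrect prepositions) per source language, from the table.
COUNTS = {
    "Chinese": (953, 144),
    "Czech": (627, 28),
    "Italian": (687, 43),
    "Russian": (1210, 85),
    "Spanish": (708, 52),
}

def error_rate(total, incorrect):
    # Error rate as a percentage, rounded to one decimal place as in the table.
    return round(100.0 * incorrect / total, 1)

RATES = {lang: error_rate(total, bad) for lang, (total, bad) in COUNTS.items()}

# The "All" row is the sum over the listed languages.
ALL_TOTAL = sum(total for total, _ in COUNTS.values())
ALL_INCORRECT = sum(bad for _, bad in COUNTS.values())
ALL_RATE = error_rate(ALL_TOTAL, ALL_INCORRECT)
```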

Experiments

Models are trained on the top 10 prepositions on native English data using the Averaged Perceptron algorithm with LBJ (Rizzolo & Roth ’07).

(1) Standard confusion sets.
(2) L1-dependent confusion sets; bad candidates excluded at decision time.
(3) L1-dependent weighted confusion sets; bad candidates excluded in training.
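One plausible reading of excluding bad candidates at decision time is to take the classifier's scores over all prepositions and consider only those in the L1-dependent confusion set of the writer's preposition. The sketch below assumes this reading; the scores and the margin-based threshold rule are hypothetical, not the authors' exact procedure.

```python
def correct_preposition(source, scores, confusion_set, threshold=0.0):
    """Return the proposed preposition for a slot where the writer used `source`.

    scores: classifier score per preposition (must include `source`).
    confusion_set: candidates allowed as corrections for `source`.
    threshold: minimum score margin over `source` needed to propose a change.
    """
    # Keep only allowed candidates; the writer's own choice always stays in play.
    candidates = {p: s for p, s in scores.items() if p in confusion_set or p == source}
    best = max(candidates, key=candidates.get)
    if best != source and candidates[best] - scores[source] > threshold:
        return best
    return source

# Hypothetical scores for the slot in "They ate by* their hands."
SCORES = {"by": 0.2, "with": 0.5, "on": 0.6, "of": 0.1, "in": 0.05}

unrestricted = correct_preposition("by", SCORES, set(SCORES))
restricted = correct_preposition("by", SCORES, {"by", "with", "of"})
cautious = correct_preposition("by", SCORES, {"by", "with", "of"}, threshold=0.5)
```

In this toy case the unrestricted model proposes on, while the Russian confusion set steers it to with; raising the threshold trades recall for precision by leaving the original preposition in place.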

•Artificial preposition errors are added in training, using error distributions of the speakers of L1 (Rozovskaya & Roth, ’10b).
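A minimal sketch of injecting artificial errors into native training data, assuming the L1 error distribution is given as, for each correct preposition, the probability of each preposition a writer of that L1 would actually produce. The distribution values are hypothetical, and the details of the method in Rozovskaya & Roth ’10b are not reproduced here.

```python
import random

def inject_errors(prepositions, error_dist, rng):
    """Replace correct prepositions with L1-typical ones, sampled from error_dist.

    error_dist: correct preposition -> {written preposition: probability},
                where each inner dict sums to 1.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    noisy = []
    for prep in prepositions:
        row = error_dist.get(prep)
        if row is None:
            # No known confusions for this preposition: leave it unchanged.
            noisy.append(prep)
            continue
        written = list(row)
        probs = [row[w] for w in written]
        noisy.append(rng.choices(written, weights=probs, k=1)[0])
    return noisy

# Hypothetical distribution: where "with" is correct, the writer sometimes uses "by".
HYPOTHETICAL_DIST = {"with": {"with": 0.9, "by": 0.1}}

native = ["with"] * 50 + ["on"] * 5
noisy = inject_errors(native, HYPOTHETICAL_DIST, random.Random(0))
```

The sampled preposition would serve as the writer's source word in training, while the original native preposition remains the label, so confusions never seen for the L1 are never introduced.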

Contributions

Confusion Sets for Preposition Error Correction

Problem: Multi-class Classification with a Very Large Number of Classes

Our Approach – Narrow down the Candidates

•L1-dependent confusion sets are superior to the standard confusion sets.
•At the same recall points, the models with restricted confusion sets obtain consistently better precision.
•Using knowledge about the likelihood of each preposition confusion (weighted confusion sets) is even more effective (statistically significant at p<0.001, using McNemar’s test).
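For reference, McNemar's test compares two systems on the same test items using only the items where they disagree. The sketch below uses the common chi-square form with continuity correction; the disagreement counts are hypothetical, and the poster does not say which variant of the test was used.

```python
import math

def mcnemar_p_value(b, c):
    """p-value of McNemar's test (chi-square, 1 degree of freedom, continuity-corrected).

    b: items system A gets right and system B gets wrong.
    c: items system B gets right and system A gets wrong.
    """
    if b + c == 0:
        return 1.0  # the systems never disagree
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-square with 1 degree of freedom: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical disagreement counts between two correction systems.
p = mcnemar_p_value(40, 10)
```

A small p-value indicates that the disagreements are lopsided, meaning one system is right far more often on the items where the two differ.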

Preposition Error Correction as a Multi-class Classification Problem

•We propose to narrow down candidates instead of considering all possible classes.

•We propose methods to narrow down candidates at decision time and in training.

•We narrow down preposition correction candidates using knowledge about typical errors observed with writers whose first language is L1.

Selected References

M. Gamon, J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. IJCNLP.
N. Han, J. Tetreault, S. Lee, and J. Ha. 2010. Using an error annotated learner corpus to develop an ESL/EFL error correction system. LREC.
A. Rozovskaya and D. Roth. 2010a. Annotating ESL errors: Challenges and rewards. NAACL-BEA workshop.
A. Rozovskaya and D. Roth. 2010b. Training paradigms for correcting errors in grammar and usage. NAACL.

[Figure: precision vs. recall curves (both axes 0 to 100) for three systems: (1) Standard Confusion Sets; (2) L1-dependent Conf. Sets enforced through decision threshold; (3) L1-dependent Conf. Sets enforced through errors in training.]