1
Preposition Usage Errors by English as a Second Language (ESL) learners: They ate by* their hands.” The writer used by instead of with. This work is supported by a grant from the US department of Education. A multi-class classifier is trained; each class corresponds to a distinct preposition. Usually, one considers 9-34 top English prepositions. Generating Confusion Sets for Context-Sensitive Error Correction Alla Rozovskaya and Dan Roth {rozovska, danr}@illinois.edu •A Confusion Set (candidate set) for preposition p i – prepositions considered as corrections for p i . Standard confusion sets -- every participating preposition is viewed as a valid correction for p i . (De Felice & Pulman ’08, Tetreault & Chodorow ’08, Gamon et al., ’08). To narrow down the candidates, we need knowledge about which prepositions can serve as valid corrections. Narrow down candidates by L1 (writer’s first language). They ate by* their hands .” (1) Standard Conf. sets – 10 correction candidates for by: {about, on, p3, …p10}. (2) L1-dependent Conf. sets – exclude candidates not seen as corrections for by in the ESL texts: Russian {by, with, of}; Chinese {by, with, in}. (3) L1-dependent Weighted Conf. sets – enhanced with probability for each cand. Experimental Results Preposition Errors and ESL Preposition errors are very common with ESL learners. Preposition errors are influenced by L1. Not all preposition confusions are equally likely (Han et al. ‘10, Rozovskaya & Roth ’10a). The Annotated ESL Corpus 63000 words of ESL writing, annotated for article and preposition errors, other grammar and lexical errors (Rozovskaya & Roth ’10a). Data from speakers of 9 first languages. 4185 prepositions, 352 (8.4%) erroneous. Source language Total preps. Incorrect preps. Error rate Chinese 953 144 15.1% Czech 627 28 4.5% Italian 687 43 6.3% Russian 1210 85 7.0% Spanish 708 52 7.3% All 4185 352 8.4% Preposition errors in the ESL data. Experiments Models are trained on top 10 prepositions on native English data using the Averaged Perceptron Algorithm with LBJ (Rizzolo & Roth ’07). (1) Standard confusion sets. (2) L1-dependent confusion sets; Bad candidates excluded at decision time. (3) L1-dependent Weighted confusion sets; Bad candidates excluded in training. Artificial preposition errors are added in training, using error distributions of the speakers of L1 (Rozovskaya & Roth, ’10b). Contributions Confusion Sets for Preposition Error Correction Problem: Multi-class Classification with a Very Large Number of Classes Our Approach – Narrow down the Candidates L1-dependent confusion sets are superior to the standard confusion sets. On the same recall points, the models with restricted confusion sets obtain a consistently better precision. Using knowledge about the likelihood of each preposition confusion (weighted confusion sets) is even more effective. (stat. signif. at p<0.001, using McNemar’s test). Preposition Error Correction as a Multi-class Classification Problem Selected References We propose to narrow down candidates instead of considering all possible classes. We propose methods to narrow down candidates at decision time and in training. We narrow down preposition correction candidates using knowledge about typical errors observed with writers whose first M. Gamon, J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. IJCNLP. N. Han, J. Tetreault, S. Lee, and J. Ha. 2010. Using an error annotated learner corpus to develop and ESL/EFL error correction System. LREC. A. Rozovskaya and D. Roth. 2010a. Annotating ESL errors: Challenges and rewards. NAACL-BEA workshop. A. Rozovskaya and D. Roth. 2010b. Training paradigms for correcting errors in grammar and usage. NAACL. 0 10 20 30 40 50 60 70 80 90 100 0 20 40 60 80 100 Recall Precision (1)Standard Confusion Sets (2)L1-dependentConf.Sets enforced through decision threshold (3)L1-dependentConf.Sets enforced through errors in training

Preposition Usage Errors by English as a Second Language (ESL) learners: “ They ate by* their hands.” The writer used by instead of with. This work is

Embed Size (px)

Citation preview

Page 1: Preposition Usage Errors by English as a Second Language (ESL) learners: “ They ate by* their hands.”  The writer used by instead of with. This work is

Preposition Usage Errors by English as a Second Language (ESL) learners:“They ate by* their hands.” The writer used by instead of with.

This work is supported by a grant from the US department of Education.

•A multi-class classifier is trained; each class corresponds to a distinct preposition.•Usually, one considers 9-34 top English prepositions.

Generating Confusion Sets for Context-Sensitive Error Correction

Alla Rozovskaya and Dan Roth{rozovska, danr}@illinois.edu

•A Confusion Set (candidate set) for preposition pi – prepositions considered as corrections for pi.•Standard confusion sets -- every participating preposition is viewed as a valid correction for pi. (De Felice & Pulman ’08, Tetreault & Chodorow ’08, Gamon et al., ’08).

•To narrow down the candidates, we need knowledge about which prepositions can serve as valid corrections. •Narrow down candidates by L1 (writer’s first language).“They ate by* their hands .”(1) Standard Conf. sets – 10 correction candidates for by: {about, on, p3, …p10}.

(2) L1-dependent Conf. sets – exclude candidates not seen as corrections for by in the ESL texts: Russian {by, with, of}; Chinese {by, with, in}.

(3) L1-dependent Weighted Conf. sets – enhanced with probability for each cand.

Experimental Results

Preposition Errors and ESL •Preposition errors are very common with ESL learners.•Preposition errors are influenced by L1.•Not all preposition confusions are equally likely (Han et al. ‘10, Rozovskaya & Roth ’10a).The Annotated ESL Corpus

•63000 words of ESL writing, annotated for article and preposition errors, other grammar and lexical errors (Rozovskaya & Roth ’10a).•Data from speakers of 9 first languages.•4185 prepositions, 352 (8.4%) erroneous.

Sourcelanguage

Totalpreps.

Incorrectpreps.

Errorrate

Chinese 953 144 15.1%

Czech 627 28 4.5%

Italian 687 43 6.3%

Russian 1210 85 7.0%

Spanish 708 52 7.3%

All 4185 352 8.4%

Preposition errors in the ESL data.

ExperimentsModels are trained on top 10 prepositions on native English data using the Averaged Perceptron Algorithm with LBJ (Rizzolo & Roth ’07).(1) Standard confusion sets.(2) L1-dependent confusion sets; Bad candidates excluded at decision time.(3) L1-dependent Weighted confusion sets; Bad candidates excluded in training.

•Artificial preposition errors are added in training, using error distributions of the speakers of L1 (Rozovskaya & Roth, ’10b).

Contributions

Confusion Sets for Preposition Error Correction

Problem: Multi-class Classification with a Very Large Number of Classes

Our Approach – Narrow down the Candidates

•L1-dependent confusion sets are superior to the standard confusion sets. •On the same recall points, the models with restricted confusion sets obtain a consistently better precision.•Using knowledge about the likelihood of each preposition confusion (weighted confusion sets) is even more effective. (stat. signif. at p<0.001, using McNemar’s test).

Preposition Error Correction as a Multi-class Classification Problem

Selected References•We propose to narrow down candidates instead of considering all possible classes.

•We propose methods to narrow down candidates at decision time and in training.

•We narrow down preposition correction candidates using knowledge about typical errors observed with writers whose first language is L1.

M. Gamon, J. Gao, C. Brockett, A. Klementiev, W. Dolan, D. Belenko, and L. Vanderwende. 2008. Using contextual speller techniques and language modeling for ESL error correction. IJCNLP.N. Han, J. Tetreault, S. Lee, and J. Ha. 2010. Using an error annotated learner corpus to develop and ESL/EFL error correction System. LREC.A. Rozovskaya and D. Roth. 2010a. Annotating ESL errors: Challenges and rewards. NAACL-BEA workshop.A. Rozovskaya and D. Roth. 2010b. Training paradigms for correcting errors in grammar and usage. NAACL.

0

10

20

30

40

50

60

70

80

90

100

0 20 40 60 80 100

Recall

Precision

(1) Standard Confusion Sets

(2) L1-dependent Conf. Sets enforced throughdecision threshold(3) L1-dependent Conf. Sets enforced througherrors in training