CIKM14: Fixing grammatical errors by preposition ranking


Correct Me If I’m Wrong: Fixing Grammatical Errors by Preposition Ranking

Roman Prokofyev, Ruslan Mavlyutov, Gianluca Demartini and Philippe Cudré-Mauroux

eXascale Infolab

University of Fribourg, Switzerland

November 4th, CIKM’14

Shanghai, China


Motivations and Task Overview

• Grammatical correction is important by itself;

• It is also a component of Machine Translation and Speech Recognition systems.

Task: correction of textual content written by English learners.

⇒ Rank candidate prepositions by their likelihood of being correct, in order to potentially replace the original.

Example: “I am new in android programming.” → [to, at, for, …]

State of the Art in Grammar Correction

Approaches:

• Multi-class classification using pre-defined candidate set

• Statistical Machine Translation

• Language modeling

Features:

• Lexical features, POS tags

• Head verbs/nouns, word dependencies

• N-gram counts from large corpora (Google Web 1-T)

• Confusion matrix


Key Ideas

• The usage of a particular preposition is governed by a particular word or n-gram;

⇒ Task: select and aggregate the n-grams that influence preposition usage;

⇒ Use n-gram association measures to score each preposition.

Processing Pipeline

[Pipeline diagram] For each document and each sentence: Sentence Tokenization → n-gram construction → Feature extraction (backed by n-gram statistics) → Supervised Classifier → ranked list of prepositions for each n-gram → corrected sentence → corrected document.

Tokenization and n-gram distance

Example n-grams around the preposition slot (PREP) in “… the force be PREP you .”:

N-gram           Type    Distance
the force PREP   3gram   -2
force be PREP    3gram   -1
be PREP you      3gram    0
PREP you .       3gram    1

N-gram           Type    Distance
be PREP          2gram   -1
PREP you         2gram    1
PREP .           2gram    2
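The construction above can be sketched in Python. The distance convention (the signed offset of the context word nearest to the PREP slot, 0 when the slot sits inside the n-gram) is reconstructed from the example table; the paper's actual implementation may differ in details.

```python
def prep_ngrams(tokens, p, sizes=(2, 3), max_dist=2):
    """Enumerate n-grams around the preposition at index p, replacing it
    with a PREP placeholder and tagging each with (size, distance).
    Distance = offset of the nearest context word from the preposition;
    0 means the PREP slot is interior to the n-gram."""
    grams = []
    for n in sizes:
        for d in range(-max_dist, max_dist + 1):
            if d < 0:                       # context entirely to the left
                lo, hi = p + d - (n - 2), p + d + 1
                if lo < 0:
                    continue
                gram = tokens[lo:hi] + ["PREP"]
            elif d > 0:                     # context entirely to the right
                lo, hi = p + d, p + d + n - 1
                if hi > len(tokens):
                    continue
                gram = ["PREP"] + tokens[lo:hi]
            else:                           # PREP inside: one word on the left
                if n < 3 or p == 0 or p + n - 1 > len(tokens):
                    continue
                gram = tokens[p - 1:p] + ["PREP"] + tokens[p + 1:p + n - 1]
            grams.append((tuple(gram), n, d))
    return grams
```

On the tokens of the example sentence with the preposition at index 4, this reproduces the rows of the table, including the skip variants such as “the force PREP” at distance -2.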

N-gram association measures

Motivation: use association measures to compute a score that is proportional to the likelihood of an n-gram appearing together with a preposition.

Background N-gram collection: Google Books N-grams.


N-gram PMI scores by preposition

force be PREP (with: -4.9), (under: -7.86), (at: -9.26), (in: -9.93), …

be PREP you (with: -1.86), (amongst: -1.99), (beside: -2.26), …

PREP you . (behind: -0.71), (beside: -0.82), (around: -0.84), …
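A minimal sketch of the PMI scoring, assuming raw counts from a background collection such as Google Books N-grams (the helper names and toy counts below are ours, for illustration only):

```python
import math

def pmi(joint, ngram, prep, total):
    """Pointwise mutual information between an n-gram context and a
    preposition filling its slot, from raw corpus counts:
    log( P(context, prep) / (P(context) * P(prep)) )."""
    return math.log((joint / total) / ((ngram / total) * (prep / total)))

def rank_prepositions(joint_counts, ngram_count, prep_counts, total):
    """Rank candidate prepositions for one context by PMI, best first."""
    scored = {p: pmi(c, ngram_count, prep_counts[p], total)
              for p, c in joint_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

With toy counts where “with” co-occurs with the context far more often than chance, “with” ranks first, mirroring the “be PREP you” example above.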

N-grams with determiner skips

Around 30% of prepositions are used in proximity to determiners (“a”, “the”, etc.)

[Figure: correct-preposition ranks for n-grams with determiners kept (“one of the most”) vs. with determiner skips (“one of most”)]
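One simple way to implement the determiner skips, so that “one of the most” and “one of most” map to the same n-gram key and pool their background statistics (our sketch; the determiner list is illustrative):

```python
# Small closed class of determiners to skip; extend as needed.
DETERMINERS = {"a", "an", "the", "this", "that", "these", "those"}

def skip_determiners(tokens):
    """Drop determiner tokens before n-gram construction, merging
    determiner and determiner-free variants of the same context."""
    return [t for t in tokens if t.lower() not in DETERMINERS]
```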

PMI-based Features

• Average rank of a preposition among the ranks of the considered n-grams;

• Average PMI score of a preposition among the PMI scores of the considered n-grams;

• Total number of times a given preposition appears in the first position of the rankings of the considered n-grams.

Calculated across two logical groups of “considered n-grams”:

• by n-gram size;
• by n-gram distance.
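The three aggregates might look like this for one group of considered n-grams (a sketch; the function name and the ranking representation are ours):

```python
from statistics import mean

def pmi_features(prep, rankings):
    """Aggregate features for one candidate preposition.
    `rankings`: one ranked list per considered n-gram, each a list of
    (preposition, pmi_score) pairs sorted by descending PMI."""
    ranks, scores = [], []
    for ranking in rankings:
        for rank, (p, score) in enumerate(ranking):
            if p == prep:
                ranks.append(rank)
                scores.append(score)
                break
    return {
        "avg_rank": mean(ranks) if ranks else None,    # average rank
        "avg_pmi": mean(scores) if scores else None,   # average PMI score
        "top_count": sum(r == 0 for r in ranks),       # times ranked first
    }
```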

Central N-grams

[Figure: distribution of correct-preposition counts at the top of the PMI rankings, with respect to n-gram distance.]

Other features

• Confusion matrix values

• POS tags: 5 most frequent tags + “OTHER” catch-all tag;

• Preposition itself: a sparse vector of the size of the candidate preposition set.

Confusion matrix (excerpt):

      to      in      of      for     on      at      with    from
to    0.958   0.007   0.002   0.011   0.004   0.003   0.005   0.002
in    0.037   0.79    0.01    0.009   0.066   0.036   0.015   0.008

Preposition selection

Supervised Learning algorithm.

• Two-class classification with a confidence score for every preposition from the candidate set;

• Every candidate preposition receives its own set of feature values.

Classifier: random forest.
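The two-class setup can be sketched with scikit-learn's random forest. The features and labels below are synthetic stand-ins for the PMI-based and confusion-matrix features of the previous slides:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# One (feature vector, is-correct label) pair per candidate preposition
# per correction site; columns stand in for avg_rank, avg_pmi, etc.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 4))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)  # synthetic

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

def best_preposition(candidates, feature_rows):
    """Score each candidate's own feature vector with the two-class
    classifier and return the candidate whose 'correct' class
    probability is highest."""
    probs = clf.predict_proba(np.asarray(feature_rows))[:, 1]
    return candidates[int(np.argmax(probs))]
```

Each candidate gets its own feature vector, and the ranking falls out of the per-candidate confidence scores, matching the two-class formulation above.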


Training/Test Collections

Training collection:

• First Certificate in English (Cambridge exams)

Test collections:

• CoNLL-2013 (50 essays written by NUS students)

• StackExchange (historical edits)

               Cambridge FCE   CoNLL-2013   StackExchange
# sentences    27k             1.4k         6k

Experiments: overview

1. Feature importance scores

2. N-gram sizes

3. N-gram distances

4. CoNLL and StackExchange evaluation


Experiments: Feature Importances

All top features except the confusion matrix are based on the PMI scores.

Feature name                       Importance score
Confusion matrix probability       0.28
Top preposition counts (3grams)    0.13
Average rank (distance=0)          0.06
Central n-gram rank                0.06
Average rank (distance=1)          0.05

N-gram Sizes and Distances

Distance restriction   Precision   Recall    F1 score
(0)                    0.3077*     0.3908*   0.3442*
(-1,1)                 0.3231      0.4166    0.3637
(-2,2)                 0.3214      0.4222    0.3648
(-5,5)                 0.3223      0.4028    0.3577
None                   0.3214      0.3924*   0.3532

N-gram sizes                   Precision   Recall    F1 score
{2,3}-grams                    0.3005*     0.3879*   0.3385*
{3}-grams + skip n-grams (4)   0.2931*     0.4187    0.3447
{2,3}-grams + skip n-grams     0.3231      0.4166    0.3637

* indicates a statistically significant difference to the best-performing approach (shown in bold on the original slide)

Test Collection Evaluation

Collection      Approach                                         Precision   Recall    F1 score
CoNLL-2013      NARA Team @CoNLL2013                             0.2910      0.1254    0.1753
CoNLL-2013      N-gram-based classification                      0.2592      0.3611    0.3017
StackExchange   N-gram-based classification                      0.1585      0.2185    0.1837
StackExchange   N-gram-based classification (cross-validation)   0.2704      0.2961    0.2824

Takeaways

• PMI association measures
• + two-class preposition ranking
⇒ together, these significantly outperform the state of the art.

• Skip n-grams contribute significantly to the final result.


Roman Prokofyev (@rprokofyev)

eXascale Infolab (exascale.info), University of Fribourg, Switzerland

http://www.slideshare.net/eXascaleInfolab/

Two-class vs. Multi-class classification

Multi-class:

• input: one feature vector

• output: one of N classes (1-N)

Two-class (still with N possible outcomes):

• input: N feature vectors

• output: 1/0 for each of N feature vectors


Training/Test Collections

Training collection:

• Cambridge Learner Corpus: First Certificate in English (exams of 2000-2001), CLC FCE

Test collections:

• CoNLL-2013 (50 essays written by NUS students)

• StackExchange (StackOverflow + Superuser)

                         CLC FCE   CoNLL-2013   SE
# sentences              27k       1.4k         6k
# prepositions           60k       3.2k         15.8k
# prepositional errors   2.9k      152          6k
% errors                 4.8       4.7          38.2

Evaluation Metrics

Precision = valid_suggested_corrections / total_suggested_corrections

Recall = valid_suggested_corrections / total_valid_corrections

F1 = 2 · Precision · Recall / (Precision + Recall)
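The metrics above, as code (our representation: corrections as sets of (position, preposition) pairs):

```python
def evaluate(suggested, gold):
    """Precision, recall, and F1 over sets of suggested corrections,
    each correction a (position, preposition) pair."""
    valid = len(suggested & gold)          # valid suggested corrections
    precision = valid / len(suggested) if suggested else 0.0
    recall = valid / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```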