Automated Essay Evaluation Martin Angert Rachel Drossman

Automated Essay Evaluation

Martin Angert

Rachel Drossman

The Problem…

It takes humans a long time to evaluate written text

Lack of teacher time to assess student writing samples

“As I have proven on the board, all of your essays are terrible and I’m tired of reading them. What should I do?”

The Solution…

Automated essay evaluation!

Criterion Evaluation Service

– Critique: evaluates and provides feedback on grammatical, usage, and mechanical errors

– E-Rater 2.0: gives essays a holistic score

Vantage Learning

– IntelliMetric

“Thank God for Artificial Intelligence!”

What is Automated Essay Evaluation?

Teachers assign essays to students

Students submit essays online

Students get feedback

Teachers get summary reports of students’ performance

“I love you e-Rater 2.0!”

Nuts and Bolts

Automated essay evaluation relies on four main areas of Artificial Intelligence

– Machine Learning

– Natural Language Processing

– Pattern Matching

– Heuristics Integration

Machine Learning

Teacher supplies training data

– Corpus of edited and graded essays

Uses statistical methods to evaluate essays

Ex: word sense disambiguation

– Looks at 2 words to the left and right of a word to determine context

“I love Machine Learning!”
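The context window mentioned above can be sketched in a few lines. This is only an illustration of "2 words to the left and right"; the function name `context_window` and the sample sentence are assumptions, and the slides do not say how the statistical model uses the window once it is extracted.

```python
def context_window(tokens, index, size=2):
    """Return up to `size` tokens on each side of tokens[index].

    `size=2` mirrors the two-word window named on the slide; the rest
    of the disambiguation step is not described there and is omitted.
    """
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left, right


# Hypothetical example sentence: which sense of "bank" is meant?
tokens = "he fished from the river bank at dawn".split()
left, right = context_window(tokens, tokens.index("bank"))
print(left, right)
```

Near the start or end of a sentence the window is simply shorter, which is why the slice is clamped at zero.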


Natural Language Processing

Parse trees used to analyze sentence structure

Compares linguistic style of student essays to training data to evaluate grammar, mechanics, and usage

“Bertha, do your hands hurt from processing natural languages all day?”


Pattern Matching

System contains examples of good vocabulary, sentence structure, etc.

Tries to match patterns in student essays and awards corresponding scores


Heuristics Integration

Searches students’ essays for phrases that occur more or less often than expected based on corpus frequencies

Example: repetitious words

– If a single word accounts for more than 5% of the word count in the essay, that word is repetitive
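The 5% rule for repetitious words could look roughly like this. It is a minimal sketch, not the Criterion implementation: the tokenizer, the function name `repetitive_words`, and the optional `ignore` set are assumptions. The slides do not say whether very common words such as "the" are exempt, so `ignore` is left empty by default.

```python
import re
from collections import Counter


def repetitive_words(essay, threshold=0.05, ignore=frozenset()):
    """Map each word whose share of the word count exceeds `threshold`
    to that share. `ignore` is a hypothetical exclusion list."""
    words = re.findall(r"[a-z']+", essay.lower())
    if not words:
        return {}
    total = len(words)
    counts = Counter(words)
    return {
        word: count / total
        for word, count in counts.items()
        if word not in ignore and count / total > threshold
    }


# Hypothetical 20-word input: "nice" is 2 of 20 words (10%).
sample = "nice nice " + " ".join("abcdefghijklmnopqr")
print(repetitive_words(sample))
```

The comparison is strict, so a word at exactly 5% is not flagged, matching "more than 5%" on the slide.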

Criterion Diagnostic Analysis Tools

Grammar

– Fragments
– Run-ons
– Garbled sentences
– S-V agreement
– Ill-formed verb
– Pronoun error
– Missing possessive
– Wrong word

Usage

– Wrong article
– Missing article
– Nonstandard verb or word form
– Confused words
– Wrong word form
– Faulty comparisons

Mechanics

– Spelling
– Capitalization of proper nouns
– Initial capitalization in a sentence
– Missing apostrophe for contractions
– Missing end punctuation
– Comma error

Style

– Repetition
– Inappropriate words
– Sentences containing passive voice
– Long sentences
– Essay statistics: # of words, # of sentences, average # of words per sentence

Org/Dev

– Transitional words and phrases
– Introductory material
– Thesis statement
– Topic sentences
– Supporting ideas
– Conclusion
– Other
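The essay statistics listed under Style (number of words, number of sentences, average words per sentence) are simple to compute. The sketch below assumes a naive sentence splitter on ".", "!" and "?", which abbreviations such as "e.g." would break; the function name `essay_statistics` is illustrative, and the slides do not say how Criterion tokenizes text.

```python
import re


def essay_statistics(essay):
    """Count words and sentences and compute the average sentence length.

    Sentence boundaries are approximated by runs of '.', '!' or '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", essay)
    average = len(words) / len(sentences) if sentences else 0.0
    return {
        "words": len(words),
        "sentences": len(sentences),
        "avg_words_per_sentence": average,
    }


print(essay_statistics("I like cats. Do you like cats? Yes!"))
```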

E-Rater Score Generation

12 features used when scoring an essay

– 11 features reflect essential characteristics in essay writing and are aligned with human scoring criteria

– 12th feature: word count. Weighted less heavily so that longer essays do not automatically earn higher scores

Trained on a sample of 200-250 scored essays with scores between 1 and 6
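To make "weighted less heavily" concrete, here is one way twelve feature values might be combined into a score between 1 and 6. The slides do not describe how e-Rater combines its features, so the weighted sum, the feature scaling, and every number below are assumptions chosen only for illustration.

```python
def holistic_score(features, weights, low=1, high=6):
    """Combine feature values into a score clipped to [low, high].

    Assumes (hypothetically) a weighted sum of features that are
    already scaled to the scoring range, with weights summing to 1.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    raw = sum(f * w for f, w in zip(features, weights))
    return max(low, min(high, round(raw)))


# Hypothetical weights: 11 writing features share most of the weight,
# and the 12th (word count) gets a smaller share.
weights = [0.088] * 11 + [0.032]
features = [4] * 12  # made-up feature values on a 1-6 scale
print(holistic_score(features, weights))
```

In a trained system the weights would presumably be fitted to the 200-250 human-scored essays rather than set by hand.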

Implementation

For GMAT grading using automated essay evaluation…

– Both a human and e-Rater grade the essay on a six-point scale

– If scores agree, essay is assigned that score

– If scores differ by 1 point, essay is assigned score of human grader

– If scores differ by more than 1 point, automated score is discarded and another human grader evaluates the essay

“How am I supposed to get into Harvard when e-Rater gave me a 0.1 on my essay?”
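The GMAT resolution rule above translates directly into code. This sketch uses a hypothetical function name, `resolve_score`, and returns `None` when a second human grader is needed, because the slides do not say how the final score is set after that second reading.

```python
def resolve_score(human, machine):
    """Apply the slide's rule for combining a human and an e-Rater score.

    Returns the assigned score, or None when the scores differ by more
    than 1 point and the essay must go to another human grader.
    """
    difference = abs(human - machine)
    if difference == 0:
        return human  # scores agree: assign that score
    if difference == 1:
        return human  # one point apart: the human grader's score wins
    return None  # automated score discarded, second human needed


print(resolve_score(4, 4))  # agreement
print(resolve_score(4, 5))  # one point apart
print(resolve_score(2, 5))  # escalated to another human grader
```

Under this rule the automated score never overrides the human one; it only triggers a second reading when the two disagree strongly.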

Benefits

Immediate feedback to students

Enables teachers to spend more time with students and less time grading

Provides students with more practice writing

“Thanks a lot e-Rater, now instead of playing soccer I get to stay inside and practice my writing!”

Limitations

Not always as accurate as teacher feedback

– The system would rather miss an error than tell a student that a well-formed construction is ill-formed

Machine cannot understand unique writing styles, humor, irony, etc.

Input sentence: “This presentation deserves an A”

E-Rater Output: “Well-constructed sentence. I concur!”

Limitations - Example

Which sentence do you think is better?

1. It is with the greatest esteem and confidence that I write to support Joey as a candidate for a faculty position. I have known Joey in a variety of capacities for more than five years, and I find him to be one of the most eloquent…

2. It is with chimpanzee greatest esteem and confidence that I write to support Joey as a candidate for a faculty position. I have known Joey in a variety of capacities for more than five years, and I find him to be one of chimpanzee most eloquent…

Conclusion

Currently has over 500,000 users in 445 institutions

Strongest use in the K-12 market

System’s understanding of word meaning (rather than just grammar) is improving

Companies survey current users to receive feedback for future releases

“I pity da fool who doesn’t use e-Rater 2.0!”