Automatic Assessment of Spoken Modern Standard Arabic

Jian Cheng, Jared Bernstein, Ulrike Pado, Masa Suzuki
Pearson Knowledge Technologies, Palo Alto, California

NAACL, Boulder, Colorado, 5 June 2009




Outline

1. Pearson Knowledge Technologies

2. How Versant tests operate

3. Versant Arabic Test (development)

4. Validation evidence

5. Predictive accuracy


Pearson Knowledge Tech. (PKT)

(KAT + Ordinate) are now PKT

KAT ≈ {LSA, Essay Scoring, Write-to-Learn, PTE, etc.}

Ordinate ≈ {Versant, ORF for NCES, VersaReader, PTE, etc.}

PKT is part of Pearson

Pearson ≈ {FT, Economist, Penguin, Longman, PsychCorp, … etc.}

PearsonKT is in Boulder, Colorado and Palo Alto, California.


Test delivery

[Diagram. Labels: Database (tests, prompts, responses); Scoring system; ENGLISH, SPANISH, DUTCH, ARABIC; speech; report; Communication Network; Delivery Interface; California; Anywhere]


How Versant tests operate

[Diagram. Labels: Versant Database; Test Delivery Server; Scoring. Example utterance: “The train’s been delayed by one hour”]


Versant Arabic Test

• DLI purpose
    ~1000 students at DLI need predictive speaking tests

• Requirements
    Accurate test of Arabic listening & speaking
    Convenient to use at DLI and worldwide (ILR is costly)
    Suitable for repeated formative testing
    High peak capacity for mass screening


Construct Comparison

OPI Construct: Oral proficiency, as manifest in an Oral Proficiency Interview, is compatible with communicative competence as reflected in the functional level and/or complexity of content accurately produced.

Versant Construct: facility in spoken language: the ability to understand spoken language and speak appropriately in response at a conversational pace on everyday topics.


Versant Arabic Test: Test Structure

Part A: Reading
Part B: Repeat 1
Part C: Short Answers
Part D: Sentence Builds
Part E: Repeat 2
Part F: Passage Retelling


Versant Scoring

[Diagram. Task labels: Read; Repeat Sentence 1; Sent Build; Repeat Sentence 2; SAQ; Passage. Other label: Human Scoring]

Subscore            Weight
Vocabulary          20%
Sentence Mastery    30%
Fluency             30%
Pronunciation       20%
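The four percentages suggest that the Overall score combines the subscores as a weighted sum. The sketch below shows that arithmetic under two assumptions: the percentages line up with the subscores in the order listed on the slide, and the combination is a plain linear one over subscores on a common scale. The function name `overall_score` and the sample values are illustrative and do not come from the Versant system.

```python
# Subscore weights as listed on the slide (they sum to 100%).
WEIGHTS = {
    "Vocabulary": 0.20,
    "Sentence Mastery": 0.30,
    "Fluency": 0.30,
    "Pronunciation": 0.20,
}


def overall_score(subscores):
    """Weighted sum of the four subscores (assumed linear combination)."""
    missing = set(WEIGHTS) - set(subscores)
    if missing:
        raise ValueError(f"missing subscores: {sorted(missing)}")
    return sum(WEIGHTS[name] * subscores[name] for name in WEIGHTS)


# Made-up subscores for one hypothetical test taker.
example = {
    "Vocabulary": 60,
    "Sentence Mastery": 50,
    "Fluency": 40,
    "Pronunciation": 70,
}
print(overall_score(example))
```

Because Sentence Mastery and Fluency carry 30% each, a change in either moves the result more than the same change in Vocabulary or Pronunciation.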


How Versants are developed (1) (Versant Arabic Test)

[Flow diagram. Labels: Test Spec; Native Test Developers; Item Text; Recorded Items; Ordinate System; Arabic Natives; Arabic Learners; Native Scribes; transcripts; Native Judges; Criteria; scale scores; Scale Estimates; Versant Scores; Concurrent ILR Interviews; ILR Scores; Validation; Internal; External]


Arabic Challenges: Voweling

kutubu al-waladi    the books of the boy
kataba al-waladu    wrote the boy (subject)

• No disambiguating short vowels written
• Vowels carry phonetic information
• Vowels carry grammar information
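The voweling problem can be seen directly in Unicode text. In this small sketch, the two first words from the examples (kutubu and kataba) are written with their short-vowel diacritics, and the diacritics are then removed, as ordinary written Arabic would do. The helper name `strip_short_vowels` is illustrative. It removes every combining mark, which is a simplification of how unvoweled text is actually produced.

```python
import unicodedata


def strip_short_vowels(text):
    """Drop combining marks, which include the Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# kaf, ta, ba with damma (u) on each consonant: "kutubu" (books)
KUTUBU = "\u0643\u064F\u062A\u064F\u0628\u064F"
# kaf, ta, ba with fatha (a) on each consonant: "kataba" (wrote)
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"

unvoweled_a = strip_short_vowels(KUTUBU)
unvoweled_b = strip_short_vowels(KATABA)

# The voweled forms differ, but the written (unvoweled) forms coincide.
print(KUTUBU == KATABA, unvoweled_a == unvoweled_b)
```

Both forms reduce to the same three consonant letters, so a reader or a recognizer has to recover the vowels, and the grammar they carry, from context.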


Complex Morphology

liziyaaratnaa    for visit of us, i.e. for our visit

• Complicates lexicon lookup, frequency estimates…
• “Short” Arabic items are harder than English items with the same number of words
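To make the lexicon-lookup point concrete, here is a toy sketch that splits the slide's example word into the three pieces implied by its gloss (for + visit + our). The clitic tables, the one-entry lexicon, and the function `segment` are hypothetical and cover only this example. Real Arabic morphological analysis is far more involved.

```python
# Tiny, hypothetical clitic inventories in the slide's transliteration.
PREFIXES = {"li": "for"}
SUFFIXES = {"naa": "our"}

# A toy lexicon keyed on stems; it has no entry for full surface forms.
LEXICON = {"ziyaarat": "visit"}


def segment(word):
    """Split at most one leading and one trailing clitic off a word."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[: len(rest) - len(suffix)]
    return prefix, stem, suffix


word = "liziyaaratnaa"
prefix, stem, suffix = segment(word)

# Direct lookup of the surface form fails; lookup of the stem succeeds.
print(word in LEXICON, stem in LEXICON)
print(prefix, stem, suffix)
```

One orthographic word here carries what English spreads over three, which is also why counting words understates the difficulty of an Arabic item.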


Development & Run-time Processes

Compilation of expectation and runtime flow


Prompt Voices and Training Samples

Training data sources

Native Data
Egypt   Syria   Iraq   Palestine   Other   Total
484     281     179    187         517     1648

Learner Data
DLI     Non-DLI   Total
1120    552       1672

Prompt Voices
Country   Egypt   Iraq   Jordan   Morocco   Lebanon   Palestine   Syria
Voices    F, M    F, M   M        F         M         F, M        F, M


Validation Criteria

Reliability: Scores are consistent

Validity:
    Native and non-native speakers should be clearly distinct
    MSA and dialect speakers should be distinct (since we’re testing MSA)
    Machine scores should predict human scores


Reliability

Score               Split-Half Reliability   Test-Retest Reliability
                    (N = 134)                (N = 100)
Overall             0.98                     0.97
Sentence Mastery    0.97                     0.96
Vocabulary          0.89                     0.82
Fluency             0.97                     0.96
Pronunciation       0.96                     0.94
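The slides do not say how the split-half figures were computed. A common textbook procedure, sketched here as an assumption, is to score two halves of the test separately for each test taker, correlate the half scores, and apply the Spearman-Brown correction to estimate full-length reliability. The odd/even split, the function names, and the toy data are all illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores holds one list of per-item scores per test taker."""
    first_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(first_half, second_half))


# Made-up item scores for four hypothetical test takers.
toy = [
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
print(round(split_half_reliability(toy), 2))
```

Test-retest reliability, the second column, needs no correction in this scheme: it is simply the correlation between scores from two administrations to the same people.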


Native ~ Non-Native Scores


Natives by Countries


Educated ~ Uneducated Speakers

[Plot: Cumulative Density versus Arabic Overall Score]


Machine – Human Comparison

Score               Correlation (N = 134)
Overall             0.97
Sentence Mastery    0.97
Vocabulary          0.96
Fluency             0.84
Pronunciation       0.83


How Versants Compare to OPIs

[Scatter plot: ILR OPI Score (logits) versus Versant Arabic Overall Score]

N = 118, r = 0.87


Spanish & English: Versant ~ Human

[Scatter plots, one per language. Recoverable axis labels: ILR OPI Score (logits); Versant Spanish Score]

Language   N     r
Spanish    37    0.92
English    151   0.86


Summary

• Versant Arabic Test (VAT) is in operation
• Based on a large and wide body of transcribed spoken material
• VAT is available on demand
• Returns consistent, accurate scores that reflect real-time skills with MSA
• VAT can triage or screen for OPI tests


النهاية (The End)

Thanks to Waheed Samy, Naima Bousofara Omar, Eli Andrews, Mohamed Al-Saffar, Nazir Kikhia, Rula Kikhia, and Linda Istanbulli for item development and data collection/transcription in Arabic, and to Andy Freeman for providing diacritic markings.