Russian Morphological Processing for ICALL

Markus Dickinson and Joshua Herring
Dept. of Linguistics, Indiana University

ACL Workshop on Building Educational Applications
Columbus, OH
June 19, 2008

Outline:
- Introduction & Motivation
- ICALL context
  - System architecture
  - Exercise design
  - Error types
- Morphological analysis
  - Lexicon
  - Error detection
- Constructing the Lexicon
- Summary & Outlook
- References

Introduction & Motivation

Intelligent computer-aided language learning (ICALL) systems are ideal for language pedagogy:
- provide additional practice outside the classroom
- aid awareness of language forms & rules (see Amaral and Meurers 2006)

However:
- Few ICALL systems in existence today
  - German (Heift and Nicholson 2001)
  - Portuguese (Amaral and Meurers 2006, 2007)
  - Japanese (Nagata 1995)
- Processing of ill-formed learner text focuses on a limited set of languages and language types
  - See Vandeventer Faltin (2003) and references therein

⇒ Should expand to more language families

Re-usability

Significant overhead in developing an ICALL system

Effort in producing an ICALL system can be reduced by:
- reusing system architecture
  - evaluating and optimizing the architecture
- adapting existing NLP tools
  - and/or developing resource-light technology

It is important to determine where and how reuse of technology is appropriate

Russian ICALL

We are developing an ICALL system for beginning learners of Russian
- Based on the TAGARELA system for Portuguese (Amaral and Meurers 2006, 2007)
- Q1: How can the technology in TAGARELA be adapted for efficient & accurate use with Russian?
  - Requires development of techniques to parse ill-formed input for a morphologically rich language
- Q2: What kind of processing do we need, and are existing NLP tools reusable for this purpose?
  - Q2a: What is the context for processing (i.e., the exercise requirements)?
  - Q2b: What are the expected types of morphological errors?

System architecture

From TAGARELA, we retain:
- Modular separation of activities from analysis
  - Each activity type has its own directory, to ease:
    - loading different kinds of external files (e.g., sound)
    - calling different processing tools (Amaral 2007)
- Web processing code
  - e.g., code for handling user logins, design of user databases (for tracking learner information)
  - Minimizes amount of online overhead in our system, allowing us to focus on linguistic processing
- Idea of using annotation-based processing (cf. Amaral and Meurers 2007)
  - Before error detection/diagnosis, annotate learner input with linguistic properties that can be automatically determined
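The annotation-based idea can be illustrated with a small Python sketch. It is not the TAGARELA code or the authors' implementation: the `Token` class, the two annotation layers, and the two-word lexicon are all hypothetical, chosen only to show annotation happening before error detection.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    annotations: dict = field(default_factory=dict)


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation."""
    words = (w.strip(".,!?") for w in text.split())
    return [Token(w) for w in words if w]


def annotate_script(tokens):
    """Annotation layer 1 (hypothetical): is every letter Cyrillic?"""
    for t in tokens:
        t.annotations["cyrillic"] = all(
            "\u0400" <= ch <= "\u04ff" for ch in t.form if ch.isalpha()
        )
    return tokens


def annotate_lexicon(tokens, lexicon):
    """Annotation layer 2 (hypothetical): is the lower-cased form known?"""
    for t in tokens:
        t.annotations["in_lexicon"] = t.form.lower() in lexicon
    return tokens


def detect_unknown(tokens):
    """Error detection reads the annotations only, not the raw input."""
    return [t.form for t in tokens if not t.annotations["in_lexicon"]]


TOY_LEXICON = {"я", "думаю"}  # assumption: a two-word toy lexicon
tokens = annotate_lexicon(annotate_script(tokenize("Я думаит.")), TOY_LEXICON)
unknown = detect_unknown(tokens)
```

Because each layer only adds keys to `annotations`, further layers (for example a morphological analysis) could be added without changing the detection step.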

Exercise design

Goals of the system:
- Support an 8-week “survival” Russian course
  - Basics of the language
  - Contextualized practice to support traveling to Russia
- Cover a range of exercises, all of which require some morphosyntactic analysis of Russian
  - listening, video-based narratives, reading practice, exercises centered around maps and locations, ...

A simple example of a Russian verbal exercise:

(1) Вчера      он   ______  (видеть)   фильм.
    vchera     on           (videt’)   fil’m
    Yesterday  he           (to see)   a film

⇒ This set-up constrains what types of errors learners are allowed to make
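One way to picture how such an exercise constrains the learner's input is to store it as a small record. The following Python sketch is an assumption for illustration only: the `FillInBlank` class, its field names, and the target features (inferred here from "yesterday" and "he") do not come from the slides, and the answer "видел" is just one grammatical past-tense form used to exercise the code.

```python
from dataclasses import dataclass, field

BLANK = "______"


@dataclass
class FillInBlank:
    template: str  # sentence with a {slot} placeholder for the verb
    lemma: str     # verb the learner is asked to inflect
    target: dict = field(default_factory=dict)  # features the answer should carry

    def prompt(self):
        """Text shown to the learner: blank plus the lemma in parentheses."""
        return self.template.format(slot=f"{BLANK} ({self.lemma})")

    def fill(self, answer):
        """Full sentence with the learner's answer inserted."""
        return self.template.format(slot=answer.strip())


# Hypothetical encoding of example (1).
exercise = FillInBlank(
    template="Вчера он {slot} фильм.",
    lemma="видеть",
    target={"tense": "past", "gender": "masc", "number": "sg"},
)
```

Only the slot varies, so downstream analysis can treat the rest of the sentence as known context.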

Expected error types (1)

We focus on morphological errors, as these are common across exercises

1. Inappropriate verb stem
   1.1 Always inappropriate (spelling error)
       - Requires some spell-checking technology
   1.2 Inappropriate for this context
       - Requires activity model specifying appropriate verbs

External needs: lexicon, spell checker
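The two stem-error cases can be told apart with two lookups, sketched here in Python. The stem sets are toy assumptions, and `difflib.get_close_matches` from the standard library merely stands in for whatever spell-checking technology a real system would use.

```python
import difflib

# Assumption: toy stem inventories, not the system's actual lexicon.
LEXICON_STEMS = {"виде", "дума", "начина"}
ACTIVITY_STEMS = {"виде"}  # stems the activity model accepts for this item


def classify_stem(stem, lexicon=LEXICON_STEMS, allowed=ACTIVITY_STEMS):
    """Return (label, suggestions) for a stem split off the learner's answer."""
    if stem not in lexicon:
        # 1.1: not a stem of the language at all, so offer a near match.
        suggestions = difflib.get_close_matches(stem, sorted(lexicon), n=1)
        return ("1.1 spelling error", suggestions)
    if stem not in allowed:
        # 1.2: a real stem, but not one this exercise asks for.
        return ("1.2 inappropriate for this context", sorted(allowed))
    return ("ok", [])
```

For instance, `classify_stem("начина")` is flagged as 1.2 because the stem exists in the toy lexicon but is not licensed by the toy activity model.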

Expected error types (2)

2. Inappropriate verb affix
   2.1 Always inappropriate (spelling error)
   2.2 Always inappropriate for verbs
       - ев is an appropriate nominal ending:
         (2) *начина-ев
             begin-??
   2.3 Inappropriate for this verb
       - ит is for a different verb conjugation:
         (3) *начина-ит   (cf. начина-ет)
             begin-3s

External needs: lexicon, spell checker
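The three affix cases correspond to three progressively wider lookups. The Python sketch below assumes a toy affix inventory (a subset of present-tense endings for the two conjugations, plus the nominal ending ев) and a hypothetical table mapping stems to conjugation class; none of these tables is from the slides.

```python
# Assumption: toy subsets of present-tense endings per conjugation class.
VERB_AFFIXES = {
    1: {"ю", "ешь", "ет"},  # first conjugation
    2: {"ю", "ишь", "ит"},  # second conjugation
}
NOMINAL_AFFIXES = {"ев"}            # toy set of non-verbal endings
STEM_CONJUGATION = {"начина": 1}    # hypothetical lexicon entry


def classify_affix(stem, affix):
    """Label an affix attached to a known verb stem."""
    own = VERB_AFFIXES[STEM_CONJUGATION[stem]]
    if affix in own:
        return "ok"
    if any(affix in affixes for affixes in VERB_AFFIXES.values()):
        return "2.3 inappropriate for this verb"
    if affix in NOMINAL_AFFIXES:
        return "2.2 always inappropriate for verbs"
    return "2.1 always inappropriate (spelling error)"
```

The order of the checks matters: an affix is only called a spelling error after it has failed to match any verbal or nominal ending.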

Expected error types (3)

3. Inappropriate combination of stem and affix
   - The verb for 'can' varies between the stems мог and мож (e.g., мож-ем 'we can')
     (4) *мож-у   (cf. мог-у)
         can-1s

External needs: lexicon
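A lexicon that records which stem alternant licenses which affix is enough to catch this error type. The sketch below is a hypothetical Python encoding of the present-tense paradigm of мочь; the `PARADIGMS` layout and the function name are assumptions, and the check presumes that stem and affix have each already passed the checks for error types 1 and 2.

```python
# Assumption: toy paradigm table keyed by lemma, then by stem alternant.
PARADIGMS = {
    "мочь": {
        "мог": {"у", "ут"},
        "мож": {"ешь", "ет", "ем", "ете"},
    }
}


def check_combination(lemma, stem, affix):
    """Return (label, form); form is the accepted or the suggested word."""
    stems = PARADIGMS[lemma]
    if affix in stems.get(stem, set()):
        return ("ok", stem + affix)
    # The affix is valid for this verb, but with a different stem alternant.
    for other, affixes in stems.items():
        if affix in affixes:
            return ("3 inappropriate stem-affix combination", other + affix)
    return ("unknown affix", None)
```

Returning the repaired form alongside the label gives a feedback module something concrete to show the learner.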

Expected error types (4)

4. Well-formed word in inappropriate context
   4.1 Inappropriate agreement features
       - Need to know best analysis in context of verb & subject
         (5) *Я  думает
             I   think-3sg
   4.2 Inappropriate verb form (tense, (im)perfective, etc.)
       - Activity model can often indicate correct form, e.g., perfective (completed action) or imperfective
       - Need to know best analysis in context, e.g., infinitive verb is governed by a verb selecting for infinitive

External needs: morphological analyzer, POS tagger
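Once each word has a morphological analysis, the agreement check in 4.1 reduces to comparing feature bundles. In the Python sketch below, two hand-written dictionaries stand in for the output of a morphological analyzer and POS tagger; they are assumptions for illustration and cover only the forms needed for example (5).

```python
# Assumption: toy analyses standing in for a real analyzer's output.
SUBJECT_FEATURES = {"я": ("1", "sg"), "ты": ("2", "sg"), "он": ("3", "sg")}
VERB_FEATURES = {
    "думаю": ("1", "sg"),
    "думаешь": ("2", "sg"),
    "думает": ("3", "sg"),
}


def check_agreement(subject, verb):
    """Compare person/number of a pronoun subject and a finite verb."""
    s = SUBJECT_FEATURES[subject.lower()]
    v = VERB_FEATURES[verb.lower()]
    if s == v:
        return "ok"
    return f"4.1 agreement error: subject is {s[0]}{s[1]}, verb is {v[0]}{v[1]}"
```

Note that the verb form itself is well formed here; the error only becomes visible when the two analyses are compared in context.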

Using the error taxonomy

Even for simple exercises, there is a range of errors, requiring new technology

- Error types #1 through #3 make no use of context
  - Only need information from activity model and lexicon to tell whether the word is valid
  - Priority is thus to develop or acquire a lexicon
- Error type #4 requires contextual information, as the words are well-formed
  - Requires morphological analysis, based on a lexicon
  - Ideally, the lexicon design should be integrated with morphological analysis
- No category for argument structure misuse or word order variation, as these are syntactic errors, not morphological

Morphological analysis

Annotation of input must be able to determine morphological properties, independent of surrounding context

- We cannot assume well-formed input, as traditional morphological analyzers do
- We need ready access to alternative analyses, especially for learner innovations

    (6) душ-у
        soul-N.acc?
        *shower-V.1s?

- We need easy implementation of activity-specific heuristics, e.g., weighting analyses

Finite State Morphology is ideal for this purpose (see, e.g., Roark and Sproat 2007)
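
The requirements above (alternative analyses plus activity-specific weighting) can be sketched as follows. This is a minimal illustration, not the authors' analyzer: the `Analysis` class, the hand-written candidates for душ-у, the numeric weights and the `rank` function with its `bonus` parameter are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Analysis:
    gloss: str         # e.g. "soul-N.acc"
    pos: str           # part of speech of the hypothesized analysis
    well_formed: bool  # False for a learner innovation
    weight: float      # lower is better (assumed convention)

def analyses_for_dushu():
    # Hand-written candidates for душ-у, following example (6).
    return [Analysis("soul-N.acc", "N", True, 1.0),
            Analysis("shower-V.1s", "V", False, 2.0)]

def rank(analyses, expected_pos=None, bonus=1.5):
    """Re-rank analyses; an activity that expects a given POS favors it."""
    def score(a):
        return a.weight - (bonus if a.pos == expected_pos else 0.0)
    return sorted(analyses, key=score)

# Without activity information the well-formed noun reading comes first.
print([a.gloss for a in rank(analyses_for_dushu())])
# In an activity that expects a verb, the learner innovation is preferred.
print([a.gloss for a in rank(analyses_for_dushu(), expected_pos="V")])
```

Keeping the ill-formed reading in the candidate list, rather than discarding it, is what lets an activity-specific heuristic promote it later.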

The nature of the lexicon

Goal: Accurately obtain partial information from well-formed and ill-formed input

Proposal: Use a fully-specified lexicon, implemented as a Finite State Transducer (FST), indexed by both word edges

- Russian morphological information is at word edges, i.e., prefixes and suffixes
- Analysis proceeds by working inwards, one character at a time, beginning at each end of an input item
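
Indexing by both word edges can be pictured with two character tries, one read from the left edge and one read from the right. The sketch below is a simplification under stated assumptions: it uses plain tries instead of an FST, and the toy stem and ending lists as well as the names `build_trie`, `match_lengths` and `edge_analysis` are hypothetical.

```python
def build_trie(words):
    """Build a character trie; the key None marks the end of an entry."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root

def match_lengths(trie, chars):
    """Walk the trie one character at a time; return lengths of complete entries."""
    lengths, node = [], trie
    for i, ch in enumerate(chars, start=1):
        node = node.get(ch)
        if node is None:
            break
        if None in node:
            lengths.append(i)
    return lengths

# Toy data (assumed): two stems and three endings.
STEMS = build_trie(["дума", "начина"])
ENDINGS_REVERSED = build_trie([e[::-1] for e in ["ю", "ет", "ит"]])

def edge_analysis(word):
    """Scan inward from both edges and report where the two scans meet."""
    left = match_lengths(STEMS, word)                    # from the left edge
    right = match_lengths(ENDINGS_REVERSED, word[::-1])  # from the right edge
    return [(word[:left_len], word[len(word) - right_len:])
            for left_len in left for right_len in right
            if left_len + right_len == len(word)]

print(edge_analysis("думаю"))     # [('дума', 'ю')]
print(edge_analysis("начинаит"))  # [('начина', 'ит')]
```

Because each scan is independent, `match_lengths` still returns partial information from one edge when the other edge of an ill-formed input matches nothing.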

Lexical chains

Specifically, morphological endings are stored as separate chains, attached to the main chain as appropriate

- Read symbols from input string one at a time, building a set of hypotheses about the proper analysis
  - set of legal continuations of the current string
  - set of continuations that can be obtained through application of a repair operation (insert, delete, etc.)

Consider дума-ю (‘think-1sg’):

- Up to morpheme boundary, identical to some form of дума (duma), ‘parliament’
- At hypothesized boundary, both competing hypotheses (‘think’ and ‘parliament’) are possible
  - For ‘think’, continuing to ю is legal
  - For ‘parliament’, continuing to ю requires a repair
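
The дума-ю walk-through can be sketched as a set of hypotheses advanced one input symbol at a time. This is an illustrative sketch, not the authors' transducer: the four-form toy lexicon, the `MAX_REPAIRS` bound and the function names are assumptions, and only two repair operations (delete and substitute) are modeled, with insertion left out for brevity.

```python
MAX_REPAIRS = 1  # assumed bound that keeps the hypothesis set small

def build_trie(forms):
    """Store full forms in a character trie; the key None holds a form's tag."""
    root = {}
    for form, tag in forms.items():
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[None] = tag
    return root

# Toy lexicon (assumed): a few forms of 'think' and 'parliament'.
TRIE = build_trie({"думаю": "think-1sg", "думает": "think-3sg",
                   "дума": "parliament-nom.sg", "думу": "parliament-acc.sg"})

def step(hyps, ch):
    """Advance each hypothesis (node, repairs) over one input symbol."""
    new = []
    for node, repairs in hyps:
        if ch in node:                               # legal continuation
            new.append((node[ch], repairs))
        if repairs < MAX_REPAIRS:
            new.append((node, repairs + 1))          # repair: delete ch
            for sym, child in node.items():          # repair: substitute ch
                if sym is not None and sym != ch:
                    new.append((child, repairs + 1))
    return new

def analyze(word):
    """Return (tag, repairs) pairs for every lexicon form reached, best first."""
    hyps = [(TRIE, 0)]
    for ch in word:
        hyps = step(hyps, ch)
    best = {}
    for node, repairs in hyps:
        if None in node:
            tag = node[None]
            best[tag] = min(repairs, best.get(tag, repairs))
    return sorted(best.items(), key=lambda item: (item[1], item[0]))

print(analyze("думаю"))  # [('think-1sg', 0), ('parliament-nom.sg', 1)]
```

The ‘think’ reading is reached with no repairs, while the ‘parliament’ reading survives only by deleting ю, mirroring the two cases in the list above.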

Information for feedback

As it changes state, the transducer will add information to the current set of analyses:

- Append input symbol to output
- Add morphological features, generally when a transition crosses a morphological boundary
- Add corrections on the input string, when phonological processes have been misapplied

Hypothesizing morpheme boundaries means we can:

- segment word into its likely component parts
- analyze each part independently of the others
  - e.g., ignore an erroneous morpheme while identifying an adjoining correct morpheme
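
The three kinds of information added during a transition can be sketched with a toy arc table. Everything here is an assumption made for illustration: the example word книг-и (‘book-nom.pl’) and the misspelling *книг-ы (Russian spelling writes и, not ы, after г) do not come from the slides, and the `ARCS` table, the `Analysis` class and `transduce` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Analysis:
    output: str = ""                                 # input symbols copied so far
    features: list = field(default_factory=list)     # morphological features
    corrections: list = field(default_factory=list)  # notes usable for feedback

# Toy transducer (assumed). Each arc maps
# (state, input symbol) -> (next state, features to add, correction or None).
ARCS = {
    (0, "к"): (1, [], None),
    (1, "н"): (2, [], None),
    (2, "и"): (3, [], None),
    (3, "г"): (4, [], None),
    # Arcs leaving state 4 cross the stem/ending boundary, so they add features.
    (4, "и"): (5, ["N", "nom", "pl"], None),
    (4, "ы"): (5, ["N", "nom", "pl"], "expected и after г"),
}
FINAL_STATES = {5}

def transduce(word):
    """Run the toy transducer; return an Analysis, or None if the word is rejected."""
    state, analysis = 0, Analysis()
    for ch in word:
        arc = ARCS.get((state, ch))
        if arc is None:
            return None
        state, feats, correction = arc
        analysis.output += ch              # append input symbol to output
        analysis.features.extend(feats)    # add features at the boundary
        if correction:
            analysis.corrections.append(correction)
    return analysis if state in FINAL_STATES else None

print(transduce("книги"))
print(transduce("книгы"))
```

The second call still yields the full feature set for the ending, with the correction recorded alongside it, so the stem analysis is not lost because of the erroneous ending.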

Efficiency

Is fully specifying every word wasteful of memory?

- Since the lexicon is an FST, sections shared across forms will only be stored once
  - stems which require such affixes simply point to them
- Added advantage: analyzer operating over FST lexicon retains explicit knowledge of state
  - easy to entertain competing analyses (Cavar 2008)
  - easy to return to previous points in an analysis to resolve ambiguities (cf., e.g., Beesley and Karttunen 2003)

The error taxonomy constrains the search, so not all possible paths have to be entertained simultaneously
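
The memory argument can be illustrated with stems that hold a reference to one shared ending chain. This is a sketch with assumed toy data: the `FIRST_CONJ` table lists only three present-tense endings, and a Python dictionary stands in for a chain of FST states.

```python
# One ending chain, built once (toy data, assumed): present-tense endings
# that follow first-conjugation stems such as дума- and начина-.
FIRST_CONJ = {"ю": "1sg", "ешь": "2sg", "ет": "3sg"}

# Each stem stores a reference to the shared chain rather than its own copy.
STEMS = {"дума": FIRST_CONJ, "начина": FIRST_CONJ}

def expand(stem):
    """Enumerate the fully specified forms reachable from a stem."""
    return {stem + ending: tag for ending, tag in STEMS[stem].items()}

print(STEMS["дума"] is STEMS["начина"])  # True: the chain is stored once
print(expand("начина"))
```

Every form remains fully specified from the analyzer's point of view, yet adding a new stem of the same class costs only one additional reference.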

Sketch of error detection

Analyzer will try to build a path based on information it has

- Inappropriate ending for a verb

    (7) *начина-ев
         begin-??

  - Analyzers working from both directions will find same morpheme boundary
  - Analyses of начина- and of -ев are easily identified as incompatible

Sketch of error detection

Analyzer will try to build a path based on information it has
- Inappropriate ending for this verb

(7) *начина-ит
    begin-3s
    (cf. начина-ет)

- Analyzers working from both directions will find same morpheme boundary
- Analyses of начина- and of -ит do not match in features
- Morphological information from affix will enable the repair operation substitution to find the right continuation.
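
The detection steps sketched above can be illustrated with a small Python example. This is only a sketch under stated assumptions: plain dictionaries stand in for the FST lexicon, and the entries, the feature names (pos, conj, agr, case, num), and the functions find_boundary and diagnose are all hypothetical, not taken from the system described in the talk.

```python
# Toy lexicons: hypothetical entries and feature names, for illustration only.
# A real system would encode these as finite-state transducers.
STEMS = {
    "начина": {"pos": "verb", "conj": 1},  # 'begin', first conjugation
    "говор": {"pos": "verb", "conj": 2},   # 'speak', second conjugation
}
SUFFIXES = {
    "ет": {"pos": "verb", "conj": 1, "agr": "3s"},
    "ит": {"pos": "verb", "conj": 2, "agr": "3s"},
    "ев": {"pos": "noun", "case": "gen", "num": "pl"},
}


def find_boundary(word):
    """Return (stem, suffix) if a left-to-right stem match and a
    right-to-left suffix match meet at the same position, else None."""
    left = {i for i in range(1, len(word)) if word[:i] in STEMS}
    right = {i for i in range(1, len(word)) if word[i:] in SUFFIXES}
    shared = left & right
    if not shared:
        return None
    i = max(shared)  # prefer the longest stem
    return word[:i], word[i:]


def diagnose(word):
    """Compare stem and suffix features; on a clash, try substitution."""
    split = find_boundary(word)
    if split is None:
        return {"status": "unanalyzed"}
    stem, suffix = split
    stem_f, suf_f = STEMS[stem], SUFFIXES[suffix]
    # Features present on both sides must agree.
    clashes = sorted(k for k in stem_f.keys() & suf_f.keys()
                     if stem_f[k] != suf_f[k])
    if not clashes:
        return {"status": "ok", "stem": stem, "suffix": suffix}
    # Substitution: keep what the affix alone expresses (e.g. agr=3s)
    # and look for an ending that also fits the stem's features.
    intended = {k: v for k, v in suf_f.items() if k not in stem_f}
    repairs = [stem + s for s, f in SUFFIXES.items()
               if all(f.get(k) == v for k, v in stem_f.items())
               and all(f.get(k) == v for k, v in intended.items())]
    return {"status": "error", "stem": stem, "suffix": suffix,
            "clashes": clashes, "repairs": repairs}
```

With these toy entries, diagnose("начинаит") reports a clash on conj and proposes начинает, while diagnose("начинаев") reports a clash on pos and finds no repair, since no verbal ending carries the noun features of -ев.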

Constructing the Lexicon

- Lexicon generation can be done semi-automatically
- We need:
  - Freely-available corpus (Sharoff et al. 2008)
  - A handful of inflected forms to derive common morphological paradigms
  - Unsupervised morphology learner like Linguistica (Goldsmith and Hu 2004)
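
To make the idea of deriving paradigms from a corpus more concrete, here is a toy Python sketch that groups candidate stems by the set of endings they occur with. It does not reproduce Linguistica, which learns segmentations without supervision; here the endings are supplied as seeds, and the word list, the seed endings, and the function induce_signatures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mini-corpus and seed endings, for illustration only.
CORPUS_WORDS = [
    "читаю", "читаешь", "читает",
    "знаю", "знаешь", "знает",
    "говорю", "говоришь", "говорит",
    "смотрю", "смотришь", "смотрит",
]
SEED_SUFFIXES = ["ю", "ешь", "ет", "ишь", "ит"]


def induce_signatures(words, suffixes, min_stems=2):
    """Group candidate stems by the set of seed endings they occur with.

    Returns a dict mapping each ending set (a frozenset) to the set of
    stems that show exactly that pattern in the word list.
    """
    stem_to_suffixes = defaultdict(set)
    for word in set(words):
        for suf in suffixes:
            # Require a non-empty stem before the ending.
            if word.endswith(suf) and len(word) > len(suf):
                stem_to_suffixes[word[:-len(suf)]].add(suf)

    signatures = defaultdict(set)
    for stem, sufs in stem_to_suffixes.items():
        if len(sufs) > 1:  # a single ending is weak evidence for a paradigm
            signatures[frozenset(sufs)].add(stem)

    # Keep only patterns supported by several stems.
    return {sig: stems for sig, stems in signatures.items()
            if len(stems) >= min_stems}


signatures = induce_signatures(CORPUS_WORDS, SEED_SUFFIXES)
for sig, stems in signatures.items():
    print(sorted(sig), sorted(stems))
```

On this word list the sketch yields two groups, one for stems taking -ю/-ешь/-ет and one for stems taking -ю/-ишь/-ит. Each group could then be turned into lexicon entries that share a continuation class, which is where manual checking would come in.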

Summary & Outlook

Summary:
- An FST lexicon provides a way to do morphological error analysis on learner language in Russian that is:
  1. easily optimizable for learner environments
  2. accurate without sacrificing generality
  3. flexible enough to detect even unanticipated errors
- We believe this approach is applicable to a number of languages

Next Steps:
1. Construction of lexicon for small subset of the language relevant to our exercises
2. Performing/testing error detection and diagnosis on top of the linguistic analysis
3. Addition of linguistic analysis beyond the word level, operating in parallel with the morphological analyzer

Acknowledgments

We would like to thank
- Detmar Meurers and Luiz Amaral for providing us with the TAGARELA source code & insights into ICALL systems
- Anna Feldman and Jirka Hana for advice on Russian resources
- Two anonymous reviewers for insightful comments

This research was supported by grant P116S070001 through the U.S. Department of Education’s Fund for the Improvement of Postsecondary Education.

References

Amaral, Luiz (2007). Designing Intelligent Language Tutoring Systems: integrating Natural Language Processing technology into foreign language teaching. Ph.D. thesis, The Ohio State University.

Amaral, Luiz and Detmar Meurers (2006). Where does ICALL Fit into Foreign Language Teaching? Talk given at CALICO Conference. University of Hawaii, http://purl.org/net/icall/handouts/calico06-amaral-meurers.pdf.

Amaral, Luiz and Detmar Meurers (2007). Putting activity models in the driver’s seat: Towards a demand-driven NLP architecture for ICALL. Talk given at EUROCALL. University of Ulster, Coleraine Campus, http://purl.org/net/icall/handouts/eurocall07-amaral-meurers.pdf.

Beesley, Kenneth R. and Lauri Karttunen (2003). Finite State Morphology. CSLI Publications.

Ćavar, Damir (2008). The Croatian Language Repository: Quantitative and Qualitative Resources for Linguistic Research and Language Technologies. Invited talk, Indiana University Department of Linguistics, January 2008.

Clemenceau, David (1997). Finite-State Morphology: Inflections and Derivations in a Single Framework Using Dictionaries and Rules. In Emmanuel Roche and Yves Schabes (eds.), Finite State Language Processing, The MIT Press.

Goldsmith, John and Yu Hu (2004). From Signatures to Finite State Automata. In Midwest Computational Linguistics Colloquium (MCLC-04). Bloomington, IN.

Heift, Trude and Devlan Nicholson (2001). Web delivery of adaptive and interactive language tutoring. International Journal of Artificial Intelligence in Education 12(4), 310–325.

Koskenniemi, Kimmo (1983). Two-level morphology: a general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.

Murray, Janet H. (1995). Lessons Learned from the Athena Language Learning Project: Using Natural Language Processing, Graphics, Speech Processing, and Interactive Video for Communication-Based Language Learning. In V. Melissa Holland, Michelle R. Sams and Jonathan D. Kaplan (eds.), Intelligent Language Tutors: Theory Shaping Technology, Lawrence Erlbaum Associates, chap. 13, pp. 243–256.

Nagata, Noriko (1995). An Effective Application of Natural Language Processing in Second Language Instruction. CALICO Journal 13(1), 47–67.

Roark, Brian and Richard Sproat (2007). Computational Approaches to Morphology and Syntax. Oxford University Press.

Sharoff, Serge, Mikhail Kopotev, Tomaž Erjavec, Anna Feldman and Dagmar Divjak (2008). Designing and evaluating Russian tagsets. In Proceedings of LREC 2008. Marrakech.

Vandeventer Faltin, Anne (2003). Syntactic error diagnosis in the context of computer assisted language learning. Thèse de doctorat, Université de Genève, Genève.
