Introduction to Computational Linguistics Jay Munson (special thanks to Misty Azara) May 30, 2003

Today’s Goals

I. Introduce Computational Linguistics (CL) through a discussion of seven core CL areas.
II. Identify common CL applications.
III. Identify the importance of theoretical linguistics in CL.

What is Computational Linguistics?

Essentially, CL is any task, model, algorithm, etc. that places language processing of any type (syntax, phonology, morphology, etc.) in a computational setting.

What is Computational Linguistics (CL)?

CL is interdisciplinary, drawing on linguistics, computer science, mathematics, electrical engineering, and speech and hearing science.

Seven Core Areas of CL

1. Machine Translation
2. Speech Recognition
3. Text-to-Speech
4. Natural Language Generation
5. Human-Computer Dialogs
6. Information Retrieval
7. Computational Modeling

1.0 Machine Translation (MT)

Using computers to automate some or all of the process of translating from one language to another

1.1 MT (cont.)

Three general models or tasks:
1. Tasks for which a rough translation is adequate
2. Tasks where a human post-editor can be used to improve the output
3. Tasks limited to a small sublanguage

1.2 MT (cont.)

Linguistic knowledge is extremely useful in this area of CL

MT benefits from knowledge of language typology and language-specific linguistic information

Programs are typically “trained” using pre-translated documents/texts.
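
To make "training on pre-translated texts" concrete, here is a minimal sketch. It is not how KANT or any other specific system works. The code learns a toy word-translation lexicon from a few hypothetical sentence-aligned English-Spanish pairs and uses the Dice coefficient as a simple association score. Real statistical MT relies on far more data and on proper alignment models.

```python
from collections import Counter
from itertools import product

def dice_lexicon(pairs):
    """Build a toy source->target word lexicon from sentence-aligned pairs.

    pairs: list of (source_sentence, target_sentence) strings.
    Each source word is mapped to the target word with the highest Dice score
    2 * C(s, t) / (C(s) + C(t)), where C counts sentence pairs containing the word(s).
    """
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # count co-occurring word pairs
    lexicon = {}
    for s in src_count:
        scores = {t: 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])
                  for t in tgt_count if joint[(s, t)]}
        lexicon[s] = max(scores, key=scores.get)
    return lexicon

# Hypothetical parallel "training" data.
pairs = [("the house", "la casa"), ("the green house", "la casa verde"),
         ("green", "verde"), ("the door", "la puerta")]
print(dice_lexicon(pairs))
```

More pre-translated text makes the co-occurrence counts more reliable. With only one or two sentences, "the" and "house" would be indistinguishable.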

1.3 MT Example: KANT (Knowledge-based Machine Translation)

The KANT project, Knowledge-based, Accurate Translation for technical documentation, was founded in 1989 for the research and development of large-scale, practical translation systems for technical documentation. KANT uses a controlled vocabulary and grammar for each source language, and explicit yet focused semantic models for each technical domain to achieve very high accuracy in translation. Designed for multilingual document production, KANT has been applied to the domains of electric power utility management and heavy equipment technical documentation.

http://www.lti.cs.cmu.edu/Research/cmt-projects.html
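
The sketch below illustrates only the controlled-vocabulary half of KANT's approach. A hypothetical checker flags any word outside an approved lexicon before a sentence goes to translation. The lexicon contents and the function name are invented for illustration, and this is not KANT's actual implementation.

```python
import re

# Hypothetical approved vocabulary for a heavy-equipment manual.
APPROVED = {"remove", "the", "bolt", "from", "engine", "cover", "tighten",
            "install", "a", "new", "gasket"}

def check_controlled(sentence, approved=APPROVED):
    """Return the words in `sentence` that are not in the approved vocabulary."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in approved]

print(check_controlled("Remove the bolt from the engine cover."))  # []
print(check_controlled("Yank the bolt out of the engine cover."))  # ['yank', 'out', 'of']
```

Rejecting out-of-vocabulary input up front is one reason a controlled language can be translated so accurately: the system never has to guess at words it was not built for.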

2.0 Speech Recognition (SR)

Taking spoken language as input and outputting the corresponding text

2.1 SR - Architecture

SR takes the source speech and, via some type of acoustic model, produces “guesses” as to which words could correspond to it

The word with the highest probability is selected as the optimal candidate

The context is constrained (e.g., to the words a task expects) to improve accuracy
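
One common way to formalize "select the highest-probability word" is the noisy-channel rule. It chooses the word w that maximizes P(O | w) · P(w), where P(O | w) is the acoustic model's score for the observed audio O and P(w) is a prior supplied by the constrained context. The probabilities below are invented. A real recognizer also scores whole word sequences rather than single words.

```python
import math

def best_word(acoustic, prior):
    """Pick argmax_w of log P(O|w) + log P(w) over the candidate words.

    acoustic: dict word -> P(O | w) from a (hypothetical) acoustic model
    prior:    dict word -> P(w) from the context; words absent from it
              (outside the allowed context) are never chosen
    """
    scores = {w: math.log(p) + math.log(prior[w])
              for w, p in acoustic.items() if prior.get(w, 0) > 0}
    return max(scores, key=scores.get)

acoustic = {"checking": 0.30, "chicken": 0.45, "savings": 0.05}
prior = {"checking": 0.5, "savings": 0.5}   # banking context: "chicken" excluded
print(best_word(acoustic, prior))  # checking
```

Without the contextual constraint, the acoustically stronger "chicken" would win. Limiting the context is what lets the intended word come out on top.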

2.2 Why use SR?

Allows for hands-free human-computer interaction

Assists in automated telephony

3.0 Text-to-Speech (TTS)

Taking text as input and outputting the corresponding spoken language

3.1 Three types of TTS

1. Articulatory: models the physiological characteristics of the vocal tract

2. Concatenative: uses pre-recorded segments to construct the utterance(s)
ScanSoft: Jennifer and Susana (http://www.scansoft.com/realspeak/demo/)
Speechify: British Female (http://www.speechworks.com/demos/speechify.cfm)
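
This sketch covers only the joining step of concatenative synthesis. It assumes a hypothetical inventory that maps each word to a list of audio samples, looks up the units for an utterance, and splices them with a short linear crossfade so the joins are less abrupt. Commercial systems select much smaller units (e.g., diphones) from large recorded databases.

```python
def crossfade_concat(units, overlap):
    """Join sample lists, overlapping up to `overlap` samples at each boundary.

    In the overlap the outgoing unit fades out linearly while the incoming
    unit fades in. If every unit has at least `overlap` samples, the output
    has sum(len(u) for u in units) - overlap * (len(units) - 1) samples.
    """
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            fade_in = (i + 1) / (n + 1)
            out[start + i] = out[start + i] * (1 - fade_in) + unit[i] * fade_in
        out.extend(unit[n:])
    return out

def synthesize(words, inventory, overlap=2):
    """Look up each word's recorded unit (hypothetical inventory) and splice."""
    return crossfade_concat([inventory[w] for w in words], overlap)

inventory = {"hello": [1.0] * 5, "world": [0.0] * 5}   # stand-in "recordings"
print(synthesize(["hello", "world"], inventory))
```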

3.2 Three types of TTS (cont.)

3. Parametric/Formant: models the formant transitions of speech
ETI-Eloquence: Reed (http://www.speechworks.com/demos/eti.cfm)
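
Klatt-style formant synthesizers build vowels by passing a source signal through one digital resonator per formant. Each resonator computes y[n] = A·x[n] + B·y[n-1] + C·y[n-2], where C = -exp(-2π·bw·T), B = 2·exp(-π·bw·T)·cos(2π·f·T), and A = 1 - B - C, for center frequency f, bandwidth bw, and sample period T. The sketch makes several assumptions:

* a 10 kHz sampling rate
* an impulse-train source at 100 Hz
* approximate textbook formants for the vowel in "father"
* invented bandwidths

The formants here are static. Modeling formant *transitions* would mean updating f over time.

```python
import math

def resonator(signal, freq, bw, rate):
    """Second-order digital resonator with unity gain at 0 Hz."""
    T = 1.0 / rate
    C = -math.exp(-2 * math.pi * bw * T)
    B = 2 * math.exp(-math.pi * bw * T) * math.cos(2 * math.pi * freq * T)
    A = 1 - B - C
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = A * x + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synth_vowel(formants, bandwidths, f0=100, rate=10000, dur=0.1):
    """Impulse train at f0 Hz passed through a cascade of formant resonators."""
    n = int(rate * dur)
    period = int(rate / f0)
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for f, bw in zip(formants, bandwidths):
        signal = resonator(signal, f, bw, rate)
    return signal

# Approximate formants (Hz) for the vowel in "father"; bandwidths are assumed.
samples = synth_vowel([730, 1090, 2440], [60, 90, 120])
```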

3.3 Why is TTS so difficult?

Spelling: through, rough, though, thought

Homographs: PERmit (n) vs. perMIT (v)

Prosody (dependent on context): pitch, duration of segments, phrasing of segments, intonational tune, emotion
“I am so angry at you. I have never been more enraged in my life!!”
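
The spelling and homograph problems above are one reason TTS front ends use a pronunciation lexicon rather than relying only on letter-to-sound rules. They also use part-of-speech information to choose between homograph pronunciations. The sketch below is a hypothetical mini-lexicon keyed by (word, part of speech). The ARPAbet-style transcriptions are modeled on CMU Pronouncing Dictionary entries, where 1 marks primary stress and 0 marks no stress. The tagger that would supply the part of speech is assumed and not shown.

```python
# Hypothetical mini-lexicon: (word, part of speech or None) -> phonemes.
LEXICON = {
    ("through", None): "TH R UW1",
    ("rough", None): "R AH1 F",
    ("though", None): "DH OW1",
    ("thought", None): "TH AO1 T",
    ("permit", "n"): "P ER1 M IH0 T",   # PERmit
    ("permit", "v"): "P ER0 M IH1 T",   # perMIT
}

def pronounce(word, pos=None):
    """Look up a pronunciation, preferring a POS-specific entry for homographs."""
    word = word.lower()
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    if (word, None) in LEXICON:
        return LEXICON[(word, None)]
    raise KeyError(f"no pronunciation for {word!r} with pos={pos!r}; a real "
                   "system would fall back to letter-to-sound rules")

print(pronounce("permit", "n"), "|", pronounce("permit", "v"))
```

The four "-ough" words share a spelling pattern but map to four different vowel pronunciations. No single letter-to-sound rule covers them all.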

3.4 Why use TTS?

Allows text to be read aloud automatically

Extremely useful for the visually impaired and the hearing impaired.

For a review of the history of TTS up to 1987, with sound files, go to:

http://www.ece.ogi.edu/~macon/ECE580/klatt/

4.0 Natural Language Generation (NLG)

Constructing linguistic outputs from non-linguistic inputs; the goal of NLG is to produce natural language from internal data/structures.

4.1 Natural Language Generation (cont.)

Maps meaning to text

The nature of the input varies greatly from one application to another (e.g., documenting the structure of a computer program)

The job of the NLG system is to extract the necessary information to drive the generation process

4.2 NLG systems have to make choices:

Content selection: the system must choose which content from the input to express, basing its decision on a pre-specified communicative goal

Lexical selection: the system must choose the lexical item most appropriate for expressing a concept
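
The following is a minimal sketch of both choices. It assumes a hypothetical knowledge base of interface facts, each tagged with the communicative goal it serves. Content selection keeps only the facts relevant to the goal. Lexical selection then picks a verb by object type: menus and folders get "Choose", and buttons get "Click". The facts and the rule table are invented for illustration.

```python
# Hypothetical knowledge base: each fact is an action on an interface object.
FACTS = [
    {"goal": "save", "object": "save", "kind": "menu_item", "where": "the file menu"},
    {"goal": "save", "object": "appropriate folder", "kind": "folder"},
    {"goal": "print", "object": "print", "kind": "button"},
    {"goal": "save", "object": "save", "kind": "button"},
]

# Lexical choice: which verb expresses "activate" for each kind of object.
VERBS = {"menu_item": "Choose", "folder": "Choose", "button": "Click"}

def select_content(facts, goal):
    """Content selection: keep only facts that serve the communicative goal."""
    return [f for f in facts if f["goal"] == goal]

def lexicalize(fact):
    """Lexical selection: pick the verb and phrasing for one fact."""
    verb = VERBS[fact["kind"]]
    if fact["kind"] == "menu_item":
        return f"{verb} {fact['object']} from {fact['where']}"
    if fact["kind"] == "button":
        return f"{verb} the {fact['object']} button"
    return f"{verb} the {fact['object']}"

print([lexicalize(f) for f in select_content(FACTS, "save")])
```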

4.3 NLG (cont.)

Sentence structure aggregation: the system must apportion the content into phrase-, clause-, and sentence-sized chunks

Referring expressions: the system must determine how to refer to the objects under discussion (not a trivial task).
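
A toy example can illustrate both tasks. The simplifying assumption is that each message is a (subject, predicate) pair. The sketch merges consecutive messages that share a subject into one sentence. It also refers to a subject by its full noun phrase on first mention and by a pronoun afterwards. The function and its pronoun table are hypothetical. Real systems need much richer rules, for example to avoid ambiguous pronouns.

```python
def realize(messages, pronouns):
    """Aggregate same-subject messages, then choose referring expressions.

    messages: list of (subject, predicate) pairs
    pronouns: dict subject -> pronoun to use after the first mention
    """
    # Aggregation: group consecutive predicates that share a subject.
    groups = []
    for subj, pred in messages:
        if groups and groups[-1][0] == subj:
            groups[-1][1].append(pred)
        else:
            groups.append((subj, [pred]))

    sentences, mentioned = [], set()
    for subj, preds in groups:
        # Referring expression: full noun phrase first, pronoun afterwards.
        ref = pronouns[subj] if subj in mentioned else subj
        mentioned.add(subj)
        body = preds[0] if len(preds) == 1 else ", ".join(preds[:-1]) + " and " + preds[-1]
        sentences.append(ref[0].upper() + ref[1:] + " " + body + ".")
    return " ".join(sentences)

msgs = [("the system", "saves the document"), ("the system", "closes the window"),
        ("the user", "sees a message"), ("the system", "exits")]
print(realize(msgs, {"the system": "it", "the user": "they"}))
```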

4.4 NLG - Structures

Discourse structure: many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure

4.5 Sample NLG output

To save a file:
1. Choose save from the file menu
2. Choose the appropriate folder
3. Type the file name
4. Click the save button

The system will save the document.…
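
A simple template-based realizer can produce output like the sample above. The sketch assumes a hypothetical plan structure made of a goal phrase, an ordered list of steps, and a closing statement, and renders that plan as a numbered procedure. The fixed step order stands in for the discourse-structure decisions a fuller NLG system would make.

```python
def render_procedure(plan):
    """Render a hypothetical procedure plan as numbered instructions."""
    lines = [f"To {plan['goal']}:"]
    for i, step in enumerate(plan["steps"], start=1):
        lines.append(f"{i}. {step}")
    lines.append(plan["result"])
    return "\n".join(lines)

plan = {
    "goal": "save a file",
    "steps": ["Choose save from the file menu", "Choose the appropriate folder",
              "Type the file name", "Click the save button"],
    "result": "The system will save the document.",
}
print(render_procedure(plan))
```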

5.0 Human-Computer Dialogs

Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal

5.1 Human-Computer Dialogs

Uses speech recognition, or a combination of SR and touch tones, as input to the system

The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts

5.2 Human-Computer Dialogs

Dialog systems have specific tasks, which limit the domain of conversation

This makes the SR problem much easier, as the potential responses become very constrained
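
The benefit of a constrained domain can be sketched as follows. If the current dialog state allows only a few responses, the system can discard recognizer hypotheses outside that set. The n-best list, its scores, and the allowed set are all hypothetical.

```python
def pick_in_grammar(nbest, allowed):
    """Return the highest-scoring hypothesis that the current state allows.

    nbest:   list of (hypothesis, score) pairs from a (hypothetical) recognizer
    allowed: set of responses the dialog state accepts
    Returns None if nothing matches, so the system can re-prompt.
    """
    candidates = [(h, s) for h, s in nbest if h in allowed]
    if not candidates:
        return None
    return max(candidates, key=lambda hs: hs[1])[0]

nbest = [("chicken", 0.41), ("checking", 0.38), ("check in", 0.21)]
print(pick_in_grammar(nbest, {"checking", "savings"}))  # checking
```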

5.3 Sample dialog system for banking

…
Sys: Would you like information for checking or savings?
User: Checking, please.
Sys: Your current balance is $2,568.92. Would you like another transaction?
User: Yes, has check #2431 cleared?
…
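
A dialog like the banking example can be organized as a finite-state machine. Each state has a prompt plus a small set of expected answers, and each answer determines the next state. This minimal sketch makes three assumptions:

* The state names and the closing prompt are hypothetical.
* A hard-coded balance stands in for a real account lookup.
* The user's turns arrive as already-recognized text.

```python
# Hypothetical dialog states: prompt text and expected answer -> next state.
STATES = {
    "account": {"prompt": "Would you like information for checking or savings?",
                "next": {"checking": "balance", "savings": "balance"}},
    "balance": {"prompt": "Your current balance is $2,568.92. "
                          "Would you like another transaction?",
                "next": {"yes": "account", "no": "goodbye"}},
    "goodbye": {"prompt": "Thank you. Goodbye.", "next": {}},
}

def run_dialog(user_turns, start="account"):
    """Step through the state machine and return the transcript as lines."""
    state, transcript = start, []
    turns = iter(user_turns)
    while True:
        transcript.append("Sys: " + STATES[state]["prompt"])
        if not STATES[state]["next"]:        # final state: no answer expected
            return transcript
        answer = next(turns, None)
        if answer is None:                   # user input ran out
            return transcript
        transcript.append("User: " + answer)
        key = answer.lower().split(",")[0].strip()      # "Checking, please." -> "checking"
        state = STATES[state]["next"].get(key, state)   # unrecognized: repeat the prompt

print("\n".join(run_dialog(["Checking, please.", "No"])))
```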

5.4 Linguistic knowledge in dialog systems

Discourse structure: ensuring natural, flowing discourse interaction

Building appropriate vocabularies/lexicons for the tasks

Ensuring prosodic consistency (e.g., questions sound like questions, and spliced prompts sound continuous)

5.5 Why use human-computer systems?

Automate simple tasks: no need for a teller to be on the other end of the line!

Allow access to system information from anywhere, via the telephone

6.0 Information Retrieval

Storage, analysis, and retrieval of text documents

6.1 Information Retrieval (IR)

Most current IR systems are based on some interpretation of “compositional semantics” (i.e., the meaning of the whole is based on the meaning of its parts and their combination).

IR is the core of web-based searching, e.g., Google and AltaVista.

6.2 IR - Architecture

User inputs a word or string of words

System processes the words and retrieves documents corresponding to the request
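
This architecture is commonly implemented with an inverted index, which maps each word to the documents that contain it. The minimal sketch below assumes lowercase whitespace tokenization and made-up documents. It ranks results by how many distinct query words each document contains. Real search engines use weighting schemes such as TF-IDF and much more.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query words they contain."""
    hits = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    # Sort by descending match count, then by id for a stable order.
    return sorted(hits, key=lambda d: (-hits[d], d))

docs = {1: "speech recognition systems", 2: "machine translation systems",
        3: "speech synthesis"}
index = build_index(docs)
print(search(index, "speech systems"))  # [1, 2, 3]
```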

6.3 “Bag of Words”

The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only

Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
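
The claim is easy to check. In this sketch, tokenization is simply lowercasing and splitting on spaces. The bag-of-words representations of the two sentences come out identical. Adding adjacent word pairs (bigrams) separates them. Bigrams are only a crude stand-in for the order information that syntax would supply.

```python
from collections import Counter

def bag_of_words(sentence):
    """Represent a sentence as word counts, discarding word order."""
    return Counter(sentence.lower().split())

def bigrams(sentence):
    """Adjacent word pairs: a crude way to keep some word-order information."""
    words = sentence.lower().split()
    return Counter(zip(words, words[1:]))

a, b = "I see what I eat", "I eat what I see"
print(bag_of_words(a) == bag_of_words(b))  # True: same words, same counts
print(bigrams(a) == bigrams(b))            # False: ("see", "what") occurs only in a
```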

6.4 Linguistic Knowledge in IR

Semantics: compositional and lexical

Syntax (depending on the model used)

7.0 Computational Modeling

Computational approaches to problem solving, modeling, and development of theories

7.1 How can we use computational modeling?

Develop working models of language evolution

Model speech perception, production, and processing

Almost any theoretical model can have a computational counterpart

7.2 Why Use Computational Modeling?

Forces explicitness: no black boxes or behind-the-scenes “magic”

Allows us to test our formal theories against large amounts of data

Allows for enhancements in technology and benefits to society through the implementation of models.
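
The textbook rule for the English regular plural shows how a computational model forces explicitness. Written as a function, the rule must state exactly which final sounds trigger each allomorph: /ɪz/ after sibilants, /s/ after other voiceless consonants, and /z/ elsewhere. The phoneme symbols and class memberships below are a simplified, assumed inventory. Running the function over a word list is a small-scale version of testing a theory against data.

```python
# Simplified phoneme classes (IPA-like symbols; an assumed, partial inventory).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}

def plural_suffix(phonemes):
    """Return the regular plural allomorph for a word given as a phoneme list."""
    last = phonemes[-1]
    if last in SIBILANTS:
        return "ɪz"
    if last in VOICELESS:
        return "s"
    return "z"   # voiced consonants and vowels

data = {"cat": ["k", "æ", "t"], "dog": ["d", "ɔ", "g"], "bus": ["b", "ʌ", "s"],
        "bee": ["b", "i"], "judge": ["dʒ", "ʌ", "dʒ"]}
for word, phones in data.items():
    print(word, plural_suffix(phones))
```

The order of the checks matters: /s/ and /ʃ/ are both sibilant and voiceless, so the sibilant test has to come first. This is exactly the kind of detail an informal statement of the rule can leave implicit.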

Conclusions

CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics (i.e., theory is necessary!)

Computational modeling can aid/test linguists’ theories of language processing and structure

Conclusions - Review of 7 core areas in CL

1. Machine Translation
2. Speech Recognition
3. Text-to-Speech
4. Natural Language Generation
5. Human-Computer Dialogs
6. Information Retrieval
7. Computational Modeling

Conclusions – Review of Today’s Goals

I. Introduce Computational Linguistics (CL) through a discussion of seven core CL areas.
II. Identify common CL applications.
III. Identify the importance of theoretical linguistics in CL.

The end.