Special Topics in Computer Science: NLP in a Nutshell
CS492B, Spring Semester 2009
Jong C. Park, Computer Science Department, Korea Advanced Institute of Science and Technology

in Computer Science NLP in a Nutshell - KAIST …nlpcl.kaist.ac.kr/~cs492/lecture02.pdfSpecial Topics in Computer Science NLP in a Nutshell CS492B Spring Semester 2009 JongC. Park



AN OVERVIEW OF LANGUAGE PROCESSING

Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 2


Linguistics and Language Processing
  Linguistics
  Computational Linguistics
  Natural Language Processing
  Natural Language Understanding


Applications of Language Processing (1/3)

Spelling and grammar checkers (including detection of style errors)
  Status: Ubiquitous, though not yet perfect

Text indexing and information retrieval
  Status: Among the most popular applications on the Web

Speech dictation of letters or reports
  Status: Some systems perform well (cf. IBM's ViaVoice – link: http://www-01.ibm.com/software/pervasive/embedded_viavoice/)


Applications of Language Processing (2/3)

Voice control of domestic devices such as videocassette recorders or disc changers
  Status: Few commercial-grade systems, despite many prototype systems

Interactive voice response applications
  Status: Most servers are just interfaces to existing databases, but significant research is ongoing


Applications of Language Processing (3/3)

Machine translation
  Status: One of the oldest application domains; some systems work in real time within a restricted domain (cf. SYSTRAN – link: http://www.systran-software.co.kr/)

Conversational agents
  Status: Some systems show interesting performance (cf. TRAINS – link: http://www.cs.rochester.edu/research/cisd/projects/trains/, Ulysse – link: http://www.cs.lth.se/home/Pierre_Nugues/Articles/twlt11/twlt11.html)


Domains of Language Processing (1/3)

Phonetics
  Concerns the production and perception of the acoustic sounds that form the speech signal
  Phonemes (vowels and consonants) combine to form syllables

Words
  Lexicon: the set of words of a language
  Morphology: the study of the structure and forms of words


Domains of Language Processing (2/3)

Syntax
  Studies the order of words in a sentence and their relationships
  Syntax defines word categories and functions.
  Parsing determines the structure of a sentence and assigns functions to words or groups of words.


Domains of Language Processing (3/3)

Semantics
  Considers the meaning of words and sentences
  Also concerns determining the sense of a word or representing a sentence in a logical format

Pragmatics
  Concerns the meaning of words and sentences in specific situations


Phonetics

Fig. 1.1. A speech signal corresponding to "This is".

Fig. 1.2. A spectrogram corresponding to the word "serious".


Phonetics

Classification of phonemes
  Simple vowels and nasal vowels appear on the spectrogram as a horizontal bar (the fundamental frequency) and several superimposed horizontal bars (the harmonics).
  Consonants: plosives, fricatives, nasals, and approximants

Prosody concerns the general rhythm of the sentence.

Related technologies: speech synthesis and speech recognition


Lexicon and Morphology

Parts of speech
  Article, noun, verb, adjective, adverb, conjunction, preposition, or pronoun

Morphology is the study of how root words and affixes are composed to form words.
  Inflection is the variation in the form of a word under certain grammatical conditions (e.g., dog → dogs, walk → walked).
  Derivation combines affixes with an existing root or stem to form a new word (e.g., dark → darkness).

Morphological parsing: decomposing a word into its root and affixes
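The sketch below illustrates morphological parsing in Prolog, assuming SWI-Prolog (the system introduced in the Prolog part of these notes). Everything in it is hypothetical and made up for illustration: the root/2 lexicon, the inflection/3 and derivation/3 suffix tables, and the shape of the analyses. It uses the built-in atom_concat/3 to try every split of a word into root plus suffix, handles only one suffix per word, and ignores spelling changes such as happy → happiness.

```prolog
% Hypothetical toy lexicon: root words with their parts of speech.
root(walk, verb).
root(play, verb).
root(dog,  noun).
root(dark, adjective).

% Inflectional suffixes: the category stays the same, a feature is added.
inflection(s,   noun, plural).
inflection(s,   verb, third_singular).
inflection(ed,  verb, past).
inflection(ing, verb, progressive).

% Derivational suffix: forms a new word, here of a different category.
derivation(ness, adjective, noun).

% morph(+Word, -Analysis): a word is either a bare root, or a root plus
% one suffix; atom_concat/3 enumerates every way to split Word in two.
morph(Word, root(Word, Cat)) :-
    root(Word, Cat).
morph(Word, inflected(Root, Cat, Feature)) :-
    atom_concat(Root, Suffix, Word),
    root(Root, Cat),
    inflection(Suffix, Cat, Feature).
morph(Word, derived(Root, From, To)) :-
    atom_concat(Root, Suffix, Word),
    root(Root, From),
    derivation(Suffix, From, To).
```

For example, ?- morph(walked, A). answers A = inflected(walk, verb, past), and ?- morph(darkness, A). answers A = derived(dark, adjective, noun).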


Syntax

Governs the formation of a sentence from words
  Sometimes combined with morphology under the term morpho-syntax
Generative grammars consist of syntactic rules that decompose a phrase into subphrases and describe sentence composition in terms of phrase structure.
  Cf. phrase-structure rules
Parsing is the reverse of generation: it recovers the phrase structure of a given sentence.
  Cf. bottom-up parsing, top-down parsing
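Phrase-structure rules translate almost directly into Prolog's Definite Clause Grammar (DCG) notation. The sketch below assumes SWI-Prolog and uses a made-up grammar and vocabulary. Prolog executes a DCG as a top-down, left-to-right, depth-first parser, and because the same rules also run with the word list left unbound, the grammar generates sentences too, which illustrates parsing as the reverse of generation.

```prolog
% Phrase-structure rules as a DCG; each nonterminal also builds its subtree.
%   S -> NP VP     NP -> Det N     VP -> V NP
s(s(NP, VP))  --> np(NP), vp(VP).
np(np(D, N))  --> det(D), n(N).
vp(vp(V, NP)) --> v(V), np(NP).

% Lexical rules for a made-up five-word vocabulary.
det(det(the)) --> [the].
det(det(a))   --> [a].
n(n(dog))     --> [dog].
n(n(cat))     --> [cat].
v(v(chases))  --> [chases].
```

?- phrase(s(T), [the,dog,chases,a,cat]). returns T = s(np(det(the),n(dog)),vp(v(chases),np(det(a),n(cat)))), while ?- phrase(s(_), Words). enumerates all 16 sentences the grammar allows.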


Semantics
  Logical form

Discourse and Dialogue
  Anaphors
  Speech acts
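One common way to connect syntax and semantics is to let each grammar rule build the logical form of its phrase. The following DCG sketch assumes SWI-Prolog; the vocabulary and nonterminal names are invented for illustration. It covers only sentences of the shape "<name> is a <noun>" and "<name> is a <noun> of <name>", and uses the built-in =../2 (univ) to turn a noun into a predicate symbol.

```prolog
% sentence(-LF): maps "<name> is a ..." to a logical form LF.
sentence(LF) --> proper_noun(X), [is, a], predicate(X, LF).

% "a parent of bob": a relational noun plus its object gives a 2-place fact.
predicate(X, LF) --> rel_noun(Rel), [of], proper_noun(Y), { LF =.. [Rel, X, Y] }.
% "a man": a plain noun gives a 1-place fact.
predicate(X, LF) --> noun(P), { LF =.. [P, X] }.

% Hypothetical lexicon.
proper_noun(tom)      --> [tom].
proper_noun(bob)      --> [bob].
proper_noun(socrates) --> [socrates].
rel_noun(parent)      --> [parent].
noun(man)             --> [man].
```

For example, the word list [tom,is,a,parent,of,bob] receives the logical form parent(tom,bob), which has exactly the shape of a Prolog fact.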


Why Speech and Language Processing Are Difficult

Ambiguity
  Occurs in morphological analysis, part-of-speech annotation, word senses, references, speech recognition, parsing, anaphora resolution, and dialogue.

Models and Their Implementation
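Structural ambiguity is easy to observe with a grammar that lets a prepositional phrase attach either to a noun phrase or to a verb phrase. The DCG below is a small, made-up example (SWI-Prolog assumed). Collecting all parse trees with findall/3 shows that "the man saw the dog with the telescope" receives two analyses: one in which the dog has the telescope, and one in which the seeing is done with the telescope.

```prolog
% A toy grammar with two possible attachment sites for a PP.
s(s(NP, VP))      --> np(NP), vp(VP).
np(np(D, N))      --> det(D), n(N).
np(np(D, N, PP))  --> det(D), n(N), pp(PP).      % PP attached to the noun
vp(vp(V, NP))     --> v(V), np(NP).
vp(vp(V, NP, PP)) --> v(V), np(NP), pp(PP).      % PP attached to the verb
pp(pp(P, NP))     --> p(P), np(NP).

det(det(the))   --> [the].
n(n(man))       --> [man].
n(n(dog))       --> [dog].
n(n(telescope)) --> [telescope].
v(v(saw))       --> [saw].
p(p(with))      --> [with].

% parses(+Words, -Trees): every parse tree the grammar assigns to Words.
parses(Words, Trees) :-
    findall(T, phrase(s(T), Words), Trees).
```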


Language Technology in Action
The Persona Project at Microsoft [paper]
Sample Dialogue with Peedy

[Peedy is asleep on his perch.]
User: Good morning, Peedy.
[Peedy rouses]
Peedy: Good morning.
User: Let's do a demo.
[Peedy stands up, smiles]
Peedy: Your wish is my command, what would you like to hear?


User: What have you got by Bonnie Raitt?
[Peedy waves in a stream of notes, and grabs one as they rush by.]
Peedy: OK.
User: Play some rock after that.
[Peedy scans the notes again, selects one]
Peedy: How about “Fools in Love”?
User: Who wrote that?
[Peedy cups one wing to his ‘ear’]
Peedy: Huh?
User: Who wrote that?
[Peedy looks up, scrunches his brow]


Peedy: Joe Jackson
User: Fine.
[Drops note on pile]
Peedy: OK.


Figure 1: System diagram of the Persona conversational assistant

[Diagram components: Whisper speech recognition; names (proper name substitution); NLP language analysis; semantic template matching and object descriptions; dialogue context and conversation state; speech controller; Player/ReActor animation engine; application (CD changer). Databases: names, action templates, objects, CDs, dialogue rules, speech and animation.]

Source: Lifelike Computer Characters: the Persona Project at Microsoft Research, Gene Ball et al., Microsoft Research.
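The diagram pairs language analysis with semantic template matching. The sketch below is a hypothetical, much-simplified illustration of template matching in Prolog (SWI-Prolog assumed), not the Persona system's actual implementation: the templates, the action names, and the fallback action are invented, speech recognition is skipped, and an utterance is simply a list of words.

```prolog
% Hypothetical request templates: a word-list pattern paired with an action.
% Variables in a pattern capture the words that fill that slot.
template([what, have, you, got, by | Artist], list_albums(Artist)).
template([play, some, Genre | _],             queue_genre(Genre)).
template([who, wrote, that],                  ask_songwriter).
template([good, morning | _],                 greet).

% interpret(+Words, -Action): the first matching template wins; an
% utterance that matches nothing yields a request to rephrase.
interpret(Words, Action) :-
    template(Words, Action), !.
interpret(_, ask_to_rephrase).
```

Call interpret/2 with the action unbound: because of the cut, checking a given action (mode +,+) is not what this sketch is designed for.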


AN INTRODUCTION TO PROLOG


A Brief Guide to SWI-Prolog

SWI-Prolog's homepage: http://www.swi-prolog.org/

SWI-Prolog for MS-Windows
  Using SWI-Prolog

SWI-Prolog reference manual: http://hcs.science.uva.nl/projects/SWI-Prolog/Manual/


Introduction to Prolog
  Defining relations by facts
  Defining relations by rules
  Recursive rules
  How Prolog answers questions
  Declarative and procedural meaning of programs


Defining relations by facts

Example sentence: Tom is a parent of Bob.
Example representation:
    parent(tom,bob).

Are there any other ways?
    tom_is_a_parent_of_bob.
    parent_of_bob(tom).
    relation(parent,tom,bob).
    relation(parent,2,tom,bob).

What are the pros and cons?
  Generality, extensibility, indexability
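To compare two of these representations, the sketch below (SWI-Prolog assumed; the friend fact and the links/3 helper are hypothetical additions) stores the same facts both ways. With parent/2 the relation name is fixed in the predicate; with relation/3 it is an ordinary argument, so a single query can leave it open, which helps generality and extensibility. A possible cost is indexing: all facts share one predicate, so a system that indexes only on the first argument has to look through every relation/3 fact whose first argument is parent.

```prolog
% Representation 1: the relation name is the predicate name.
parent(tom, bob).
parent(tom, liz).

% Representation 3: the relation name is an ordinary argument.
relation(parent, tom, bob).
relation(parent, tom, liz).
relation(friend, liz, ann).      % hypothetical extra fact

% links(+X, +Y, -Relations): which relations hold between X and Y?
% This question cannot be asked of parent/2 without reflective built-ins.
links(X, Y, Relations) :-
    findall(R, relation(R, X, Y), Relations).
```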


Defining relations by facts

Questions
  Who is Tom a parent of?
    ?- parent(tom,X).
    X = bob
    yes
  Who else is Tom a parent of?
    parent(tom,bob).
    parent(tom,liz).
    ?- parent(tom,X).
    X = bob;
    X = liz;
    no
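Instead of typing ; to request each further answer, a program can collect all answers at once with the built-in findall/3. A minimal sketch, assuming SWI-Prolog; children/2 is a hypothetical helper name.

```prolog
parent(tom, bob).
parent(tom, liz).

% children(+P, -Kids): every answer to ?- parent(P, K). gathered in one list.
children(P, Kids) :-
    findall(K, parent(P, K), Kids).
```

?- children(tom, Kids). gives Kids = [bob, liz], in the same order in which ; would list them; when there is no answer, findall/3 returns the empty list rather than no.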


Defining relations by facts

Question
  Who is a grandparent of Jim?
    parent(tom,bob).
    parent(tom,liz).
    parent(bob,jim).
    ?- parent(X,Y), parent(Y,jim).
    X = tom, Y = bob
    yes
  Any other possibilities?
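The conjunctive query can be packaged so that every solution is listed at once. In this sketch (SWI-Prolog assumed; grandparent_pairs/2 is a hypothetical name), findall/3 collects each X-Y pair that satisfies both goals. With the three facts of the question it returns only [tom-bob], so for this small database there is no other possibility.

```prolog
parent(tom, bob).
parent(tom, liz).
parent(bob, jim).

% grandparent_pairs(+Z, -Pairs): all X-Y with parent(X,Y) and parent(Y,Z).
grandparent_pairs(Z, Pairs) :-
    findall(X-Y, (parent(X, Y), parent(Y, Z)), Pairs).
```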


Defining relations by rules

Encoding gender information
    female(liz).

Are there any other ways?
    gender(liz,feminine).
    person(liz,female,21,...).
    female([liz,sue,mary]).
    "X is a male if it is not a female."

What are the pros and cons? Issues?
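The last alternative, "X is a male if it is not a female", can be written with Prolog's negation as failure, \+. The sketch below (SWI-Prolog assumed; person/1 and male_unordered/1 are hypothetical helpers) exposes two of the issues. First, \+ relies on the closed-world assumption: anyone not recorded as female is treated as male. Second, goal order matters: \+ female(X) only means "X is not female" when X is already bound.

```prolog
person(liz).
person(tom).
person(bob).
female(liz).

% "X is a male if it is not a female", via negation as failure (\+).
% person(X) runs first, so X is already bound when \+ female(X) is tried.
male(X) :- person(X), \+ female(X).

% Same goals in the other order: \+ female(X) with X unbound asks
% "is there no female at all?", which fails as soon as any female exists.
male_unordered(X) :- \+ female(X), person(X).
```

?- male(X). yields tom and bob, whereas ?- male_unordered(X). has no answers at all, even though ?- male_unordered(tom). succeeds.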


Defining relations by rules

Other relations
  The offspring relation
    Method 1: offspring(liz,tom).
    Method 2: offspring(Y,X) :- parent(X,Y).
  The mother relation
    mother(X,Y) :- parent(X,Y), female(X).
  The grandparent relation
    grandparent(X,Z) :- parent(X,Y),
                        parent(Y,Z).

Are there any other ways?
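The rules can be checked against a small family database. The facts below follow the family used in the backtracking example of these notes; grandparent_alt/2 is a hypothetical illustration of one other way, defining grandparent through offspring instead of parent (SWI-Prolog assumed).

```prolog
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).
female(pam). female(liz). female(ann). female(pat).

% Rules from the slide.
offspring(Y, X)   :- parent(X, Y).
mother(X, Y)      :- parent(X, Y), female(X).
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

% One other way: grandparent expressed through offspring.
grandparent_alt(X, Z) :- offspring(Z, Y), offspring(Y, X).
```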


Defining relations by rules

Defining the sister relation
    sister(X,Y) :- parent(Z,X), parent(Z,Y),
                   female(X).

Any problems?
  Is Liz a sister of herself? (With parent(tom,liz) and female(liz), the goal sister(liz,liz) succeeds.)

One possible solution
    sister(X,Y) :- parent(Z,X), parent(Z,Y), female(X), different(X,Y).

How do we define different(X,Y)?
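One possible answer, assuming SWI-Prolog: define different/2 with the standard term-inequality test \==. This is safe inside sister/2 because both parent/2 goals have already bound X and Y by the time different(X,Y) runs. Called with unbound arguments, \== would succeed for any two distinct variables; SWI-Prolog's dif/2 is an alternative that delays the test until the arguments are known.

```prolog
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).
female(pam). female(liz). female(ann). female(pat).

% different(+X, +Y): X and Y are not identical terms.
different(X, Y) :- X \== Y.

sister(X, Y) :-
    parent(Z, X),
    parent(Z, Y),
    female(X),
    different(X, Y).
```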


Recursive rules

What is recursion? (definition)
Why do we need recursion? (reason)

Example: the predecessor relation
    predecessor(X,Z) :- parent(X,Z).
    predecessor(X,Z) :- parent(X,Y),
                        predecessor(Y,Z).
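Loading the predecessor rules together with family facts (SWI-Prolog assumed) lets the recursion be tested directly. The base clause covers direct parents; the recursive clause follows one parent link and then asks the same question again, so two clauses cover chains of any length.

```prolog
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).

% Base case: a parent is a predecessor.
predecessor(X, Z) :- parent(X, Z).
% Recursive case: a parent of a predecessor is a predecessor.
predecessor(X, Z) :- parent(X, Y), predecessor(Y, Z).
```

?- setof(P, predecessor(P, jim), Ps). gives Ps = [bob, pam, pat, tom].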


How Prolog answers questions

Terminology
  predicate, argument, clause, procedure
  fact, rule, head and body, goal, question

Sample interaction
  Axioms
    fallible(X) :- man(X).        % All men are fallible.
    man(socrates).                % Socrates is a man.
  Is this a theorem?
    ?- fallible(socrates).        % Is Socrates fallible?
    yes
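How Prolog proves fallible(socrates) can be imitated by a tiny interpreter for pure Prolog written in Prolog itself, often called a vanilla meta-interpreter. This sketch assumes SWI-Prolog; solve/1 is a hypothetical name, and the predicates are declared dynamic because standard Prolog only guarantees that clause/2 can inspect dynamic predicates. It handles only conjunctions and user-defined predicates (no built-ins, cut, or negation): to prove a goal, it picks a clause whose head unifies with the goal and proves that clause's body, and on failure Prolog backtracks to the next clause.

```prolog
:- dynamic fallible/1, man/1.

fallible(X) :- man(X).          % All men are fallible.
man(socrates).                  % Socrates is a man.

% solve(+Goal): prove Goal by resolution, one clause at a time.
solve(true) :- !.                        % a fact's body is true: done
solve((A, B)) :- !, solve(A), solve(B).  % prove a conjunction left to right
solve(Goal) :-
    clause(Goal, Body),                  % find a clause whose head matches
    solve(Body).                         % then prove its body
```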


How Prolog answers questions

Another sample interaction
    ?- predecessor(tom,pat).
  Backtracking

    parent(pam,bob).  parent(tom,bob).  parent(tom,liz).
    parent(bob,ann).  parent(bob,pat).  parent(pat,jim).
    female(pam).  male(tom).  male(bob).
    female(liz).  female(ann).  female(pat).
    male(jim).
    offspring(Y,X) :- parent(X,Y).
    mother(X,Y) :- ... .
    sister(X,Y) :- ... .
    grandparent(X,Z) :- parent(X,Y), parent(Y,Z).
    predecessor(X,Z) :- parent(X,Z).
    predecessor(X,Z) :- parent(X,Y), predecessor(Y,Z).
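To see which chain of parents Prolog finds for ?- predecessor(tom,pat)., the predecessor relation can be given a third argument that records the path (predecessor_path/3 is a hypothetical name; SWI-Prolog assumed). The base clause fails because parent(tom,pat) is not a fact, so Prolog backtracks into the recursive clause, tries parent(tom,bob), and then succeeds with parent(bob,pat).

```prolog
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).

% predecessor_path(X, Z, Path): like predecessor/2, but Path lists the
% people from X down to Z along the chain Prolog actually found.
predecessor_path(X, Z, [X, Z]) :-
    parent(X, Z).
predecessor_path(X, Z, [X | Path]) :-
    parent(X, Y),
    predecessor_path(Y, Z, Path).
```

?- predecessor_path(tom, pat, P). gives P = [tom, bob, pat].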


Meaning of programs

Declarative meaning
  Concerned with the relations defined by the program
  Determines what the output of the program will be

Procedural meaning
  Determines how this output is obtained (that is, how the relations are actually evaluated by the Prolog system)
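A standard way to see the difference is to reorder goals and clauses without changing the logic. In the sketch below (SWI-Prolog assumed; pred1/2 and pred2/2 are hypothetical names), both versions have the same declarative meaning, the predecessor relation, but different procedural meanings: pred1/2 answers ?- pred1(tom,pat). at once, whereas pred2/2 calls itself before consuming any parent fact and never terminates (in practice, SWI-Prolog eventually reports a stack overflow).

```prolog
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).

% Version 1: base clause first, recursive goal last.
pred1(X, Z) :- parent(X, Z).
pred1(X, Z) :- parent(X, Y), pred1(Y, Z).

% Version 2: recursive clause first, recursive goal first (left recursion).
% Same relation declaratively, but a call such as pred2(tom, pat) keeps
% calling pred2(tom, _) forever. Do not run queries against it.
pred2(X, Z) :- pred2(X, Y), parent(Y, Z).
pred2(X, Z) :- parent(X, Z).
```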
