Special Topics in Computer Science
NLP in a Nutshell
CS492B Spring Semester 2009
Jong C. Park
Computer Science Department
Korea Advanced Institute of Science and Technology
AN OVERVIEW OF LANGUAGE PROCESSING
Jong C. Park, CS Dept., KAIST CS492B: Spring 2009 2
Linguistics and Language Processing
  Linguistics
  Computational Linguistics
  Natural Language Processing
  Natural Language Understanding
Applications of Language Processing (1/3)
Spelling and grammar checkers
  Status: Ubiquitous, though not yet perfect at catching grammar and style errors
Text indexing and information retrieval
  Status: Among the most popular applications on the Web
Speech dictation of letters or reports
  Status: Some systems have a high performance (cf. IBM's ViaVoice, link: http://www-01.ibm.com/software/pervasive/embedded_viavoice/)
Applications of Language Processing (2/3)
Voice control of domestic devices such as videocassette recorders or disc changers
  Status: Few commercial-grade systems, despite many prototype systems
Interactive voice response applications
  Status: Most servers are just interfaces to existing databases, but significant research is ongoing
Applications of Language Processing (3/3)
Machine translation
  Status: One of the oldest domains, with systems that work in a restricted domain in real time (cf. SYSTRAN, link: http://www.systran-software.co.kr/)
Conversational agents
  Status: Some systems show an interesting performance (cf. TRAINS, link: http://www.cs.rochester.edu/research/cisd/projects/trains/; Ulysse, link: http://www.cs.lth.se/home/Pierre_Nugues/Articles/twlt11/twlt11.html)
Domains of Language Processing (1/3)
Phonetics
  Concerns the production and perception of acoustic sounds that form the speech signal
  Phonemes (vowels and consonants) combine to form syllables
Words
  Lexicon: the word set of a language
  Morphology: the study of the structure and the forms of a word
Domains of Language Processing (2/3)
Syntax
  Studies the order of words in a sentence and their relationships
  Syntax defines word categories and functions.
  Parsing determines the structure of a sentence and assigns functions to words or groups of words
Domains of Language Processing (3/3)
Semantics
  Considers the meaning of words and sentences
  Also concerns the determination of the sense of a word or the representation of a sentence in a logical format
Pragmatics
  Concerns the meaning of words and sentences in specific situations
Phonetics
Fig. 1.1. A speech signal corresponding to "This is".
Fig. 1.2. A spectrogram corresponding to the word "serious".
Phonetics
Classification of phonemes
  Simple vowels and nasal vowels appear on the spectrogram as a horizontal bar (the fundamental frequency) and several superimposed horizontal bars (the harmonics).
  Plosives, fricatives, nasals and approximants
Prosody concerns the general rhythm of the sentence.
Speech synthesis, speech recognition
Lexicon and Morphology
Parts of speech
  Article, noun, verb, adjective, adverb, conjunction, preposition, or pronoun
Morphology is the study of how root words and affixes are composed to form words.
  Inflection is the form variation of a word under certain grammatical conditions.
  Derivation combines affixes to an existing root or stem to form a new word.
Morphological parsing
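
Morphological parsing can be sketched in Prolog, the language introduced later in these slides. This is only a minimal illustration: the toy lexicon stem/1 and the predicate parse_word/2 are assumptions, not part of the course material, and only the regular English plural suffix "s" is handled.

```prolog
% Toy lexicon of noun stems (assumed for illustration).
stem(book).
stem(cat).

% parse_word(+Word, -Analysis): split an inflected word into stem and number.
% Analysis is a term noun(Stem, Number).
parse_word(Word, noun(Word, singular)) :-
    stem(Word).
parse_word(Word, noun(Stem, plural)) :-
    atom_concat(Stem, s, Word),   % strip a final "s" ...
    stem(Stem).                   % ... and check that the rest is a known stem
```

The query ?- parse_word(books, A). gives A = noun(book, plural), while ?- parse_word(dogs, A). fails because dog is not in the toy lexicon.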
Syntax
  Governs the formation of a sentence from words
  Sometimes combined with morphology under the term morpho-syntax
  Generative grammars consist of syntactic rules that decompose a phrase into subphrases and describe a sentence composition in terms of phrase structure.
    Cf. Phrase-structure rules
  Parsing is the reverse of generation.
    Cf. Bottom-up parsing, top-down parsing
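
Phrase-structure rules map naturally onto Prolog's definite clause grammar (DCG) notation. The sketch below uses a toy grammar and lexicon invented for illustration; the same rules serve both for parsing and for generation through the standard phrase/2 predicate.

```prolog
% Phrase-structure rules written as a DCG (toy grammar, assumed).
sentence    --> noun_phrase, verb_phrase.
noun_phrase --> determiner, noun.
verb_phrase --> verb, noun_phrase.

% Toy lexicon (assumed).
determiner --> [the].
noun       --> [dog].
noun       --> [cat].
verb       --> [chases].
```

Parsing: ?- phrase(sentence, [the, dog, chases, the, cat]). succeeds. Generation: ?- phrase(sentence, S). enumerates the four sentences of this grammar on backtracking. Prolog executes DCG rules top-down, starting from sentence and expanding toward the words.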
Semantics
  Logical form
Discourse and Dialogue
  Anaphors
  Speech acts
Why Speech and Language Processing Are Difficult
Ambiguity
  Occurs in morphological analysis, part-of-speech annotation, word senses, references, speech recognition, parsing, anaphora resolution, and dialogue.
Models and Their Implementation
Language Technology in Action
The Persona Project at Microsoft [paper]
Sample Dialogue with Peedy
  [Peedy is asleep on his perch.]
  User: Good morning, Peedy.
  [Peedy rouses]
  Peedy: Good morning.
  User: Let's do a demo.
  [Peedy stands up, smiles]
  Peedy: Your wish is my command, what would you like to hear?
  User: What have you got by Bonnie Raitt?
  [Peedy waves in a stream of notes, and grabs one as they rush by.]
  Peedy: OK.
  User: Play some rock after that.
  [Peedy scans the notes again, selects one]
  Peedy: How about "Fools in Love"?
  User: Who wrote that?
  [Peedy cups one wing to his 'ear']
  Peedy: Huh?
  User: Who wrote that?
  [Peedy looks up, scrunches his brow]
  Peedy: Joe Jackson
  User: Fine.
  [Drops note on pile]
  Peedy: OK.
Figure 1: System diagram of the Persona conversational assistant
[Diagram not reproduced. Labeled components: Whisper (Speech Recognition); Names (Proper Name Substitution); NLP (Language Analysis); Semantic (Template Matching & Object Descriptions); Dialogue (Context & Conversation State); Application (CD Changer); Speech Controller; Player/ReActor (Animation Engine). Labeled databases: Names Database; Action Templates Database; Object Database; Dialogue Rules Database; Database (CDs); Speech & Animation Database.]
Source: Lifelike Computer Characters: the Persona project at Microsoft Research, Gene Ball et al., Microsoft Research.
AN INTRODUCTION TO PROLOG
A Brief Guide to SWI-Prolog
SWI-Prolog's Homepage
  http://www.swi-prolog.org/
SWI-Prolog for MS-Windows
  Using SWI-Prolog
SWI-Prolog reference manual
  http://hcs.science.uva.nl/projects/SWI-Prolog/Manual/
Introduction to Prolog
  Defining relations by facts
  Defining relations by rules
  Recursive rules
  How Prolog answers questions
  Declarative and procedural meaning of programs
Defining relations by facts
Example Sentence
  Tom is a parent of Bob.
Example Representation
  parent(tom,bob).
Are there any other ways?
  tom-is-a-parent-of-bob.
  parent-of-bob(tom).
  relation(parent,tom,bob).
  relation(parent,2,tom,bob).
What are the pros and cons?
  generality, extensibility, indexability
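
The trade-off between the specific and the general encoding can be tried out directly. In this minimal sketch, parent_via_relation/2 is a hypothetical helper name introduced only for illustration.

```prolog
% Binary-predicate representation used on the slide.
parent(tom, bob).

% More general encoding: the relation name becomes data, so a single
% predicate can store many different relations.
relation(parent, tom, bob).

% With the general encoding, the specific view can be recovered by a rule.
% parent_via_relation/2 is an illustrative name, not from the slides.
parent_via_relation(X, Y) :- relation(parent, X, Y).
```

The general form allows queries over the relation itself, for example ?- relation(R, tom, bob). gives R = parent. The price is that every fact now starts with the relation name, which matters in Prolog systems that index clauses mainly on the first argument.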
Questions
Who is Tom a parent of?
  ?- parent(tom,X).
  X = bob
  yes
Who else is Tom a parent of?
  parent(tom,bob).
  parent(tom,liz).
  ?- parent(tom,X).
  X = bob;
  X = liz;
  no
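
Typing ";" at the prompt asks for the next answer one at a time. As a small sketch, the standard findall/3 predicate collects all of them at once; children_of/2 is a hypothetical helper name.

```prolog
parent(tom, bob).
parent(tom, liz).

% children_of(+Parent, -Children): gather every answer that the
% interactive ';' would enumerate one by one.
children_of(Parent, Children) :-
    findall(Child, parent(Parent, Child), Children).
```

The query ?- children_of(tom, Cs). gives Cs = [bob, liz], in the order in which the facts appear in the program.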
Question
Who is a grandparent of Jim?
  parent(tom,bob). parent(tom,liz). parent(bob,jim).
  ?- parent(X,Y), parent(Y,jim).
  X = tom, Y = bob
  yes
Any other possibilities?
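
One possibility is to swap the two goals of the conjunction. The sketch below wraps both orders in rules so they can be compared; the predicate names are illustrative assumptions.

```prolog
parent(tom, bob).
parent(tom, liz).
parent(bob, jim).

% Same conjunction, two goal orders.
% Order A: enumerate all parent pairs, then test for a child named jim.
grandparent_of_jim_a(X) :- parent(X, Y), parent(Y, jim).
% Order B: first find jim's parent, then that person's parent.
grandparent_of_jim_b(X) :- parent(Y, jim), parent(X, Y).
```

Both queries answer X = tom. Order B tries fewer parent/2 facts, because the first goal already fixes Y to bob.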
Defining relations by rules
Encoding gender information
  female(liz).
Are there any other ways?
  gender(liz,feminine).
  person(liz,female,21,...).
  female([liz,sue,mary]).
  "X is a male if it is not a female."
What are the pros and cons?
  issues?
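
The last alternative, "X is a male if it is not a female", can be sketched with negation as failure (the \+ operator). The person/1 facts are an assumption added here so that the rule has a finite set of individuals to range over.

```prolog
% Known individuals (assumed for illustration).
person(liz).
person(tom).

female(liz).

% male/1 by negation as failure: a known person not recorded as female.
male(X) :- person(X), \+ female(X).
```

The query ?- male(X). gives X = tom. One issue with this encoding is that anyone missing from the female/1 facts is silently treated as male, since \+ succeeds whenever the goal cannot be proved.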
Other relations
the offspring relation
  Method 1: offspring(liz,tom).
  Method 2: offspring(Y,X) :- parent(X,Y).
the mother relation
  mother(X,Y) :- parent(X,Y), female(X).
the grandparent relation
  grandparent(X,Z) :- parent(X,Y),
                      parent(Y,Z).
Are there any other ways?
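
The three rules can be loaded together with a few facts and queried. The facts below are a small subset of the family program used later in these slides.

```prolog
parent(pam, bob).
parent(tom, bob).
parent(bob, ann).

female(pam).
female(ann).

% Y is an offspring of X if X is a parent of Y.
offspring(Y, X) :- parent(X, Y).

% X is the mother of Y if X is a parent of Y and X is female.
mother(X, Y) :- parent(X, Y), female(X).

% X is a grandparent of Z if X is a parent of some Y who is a parent of Z.
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
```

For example, ?- mother(M, bob). gives M = pam, and ?- grandparent(G, ann). gives G = pam and then G = tom on backtracking.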
Defining the sister relation
  sister(X,Y) :- parent(Z,X), parent(Z,Y),
                 female(X).
Any problems?
  Liz is a sister of herself?
One possible solution
  sister(X,Y) :- parent(Z,X), parent(Z,Y),
                 female(X), different(X,Y).
How do we define different(X,Y)?
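
One possible definition, offered only as a sketch and not necessarily the one intended in the course, uses the standard \== operator, which succeeds when its two arguments are not identical terms.

```prolog
parent(tom, bob).
parent(tom, liz).
female(liz).

% One possible definition (an assumption, not given on the slide):
% X and Y are different if they are not identical terms.
different(X, Y) :- X \== Y.

sister(X, Y) :-
    parent(Z, X),
    parent(Z, Y),
    female(X),
    different(X, Y).   % called last, when X and Y are both bound
```

With this definition ?- sister(liz, bob). succeeds and ?- sister(liz, liz). fails. The position of different/2 matters: \== compares the terms as they are at call time, so it should be tested only after X and Y have been bound by the parent/2 goals.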
Recursive rules
What is recursion?
  definition
Why do we need recursion?
  reason
Example: the predecessor relation
  predecessor(X,Z) :- parent(X,Z).
  predecessor(X,Z) :- parent(X,Y),
                      predecessor(Y,Z).
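
The recursive definition can be run on a short chain of parent facts. The facts below are taken from the family program used later in these slides.

```prolog
parent(pam, bob).
parent(tom, bob).
parent(bob, pat).
parent(pat, jim).

% Base case: a parent is a predecessor.
predecessor(X, Z) :- parent(X, Z).
% Recursive case: a parent of a predecessor is also a predecessor.
predecessor(X, Z) :- parent(X, Y), predecessor(Y, Z).
```

The query ?- predecessor(tom, D). gives D = bob, D = pat and D = jim in that order. Without recursion, a separate rule would be needed for each generation distance (parent, grandparent, great-grandparent, and so on).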
How Prolog answers questions
Terminology
  predicate, argument, clause, procedure
  fact, rule, head and body, goal, question
Sample Interaction
Axioms
  fallible(X) :- man(X).   % All men are fallible.
  man(socrates).           % Socrates is a man.
Is this a theorem?
  ?- fallible(socrates).   % Is Socrates fallible?
  yes
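
The terms head, body, fact and rule can be inspected from inside Prolog with the standard clause/2 predicate. This is a minimal sketch; the dynamic declaration is added here as a precaution so that clause/2 is allowed to look at these predicates in any Prolog system.

```prolog
:- dynamic fallible/1, man/1.

fallible(X) :- man(X).   % rule: head fallible(X), body man(X)
man(socrates).           % fact: a clause whose body is simply true
```

The query ?- clause(fallible(X), Body). gives Body = man(X), and ?- clause(man(socrates), Body). gives Body = true. The question ?- fallible(socrates). succeeds as on the slide.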
Another sample interaction
  ?- predecessor(tom,pat).
backtracking
  parent(pam,bob). parent(tom,bob). parent(tom,liz).
  parent(bob,ann). parent(bob,pat). parent(pat,jim).
  female(pam). male(tom). male(bob).
  female(liz). female(ann). female(pat).
  male(jim).
  offspring(Y,X) :- parent(X,Y).
  mother(X,Y) :- ... . sister(X,Y) :- ... .
  grandparent(X,Z) :- parent(X,Y), parent(Y,Z).
  predecessor(X,Z) :- parent(X,Z).
  predecessor(X,Z) :- parent(X,Y), predecessor(Y,Z).
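
How the answer to ?- predecessor(tom,pat). is found can be made visible with a small "vanilla" meta-interpreter that returns the clauses it used. This is a sketch under stated assumptions: prove/2 is a hypothetical name, only two parent facts are kept, and it mimics rather than reproduces the real Prolog engine.

```prolog
:- dynamic parent/2, predecessor/2.

parent(tom, bob).
parent(bob, pat).

predecessor(X, Z) :- parent(X, Z).
predecessor(X, Z) :- parent(X, Y), predecessor(Y, Z).

% prove(+Goal, -Proof): solve Goal the way Prolog does (clauses tried top
% to bottom, goals left to right) and record which goals were proved.
prove(true, true) :- !.
prove((A, B), (ProofA, ProofB)) :- !,
    prove(A, ProofA),
    prove(B, ProofB).
prove(Goal, Goal-BodyProof) :-
    clause(Goal, Body),        % pick a matching clause; backtrack on failure
    prove(Body, BodyProof).
```

For ?- prove(predecessor(tom,pat), P). the first predecessor/2 clause is tried and fails, because parent(tom,pat) is not a fact. Prolog then backtracks to the second clause, proves parent(tom,bob), and finally proves predecessor(bob,pat) with the first clause. P records exactly these steps.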
Meaning of programs
Declarative meaning
  concerned with the relations defined by the program
  determines what will be the output of the program
Procedural meaning
  determines how this output is obtained (or, how the relations are actually evaluated by the Prolog system)
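
The distinction can be seen by writing the predecessor relation with its two clauses in either order. The names pred1/2 and pred2/2 are illustrative assumptions; both define the same relation, but Prolog searches them differently.

```prolog
parent(tom, bob).
parent(bob, pat).
parent(pat, jim).

% Version 1: base clause first.
pred1(X, Z) :- parent(X, Z).
pred1(X, Z) :- parent(X, Y), pred1(Y, Z).

% Version 2: recursive clause first.
pred2(X, Z) :- parent(X, Y), pred2(Y, Z).
pred2(X, Z) :- parent(X, Z).
```

Declaratively the two versions are identical: both say that tom is a predecessor of bob, pat and jim. Procedurally they differ: ?- pred1(tom, Z). returns bob, pat, jim in that order, while ?- pred2(tom, Z). descends to the deepest generation first and returns jim, pat, bob.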