Introduction to Computational Linguistics
Dr. Radhika Mamidi
ENG 270
Lecture 2
CL vs NLP
- CL and NLP are related, but their focus is different.
- Computational Linguistics aims to model language as people do.
- Natural Language Processing is processing language from a computational point of view in order to build different applications and tools.
- Applications on the computer side
History: 1940s-1950s
- Development of formal language theory (Chomsky, Kleene, Backus)
  - Formal characterization of classes of grammar (context-free, regular)
  - Association with relevant automata
- Probability theory: language understanding as decoding through a noisy channel (Shannon)
  - Use of information-theoretic concepts like entropy to measure the success of language models
1957-1983: Symbolic vs. Stochastic
- Symbolic
  - Use of formal grammars as the basis for natural language processing and learning systems (Chomsky, Harris)
  - Use of logic and logic-based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
  - First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd)
  - Discourse processing: role of intention, focus (Grosz, Sidner, Hobbs)
- Stochastic modeling
  - Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)
1983-1993: Return of Empiricism
- Use of stochastic techniques for part-of-speech tagging, parsing, word sense disambiguation, etc.
- Comparison of stochastic, symbolic and other models for language understanding and learning tasks
1993-Present
- Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.
Language and Intelligence: Turing Test
Turing test:
- Participants: a machine, a human, and a human judge
- The judge asks questions of the computer and the human.
- The machine’s job is to act like a human.
- The human’s job is to convince the judge that he is not the machine.
- The machine is judged “intelligent” if it can fool the judge.
- Judgment of “intelligence” is linked to appropriate answers to questions from the system.
ELIZA
- A simple “Rogerian psychologist”
- Uses pattern matching to carry on a limited form of conversation
- It gives a feeling that it is “human”
- Seems to pass the “Turing Test”
- It is one of the first chatbots.
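The pattern-matching idea behind ELIZA can be sketched in a few lines. This is only an illustration: the rules, the reply templates and the names `RULES`, `DEFAULT_REPLY` and `respond` are assumptions made for this example and are not taken from the original ELIZA script. The sketch also skips pronoun swapping ("my" to "your"), which a fuller version would need.

```python
import re

# Hypothetical rules for illustration only; not the original ELIZA script.
# Each rule pairs a pattern with a reply template that reuses the captured text.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches the utterance."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    # No rule matched: fall back to a neutral prompt that keeps the user talking.
    return DEFAULT_REPLY
```

For example, `respond("I am unhappy.")` gives "Why do you say you are unhappy?". The program has no understanding of the input; it only reflects part of it back, which is why the conversation stays limited.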
Ambiguity - Mental processing
- He showed me the mouse - rodent/object
- The leopard was spotted - verb/adjective
- She hit the boy with the umbrella
- I am reading a book on films - nowadays/right now
- Mary promised Sally(i) to go to her(i) party
- Mary(i) persuaded Sally to go to her(i) party
What’s involved in an “intelligent” Answer?
Analysis:
Decomposition of the signal (spoken or written) eventually into meaningful units. This involves …
- Phonology
- Morphology
- Syntax
- Discourse Analysis
- Semantics
- Pragmatics
Levels of Language Processing
- Phonology
- Morphology
- Syntax
- Semantics
- Pragmatics
- Discourse Analysis
Examples
- Pronounce “GHOTI”
- I scream, A nameless man
- change, kite, park, fine
- Fine for parking!
- Flying planes can be dangerous.
- If the baby doesn’t thrive on raw milk, boil it!
How was it?How was it?
Speech/Character Recognition
- Decomposition into words, segmentation of words into appropriate phones or letters
- Requires knowledge of phonological patterns
Applications
Text to speech
- Riyadh is the capital city of the Kingdom of Saudi Arabia. Riyadh is a beautiful place. I love living here.
- http://tcts.fpms.ac.be/synthesis/mbrola/
- Use: Public announcements (airports, railway stations)
Speech Recognition
- Use: Pronunciation dictionaries, mobile phones, voice commands on a PC
Some problems
- Grapheme to phoneme conversion
  - Different spellings, same pronunciation
  - Same spellings, different pronunciation
  - Example: read, bow, dove (same spelling); reed-read, bear-bare (same pronunciation)
- Numbers, names, acronyms: 1980, St., PSU
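To see why numbers, abbreviations and acronyms are a problem for text to speech, here is a rough sketch of the expansion step that has to happen before pronunciation. Everything in it is an assumption made for illustration: the tiny `ABBREVIATIONS` table, the rule that a four-digit number is read as a year, and the function names. The point is that a token like "St." yields more than one candidate, and only context can pick the right one.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical abbreviation table; "St." is deliberately ambiguous.
ABBREVIATIONS = {"St.": ["Saint", "Street"]}


def two_digits(n: int) -> str:
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def read_year(token: str) -> str:
    """Read a four-digit token as a year in two halves (assumed convention)."""
    high, low = int(token[:2]), int(token[2:])
    if low == 0:
        return f"{two_digits(high)} hundred"
    if low < 10:
        return f"{two_digits(high)} oh {ONES[low]}"
    return f"{two_digits(high)} {two_digits(low)}"


def candidates(token: str) -> list[str]:
    """Return the possible spoken forms of one written token."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if token.isdigit() and len(token) == 4:
        return [read_year(token)]
    if token.isalpha() and token.isupper():
        # Acronym: spell it out letter by letter.
        return [" ".join(token)]
    return [token]
```

With these assumptions, `candidates("1980")` gives one reading, while `candidates("St.")` gives two, and the system still has to decide between them.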
Heterarchical model of Language Processing
[Figure; labels: Memory, General Knowledge, Lexicon, Syntactic Rules, Semantic Rules, Discourse Rules; INPUTS; Lexical Processing, Syntactic Processing, Semantic Processing, Discourse Processing; OUTPUTS]
Morphological Analysis
- Inflectional morphology
  - Word variation reflects features like tense, number, degree, gender
  - Grammatical category remains the same
  - e.g. eat-eats, boy-boys, thin-thinner
- Derivational morphology
  - Word variation changes grammatical category, e.g. act-actor, boy-boyish
  - Word variation maintains grammatical category, e.g. fair-unfair, like-dislike
- Inflection follows derivation: act - actor - actors
- Morphological analyzer identifies roots and affixes
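A morphological analyzer of the kind described here can be sketched as affix stripping against a small lexicon. The root list, the affix lists and the function names are assumptions made for this example, built only from the words on this slide; a real analyzer needs a full lexicon and proper spelling rules. Because inflection follows derivation, the sketch removes an inflectional suffix first (it is outermost), then a derivational suffix, then a prefix.

```python
# Hypothetical toy lexicon and affix lists, taken from the slide's examples.
ROOTS = {"act", "boy", "eat", "fair", "like", "thin"}
PREFIXES = ("un", "dis")
DERIVATIONAL_SUFFIXES = ("or", "ish")
INFLECTIONAL_SUFFIXES = ("s", "er")


def _undouble(stem):
    """Undo consonant doubling before a suffix, e.g. 'thinn' -> 'thin'."""
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        return stem[:-1]
    return stem


def _strip_one(stem, suffixes):
    """Yield (shorter_stem, suffix) for each suffix that can be removed."""
    for suffix in suffixes:
        if stem.endswith(suffix) and len(stem) > len(suffix):
            yield stem[: -len(suffix)], suffix


def analyze(word):
    """Return a dict with prefix, root and suffixes, or None if nothing fits."""
    word = word.lower()
    # Each candidate is (remaining stem, suffixes found so far, inner to outer).
    found = [(word, [])]
    # Inflection is outermost, so try removing it first.
    found += [(s, [suf]) for s, suf in _strip_one(word, INFLECTIONAL_SUFFIXES)]
    # Then try removing one derivational suffix from every candidate so far.
    for stem, suffixes in list(found):
        found += [(s, [suf] + suffixes)
                  for s, suf in _strip_one(stem, DERIVATIONAL_SUFFIXES)]
    for stem, suffixes in found:
        for prefix in ("",) + PREFIXES:
            if not stem.startswith(prefix):
                continue
            rest = stem[len(prefix):]
            for root in (rest, _undouble(rest)):
                if root in ROOTS:
                    return {"prefix": prefix, "root": root, "suffixes": suffixes}
    return None
```

For "actors" the sketch returns the root "act" with the suffixes "or" and "s" in that order, which mirrors act - actor - actors.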
Syntactic Parsing
- Process of identifying the syntactic structure of a valid sentence
- Represented by trees, rules and networks
- Syntax components
  - Phrase Structure Rules
  - Transformational Rules
- Syntactic parsers
  - e.g. Augmented Transition Networks
Syntax Component
Chomsky’s (1965) model of language
- Phrase Structure rules generate deep structures
- Deep Structure holds all the syntactic information needed to derive the meaning of a sentence
  - This is fed into the semantic component to obtain acceptable combinations
- Transformational rules map deep structures to surface structure
- Surface Structure has words in the right order
  - The spoken form is obtained after feeding the surface structure into the phonological component
Chomsky’s model
[Figure; labels: SYNTAX COMPONENT (Phrase Structure Rules, Deep structures, Transformational rules, Surface structures); SEMANTIC COMPONENT (Lexicon, Selection restriction rules); PHONOLOGICAL COMPONENT (Phonological rules)]
Augmented Transition Networks
- Developed by Woods (1970)
- Series of states with arrows (arcs) linking one state to the next
- Works through a sentence from left to right
- The arcs are labelled
- Group of words stored temporarily in a ‘register’ helps in look ahead - which arc to take next
S:  s1 --NP--> s2 --VP--> s3
NP: s1 --article--> s2 --noun--> s3 (plus an Empty arc and an Adj loop)
VP: s1 --verb--> s2 --NP--> s3
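The S, NP and VP networks above can be tried out with a small recognizer. This is a sketch under stated assumptions, not Woods's system: it has no registers, so strictly it is a plain recursive transition network; the Empty arc is assumed to skip the article (s1 to s2) and the Adj loop is assumed to sit on s2 of the NP network; the `LEXICON` and all function names are made up for this example.

```python
# Hypothetical word-category lexicon for the example sentence.
LEXICON = {
    "riyadh": "noun", "place": "noun", "is": "verb",
    "a": "article", "beautiful": "adj",
}

# Each network maps a state to its outgoing arcs as (label, next_state).
# Assumed arc placement: EMPTY skips the article, the adj arc loops on s2.
NETWORKS = {
    "S": {"s1": [("NP", "s2")], "s2": [("VP", "s3")], "s3": []},
    "NP": {"s1": [("article", "s2"), ("EMPTY", "s2")],
           "s2": [("adj", "s2"), ("noun", "s3")], "s3": []},
    "VP": {"s1": [("verb", "s2")], "s2": [("NP", "s3")], "s3": []},
}
FINAL_STATE = "s3"


def traverse(network, state, words, pos):
    """Yield each input position at which `network` can reach its final state."""
    if state == FINAL_STATE:
        yield pos
        return
    for label, next_state in NETWORKS[network][state]:
        if label == "EMPTY":
            # Empty arc: change state without consuming a word.
            yield from traverse(network, next_state, words, pos)
        elif label in NETWORKS:
            # Arc labelled with a network name: run that network first.
            for end in traverse(label, "s1", words, pos):
                yield from traverse(network, next_state, words, end)
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            # Arc labelled with a word category: consume one matching word.
            yield from traverse(network, next_state, words, pos + 1)


def accepts(sentence):
    """True if the S network can consume the whole sentence, left to right."""
    words = sentence.lower().rstrip(".").split()
    return any(end == len(words) for end in traverse("S", "s1", words, 0))
```

`accepts("Riyadh is a beautiful place.")` succeeds: the NP network takes "Riyadh" through the Empty arc, and the VP network takes "is" followed by a second NP that uses the article arc and the Adj loop.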
[Tree: (S (NP (N Riyadh)) (VP (V is) (NP (art a) (Adj beautiful) (Noun place))))]
Example of syntactic analysis by ‘Link parser’.
Riyadh is a beautiful place.
(S (NP Riyadh)
   (VP is
       (NP a beautiful place))
   .)
http://www.link.cs.cmu.edu/link/submit-sentence-4.html