Introduction to Computational Linguistics Dr. Radhika Mamidi ENG 270 Lecture 2

CL vs NLP

CL and NLP are related with the focus being different.

Computational Linguistics aims to model language as people do.

Natural Language Processing is processing language from a computational point of view in order to build different applications and tools.

Applications on the computer side

History: 1940-1950’s

Development of formal language theory (Chomsky, Kleene, Backus)
– Formal characterization of classes of grammar (context-free, regular)
– Association with relevant automata

Probability theory: language understanding as decoding through noisy channel (Shannon)
– Use of information theoretic concepts like entropy to measure success of language models.
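
As a rough illustration of the entropy idea above, the sketch below estimates a unigram model from a toy corpus and measures how many bits per word it needs on held-out text (cross-entropy); a lower value suggests a better model. The corpus, the function names and the add-one smoothing are assumptions made for this example and do not come from the lecture.

```python
import math
from collections import Counter

def unigram_model(tokens, vocab):
    """Add-one smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def entropy(model):
    """Entropy in bits of the model's own word distribution."""
    return -sum(p * math.log2(p) for p in model.values())

def cross_entropy(model, tokens):
    """Average number of bits the model spends per held-out token."""
    return -sum(math.log2(model[w]) for w in tokens) / len(tokens)

# Toy data (hypothetical)
train = "the cat sat on the mat".split()
held_out = "the cat sat on the cat".split()
vocab = set(train) | set(held_out)

model = unigram_model(train, vocab)
print(entropy(model), cross_entropy(model, held_out))
```

A uniform distribution over four words has an entropy of exactly 2 bits, which is a convenient sanity check for the `entropy` function.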

1957-1983: Symbolic vs. Stochastic

Symbolic
– Use of formal grammars as basis for natural language processing and learning systems. (Chomsky, Harris)
– Use of logic and logic based programming for characterizing syntactic or semantic inference (Kaplan, Kay, Pereira)
– First toy natural language understanding and generation systems (Woods, Minsky, Schank, Winograd)
– Discourse Processing: Role of Intention, Focus (Grosz, Sidner, Hobbs)

Stochastic Modeling
– Probabilistic methods for early speech recognition, OCR (Bledsoe and Browning, Jelinek, Black, Mercer)

1983-1993: Return of Empiricism

Use of stochastic techniques for part of speech tagging, parsing, word sense disambiguation, etc.

Comparison of stochastic, symbolic and other models for language understanding and learning tasks.

1993-Present

Advances in software and hardware create NLP needs for information retrieval (web), machine translation, spelling and grammar checking, speech recognition and synthesis.

Language and Intelligence: Turing Test

Turing test:
-- machine, human, and human judge
Judge asks questions of computer and human.
-- Machine’s job is to act like a human
-- Human’s job is to convince judge that he’s not the machine.
Machine judged “intelligent” if it can fool judge.
Judgment of “intelligence” linked to appropriate answers to questions from the system.

ELIZA

A simple “Rogerian Psychologist”
Uses pattern matching to carry on limited form of conversation.
It gives a feeling that it is “human”
Seems to pass the “Turing Test”
It is one of the first chatbots.
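
The pattern matching mentioned above can be sketched in a few lines. This is not the original ELIZA script: the three rules, the default reply and the function name `respond` are hypothetical, and the pronoun swapping that ELIZA-style programs usually perform ("my" to "your") is left out to keep the example short.

```python
import re

# Hypothetical rules: (pattern, response template). The captured text
# from the user's sentence is echoed back inside the response.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT = "Please go on."

def respond(utterance):
    """Return the response of the first rule whose pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT

print(respond("I am unhappy."))
print(respond("It is raining"))
```

The program has no model of meaning; it only reuses fragments of the input, which is why the conversation it can carry on is limited.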

Ambiguity - Mental processing

He showed me the mouse - rodent/object
The leopard was spotted - verb/adjective
She hit the boy with the umbrella
I am reading a book on films - now-a-days/right now
Mary promised Sally(i) to go to her(i) party
Mary(i) persuaded Sally to go to her(i) party

What’s involved in an “intelligent” Answer?

Analysis:

Decomposition of the signal (spoken or written) eventually into meaningful units.
This involves …

Phonology
Morphology
Syntax
Discourse Analysis
Semantics
Pragmatics

Levels of Language Processing

Phonology
Morphology
Syntax
Semantics
Pragmatics
Discourse Analysis

Examples

Pronounce “GHOTI”
I scream, A nameless man
change, kite, park, fine
Fine for parking!
Flying planes can be dangerous.
If the baby doesn’t thrive on raw milk, boil it!

How was it?

Speech/Character Recognition

Decomposition into words, segmentation of words into appropriate phones or letters

Requires knowledge of phonological patterns

Applications

Text to speech

Riyadh is the capital city of the Kingdom of Saudi Arabia. Riyadh is a beautiful place. I love living here.
http://tcts.fpms.ac.be/synthesis/mbrola/

Use: Public announcements – airport, railway stations

Speech Recognition

Use: Pronunciation dictionaries, mobile phones, voice commands in PC

Some problems

Grapheme to Phoneme conversion
Different spellings – same pronunciation
Same spellings – different pronunciation

Example:
read, bow, dove
reed-read, bear-bare

Numbers, Names, Acronyms
1980, St., PSU
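
To make the last problem concrete, the sketch below expands the three example tokens before they would reach a speech synthesizer. Every rule here is an assumption chosen for illustration: years from 1000 to 1999 are read as two pairs of digits, "St." is read as "Saint" before a capitalized word and as "Street" otherwise, and all-caps tokens are spelled letter by letter. A real system would need far richer context.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"

def year_to_words(year):
    """Read a year as two pairs, e.g. 1980 -> 'nineteen eighty'."""
    high, low = divmod(year, 100)
    if low == 0:
        return f"{two_digits(high)} hundred"
    if low < 10:
        return f"{two_digits(high)} oh {ONES[low]}"
    return f"{two_digits(high)} {two_digits(low)}"

def normalize(tokens):
    """Expand years, 'St.' and acronyms using the toy rules above."""
    out = []
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"1\d{3}", tok):
            out.append(year_to_words(int(tok)))
        elif tok == "St.":
            before_name = i + 1 < len(tokens) and tokens[i + 1][0].isupper()
            out.append("Saint" if before_name else "Street")
        elif re.fullmatch(r"[A-Z]{2,}", tok):
            out.append(" ".join(tok))  # spell the acronym letter by letter
        else:
            out.append(tok)
    return out

print(normalize("In 1980 PSU moved to Main St.".split()))
```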

Heterarchical model of Language Processing

[Diagram labels: INPUTS; Lexical Processing; Syntactic Processing; Semantic Processing; Discourse Processing; OUTPUTS; Memory: General Knowledge, Lexicon, Syntactic Rules, Semantic Rules, Discourse Rules]

Morphological Analysis

Inflectional morphology
– word variation reflects features like tense, number, degree, gender
– grammatical category remains same
eg. eat-eats, boy-boys, thin-thinner

Derivational morphology
– word variation changes grammatical category
eg. act-actor, boy-boyish
– word variation maintains grammatical category
eg. fair-unfair, like-dislike

Inflection follows Derivation: act--actor--actors

Morphological analyzer identifies roots and affixes
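
A minimal sketch of such an analyzer, using only the words from the examples above. The tiny lexicon, the affix lists and the function name `analyze` are assumptions made for this illustration. The search strips the inflectional suffix first and the derivational suffix second, which mirrors "inflection follows derivation". Spelling changes such as the doubled consonant in thin-thinner are not handled.

```python
# Hypothetical toy lexicon and affix lists
ROOTS = {"act", "boy", "eat", "thin", "fair", "like"}
PREFIXES = ["un", "dis"]
DERIVATIONAL = ["or", "ish"]
INFLECTIONAL = ["s", "er"]

def analyze(word):
    """Return (prefix, root, derivational suffix, inflectional suffix),
    or None if no segmentation ends in a known root."""
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        # The inflectional suffix is outermost, so it is stripped first.
        for infl in [""] + INFLECTIONAL:
            if infl and not rest.endswith(infl):
                continue
            stem = rest[:len(rest) - len(infl)]
            for deriv in [""] + DERIVATIONAL:
                if deriv and not stem.endswith(deriv):
                    continue
                root = stem[:len(stem) - len(deriv)]
                if root in ROOTS:
                    return prefix, root, deriv, infl
    return None

print(analyze("actors"))
print(analyze("unfair"))
```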

Syntactic Parsing

Process of identifying syntactic structure of a valid sentence
Represented by trees, rules and networks

Syntax Components
– Phrase Structure Rules
– Transformational Rules

Syntactic Parsers
– e.g. Augmented Transition Networks

Syntax Component

Chomsky’s (1965) model of language
Phrase Structure rules generate deep structures
Deep Structure holds all the syntactic information needed to derive the meaning of a sentence
This is fed into the semantic component to obtain acceptable combinations
Transformational rules map deep structures to surface structure
Surface Structure has words in the right order
This is obtained after feeding surface structure into the phonological component

Chomsky’s model

[Diagram labels: SYNTAX COMPONENT; Phrase Structure Rules; Deep structures; Transformational rules; Surface structures; SEMANTIC COMPONENT; Lexicon; Selection restriction rules; PHONOLOGICAL COMPONENT; Phonological rules]

Augmented Transition Networks

Developed by Woods (1970)
Series of states with arrows (arcs) linking one state to the next
Works through a sentence from left to right
The arcs are labelled
Group of words stored temporarily in ‘register’
– helps in look ahead - which arc to take next

S:  s1 --NP--> s2 --VP--> s3

NP: s1 --article--> s2 --noun--> s3 (with an Empty arc and an Adj loop)

VP: s1 --verb--> s2 --NP--> s3
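
The three networks above can be walked by a short program. The sketch below is a plain recursive transition network, not a full ATN: it has no registers or look-ahead and uses backtracking instead. The placement of the Empty arc (skipping the article, from s1 to s2) and of the Adj loop (on s2) is an assumption, since the diagram does not survive in the text. The lexicon and all function names are also hypothetical.

```python
# Hypothetical lexicon mapping words to categories
LEXICON = {
    "riyadh": "noun", "place": "noun", "is": "verb",
    "a": "article", "beautiful": "adj",
}

# Each network maps a state to its outgoing arcs: (arc type, label, next state).
# "cat" consumes one word of that category, "push" runs a sub-network,
# "jump" moves on without consuming input (the Empty arc).
NETWORKS = {
    "S":  {"s1": [("push", "NP", "s2")],
           "s2": [("push", "VP", "s3")]},
    "NP": {"s1": [("cat", "article", "s2"), ("jump", None, "s2")],
           "s2": [("cat", "adj", "s2"), ("cat", "noun", "s3")]},
    "VP": {"s1": [("cat", "verb", "s2")],
           "s2": [("push", "NP", "s3")]},
}
FINAL = "s3"

def traverse(net, state, words, pos):
    """Yield (next position, children) for each path to the final state."""
    if state == FINAL:
        yield pos, []
        return
    for kind, label, nxt in NETWORKS[net][state]:
        if kind == "cat":
            if pos < len(words) and LEXICON.get(words[pos]) == label:
                for end, rest in traverse(net, nxt, words, pos + 1):
                    yield end, [(label, words[pos])] + rest
        elif kind == "jump":
            yield from traverse(net, nxt, words, pos)
        else:  # "push": parse a constituent with the named sub-network
            for mid, sub in traverse(label, "s1", words, pos):
                for end, rest in traverse(net, nxt, words, mid):
                    yield end, [(label, sub)] + rest

def parse(sentence):
    """Return a parse tree covering the whole sentence, or None."""
    words = sentence.lower().rstrip(".").split()
    for end, children in traverse("S", "s1", words, 0):
        if end == len(words):
            return ("S", children)
    return None

print(parse("Riyadh is a beautiful place."))
```

The sentence is processed from left to right, and "Riyadh" is accepted as an NP only because the Empty arc allows the article to be skipped.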

S
  NP
    N: Riyadh
  VP
    V: is
    NP
      art: a
      Adj: beautiful
      Noun: place

Example of syntactic analysis by ‘Link parser’.

Riyadh is a beautiful place.

(S (NP Riyadh)
   (VP is
       (NP a beautiful place))
   .)

http://www.link.cs.cmu.edu/link/submit-sentence-4.html
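
Bracketed output of this kind is easy to read back into a tree structure. The sketch below is a generic reader for such bracketed strings and is not part of the Link parser itself; the function name `read_tree` and the tuple representation are assumptions made for this example.

```python
import re

def read_tree(text):
    """Turn a bracketed parse such as '(S (NP Riyadh) ...)' into nested
    (label, children) tuples; words and punctuation stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token              # a word or punctuation mark
        label = tokens[pos]           # the category right after "("
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(read())
        pos += 1                      # skip the closing ")"
        return (label, children)

    return read()

tree = read_tree("(S (NP Riyadh) (VP is (NP a beautiful place)) .)")
print(tree)
```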