Natural Language Processing
Rogelio Dávila Pérez
Professor and Researcher
[email protected]


Some terms …

– Speech recognition
– Natural language understanding
– Computational linguistics
– Natural language generation
– Speech synthesis
– Information retrieval
– Information extraction
– Inference

Application Areas

– Machine Translation
– Information Retrieval
– Knowledge Acquisition
– User Interfaces
  – Question-Answering Systems

Application Areas: Advantages of Natural Language Interfaces

Natural language has several obvious and desirable properties (Patrick Doyle):
– It provides an immediate vocabulary for talking about the contents of the computer.
– It provides a means of accessing information in the computer independently of its structure and encodings.
– It shields the user from the formal access language of the underlying system.
– It is available with a minimum of training.

Natural Language Processing History

ELIZA [Weizenbaum, 1966]
The most famous pattern-matching natural language program, ELIZA was built at MIT in 1966. The program assumes the role of a Rogerian, or "nondirective," therapist in its dialog with the user.

It operated by matching the left sides of its rules against the user's last sentence and using the appropriate right side to generate a response. Rules were indexed by keywords, so only a few had to be matched against a particular sentence. Some rules had no left side, so they could apply anywhere, with replies like "Tell me more about that." Note that these rules are "approximate" matchers. This accounts for ELIZA's major strength, its ability to say something reasonable most of the time, as well as its major weakness: the superficiality of its understanding and the ease with which it can be led completely astray.
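The keyword-indexed rule mechanism described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Weizenbaum's script: the keywords, patterns, and replies in the hypothetical `RULES` and `FALLBACKS` tables are invented for the example, and the real ELIZA also ranked keywords and swapped pronouns (e.g., "my" to "your"), which this sketch omits.

```python
import re

# Hypothetical rules, indexed by keyword. Each keyword maps to
# (left side, right side) pairs: a regex pattern and a reply template.
RULES = {
    "mother": [(r".*\bmy mother\b", "Tell me more about your family.")],
    "i am": [(r".*\bi am (.*)", "How long have you been {0}?")],
    "because": [(r".*\bbecause\b", "Is that the real reason?")],
}

# Rules with no left side: they can apply to any sentence.
FALLBACKS = ["Tell me more about that.", "Please go on."]


def respond(sentence: str, turn: int = 0) -> str:
    """Return a reply to the user's last sentence."""
    text = sentence.lower().strip(" .!?")
    # Only the rules indexed by a keyword present in the sentence are tried.
    for keyword, rules in RULES.items():
        if keyword in text:
            for left, right in rules:
                match = re.match(left, text)
                if match:
                    # Fill the template with whatever text the pattern captured.
                    return right.format(*match.groups())
    # No indexed rule matched, so fall back to a rule without a left side.
    return FALLBACKS[turn % len(FALLBACKS)]
```

For example, `respond("I am the king of the moon")` returns "How long have you been the king of the moon?": the approximate match always produces a plausible reply, but nothing in the program knows what the words mean.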

Natural Language Processing History

LUNAR [William Woods, 1973]
LUNAR answered questions about the rock samples brought back from the moon using two databases: the chemical analyses and the literature references. Specifically, it helped geologists access, compare, and evaluate chemical analysis data on moon rock and soil composition obtained from the Apollo 11 mission. It operated by translating a question entered in English into an expression in a formal query language. The translation was done with an ATN (augmented transition network) parser coupled with a rule-driven semantic interpretation procedure.

Natural Language Processing History

SHRDLU [Winograd, 1972]
SHRDLU carried on a dialog with a user in which the system simulated a robot manipulating a set of simple objects on a tabletop. Knowledge was represented as procedures within the system.

The design of the system was based on the belief that, to understand language, a program must deal in an integrated way with syntax, semantics, and reasoning. The basic viewpoint guiding its implementation was that meanings (of words, phrases, and sentences) can be embodied in procedural structures and that language is a way of activating appropriate procedures in the hearer.
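The idea that word meanings can be procedures, and that a sentence activates them, can be illustrated with a toy blocks world. This is a simplified sketch, not Winograd's implementation (SHRDLU was written in Lisp and Micro-Planner): the `World` class, the object ids, and the lexicon are hypothetical. Nouns and adjectives are procedures that narrow down the candidate referents, and the verb is a procedure that changes the state of the world.

```python
from typing import Callable, Dict, List, Optional, Tuple


class World:
    """Hypothetical tabletop: object id -> (shape, color), plus the held object."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[str, str]] = {
            "b1": ("block", "red"),
            "b2": ("block", "green"),
            "p1": ("pyramid", "red"),
        }
        self.holding: Optional[str] = None


Filter = Callable[[World, List[str]], List[str]]


def shape_is(shape: str) -> Filter:
    # Meaning of a noun: keep the candidates that have this shape.
    return lambda world, cands: [o for o in cands if world.objects[o][0] == shape]


def color_is(color: str) -> Filter:
    # Meaning of an adjective: keep the candidates that have this color.
    return lambda world, cands: [o for o in cands if world.objects[o][1] == color]


LEXICON: Dict[str, Filter] = {
    "block": shape_is("block"),
    "pyramid": shape_is("pyramid"),
    "red": color_is("red"),
    "green": color_is("green"),
}


def pick_up(world: World, referents: List[str]) -> str:
    # Meaning of the verb: a procedure that changes the world.
    if len(referents) != 1:
        return "I don't understand which one you mean."
    world.holding = referents[0]
    return "OK."


def execute(command: str, world: World) -> str:
    words = command.lower().strip(".!").split()
    if words[:2] != ["pick", "up"]:
        return "I only know how to pick things up."
    candidates = list(world.objects)
    # Each known word activates its procedure; unknown words such as "the" are skipped.
    for word in words[2:]:
        if word in LEXICON:
            candidates = LEXICON[word](world, candidates)
    return pick_up(world, candidates)
```

With a fresh `World()`, "Pick up the red block." leaves exactly one referent (`b1`) and succeeds, while "Pick up the block" leaves two and the verb procedure asks for clarification.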

Natural Language Processing History

HEARSAY
Speech understanding for voice chess [CMU, 1976]. HEARSAY used a blackboard architecture in which knowledge sources posted constraints. HEARSAY-II understood spoken queries about computer science abstracts stored in a database. HEARSAY-III is a general blackboard architecture.

The HEARSAY project was meant to overcome the limitations of syntax-directed methods that parse from left to right. HEARSAY used three knowledge sources: acoustics and phonetics, the syntax of legal utterances, and the semantics of the domain. Knowledge was constrained by using expected utterances.
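A blackboard architecture can be sketched as a shared data structure plus independent knowledge sources (KSs), each of which reads the blackboard and posts new hypotheses, with a control loop that keeps firing them until none has anything to add. This is a deliberately small sketch, not HEARSAY's actual design: the simulated acoustic `lattice`, the word scores, the lexicon, and the "one piece per move" constraint are all hypothetical stand-ins for the acoustic-phonetic, syntactic, and semantic knowledge sources named above.

```python
from itertools import product
from math import prod

# Hypothetical lexical categories and the single sentence pattern the syntax KS accepts.
LEXICON = {"move": "VERB", "mood": "NOUN", "pawn": "PIECE", "pawns": "PIECE",
           "born": "VERB", "to": "PREP", "two": "NUM", "e4": "SQUARE"}
PATTERN = ("VERB", "PIECE", "PREP", "SQUARE")


def acoustic_ks(bb):
    """Acoustic-phonetic KS: post word hypotheses whose score passes a threshold."""
    if "words" in bb:
        return False
    bb["words"] = [{w: s for w, s in seg.items() if s >= 0.2} for seg in bb["lattice"]]
    return True


def syntax_ks(bb):
    """Syntactic KS: post every word sequence that fits the grammar, with its score."""
    if "words" not in bb or "sentences" in bb:
        return False
    sentences = {}
    for seq in product(*(seg.items() for seg in bb["words"])):
        words = tuple(w for w, _ in seq)
        if tuple(LEXICON.get(w) for w in words) == PATTERN:
            sentences[words] = prod(s for _, s in seq)
    bb["sentences"] = sentences
    return True


def semantic_ks(bb):
    """Semantic KS: in this toy domain a move names one piece, so plurals are rejected."""
    if "sentences" not in bb or "interpretation" in bb:
        return False
    legal = {ws: s for ws, s in bb["sentences"].items() if not ws[1].endswith("s")}
    bb["interpretation"] = " ".join(max(legal, key=legal.get)) if legal else None
    return True


def run(bb, sources):
    """Control loop: keep firing knowledge sources until none changes the blackboard."""
    changed = True
    while changed:
        changed = False
        for ks in sources:
            if ks(bb):
                changed = True
    return bb
```

Usage, with a hypothetical lattice of word candidates for four speech segments:

```python
bb = run({"lattice": [{"move": 0.6, "mood": 0.4},
                      {"pawn": 0.5, "pawns": 0.3, "born": 0.1},
                      {"to": 0.7, "two": 0.3},
                      {"e4": 0.9}]},
         [semantic_ks, syntax_ks, acoustic_ks])
print(bb["interpretation"])  # move pawn to e4
```

The knowledge sources are deliberately listed in reverse order: each one fires only when the hypotheses it needs are already on the blackboard, so processing is driven by the data rather than by a fixed left-to-right pipeline.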

Basic Definitions

Computational Linguistics (CL)
Computational linguistics is a discipline between linguistics and computer science which is concerned with the computational aspects of the human language faculty. It belongs to the cognitive sciences and overlaps with the field of artificial intelligence (AI), a branch of computer science aiming at computational models of human cognition. (Hans Uszkoreit)

Basic Definitions

Natural Language (NL)
The languages that people speak: English, Spanish, Nahuatl, etc.

Natural Language Processing (NLP)
NLP is concerned with making computers understand natural language.

Machine Translation
Machine translation is concerned with making computers translate automatically from one language into another.

Basic Definitions

Grammar
A grammar of a language is a scheme for specifying the sentences in that language. It indicates the syntactic rules for combining words into well-formed phrases and clauses. The theory of generative grammar [Chomsky, 1957] had a profound effect on linguistic research, including AI work in computational linguistics. (Patrick Doyle)
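A grammar in this sense can be written down directly as rewrite rules. The sketch below uses a tiny, hypothetical context-free grammar and enumerates every sentence it generates; the set of generated strings is exactly the language the grammar specifies. Because none of these rules is recursive, the language is finite; a recursive grammar (for example, one containing NP → NP PP) would make this naive enumeration run forever.

```python
from itertools import product

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"]],
}


def generate(symbol):
    """Return every word sequence that `symbol` can be rewritten into."""
    if symbol not in GRAMMAR:  # a terminal symbol is a word
        return [[symbol]]
    sentences = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in product(*(generate(s) for s in rhs)):
            sentences.append([w for part in parts for w in part])
    return sentences
```

Here `generate("S")` yields 16 sentences (4 noun phrases times 4 verb phrases), starting with "the dog sees the dog"; "a cat sees the dog" is in the language, while "dog the sees" is not.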

Basic Definitions

Parsing
Parsing is the "de-linearization" of linguistic input; that is, the use of grammatical rules and other knowledge sources to determine the functions of words in the input sentence. Usually a parser produces a data structure like a derivation tree to represent the structural meaning of a sentence. (Patrick Doyle)
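Parsing runs a grammar in the opposite direction: given the words, it recovers a derivation tree. Below is a minimal backtracking recursive-descent parser over a small hypothetical grammar; it returns the first complete derivation as nested tuples of the form `(category, child, ...)`. Recursive descent is only one of many parsing strategies, and this naive version cannot handle left-recursive rules such as NP → NP PP.

```python
# Hypothetical toy grammar; the VP has a transitive and an intransitive form.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["sees"], ["sleeps"]],
}


def parse_symbol(symbol, words, pos):
    """Yield (tree, next_position) for every way `symbol` can start at words[pos]."""
    if symbol not in GRAMMAR:  # terminal: it must equal the next word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, words, pos):
            yield (symbol, *children), end


def parse_sequence(symbols, words, pos):
    """Yield (list_of_subtrees, next_position) for a sequence of symbols."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse_symbol(symbols[0], words, pos):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield [first, *rest], end


def parse(sentence):
    """Return the first derivation tree that covers the whole sentence, or None."""
    words = sentence.split()
    for tree, end in parse_symbol("S", words, 0):
        if end == len(words):
            return tree
    return None
```

For example, `parse("the dog sleeps")` returns `("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "sleeps")))`: the parser first tries the transitive VP, fails to find an object, and backtracks to the intransitive rule.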


Knowledge about Language

– Phonetics and Phonology: the study of linguistic sounds.
– Morphology: the study of the meaningful components of words.
– Syntax: the study of the structural relationships between words.
– Semantics: the study of meaning.
– Pragmatics: the study of how language is used to accomplish goals.
– Discourse: the study of linguistic units larger than a single sentence.

Ambiguity

We say that a linguistic construction is ambiguous if there are multiple alternative structures that can be built for it.
– Lexical ambiguity: a word or expression may have more than one meaning, e.g., "bank" (a financial institution or the side of a river).
– Syntactic ambiguity: a sentence may have more than one syntactic tree.
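Syntactic ambiguity can be made concrete by counting parse trees. The sketch below is a CKY-style chart that records, for every span of words and every category, how many distinct trees cover that span. The grammar (in Chomsky normal form) is a hypothetical toy built for the classic prepositional-phrase attachment example: "with the telescope" can attach to the verb phrase (the seeing was done with a telescope) or to the noun phrase (the man had the telescope).

```python
from collections import defaultdict

# Hypothetical grammar in Chomsky normal form: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICAL = {"i": ["NP"], "saw": ["V"], "the": ["Det"], "man": ["N"],
           "telescope": ["N"], "with": ["P"]}


def count_parses(sentence, start="S"):
    """Count the distinct parse trees of `sentence` rooted in `start`."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j][A] = number of trees of category A covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for category in LEXICAL.get(w, []):
            chart[i][i + 1][category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]
```

With this grammar, "I saw the man" has exactly one tree, while "I saw the man with the telescope" has two, one for each attachment of the prepositional phrase.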

State of the Art

For more than 20 years, the Canadian computer program METEO has accepted daily weather data and generated weather reports that are passed along unedited to the public in English and French [Chandioux, 1976].

The Babel Fish translation system from Systran handles over 1,000,000 translation requests a day from the AltaVista.com search engine site.

A visitor to Cambridge, Massachusetts, asks a computer about places to eat using only spoken language. The system returns relevant information from a database of facts about the local restaurant scene [Zue et al., 1991].

Some Definitions

Morpheme
A meaningful linguistic unit that contains no smaller meaningful parts. (Patrick Doyle)
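Splitting a word into morphemes can be sketched as a search over a small inventory of known prefixes, roots, and suffixes, in the order prefix*, root, suffix*. The inventories below are hypothetical, and the sketch ignores spelling changes at morpheme boundaries (happy + ness gives happiness), which practical morphological analyzers typically handle with finite-state transducers.

```python
# Hypothetical morpheme inventories.
PREFIXES = ("un", "re")
ROOTS = {"kind", "think", "dog", "play"}
SUFFIXES = {"ness", "ing", "s", "er", "ly"}


def split_suffixes(rest):
    """Split `rest` into a sequence of known suffixes, or return None."""
    if rest == "":
        return []
    for i in range(1, len(rest) + 1):
        if rest[:i] in SUFFIXES:
            tail = split_suffixes(rest[i:])
            if tail is not None:
                return [rest[:i], *tail]
    return None


def segment(word):
    """Return the word as prefix* root suffix* morphemes, or None if no analysis exists."""
    # First try peeling off a known prefix and analyzing the remainder.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            tail = segment(word[len(prefix):])
            if tail is not None:
                return [prefix, *tail]
    # Otherwise look for a root followed only by suffixes.
    for i in range(1, len(word) + 1):
        if word[:i] in ROOTS:
            tail = split_suffixes(word[i:])
            if tail is not None:
                return [word[:i], *tail]
    return None
```

For instance, `segment("unkindness")` gives `["un", "kind", "ness"]`: three pieces, none of which can be divided into smaller meaningful parts.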

Anaphora
The use of a word to refer to previously mentioned entities, e.g., "The boys and I went over to Frank's, because they needed to talk to him." (Patrick Doyle) Here, "they" refers to the boys and "him" refers to Frank.
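A naive resolver for the example above picks the most recent earlier mention that agrees with the pronoun in person, number, and gender. The `Mention` class and all feature values below are hypothetical, and recency plus agreement is only a baseline heuristic; more refined methods, such as Hobbs's syntax-based algorithm, also use the structure of the sentence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    person: int                   # 1, 2 or 3
    number: str                   # "sg" or "pl"
    gender: Optional[str] = None  # "m", "f", "n", or None if unknown


# Hypothetical agreement features for a few English pronouns.
PRONOUNS = {
    "he": Mention("he", 3, "sg", "m"), "him": Mention("him", 3, "sg", "m"),
    "she": Mention("she", 3, "sg", "f"), "her": Mention("her", 3, "sg", "f"),
    "it": Mention("it", 3, "sg", "n"),
    "they": Mention("they", 3, "pl"), "them": Mention("them", 3, "pl"),
}


def agrees(pronoun: Mention, candidate: Mention) -> bool:
    if pronoun.person != candidate.person or pronoun.number != candidate.number:
        return False
    # An unknown gender on either side is treated as compatible.
    return (pronoun.gender is None or candidate.gender is None
            or pronoun.gender == candidate.gender)


def resolve(pronoun: str, history: List[Mention]) -> Optional[str]:
    """Return the most recent earlier mention that agrees with the pronoun."""
    features = PRONOUNS[pronoun]
    for candidate in reversed(history):
        if agrees(features, candidate):
            return candidate.text
    return None
```

With the mentions "the boys", "I", and "Frank" in order, "they" skips the singular "Frank" and the first-person "I" and resolves to "the boys", while "him" resolves to "Frank".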