Diwakar Vishwakarma & Bharti Gupta MCA II Year BBAU(A Central University) Lucknow

Natural Language Processing


AI Concept and Definition

• Encompasses many definitions
• AI involves studying human thought processes
• Representing thought processes on machines
• “…study of how to make computers do things at which, at the moment, people are better” (Rich and Knight [1991])
• Theory of how the human mind works (Mark Fox)

AI Objectives

• Make machines smarter
• Understand what intelligence is
• Make machines more useful (practical purpose)

Turing Test for Intelligence

A computer can be considered smart only when a human interviewer, “conversing” with both an unseen human being and an unseen computer, cannot determine which is which.

Major AI Areas

• Expert Systems
• Natural Language Processing
• Speech Understanding
• Robotics and Sensory Systems
• Computer Vision and Scene Recognition
• Neural Computing
• Fuzzy Logic

Interaction Level

Natural Language Processing is a technique through which a machine can become more human-like, thereby reducing the distance between human beings and machines. In a simple sense, NLP lets humans communicate with machines easily. NLP applications are very useful in everyday life, for example a machine that takes instructions by voice.

Interaction Level

• The level at which computer and human interact.
• Natural language is used to bring the interaction level nearer to the human.

[Figure: interaction level between Human and Computer, with Command-line, NL UI, and Graphical UI marked]

Natural Language?

• Natural language is one of the fundamental aspects of human behavior.
• Provides easy interaction with computers.
• Refers to the language spoken by people, e.g. English, Japanese, Hindi, as opposed to artificial languages like C++, Java, etc.

Where does it fit in the CS taxonomy?

Computers
    Artificial Intelligence
        Robotics
        Expert System
        Natural Language Processing
            Information Retrieval
            Machine Translation
            Language Analysis
                Semantics
                Parsing
    Algorithms
    Databases
    Networking

Natural Language Processing

• Natural Language Processing is a collection of techniques used to extract the meaning from input in order to perform a useful task as a result.
• Automatic analysis of human language by computer algorithms.

Why Natural Language Processing?

• Huge amounts of data
• Internet = at least 20 billion pages and exponentially increasing…
• Applications for processing large amounts of text require NLP expertise

Application Areas of NLP

Text-based applications

These involve applications such as searching for a certain topic or keyword in a database, extracting information from a large document, translating from one language to another, or summarizing text for different purposes.

Application Areas of NLP

Dialogue-based applications

Typical examples are answering systems that can answer questions, services that can be provided over a telephone without an operator, teaching systems, voice-controlled machines (that take instructions by speech), and general problem-solving systems.

Components of Natural Language Processing

• Natural Language Understanding

o Mapping the given input in the natural language into a useful representation.

o Different levels of analysis required: morphological analysis, syntactic analysis, semantic analysis, discourse analysis, …

Components of Natural Language Processing

• Natural Language Generation

o Producing output in the natural language from some internal representation.

o Different levels of synthesis required: deep planning (what to say), syntactic generation

Natural Language Processing

Natural Language Understanding

The steps in natural language understanding are as follows:

Words
↓ Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
↓ Syntactic Analysis
Syntactic Structure
↓ Semantic Analysis
Context-independent meaning representation
↓ Discourse Processing
Final meaning representation

MAJOR TASKS INVOLVED IN NATURAL LANGUAGE PROCESSING

• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse

Phonology

Deals with the interpretation of speech sounds within and across words. Three types of rules are used in phonological analysis:

1) phonetic rules – for sounds within words;

2) phonemic rules – for variations of pronunciation when words are spoken together; and

3) prosodic rules – for fluctuation in stress and intonation across a sentence.

Morphology

Morphology is the first stage of analysis once input has been received. It looks at the ways in which words break down into their components and how that affects their grammatical status.

Morphology

Morphemes are the smallest meaningful units of language.

cars → car+PLU

children → child+PLU
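
The two examples above can be reproduced with a very small analyzer. This is a minimal sketch, not a real morphological analyzer: the irregular-plural table and the bare "strip a final s" rule are assumptions made for illustration, and the rule is known to over-apply (for example, it would wrongly split "bus").

```python
# Toy morphological analyzer: maps a surface word to "stem+PLU" when it
# looks plural. The table and the suffix rule are illustrative assumptions.
IRREGULAR_PLURALS = {"children": "child", "men": "man", "mice": "mouse"}


def analyze(word):
    """Return the stem plus a +PLU tag for plural nouns, else the word itself."""
    w = word.lower()
    if w in IRREGULAR_PLURALS:
        # Irregular forms cannot be handled by a suffix rule, so look them up.
        return IRREGULAR_PLURALS[w] + "+PLU"
    if len(w) > 2 and w.endswith("s") and not w.endswith("ss"):
        # Regular plural: strip the final "s" (naive, over-applies).
        return w[:-1] + "+PLU"
    return w


print(analyze("cars"))      # car+PLU
print(analyze("Children"))  # child+PLU
```

Irregular forms are checked first because no suffix rule can recover "child" from "children".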

Syntax

Syntax involves applying the rules of the target language’s grammar. Its task is to determine the role of each word in a sentence and organize this data into a structure that is more easily manipulated for further analysis.

Issues in Syntax

“the dog ate my homework” - Who did what?

1. Identify the part of speech (POS)
   dog = noun; ate = verb; homework = noun
   English POS tagging accuracy: 95% (can be improved)

2. Identify collocations
   mother in law, hot dog
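
Both steps can be illustrated with a lookup-based sketch. Everything here is an assumption for illustration: the tiny lexicon, the tag names (DET, NOUN, VERB), the collocation list, and the choice to tag unknown words as NOUN. Real taggers are trained on annotated corpora rather than hand-written tables.

```python
# Toy lexicon-lookup POS tagger with collocation merging (illustrative only).
LEXICON = {"the": "DET", "my": "DET", "dog": "NOUN",
           "homework": "NOUN", "ate": "VERB"}
COLLOCATIONS = [("mother", "in", "law"), ("hot", "dog")]


def merge_collocations(tokens):
    """Join known multi-word expressions into single tokens."""
    out, i = [], 0
    while i < len(tokens):
        for coll in COLLOCATIONS:
            if tuple(tokens[i:i + len(coll)]) == coll:
                out.append("_".join(coll))
                i += len(coll)
                break
        else:
            # No collocation starts at position i: keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out


def tag(sentence):
    """Tag each token by lexicon lookup; unknown words default to NOUN."""
    tokens = merge_collocations(sentence.lower().split())
    return [(tok, LEXICON.get(tok, "NOUN")) for tok in tokens]


print(tag("the dog ate my homework"))
print(merge_collocations("my mother in law ate a hot dog".split()))
```

Merging collocations before tagging keeps "hot dog" from being analyzed as an adjective followed by an animal.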

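Issues in Syntax

Full Parsing

Ravindra loves Khusi.

[Figure: parse tree. The sentence "Ravindra loves Khusi" splits into NP(Ravindra) and VP(loves Khusi). The first NP is Noun(R) = Ravindra. The VP consists of Verb = loves and an NP with Noun(K) = Khusi. "loves" is marked as a morphological change of "Love".]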
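
The parse of "Ravindra loves Khusi" can be sketched with a tiny recursive-descent parser. The grammar (S → NP VP, NP → Noun, VP → Verb NP) and the three-word lexicon are assumptions chosen to cover only this one sentence pattern.

```python
# Toy recursive-descent parser for: S -> NP VP, NP -> Noun, VP -> Verb NP.
LEXICON = {"Ravindra": "Noun", "Khusi": "Noun", "loves": "Verb"}


def parse_np(tokens, i):
    """Try to read a noun phrase at position i; return (tree, next_index)."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "Noun":
        return ("NP", ("Noun", tokens[i])), i + 1
    return None, i


def parse_vp(tokens, i):
    """Try to read a verb followed by a noun phrase at position i."""
    if i < len(tokens) and LEXICON.get(tokens[i]) == "Verb":
        np, j = parse_np(tokens, i + 1)
        if np is not None:
            return ("VP", ("Verb", tokens[i]), np), j
    return None, i


def parse_sentence(sentence):
    """Return a nested-tuple parse tree, or None if the sentence does not fit."""
    tokens = sentence.rstrip(".").split()
    np, i = parse_np(tokens, 0)
    if np is None:
        return None
    vp, j = parse_vp(tokens, i)
    if vp is None or j != len(tokens):
        return None
    return ("S", np, vp)


print(parse_sentence("Ravindra loves Khusi."))
```

The nested tuples mirror the tree: the outer S node holds the subject NP and the VP, and the VP in turn holds the verb and the object NP.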

More Issues in Syntax

Preposition Attachment

“I saw the man in the park with a telescope”
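
To see why this sentence is a problem, the possible attachments can be enumerated. The sketch below assumes that each prepositional phrase may attach to the verb or to any noun still reachable without crossing an earlier attachment (a "right frontier" restriction that the slides do not state), and it represents each phrase by its head word only.

```python
# Enumerate attachment readings for prepositional phrases (illustrative).
def enumerate_attachments(frontier, pp_heads):
    """Return every list of (pp_head, attachment_site) pairs.

    frontier: words a phrase may currently attach to, outermost first.
    pp_heads: head nouns of the remaining prepositional phrases, in order.
    """
    if not pp_heads:
        return [[]]
    head, rest = pp_heads[0], pp_heads[1:]
    readings = []
    for k, site in enumerate(frontier):
        # Attaching to frontier[k] closes off everything to its right,
        # and the new phrase's head becomes a possible site itself.
        new_frontier = frontier[:k + 1] + [head]
        for tail in enumerate_attachments(new_frontier, rest):
            readings.append([(head, site)] + tail)
    return readings


# "I saw the man in the park with a telescope"
readings = enumerate_attachments(["saw", "man"], ["park", "telescope"])
for reading in readings:
    print(reading)
print(len(readings))  # 5
```

Under these assumptions a single sentence with two prepositional phrases already has five structural readings, and the parser needs semantic or statistical information to choose among them.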

Semantics

Semantics is the examination of the meaning of words and sentences. It conveys useful information relevant to the scenario as a whole.

Issues in Semantics

Understand language! How?

• “plant” = industrial plant
• “plant” = living organism
• Words are ambiguous

Importance of semantics?

• Machine Translation: wrong translations
• Information Retrieval: wrong information

Issues in Semantics

Learn from annotated examples:

• Assume 100 examples containing “plant” previously tagged by a human
• Train a learning algorithm
• How to choose the learning algorithm?
• How to obtain the 100 tagged examples?
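
As one possible answer to "which learning algorithm?", the sketch below trains a Naive Bayes classifier with add-one smoothing on tagged examples of "plant". The choice of Naive Bayes, the sense labels, and the four made-up training sentences (standing in for the 100 human-tagged examples) are all assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

# Made-up stand-ins for human-tagged examples (illustrative only).
TAGGED = [
    ("the plant closed and workers lost jobs", "industrial"),
    ("the chemical plant produces steel", "industrial"),
    ("water the plant so its leaves grow", "living"),
    ("the plant needs sunlight and water", "living"),
]


def train(examples, target="plant"):
    """Count senses and the context words seen with each sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, sense in examples:
        sense_counts[sense] += 1
        for w in text.split():
            if w != target:
                word_counts[sense][w] += 1
                vocab.add(w)
    return sense_counts, word_counts, vocab


def classify(text, model):
    """Pick the sense with the highest smoothed log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, float("-inf")
    for sense in sorted(sense_counts):
        score = math.log(sense_counts[sense] / total)
        n = sum(word_counts[sense].values())
        for w in text.split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = sense, score
    return best


model = train(TAGGED)
print(classify("workers at the plant", model))   # industrial
print(classify("the plant needs water", model))  # living
```

With only four examples the classifier leans on single words such as "workers" or "water", which is why obtaining enough tagged examples matters as much as the choice of algorithm.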

Pragmatics

Pragmatics is the sequence of steps taken that exposes the overall purpose of the statement being analyzed. The statement is broken down into ambiguous entities, which are then disambiguated to facilitate understanding.

Discourse

Concerns how the immediately preceding sentences affect the interpretation of the next sentence. For example, interpreting pronouns and interpreting the temporal aspects of the information.

Issues in Discourse

Anaphora Resolution: resolving referring expressions

“The dog entered my room. It scared me.”

“Mary bought a book for Kelly. She didn’t like it.”

• “She” refers to Mary or Kelly: possibly Kelly.

• “It” refers to the book.
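
A naive way to resolve such pronouns is to pick the most recent earlier mention of a compatible type. The sketch below assumes a hand-written table of entity types and a recency heuristic. Neither comes from the slides, and the heuristic is only one of several possible strategies.

```python
# Naive recency-based anaphora resolution (illustrative assumptions only).
ENTITY_TYPES = {"Mary": "person", "Kelly": "person", "book": "thing",
                "dog": "thing", "room": "thing"}
PRONOUN_TYPES = {"she": "person", "he": "person", "it": "thing"}


def resolve(pronoun, previous_sentence):
    """Return the most recent entity whose type matches the pronoun."""
    wanted = PRONOUN_TYPES[pronoun.lower()]
    tokens = previous_sentence.rstrip(".").split()
    for token in reversed(tokens):
        if ENTITY_TYPES.get(token) == wanted:
            return token
    return None


print(resolve("She", "Mary bought a book for Kelly."))  # Kelly
print(resolve("it", "Mary bought a book for Kelly."))   # book
print(resolve("It", "The dog entered my room."))        # room
```

The last call shows the weakness of pure recency: it picks "room" where a reader would choose "dog", so practical systems combine recency with grammatical role and meaning.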

Approaches to Natural Language Processing

Natural language processing approaches fall roughly into 3 categories:

Symbolic Approach

• Perform deep analysis of linguistic phenomena
• Based on explicit representation of facts about language

Approaches to Natural Language Processing

Statistical Approach

• Employ various mathematical techniques
• Use large text corpora to develop approximate generalized models of linguistic phenomena
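
One of the simplest models of this kind is a bigram model, which estimates how likely one word is to follow another from counts in a corpus. The sketch below uses three sentences as a stand-in corpus and plain relative frequencies with no smoothing. Both choices are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def prob(counts, prev, nxt):
    """Relative-frequency estimate of P(nxt | prev); 0.0 if prev is unseen."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


corpus = [
    "the dog ate my homework",
    "the dog entered my room",
    "the man saw the dog",
]
model = train_bigrams(corpus)
print(prob(model, "the", "dog"))      # 0.75
print(prob(model, "my", "homework"))  # 0.5
print(prob(model, "dog", "flew"))     # 0.0
```

The zero for an unseen pair is why such models need large corpora and, in practice, some form of smoothing.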

Approaches to Natural Language Processing

Connectionist Approach

• Develop generalized models from examples of linguistic phenomena
• Combine statistical learning with various theories of representation

Research

Microsoft Natural Language Processing Group

The team is broadening the scope of the NLP effort by developing parallel systems in several languages. The languages covered are Chinese, English, French, German, Japanese, Korean and Spanish.

Research

Canon Natural Language Processing Group

Research and development of large-vocabulary speech understanding software for interactive spoken systems.

Applications of NLP

• Machine Translation: different strategies
   Systran: www.Systransoft.com
   Google: Translate.google.com
• Question Answering
• Information Extraction
• Spell Checking
   Microsoft Spell Checker

Machine Translation

Machine Translation is the process of translating text from a source language into a target language.

There are 2 types of MT:

• Rule based MT
• Statistical MT

Machine Translation

Rule based MT

Explicit use and manual creation of linguistically informed rules and representations

Statistical MT

Corpus based, i.e. learned from examples of translations called parallel or bilingual corpora
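
The corpus-based idea can be sketched with IBM Model 1, a classic word-alignment method. The slides do not name this method, so it is an assumption here. The three German-English sentence pairs are a made-up toy parallel corpus, and the code learns only word translation probabilities, not a full translation system.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] with expectation-maximization."""
    src_vocab = {f for src, _ in pairs for f in src}
    # Start from a uniform distribution over translations.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for e in tgt:
                # E-step: share each target word among the source words.
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)


# Toy parallel corpus: (source sentence, target sentence) as word lists.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
table = ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(table, word))
```

No dictionary or grammar rule is supplied: the pairing of "das" with "the" emerges only because the two words keep appearing in the same sentence pairs, which is the contrast with the rule-based approach.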

Applications of Machine Translation

ANGLABHARTI (1991), at IIT Kanpur, is a machine-aided translation system specifically designed for translating English to Indian languages.

Anglabharti uses a pseudo-interlingua approach. It analyses English only once and creates an intermediate structure called PLIL (Pseudo Lingua for Indian Languages).

Applications of Machine Translation

• The Anusaaraka (1995) project started at IIT Kanpur and is now being continued at IIIT Hyderabad.
• Its aim is translation from one Indian language to another.
• Anusaarakas have been built from Telugu, Kannada, Bengali, and Marathi to Hindi.
• TDIL (Technology Development for Indian Languages) is also working on developing various MT tools.

Question Answering

A question answering system automatically answers questions posed by humans in natural language.

Three steps are involved in question answering:

• Question manipulation and classification
• Matching
• Answer selection
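
The three steps can be sketched over a tiny list of facts. The fact sentences are paraphrased from this presentation. The stopword list, the question-type labels, and the word-overlap matching are assumptions for illustration, not how any particular system works.

```python
import re

FACTS = [
    "Anglabharti was developed at IIT Kanpur",
    "Anusaaraka is continued at IIIT Hyderabad",
    "ELIZA uses keyword and pattern matching",
]
STOPWORDS = {"where", "who", "when", "what", "was", "is", "the", "at", "a", "of"}
QUESTION_TYPES = {"where": "LOCATION", "who": "PERSON", "when": "TIME"}


def words(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def classify_question(question):
    """Step 1: guess the expected answer type from the question word."""
    tokens = words(question)
    return QUESTION_TYPES.get(tokens[0], "OTHER") if tokens else "OTHER"


def match(question, facts):
    """Step 2: score each fact by content-word overlap with the question."""
    content = set(words(question)) - STOPWORDS
    return [(len(content & set(words(fact))), fact) for fact in facts]


def select_answer(scored):
    """Step 3: keep the best-scoring fact, or None if nothing overlaps."""
    score, fact = max(scored, key=lambda pair: pair[0])
    return fact if score > 0 else None


def answer(question):
    return classify_question(question), select_answer(match(question, FACTS))


print(answer("Where was Anglabharti developed?"))
print(answer("Who wrote Hamlet?"))
```

In this sketch the question type is only reported. A fuller system would use it in the selection step to extract the part of the fact that has the expected type.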

Applications of Question Answering

LUNAR gives access to a database containing information on lunar rocks and soil composition obtained during the NASA Apollo-11 moon landing mission.

It responds to natural-language queries from geologists, such as “what is the average of the basalt?”

Applications of Question Answering

ELIZA uses the keyword and pattern matching approach. It is based on the use of sentence templates which contain keywords or phrases.

Other famous question answering systems are SHRDLU, GUS, JUPITER, QUALM, and BASEBALL.
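
The keyword-and-template idea behind ELIZA can be sketched in a few lines. The three templates and the default reply below are made up for illustration and are not taken from the original ELIZA script.

```python
import re

# (pattern, reply template) pairs, tried in order. Illustrative only.
TEMPLATES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmother\b", re.IGNORECASE), "Tell me more about your family."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance):
    """Reply using the first template whose keyword pattern matches."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in TEMPLATES:
        found = pattern.search(text)
        if found:
            # Captured text is copied into the reply, giving the illusion
            # of understanding without any analysis of meaning.
            return template.format(*found.groups())
    return DEFAULT_REPLY


print(respond("I am tired."))       # Why do you say you are tired?
print(respond("My mother called"))  # Tell me more about your family.
print(respond("Hello"))             # Please go on.
```

The program never builds a meaning representation. It only reflects the user's own words back, which is why pattern matching alone is considered a shallow approach.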

Future of NLP

There are many applications we can dream of with NLP techniques. How about robots that understand and follow instructions given by human voice, or driving by talking to the car, as in some science fiction movies? These can all be real one day. Imagine a computer system that can follow simple human instructions and do whatever we want it to do. How convenient would that be? But let's leave all that to the future.

Conclusions…

A lot of research is going into developing new applications and investigating new techniques and approaches that will make Statistical NLP more feasible in the near future.

So we will be able to see improved applications of NLP in the near future.


Thank You