
Page 1: Intelligent natural language system

Intelligent Natural Language System

MANISH JOSHI
RAJENDRA AKERKAR

Page 2: Intelligent natural language system

Open Domain Question Answering

What is Question Answering?

How is QA related to IR, IE?

Some issues related to QA

Question taxonomies

General approach to QA


Page 3: Intelligent natural language system

Question Answering Systems

These systems try to provide exact information as an answer in response to a natural language query raised by the user.

Motivation: given a question, the system should provide an answer instead of requiring the user to search for it in a set of documents.

Example:

Q: What year was Mozart born? A: Mozart was born in 1756.


Page 4: Intelligent natural language system

Information Retrieval

Document is the unit of information

Answers questions indirectly: one has to search within the document

Results: (ranked) list based on estimated relevance

Effective approaches are predominantly statistical ("bag of words"; see the sketch below)

QA = (very short) passage retrieval with natural language questions (not queries)
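To make the "bag of words" idea above concrete, here is a minimal sketch (not taken from any of the systems discussed) that ranks a couple of invented documents by cosine similarity between term-count vectors of the question and each document.

```python
# Toy bag-of-words matching: documents and question are invented examples.
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two Counter term vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Mozart was born in 1756 in Salzburg.",
    "Beethoven was born in Bonn in 1770.",
]
question = "What year was Mozart born?"

q_vec = bag_of_words(question)
ranked = sorted(documents, key=lambda d: cosine(q_vec, bag_of_words(d)), reverse=True)
print(ranked[0])   # the document most similar to the question (the Mozart one)
```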


Page 5: Intelligent natural language system

Information Extraction

Task

Identify messages that fall under a number of specific topics

Extract information according to pre-defined templates

Place the information into frame-like database records

Limitations

Templates are hand-crafted by human experts

Templates are domain dependent and not easily portable


Page 6: Intelligent natural language system

Issues

Applications

Source of the answers
Structured data: natural language queries on databases
A fixed collection or a book: an encyclopaedia
Web data

Domain-independent vs. domain-specific

Users

Casual users vs. regular users: profile, history, etc.

A profile and history may be maintained for regular users


Page 7: Intelligent natural language system

Question Taxonomy

Factual questions: the answer is often found in a text snippet from one or more documents

Questions that may have yes/no answers

wh-questions (who, where, when, etc.)

what and which questions are hard

Questions may be phrased as requests or commands

Questions requiring simple reasoning: some world knowledge and elementary reasoning may be required to relate the question to the answer. why, how questions

e.g. How did Socrates die? (by) drinking poisoned wine.


Page 8: Intelligent natural language system

Question Taxonomy

Context questions: questions have to be answered in the context of previous interactions with the user

Who assassinated Indira Gandhi?

When did this happen?

List questions: fusion of partial answers scattered over several documents is necessary

e.g. List 3 major rice-producing nations.

How do I assemble a bicycle?


Page 9: Intelligent natural language system

QA System Architecture


Page 10: Intelligent natural language system

General Approach

Question analysis: find the type of object that answers the question: "when": time, date; "who": person, organization, etc.

Document collection preprocessing: prepare documents for real-time query processing

Document retrieval (IR): using the (augmented) question, retrieve a set of possibly relevant documents/passages using IR
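The three steps above can be sketched as a toy pipeline. This is a schematic illustration only: the function bodies, the wh-word mapping, and the sample collection are invented placeholders, not the approach of ENLIGHT or any other specific system.

```python
import re

WH_TO_TYPE = {"when": "DATE", "who": "PERSON", "where": "LOCATION"}  # simplified mapping

def analyse_question(question):
    """Question analysis: guess the expected answer type and pick out content keywords."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    answer_type = next((WH_TO_TYPE[t] for t in tokens if t in WH_TO_TYPE), "UNKNOWN")
    keywords = [t for t in tokens if t not in WH_TO_TYPE and len(t) > 2]
    return answer_type, keywords

def preprocess(collection):
    """Collection preprocessing: split every document into sentences (stand-in for indexing)."""
    return [s.strip() for doc in collection for s in doc.split(".") if s.strip()]

def retrieve(sentences, keywords, k=3):
    """Document retrieval: return the k sentences sharing the most keywords with the question."""
    overlap = lambda s: len(set(re.findall(r"[a-z0-9]+", s.lower())) & set(keywords))
    return sorted(sentences, key=overlap, reverse=True)[:k]

collection = ["Mozart was born in 1756. He composed over 600 works."]
answer_type, keywords = analyse_question("When was Mozart born?")
print(answer_type, keywords)                       # DATE ['was', 'mozart', 'born']
print(retrieve(preprocess(collection), keywords))  # the birth sentence ranks first
```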


Page 11: Intelligent natural language system

General Approach

Document processing (IE): search documents for entities of the desired type and in appropriate relations using NLP

Answer extraction and ranking: extract and rank candidate answers from the documents

Answer construction: provide (links to) context, evidence, etc.


Page 12: Intelligent natural language system

Question Analysis

Identify the semantic type of the entity sought by the question
when, where, who: easy to handle
which, what: ambiguous

e.g. What was the Beatles' first hit single?

Determine additional constraints on the answer entity:
keywords that will be used to locate candidate answer-bearing sentences
relations (syntactic/semantic) that should hold between a candidate answer entity and other entities mentioned in the question
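As a hedged illustration of the point that when/where/who map cleanly to a semantic type while what/which do not, the toy function below maps the easy wh-words directly and otherwise falls back to a crude "first content word after the wh-word" heuristic. The type labels and the heuristic are invented; the second example shows exactly why what-questions are hard.

```python
import re

DIRECT = {"who": "PERSON", "when": "DATE", "where": "LOCATION"}
SKIP = {"is", "was", "are", "were", "the", "a", "an"}

def expected_answer_type(question):
    """Guess the semantic type of the entity sought by the question."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    if not tokens:
        return "UNKNOWN"
    if tokens[0] in DIRECT:
        return DIRECT[tokens[0]]                # who/when/where: easy
    if tokens[0] in ("what", "which"):
        for tok in tokens[1:]:                  # crude fallback: first content word
            if tok not in SKIP:
                return f"INSTANCE-OF({tok})"
    return "UNKNOWN"

print(expected_answer_type("Who assassinated Indira Gandhi?"))          # PERSON
print(expected_answer_type("What was the Beatles first hit single?"))
# INSTANCE-OF(beatles): the heuristic picks the wrong head noun (it should
# be "single"), illustrating the ambiguity of what/which questions
```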


Page 13: Intelligent natural language system

Document Processing

Preprocessing: detailed analysis of all texts in the corpus may be done a priori

one group annotates terms with one of 50 semantic tags, which are indexed along with the terms

Retrieval: an initial set of candidate answer-bearing documents is selected from a large collection

Boolean retrieval methods may be used profitably

Passage retrieval may be more appropriate
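A small sketch, under simplifying assumptions, of the passage-retrieval idea above: the document is split into overlapping sentence windows and each window is scored by how many distinct question keywords it contains (a Boolean-style criterion). The document and keywords are invented.

```python
import re

def sentences(doc):
    return [s.strip() for s in re.split(r"[.!?]", doc) if s.strip()]

def passages(doc, size=2):
    """Overlapping windows of `size` consecutive sentences."""
    sents = sentences(doc)
    return [" ".join(sents[i:i + size]) for i in range(max(1, len(sents) - size + 1))]

def score(passage, keywords):
    """Number of distinct question keywords present in the passage."""
    words = set(re.findall(r"[a-z0-9]+", passage.lower()))
    return len(words & keywords)

doc = ("Wolfgang Amadeus Mozart was born in 1756. He grew up in Salzburg. "
       "Beethoven admired his work.")
keywords = {"mozart", "born", "year"}
best = max(passages(doc), key=lambda p: score(p, keywords))
print(best)  # the passage most likely to contain the answer
```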


Page 14: Intelligent natural language system

Document Processing

Analysis:

Part-of-speech tagging

Named entity identification: recognizes multi-word strings as names of companies/persons, locations/addresses, quantities, etc.

Shallow/deep syntactic analysis: obtains information about syntactic relations and semantic roles
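For illustration only, the three analysis steps above can be run with the off-the-shelf spaCy library; this is not the toolchain ENLIGHT itself uses (it relies on QTAG and shallow parsing), and the sketch assumes the en_core_web_sm model has been installed.

```python
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Mozart was born in Salzburg in 1756.")

# Part-of-speech tagging
print([(tok.text, tok.pos_) for tok in doc])

# Named entity identification (persons, locations, dates, quantities, ...)
print([(ent.text, ent.label_) for ent in doc.ents])

# Syntactic analysis: dependency relations approximate subject/verb/object roles
print([(tok.text, tok.dep_, tok.head.text) for tok in doc])
```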


Page 15: Intelligent natural language system

History

MURAX (Kupiec, 1993) was designed to answer questions from the Trivial Pursuit general-knowledge board game, drawing answers from Grolier's on-line encyclopaedia (1990).

Text Retrieval Conference (TREC): TREC was started in 1992 with the aim of supporting information retrieval research by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies.

The QA track was first included as part of TREC in 1999, with seventeen research groups entering one or more systems.


Page 16: Intelligent natural language system

Techniques for performing open-domain question answering

Manual and automatically constructed question analysers,

Document retrieval specifically for question answering,

Semantic type answer extraction,

Answer extraction via automatically acquired surface matching text patterns,

principled target processing combined with document retrieval for definition questions,

and various approaches to sentence simplification which aid in the generation of concise definitions.


Page 17: Intelligent natural language system

Answer Extraction

Look for strings whose semantic type matches that of the expected answer; matching may include subsumption (incorporating something under a more general category).

Check additional constraints:
Select a window around the matching candidate and calculate the word overlap between the window and the query (see the sketch below); OR

Check how many distinct question keywords are found in a matching sentence, their order of occurrence, etc.

Check the syntactic/semantic role of the matching candidate

Semantic Symmetry

Ambiguous Modification
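The window-overlap check referenced in the list above can be sketched as follows; the window width and the example sentence are arbitrary choices for illustration, while the semantic-symmetry and ambiguous-modification cases are handled by the separate algorithms presented later.

```python
import re

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def window_overlap(sentence, candidate, question_keywords, width=6):
    """Count distinct question keywords within `width` tokens of the candidate."""
    toks = tokens(sentence)
    cand = tokens(candidate)
    try:
        pos = toks.index(cand[0])        # locate the candidate's first token
    except ValueError:
        return 0
    window = set(toks[max(0, pos - width): pos + len(cand) + width])
    return len(window & set(question_keywords))

sentence = "Wolfgang Amadeus Mozart was born in Salzburg in 1756."
print(window_overlap(sentence, "1756", {"year", "mozart", "born"}))
# 2: "mozart" and "born" occur near the candidate "1756"
```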


Page 18: Intelligent natural language system

Semantic Symmetry

Question: Who killed militants?

Militants killed five innocents in Doda District.

After a 6-hour-long encounter, army soldiers killed 3 militants.

Both sentences match at the keyword level, but the word 'militants' plays different grammatical roles in them: it is the subject of 'killed' in the first sentence and the object in the second, and only one of these roles matches the role 'militants' plays in the question.

Semantic symmetry is a linguistic phenomenon that occurs when an entity acts as the subject in some sentences and as the object in others.


Page 19: Intelligent natural language system

Example

The following example illustrates the phenomenon of semantic symmetry and the problems it causes.

Question: Who visited the President of India?

Candidate Answer 1: George Bush visited the President of India.

Candidate Answer 2: The President of India visited a flood-affected area of Mumbai.

The two sentences are similar at the word level, but they have very different meanings.


Page 20: Intelligent natural language system

Some more examples showing semantic symmetry

(1) The birds ate the snake. / The snake ate the bird. (What does the snake eat?)

(2) Communists in India are supporting the UPA government. / Small parties are supporting Communists in Kerala. (Whom are the Communists supporting?)


Page 21: Intelligent natural language system

Ambiguous Modification

Ambiguous modification is a linguistic phenomenon that occurs when an adjective in a sentence may modify more than one noun.

Question: What is the largest volcano in the Solar System?

Candidate Answer 1: In the Solar System, the largest planet Jupiter has several volcanoes. (Wrong)

Candidate Answer 2: Olympus Mons, the largest volcano in the Solar System. (Correct)

In the first sentence 'largest' modifies the word 'planet', whereas in the second sentence 'largest' modifies the word 'volcano'.


Page 22: Intelligent natural language system

Approaches to tackle the problem

Boris Katz and James Lin of MIT developed a system, Sapere, that handles problems caused by semantic symmetry and ambiguous modification.

These problems occur at the semantic level.

To deal with problems occurring at the semantic level, all of these approaches gather detailed information at the syntactic level.

The system developed by Katz and Lin produces results by utilizing syntactic relations. These S-V-O ternary relations are obtained by processing the information gathered by the Minipar functional dependency parser.


Page 23: Intelligent natural language system

Our Approach

To deal with problems at the semantic level, most of the available approaches need to obtain and work on information gathered at the syntactic level.

We have proposed a new approach to deal with the problems caused by the linguistic phenomena of semantic symmetry and ambiguous modification.

The algorithms based on our approach remove wrong sentences from the answer with the help of information obtained at the lexical level (lexical analysis).


Page 24: Intelligent natural language system

Algorithm for Handling Semantic Symmetry

Rule 1: If (the sequence of keywords in the question and the candidate answer matches) then
    If (the POS of the verb keyword is the same) then the candidate answer is correct

Rule 2: If (the sequence of keywords in the question and the candidate answer does not match) then
    If (the POS of the verb keyword is not the same) then the candidate answer is correct

Otherwise, the candidate answer is wrong.
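The two rules above translate almost directly into code. In this sketch the keyword sequences and the POS tag of the verb keyword are assumed to have been produced earlier (ENLIGHT obtains them with the QTAG tagger); here they are simply passed in, and the example values are invented.

```python
def semantic_symmetry_check(question_keywords, answer_keywords,
                            question_verb_pos, answer_verb_pos):
    """Return True if the candidate answer is kept, False if it is discarded."""
    # Compare the order of the shared keywords in the question and the candidate.
    shared = [k for k in question_keywords if k in answer_keywords]
    answer_order = [k for k in answer_keywords if k in shared]
    sequence_matches = shared == answer_order
    same_verb_pos = question_verb_pos == answer_verb_pos

    if sequence_matches and same_verb_pos:          # Rule 1
        return True
    if not sequence_matches and not same_verb_pos:  # Rule 2
        return True
    return False                                    # otherwise: wrong answer

# Question: "Who killed militants?"  keywords: killed, militants
print(semantic_symmetry_check(["killed", "militants"],
                              ["militants", "killed", "innocents"], "VBD", "VBD"))
# False: keyword order differs while the verb POS is the same, so it is discarded
print(semantic_symmetry_check(["killed", "militants"],
                              ["soldiers", "killed", "militants"], "VBD", "VBD"))
# True: keyword order and verb POS both match, so the candidate is kept
```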


Page 25: Intelligent natural language system

Algorithm for Handling Ambiguous Modification

We identify the adjective as Adj, the scope-defining noun as SN, and the identifier noun as IN.

Rules: If the sentence contains keywords in the following order:

    Adj α SN    (where α indicates a string of zero or more keywords)

then:
Rule 1-a: If α is IN, the answer is correct; or
Rule 1-b: If α is blank, the answer is correct;
Rule 2: otherwise, the answer is wrong.


Page 26: Intelligent natural language system

Algorithm for Handling Ambiguous Modification (Cont.)

If the sentence contains keywords in the following order:

    SN α Adj β IN    (where α and β indicate strings of zero or more keywords)

then:
Rule 3: If β is blank, the answer is correct (the value of α does not matter);
Rule 4: otherwise, the answer is wrong.
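Rules 1 to 4 can be sketched as one function over a candidate sentence's keyword list. Adj, SN and IN are assumed to have been identified during question analysis (for the volcano question, for instance, Adj = "largest", SN = "solar", IN = "volcano"); the keyword lists below are invented and shown before any stemming.

```python
def ambiguous_modification_check(keywords, adj, sn, inoun):
    """Return True (correct answer) or False (wrong answer) for one candidate."""
    if adj in keywords and sn in keywords:
        a, s = keywords.index(adj), keywords.index(sn)
        if a < s:                                   # pattern: Adj α SN
            alpha = keywords[a + 1:s]
            # Rule 1-a (α is exactly IN) or Rule 1-b (α is blank): correct;
            # Rule 2: anything else is wrong
            return alpha == [inoun] or alpha == []
        if s < a and inoun in keywords[a + 1:]:     # pattern: SN α Adj β IN
            beta = keywords[a + 1:keywords.index(inoun, a + 1)]
            return beta == []                       # Rule 3 correct / Rule 4 wrong
    return False  # no recognizable pattern: treat as wrong (assumption beyond the slide)

wrong = ["solar", "system", "largest", "planet", "jupiter", "volcanoes"]
right = ["olympus", "mons", "largest", "volcano", "solar", "system"]
print(ambiguous_modification_check(wrong, "largest", "solar", "volcanoes"))  # False (Rule 4)
print(ambiguous_modification_check(right, "largest", "solar", "volcano"))    # True (Rule 1-a)
```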


Page 27: Intelligent natural language system

Working System - ENLIGHT

We have developed a system that answers questions using a 'keyword-based matching paradigm'.

We have incorporated the newly formulated algorithms in the system and obtained good results.


Page 28: Intelligent natural language system

ENLIGHT System Architecture


Page 29: Intelligent natural language system

Preprocessing

This module prepares the platform for the intelligent and effective interface.

It transforms raw data into a well-organized corpus with the help of the following activities (a small sketch follows this list):

Keyword Extraction
Sentence Segmentation
Handling of Abbreviations and Punctuation Marks
Tokenization
Stemming
Identifying Groups of Words with Specific Meaning
Shallow Parsing
Reference Resolution
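A toy illustration of three of the activities listed above (tokenization, stop-word removal and stemming), using NLTK's Porter stemmer; the stop-word list is a made-up sample, and the remaining activities (abbreviation handling, shallow parsing, reference resolution, etc.) are omitted.

```python
import re
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "a", "an", "in", "of", "was", "is", "and"}  # tiny sample list
stemmer = PorterStemmer()

def preprocess_sentence(sentence):
    """Tokenize, drop stop words, and stem the remaining keywords."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())       # tokenization
    keywords = [t for t in tokens if t not in STOP_WORDS]      # stop-word removal
    return [stemmer.stem(t) for t in keywords]                 # stemming

print(preprocess_sentence("Militants killed five innocents in Doda District."))
# e.g. ['milit', 'kill', 'five', 'innoc', 'doda', 'district'] (exact stems depend on the stemmer)
```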


Page 30: Intelligent natural language system

Question Analysis

Question Tokenization
Question Classification

Corpus Management
Various database tables are created to manage the vast data (a schema sketch follows below):
InfoKeywords, QuestionKeyword, QuestionAnswer, CorpusSentences, Abbreviations, Apostrophes, StopWords
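A purely hypothetical sketch of how the tables named above could be laid out with the standard-library sqlite3 module; the slide gives only the table names, so every column here is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CorpusSentences (sentence_id INTEGER PRIMARY KEY, sentence TEXT);
CREATE TABLE InfoKeywords    (keyword TEXT, sentence_id INTEGER REFERENCES CorpusSentences);
CREATE TABLE QuestionKeyword (question_id INTEGER, keyword TEXT);
CREATE TABLE QuestionAnswer  (question_id INTEGER, question TEXT, answer TEXT, feedback TEXT);
CREATE TABLE Abbreviations   (short_form TEXT, long_form TEXT);
CREATE TABLE Apostrophes     (contracted TEXT, expanded TEXT);
CREATE TABLE StopWords       (word TEXT);
""")
conn.execute("INSERT INTO StopWords VALUES (?)", ("the",))
print(conn.execute("SELECT count(*) FROM StopWords").fetchone())  # (1,)
```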

Answer Retrieval
Answer Searching


Answer Generation

Page 31: Intelligent natural language system

Intelligence Incorporation

Answer Rescoring
Handling problems caused by linguistic phenomena using shallow-parsing-based algorithms
Semantic Symmetry
Ambiguous Modification

Learning
Rote Learning
Feedback: Can Improve, Satisfactory, Wrong Answer, Loose Criterion (a sketch of rote learning with feedback follows below)
Automated Classification
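A minimal, speculative sketch of rote learning driven by the feedback labels above: answers rated "Satisfactory" are cached and replayed when the same question recurs, while rejected answers are forgotten. Apart from the label names, everything here is invented.

```python
GOOD_FEEDBACK = {"Satisfactory"}
RETRY_FEEDBACK = {"Can Improve", "Wrong Answer", "Loose Criterion"}

class RoteLearner:
    def __init__(self):
        self.memory = {}          # normalized question -> remembered answer

    def lookup(self, question):
        return self.memory.get(question.strip().lower())

    def record(self, question, answer, feedback):
        key = question.strip().lower()
        if feedback in GOOD_FEEDBACK:
            self.memory[key] = answer        # remember answers the user liked
        elif feedback in RETRY_FEEDBACK:
            self.memory.pop(key, None)       # forget answers the user rejected

learner = RoteLearner()
learner.record("Who killed militants?", "Army soldiers killed 3 militants.", "Satisfactory")
print(learner.lookup("who killed militants?"))  # the cached answer is reused
```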


Page 32: Intelligent natural language system

Results

Preciseness

Response Time

Adaptability


Page 33: Intelligent natural language system

Preciseness

                                                  ENLIGHT    Basic Keyword Matching
Average number of sentences returned as answer      3              34.6
Average number of correct sentences                  2.63            6
Average precision                                    84 %           32 %


Page 34: Intelligent natural language system

Response Time (ENLIGHT vs. Sapere)

Type of data and no. of words               Time required by QTAG     Time required by Minipar
                                            (used in ENLIGHT)         (used in Sapere)
News extract, Times of India, 202 words     1.71 s                    2.88 s
Reply, START QA System, 251 words           1.89 s                    3.11 s
Google search engine result                 1.55 s                    2.86 s
Yahoo search engine results                 1.67 s                    3.13 s
AVERAGE                                     1.705 s                   2.995 s


Page 35: Intelligent natural language system

Adaptability

Handling Additional Keywords

Questions like 'Who killed the Prime Minister?' can also be handled by the ENLIGHT system.

Use of synonyms

If the question and the answer contain synonyms, the ENLIGHT system can associate the two words using the learning phase.


Page 36: Intelligent natural language system

References

L. Hirschman and R. Gaizauskas, "Natural Language Question Answering: The View from Here," Natural Language Engineering, 7(4), December 2001.

Manish Joshi and Rajendra Akerkar, "The ENLIGHT System: Intelligent Natural Language System," Journal of Digital Information Management, June 2007.
