
Page 1: Factoid based natural language question generation system

A ROBUST FACTOID BASED QUESTION GENERATION

SYSTEM

PRESENTED BY

ANIMESH SHAW
ARITRA DAS

SHREEPARNA SARKAR

Page 2: Factoid based natural language question generation system

CONTENTS
• Motivation
• Our Objective
• About Factoid Questions
• Basic Terminology
• Working Procedure
• Rule Base Generation
• Question Generation
• Evaluation
• Future Scope

Page 3: Factoid based natural language question generation system

MOTIVATION

Google Speech Recognition

Chatbots talking to each other, taken from the Cornell Creative Machines Lab

Page 4: Factoid based natural language question generation system

Google Translate, here translating English to Bengali.

Cleverbot, a chatbot with a good sense of humor. Taken from http://www.cleverbot.com/

CONTD.

Page 5: Factoid based natural language question generation system

OUR OBJECTIVE
• Build an efficient question generation system.
• Generate factoid questions from a text document or corpus.
• Generate questions from every sentence that carries some information; sentences that carry none are discarded.
• For some sentences more than one type of factoid question is possible, so attempt to generate all such possible types.
• Take the user's opinion or feedback and improve the results for further use.

Page 6: Factoid based natural language question generation system

FACTOID QUESTIONS?

Factoid questions: questions that demand a precise piece of information about an entity or an event, such as person names, locations or organizations, as opposed to definition questions, opinion questions, or complex questions such as why or how questions.

Page 7: Factoid based natural language question generation system

BASIC TERMINOLOGY
1. TOKENIZING: Breaking the string into words and punctuation marks.

e.g. - I went home last night. → [‘I’, ‘went’, ‘home’, ‘last’, ‘night’, ‘.’ ]

2. TAGGING: Assigning parts-of-speech tags to words. e.g. cat → noun → NN, eat → verb → VB

3. LEMMATIZING: Finding word lemmata (e.g. - was → be).

4. CHUNKING: Grouping words that convey a single unit of meaning and tagging those groups. The tags can be, for example, Verb Phrase, Prepositional Phrase or Noun Phrase.

e.g. → Bangladesh defeated India in 2007 World Cup

Page 8: Factoid based natural language question generation system

CONTD.

5. CHUNKS: 'Bangladesh', 'defeated', 'India', 'in', '2007 World Cup'

6. RELATION FINDING: Finding the relations between the chunks (sentence subject, object and predicate), as shown below. A short NLTK sketch of steps 1-5 follows the relation example. RELATIONS:

Bangladesh → NP-SBJ-1
defeated → VP-1
India → NP-OBJ-1
in → PP-TMP-1
2007 World Cup → NP-TMP-1
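A minimal sketch of terminology steps 1-5 in Python with NLTK (the toolkit cited in reference [12]); it assumes the punkt, averaged_perceptron_tagger and wordnet data packages are installed, and the regular-expression chunk grammar is purely illustrative, not the grammar the system actually uses. Relation finding (step 6) is not part of stock NLTK and is not shown.

    import nltk
    from nltk.stem import WordNetLemmatizer

    sentence = "Bangladesh defeated India in 2007 World Cup"

    # 1. TOKENIZING: break the string into words and punctuation marks.
    tokens = nltk.word_tokenize(sentence)

    # 2. TAGGING: assign a part-of-speech tag to every token, e.g. ('defeated', 'VBD').
    tagged = nltk.pos_tag(tokens)

    # 3. LEMMATIZING: reduce inflected forms to their lemma, e.g. defeated -> defeat.
    lemmatizer = WordNetLemmatizer()
    lemmas = [lemmatizer.lemmatize(w, pos='v') if t.startswith('VB') else lemmatizer.lemmatize(w)
              for w, t in tagged]

    # 4./5. CHUNKING: group tagged words into phrases with a toy grammar
    # (noun and prepositional chunks only, for illustration).
    grammar = r"""
      NP: {<DT|CD>?<JJ>*<NNP|NN|NNS>+}
      PP: {<IN>}
    """
    chunker = nltk.RegexpParser(grammar)
    print(chunker.parse(tagged))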

Page 9: Factoid based natural language question generation system

RELATION EXAMPLE

CONTD.

Page 10: Factoid based natural language question generation system

WORKING PROCEDURE
1. Took a large training set of wh-questions.
2. Broke each sentence into chunks and parsed it.
3. Found the relations.

The Sentence: “Who became the 16th president of the United States of America in 1861”

CHUNKING:
['Who', 'NP-SBJ-1']
['became', 'VP-1']
['the 16th president', 'NP-PRD-1']
['of', 'PP']
['United States', 'NP']
['of', 'PP']
['America', 'NP']
['in', 'PP']
['1861', 'NP']

Page 11: Factoid based natural language question generation system

Storing the tags in a List

['NP-SBJ-1', 'VP-1', 'NP-PRD-1', 'PP', 'NP', 'PP-1', 'NP-1']

Wh-type (head word): “who”

Storing the tags with the corresponding Wh-Type in a list

['Who', ['VP-1', 'NP-PRD-1', 'PP', 'NP', 'PP-1', 'NP-1']]

4. Determined the wh-type by observing the head word of the question.
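A minimal sketch of steps 1-4 of the working procedure, assuming the chunker output is already available as (chunk, relation) pairs; the names below are illustrative, not the system's actual code.

    # Illustrative chunker output for the training question above.
    chunked = [
        ['Who', 'NP-SBJ-1'], ['became', 'VP-1'], ['the 16th president', 'NP-PRD-1'],
        ['of', 'PP'], ['United States', 'NP'], ['of', 'PP'], ['America', 'NP'],
        ['in', 'PP'], ['1861', 'NP'],
    ]

    WH_WORDS = {'who', 'what', 'when', 'where', 'which', 'whom', 'whose'}

    def to_rule(chunks):
        """Map a chunked question to a (wh-type, relation sequence) pair."""
        head = chunks[0][0].lower()          # the wh-type is the head word of the question
        if head not in WH_WORDS:
            return None                      # not a wh-question: discard it
        return head.capitalize(), [rel for _, rel in chunks[1:]]

    print(to_rule(chunked))                  # e.g. ('Who', ['VP-1', 'NP-PRD-1', 'PP', ...])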

CONTD.

Page 12: Factoid based natural language question generation system

RULE BASE GENERATION
The Parent Tree:

This tree is fed to the system before the training is done. When the system reads a question it determines the Wh-type and traverses to that specific node and starts populating the tree.

Page 13: Factoid based natural language question generation system

POPULATING THE RULE-TREE
Travelled to the specific wh-node and stored these relations by populating the subsequent nodes of the tree with these chunk relations.
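A minimal sketch of one way the rule tree could be populated, assuming every training question has already been reduced to a (wh-type, relation sequence) pair as above; the nested-dictionary trie and the '#count' key are illustrative choices, not the system's actual data structure.

    from collections import defaultdict

    def make_node():
        # Every node maps a chunk relation to a child node; tail nodes also hold a count.
        return defaultdict(make_node)

    rule_tree = make_node()                  # the parent tree: one subtree per wh-type

    def add_question(wh_type, relations):
        """Traverse to the wh-node and populate the path of chunk relations."""
        node = rule_tree[wh_type]
        for rel in relations:
            node = node[rel]
        node['#count'] = node.get('#count', 0) + 1   # occurrence count kept on the tail node

    add_question('Who', ['VP-1', 'NP-PRD-1', 'PP', 'NP'])
    add_question('Who', ['VP-1', 'NP-OBJ-1'])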

Page 14: Factoid based natural language question generation system

NORMALIZED COUNT
This is used to let the parser know whether or not to print a question while backtracking to other child nodes. It is defined as:

Normalized count = (occurrences of the tail node of a question in the training set) / (total number of questions with that particular wh-tag)

The Count is attached to the tail node only.

Example: ‘Who doesn’t want to rule the world?’
Nodes: NP-SBJ-1 → VP-1 (VBZ-VB-TO-VB) → NP-OBJ-1, count = 14

Here, this question structure appears 14 times in the training set. The tail holds the count as an integer, but when the recursive descent parser parses the question base it normalizes the value. This also lets the system offer the user the more probable question among several candidates.
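A small sketch of the normalization step under the same illustrative trie layout as above: every raw tail count is divided by the total number of training questions seen for that wh-tag.

    def collect_tails(node):
        """Return every node in the subtree that carries a '#count' value."""
        tails = [node] if '#count' in node else []
        for key, child in node.items():
            if key != '#count':
                tails.extend(collect_tails(child))
        return tails

    def normalize(tree):
        """Divide every tail count by the number of training questions of that wh-tag."""
        for wh_type, subtree in tree.items():
            tails = collect_tails(subtree)
            total = sum(t['#count'] for t in tails)
            for t in tails:
                t['#count'] = t['#count'] / total

    normalize(rule_tree)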

Page 15: Factoid based natural language question generation system

RULE-TREE WITH NORMALIZED COUNT

While populating, the count of visiting each tail node (the node that holds the last chunk relation) is saved in the corresponding node.

A snapshot of the rule base with count value:

Page 16: Factoid based natural language question generation system

ANSWER PREPROCESSING AND QUESTION TYPE DECISION SYSTEM

• While populating the tree with manually generated questions, the NER tag of the answer for a given question is stored with the corresponding wh-tag.
• Only some word(s) are stored.

Example :

“Who is the Father of the Nation?” Ans: Mahatma Gandhi.

‘Mahatma Gandhi’ on NER tagging:

Mahatma [PERSON]

Gandhi [PERSON]
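A minimal sketch of the answer-side NER step using NLTK's built-in named-entity chunker; it assumes the maxent_ne_chunker and words data packages are installed, and the system's actual NER tagger may differ.

    import nltk

    answer = "Mahatma Gandhi"
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(answer)))

    # Collect one NER label per answer word, e.g. Mahatma -> PERSON, Gandhi -> PERSON.
    labels = []
    for node in tree:
        if isinstance(node, nltk.Tree):          # a recognised named-entity subtree
            labels.extend((word, node.label()) for word, _ in node.leaves())
        else:                                    # a plain (word, pos) pair, no entity found
            labels.append((node[0], 'O'))
    print(labels)                                # e.g. [('Mahatma', 'PERSON'), ('Gandhi', 'PERSON')]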

Page 17: Factoid based natural language question generation system

ANSWER BASE
When the same tag is found in the answers again and again, the count value is increased accordingly.

The Answer Base

Vocabulary = 4

Page 18: Factoid based natural language question generation system

Generate Questions from Sentences

Page 19: Factoid based natural language question generation system

PRIORITIZING THE QUESTIONS
It is possible that there is more than one question on a single path from the root to a leaf. The system prioritizes the questions by their count-depth product:

Priority = Normalized count * depth of tail node

Example:

Question: Who is Mahatma Gandhi?          Priority: (14/747) * 3 = 0.056
Question: Who is the father of nation?    Priority: (21/747) * 5 = 0.14

So the second question is the one more likely to be asked for the given sentence, although this depends on the questions in the training set.
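A tiny sketch of the count-depth product, with the numbers taken straight from the example above; the function name is illustrative.

    def priority(tail_count, questions_for_wh, depth):
        """Priority = normalized count * depth of the tail node."""
        return (tail_count / questions_for_wh) * depth

    print(priority(14, 747, 3))   # Who is Mahatma Gandhi?       -> about 0.056
    print(priority(21, 747, 5))   # Who is the father of nation? -> about 0.14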

Page 20: Factoid based natural language question generation system

SELECTION OF QUESTION

The probability of each probable question type is calculated using the following function:

F(sentence) = Max(Probability(Words/Wh-tag))

The wh-tag with the maximum probability is taken into consideration and that type of question is generated.

Page 21: Factoid based natural language question generation system

EXAMPLE: TRIGGERING WH-TYPE QUESTIONS

'When' examples (phrase → tag): 2011 → DATE, 9:30 PM → TIME, In 2012 → IN, 10th Oct → DATE, In summer → IN
Tag counts for 'when': DATE = 1, TIME = 1, IN = 2

'Who' examples (phrase → tag): Grace Badell → PERSON, John Whiks → PERSON, General Mccllen → PERSON
Tag counts for 'who': PERSON = 3

Sentence: “Sourav was captain of India in the 2003 world cup.”

Chunks: ‘Sourav’, ‘India’, ‘in the 2003’
Tags: ‘PERSON’, ‘LOCATION’, ‘IN’

'Where' examples (phrase → tag): Asia → LOCATION, Plymouth → LOCATION, In the sea → IN, Pacific Ocean → LOCATION, In her eyes → IN
Tag counts for 'where': LOCATION = 3, IN = 2

Page 22: Factoid based natural language question generation system

Probability(Sourav, India, in the 2003/when)
= Prob(when) * Prob(PERSON/when) * Prob(LOCATION/when) * Prob(IN/when)
= (4/13) * (1/(4+3)) * (1/(4+3)) * (2/(4+3))
= 0.30 * 0.14 * 0.14 * 0.28 = 0.0016

Probability(Sourav, India, in the 2003/where)
= Prob(where) * Prob(PERSON/where) * Prob(LOCATION/where) * Prob(IN/where)
= (6/13) * (1/(5+2)) * (3/(5+2)) * (2/(5+2))
= 0.46 * 0.1 * 0.3 * 0.2 = 0.0027

Probability(Sourav, India, in the 2003/who)
= Prob(who) * Prob(PERSON/who) * Prob(LOCATION/who) * Prob(IN/who)
= (3/13) * (3/(3+3)) * (1/(3+3)) * (1/(3+3))
= 0.23 * 0.5 * 0.16 * 0.16 = 0.0029

The 'who' probability is the highest, so the system will generate the ‘Who’ type question.
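A minimal sketch of the selection step, with the tag counts taken from the tables above. The handling of unseen tags (a floor of one in the numerator and the table's vocabulary added to the denominator) is an assumption chosen to reproduce the slide's when and where fractions; the slide's own denominator for the who case differs slightly, but the same wh-type still wins.

    # Tag counts gathered during training, taken from the tables above.
    wh_question_counts = {'when': 4, 'where': 6, 'who': 3}      # training questions per wh-tag
    tag_counts = {
        'when':  {'DATE': 1, 'TIME': 1, 'IN': 2},
        'where': {'LOCATION': 3, 'IN': 2},
        'who':   {'PERSON': 3},
    }

    def score(sentence_tags, wh):
        """P(wh) times the product of P(tag | wh) over the sentence's chunk tags."""
        counts = tag_counts[wh]
        denom = sum(counts.values()) + len(counts)              # total count + vocabulary
        p = wh_question_counts[wh] / sum(wh_question_counts.values())
        for tag in sentence_tags:
            p *= max(counts.get(tag, 0), 1) / denom             # unseen tags get a floor of 1
        return p

    tags = ['PERSON', 'LOCATION', 'IN']                         # tags of the Sourav sentence
    best = max(tag_counts, key=lambda wh: score(tags, wh))
    print(best)                                                 # 'who' wins for this example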

CONTD.

Page 23: Factoid based natural language question generation system

After the training is done, the system generates questions from sentences by traversing the question base with the values of the nodes.

Example: “Mahatma Gandhi is the Father of Nation.”

Suppose it tries to generate ‘Who’ question from this, then the steps would be :

Sentence parsing:

Mahatma Gandhi is the Father of Nation.
Chunks: NP-SBJ-1  VP-1  NP-PRD-1  PP  NP
Tags: NNP-NNP  VBZ  DT-NN  IN  NN

The chunks and their corresponding relations are put into a table where the keys are the relations and the values are the chunk phrases.

CONTD.

Page 24: Factoid based natural language question generation system

Question Generation:

These relation and tag pairs are searched for in the question base by a recursive descent parser. If a path is found with these nodes, the corresponding chunks are appended one after another and the question is generated.

“Who is the father of nation?”

The Chunk Table:
NP-SBJ-1 → 'Mahatma Gandhi'
VP-1 → 'is'
NP-PRD-1 → 'the Father'
PP → 'of'
NP → 'Nation'
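A minimal sketch of the generation step, reusing the illustrative rule tree built earlier and the chunk table above; the depth-first walk below stands in for the recursive descent parser. A real sentence can contain several chunks with the same relation, which this simplified lookup ignores.

    def generate(node, chunk_table, wh_word, prefix=()):
        """Walk a wh-subtree depth-first; whenever a tail node is reached and every
        relation on the path has a chunk in the sentence, emit a question."""
        questions = []
        if '#count' in node:
            questions.append((wh_word + ' ' + ' '.join(prefix) + '?', node['#count']))
        for rel, child in node.items():
            if rel != '#count' and rel in chunk_table:
                questions += generate(child, chunk_table, wh_word, prefix + (chunk_table[rel],))
        return questions

    # Chunk table for "Mahatma Gandhi is the Father of Nation."
    chunk_table = {'NP-SBJ-1': 'Mahatma Gandhi', 'VP-1': 'is',
                   'NP-PRD-1': 'the father', 'PP': 'of', 'NP': 'nation'}

    for question, count in generate(rule_tree['Who'], chunk_table, 'Who'):
        print(question, count)          # e.g. "Who is the father of nation?" with its count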

CONTD.

Page 25: Factoid based natural language question generation system

THE FEEDBACK SYSTEM
• Takes the user feedback on the generated questions.
• Updates the count values.
• Updates the question base accordingly.
• Reduces the generation of false positives.
• Enhances the probability of generating quality questions.
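A small sketch of how the feedback could adjust the rule base, assuming each generated question remembers the tail node that produced it and that feedback is applied to the raw counts before re-normalization; the slide only states that count values are updated, so the exact scheme below is an assumption.

    def apply_feedback(tail_node, accepted, step=1):
        """Raise the tail count for questions the user accepts and lower it for
        rejected ones, so false positives gradually fade out of the question base."""
        if accepted:
            tail_node['#count'] += step
        else:
            tail_node['#count'] = max(tail_node['#count'] - step, 0)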


Page 26: Factoid based natural language question generation system

EVALUATION

We tested the system on a given test dataset and obtained the following results:

              Manual Generation (%)    System Generation (%)
Precision             100                     58.82
Recall                100                     91

Precision: % of selected items that are correct.
Recall: % of correct items that are selected.

Page 27: Factoid based natural language question generation system

SCOPE IN FUTURE
Question generation is an important function of advanced learning technologies and related research areas such as:

• Intelligent tutoring systems
• Inquiry-based environments
• Game-based learning environments
• Psycholinguistics
• Discourse and dialogue
• Natural language generation
• Natural language understanding
• Academic purposes: creating practice and assessment materials

Page 28: Factoid based natural language question generation system

REFERENCES
[1] Liu, Ming, Rafael A. Calvo, and Vasile Rus. "G-Asks: An intelligent automatic question generation system for academic writing support." Dialogue & Discourse 3.2 (2012): 101-124.

[2] Chen, Wei, and Jack Mostow. "Using Automatic Question Generation to Evaluate Questions Generated by Children." The 2011 AAAI Fall Symposium on Question Generation. 2011.

[3] Radev, Dragomir, et al. "Probabilistic question answering on the web." Journal of the American Society for Information Science and Technology 56.6 (2005): 571-583.

[4] Roussinov, Dmitri, and Jose Robles. "Web question answering through automatically learned patterns." Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries. ACM, 2004.

[5] Agarwal, Manish, and Prashanth Mannem. "Automatic gap-fill question generation from text books." Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications. Association for Computational Linguistics, 2011.

[6] Skalban, Yvonne, et al. "Automatic Question Generation in Multimedia-Based Learning." COLING (Posters). 2012.

[7] Becker, Lee, Rodney D. Nielsen, and W. Ward. "What a pilot study says about running a question generation challenge." Proceedings of the Second Workshop on Question Generation, Brighton, England, July. 2009.

Page 29: Factoid based natural language question generation system

[8] Xu, Yushi, Anna Goldie, and Stephanie Seneff. "Automatic question generation and answer judging: a q&a game for language learning." SLaTE. 2009.

[9] Rus, Vasile, and C. Graesser Arthur. "The question generation shared task and evaluation challenge." The University of Memphis. National Science Foundation. 2009.

[10] Lin, Chin-Yew. "Automatic question generation from queries." Workshop on the Question Generation Shared Task. 2008.

[11] Ali, Husam, Yllias Chali, and Sadid A. Hasan. "Automation of question generation from sentences." Proceedings of QG2010: The Third Workshop on Question Generation. 2010.

[12] Bird, Steven, Ewan Klein, and Edward Loper. Natural Language Processing with Python. O'Reilly Media, Inc., 2009.

CONTD.
