Samatha Gagan Sunil

OpenNLP demo


Description: This presentation was prepared on Ubuntu, so some formatting may be affected when it is opened on Windows.


Page 1: OpenNLP demo

Samatha

Gagan Sunil

Page 2: OpenNLP demo

What is NLP?

• NLP (Natural Language Processing) provides a means of analyzing text

• The goal of NLP is to make computers analyze and understand the languages that humans use naturally

• NLP deals with the interaction between computers and humans

Page 3: OpenNLP demo

Why Natural Language Processing?

• kJfmmfj mmmvvv nnnffn333
• Uj iheale eleee mnster vensi credur
• Baboi oi cestnitze

• Computers “see” text in English the same way you have just seen the text above!

• People have no trouble understanding language
• Computers have
  – no common-sense knowledge
  – no reasoning capacity

Page 4: OpenNLP demo

Natural Language Processing

raw (unstructured) text → part-of-speech tagging → named entity recognition → deep syntactic parsing → annotated (structured) text

Secretion of TNF was abolished by BHA in PMA-stimulated U937 cells .

NN IN NN VBD VBN IN NN IN JJ NN NNS .

[Figure: parse tree built over the tagged sentence, grouping the tokens into NP, PP and VP phrases under a single S node]

Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/DTCII.ppt

Page 5: OpenNLP demo

Uses of NLP

• Text-based applications

• Dialogue-based applications

• Information extraction: extract useful information, e.g. from résumés

• Automatic summarization: condense one book into one page

Page 6: OpenNLP demo

What is OpenNLP?

OpenNLP is an open-source, Java-based NLP toolkit that performs 1. sentence detection, 2. tokenization, 3. part-of-speech tagging, 4. parsing and 5. named-entity detection.¹

¹ http://opennlp.sourceforge.net/

Page 7: OpenNLP demo

Use of OpenNLP in our university project

• It can be used to “search” for names using named entity recognition.

Page 8: OpenNLP demo

OpenNLP is used for:

• Sentence splitting

• Tokenization

• Part-of-speech tagging

• Named entity recognition

• Chunking

• Treebank parsing

Page 9: OpenNLP demo

Sentence splitting

Rule: sentence boundary = period + space(s) + capital letter

Input:

Unusually, the gender of crocodiles is determined by temperature. If the eggs are incubated at over 33 °C, then the egg hatches into a male or 'bull' crocodile. At lower temperatures only female or 'cow' crocodiles develop.

Output:

Unusually, the gender of crocodiles is determined by temperature.

If the eggs are incubated at over 33 °C, then the egg hatches into a male or 'bull' crocodile.

At lower temperatures only female or 'cow' crocodiles develop.
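The splitting rule above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration of the slide's heuristic, not OpenNLP's sentence detector (which uses a trained maximum entropy model); the function name `split_sentences` is hypothetical.

```python
import re

# Heuristic from the slide: a sentence boundary is a period, followed by one
# or more whitespace characters, followed by a capital letter. The lookbehind
# and lookahead keep the period and the capital letter out of the match, so
# only the whitespace between the sentences is consumed by the split.
BOUNDARY = re.compile(r"(?<=\.)\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text at every 'period + space(s) + capital letter' boundary."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


text = ("Unusually, the gender of crocodiles is determined by temperature. "
        "If the eggs are incubated at over 33 °C, then the egg hatches into a "
        "male or 'bull' crocodile. At lower temperatures only female or 'cow' "
        "crocodiles develop.")

for sentence in split_sentences(text):
    print(sentence)
```

The rule is easy to fool: `split_sentences("Dr. Smith arrived. He sat.")` returns `['Dr.', 'Smith arrived.', 'He sat.']`, because the abbreviation "Dr." looks like a sentence end. Cases like this are why a statistical model is used instead of a fixed rule.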

Page 10: OpenNLP demo

sentDetect(s, language = "en", model = NULL)

s: A character vector with texts from which sentences should be detected.

language: A character string giving the language of s. This argument is only used if model is NULL, for selecting a default model.

model: A model. If model is NULL, then a default model for sentence detection is loaded from the corresponding openNLP models language package.

http://opennlp.sourceforge.net/

Page 11: OpenNLP demo

Tokenization

• Convert a sentence into a sequence of tokens

• Divide the text into the smallest units (usually words), splitting punctuation off into separate tokens

Rule:

• Use spaces as the boundaries
• Add spaces before and after special characters

tokenize(s, language = "en", model = NULL)

http://opennlp.sourceforge.net/

Page 12: OpenNLP demo

Tokenization

"A Saudi Arabian woman can get a divorce if her husband doesn't give her coffee."

" A Saudi Arabian woman can get a divorce if her husband does n't give her coffee . "
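The two tokenization rules, plus the split of the contraction "n't" seen in the example, can be sketched as follows. This is a hypothetical regex-based `tokenize` in Python that reproduces the output above; it is not the model-based OpenNLP tokenizer loaded through the `model` argument of the R `tokenize()` function, and the set of special characters is an assumption.

```python
import re


def tokenize(sentence: str) -> list[str]:
    """Toy rule-based tokenizer (hypothetical; not OpenNLP's model)."""
    # Split the negative contraction "n't" off its verb: "doesn't" -> "does n't".
    s = re.sub(r"(\w)n't\b", r"\1 n't", sentence)
    # Add spaces before and after special characters (quotes and punctuation).
    s = re.sub(r'([".,;:!?()])', r" \1 ", s)
    # Use runs of whitespace as the token boundaries.
    return s.split()


tokens = tokenize('"A Saudi Arabian woman can get a divorce '
                  'if her husband doesn\'t give her coffee."')
print(" ".join(tokens))
```

A real tokenizer has to handle many more cases; this sketch would, for instance, split the decimal number 1.8 into 1 . 8.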

Page 13: OpenNLP demo

Part-of-speech tagging

Assign a part-of-speech tag to each token in a sentence.

Most/JJS lipstick/NN is/VBZ partially/RB made/VBN of/IN fish/NN scales/NNS

Most lipstick is partially made of fish scales

tagPOS(sentence, language = "en", model = NULL, tagdict = NULL)

http://opennlp.sourceforge.net/
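Tagger output in the word/TAG format shown above is easy to turn back into (word, tag) pairs. A minimal Python sketch; `parse_tagged` is a hypothetical helper, and the tag descriptions are copied from the Penn Treebank tag list in this deck.

```python
# Descriptions for the tags used in the example (Penn Treebank tag set).
PENN_TAGS = {
    "IN": "Preposition or subordinating conjunction",
    "JJS": "Adjective, superlative",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "RB": "Adverb",
    "VBN": "Verb, past participle",
    "VBZ": "Verb, 3rd person singular present",
}


def parse_tagged(tagged: str) -> list[tuple[str, str]]:
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    pairs = []
    for item in tagged.split():
        # rpartition splits at the last '/', so a word containing '/' stays intact.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "Most/JJS lipstick/NN is/VBZ partially/RB made/VBN of/IN fish/NN scales/NNS"
for word, tag in parse_tagged(tagged):
    print(f"{word:<10} {tag:<4} {PENN_TAGS.get(tag, 'unknown tag')}")
```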

Page 14: OpenNLP demo

Part-of-speech tags¹

CC - Coordinating conjunction
CD - Cardinal number
DT - Determiner
EX - Existential there
FW - Foreign word
IN - Preposition or subordinating conjunction
JJ - Adjective
JJR - Adjective, comparative
JJS - Adjective, superlative
NN - Noun, singular or mass
NNS - Noun, plural
NNP - Proper noun, singular
NNPS - Proper noun, plural
PDT - Predeterminer
PRP - Personal pronoun
RB - Adverb
RBR - Adverb, comparative
RBS - Adverb, superlative
RP - Particle
SYM - Symbol
TO - to
UH - Interjection
VB - Verb, base form
VBD - Verb, past tense
VBG - Verb, gerund or present participle
VBN - Verb, past participle
VBP - Verb, non-3rd person singular present
VBZ - Verb, 3rd person singular present
WDT - Wh-determiner
WP - Wh-pronoun
WRB - Wh-adverb

Phrase tags:
NP - Noun phrase
PP - Prepositional phrase
VP - Verb phrase

¹ http://bulba.sdsu.edu/jeanette/thesis/PennTags.html

Page 15: OpenNLP demo

Named-Entity Recognition

• Named entity recognition classifies tokens in text into predefined categories such as date, location, person and time.

• The name finder can find up to seven different types of entities: date, location, money, organization, percentage, person, and time.


Page 16: OpenNLP demo

Named-Entity Recognition

Diana Hayden was in Philadelphia city on 3rd October

<namefind/person>Diana Hayden</namefind/person> was in <namefind/location>Philadelphia</namefind/location> city on <namefind/date>3rd October</namefind/date>
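If the name finder's result is available as inline markup like the example above, the entities can be pulled out with a regular expression. A minimal Python sketch, assuming exactly this `<namefind/TYPE>...</namefind/TYPE>` format; `extract_entities` and `plain_text` are hypothetical helpers.

```python
import re

# Matches <namefind/TYPE>text</namefind/TYPE>. The backreference \1 requires
# the closing tag to name the same entity type as the opening tag.
ENTITY = re.compile(r"<namefind/(\w+)>(.*?)</namefind/\1>")


def extract_entities(marked: str) -> list[tuple[str, str]]:
    """Return (entity type, entity text) pairs in order of appearance."""
    return [(m.group(1), m.group(2).strip()) for m in ENTITY.finditer(marked)]


def plain_text(marked: str) -> str:
    """Remove the markup, keeping the entity text in place."""
    return ENTITY.sub(r"\2", marked)


marked = ("<namefind/person>Diana Hayden</namefind/person> was in "
          "<namefind/location>Philadelphia</namefind/location> city on "
          "<namefind/date>3rd October</namefind/date>")

print(extract_entities(marked))
print(plain_text(marked))
```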

Page 17: OpenNLP demo

Chunking (shallow parsing)

[NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September] .

A chunker (shallow parser) segments a sentence into meaningful phrases.

Source: personalpages.manchester.ac.uk/staff/Sophia.Ananiadou/DTCII.ppt
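A chunker usually labels each token rather than emitting brackets directly. The sketch below assumes the common BIO encoding (B-XP begins a phrase of type XP, I-XP continues it, O marks a token outside any phrase) and rebuilds the bracketed chunks of the example. The tag sequence was written by hand to match the bracketing on this slide, not produced by a real chunker, and `group_chunks` and `bracket` are hypothetical helpers.

```python
def group_chunks(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group tokens into (label, phrase) pairs using B-/I-/O chunk tags."""
    chunks: list[tuple[str, list[str]]] = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Token outside any chunk (e.g. final punctuation).
            chunks.append(("O", [token]))
        elif tag.startswith("I-") and chunks and chunks[-1][0] == tag[2:]:
            # Continue the current phrase of the same type.
            chunks[-1][1].append(token)
        else:
            # "B-" tag (or a stray "I-" tag) starts a new phrase.
            chunks.append((tag[2:], [token]))
    return [(label, " ".join(words)) for label, words in chunks]


def bracket(chunks: list[tuple[str, str]]) -> str:
    """Render chunks in the [LABEL words] notation used on the slide."""
    return " ".join(p if label == "O" else f"[{label} {p}]" for label, p in chunks)


tokens = ("He reckons the current account deficit will narrow "
          "to only # 1.8 billion in September .").split()
tags = ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP",
        "B-PP", "B-NP", "I-NP", "I-NP", "I-NP", "B-PP", "B-NP", "O"]

print(bracket(group_chunks(tokens, tags)))
```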

Page 18: OpenNLP demo

Treebank parser

It tags tokens and groups phrases into a tree.

(TOP (S (NP (DT A) (NN hospital) (NN bed)) (VP (VBZ is) (NP (NP (DT a) (VBN parked) (NN taxi)) (PP (IN with) (NP (DT the) (NN meter) (VBG running)))))))

A hospital bed is a parked taxi with the meter running
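The bracketed output above can be read back into a tree with a small recursive-descent parser. A Python sketch (all helper names are hypothetical) that parses the string, recovers the (word, tag) pairs and the phrases, and prints an indented view of the same tree that the visualization draws.

```python
import re

PARSE = ("(TOP (S (NP (DT A) (NN hospital) (NN bed)) (VP (VBZ is) "
         "(NP (NP (DT a) (VBN parked) (NN taxi)) (PP (IN with) "
         "(NP (DT the) (NN meter) (VBG running)))))))")


def parse_tree(text: str):
    """Parse a bracketed tree into nested (label, children) tuples; words are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the tree")
    return tree


def is_preterminal(tree) -> bool:
    """A node like (NN bed): a POS tag with a single word below it."""
    _, children = tree
    return len(children) == 1 and isinstance(children[0], str)


def tagged_words(tree) -> list[tuple[str, str]]:
    """Collect (word, tag) pairs from left to right."""
    if is_preterminal(tree):
        return [(tree[1][0], tree[0])]
    return [pair for child in tree[1] for pair in tagged_words(child)]


def phrases(tree) -> list[tuple[str, str]]:
    """Collect (phrase label, covered words) for every non-preterminal node."""
    if is_preterminal(tree):
        return []
    result = [(tree[0], " ".join(w for w, _ in tagged_words(tree)))]
    for child in tree[1]:
        result.extend(phrases(child))
    return result


def show(tree, depth: int = 0) -> None:
    """Print the tree with one node per line, indented by depth."""
    indent = "  " * depth
    if is_preterminal(tree):
        print(f"{indent}{tree[0]} {tree[1][0]}")
    else:
        print(f"{indent}{tree[0]}")
        for child in tree[1]:
            show(child, depth + 1)


show(parse_tree(PARSE))
```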

Page 19: OpenNLP demo

Visualization of Treebank Parser

S
  NP
    DT a
    NN hospital
    NN bed
  VP
    VBZ is
    NP
      NP
        DT a
        VBN parked
        NN taxi
      PP
        IN with
        NP
          DT the
          NN meter
          VBG running

Page 20: OpenNLP demo