clifton-greer
WordNet – What is it?
WordNet®
▪ a large lexical database of English
▪ a combination of dictionary and thesaurus
▪ created and maintained by the Cognitive Science Lab of Princeton University
▪ designed to establish the connections between words
WordNet
http://wordnet.princeton.edu/
WordNet – some concepts
WORDnet
4 types of Parts of Speech (POS)
▪ Noun, Verb, Adjective, Adverb
Synset
▪ the smallest unit in WordNet
▪ a synonym set
▪ represents a specific meaning of a word
WordNet – some concepts (cont’)
wordNET
Synsets are connected to one another through semantic and lexical relations
Types of relations (based on POS)
▪ hypernyms (kind-of): ‘vehicle’ is a hypernym of ‘car’
▪ hyponyms (kind-of): ‘car’ is a hyponym of ‘vehicle’
▪ holonyms (part-of): ‘building’ is a holonym of ‘window’
▪ meronyms (part-of): ‘window’ is a meronym of ‘building’
▪ similar to: ‘smart’ is similar to ‘intelligent’
▪ antonyms: ‘smart’ is an antonym of ‘unintelligent’
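The kind-of relations above can be sketched as a small linked data structure. This is a minimal illustration with hand-made toy data. The `Synset` class and the helper names are assumptions made for this example and are not the API of any WordNet interface.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synset:
    """A toy synset: a set of synonymous lemmas sharing one meaning."""
    name: str
    pos: str                                        # 'n', 'v', 'a' or 'r'
    lemmas: list
    hypernyms: list = field(default_factory=list)   # more general synsets
    hyponyms: list = field(default_factory=list)    # more specific synsets


def link_kind_of(specific, general):
    """Record that `specific` is a kind of `general`, in both directions."""
    specific.hypernyms.append(general)
    general.hyponyms.append(specific)


def hypernym_chain(synset):
    """Follow the first hypernym link upward until a root is reached."""
    chain = []
    current = synset
    while current.hypernyms:
        current = current.hypernyms[0]
        chain.append(current.name)
    return chain


# Hand-made toy data, not taken from the real WordNet database.
vehicle = Synset("vehicle", "n", ["vehicle"])
car = Synset("car", "n", ["car", "auto", "automobile"])
taxi = Synset("taxi", "n", ["taxi", "cab"])
link_kind_of(car, vehicle)
link_kind_of(taxi, car)

print(hypernym_chain(taxi))
print([s.name for s in vehicle.hyponyms])
```

Storing both directions of each link keeps hypernym and hyponym lookups symmetric, which mirrors how the two relations are defined as inverses of each other.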
WordNet – some concepts (cont’)
[Figure: diagram with links labeled ‘hypernym’ and ‘hyponym’]
WordNet – interfaces
▪ Unix-style manual
▪ Web Interfaces
▪ Local Interfaces/APIs: Java, Perl, C#
http://wordnet.princeton.edu/wordnet/related-projects/#web
Stemming
Definition: the process of removing suffixes from words to get their base or root form
Example: ‘fishing’, ‘fished’, ‘fish’, ‘fisher’ → ‘fish’
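The idea can be sketched with a naive suffix stripper. This is not the Porter or Krovetz algorithm. The suffix list and the `min_stem_len` parameter are assumptions chosen only to reproduce the ‘fish’ example.

```python
# Illustrative suffix list, checked in this order (an assumption, not a standard).
SUFFIXES = ("ing", "ed", "er", "s")


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least `min_stem_len` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["fishing", "fished", "fish", "fisher"]:
    print(w, "->", naive_stem(w))
```

A stripper this simple makes mistakes on many words (for example, it turns ‘running’ into ‘runn’), which is why practical stemmers apply ordered rewrite rules or dictionary checks.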
Stemmers
▪ Porter Stemmer: http://tartarus.org/~martin/PorterStemmer/
▪ Krovetz Stemmer (in Lemur package): http://www.lemurproject.org/phorum/read.php?11,1394
▪ WordNet Stemmer: http://tipsandtricks.runicsoft.com/Other/JavaStemmer.html
NLP techniques
▪ Tokenization: the process of breaking a stream of text up into “words” and punctuation marks
▪ Sentence Splitting
▪ Part of Speech Tagging
Example:
He/PRP 's/VBZ at/IN peace/NN with/IN the/DT house/NN and/CC could/MD stay/VB there/RB indefinitely/RB ./.
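The first two steps can be sketched with regular expressions. This is a rough illustration, not how the Stanford tools work. The patterns and function names are assumptions, and the tags are copied from the example above instead of being predicted by a tagger.

```python
import re

# A clitic such as 's, or a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"'\w+|\w+|[^\w\s]")


def tokenize(text):
    """Break text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def format_tagged(pairs):
    """Render (token, tag) pairs in the token/TAG notation."""
    return " ".join(f"{tok}/{tag}" for tok, tag in pairs)


sentence = "He's at peace with the house and could stay there indefinitely."
tokens = tokenize(sentence)

# Tags copied by hand from the example; a real tagger would predict them.
tags = "PRP VBZ IN NN IN DT NN CC MD VB RB RB .".split()

print(tokens)
print(format_tagged(zip(tokens, tags)))
print(split_sentences("He left. She stayed! Why?"))
```

The sentence splitter breaks on every period followed by whitespace, so abbreviations such as ‘Corp.’ would be split incorrectly. The tokenizer also mishandles single-quoted words.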
NLP techniques (cont’)
▪ Named Entity Recognition: the process of labeling sequences of words which are the names of things, such as person, company, and location names
Example:
Jim bought 300 shares of Acme Corp. in 2006.
<ENAMEX TYPE="PERSON">Jim</ENAMEX> bought 300 shares of <ENAMEX TYPE="ORGANIZATION">Acme Corp.</ENAMEX> in <TIMEX TYPE="DATE">2006</TIMEX>.
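The markup above can be reproduced with a toy lookup-based labeler. This is only a sketch. The `GAZETTEER` dictionary, the year pattern and the `annotate` function are hypothetical and written for this one sentence, whereas statistical tools such as Stanford NER learn their labels from annotated data.

```python
import re

# Hypothetical gazetteer: surface string -> (tag name, entity type).
GAZETTEER = {
    "Jim": ("ENAMEX", "PERSON"),
    "Acme Corp.": ("ENAMEX", "ORGANIZATION"),
}

# Assumption for this sketch: a four-digit number starting with 19 or 20 is a year.
YEAR_RE = r"\b(?:19|20)\d{2}\b"


def annotate(text):
    """Wrap gazetteer names and year-like numbers in inline entity tags."""
    # Longest names first so multi-word entries win over shorter overlaps.
    names = sorted(GAZETTEER, key=len, reverse=True)
    alternatives = [rf"(?<!\w){re.escape(n)}(?!\w)" for n in names]
    alternatives.append(rf"(?P<year>{YEAR_RE})")
    pattern = re.compile("|".join(alternatives))

    def wrap(match):
        span = match.group(0)
        if match.group("year"):
            tag, etype = "TIMEX", "DATE"
        else:
            tag, etype = GAZETTEER[span]
        return f'<{tag} TYPE="{etype}">{span}</{tag}>'

    return pattern.sub(wrap, text)


print(annotate("Jim bought 300 shares of Acme Corp. in 2006."))
```

A lookup table only finds names it already contains and cannot resolve ambiguous strings.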
NLP tools
▪ Stanford POS tagger: http://nlp.stanford.edu/software/tagger.shtml
▪ Stanford NER: http://nlp.stanford.edu/software/CRF-NER.shtml
▪ GATE: http://gate.ac.uk/
▪ JET: http://cs.nyu.edu/grishman/jet/license.html, http://www.cs.nyu.edu/courses/spring10/G22.2590-001/schedule.html