Outline:
- Language modeling
- POS tagging (available POS taggers)
- Parsing (available parsers)
- Semantic processing: semantics (lexical), statistical semantics, semantics (compositional)
Natural Language Processing (NLP): Overview & Tools
L715/B659
Dept. of Linguistics, Indiana University
Fall 2016
Natural Language Processing
Natural Language Processing (NLP): “The goal of this ... field is to get computers to perform useful tasks involving human language” (Jurafsky & Martin 2009, p. 1)

Applications include:
- conversational agents / dialogue systems
- machine translation
- question answering
- ...

We will focus on natural language understanding (NLU): obtaining linguistic information (meaning) from input (text)
What do we need NLP for?
- On the one hand, we intend to do NLP proper, i.e., automatically analyze natural language in order to extract meaning (of a sort) from a text
- On the other hand, we use NLP tools to pre-process data, i.e., to provide sentence-level grammatical information:
  - segment sentences
  - tokenize words
  - part-of-speech tag words
  - syntactically (and semantically?) parse sentences
  - provide semantic word senses
  - provide named entities
  - provide language models

This kind of (pre-)processing is the focus for today
Why (not) (just) surface features?
Surface features can be very useful:
- Function words: a small, closed set that recurs a lot
- Ease of use: data-driven patterns emerge without writing out patterns by hand
- Hypothesis: people differ in specific word choices

Surface features can be limited:
- Data sparsity: surface features may not be seen again, especially with small training data
- Morphological complexity: word similarity can be “hidden”
- Hypothesis: people differ in deeper linguistic properties
Where we’re going
We are going to focus on:
- what the general tasks are & what the uses are
- what kinds of information they generally rely on
- what tools are available

We’ll look at POS tagging, parsing, word sense assignment, named entity recognition, & semantic role labeling
- We’ll focus on English, but try to note general applicability

Many taggers/parsers have pre-built models; others can be trained on annotated data
- For now, we’ll focus on pre-built models
Wikis with useful technology information
Places where you can find information on your own:

- Our very own IU CL wiki, which includes some people’s experiences with various tools
  - http://cl.indiana.edu/wiki
  - Always feel free to add your own experiences to help the next person who wants to use that tool
- ACL wiki & resources
  - http://www.aclweb.org/aclwiki/index.php?title=Main_Page
  - http://www.aclweb.org/aclwiki/index.php?title=ACL_Data_and_Code_Repository
  - http://www.aclweb.org/aclwiki/index.php?title=List_of_resources_by_language
- ACL software registry: http://registry.dfki.de/
General NLP packages
- Stanford NLP: http://nlp.stanford.edu/software/ (see esp. the CoreNLP package)
- EmoryNLP (NLP4J): http://nlp.mathcs.emory.edu
- ClearNLP: http://www.clearnlp.com
- FreeLing: http://nlp.lsi.upc.edu/freeling/
- LingPipe: http://alias-i.com/lingpipe/
- OpenNLP: http://opennlp.apache.org/index.html
- Natural Language Toolkit (NLTK): http://www.nltk.org/
- Illinois tools: http://cogcomp.cs.illinois.edu/page/software
- DKPro: https://www.ukp.tu-darmstadt.de/research/current-projects/dkpro/
  - Includes a text classification tool built on top of Weka
Topic #1: Language modeling
Language models store n-gram statistics over large amounts of text and use them to assign probabilities to new sequences of text
- They tend to be fast & surprisingly accurate
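The idea can be sketched with a tiny maximum-likelihood bigram model in plain Python (the corpus is an invented toy, and real toolkits add smoothing for unseen bigrams):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count bigrams over padded sentences and return MLE probabilities."""
    bigrams, unigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            unigrams[w1] += 1
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

def sequence_prob(lm, sent):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = ["<s>"] + sent + ["</s>"]
    prob = 1.0
    for w1, w2 in zip(tokens, tokens[1:]):
        prob *= lm.get((w1, w2), 0.0)  # unseen bigram -> zero (no smoothing)
    return prob

corpus = [["the", "dragon", "breathed", "fire"],
          ["the", "dragon", "slept"]]
lm = train_bigram_lm(corpus)
print(lm[("the", "dragon")])                          # 1.0
print(sequence_prob(lm, ["the", "dragon", "slept"]))  # 0.5
```

Packages like those below do exactly this at scale, with smoothing and compact storage.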
Some packages:
- KenLM Language Model Toolkit: https://kheafield.com/code/kenlm/
- MIT Language Modeling Toolkit: https://code.google.com/p/mitlm/
- SRI Language Modeling Toolkit: http://www.speech.sri.com/projects/srilm/
- CMU-Cambridge Statistical Language Modeling Toolkit v2: http://www.speech.cs.cmu.edu/SLM/toolkit.html
Why n-grams?
The packages themselves may or may not help us, but the idea of surface n-grams likely will
- Core idea: sequences of words approximate syntactic & some semantic constraints
  - e.g., who uses of the more? (of: nominals; the: concrete objects/ideas)
  - e.g., my life vs. your life
- n-grams are also at the core of other technologies: POS tagging, distributional semantics, etc.
Topic #2: POS Tagging
Idea: assign a part-of-speech to every word in a text

(Supervised) taggers work by:
- looking up a set of appropriate tags for a word in a dictionary
- using local context to disambiguate from among the set

Sequence models (HMMs, CRFs) are thus popular

Some examples illustrating the utility of local context:
- for the man: noun or verb?
- we will man: noun or verb?
- I can put: verb base form or past?
- re-cap real quick: adjective or adverb?

Bigram or trigram tagging is quite popular
- Take L545/L645 if you want to know more
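The lookup-plus-context scheme can be sketched in a few lines. This is a toy, not a real HMM/CRF: the tagged corpus is invented, and disambiguation just picks the candidate tag seen most often after the previous tag:

```python
from collections import Counter, defaultdict

# Invented toy tagged corpus, for illustration only.
tagged = [[("for", "IN"), ("the", "DT"), ("man", "NN")],
          [("we", "PRP"), ("will", "MD"), ("man", "VB")]]

# Step 1: dictionary of candidate tags per word.
lexicon = defaultdict(set)
# Step 2: counts of (previous tag, tag) pairs for disambiguation.
tag_bigrams = Counter()
for sent in tagged:
    prev = "<s>"
    for word, tag in sent:
        lexicon[word].add(tag)
        tag_bigrams[(prev, tag)] += 1
        prev = tag

def tag_word(word, prev_tag):
    """Pick the candidate tag that occurs most often after prev_tag."""
    candidates = lexicon.get(word, {"NN"})  # crude unknown-word fallback
    return max(candidates, key=lambda t: tag_bigrams[(prev_tag, t)])

print(tag_word("man", "DT"))  # 'NN' -- after a determiner
print(tag_word("man", "MD"))  # 'VB' -- after a modal
```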
Motivation for POS tags
What are POS tags good for in our intended downstream applications?
- A first step towards knowing the meaning, e.g., for word senses (e.g., leaves)
- Help identify function words & content words (e.g., for stylometry)
- POS sequences (n-grams) may be indicative of style
  - POS n-grams approximate syntax

Note that POS tags are generally very fast to obtain & are generally accurate (for English, on well-formed data)
Challenges for POS tagging
General challenges:
- Ambiguity
  - e.g., still as noun, verb, adverb, adjective, ...
- Unknown words
  - Programs use things like suffix tries to guess at the possible POS tags for unknown words

These challenges are exacerbated in the following areas:
- Morphologically-rich languages
- Data which is not well-edited (e.g., web data)
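A toy version of suffix-based guessing for unknown words (the suffix table here is hand-picked for illustration; real taggers induce one, e.g., a suffix trie, from training data):

```python
def guess_tag(word, suffix_table=None):
    """Guess a POS tag for an unknown word from its longest matching suffix."""
    if suffix_table is None:
        # Hand-picked toy table; a real tagger learns this from data.
        suffix_table = {"ing": "VBG", "ed": "VBD", "ly": "RB",
                        "ness": "NN", "tion": "NN", "able": "JJ"}
    for length in range(len(word), 0, -1):  # try the longest suffix first
        tag = suffix_table.get(word[-length:])
        if tag:
            return tag
    return "NN"  # default: nouns are the largest open class

print(guess_tag("blorping"))    # 'VBG'
print(guess_tag("frabjously"))  # 'RB'
```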
POS taggers
- TnT: http://www.coli.uni-saarland.de/~thorsten/tnt/
  - Trainable; models for German & English
- TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
  - Trainable; models for English, German, Italian, Dutch, Spanish, Bulgarian, Russian, & French; Unix, Mac, PC
- Qtag: http://www.english.bham.ac.uk/staff/omason/software/qtag.html
  - Trainable; models for German & English
- LingPipe: http://alias-i.com/lingpipe/index.html
  - Has a variety of NLP modules
- OpenNLP: http://opennlp.sourceforge.net/
  - Models for English, German, Spanish, & Thai; has a variety of NLP modules
POS taggers (2)
- ACOPOST: http://acopost.sourceforge.net/
  - Trainable; integrates different technologies
- Stanford tagger: http://nlp.stanford.edu/software/tagger.shtml
  - Trainable; models for English, Arabic, Chinese, & German
- CRFTagger: http://crftagger.sourceforge.net/
  - English
- Can also use SVMTool (http://www.lsi.upc.edu/~nlp/SVMTool/) or CRF++ (http://crfpp.sourceforge.net/) for tagging sequential data, or fnTBL for classification tasks (http://www.cs.jhu.edu/~rflorian/fntbl/index.html)
Specialized POS taggers
Twitter taggers:
- CMU Ark: http://www.ark.cs.cmu.edu/TweetNLP/
- GATE: https://gate.ac.uk/wiki/twitter-postagger.html (also available to plug into the Stanford tagger)

Biomedical taggers:
- GENIA tagger: http://www.nactem.ac.uk/tsujii/GENIA/tagger/
- cTAKES (clinical Text Analysis and Knowledge Extraction System): https://ctakes.apache.org/index.html
Topic #3: Parsing
Parsers attempt to build a tree, based on some grammar
- Efficiency depends on many things, including the manner in which the tree is built
- They often disambiguate by using probabilities of rules

Again, take L545/L645 for more details
Constituencies & Dependencies
Rough idea of the difference:
Constituency (phrase-structure tree, shown bracketed):

  (S (NP (DT the) (NN dragon))
     (VP (VBD breathed) (NN fire)))

Dependency (each word attached to a head, with a labeled relation):

  vroot    -ROOT-> breathed/VBD
  breathed -SUBJ-> dragon/NN
  dragon   -DET->  the/DT
  breathed -OBJ->  fire/NN
Constituency parsing
The goal is to obtain phrases
- Structured prediction: dealing with embedded / recursive structures
- Parsing can be slow, but tends to be fairly accurate
  - POS tags obtained while parsing are more accurate than with a standalone POS tagger

Usefulness for downstream applications:
- Identifying sequences, e.g., named entities
- Identifying complexity, e.g., depth of embedding
- Identifying particular types of constructions, e.g., relative clauses
Challenges in parsing
In addition to things like lexical ambiguity & unknown words, additional challenges include:
- Structural ambiguity: e.g., They saw the man in the park with a telescope
- Garden paths: e.g., The horse raced past the barn fell

Again, out-of-domain data poses a challenge
- Note that for morphologically-rich languages, parsing is underdeveloped & some of the work is in the morphology
Dependency parsing
Dependency parsing is the task of assigning dependency (grammatical) relations to a sentence
- Provides quick access to semantic relations (“who did what to whom”)
- Can be done on top of constituency parsing or on its own
- Formally, dependency parsing is simpler: assign a single head & relation for every word (single-head constraint)

Useful applications:
- Pretty close to the same set as with constituencies ...
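The single-head constraint makes dependency parses easy to represent: a sketch where a parse maps each word index to (form, head index, relation), with head 0 for the artificial root, plus a check that the analysis is a tree (every word reaches the root, no cycles):

```python
# Dependency analysis of "the dragon breathed fire" (word indices 1-4).
parse = {
    1: ("the",      2, "DET"),   # the      <- dragon
    2: ("dragon",   3, "SUBJ"),  # dragon   <- breathed
    3: ("breathed", 0, "ROOT"),  # breathed <- artificial root
    4: ("fire",     3, "OBJ"),   # fire     <- breathed
}

def is_tree(parse):
    """True iff every word's head chain reaches the root without a cycle."""
    for idx in parse:
        seen = set()
        while idx != 0:
            if idx in seen:
                return False  # cycle: some word never reaches the root
            seen.add(idx)
            idx = parse[idx][1]  # follow the head pointer
    return True

print(is_tree(parse))  # True
```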
Constituency Parsers
- LoPar: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/LoPar.html
  - Trainable; models for English & German
- BitPar: http://www.ims.uni-stuttgart.de/tcl/SOFTWARE/BitPar.html
  - Trainable; models for English & German
- Charniak & Johnson parser: http://www.cs.brown.edu/people/ec/#software
  - Trainable; mainly used for English
Constituency Parsers (2)
- Collins/Bikel parser: http://people.csail.mit.edu/mcollins/code.html, http://www.cis.upenn.edu/~dbikel/software.html
  - Trainable on English, Chinese, and Arabic; designed for Penn Treebank-style annotation
- Stanford parser: http://nlp.stanford.edu/downloads/lex-parser.shtml
  - Trainable; models for English, German, Chinese, & Arabic; dependencies also available
- Berkeley parser: http://code.google.com/p/berkeleyparser/
  - Trainable; models for English, German, and Chinese
Dependency parsers
Recent parsers, which generally include other NLP tools:
- Mate Parser: https://code.google.com/p/mate-tools/
- TurboParser: http://www.ark.cs.cmu.edu/TurboParser/
- ZPar: http://sourceforge.net/projects/zpar/
- Stanford Neural Network Parser: http://nlp.stanford.edu/software/nndep.shtml

Classic dependency parsers:
- MaltParser: http://w3.msi.vxu.se/~nivre/research/MaltParser.html
  - Trainable; models for Swedish, English, & Chinese
- MSTParser: http://sourceforge.net/projects/mstparser
  - Trainable; has some models for English & Portuguese
- Link Grammar parser: http://www.abisource.com/projects/link-grammar/
  - English only

CCG parsers: http://groups.inf.ed.ac.uk/ccg/software.html
- Primarily for English, although they can be trained on the German CCGbank
Topic #4: Semantics
Semantics is the study of meaning in language
We’ll break it down into:
- Lexical semantics: word meaning
- Semantic spaces: word meaning derived from data
- Compositional semantics: sentence meaning

and look at technology for all of them
Semantic class assignment: Word sense disambiguation
Word sense disambiguation (WSD): for a given word, determine its semantic class
- bank.01: They robbed a bank and took the cash.
- bank.02: They swam awhile and then rested on the bank.

Lexical resources define the senses, e.g.:
- WordNet: http://wordnet.princeton.edu
- BabelNet: http://babelnet.org
- SentiWordNet: http://sentiwordnet.isti.cnr.it
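A bare-bones Lesk-style sketch of how such resources get used: pick the sense whose dictionary gloss shares the most words with the context. The two glosses below are invented stand-ins for real WordNet definitions:

```python
# Invented mini sense inventory; a real system would use WordNet glosses.
SENSES = {
    "bank.01": "a financial institution that accepts deposits of cash",
    "bank.02": "sloping land beside a body of water such as a river",
}

def lesk(context_words, senses=SENSES):
    """Return the sense whose gloss overlaps most with the context."""
    def overlap(gloss):
        return len(set(gloss.split()) & set(context_words))
    return max(senses, key=lambda s: overlap(senses[s]))

print(lesk("they robbed a bank and took the cash".split()))    # 'bank.01'
print(lesk("they swam and rested on the river bank".split()))  # 'bank.02'
```

Real Lesk implementations (e.g., in pyWSD) additionally remove stopwords and expand glosses with related senses; this sketch keeps only the core overlap idea.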
WSD software
- GWSD: Unsupervised Graph-based Word Sense Disambiguation: http://web.eecs.umich.edu/~mihalcea/downloads.html
- SenseLearner: All-Words Word Sense Disambiguation Tool: http://web.eecs.umich.edu/~mihalcea/downloads.html
- KYOTO UKB graph-based WSD: http://ixa2.si.ehu.es/ukb/
- pyWSD: Python implementation of simple WSD algorithms: https://github.com/alvations/pywsd
- Various packages from Ted Pedersen, including Senseval systems: http://www.d.umn.edu/~tpederse/code.html
Semantic class assignment: Named entity recognition
Named entity recognition (NER): classify elements (words, phrases) into pre-defined entity classes
- Common categories include: PER(son), ORG(anization), LOC(ation), etc.
- May have hierarchical categories

Techniques often rely on phrase chunking & may involve using a gazetteer (an external list of entities)
- From the list of general NLP tools above, Stanford, UIUC, & OpenNLP have NER modules
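A sketch of the gazetteer idea: longest-match lookup against an external entity list. The tiny gazetteer below is invented for illustration; real systems combine this with chunking and statistical models:

```python
# Invented toy gazetteer: lowercase token tuples -> entity class.
GAZETTEER = {
    ("indiana", "university"): "ORG",
    ("bloomington",): "LOC",
    ("john",): "PER",
}

def tag_entities(tokens, gazetteer=GAZETTEER):
    """Scan left to right, preferring the longest gazetteer match."""
    max_len = max(len(key) for key in gazetteer)
    entities, i = [], 0
    while i < len(tokens):
        for length in range(max_len, 0, -1):  # longest match first
            phrase = tuple(t.lower() for t in tokens[i:i + length])
            if phrase in gazetteer:
                entities.append((" ".join(tokens[i:i + length]),
                                 gazetteer[phrase]))
                i += length
                break
        else:
            i += 1  # no match: move on
    return entities

print(tag_entities("John studies at Indiana University in Bloomington".split()))
# [('John', 'PER'), ('Indiana University', 'ORG'), ('Bloomington', 'LOC')]
```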
LIWC
A popular tool to use is LIWC (Linguistic Inquiry and Word Count)
- http://liwc.wpengine.com

Words are grouped into “psychologically-relevant categories” based on hand-crafted dictionaries
- It admittedly does not handle ambiguity
- It is proprietary
Statistical semantics: Distributional representations
Part of the motivation for using semantic classes is to group together relatively infrequent words
- i.e., to get a handle on data sparsity

A long-standing hypothesis: the distributional hypothesis
- “[L]inguistic items with similar distributions have similar meanings” (https://en.wikipedia.org/wiki/Distributional_semantics)
- These patterns can be learned from large, general (unannotated) corpora and applied to our problems
- Roughly: the meaning of a word corresponds to its position in a vector space
- One package in Python is gensim: http://radimrehurek.com/gensim/

Consider also, e.g., Brown clustering (https://github.com/percyliang/brown-cluster)
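The “position in a vector space” idea in miniature: represent each word by counts of its context words, then compare words by cosine similarity (toy corpus invented for illustration; gensim and friends learn dense versions of the same idea):

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of words seen within +/- window tokens."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, word in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[word][sent[j]] += 1
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v1[w] * v2[w] for w in v1)
    norm = (math.sqrt(sum(c * c for c in v1.values())) *
            math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0

corpus = [["the", "cat", "chased", "a", "mouse"],
          ["the", "dog", "chased", "a", "ball"],
          ["the", "cat", "slept"],
          ["the", "dog", "slept"]]
vecs = cooccurrence_vectors(corpus)
# "cat" and "dog" occur in near-identical contexts, so their vectors are close:
print(cosine(vecs["cat"], vecs["dog"]) > cosine(vecs["cat"], vecs["ball"]))  # True
```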
Statistical semantics: Distributed representations
More recently, distributed representations of words, using neural networks, have been extremely popular
- Key phrases: deep learning, word embeddings, recurrent neural networks
- A word is represented by a variety of dimensions, each one capturing potentially useful properties
  - http://aclweb.org/anthology/P/P10/P10-1040.pdf

Some resources:
- A general guide to distributed representations: http://www.cs.toronto.edu/~bonner/courses/2014s/csc321/lectures/lec5.pdf
- A practical guide to word vectors: https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-2-word-vectors
- The word2vec page: https://code.google.com/archive/p/word2vec/
Statistical semantics
Options to think about with word embeddings:
- Architecture
- Training algorithm
- Downsampling of frequent words
- Word vector dimensionality
- Context / window size
- Worker threads
- Minimum word count

See the Kaggle page for tips ...
Semantic role labeling
Idea: the words of a sentence combine to form a meaning
- Hypothesis: the syntax and semantics can be built up in a corresponding fashion

Semantic role labeling is the task of assigning semantic roles to arguments in a sentence

e.g., for John loves Mary:
- (to) love is the predicate
- John is the agent (ARG0)
- Mary is the patient (ARG1)
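The output of such labeling is easy to represent; a sketch of a PropBank-style frame for this example (the data structure is illustrative, but the ARG0/ARG1 labels follow PropBank conventions):

```python
# PropBank-style frame for "John loves Mary": a predicate plus
# numbered arguments.
srl = {
    "predicate": "love",
    "ARG0": "John",  # agent: the lover
    "ARG1": "Mary",  # patient: the one loved
}

def arguments(frame):
    """Return just the labeled arguments of a frame."""
    return {role: filler for role, filler in frame.items()
            if role.startswith("ARG")}

print(arguments(srl))  # {'ARG0': 'John', 'ARG1': 'Mary'}
```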
Semantic role labelers
- Clear: http://www.clearnlp.com
- SENNA: http://ml.nec-labs.com/senna/
- UIUC: http://cogcomp.cs.illinois.edu/page/software_view/SRL
- SEMAFOR: https://code.google.com/p/semafor-semantic-parser/
- SwiRL: http://www.surdeanu.info/mihai/swirl/
- Shalmaneser: http://www.coli.uni-saarland.de/projects/salsa/shal/
- MATE: https://code.google.com/p/mate-tools/
- Turbo: http://www.ark.cs.cmu.edu/TurboParser/