NLP

Natural Language Processing

Class Logistics

Quiz

• Where is this quote from?

Dave Bowman: Open the pod bay doors, HAL.
HAL: I’m sorry, Dave. I’m afraid I can’t do that.

Quiz Answer

• “2001: A Space Odyssey”
– 1968 film by Stanley Kubrick
– Screenplay co-written by Kubrick and Arthur C. Clarke

Watson Example

http://www.geekwire.com/2013/ibm-takes-watson-cloud/

What Is Natural Language Processing?

• Natural Language Processing (NLP) is the study of the computational treatment of natural (human) language.

• In other words, teaching computers how to understand (and generate) human language.

How Computers Understand Language

Modern Applications

• Search engines
– Google, Yahoo!, Bing, Baidu

• Question answering
– IBM’s Watson

• Natural language assistants
– Apple’s Siri

• Translation systems
– Google Translate

• News digest
– Yahoo!

• Automated earthquake reports
– LA Times

• Automated stock market reports
– Narrative Science

Notes

• Computers are confused by (human) language
– Specific techniques are needed
– NLP draws on research in Linguistics, Theoretical Computer Science, Mathematics, Statistics, Artificial Intelligence, Psychology, Databases, etc.

• Goals of this class
– Understand that language processing is hard (and why)
– Understand the key problems in NLP
– Learn about the methods used to address these problems
– Understand the limitations of these methods

EECS 595/LING 541/SI 561

• Instructor:
– Dragomir Radev ([email protected])

• Class times:
– Mon 3:10-5:55 in 133 Chrysler

• GSI:
– Catherine Finegan-Dollak (cfdollak)

• Grader:
– TBA

EECS 595/LING 541/SI 561

• Course home page:
– http://web.eecs.umich.edu/~radev/NLP-fall2015/

• Textbook:
– http://www.cs.colorado.edu/~martin/slp.html
– Speech and Language Processing
– by Jurafsky and Martin
– Second edition, 2009
– Third edition (draft): http://web.stanford.edu/~jurafsky/slp3/

• Additional readings:
– www.nltk.org

Other Available Books

• Foundations of Statistical Natural Language Processing
– Chris Manning and Hinrich Schütze
– http://nlp.stanford.edu/fsnlp/

• Natural Language Understanding
– James Allen

Course Dates

• SEP – 14 21 28

• OCT – 5 12 26

• NOV – 2 9 16 23 30

• DEC – 7 14

• No class Mon Oct 19
• Midterm (unofficial) Nov 2
• Last class Mon Dec 14
• Exams Dec 16-23

Structure of the Course

• Four major parts:
– Linguistic, mathematical, and computational background
– Computational models of morphology, syntax, semantics, discourse, and pragmatics
– Core NLP technology: parsing, part-of-speech tagging, text generation, semantic analysis, etc.
– Applications: text classification, sentiment analysis, text summarization, question answering, machine translation, information extraction, etc.

• Three major goals:
– Learn the basic principles and theoretical issues underlying natural language processing
– Learn techniques and tools used to develop practical, robust systems that can understand text and communicate with users in one or more languages
– Gain insight into some open research problems in natural language processing

Syllabus

• Book sections
– Introduction (chapter 1)
– Words (chapters 2-6)
– Syntax (chapters 12-16)
– Semantics and Pragmatics (chapters 17-21)
– Applications (chapters 22-25)

Draft Syllabus

• Introduction
• Language Modeling
• Part-of-Speech Tagging
• Hidden Markov Models
• Formal Grammars of English
• Syntactic Parsing
• Statistical Parsing
• Features and Unification
• Dependency Parsing
• The Representation of Meaning
• Computational Semantics
• Lexical Semantics
• Computational Lexical Semantics
• Computational Discourse
• Information Extraction
• Question Answering and Summarization
• Dialog and Conversational Agents
• Machine Translation
• Sentiment and Subjectivity Analysis
• Vector Semantics
• Deep Learning for NLP

Grading

• Assignments
– 4 programming projects (60%)
– Midterm (15%)
– Final (20%)
– Class participation (5%)
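The four components above combine into a single weighted average. Below is a minimal Python sketch of that arithmetic. It assumes that every component is scored as a percentage and that the four programming projects count equally (15% each). The function `final_grade` and its parameters are hypothetical illustrations, not the course's actual grading script.

```python
# Weights taken from the grading slide. Splitting the 60% evenly
# across the four projects is an assumption for illustration only.
WEIGHTS = {
    "projects": 0.60,
    "midterm": 0.15,
    "final": 0.20,
    "participation": 0.05,
}

# Sanity check: the weights must cover the whole grade.
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9


def final_grade(project_scores, midterm, final, participation):
    """Return a hypothetical weighted course score on a 0-100 scale.

    All inputs are percentages in [0, 100]. project_scores holds the
    scores of the four programming projects, averaged with equal weight.
    """
    if len(project_scores) != 4:
        raise ValueError("expected scores for 4 programming projects")
    project_avg = sum(project_scores) / len(project_scores)
    components = {
        "projects": project_avg,
        "midterm": midterm,
        "final": final,
        "participation": participation,
    }
    # Weighted sum of each component's percentage.
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
```

For example, project scores of 90, 80, 85 and 95 (average 87.5), a midterm of 70, a final of 80 and full participation give 0.60 × 87.5 + 0.15 × 70 + 0.20 × 80 + 0.05 × 100 = 84.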

Programming Projects

• Language Modeling and Part-of-Speech Tagging
• Dependency Parsing
• Vector Semantics for Word Sense Disambiguation
• Machine Translation

More Sample Projects

• Noun phrase parser
• Paraphrase identification
• Question answering
• NL access to databases
• Named entity tagging
• Rhetorical parsing
• Anaphora resolution
• Document and sentence alignment
• Using bioinformatics methods
• Information extraction
• Speech processing
• Sentence normalization
• Text summarization
• Sentence compression
• Definition extraction
• Crossword puzzle generation
• Prepositional phrase attachment
• Machine translation
• Generation
• Semi-structured document parsing
• Semantic analysis of short queries
• User-friendly summarization
• Number classification
• Time-dependent fact extraction

Courses at Other Places

• Brick-and-Mortar
– Johns Hopkins University (Jason Eisner)
– Cornell University (Lillian Lee)
– Stanford University (Chris Manning, Dan Jurafsky, Richard Socher)
– U. Maryland (Hal Daumé)
– Berkeley (Dan Klein)
– U. Texas (Ray Mooney)

• Coursera
– Manning/Jurafsky (2012, survey)
– Michael Collins (2013, more advanced)

The Association for Computational Linguistics (ACL)

www.aclweb.org

The Alphabet Soup

• NLP (Natural Language Processing)
• CL (Computational Linguistics)
• IR (Information Retrieval)
• SP (Speech Processing)
• HLT (Human Language Technology)
• NLE (Natural Language Engineering)
• ML (Machine Learning)

Research in NLP

• Conferences:
– ACL/NAACL, EMNLP, SIGIR, AAAI/IJCAI, Coling, HLT, EACL/NAACL, AMTA/MT Summit, ICSLP/Eurospeech

• Journals:
– Computational Linguistics, TACL, Natural Language Engineering, Information Retrieval, Information Processing and Management, ACM Transactions on Information Systems, ACM TALIP, ACM TSLP

• University centers:
– Berkeley, Columbia, Stanford, CMU, JHU, Brown, UMass, MIT, UPenn, USC/ISI, Illinois, Michigan, UW, Maryland, etc.
– Toronto, Edinburgh, Cambridge, Sheffield, Saarland, Trento, Prague, QCRI, NUS, and many others

• Industrial research sites:
– Google, MSR, Yahoo!, FB, IBM, SRI, BBN, MITRE, AT&T Labs

• The ACL Anthology
– http://www.aclweb.org/anthology

• The ACL Anthology Network (AAN)
– http://clair.eecs.umich.edu/aan/index.php
