CS 7650: Natural Language Processing

Alan Ritter (many slides from Greg Durrett)


Administrivia

‣ Course website: https://aritter.github.io/CS-7650/

‣ Piazza and Gradescope: links on the course website

‣ We will do our best to make sure questions about the homework, etc. get answered within 24 hours

‣ TA office hours: see spreadsheet


Course Requirements

‣ Prior exposure to machine learning very helpful but not required

‣ Programming / Python experience

‣ Probability

‣ Linear Algebra

‣ Multivariable Calculus

There will be a lot of math and programming!


Coursework Plan

‣ Problem Set 1 (math review) is out now on Gradescope (due Jan 25)

‣ 3 Programming Projects (fairly substantial implementation effort)

  ‣ Text classification

  ‣ Named entity recognition (BiLSTM-CNN-CRF)

  ‣ Neural chatbot (Seq2Seq with attention)

‣ 2 written assignments + midterm exam

  ‣ Mostly math problems related to ML / NLP

‣ Final project (details on course website, will discuss later)


Free Textbooks!

‣ 2 really awesome free textbooks available

‣ There will be assigned readings from both

‣ Both freely available online

Programming Projects: Computation

‣ Modern NLP methods require non-trivial computation

‣ Training neural networks with many parameters can take a long time (it is a very good idea to start working on the assignments early!)

‣ You probably want to use a GPU

‣ Google Colab: free GPUs (some limitations)

‣ The programming projects are designed with Colab in mind


What’s the goal of NLP?

‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

Siri, what’s the most valuable American company?

Apple

Who is its CEO?

Tim Cook

What the system has to do:

‣ recognize marketCap is the target value

‣ recognize predicate

‣ do computation

‣ resolve references
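The steps in the dialogue example above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the `Company` record, the `KB` list, the placeholder company names, and the `market_cap` numbers are all hypothetical (the numbers are arbitrary units, not real figures), and the language-understanding step itself is hard-coded rather than learned.

```python
from dataclasses import dataclass


@dataclass
class Company:
    name: str
    country: str
    market_cap: float  # arbitrary placeholder units, NOT real figures
    ceo: str


# Hypothetical toy knowledge base, for illustration only.
KB = [
    Company("Apple", "US", 3.0, "Tim Cook"),
    Company("ExampleCorp", "US", 1.0, "A. Person"),
    Company("OtherCo", "DE", 5.0, "B. Person"),
]


class DialogueState:
    """Remembers the most recently mentioned entity so pronouns can be resolved."""

    def __init__(self):
        self.last_entity = None


def most_valuable(state, country):
    # "most valuable" -> the target value is market_cap (hard-coded assumption);
    # "do computation" -> an argmax over the matching companies.
    candidates = [c for c in KB if c.country == country]
    best = max(candidates, key=lambda c: c.market_cap)
    state.last_entity = best
    return best.name


def attribute_of_its(state, predicate):
    # "its" -> resolve the reference to the last entity mentioned;
    # "CEO" -> the recognized predicate, looked up as a field.
    if state.last_entity is None:
        raise ValueError("nothing for 'its' to refer to")
    return getattr(state.last_entity, predicate)


state = DialogueState()
print(most_valuable(state, "US"))      # first turn
print(attribute_of_its(state, "ceo"))  # follow-up turn
```

The hard part in a real system is everything this sketch hard-codes: mapping "most valuable" to a market-cap field, "CEO" to a predicate, and "its" to the right earlier entity.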


Automatic Summarization

One of New America’s writers posted a statement critical of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

‣ compress text

‣ provide missing context

‣ paraphrase to provide clarity


Machine Translation

People’s Daily, August 30, 2017

Machine translation output: Trump Pope family watch a hundred years a year in the White House balcony


NLP Analysis Pipeline

Text → Text Analysis → Annotations → Applications

Text Analysis:

‣ Syntactic parses

‣ Coreference resolution

‣ Entity disambiguation

‣ Discourse analysis

Applications:

‣ Summarize

‣ Extract information

‣ Answer questions

‣ Identify sentiment

‣ Translate

‣ NLP is about building these pieces!

‣ All of these components are modeled with statistical approaches trained with machine learning


How do we represent language?

Text:

‣ Labels

the movie was good → +

Beyoncé had one of the best videos of all time → subjective

‣ Sequences/tags

Tom Cruise stars in the new Mission Impossible film (Tom Cruise: PERSON; Mission Impossible: MOVIE)

‣ Trees

I eat cake with icing (parse tree with node labels S, NP, VP, PP, VBZ, NN)

flights to Miami

λx. flight(x) ∧ dest(x)=Miami
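As a rough sketch, the four kinds of output above might look like this as Python data structures. The BIO tag scheme, the nested-tuple tree encoding, the particular bracketing chosen for "I eat cake with icing", and the toy flight records are all assumptions made for illustration; the slides only show the labels themselves.

```python
# 1) Label: one category for the whole text.
labeled = [
    ("the movie was good", "+"),
    ("Beyoncé had one of the best videos of all time", "subjective"),
]

# 2) Sequence/tags: one tag per token (BIO encoding is an assumption here).
tokens = "Tom Cruise stars in the new Mission Impossible film".split()
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "O", "B-MOVIE", "I-MOVIE", "O"]


def spans(tokens, tags):
    """Collect (type, text) spans from BIO tags."""
    out, current, kind = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                out.append((kind, " ".join(current)))
            current, kind = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                out.append((kind, " ".join(current)))
            current, kind = [], None
    if current:
        out.append((kind, " ".join(current)))
    return out


# 3) Tree: nested tuples (label, child, ...); leaves are plain strings.
#    This bracketing attaches "with icing" to "cake" (one possible parse).
tree = ("S",
        ("NP", "I"),
        ("VP",
         ("VBZ", "eat"),
         ("NP",
          ("NP", ("NN", "cake")),
          ("PP", "with", ("NP", ("NN", "icing"))))))


def leaves(node):
    """Return the words of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in leaves(child)]


# 4) Logical form: λx. flight(x) ∧ dest(x)=Miami as a predicate over records.
def flight_to_miami(x):
    return x.get("kind") == "flight" and x.get("dest") == "Miami"


records = [  # hypothetical toy records
    {"kind": "flight", "dest": "Miami"},
    {"kind": "flight", "dest": "Boston"},
    {"kind": "train", "dest": "Miami"},
]

print(spans(tokens, tags))
print(leaves(tree))
print([r for r in records if flight_to_miami(r)])
```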


How do we use these representations?

Text → Text Analysis → Labels, Sequences, Trees → Applications

‣ Extract syntactic features

‣ Tree transducers (for machine translation)

‣ Tree-structured neural networks

‣ end-to-end models

‣ …

‣ Main question: What representations do we need for language? What do we want to know about it?

‣ Boils down to: what ambiguities do we need to resolve?


Why is language hard? (and how can we handle that?)


Language is Ambiguous!

‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared (they = the city council)

they advocated (they = the demonstrators)

‣ This is so complicated that it’s an AI challenge problem! (AI-complete)

‣ Referential/semantic ambiguity


Language is Ambiguous!

‣ Ambiguous News Headlines:

‣ Teacher Strikes Idle Kids

‣ Hospitals Sued by 7 Foot Doctors

‣ Ban on Nude Dancing on Governor’s Desk

‣ Iraqi Head Seeks Arms

‣ Stolen Painting Found by Tree

‣ Kids Make Nutritious Snacks

‣ Local HS Dropouts Cut in Half

slide credit: Dan Klein

‣ Syntactic/semantic ambiguity: parsing needed to resolve these, but need context to figure out which parse is correct
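To make the point about parsing concrete, here is a minimal sketch of a CKY-style chart that counts how many parses a toy grammar assigns to a sentence. The grammar, the lexicon, and the names `BINARY_RULES`, `LEXICON`, and `count_parses` are assumptions invented for this illustration, and "ban on dancing on desk" is a simplified stand-in for the headline above. The parser finds both attachments; nothing in the grammar says which one is intended.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption, for illustration only).
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],  # PP attaches to the verb phrase
    ("NP", "PP"): ["NP"],  # PP attaches to the noun phrase
    ("P", "NP"): ["PP"],
}

# Nouns are mapped straight to NP to keep the grammar tiny.
LEXICON = {
    "I": ["NP"], "eat": ["V"], "cake": ["NP"], "with": ["P"], "icing": ["NP"],
    "ban": ["NP"], "on": ["P"], "dancing": ["NP"], "desk": ["NP"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees rooted in `root` that cover all words."""
    n = len(words)
    # chart[(i, k)][symbol] = number of ways `symbol` can span words[i:k]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[(i, i + 1)][symbol] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            k = i + length
            for j in range(i + 1, k):  # split point
                for left, left_count in list(chart[(i, j)].items()):
                    for right, right_count in list(chart[(j, k)].items()):
                        for parent in BINARY_RULES.get((left, right), []):
                            chart[(i, k)][parent] += left_count * right_count
    return chart[(0, n)][root]


print(count_parses("I eat cake with icing".split()))
print(count_parses("ban on dancing on desk".split(), root="NP"))
```

Both calls report two parses: in the first, "with icing" can modify either the eating or the cake; in the second, "on desk" can modify either the ban or the dancing.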


Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outsideHe makes truly beauJful

It fact actually handsomeHe makes truly boyfriend

Page 90: CS 7650: Natural Language Processing · Siri, what’s the most valuable American company? Apple recognize marketCap is the target value recognize predicate do computaon Who is its

Language is Really Ambiguous!

‣ There aren’t just one or two possibilities which are resolved pragmatically

‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them

il fait vraiment beau

It is really nice out
It’s really nice
The weather is beautiful
It is really beautiful outside
He makes truly beautiful
It fact actually handsome
He makes truly boyfriend
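
To make "combinatorially many possibilities" concrete, here is a minimal sketch that enumerates word-by-word glosses of the French sentence above. The lexicon is a toy assumption built from the words that appear in the candidate translations on the slide; it is not a real translation resource, and the function name `candidate_glosses` is hypothetical.

```python
from itertools import product

# Toy word-level lexicon (an assumption for illustration only):
# each French word maps to a few plausible English glosses.
LEXICON = {
    "il": ["it", "he"],
    "fait": ["is", "makes", "fact"],
    "vraiment": ["really", "truly", "actually"],
    "beau": ["nice", "beautiful", "handsome", "boyfriend"],
}


def candidate_glosses(sentence, lexicon=LEXICON):
    """Return every word-by-word gloss of `sentence` under the toy lexicon."""
    words = sentence.lower().split()
    options = [lexicon[w] for w in words]
    # The Cartesian product picks one gloss per source word.
    return [" ".join(choice) for choice in product(*options)]


glosses = candidate_glosses("il fait vraiment beau")
print(len(glosses))  # 2 * 3 * 3 * 4 = 72 candidates for a four-word sentence
```

Even with this tiny lexicon and no reordering, a four-word sentence yields 72 candidates, including "he makes truly boyfriend". A real system has to score these and pick one, which is the resolution problem the slide describes.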

What do we need to understand language?

‣ Lots of data!

slide credit: Dan Klein

What do we need to understand language?

‣ World knowledge: have access to information beyond the training data

DOJ greenlights Disney - Fox merger

DOJ: Department of Justice
greenlights: metaphor; “approves”

‣ What is a green light? How do we understand what “green lighting” does?

What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

Golland et al. (2010); McMahan and Stone (2015)

What do we need to understand language?

‣ Linguistic structure

Centering Theory, Grosz et al. (1995)

‣ …but computers probably won’t understand language the same way humans do

‣ However, linguistics tells us what phenomena we need to be able to deal with and gives us hints about how language works

What techniques do we use? (to combine data, knowledge, linguistics, etc.)

A brief history of (modern) NLP

Timeline axis: 1980 1990 2000 2010 2020

“AI winter”: rule-based, expert systems
earliest stat MT work at IBM
Penn treebank (S → NP VP)
Ratnaparkhi tagger (NNP VBZ)
Collins vs. Charniak parsers
Sup: SVMs, CRFs, NER, Sentiment
Unsup: topic models, grammar induction
Semi-sup, structured prediction
Neural

Structured Prediction

‣ All of these techniques are data-driven! Some data is naturally occurring, but it may need to be labeled

‣ Supervised techniques work well on very little data

Figure labels: unsupervised learning; annotation (two hours!); better system!

“Learning a Part-of-Speech Tagger from Two Hours of Annotation” Garrette and Baldridge (2013)

‣ Even neural nets can do pretty well!

Less Manual Structure?

DeNero et al. (2008); Bahdanau et al. (2014)

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

‣ Coreference: rule-based systems are still about as good as deep learning out-of-domain

Moosavi and Strube (2017); figure labels: Wikipedia, Newswire

‣ LORELEI: transition point below which phrase-based systems are better

‣ Why is this? Inductive bias!

‣ Can multi-task learning help?

Does manual structure have a place?

Trump Pope family watch a hundred years a year in the White House balcony

‣ Maybe manual structure would help…

Where are we?

‣ NLP consists of: analyzing and building representations for text, solving problems involving text

‣ These problems are hard because language is ambiguous; solving them requires drawing on data, knowledge, and linguistics

‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!

‣ NLP encompasses all of these things

NLP vs. Computational Linguistics

‣ NLP: build systems that deal with language data

‣ CL: use computational tools to study language

Hamilton et al. (2016)

NLP vs. Computational Linguistics

‣ Computational tools for other purposes: literary theory, political science…

Bamman, O’Connor, Smith (2013)

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Understand how to look at language data and approach linguistic phenomena

‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2021?

‣ Make you a “producer” rather than a “consumer” of NLP tools

‣ The three assignments should teach you what you need to know to understand nearly any system in the literature

Assignments

‣ 3 Homework Assignments
‣ Implementation-oriented
‣ ~2 weeks per assignment, 3 “slip days” for automatic extensions

These projects require understanding of the concepts, ability to write performant code, and ability to think about how to debug complex systems. They are challenging, so start early!

Final Project

‣ Final project (20%)
‣ Groups of 3-4 preferred, 1 is possible.
‣ Good idea to run your project idea by me in office hours or by email.
‣ 4 page report + final project presentation.

Gather.town Hangout