CS 7650: Natural Language Processing · Siri, what’s the most valuable American company? Apple...

Preview:

Citation preview

CS 7650: Natural Language Processing

Alan Ri:er(many slides from Greg Durrett)

Administrivia

‣ Course website: h:ps://ari:er.github.io/CS-7650/

‣ Piazza and Gradescope: links on the course website ‣ We will do our best to make sure quesJons about the

homework, etc. get answered within 24 hours

‣ TA Office hours: ‣ See spreadsheet

Course Requirements

‣ Prior exposure to machine learning very helpful but not required

‣ Programming / Python experience

‣ Probability

‣ Linear Algebra

‣ MulJvariable Calculus

Course Requirements

‣ Prior exposure to machine learning very helpful but not required

‣ Programming / Python experience

‣ Probability

‣ Linear Algebra

‣ MulJvariable Calculus

There will be a lot of math and programming!

Coursework Plan‣ 3 Programming Projects (fairly substanJal implementaJon effort)

‣ Text classificaJon

‣ Named enJty recogniJon (BiLSTM-CNN-CRF)

‣ Neural chatbot (Seq2Seq with a:enJon)

Coursework Plan‣ 3 Programming Projects (fairly substanJal implementaJon effort)

‣ Text classificaJon

‣ Named enJty recogniJon (BiLSTM-CNN-CRF)

‣ Neural chatbot (Seq2Seq with a:enJon)

‣ 2 wri:en assignments + midterm exam

‣ Mostly math problems related to ML / NLP

Coursework Plan‣ 3 Programming Projects (fairly substanJal implementaJon effort)

‣ Text classificaJon

‣ Named enJty recogniJon (BiLSTM-CNN-CRF)

‣ Neural chatbot (Seq2Seq with a:enJon)

‣ 2 wri:en assignments + midterm exam

‣ Mostly math problems related to ML / NLP

‣ Final project (details on course website, will discuss later)

Coursework Plan

‣ Problem Set 1 (math review) is out now on Gradescope (due Jan 25)

‣ 3 Programming Projects (fairly substanJal implementaJon effort)

‣ Text classificaJon

‣ Named enJty recogniJon (BiLSTM-CNN-CRF)

‣ Neural chatbot (Seq2Seq with a:enJon)

‣ 2 wri:en assignments + midterm exam

‣ Mostly math problems related to ML / NLP

‣ Final project (details on course website, will discuss later)

Free Textbooks!

‣ 2 really awesome free textbooks available

‣ There will be assigned readings from both

‣ Both freely available online

Programming Projects: ComputaJon‣ Modern NLP methods require non-trivial computaJon

‣ Training neural networks with many parameters can take a long Jme (it is a very good idea to start working on the assignments early!)

‣ You probably want to use a GPU

‣ Google Colab: free GPUs (some limitaJons)

‣ The programming projects are designed with Colab in mind

What’s the goal of NLP?

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

‣ Example: dialogue systems

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

‣ Example: dialogue systems

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

‣ Example: dialogue systems

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

Who is its CEO?

‣ Example: dialogue systems

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

Who is its CEO?

‣ Example: dialogue systems

Tim Cook

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

recognize marketCapis the target value

Who is its CEO?

‣ Example: dialogue systems

Tim Cook

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

recognize marketCapis the target value

recognize predicate

Who is its CEO?

‣ Example: dialogue systems

Tim Cook

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

recognize marketCapis the target value

recognize predicate

do computaJon

Who is its CEO?

‣ Example: dialogue systems

Tim Cook

What’s the goal of NLP?‣ Be able to solve problems that require deep understanding of text

Siri, what’s the most valuable American

company?

Apple

recognize marketCapis the target value

recognize predicate

do computaJon

Who is its CEO?

‣ Example: dialogue systems

resolve references

Tim Cook

AutomaJc SummarizaJon

AutomaJc SummarizaJon

AutomaJc SummarizaJon

One of New America’s writers posted a statement criJcal of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

AutomaJc SummarizaJon

One of New America’s writers posted a statement criJcal of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

compress text

AutomaJc SummarizaJon

One of New America’s writers posted a statement criJcal of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

provide missing context

compress text

AutomaJc SummarizaJon

One of New America’s writers posted a statement criJcal of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

provide missing context

paraphrase to provide clarity

compress text

Machine TranslaJon

People’s Daily, August 30, 2017

Machine TranslaJon

People’s Daily, August 30, 2017

Machine TranslaJon

Trump Pope family watch a hundred years a year in the White House balcony

People’s Daily, August 30, 2017

Machine TranslaJon

Trump Pope family watch a hundred years a year in the White House balcony

People’s Daily, August 30, 2017

NLP Analysis Pipeline

NLP Analysis PipelineText

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Text AnalysisText

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Text AnalysisText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Answer quesJons

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Answer quesJons

IdenJfy senJment

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Answer quesJons

IdenJfy senJment

Translate

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Answer quesJons

IdenJfy senJment

‣ NLP is about building these pieces!Translate

Text Analysis Applica.onsText Annota.ons

NLP Analysis Pipeline

SyntacJc parses

Coreference resoluJon

EnJty disambiguaJon

Discourse analysis

Summarize

Extract informaJon

Answer quesJons

IdenJfy senJment

‣ NLP is about building these pieces!Translate

Text Analysis Applica.onsText Annota.ons

‣ All of these components are modeled with staJsJcal approaches trained with machine learning

How do we represent language?Text

How do we represent language?LabelsText

How do we represent language?LabelsText

the movie was good +

How do we represent language?LabelsText

the movie was good +Beyoncé had one of the best videos of all 6me subjec.ve

How do we represent language?Labels

Sequences/tags

Text

the movie was good +Beyoncé had one of the best videos of all 6me subjec.ve

Tom Cruise stars in the new Mission Impossible filmPERSON MOVIE

How do we represent language?Labels

Sequences/tags

Trees

Text

the movie was good +Beyoncé had one of the best videos of all 6me subjec.ve

Tom Cruise stars in the new Mission Impossible filmPERSON MOVIE

I eat cake with icing

PPNPS

NPVP

VBZ NN

How do we represent language?Labels

Sequences/tags

Trees

Text

the movie was good +Beyoncé had one of the best videos of all 6me subjec.ve

Tom Cruise stars in the new Mission Impossible filmPERSON MOVIE

I eat cake with icing

PPNPS

NPVP

VBZ NNflights to Miami

λx. flight(x) ∧ dest(x)=Miami

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

Extract syntacJc features

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

Extract syntacJc features

Tree-structured neural networks

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

Tree transducers (for machine translaJon)

Extract syntacJc features

Tree-structured neural networks

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

Tree transducers (for machine translaJon)

Extract syntacJc features

Tree-structured neural networks

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

Applica.ons

Tree transducers (for machine translaJon)

Extract syntacJc features

Tree-structured neural networks

end-to-end models …

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

‣ Main quesJon: What representaJons do we need for language? What do we want to know about it?

Applica.ons

Tree transducers (for machine translaJon)

Extract syntacJc features

Tree-structured neural networks

end-to-end models …

How do we use these representaJons?

LabelsSequencesTrees

Text AnalysisText

‣ Main quesJon: What representaJons do we need for language? What do we want to know about it?

‣ Boils down to: what ambiguiJes do we need to resolve?

Applica.ons

Tree transducers (for machine translaJon)

Extract syntacJc features

Tree-structured neural networks

end-to-end models …

Why is language hard? (and how can we handle that?)

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they advocated

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they advocated

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared

they advocated

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared

they advocated

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared

they advocated

‣ This is so complicated that it’s an AI challenge problem! (AI-complete)

Language is Ambiguous!‣ Hector Levesque (2011): “Winograd schema challenge” (named aner Terry

Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared

they advocated

‣ This is so complicated that it’s an AI challenge problem! (AI-complete)

‣ ReferenJal/semanJc ambiguity

Language is Ambiguous!

slide credit: Dan Klein

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk‣ Iraqi Head Seeks Arms

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk‣ Iraqi Head Seeks Arms

‣ Stolen PainJng Found by Tree

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk‣ Iraqi Head Seeks Arms

‣ Stolen PainJng Found by Tree‣ Kids Make NutriJous Snacks

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk‣ Iraqi Head Seeks Arms

‣ Stolen PainJng Found by Tree‣ Kids Make NutriJous Snacks‣ Local HS Dropouts Cut in Half

Language is Ambiguous!‣ Ambiguous News Headlines:

slide credit: Dan Klein

‣ SyntacJc/semanJc ambiguity: parsing needed to resolve these, but need context to figure out which parse is correct

‣ Teacher Strikes Idle Kids‣ Hospitals Sued by 7 Foot Doctors‣ Ban on Nude Dancing on Governor’s Desk‣ Iraqi Head Seeks Arms

‣ Stolen PainJng Found by Tree‣ Kids Make NutriJous Snacks‣ Local HS Dropouts Cut in Half

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

il fait vraiment beau

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really nice

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJful

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outside

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outsideHe makes truly beauJful

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outsideHe makes truly beauJfulHe makes truly boyfriend

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outsideHe makes truly beauJful

It fact actually handsomeHe makes truly boyfriend

Language is Really Ambiguous!‣ There aren’t just one or two possibiliJes which are resolved pragmaJcally

‣ Combinatorially many possibiliJes, many you won’t even register as ambiguiJes, but systems sJll have to resolve them

It is really nice out

il fait vraiment beau It’s really niceThe weather is beauJfulIt is really beauJful outsideHe makes truly beauJful

It fact actually handsomeHe makes truly boyfriend

‣ Lots of data!

slide credit: Dan Klein

What do we need to understand language?

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

DOJ greenlights Disney - Fox merger

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

DOJ greenlights Disney - Fox merger

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

DOJ greenlights Disney - Fox merger

Department of Jus6ce

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

DOJ greenlights Disney - Fox merger

metaphor; “approves”

Department of Jus6ce

What do we need to understand language?

‣ World knowledge: have access to informaJon beyond the training data

DOJ greenlights Disney - Fox merger

metaphor; “approves”

Department of Jus6ce

‣ What is a green light? How do we understand what “green lighJng” does?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

Golland et al. (2010)

What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

McMahan and Stone (2015)Golland et al. (2010)

What do we need to understand language?

‣ LinguisJc structure

Centering Theory Grosz et al. (1995)

What do we need to understand language?

‣ LinguisJc structure

‣ …but computers probably won’t understand language the same way humans do

Centering Theory Grosz et al. (1995)

What do we need to understand language?

‣ LinguisJc structure

‣ …but computers probably won’t understand language the same way humans do

‣ However, linguisJcs tells us what phenomena we need to be able to deal with and gives us hints about how language works

Centering Theory Grosz et al. (1995)

What do we need to understand language?

‣ LinguisJc structure

‣ …but computers probably won’t understand language the same way humans do

‣ However, linguisJcs tells us what phenomena we need to be able to deal with and gives us hints about how language works

Centering Theory Grosz et al. (1995)

What do we need to understand language?

What techniques do we use? (to combine data, knowledge, linguisJcs, etc.)

A brief history of (modern) NLP

1980 1990 2000 2010 2020

A brief history of (modern) NLP

1980 1990 2000 2010 2020

“AI winter” rule-based, expert systems

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Sup: SVMs, CRFs, NER, SenJment

Unsup: topic models, grammar inducJon

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Sup: SVMs, CRFs, NER, SenJment

Unsup: topic models, grammar inducJon

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Sup: SVMs, CRFs, NER, SenJment

Semi-sup, structured predicJon

Unsup: topic models, grammar inducJon

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Sup: SVMs, CRFs, NER, SenJment

Semi-sup, structured predicJon

Neural

Unsup: topic models, grammar inducJon

Collins vs. Charniak parsers

A brief history of (modern) NLP

1980 1990 2000 2010 2020

earliest stat MT work at IBM

“AI winter” rule-based, expert systems

Penn treebank

NP VPS

Ratnaparkhi tagger

NNP VBZ

Sup: SVMs, CRFs, NER, SenJment

Semi-sup, structured predicJon

Neural

Structured PredicJon

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

Structured PredicJon

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

unsupervised learning

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

annotaJon (two hours!)

unsupervised learning

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

annotaJon (two hours!)

unsupervised learning

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

be:er system!

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Structured PredicJon

‣ Supervised techniques work well on very li:le data

annotaJon (two hours!)

unsupervised learning

‣ Even neural nets can do pre:y well!

“Learning a Part-of-Speech Tagger from Two Hours of AnnotaJon” Garre:e and Baldridge (2013)

be:er system!

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label

Bahdanau et al. (2014)DeNero et al. (2008)

Less Manual Structure?

Does manual structure have a place?

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

‣ Coreference: rule-based systems are sJll about as good as deep learning out-of-domain

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

Moosavi and Strube (2017)

‣ Coreference: rule-based systems are sJll about as good as deep learning out-of-domain

Wikipedia

Newswire

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

Moosavi and Strube (2017)

‣ Coreference: rule-based systems are sJll about as good as deep learning out-of-domain

‣ LORELEI: transiJon point below which phrase-based systems are be:er

Wikipedia

Newswire

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

Moosavi and Strube (2017)

‣ Coreference: rule-based systems are sJll about as good as deep learning out-of-domain

‣ LORELEI: transiJon point below which phrase-based systems are be:er

‣ Why is this? InducJve bias!

Wikipedia

Newswire

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!

Moosavi and Strube (2017)

‣ Coreference: rule-based systems are sJll about as good as deep learning out-of-domain

‣ LORELEI: transiJon point below which phrase-based systems are be:er

‣ Why is this? InducJve bias!

‣ Can mulJ-task learning help?

Wikipedia

Newswire

Trump Pope family watch a hundred years a year in the White House balcony

Does manual structure have a place?

Trump Pope family watch a hundred years a year in the White House balcony

‣ Maybe manual structure would help…

Does manual structure have a place?

Where are we?

Where are we?

‣ NLP consists of: analyzing and building representaJons for text, solving problems involving text

Where are we?

‣ NLP consists of: analyzing and building representaJons for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguisJcs to solve

Where are we?

‣ NLP consists of: analyzing and building representaJons for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguisJcs to solve

‣ Knowing which techniques use requires understanding dataset size, problem complexity, and a lot of tricks!

Where are we?

‣ NLP consists of: analyzing and building representaJons for text, solving problems involving text

‣ These problems are hard because language is ambiguous, requires drawing on data, knowledge, and linguisJcs to solve

‣ Knowing which techniques use requires understanding dataset size, problem complexity, and a lot of tricks!

‣ NLP encompasses all of these things

NLP vs. ComputaJonal LinguisJcs

‣ NLP: build systems that deal with language data

‣ CL: use computaJonal tools to study language

Hamilton et al. (2016)

NLP vs. ComputaJonal LinguisJcs

‣ NLP: build systems that deal with language data

‣ CL: use computaJonal tools to study language

Hamilton et al. (2016)

NLP vs. ComputaJonal LinguisJcs

‣ ComputaJonal tools for other purposes: literary theory, poliJcal science…

Bamman, O’Connor, Smith (2013)

NLP vs. ComputaJonal LinguisJcs

‣ ComputaJonal tools for other purposes: literary theory, poliJcal science…

Bamman, O’Connor, Smith (2013)

Course Goals

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Understand how to look at language data and approach linguisJc phenomena

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Cover modern NLP problems encountered in the literature: what are the acJve research topics in 2021?

‣ Understand how to look at language data and approach linguisJc phenomena

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Make you a “producer” rather than a “consumer” of NLP tools

‣ Cover modern NLP problems encountered in the literature: what are the acJve research topics in 2021?

‣ Understand how to look at language data and approach linguisJc phenomena

Course Goals

‣ Cover fundamental machine learning techniques used in NLP

‣ Make you a “producer” rather than a “consumer” of NLP tools

‣ Cover modern NLP problems encountered in the literature: what are the acJve research topics in 2021?

‣ The three assignments should teach you what you need to know to understand nearly any system in the literature

‣ Understand how to look at language data and approach linguisJc phenomena

Assignments‣ 3 Homework Assignments ‣ ImplementaJon-oriented ‣ ~2 weeks per assignment, 3 “slip days” for automaJc extensions

Assignments‣ 3 Homework Assignments ‣ ImplementaJon-oriented ‣ ~2 weeks per assignment, 3 “slip days” for automaJc extensions

These projects require understanding of the concepts, ability to write performant code, and ability to think about how to debug complex systems. They are challenging, so start early!

Final Project

‣ Final project (20%) ‣ Groups of 3-4 preferred, 1 is possible. ‣ Good idea to talk to run your project idea by me in office hours or email. ‣ 4 page report + final project presentaJon.

Gather.town Hangout

Recommended