Natural Language Processing Yoav Goldberg Computer Science Department Presented in Academic Writing in English course Please try and make your own presentations for dummies as well

Page 1

Natural Language Processing

Yoav Goldberg

Computer Science Department

Presented in Academic Writing in English course

Please try and make your own presentations for dummies as well

Page 2

What is a Natural Language?

• Natural languages are the languages of humans (such as Hebrew, English, Arabic, Hindi, Latin, ...)

• These can be either written or spoken

Page 7

What is Natural Language Processing?

• A subfield of Computer Science

• 20-30 years ago:

“NLP is about making the computer understand natural language”

But today we know that:

• Language is HARD

• Computers are STUPID

The computer will never understand language

Page 9

Language is Hard

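הרכבת (one spelling, several readings)

הרכבת המהירה לחיפה (“the fast train to Haifa”)

הרכבת הממשלה (“the forming of the government”)

הרכבת את הפאזל (“you assembled the puzzle”)

הרכבת על הסוס? (“did you give a ride on the horse?”)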

Page 10

Language is Hard

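כפיות (written without vowel marks, one reading is kapiyot, “teaspoons”; another is kfiyut, as in kfiyut tova, “ingratitude”)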

Page 12

Language is Hard

• I play the Bass

• Some people like Bass fishing

• I hate banks

• River banks are fun places

Page 13

Some of the next ones might be hard for humans too!

Page 17

Language is Hard

• Thin people eat candy

• Fat people eat candy

• Fat people eat steaks

• Fat people eat accumulates

Page 18

Language is Hard

• Flying planes are dangerous

• Flying planes is dangerous

Page 21

Language is Hard

• I saw a man on the hill with a telescope

Who has the telescope?

Who is on the hill?

(and this takes for granted that the sentence is not about a very cruel way of killing someone)

Page 22

OK, so I hope I convinced you that language is hard

And these examples didn’t even touch the subject of what understanding is all about!

Page 24

So computers will never understand language

How will I ever finish my thesis??

Page 25

Fortunately, we can go a long way by cheating

Page 26

So what is Natural Language Processing today?

• Natural language processing is about making computer programs that can do seemingly intelligent things with Natural Language input

• Or, in other words, finding devious ways of tricking people into thinking the computer can understand language to some extent

Page 27

Lies, Damn Lies, and Statistics

• One of our main cheating tools is Statistics

• Let me demonstrate

Page 28

For example:

• Humans know that: “I have a spelling checker”

• Makes far more sense than: “Eye halve a spelling chequer”

Can computers do that?

Page 29

Example (cont.)

• “Making sense” is hard to define, but we can cheat by changing the question:

Which of the following is More Probable?

“I have a spelling checker”

“Eye halve a spelling chequer”

Page 31

Example (cont.)

• This is still hard, but we can cheat yet again by asking several easier questions

What’s the probability of seeing:

halve after Eye ?

a after halve ?

spelling after a ?

chequer after spelling ?

have after I ?

a after have ?

spelling after a ?

checker after spelling ?

(We are assuming that every word depends only on the word preceding it. This is of course wrong.)
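
The assumption in the parenthetical above can be written down as a rough sketch. The symbols w_1 ... w_n (the words of the sentence) are notation introduced here, not taken from the slides. As in the questions listed above, the probability of the first word on its own is left out:

```latex
% Bigram approximation: each word depends only on the word before it.
P(w_1\, w_2 \dots w_n) \;\approx\; \prod_{i=2}^{n} P(w_i \mid w_{i-1})

% Instance for the first example sentence:
P(\text{Eye halve a spelling chequer}) \;\approx\;
  P(\text{halve} \mid \text{Eye}) \cdot
  P(\text{a} \mid \text{halve}) \cdot
  P(\text{spelling} \mid \text{a}) \cdot
  P(\text{chequer} \mid \text{spelling})
```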

Page 35

Example (cont.)

Seeing halve after Eye:

P(halve | Eye) = count(Eye halve) / count(Eye) = 14,600 / 301,000,000 = 4.85e-5

In the same manner:

P(a | halve) = 0.0033    P(have | I) = 0.19

P(spelling | a) = 1.5e-4    P(a | have) = 0.45

P(chequer | spelling) = 2.55e-4    P(checker | spelling) = 0.012
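
The count-based estimate above can be sketched in a few lines of Python. This is only an illustration: the function name `bigram_probability` and the tiny `toy_corpus` are made up here, and the slides' counts clearly come from a far larger text collection.

```python
from collections import Counter


def bigram_probability(tokens, prev_word, word):
    """Estimate P(word | prev_word) as count(prev_word word) / count(prev_word)."""
    unigram_counts = Counter(tokens)
    # Pair every token with its successor to count adjacent word pairs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    if unigram_counts[prev_word] == 0:
        return 0.0  # prev_word never seen: no evidence at all
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]


# Hypothetical toy corpus, for illustration only.
toy_corpus = "i have a spelling checker and i have a cat".split()

p_have_given_i = bigram_probability(toy_corpus, "i", "have")
p_spelling_given_a = bigram_probability(toy_corpus, "a", "spelling")
print(p_have_given_i, p_spelling_given_a)

# The same division with the counts quoted on the slide.
p_halve_given_eye = 14_600 / 301_000_000
print(f"{p_halve_given_eye:.2e}")
```

A real system would also need to handle word pairs that never occur in the corpus, which this sketch simply scores as zero.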

Page 38

Example (cont.)

• Combining the probabilities, we can estimate:

P(“Eye halve a spelling chequer”) ≈ 6.12e-15

P(“I have a spelling checker”) ≈ 1.53e-7

Yep, “I have a spelling checker” makes far more sense.
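
As a quick check of the combination step, the sketch below multiplies the per-word probabilities quoted earlier in the example. The variable and function names are invented for illustration. Because the inputs are the rounded values from the slides, the second product comes out near 1.54e-7 rather than exactly the 1.53e-7 shown, presumably because the slide used unrounded values.

```python
import math

# Per-word conditional probabilities as quoted in the example.
eye_factors = [4.85e-5, 0.0033, 1.5e-4, 2.55e-4]  # Eye halve a spelling chequer
i_factors = [0.19, 0.45, 1.5e-4, 0.012]           # I have a spelling checker


def sentence_probability(factors):
    """Multiply the bigram probabilities (the 'previous word only' assumption)."""
    return math.prod(factors)


p_eye = sentence_probability(eye_factors)
p_i = sentence_probability(i_factors)

print(f"P(Eye halve a spelling chequer) ~ {p_eye:.2e}")
print(f"P(I have a spelling checker)    ~ {p_i:.2e}")
print("More probable:", "I have ..." if p_i > p_eye else "Eye halve ...")
```

For longer sentences these products underflow quickly, so implementations commonly sum log-probabilities instead of multiplying.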

Page 39

What else can we do?

• Tell if a certain article is from the Washington Post or the New York Times

• Find the most informative sentence in a paragraph

• Categorize texts into subjects (e.g. sports, economics, literature, religion)

• Tell that 2 news items are about the same event

• Answer factual questions (When did Beethoven die?)

• Divide sentences into meaningful units

[Pierre Vinken], [61 years old], [will join] [the board of directors] [next Sunday]

And much much more..

Page 40

And – what I’m interested in:

Boundary disambiguation of coordinated conjunctions of NPs

Page 41

And – what I’m interested in

• I work on Ands.

• More specifically, I’m trying to figure out the boundaries of the things joined by Ands.

I ate green apples and juicy bananas for lunch

Page 42

And – what I’m interested in

• Which of the following makes the most sense?

I ate green (apples and juicy bananas) for lunch

I ate (green apples and juicy bananas) for lunch

I ate green (apples and juicy) bananas for lunch

I (ate green apples and juicy) bananas for lunch

Page 43

And?

• This is very easy for people

• This is very hard for computers

My main intuitions:

People join similar things

When they do so, they tend to use similar structure

Switching the order of the joined things is usually allowed
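
To make the boundary question concrete, here is a small sketch that lists every possible pair of spans around "and" and applies the "switching" intuition to a chosen pair. It is not the author's method. The function names are hypothetical, and deciding which swapped sentence still reads well is left to a human reader.

```python
def conjunct_candidates(tokens, conj="and"):
    """List every (start, end) such that tokens[start:end] brackets the conjunction."""
    k = tokens.index(conj)
    return [(start, end)
            for start in range(k)                        # left conjunct: tokens[start:k]
            for end in range(k + 2, len(tokens) + 1)]    # right conjunct: tokens[k+1:end]


def bracket(tokens, start, end):
    """Render a candidate the way the slide does, with parentheses around the span."""
    marked = list(tokens)
    marked[start] = "(" + marked[start]
    marked[end - 1] = marked[end - 1] + ")"
    return " ".join(marked)


def swap(tokens, start, end, conj="and"):
    """Exchange the two joined things; a good candidate should still read well."""
    k = tokens.index(conj)
    swapped = tokens[:start] + tokens[k + 1:end] + [conj] + tokens[start:k] + tokens[end:]
    return " ".join(swapped)


sentence = "I ate green apples and juicy bananas for lunch".split()
candidates = conjunct_candidates(sentence)
print(len(candidates), "candidate bracketings")
print(bracket(sentence, 2, 7), "->", swap(sentence, 2, 7))
print(bracket(sentence, 3, 6), "->", swap(sentence, 3, 6))
```

Swapping the sensible candidate gives another natural sentence, while swapping a poor one, such as "(apples and juicy)", gives word salad.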

Page 44

Thanks

Questions?