CS 4705 Natural Language Processing Fall 2017 Professor Kathy McKeown

Natural Language Processing - Columbia University · kathy/NLP/2017/ClassSlides/...


Class Size and In-class Discussion
• Piazza to interact in class. For in-class and video students
• Regular intervals: pause to answer questions
• I will also ask questions of you
• Answers will be shown on the slide
• Making your name shown to fellow students
• On the instructor systems, we will see who posed a question -> assign credit for interaction
• No penalty for wrong answer except for inappropriate content
• May use an interesting answer to further discussion
• In-class participation will count towards your grade


Download Piazza App on your Phone


Class Policy on Electronics
• Cell phone in class OK and needed for Piazza interaction
• Keep laptops closed or don’t bring to class


Today
• What is NLP?
• Class Logistics
• What will we cover
• Helpful background
• Class homework and exams


What will we study in this class
• How can machines understand and generate language?
  • Examples drawn from naturally occurring corpora
  • Theories about language
  • Algorithms
  • Statistical methods
  • Applications


Knowledge Needed
• Morphology: word formation
• Syntax: word order
• Semantics: word meaning and composition
• Pragmatics: Influence of context and situation

Goal: discover what the speaker meant


Morphology
• Important for search, machine translation, summarization
• Union Activities in New York
  • Singular/plural
  • Union/unions
  • Activity/activities
• Other languages are morphologically rich
  • Arabic: definite embedded in the word (clitics): the union (Al+) vs. a union, unions
  • German: case part of the word (subj vs obj)
• Are there examples in your language?
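
The singular/plural pairs on this slide suggest why morphology matters for search: a query for "unions" should also match "union". A minimal sketch in Python follows, assuming two hand-written suffix rules; the function names `singularize` and `normalize` are hypothetical, and the rules cover only the slide's examples, not English morphology in general.

```python
def singularize(word):
    """Naive plural stripping with two hand-written rules (illustrative only)."""
    lower = word.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"   # activities -> activity
    if lower.endswith("s") and not lower.endswith("ss"):
        return lower[:-1]         # unions -> union
    return lower                  # union, class, york are left unchanged


def normalize(query):
    """Map every whitespace-separated token of a query to its singular form."""
    return [singularize(token) for token in query.split()]


print(normalize("Union Activities in New York"))
```

With this normalization, "Union Activities" and "union activity" reduce to the same tokens, so a simple index lookup would treat them as the same query. Irregular plurals and morphologically rich languages need a real analyzer.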


Responses


News article titles
• Stud tires out
• Eye drops off shelf
• Teacher strikes idle kids
• Drunk gets nine months in violin case
• Enraged cow injures farmer with ax
• Ban on nude dancing on Governor’s desk
• Hospitals are sued by seven foot doctors
• Red tape holds up new bridges
• Government head seeks arms
• Patient at death’s door – doctors pull him through
• In America a woman has a baby every 15 minutes


Syntax
• Part of speech tagging: is a word a noun, verb, adverb, adjective, etc?
• Parsing
  • Identifying constituents
    • NP: Kathy McKeown, a man in the park
    • VP: was looking up, had risen
  • Identifying subjects and objects
    • Bill hit John vs John hit Bill
• Modification
  • John saw the man in the park with a telescope


Part of Speech tagging
• Stud tires out
  • Tires: a noun or a verb?
• Eye drops off shelf
  • Drops: a noun or a verb?
• Teacher strikes idle kids
  • Strikes: a noun or a verb?
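
The ambiguity in these headlines can be made concrete by listing every tag sequence a dictionary lookup would allow. The sketch below assumes a hypothetical toy lexicon (`LEXICON`) written for this one headline; the tag names and the entries are illustrative assumptions, not a real tagset or a method from the course.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the tags it could take.
LEXICON = {
    "teacher": ["NOUN"],
    "strikes": ["NOUN", "VERB"],
    "idle": ["ADJ", "VERB"],
    "kids": ["NOUN"],
}


def tag_sequences(sentence):
    """Enumerate every tag assignment the lexicon permits for a sentence."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


for sequence in tag_sequences("Teacher strikes idle kids"):
    print(" ".join(f"{word}/{tag}" for word, tag in sequence))
```

Two ambiguous words already give four candidate sequences. A tagger has to pick one of them using context, which is what the sequence labeling methods covered later are for.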


Responses


Constituent Structure and Modification
• The problem of PP attachment
  Enraged cow injures farmer with ax
• [Enraged cow] injures farmer [with ax]
• [Enraged cow] injures farmer [with ax]


Representing modification with brackets
• [Enraged cow] [injures [farmer [with ax]]]
• [Enraged cow] [injures [farmer] [with ax]]


Constituent Structure and Modification
• The problem of PP attachment
  Ban on nude dancing on governor’s desk
• [Ban] on [nude dancing] [on governor’s desk]
• There are two possible modifications. What are they?
• Which one is correct?


Response


Constituent Structure and Modification

• The problem of PP attachment
  Ban on nude dancing on governor’s desk
• [[Ban] on [nude dancing]] [on governor’s desk]
• [Ban] on [[nude dancing] [on governor’s desk]]


Noun noun modification
• Water fountain: a fountain that supplies water
• Water ballet: a ballet that takes place in water
• Water meter: a device (called a meter) that measures water
• Water barometer: a barometer that uses water (instead of mercury) to measure air pressure
• Water glass: a glass that is meant to hold water


Noun noun modification: constituent structure
• Which constituent structure best represents the meaning of country song platinum album?

1. [country [song [platinum album]]]
2. [country [[song platinum] album]]
3. [[country song] [platinum album]]
4. [[country [song platinum]] album]
5. [[[country song] platinum] album]
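
The five options are exactly the binary bracketings of a four-word sequence. A short recursive sketch can enumerate them; the function name `bracketings` is hypothetical, and the code only lists the candidate structures without saying which one is correct.

```python
def bracketings(words):
    """Return every binary bracketing of a word sequence as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point, then combine all bracketings of the two halves.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append(f"[{left} {right}]")
    return results


for number, structure in enumerate(bracketings("country song platinum album".split()), 1):
    print(number, structure)
```

The number of candidates grows quickly with length: three nouns give 2 structures, four give 5, five give 14. This is one reason long noun compounds are hard to interpret automatically.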


Response


Noun noun modification and headlines
• Hospitals are sued by seven foot doctors

• Hospitals are sued by [[seven foot] doctors]

• Hospitals are sued by [seven [foot doctors]]


Word Meaning
• Red tape holds up new bridges
• Holds up:

1. [TRANSITIVE] to support someone or something so that they do not fall down
  • Her legs were almost too shaky to hold her up.

2. [TRANSITIVE] [OFTEN PASSIVE] to cause a delay for someone or something, or to make them late
  • Sorry I’m late, but my flight was held up.

• Government head seeks arms
• Head:
• Arms:


Pragmatics
• Discourse context
  • John went to the store. He bought bread and butter.
• Situational (real world) context
  • Day after the Charlottesville, VA riots
  • His irresponsible actions took the life of a young woman who was just beginning her adult life.
• Common sense knowledge
  • Boston called and left a message for Joe.


The Language of Genres: news, journal, social media, novel?

• Devastating, but yet amazing storm. IMBY just some branches, ton of leaves, … This surpasses any Big daddy storm, should be called Big Mama. (Computer Guy)

• Produced by a team of 26 scientists led by the University of New South Wales Climate Research Centre, the Diagnosis convincingly proves that the effects of global warming have gotten worse in the last three years. (Somerville et al 2011)

• Hurricane Sandy churned about 290 miles off the Mid-Atlantic coast Sunday night, with the National Hurricane Center reporting that the monster storm was expected to come ashore with near-hurricane-force winds and potentially "life-threatening" storm surge flooding.

• 'What is the matter?' I cried. 'A wreck! Close by!'


Machine learning framework
• Data (often labeled)
• Extraction of “features” from text data
• Prediction of output


Machine learning framework
• Data (often labeled)
• Extraction of “features” from text data
• Prediction of output
What data is available for learning?


Machine learning framework
• Data (often labeled)
• Extraction of “features” from text data
• Prediction of output
What features yield good predictions?
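
The three steps of the framework (labeled data, feature extraction, prediction) can be sketched end to end in a few lines. The example below is an assumption-laden illustration, not the course's method: the training sentences and labels in `TRAIN` are invented, the features are plain bag-of-words tokens, and the classifier is a small Naive Bayes with add-one smoothing written from scratch.

```python
import math
from collections import Counter, defaultdict

# Step 1: data. Toy labeled examples, invented for illustration only.
TRAIN = [
    ("the storm was amazing", "pos"),
    ("what a great amazing day", "pos"),
    ("the storm was devastating", "neg"),
    ("a terrible devastating flood", "neg"),
]


def features(text):
    """Step 2: feature extraction. Bag of lowercase whitespace tokens."""
    return text.lower().split()


def train(examples):
    """Count labels and per-label word occurrences."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in features(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Step 3: prediction. Pick the label with the highest log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denominator = sum(word_counts[label].values()) + len(vocab)
        for word in features(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("an amazing day", model))
print(predict("a devastating storm", model))
```

Changing `features` (for example, to word pairs instead of single words) changes what the model can learn without touching the training or prediction code, which is the point of the slide's question about which features yield good predictions.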


Machine Learning Methods
• Supervised
  • Support vector machine, Naïve Bayes, Logistic regression
  • Sequence labeling: Hidden Markov Modeling (HMM), Conditional Random Fields (CRF)
  • Neural networks
• Unsupervised
  • Clustering
• Semi-supervised
  • Boot-strapping, self-training, co-training
  • Distant learning


Where does the data come from?
• Manually labeled
• Naturally occurring
• A noisy, but plentiful substitute


Core NLP
• Morphological analysis
• Part of speech tagging
• Parsing


Applications
• Searching very large text and speech corpora
  • E.g., the web
• Question answering over the web
• Translating between one language and another: e.g., Chinese and English
• Summarizing text: e.g., your email, the news, reviews
• Sentiment analysis
• Generating texts
• Dialog systems: Amtrak’s Julie


Logistics


Instructor
• Kathy McKeown
  • Office: 722 CEPSR
  • NLP Group
  • 35 years at Columbia, Founding Director of the Data Science Institute (just stepped down)
  • Research
    • Summarization
    • Question Answering
    • Language Generation
    • Sentiment analysis
    • Multilingual applications


TAs
• Elsbeth Turcan (head TA)
• Dheeraj Kalmekolan
• Apoorv Kulshreshtha
• Robert Kwiatkowski
• Fei-Tzin Lee
• Bhavana Ramachandra
• Samarth Tripathi


Background
• Programming. We will use Python
• In addition, at least one:
  • Artificial Intelligence
  • Machine learning
  • Programming Languages and Translators
  • Statistics


Syllabus
• Available at: http://www.cs.columbia.edu/~kathy/NLP/2017


Textbooks
• Speech and Language Processing, 2nd Edition, [...], as well as from Amazon and other online providers. It is also on reserve in the Science Library.

• Neural Network Methods for Natural Language Processing by Yoav Goldberg. It is available online, but you can also purchase a hard copy from the publisher.


Assignments
• 4 homework assignments: 3 programming, 1 written
  • We will be using Google Cloud
  • HW0:
    • Worth 2 points, but if you do not/cannot do it, this is not the class for you.
    • Sets up your Google Cloud account properly
  • Four free late days
  • After that 10% off for each day late
• Midterm and final
• Evaluation: 50% homework + 40% exams + 10% class participation (via Piazza)
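
The grading arithmetic on this slide can be written out as a small sketch. Two points are assumptions, since the slide does not spell them out: that the 10% late deduction is linear per charged day (not compounding), and that the four free late days form a single pool passed in as `free_days_left`. The function names are hypothetical.

```python
def late_multiplier(days_late, free_days_left=4):
    """Fraction of credit kept, assuming a linear 10% deduction per charged day."""
    charged_days = max(0, days_late - free_days_left)
    return max(0.0, 1.0 - 0.10 * charged_days)


def course_grade(homework, exams, participation):
    """Weighted total on a 0-100 scale: 50% homework, 40% exams, 10% participation."""
    return 0.50 * homework + 0.40 * exams + 0.10 * participation


print(late_multiplier(days_late=2))   # within the free late days
print(late_multiplier(days_late=6))   # two days beyond the free late days
print(course_grade(homework=90, exams=80, participation=100))
```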


Academic Integrity
• Copying or paraphrasing someone's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is forbidden, and will result in an automatic grade of 0 for the entire assignment or exam in which the copying or paraphrasing was done. Your grade should reflect your own work. If you are going to have trouble completing an assignment, talk to the instructor or TA in advance of the due date please. Everyone: Read/write protect your homework files at all times.


For Next Class
• Read Chapters 1-2 of J&M, Chapter 1 of NN
• Questions? (Use Piazza)


Questions
