Natural Language Processing with Deep Learning CS224N/Ling284 Christopher Manning and Richard Socher Lecture 6: Dependency Parsing


Page 1:

Natural Language Processing with Deep Learning

CS224N/Ling284

Christopher Manning and Richard Socher

Lecture 6: Dependency Parsing

Page 2:

Lecture Plan

1. Syntactic Structure: Consistency and Dependency
2. Dependency Grammar
3. Research highlight
4. Transition-based dependency parsing
5. Neural dependency parsing

Reminders/comments:
• Assignment 1 due tonight :)
• Assignment 2 out today :(
• Final project discussion: come meet with us
• Sorry that we've been kind of overloaded for office hours

Page 3:

1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.

the cat · a dog

large · in a crate · barking · on the table · cuddly · by the door

Page 4:

1. Two views of linguistic structure: Constituency = phrase structure grammar = context-free grammars (CFGs)

Phrase structure organizes words into nested constituents.

the cat · a dog

large · in a crate · barking · on the table · cuddly · by the door

large · barking · talk to · look for

Page 5:

Two views of linguistic structure: Dependency structure

• Dependency structure shows which words depend on (modify or are arguments of) which other words.

Look for the large barking dog by the door in a crate

Page 6:

Two views of linguistic structure: Dependency structure

• Dependency structure shows which words depend on (modify or are arguments of) which other words.

Look for the large barking dog by the door in a crate

Page 7:

Ambiguity: PP attachments

Scientists study whales from space

Page 8:

Ambiguity: PP attachments

Scientists study whales from space

[Figure: the two attachments of "from space", modifying either "whales" or "study"]

Page 9:

Attachment ambiguities

• A key parsing decision is how we 'attach' various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations, etc.

Page 10:

Attachment ambiguities

• A key parsing decision is how we 'attach' various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations, etc.

• Catalan numbers: Cn = (2n)! / [(n+1)! n!]
• An exponentially growing series, which arises in many tree-like contexts:
  • E.g., the number of possible triangulations of a polygon with n+2 sides
  • Turns up in triangulation of probabilistic graphical models…
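As a quick sanity check on the formula (a minimal sketch, not from the slides), the first few Catalan numbers can be computed directly:

```python
from math import factorial

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n+1)! * n!)
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))

# First few values: 1, 1, 2, 5, 14, 42, 132, ...
print([catalan(n) for n in range(7)])
```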

Page 11:

The rise of annotated data: Universal Dependencies treebanks

[Universal Dependencies: http://universaldependencies.org/ ; cf. Marcus et al. 1993, The Penn Treebank, Computational Linguistics]

Page 12:

The rise of annotated data

Starting off, building a treebank seems a lot slower and less useful than building a grammar

But a treebank gives us many things
• Reusability of the labor
  • Many parsers, part-of-speech taggers, etc. can be built on it
  • Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate systems

Page 13:

2. Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

[Figure: dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]

Page 14:

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrows are commonly typed with the name of grammatical relations (subject, prepositional object, apposition, etc.)

[Figure: the same dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas", now with typed relations such as nsubj:pass, aux, obl, case, nmod, conj, cc, flat, appos]

Page 15:

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations ("arrows") called dependencies

The arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate)

Usually, dependencies form a tree (connected, acyclic, single-head)

[Figure: the same labeled dependency tree for "Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas"]

Page 16:

Pāṇini's grammar (c. 5th century BCE)

[Image: http://wellcomeimages.org/indexplus/image/L0032691.html, CC BY 4.0, File: Birch bark MS from Kashmir of the Rupavatra Wellcome L0032691.jpg]

Page 17:

Dependency Grammar/Parsing History

• The idea of dependency structure goes back a long way
  • To Pāṇini's grammar (c. 5th century BCE)
  • Basic approach of 1st millennium Arabic grammarians
• Constituency/context-free grammar is a new-fangled invention
  • 20th-century invention (R. S. Wells, 1947)
• Modern dependency work is often linked to the work of L. Tesnière (1959)
  • Was the dominant approach in the "East" (Russia, China, …)
  • Good for freer word order languages
• Among the earliest kinds of parsers in NLP, even in the US:
  • David Hays, one of the founders of U.S. computational linguistics, built an early (first?) dependency parser (Hays 1962)

Page 18:

Dependency Grammar and Dependency Structure

• Some people draw the arrows one way; some the other way!
  • Tesnière had them point from head to dependent…
• Usually add a fake ROOT so every word is a dependent of precisely 1 other node

ROOT Discussion of the outstanding issues was completed.

Page 19:

Dependency Conditioning Preferences

What are the sources of information for dependency parsing?
1. Bilexical affinities: [discussion → issues] is plausible
2. Dependency distance: mostly with nearby words
3. Intervening material: dependencies rarely span intervening verbs or punctuation
4. Valency of heads: how many dependents on which side are usual for a head?

ROOT Discussion of the outstanding issues was completed.

Page 20:

Dependency Parsing

• A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
• Usually some constraints:
  • Only one word is a dependent of ROOT
  • Don't want cycles A → B, B → A
• This makes the dependencies a tree
• Final issue is whether arrows can cross (be non-projective) or not

ROOT I 'll give a talk on bootstrapping tomorrow
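These constraints (a single dependent of ROOT, no cycles, exactly one head per word) are easy to check mechanically. A minimal sketch (my own, not course code), assuming heads are given as a list where heads[i-1] is the head of word i and 0 denotes ROOT:

```python
def is_valid_tree(heads: list[int]) -> bool:
    """Check that a head assignment (words 1..n, head 0 = ROOT) forms a tree."""
    n = len(heads)
    # Exactly one word may be a dependent of ROOT.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Every word must reach ROOT without revisiting a node (no cycles).
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# "ROOT I 'll give a talk on bootstrapping tomorrow":
# an illustrative head list with "give" as the root dependent.
print(is_valid_tree([3, 3, 0, 5, 3, 7, 5, 3]))  # True
```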

Page 21:

Methods of Dependency Parsing

1. Dynamic programming
   Eisner (1996) gives a clever algorithm with complexity O(n³), by producing parse items with heads at the ends rather than in the middle
2. Graph algorithms
   You create a maximum (highest-scoring) spanning tree for a sentence
   McDonald et al.'s (2005) MSTParser scores dependencies independently using an ML classifier (he uses MIRA, for online learning, but it can be something else)
3. Constraint satisfaction
   Edges are eliminated that don't satisfy hard constraints. Karlsson (1990), etc.
4. "Transition-based parsing" or "deterministic dependency parsing"
   Greedy choice of attachments guided by good machine learning classifiers
   MaltParser (Nivre et al. 2008). Has proven highly effective.

Page 22:

Improving Distributional Similarity with Lessons Learned from Word Embeddings

Omer Levy, Yoav Goldberg, Ido Dagan

Presented by: Ajay Sohmshetty

Page 23:

Word Representation Methods

Count-based distributional models
● SVD (Singular Value Decomposition)
● PPMI (Positive Pointwise Mutual Information)

Neural network-based models
● SGNS (Skip-Gram with Negative Sampling)
● GloVe

Page 24:

Word Representation Methods

Count-based distributional models

● SVD (Singular Value Decomposition)

● PPMI (Positive Pointwise Mutual Information)

Neural network-based models

● SGNS (Skip-Gram Negative Sampling) / CBOW

● GloVe

Page 25:

Word Representation Methods

Conventional wisdom: Neural-network based models > Count-based models

Neural network-based models

● SGNS (Skip-Gram Negative Sampling) / CBOW

● GloVe

Count-based distributional models

● SVD (Singular Value Decomposition)

● PPMI (Positive Pointwise Mutual Information)

Page 26:

Word Representation Methods

Levy et al.: hyperparameters and system design choices are more important than the embedding algorithms themselves.

Neural network-based models

● SGNS (Skip-Gram Negative Sampling) / CBOW

● GloVe

Count-based distributional models

● SVD (Singular Value Decomposition)

● PPMI (Positive Pointwise Mutual Information)

Page 27:

Hyperparameters in Skip-Gram

● Unigram distribution smoothing exponent
● # of negative samples

→ These can be transferred over to the count-based variants.

Page 28:

Context Distribution Smoothing

Page 29:

Shifted PMI
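The figures for this and the previous slide did not survive extraction. As a hedged sketch (my own, following the definitions in Levy, Goldberg & Dagan 2015, not code from the lecture), this is how the two skip-gram hyperparameters transfer to a count-based PPMI matrix: context distribution smoothing raises context counts to a power α (typically 0.75), and shifting subtracts log k, where k is the number of negative samples.

```python
import numpy as np

def shifted_ppmi(counts: np.ndarray, alpha: float = 0.75, k: int = 5) -> np.ndarray:
    """counts[w, c] = co-occurrence count of word w with context c."""
    total = counts.sum()
    p_wc = counts / total                          # joint probabilities
    p_w = counts.sum(axis=1, keepdims=True) / total
    # Context distribution smoothing: raise context counts to alpha.
    c_smoothed = counts.sum(axis=0) ** alpha
    p_c = c_smoothed / c_smoothed.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    # Shifted PPMI: subtract log k (k = number of negative samples), clip at 0.
    sppmi = np.maximum(pmi - np.log(k), 0.0)
    return np.nan_to_num(sppmi, nan=0.0, neginf=0.0)
```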

Page 30:

All Transferable Hyperparameters

Hyperparameter                   | Explored Values       | Applicable Methods
---------------------------------|-----------------------|-------------------
Preprocessing:
  Window                         | 2, 5, 10              | All
  Dynamic Context Window         | None, with            | All
  Subsampling                    | None, dirty, clean    | All
  Deleting Rare Words            | None, with            | All
Association Metric:
  Shifted PMI                    | 1, 5, 15              | PPMI, SVD, SGNS
  Context Distribution Smoothing | 1, 0.75               | PPMI, SVD, SGNS
Postprocessing:
  Adding Context Vectors         | Only w, w + c         | SVD, SGNS, GloVe
  Eigenvalue Weighting           | 0, 0.5, 1             | SVD
  Vector Normalization           | None, row, col, both  | All

Page 31:

Results: Word Similarity Tasks · Analogy Tasks

Page 32:

Key Takeaways

- This paper challenges the conventional wisdom that neural network-based models are superior to count-based models.
- While model design is important, hyperparameters are also KEY for achieving reasonable results. Don't discount their importance!
- Challenge the status quo!

Page 33:

4. Greedy transition-based parsing [Nivre 2003]

• A simple form of greedy discriminative dependency parser
• The parser does a sequence of bottom-up actions
  • Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialized to create dependencies with head on left or right
• The parser has:
  • a stack σ, written with top to the right
    • which starts with the ROOT symbol
  • a buffer β, written with top to the left
    • which starts with the input sentence
  • a set of dependency arcs A
    • which starts off empty
  • a set of actions

Page 34:

Basic transition-based dependency parser

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift         σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r    σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r   σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: σ = [w], β = ∅
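To make the transition system concrete, here is a minimal sketch (my own, not the course's starter code) of the arc-standard system above. It is driven by a hard-coded action sequence; how the next action is chosen by a classifier comes later (MaltParser and the neural parser).

```python
ROOT = 0  # index of the fake ROOT token

def parse(words, actions):
    """Apply a sequence of arc-standard actions and return the arc set A.

    words:   list of tokens, referred to by 1-based indices (0 is ROOT)
    actions: list like ("shift",), ("left_arc", "nsubj"), ("right_arc", "obj")
    """
    stack = [ROOT]                            # sigma, top is stack[-1]
    buffer = list(range(1, len(words) + 1))   # beta, top is buffer[0]
    arcs = set()                              # A: (head, label, dependent)

    for action in actions:
        if action[0] == "shift":
            stack.append(buffer.pop(0))
        elif action[0] == "left_arc":    # second-from-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.add((stack[-1], action[1], dep))
        elif action[0] == "right_arc":   # top becomes dependent of second-from-top
            dep = stack.pop()
            arcs.add((stack[-1], action[1], dep))
    assert not buffer and stack == [ROOT], "parse did not finish cleanly"
    return arcs
```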

Page 35:

Arc-standard transition-based parser (there are other transition schemes…)
Analysis of "I ate fish"

Start:  [root]        | I ate fish
Shift:  [root] I      | ate fish
Shift:  [root] I ate  | fish

Start: σ = [ROOT], β = w1, …, wn, A = ∅
1. Shift         σ, wi|β, A  →  σ|wi, β, A
2. Left-Arc_r    σ|wi|wj, β, A  →  σ|wj, β, A ∪ {r(wj, wi)}
3. Right-Arc_r   σ|wi|wj, β, A  →  σ|wi, β, A ∪ {r(wi, wj)}
Finish: β = ∅

Page 36:

Arc-standard transition-based parser
Analysis of "I ate fish"

Left-Arc:   [root] I ate     →  [root] ate        A += nsubj(ate → I)
Shift:      [root] ate       →  [root] ate fish
Right-Arc:  [root] ate fish  →  [root] ate        A += obj(ate → fish)
Right-Arc:  [root] ate       →  [root]            A += root([root] → ate)
Finish
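Continuing the sketch introduced after Page 34 (the `parse` function; labels follow the slide), the action sequence above can be replayed as:

```python
words = ["I", "ate", "fish"]
actions = [
    ("shift",),               # [root] I        | ate fish
    ("shift",),               # [root] I ate    | fish
    ("left_arc", "nsubj"),    # A += nsubj(ate -> I)
    ("shift",),               # [root] ate fish |
    ("right_arc", "obj"),     # A += obj(ate -> fish)
    ("right_arc", "root"),    # A += root([root] -> ate); finish
]
print(parse(words, actions))
# arcs (in some order): (2, 'nsubj', 1), (2, 'obj', 3), (0, 'root', 2)
```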

Page 37:

MaltParser [Nivre and Hall 2005]

• We have left to explain how we choose the next action
  • Each action is predicted by a discriminative classifier (often SVM, can be perceptron, maxent classifier) over each legal move
  • Max of 3 untyped choices; max of |R| × 2 + 1 when typed
  • Features: top of stack word, POS; first in buffer word, POS; etc.
• There is NO search (in the simplest form)
  • But you can profitably do a beam search if you wish (slower but better)
• It provides VERY fast linear time parsing
• The model's accuracy is slightly below the best dependency parsers, but
• It provides fast, close to state of the art parsing performance

Page 38:

Feature Representation

Feature templates: usually a combination of 1–3 elements from the configuration.

Indicator features

0 0 0 1 0 0 1 0 0 0 1 0   (binary, sparse; dim ≈ 10^6 – 10^7)
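A hedged sketch of what such indicator features look like (the template names and the `config` object are mine, not MaltParser's actual templates or API): each instantiated template, e.g. "s1.word=good & b1.pos=NN", becomes one dimension of a huge sparse binary vector.

```python
def indicator_features(config) -> list[str]:
    """Instantiate a few example feature templates from a parser configuration.

    `config` is assumed to expose stack/buffer tokens with .word and .pos;
    the templates below are illustrative only.
    """
    s1 = config.stack[-1]   # top of stack
    b1 = config.buffer[0]   # front of buffer
    return [
        f"s1.word={s1.word}",
        f"s1.pos={s1.pos}",
        f"b1.word={b1.word}",
        f"b1.pos={b1.pos}",
        f"s1.word={s1.word}&b1.pos={b1.pos}",   # a pair template
    ]

# Each distinct string is indexed to one dimension of a ~10^6-10^7
# dimensional binary vector that is 1 exactly for the active features.
```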

Page 39:

Evaluation of Dependency Parsing: (labeled) dependency accuracy

ROOT She saw the video lecture
 0    1   2   3    4     5

Gold                       Parsed
1  2  She      nsubj       1  2  She      nsubj
2  0  saw      root        2  0  saw      root
3  5  the      det         3  4  the      det
4  5  video    nn          4  5  video    nsubj
5  2  lecture  obj         5  2  lecture  ccomp

Acc = # correct deps / # of deps

UAS = 4/5 = 80%    LAS = 2/5 = 40%
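A minimal sketch (my own) of the metric just defined, given gold and predicted (head, label) pairs per word:

```python
def uas_las(gold, pred):
    """gold, pred: lists of (head, label) pairs, one per word, in order."""
    assert len(gold) == len(pred)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)        # head and label
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(uas_las(gold, pred))  # (0.8, 0.4)
```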

Page 40:

Dependency paths identify semantic relations, e.g., for protein interaction

[Erkan et al. EMNLP 07, Fundel et al. 2007, etc.]

KaiC ←nsubj interacts nmod:with→ SasA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiA
KaiC ←nsubj interacts nmod:with→ SasA conj:and→ KaiB

[Figure: dependency parse of "The results demonstrated that KaiC interacts rhythmically with SasA, KaiA, and KaiB", with relations nsubj, ccomp, mark, det, advmod, nmod:with, conj:and, cc, case]

Page 41:

Projectivity

• Dependencies parallel to a CFG tree must be projective
  • There must not be any crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• But dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can't easily get the semantics of certain constructions right without these non-projective dependencies

Who did Bill buy the coffee from yesterday?
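A minimal sketch (my own) of the crossing-arcs test just described: a tree is projective iff no two arcs cross when drawn above the words.

```python
def is_projective(heads: list[int]) -> bool:
    """heads[i] is the head (0 = ROOT) of word i+1; words are 1..n."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # Two arcs cross if one starts strictly inside the other's span
            # and ends strictly outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True

# An abstract 4-word example: arcs 1-3 and 2-4 cross, so it is non-projective
# (the wh-question above has the same flavor, with a long arc back to "Who").
print(is_projective([3, 4, 0, 3]))  # False
```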

Page 42:

Handling non-projectivity

• The arc-standard algorithm we presented only builds projective dependency trees
• Possible directions to head:
  1. Just declare defeat on non-projective arcs
  2. Use a dependency formalism which only admits projective representations (a CFG doesn't represent such structures…)
  3. Use a postprocessor to a projective dependency parsing algorithm to identify and resolve non-projective links
  4. Add extra transitions that can model at least most non-projective structures (e.g., add an extra SWAP transition, cf. bubble sort)
  5. Move to a parsing mechanism that does not use or require any constraints on projectivity (e.g., the graph-based MSTParser)

Page 43:

5. Why train a neural dependency parser? Indicator Features Revisited

• Problem #1: sparse
• Problem #2: incomplete
• Problem #3: expensive computation

More than 95% of parsing time is consumed by feature computation.

Our Approach: learn a dense and compact feature representation

dense, dim ≈ 1000:   0.1  0.9  -0.2  0.3  -0.1  -0.5 …

Page 44:

A neural dependency parser [Chen and Manning 2014]

• English parsing to Stanford Dependencies:
  • Unlabeled attachment score (UAS) = head
  • Labeled attachment score (LAS) = head and label

Parser       | UAS   | LAS   | sent./s
MaltParser   | 89.8  | 87.2  | 469
MSTParser    | 91.4  | 88.1  | 10
TurboParser  | 92.3* | 89.6* | 8
C & M 2014   | 92.0  | 89.7  | 654

Page 45:

Distributed Representations

• We represent each word as a d-dimensional dense vector (i.e., word embedding)
  • Similar words are expected to have close vectors.
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
  • The smaller discrete sets also exhibit many semantic similarities.

[Figure: word vectors in which similar words (come/go; were/was/is; good) lie close together]

NNS (plural noun) should be close to NN (singular noun).
num (numerical modifier) should be close to amod (adjective modifier).

Page 46:

Extracting Tokens and then vector representations from configuration

• We extract a set of tokens based on the stack/buffer positions:

        s1    s2   b1       lc(s1)  rc(s1)  lc(s2)  rc(s2)
word:   good  has  control  ∅       ∅       He      ∅
POS:    JJ    VBZ  NN       ∅       ∅       PRP     ∅
dep.:   ∅     ∅    ∅        ∅       ∅       nsubj   ∅

• We convert them to vector embeddings and concatenate them
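A hedged sketch (my own simplification; the full Chen and Manning feature set uses more positions, and the Token type, NULL padding, and left_child/right_child helpers are assumptions) of extracting these tokens and concatenating their embeddings:

```python
from collections import namedtuple
import numpy as np

# Hypothetical token record; the real parser tracks this in its configuration.
Token = namedtuple("Token", ["word", "pos", "dep"])
NULL = Token("<null>", "<null>", "<null>")   # padding for missing positions

def extract_tokens(stack, buffer, left_child, right_child):
    """Tokens for s1, s2, b1, lc(s1), rc(s1), lc(s2), rc(s2).

    left_child/right_child are assumed helpers returning the leftmost/rightmost
    dependent attached so far (or NULL if there is none).
    """
    def peek(seq, i):
        try:
            return seq[i]
        except IndexError:
            return NULL
    s1, s2, b1 = peek(stack, -1), peek(stack, -2), peek(buffer, 0)
    return [s1, s2, b1,
            left_child(s1), right_child(s1),
            left_child(s2), right_child(s2)]

def input_vector(tokens, emb_word, emb_pos, emb_dep):
    """Look up word/POS/label embeddings and concatenate into the input x."""
    parts = ([emb_word[t.word] for t in tokens] +
             [emb_pos[t.pos] for t in tokens] +
             [emb_dep[t.dep] for t in tokens])
    return np.concatenate(parts)
```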

Page 47:

Model Architecture

Input layer x:   lookup + concat
Hidden layer h:  h = ReLU(Wx + b1)
Output layer y:  y = softmax(Uh + b2)

Softmax probabilities

Cross-entropy error will be back-propagated to the embeddings.
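A minimal numpy sketch of the forward pass described above (my own; the shapes and initialization are assumptions, not Chen and Manning's code):

```python
import numpy as np

def forward(x, W, b1, U, b2):
    """x: concatenated embedding vector; returns softmax over parser actions."""
    h = np.maximum(0.0, W @ x + b1)          # hidden layer: ReLU(Wx + b1)
    logits = U @ h + b2                      # output layer scores
    logits -= logits.max()                   # numerical stability
    p = np.exp(logits)
    return p / p.sum()                       # softmax probabilities y

# Illustrative shapes: ~1000-dim dense input, 200 hidden units, 3 actions.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
W, b1 = rng.normal(size=(200, 1000)) * 0.01, np.zeros(200)
U, b2 = rng.normal(size=(3, 200)) * 0.01, np.zeros(3)
print(forward(x, W, b1, U, b2))              # probabilities for shift / left / right
```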

Page 48:

Non-linearities between layers: Why they're needed

• For logistic regression: map to probabilities
• Here: function approximation, e.g., for regression or classification
• Without non-linearities, deep neural networks can't do anything more than a linear transform
  • Extra layers could just be compiled down into a single linear transform
• People use various non-linearities

Page 49:

Non-linearities: What's used

logistic ("sigmoid"), tanh

tanh is just a rescaled and shifted sigmoid:  tanh(z) = 2·logistic(2z) − 1

tanh is often used and often performs better for deep nets
• Its output is symmetric around 0

Page 50:

Non-linearities: What's used

hard tanh, softsign, linear rectifier (ReLU)

• hard tanh is similar to but computationally cheaper than tanh, and saturates hard.
• [Glorot and Bengio, AISTATS 2010, 2011] discuss softsign and the rectifier
• The rectified linear unit (ReLU) is now mega common: it transfers the linear signal if active

rect(z) = max(z, 0)        softsign(z) = z / (1 + |z|)
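For reference, a small sketch (my own) of the non-linearities named on these two slides:

```python
import numpy as np

def logistic(z):          # "sigmoid"
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):              # rescaled, shifted sigmoid: 2*logistic(2z) - 1
    return np.tanh(z)

def hard_tanh(z):         # clips to [-1, 1]; cheaper than tanh, saturates hard
    return np.clip(z, -1.0, 1.0)

def softsign(z):          # z / (1 + |z|)
    return z / (1.0 + np.abs(z))

def relu(z):              # linear rectifier: max(z, 0)
    return np.maximum(z, 0.0)
```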

Page 51:

Dependency parsing for sentence structure

Neural networks can accurately determine the structure of sentences, supporting interpretation

Chen and Manning (2014) was the first simple, successful neural dependency parser

The dense representations let it outperform other greedy parsers in both accuracy and speed

Page 52:

Further developments in transition-based neural dependency parsing

This work was further developed and improved by others, including in particular at Google
• Bigger, deeper networks with better tuned hyperparameters
• Beam search
• Global, conditional random field (CRF)-style inference over the decision sequence

Leading to SyntaxNet and the Parsey McParseFace model
https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html

Method               | UAS   | LAS (PTB WSJ SD 3.3)
Chen & Manning 2014  | 92.0  | 89.7
Weiss et al. 2015    | 93.99 | 92.05
Andor et al. 2016    | 94.61 | 92.79