Information Extraction and Named Entity Recognition
Introducing the tasks: getting simple structured information out of text
(borrowing from Christopher Manning)



Page 1: Information Extraction and Named Entity Recognition

Introducing the tasks: getting simple structured information out of text

borrowing from Christopher Manning

Page 2: Information Extraction

• Information extraction (IE) systems
  • Find and understand limited relevant parts of texts
  • Gather information from many pieces of text
  • Produce a structured representation of relevant information:
    • relations (in the database sense), a.k.a.
    • a knowledge base

• Goals:
  1. Organize information so that it is useful to people
  2. Put information in a semantically precise form that allows further inferences to be made by computer algorithms

Page 3: Information Extraction (IE)

• IE systems extract clear, factual information
  • Roughly: Who did what to whom when?

• E.g.:
  • Gathering earnings, profits, board members, headquarters, etc. from company reports
    • "The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia."
    • headquarters("BHP Billiton Limited", "Melbourne, Australia")
  • Learning drug-gene product interactions from medical research literature

Page 4: Low-level information extraction

• Is now available – and I think popular – in applications like Apple or Google mail, and web indexing

• Often seems to be based on regular expressions and name lists (a sketch follows below)
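To make that concrete, here is a minimal sketch of the regex-plus-name-list style; the date pattern and the name list are invented for illustration, not taken from any real system:

```python
import re

# Toy name list and date pattern, standing in for the kinds of
# resources a low-level extractor relies on.
KNOWN_COMPANIES = {"BHP Billiton Limited", "News Corp."}
DATE_RE = re.compile(
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.? "
    r"\d{1,2}(?:, \d{4})?")

def extract(text):
    """Return (type, string) pairs found by the lists and patterns."""
    hits = [("DATE", m.group(0)) for m in DATE_RE.finditer(text)]
    hits += [("ORG", name) for name in KNOWN_COMPANIES if name in text]
    return hits

print(extract("News Corp. will report earnings on Feb 9, 2012."))
# [('DATE', 'Feb 9, 2012'), ('ORG', 'News Corp.')]
```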

Page 5: Low-level information extraction

Page 6: Named Entity Recognition (NER)

• A very important sub-task: find and classify names in text, for example:

  • The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.

Page 8: Named Entity Recognition (NER)

[The same passage as above, with each name highlighted and classified by entity type: Person, Date, Location, Organization]

Page 9: Named Entity Recognition (NER)

• The uses:
  • Named entities can be indexed, linked off, etc.
  • Sentiment can be attributed to companies or products
  • A lot of IE relations are associations between named entities
  • For question answering, answers are often named entities

• Concretely:
  • Many web pages tag various entities, with links to bio or topic pages, etc.
    • Reuters' OpenCalais, Evri, AlchemyAPI, Yahoo's Term Extraction, …
  • Apple/Google/Microsoft/… smart recognizers for document content

Page 10: Information Extraction and Named Entity Recognition

Introducing the tasks: getting simple structured information out of text

borrowing from Christopher Manning

Page 11: Evaluation of Named Entity Recognition

The extension of Precision, Recall, and the F measure to sequences

borrowing from Christopher Manning

Page 12: The Named Entity Recognition Task

Task: predict entities in a text

  Foreign     ORG
  Ministry    ORG
  spokesman   O
  Shen        PER
  Guofang     PER
  told        O
  Reuters     ORG
  :           :

Standard evaluation is per entity, not per token.

Page 13: Precision/Recall/F1 for IE/NER

• Recall and precision are straightforward for tasks like IR and text categorization, where there is only one grain size (documents)

• The measure behaves a bit funnily for IE/NER when there are boundary errors (which are common):
  • First Bank of Chicago announced earnings … (e.g., a system that picks out just "Bank of Chicago" makes a boundary error)
  • This counts as both a fp and a fn
  • Selecting nothing would have been better
  • Some other metrics (e.g., the MUC scorer) give partial credit (according to complex rules); a per-entity scoring sketch follows
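A minimal sketch of per-entity scoring (the (start, end, type) span representation is invented for the example): an entity counts as correct only on an exact span-and-type match, which is why a boundary error costs a false positive and a false negative at once.

```python
def prf1(gold, pred):
    """Entity-level precision/recall/F1.

    gold, pred: sets of (start, end, type) spans. An entity is
    correct only on an exact span-and-type match, so a boundary
    error counts as both a false positive and a false negative.
    """
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    p = tp / (tp + fp) if pred else 0.0
    r = tp / (tp + fn) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Gold: "First Bank of Chicago"; system guessed "Bank of Chicago".
gold = {(0, 4, "ORG")}           # token span [0, 4)
pred = {(1, 4, "ORG")}
print(prf1(gold, pred))          # (0.0, 0.0, 0.0): one fp and one fn
```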

Page 14: Evaluation of Named Entity Recognition

The extension of Precision, Recall, and the F measure to sequences

Page 15: Sequence Models for Named Entity Recognition

borrowing from Christopher Manning

Page 16: The ML sequence model approach to NER

Training:
1. Collect a set of representative training documents
2. Label each token for its entity class or other (O)
3. Design feature extractors appropriate to the text and classes
4. Train a sequence classifier to predict the labels from the data

Testing:
1. Receive a set of testing documents
2. Run sequence model inference to label each token
3. Appropriately output the recognized entities

(a pipeline sketch follows)
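The same recipe in code, as a sketch assuming the third-party sklearn-crfsuite package; the tiny training set and the feature choices are invented for illustration:

```python
# pip install sklearn-crfsuite   (assumed third-party dependency)
import sklearn_crfsuite

def featurize(sent):
    """One feature dict per token: current word plus context words."""
    return [{"w0": w,
             "w-1": sent[i - 1] if i > 0 else "<S>",
             "w+1": sent[i + 1] if i < len(sent) - 1 else "</S>"}
            for i, w in enumerate(sent)]

# Training steps 1-2: representative documents, labeled per token.
train_sents = [["Shen", "Guofang", "told", "Reuters"]]
train_labels = [["PER", "PER", "O", "ORG"]]

# Steps 3-4: extract features, train a sequence classifier (a CRF here).
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit([featurize(s) for s in train_sents], train_labels)

# Testing: run inference to label each token of new text.
print(crf.predict([featurize(["Murdoch", "told", "Reuters"])]))
```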

Page 17: Encoding classes for sequence labeling

            IO encoding   IOB encoding
  Fred      PER           B-PER
  showed    O             O
  Sue       PER           B-PER
  Mengqiu   PER           B-PER
  Huang     PER           I-PER
  's        O             O
  new       O             O
  painting  O             O
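A sketch of the two encodings generated from entity spans (the helper and the (start, end, type) span format are invented here). Note how IO cannot separate Sue from the adjacent Mengqiu Huang, while IOB's B- tag marks where the new entity begins:

```python
def encode(tokens, entities, scheme="IOB"):
    """Label tokens from (start, end, type) entity spans.

    scheme="IO":  every entity token gets the bare class label.
    scheme="IOB": the first token of an entity gets B-, the rest I-.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        for i in range(start, end):
            if scheme == "IO":
                tags[i] = etype
            else:
                tags[i] = ("B-" if i == start else "I-") + etype
    return tags

toks = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
ents = [(0, 1, "PER"), (2, 3, "PER"), (3, 5, "PER")]
print(encode(toks, ents, "IO"))   # Sue and Mengqiu Huang blur together
print(encode(toks, ents, "IOB"))  # the B-PER boundary keeps them apart
```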

Page 18: Features for sequence labeling

• Words
  • Current word (essentially like a learned dictionary)
  • Previous/next word (context)

• Other kinds of inferred linguistic classification
  • Part-of-speech tags

• Label context
  • Previous (and perhaps next) label

(a feature-extraction sketch follows)
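As a sketch, the feature classes above rendered as a dictionary for one token position; the feature names are invented for the example:

```python
def token_features(words, pos_tags, prev_label, i):
    """Feature dict for position i: current and context words,
    the POS tag, and the previous label, as listed above."""
    return {
        "w0": words[i],                                    # current word
        "w-1": words[i - 1] if i > 0 else "<S>",           # previous word
        "w+1": words[i + 1] if i < len(words) - 1 else "</S>",
        "pos0": pos_tags[i],                               # POS tag
        "label-1": prev_label,                             # label context
    }

print(token_features(["Murdoch", "discusses", "future"],
                     ["NNP", "VBZ", "NN"], "<S>", 0))
```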

Page 19: Features: Word substrings

[Chart: counts of the word substrings "oxa", ":", and "field" in names from each class (drug, company, movie, place, person), with example names Cotrimoxazole, Wethersfield, and "Alien Fury: Countdown to Invasion"; each substring is heavily concentrated in one class]

Page 20: Features: Word shapes

• Word shapes (sketch below)
  • Map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.

  Varicella-zoster   Xx-xxx
  mRNA               xXXX
  CPA1               XXXd
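A sketch of one such mapping. The collapsing rule for long words (keep the first and last two shape characters, deduplicate the middle) is chosen here so that it reproduces the three examples above; it is only one of several variants in use:

```python
def shape_char(c):
    """Map one character to its shape class."""
    if c.isupper():
        return "X"
    if c.islower():
        return "x"
    if c.isdigit():
        return "d"
    return c  # punctuation kept as-is

def word_shape(word, max_len=4):
    """Word shape: char by char for short words; longer words keep
    the first/last two shape chars plus the set of middle ones."""
    shapes = [shape_char(c) for c in word]
    if len(shapes) <= max_len:
        return "".join(shapes)
    middle = "".join(sorted(set(shapes[2:-2])))
    return "".join(shapes[:2]) + middle + "".join(shapes[-2:])

for w in ["Varicella-zoster", "mRNA", "CPA1"]:
    print(w, "->", word_shape(w))
# Varicella-zoster -> Xx-xxx
# mRNA -> xXXX
# CPA1 -> XXXd
```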

Page 21: Sequence Models for Named Entity Recognition

borrowing from Christopher Manning

Page 22: Maximum entropy sequence models

Maximum entropy Markov models (MEMMs) or conditional Markov models

borrowing from Christopher Manning

Page 23: Sequence problems

• Many problems in NLP have data which is a sequence of characters, words, phrases, lines, or sentences …

• We can think of our task as one of labeling each item:

POS tagging:
  Chasing opportunity in an age of upheaval
  VBG     NN          IN DT NN  IN NN

Word segmentation:
  而 相 对 于 这 些 品 牌 的 价
  B  B  I  I  B  I  B  I  B  B

Named entity recognition:
  Murdoch discusses future of News Corp.
  PERS    O         O      O  ORG  ORG

Text segmentation:
  Q A Q A A A Q A

Page 24: MEMM inference in systems

• For a Conditional Markov Model (CMM), a.k.a. a Maximum Entropy Markov Model (MEMM), the classifier makes a single decision at a time, conditioned on evidence from observations and previous decisions

• A larger space of sequences is usually explored via search

Decision point (tagging "The Dow fell 22.6 %"):

  Position:  -3   -2   -1    0     +1
  Tag:       DT   NNP  VBD   ???   ???
  Word:      The  Dow  fell  22.6  %

Local context features:

  W0         22.6
  W+1        %
  W-1        fell
  T-1        VBD
  T-1-T-2    NNP-VBD
  hasDigit?  true
  …          …

(Ratnaparkhi 1996; Toutanova et al. 2003, etc.)

Page 25: Example: POS tagging

• Scoring individual labeling decisions is no more complex than standard classification decisions
  • We have some assumed labels to use for prior positions
  • We use features of those and the observed data (which can include current, previous, and next words) to predict the current label

[Same decision-point diagram and local context features as on page 24]

Page 26: Example: POS tagging

• POS tagging features can include:
  • Current, previous, next words in isolation or together
  • Previous one, two, three tags
  • Word-internal features: word types, suffixes, dashes, etc.

[Same decision-point diagram and local context features as on page 24]

Page 27: Inference in Systems

[Diagram, sequence level vs. local level: local data → feature extraction → features → classifier (classifier type: maximum entropy models; smoothing: quadratic penalties; optimization: conjugate gradient) → label; at the sequence level, sequence data feeds sequence model inference, which produces the labels for the whole sequence]

Page 28: Greedy inference

• Greedy inference:
  • We just start at the left, and use our classifier at each position to assign a label
  • The classifier can depend on previous labeling decisions as well as observed data

• Advantages:
  • Fast, no extra memory requirements
  • Very easy to implement
  • With rich features including observations to the right, it may perform quite well

• Disadvantage:
  • Greedy: we commit to errors we cannot recover from (see the sketch below)

[Diagram: sequence model → inference → best sequence]
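A minimal sketch of greedy decoding; score() is a stand-in for any local classifier (e.g., an MEMM's log-probability), and the toy scorer is invented for the demo:

```python
def greedy_decode(words, labels, score):
    """Left-to-right greedy labeling.

    score(words, i, prev_label, label) -> float is any local
    classifier score. Each decision is final: no backtracking.
    """
    seq = []
    prev = "<S>"
    for i in range(len(words)):
        prev = max(labels, key=lambda y: score(words, i, prev, y))
        seq.append(prev)
    return seq

# Toy scorer: capitalized words look like PER, everything else O.
def toy_score(words, i, prev, y):
    return 1.0 if (y == "PER") == words[i][0].isupper() else 0.0

print(greedy_decode(["Murdoch", "spoke"], ["PER", "O"], toy_score))
# ['PER', 'O']
```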

Page 29: Beam inference

• Beam inference (see the sketch below):
  • At each position keep the top k complete sequences
  • Extend each sequence in each local way
  • The extensions compete for the k slots at the next position

• Advantages:
  • Fast; beam sizes of 3–5 are almost as good as exact inference in many cases
  • Easy to implement (no dynamic programming required)

• Disadvantage:
  • Inexact: the globally best sequence can fall off the beam
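A sketch of the beam, with the same invented stand-in scorer as in the greedy sketch; only the top k partial sequences survive each step:

```python
import heapq

def toy_score(words, i, prev, y):
    """Stand-in local classifier score (same as the greedy sketch)."""
    return 1.0 if (y == "PER") == words[i][0].isupper() else 0.0

def beam_decode(words, labels, score, k=3):
    """Keep the top-k scoring partial sequences at each position."""
    beam = [(0.0, [], "<S>")]      # (total score, labels so far, last label)
    for i in range(len(words)):
        candidates = [(s + score(words, i, prev, y), seq + [y], y)
                      for s, seq, prev in beam for y in labels]
        beam = heapq.nlargest(k, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]

print(beam_decode(["Murdoch", "spoke"], ["PER", "O"], toy_score, k=2))
# ['PER', 'O']
```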

Page 30: Viterbi inference

• Viterbi inference (see the sketch below):
  • Dynamic programming or memoization
  • Requires a small window of state influence (e.g., past two states are relevant)

• Advantage:
  • Exact: the global best sequence is returned

• Disadvantage:
  • Harder to implement long-distance state-state interactions (but beam inference tends not to allow long-distance resurrection of sequences anyway)
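A sketch of Viterbi under a first-order window (only the previous state matters), again with the invented stand-in scorer; unlike the beam, it provably returns the best-scoring sequence:

```python
def toy_score(words, i, prev, y):
    """Stand-in local classifier score (same as the earlier sketches)."""
    return 1.0 if (y == "PER") == words[i][0].isupper() else 0.0

def viterbi_decode(words, labels, score):
    """Exact best sequence under a first-order model, via DP."""
    # best[y] = (score of the best path ending in y, that path)
    best = {"<S>": (0.0, [])}
    for i in range(len(words)):
        best = {y: max(((s + score(words, i, prev, y), seq + [y])
                        for prev, (s, seq) in best.items()),
                       key=lambda t: t[0])
                for y in labels}
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi_decode(["Murdoch", "spoke"], ["PER", "O"], toy_score))
# ['PER', 'O']
```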

Page 31: Information Extraction and Named Entity Recognitionexactly using dynamic programming • Training is slower, but CRFs avoid causal-competition biases • These (or a variant using

CRFs [Lafferty, Pereira, and McCallum 2001]

• Another sequence model: Conditional Random Fields (CRFs)
  • A whole-sequence conditional model rather than a chaining of local models:

    $P(c \mid d, \lambda) = \dfrac{\exp \sum_i \lambda_i f_i(c, d)}{\sum_{c'} \exp \sum_i \lambda_i f_i(c', d)}$

  • The space of c's is now the space of sequences
  • But if the features f_i remain local, the conditional sequence likelihood can be calculated exactly using dynamic programming (see the toy check below)

• Training is slower, but CRFs avoid causal-competition biases
• These (or a variant using a max-margin criterion) are seen as the state-of-the-art these days … but in practice usually work much the same as MEMMs
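To illustrate the "exactly using dynamic programming" point, a toy check (all scores invented): with local transition and emission features, the normalizer over all label sequences factorizes, so the forward algorithm computes it without enumerating the exponentially many sequences.

```python
import itertools, math

LABELS = ["PER", "O"]
N = 3  # sentence length
# Invented log-potentials: (prev, cur) transitions and per-position emissions.
trans = {(p, c): (0.5 if p == c else -0.5) for p in LABELS for c in LABELS}
emit = [{"PER": 1.0, "O": 0.2}, {"PER": -0.3, "O": 0.9}, {"PER": 0.1, "O": 0.4}]

def seq_score(seq):
    """Sum of local feature scores for one whole label sequence."""
    return (sum(emit[i][y] for i, y in enumerate(seq))
            + sum(trans[seq[i - 1], seq[i]] for i in range(1, N)))

# Brute force: sum over all |LABELS|^N label sequences.
logZ_brute = math.log(sum(math.exp(seq_score(s))
                          for s in itertools.product(LABELS, repeat=N)))

# Forward algorithm: the same normalizer in O(N * |LABELS|^2).
alpha = dict(emit[0])
for i in range(1, N):
    alpha = {c: emit[i][c] + math.log(sum(math.exp(alpha[p] + trans[p, c])
                                          for p in LABELS))
             for c in LABELS}
logZ_forward = math.log(sum(math.exp(a) for a in alpha.values()))

print(abs(logZ_brute - logZ_forward) < 1e-12)  # True
```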

Page 32: Maximum entropy sequence models

Maximum entropy Markov models (MEMMs) or conditional Markov models

borrowing from Christopher Manning