Entity-Relationship Extraction from Wikipedia Unstructured Text - Overview


Entity-Relationship Extraction from Wikipedia Unstructured Text

Radityo Eko Prasojo (Rido), PhD Student @ KRDB, Free University of Bozen-Bolzano

Supervised by: Mouna Kacimi & Werner Nutt

20.07.16, Bilbao, Spain

Automatically generated / Manually curated

Automated extraction without (yet) a KB as a result

Knowledge Vault [1]

Knowledge Graph

NELL [2]

20/07/16 RE Prasojo | KRDB@UNIBZ | WebST'16, Bilbao

Infobox completion [3][4]


Where was Obama born?

Who are the children of Obama?


Where was Obama born?

Who are the children of Obama?

Yes we can!

Honolulu, Hawaii
Malia and Sasha Obama


Which are Obama’s favourite sports teams?

Does Obama have pets?

Our goal is to enrich existing Knowledge Bases by extracting new facts, in the form of machine-readable entity relationships, from Wikipedia unstructured text.

Specific focus: RDF
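As a rough illustration of what a machine-readable fact in RDF could look like, the sketch below serialises one subject, predicate, object triple as an N-Triples line. The `http://example.org/` namespace, the predicate name `birthPlace`, and the helper `to_ntriple` are assumptions made for illustration; they do not come from the talk.

```python
# Illustrative namespace (assumption); a real KB would use its own IRIs.
BASE = "http://example.org/"


def to_ntriple(subject: str, predicate: str, obj: str, base: str = BASE) -> str:
    """Serialise three local names as one N-Triples statement."""
    return f"<{base}{subject}> <{base}{predicate}> <{base}{obj}> ."


# "birthPlace" is a hypothetical predicate name chosen for this example.
line = to_ntriple("Barack_Obama", "birthPlace", "Honolulu")
print(line)
```

Each statement is a single line ending in ` .`, which makes such facts easy to append to an existing KB dump.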


Why is it difficult?

• The extraction problem
  • Entity extraction & disambiguation
  • Relation extraction

• The representation problem
  • Lack of predefined schema/ontology
  • Topic independence
  • Complex fact representation


Why is it difficult? Example

• “Obama is a supporter of the Chicago White Sox”
  • Straightforward, singleton information
  • Pure syntactic extraction possible
  • Barack_Obama supporterOf Chicago_White_Sox
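To make "pure syntactic extraction" concrete, here is a minimal sketch that matches this one sentence shape with a regular expression. The pattern, the `ENTITY_IDS` lookup table, and the function name are hypothetical; the lookup table merely stands in for a real entity disambiguation step.

```python
import re

# One hand-written surface pattern: "<X> is a supporter of [the] <Y>".
PATTERN = re.compile(
    r"^(?P<subj>[A-Z][\w ]*?) is a supporter of (?:the )?(?P<obj>[A-Z][\w ]*)$"
)

# Hypothetical mapping from surface mentions to KB entity names.
ENTITY_IDS = {"Obama": "Barack_Obama", "Chicago White Sox": "Chicago_White_Sox"}


def extract_supporter(sentence: str):
    """Return a (subject, predicate, object) triple, or None if no match."""
    match = PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return None
    # Fall back to an underscore-joined mention when the entity is unknown.
    subj = ENTITY_IDS.get(match["subj"], match["subj"].replace(" ", "_"))
    obj = ENTITY_IDS.get(match["obj"], match["obj"].replace(" ", "_"))
    return (subj, "supporterOf", obj)


triple = extract_supporter("Obama is a supporter of the Chicago White Sox")
print(triple)
```

A pattern this rigid only covers sentences with exactly this wording, which is why it works for singleton statements and not for the more complex ones.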


• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”
  • Complex, multiple pieces of information
  • Semantic understanding necessary
  • … how do we represent this?


Example: representing a complex fact

• “He is also primarily a Chicago Bears football fan in the NFL, but in his childhood and adolescence was a fan of the Pittsburgh Steelers”

• Barack_Obama footballFan Chicago_Bears in NFL
  • supporterOf vs footballFan
  • Is it necessary to include NFL in the whole relation?
  • What about the adverb “primarily”? What information does it imply?

• Barack_Obama fanOf Pittsburgh_Steelers
  • fanOf vs supporterOf
  • Missing the time information referred to in “in his childhood and adolescence was”
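One possible way to keep the extra context, sketched below, is to attach qualifiers to a core triple. This is only an illustration of the design space, not the representation the talk settles on: the `QualifiedFact` class, the qualifier keys (`league`, `degree`, `period`), and the choice to unify both relations as `fanOf` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedFact:
    subject: str
    predicate: str
    obj: str
    # Extra context as (key, value) pairs, e.g. a time period or a league.
    qualifiers: tuple = ()

    def core(self):
        """The plain triple, with all qualifiers dropped."""
        return (self.subject, self.predicate, self.obj)


# Hypothetical encoding of the two facts in the example sentence.
facts = [
    QualifiedFact("Barack_Obama", "fanOf", "Chicago_Bears",
                  qualifiers=(("league", "NFL"), ("degree", "primarily"))),
    QualifiedFact("Barack_Obama", "fanOf", "Pittsburgh_Steelers",
                  qualifiers=(("period", "childhood and adolescence"),)),
]

for fact in facts:
    print(fact.core(), dict(fact.qualifiers))
```

Keeping the core triple separate from the qualifiers means a consumer that only understands plain triples can still use `core()`, while the time and degree information is not lost.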


Approach

• Document preprocessing to annotate all entity occurrences.
• Grammatical dependencies to extract (candidate) relations.

• Separation between the extraction problem and the representation problem
  • We first extract all candidate relations and later apply semantic refinement for better representation.
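The sketch below shows what one dependency-based extraction rule might look like. It is a hypothetical rule, not one of the rules used in this work, and the parse is written by hand in a Universal-Dependencies-like style; a real pipeline would obtain it from a dependency parser.

```python
# Each token: (index, word, head index, dependency label); head 0 marks the root.
# Hand-written parse of "Obama is a supporter of the Chicago White Sox" (assumption).
PARSE = [
    (1, "Obama", 4, "nsubj"),
    (2, "is", 4, "cop"),
    (3, "a", 4, "det"),
    (4, "supporter", 0, "root"),
    (5, "of", 9, "case"),
    (6, "the", 9, "det"),
    (7, "Chicago", 9, "compound"),
    (8, "White", 9, "compound"),
    (9, "Sox", 4, "nmod"),
]


def children(parse, head, label):
    """All tokens attached to `head` with the given dependency label."""
    return [tok for tok in parse if tok[2] == head and tok[3] == label]


def phrase(parse, head_tok):
    """Join a head token with its compound modifiers, in sentence order."""
    parts = children(parse, head_tok[0], "compound") + [head_tok]
    return "_".join(word for _, word, _, _ in sorted(parts))


def copula_of_rule(parse):
    """Hypothetical rule: a noun with a copula, a subject and an 'of' modifier."""
    candidates = []
    for idx, word, _, _ in parse:
        if not children(parse, idx, "cop"):
            continue
        for subj in children(parse, idx, "nsubj"):
            for obj in children(parse, idx, "nmod"):
                if any(c[1] == "of" for c in children(parse, obj[0], "case")):
                    candidates.append(
                        (phrase(parse, subj), word + "Of", phrase(parse, obj))
                    )
    return candidates


candidates = copula_of_rule(PARSE)
print(candidates)
```

The rule only yields a candidate relation over surface mentions; linking the mention `Obama` to the entity `Barack_Obama` and choosing a final predicate name are left to the annotation and refinement steps.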


Preliminary results

• Ground truth manually curated from 25 Wikipedia articles of famous people.
• Preprocessing
• 4 handcrafted extraction rules leveraging grammatical dependency
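The slides do not state which metrics are used. One common way to compare extracted triples against a manually curated ground truth is precision and recall, sketched below. The triples are toy data for illustration only and are not results from this work.

```python
def precision_recall(extracted: set, ground_truth: set):
    """Precision and recall of extracted triples against a ground truth."""
    correct = extracted & ground_truth
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy triples for illustration only; these are not results from the talk.
ground_truth = {
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "Chicago_Bears"),
}
extracted = {
    ("Barack_Obama", "supporterOf", "Chicago_White_Sox"),
    ("Barack_Obama", "fanOf", "NFL"),  # a deliberately wrong extraction
}

precision, recall = precision_recall(extracted, ground_truth)
print(precision, recall)
```

Exact set matching is strict: a triple that differs only in predicate naming (for example `fanOf` vs `supporterOf`) counts as wrong, so a real evaluation may need a more lenient matching policy.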


Ongoing work

• Automated rule mining
• Semantic refinement for knowledge representation
  • Ontology building
    • Naming and taxonomy of entities, classes, and relations
  • Handling complex facts
    • Obama appoints x as y in z
  • Handling modality, adjectives, and sentiment
    • “In the past”, “it is rumoured that”, “it is not true that”

• Future evaluation
  • Bigger ground truth (amount + topic coverage)
  • Evaluate how well we enrich existing KBs
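As a simple starting point for the modality cues listed above, the sketch below detects the three quoted phrases and separates them from the rest of the sentence. The label names and the function `tag_modality` are hypothetical, and plain phrase matching is only an assumption about how such handling could begin.

```python
import re

# Cue phrases taken from the slide; the labels are hypothetical.
CUES = [
    (re.compile(r"\bin the past\b", re.IGNORECASE), "past"),
    (re.compile(r"\bit is rumou?red that\b", re.IGNORECASE), "rumour"),
    (re.compile(r"\bit is not true that\b", re.IGNORECASE), "negated"),
]


def tag_modality(sentence: str):
    """Return the sentence with cues removed, plus the modality labels found."""
    labels = []
    for pattern, label in CUES:
        if pattern.search(sentence):
            labels.append(label)
            sentence = pattern.sub("", sentence)
    # Collapse leftover whitespace and stray leading or trailing commas.
    cleaned = " ".join(sentence.split()).strip(" ,")
    return cleaned, labels


cleaned, labels = tag_modality("It is rumoured that Obama appoints x as y in z")
print(cleaned, labels)
```

The cleaned sentence can then go through the usual relation extraction, with the labels attached to the resulting fact as additional context.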


Future work

• Metadata extraction
  • Data quality, data completeness

• Natural language question answering based on the enriched KB.
