63
Knowledge Graphs: In Theory and Practice Nitish Aggarwal, IBM Watson, USA, Sumit Bhatia, IBM Research, India Saeedeh Shekarpour, Knoesis Research Centre Ohio, USA Amit Sheth, Knoesis Research Centre Ohio, USA

Knowledge Graphs: In Theory and Practice - Sumit …sumitbhatia.net/papers/KG_Tutorial_CIKM17_part1.pdf · Knowledge Graphs: In Theory and Practice NitishAggarwal, IBM Watson, USA,

Embed Size (px)

Citation preview

KnowledgeGraphs:InTheoryandPractice

Nitish Aggarwal,IBMWatson,USA,Sumit Bhatia, IBMResearch,IndiaSaeedeh Shekarpour,Knoesis ResearchCentreOhio,USAAmitSheth,Knoesis ResearchCentreOhio,USA

The material presented in this tutorial represents the personal opinion of the presenters and not of IBM and affiliated organization.

Outlineofthetutorial

Part 1: Knowledge Graph Construction• Introduction• DBpedia: Knowledge extraction• Approaches to extend knowledge graph• Knowledge extraction from scratch

Part 2: Knowledge Graph Analytics• Finding entities of interest• Entity exploration• Upcoming challenges

WhatisKnowledgeGraph

“The KnowledgeGraph isa knowledgebase usedby Google toenhanceits searchengine'ssearchresultswith semantic-searchinformationgatheredfromawidevarietyofsources.”

WhatisKnowledgeGraph

“The KnowledgeGraph isa knowledgebase usedby Google toenhanceits searchengine'ssearchresultswith semantic-searchinformationgatheredfromawidevarietyofsources.”

“AKnowledgegraph(i)mainlydescribesrealworldentitiesandinterrelations,organizedinagraph(ii)definespossibleclassesandrelationsofentitiesinaschema”(iii)allowspotentiallyinterrelatingarbitraryentitieswitheachother… [Paulheim H.]

“WedefinesaKnowledgeGraphasanRDFgraphconsistsofasetofRDFtripleswhereeachRDFtriple(s,p,o)isanorderedsetoffollowingRDFterm….”[Pujara J.alal.]

WhatisKnowledgeGraph

Nosingleformaldefinition…

• Definesrealworldentities

• Providesrelationshipsbetweenthem

WhatisKnowledgeGraph

Nosingleformaldefinition…

• Definesrealworldentities

• Providesrelationshipsbetweenthem

• Containsrulesdefinesthroughontologies

• Enablereasoningtoinfernewknowledge

WhyKnowledgeGraph

Building an intelligent system that can interact with human, requires knowledge about real world entities.

WhyKnowledgeGraph

Building an intelligent system that can interact with human, requires knowledge about real world entities.

• Enhance search results.

• Enhance ad sense.

• Help in language understanding.

• Enables knowledge discovery.

Isthereexistingknowledgegraphreadytouseformyapplication?

GoogleKnowledgeGraphFacebook

EntityGraph

MicrosoftSatori

LinkedInKnowledgeGraph

AmazonProductGraph

DBpedia:Knowledgeextraction

DBpedia:Knowledgeextraction

TheCityofNewYork,oftencalledNewYorkCity orsimplyNewYork,isthemostpopulouscityintheUnitedStates.

DBpedia:Knowledgeextraction

TheCityofNewYork,oftencalledNewYorkCity orsimplyNewYork,isthemostpopulouscityintheUnitedStates.

<NewYorkCity>,<CityIn><UnitedStates>.

<CityName>,<locatedIn><CountryName>.

DBpedia:Knowledgeextraction

TheCityofNewYork,oftencalledNewYorkCity orsimplyNewYork,isthemostpopulouscityintheUnitedStates.

DBpedia:Knowledgeextraction

DBpedia:Knowledgeextraction

<headentity>,<rel>< tailentity>

DBpedia:Knowledgeextraction

<headentity>,<rel>< tailentity>

WikipediaInfobox

DBpedia:Knowledgeextraction

DBpedia:Knowledgeextraction

DBpedia:Knowledgeextraction

DBpedia:Knowledgeextraction

Parsers

Ontology(Classes,properties)

dbr:IBM dbp:foundedBydbr:Charles_Ranlett_Flint

dbr:IBM dbp:foundedBydbr:Charles_Ranlett_Flint

dbr:IBM dbp:foundedBydbr:Charles_Ranlett_Flint

……………

(Research)problemsinknowledgegraphs

• Incomplete knowledge– Missing entities– Missing relations– Limited entity and relation types

(Research)problemsinknowledgegraphs

• Incomplete knowledge– Missing entities– Missing relations– Limited entity and relation types

• Incorrect knowledge– Wrong entity label recognition– Wrong entity and relation type– Wrong facts

(Research)problemsinknowledgegraphs

• Incomplete knowledge– Missing entities– Missing relations– Limited entity and relation types

• Incorrect knowledge– Wrong entity label recognition– Wrong entity and relation type– Wrong facts

• Inconsistency in knowledge– Different labels for same entity– Merging entities with same labels

Approachestoextendknowledgegraphs

• Extracting knowledge from Wikipedia tables– Large amount of raw data in form of tables– Tables have some implicit structure/patterns

Approachestoextendknowledgegraphs

• Extracting knowledge from Wikipedia tables– Large amount of raw data in form of tables– Tables have some implicit structure/patterns

Wiki:AFC_Ajax containingrelationsbetweenplayers,theirshirtnumber,andcountry

Approachestoextendknowledgegraphs

• <Wiki:AFC_Ajax,dbp:rel,Wiki:Andre_Onana>• 80%entitiesinthetablehaverelationdbp:rel withtheWikipediatitleentity

Wiki_AFC_Ajax• Other20%entitiesarelikelytohavethesamerelationshipdbp:rel withWiki_AFC_Ajax

[MunozE.atal.]UsingLinkedDatatoMineRDFfromWikipedia'sTables,WSDM2014

Approachestoextendknowledgegraphs

[MunozE.atal.]UsingLinkedDatatoMineRDFfromWikipedia'sTables,WSDM2014

• Features– Articlefeatures:no.oftables,length– Tablefeatures:no.ofrows,no.ofcolumns– Columnfeatures:no.ofentitiesincolumn,potentialrelations– Cellfeatures:no.ofentitiesinacell,lengthofcell– Manyothers

• Combinesusingclassificationmethod

Prec. Rec. F1

Rule-based 64.23 70.46 67.20

SVM 72.43 75.77 74.06

Logistic 79.62 79.01 79.31

Approachestoextendknowledgegraphs

[MunozE.atal.]UsingLinkedDatatoMineRDFfromWikipedia'sTables,WSDM2014

• Features– Articlefeatures:no.oftables,length– Tablefeatures:no.ofrows,no.ofcolumns– Columnfeatures:no.ofentitiesincolumn,potentialrelations– Cellfeatures:no.ofentitiesinacell,lengthofcell– Manyothers

• Combinesusingclassificationmethod

Prec. Rec. F1

Rule-based 64.23 70.46 67.20

SVM 72.43 75.77 74.06

Logistic 79.62 79.01 79.31

• Rules/heuristicsbasedmethodsmakesmistakes,andhardtocreateoneruleforeveryone.

• Eventhoughcombiningdifferentfeaturesachieves80%accuracy,itintroduces20%noise.

Tabledataislimited,weneedtogobeyond

Approachestoextendknowledgegraphs

• Missingentity/literalforarelation– “ChristopherA.WeltyisanAmericancomputerscientist,whoworksat

GoogleResearchinNY”• <dbr:Chris_Welty><employedBy><?>

– "TomCruiseandBradPittappearinInterviewwiththeVampire"• <dbr:Brad_Pitt><?><dbr:Tom_Cruise>

Approachestoextendknowledgegraphs

• Missingentity/literalforarelation– “ChristopherA.WeltyisanAmericancomputerscientist,whoworksat

GoogleResearchinNY”• <dbr:Chris_Welty><employedBy><?>

– "TomCruiseandBradPittappearinInterviewwiththeVampire"• <dbr:Brad_Pitt><?><dbr:Tom_Cruise>

• KnowledgeBaseCompletion– Similartolinkpredictioninsocialnetworkbutabitmorechallenging– Needtoidentifyrelationtypeinadditiontobinaryoutput.

Approachestoextendknowledgegraphs

• KnowledgeBaseCompletion– TransE:learntheentityandrelationembeddings byassumingthattranslation

ofentityembeddings correspondtotheirrelationembeddings.[Bordes etat.2013]

– S+R≈T,where<S,R,T>

– TransH:Learndifferententityembeddingfordifferentrelationships[Wangatel.2014]

– TransR:Learnentityandrelationembeddings indifferentspace,followingbytranslationperforminrelationspace.[LinY.atel.2015]

– Manymoremethods [NickelM.atal,2015]

Knowledgebasecompletionapproachesfocusonfindingmissingentities/relations

Needtoaddnewentitiesfromexternalsources

Needtoaddnewentitiesfromexternalsources

• Entityrecognitioninexternaltextresource• ManyNamedEntityRecognitionsystems

• LinkextractedentitytoKGorcreateanewnodeifitdoesnothaveacorrespondingentity

• TAC-KBP(EntityDiscoveryandLinkingtask)[JiH.atel.2016]

BuildingknowledgegraphsuchasDBpedia requireslotofmanualefforts.

BuildingknowledgegraphsuchasDBpedia requireslotofmanualefforts.

• Manyapplicationsrequiredomain/dataspecificcustomknowledgegraphs.

• CreatingschemawithclassstructureandconstraintsforeachKGisdifficult.

Howtocreateaknowledgegraphfromunstructuredtext?

JonathonWatsonworksatIBM.Hehasmorethan50patents,andwonbestinventorawardforhisinvention“NeuralChipbyJonWatsonetal.

JonathonWatsonworksatIBM.Hehasmorethan50patents,andwonbestinventorawardforhisinvention“NeuralChipbyJonWatsonetal.

Entityextraction

Relationextraction

Noisereduction KG

JonathonWatsonIBMJonWatson

employedBy(JonathonWatson,IBM)JonWatson

JonathonWatson,JonWatson

Relationextraction

• Supervised methods

Predefined schema (employedBy, bornOn, BirthPlace …)

Training data

JonathonWatsonworksatIBM.

JonathonWatsonjoinedIBM.

employedBy

employedBy

Relationextraction

• Supervised methods

Predefined schema (employedBy, bornOn, BirthPlace …)

Training data Test data

JonathonWatsonworksatIBM.

JonathonWatsonjoinedIBM.

employedBy

employedBy JonathonWatsonismanageratIBM.

?

Relationextraction

• Supervised methods

Predefined schema (employedBy, bornOn, BirthPlace …)

Training data Test data

JonathonWatsonworksatIBM.

JonathonWatsonjoinedIBM.

employedBy

employedBy JonathonWatsonismanageratIBM.

employedBy

Relationextraction

• Supervised methods

Pros: High accuracy and less noise

Cons: Hard and expensive to build labeled data

Relationextraction

• Supervised methods• Distantly supervised methods

employedBy (JonWatson,IBM)

affiliated(MichaelDecker,,SMU)

JonWatsonworksatIBM.

JonWatsonbecomesVPatIBM.……….

MichaelDeckerjoinsDataSciencegroupatSMU.

MichaelDeckerwonanationalfundingawardat

SMU.……….

Relationextraction

• Supervised methods• Distantly supervised methods

employedBy (JonWatson,IBM)

affiliated(MichaelDecker,,SMU)

JonWatsonworksatIBM.

JonWatsonbecomesVPatIBM.……….

MichaelDeckerjoinsDataSciencegroupatSMU.

MichaelDeckerwonanationalfundingawardat

SMU.……….

Trainingsentences

Relationextraction

• Supervised methods• Distantly supervised methods

Pros: Overcome the effort of labeling data

Cons: Dependency of existing knowledge graph and corresponding . text

Relationextraction

• Supervised methods• Distantly supervised methods• Unsupervised methods (OpenIE, Universal Schema)

JonathonWatsonworksatIBM.

JonathonWatsonjoinedIBM.

join

(ROOT(S(NP(JonWatson))(VP(VBZworks)(PP(INat)(NPIBM)))

(ROOT(S(NP(JonWatson))(VP(VBDjoined)(NPIBM))

work

Relationextraction

• Supervised methods• Distantly supervised methods• Unsupervised methods (OpenIE, Universal Schema)

Pros: eliminates the effort of labeling data

Cons: Noisy, large number of relations

Relationextraction

• Supervised methods• Distantly supervised methods• Unsupervised methods (OpenIE, Universal Schema)

Relation1 Relation2 Relation3

WorksemployerCompany

employedBy….

livesIncurrentCityCountry

….

VicePresidentexecutive

Boardmember

….

Relationextraction(UniversalSchema)

• Clustering using vector similarity• Matrix completion and fill the empty values [YaoL.atel.,

2012]

employeBy affiliated Leaderof

Jon x x

Michael x

Steve x x

Joyce x x x

Entitytypesidentification(UniversalSchema)

• Clustering using vector similarity• Matrix completion and fill the empty values [YaoL.atel.,

2012]

director musician actor

Jon x x

Michael x x

Steve x

Joyce x x

Relationextractionindomain

• Supervised methods – Need domain experts to label the data• Distantly supervised methods – Hard to find corresponding

text• Unsupervised methods (OpenIE, Universal Schema) – Noisy

A 59-year-old African American man with a past medical history of hypertension, benign prostatichypertrophy, type II diabetes mellitus for the past 15 years, and chronic back pain presents to the hospitalwith gross hematuria. The patient states that he noticed blood in his urine last night. The patient also reportsmild, intermittent flank pain. The patient states that his diabetes and blood pressure are well controlled withmedications, and that he has managed his chronic back pain with 2 aspirin per day for the past 4 years. Vitalsigns are Temp- 98.6°F, BP- 124/82 mm/Hg, pulse- 88/min, and RR- 14/min. Blood work is notable for HbA1Cof 6.5%. A pyelogram reveals a ring sign. His current fasting glucose is 140mmol/L.<br /><br />What is themost likely etiology of hematuria in this patient?

Symptom

Knowledgegraphsindomain

• Domain specific entity extraction is more challenging

• Limited relation types

• Less explicit mention of entity and relation types in text

• Creating simple schema requires domain experts

Knowledgegraph- Simple

JonathonWatsonworksatIBM.

MichaelDeckerjoinedIBM.

MichaelDeckerattendsSMU

JonathonWatson

IBM

MichaelDecker

SMU

Knowledgegraph– Simple+Schema

JonathonWatsonworksatIBM.

MichaelDeckerjoinsIBM.

MichaelDeckerattendedSMU

JonathonWatson

IBM

MichaelDecker

SMU

affiliated

affiliatedaffiliated

Knowledgegraph– Simple+Schema+Ontology

JonathonWatsonworksatIBM.

MichaelDeckerjoinedIBM.

MichaelDeckerattendsSMU

JonathonWatson

IBM

MichaelDecker

SMU

affiliated

affiliatedaffiliated

Domain,range,constraint

Summary

• Simple knowledge graph works for many applications

• Identify the requirement before finding the solution.

• Many knowledge graphs are publically available

https://www.youtube.com/watch?v=kao05ArIiok&feature=youtu.be