Combining Information Extraction, Deductive Reasoning and Machine Learning for Relation Prediction

Xueyan Jiang (2), Yi Huang (1,2), Maximilian Nickel (2), Volker Tresp (1,2)

(1) Siemens AG, Corporate Technology, Munich, Germany

(2) Ludwig Maximilian University of Munich, Munich, Germany

May 31, 2012

Introduction

Relation prediction in RDF graph

RDF graph: knowledge base in the form of a triple store

Task: predict the truth value of an instance of a relation (a statement), i.e., of an RDF triple

Knowledge base: existing triples and new triples derived from deductive reasoning

Unstructured contextual information: Wikipedia pages, web pages, text in literals

Motivation

Common approaches for relation prediction

IE (Information Extraction)

Data source: unstructured data, such as text or images
Limitation: unstructured information may not be available

DR (Deductive Reasoning)

Data source: a set of axioms
Limitation: can only derive a subset of the true statements (those entailed by the axioms); difficult to deal with uncertainty

ML (Machine Learning)

Data source: a set of true statements
Limitation: data must contain relevant statistical structure
Advantage: can express statistical dependencies between relations and handle incomplete data

Proposal

Combine IE, DR and ML in a principled way to make use of all knowledge sources for relation prediction

Outline

Matrix Representation for an RDF Graph

Proposed Framework for Combining IE, DR and ML

Prediction of relations from unstructured information (IE step)
Derivation of relations from the knowledge base (DR step)
Combination of IE step and DR step
Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)

Matrix Representation for an RDF Graph

We construct a matrix X from the RDF graph
Each subject is represented as a row
Each column represents a (p,o) pair

A matrix element X(s,p,o) is equal to one if the corresponding triple is known to exist and is equal to zero otherwise
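
A minimal Python sketch of this construction, assuming the triples are given as (subject, predicate, object) string tuples. The function name `triples_to_matrix` and the example entities and predicates (`bornIn`, `type`) are hypothetical illustrations, not taken from the paper's data:

```python
import numpy as np

def triples_to_matrix(triples):
    """Build the subject x (predicate, object) indicator matrix X.

    Rows are subjects, columns are (p, o) pairs; X[s, (p, o)] = 1 if the
    triple (s, p, o) is known, and 0 otherwise.
    """
    subjects = sorted({s for s, _, _ in triples})
    columns = sorted({(p, o) for _, p, o in triples})
    row_index = {s: i for i, s in enumerate(subjects)}
    col_index = {po: k for k, po in enumerate(columns)}
    X = np.zeros((len(subjects), len(columns)), dtype=np.int8)
    for s, p, o in triples:
        X[row_index[s], col_index[(p, o)]] = 1
    return X, subjects, columns

# Hypothetical example triples.
triples = [
    ("Goethe", "bornIn", "Frankfurt"),
    ("Goethe", "type", "Writer"),
    ("Schiller", "bornIn", "Marbach"),
    ("Schiller", "type", "Writer"),
]
X, subjects, columns = triples_to_matrix(triples)
print(subjects)  # ['Goethe', 'Schiller']
print(columns)   # [('bornIn', 'Frankfurt'), ('bornIn', 'Marbach'), ('type', 'Writer')]
print(X)         # rows: Goethe -> [1 0 1], Schiller -> [0 1 1]
```

Entities that occur only as objects get no row here, matching the one-row-per-subject description; for a real triple store a sparse matrix would usually replace the dense NumPy array.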

Proposed Framework for Combining IE, DR and ML

Prediction of relations from unstructured information (IE step)

In principle, any IE system can be used
In our approach, we build a classifier to predict $P(X = 1 \mid \mathrm{IE})$, i.e., $P(X = 1 \mid \text{text}_{\text{subject}}, \text{text}_{\text{object}})$
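
The slides leave the IE classifier open. As one hedged illustration only, the sketch below trains a logistic regression on bag-of-words features of the subject and object texts for a single predicate, and adds subject-word/object-word conjunction features so that a linear model can capture which subject goes with which object. The feature scheme, the helper names (`tokens`, `featurize`, `train_logreg`, `predict_proba`) and the toy texts are all assumptions, not the authors' classifier:

```python
import numpy as np

def tokens(subject_text, object_text):
    """Hypothetical feature tokens for one (subject, object) pair."""
    s_words = subject_text.lower().split()
    o_words = object_text.lower().split()
    toks = ["s:" + w for w in s_words] + ["o:" + w for w in o_words]
    # Conjunctions let a linear model score subject/object interactions.
    toks += ["s:" + a + "&o:" + b for a in s_words for b in o_words]
    return toks

def featurize(pairs, vocab=None):
    """Count-vector features; the vocabulary is built on the training pairs."""
    docs = [tokens(s, o) for s, o in pairs]
    if vocab is None:
        vocab = {t: j for j, t in enumerate(sorted({t for d in docs for t in d}))}
    F = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for t in d:
            if t in vocab:  # tokens not seen in training are ignored
                F[i, vocab[t]] += 1.0
    return F, vocab

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(F, y, lr=0.5, steps=500, l2=1e-3):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    y = np.asarray(y, dtype=float)
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        err = sigmoid(F @ w + b) - y
        w -= lr * (F.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b

def predict_proba(F, w, b):
    """Estimate of P(X = 1 | text_subject, text_object)."""
    return sigmoid(F @ w + b)

# Toy training data (hypothetical): does the writer have this nationality?
pairs = [("german poet", "germany"), ("german poet", "france"),
         ("french novelist", "france"), ("french novelist", "germany")]
y = [1, 0, 1, 0]
F, vocab = featurize(pairs)
w, b = train_logreg(F, y)
p_train = predict_proba(F, w, b)
F_new, _ = featurize([("german playwright", "germany")], vocab)
p_new = predict_proba(F_new, w, b)
```

Without the conjunction features this toy set would be unlearnable for a linear model: each subject text and each object text occurs once with label 1 and once with label 0, so unigram weights alone cannot separate the pairs.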

Derivation of relations from the knowledge base (DR step)

Knowledge base: known triples and the triples added via deductive reasoning (calculation of the deductive closure)
Any reasoner can be used
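
Any reasoner can be used here. As a small hedged illustration (not an RDFS or OWL reasoner), the sketch below computes a deductive closure by naive forward chaining over hypothetical chain rules of the form (x, p1, y) ∧ (y, p2, z) ⇒ (x, p3, z). The rule format, the function name `deductive_closure` and the geographic example triples are assumptions, chosen to mirror the geo reasoning used in the YAGO2 experiment:

```python
def deductive_closure(triples, chain_rules):
    """Forward-chain until no new triples are derived (a fixpoint).

    Each hypothetical rule (p1, p2, p3) reads:
        (x, p1, y) and (y, p2, z)  =>  (x, p3, z)
    """
    kb = set(triples)
    while True:
        new = set()
        for p1, p2, p3 in chain_rules:
            # Index the second premise by its subject to join on y.
            by_subject = {}
            for s, p, o in kb:
                if p == p2:
                    by_subject.setdefault(s, []).append(o)
            for x, p, y in kb:
                if p == p1:
                    for z in by_subject.get(y, []):
                        t = (x, p3, z)
                        if t not in kb:
                            new.add(t)
        if not new:
            return kb
        kb |= new

triples = {
    ("Goethe", "bornIn", "Frankfurt"),
    ("Frankfurt", "locatedIn", "Hesse"),
    ("Hesse", "locatedIn", "Germany"),
}
rules = [
    ("locatedIn", "locatedIn", "locatedIn"),  # locatedIn is transitive
    ("bornIn", "locatedIn", "bornIn"),        # born in a place => born in its region
]
kb = deductive_closure(triples, rules)
# Derived: (Frankfurt, locatedIn, Germany), (Goethe, bornIn, Hesse),
#          (Goethe, bornIn, Germany)
```

The loop stops once a full pass adds nothing, so the returned set is closed under the given rules; a production system would use an existing reasoner instead of this quadratic join.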

Combination of IE step and DR step

$P(X = 1 \mid \mathrm{IE}, \mathrm{DR}) = \max\big(P(X = 1 \mid \mathrm{IE}),\, P(X = 1 \mid \mathrm{DR})\big)$
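
The max rule can be applied element-wise to the two probability matrices. A minimal NumPy sketch; the assumption that P(X = 1 | DR) is 1 for triples in the knowledge base (known or derived) and 0 otherwise is ours, as are the example numbers and the function name `combine_ie_dr`:

```python
import numpy as np

def combine_ie_dr(p_ie, p_dr):
    """P(X=1 | IE, DR) = max(P(X=1 | IE), P(X=1 | DR)), element-wise."""
    return np.maximum(p_ie, p_dr)

# Hypothetical 2 subjects x 3 (p, o) columns.
p_ie = np.array([[0.7, 0.1, 0.4],
                 [0.2, 0.9, 0.3]])
# Assumption: DR gives 1.0 for triples in the knowledge base, 0.0 elsewhere.
x_kb = np.array([[1, 0, 0],
                 [0, 0, 1]])
p = combine_ie_dr(p_ie, x_kb.astype(float))
print(p)  # [[1.  0.1 0.4]
          #  [0.2 0.9 1. ]]
```

Under this assumption a triple in the knowledge base always keeps probability 1, while every other entry passes the IE score through unchanged.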

Derivation of confidence values for predicted relations using a probabilistic latent factor model (ML step)

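Model description
We define a new parameterization with a continuous $f_{i,k}$ using $\mathrm{sig}(f_{i,k}) = P(X_{i,k} = 1 \mid \mathrm{IE}, \mathrm{DR})$, where sig is the logistic sigmoid
For each subject entity $e_i$ we introduce a $d$-dimensional latent variable $h_i \sim \mathcal{N}(0, I)$
For each subject entity $e_i$, $\alpha_i$ is generated via $\alpha_i = A h_i$, where $A$ has $d$ columns
Then we assume $f_{i,k} = \alpha_{i,k} + \epsilon_{i,k}$, with independent Gaussian noise $\epsilon_{i,k} \sim \mathcal{N}(0, \sigma^2)$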
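
To make the generative story concrete, the sketch below samples from it with NumPy. The Gaussian noise with standard deviation `sigma`, the function name `sample_model` and the example matrix `A` are assumptions for illustration:

```python
import numpy as np

def sample_model(n_subjects, A, sigma, seed=0):
    """Draw F from the latent factor model.

    h_i ~ N(0, I_d), alpha_i = A h_i, f_{i,k} = alpha_{i,k} + eps_{i,k},
    with eps assumed i.i.d. N(0, sigma^2); P(X_{i,k} = 1) = sig(f_{i,k}).
    """
    rng = np.random.default_rng(seed)
    K, d = A.shape                              # A: one row per (p, o) column, d columns
    H = rng.standard_normal((n_subjects, d))    # row i is h_i
    alpha = H @ A.T                             # row i is alpha_i = A h_i
    F = alpha + sigma * rng.standard_normal((n_subjects, K))
    P = 1.0 / (1.0 + np.exp(-F))                # sig(f_{i,k})
    return F, P, alpha

# Hypothetical K = 3 columns, d = 2 latent dimensions.
A = np.array([[2.0, 0.0],
              [2.0, 0.0],
              [0.0, 2.0]])
F, P, alpha = sample_model(1000, A, sigma=0.1)
```

Because the first two rows of `A` are identical, the first two columns of F share their latent signal and are strongly correlated; this is how the model expresses statistical dependencies between relations.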

The maximum likelihood solution can be written as

$$\hat{\alpha}_i = U_d \, \mathrm{diag}_d\!\left(\frac{\lambda_j - \hat{\sigma}^2}{\lambda_j}\right) U_d^{T} f_i$$

where $F$ is the matrix whose $i$-th row is $f_i^T$, and the columns of $U_d$ are the $d$ principal eigenvectors of the covariance matrix $C = F^T F$, with eigenvalues $\lambda_1, \ldots, \lambda_d$

Then $P(X_{i,k} = 1 \mid \mathrm{IE}, \mathrm{DR}, \mathrm{ML}) = \mathrm{sig}(\hat{\alpha}_{i,k})$
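
A NumPy sketch of this ML step, assuming the input is the matrix of combined probabilities P(X = 1 | IE, DR). Two details are not given on the slides and are our assumptions. First, probabilities are clipped before taking the logit, because DR produces exact 0s and 1s, whose logit is infinite. Second, σ̂² is taken as the mean of the discarded eigenvalues of C, the usual probabilistic-PCA estimate; computing it from the same eigenvalues makes the ratio (λ_j − σ̂²)/λ_j independent of whether C is normalised by the number of subjects. `fit_latent_factor` and `logit` are hypothetical names:

```python
import numpy as np

def logit(p, eps=1e-3):
    """Inverse of sig; clipping (an assumption) avoids infinite values for 0/1."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def fit_latent_factor(P, d):
    """Return sig(alpha_hat) for every entry of P.

    alpha_hat_i = U_d diag((lambda_j - sigma2) / lambda_j) U_d^T f_i, where
    U_d holds the d principal eigenvectors of C = F^T F.
    """
    F = logit(P)                       # f_{i,k} with sig(f_{i,k}) = P_{i,k}
    C = F.T @ F
    lam, U = np.linalg.eigh(C)         # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]      # sort descending
    lam, U = lam[order], U[:, order]
    # Assumption: PPCA-style estimate, mean of the discarded eigenvalues.
    sigma2 = lam[d:].mean() if d < len(lam) else 0.0
    U_d, lam_d = U[:, :d], lam[:d]
    shrink = (lam_d - sigma2) / lam_d
    # Row i equals (U_d diag(shrink) U_d^T f_i)^T.
    alpha_hat = F @ U_d @ np.diag(shrink) @ U_d.T
    return 1.0 / (1.0 + np.exp(-alpha_hat))

# Hypothetical input: the last subject lacks the second relation.
P = np.array([[0.9, 0.9, 0.1],
              [0.9, 0.9, 0.1],
              [0.9, 0.9, 0.1],
              [0.9, 0.1, 0.1]])
out = fit_latent_factor(P, d=1)
```

With d = 1, the missing second entry of the last row is pulled toward the pattern shared by the other rows (its probability rises above 0.5), while its third entry stays lower. Choosing d equal to the rank of F keeps every direction unshrunk and simply returns the clipped input.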

Experiments

Predicting gene-disease relationships using the LOD data sources Linked Life Data and BIO2RDF (2462 genes, 331 diseases)

Target: for a given gene, predict likely diseases
IE: text fields from literals

YAGO2 experiment: prediction of writers’ nationalities
ML: 354 writers, 4 countries, city of birth
ML + AGG: include as columns the country of birth, derived from the city of birth using geo reasoning (DR)
IE: unstructured data from the writers’ Wikipedia pages

Conclusion

IE: Exploit unstructured information

DR: Exploit axiomatic knowledge

ML: Exploit statistical patterns

We proposed an efficient way to combine ML, IE and DR in a probabilistic model

Thanks!
