Temporal Link Prediction in Knowledge Networks





Julia Perl, Jérôme Kunegis

Wikipedia Knowledge Network

Knowledge Network consists of articles which are interlinked

Nodes = Wikipedia articles

Links = Links between Wikipedia articles


Appropriate links provide instant pathways to locations within and outside the project that are likely to increase readers' understanding of the topic at hand. When writing or editing an article, it is important to consider not only what to put in the article, but what links to include to help the reader find related information [...] An article is said to be underlinked if words are not linked that are needed to aid understanding of the article. [...] An overlinked article contains an excessive number of links, making it difficult to identify links likely to aid the reader's understanding significantly. [http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style/Linking, last accessed on Dec. 14, 2013]


Our Research Questions

How to predict new interlinks between articles to avoid underlinking? (Link Prediction)

How to predict interlinks between articles that should be removed to avoid overlinking or wrong links? (Unlink Prediction)

Hypothesis: Structural changes can be predicted from the network structure.


Link and Unlink Prediction

Additions

Removals

Training

Link Prediction Problem

The Snapshot View

Unlink Prediction Problem


Unlink prediction is more difficult than link prediction

The snapshot view does not provide information on links that have been removed.


Temporal Link and Unlink Prediction

Prediction Models

Model 0: Baseline Model. Snapshot model: measures computed from the adjacency matrix

Model 1: Add-Remove Model. Classic adjacency matrix and removal adjacency matrix

Model 2: Temporal Add-Remove Model. Temporal values in the adjacency and removal adjacency matrices

Model 3: Temporal Preferential Attachment & Preferential Detachment. Estimate growth and decay for each node based on temporal evolution

Temporal data

Snapshot View

Hypothesis: Usage of temporal information improves the classification of links and unlinks significantly.

Model 0: Baseline Model
Snapshot View: all measures are computed from the adjacency matrix


Adjacency matrix A

Compute characteristics from A

d(i): Degree of article i

CN(i,j): Number of common neighbors of articles i and j

P3(i,j): Number of paths of length 3 between articles i and j
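The three snapshot measures above can be sketched with plain matrix products. This is a minimal illustration, not the authors' implementation: it assumes an undirected network stored as a symmetric 0/1 NumPy array, and the function name `snapshot_features` is hypothetical. Note that the cube of the adjacency matrix counts walks of length 3, which can revisit nodes, so it is only an approximation of a strict path count.

```python
import numpy as np

def snapshot_features(A):
    """Compute d(i), CN(i,j) and length-3 walk counts from adjacency matrix A.

    Assumes A is a symmetric 0/1 matrix with a zero diagonal (undirected
    network without self-links); this is an assumption of the sketch.
    """
    A = np.asarray(A, dtype=np.int64)
    degree = A.sum(axis=1)   # d(i): number of links of article i
    cn = A @ A               # CN(i,j): number of k linked to both i and j
    p3 = cn @ A              # walks of length 3 between i and j
    return degree, cn, p3

# Toy network with links 0-1, 1-2, 1-3, 2-3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
])
degree, cn, p3 = snapshot_features(A)
print(degree)     # [1 3 2 2]
print(cn[0, 2])   # 1 (article 1 is the only common neighbor of 0 and 2)
print(p3[0, 2])   # 1 (the walk 0-1-3-2)
```

For networks of Wikipedia scale, a sparse matrix representation would be needed; the dense arrays here are only for readability.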

Model 1: Add-Remove Model
Classic adjacency matrix and removal adjacency matrix

Adjacency matrix A

Removal adjacency matrix A⁻

Compute characteristics from A⁻

d⁻(i): Remove-degree of article i

dRatio(i): Ratio of deletes and adds

CN⁻(i,j)

P3⁻(i,j)
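One way to obtain the removal adjacency matrix and the features derived from it is to replay the edit history. The sketch below rests on several assumptions: the event format `(kind, i, j)`, the names `add_remove_features` and `A_rem`, and the definition of dRatio as removals divided by additions (with a guard against division by zero) are all hypothetical choices, since the slides do not specify them.

```python
import numpy as np

def add_remove_features(events, n):
    """Replay link additions and removals for an undirected network of n articles.

    events: iterable of (kind, i, j) with kind in {"add", "remove"}
            (hypothetical format chosen for this sketch).
    """
    A = np.zeros((n, n), dtype=np.int64)      # links present at the end
    A_rem = np.zeros((n, n), dtype=np.int64)  # links removed at least once
    adds = np.zeros(n)
    removes = np.zeros(n)
    for kind, i, j in events:
        if kind == "add":
            A[i, j] = A[j, i] = 1
            adds[i] += 1
            adds[j] += 1
        elif kind == "remove":
            A[i, j] = A[j, i] = 0
            A_rem[i, j] = A_rem[j, i] = 1
            removes[i] += 1
            removes[j] += 1
    remove_degree = A_rem.sum(axis=1)          # remove-degree of each article
    d_ratio = removes / np.maximum(adds, 1)    # assumed: deletes / adds
    cn_rem = A_rem @ A_rem                     # common neighbors in A_rem
    return A, A_rem, remove_degree, d_ratio, cn_rem

events = [
    ("add", 0, 1), ("add", 0, 2), ("add", 1, 2),
    ("remove", 0, 1), ("add", 2, 3), ("remove", 0, 2),
]
A, A_rem, remove_degree, d_ratio, cn_rem = add_remove_features(events, 4)
print(remove_degree)  # [2 1 1 0]
print(d_ratio[:2])    # [1.  0.5]
```

Paths of length 3 in the removal matrix can be approximated in the same way as for the adjacency matrix, by taking `cn_rem @ A_rem`.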

Model 2: Temporal Add-Remove Model
Temporal Values in adjacency and removal adjacency matrix

Distinguishes whether two articles connected to the same other articles recently or long ago.

More recent common neighbors imply a higher likelihood of a link

Functions that decrease with time

Seconds

Years
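The idea of weighting common neighbors by recency can be sketched as follows. The exponential half-life decay, the timestamp matrix `T` and the function names are assumptions made for illustration; the slides only state that the weighting functions decrease in time and do not give their form.

```python
import numpy as np

def decay_weight(age, half_life):
    """Weight that halves every `half_life` time units (assumed form)."""
    return 0.5 ** (np.asarray(age, dtype=float) / half_life)

def temporal_common_neighbors(T, now, half_life):
    """Common-neighbor score in which recent links count more.

    T[i, j] holds the time at which the link i-j was created,
    or np.nan if there is no link (hypothetical input format).
    """
    W = np.where(np.isnan(T), 0.0, decay_weight(now - T, half_life))
    return W @ W  # sum over k of w(i,k) * w(k,j)

n = 6
T = np.full((n, n), np.nan)
# Articles 0 and 1 both linked to article 2 recently (time 9)
T[0, 2] = T[2, 0] = T[1, 2] = T[2, 1] = 9.0
# Articles 3 and 4 both linked to article 5 long ago (time 0)
T[3, 5] = T[5, 3] = T[4, 5] = T[5, 4] = 0.0

tcn = temporal_common_neighbors(T, now=10.0, half_life=1.0)
print(tcn[0, 1])              # 0.25
print(tcn[0, 1] > tcn[3, 4])  # True
```

Both pairs have exactly one common neighbor in the snapshot view, yet the temporal score ranks the recently connected pair higher. The same weighting could be applied to the removal adjacency matrix.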

Model 3: Temporal PA & PD
Estimate growth and decay for each node based on temporal evolution

Preferential Attachment (PA): number of new links proportional to node degree

Disregards temporal evolution

Based on temporal evolution, estimate the number of

new links (Link Prediction)

removed links (Unlink Prediction)
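A very simple stand-in for such an estimate is to count each node's additions and removals in a recent time window and to score a pair by the product of the two counts, in analogy to the degree product used by classic preferential attachment. The windowed count, the event format `(t, kind, i, j)` and the function names are assumptions of this sketch; the slides do not describe the actual estimator.

```python
import numpy as np

def recent_activity(events, n, now, window, kind):
    """Count events of the given kind per node within (now - window, now].

    events: iterable of (t, kind, i, j) tuples (hypothetical format).
    """
    counts = np.zeros(n)
    for t, k, i, j in events:
        if k == kind and now - window < t <= now:
            counts[i] += 1
            counts[j] += 1
    return counts

def pair_score(activity, i, j):
    """Product of per-node estimates, analogous to d(i) * d(j)."""
    return activity[i] * activity[j]

events = [
    (1, "add", 0, 1), (2, "add", 0, 2),
    (8, "add", 3, 4), (9, "add", 3, 5),
    (9, "remove", 0, 1), (10, "add", 4, 5),
]
growth = recent_activity(events, 6, now=10, window=3, kind="add")
decay = recent_activity(events, 6, now=10, window=3, kind="remove")
print(growth)                    # [0. 0. 0. 2. 2. 2.]
print(decay)                     # [1. 1. 0. 0. 0. 0.]
print(pair_score(growth, 3, 4))  # 4.0
```

Here `growth` serves as the link prediction score basis and `decay` as the unlink prediction score basis. A fitted trend over several windows would be a natural refinement.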


Set Up

Five large Wikipedia datasets

Our datasets comprise several years (up to ten) of data.

Compute the AUC value of each feature for link and unlink prediction
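The AUC of a single feature can be computed directly from its definition: the probability that a randomly chosen positive pair (a link that was actually added, or removed) receives a higher score than a randomly chosen negative pair, with ties counted as one half. The sketch below assumes binary labels and uses the hypothetical function name `auc`; it compares all positive/negative pairs and is therefore only suitable for small samples.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Feature values for four candidate pairs; label 1 = link was added
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75
```

A value of 0.5 corresponds to a feature that ranks pairs no better than chance, and 1.0 to a perfect ranking.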

Ready for your feedback and other interesting datasets :)


Julia [email protected] Workshop 12/16/13


Julia [email protected] 05/14/2013
