Link Prediction
Topics in Data Mining, Fall 2015
Bruno Ribeiro
Overview
Deterministic Heuristic Methods
Matrix / Probabilistic Methods
Supervised Learning Approaches
Goal
Input: Snapshot of a social network at time t
Output: Predict edges that will be added to the network at time t’ > t
Product Recommendation
Example of a link prediction application
[Figure: user–product matrix with users i and products j; entries record purchases]
Twitter’s Who to Follow
Another example of a link prediction application
Other Applications
A few more examples:
Fraud detection (Beutel, Akoglu, Faloutsos, KDD’15) ◦ Focuses on discovering surprising link patterns
Anomaly detection (Rattigan et al., 2005) ◦ Focuses on finding unlikely existing links
Deterministic Heuristics
A Very Naïve Approach
Every entity is assigned a score
score(v) = how many friends v already has
General Approach
Assign a connection weight score(u, v) to each non-existing edge (u, v)
Rank edges and recommend the ones with the highest scores
Can be viewed as computing a measure of proximity or “similarity” between nodes u and v
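As an illustrative sketch of this recipe (the dict-of-sets graph representation and function names are my own, not from the slides), using common neighbors as the plug-in score:

```python
# Rank all non-existing edges by a pluggable score function and
# recommend the highest-scoring pairs.
from itertools import combinations

def rank_candidate_edges(adj, score):
    """Return non-existing edges (u, v) sorted by score, highest first."""
    candidates = [(u, v) for u, v in combinations(sorted(adj), 2)
                  if v not in adj[u]]
    return sorted(candidates, key=lambda e: score(adj, *e), reverse=True)

def common_neighbors(adj, u, v):
    # Number of neighbors shared by u and v
    return len(adj[u] & adj[v])

# Toy undirected graph as a dict of neighbor sets
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_candidate_edges(adj, common_neighbors)
print(ranked[0])  # ('a', 'd') — they share two neighbors, b and c
```

Any of the scores on the following slides can be swapped in for `common_neighbors`.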
Shortest Path
score(u, v) = negated length of the shortest path between u and v. All node pairs that share a neighbor get the same (top) score and are predicted as links
Problem: network diameter is often very small
Common Neighbors
score(u, v) = |N(u) ∩ N(v)|, the number of neighbors that u and v share
Problem: the score can be high simply because the vertices have many neighbors
Jaccard Similarity
Fixes the “too many neighbors” problem of common neighbors by dividing the intersection by the union: score(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|
Always between 0 and 1
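A minimal sketch on a dict-of-sets adjacency representation (graph and names illustrative):

```python
def jaccard(adj, u, v):
    """|N(u) ∩ N(v)| / |N(u) ∪ N(v)| — always in [0, 1]."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(jaccard(adj, "a", "b"))  # share c out of {a, b, c} -> 1/3
```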
Adamic / Adar
This score gives more weight to common neighbors that are not shared with many others: score(u, v) = Σ_{z ∈ N(u) ∩ N(v)} 1 / log |N(z)|
[Figure: Alice and Bob share a common neighbor with 50 neighbors (too active: less evidence of a missing link between Alice and Bob); Bob and Carol share one with 5 neighbors (less active: more evidence of a missing link between Bob and Carol)]
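A sketch mirroring the Alice/Bob/Carol example (the graph is a scaled-down stand-in; node names and degrees are illustrative):

```python
import math

def adamic_adar(adj, u, v):
    """Sum of 1/log|N(z)| over common neighbors z; rare neighbors weigh more."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])

# "hub" is a very active shared contact (degree 5), "mid" a less active
# one (degree 3), so bob–carol should outscore alice–bob.
adj = {
    "alice": {"hub"}, "bob": {"hub", "mid"}, "carol": {"mid"},
    "hub": {"alice", "bob", "x1", "x2", "x3"},
    "mid": {"bob", "carol", "x1"},
    "x1": {"hub", "mid"}, "x2": {"hub"}, "x3": {"hub"},
}
print(adamic_adar(adj, "bob", "carol") > adamic_adar(adj, "alice", "bob"))  # True
```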
Preferential Attachment
The probability of co-authorship of u and v is proportional to the product of their numbers of collaborators: score(u, v) = |N(u)| · |N(v)|
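The score is just a degree product; a one-line sketch (graph illustrative):

```python
def preferential_attachment(adj, u, v):
    """score(u, v) = |N(u)| * |N(v)|"""
    return len(adj[u]) * len(adj[v])

adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
print(preferential_attachment(adj, "b", "c"))  # 1 * 2 = 2
```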
Katz (1953)
Exponentially weighted sum of the number of paths of each length l: score(u, v) = Σ_{l=1}^{∞} α^l · |paths⟨l⟩(u, v)|
For small α << 1: predictions ≈ common neighbors
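A sketch of the Katz score computed by truncating the infinite series at a maximum walk length (pure Python; `max_len` and the matrix representation are my own choices):

```python
def katz_scores(A, alpha=0.05, max_len=6):
    """S[i][j] = sum over l=1..max_len of alpha^l * (#walks of length l i->j)."""
    n = len(A)

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    S = [[0.0] * n for _ in range(n)]
    P = [row[:] for row in A]   # A^1
    w = alpha
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                S[i][j] += w * P[i][j]
        P = matmul(P, A)        # next power of A counts longer walks
        w *= alpha
    return S

# Path graph 0-1-2: one walk of length 2 between nodes 0 and 2
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S = katz_scores(A, alpha=0.1)
```

With small α the l = 2 term (common neighbors) dominates for non-adjacent pairs, matching the slide's remark.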
Hitting Time
score(u, v) = −H(u, v), where H(u, v) is the expected random-walk hitting time from u to v
Personalized PageRank
Stationary probability that the walker is at v under the following random walk:
◦ With probability α, jump back to u
◦ With probability 1 − α, go to a random neighbor of the current node
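The random walk above can be sketched by power iteration (a fixed iteration count and the graph are illustrative choices):

```python
def personalized_pagerank(adj, u, alpha=0.15, iters=100):
    """Stationary distribution of the walk that teleports to root u w.p. alpha."""
    nodes = sorted(adj)
    p = {v: 0.0 for v in nodes}
    p[u] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[u] = alpha                       # teleport mass back to the root
        for v in nodes:
            share = (1 - alpha) * p[v] / len(adj[v])
            for w in adj[v]:                 # spread rest uniformly to neighbors
                nxt[w] += share
        p = nxt
    return p

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
pr = personalized_pagerank(adj, "a")
```

`pr[v]` is the score of candidate edge (a, v); nodes near the root accumulate more probability.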
SimRank
Defined only for directed graphs; N(u) and N(v) are the sets of in-neighbors of u and v
score(u, v) = C / (|N(u)| · |N(v)|) · Σ_{a ∈ N(u)} Σ_{b ∈ N(v)} score(a, b)
score(u, v) ∊ [0, 1]; if u = v then score(u, v) = 1
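An iterative sketch of the SimRank fixed point (tiny directed graph, decay constant C = 0.8 and iteration count are illustrative assumptions):

```python
def simrank(in_nbrs, C=0.8, iters=10):
    """in_nbrs maps node -> set of in-neighbors; returns s[(a, b)] scores."""
    nodes = sorted(in_nbrs)
    s = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        nxt = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    nxt[(a, b)] = 1.0
                elif in_nbrs[a] and in_nbrs[b]:
                    total = sum(s[(x, y)]
                                for x in in_nbrs[a] for y in in_nbrs[b])
                    nxt[(a, b)] = C * total / (len(in_nbrs[a]) * len(in_nbrs[b]))
                else:
                    nxt[(a, b)] = 0.0       # no in-neighbors -> no evidence
        s = nxt
    return s

# u and v are both pointed to by w, so they end up similar
in_nbrs = {"u": {"w"}, "v": {"w"}, "w": set()}
s = simrank(in_nbrs)
print(s[("u", "v")])  # C * s(w, w) = 0.8
```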
Latent Space Models
Low Rank Reconstruction
Represent the adjacency matrix A with a lower-rank matrix A_k:
A ≈ V_{n×k} U_{k×n} = A_k
If A_k(u, v) has a large value for a missing edge A(u, v) = 0, then recommend link (u, v)
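A sketch of the rank-k reconstruction via a truncated SVD (assumes numpy; the toy graph is illustrative, and on small symmetric examples the reconstructed value for a missing edge can be small, so the test only checks the approximation properties):

```python
import numpy as np

def low_rank(A, k):
    """Best rank-k approximation A_k of A (Eckart-Young, via truncated SVD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Triangle {0,1,2} with a tail 2-3-4
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
Ak = low_rank(A, k=2)
# Zero entries of A with large values in Ak are the recommended links
```

The reconstruction error can only shrink as k grows, and k = n recovers A exactly.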
Product Recommendations (non-negative matrix factorization)
Users have a propensity rate to buy a certain category of product (W)
Within a category, some products have a propensity rate to be bought (H)
[Figure: user–product matrix with users i and products j, factored into W and H]
Example Euclidean Generative Model
The problem of link prediction is to infer distances in the latent space, i.e., find the nearest neighbor in latent space that is not currently linked
Raftery, A. E., Handcock, M. S., & Hoff, P. D. (2002). Latent space approaches to social network analysis. J. Amer. Stat. Assoc., 15, 460.
[Figure: vertices embedded in the unit plane]
• Vertices uniformly distributed in a latent unit hyperplane
• Vertices close in latent space are more likely to be connected
• Probability of an edge = f(distance in latent space)
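A sketch of this generative model with a logistic f (the parameter values, the unit square, and all names are illustrative assumptions, not from the paper):

```python
import math
import random

def edge_probability(xu, xv, alpha=10.0, r=0.3):
    """Logistic link probability: 1/2 at distance r, decaying with steepness alpha."""
    d = math.dist(xu, xv)
    return 1.0 / (1.0 + math.exp(alpha * (d - r)))

def sample_graph(n, alpha=10.0, r=0.3, seed=0):
    """Uniform latent positions in the unit square; independent logistic edges."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    edges = {(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < edge_probability(pos[u], pos[v], alpha, r)}
    return pos, edges

pos, edges = sample_graph(20)
```

Prediction then amounts to recommending the unlinked pair with the smallest latent distance (highest edge probability).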
Bayesian Networks
Directed Graphical Models: Bayesian networks and Prob. Relational Models (Getoor et al., 2001; Neville & Jensen, 2003)
◦ Capture the dependence of link existence on attributes of entities
◦ Constrain the probabilistic dependencies to a directed acyclic graph
Undirected Graphical Models: Markov Networks
Whom To Follow and Why
Example of today’s approaches
Credit: N. Barbieri, F. Bonchi, G. Manco, KDD’14
Relational Markov Networks
A Relational Markov Network (RMN) specifies cliques and the potentials between attributes of related entities at the template level
A single model provides a distribution for the entire graph
Train the model to maximize the probability of the observed edges and labels
Use the trained model to predict edges and unknown attributes
Latent Space Rationale
Generative model:
[Figure: model parameters θ (the latent space) generate the observed edges; the link prediction heuristic asks which node u is the most likely missing neighbor of node v]
Relationship between Deterministic & Probabilistic Methods?
Yes (Sarkar, Chakrabarti, Moore, COLT’10)
[Figure: empirically, link prediction accuracy* increases from Random to Shortest Path to Common Neighbors to Adamic/Adar to an ensemble of short paths]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007
How do we justify these observations? Especially if the graph is sparse
Credit: Sarkar, Chakrabarti, Moore
Raftery’s Model (cont)
Two sources of randomness:
• Point positions: uniform in D-dimensional space
• Linkage probability: logistic with parameters α, r
• α, r, and D are known
[Figure: logistic link probability as a function of distance; probability ½ at radius r; α determines the steepness]
Credit: Sarkar, Chakrabarti, Moore
Raftery’s Model: Probability of New Edge
Logistic function integrated over the volume determined by d_ij
[Figure: nodes i and j at distance d_ij]
Credit: Sarkar, Chakrabarti, Moore
Connection to Common Neighbors
[Figure: nodes i and j with a common neighbor k inside radius r; the bound on the number of common neighbors involves the network size N, with an error term that decreases with N]
Empirical Bernstein bounds (Maurer & Pontil, 2009)
A distinct radius per node gives Adamic/Adar
Supervised Learning Approaches
Create a binary classifier that predicts whether an edge exists between two nodes
Types of Features
Label features: keywords, frequency of an item
Topological features: shortest distance, degree, centrality, PageRank index
Classifiers: SVM, decision trees, Deep Belief Networks (DBN), k-nearest neighbors (KNN), naive Bayes, radial basis functions (RBF), logistic regression
[Figure: features feed a classifier, which produces the link prediction heuristic]
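A sketch of turning node pairs into feature rows for such a classifier, showing only the topological features from the slide (degree, common neighbors, shortest distance); the graph and helper names are illustrative, and any off-the-shelf classifier could consume these rows:

```python
def shortest_distance(adj, u, v):
    """Breadth-first-search distance from u to v; None if unreachable."""
    seen, frontier, d = {u}, [u], 0
    while frontier:
        if v in frontier:
            return d
        nxt = []
        for x in frontier:
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    nxt.append(y)
        frontier, d = nxt, d + 1
    return None

def pair_features(adj, u, v):
    """One feature row per candidate edge (u, v)."""
    return {
        "deg_u": len(adj[u]),
        "deg_v": len(adj[v]),
        "common_neighbors": len(adj[u] & adj[v]),
        "distance": shortest_distance(adj, u, v),
    }

adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
feats = pair_features(adj, "a", "d")
print(feats)  # {'deg_u': 2, 'deg_v': 1, 'common_neighbors': 1, 'distance': 2}
```

Rows for existing edges (label 1) and sampled non-edges (label 0) then form the training set.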