Active Learning for Networked Data Based on Non-progressive Diffusion Model

Zhilin Yang, Jie Tang, Bin Xu, Chunxiao Xing

Dept. of Computer Science and Technology

Tsinghua University, China

An Example

[Figure sequence: a network of instances connected by correlation edges. The instances start unlabeled (?); some are then labeled +1 or -1, and one more unlabeled instance receives the label -1 after a query.]

Classify each instance into {+1, -1}

Query for label

Problem: Active Learning for Networked Data

[Figure: instances connected by correlation edges; some are labeled +1 or -1, the others are unlabeled (?).]

Challenge

It is expensive to query for labels!

Questions

Which instances should we select to query?

How many instances do we need to query to obtain an accurate classifier?

Challenges

Active Learning for Networked Data

How to leverage network correlation among instances?

How to query in a batch mode?

Batch Mode Active Learning for Networked Data

Given a graph $G = (V_L, V_U, E, y_L, X)$

$V_L$: labeled instances
$V_U$: unlabeled instances
$E$: edges
$y_L$: labels of labeled instances
$X$: feature matrix

Our objective is

$\max_{V_S \subseteq V_U} Q(V_S)$ subject to $|V_S| \le k$

$V_S$: a subset of unlabeled instances
$Q$: the utility function
$k$: labeling budget

Factor Graph Model

[Figure: a factor graph over the instances, showing variable nodes (unlabeled ones marked ?) and factor nodes.]

Factor Graph Model

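The joint probability

$P(y \mid y_L; \theta) = \frac{1}{Z} \exp\Big\{ \sum_{v_i \in V} \lambda^T f(x_i, y_i) + \sum_{(v_i, v_j) \in E} \beta^T g(y_i, y_j) \Big\}$

Local factor function: $f(x_i, y_i)$
Edge factor function: $g(y_i, y_j)$

Log likelihood of labeled instances

$O(\theta) = \log \sum_{y \mid y_L} \exp(\theta^T S) - \log \sum_{y} \exp(\theta^T S)$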

Factor Graph Model

Learning

Gradient descent

$\frac{\partial O(\theta)}{\partial \theta} = E_{P(y \mid y_L; \theta)}[S] - E_{P(y; \theta)}[S]$
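As a rough illustration of the log likelihood and its gradient, the sketch below evaluates both by brute-force enumeration on a three-node toy graph. Everything specific in it is an assumption made for illustration and is not taken from the slides: the feature values, the choice f(x_i, y_i) = y_i * x_i, the choice g(y_i, y_j) = 1 when the two labels agree, and all function names. S is taken to be the factor functions summed over nodes and edges, and theta the concatenation of lambda and beta. Enumeration only works for tiny graphs.

```python
import itertools
import math

# Toy data (illustrative assumptions, not from the slides).
X = {0: [1.0, 0.5], 1: [-0.3, 0.8], 2: [0.2, -1.0]}  # node -> feature vector
EDGES = [(0, 1), (1, 2)]
LABELED = {0: +1}                                     # known labels y_L


def stats(y):
    """Aggregated factor functions S(y): local part followed by edge part."""
    dim = len(next(iter(X.values())))
    local = [0.0] * dim
    for i, x in X.items():
        for d in range(dim):
            local[d] += y[i] * x[d]        # assumed f(x_i, y_i) = y_i * x_i
    agree = sum(1.0 for i, j in EDGES if y[i] == y[j])  # assumed g = 1[y_i == y_j]
    return local + [agree]


def assignments(clamped):
    """Yield every labeling in {-1, +1}^n that agrees with the clamped labels."""
    free = [v for v in sorted(X) if v not in clamped]
    for values in itertools.product([-1, +1], repeat=len(free)):
        y = dict(clamped)
        y.update(zip(free, values))
        yield y


def log_partition_and_mean(theta, clamped):
    """Return log sum_y exp(theta^T S(y)) and E[S] over the consistent labelings."""
    scored = []
    for y in assignments(clamped):
        s = stats(y)
        scored.append((sum(t * v for t, v in zip(theta, s)), s))
    top = max(score for score, _ in scored)           # for numerical stability
    z = sum(math.exp(score - top) for score, _ in scored)
    mean = [
        sum(math.exp(score - top) * s[d] for score, s in scored) / z
        for d in range(len(theta))
    ]
    return top + math.log(z), mean


def objective_and_gradient(theta):
    """O(theta) and dO/dtheta as the difference of the two expectations of S."""
    log_z_labeled, mean_labeled = log_partition_and_mean(theta, LABELED)
    log_z_all, mean_all = log_partition_and_mean(theta, {})
    gradient = [a - b for a, b in zip(mean_labeled, mean_all)]
    return log_z_labeled - log_z_all, gradient
```

Calling `objective_and_gradient([0.3, -0.2, 0.5])` returns the objective value and a three-component gradient (two entries for lambda, one for beta) that can be fed to any gradient-based optimizer.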

Calculate the expectation: Loopy Belief Propagation (LBP)

Message from variable to factor

$m_{y_i \to f}(x_i) = \prod_{f^* \in N(y_i) \setminus \{f\}} m_{f^* \to y_i}(x_i)$

Message from factor to variable

$m_{f \to y_i}(x_i) = \sum_{\sim \{y_i\}} f(x_f) \prod_{y_j \in N(f) \setminus \{y_i\}} m_{y_j \to f}(x_j)$
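The two message types can be sketched as a small sum-product routine over binary labels. This is a minimal illustration under assumptions, not the authors' implementation: the data layout (a factor is a pair of a scope tuple and a potential table), the flooding schedule, the fixed iteration count, and all names are hypothetical. On a tree the beliefs are exact marginals; on graphs with cycles they are only approximations.

```python
import itertools
import math

LABELS = (-1, +1)


def normalise(message):
    """Scale a message so that its entries sum to one."""
    total = sum(message.values())
    return {label: value / total for label, value in message.items()}


def loopy_bp(variables, factors, iterations=20):
    """Approximate marginals by passing messages on a factor graph.

    factors is a list of (scope, table) pairs; table maps a tuple of labels,
    one per variable in scope, to a positive potential.
    """
    neighbours = {
        v: [k for k, (scope, _) in enumerate(factors) if v in scope]
        for v in variables
    }
    uniform = {label: 1.0 for label in LABELS}
    var_to_fac = {(v, k): dict(uniform) for v in variables for k in neighbours[v]}
    fac_to_var = {(k, v): dict(uniform) for v in variables for k in neighbours[v]}

    for _ in range(iterations):
        # Variable -> factor: product of messages from the other factors.
        for (v, k) in var_to_fac:
            message = {
                label: math.prod(
                    fac_to_var[(h, v)][label] for h in neighbours[v] if h != k
                )
                for label in LABELS
            }
            var_to_fac[(v, k)] = normalise(message)
        # Factor -> variable: sum out every other variable in the scope.
        for (k, v) in fac_to_var:
            scope, table = factors[k]
            message = {label: 0.0 for label in LABELS}
            for assignment in itertools.product(LABELS, repeat=len(scope)):
                weight = table[assignment]
                for u, label_u in zip(scope, assignment):
                    if u != v:
                        weight *= var_to_fac[(u, k)][label_u]
                message[assignment[scope.index(v)]] += weight
            fac_to_var[(k, v)] = normalise(message)

    # Belief of a variable: product of all incoming factor messages.
    return {
        v: normalise(
            {
                label: math.prod(fac_to_var[(k, v)][label] for k in neighbours[v])
                for label in LABELS
            }
        )
        for v in variables
    }
```

For example, with one local factor on `a` and one edge factor between `a` and `b`, `loopy_bp(["a", "b"], factors)` returns a dictionary of per-variable label distributions from which expectations of the factor functions can be estimated.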

Question: How to select instances from the factor graph for active learning?

Basic principle: Maximize the Ripple Effects

[Figure sequence: a factor graph of unlabeled instances (?); one instance is then labeled +1.]

Labeling information is propagated

Statistical bias is propagated

How to model the propagation process in an unlabeled network?

Diffusion Model

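Linear Threshold Model

Each instance $v$ has a threshold $t(v)$
Each instance $v$ at time $\tau$ has two statuses: $f_\tau(v) = 0$ (inactive) or $f_\tau(v) = 1$ (active)
Each instance $v$ has a set of neighbors $N(v)$

Progressive Diffusion Model

$f_{\tau+1}(v) = 1$ iff $\sum_{u \in N(v)} f_\tau(u) \ge t(v)$ or $f_\tau(v) = 1$

Non-Progressive Diffusion Model

$f_{\tau+1}(v) = 1$ iff $\sum_{u \in N(v)} f_\tau(u) \ge t(v)$ (linear threshold)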
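The difference between the two update rules can be seen in a short simulation. The sketch below is an illustration under assumptions: updates are synchronous, a threshold counts active neighbors, and a step cap is added because the non-progressive rule may oscillate instead of converging. The function names and the adjacency-dictionary layout are hypothetical.

```python
def step(graph, thresholds, active, progressive=False):
    """Apply one synchronous linear-threshold update and return the new active set."""
    updated = set()
    for v, neighbours in graph.items():
        supported = len(neighbours & active) >= thresholds[v]
        # Progressive: an active instance stays active forever.
        # Non-progressive: an instance is active only while it is supported.
        if supported or (progressive and v in active):
            updated.add(v)
    return updated


def diffuse(graph, thresholds, seeds, progressive=False, max_steps=100):
    """Iterate until a fixed point; return (active set, whether it converged)."""
    active = set(seeds)
    for _ in range(max_steps):
        updated = step(graph, thresholds, active, progressive)
        if updated == active:
            return active, True
        active = updated
    return active, False


# A path a - b - c where every instance needs one active neighbor.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
need_one = {"a": 1, "b": 1, "c": 1}
progressive_result = diffuse(path, need_one, {"a"}, progressive=True)
non_progressive_result = diffuse(path, need_one, {"a"}, progressive=False)
```

On this path the progressive model activates all three instances from the single seed, while the non-progressive model alternates between {b} and {a, c} and never settles, so whether an instance ends up active depends on sustained support from its neighbors.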


[Figure: the network with one instance labeled +1 and the other instances unlabeled (?).]

Statistical bias is propagated

Will it be dominated by labeling information (active) or statistical bias (inactive)?

Based on non-progressive diffusion model

Maximize the number of activated instances in the end

An instance has an uncertainty measure

We aim to activate the most uncertain instances!

Instantiate the Problem

Active Learning Based on Non-Progressive Diffusion Model

$\max_{V_S \subseteq V_U} \max_{V_T \subseteq V_U} \{ |V_T| \}$ (the number of activated instances)

With constraints

$|V_S| \le k$

$\forall v \in V_S,\ f_0(v) = 1$ (initially activate all queried instances)

$\exists M$ s.t. $\forall v \in V_T,\ f_M(v) = 1$ (all instances in $V_T$ should be active after convergence)

$\forall v \in V_U \setminus V_T,\ \forall u \in V_T,\ \mu(v) \le \mu(u)$ (we activate the most uncertain instances)

$f_{\tau+1}(v) = 1 \iff \sum_{u \in N(v)} f_\tau(u) \ge t(v)$ (based on the non-progressive diffusion)
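To make the objective concrete, the sketch below solves a simplified version of the problem exhaustively on a four-node toy graph. It is an illustration under stated assumptions, not the method from the slides: it keeps the budget, the initial-activation constraint and the non-progressive update rule, but it ignores the uncertainty-ordering constraint, and it treats seed sets whose diffusion does not reach a fixed point as infeasible. The graph, thresholds and function names are hypothetical. The update rule is applied literally, so a seed stays active only if its own threshold is met.

```python
from itertools import combinations


def fixed_point(graph, thresholds, seeds, max_steps=50):
    """Run non-progressive diffusion; return the final active set or None."""
    active = set(seeds)
    for _ in range(max_steps):
        updated = {
            v for v, neighbours in graph.items()
            if len(neighbours & active) >= thresholds[v]
        }
        if updated == active:
            return active
        active = updated
    return None  # no fixed point reached within max_steps


def best_seed_set(graph, thresholds, k):
    """Try every seed set of size at most k and keep the one activating the most."""
    best_seeds, best_active = set(), set()
    for size in range(1, k + 1):
        for seeds in combinations(sorted(graph), size):
            final = fixed_point(graph, thresholds, seeds)
            if final is not None and len(final) > len(best_active):
                best_seeds, best_active = set(seeds), final
    return best_seeds, best_active


# Toy network: a triangle a-b-c with a pendant node d attached to c.
toy_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
toy_thresholds = {"a": 1, "b": 1, "c": 2, "d": 1}
seeds, activated = best_seed_set(toy_graph, toy_thresholds, k=2)
```

The search is exponential in the budget, so it only serves to check intuitions on very small graphs.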

Reduce the Problem

The original problem

Fix $|V_S|$, maximize $|V_T|$

The reduced problem

Fix $|V_T|$, minimize $|V_S|$

Constraints are inherited.

Reduction procedure

Enumerate $|V_T|$ by bisection. Solve the reduced problem.
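The reduction can be sketched as a binary search over the target size. The sketch assumes that the minimum number of queries needed never decreases as the target size grows, and that a target size of zero needs no queries; `min_seed_size` is a hypothetical stand-in for whatever routine solves the reduced problem.

```python
def largest_feasible_target(k, n_unlabeled, min_seed_size):
    """Return the largest target size whose minimum seed set fits the budget k.

    min_seed_size(t) is assumed to be non-decreasing in t and to return 0 for t = 0.
    """
    low, high = 0, n_unlabeled          # low is always a feasible target size
    while low < high:
        mid = (low + high + 1) // 2     # round up so the interval always shrinks
        if min_seed_size(mid) <= k:
            low = mid                   # mid is feasible, look for a larger target
        else:
            high = mid - 1              # mid is too expensive, look lower
    return low


# Toy oracle (assumption): every three target instances cost one query.
toy_oracle = lambda t: (t + 2) // 3
largest = largest_feasible_target(k=2, n_unlabeled=10, min_seed_size=toy_oracle)
```

With this toy oracle and a budget of two queries the search returns 6, using a logarithmic number of calls to the reduced-problem solver.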

Algorithm

The reduced problem

Fix $|V_T|$, minimize $|V_S|$

The key idea

Find a superset of $V_T$ such that there exists a subset $V_S$ of it: if we initially activate $V_S$, we can finally activate the whole superset.

Algorithm

Input: , for each instance
Output: $V_S$
Initialize $V_T$ to be the top $|V_T|$ uncertain instances;
For each iteration:
    Greedily select a set with minimum thresholds from , while satisfying the constraint that each instance has at least $t(v)$ neighbors in ;
    ;
    if then converges;
Greedily select a set with minimum degrees from , while satisfying the constraint that each instance has at least $t(v)$ neighbors in ;
Return $V_S$;

Theoretical Analysis

Convergence

Lemma 1 The algorithm will converge within $O(|V_U| - |V_T|)$ time.

Correctness

Theorem 1 If the algorithm converges, $V_S$ is a feasible solution, i.e., if we initially label $V_S$, we will activate $V_T$ finally.

Approximation Ratio

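Theorem 2 Let $V_{S,g}$ be the solution given by the algorithm and $V_{S,opt}$ represent the optimal solution. Let be the max degree of instances and suppose . Then we have

[Equation: a bound relating $|V_{S,g}|$ and $|V_{S,opt}|$, involving the terms $t(v)$ and $d(v)$ averaged over instances; the remaining symbols are not recoverable from this copy.]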

Experiments

Datasets

Dataset     #Variable nodes   #Factor nodes
Coauthor    6,096             24,468
Slashdot    370               1,686
Mobile      314               513
Enron       100               236

Comparison Methods

Batch Mode Active Learning (BMAL), proposed by Shi et al.
Influence Maximization Selection (IMS), proposed by Zhuang et al.
Maximum Uncertainty (MU)
Random (RAN)
Max Coverage (MaxCo), our method
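The two simplest baselines in this list, MU and RAN, can be sketched in a few lines. This is an assumption-laden illustration: the slides do not say which uncertainty measure is used, so the entropy of the predicted probability of the +1 label is taken here as one common choice, and the function names are hypothetical.

```python
import math
import random


def entropy(p_positive):
    """Binary entropy (in nats) of a predicted probability for the +1 label."""
    total = 0.0
    for p in (p_positive, 1.0 - p_positive):
        if p > 0.0:
            total -= p * math.log(p)
    return total


def max_uncertainty(marginals, k):
    """MU: pick the k unlabeled instances with the highest predictive entropy."""
    ranked = sorted(marginals, key=lambda v: (-entropy(marginals[v]), str(v)))
    return ranked[:k]


def random_selection(unlabeled, k, seed=0):
    """RAN: pick k unlabeled instances uniformly at random (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(sorted(unlabeled), k)


toy_marginals = {"a": 0.5, "b": 0.9, "c": 0.6, "d": 0.1}
mu_choice = max_uncertainty(toy_marginals, 2)
ran_choice = random_selection(toy_marginals, 2)
```

Neither baseline looks at the network structure, which is the gap the network-aware methods in the list aim to close.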

Experiments

Performance

Related Work

Active Learning for Networked Data

Actively learning to infer social ties
H. Zhuang, J. Tang, W. Tang, T. Lou, A. Chin and X. Wang

Batch mode active learning for networked data
L. Shi, Y. Zhao and J. Tang

Towards active learning on graphs: an error bound minimization approach
Q. Gu and J. Han

Integration of active learning in a collaborative CRF
O. Martinez and G. Tsechpenakis

Diffusion Model

On the non-progressive spread of influence through social networks
M. Fazli, M. Ghodsi, J. Habibi, P. J. Khalilabadi, V. Mirrokni and S. S. Sadeghabad

Maximizing the spread of influence through a social network
D. Kempe, J. Kleinberg and E. Tardos

Conclusion

Connect active learning for networked data to non-progressive diffusion model, and precisely formulate the problem

Propose an algorithm to solve the problem

Theoretically guarantee the convergence, correctness and approximation ratio of the algorithm

Empirically evaluate the performance of the algorithm on four datasets of different genres

Future work

Consider active learning for networked data in a streaming setting, where data distribution and network structure are changing over time

About Me

Zhilin Yang

3rd year undergraduate at Tsinghua Univ.

Applying for PhD programs this year

Data Mining & Machine Learning

Thanks!

