Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Baoxu Shi, Tim Weninger
University of Notre Dame


Page 1: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks


Page 2: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Homogeneous Network

MoDAT


Page 3: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Heterogeneous Network

[Figure: heterogeneous network. Node types: Association, People, University, City, Country, Conference, Workshop. Edge labels: "Belongs to", "Speaks at", "locates at", "the capital of", "affiliate", "Professor of"]

Page 4: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Heterogeneous Network

[Figure: heterogeneous network. Meta-type labels: People, Association, Meeting, Education, Geography. Edge labels: "Belongs to", "Speaks at", "locates at", "the capital of", "affiliate", "Professor at"]

Page 5: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Heterogeneous Network

[Figure: heterogeneous network. Meta-type labels: People, Association, Meeting, Education, Geography]

Page 6: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path and Meta-Path

[Figure: meta-type labels People, Meeting, Education, Geography, Association]

Page 7: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

How are things uniquely connected or separated?

Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND

Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY

An interesting meta-path is the meta-path that best describes how two objects are uniquely related in a complex HIN.
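As a small illustration of the path versus meta-path distinction on this slide, the sketch below replaces each node on a path with its type label. The lookup table `NODE_TYPE` and the helper `to_meta_path` are hypothetical names introduced only for this example; the type labels are the ones shown on the slide.

```python
from typing import Dict, List

# Hypothetical node-to-type lookup, using the labels shown on the slide.
NODE_TYPE: Dict[str, str] = {
    "Northeastern": "Education",
    "Barabási": "People",
    "Notre Dame": "Education",
    "South Bend": "Geography",
}


def to_meta_path(path: List[str], node_type: Dict[str, str]) -> List[str]:
    """Replace every node on a path with its type label."""
    return [node_type[node] for node in path]


path = ["Northeastern", "Barabási", "Notre Dame", "South Bend"]
meta_path = to_meta_path(path, NODE_TYPE)
print(" -> ".join(meta_path))
```

With a type hierarchy, the same node could map to a more specific label (for example "Professor" instead of "People"), which would yield a different meta-path for the same path.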

Page 8: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND

Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY

Meta-Path: Education → Professor → University → Geography

Page 9: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND

Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY

Meta-Path: Education → Professor → University → Geography

Meta-Path: Education → Network Scientist → Catholic University → Geography

Page 10: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path: NORTHEASTERN → BARABÁSI → NOTRE DAME → SOUTH BEND

Meta-Path: EDUCATION → PEOPLE → EDUCATION → GEOGRAPHY

Meta-Path: Education → Professor → University → Geography

Meta-Path: Education → Network Scientist → Catholic University → Geography

Meta-Path: Education → Network Scientist born in Transylvania, 1967 → Catholic University at South Bend, IN → Geography

Page 11: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Limitations of State-of-the-Art Meta-Path Research

• Types of meta-labels are limited

• Meta-types do not have a complex hierarchy

• Meta-paths are predefined manually

• No large-scale experiments

[Figure: example schema with the types Term, Venue, Paper, Author]

Page 12: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Limitations of State-of-the-Art Meta-Path Research

• Types of meta-labels are limited

• Meta-types do not have a complex hierarchy

• Meta-paths are predefined manually

• No large-scale experiments

A framework that can handle millions of meta-types

Meta-types with a complex hierarchy

Meta-paths are generated automatically

Experiments were run on Wikipedia (10 million nodes, 740 million edges)

Page 13: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

How to find interesting paths?

• Generate paths

• Rank the top-k interesting paths using meta-data

• Extract the meta-path for searching

Page 14: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path Generation

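• Generate path set for given points $a_u, a_v$

$$\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k\} \in X, \qquad \vec{x} = \langle a_1, a_2, \ldots, a_{|\vec{x}|} \rangle_i$$

$$a_u = a_1, \quad a_v = a_{|\vec{x}|}, \quad 1 \le i \le k$$

• Generate sibling path set

$$\mathrm{sib}(a_i, a_j) \iff a_i \in t \wedge a_j \in t$$

$$\forall a_{u'} \in A'_u,\ \mathrm{sib}(a_{u'}, a_u) \qquad \forall a_{v'} \in A'_v,\ \mathrm{sib}(a_{v'}, a_v)$$

$$\{\vec{y}_1, \vec{y}_2, \ldots\} \in Y, \qquad \vec{y} = \langle a_1, a_2, \ldots, a_{|\vec{y}|} \rangle_i$$

$$a_1 \in A'_u, \quad a_{|\vec{y}|} \in A'_v, \quad 1 \le i \le k$$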
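The two generation steps on this slide can be sketched on a toy graph as follows. This is only an illustration under stated assumptions: the slides do not say how paths are enumerated, so a bounded depth-first search over simple paths is assumed here; a node is excluded from its own sibling set; and all names (`simple_paths`, `siblings`, `path_sets`, the toy nodes and types) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]


def simple_paths(graph: Graph, source: str, target: str, max_len: int) -> List[List[str]]:
    """All simple paths from source to target with at most max_len nodes (iterative DFS)."""
    found: List[List[str]] = []
    stack: List[List[str]] = [[source]]
    while stack:
        path = stack.pop()
        last = path[-1]
        if last == target:
            found.append(path)
            continue
        if len(path) >= max_len:
            continue
        for nxt in sorted(graph.get(last, set())):
            if nxt not in path:  # keep the path simple (no repeated nodes)
                stack.append(path + [nxt])
    return found


def siblings(node: str, types: Dict[str, Set[str]]) -> Set[str]:
    """Nodes that share at least one type t with node; the node itself is excluded (assumption)."""
    own = types.get(node, set())
    return {other for other, ts in types.items() if other != node and ts & own}


def path_sets(graph: Graph, types: Dict[str, Set[str]], a_u: str, a_v: str,
              max_len: int) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (X, Y): paths between the given points, and paths between their siblings."""
    X = simple_paths(graph, a_u, a_v, max_len)
    Y: List[List[str]] = []
    for s in sorted(siblings(a_u, types)):
        for t in sorted(siblings(a_v, types)):
            if s != t:
                Y.extend(simple_paths(graph, s, t, max_len))
    return X, Y


# Toy undirected graph with abstract node names and type labels (all hypothetical).
edges = [("u", "m"), ("m", "v"), ("u2", "n"), ("n", "v2"), ("u", "n")]
graph: Graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

types = {"u": {"A"}, "u2": {"A"}, "v": {"B"}, "v2": {"B"}, "m": {"C"}, "n": {"C"}}

X, Y = path_sets(graph, types, "u", "v", max_len=4)
print("X:", X)
print("Y:", Y)
```

On a graph the size of Wikipedia, exhaustive enumeration like this would not be practical; the length bound `max_len` stands in for whatever restriction to short paths the actual system applies.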

Page 15: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Example: Path generation

[Figure: short paths X and sibling paths Y for the endpoints ROCKNE and WAND. Type labels shown: AMERICAN COMPUTER SCIENTISTS, PROGRAMMING LANGUAGE RESEARCHERS, UNIVERSITY OF NOTRE DAME FACULTY, COLLEGE FOOTBALL HALL OF FAME INDUCTEES. Sibling nodes shown: BARBARA LISKOV, ANDERS_HEJLSBERG, HAL ABELSON, VASANT HONAVAR, JOHN HEISMAN, BARRY SANDERS, JULIUS NIEUWLAND, BARABÁSI. Remaining siblings are collapsed as "983 Others", "79 Others", "21 Others", "204 Others", "1075 Others"]

Page 16: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Example: Path generation

[Figure: candidate paths. Node labels shown: ROCKNE, NOTRE DAME, BARABÁSI, NORTHEASTERN, WAND, ROSE BOWL, HARVARD, CY YOUNG, CARL HUBBELL, CARNEGIE MELLON UNIVERSITY, TD GARDEN, LA COLISEUM]

Which is the most interesting path?

Page 17: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path Ranking

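• Unordered Ranking

$$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle \qquad \vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$$

$$T_{\vec{x}_i} = \bigcup_{n=1}^{|\vec{x}_i|} T_{a_n}$$

$$T_Y = \bigcup_{i=1}^{|Y|} \{T_{\vec{y}_i}\} \qquad T_{\vec{y}_i} = T_{a'_u} \cup \bigcup_{n=2}^{|\vec{y}_i|-1} \{T_{a_n}\} \cup T_{a'_v}$$

$$r(\vec{x}_i) = \frac{|T_{\vec{x}_i} \cap T_Y|}{|T_Y|}$$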
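A minimal sketch of the unordered score, assuming that $T_Y$ is read as the flat union of all type labels found on the sibling paths and that each node carries a set of type labels. The helper names (`path_types`, `unordered_rank`) and the toy data are hypothetical, and returning 0 for an empty $T_Y$ is an added convention, not something stated on the slide.

```python
from typing import Dict, List, Set


def path_types(path: List[str], types: Dict[str, Set[str]]) -> Set[str]:
    """Union of the type sets of every node on the path (T_x for one path)."""
    out: Set[str] = set()
    for node in path:
        out |= types.get(node, set())
    return out


def unordered_rank(x: List[str], Y: List[List[str]], types: Dict[str, Set[str]]) -> float:
    """r(x) = |T_x ∩ T_Y| / |T_Y|, ignoring the position of nodes on the path."""
    t_x = path_types(x, types)
    t_Y: Set[str] = set()
    for y in Y:
        t_Y |= path_types(y, types)
    if not t_Y:
        return 0.0  # convention for an empty sibling type set (assumption)
    return len(t_x & t_Y) / len(t_Y)


# Toy data with abstract names (hypothetical).
types = {
    "u": {"A", "P"}, "m": {"C"}, "v": {"B"},
    "u2": {"A"}, "n": {"C", "D"}, "v2": {"B"},
}
x = ["u", "m", "v"]
Y = [["u2", "n", "v2"]]

score = unordered_rank(x, Y, types)
print(score)  # T_x = {A, P, C, B}, T_Y = {A, C, D, B}, overlap 3 of 4
```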

Page 18: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Path Ranking

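• Ordered Ranking

$$\vec{x}_i = \langle a_1, a_2, \ldots, a_{|\vec{x}_i|} \rangle \qquad \vec{T}_{\vec{x}_i} = \langle T_{a_1}, T_{a_2}, \ldots, T_{a_{|\vec{x}_i|}} \rangle$$

$$p(a_n, a'_n) = \frac{|T_{a_n} \cap T_{Y_n}|}{|T_{Y_n}|}$$

$$r(\vec{x}_i) = \operatorname{mean}_{n=1}^{|\vec{x}_i|} \left( p(a_n, a'_n) \right)$$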
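The ordered score can be sketched in the same way. The slide does not define $T_{Y_n}$ explicitly, so this example assumes it is the union of the type sets of the nodes at position $n$ across all sibling paths, with sibling paths shorter than $n$ skipped. The names `position_score` and `ordered_rank` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def position_score(node_types: Set[str], sibling_types: Set[str]) -> float:
    """p(a_n, a'_n) = |T_{a_n} ∩ T_{Y_n}| / |T_{Y_n}|, or 0 if T_{Y_n} is empty (assumption)."""
    if not sibling_types:
        return 0.0
    return len(node_types & sibling_types) / len(sibling_types)


def ordered_rank(x: List[str], Y: List[List[str]], types: Dict[str, Set[str]]) -> float:
    """Mean of the per-position scores along the path x."""
    scores: List[float] = []
    for n, node in enumerate(x):
        t_Yn: Set[str] = set()
        for y in Y:
            if n < len(y):  # skip sibling paths that have no node at position n
                t_Yn |= types.get(y[n], set())
        scores.append(position_score(types.get(node, set()), t_Yn))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data with abstract names (hypothetical).
types = {
    "u": {"A", "P"}, "m": {"C"}, "v": {"B"},
    "u2": {"A"}, "n": {"C", "D"}, "v2": {"B"},
}
x = ["u", "m", "v"]
Y = [["u2", "n", "v2"]]

ordered_score = ordered_rank(x, Y, types)
print(ordered_score)  # per-position scores: 1/1, 1/2, 1/1
```

Unlike the unordered variant, a type only counts here when it appears at the same position on a sibling path, so the order of nodes along the path affects the score.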

Page 19: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Result: Path Ranking

Qualitative analysis was done with Mechanical Turk workers.

Page 20: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

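Result: Path Ranking

[Figure: plot with axis ticks from 0 to 1 (0, 0.25, 0.5, 0.75, 1) and from 0.48 to 0.60 (0.48, 0.52, 0.56, 0.60)]

The results show that users are more likely to pick the path with the lowest or the highest similarity.

People may pick the path with the highest score because they treat "best" as "correct".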

Page 21: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Example: Extract Meta-Path

[Figure: a "Nodes" row with JIAWEI HAN, DATA MINING, SIGKDD, JOHANNES GEHRKE, and "Types" rows ordered from more specific to more general. Type labels shown: DATA MINERS, STATISTICIANS, MATHEMATICIANS, DATABASE RESEARCHERS, COMPUTER SCIENTISTS, SCHOLARS AND ACADEMICS, SCHOLARS, PEOPLE, DATA MINING, COMPUTATIONAL STATISTICS, MATHEMATICAL SCIENCES, STATISTICS, SCIENCE, ACM SIGS, ACM, PROFESSIONAL ORGANIZATIONS, SCIENTIFIC SOCIETIES, SOCIETY, ORGANIZATIONS]

Page 22: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Result: Meta-Path Constraint RWR

0 0.24 0.41 0.48

Edgar F. Codd 40.5 18.1 9.0

Johannes Gehrke 28.4 29.4 8.4 2.8

Raghu Ramakrishnan 31.1 6.0 3.6

Anita Borg 5.1 0.6 0.2

Shafi Goldwasser 4.9 0.6

Osmar R. Zaiane 4.8 3.6 1.6

Vint Cerf 4.1 2.4 0.2

Allen Newell 2.0 0.6

ACM 5.1

IEEE 4.9

Yahoo! Research 4.8

Microsoft Research 4.4

Database researchers

Computer Scientist

Page 23: Mining Interesting Meta-Paths from Complex Heterogeneous Information Networks

Questions?