Hypergraph Mining for Social Networks
Giacomo Bergami, Università di Bologna
July 17, 2014


Contents

1 Goals & State of the Art

2 Why Hypergraphs?

3 Data Mining Algebra

4 gSpan (The Original Algorithm; Specialization Proposal)

5 Conclusions & Future work


Goals & . . .

- Define a hypergraph data structure for both data and relations.
- Define some algebra operators.
- Open question: how to automate the mining process.
- Evaluation: online social network (OSN) data representation.
- Experiments: graph clustering of the Iris data set.


. . . & State of the Art

- Toon Calders’s Data Mining (Relational) Algebra.
- Data structures that have already been studied in detail:
   - data mining operations have been developed for both graphs and relational graphs;
   - well-known algorithms (with optimal computational cost) and operators exist.

While hypergraphs may be an intuitive representation of higher-order similarities, it seems (anecdotally, at least) that graphs lie at the heart of this problem.


Why Hypergraphs?

[Figure: the values tommy32, Tom S. and (lat1,long1), first as separate items and then grouped together into a single entity.]


Data Structures: E_D as data entities’ representation

[Figure: the users tommy32 (Tom S., (lat1,long1)) and sam (Adeola S., (lat2,long2)), and the posts 47561, 10231, 75234, 80235 and 11230, each represented as a data entity.]

Database with uncertain data.

Users <: Object, entities (E_D):

UserID    UserLocation   UserName        w   ϕ
tommy32   (lat1,long1)   Tom Sawyer      1   1000
sam       (lat2,long2)   Adeola Samuel   1   10002

Posts <: Object, entities (E_D):

PostID   PostLocation   PostContent             UserID    w   ϕ
47561    (lat3,long3)   Have a nice day!        tommy32   1   1
80235    (lat4,long4)   Some great news...      sam       1   2
10231    (NULL,NULL)    J.S. Bach and ...       tommy32   1   3
75234    (lat5,long5)   Telemann & Xenakis...   tommy32   1   4
11230    (NULL,NULL)    Yet another plot!       sam       1   5

Legend: collection, attributes, primary key, foreign key (Posts.UserID refers to Users.UserID).
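A minimal Python sketch of the two E_D tables above, assuming each tuple is a record that carries its weight w and its index ϕ next to its attributes. The names `User`, `Post` and `posts_by` are hypothetical, not taken from the slides. The sketch follows the Posts.UserID foreign key back to a user and checks that the ϕ indices are distinct across both tables.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: E_D tuples modelled as immutable records with extra fields
# w (weight, for uncertain data) and phi (entity index).
Location = Optional[Tuple[str, str]]  # None stands for (NULL,NULL)


@dataclass(frozen=True)
class User:
    user_id: str        # primary key
    location: Location
    name: str
    w: float
    phi: int


@dataclass(frozen=True)
class Post:
    post_id: int        # primary key
    location: Location
    content: str
    user_id: str        # foreign key referring to User.user_id
    w: float
    phi: int


users = [
    User("tommy32", ("lat1", "long1"), "Tom Sawyer", 1.0, 1000),
    User("sam", ("lat2", "long2"), "Adeola Samuel", 1.0, 10002),
]
posts = [
    Post(47561, ("lat3", "long3"), "Have a nice day!", "tommy32", 1.0, 1),
    Post(80235, ("lat4", "long4"), "Some great news...", "sam", 1.0, 2),
    Post(10231, None, "J.S. Bach and ...", "tommy32", 1.0, 3),
    Post(75234, ("lat5", "long5"), "Telemann & Xenakis...", "tommy32", 1.0, 4),
    Post(11230, None, "Yet another plot!", "sam", 1.0, 5),
]


def posts_by(user_id: str) -> List[int]:
    """Follow the Posts.UserID foreign key back to a user (hypothetical helper)."""
    return [p.post_id for p in posts if p.user_id == user_id]


# The indices phi must be distinct across both tables.
assert len({e.phi for e in [*users, *posts]}) == len(users) + len(posts)
```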


Data Structures: E_E as binary relationships’ representation

[Figure: the same entities, now linked by binary E_E relations, e.g. tommy32 with the posts 10231, 75234 and 47561, sam with the post 80235, and “follows” relations.]

Atomizing the information makes it possible to automatically identify relationships between data (e.g., users’ posts). If we keep binary E_E relations between E_D entities, we can retain a time complexity linear in the size of a binary graph G, O(|G|), where the E_D entities are mapped to G’s vertices.
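To illustrate the linear-time claim, here is a hedged sketch that maps each E_D entity to a vertex and each binary E_E relation to a labelled edge of an adjacency-list graph; building the graph and computing out-degrees each scan the input once, i.e. O(|V| + |E|). The relation name `wrote` and the helper names are hypothetical, and the edges are only the authorship links implied by the Posts.UserID foreign key.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, str, Hashable]  # (source, relation label, target)


def build_graph(vertices: Iterable[Hashable],
                relations: Iterable[Edge]) -> Dict[Hashable, List[Tuple[str, Hashable]]]:
    """Map every E_D entity to a vertex and every binary E_E relation to a
    labelled out-edge. Each input is scanned once: O(|V| + |E|)."""
    adj: Dict[Hashable, List[Tuple[str, Hashable]]] = {v: [] for v in vertices}
    for src, label, dst in relations:
        adj[src].append((label, dst))
    return adj


def out_degrees(adj: Dict[Hashable, list]) -> Dict[Hashable, int]:
    """One more linear pass over the adjacency lists."""
    return {v: len(out) for v, out in adj.items()}


vertices = ["tommy32", "sam", 47561, 80235, 10231, 75234, 11230]
# Authorship edges read off the Posts.UserID foreign key;
# "wrote" is a hypothetical relation label.
relations = [
    ("tommy32", "wrote", 47561), ("sam", "wrote", 80235),
    ("tommy32", "wrote", 10231), ("tommy32", "wrote", 75234),
    ("sam", "wrote", 11230),
]
graph = build_graph(vertices, relations)
```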


DHImp, the final step (DHImp = (db, T))

- A relational database expresses the data (E_D), while tensors express the correlations between data (E_E).
- This definition also allows tensors τ_i ∈ T for non-binary relations (OSN relations are mainly binary).
- It separates the operations over data from the operations over relations.
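One possible in-memory shape for DHImp = (db, T), sketched under the assumption that T is stored as a sparse map from (relation, indices) to a weight, so that missing entries read as 0 and relations of any arity fit. The class and method names, the `coLocated` relation and the direction of the `follows` example are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class DHImp:
    """Hypothetical sketch of DHImp = (db, T): db stores the data (E_D) as
    tables of rows, T stores the relations (E_E) as sparse tensors keyed by
    entity indices phi."""

    def __init__(self) -> None:
        self.db: Dict[str, List[dict]] = defaultdict(list)
        self.T: Dict[Tuple[str, Tuple[int, ...]], float] = {}

    # Operations over data only touch db ...
    def add_entity(self, table: str, row: dict) -> None:
        self.db[table].append(row)

    # ... while operations over relations only touch T.
    def relate(self, relation: str, *phis: int, weight: float = 1.0) -> None:
        """Any arity is accepted, so non-binary relations fit as well."""
        self.T[(relation, phis)] = weight

    def tensor(self, relation: str, *phis: int) -> float:
        """T[phi_1, ..., phi_n, relation]; absent entries read as 0."""
        return self.T.get((relation, phis), 0.0)


h = DHImp()
h.add_entity("Users", {"UserID": "tommy32", "w": 1.0, "phi": 1000})
h.add_entity("Users", {"UserID": "sam", "w": 1.0, "phi": 10002})
h.relate("follows", 10002, 1000)            # illustrative binary relation
h.relate("coLocated", 1, 2, 4, weight=0.5)  # illustrative ternary relation
```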


(Relational) Data Mining Algebra

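[Figure: the Data (D), Intensional (I) and Extensional (E) worlds, connected by operators such as Pop, π_A, π_RDA, κ and λ.]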

Toon Calders’s DMA identifies different “data” categories: the Data World, the Intensional World and the Extensional World.
- Some internal-world and external-world operations are defined.
- Data mining algorithms can be described with this algebra.
   - Is it possible to map these worlds onto our hypergraphs? Yes.
   - Is it possible to define an algebra for (weighted and indexed) hypergraphs? Yes.


Is it possible to map these worlds over our hypergraphs?

[Figure: DHImp drawn as a hypergraph, with vertices e_user, e_pos, e_id and f_user, f_pos, f_id, next to the corresponding pure I-hypergraph over {e}, {f} and {e, f}.]

HDM = (h, H, E_L)


Is it possible to define an algebra for (weighted and indexed) hypergraphs?

Definition (Database operations)

$$\mathrm{op}(DB) = \{\, \mathrm{op}(t) \mid t \in DB \,\}$$
$$DB_1 \bowtie DB_2 = \{\, t_1 \bowtie t_2 \mid t_1 \in DB_1 \land t_2 \in DB_2 \,\}$$

Definition (Index-consistency)
A unary database operation op is said to be index-consistent iff, for all the tables of the current database, the indices among the tables are kept distinct (similarly for binary operations).

Relational algebra operators over the D-World should be redefined to handle weight updates and indexing properties:
- The tables obtained as a result of algebraic operations are reindexed via dovetailing.
- All the relational algebra operations have to be proved index-consistent.
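The slides do not specify which dovetailing enumeration is used for reindexing; the sketch below assumes the Cantor pairing function, which walks N × N one anti-diagonal at a time. Under that assumption, giving each joined pair the index dovetail(ϕ_left, ϕ_right) yields pairwise-distinct indices inside the result table, and the original pair can always be recovered. The function names are hypothetical.

```python
from math import isqrt
from typing import Dict, Iterable, Tuple


def dovetail(i: int, j: int) -> int:
    """Assumption: dovetailing realized as Cantor pairing, enumerating
    N x N one anti-diagonal (i + j = const) at a time."""
    d = i + j
    return d * (d + 1) // 2 + j


def undovetail(z: int) -> Tuple[int, int]:
    """Inverse of dovetail: recover (i, j) from the new index z."""
    d = (isqrt(8 * z + 1) - 1) // 2  # anti-diagonal containing z
    j = z - d * (d + 1) // 2
    return d - j, j


def reindex_join(left_phis: Iterable[int],
                 right_phis: Iterable[int]) -> Dict[Tuple[int, int], int]:
    """Give each pair of joined tuples the index dovetail(phi_left, phi_right)."""
    right = list(right_phis)
    return {(a, b): dovetail(a, b) for a in left_phis for b in right}
```

Because the pairing is a bijection, distinct pairs of input indices can never collide in the joined table.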


Hypergraph Data Mining Example: H. Clustering via Binary Graph Clustering

[Figure: the 150 samples of the Iris data set, numbered 1 to 150. Real data (left) vs. clustered (right) values.]
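The slide does not state which binary graph clustering algorithm was used. As a stand-in (an assumption, not the author’s method), the sketch below reduces a weighted hypergraph to a binary graph by clique expansion and then takes connected components over the edges whose weight reaches a threshold. The toy hyperedges are made up for illustration and are not the Iris data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def clique_expansion(hyperedges: Iterable[Tuple[FrozenSet[int], float]]
                     ) -> Dict[Tuple[int, int], float]:
    """Turn each weighted hyperedge into a clique of binary edges;
    weights of pairs shared by several hyperedges are summed."""
    weights: Dict[Tuple[int, int], float] = {}
    for members, w in hyperedges:
        for u, v in combinations(sorted(members), 2):
            weights[(u, v)] = weights.get((u, v), 0.0) + w
    return weights


def components(vertices: Iterable[int], weights: Dict[Tuple[int, int], float],
               threshold: float) -> List[Set[int]]:
    """Stand-in clustering: connected components over edges with
    weight >= threshold, via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (u, v), w in weights.items():
        if w >= threshold:
            parent[find(u)] = find(v)
    groups: Dict[int, Set[int]] = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)


# Toy, made-up hypergraph: the weak hyperedge {3, 4} is cut at threshold 0.5.
hyper = [(frozenset({1, 2, 3}), 1.0), (frozenset({4, 5}), 1.0),
         (frozenset({3, 4}), 0.2), (frozenset({6}), 1.0)]
clusters = components(range(1, 7), clique_expansion(hyper), 0.5)
```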


Hypergraph Data Operators Example: 2nd Neighbours and Node Degree

[Figure: graph G with vertices <gyankos,Jack Bergus>, <jsbach,Johann Sebastian Bach>, <handel,G F Haendel>, <mozart,W A Mozart> and <faux,P.D.Q. Bach>; all the weights shown are 1.]


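$$G'' \leftarrow \mathrm{Calc}_{f(x)=1 \text{ as } Count}\left(\rho_{\{NScreenName \leftarrow ScreenName,\ NUserId \leftarrow UserId\}}(G)\right)$$

[Figure: resulting graph G''; all the weights shown are 1.]
<Jack Bergus,gyankos,1>
<Johann Sebastian Bach,jsbach,1>
<G F Haendel,handel,1>
<W A Mozart,mozart,1>
<P.D.Q. Bach,faux,1>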
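A small sketch of the two operators in the expression above over a list of Python dicts, assuming that ρ maps old attribute names to new ones and that Calc appends a computed attribute. The weight w is modelled as an ordinary attribute copied unchanged, which is an assumption about the weight-update rules; `rename` and `calc` are hypothetical names.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def rename(relation: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """rho: rename attributes, mapping old name -> new name.
    Assumption: other attributes (here the weight w) are copied unchanged."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]


def calc(relation: List[Row], name: str, f: Callable[[Row], object]) -> List[Row]:
    """Calc_{f as name}: extend every tuple with the computed attribute."""
    return [{**row, name: f(row)} for row in relation]


G = [
    {"UserId": "gyankos", "ScreenName": "Jack Bergus", "w": 1.0},
    {"UserId": "jsbach", "ScreenName": "Johann Sebastian Bach", "w": 1.0},
    {"UserId": "handel", "ScreenName": "G F Haendel", "w": 1.0},
    {"UserId": "mozart", "ScreenName": "W A Mozart", "w": 1.0},
    {"UserId": "faux", "ScreenName": "P.D.Q. Bach", "w": 1.0},
]
# G'' <- Calc_{f(x)=1 as Count}(rho_{NScreenName<-ScreenName, NUserId<-UserId}(G))
G2 = calc(rename(G, {"ScreenName": "NScreenName", "UserId": "NUserId"}),
          "Count", lambda row: 1)
```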


$$G''' \leftarrow \sigma_{P_{e \mapsto w(e) \geq 0.5}}\left(G \bowtie_{x,y \mapsto \varphi(x)=\varphi(y) \,\lor\, T[x,y,\mathit{hasFriend}] \neq 0} G''\right)$$

[Figure: the joined graph; the weights shown are 0.5 or 1.]
<jsbach,Johann Sebastian Bach,G F Haendel,handel,1>
<jsbach,Johann Sebastian Bach,P.D.Q. Bach,faux,1>
<gyankos,Jack Bergus,Jack Bergus,gyankos,1>
<handel,G F Haendel,G F Haendel,handel,1>
<mozart,W A Mozart,W A Mozart,mozart,1>
<gyankos,Jack Bergus,P.D.Q. Bach,faux,1>
<jsbach,Johann Sebastian Bach,Johann Sebastian Bach,jsbach,1>
<mozart,W A Mozart,G F Haendel,handel,1>
<faux,P.D.Q. Bach,Johann Sebastian Bach,jsbach,1>
<faux,P.D.Q. Bach,P.D.Q. Bach,faux,1>
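The join condition can be sketched as a nested-loop θ-join, assuming the UserId attribute plays the role of ϕ and that the non-zero entries of T[x, y, hasFriend] are exactly the five friendship pairs visible in the joined tuples listed above. The weight computation and the selection σ are omitted, since the slides do not give the weight-update rule. Names such as `theta_join` and `same_or_friend` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

NAMES = {
    "gyankos": "Jack Bergus", "jsbach": "Johann Sebastian Bach",
    "handel": "G F Haendel", "mozart": "W A Mozart", "faux": "P.D.Q. Bach",
}
# Assumption: non-zero entries of T[x, y, hasFriend], read off the joined tuples.
T: Set[Tuple[str, str, str]] = {
    ("jsbach", "handel", "hasFriend"), ("jsbach", "faux", "hasFriend"),
    ("gyankos", "faux", "hasFriend"), ("mozart", "handel", "hasFriend"),
    ("faux", "jsbach", "hasFriend"),
}
G = [{"UserId": u, "ScreenName": s} for u, s in NAMES.items()]
G2 = [{"NScreenName": s, "NUserId": u, "Count": 1} for u, s in NAMES.items()]


def theta_join(left: List[Dict], right: List[Dict], pred) -> List[Dict]:
    """Concatenate every pair of tuples (x, y) that satisfies pred(x, y)."""
    return [{**x, **y} for x in left for y in right if pred(x, y)]


def same_or_friend(x: Dict, y: Dict) -> bool:
    # phi(x) = phi(y) or T[x, y, hasFriend] != 0, with UserId standing in for phi.
    return (x["UserId"] == y["NUserId"]
            or (x["UserId"], y["NUserId"], "hasFriend") in T)


G3 = theta_join(G, G2, same_or_friend)  # weights and sigma not modelled
```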


$$G''' \leftarrow \sigma_{P_{e \mapsto w(e) \geq 0.25}}\left(\Gamma^{\mathrm{sum}(Count)-1 \text{ as } Count}_{\langle ScreenName,\, UserId \rangle}(G''')\right)$$

[Figure: the resulting graph; the weights shown range from 0.25 to 0.75.]
<W A Mozart,mozart,1>
<Jack Bergus,gyankos,1>
<P.D.Q. Bach,faux,1>
<Johann Sebastian Bach,jsbach,2>
<G F Haendel,handel,0>
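The grouping step can be reproduced with a hedged sketch: group the left-hand side of the joined tuples by ⟨ScreenName, UserId⟩, sum Count and subtract 1 to discard the ϕ(x) = ϕ(y) self-match, which leaves each user’s number of hasFriend links. The rows are rebuilt from the five friendship pairs visible in the joined tuples of the previous step; the final weight-based selection is omitted. `group_sum_minus_one` is a hypothetical name. With this data the result matches the Count column above: 2 for jsbach, 0 for handel and 1 for the others.

```python
from collections import Counter
from typing import Dict, Tuple

NAMES = {
    "gyankos": "Jack Bergus", "jsbach": "Johann Sebastian Bach",
    "handel": "G F Haendel", "mozart": "W A Mozart", "faux": "P.D.Q. Bach",
}
# Assumption: hasFriend pairs as they appear in the joined tuples.
FRIENDS = [("jsbach", "handel"), ("jsbach", "faux"), ("gyankos", "faux"),
           ("mozart", "handel"), ("faux", "jsbach")]
# Left-hand side of the ten joined tuples: one self-match per user
# plus one row per outgoing friendship, each with Count = 1.
rows = ([{"ScreenName": NAMES[u], "UserId": u, "Count": 1} for u in NAMES]
        + [{"ScreenName": NAMES[x], "UserId": x, "Count": 1} for x, _ in FRIENDS])


def group_sum_minus_one(rows, keys, attr) -> Dict[Tuple, int]:
    """Gamma^{sum(attr)-1 as attr}_{keys}: group by keys, sum attr, subtract 1
    (the 1 removes the phi(x) = phi(y) self-match)."""
    totals: Counter = Counter()
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[attr]
    return {k: total - 1 for k, total in totals.items()}


degree = group_sum_minus_one(rows, ("ScreenName", "UserId"), "Count")
```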


gSpan: The Original Algorithm

gSpan is a frequent subgraph mining algorithm that works as follows:

    gSpan(DB, minsupp, Solution) {
        Ξ ← sort(FrequentEdges(DB, minsupp))
        Solution ← Ξ
        NStack ← Solution
        while (g ← NStack.pop()) {
            if g ≠ minDfsCode(g) continue        // g is not in canonical (minimal DFS code) form
            Solution ← Solution ∪ {g}
            ∀e ∈ Ξ. if (e ⋄re g) ⇒ {            // if e ⋄re g is a rightmost expansion of g by e
                if GS(e ⋄re g, DB) ≥ minsupp    // support of the expanded graph in DB
                    NStack.push(e ⋄re g)
            }
        }
    }

Listing 1: gSpan
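The first step of the listing, sort(FrequentEdges(DB, minsupp)), can be sketched as follows, assuming each graph is given as a list of directed labelled edges (source label, edge label, target label) and that the sort is plain lexicographic order on those triples; gSpan proper orders edges by its DFS lexicographic order. The canonical-form test minDfsCode and the rightmost expansion are not shown, and `frequent_edges` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List, Tuple

LabelledEdge = Tuple[str, str, str]  # (source label, edge label, target label)


def frequent_edges(db: Iterable[Iterable[LabelledEdge]],
                   minsupp: int) -> List[LabelledEdge]:
    """Support of an edge = number of graphs in db that contain it at least once.
    Assumption: results are returned in lexicographic order of the label triples."""
    support: Counter = Counter()
    for graph in db:
        support.update(set(graph))  # count each edge once per graph
    return sorted(e for e, s in support.items() if s >= minsupp)
```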


gSpan: Specialization Proposal
https://github.com/jackbergus/gSpanExtended

Suppose we view DHImp = (db, T) as a directed (binary) graph:

$$\left(\{\varphi(e) \mid t \in db \land e \in t\},\ \{(e, f) \mid \exists k.\ T[e, f, k] \neq 0\}\right)$$

- We suppose that our HDM contains only one vertex per entity instance. Hence each vertex has a unique label, namely its data representation (D(e)) or its index (ϕ(e)).
- Suppose that the relation label is λ(e) = λ(v, w) = (ϕ(v), T(e), ϕ(w)); since each vertex is unique, each edge has a unique label.
- The gSpan algorithm guarantees that the minimal DFS code is unique for each graph; this specialization strengthens that property.
- Subgraph isomorphism over distinct edges and vertices reduces to an approximated ordered-subset test that can be implemented in linear time.
- With this specialization, the overall algorithm has polynomial time complexity.
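Under the unique-label assumption above, a pattern occurs in a graph exactly when its edge labels λ(v, w) = (ϕ(v), T(e), ϕ(w)) form a subset of the graph’s edge labels. A hedged sketch of that test: with both label lists kept sorted, a single merge pass decides inclusion in linear time. The helper names are hypothetical, and this exact inclusion test is an illustration rather than the approximated test implemented in the gSpanExtended repository.

```python
from typing import Iterable, Sequence, Tuple

EdgeLabel = Tuple[int, str, int]  # lambda(v, w) = (phi(v), T(e), phi(w))


def is_ordered_subset(small: Sequence[EdgeLabel], large: Sequence[EdgeLabel]) -> bool:
    """Merge-style test that every label of sorted `small` occurs in sorted
    `large`; runs in O(len(small) + len(large))."""
    i = 0
    for item in large:
        if i == len(small):
            break
        if small[i] == item:
            i += 1
        elif small[i] < item:
            return False  # small[i] was skipped and cannot appear later
    return i == len(small)


def support(pattern: Iterable[EdgeLabel], db: Iterable[Iterable[EdgeLabel]]) -> int:
    """Number of graphs in db whose edge labels include every edge of pattern.
    Sorting here is only for self-containment; keeping lists sorted avoids it."""
    p = sorted(set(pattern))
    return sum(is_ordered_subset(p, sorted(set(g))) for g in db)
```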


Conclusions & Future work

Conclusion
- We used R to experiment with the data mining operations. Some Java implementations of the algebra are available (https://github.com/jackbergus/hypergraphalgebra/).
- Hypergraphs express n-ary relations, and hypergraph operations can change both data and relations.
- Hypergraph problems can be reduced to graph ones.

Future Work
- We could study the time complexity of each data mining operator for hypergraphs and complete the definition of the algebra.
- We could set up an environment in which such hypergraph algebraic operations are executed with distributed or parallel algorithms.
