Overlapping Communities

Graph Mining course Winter Semester 2017

Davide Mottin, Hasso Plattner Institute

Acknowledgements

§ Most of this lecture is taken from: http://web.stanford.edu/class/cs224w/slides

GRAPH MINING WS 2017 2

Lecture road

Introduction to graph clustering

Hierarchical approaches

Spectral clustering

Identifying Communities

Nodes: Football teams

Edges: Games played

Can we identify node groups? (communities, modules, clusters)

College Football Network

[Figure: college football network; labels: Atlantic, Football cups/conferences]

Nodes: Football teams

Edges: Games played

Protein-Protein Interactions

Can we identify functional modules?

Nodes: Proteins

Edges: Physical interactions

Protein-Protein Interactions

Functional modules

Nodes: Proteins

Edges: Physical interactions

Facebook Network

Can we identify social communities?

Nodes: Facebook users

Edges: Friendships

Facebook Network

Social communities: High school, Summer internship, Stanford (Squash), Stanford (Basketball)

Nodes: Facebook users

Edges: Friendships

Overlapping Communities

§ Non-overlapping vs. overlapping communities

Non-overlapping Communities

[Figure: a network and its adjacency matrix; both matrix axes are labeled Nodes]

Communities as Tiles!

§ What is the structure of community overlaps? Edge density in the overlaps is higher!

[Figure: communities as “tiles”]

Recap so far…

This is what we want: communities in a network.

Plan of attack

§ 1) Given a model, we generate the network:

[Figure: generative model for networks → network with nodes A–H]

§ 2) Given a network, find the “best” model:

[Figure: network with nodes A–H → generative model for networks]

Model of networks

§ Goal: Define a model that can generate networks

• The model will have a set of “parameters” that we will later want to estimate (and detect communities)

§ Q: Given a set of nodes, how do communities “generate” edges of the network?

[Figure: generative model for networks → network with nodes A–H]

Community-Affiliation Graph

§ Generative model $B(V, C, M, \{p_c\})$ for graphs:

• Nodes $V$, communities $C$, memberships $M$

• Each community $c$ has a single probability $p_c$

• Later we fit the model to networks to detect communities

[Figure: bipartite model with communities $C$ (probabilities $p_A$, $p_B$), nodes $V$, and memberships $M$ linking them, next to the network it generates]

AGM: Generative Process

§ AGM generates the links:

• For each pair of nodes in community $A$, we connect them with prob. $p_A$

• The overall edge probability is:

$$P(u, v) = 1 - \prod_{c \in M_u \cap M_v} (1 - p_c)$$

• $M_u$ … set of communities node $u$ belongs to

• If $u, v$ share no communities: $P(u, v) = \varepsilon$

• Think of this as an “OR” function: If at least 1 community says “YES” we create an edge

[Figure: community affiliations (communities $C$ with $p_A$, $p_B$; nodes $V$; memberships $M$) and the resulting network]
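The generative process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `agm_edge_prob` and `sample_agm`, the toy memberships, the community probabilities, and the default value of `eps` are all hypothetical and do not come from the lecture.

```python
import itertools
import random

def agm_edge_prob(u, v, memberships, p, eps=1e-4):
    """P(u, v) = 1 - prod over shared communities c of (1 - p_c).

    Returns eps when u and v share no community (eps is an assumed value).
    """
    shared = memberships[u] & memberships[v]
    if not shared:
        return eps
    prob_no_edge = 1.0
    for c in shared:
        prob_no_edge *= 1.0 - p[c]
    return 1.0 - prob_no_edge

def sample_agm(memberships, p, eps=1e-4, seed=0):
    """Draw one undirected network: one biased coin per unordered node pair."""
    rng = random.Random(seed)
    edges = set()
    for u, v in itertools.combinations(sorted(memberships), 2):
        if rng.random() < agm_edge_prob(u, v, memberships, p, eps):
            edges.add((u, v))
    return edges

# Hypothetical toy model: two overlapping communities A and B.
memberships = {"u": {"A"}, "v": {"A", "B"}, "w": {"A", "B"}, "x": {"B"}}
p = {"A": 0.5, "B": 0.2}
edges = sample_agm(memberships, p)
```

Nodes `v` and `w` share both communities, so their edge probability is 1 - (1 - 0.5)(1 - 0.2) = 0.6, higher than for pairs that share only one community. This matches the "OR" intuition of the model.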

Recap: AGM networks

[Figure: an AGM model and the network it generates]

AGM: Flexibility

§ AGM can express a variety of community structures: Non-overlapping, Overlapping, Nested

Detecting Communities

§ Detecting communities with AGM:

[Figure: the model generates the network with nodes A–H; from the network we infer the model]

Given a graph $G(V, E)$, find the model:

1) Affiliation graph $M$

2) Number of communities $C$

3) Parameters $p_c$

Maximum Likelihood Estimation

§ Maximum Likelihood Principle (MLE):

• Given: Data $X$

• Assumption: Data is generated by some model $f(\Theta)$

⁃ $f$ … model

⁃ $\Theta$ … model parameters

• Want to estimate $P_f(X \mid \Theta)$:

⁃ The probability that our model $f$ (with parameters $\Theta$) generated the data

• Now let’s find the most likely model that could have generated the data:

$$\arg\max_{\Theta} P_f(X \mid \Theta)$$

Example: MLE

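§ Imagine we are given a set of coin flips

§ Task: Figure out the bias of a coin!

• Data: Sequence of coin flips: $X = [1, 0, 0, 0, 1, 0, 0, 1]$

• Model: $f(\Theta)$ = return 1 with prob. $\Theta$, else return 0

• What is $P_f(X \mid \Theta)$? Assuming coin flips are independent:

⁃ So, $P_f(X \mid \Theta) = P_f(1 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdot P_f(0 \mid \Theta) \cdots P_f(1 \mid \Theta)$

▪ What is $P_f(1 \mid \Theta)$? Simple, $P_f(1 \mid \Theta) = \Theta$

⁃ Then, $P_f(X \mid \Theta) = \Theta^3 (1 - \Theta)^5$

⁃ For example:

▪ $P_f(X \mid \Theta = 0.5) = 0.003906$

▪ $P_f(X \mid \Theta = 3/8) = 0.005029$

• What did we learn? Our data was most likely generated by a coin with bias $\Theta = 3/8$

[Figure: plot of $P_f(X \mid \Theta)$ against $\Theta$, with its maximum at $\Theta^* = 3/8$]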
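The coin example can be checked numerically. The sketch below evaluates the likelihood on a grid of candidate biases and picks the best one. The function name `coin_likelihood` and the grid resolution are illustrative choices, not part of the lecture.

```python
def coin_likelihood(flips, theta):
    """P_f(X | theta) for independent flips: theta^(#ones) * (1 - theta)^(#zeros)."""
    ones = sum(flips)
    zeros = len(flips) - ones
    return theta ** ones * (1.0 - theta) ** zeros

flips = [1, 0, 0, 0, 1, 0, 0, 1]

# Brute-force search over candidate biases 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
theta_star = max(grid, key=lambda t: coin_likelihood(flips, t))

print(theta_star)                              # 0.375, i.e. 3/8
print(round(coin_likelihood(flips, 0.5), 6))   # 0.003906
print(round(coin_likelihood(flips, 3 / 8), 6)) # 0.005029
```

The grid search agrees with the closed-form answer, the fraction of ones in the data (3 out of 8).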

MLE for Graphs

§ How do we do MLE for graphs?

• Model generates a probabilistic adjacency matrix

• We then flip a biased coin for each entry of the probabilistic matrix to obtain the binary adjacency matrix $A$

For every pair of nodes $u, v$, AGM gives the prob. $p_{uv}$ of them being linked:

0     0.9   0.4   0.04
0.1   0     0.85  0.75
0.1   0.77  0     0.6
0.04  0.65  0.7   0

Flip biased coins to obtain $A$:

0  1  0  0
1  0  1  1
0  1  0  1
0  1  1  0

§ The likelihood of AGM generating graph $G$:

$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$
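The "flip biased coins" step can be sketched as follows, flipping one independent coin per matrix entry as on the slide. The helper name `flip_biased_coins` and the fixed seed are assumptions made for illustration; the probability matrix is the one from the slide.

```python
import random

def flip_biased_coins(prob, seed=0):
    """Turn a probabilistic adjacency matrix into a binary one, entry by entry."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p else 0 for p in row] for row in prob]

# Probabilistic adjacency matrix from the slide.
prob = [
    [0.0,  0.9,  0.4,  0.04],
    [0.1,  0.0,  0.85, 0.75],
    [0.1,  0.77, 0.0,  0.6],
    [0.04, 0.65, 0.7,  0.0],
]
A = flip_biased_coins(prob)
```

Entries with probability 0, such as the diagonal, can never become edges, because `rng.random()` returns values in [0, 1).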

Graphs: Likelihood P(G|Θ)

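Given graph $G(V, E)$ and $\Theta$, we calculate the likelihood that $\Theta$ generated $G$: $P(G \mid \Theta)$

Adjacency matrix of $G$:

0  1  1  0
1  0  1  0
1  1  0  1
0  0  1  0

Edge probabilities under $\Theta = B(V, C, M, \{p_c\})$:

0    0.9  0.9  0
0.9  0    0.9  0
0.9  0.9  0    0.9
0    0    0.9  0

$$P(G \mid \Theta) = \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$

[Figure: model $\Theta$ with communities A and B, and the graph $G$]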
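As a worked instance of the likelihood formula, the sketch below evaluates it for the 4-node example on this slide. It assumes an undirected graph, so each unordered pair is counted once, and it works in log space. The function name `graph_log_likelihood` is illustrative.

```python
import itertools
import math

def graph_log_likelihood(adj, prob):
    """log P(G | Theta) over unordered node pairs (undirected graph assumed)."""
    total = 0.0
    for u, v in itertools.combinations(range(len(adj)), 2):
        p = prob[u][v]
        if adj[u][v]:
            # Edge present: contributes log P(u, v).
            total += math.log(p) if p > 0 else -math.inf
        else:
            # Edge absent: contributes log(1 - P(u, v)).
            total += math.log(1.0 - p) if p < 1 else -math.inf
    return total

adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
prob = [
    [0.0, 0.9, 0.9, 0.0],
    [0.9, 0.0, 0.9, 0.0],
    [0.9, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
likelihood = math.exp(graph_log_likelihood(adj, prob))
```

Here the four edges each contribute a factor 0.9 and the two non-edges each contribute a factor 1, so the likelihood is 0.9^4 = 0.6561.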

MLE for Graphs

§ Our goal: Find $\Theta = B(V, C, M, \{p_C\})$ such that:

$$\arg\max_{\Theta} P(G \mid \Theta)$$

§ How do we find $B(V, C, M, \{p_C\})$ that maximizes the likelihood?

MLE for AGM

§ Our goal is to find $B(V, C, M, \{p_C\})$ such that:

$$\arg\max_{B(V, C, M, \{p_C\})} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$

§ Problem: Finding $B$ means finding the bipartite affiliation network.

• There is no nice way to do this.

• Fitting $B(V, C, M, \{p_C\})$ is too hard, let’s change the model (so it is easier to fit)!

From AGM to BigCLAM

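§ Relaxation: Memberships have strengths

• $F_{uA}$: The membership strength of node $u$ to community $A$ ($F_{uA} = 0$: no membership)

• Each community $A$ links nodes independently:

$$P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$$

[Figure: nodes $u$ and $v$ attached to a community by weighted membership edges such as $F_{uA}$]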

Factor Matrix $F$

§ Community membership strength matrix $F$

[Figure: matrix $F$ with nodes as rows and communities as columns]

• $F_{uA}$ … strength of $u$’s membership to $A$

• $F_u$ … vector of community membership strengths of $u$

§ $P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$

• The probability of connection grows with the product of strengths

• Notice: If one node doesn’t belong to the community ($F_{uA} = 0$) then $P_A(u, v) = 0$

§ Prob. that at least one common community $C$ links the nodes:

$$P(u, v) = 1 - \prod_{C} (1 - P_C(u, v))$$

From AGM to BigCLAM

§ Community $A$ links nodes $u, v$ independently:

$$P_A(u, v) = 1 - \exp(-F_{uA} \cdot F_{vA})$$

§ Then prob. at least one common $C$ links them:

$$P(u, v) = 1 - \prod_{C} (1 - P_C(u, v)) = 1 - \exp\Big(-\sum_{C} F_{uC} \cdot F_{vC}\Big) = 1 - \exp(-F_u \cdot F_v^T)$$

§ Example $F$ matrix (node community membership strengths):

$F_u$: 0    1.2  0    0.2
$F_v$: 0.5  0    0    0.8
$F_w$: 0    1.8  1    0

Then: $F_u \cdot F_v^T = 0.16$

And: $P(u, v) = 1 - \exp(-0.16) \approx 0.15$

But: $P(u, w) \approx 0.88$ and $P(v, w) = 0$
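The example values can be reproduced with a few lines of Python. The helper name `bigclam_edge_prob` and the dictionary layout are illustrative assumptions; the membership strengths are the ones from the example matrix.

```python
import math

def bigclam_edge_prob(f_u, f_v):
    """P(u, v) = 1 - exp(-F_u . F_v^T) for two membership-strength vectors."""
    dot = sum(a * b for a, b in zip(f_u, f_v))
    return 1.0 - math.exp(-dot)

# Rows of the example F matrix.
F = {
    "u": [0.0, 1.2, 0.0, 0.2],
    "v": [0.5, 0.0, 0.0, 0.8],
    "w": [0.0, 1.8, 1.0, 0.0],
}

p_uv = bigclam_edge_prob(F["u"], F["v"])  # about 0.148
p_uw = bigclam_edge_prob(F["u"], F["w"])  # about 0.885
p_vw = bigclam_edge_prob(F["v"], F["w"])  # exactly 0: no shared community
```

Nodes `u` and `w` both have strong membership in the second community, which is why their link probability is much higher than that of `u` and `v`.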

BigCLAM: How to find F

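§ Task: Given a network $G(V, E)$, estimate $F$

• Find $F$ that maximizes the likelihood:

$$\arg\max_{F} \prod_{(u,v) \in E} P(u, v) \prod_{(u,v) \notin E} (1 - P(u, v))$$

⁃ where: $P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$

⁃ Many times we take the logarithm of the likelihood, and call it log-likelihood: $l(F) = \log P(G \mid F)$

§ Goal: Find $F$ that maximizes $l(F)$:

$$l(F) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u \cdot F_v^T)\big) - \sum_{(u,v) \notin E} F_u \cdot F_v^T$$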
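A direct, unoptimized way to evaluate the log-likelihood is sketched below. It assumes an undirected graph whose edges are stored as pairs `(u, v)` with `u < v`, and it uses the fact that $\log(1 - P(u, v)) = -F_u \cdot F_v^T$ for non-edges. The function name and data layout are hypothetical.

```python
import itertools
import math

def bigclam_log_likelihood(F, edges):
    """l(F) = sum over edges of log(1 - exp(-F_u.F_v)) - sum over non-edges of F_u.F_v.

    F is a list of rows (one per node); edges is a set of (u, v) with u < v.
    """
    ll = 0.0
    for u, v in itertools.combinations(range(len(F)), 2):
        dot = sum(a * b for a, b in zip(F[u], F[v]))
        if (u, v) in edges:
            # An edge between nodes with zero overlap has probability 0.
            ll += math.log(1.0 - math.exp(-dot)) if dot > 0 else -math.inf
        else:
            ll -= dot
    return ll

F = [
    [0.0, 1.2, 0.0, 0.2],
    [0.5, 0.0, 0.0, 0.8],
    [0.0, 1.8, 1.0, 0.0],
]
ll = bigclam_log_likelihood(F, {(0, 2)})
```

This loop visits every node pair, so it is quadratic in the number of nodes. It is meant for checking small examples, not for fitting large networks.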

BigCLAM: V1.0

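§ Compute gradient of a single row $F_u$ of $F$:

$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \, \frac{\exp(-F_u \cdot F_v^T)}{1 - \exp(-F_u \cdot F_v^T)} - \sum_{v \notin \mathcal{N}(u),\, v \neq u} F_v$$

• $\mathcal{N}(u)$ … set of outgoing neighbors of $u$

§ Coordinate gradient ascent:

• Iterate over the rows of $F$:

⁃ Compute gradient $\nabla l(F_u)$ of row $u$ (while keeping others fixed)

⁃ Update the row $F_u$: $F_u \leftarrow F_u + \eta \nabla l(F_u)$

⁃ Project $F_u$ back to a non-negative vector: If $F_{uC} < 0$: $F_{uC} = 0$

§ This is slow! Computing $\nabla l(F_u)$ takes time linear in the number of nodes!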
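One coordinate-ascent step might look like the sketch below. It is a minimal illustration, not the reference BigCLAM implementation: the function names, the step size `eta`, and the `MIN_DOT` clamp (added here to avoid dividing by zero when two neighbors have zero overlap) are all assumptions.

```python
import math

MIN_DOT = 1e-8  # assumption: clamp so that 1 - exp(-dot) is never exactly 0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def row_gradient(F, neighbors, u):
    """Gradient of l(F) w.r.t. row F[u]; loops over ALL nodes (the slow V1.0 way)."""
    grad = [0.0] * len(F[u])
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            e = math.exp(-max(dot(F[u], F[v]), MIN_DOT))
            weight = e / (1.0 - e)   # neighbor term
        else:
            weight = -1.0            # non-neighbor term
        for c, f_vc in enumerate(F[v]):
            grad[c] += weight * f_vc
    return grad

def update_row(F, neighbors, u, eta=0.01):
    """Gradient step on row u, then projection back onto F_u >= 0."""
    grad = row_gradient(F, neighbors, u)
    F[u] = [max(0.0, f + eta * g) for f, g in zip(F[u], grad)]

# Hypothetical toy input: nodes 0 and 1 are linked, node 2 is isolated.
F = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: {1}, 1: {0}, 2: set()}
g0 = row_gradient(F, neighbors, 0)
update_row(F, neighbors, 0, eta=0.1)
```

The gradient pulls node 0 toward the community it shares with its neighbor and away from the community of the non-neighbor. The projection keeps the second strength at 0 instead of letting it go negative.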

BigCLAM: V2.0

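§ However, we notice:

$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_{v} F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$

• We cache $\sum_{v} F_v$

• So, computing $\sum_{v \notin \mathcal{N}(u)} F_v$ now takes linear time in the degree $|\mathcal{N}(u)|$ of $u$

⁃ In networks the degree of a node is much smaller than the total number of nodes in the network, so this is a significant speedup!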
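The caching trick can be sketched as follows: the sum over non-neighbors is obtained from a precomputed total, so the loop only touches the neighbors of $u$. As before, the function names and the `MIN_DOT` clamp are illustrative assumptions rather than part of the lecture.

```python
import math

MIN_DOT = 1e-8  # assumption: clamp so that 1 - exp(-dot) is never exactly 0

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def column_sums(F):
    """Cached quantity: sum of all rows F_v."""
    return [sum(col) for col in zip(*F)]

def row_gradient_cached(F, neighbors, u, total):
    """Gradient of l(F) w.r.t. F[u] in time proportional to the degree of u."""
    k = len(F[u])
    grad = [0.0] * k
    nbr_sum = [0.0] * k
    for v in neighbors[u]:
        e = math.exp(-max(dot(F[u], F[v]), MIN_DOT))
        weight = e / (1.0 - e)
        for c, f_vc in enumerate(F[v]):
            grad[c] += weight * f_vc
            nbr_sum[c] += f_vc
    # Sum over non-neighbors = total - F_u - sum over neighbors.
    return [g - (t - f - s) for g, t, f, s in zip(grad, total, F[u], nbr_sum)]

# Hypothetical toy input: nodes 0 and 1 are linked, node 2 is isolated.
F = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {0: {1}, 1: {0}, 2: set()}
total = column_sums(F)
g0 = row_gradient_cached(F, neighbors, 0, total)
```

After a row is updated, the cached total has to be adjusted by the difference between the new and the old row; otherwise later gradients would use a stale sum.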

BigCLAM: Scalability

§ BigCLAM takes 5 minutes for 300k node nets

• Other methods take 10 days

§ Can process networks with 100M edges!

Extension: Beyond Clusters

Extension: Directed AGM

§ Extension: Make community membership edges directed!

• Outgoing membership: Node “sends” edges

• Incoming membership: Node “receives” edges

Example: Model and Network

Directed AGM

§ Everything is almost the same except now we have 2 matrices: $F$ and $H$

• $F$ … out-going community memberships

• $H$ … in-coming community memberships

§ Edge prob.: $P(u, v) = 1 - \exp(-F_u H_v^T)$

§ Network log-likelihood:

$$l(F, H) = \sum_{(u,v) \in E} \log\big(1 - \exp(-F_u H_v^T)\big) - \sum_{(u,v) \notin E} F_u H_v^T$$

which we optimize the same way as before
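A small sketch of the directed variant is given below. It assumes edges are stored as ordered pairs `(u, v)` meaning "u links to v", and that self-loops are ignored. The function names and the one-community toy matrices are hypothetical.

```python
import math

def directed_edge_prob(F, H, u, v):
    """P(u -> v) = 1 - exp(-F_u . H_v^T): outgoing strengths of u, incoming of v."""
    d = sum(f * h for f, h in zip(F[u], H[v]))
    return 1.0 - math.exp(-d)

def directed_log_likelihood(F, H, edges):
    """Log-likelihood over all ordered pairs u != v (self-loops ignored by assumption)."""
    ll = 0.0
    n = len(F)
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            d = sum(f * h for f, h in zip(F[u], H[v]))
            if (u, v) in edges:
                ll += math.log(1.0 - math.exp(-d)) if d > 0 else -math.inf
            else:
                ll -= d
    return ll

# Toy example with one community: node 0 only sends, node 1 only receives.
F = [[1.0], [0.0]]
H = [[0.0], [2.0]]
p_01 = directed_edge_prob(F, H, 0, 1)  # 1 - exp(-2), about 0.865
p_10 = directed_edge_prob(F, H, 1, 0)  # 0.0
```

The two directions get different probabilities, which is what separates this variant from the undirected model with a single matrix $F$.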

Predator-prey Communities

Questions?

References

§ J. Yang, J. Leskovec. Community-Affiliation Graph Model for Overlapping Network Community Detection. IEEE International Conference on Data Mining (ICDM), 2012.

§ J. Yang, J. Leskovec. Overlapping Community Detection at Scale: A Nonnegative Matrix Factorization Approach. ACM International Conference on Web Search and Data Mining (WSDM), 2013.

§ J. Yang, J. McAuley, J. Leskovec. Detecting Cohesive and 2-mode Communities in Directed and Undirected Networks. ACM International Conference on Web Search and Data Mining (WSDM), 2014.

§ J. Yang, J. McAuley, J. Leskovec. Community Detection in Networks with Node Attributes. IEEE International Conference on Data Mining (ICDM), 2013.
