Clusters Recognition from Large Small World Graph
Igor Kanovsky, Lilach Prego
Emek Yezreel College, Israel, igork@yvc.ac.il
University of Haifa, Israel, [email protected]
© Igor Kanovsky, Lilach Prego @ Graph2004, Haifa, May 2004


Small World Definition (1)

Def. 1. The characteristic path length L(G) of a graph G = (V, E) is the average length of the shortest path between two vertices in G.

Def. 2. The clustering coefficient C(G) = <C(v)> of a graph G = (V, E) is the average of the clustering coefficients C(v) of its vertices.
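Def. 1 can be illustrated with a breadth-first search from every vertex. The sketch below is a minimal example, not the authors' code; it assumes an unweighted graph stored as an adjacency dictionary and averages only over pairs that are actually connected. The function names are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def characteristic_path_length(adj):
    """Average shortest-path length over all pairs of distinct, connected vertices."""
    total, pairs = 0, 0
    for v in adj:
        for u, d in bfs_distances(adj, v).items():
            if u != v:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Path graph 0-1-2-3: pair distances are 1, 2, 3, 1, 2, 1, so L = 10 / 6.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
L = characteristic_path_length(path)
```

Running one BFS per vertex costs O(|V| (|V| + |E|)), so for large graphs L(G) is usually estimated from a sample of source vertices.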


Small World Definition (2)

Def. 3. Clustering coefficient of a vertex v:

$$C(v) = \frac{2 \cdot \text{number\_of\_edges\_in\_}G[N(v)]}{k(v)\,(k(v) - 1)}$$

where k(v) is the number of edges incident to v (the degree of v) and N(v) is the neighborhood of v. The clustering coefficient C(v) of a vertex v is the density of the subgraph G[N(v)] induced by N(v).
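As a small illustration of Def. 2 and Def. 3, the following sketch computes C(v) and C(G) for a graph stored as a dictionary of neighbor sets. The function names are hypothetical, and returning 0 for vertices with fewer than two neighbors is an assumed convention that the slides do not specify.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """C(v) = 2 * (edges in G[N(v)]) / (k(v) * (k(v) - 1))."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for k(v) < 2
    # Count the edges that join two neighbors of v.
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def graph_clustering_coefficient(adj):
    """C(G): the average of C(v) over all vertices."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering_coefficient(g, 0)      # one edge among three neighbors: 1/3
cg = graph_clustering_coefficient(g)   # (1/3 + 1 + 1 + 0) / 4 = 7/12
```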


Small World Definition (3)

Def. 4. A Small World graph is a graph G(V, E) with L ~ L_R and C >> C_R, where G_R(V_R, E_R) is a random graph with |V_R| = |V| and |E_R| = |E|, and L_R, C_R are its characteristic path length and clustering coefficient.

A lot of real-world graphs are Small World graphs:

1. Social relationships.
2. Business (organization) collaborations.
3. The Web. The Internet.
4. Biological data (DNA structure, cell metabolism, etc.).


The Watts and Strogatz Small World model

In 1998, Watts and Strogatz brought the small-world phenomenon to the attention of researchers in various fields by proposing a simple Small World (SW) model.


The WS Small World model (2)

The WS model does not succeed in capturing the properties of real-world graphs.


The Web as a graph

The known significant properties of the Web as a graph are:

1. Small world topology.
2. Power-law distributions.
3. Bipartite cliques.
4. "Bow-tie" shape.

A huge digraph whose statistical characteristics are similar to those of the Web graph is called a Web-like graph.


The Web as a Small World

Lada A. Adamic. The Small World Web. 2000.


Power-Law distributions (PLD)

PLD of in- and out-degrees of vertices: the number of web pages having k_in links to the page or k_out links from the page is proportional to k^{-γ}, for some constants γ_in, γ_out > 2.

Andrei Broder, Ravi Kumar and others. Graph structure in the web. 2001.


Bipartite Small Cores

There are a lot of bipartite small cores Ci,j (with i,j ≥ 3) in the Web graph (a random graph does not have small cliques).

K3,3

A bipartite core Ci,j is a graph on i+j nodes that contains at least one bipartite clique Ki,j as a subgraph.

These small cliques are the cores of web communities: sets of connected sites with a common content topic.
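To make the notion of a bipartite clique Ki,j concrete, here is a brute-force sketch that counts them in a small directed graph. It assumes the usual Web reading of Ki,j (i "fan" pages that all link to the same j "center" pages); the function name and the toy graph are hypothetical, and the enumeration is exponential, so it is only meant for tiny examples.

```python
from itertools import combinations
from math import comb

def count_bipartite_cliques(out_links, i, j):
    """Count K_{i,j}: choices of i fans and j centers with every fan linking to every center."""
    count = 0
    for fans in combinations(sorted(out_links), i):
        # Pages linked to by every fan in this group.
        common = set.intersection(*(set(out_links[f]) for f in fans))
        common -= set(fans)  # keep the two sides of the clique disjoint
        count += comb(len(common), j)
    return count

# Three fans a, b, c that all link to the three centers x, y, z.
web = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z"},
    "c": {"x", "y", "z"},
    "x": set(), "y": set(), "z": set(),
}
n33 = count_bipartite_cliques(web, 3, 3)  # exactly one K3,3
n22 = count_bipartite_cliques(web, 2, 2)  # 3 fan pairs times 3 center pairs
```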


Bipartite Small Cores (2)

Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan and Andrew Tomkins. Extracting large-scale knowledge bases from the web. 2000.

Number of Ci,j as a function of i and j.


"bow-tie" shape

The major part of web pages can be divided into four sets: a core formed by the strongly connected component (SCC), i.e. pages that can all reach each other; two sets (upstream and downstream) formed by the pages that can only reach the core or can only be reached from it, respectively; and a set (tendrils) containing pages that can neither reach nor be reached from the core.

The Web graph has a "bow-tie" shape.
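The four sets described above can be computed with two reachability searches. The sketch below is an illustration under one explicit assumption: a `seed` page known to lie in the core is given. The names `bow_tie`, `reachable` and `seed` are hypothetical.

```python
from collections import deque

def reachable(adj, start):
    """All vertices reachable from start by following the links in adj."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def bow_tie(out_links, seed):
    """Split the pages into core, upstream, downstream and the rest, relative to seed's SCC."""
    nodes = set(out_links) | {w for ws in out_links.values() for w in ws}
    in_links = {v: [] for v in nodes}
    for u, ws in out_links.items():
        for w in ws:
            in_links[w].append(u)
    forward = reachable(out_links, seed)   # pages the core can reach
    backward = reachable(in_links, seed)   # pages that can reach the core
    core = forward & backward
    return {
        "core": core,
        "upstream": backward - core,
        "downstream": forward - core,
        "other": nodes - forward - backward,
    }

# Cycle a -> b -> c -> a is the core; u links into it, d is linked from it, t is isolated.
links = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "u": ["a"], "d": [], "t": []}
parts = bow_tie(links, "a")
```

Both searches are linear in the number of links, which is why this decomposition is feasible even for very large crawls.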


Small World Graph Clustering

The aim is to find subgraphs with a high density of links or to find real communities in real graphs.

A lot of approaches:
• the hub and authority method of Kleinberg,
• the edge betweenness method of Newman and Girvan,
• the local density method of Virtanen,
• minimum spanning tree and spectral methods,
• other traditional clustering methods.

Problems: how to define a cluster, and how to recognise a cluster in a huge graph.


Small World Graph Clustering

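Definition: Two vertices v1, v2 belong to the same cluster if (v1, v2) ∈ E and the overlap of their neighborhoods, measured relative to the smaller neighborhood,

$$\frac{|N(v_1) \cap N(v_2)|}{\min(|N(v_1)|, |N(v_2)|)},$$

is high enough. The condition has two parameters: β, the edge weighting parameter (proximity), and α, the level of cluster separation for a small world graph.

Definition: A set of vertices that belong to the same cluster is called a cluster.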


Iterated Clustering Algorithm (ICA)

Input: undirected graph G=(V,E), level of cluster separation α;
Output: clustering {C1, C2, ..., Ck};
Method:
i = 0;
while (V is not empty) {
    find an arbitrary cluster C in G[V];
    i++;
    Ci = C;
    V = V - C;
}
k = i;

Advantage(!): it is not necessary to analyze the whole graph to find some local clusters.


Algorithm for finding an arbitrary cluster

Input: undirected graph G=(V,E), level of cluster separation α;
Output: a cluster C ⊆ G.
Method:
C = ∅;
put an arbitrary vertex v into queue Q;
while (Q is not empty) {
    get vertex u from Q;
    add u to C;
    for each w ∈ N(u) {
        if w is not already in C or Q, and u and w belong to the same cluster at level α,
            then put w into Q;
    }
}
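The two procedures (the outer ICA loop and the queue-based search for one cluster) can be sketched together as follows. This is an illustration under stated assumptions rather than the authors' implementation: the same-cluster test is taken to be "adjacent, and neighborhood overlap divided by the smaller neighborhood size is at least α", the β parameter is not modeled, and neighborhoods are taken in the full graph rather than in G[V]. All function names are hypothetical.

```python
from collections import deque

def same_cluster(adj, u, w, alpha):
    """Assumed test: u and w are adjacent and their neighborhood overlap is >= alpha."""
    if w not in adj[u]:
        return False
    smaller = min(len(adj[u]), len(adj[w]))  # >= 1 because u and w are adjacent
    overlap = len(adj[u] & adj[w])
    return overlap / smaller >= alpha        # NOTE: the slides' beta is omitted here

def find_cluster(adj, start, alpha, allowed):
    """Grow one cluster from start, visiting only vertices still in allowed."""
    cluster, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in allowed and w not in cluster and same_cluster(adj, u, w, alpha):
                cluster.add(w)  # marking on enqueue prevents repeated visits
                queue.append(w)
    return cluster

def ica(adj, alpha):
    """Iterated clustering: peel off one cluster at a time until no vertex is left."""
    remaining = set(adj)
    clusters = []
    while remaining:
        start = min(remaining)  # any vertex works; min() keeps the run deterministic
        cluster = find_cluster(adj, start, alpha, remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

# Two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the single edge 3-4.
two_cliques = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
found = ica(two_cliques, alpha=0.5)
```

In this toy graph the bridge 3-4 has no common neighbors, so the search started inside one clique never crosses into the other. Note also that `find_cluster` only touches the vertices near the starting point, which matches the locality advantage stated for ICA.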


ICA properties

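ICA has polynomial complexity O(z^2 |V|), where z = |E| / |V| is the average edge density, so it is applicable to real-world SW graphs, and it is better than other clustering algorithms that rely on graph connectivity.

The intuition behind ICA is based on the large clustering coefficient of SW graphs. Every edge of G[N(v)] joins two neighbors of v and is counted once from each endpoint, so

$$\text{number\_of\_edges\_in\_}G[N(v)] = \frac{1}{2} \sum_{u \in N(v)} |N(u) \cap N(v)|$$

and therefore

$$C(v) = \frac{2 \cdot \text{number\_of\_edges\_in\_}G[N(v)]}{k(v)\,(k(v) - 1)} = \frac{\sum_{u \in N(v)} |N(u) \cap N(v)|}{k(v)\,(k(v) - 1)}$$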
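The link between the clustering coefficient and pairwise neighborhood overlaps can be checked numerically. The sketch below compares a direct count of the edges in G[N(v)] with half the sum of |N(u) ∩ N(v)| over u in N(v); the factor 1/2 reflects that each such edge is seen from both of its endpoints. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def edges_in_neighborhood(adj, v):
    """Direct count of edges whose two endpoints are both neighbors of v."""
    return sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])

def half_overlap_sum(adj, v):
    """Half of the summed overlaps |N(u) & N(v)| over all neighbors u of v."""
    return sum(len(adj[u] & adj[v]) for u in adj[v]) // 2

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 0.
tri = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
checks = [(edges_in_neighborhood(tri, v), half_overlap_sum(tri, v)) for v in tri]
```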


ICA evaluation

ICA was tested on the simplest clustered SW graphs generated by the Watts and Strogatz model. The model is a set of SW chains with a number of random inter-cluster edges.

As the next step, the algorithm will be applied to various real SW graphs and to more realistic models.


Web-like Graph Modeling

The aim is to find stochastic processes that yield web-like graphs.

Our integrated approach is based on well-known Web graph models, extended in order to satisfy all the statistical properties mentioned above. We try to keep the web-like graph model as simple as possible, so it has to have a minimal set of parameters.


Extended scale-free model (1)

1. At each time step, a new vertex is added and is connected to existing vertices through a random number m (≤ z) of new edges, where the average number of edges per node (z) is constant for a growing graph. The probability that an existing vertex i gains an edge is proportional to its in-degree k_{in,i} plus the initial attractiveness A_in:

$$\Pi(k_{in,i}) = \frac{k_{in,i} + A_{in}}{\sum_j (k_{in,j} + A_{in})}$$


Extended scale-free model (2)

2. Simultaneously, z - m directed edges are distributed among all the vertices in the graph by the following rules: (i) the source is chosen with a probability proportional to its out-degree, (ii) the target is chosen with a probability proportional to its in-degree.

The model has 3 parameters: the average degree z, and the initial attractiveness of a vertex for gaining in- and out-edges, A_in and A_out.
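The two growth rules can be simulated in a few lines. The sketch below is one possible reading of the model, not the authors' simulator. Its assumptions are: the graph starts from a single isolated vertex, m is drawn uniformly from 1..z, targets are chosen with weight k_in + A_in, sources with weight k_out + A_out, and multi-edges and self-loops are allowed. The function name and signature are hypothetical.

```python
import random

def extended_scale_free(n, z, a_in, a_out, seed=0):
    """Grow a directed graph with n vertices and z new edges per time step."""
    rng = random.Random(seed)
    k_in, k_out = [0], [0]  # assumed seed graph: one isolated vertex
    edges = []

    def pick(degrees, attractiveness):
        # Choose a vertex with probability proportional to degree + attractiveness.
        weights = [k + attractiveness for k in degrees]
        return rng.choices(range(len(degrees)), weights=weights)[0]

    def add_edge(src, dst):
        edges.append((src, dst))
        k_out[src] += 1
        k_in[dst] += 1

    for new in range(1, n):
        m = rng.randint(1, z)  # assumed distribution of m
        # Rule 1: pick the targets before the new vertex exists, so they are all old vertices.
        targets = [pick(k_in, a_in) for _ in range(m)]
        k_in.append(0)
        k_out.append(0)
        for t in targets:
            add_edge(new, t)
        # Rule 2: distribute the remaining z - m edges among all vertices.
        for _ in range(z - m):
            add_edge(pick(k_out, a_out), pick(k_in, a_in))
    return edges, k_in, k_out

edges, k_in, k_out = extended_scale_free(n=200, z=8, a_in=2, a_out=6)
```

Recomputing the weights on every draw makes each step linear in the current graph size, which is acceptable for a small demonstration; a larger simulation would need a faster weighted sampler.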


Simulation results. In-degree distribution.

Our model: N = 30 K, <k> = 8, A_in = 2, A_out = 6.

Web: N = 500 M.


Characteristics of several web-like models

Model        in-degree exponent   out-degree exponent   K3,3   C
Our          -2.18                -2.85                 201    0.0031
PA           -2.94                NA                    0      0.0029
Copying      -2.14                NA                    986    0.0022
Small World  NA                   NA                    0      0.6191

NA – not applicable


Thank you.

Contact: Igor Kanovsky, igork@yvc.ac.il,

http://www.yvc.ac.il/ik/