
[IEEE 2010 2nd International Conference on Future Computer and Communication - Wuhan, China (2010.05.21-2010.05.24)] 2010 2nd International Conference on Future Computer and Communication


Modeling of scale-free network based on Pagerank algorithm

Yi Zhang

Department of Computer Science,

Huazhong Normal University, Wuhan, China

Kaihua Xu

College of Physical Science and Technology

Huazhong Normal University, Wuhan, China

Yuhua Liu

Department of Computer Science,

Huazhong Normal University, Wuhan, China,

Email: [email protected]

Zhenrong Luo

Department of Computer Science,

Huazhong Normal University, Wuhan, China

Abstract—The BA scale-free network model ignores the appraisal that a node obtains during network evolution. In real systems, new nodes tend to connect to nodes that are already highly recognized. This paper therefore presents a scale-free network model based on the PageRank algorithm. Simulation experiments show the dynamic generation process of the network and yield the degree distributions of the BA model and the new model at the same network scale. The results indicate that the degree distribution of the new model follows a power law, similar to that of the BA model.

Keywords- BA model; PageRank; degree distribution

I. INTRODUCTION

Complex networks are closely associated with everyday life: the Internet, the WWW, power grids, metabolic networks, and research collaboration networks can all be described by complex network theory. Complex networks have therefore attracted widespread attention from researchers in many related fields. By collecting statistical data and analyzing network characteristics, researchers have established a series of network models to better understand network behavior and to improve network performance. The best-known models include the random graph model of Erdős and Rényi [1], the Watts-Strogatz small-world model [2], and the Barabási-Albert (BA) scale-free model [3].

In recent years, an important discovery of complex network research is that in many complex networks, including the WWW and the Internet, the node degree distribution follows a power law: the vast majority of nodes have low degree, while a few nodes have very high degree. Because the degree distribution of such networks has no characteristic scale, they are called scale-free networks [4]. To explain the mechanism that produces this power-law distribution, Barabási and Albert proposed the scale-free network model. The BA model has played a significant role in advancing scale-free network research: it captures essential characteristics of such networks and exposes the mechanism by which they form. Compared with real networks, however, the BA model is overly simplified and ignores several factors in network evolution, and many researchers have proposed extensions of it. Bianconi and Barabási proposed a fitness model [5], which first studied the effect of competition among nodes. Xiang Li and Guanrong Chen proposed a local-world evolving model [6], in which preferential attachment operates within each node's local world.

Many examples indicate that in real systems, new connections also depend on the appraisal a node has obtained, whereas degree alone does not objectively reflect how widely a node is recognized. This paper introduces Google's PageRank algorithm into the BA model and proposes a scale-free network model based on it.

V3-783 978-1-4244-5824-0/$26.00 ©2010 IEEE


II. PAGERANK ALGORITHM

PageRank is the core technology Google uses to determine the relevance or importance of a page; it was proposed by Google's founders Larry Page and Sergey Brin. The algorithm applies the idea of academic citation analysis to the Web, largely by counting citations, or links, to a given page, which gives an approximation of the page's importance or quality [7]. Each link to a page counts as a vote for that page, so a page that is linked more often receives more votes from other sites; because of this correspondence, PageRank is an effective way to prioritize the results of Web keyword searches. The PageRank of a page is the probability that a random surfer visits it. If a page is linked by many other pages, it is widely recognized and trusted, and its PageRank value is correspondingly high. Some popular websites have very high PageRank values because ordinary pages, in order to increase their click rates, tend to link to these high-impact sites. PageRank is thus an objective measure of citation importance that corresponds well with people's subjective notion of importance.
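The random-surfer reading of PageRank can be illustrated with a small Monte Carlo sketch. The toy graph, the function name `random_surfer`, and its parameters are hypothetical. The surfer follows a uniformly chosen out-link with probability `d` and otherwise jumps to a uniformly random page; this teleportation rule is the usual convention and is an assumption here. Visit frequencies serve as PageRank estimates.

```python
import random
from collections import Counter

def random_surfer(links, d=0.85, steps=200_000, seed=1):
    """Estimate PageRank as the fraction of time a random surfer spends on each page.

    links: dict mapping each page to the list of pages it links to
    (assumption: every page has at least one out-link).
    """
    rng = random.Random(seed)
    pages = list(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        if rng.random() < d:
            page = rng.choice(links[page])  # follow a random hyperlink
        else:
            page = rng.choice(pages)        # jump to a random page
    return {p: visits[p] / steps for p in pages}

# Hypothetical toy web (not from the paper): A -> B, A -> C, B -> C, C -> A
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(random_surfer(toy, d=1.0))  # close to {'A': 0.4, 'B': 0.2, 'C': 0.4}
```

With `d = 1.0` the surfer only follows links, so the frequencies approach the stationary probabilities of that walk. Pages C and A are visited equally often, and B half as often, because B receives only half of A's outgoing weight.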

The PageRank algorithm rests on the following assumptions [8]: the more often a page is referenced (by backlinks), the more important it is; even a page that is not cited many times may be very important if it is cited by important pages; and the importance of a page is passed on equally to the pages it references.

The PageRank value is calculated as follows [7][8]. The entire Web is regarded as a large directed graph $G = (V, E)$, where $V$ is the set of all pages and $E$ is the set of directed edges; an edge $(i, j)$ means that page $i$ has a hyperlink pointing to page $j$. Let $PR(V_i)$ be the PageRank value of page $V_i$, $C(V_i)$ the out-degree of page $V_i$, and $d$ the damping factor, with a value between 0 and 1. If pages $T_1, \ldots, T_n$ link to page $A$, the PageRank value of page $A$ is given by:

$$PR(A) = d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (1)$$
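Equation (1) can be evaluated directly for a single page. The minimal sketch below assumes that the current PR values and out-degrees are stored in dictionaries; the function name and the numbers in the example are hypothetical.

```python
def pagerank_eq1(in_links, pr, out_degree, d):
    """Equation (1): PR(A) = d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    in_links:   the pages T1..Tn that link to page A
    pr:         current PageRank value of each page
    out_degree: out-degree C(T) of each page
    """
    return d * sum(pr[t] / out_degree[t] for t in in_links)

# Hypothetical numbers: T1 has PR 0.5 and 2 out-links, T2 has PR 0.25 and 1 out-link.
pr = {"T1": 0.5, "T2": 0.25}
out_degree = {"T1": 2, "T2": 1}
print(pagerank_eq1(["T1", "T2"], pr, out_degree, d=0.5))  # 0.25
```

Each in-linking page contributes its own PR divided by its out-degree. In this example both contribute 0.25, and the damping factor halves their sum.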

Since equation (1) holds for every page, the PR values of all pages together satisfy

$$\mathbf{PR}^{T} = d\,M\,\mathbf{PR}^{T} \qquad (2)$$

where $\mathbf{PR}^{T}$ is the column vector of the PageRank values of all pages and $M$ is the coefficient matrix, whose entry $M_{ij}$ equals $1/C(V_j)$ if page $V_j$ links to page $V_i$ and 0 otherwise. Therefore $\mathbf{PR}^{T}$ is an eigenvector of $M$ for the eigenvalue $1/d$. The PageRank values of the page collection are the components of this eigenvector, which can be computed iteratively. Starting from an initial vector $P$, the first iteration multiplies $P$ by the matrix above; the second iteration multiplies the result of the first by the same matrix; and the iteration continues in this way until the PageRank values of the set of pages are obtained.
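The iterative method just described is the power method. The minimal sketch below assumes a small hypothetical graph in which every page has at least one out-link. It builds the coefficient matrix $M$ of equation (2), with `M[j][i] = 1/C(V_i)` when page i links to page j, and repeatedly multiplies an initial vector $P$ by it. Rescaling each iterate so that its entries sum to 1 is our own assumption. Because rescaling removes any constant factor, the $d$ in equation (2) does not change the result, and the sketch iterates with $M$ alone. The function names are hypothetical.

```python
def coefficient_matrix(n, edges):
    """Build M for equation (2): M[j][i] = 1/C(V_i) if page i links to page j."""
    out_deg = [0] * n
    for i, _ in edges:
        out_deg[i] += 1
    M = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        M[j][i] += 1.0 / out_deg[i]
    return M

def power_iteration(M, tol=1e-12, max_iter=1000):
    """Repeatedly multiply a start vector by M, rescaling it to sum to 1 each time."""
    n = len(M)
    x = [1.0 / n] * n  # initial vector P
    for _ in range(max_iter):
        y = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]
        total = sum(y)
        y = [v / total for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x

# Hypothetical 3-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
M = coefficient_matrix(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
print([round(v, 6) for v in power_iteration(M)])  # [0.4, 0.2, 0.4]
```

The result can be checked by hand: page 0 receives everything from page 2, page 1 receives half of page 0, and page 2 receives the other half of page 0 plus all of page 1. This gives the ratio 2 : 1 : 2.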

If a page contains no hyperlinks, its out-degree is 0 and it passes no PR to any other page. The total PR of the graph then keeps declining from one iteration to the next, and after a finite number of iterations the PR values of all vertices converge to 0. To overcome this problem, equation (1) is improved as follows:

$$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \qquad (3)$$
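Below is a small numerical check of the problem and of the fix, under assumptions of our own: a hypothetical 3-page graph in which page 2 has no out-links, $d = 0.85$, and every PR value initialized to 1. Iterating equation (1) drives every value to exactly 0 after three steps, while iterating equation (3) settles at positive values. The function name is hypothetical.

```python
def iterate_pr(n, edges, d, steps, teleport):
    """Iterate equation (1) (teleport=False) or equation (3) (teleport=True).

    edges: (i, j) pairs meaning page i links to page j.
    Every page starts with PR = 1.
    """
    out_deg = [0] * n
    for i, _ in edges:
        out_deg[i] += 1
    pr = [1.0] * n
    for _ in range(steps):
        new = [(1 - d) if teleport else 0.0 for _ in range(n)]
        for i, j in edges:
            new[j] += d * pr[i] / out_deg[i]  # page i passes PR(i)/C(i) to page j
        pr = new
    return pr

# Hypothetical 3-page web in which page 2 has no out-links: 0 -> 1, 0 -> 2, 1 -> 2
edges = [(0, 1), (0, 2), (1, 2)]
print(iterate_pr(3, edges, d=0.85, steps=50, teleport=False))  # [0.0, 0.0, 0.0]
print(iterate_pr(3, edges, d=0.85, steps=50, teleport=True))   # about [0.15, 0.21375, 0.3954375]
```

The constant term $1 - d$ in equation (3) re-injects PR at every page on each iteration. For this reason the values cannot all drain away.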

With this modification, the PR value of each page is proportional to the probability that a user who browses by clicking links at random visits that page. Besides ranking search results, the PageRank algorithm can be applied in other areas, such as estimating network traffic and predicting backlinks [8].

III. MODEL OF SCALE-FREE NETWORK BASED ON PAGERANK ALGORITHM

Compared with experimental results obtained from real networks, the BA model has some limitations. In a real network, new nodes tend to link to nodes with a good "word of mouth" reputation. For instance, in scientific research collaboration networks, scholars prefer to cite literature that has already been cited by many others. The BA model predicts new connections from node degree alone, so newer nodes, having had less time to gain connections, end up with few links. In real systems, however, a node's connections and growth do not depend only on the order in which nodes enter the network: on the WWW, some documents with good content acquire a large number of links in a very short time, easily overtaking websites that have existed much longer. We should therefore consider the influence of highly recognized nodes on new connections. The PageRank algorithm can objectively reflect how widely a node is recognized, and thus the appraisal it has obtained. This paper therefore introduces the PageRank value into the BA model and uses it together with degree to determine new connections.

A. The new network model

Let $PR(i)$ denote the PageRank value of node $i$ and $k_i$ its degree. The improved BA model is defined as follows:

1) Growth: starting with a small number $m_0$ of nodes, at every time step we add a new vertex with $m$ ($m \le m_0$) edges that connect it to $m$ different vertices already present in the system.

2) Preferential attachment: the probability $\Pi_i$ that the new vertex connects to vertex $i$ depends on both $k_i$ and $PR(i)$:

$$\Pi_i = \frac{k_i + PR(i)}{\sum_j \left(k_j + PR(j)\right)} \qquad (4)$$

where the denominator is the sum of the degrees and PageRank values of all nodes $j$ in the network. This modified BA model takes into account the overall attraction that a node exerts on a newly joined node: the degree and the PageRank value of a node jointly determine the probability that the new node connects to it.
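Equation (4) translates directly into code. The sketch below is a minimal illustration in which the function name and the three-node example are hypothetical.

```python
def attachment_probabilities(degrees, pagerank):
    """Equation (4): Pi_i = (k_i + PR(i)) / sum_j (k_j + PR(j))."""
    weights = [k + p for k, p in zip(degrees, pagerank)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical network: degrees 3, 2, 1 and PageRank values 0.5, 0.25, 0.25
probs = attachment_probabilities([3, 2, 1], [0.5, 0.25, 0.25])
print(probs[0])  # 0.5
```

Passing PageRank values of 0 recovers the BA rule $\Pi_i = k_i / \sum_j k_j$. This is one way to see that the BA model sits inside the new model as a special case.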

B. Degree distribution

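We use continuum theory to analyze the new model and predict its degree distribution. Treating $k_i$ and $PR(i)$ as continuous real variables, the dynamic equation for $k_i$ is

$$\frac{\partial k_i}{\partial t} = m\,\Pi_i = m\,\frac{k_i + PR(i)}{\sum_j \left(k_j + PR(j)\right)} = \frac{m\left(k_i + PR(i)\right)}{\sum_j k_j + 1} = \frac{m\left(k_i + PR(i)\right)}{2mt + 1} \qquad (5)$$

The third equality uses $\sum_j PR(j) = 1$, that is, PageRank values normalized as visiting probabilities. The last equality uses $\sum_j k_j = 2mt$, since each time step adds $m$ edges (the initial edges are neglected).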

The coefficient $m$ is the increase in the total degree of the pre-existing vertices during one time step, which equals the degree of the new node. By the generation rules of the network, the initial condition of equation (5) is

$$k_i(t_i) = m$$

where $t_i$ is the time at which node $i$ joins the network. The solution of equation (5) that satisfies this initial condition is as follows:

$$k_i(t) = \left(m + PR(i)\right)\left(\frac{2mt + 1}{2mt_i + 1}\right)^{1/2} - PR(i) \qquad (6)$$
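As a sanity check on this solution, the sketch below integrates equation (5) numerically with the forward Euler method and compares the result with the closed form (6). Like the continuum analysis, it treats $PR(i)$ as a constant. The function names and the parameter values are hypothetical.

```python
import math

def k_closed_form(t, t_i, m, pr):
    """Equation (6): k_i(t) = (m + PR(i)) * sqrt((2mt + 1) / (2m t_i + 1)) - PR(i)."""
    return (m + pr) * math.sqrt((2 * m * t + 1) / (2 * m * t_i + 1)) - pr

def k_euler(t, t_i, m, pr, steps=100_000):
    """Forward-Euler integration of equation (5), dk/dt = m (k + PR(i)) / (2mt + 1),
    starting from the initial condition k(t_i) = m; PR(i) is held constant."""
    h = (t - t_i) / steps
    k, s = float(m), float(t_i)
    for _ in range(steps):
        k += h * m * (k + pr) / (2 * m * s + 1)
        s += h
    return k

# Hypothetical values: m = 4, node i joins at t_i = 10 with PR(i) = 0.01, observed at t = 1000
exact = k_closed_form(1000, 10, 4, 0.01)
approx = k_euler(1000, 10, 4, 0.01)
print(exact, approx)  # the two agree to within about 0.1%
```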

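Equation (6) shows that all nodes evolve in the same way: $k_i(t) + PR(i)$ grows as a power of time, proportional to $t^{1/2}$, so node evolution is consistent with the BA model. Since nodes are added at equal time intervals, $t_i$ has the probability density $1/(m_0 + t)$, and inverting equation (6) gives

$$P\left(k_i(t) < k\right) = P\left(t_i > \frac{1}{2m}\left[\frac{(2mt + 1)\left(m + PR(i)\right)^2}{\left(k + PR(i)\right)^2} - 1\right]\right) = 1 - \frac{(2mt + 1)\left(m + PR(i)\right)^2 / \left(k + PR(i)\right)^2 - 1}{2m\,(m_0 + t)}$$

The degree distribution function $p(k)$ then follows from formula (6):

$$p(k) = \frac{\partial P\left(k_i(t) < k\right)}{\partial k} = \frac{(2mt + 1)\left(m + PR(i)\right)^2}{m\,(m_0 + t)\left(k + PR(i)\right)^3} \qquad (7)$$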

From equation (7), when $t \to \infty$ the degree distribution of the vertices is

$$p(k) \approx \frac{2\left(m + PR(i)\right)^2}{\left(k + PR(i)\right)^3} \qquad (8)$$

From equation (8), when $PR(i) = 0$,

$$p(k) \approx 2m^2 k^{-3} \qquad (9)$$


This is the degree distribution of the BA model, which shows that the BA model is a special case of the new model.
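Equations (7) to (9) can be checked numerically in the same spirit. The sketch assumes, as in the BA continuum argument, that the arrival time $t_i$ has density $1/(m_0 + t)$. It differentiates $P(k_i(t) < k)$ by a central difference and compares the result with (7), compares (7) at a very large $t$ with (8), and evaluates (8) at $PR(i) = 0$. The function names and parameter values are hypothetical.

```python
def prob_degree_below(k, t, m, m0, pr):
    """P(k_i(t) < k), from inverting equation (6) with t_i of density 1/(m0 + t)."""
    t_threshold = ((2 * m * t + 1) * (m + pr) ** 2 / (k + pr) ** 2 - 1) / (2 * m)
    return 1 - t_threshold / (m0 + t)

def p_eq7(k, t, m, m0, pr):
    """Equation (7): p(k) at finite time t."""
    return (2 * m * t + 1) * (m + pr) ** 2 / (m * (m0 + t) * (k + pr) ** 3)

def p_eq8(k, m, pr):
    """Equation (8): the t -> infinity limit of equation (7)."""
    return 2 * (m + pr) ** 2 / (k + pr) ** 3

m, m0, pr, t, k, h = 4, 5, 0.01, 1000, 20.0, 1e-4
numeric = (prob_degree_below(k + h, t, m, m0, pr)
           - prob_degree_below(k - h, t, m, m0, pr)) / (2 * h)
print(numeric, p_eq7(k, t, m, m0, pr))            # the two values agree closely
print(p_eq7(k, 1e9, m, m0, pr), p_eq8(k, m, pr))  # nearly equal for large t
print(p_eq8(10, 4, 0.0), 2 * 4**2 / 10**3)        # equation (9) with PR(i) = 0: both 0.032
```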

IV. SIMULATION

We use both the new model and the BA model to simulate the dynamic generation of realistic computer networks and compare the two. In this paper, the network starts with $m_0 = 5$ initial nodes; each newly added vertex connects to $m = 4$ existing vertices, and the final network has $N = 2000$ vertices. Starting the evolution from a random network, we obtain the degree distribution of the new model and compare it with that of the BA model, as shown in Fig. 1 and Fig. 2.
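A simulation of this kind could be sketched as follows. This is not the authors' code, and several choices are our own assumptions. PageRank is computed on the undirected network by treating each edge as two directed links, normalized so that the values sum to 1 (matching the $\sum_j PR(j) = 1$ used in equation (5)). It is recomputed before every new node is added. Growth starts from a complete graph on $m_0$ nodes rather than the random network used in the paper. The $m$ targets are drawn one at a time with weights $k_i + PR(i)$ (or $k_i$ alone for BA), discarding repeats. All function names are hypothetical.

```python
import random
from collections import Counter

def pagerank(adj, d=0.85, iters=30):
    """Normalized PageRank (values sum to 1) of an undirected graph.

    Each undirected edge is treated as two directed links. Every node is assumed
    to have degree >= 1, so no node lacks out-links.
    """
    n = len(adj)
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for u in range(n):
            share = d * pr[u] / len(adj[u])  # u splits its weight over its links
            for v in adj[u]:
                new[v] += share
        pr = new
    return pr

def grow_network(n_total, m0=5, m=4, use_pagerank=True, seed=0):
    """Grow a network to n_total nodes.

    use_pagerank=True : attach with weight k_i + PR(i), as in equation (4)
    use_pagerank=False: attach with weight k_i alone (BA model)
    Assumption: start from a complete graph on m0 nodes and recompute PageRank
    before each new node is added.
    """
    rng = random.Random(seed)
    adj = [{j for j in range(m0) if j != i} for i in range(m0)]
    for new in range(m0, n_total):
        pr = pagerank(adj) if use_pagerank else [0.0] * new
        weights = [len(adj[i]) + pr[i] for i in range(new)]
        targets = set()
        while len(targets) < m:  # draw until m distinct existing nodes are chosen
            targets.add(rng.choices(range(new), weights=weights)[0])
        adj.append(set())
        for j in targets:
            adj[new].add(j)
            adj[j].add(new)
    return adj

def degree_distribution(adj):
    """Map each degree k to the fraction of nodes with that degree."""
    counts = Counter(len(nbrs) for nbrs in adj)
    return {k: c / len(adj) for k, c in sorted(counts.items())}
```

Under these assumptions, `degree_distribution(grow_network(2000))` and `degree_distribution(grow_network(2000, use_pagerank=False))` correspond to the setting above ($m_0 = 5$, $m = 4$, $N = 2000$). Recomputing PageRank for every new node makes the cost grow roughly with $N$ times the number of edges, so this pure-Python sketch is slow at that size. Recomputing PageRank less often is one possible shortcut.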

Fig. 1. Degree distribution of the new model

Fig. 2. Degree distribution of the BA model

Fig. 1 and Fig. 2 show that the two models have similar degree distributions: in both, vertex degrees follow a power law, and the simulation results are consistent with the theoretical analysis. However, the degree distribution of the new model is flatter than that of the BA model. This indicates that PageRank values affect the choice of new connections: a new node chooses its targets not only by degree but also by how highly each node is recognized.

V. CONCLUSIONS

Although the BA model reveals essential characteristics of networks and can describe many real networks, it still has limitations. To bring network models closer to real systems, more intrinsic factors must be considered in the network construction process. Based on the observation that new nodes in real systems tend to connect to highly recognized nodes, this paper presents a scale-free network model based on the PageRank algorithm. Theoretical analysis and simulation show that the degree distribution of the new model follows a power law. Although the new model takes into account how highly nodes are recognized, the extent of this influence requires further research.

REFERENCES

[1] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hung. Acad. Sci., vol. 5, pp. 17-61, 1960.

[2] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, pp. 440-442, 1998.

[3] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, pp. 509-512, 1999.

[4] Xiao-Fan Wang, Xiang Li, and Guan-Rong Chen, Complex Network Theory and Its Applications (in Chinese). Beijing: Tsinghua University Press, 2006.

[5] G. Bianconi and A.-L. Barabási, "Competition and multiscaling in evolving networks," Europhys. Lett., vol. 54, p. 436, 2001.

[6] X. Li and G. Chen, "A local-world evolving network model," Physica A, vol. 328, pp. 274-286, 2003.

[7] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine," in Proc. 7th International World Wide Web Conference, Brisbane, Australia, Apr. 14-18, 1998, pp. 107-117.

[8] L. Page and S. Brin, "The PageRank citation ranking: Bringing order to the Web," Stanford Digital Libraries Working Paper, 1998.

[9] Y. Liu, S. Tao, K. Xu, and H. Chen, "A new evolving model of complex networks," in Proc. 4th International Conference on Impulsive Dynamical Systems and Applications (ICIDSA 2007), Nanning, China, pp. 1803-1805.

[10] S. N. Dorogovtsev and J. F. F. Mendes, "Evolution of networks with aging of sites," Phys. Rev. E, vol. 62, pp. 1842-1845, 2000.
