Modeling of scale-free networks based on the PageRank algorithm
Yi Zhang
Department of Computer Science,
Huazhong Normal University, Wuhan, China
Kaihua Xu
College of Physical Science and Technology
Huazhong Normal University, Wuhan, China
Yuhua Liu
Department of Computer Science,
Huazhong Normal University, Wuhan, China
Zhenrong Luo
Department of Computer Science,
Huazhong Normal University, Wuhan, China
Abstract—The Barabási-Albert (BA) scale-free network model ignores the appraisal (recognition) that a node obtains during network evolution. In real systems, new nodes tend to connect to nodes that are already highly recognized. This paper therefore presents a scale-free network model based on the PageRank algorithm. Simulation experiments reproduce the dynamic generation process of the network and obtain the degree distributions of the BA model and of the new model at the same network scale. The results indicate that the degree distribution of the new model follows a power law, similar to that of the BA model.
Keywords- BA model; PageRank; degree distribution
I. INTRODUCTION
Complex networks are closely associated with our daily life: the Internet, the WWW, power grids, metabolic networks, and scientific collaboration networks can all be described by complex network theory. Complex networks have therefore attracted wide attention from researchers in many related fields. By collecting statistical data and analyzing network characteristics, researchers have established a series of network models to better understand network behavior and to improve network performance. The best-known models include the random graph model of Erdős and Rényi [1], the Watts-Strogatz small-world model [2], and the Barabási-Albert scale-free model [3].
An important discovery of complex network research in recent years is that in many complex networks, including the WWW and the Internet, node degrees follow a power-law distribution: the vast majority of nodes have low degree, while a few nodes have very high degree. Because the degree distribution of such networks has no obvious characteristic scale, they are called scale-free networks [4]. To explain the mechanism that produces the power-law distribution, Barabási and Albert proposed the scale-free network model. The BA model has played a significant role in advancing scale-free network research: it reveals the essential characteristics of such networks fairly accurately and exposes the mechanism by which scale-free networks form. Compared with real networks, however, the BA model is oversimplified and ignores some factors of network evolution, and many researchers have made new attempts building on it. Bianconi and Barabási proposed a fitness model [5], which was the first to study the impact of competition among nodes. Xiang Li and Guanrong Chen proposed a local-world evolving model [6], in which preferential attachment operates within each node's local world.
Many examples indicate that, in real systems, where new connections are made also depends on the appraisal that nodes have obtained, whereas the degree alone does not objectively reflect how widely a node is recognized. This paper therefore introduces Google's PageRank algorithm into the BA model and proposes a new scale-free network model based on it.
V3-783 978-1-4244-5824-0/$26.00 ©2010 IEEE
II. PAGERANK ALGORITHM
PageRank is the core technology Google uses to determine the relevance or importance of a page; it was proposed by Google's founders Larry Page and Sergey Brin. The PageRank algorithm applies the idea of academic citation analysis to the Web, largely by counting the citations, or links, to a given page, which gives an approximation of the page's importance or quality [7]. Each link to a page counts as a vote for that page, so a page that is linked more often receives more votes from other sites. The PageRank of a page is the probability that a random surfer visits it. If a page is linked by many other pages, it is widely recognized and trusted, and its PageRank value is correspondingly high. Some popular websites have very high PageRank values because ordinary pages, seeking to raise their own click rates, tend to link to these high-impact sites. PageRank is thus an objective measure of a page's citation importance that corresponds well with people's subjective idea of importance, and because of this correspondence it is an excellent way to prioritize the results of Web keyword searches.
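The random-surfer interpretation can be illustrated with a short Monte Carlo sketch. It assumes a hypothetical five-page graph and a surfer who, at each step, follows a random out-link with probability d = 0.85 (an assumed value; the damping factor d is defined formally below) and otherwise jumps to a uniformly random page. The fraction of steps spent on each page then approximates its PageRank. The function name `random_surfer` is illustrative.

```python
import random
from collections import Counter

# Hypothetical five-page graph: page -> pages it links to.
LINKS = {0: [1], 1: [2], 2: [0], 3: [0], 4: [0]}

def random_surfer(links, d=0.85, steps=200_000, seed=1):
    """Estimate PageRank as the fraction of steps a random surfer spends on each page.

    d = 0.85 is an assumed damping value, not taken from the text.
    """
    rng = random.Random(seed)
    pages = list(links)
    page = rng.choice(pages)
    visits = Counter()
    for _ in range(steps):
        visits[page] += 1
        if links[page] and rng.random() < d:
            page = rng.choice(links[page])  # follow one of the page's links at random
        else:
            page = rng.choice(pages)        # jump to a uniformly random page
    return {p: visits[p] / steps for p in pages}

estimates = random_surfer(LINKS)
print({p: round(v, 3) for p, v in estimates.items()})
```

Page 0, which three pages link to, should collect the largest share, while pages 3 and 4, which no page links to, are reached only by random jumps, about (1 − d)/5 = 0.03 of the time.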
The PageRank algorithm is based on the following assumptions [8]: "The more often a page is referenced (the more back links it has), the more important it is. Another intuitive justification is that a page may be very important even if it is not cited many times, as long as it is cited by important pages. The importance of a web page is passed equally to the pages it references."
The PageRank value is calculated as follows [7][8]. The entire Internet is a large directed graph $G = (V, E)$, where $V$ is the set of all pages and $E$ is the set of directed edges; an edge $(i, j)$ means that page $i$ has a hyperlink pointing to page $j$. Let $PR(V_i)$ be the PageRank value of page $V_i$, $C(V_i)$ the out-degree of page $V_i$, and $d$ a damping factor with a value between 0 and 1. If pages $T_1, \dots, T_n$ link to page $A$, the PageRank value of page $A$ is given by:
$$PR(A) = d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \tag{1}$$
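A minimal sketch of equation (1) for a single page. It assumes hypothetical PageRank values and out-degrees for the pages that link to A, and d = 0.85 (a commonly used value; the text only requires 0 < d < 1). Each linking page $T_i$ splits its PR equally over its $C(T_i)$ out-links, which is the third assumption quoted above.

```python
def pagerank_eq1(in_links, d=0.85):
    """Equation (1): PR(A) = d * (PR(T_1)/C(T_1) + ... + PR(T_n)/C(T_n)).

    in_links holds one (PR(T_i), C(T_i)) pair per page T_i that links to A;
    each T_i passes an equal share PR(T_i)/C(T_i) to every page it links to.
    d = 0.85 is an assumed value.
    """
    return d * sum(pr / out_degree for pr, out_degree in in_links)

# Hypothetical example: three pages link to A.
in_links_of_a = [(0.5, 2), (0.3, 3), (0.2, 1)]
pr_a = pagerank_eq1(in_links_of_a)
print(round(pr_a, 4))  # 0.85 * (0.25 + 0.1 + 0.2) = 0.4675
```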
Since equation (1) holds for every page, the PageRank values of all pages together satisfy

$$\mathbf{PR} = d\,M\,\mathbf{PR} \tag{2}$$
where $M$ is the coefficient matrix, whose entry $M_{ji}$ equals $1/C(V_i)$ if page $V_i$ links to page $V_j$ and 0 otherwise, and $\mathbf{PR} = (PR(V_1), \dots, PR(V_N))^T$ collects the PageRank values of all $N$ pages. Therefore $\mathbf{PR}$ is an eigenvector of $M$ associated with the eigenvalue $1/d$, and computing this eigenvector yields the PageRank values of the whole set of pages. The eigenvector can be computed iteratively: starting from an initial vector $P$, the first iteration multiplies $P$ by the matrix above, the second iteration multiplies the result of the first iteration by the same matrix, and so on. Continuing the iteration until it converges gives the PageRank values of the corresponding set of Web pages.
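A minimal sketch of the iterative method just described, on a hypothetical four-page graph in which every page has at least one out-link. It builds the coefficient matrix $M$ entry by entry and repeatedly multiplies an initial vector $P$ by $dM$. Because every column of this $M$ sums to 1, each multiplication by $dM$ alone would shrink the vector by a factor $d$, so the sketch rescales the iterate to sum 1 after every step (the standard power method); $d$ only changes the scale that is divided out. The graph, d = 0.85, and the function names are assumptions.

```python
# Hypothetical strongly connected graph, no page without out-links.
# Pages are numbered 0..n-1; each maps to the pages it links to.
LINKS = {0: [1, 2], 1: [2], 2: [0, 3], 3: [0]}

def coefficient_matrix(links):
    """M[j][i] = 1 / C(i) if page i links to page j, else 0 (consistent with equation (1))."""
    n = len(links)
    m = [[0.0] * n for _ in range(n)]
    for i, targets in links.items():
        for j in targets:
            m[j][i] = 1.0 / len(targets)
    return m

def power_iteration(m, d=0.85, tol=1e-12, max_iter=1000):
    """Repeatedly multiply P by d*M, rescaling to sum 1 each time, until P stops changing."""
    n = len(m)
    p = [1.0 / n] * n                                  # initial vector P
    for _ in range(max_iter):
        nxt = [d * sum(m[j][i] * p[i] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]                 # keep the iterate from shrinking to 0
        if max(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p

M = coefficient_matrix(LINKS)
ranks = power_iteration(M)
print([round(r, 4) for r in ranks])
```

For this graph the iterate settles at (1/3, 1/6, 1/3, 1/6). A graph containing pages without out-links runs into the problem discussed next.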
If a page contains no hyperlinks, its out-degree is 0, and as the iteration proceeds the PR values of all vertices converge to 0. This is because such a page passes its PR on to no other page, so the total PR keeps declining and is eventually reduced to 0. To overcome this problem, equation (1) is modified as follows:
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right) \tag{3}$$
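A minimal sketch of the problem and its fix, on a hypothetical four-page graph in which page 3 has no out-links. Iterating equation (1) with d = 1 (so that the only loss is through page 3) drives the total PR toward 0, while iterating equation (3) with d = 0.85 (an assumed value) settles on positive values, each at least 1 − d. Starting every page at PR = 1 and the name `iterate` are also assumptions.

```python
# Hypothetical graph: page 3 has no out-links (out-degree C = 0).
LINKS = {0: [1], 1: [2], 2: [0, 3], 3: []}

def iterate(links, d, use_eq3, iters=200):
    """Iterate equation (1) (use_eq3=False) or equation (3) (use_eq3=True), starting from PR = 1."""
    pr = {p: 1.0 for p in links}
    for _ in range(iters):
        incoming = {p: 0.0 for p in links}
        for src, targets in links.items():
            for dst in targets:
                incoming[dst] += pr[src] / len(targets)  # PR(T)/C(T); a page with C = 0 passes nothing on
        base = (1 - d) if use_eq3 else 0.0
        pr = {p: base + d * incoming[p] for p in links}
    return pr

leaky = iterate(LINKS, d=1.0, use_eq3=False)  # equation (1) with d = 1: PR leaks out through page 3
fixed = iterate(LINKS, d=0.85, use_eq3=True)  # equation (3)
print(sum(leaky.values()))
print({p: round(v, 4) for p, v in fixed.items()})
```

Under equation (3) every page keeps at least the constant term $1 - d$, so the values can no longer all collapse to 0.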
The PR value of each page thus corresponds to the probability that a user who browses by clicking at random visits that page. Besides ranking search results, the PageRank algorithm can be applied elsewhere, for example to estimate network traffic or to predict backlinks [8].
III. MODEL OF SCALE-FREE NETWORK BASED ON
PAGERANK ALGORITHM
Compared with experimental results obtained from real networks, the BA model has some limitations. In real networks, new nodes tend to link to nodes with a good "word of mouth" reputation; for instance, in scientific collaboration networks, scholars prefer to cite literature that has already been cited by many people. The BA model predicts new connections from node degree alone, so newer nodes, having had less time to gain connections, end up with few links. In real systems, however, a node's connections and growth do not depend only on the order in which nodes enter the network: on the WWW, some documents with good content acquire a large number of links in a very short time, easily overtaking websites that have existed for much longer. We should therefore consider the influence of highly recognized nodes on new connections. The PageRank algorithm objectively reflects how widely a node is recognized, that is, the appraisal the node has obtained. This paper therefore introduces the PageRank value into the BA model and uses it together with the degree to determine new connections.
V3-784 2010 2nd International Conference on Future Computer and Communication [Volume 3]
A. The new model of networks
Let $PR(i)$ denote the PageRank value of node $i$ and $k_i$ the degree of node $i$. The improved BA model is defined as follows:
1) Growth: starting with a small number $m_0$ of nodes, at every time step we add a new vertex with $m$ ($m \le m_0$) edges that connect it to $m$ different vertices already present in the system.
2) Preferential attachment: we assume that the probability $\Pi_i$ that a new vertex connects to vertex $i$ depends on $k_i$ and $PR(i)$ as follows:

$$\Pi_i = \frac{k_i + PR(i)}{\sum_j \bigl(k_j + PR(j)\bigr)} \tag{4}$$
Here $\sum_j \bigl(k_j + PR(j)\bigr)$ is the sum of the degrees and PageRank values of all existing nodes in the network. This modified BA model thus takes into account the overall attraction a node exerts on a newly joined node: the degree and the PageRank value of a node jointly determine the probability that the new node connects to it.
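The attachment rule in equation (4) can be sketched as follows. The text does not say how the $m$ distinct targets are drawn, so this sketch assumes one common option: draw one vertex at a time, with probability proportional to $k_i + PR(i)$, among the vertices not chosen yet. The degrees and PageRank values (chosen to sum to 1) are hypothetical, and the function names are illustrative.

```python
import random

def attachment_probabilities(degree, pagerank):
    """Equation (4): Pi_i = (k_i + PR(i)) / sum_j (k_j + PR(j))."""
    weights = {i: degree[i] + pagerank[i] for i in degree}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def choose_targets(degree, pagerank, m, rng):
    """Pick m distinct existing vertices; each draw is weighted by equation (4)
    among the vertices not yet chosen (an assumed way of enforcing distinctness)."""
    probs = attachment_probabilities(degree, pagerank)
    chosen = set()
    while len(chosen) < m:
        candidates = [i for i in probs if i not in chosen]
        weights = [probs[i] for i in candidates]
        chosen.add(rng.choices(candidates, weights=weights)[0])
    return chosen

# Hypothetical current state of a small network.
degree = {0: 3, 1: 1, 2: 2, 3: 2}
pagerank = {0: 0.4, 1: 0.1, 2: 0.3, 3: 0.2}
print(attachment_probabilities(degree, pagerank))
print(choose_targets(degree, pagerank, m=2, rng=random.Random(0)))
```

With these values node 0 has weight 3.4 out of a total of 9, so it is the most likely target.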
B. Degree distribution
We use continuum theory to analyze the new model and predict its degree distribution. Treating $k_i$ and $PR(i)$ as continuous real variables, the dynamic equation for $k_i$ is

$$\frac{\partial k_i}{\partial t} = m\,\Pi_i = \frac{m\bigl(k_i + PR(i)\bigr)}{\sum_j \bigl(k_j + PR(j)\bigr)} = \frac{m\bigl(k_i + PR(i)\bigr)}{\sum_j k_j + 1} = \frac{m\bigl(k_i + PR(i)\bigr)}{2mt + 1} \tag{5}$$
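Here the coefficient $m$ is the change in the total degree of the existing vertices during one time step, which equals the degree of the new node. The last two steps use $\sum_j PR(j) = 1$ (PageRank values normalized to sum to 1) and $\sum_j k_j = 2mt$ (each new node adds $m$ edges, that is, $2m$ to the total degree, neglecting the edges of the initial network). By the generation rules of the network, node $i$ enters at time $t_i$ with $m$ edges, so the initial condition of equation (5) is

$$k_i(t_i) = m.$$

Treating $PR(i)$ as constant in time, separating variables and integrating gives $\ln\bigl(k_i + PR(i)\bigr) = \tfrac{1}{2}\ln(2mt + 1) + C$, so the solution satisfying this initial condition is

$$k_i(t) = \bigl(m + PR(i)\bigr)\left(\frac{2mt + 1}{2mt_i + 1}\right)^{1/2} - PR(i) \tag{6}$$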
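As a check on the step from equation (5) to equation (6), the sketch below integrates (5) numerically with the classical fourth-order Runge-Kutta method and compares the result with the closed form (6). It assumes that $PR(i)$ stays constant over time; the values $m = 4$, $t_i = 10$, $t = 2000$ and $PR(i) = 0.05$ are hypothetical, as are the function names.

```python
import math

def k_closed_form(t, t_i, m, pr):
    """Equation (6): k_i(t) = (m + PR(i)) * ((2mt + 1) / (2m t_i + 1))**0.5 - PR(i)."""
    return (m + pr) * math.sqrt((2 * m * t + 1) / (2 * m * t_i + 1)) - pr

def k_numeric(t, t_i, m, pr, steps=20_000):
    """Integrate equation (5), dk/dt = m (k + PR(i)) / (2mt + 1), from k(t_i) = m with RK4.

    PR(i) is held constant (assumption).
    """
    def rate(s, k):
        return m * (k + pr) / (2 * m * s + 1)
    h = (t - t_i) / steps
    s, k = float(t_i), float(m)
    for _ in range(steps):
        k1 = rate(s, k)
        k2 = rate(s + h / 2, k + h * k1 / 2)
        k3 = rate(s + h / 2, k + h * k2 / 2)
        k4 = rate(s + h, k + h * k3)
        k += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return k

# Hypothetical values: m = 4, node i joins at t_i = 10 with PR(i) fixed at 0.05.
exact = k_closed_form(2000, 10, 4, 0.05)
numeric = k_numeric(2000, 10, 4, 0.05)
print(round(exact, 6), round(numeric, 6))
```

The two values should agree to many decimal places, and $k_i(t_i) = m$ holds by construction of (6).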
Equation (6) shows that all nodes evolve in the same way: for large $t$, $k_i(t)$ grows as a power of time, $k_i(t) \propto t^{1/2}$, so node evolution is consistent with the BA model. Assuming, as in the BA continuum analysis, that nodes are added at a constant rate so that $t_i$ is uniformly distributed with density $1/(m_0 + t)$, equation (6) gives the degree distribution $p(k)$ as follows:
$$p(k) = \frac{\partial P\bigl(k_i(t) < k\bigr)}{\partial k} = \frac{(2mt + 1)\bigl(m + PR(i)\bigr)^2}{m\,(m_0 + t)\bigl(k + PR(i)\bigr)^3} \tag{7}$$
From equation (7), since $(2mt + 1)/\bigl(m(m_0 + t)\bigr) \to 2$ as $t \to \infty$, the degree distribution of the vertices becomes
$$p(k) \approx \frac{2\bigl(m + PR(i)\bigr)^2}{\bigl(k + PR(i)\bigr)^3} \tag{8}$$
From equation (8), setting $PR(i) = 0$ gives

$$p(k) \approx 2m^2 k^{-3} \tag{9}$$
This is exactly the degree distribution of the BA model, which shows that the BA model is a special case of the new model.
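A small numerical sketch of equations (8) and (9), with hypothetical values of $k$ and $PR(i)$. It confirms that (8) reduces to the BA form (9) when $PR(i) = 0$, and evaluates the local log-log slope of (8), $d \ln p / d \ln k = -3k/\bigl(k + PR(i)\bigr)$, which is shallower than $-3$ whenever $PR(i) > 0$ and approaches $-3$ as $k$ grows. This describes the formula for a fixed $PR(i)$ only; it is not a claim about the simulations below. The function names are illustrative.

```python
import math

def p_eq8(k, m, pr):
    """Equation (8): p(k) ~ 2 (m + PR(i))^2 / (k + PR(i))^3 in the limit t -> infinity."""
    return 2 * (m + pr) ** 2 / (k + pr) ** 3

def p_eq9(k, m):
    """Equation (9): the BA degree distribution p(k) ~ 2 m^2 k^-3."""
    return 2 * m ** 2 * k ** -3

def local_slope_eq8(k, pr):
    """d ln p / d ln k for equation (8); equals -3 k / (k + PR(i))."""
    return -3 * k / (k + pr)

m = 4
for k in (4, 10, 100):
    same_as_ba = math.isclose(p_eq8(k, m, 0.0), p_eq9(k, m))
    print(k, same_as_ba, round(local_slope_eq8(k, 0.5), 3))
```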
IV. SIMULATION
We adopt the new model as well as BA model to
simulate the process of dynamically generation of the
realistic computer networks, and compare the two models. In
this paper, the initial nodes 0m with 5, when a new vertex is
added, the new vertex will connect with four old vertices
( m =4), and ultimately form the network of
vertex 2000N = . Starts from the random network to evolve,
degree and degree distribution under the new model can get
conclusion and we can compare degree and degree
distribution with BA model, as Fig.1 and Fig.2.
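A compact sketch of this simulation setup ($m_0 = 5$, $m = 4$, $N = 2000$ as defaults). The paper does not state how PageRank is computed on the growing network, so the following are assumptions: the network is treated as undirected; PageRank is the normalized damped variant with d = 0.85, whose values sum to 1 as equation (5) uses; it is recomputed only every `pr_every` additions for speed, with vertices added since the last refresh given PR = 0; and the initial "random network" links each pair of the $m_0$ nodes with a hypothetical probability `p0 = 0.5`, redrawn until no node is isolated. All function names are illustrative.

```python
import random
from collections import Counter

def pagerank_undirected(adj, d=0.85, iters=30):
    """Normalized PageRank on an undirected graph, each edge followed in both directions.

    Uses (1 - d)/n instead of (1 - d) so the values sum to 1 (assumption).
    Assumes every node has degree >= 1, which holds for the networks built below.
    """
    n = len(adj)
    pr = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1 - d) / n] * n
        for i, nbrs in enumerate(adj):
            share = d * pr[i] / len(nbrs)
            for j in nbrs:
                nxt[j] += share
        pr = nxt
    return pr

def grow(model, n_final=2000, m0=5, m=4, p0=0.5, pr_every=50, seed=0):
    """Grow a network with BA weights k_i (model="ba") or k_i + PR(i) (model="pagerank")."""
    rng = random.Random(seed)
    while True:  # random initial network on m0 nodes, redrawn until no node is isolated
        adj = [set() for _ in range(m0)]
        for i in range(m0):
            for j in range(i + 1, m0):
                if rng.random() < p0:
                    adj[i].add(j)
                    adj[j].add(i)
        if all(adj):
            break
    pr = pagerank_undirected(adj)
    while len(adj) < n_final:
        n = len(adj)
        if model == "pagerank":
            if n % pr_every == 0:
                pr = pagerank_undirected(adj)  # periodic refresh (assumption)
            weights = [len(adj[i]) + (pr[i] if i < len(pr) else 0.0) for i in range(n)]
        else:
            weights = [len(adj[i]) for i in range(n)]
        targets = set()
        while len(targets) < m:  # m distinct existing vertices, each drawn by weight
            targets.add(rng.choices(range(n), weights=weights)[0])
        adj.append(targets)
        for t in targets:
            adj[t].add(n)
    return adj

def degree_distribution(adj):
    """Fraction of vertices having each degree k."""
    counts = Counter(len(nbrs) for nbrs in adj)
    return {k: c / len(adj) for k, c in sorted(counts.items())}

# Small runs for a quick check; grow("ba") and grow("pagerank") use m0 = 5, m = 4, N = 2000.
small_ba = grow("ba", n_final=300)
small_pr = grow("pagerank", n_final=300)
print(len(small_ba), len(small_pr), list(degree_distribution(small_pr).items())[:3])
```

Refreshing PageRank less often trades accuracy of the attachment weights for speed; `pr_every=1` follows equation (4) exactly at every step but is much slower in pure Python.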
Fig. 1. Degree distribution of the new model.
Fig. 2. Degree distribution of the BA model.
Fig. 1 and Fig. 2 show that the two models have similar degree distributions: in both, the vertex degrees follow a power-law distribution, and the simulation results agree with the theoretical analysis. However, the degree distribution of the new model is gentler than that of the BA model. This indicates that PageRank values affect the choice of new connections: a new node chooses its neighbors not only by degree but also by how widely the candidates have been recognized.
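The comparison of Fig. 1 and Fig. 2 is made by inspection. One common way to put a number on how steep a power-law tail is, offered here as a swapped-in technique and not the procedure used in this paper, is the maximum-likelihood estimator of Clauset, Shalizi and Newman: $\hat\alpha = 1 + n\bigl[\sum_i \ln(x_i/x_{\min})\bigr]^{-1}$, with $x_{\min} - 1/2$ in place of $x_{\min}$ as an approximation for integer data such as degrees. A smaller $\hat\alpha$ means a gentler tail. The sketch checks the estimator on synthetic data; the choice of $x_{\min}$ for real degree data is left to the user, and the function name is illustrative.

```python
import math
import random

def powerlaw_mle(values, x_min, discrete=True):
    """Maximum-likelihood exponent alpha for p(x) ~ x^-alpha, using only values >= x_min.

    Continuous form: alpha = 1 + n / sum(ln(x / x_min)). For integer data such as
    degrees, x_min - 0.5 replaces x_min inside the logarithm (an approximation).
    """
    tail = [x for x in values if x >= x_min]
    scale = x_min - 0.5 if discrete else x_min
    return 1 + len(tail) / sum(math.log(x / scale) for x in tail)

# Sanity check on synthetic data with alpha = 3 and x_min = 1 (inverse-transform sampling).
rng = random.Random(42)
samples = [(1 - rng.random()) ** -0.5 for _ in range(50_000)]
alpha_hat = powerlaw_mle(samples, x_min=1.0, discrete=False)
print(round(alpha_hat, 2))
```

For the simulated networks, one would pass the list of vertex degrees from each model with the same $x_{\min}$, so that the two estimates are comparable.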
V. CONCLUSIONS
Although the BA model reveals the essential characteristics of networks and can describe many real networks, it still has some limitations. To bring network models closer to real systems, more intrinsic factors must be considered in the network construction process. Based on the observation that new nodes in real systems tend to connect to highly recognized nodes, this paper presents a scale-free network model based on the PageRank algorithm. Theoretical analysis and simulation experiments show that the degree distribution of the new model follows a power law. Although the new model takes into account how widely nodes are recognized, the extent of this influence requires further research.
REFERENCES
[1] P. Erdős and A. Rényi, "On the evolution of random graphs," Publ. Math. Inst. Hung. Acad. Sci., vol. 5, 1960, pp. 17-61.
[2] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, 1998, pp. 440-442.
[3] A.-L. Barabási and R. Albert, "Emergence of scaling in random networks," Science, vol. 286, 1999, pp. 509-512.
[4] X.-F. Wang, X. Li, and G.-R. Chen, Complex Network Theory and Its Applications (in Chinese). Beijing: Tsinghua University Press, 2006.
[5] G. Bianconi and A.-L. Barabási, "Competition and multiscaling in evolving networks," Europhys. Lett., vol. 54, 2001, p. 436.
[6] X. Li and G. Chen, "A local-world evolving network model," Physica A, vol. 328, 2003, pp. 274-286.
[7] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine," in Proc. 7th Int. World Wide Web Conf., Brisbane, Australia, Apr. 14-18, 1998, pp. 107-117.
[8] L. Page, S. Brin, R. Motwani, and T. Winograd, "The PageRank citation ranking: Bringing order to the Web," Stanford Digital Libraries Working Paper, 1998.
[9] Y. Liu, S. Tao, K. Xu, and H. Chen, "A new evolving model of complex networks," in Proc. 4th Int. Conf. Impulsive Dynamical Systems and Applications (ICIDSA 2007), Nanning, China, pp. 1803-1805.
[10] S. N. Dorogovtsev and J. F. F. Mendes, "Evolution of networks with aging of sites," Phys. Rev. E, vol. 71, 2000, 046112.