Cooperative Control of
Multi-Agent Systems
Hideaki Ishii
Dept. Computational Intelligence & Systems Science
Advanced Topics in Mathematical Information Sciences II
Jul 17th, 2015
Part 2: Application of consensus
Distributed computation of PageRank
for search engines
PageRank Algorithm at Google
Search engine at Google
Proposed by the co-founders S. Brin and L. Page
Quantifies importance/popularity of each web page
Popular pages are ranked higher in search results
One of the 200 signals used at Google
Brin & Page (1998), Langville & Meyer (2006)
PageRank in Various Areas
A paradigmatic problem for ranking objects in IT, Bibliometrics, Biology, E-Commerce,…
Scientific Journals: Eigenfactor
cf. Impact Factor: Based on # of citations only (Not enough!)
Ideas similar to PageRank can be traced back to the 70s
Proteins in systems biology
Companies related by business interactions
…
Keyword: “Department of control engineering” (in Japanese)
Differences in search results
Tokyo Tech: Only in the 16th place!
What makes the difference?
PageRank algorithm at Google
Quantifies the importance of each website
Uses this info for the ordering
Google Toolbar
How is PageRank determined?
Basic idea Brin & Page (1999)
More incoming links, especially those from
important pages, make a page important.
Determined by the link structure of the web
[Figure: example pages with their PageRank values: top page of Tokyo Tech (7); Cont. Sys. Eng., Mech. Eng. Sci., Comp. Intel. Sys. Sci. (5 each)]
PageRank: Computational aspects
Computed centrally within Google
Web data: Automatically collected by crawlers
Over 8 billion webpage indices
Computed once a month, takes about a week
Recent research on more efficient computing
Distributed randomized approach
Motivation 1: Multi-agent consensus
View each webpage as an agent capable of computing
Motivation 2: Probabilistic algorithms for systems and control problems
Allow the agents to behave asynchronously
Ishii & Tempo, IEEE Control Systems Magazine, 2014
PageRank problem
Web consisting of n pages
Directed graph: each page i is a node, each link is an edge
Page without any link (e.g., PDF file): add an artificial link
$x_i^* \in [0,1]$: PageRank value of page i
Larger value implies more important
Determined by the link structure only
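PageRank definition
Example: A web of 4 pages
[Figure: directed graph on pages 1, 2, 3, 4]
PageRank value of page 1 (the coefficient 1/3 comes from the # of links of page 4):
$$x_1^* = \tfrac{1}{3} x_4^*$$
PageRank value of page 4 (the coefficient 1/2 comes from the # of links of page 2, and likewise of page 3):
$$x_4^* = \tfrac{1}{2} x_2^* + \tfrac{1}{2} x_3^*$$
All four pages:
$$
\begin{aligned}
x_1^* &= \tfrac{1}{3} x_4^* \\
x_2^* &= x_1^* + \tfrac{1}{2} x_3^* + \tfrac{1}{3} x_4^* \\
x_3^* &= \tfrac{1}{2} x_2^* + \tfrac{1}{3} x_4^* \\
x_4^* &= \tfrac{1}{2} x_2^* + \tfrac{1}{2} x_3^*
\end{aligned}
$$
Normalization: $x_1^* + x_2^* + x_3^* + x_4^* = 1$
Ordering: the pages are ranked by their values $x_i^*$
In the vector form:
$$
\begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \\ x_4^* \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 1/3 \\
1 & 0 & 1/2 & 1/3 \\
0 & 1/2 & 0 & 1/3 \\
0 & 1/2 & 1/2 & 0
\end{bmatrix}
\begin{bmatrix} x_1^* \\ x_2^* \\ x_3^* \\ x_4^* \end{bmatrix}
$$
(Column) stochastic matrix: sum of elements in each column = 1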
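The 4-page example can be checked numerically. The following is a minimal sketch, not code from the lecture: it iterates x ← Ax in plain Python from the uniform vector, which converges for this particular graph because it is strongly connected and aperiodic. The helper names `matvec` and `solve_by_iteration` are hypothetical.

```python
# Link matrix A of the 4-page example: column j holds the weights of the
# links leaving page j+1, so every column sums to 1.
A = [
    [0.0, 0.0, 0.0, 1 / 3],
    [1.0, 0.0, 0.5, 1 / 3],
    [0.0, 0.5, 0.0, 1 / 3],
    [0.0, 0.5, 0.5, 0.0],
]


def matvec(mat, vec):
    """Matrix-vector product for plain Python lists."""
    return [sum(a * v for a, v in zip(row, vec)) for row in mat]


def solve_by_iteration(mat, steps=500):
    """Iterate x <- mat x starting from the uniform probability vector."""
    n = len(mat)
    x = [1.0 / n] * n
    for _ in range(steps):
        x = matvec(mat, x)
    return x


x_star = solve_by_iteration(A)
# Pages sorted from most to least important.
ranking = sorted(range(1, 5), key=lambda i: -x_star[i - 1])

print([round(v, 4) for v in x_star])  # [0.1, 0.3333, 0.2667, 0.3]
print(ranking)                        # [2, 4, 3, 1]
```

Solving the four equations by hand gives the same values, x* = (1/10, 1/3, 4/15, 3/10), so page 2 is ranked first and page 1 last.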
PageRank vector
$$x^* = A x^*, \quad x^* \in [0,1]^n, \quad \sum_{i=1}^{n} x_i^* = 1$$
Link matrix $A$: Stochastic matrix
Eigenvector corresponding to eigenvalue 1
Always exists, but there may be multiple such vectors
If the web as a graph is strongly connected, then it is unique
However, the real web does not have this property…
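Modified PageRank problem
Redefine PageRank by
$$x^* = M x^*, \quad x^* \in [0,1]^n, \quad \sum_{i=1}^{n} x_i^* = 1$$
where
$$M := (1 - m) A + \frac{m}{n} S, \quad m = 0.15, \quad S := \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}$$
$M$: Stochastic
$M > 0$: Each element is a positive value
Only one such eigenvector exists (by Perron theorem in matrix theory)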
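To make the modification concrete, here is a small sketch that builds M = (1 − m)A + (m/n)S and checks the two properties used above: M stays column stochastic and every entry becomes positive. The 3-page web is a made-up illustration (page 3 receives no links, so the graph is not strongly connected), and `modified_link_matrix` is a hypothetical helper name.

```python
def modified_link_matrix(A, m=0.15):
    """Return M = (1 - m) * A + (m / n) * S, with S the all-ones matrix."""
    n = len(A)
    return [[(1.0 - m) * A[i][j] + m / n for j in range(n)] for i in range(n)]


# Hypothetical 3-page web: 1 -> 2, 2 -> 1, 3 -> 1, 3 -> 2.
# No page links to page 3, so the graph is not strongly connected.
A = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 0.0, 0.0],
]

M = modified_link_matrix(A)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(3)]
all_positive = all(entry > 0 for row in M for entry in row)

print(column_sums)   # each column still sums to 1 (up to rounding)
print(all_positive)  # True
```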
Modified PageRank problem
Computation based on the power method:
$$x(k+1) = M x(k)$$
Eigenvalues of $M$ have magnitude ≤ 1
As a discrete-time system, it is (critically) stable
Asymptotic convergence to PageRank:
$$x(k) \to x^*, \quad k \to \infty$$
Centralized computation: May require high computational load
Can the computation be implemented in a distributed way?
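The centralized power method x(k+1) = Mx(k) can be sketched as follows for the 4-page example with m = 0.15. This is an illustrative implementation under assumptions that are not from the slides: the stopping rule (the 1-norm change between iterates falling below `tol`) and the function name `power_method` are choices made here.

```python
def power_method(M, tol=1e-12, max_steps=10_000):
    """Iterate x(k+1) = M x(k) until successive iterates differ by < tol."""
    n = len(M)
    x = [1.0 / n] * n  # any probability vector works as x(0)
    for k in range(1, max_steps + 1):
        x_next = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(x_next, x))
        x = x_next
        if change < tol:
            return x, k
    return x, max_steps


# Link matrix of the 4-page example and its modified version M.
A = [
    [0.0, 0.0, 0.0, 1 / 3],
    [1.0, 0.0, 0.5, 1 / 3],
    [0.0, 0.5, 0.0, 1 / 3],
    [0.0, 0.5, 0.5, 0.0],
]
m, n = 0.15, 4
M = [[(1.0 - m) * A[i][j] + m / n for j in range(n)] for i in range(n)]

x_star, steps_used = power_method(M)
print(x_star, steps_used)
```

Every step needs the whole vector x(k) and the whole matrix M in one place, which is the computational load the distributed scheme aims to avoid.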
Data center of Google
The Dalles, Oregon, U.S.A.
Distributed randomized approach to PageRank computation
Distributed randomized approach
Each page i computes its own value $x_i(k)$
Pages exchange their info over the links
Basic protocol in the algorithm
At each time k, one page is chosen.
Denote the index of this page by $\theta(k)$; the value of the page chosen at time k is $x_{\theta(k)}(k)$.
Then follow Steps 1 to 3.
Step 1: Send
Step 2: Return
Step 3: Update
One page is chosen at a time
Select a page probabilistically
Each page has the same probability of $1/n$
It can be implemented in a decentralized way
Distributed update scheme
$$x(k+1) = A_{\theta(k)} x(k)$$
Switches depending on the chosen page
Initial vector: $x(0) \ge 0$, $\sum_{i=1}^{n} x_i(0) = 1$
Goal: The scheme should compute the PageRank $x_i^*$ of each agent $i$ from the state $x_i(k)$
How shall the link matrices be selected?
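Distributed link matrices
Centralized scheme:
$$x(k+1) = A x(k), \quad A = \begin{bmatrix} 0 & 0 & 0 & 1/3 \\ 1 & 0 & 1/2 & 1/3 \\ 0 & 1/2 & 0 & 1/3 \\ 0 & 1/2 & 1/2 & 0 \end{bmatrix}$$
Distributed scheme (page 4 is chosen):
$$x(k+1) = A_{\theta(k)} x(k), \quad \theta(k) = 4$$
$A_4$ keeps the 4th row and the 4th column of $A$, has zeros in the other off-diagonal entries, and has diagonal entries chosen so that it is a column stochastic matrix:
$$A_4 = \begin{bmatrix} 1 & 0 & 0 & 1/3 \\ 0 & 1/2 & 0 & 1/3 \\ 0 & 0 & 1/2 & 1/3 \\ 0 & 1/2 & 1/2 & 0 \end{bmatrix}$$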
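The construction of the distributed link matrix can be written as a short function. This is a sketch that generalizes the page-4 example to any page i under the rule "keep row i and column i of A, zero the other off-diagonal entries, and set the remaining diagonal entries so that each column sums to 1". The function name `distributed_link_matrix` and the 0-based indexing are choices made here, and exact fractions are used only to make the check exact.

```python
from fractions import Fraction as F


def distributed_link_matrix(A, i):
    """Build A_i from the link matrix A for the chosen page i (0-based)."""
    n = len(A)
    Ai = [[F(0)] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            if j == i or l == i:
                Ai[j][l] = A[j][l]      # keep the i-th row and i-th column
            elif j == l:
                Ai[j][l] = 1 - A[i][l]  # diagonal entry: column l sums to 1
    return Ai


A = [
    [F(0), F(0), F(0), F(1, 3)],
    [F(1), F(0), F(1, 2), F(1, 3)],
    [F(0), F(1, 2), F(0), F(1, 3)],
    [F(0), F(1, 2), F(1, 2), F(0)],
]

A4 = distributed_link_matrix(A, 3)  # page 4 is chosen
for row in A4:
    print([str(entry) for entry in row])
```

Page 4 only needs its own row and column of A, that is, its incoming and outgoing links, to form this matrix.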
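Modified update scheme
$$x(k+1) = M_{\theta(k)} x(k), \quad x(0) \ge 0, \quad \sum_{i=1}^{n} x_i(0) = 1$$
Stochastic system
Average state: $\bar{x}(k) := E[x(k)]$
Its dynamics: $\bar{x}(k+1) = \bar{M} \bar{x}(k)$
Average matrix: $\bar{M} := E[M_{\theta(k)}]$
Must converge to $x^*$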
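The step from the stochastic update to the average dynamics can be spelled out. The derivation below assumes that the chosen page θ(k) is drawn independently at each time step and uniformly over the n pages; the uniform probability 1/n is stated in the slides, while independence across time is an assumption made here. Under it, θ(k) is independent of x(k), which depends only on θ(0), …, θ(k−1).

```latex
\begin{aligned}
\bar{x}(k+1) &= E\bigl[M_{\theta(k)}\, x(k)\bigr]
              = E\bigl[M_{\theta(k)}\bigr]\, E\bigl[x(k)\bigr]
              = \bar{M}\, \bar{x}(k), \\
\bar{M} &= E\bigl[M_{\theta(k)}\bigr] = \frac{1}{n} \sum_{i=1}^{n} M_i .
\end{aligned}
```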
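Modified link matrix
Same form as $M$:
$$M_i := (1 - \hat{m}) A_i + \frac{\hat{m}}{n} S, \quad i = 1, \dots, n, \quad \hat{m} \in (0,1)$$
$\bar{M}$ and $M$ share the eigenvector for eigenvalue 1:
$$\bar{x}(k) \to x^*, \quad k \to \infty$$
However, the state does not converge…
$$x(k) \not\to x^*, \quad k \to \infty$$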
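The claim that the average matrix and M share the eigenvector for eigenvalue 1 can be checked numerically on the 4-page example. The slides only say that m̂ lies in (0, 1); the value m̂ = 2m / (n − m(n − 2)) used below is an assumption made for this sketch, chosen so that the shared-eigenvector property holds, and the code verifies that it does. Helper names are hypothetical.

```python
def distributed_link_matrix(A, i):
    """A_i: keep row/column i of A, fix the diagonal so columns sum to 1."""
    n = len(A)
    Ai = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            if j == i or l == i:
                Ai[j][l] = A[j][l]
            elif j == l:
                Ai[j][l] = 1.0 - A[i][l]
    return Ai


def blend(B, m):
    """Return (1 - m) * B + (m / n) * S, with S the all-ones matrix."""
    n = len(B)
    return [[(1.0 - m) * B[j][l] + m / n for l in range(n)] for j in range(n)]


def matvec(B, x):
    return [sum(b * v for b, v in zip(row, x)) for row in B]


A = [
    [0.0, 0.0, 0.0, 1 / 3],
    [1.0, 0.0, 0.5, 1 / 3],
    [0.0, 0.5, 0.0, 1 / 3],
    [0.0, 0.5, 0.5, 0.0],
]
n, m = 4, 0.15
m_hat = 2 * m / (n - m * (n - 2))  # assumed value, not given in the slides

M = blend(A, m)
Ms = [blend(distributed_link_matrix(A, i), m_hat) for i in range(n)]
# Average matrix under uniform page selection: (1/n) * sum of the M_i.
M_bar = [[sum(Mi[j][l] for Mi in Ms) / n for l in range(n)] for j in range(n)]

# PageRank vector x* of M via the power method.
x_star = [1.0 / n] * n
for _ in range(200):
    x_star = matvec(M, x_star)

residual = max(abs(a - b) for a, b in zip(matvec(M_bar, x_star), x_star))
print(m_hat, residual)  # residual is at rounding-error level
```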
Convergence result
We hence focus on the time average:
$$y(k) := \frac{1}{k+1} \sum_{l=0}^{k} x(l)$$
The time average converges to the PageRank vector in the mean-square sense:
$$E\left[ \| y(k) - x^* \|^2 \right] \to 0, \quad k \to \infty$$
Ishii and Tempo (2010), Ishii, Tempo, Bai (2012)
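A small simulation illustrates the role of the time average. The sketch below runs the randomized update x(k+1) = M_θ(k) x(k) on the 4-page example, with θ(k) drawn uniformly at random, and tracks the running average y(k). It is an illustration under assumptions not taken from the slides: the value of m̂ is the assumed choice 2m / (n − m(n − 2)), and the step count, seed, and helper names are arbitrary.

```python
import random


def distributed_link_matrix(A, i):
    """A_i: keep row/column i of A, fix the diagonal so columns sum to 1."""
    n = len(A)
    Ai = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for l in range(n):
            if j == i or l == i:
                Ai[j][l] = A[j][l]
            elif j == l:
                Ai[j][l] = 1.0 - A[i][l]
    return Ai


def blend(B, m):
    """Return (1 - m) * B + (m / n) * S, with S the all-ones matrix."""
    n = len(B)
    return [[(1.0 - m) * B[j][l] + m / n for l in range(n)] for j in range(n)]


def matvec(B, x):
    return [sum(b * v for b, v in zip(row, x)) for row in B]


def simulate_time_average(A, m=0.15, steps=100_000, seed=0):
    """Return y(steps), the time average of the randomized iterates."""
    n = len(A)
    m_hat = 2 * m / (n - m * (n - 2))  # assumed value, see the lead-in
    Ms = [blend(distributed_link_matrix(A, i), m_hat) for i in range(n)]
    rng = random.Random(seed)
    x = [1.0 / n] * n
    y = x[:]  # y(0) = x(0)
    for k in range(1, steps + 1):
        x = matvec(Ms[rng.randrange(n)], x)  # theta(k) uniform on the pages
        y = [yi + (xi - yi) / (k + 1) for yi, xi in zip(y, x)]
    return y


A = [
    [0.0, 0.0, 0.0, 1 / 3],
    [1.0, 0.0, 0.5, 1 / 3],
    [0.0, 0.5, 0.0, 1 / 3],
    [0.0, 0.5, 0.5, 0.0],
]

# Reference PageRank vector from the centralized power method.
x_star = [0.25] * 4
for _ in range(200):
    x_star = matvec(blend(A, 0.15), x_star)

y = simulate_time_average(A)
error = max(abs(a - b) for a, b in zip(y, x_star))
print(y, error)
```

The individual states x(k) keep jumping with each random choice, while the averaged vector y(k) settles near x*.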
Numerical Experiment
From a university in New Zealand (www.lincoln.ac.nz)
3,756 nodes, 31,718 links, 684 subdomains
Statistical Cybernetics Research Group, Univ. Wolverhampton, U.K.
Web Structure
[Figure: nonzero entries of the link matrix A, index i vs. index j]
Plotted nonzero entries in the link matrix A. Very sparse.
Large number of dangling nodes (red dots > 85 %)
Large clusters
PageRank Values
[Figure: PageRank vs. page index i]
Pages in the clusters take larger values.
Top ranked pages: “Search” page & Univ. top page
Relation to consensus
Consensus problem
Network of agents: Directed graph, strongly connected
Agent i has the state $x_i(k)$
Communication: Edges are chosen randomly
Consensus: With probability 1, it holds that
$$|x_i(k) - x_j(k)| \to 0, \quad k \to \infty, \quad \forall i, j$$
Hatano & Mesbahi (2005), Wu (2006), Tahbaz & Jadbabaie (2008)
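For comparison, a randomized consensus iteration can be sketched in a few lines. The slides only state that edges are chosen randomly; the specific update used here (the receiving agent moves to the midpoint between its own value and the sender's) is an assumption made for illustration, as are the function name and the choice of the 4-page graph as the agent network.

```python
import random


def random_edge_consensus(edges, x0, steps=2000, seed=0):
    """At each step pick one directed edge j -> i; agent i averages with j."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        j, i = rng.choice(edges)
        # Row-stochastic update: row i of the update matrix has entries 1/2, 1/2.
        x[i] = 0.5 * (x[i] + x[j])
    return x


# Strongly connected directed graph (the links of the 4-page example, 0-based).
edges = [(0, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 0), (3, 1), (3, 2)]
x0 = [1.0, 2.0, 3.0, 4.0]

x = random_edge_consensus(edges, x0)
spread = max(x) - min(x)
print(x, spread)  # all values agree up to a tiny spread
```

Unlike in the PageRank scheme, the states themselves converge here, but the common value depends on the initial values and on the random sequence of edges.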
Comparison
Item | PageRank | Consensus
Graph | × (not strongly connected) | Strongly connected
Update law | $x(k+1) = M_{\theta(k)} x(k)$
Randomization $\theta(k)$ | Page | Edge
Matrix $M_{\theta(k)}$ | Column stochastic | Row stochastic
Objective | Time ave. $y(k) \to x^*$ | $|x_i(k) - x_j(k)| \to 0$
Effects of uncertain links
Uncertain links
When the linked page cannot be viewed
Server failure, or the page deleted temporarily
Incorrect data of the web structure
Can we find how much PageRank values vary in the presence of such links?
Especially, for important pages, the error may be large
Ishii & Tempo (2009)
PageRank values under uncertain data
$E_f$: Set of uncertain links (with $d$ links)
Graphs for all combinations of missing links: $2^d$ graphs
PageRank vector corresponding to each graph:
$$x^{(i)} = M^{(i)} x^{(i)}, \quad x^{(i)} \in [0,1]^n, \quad \sum_{j=1}^{n} x_j^{(i)} = 1$$
Difficult to compute all of them (just too many!)
Proposed method: (Centralized) computation of PageRank interval
⇒ Contains error, but computationally efficient
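The brute-force alternative that the proposed method avoids can be sketched for a tiny web: enumerate all 2^d combinations of present/missing uncertain links and record the smallest and largest PageRank value of each page. This is not the interval method of the lecture, only the exhaustive baseline it is compared against. The 4-page link set, the choice of two uncertain links, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def link_matrix(n, links):
    """Column-stochastic link matrix from (source, target) pairs, 0-based."""
    out_degree = [0] * n
    for source, _ in links:
        out_degree[source] += 1
    if min(out_degree) == 0:
        raise ValueError("every page needs at least one outgoing link")
    A = [[0.0] * n for _ in range(n)]
    for source, target in links:
        A[target][source] = 1.0 / out_degree[source]
    return A


def pagerank(A, m=0.15, steps=200):
    """Power method for M = (1 - m) A + (m / n) S."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(steps):
        # (m / n) * S x equals m / n in every entry because sum(x) == 1.
        x = [(1.0 - m) * sum(A[i][j] * x[j] for j in range(n)) + m / n
             for i in range(n)]
    return x


def pagerank_range(n, certain, fragile):
    """Min and max PageRank per page over all 2^d subsets of fragile links."""
    lows, highs, count = [1.0] * n, [0.0] * n, 0
    for r in range(len(fragile) + 1):
        for present in combinations(fragile, r):
            x = pagerank(link_matrix(n, certain + list(present)))
            lows = [min(a, b) for a, b in zip(lows, x)]
            highs = [max(a, b) for a, b in zip(highs, x)]
            count += 1
    return lows, highs, count


# Links of the 4-page example; 2 -> 3 and 4 -> 1 are treated as uncertain.
certain = [(0, 1), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
fragile = [(1, 2), (3, 0)]

lows, highs, count = pagerank_range(4, certain, fragile)
print(count)  # 4 graphs for d = 2
for page, (lo, hi) in enumerate(zip(lows, highs), start=1):
    print(page, round(lo, 4), round(hi, 4))
```

The number of graphs doubles with every additional uncertain link, which is why the exhaustive computation becomes impractical.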
Numerical example 1: Web of 150 pages
Total # of links: 2,560, Uncertain links: 18
Range of PageRank (for 20 pages)
[Figure: PageRank vs. page index, showing the true maximum & minimum]
Important pages: Linked by about half of the pages
[Figure: computation time vs. # of uncertain links, for the true range and the proposed method]
[Figure: relative error w.r.t. true values vs. # of uncertain links, for the true range and the proposed method]
Numerical example 2: Web of 1,200 pages
Total # of links: 27,500, Uncertain links: 1,000
Range of PageRank values (for 16 pages)
[Figure: PageRank vs. page index, showing the range obtained by proposed method]
Important pages: Linked by over 250 pages
Large enough that rankings may change!
Summary: Part 2
PageRank computation at Google
Algorithms via a distributed randomized approach
Method to study effects of uncertain links
Issues related to computation/communication resources
Collaborators:
Roberto Tempo (IEIIT-CNR, Politecnico di Torino)
Er-Wei Bai (University of Iowa)