2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM'13, August 25-29, 2013, Niagara, Ontario, CAN. Copyright 2013 ACM 978-1-4503-2240-9/13/08 ...$15.00


Finding Influencers in Networks using Social Capital

Karthik Subbian, Dhruv Sharma
University of Minnesota,

Minneapolis, MN 55455.

Email: {karthik,dsharma}@cs.umn.edu

Zhen Wen
IBM T.J. Watson Research Center,

Yorktown Heights, NY 10598.


Jaideep Srivastava
University of Minnesota,

Minneapolis, MN 55455.


Abstract—Existing methods for finding influencers use the process of information diffusion to discover the nodes with maximum information spread. These models capture only the process of information diffusion, not the actual social value of collaborations in the network. We propose a method for finding influencers based on the idea that people generate more value for their work by collaborating with peers of high influence. The social value generated through such collaborations constitutes an individual's social capital. We hypothesize and show that players with high social capital are often key influencers in the network. We propose a value-allocation model to compute the social capital and allocate a fair share of this capital to each individual involved in the collaboration. We show that our allocation satisfies several axioms of fairness and falls in the same class as Myerson’s allocation function. We implement our allocation rule using an efficient algorithm, SoCap, and show that it outperforms the baselines on several real-life data sets. Specifically, on the DBLP network, our algorithm outperforms the PageRank, PMIA and Weighted Degree baselines by up to 8% in terms of precision, recall and F1-measure.

I. INTRODUCTION

With the proliferation of online social networks, the influence of one person or one event may reach every corner of the globe in a very short period of time. The problem of identifying the key sources of influence is important for practical applications in sales and marketing [1], public health and policy.

Existing approaches for finding influencers in networks [2], [3], [4], [5] attempt to model social influence through the process of information diffusion: the more influential a user is, the wider the spread of information. The information flow is modeled using a network structure with static or dynamic edge probabilities, which are estimated from past observations of information flow. The influence model used in these papers assumes that each node independently infects its neighbors with a certain probability, and every infected node further cascades this infection through the network. Even though this captures the process of influence from a node-to-node perspective, it offers little insight into influence in the context of the whole network, or when only a few observations of information flow are available. To illustrate, consider the following example. The new CEO of a company may have only a few connections and limited information flows in the organization's network. Can he or she influence the adoption of a new technology in the company? The answer would be yes. This influence stems not from his or her few connections or limited information flows, but from the control he or she exerts over the network's resources (in this case, all the employees of the company). Such aspects cannot be fully captured by the local interactions of two nodes. Instead, they can only be studied if we understand the value that each node contributes to and derives from the overall network. We hypothesize that nodes with high social value in the network tend to be more influential, as is the case of the CEO in our example.

For this purpose, we first characterize the value of the network using “Social Capital”. Among the various definitions of social capital [6], [7], [8], the most widely accepted is: “Social capital is about the value of social networks, bonding similar people and bridging diverse people, with norms of reciprocity” [7]. The notion of social capital includes bonding and bridging capital. Bonding capital is the ability to calibrate similar people against each other; bridging capital is the ability to connect diverse people.

The ability of these bonding and bridging nodes to cooperate and communicate with each other creates an inherent value for the entire network. For example, in an academic co-authorship network, interdisciplinary researchers can be considered strong bridging nodes, and research leaders within their own communities can be considered strong bonding nodes. The overall value generated by such cooperation is termed the social capital of the network. The question then becomes: how do these nodes share this overall social capital among themselves in a fair way? Once we know each node's fair share of the network value, we hypothesize that this value is proportional to the node's potential to influence the network.

A. Contributions and Organization

We propose a method for finding influencers based on the idea that people generate more value for their work by collaborating with peers of high influence. The social value generated through such collaborations constitutes an individual's social capital. We hypothesize and show that players with high social capital are often key influencers in the network. Overall, we bring concepts from Social Science and Game Theory to bear on the problem of finding influencers. The key contributions of our work are as follows:

• We have developed a novel method for finding influencers in a given network using the notion of social capital. It differs from centrality measures such as degree and PageRank because it finds the nodes with high social value generated through multiple collaborations. It also differs from cascade-based influence models because it does not use any underlying influence propagation model.


• Our model is based on the popular value-allocation approach [9] for finding individual social capital value. We compute the social capital value generated by multiple collaborations and allocate a fair share of this value to the individual nodes involved in each collaboration. We show that our allocation satisfies several axioms of fairness and falls in the same class as the popular Myerson allocation rule [10].

• We implement our allocation rule using an efficient algorithm, SoCap, and show that it outperforms the baselines on several real-life data sets. In particular, on the DBLP collaboration network, our algorithm outperforms the PageRank, PMIA and Weighted Degree baselines by up to 8% in terms of precision, recall and F1-measure.

The paper is organized as follows. Section II discusses related work. We formally define the notion of social capital and the valuation function in Section III. Section IV describes the axioms of fairness required to allocate the social capital among the nodes, and Section V presents our allocation function and an efficient algorithm to implement it. Finally, in Section VI we verify our approach on multiple real-life data sets to show that nodes with high social capital value are indeed key influencers. Section VII concludes the paper.

II. RELATED WORK

The problem of finding influencers in a network is often studied as an influence maximization problem [2], [11], [4], [5], [3]: finding the top-k nodes such that the average infection spread is maximized under a specific influence propagation model. There are two popular choices for the influence propagation model, Independent Cascade (IC) and Linear Threshold (LT) [2]. All of these works assume that the edge propagation probabilities of the influence propagation model are given; the most popular choices are the weighted cascade model [2] and the trivalency model [5]. Further, most of these influence maximization approaches use Monte Carlo (MC) simulation to find the top-k seed nodes in the network that maximize the average influence spread. Some recent work focuses on optimizing the greedy heuristics of MC simulation, such as [11], [4]. Leskovec et al. proposed an optimization strategy (CELF) [4] that exploits the submodularity of the influence maximization function. In [11], the authors propose a shortest-path-based influence model and describe an efficient algorithm for computing influence spread under this model. All of these approaches find influencers using an underlying influence propagation model and use efficient techniques for MC simulation.
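For readers less familiar with the cascade models above, the following is a minimal sketch (not taken from any of the cited papers) of Monte Carlo estimation of the expected spread under the Independent Cascade model, with edge probabilities from the weighted cascade model, p(u, v) = 1/in-degree(v) [2]. It assumes an undirected graph stored as an adjacency dict, so the in-degree of v is its degree; `weighted_cascade_probs` and `ic_spread` are hypothetical names, and the greedy seed-selection loop as well as the CELF [4] and shortest-path [11] optimizations are omitted.

```python
import random


def weighted_cascade_probs(adj):
    """Weighted cascade edge probabilities: p(u, v) = 1 / in-degree(v).

    `adj` maps each node to a list of neighbours; for an undirected graph
    the in-degree of v is simply len(adj[v]).
    """
    return {(u, v): 1.0 / len(adj[v]) for u in adj for v in adj[u]}


def ic_spread(adj, probs, seeds, runs=1000, rng_seed=0):
    """Monte Carlo estimate of the expected Independent Cascade spread of `seeds`."""
    rng = random.Random(rng_seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    # A newly activated u gets one chance to activate each inactive neighbour v.
                    if v not in active and rng.random() < probs[(u, v)]:
                        active.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        total += len(active)
    return total / runs


path = {0: [1], 1: [0, 2], 2: [1]}
probs = weighted_cascade_probs(path)
print(probs[(0, 1)], probs[(1, 2)])              # 0.5 1.0
print(ic_spread(path, probs, [0], runs=20000))   # close to 2.0
```

On the path 0-1-2 seeded at node 0, node 1 becomes active with probability 1/2 and then always activates node 2, so the exact expected spread is 1 + 0.5 · 2 = 2; the estimate approaches this value as `runs` grows.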

In a parallel thread, several papers discuss network formation and the efficiency and stability of the resulting networks [9], [12], [13], [14]. The seminal work of Jackson and Wolinsky [9] defines a valuation function for the network and an allocation rule that distributes this value to the nodes in the network. Their model does not take into account the bridging benefits of a node. In [12], a non-cooperative game model is proposed to study network formation, under the assumption that benefits do not decay from non-neighbor nodes. Along similar lines, Kleinberg et al. [13] propose a non-cooperative game model for network formation which assumes that bridging benefits exist only for a path length of two. This corresponds to a node bridging two of its neighbors and getting a certain benefit for doing so; all other bridging benefits are ignored. The authors also characterize the structure of stable networks, with Nash equilibrium as the notion of stability. In more recent work [14], the authors formulate the problem of network formation using several important characteristics, such as the cost of maintaining links and benefits that decay with distance. These works discuss the fair division of value among nodes in a network, but their focus is on the stability and efficiency of network formation. In contrast, the goal of this paper is to propose a framework to compute the social capital value of nodes in networks.

There have also been some recent attempts to define and compute social capital [15], [16], [17]. In [15], [17], Smith et al. discuss bonding and bridging capital as the two components of social capital, and use implicit and explicit affinities between nodes to compute the bonding and bridging capital. Their model assumes that bonding and bridging capital arise only from benefits due to immediate neighbors and completely ignores non-neighbor benefits; hence, the decaying benefits from distant non-neighbor nodes are also ignored. They also attempt to compute social capital directly at the node level. However, social capital is generated by the social interactions of all nodes in the network. Therefore, our paper focuses on computing the overall value of such interactions and the fair allocation of that capital.

III. COMPUTING SOCIAL CAPITAL

In this section, we define the notion of social capital in the context of collaborative networks. As defined in [7], social capital is the value of social networks, bonding similar people and bridging between diverse people, with norms of reciprocity. We now translate this definition into a computable social capital value v(g) for the network g.

A. The Valuation Function

Let $g = \langle V, E \rangle$ be a network¹ with vertex set V and edge set E. The valuation function is defined as $v : \Psi \to \mathbb{R}$, where Ψ is the set of all possible network topologies on the nodes in V. More precisely, for a given graph g ∈ Ψ, our valuation function, which computes the social capital value of the network, sums the shortest-path benefits over all pairs of nodes:

$$v(g) = \sum_{\{i,j\} \subseteq V,\ i \neq j} b\big(d_g(i, j)\big).$$

The distance function $d_g$ computes the length of the shortest path between nodes i and j. Here we assume that people often make new connections within the network to reach new friends through shortest paths. This assumption is consistent with many network formation studies [9], [18]. The function b(·) gives the benefit achieved through a shortest path of length l. We choose the benefit function to be exponentially decaying, $b(l) = e^{-\lambda l}$, where l is the length of the shortest path. When there is no path between i and j, $d_g(i, j) = \infty$ and the corresponding benefit is zero.

The social capital valuation of network g is given by v(g), which is computed by adding the shortest-path benefits obtained between all pairs of nodes in g. Note that this definition captures the two important desired properties of our valuation function: (1) benefits due to immediate neighbors, and (2) decaying benefits from non-immediate neighbors. The benefits due to immediate neighbors are captured by the benefits of paths of length one, and the benefits due to non-immediate neighbors are captured by the benefits of paths of length greater than one. The benefit function is chosen such that the benefits decay exponentially as the path length grows. So, for a complete graph g, where each node is connected to every other node through a shortest path of length one, the social capital value is maximal and equals $v(g) = \frac{|V|(|V|-1)}{2}\, b(1)$. In contrast, for a graph g with no edges, the social capital value is zero.

¹ The terms “network” and “graph” are used interchangeably in this paper.
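A minimal sketch of the valuation function, assuming an unweighted, undirected graph stored as an adjacency dict, shortest-path length measured in hops, and $b(l) = e^{-\lambda l}$ with λ = 1 by default. Following the description above, it adds the benefit of every pair of nodes; pairs with no connecting path are skipped, which is the same as b(∞) = 0. The helper names are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def social_capital_value(adj, lam=1.0):
    """v(g): sum of b(d_g(i, j)) = exp(-lam * d_g(i, j)) over all unordered node pairs."""
    dist = {u: bfs_distances(adj, u) for u in adj}
    total = 0.0
    for i, j in combinations(adj, 2):
        d = dist[i].get(j)  # None if j is unreachable, i.e. d_g(i, j) = infinity
        if d is not None:
            total += math.exp(-lam * d)
    return total


k4 = {u: [w for w in range(4) if w != u] for u in range(4)}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(social_capital_value(k4))    # 6 * b(1) = 6 * e^-1
print(social_capital_value(path))  # 2 * b(1) + b(2)
```

The two calls reproduce the closed forms above: the complete graph on four nodes reaches the maximum $\frac{4 \cdot 3}{2} b(1) = 6b(1)$, while on the path a-b-c the pair (a, c) is one hop farther apart and its benefit is smaller by a factor of $e^{-\lambda}$.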

The immediate-neighbor benefits effectively measure the bonding capability of the network, and the non-immediate-neighbor benefits measure its bridging capability. Thus, our valuation function captures both bonding and bridging benefits in measuring the social capital of the network. We now describe the desired properties of the valuation function.

Our valuation function satisfies two desired properties [19]: (1) Anonymity and (2) Component Additivity, defined as follows.

Definition 1 (Anonymity of v). Given a permutation π of the set of nodes N and any graph g ∈ Ψ, let the permuted graph be $g^\pi = \{(\pi(i), \pi(j)) \mid (i, j) \in g\}$. Then a valuation function v is said to be anonymous if $v(g^\pi) = v(g)$.

Note that graph g and the permuted graph $g^\pi$ share the same network structure, differing only in node labels. Hence, an anonymous valuation function v is independent of node labels.

Definition 2 (Component Additivity). Let $\Omega(N) = \{C_1, \ldots, C_k\}$ be the set of all connected components of graph g ∈ Ψ. A valuation function v is component additive if it satisfies

$$\sum_{i=1}^{k} v(C_i) = v(g).$$

Now, we show that our valuation function satisfies these two properties.

Lemma 1. The valuation function v(g) satisfies anonymity.

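Proof: Relabeling the nodes does not change shortest-path lengths, so $d_{g^\pi}(\pi(i), \pi(j)) = d_g(i, j)$, and therefore

$$v(g^\pi) = \sum_{\{\pi(i), \pi(j)\}} b\big(d_{g^\pi}(\pi(i), \pi(j))\big) = \sum_{\{i, j\}} b\big(d_g(i, j)\big) = v(g).$$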

Lemma 2. The valuation function v(g) satisfies component additivity.

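Proof: Let $V_l$ be the node set of component $C_l$. We know that

$$v(C_l) = \sum_{\{i,j\} \subseteq V_l} b\big(d_g(i, j)\big).$$

Any pair of nodes lying in two different components has $d_g(i, j) = \infty$ and therefore contributes $b(\infty) = 0$ to v(g). Hence,

$$\sum_{l=1}^{k} v(C_l) = \sum_{l=1}^{k} \sum_{\{i,j\} \subseteq V_l} b\big(d_g(i, j)\big) = \sum_{\{i,j\} \subseteq V} b\big(d_g(i, j)\big) = v(g).$$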

IV. THE ALLOCATION FUNCTION

We now explain our allocation function Y and show that it satisfies the four desired properties of fairness. If these properties are satisfied, then, as shown in [19], our allocation function falls in the same class of allocation functions as the Myerson value allocation [10].

An allocation function is defined as $Y : \Psi \to \mathbb{R}^n$, where $Y = [Y_1, \ldots, Y_n]$ denotes the allocated social capital values of nodes 1 through n.

A. Properties of Allocation Function

The following four properties [19] are the desired properties of an allocation function Y: (1) Anonymity, (2) Component Balance, (3) Weak Link Symmetry and (4) Improvement Property.

The anonymity property ensures that the allocation function is independent of the player labels; the definition follows from Definition 1, with $Y_{\pi(i)}(g^\pi) = Y_i(g)$.

Definition 3 (Component Balance). Let the set of all connected components of graph g be $\Omega(N) = \{C_1, \ldots, C_k\}$. The allocation function Y is component balanced if

$$\sum_{j \in C_i} Y_j(g) = v(C_i), \quad \forall C_i \in \Omega(N).$$

The component balance property ensures that the value within a connected component $C_i$ is allocated completely, and only to the nodes within that component.

Definition 4 (Weak Link Symmetry). An allocation rule Y satisfies weak link symmetry if, for every $e = (i, j) \notin g$, $Y_i(g \cup \{e\}) > Y_i(g)$ implies $Y_j(g \cup \{e\}) > Y_j(g)$.

This is a more general form of the equality criterion specified by Myerson in [10]. We prefer this criterion because the utility gained by adding a new edge to the graph is not necessarily due to equal contributions from both nodes.

Definition 5 (Improvement Property). An allocation rule Y satisfies the improvement property if, for every $e = (i, j) \notin g$ and every $z \in N \setminus \{i, j\}$, $Y_z(g \cup \{e\}) > Y_z(g)$ implies $Y_i(g \cup \{e\}) > Y_i(g)$ or $Y_j(g \cup \{e\}) > Y_j(g)$.

This property implies that when a new edge e = (i, j) is added to the graph g, if the utility of any node other than i and j increases, then the utility of at least one of the nodes i or j must strictly increase.

V. PROPOSED ALLOCATION FUNCTION

Our proposed allocation function is based on the idea that each node contributes a certain value to the network by lying on shortest paths of lengths varying from 1 to the diameter of the graph. In the worst case, the diameter is |V| − 1. We measure the fractional contribution of each node k ∈ V for all possible shortest-path lengths and take a weighted sum of the benefits due to each path length, weighted by this fractional contribution.

We now formally define the Fractional Contribution of each node k.

Definition 6 (Fractional Contribution). The fractional contribution of any node k ∈ V for path length l in a graph g is given by

$$\alpha_k^l(g) = \frac{\beta_k^l}{l+1},$$

where $\beta_k^l$ is the total number of shortest paths of length l in which node k is present. Since we distribute the value generated by each shortest path equally among all nodes on the path, and a path of length l contains l + 1 nodes, we divide by (l + 1).
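As a worked example (ours, for illustration), consider a star with centre c and leaves x, y and z. Each pair of nodes is joined by a unique shortest path: three of length 1 (c-x, c-y, c-z) and three of length 2 (x-c-y, x-c-z, y-c-z). The centre lies on all six paths, and each leaf lies on one path of length 1 and two of length 2, so

```latex
\alpha_c^1 = \frac{\beta_c^1}{2} = \frac{3}{2}, \quad
\alpha_c^2 = \frac{\beta_c^2}{3} = \frac{3}{3} = 1, \qquad
\alpha_x^1 = \frac{\beta_x^1}{2} = \frac{1}{2}, \quad
\alpha_x^2 = \frac{\beta_x^2}{3} = \frac{2}{3},
```

and the leaves y and z have the same values as x by symmetry.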

Definition 7 (Allocation Function). The allocation function is defined as $Y : \Psi \to \mathbb{R}^n$. The k-th component of Y is $Y_k(g)$, the sum of the benefits due to all path lengths, weighted by the corresponding fractional contributions. Formally,

$$Y_i(g) = \sum_{l=1}^{|V|-1} \alpha_i^l(g)\, b(l).$$
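A brute-force sketch of Definitions 6 and 7, assuming an unweighted, undirected graph (so the length of a path is its number of edges), $b(l) = e^{-\lambda l}$, and that the shortest paths of each unordered node pair are counted once. It enumerates every shortest path explicitly, so it is only meant for checking small examples; `all_shortest_paths` and `allocation` are hypothetical names, not the authors' code.

```python
import math
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, s, t):
    """Every shortest s-t path in an unweighted graph, as lists of nodes."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                preds[w] = [u]
                queue.append(w)
            elif dist[w] == dist[u] + 1:
                preds[w].append(u)  # another shortest way into w
    if t not in dist:
        return []

    def walk(v):
        # Rebuild all paths s -> v by following predecessor lists backwards.
        if v == s:
            return [[s]]
        return [p + [v] for u in preds[v] for p in walk(u)]

    return walk(t)


def allocation(adj, lam=1.0):
    """Y_k = sum_l alpha_k^l * b(l), with alpha_k^l = beta_k^l / (l + 1)."""
    beta = {k: {} for k in adj}  # beta[k][l]: shortest paths of length l containing k
    for s, t in combinations(adj, 2):
        for path in all_shortest_paths(adj, s, t):
            l = len(path) - 1
            for k in path:
                beta[k][l] = beta[k].get(l, 0) + 1
    return {
        k: sum(cnt / (l + 1) * math.exp(-lam * l) for l, cnt in beta[k].items())
        for k in adj
    }


star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
Y = allocation(star)
print(Y["c"], Y["x"])  # (3/2) b(1) + b(2), (1/2) b(1) + (2/3) b(2)
```

On a star with centre c and leaves x, y, z it returns $Y_c = \tfrac{3}{2}b(1) + b(2)$ and $Y_x = \tfrac{1}{2}b(1) + \tfrac{2}{3}b(2)$, and the four allocations add up to $3b(1) + 3b(2)$, the value of the star.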

Now we show that our allocation function satisfies the four axioms described in Section IV-A.

Lemma 3. The allocation function Y (g) satisfies anonymity.

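Proof: Relabeling maps each shortest path of g that contains i onto a shortest path of $g^\pi$ of the same length that contains π(i), so $\alpha_{\pi(i)}^l(g^\pi) = \alpha_i^l(g)$ and

$$Y_{\pi(i)}(g^\pi) = \sum_{l=1}^{|V|-1} \alpha_{\pi(i)}^l(g^\pi)\, b(l) = \sum_{l=1}^{|V|-1} \alpha_i^l(g)\, b(l) = Y_i(g).$$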

Lemma 4. The allocation function Y(g) satisfies component balance.

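Proof: Let $V_k$ be the node set of component $C_k$, and let $\gamma_k^l$ be the total number of shortest paths of length l in component $C_k$. The shortest-path length between nodes i and j in $C_k$ is $d_k(i, j)$. Since every shortest path of length l contains exactly l + 1 nodes, $\sum_{j \in V_k} \beta_j^l = (l+1)\,\gamma_k^l$. Then

$$\begin{aligned}
\sum_{j \in V_k} Y_j(C_k) &= \sum_{j \in V_k} \sum_{l=1}^{|V_k|-1} \alpha_j^l(C_k)\, b(l) = \sum_{j \in V_k} \sum_{l=1}^{|V_k|-1} \frac{\beta_j^l}{l+1}\, b(l) \\
&= \sum_{l=1}^{|V_k|-1} \frac{b(l)}{l+1} \sum_{j \in V_k} \beta_j^l = \sum_{l=1}^{|V_k|-1} \frac{b(l)}{l+1}\, \gamma_k^l\,(l+1) \\
&= \sum_{\{i,j\} \subseteq V_k} b\big(d_k(i, j)\big) = v(C_k).
\end{aligned}$$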
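To see the counting step at work, consider the path a-b-c, where each pair of nodes is joined by exactly one shortest path, so the shortest paths of length l correspond one-to-one to the node pairs at distance l. For l = 1 there are $\gamma^1 = 2$ shortest paths (a-b and b-c), and $\beta_a^1 + \beta_b^1 + \beta_c^1 = 1 + 2 + 1 = 4 = 2 \cdot 2$; for l = 2 there is $\gamma^2 = 1$ path (a-b-c), and $\beta_a^2 + \beta_b^2 + \beta_c^2 = 3 = 1 \cdot 3$. The resulting allocations are

```latex
Y_a = Y_c = \tfrac{1}{2}\,b(1) + \tfrac{1}{3}\,b(2), \qquad
Y_b = b(1) + \tfrac{1}{3}\,b(2), \qquad
Y_a + Y_b + Y_c = 2b(1) + b(2) = v(C).
```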

Lemma 5. The allocation function Y(g) satisfies weak link symmetry.

Proof: If adding an edge e = (i, j) gives $Y_i(g \cup e) > Y_i(g)$, then there is a new shortest path P = (s, ..., i, j, ..., t) of length l from some node s to some node t through e. Let $\tilde{\alpha}_i^l$ be the fractional contribution of node i in graph g ∪ e. Then $\tilde{\alpha}_i^l > \alpha_i^l$. Since the shortest path P has to pass through the edge e, node j must also participate in this new shortest path, and correspondingly $\tilde{\alpha}_j^l > \alpha_j^l$. Thus, $Y_j(g \cup e) > Y_j(g)$.

Lemma 6. The allocation function Y(g) satisfies the improvement property.

Proof: If edge e = (i, j) is added, then $Y_z(g \cup e) > Y_z(g)$ implies that node z lies on a new shortest path P = (s, ..., z, ..., i, j, ..., t) of length l created by adding e to graph g. The allocation $Y_z$ of this node can increase if $\tilde{\alpha}_z^l > \alpha_z^l$. As a consequence, $\tilde{\beta}_z^l > \beta_z^l$. This is true for any node that lies on the new shortest path P, and hence, without loss of generality, $\tilde{\beta}_i^l > \beta_i^l$ and therefore $Y_i(g \cup e) > Y_i(g)$.

We now explain our algorithm for implementing this allocation rule.

A. Algorithm

Our algorithm for computing the social capital value of the nodes in a given network is listed in Algorithm 1. The algorithm iterates over all vertices, considering each vertex in turn as the source vertex. In each iteration it finds all the shortest paths from the source vertex to the remaining vertices in the graph using Dijkstra's approach [20]. However, to compute the social capital of a node we also need to know how many times the node appears on different shortest paths, along with the distance and the number of hops of each of these paths. Our algorithm addresses this challenge efficiently. It has two phases: the forward propagation phase (line 3 to line 29) and the backward propagation phase (line 30 to line 37).

The forward propagation phase finds all shortest paths between the current source vertex and all other vertices connected to it within the maximum number of hops specified by the user. Our algorithm uses a Fibonacci-heap priority queue to manage the vertices that are still to be expanded. On expansion, each vertex passes the messages (also called packets) it has received to its adjacent vertices after incrementing their hop count by one. Each vertex accumulates the messages it receives using hops as the key, and maintains a count of the packets received for each hop value. Note that there may be multiple shortest paths between two vertices, which all have the same distance but possibly different numbers of hops. The social capital value of each shortest path is distributed equally among all the vertices on that path, based on its number of hops.

The backward propagation phase sends information back from the terminal nodes of the shortest paths in reverse order of discovery. Each vertex popped from the stack sends messages (packets) back to the immediately preceding vertices from which it received packets. In backward propagation, these messages are accumulated using distance and hops as the key. The distance is the shortest distance from the source vertex to the vertex where the current backward packet was initiated, and the hops are counted from that vertex to the current vertex.

When a vertex is popped from the stack during backward propagation, each forward message can be multiplied with each backward message to compute the social capital value contributed by all the shortest paths passing through this vertex for a given distance and number of hops. This process is illustrated in Algorithm 2. Once every vertex in the graph G has been processed as the source vertex, the social capital value of each vertex has been computed and accumulated.

Algorithm 1: SoCap
Input: G(V, E): social network graph; L: maximum path length
Output: Y = [y1, ..., yn]: social capital of nodes in V
1  for each v in V do
2      Initialize Queue Q, Stack S;
3      v.status ← found; v.dist ← 0; v.fpmsgs ← []; v.bpmsgs ← []; v.activeICEdges ← [];
4      v.fpmsgs.add(hops ← 0, pkts ← 1);
5      Q.add(v);
6      while (!Q.empty()) do
7          u ← Q.poll();
8          u.status ← closed;
9          S.push(u);
10         if u.fpmsgs.minHops() == L then
11             continue;
12         for each w adjacent to u do
13             e = E[u, w]; /* E[u, w] is the edge from u to w */
14             if w.status == closed then
15                 continue;
16             if w.status == found then
17                 if w.dist < u.dist + (1/e.weight) then
18                     continue;
19                 if w.dist > u.dist + (1/e.weight) then
20                     w.fpmsgs ← [];
21                     w.activeICEdges ← [];
22                     w.dist ← u.dist + (1/e.weight);
23                     Q.decreasePriority(w);
24             else
25                 w.dist ← u.dist + (1/e.weight); w.status ← found; Q.add(w);
26             for each msg in u.fpmsgs do
27                 if msg.hops() == L then
28                     continue;
29                 w.fpmsgs.add(hops ← msg.hops() + 1, pkts ← msg.pkts); w.activeICEdges.add(e);
30     while (!S.empty()) do
31         u ← S.pop();
32         u.bpmsgs.add(dist ← u.dist, hops ← 0, pkts ← 1);
33         u.scv ← u.scv + computeSCV(u);
34         for each e(w, u) in u.activeICEdges do
35             for each msg in u.bpmsgs do
36                 w.bpmsgs.add(dist ← msg.dist, hops ← msg.hops + 1, pkts ← msg.pkts);
37     v.status ← not found; v.dist ← ∞; v.fpmsgs ← []; v.bpmsgs ← []; v.activeICEdges ← [];
38 for each v in V do
39     Y.add(v.scv);
40 Return Y;

Algorithm 2: computeSCV
Input: v: vertex for which to compute social capital
Output: currentSCV: social capital of v for the current source vertex
1  currentSCV ← 0;
2  for each fpmsg in v.fpmsgs do
3      for each bpmsg in v.bpmsgs do
4          dist ← bpmsg.dist;
5          totalPkts ← fpmsg.pkts ∗ bpmsg.pkts;
6          totalHops ← fpmsg.hops + bpmsg.hops;
7          currentSCV ← currentSCV + totalPkts ∗ b(dist) / (totalHops + 1);
8  Return currentSCV;
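The forward and backward packet counts have a simple reading in the unweighted case (every edge weight 1, so distance equals hops): the forward packets that reach a vertex v from the source s count the shortest s-v paths, the backward packets that return to v from a terminal t count the shortest v-t paths, and their product, as in line 5 of Algorithm 2, is the number of shortest s-t paths through v. The brute-force sketch below applies that identity directly over all node pairs. It is not the authors' Java implementation: `bfs_counts` and `socap_counts` are hypothetical names, it assumes unit edge weights, it counts each unordered pair once, and `max_len` loosely plays the role of the maximum path length L by skipping pairs that are farther apart.

```python
import math
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Hop distance and number of shortest paths from `source` (unweighted graph)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def socap_counts(adj, lam=1.0, max_len=None):
    """Y_v from sigma(s, v) * sigma(v, t) products, one term per unordered pair {s, t}."""
    info = {u: bfs_counts(adj, u) for u in adj}
    scv = {v: 0.0 for v in adj}
    for s, t in combinations(adj, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # different components: b(infinity) = 0
        d = dist_s[t]
        if max_len is not None and d > max_len:
            continue
        share = math.exp(-lam * d) / (d + 1)  # b(d) split over the d + 1 nodes of a path
        for v in adj:
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path iff d(s, v) + d(v, t) == d(s, t).
            if v in dist_s and t in dist_v and dist_s[v] + dist_v[t] == d:
                scv[v] += sigma_s[v] * sigma_v[t] * share
    return scv


star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
print(socap_counts(star))
```

Because it loops over all pairs and all candidate intermediate vertices, this sketch takes O(|V|³) time after the BFS passes and is only suitable for checking small graphs; the per-source message passing described above is what avoids this enumeration.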

VI. EXPERIMENTS

In this section, we describe the data sets, baselines and evaluation measures used in our analysis. We then present the effectiveness and efficiency results of our proposed algorithm compared to various baselines. All experiments were run on a Windows 2008 Server with two Intel Xeon 2.67 GHz CPUs and 8 GB of RAM. The code was implemented in Java 1.6.

A. Data Sets

DBLP data set: The downloaded DBLP data set [21] has 1,033,321 distinct authors and 1,632,443 publications. We constructed the DBLP co-authorship network, consisting of 1,033,321 authors and 3,489,607 edges. Of these authors, 58,277 have no neighbors. The average degree of a node is 3.38, and the number of weakly connected components, including the zero-degree nodes, is 104,299.

Patent data set: The downloaded US patents data set² consists of all the patents granted by the US Patent Office (USPTO) from 1977 to 1999. The patent co-authorship network has 1,357,542 nodes and 2,509,120 undirected edges. The average degree of a node in this data set is 1.85, roughly half (1.8 times lower than) that of the DBLP data set. There are 291,660 nodes with zero degree and a total of 417,933 weakly connected components, including the zero-degree nodes. The patent network is therefore sparser than the DBLP network and, as the number of weakly connected components shows, much more disconnected. This is primarily due to the limited collaboration among inventors in the patent network compared to the more open environment for collaboration in academic publishing.

B. Baselines

We use three baselines to compare against our algorithm. These algorithms are commonly used for comparison in influence maximization problems [2], [3]. They were chosen for their variety: PMIA is a recent, influence-cascade-based algorithm, PageRank is a popular variant of the eigenvector centrality measure, and WeightedDegree is based on the degree of the node. Brief descriptions of the baselines follow.

² http://www.google.com/googlebooks/uspto.html

2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining


• PMIA: The Prefix-excluding Maximum Influence Arborescence (PMIA) model [3]. This algorithm takes a network structure with predefined edge probabilities; we use the weighted cascade model proposed in [2] to compute the edge probabilities.

• Weighted Degree: This baseline (WeightedDegree) picks the top-k nodes with the largest total out-degree weight.

• PageRank: We use the power method to compute the PageRank values, with the restart probability set to 0.15. Iteration stops when the L1 norm of the difference between two successive iterates falls below 10^−7.
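The two centrality baselines and the edge probabilities fed to PMIA can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments. The restart probability of 0.15 and the L1 tolerance of 10^−7 come from the text. Three choices are assumptions: the random walk is weighted by edge weight, rank held by isolated nodes is spread uniformly, and iteration is capped at max_iter. For the weighted cascade model, Kempe et al. [2] assign edge (u, v) the probability 1/d_v, where d_v is the degree of v. The sketch treats each undirected co-authorship edge as two directed edges and ignores collaboration counts, which is also an assumption. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (u, v, weight); edges are undirected


def weighted_degree_top_k(edges: List[Edge], k: int) -> List[str]:
    """Top-k nodes by total weight of incident edges (ties broken by name)."""
    strength: Dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        strength[u] += w
        strength[v] += w
    return sorted(strength, key=lambda n: (-strength[n], n))[:k]


def pagerank(nodes: List[str], edges: List[Edge], restart: float = 0.15,
             tol: float = 1e-7, max_iter: int = 1000) -> Dict[str, float]:
    """Power-method PageRank with a weighted random walk.

    Iteration stops once the L1 norm of the difference between two
    successive rank vectors falls below tol.
    """
    nbrs: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    strength: Dict[str, float] = defaultdict(float)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
        strength[u] += w
        strength[v] += w
    n = len(nodes)
    rank = {x: 1.0 / n for x in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[x] for x in nodes if strength[x] == 0)
        new = {x: (restart + (1 - restart) * dangling) / n for x in nodes}
        for x in nodes:
            for y, w in nbrs[x]:
                new[y] += (1 - restart) * rank[x] * w / strength[x]
        delta = sum(abs(new[x] - rank[x]) for x in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def weighted_cascade_probs(edges: List[Edge]) -> Dict[Tuple[str, str], float]:
    """Weighted cascade model of [2]: p(u, v) = 1 / degree(v), in each direction."""
    deg: Dict[str, int] = defaultdict(int)
    for u, v, _ in edges:
        deg[u] += 1
        deg[v] += 1
    probs: Dict[Tuple[str, str], float] = {}
    for u, v, _ in edges:
        probs[(u, v)] = 1.0 / deg[v]
        probs[(v, u)] = 1.0 / deg[u]
    return probs


edges = [("hub", "x", 2.0), ("hub", "y", 1.0)]
print(weighted_degree_top_k(edges, 2))
print(pagerank(["hub", "x", "y", "z"], edges))
```

Each iteration preserves the total rank mass of 1, so the L1 stopping rule compares two probability vectors.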

We refer to our algorithm as SoCap and use the model parameters λ = 1 and K = 3 throughout the experiments. We use the number of collaborations between co-authors as the edge weights in both the DBLP and USPTO networks.

C. Evaluation Measures

We evaluate each method by comparing its top-k influencers against the top-k authors ranked by total citation count. For this purpose, we obtained the citation counts for authors in DBLP and USPTO from [21] and [22], respectively. We use standard information retrieval measures, namely precision, recall and F1-measure, to compare each method's top-k list against the top-k most cited authors. As in a typical web-search evaluation, the top-k most cited authors act as the relevant set of results, while the top-k list from each method acts as the retrieved list. Given the relevant and retrieved lists, we compute precision, recall and F1-measure as follows:

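$$\mathrm{precision} = \frac{|\mathit{relevant} \cap \mathit{retrieved}|}{|\mathit{retrieved}|}, \tag{1}$$

$$\mathrm{recall} = \frac{|\mathit{relevant} \cap \mathit{retrieved}|}{|\mathit{relevant}|}, \tag{2}$$

$$F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}. \tag{3}$$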

Precision measures the fraction of retrieved authors that are relevant (i.e., most cited), while recall measures the fraction of relevant authors that are successfully retrieved. The F1-measure, the harmonic mean of the two, weighs precision and recall equally to give a single accuracy score.
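Equations (1)-(3) translate directly into set operations. The Python sketch below is a minimal illustration. It adds one convention that the text does not state: any measure whose denominator is zero is reported as 0. The function name prf is hypothetical.

```python
from typing import Iterable, Tuple


def prf(relevant: Iterable[str], retrieved: Iterable[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1-measure of a retrieved list, per Eqs. (1)-(3)."""
    rel, ret = set(relevant), set(retrieved)
    hits = len(rel & ret)  # |relevant ∩ retrieved|
    precision = hits / len(ret) if ret else 0.0
    recall = hits / len(rel) if rel else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


# Four most cited authors vs. a method's top-5 list: 2 hits.
print(prf(["a", "b", "c", "d"], ["a", "x", "c", "y", "z"]))
```

In the example, two of the five retrieved authors are relevant, so precision is 2/5, recall is 2/4 and F1 is 2 · 0.4 · 0.5 / 0.9 = 4/9.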

D. SoCap - A Case Study

In this section, we present a case study comparing the top-k influencers found by our algorithm with those found by the other methods. Tables I and II list the 10 most cited authors in the DBLP and USPTO networks, respectively, together with the rank each method assigns them within its top-1000 list. A '-' denotes an influencer missing from a method's top-1000 list. Each method ranks influencers in ascending order, so the first influencer has rank 1 and the last has rank 1000.

Our method performs extremely well in the DBLP network, where strong academic collaboration produces nodes with high bonding and bridging value. As shown in Table I, our method found 9 of the 10 influencers, and whenever a baseline also found an influencer, our method placed that influencer in a better (lower-numbered) position.

TABLE I. THE 10 MOST CITED AUTHORS IN DBLP WITH THEIR CORRESPONDING RANKS AS OBTAINED BY VARIOUS METHODS. A '-' DENOTES THAT THE AUTHOR WAS NOT FOUND IN THE TOP-1000 LIST BY THE METHOD.

| DBLP Influencer | SoCap | PageRank | Weighted Degree | PMIA |
|---|---|---|---|---|
| Jeffrey D. Ullman | 753 | - | - | - |
| Rakesh Agrawal | 492 | - | - | - |
| Hector Garcia-Molina | 218 | 399 | 541 | 852 |
| David S. Johnson | - | - | - | - |
| Jiawei Han | 44 | 158 | 219 | 447 |
| Scott Shenker | 445 | - | - | - |
| Christos Faloutsos | 92 | 225 | 328 | 719 |
| David E. Culler | 450 | - | - | - |
| David J. DeWitt | 416 | - | - | - |
| Hari Balakrishnan | 866 | - | - | - |

TABLE II. THE 10 MOST CITED AUTHORS IN USPTO WITH THEIR CORRESPONDING RANKS AS OBTAINED BY VARIOUS METHODS. A '-' DENOTES THAT THE AUTHOR WAS NOT FOUND IN THE TOP-1000 LIST BY THE METHOD.

| USPTO Influencer | SoCap | PageRank | Weighted Degree | PMIA |
|---|---|---|---|---|
| Felix Theeuwes | - | - | - | - |
| Roshantha A.S. Chandra | - | - | - | - |
| Shunpei Yamazaki | 107 | 400 | 546 | - |
| Donald E. Weder | - | - | 183 | 139 |
| Kary B. Mullis | - | - | - | - |
| Yasushi Sato | 160 | 206 | 258 | 429 |
| George Spector | 1 | 2 | 1 | 1 |
| Jerome H. Lemelson | 493 | - | 776 | 542 |
| Charles W. Eichelberger | - | - | - | - |
| Terry M. Haber | 621 | - | - | 709 |

In USPTO (Table II), the influencer George Spector appears at the top of every list (rank 1 for SoCap, Weighted Degree and PMIA, rank 2 for PageRank). However, every method missed at least half of the top-10 influencers in the USPTO network (five for SoCap, Weighted Degree and PMIA, seven for PageRank), as the patent collaboration network is highly disconnected. Overall, in both data sets, whenever at least one baseline discovered an influencer, our approach ranked that influencer at least as high, with one exception: Donald E. Weder, whom Weighted Degree and PMIA rank 183 and 139 but SoCap misses.

E. Effectiveness Analysis

We study the effectiveness of our proposed approach by comparing the top-k influencer list obtained from each method against the top-k cited authors list, which acts as the ground truth. The comparison follows a typical information retrieval setting: the ground-truth top-k cited authors form the relevant list, and the top-k list from each method forms the retrieved list. We then measure precision, recall and F1-measure using (1), (2) and (3), respectively. The results are shown in Figure 1.

Figure 1(a) shows the precision as the number of authors retrieved by each method grows to 500, measured against the 500 most cited authors (i.e., the relevant list). The proposed approach SoCap reaches a precision of up to 16%, while the best baseline method, PageRank, reaches only 8%. We also computed the average precision of each method over the top-500 range, shown in brackets. This average summarizes each method's overall performance across the entire top-500 list. SoCap's average precision is about 4.5 percentage points higher than that of the best baseline (0.105 vs. 0.0596). Our method also performs best in terms of recall and F1-measure, as shown in Figure 1(b) and (d), exceeding the best baseline by up to 8%. The same ordering appears in the precision-recall curve of Figure 1(c), where our method stays consistently above all the other baselines.
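The bracketed averages are means of the per-k scores over the top-500 range, with the 500 most cited authors as a fixed relevant list. The Python sketch below computes precision@k, recall@k and F1@k curves for one ranked list and averages each curve. The text does not say which k values were averaged (every k, or steps of 50 as the plot ticks might suggest), so ks is left to the caller. The function names are hypothetical. Pairing each recall value with the precision value at the same k gives the points of a precision-recall curve like Figure 1(c) and (g).

```python
from typing import Dict, List, Sequence


def topk_curves(ranked: Sequence[str], relevant: Sequence[str],
                ks: Sequence[int]) -> Dict[str, List[float]]:
    """Precision@k, recall@k and F1@k of a ranked list against a fixed relevant set."""
    rel = set(relevant)
    curves: Dict[str, List[float]] = {"precision": [], "recall": [], "f1": []}
    for k in ks:
        hits = len(rel.intersection(ranked[:k]))  # relevant authors in the top k
        p = hits / k
        r = hits / len(rel)
        f = 2 * p * r / (p + r) if p + r > 0 else 0.0
        curves["precision"].append(p)
        curves["recall"].append(r)
        curves["f1"].append(f)
    return curves


def curve_averages(curves: Dict[str, List[float]]) -> Dict[str, float]:
    """Mean of each curve over the chosen k values."""
    return {name: sum(vals) / len(vals) for name, vals in curves.items()}


# Usage sketch: relevant = 500 most cited authors, ranked = a method's top-500 list.
# curves = topk_curves(ranked, relevant, ks=range(1, 501))
# pr_points = list(zip(curves["recall"], curves["precision"]))
```

Because the relevant list is fixed while k grows, recall can only increase with k, whereas precision may fall. This explains why the averaged precision and recall values differ even though both are computed from the same hits.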


[Figure 1 plots not reproduced. Panels for DBLP: (a) precision vs. top-k influencers, (b) recall vs. top-k influencers, (c) precision-recall curve, (d) F1-measure vs. top-k influencers. Panels (e)-(h) show the same four plots for USPTO. The top-k axis runs up to k = 500. Averages over the top-k list, as given in the figure legends:]

| Average over top-k | SoCap | PMIA | PageRank | Weighted Degree |
|---|---|---|---|---|
| DBLP precision (a) | 0.105 | 0.0367 | 0.0596 | 0.0471 |
| DBLP recall (b) | 0.0631 | 0.0195 | 0.0343 | 0.0264 |
| DBLP F1-measure (d) | 0.0801 | 0.0255 | 0.0459 | 0.0344 |
| USPTO precision (e) | 0.3858 | 0.3673 | 0.3724 | 0.3768 |
| USPTO recall (f) | 0.1388 | 0.1356 | 0.1356 | 0.1363 |
| USPTO F1-measure (h) | 0.1808 | 0.1758 | 0.1768 | 0.1773 |

Fig. 1. The effectiveness results for the DBLP and USPTO data sets, for the top-k influencers compared against the top-k most cited authors. The average value of precision, recall and F1-measure computed over the top-k list for each method is shown in brackets. In DBLP, our approach performs up to 8% better than the best baseline in terms of precision, recall and F1-measure. All methods perform closely in the USPTO data set, as the network is extremely disconnected and sparse; on average, our approach performs better than all baseline methods in this network.


TABLE III. RUN TIMES (IN SECONDS) OF THE BASELINES AND THE PROPOSED APPROACH.

| Method | DBLP | USPTO |
|---|---|---|
| PMIA | 4233.76 | 3356.04 |
| PageRank (PR) | 1001.43 | 856.67 |
| Weighted Degree (WD) | 352.21 | 239.75 |
| SoCap | 3394.64 | 1395.91 |

We performed the same set of experiments on the USPTO data set. As noted earlier, the patent data set is extremely disconnected. This is evident from the number of weakly connected components in the patent data set, 417,933, which is about 4 times the number in the DBLP network. The reason is that collaborations among inventors are more closed than in academic publishing, and inventions rarely span multiple organizations or groups. As there is no clear winner among the baselines, we use the average performance, shown in brackets, to compare them against our method. Our method performs better than all the baseline methods in terms of average precision, recall and F1-measure, as shown in Figure 1(e)-(h).

F. Efficiency Results

We compare the run times of the baseline algorithms against our proposed approach in Table III. The WeightedDegree and PageRank algorithms are fast, but they are not as effective as SoCap (see Figure 1). On the sparser USPTO network, our approach takes about 1.6 times as long as PageRank (1395.91 s vs. 856.67 s), while giving, on average, better precision, recall and F1-measure. On the comparatively denser DBLP network, our approach improves precision, recall and F1-measure by up to 8%, but it takes about 3.4 times as long as PageRank (3394.64 s vs. 1001.43 s). The cascade-based PMIA baseline is the most compute-intensive method and is slower than SoCap on both data sets: about 25% slower on DBLP and 2.4 times slower on USPTO.

VII. CONCLUSIONS AND FUTURE WORK

The problem of finding influencers in networks is important in many domains, from marketing to public health. We have proposed a new approach to finding influencers in networks using their social capital value. We formulated this problem as a value-allocation model, in which the allocated value denotes the individual social capital. Our allocation function satisfies several desirable axioms of fairness and falls in the same class as Myerson's value allocation function. We implemented this allocation model in the SoCap algorithm and empirically demonstrated its effectiveness on two real-life data sets. On sparse networks, our method is computationally efficient and performs better on average; on relatively denser networks, it improves precision, recall and F1-measure by up to 8% compared to the best baseline method.

Our model can be further extended to include temporal dynamics for evolving networks by capturing the decaying strength of ties between nodes as time progresses. In addition, with the tremendous increase in data volume, graph sizes are quickly reaching billions of nodes, which makes sequential processing nearly impossible. Our method can be easily extended to large-scale graph processing frameworks such as Apache Giraph or GraphLab, since vertex-centric message passing is natural in these frameworks.

ACKNOWLEDGEMENTS

This research was sponsored by the Defense Advanced Research Projects Agency (DARPA) under agreement number W911NF-12-C-0028. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. We thank the anonymous reviewers for their valuable suggestions and comments.

REFERENCES

[1] K. Subbian and P. Melville, "Supervised rank aggregation for predicting influencers in Twitter," in SocialCom, 2011, pp. 661–665.

[2] D. Kempe, J. M. Kleinberg, and É. Tardos, "Maximizing the spread of influence through a social network," in KDD, 2003, pp. 137–146.

[3] W. Chen, C. Wang, and Y. Wang, "Scalable influence maximization for prevalent viral marketing in large-scale social networks," in KDD, 2010, pp. 1029–1038.

[4] J. Leskovec, A. Krause, C. Guestrin, C. Faloutsos, J. VanBriesen, and N. S. Glance, "Cost-effective outbreak detection in networks," in KDD, 2007, pp. 420–429.

[5] W. Chen, Y. Wang, and S. Yang, "Efficient influence maximization in social networks," in KDD, 2009, pp. 199–208.

[6] R. S. Burt, Brokerage and Closure: An Introduction to Social Capital. Oxford University Press, 2005.

[7] P. Dekker and E. M. Uslaner, Social Capital and Participation in Everyday Life. Routledge, 2001.

[8] D. Knoke, Organizational Networks and Corporate Social Capital. Kluwer, 1999.

[9] M. O. Jackson and A. Wolinsky, "A strategic model of social and economic networks," in Journal of Economic Theory, vol. 71, no. 1, 1996, pp. 44–74.

[10] R. B. Myerson, "Graphs and cooperation in games," in Mathematics of Operations Research, vol. 2, no. 3, 1977, pp. 225–229.

[11] M. Kimura and K. Saito, "Tractable models for information diffusion in social networks," in PKDD, 2006, pp. 259–271.

[12] S. Goyal and F. Vega-Redondo, "Structural holes in social networks," in Journal of Economic Theory, vol. 137, no. 1, 2007, pp. 460–492.

[13] J. Kleinberg, S. Suri, É. Tardos, and T. Wexler, "Strategic network formation with structural holes," in Elec. Comm., 2008, pp. 132–141.

[14] R. Narayanam and Y. Narahari, "Topologies of strategically formed social networks based on a generic value function-allocation rule model," in Social Networks, vol. 33, 2011, pp. 56–69.

[15] M. Smith, C. Giraud-Carrier, and B. Judkins, "Implicit affinity networks," in Annual Workshop on Information Technologies and Systems, 2007, pp. 8–13.

[16] M. Smith, "Social capital in online communities," in Workshop on PhD Students in Information and Knowledge Management, 2008, pp. 17–24.

[17] M. Smith, C. Giraud-Carrier, and N. Purser, "Implicit affinity networks and social capital," in Information Technology and Management, 2009, pp. 123–134.

[18] C. Johnson and R. P. Gilles, "Spatial social networks," in Review of Economic Design, vol. 5, no. 3, 2000, pp. 273–299.

[19] B. Dutta, A. van den Nouweland, and S. Tijs, "Link formation in cooperative situations," in International Journal of Game Theory, vol. 27, no. 2, 1998, pp. 245–256.

[20] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2001.

[21] Y. Li, B. Liu, and S. Sarawagi, Eds., Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24–27, 2008. ACM, 2008.

[22] http://www.google.com/googlebooks/uspto.html.
