426 IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, VOL. 2, NO. 6, DECEMBER 2018

Accurate Content Push for Content-Centric Social Networks: A Big Data Support Online Learning Approach

Yinan Feng, Student Member, IEEE, Pan Zhou, Member, IEEE, Dapeng Wu, Fellow, IEEE, and Yuchong Hu, Member, IEEE

Abstract—With the rapid growth of social networks, information overload has become a critical issue. Service providers push a great deal of redundant content and advertisements to users every day, so users' interest in reading them, and the probability that they do, have dropped considerably, and network load is wasted. To address this issue, accurate content push is needed, where the main challenges are providing precise descriptions of users and supporting the big data nature of users and contents. Content-centric networking (CCN) has emerged as a new network architecture to meet today's requirements for content access and delivery. By using named content, CCN makes it possible to track users' real-time interests, which motivates us to study a novel accurate content push (also called content recommendation) system. In this paper, we model this issue as a novel contextual multi-armed bandit based Monte-Carlo tree search problem and propose a big data support online learning algorithm to meet the demand of content push with low cost. To avoid destroying CCN's energy-efficient feature, energy consumption is incorporated into our model. We then theoretically prove that our online learning algorithm achieves a sublinear regret bound and sublinear storage, which is very efficient in the big data context and does not increase the network burden. Experiments on an offline-collected dataset show that our approach significantly improves accuracy and convergence speed over other state-of-the-art bandit algorithms and can overcome the cold start problem as well.

Index Terms—Online learning, content-centric networking, big data, social network, recommender system, contextual bandit, Monte-Carlo tree search.

I. INTRODUCTION

The rapid development of social networks such as Facebook, Twitter, Google+, and YouTube (which can be considered a social platform as well) in the last few years has transformed the scale and nature of human social interaction. It gives end users easy access to information and, with the fast viral spread of information, information overload becomes a great

Manuscript received July 25, 2017; revised November 20, 2017 and December 20, 2017; accepted January 11, 2018. Date of publication March 1, 2018; date of current version November 21, 2018. This work was supported by the National Science Foundation of China under Grant 61401169. (Corresponding author: Pan Zhou.)

Y. Feng and P. Zhou are with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]; [email protected]).

D. Wu is with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32611 USA (e-mail: [email protected]).

Y. Hu is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TETCI.2018.2804335

issue. Service providers recommend all kinds of content (including user-generated content) and push similar advertisements to millions of users, which exceeds users' capacity and interest for browsing. Thus, most delivered contents are useless, network load is wasted, and providers' profits plummet, since advertising is a major source of income for social network service providers. Motivated by this, a novel accurate push method should be proposed. Accurate push can be seen as a multiclass classification problem, where the goal is to sequentially learn a mapping from the context space to the content space [7]; this requires that service providers follow users' real-time context and make personalized recommendations. The two main obstacles are: 1) for high-accuracy push, a precise description of users is necessary; 2) both users and contents are big data, meaning that both the volume and the diversity of each are enormous and grow rapidly.

The content-centric networking (CCN) paradigm proposed by Jacobson et al. [1] emerges as a new network paradigm for content push, reflecting the fact that people care about what content the Internet contains rather than where they communicate. CCN uses named content as the packet "address" instead of a host identifier. This new approach greatly reduces network load and is more energy efficient than IP-based networks in the current network environment [1], [2]. Moreover, the concept of CCN perfectly fits online social network behavior, which tends toward one-to-many or many-to-many diffusion and retrieval of content [3]. Meanwhile, by routing content directly based on users' interests, CCN allows service providers to build a more accurate user portrait. This property is exploited in our accurate content push algorithm to obtain users' real-time context.

When it comes to big data, most previous works on social media recommendation [4], [5] cannot perform well. Since they traverse and compute over every individual data item, they can promise high-accuracy results when the dataset is relatively small, but as the scale of content increases, the high computational cost makes these algorithms impractical. To handle relatively large datasets, Song et al. [6] organize all contents into a tree structure where exploring the tree can achieve a more accurate recommendation. However, to guarantee the properties of each node in the tree, they separate the tree-building process from recommendation, and this computationally inefficient process makes it difficult to perform well. Moreover, these traditional methods must know the

2471-285X © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


amount of data in advance and cannot support instantly growing large datasets, which does not suit the big data environment.

In this paper, we propose a novel contextual multi-armed bandit [7] (MAB) based Monte-Carlo tree search (MCTS) algorithm to solve the accurate push issue in the big data context. Due to the continuous similarity among users' features in practice, we model user context as a measurable user space satisfying a Lipschitz condition and divide it into fine-grained cluster units that represent different types of users. The content space, containing all kinds of heterogeneous content spreading in social networks such as news, user-generated content, and advertisements, obeys a Lipschitz condition w.r.t. the maximum. Many existing works focus on content feature learning and multiview feature embedding, e.g., [8], [9]. For this multiclass case, we use a bandit algorithm, which guarantees finite-time optimality, to learn the relation between users and content. For efficiency and accuracy, the content space is organized into a Monte-Carlo tree structure. MCTS is a method for finding optimal decisions in a given domain: it takes random samples in the decision space and builds a search tree according to the results [10]. In this particular application, the set of all contents in social networks is the decision space. Each content is mapped to an arm in the MAB problem, and each node in the tree is a content cluster. By defining and restricting the dissimilarity within a cluster, our approach is agnostic to the exact number of contents and supports big data and scalability, so it can work on a growing dataset.
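The cluster-level bandit step described above can be sketched as follows. This is a minimal, illustrative UCB1-style selection over content clusters, not the paper's exact MCTS selection rule; the cluster statistics and the exploration constant are our assumptions.

```python
import math

def select_cluster(clusters, t):
    """Pick the content cluster (arm) maximizing a UCB1-style score.

    clusters: list of dicts with 'pulls' (times pushed) and 'reward_sum'.
    t: current time step (1-indexed).
    Unvisited clusters are tried first (optimism under uncertainty).
    """
    best, best_score = None, -math.inf
    for k in clusters:
        if k["pulls"] == 0:
            return k  # explore every cluster at least once
        mean = k["reward_sum"] / k["pulls"]          # exploitation term
        bonus = math.sqrt(2.0 * math.log(t) / k["pulls"])  # exploration term
        if mean + bonus > best_score:
            best, best_score = k, mean + bonus
    return best

def update(cluster, reward):
    """Record user feedback (reward in [0, 1]) for the pushed cluster."""
    cluster["pulls"] += 1
    cluster["reward_sum"] += reward
```

Selecting the cluster with the best mean-plus-bonus score is what balances exploiting well-performing clusters against exploring rarely pushed ones.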

Owing to the lack of prior interaction history for diverse users and newly uploaded contents, cold start is a major problem in recommender systems. To address this issue while ensuring computational efficiency, all of our operations work at the cluster level. Our algorithm clusters users according to their contexts and contents according to their features. The user cluster unit is used to store history information, and the content cluster is selected by the algorithm, so that past information gained from users with similar contexts and from contents with similar features can be used to solve the cold start problem. Moreover, as an online learning method, our algorithm explores continuously: it alternates between exploration and exploitation and keeps learning when new users come. This property addresses the cold start problem since it helps the algorithm learn users' attitudes toward new contents and the preferences of new users.

We build as many content trees as there are user types, matched one-to-one. Tree-building and exploration proceed along with the learning steps: the root node contains all contents, and only when a parent node's feature has been learned to sufficient precision are two child nodes (subsets of the parent cluster) expanded. Push results become more accurate as the tree grows. Benefiting from this top-down design, our algorithm has important advantages in computational efficiency, overcoming cold start, and abstracting away the specific number of contents, which gives it good adaptability to the CCN-based social network scenario.

We use the concept of regret to measure the performance of the proposed algorithm. Regret is the accumulated loss incurred because the globally optimal policy is not followed at all times [11]; in other words, it is the gap between delivering the optimal content to the corresponding user and the push strategy followed by our algorithm. We theoretically prove that the proposed algorithm achieves sublinear regret, so it is an asymptotically optimal strategy. Meanwhile, we prove that the space complexity is sublinear and the time complexity up to time T is O(T log T), which adds only a small additional computation and storage burden to the server in CCN. In short, the proposed approach makes efficient and accurate content push in the big data environment at low cost, solves the cold start problem, and supports instantly growing datasets. We also evaluate our algorithm in a simulated offline environment with a large-scale real-world dataset.

The remainder of the paper is structured as follows. In Section II, we describe related work and compare our approach to existing works. In Section III, we formalize the contextual accurate push problem and describe the details of the system model, definitions, and notation. In Section IV, we present our algorithm and then analyze its regret bound and complexity. The experiment setup and simulation results are shown in Section V to verify the theoretical regret bound. Section VI concludes the paper.

II. RELATED WORKS

Since Jacobson et al. [1] proposed it, the CCN paradigm has attracted much attention. In CCN, an end user sends an Interest message, identified by the content name, for the content he/she requires; a Data message identified by the same content name is returned in response. The CCN paradigm is a simple network layer, like the renowned IP, and CCN's native added-value functions, such as caching, multicast, and multipath, greatly reduce network load, improve energy efficiency, and apply widely in most scenarios [1]–[3]. Many existing works concentrate on CCN's cache strategy [12]–[14], cache freshness mechanism [15], content retrieval [16], and security [17] to refine the infrastructure of CCN. Different from them, we are more interested in extending the application of CCN in a specific scenario, i.e., social networks.

Social networks nowadays have entirely changed people's lifestyles and attract tremendous research attention. There are two main research areas related to our work. The closest one is social recommender systems [4], [5], which focus on leveraging users' social information (e.g., gender, age, friendships, history of purchased items) to recommend social media content or products from a relatively small range. Different from them, our approach is specifically proposed for big data applications in this special scenario: with the development of CCN and social networks, content gradually becomes massive, complex, and heterogeneous. Zhang et al. [18] propose a large-scale social recommender system. However, as they demonstrate in experiments, the time cost increases linearly with the scale of the dataset. Although this is acceptable in their setting, it still cannot handle the big data problem. Thus, existing methods cannot satisfy the requirements of this new situation.

Another related area is influence maximization (IM), which selects a set of key users who will influence their neighbours; the goal is to maximize the spread of impact.


Fig. 1. Workflow of content push over the CCN-based social network.

Kempe et al. prove that IM in a social network is an NP-hard problem for which only approximate solutions can be obtained. Gomez-Rodriguez et al. [20] propose an approximation algorithm with near-optimal performance in a continuous-time network. In [21], Chen et al. extend the goal to maximizing the influence within a deadline and propose two heuristic algorithms to overcome the inefficiency of the approximation algorithm. Moreover, in [22], Vaswani et al. introduce MAB to the IM problem. It is the IM problem that brings us to the accurate content push problem, which is another key aspect of maximizing influence.

An online learning method, more precisely a contextual MAB algorithm, is used to learn the mapping from users to content. MAB is one of the best online learning frameworks thanks to its finite-time optimality guarantee and its balance between exploring contents whose performance is relatively uncertain and exploiting contents with the highest estimated performance [7], [11]. Previous contextual bandit algorithms such as [23]–[25] are widely used in recommender systems, and some are proposed for large-scale problems [6]. They introduce a context space into the traditional bandit problem, mapping it to a single arm space. To overcome the limitation that existing MAB algorithms handle only discrete and finite arms, the works [26], [27] consider a bounded surface satisfying a Lipschitz condition in which each point on the surface is an arm; their methods can thus theoretically handle infinitely many arms with a Monte-Carlo tree structure. However, their algorithms do not consider the multiclass situation and hence cannot support personalized content push. From a different standpoint, Yue et al. [29] propose the linear submodular bandit problem, which aims to minimize redundancy in personalized recommendation under the contextual bandit framework, and Yu et al. [30] introduce a budget constraint to the problem. Their works and the proposed system concentrate on different important aspects of the recommendation problem and can complement each other.

III. PROBLEM FORMULATION

A. System Model

We illustrate the content push workflow over the CCN-based social network in Fig. 1. The user set is denoted by U and is partitioned into different user types. It expands over time, and users arrive in time slots t = 1, 2, 3, ..., T. Our system pushes content when a user u_t ∈ U logs in or refreshes the homepage of the social network and sends an Interest message to the service provider at time t. To accelerate the learning progress, we also treat the Interest message requesting a specific content as a special kind of push. These two types of Interest message are named Recommendation Request Message (RRM) and Content Request Message (CRM), respectively. Note that users of the same type can log in from various devices; this does not affect the push results. The content set is denoted by C. Several content providers upload content to the network, so C also expands over time.

When the system receives the Interest message, it reads the user's context (e.g., gender, location, education, recent interests, and relationships) according to the message name, like /Twitter/ID @Feng, and determines the user type $x_{a_t}$ from his/her context. In social networks, the two most notable relations are friendship and subscriptions (or follows). However, friendship may relate a user to a large variety of people with no common interests, such as business partners or colleagues, so it cannot reflect one's interests exactly. Therefore, in the context, we only consider users' recent subscriptions to facilitate recommendation. Next, the system identifies the message type. For a CRM, it replies with the requested content directly. For an RRM, the system explores the corresponding content tree, a personalized data structure built from previous push results for the same user type and their feedback, to search for the node with the highest estimated reward for $x_{a_t}$, and then selects a content $c_t$ belonging to that node. Once the user receives the reply message containing the content's name, he/she can get the content from the nearest cache and replies with feedback (a reward), denoted $r_t$, reflecting his/her satisfaction. This feedback is used to improve the performance of future content push. Since the system only transmits the content's name rather than the content itself, it does not add network load, and the energy-efficient nature of CCN is preserved.

For example, at time t, user 1, belonging to type 1, logs in to the social network and sends an Interest message. The message is transmitted through network nodes 1 → 2 → 3 and received by the service provider. The provider recognizes it as an RRM sent by a type 1 user. It then runs the push algorithm, selects a content, and replies with the content name as a Data message sent back to the user. The user can fetch the cached content by its name from the nearest CCN node 2, if he/she is interested in it. Finally, the user replies with his/her reward. For simplicity, we draw the feedback transmitted along the same path, but it may take a completely different path in practice.
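The provider-side handling of the two Interest message types can be sketched as follows. The message field names (`type`, `content_name`, `user_type`) are hypothetical placeholders, since the paper does not fix a wire format.

```python
def handle_interest(msg, push_algorithm):
    """Reply to an Interest message with a Data message carrying a content name.

    CRM: the user asked for a specific content, so echo its name back.
    RRM: run the push algorithm for the user's type and return only the
    selected content's name; the user fetches the content itself from the
    nearest CCN cache, so no content body crosses this link.
    """
    if msg["type"] == "CRM":
        return {"data": msg["content_name"]}
    if msg["type"] == "RRM":
        return {"data": push_algorithm(msg["user_type"])}
    raise ValueError("unknown Interest message type: %r" % msg["type"])
```

Returning only a name in both branches is what keeps the push mechanism from adding network load.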

B. User Space Model

The user set U = {u_1, u_2, ...} is modeled as a measurable $d_U$-dimensional space that characterizes different users by their contexts. Each user u ∈ U is described by a $d_U$-dimensional vector, denoted by the same symbol u with a slight abuse of notation. For simplicity of exposition and without loss of generality, we assume $\mathcal{U} = [0,1]^{d_U}$ is a unit hypercube. Taking gender on Facebook as an example, 0 may represent Gender Nonconforming, 1 may represent Pangender, and other values between 0 and 1 can represent the other 54 gender types according to their similarity (e.g.,


Cis Female is very close to Cis Woman, but far away from Cis Male). The dissimilarity between two users $u_i, u_j \in \mathcal{U}$ is defined as $\|u_i - u_j\| = \sqrt{\sum_{d=1}^{d_U} |u_i^d - u_j^d|^2}$, where $u_i^d$ is the d-th dimension of $u_i$. As mentioned above, users are classified into several types, so we further partition the user space into $m_T$ subspaces. Each subspace represents a user type; the a-th user type is denoted by $x_a$. To better reflect users' similarity, the shape of a subspace is not restricted: some subspaces may be rectangles containing a larger range of users, and some may be smaller triangles. However, to balance computing efficiency and personalized performance (push accuracy), every subspace should satisfy, for any $a \in [1, m_T]$, that $\mathrm{diam}(x_a) := \sup_{u_i, u_j \in x_a} \|u_i - u_j\| \le \alpha_T$, where $\alpha_T > 0$ depends on the time horizon T. The details of choosing $m_T$ and $\alpha_T$ will be presented in the next section. Note that a specific user can come to the system multiple times with different contexts (and hence different user types). Thus, the user space $\mathcal{U}$ only characterizes user context; it does not track a certain user.
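The user-space machinery above can be made concrete with a small sketch. The uniform grid partition is our illustrative choice (the paper allows arbitrarily shaped subspaces), and the function names are ours.

```python
import math

def user_dissimilarity(u_i, u_j):
    """Euclidean dissimilarity ||u_i - u_j|| over the d_U-dimensional context."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u_i, u_j)))

def user_type(u, cells_per_dim):
    """Map a context u in [0,1]^d_U to a user type via a uniform grid.

    A grid with k cells per dimension yields m_T = k**d_U types, each cell
    having diameter sqrt(d_U)/k, so the alpha_T bound can be met by choosing
    k large enough. Contexts on the boundary (x == 1.0) are clamped into
    the last cell.
    """
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in u)
```

For example, with d_U = 2 and k = 4 there are 16 user types, each a square of diameter sqrt(2)/4 ≈ 0.354.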

C. Content Space Model and Content Tree

The content, the center of CCN, is described by a $d_C$-dimensional feature vector, including provider name, type, duration, tags, connotation, etc. So the content set C = {c_1, c_2, ...} is modeled as a $d_C$-dimensional space. We define a dissimilarity function $f(c_1, c_2)$ to represent how different two contents are. In this paper, we take this function as given, since content dissimilarity analysis is not our topic. We formulate the function as follows.

Definition 1 (Content dissimilarity function): The content space $\mathcal{C}$ is equipped with a dissimilarity function $f : \mathcal{C}^2 \to [0, \infty)$ such that for all $c_1, c_2 \in \mathcal{C}$ we have $f(c_1, c_2) \ge 0$ and $f(c_1, c_1) = 0$.

Normally, the more relevant two contents are, the smaller their dissimilarity value. The diameter of a subspace $\mathcal{P} \subseteq \mathcal{C}$ can then be defined as $\mathrm{diam}(\mathcal{P}) := \sup_{c_1, c_2 \in \mathcal{P}} f(c_1, c_2)$ to describe the maximum size of a content subset. By borrowing the concept of a sphere from geometry, we define a special kind of content cluster, called a relative sphere, as $B(c, l) := \{c' \in \mathcal{C} : f(c, c') \le l\}$, where the radius $l > 0$ and the center $c \in \mathcal{C}$. Later, we will use this kind of content cluster to limit the minimum size of a content subset.
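For a finite content set, the diameter and relative-sphere definitions translate directly into code; `f` here is any dissimilarity function satisfying Definition 1, and the function names are ours.

```python
def diameter(subset, f):
    """diam(P) := sup over c1, c2 in P of f(c1, c2), for a finite subset P."""
    return max((f(c1, c2) for c1 in subset for c2 in subset), default=0.0)

def relative_sphere(center, radius, contents, f):
    """B(c, l) := {c' in C : f(c, c') <= l}, restricted to a finite content set."""
    return [c for c in contents if f(center, c) <= radius]
```

With one-dimensional features and f(a, b) = |a − b|, these reduce to the ordinary range and interval, which makes the definitions easy to sanity-check.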

As aforementioned, to reduce the search difficulty, we decrease the search width and depth by a Monte-Carlo binary tree structure covering the whole content space, named the content tree. In total, we build $m_T$ trees, each denoted by $\mathcal{T}^a$, where a is the index of the corresponding user type $x_a$. Let $(a, h, i)$ ($a \in [1, m_T]$, $h \ge 0$, $i \in [1, 2^h]$) denote the node of tree $\mathcal{T}^a$ at depth h and index i among all nodes at the same depth. For every $a \in [1, m_T]$, the notation $S_{h,i} \subseteq \mathcal{C}$ represents the content subset corresponding to node $(a, h, i)$. For each subset, we randomly select a content $c_{h,i} \in S_{h,i}$ to represent it; once our algorithm selects the subset, $c_{h,i}$ is output. Since they are binary trees, they satisfy: 1) each tree has only one root node $(a, 0, 1)$ at the beginning, with $S_{0,1} = \mathcal{C}$; 2) $S_{h,i} \cap S_{h,j} = \emptyset$ for $i \ne j$; 3) $(a, h+1, 2i-1)$ and $(a, h+1, 2i)$ are the two children of $(a, h, i)$, with $S_{h+1,2i-1} \cup S_{h+1,2i} = S_{h,i}$. From property 3, we can further deduce that $\bigcup_{i=1}^{2^h} S_{h,i} = \mathcal{C}$, i.e., a tree's nodes at the same depth cover the entire content set.

Fig. 2. Two examples of content clusters.

Thus, the tree structure can be viewed as an index of contents in CCN. As an index, similar contents should be organized near each other, so within a content subset the maximum dissimilarity should be within a certain range: if a content subset contains both a content the user loves and a content the user hates, the index is useless. However, if a subset is too small, it may cause overfitting, so the minimum dissimilarity should be restricted as well. We formalize these requirements as a restriction on node size.

Assumption 1 (Node size): There exist $l_1 > l_2 > 0$ and $0.5 \le \rho < 1$ such that for any node $(a, h, i) \in \mathcal{T}^a$:

a) $\mathrm{diam}(S_{h,i}) \le l_1 \rho^h$;

b) $\exists\, v^o_{h,i} \in S_{h,i}$ s.t. $B_{h,i} := B(v^o_{h,i}, l_2 \rho^h) \subset S_{h,i}$.

By this assumption, we restrict both the maximum and minimum size of a node. A simple example is illustrated in Fig. 2; the partition method in (b) does not satisfy our assumption. As we can see, cluster (a, 1, 1) contains too many contents that are totally dissimilar to each other, while cluster (a, 1, 2) is too small, covering only one specific user's interests. By contrast, the partition method in (a) is much better. Note that our approach is specialized for big data applications in the current social network scenario; that is, we assume that even for a long running time T, the content subsets will not be divided down to individuals (e.g., a leaf node containing merely one content).
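A minimal sketch of a content-tree node that maintains the partition properties 1)–3) and lets us check Assumption 1(a) on finite subsets; the class and the split interface are illustrative, not the paper's implementation.

```python
class ContentNode:
    """Node (a, h, i) of a content tree: holds a content subset S_{h,i}
    and a representative content c_{h,i} (here the first element stands
    in for a random choice)."""

    def __init__(self, contents, depth=0, index=1):
        self.contents = contents            # S_{h,i}
        self.depth, self.index = depth, index
        self.representative = contents[0]   # c_{h,i}
        self.children = None

    def expand(self, split):
        """Create children (h+1, 2i-1) and (h+1, 2i) whose subsets are a
        disjoint cover of S_{h,i}, as properties 2) and 3) require."""
        left, right = split(self.contents)
        assert set(left) | set(right) == set(self.contents)  # union = parent
        assert not (set(left) & set(right))                  # disjoint
        self.children = (ContentNode(left, self.depth + 1, 2 * self.index - 1),
                         ContentNode(right, self.depth + 1, 2 * self.index))
        return self.children

def satisfies_max_diam(node, f, l1, rho):
    """Check Assumption 1(a): diam(S_{h,i}) <= l1 * rho**h, finite subsets."""
    d = max((f(a, b) for a in node.contents for b in node.contents), default=0.0)
    return d <= l1 * rho ** node.depth
```

Because each split must shrink the diameter geometrically (by the factor rho), a valid split function groups similar contents together rather than cutting arbitrarily.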

D. Regret Analysis

Our system outputs a content's name identified in CCN. The notation $(a_t, h_t, i_t)$ denotes the node selected at time t, and $c_t = c_{h_t,i_t}$ is the corresponding content of node $(a_t, h_t, i_t)$. When a user receives this reply message, he/she browses the content and feeds back a reward $r_t \in [0, 1]$ reflecting his/her attitude toward the content. The reward can be either explicit or implicit. An explicit reward asks users to proactively send a score or rating, which might be difficult to achieve in the real world, so we employ implicit reward as feedback. It measures users' diverse activities to infer their satisfaction (e.g., $r_t = 0$ means the user does not click the pushed content, $r_t = 1$ means the user likes it, and values between 0 and 1 capture actions such as "click" and "share").
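One simple way to realize the implicit reward described above; the particular action weights are our assumption, since the paper only pins down the endpoints r_t = 0 (no click) and r_t = 1 (like).

```python
def implicit_reward(actions):
    """Map observed user activity to a reward in [0, 1] (weights illustrative).

    No action at all -> 0; a 'like' -> 1; intermediate actions such as
    'click' or 'share' map to values in between. The strongest observed
    action determines the reward.
    """
    weights = {"click": 0.3, "share": 0.6, "like": 1.0}
    if not actions:
        return 0.0
    return max(weights.get(a, 0.0) for a in actions)
```

Taking the maximum over observed actions means a user who clicked, shared, and liked is scored by the strongest signal, the "like".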

The reward is treated as a random variable with an unknown independent and identical distribution (i.i.d.). The i.i.d. assumption reflects the fact that users with the same context typically have a relatively fixed response to a specific content. Thus, we model the unknown expected reward of content c for user context u as μ(c, u) := E{r | c, u}.¹ For simplicity, we use the short notation μ_{h,i}(t) := μ(c_{h,i}, u_t). The optimal content for a user with context u_t is defined as c*_t = argmax_c μ(c, u_t), and we define μ*_t = μ(c*_t, u_t) as the optimal expected reward for that user type. We assume that, in social networks, each content has a similar impact on similar users, and each user has a similar attitude toward similar contents; we formalize this as the widely adopted Lipschitz condition.

Assumption 2 (Lipschitz condition): For any two user contexts u1, u2 ∈ U, all contents c ∈ V, and each time t > 0, there exists a constant L_U that satisfies |μ(c, u1) − μ(c, u2)| ≤ L_U ‖u1 − u2‖ and μ*_t − μ(c, u_t) ≤ f(c*_t, c).

This is a natural and reasonable assumption for content push, since similar users always react in parallel ways to a pushed content. It covers the relationship between the reward and both the user context and the contents. For the context, a standard Lipschitz condition must hold; L_U is the Lipschitz constant and is used only in the theoretical analysis, not in the running of our algorithm. For the content, we only require a Lipschitz condition w.r.t. the maximum, which is weaker than the standard one.

In this paper, we measure the loss during the push process by the learning regret, which is caused by the lack of knowledge of the reward distribution. We define the one-step regret at time t as Δ_t = μ*_t − r_t, and the total expected regret over T steps as

R_T = E[ Σ_{t=1}^{T} Δ_t ] = Σ_{t=1}^{T} (μ*_t − E[r_t]) = Σ_{t=1}^{T} (μ*_t − μ_{h_t,i_t}(t)).   (1)

The goal of our algorithm is to efficiently compute a push strategy that minimizes R_T. R_T reflects the learning speed or, formally, the convergence rate to the optimal strategy; a sublinear regret upper bound therefore means our algorithm converges to the optimal strategy. Although the number of nodes at depth h can be bounded by 2^h, the theoretical analysis needs a tighter upper bound on the number of near-optimal content subsets. For this we utilize the concept of the packing number, defined as follows.
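For offline evaluation, the regret of (1) can be transcribed directly (a minimal sketch with illustrative names):

```python
def cumulative_regret(mu_star, mu_chosen):
    """R_T of Eq. (1): the sum of per-step gaps between the optimal
    expected reward and the expected reward of the pushed node."""
    return sum(ms - mc for ms, mc in zip(mu_star, mu_chosen))

# Toy check: three steps with gaps 0.5, 0.25 and 0.
assert cumulative_regret([1.0, 1.0, 1.0], [0.5, 0.75, 1.0]) == 0.75
```

A sublinear bound means this sum grows slower than T, so the average gap R_T/T vanishes.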

Definition 2 (Near-optimal subset and packing number): Let C_γ(t) = {c ∈ C : μ*_t − μ(c, u_t) ≤ γ} denote the subset of γ-optimal contents. We define the packing number N(C_γ(t), γ′) ≤ L_C (γ′)^{−d_C} as the maximum number of disjoint packing balls of radius γ′ that fit in the region C_γ(t) w.r.t. the dissimilarity measure f, where L_C is the packing constant of the content space.

Then, combining this with Assumption 1, we conclude that the maximum number of γ-optimal nodes at depth h is the packing number N(C_γ(t), l2 ρ^h) ≤ L_C (l2 ρ^h)^{−d_C} whenever γ ≥ l2 ρ^h. That ensures the search width stays proper.

¹For the non-i.i.d. case, Tekin et al. [28] propose bounding a non-i.i.d. process between two i.i.d. processes.

Algorithm 1: ACP Algorithm.
1: Input: Parameters l1 > 0, ρ ∈ (0, 1), c > 0, confidence δ ∈ (0, 1), time horizon T and cover tree structure (S_{h,i})_{h≥0, 1≤i≤2^h}.
2: Partition the context space into m_T parts.
3: Initialize: t = 1. For all a ∈ [1, m_T]: T^a_t = {(0, 1), (1, 1), (1, 2)}, D^a(t) = 1, E^a_{1,1}(t) = E^a_{1,2}(t) = C_max.
4: loop
5:   if t = t⁺ then
6:     for all a ∈ [1, m_T] do
7:       for all content clusters (a, h, i) ∈ T^a_t backward from the leaf nodes (a, D(t), i), 1 ≤ i ≤ 2^{D(t)} do
8:         Update the estimated upper bound E^a_{h,i}(t) in (3).
9:         Update the recursive upper bound R^a_{h,i}(t) in (5).
10:      end for
11:    end for
12:  end if
13:  The system receives an Interest message and reads the user context u_t.
14:  Find the user type x^{a_t} it belongs to.
15:  if the Interest message is a CRM then
16:    Find the requested content c_t.
17:    {(a_t, h_t, i_t), P_t} ← Content-Find(T^{a_t}_t, c_t).
18:  else
19:    {(a_t, h_t, i_t), P_t} ← Content-Select(T^{a_t}_t).
20:    u_t accesses the content c_{h_t,i_t} from the nearest cache and gives his/her reward.
21:  end if
22:  Go to the next time slot t = t + 1.
23:  Update the browsed counter T^{a_t}_{h_t,i_t}(t) and the empirical average reward μ^{a_t}_{h_t,i_t}(t).
24:  Update the estimated upper bound E^{a_t}_{h_t,i_t}(t).
25:  Update the traversal path:
26:  for all (a_t, h, i) ∈ P_t backward from (a_t, h_t, i_t) to (a_t, 0, 1) do
27:    Update the recursive upper bound R^{a_t}_{h,i}(t) according to (5).
28:  end for
29:  Explore the new node:
30:  if T^{a_t}_{h_t,i_t}(t) ≥ τ_{h_t}(t) AND (a_t, h_t, i_t) ∈ leaf(T^{a_t}_t) then
31:    T^{a_t}_t ← T^{a_t}_t ∪ {(a_t, h_t + 1, 2i_t − 1), (a_t, h_t + 1, 2i_t)}.
32:    E^{a_t}_{h_t+1,2i_t−1}(t) = E^{a_t}_{h_t+1,2i_t}(t) = C_max.
33:  end if
34: end loop

IV. ACCURATE CONTENT PUSH ALGORITHM

In this section, we propose the accurate content push (ACP) algorithm to learn the optimal content push strategy, and we give a theoretical analysis of its regret and complexity.

A. Algorithm Description

We show the details of ACP in Algorithm 1 and first introduce some important notations and variables used in the computation. Let T_s = {T^1, T^2, . . . , T^{m_T}} be the set of all content trees. The notation T^a_t denotes the part of tree T^a that has already been expanded at time t, and D^a(t) represents the depth of T^a_t. The notation C_max represents the maximum number supported by the operational environment. Then, for any node (a, h, i), the number of times contents in it have been pushed to users within type x^a up to time t (including replies to both RRM and CRM) is defined as

T^a_{h,i}(t) = Σ_{τ=1}^{t} I_{a,h,i}(τ),   (2)

where I is the indicator function and I_{a,h,i}(τ) = I{a_τ = a, h_τ = h, i_τ = i} is an abbreviated form. To find the optimal content, the system must estimate an upper bound for each node, so we define the estimated upper bound E and the recursive upper bound R as follows.

Estimated upper bound: Let t⁺ = 2^{⌊log(t)⌋+1} ∈ [t, 2t] and δ(t) = min{c1 δ/t, 1} (c1, δ > 0 are constants). The estimated upper bound E^a_{h,i}(t) for any node (a, h, i) ∈ T^a_t is computed as

E^a_{h,i}(t) = μ^a_{h,i}(t) + c √( log(1/δ(t⁺)) / T^a_{h,i}(t) ) + l1 ρ^h,   (3)

where μ^a_{h,i}(t) is the empirical average reward of contents in subset S_{h,i} for users within type x^a up to time t, computed as

μ^a_{h,i}(t) = (1 / T^a_{h,i}(t)) Σ_{τ=1}^{t} r_τ(c_{h,i}) I_{a,h,i}(τ).   (4)

The first two terms form the classical upper-confidence bound used in bandit algorithms to balance exploration and exploitation [11]. According to Assumption 1, the third term, the maximum content dissimilarity within a node, extends the upper bound from a single content c_{h,i} to the whole subset S_{h,i}.
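The statistics a node must maintain for (3) and (4) can be sketched as follows; the function names and the illustrative parameter values are our own, not the paper's implementation.

```python
import math

def t_plus(t):
    """t+ = 2^(floor(log2 t) + 1), the next power of two in [t, 2t]."""
    return 2 ** (int(math.log2(t)) + 1)

def estimated_upper_bound(reward_sum, count, h, t, *, c, c1, delta, l1, rho):
    """E^a_{h,i}(t) of Eq. (3): empirical mean (Eq. (4)) plus the UCB
    confidence width plus the node-size term l1 * rho^h."""
    delta_t = min(c1 * delta / t_plus(t), 1.0)   # delta(t+) of the paper
    mu_hat = reward_sum / count                  # Eq. (4)
    width = c * math.sqrt(math.log(1.0 / delta_t) / count)
    return mu_hat + width + l1 * rho**h

# Illustrative values; the bound always dominates the empirical mean.
E = estimated_upper_bound(6.0, 10, h=3, t=100,
                          c=2.45, c1=0.3, delta=0.05, l1=2.0, rho=0.7)
assert E > 0.6
```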

Recursive upper bound: It is the actual upper bound used in ACP and is recursively computed as

R^a_{h,i}(t) = { E^a_{h,i}(t), if (h, i) ∈ leaf(T^a_t);  C_max, if T^a_{h,i}(t) = 0;  min[ E^a_{h,i}(t), max_{j∈{2i−1,2i}} R^a_{h+1,j}(t) ], otherwise. }   (5)

By the tree structure, S_{h,i} = S_{h+1,2i−1} ∪ S_{h+1,2i}, so max_{j∈{2i−1,2i}} R^a_{h+1,j}(t) is a valid upper bound for node (a, h, i). Combining it with the other valid upper bound E^a_{h,i}(t) and taking the minimum of the two values yields a tighter bound that propagates feedback from the child nodes to the parent.

The last new variable is τ_h(t), a threshold on the counter T^a_{h,i}(t) that reduces the search depth and guarantees each node is explored sufficiently. We set τ_h(t) = c² log[1/δ(t⁺)] / (l1 ρ^h)². The details will be demonstrated shortly.

We illustrate how ACP works in Fig. 3. When a user arrives, the system receives the Interest message and reads the user's context u_t. The system then finds the user type u_t belongs to; for example, u_t ∈ x^13 in Fig. 3. The remaining operations take place in the matching tree (Lines 13 and 14). Next, the system checks the type of the Interest message. If it is a CRM, the system sends the corresponding

Fig. 3. An example of ACP, with dU = 2, dC = 1, mT = 14. And x13 isthe current user type.

content and calls the Content-Find function to locate the node containing that content and the traversal path for later tree updating; in this case r_t = 1 (Lines 15–17). If it is an RRM, the system calls the Content-Select function to search for the estimated optimal content node (i.e., (13, 4, 8) in Fig. 3), getting the node's index and the traversal path P_t (Lines 18 and 19). After the user receives the push, he/she replies with the reward (Line 20). Then, based on the reward, the algorithm updates the tree to improve future performance (Lines 5–12, 23–28). If the current leaf node has been learned sufficiently, the algorithm expands its children nodes for more accurate results (Lines 29–34), and then waits for the next user.

Now let us focus on the updating process (Lines 5–12, 23–28). By the recursive definition of R, outside the refresh times t = t⁺ a node's recursive upper bound only changes when the node itself or one of its descendants is selected, and the update must proceed bottom-up. So we only need to update T, μ and E of (a_t, h_t, i_t) and then back-propagate along the path, renewing the recursive upper bounds of all nodes on it (Lines 13–28). Besides, at the moments t = t⁺, all m_T trees refresh E and R, similarly from the leaf nodes to the root (Lines 5–12). This whole-tree update occurs only O(log(T)) times over the horizon, keeping the computation efficient.

Next we discuss the threshold τ_h(t), the key to promising efficiency and accuracy. From the two subroutines, Algorithms 2 and 3, we can see that ACP selects a node (a_t, h_t, i_t) only if its counter satisfies T^{a_t}_{h_t,i_t}(t) ≤ ⌈τ_{h_t}(t)⌉. According to (3), the confidence width c√(log(1/δ(t⁺))/T^{a_t}_{h_t,i_t}(t)) decreases as T^{a_t}_{h_t,i_t}(t) grows. At some point it falls below the third term l1 ρ^{h_t}, implying that the uncertainty over the rewards in the node has become dominated by the potential dissimilarity due to the node's size: the node has been sufficiently explored and its size is now too large. Thus, to partition the content cluster into smaller subsets, ACP expands the node by adding its two children to the tree (Algorithm 1, Lines 29–33). Formally, the choice of the threshold comes from

l1 ρ^h = c √( log[1/δ(t⁺)] / τ_h(t) )  ⇒  τ_h(t) = c² log[1/δ(t⁺)] / (l1² ρ^{2h}).   (6)
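The threshold of (6) can be computed directly; note that it grows like ρ^(−2h), so deeper (smaller) nodes require more samples before expansion. A minimal sketch, with our own names and illustrative parameter values:

```python
import math

def tau(h, t, *, c, c1, delta, l1, rho):
    """Expansion threshold tau_h(t) of Eq. (6)."""
    t_plus = 2 ** (int(math.log2(t)) + 1)        # t+ in [t, 2t]
    delta_t = min(c1 * delta / t_plus, 1.0)      # delta(t+)
    return c**2 * math.log(1.0 / delta_t) / (l1 * rho**h) ** 2

params = dict(c=2.45, c1=0.3, delta=0.05, l1=2.0, rho=0.7)
# One level deeper multiplies the threshold by exactly rho^(-2).
ratio = tau(4, 100, **params) / tau(3, 100, **params)
assert abs(ratio - 0.7 ** -2) < 1e-9
```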

Therefore, this dynamic expanding process happens while ACP runs and depends on a region's size rather than the number of contents it holds. To add new contents, the system only needs to place them into the regions they belong to, which does not alter the content tree.

Algorithm 2: The Content-Find function.
1: Input: Tree T^a_t, content c_t.
2: Initialize: (a, h, i) ← (a, 0, 1), P ← {(a, 0, 1)}, T^a_{0,1}(t) = τ_0(t) = 1.
3: while T^a_{h,i}(t) ≥ τ_h(t) AND (a, h, i) ∉ leaf(T^a_t) do
4:   if c_t ∈ (a, h + 1, 2i − 1) then
5:     (a, h, i) ← (a, h + 1, 2i − 1).
6:   else
7:     (a, h, i) ← (a, h + 1, 2i).
8:   end if
9:   P ← P ∪ {(a, h, i)}.
10: end while
11: Output: (a, h, i) and P.

Algorithm 3: The Content-Select function.
1: Input: Tree T^a_t.
2: Initialize: (a, h, i) ← (a, 0, 1), P ← {(a, 0, 1)}, T^a_{0,1}(t) = τ_0(t) = 1.
3: while T^a_{h,i}(t) ≥ τ_h(t) AND (a, h, i) ∉ leaf(T^a_t) do
4:   if R^a_{h+1,2i−1} ≥ R^a_{h+1,2i} then
5:     (a, h, i) ← (a, h + 1, 2i − 1).
6:   else
7:     (a, h, i) ← (a, h + 1, 2i).
8:   end if
9:   P ← P ∪ {(a, h, i)}.
10: end while
11: Output: (a, h, i) and P.
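The descent shared by both subroutines can be sketched on a dictionary-based tree; the data layout and names below are our own illustration, not the paper's implementation.

```python
def content_select(tree, tau, t):
    """Sketch of Algorithm 3: walk down from the root, always taking
    the child with the larger recursive upper bound R, and stop at a
    node whose counter T is below the threshold or that is a leaf."""
    h, i = 0, 1
    path = [(h, i)]
    while tree[(h, i)]["T"] >= tau(h, t) and (h + 1, 2 * i - 1) in tree:
        left, right = (h + 1, 2 * i - 1), (h + 1, 2 * i)
        h, i = left if tree[left]["R"] >= tree[right]["R"] else right
        path.append((h, i))
    return (h, i), path

# A three-node toy tree: the root is well explored, so the walk
# descends to the child with the larger R.
toy = {(0, 1): {"T": 10, "R": 1.0},
       (1, 1): {"T": 0, "R": 0.3},
       (1, 2): {"T": 0, "R": 0.8}}
node, path = content_select(toy, tau=lambda h, t: 1.0, t=1)
assert node == (1, 2) and path == [(0, 1), (1, 2)]
```

Replacing the R-comparison in the loop with the membership test "c_t ∈ left child" gives Content-Find.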

B. Regret Analysis

Before the analysis, we introduce some additional notations. For all 1 ≤ h ≤ D^a(t) (a ∈ [1, m_T]) and t > 0, we denote by I^a_h(t) the set of all nodes at depth h created by the algorithm in tree T^a_t up to time t, and by I^{a+}_h(t) the subset of I^a_h(t) containing only the internal nodes (those expanded before time t) at depth h. Since T^a_t is a binary tree, the number of nodes at depth h is at most twice the number of expanded nodes at depth h − 1, so clearly |I^a_h(t)| ≤ 2|I^{a+}_{h−1}(t)|. We denote by t^a_{h,i} := min{t : T^a_{h,i}(t) ≥ τ_h} the time at which (a, h, i) is expanded.

As aforementioned, the threshold τ_h(t) helps reduce the depth of the Monte-Carlo tree search. We now bound the maximum depth of a tree generated by ACP.

Lemma 1: Given the expanding threshold τ_h(t) in (6), if it satisfies log[1/δ(t^a_{1,i})] ≥ 1, the depth of the tree T^a_T (a ∈ [1, m_T]) can be bounded as

D^a(T) ≤ D_max(T) ≤ (1 / (2(1 − ρ))) log( T l1² / (cρ)² ),

where D_max(T) denotes the maximum depth at time T.

where Dmax(T ) denotes the maximum depth at time T .Proof: The deepest tree can be developed by ACP is a lin-

ear tree that at each depth only one node is expanded (e.g.∀h ∈ (0,Da(T )), |Ia

h (t)| = 2, |Ia+h (t)| = 1). And at each time

t > 0, only the users belonging to one user type xa arrive, corre-sponding to Ta

h,i(T ) =∑T

t=1 I{ht = h, it = i}. Then, we have

T = Σ_{h=0}^{D^a(T)} Σ_{i∈I_h(T)} T^a_{h,i}(T) ≥ Σ_{h=0}^{D^a(T)−1} Σ_{i∈I^+_h(T)} T^a_{h,i}(T)

≥ Σ_{h=0}^{D^a(T)−1} Σ_{i∈I^+_h(T)} T^a_{h,i}(t^a_{h,i}) ≥_{(a)} Σ_{h=0}^{D^a(T)−1} Σ_{i∈I^+_h(T)} τ_h(t^a_{h,i})

≥ Σ_{h=1}^{D^a(T)−1} Σ_{i∈I^+_h(T)} (c² log[1/δ(t^a_{h,i})] / l1²) ρ^{−2h} ≥ Σ_{h=1}^{D^a(T)−1} (c²/l1²) ρ^{−2h},

where the intermediate inequality (a) follows from the fact that a node is expanded only once it has been pushed enough by time t^a_{h,i}, i.e., T^a_{h,i}(t^a_{h,i}) ≥ τ_h(t^a_{h,i}), and the last inequality comes from the prerequisite log[1/δ(t^a_{h,i})] ≥ log[1/δ(t^a_{1,i})] ≥ 1. We will verify this prerequisite after finishing all proofs and determining the parameter values. Then we have

T ≥ ((cρ)²/l1²) ρ^{−2D^a(T)} Σ_{h=1}^{D^a(T)−1} ρ^{−2(h−D^a(T)+1)} ≥ ((cρ)²/l1²) ρ^{−2D^a(T)} D^a(T) ≥ ((cρ)²/l1²) ρ^{−2D^a(T)},

where we utilize the fact D^a(T) ≥ 1. Finally, we obtain

ρ^{−2D^a(T)} ≤ T l1²/(cρ)²  ⇒  D^a(T) ≤ (1/2) log( T l1²/(cρ)² ) / log(1/ρ) ≤ (1/(2(1 − ρ))) log( T l1²/(cρ)² ),

where the final inequality follows from log(1/ρ) ≥ 1 − ρ. ∎

By Lemma 1, we bound the maximum depth of the content trees by O(log T). This promises that our approach traverses a tree at relatively low cost and high computational efficiency, so the service provider can respond to users quickly. Next, we define a high-probability event stating that the real expected reward lies in a confidence interval around the empirical estimate, and prove its probability. This event will further be used to show that our algorithm consistently pushes superior-quality content to users in the next step of the analysis.
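Numerically, the bound of Lemma 1 indeed grows only logarithmically in T; the parameter values below are illustrative, not those of the experiments.

```python
import math

def depth_bound(T, *, rho, l1, c):
    """Lemma 1: D_max(T) <= log(T * l1^2 / (c*rho)^2) / (2 * (1 - rho))."""
    return math.log(T * l1**2 / (c * rho) ** 2) / (2.0 * (1.0 - rho))

rho = 1.0 / math.sqrt(2.0)
c = math.sqrt(3.0 / (1.0 - rho))   # the choice made in Lemma 2
# Multiplying T by 10 adds only the constant log(10)/(2(1-rho)) to the depth.
gap = (depth_bound(1e6, rho=rho, l1=2.0, c=c)
       - depth_bound(1e5, rho=rho, l1=2.0, c=c))
assert abs(gap - math.log(10) / (2 * (1 - rho))) < 1e-9
```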

Lemma 2: Define the set of all possible nodes in trees of depth at most D_max(t) as N_t = ∪_{T∈T_s} ∪_{T: Depth(T) ≤ D_max(t)} Nodes(T). The high-probability event is defined as

ε_t = { ∀(a, h, i) ∈ N_t, ∀T^a_{h,i}(t) = 1 . . . t : |μ^a_{h,i}(t) − E[μ^a_{h,i}(t)]| ≤ c √( log[1/δ(t)] / T^a_{h,i}(t) ) }.

Then the event ε_t holds with probability at least 1 − δ/t⁴ if c = √(3/(1 − ρ)) and c1 = (ρ/(3 m_T l1))^{1/6}.


Proof: The probability of the complementary event satisfies

P[ε^c_t] ≤ Σ_{(a,h,i)∈N_t} Σ_{T^a_{h,i}(t)=1}^{t} P[ |μ^a_{h,i}(t) − E[μ^a_{h,i}(t)]| ≥ c √( log[1/δ(t)] / T^a_{h,i}(t) ) ]

≤ Σ_{(a,h,i)∈N_t} Σ_{T^a_{h,i}(t)=1}^{t} 2 exp( −2 T^a_{h,i}(t) c² log(1/δ(t)) / T^a_{h,i}(t) )

= 2 exp(−2c² log(1/δ(t))) t |N_t|,

where the second inequality follows from the well-known Chernoff–Hoeffding inequality. We then bound the number of nodes in N_t by that of all full binary trees of maximum depth D_max(t), so that |N_t| ≤ 2^{D_max(t)+1} m_T. Then we have

P[ε^c_t] ≤ 2 exp(−2c² log(1/δ(t))) t |N_t| ≤ 2t [δ(t)]^{2c²} 2^{D_max(t)+1} m_T ≤ 4t [δ(t)]^{2c²} ( t l1²/(cρ)² )^{1/(2(1−ρ))} m_T,

where we have used the result of Lemma 1 to get the last inequality. Now, substituting c and c1 as in the statement leads to

P[ε^c_t] ≤ 4t ( (δ/t) (ρ/(3 m_T l1))^{1/6} )^{6/(1−ρ)} ( t l1² (1−ρ)/(3ρ²) )^{1/(2(1−ρ))} m_T

≤ 4 δ^{6/(1−ρ)} t^{1 − 6/(1−ρ) + 1/(2(1−ρ))} ( √(1−ρ)/(3√3) )^{1/(1−ρ)} m_T^{1 − 1/(1−ρ)}

≤ (4/(3√3)) δ t^{−(2ρ+9)/(2(1−ρ))} ≤ δ/t⁴,

which completes the proof. ∎

By Lemma 2 we determine two important parameters, so we obtain δ(t) = δ (ρ/(3 m_T l1))^{1/6} / t, and

log(1/δ(t)) = log( t (3 m_T l1)^{1/6} / (δ ρ^{1/6}) ).   (7)

Obviously, log(1/δ(t)) > 1 is easy to satisfy, since 0 < δ, ρ < 1, t, m_T ≥ 1, and l1 is always larger than 1 as well. Thus, the prerequisite of Lemma 1 is reasonable. Next, combining the results of the above lemmas, we bound the regret of ACP.
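The parameter choices can be checked numerically: with any reasonable setting, the prerequisite log(1/δ(t)) > 1 of Lemma 1 already holds at t = 1. The values below are illustrative.

```python
import math

rho, delta, m_T, l1 = 1.0 / math.sqrt(2.0), 0.05, 81, 2.0

c = math.sqrt(3.0 / (1.0 - rho))                 # from Lemma 2
c1 = (rho / (3.0 * m_T * l1)) ** (1.0 / 6.0)     # sixth root, from Lemma 2

def delta_t(t):
    """delta(t) = min{c1 * delta / t, 1}."""
    return min(c1 * delta / t, 1.0)

# Prerequisite of Lemma 1: log(1/delta(t)) >= 1 for every t >= 1.
assert math.log(1.0 / delta_t(1)) > 1.0
```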

Theorem 1 (Regret bound of ACP): Let users' rewards be i.i.d., let Assumptions 1–2 hold at each time slot t, and set α_T = (log T / T)^{1/(d_C+d_U+1)} and m_T = M_U (T / log T)^{d_U/(d_C+d_U+1)} (M_U > 0 is a constant). Then the expected regret R_T of ACP with time horizon T can be bounded as

R_T ≤ 4δ/3 + 6 L_C T^{(d_C+d_U)/(d_C+d_U+1)} (log T)^{1/(d_C+d_U+1)}

+ 6 l1 ρ T^{(d_C+d_U)/(d_C+d_U+1)} (log T)^{1/(d_C+d_U+1)}

+ (36 L_C M_U ρ^{d_C−1} (1−ρ)^{−1} / (l1 l2^{d_C} (1−ρ^{d_C+1}))) log(2T/(δ c1)) (T/log T)^{(d_C+d_U)/(d_C+d_U+1)}

+ (12 l1 L_C M_U ρ^{d_C−1} / (l2^{d_C} (1−ρ^{d_C−1}))) T^{(d_C+d_U−1)/(d_C+d_U+1)} (log T)^{2/(d_C+d_U+1)}

= O( T^{(d_C+d_U)/(d_C+d_U+1)} (log T)^{1/(d_C+d_U+1)} ).

Proof: See [31].

Remark: In the second step of the proof, under the event ε_t we bound the one-step regret as Δ_{h_t,i_t}(t) ≤ 4 l1 ρ^{h_t} + 2 L_U α_T. This means that, with high probability at least 1 − δ/t⁴, our ACP algorithm always pushes near-optimal content to users according to their real-time interests.

When the data in CCN social networks are extended, we only need to add the newly uploaded contents to the corresponding content clusters, which preserves Assumption 1. Since our algorithm works at the cluster level, the added contents do not affect its operation, and the conclusion of Theorem 1 remains true.

We have proved that ACP achieves the sublinear regret R_T = O(T^{(d_C+d_U)/(d_C+d_U+1)} (log T)^{1/(d_C+d_U+1)}), which guarantees convergence in terms of the average reward, i.e., lim_{T→∞} R_T/T = 0. Thus, ACP always learns to push the correct content to users. If either d_U or d_C goes to infinity, the regret approaches linearity: as the dimensions grow, the number of nodes to explore increases, especially at the first few layers, where the search width and depth are limited by the tree structure and the threshold (the result of Lemma 1).

C. Complexity Analysis

In this subsection, we discuss the complexity of our algorithm and prove that ACP has sublinear storage.

Space complexity: For ACP, the storage comes from storing all nodes of all m_T content trees. The following theorem bounds the expected number of nodes.

Theorem 2: Let N_T denote the space complexity of ACP up to time T. Under the same conditions as Theorem 1, we have

E[N_T] = O( (log T)^{2(d_C+1)/(3(d_C+d_U+1))} T^{(d_C+3d_U+1)/(3(d_C+d_U+1))} ).

Proof: See [31].

Time complexity: By Theorem 2, the number of content clusters that need to be updated is at most O((log T)^{2(d_C+1)/(3(d_C+d_U+1))} T^{(d_C+3d_U+1)/(3(d_C+d_U+1))}), with refresh phase O(log t). At each time t, ACP needs to find the user type x^{a_t}; using a retrieval method such as hashing, this costs O(1). The cost of both traversing the tree and updating R and E is O(log t), by the boundedness of the trees' depth. As a result, the total computational cost up to time horizon


TABLE I
COMPARISON WITH EXISTING LARGE-SCALE BANDIT ALGORITHMS

Algorithm | Regret | Space complexity | Time complexity | Context | Big data
ACR [6]   | O(T^{(d_I+d_C+1)/(d_I+d_C+2)} log T) | O(Σ_{l=0}^{E} K_l + T) | O(T² + K_E T) | Yes | No
HCT [27]  | O(T^{(d+1)/(d+2)} (log T)^{1/(d+2)}) | O((log T)^{2/(d+2)} T^{d/(d+2)}) | O(T log T) | No | Yes
HOO [26]  | O(T^{(d+1)/(d+2)} (log T)^{1/(d+2)}) | O(T) | O(T²) | No | Yes
ACP       | O(T^{(d_C+d_U)/(d_C+d_U+1)} (log T)^{1/(d_C+d_U+1)}) | O((log T)^{2(d_C+1)/(3(d_C+d_U+1))} T^{(d_C+3d_U+1)/(3(d_C+d_U+1))}) | O(T log T) | Yes | Yes

T is O( (log T)^{1+2(d_C+1)/(3(d_C+d_U+1))} T^{(d_C+3d_U+1)/(3(d_C+d_U+1))} + T + T log T ) = O(T log T).

Theorem 2 proves that our ACP requires only sublinear storage: we merely store the empirical rewards and push counters of each node for all m_T content trees. Although our approach cannot reduce the content storage in CCN itself, the sublinear storage complexity means ACP adds only a small additional storage burden on the CCN server. Moreover, the time complexity O(T log T) is the total cost up to time T; most time slots need only O(log T). Thus, each time a user sends an Interest message, the service provider spends only a little computation and the user can be answered quickly.

We list the regret and complexity of our algorithm and three existing large-scale bandit algorithms in Table I. Compared with [6], our algorithm outperforms it in every respect. On the other hand, since [6] clusters contents starting from individuals, it may perform better on a relatively small content dataset. As for HCT [27] and HOO [26], although their theoretical performance looks close to ours, they do not take context into account, so they cannot handle this multiclass task to make accurate content push.

V. NUMERICAL RESULT

In this section, we conduct experiments using real traces from Sina Weibo to evaluate our algorithm's performance. For a comprehensive test, the experiments contain four parts: 1) push accuracy; 2) scalability; 3) influence of the user context clusters' size; and 4) robustness to user arrivals. Owing to the serious logistical challenges of establishing an online system that runs the algorithms on live data, the experiments are run on a previously collected offline dataset. To focus on the algorithms' pure performance, we assume only RRMs arrive during the experiments.

A. Experimental Setup

We collected data from Sina Weibo², one of the largest social networks in China. As on Twitter, users can upload a "weibo" of at most 140 characters, plus images and videos; others can browse, comment, re-share and "like" them. In total we crawled 265,347 weibos and 537,082 user profiles from Feb. 15 to Apr. 2, 2017.

2http://weibo.com/ API: http://open.weibo.com/wiki/API

As contents, each weibo is described by a 4-D vector normalized to [0, 1]. The vector covers four main features of a weibo: type ("text", "image", "video", "text+image", "text+video"), uploader, upload date, and text MD5. For users, we collect: ID, age, gender, the latest login location, the latest user he/she subscribed to, the latest uploaded content, birthday and personal profile. We adopt the method described in [24] to reduce the dimension and obtain a 5-D vector as the user context. Meanwhile, for each weibo, we record users' interaction history with it (e.g., who comments, re-shares or likes it) as the ground truth of the reward. We set the parameters as l1 = 2√2, ρ = 1/√2 and δ = 0.05. The content clusters are partitioned by equal splits: we divide the longest dimension of a parent node equally into two parts belonging to its two children. For users, we set all types as hypercubes of the same size, with side length 1/m_T in each dimension. We run the experiment with time horizon T = 10⁵ and m_T = 81.
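A sketch of mapping a normalized context vector to a hypercube user type follows; the row-major indexing is our own illustrative choice, since the paper only states that the types are equal-size hypercubes.

```python
def user_type(u, bins_per_dim):
    """Map a context vector u in [0, 1]^d to the index of the
    equal-size hypercube (row-major over the grid) it falls into."""
    idx = 0
    for x in u:
        b = min(int(x * bins_per_dim), bins_per_dim - 1)  # clamp x = 1.0
        idx = idx * bins_per_dim + b
    return idx

assert user_type([0.0, 0.0], 3) == 0
assert user_type([0.99, 0.99], 3) == 8   # last cell of a 3x3 grid
```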

To obtain an unbiased evaluation of our online algorithm on this offline dataset, we run the experiment similarly to [24]: at each time slot t we randomly pick a user u_t as input, simulating an RRM arriving at the server, and decide a content c_t for him/her based on the context. Checking the interaction history, the reward is r_t = I{u_t has interacted with c_t}. The reward is thus a binary value reflecting whether the user is interested in the content.
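This replay protocol can be sketched as follows; the policy interface (`select`/`update`) and all names are our own assumptions, not the paper's code.

```python
import random

def replay_evaluate(policy, users, interactions, T, seed=0):
    """Offline replay: draw a logged user, push a content, and score
    r_t = 1 iff the log shows that user interacted with the content."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(T):
        u = rng.choice(users)
        c = policy.select(u)                      # assumed interface
        r = 1.0 if c in interactions.get(u, set()) else 0.0
        policy.update(u, c, r)                    # assumed interface
        total += r
    return total / T

class AlwaysA:                                    # trivial baseline policy
    def select(self, u): return "a"
    def update(self, u, c, r): pass

# If every logged user interacted with "a", the average reward is 1.0.
acc = replay_evaluate(AlwaysA(), [1, 2], {1: {"a"}, 2: {"a"}}, T=100)
assert acc == 1.0
```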

Since traditional approaches cannot handle the big data problem efficiently, ACP is compared with three state-of-the-art large-scale bandit algorithms to show the improvement in learning performance.

• Adaptive Clustering Recommendation Algorithm (ACR) [6]: As stated earlier, it is a contextual bandit algorithm for recommendation. By treating users' queries as the context, we compare our algorithm with it to show our improvements in regret bound and in time and space complexity. In our experiment, we set the number of item clusters to K = 150.

• Hierarchical Optimistic Optimization Algorithm (HOO) [26]: the so-called X-armed bandit algorithm, proposed for the situation with infinitely many arms. Its weakness is that it can only serve one kind of user, since it does not consider the context.

• High Confidence Tree Algorithm (HCT) [27]: an improved X-armed bandit algorithm that reduces the time and space complexity, likewise without context.


Fig. 4. Regret in whole dataset.

Fig. 5. Average regret in whole dataset.

All experiments are implemented and run on our university's high-performance computing platform, whose GPUs reach 18.46 TFlops with a 1.25-TB SSD cache.

B. Accuracy Comparison

In this subsection, we run the first experiment on the entire static dataset to show the learning ability and push accuracy of our method. The cumulative regret and average regret are shown in Figs. 4 and 5.

As Fig. 4 shows, the total regret of ACP up to time T = 10⁵ is 39.71% less than that of ACR, 61.87% less than that of HCT and 64.07% less than that of HOO. The same conclusion for the average regret follows from Fig. 5. Compared with the other bandit algorithms, our method indeed achieves a lower regret bound and converges more quickly, which coincides with the theoretical analysis. In the first few rounds, the average regrets of all four algorithms fluctuate significantly, but our algorithm becomes convergent and smooth sooner. At that stage both contents and users are new to the algorithms, so the cold start problem inevitably occurs, and the fast convergence means ACP overcomes it quickly. Moreover, the contextual algorithms clearly outperform the context-free ones (HOO and HCT): the regrets of ACP and ACR are much lower in this multiclass classification task. Even so, the context-free algorithms reach nearly 50% push accuracy, which is still a relatively high result; one probable reason is that most of the crawled contents are hot ones with massive numbers of interacting users.

Fig. 6. Running time in whole dataset.

Fig. 7. Regret in increasing dataset.

We also show the running time in Fig. 6. Our ACP achieves a much lower time cost than ACR and HOO. It is still higher than HCT, but that is because HCT ignores users' context, and our growth rate stays similar to HCT's. Considering the significant improvement in push accuracy, this computing cost is acceptable.

C. Scalability

In this subsection, we verify the scalability of our algorithm. First, we run the four algorithms on a changing dataset: we randomly divide the content dataset into five equal parts and initially run the algorithms on one part. Beginning at round 10,000, we add a new part every 20,000 rounds, simulating extension of the content space. The cumulative and average regrets of the four algorithms are shown in Figs. 7 and 8.

As shown in Figs. 7 and 8, the regret of ACP is 35.32% lower than ACR, 62.37% lower than HCT and 56.73% lower than HOO. Clearly, each time a new content subset is added (t = 10⁴, 3 × 10⁴, 5 × 10⁴, . . .), the regret increases visibly, especially when t < 5 × 10⁴. Since all four algorithms are scalable,


TABLE II
INFLUENCE OF CONTEXT CLUSTERS' SIZE

Push times ×10⁴ — first five columns m_T = 81, last five columns m_T = 162: 0.1, 1, 5, 8, 10

Average accuracies:
ACP: 69.63% 79.19% 82.35% 82.53% 83.65% | 60.58% 70.63% 80.38% 81.20% 83.45%
HOO: 34.47% 43.57% 51.85% 54.23% 54.49% | 35.39% 42.64% 53.32% 53.65% 53.73%
HCT: 32.56% 43.56% 54.17% 54.74% 57.12% | 33.81% 41.65% 55.98% 55.33% 57.26%
ACR: 70.17% 66.25% 71.28% 73.26% 72.88% | 61.46% 63.38% 68.81% 73.32% 73.16%

Gain:
ACP over HOO: 102.00% 81.75% 58.82% 52.19% 53.51% | 71.18% 65.64% 50.75% 51.35% 55.31%
ACP over HCT: 113.85% 81.80% 52.02% 50.77% 46.45% | 79.18% 69.58% 43.59% 46.76% 45.74%
ACP over ACR: −0.77% 19.53% 15.53% 12.65% 14.78% | −1.43% 11.44% 16.81% 10.75% 14.07%

Fig. 8. Average regret in increasing dataset.

Fig. 9. Normalized gain over the random algorithm, with the dataset doubled at t = 5 × 10⁴.

the influence of adding content is small and is quickly overcome. However, once the algorithms are fully trained (t > 5 × 10⁴), adding contents has only a slight influence on ACP. On the other hand, both the regret and the average regret in this case are larger than those in Figs. 4 and 5. This is because, for a static dataset, our algorithm can learn all of its features from the beginning, whereas for dynamic data the features ACP has learned are not comprehensive, and the way we add the contents may not fully respect the Lipschitz condition.

Next, we test the performance when the dataset grows rapidly. This time we divide the content dataset into two equal parts; the second part is inserted at t = 50,000.

TABLE III
GAIN OF THE m_T = 162 TASK OVER THE m_T = 81 TASK

Push times ×10⁴: 0.1, 1, 5, 8, 10
ACP over ACP: −13.00% −10.81% −2.39% −1.61% −0.24%
HOO over HOO: 2.67% −2.13% 2.84% −1.07% −1.40%
HCT over HCT: 3.83% 4.39% 3.34% 1.08% 2.45%
ACR over ACR: −12.41% −4.33% −3.47% 0.08% 0.38%

Meanwhile, we compare the results with a random algorithm, which selects its output content uniformly at random. We calculate the learning gain of each algorithm over the random algorithm, i.e., (R_random,t − R_π,t)/R_random,t, where R_π,t denotes the total regret of algorithm π at time t; the result is shown in Fig. 9. As we can see, after t = 50,000 none of the four algorithms suffers an obvious performance loss, since all of them are scalable and the original dataset is large enough to learn the data's characteristics. Note that the performance rise after t = 50,000 comes from the decreased performance of the random algorithm. The gains over the random algorithm are 76.57%, 63.28%, 37.14% and 34.57% for ACP, ACR, HCT and HOO, respectively.
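The learning gain used in Fig. 9 is a one-line computation:

```python
def normalized_gain(regret_random, regret_algo):
    """(R_random - R_pi) / R_random: relative regret reduction of an
    algorithm over the random push baseline."""
    return (regret_random - regret_algo) / regret_random

assert normalized_gain(100.0, 25.0) == 0.75   # 75% less regret than random
```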

D. Influence of Context Clusters’ Size

Now we test the influence of the context clusters' size. We run the four algorithms on the whole content dataset but with two different numbers of context clusters for our algorithm: m_T = 81 and m_T = 162 (for ACR, the radius of the context cluster is ρ_t = 0.7/m_T). The resulting average accuracies (rewards) and gains are presented in Tables II and III.

From the data in the tables, we know that, when the size of the context cluster decreases, the contextual algorithms suffer a serious cold start problem. We only double the number of user types, but the average push accuracy decreases by 9.05% at t = 10^3 and by 8.56% at t = 10^4 for ACP, with a similar result for ACR. As the learning process proceeds, however, they finally achieve comparable results at t = 10^5. We can also read from Table II that, under a suitable number of user types (mT = 81), the learning process converges faster. So, the ability to set user types of arbitrary shape with the proper number is significant in social network applications. Note that, although suffering from the cold start problem,


TABLE IV
ACCURACY VARIATION IN DIFFERENT USER SEQUENCES

Case                   Case 1    Case 2    Case 3    Case 4    Case 5
Variation percentage   0.78%     2.07%     −1.27%    −2.10%    −1.87%
Case                   Case 6    Case 7    Case 8    Case 9    Case 10
Variation percentage   5.02%     −2.10%    −3.44%    −1.33%    4.23%

contextual algorithms still outperform context-free ones. The average accuracy of ACP is 35.16% and 37.07% higher than the average accuracies obtained from HOO and HCT, respectively, even at t = 1000. Thus, we can conclude that the size of the context cluster is important to the content push system, especially at the beginning of system operation, and that our contextual approach is quite suitable for this application.

E. Robustness to User Arrivals

We evaluate the performance variation depending on different sequences of user arrivals. Similar to [6], we randomly generate 10 sequences of user arrivals. We then evaluate the average accuracy up to time T = 10^5 for these diverse user sequences and calculate the mean over these 10 cases. Finally, we compare the average accuracy variation of each case against this mean. The result is shown in Table IV. A positive value means the accuracy of a specific user arrival sequence is larger than the average, and a negative value means it is smaller. As shown in Table IV, the variation of the average accuracy generated by our algorithm is less than 5.1%. Thus, our approach is robust to different sequences of user arrivals.
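The variation percentages in Table IV can be computed as each sequence's deviation from the mean accuracy; a small sketch with hypothetical accuracies (in percent, not the paper's data):

```python
# Per-sequence accuracy variation relative to the mean over all
# user-arrival sequences, expressed as a percentage (as in Table IV).
def variation_percentages(accuracies):
    mean = sum(accuracies) / len(accuracies)
    return [(a - mean) / mean * 100.0 for a in accuracies]

# Hypothetical per-sequence accuracies for illustration:
print(variation_percentages([80.0, 82.0, 78.0, 81.0, 79.0]))
# -> [0.0, 2.5, -2.5, 1.25, -1.25]
```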

VI. CONCLUSION

In this paper, we propose a contextual online learning method that supports continuously incoming big data to make accurate pushes in a CCN-based social network. With the help of CCN, we can obtain a precise user profile, which makes it possible to push personalized contents to the user. Our approach helps reduce the redundant data transfers in current networks and maximizes the effectiveness of CCN. We prove that our ACP algorithm achieves sublinear regret and space complexity, and that its time complexity up to time T is O(T log T). Thus, it imposes only a little additional load on service providers but can significantly improve the push performance. We verify our algorithm against three existing large-scale bandit algorithms on an offline dataset. From the experiments, we can conclude that our approach suits the application well. In the future, we would like to operate our push system in a real-world CCN-based social network to further analyse its performance. In addition, we also wish to improve our algorithm's energy efficiency.

REFERENCES

[1] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, “Networking named content,” in Proc. 5th Int. Conf. Emerg. Netw. Exp. Technol., 2009, pp. 1–12.

[2] U. Lee, I. Rimac, and V. Hilt, “Greening the internet with content-centric networking,” in Proc. 1st Int. Conf. Energy-Efficient Comput. Netw., 2010, pp. 179–182.

[3] B. Mathieu, P. Truong, W. You, and J.-F. Peltier, “Information-centric networking: A natural design for social network applications,” IEEE Commun. Mag., vol. 50, no. 7, pp. 44–51, Jul. 2012.

[4] C. Tekin, S. Zhang, and M. van der Schaar, “Distributed online learning in social recommender systems,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 4, pp. 638–652, Aug. 2014.

[5] I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel, “Social media recommendation based on people and tags,” in Proc. 33rd Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 2010, pp. 194–201.

[6] L. Song, C. Tekin, and M. van der Schaar, “Online learning in large-scale contextual recommender system,” IEEE Trans. Serv. Comput., vol. 9, no. 3, pp. 433–445, May/Jun. 2016.

[7] S. Bubeck and N. Cesa-Bianchi, “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1–122, Dec. 2012.

[8] B. Du, Z. Wang, L. Zhang, L. Zhang, and D. Tao, “Robust and discriminative labeling for multi-label active learning based on maximum correntropy criterion,” IEEE Trans. Image Process., vol. 26, no. 4, pp. 1694–1707, Apr. 2017.

[9] L. Zhang, Q. Zhang, L. Zhang, D. Tao, X. Huang, and B. Du, “Ensemble manifold regularized sparse low-rank approximation for multiview feature embedding,” Pattern Recognit., vol. 48, no. 10, pp. 3102–3112, Oct. 2015.

[10] C. B. Browne et al., “A survey of Monte Carlo tree search methods,” IEEE Trans. Comput. Intell. AI Games, vol. 4, no. 1, pp. 1–43, Mar. 2012.

[11] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time analysis of the multiarmed bandit problem,” Mach. Learn., vol. 47, pp. 235–256, 2002.

[12] J. Li, B. Liu, and H. Wu, “Energy-efficient in-network caching for content-centric networking,” IEEE Commun. Lett., vol. 17, no. 4, pp. 797–800, Apr. 2013.

[13] A. Gharaibeh, A. Khreishah, and I. Khalil, “An O(1)-competitive online caching algorithm for content centric networking,” in Proc. 35th Annu. IEEE Int. Conf. Comput. Commun., 2016, pp. 1–9.

[14] Y. Kim and I. Yeom, “Performance analysis of in-network caching for content-centric networking,” Comput. Netw., vol. 57, no. 13, pp. 2465–2482, 2013.

[15] J. Quevedo, D. Corujo, and R. Aguiar, “Consumer driven information freshness approach for content centric networking,” in Proc. IEEE Conf. Comput. Commun. Workshops, Toronto, ON, Canada, 2014, pp. 482–487.

[16] L. Pu, X. Chen, J. Xu, and X. Fu, “Content retrieval at the edge: A social-aware and named data cooperative framework,” IEEE Trans. Emerg. Topics Comput., 2016, to be published, doi: 10.1109/TETC.2016.2581704.

[17] J. Burke, P. Gasti, N. Nathan, and G. Tsudik, “Securing instrumented environments over content-centric networking: The case of lighting control and NDN,” in Proc. IEEE Conf. Comput. Commun. Workshops, 2013, pp. 394–398.

[18] D. Zhang, C.-H. Hsu, M. Chen, Q. Chen, N. Xiong, and J. Lloret, “Cold-start recommendation using bi-clustering and fusion for large-scale social recommender systems,” IEEE Trans. Emerg. Topics Comput., vol. 2, no. 2, pp. 239–250, Jun. 2014.

[19] D. Kempe, J. Kleinberg, and E. Tardos, “Maximizing the spread of influence through a social network,” in Proc. 9th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2003, pp. 137–146.

[20] M. Gomez-Rodriguez and B. Schölkopf, “Influence maximization in continuous time diffusion networks,” in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 313–320.

[21] W. Chen, W. Lu, and N. Zhang, “Time-critical influence maximization in social networks with time-delayed diffusion process,” in Proc. AAAI Conf. Artif. Intell., 2012, pp. 592–598.

[22] S. Vaswani, L. Lakshmanan, and M. Schmidt, “Influence maximization with bandits,” in Proc. Annu. Conf. Neural Inf. Process. Syst., 2015, pp. 1–6.

[23] D. Bouneffouf, A. Bouzeghoub, and A. L. Gancarski, “A contextual-bandit algorithm for mobile context-aware recommender system,” in Proc. Neural Inf. Process., 2012, pp. 324–331.

[24] L. Li, W. Chu, J. Langford, and R. E. Schapire, “A contextual-bandit approach to personalized news article recommendation,” in Proc. 19th Int. Conf. World Wide Web, 2010, pp. 661–670.

[25] C. Tekin and M. van der Schaar, “Distributed online learning via cooperative contextual bandits,” IEEE Trans. Signal Process., vol. 63, no. 14, pp. 3700–3714, Jul. 2015.

[26] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári, “X-armed bandits,” J. Mach. Learn. Res., vol. 12, pp. 1655–1695, 2011.

[27] M. G. Azar, A. Lazaric, and E. Brunskill, “Online stochastic optimization under correlated bandit feedback,” in Proc. Int. Conf. Mach. Learn., Beijing, China, 2014, pp. 1557–1565.


[28] C. Tekin and M. van der Schaar, “Distributed online big data classification using context information,” in Proc. Int. Conf. Commun., Control, Comput., Oct. 2013, pp. 1435–1442.

[29] Y. Yue and C. Guestrin, “Linear submodular bandits and their application to diversified retrieval,” in Proc. 24th Int. Conf. Neural Inf. Process. Syst., 2011, pp. 2483–2491.

[30] B. Yu, M. Fang, and D. Tao, “Linear submodular bandits with a knapsack constraint,” in Proc. 13th AAAI Conf. Artif. Intell., 2016, pp. 1380–1386.

[31] Y. Feng, P. Zhou, D. Wu, and Y. Hu, “Supplementary: Accurate content push for content-centric social networks: A big data-support online learning approach,” 2017. [Online]. Available: https://www.dropbox.com/s/bkk02klgjd1vlvx/supptetc.pdf?dl=0

Yinan Feng (S’16) is currently working toward the undergraduate degree at the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China, working with Prof. P. Zhou. His research interests include multimedia big data and machine learning.

Pan Zhou (S’07–M’14) received the Ph.D. degree from the School of Electrical and Computer Engineering, Georgia Institute of Technology (Georgia Tech), Atlanta, GA, USA, in 2011. He is currently an Associate Professor with the School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China. He was a Senior Technical Member with Oracle, Inc., Boston, MA, USA, during 2011–2013, where he worked on Hadoop and distributed storage systems for big data analytics at the Oracle Cloud Platform. His current research interests include machine learning and big data, communication and information networks, and security and privacy.

Dapeng Wu (S’98–M’04–SM’06–F’13) received the Ph.D. degree in electrical and computer engineering from Carnegie Mellon University, Pittsburgh, PA, USA, in 2003. He is a Professor with the Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA. His research interests include networking, communications, signal processing, computer vision, machine learning, smart grid, and information and network security.

Yuchong Hu received the Ph.D. degree in computer science and technology from the School of Computer Science, University of Science and Technology of China, Hefei, China, in 2010. He is currently an Associate Professor with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China. His research focuses on improving the fault tolerance, repair, and read/write performance of storage systems, which include cloud storage systems, distributed storage systems, and NVRAM-based systems.