Data-locality-aware User Grouping in Cloud Radio Access Networks · 2019-05-23 · Data-locality-aware User Grouping in Cloud Radio Access Networks Weng Chon Ao and Konstantinos Psounis,

1

Data-locality-aware User Grouping in Cloud RadioAccess Networks

Weng Chon Ao and Konstantinos Psounis, Fellow, IEEE

Abstract—Cellular base band units of the future are expectedto reside in a cloud data center which provides computationresources, content storage and caching, and a natural place toperform multi-user precoding, thus addressing both cost andperformance concerns of cellular systems. Multi-user precodingrelies on efficient user grouping schemes to maximize multi-plexing gains. However, traditional user grouping schemes areunaware of data center constraints, and may induce a largenumber of data transfers across racks when fetching requesteddata to a certain rack for precoding. When congestion occurs inthe data center network, the delay of data transfers across racksmay exceed the channel coherence time. This would kill multi-user MIMO transmissions as channel state information becomesoutdated.

In this paper we design novel data-locality-aware user group-ing schemes which preferentially group users whose requesteddata are located under the same rack. We also design usergrouping algorithms which adapt to the congestion level in thecloud data center. Specifically, a regularized spectral efficiencymaximization problem is proposed where the number of datatransfers across racks is introduced as a regularization term.By adjusting the weight of the regularization term according tothe congestion level, we gradually suppress data transfers acrossracks in forming user groups when congestion occurs. We reducethe above problem to a soft-capacitated facility location problemand we devise a 2-approximation user grouping algorithm. Last,we conduct simulations which show that our proposed algorithmperforms close to the optimal in practical scenarios, and studythe tradeoff between higher spectral efficiency and lower datatransfer cost.

Index Terms—cloud radio access network, data center, cellu-lar network, multi-user MIMO, user grouping, approximationalgorithm

I. INTRODUCTION

In the cloud radio access network (C-RAN) architecture,base band units (BBUs) are centralized as a cloud data center(called BBU pool) separated from the remote radio heads(RRHs) deployed at base stations (BSs) to reduce energyconsumption and better utilize computation resources [1], [2],see Fig. 1a. The centralized BBUs can not only providecomputation resources to serve a large number of BSs but alsoenable coordination among BSs to increase spectral efficiencyby using techniques like Coordinated MultiPoint (CoMP) [3].In addition to providing computation resources, content datacan be stored or cached at the BBU pool [4], [5]. For example,

This work was supported by NSF grant ECCS-1444060 and a CiscoResearch Center grant.

The authors are with the Department of Electrical Engineering and Com-puter Science, University of Southern California, Los Angeles, CA, 90089USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier

Fig. 1: (a) C-RAN architecture. (b) BBU pool: a cloud datacenter for efficient computation and content caching.

the content data can be recent movies and TV series hostedand organized by Netflix.

A cloud data center consists of a large number of racks.See, for example, the Telco cloud proposed by Nokia [6], [7].In each rack, there is a top-of-rack switch that connects to alarge number of servers (for both computing and data storage).Moreover, a core switch connects to the top-of-rack switchesto enable data transfers across different racks, see Fig. 1b.To perform computation over data such as precoding, thedata needs to be first fetched to a local drive. It is typicallymore expensive to fetch data across different racks than underthe same rack because data transfers across racks are moresusceptible to congestion and more prone to delay [8], [9].

In the BBU pool, a major computation task is to performprecoding over different data streams requested by differentusers to enable spatial multiplexing via multi-user MIMO [10]or massive MIMO [11]–[13]. Since most of today’s BSs areequipped with at most 16 antennas, e.g., in the current LTEstandard [14] codebooks for precoding are defined for up to16 antenna ports, we focus the discussion in the multi-userMIMO case, enabled via Zero-Forcing BeamForming (ZFBF)precoding, though our analysis can be readily extended to themassive MIMO setting as well where other precoders are morepopular, e.g. conjugate beamforming.

ZFBF requires to select a set of users to be served concur-rently, an operation commonly referred to as user grouping,and it also requires instantaneous channel state information(CSI) to be collected to perform the precoding. Traditionaluser grouping schemes are not aware of the data locality indata centers, that is, of where the data reside. Instead, theirtask is to minimize the overhead from collecting CSI and/orto maximize the multiplexing gain. For example the so-calledrandomized and round-robin schemes select users randomly orin a round-robin fashion, aiming to minimize overhead sinceCSI is collected only for the selected users once the group isformed. In contrast, in CSI-based user grouping [15]–[17] it is

2

required to collect CSI from a large number of users and thenselect a subset of users with semi-orthogonal channel vectorsto form a group, aiming to maximize the multiplexing gain.

Multi-user ZFBF precoding with CSI-based user groupinghas higher chances to achieve the maximum spectral efficiency.That said, it yields a higher data rate only if the turn-aroundtime between CSI feedback and the actual data transmissionis smaller than the channel coherence time.1 Otherwise, theCSI becomes outdated, causing significant performance degra-dation. And, the turn-around time might be large becauseafter collecting CSI and selecting the users, we then haveto fetch the requested data of the users to a local disk ina certain rack to perform precoding, as these data may belocated across different racks in the data center. Depending onthe congestion level in the data center, the data fetching timeranges from hundreds of microseconds to tens of milliseconds[8], [19]. This can be larger than the channel coherence time,since, depending on the user speed and the carrier frequency,the channel coherence time ranges from milliseconds (highmobility users) to tens or hundreds of milliseconds (lowmobility or static users) [14].

Therefore, when congestion occurs in the cloud data centernetwork, a reduction in the number of data transfers acrossracks is required to guarantee that the overall turn-around timeis smaller than the channel coherence time and the wirelesstransmissions are successful. However, all the above traditionaluser grouping schemes are independent of where the requesteddata is located, potentially inducing a large number of datatransfers across racks. With this in mind, we propose usergrouping schemes under the C-RAN architecture which takeinto consideration where the data is located in the data center.Thus, we preferentially group together users whose requesteddata are located under the same rack, and only transfer dataacross racks if the ensuing rate increase is sizable. Thissignificantly reduces the number of data transfers across racks,while still providing high wireless spectral efficiency throughspatial multiplexing.

The wireless spectral efficiency becomes higher as we allowmore data transfers across racks to form user groups withbetter-conditioned channel matrix as long as the CSI doesnot become outdated. On the other hand, when congestionoccurs in the cloud data center, the inter-rack data fetchingtime increases such that the overall turn-around time maybecome a large fraction of (or even larger than) the channelcoherence time and the CSI may become outdated, reducingthe spectral efficiency. To devise a user grouping algorithmthat is able to adapt to the congestion level in the cloud datacenter network, we propose a regularized spectral efficiencymaximization problem where the number of data transfers

1Note that the fronthaul delay is also a part of the turn-around time.However, we focus on scenarios where the fronthaul connections betweenthe BBU pool and the RRHs are fast enough [6], [7] for the fronthaul tonot be a bottleneck. For scenarios where the fronthaul is part of the systemoptimization, the interested reader is referred to [10], [18] and referencestherein. In addition, the delay associated with the computation of the precodingmatrix is also a part of the turn-around time. The number of complexoperations to compute the precoding matrix depends on the matrix size andthe precoding method (e.g., in ZFBF an inversion of the channel matrixis involved), so the latency associated with such computation ranges inmicroseconds. We will see later that it does not become a bottleneck.

across racks is introduced as a regularization term. When thereis no congestion in the cloud data center, one can simply setthe weight of the regularization term to zero, allowing arbitrarynumber of data transfers across racks to maximize the wirelessspectral efficiency (in this case our problem reduces to thetraditional CSI-based user grouping). On the other hand, whencongestion is high, one may impose a large weight to regulatethe inter-rack data transfers and settle for “local” user groups.By adjusting the weight of the regularization term accordingto the congestion level, data transfers across racks can becontrolled to achieve the desired system operating point.

The regularized spectral efficiency maximization problemis NP-hard. Motivated by this, we first introduce a simplifiedform of the problem (referred to as the regularized resourceblock minimization problem), where we use the number ofresource blocks needed to serve a fixed number of users as aproxy for the spectral efficiency. Then, we reduce this problemto a soft-capacitated facility location problem [20], [21], anddevise a 2-approximation algorithm to efficiently solve it.

The remainder of this paper is organized as follows. Wepresent related work in Section II. Section III describes thesystem model and presents motivating examples. The problemformulation and performance analysis are given in SectionIV. We extend our formulation to various practical scenariosof interest in Section V. Section VI presents numerical andsimulation results. Last, Section VII concludes the paper.

II. PRIOR WORK

An introduction of the C-RAN architecture and the advan-tages of BBU centralization can be found in [1], [2] and inreferences therein. Such BBU centralization facilitates basestation coordination like CoMP [3]. The concept of the BBUpool acting as a cloud data center for both computation andcontent caching or storage is introduced in [4], [22], [23].Multiple recent works have evaluated the performance ofthe C-RAN architecture using simulations, real-world data,software-defined radios, and even prototypes, see, for example,[6], [7], [24]–[27].

A major task of the BBU pool is to perform multi-userZFBF precoding over users’ requested data streams to enablemulti-user MIMO [10]. A central piece of multi-user ZFBF isuser grouping. The analysis of the performance of randomizeduser grouping for user scheduling in multi-user MIMO andmassive MIMO wireless networks can be found in [12],[13]. CSI-based user grouping for multi-user beamformingwas investigated in [15]–[17], where the users are groupedaccording to the instantaneous CSI to exploit the multi-userdiversity. User grouping can also be based on the channeldistribution information, the second-order channel statisticsor some combination of the above approaches. For example,in [28], a two stage precoding scheme has been presented,where the users are first grouped according to the second-orderchannel statistics, and then, for the users that have the samesuch statistics, randomized user grouping is used to schedulea subset of users for multi-user ZFBF precoding.

CSI-based beamforming has been considered in a C-RANwith a focus on fronthaul optimization [10], [18]. Specifically,

3

in [10], the authors designed a sparse beamformer to minimizethe fronthaul power consumption. In [18], the authors consid-ered enhanced RRHs that are also embedded with caches andbaseband processing units. By using superposition coding, theprecoding can be decomposed, performed partly at the BBUpool and partly at the enhanced RRH subject to the fronthaulcapacity constraint. As mentioned earlier, in this work weoperate under scenarios where the fronthaul connections arefast enough [6], [7] and the fronthaul delay does not becomea bottleneck in the turn-around time. Instead, we focus on thecase where the data fetching time for precoding in the clouddata center becomes a bottleneck when congestion occurs.

To the best of our knowledge, this is the first work thattakes into consideration the data locality information and thecongestion level in the cloud data center in forming usergroups for multi-user MIMO ZFBF precoding in a C-RAN.

III. SYSTEM MODEL

Consider a BS in the context of the C-RAN architecture.(Section V discusses how to extend the discussion and resultsfor multiple BSs.) Suppose that the BS has η antennas andthere are M users under the coverage of the BS, each equippedwith a single antenna. (See Section V for the case with multi-antenna users.) Denote the users by the setM = {1, 2, . . . , M}.

Suppose that each of the M users makes a single datarequest. We denote the data requested by user i as di, i =1, . . . , M , which is stored in the BBU pool cloud data center.Suppose that there are N racks (denoted by the set N =

{1, 2, . . . , N}) in the data center and assume that the requesteddata of a user is equally likely to be stored in any one ofthe N racks. The event that the data requested by user i isin rack j is represented as di ∈ rj (we use the shorthandnotation rj to refer to rack j). We denote the set of userswhose requested data are in rack j as M j , {i : di ∈ rj}.Note that M j, j = 1, 2 . . . , N form a partition of the userset, where Mi ∩ M j = ∅, i , j and

⋃Nj=1M j = M. Our

formulation can be extended to the case with data replicationacross different racks (Mi ∩M j , ∅, i , j), see Section Vfor a detailed discussion.

A. User grouping

Since the base station has η antennas, it can provide ηdegrees of freedom for spatial multiplexing (η is also calledthe spatial multiplexing gain) and can serve at most η usersin a resource block (time-frequency slot). For simplicity, weassume that the unit size of a user’s requested data is oneresource block. User grouping refers to partitioning the set ofusers M into T groups, i.e., S1,S2, . . . ,ST , where they aredisjoint Si ∩ Sj = ∅, i , j,

⋃Ti=1 Si =M, and the cardinality

of each group is less than or equal to the degrees of freedom|Si | ≤ η. By using multi-user ZFBF precoding, the users inthe same group will be served simultaneously in one resourceblock, so a total number of T resource blocks will be used.

Assuming η divides M , we partition the set of users Minto M

η groups each of size η, to fully utilize the degreesof freedom for data transmission and minimize the number ofresource blocks needed to serve all users. The partition criteria

(i.e, how to partition the M users into Mη groups) for the

randomized/CSI-based user grouping scheme depends on therandomization strategy or the instantaneous CSI of the usersand will be discussed in Section IV-A. The partition criteriafor the data-locality-aware user grouping scheme depends onthe data locality (encoded by the sets M j, j = 1, 2 . . . , N):given a fixed number of M

η resource blocks, we form Mη user

groups while minimizing the number of induced data transfersacross racks, see Section IV-B. Since the resulting wirelessrates depend not only on utilizing the maximum multiplexinggain (and thus the minimum number of resource blocks) butalso on the degree of orthogonality among the users’ channelvectors and thus on the CSI [15], [16], in Section IV-C weextend the above scheme to also take into consideration theCSI when forming those M

η user groups, and term the schemea joint CSI- and data-locality-aware user grouping scheme.In Sections IV-D and IV-E, to further reduce the numberof induced data transfers, we relax the constraint of usingonly M

η resource blocks (i.e., we allow to form T > Mη

user groups). Since transferring data across racks takes time,if the associated delay is larger than the channel coherencetime due to high congestion, the CSI will be invalidated andthe resulting spectral efficiency will be significantly reduced.Motivated by this, we devise two user grouping algorithmsthat can adapt to the congestion level in the cloud data center.First, we propose a regularized resource block minimizationproblem where we minimize a weighted sum of the numberof resource blocks used and the number of induced datatransfers across racks (see Section IV-D). Second, we proposea regularized spectral efficiency maximization problem wherewe consider the data rate directly (see Section IV-E). Byadjusting the weight of the regularization term according to thecongestion level in the cloud data center, we can suppress datatransfers across racks in forming user groups when congestionoccurs. To simplify the analysis we assume η divides M in allanalytical derivations and relax this in Section VI where wepresent simulation results.

B. Multi-user ZFBF precoding and the need to transfer data

Let hi be the η × 1 channel coefficient vector between theBS and user i and let H , [h1 · · · hM ]T be the M × η channelmatrix. Fix a typical user group S. Let H(S) be the |S|×η sub-matrix of H where the rows correspond to the channel vectorsof the users in S. The ZFBF precoding matrix is defined asV , H(S)†, where H(S)† is the pseudoinverse of H(S). Letd(t) denote the |S| × 1 data vector corresponding to the datastreams requested by the users in S and x(t) denote the η × 1symbol vector to be transmitted by the BS. We have x(t) =Vd(t) = H(S)†d(t). Under the ZFBF precoding, the |S| × 1received signal y(t) at the users is

y(t) = H(S)x(t) = H(S)H(S)†d(t), (1)

where we ignore the background noise. Note that when|S| ≤ η and H(S) has full row rank, we have H(S)H(S)† = Iand thus y(t) = d(t). In this case, we can see that by usingZFBF each user gets its own requested data stream withoutinterference from other streams. Unless otherwise stated, we

4

Fig. 2:M1 = {1, 3} andM2 = {2, 4}. Data-locality-aware usergrouping: S1 = {1, 3} and S2 = {2, 4}.

will assume that user groups of cardinality up to η yield a well-conditioned channel matrix and ZFBF can be used to serve allthe users in the same group concurrently.

In a C-RAN, the ZFBF precoding is performed at the BBUpool, where we jointly precode all data streams requested bythe users in the same group S to get a spatial multiplexinggain of order |S|. Before the actual computation for ZFBFprecoding happens, the requested data streams for users in Sneed to be fetched to the same rack, which may induce a largenumber of data transfers across racks depending on whetherdata locality is taken into consideration upon the formation ofthe user groups or not.

C. Simple motivating examples

Suppose that the base station has η = 2 antennas thatcan provide two degrees of freedom for spatial multiplexing,sending two data streams in a single resource block. Supposethat there are M = 4 single antenna users. Last, suppose thatthe data requested by user 1 and user 3 are in the first rackand the data requested by user 2 and user 4 are in the secondrack, i.e., M1 = {1, 3} and M2 = {2, 4}, see Fig. 2.

If we group user 1 and 2 together and user 3 and 4 togetherfor ZFBF precoding, i.e., S1 = {1, 2} and S2 = {3, 4}, thenwe need to first transfer data d2 from r2 to r1 and data d3from r1 to r2, resulting in a total of two data transfers acrossracks. This is because for ZFBF precoding, the coded symbolsto be transmitted by the BS antenna array are weighted linearcombinations of all streams. On the other hand, if we groupuser 1 and 3 together and user 2 and 4 together, i.e., S1 ={1, 3} and S2 = {2, 4}, then obviously there is no need totransfer any data across racks for precoding since, for example,for users 1 and 3 d1 ∈ r1 and d3 ∈ r1.

Suppose now that d1 ∈ r1, d3 ∈ r1, d2 ∈ r2, and d4 ∈ rNas shown in Fig. 3. Consider the following two user groupingschemes. The first one is to have two groups with S1 = {1, 3}and S2 = {2, 4}, in which case two resource blocks and onedata transfer (say, moving d4 from rN to r2) are required.The second one is to have three groups with S1 = {1, 3},S2 = {2}, and S3 = {4}, in which case three resource blocksare required (the degrees of freedom are not fully utilized)and no data transfer is needed. It is easy to see that to fullyutilize the available degrees of freedom provided by the systemmore data transfers across racks may be required. We shallsee in later sections that the number of data transfers acrossracks in forming user groups can be gradually suppressed by

Fig. 3: Regularized spectral efficiency maximization: Whenthe weight for the regularization term is small, we have S1 ={1, 3} and S2 = {2, 4}, resulting in usage of two resourceblocks and one data transfer; when the weight is large (we aimto suppress data transfers), we have S1 = {1, 3}, S2 = {2}, andS3 = {4}, resulting in usage of three resource blocks and nodata transfer.

adjusting the weight of the regularization term according tothe congestion level in the cloud data center.

IV. PROBLEM FORMULATION AND ANALYSIS

We start by deriving expressions for the number of datatransfers induced by traditional user grouping schemes (Sec-tion IV-A) and the proposed data-locality-aware user groupingscheme (Section IV-B) under the assumption that any usergroup of size η yields a well-conditioned channel matrixand thus M

η resource blocks suffice to serve all users. Notethat the derived analytical expressions can easily quantify thebehavior of the system and thus offer intuition. We then presentthe details of how to jointly use the CSI and data localityinformation for user grouping (Section IV-C). To design auser grouping algorithm that can adapt to the congestion levelin the cloud data center, we also consider the use of morethan M

η resource blocks. Thus, we formulate a regularized re-source block minimization problem and introduce an efficientapproximation algorithm (Section IV-D). Finally, we discusshow to exploit CSI to further increase the spectral efficiencyand formulate a regularized spectral efficiency maximizationproblem (Section IV-E).

A. Randomized/CSI-based user grouping

In this subsection, we compute for randomized and CSI-based user grouping the number of induced data transfers, andderive closed-form lower and upper bound expressions underthe regime where the number of racks is a lot larger than thedegrees of freedom (antennas) of the BS, which is triviallysatisfied in practice.

In randomized user grouping, we form the first group byselecting η users uniformly at random from the M users. Then,we form the second group by selecting η users uniformly atrandom from the remaining M −η users, and we continue thisprocess until we have M

η groups. In CSI-based user grouping,user groups are formed by lumping together η users withsemi-orthogonal channel vectors [15]. Since the performancemetric under study is the number of induced data transfers,the analysis is the same for both randomized and CSI-baseduser grouping schemes.

5

Under such traditional user grouping schemes, the setsSi, i = 1, . . . , Mη are formed independently of the setsM j, j = 1, . . . , N since users are grouped independently ofthe location of their requested data. As a result, the requesteddata by the η users in a group will be randomly and uniformlydistributed among the N racks. This is similar to the “balls intobins” problem where we throw η balls into N bins.

To perform ZFBF precoding over the η data streams re-quested by the users in a group we need to first select a rackas the designated rack to perform the precoding. Intuitively,we select the rack that stores the largest number of requesteddata as the designated rack (ties are randomly resolved). Then,we transfer all remaining requested data (located outside thedesignated rack) to the designated rack for precoding. Thus,the number of induced data transfers equals η minus thenumber of requested data in the designated rack. For example,if η = 6, rack 1 stores two, rack 2 stores one, and rack 3 storesthree requested data files, then rack 3 will be selected as thedesignated rack and a total of 3 data transfers will take placetowards rack 3 (two from rack 1 and one from rack 2). Asa result, the average number of induced data transfers underthe randomized/CSI-based user grouping scheme (denoted asDrand/csi) is equal to

Drand/csi =Mη· (2)

E [η − the no. of requested data files in the designated rack] .

Since the number of requested data files in the designatedrack is the same as the load of the maximum loaded bin ofthe “η balls into N bins” problem, we proceed the analysis bydefining Xj to be the number of requested data files (balls) inrack (bin) j. By using the concentration results for the ballsinto bins problem [29], when N

polylog(N ) ≤ η � N log N , we

have Pr

{maxj=1,...,N Xj >

log N

log N log Nη

(1 +

log(log N log N

η

)log N log N

η

)}=

o(1) and Pr{maxj=1,...,N Xj >

log N

log N log Nη

}= 1 − o(1).

As a result, the asymptotic upper and lower bound on theaverage number of induced data transfers are

Drand/csi =Mη

η∑k=1(η − k)Pr

{max

j=1,...,NXj = k

}.

Mη

©«η − log N

log N log Nη

ª®¬ , (3)

and

Drand/csi &Mη

η −log N

log N log Nη

©«1 +log

(log N log N

η

)log N log N

η

ª®®¬ . (4)

We compare both bounds with simulation results in Section VI.From Fig. 7, we can see that both bounds are quite tight.

B. Data-locality-aware user grouping

Consider a data-locality-aware user grouping scheme whosegoal is to minimize the number of induced data transfers by

Fig. 4: The number of induced data transfers in the data-locality-aware user grouping scheme.

properly grouping users into Mη groups.

To find the user groups that minimize data transfers weintroduce the following optimization problem. Let xi, j be abinary variable, where xi, j = 1 if the requested data di isassigned to rack j for processing, that is, it will be jointlyprecoded with another η − 1 requested data assigned to rackj. Let ci, j be the associated cost of transferring data file dito rack j. Since we aim to count the number of induced datatransfers across racks, we set ci, j = 0 if di ∈ rj and ci, j = 1otherwise. (In Section IV-D, we generalize to non-binary coststo take into account different routing paths.) Let yj be thenumber of resource blocks required to serve all the requesteddata assigned to rack j. The data-locality-aware user groupingproblem can be formulated as follows:

minimize∑i∈M

∑j∈N

ci, j xi, j

subject to∑j∈N

xi, j = 1, ∀i ∈ M∑i∈M

xi, j ≤ ηyj, ∀ j ∈ N∑j∈N

yj =Mη

xi, j ∈ {0, 1}, ∀i ∈ M, j ∈ Nyj ∈ {0, 1, 2, · · · }, ∀ j ∈ N . (5)

The objective function above is equal to the total number ofinduced data transfers across racks. The first constraint ensuresthat the requested data di has to be assigned to some rack. Thesecond constraint ensures that we allocate enough resourceblocks to rack j to serve all the requested data files assignedto it. Recall that the number of requested data files that canbe served in one resource block by using ZFBF precoding isless than or equal to the number of BS antennas η. The thirdconstraint ensures that the total number of resource blocks thatwe use is equal to M

η like in the randomized/CSI-based case.A polynomial time algorithm for Problem (5): The aboveproblem can be solved in polynomial time as follows. Recallthat M j denotes the set of users whose requested data arein rack j. First, for each rack j = 1, · · · , N with |M j | ≥ η,we group batches of η users in M j until |M j | mod η usersare left ungrouped. Note that in this step we form a total of∑N

j=1

⌊|M j |η

⌋groups, inducing no data transfers across racks.

The number of ungrouped requested data files in rack j at

6

the end of this step is |M j | mod η. Second, we sort the setof numbers {mj , |M j | mod η, j = 1, . . . , N} in descendingorder (note that 0 ≤ mj ≤ η−1). We denote the sorted numbersas m(1) ≥ m(2) ≥ · · · ≥ m(N ). Third, to ensure that we jointlyprecode and serve η requested data files at each resource block,we find the number t such that

t∑j=1(η − m(j)) =

N∑j=t+1

m(j),

and transfer the requested data files stored in the racks corre-sponding to the set {m(t+1), · · · ,m(N )} to the racks correspond-ing to the set {m(1), · · · ,m(t)}, as shown in Fig. 4.2 Intuitively,the equation above finds the right way to partition racks suchthat the minimum number of data files are transferred fromracks with a few ungrouped requests into racks with a few“empty” slots to reach η ungrouped requests and then groupthem together. As a result, the average number of induced datatransfers under the data-locality-aware user grouping schemeis equal to Dlocality = E

[∑Nj=t+1 m(j)

].

For M � Nη, that is, when the number of resource blocksrequired to serve one request of each user is a lot larger thanthe number of racks, we derive an approximation formula forDlocality. When M � Nη, η > 1, the number of requested datafiles left ungrouped in rack j, i.e., mj , |M j | mod η, will beapproximately uniformly distributed in {0, 1, 2, . . . , η − 1}. Asa result, we have on average N

η racks left with i requested datafiles ungrouped, for i = 0, 1, . . . , η−1. Under this deterministicapproximation, we may transfer the only ungrouped data fileof N

η racks to the Nη racks which have η − 1 ungrouped data

files, and so on and so forth till we form groups each of sizeη as usual. Specifically, when η is odd, the total number ofdata file transfers equals

Dlocality ≈ Nη

[1 + 2 + · · · +

(η − 1

2

)]=

N(η + 1)(η − 1)8η

, (6)

and, when η is even it equals

Dlocality ≈ Nη

[1 + 2 + · · · +

(η2− 1

)+

12η

2

]=

Nη8, (7)

where the term 12η2 comes from the fact that we have N

η rackswith η

2 ungrouped data files, and we transfer from half of theseracks these ungrouped data files to the other half. Note thatthese asymptotic expressions offer intuition, for example, theyimply that under the data-locality-aware scheme the numberof transfers does not depend on the number of users/requests,whereas under the CSI and randomized schemes it growslinearly with the number of users/requests, see Eq. (3) and(4).

The above results can be obtained formally using orderstatistics [30] when mj is uniformly distributed in [0, η − 1]and N → ∞. Specifically, it can be shown that when η isodd, we have Dlocality = E

[∑Nj=t+1 m(j)

]N→∞−→ N (η+1)(η−1)

8η , and,

when η is even, we have Dlocality = E[∑N

j=t+1 m(j)]

N→∞−→ Nη8 ,

2Note that the complexity of the above user grouping algorithm is 2M +N log N since we perform two linear passes in the first and third steps, andwe sort N numbers in the second step.

where t = N (η−1)2η since tη = E

[∑Nj=1 m(j)

]= E

[∑Nj=1 mj

]=

N η−12 . We compare the above approximations with simulation

results in Section VI, see Fig. 7.

C. Joint CSI- and data-locality-aware user grouping

The proposed data-locality-aware user grouping scheme inthe previous subsection can be generalized to a joint CSI-and data-locality-aware user grouping scheme that also takesCSI into account when forming user groups. Let us consideragain the problem of minimizing the number of induced datatransfers by properly grouping M users into M

η groups (Prob-lem (5)) and its corresponding polynomial time algorithm. Weare going to modify the polynomial time algorithm such thatnot only the optimal in Problem (5) is reached (i.e. we have thesame minimum number of induced data transfers) but also alluser groups of cardinality η yield a well-conditioned channelmatrix.

The polynomial time algorithm consists of three steps. Wemodify the first and last step as follows: Recall that in thefirst step of the polynomial time algorithm, we fix a rack jand group batches of η users in M j until |M j | mod η usersare left ungrouped. Unlike before where users are selectedat random, here we select users based on CSI. Note that itis well known that CSI-based user grouping is NP-hard, andno tight approximation algorithms exist. Thus, we resort to agreedy approach which forms what is commonly referred toas semi-orthogonal user groups, based on prior work, see, forexample, [15]–[17]. To form the first group (of size η), webegin with choosing an arbitrary user i11 ∈ M j and insert itinto the first group. Then, we choose another user i21 whosechannel vector is the most orthogonal to that of user i11 andinsert i21 into the first group. After that, we choose another useri31 whose channel vector is the most orthogonal to the signalspace spanned by the channel vectors of users i11 and i21 , and soon and so forth until we have η users in the first group, namely,{i11, . . . , i

η1 }. We repeat the above procedure to form the second

group considering the candidate user setM j \ {i11, . . . , iη1 }. We

continue this process until we form the⌊|M j |η

⌋-th user group

consisting of users{i1b |M j |/ηc, . . . , i

η

b |M j |/ηc

}.

We modify the last (third) step of the polynomial timealgorithm as follows. When we transfer requested data fromthe racks in the tail portion of Fig. 4 to the racks in the headportion, we again use CSI and the above greedy algorithmrepeatedly to decide the formation of batches of η users.Specifically, let us fix the rack with the most remainingungrouped users m(1). We select a total number of η−m(1) usersone by one (from the set of ungrouped users whose requesteddata files are stored in the racks corresponding to the set{m(t+1), · · · ,m(N )}) by using the aforementioned orthogonalityprinciple to form a group of size η. Then, we fix the rack withthe second most remaining ungrouped users m(2), and so onand so forth. We continue this procedure till we fix the rackwith m(t) ungrouped users and group them together with thelast remaining η − m(t) users, forming the last group.

Last, we characterize the complexity of the above usergrouping process in terms of the number of comparisons (inner

7

Fig. 5: Data center topologies [31]: (a) FlatNet. (b) DCell. (c)BCube.

products) between two channel vectors. We can see that toform a group of η users we need at most η2(M − η) suchcomparisons. We here have a total of M

η user groups, so thetotal number of comparisons is upper bounded by Mη(M −η).

D. Regularized resource block minimization

In this subsection, we aim to design a user groupingalgorithm that can adapt to the congestion level in the clouddata center. The idea is to suppress data transfers acrossracks in forming user groups when congestion occurs. Weachieve this goal by relaxing the constraint of using exactlyMη resource blocks to serve M

η user groups of η users each(i.e. allowing forming a larger number of smaller user groups)and by introducing a regularization term.

Specifically, we propose a regularized resource block min-imization problem in which we minimize a weighted sum ofthe number of resource blocks needed to serve all user requests(∑j∈N yj) and the cost associated with the data transfers acrossracks (∑i∈M

∑j∈N ci, j xi, j) that acts as a regularization term

with weight ρ. The regularized resource block minimizationproblem can be written as:

minimize∑j∈N

yj + ρ∑i∈M

∑j∈N

ci, j xi, j

subject to∑j∈N

xi, j = 1, ∀i ∈ M∑i∈M

xi, j ≤ ηyj, ∀ j ∈ N

xi, j ∈ {0, 1}, ∀i ∈ M, j ∈ Nyj ∈ {0, 1, 2, · · · }, ∀ j ∈ N . (8)

Note that here we allow a general cost ci, j for transferring datafile di to rack j (instead of restricting to the binary cost asin Section IV-B) to account for more complicated data centertopologies [31], [32], as shown in Fig. 5. For example, ci, jmay be the cost of the shortest path to transfer data file di torack j. Note that ci, j = 0 if di ∈ rj .

To solve the above problem, we first map it to thewell-known metric soft-capacitated facility location problem(SCFLP). Then, we follow the approach presented in [20] tosolve it with a 2-approximation algorithm.Mapping Problem (8) to a metric SCFLP: Under a generalcost ci, j , Problem (8) is NP-hard since it can be mapped to theSCFLP [20]. In the SCFLP, soft capacity means that we canopen multiple copies of the same facility at the same locationand each copy has a hard capacity for servicing customers.

In the SCFLP, we minimize the sum of the costs for openingfacilities and the costs for servicing customers by decidingwhich facilities to open, how many copies of each facility toopen, and which customers are served by which facilities.

The mapping between Problem (8) and the SCFLP isestablished as follows. Performing ZFBF precoding at rackj for yj resource blocks is equivalent to opening facility jwith yj copies. The degrees of freedom η (the maximumnumber of requested data files that can be jointly precodedto be transmitted in one resource block) is equivalent to thehard capacity of a copy of a facility (one resource block canonly serve up to η requested data files). The user data requestsare equivalent to customers. Processing/precoding data di atrack j with the associated data transfer cost ci, j is equivalent toservicing customer i by facility j with the associated servicingcost ci, j .

In the SCFLP, when the costs live in a metric space and thussatisfy the triangular inequality, we call it a metric SCFLP.Our problem is a metric SCFLP since our costs satisfy thetriangular inequality. Indeed, let r(i) denote the rack that storesdata di and denote by cr(i), j the cost for transferring data divia the shortest path between rack r(i) and rack j, and, moregeneral, denote by ck, j the cost of transferring a data file fromrack k to rack j. Then, we have ci, j = cr(i), j , ck,k = 0, ck, j =cj,k , and ck, j ≤ ck,l + cl, j for any other rack l.A 2-approximation algorithm for Problem (8): The metricSCFLP is NP-hard. The authors in [20] map the metric SCFLPto a linear facility location problem (LFLP) and then to anuncapacitated facility location problem (UFLP) which theysolve using the Jain-Mahdian-Saberi (JMS) algorithm [33],yielding a 2-approximation solution to the original metricSCFLP problem. We follow the same procedure to solveProblem (8), which yields a 2-approximation solution [20].

Algorithm 1 The 2-approximation algorithm for Problem (8)

1: Let ci, j = 1η + ρci, j .

2: At the beginning, all requests are unassigned, all racksare unopened, and the budget of every request i, denotedby Bi , is initialized to 0. At every moment, each request ioffers some money from its budget to each unopened rackj. The amount of this offer is equal to max{Bi − ci, j, 0}if i is unassigned, and max{ci, j′ − ci, j, 0} if it is assignedto some other rack j ′.

3: While there is an unassigned request, increase the budgetof each unassigned request at the same rate, until one ofthe following events occurs:• For some unopened rack j, the total offer that it receives

from requests is equal to 1 − 1η . In this case, we open

rack j, and for every request i (assigned or unassigned)which has a non-zero offer to rack j, we assign requesti to rack j.

• For some unassigned request i, and some rack j that isalready open, the budget of request i is equal to ci, j . Inthis case, we assign request i to rack j.

To evaluate the performance of the 2-approximation algo-rithm, we use CVX with Gurobi to solve Problem (8), which is

8

a mixed integer linear programming problem. Gurobi reportsthe distance of its output to the optimal and often this distanceis zero. Thus, we are able to compare the 2-approximationalgorithm against the optimal for small-scale scenarios, seeSection VI for the results.

Last, we can employ a bisection approach to search for theweight ρ∗ that results in a desired amount of data transfersacross racks (say, τ), i.e.,

∑i∈M

∑j∈N ci, j x∗i, j(ρ∗) ≈ τ.

Algorithm 2 The bisection approach for finding ρ∗

1: Initialize ρU = ρmax, ρL = 0, ε = ε0, ρ = 02: while |∑i∈M

∑j∈N ci, j x∗i, j(ρ) − τ | > ε) do

3: if∑

i∈M∑

j∈N ci, j x∗i, j(ρ) > τ + ε then4: ρL ← ρ5: else6: ρU ← ρ7: end if8: ρ← ρU+ρL

29: end while

10: return ρ∗ ← ρ

Note that ε0 > 0 is a small positive number (the tolerabledeviation from the desired amount of data transfers acrossracks τ), ρmax is chosen to be large enough such that there is nodata transfers across racks, and x∗i, j(ρ) is the optimal solutionof the regularized resource block minimization problem (8)under a given weight ρ.

E. Regularized spectral efficiency maximization

Similar to what we did in Section IV-C, for the regularizedresource block minimization problem, we can further increasethe spectral efficiency by exploiting CSI and the aforemen-tioned greedy algorithm to form semi-orthogonal user groupswhenever there are more than η requests assigned to the samerack at the end of the 2-approximation algorithm. Specifically,we greedily add users to user groups such that the channelvector of each added user is the most orthogonal among theungrouped users to the signal space spanned by the channelvectors of the users already in the group, see Section IV-C.

The procedure described above uses CSI to form bettergroups whenever there are more than η requests assigned tothe same rack at the end of the 2-approximation algorithm.However, it might be the case that upon transferring a requestfrom a rack with η or less requests to another rack, thecorresponding channel matrices might be such that the spectralefficiency of the resulting user groups is improved. And,depending on the weight ρ, the marginal gain from such amove might be larger than the marginal cost of the additionaldata transfer. Motivated by this, in the following we introducea general formulation referred to as the regularized spectralefficiency maximization problem, by using as a metric for theefficiency of the wireless channel not the number of resourceblocks required, but the actual data rates achieved on theseresource blocks.

We optimize over the set of all possible partitions of theuser set M, i.e., we optimize over S1,S2, . . . ,ST , whereSk ∩ Sl = ∅, k , l,

⋃Tk=1 Sk = M, and the cardinality of

each group is less than or equal to the degrees of freedom|Sk | ≤ η. Let RZFBF(Sk) denote the sum rate of the usersin group Sk under ZFBF precoding, where RZFBF(Sk) =∑

l∈Sk log(1+ γlSNR) and γl = 1/[(H(Sk)H(Sk)H )−1]

l,l[15].

The regularized spectral efficiency maximization problem canbe written as

maximize∑T

k=1 RZFBF(Sk)T

− ρT∑k=1

minj∈N

©«∑i∈Sk

ci, jª®¬

subject to T ∈ {1, 2, . . . , M}Sk ∩ Sl = ∅, k , l, k, l = 1, . . . ,TT⋃k=1Sk =M

|Sk | ≤ η, k = 1, . . . ,T . (9)

Note that all the data files requested by the users in groupSk will be transferred to a rack jk for ZFBF precoding. Itis easy to see that jk = arg minj∈N(

∑i∈Sk ci, j) since this rack

designation results in the minimum data transfer cost for usersin group Sk . Like before, ρ is the weight for the regularizationterm.

As already mentioned, the pure spectral efficiency maxi-mization problem (ρ = 0 in Problem (9)) is well knownto be NP-hard and no tight approximation algorithms exist.Thus, we devise a greedy algorithm to solve Problem (9).Let us define the utility of a user group Sk as U(Sk) ,RZFBF(Sk)−ρminj∈N(

∑l∈Sk cl, j), which is the sum rate minus

the induced data transfer cost. In addition, we define themarginal utility of including a new user i into a group Skas U(i |Sk) , U(Sk ∪ {i}) −U(Sk).

Algorithm 3 Greedy algorithm for Problem (9)

1: Initialize M ← {1, . . . , M}, k ← 0;2: while M , ∅ do3: k ← k + 1;4: Sk ← ∅;5: while (|Sk | < η and maxi∈M U(i |Sk) > α) or Sk = ∅

do6: Sk ← Sk ∪ {i∗}, where i∗ = argmaxi∈M U(i |Sk);7: M ←M \ {i∗};8: end while9: end while

In the greedy algorithm, when forming the k-th user groupSk , we include a new user i∗ that has the highest marginalutility into Sk if the number of users in Sk is less than thedegrees of freedom η and that highest marginal utility is largerthan some threshold α. Otherwise, we proceed to form the nextgroup Sk+1. At the end of the algorithm, the sets S1, S2, · · ·form a partition of the user set M.

Note that larger groups will allow for more concurrent trans-missions and thus better spectral efficiency, which Problem (9)takes into account by dividing the sum rate by T , the totalnumber of groups, and optimizing the average group rate ratherthan the total sum rate. Since the greedy algorithm does notknow a priori the optimal number of user groups, the user

9

group utility and marginal utility defined above work with thegroup rate, and, the parameter α is used to parameterize overdifferent group sizes (the smaller the α the larger the groupstend to be and thus the less their number and vice versa).

Last, to achieve a desired amount of data transfers acrossracks, the optimal weight ρ∗ can be obtained by using abisection approach similar to Algorithm 2.

V. EXTENSIONS

A. The case with replicas

In data centers it is common to have multiple replicas ofthe same data file in different racks to reduce the read/writelatency and increase system reliability [34]. Let us define theset of racks that store a requested data file di by user i as Ri .Recall that ck, j denotes the cost of transferring a data file fromrack k to rack j. Since we have |Ri | replicas of data file di ,the associated cost of transferring data file di to rack j wouldbe the minimum cost between transferring the data file fromany rack k ∈ Ri to rack j, that is, ci, j = mink∈Ri ck, j . Notethat ci, j = 0 if one of the replicas of data file di is stored inrack j already.

From the discussion above it is easy to see that the presenceof replicas yields the same formulation as before.

B. The case with multiple BSs

We consider the scenario as shown in Fig. 1a, where thereare multiple BSs/cells (say, G cells), each associated with anumber of users. For independent cells (no inter-cell interfer-ence and no coordination among cells like in CoMP [3]) it iseasy to see that the global problem can be decomposed intoG independent instantiations leading to the same formulationas before.

When cells are dependent, the problem is coupled. In thecase of Problem (8), this is so because two users belongingto two different cells may or may not use the same resourceblock depending on inter-cell interference, and the problemcannot be mapped to the SCFLP. In the case of Problem (9),this is so because the rate of users depends on both inter-cellinterference and on whether CoMP is used or not. That said,the greedy algorithm can still be used.

C. The case with multi-antenna users

When a user has more than one antennas, it can receivemore than one data streams at the same resource block. Letni, i ∈ M be the number of antennas of user i. By replacinguser i with ni users of one antenna each, and serving differentrequests of the original user i with each new user, we can mapthe multi-antenna problem to the same formulation as before.

D. The case with different requested data sizes

We consider the case where the requested data of users canhave different sizes, consuming different numbers of resourceblocks (note that we assume that the unit size of a user’srequested data is one resource block). Specifically, let ai bethe size of the requested data of user i (which is equal tothe number of resource blocks to be consumed by user i) and

0 10 20 30 40 500

5

10

15

20

25

30

35

Number of user data file requests M

Num

ber

of d

ata

tran

sfer

s

Randomized/CSI−based user groupingData−locality−aware user grouping

Fig. 6: Comparison between the data-locality-aware and therandomized/CSI-based user grouping schemes. (N = 10, η =4)

let Ai , {1, . . . , ai} be the partition of the requested datadi . Let xi, j,k be a binary variable, where xi, j,k = 1 if the k-th portion of the requested data di is assigned to rack j forprocessing and xi, j,k = 0 otherwise. The regularized resourceblock minimization problem (8) can be generalized as:

minimize∑j∈N

yj + ρ∑i∈M

∑k∈Ai

∑j∈N

ci, j xi, j,k

subject to∑j∈N

xi, j,k = 1, ∀i ∈ M, k ∈ Ai∑i∈M

∑k∈Ai

xi, j,k ≤ ηyj, ∀ j ∈ N∑k∈Ai

xi, j,k ≤ yj, ∀i ∈ M, j ∈ N

xi, j,k ∈ {0, 1}, ∀i ∈ M, k ∈ Ai, j ∈ Nyj ∈ {0, 1, 2, · · · }, ∀ j ∈ N, (10)

where the third constraint guarantees that any two portions ofthe requested data di are not multiplexed in the same resourceblock (since each user has only a single antenna and can onlyreceive one stream at each resource block). Problem (10) is amixed integer linear program which can be solved by usingGurobi.

VI. SIMULATION AND NUMERICAL RESULTS

A. Randomized/CSI-based vs data-locality-aware user group-ing

Fig. 6 compares the data-locality-aware user groupingscheme with the randomized/CSI-based user grouping schemein terms of the total number of induced data transfers. Weassume that the number of racks N equals 10, the number ofantennas in a BS η equals 4, and the number of user data filerequests M varies from 1 to 50. The data files requested bythe M users are distributed among the N racks independentlyand uniformly at random.

Under all schemes, we plot the number of induced datatransfers needed to fully exploit the spatial multiplexing gain,i.e., we group the M users into

⌈Mη

⌉groups of size η = 4. Fig. 6

10

0 50 100 150 2000

20

40

60

80

100

120

140


Num

ber

of d

ata

tran

sfer

s

Randomized/CSI−based user groupingRand/CSI asymptotic upper boundRand/CSI asymptotic lower boundData−locality−aware user groupingData−locality approximation formula

(a) N = 10, η = 4.

0 50 100 150 2000

50

100

150


Num

ber

of d

ata

tran

sfer

s


(b) N = 30, η = 4.

0 50 100 150 2000

50

100

150


Num

ber

of d

ata

tran

sfer

s


(c) N = 50, η = 4.

Fig. 7: Comparison of the asymptotic bounds / approximation formulas with simulation results.

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

5

10

15

20

25

30

35

Traffic intensity (p)

Num

ber

of d

ata

tran

sfer

s

Randomized/CSI−based user groupingData−locality−aware user grouping

Fig. 8: Comparison between the data-locality-aware and therandomized/CSI-based user grouping schemes under differenttraffic intensity. (M = 50, N = 10, η = 4)

0 20 40 60 80 1002

3

4

5

6

7

8

9

Number of requests per rack

Spe

ctra

l effi

cien

cy (

bits

/s/H

z)

Joint CSI−data−locality−aware user groupingData−locality−aware user grouping

Fig. 9: Joint CSI- and data-locality-aware user groupingscheme. (η = 4)

illustrates that the data-locality-aware user grouping schemecan significantly reduce the number of induced data transfers.This comes as no surprise since the data-locality-aware schemehas been designed with this goal in mind.

In Fig. 7, under the randomized/CSI-based user grouping

scheme, we compare the asymptotic upper bound (Eq. (3)) andlower bound (Eq. (4)) for the number of induced data transferswith simulation results. These bounds hold for N � η and wevary N from 10 till 50 in Fig. 7a-c. We can see that both theupper and lower bounds give a good approximation.

In addition, under the data-locality-aware user groupingscheme, we compare the approximation formula (Eq. (7))for the number of induced data transfers with simulationresults. The approximation holds for M � Nη, and, indeed,it is accurate in this regime. Last, Fig. 7 illustrates that thenumber of induced data transfers for the data-locality-awareuser grouping scheme does not increase with the number ofuser data file requests M , while that for the randomized/CSI-based user grouping scheme increases linearly with M . 3

Last, in Fig. 8, we consider the case that the data requestsof a user arrive according to a discrete-time arrival processwhere the inter-arrival time follows a geometric distributionwith parameter p (traffic intensity). We assume that M = 50,N = 10, η = 4, and the user grouping decisions are madeper time slot. We plot the average number of data transfers aswe vary the traffic intensity from p = 0.1 (the average inter-arrival time of requests of a user is 10 slots) to p = 1 (eachtime slot there is a request from a user). We can see that theperformance gain increases as the traffic intensity increases.

B. Joint CSI- and data-locality-aware user groupingWe compare the performance of the data-locality-aware user

grouping scheme with that of the joint CSI- and data-locality-aware user grouping scheme in terms of the average spectralefficiency measured in bits/s/Hz. Note that as discussed inSection IV-C, both schemes induce the same number of datatransfers. We assume that the number of antennas in a BS isη = 4 and assume i.i.d. Rayleigh fading channel coefficientswith unit power. The spectral efficiency is calculated basedon ZFBF precoding at an SNR value of 3dB [15]. For thejoint CSI- and data-locality-aware user grouping scheme, weuse the greedy algorithm introduced in Section IV-C to formsemi-orthogonal user groups.

3For the randomized/CSI-based user grouping scheme, since the number ofdata transfers to form a group of η users is a fixed constant η − log N

log N log Nη

(independent of M) and we have a total of Mη groups, the total number of

data transfers increases with the number of users M .

11

13 14 15 16 170

1

2

3

4

5

6

7

8

9T

he c

ost f

or th

e in

duce

d da

ta tr

ansf

ers

The number of resource blocks

(a) M = 50.

25 26 27 280

1

2

3

4

5

6

7

The

cos

t for

the

indu

ced

data

tran

sfer

s


(b) M = 100.

38 39 40 410

0.5

1

1.5

2

2.5

3

3.5

4

The

cos

t for

the

indu

ced

data

tran

sfer

s


(c) M = 150.

Fig. 10: Pareto optimal curve for the regularized resource block minimization problem. (N = 10, η = 4)

13 14 150

0.5

1

1.5

2

2.5

3

3.5

4

The

cos

t for

the

indu

ced

data

tran

sfer

s


(a) N = 5.

13 14 15 16 170

1

2

3

4

5

6

7

8

9

The

cos

t for

the

indu

ced

data

tran

sfer

s


(b) N = 10.

13 14 15 16 17 18 190

5

10

15

20

25

30

The

cos

t for

the

indu

ced

data

tran

sfer

s


(c) N = 20.

Fig. 11: Pareto optimal curve for the regularized resource block minimization problem. (M = 50, η = 4)

25 26 270

0.5

1

1.5

2

2.5

3

3.5

4

4.5

5

The

cos

t for

the

indu

ced

data

tran

sfer

s


(a) η = 2.

13 14 15 16 170

1

2

3

4

5

6

7

8

9

The

cos

t for

the

indu

ced

data

tran

sfer

s


(b) η = 4.

9 10 110

0.5

1

1.5

2

2.5

3

3.5

The

cos

t for

the

indu

ced

data

tran

sfer

s


(c) η = 8.

Fig. 12: Pareto optimal curve for the regularized resource block minimization problem. (M = 50, N = 10)

Fig. 9 shows that, as expected, the joint CSI- and data-locality-aware user grouping scheme achieves a higher spectralefficiency due to the better-conditioned channel matrix formedby the users in the same group. Note that the rate gains can besignificant. For example, when the average number of requestsper rack is 50, there is a 1.4x improvement in data rate overthe data-locality-aware scheme. In addition, we can see thatthe gap between the two curves gradually increases as thenumber of requests per rack increases. The reason is that wehave progressively better-conditioned channel matrix due toincreasing multiuser diversity.

C. Regularized resource block minimization and regularizedspectral efficiency maximization

We study the Pareto optimal curves for the regularizedresource block minimization problem (8) in Figs. 10–13. Wevary the number of user data file requests M from 50 to 150,the number of racks N from 5 to 20, and the number ofantennas in a BS η from 2 to 8. The cost ck, j for transferringdata from rack k to rack j is assumed to be distributed inthe range [0, 50]. We plot the number of resource blocksneeded to serve all requests against the cost associated withthe induced data transfers as the weight ρ varies between 0

12

13 14 15 160

1

2

3

4

5

6

7T

he c

ost f

or th

e in

duce

d da

ta tr

ansf

ers


Fig. 13: Regularized resource blockminimization. (M=50, N=10, η= 4,seed=1)

6 6.2 6.4 6.6 6.8 7 7.2 7.40

1

2

3

4

5

6

7

The

cos

t for

the

indu

ced

data

tran

sfer

s

Spectral efficiency (bits/s/Hz)

Fig. 14: Regularized spectral efficiencymaximization. (M = 50, N = 10, η = 4,seed=1)

13 14 15 164.5

5

5.5

6

6.5

7

7.5


Spe

ctra

l effi

cien

cy (

bits

/s/H

z)

CSI−fully awareCSI−partially awareCSI−unaware

Fig. 15: Spectral efficiency for increas-ing use of CSI. (M = 50, N = 10, η =4, seed=1)

0 0.2 0.4 0.6 0.8 112

14

16

18

20

22

24

26

28

30

32

The weight ρ

The

ach

ieve

d co

st

2*Optimal2−approximation algorithmOptimal

Fig. 16: Performance of the 2-approximation algorithm. (M =50, N = 10, η = 4)

and 50. The Pareto-optimal curves are obtained by using CVXwith Gurobi, which is a commercial solver for mixed integerlinear programming problems. In all the cases, we see that asmaller number of resource blocks implies a larger numberof data transfers across racks to form larger user groups forZFBF precoding.

Fig. 14 shows the Pareto optimal curve for the regularizedspectral efficiency maximization problem (Problem (9)). ThePareto-optimal curve is obtained by using our proposed greedyalgorithm in Section IV-E with different values of the weightρ and the threshold α. As expected, the spectral efficiencyincreases as the number of the induced data transfers increases.Note that the weight ρ is the control knob that can be adjustedby the system operator according to the congestion level in thecloud data center. For example, when there is no congestion inthe cloud data center, ρ can be set to zero. That is, we focuson maximizing the spectral efficiency by forming user groupswith well-conditioned channel matrices and do not care aboutthe number of induced data transfers across racks (correspond-ing to operating the system at the rightmost point in Fig. 14).On the other hand, when congestion occurs in the cloud datacenter, we choose a large weight to suppress data transfersacross racks and preferentially group together users whose

requested data are located under the same rack. Operating thesystem at the leftmost point in Fig. 14 corresponds to the caseof using a very large weight for regularization (the data centernetwork is heavily congested), resulting in no data transfersacross racks in forming user groups.

Last, Fig. 15 shows the increase in the spectral efficiencyby exploiting CSI. The blue curve corresponds to Problem (8),the regularized resource block minimization problem. We referto this as the CSI-unaware scheme. The red curve correspondsto Problem (8) with the addition of using CSI to form semi-orthogonal groups within a rack with more than η requestsassigned to it, see the first paragraph of Section IV-E. Werefer to this as the CSI-partially-aware scheme. Last, the greencurve corresponds to Problem (9), the regularized spectralefficiency maximization problem, where CSI is fully exploited.We refer to this as the CSI-fully-aware scheme. To create theblue and red curves we use the Shannon rate formula withZFBF precoding to obtain the corresponding rates for differentnumber of resource blocks used. Since the CSI-partially-awarescheme is using CSI for some group formations, it has betterchannel gains, yielding about 1.2x improvement. To create thegreen curve and compare it with the other two, we use Fig. 13to map resource blocks to the cost of data transfers for thefirst two schemes, and then use Fig. 14 to directly get thecorresponding rates for the CSI-fully-aware scheme for theaforementioned cost of data transfers. Since this scheme usesCSI for all group formations it yields a higher gain, about1.3x. This shows how the increasing use of CSI can lead toincreasing spectral efficiency without causing additional datatransfers at the cloud data center.

D. Accuracy of the 2-approximation algorithmIn Fig. 16, we compare the achieved objective value of the

2-approximation algorithm for the regularized resource blockminimization problem with that we get by using CVX withGurobi, as we vary the weight ρ from 0 to 1. We can see thatthe 2-approximation algorithm behaves as expected, i.e., theachieved objective value lies within two times of the optimal.

VII. CONCLUSION

In this paper, we introduced a novel data-locality-aware usergrouping scheme for multi-user MIMO beamforming under

13

the cloud radio access network architecture. By grouping usersbased on both the CSI and the locality of their requesteddata, we significantly reduced the number of induced datatransfers across racks in the cloud data center, while providingthe same level of spatial multiplexing gain. To design auser grouping algorithm that can adapt to the congestionlevel in the cloud data center, we proposed a regularizedspectral efficiency maximization problem where the numberof data transfers across racks serves as a regularization termwhose weight can be controlled by a system operator. Whenthere is no congestion, a small weight can be used, resultingin the traditional CSI-based user grouping scheme. Whencongestion occurs, a large weight can be used, suppressingdata transfers across racks and preferentially grouping togetherusers whose requested data are located in the same rack. Last,under some simplifications, we casted the above problem asa soft-capacitated facility location problem, developed a 2-approximation user grouping algorithm to solve it, and studiedvia simulations and numerical analysis the tradeoff betweenspectral efficiency and data transfer cost.

REFERENCES

[1] A. Checko, H. L. Christiansen, Y. Yan, L. Scolari, G. Kardaras, M. S.Berger, and L. Dittmann, “Cloud RAN for mobile networks – a tech-nology overview,” IEEE Commun. Surveys Tuts., vol. 17, no. 1, pp.405–426, firstquarter 2015.

[2] M. Peng, Y. Sun, X. Li, Z. Mao, and C. Wang, “Recent advances incloud radio access networks: System architectures, key techniques, andopen issues,” IEEE Commun. Surveys Tuts., vol. 18, no. 3, pp. 2282–2308, thirdquarter 2016.

[3] A. Davydov, G. Morozov, I. Bolotin, and A. Papathanassiou, “Evaluationof joint transmission CoMP in C-RAN based LTE-A HetNets with largecoordination areas,” in Proc. IEEE Globecom Workshops, 2013.

[4] Z. Zhao, M. Peng, Z. Ding, W. Wang, and H. V. Poor, “Cluster contentcaching: An energy-efficient approach in cloud radio access networks,”IEEE J. Sel. Areas Commun., vol. 34, no. 5, pp. 1207–1221, May 2016.

[5] M. Zhu, D. Li, F. Wang, A. Li, K. K. Ramakrishnan, Y. Liu, J. Wu,N. Zhu, and X. Liu, “CCDN: Content-centric data center networks,”IEEE/ACM Trans. Netw., vol. 24, no. 6, pp. 3537–3550, Dec. 2016.

[6] “AirScale cloud RAN,” Nokia, Tech. Rep., 2016.[7] “Multi-layer and cloud-ready radio evolution towards 5G,” Nokia, Tech.

Rep., 2016.[8] C. Guo, L. Yuan, D. Xiang, Y. Dang, R. Huang, D. Maltz, Z. Liu,

V. Wang, B. Pang, H. Chen, Z.-W. Lin, and V. Kurien, “Pingmesh: Alarge-scale system for data center network latency measurement andanalysis,” in Proc. ACM SIGCOMM, 2015.

[9] K. He, E. Rozner, K. Agarwal, Y. J. Gu, W. Felter, J. Carter, andA. Akella, “AC/DC TCP: Virtual congestion control enforcement fordatacenter networks,” in Proc. ACM SIGCOMM, 2016.

[10] Y. Shi, J. Zhang, and K. B. Letaief, “Group sparse beamforming forgreen Cloud-RAN,” IEEE Trans. Wireless Commun., vol. 13, no. 5, pp.2809–2823, May 2014.

[11] T. L. Marzetta, “Noncooperative cellular wireless with unlimited num-bers of base station antennas,” IEEE Trans. Wireless Commun., vol. 9,no. 11, pp. 3590–3600, Nov. 2010.

[12] H. Huh, A. M. Tulino, and G. Caire, “Network MIMO with linear zero-forcing beamforming: Large system analysis, impact of channel esti-mation, and reduced-complexity scheduling,” IEEE Trans. Inf. Theory,vol. 58, no. 5, pp. 2911–2934, May 2012.

[13] H. Huh, G. Caire, H. C. Papadopoulos, and S. A. Ramprashad, “Achiev-ing massive MIMO spectral efficiency with a not-so-large number ofantennas,” IEEE Trans. Wireless Commun., vol. 11, no. 9, pp. 3226–3239, Sep. 2012.

[14] H. Holma, A. Toskala, and J. Reunanen, LTE Small Cell Optimization:3GPP Evolution to Release 13. Wiley, 2016.

[15] T. Yoo and A. Goldsmith, “On the optimality of multiantenna broad-cast scheduling using zero-forcing beamforming,” IEEE J. Sel. AreasCommun., vol. 24, no. 3, pp. 528–541, Mar. 2006.

[16] A. Bayesteh and A. K. Khandani, “On the user selection for MIMObroadcast channels,” IEEE Trans. Inf. Theory, vol. 54, no. 3, pp. 1086–1107, Mar. 2008.

[17] W. L. Shen, K. C. J. Lin, M. S. Chen, and K. Tan, “SIEVE: Scalableuser grouping for large MU-MIMO systems,” in Proc. IEEE INFOCOM,2015.

[18] S. H. Park, O. Simeone, and S. S. Shitz, “Joint optimization of cloud andedge processing for fog radio access networks,” IEEE Trans. WirelessCommun., vol. 15, no. 11, pp. 7621–7632, Nov. 2016.

[19] G. Judd, “Attaining the promise and avoiding the pitfalls of TCP in thedatacenter,” in Proc. USENIX NSDI, 2015.

[20] M. Mahdian, Y. Ye, and J. Zhang, “Approximation algorithms for metricfacility location problems,” SIAM Journal on Computing, vol. 36, no. 2,pp. 411–432, 2006.

[21] H.-C. An, M. Singh, and O. Svensson, “LP-based algorithms forcapacitated facility location,” in FOCS, 2014.

[22] P. Rost, I. Berberana, A. Maeder, H. Paul, V. Suryaprakash, M. Valenti,D. Wbben, A. Dekorsy, and G. Fettweis, “Benefits and challengesof virtualization in 5G radio access networks,” IEEE Commun. Mag.,vol. 53, no. 12, pp. 75–82, Dec. 2015.

[23] D. Pompili, A. Hajisami, and H. Viswanathan, “Dynamic provisioningand allocation in cloud radio access networks (C-RANs),” Ad Hoc Netw.,vol. 30, no. C, pp. 128–143, Jul. 2015.

[24] S. Bhaumik, S. P. Chandrabose, M. K. Jataprolu, G. Kumar, A. Muralid-har, P. Polakos, V. Srinivasan, and T. Woo, “CloudIQ: A framework forprocessing base stations in a data center,” in Proc. ACM Mobicom, 2012.

[25] A. Gudipati, D. Perry, L. E. Li, and S. Katti, “SoftRAN: Softwaredefined radio access network,” in Proc. HotSDN, 2013.

[26] Y. D. Beyene, R. Jntti, and K. Ruttik, “Cloud-RAN architecture forindoor DAS,” IEEE Access, vol. 2, pp. 1205–1212, 2014.

[27] K. Sundaresan, M. Y. Arslan, S. Singh, S. Rangarajan, and S. V.Krishnamurthy, “FluidNet: A flexible cloud-based radio access networkfor small cells,” IEEE/ACM Trans. Netw., vol. 24, no. 2, pp. 915–928,Apr. 2016.

[28] A. Adhikary, J. Nam, J. Y. Ahn, and G. Caire, “Joint spatial division andmultiplexing – the large-scale array regime,” IEEE Trans. Inf. Theory,vol. 59, no. 10, pp. 6441–6463, Oct. 2013.

[29] M. Raab and A. Steger, “Balls into bins - a simple and tight analysis,”in Proc. RANDOM, 1998.

[30] B. C. Arnold, N. Balakrishnan, and H. N. Nagaraja, A first course inorder statistics. Siam, 1992.

[31] T. Wang, Z. Su, Y. Xia, and M. Hamdi, “Rethinking the data centernetworking: Architecture, network protocols, and resource sharing,”IEEE Access, vol. 2, pp. 1481–1496, Dec. 2014.

[32] R. Rojas-Cessa, Y. Kaymak, and Z. Dong, “Schemes for fast transmis-sion of flows in data center networks,” IEEE Commun. Surveys Tuts.,vol. 17, no. 3, pp. 1391–1422, thirdquarter 2015.

[33] K. Jain, M. Mahdian, E. Markakis, A. Saberi, and V. V. Vazirani,“Greedy facility location algorithms analyzed using dual fitting withfactor-revealing LP,” J. ACM, vol. 50, no. 6, pp. 795–824, Nov. 2003.

[34] D. Boru, D. Kliazovich, F. Granelli, P. Bouvry, and A. Y. Zomaya,“Energy-efficient data replication in cloud computing datacenters,” Clus-ter Computing, vol. 18, no. 1, pp. 385–402, 2015.

Weng Chon Ao received the BS degree in computerscience and the MS degree in electrical engineeringfrom National Taiwan University, Taiwan. He re-ceived the PhD degree in electrical engineering fromthe University of Southern California, Los Angeles.His research interests include wireless communica-tion systems and data center networking. He receivedthe IEEE PIMRC 2013 Best Paper Award.

14

Konstantinos Psounis (S’97–M’02–SM’08–F’17)received the bachelor’s degree from the Departmentof Electrical and Computer Engineering, NationalTechnical University of Athens, Greece, in 1997,and the M.S. and Ph.D. degrees in electrical en-gineering from Stanford University, Stanford, CA,USA, in 1999 and 2002, respectively. He is currentlya Professor of electrical engineering and computerscience with the University of Southern California.He models and analyzes the performance of a varietyof wired and wireless networks. He also designs

methods, algorithms, and protocols to solve problems related to such systems.He is a senior member of ACM.

Documents

Data-locality-aware User Grouping in Cloud Radio Access Networks · 2019-05-23 · Data-locality-aware User Grouping in Cloud Radio Access Networks Weng Chon Ao and Konstantinos Psounis,