Scaling the LTE Control-Plane for Future Mobile Access

Arijit Banerjee (University of Utah, [email protected]), Rajesh Mahindra (NEC Labs America Inc., [email protected]), Karthik Sundaresan (NEC Labs America Inc., [email protected]), Sneha Kasera (University of Utah, [email protected]), Kobus Van der Merwe (University of Utah, [email protected]), Sampath Rangarajan (NEC Labs America Inc., [email protected])

Abstract
In addition to the growth of data traffic, mobile networks are bracing for a significant rise in control-plane signaling. While a complete re-design of the network to overcome inefficiencies may help alleviate the effects of signaling, our goal is to improve the design of the current platform to better manage the signaling. To meet our goal, we combine two key trends. Firstly, mobile operators are keen to transform their networks with the adoption of Network Function Virtualization (NFV) to ensure economies of scale. Secondly, the growing popularity of cloud computing has led to advances in distributed systems. In bringing these trends together, we solve several challenges specific to the context of telecom networks. We present SCALE, a framework for effectively virtualizing the MME (Mobility Management Entity), a key control-plane element in LTE. SCALE is fully compatible with the 3GPP protocols, ensuring that it can be readily deployed in today's networks. SCALE enables (i) computational scaling with load and number of devices, and (ii) computational multiplexing across data centers, thereby reducing both the latencies for control-plane processing and the VM provisioning costs. Using an LTE prototype implementation and large-scale simulations, we show the efficacy of SCALE.

CCS Concepts
• Networks → Network services; Wireless access points, base stations and infrastructure; Network control algorithms

Keywords
NFV, EPC, MME, scaling, elasticity, Geo-multiplexing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CoNEXT '15, Dec 1-4, 2015, Heidelberg, Germany. Copyright 2015 ACM 978-1-4503-3412-9/15/12 ...$15.00. http://dx.doi.org/10.1145/2716281.2836104

1. INTRODUCTION
Factors such as always-on connectivity, the rise of cloud computing [1] and the growth of IoT devices, projected at 26 billion by 2020 [2], have implications on the amount of control signaling in mobile networks. Nokia [3] estimates that the growth in signaling traffic will be 50% faster than the growth in data traffic. In some US markets, the peak network usage recorded was as high as 45 connection requests per device per hour [4]. In a European network, about 2500 signals per hour were generated by a single application [5], causing network outages. In LTE, this increased signaling traffic will have a significant impact on the performance of the MME, the main control-plane entity, which handles 5 times more signaling than any other entity [6].

As the first step towards evolving their mobile networks, operators are keen to consider NFV [7]. Recent trends strongly suggest that virtualized clouds [8] can provide high reliability at lower cost. With NFV, the virtualized MME entities would execute as VMs over general-purpose hardware. However, to ensure quick adoption of NFV in LTE, it is critical to effectively manage the virtual MME resources to handle the expected signaling growth.

Efficient management of the MME resources encompasses 2 key aspects: performance and lower operating costs. Firstly, overloaded MME VMs can cause significant delays for connectivity and handovers, directly affecting the user experience. Secondly, control signaling does not generate any direct revenue for the mobile operators. To ensure cost effectiveness, it is important to dimension the VM resources according to the current load. With the expected growth of certain IoT services that exhibit signaling-centric behavior, conservative provisioning may lead to a large number of under-utilized VMs.

The transition from dedicated hardware to virtualized software-based platforms will itself prove beneficial. However, MME implementations are designed for specialized hardware and cannot be directly virtualized since they are inelastic (hard to manage compute resources) and rigid (involve static configurations). It is critical to apply experiences and concepts from distributed systems [9] to virtual MMEs to ensure maximum gains. However, the concepts leveraged by distributed systems in the cloud cannot be directly applied to a telecom service, as the latter has unique characteristics that must be considered. Typical telecom services deal with well-defined interfaces and protocols and persistent sessions, resulting in coupled storage and compute management, and highly distributed deployments. A well-defined interface exists between the MMEs and the basestations.



[Figure 1: LTE-EPC Network Architecture.]

A device is persistently managed by the same MME for time periods much longer than the duration of individual application flows. MME deployments are well distributed given the widespread presence of telecom data-centers [10]. On the other hand, cloud services have higher flexibility to leverage custom protocols and to decouple storage and compute management, as most sessions are short-lived, and they have relatively more centralized deployments (for instance, 3 DCs in the US in the case of EC2).

We present the design of SCALE, which systematically overcomes the afore-mentioned challenges in bridging the divide between virtualization and MME implementations. Our contributions are multi-fold: (1) SCALE re-architects the MME functionality into 2 parts: a front-end load balancer that maintains standard interfaces and a back-end elastic MME processing cluster. (2) SCALE instruments consistent hashing [11] for MME implementations to ensure scalability with a large number of devices. The instrumentation includes a replication strategy that intelligently replicates the states of devices and places the replicas across VMs. To devise this strategy, SCALE adopts a stochastic-analysis-driven approach that accounts for both compute and memory resources. (3) SCALE gives operators the flexibility to reduce costs by lowering the VM provisioning, based on two key features: (i) leveraging access patterns of devices, if available, to intelligently reduce memory usage due to replication and (ii) pro-actively replicating selective device states externally across data centers to leverage spatial multiplexing of compute resources. (4) SCALE is implemented on an end-to-end LTE Release-9 compatible testbed using the OpenEPC [12] platform. We perform extensive experiments to illustrate the inefficiencies of current systems and to show the feasibility of SCALE working with existing protocols. In a particular instantiation, SCALE reduces the 99th percentile delay of processing control-plane messages from more than 1s with current implementations to about 250ms. (5) We also perform system simulations to show the efficacy of SCALE at larger scales.

2. BACKGROUND
In this section, we provide a brief background of the LTE architecture and the MME functionality. LTE networks consist of the Radio Access Network (RAN) and the Evolved Packet Core (EPC), as shown in Figure 1. The RAN includes eNodeBs that wirelessly serve the users; the EPC consists of network entities that both manage the devices and route the data traffic. The control plane elements consist of

the MME (Mobility Management Entity), HSS (Home Subscriber Server) and the PCRF (Policy and Charging Rules Function). The S-GW (Serving Gateway) and the P-GW (Packet Data Network Gateway) are routers that provide connectivity to the devices. The HSS and PCRF are the database servers for user subscription information and QoS/billing policies, respectively. The MME is the key control node in the EPC network since it manages both device connectivity and mobility. In addition to being the entry point for control-plane messages from the devices, it manages other control-plane entities using well-defined interfaces specified by the 3GPP standard [13]: (a) the S1AP interface with the eNodeBs carries the control protocols exchanged between the MMEs and the eNodeBs and between the MME and the devices; (b) the S11 interface with the S-GW carries the protocols to create and destroy the data path for each device; and (c) the S6 interface with the HSS is used for protocol exchange to retrieve user information from the HSS.

MME Procedures: When a device is powered on, it registers with the network. Henceforth, the device operates in two modes: Active and Idle. There are several procedures involving signaling between the MME nodes, eNodeBs and the devices. We briefly describe the key procedures: (a) Attach/Re-Attach: When a device powers on or needs to transmit a packet while in Idle mode, it sends either an "attach request" or a "service request" message to the MME over S1. The MME (re)-authenticates the device and (re)-establishes the data plane at the S-GW; the device moves back to Active mode. (b) Tracking Area (TA) updates: A device makes a transition into Idle mode after an inactivity timeout. The MME releases the data plane at the S-GW. While in Idle mode, the device sends periodic TA or location updates to the MME. (c) Paging: If a downlink packet is received for a device in Idle mode, the MME initiates the Paging procedure to all the eNodeBs in the device's TA, and the device responds with a re-attach procedure. (d) Handovers: The MME establishes the data path between the S-GW and the new eNodeB and tears down the data path with the old eNodeB.

MME Provisioning: In performing the afore-mentioned procedures, an MME stores an associated state in memory for each device assigned to it, and executes computational tasks on that state to respond to control requests from the device. Some of the tasks include protocol parsing, authentication, authorization, inter-eNodeB mobility management, roaming, WiFi mobility, paging and TA updates, S-GW load-balancing, generation of Call-Data Records, billing, and lawful intercepts [14]. The state typically consists of timers, cryptography keys, S-GW, P-GW and other data-path parameters, eNodeB radio resource management configurations, CDRs and location. Hence, MMEs have to be provisioned jointly for memory resources (to store the state of all devices) and compute resources (to process requests of Active devices).
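To make the coupling between stored state and compute concrete, a minimal sketch of such a per-device record is shown below; all field names, the mode-transition methods, and the 10-second inactivity timer are illustrative choices of ours, not values from the 3GPP specification:

```python
from dataclasses import dataclass
from enum import Enum
import time

class Mode(Enum):
    ACTIVE = "active"
    IDLE = "idle"

@dataclass
class DeviceState:
    """Illustrative subset of the per-device state an MME keeps in memory."""
    guti: str                    # temporary ID assigned at attach time
    mode: Mode = Mode.IDLE
    security_keys: bytes = b""   # cryptography keys from authentication
    sgw_teid: int = 0            # S-GW data-path (tunnel) parameters
    tracking_area: int = 0       # last reported TA, used for paging
    inactivity_deadline: float = 0.0

    def on_service_request(self) -> None:
        # Re-attach: the device moves back to Active and the data plane
        # at the S-GW is (re)-established; an inactivity timer is armed.
        self.mode = Mode.ACTIVE
        self.inactivity_deadline = time.monotonic() + 10.0  # illustrative timer

    def on_inactivity_timeout(self) -> None:
        # The MME releases the S-GW data plane; the device keeps sending
        # periodic TA updates while Idle.
        self.mode = Mode.IDLE
```

The memory footprint is dominated by one such record per registered device, while compute is consumed only by requests from Active devices, which is why the paper stresses joint provisioning of both resources.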

3. FUTURE NETWORK EVOLUTION
As the demand for mobile data grows at a significant pace, current networks need to evolve continuously to keep up

[Figure 2: Limitations of the current MME platform. (a) Static Assignment (b) Overload Protection (c) Signaling Overhead (d) Scaling-out.]

with the growth. The 5G vision for the radio access network (RAN) is to explore higher frequencies (e.g., mmWave) to increase bandwidth and reduce delays. In the case of the EPC network, however, the focus is to evolve the current components to meet future requirements of flexibility and cost reduction [15]. While the data-plane entities are critical, it is equally important to evolve the control plane of the EPC network. As a first step, operators are keen to adopt Network Function Virtualization (NFV) based MME deployments [7]. To maximize the effectiveness of virtualization in managing the expected growth of control-plane signaling, we list some requirements and opportunities for future virtual MME deployments:

1. Scale of Operation: MMEs face a unique challenge: while the number of active devices that generate signaling may be relatively low, the number of registered devices is typically significantly higher. This trend is going to dominate in the future with the proliferation of IoT devices. Hence, as devices evolve from smartphones to wearables, sensors, vehicles, etc., it is important that the design of the MME is light-weight to handle the growing number of devices.

2. Elasticity: Historically, scaling MMEs has been a hardware process involving upgrading to faster network processors while keeping the same architecture and software. To leverage virtualization in ensuring lower operating costs, it is critical to provision and scale the virtual MME resources up and down proportionally to the current signaling load.

3. Distributed deployments: Although operators have highly distributed DCs (Data Centers), most MME deployments are centralized to reduce provisioning costs. Within the NFV paradigm [10], there is an opportunity for operators to leverage virtualization to distribute the MME computation. Doing this efficiently would reduce provisioning costs, by multiplexing the resources across DCs, and increase availability.

4. Lowering Processing Delays: Higher processing delays for critical requests like handovers have negative impacts on TCP [16]. Devices make frequent transitions to Idle mode to reduce battery usage [17]; delays in transitioning back to Active mode affect web page download times. Thus, vendors are focusing on defining strict delay budgets for the control plane in future systems [18].

3.1 State of the Art: Limitations
To meet future requirements, simply porting the current MME code to a virtualized platform would be highly inefficient. The fundamental problem is that MME design and implementations strictly adhere to 3GPP-based standards built on archaic assumptions: (i) over-provisioned MMEs deployed on specialized hardware, (ii) a limited number of devices and (iii) infrequent capacity expansions. Field studies have shown that even over-provisioned MME systems are subject to persistent overloads [4]. Future application characteristics such as synchronous mass-access [19], where multiple event-triggered devices become active simultaneously, will further aggravate the inefficiencies due to spatio-temporal overloads and load skewness.

Experimental Illustration: The OpenEPC platform [12] is a software-based implementation of 3GPP standard-compliant EPC components that executes over general-purpose servers. We use experiments on an OpenEPC testbed (details in Section 5) to elaborate on the inefficiencies of current implementations. We leverage OpenEPC since it is the only EPC platform that is readily available. Moreover, it is primarily a reference design implemented similarly to current hardware-based MMEs. Although the absolute performance numbers from the OpenEPC experiments may not be valid for other commercial MME platforms, the trends shown in the experiments will generally hold for virtual MME platforms. The experiments primarily focus on the elasticity inefficiencies of current MMEs, broken down into 4 experiments for ease of understanding the individual procedures that contribute to the inefficiencies.

1. Static Assignment: The fundamental problem is that an eNodeB statically assigns devices to a particular MME, and the requests from a device are always forwarded to the same MME. Specifically, a group of MME servers are clustered to form a pool and directly connect to all the eNodeBs in specific geographical areas, as shown in Figure 1. When a device attaches to the network, the corresponding eNodeB selects an appropriate MME from the pool. After successful attachment, the device is assigned a temporary ID, known as the GUTI (Globally Unique Temporary ID), that contains the unique ID of the MME. Subsequent requests from that device are routed to the same MME by the eNodeBs using the GUTI. With static assignment, a sudden surge in the number of active devices can cause significant overloads in an MME. To illustrate the adverse effect of overloads, we measured the processing delays for different types of procedures processed by the same MME in Figure 2(a). As the load is increased by increasing the number of requests per second, the delays increase significantly after a particular threshold for each case. Once the compute capacity is reached, the

[Figure 3: MME pooling across multiple DCs. (a) Propagation Delays (b) Avg Load Conditions.]

requests have to be queued, resulting in high and unpredictable delays. As done in current deployments, overloaded VMs could be avoided by conservatively assigning devices to each VM in order to handle the worst-case scenario where all the devices become active at the same time. Hence, static assignment would lead to provisioning a significantly higher number of VMs.

2. Overload Protection: When overloaded, an MME has the provision to reassign devices to another MME in the same pool. The reassignment procedure, however, involves high overheads. Each device is sent messages asking it to re-initiate a control connection so that the eNodeB will assign it to another MME server. In addition, messages are exchanged between the MMEs to transfer the state of the devices. The additional signaling causes high delays and a further increase in load. In an experiment with 2 MMEs, Figure 2(b) illustrates this by comparing the delays for two cases: (i) the devices of MME1 attach to the network when MME1 is lightly loaded and (ii) the devices of MME1 attach to the network when MME1 is overloaded and reassigns them to MME2. To show the effect of the extra signaling on load, we plot the average load on both MMEs as the percentage of overload on MME1 is increased in Figure 2(c). The ideal scenario represents the case where MME2 is able to absorb the additional load from MME1 without any overhead. Clearly, compared to the ideal case, the additional signaling increases the average load on both MMEs as the amount of overload grows. In virtual MMEs, fine-grained load balancing among co-located VMs must be: (i) transparent to devices to ensure scalability and (ii) proactive instead of reactive to ensure low delays.

3. Scaling-out: Although it is possible to add MME servers to a pool, the procedure is cumbersome and designed for infrequent, planned capacity expansions. For instance, when an MME is added, irrespective of its processing capability, it is initially configured with a lower weight. This ensures it is less aggressively assigned devices by the eNodeBs compared to other MMEs. Moreover, only unregistered devices can be assigned to the newly added MMEs, limiting the ability to re-balance the existing load. In an experiment with an overloaded MME (MME1), MME2 is instantiated at about 10 seconds. The average load in the system is around 50 requests per second, and 10% of the total requests are from unregistered devices. Figure 2(d) plots the connectivity delays perceived by the devices every 5 seconds at both MMEs. Since the system is unable to rebalance the load

[Figure 4: Design Architecture.]

from existing devices, it takes approximately 35 seconds for the delays at both MMEs to equalize. The effect will be worse in large-scale deployments. Dynamic partitioning of device state needs careful thought, since the primary driver for virtualization is the ability to adapt resources proportionally to the load.

4. Geo-multiplexing: It is possible to provision resources across data centers (DCs) in current systems by deploying certain MME servers of the same pool at remote DCs. However, this is highly inflexible and inefficient due to static configurations: (i) The eNodeBs do not consider the propagation delays to the MMEs while assigning devices to an MME server. As shown in Figure 3(a), the propagation delays can adversely affect the overall control-plane delays for different types of messages. (ii) Once a device is assigned to an MME placed in a remote DC, the requests from that device are always forwarded to the remote DC by the eNodeBs, even when the local DC is not overloaded. Figure 3(b) shows that such assignment leads to inflated control-plane delays even under average load conditions. Hence, there is a need to devise ways to opportunistically offload processing to remote DCs when the local DC resources face persistent loads.
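The GUTI-based pinning behind limitation 1 can be mimicked in a toy sketch; the field layout below is deliberately simplified (a real GUTI also packs PLMN and MME-group fields), and all function names are ours:

```python
# Sketch of 3GPP-style static routing: once a GUTI embeds an MME's ID,
# every later request from that device is pinned to the same MME,
# regardless of current load. Field layout is a simplification.

def make_guti(mme_id: int, m_tmsi: int) -> int:
    # Pack the assigning MME's ID into the high bits of the temporary ID.
    return (mme_id << 32) | m_tmsi

def route_request(guti: int, mme_pool: dict) -> str:
    # The eNodeB extracts the MME ID from the GUTI...
    mme_id = guti >> 32
    # ...and always forwards to that MME, even if it is overloaded.
    return mme_pool[mme_id]

pool = {1: "MME#1", 2: "MME#2"}
guti = make_guti(1, 0xABCD)
assert all(route_request(guti, pool) == "MME#1" for _ in range(3))
```

Because the load-bearing decision is frozen into the device's ID at attach time, rebalancing requires rewriting GUTIs (the expensive reassignment of limitation 2), which is exactly what SCALE's front-end indirection avoids.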

4. SCALE: MME VIRTUALIZATION
Towards addressing the above limitations, we design and implement SCALE, a framework for efficiently managing the resources of virtual MMEs. The first step towards the design of an efficient virtual MME is to address the rigidity in current MME platforms. We begin with a description of SCALE's architecture, followed by the detailed system design.

4.1 Architecture
In architecting a virtual MME framework, SCALE has to enable elasticity of MME resources while ensuring standards compatibility. To achieve elasticity, the architecture must ensure that (i) the device assignment decisions taken by the eNodeBs can be over-ridden by the MMEs and (ii) the MME resource management procedures, such as VM creation and deletion, are transparent to the eNodeBs. Additionally, support for standard interfaces is critical for several reasons: (a) incremental deployment of virtual MMEs alongside legacy EPC nodes, (b) eNodeBs are commoditized and expensive to replace or upgrade and (c) the presence of middleboxes [20, 21] for traffic analytics and optimizations.

In order to meet these two requirements, SCALE decouples the standard interfaces and eNodeB-based device assignment from

the MME process implementation. SCALE achieves such decoupling by architecting the MME cluster as two separate logical functions, as shown in Figure 4. The figure shows a single instance of an MME pool realized by SCALE, as would be deployed at a particular DC.

1. MLB (MME Load Balancer): The design of the MLB is essentially similar to that of HTTP load balancers in IT clouds. The MLB acts as an MME proxy: each MLB entity represents a single MME entity by exposing standard interfaces to the external entities. For instance, it establishes the S1AP and S11 interfaces with the eNodeBs and S-GWs, respectively. The MLB essentially negates the effect of the device assignment and request routing decisions taken by the eNodeBs. The eNodeBs simply choose the MLB to route a device request, and the MLB forwards these requests to the appropriate MME processing VM. Hence, the MLB ensures that device (re)-assignment decisions within the MMP cluster can be performed without affecting the eNodeBs or the S-GWs.

2. MMP (MME Processing Entity): The MMP VMs collectively form an MME pool to process requests from all devices belonging to the geographic area managed by that pool. Essentially, each MMP VM of a certain pool can process requests from devices assigned to different MMEs in that pool. This requires device-to-MME mapping information to be stored for each device at the MMP VMs. SCALE adds this information to the existing state information that the MMP VMs already store for each device. Such a design improves the utilization of the cluster, as the devices belonging to a particular DC can be flexibly assigned across the MMP VMs.
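A minimal sketch of the MLB's proxy role follows; the class and method names are ours, and the modulo policy in `pick_mmp` is only a placeholder (SCALE's actual selection, consistent hashing, is described in Section 4.3.1):

```python
class MLB:
    """Illustrative MME load balancer: exposes a single MME endpoint to
    the eNodeBs and forwards each request to an MMP VM chosen internally."""

    def __init__(self, mmp_vms):
        self.mmp_vms = list(mmp_vms)

    def pick_mmp(self, guti: str) -> str:
        # Placeholder policy: deterministic per-device choice. SCALE uses
        # consistent hashing here so the mapping survives MMP churn.
        return self.mmp_vms[sum(guti.encode()) % len(self.mmp_vms)]

    def handle_s1ap(self, guti: str, msg: str) -> str:
        # eNodeBs only ever see the MLB, so MMP VMs can be added, removed,
        # or reassigned devices without any eNodeB reconfiguration.
        return f"{self.pick_mmp(guti)} processed {msg}"
```

The key design choice is that the GUTI now identifies the MLB (the "MME" the eNodeB sees), not the back-end VM, so the static-assignment problem of Section 3.1 no longer pins a device to a compute instance.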

4.2 System Design Principles
While the architecture effectively decouples the MME processing from the eNodeB interfaces, the MMP VMs of a pool need to collectively manage all the registered devices. Figure 5(a) depicts the system components of SCALE. Every epoch (a time granularity of several minutes), SCALE provisions the appropriate number of MMP VMs independently at each DC, based on the expected load for the current epoch. The VM provisioning procedure triggers the State Management routine, which effectively assigns the current devices across the active MMP VMs. Subsequently, the State Allocation routine strives to refine the device allocations to reduce VM usage and to assign selected devices to MMP VMs across DCs to avoid DC overload scenarios. In designing these components, the following considerations are key in the context of MMEs:

1. Coupling between state management and computation: In addition to managing (storing) the state of all the devices, the MMEs perform intensive processing on the state of the active devices. While it may be trivial to assign the device states uniformly across the VMs, doing so dynamically, (a) to provision and account for the compute resources and (b) in the presence of VM addition or removal, is non-trivial. Hence, SCALE breaks the problem into two steps, namely state partitioning and state replication, as shown in Figure 5(a). At a high level, the problem of state partitioning is similar to that faced by distributed key-value stores, such as DynamoDB and Cassandra, that have been deployed successfully at scale [9, 22]. Thus, SCALE applies the same basic technique leveraged by such systems, i.e., consistent hashing, and tailors it to the needs of the MME. Additionally, SCALE relies on state replication to manage the compute resources at fine time-scales and avoid overload of MMP VMs. However, several factors need to be considered when devising the ideal replication strategy: (i) compute provisioning for processing the requests from active devices, and (ii) consistency across the multiple replicas, which is critical for the proper operation of MME protocols. To this end, SCALE leverages an analytical framework that accounts for the above factors to determine the right number and placement of the replicas of device states at appropriate MMP VMs.

[Figure 5: SCALE's key design components. (a) MMP Design (b) Consistent Hashing.]

2. Resource Provisioning Costs: As opposed to IT clouds that have the ability to scale "infinitely", there is expected to be an associated cost with every additional VM in an operator's DC. Typically, operator clouds have a higher presence and, hence, it is harder for operators to over-provision resources at individual DCs. Moreover, multiple network services besides the EPC typically share the resources within a single DC. Also, the total number of registered devices for which state has to be stored can be significantly higher than the number of active devices at most time instances. Consequently, storing multiple replicas of all devices with limited memory resources becomes challenging, requiring SCALE to be prudent about which devices' state needs to be replicated. To address these challenges, SCALE couples VM Provisioning with intelligent state allocation, as shown in Figure 5(a). SCALE leverages access awareness, or knowledge of past device connectivity patterns, to (i) reduce the number of replicas for selective devices; and (ii) accommodate replicas of external (from remote DCs) device states at a local DC; this enables computational multiplexing across DCs at fine time-scales without incurring additional VM provisioning.
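The per-epoch flow described above can be summarized as a control loop. The sketch below is our simplification, not SCALE's provisioning algorithm: the ceiling-based VM sizing rule, the round-robin assignment, and every name are illustrative assumptions:

```python
def run_epoch(expected_load: int, vm_capacity: int, devices: set):
    """Illustrative per-epoch control flow for one DC (Section 4.2).
    expected_load: forecast requests/sec; vm_capacity: requests/sec per VM."""
    # 1. VM provisioning: size the MMP cluster for the expected load.
    n_vms = max(1, -(-expected_load // vm_capacity))   # ceiling division
    vms = [f"mmp-{i}" for i in range(n_vms)]
    # 2. State management: (re)assign device state across the active VMs
    #    (round-robin stand-in; SCALE partitions via consistent hashing).
    assignment = {d: vms[i % n_vms] for i, d in enumerate(sorted(devices))}
    # 3. State allocation: refine replication using access patterns and push
    #    selected replicas to remote DCs (elided in this sketch).
    return vms, assignment
```

Running the loop independently per DC, with the cross-DC replica placement as a later refinement step, mirrors the ordering SCALE describes: provision, then manage state, then allocate.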

4.3 State Management

We now describe each component in detail.

4.3.1 State Partitioning: SCALE leverages consistent hashing [11] to uniformly distribute device states across the active MMP VMs. In consistent hashing, the output range of a hash function is treated as a "fixed circular ring". Figure 5(b) shows a simple instantiation of a consistent hash ring with output range [1-12]. Each MMP VM is represented by a number of tokens (random numbers), and the tokens are hashed to the ring so that each VM gets assigned to multiple points on the ring. For example, MMP-A gets assigned to points 1, 4

Figure 6: State Allocation. (a) Effect of Replication: normalized cost vs. arrival rate (requests/second) for replication factors 1-3. (b) Access Awareness: normalized cost vs. arrival rate for random vs. probabilistic replication.

& 9 on the ring. Then, each incoming device is assigned to an MMP VM by first hashing its GUTI to yield its position on the ring. The ring is then traversed in the clockwise direction to determine the first MMP VM that has a position larger than the device's position on the ring; this VM becomes the master MMP for that device. For instance, in Figure 5(b), MMP-A is selected as the master MMP for the device with GUTI1. Partitioning the device states using consistent hashing ensures that (a) MMP VMs scale incrementally in a decentralized way: addition or removal of a VM only affects state re-assignment among neighboring VMs, and (b) MLB VMs do not need to maintain routing tables for device-to-MMP mapping, making them efficient in terms of both lower memory usage and faster lookup speeds. Hence, when a control message or request from a device is received by the MLB for processing, it simply forwards the request to the appropriate MMP using the hash value of the GUTI. In case of a request from an unregistered device, the MLB first assigns it a GUTI before routing its request.
4.3.2 State Replication: To counter overloads, SCALE devises a strategy for the number and placement of replicas.
1. Number of Replicas, R: While increasing the number of replicas leads to better load balancing, it contributes to higher memory requirements and synchronization costs to ensure consistency. To strike a balance between these conflicting objectives, we employ stochastic analysis to model the impact of replication in consistent hashing on load balancing. The analysis helps us understand the impact of the replication factor R on the average delay (C̄) experienced by a device's control request within a DC during a time epoch of duration T. We defer the proof to the Appendix and instead focus on the implications. The closed-form expression for the average delay C̄ as a function of R is given in Equation 10 in Appendix-A1.
Using the analysis, we now plot the average cost or delay in processing the requests as a function of the arrival rate of requests. As seen from Figure 6(a), as the arrival rate increases, the load on the VMs increases, causing higher normalized cost (processing delays) for requests when no replication is used. However, replicating the state of a device in just one other VM drastically reduces the delays perceived by the devices. Increasing the replication count (R) beyond 2 thus does not add significant value. This is promising, as it indicates that most of the load balancing benefits can be realized with just 2 copies of the state. In SCALE, replication is performed by the master MMP asynchronously, as and when a request for a device is received. After processing the request for a device, the master MMP may choose to replicate the state of that device to its neighboring MMP on the hash ring.
2. Placement of Replicas: Although SCALE replicates the state of each device only once, the notion of "tokens" in consistent hashing aids load balancing. Each node has multiple neighbor nodes on the ring, since each node is represented by multiple tokens on the ring. Hence, the states of devices assigned to a particular VM end up getting replicated uniformly across multiple other VMs, thereby avoiding hot-spots during replication. For instance, in Figure 5(b), while the master MMP for both devices GUTI1 & GUTI2 is MMP-A, their states are replicated at MMPs B & C respectively. Although using a higher number of tokens increases the number of VMs involved in exchanging states during VM addition or removal, most of the benefit is achieved even with a relatively low number of tokens.
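The partitioning and replica-placement scheme above can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's C implementation; the MD5 hash and a token count of 5 mirror choices mentioned in the paper, but the class and method names are our own.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Sketch of SCALE-style state partitioning (Sec 4.3.1/4.3.2).

    Each MMP VM is mapped to several token positions on the ring; a
    device's GUTI hashes to a point, and the first VM clockwise becomes
    its master MMP. The replica goes to the next *distinct* VM clockwise,
    so a VM's devices spread their replicas over many peers.
    """

    def __init__(self, tokens_per_vm=5):
        self.tokens_per_vm = tokens_per_vm
        self.ring = []  # sorted list of (position, vm_id)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_vm(self, vm_id):
        # One ring position per token; only neighbors are affected.
        for t in range(self.tokens_per_vm):
            bisect.insort(self.ring, (self._hash(f"{vm_id}#{t}"), vm_id))

    def remove_vm(self, vm_id):
        self.ring = [(p, v) for (p, v) in self.ring if v != vm_id]

    def lookup(self, guti):
        """Return (master_vm, replica_vm) for a device GUTI."""
        pos = self._hash(guti)
        i = bisect.bisect(self.ring, (pos, ""))
        n = len(self.ring)
        master = self.ring[i % n][1]
        # Replica: next distinct VM clockwise on the ring.
        for j in range(1, n):
            vm = self.ring[(i + j) % n][1]
            if vm != master:
                return master, vm
        return master, master  # degenerate single-VM pool
```

Because each VM owns multiple tokens, removing a VM re-assigns only the devices that hashed to its tokens, each to a different neighboring VM, which is the incremental-scaling property the text relies on.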

4.4 VM Provisioning

Every epoch, the MMP VMs are provisioned by considering the maximum of the processing and memory requirements. While SCALE estimates the processing requirement for the current epoch based on the load from the past epochs, the memory requirements are dictated by the storage space required for the state of the currently registered devices. Let K(t) be the number of registered devices, and L̄(t) be the average expected signaling load from the existing devices at the DC in an upcoming epoch t. Let N be the number of requests that each MMP VM can process in every epoch, based on its compute power, and let S represent the maximum number of devices whose state can be stored in it, based on its memory capacity. The number of MMP VMs required to meet the processing requirements for an epoch t is computed by dividing the expected signaling load L̄(t) by the compute capacity N of each MMP VM; the number of MMP VMs required to meet the memory requirements is computed by dividing the aggregate storage required for all the replicas of the existing devices K(t) by the memory capacity S of a MMP VM:

$$V_C(t) = \left\lceil \frac{\bar{L}(t)}{N} \right\rceil, \qquad V_S(t) = \left\lceil \beta \cdot \frac{R \cdot K(t)}{S} \right\rceil \qquad (1)$$

$$\bar{L}(t) \leftarrow \alpha L(t-1) + (1-\alpha)\,\bar{L}(t-1)$$

where β ∈ (0, 1] is a parameter used to control the VM provisioning, and R is the number of replicas (R = 2) needed for each device. The average load for an upcoming epoch (L̄(t)) is estimated as a moving average of the actual (L(t−1)) and average loads from the prior epoch. Now, the required number of MMP VMs is V(t) = max{V_C(t), V_S(t)}.

The choice of β plays a critical role in VM provisioning. Recall that the total number of registered devices is significantly higher than the number of active ones in an epoch. Further, a large fraction of these devices have a low probability of access in a given epoch. Simply storing R copies of the state of each device would result in the memory component (V_S) dominating the VM provisioning costs, unnecessarily driving it high. While β can be used as a control parameter to restrict the VM provisioning costs, this will result in some devices not being replicated and could lead to increased processing delays for those devices. Hence, it is important to appropriately select both the value of β and the devices that will get replicated, which we address next.
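The per-epoch provisioning rule (Equation 1 together with the moving-average load estimate) can be written as a short routine. This is a minimal sketch; the function name and default parameter values are illustrative assumptions, not the paper's code.

```python
import math

def provision_vms(load_prev, load_avg_prev, K, N, S, R=2, alpha=0.5, beta=1.0):
    """Sketch of SCALE's per-epoch MMP VM provisioning (Eq. 1, Sec 4.4).

    load_prev / load_avg_prev: actual and smoothed load from the prior
    epoch; K: registered devices; N: requests one MMP VM can process per
    epoch; S: device states one MMP VM can store; R: replicas per device.
    alpha and beta are the smoothing and memory-control knobs from the
    paper; the defaults here are arbitrary placeholders.
    """
    L_bar = alpha * load_prev + (1 - alpha) * load_avg_prev  # EWMA load
    V_C = math.ceil(L_bar / N)               # compute-driven VM count
    V_S = math.ceil(beta * (R * K) / S)      # memory-driven VM count
    return max(V_C, V_S), L_bar
```

For example, with N = 200 requests/epoch, S = 5000 states, and 10,000 registered devices, a smoothed load of 900 requests yields V_C = 5 and V_S = 4, so the compute requirement dominates and 5 VMs are provisioned.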

4.5 State Allocation

SCALE keeps track of the average access frequency of a device in an epoch (as a moving average) and includes it with the rest of the state that is already stored for the device. Future networks are expected to comprise a large number of IoT devices; some studies have shown that certain IoT devices exhibit predictable connectivity patterns [19]. For instance, smart meters upload information to the cloud periodically. Such predictable access patterns, when available, contribute to a more accurate profiling of device access frequency. SCALE leverages such access frequency information to intelligently determine whether a device's state will get replicated. This can lead to reduced VM provisioning costs on two fronts: (i) within a DC, without an appreciable impact on load balancing; and (ii) across DCs, by making room in each DC to store state of devices from remote DCs, thereby allowing for multiplexing of resources across DCs.
4.5.1 Access-aware Replication: Let w_i be the access frequency of a device i; highly active devices will have a high value of w_i, while devices that are mostly dormant will have lower values. At every epoch in a DC, SCALE strives to select a subset of devices for whom a single copy (i.e., R = 1) of the state might suffice without affecting the efficacy of load balancing. SCALE selects the number of such devices appropriately to achieve the required reduction in the number of VMs. Such a reduction is valid when the memory capacity of the MMP VMs is the main constraint (V_S > V_C in Equation 1). Specifically, SCALE estimates the number of devices, K̂(x), with low access probability, such that w_i ≤ x (e.g., x = 0.1). The state of such devices is only maintained in their master MMP VM. Part of this reclaimed memory can be used to accommodate new devices (S_n, e.g., 5% of K) that might register with this DC in the epoch, as well as state of devices from remote DCs for the purpose of multiplexing (S_m, explained in the next subsection). Thus, only K̂(x) − S_n − S_m effectively contributes to the reduction in memory, resulting in

$$\beta(x) = 1 - \frac{\hat{K}(x) - S_n - S_m}{R \cdot K} \qquad (2)$$

By increasing the fraction of devices whose state is not replicated (increasing x), we reduce β(x) and hence the VM provisioning cost. Thus, based on the distribution of access probabilities of devices, an appropriate β(x) can be used to determine the VM provisioning (V) using Equation 1. Once the provisioning is done, the actual replication of device states is executed in an access-aware manner as follows: (i) each device's state is stored in its master MMP VM; (ii) the replica of the state is stored in the neighboring MMP VM on the hash ring, based on the remaining memory and proportional to its access probability, as

$$P_i(\mathrm{rep}) = \left(\frac{w_i}{\sum_j w_j}\right)\left(V \cdot S - S_n - S_m - K\right) \qquad (3)$$
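A sketch of how Equations 2 and 3 could be computed from per-device access frequencies. The function names and dictionary-based interface are our own assumptions; Eq. 3's value is computed raw here, and would be clipped to [0, 1] when interpreted as a probability.

```python
def beta_for_threshold(w, x, K, R, S_n, S_m):
    """Access-aware replica thinning (Eq. 2, Sec 4.5.1), as a sketch.

    w: {device_id: access frequency}. Devices with w_i <= x keep only
    their master copy; part of the reclaimed memory is set aside for new
    devices (S_n) and remote-DC state (S_m).
    """
    K_hat = sum(1 for wi in w.values() if wi <= x)  # low-access devices
    return 1.0 - (K_hat - S_n - S_m) / (R * K)

def replication_weight(w, V, S, S_n, S_m, K):
    """Eq. 3 sketch: per-device replication weight, proportional to
    access frequency and scaled by the memory left after master copies."""
    slots = V * S - S_n - S_m - K
    total = sum(w.values())
    # Raw Eq. 3 value; clip to [0, 1] before using as a probability.
    return {i: (wi / total) * slots for i, wi in w.items()}
```

With three devices of frequencies {0.1, 0.3, 0.6}, a threshold x = 0.1 marks one device as low-access, giving β(x) = 1 − 1/(2·3) ≈ 0.83, i.e., a modest reduction in the memory-driven VM count.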

To understand the impact of such access-aware replication, we extend our stochastic model to incorporate access-awareness. We defer the proof of the analysis to the Appendix and focus on the implications. The closed-form expression for the average delay for a device i in a memory-constrained environment, C̄_i, gets modified to Equation 13 in Appendix-A2. Figure 6(b) shows the normalized cost (processing delay) comparison of SCALE with a system that randomly picks the devices whose states get a replica (i.e., access unaware). Clearly, leveraging access patterns helps SCALE reduce the impact of insufficient memory. For instance, in the plot, to support a given load of, say, 0.85, SCALE reduces control-plane latency by 5x as compared to the baseline system that performs access-unaware replication. Alternately, access-aware replication allows SCALE to provide comparable performance at a reduced VM provisioning cost. For instance, in a particular case with 25% of the devices having a low frequency of access, SCALE reduces VMs by 10% (as shown in Section 5.1).
4.5.2 Geo-Multiplexing: By provisioning resources and maintaining separate hash rings for MMP VMs at each individual DC, SCALE ensures that the master MMP VM for every device is located in the local DC. This allows for minimizing delays by processing as many requests as possible at the local DC. However, to load balance the processing across DCs during periods of local DC overloads, SCALE needs to (i) make room (S^i_m) in each DC i for state of devices from other DCs (j ≠ i), and (ii) decide which devices in a DC will have their state replicated remotely, and in which remote DC. While the former is handled by the DC, the latter is handled by the MMP VMs independently, for scalability. The sequence of steps is as follows.
1. DC-level operation: Each DC i: (i) independently chooses S^i_m (state budget) to capture potential under-utilization of processing in an epoch (e.g., 10% of net processing V · N); S^i_m is the maximum amount of state DC i will accept from devices belonging to external DCs; (ii) maintains and updates a variable Ŝ^i_m that represents the amount of the total S^i_m that is unused; (iii) periodically broadcasts the value of Ŝ^i_m to the neighboring DCs; (iv) periodically updates the values of S^i_m and Ŝ^i_m to track the average processing load, and hence the room for processing external state; and (v) if at any stage Ŝ^i_m ≥ S^i_m or Ŝ^i_m = S^i_m = 0 (overload), it requests the other DCs to appropriately reduce their share of device states stored in DC i to reflect the reduction in S^i_m.
2. MMP-level operation: (1) Choice of devices: With each DC i making room of S^i_m for external state, it has an equivalent room for S^i_m of its own devices to have their state replicated remotely (to ensure conservation of external state resources across DCs). However, deciding which devices' state needs to be stored remotely is not straightforward. Note that each DC would like to process most of its high-probability devices locally, so as to keep the processing delays low. At the same time, storing low-probability device states remotely will not help multiplex resources from remote DCs, since the probability of those devices appearing is low to begin with. To balance between processing delays and resource multiplexing, each MMP VM v_k (at a DC) in SCALE selects its share of S^i_m/V devices of high access probability (w_i ≥ 0.5) in an epoch, to be replicated once in the external space (S^j_m, j ≠ i) reserved by one of the remote DCs. However, this replication is in addition to the two copies that are stored locally for the high-probability devices, so as to not affect their processing delays appreciably. Further, SCALE replicates the state of a device with w_i ≥ 0.5 externally, proportional to its access probability, as $\left(\frac{w_i}{\sum_{j \in v_k : w_j \geq 0.5} w_j}\right)\left(\frac{S^i_m}{V}\right)$.

(2) Choice of remote DCs: Once a device's state is chosen by a MMP VM for external replication, it determines the appropriate destination (remote DC) for the state based on two factors: the remote DC's current occupancy by external state (Ŝ^j_m) and the inter-DC propagation delay. Specifically, the MMP VM executes the following steps: (i) it checks if at least one DC j has available budget for external state, i.e., Ŝ^j_m ≥ 0; (ii) if multiple remote DCs have a non-zero budget, it probabilistically selects the appropriate one proportional to the following metric p:

$$p = \frac{1/D_{ik}}{\sum_{j=1}^{C} 1/D_{ij}}$$

where D_{ij} is the propagation delay between DCs i and j, and C is the total number of remote DCs with Ŝ^j_m ≥ 0. SCALE performs such probabilistic replication to ensure that hot spots are avoided in cases where certain DCs have low delays to multiple adjacent DCs. (iii) If requested by a DC j to reduce its replications by y%, the MMP VM deletes its share of the external state replication (y/V %) at DC j, starting with those states having a relatively low access probability.
(3) Execution: Similar to local replication, geo-replication is performed asynchronously by the master MMP. After processing the request of a device, the master MMP may choose to replicate its state to a remote DC. The replication is done through a MLB VM of the remote DC, which selects the MMP VM based on the hash ring of that DC. Additionally, the master MMP attaches the location of the external state of a device (i.e., the remote DC id) to its current state. This ensures that requests for the device can be routed to the appropriate DC by the local MMP VMs.

4.6 Fine-grained Load-balancing

The state management and allocation appropriately maintain the replicas of the state of the devices to ensure the ability of the MLB VMs to perform efficient load balancing. We elaborate on the key design considerations for load balancing:
(1) Low overhead: The online load balancing is designed with minimal overhead on the MLB, to ensure faster look-up speeds when routing requests to the MMPs. Specifically, the MLB is unaware of the (a) number and (b) placement of the replicas of the state of a device, to avoid storage and exchange of per-device information at the MLB VMs. Hence, the only meta-data needed by the MLB VMs is (i) the updated consistent hash ring as MMP VMs are added or removed, and (ii) the current load (moving average of CPU utilization) on each MMP VM.
(2) Granularity: As elaborated in Section 2, (a) the processing requirements for a device are higher while it is in the Active mode, and (b) the processing delays are more critical when the device makes a transition from the Idle to the Active mode. Keeping this in mind, the MLB assigns the least loaded MMP VM among the choices for a device request when it makes a transition to the Active mode. Subsequent requests are sent to the same MMP VM until the device makes a transition back to the Idle mode. This design choice was based on a few key observations: (i) While the device is Active, the state of the device is larger and the MMP VM maintains several timers. Hence, maintaining consistent state across VMs while the device is Active makes the system complicated. By load balancing only when the device enters the Active mode, SCALE performs updates of the replicas once the device goes back to the Idle mode; (ii) The routing implementation requires more meta-information since, once the device is in the Active mode, the subsequent requests do not contain its GUTI.
When a request is received from an existing device, the MLB extracts the GUTI of the device from the request and hashes it to obtain both the master MMP and the replica MMP VMs. The request is forwarded to the least loaded MMP VM. When a MMP VM receives a request, it performs one of the following tasks: (1) it processes the request if it has the state of the device; (2) it forwards the request to the master MMP if it does not have the state of the device (this may happen if the device has been replicated only once); or (3) it forwards the processing request to the MLB of the appropriate remote DC, if its load is above a threshold and the device's state has been replicated externally.
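The MLB's per-request routing decision described above condenses to a few lines. This is a sketch under our own naming assumptions (the paper's MLB is implemented in C inside OpenEPC); the load threshold and signature are illustrative.

```python
def route_request(ring, guti, vm_load, active_vm=None):
    """Sketch of the MLB routing decision (Sec 4.6).

    On an Idle->Active transition, pick the least-loaded of the master
    and replica MMPs from the hash ring; while the device stays Active,
    stick with the VM chosen at the transition (active_vm) so that state
    is mutated on a single VM. vm_load maps vm_id -> smoothed CPU
    utilization; ring is any object with lookup(guti) -> (master, replica).
    """
    if active_vm is not None:
        return active_vm  # sticky routing while the device is Active
    master, replica = ring.lookup(guti)
    return min((master, replica), key=lambda vm: vm_load.get(vm, 0.0))
```

Note that the MLB needs only the ring and per-VM load averages, matching the low-overhead design point (1) above.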

5. PROTOTYPE AND EVALUATION

We implement a prototype of SCALE on the OpenEPC platform [12], whose source code we have licensed. The OpenEPC testbed in our lab is an LTE Release 9 compatible network, consisting of standard EPC entities and an eNodeB emulator. Each component is written in C, and deployed over Intel-based servers running Linux. Since the original OpenEPC implementation does not support the split architecture for the MME from Figure 4, we implemented the MLB and MMP components by modifying the source code of the MME. Once the architecture was implemented and tested, we implemented all the components of SCALE from Figure 5(a) by modifying the source code of the MLB and MMP components. Although our implementation is mainly focused on the MLB and MMP components, we spent considerable effort on modifying and stabilizing the code for entities like the eNodeB emulator and the S- and P-GWs, to ensure that a reasonable load, in terms of device requests per second, could be supported.
eNodeB: The eNodeB emulator supports the higher-layer protocols of the eNodeB and includes an interface that accepts device connections. We implemented a Python-based load generator that emulates device connections by generating attach/reattach/handover requests with different connectivity patterns.

Figure 7: E1, E2: Feasibility of SCALE. (a) MLB Processing: CPU utilization over time for the MLB and MMP VMs. (b) State Replication: CPU utilization on MMP1 during attach requests and replica updates.

MLB: Most of our implementation effort was spent on the MLB. To implement the MLB, we modified and added code to the original MME function of OpenEPC. The MLB has three subcomponents: (i) Standard Interfaces: The MLB supports the S1AP, S11 and S6 interfaces with the eNodeBs, S-GWs and the HSS respectively, as shown in Figure 1. Hence, the MLB maintains standard-compliant interactions with the other components (eNodeBs, S-GWs etc.) and acts as an MME to them. As devices make transitions between the Active and Idle modes, the MLB is implemented such that it ensures routing of requests to the appropriate MMP VMs; the requests may belong to any of the above-mentioned interfaces. (ii) Load Balancing: We implemented the consistent hashing functionality using the MD5 hash libraries and linked it to the S1AP protocol parsing function in the MLB. When the MLB receives a (re-)attach request from a device, it extracts its GUTI and obtains the corresponding master and replica MMP VMs. The initial request is then forwarded to the least loaded MMP VM. However, as per the 3GPP standard, while in the Active mode, the subsequent requests for the same device from the eNodeB and the S-GW contain other unique identifiers, the S1AP-id and S11-tunnel-id respectively, each assigned by the associated MMP. Hence, in SCALE, each MMP embeds its unique ID in both the S1AP-id & S11-tunnel-id, thus enabling the MLB to route the subsequent requests to the appropriate active MMP. (iii) MMP Interface: The requests from the eNodeBs are forwarded by the MLB to the MMP VMs over SCTP connections, using an interface similar to S1AP. To receive meta-information, such as updates to the consistent hash ring and load information from the MMP VMs, an existing TCP connection used for management is leveraged.
MMP: The following functionalities were implemented in the MMP: (i) State Partitioning: The consistent hashing functionality was added to (re-)partition the state of devices across the current MMP VMs. (ii) State Replication: The master MMP VM is responsible for replicating the state of a device both within a DC and across DCs. We implemented the replication framework such that the master MMP VM replicates the state of a device after it processes its initial attach request. Within a DC, the master MMP and the replica VM use a direct TCP connection for state replication and synchronization to ensure consistency. For inter-DC replication, a similar protocol is used, except that the master MMP communicates with the remote replica through the remote DC's MLB.

5.1 Evaluation

We perform extensive experiments and simulations to show the efficacy of SCALE based on different metrics. We measure (a) the end-to-end delay of the control-plane requests as perceived by the devices; (b) the number of VMs required to process requests to achieve a target control-plane latency requirement; and (c) the average CPU load of the VMs, to show the efficacy in managing VM overloads.
1. Prototype Evaluation: We first describe the results of experiments from our prototype implementation. We were unable to perform experiments with a large number of VMs on public clouds, such as EC2, due to licensing restrictions. Additionally, the original OpenEPC implementation does not support the generation and handling of significantly high loads. Note that we possess the source code and did spend considerable effort to enable the higher loads that aided our evaluation. Nonetheless, our small-scale prototype provided key insights that helped the design, and a real implementation proves the feasibility of its components in practice.

E1: Overhead of MLB: It is not obvious whether the overhead introduced by the MLB in SCALE's architecture outweighs its benefit in providing elasticity and fine-grained load balancing in the MMP cluster. To prove the feasibility of the MLB, we perform an experiment with a single MLB VM and one MMP VM. At each step, we add a MMP VM and also attach corresponding devices to saturate the CPU load on the MMP VMs. We stopped the experiment at 4 MMP VMs, when we observed a reasonable CPU load on the MLB VM. We plot the CPU load of the MLB VM and two of the MMP VMs in Figure 7(a). Clearly, while the CPUs of the MMP VMs are completely utilized, the maximum CPU load on the MLB VM is slightly below 80% utilization. This is a promising result, given that the MLB code was modified from an existing MME code base and not optimized to function as an MME proxy.

E2: Replication Overhead: Another key difference of SCALE from current status-quo MME systems is the use of proactive replication of device states to ensure efficient load balancing. To study the overhead of replication on the CPU cycles spent by the MMP VMs, we set up the prototype with 4 MMP VMs. The eNodeB emulator is configured to generate around 200 active devices whose state is stored at MMP1 and replicated across the other MMP VMs. We force the MLB to forward all the requests to MMP1 and plot the CPU load of MMP1 in Figure 7(b). In order to process the requests, the load on MMP1 reaches almost 90% at around 2-4 seconds into the experiment. After about 10 seconds of inactivity, all the devices make a transition into the Idle mode (at about 15s). At this moment, MMP1 updates the copy of the state of all the devices in the respective replica MMPs. As seen from the figure, at about 15 seconds, the CPU load due to replication at MMP1 is less than 8%. Although this result shows the low overhead of proactive replication, note that the replication framework on the prototype was built over the existing OpenEPC code base. With further optimizations for state synchronization, e.g., differential replication and technologies like remote direct memory access (RDMA) [23], the overhead can be reduced significantly.

Figure 8: E4: Efficacy of SCALE over Current (3GPP standard-based) Systems. (a) Processing delays: CDF of processing delay (msecs). (b) Load on MMP1 and (c) Load on MMP2: CPU utilization over time. (d) Geo-multiplexing: 99th percentile delay under LOW, HIGH and EXTREME load at DC#1, for Local DC, Current Systems and SCALE.

Figure 9: E3: Placement of Replicas. (a) CPU Usage: CPU utilization over time for SIMPLE and SCALE (MMP1, MMP2). (b) Delay: CDF of processing delay (msecs) for SIMPLE and SCALE.

E3: Placement of Replicas: We compare SCALE with a system, namely SIMPLE, that uniformly distributes the state of the devices across the existing VMs and additionally replicates the state of each VM to another VM. To the best of our knowledge, the design of SIMPLE is representative of a few commercially available virtual MME systems. While such an approach keeps the implementation simple, it has several drawbacks compared to SCALE: (i) the MLB has to maintain a per-device routing table, hindering scalability with future device growth; (ii) the load balancing is inefficient in handling VM overloads. To drive home our point, we conduct an experiment with 5 MMP VMs where we generate high signaling traffic on MMP1, about twice its processing capacity. As shown in Figure 9(b), the 99th percentile delays perceived with SIMPLE replication are more than 400ms, while the same for SCALE are below 200ms. The high delay in the former is because the states of the devices assigned to MMP1 are only replicated to MMP2. Hence, when the processing load on MMP1 is sufficiently high, it causes overloads on both the MMPs, as seen in Figure 9(a). However, SCALE replicates the state of disjoint subsets of the devices assigned to MMP1 onto MMP2, MMP3, and MMP4, causing relatively low loads on both MMP1 and MMP2. The above results highlight the importance of SCALE's choice of consistent hashing with tokens for distributed placement of the replicas (Section 4.3).

E4: The Status Quo: We compare the efficacy of SCALE against the standards-based MME systems described earlier in Section 3. We conduct experiments to show the benefits of SCALE for two types of scenarios:
(i) VM Overloads Within a DC: While current systems react to overloads by transferring devices to other MMEs, SCALE performs proactive replication of device state across VMs to enable fine-grained load balancing. To quantify the benefit, we conduct an experiment with a setup consisting of 2 MMP VMs and configure the testbed such that MMP1 receives a number of device requests above its processing capacity. To emulate current systems with our setup, the MMPs are statically assigned devices with no replication of state. When MMP1 gets overloaded, it selects several active devices and sends them messages to reconnect, ensuring they are assigned to the other MMP by the eNodeBs. In addition, MMP1 transfers the state of those devices to MMP2 so that MMP2 can continue the processing. In the case of SCALE, by contrast, the state of the devices assigned to MMP1 is proactively replicated to MMP2. While in a real deployment of SCALE the state of devices stored in MMP1 would get replicated across multiple other MMP VMs, we use only 2 VMs in this experiment for simplicity. In Figure 8(a), we plot the CDF of control-plane delays of the devices assigned to MMP1 for both cases. Clearly, the 99th percentile delay is more than 1 second with current systems, which take a reactive approach, as opposed to less than 250 ms with SCALE. Such high delays affect connectivity times for devices, causing degraded QoE.

We also plot the CPU load on both the MMPs in Figures 8(b) and 8(c) respectively. SCALE ensures that the load on MMP1 does not reach 100%, thus avoiding high queueing delays for the requests; SCALE effectively offloads processing to MMP2 at fine time-scales. While the current system also strives to offload processing to MMP2, the per-device signaling overhead of transferring device state between the MMPs leads to a higher CPU load on both MMPs compared to SCALE. Hence, SCALE's state management is effective in managing the load across MMP VMs to ensure low control-plane delays.
(ii) Persistent DC overloads: To show the ability of SCALE to perform fine-grained load balancing across DCs, we conduct an experiment with 3 DCs. While DCs 2 & 3 are fixed to be lightly loaded, the load of DC1 is varied. We also emulate inter-DC propagation delays using netem [24]. We set up the experiment for 3 different cases: (a) Local DC: resources are not pooled across DCs, hence the devices are always processed at the local DCs. (b) Current Systems: in current deployments, resources could be multiplexed across DCs by deploying a MME pool across DCs, as explained in Section 3. (c) SCALE: it proactively replicates selected state across DCs. In Figure 8(d), we plot the mean and standard deviation of the 99th percentile delays perceived by the devices registered with DC1 under different load conditions of DC1. In current systems, requests from devices that are

Figure 10: S1, S2: State Management and Allocation. (a) Impact of Replication: 99th percentile delay vs. replication factor for basic consistent hashing and SCALE under load scenarios L1-L4. (b) Inter-DC: 99th percentile delay for IND, RDM1, RDM2 and SCALE across DC1-DC4.

    assigned to a remote DC, are always processed at the re-mote DC irrespective of the load conditions at the local DC.Hence, in the case of low loads at DC1, the delays are higherfor current systems due to the propagation delays to the re-mote DCs. However,SCALE processes all requests locallyat the DC1 since the masterMMP for each device is locatedat the local DC, resulting in low delays. Even for higherloads at DC1, SCALE achieves better control-plane delaysthan current systems as shown in Figure 8(d). Unlike cur-rent systems that have to be statically configured,SCALE’sreplication strategy dynamically accounts for both (1) loadconditions at DCs and (2) inter-DC propagation delays. Thisreplication strategy giveSCALE the ability to perform moreefficient and fine-grained load balancing.2. System Simulations: To show the efficacy ofSCALE in

    larger setups with higher number of VMs and devices, webuilt a custom event-driven simulator in Python. The sim-ulator is split into a load generator, that generates requestswith different access patterns and a cluster emulator thatemulates the processing at theMMP VMs. We now highlightthe key components ofSCALE, with focus not just on theirefficacy but also stress on a few key design considerations.
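A minimal sketch of such a load-generator/cluster-emulator split is shown below. The class names, the fixed-service-time queueing model, and the round-robin device-to-VM mapping are illustrative assumptions, not the paper's actual simulator:

```python
import random

class LoadGenerator:
    """Generates device connection requests with Poisson inter-arrival times."""
    def __init__(self, rate, num_devices, seed=0):
        self.rate = rate                      # aggregate arrivals per second
        self.num_devices = num_devices
        self.rng = random.Random(seed)

    def events(self, horizon):
        t = 0.0
        while True:
            t += self.rng.expovariate(self.rate)
            if t > horizon:
                return
            yield t, self.rng.randrange(self.num_devices)

class ClusterEmulator:
    """Emulates MMP VMs as single-server queues with a fixed service time."""
    def __init__(self, num_vms, service_time):
        self.service_time = service_time
        self.busy_until = [0.0] * num_vms     # when each VM next becomes free

    def process(self, t, vm):
        start = max(t, self.busy_until[vm])   # request queues if the VM is busy
        self.busy_until[vm] = start + self.service_time
        return start + self.service_time - t  # completion delay seen by device

gen = LoadGenerator(rate=100.0, num_devices=1000)
cluster = ClusterEmulator(num_vms=4, service_time=0.01)
delays = [cluster.process(t, dev % 4) for t, dev in gen.events(horizon=10.0)]
```

Replacing the `dev % 4` mapping with a consistent-hashing assignment and the fixed service time with per-procedure costs recovers the kind of experiments reported in this section.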

S1: State Management: Recall from Section 4.3 that SCALE leverages consistent hashing for state partitioning across active MMP VMs and maintains 2 replicas of each device's state to ensure efficient load balancing. To verify these design choices, we set up a cluster of 30 MMP VMs and initialize 80K devices that are assigned to the VMs based on consistent hashing. Each VM is represented by 5 tokens on the hash ring. We repeat the experiments for 4 different load scenarios and measure the 99th percentile of connectivity delays of the devices, as shown in Figure 10(a). In each run, certain VMs are selected to have more active devices than their processing capacity, so that the load is skewed across VMs. The scenarios L1-L4 represent increasing skewness, with L1 having the lowest and L4 the highest skew across the VMs. As shown in the figure, irrespective of the degree of load skewness, most of the benefit is obtained by replicating twice; replication beyond that does not decrease the connectivity delays significantly.
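The token-based partitioning can be sketched as follows. This is a minimal illustration assuming an MD5-based ring and the 5-tokens-per-VM, 2-replica configuration described above; the class and function names are ours:

```python
import hashlib
from bisect import bisect_right

def _h(key: str) -> int:
    # Any uniform hash works; MD5 is an illustrative choice.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class TokenRing:
    """Consistent hash ring where each VM is mapped as multiple tokens."""
    def __init__(self, vms, tokens_per_vm=5):
        self.ring = sorted((_h(f"{vm}#{i}"), vm)
                           for vm in vms for i in range(tokens_per_vm))
        self.points = [p for p, _ in self.ring]

    def replicas(self, device_id, r=2):
        """Return the r distinct VMs holding the device's state: the token
        owner, then the next distinct VMs clockwise on the ring."""
        idx = bisect_right(self.points, _h(str(device_id))) % len(self.ring)
        owners = []
        while len(owners) < r:
            vm = self.ring[idx % len(self.ring)][1]
            if vm not in owners:
                owners.append(vm)
            idx += 1
        return owners

ring = TokenRing([f"vm{i}" for i in range(30)], tokens_per_vm=5)
master, backup = ring.replicas("device-42", r=2)
```

Because each VM owns several scattered tokens, its replicas (and thus its offload targets) are spread over many distinct VMs rather than a single ring neighbor.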

To show the effect of mapping each VM as multiple tokens on the ring, we repeat the experiment with basic consistent hashing without tokens. This algorithm simply maps each VM directly onto the ring, causing (i) an uneven distribution of state, and (ii) all the state of a VM being replicated onto its single neighboring VM. Hence, when a VM is overloaded, its load can only be offloaded to that one VM, and further replication is required to balance the load (Figure 10(a)).

[Figure 11: S3: Access Aware Replication. (a) #VMs provisioned vs. β (Sec 4.5). (b) Control-plane delays (mseconds) vs. β (Sec 4.5).]

S2: Geo-Multiplexing: SCALE proactively replicates state across DCs to ensure higher resource utilization while reducing the end-to-end control-plane delays. Referring to Section 4.5, SCALE effectively considers two key parameters: (1) the current resource utilization at each DC, and (2) the propagation delays between the DCs. To illustrate the importance of these factors, we set up a simulation with 4 DCs such that DCs 1, 3 are overloaded while DCs 2, 4 are lightly loaded. We plot the 99th %tile processing delays perceived by the devices belonging to all 4 DCs in Figure 10(b) for 4 different cases. (i) In the first case (IND), the devices are always processed by their respective local DCs, causing high delays for the two overloaded DCs. This represents most deployments today. In the next two cases, RDM1 and RDM2, each DC uniformly replicates a fixed percentage of its total device states across the other 3 DCs. (ii) Current Load: In RDM1, we configure the experiment such that the current load of DC 2 is higher than that of DC 4. As seen in Figure 10(b), although the delays of devices at DCs 1, 3 are reduced compared to those with IND, the additional load from DCs 1, 3 on DC 2 causes much higher delays at DC 2, while the delays at DC 4 are lower. (iii) Propagation Delays: In RDM2, the experiment is configured such that the propagation delays from DCs 1, 3 to DC 2 are higher than the delays from DCs 1, 3 to DC 4. As seen in Figure 10(b), the delays of devices at DCs 1 and 3 are not significantly lower than those with IND; the additional propagation delay to DC 2 causes higher control-plane delays for the devices belonging to DCs 1, 3. (iv) By considering both the current utilization and the propagation delays, SCALE achieves better load balancing across DCs and lower delays for all the devices, as shown in Figure 10(b).
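SCALE's exact formulation lives in Section 4.5 and is not reproduced here; as a sketch, a replication-target choice that penalizes both factors might look like the following, where the linear scoring rule and the alpha weight are our own illustrative assumptions:

```python
def pick_remote_dc(local_dc, dcs, utilization, prop_delay, alpha=0.5):
    """Choose a remote DC for state replication by penalizing both its
    current utilization (0..1) and the propagation delay from the local DC
    (normalized to 0..1). alpha trades off utilization vs. delay."""
    best, best_score = None, float("inf")
    for dc in dcs:
        if dc == local_dc:
            continue
        score = alpha * utilization[dc] + (1 - alpha) * prop_delay[(local_dc, dc)]
        if score < best_score:
            best, best_score = dc, score
    return best

# Mirrors the simulation: DC2 is loaded (RDM1's pitfall), DC3 is far
# (RDM2's pitfall), so a utilization- and delay-aware rule picks DC4.
util = {"DC1": 0.9, "DC2": 0.8, "DC3": 0.9, "DC4": 0.2}
delay = {("DC1", "DC2"): 0.9, ("DC1", "DC3"): 0.3, ("DC1", "DC4"): 0.4}
target = pick_remote_dc("DC1", ["DC1", "DC2", "DC3", "DC4"], util, delay)
```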

S3: Access-Awareness: This experiment shows the flexibility of VM provisioning with SCALE. Specifically, SCALE contains a parameter β to control the number of VMs in scenarios where the state requirement is large due to a high number of registered devices (refer to Equation 1 in Section 4.5). We set x=0.2, i.e., SCALE maintains only a single replica for all devices that have access frequency below 0.2 (wi ≤ x). We repeat the experiment by varying the number of low-probability devices, while keeping the total number of devices constant at 100K. This setup gives us a sense of the VM savings that SCALE can achieve depending on the fraction of devices that have low frequencies of access, a trend that is visible with the advent of M2M and IoT. As shown in Figure 11(a), β=1 represents the case where the VMs are provisioned to store 2 replicas of the state of all devices. As the number of low-probability connections increases, the β value decreases and the number of VMs provisioned by SCALE decreases, as seen in the figure. Figure 11(b) depicts the delays perceived in the different scenarios. Even in the case with β=0.75, where almost 50% of device states are not replicated more than once, SCALE reduces the number of VMs by 25% without significantly affecting the average delays.
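The arithmetic behind these savings can be sketched as follows. We assume, for illustration only, that state capacity is the provisioning bottleneck and pick a hypothetical per-VM capacity; with half the devices below the access threshold, the 2-replica baseline of 20 VMs drops to 15, the 25% reduction reported above:

```python
def replica_count(w, x=0.2, full_replicas=2):
    """Access-aware replication: a device with access frequency w below the
    threshold x keeps a single replica; others keep the full count."""
    return 1 if w <= x else full_replicas

def vms_needed(weights, per_vm_capacity):
    """State-driven VM provisioning: total replicas / per-VM state capacity."""
    total_replicas = sum(replica_count(w) for w in weights)
    return -(-total_replicas // per_vm_capacity)   # ceiling division

# 100K devices, 50% low-access: half the states get only one replica.
weights = [0.1] * 50_000 + [0.9] * 50_000
scale_vms = vms_needed(weights, per_vm_capacity=10_000)
baseline_vms = vms_needed([1.0] * 100_000, per_vm_capacity=10_000)
```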

6. RELATED WORK

Distributed Systems: Distributed key-value store systems [9, 22, 25] face requirements similar to SCALE in their respective domains. Although they effectively manage data at scale, there are a few key distinctions: (i) they do not consider compute provisioning, since their processing requirements are not intensive; (ii) such systems are over-provisioned to meet stringent SLA requirements; since low latencies translate directly into revenue, over-provisioning is sustainable; (iii) their data is accessed simultaneously by multiple users or applications, hence it pays to replicate high-access keys an order of magnitude more than regular keys (up to 10x) [26]. Although SCALE borrows techniques from such systems, the key difference is that SCALE additionally had to solve challenges involving both the MME protocols and compute workloads.

NFV Trends: Recently, NFV has been a major focus of the telecom industry, leading operators to transform their network functions to execute as software over general-purpose, Intel-based hardware. In the mobile context, this has led to the emergence of commercial virtual EPC platforms [27, 28]. However, the primary focus has been on software improvements, such as hypervisor enhancements to improve the performance of the functions running as VMs, while using simple, reactive mechanisms for elasticity. SCALE is complementary to these approaches, since it adopts a system-wide approach to designing MME implementations with better elasticity and scalability.

EPC research: Recent works [29, 30] have proposed new architectures and protocols for an efficient EPC design. Although effective, such approaches are costly to deploy and ignore many aspects of device management prevalent in current networks. Nonetheless, SCALE is a complementary solution that can be leveraged to virtualize such architectures if they are realized in the future. The dMME system [31] proposes a split MME design with distributed processing nodes accessing centralized storage. The dMME design is an alternate design choice to SCALE. However, the dMME system has not been implemented and evaluated with realistic loads. Comparing the two design choices in the scope of virtual MMEs is a good avenue for future work.

7. CONCLUSION

SCALE is a virtualized MME framework designed to meet the future requirements of mobile access. SCALE introduces a decoupled MME architecture to enable elasticity. To meet the goal of low signaling delays, SCALE adapts the consistent hashing scheme with an analytically driven replication strategy to achieve efficient load balancing. To reduce the VM footprint, SCALE performs device-aware replication and leverages geo-multiplexing. With a prototype on an LTE testbed and large-scale simulations, we show the feasibility and efficacy of SCALE.

Acknowledgments: We thank the reviewers and Ajay Mahimkar, our shepherd, for the insightful comments that improved the paper significantly.

8. REFERENCES

[1] Qian et al. Periodic Transfers in Mobile Applications: Origin, Impact, and Optimization. In ACM WWW, 2012.
[2] Forecast: The Internet of Things. http://www.gartner.com/newsroom/id/2636073.
[3] Signaling is Growing 50% Faster than Data Traffic. http://goo.gl/89RtYs.
[4] Managing LTE Core Network Signaling Traffic. https://goo.gl/2nxq0X.
[5] Angry Birds Ruffle Signaling Feathers. http://goo.gl/Xqj6ti.
[6] Charting the Signaling Storms. http://tinyurl.com/k7qqeeb.
[7] AT&T Domain 2.0 Vision White Paper, 2013. http://tinyurl.com/p4uv3s3.
[8] Amazon Elastic Cloud. http://aws.amazon.com/ec2/.
[9] G. DeCandia et al. Amazon DynamoDB: A Seamlessly Scalable Database Service. In ACM SIGMOD, 2012.
[10] Why Distribution is Important in NFV? http://tinyurl.com/kyewvqq.
[11] Karger et al. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web. In ACM STOC, 1997.
[12] OpenEPC. http://www.openepc.com/.
[13] GPRS Enhancements for E-UTRAN Access. http://www.3gpp.org/DynaReport/23401.htm.
[14] MME Functions. http://goo.gl/1BuKpB.
[15] Path to 5G. http://tinyurl.com/kgqe76t.
[16] Nguyen et al. Towards Understanding TCP Performance on LTE/EPC Mobile Networks. In All Things Cellular: Operations, Applications, & Challenges. ACM, 2014.
[17] Huang et al. Close Examination of Performance and Power Characteristics of 4G Networks. In MobiSys. ACM, 2012.
[18] Zeljko Savic. LTE Design and Deployment Strategies. http://tinyurl.com/lj2erpg.
[19] Shafiq et al. A First Look at M2M Traffic: Measurement and Characterization. In ACM SIGMETRICS, 2012.
[20] RAN Congestion Analytics. http://tinyurl.com/nlhqg96.
[21] VidScale Mediwarp RAN. http://www.vidscale.com.
[22] Lakshman et al. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS, 2010.
[23] Remote Direct Memory Access (RDMA) over IP Problem Statement. http://www.ietf.org/rfc/rfc4297.txt/.
[24] Netem. http://www.linuxfoundation.org/collaborate/workgroups/networking/netem.
[25] R. Nishtala et al. Scaling Memcache at Facebook. In NSDI, 2013.
[26] Zhang et al. Load Balancing of Heterogeneous Workloads in Memcached Clusters. In Feedback Computing, 2014.
[27] NEC Virtual EPC. http://tinyurl.com/nuefq28.
[28] The Journey to Packet Core Virtualization. http://resources.alcatel-lucent.com/asset/174234.
[29] Jin et al. SoftCell: Scalable and Flexible Cellular Core Network Architecture. In ACM CoNEXT, 2013.
[30] Moradi et al. SoftMoW: Recursive and Reconfigurable Cellular WAN Architecture. In ACM CoNEXT, 2014.
[31] An et al. DMME: A Distributed LTE Mobility Management Entity. Bell Labs TR, 2012.

APPENDIX

Model: Let the number of replications of a device's state be R, with N being the limit on the number of devices a VM can process, and T being the size of the epoch over which decisions are made. With Poisson processes serving as a representative tool for modeling web requests, call arrivals, etc., we assume that the devices arriving at VM j follow a Poisson process with incoming rate λ (varied arrival rates across VMs can be easily incorporated).

For the sake of analysis, we assume the following simple strategy for serving an incoming device: when a device arrives, it is randomly assigned to one of the R VMs that contain its state; in other words, the device is equally likely to be assigned to any one of the R VMs. Although simplistic, such a strategy is helpful to gain useful insights and hence guide the design of the replication process. Under this strategy, each stream of incoming devices at a VM j gets split into R sub-streams, which are sent to the R VMs where the state of the devices in VM j is replicated (based on consistent hashing). Each of these sub-streams is also Poisson (Poisson splitting), but with rate λ/R. Thus, from the perspective of each VM j, it receives a combination of R sub-streams (from R VMs including itself), each a Poisson process with rate λ/R, resulting in an aggregate process that is also Poisson (Poisson combining) with rate λ.

Let us assume that a device incurs a cost C only when it cannot be served/processed (0 otherwise). Further, let the probability of a device arriving in an epoch be w_i.

A1. Impact of Replication Frequency: We represent the counting process associated with the Poisson arrivals at VM j as N_j(t), indicating the number of devices that have arrived until time t. Now, we characterize the probability that a device i cannot be served (represented by \notin S) at a VM j (V_j):

P(i \notin S\; V_j \text{ at } t)
  = P\left(\{i \in_A (t,T]\} \cap \{i \notin_A (0,t]\} \cap \{N_j(t) \ge N\}\right)
  = \sum_{k=N}^{\infty} P\left(\{i \in_A (t,T]\} \cap \{i \notin_A (0,t]\} \cap \{N_j(t) = k\}\right)
  = \sum_{k=N}^{\infty} P\left(\{i \in_A (t,T]\} \mid \{i \notin_A (0,t]\} \cap \{N_j(t) = k\}\right) \cdot P\left(\{i \notin_A (0,t]\} \cap \{N_j(t) = k\}\right)
  = \sum_{k=N}^{\infty} P\left(\{i \in_A (t,T]\} \mid \{i \notin_A (0,t]\} \cap \{N_j(t) = k\}\right) \cdot P\left(\{i \notin_A (0,t]\} \mid \{N_j(t) = k\}\right) \cdot P(N_j(t) = k)   (4)

where \in_A (\notin_A) indicates that a device arrives (does not arrive). Given that the arrival process is Poisson, P(i \in_A (0,t]) depends on the number of devices that have arrived until t. If N_j(t) = k, then we have

P\left(i \notin_A (0,t] \mid N_j(t) = k\right) = \left(1 - \frac{w_i}{\lambda T}\right)^{k}

Further, P(N_j(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!} and

P\left(\{i \in_A (t,T]\} \mid \{i \notin_A (0,t]\} \cap \{N_j(t) = k\}\right) = \left\{1 - e^{-\lambda (T-t)}\right\} \cdot w_i

Substituting back into Equation 4, we have

P(i \notin S\; V_j \text{ at } t) = \left\{1 - e^{-\lambda (T-t)}\right\} \cdot w_i \cdot \sum_{k=N}^{\infty} \frac{(\lambda t)^k e^{-\lambda t}}{k!} \cdot \left(1 - \frac{w_i}{\lambda T}\right)^{k}   (5)

The probability that device i cannot be served at any of the R VMs where its state resides can be obtained as

P(i \notin S \text{ at } t) = \left(P(i \notin S\; V_j \text{ at } t)\right)^{R}   (6)

Now, the expected cost incurred by device i reduces to

\bar{C}_i = C \int_{0}^{T} \left(P(i \notin S\; V_j \text{ at } t)\right)^{R} \, dt   (7)

Applying Equation 5 in Equation 6 and simplifying, we obtain the following closed-form expression for the expected cost of a device for large T:

\bar{C}_i = \left(\frac{C}{\lambda}\right) w_i^{R} \sum_{k=N}^{\infty} \left(1 - \frac{w_i}{\lambda T}\right)^{kR} \left(\frac{\Gamma(kR+1)}{(\Gamma(k+1))^{R} R^{kR+1}}\right)   (8)

where \Gamma(n) = (n-1)! is the Gamma function. For large values of k, the numerator and denominator of the above expression are hard to compute directly. Hence, we provide the following simplification for easier computation:

\frac{\Gamma(kR+1)}{(\Gamma(k+1))^{R} R^{kR+1}} = \left(\frac{1}{R}\right) \prod_{p=0}^{k-1} \prod_{q=0}^{R-1} \left(1 - \frac{q}{(k-p)R}\right)   (9)

Now, the average cost (delay) experienced by a device in an epoch is given by

\bar{C} = \frac{\sum_i w_i \bar{C}_i}{\sum_i w_i}   (10)
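Equation 8 together with the stable product form of Equation 9 can be evaluated numerically; a minimal sketch follows, where the series truncation point k_max is an illustrative choice of ours:

```python
def gamma_ratio(k: int, R: int) -> float:
    """Gamma(kR+1) / (Gamma(k+1)^R * R^(kR+1)) computed via the product
    form of Equation 9, avoiding huge intermediate factorials."""
    prod = 1.0 / R
    for p in range(k):
        for q in range(R):
            prod *= 1.0 - q / ((k - p) * R)
    return prod

def expected_cost(C, lam, T, N, R, w_i, k_max=200):
    """Expected cost of device i from Equation 8, truncating the series
    at k_max (the terms decay, so a few hundred terms suffice here)."""
    s = sum((1.0 - w_i / (lam * T)) ** (k * R) * gamma_ratio(k, R)
            for k in range(N, k_max))
    return (C / lam) * (w_i ** R) * s
```

Sweeping R with this function reproduces the qualitative finding of S1: the cost drops sharply from one replica to two, then flattens.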

Using the above, the impact of the replication frequency (R) on the average delay experienced by a device can be easily studied.

A2. Impact of Access-awareness: Here, we aim to understand the importance of replicating the state of a device in proportion to its access probability w_i. This is especially useful when the total state capacity of the VMs does not allow all the devices to be replicated R times. Hence, it becomes important to understand which set of devices should be replicated more relative to others.

Let S be the total state capacity at each VM, with K being the total state of all devices that need to be stored. Let V be the number of VMs in the datacenter. Further, let S' be the actual state capacity available for devices in the local DC after accounting for new device states (S_n) and external state of devices from remote DCs (S_m), i.e., S' = S - \frac{S_n + S_m}{V}. To replicate each device's state R times (R = 2 in SCALE), we need a total state capacity of RK. We need V S' \ge K to allow each device's state to be stored at least once. If V S' < RK, then not all devices can be replicated R times. In this case, each device's state can be replicated R' = \left\lfloor \frac{V S'}{K} \right\rfloor times (R' = 1 in SCALE). With the remaining \left(\frac{V S'}{K} - \left\lfloor \frac{V S'}{K} \right\rfloor\right) K state capacity not being sufficient to accommodate (R - R') replications of all the devices, it becomes important to decide which of the devices will get an additional replication.

Here, we consider two strategies. (i) Access-unaware: each device has an equal probability

P_i(rep) = \frac{V S'}{K} - \left\lfloor \frac{V S'}{K} \right\rfloor, \quad \forall i   (11)

of getting an additional replication. (ii) Access-aware: the probability of replication is proportional to the device's access probability:

P_i(rep) = \min\left\{1, \left(\frac{w_i}{\sum_i w_i}\right) \left(\frac{V S'}{K} - \left\lfloor \frac{V S'}{K} \right\rfloor\right) K\right\}   (12)

Now, let us represent the average cost of a device in Equation 8 as a function of R, i.e., \bar{C}_i(R). Then, in case R replicas are not possible for each device, the average cost of a device gets modified as

\bar{C}_i = (1 - P_i(rep)) \, \bar{C}_i(R') + P_i(rep) \, \bar{C}_i(R' + 1)   (13)

where P_i(rep) depends on the strategy used for replicating devices in the space remaining after R' replications of each device (i.e., Equations 11, 12). With the above analysis, we can now easily study the impact of access-aware replication on the average delay experienced by a device in a datacenter.
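Equations 11-13 transcribe directly into code; the following sketch uses illustrative parameter values of ours, and `cost_at` stands in for the per-replica-count cost \bar{C}_i(R) of Equation 8:

```python
import math

def extra_replica_prob(w_i, weights, V, S_prime, K, access_aware=True):
    """Probability that a device gets one extra replica when V*S' < R*K,
    per Equation 11 (access-unaware) or Equation 12 (access-aware)."""
    frac = V * S_prime / K - math.floor(V * S_prime / K)  # leftover fraction
    if not access_aware:
        return frac                                       # Equation 11
    return min(1.0, (w_i / sum(weights)) * frac * K)      # Equation 12

def blended_cost(p_rep, cost_at, r_prime=1):
    """Average device cost when it gets R' or R'+1 replicas (Equation 13)."""
    return (1.0 - p_rep) * cost_at(r_prime) + p_rep * cost_at(r_prime + 1)

# Illustrative setting: 10 VMs, capacity such that V*S'/K = 1.1, so each
# device is guaranteed R'=1 replica and 10% of state fits a second copy.
p_unaware = extra_replica_prob(0.5, [0.5, 0.5], V=10, S_prime=0.55, K=5,
                               access_aware=False)
p_aware = extra_replica_prob(0.5, [0.5, 0.5], V=10, S_prime=0.55, K=5)
```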