Coalescent Computing - arXiv

Coalescent Computing

Kyle C. HaleIllinois Institute of Technology

Abstract

As computational infrastructure extends to the edge,it will increasingly offer the same fine-grained resourceprovisioning mechanisms used in large-scale cloud data-centers, and advances in low-latency, wireless network-ing technology will allow service providers to blur thedistinction between local and remote resources for com-modity computing. From the users’ perspectives, theirdevices will no longer have fixed computational power,but rather will appear to have flexible computationalcapabilities that vary subject to the shared, disaggre-gated edge resources available in their physical proxim-ity. System software will transparently leverage theseephemeral resources to provide a better end-user expe-rience. We discuss key systems challenges to enablingsuch tightly-coupled, disaggregated, and ephemeral in-frastructure provisioning, advocate for more research inthe area, and outline possible paths forward.

1 Introduction

We envision edge deployments (i.e., “cloudlets” [52])that expose virtualized resources which can transpar-ently augment user devices, and which can automati-cally scale up or down based on available resources,user demand, user proximity (network latency), avail-able network bandwidth, and spot pricing. From theusers’ perspective, it appears as if their device (laptop,thin client, or smart phone) acquires increased compu-tational power (increased memory or disk capacity, in-creased CPU count, or high-end GPU) when they wandernear such a deployment, for example, into their local cof-fee shop1. When the user leaves the area, the resourcesare revoked, and the user’s device appears as it did be-fore.

1The author acknowledges that since we are currently living in apandemic some readers might find this particular example far-fetched!

Since Satyanarayanan first laid out the basis for thisvision nearly two decades ago with Cyber Foraging [5],hardware, software, and networking technologies haveadvanced to the point where it will soon be possible forthe transparent coalescence of disaggregated computa-tional resources to client machines to occur at a fine gran-ularity and over short time scales. We call this notionCoalescent Computing, a type of Cyber Foraging for dis-aggregated hardware at the edge.

Cyber Foraging generally relies on discoverable ser-vices in users’ local environments, and on the ability tooffload application components to remote—and gener-ally more capable—machines [51]. This offloading usu-ally happens at the granularity of virtual machines [24];while the end user may be unaware of the local/remotedistinction in this scenario, it is still present for the ap-plication programmer and system software. Though ap-plications can be automatically partitioned into loosely-coupled components to make them amenable to suchcloud offload [11, 29], and while VM migration can beused to ship applications to the cloud transparently [23],we believe that there is an opportunity to leverage theincreasingly hierarchical and disaggregated structure ofcloud resources at the edge [60, 61] to support applica-tions that are more tightly coupled, including enhancedgaming, augmented reality (AR) [70], virtual reality(VR) [65], interactive data analysis [53], and IoT [68].

One goal of classic distributed operating system workfrom the 70s and 80s was to hide loosely-coupled dis-tributed machines behind the illusion of a single, logi-cal system (a single system image [9]). While commer-cially this did not come to pass (except for limited com-ponents, e.g. file systems), a natural question now arises:is it time to reconsider this aspiration for our moderncomputational ecosystem? In the datacenter, the answerseems to be yes. High-speed interconnects (e.g. Infini-Band) within the datacenter have increasingly made re-source sharing between systems feasible [31,62], for ex-ample shared remote memory [1, 15, 45, 71] or remote

1

arX

iv:2

104.

0712

2v2

[cs

.DC

] 1

5 A

ug 2

021

edge servers

edge servers

edge servers

user trajectory

RAM

GPU

Figure 1: Coalescent Computing.

swap [2, 22, 49]. I/O device virtualization is becom-ing more sophisticated, with efficient offload enabled byAPI remoting [4, 16, 50], and device sharing among ten-ants [66]. These trends, along with hardware proposalsfor scale-out systems [21,36,43] and disaggregated hard-ware on the horizon [3, 12, 18, 28] point towards a data-center that consists of loosely-coupled, disaggregated re-sources. LegoOS, a notable first step in disaggregatedoperating systems, embraces this view of the datacenterwhile retaining Linux ABI compatibility [55], and Gi-antVM demonstrates how to virtually and transparentlycompose datacenter resources [69].

While datacenters obviously benefit from low-latency,wired interconnects and relatively static hardware con-figurations, the momentum in disaggregation is encour-aging for commodity computing at the edge as well,and we argue that there is a ripe opportunity for OS re-search to make Coalescent Computing a reality. Belowwe describe Coalescent Computing in more detail, dis-cuss some of its key research challenges, and ideas forOS design to investigate the space.

2 Coalescent Computing

Coalescent Computing can be captured succinctly withthe following principle:

Coalescence Principle: Users’ devices expe-rience a coalescence of resources proportionalto proximity as users move through the physi-cal environment.

Figure 1 depicts the Coalescence Principle at work.The number of resources coalesced into a user’s deviceis inversely proportional to the user’s network distance

user device

small edge rackhigh-end

workstation

1 2

3

4

Figure 2: Coalescence of disaggregated remote cores anda GPU in the user’s proximity.

to the hosting machines2. While in most cases thosehosting machines will be stationary, in some cases, userswith less constrained devices may offer their resourcesto nearby users, as in Femtocloud [25]. Other metrics,such as pricing, network congestion, and power budgetsfor the systems hosting disaggregated resources will alsoaffect availability.

As a user navigates the physical environment, the OSon their device queries nearby resource availability, andputs in bids for resources based on current and histori-cal system load. For instance, a user that just finished arecorded Zoom call might cause a CPU spike that trig-gers their OS to bid for nearby leasable CPUs to aid invideo encoding. If no such resources are available, thesystem can either borrow resources from the traditionalcloud (with a latency penalty), or fall back to local re-sources. Figure 2 depicts a scenario where a user is play-ing a CPU and GPU-intensive game that outstrips theabilities of his or her laptop. The user is in nearby prox-imity of a high-end workstation housing several GPUsand an edge rack that contains a collection of disaggre-gated CPUs. The user sees both physical resources onthe local machine ( 1 ) and virtual resources from re-mote systems ( 2 ). To accommodate system load, thedevice’s Coalescent OS transparently discovers, nego-tiates, and acquires a virtual GPU 4 and four virtualcores 3 from the rack over the wireless link. This re-actionary resource provisioning is reminiscent of com-putational sprinting [38, 48] and JIT-provisioned CyberForaging [24], but it happens at the granularity of disag-gregated cores and devices. Note that a corollary of theCoalescence Principle is that as users leave the environ-ment resources are relinquished and the system grace-fully migrates any necessary computations or state backto the local machine, or to the cloud if WAN latenciescan be tolerated.

We expect a system implementing Coalescent Com-puting to have the following properties:

2This may or may not correspond to physical distance, e.g. in wiredsettings.

2

Transparency While users may set coalescence poli-cies ahead of time, the system transparently acquires andrelinquishes resources nearby in the environment; this isa key distinguisher from typical cloud offload. Acquisi-tion here does not mean sole ownership of the physicalremote resource, and does not necessarily imply that theuser should be charged for it. For example, an idlinguser’s system may make resource reservations, but aslong as the user is idle, the OS or monitor on the re-mote system is free to schedule other work. Quality-of-Service (QoS) policies and the degree of sharing canbe determined by providers, and will likely change fromuser to user. Because users’ resource acquisition policiesmay be at odds with one another, and because applica-tions have diverse requirements, policy enforcement willinvolve solving a challenging, multi-objective optimiza-tion problem [20] on hosts that expose resources. Whilethere are effective techniques for solving such schedul-ing problems in the datacenter—for example with rec-ommender systems [13] or reinforcement learning andBayesian optimization [47]—in this setting user mobilitywill significantly affect resource availability and users’devices may coalesce resources from different providers,rendering a centralized scheduling mechanism ineffec-tive. One possible path forward is to combine applicationprofiling (informing resource requests) with decentral-ized versions of ML-based schedulers (guiding resourcegrants). Thus, sets of leasable resources (servers, desk-tops, and possibly user devices) form ad hoc networks torun a distributed, coalescent scheduler.

The user can monitor currently “attached” resourcesusing familiar means. For example, /proc/cpuinfo ina Coalescent OS exposing a Linux-like interface wouldinclude both physical CPUs on the device and virtualCPUs coalesced from a nearby edge server. Similarly,/proc/meminfo would show remotely coalesced pages(though sub-page granularity remote memory is a possi-bility [49]).

At a surface level, the OS sees the remote CPUs (andother resources) just like normal CPUs, and once theyare properly initialized and booted, the OS can schedulework on them. However, the OS must take care in howit schedules work on remote resources when applicationsare tightly coupled, so must have some notion of resourcelocalization (see Section 3.2). This bears some similar-ity to NUMA-awareness, but is more challenging giventhe inherent dynamism of resources whose coalescencedepends on user proximity.

Performance Users will expect their devices to be re-sponsive. While there is more flexibility here than inthe datacenter environment, the underlying technologypresents more challenges too (Section 3). In particu-lar, as resources attach and detach from user devices, it

should not perceptibly affect response times.While the single-system image abstraction is a com-

pelling one (e.g., CPU cores come and go as the usermoves around), not all applications will want to use thosecores, since their use comes with a latency penalty. Thesystem must be aware of the distinction between latency-sensitive and throughput-sensitive workloads [56], andmust guide resource coalescence with that in mind. Webelieve that there is likely a sweet spot for applica-tions that thrive in a coalescent setting. For applicationswith components that communicate quite often (e.g., atightly-coupled, multi-threaded stencil code), decouplingthe components will incur a significant penalty. On theother hand, loosely-coupled applications that performbulk computations (e.g., rendering a single scene) canbe sufficiently handled by offloading to a distant cloud.Thus, identifying applications that fit into this sweet spotis a primary concern.

Resilience Though we can envision Coalescent Com-puting extending to wired environments3, systems willmore often need to make do with unreliable wireless con-nections. A Coalescent OS must deal with dropped con-nections gracefully, for example using replication andfail-over, or by periodic checkpointing. In any case, tech-niques applied to achieve resilience should avoid cen-tralized coordination given the ephemeral proximity ofresources. However, some systems—for example edgeservers housed in a back room cabinet—will be morestatic by nature, will have a constant power source, andwill likely have a reliable wired connection to the in-ternet, and thus should be weighted more heavily whenchoosing coordinating nodes.

Customizability While users need not normally tendto resource coalescence policies, we anticipate that therewill arise scenarios where customization will be advan-tageous. For example, users may set a lower threshold onbattery levels at which the system discovers, negotiates,and leases resources, thus limiting power consumptionby the wireless radio and by the system itself. Even ifone user has a high-end laptop, he or she likely wouldnot want another user pegging one of the CPUs when thebattery is on its last leg. Other users might prefer moredetailed performance tuning, for example setting thresh-olds on swap space using remote memory, capacity limitson leased resources, CPU load thresholds for offloading,and so on.

Privacy and Security When users offload computa-tion to cloud resources or instantiate VMs on public

3For example, inductive charging surfaces seen in some coffeeshops now might one day incorporate network interfaces.

3

Technology Latency

SoL lower bound at 10m 33 nsCross-core cache-coherence 100-200 ns [30]soNUMA (proposed) 300 ns [43]Cross-socket (QPI) 355 ns [10]Inter-processor Interrupts (IPI) ∼500 ns [26]PCIe Gen 3 900 ns [40]InfiniBand RDMA (one-sided) 1 µs [32]WiFi 6E (reported) ∼2 ms [14]5G URLLC (reported) 1 ms [19, 37]5G (first-hop) 14 ms [39]Typical WiFi (90th %-ile) 20 ms [58]

Table 1: One-way latency for common and emerging net-working technologies.

infrastructure, they place some degree of trust in theprovider, since they direct the action. With Coales-cent Computing, a user’s application may be run on un-trusted hardware, potentially divulging sensitive infor-mation. Some malicious users may be incentivized tolease out their resources just to compromise other users’data. Others may coalesce resources from nearby users(possibly coordinating with other bad actors nearby) tocarry out a denial-of-service attack. Systems must havemechanisms in place to mitigate such scenarios. This isa problem that also plagues decentralized volunteer com-puting systems [17, 33]. Proper isolation using hardwaresupport, virtualization, and collaborative monitoring andreporting of bad actors can alleviate the effects of mali-cious behavior.

3 Challenges

We now discuss major challenges both in hardware andin OS design that impede progress in realizing Coales-cent Computing.

3.1 Hardware

The overriding challenge for Coalescent Computingfrom the hardware perspective will be the performancecharacteristics of wireless links. Table 1 lists single-hoplatencies for various interconnects up and down the stackreported by others. Cache line transfers on the coherencenetwork between Nehalem cores land in the 100-200nsrange, whereas high-performance InfiniBand cards arestill more than 3X that latency at ∼1µs. As Shan et al.have already shown in the datacenter environment, thisputs coherent resources off the table for now [55]. Thisespecially rings true for wireless technologies (last fourrows of Table 1). Typical WiFi connections have reason-

ably low first-hop latency at 20ms, but there is a longtail that puts the damper on deterministic performance.However, emerging, ultra-low latency wireless standardslike 5G URLCC (designed with applications like wire-less factory automation and AR in mind) and WiFi 6Ebring the latency down by an order of magnitude and arereported to reduce latency variance significantly. Coher-ence will still be out of reach, but with the right OS sup-port we believe Coalescent Computing can be realizedover these low-latency wireless links. For reference, thefirst row of the table shows the speed-of-light delay at10m, which we can view as a lower bound on the la-tency of future wireless networking between edge sys-tems. While there is much work on characterizing andimproving the performance of wireless links, little hasbeen done to guide automated decisions based on theirproperties. In particular, for a coalescent system to workproperly, it must be able to infer signal strength (and userdistance) accurately in order to project the impacts onapplication performance and thus guide coalescence dy-namically. This is an open problem.

Unfortunately, current wireless interfaces are not suit-able for operating with disaggregated resources. Pro-totype systems for disaggregated hardware today makeheavy use of RDMA capabilities and fixed network la-tencies. WiFi interfaces could be optimized for Coales-cent Computing, for example by customizing the wireprotocol for resource acquisition, and by integrating low-power mechanisms for resource discovery, as in Blue-Tooth Low Energy (BLE). These NICs might also in-corporate features we see in high-end cards today likeRDMA, atomics, and memory protection. The NICsmight also be integrated near the processors to act asa proxy socket to facilitate communication between re-mote resources, as in soNUMA [43].

While the OS may employ loosely-coupled monitorson remote resources (Section 3.2), users may want tocustomize the software they run on these resources. Thiswill require enhanced lightweight virtualization in wire-less NICs, namely self virtualization (e.g., SR-IOV) andboot protocols that incorporate disaggregated hardware(extended PXE). NICs on the users’ systems must co-ordinate with the BIOS (e.g. via a lightweight platformmanagement controller or a BMC) in order to keep hard-ware information exposed to the OS (namely, ACPI ta-bles that enumerate NUMA regions and processor infor-mation like the SRAT and SLIT tables) consistent withcoalesced resources. ACPI likely needs to be extended tosupport Coalescent Computing, and platform hardwarewill need to route the boot sequence (e.g. the SIPI andIPI sequence on x86 chips) through something like anAPICv [41] rather than applying the traditional trap-and-emulate model.

4

3.2 Software

A Coalescent OS will need to support the following:performance, disaggregation, resource discovery, adap-tation, hardware heterogeneity, and fault tolerance. Sev-eral OSes from the research community support some ofthese features, but not all. For example, LegoOS is thefirst OS designed for disaggregated hardware [55], andprovides a good foundation to build upon for Coales-cent Computing. The idea of stateless, loosely-coupledmonitors running on disaggregated hardware compo-nents will serve a Coalescent OS as well. However, theLegoOS design focuses on datacenter applications, andthe ExCache-based memory management, the global re-source managers, and the InfiniBand/RDMA-based RPCwill not transfer easily to a wireless edge setting withoutsignificant hardware enhancements.

Performance To reconcile privacy and performance,users will likely want their code and data to reside inisolated environments. This means that monitors willneed to employ very lightweight, fast-start hardware vir-tualization, which we have previously shown is possibleon the order of microseconds [64]. Light-weight, virtualexecution environments will be launched on-demand tohost second-level monitors from the mobile user’s sys-tem. Hardware monitors will isolate user monitors fromone another. Virtualization hardware enhancements dis-cussed in the previous section will make this more effi-cient, and a Coalescent OS will likely incorporate some-thing like the boot drivers used in Barrelfish/DC to ac-count for dynamically changing CPU information notsupported in ACPI [67]. The CPU boot process will lookmuch more like the plug-and-play PCI probing processpresent in commodity OSes today. For undersubscribedCPUs, the resource monitor may use CPU hot-removefunctionality to space-partition the user monitor, reminis-cent of co-kernels in Pisces [46]. As with Barrelfish/DC,decoupling the OS from the underlying hardware will al-low for greater flexibility with dynamic OS updates aswell, as was also demonstrated in K42 [8].

Performance will be mainly limited by network la-tency and bandwidth. A Coalescent OS will have to em-ploy aggressive techniques to hide network latency andvariability. The OS can avoid expensive coherence traf-fic by using message passing in lieu of shared memory,for example as is done in Barrelfish [7] and LegoOS.Serialization costs and software overheads must also beavoided, as we are learning with disks as SSDs becomefaster [35]. For remote memory performance, skewedaccess distributions may help [21], allowing caches to beused to take advantage of temporal locality, but it is un-likely to produce the same benefits we see in the datacen-ter. That said, similarities between users in geographical

proximity may offer hope, and the same principles thatenable CDNs will present opportunities for deduplica-tion and sharing in edge systems [61].

Coalescent systems will benefit from QoS policies.These policies can be set based on provider inputs (e.g.informed by user account balance), social credits (“howmany CPU hours has the user leased out?”), current sys-tem load, physical proximity, and the user’s affinity forparticular resources (“only coalesce memory, not CPU oraccelerators”).

The system should also employ best effort coalescenceof resources; namely, if network conditions are inca-pable of providing adequate performance, resource ne-gotiations should fail, and applications can run on localresources or fall back to the traditional cloud. This best-effort behavior has already been demonstrated for servic-ing I/O requests in MittOS [27].

Generally speaking, as Schwarzkopf et al. aptly pointout [54], deterministic performance was the albatross forearly distributed OSes, and we must be mindful of thelessons learned there [59, 63]. Hardware improvementswill certainly help, but exposing performance variabilityto the OS is paramount for it to make acceptable deci-sions.

Heterogeneity Disaggregated CPUs, GPUs, FPGAs,memory, and storage will inevitably be more heteroge-neous than in a typical datacenter. A Coalescent OS musthandle this heterogeneity transparently. Monitors writtenfor different devices can expose a unified interface, butapplications must be able to run on diverse hardware, in-cluding different ISAs, especially as competitors to x86gain prominence. This system might require applica-tions be compiled into fat binaries, but a more flexibleapproach would employ an intermediate representation(IR) to dynamically adapt the application to the ISAsof nearby resources, as in Helios [42]. Such a systemwould make judicious use of just-in-time (JIT) compil-ers on edge nodes, or in cases where performance is lesscritical, language VMs. Using JIT compilation to ad-dress heterogeneity adds another layer of complexity forperformance, as it can introduce even more variability.Managing this variability is critical for Coalescent Com-puting, but we have only scratched the surface of mini-mizing JIT compilation latency [34].

Resource Discovery As users navigate the physicalenvironment, their devices must query nearby systemsfor available resources. This resource discovery processmust occur often enough to react to load spikes, but notso often as to drain device battery and congest the lo-cal network. The OS and hardware might employ UPnPhere [44], as in Slingshot [57], but the protocol will likelyneed to be enhanced to include resource load information

5

and performance characteristics. When negotiating coa-lescence, the OS will automatically choose a subset ofnearby resources subject to user preferences and systemload.

Programming Model The Coalescent OS can by de-fault transparently migrate computation between localand remote machines. For example, for each new vCPUadded to the system via Coalescence, the OS exposesa new run queue, e.g., over distributed shared memory.The OS can add a thread to the remote run queue asit would on the local system, but with a performancepenalty. For example, the code in Listing 1 shows a triv-ial example of creating a worker thread that processestasks in a shared queue. The Coalescent OS is freeto schedule this thread on a remote vCPU if available.However, since the user is mobile, that remote vCPUmay disappear. The remote worker thread might dequeuework from the shared queue then fail, losing that work.There are of course many techniques for handling fail-ures and ensuring consistency in distributed systems, butin a language like C where mutations on shared state canhappen anywhere, it is quite challenging to apply thesetechniques transparently.

1 s t a t i c vo id worker ( ) {2 w h i l e ( 1 ) {3 work_t * work = dequeue_work ( ) ;4 do_work ( work ) ;5 }6 }7

8 vo id main ( ) {9 p t h r e a d _ t t h r ;

10 p t h r e a d _ c r e a t e (& t h r , NULL, worker , NULL) ;11 p t h r e a d _ j o i n ( t h r , NULL) ;12 }

Listing 1: Creating a worker thread.

One possibility is to expose remote resources and thepotential for failure to the programmer. Listing 2 showssuch an example using a CC-aware wrapper around thepthreads runtime. Here the programmer explicitly placesconstraints on the remote vCPUs that the thread can runon by specifying the maximum acceptable latency to theremote vCPU in µs. The programmer also indicates thatthe system should favor remote vCPUs over local oneswhen available (EAGER_REMOTE). Finally, the program-mer specifies that when a failure is detected by the run-time, the thread should be recreated on the local machineafter a failure is handled by user-specified code (here theprogrammer provides code to recover the work queue,e.g., with a persistent write-ahead log).

1 # i n c l u d e < c t h r e a d . h>2 s t a t i c s t r u c t c t h r e a d _ a t t r s {3 . l a t e n c y _ b o u n d = 100 , / / u sec

4 . s t r a t e g y = EAGER_REMOTE,5 . f a i l u r e = RETRY_LOCAL,6 } c t a ;7

8 s t a t i c i n t h a n d l e _ f a i l u r e ( ) {9 r e t u r n recove r_work_queue ( ) ;

10 }11

12 vo id main ( ) {13 c t h r e a d _ t t h r ;14 c t h r e a d _ c r e a t e (& t h r , &c t a , fun , NULL,

h a n d l e _ f a i l u r e ) ;15 c t h r e a d _ j o i n ( t h r , NULL) ;16 }

Listing 2: Creating a worker thread with failure recoveryusing an explicit CC API.

In many cases, it will be preferable to manage func-tions running on remote resources, rather than executioncontexts. In this case, the Coalescent OS can exposea function-as-a-service (FaaS) API as well. In additionto the typical FaaS event-triggered function invocationmodel, a CC FaaS API might also allow for RPC-like,synchronous invocations via language annotations. Forexample, a programmer might specify that a function canrun on remote resources by using a virtine (virtual sub-routine) [64], as shown in Listing 3.

1 v i r t i n e i n t fun ( ) {2 do_work ( ) ;3 }

Listing 3: Coalescence with a virtine.

In this case, if remote resources are available, the in-vocation of fun will cause a light-weight, isolated VM(or container) to be spawned on the remote vCPU andthe function will run to completion.

Adaptation and Fault Tolerance A Coalescent OSwill need to adapt to changing network conditions andresource availability. Ideas from systems like Chromaapply here [6]; resources must be monitored, and appli-cation usage estimated so that applications can scale upand down depending on what is available. The systemmay leverage redundant computations across multiple re-sources to mitigate tail latency and for resilience.

To handle failures, a coalescent system will likely em-ploy replicas and periodic checkpointing. Replicationwill be more challenging than in the datacenter environ-ment given increased mobility, but replica selection canbe informed by mobility characteristics of different sys-tems. A server plugged into the wall would be a betterchoice for fail-over rather than a nearby laptop. Repli-cas might exist in hierarchies based on the environment.For example, a secondary replica may be placed on thenearby edge server, and a tertiary replica may be instanti-ated in the cloud with relaxed consistency. Append-only

6

storage can be used to persist state changes and aid infailure recovery.

When component or connection failures occur, orwhen the user moves out of range , the system must de-cide how to react. This will largely depend on applica-tion resource demands and latency sensitivity. For ex-ample, when a user training a neural network decides tomove away from a resource-rich area, the training canbe shipped off to the cloud to complete. However, a userrunning an immersive augmented reality application mayprefer to have all computation and state migrated back tothe local device, perhaps trading off degraded quality forresponsiveness.

4 Conclusion

Several challenges remain for Coalescent Computingwhich we do not touch on here, but which we do planto investigate. These include the storage interface, re-source naming, and a more detailed treatment of privacyand security (e.g. authentication).

The building blocks for Coalescent Computing aregradually being put in place. We will soon stand atthe confluence of disaggregated hardware, hierarchicallydistributed clouds, and ultra low-latency wireless net-works. We argue that exploring systems that support thismodel will not only put more computational power atusers’ fingertips, but will also shed light on new avenuesof systems research.

Acknowledgements

This paper would not have been possible without valu-able discussions and feedback from Conghao Liu, BrianTauro, Nicholas Wanninger, Rich Wolski, Peter Dinda,and Nikos Hardavellas. This work is supported by theUnited States National Science Foundation via awardsCNS-1718252, CNS-1763612, CNS-1730689, CCF-1757964, CCF-2029014, and CCF-2028958.

References

[1] Marcos K. Aguilera, Nadav Amit, Irina Cal-ciu, Xavier Deguillard, Jayneel Gandhi, StankoNovakovic, Arun Ramanathan, Pratap Subrah-manyam, Lalith Suresh, Kiran Tati, RajeshVenkatasubramanian, and Michael Wei. Remote re-gions: A simple abstraction for remote memory. InProceedings of the 2018 USENIX Annual Techni-cal Conference, USENIX ATC ’18, pages 775–787,USA, July 2018. USENIX Association.

[2] Emmanuel Amaro, Christopher Branner-Augmon,Zhihong Luo, Amy Ousterhout, Marcos K. Aguil-

era, Aurojit Panda, Sylvia Ratnasamy, and ScottShenker. Can far memory improve job through-put? In Proceedings of the 15th European Con-ference on Computer Systems, EuroSys ’20, NewYork, NY, USA, 2020. Association for ComputingMachinery.

[3] Krste Asanovic. Firebox: A hardware buildingblock for 2020 warehouse-scale computers. In Pro-ceedings of the 12th USENIX Conference on Fileand Storage Technologies, FAST ’14, Santa Clara,CA, February 2014. USENIX Association.

[4] Marco Bacis, Rolando Brondolin, and Marco D.Santambrogio. BlastFunction: An FPGA-as-a-Service system for accelerated serverless comput-ing. In Proceedings of the 23rd Conference on De-sign, Automation and Test in Europe, DATE ’20,pages 852–857, San Jose, CA, USA, March 2020.EDA Consortium.

[5] Rajesh Balan, Jason Flinn, M. Satyanarayanan,Shafeeq Sinnamohideen, and Hen-I Yang. The casefor cyber foraging. In Proceedings of the 10thWorkshop on ACM SIGOPS European Workshop,EW ’10, pages 87–92, New York, NY, USA, 2002.Association for Computing Machinery.

[6] Rajesh Krishna Balan, Mahadev Satyanarayanan,So Young Park, and Tadashi Okoshi. Tactics-basedremote execution for mobile computing. In Pro-ceedings of the 1st International Conference onMobile Systems, Applications and Services, Mo-biSys ’03, pages 273–286, New York, NY, USA,2003. Association for Computing Machinery.

[7] Andrew Baumann, Paul Barham, Pierre EvaristeDagand, Tim Harris, Rebecca Isaacs, Simon Peter,Timothy Roscoe, Adrian Schüpbach, and AkhileshSinghania. The Multikernel: A new OS architec-ture for scalable multicore systems. In Proceedingsof the 22nd ACM Symposium on Operating SystemsPrinciples, SOSP ’09, pages 29–44, October 2009.

[8] Andrew Baumann, Gernot Heiser, Jonathan Ap-pavoo, Dilma Da Silva, Orran Krieger, Robert W.Wisniewski, and Jeremy Kerr. Providing dynamicupdate in an operating system. In Proceedings ofthe 2005 USENIX Annual Technical Conference,USENIX ATC ’05, page 32, USA, April 2005.USENIX Association.

[9] Rajkumar Buyya, Toni Cortes, and Hai Jin. Singlesystem image. The International Journal of HighPerformance Computing Applications, 15(2):124–135, 2001.

7

[10] Young-kyu Choi, Jason Cong, Zhenman Fang,Yuchen Hao, Glenn Reinman, and Peng Wei. Aquantitative analysis on microarchitectures of mod-ern cpu-fpga platforms. In Proceedings of the 53rd

Annual Design Automation Conference, DAC ’16,New York, NY, USA, June 2016. Association forComputing Machinery.

[11] Byung-Gon Chun, Sunghwan Ihm, Petros Mani-atis, Mayur Naik, and Ashwin Patti. CloneCloud:Elastic execution between mobile device and cloud.In Proceedings of the 6th Conference on ComputerSystems, EuroSys ’11, pages 301–314, New York,NY, USA, 2011. Association for Computing Ma-chinery.

[12] I-Hsin Chung, Bulent Abali, and Paul Crumley. To-wards a composable computer system. In Proceed-ings of the International Conference on High Per-formance Computing in Asia-Pacific Region, HPCAsia ’18, pages 137–147, New York, NY, USA,2018. Association for Computing Machinery.

[13] Christina Delimitrou and Christos Kozyrakis.Paragon: QoS-aware scheduling for heterogeneousdatacenters. In Proceedings of the 18th Inter-national Conference on Architectural Support forProgramming Languages and Operating Systems,ASPLOS ’13, pages 77–88, New York, NY, USA,2013. Association for Computing Machinery.

[14] Gino Dion. Wi-Fi 6 and Wi-Fi 6E: better,faster, more. https://www.nokia.com/blog/wi-fi-6-and-wi-fi-6e-better-faster-more/,May 2020. Accessed 2020-01-31.

[15] Aleksandar Dragojevic, Dushyanth Narayanan,Miguel Castro, and Orion Hodson. FaRM: Fast re-mote memory. In Proceedings of the 11th USENIXSymposium on Networked Systems Design and Im-plementation, NSDI ’14, pages 401–414, Seattle,WA, April 2014. USENIX Association.

[16] José Duato, Antonio J. Peña, Federico Silla, RafaelMayo, and Enrique S. Quintana-Ortí. rCUDA:Reducing the number of GPU-based acceleratorsin high performance clusters. In Proceedings ofthe 2010 International Conference on High Perfor-mance Computing and Simulation, pages 224–231,June 2010.

[17] Arnaud Durand, Mikael Gasparian, Thomas Rou-vinez, Imad Aad, Torsten Braun, and Tuan AnhTrinh. BitWorker, a decentralized distributed com-puting system based on bittorrent. In Proceedingsof the 13th International Conference on Wired/Wir-less Internet Communications, WWIC ’15, pages151–164. Springer, May 2015.

[18] Paolo Faraboschi, Kimberly Keeton, Tim Mars-land, and Dejan Milojicic. Beyond processor-centric operating systems. In Proceedings of the15th Workshop on Hot Topics in Operating Systems,HotOS XV, Kartause Ittingen, Switzerland, May2015. USENIX Association.

[19] Thomas Fehrenbach, Rohit Datta, Baris Göktepe,Thomas Wirth, and Cornelius Hellge. URLLC ser-vices in 5G low latency enhancements for LTE. InProceedings of the 88th IEEE Vehicular TechnologyConference, VTC-Fall, pages 1–6, 2018.

[20] Yaru Fu, Xiaolong Yang, Peng Yang, K.Y. Wong,Zheng Shi, Hong Wang, and Tony Q.S. Quek.Energy-efficient offloading and resource allocationfor mobile edge computing enabled mission-criticalinternet-of-things systems. EURASIP Journal onWireless Communications and Networking, Febru-ary 2021.

[21] Vasilis Gavrielatos, Antonios Katsarakis, ArpitJoshi, Nicolai Oswald, Boris Grot, and Vijay Na-garajan. Scale-out ccNUMA: Exploiting skew withstrongly consistent caching. In Proceedings of the13th EuroSys Conference, EuroSys ’18, New York,NY, USA, 2018. Association for Computing Ma-chinery.

[22] Juncheng Gu, Youngmoon Lee, Yiwen Zhang,Mosharaf Chowdhury, and Kang G. Shin. Efficientmemory disaggregation with infiniswap. In Pro-ceedings of the 14th USENIX Symposium on Net-worked Systems Design and Implementation, NSDI’17, pages 649–667, Boston, MA, March 2017.USENIX Association.

[23] Kiryong Ha, Yoshihisa Abe, Thomas Eiszler, ZhuoChen, Wenlu Hu, Brandon Amos, Rohit Upad-hyaya, Padmanabhan Pillai, and Mahadev Satya-narayanan. You can teach elephants to dance: AgileVM handoff for edge computing. In Proceedings ofthe 2nd ACM/IEEE Symposium on Edge Comput-ing, SEC ’17, New York, NY, USA, 2017. Associ-ation for Computing Machinery.

[24] Kiryong Ha, Padmanabhan Pillai, WolfgangRichter, Yoshihisa Abe, and Mahadev Satya-narayanan. Just-in-time provisioning for cyber for-aging. In Proceeding of the 11th Annual Inter-national Conference on Mobile Systems, Applica-tions, and Services, MobiSys ’13, pages 153–166,New York, NY, USA, June 2013. Association forComputing Machinery.

8

https://www.nokia.com/blog/wi-fi-6-and-wi-fi-6e-better-faster-more/

https://www.nokia.com/blog/wi-fi-6-and-wi-fi-6e-better-faster-more/

[25] Karim Habak, Mostafa Ammar, Khaled A. Harras,and Ellen Zegura. Femto clouds: Leveraging mo-bile devices to provide cloud service at the edge.In Proceedings of the 8th IEEE International Con-ference on Cloud Computing, CLOUD ’15, pages9–16. IEEE, June 2015.

[26] Kyle C. Hale and Peter Dinda. An evaluation ofasynchronous events on modern hardware. In Pro-ceedings of the 26th IEEE International Symposiumon the Modeling, Analysis, and Simulation of Com-puter and Telecommunication Systems, MASCOTS’18. IEEE, September 2018.

[27] Mingzhe Hao, Huaicheng Li, Michael Hao Tong,Chrisma Pakha, Riza O. Suminto, Cesar A. Stu-ardo, Andrew A. Chien, and Haryadi S. Gunawi.MittOS: Supporting millisecond tail tolerance withfast rejecting SLO-aware OS interface. In Proceed-ings of the 26th Symposium on Operating SystemsPrinciples, SOSP ’17, pages 168–183, New York,NY, USA, October 2017. Association for Comput-ing Machinery.

[28] Hewlett-Packard, Inc. The machine: A newkind of computer. https://www.hpl.hp.com/research/systems-research/themachine/.Accessed: 2020-01-10.

[29] Galen C. Hunt and Michael L. Scott. The Coignautomatic distributed partitioning system. In Pro-ceedings of the 3rd Symposium on Operating Sys-tems Design and Implementation, OSDI ’99, pages187–200, USA, 1999. USENIX Association.

[30] Stefan Kaestle, Reto Achermann, Roni Haecki,Moritz Hoffmann, Sabela Ramos, and TimothyRoscoe. Machine-aware atomic broadcast trees formulticores. In Proceedings of the 12th USENIXSymposium on Operating Systems Design and Im-plementation, OSDI ’16, pages 33–48, Savannah,GA, November 2016. USENIX Association.

[31] Anuj Kalia, Michael Kaminsky, and David G. An-dersen. Design guidelines for high performanceRDMA systems. In Proceedings of the 2016USENIX Annual Technical Conference, USENIXATC ’16, pages 437–450. USENIX Association,June 2016.

[32] Anuj Kalia, Michael Kaminsky, and David G. An-dersen. Design guidelines for high performanceRDMA systems. In Proceedings of the 2016USENIX Annual Technical Conference, USENIXATC ’16, pages 437–450, Denver, CO, June 2016.USENIX Association.

[33] Nils Kopal, Matthäus Wander, Christopher Konze,and Henner Heck. Adaptive cheat detection indecentralized volunteer computing with untrustednodes. In Proceedings of the 17th IFIP Interna-tional Conference on Distributed Applications andInteroperable Systems, DAIS ’17, pages 192–205.Springer, June 2017.

[34] Martin Kristien, Tom Spink, Harry Wagstaff, BjörnFranke, Igor Böhm, and Nigel Topham. MitigatingJIT compilation latency in virtual execution envi-ronments. In Proceedings of the 15th ACM SIG-PLAN/SIGOPS International Conference on Vir-tual Execution Environments, VEE ’19, pages 101–107, New York, NY, USA, 2019. Association forComputing Machinery.

[35] Gyusun Lee, Seokha Shin, Wonsuk Song, Tae JunHam, Jae W. Lee, and Jinkyu Jeong. AsynchronousI/O stack: A low-latency kernel I/O stack for ultra-low latency SSDs. In Proceedings of the 2019USENIX Annual Technical Conference, USENIXATC ’19, pages 603–616, Renton, WA, July 2019.USENIX Association.

[36] Pejman Lotfi-Kamran, Boris Grot, Michael Ferd-man, Stavros Volos, Onur Kocberber, Javier Pi-corel, Almutaz Adileh, Djordje Jevdjic, Sachin Id-gunji, Emre Ozer, and Babak Falsafi. Scale-out pro-cessors. In Proceedings of the 39th Annual Interna-tional Symposium on Computer Architecture, ISCA’12, pages 500–511, USA, June 2012. IEEE Com-puter Society.

[37] Patrick Merias. Study on physical layer enhance-ments for NR ultra-reliable and low latency case(URLLC). Technical Report TR 38.824, release16, 3rd Generation Partnership Project (3GPP), July2018.

[38] Nathaniel Morris, Christopher Stewart, LydiaChen, Robert Birke, and Jaimie Kelley. Model-driven computational sprinting. In Proceedings ofthe 13th European Conference on Computer Sys-tems, EuroSys ’18, New York, NY, USA, April2018. Association for Computing Machinery.

[39] Arvind Narayanan, Eman Ramadan, Jason Carpen-ter, Qingxu Liu, Yu Liu, Feng Qian, and Zhi-LiZhang. A first look at commercial 5G performanceon smartphones. In Proceedings of The Web Con-ference, WWW ’20, pages 894–905, New York,NY, USA, April 2020. Association for ComputingMachinery.

9

https://www.hpl.hp.com/research/systems-research/themachine/

https://www.hpl.hp.com/research/systems-research/themachine/

[40] Rolf Neugebauer, Gianni Antichi, José FernandoZazo, Yury Audzevich, Sergio López-Buedo, andAndrew W. Moore. Understanding PCIe perfor-mance for end host networking. In Proceedings ofthe 2018 Conference of the ACM Special InterestGroup on Data Communication, SIGCOMM ’18,pages 327–341, New York, NY, USA, August 2018.Association for Computing Machinery.

[41] Khang T. Nguyen. APIC virtual-ization performance testing and io-zone. https://software.intel.com/content/www/us/en/develop/blogs/apic-virtualization-performance-testing-and-iozone.html, December 2013. Accessed 2020-12-20.

[42] Edmund B. Nightingale, Orion Hodson, Ross McIl-roy, Chris Hawblitzel, and Galen Hunt. He-lios: Heterogeneous multiprocessing with satel-lite kernels. In Proceedings of the 22nd ACMSIGOPS Symposium on Operating Systems Princi-ples, SOSP ’09, pages 221–234, New York, NY,USA, October 2009. Association for ComputingMachinery.

[43] Stanko Novakovic, Alexandros Daglis, EdouardBugnion, Babak Falsafi, and Boris Grot. Scale-outNUMA. In Proceedings of the 19th InternationalConference on Architectural Support for Program-ming Languages and Operating Systems, ASPLOS’14, pages 3–18, New York, NY, USA, 2014. Asso-ciation for Computing Machinery.

[44] Open Connectivity Foundation. UPnP Standard.https://openconnectivity.org/developer/specifications/upnp-resources/upnp/,2021. Accessed 2020-11-05.

[45] John Ousterhout, Parag Agrawal, David Erickson,Christos Kozyrakis, Jacob Leverich, David Maz-ières, Subhasish Mitra, Aravind Narayanan, GuruParulkar, Mendel Rosenblum, Stephen M. Rumble,Eric Stratmann, and Ryan Stutsman. The case forRAMClouds: Scalable high-performance storageentirely in DRAM. SIGOPS Operating Systems Re-view, 43(4):92–105, January 2010.

[46] Jiannan Ouyang, Brian Kocoloski, John R. Lange,and Kevin Pedretti. Achieving performance iso-lation with lightweight co-kernels. In Proceed-ings of the 24th International Symposium on High-Performance Parallel and Distributed Computing,HPDC ’15, pages 149–160, New York, NY, USA,June 2015. Association for Computing Machinery.

[47] Tirthak Patel and Devesh Tiwari. CLITE: Efficientand QoS-aware co-location of multiple latency-

critical jobs for warehouse scale computers. In Pro-ceedings of the IEEE International Symposium onHigh Performance Computer Architecture, HPCA’20, pages 193–206. IEEE, February 2020.

[48] Arun Raghavan, Yixin Luo, Anuj Chandawalla,Marios Papaefthymiou, Kevin P. Pipe, Thomas F.Wenisch, and Milo M. K. Martin. Computationalsprinting. In Proceedings of the 18th IEEE Interna-tional Symposium on High-Performance ComputerArchitecture, HPCA ’12, pages 1–12, USA, Febru-ary 2012. IEEE Computer Society.

[49] Zhenyuan Ruan, Malte Schwarzkopf, Marcos K.Aguilera, and Adam Belay. AIFM: High-performance, application-integrated far memory.In Proceedings of the 14th USENIX Symposiumon Operating Systems Design and Implementation,OSDI ’20, pages 315–332. USENIX Association,November 2020.

[50] Ardalan Amiri Sani and Thomas Anderson. Thecase for I/O-device-as-a-service. In Proceedings ofthe 17th Workshop on Hot Topics in Operating Sys-tems, HotOS XVII, pages 66–72, New York, NY,USA, 2019. Association for Computing Machinery.

[51] Mahadev Satyanarayanan. A brief history of cloudoffload: A personal journey from odyssey throughcyber foraging to cloudlets. GetMobile: Mo-bile Computing and Communications, 18(4):19–23, January 2015.

[52] Mahadev Satyanarayanan, Wei Gao, and BrandonLucia. The computing landscape of the 21st cen-tury. In Proceedings of the 20th International Work-shop on Mobile Computing Systems and Applica-tions, HotMobile ’19, pages 45–50, New York, NY,USA, 2019. Association for Computing Machinery.

[53] Mahadev Satyanarayanan, Guenter Klas, MarcoSilva, and Simone Mangiante. The seminal roleof edge-native applications. In Proceedings ofthe 2019 IEEE International Conference on EdgeComputing, EDGE ’19, pages 33–40, 2019.

[54] Malte Schwarzkopf, Matthew P. Grosvenor, andSteven Hand. New wine in old skins: The casefor distributed operating systems in the data cen-ter. In Proceedings of the 4th Asia-Pacific Workshopon Systems, APSys ’13, New York, NY, USA, July2013. Association for Computing Machinery.

[55] Yizhou Shan, Yutong Huang, Yilun Chen, and Yiy-ing Zhang. LegoOS: A disseminated, distributed

10

https://software.intel.com/content/www/us/en/develop/blogs/apic-virtualization-performance-testing-and-iozone.html




https://openconnectivity.org/developer/specifications/upnp-resources/upnp/

https://openconnectivity.org/developer/specifications/upnp-resources/upnp/

OS for hardware resource disaggregation. In Pro-ceedings of the 13th USENIX Symposium on Oper-ating Systems Design and Implementation, OSDI’18, pages 69–87, Carlsbad, CA, October 2018.USENIX Association.

[56] Jan Solanti, Michal Babej, Julius Ikkala, and PekkaJääskeläinen. POCL-R: Distributed OpenCL run-time for low latency remote offloading. In Proceed-ings of the International Workshop on OpenCL,IWOCL ’20, New York, NY, USA, 2020. Associ-ation for Computing Machinery.

[57] Ya-Yunn Su and Jason Flinn. Slingshot: Deployingstateful services in wireless hotspots. In Proceed-ings of the 3rd International Conference on MobileSystems, Applications, and Services, MobiSys ’05,pages 79–92, New York, NY, USA, 2005. Associa-tion for Computing Machinery.

[58] Kaixin Sui, Mengyu Zhou, Dapeng Liu, MinghuaMa, Dan Pei, Youjian Zhao, Zimu Li, and ThomasMoscibroda. Characterizing and improving WiFilatency in large-scale operational networks. In Pro-ceedings of the 14th Annual International Confer-ence on Mobile Systems, Applications, and Ser-vices, MobiSys ’16, pages 347–360, New York,NY, USA, June 2016. Association for ComputingMachinery.

[59] Andrew S. Tanenbaum and Robbert Van Renesse.Distributed operating systems. ACM ComputingSurveys, 17(4):419–470, December 1985.

[60] Liang Tong, Yong Li, and Wei Gao. A hierarchicaledge cloud architecture for mobile computing. InProceedings of the 35th Annual IEEE InternationalConference on Computer Communications, INFO-COM ’16, pages 1–9. IEEE, April 2016.

[61] Animesh Trivedi, Lin Wang, Henri Bal, andAlexandru Iosup. Sharing and caring of data at theedge. In Proceedings of the 3rd USENIX Workshopon Hot Topics in Edge Computing, HotEdge ’20.USENIX Association, June 2020.

[62] Shin-Yeh Tsai and Yiying Zhang. LITE kernelRDMA support for datacenter applications. In Pro-ceedings of the 26th Symposium on Operating Sys-tems Principles, SOSP ’17, pages 306–324, Octo-ber 2017.

[63] Nikos Vasilakis, Ben Karel, and Jonathan M.Smith. From lone dwarfs to giant superclusters:Rethinking operating system abstractions for thecloud. In Proceedings of the 15th Workshop on HotTopics in Operating Systems, HotOS XV, Kartause

Ittingen, Switzerland, May 2015. USENIX Associ-ation.

[64] Nicholas Wanninger, Joshua J. Bowden, andKyle C. Hale. Virtines: Virtualization at functioncall granularity, 2021.

[65] Chenhao Xie, Xie Li, Yang Hu, Huwan Peng,Michael Taylor, and Shuaiwen Leon Song. Q-VR:System-level design for future mobile collaborativevirtual reality. In Proceedings of the 26th ACMInternational Conference on Architectural Supportfor Programming Languages and Operating Sys-tems, ASPLOS ’21, pages 587–599, New York, NY,USA, 2021. Association for Computing Machinery.

[66] Hangchen Yu, Arthur Michener Peters, Amogh Ak-shintala, and Christopher J. Rossbach. AvA: Accel-erated virtualization of accelerators. In Proceed-ings of the 25th International Conference on Archi-tectural Support for Programming Languages andOperating Systems, ASPLOS ’20, pages 807–825,New York, NY, USA, March 2020. Association forComputing Machinery.

[67] Gerd Zellweger, Simon Gerber, Kornilios Kourtis,and Timothy Roscoe. Decoupling cores, kernels,and operating systems. In Proceedings of the 11th

USENIX Symposium on Operating Systems Designand Implementation, OSDI ’14, pages 17–31, USA,October 2014. USENIX Association.

[68] Steffen Zeuch, Eleni Tzirita Zacharatou, ShuhaoZhang, Xenofon Chatziliadis, Ankit Chaudhary,Bonaventura Del Monte, Dimitrios Giouroukis,Philipp M Grulich, Ariane Ziehn, and VolkerMark. NebulaStream: Complex analytics beyondthe cloud. In Proceedings of the InternationalWorkshop on Very Large Internet of Things, VLIoT’20, August 2020.

[69] Jin Zhang, Zhuocheng Ding, Yubin Chen, Xing-guo Jia, Boshi Yu, Zhengwei Qi, and HaibingGuan. GiantVM: A Type-II hypervisor imple-menting many-to-one virtualization. In Proceed-ings of the 16th ACM SIGPLAN/SIGOPS Interna-tional Conference on Virtual Execution Environ-ments, VEE ’20, pages 30–44, New York, NY,USA, 2020. Association for Computing Machinery.

[70] Wenxiao Zhang, Bo Han, and Pan Hui. On the net-working challenges of mobile augmented reality. InProceedings of the Workshop on Virtual Reality andAugmented Reality Network, VR/AR Network ’17,pages 24–29, New York, NY, USA, 2017. Associa-tion for Computing Machinery.

11

[71] Yiying Zhang, Jian Yang, Amirsaman Memaripour,and Steven Swanson. Mojim: A reliable andhighly-available non-volatile memory system. InProceedings of the 20th International Conferenceon Architectural Support for Programming Lan-

guages and Operating Systems, ASPLOS ’15,pages 3–18, New York, NY, USA, 2015. Associ-ation for Computing Machinery.

12

Documents

Coalescent Computing - arXiv