
Preprint, to appear in Procs. IEEE International Conference on Cloud Engineering (IC2E 2020), 21-24 April, Sydney, Australia

The Ifs and Buts of Less is More: A Serverless Computing Reality Check

Jörn Kuhlenkamp
Information Systems Engineering
Technische Universität Berlin
Berlin, Germany
[email protected]

Sebastian Werner
Information Systems Engineering
Technische Universität Berlin
Berlin, Germany
[email protected]

Stefan Tai
Information Systems Engineering
Technische Universität Berlin
Berlin, Germany
[email protected]

Abstract—Serverless computing defines a pay-as-you-go cloud execution model, where the unit of computation is a function that a cloud provider executes and auto-scales on behalf of a cloud consumer. Serverless suggests not (or less) caring about servers but focusing (more) on business logic expressed in functions. Server'less' may be 'more' when getting developer expectations and platform propositions right and when engineering solutions that take specific behavior and constraints of (current) Function-as-a-Service platforms into account. To this end, in this invited paper, we present a summary of findings and lessons learned from a series of research experiments conducted over the past two years. We argue that careful attention must be placed on the promises associated with the serverless model, provide a reality check for five common assumptions, and suggest ways to mitigate unwanted effects. Our findings focus on application workload distribution and computational processing complexity, the specific auto-scaling mechanisms in place, the behavior and strategies implemented with operational tasks, the constraints and limitations existing when composing functions, and the costs of executing functions.

Keywords-serverless computing, function-as-a-service (FaaS), FaaS platforms, FaaS experimentation.

INTRODUCTION

Serverless computing is emerging as an alternative cloud execution model. Instead of leasing infrastructure resources as in traditional cloud systems, developers using serverless computing just submit function code to a cloud provider; thus the term "Function-as-a-Service (FaaS)". The general promise is that the provider runs the code on-demand, scales automatically as needed, and bills the consumer only for the time that the function code is running.

Serverless computing thus suggests abstracting away operational concerns, offering a simplified programming and deployment model for cloud applications, and offering a cost advantage. Because cloud developers "do less", i.e., do not care about provisioning and operating machines, so the promise goes, they can focus on and "do more" of the business aspects of their applications.

In this paper, we identify a number of ifs and buts for making this "less is more" promise a reality. We argue that careful attention must be paid to common assumptions existing with FaaS platforms and that these assumptions must be questioned, taking the specifics and idiosyncrasies of (current) FaaS platforms into account. We identify five such common assumptions and corresponding engineering needs, related to application workload distribution and function processing complexity, auto-scaling based on events, the importance of operational tasks, function compositions and dependencies, and execution costs. Without a good understanding of the reality and limitations of (current) FaaS offerings and implementations, developers run the risk of facing even more responsibilities related to fixing undesirable system behavior, dealing with system failures, or running into unexpected costs. Neither the promise of a simplified execution model nor that of a cost advantage would then be fulfilled.

Our findings are based on and summarize prior research and experimental work on serverless computing conducted over the last two years. This paper thus presents a cumulative, integrative view of what we have been observing (and are still learning about) serverless computing. Due to the dynamic nature of the FaaS market and concrete FaaS offerings evolving at rapid speed, this paper is to some extent of a situational nature. Still, we believe our findings to be critical for the continued, larger objective of computing conveniently, cost-efficiently, with auto-scaling and minimal infrastructure management efforts, in the cloud.

BACKGROUND

Application developers can choose from a wide range of both commercial fully-managed FaaS platforms [1] and non-commercial open-source FaaS platforms [2]. Examples of fully-managed offers include AWS Lambda (https://aws.amazon.com/de/lambda), Google Cloud Functions (https://cloud.google.com/functions), Azure Functions (https://azure.microsoft.com/services/functions), and IBM Cloud Functions (https://www.ibm.com/cloud/functions). Prominent candidates of open-source FaaS platform implementations are OpenWhisk (https://openwhisk.apache.org), Fission (https://fission.io/), and Knative (https://cloud.google.com/knative).

Common to all FaaS platforms is that each platform defines a programming model, a function execution model, and a cost model.

At the core of serverless computing is the function abstraction: a function is a specific task that takes some input and returns some output. From an application programming perspective, functions become the main and dominating programming abstraction. From an application execution perspective, every function execution is isolated and typically ephemeral.

The single execution of a function is triggered by events raised by event sources (function triggers), for example, HTTP requests or a database update. A function handler implementation must be provided by the application developer. In general, FaaS platforms support different programming languages and models, e.g., Python, Go, or NodeJS. The handler implementation then contains all the code and libraries necessary to run the function, including, for example, memory settings for the programming runtime and operating system of choice, calls to external services like database or queuing services, or the definition of an explicit return statement.
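To make the programming model more concrete, the following minimal sketch shows what a function handler can look like in a Python runtime in the style of AWS Lambda; the event fields and the thumbnail logic are purely illustrative assumptions, not part of any specific platform's API.

```python
import json

def handler(event, context):
    # The platform invokes this entry point once per event, e.g., for an HTTP
    # request routed through an API gateway or for a database update record.
    product_id = event.get("product_id", "unknown")

    # Illustrative business logic: derive a thumbnail URL for the product.
    thumbnail_url = f"https://example.com/thumbnails/{product_id}.png"

    # The explicit return statement hands the result back to the platform,
    # which forwards it to the caller or to the next trigger in a composition.
    return {
        "statusCode": 200,
        "body": json.dumps({"thumbnail": thumbnail_url}),
    }
```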

The execution model determines how a FaaS platform schedules the processing of events and provisions and de-provisions runtimes. Different execution orders and execution guarantees, such as at-least-once, can be defined. After execution, the runtime is either kept active for a certain period of time or instantly destroyed. A FaaS platform can reuse a function runtime for newly arriving events.

Cost models of commercial FaaS platforms only include price components based on actual function executions. Examples of such components are the number of executions, execution duration, or used memory.
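As an illustration of such a cost model, the following sketch combines the price components named above into a per-batch bill; the default prices are placeholders for illustration and do not reflect the tariff of a specific provider.

```python
def execution_cost(invocations, avg_duration_s, memory_gb,
                   price_per_request=2e-7, price_per_gb_s=1.6667e-5):
    """Illustrative FaaS cost model: per-request price plus metered GB-seconds.

    The default prices are placeholders; real tariffs differ per provider,
    region, free tier, and billing granularity (e.g., 1 ms or 100 ms steps).
    """
    request_cost = invocations * price_per_request
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_s
    return request_cost + compute_cost

# Example: one million invocations of a 200 ms function configured with 512 MB.
print(execution_cost(1_000_000, avg_duration_s=0.2, memory_gb=0.5))
```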

RELATED WORK

This paper synthesizes and discusses insights gained in our own prior work related to serverless computing [3]–[8]. Figure 1 illustrates the six main publications, ordered in a timeline, that serve as a basis for this paper. We refer to these publications with the shorthand notation P1-P6.

Figure 1. Timeline of select own previous work in the area of serverless computing, 2018-2020: P1 (FaaS Cost-Tracing: Method & System), P2 (Big Data Processing: Design Guidelines), P3 (FaaS Benchmarking: Survey), P4 (Big Data Processing: Case Study), P5 (FaaS Benchmarking: Operational Tasks), P6 (FaaS Benchmarking: Elasticity).

Hendrickson et al. [9], Baldini et al. [10], Hellerstein et al. [11], and Jonas et al. [12] relate to our work in that they report on serverless computing experience, identify fallacies and pitfalls of current platforms, and propose an agenda for future research. Our work is in line with these reports; however, it is not primarily about proposing (yet another) research agenda, but about sharing lessons learned when addressing specific serverless computing research problems (for example, related to benchmarking or cost tracing). In P1-P6, we report on extensive experiments conducted and propose concrete methods and solutions to address these challenges.

COMMON ASSUMPTIONS AND CURRENT REALITY

Typical applications for which serverless computing has been suggested include: data analytics [13]–[15], scientific workflows [16], [17], video processing [18], [19], chatbots [20], and publish/subscribe systems [21], [22].

Using FaaS when engineering such applications raises a set of questions:

• How well does the function abstraction and execution model fit diverse application workloads and computational processing complexities?

• Are the auto-scaling features of (current) FaaS platforms satisfactory?

• Given the (new) importance of cloud provider-side operational tasks in FaaS architectures, is the set and actual behavior of available pre-defined tasks appropriate?

• Can functions easily be composed without the compositions experiencing undesirable behavior or limitations?

• Does FaaS indeed offer a significant cost advantage over traditional cloud computing models?

In the following, we take a closer look at these questions. We describe five main common assumptions and, for each common assumption, state the current, observed reality. Further, we discuss for each assumption the corresponding (engineering) challenge, ways to detect or mitigate the challenge, and further opportunities for future research.

WORKLOAD AND PROCESSING COMPLEXITY

Common Assumption 1. The single function abstraction provided by any FaaS platform suits all function needs independent of the complexity of application workloads.

Different types of applications use different ways to distribute workload in specific units of work that are processed by a single machine. Examples of units of work are: a web request that loads a thumbnail for a product in a webshop, an image that is resized and changed to a different format, or a batch of data records that are aggregated by adding a specific field.

Traditional cloud computing models require developers to select, from a large available pool, a virtual machine type and application framework that is tailored to the computational needs of the workload at hand. Moreover, developers must decide on appropriate means for queuing incoming workload and dispatching units of work to machines with sufficient capacity.

Developers expect FaaS platforms to process units of work from different existing application domains within the function abstraction. However, units of work can differ significantly regarding the inputs to and outputs from function handlers, and handler executions can have different computational needs.

Reality 1. Workload and processing complexity must match a FaaS platform's function handler settings.

In current reality, FaaS offerings do not distinguish different function 'types' but only provide a single function abstraction. For the single function abstraction provided, the execution models limit the amount and type of compute resources that are available to the runtime of a function handler. Furthermore, FaaS platforms can expose a strict upper bound for the execution time of a function handler.

Challenge: Many FaaS application scenarios will fit and execute well within the given 'standard' boundaries of a FaaS platform. For example, we found image resizing and processing single measurements in an IoT context to match current offerings well [3]. However, computationally intensive applications can easily exceed the time limit of common function handlers, making the function abstraction infeasible for specific application requirements. An example of such a case is matrix multiplication [4] in the context of big data processing, matrix multiplication being a fundamental type of computation useful in diverse application domains that has high computational needs compared to small input sizes (see Figure 2). In such cases, standard function handlers will fail execution and can quickly degrade the reliability of the application.

If a function abstracts a data-intensive functionality, loading inputs from and writing outputs to external storage can require a significant share of the maximum execution time of a function handler. Thus, only a small amount of time remains for processing inputs. This behavior, for instance, frequently occurs in data analytics scenarios.

Detection/Mitigation: We argue that continuous testing and monitoring of performance, reliability, and cost-efficiency are necessary approaches for detecting application defects that result from the above boundaries currently imposed by FaaS function handlers.

We discuss four mitigation approaches. First, application developers can design and implement an application-specific function decomposition based on a workload distribution model. Second, different FaaS platforms have different limitations for handler settings. Thus, an experiment-driven, comparative assessment of different FaaS platforms can expose these differences and help to select the FaaS platform that fits the application requirements best. Third, developers can mitigate the execution time boundary through the use of function chaining [23]. Function chaining re-executes the same function handler in response to the same event. While function chaining might improve reliability, it can impair cost-efficiency and performance. Lastly, augmenting the runtime life-cycle model of a FaaS platform with application-specific information could enable caching and/or batching of redundant work for multiple events. All mitigation approaches, however, imply significant additional responsibilities for application developers.
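A minimal sketch of the function chaining approach, assuming an AWS Lambda-style Python runtime: the handler checks the remaining execution time and, shortly before the limit, re-invokes itself asynchronously with the not-yet-processed work. The event layout and the process_item helper are hypothetical.

```python
import json
import os
import boto3

lambda_client = boto3.client("lambda")

def process_item(item):
    ...  # hypothetical unit of work for a single item

def handler(event, context):
    items = list(event.get("items", []))  # hypothetical: remaining units of work
    while items:
        # Stop early while roughly ten seconds of execution time remain.
        if context.get_remaining_time_in_millis() < 10_000:
            # Re-invoke the same function asynchronously with the remaining work.
            lambda_client.invoke(
                FunctionName=os.environ["AWS_LAMBDA_FUNCTION_NAME"],
                InvocationType="Event",
                Payload=json.dumps({"items": items}),
            )
            return {"status": "chained", "remaining": len(items)}
        process_item(items.pop(0))
    return {"status": "done"}
```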

Opportunity: In our previous work, we found that the requirement of implementing a new workload distribution model for the case of matrix multiplication essentially provided a simple and effective application-level tuning knob for the cost-performance trade-off. We achieved speedups by further decreasing the workload per invocation below the processable limit. Specifically, we increased the number of total splits (see Figure 2), which reduced the number of multiplications per single invocation.
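The tuning knob can be illustrated with a simplified, in-memory sketch of such a workload distribution model: increasing the number of splits shrinks the work per invocation at the price of more invocations (and, in the real composition of Figure 2, more object storage accesses). This is an illustrative simplification, not the exact decomposition used in P2.

```python
import numpy as np

def plan_block_tasks(a, b, splits):
    """Partition C = A @ B into independent block tasks, one per invocation.

    Increasing `splits` reduces the work per invocation (the tuning knob
    described above); in the composition of Figure 2 the blocks would live
    in object storage rather than in memory.
    """
    row_blocks = np.array_split(a, splits, axis=0)
    col_blocks = np.array_split(b, splits, axis=1)
    # Each (i, j) task multiplies one row block of A with one column block of B
    # and directly yields one block of the result matrix C.
    return [(i, j, row_blocks[i], col_blocks[j])
            for i in range(splits) for j in range(splits)]

def multiply_block(task):
    i, j, a_block, b_block = task
    return i, j, a_block @ b_block  # one function invocation's worth of work

# Example: 4 splits turn one large multiplication into 16 small tasks.
a, b = np.random.rand(400, 300), np.random.rand(300, 200)
results = [multiply_block(t) for t in plan_block_tasks(a, b, splits=4)]
```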

In order to support data-intensive applications, features for loading and writing data beyond the current boundaries of function handlers are needed. Specifically, function handler interfaces for responding to runtime life-cycle changes in the execution environment would be beneficial. This would allow, for example, a developer to defer writing the result until the execution environment is about to be paused instead of writing after every invocation.

Figure 2. Illustration of the resulting composition of functions and object storage for matrix multiplication (based on P2). The composition comprises functions such as InvokeWorkers, MultiplyPartitions, CollectPartition, UnifyPartitions, CollectSplit, and UnifySplits, operating on splits of the input matrices A and B in object storage to produce the result matrix C.

AUTO-SCALING BASED ON EVENTS

Common Assumption 2. Functions scale automatically and transparently with a volatile number of events.

Supporting applications that can deal with volatile workloads has always been one of the main drivers for the adoption of cloud computing. In the traditional cloud model, an application developer manages both load balancing and resource provisioning herself. One of the promises of the FaaS model is that developers can now hand these labor-intensive responsibilities off to the cloud provider, allowing the developer to concentrate more on implementing the business logic.

Reality 2. Standard provisioning and de-provisioning time of any FaaS platform may not suit an application's scaling needs.

Each FaaS provider implements a generic elasticity controller to respond to incoming events and to manage function environments on behalf of the user. None of the current FaaS offerings, however, allows a user to control this elasticity controller, which can result in a misalignment between application needs and platform behavior.

Challenge: One challenge that can occur due to this misalignment is temporarily in-elastic behavior. That is, a FaaS platform may temporarily not scale fast enough based on incoming events, resulting in degraded performance and reliability under increasing workloads. In our previous work [8], we detected that, for instance, Google Cloud Functions has a delayed scaling response to a volatile increase of events, as depicted in Figure 3.

Figure 3. Experiment-driven comparison of the FaaS platforms AWS Lambda and Google Cloud Functions under linearly increasing target requests per second (trps) from 0 to 120 trps over a 60-second interval (based on P6). Observable are temporary changes in three client-side metrics (Failed Requests, Client-side Request-Response Latency, Function Execution Latency) and two function-side metrics (Number of Executions per Second, Time of First Execution).

While the challenge of temporal in-elasticity results from too slow provisioning of execution environments, a second challenge arises from too fast, i.e., greedy, de-provisioning of execution environments. Greedy de-provisioning hinders the reuse of cacheable results from previous executions, for example, a large machine learning model downloaded before processing incoming images. Thus, greedy de-provisioning by the FaaS provider might degrade performance and incur undesirable costs under volatile workloads.

Detection/Mitigation: Detecting if an application is affected by these challenges can be labor-intensive and require a broad understanding of the application needs as well as of the FaaS platform. In [8], we present an approach to detect and quantify these problems. We also give a general overview of the current elasticity behavior of four major FaaS platforms.

Besides detecting if an application is affected, mitigation strategies exist that help to reduce the impact of these problems. The temporarily in-elastic behavior of a FaaS platform might not be avoidable. However, developers can reduce the start-up time of new function environments by reducing the size of the deployment package. This reduction in size will reduce the impact of the so-called cold-start problem [24] and thus lessen the impact of scaling delays. The greedy de-provisioning of an execution environment might also not be avoidable. However, developers can use external high-performance services to offload expensive pre- and post-conditions. An application that needs to acquire additional data might benefit if this data is cached in an ephemeral storage service [12], [25], and expensive authentication tasks could be externalized by relying on provider services instead.
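One common pattern for exploiting retained execution environments is to load expensive artifacts outside the handler so that warm invocations reuse them; the sketch below assumes a Python runtime in the style of AWS Lambda, and download_model and the MODEL_URL variable are hypothetical. Whether this pays off depends on how aggressively the platform de-provisions environments, which is exactly the behavior discussed above.

```python
import os

_model_cache = None  # module state survives while the environment is kept warm

def download_model(url):
    ...  # hypothetical: fetch a large model file from object storage

def load_model():
    global _model_cache
    if _model_cache is None:
        # Only the first (cold) invocation in this execution environment pays
        # the download cost; subsequent warm invocations reuse the cached model.
        _model_cache = download_model(os.environ.get("MODEL_URL", ""))
    return _model_cache

def handler(event, context):
    model = load_model()
    # ... use `model` to process the incoming event, e.g., classify an image ...
    return {"status": "processed"}
```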

Opportunity: New business and research opportunities exist to address the challenges of caching and offloading expensive pre- and post-conditions. Further, the extension of FaaS platforms with a more tunable elasticity controller is a viable path for a cloud provider to increase her value proposition.

OPERATIONAL TASKS

Common Assumption 3. A few fully managed operational tasks that execute transparently solve the diversity of management needs.

The management of applications requires the execution of different operational tasks throughout their lifetime. Examples include updating deployment packages, changing routes, rolling back updates, or resizing resources. Furthermore, even the same operational task can have varying quality requirements. For instance, some periods of unavailability of a function may be acceptable for the roll-out of a security fix. In contrast, the roll-out of a new version should not impact availability.

Reality 3. FaaS platforms implement and apply operational tasks with fixed strategies.

Figure 4. Comparison of the FaaS platforms AWS Lambda and Google Cloud Functions for deploying a new code version into production, based on the client-side visible (i) Version Consistency and (ii) Update Delay (based on P5).

Each FaaS platform implements a small set of operational tasks, e.g., for deployment package or function configuration changes, based on generic platform requirements. Each FaaS platform, however, stops short of offering specialized strategies for executing operational tasks in support of specific management needs, e.g., rollbacks or security fixes.

Challenge: One challenge that the one-solution-fits-all approach creates is that critical management tasks like rolling out security patches are treated by the platform in the same way as a common feature update. Our previous experimentation [7] shows that some FaaS platforms delay the roll-out of deployment package changes; see the update delay of Google Cloud Functions (ii) in Figure 4. Thus, older code versions remain exposed for a variable amount of time. Such behavior can be undesirable if the older version has a costly defect. In particular, Microsoft's and Google's platforms show this behavior [7].

A second challenge can arise during a release that requires the atomic change of multiple functions. For example, a function A depends on a function B. Updating B to B' makes an update of A to A' mandatory. Thus, using A and B' or A' and B results in runtime errors. Traditionally, release engineers can successfully address this challenge by using one of two approaches: (i) a release unit or (ii) a coordinated release. A single release unit would bundle A' and B' in a single artifact for release. A coordinated release would use additional means of coordination that enforce the correct pairing of functions. The fine-granular programming model of FaaS can make release units infeasible. Furthermore, FaaS platforms restrict coordinated releases due to the fully-managed nature of the associated operational tasks. These operational tasks are often undocumented, have client-side visible side-effects, and can significantly differ across FaaS platforms (see Figure 4).

Detection/Mitigation: We argue that experimentation and testing of the effects of operational tasks are needed to understand these risks. In prior work [7], we presented a method to evaluate and understand these risks during the development life-cycle of a FaaS application.

We further recommend that developers evaluate and select FaaS platforms that better fit their risk scenarios, as the implementation of operational tasks differs significantly across the major cloud providers. To further mitigate these issues, we recommend implementing feature toggles [26] that can enable more complex roll-out behavior. For example, feature toggles can prevent calls to incompatible downstream services. Lastly, we recommend adding version references to events to allow functions to handle out-of-date events programmatically.
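A minimal sketch of the last two recommendations, with hypothetical names: a feature toggle read from the environment decides whether the new downstream function is called, and a schema_version field in the event lets the function handle out-of-date events programmatically instead of failing at an incompatible call.

```python
import os

SUPPORTED_VERSIONS = {2}  # schema versions this function version can handle

# Hypothetical feature toggle: only call the updated downstream function B'
# once the toggle is switched on, e.g., after B' has been fully rolled out.
CALL_NEW_DOWNSTREAM = os.environ.get("FEATURE_NEW_B", "off") == "on"

def call_b(event):
    ...  # hypothetical call to the old downstream function B

def call_b_prime(event):
    ...  # hypothetical call to the updated downstream function B'

def handler(event, context):
    version = event.get("schema_version", 1)  # hypothetical version reference
    if version not in SUPPORTED_VERSIONS:
        # Handle out-of-date events programmatically, e.g., translate, park,
        # or acknowledge them instead of failing at an incompatible call.
        return {"status": "skipped", "reason": f"unsupported schema_version {version}"}

    result = call_b_prime(event) if CALL_NEW_DOWNSTREAM else call_b(event)
    return {"status": "ok", "result": result}
```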

Opportunity: Applications that leverage the pre-defined operational tasks of FaaS providers can realize significant benefits. For instance, functions can resize resources during the runtime of a job to mitigate errors that would previously result in failures. Furthermore, developers can build more reliable data processing pipelines that do not suffer from out-of-memory problems for skewed input distributions [7].

Another possibility for applications that can leverage the operational tasks effectively is the ad-hoc customization of deployment packages based on event inspection. Our previous work [7] indicates that providers like AWS apply deployment package changes quickly, allowing, for instance, the deployment of different sorting strategies based on the content of events.

FUNCTION COMPOSITION

Common Assumption 4. Functions can easily be composed to implement any business logic.

FaaS platforms lend themselves well to diverse types of applications. Each application has different function composition needs. Ideally, function composition comes with as few constraints and restrictions as possible.

Reality 4. Function compositions experience limitations in the choice of runtime environments and resource capabilities outside of the control of the application developer.

Current FaaS platforms (naturally) limit the control of the deployed execution environment. Moreover, persistent storage in execution environments is often restricted [4]. Furthermore, these environments only have limited connectivity and can only be invoked through a finite set of triggers.

The implementation of even simple business logic may quickly require composing additional services. A FaaS-based analytics process, for example, will need external storage services to store intermediate and final results, as experienced during the FaaS-ification of a distributed matrix multiplication scenario [4]. Figure 2 illustrates this application, where both function composition and workload distribution (see Reality 1) were critical.

Figure 5. Cost defect example: Illustration of a composition realizing the IngestMeterReading operation of a Smart Grid Meter Application (based on P1). The composition comprises the functions ConvertReading, CheckReading, and PredictProduction together with the storage services MeterReadings, ProductionPredictions, and Alarms, connected by synchronous and asynchronous data flows.

Challenge: There are FaaS-specific service composition needs due to environmental restrictions that result in several challenges for developers.

Debugging complex service compositions can be more complicated than in traditional cloud models. In particular, detecting the root cause of errors requires analyzing executions in heterogeneous, highly decoupled systems, requiring developers to understand multiple different services and different debugging and logging tools [27].

A further challenge relates to co-locating computation and data. Especially for data-intensive applications, it is beneficial to move computation towards the data. However, FaaS platforms require moving the data to the computation.

It can further be challenging to migrate complex composed applications to another cloud provider. Both the interfaces and qualities of specialized provider services can be unique. Thus, any such migration is very difficult and labor-intensive. Selecting an alternative service on a different cloud provider without significant re-engineering of the application is not trivial.

Detection/Mitigation: One way to mitigate the challenges of debugging FaaS-based applications is to introduce better monitoring and testing that covers all services used by an application. In particular, the use of distributed tracing and distributed logging services with automated analysis can help to reduce this overhead.
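One lightweight building block for such tracing is to propagate a correlation identifier with every event so that log lines from all functions of a composition can later be joined; the sketch below is illustrative, and forward_event stands in for whatever trigger or queue connects the functions.

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def forward_event(payload):
    ...  # hypothetical: publish to the trigger/queue of the next function

def handler(event, context):
    # Reuse the caller's correlation id, or start a new trace at the edge.
    correlation_id = event.get("correlation_id") or str(uuid.uuid4())

    # Structured log lines keyed by the correlation id can be joined across
    # all functions and external services of the composition afterwards.
    logger.info(json.dumps({"correlation_id": correlation_id,
                            "msg": "processing event"}))

    result = {"correlation_id": correlation_id, "payload": event.get("payload")}
    forward_event(result)  # downstream functions keep propagating the id
    return result
```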

In order to address the data-locality challenges, we recommend the use of specialized storage services for sharing ephemeral data that can deal with the workload of FaaS applications [25]. We also recommend evaluating the different service offerings for getting data to a FaaS application to find the best fit for a given application. Lastly, the extension of the FaaS programming and execution model with data management operations might enable better data processing performance.

Figure 6. Illustration of the composition-driven re-execution cost defect based on 94 calls to the IngestMeterReading operation (see Figure 5). Executions belong to one of four clusters (55 traces with no failures, 3 with one failure, 16 with two failures, and 20 with three failures) that are subject to 0-3 re-execution attempts of the PredictProduction function with associated increased execution latency and marginal execution costs (based on P1).

Opportunity: FaaS allows implementing ad-hoc large-scale computations in big data processing systems with low entry barriers [6]. FaaS can enable large-scale computation within seconds, as many experiments have shown [13], [19].

EXECUTION COSTS

Common Assumption 5. FaaS offers a significant cost advantage compared to traditional cloud computing models.

The cost calculation of traditional cloud deployment models is typically based on the size and duration of the provisioned machines. Thus, provisioned machines with low utilization are not cost-efficient. It is the responsibility of application developers to ensure that appropriate mechanisms for combined load-balancing and provisioning are in place that ensure high utilization of provisioned machines over time.

In contrast, in serverless computing using FaaS, the cloud provider bills the consumer only for the time that a function is actually running. Thus, application developers need to invest less time and work in load-balancing and provisioning while avoiding paying for idle machines. As a consequence, cost savings are the main expectation of application developers for using FaaS [23].

Reality 5. The marginal costs of running code are higher, and compositions can become additional cost drivers.

The cost model of FaaS can realize significant cost benefits for volatile workloads by avoiding charges for idle machines.


However, case studies show that for specific workloads, the marginal costs of executing code are lower [28] or higher [29] compared to traditional cloud computing models. Thus, configuring the "right" amount of computing resources for executing a function, i.e., sizing, remains a critical management task for developers. In practice, sizing is often non-trivial because of the implicit cost-performance trade-off that has to be solved based on the requirements of a specific application context.

Moreover, functions do not execute in isolation but in compositions. The composition logic can become an additional cost driver. We provide three examples of composition-related cost drivers in [3]. They include:

• Re-execution: Failing downstream services can result in multiple executions of upstream functions.

• Synchronous Calls: Functions that make synchronous calls to backend services count as executing and are billed during the wait-time.

• Repeated Calls: A fine-granular workload distribution (see Reality 1), isolation, and ephemeral storage (see Reality 4) can increase the total number of calls to backend services, e.g., databases, for the same amount of work [30].

Detection/Mitigation: Testing, experimentation, and monitoring are appropriate means for detecting the need for sizing-related cost-debugging. Sizing is a recurring task for new code commits. Thus, sizing tasks should be automated. First approaches in support of automating sizing decisions for isolated AWS Lambda functions exist (e.g., https://github.com/alexcasalboni/aws-lambda-power-tuning). While such approaches are a promising start, sizing decisions should not be made for isolated functions but should take composition logic into account. Debugging composition-driven cost defects can prove difficult. Leitner et al. [31] propose an approach for detecting costs per service and application. However, in many cases, detecting the root cause of cost defects requires cost-tracing infrastructure. Unfortunately, due to the high level of abstraction, tracing can be challenging in a serverless environment. Thus, we propose a corresponding cost-tracing approach [3]. We argue that for some applications, static code analysis can be appropriate to detect composition-driven cost defects. However, to the best of our knowledge, respective approaches are still missing.
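The sizing decision for an isolated function can be sketched as picking, from measured duration samples per memory configuration, the cheapest configuration that still meets a latency target; the prices and measurements below are placeholders, and tools such as the power-tuning project linked above automate a similar, measurement-driven exploration.

```python
PRICE_PER_GB_S = 1.6667e-5  # placeholder price, not a specific provider's tariff

def pick_memory_size(measurements, latency_slo_s):
    """Pick the cheapest memory configuration that meets a latency target.

    `measurements` maps memory size in GB to the average observed duration in
    seconds for a representative workload of the function under test.
    """
    candidates = []
    for memory_gb, duration_s in measurements.items():
        if duration_s <= latency_slo_s:
            cost_per_invocation = duration_s * memory_gb * PRICE_PER_GB_S
            candidates.append((cost_per_invocation, memory_gb))
    if not candidates:
        raise ValueError("no configuration meets the latency target")
    return min(candidates)[1]

# Illustrative samples: more memory usually also means more CPU and shorter
# durations, which is exactly the cost-performance trade-off described above.
samples = {0.25: 1.8, 0.5: 0.9, 1.0: 0.5, 2.0: 0.45}
print(pick_memory_size(samples, latency_slo_s=1.0))  # -> 0.5
```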

Opportunity: The fine-granular metering model of the serverless abstraction enables a plethora of new opportunities. For example, application developers could offer corresponding fine-granular cost models for application-level APIs. For large computational tasks, e.g., big data analytics, the serverless abstraction could enable job-level billing, budgets, and runtime budgets.

CONCLUSION

Serverless computing and Function-as-a-Service platforms are gaining traction for good reasons. The serverless programming model, however, introducing a specific function abstraction, and the serverless execution model, introducing dependencies on event and runtime handler implementations and configurations, are inherently different from conventional cloud computing practice. In this paper, we describe a set of five common serverless assumptions and provide a state-of-the-art assessment for each. This reality check, based on a series of extensive experiments over the last two years, reveals important challenges that, if not carefully addressed, can easily counter the serverless promises. We expect that some observed realities will change as FaaS platforms and implementations change. FaaS platform propositions and developer expectations must, however, be continuously questioned and verified. To that end, the need for continuous experimentation and benchmarking, along with the definition of appropriate metrics and measurement methods, remains important future work.

REFERENCES

[1] H. Lee, K. Satyam, and G. Fox, "Evaluation of Production Serverless Computing Environments," in Proc. of the IEEE 11th International Conference on Cloud Computing, ser. CLOUD'18. San Francisco, CA, USA: IEEE, July 2018, pp. 442–450.

[2] K. Kritikos and P. Skrzypek, "A review of serverless frameworks," in 2018 IEEE/ACM International Conference on Utility and Cloud Computing Companion, ser. UCC'18. IEEE, Dec 2018, pp. 161–168.

[3] J. Kuhlenkamp and M. Klems, "Costradamus: A cost-tracing system for cloud-based software services," in Service-Oriented Computing, ser. ICSOC'15, M. Maximilien, A. Vallecillo, J. Wang, and M. Oriol, Eds. Cham: Springer International Publishing, 2017, pp. 657–672.

[4] S. Werner, J. Kuhlenkamp, M. Klems, J. Müller, and S. Tai, "Serverless big data processing using matrix multiplication as example," in 2018 IEEE International Conference on Big Data. Seattle, WA, USA: IEEE, Dec 2018, pp. 358–365.

[5] J. Kuhlenkamp and S. Werner, "Benchmarking FaaS Platforms: Call for Community Participation," in Proc. of the 3rd International Workshop on Serverless Computing, ser. WoSC'18. Zurich, Switzerland: IEEE, Dec 2018, pp. 189–194.

[6] S. Werner, J. Kuhlenkamp, M. Klems, J. Müller, and S. Tai, "Serverless big data processing using matrix multiplication as example," in INFORMATIK 2019: 50 Jahre Gesellschaft für Informatik - Informatik für Gesellschaft, K. David, K. Geihs, M. Lange, and G. Stumme, Eds. Bonn: Gesellschaft für Informatik e.V., 2019, pp. 263–264.

[7] J. Kuhlenkamp, S. Werner, M. C. Borges, K. El Tal, and S. Tai, "An evaluation of FaaS platforms as a foundation for serverless big data processing," in Conference on Utility and Cloud Computing, ser. UCC'19. New York, NY, USA: ACM, 2019, pp. 1–9.

[8] J. Kuhlenkamp, S. Werner, M. C. Borges, D. Ernst, and D. Wenzel, "Benchmarking elasticity of FaaS platforms as a foundation for objective-driven design of serverless applications," in Proc. of the 34th ACM/SIGAPP Symposium on Applied Computing, ser. SAC'19. New York, NY, USA: ACM, 2019, pp. 284–291.

[9] S. Hendrickson, S. Sturdevant, T. Harter, V. Venkataramani, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, "Serverless computation with OpenLambda," in 8th USENIX Workshop on Hot Topics in Cloud Computing, ser. HotCloud'16. Denver, CO: USENIX Association, 2016, pp. 14–19.

[10] I. Baldini, P. Castro, K. Chang, P. Cheng, S. Fink, V. Ishakian, N. Mitchell, V. Muthusamy, R. Rabbah, A. Slominski, and P. Suter, "Serverless computing: Current trends and open problems," in Research Advances in Cloud Computing, S. Chaudhary, G. Somani, and R. Buyya, Eds. Singapore: Springer Singapore, 2017, pp. 1–20.

[11] J. M. Hellerstein, J. M. Faleiro, J. E. Gonzalez, J. Schleier-Smith, V. Sreekanti, A. Tumanov, and C. Wu, "Serverless Computing: One Step Forward, Two Steps Back," CoRR, vol. abs/1812.03651, pp. 1–8, Jan 2018.

[12] E. Jonas, J. Schleier-Smith, V. Sreekanti, C.-C. Tsai, A. Khandelwal, Q. Pu, V. Shankar, J. Carreira, K. Krauth, N. Yadwadkar, J. E. Gonzalez, R. A. Popa, I. Stoica, and D. A. Patterson, "Cloud Programming Simplified: A Berkeley View on Serverless Computing," CoRR, vol. abs/1902.03383, pp. 1–33, Feb 2019.

[13] E. Jonas, Q. Pu, S. Venkataraman, I. Stoica, and B. Recht, "Occupy the Cloud: Distributed Computing for the 99%," in Proc. of the 2017 Symposium on Cloud Computing, ser. SoCC'17. New York, NY, USA: ACM, 2017, pp. 445–451.

[14] Y. Kim and J. Lin, "Serverless Data Analytics with Flint," CoRR, vol. abs/1803.06354, Aug 2018.

[15] J. Sampé, G. Vernik, M. Sánchez-Artigas, and P. García-López, "Serverless Data Analytics in the IBM Cloud," in Proc. of the 19th International Middleware Conference Industry, ser. Middleware'18. New York, NY, USA: ACM, 2018, pp. 1–8.

[16] M. Malawski, A. Gajek, A. Zima, B. Balis, and K. Figiela, "Serverless Execution of Scientific Workflows: Experiments with HyperFlow, AWS Lambda and Google Cloud Functions," Future Generation Computer Systems, Nov 2017.

[17] G. C. Fox, V. Ishakian, V. Muthusamy, and A. Slominski, "Status of serverless computing and function-as-a-service (FaaS) in industry and research," CoRR, vol. abs/1708.08028, Aug 2017.

[18] L. Ao, L. Izhikevich, G. M. Voelker, and G. Porter, "Sprocket: A Serverless Video Processing Framework," in Proc. of the ACM Symposium on Cloud Computing, ser. SoCC'18. New York, NY, USA: ACM, 2018, pp. 263–274.

[19] S. Fouladi, R. S. Wahby, B. Shacklett, K. V. Balasubramaniam, W. Zeng, R. Bhalerao, A. Sivaraman, G. Porter, and K. Winstein, "Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads," in Proc. of the 14th USENIX Symposium on Networked Systems Design and Implementation, ser. NSDI'17. Boston, MA, USA: USENIX Association, 2017, pp. 363–376.

[20] M. Yan, P. Castro, P. Cheng, and V. Ishakian, "Building a chatbot with serverless computing," in 1st International Workshop on Mashups of Things and APIs, ser. MOTA'16. New York, NY, USA: ACM, 2016, pp. 5:1–5:4.

[21] P. Nasirifard, A. Slominski, V. Muthusamy, V. Ishakian, and H.-A. Jacobsen, "A Serverless Topic-based and Content-based Pub/Sub Broker: Demo," in Proc. of the 18th ACM/IFIP/USENIX Middleware Conference: Posters and Demos, ser. Middleware'17. New York, NY, USA: ACM, 2017, pp. 23–24.

[22] F. Hafeez, P. Nasirifard, and H.-A. Jacobsen, "A serverless approach to publish/subscribe systems," in 19th International Middleware Conference (Posters), ser. Middleware'18. New York, NY, USA: ACM, 2018, pp. 9–10.

[23] P. Leitner, E. Wittern, J. Spillner, and W. Hummer, "A mixed-method empirical study of function-as-a-service software development in industrial practice," Journal of Systems and Software, vol. 149, pp. 340–359, Dec 2018.

[24] J. Manner, M. Endreß, T. Heckel, and G. Wirtz, "Cold Start Influencing Factors in Function as a Service," in 2018 International Conference on Utility and Cloud Computing Companion (UCC Companion). Zurich, Switzerland: IEEE, Dec 2018, pp. 181–188.

[25] A. Klimovic, Y. Wang, P. Stuedi, A. Trivedi, J. Pfefferle, and C. Kozyrakis, "Pocket: Elastic Ephemeral Storage for Serverless Analytics," in Proc. of the 12th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI'18. Berkeley, CA, USA: USENIX Association, 2018, pp. 427–444.

[26] L. Bass, I. Weber, and L. Zhu, DevOps: A Software Architect's Perspective, 1st ed. Addison-Wesley Professional, 2015.

[27] J. Manner, S. Kolb, and G. Wirtz, "Troubleshooting serverless functions: a combined monitoring and debugging approach," SICS Software-Intensive Cyber-Physical Systems, Feb 2019.

[28] M. Villamizar, O. Garcés, L. Ochoa, H. Castro, L. Salamanca, M. Verano, R. Casallas, S. Gil, C. Valencia, A. Zambrano, and M. Lang, "Cost Comparison of Running Web Applications in the Cloud Using Monolithic, Microservice, and AWS Lambda Architectures," Service Oriented Computing and Applications, vol. 11, no. 2, pp. 233–247, Jun 2017.

[29] A. Eivy, "Be Wary of the Economics of Serverless Cloud Computing," IEEE Cloud Computing, vol. 4, no. 2, pp. 6–12, Mar 2017.

[30] Q. Pu, S. Venkataraman, and I. Stoica, "Shuffling, fast and slow: Scalable analytics on serverless infrastructure," in 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). Boston, MA: USENIX Association, Feb 2019, pp. 193–206.

[31] P. Leitner, J. Cito, and E. Stöckli, "Modelling and managing deployment costs of microservice-based cloud applications," in Proc. of the 9th International Conference on Utility and Cloud Computing, ser. UCC'16. New York, NY, USA: ACM, 2016, pp. 165–174.