
Continuous Deployment of Mobile Software at Facebook

Chuck Rossi, Facebook Inc., 1 Hacker Way, Menlo Park, CA 94025, [email protected]

Elisa Shibley, University of Michigan, 2260 Hayward Street, Ann Arbor, MI 48109, [email protected]

Shi Su, Carnegie Mellon University, PO Box 1, Moffett Field, CA 94035, [email protected]

Kent Beck, Facebook Inc., 1 Hacker Way, Menlo Park, CA 94025, [email protected]

Tony Savor, Facebook Inc., 1 Hacker Way, Menlo Park, CA 94025, [email protected]

Michael Stumm, University of Toronto, 10 Kings College Rd, Toronto, Canada M8X 2A6, [email protected]

ABSTRACT

Continuous deployment is the practice of releasing software to production as soon as it is ready. It is receiving widespread adoption in industry and has numerous perceived advantages including (i) lower risk due to smaller, more incremental changes, (ii) more rapid feedback from end users, and (iii) better ability to respond to threats such as security vulnerabilities.

The frequency of updates of mobile software has traditionally lagged the state of practice for cloud-based services. For cloud-based services, changes can be released almost immediately upon completion, whereas mobile versions can only be released periodically (e.g., in the case of iOS, every two weeks). A further complication with mobile software is that users can choose when and if to upgrade, which means that several different releases coexist in production. There are also hundreds of Android hardware variants, which increases the risk of having errors in the software being deployed.

Facebook has made significant progress in increasing the frequency of its mobile deployments. In fact, over a period of 4 years, the Android release has gone from a deployment every 8 weeks to a deployment every week. In this paper, we describe in detail the mobile deployment process at FB. We present our findings from an extensive analysis of software engineering metrics based on data collected over a period of 7 years. A key finding is that the frequency of deployment does not directly affect developer productivity or software quality. We argue that this finding is due to the fact that increasing the frequency of continuous deployment forces improved release and deployment automation, which in turn reduces developer workload. Additionally, the data we present shows that dog-fooding and obtaining feedback from alpha and beta customers is critical to maintaining release quality.

1. INTRODUCTION

Continuous deployment is the software engineering practice of deploying many small incremental software updates into production as soon as the updates are ready [1]. Continuous deployment is becoming increasingly widespread in the industry [2, 3, 4, 5, 6] and has numerous perceived advantages including (i) lower risk because of smaller, more incremental changes, (ii) more rapid feedback from end users, and (iii) improved ability to respond more quickly to threats such as security vulnerabilities.

In a previous paper, we described the continuous deployment process for cloud-based software, i.e., Web frontend and server-side SAAS software, at Facebook and another company¹ [1]. We presented both its implementation and our experiences operating with it. At Facebook each deployed software update involved, on average, 92 Lines of Code (LoC) that were added or modified, and each developer pushed 3.5 software updates into production per week on average. Given the size of Facebook's engineering team, this resulted in 1,000's of deployments into production each day.

At both companies discussed in [1], the developers were fully responsible for all aspects of their software updates. In particular, they were responsible for testing their own code, as there was (by design!) no separate testing team. The only requirement was peer code reviews on all code before it was pushed. However, there was considerable automated support for testing and quality control. As soon as a developer believed her software was ready, she would release the code and deploy it into production.

As demonstrated in the previous paper, cloud-based software environments allow for many small incremental deployments in part because the deployments do not inconvenience end-users — in fact, end users typically do not notice them. Moreover, the environments support a number of features to manage the risk of potentially deploying erroneous software, including blue-green deployments [7], feature flags, and dark launches [8]. As a worst case, stop-gap measure, it is always possible to "roll back" any recently deployed update at the push of a button, effectively undoing the deployment of the target software module by restoring the module back to its previous version.

¹ OANDA Corp.


In contrast with previous work, in this paper, we describe and analyze how continuous deployment is applied to mobile software at Facebook (FB). FB's mobile software is used by over a billion people each day. The key challenge in deploying mobile software is that it is not possible to deploy the software continuously in the same manner as for cloud-based services, for the following reasons:

1. The frequency of software updates may be limited because (i) software updates on mobile platforms are not entirely transparent to the end-user, and (ii) app reviews by the platform owner (e.g., Apple's review of iOS apps) take time.

2. Software cannot be deployed in increments one module at a time; rather, all of the software has to be deployed as one large binary; this increases risk.

3. Risk mitigation actions are more limited; for example, hot-fixes and roll-backs are largely unacceptable and can be applied only in the rarest of circumstances as they involve the cooperation of the distributor of the software (e.g., Apple).

4. The end user can choose when to upgrade the mobile software (if at all), which implies that different versions of the software run at the same time and need to continue functioning correctly.

5. Many hardware variants — especially for Android — and multiple OS variants need to be supported simultaneously. The risk of each deployment is increased significantly because of the size of the Cartesian product: app version × (OS type & version) × hardware platform.

Given the above constraints, a compelling open question that arises is: How close to "continuous" can one update and deploy mobile software?

Facebook is striving to push the envelope to get as close as possible to continuous deployment of mobile software. The key strategy is to decouple software development and releases from actual deployment; the former occurs frequently in small increments, while the latter occurs only periodically. In particular, developers push their mobile software updates into a Master branch at the same frequency as with cloud software and in similarly-sized increments. The developers do this whenever they believe their software is ready for deployment. Then, periodically, a Release branch is cut from the Master branch — once a week for Android and once every two weeks for iOS. The code in that branch is tested extensively, fixed where necessary, and then deployed to the general public. The full process used at FB is described in detail in §2.

An important follow-on question is: How does this mobile software deployment process affect development productivity and software quality compared to the productivity and quality achieved with the more continuous cloud software deployment process?

We address this question in §5.2 where we show that productivity, when measured either in terms of LoC modified or added, or in terms of the number of commits per day, is comparable with what it is for software as a whole at FB. We show that mobile software development productivity remains constant even as the size of the mobile engineering team grows by a factor of 15X and even as the software matures and becomes more complex. In fact, we show how Android has gone from a deployment every 8 weeks to one every week over a period of 4 years, with no noticeable effect on programmer productivity.

Testing is particularly important for mobile apps given the limited options available for taking remedial actions when critical issues arise post-deployment (in contrast to the options available for cloud-based software). We describe many of the testing tools and processes used at FB for mobile code in §4. We show in §5.5 that the quality of FB mobile software is better than what it is on average for all FB-wide deployed software. And we show that the quality of the mobile software code does not worsen as (i) the size of the mobile software development team grows by a factor of 15, (ii) the mobile product becomes more mature and complex, and (iii) the release cycle is decreased. In fact, some of the metrics show that the quality improves over time.

Finally, we present additional findings from our analysis in §5. For example, we show that software updates pushed on the day of a Release branch cut are of lower quality on average. We also show that the higher the number of developers working on the same code file, the lower the software quality.

A significant aspect of our study is that the data we base our analysis on is extensive (§3). The data we analyzed covers the period from January, 2009 to May, 2016. It includes all of the commit logs from revision control systems, all app crashes that were captured, all issues reported by FB staff and customers, and all issues identified as critical during the release process.

The data prior to 2012 is noisier and less useful to draw meaningful conclusions from. Those were the early days of mobile software, and there were relatively few mobile-specific developers working on the code base. In 2012, FB's CEO announced the company's "Mobile First!" strategy to the public at the TechCrunch Disrupt conference in San Francisco. From that point on, the mobile development team grew significantly, and the data collected became more robust and meaningful. Hence, most of the data we present is for that period of time.

In summary, this paper makes the following specific contributions:

1. We are, to the best of our knowledge, the first to describe a process by which mobile software can be deployed in as continuous a fashion as possible. Specifically, we describe the process used at FB.

2. This is the first time FB's testing strategy for mobile code is described and evaluated.

3. We present a full analysis with respect to productivity and software quality based on data collected over a 7 year period as it relates to mobile software. In particular, we are able to show that fast release cycles do not negatively affect developer productivity or software quality.

4. We believe we are the first to show two compelling findings: (i) the number of developers modifying a file is inversely correlated to the quality of software in that file, and (ii) the software changes pushed on the day of the release cut from the Master branch are of lower quality than the files pushed on other days.


In the next section we describe the mobile release cycle used at FB for both Android and iOS. Section 3 presents the collected data that we used for our analysis. Section 4 describes the testing strategies used. Our analysis is presented in §5. Related work is covered in §6. We close with concluding remarks.

2. MOBILE RELEASE CYCLE

In this section we describe in detail the development and deployment processes and activities at FB for mobile apps. The overall architecture is shown in Figure 1. As seen in the figure, there are two classes of activities: development activities and deployment activities.

2.1 Development activities

First we describe development activities. There is effectively no difference between the development activities for mobile-based and cloud-based software at FB. The developer forks a local revision control system branch from the Master branch. Updates to the software are made to the local branch with frequent commits. After proper testing and when the developer believes her software update is ready for deployment, she pushes² the updates into the Master branch. As we show in §5, each developer pushes such updates into the Master branch 3-5 times a week, a timescale comparable to cloud-based software at FB.

A notable coding practice encouraged at FB is the use of a mechanism called Gatekeeper, which allows one to dynamically control the availability of features on the mobile device. In effect, this mechanism provides the ability to dynamically turn on or off a mobile-side feature from the server side, even on devices in client hands. Hence, if a newly deployed feature misbehaves, it can be turned off. This mechanism can also be used to incrementally turn on new features in a targeted way, say by targeting the OS, the specific OS version, the hardware device type, the specific hardware device model, the country, locale, etc. It can also be used for A/B testing, for example to test whether one feature gets more usage than an alternative.
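The paper describes Gatekeeper only at the level of its capabilities; the following Python sketch illustrates what a server-side gate evaluation of this kind might look like. All names, fields, and the hashing scheme are assumptions for illustration, not FB's actual Gatekeeper API.

import hashlib

# Hypothetical sketch of a server-side feature gate in the spirit of
# Gatekeeper; names and structure are illustrative, not FB's actual API.
GATES = {
    "new_composer_ui": {
        "enabled": True,
        "os": {"android"},              # only Android devices
        "min_app_version": (122, 0),    # app versions at or above 122.0
        "countries": {"US", "CA"},      # targeted locales
        "rollout_percent": 10,          # A/B test on 10% of eligible users
    },
}

def gate_check(feature, user_id, os, app_version, country):
    """Return True if the feature should be enabled for this client."""
    gate = GATES.get(feature)
    if gate is None or not gate["enabled"]:
        return False
    if os not in gate["os"] or country not in gate["countries"]:
        return False
    if app_version < gate["min_app_version"]:
        return False
    # Deterministic bucketing so a given user stays in the same A/B group.
    bucket = int(hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < gate["rollout_percent"]

# Example: the client asks the server whether to show the feature.
print(gate_check("new_composer_ui", user_id=42, os="android",
                 app_version=(123, 0), country="US"))

Deterministic bucketing means a given user consistently sees the same variant, which is what makes such a mechanism suitable for A/B tests as well as staged feature enablement.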

2.2 Deployment activities

We now describe the lower half of the figure corresponding to deployment activities. FB has a Release Engineering Team (RelEng) that is responsible for these activities. The team includes project managers who are responsible for their target platform and who can make informed decisions as to which issues are launch-blocking or not.

Both iOS and Android deploy software on a fixed-date cycle in a pipelined fashion. We begin by describing the specific process for iOS and then describe how the process for Android differs.

iOS

For iOS, a Release branch is cut from the Master branch by RelEng at a two week cadence every other Sunday at 6pm. The release branch is first stabilized over a period of five days, during which issues are fixed or eliminated, and other polish and stabilization updates are applied.

² In this paper, we use the term push exclusively to refer to the pushing of the local development branch up into the Master branch; we do not use the term here to refer to the act of deploying software.


Figure 1: Release cycle for iOS. The process is pipelined, with each stage taking two weeks (for iOS mobile code): the stabilization, soak, review, and deployment on the Release branch take two weeks, while new software updates are continuously being pushed to the Master branch during the same two weeks, until the next Release branch is cut from the Master branch.

Some bugs are categorized by RelEng as launch-blocking; that is, bugs that would prevent making the app available to end-users and hence have to be fixed before deployment.³ Finally, RelEng may decide to circumvent some issues by reverting parts of the code back to a previous version.

Developers are responsible for addressing any of the issues raised by RelEng, such as identified bugs. Each update generated by a developer that addresses issues raised by RelEng (regarding bug fixes, polishes, and stabilization) is only ever pushed into the Master branch. The developer must then make a request to RelEng to merge the update from the Master branch into the Release branch. In doing so, she must provide justification as to why the update should be merged into the Release branch.

In turn, RelEng "cherry-picks" (i.e., selects), at its discretion, those updates that will be merged into the Release branch. It does this carefully by taking various risk factors into account. Merge requests may be declined for any number of reasons. Two examples of merge requests that would likely be declined simply because the risk is deemed to be too high are: updates having too many dependencies (e.g., more than 5) or updates that make significant changes to core components like networking or display. Ultimately, the decision is a judgement call by RelEng involving a trade-off between risk and impact. For example, a polish update, which may have lower impact, is only accepted during the first 2-3 days and declined after that.
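To make this kind of screening concrete, here is a small illustrative sketch of the risk checks just described (too many dependencies, changes to core components, late polish updates). The request fields and the accept/decline strings are invented; the real decision remains a human judgement call by RelEng.

# Illustrative sketch of the risk screening applied to cherry-pick (merge)
# requests, based on the examples in the text; fields are hypothetical.
CORE_COMPONENTS = {"networking", "display"}

def assess_merge_request(request, days_since_cut):
    """Return a suggested decision for a Release-branch merge request."""
    if request["num_dependencies"] > 5:
        return "decline: too many dependencies"
    if request["components"] & CORE_COMPONENTS:
        return "decline: touches core components"
    if request["kind"] == "polish" and days_since_cut > 3:
        return "decline: polish updates accepted only in the first 2-3 days"
    return "candidate to accept into the Release branch"

# Example: a small bug fix touching only feed code, four days after the cut.
print(assess_merge_request(
    {"num_dependencies": 2, "components": {"feed"}, "kind": "bugfix"},
    days_since_cut=4))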

During the above described period of stabilization, the release branch is built and the resulting app is made available to FB internal users as "dog food." This occurs three times a day. After five days of stabilization efforts, the code is frozen and a "soak" period of three days begins. During the soak period, only fixes for launch-blocking issues are accepted.

Finally, on the second Monday after the Release branch cut, the software is shipped to Apple for review and subsequently deployed sometime later that week.

The overall timeline, shown in Fig. 1, is pipelined: new updates are being pushed into the Master branch concurrently with the stabilization of the current Release branch.

³ Note that despite the name, launch-blocking issues never get to the point where they actually block a deployment.


Only some of these new updates are cherry-picked to be merged into the Release branch by RelEng. If, for whatever reason, the two week timeline depicted in the right half of the figure cannot be met safely, then the deployment is halted and all changes are deferred to the next regularly scheduled deployment.⁴

Android

The Android release cycle is almost identical, except that it is compressed: branches are cut at a 1-week cadence rather than a 2-week cadence. On Thursday, a code freeze occurs and merge requests are only accepted for launch-blocking issues.

On Monday mornings (one week after a branch cut), a slow rollout of the app to the Play Store occurs: typically first only 20%, then 50%, and finally 100% of the population will obtain access to the new version over the following three days, with periodic stability checks after every increase in the rollout percentage.
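As a rough illustration of this staged rollout, the sketch below walks through the 20%/50%/100% stages with a stability check between increases. The crash-rate threshold, check interval, and the two placeholder functions are assumptions; they stand in for the crash database and the Play Store publishing workflow.

import time

# Sketch of a staged Play Store rollout with stability checks between
# increases. The stage percentages come from the text; the threshold,
# interval, and placeholder functions are assumptions for illustration.
ROLLOUT_STAGES = [20, 50, 100]       # percent of users gaining access
CRASH_RATE_THRESHOLD = 0.002         # assumed acceptable crash rate per DAP

def current_crash_rate(version):
    """Placeholder: would query the crash database for this app version."""
    raise NotImplementedError

def set_rollout_percent(version, percent):
    """Placeholder: would drive the Play Store publishing workflow."""
    raise NotImplementedError

def staged_rollout(version, check_interval_hours=24):
    """Increase exposure in stages; halt if the crash rate looks unhealthy."""
    for percent in ROLLOUT_STAGES:
        set_rollout_percent(version, percent)
        time.sleep(check_interval_hours * 3600)   # let crash data accumulate
        if current_crash_rate(version) > CRASH_RATE_THRESHOLD:
            return False   # halt; remaining users stay on the previous version
    return True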

For Android, FB also releases Alpha and Beta versions of the app. Alpha is shipped from the Master branch once a day. It is made available through the Google Play Store to a small fraction of external users consisting of several tens of thousands of users. Beta is shipped from the Release branch once a day and made available to a larger fraction of external users (around 3 million).

Release Engineering Team

The Release Engineering Team plays a critical role in the above process. Despite its importance, this team is perhaps smaller than one might assume — it has fewer than 10 members. The team can be kept small, in part because of the tools it has at its disposal and the degree of automation in the process, and in part because of the partnership it has built up over many years with the development teams.

Senior product developers are continuously being seconded to temporarily work for the Release Engineering Team for one release every two months or so. A key advantage of this arrangement is that it creates a collaborative partnership between RelEng and the development teams. The developers become educated on what RelEng does, what its process flows are, what its problems and pain-points are, etc. The knowledge accrued by the developers during their stint with RelEng is then permeated back to the rest of their development teams when they return.

A further advantage of this arrangement is that developers, after having observed inefficiencies in the release process, in some cases later develop tools for RelEng that allow the process to be further simplified and automated.

3. DATA COLLECTED

Facebook collects a significant amount of data related to each release and each deployment. It retains all of this data. In this section, we describe the data sets we used for the analysis we present in §5.

• Revision-control system records. This dataset has a record of each commit and push, the date, the size of the diff being committed (as measured in LoC), the developer who issued the diff, etc.

⁴ This has occurred only once in the last 9 months.

• Crash database. A crash report is automatically sent to FB whenever a FB app crashes. The crash rate is a direct indicator of app quality, which is why these crash reports are carefully monitored and used to determine the healthiness of a release. Bots automatically categorize reliability issues into crashes, soft errors, hung systems, etc. Whenever a given crash rate metric is higher than a specified threshold, the crash is automatically submitted as a launch blocker to the tasks database (described below). Additionally, the stack trace of the crashed app is automatically compared against recent changes recorded in the revision control system, and if any instruction on the stack trace is close to changed code, then the developer who made the change is informed in an automated way (a sketch of this matching appears after this list). We aggregated crash reports by date and app version.

• Flytrap database. This database contains issue reports submitted by (internal or external) users. Users can submit issues from a "Report a Problem" component on the app, which can be obtained from a pull-down menu or by shaking the device. Alternatively, they can fill out a Help Center "Contact Form" on a FB Website. Employee-generated flytraps are automatically entered into the tasks database (described below). User-generated flytraps are not automatically entered into the tasks database because many users submit unimportant issues, new feature requests, various improvement suggestions, and sometimes just junk. In aggregate, the Flytrap database is rather noisy. For this reason, user-generated flytraps are entered into the tasks database automatically only if a bot is able to identify an issue reported by a sufficient number of users exceeding a given threshold. We aggregated Flytrap issues by date and app version.

• Tasks database. This database is a core component of the release engineering management system. It records each task that is to be implemented and each issue that needs to be addressed. Tasks are entered by humans and bots; the number of bot-created tasks is increasing rapidly. Tasks related to a mobile release include:

– launch-blocking issues: critical issues marked by RelEng which need to be addressed before deployment.

– crashbot issues: tasks created by a bot when a crash affects more than a threshold number of users.

– flytrap issues: tasks created by a bot for employee-reported issues and when more than a threshold number of flytrap reports identify the same issue.

– test fails: tasks created for failed autotests (see §4).

• Cherry-pick database. All cherry-pick requests are recorded, and those accepted by RelEng are marked as such. We use the number of cherry-picks as a proxy metric of software quality.

• Production issues database. All errors identified in production code are recorded in a separate database and each is categorized by severity: critical, medium priority, and low priority.


The recorded errors are entered by humans when they believe a production error needs to be fixed (even if low priority). We use the number of issues in this database as one measure of quality of the software that was deployed to production.

• Daily Active People (DAP). This database records the number of users using FB apps. We aggregated the numbers by date and app version. We use this data in certain cases to normalize crash and flytrap quality indicators.

• Boiler room. This database contains information about each mobile deployment, including the deployment date, the deployment status, the release branch from which the build was made, the cut date of the release being deployed, the versions of the alpha/beta/production builds, etc.

• Staff database. This data identifies how many developers were working on mobile software per day. The information is extracted from the commit logs: if a developer committed mobile code in the previous two weeks, then she is deemed to have been working on mobile software.

• Source and configuration code. We used information from this source to identify how and when automated jobs are scheduled, the thresholds bots use to determine when to create a task, etc.
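The crash-database entry above mentions that a crashed app's stack trace is automatically compared against recent changes so the likely author can be notified. The following is a rough sketch of that matching idea; the data structures and the "closeness" rule (within a few lines of a changed line) are assumptions for illustration.

from collections import namedtuple

# Rough sketch of matching a crash stack trace against recently changed
# code, as mentioned in the crash-database description above.
Frame = namedtuple("Frame", ["file", "line"])
Change = namedtuple("Change", ["file", "lines", "author"])   # lines: set of changed line numbers

def likely_culprits(stack_trace, recent_changes, proximity=5):
    """Return authors whose recent changes are near a frame on the stack."""
    culprits = set()
    for frame in stack_trace:
        for change in recent_changes:
            if change.file != frame.file:
                continue
            # "Close to changed code": within a few lines of any changed line.
            if any(abs(frame.line - n) <= proximity for n in change.lines):
                culprits.add(change.author)
    return culprits

# Example: one frame of the crash lands near lines alice touched recently.
trace = [Frame("feed/render.java", 120), Frame("net/client.java", 42)]
changes = [Change("feed/render.java", {118, 119}, "alice"),
           Change("ui/button.java", {10}, "bob")]
print(likely_culprits(trace, changes))   # {'alice'}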

4. TESTING

Testing is particularly important for mobile apps for the following reasons:

• Thousands of updates are made to mobile software each week.

• There are hundreds of device and OS version combinations the software has to run on.

• The options available for taking remedial actions when critical issues arise post-deployment are limited.

As one might expect, FB applies numerous types of tests, including unit tests, static analysis tests, integration tests, screen layout tests, performance tests, build tests, as well as manual tests.

Tools and automation play a key role, supported in part by hundreds of developers devoted to developing tools. Most of the tests are run in an automated fashion. When a regression is detected, an attempt is made to automatically tie it back to specific code changes that were made recently, and an email is automatically sent to the developer responsible for that particular change.

Before we describe some of the tests and when they are used, we briefly describe the testing infrastructure available at FB as well as the principles that guide the FB testing strategies.

Testing Infrastructure

Thousands of compute nodes are dedicated for testing mobile software. Both white-box and black-box testing strategies are applied by running tests on simulated and emulated environments.

For testing on real hardware, FB runs a mobile device lab located in the Prineville data-center [9]. The mobile lab is primarily used to test for performance regressions, with a primary focus on app speed, memory usage, and battery efficiency.

The mobile lab contains electromagnetically isolated racks. Each rack contains multiple nodes that are connected to the iOS and Android devices that will be used for the target tests. The nodes install, test and uninstall target software. Chef, a configuration management tool [10], is used to configure the devices. A wireless access point in the rack supports device Wi-Fi communication. The devices are set up in the racks so that their screens can be captured by cameras. Engineers can access the cameras remotely to observe how each phone reacts to code changes. Thus, the lab effectively offers testing Infrastructure-as-a-Service.

Testing Principles at FB

FB's testing strategy encompasses the following principles:

1. Coverage. Testing is done as extensively as possible. Almost every test ever written that is still useful is run as frequently as it makes sense to do so.

2. Responsive. The faster a regression is caught, the easier it is to deal with; quickly caught issues are easier to address by the developers because the code and code structure are still top of mind. For instance, parallel builds are used so as to be able to provide feedback more quickly. The objective is to be able to provide the developer with the results from smoke-tests within 10 minutes of her actions.

3. Quality. Tests should identify issues with surgical precision. False positives and false negatives need to be minimized. This is important not only to minimize the time developers spend on chasing false alarms, but also to prevent test results from being ignored over time.

4. Automation. Tests are automated as much as possible. This makes them repeatable, and it ensures the tests are run on a regular basis. An additional aspect of automation has proved to be useful: automatically identifying the developer that caused a regression with his code changes so that he can be informed immediately with specific details on the detected issue and the code that might have caused the issue. This is done by comparing the location in the code likely to have caused the detected issue with recently applied code changes. Much effort has gone into developing this capability since it is effective only when the quality is high.

5. Prioritization. Testing requires a great deal of computing resources if it is to be exhaustive and responsive at the same time. Since resources will invariably be limited, a prioritized testing strategy is imperative. For example, when a change is pushed to the Master branch, integration tests are only done on those portions of the app that might be affected by the changes being pushed, instead of running the full test-suite (a sketch of such change-based test selection follows this list). This makes it possible to provide key test results to the developer more quickly. Complete integration tests are run every few hours on the Master and Release branches.
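As a concrete illustration of the prioritization principle (item 5 above), the sketch below selects only the test suites whose source areas a diff touches. The path-to-suite mapping is an invented stand-in for the real dependency information FB's tooling would use.

# Minimal sketch of prioritized, change-based test selection as described in
# the Prioritization principle above; the mapping is hypothetical.
TEST_OWNERS = {
    "messaging/": ["messaging_smoke_tests", "messaging_integration_tests"],
    "feed/":      ["feed_smoke_tests"],
    "network/":   ["network_integration_tests", "core_smoke_tests"],
}

def tests_for_change(changed_files):
    """Pick only the suites whose source areas are touched by the diff."""
    selected = []
    for path in changed_files:
        for prefix, suites in TEST_OWNERS.items():
            if path.startswith(prefix):
                selected.extend(s for s in suites if s not in selected)
    return selected

# A push touching only feed code triggers just the feed smoke tests;
# the full suite still runs every few hours on Master and Release.
print(tests_for_change(["feed/story.py", "feed/render.py"]))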


Unit tests: These white-box tests primarily verify the logic of target units. Limited unit tests are run manually on the development environments, XCode or Android Studio, using tools such as XCTest [11], JUnit [12], or Robolectric [13]. More extensive as well as automated unit tests are run on server-based simulators.

Static analysis: This analysis identifies potential null-pointer dereferences, resource leaks, and memory leaks. Because exceptions are often the root cause of resource leaks, the tool is careful to identify where these particular leaks might occur.

Build tests: These tests determine whether the code builds properly.

Snapshot tests: These tests generate images of screen views and components, which are then compared, pixel by pixel, to previous snapshot versions [14, 15].

Integration tests: These standard (blackbox) regression tests test key features and key flows of the app. They typically run on simulators and are not necessarily device specific. Moreover they employ degrees of scope: "smoke tests" primarily target specific changes to the code (diffs) or target high-level functionality; long-tail integration tests run the full suite of regression tests and run against a live server so that they also cover client-server integration.

Performance tests: These tests run at the mobile lab to triage performance and resource usage regressions as described above.

Capacity tests: These tests verify that the app does not exceed various specified capacity limits.

Conformance tests: These tests verify that the app conforms to various requirements.

Table 1: Range of tests performed on mobile software.

Tests and when they are run

Table 1 lists some of the types of testing conducted on the mobile software. These tests are run in all phases of the development and deployment cycle.

Pre-Push Testing. A developer will frequently run unit tests on her PC/laptop while developing code. Given the size of the apps and the limited power of the development PCs and laptops, more extensive unit tests will be run on servers in simulation environments. The developer can also manually invoke any of the other tests listed in the table, with static analysis and some integration testing being the most common ones invoked. Finally, a subtle point: while there is no separate testing team, a code review is required before any code can be pushed to the Master branch.

On Push. When the developer believes her changes are complete and work correctly, she initiates a push of her changes to the Master branch. Before the actual push occurs, a number of tests are run automatically to determine whether the push should be blocked. These include standard unit tests as well as a number of smoke tests that verify that heavily used features and their key flows work correctly. Moreover, a test is run to ensure a build with the changes works correctly. However, since a full build is time and resource intensive, the build test here only tests dependencies a few levels deep.

If all the tests pass, then the changes are pushed onto the Master branch. The merging process may identify conflicts, in which case the developer is alerted so she can address them.
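The following sketch summarizes the on-push gate just described: standard unit tests, smoke tests for heavily used flows, and a shallow build test must all pass before the changes land on Master. The function names and the dependency depth are placeholders; only the control flow is meant to be illustrative.

# Sketch of the on-push gate described above; the check functions are
# placeholders that always succeed so the control flow can be exercised.
def run_unit_tests(diff):    return True   # placeholder
def run_smoke_tests(diff):   return True   # placeholder
def run_build_test(diff, depth): return True   # placeholder: shallow build only
def merge_into_master(diff): pass          # placeholder: may surface conflicts

def on_push(diff):
    """Run the blocking checks; push to Master only if all of them pass."""
    checks = [
        ("unit tests", lambda: run_unit_tests(diff)),
        ("smoke tests", lambda: run_smoke_tests(diff)),
        # A full build is expensive, so only dependencies a few levels deep
        # are built at push time; full builds run later on the branches.
        ("shallow build test", lambda: run_build_test(diff, depth=3)),
    ]
    for name, check in checks:
        if not check():
            return f"push blocked: {name} failed"
    merge_into_master(diff)
    return "changes pushed onto the Master branch"

print(on_push({"files": ["feed/story.py"]}))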

Continuous testing on Master and Release branch. All of the tests are run continuously (every few hours) on both the Master and the Release branch. The most important of these are the full build tests, integration regression tests, and performance tests in the mobile device lab.

As mentioned in §2, alpha versions are built from the Master branch (twice a day for iOS and once a day for Android) and beta versions are built from the Release branch three times a day. Release of the alpha version of software is blocked if a certain percentage of tests fails. This happens very rarely.

Manual testing

A contracted manual testing team (of about 100) is used to test the mobile apps. They do various smoke tests and edge-case tests to ensure the apps behave as expected. The testing team is primarily used to test new features for which automated tests have not yet been created. Finally, they are responsible for UI testing after language translations have been added, to ascertain that the quality of the look and feel is high.

5. ANALYSIS

5.1 Methodology

Our quantitative analysis of the FB mobile software engineering effort is based on the data described in §3. The data extracted from those data sets was cleaned, interpreted, cross-checked, and the conclusions drawn were presented to FB-internal subject-matter experts for confirmation. We describe how the data sets were cleaned in the figure captions where relevant.

Some of the data sets start from January 2009. All continue until 2016. The metrics collected between 2009 and 2012 are rather noisy, to the point where it is difficult to infer useful insights. While the first several graphs in this section depict data from 2009 onwards, later graphs focus on the period from 2012 to 2016. In 2012, fewer than 100 developers worked on mobile code, while by 2016, over 1,500 developers did. Over one billion users use the software analysed here on a daily basis.

5.2 Developer Productivity

Figure 2 depicts developer productivity as measured by lines of code (LoC) that are ultimately pushed and deployed on average per developer and per day. The figure shows that developers produce on average 70 LoC per day for both Android and iOS. This is in line with the 64.5 LoC produced per developer per day company-wide across all software.⁵

⁵ Code for backend services is produced more slowly than the average.


Figure 2: Lines of Code pushed per developer per day as averaged each month. The insert shows the same data from mid 2012 onwards at a larger scale. (Pushes from bots were removed from the data set, as were pushes that changed more than 2,000 lines, to avoid including third-party software packages and directories being moved or deleted.)

Figure 3: Number of pushes per developer per day as averaged each month. (Pushes from bots were removed from the data set.)

Figure 3 depicts developer productivity as measured by the number of pushes per day on average for both Android and iOS. Over the last two years, the average number of pushes per developer per day was 0.7 and 0.8 for Android and iOS, respectively. When averaged over all company-wide software, it is 0.7 pushes per developer per day; hence, iOS has slightly more frequent pushes. While Android has fewer pushes than iOS per developer per day, it has the same number of LoC generated per developer per day as iOS, which implies that Android pushes will be slightly larger than the iOS pushes.

More interesting than the absolute numbers is how productivity changes over time as a function of the number of developers producing code for the mobile platforms — Figure 4 depicts how the number of these developers changes over time:

Finding 1: Productivity remains constant even as the number of engineers working on the code base grows by a factor of 15. This is the case for both Android and iOS developers, and whether measured by LoC pushed or number of pushes.

One hypothesis for why this is the case might be that the inefficiencies caused by larger developer groups and a larger, more complex code base are offset, in part, by (i) tooling and automation improvements over time, and (ii) the agile development, continuous release, and deployment processes being used. One can, however, definitively conclude:

Figure 4: Growth of the size of the Android and iOS development teams. The y-axis has been deliberately left out so as not to divulge proprietary information.

Finding 2: The continuous deployment processes being used for mobile software at FB do not negatively affect productivity even as the development organization size scales significantly.

5.3 Quality

At FB, each issue with production code that someone believes needs to be addressed is registered in the Production Issues Database and categorized by severity: (i) critical, where the issue needs to be addressed immediately at high priority, (ii) medium-priority, and (iii) low-priority. We use the number of issues found in production code as one measure of production software quality.

Figure 5 depicts the number of issues for each severity level as a function of the number of pushes per month for Android and iOS. The (red) triangles represent critical errors; they have been almost constant since 2012, with each release having either 0 or 1 critical issue.

Finding 3: The number of critical issues arising from deployments is almost constant regardless of the number of deployments.

We surmise that the number of critical issues is low for two reasons. Firstly, critical errors are more likely to be detected in testing. Secondly, the company takes critical issues seriously, so after each occurrence, a standing review meeting is used to understand the root cause of the issue and to determine how similar issues of the same type can be avoided in future deployments.

Unsurprisingly, the number of medium-priority and low-priority issues grows linearly with the number of pushes, albeit with very low slopes: for medium-priority issues the slope is 0.0003 and 0.00026 for Android and iOS respectively, and for low-priority issues the slope is 0.0012 and 0.00097 for Android and iOS respectively. (For example, a slope of 0.0003 corresponds to roughly 3 medium-priority issues per 10,000 pushes.) The slope for both medium- and low-priority issues is higher for Android than for iOS; the relatively large number of Android hardware platforms may be one potential explanation. Interestingly, company-wide, across all software, the two corresponding slopes are 0.0061 and 0.0013, respectively, which are both steeper than for the mobile code alone. Overall, we argue that these numbers are evidence that the testing strategies applied to mobile software are effective.

Figure 6 shows the number of launch-blocking issues and the number of cherry-picks per deployment for Android and iOS.


Figure 5: Number of recorded issues with Android and iOS production code as a function of the number of pushes per month for the period between March, 2011 and May, 2016. The top graph is for Android; the bottom for iOS. Note that the scales of the x- and y-axes are very different: all of the linear trendlines have a slope less than 0.0012.

To allow for direct comparison, the numbers are normalized by the length of the release cycle, under the assumption that a release cycle twice as long will have twice as many pushes and twice as many LoC modified. That is, each data point in the graph depicts, for each release cycle, the number of cherry-picks and launch-blockers recorded per day on average. Overall, the number of launch-blockers and the number of cherry-picks seem to oscillate somewhat over time, but with no particular trend: the number of cherry-picks seems to mostly stay within a range of 5–25 for Android and within a range of 5–15 for iOS.

Finding 4: The length of the deployment cycle does not significantly affect the relative number of cherry-picks and launch-blockers as it decreases from 4 weeks to 2 weeks (and then to 1 week for Android).

From this we conclude that the length of the deployment cycle does not directly affect the quality of the software produced, with the described continuous deployment process. In particular, there is no discontinuity in the curves at the points where the frequency of deployments changes.

One should note that the number of launch-blocking issues is impacted in at least two ways. First, whether an issue is declared to be launch-blocking or not is subjective. RelEng believes that they have become more stringent over time in deciding what constitutes a launch-blocker. This is somewhat intuitive: as more people use the mobile app, standards tend to increase. Secondly, a large number of testing tools were developed in 2015 and 2016. These tools should lead to an improvement in code quality, and hence reduce the number of launch-blocking issues.

Figure 6: Number of launch-blocking issues (blue solid lines) and number of cherry-picks (orange dashed lines) per deployment for Android (top) and iOS (bottom), normalized by length of release cycle. Each data point in the graph depicts, for each release cycle, the number of cherry-picks and launch-blockers recorded per day on average. The vertical straight lines show where the release cycle was reduced, first from a 4 week cadence to a two week cadence, and then, for Android, to a one week cadence.

Figure 7: Normalized daily crash rate for Android (top graph) and iOS (bottom graph): proportion of end users experiencing a crash relative to the total number of active users, normalized to a constant. The vertical straight lines show where the release cycle was reduced, first from a 4 week cadence to a two week cadence, and then, for Android, to a one week cadence. The curve is normalized to the same constant in order to hide the absolute values but to allow a comparison.

per day, normalized to a constant in order to hide the ab-solute values. The di↵erent colors of the bars represent dif-ferent releases: the color swaths become more narrow eachtime the deployment cycle is reduced (but may be wideraround holidays).

Finding 5: The shortening of the release cycle does not appear to affect the crash rate.

The number of Android crashes increased in the first half of 2015; there are two reasons for this. First, the scope of what was considered to be a crash was extended to also include Java crashes when running a FB app and apps that were no longer responding. Second, a set of third-party software libraries were introduced that increased the crash rate; some of the offending software was removed in the second half of the year.

Internal FB employees play a critical role in helping to test the mobile software through dog-fooding.


Figure 8: Normalized daily crash rate for alpha (top graph), beta (middle graph), and production (bottom graph) versions of Android FB apps: proportion of end users experiencing a crash relative to the total number of active users, normalized to a constant. Note that the bottom graph here is identical to the top graph of Fig. 7, just with a different scale for the y-axis. These graphs show the importance of having users testing alpha and beta versions of the software to improve software quality.

The iOS platform gets far better dog-food coverage internally, because employees tend to skew towards iOS (over Android). This is particularly problematic, since there is such a wide variety of hardware that Android runs on. For this reason, Android relies more heavily on alpha and beta releases of its software. Figure 8 shows the effectiveness of the alpha releases with respect to crashes. A significant decrease in the crash rate is visible when going from alpha to beta; crash rate spikes that occur occasionally with the beta version of the software no longer appear in the production version of the software. The crash rate for alpha versions of the software is roughly 10X higher than for production versions.

5.4 Deadline Effects

There is a clear increase in the number of pushes to the Master branch on the day the Release branch is cut. This may indicate that developers rush to get their code in before the deadline, which raises the question of whether there is a decrease in quality of code submitted on the day of the release cut.

Figure 9 depicts the number of cherry-picks per push as a function of the proportion of pushes on the day of the Release branch cut. The number of cherry-picks is used as a proxy for software quality. There is a correlation between the percentage of pushes made on the day of the release cut and the number of cherry-picks per day: as the number of pushes made on the day of a release cut increases, so does the number of needed cherry-picks.

In absolute terms, Android has a higher proportion of pushes on the cut day.

Figure 9: Cherry-picks per push as a function of the proportion of pushes on the release cutoff day. Thursday cutoffs are shown as red circles; Sunday cutoffs as blue triangles. For Android, only releases after June, 2014 are included, which is when the release cycle switched to a two week cadence. For iOS, only releases after September, 2014 are included, which is when the release cycle switched to a two week cadence. Releases up to the end of May, 2016 are shown for both Android and iOS.

This is primarily because many of the Android datapoints represent releases that occurred at a one-week cadence, while all iOS datapoints represent releases at a two-week cadence. That is, if all of the Android pushes are evenly distributed across the seven days, one would expect each day to have 14% of all pushes; on the other hand, if the iOS pushes are evenly distributed over the 14 days, then one would expect each day to have 7% of all pushes.

Finding 6: Software pushed on the day of the deadline is of lower quality.

Our hypothesis is that the software pushed on the cut day is likely rushed and that rushed software will have lower quality.

When normalized, Android has a lower proportion of pushes on the cut day, and because of this a smaller proportion of cherry-picks. An intuitive explanation for this goes as follows: if a developer believes her update will not make the cut, she only has to wait one week and hence does not force a push. However, if the next cut is further out, then some developers would rather force a push than have to wait the longer period of time before their updates get deployed.



Figure 10: Proportion of files cherry-picked when modified by n developers, for n = 1..40.

Another interesting observation is that moving the cut date to a weekend leads to improved quality, in part because there is a lower tendency to rush to push updates when the cut date is on a weekend. The cut date was moved from a Thursday to a Sunday in August, 2015 for Android and in February, 2016 for iOS. While the average number of pushes on the weekend increased from 0.3 pushes per developer per day to 0.5 for iOS, and from 0.175 to 0.5 for Android (not shown in any figure),⁶ the number of weekend pushes is still significantly lower than the number of weekday pushes. As Figure 9 shows, the lower number of pushes for weekend cut days correlates with a lower number of cherry-picks for those releases, a sign of improved quality.

5.5 Factors Affecting Software Quality

An interesting question is what factors affect software quality. Above we showed that software pushed on the cutoff day will be of lower quality. But we also showed that the length of the release cycle, the size of the engineering team, and the number of pushes do not seem to affect software quality directly. We have also not been able to find any correlation between software quality and

• the size of the change of each push (as measured inLoC modified)

• the size of the file being changed

We did, however, find two factors that do appear to affect software quality, although both factors are highly correlated. First, we found that as the number of developers that commit code to a file of a release increases, so does the probability of the file being cherry-picked. This is shown in Figure 10.

Finding 7: The more developers involved in modifying a file for a release, the lower the software quality of the file.

Some files are modified by a surprisingly large number of developers. An example scenario where this happens naturally is very large switch statements, where the code associated with individual case statements is modified by different developers.

The above finding suggests that files needing modification by multiple developers should perhaps be split up to improve quality.

⁶ Moving the cut date to a Sunday did not affect overall developer productivity.

Figure 11: Proportion of files cherry-picked when committed n times, for n = 1..40.

A second factor we found that appears to affect software quality is the number of times a file is pushed between cuts. A high number of pushes leads to a higher probability of the file being cherry-picked (Figure 11). However, note that a high number of developers modifying a file implies a high number of pushes for the file.

6. RELATED WORK

Continuous deployment is a natural extension of the progression from agile software development [16, 17, 18, 19] to continuous integration [20, 21, 22] to continuous delivery [7, 23]. Agile development, where software is developed iteratively with cycles as short as half a day, started in the late 1990's and is now used in some form in many if not most organizations. Continuous Integration is the practice in which software updates are integrated at the end of each software cycle; this allows integration to be verified by an automated build and automated tests to detect integration errors as quickly as possible. Continuous Delivery goes one step further and ensures that the software is built and tested in such a way that the software can be released to production at any time. Continuous deployment deploys each software change to production as soon as it is ready.

Related developments include lean software development [24], kanban [25], and kaizen [26]. DevOps is a movement that emerged from combining roles and tools from both the development and operations sides of the business [27, 28].

In a recent study, McIlroy et al. showed that of 10,713 mobile apps in the Google Play store (the top 400 free apps at the start of 2013 in each of the 30 categories in the Google Play store), 1% were deployed more frequently than once a week, and 14% were deployed at least once every two weeks [29]. While the study is based on a relatively small time window, it suggests that continuous deployment is actively being pursued by many organizations.

However, the literature contains only limited explicit references to continuous deployment for mobile software. Klepper et al. extended their agile process model Rugby to accommodate mobile applications [30]. Etsy, a leader in continuous deployment, has presented in talks how they do continuous integration for their mobile apps [31]; effectively, they deploy mobile software out to their employees on a daily basis [32].

With respect to testing, Kamei et al. evaluate just-in-time quality assurance [33], which is similar to the strategy described in §4. Gao et al. describe a Testing-as-a-Service infrastructure for mobile software [34]. Amazon's AppThwack offers a mobile device farm for that purpose [35].

7. CONCLUDING REMARKS

This paper described continuous deployment for mobile applications at Facebook. Shorter release cycles have numerous advantages, including quicker time to market and a better ability to respond to product feedback from users, among others. The release cycle was shortened from 8 weeks to 1 week for Android over a period of four years. We described the lessons learned throughout this period.

Shortening the release cycle forces the organization to improve its tools and to automate testing and procedures. Our observation is that over the four-year period when the Android release cycle was reduced from 8 weeks to 1 week, these efforts to improve tools and increase automation represented the bulk of the work needed to make continuous deployment successful. The data collected show that tool and process improvements increased quality and permitted faster release cycles. However, continuous deployment itself did not offer any productivity or quality improvements.

Releasing software for mobile platforms is more difficult because it is not possible to roll back/forward in the event of a problem, as it is with cloud software. Feature flags are used to enable or disable functionality remotely, so that a buggy feature does not require a new release. Alpha and beta programs were extremely effective in providing bug reports that reduced problems, provided that the reporting of such issues was automated.
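As a minimal sketch of the client-side pattern (illustrative only; the class and method names are assumptions, not Facebook's actual API), a mobile app can consult a remotely fetched configuration before exercising a new code path, so that a problematic feature can be turned off server-side without shipping a new binary:

import java.util.Map;

// Illustrative sketch (not Facebook's actual API) of a remotely controlled
// feature flag: a buggy feature can be disabled from the server side without
// releasing a new version of the app.
public class FeatureFlags {

    // In a real app this configuration would be fetched periodically from a
    // server and cached on the device; here it is hard-coded for illustration.
    private final Map<String, Boolean> remoteConfig;

    public FeatureFlags(Map<String, Boolean> remoteConfig) {
        this.remoteConfig = remoteConfig;
    }

    public boolean isEnabled(String feature) {
        // Default to "off": a feature whose state cannot be confirmed stays dark.
        return remoteConfig.getOrDefault(feature, false);
    }

    public static void main(String[] args) {
        FeatureFlags flags = new FeatureFlags(Map.of("new_composer", false));

        if (flags.isEnabled("new_composer")) {
            System.out.println("Rendering the new composer");
        } else {
            System.out.println("Falling back to the existing composer"); // buggy path never runs
        }
    }
}

Defaulting to "off" is the safer failure mode when, as discussed above, rolling back the deployed binary itself is not an option.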

Analysis revealed several patterns leading to lower-quality software. One was cramming for a release on a branch date: software committed on the day a release branch is cut was found to be of lower quality. Shortening the release cycle improved this, because developers were less likely to rush their software pushes, knowing that another release was upcoming shortly. Another pattern that led to lower-quality software was having too many developers work on the same file concurrently: the number of problems per file was found to increase proportionally to the number of developers who modified the file.

8. ACKNOWLEDGEMENTS

We sincerely thank Laurent Charignon, Scott Chou, Amanda Donohue, Ben Holcomb, David Mortenson, Bryan O'Sullivan, James Pearce, Anne Ruggirello, Damien Sereni, and Chip Turner for their valuable contributions, suggestions, and feedback. Their input greatly improved this paper.

9. REFERENCES

[1] T. Savor, M. Douglas, M. Gentili, L. Williams, K. Beck, and M. Stumm, "Continuous deployment at Facebook and OANDA," in Proc. 38th Intl. Conf. on Software Engineering (ICSE'16), May 2016, pp. 21–30.
[2] C. Parnin, E. Helms, C. Atlee, H. Bought, M. Ghattas, A. Glover, J. Holman, J. Micco, B. Murphy, T. Savor, M. Stumm, S. Whitaker, and L. Williams, "The top 10 adages in continuous deployment," IEEE Software, accepted for publication, 2016.
[3] J. Allspaw and P. Hammond, "10 deploys per day — Dev and Ops cooperation at Flickr," 2009, slides. [Online]. Available: http://www.slideshare.net/jallspaw/10-deploys-per-day-dev-and-ops-cooperation-at-flickr
[4] M. Brittain, "Continuous deployment: The dirty details," Jan. 2013. [Online]. Available: http://www.slideshare.net/mikebrittain/mbrittain-continuous-deploymentalm3public
[5] T. Schneider, "In praise of continuous deployment: The Wordpress.com story," 2012, 16 deployments a day. [Online]. Available: http://toni.org/2010/05/19/in-praise-of-continuous-deployment-the-wordpress-com-story/
[6] G. Gruver, M. Young, and P. Fulgham, A Practical Approach to Large-scale Agile Development — How HP Transformed LaserJet FutureSmart Firmware. Addison-Wesley, 2013.
[7] J. Humble and D. Farley, Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation. Addison-Wesley, 2010.
[8] D. Feitelson, E. Frachtenberg, and K. Beck, "Development and deployment at Facebook," IEEE Internet Computing, vol. 17, no. 4, pp. 8–17, 2013.
[9] A. Reversat, "The mobile device lab at the Prineville data center," https://code.facebook.com/posts/300815046928882/the-mobile-device-lab-at-the-prineville-data-center, Jul. 2016.
[10] https://www.chef.io/chef/.
[11] https://developer.apple.com/reference/xctest.
[12] http://junit.org/.
[13] http://robolectric.org.
[14] P. Steinberger, "Running UI tests on iOS with ludicrous speed," https://pspdfkit.com/blog/2016/running-ui-tests-with-ludicrous-speed/, Apr. 2016.
[15] https://github.com/facebook/ios-snapshot-test-case/.
[16] K. Beck, Extreme Programming Explained: Embrace Change. Addison-Wesley, 2000.
[17] K. Beck, M. Beedle, A. van Bennekum, A. Cockburn, W. Cunningham, M. Fowler, J. Grenning, J. Highsmith, A. Hunt, R. Jeffries, J. Kern, B. Marick, R. C. Martin, S. Mellor, K. Schwaber, J. Sutherland, and D. Thomas, "Manifesto for agile software development," 2001. [Online]. Available: http://www.agilemanifesto.org/
[18] A. Cockburn, Agile Software Development. Addison Wesley Longman, 2002.
[19] A. Cockburn and L. Williams, "Agile software development: It's about feedback and change," Computer, vol. 36, no. 6, pp. 39–43, 2003.
[20] L. Williams, "Agile software development methodologies and practices," in Advances in Computers, vol. 80, 2010, pp. 1–44.
[21] M. Fowler and M. Foemmel, "Continuous integration," ThoughtWorks, http://www.thoughtworks.com/ContinuousIntegration.pdf, p. 122, 2006.
[22] P. M. Duvall, Continuous Integration. Pearson Education India, 2007.
[23] L. Chen, "Continuous delivery: Huge benefits, but challenges too," IEEE Software, vol. 32, no. 2, pp. 50–54, 2015.
[24] M. Poppendieck and M. A. Cusumano, "Lean software development: A tutorial," IEEE Software, vol. 29, no. 5, pp. 26–32, 2012.
[25] D. J. Anderson, Kanban: Successful Evolutionary Change for Your Technology Business. Blue Hole Press, 2010.
[26] M. Poppendieck and T. Poppendieck, Lean Software Development. Addison-Wesley, 2002.
[27] M. Hüttermann, DevOps for Developers. Apress, 2012.
[28] J. Roche, "Adopting DevOps practices in quality assurance," Communications of the ACM, vol. 56, no. 11, pp. 38–43, 2013.
[29] S. McIlroy, N. Ali, and A. E. Hassan, "Fresh apps: An empirical study of frequently-updated mobile apps in the Google Play store," Empirical Software Engineering, vol. 21, no. 3, pp. 1346–1370, 2016.
[30] S. Klepper, S. Krusche, S. Peters, B. Bruegge, and L. Alperowitz, "Introducing continuous delivery of mobile apps in a corporate environment: A case study," in Proceedings of the Second International Workshop on Rapid Continuous Software Engineering (RCoSE '15), 2015, pp. 5–11. [Online]. Available: http://dl.acm.org/citation.cfm?id=2820678.2820681
[31] J. Miranda, "How Etsy does continuous integration for mobile apps," https://www.infoq.com/news/2014/11/continuous-integration-mobile/, Nov. 2014.
[32] K. Nassim, "Etsy's journey to continuous integration for mobile apps," https://codeascraft.com/2014/02/28/etsys-journey-to-continuous-integration-for-mobile-apps/, Nov. 2014.
[33] Y. Kamei, E. Shihab, B. Adams, A. E. Hassan, A. Mockus, A. Sinha, and N. Ubayashi, "A large-scale empirical study of just-in-time quality assurance," IEEE Transactions on Software Engineering, vol. 39, no. 6, pp. 757–773, 2013.
[34] J. Gao, W.-T. Tsai, R. Paul, X. Bai, and T. Uehara, "Mobile Testing-as-a-Service (MTaaS) — Infrastructures, issues, solutions and needs," in Proc. 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, 2014, pp. 158–167.
[35] https://aws.amazon.com/device-farm/.