The Impact of Continuous Integration on
Other Software Development Practices: A Large-Scale Empirical Study
Yangyang Alexander Yuming Vladimir BogdanZhao Serebrenik Zhou Filkov Vasilescu
Nanjing U TU Eindhoven Nanjing U DECALat UC Davis
STRUDEL at CMU
@aserebrenik @vfilkov @b_vasilescu
https://imgur.com/0TBJ9OW
Happy Halloween!
Interventions are common in software engineering
• SVN —> git
• push —> pull request
• ? —> continuous integration
• …
Interventions are common in software engineering
• SVN —> git
• push —> pull request
• ? —> continuous integration
• …
How to measure effects using trace data?
Evaluating the effects of an intervention: before vs. after
change in slope
t-test no difference
change in slope
Evaluating the effects of an intervention: before vs. after
change in level
Evaluating the effects of an intervention: before vs. after
change in level
t-test no difference
Evaluating the effects of an intervention: before vs. after
Today
Methodology to empirically study the
effects of an intervention(continuous integration)
Interrupted time series
slope before
slope after
change in level
Interrupted time series
slope before
slope after
change in level
Multiple regression w/ controls for confounds
slope before slope
after
change in level
slope before slope
after
change in level
time: 1 2 3 … … … 100 101 102 … … … 200
slope before slope
after
change in level
time: 1 2 3 … … … 100 101 102 … … … 200time after intervention: 0 0 0 … … … 0 1 2 … … … 100
slope before slope
after
change in level
time: 1 2 3 … … … 100 101 102 … … … 200
intervention: F F F … … … T T T … … … T
time after intervention: 0 0 0 … … … 0 1 2 … … … 100
time: 1 2 3 … … … 100 101 102 … … … 200
intervention: F F F … … … T T T … … … T
time after intervention: 0 0 0 … … … 0 1 2 … … … 100
yi = α + β · timei + ɣ · interventioni + δ · time_after_interventioni + εi
β β + δ
ɣ
31/10/17 01:22
Page 1 of 1file:///Users/bogdanv/CMUGDrive/ase-2017-slides/table.html
Dependent variable:
y
time 0.991***
intervention -48.678***
time_after_intervention -1.500***
Constant 1.007
Observations 200R2 0.967
Adjusted R2 0.967Residual Std. Error 4.844 (df = 196)F Statistic 1,924.910*** (df = 3; 196)
Note: *p<0.1; **p<0.05; ***p<0.01
β β + δɣ
yi = β · timei + ɣ · interventioni + δ · time_after_interventioni + εi
• β ~ 1• ɣ ~ -50 • β + δ ~ -0.5
lm in R
Effects of adopting Travis CI
https://martinfowler.com/articles/originalContinuousIntegration.html
Why CI?
Lots of folklore, e.g., Martin Fowler:
• Everyone Commits To the Mainline Every Day
• Fix Broken Builds Immediately
• Keep the Build Fast
• …
time
Travis CI adoption(first .travis.yml commit)
-15 days
+15 days
-1… …-12 +1 +12
+45 days
+375 days
-375 days
-45 days
Unstable period excluded
Adoption of Travis CI
time
Travis CI adoption(first .travis.yml commit)
-15 days
+15 days
-1… …-12 +1 +12
+45 days
+375 days
-375 days
-45 days
Unstable period excluded
Starting sample: 165,549 GitHub projects using Travis
Adoption of Travis CI
time
Travis CI adoption(first .travis.yml commit)
-15 days
+15 days
-1… …-12 +1 +12
+45 days
+375 days
-375 days
-45 days
Unstable period excluded
Starting sample: 165,549 GitHub projects using Travis
24 active periods x
7 programming languages
Adoption of Travis CI
More frequent commits
Smaller code changes
Quick pull requests resolution
More issues and pull requests closed
RQsImpact on automated
testing?
ChurnChurn in
non-merge commitsChurn in
merge commitsIntercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
Churn
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
n.s.
Churn in non-merge commits is not affected by time or Travis CI
Churn
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
n.s.
Churn in non-merge commits is not affected by time or Travis CI
Churn●
●
●
●
●
●
●
●
●●
●●
●
●●
●●●
●
●●
●
●●
●●●●
●
●
●
●
●
●●
●●
●
●
●
●●
●●
●●
●
●
●●●●
●
●
●
●●
●
●
●●●
●
●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●
●
●●●●
●
●
●●
●●
●
●
●●
●●●
● ●
●
●
●
●
●●●
●● ●
●●
●
●
●
●
●●
●
●
●●
●●
●
●
●●
●●●●●●
●
● ●●●●●
●
●●●
●
●●●
●
●
●
●
●●
●●●
●●●
●●●●
●
●
●●
●
●●
●
●
●●●
●●●
●●
●
●
●
●
●
●
●
●
●●
● ●●
●
●●
●
●
●
●●●●
●
●
●
●
●●
●
●●●●●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●●●●●
●
●
●
●
●
●●
●●
●
●
●●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●●
●●
●
●●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●●●
●
●
●
●
●
●●
●
●●
●
●●
●
●●
●
●
●
●
●
●●
●
●●
●
●●●
●●
●●
●
●
●
●
●
●
●
1
5102050
100200500
5000
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Mea
n no
n−m
erge
chu
rn (L
OC
)Churn in non-merge commits
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
n.s.
Churn in non-merge commits is not affected by time or Travis CIDiscontinuity in merge com.: preparation for transition, clean-up
Churn
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
n.s.
Churn in non-merge commits is not affected by time or Travis CI
Decrease in churn in merge commits is amplified by Travis CIDiscontinuity in merge com.: preparation for transition, clean-up
Churn
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
n.s.
Churn in non-merge commits is not affected by time or Travis CI
Decrease in churn in merge commits is amplified by Travis CIDiscontinuity in merge com.: preparation for transition, clean-up
Churn
●● ●
●●●●
● ●●●●●
●●●●● ● ●
●
●
●
●●●
●●●● ●●●
●●●
●●●●●●●●● ●●●
●
●●
●
●
●●● ● ●
●●
●●
●
●
●
●
●●●●●●1
5102050
100200500
5000
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Mea
n m
erge
chu
rn (L
OC
)Churn in merge commits
Churn in non-merge commits
Churn in merge commits
Intercept (α) 1,336*** -1,297**
log(TotalCommits) 0,529** 1,113**
AgeAtTravis -0,003* -0,005**
log(NumAuthors) -0,233** -0,522**
time (β) -0,007 -0,012*
interventionTrue (ɣ) 0,071 0,220**
time_after_intervention (δ) -0,009 -0,022**
Control variables
n.s.
Churn in non-merge commits is not affected by time or Travis CI
Decrease in churn in merge commits is amplified by Travis CIDiscontinuity in merge com.: preparation for transition, clean-up
Churn
●● ●
●●●●
● ●●●●●
●●●●● ● ●
●
●
●
●●●
●●●● ●●●
●●●
●●●●●●●●● ●●●
●
●●
●
●
●●● ● ●
●●
●●
●
●
●
●
●●●●●●1
5102050
100200500
5000
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Mea
n m
erge
chu
rn (L
OC
)Churn in merge commits
55
introduced Travis to their projects
Triangulation: user survey
R4: “commits became smaller and more frequent, to check the build; pull requests became easier to check”
Decrease in churn in merge commits is amplified by Travis CI
Discontinuity in merge commits: preparation for transition, clean-up
R25: “contributors couldn’t be trusted to run test suite on their own”
R38: Travis as “a part of automated package/release effort”
Quality and Productivity Outcomes Relating toContinuous Integration in GitHub
Bogdan Vasilescu†⇤ ,Yue Yu‡†⇤,Huaimin Wang‡,Premkumar Devanbu†,Vladimir Filkov†
†Department of Computer Science ‡College of ComputerUniversity of California, Davis National University of Defense Technology
Davis, CA 95616, USA Changsha, 410073, China{vasilescu, ptdevanbu, vfilkov}@ucdavis.edu {yuyue, hmwang}@nudt.edu.cn
ABSTRACTSoftware processes comprise many steps; coding is followedby building, integration testing, system testing, deployment,operations, among others. Software process integration andautomation have been areas of key concern in software engi-neering, ever since the pioneering work of Osterweil; marketpressures for Agility, and open, decentralized, software de-velopment have provided additional pressures for progress inthis area. But do these innovations actually help projects?Given the numerous confounding factors that can influenceproject performance, it can be a challenge to discern the ef-fects of process integration and automation. Software projectecosystems such as GitHub provide a new opportunity inthis regard: one can readily find large numbers of projectsin various stages of process integration and automation, andgather data on various influencing factors as well as produc-tivity and quality outcomes. In this paper we use large,historical data on process metrics and outcomes in GitHubprojects to discern the e↵ects of one specific innovation inprocess automation: continuous integration. Our main find-ing is that continuous integration improves the productivityof project teams, who can integrate more outside contribu-tions, without an observable diminishment in code quality.
Categories and Subject DescriptorsD.2.5 [Software Engineering]: Testing and Debugging—Testing tools
General TermsExperimentation, Human Factors
KeywordsContinuous integration, GitHub, pull requests
⇤Bogdan Vasilescu and Yue Yu are both first authors, andcontributed equally to the work.
1. INTRODUCTIONInnovations in software technology are central to economic
growth. People place ever-increasing demands on software,in terms of features, security, reliability, cost, and ubiquity;and these demands come at an increasingly faster rate. Asthe appetites grow for ever more powerful software, the hu-man teams working on them have to grow, and work moree�ciently together.
Modern games, for example, require very large bodies ofcode, matched by teams in the tens and hundreds of devel-opers, and development time in years. Meanwhile, teamsare globally distributed, and sometimes (e.g., with opensource software development) even have no centralized con-trol. Keeping up with market demands in an agile, orga-nized, repeatable fashion, with little or no centralized con-trol, requires a variety of approaches, including the adop-tion of technology to enable process automation. ProcessAutomation per se is an old idea, going back to the pio-neering work of Osterweil [32]; but recent trends such asopen-source, distributed development, cloud computing, andsoftware-as-a-service, have increased demands for this tech-nology, and led to many innovations. Examples of such in-novations are distributed collaborative technologies like gitrepositories, forking, pull requests, continuous integration,and the DEVOPS movement [36]. Despite rapid changes, itis di�cult to know how much these innovations are helpingimprove project outcomes such as productivity and quality.A great many factors such as code size, age, team size, anduser interest can influence outcomes; therefore, teasing outthe e↵ect of any kind of technological or process innovationcan be a challenge.
The GitHub ecosystem provides a very timely opportu-nity for study of this specific issue. It is very popular (in-creasingly so) and hosts a tremendous diversity of projects.GitHub also comprises a variety of technologies for dis-tributed, decentralized, social software development, com-prising version control, social networking features, and pro-cess automation. The development process on GitHub ismore democratic than most open-source projects: anyonecan submit contributions in the form of pull requests. A pullrequest is a candidate, proposed code change, sometimesresponsive to a previously submitted modification request(or issue). These pull requests are reviewed by project in-siders (aka core developers, or integrators), and accepted ifdeemed of su�cient quality and utility. Projects that aremore popular and widely used can be expected to attractmore interest, and more pull requests; these will have to be
ESEC/FSE ‘15
Closed PRsAmong others:
• On average, more PRs are being closed per unit time after adopting Travis CI
Closed PRs
●●●●●●
●●
●●● ●
● ●
1
51020
50100200
500
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Num
clo
sed
PRs
Num
ber
Increasing trend only before adopting Travis CI
Closed PRs
●●●●●●
●●
●●● ●
● ●
1
51020
50100200
500
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Num
clo
sed
PRs
Num
ber
●●●
●
●●
●
●
●
●
●●
●
●● ●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●●
●
●●
●●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●●●●
●
●
●
●
●●
●●
●●
●
●●●●●●
●
●
●●
●
●
●●
●
●●
●
●
●
●●
●
●●●
●
●
●
●●●
●
●
●●●●●
●
●
15
102050
100200500
5000
−12 −11 −10 −9 −8 −7 −6 −5 −4 −3 −2 −1 1 2 3 4 5 6 7 8 9 10 11 12Month index w.r.t. Travis CI adoption
Mea
n PR
late
ncy
Late
ncy
Increasing trend only before adopting Travis CI
More frequent commits
Smaller code changes
Quick pull requests resolution
Impact on automated
testing?RQs
Both before and after
Travis
Affected only for merge commits
# increases pre-Travis, flattened out by Travis duration not affected
Increasing trend slowed
down
↓missing files/dep ↑comp/exec errors
↑failed tests
More issues and pull requests closed
More frequent commits
Smaller code changes
Quick pull requests resolution
Impact on automated
testing?
RQs
Not affected by
Travis
Affected only for merge commits
# increases pre-Travis, flattened out by Travis duration not affected
increasing trend slowed
down
↓missing files/dep ↑comp/exec errors
↑failed tests
More issues and pull requests closed
Yangyang ZhaoAlexander SerebrenikYuming ZhouVladimir FilkovBogdan Vasilescu
Awards: 1717415, 1717370
Award: BK20130014
Interrupted time series
slope before
slope after
change in level
yi = α + β · timei + ɣ · interventioni + δ · time_after_interventioni + εi