Co-evolution of Infrastructure and Source Code - An Empirical Study Yujuan Jiang, Bram Adams MCIS lab Polytechnique Montreal, Canada



Infrastructure Code Automates Environment Setup

[Figure: Commit, then Build, then Deploy to Server 1 (Ubuntu) and Server 2 (CentOS)]

Automate Instantiation of Web Server with Puppet & Chef

# Chef snippet: install a platform-specific web server package
case node[:platform]
when "ubuntu"
  package "httpd-v1" do
    version "2.4.12"
    action :install
  end
when "centOS"
  package "httpd-v2" do
    version "2.2.29"
    action :install
  end
end

# Puppet snippet: install a platform-specific web server package
case $platform {
  'ubuntu': {
    package { 'httpd-v1':
      ensure => '2.4.12',
    }
  }
  'centOS': {
    package { 'httpd-v2':
      ensure => '2.2.29',
    }
  }
}

Infrastructure Code Widely Used by Large Companies

Uses Both Chef & Puppet + Large Data Set of 262 Repos

How much effort does it take to maintain infrastructure code?

Preliminary & Research Questions

PQ1: How many infrastructure files does a project have?

PQ2: How many infrastructure files change per month?

PQ3: How large are infrastructure system changes?

RQ1: How tight is the coupling between infrastructure code and other kinds of code?

RQ2: Who changes infrastructure code?

[Figure: file categories Inf (infrastructure), Bld (build), Prod (production), and Test, alongside the roles Infrastructure developer, Build developer, Production developer, and Tester]

Approach

Collect all files from repos.

Classify files into 5 groups for each project: Inf, Bld, Prod, Test, and Other ("Other" is discarded).

Split projects into 2 groups: Multi & Single.

Preliminary analysis: monthly change ratio and average churn.

RQs (coupling relation): commit co-change and ownership coupling.

Statistical visualization.

[Figure: file categories Inf, Bld, Prod, and Test, alongside the roles Production developer, Build developer, Tester, and Infrastructure developer (?)]
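The classification step can be sketched as a rule-based pass over file paths. This is only an illustration: the path patterns below are hypothetical, since the slides do not list the rules the study actually used, and the function names are made up for this example.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical path patterns, checked in order (first match wins).
# The slides do not give the study's real classification rules.
RULES = [
    ("Inf", ["*.pp", "*/manifests/*", "*/cookbooks/*", "*/recipes/*.rb"]),
    ("Bld", ["*Makefile", "*setup.py", "*pom.xml", "*build.xml"]),
    ("Test", ["*/tests/*", "*/test/*", "*test_*.py"]),
    ("Prod", ["*.py", "*.java", "*.c", "*.rb"]),
]


def classify(path: str) -> str:
    """Return the category of a file path, or "Other" if no rule matches."""
    for category, patterns in RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return "Other"


def category_counts(paths):
    """Count files per category for one project, discarding "Other"."""
    counts = Counter(classify(p) for p in paths)
    counts.pop("Other", None)
    return counts


project_files = [
    "puppet/manifests/init.pp",
    "cookbooks/apache/recipes/default.rb",
    "setup.py",
    "nova/tests/test_api.py",
    "nova/api.py",
    "README.md",
]
counts = category_counts(project_files)
```

Rule order matters in this sketch: a test file such as `nova/tests/test_api.py` also matches the `*.py` production pattern, so the Test rules are listed before the Prod rules.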


Case Study Results


PQ1: Infrastructure files are almost as large as source code and test files.

[Figure: File Size (LOC) per category: Infrastructure, Build, Production, Test. Values shown on the slide: 1100, 10000, 2,486, 54, 2991, 2768]


PQ2: The monthly change ratio (the proportion of files changed per month) for infrastructure files has a median value of 0.28.

Comparable to Production, and higher than Build & Test.

[Figure: proportion of files changed per month. Panels: Infrastructure & Build, Production & Test. Median values shown on the slide: 0.28, 0.28, 0.18, 0.21]
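A minimal sketch of how such a monthly change ratio could be computed for one file category. It assumes the ratio is the number of distinct files changed in a month divided by the total number of files in the category; the function name, input layout, and sample data are hypothetical, and the study may define the denominator differently.

```python
from collections import defaultdict
from statistics import median


def monthly_change_ratios(changes, total_files):
    """Map each month to the proportion of the category's files changed in it.

    changes: iterable of (month, path) pairs, one per file touched by a commit.
    total_files: number of files in the category (assumed constant here).
    """
    changed = defaultdict(set)
    for month, path in changes:
        changed[month].add(path)  # a file changed twice in a month counts once
    return {month: len(paths) / total_files for month, paths in changed.items()}


# Made-up change log for a category with 4 infrastructure files.
changes = [
    ("2014-01", "a.pp"), ("2014-01", "a.pp"), ("2014-01", "b.pp"),
    ("2014-02", "c.pp"),
    ("2014-03", "a.pp"), ("2014-03", "b.pp"), ("2014-03", "c.pp"),
]
ratios = monthly_change_ratios(changes, total_files=4)
median_ratio = median(ratios.values())
```

With this sample data the per-month ratios are 0.5, 0.25, and 0.75, so the median is 0.5.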


PQ3: Average churn per file is the highest for infrastructure files across all file categories.

[Figure: Average MCF (Monthly Churn/File). Panels: Infrastructure & Build, Production & Test]
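The MCF metric can be illustrated with a short sketch. It assumes churn means lines added plus lines deleted, and that the monthly churn is divided by the number of files in the category before averaging over months; the function name and the sample numbers are hypothetical.

```python
from collections import defaultdict


def average_mcf(changes, total_files):
    """Average monthly churn per file (MCF) for one file category.

    changes: iterable of (month, lines_added, lines_deleted) tuples.
    total_files: number of files in the category (assumed constant here).
    """
    churn = defaultdict(int)
    for month, added, deleted in changes:
        churn[month] += added + deleted  # churn = added + deleted lines
    if not churn:
        return 0.0
    mcf_per_month = [c / total_files for c in churn.values()]
    return sum(mcf_per_month) / len(mcf_per_month)


# Made-up data: two months of changes to a category with 4 files.
changes = [
    ("2014-01", 10, 2),
    ("2014-01", 3, 5),
    ("2014-02", 8, 4),
]
mcf = average_mcf(changes, total_files=4)
```

Here January has a churn of 20 lines (MCF 5.0) and February 12 lines (MCF 3.0), giving an average MCF of 4.0.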


Using Confidence to Measure Coupling

Association Rules (the classic example: Beer? Diapers?)
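For an association rule "left => right", confidence is the fraction of transactions containing the left side that also contain the right side. A minimal sketch applied to commits follows, assuming each commit is reduced to the set of file categories it touches; that representation and the sample commits are assumptions made for illustration.

```python
def confidence(commits, left, right):
    """Confidence of the rule left => right over a list of commits.

    commits: list of sets, each holding the file categories a commit touched.
    Returns (# commits with both left and right) / (# commits with left).
    """
    with_left = [c for c in commits if left in c]
    if not with_left:
        return 0.0
    both = sum(1 for c in with_left if right in c)
    return both / len(with_left)


# Made-up commit history, one set of touched categories per commit.
commits = [
    {"Inf", "Test"},
    {"Inf", "Prod", "Test"},
    {"Inf"},
    {"Prod"},
    {"Prod", "Test"},
    {"Inf", "Prod"},
]
inf_to_test = confidence(commits, "Inf", "Test")
test_to_inf = confidence(commits, "Test", "Inf")
```

Confidence is not symmetric: in this sample, 2 of the 4 commits touching Inf also touch Test (0.5), while 2 of the 3 commits touching Test also touch Inf (about 0.67). This is why a chart of coupling needs both directions for each pair of categories.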

RQ1: The changes to Infrastructure files are tightly coupled with the changes to Test and Production files.

[Figure: Implication of Infrastructure to other Category Code Change. Pairs: Infrastructure <=> Build, Infrastructure <=> Production, Infrastructure <=> Test. Legend: probability that left requires changes to right, and vice versa. Axis from 0.0 to 0.5. Values shown on the slide: 0.2637, 0.4583, 0.0347, 0.1085, 0.0885, 0.2578]

RQ1: The most common reasons for the coupling between Infrastructure and Test are "Integration" and "Update".

INTEGRATION (e.g.: enabling new test modules or integrating new test cases)

UPDATE (e.g.: changed a global variable value)


Check out our paper, please! :)