The dynamics of software evolution - EVOLUMONS 2011

Slides of my talk at EVOLUMONS 2011 http://informatique.umons.ac.be/genlog/EvolMons/EvolMons2011.html

http://www.uax.es http://herraiz.org

The dynamics of software evolution

EVOLUMONS 2011, Research Seminar on Software Evolution

Université de Mons, Belgium. January 26th 2011

Israel Herraiz, Universidad Alfonso X el Sabio

<isra@herraiz.org> <herraiz@uax.es>

(c) 2011 Israel Herraiz

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 license. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt

Outline

● The laws of software evolution
● The nature of software evolution (for libre software)
● How to accurately forecast software evolution. And why it works.
● What's next?
● And what did I learn during all these years of work?

The laws of software evolution

My background

● Educated as a chemical and mechanical engineer

● Wasted my time in the chemical industry. But I did (and do) love doing software!

– http://caflur.sf.net http://gpinch.sf.net

● Involved in the open source community since around 2001, started a PhD in 2004 in the Libresoft research group

– http://libresoft.es

How it all started

● Godfrey and Tu [GT00] [GT01] studied the evolution of the Linux kernel

● They said that the laws of software evolution were not valid for Linux

– Laws of software evolution. What is that?

● My supervisors and I wrote a paper on the topic [RAGBH05]

● At the time, I thought it was just one more paper

● It turned out to be our most cited paper

● Completely puzzled me

The topic background: Software evolution

● How and why does software evolve?

● Meir M. Lehman: laws of software evolution

● “Program evolution. Processes of software change” published in 1985

The laws in the seventies

● Laws of Program Evolution Dynamics (1974)

[Leh74] [Leh85b]

The evolution of the laws of software evolution

[Leh74][Leh85b]

[Leh78][Leh85c]

[Leh80][LB85]

[Leh96] [LRW+97][MFRP06]

The laws in the present day (I – IV)

The laws in the present day (V – VIII)

Empirical studies of software evolution

See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi in Tom Mens, Serge Demeyer (eds.) Software Evolution

Why the controversy about the laws of software evolution?

● Fernandez-Ramil et al. found empirical validation in the literature for laws I, VI, VII (partially) and VIII (partially)

● The most interesting part (for me)

– Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b]

– “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]

The nature of (libre) software evolution

● The goal is to develop a theoretical model for software evolution

● Long pursued goal
  ● Lehman and Belady in 1971 [BL71] [LB85]
  ● Woodside progressive and anti-regressive work [Woo80] (included in [LB85])
  ● Turski models [Tur96] [Tur02]
    – Growth is inversely proportional to complexity
    – Complexity is proportional to the square of size

More recent models

● Self-Organized Criticality [Wu06] [WHH07]
  ● Power laws for the size of the system
  ● Long range correlations in the time series of changes

● Maintenance Guidance Model [CFR07]
  ● Those functions that have suffered more changes in the past are more likely to be changed in the future
  ● Assumptions:
    – Distribution of accumulated changes is asymmetrical
    – Developers prioritize changes using past number of changes and complexity

Determinism and evolution

● Self-Organized Criticality
  ● This means that current events are influenced by very old events
  ● Against Lehman's suggestions [Leh78] [Leh85c]

● In my opinion, counterintuitive

Long range correlated processes

[Figure: long range correlated process, built up over three slides; surviving label: "Unreachable"]

Short range correlated

What is software evolution like?

Long range correlated, or short range correlated?

Autocorrelation coefficients

[Figure: the series 1 2 3 4 5... is paired with a copy of itself shifted by one position to obtain r(1), by two positions to obtain r(2), and so on]
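The shifting idea can be sketched in a few lines of Python. This is a minimal illustration of the usual sample autocorrelation estimator, not necessarily the exact estimator used in the study; the function name and the `changes` data are hypothetical.

```python
from statistics import fmean

def autocorrelation(series, max_lag):
    """Return [r(1), ..., r(max_lag)] for a sequence of numbers."""
    n = len(series)
    mean = fmean(series)
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        raise ValueError("a constant series has no autocorrelation")
    coefficients = []
    for k in range(1, max_lag + 1):
        # Pair each value with the value k positions later.
        covariance = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        coefficients.append(covariance / variance)
    return coefficients

# Hypothetical monthly change counts, for illustration only.
changes = [12, 15, 14, 18, 21, 19, 24, 26, 25, 30, 33, 31]
r = autocorrelation(changes, max_lag=4)
```

Each r(k) lies between -1 and 1. Values that stay high as k grows mean that old observations still carry information about the current one.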

Autocorrelation coefficients

[Figure: plot of r(k), between 0 and 1, against the lag k = 1, ..., 15]

Long range correlated: $r(k) \sim k^{2d-1}$, with $0 < d < 0.5$

Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$

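Autocorrelation coefficients

[Figure: plot of r(k), between 0 and 1, against the lag k on a logarithmic scale, comparing a short range correlated (ARIMA) process with a long range correlated one]

Long range correlated: $r(k) \sim k^{2d-1}$, with $0 < d < 0.5$

Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$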
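A small numerical sketch of why logarithmic scales help tell the two decay laws apart. The values `d = 0.3` and `C = 1.5` are arbitrary assumptions chosen for illustration: the power law is a straight line when both axes are logarithmic, while the exponential law is a straight line when only the r(k) axis is logarithmic.

```python
import math

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

d = 0.3   # assumed value, any 0 < d < 0.5 would do
C = 1.5   # assumed value, any constant > 1 would do
lags = list(range(1, 16))

long_range = [k ** (2 * d - 1) for k in lags]   # r(k) ~ k^(2d-1)
short_range = [C ** (1 - k) for k in lags]      # r(k) ~ C^(1-k)

# log r(k) against log k: slope 2d - 1 for the long range process.
loglog_slope = slope([math.log(k) for k in lags],
                     [math.log(r) for r in long_range])

# log r(k) against k: slope -log(C) for the short range process.
semilog_slope = slope(lags, [math.log(r) for r in short_range])
```

At lag 15 the long range value is still far above the short range one, which is the practical meaning of "long memory".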

Empirical study

● 3,821 software projects
  – More than 3 developers
  – More than 1 year of active history
  – 9,234,104 commits / 2,357,438 modification requests
  – Projects registered between Nov. 1999 and Dec. 2004
  – Datasets publicly available

● See "Determinism and evolution"
  – 5th International Working Conference on Mining Software Repositories (MSR 2008)

FLOSSMole + CVSAnalY-SF

Methodology

● Linear correlation to measure linearity
● Distribution of the Pearson coefficients
● Smoothing applied to the series before calculating the ACF
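One possible reading of the first two bullets, sketched as an assumption rather than as the study's actual procedure: take the autocorrelation coefficients r(k) of a project and use Pearson's coefficient to measure how close the r(k) profile is to a straight line. The `acf_values` below are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

lags = list(range(1, 9))
# Hypothetical autocorrelation coefficients r(1), ..., r(8).
acf_values = [0.92, 0.81, 0.73, 0.62, 0.51, 0.43, 0.31, 0.22]

# The sign only says that r(k) decreases with k, so keep the magnitude.
linearity = abs(pearson(lags, acf_values))
```

Repeating this for every project gives one coefficient per project, and the distribution of those coefficients can then be summarized, for example by quantiles.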

Results

[Figure: results plot, built up over three slides; surviving labels: "Short memory processes" and "Long memory processes"]

Looking at the numbers

Quantile   Commits   MRs
0          0.3235    0.2886
20         0.7394    0.7248
40         0.8178    0.8036
60         0.8906    0.8705
80         0.9783    0.9464
100        0.9998    0.9998

[Annotations on the table: "Long memory process" and "Short memory process"]

Implications for evolution

● Short memory -> Yesterday's weather
  http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788

● When deciding, the current situation should have more influence
  ● As Lehman said in 1978

How to forecast software evolution

Background

● Forecasting traditionally done using very simple statistical models
  ● Regression

● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution

● Let's compare time series analysis against regression models

Case studies

[Figure: timelines of PostgreSQL, FreeBSD and NetBSD on a time axis running from 1993 to 2007, each split into a training set and a test set]

Time Series Analysis

Original time series data -> ACF / PACF -> Clear pattern?
– No: kernel smoothing, then back to ACF / PACF
– Yes: p, d, q based on ACF / PACF -> ARIMA model fitting -> Predictions
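The slides do not say which kernel or bandwidth was used for the smoothing step. As a minimal sketch, assuming a Gaussian kernel over the time index and a hand-picked bandwidth (both assumptions), a Nadaraya-Watson style smoother looks like this:

```python
import math

def kernel_smooth(series, bandwidth):
    """Smooth a series with a Gaussian kernel over the time index.

    Each smoothed value is a weighted average of the whole series, with
    weights that decay with the distance in time to the current point.
    """
    n = len(series)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed

# Hypothetical noisy series, for illustration only.
noisy = [0, 10, 0, 10, 0, 10, 0, 10]
smooth = kernel_smooth(noisy, bandwidth=2.0)
```

A larger bandwidth gives a smoother series. The ACF and PACF would then be recomputed on the smoothed series to look for a clearer pattern.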

Parameters of the model

Autocorrelation coefficients. No smoothing

Autocorrelation coefficients. After smoothing

Parameters of all the models

● Time series ARIMA model
  ● d = 1, q = 0, p = 6, 7 or 9

● Regression model
  ● r > 0.99

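What does the model look like?

$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \alpha_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \beta_i B^i\right)$

$\nabla x_t = x_t - x_{t-1} = (1 - B)\,x_t$

$\nabla^d x_t = (1 - B)^d x_t$

$B^i x_t = x_{t-i}$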

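What does the model look like?

$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \alpha_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \beta_i B^i\right)$

● Predicted / actual values: $x_t$
● Coefficients: $\alpha_j$, $\beta_i$
● Estimation errors: $\varepsilon_t$
● Parameters of the model: $p$, $d$, $q$
● Each of the two sums is a linear component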
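A minimal sketch of fitting and forecasting with d = 1 and q = 0, the orders reported for the case studies. It assumes the common convention in which p counts the autoregressive terms, so the model reduces to an autoregression on the first differences, and it estimates the coefficients by ordinary least squares. Both choices are assumptions: the slides do not name the estimation method, and a statistics package would normally be used instead. All function names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar_on_differences(series, p):
    """Least-squares fit of w_t = sum_i phi_i * w_(t-i), with w the first differences."""
    w = [b - a for a, b in zip(series, series[1:])]
    rows = [[w[t - i] for i in range(1, p + 1)] for t in range(p, len(w))]
    targets = [w[t] for t in range(p, len(w))]
    # Normal equations: (X^T X) phi = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

def forecast(series, phi, steps):
    """Forecast the next values by predicting differences and accumulating them."""
    w = [b - a for a, b in zip(series, series[1:])]
    level = series[-1]
    predictions = []
    for _ in range(steps):
        next_w = sum(phi[i - 1] * w[-i] for i in range(1, len(phi) + 1))
        w.append(next_w)
        level += next_w
        predictions.append(level)
    return predictions

# Synthetic series whose differences follow a known recursion (hypothetical
# data, used only to check that the fit recovers the coefficients).
true_phi = [0.5, 0.3]
diffs = [1.0, 2.0]
for _ in range(30):
    diffs.append(true_phi[0] * diffs[-1] + true_phi[1] * diffs[-2])
series = [100.0]
for step in diffs:
    series.append(series[-1] + step)

phi = fit_ar_on_differences(series[:25], p=2)      # training set
predicted = forecast(series[:25], phi, steps=3)    # compare with series[25:28]
```

On noise-free synthetic data the fit recovers the generating coefficients, so the forecasts match the held-out values. Real change histories are noisy, and the choice of p would come from the ACF / PACF inspection.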

Results: Time series (ARIMA) vs. regression

Mean Squared Relative Error

             ARIMA   Regression
FreeBSD       3.93        16.89
NetBSD        1.80        15.94
PostgreSQL    1.48         6.86
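The slides do not define the error measure. A common reading of "mean squared relative error", assumed here, is the average of the squared relative deviations between predicted and actual values over the test set. The function name and the sample numbers are hypothetical.

```python
def mean_squared_relative_error(actual, predicted):
    """Average of ((predicted - actual) / actual)^2 over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Invented values: each prediction is off by 10% of the actual value.
msre = mean_squared_relative_error([100.0, 200.0], [110.0, 180.0])
```

Because the deviations are relative, projects of very different sizes can be compared on the same scale.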

Conclusions

● Time Series Analysis is more accurate than Regression Analysis for macroscopic predictions

● Basic model. More components can be added.
  ● Seasonality
  ● Multi-variable, combining different factors

More results

● OK, so you predicted last year... which is in the past...
● What about predicting the real future?

MSR Challenge 2007 winners

Goal: predicting the number of changes in Eclipse in the next three months
http://dx.doi.org/10.1109/MSR.2007.10

Why does this work?

● Isn't it too accurate?
● Why do you think this works?

What's next?

Further work

● Write a paper about the controversy around the validation of the laws of software evolution
  ● In progress

● Write a paper about the short memory nature of evolution
  ● Using Time Series Analysis to show it
  ● And ARIMA as a forecasting tool
  ● Extracting principles and guidelines for software project management

And what did I learn during all these years?

Things I appreciate my advisors did

● Freedom of movement
● Pressure to get my own funding
● Unconditional support
● Demanding and challenging environment
● Opportunity to coordinate projects
● And to participate in many meetings alone

Things that I did not know and I do now

● Know-how about conferences and journals
● English skills
● Writing skills (papers and proposals)
● Presentation skills
● Self-motivation
  – Brick walls are there for the other people
  – Experience is what you get when you don't get what you want
  – Never give up
  – http://www.youtube.com/watch?v=ji5_MqicxSo

Take away

● Laws of Software Evolution
● Controversy
● Statistical approach
● Replicable study
● Short memory dynamics
● ARIMA: accurate forecast
● Brick walls are a good thing
● Keep working. Don't give up
