
The dynamics of software evolution - EVOLUMONS 2011

DESCRIPTION

Slides of my talk at EVOLUMONS 2011 http://informatique.umons.ac.be/genlog/EvolMons/EvolMons2011.html

Page 1: The dynamics of software evolution - EVOLUMONS 2011

http://www.uax.es http://herraiz.org

The dynamics of software evolution

EVOLUMONS 2011
Research Seminar on Software Evolution

Université de Mons, Belgium
January 26th 2011

Israel Herraiz
Universidad Alfonso X el Sabio

Page 2: The dynamics of software evolution - EVOLUMONS 2011

(c) 2011 Israel Herraiz

This work is licensed under the Creative Commons Attribution-Share Alike 3.0 license. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt

Page 3: The dynamics of software evolution - EVOLUMONS 2011

Outline

● The laws of software evolution
● The nature of software evolution (for libre software)
● How to accurately forecast software evolution. And why it works.
● What's next?
● And what did I learn during all these years of work?

Page 4: The dynamics of software evolution - EVOLUMONS 2011

The laws of software evolution

Page 5: The dynamics of software evolution - EVOLUMONS 2011

My background

● Educated as a chemical and mechanical engineer

● Wasted my time in the chemical industry. But I did (and do) love doing software!

– http://caflur.sf.net http://gpinch.sf.net

● Involved in the open source community since around 2001, started a PhD in 2004 in the Libresoft research group

– http://libresoft.es

Page 6: The dynamics of software evolution - EVOLUMONS 2011

How it all started

● Godfrey and Tu [GT00] [GT01] studied the evolution of the Linux kernel

● They said that the laws of software evolution were not valid for Linux

– Laws of software evolution. What is that?

● My supervisors and I wrote a paper on the topic [RAGBH05]

● At the time, I thought it was just one more paper

● It turned out to be our most cited paper
● Completely puzzled me

Page 7: The dynamics of software evolution - EVOLUMONS 2011

The topic background: Software evolution

● How and why does software evolve?

● Meir M. Lehman's laws of software evolution

● “Program evolution. Processes of software change” published in 1985

Page 8: The dynamics of software evolution - EVOLUMONS 2011

The laws in the seventies

● Laws of Program Evolution Dynamics (1974)

[Leh74] [Leh85b]

Page 9: The dynamics of software evolution - EVOLUMONS 2011

The evolution of the laws of software evolution

[Leh74] [Leh85b]
[Leh78] [Leh85c]
[Leh80] [LB85]
[Leh96] [LRW+97] [MFRP06]

Page 10: The dynamics of software evolution - EVOLUMONS 2011

The laws in the present day (I – IV)

Page 11: The dynamics of software evolution - EVOLUMONS 2011

The laws in the present day (V – VIII)

Page 12: The dynamics of software evolution - EVOLUMONS 2011

Empirical studies of software evolution

See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger and Andrea Capiluppi, in Tom Mens and Serge Demeyer (eds.), Software Evolution

Page 13: The dynamics of software evolution - EVOLUMONS 2011

Why the controversy about the laws of software evolution?

● Fernandez-Ramil et al. found empirical validation in the literature for laws I, VI, VII (partially) and VIII (partially)
● The most interesting part (for me)
– Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b]
– “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]

Page 14: The dynamics of software evolution - EVOLUMONS 2011

The nature of (libre) software evolution

Page 15: The dynamics of software evolution - EVOLUMONS 2011

The nature of (libre) software evolution

● The goal is to develop a theoretical model for software evolution
● Long pursued goal
● Lehman and Belady in 1971 [BL71] [LB85]
● Woodside progressive and anti-regressive work [Woo80] (included in [LB85])
● Turski models [Tur96] [Tur02]
– Growth is inversely proportional to complexity
– Complexity is proportional to the square of size
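
One way to read the two Turski assumptions: if growth per release is inversely proportional to complexity, and complexity is proportional to the square of size, then each release adds an increment proportional to 1/size². The sketch below simply iterates that recurrence. The function name, the effort constant and the starting size are illustrative assumptions, not values taken from the talk or from [Tur96].

```python
def inverse_square_growth(initial_size, effort, releases):
    """Iterate s[i+1] = s[i] + effort / s[i]**2 and return every size.

    `effort` is a hypothetical proportionality constant.
    """
    sizes = [float(initial_size)]
    for _ in range(releases):
        current = sizes[-1]
        # Growth ~ 1 / complexity, with complexity ~ size squared.
        sizes.append(current + effort / current ** 2)
    return sizes


if __name__ == "__main__":
    sizes = inverse_square_growth(initial_size=100.0, effort=1.0e6, releases=20)
    increments = [b - a for a, b in zip(sizes, sizes[1:])]
    print([round(s, 1) for s in sizes[:5]])
    print("increments shrink:", all(x > y for x, y in zip(increments, increments[1:])))
```

Under these assumptions the system keeps growing, but every release adds less than the previous one.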

Page 16: The dynamics of software evolution - EVOLUMONS 2011

More recent models

● Self-Organized Criticality [Wu06] [WHH07]
● Power laws for the size of the system
● Long range correlations in the time series of changes

● Maintenance Guidance Model [CFR07]
● Those functions that have suffered more changes in the past are more likely to be changed in the future
● Assumptions:
– Distribution of accumulated changes is asymmetrical
– Developers prioritize changes using past number of changes and complexity

Page 17: The dynamics of software evolution - EVOLUMONS 2011

Determinism and evolution

● Self-Organized Criticality
● This means that current events are influenced by very old events
● Against Lehman's suggestions [Leh78] [Leh85c]
● In my opinion, counterintuitive

Page 18: The dynamics of software evolution - EVOLUMONS 2011

Long range correlated processes

Page 19: The dynamics of software evolution - EVOLUMONS 2011

Long range correlated processes

Page 20: The dynamics of software evolution - EVOLUMONS 2011

Long range correlated processes

Unreachable

Page 21: The dynamics of software evolution - EVOLUMONS 2011

Short range correlated

Page 22: The dynamics of software evolution - EVOLUMONS 2011

Short range correlated

Page 23: The dynamics of software evolution - EVOLUMONS 2011

Short range correlated

Page 24: The dynamics of software evolution - EVOLUMONS 2011

Short range correlated

Page 25: The dynamics of software evolution - EVOLUMONS 2011

How is software evolution?

Long range correlated or short range correlated?

Page 26: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients

[Figure: the series 1 2 3 4 5... is compared with copies of itself shifted by one position, by two positions, and so on, giving the coefficients r(1), r(2), ...]
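
The coefficient r(k) measures how similar a series is to a copy of itself shifted by k positions. A minimal sketch of the usual sample estimator follows; the function name and the example data are made up for illustration and are not from the study.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation coefficient r(lag) of a sequence of numbers."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations from the mean (normalising term).
    denominator = sum((x - mean) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant; r(k) is undefined")
    # Products of deviations between the series and its copy shifted by `lag`.
    numerator = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical number of changes per period.
    changes = [5, 7, 6, 9, 12, 10, 14, 13, 17, 16]
    for k in range(1, 4):
        print(f"r({k}) = {autocorrelation(changes, k):.3f}")
```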

Page 27: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients

[Plot: autocorrelation coefficient r(k), from 0 to 1, against lag k = 1 ... 15]

Page 28: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients

[Plot: autocorrelation coefficient r(k), from 0 to 1, against lag k = 1 ... 15]

Long range correlated: $r(k) \sim k^{2d-1}$, with $0 < d < 0.5$

Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$

Page 29: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients

[Plot: autocorrelation coefficient r(k), from 0 to 1, against lag k, in logarithmic scale]

Long range correlated: $r(k) \sim k^{2d-1}$, with $0 < d < 0.5$

Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$
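
The two decay shapes can be told apart by how straight they look on logarithmic axes: a power law is a straight line when both axes are logarithmic, while a geometric decay is a straight line only when the vertical axis alone is logarithmic. The sketch below checks this with a Pearson coefficient. The values d = 0.3 and C = 1.5 are arbitrary choices for illustration, and which axes the slide's plot uses is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def long_range_profile(d, max_lag):
    """Power-law decay r(k) = k**(2d - 1) for k = 1..max_lag."""
    return [k ** (2 * d - 1) for k in range(1, max_lag + 1)]


def short_range_profile(c, max_lag):
    """Geometric decay r(k) = c**(1 - k) for k = 1..max_lag."""
    return [c ** (1 - k) for k in range(1, max_lag + 1)]


lags = list(range(1, 16))
log_lags = [math.log(k) for k in lags]
long_r = long_range_profile(d=0.3, max_lag=15)    # illustrative d
short_r = short_range_profile(c=1.5, max_lag=15)  # illustrative C

# Linearity on log-log axes (log k against log r).
log_log_long = pearson(log_lags, [math.log(r) for r in long_r])
log_log_short = pearson(log_lags, [math.log(r) for r in short_r])
# Linearity on semi-log axes (k against log r).
semi_log_short = pearson(lags, [math.log(r) for r in short_r])

print(f"power law, log-log:  {log_log_long:.4f}")
print(f"geometric, log-log:  {log_log_short:.4f}")
print(f"geometric, semi-log: {semi_log_short:.4f}")
```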

Page 30: The dynamics of software evolution - EVOLUMONS 2011

Empirical study

● 3,821 software projects
– More than 3 developers
– More than 1 year of active history
– 9,234,104 commits / 2,357,438 modification requests
– Projects registered between Nov. 1999 and Dec. 2004
– Datasets publicly available
● See “Determinism and evolution”
– 5th International Working Conference on Mining Software Repositories (MSR 2008)

FLOSSMole + CVSAnalY-SF

Page 31: The dynamics of software evolution - EVOLUMONS 2011

Methodology

● Linear correlation to measure linearity
● Distribution of the Pearson coefficients
● Smoothing applied to the series before calculating ACF
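
Smoothing removes short-lived noise so that the shape of the autocorrelation profile is easier to see. The slides do not say which kernel or bandwidth was used, so the sketch below assumes a Gaussian kernel over the time index (a Nadaraya-Watson smoother) with a hypothetical bandwidth.

```python
import math


def gaussian_kernel_smooth(series, bandwidth):
    """Smooth a series with a Gaussian kernel over the time index.

    The kernel choice and `bandwidth` are assumptions for illustration.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    n = len(series)
    smoothed = []
    for t in range(n):
        # Weight every observation by its distance in time from position t.
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed


if __name__ == "__main__":
    noisy = [0, 10, 0, 10, 0, 10, 0, 10]  # made-up data
    print([round(v, 2) for v in gaussian_kernel_smooth(noisy, bandwidth=2.0)])
```

A larger bandwidth averages over more neighbours and gives a flatter curve, so it should be chosen with the sampling period of the series in mind.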

Page 32: The dynamics of software evolution - EVOLUMONS 2011

Results

Page 33: The dynamics of software evolution - EVOLUMONS 2011

Results

Page 34: The dynamics of software evolution - EVOLUMONS 2011

Results

Short memory processes

Long memory processes

Page 35: The dynamics of software evolution - EVOLUMONS 2011

Looking at the numbers

Quantile   Commits   MRs
0          0.3235    0.2886
20         0.7394    0.7248
40         0.8178    0.8036
60         0.8906    0.8705
80         0.9783    0.9464
100        0.9998    0.9998

[Annotations on the table: Long memory process, Short memory process]

Page 36: The dynamics of software evolution - EVOLUMONS 2011

Implications for evolution

● Short memory -> Yesterday's weather
http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788
● When deciding, the current situation should have more influence
● As Lehman said in 1978

Page 37: The dynamics of software evolution - EVOLUMONS 2011

How to forecast software evolution

Page 38: The dynamics of software evolution - EVOLUMONS 2011

Background

● Forecasting traditionally done using very simple statistical models
● Regression
● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution
● Let's compare time series analysis against regression models

Page 39: The dynamics of software evolution - EVOLUMONS 2011

Case studies

[Figure: timeline from 1993 to 2007 for PostgreSQL, FreeBSD and NetBSD, each split into a training set and a test set]

Page 40: The dynamics of software evolution - EVOLUMONS 2011

Case studies

[Figure: training set and test set]

Page 41: The dynamics of software evolution - EVOLUMONS 2011

Time Series Analysis

[Flowchart: Original time series data → ACF / PACF → Clear pattern? If no: kernel smoothing, then ACF / PACF again. If yes: p, d, q based on ACF / PACF → ARIMA model fitting → Predictions]

Page 42: The dynamics of software evolution - EVOLUMONS 2011

Parameters of the model

Page 43: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients. No smoothing

Page 44: The dynamics of software evolution - EVOLUMONS 2011

Autocorrelation coefficients. After smoothing

Page 45: The dynamics of software evolution - EVOLUMONS 2011

Parameters of all the models

● Time series ARIMA model
● d = 1, q = 0, p = 6, 7 or 9
● Regression model
● r > 0.99

Page 46: The dynamics of software evolution - EVOLUMONS 2011

What does the model look like?

$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \phi_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \theta_i B^i\right)$

$\nabla x_t = x_t - x_{t-1} = (1 - B)\, x_t$

$\nabla^d x_t = (1 - B)^d x_t$

$B^i x_t = x_{t-i}$
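
The difference operator ∇ and the backshift operator B are easy to see in code: applying (1 − B) once replaces each value with its change since the previous value, and applying it d times repeats that step. The sketch below is only an illustration of the operators, not of the full model fitting; the helper names are hypothetical.

```python
def difference(series, d=1):
    """Apply the operator (1 - B) to `series` d times."""
    if d < 0:
        raise ValueError("d must be non-negative")
    result = list(series)
    for _ in range(d):
        # x_t - x_{t-1} for every consecutive pair.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


def integrate(differences, first_value):
    """Undo one round of differencing, given the first original value."""
    series = [first_value]
    for delta in differences:
        series.append(series[-1] + delta)
    return series


if __name__ == "__main__":
    sizes = [1, 4, 9, 16, 25]  # made-up series
    once = difference(sizes, d=1)
    print(once)
    print(difference(sizes, d=2))
    print(integrate(once, first_value=sizes[0]))
```

With d = 1, as reported in the talk, forecasts are made on the differenced series and then accumulated back onto the last known value.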

Page 47: The dynamics of software evolution - EVOLUMONS 2011

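What does the model look like?

$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \phi_j B^j\right) = \varepsilon_t \left(1 - \sum_{i=1}^{p} \theta_i B^i\right)$

Labels on the equation: both sums are linear components; p, d and q are the parameters of the model; $x_t$ are the predicted / actual values; $\phi_j$ and $\theta_i$ are the coefficients; $\varepsilon_t$ are the estimation errors.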

Page 48: The dynamics of software evolution - EVOLUMONS 2011

Results: Time series (ARIMA) vs. regression

Mean Squared Relative Error

Project      ARIMA   Regression
FreeBSD      3.93    16.89
NetBSD       1.80    15.94
PostgreSQL   1.48    6.86
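
The slides do not spell out how the Mean Squared Relative Error was computed. A minimal sketch, assuming the common definition (the mean of the squared relative differences between predicted and actual values), could look like this. Whether the figures in the talk are expressed as percentages is not stated, so no scaling is applied here, and the numbers in the usage example are made up.

```python
def mean_squared_relative_error(actual, predicted):
    """Mean of ((predicted - actual) / actual) ** 2 over all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    squared = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    actual = [100, 200]     # hypothetical observed values
    predicted = [110, 180]  # hypothetical forecasts
    print(mean_squared_relative_error(actual, predicted))
```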

Page 49: The dynamics of software evolution - EVOLUMONS 2011

Conclusions

● Time Series more accurate than Regression Analysis for macroscopic predictions
● Basic model. More components can be added.
● Seasonality
● Multi-variable, combining different factors

Page 50: The dynamics of software evolution - EVOLUMONS 2011

More results

● Ok, so you predicted last year... which is past...
● What about predicting real future?

MSR Challenge 2007 winners

Goal: predicting the number of changes in Eclipse in the next three months
http://dx.doi.org/10.1109/MSR.2007.10

Page 51: The dynamics of software evolution - EVOLUMONS 2011

Why does this work?

● Isn't it too accurate?
● Why do you think this works?

Page 52: The dynamics of software evolution - EVOLUMONS 2011

What's next?

Page 53: The dynamics of software evolution - EVOLUMONS 2011

Further work

● Write a paper about the controversy around the validation of the laws of software evolution
● In progress
● Write a paper about the short memory nature of evolution
● Using Time Series Analysis to show it
● And ARIMA as a forecasting tool
● Extracting principles and guidelines for software project management

Page 54: The dynamics of software evolution - EVOLUMONS 2011

And what did I learn during all these years?

Page 55: The dynamics of software evolution - EVOLUMONS 2011

Things I appreciate my advisors did

● Freedom of movement
● Pressure to get my own funding
● Unconditional support
● Demanding and challenging environment
● Opportunity to coordinate projects
● And to participate in many meetings alone

Page 56: The dynamics of software evolution - EVOLUMONS 2011

Things that I did not know and I do now

● Know-how about conferences and journals
● English skills
● Writing skills (papers and proposals)
● Presentation skills
● Self-motivation
– Brick walls are there for the other people
– Experience is what you get when you don't get what you want
– Never give up
– http://www.youtube.com/watch?v=ji5_MqicxSo

Page 57: The dynamics of software evolution - EVOLUMONS 2011

Take away

Laws of Software Evolution
Controversy
Statistical approach
Replicable study
Short memory dynamics
ARIMA accurate forecast
Brick walls are a good thing
Keep working. Don't give up