Slides of my talk at EVOLUMONS 2011 http://informatique.umons.ac.be/genlog/EvolMons/EvolMons2011.html
http://www.uax.es http://herraiz.org
The dynamics of software evolution
EVOLUMONS 2011, Research Seminar on Software Evolution
Université de Mons, Belgium, January 26th 2011
Israel Herraiz, Universidad Alfonso X el Sabio
<isra@herraiz.org> <herraiz@uax.es>
(c) 2011 Israel Herraiz
This work is licensed under the
Creative Commons Attribution-Share Alike 3.0
To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/
or send a letter to
Creative Commons, 171 Second Street, Suite 300,
San Francisco, California, 94105, USA.
Get the full bibliographic references listed in these slides at http://herraiz.org/stuff/evolumons_references_20110126.txt
Outline
● The laws of software evolution
● The nature of software evolution (for libre software)
● How to accurately forecast software evolution. And why it works.
● What's next?
● And what did I learn during all these years of work?
The laws of software evolution
My background
● Educated as a chemical and mechanical engineer
● Wasted my time in the chemical industry. But I did (and do) love doing software!
– http://caflur.sf.net http://gpinch.sf.net
● Involved in the open source community since around 2001, started a PhD in 2004 in the Libresoft research group
– http://libresoft.es
How it all started
● Godfrey and Tu [GT00] [GT01] studied the evolution of the Linux kernel
● They said that the laws of software evolution were not valid for Linux
– Laws of software evolution. What is that?
● My supervisors and I wrote a paper on the topic [RAGBH05]
● At the time, I thought it was just one more paper
● It turned out to be our most cited paper
● Completely puzzled me
The topic background: Software evolution
● How and why does software evolve?
● Meir M. Lehman: laws of software evolution
● “Program evolution. Processes of software change” published in 1985
The laws in the seventies
● Laws of Program Evolution Dynamics (1974) [Leh74] [Leh85b]
The evolution of the laws of software evolution
[Leh74] [Leh85b]
[Leh78] [Leh85c]
[Leh80] [LB85]
[Leh96] [LRW+97] [MFRP06]
The laws in the present day (I – IV)
The laws in the present day (V – VIII)
Empirical studies of software evolution
See “Empirical Studies of Open Source Evolution” by Juan Fernandez-Ramil, Angela Lozano, Michel Wermelinger, Andrea Capiluppi in Tom Mens, Serge Demeyer (eds.) Software Evolution
Why the controversy about the laws of software evolution?
● Fernandez-Ramil et al. found empirical validation in the literature for laws I, VI, VII (partially) and VIII (partially)
● The most interesting part (for me)
– Statistical analysis of software projects and their evolution, using time series analysis among other techniques (suggested in 1974!) [Leh74] [Leh85b]
– “For maximum cost-effectiveness, management consideration and judgement should include the entire history of the project with the current state having the strongest, but not exclusive, influence” [Leh78] [Leh85c]
The nature of (libre) software evolution
● The goal is to develop a theoretical model for software evolution
● Long pursued goal
● Lehman and Belady in 1971 [BL71] [LB85]
● Woodside progressive and anti-regressive work [Woo80] (included in [LB85])
● Turski models [Tur96] [Tur02]
– Growth is inversely proportional to complexity
– Complexity is proportional to the square of size
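The two Turski assumptions above can be combined into a simple recurrence: if growth is inversely proportional to complexity, and complexity is proportional to the square of size, then each release adds roughly effort / size² to the size. The sketch below is only an illustration of that recurrence; the names `turski_growth`, `effort` and `initial_size`, and the numeric values, are hypothetical and not taken from [Tur96] or [Tur02].

```python
def turski_growth(initial_size, effort, releases):
    """Simulate system size per release under an inverse-square growth rule.

    growth_i = effort / complexity_i, with complexity_i = size_i ** 2.
    `effort` and `initial_size` are illustrative values, not fitted to data.
    """
    sizes = [float(initial_size)]
    for _ in range(releases):
        size = sizes[-1]
        complexity = size ** 2                    # complexity ~ size^2
        sizes.append(size + effort / complexity)  # growth ~ 1 / complexity
    return sizes


sizes = turski_growth(initial_size=100.0, effort=5.0e5, releases=10)
increments = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
```

Under these assumptions the system keeps growing, but each increment is smaller than the previous one, because complexity grows faster than size.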
More recent models
● Self-Organized Criticality [Wu06] [WHH07]
● Power laws for the size of the system
● Long range correlations in the time series of changes
● Maintenance Guidance Model [CFR07]
● Those functions that have suffered more changes in the past are more likely to be changed in the future
● Assumptions:
– Distribution of accumulated changes is asymmetrical
– Developers prioritize changes using past number of changes and complexity
Determinism and evolution
● Self-Organized Criticality
● This means that current events are influenced by very old events
● Against Lehman's suggestions [Leh78] [Leh85c]
● In my opinion, counterintuitive
Long range correlated processes
[Figures: long range correlated processes; one panel is labelled "Unreachable"]
Short range correlated
[Figures: short range correlated processes]
What is software evolution like?
[Figure: a long range correlated process or a short range correlated one?]
Autocorrelation coefficients
[Figure: the series 1 2 3 4 5... is paired with copies of itself shifted by one lag (1 2 3 4...), by two lags (1 2 3...), and so on, giving the coefficients r(1), r(2), ...]
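The coefficients r(1), r(2), ... can be computed by correlating the series with lagged copies of itself. The sketch below uses the usual sample autocorrelation estimator; the slides do not say which estimator or software was used, so the function name `autocorrelation` and the toy series are assumptions made for illustration.

```python
def autocorrelation(series, max_lag):
    """Return the sample autocorrelation coefficients r(1)..r(max_lag)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    coefficients = []
    for k in range(1, max_lag + 1):
        # Pair each observation with the one k steps later.
        covariance = sum(deviations[t] * deviations[t + k] for t in range(n - k))
        coefficients.append(covariance / variance)
    return coefficients


# Toy series standing in for, e.g., the number of changes per month.
changes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r = autocorrelation(changes, max_lag=3)
```

For this toy series the coefficients decrease as the lag grows; how fast they decay is what separates long range from short range correlated processes.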
[Figure: r(k) against the lag k = 1, ..., 15, with r(k) between 0 and 1]
● Long range correlated: $r(k) \sim k^{2d-1}$, with $0 < d < 0.5$
● Short range correlated (ARIMA process): $r(k) \sim C^{1-k}$ (written as $A_i^{1-k}$ on the logarithmic-scale slide)
[Figure: the same plot in logarithmic scale]
Empirical study
● 3,821 software projects
– More than 3 developers
– More than 1 year of active history
– 9,234,104 commits / 2,357,438 modification requests
– Projects registered between Nov. 1999 and Dec. 2004
– Datasets publicly available
● Data: FLOSSMole + CVSAnalY-SF
● See "Determinism and evolution"
– 5th International Working Conference on Mining Software Repositories (MSR 2008)
Methodology
● Linear correlation to measure linearity
● Distribution of the Pearson coefficients
● Smoothing applied to the series before calculating the ACF
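One possible reading of the linearity check is sketched below: a power-law decay of r(k) is a straight line in log-log scale, while an exponential decay is a straight line when only r(k) is in logarithmic scale, so a Pearson coefficient can measure which description fits better. The slides do not state which transformation was actually correlated, so the function `decay_linearity` and the synthetic coefficients are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def decay_linearity(coefficients):
    """Return (log-log, semi-log) Pearson coefficients for r(1), r(2), ...

    Only positive coefficients are used, since the logarithm is undefined
    otherwise.
    """
    pairs = [(k, r) for k, r in enumerate(coefficients, start=1) if r > 0]
    lags = [k for k, _ in pairs]
    log_lags = [math.log(k) for k in lags]
    log_r = [math.log(r) for _, r in pairs]
    return pearson(log_lags, log_r), pearson(lags, log_r)


# Synthetic coefficients: a power law with d = 0.3 and an exponential decay.
power_law = [k ** (2 * 0.3 - 1) for k in range(1, 16)]
exponential = [0.5 ** k for k in range(1, 16)]
power_loglog, power_semilog = decay_linearity(power_law)
exp_loglog, exp_semilog = decay_linearity(exponential)
```

With the synthetic data, the power law is perfectly linear only in log-log scale and the exponential only in semi-log scale, which is the contrast the Pearson coefficients are meant to capture.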
Results
[Figures: results; the last panel marks short memory processes and long memory processes]
Looking at the numbers
Quantile   Commits   MRs
0          0.3235    0.2886
20         0.7394    0.7248
40         0.8178    0.8036
60         0.8906    0.8705
80         0.9783    0.9464
100        0.9998    0.9998
[Slide annotations: long memory process, short memory process]
Implications for evolution
● Short memory -> Yesterday's weather http://doi.ieeecomputersociety.org/10.1109/ICSM.2004.1357788
● When deciding, the current situation should have more influence
● As Lehman said in 1978
How to forecast software evolution
Background
● Forecasting traditionally done using very simple statistical models
● Regression
● Lehman suggested in 1974 that Time Series Analysis was the best approach to study software evolution
● Let's compare time series analysis against regression models
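As a point of reference for that comparison, a regression forecast can be as simple as fitting a straight line to size against time and extrapolating it. The sketch below is a minimal least-squares baseline; the slides do not give the exact regression model, and the data in `training_months` and `training_sizes` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: system size observed over six months.
training_months = [0, 1, 2, 3, 4, 5]
training_sizes = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]
slope, intercept = fit_line(training_months, training_sizes)

# Extrapolate the fitted line to the (hypothetical) test period.
forecast = [intercept + slope * month for month in (6, 7)]
```

The regression uses every past observation with equal weight, whereas a time series model lets recent observations dominate the forecast.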
Case studies
[Figure: timelines of PostgreSQL, FreeBSD and NetBSD over 1993-2007, each split into a training set followed by a test set]
Time Series Analysis
● Original time series data
● Compute ACF / PACF
● Clear pattern?
– No: apply kernel smoothing and compute ACF / PACF again
– Yes: choose p, d, q based on ACF / PACF
● ARIMA model fitting
● Predictions
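The kernel smoothing step in the process above can be sketched as a weighted moving average in which nearby observations get larger weights. The slides do not specify the kernel or the bandwidth, so the Gaussian kernel, the function name `kernel_smooth` and the `bandwidth` value below are assumptions.

```python
import math


def kernel_smooth(series, bandwidth):
    """Smooth a series with a Gaussian kernel (Nadaraya-Watson estimator).

    Each smoothed value is a weighted average of all observations, with
    weights that decay with the distance in time. `bandwidth` controls how
    quickly the weights decay and is an illustrative choice.
    """
    n = len(series)
    smoothed = []
    for t in range(n):
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in range(n)]
        total = sum(weights)
        smoothed.append(sum(w * x for w, x in zip(weights, series)) / total)
    return smoothed


# A spike in an otherwise flat series is spread over its neighbours.
noisy = [0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0]
smooth = kernel_smooth(noisy, bandwidth=1.0)
```

A larger bandwidth removes more noise but also flattens genuine structure, so the ACF / PACF should be inspected again after smoothing.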
Parameters of the model
Autocorrelation coefficients. No smoothing
Autocorrelation coefficients. After smoothing
Parameters of all the models
● Time series ARIMA model
● d = 1, q = 0, p = 6, 7 or 9
● Regression model
● r > 0.99
What does the model look like?
$$\nabla^d x_t \left(1 - \sum_{j=1}^{q} \theta_j B^j\right) = \epsilon_t \left(1 - \sum_{i=1}^{p} \phi_i B^i\right)$$
$$\nabla x_t = x_t - x_{t-1} = (1 - B)\,x_t$$
$$\nabla^d x_t = (1 - B)^d\,x_t$$
$$B^i x_t = x_{t-i}$$
● $x_t$: predicted / actual values
● $\theta_j$, $\phi_i$: coefficients
● $\epsilon_t$: estimation errors
● $p$, $d$, $q$: parameters of the model
● The two sums are the linear components
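To make the operators concrete, the sketch below applies the differencing operator (1 - B)^d and forecasts with a deliberately small ARIMA(1, 1, 0) model. It follows the common textbook convention in which the autoregressive terms act on the differenced series, omits the constant term, and uses p = 1 for brevity, whereas the models in these slides use p = 6, 7 or 9. The function names and the toy series are hypothetical.

```python
def difference(series, d=1):
    """Apply the operator (1 - B)^d, i.e. difference the series d times."""
    for _ in range(d):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


def fit_ar1(values):
    """Least-squares estimate of phi in y_t = phi * y_{t-1} + e_t."""
    numerator = sum(cur * prev for prev, cur in zip(values, values[1:]))
    denominator = sum(prev * prev for prev in values[:-1])
    if denominator == 0:
        return 0.0
    return numerator / denominator


def forecast_arima_110(series, steps):
    """Forecast `steps` values with an ARIMA(1, 1, 0) model, no constant."""
    diffs = difference(series, d=1)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    predictions = []
    for _ in range(steps):
        last_diff = phi * last_diff   # predicted next difference
        level = level + last_diff     # integrate back to the original scale
        predictions.append(level)
    return predictions


second_differences = difference([1, 4, 9, 16], d=2)
predictions = forecast_arima_110([100, 110, 120, 130, 140], steps=2)
```

In practice a statistics package would estimate all p coefficients at once; the point here is only how differencing, the autoregressive step and the integration back to the original scale fit together.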
Results: Time series (ARIMA) vs. regression
Mean Squared Relative Error
             ARIMA   Regression
FreeBSD      3.93    16.89
NetBSD       1.80    15.94
PostgreSQL   1.48    6.86
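The slides do not spell out how the Mean Squared Relative Error is computed, nor whether the figures above are percentages. A minimal sketch, assuming the usual definition (the mean of the squared errors relative to the actual values), could be:

```python
def mean_squared_relative_error(actual, predicted):
    """Mean of ((predicted - actual) / actual) ** 2 over all observations.

    Assumed definition; the actual values must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [((p - a) / a) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)


# Hypothetical test-set values and forecasts, for illustration only.
msre = mean_squared_relative_error([100.0, 200.0], [110.0, 180.0])
```

Because the error is relative to the actual value, projects of very different sizes can be compared on the same scale.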
Conclusions
● Time Series Analysis is more accurate than Regression Analysis for macroscopic predictions
● Basic model. More components can be added.
● Seasonality
● Multi-variable, combining different factors
More results
● Ok, so you predicted last year... which is past...
● What about predicting the real future?
● MSR Challenge 2007 winners
● Goal: predicting the number of changes in Eclipse in the next three months http://dx.doi.org/10.1109/MSR.2007.10
Why does this work?
● Isn't it too accurate?
● Why do you think this works?
What's next?
Further work
● Write a paper about the controversy around the validation of the laws of software evolution
● In progress
● Write a paper about the short memory nature of evolution
● Using Time Series Analysis to show it
● And ARIMA as a forecasting tool
● Extracting principles and guidelines for software project management
And what did I learn during all these years?
Things I appreciate my advisors did
● Freedom of movement
● Pressure to get my own funding
● Unconditional support
● Demanding and challenging environment
● Opportunity to coordinate projects
● And to participate in many meetings alone
Things that I did not know and I do now
● Know-how about conferences and journals
● English skills
● Writing skills (papers and proposals)
● Presentation skills
● Self-motivation
– Brick walls are there for the other people
– Experience is what you get when you don't get what you want
– Never give up
– http://www.youtube.com/watch?v=ji5_MqicxSo
Take away
● Laws of Software Evolution
● Controversy
● Statistical approach
● Replicable study
● Short memory dynamics
● ARIMA: accurate forecast
● Brick walls are a good thing
● Keep working. Don't give up