Software Reengineering & Evolution
Serge Demeyer, Stéphane Ducasse, Oscar Nierstrasz
http://scg.unibe.ch/download/oorp/
© S. Demeyer, S. Ducasse, O. Nierstrasz Software Reengineering and Evolution.2
Schedule
1. Introduction: there are OO legacy systems too!
2. Reverse Engineering: how to understand your code
3. Visualization: a scalable approach
4. Restructuring: how to refactor your code
5. Code Duplication: the most typical problems
6. Software Evolution: learn from the past
7. Conclusion
Goals
We will try to convince you:
• Yes, Virginia, there are object-oriented legacy systems too!
• Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)
• There is a large set of lightweight tools and techniques to help you with reengineering.
• Despite these tools and techniques, people must do the job, and they are the most valuable resource.
What is a Legacy System?
“legacy”
A sum of money, or a specified article, given to another by will; anything handed down by an ancestor or predecessor.
— Oxford English Dictionary
⇒ so, further evolution and development may be prohibitively expensive
A legacy system is a piece of software that:
• you have inherited, and
• is valuable to you.
Typical problems with legacy systems:
• original developers not available
• outdated development methods used
• extensive patches and modifications have been made
• missing or outdated documentation
Software Maintenance - Cost
Relative cost of fixing mistakes, by phase: requirement ×1, design ×5, coding ×10, testing ×20, delivery ×200
Relative maintenance effort: between 50% and 75% of global effort is spent on “maintenance”!
Solution?
• Better requirements engineering?
• Better software methods & tools (database schemas, CASE-tools, objects, components, …)?
Continuous Development
Maintenance effort by category (data from [Lien78a]):
• 17.4% Corrective (fixing reported errors)
• 18.2% Adaptive (new platforms or OS)
• 60.3% Perfective (new functionality)
• 4.1% Other
The bulk of the maintenance cost is due to new functionality ⇒ even with better requirements, it is hard to predict new functions.
Modern Methods & Tools?
[Glas98a] quoting an empirical study by Sasa Dekleva (1992)
• Modern methods(*) lead to more reliable software
• Modern methods lead to less frequent software repair
• and ... modern methods lead to more total maintenance time
Contradiction? No!
• Modern methods make it easier to change ... and this capacity is used to enhance functionality!
(*) process-oriented structured methods, information engineering, data-oriented methods, prototyping, CASE-tools – not OO!
Lehman's Laws
A classic study by Lehman and Belady [Lehm85a] identified several “laws” of system change.
Continuing change
• A program that is used in a real-world environment must change, or become progressively less useful in that environment.
Increasing complexity
• As a program evolves, it becomes more complex, and extra resources are needed to preserve and simplify its structure.
Those laws are still applicable…
What about Objects?
Object-oriented legacy systems
• = successful OO systems whose architecture and design no longer respond to changing requirements
Compared to traditional legacy systems
• the symptoms and the source of the problems are the same
• the technical details and solutions may differ
OO techniques promise better
• flexibility,
• reusability,
• maintainability,
• …
⇒ they do not come for free
What about Components ?
Components are very brittle … After a while one inevitably resorts to glue :)
Soccer Field Metaphor
© A. Van Deursen
• Assume 10 lines of code = 40 tiles of 1 x 1 cm
• 12.5 million lines of code ≈ 40 soccer fields
A. van Deursen, De software-evolutieparadox, inaugural lecture (Intreerede), TU Delft, 23 Feb 2005
Imagine 400 developers concurrently moving tiles around on 40 soccer fields
…
How to deal with Legacy?
New or changing requirements will gradually degrade the original design… unless extra development effort is spent to adapt the structure.
New functionality:
• Hack it in? ⇒ duplicated code, complex conditionals, abusive inheritance, large classes/methods
☞ Take a loan on your software ⇒ pay back via reengineering
• First … refactor, restructure, reengineer
☞ Investment for the future ⇒ paid back during maintenance
Common Symptoms
Lack of knowledge
• obsolete or no documentation
• departure of the original developers or users
• disappearance of inside knowledge about the system
• limited understanding of the entire system
⇒ missing tests
Process symptoms
• too long to turn things over to production
• need for constant bug fixes
• maintenance dependencies
• difficulties separating products
⇒ simple changes take too long
Code symptoms
• duplicated code
• code smells
⇒ big build times
The Reengineering Life-Cycle
Levels: Requirements, Designs, Code
(0) requirement analysis
(1) model capture
(2) problem detection
(3) problem resolution
(4) program transformation
• people centric
• lightweight
A Map of Reengineering Patterns
Tests: Your Life Insurance
Detailed Model Capture
Initial Understanding
First Contact
Setting Direction
Migration Strategies
Detecting Duplicated Code
Redistribute Responsibilities
Transform Conditionals to Polymorphism
2. Reverse Engineering
• What and Why
• First Contact
☞ Interview during Demo
• Initial Understanding
☞ Analyze the Persistent Data
• Detailed Model Capture
☞ Look for the Contracts
What and Why?
Definition
Reverse Engineering is the process of analysing a subject system
☞ to identify the system’s components and their interrelationships and
☞ create representations of the system in another form or at a higher level of abstraction. — Chikofsky & Cross, ’90
Motivation
Understanding other people’s code (cf. newcomers in the team, code reviewing, original developers who left, ...)
Generating UML diagrams is NOT reverse engineering ... but it is a valuable support tool
The Reengineering Life-Cycle
Focus: (0) requirement analysis and (1) model capture
Issues: • scale • speed • accuracy • politics
First Contact
Feasibility assessment (one week time)
Talk about it (system experts), and verify what you hear:
• Talk with developers: Chat with the Maintainers
• Talk with end users: Interview during Demo
Software system:
• Read it: Read All the Code in One Hour
• Compile it: Do a Mock Installation
• Read about it: Skim the Documentation
First Project Plan
Use standard templates, including:
• project scope
☞ see "Setting Direction"
• opportunities
☞ e.g., skilled maintainers, readable source code, documentation
• risks
☞ e.g., absent test suites, missing libraries, …
☞ record likelihood (unlikely, possible, likely) & impact (high, moderate, low) for causing problems
• go/no-go decision
• activities
☞ fish-eye view
Interview during Demo
Problem: What are the typical usage scenarios?
Solution: Ask the user!
• ... however
☞ Which user?
☞ Users complain
☞ What should you ask?
• Solution: interview during demo
- select several users
- demo puts a user in a positive mindset
- demo steers the interview
Initial Understanding
Goal: understand ⇒ higher-level model
Top down
• Speculate about Design ⇒ recover design
Bottom up
• Analyze the Persistent Data ⇒ recover database
• Study the Exceptional Entities ⇒ identify problems
Analyze the Persistent Data
Problem: Which objects represent valuable data?
Solution: Analyze the database schema
• Prepare Model
☞ tables ⇒ classes; columns ⇒ attributes
☞ candidate keys (naming conventions + unique indices)
☞ foreign keys (column types + naming conventions + view declarations + join clauses)
• Incorporate Inheritance
☞ one to one; rolled down; rolled up
• Incorporate Associations
☞ association classes (e.g., many-to-many associations)
☞ qualified associations
• Verification
☞ Data samples + SQL statements
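The "Prepare Model" step can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: the schema has already been extracted into a dict of table name to column names, and foreign keys follow a hypothetical `<table>ID` naming convention. `prepare_model` and `CandidateClass` are illustrative names, not part of the pattern. The sketch ignores column types, unique indices, view declarations and join clauses, even though the slide also lists them as evidence.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateClass:
    name: str
    attributes: list[str]
    references: dict[str, str] = field(default_factory=dict)  # column -> target class


def prepare_model(schema: dict[str, list[str]]) -> dict[str, CandidateClass]:
    """Map tables to classes and columns to attributes; guess foreign keys by name."""
    # Case-insensitive lookup so that 'patientID' can be matched against table 'Patient'.
    tables = {t.lower(): t for t in schema}
    model = {}
    for table, columns in schema.items():
        cls = CandidateClass(name=table, attributes=list(columns))
        for col in columns:
            # Assumed convention: a column '<table>ID' refers to that table.
            if col.lower().endswith("id") and col.lower() != "id":
                target = tables.get(col[:-2].lower())
                if target is not None and target != table:
                    cls.references[col] = target
        model[table] = cls
    return model


schema = {
    "Patient": ["id", "insuranceID", "insurance"],
    "Treatment": ["patientID", "date", "nr", "comment"],
}
model = prepare_model(schema)
print(model["Treatment"].references)  # {'patientID': 'Patient'}
```

`insuranceID` is not flagged because there is no `Insurance` table. Whether it is a key into some other system is the kind of question the verification step (data samples + SQL statements) has to answer.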
Example: One To One
Tables (one table per class):
Person id: char(5) name: char(40) address: char(60)
Patient id: char(5) insuranceID: char(7) insurance: char(5)
Salesman id: char(5) company: char(40)
Class model: Patient and Salesman are subclasses of Person, with the same attributes as their tables.
Example: Rolled Down
Tables (one table per subclass; the inherited columns are copied down):
Patient id: char(5) name: char(40) address: char(60) insuranceID: char(7) insurance: char(5)
Salesman id: char(5) name: char(40) address: char(60) company: char(40)
Class model: Person id: char(5) name: char(40) address: char(60), with subclasses Patient (insuranceID: char(7), insurance: char(5)) and Salesman (company: char(40))
Example: Rolled Up
Table (one table for the whole hierarchy; subclass columns are optional):
Person id: char(5) name: char(40) address: char(60) insuranceID: char(7) «optional» insurance: char(5) «optional» company: char(40) «optional»
Class model: Person id: char(5) name: char(40) address: char(60), with subclasses Patient (insuranceID: char(7), insurance: char(5)) and Salesman (company: char(40))
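For the rolled-up mapping, the verification step (data samples + SQL statements) can be sketched with Python's built-in `sqlite3` module. The sketch assumes a hypothetical in-memory copy of the `Person` table, and the sample rows are invented for illustration. The query classifies each row by which «optional» columns are filled in, so rows that fit no single subclass stand out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Person (
           id CHAR(5) PRIMARY KEY,
           name CHAR(40),
           address CHAR(60),
           insuranceID CHAR(7),  -- optional: only for patients
           insurance CHAR(5),    -- optional: only for patients
           company CHAR(40)      -- optional: only for salesmen
       )"""
)
conn.executemany(
    "INSERT INTO Person VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("p0001", "Ann", "Main St 1", "I000001", "ACME", None),
        ("p0002", "Bob", "Side St 2", None, None, "Globex"),
        ("p0003", "Cid", "Elm St 3", "I000002", "ACME", "Initech"),  # suspicious
    ],
)

# Classify each row by which optional columns are filled in.
query = """
SELECT id,
       CASE
         WHEN insuranceID IS NOT NULL AND company IS NOT NULL THEN 'both?'
         WHEN insuranceID IS NOT NULL THEN 'Patient'
         WHEN company IS NOT NULL THEN 'Salesman'
         ELSE 'Person'
       END AS kind
FROM Person
ORDER BY id
"""
kinds = dict(conn.execute(query).fetchall())
print(kinds)  # {'p0001': 'Patient', 'p0002': 'Salesman', 'p0003': 'both?'}
```

A row flagged `both?` means one of two things: the hierarchy allows overlapping roles, or the data is inconsistent. Either way, it is worth raising with the maintainers.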
Example: Qualified Association
Tables:
Patient id: char(5) …
Treatment patientID: char(5) date: date nr: integer comment: varchar(255)
Class model:
Patient id: char(5) …, with addTreatment(d, n, t) and lookupTreatment(d, n)
Patient [date: Date, nr: Integer] 1 ⇒ 1 Treatment comment: Text
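In code, a qualified association usually becomes a map keyed by the qualifier. The following is a minimal Python sketch, assuming `(date, nr)` identifies at most one treatment per patient. `add_treatment` and `lookup_treatment` are hypothetical Python spellings of `addTreatment(d, n, t)` and `lookupTreatment(d, n)`. Rejecting a duplicate key is a design choice of this sketch, not something the schema states.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Treatment:
    comment: str


@dataclass
class Patient:
    id: str
    # Qualified association: the qualifier (date, nr) selects at most one Treatment.
    _treatments: dict[tuple[date, int], Treatment] = field(
        default_factory=dict, init=False, repr=False
    )

    def add_treatment(self, d: date, n: int, t: Treatment) -> None:
        key = (d, n)
        if key in self._treatments:
            raise ValueError(f"treatment {key} already recorded for {self.id}")
        self._treatments[key] = t

    def lookup_treatment(self, d: date, n: int) -> Optional[Treatment]:
        return self._treatments.get((d, n))


p = Patient("p0001")
p.add_treatment(date(2005, 2, 23), 1, Treatment("check-up"))
print(p.lookup_treatment(date(2005, 2, 23), 1))  # Treatment(comment='check-up')
print(p.lookup_treatment(date(2005, 2, 23), 2))  # None
```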
Initial Understanding (revisited)
Goal: understand ⇒ higher-level model, through ITERATION between:
Top down
• Speculate about Design ⇒ recover design
Bottom up
• Analyze the Persistent Data ⇒ recover database
• Study the Exceptional Entities ⇒ identify problems
3. Software Visualization
• Introduction
☞ The Reengineering life-cycle
• Examples
• Lightweight Approaches
☞ CodeCrawler
• Dynamic Analysis
• Conclusion
The Reengineering Life-cycle
Focus: (2) problem detection
Issues: • Tool support • Scalability • Efficiency
Visualising Hierarchies
• Euclidean cones
☞ Pros: • More info than 2D
☞ Cons: • Lack of depth • Navigation
• Hyperbolic trees
☞ Pros: • Good focus • Dynamic
☞ Cons: • Copyright
Bottom Up Visualisation
All program entities and relations ⇒ Filter
A lightweight approach
• A combination of metrics and software visualization
☞ Visualize software using colored rectangles for the entities and edges for the relationships
☞ Render up to five metrics on one node:
• Size (1+2): width and height
• Color (3): color tone
• Position (4+5): X and Y coordinates
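The mapping from metrics to a node can be sketched as follows. This minimal Python sketch assumes the metrics are already computed per entity; the names `NOA`, `NOM`, `LOC` and the sample values are hypothetical. It normalises the color tone against the largest value and does not lay out nodes or draw edges. The position metrics are optional because views such as the System Complexity View place nodes by the inheritance tree instead.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    width: float
    height: float
    grey: float  # color tone: 0.0 = white, 1.0 = black
    x: float
    y: float


def polymetric_nodes(entities, width_m, height_m, color_m, x_m=None, y_m=None):
    """Map up to five metrics of each entity onto a rectangle.

    entities: dict of entity name -> dict of metric name -> value.
    The color tone is normalised against the largest value of color_m.
    """
    max_color = max(e[color_m] for e in entities.values()) or 1
    nodes = []
    for name, m in entities.items():
        nodes.append(Node(
            name=name,
            width=m[width_m],
            height=m[height_m],
            grey=m[color_m] / max_color,
            x=m[x_m] if x_m else 0.0,
            y=m[y_m] if y_m else 0.0,
        ))
    return nodes


# System Complexity View: width = attributes, height = methods, color = lines of code.
classes = {
    "Person": {"NOA": 3, "NOM": 10, "LOC": 200},
    "Patient": {"NOA": 2, "NOM": 4, "LOC": 50},
}
nodes = polymetric_nodes(classes, "NOA", "NOM", "LOC")
for n in nodes:
    print(n.name, n.width, n.height, round(n.grey, 2))
# Person 3 10 1.0
# Patient 2 4 0.25
```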
System Complexity View
Nodes: Classes
Edges: Inheritance Relationships
Width: Number of attributes
Height: Number of methods
Color: Number of lines of code
Inheritance Classification View
Boxes: Classes
Edges: Inheritance
Width: Number of Methods Added
Height: Number of Methods Overridden
Color: Number of Methods Extended
Data Storage Class Detection View
Boxes: Classes
Width: Number of Methods
Height: Lines of Code
Color: Lines of Code
Industrial Validation
• Nokia (C++, 1.2 MLOC, >2300 classes)
• Nokia (C++/Java, 120 kLOC, >400 classes)
• MGeniX (Smalltalk, 600 kLOC, >2100 classes)
• Bedag (COBOL, 40 kLOC)
• ...
Personal experience: 2-3 days to get something
Used by developers + consultants
COBOL Call Graph
Program Dynamics
• Simple
• Reproducible
• Scales well
Frequency Spectrum
• Visualization of similarities in event traces
• Eliminate similarities
Key Concept Identification
• Extract run-time coupling
• Apply data mining (“google”)
• Experiment with documented open-source cases (Ant, JMeter)
☞ recall: ± 90%
☞ precision: ± 60%
Class | IC_CC′ + web-mining | Ant docs
Project | √ | √
UnknownElement | √ | √
Task | √ | √
Main | √ | √
IntrospectionHelper | √ | √
ProjectHelper | √ | √
RuntimeConfigurable | √ | √
Target | √ | √
ElementHandler | √ | √
TaskContainer | × | √
Recall (%) | 90 | -
Precision (%) | 60 | -
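Recall and precision compare the classes a technique reports against a reference set, here the key classes named in the Ant documentation. Recall can be recomputed from the table: the technique marks 9 of the 10 documented classes with √ and misses TaskContainer. Precision also needs the total number of classes the technique reported, which the slide does not give. This minimal sketch therefore only defines precision.

```python
def recall(found: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant (documented) key classes that the technique found."""
    return len(found & relevant) / len(relevant)


def precision(found: set[str], relevant: set[str]) -> float:
    """Fraction of the classes reported by the technique that are relevant."""
    return len(found & relevant) / len(found)


ant_docs = {
    "Project", "UnknownElement", "Task", "Main", "IntrospectionHelper",
    "ProjectHelper", "RuntimeConfigurable", "Target", "ElementHandler",
    "TaskContainer",
}
# Rows marked with a check mark for the technique in the table (TaskContainer is not).
detected = ant_docs - {"TaskContainer"}

print(f"recall = {recall(detected, ant_docs):.0%}")  # recall = 90%
```

Calling `precision(detected, ant_docs)` on these sets would return 1.0, which is meaningless here. `detected` only holds the rows shown in the table, not everything the technique reported.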
Replication
T. Eisenbarth, R. Koschke, and D. Simon. Locating features in source code. IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.
“Replication is not supported, industrial cases are rare, … In order to help the discipline mature, we think that more systematic empirical evaluation is needed.” [Tonella et al., in Empirical Software Engineering]
Pilot Study: ATM Simulation
Assumptions
• Feature: invoked from the outside.
• Map: a scenario-feature map exists.
• Recompile: recompilation or instrumentation is possible.
• Isolate: the system can run in isolation (prevents noise).
• Manual: dynamic analysis can be performed without help (i.e., no operator).
• Generic: no limit to the granularity of the computational unit.
scenario-feature map
concept lattice
Case Study: Portfolio Management
2nd iteration
3rd iteration
4. Restructuring
Most common situations
• Transform Conditionals to Polymorphism
☞ Transform Self Type Checks
☞ Transform Provider Type Checks
• Redistribute Responsibilities
☞ Move Behaviour Close to Data
☞ Eliminate Navigation Code
☞ Split up God Class
☞ Empirical Validation
Transform Conditionals to Polymorphism
Condition tested ⇒ pattern:
• test self type ⇒ Transform Self Type Checks
• test provider type ⇒ Transform Client Type Checks
• test external attribute ⇒ Transform Conditionals into Registration
• test null values ⇒ Introduce Null Object
• test object state ⇒ Factor Out State, Factor Out Strategy
Example: Transform Conditional
⇒ Transform Self Type Checks:
    class Message {
    private:
        int type_;
        void* data;
        ...
        void send(Channel* ch) {
            switch (type_) {
                case TEXT: {
                    ch->nextPutAll(data);
                    break;
                }
                case ACTION: {
                    ch->doAction(data);
                    ...
⇒ Transform Client Type Checks:
    void makeCalls(Telephone* phoneArray[]) {
        // phoneArray is an array of pointers terminated by a null pointer
        for (Telephone** pp = phoneArray; *pp; pp++) {
            Telephone* p = *pp;
            switch (p->phoneType()) {
                case TELEPHONE::POTS: {
                    POTSPhone* potsp = (POTSPhone*)p;
                    potsp->tourne();
                    potsp->call();
                    ...
                case TELEPHONE::ISDN: {
                    ISDNPhone* isdnp = (ISDNPhone*)p;
                    isdnp->initLine();
                    isdnp->connect();
                    ...
Transform Self Type Check
Before: Client1 and Client2 call Message send(), which switches on the message type:
    switch (type_) {
        case TEXT: { ch->nextPutAll(data); break; }
        case ACTION: { ch->doAction(data);
        ...
After: Client1 and Client2 call Message send(); the subclasses TextMessage and ActionMessage each implement send().
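The same refactoring can be shown in a small Python sketch, where each branch of `switch (type_)` becomes a `send` method on its own subclass. `Channel` is a hypothetical stand-in that only records calls. The snake_case method names mirror `nextPutAll` and `doAction` from the C++ example.

```python
from abc import ABC, abstractmethod


class Channel:
    """Hypothetical stand-in for the slide's Channel; records what it receives."""

    def __init__(self):
        self.log = []

    def next_put_all(self, text):
        self.log.append(("text", text))

    def do_action(self, action):
        self.log.append(("action", action))


class Message(ABC):
    def __init__(self, data):
        self.data = data

    @abstractmethod
    def send(self, ch: Channel) -> None:
        """Each subclass replaces one branch of the old switch (type_)."""


class TextMessage(Message):
    def send(self, ch):
        ch.next_put_all(self.data)


class ActionMessage(Message):
    def send(self, ch):
        ch.do_action(self.data)


ch = Channel()
for msg in [TextMessage("hello"), ActionMessage("beep")]:
    msg.send(ch)  # clients no longer know or test the message type
print(ch.log)  # [('text', 'hello'), ('action', 'beep')]
```

Adding a new kind of message now means adding a subclass, not editing every switch statement.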
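Transform Client Type Check
Before: TelephoneBox makeCall() checks which kind of Telephone it holds; the subclasses POTSPhone ... and ISDNPhone ... offer different protocols.
After: Telephone declares makeCall(); POTSPhone makeCall() ... and ISDNPhone makeCall() ... implement it, and TelephoneBox makeCall() just calls it.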
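In the transformed client, the phone-specific call set-up moves into `make_call` on each provider subclass. The client loop then no longer asks for `phoneType()` or casts. This is a minimal Python sketch; the `trace` list is a hypothetical stand-in for the real calls (`tourne`, `call`, `initLine`, `connect`).

```python
from abc import ABC, abstractmethod


class Telephone(ABC):
    def __init__(self):
        self.trace = []  # records the protocol steps performed (illustration only)

    @abstractmethod
    def make_call(self) -> None:
        """Provider-specific call set-up, formerly a case in the client's switch."""


class POTSPhone(Telephone):
    def make_call(self):
        self.trace += ["tourne", "call"]


class ISDNPhone(Telephone):
    def make_call(self):
        self.trace += ["initLine", "connect"]


def make_calls(phones):
    # The client no longer inspects phoneType() or casts to a subclass.
    for phone in phones:
        phone.make_call()


phones = [POTSPhone(), ISDNPhone()]
make_calls(phones)
print([p.trace for p in phones])  # [['tourne', 'call'], ['initLine', 'connect']]
```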
Redistribute Responsibilities
• Data containers ⇒ Move Behaviour Close to Data
• Chains of data containers ⇒ Eliminate Navigation Code
• Monster client of data containers ⇒ Split Up God Class
Move Behavior Close to Data (example 1/2)
Employee: +telephoneNumbers, +name(): String, +address(): String
Payroll: +printEmployeeLabel()
TelephoneGuide: +printEmployeeTelephones()
(Payroll and TelephoneGuide each refer to many (*) Employees.)
Payroll.printEmployeeLabel():
    System.out.println(currentEmployee.name());
    System.out.println(currentEmployee.address());
    for (int i = 0; i < currentEmployee.telephoneNumbers.length; i++) {
        System.out.print(currentEmployee.telephoneNumbers[i]);
        System.out.print(" ");
    }
    System.out.println("");
TelephoneGuide.printEmployeeTelephones():
    … for …
        System.out.print(" -- ");
    …
Move Behavior Close to Data (example 2/2)
Employee: -telephoneNumbers, -name(): String, -address(): String, +printLabel(String)
Payroll: +printEmployeeLabel()
TelephoneGuide: +printEmployeeTelephones()
Employee.printLabel(String):
    public void printLabel(String separator) {
        System.out.println(name());
        System.out.println(address());
        for (int i = 0; i < telephoneNumbers.length; i++) {
            System.out.print(telephoneNumbers[i]);
            System.out.print(separator);
        }
        System.out.println("");
    }
Payroll.printEmployeeLabel():
    … emp.printLabel(" "); …
TelephoneGuide.printEmployeeTelephones():
    … emp.printLabel(" -- "); …
Eliminate Navigation Code
Before: Car (-engine, +increaseSpeed()) ⇒ Engine (+carburetor) ⇒ Carburetor (+fuelValveOpen)
    Car.increaseSpeed(): … engine.carburetor.fuelValveOpen = true
Step 1: Engine (-carburetor, +speedUp()); Carburetor still exposes +fuelValveOpen
    Car.increaseSpeed(): … engine.speedUp()
    Engine.speedUp(): carburetor.fuelValveOpen = true
Step 2: Carburetor (-fuelValveOpen, +openFuelValve())
    Engine.speedUp(): carburetor.openFuelValve()
    Carburetor.openFuelValve(): fuelValveOpen = true
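The end result of both steps can be sketched in Python. In this minimal sketch, attributes prefixed with an underscore stand in for the private (-) members of the diagram. The read-only `fuel_valve_open` property is an addition of this sketch so the state can be inspected.

```python
class Carburetor:
    def __init__(self):
        self._fuel_valve_open = False  # now private to Carburetor

    def open_fuel_valve(self):
        self._fuel_valve_open = True

    @property
    def fuel_valve_open(self):
        # Read-only view added for inspection; not part of the slide's design.
        return self._fuel_valve_open


class Engine:
    def __init__(self):
        self._carburetor = Carburetor()

    def speed_up(self):
        self._carburetor.open_fuel_valve()


class Car:
    def __init__(self):
        self._engine = Engine()

    def increase_speed(self):
        # Before: self.engine.carburetor.fuel_valve_open = True
        self._engine.speed_up()


car = Car()
car.increase_speed()
```

`Car` now depends only on `Engine`'s interface. The carburetor can change without touching `Car`.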
Split Up God Class
Problem: How do you break up a class that monopolizes control?
Solution: Incrementally eliminate navigation code
• Detection:
☞ measuring size
☞ class names containing Manager, System, Root, Controller
☞ the class that all maintainers are avoiding
• How:
☞ move behaviour close to data + eliminate navigation code
☞ remove or deprecate the façade
• However:
☞ If the God Class is stable, then don't split it
⇒ shield client classes from the God Class
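The first two detection heuristics can be turned into a quick filter. This minimal sketch assumes method counts and lines of code are already available per class. The thresholds of 50 methods and 1000 lines are arbitrary assumptions. The third heuristic (the class all maintainers avoid) can only come from talking to people.

```python
import re

# Name fragments listed on the slide; the size thresholds below are assumptions.
SUSPICIOUS_NAME = re.compile(r"(Manager|System|Root|Controller)")


def god_class_candidates(classes, max_methods=50, max_loc=1000):
    """Return classes that are unusually large or carry a 'controlling' name.

    classes: dict of class name -> (number of methods, lines of code).
    """
    candidates = []
    for name, (methods, loc) in classes.items():
        reasons = []
        if methods > max_methods or loc > max_loc:
            reasons.append("size")
        if SUSPICIOUS_NAME.search(name):
            reasons.append("name")
        if reasons:
            candidates.append((name, reasons))
    return candidates


classes = {
    "MailController": (120, 4000),
    "Filter": (8, 150),
    "SessionManager": (12, 300),
}
print(god_class_candidates(classes))
# [('MailController', ['size', 'name']), ('SessionManager', ['name'])]
```

The output is a list of candidates to inspect, not a verdict. `SessionManager` is flagged on its name alone and may well be fine.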
Split Up God Class: 5 variants
Mail client filters incoming mail
A: Controller
B: Controller, Filter1, Filter2 (extract behavioral class)
C: Controller, Filter1, Filter2, MailHeader (extract data class)
D: Controller, Filter1, Filter2, MailHeader, FilterAction (extract behavioral class)
E: Controller, Filter1, Filter2, MailHeader, FilterAction, NameValuePair (extract data class)
Empirical Validation
• Controlled experiment with 63 last-year master-level students (CS and ICT)
Independent variables (Institution, God class decomposition) ⇒ experimental task ⇒ dependent variables (Time, Accuracy)
Interpretation of Results
• “Optimal decomposition” differs with respect to training
☞ Computer science: preference towards C-E
☞ ICT-electronics: preference towards A-C
• Advanced OO training can induce a preference towards particular styles of decomposition
☞ Consistent with [Arisholm et al. 2004]
“Good” design is in the eye of the beholder
5. Code Duplication
a.k.a. Software Cloning, Copy&Paste Programming
• Code Duplication
☞ What is it?
☞ Why is it harmful?
• Detecting Code Duplication
• Approaches
• A Lightweight Approach
• Visualization (dotplots)
• Duploc
• Recent trends
The Reengineering Life-Cycle
Focus: (2) problem detection
Issues: • Scale • Unknown a priori
Code is Copied
Small example from the Mozilla distribution (Milestone 9), extract from /dom/src/base/nsLocation.cpp (lines 432–528):
    NS_IMETHODIMP
    LocationImpl::GetPathname(nsString& aPathname)
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
    #ifdef NECKO
          char* file;
          result = url->GetPath(&file);
    #else
          const char* file;
          result = url->GetFile(&file);
    #endif
          if (result == NS_OK) {
            aPathname.SetString(file);
    #ifdef NECKO
            nsCRT::free(file);
    #endif
          }
          NS_IF_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::SetPathname(const nsString& aPathname)
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          char *buf = aPathname.ToNewCString();
    #ifdef NECKO
          url->SetPath(buf);
    #else
          url->SetFile(buf);
    #endif
          SetURL(url);
          delete[] buf;
          NS_RELEASE(url);
        }
      }

      return result;
    }

    NS_IMETHODIMP
    LocationImpl::GetPort(nsString& aPort)
    {
      nsAutoString href;
      nsIURI *url;
      nsresult result = NS_OK;

      result = GetHref(href);
      if (NS_OK == result) {
    #ifndef NECKO
        result = NS_NewURL(&url, href);
    #else
        result = NS_NewURI(&url, href);
    #endif // NECKO
        if (NS_OK == result) {
          aPort.SetLength(0);
    #ifdef NECKO
          PRInt32 port;
          (void)url->GetPort(&port);
    #else
          PRUint32 port;
          (void)url->GetHostPort(&port);
    #endif
          if (-1 != port) {
            aPort.Append(port, 10);
          }
          NS_RELEASE(url);
        }
      }

      return result;
    }
How Much Code is Duplicated?
Case Study | LOC | Duplication without comments | Duplication with comments
gcc | 460’000 | 8.7% | 5.6%
Database Server | 245’000 | 36.4% | 23.3%
Payroll | 40’000 | 59.3% | 25.4%
Message Board | 6’500 | 29.4% | 17.4%
Usual estimates: 8 to 12% in normal industrial code; 15 to 25% is already a lot!
Copied Code Problems
• General negative effect:
☞ Code bloat
• Negative effects on software maintenance:
☞ Copied defects
☞ Changes take double, triple, quadruple, ... work
☞ Dead code
☞ Add to the cognitive load of future maintainers
• Copying as an additional source of defects:
☞ Errors in the systematic renaming produce unintended aliasing
• Metaphorically speaking:
☞ Software aging, “hardening of the arteries”
☞ “Software entropy” increases: even small design changes become very difficult to effect
Code Duplication Detection
Nontrivial problem:
• No a priori knowledge about which code has been copied
• How to find all clone pairs among all possible pairs of segments?
Kinds of equivalence:
• Lexical equivalence: Type I
• Syntactical equivalence: Type II (& Type III)
• Semantic equivalence: Type IV
General Schema of Detection Process
Source Code ⇒ (Transformation) ⇒ Transformed Code ⇒ (Comparison) ⇒ Duplication Data
Author | Level | Transformed Code | Comparison Technique
[John94a] | Lexical | Substrings | String-Matching
[Duca99a] | Lexical | Normalized Strings | String-Matching
[Bake95a] | Syntactical | Parameterized Strings | String-Matching
[Mayr96a] | Syntactical | Metric Tuples | Discrete comparison
[Kont97a] | Syntactical | Metric Tuples | Euclidean distance
[Baxt98a] | Syntactical | AST | Tree-Matching
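The difference between the lexical and syntactical levels can be illustrated with a rough sketch of parameterized strings. This is not Baker's tool. It assumes a simple regular-expression tokenizer for C-like code and a hypothetical keyword list, and it replaces identifiers and numbers by placeholders. Fragments that differ only in names then compare equal (a Type II clone). Baker's approach also records how parameters map onto each other; this sketch drops that.

```python
import re

# Tokens: identifiers, numbers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Hypothetical keyword list; a real tool would use the language's full list.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "const", "new"}


def parameterize(line: str) -> str:
    """Replace identifiers by 'P' and numbers by 'N'; keep keywords and operators."""
    out = []
    for tok in TOKEN.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("P")
        elif tok.isdigit():
            out.append("N")
        else:
            out.append(tok)
    return " ".join(out)


a = "int l = strlen(fidptr);"
b = "int n = strlen(nameptr);"
print(parameterize(a))                     # int P = P ( P ) ;
print(parameterize(a) == parameterize(b))  # True
```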
Simple Detection Approach (i)
• Assumption:
• code segments are just copied and changed at a few places
• Code transformation step:
• remove white space, comments
• remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)
Before transformation:
    …
    //assign same fastid as container
    fastid = NULL;
    const char* fidptr = get_fastid();
    if(fidptr != NULL)
    {
        int l = strlen(fidptr);
        fastid = new char[ l + 1 ];
After transformation:
    …
    fastid=NULL;
    constchar*fidptr=get_fastid();
    if(fidptr!=NULL)
    intl=strlen(fidptr);
    fastid=newchar[l+1];
Simple Detection Approach (ii)
• Code comparison step
☞ Line-based comparison (assumption: layout did not change during copying)
☞ Compare each line with each other line
☞ Reduce search space by hashing:
1. Preprocessing: compute the hash value for each line
2. Actual comparison: compare all lines in the same hash bucket
• Evaluation of the approach
☞ Advantages: simple, language independent
☞ Disadvantages: difficult interpretation
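The transformation and comparison steps can be sketched in Python. This minimal sketch handles only single-line `//` comments (the Perl script also handles `/* */`). It uses a dict as the hash table: Python hashes each normalized line to pick a bucket, and lines under the same key are exact matches after transformation. `duplicated_lines` and the set of uninteresting lines are illustrative choices.

```python
import re
from collections import defaultdict

UNINTERESTING = {"", "else", "{", "}", ";"}


def normalize(line: str) -> str:
    """Transformation step: drop // comments and all white space."""
    line = re.sub(r"//.*", "", line)
    return re.sub(r"\s+", "", line)


def duplicated_lines(files: dict[str, list[str]]) -> dict[str, list[tuple[str, int]]]:
    """Comparison step: hash each normalized line; lines in one bucket are clones."""
    buckets = defaultdict(list)
    for name, lines in files.items():
        for number, raw in enumerate(lines, start=1):
            code = normalize(raw)
            if code in UNINTERESTING:
                continue
            buckets[code].append((name, number))  # the dict key acts as the hash bucket
    return {code: locs for code, locs in buckets.items() if len(locs) > 1}


files = {
    "a.cpp": ["fastid = NULL;", "if (x) {", "  count++;  // bump", "}"],
    "b.cpp": ["fastid=NULL;", "count++;"],
}
for code, locs in duplicated_lines(files).items():
    print(code, locs)
# fastid=NULL; [('a.cpp', 1), ('b.cpp', 1)]
# count++; [('a.cpp', 3), ('b.cpp', 2)]
```

Single matching lines like these are mostly noise. Comparing windows of several consecutive lines, as the Perl script does, gives more meaningful matches.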
A Perl script for C++
    #!/usr/bin/perl
    # Usage: perl duplicates.pl file1.cpp file2.cpp ...
    use strict;
    use warnings;

    my $equivalenceClassMinimalSize = 1;  # report windows found at more than this many places
    my $slidingWindowSize = 5;            # granularity: number of code lines per window
    my $removeKeywords = 0;               # 1 = strip keywords before comparing
    my @keywords = qw(if then else);
    my $keywordsRegExp = join '|', @keywords;
    my @unwantedLines = qw( else return return; { } ; );
    push @unwantedLines, @keywords;
    my %unwanted = map { $_ => 1 } @unwantedLines;

    my ($totalLines, $codeLines, $inComment) = (0, 0, 0);
    my (@currentLines, @currentLineNos, %eqLines);

    while (<>) {
        chomp;
        $totalLines++;

        # remove comments of type /* */, which may span several lines
        my $codeOnly = '';
        while (($inComment && m|\*/|) || (!$inComment && m|/\*|)) {
            $codeOnly .= $` unless $inComment;
            $inComment = !$inComment;
            $_ = $';
        }
        $codeOnly .= $_ unless $inComment;
        $_ = $codeOnly;

        s|//.*$||;                                  # remove comments of type //
        s/\s+//g;                                   # remove white space
        s/$keywordsRegExp//og if $removeKeywords;   # remove keywords

        # noise control: skip empty and uninteresting lines
        unless ($_ eq '' || $unwanted{$_}) {
            $codeLines++;
            push @currentLines, $_;
            push @currentLineNos, $.;
            if ($slidingWindowSize < @currentLines) {
                shift @currentLines;
                shift @currentLineNos;
            }
            if (@currentLines == $slidingWindowSize) {   # only compare full windows
                my $lineToBeCompared = join '', @currentLines;
                my $lineNumbersCompared = "<$ARGV>" . join '/', @currentLineNos;
                push @{ $eqLines{$lineToBeCompared} }, $lineNumbersCompared;
            }
        }

        if (eof) {                # end of the current file
            close ARGV;           # resets the line counter $. for the next file
            @currentLines = ();   # windows must not span two files
            @currentLineNos = ();
        }
    }

    # reporting step (a minimal version; not shown on the original slides)
    for my $code (sort keys %eqLines) {
        my @locations = @{ $eqLines{$code} };
        next unless @locations > $equivalenceClassMinimalSize;
        print "Lines: $code\nLocations: @locations\n\n";
    }
• Handles multiple files
• Removes comments and white spaces
• Controls noise (if, {, )
• Granularity (number of lines)
• Possible to remove keywords
Output Sample
    Lines:
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pnMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    create_property(pd,pnOwnership,stBool,true,*iOwnership);
    Locations: </face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
    </face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202
    Lines:
    create_property(pd,pnSupertype,stReference,true,*iSupertype);
    create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
    create_property(pd,pnElttype,stReference,true,*iEltType);
    create_property(pd,pMinelt,stInteger,true,*iMinelt);
    create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
    Locations: </face/typesystem/SCTypesystem.C>6177/6178
    </face/typesystem/SCTypesystem.C>6229/6230
Lines = duplicated lines; Locations = file names and line numbers
Visualization of Duplicated Code
• Visualization provides insights into the duplication situation
• A simple version can be implemented in three days
• Scalability issue
• Dotplots: technique from DNA analysis
• Code is put on the vertical as well as the horizontal axis
• A match between two elements is a dot in the matrix
(Figure: example dotplots over code elements for exact copies, copies with inserts/deletes, repetitive code, and variations)
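A dotplot is just a boolean matrix, which makes the "simple version" easy to sketch. This minimal Python sketch assumes each element is already a normalized line. Letters stand in for lines, as on the slide, and the matrix is rendered as text instead of pixels.

```python
def dotplot(xs, ys):
    """Boolean matrix: cell (i, j) is True when xs[i] equals ys[j]."""
    return [[x == y for y in ys] for x in xs]


def render(matrix):
    """Draw '*' for every match and '.' elsewhere, one row per line."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in matrix)


original = list("abcdef")  # each letter stands for one normalized line
edited = list("abxyef")    # a copy whose middle part was changed
print(render(dotplot(original, edited)))
# *.....
# .*....
# ......
# ......
# ....*.
# .....*
```

An exact copy shows up as an unbroken diagonal, and a modified copy as a diagonal with a gap. Repetitive code produces several parallel diagonals when a sequence is plotted against itself.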
Visualization of Copied Code Sequences
All examples are made using Duploc on an industrial case study (a C++ system of 1 million LOC).
(Figure: dotplot of File A and File B against each other)
Detected problem: File A contains two copies of a piece of code; File B contains another copy of this code.
Possible solution: Extract Method
Visualization of Repetitive Structures
Detected problem: 4 object factory clones; a switch statement over a type variable is used to call individual construction code.
Possible solution: Strategy Method
Visualization of Cloned Classes
(Figure: dotplot of Class A against Class B)
Detected problem: Class A is an edited copy of class B (editing & insertion).
Possible solution: Subclassing …
Visualization of Clone Families
20 classes implementing lists for different data types
(Figure panels: Overview, Detail)
Recent Trends
Clone Detection Inside
6. Software Evolution
• Exploiting the Version Control System
☞ Visualizing CVS changes
• The Evolution Matrix
• Yesterday's Weather
“It is not age that turns a piece of software into a legacy system, but the rate at which it has been developed and adapted without being reengineered.”
[Demeyer, Ducasse and Nierstrasz: Object-Oriented Reengineering Patterns]
The Reengineering Life-Cycle
Focus: (2) problem detection
Issues: • scale
Analyse CVS changes
1) Vertical lines = Frequent Changers
2) Horizontal line = Shotgun Surgery
3) Triangle = Core Reduces
4) Block Shift = Design Change
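The first two patterns can be detected directly from the version control log. This minimal Python sketch assumes the log has already been reduced to `(version, file)` pairs. Both thresholds (changed in at least 75% of versions; three or more files per version) are arbitrary assumptions. Triangles and block shifts need the full visualization.

```python
from collections import Counter


def frequent_changers(changes, min_fraction=0.75):
    """Vertical lines: files changed in at least min_fraction of all versions.

    changes: list of (version, file) pairs taken from the version control log.
    """
    versions = {v for v, _ in changes}
    per_file = Counter(f for _, f in set(changes))
    return sorted(f for f, n in per_file.items() if n / len(versions) >= min_fraction)


def shotgun_surgery(changes, min_files=3):
    """Horizontal lines: versions in which many files were changed together."""
    per_version = Counter(v for v, _ in set(changes))
    return sorted(v for v, n in per_version.items() if n >= min_files)


log = [
    (1, "Parser.java"), (2, "Parser.java"), (3, "Parser.java"),
    (2, "A.java"), (2, "B.java"), (2, "C.java"),
    (4, "Parser.java"), (4, "A.java"),
]
print(frequent_changers(log))  # ['Parser.java']
print(shotgun_surgery(log))    # [2]
```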
Ownership Map: Developer Activity
Patterns: Monologue, Dialogue, Takeover, Edit, Familiarization
⇒ What to (re)test?
Data from Windows Vista and Windows 7
Software components with a high level of ownership will have fewer failures than components with lower top ownership levels.
Software components with many minor contributors will have more failures than software components that have fewer.
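Both claims rest on two per-component numbers: the share of work done by the top contributor, and the count of minor contributors. The sketch below is a minimal illustration under stated assumptions. It measures work by counting commits per author, and it calls an author "minor" below a 5% share of the component's commits. The study behind the slide may define both measures differently.

```python
from __future__ import annotations

from collections import Counter


def ownership_metrics(commit_authors: list[str],
                      minor_threshold: float = 0.05) -> tuple[float, int]:
    """Return (top ownership, number of minor contributors) for one component.

    commit_authors: the author of each commit to the component.
    top ownership: share of commits made by the most active author.
    minor contributors: authors whose share is below `minor_threshold`
    (the 5% default is an assumption of this sketch)."""
    if not commit_authors:
        raise ValueError("component has no commits")
    counts = Counter(commit_authors)
    total = len(commit_authors)
    top = max(counts.values()) / total
    minor = sum(1 for n in counts.values() if n / total < minor_threshold)
    return top, minor


top, minor = ownership_metrics(["ann"] * 38 + ["bob", "eve"])
print(top, minor)  # ann owns 95% of the commits; bob and eve are minor
```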
The Evolution Matrix
[Figure: classes plotted against TIME (versions) from First Version to Last Version; annotations: Added Classes, Removed Classes, Major Leap, Growth, Stabilisation]
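The matrix itself is simple to build. In this minimal sketch, each version is assumed to be a dictionary from class name to one size metric. The number of methods is a plausible example, but that choice is an assumption. Each class becomes a row and each version a column, with `None` where the class does not exist. Comparing neighbouring columns shows which classes were added or removed.

```python
from __future__ import annotations


def evolution_matrix(versions: list[dict[str, int]]) -> dict[str, list[int | None]]:
    """Rows are classes (sorted by name), columns are versions; a cell holds
    the class's size metric, or None when the class is absent."""
    classes = sorted({name for version in versions for name in version})
    return {name: [version.get(name) for version in versions] for name in classes}


def added_and_removed(versions: list[dict[str, int]], i: int) -> tuple[set[str], set[str]]:
    """Classes added in, and removed by, version i compared with version i - 1."""
    before, after = set(versions[i - 1]), set(versions[i])
    return after - before, before - after


history = [
    {"A": 3, "B": 5},
    {"A": 4, "B": 5, "C": 1},
    {"A": 4, "C": 2},
]
for name, cells in evolution_matrix(history).items():
    print(name, ["." if c is None else c for c in cells])
```

A column with many new rows corresponds to a major leap; a run of columns with unchanged cells suggests stabilisation.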
Example: MooseFinder (38 Versions)
Test history
[Figure annotations: single test, unit tests, integration tests, phased testing; "… affect unit tests"]
System under study = checkstyle
Selenium Tests
GitHub Repository | Project | Description | # Commit | # Sel. Commit | Java LoC | Sel. LoC
gxa/atlas | Gene Expression Atlas | Portal for sharing gene expression data | 2118 | 358 | 32375 | 5374
INCF/eeg-database | EEG/ERP portal | Portal for sharing EEG/ERP portal clinical data | 854 | 17 | 68262 | 7158
mifos/head | MiFos | Portfolio management for microfinance institutions | 7977 | 505 | 338705 | 18735
motech/TAMA-Web | Tama | Front office application for clinics | 2358 | 239 | 62034 | 2815
OpenLMIS/open-lmis | OpenLMIS | Logistics management information system | 4714 | 1153 | 72275 | 19195
xwiki/xwiki-enterprise | XWiki Enterprise | Enterprise-level wiki | 688 | 164 | 28405 | 13506
zanata/zanata-server | Zanata | Software for translators | 3430 | 81 | 111698 | 3509
Zimbra-Community/zimbra-sources | Zimbra | Enterprise collaboration software | 377 | 243 | 1025410 | 189413
TABLE IV: The 8 repositories in the high-quality corpus.
[Figure 3 plots: FileId (y-axis) against CommitId (x-axis), one point per file change; change types: added-regular, added-selenium, delete-regular, delete-selenium, edit-regular, edit-selenium]
Fig. 3: Change histories of the XWIKI-ENTERPRISE (left), OPEN-LMIS (middle), and ATLAS projects (right).
B. Project-specific Change Histories
Before answering RQ2 quantitatively with two new corpus-wide metrics, we provide some visual insights into the commit histories of individual projects from our high-quality corpus. Figure 3 depicts a variant of the Change History Views introduced by Zaidman et al. [23] for three randomly selected projects. For each commit in a project's history, our variant visualizes whether a SELENIUM file (rather than a unit test file) or an application file was added, removed or edited. The X-axis depicts the different commits ordered by their timestamp. The Y-axis depicts the files of the project. To this end, we assign a unique identifier to each file. We ensure that SELENIUM files get the lowest identifiers by processing them first. As a result, they are located at the bottom of the graph.
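The data behind such a view can be sketched in a few lines. This is a minimal Python sketch, not the authors' tooling. It assumes each commit is given as a list of `(path, change)` pairs in timestamp order, with `change` being `added`, `delete` or `edit` as in the figure legend. It also assumes a caller-supplied predicate (hypothetical `is_selenium`) decides which files are SELENIUM tests. Sorting on `not is_selenium(p)` processes SELENIUM files first, so they receive the lowest identifiers.

```python
from __future__ import annotations

from collections.abc import Callable


def assign_file_ids(paths: list[str], is_selenium: Callable[[str], bool]) -> dict[str, int]:
    """Unique id per file; SELENIUM files come first and get the lowest ids."""
    ordered = sorted(set(paths), key=lambda p: (not is_selenium(p), p))
    return {path: i for i, path in enumerate(ordered)}


def change_points(commits: list[list[tuple[str, str]]],
                  is_selenium: Callable[[str], bool]) -> list[tuple[int, int, str]]:
    """One (commit index, file id, change type) point per file change."""
    ids = assign_file_ids([path for commit in commits for path, _ in commit], is_selenium)
    points = []
    for x, changes in enumerate(commits):
        for path, change in changes:
            kind = "selenium" if is_selenium(path) else "regular"
            points.append((x, ids[path], f"{change}-{kind}"))
    return points


history = [
    [("src/App.java", "added"), ("test/selenium/LoginIT.java", "added")],
    [("src/App.java", "edit")],
]
points = change_points(history, lambda p: "selenium" in p)  # hypothetical predicate
print(points)
```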
Figure 3 clearly demonstrates that SELENIUM tests are modified as the projects evolve. However, the modifications do not appear to happen in a coupled manner. This is to be expected as SELENIUM tests concern the user interface of an application, while application commits also affect the application logic underlying the user interface. Any evolutionary coupling will therefore be less pronounced than, for instance, between a unit test and the application logic under test.
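One common way to quantify evolutionary coupling is to count co-changes and compute an association-rule style confidence. The confidence is the fraction of commits touching file `a` that also touch file `b`. This sketch is a swapped-in illustration of that general idea, not the method used in this paper. Each commit is assumed to be a set of file names, and the file names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def co_change(commits: list[set[str]]) -> tuple[Counter, Counter]:
    """Count how often each file, and each unordered pair of files, changes."""
    single: Counter = Counter()
    pairs: Counter = Counter()
    for files in commits:
        single.update(files)
        pairs.update(frozenset(p) for p in combinations(sorted(files), 2))
    return single, pairs


def confidence(single: Counter, pairs: Counter, a: str, b: str) -> float:
    """Fraction of the commits touching `a` that also touch `b`."""
    if single[a] == 0:
        return 0.0
    return pairs[frozenset((a, b))] / single[a]


history = [
    {"LoginIT.java", "Login.java"},
    {"Login.java"},
    {"Login.java", "Session.java"},
    {"LoginIT.java"},
]
single, pairs = co_change(history)
print(confidence(single, pairs, "LoginIT.java", "Login.java"))  # 1 of 2 commits
```

A low confidence between a functional test and the application files it exercises would match the weak coupling visible in Figure 3.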
Note that a large number of files are removed and added around revision 1000 of the xwiki-enterprise project. The corresponding commit message⁶ reveals that this is due to a change in the project's directory structure. We see this occur in several other projects. In providing a more quantitative answer to RQ2, the remainder of this section will therefore take care to track files based on their content rather than their path alone.
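The paper does not say how files are matched by content. A minimal sketch of one option appears below: pair each deleted path with the most similar added path in the same commit, using `difflib.SequenceMatcher.ratio()` from the Python standard library. Git's own rename detection is also similarity-based, but this code does not reproduce it. The 0.8 threshold and the greedy matching are assumptions.

```python
from __future__ import annotations

import difflib


def match_moves(deleted: dict[str, str], added: dict[str, str],
                threshold: float = 0.8) -> dict[str, str]:
    """Map each deleted path to the added path with the most similar content,
    if their similarity ratio reaches `threshold`.

    deleted/added map file paths to file contents within one commit. Matching
    is greedy: an added file is used for at most one deleted file."""
    moves = {}
    remaining = dict(added)
    for old_path, old_text in deleted.items():
        best, best_ratio = None, threshold
        for new_path, new_text in remaining.items():
            ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if ratio >= best_ratio:
                best, best_ratio = new_path, ratio
        if best is not None:
            moves[old_path] = best
            del remaining[best]
    return moves


deleted = {"old/LoginIT.java": "class LoginIT { void test() {} }"}
added = {
    "new/web/LoginIT.java": "class LoginIT { void test() {} }",
    "new/Other.java": "class Other {}",
}
print(match_moves(deleted, added))  # the test file moved to new/web/
```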
C. Corpus-Wide Commit Activity Metrics
We first aim to provide quantitative insights into the pace at which SELENIUM-based functional tests are changed as the web application under test evolves.
1) Research method: To this end, we evaluate the high-quality corpus against several metrics that are based on the following categorization of commit activities:
⁶ Commit 74feec18b81dec12d9d9359f8fc793587b4ed329
Repository | #S | AS_C | SS_C | AS_D | SS_D
gxa/atlas | 258 | 6.67 | 1.39 | 1.5 | 0.33
INCF/eeg-database | 11 | 75.36 | 1.55 | 98.59 | 2.8
mifos/head | 381 | 19.58 | 1.33 | 6.7 | 0.4
motech/TAMA-Web | 170 | 11.52 | 1.41 | 3.33 | 0.44
OpenLMIS/open-lmis | 704 | 5.05 | 1.64 | 0.45 | 0.16
xwiki/xwiki-enterprise | 96 | 5.33 | 1.71 | 6.94 | 1.95
zanata/zanata-server | 51 | 65.65 | 1.59 | 35.82 | 0.72
. . . /zimbra-sources | 66 | 1.74 | 3.73 | 1.81 | 7.09
TABLE V: Averaged commit activity metrics for the high-quality corpus. The first column (#S) denotes either #SS or #AS, which cannot differ by more than 1.
SC  SELENIUM commit: commit that adds, modifies or deletes at least one SELENIUM file.
AC  Application commit: commit that does not add, modify or delete any SELENIUM file.
The same categorization transposes to commit sequences:
SS  SELENIUM span: maximal sequence of successive SCs.
AS  Application span: maximal sequence of successive ACs.
Finally, this categorization enables computing the following metrics about each kind of span:
AS_D, AS_C  Length of an AS in days and in commits.
SS_D, SS_C  Length of an SS in days and in commits.
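Splitting a history into spans is a run-length grouping over the SC/AC flag. The minimal sketch below returns each span's kind, its length in commits and its length in days. It assumes a span's duration runs from its first to its last commit, which this excerpt does not define. Under that assumption a single-commit span lasts 0 days; a convention that measures up to the next span's first commit would give nonzero values. The `Commit` record is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Commit:
    timestamp: datetime
    touches_selenium: bool  # True for an SC, False for an AC


def spans(commits: list[Commit]) -> list[tuple[str, int, float]]:
    """Split a timestamp-ordered history into maximal runs of SCs ('SS') and
    ACs ('AS'); return (kind, length in commits, length in days)."""
    result = []
    start = 0
    for i in range(1, len(commits) + 1):
        # A run ends at the end of the history or when the flag flips.
        if i == len(commits) or commits[i].touches_selenium != commits[start].touches_selenium:
            run = commits[start:i]
            kind = "SS" if run[0].touches_selenium else "AS"
            days = (run[-1].timestamp - run[0].timestamp).total_seconds() / 86400
            result.append((kind, len(run), days))
            start = i
    return result


history = [
    Commit(datetime(2015, 1, 1), False),
    Commit(datetime(2015, 1, 2), False),
    Commit(datetime(2015, 1, 2, 12), True),
    Commit(datetime(2015, 1, 5), False),
]
print(spans(history))  # an AS of 2 commits, an SS of 1, an AS of 1
```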
2) Results: The next table depicts the results of the commit activity metrics for the repositories in the high-quality corpus. The most revealing entries relate to the average length of the non-SELENIUM spans measured in commits (AS_C). It takes on average about 11.23 non-SELENIUM commits (or 4.33 days) before a commit affects a SELENIUM file. However, this mean is strongly influenced by outliers: the third quartile is only 9 non-SELENIUM commits (2.05 days). These results suggest that SELENIUM-based functional tests do co-evolve with the web application under test.
Statistic | AS_C | SS_C | AS_D | SS_D
Mean | 11.23 | 1.59 | 4.33 | 0.66
Std Deviation | 73.07 | 1.36 | 33.66 | 4.9
1st Quartile | 2 | 1 | 0.05 | 0.02
Median | 4 | 1 | 0.54 | 0.06
3rd Quartile | 9 | 2 | 2.05 | 0.36
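The effect described in the results, a mean far above the third quartile, is easy to reproduce with the Python standard library. The span lengths below are made-up illustration values, not data from the corpus. The quartile method the authors used is not stated, so `method="inclusive"` is an assumption.

```python
from __future__ import annotations

import statistics


def summarize(lengths: list[float]) -> dict[str, float]:
    """Mean, sample standard deviation and quartiles of span lengths
    (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    return {
        "mean": statistics.mean(lengths),
        "std": statistics.stdev(lengths),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


# Hypothetical AS_C values: five short spans and one long outlier.
summary = summarize([1, 2, 2, 3, 4, 100])
print(summary)  # the mean (about 18.7) lies far above q3 (3.75)
```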
Git repositories of XWiki, OpenLMIS and Atlas. © Laurent Christophe (Vrije Universiteit Brussel)
[Figure 6 plot: change hit ratio (y-axis, 0.0 to 0.4) per change classification (x-axis: assertion, command, constant, demarcator, location)]
Fig. 6: Summary of the corpus-wide change classification.
Project | Total | Locator | Command | Demarcator | Asserts | Constants
Atlas | 8068 | 90 | 3 | 104 | 3282 | 2586
XWiki | 68665 | 115 | 154 | 24 | 1490 | 3114
Tama | 31821 | 95 | 89 | 43 | 36 | 571
Zanata | 12959 | 497 | 119 | 0 | 1 | 906
EEG/ERP | 248 | 3 | 0 | 0 | 6 | 24
OpenLMIS | 69792 | 2550 | 401 | 8 | 3454 | 8972
TABLE VII: Project-scoped change classification.
changes. Table VII lists project-scoped results. Our tool timed out on two projects of the corpus that have an extensive history.
The most frequent changes are those to constants and asserts; these are the two test components most prone to change. Constants occur frequently in locator expressions that retrieve DOM elements from a web page, and in assert statements as the values compared against.¹⁰
Focusing future tool support for test maintenance on these areas might therefore benefit test engineers most. Existing work on repairing assertions in unit tests [4], and on repairing references to GUI widgets in functional tests for desktop applications [12], suggests that this is feasible. Note that existing work also targets repairing changes in test command sequences [15], but such changes seem rare in practice.
Both outliers in our results stem from the ATLAS project. Its test scripts contain hardcoded genome strings inside assert statements that are frequently updated.
D. Threats to Validity
The edit script generated by CHANGENODES is not always minimal. It may incorrectly output a sequence of redundant operations for nodes that are not modified. This is due to some of the heuristics used by the differencing algorithm. These unneeded operations form only a small part of the total operations. We have performed random validations of distilled changes to ensure the correctness of our results.
Several changes are classified by looking at names of methods, without using type information. Computing this information would be too expensive for experiments on multiple large-scale projects. As a result, some changes may be incorrectly classified.
We have been unable to find examples of some of the change categories from Section II. This is due either to our change query being too strict, to the patterns not being present in the examined projects, or to incorrectly distilling the changes made to the SELENIUM scripts.
¹⁰ Our tool also classifies such changes in the locator or assertion category.
VI. RELATED WORK
Little is known about how automated functional tests are used in practice. A notable exception is Berner et al.'s [1] account of their experiences with automated testing in industrial applications. Their observation “The Capability To Run Automated Tests Diminishes If Not Used” underlines the importance of test maintenance. In interviews with experts from different organizations, an industrial survey [16] found that the main obstacles to adopting automated tests are development expenses and upkeep costs. Finally, Leotta et al. [17] recently reported on an experiment in which the authors manually created two distinct sets of SELENIUM-based automated functional tests for 6 randomly selected open-source PHP projects. While the first set of tests corresponds to the test scripts studied in this paper, the second set was created using the capture-and-replay functionality of the SELENIUM IDE. The authors find that the latter are less expensive to develop in terms of time required, but much more expensive to maintain. To the best of our knowledge, ours is the first large-scale study on the prevalence and maintenance of SELENIUM-based functional tests, albeit on open-source software.
Unit tests have received more attention in the literature. The work of Zaidman et al. [23] on the co-evolution of unit tests with the application under test is seminal. Apart from “Change History Views” (cf. Section IV-B), they also proposed to plot the relative growth of test and production code over time in “Growth History Views”. This enables observing whether their development proceeds synchronously or in separate phases. Fluri et al. follow a similar method in their study on the co-evolution of code and comments [8], as do Goeminne et al. in a study on the co-evolution of database schema and application code [11]. Our metrics from Section IV aim to provide quantitative rather than visual insights into this matter, based on commit activities rather than code growth.
Method-wise, there are several works tangential to ours in mining software repositories. German and Hindle [18], for instance, classify metrics for change along three dimensions: whether the metric is aware of changes between two distinct program versions, whether the metric is scoped to entities or commits, and whether the metric is applied at specific events or at evenly spaced intervals. The commit activity and maintenance metrics from Section IV are scoped to commits and SELENIUM files, unaware and aware of program changes, and applied at every commit and at specific types of commits, respectively. Several techniques and metrics have been proposed for detecting co-changing files in a software repository (e.g., [13], [24]). We expect such fine-grained evolutionary couplings to be less pronounced in our setting because test scripts exercise an implemented user interface along extensive scenarios, rather than the implementation itself. More research is needed.
Section V distilled and subsequently categorized changes within each commit to a SELENIUM file. Similar analyses have been performed using the CHANGEDISTILLER [9] tool. The aforementioned study by Fluri et al. [8], for instance, includes a distribution of the types of source code changes (e.g., return type change) that induce a comment change. Another fine-grained classification of changes to Java code has also been used to better quantify evolutionary couplings of files [7]. In contrast to these general-purpose change classifications, ours is specific to automated functional tests. More coarse-grained
Avoid Magic Constants !!
Recommender Systems
Stack Trace ⇒ link to source code
Description ⇒ text mining
Who should fix it? How long will the fix take?
Misclassified bug reports?
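Linking a stack trace to source code starts with parsing its frames. The minimal sketch below handles Java-style frames of the form `at package.Class.method(File.java:42)` and skips frames without a line number, such as `(Native Method)`. Frames carrying a module prefix, such as `java.base/`, are not handled. `source_path` is a hypothetical helper that assumes one top-level class per file in a directory tree mirroring the packages.

```python
from __future__ import annotations

import re

# One Java stack frame, e.g. "\tat com.acme.Cart.add(Cart.java:42)".
FRAME = re.compile(r"^\s*at\s+([\w$.]+)\.([\w$<>]+)\(([\w$]+\.java):(\d+)\)")


def frames(stack_trace: str) -> list[tuple[str, str, str, int]]:
    """Extract (class, method, file, line) for every frame with a line number."""
    result = []
    for line in stack_trace.splitlines():
        m = FRAME.match(line)
        if m:
            cls, method, file, lineno = m.groups()
            result.append((cls, method, file, int(lineno)))
    return result


def source_path(cls: str) -> str:
    # Drop any inner-class suffix ("$Item") and map packages to directories.
    return cls.split("$")[0].replace(".", "/") + ".java"


trace = (
    "java.lang.NullPointerException\n"
    "\tat com.acme.Cart.add(Cart.java:42)\n"
    "\tat com.acme.Shop.main(Shop.java:7)\n"
    "\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)"
)
for cls, method, file, line in frames(trace):
    print(source_path(cls), line, method)
```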
7. Conclusion
1. Introduction
There are OO legacy systems too!
2. Reverse Engineering: How to understand your code
3. Visualization: Scalable approach
4. Restructuring: How to refactor your code
5. Code Duplication: The most typical problems
6. Software Evolution: Learn from the past
7. Conclusion: Did we convince you?
Goals
We will try to convince you:
• Yes, Virginia, there are object-oriented legacy systems too!
☞ … actually, that's a sign of health
• Reverse engineering and reengineering are essential activities in the lifecycle of any successful software system. (And especially OO ones!)
☞ … consequently, do not consider it second-class work
• There is a large set of lightweight tools and techniques to help you with reengineering.
☞ … check our book, but remember the list is growing
• Despite these tools and techniques, people must do the job, and they represent the most valuable resource.
☞ … pick them carefully and reward them properly
⇒ Did we convince you?