
Page 1: 3/24/10 Laboratory of Software Analysis

Laboratory of Software Analysis
Lesson 1 (16/04/10)

Filippo Ricca
Unità CINI at DISI (Laboratorio Iniziativa Software FINMECCANICA/ELSAG spa - CINI)
Genova, Italy
[email protected]

Mariano Ceccato, Alessandro Marchetto
ITC-Irst
Trento, Italy
[email protected], [email protected]

Page 2: 3/24/10 Laboratory of Software Analysis

Overview

• Objectives
• Course dependences
• Content / Course material / tools used
• Exam (discussion)
• Legacy systems
• Reverse engineering, re-structuring, re-engineering
• Program transformations (TXL)
• Past projects
• This year: “three small projects”

Page 3: 3/24/10 Laboratory of Software Analysis

Objectives, dependences, material, exam.

Page 4: 3/24/10 Laboratory of Software Analysis

Objectives

This course has two objectives:

• Providing the practical skills involved in software analysis and testing. Some techniques/approaches described during the theoretical lessons of the basic course (Software Analysis and Testing) will be applied to real software systems to be re-engineered and tested.
• Introducing “Empirical studies in Software engineering”.

Page 5: 3/24/10 Laboratory of Software Analysis

Dependences

Prerequisite courses: Programming I and II, Software Engineering, Software Analysis and Testing.

It is important to know (a little):
- OO programming, in particular Java (basic level).
- UML (class diagram, …).
- Web technologies: HTML, JSP, … (just a little).
- Theoretical aspects of testing.
- …

Not mandatory, but …

Page 6: 3/24/10 Laboratory of Software Analysis

Content

• Code analysis and transformations
  - Theoretical aspects (already seen in Software Analysis and Testing).
  - The TXL programming language.
  - Practice: application of some techniques to software systems.
• Software testing
  - Theoretical aspects (already seen in Software Analysis and Testing).
  - Acceptance testing, GUI testing, Test-first, “Design for testing”, …
  - Tools: FIT, FITNESSE, JUnit, ABBOT, Robot …
• Empirical studies in Software engineering
  - Theoretical aspects (what is an ES? How to design/conduct an ES?)
  - Analysis and interpretation (how to draw conclusions)
  - Execution of two empirical studies.

Page 7: 3/24/10 Laboratory of Software Analysis

Material / Tools

• Slides
• Papers
• Manuals of tools

Languages and Tools:
• TXL: code analysis and transformations
• Graphviz: Graph Visualization Software
• VisualUML: UML modeling tool (diagrams recovery)
• JUnit, Fit, Fitnesse, Abbot, Robot: Testing tools
• …

http://sra.itc.it/people/ceccato/courses/lsa/

Page 8: 3/24/10 Laboratory of Software Analysis

Examination

• During the course we will work on many small projects.
• The examination will consist of a discussion.
• Admission to the examination requires (at least) the production of some documents that we will see during the year.

Examples of small projects:
• Recovering the architecture (class diagram) of a system.
• Maintenance intervention / re-implementation of a system
• Porting a C program to Java
• Testing
• Empirical study: C++ vs. Java
• …

Page 9: 3/24/10 Laboratory of Software Analysis

A Little Terminology …

Page 10: 3/24/10 Laboratory of Software Analysis

Legacy systems

Characteristics:
• They were implemented years ago (≅ 1970)
• Their technology became obsolete (obsolete languages, language styles, hardware, …)
• They have been maintained for a long time (≅ 30 years)
• Their structure has deteriorated and does not facilitate understanding
• Their documentation (if it exists) became obsolete
• Original authors are not available
• They contain business rules not recorded elsewhere
• They cannot be easily replaced (important!)
• They represent a large investment
• …

The slide groups these characteristics into “negative aspects” and “positive aspects”.

Each maintenance intervention is extremely difficult!

Page 11: 3/24/10 Laboratory of Software Analysis

Legacy dilemma

What should we do with legacy code?

1. Throw it away and build the new system from scratch.
2. Try to understand the legacy code and reconstitute it in a new form. The first step is “reverse engineering”.

Page 12: 3/24/10 Laboratory of Software Analysis

Reverse Engineering

Reverse engineering is the process of taking something (a device, an electrical component, a car, a piece of software, …) apart and analyzing in detail how it works, usually with the intention of constructing a new device or program that does the same thing.

Reverse engineering is often used by the military in order to copy other nations’ technology.

Page 13: 3/24/10 Laboratory of Software Analysis

Military Reverse Engineered projects

Examples of military reverse-engineered projects include:

• The Soviet Union reverse-engineered the Tu-4 “Bull” bomber from the United States’ Boeing B-29.
• The Soviet personal computer Agat was reverse-engineered from the Apple II.
• North Korea reverse-engineered Russian Scud-B missiles to make its own Scud Mod A.

[Image: “Boeing B-29”]

Page 14: 3/24/10 Laboratory of Software Analysis

(Software) Reverse engineering

Reverse engineering is a process that helps in understanding a software system. It is a process of examination, of extracting information, not a process of change or replication.

Software ----------> “Abstract representation”

Page 15: 3/24/10 Laboratory of Software Analysis

Forward and Reverse Engineering

Forward engineering is the traditional process of moving from high-level abstractions to the physical implementation of a system.

Forward: Requirements ----> Design ----> Implementation

Reverse engineering is “the inverse” of forward engineering.

Reverse: Requirements <---- Design <---- Implementation

“Abstract Code Representation” <---- Code

Page 16: 3/24/10 Laboratory of Software Analysis

Reverse Engineering Tools

• Pretty printers and code viewers
• Diagram generators (software views: flowcharts, data flow diagrams, call graph diagrams, …)
• Embedded comments extractors (e.g. Javadoc)
• Software metrics tools (LOCs, methods/functions, cohesion, coupling, …)
• Design recovery tools (e.g. Rational Rose, Omondo, VisualUML: UML diagram extractors)
• Others …

Page 17: 3/24/10 Laboratory of Software Analysis

Restructuring

Restructuring is the transformation from one representation to another at the same relative abstraction level, while preserving the system’s external behavior (functionality and semantics).

Examples:

• Code level:
  - from an unstructured (“spaghetti”) form to a structured (“goto-less”) form
  - conversion of a set of “if-statements” into a “case structure”.
• Design level: to improve or change data structures (arrays to lists, file system to DBMS, …) or to improve algorithms (for example: time complexity).
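The code-level example (a set of if-statements turned into a case structure) might look like the following minimal sketch. The class and method names and the day-numbering convention are made up for illustration; the point is only that both versions return the same result for every input, which is what “preserving external behavior” requires.

```java
// Restructuring sketch: the same behavior written as a set of if-statements
// and as a switch ("case structure"). All names here are hypothetical.
class RestructuringSketch {

    // Before: several if-statements testing the same variable.
    static String dayTypeIf(int day) {
        if (day == 6) {
            return "weekend";
        }
        if (day == 7) {
            return "weekend";
        }
        if (day >= 1 && day <= 5) {
            return "workday";
        }
        return "invalid";
    }

    // After: an equivalent case structure.
    static String dayTypeSwitch(int day) {
        switch (day) {
            case 1: case 2: case 3: case 4: case 5:
                return "workday";
            case 6: case 7:
                return "weekend";
            default:
                return "invalid";
        }
    }

    public static void main(String[] args) {
        // External behavior must be preserved: both versions agree on every input tried.
        for (int day = -1; day <= 9; day++) {
            if (!dayTypeIf(day).equals(dayTypeSwitch(day))) {
                throw new AssertionError("behavior changed for day " + day);
            }
        }
        System.out.println("if-statements and switch agree");
    }
}
```

Comparing the two versions on a range of inputs, as `main` does, is only a sanity check and not a proof of equivalence.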

Page 18: 3/24/10 Laboratory of Software Analysis

Re-engineering

Re-engineering is the examination (reverse engineering) of a system to reconstitute it (forward engineering) in a new form.

The re-engineering process takes many forms, depending on its objectives. Sample objectives are:
• code migration/porting (e.g. C to C++)
• reengineering code for reuse
• reengineering code for security
• …

This process may include modifications to satisfy new requirements not met by the original system (in that case the semantics cannot be preserved).

Page 19: 3/24/10 Laboratory of Software Analysis

Relationships

[Diagram: reverse engineering, reengineering, and restructuring placed along a “Level of abstraction” axis running from + to -. “Restructuring” is drawn at several abstraction levels.]

Page 20: 3/24/10 Laboratory of Software Analysis

Program analysis

Program analysis is the (automated) inspection of a program to infer some properties. Usually, properties are inferred without running the program (static analysis).

Examples are:
• Type analysis (type inference)
• Dead code analysis
• Clone analysis
• Pointer analysis
• …

Page 21: 3/24/10 Laboratory of Software Analysis

Program Transformations

Program transformation is the act of changing one program into another.

P (source language L) ----transformation----> P’ (target language L’)

Two cases:
• L is different from L’ (example: Pascal to C porting)
• L is equal to L’ (example: goto elimination in the Pascal language)

Page 22: 3/24/10 Laboratory of Software Analysis

TXL

TXL is a programming language specifically designed to support software analysis and program transformation.

Example (code motion optimization): move all loop-independent assignment statements outside of loops.

Before:

    Loop
        x := a + b;
        y := x;
        a := y + 3;
        x := b + c;
        y := x - 2;
        z := x + a * y;
    End loop

After:

    x4 := b + c;
    y2 := x4 - 2;
    Loop
        x := a + b;
        y := x;
        a := y + 3;
        z := x4 + a * y2;
    End loop
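To see that the code motion example preserves the values that matter, the two loops can be transcribed into Java and compared. This is only a sketch: the slide does not say which variables are used after the loop, so the code assumes that `a` and `z` are the observable results, and the iteration count `n` is an added parameter.

```java
// Code motion sketch: the slide's loop before and after hoisting the
// loop-independent assignments. Assumption: only a and z are observed afterwards.
class CodeMotionSketch {

    // Original loop body, executed n times; returns {a, z}.
    static int[] original(int a, int b, int c, int n) {
        int x, y, z = 0;
        for (int i = 0; i < n; i++) {
            x = a + b;
            y = x;
            a = y + 3;
            x = b + c;      // loop-independent: b and c never change inside the loop
            y = x - 2;      // depends only on the loop-independent x
            z = x + a * y;
        }
        return new int[] {a, z};
    }

    // Transformed loop: the two loop-independent assignments are computed once, outside.
    static int[] transformed(int a, int b, int c, int n) {
        int x4 = b + c;
        int y2 = x4 - 2;
        int x, y, z = 0;
        for (int i = 0; i < n; i++) {
            x = a + b;
            y = x;
            a = y + 3;
            z = x4 + a * y2;
        }
        return new int[] {a, z};
    }

    public static void main(String[] args) {
        int[] before = original(1, 2, 3, 3);
        int[] after = transformed(1, 2, 3, 3);
        System.out.println(before[0] == after[0] && before[1] == after[1]); // prints true
    }
}
```

Note that the final values of `x` and `y` differ between the two versions, so the transformation is only valid if they are not read after the loop.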

Page 23: 3/24/10 Laboratory of Software Analysis

Past Projects and Projects of This Year

Page 24: 3/24/10 Laboratory of Software Analysis

Project Year 2004 “Porting C to Java”

Porting of the Chull program (C code) to Java.

Chull determines the convex hull of a set of points in 3D.

Chull is not a trivial C program (4161 LOCs, 31 functions, 3 structs, pointers, …).

[Figure: “Convex hull in 2D”]

Page 25: 3/24/10 Laboratory of Software Analysis

Project Steps 2004 (“semi-automated procedure”)

1. Instrumentation of Chull using TXL. Writing test cases such that branch coverage is reached.
2. Reverse engineering of Chull using TXL: call graph, dependences between functions and data structures.
3. Object identification (clustering and concept analysis).
4. OO design in UML (only class diagram).
5. Java code generation: Chull’ (partially with TXL).
6. Testing of Chull’ with the test cases generated at step 1, to show that Chull[i] = Chull’[i] for every test case i.
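The call graph recovered in the reverse-engineering step is typically visualized with Graphviz, which is among the course tools. As a minimal sketch, assuming the caller/callee pairs have already been extracted (in the project this was done with TXL), the following Java code writes them in Graphviz DOT syntax. The function names in `main` are placeholders and are not taken from Chull.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: turn an already-extracted call graph into Graphviz DOT text.
class CallGraphDotSketch {

    // calls maps each caller to the list of functions it calls.
    static String toDot(Map<String, List<String>> calls) {
        StringBuilder dot = new StringBuilder("digraph callgraph {\n");
        // TreeMap gives a stable, alphabetical order of callers in the output.
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(calls).entrySet()) {
            for (String callee : entry.getValue()) {
                dot.append("  \"").append(entry.getKey())
                   .append("\" -> \"").append(callee).append("\";\n");
            }
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical function names, for illustration only.
        Map<String, List<String>> calls = Map.of(
                "main", List.of("readInput", "buildHull", "printResult"),
                "buildHull", List.of("addPoint"));
        System.out.print(toDot(calls));
    }
}
```

The printed text can be saved to a file and rendered with the Graphviz `dot` command.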

Page 26: 3/24/10 Laboratory of Software Analysis

Code instrumentation

To determine whether or not each branch is traversed, we can place a ‘counter’ (instrumentation) on each branch. Then we have to run the program with inputs.

To have branch coverage we have to check that “count” is equal to (1, 1, …, 1).

‘Program instrumented’ (flowchart):

    start
    count = (0, 0, … 0)
    read x, y
    count(1) := 1
    z := 1
    if (x > y)      (branches labeled true / false)
        count(3) := 1, then exit
        count(2) := 1

N.B. count is an array whose elements are all initialized to 0.
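The counter idea can be sketched in Java as follows. The flowchart does not fully survive in this transcript, so two things are assumptions: that `count(3)` sits on the true branch and `count(2)` on the false branch, and what the false branch returns (the `z + 1` is invented so that the two branches are distinguishable).

```java
// Branch-coverage instrumentation sketch with one counter per instrumented point.
class InstrumentationSketch {
    // Index 0 is unused so that the slots match the slide's count(1), count(2), count(3).
    static final int[] count = new int[4];

    // Instrumented version of a tiny program with a single decision (x > y).
    static int program(int x, int y) {
        count[1] = 1;          // entry point reached
        int z = 1;
        if (x > y) {
            count[3] = 1;      // assumed: true branch
            return z;          // "exit"
        }
        count[2] = 1;          // assumed: false branch
        return z + 1;          // hypothetical continuation, not on the slide
    }

    // Branch coverage is reached when count equals (1, 1, ..., 1).
    static boolean fullyCovered() {
        for (int i = 1; i < count.length; i++) {
            if (count[i] != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        program(5, 1);                       // only the x > y branch is traversed
        System.out.println(fullyCovered());  // prints false
        program(1, 5);                       // now the other branch is traversed too
        System.out.println(fullyCovered());  // prints true
    }
}
```

A test suite reaches branch coverage on this program only if it contains at least one input with x > y and one with x <= y.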

Page 27: 3/24/10 Laboratory of Software Analysis

Project Year 2005 “Maintenance intervention”

Adding a new “crosscutting functionality” (persistence history) to the “Jconsole” Java program. A crosscutting functionality is implemented by code fragments spread across several classes.

Jconsole: 27 Java files, 1385 LOCs.

Two ways of adding a crosscutting functionality to a system:

1) Changing (almost) all the Java classes.

2) Adding an aspect (AOP) in the AspectJ language.

Page 28: 3/24/10 Laboratory of Software Analysis

AspectJ example

Suppose we have to add ‘logging’ to all the methods of a Java program (Logger.entry(String) and Logger.exit(String)).

Plain Java: the logging calls are repeated in every method.

    /** Java */
    public class Main {
        public void foo() {
            Logger.entry("foo()");
            // ... something ...
            Logger.exit("foo()");
        }
        public void foo(int i) {
            Logger.entry("foo(int)");
            // ... something ...
            Logger.exit("foo(int)");
        }
        public static void main(String[] args) {
            Logger.entry("main()");
            // ... something ...
            Logger.exit("main()");
        }
    }

AspectJ: the class contains no logging code; a single aspect adds it.

    /** AspectJ */
    public class Main {
        public void foo() { /* ... something ... */ }
        public void foo(int i) { /* ... something ... */ }
        public static void main(String[] args) { /* ... something ... */ }
    }

    public aspect autolog {
        pointcut publicMethods(): ….

        before(): publicMethods() { Logger.entry … }
        after(): publicMethods() { Logger.exit … }
    }

Page 29: 3/24/10 Laboratory of Software Analysis

Project Year 2006 “a real SE experiment”

We conducted a real software engineering experiment in the context of Web applications:

stereotyped UML class diagrams (“Conallen” proposal) vs. pure UML class diagrams

• What are stereotypes?
• What is a software engineering experiment (or software engineering empirical study)?

Page 30: 3/24/10 Laboratory of Software Analysis

Stereotypes

The designers of UML recognized that the language is not always perfect for every situation/domain.

UML has defined a mechanism to allow certain domains to extend the semantics of specific model elements. The extension mechanism allows the inclusion of new attributes, different semantics and additional constraints.

Stereotypes form an extension to UML.

Stereotypes are adornments or icons having a well-defined semantics.

Used instead of classes in the class diagram …

Page 31: 3/24/10 Laboratory of Software Analysis

Empirical studies in SE

Much of software engineering practice is the result of opinions and anecdotal evidence, not of empirical evidence ...

For example, no one has demonstrated that OO techniques are better than structured techniques, but everyone uses OO ...

Empirical studies (experiments) are useful for trying to answer research questions such as:

“Is technique A better than B?”

Page 32: 3/24/10 Laboratory of Software Analysis

How to conduct an empirical study?

Suppose that we have to “demonstrate” this hypothesis:

“technique A is better than B”

Procedure:
1. Participants (students, professionals, etc.) are divided into two groups (Group 1 and Group 2).
2. Group 1 will execute the task with technique A while Group 2 will use technique B.
3. Data of the experiment are collected and metrics are measured.
4. The hypothesis of the experiment is evaluated statistically using the data collected and the metrics.
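The procedure above can be sketched in a few lines of Java. Several things here are assumptions and not part of the course material: participants are assigned to groups at random, the metric is a single numeric score per participant, the scores in `main` are made up, and the statistic is Welch's t (the slides do not name a specific test). Turning the statistic into a p-value would need a statistics library and is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of a two-group experiment: random assignment plus a t statistic.
class ExperimentSketch {

    // Step 1: shuffle the participants and split them into two groups.
    static List<List<String>> assign(List<String> participants, long seed) {
        List<String> shuffled = new ArrayList<>(participants);
        Collections.shuffle(shuffled, new Random(seed));
        int half = shuffled.size() / 2;
        List<List<String>> groups = new ArrayList<>();
        groups.add(new ArrayList<>(shuffled.subList(0, half)));
        groups.add(new ArrayList<>(shuffled.subList(half, shuffled.size())));
        return groups;
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sample variance (divides by n - 1).
    static double variance(double[] values) {
        double m = mean(values);
        double sum = 0;
        for (double v : values) {
            sum += (v - m) * (v - m);
        }
        return sum / (values.length - 1);
    }

    // Step 4: Welch's t statistic for the difference between the two group means.
    static double welchT(double[] a, double[] b) {
        double standardError = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / standardError;
    }

    public static void main(String[] args) {
        List<List<String>> groups = assign(List.of("p1", "p2", "p3", "p4", "p5", "p6"), 42L);
        System.out.println("Group 1 (technique A): " + groups.get(0));
        System.out.println("Group 2 (technique B): " + groups.get(1));

        // Step 3: made-up scores standing in for the measured metric.
        double[] scoresA = {7.0, 8.0, 6.5};
        double[] scoresB = {5.5, 6.0, 7.0};
        System.out.println("t = " + welchT(scoresA, scoresB));
    }
}
```

A large absolute value of t suggests a real difference between the techniques, but with groups this small no conclusion could be drawn.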

Page 33: 3/24/10 Laboratory of Software Analysis

Empirical study 1: “Conallen” vs. Pure UML

“Conallen notation” Pure UML

Which is more useful during understanding and maintenance?

Page 34: 3/24/10 Laboratory of Software Analysis

This year

Three projects:

• Porting a “Borland Delphi Object Pascal” program to Java using TXL
• Empirical study 1: Can test cases (“Fit tables”) be used to clarify requirements?
• Empirical study 2: Conallen vs. WebML. Which is more useful when doing a comprehension task, Conallen or WebML?

Page 35: 3/24/10 Laboratory of Software Analysis

“Borland Delphi Object Pascal” to Java

Object Pascal source:

    Type
      Person = object
        surname: string[30];
        name: string[20];
        age: Integer;
        Procedure init;
      End;

      Student = Object(Person)
        grade: Integer;
        teacher: String[30]
      End;

    Procedure Person.Init;
    Begin
      surname := '';
      name := '';
      age := 0;
    End;

---- TXL program ---->

Java result:

    class Person {
        String surname;
        String name;
        int age;

        Person() {
            surname = "";
            name = "";
            age = 0;
        }
    }

    class Student extends Person {
        int grade;
        String teacher;
    }

Page 36: 3/24/10 Laboratory of Software Analysis


Fit tables

• A Fit table is a way of expressing the business logic using a simple (input-output) HTML table.

• Fit tables are “added to the requirements” and are used as acceptance test cases.

• Customers and Analysts create Fit tables using a tool like Word, Excel, or even a text editor.

[Figure: example Fit table with input and output columns]

Page 37: 3/24/10 Laboratory of Software Analysis

Sports Magazine Website

We worked through an example to understand Fit tables a little better.

A sports magazine decides to add a new feature to its Website that will allow users to view top football teams based on their ratings.

Rating = ((10000*(won*3+drawn))/(3*played))/100

The analyst can express the change requirement (the “new feature added”):
- in the traditional way: natural language, use cases, …, or
- using natural language + Fit tables.
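The rating rule and the role of a Fit table as an executable set of input/output examples can be sketched in plain Java, without the Fit framework itself. This assumes the division operator that is missing from the extracted formula, uses floating-point arithmetic (the slides do not say whether integer arithmetic was intended), and uses rows invented for illustration.

```java
// Sketch: the rating rule plus a tiny table of (inputs -> expected output) rows,
// playing the role that a Fit table plays as an acceptance test.
class RatingSketch {

    // Rating = ((10000 * (won*3 + drawn)) / (3 * played)) / 100,
    // i.e. the percentage of the maximum points a team could have earned.
    static double rating(int played, int won, int drawn) {
        if (played <= 0) {
            throw new IllegalArgumentException("played must be positive");
        }
        return ((10000.0 * (won * 3 + drawn)) / (3 * played)) / 100;
    }

    public static void main(String[] args) {
        // Columns: played, won, drawn, expected rating. Made-up rows.
        double[][] rows = {
            {10, 10, 0, 100.0},
            {2, 1, 0, 50.0},
            {4, 0, 0, 0.0},
        };
        for (double[] row : rows) {
            double actual = rating((int) row[0], (int) row[1], (int) row[2]);
            String verdict = Math.abs(actual - row[3]) < 1e-9 ? "pass" : "fail";
            System.out.println(verdict + ": played=" + (int) row[0] + " won=" + (int) row[1]
                    + " drawn=" + (int) row[2] + " rating=" + actual);
        }
    }
}
```

Each row is a concrete example that the customer can read and the developer can run, which is the property the empirical study on Fit tables investigates.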

Page 38: 3/24/10 Laboratory of Software Analysis

Natural language vs. Fit tables + natural language

“Only natural language”: A user can search for top N football teams based on rating. The rating is defined …

“Fit table + natural language”: A user can search for top N football teams based on rating. (On the slide, a Fit table accompanies this text.)

Can Fit tables be used to clarify requirements?

Page 39: 3/24/10 Laboratory of Software Analysis

Empirical study 2

Which notation is more useful when doing a comprehension task?

• Group 1: Web applications + Conallen ----> Questionnaire
• Group 2: Web applications + WebML ----> Questionnaire

Page 40: 3/24/10 Laboratory of Software Analysis


The end …

Next lessons …

Page 41: 3/24/10 Laboratory of Software Analysis

“Obfuscated C contest Winner”

The IOCCC (International Obfuscated C Code Contest) is a competition to see who can write the most unreadable, but legal, C program.

Page 42: 3/24/10 Laboratory of Software Analysis

Code viewers

1) Textual representation (colors)

2) Graphical representation (colors)

Page 43: 3/24/10 Laboratory of Software Analysis

Imagix tool

A picture is worth a thousand words …

[Screenshot: main window (≅ call graph) for ‘C code’, showing functions, variables, and calls]

Page 44: 3/24/10 Laboratory of Software Analysis

CVF 3.0

CVF 3.0 is an automated program flowchart generator. It can perform automated reverse engineering of program code into flowcharts.

It works with: C, C++, VC++, VB, VBA, VBScript, ASP, Visual C#, Visual Basic .NET, Visual J# .NET, VC++.NET, ASP.NET, Java, JSP, JavaScript, Delphi, PowerBuilder and Perl.

Page 45: 3/24/10 Laboratory of Software Analysis

UML Class Diagram Recovery

Page 46: 3/24/10 Laboratory of Software Analysis

Type inference (“language without declarations”)

Program P:

    a := 4;
    c := a + b;
    Push(x, T);
    Push(y, T);
    d := Pop(T);

Inferred types:

    a: integer
    c: real, b: real
    T: queue
    d: real

Page 47: 3/24/10 Laboratory of Software Analysis

Dead code …

    20: FOR I=1 TO 10
    30: V[I] = V[I] +1;
    40: PRINT V[I]
    50: ENDFOR
    60: PRINT X;
    70: GOTO 100
    80: CALL F1;
    90: CALL F2;
    100: END

Suppose there are no jumps to lines 80 and 90: then these two lines are never executed.
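One common way to find such dead code is a reachability search on the control-flow graph. The sketch below hand-codes the successor relation of the listing above (in a real tool it would be extracted from the source) and reports every line that cannot be reached from the entry line. The class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Dead code sketch: lines not reachable from the entry of the control-flow graph.
class DeadCodeSketch {

    static Set<Integer> unreachable(Map<Integer, List<Integer>> successors, int entry) {
        Set<Integer> reached = new TreeSet<>();
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.push(entry);
        while (!worklist.isEmpty()) {
            int line = worklist.pop();
            if (reached.add(line)) {   // first visit: follow its successors
                for (int next : successors.getOrDefault(line, List.of())) {
                    worklist.push(next);
                }
            }
        }
        Set<Integer> dead = new TreeSet<>(successors.keySet());
        dead.removeAll(reached);
        return dead;
    }

    // Hand-written control-flow graph of the listing (line -> possible next lines).
    static Map<Integer, List<Integer>> slideProgram() {
        Map<Integer, List<Integer>> cfg = new LinkedHashMap<>();
        cfg.put(20, List.of(30));
        cfg.put(30, List.of(40));
        cfg.put(40, List.of(50));
        cfg.put(50, List.of(20, 60));  // ENDFOR: repeat the loop or fall through
        cfg.put(60, List.of(70));
        cfg.put(70, List.of(100));     // GOTO 100 skips lines 80 and 90
        cfg.put(80, List.of(90));
        cfg.put(90, List.of(100));
        cfg.put(100, List.of());
        return cfg;
    }

    public static void main(String[] args) {
        System.out.println(unreachable(slideProgram(), 20)); // prints [80, 90]
    }
}
```

The result matches the slide's observation, under the same assumption that nothing jumps to lines 80 and 90.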

Page 48: 3/24/10 Laboratory of Software Analysis

Clones

Example: clone analysis

    …
    20: FOR I=1 TO 10
    30: V[I] = V[I] +1;
    40: PRINT V[I]
    50: ENDFOR
    60: PRINT X;
    70: CALL F;
    …
    100: FOR J=1 TO 10
    110: W[J] = W[J] +1;
    120: PRINT W[J]
    130: ENDFOR
    …

Clones: lines 20-50 and 100-130.
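The two fragments in the clone example differ only in their variable names (I, V versus J, W). A simple way to detect this kind of clone, sketched below, is to rename identifiers to positional placeholders and compare the normalized text. The keyword list is an assumption about the slide's toy language, the line-number labels are left out because they necessarily differ, and real clone detectors use more robust techniques.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Clone analysis sketch: fragments are clones if they are equal after
// consistently renaming their identifiers.
class CloneSketch {
    // Words of the toy language that must not be renamed (assumed list).
    private static final Set<String> KEYWORDS = Set.of("FOR", "TO", "PRINT", "ENDFOR", "CALL");
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Replaces the k-th distinct identifier of the fragment with "idk".
    static String normalize(List<String> fragment) {
        Map<String, String> placeholders = new HashMap<>();
        StringBuilder normalized = new StringBuilder();
        for (String line : fragment) {
            Matcher matcher = IDENTIFIER.matcher(line.trim().replaceAll("\\s+", " "));
            StringBuilder rewritten = new StringBuilder();
            while (matcher.find()) {
                String word = matcher.group();
                String replacement = word;
                if (!KEYWORDS.contains(word)) {
                    if (!placeholders.containsKey(word)) {
                        placeholders.put(word, "id" + placeholders.size());
                    }
                    replacement = placeholders.get(word);
                }
                matcher.appendReplacement(rewritten, replacement);
            }
            matcher.appendTail(rewritten);
            normalized.append(rewritten).append('\n');
        }
        return normalized.toString();
    }

    static boolean isClone(List<String> first, List<String> second) {
        return normalize(first).equals(normalize(second));
    }

    public static void main(String[] args) {
        List<String> lines20to50 = List.of("FOR I=1 TO 10", "V[I] = V[I] +1;", "PRINT V[I]", "ENDFOR");
        List<String> lines100to130 = List.of("FOR J=1 TO 10", "W[J] = W[J] +1;", "PRINT W[J]", "ENDFOR");
        System.out.println(isClone(lines20to50, lines100to130)); // prints true
    }
}
```

Both fragments normalize to the same text because I and J become the first placeholder and V and W the second.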