Selected Topics of Software Technology 3

Static Analysis Techniques – Part II

Static analysis techniques – Part II

Talking about apples and oranges

Birgit Hofer

Institute for Software Technology

“Risk comes from not knowing what you’re doing”

Warren Buffett

Outline

Static Analysis Techniques

Code smells repetition

Header and Unit inference

Data Debugging

Spreadsheet Environment specific remarks

Recognizing identical formulas

Useful libraries

Spectrum-based fault localization

Practical

Spreadsheet Quality Assurance Techniques

• Visualization
• Static Analysis
• Debugging
• Testing
• Modeling
• Design & Maintenance Support

Source: Jannach et al. “Avoiding, Finding and Fixing Spreadsheet Errors – A Survey of Automated Approaches for Spreadsheet QA”, in Journal of Systems and Software, 2014.

Visualization: Patrick Koch, Diploma Seminar, TU Graz, 2015.

Static Analysis

• Code Smells

• Static Checker

What are Code Smells?

Any symptom in the source code of a program that possibly indicates a deeper problem.

• Not bugs, but they increase the risk of introducing bugs.
• Indicate the need for refactoring.

Refactoring

Process of changing a system in such a way that it does not alter the external behavior of the system but improves the internal structure. (Martin Fowler)

Code Smells in Software

Code Smells within Classes
• Long Methods
• Long Parameter List
• Duplicated Code
• Large Classes
• Dead Code

Code Smells between Classes
• Inappropriate Intimacy
• Data Clumps
• Indecent Exposure
• Feature Envy
• Message Chains
• Shotgun Surgery
• Lazy Class
• Middle Man

Coding Standard

Spreadsheet Smells

Formula Smells
• Multiple References
• Multiple Operators
• Conditional Complexity
• Duplicated Formulas
• Long Calculation Chains

Interworksheet Smells
• Inappropriate Intimacy
• Feature Envy
• Shotgun Surgery
• Middle Man

Other Smells
• Standard Deviation
• Empty Cell
• Pattern Finder
• Reference to Empty Cell
• String Distance
• Quasi-functional Dependency
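Three of the formula smells can be approximated by simple counts over the formula text. The following is a minimal sketch under stated assumptions: the function name `detect_smells`, the regular expressions, and the thresholds are hypothetical choices for illustration and are not taken from the cited smell catalogs. The reference pattern is deliberately naive (it would also match function names ending in digits, such as LOG10).

```python
import re

# A cell reference such as A1 or $B$2; a range such as A1:A10 counts as one reference.
CELL_REF = re.compile(r"\$?[A-Z]+\$?[0-9]+(?::\$?[A-Z]+\$?[0-9]+)?")
# Arithmetic and concatenation operators (a unary minus is counted as well).
OPERATOR = re.compile(r"[+\-*/^&]")
# An IF call; the word boundary keeps SUMIF and COUNTIF out of the count.
IF_CALL = re.compile(r"\bIF\(")


def detect_smells(formula, max_refs=4, max_ops=4, max_ifs=2):
    """Return the names of the formula smells whose (assumed) threshold is exceeded."""
    text = formula.upper()
    metrics = {
        "Multiple References": (len(CELL_REF.findall(text)), max_refs),
        "Multiple Operators": (len(OPERATOR.findall(text)), max_ops),
        "Conditional Complexity": (len(IF_CALL.findall(text)), max_ifs),
    }
    return [name for name, (count, limit) in metrics.items() if count > limit]


if __name__ == "__main__":
    for f in ("=A1+B1",
              "=A1+A2+A3+A4+A5+A6",
              "=IF(A1>0,IF(B1>0,IF(C1>0,1,2),3),4)"):
        print(f, detect_smells(f))
```

Counting IF calls is a stand-in for nesting depth; a real detector would work on a parsed formula tree rather than on the raw string.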

Identify the formula smells!

Source: Patrick Koch, Diploma Seminar, TU Graz, 2015.

Formula Smells - Solution

Source: Patrick Koch, Diploma Seminar, TU Graz, 2015.

Relevant Literature

Abraham and Erwig: “UCheck: A Spreadsheet Type Checker for End Users”, Journal of Visual Languages and Computing, Volume 18, 2007.

Erwig and Burnett: “Adding Apples and Oranges”, 4th Int. Symposium on Practical Aspects of Declarative Languages (PADL), 2002.

Antoniu et al.: “Validating the Unit Correctness of Spreadsheet Programs”, 26th Int. Conference on Software Engineering (ICSE), 2004.

Ariane 5

Ariane 5's first test flight on 4 June 1996 failed, with the rocket self-destructing 37 seconds after launch because of a malfunction in the control software. A data conversion from 64-bit floating point value to 16-bit signed integer value to be stored in a variable representing horizontal bias caused a processor trap (operand error) because the floating point value was too large to be represented by a 16-bit signed integer. ….

Source: Wikipedia, https://en.wikipedia.org/wiki/Ariane_5, 2015-10-19
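The kind of conversion described in the quote can be illustrated in a few lines. This is only a sketch in Python, not the flight software; the function names `to_int16_checked` and `to_int16_wrapping` are hypothetical. It contrasts a range-checked conversion with what an unchecked 16-bit store would silently keep.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    n = int(value)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit into a 16-bit signed integer")
    return n


def to_int16_wrapping(value: float) -> int:
    """Keep only the low 16 bits and reinterpret them as a signed value."""
    low_bits = int(value) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low_bits))[0]


if __name__ == "__main__":
    print(to_int16_checked(1234.9))     # fits
    print(to_int16_wrapping(40000.0))   # silently becomes a negative number
    try:
        to_int16_checked(40000.0)
    except OverflowError as err:
        print("rejected:", err)
```

Either outcome is a fault: the checked variant fails loudly at run time, the wrapping variant produces an incorrect result without any warning.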

Classical Type Checker

Purpose

Find incorrect applications of operations/assignments

E.g. a multiplication of a number and a string

Types

Runtime errors

Incorrect results

Spreadsheets:

Label-based Type Checking to find more faults!
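Python happens to show both outcomes of an incorrectly typed multiplication. The sketch below is an illustration only; `times` and `checked_times` are hypothetical names, and the check runs at run time, standing in for what a static type checker would report before execution.

```python
from numbers import Number


def times(a, b):
    """Unchecked multiplication: the result depends on the operand types."""
    return a * b


def checked_times(a, b):
    """Reject non-numeric operands before multiplying."""
    for operand in (a, b):
        if not isinstance(operand, Number):
            raise TypeError(f"cannot multiply non-numeric operand {operand!r}")
    return a * b


if __name__ == "__main__":
    print(times("3", 2))        # incorrect result: the string is repeated
    try:
        times("3", "2")         # runtime error
    except TypeError as err:
        print("runtime error:", err)
    try:
        checked_times("3", 2)   # rejected instead of silently computing
    except TypeError as err:
        print("type error:", err)
```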

UCheck - A Spreadsheet Type Checker from Oregon State University

Talking about apples and oranges

A Harvest Example

Source: Erwig and Burnett: “Adding Apples and Oranges”, 4th Int. Symposium on Practical Aspects of Declarative Languages (PADL) 2002.

Month

Month[May]

Month[June]

Month[May] | Month[June]

= Month[May|June]

Fruit

Fruit[Apples]

Fruit[Oranges]

Fruit[Apples] | Fruit[Oranges]

= Fruit[Apples|Oranges]

Month[May] & Fruit[Apples]

Month[June] & Fruit[Apples]

Month[May] & Fruit[Apples] | Month[June] & Fruit[Apples]

= (Month[May] | Month[June]) & Fruit[Apples]

= Month[May|June] & Fruit[Apples]

Rules for Unit Propagation

1. Every value without header is a well-formed unit.

2. If a cell has a value v and a header u, then u[v] is a well-formed unit.

3. If two units have no common root unit, you can link the units using &.

4. If two units have a common root unit, you can link the units using |.

If you cannot derive meaningful unit expressions, you might have found an incorrect formula!
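The rules above can be sketched as a small check. This is a simplification under stated assumptions and not UCheck's actual algorithm: a unit such as Month[May] & Fruit[Apples] is modeled as a set of (root, value) pairs, and an or-combination counts as well-formed only if it can be factored into one value set per root. The names `and_unit`, `or_units` and `show` are hypothetical.

```python
from itertools import product


def and_unit(**headers):
    """Build an and-unit, e.g. and_unit(Month="May", Fruit="Apples")."""
    return frozenset(headers.items())


def or_units(units):
    """Link and-units with |. Return {root: values} if the result can be
    factored per root (e.g. Month[May|June] & Fruit[Apples]), else None."""
    units = set(units)
    root_sets = {frozenset(root for root, _ in unit) for unit in units}
    if len(root_sets) != 1:          # all operands must use the same roots
        return None
    values = {}
    for unit in units:
        for root, value in unit:
            values.setdefault(root, set()).add(value)
    names = sorted(values)
    # Every combination of the collected values must actually occur.
    combos = {frozenset(zip(names, combo))
              for combo in product(*(sorted(values[n]) for n in names))}
    return values if combos == units else None


def show(factored):
    """Render a factored unit in the slide notation."""
    return " & ".join(f"{root}[{'|'.join(sorted(vals))}]"
                      for root, vals in sorted(factored.items()))


if __name__ == "__main__":
    ok = or_units([and_unit(Month="May", Fruit="Apples"),
                   and_unit(Month="June", Fruit="Apples")])
    print(show(ok))
    bad = or_units([and_unit(Month="June", Fruit="Apples"),
                    and_unit(Month="May", Fruit="Oranges")])
    print(bad)  # None: no meaningful unit expression can be derived
```

In this sketch, adding the May and June apple cells yields a factorable unit, while adding June apples to May oranges does not, which is the signal for a possibly incorrect formula.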

Identify all types and units!

Modified Harvest Example

Month[June] & Fruit[Apples]

Month[May] & Fruit[Oranges]

Month[June] & Fruit[Apples] | Month[May] & Fruit[Oranges]

Month[May] & Fruit[Apples|Oranges]

Identify all types and units!

Another Type Checking Example

Source: Elisabeth Getzner: “Survey of Fault Localization Techniques in Spreadsheets”, Diploma Seminar, TU Graz 2014.

Hours & Worker[Jones]

Hours & Worker[Smith]

Bonus & Worker[Smith]

Salary & Hours & Worker[Jones]

Salary & Hours & Worker[Smith]

Salary Hours & Worker[Jones]

The devil is in the details.

Header Inference is not that easy!

• Spatial layout (top-down, left-to-right)
• Heuristics required

Header: cells to label the data
• do not contain formulas
• are not input to other cells

Footer: aggregation formulas
• at the end of rows or columns, or
• formulas that reference core cells or other formula cells

Core: data cells
• do not contain formulas
• are input to other cells

Filler: blank cells or cells with special formatting (to separate tables within sheets)
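The cell roles above can be turned into a first-cut classifier. This is a minimal sketch, assuming the references of each formula have already been extracted; the function name `classify_cells` and the input format are hypothetical. It treats every formula cell as a footer and ignores spatial layout and formatting, which real header inference also has to take into account.

```python
def classify_cells(contents, references):
    """Assign a role to every cell.

    contents:   address -> cell text ("" for blank, formulas start with "=")
    references: formula address -> set of addresses the formula reads
                (assumed to be precomputed by a formula parser)
    """
    used_as_input = set().union(*references.values())
    roles = {}
    for address, text in contents.items():
        if text.strip() == "":
            roles[address] = "filler"    # blank cell
        elif text.startswith("="):
            roles[address] = "footer"    # simplification: any formula cell
        elif address in used_as_input:
            roles[address] = "core"      # no formula, input to other cells
        else:
            roles[address] = "header"    # no formula, not an input
    return roles


if __name__ == "__main__":
    sheet = {"A1": "", "B1": "Apples",
             "A2": "May", "B2": "10",
             "A3": "June", "B3": "12",
             "A4": "Total", "B4": "=SUM(B2:B3)"}
    refs = {"B4": {"B2", "B3"}}
    for address, role in classify_cells(sheet, refs).items():
        print(address, role)
```

A constant that no formula uses would be labeled as a header here, which is one reason why additional heuristics are required in practice.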

Validating Unit Correctness

Source: Antoniu et al.: “Validating the Unit Correctness of Spreadsheet Programs”, ICSE 2004.

Relevant Literature

Barowy, Gochev and Berger: “CheckCell: Data Debugging for Spreadsheets”, Object-Oriented Programming, Systems, Languages & Applications (OOPSLA), 2014.

Programmer's proverb

Garbage in, garbage out

Data Debugging

Input data errors
• Data entry errors
• Measurement errors
• Data integration errors

Techniques
• Statistical (Gaussian distribution, std. deviation)
• Identify cells that have an unusual impact on the result: CheckCell

Source: Barowy, Gochev and Berger: “CheckCell: Data Debugging for Spreadsheets”, Object-Oriented Programming, Systems, Languages & Applications (OOPSLA), 2014.
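The statistical technique can be sketched with a z-score test: a value is suspicious if it lies more than a chosen number of standard deviations from the mean. The function name `zscore_outliers` and the threshold of 2.0 are assumptions for illustration, and the test presumes roughly Gaussian data.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the indices of values lying more than `threshold`
    standard deviations away from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:                      # all values identical: nothing stands out
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


if __name__ == "__main__":
    scores = [91, 93, 92, 29, 94, 90, 92, 91]   # 29 was probably meant to be 92
    print(zscore_outliers(scores))
```

Such a test looks only at the input values themselves. It flags unusual values whether or not they matter for any formula, which is the gap that impact-based approaches address.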

CheckCell

Premises on input vector:
• Values in vector exchangeable
• Function input is homogeneous
• Computed value changes significantly when an erroneous input value is corrected

Computation Trees
• Root node = formula, leaves = input values
• Dependency Graph == Computation Forest

Bootstrap procedure
• To determine effect of a particular input on formulas
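The idea of measuring the effect of one input on a formula can be sketched with a resampling loop. This is a strongly simplified illustration and not the procedure from the CheckCell paper: each input is repeatedly replaced by a value drawn from the other inputs (relying on the exchangeability premise), and the score says how far the original result lies from the results obtained that way. The name `impact_scores`, the number of trials and the scoring formula are assumptions.

```python
import random
import statistics


def impact_scores(values, formula, trials=200, seed=0):
    """Score each input by how unusual the original formula result is
    compared with results where that input is replaced by resampled values."""
    rng = random.Random(seed)
    original = formula(values)
    scores = []
    for i in range(len(values)):
        others = values[:i] + values[i + 1:]
        outputs = []
        for _ in range(trials):
            trial = list(values)
            trial[i] = rng.choice(others)   # exchange input i for another input
            outputs.append(formula(trial))
        mean = statistics.fmean(outputs)
        # Fall back to 1.0 if every resampled result is identical.
        spread = statistics.pstdev(outputs) or 1.0
        scores.append(abs(original - mean) / spread)
    return scores


if __name__ == "__main__":
    inputs = [91, 93, 92, 29, 94, 90]        # 29 instead of 92
    scores = impact_scores(inputs, sum)
    suspect = scores.index(max(scores))
    print("most suspicious input:", inputs[suspect])
```

Replacing the erroneous value changes the sum considerably and consistently, so its score is high; replacing a correct value mostly leaves the result close to the original.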

Example CheckCell

Source: Barowy, Gochev and Berger: “CheckCell: Data Debugging for Spreadsheets”, Object-Oriented Programming, Systems, Languages & Applications (OOPSLA), 2014.

“Zahlendreher” - reversal of two neighboring digits when writing down a number

How to recognize equivalent formulas?

The R1C1 cell reference system

Absolute References

Relative References

Reference    Meaning
R[-2]C       Relative reference to the cell two rows up and in the same column
R[2]C[2]     Relative reference to the cell two rows down and two columns to the right
R2C2         Absolute reference to the cell in the second row and the second column
R[-1]        Relative reference to the entire row above the active cell
R            Reference to the current row

Source: http://office.microsoft.com/en-us/help/about-cell-and-range-references-HP005198323.aspx
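Formulas that were copied along a row or column differ in A1 notation but become textually identical in R1C1 notation, which makes them easy to compare. The following is a minimal sketch of such a conversion; the function names `col_to_number` and `to_r1c1` are hypothetical, and the regular expression is naive (it ignores sheet names, string literals and function names that end in digits).

```python
import re

# Optional "$" marks an absolute column or row, as in $C$1.
_REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def col_to_number(letters):
    """A -> 1, B -> 2, ..., Z -> 26, AA -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def to_r1c1(formula, row, col):
    """Rewrite the A1 references of a formula located at (row, col) in R1C1 style."""
    def convert(match):
        col_abs, letters, row_abs, digits = match.groups()
        ref_row, ref_col = int(digits), col_to_number(letters)
        if row_abs:
            r = f"R{ref_row}"
        else:
            r = "R" if ref_row == row else f"R[{ref_row - row}]"
        if col_abs:
            c = f"C{ref_col}"
        else:
            c = "C" if ref_col == col else f"C[{ref_col - col}]"
        return r + c
    return _REF.sub(convert, formula)


if __name__ == "__main__":
    # The same formula copied from D2 to D3 (column D is column 4).
    print(to_r1c1("=B2*C2", 2, 4))
    print(to_r1c1("=B3*C3", 3, 4))
    print(to_r1c1("=B4*$C$1", 4, 4))
```

Two formula cells can then be treated as copies of each other if their R1C1 strings are equal.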

R1C1 with fixed references

Helpful libraries

Java:
• Apache POI: http://poi.apache.org
• Generic Framework for Spreadsheet Analysis (extends POI): http://ssaapp.di.uminho.pt

.NET:
• GemBox: www.gemboxsoftware.com/spreadsheet/overview

Google GData API: https://developers.google.com/gdata

Practical

No lecture for the next month

PART 1 - Testing (individual)

Read a scientific paper and present its content in class (30 minutes/person, November 23rd)

In case of questions email

Recommended