
1

Approaches to Evaluation in Software Engineering

Carl-Fredrik Sørensen, PhD Trial Lecture, 16.02.2006, Trondheim

2

Outline

• What, Why, Who, How, Where to evaluate?
  – Objects to study?
  – Objective of evaluation?
  – Effects studied?
  – Context of evaluation?
• Evaluation in SE Industry.
• Evaluation in SE Research.
• Challenges
• Summary

3

What to evaluate? What are the objects of study?

• What, Why, Who, Where, How to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

4

Software Engineering (SE)

The IEEE Computer Society defines software engineering as:
1. “The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software.”
2. “The study of approaches as in (1).”

Ian Sommerville, Software Engineering, Addison-Wesley, 2001:
– An engineering discipline which is concerned with all aspects of software production, from the early stages of system specification through to maintaining the system after it has gone into use.

5

Knowledge Areas in Software Engineering (SWEBOK)

• Software requirements
• Software design
• Software construction
• Software testing
• Software maintenance
• Software configuration management
• Software engineering management
• Software engineering process
• Software engineering tools and methods
• Software quality

6

Why evaluate in SE? What are the objectives?

• What, Why, Who, Where, How to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

7

General Evaluation Objectives

• General: Understand state-of-practice. Confirm theories or conventional wisdom.

• Exploration: when an area is not well understood.

• Description: describe the current state of things.

• Prediction: predicting the future.
• Explanation: explaining why things happen.

8

SE Evaluation Objectives

• Understand the software process and product.
• Define, measure and validate qualities of process and product.
• Evaluate and confirm successes and failures.
• Information feedback for project control.
• Learn from experience. Learn to predict (planning).
• Evaluate technology.
• Improve software development.

9

What is the context of evaluation?

• What, Why, Who, Where, How to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

10

How to evaluate?

• What, Why, Who, Where, How to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

11

Approaches to Evaluation

• Methods for model definition.

• Definition of measurements or metrics (a small metric sketch follows below).
• Methods for data capture.
• Methods for analysis.
• Managing validity threats.
• Research methods designed for objective evaluation.
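As a small illustration of defining a metric and analysing captured data, the sketch below computes a defect-density metric (defects per KLOC) over module-level data. The metric choice, module names and all numbers are assumptions made for illustration, not taken from the lecture.

```python
# Minimal sketch with hypothetical data: define a simple product metric,
# defect density = defects found / KLOC, and compute it per module.

modules = {                     # module name: (defects found, lines of code)
    "parser": (14, 6200),
    "scheduler": (5, 2100),
    "ui": (22, 15400),
}

for name, (defects, loc) in modules.items():
    density = defects / (loc / 1000)             # defects per KLOC
    print(f"{name}: {density:.1f} defects/KLOC")
```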

12

Quantitative Research

• Why:
  – Provides data that can be used to generate statistical results.
• What data:
  – Numbers or discrete categories (interval, ratio scales).
• Used to evaluate:
  – Hypotheses (see the sketch below).
  – Cause-effect relationships.
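As a minimal illustration of how quantitative data feeds hypothesis testing, the sketch below compares hypothetical defect counts from two treatments with a two-sample t-test (assuming SciPy is available). The data, the treatment names and the 5% significance level are assumptions, not results from the lecture.

```python
# Minimal sketch: a two-sample t-test on hypothetical defect counts,
# comparing an "inspection" treatment against a control group.
from scipy import stats

# Hypothetical data: defects found per module under each treatment.
control = [12, 15, 9, 14, 11, 13, 10, 16]
inspection = [8, 7, 11, 6, 9, 10, 7, 8]

# H0: mean defect counts are equal; H1: they differ.
t_stat, p_value = stats.ttest_ind(control, inspection, equal_var=False)

print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject H0: the treatments appear to differ.")
else:
    print("Cannot reject H0 at the 5% level.")
```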

13

Qualitative Research

• Why:
  – SE combines technical and human behaviour.
  – The complexity of human behaviour is difficult to quantify.
  – Provides more explanatory information.
• What data:
  – Words, pictures, observations, interviews, diaries, etc. (nominal, ordinal scales).
• Used to evaluate:
  – Non-tangible objects like people, organisations, processes, etc.
  – Relationships between technology and people.

14

Qualitative Vs. Quantitative

• Qualitative data is often assumed to be subjective.
• Quantitative data is often assumed to be objective.
• Subjectivity and objectivity are orthogonal to the type of data!
• Combinations:
  – Qualitative data explains the reasons behind hypotheses and relationships.
  – Anomalies are described differently.
  – Increases the amount of information.
  – Increased diversity of the data increases confidence in the results.
  – The two are often alternated within empirical studies.

15

Validity and Reliability

• Construct validity: The variables accurately model the hypotheses – right metrics.

• Internal validity: Changes in the dependent variables can be attributed to changes in the independent variables – right data.

• Conclusion validity: Concerned with the relationship between treatment and outcome – right statistics.

• External validity: Generalisation of results outside the study context – right respondents/sample.

• Threats to validity: Factors that influence interpretation and the ability to draw conclusions.

16

Industrial SE evaluation

• What, Why, Who, How, Where to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

17

Evaluate SE practice

• Goals of evaluation:
  – Increase quality.
  – Budget compliance.
  – Eventually, software process improvement – becoming a learning organisation.

18

Product Evaluation

• Right product? – Software requirements:
  – Conduct requirements reviews, prototyping, model validation and acceptance tests.
• Making it right? – Software architecture/design:
  – Definition of quality attributes, quality analysis, and design reviews.
• Product – Software construction:
  – Validation: against requirements, quality attributes, user satisfaction.
  – Verification: software testing (as developer, as user).

19

Process Evaluation

• Software engineering management:
  – Determining satisfaction of requirements.
  – Reviewing and evaluating performance.
• Software engineering process:
  – Define/use process assessment models and methods.
  – Define/use process and product measurement (size, structure, quality).
• Tools and methods:
  – Software process improvement.
  – Evaluate the effect of introduction/use.
  – Feature analysis and tools benchmarking (see the sketch below).
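The bullet on feature analysis and tools benchmarking can be made concrete with a weighted-scoring sketch. The features, weights and scores below are purely hypothetical, and the weighted-sum scheme is just one common form of feature analysis, not a method prescribed in the lecture.

```python
# Minimal sketch of a weighted-score feature analysis for tool evaluation.
# Features, weights and scores are hypothetical examples.

features = {              # feature: weight (importance, 1-5)
    "IDE integration": 5,
    "Report quality": 3,
    "Licence cost": 4,
}

candidate_tools = {       # tool: {feature: score (1-5)}
    "Tool A": {"IDE integration": 4, "Report quality": 5, "Licence cost": 2},
    "Tool B": {"IDE integration": 3, "Report quality": 3, "Licence cost": 5},
}

for tool, scores in candidate_tools.items():
    total = sum(features[f] * scores[f] for f in features)
    print(f"{tool}: weighted score = {total}")
```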

20

Industrial Process Measurement

• Two general types: analytical and benchmarking.
• Benchmarking: adoption of best practice.
• Analytical techniques: rely on “quantitative evidence”.
  – Quality Improvement Paradigm (QIP).
  – Experimental studies: controlled or quasi-experiments.
  – Personal Software Process (PSP): on the individual level.

21

Software Verification & Validation

• A disciplined approach to assessing software products throughout the product life cycle.
  – Addresses software quality.
  – Locates defects.
  – Conformance to requirements.
• Quality-enhancing methods:
  – Formal verification w.r.t. the specification.
  – Inspections and reviews.
  – Testing.
  – User acceptance.

22

Inspections, Reviews & Walkthroughs

• Static verification and validation.
• Aim: find defects earlier in the software process.
• Defects are costly: roughly a 10x increase in cost when a defect is found and corrected in a later phase of the project.
• Practice: detection and correction of software problems are often deferred until late in software projects.
• Reading techniques are important!

23

Testing

• Dynamic verification and validation.
• Defect detection with respect to expected behaviour (see the sketch below).
• Ensure functionality and reliability.
• Applicable to the whole development process.
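To make defect detection against expected behaviour concrete, here is a minimal unit-test sketch; the function under test and its expected behaviour are hypothetical, introduced only for illustration.

```python
# Minimal sketch of dynamic verification: checking observed behaviour against
# expected behaviour with a unit test.
import unittest

def normalise_version(v: str) -> str:
    """Hypothetical function: pad or trim a dotted version string to three parts."""
    parts = v.split(".")
    while len(parts) < 3:
        parts.append("0")
    return ".".join(parts[:3])

class TestNormaliseVersion(unittest.TestCase):
    def test_pads_missing_components(self):
        self.assertEqual(normalise_version("1.2"), "1.2.0")

    def test_truncates_extra_components(self):
        self.assertEqual(normalise_version("1.2.3.4"), "1.2.3")

if __name__ == "__main__":
    unittest.main()
```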

24

[Figure: Verification V diagram. Left leg: write user requirements → write system requirements → develop product architecture → design components → implement design (build components). Right leg: component verification → integrate and test components → verify system requirements → verify user requirements → validated product. Horizontal links show requirements traceability, testing and ITADS verification at each level. ITADS = Inspection, Test, Analysis, Design, Similarity.]

25

Research Methods in Software Engineering

• What, Why, Who, How, Where to evaluate?

• Evaluation in SE Industry.

• Evaluation in SE Research.

• Challenges
• Conclusion

26

Research in SE

• Research approaches are typically:
  – Descriptive, evaluative, formulative, or predictive.
• General categories of research approaches in SE:
  – Scientific method – formal/mathematical.
  – Engineering method.
  – Empirical method.
  – Analytical method – a variant of the scientific method.

27

Scientific method

• Top-down approach.
• Emphasis on finding better formal methods and languages.
• Builds mathematical models of phenomena.
• Simulates and refines the models in iterations (see the sketch below).
• Mostly quantitative data.

• This method does not scale up in SE!
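The model → simulate → refine cycle can be illustrated with a deliberately small sketch. The defect-growth model, the observed numbers and the refinement rule are all assumptions made for illustration, not part of the lecture.

```python
# Minimal sketch of the model -> simulate -> refine loop. Hypothetical model:
# cumulative defects found by week t are modelled as N(t) = a * (1 - exp(-b*t)).
import math

observed = [(1, 10), (2, 17), (3, 22), (4, 25)]   # (week, cumulative defects)

def simulate(a: float, b: float, t: float) -> float:
    return a * (1 - math.exp(-b * t))

a, b = 30.0, 0.3                                  # initial model parameters
for _ in range(200):                              # crude iterative refinement of 'a'
    bias = sum(simulate(a, b, t) - n for t, n in observed)
    a -= 0.01 * bias                              # nudge 'a' to reduce the total bias

print(f"refined model: a = {a:.1f}, b = {b}")
```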

28

Engineering method

• Bottom-up approach.
• Emphasis on finding better methods for structuring large systems and software development.
• Software development is viewed as a creative task which cannot be controlled other than through rigid constraints on the resulting product.

29

Empirical method

• Compares beliefs to observations (facts).
• Helps to understand how and why “things” work.
• Understanding allows for changes/improvement.
• Generation of theory – exploratory.
• Strengthening or confirmation of research propositions/theories – confirmative.
• Building, testing, applying, and refining theory.
• Not able to prove hypotheses!
  – Only to strengthen or weaken them.

30

Experimentation in Software Engineering

• Observational methods.
• Historical methods.
• Controlled methods.
• Mixed methods.

31

Observational methods (validation method – description; strengths; weaknesses)

• Project Monitoring – Collect development data.
  Strengths: provides a baseline for the future; inexpensive. Weaknesses: no specific goals.
• Case Study – Monitor a project in depth.
  Strengths: can constrain one factor at low cost. Weaknesses: poor control for later replication; not useful for prediction; validity threats.
• Assertion – Use ad hoc validation techniques.
  Strengths: serves as a basis for future experiments. Weaknesses: insufficient validation.
• Field Study – Monitor multiple projects.
  Strengths: inexpensive form of replication. Weaknesses: treatments differ across projects.

32

Historical methods (validation method – description; strengths; weaknesses)

• Literature search – Examine previously published studies.
  Strengths: large available database; inexpensive. Weaknesses: selection bias; treatments differ.
• Legacy data – Examine data from completed projects.
  Strengths: combines multiple studies; inexpensive. Weaknesses: cannot constrain factors; data limited.
• Lessons learned – Examine qualitative data from completed projects.
  Strengths: determine trends; inexpensive. Weaknesses: no quantitative data; cannot constrain factors.
• Static analysis – Examine the structure of the developed product.
  Strengths: can be automated; applies to tools. Weaknesses: not related to development method.

33

Controlled methods (validation method – description; strengths; weaknesses)

• Replicated experiment – Develop multiple versions of the product/process.
  Strengths: can control factors for all treatments. Weaknesses: very expensive; Hawthorne effect.
• Synthetic environment experiments – Replicate one factor in a laboratory setting.
  Strengths: can control individual factors; moderate cost. Weaknesses: scaling up; interactions among multiple factors.
• Dynamic analysis – Execute the developed product for performance.
  Strengths: can be automated; applies to tools. Weaknesses: not related to development method.
• Simulation – Execute the product with artificial data.
  Strengths: can be automated; applies to tools; evaluation in a safe environment. Weaknesses: data may not represent reality; not related to development method.

34

Mixed research methods (validation method – description; strengths; weaknesses)

• Survey – Ask questions to a population/sample.
  Strengths: statistically valid; relatively cheap. Weaknesses: questionnaire bias; honesty of responses; many alternative explanations possible.
• Grounded theory – Build theory from collected data.
  Strengths: creates new knowledge. Weaknesses: no clear mission.
• Post-mortem analysis – Examine data from completed projects; a mix of survey and case study.
  Strengths: real-life project experiences. Weaknesses: as for case studies.
• Action research – A mix between experiment and case study.
  Strengths: achieves a practical outcome; creates new knowledge. Weaknesses: same as case study; validity threats.

35

Challenges

• Perform externally valid experiments.
• Create better empirical studies.
  – Define quantifiable theories/models and metrics.
  – Draw more credible conclusions from them.
• Selection of the appropriate method.
  – One size does not fit all.
  – Dependent on the research questions asked.

36

Summary

• Many different areas to evaluate.
• Different approaches for industry and academia.
• Different approaches for each object to evaluate.
• The research questions decide the method.
• Mixing and matching methods is often useful.
• Important to evaluate actual practice in order to provide improvements.

37

References

• Avison, David; Lau, Francis; Myers, Michael and Nielsen, Peter Axel. "Action Research". Communications of the ACM, Vol. 42, No. 1, pp. 94-97, 1999.
• Ciolkowski, Marcus; Laitenberger, Oliver; Rombach, Dieter; Shull, Forrest and Perry, Dewayne. "Software Inspections, Reviews & Walkthroughs". ICSE'2002, Orlando, Florida, USA, pp. 641-642.
• Conradi, Reidar and Wang, Alf Inge (editors). Empirical Methods and Studies in Software Engineering. Springer, Heidelberg, Germany, 2003. LNCS 2765.
• Farbey, Barbara and Finkelstein, Anthony. "Evaluation in Software Engineering: ROI, but more than ROI". Proc. of the 3rd International Workshop on Economics-Driven Software Engineering Research (EDSER-3), 2001.
• Perry, Dewayne; Porter, Adam and Votta, Lawrence. "Empirical Studies of Software Engineering: A Roadmap". ICSE'2000.
• Seaman, Carolyn B. "Qualitative Methods in Empirical Studies of Software Engineering". IEEE Transactions on Software Engineering, 25(4):557-572, July/August 1999.
• Moody, Daniel. "Empirical Research Methods". Lecture in IT Topics, 2002.

38

References

• Sommerville, Ian. Software Engineering. Addison-Wesley, 6th Edition, 2001.
• Tichy, Walter F. "Should Computer Scientists Experiment More?". Computer, 31(5):32-40, May 1998.
• Wohlin, Claes; Runeson, Per; Höst, Martin; Ohlsson, Magnus C.; Regnell, Björn and Wesslén, Anders. Experimentation in Software Engineering – An Introduction. Kluwer Academic Publishers, 2000.
• Zelkowitz, Marvin V. and Wallace, Dolores R. "Experimental Models for Validating Technology". Computer, 31(5):23-31, May 1998.
• IEEE Computer Society. SWEBOK: Guide to the Software Engineering Body of Knowledge – 2004 Version. http://www.swebok.org/ironman/pdf/SWEBOK_Guide_2004.pdf
• Thomas, J., FR-HiTEMP Ltd. Build, Integrate and Test. Presentation slides. www.secam.ex.ac.uk/teaching/ug/studyres/SOE3215-6/6a%20Build%20Int%20Test.ppt
