Quantitative empirical methods
Dag Sjøberg and Gunnar Bergersen
Second intro week Master’s, 8 January 2020


About Me
•  Current position: Professor at University of Oslo
   –  Software Process Improvement, Agile and Lean Methods, Software Quality, Empirical Research Methods
•  Education
   –  MSc, University of Oslo, 1987
   –  PhD, University of Glasgow, Computing Science, 1993
•  Work experience
   –  University of Oslo since 1993
   –  SINTEF ICT (20 %) 2014–2019
   –  Simula Research Laboratory, 2001–2009
   –  University of Oslo 1993–2000
   –  Statistics Norway (SSB) 1987–1990
   –  National University Hospital (Rikshospitalet) 1985–1986
•  Startup
   –  Member of steering committee and co-owner of four startup companies


About me
•  Current
   –  Associate professor II at Programming & Software Engineering (20 %)
   –  Chief Product Officer, Technebies (skill testing of developers)
•  Education
   –  MSc (2001) and PhD (2015) from University of Oslo
   –  PhD thesis: “Measuring programming skill”
•  Prior work experience
   –  Programmer
   –  IT project leader in two companies
   –  CEO of three companies


Writing a Master’s thesis

You may wish to
•  propose or develop a new method, tool, technique, language, practice, etc. that is supposed to be better than what exists, or
•  find out whether an existing X is better than an existing Y, or
•  how to improve X

Most Master’s theses should include some of this


How to find out whether something is better?

•  Investigate what works best in practice, that is, perform an empirical study
   –  Experiment
   –  Survey that asks people about their opinions
   –  Other studies such as case studies, action research, ethnographic studies


Structure

•  Quantitative vs. qualitative data
•  The research life cycle
•  Controlled experiments
   –  Experiment example
   –  AB-Experiments
•  Surveys
•  Quality of experiments and surveys
   –  Hypothesis testing and effect size
   –  Validity
•  Summary


What does it mean to be better?

–  How much?
–  How many?
–  How large?
–  How old?
–  How long?
–  How high?
–  How warm?
–  How thick, firm, etc.?

Answers are measured in terms of quantitative data

•  Data expresses quantity
•  Data expressed as numbers
•  Used in statistics


Quantitative data
•  Data expresses quantity
•  Data expressed as numbers
•  Used in statistics

Qualitative data
•  Data expresses quality in some sense
•  Data expressed as text, images and other forms except numbers
•  Can yield quantitative data indirectly if a mapping exists from the qualitative to quantitative data
•  Not used in statistics


Quantitative empirical methods

•  Experiments and surveys typically collect quantitative data
•  Therefore called “quantitative empirical methods”

Empirical means using evidence based on observation or experience rather than theory or pure logic


Any questions regarding evaluation in your thesis?

•  Please spend 5 minutes in groups of 3–4 persons and discuss any questions that you may have regarding evaluation

You may wish to
•  propose or develop a new method, tool, technique, language, practice, etc. that is supposed to be better than what exists, or
•  find out whether an existing X is better than an existing Y, or
•  how to improve X
•  or something else


You may wish to
•  propose or develop a new method, tool, technique, language, practice, etc. that is supposed to be better than what exists, or
•  find out whether an existing X is better than an existing Y, or
•  how to improve X

What do you believe is the case? That is, formulate a hypothesis

Test the hypothesis: given that the hypothesis is correct, which results do you expect (predict)?

If the hypothesis is confirmed, what you believed is not hypothetical any more; you have a thesis


What is a thesis?

1  a statement or theory that is put forward as a premise to be maintained or proved: his central thesis is that psychological life is not part of the material world.
   •  (in Hegelian philosophy) a proposition forming the first stage in the process of dialectical reasoning. Compare with antithesis, synthesis.

2  a long essay or dissertation involving personal research, written by a candidate for a college degree: a doctoral thesis (or Master’s thesis).

© Apple dictionary


Experiment

[Diagram: Independent variables (treatment: method, tool, technique, language, practice, etc.) have an Effect on Dependent variables (outcome: time, quality, costs, etc.); Moderator variables (context) are shown as a third element.]

An experiment is a study in which an intervention (treatment) is introduced to observe its effects
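
As a minimal sketch, the three kinds of variables can be written down as a small data structure before running a study. The class and field names are hypothetical; the example values are taken from the pair-programming experiment presented later in this deck.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class ExperimentDesign:
    """Hypothetical container for the variables of a controlled experiment."""
    independent: dict[str, list[str]]   # treatment name -> its levels
    dependent: list[str]                # outcomes that are measured
    moderators: dict[str, list[str]] = field(default_factory=dict)  # context

    def conditions(self) -> list[tuple[str, ...]]:
        """All combinations of treatment levels and moderator levels."""
        factors = list(self.independent.values()) + list(self.moderators.values())
        return list(product(*factors))


design = ExperimentDesign(
    independent={"programming mode": ["solo", "pair"]},
    dependent=["duration", "effort", "correctness"],
    moderators={"system complexity": ["simple", "complex"]},
)

for condition in design.conditions():
    print(condition)
```

Listing the conditions up front makes it easier to check that every combination of treatment and context gets enough participants.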


Controlled experiments

•  An experiment investigates cause-effect relations

•  Experiments are conducted when the investigator wants control over the situation, with direct, precise, and systematic manipulation of the behavior of the phenomenon to be studied

•  Difference in the outcome is supposed to be caused by the different treatments (or by the treatment compared to no treatment, that is, the control group)


What is best? Pair programming or solo programming*

•  295 junior, intermediate and senior professional Java consultants from 29 companies were paid to participate (one work day)

•  99 individuals; 98 pairs

•  The pairs and individuals performed the same Java maintenance tasks on either:
   –  a “simple” system (centralized control style), or
   –  a “complex” system (delegated control style)
•  We measured:
   –  duration (elapsed time)
   –  effort (cost)
   –  quality (correctness) of their solutions

*E. Arisholm, H. Gallis, T. Dybå, and D. Sjøberg, “Evaluating Pair Programming with Respect to System Complexity and Programmer Expertise,” IEEE Transactions on Software Engineering, 2007, 33(2): 65-86.


Delegated vs. centralized control – expert opinions
•  The Delegated Control Style:
   –  Rebecca Wirfs-Brock: A delegated control style ideally has clusters of well defined responsibilities distributed among a number of objects. To me, a delegated control architecture feels like object design at its best…
   –  Alistair Cockburn: [The delegated coffee-machine design] is, I am happy to see, robust with respect to change, and it is a much more reasonable “model of the world.”
•  The Centralized Control Style:
   –  Rebecca Wirfs-Brock: A centralized control style is characterized by single points of control interacting with many simple objects. To me, centralized control feels like a “procedural solution” cloaked in objects…
   –  Alistair Cockburn: Any oversight in the “mainframe” object (even a typo!) [in the centralized coffee-machine design] means potential damage to many modules, with endless testing and unpredictable bugs.

[Figure: object-oriented design methods (Responsibility-Driven Design, Use-Case Driven Design, Role Modelling, Data Driven Design) placed along a control-style axis, with two object-interaction diagrams labelled “Delegated Control Style” and “Centralized Control Style”, each showing Object1 to Object5 exchanging numbered messages.]


“Assuming that it is not only highly skilled experts who are going to maintain an object-oriented system, a viable conclusion from the controlled experiment reported in this paper is that a design with a centralized control style may be more maintainable than is a design with a delegated control style.”

Erik Arisholm and Dag Sjøberg, “Evaluating the effect of a delegated vs. centralized control style on the maintainability of object-oriented software,” IEEE Transactions on Software Engineering, vol. 30, no. 8, August 2004, pp. 521-534.


Total Effect of PP

[Bar chart: difference from individuals (%) for Duration, Effort and Correctness. Bar labels as extracted: 84 %, 7 %, -8 %.]
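
One way to read the chart above is sketched below. It assumes that effort is measured as person-time (duration multiplied by the number of people working), that the -8 % label belongs to Duration, and that the 84 % label belongs to Effort; none of this is stated on the slide itself. The function name and the 100-minute baseline are hypothetical.

```python
def relative_difference(treatment: float, baseline: float) -> float:
    """Difference from the baseline, as a percentage of the baseline."""
    return (treatment - baseline) / baseline * 100.0


solo_duration = 100.0                  # hypothetical minutes for one individual
pair_duration = solo_duration * 0.92   # assumed: pairs finish 8 % faster

solo_effort = solo_duration * 1        # one person working
pair_effort = pair_duration * 2        # two persons working the whole time

duration_diff = relative_difference(pair_duration, solo_duration)
effort_diff = relative_difference(pair_effort, solo_effort)

print(f"duration: {duration_diff:+.0f} %, effort: {effort_diff:+.0f} %")
```

Under these assumptions, a pair that finishes slightly faster still spends far more person-time, which is why duration and effort need to be reported separately.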


Effect of PP for Juniors

[Bar chart: difference from individuals (%) for Duration, Effort and Correctness. Bar labels as extracted: 5 %, 111 %, 73 %.]


Moderating Effect of System Complexity for Juniors

[Bar chart: difference from individuals (%) for Duration, Effort and Correctness, with one series for CC (easy) and one for DC (complex). Bar labels as extracted: 4 %, 109 %, 32 %, 6 %, 112 %, 149 %.]


Effect of PP for Seniors

[Bar chart: difference from individuals (%) for Duration, Effort and Correctness. Bar labels as extracted: -9 %, 83 %, -8 %.]


Moderating Effect of System Complexity for Seniors

[Bar chart: difference from individuals (%) for Duration, Effort and Correctness, with one series for CC (easy) and one for DC (complex). Bar labels as extracted: 55 %, -13 %, 8 %, 115 %, -23 %, -2 %.]


So, when should we use PP?

Programmer expertise | Task complexity | Use PP? | Comments
Junior               | Easy            | Yes     | Provided that increased quality is the main goal
Junior               | Complex         | Yes     | Provided that increased quality is the main goal
Intermediate         | Easy            | No      |
Intermediate         | Complex         | Yes     | Provided that increased quality is the main goal
Expert               | Easy            | No      |
Expert               | Complex         | No      | Unless you are sure that the task is too complex to be solved satisfactorily even by solo seniors

The question of whether PP is beneficial or not, is meaningless!
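
The point that the answer depends on context can be illustrated by encoding the table as a lookup. This is only a sketch: the dictionary and function names are hypothetical, and the entries are transcribed from the table above.

```python
# (expertise, task complexity) -> (use PP?, comment), transcribed from the slide
QUALITY_NOTE = "Provided that increased quality is the main goal"

PP_RECOMMENDATION = {
    ("junior", "easy"): (True, QUALITY_NOTE),
    ("junior", "complex"): (True, QUALITY_NOTE),
    ("intermediate", "easy"): (False, ""),
    ("intermediate", "complex"): (True, QUALITY_NOTE),
    ("expert", "easy"): (False, ""),
    ("expert", "complex"): (
        False,
        "Unless you are sure that the task is too complex to be solved "
        "satisfactorily even by solo seniors",
    ),
}


def use_pair_programming(expertise: str, complexity: str) -> bool:
    """Return the slide's yes/no recommendation for a given context."""
    use_pp, _comment = PP_RECOMMENDATION[(expertise.lower(), complexity.lower())]
    return use_pp


print(use_pair_programming("Junior", "Complex"))
print(use_pair_programming("Expert", "Easy"))
```

There is no entry keyed on the treatment alone: a recommendation only exists once both expertise and task complexity are given.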


A/B testing

•  Background
   –  Historically used in marketing, design, games
   –  Data-driven product development & profitability tuning
   –  Assumption: “we’re all wrong most of the time”
•  Typical use
   –  Fast release cycles
   –  Bottom-up and context dependent (i.e., “I want to improve X for part Y”)
   –  Between-subject design


Requires

•  Two (or more) versions to be compared (A, B, …)
•  At least one outcome variable (KPI or other metric or measure) to improve
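
A minimal sketch of how two versions might be compared on one outcome variable, assuming the outcome is a conversion rate and using a two-proportion z-test (one common choice, not necessarily the one used in any particular A/B platform). The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # rate under "no difference"
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal tail area
    return z, p_value


# Hypothetical counts: version A converts 100 of 1000 users, version B 130 of 1000.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The normal approximation behind this test is only reasonable when each version has a fairly large number of users and conversions.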


Experiments

Advantages
•  They are a well established strategy, seen by many as the most ‘scientific’ approach
•  The only research strategy that can prove cause-effect relationships
•  Laboratory experiments permit high levels of precision in measuring outcomes and in analyzing data

Disadvantages
•  Laboratory experiments often create artificial situations that are not comparable to real-world situations
•  Often difficult or impossible to control all the relevant variables
•  It is often difficult to recruit a representative sample of participants
•  It may be necessary to conceal from the participants the purpose of the research so they do not skew the results


Surveys


Common in society
•  Requires relatively few resources to include many people
•  Create statistics and test hypotheses about characteristics of the target group (the population being investigated)
•  Obtain information about people’s opinions about what, how much, how many, how and why, or about what people say they do
   –  As opposed to case studies and ethnography, one does not observe what people actually do


Types of surveys

•  Cross-sectional surveys are used to gather information on a population at a single point in time.

•  Longitudinal surveys gather data over a period of time. The researcher may then analyze changes in the population and attempt to describe and/or explain them.

–  Trend studies focus on a particular population, which is sampled and scrutinized repeatedly. While samples are of the same population, they are typically not composed of the same people.

–  Cohort studies also focus on a particular population, sampled and studied more than once. A cohort study samples from the same cohort (a group defined by a shared characteristic or event) every time, but not necessarily the same individuals.

–  Panel studies allow the researcher to find out why changes in the population are occurring, since they use the same sample of people every time.
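
The difference between re-sampling a population (trend study) and following the same people (panel study) can be sketched as follows. The population, sample size and number of waves are hypothetical.

```python
import random

rng = random.Random(2020)                            # fixed seed so the sketch is repeatable
population = [f"person_{i}" for i in range(1000)]    # hypothetical population
waves = 3
sample_size = 50

# Trend study: a fresh sample from the same population at every wave.
trend_samples = [rng.sample(population, sample_size) for _ in range(waves)]

# Panel study: one sample is drawn once, and the same people are asked at every wave.
panel = rng.sample(population, sample_size)
panel_samples = [list(panel) for _ in range(waves)]

overlap = len(set(trend_samples[0]) & set(trend_samples[1]))
print(f"trend waves 1 and 2 share {overlap} of {sample_size} people")
print(f"panel waves 1 and 2 share {len(set(panel_samples[0]) & set(panel_samples[1]))}")
```

Because a panel follows the same individuals, a change between waves can be traced to specific respondents, which a trend study cannot do.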


The importance of wording …

Two Catholic priests wondered whether one is allowed to smoke while praying. They both sent a letter to the Pope:

P1: “Is it allowed to smoke when one prays?” Answer: NO – prayer should get full attention

P2: “Is it allowed to pray when one smokes?” Answer: YES – it is always a good thing to pray


Question formats

•  Classification of objects or individuals “Are you: Male___ Female ___ Other ___?”

•  Ranking of items in order to reflect the relative ordering of phenomena “Please rank the following factors in order of importance (1-4)”

•  Pairwise comparison “Which do you prefer”

•  Rating of characteristics Simple, single-item scales, e.g.,

“Programming is a terrific course (check one)” “Strongly agree__, agree__, neither__, disagree__, strongly disagree__”
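
Rating answers like these become quantitative data once a mapping from response text to numbers is chosen. The sketch below assumes a 1 to 5 coding with 5 as the most positive answer; the coding and the sample responses are hypothetical.

```python
from collections import Counter
from statistics import median

# Assumed coding of the five response options (5 = most positive).
LIKERT_CODES = {
    "strongly agree": 5,
    "agree": 4,
    "neither": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

# Hypothetical answers to "Programming is a terrific course".
responses = ["agree", "strongly agree", "neither", "agree", "disagree"]

codes = [LIKERT_CODES[r] for r in responses]
counts = Counter(responses)      # how often each option was chosen
median_code = median(codes)      # the scale is ordinal, so the median is a cautious summary

print(counts.most_common())
print("median code:", median_code)
```

The distances between the options are not necessarily equal, so frequencies and medians are safer summaries than means for a single item.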


Advantages
•  They provide a wide and inclusive coverage of people or events
•  They can be administered from remote locations using mail, email or telephone
•  They can provide a lot of data in a short time at a reasonable cost
•  They can be quantitatively analysed
•  They can be replicated

Disadvantages
•  They lack depth
•  They tend to focus on what can be counted or measured
•  They do not establish cause-effect
•  They cannot judge the accuracy or honesty of people’s responses by observing their body language


Hypothesis testing

•  Null hypothesis:
   –  “there is no difference between the effect of treatments (experiments) or between groups (surveys)”
   –  “there is no difference between individual and pair programming”


P-value

•  If it is very unlikely, for example < 5 % (the significance level), that we would have obtained results at least as extreme as those we actually got if the null hypothesis were true, then we reject the null hypothesis and
   –  accept the alternative hypothesis (“there is a difference …”)
•  The probability of obtaining results at least as extreme as ours, assuming the null hypothesis is true, is called the p-value (probability value)
•  One uses statistical methods to test null hypotheses
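
A minimal sketch of how a p-value can be estimated without distributional assumptions, using a permutation test on the difference in means. This is one possible statistical method, not the one used in the studies cited in this deck; the function name and the duration data are hypothetical.

```python
import random


def permutation_test(a: list[float], b: list[float], n_perm: int = 10_000, seed: int = 0) -> float:
    """Estimate a two-sided p-value for the difference in group means."""
    def mean(xs: list[float]) -> float:
        return sum(xs) / len(xs)

    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical task durations in minutes for two groups.
solo = [62, 70, 65, 68, 74, 71, 66, 69]
pair = [55, 58, 61, 54, 60, 57, 59, 56]
p_value = permutation_test(solo, pair)
print(f"p = {p_value:.4f}")
```

If the estimated p-value is below the chosen significance level, the null hypothesis of no difference between the groups is rejected.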


Effect size

•  The p-value is about whether the observed difference could plausibly be due to chance alone
•  Effect size is about how large the difference actually is
   –  Is the difference large enough to have any meaning in practice?

V.B. Kampenes, T. Dybå, J.E. Hannay and D.I.K. Sjøberg. A Systematic Review of Effect Size in Software Engineering Experiments, Information and Software Technology 49(11-12):1073-1086, 2007
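
As an illustration, one widely used standardized effect size is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below is an assumption about which measure to use, and the scores are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference using the pooled sample variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


# Hypothetical scores: same spread, means one unit apart.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(f"d = {d:.2f}")
```

Cohen's conventional labels treat roughly 0.2 as small, 0.5 as medium and 0.8 as large, but whether a difference matters in practice still depends on the context.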


Validity of empirical studies
•  Internal validity
   –  Is the difference that we observed between the groups that received the treatments actually caused by the treatments, or may there be other causes for the difference?
•  External validity
   –  Can we generalize the results we found; that is, is it likely that we would obtain the same result in other settings?
•  Construct validity
   –  If a complex concept is measured, does the measure represent the concept in a satisfactory way? For example, is it OK to measure quality of a software system only in terms of number of bugs found?
•  Statistical conclusion validity
   –  Are the correct statistical methods used?


Achieving internal validity: Randomized vs. quasi-experiment

•  To achieve internal validity, we would ideally run randomized experiments
•  A randomized (or true) experiment is characterized by random assignment of subjects to experimental groups to infer the effect of the treatment
•  A quasi-experiment does not use random assignment of treatment to groups
   –  Instead, the comparisons depend on nonequivalent groups that differ from each other in other ways than the presence of the treatment. For example, consider two methods to be compared. If the subjects only know one of the methods, a quasi-experiment is appropriate if learning the other method is infeasible.
   –  Challenge: How to separate the effect of the treatment from the effect of the differences among the groups?
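
Random assignment itself is simple to carry out; a minimal sketch follows. The function name, group names and participant identifiers are hypothetical.

```python
import random


def randomize(subjects: list[str], groups: list[str], seed: int = 42) -> dict[str, list[str]]:
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)          # a recorded seed makes the assignment reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment: dict[str, list[str]] = {g: [] for g in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment


subjects = [f"S{i:02d}" for i in range(1, 21)]   # 20 hypothetical participants
assignment = randomize(subjects, ["treatment", "control"])

for group, members in assignment.items():
    print(group, members)
```

Because chance alone decides who ends up in which group, differences between the groups other than the treatment are expected to even out as the number of subjects grows.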


External validity is about generalization

W.M. Trochim, The Research Methods Knowledge Base, http://www.socialresearchmethods.net/kb/


Sampling

•  Representativeness
   –  https://www.aftenposten.no/viten/i/OnK87b/Psykologiforskning-gjelder-bare-for-noen-fa-mennesker--Nina-Kristiansen
   –  “Ninety percent of the participants in psychology studies come from Europe and the USA, but they make up only 18 percent of the world’s population, … psychology researchers, moreover, recruit students as participants in their studies. These white, urban young people are not even representative of the population in their own country, says Gurven.”


Construct validity

[Diagram: at the conceptual level, an “independent” concept (construct) has a cause-effect relation to a “dependent” concept (construct), together with a moderator concept. At the operational level, the independent variable(s), dependent variable(s) and moderator variable are derived from the corresponding concepts through operationalization.]


Empirical research methods – literature

Dag I.K. Sjøberg received the MSc degree in computer science from the University of Oslo in 1987 and the PhD degree in computing science from the University of Glasgow in 1993. He has five years of industry experience as a consultant and group leader. He is now research director of the Department of Software Engineering, Simula Research Laboratory, and a professor of software engineering in the Department of Informatics, University of Oslo. Among his research interests are research methods in empirical software engineering, software processes, software process improvement, software effort estimation, and object-oriented analysis and design. He is a member of the International Software Engineering Research Network, the IEEE, and the editorial board of Empirical Software Engineering.

Tore Dybå received the MSc degree in electrical engineering and computer science from the Norwegian Institute of Technology in 1986 and the PhD degree in computer and information science from the Norwegian University of Science and Technology in 2001. He is the chief scientist at SINTEF ICT and a visiting scientist at the Simula Research Laboratory. Dr. Dybå worked as a consultant for eight years in Norway and Saudi Arabia before he joined SINTEF in 1994. His research interests include empirical and evidence-based software engineering, software process improvement, and organizational learning. He is on the editorial board of Empirical Software Engineering and he is a member of the IEEE and the IEEE Computer Society.

Magne Jørgensen received the Diplom Ingeneur degree in Wirtschaftswissenschaften from the University of Karlsruhe, Germany, in 1988 and the Dr. Scient. degree in informatics from the University of Oslo, Norway in 1994. He has about 10 years industry experience as software developer, project leader and manager. He is now professor in software engineering at University of Oslo and member of the software engineering research group of Simula Research Laboratory in Oslo, Norway. His research focus is on software cost estimation.

Dag I. K. Sjøberg, Tore Dybå and Magne Jørgensen, “The Future of Empirical Methods in Software Engineering Research,” Future of Software Engineering (FOSE'07), 2007.

Tore Dybå, Rafael Prikladnicki, Kari Rönkkö, Carolyn Seaman and Jonathan Sillito, “Qualitative research in software engineering,” Empirical Software Engineering (2011) 16:425–429, DOI 10.1007/s10664-011-9163-y. Published online: 28 May 2011. © Springer Science+Business Media, LLC 2011

Affiliations: T. Dybå, SINTEF, Trondheim, Norway; R. Prikladnicki, PUCRS, Porto Alegre, Brazil; K. Rönkkö, Blekinge Institute of Technology, Karlskrona, Sweden; C. Seaman, University of Maryland Baltimore County, Baltimore, MD, USA; J. Sillito, University of Calgary, Calgary, Alberta, Canada

Qualitative research methods were developed in the social sciences to enable researchers to study social and cultural phenomena and are designed to help researchers understand people and the social and cultural contexts within which they live (Denzin and Lincoln 2011). The goal of understanding a phenomenon from the point of view of the participants and its particular social and institutional context is largely lost when textual data are quantified. Taylor and Bogdan (1984) point out that qualitative research methods were designed mostly by educational researchers and other social scientists to study the complexities of human behavior (e.g., motivation, communication, difficulties in understanding). According to these authors, human behavior is clearly a phenomenon that, due to its complexity, requires qualitative methods to be fully understood, since much of human behavior cannot be adequately described and explained through statistics and other quantitative methods. Examples of qualitative methods are action research, case study research, ethnography, and grounded theory. Qualitative data sources include observation and participant observation (fieldwork), interviews and questionnaires, documents and texts, and the researcher’s impressions and reactions.

Many in the software industry recognize that software development also presents a number of unique management and organizational issues that need to be addressed and solved in order for the field to progress. And this situation has led to studies related not only