Empirical Research - hci-lecture.org · Understanding the purpose of experiments in HCI Learning experimental designs Empirical Research 2 Valentin Schwind. Why Empirical? ... Cause

Empirical Research

The following content is licensed under a Creative Commons Attribution 4.0 International license (CC BY-SA 4.0) Valentin Schwind

Image Source: https://pxhere.com/de/photo/544817

Learning Goals

▪ Understanding the purpose of empirical research

▪ Understanding the purpose of experiments in HCI

▪ Learning experimental designs


Why Empirical?

▪ To understand cause and effect

▪ “When metal is heated it expands.”

▪ “As the moon has gravitational pull, the oceans have tides.”

▪ “When the price increases, the sales go down.”

▪ “When users type on my new keyboard, their typing speed increases.”

▪ “My algorithm increases the memorability of its users.”


E. (2001). GROUPS : INTERACTION AND PERFORMANCE.

Why Empirical?

▪ To understand cause and effect

▪ “When metal is heated it expands”

▪ To make predictions

▪ “The metal in this bridge needs space to expand in hot weather”

▪ To test hypotheses

▪ “The metal of the bridge withstands extreme weather changes”

▪ To derive models

▪ $L(T) = L(T_0)\,\exp\left(\int_{T_0}^{T} \alpha(T')\,\mathrm{d}T'\right)$
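As a worked special case, assume purely for illustration that the expansion coefficient α is constant over the temperature range (an assumption that is not stated on the slide). The model then reduces to a closed form, and for small α(T − T0) to a linear approximation:

```latex
L(T) = L(T_0)\,\exp\!\bigl(\alpha\,(T - T_0)\bigr)
     \approx L(T_0)\,\bigl(1 + \alpha\,(T - T_0)\bigr)
```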


E. (2001). GROUPS : INTERACTION AND PERFORMANCE.

Causation versus Correlation

Example: Storks and birthrate


Matthews, R. (2000). Storks Deliver Babies (p = 0.008). Teaching Statistics, 22: 36-38. https://doi.org/10.1111/1467-9639.00013

Causation versus Correlation

▪ Fact: Birthrate and number of storks correlate

▪ Question: “If I want more babies can I move to an area with many storks?”

▪ Depends on the cause!


▪ more storks → more children: “Yes, do it!”

▪ more children → more storks: “No.”

▪ “Tertium Quid” (a third factor) → more children and more storks: “Depends.”
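The “Tertium Quid” case can be made concrete with a small simulation. The sketch below is illustrative only: the synthetic Gaussian data, the variable names, and the helper functions are assumptions and are not taken from Matthews (2000). A hidden third variable drives both the stork count and the number of births, so the two correlate although neither causes the other. Once the linear influence of the third variable is removed, most of the correlation should disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Remove the linear influence of xs from ys (simple least squares)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=2000, seed=1):
    """Synthetic regions: a hidden third variable drives storks AND births."""
    rng = random.Random(seed)
    third = [rng.gauss(0, 1) for _ in range(n)]    # hypothetical confound
    storks = [t + rng.gauss(0, 1) for t in third]  # depends on third only
    births = [t + rng.gauss(0, 1) for t in third]  # depends on third only
    return third, storks, births


third, storks, births = simulate()
r_raw = pearson(storks, births)
r_controlled = pearson(residuals(storks, third), residuals(births, third))
print(f"raw correlation: {r_raw:.2f}")
print(f"after controlling for the third variable: {r_controlled:.2f}")
```

In the simulation, storks and births never influence each other, so moving to a stork-rich area would not change the number of babies.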


Hypothesis: “My new keyboard is easy to use”

Photo by Niels Henze

Controlled Experiments

▪ Participants rated the system easy to use, because

▪ they actually find the system easy to use?

▪ they want to support you in your research?

▪ they were overwhelmed by the system’s novelty?

▪ their football team won the world cup yesterday?

▪ Only determining the precise cause for our observation helps us to make any predictions about the world

▪ But a mere observation will not help to find the answer!


Controlled Experiments

▪ Controlled experiments are (probably the only reliable) means to isolate cause and effect

▪ What if there are potentially two effects, or if they depend on each other?

▪ Can we consider multiple causes and observe multiple effects?


Cause → Experiment → Effect

Cause 1, Cause 2 → Experiment → Effect 1, Effect 2

Experimental Designs

▪ In controlled experiments, it is possible to analyze multiple factors at the same time

→ such designs are called: multifactorial designs

▪ Single or multifactorial designs: each factor must have at least two characteristics

→ such characteristics are called: levels

→ combinations of levels are called: conditions

▪ Levels can have different types:

▪ present (yes/no)

▪ categorical (dogs, cats, …)

▪ continuous (volume, length, age, …)


Experimental Variables

▪ Fixed Factors (x) → Independent variables

▪ “what we control” (e.g. the prototype)

▪ Levels are fixed and represent the experimental interest

▪ Measures (y) → Dependent variables

▪ “what we observe” (e.g. task completion time)

▪ The measure of experimental interest

▪ Control Variables (ε) → Covariates

▪ “what we know but don’t control” (e.g. handedness)

▪ Random Factors (έ) → Error variable

▪ No explicit factor between the levels (e.g. the participant)



Image from https://pxhere.com/en/photo/544817





The Independent Variable (IV)

▪ How to manipulate one single aspect?

▪ In theory:

▪ by keeping all other factors (environment, weather, intelligence, mood, training, …) stable

▪ but people, situations, training, fatigue, etc. are never identical after performing the first level of a condition!

▪ In practice:

▪ a random sample

▪ (pseudo) randomization of conditions

▪ permutations

▪ counter-balancing using e.g. a (balanced) Latin-Square


Latin Square

▪ a Latin square is an n × n array filled with n different symbols, each occurring exactly once in each row and exactly once in each column

▪ a balanced Latin square additionally ensures that no symbol directly follows the same other symbol more than once


Image by Schultz (2006) from wikimedia.org (CC-BY-SA-2.5) https://commons.wikimedia.org/wiki/File:Fisher-stainedglass-gonville-caius.jpg

A B D C

B C A D

C D B A

D A C B
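A balanced Latin square for an even number of conditions can be generated programmatically. The following is a minimal sketch of one common construction (first row 0, 1, n−1, 2, n−2, …, each further row shifted by one). The function names are hypothetical, and the construction is assumed to be used only for an even number of conditions. For the four conditions A to D it yields the same square as shown above.

```python
def balanced_latin_square(symbols):
    """Return a balanced Latin square (list of rows) for an even number of symbols."""
    n = len(symbols)
    if n == 0 or n % 2 != 0:
        raise ValueError("this construction needs an even, non-zero number of symbols")
    # The first row visits the indices 0, 1, n-1, 2, n-2, ...
    first, low, high = [0], 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Every further row shifts the first row by one symbol.
    return [[symbols[(i + shift) % n] for i in first] for shift in range(n)]


def is_balanced(square):
    """True if no ordered pair (a directly followed by b) occurs more than once."""
    seen = set()
    for row in square:
        for pair in zip(row, row[1:]):
            if pair in seen:
                return False
            seen.add(pair)
    return True


square = balanced_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print("balanced:", is_balanced(square))
```

Each row is the condition order for one participant; participant n + 1 starts again with the first row.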


Counter-Balancing vs. Randomization

▪ Presenting conditions in a random order can reduce sequence effects, e.g. through training or tiredness

→ Randomness does not necessarily even out sequence effects

▪ A balanced Latin square design evens out the “what-follows-what” scenario and protects the experiment against order effects

▪ But a balanced Latin square design requires the number of participants to be a multiple of the number of conditions

→ e.g. 50 conditions: 50, 100, 150, 200… participants

▪ Not always feasible (e.g. in online surveys with an unpredictable number of participants)

→ In those cases experimental designs are (pseudo-)randomized by a computer
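The trade-off can be sketched in a few lines. The example below is an assumption-laden illustration, not a prescribed procedure: the function names, the seed, and the choice of 10 participants with 4 conditions are hypothetical. It draws a pseudo-random condition order per participant, counts how often each condition comes first, and computes the smallest participant count that a balanced Latin square would need.

```python
import random
from collections import Counter


def random_orders(conditions, n_participants, seed=0):
    """One pseudo-random condition order per participant (reproducible via seed)."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_participants):
        order = list(conditions)
        rng.shuffle(order)
        orders.append(order)
    return orders


def first_position_counts(orders):
    """How often each condition is presented first."""
    return Counter(order[0] for order in orders)


def participants_for_balance(n_conditions, at_least):
    """Smallest multiple of n_conditions that is >= at_least."""
    return -(-at_least // n_conditions) * n_conditions


orders = random_orders(["A", "B", "C", "D"], n_participants=10)
counts = first_position_counts(orders)
print(counts)
print(participants_for_balance(4, at_least=10))
```

With 10 participants and 4 conditions the first-position counts cannot all be equal, whatever the seed, which is the sense in which randomization alone does not guarantee balance.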


Within-Subjects IVs

▪ Participants are assigned to all conditions

▪ Advantages

▪ Economy

▪ Sensitivity

▪ Cancelling out individual differences

▪ Disadvantages

▪ Carry-over effects from previous conditions

▪ Conditions must be balanced

→ such designs are called: repeated measures designs


Between-Groups/Between-Subjects IVs

▪ Participants are assigned to one condition only

▪ Advantages

▪ Simplicity

▪ Less chance of practice or fatigue effects

▪ Useful when it is impossible for an individual to participate in all conditions (e.g. gender)

▪ Disadvantages

▪ Expense (time, effort, and number of participants)

▪ Insensitivity to experimental manipulations

→ such designs are called: independent measures designs


Hybrid IVs and Mixed-Designs

▪ Two types of variables:

▪ between-subjects variable(s)

▪ within-subjects variable(s)

▪ Participants are randomly assigned to one level of the between-subjects variable(s)

▪ Randomized assignment

▪ All participants are exposed to each level of the within-subjects variable(s)

▪ Randomized or counter-balanced order
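A mixed design can be planned with a short script. The sketch below rests on illustrative assumptions: the participant IDs, the between-subjects levels ("lab", "online"), the within-subjects levels, and the function name are all hypothetical. Participants are shuffled and dealt round-robin into the between-subjects groups (random assignment with equal group sizes), and each participant gets a randomized order of all within-subjects levels.

```python
import random


def assign_mixed_design(participants, between_levels, within_levels, seed=0):
    """Return {participant: {"group": ..., "order": [...]}} for a mixed design."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random assignment to the between-subjects groups
    plan = {}
    for index, participant in enumerate(shuffled):
        group = between_levels[index % len(between_levels)]
        order = list(within_levels)
        rng.shuffle(order)  # randomized order of the within-subjects levels
        plan[participant] = {"group": group, "order": order}
    return plan


participants = [f"P{i:02d}" for i in range(1, 13)]
plan = assign_mixed_design(participants, ["lab", "online"], ["A", "B", "C"])
for participant in sorted(plan):
    print(participant, plan[participant]["group"], plan[participant]["order"])
```

The shuffled within-subjects order could be replaced by rows of a balanced Latin square when a counter-balanced order is preferred.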


Multifactorial Designs


Full Factorial Design

▪ Independent variables: Prototype (levels A, B, C) and User Interface (levels 1, 2, 3)

▪ Conditions: A1, A2, A3, B1, B2, B3, C1, C2, C3

▪ Levels → Conditions = 9

Nested Design

▪ Independent variables: Prototype (levels A, B, C) and User Interface (levels 1, 2, 3)

▪ Conditions: A1, A2, B1, B2, B3, C2, C3

▪ Levels → Conditions = 6
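The conditions of the full factorial design (three prototypes × three user interfaces = 9) can be enumerated as the Cartesian product of the factor levels. This is a minimal sketch; the variable and function names are assumptions chosen for illustration.

```python
from itertools import product

# Factor levels as named on the slide.
prototypes = ["A", "B", "C"]
user_interfaces = [1, 2, 3]


def full_factorial(*factors):
    """All combinations of one level per factor (Cartesian product)."""
    return list(product(*factors))


conditions = [f"{p}{ui}" for p, ui in full_factorial(prototypes, user_interfaces)]
print(conditions)
print(len(conditions))  # 3 levels x 3 levels = 9 conditions
```

Adding a further factor multiplies the number of conditions by its number of levels, which is why full factorial designs grow quickly.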

HCI Research Methods

▪ Online surveys

▪ Quick, cheap, efficient, broad range of participants

▪ Lab studies

▪ Controlled setting without interruptions

▪ In-situ

▪ Natural environment

▪ VR/AR

▪ Safe, easy prototyping

▪ Are they different?


[1] A. Voit, S. Mayer, V. Schwind, and N. Henze. 2019. Online, VR, AR, Lab, and In-Situ: Comparison of Research Methods to Evaluate Smart Artifacts. In CHI ’19. https://doi.org/10.1145/3290605.3300737


▪ Yes, the results are different!

▪ In-situ and VR showed the highest ratings, e.g. for hedonic and pragmatic quality

▪ Participants are not able to ignore the experimental apparatus

▪ But which one reflects the “truth” best?



Internal Validity

▪ Identification, documentation, and elimination of confounds

▪ High, when there are no alternative explanations for your results

▪ The variation of your dependent variable is caused by the variation of your independent variable

▪ Low, when experimental effects can be explained by confounds

▪ The variation of your dependent variable can be explained by the variation of confounds

▪ We aim for high internal validity


THIS MESS

▪ Testing – subjects react to the experimental setup or task

▪ History – e.g. events between two measurements

▪ Instrument – e.g. change of the measurement tool

▪ Statistical regression (toward the mean) – data outliers e.g. caused by inhomogeneous test groups

▪ Maturation – subjects change between two measurements

▪ Experimental mortality – subjects drop out

▪ Selection – lacking randomization of the tested sample

▪ Selective interaction – sequence/order effects


External Validity

▪ The extent to which results can be generalized

▪ High, when results of the study can be transferred to the real world

▪ e.g. does the sample represent the general population?

▪ Low, when the results cannot be applied to the population or real-life situations outside of the research setting

→ ecological validity


Internal vs. External Validity

▪ Do internal and external validity contradict each other?

▪ Internal validity: You have to control all interfering variables

▪ External validity: You establish an artificial, experimental setting

▪ Theories are being tested deductively, not inductively

▪ A theory is based on the assumption of falsification

▪ Does the observation in an experiment with high internal validity contradict the theory?

▪ If yes: irrelevant if the results are “representative”

▪ If no: the experiment supports the theory (→ the theory must be further tested)



▪ All methods reflect “the truth”

▪ The influence of external factors determines the ecological validity

▪ Controlled experiments are required to further determine those effects


Literature

▪ Field, Andy & Hole, Graham. (2003). How to Design and Report Experiments.

▪ Cochran, William & Cox, Gertrude. (1950). Experimental Designs.

▪ Campbell, Donald & Stanley, Julian. (1963). Experimental and Quasi-Experimental Designs for Research.

