

Evaluating Design III: Experiment and Inference HCI Lecture 10

Dr Mark Wright

Key points:

• The Scientific Method
• Setting up an Experiment
• The Process of Inference
• Normal distributions and the Central Limit Theorem
• Interpreting significance
• Case Study - Research Project

Finding things out by Experiment

• Objective
• Controlled
• Repeatable
• Applies to the objective and measurable
• May not represent real world
• May not generalise


The Scientific Method: A way to find things out

• Choose something to study
• Form a Theory of how it works
• Form a Hypothesis consistent with the Theory
• Devise an Experiment to test the Hypothesis
• Perform the Experiment
• Analyse the Data
• Make an Inference
• Confirm/Refute the Hypothesis
• Confirm, Reject, Modify the Theory
• Karl Popper - Conjectures and Refutations

The Scientific Method

Moon Feather Hammer Experiment


Issues concerning experiments

• Hypothesis
• Experimental design
• Variables - independent/dependent
• Types of Variables
• Subjects - Within/Between
• Samples - paired/unpaired
• Statistical technique
• Assumptions
• Parametric/Non-parametric techniques
• CLT - why normal distribution is so common

Issues concerning experiments: Hypothesis

Hypothesis

• A formal statement about how the world is
• Objective
• Testable
• Refutable

Null Hypothesis

• For statistical tests we often adopt a ‘Null’ Hypothesis
• Example: Typing on a touch screen is just as fast as on a keyboard
• We then test to see how likely our data is given the Hypothesis


Quantitative analysis: inference

• Example: From data-logging, you found that 8 out of 10 people used the menu rather than the shortcut button you had provided for the task. Is this evidence of a significant preference for menus in the user population?
• Rephrased: if there was no real preference (i.e. usage was random), what is the probability of observing at least 8 out of 10 people using the menu?
• We can calculate this probability exactly, and if it is very low, we could argue that it is unlikely (given the 8 out of 10 actually observed) that there was no real preference


Quantitative analysis: inference

• A Statistical Method
• Can describe binary outcomes using the binomial distribution:
  – for n trials, each with probability p for outcome X, the probability of getting k occurrences of X is

    $$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$$

• In this case there were ten choices (n=10)
• We want to know how likely it was that eight or more choices were for one option (k=8, 9 or 10) if this was just random behaviour and there was no real preference (p=0.5):

    $$P(k \ge 8) = \frac{\binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{2^{10}} = \frac{45 + 10 + 1}{1024} \approx 0.055$$

• A Statistical Inference based on convention
• As this probability > 0.05, more than 1 in 20, this is (by convention) not considered significantly unlikely
• A Conclusion - The Null Hypothesis is not rejected (in this case)
• Conclude we do not have good evidence of preference for menus
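The menu example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the lecture.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k occurrences in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_upper_tail(k_min: int, n: int, p: float) -> float:
    """Probability of k_min or more occurrences in n trials."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# Menu example: 8 or more of 10 users choose the menu, assuming no real preference.
p_value = binomial_upper_tail(8, 10, 0.5)
print(f"P(k >= 8 | n=10, p=0.5) = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

The tail sum covers k = 8, 9 and 10 because the question asks for "at least 8", not "exactly 8".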

Experimental Design

• Variables - independent/dependent
• Types of Variables
• Subjects - Within/Between
• Samples - paired/unpaired
• Statistical technique
• Assumptions
• Parametric/Non-parametric techniques
• CLT - why normal distribution is so common

Experimental Design: Variables

• Variables - independent/dependent
  – Independent - What you vary
  – Independent - The ‘Cause’
  – Dependent - What you measure
  – Dependent - The ‘Effect’
• Types of Variables
  – Discrete (Categorical/Qualitative)
    • Nominal
    • Ordinal
  – Continuous (Quantitative)

Experimental Design: Subjects

• Subjects - Within/Between
• Within - all subjects exposed to all conditions
  – Fewer subjects
  – One phase may affect another
  – Randomise the order of conditions
• Between - different subjects for different conditions
  – More subjects
  – Avoids learning effects
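One way to randomise a within-subjects design is to shuffle the order of conditions independently for each subject. This is a sketch only: the subject IDs, the condition names and the `assign_orders` helper are hypothetical, chosen for illustration.

```python
import random


def assign_orders(subjects, conditions, seed=None):
    """Give every subject all conditions, in an independently shuffled order."""
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    orders = {}
    for subject in subjects:
        order = list(conditions)
        rng.shuffle(order)
        orders[subject] = order
    return orders


# Hypothetical example: 12 subjects, two interface conditions.
subjects = [f"S{i}" for i in range(1, 13)]
orders = assign_orders(subjects, ["WIMP", "Haptic"], seed=1)
for subject, order in orders.items():
    print(subject, "->", " then ".join(order))
```

Plain shuffling does not guarantee that each order is used equally often; a counterbalanced allocation would be an alternative when that matters.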

Experimental Design: Samples

• Samples - paired/unpaired
  – Paired, e.g. same user in two trials
  – Unpaired, no link between trials

Experimental Design: Choosing a technique

• Statistical technique chosen according to
  – experimental design
  – assumptions
  – types of variables
• Assumptions
  – non linked variables
  – underlying distribution
• Parametric/Non-parametric techniques
  – Parametric techniques fit a model which may be an underlying distribution
  – Non-parametric techniques do not rely on a specific model or

Statistical tests

| | Interval/Ratio (Normality assumed) | Interval/Ratio (Normality not assumed), Ordinal | Dichotomy (Binomial) |
|---|---|---|---|
| Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's test |
| Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test |
| Compare more than two unmatched groups | ANOVA | Kruskal-Wallis test | Chi-square test |
| Compare more than two matched groups | Repeated-measures ANOVA | Friedman test | Cochran's Q test |

(http://yatani.jp/HCIstats/HomePage)
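The table of statistical tests can be read as a lookup from design and data type to a test. The sketch below transcribes its twelve cells; the key names ("two", "more", "paired", "normal" and so on) and the `choose_test` helper are illustrative assumptions, not part of the original table.

```python
# Cells transcribed from the table of statistical tests; key names are illustrative.
TEST_TABLE = {
    ("two", "unpaired"): ("Unpaired t test", "Mann-Whitney test", "Fisher's test"),
    ("two", "paired"): ("Paired t test", "Wilcoxon test", "McNemar's test"),
    ("more", "unpaired"): ("ANOVA", "Kruskal-Wallis test", "Chi-square test"),
    ("more", "paired"): ("Repeated-measures ANOVA", "Friedman test", "Cochran's Q test"),
}

# Column index for each kind of dependent variable.
DATA_COLUMNS = {"normal": 0, "non_normal_or_ordinal": 1, "dichotomous": 2}


def choose_test(groups: str, pairing: str, data: str) -> str:
    """Look up the test for a number of groups, pairing and data type."""
    return TEST_TABLE[(groups, pairing)][DATA_COLUMNS[data]]


print(choose_test("two", "paired", "normal"))
print(choose_test("more", "unpaired", "non_normal_or_ordinal"))
```

Here "paired" also stands for the "matched" rows and "unpaired" for the "unmatched" rows.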


Significance tests: The Central Limit Theorem

Basic method: Calculate the probability of our observations with respect to the expected distribution under some hypothesis.

To do this in general (e.g. for more than just binary variables) we often make use of the Central Limit Theorem:

If $X_1, X_2, \dots, X_n$ are independent random variables from a distribution with mean $\mu$ and variance $\sigma^2$, then the sum $S_n = X_1 + X_2 + \dots + X_n$ will approach the normal distribution with mean $n\mu$ and variance $n\sigma^2$ as $n \to \infty$

• Note this does NOT depend on the actual distribution of X (it could be uniform, bimodal, skewed…)
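A small simulation can make the theorem concrete. The sketch below sums n uniform random numbers many times and compares the mean and variance of the sums with nµ and nσ². The uniform distribution on [0, 1] (µ = 1/2, σ² = 1/12), n = 30 and the number of trials are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sums(n: int, trials: int, seed: int = 0) -> list:
    """Return `trials` sums, each of n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(n)) for _ in range(trials)]


n = 30
mu, var = 0.5, 1 / 12  # mean and variance of Uniform(0, 1)
sums = simulate_sums(n, trials=20000)

print("mean of sums:    ", statistics.fmean(sums), "expected", n * mu)
print("variance of sums:", statistics.variance(sums), "expected", n * var)

# For a normal distribution about 68% of values lie within one standard deviation.
sd = (n * var) ** 0.5
within_one_sd = sum(abs(s - n * mu) <= sd for s in sums) / len(sums)
print("fraction within one s.d.:", within_one_sd)
```

Swapping `rng.random()` for a skewed or bimodal generator is a simple way to see that the shape of the underlying distribution has little effect on the distribution of the sums.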


The central limit theorem

Consequences of the CLT:

• For many things we measure, it makes sense to assume their value is the sum of a number of underlying random variables
  – E.g. height is affected by many random genetic and environmental factors
  – Time to complete a task is a sum of many micro-processes of perception, cognition and action
• Hence we will often find our observations are normally distributed
• But even if they are not, the distribution of the sample mean, which is $S_n/n$, will approach the normal distribution with mean $\mu$ and variance $\sigma^2/n$ as $n \to \infty$
  – Note: to apply the theorem, our sample should be random and independent
• Hence we can use the normal distribution to say something about the probability of our observed mean.
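As a sketch of that last point: if the sample mean is approximately normal with mean µ and variance σ²/n, the probability of a mean at least as far from µ as the one observed follows from the standard normal distribution. The example assumes that µ and σ are known under the hypothesis, and all numbers in it are hypothetical.

```python
from statistics import NormalDist


def two_sided_p_for_mean(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """Probability of a sample mean at least this far from mu, by normal approximation."""
    standard_error = sigma / n**0.5  # square root of sigma^2 / n
    z = (sample_mean - mu) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers: the hypothesis says mean task time is 30 s with s.d. 8 s;
# 25 users gave an observed mean of 33 s.
p = two_sided_p_for_mean(sample_mean=33.0, mu=30.0, sigma=8.0, n=25)
print(f"p = {p:.3f}")
```

When σ has to be estimated from the sample, as is usual in practice, a t test would take the place of this normal calculation.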


Significant differences

• Typical question is whether A significantly differs from B
  – E.g. do users make more mistakes with an alternate interface?
• Can rephrase as:
  – What is the probability of the observed difference in the samples, if they were actually random samples from the same distribution (i.e. there is no real difference)?
  – If this probability is small, we say they differ significantly
  – E.g. if the 95% confidence intervals for the respective sample means do not overlap, then the probability of them coming from the same distribution is less than 0.05
• There are some simpler ‘non-parametric’ methods that make fewer assumptions about the data:
  – E.g. count the proportion of users who make more mistakes on A than B, and see if this differs significantly from 50%, using the binomial distribution as in example 1.
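The counting method just described is commonly known as a sign test. A minimal sketch follows, assuming each user tried both interfaces, ties are dropped, and a two-sided probability is wanted; the error counts are made up for illustration.

```python
from math import comb


def sign_test_p(errors_a, errors_b) -> float:
    """Two-sided binomial test of whether 'more errors on A' differs from 50%."""
    more_on_a = sum(a > b for a, b in zip(errors_a, errors_b))
    more_on_b = sum(a < b for a, b in zip(errors_a, errors_b))
    n = more_on_a + more_on_b  # users with equal counts carry no information
    k = max(more_on_a, more_on_b)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * upper_tail)


# Hypothetical error counts for ten users on interfaces A and B.
errors_a = [5, 3, 4, 6, 2, 5, 4, 3, 6, 4]
errors_b = [3, 2, 4, 4, 3, 2, 1, 2, 3, 1]
p = sign_test_p(errors_a, errors_b)
print(f"p = {p:.4f}")
```

Only the direction of each user's difference is used, not its size, which is why the method needs no assumption about the distribution of error counts.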


Interpreting ‘significance’

• Whenever you are told that X is larger than Y, you should always ask, could this difference just be due to chance?
  – E.g. 8 out of 10 prefer product X to product Y is not significant if the survey only asked 10 people
  – Should particularly avoid making expensive design changes on the basis of data that might just reflect statistical noise
• However, if the difference is not significant this does not mean it is safe to conclude there is no real difference.
  – The power of the measurement used might be too low
  – If asked under better controlled conditions might have got 9/10
  – If asked 20 people and 16 prefer Y, this is significant at p<.05
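The effect of sample size on the same 80% preference can be checked directly. This sketch compares 8 of 10 with 16 of 20 under the no-preference hypothesis (p = 0.5), using the one-sided "at least k" probability from the earlier menu example; the helper name is illustrative.

```python
from math import comb


def tail_probability(k: int, n: int) -> float:
    """P(at least k of n) when both outcomes are equally likely (p = 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


for k, n in [(8, 10), (16, 20)]:
    p = tail_probability(k, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{k}/{n}: p = {p:.4f} ({verdict} at 0.05)")
```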


False conclusions from multiple testing

• Vul et al. (2009) “Puzzlingly high correlations in fMRI studies of emotion, personality and social cognition”. Perspectives on Psychological Science, 4(3):274-290

http://jonathanstray.com/wp-content/uploads/2008/07/fmri.jpg
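One standard way to see why multiple testing leads to false conclusions is to compute the chance of at least one spurious "significant" result when no real effect exists. The sketch assumes m independent tests, each at a threshold of 0.05; this illustration is not taken from the slide or from Vul et al.

```python
def chance_of_false_positive(m: int, alpha: float = 0.05) -> float:
    """Probability that at least one of m independent tests passes by chance alone."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 20, 100):
    print(f"{m:3d} tests: {chance_of_false_positive(m):.3f}")
```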



Visualisation of Inference (Anova)


Case Study:

3D Modelling is Not For WIMPS: Haptic Interaction and Digital Design Tools

• 4 Year Collaborative Project

• Edinburgh College of Art

• University of Edinburgh

• Aims:
  – develop virtual environment more ‘in touch’ with creative working practices
  – discover degrees of multi-sensory feedback required to work intuitively
  – develop software applications to enhance the creative practice
  – promote the evolution and realisation of innovative concepts

Tacitus Project

Tacitus AHRC Research Project

Outputs

• A new form of Haptic 3D digital design medium
• Working Demonstrators
  – Product design tool
  – Animation tool
  – 3D sketching tool
• International publications
• Patent Application
  – Force guided manipulation
  – Dynamic guide-by-hand animation
  – Spin Out

User Centred Approach

• Extensive user trials
  – Designers and animators consulted
  – Feedback on:
    • Features and functions
    • Usability
• Engagement with mainstream industry players

Engagement with Practice

• Partnering
  – Hardware / Middleware
    • ReachIn / SenseGraphics
    • IRIS 3-D
    • Immersion / Novint / Force Dimension researched
  – CAD
    • Solid Works / Delcam / (UGS)
  – Animation
    • Peppers Ghost / The Mill

Sketching

Sketches by Wendy Donaldson - ECA graduate

Physical Concept Models

Model by Karen Wing Yin Lo - ECA graduate

Exploring Forms

Models by Jenny Deans - ECA graduate

Concepts Uncovered through Engagement

• sketching and modeling
• visual discovery
• supports cognitive process
• facilitates mental restructuring
• clarifies and communicates ideas
• active “conversation”
• ambiguous, abstracted, etc.
• divergent, iterative process

The WIMP is today’s dominant HCI paradigm.

(Not a weak, ineffectual young man)

Examples of current computer Interfaces

Complex interfaces - complicated, overcrowded, non-intuitive, constrained...

Degrees Of Freedom

6 DOF

Roll, Pitch, Yaw

Up/Down, Left/Right, Forward/Back

2 DOF

Up/Down, Left/Right

WIMP Interface

- WINDOWS
- ICONS
- MENUS
- POINTERS

Why 3D Modelling is not for WIMPS

• Because of fundamental limitations of the standard WIMP input device

• The absence of affordances for the application of tacit skill

Two Handed Practice

Reachin Haptic Desktop System

Dislocated Working Practice

A more ‘Natural’ Computer Interface

• Two handedness
• Co-location of haptic and visual cues
• 3 dimensional image
• More natural working environment

A Comparative Study

Prototype 6DOF application running on a haptic device with stereo and co-location, based on Reachin Display and Phantom

VS

Conventional WIMP Interface (Windows, Icons, Menus and Pointer): 3D modeller (3Dmax)

A Better look at the Task

• Place the shapes onto the corresponding randomly aligned bases
• Motivation: similar to an assembly task in the CAD modelling process
• With WIMP/3DMAX (no 6DOF input, no haptic feedback)
• With new 6DOF haptic interface - 6DOF, stereo and haptics

A Better look at the Task

• Within subjects design
• 12 subjects – artists and designers
• Minimum 2 years 3Dmax experience
• Allowed to practice with no time limit
• Task repeated from random start 3 times
• Subjects must place objects on the corresponding targets
• Time and number of mouse/stylus clicks recorded
• Survey performed to determine usability and perceived task load.

Analysis

• Quantitative
  – Task Completion Time
  – Stylus/Mouse Clicks (number and position)
• Qualitative
  – Perceived Task Load - NASA Task Load Index
  – System Usability - SUS rating
• Parametric
  – ANOVA, T-Tests
• Non-Parametric
  – Spearman, Mann-Whitney
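For a within-subjects comparison of task completion time, one of the parametric options listed is a paired t test. The sketch below computes the t statistic by hand and compares it with the two-sided 0.05 critical value for 11 degrees of freedom (2.201). The times are invented for illustration and are NOT results from the study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical completion times in seconds for 12 subjects (not the study's data).
wimp_times = [120, 135, 150, 110, 160, 145, 130, 125, 155, 140, 115, 150]
haptic_times = [80, 95, 90, 85, 100, 95, 90, 70, 105, 100, 85, 95]

t, df = paired_t_statistic(wimp_times, haptic_times)
T_CRITICAL_DF11 = 2.201  # two-sided 0.05 critical value of t for df = 11
print(f"t({df}) = {t:.2f}")
print("significant at 0.05" if abs(t) > T_CRITICAL_DF11 else "not significant at 0.05")
```

The test is paired because each subject contributes a time under both interfaces, so the analysis works on the per-subject differences.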

Results

[Figure: Total mouse clicks in 3DMax. 3DS Max mouse click locations by category: Other, View Rotation, View Translation, Object Rotation, Object Translation. Regions labelled Task Space and Control Space.]

Mouse Click Analysis


Interaction “Fossil” - WIMP

Interaction “Fossil” - Haptics

General Findings

• Striking results show great significance of spatial input devices
• Haptics increases usability and reduces workload
• Potential to greatly improve 3D modelling efficiency, task load and usability over WIMP interfaces
• Confirms detrimental effect of control space/task space mismatch of WIMP interface
  – Adds complexity and inefficiency of use which is a function of the interface NOT the task!
• At least some aspects of 3D modelling may not be for WIMPs!

Questions

• Is the abstracted task representative of a real system?
• When the system is built up to be fully featured will much of the simplicity and advantage disappear?
• This reductive objective analysis has confirmed an effect exists, which is the great power of the Classical Cognitive Paradigm of HCI
• We should partner this analysis with the user centred design approach of the Embodied Interactive paradigm

Sketching: A simple but complete system to explore practice

3D Digital Sketching Trial of complete system

Artist User - Claude Heath – physical media

Artist User - Haptic Sketch - Claude Heath

Artist User - Haptic 3D IRIS - Claude Heath

Tacitus SpinOut - research partnership

• £180K Scottish Enterprise Proof of Concept Project
• SMART Award
• Continued symbiotic relationship with research
• New applications to Rapid Prototyping, Visualisation and Animation
• Video

Rapid Prototyping added to system

New Research Directions - Hybrid Animation Stop/Key Frame


References

• Dix et al., chapter 9
• http://yatani.jp/HCIstats/HomePage
• http://my.ilstu.edu/~mshesso//apa_stats.htm
• Cairns, P., “HCI... not as it should be: inferential statistics in HCI research”, Proceedings of the 21st British HCI Group Annual Conference on People and Computers, 195-201, 2007.