Evaluating Design III: Experiment and Inference
HCI Lecture 10
Dr Mark Wright
Key points:
• The Scientific Method
• Setting up an Experiment
• The Process of Inference
• Normal distributions and the Central Limit Theorem
• Interpreting significance
• Case Study - Research Project
Finding things out by Experiment
• Objective
• Controlled
• Repeatable
• Applies to the objective and measurable
• May not represent real world
• May not generalise
The Scientific Method: A way to find things out
• Choose something to study
• Form a Theory of how it works
• Form an Hypothesis consistent with the Theory
• Devise an Experiment to test the Hypothesis
• Perform the Experiment
• Analyse the Data
• Make an Inference
• Confirm/Refute the Hypothesis
• Confirm, Reject, Modify the Theory
• Karl Popper - Conjectures and Refutations
The Scientific Method
Moon Feather Hammer Experiment
Issues concerning experiments
• Hypothesis
• Experimental design
• Variables - independent/dependent
• Types of Variables
• Subjects - Within/Between
• Samples - paired/unpaired
• Statistical technique
• Assumptions
• Parametric/Non-parametric techniques
• CLT - why normal distribution is so common
Issues concerning experiments: Hypothesis
Hypothesis
• A formal statement about how the world is
• Objective
• Testable
• Refutable
Null Hypothesis
• For statistical tests we often adopt a ‘Null’ Hypothesis
• Example: Typing on a touch screen is just as fast as on a keyboard
• We then test to see how likely our data is given the Hypothesis
Quantitative analysis: inference
• Example: From data-logging, you found that 8 out of 10 people used the menu rather than the shortcut button you had provided for the task. Is this evidence of a significant preference for menus in the user population?
• Rephrased: if there was no real preference (i.e. usage was random), what is the probability of observing at least 8 out of 10 people using the menu?
• We can calculate this probability exactly, and if it is very low, we could argue that it is unlikely (given the 8 out of 10 actually observed) that there was no real preference
Quantitative analysis: inference
• A Statistical Method
• Can describe binary outcomes using the binomial distribution:
  – for n trials, each with probability p for outcome X, the probability of getting k occurrences of X is
    $$P(k) = \binom{n}{k} p^k (1-p)^{n-k}$$
• In this case there were ten choices (n=10)
• We want to know how likely it was that eight or more choices were for one option (k=8, 9 or 10) if this was just random behaviour and there was no real preference (p=0.5):
    $$P(k \ge 8) = \sum_{k=8}^{10} \binom{10}{k} (0.5)^{10} = \frac{45 + 10 + 1}{1024} \approx 0.055$$
• A Statistical Inference based on convention
• As this probability > 0.05 (more than 1 in 20), this is (by convention) not considered significantly unlikely
• A Conclusion - The Null Hypothesis is not rejected (in this case)
• Conclude we do not have good evidence of preference for menus
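The menu example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `binomial_tail` is an illustrative choice, not something from the lecture.

```python
from math import comb

def binomial_tail(n: int, k_min: int, p: float = 0.5) -> float:
    """Probability of k_min or more occurrences in n independent trials,
    each with probability p (upper tail of the binomial distribution)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Menu example: 8 or more of 10 users choose the menu,
# assuming there is no real preference (p = 0.5).
p_value = binomial_tail(10, 8)
print(f"P(at least 8 of 10) = {p_value:.4f}")  # 0.0547
print("significant at 0.05?", p_value < 0.05)  # False
```

The result is just above the conventional 0.05 threshold, which is why the example does not count as evidence of a preference.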
Experimental Design
• Variables - independent/dependent
• Types of Variables
• Subjects - Within/Between
• Samples - paired/unpaired
• Statistical technique
• Assumptions
• Parametric/Non-parametric techniques
• CLT - why normal distribution is so common
Experimental Design: Variables
• Variables - independent/dependent
  – Independent - What you vary
  – Independent - The ‘Cause’
  – Dependent - What you measure
  – Dependent - The ‘Effect’
• Types of Variables
  – Discrete (Categorical/Qualitative)
    • Nominal
    • Ordinal
  – Continuous (Quantitative)
Experimental Design: Subjects
• Subjects - Within/Between
• Within - all subjects exposed to all conditions
  – Fewer subjects
  – One phase may affect another
  – Randomise the order of conditions
• Between - different subjects for different conditions
  – More subjects
  – Avoids learning effects
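One way to randomise a within-subjects design is to give every participant an independently shuffled order of the conditions, so that learning effects are spread across conditions. The sketch below assumes this simple scheme; the participant labels, condition names and the `assign_orders` helper are hypothetical.

```python
import random

def assign_orders(participants, conditions, seed=None):
    """Give each participant an independently shuffled order of all conditions."""
    rng = random.Random(seed)  # seeded generator so the plan can be reproduced
    orders = {}
    for person in participants:
        order = list(conditions)
        rng.shuffle(order)
        orders[person] = order
    return orders

# Hypothetical study: four participants, two interface conditions.
plan = assign_orders(["P1", "P2", "P3", "P4"], ["WIMP", "Haptic"], seed=1)
for person, order in plan.items():
    print(person, order)
```

With few participants, pure shuffling can leave the orders unbalanced; alternating the orders systematically (counterbalancing) is a common alternative.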
Experimental Design: Samples
• Samples - paired/unpaired
  – Paired, e.g. same user in two trials
  – Unpaired, no link between trials
Experimental Design: Choosing a technique
• Statistical technique chosen based on
  – experimental design
  – assumptions
  – types of variables
• Assumptions
  – non linked variables
  – underlying distribution
• Parametric/Non-parametric techniques
  – Parametric techniques fit a model which may be an underlying distribution
  – Non-Parametric Techniques do not rely on a specific model or
Statistical tests

Comparison                               | Interval/Ratio (Normality assumed) | Interval/Ratio (Normality not assumed), Ordinal | Dichotomy (Binomial)
Compare two unpaired groups              | Unpaired t test                    | Mann-Whitney test                               | Fisher's test
Compare two paired groups                | Paired t test                      | Wilcoxon test                                   | McNemar's test
Compare more than two unmatched groups   | ANOVA                              | Kruskal-Wallis test                             | Chi-square test
Compare more than two matched groups     | Repeated-measures ANOVA            | Friedman test                                   | Cochran's Q test

(http://yatani.jp/HCIstats/HomePage)
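The table of tests can be read as a lookup keyed by the experimental design and the type of data. The sketch below encodes it that way; the key labels ("two"/"more", "normal"/"ordinal"/"binary") and the `suggest_test` function are illustrative names, not terminology from the lecture.

```python
# Test-selection table from the slide, keyed by (number of groups, matched?).
# Inner keys name the data type: "normal" = interval/ratio with normality assumed,
# "ordinal" = ordinal or interval/ratio without normality, "binary" = dichotomy.
TEST_TABLE = {
    ("two", False): {"normal": "Unpaired t test",
                     "ordinal": "Mann-Whitney test",
                     "binary": "Fisher's test"},
    ("two", True): {"normal": "Paired t test",
                    "ordinal": "Wilcoxon test",
                    "binary": "McNemar's test"},
    ("more", False): {"normal": "ANOVA",
                      "ordinal": "Kruskal-Wallis test",
                      "binary": "Chi-square test"},
    ("more", True): {"normal": "Repeated-measures ANOVA",
                     "ordinal": "Friedman test",
                     "binary": "Cochran's Q test"},
}

def suggest_test(groups: str, matched: bool, data: str) -> str:
    """Look up the test the table lists for a given design and data type."""
    return TEST_TABLE[(groups, matched)][data]

# Two conditions, same users in both, roughly normal timing data.
print(suggest_test("two", True, "normal"))  # Paired t test
```

A lookup like this only covers the choice of test; checking that the assumptions behind the chosen test actually hold remains a separate step.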
Significance tests: The Central Limit Theorem
Basic method: Calculate the probability of our observations with respect to the expected distribution under some hypothesis.
To do this in general (e.g. for more than just binary variables) we often make use of the Central Limit Theorem:
If $X_1, X_2, \ldots, X_n$ are independent random variables from a distribution with mean $\mu$ and variance $\sigma^2$, then the sum $S_n = X_1 + X_2 + \cdots + X_n$ will approach the normal distribution with mean $n\mu$ and variance $n\sigma^2$ as $n \to \infty$
• Note this does NOT depend on the actual distribution of X (it could be uniform, bimodal, skewed…)
The central limit theorem
Consequences of the CLT:
• For many things we measure, it makes sense to assume their value is the sum of a number of underlying random variables
  – E.g. height is affected by many random genetic and environmental factors
  – Time to complete a task is a sum of many micro-processes of perception, cognition and action
• Hence we will often find our observations are normally distributed
• But even if they are not, the distribution of the sample mean, which is $S_n/n$, will approach the normal distribution with mean $\mu$ and variance $\sigma^2/n$ as $n \to \infty$
  – Note: to apply the theorem, our sample should be random and independent
• Hence we can use the normal distribution to say something about the probability of our observed mean.
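The claim about the sample mean can be explored with a small simulation. The sketch below assumes a strongly skewed source distribution (exponential with mean 1 and variance 1), chosen only for illustration, and checks that the means of samples of size n have mean close to µ and variance close to σ²/n.

```python
import random
import statistics

def sample_means(n, repetitions, rng):
    """Draw `repetitions` samples of size n from an exponential distribution
    (mean 1, variance 1) and return the mean of each sample."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(repetitions)]

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 30                    # size of each sample
means = sample_means(n, 10_000, rng)

print("mean of sample means:    ", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.variance(means), 4))
print("CLT prediction:           mean 1.0, variance", round(1 / n, 4))
```

Plotting a histogram of `means` would show the bell shape directly, even though the individual observations are far from normally distributed.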
Significant differences
• Typical question is whether A significantly differs from B
  – E.g. do users make more mistakes with an alternate interface?
• Can rephrase as:
  – What is the probability of the observed difference in the samples, if they were actually random samples from the same distribution (i.e. there is no real difference)?
  – If this probability is small, we say they differ significantly
  – E.g. if the 95% confidence intervals for the respective sample means do not overlap, then the probability of them coming from the same distribution is less than 0.05
• There are some simpler ‘non-parametric’ methods that make fewer assumptions about the data:
  – E.g. count the proportion of users who make more mistakes on A than B, and see if this differs significantly from 50%, using the binomial distribution as in example 1.
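The counting approach just described is often called a sign test. Below is a minimal sketch of it; the error counts are invented purely for illustration, the `sign_test` helper is a hypothetical name, and ties are dropped, which is one common convention.

```python
from math import comb

def sign_test(errors_a, errors_b):
    """One-sided sign test: is 'more mistakes on A than on B' more common than chance?
    Returns (users worse on A, users with any difference, p-value)."""
    worse_on_a = sum(a > b for a, b in zip(errors_a, errors_b))
    worse_on_b = sum(a < b for a, b in zip(errors_a, errors_b))
    n = worse_on_a + worse_on_b  # tied users carry no information and are dropped
    # Probability of worse_on_a or more such users out of n if each is a 50/50 coin flip.
    p_value = sum(comb(n, k) for k in range(worse_on_a, n + 1)) / 2**n
    return worse_on_a, n, p_value

# Hypothetical mistake counts for the same ten users on interfaces A and B.
errors_a = [5, 3, 6, 4, 7, 2, 5, 6, 4, 3]
errors_b = [3, 3, 4, 5, 2, 1, 4, 3, 2, 2]

k, n, p = sign_test(errors_a, errors_b)
print(f"{k} of {n} users made more mistakes on A, p = {p:.4f}")
```

Only the direction of each difference is used, not its size, which is why the test needs no assumption about the distribution of the error counts.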
Interpreting ‘significance’
• Whenever you are told that X is larger than Y, you should always ask, could this difference just be due to chance?
  – E.g. 8 out of 10 prefer product X to product Y is not significant if the survey only asked 10 people
  – Should particularly avoid making expensive design changes on the basis of data that might just reflect statistical noise
• However, if the difference is not significant this does not mean it is safe to conclude there is no real difference.
  – The power of the measurement used might be too low
  – If asked under better controlled conditions might have got 9/10
  – If asked 20 people and 16 prefer Y, this is significant at p<.05
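The point about low power can be made concrete with the binomial preference test. The sketch below assumes a one-sided test at the 0.05 level and a hypothetical true preference of 70%; it computes how often a survey of a given size would reach significance. All function names are illustrative.

```python
from math import comb

def upper_tail(n, k_min, p):
    """P(at least k_min of n people prefer X) when each prefers X with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

def critical_count(n, alpha=0.05):
    """Smallest count that is significant under 'no preference' (p = 0.5).
    Returns n + 1 if no count is extreme enough."""
    for k in range(n + 1):
        if upper_tail(n, k, 0.5) <= alpha:
            return k
    return n + 1

def power(n, true_p, alpha=0.05):
    """Chance of reaching significance when the true preference rate is true_p."""
    return upper_tail(n, critical_count(n, alpha), true_p)

# Assumed true preference of 70%; survey sizes chosen for illustration.
for n in (10, 20, 50):
    print(f"n={n}: need {critical_count(n)} or more, power = {power(n, 0.7):.2f}")
```

With 10 people, at least 9 must agree before the result is significant, so under this assumption a real 70% preference would be detected only about 15% of the time; larger samples raise that chance considerably.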
False conclusions from multiple testing
• Vul et al. (2009) “Puzzlingly high correlations in fMRI studies of emotion, personality and social cognition”. Perspectives on Psychological Science, 4(3):274-290
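A simple way to see why multiple testing produces false conclusions is to compute the chance of at least one false positive across several tests. The sketch below assumes the tests are independent and each uses the 0.05 level; it also shows the Bonferroni correction as one standard remedy. It illustrates the general problem and is not a reconstruction of the analysis in the cited paper.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests,
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_threshold(num_tests, alpha=0.05):
    """Per-test significance level that keeps the overall rate at or below alpha."""
    return alpha / num_tests

for m in (1, 5, 20):
    print(f"{m:2d} tests: P(at least one false positive) = "
          f"{familywise_error_rate(m):.3f}, "
          f"Bonferroni threshold = {bonferroni_threshold(m):.4f}")
```

Under these assumptions, running 20 tests gives roughly a 64% chance that at least one comes out "significant" by chance alone.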
Visualisation of Inference (Anova)
Case Study:
3D Modelling is Not For WIMPS: Haptic Interaction and Digital Design Tools
• 4 Year Collaborative Project
• Edinburgh College of Art
• University of Edinburgh
• Aims:
  – develop virtual environment more ‘in touch’ with creative working practices
  – discover degrees of multi-sensory feedback required to work intuitively
  – develop software applications to enhance the creative practice
  – promote the evolution and realisation of innovative concepts
Tacitus Project
Tacitus AHRC Research Project
Outputs
• A new form of Haptic 3D digital design medium
• Working Demonstrators
  – Product design tool
  – Animation tool
  – 3D sketching tool
• International publications
• Patent Application
  – Force guided manipulation
  – Dynamic guide-by-hand animation
  – Spin Out
User Centred Approach
• Extensive user trials
  – Designers and animators consulted
  – Feedback on:
    • Features and functions
    • Usability
• Engagement with Mainstream industry players
Engagement with Practice
• Partnering
  – Hardware / Middleware
    • ReachIn / SenseGraphics
    • IRIS 3-D
    • Immersion / Novint / Force Dimension researched
  – CAD
    • Solid Works / Delcam / (UGS)
  – Animation
    • Peppers Ghost / The Mill
Sketching
Sketches by Wendy Donaldson - ECA graduate
Physical Concept Models
Model by Karen Wing Yin Lo - ECA graduate
Exploring Forms
Models by Jenny Deans - ECA graduate
Concepts Uncovered through Engagement
• sketching and modeling
• visual discovery
• supports cognitive process
• facilitates mental restructuring
• clarifies and communicates ideas
• active “conversation”
• ambiguous, abstracted, etc.
• divergent, iterative process
The WIMP is today’s dominant HCI paradigm.
(Not a weak, ineffectual young man)
Examples of current computer Interfaces
Complex interfaces - complicated, overcrowded, non-intuitive, constrained...
Degrees Of Freedom
6 DOF
Roll, Pitch, Yaw
Up/Down, Left/Right, Forward/Back
2 DOF
Up/Down, Left/Right
WIMP Interface
- WINDOWS
- ICONS
- MENUS
- POINTERS
Why 3D Modelling is not for WIMPS
• Because of fundamental limitations of the standard WIMP input device
• The absence of affordances for the application of tacit skill
Two Handed Practice
Reachin Haptic Desktop System
Dislocated Working Practice
A more ‘Natural’ Computer Interface
• Two handedness
• Co-location of haptic and visual cues
• 3 dimensional image
• More natural working environment
A Comparative Study
Prototype 6DOF application running on a haptic device with stereo and co-location based on Reachin Display and Phantom
VS
Conventional WIMP Interface (Windows, Icons, Menus and Pointer)
3D modeller (3Dmax)
A Better look at the Task
• Place the shapes onto the corresponding randomly aligned bases
• Motivation: Similar to assembly task in CAD Modelling Process
• With WIMP/3DMAX (No 6DOF input, No Haptic Feedback)
• With New 6DOF Haptic Interface - 6DOF, Stereo and Haptics
A Better look at the Task
• Within subjects design
• 12 subjects – artists and designers
• Minimum 2 years 3Dmax experience
• Allowed to practice with no time limit
• Task repeated from random start 3 times
• Subjects must place objects on the corresponding targets
• Time and number of mouse/stylus clicks recorded
• Survey performed to determine usability and perceived task load.
Analysis
• Quantitative
  – Task Completion Time
  – Stylus/Mouse Clicks (number and position)
• Qualitative
  – Perceived Task Load - NASA Task Load Index
  – System Usability - SUS rating
• Parametric
  – ANOVA, T-Tests
• Non-Parametric
  – Spearman, Mann-Whitney
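For the usability measure, the standard SUS scoring rule turns ten questionnaire responses (each 1 to 5) into a score from 0 to 100: odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the total is multiplied by 2.5. The sketch below applies that rule; the responses shown are invented for illustration and are not data from the study.

```python
def sus_score(responses):
    """Convert ten SUS item responses (each 1-5) into a 0-100 usability score."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5

# Hypothetical questionnaire from one participant.
example = [4, 2, 5, 1, 4, 2, 5, 1, 4, 2]
print(sus_score(example))  # 85.0
```

Scores computed per participant in this way can then be compared between the two interfaces with one of the paired tests listed earlier.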
Results
[Figure: Total mouse clicks in 3DMax]
[Figure: 3DS Max Mouse Click Locations. Categories: Other, View Rotation, View Translation, Object Rotation, Object Translation; grouped into Task Space and Control Space]
Mouse Click Analysis
Interaction “Fossil” - WIMP
Interaction “Fossil” - Haptics
General Findings
• Striking results show great significance of spatial input devices
• Haptics increases usability and reduces workload
• Potential to greatly improve 3D modelling efficiency, task load and usability over WIMP interfaces
• Confirms detrimental effect of control space/task space mismatch of WIMP interface
• Adds complexity and inefficiency of use which is a function of the interface NOT the task!
• At least some aspects of 3D modelling may not be for WIMPs!
Questions
• Is the abstracted task representative of a real system?
• When the system is built up to be fully featured, will much of the simplicity and advantage disappear?
• This reductive objective analysis has confirmed an effect exists, which is the great power of the Classical Cognitive Paradigm of HCI
• We should partner this analysis with the user centred design approach of the Embodied Interactive paradigm
Sketching: A simple but complete system to explore practice
3D Digital Sketching Trial of complete system
Artist User - Claude Heath – physical media
Artist User - Haptic Sketch - Claude Heath
Artist User - Haptic 3D IRIS - Claude Heath
Tacitus SpinOut - research partnership
• £180K Scottish Enterprise Proof of Concept Project
• SMART Award
• Continued symbiotic relationship with research
• New applications to Rapid Prototyping, Visualisation and Animation
• Video
Rapid Prototyping added to system
New Research Directions - Hybrid Animation Stop/Key Frame
References
• Dix et al., chapter 9
• http://yatani.jp/HCIstats/HomePage
• http://my.ilstu.edu/~mshesso//apa_stats.htm
• Cairns, P., “HCI... not as it should be: inferential statistics in HCI research”, Proceedings of the 21st British HCI Group Annual Conference on People and Computers, 195-201, 2007.