An Introduction to Web-based Experimentation Stian Reimers

Overview

• Why bother with web testing?
• Different ways of implementing web tests
• Design issues: How to get top-quality data
• Ethics and good scientific practice
• The future of experimenterless experiments

Why bother with web testing?

• Cheap - don’t need to pay participants
• Time saving - once set up, experiment can be left to run
• Thousands of participants
• Wider range of people than traditional undergraduate subject pool
• Possible to target low-frequency subgroups
• Reduces experimenter bias
• Encourages dialogue between academic and public

• Please go to the following URL: http://snipurl.com/128xy

Case Study: Task Switching at the BBC

• Set up in 2003, to tie in with TV series
• Coded in Flash, mainly using Actionscript
• Visitors to BBC S&N website click through
• Data passed to our server at end of experiment

Age effects on RT

Age effects on specific and general switch costs

Summary: BBC Task Switching Experiment

• Around 50,000 participants so far
• Possible to examine task switching in ages 10-66
• Reported in Reimers and Maylor (2005; 2006)
• Experiment has changed several times
• Allows us to test new theories with minimal effort
• Gets participants for further experiments

Four examples of web-based implementation

HTML with forms (and javascript)

Java

Adobe Flash

Adobe Authorware

HTML

Perl script: save data, generate dynamic response page

Email experimenter
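The server-side step in this flow (receive the form data, save it, return a dynamic response page) can be sketched roughly as below. The talk's example used a Perl script; Python is used here purely for illustration. The field names (`nickname`, `q1`, `q2`), the CSV layout, and the wording of the response page are all assumptions, not details of the original experiment, and the email step is left out.

```python
import csv
import html
import os
import tempfile
from datetime import datetime, timezone
from urllib.parse import parse_qs


def save_submission(form_body: str, data_path: str) -> dict:
    """Parse a URL-encoded form body and append it as one row of a CSV file.

    Assumes every submission carries the same set of fields, so the
    columns written by the first submission fit all later ones.
    """
    fields = {key: values[0] for key, values in parse_qs(form_body).items()}
    fields["received_at"] = datetime.now(timezone.utc).isoformat()
    is_new_file = not os.path.exists(data_path)
    with open(data_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=sorted(fields))
        if is_new_file:
            writer.writeheader()
        writer.writerow(fields)
    return fields


def response_page(fields: dict) -> str:
    """Build a simple response page; user input is escaped before display."""
    name = html.escape(fields.get("nickname", "participant"))
    return (
        "<html><body>"
        f"<p>Thank you, {name}. Your answers have been saved.</p>"
        "</body></html>"
    )


if __name__ == "__main__":
    # Hypothetical submission written to a temporary file.
    path = os.path.join(tempfile.mkdtemp(), "responses.csv")
    saved = save_submission("nickname=Sam&q1=4&q2=2", path)
    print(response_page(saved))
```

In a real deployment these functions would sit behind whatever web server or framework handles the form's POST request.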

• Please return to snipurl.com/128xy and follow the link to example 2


HTML/Javascript: Advantages

• Quick and easy to set up
• Doesn’t require long initial downloads
• No plug-ins required, fewer browser compatibility issues
• Can display embedded visual and audio stimuli
• Familiar interface

Ideal for surveys, personality research, decision making.

HTML/Javascript: Disadvantages

• Poor at <1 sec RT measurement
• Imperfect control over item display
• Hard to control stimulus durations
• Load time between pages depends on internet traffic
• Have to be online for each new page

Not good for most psychophysical experiments, experiments where effects appear in RT rather than accuracy data, etc.


snipurl.com/128xy



Java: Advantages

• Can be used for fairly accurate RT measurement
• Sandboxed, so can’t damage a user’s computer
• Ostensibly platform independent, so the same code could be used for web-, lab-, and, say, mobile-based execution
• Costs nothing
• Client-side implementation

Ideal for experiments measuring RT.

Java: Disadvantages

• Relatively difficult language to master, particularly for running on multiple platforms

• Need skill to make programs intuitive and ergonomic
• Slow start-up time
• Issues of different versions (Sun vs. Microsoft)
• Not always installed/enabled

Not good for novice programmers, one-off experiments

• You can return to snipurl.com/128xy to retry the task switching experiment


Adobe Flash: Advantages

• Similar advantages to Java
– Client side processing, sandboxed, platform independent

• Designed for web implementation, so easier than Java to make good-looking experiments

• Can combine code written in Actionscript with animation-style features

• More ubiquitous plug-in

Ideal for ‘fun’ or multi-stage experiments.

Adobe Flash: RT Measurement

Reimers, S., & Stewart, N. (in press). Adobe Flash as a medium for online experimentation: A test of RT measurement capabilities. Behavior Research Methods.

Adobe Flash: Disadvantages

• Requires plug-in (but 97.3% of computers have it installed already)
• Commercial software, so costs money
• Easily decompilable
• Awkward stimulus timing
• May be blocked by advert-filtering software
• Possible differences in performance across platforms

Not good for very low spec machines, tachistoscopic presentation, sequences of rigidly timed stimuli.


snipurl.com/128xy



Adobe Authorware: Advantages

• Similar advantages to Flash
• And very user friendly
• Quite similar to testing applications like Superlab

Ideal for experimenters who aren’t very confident programmers but want to run web experiments.

Adobe Authorware: Disadvantages

• Not cheap to buy
• Requires plug-in which most people won’t have
• Quite a niche product - harder to find casual programmers to code up experiments
• Relatively untested with respect to measurement accuracy, display consistency etc

Not good for uncommitted participants or cash-strapped researchers.

Designing Web-based Experiments

Key differences

Multiple submission

Drop-out

Dishonesty

Mental state

Recruitment

Key Differences Between Web- and Lab-based Testing

• Less social pressure
– May reduce demand issues
– But also increases drop-out rate, lying
• Unverifiable demographics
• Less control over experimental setting
– Loud music, monitor size, drunkenness

• Less control over multiple submissions

Multiple Submissions

• Historically not that big a problem (Krantz & Dalal, 2000; Musch & Reips, 2000)
– But likely to be more so if participants are paid
• Ask people if they’ve taken part before
• Get unique identifier (email address, NI number)
• Set a cookie
• Log their IP address
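The cookie and IP checks above can be combined at analysis time. The sketch below flags any submission that repeats an earlier cookie ID or IP address. The record layout (`cookie_id`, `ip`) is an assumption for illustration. Because households, universities, and workplaces often share one IP address, an IP match is better treated as a prompt for closer inspection than as proof of a repeat.

```python
def flag_repeat_submissions(submissions):
    """Return indices of submissions that reuse an earlier cookie ID or IP.

    Each submission is a dict with optional 'cookie_id' and 'ip' keys
    (a hypothetical layout). Submissions are assumed to be in arrival order.
    """
    seen_cookies = set()
    seen_ips = set()
    repeats = []
    for index, record in enumerate(submissions):
        cookie_id = record.get("cookie_id")
        ip = record.get("ip")
        cookie_seen = cookie_id is not None and cookie_id in seen_cookies
        ip_seen = ip is not None and ip in seen_ips
        if cookie_seen or ip_seen:
            repeats.append(index)
        if cookie_id is not None:
            seen_cookies.add(cookie_id)
        if ip is not None:
            seen_ips.add(ip)
    return repeats


if __name__ == "__main__":
    submissions = [
        {"cookie_id": "a1", "ip": "192.0.2.10"},
        {"cookie_id": "b2", "ip": "192.0.2.11"},
        {"cookie_id": "a1", "ip": "192.0.2.12"},  # same cookie, new IP
        {"cookie_id": None, "ip": "192.0.2.11"},  # no cookie, repeated IP
    ]
    print(flag_repeat_submissions(submissions))  # [2, 3]
```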

Dropout in most studies is a minor problem

• Sample is not representative
– But still better than undergraduates

• Ideally, should log the number of participants who start the experiment relative to number who finish it.

• Gives useful info on how much people are enjoying your study
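Logging starts and finishes makes the dropout rate a one-line calculation. The sketch below assumes the server keeps a count of participants who started and who finished each condition. The condition names and counts are invented for illustration.

```python
def dropout_rate(started: int, finished: int) -> float:
    """Proportion of participants who started but did not finish."""
    if started <= 0:
        raise ValueError("started must be a positive count")
    if not 0 <= finished <= started:
        raise ValueError("finished must be between 0 and started")
    return (started - finished) / started


if __name__ == "__main__":
    # Hypothetical (started, finished) counts per condition.
    counts = {"easy": (200, 170), "hard": (200, 110)}
    for condition, (started, finished) in counts.items():
        print(f"{condition}: {dropout_rate(started, finished):.0%} dropped out")
```

Comparing the rate across conditions is the useful part: a large gap suggests that the people finishing each condition may differ.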

Dropout in experiments can lead to sampling biases and misleading results

Easy Condition: Lazy People, Committed People

Hard Condition: Lazy People, Committed People
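A small worked example shows how selective dropout alone can produce a spurious condition effect. All numbers are invented: suppose lazy people score 50 and committed people score 70 in both conditions, so the condition has no true effect, but only 20 of 100 lazy people finish the hard condition.

```python
def completer_mean(groups):
    """Mean score among completers.

    groups is a list of (number_who_finished, mean_score) pairs,
    one pair per type of participant.
    """
    total_finished = sum(n for n, _ in groups)
    total_score = sum(n * score for n, score in groups)
    return total_score / total_finished


if __name__ == "__main__":
    # (finished, mean score): lazy people first, committed people second.
    easy = [(100, 50), (100, 70)]  # everyone finishes
    hard = [(20, 50), (100, 70)]   # most lazy people drop out
    print(round(completer_mean(easy), 1))  # 60.0
    print(round(completer_mean(hard), 1))  # 66.7
```

The hard condition appears to raise scores by nearly seven points, yet the whole difference comes from who stayed to the end.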

So, to prevent this sort of problem, use the ‘high-hurdle’ approach

Dull, irrelevant task

Easy Condition

Hard Condition

And generally, try to prevent drop-out by making things fun, easy, and interesting

• Make it fun to do and nice to look at
• Implement as a game where possible
• Sunk cost effect: Put the dull stuff at the end
• Ask people to complete the entire test
• Feedback
– Tell people about themselves
– Comparisons with rest of population

• Describe the experiment’s aims and the science behind it

Dishonesty, carelessness, misunderstanding

• Not as big a problem as you might imagine
– 3.5-6.3% junk / 1% split-half inconsistencies (Johnson, 2005)
– 1-5% inconsistency in sex differences study (Reimers, in press)
– Cf. 0.7% of pencil and paper (Gough & Bradley, 1996)
• Make submission of demographic data voluntary
– Or give option of ‘I’d rather not say’
• Ask the same questions at start and end
– Check for consistency, but may look sneaky
• Put in equivalent of a ‘lie scale’
• Obviously, remove people who aren’t responding honestly
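The start-and-end consistency check can be sketched as a comparison of two sets of answers. The question names and the normalisation rule (ignore case and surrounding spaces) are assumptions for illustration. A real study would need to decide what counts as a mismatch for each question.

```python
def _normalise(value) -> str:
    """Compare answers without regard to case or surrounding spaces."""
    return str(value).strip().casefold()


def inconsistent_items(start_answers: dict, end_answers: dict) -> list:
    """Return the questions asked at both points that got different answers."""
    shared = start_answers.keys() & end_answers.keys()
    return sorted(
        question
        for question in shared
        if _normalise(start_answers[question]) != _normalise(end_answers[question])
    )


if __name__ == "__main__":
    start = {"sex": "F", "age": "34"}
    end = {"sex": " f", "age": "43", "handedness": "right"}
    print(inconsistent_items(start, end))  # ['age']
```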

Mental state of participants

• Can’t screen for people in abnormal mental states
• Relatively small proportion of experimental population
• Remove egregious datasets at analysis stage
• Ask people directly (and sensitively)
• Include screening questions to show general competence

Getting participants to do your (ergonomic, well-designed) experiment

• Get links (e.g. from department or study index site)
• Advertise (banners etc)
– Costs money. Unproven effectiveness, but great potential.
• Set up email list of willing participants
• Pay participants
– Costs money. Multiple submissions, careless participation. Hassle to implement.
• Use a reward scheme like ipoints
– Effective, can pay little, no multiple submissions, select appropriate demographic, easy to run

Ethics of Web-based Experiments

Key differences

Informed consent

Sensitive material / personal questions

Unflattering feedback

Deception

Debrief

Key Differences between lab- and web-based research

• You are not present
– Can’t offer feedback and reassurance
– Can’t check a participant is in a suitable mood
– Can’t tell how old a participant is
– Can’t answer any questions or concerns
• Broader demographic
– More lonely or socially isolated participants
– More participants with mental illnesses

Informed consent

Informed consent: Pros and cons

• Follows ethical guidelines
• Explains things that may otherwise have caused concern to the participant
– Dropping out is okay, data are anonymous
• Makes the experiment look more authoritative and serious
• But may scare off people who’d otherwise have enjoyed the experiment
• Seems to be more of a back-covering exercise than an attempt to ensure the participant is protected

Do I need informed consent?

Kraut R., Olson J., Banaji M., Bruckman A., Cohen J., & Couper M. (2004) Psychological research online: Report of board of scientific affairs' advisory group on the conduct of research on the Internet, American Psychologist 59, 105-117.

Sensitive material / Personal questions

• You may offend people or evoke unpleasant thoughts or memories.

• Warn people at the start of the experiment
• Remind people that responding is optional
• Say ‘Adults only’ or better still get people to enter their age, and skip sensitive questions if under 18
• Be sensitive in wording of questions and implications of particular ways of framing information
• Offer contact details for further information
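Skipping sensitive questions for under-18s amounts to a simple filter on the question list. The sketch below assumes each question is stored with a `sensitive` flag. The question texts and the data layout are invented for illustration, and the entered age is of course unverifiable.

```python
def questions_to_show(age: int, questions: list[dict], adult_age: int = 18) -> list[str]:
    """Return question texts, leaving out sensitive items for under-18s."""
    is_adult = age >= adult_age
    return [q["text"] for q in questions if is_adult or not q["sensitive"]]


if __name__ == "__main__":
    # Hypothetical questionnaire.
    questions = [
        {"text": "Which hand do you write with?", "sensitive": False},
        {"text": "How many units of alcohol do you drink per week?", "sensitive": True},
    ]
    print(questions_to_show(16, questions))
    print(questions_to_show(30, questions))
```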

Feedback risks making a participant feel stupid or establishing apparent norms

• Don’t tell people they’re in the bottom decile for performance on a cognitive/IQ task

‘all the women were strong, all the men were good looking and all the children were above average’

• Use broad categories for giving feedback, but better not to lie about actual performance. ‘You did better than 20%...’

• Include caveats about how poor a measure of performance your test is
• And how performance varies a lot intraindividually
• And how the other participants may not be representative
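One way to give broad but truthful feedback is to map a participant's percentile onto a few wide bands and attach the caveats to every message. The cut-offs (thirds) and the wording below are assumptions for illustration, not recommendations from the talk.

```python
def feedback_band(percentile: float) -> str:
    """Map a percentile (0-100) onto one of three broad, truthful bands."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile >= 67:
        band = "in the upper third"
    elif percentile >= 33:
        band = "in the middle third"
    else:
        band = "in the lower third"
    return (
        f"Your score was {band} of people who have taken this test so far. "
        "Scores on short online tests vary a lot from one attempt to the next, "
        "and the other participants may not be representative."
    )


if __name__ == "__main__":
    print(feedback_band(8))
    print(feedback_band(50))
```

A participant in the bottom decile is told only that they fall in the lower third, which is accurate without singling them out.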

Deception is not recommended online

• Always a sensitive issue
• Difficult online, because debrief is harder
• Need to reassure participants that they are not being mocked or exploited when experimenter is not present

• Get ethics board input before running

Debrief

• Try to explain the aim of the experiment in simple terms
– Run it past your friends and family first to make sure it’s easily understandable
• Thank the participant for their time
• Give them an email address to contact you if they want further information or to see the final results

Sixteen standards for web-based experimenting (Reips, 2002)

Sixteen standards

• Consider a software tool for development
• Pretest for clarity
• Decide on HTML vs. plugins
• Check for errors
• Link to several sites to check self selection
• Run online and offline for comparison
• Use warm-up technique to avoid dropout (maybe)
• Use dropout to check motivational confounding

Sixteen standards

• Minimise dropout
• Highlight seriousness of experiment
• Check for obvious naming of files or passwords
• Avoid multiple submissions
• Perform consistency checks
• Keep full details for others to analyse
• Report and analyse dropout curves
• Keep experiment available online

The Future…

Massive longitudinal panel experiments

• Already used to running experiments with >250,000 participants

• Possible to get thousands of people from ever broader demographic to participate repeatedly

• Look at, for example, cognitive aging of individuals
• Set up panels of reliable participants
– Choose demographic, etc. Cross-tabulate results from many experiments, get vast amounts of data

New devices

• Run experiments using WAP on mobile phones
• If you know Java, it’s relatively easy to adapt an application to, say, Series 60 Nokias
• E.g., memory task. Participants download application. Every hour the phone vibrates and participants see another item. Test at end of day. Send results by SMS.

• Give people a task to do at unpredictable points, check effect of time of day, mood, etc.

Conclusion

• Web-based testing can be a powerful tool for investigating issues hard to investigate in the lab
• Web-based testing has some core differences from lab-based testing
• These differences have advantages and disadvantages
• In years to come there will be new ways to test people outside the laboratory
• Web-based testing is now accepted in the research community
