
Questions Raised by Results - Pennsylvania State University


Page 1: Questions Raised by Results - Pennsylvania State University

Questions Raised by Results

Questions raised by significant results
  External validity
    Do results generalize to other levels of the IV?
    Do other variables moderate the IV’s effect?
  Construct validity (Was the control group good enough?)
Simple two-group experiments may not be good enough

Page 2: Questions Raised by Results - Pennsylvania State University

Multi-Group Experiments

Improve construct validity
Improve external validity
Answer more questions

Page 3: Questions Raised by Results - Pennsylvania State University

Example

[Figure: a two-group design (treatment: No / Yes) compared with a multi-group design (amount of treatment: No / Med / High)]

Page 4: Questions Raised by Results - Pennsylvania State University

Benefits of Multi-Group Experiment

Better ability to estimate the effects of different amounts (levels) of a treatment
Better ability to rule out confounding variables
May help discover significant relationships
May help more accurately map the functional relationship
Summary: Multi-level experiments have more external validity than simple experiments

Page 5: Questions Raised by Results - Pennsylvania State University

Group Assignment in Multi-Group Experiment

Similar procedure as in a simple experiment
  Using a random number table
  Using a die (rather than a coin)
  Using Excel
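
The same random assignment can also be scripted. The following is a minimal sketch, assuming 12 hypothetical participant IDs and three treatment levels (No / Med / High); the function name, IDs, and seed are illustrative only and are not taken from the slides.

```python
import random

def assign_to_groups(participants, groups, seed=None):
    """Randomly assign participants to groups of (nearly) equal size."""
    rng = random.Random(seed)          # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)              # random order plays the role of the die roll
    assignment = {g: [] for g in groups}
    # Deal the shuffled participants out like cards, one group at a time.
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Hypothetical participants and treatment levels.
participants = [f"P{i:02d}" for i in range(1, 13)]
groups = ["No", "Med", "High"]
assignment = assign_to_groups(participants, groups, seed=42)
for g in groups:
    print(g, assignment[g])
```

Shuffling first and then dealing keeps the group sizes equal, which rolling a die for each participant does not guarantee.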

Page 6: Questions Raised by Results - Pennsylvania State University

Statistics Methods

t tests
ANOVA
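
With three or more groups, a one-way ANOVA compares the variance between group means with the variance within groups. Below is a minimal sketch of the F statistic using only the standard library; the scores are made up for illustration, and a real analysis would also look up the p value for F with the returned degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three treatment levels.
no, med, high = [4, 5, 6], [6, 7, 8], [8, 9, 10]
f, df_between, df_within = one_way_anova_f([no, med, high])
print(f"F({df_between}, {df_within}) = {f:.2f}")
```

For exactly two groups the F value equals the square of the independent-groups t value, so the t test can be seen as the two-group special case.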

Page 7: Questions Raised by Results - Pennsylvania State University

Control Group

No treatment at all?
  May create noise
  Example: medicine
    Taking medicine may create a psychological effect.
    Subjects may guess what you are trying to do.
Solution
  Provide a "fake" treatment
    Similar to the real treatment, in form only
    Placebo pills
    Systems with similar appearance, but different treatments

Page 8: Questions Raised by Results - Pennsylvania State University

Factorial Design

Why multiple factors?
  Effects of multiple factors on subjects
Main effect
  Effect of individual factors independently
Interaction
  Effect of two factors jointly
Most often seen design: 2 x 2

Page 9: Questions Raised by Results - Pennsylvania State University

2X2 Factorial Design

Two factors
  Two levels of each factor: presence or not
  Four conditions
What do researchers expect?
  Two simple main effects
    Does a factor affect subjects?
  The interaction between two factors
    Do these two factors interfere with each other?
    Concerning external validity

Page 10: Questions Raised by Results - Pennsylvania State University

Interactions Are Important

A simple experiment is about a main effect.
  May be weak in generalization
  Other non-studied factors may interfere.
    e.g., drugs: active ingredient + other chemicals people may take in their daily life
Interesting questions are often about interactions
  e.g., drug safety
External validity questions are questions involving interactions

Page 11: Questions Raised by Results - Pennsylvania State University

An Example

A study on the effect of using IM on productivity
Two groups of employees
  Experimental group: use IM
  Control group: don’t use IM

[Figure: comparison of the two groups, "Don’t use" vs. "Use"]

Page 12: Questions Raised by Results - Pennsylvania State University

Someone Questions Your Results

“I see people using IM more while on the move. Does using IM on the move have anything to do with productivity?”
  Two factors: IM and mobile
You need a 2 x 2 design
  Use IM: yes/no
  On mobile device: yes/no
Your data from the study
  Dependent variable: How many orders do they make per week?

Page 13: Questions Raised by Results - Pennsylvania State University

Different Types of Result

Main effect of both factors, and interaction
Main effect of both factors, but no interaction
Main effect of one factor, and interaction
Main effect of one factor, but no interaction
Interaction with no main effect of either factor
No interaction, no main effect

Page 14: Questions Raised by Results - Pennsylvania State University

              Not Mobile   Mobile
Not use IM        70         75
Use IM            73         83

Two main effects, with interaction
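
The main effects and the interaction can be read off the four cell means on this slide (70, 75, 73, 83). The sketch below assumes the first row is "Not use IM", the second row is "Use IM", and the columns are "Not Mobile" then "Mobile"; that layout is an assumption about the original table. The values are descriptive differences only; whether they are significant still requires an ANOVA on the raw data.

```python
def factorial_effects(cells):
    """cells[(im, mobile)] -> cell mean, where im and mobile are 0 (no) or 1 (yes)."""
    # Main effect of IM: average of the "Use IM" row minus average of the "Not use IM" row.
    main_im = (cells[1, 0] + cells[1, 1]) / 2 - (cells[0, 0] + cells[0, 1]) / 2
    # Main effect of mobile: average of the "Mobile" column minus the "Not Mobile" column.
    main_mobile = (cells[0, 1] + cells[1, 1]) / 2 - (cells[0, 0] + cells[1, 0]) / 2
    # Interaction: does the effect of mobile differ between IM users and non-users?
    interaction = (cells[1, 1] - cells[1, 0]) - (cells[0, 1] - cells[0, 0])
    return main_im, main_mobile, interaction

# Cell means from the slide; the row/column layout is assumed.
cells = {(0, 0): 70, (0, 1): 75, (1, 0): 73, (1, 1): 83}
main_im, main_mobile, interaction = factorial_effects(cells)
print(main_im, main_mobile, interaction)   # 5.5 7.5 5
```

A nonzero interaction term here means the mobile effect is larger for IM users (10 orders) than for non-users (5 orders), which is the pattern the slide title describes.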

Page 15: Questions Raised by Results - Pennsylvania State University

[Figure: 2 x 2 chart, Not use IM / Use IM by Not Mobile / Mobile]

Two main effects, with no interaction

Page 16: Questions Raised by Results - Pennsylvania State University

[Figure: 2 x 2 chart, Not use IM / Use IM by Not Mobile / Mobile]

One main effect, with interaction

Page 17: Questions Raised by Results - Pennsylvania State University

[Figure: 2 x 2 chart, Not use IM / Use IM by Not Mobile / Mobile]

One main effect, with no interaction

Page 18: Questions Raised by Results - Pennsylvania State University

[Figure: 2 x 2 chart, Not use IM / Use IM by Not Mobile / Mobile]

No main effect, with interaction

Page 19: Questions Raised by Results - Pennsylvania State University

[Figure: 2 x 2 chart, Not use IM / Use IM by Not Mobile / Mobile]

No main effect, no interaction

Page 20: Questions Raised by Results - Pennsylvania State University

Interpretation of Results

Two main effects without interaction
  It all adds up.
Two main effects with interaction
  The two factors may amplify or impede each other's effect.
One main effect without interaction
  The effect of the factor is independent.
One main effect with interaction
  One factor may have no effect on its own, but can affect the other one.
  e.g., a catalyst
No main effect with interaction
  The effect of one factor depends on the other, although the main effect is not significant.

Page 21: Questions Raised by Results - Pennsylvania State University

In Sum

Factorial design allows studying multiple factors in one study.
  More treatments
  More complex results
  More difficult to design and execute

Page 22: Questions Raised by Results - Pennsylvania State University

Matched Pairs, Within-Subjects, and Mixed Design

Page 23: Questions Raised by Results - Pennsylvania State University

Different Kinds of Experiments

Field experiments
  In a natural setting, rather than in a lab
  Advantages: external validity, construct validity
  Disadvantages: independence of groups, and process control
Matched pairs design
  Create two similar groups based on a certain criterion
  Advantage: internal validity
  Problems: matching could be hard and inaccurate
    Other factors: individual differences
Within-subjects designs
  Each subject goes through all treatments
  Comparing different treatments of the same subject
  Advantages: internal validity
Mixed designs in factorial design
  One factor: within-subjects
  The other: matched pairs or between-subjects

Page 24: Questions Raised by Results - Pennsylvania State University

[Figure: how the experimental group and control group relate in a field experiment, a matched pairs design, and a within-subjects design]

Page 25: Questions Raised by Results - Pennsylvania State University

Matched Pairs Design: Procedure

Form matched pairs
Randomly assign one member of each pair to the treatment condition, the other to the control condition
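
The two steps can be sketched as follows, assuming a single numeric matching variable (for example, a pretest score). The participant IDs and scores are hypothetical, and pairing neighbors in rank order is only one simple way to form pairs.

```python
import random

def matched_pairs_assignment(scores, seed=None):
    """scores maps participant -> matching-variable score.

    Returns a list of (treatment_member, control_member) tuples.
    """
    rng = random.Random(seed)
    # Step 1: form pairs of participants with adjacent scores.
    ranked = sorted(scores, key=scores.get)
    pairs = [ranked[i:i + 2] for i in range(0, len(ranked) - 1, 2)]
    # Step 2: within each pair, randomly decide who gets the treatment.
    result = []
    for pair in pairs:
        rng.shuffle(pair)
        result.append((pair[0], pair[1]))
    return result

# Hypothetical matching scores.
scores = {"P1": 21, "P2": 35, "P3": 22, "P4": 34, "P5": 50, "P6": 48}
pairs = matched_pairs_assignment(scores, seed=1)
for treatment_member, control_member in pairs:
    print("treatment:", treatment_member, "control:", control_member)
```

With an odd number of participants, this sketch leaves the highest-scoring participant unpaired.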

Page 26: Questions Raised by Results - Pennsylvania State University

Considerations in Using Matched Pairs Designs

Finding an effective matching variable
  e.g., race, education, age, …
External validity
  Advantage: Don’t restrict subject population (can have a heterogeneous group)
  Disadvantage: Results may not generalize to participants who haven’t done the matching task
Construct validity weakened because matching may tip off participants about the hypothesis

Page 27: Questions Raised by Results - Pennsylvania State University

Analysis of Data in the Matched Pairs Design

Not the between-subjects t test (observations are not independent)
Dependent t test: mean of the differences between pairs / standard error of the differences
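
As a worked illustration of that ratio, the sketch below computes the dependent t statistic from paired scores. The data are made up; each position in the two lists is assumed to hold the two members of one matched pair.

```python
from math import sqrt
from statistics import mean, stdev

def dependent_t(treatment, control):
    """Return (t, df) for paired observations."""
    diffs = [t - c for t, c in zip(treatment, control)]
    n = len(diffs)
    # Standard error of the mean difference: sample SD of the differences / sqrt(n).
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical scores for five matched pairs.
treatment = [12, 15, 11, 14, 13]
control = [10, 12, 10, 11, 12]
t, df = dependent_t(treatment, control)
print(f"t({df}) = {t:.3f}")
```

Because the test works on the within-pair differences, variation between pairs drops out of the error term, which is where the design gets its power.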

Page 28: Questions Raised by Results - Pennsylvania State University

Within-Subjects (Repeated Measures) Designs

Considerations in using within-subjects designs
  Increased power
  Order effects harm internal validity

Page 29: Questions Raised by Results - Pennsylvania State University

Danger: Order Effects

Four Sources of Order Effects
  Practice effects
    Gradually improved performance
  Fatigue effects
    Gradually deteriorated performance
  Treatment carryover effects
    Previous treatments affect the following treatments.
  Sensitization
    Subjects can guess what your IVs and DVs are, and may play along with it.
    Hurts both construct validity and internal validity (why?)

Page 30: Questions Raised by Results - Pennsylvania State University

Dealing with Order Effects

Minimizing each individual threat
  Practice, fatigue, carryover, sensitization
  Allow sufficient practice, make tasks interesting, use few levels, allow sufficient time between treatments, make treatment level less noticeable, etc.
Mixing up sequences to try to balance out order effects: randomizing and counterbalancing
  Randomized or counterbalanced within-subjects designs

Page 31: Questions Raised by Results - Pennsylvania State University

Randomized Within-Subjects Design

Randomly determine the sequence of treatments for each participant
  Bet on luck!
We need a better solution to rule out the possibility of order effects.

Page 32: Questions Raised by Results - Pennsylvania State University

Counterbalanced Within-Subjects Design

Design a set of sequences such that
  Every condition appears in every position the same number of times
  Every condition precedes every other condition just as many times as it follows that condition
Randomly assign participants to your sequences

Page 33: Questions Raised by Results - Pennsylvania State University

Examples

Two variables
  A – B and B – A
Three variables

           Treatment order
           1   2   3
Group 1    A   B   C
Group 2    B   C   A
Group 3    C   A   B
Group 4    C   B   A
Group 5    A   C   B
Group 6    B   A   C
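
The six sequences for three conditions are simply all orderings of A, B, and C. The sketch below checks the two counterbalancing requirements for any set of sequences; it reads "precedes" as "appears anywhere earlier in the sequence", which is one possible interpretation of the rule. The function name is illustrative.

```python
from collections import Counter
from itertools import permutations

def is_counterbalanced(sequences):
    """Check position balance and precede/follow balance for a set of sequences."""
    conditions = sorted(set(sequences[0]))
    # Requirement 1: every condition appears in every position equally often.
    position_counts = Counter(
        (pos, c) for seq in sequences for pos, c in enumerate(seq)
    )
    expected = len(sequences) / len(conditions)
    positions_ok = all(
        position_counts[(pos, c)] == expected
        for pos in range(len(conditions))
        for c in conditions
    )
    # Requirement 2: every condition precedes every other as often as it follows it.
    precede = Counter()
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                precede[(a, b)] += 1
    order_ok = all(
        precede[(a, b)] == precede[(b, a)]
        for a in conditions for b in conditions if a != b
    )
    return positions_ok and order_ok

all_six = list(permutations("ABC"))               # complete counterbalancing
cyclic_three = [tuple("ABC"), tuple("BCA"), tuple("CAB")]
print(is_counterbalanced(all_six))        # True
print(is_counterbalanced(cyclic_three))   # False: A precedes B twice but follows it once
```

Using only the three rotated sequences balances positions but not order, which is why three conditions need all six sequences.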

Page 34: Questions Raised by Results - Pennsylvania State University

Latin Square

Helps you get the sequence of conditions
4 x 4 Latin square

           Treatment order
           1   2   3   4
Group 1    A   B   D   C
Group 2    B   C   A   D
Group 3    C   D   B   A
Group 4    D   A   C   B
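
A square like this can be generated rather than written by hand. The sketch below uses a standard construction for a balanced Latin square with an even number of conditions (first row 0, 1, n-1, 2, n-2, ..., then shift each row by one); for four conditions it reproduces the square on this slide. The function name is illustrative, and the construction is offered as one common method, not necessarily the one used to build the slide.

```python
def balanced_latin_square(conditions):
    """Build a balanced Latin square for an even number of conditions."""
    n = len(conditions)
    if n % 2 != 0:
        raise ValueError("this construction needs an even number of conditions")
    # First row as indices: 0, 1, n-1, 2, n-2, ...
    first = [0]
    low, high = 1, n - 1
    while len(first) < n:
        first.append(low)
        low += 1
        if len(first) < n:
            first.append(high)
            high -= 1
    # Each later row shifts every index in the first row by one (mod n).
    return [[conditions[(x + shift) % n] for x in first] for shift in range(n)]

square = balanced_latin_square(["A", "B", "C", "D"])
for group, row in enumerate(square, start=1):
    print(f"Group {group}:", " ".join(row))
```

In this square every condition appears once in each position, and every condition immediately follows every other condition exactly once.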

Page 35: Questions Raised by Results - Pennsylvania State University

Pros and Cons of Counterbalanced Within-Subjects Designs

Balances out order effects
Provides information not only about the effect of treatment, but also about the effect of order (trials, position) and sequence
May require more subjects
Analysis is more sophisticated
  ANOVA is often required.

Page 36: Questions Raised by Results - Pennsylvania State University

Conclusion

Experimental studies
  Manipulate IVs
  Compare different groups
  Study several factors
Good internal validity
  Counterbalanced design, process control
Improved external validity and construct validity
Experimental design
  Process
  Craft
  Task
    Designing a good task needs creative thinking!

Page 37: Questions Raised by Results - Pennsylvania State University

Running an Experimental Study

Page 38: Questions Raised by Results - Pennsylvania State University

The Book

Guide book for studies involving human users
Offers detailed guidelines

Page 40: Questions Raised by Results - Pennsylvania State University

Practice and Experience

Running an experiment needs practice.
Experience is very important
  In particular for the design of an experiment
We can only discuss some general issues in this course.
  Semester-long courses on experimental design in traditional psychology departments

Use my research as an example

Page 41: Questions Raised by Results - Pennsylvania State University

Issues to Consider

Experimental Design
Test Space, Instrument, Apparatus
Data Collection
Subject Recruitment
Experimental Protocols
Experimental Process
Pilot Test

Page 42: Questions Raised by Results - Pennsylvania State University

Experimental Design

Most difficult part
May take a long time to get a "satisfactory" design
  Several rounds
  Various factors to consider
    Tasks, subjects, equipment, etc.
Based on your hypothesis
Key considerations
  Tasks
    Incorporate both IVs and DVs with good validities
    Not harmful to subjects
    Not boring
    Reasonable time span
  Treatments

Page 43: Questions Raised by Results - Pennsylvania State University

Experimental Design: Example

My hypothesis
  Multiscale collaboration is more effective in helping people deal with complex information at different levels of detail.

What is Multiscale Collaboration?

Page 44: Questions Raised by Results - Pennsylvania State University

Gulliver’s Travels

Page 46: Questions Raised by Results - Pennsylvania State University

Multiscale Collaborative Virtual Environments (mCVE)

Multiple users work together but from different scales
  Cross-scale collaboration between ants and giants
Users have different interaction domains
  Visual information, navigation, manipulation, …

Page 47: Questions Raised by Results - Pennsylvania State University

mCVE Example

Page 48: Questions Raised by Results - Pennsylvania State University

Experimental Study on mCVE

IVs:
  Multiscale information presentation and interaction
  Collaboration
DV:
  Task completion time in interacting with complex information
Task:
  Searching objects with specific features in a large area

Page 49: Questions Raised by Results - Pennsylvania State University

IVs and DV

Multiscale factors
  Information presentation
    Large area: global-level information to assist search
    Specific features: local-level details to assist object identification
  Movement
    Giant steps to move quickly but roughly
    Baby steps to move accurately
Collaboration factors
  Information sharing: where the other person is
  Movement: the giant can carry the ant to move quickly.
DV: how long it takes to complete a task

Page 52: Questions Raised by Results - Pennsylvania State University

Expected Results

Multiscale + collaboration > multiscale
Multiscale + collaboration > collaboration
Multiscale > no multiscale
Collaboration > no collaboration
Multiscale + collaboration > no multiscale + no collaboration
Multiscale ? collaboration

Page 53: Questions Raised by Results - Pennsylvania State University

Treatments

Determined by your IVs
Two factors
  Multiscale: Yes/No
  Collaboration: Yes/No
  2 x 2 factorial design
Interested in the impact of collaboration style on user performance
  Three different collaboration styles
    Role-free
    One as a guide
    One as a carrier
2 x 2 + 2: six treatments

Page 55: Questions Raised by Results - Pennsylvania State University

Construct Validity

Variables
  Multiscale
    Cross-scale information access
    Cross-scale action
  Collaboration
    Dividing a task and conquering it in parallel
Performance
  Task completion time

Page 56: Questions Raised by Results - Pennsylvania State University

Other Issues

Making the task fun
  Simple search → defusing a bomb
Providing feedback to motivate people
  Should not reveal your true intention.

Page 57: Questions Raised by Results - Pennsylvania State University

Test Space, Instrument, Apparatus

Test space
  A usability lab would be ideal.
    Quiet, distraction free, easy for observation, recording capabilities, etc.

Page 58: Questions Raised by Results - Pennsylvania State University

Test Space, Instrument, Apparatus

If no access to a usability testing lab
  Set up space appropriate for the experiment
  Private space for the subject
Never do an experiment in a public place.

Page 59: Questions Raised by Results - Pennsylvania State University

Test Space, Instrument, Apparatus

Instrument
  Capturing user performance
  Recording devices: audio and/or video
  Computer to capture data automatically if possible
  Often operated by experimenters
Apparatus
  Used by subjects directly
  Usually computers and related peripheral devices
  Also includes software tools

Page 60: Questions Raised by Results - Pennsylvania State University

Interaction Devices

How people interact with a computer varies from person to person.
  May lead to errors
  e.g., mouse moving and clicking delay
Use methods to avoid or reduce the potential errors if possible

Page 61: Questions Raised by Results - Pennsylvania State University

In My Study

Subjects needed to control movement and scale.
  Mouse control could be a concern.
My approach
  Using the keyboard
  Providing a key-function map
  Labeling the function of each key
  Using simple language

Page 65: Questions Raised by Results - Pennsylvania State University

Software Tools

Different levels of functions and fidelity
Truly functional systems
  Can deal with any actions by subjects
  No need to worry about "misbehaviors" by subjects
Partially functional systems
  Support primary tasks
  Need to watch subjects closely and carefully
    Preventing the system from malfunctioning
Mock-up systems
  Example: Wizard-of-Oz style system
    A system only offering user interface and other functions
    A human experimenter sitting behind to provide results based on user inputs

Page 66: Questions Raised by Results - Pennsylvania State University

What You Should Pick …

Depends on your goal, your technical skills/resources, your time, etc.

What is your interest? Design + evaluation vs. evaluation only

Can you develop the system or have someone do it for you?

Do you have one year or five years?

Page 67: Questions Raised by Results - Pennsylvania State University

In My Study,

I built a fully functional system
  Years of work
  Java-based

Page 68: Questions Raised by Results - Pennsylvania State University

Data Collection

After the design, you should have a clear idea about what data to collect.
  Directly related to the DV
  Objective data collected through computer or instruments
Other data to collect
  Demographic data of subjects
  Subjective evaluation data

Page 69: Questions Raised by Results - Pennsylvania State University

Methods to Collect Data

If software tools are involved and manipulating the software is possible, it is better to write code to collect performance data.
  Otherwise, have someone dedicated to collecting performance data
Using pre- and post-test questionnaires
  Pre-test questionnaire
    Demographic data, background, relevant skills, etc.
  Post-test questionnaire
    Feedback, assessment, and perception on relevant tasks and phenomena

Page 70: Questions Raised by Results - Pennsylvania State University

In My Study,

Performance data collection was coded into the system
  Time stamp
  Action type: moving, scaling
  Related parameters: scale (size), location
Questionnaires
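
To illustrate what coding data collection into a system can look like, here is a minimal logging sketch. It is not the study's actual (Java-based) code; the class name, field names, and CSV layout are assumptions chosen to mirror the items listed above (time stamp, action type, scale, location).

```python
import csv
import io
import time

class ActionLogger:
    """Write one CSV row per user action (hypothetical log format)."""

    FIELDS = ["timestamp", "action", "scale", "x", "y"]

    def __init__(self, stream, clock=time.time):
        self._writer = csv.DictWriter(stream, fieldnames=self.FIELDS)
        self._writer.writeheader()
        self._clock = clock            # injectable clock makes the logger testable

    def log(self, action, scale, x, y):
        self._writer.writerow({
            "timestamp": f"{self._clock():.3f}",   # seconds, millisecond precision
            "action": action,
            "scale": scale,
            "x": x,
            "y": y,
        })

# Usage: an in-memory buffer stands in for a per-participant log file.
buffer = io.StringIO()
logger = ActionLogger(buffer)
logger.log("moving", scale=1.0, x=12.5, y=40.0)
logger.log("scaling", scale=8.0, x=12.5, y=40.0)
print(buffer.getvalue())
```

Task completion time can then be derived afterwards as the difference between the last and first time stamps of a trial.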

Page 71: Questions Raised by Results - Pennsylvania State University

Data File

http://zhang.ist.psu.edu/teaching/505/Data_Sample_Collected.txt

Page 74: Questions Raised by Results - Pennsylvania State University

Surveys

Short surveys are often included in usability studies
  Pre-test questionnaire
    User demographic and other relevant data
  Post-test questionnaire
    User feedback on the system
Paper or online

Page 77: Questions Raised by Results - Pennsylvania State University

Subject Recruitment

General population
  Representative enough
  Considering confounding factors
Where to get the subjects?
  Sampling from the population you target
    Convenience vs. representative
  People on campus
How to get them?
  Subject pools
  General public: email, ads, flyers, etc.
How many to get?
  Experience
  Calculation based on desired power
  More is better, if you have sufficient resources.
    Costs are based on the length of your study
  Always have backup subjects
    No-shows, failed study, insufficient subjects, etc.
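
A calculation based on desired power can be sketched with the normal approximation for comparing two independent groups: n per group ≈ 2(z for 1 − α/2 + z for power)² / d², where d is the expected standardized effect size (Cohen's d). The effect sizes below are illustrative assumptions; dedicated power-analysis tools use the t distribution and give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-group comparison."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for the significance level
    z_power = z.inv_cdf(power)             # value corresponding to the desired power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

# Hypothetical effect sizes: small, medium, large.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} subjects per group")
```

The required sample grows quickly as the expected effect shrinks, which is one reason within-subjects designs, with their higher power, are attractive when subjects are scarce.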

Page 78: Questions Raised by Results - Pennsylvania State University

My Case

Targeted population: general
  Students are OK.
  With 3D experience
I recruited my participants from campus.
  Emails
  Flyers
The number of subjects
  Calculated with a statistics tool
  24 participants (12 pairs)

Page 79: Questions Raised by Results - Pennsylvania State University

Results: Time Comparison (in seconds)

[Figure: task completion time (y-axis "Time", ticks from 50 to 200) for the VE, mVE, CVE, and mCVE conditions, grouped by Non-Collaboration / Collaboration and Non-Multiscale / Multiscale]

Multiscale collaboration can be helpful for cross-scale tasks.

Page 80: Questions Raised by Results - Pennsylvania State University

In-Depth Study on Multiscale Collaboration

2 x 2 within-subjects design

                  Non-Collaboration   Collaboration
Non-Multiscale    VE                  CVE
Multiscale        mVE                 mCVE - NoRole, mCVE - GUIDE, mCVE - MOVE (+ 2)

Page 81: Questions Raised by Results - Pennsylvania State University

Results: Time Comparison (in seconds)

[Figure: task completion time (y-axis "Time", ticks from 50 to 200) for the VE, mVE, CVE, mCVE - NoRole, mCVE - GUIDE, and mCVE - MOVE conditions, grouped by Non-Collaboration / Collaboration and Non-Multiscale / Multiscale]

Page 82: Questions Raised by Results - Pennsylvania State University

Homework

Critique a research paper
You need to list
  The hypothesis
  IVs and DVs
  The task for the experiment
  Factors in the factorial design
  Approaches to counterbalance treatments
You need to point out at least one flaw in the experimental design or execution.
  Why?
  The impact(s)
    External, internal, or construct validity
  Modification