Statistical analysis · A t test is a simplified analysis of variance involving only two groups or conditions. Independent-samples t test Paired-samples t test Analysis of variance

Statistical analysis

DIANA SAPLACAN

2017

* SLIDES ADAPTED – BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Vs.

2

Background

There are different types of research methods to study behaviour:

Descriptive: observations, focus groups, field studies, interviews

Relational: correlation analysis – observations, field studies, surveys

Experimental: controlled experiments, the only type that makes it possible to discover causal relationships

Analysing the data is more demanding than collecting it

Relies on critical decisions

3

Today…

Independent samples t-test, paired samples t-test, one way analysis of variance

(ANOVA), factorial ANOVA, repeated measures ANOVA, correlation, regression,

and chi-square test

Focus: not on the mathematics, but on the context of use

Overall learning outcome for this lecture: to know how to choose appropriate

statistical analysis

4

Learning outcomes INF2260 - again

5

Preparing data for statistical

analysis

Data from lab-based experiments, usability tests, field studies,

surveys, and other channels need to be carefully processed before

any statistical analysis can be conducted (PRE-PROCESSING)

WHY?

6

Preparing data for statistical analysis -

Cleaning up data

1. Screen the data for possible errors – especially important for data entered manually by participants:

«To err is human. All people make mistakes. (Norman, 1988)»

Examples?

2. Compare data from online based questionnaires with paper based

questionnaires. Any inconsistencies regarding the same participant?

3. Correct the errors where possible. Sometimes you can also remove problematic values (treat them as «missing values»/null)
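A minimal sketch of step 3, assuming the ages were typed in by participants. The raw values and the plausible range (18 to 100) are hypothetical and would be set by the researcher.

```python
# Hypothetical raw ages typed in by participants (not real study data).
raw_ages = ["34", "28", "210", "", "twenty", "21"]


def clean_age(value, low=18, high=100):
    """Return the age as an int, or None (missing value) if it cannot be trusted."""
    try:
        age = int(value)
    except ValueError:
        return None  # empty or non-numeric entry
    # Values outside the plausible range are treated as missing, not guessed.
    return age if low <= age <= high else None


cleaned = [clean_age(v) for v in raw_ages]
print(cleaned)  # [34, 28, None, None, None, 21]
```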

7

Preparing data for statistical analysis –

Coding data

                Age   Gender   Highest degree   Previous experience in software A
Participant 1   34    male     College          Yes
Participant 2   28    female   Graduate         No
Participant 3   21    female   High school      No

                Age   Gender   Highest degree   Previous experience in software A
Participant 1   34    1        2                1
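The coding step could be sketched as below. Only male = 1, College = 2 and Yes = 1 can be read from the slide; every other code in the code books is an assumption made for illustration.

```python
# Code books. Only male=1, College=2 and Yes=1 appear on the slide;
# the remaining codes are assumptions chosen for illustration.
GENDER = {"male": 1, "female": 0}
DEGREE = {"High school": 1, "College": 2, "Graduate": 3}
EXPERIENCE = {"Yes": 1, "No": 0}

participants = [
    (34, "male", "College", "Yes"),
    (28, "female", "Graduate", "No"),
    (21, "female", "High school", "No"),
]


def code_row(age, gender, degree, experience):
    """Replace each categorical answer with its numeric code."""
    return (age, GENDER[gender], DEGREE[degree], EXPERIENCE[experience])


coded = [code_row(*p) for p in participants]
print(coded[0])  # (34, 1, 2, 1), matching Participant 1 on the slide
```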



8


Descriptive statistics

After the data is cleaned up, run some descriptive statistical tests to

understand the nature of your data set.

Range where the data points fall + distribution

Means, medians, modes, variances, std deviations

9


Descriptive statistics – some terminology

Measure of central tendency – where is the bulk of the data located?

Characteristics: mean, median, mode

Mean: arithmetic average SUM (all) / how_many

Example: take two groups, G1 and G2, and look at the age of each participant

G1 = {15, 19, 22, 29, 33, 45, 50}

G2 = {12, 15, 22, 22, 22, 34, 34}

Mean G1: 213/7 = 30.43 (≈ 30); Mean G2 = 23

Median is the middle score in a sorted data set.

Median G1: 15<19<22<29<33<45<50, so the median is 29

Median G2?

Mode: the value that occurs most.

G2: 22.
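The three measures of central tendency for G1 and G2 can be checked with Python's standard `statistics` module. This is only a quick check of the slide's numbers, not part of the original lecture.

```python
from statistics import mean, median, mode

G1 = [15, 19, 22, 29, 33, 45, 50]
G2 = [12, 15, 22, 22, 22, 34, 34]

print(round(mean(G1), 2))  # 30.43, which the slide rounds to 30
print(mean(G2))            # 23
print(median(G1))          # 29
print(median(G2))          # 22
print(mode(G2))            # 22
```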

10


Descriptive statistics – some terminology

Measures of spread: how the data deviates from the

center of the data set

Range: the distance between the highest and lowest values

G1 = {15, 19, 22, 29, 33, 45, 50} , 50-15 = 35

Variances: the variance of a data set is the mean of the squared distances of all the scores from the mean of the data set.

G1 = {15, 19, 22, 29, 33, 45, 50}

Mean G1 ≈ 30 (213/7 = 30.43, rounded).

Variance = [(30 − 15)^2 + (30 − 19)^2 + (30 − 22)^2 + (30 − 29)^2 + (30 − 33)^2 + (30 − 45)^2 + (30 − 50)^2] / 7

= [15^2 + 11^2 + 8^2 + 1^2 + (−3)^2 + (−15)^2 + (−20)^2] / 7

= (225 + 121 + 64 + 1 + 9 + 225 + 400) / 7 = 1045 / 7 ≈ 149

Standard deviation: the square root of the variance.

Square root (149) ≈ 12.21

When the data set is a sample rather than the whole population, divide by (n − 1) instead of n when calculating the variance.
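The difference between dividing by n and by (n − 1) can be seen with the standard `statistics` module. The hand calculation on the slide uses the rounded mean 30, so its variance (1045/7) comes out slightly higher than the value computed here from the unrounded mean.

```python
from statistics import pstdev, pvariance, stdev, variance

G1 = [15, 19, 22, 29, 33, 45, 50]

# Treating G1 as the whole population: divide by n.
print(round(pvariance(G1), 2))  # 149.1
print(round(pstdev(G1), 2))     # 12.21

# Treating G1 as a sample from a larger population: divide by n - 1.
print(round(variance(G1), 2))   # 173.95
print(round(stdev(G1), 2))      # 13.19
```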

11


Descriptive statistics – some terminology

Normal distribution – defined by the mean and the standard

deviation

Bell-shaped, but real data do not always follow it
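Because a normal distribution is fully defined by its mean and standard deviation, those two numbers are enough to ask how much of the data should fall in a given range. A small sketch with the standard `statistics.NormalDist` class; the mean of 30 and standard deviation of 12 are hypothetical values.

```python
from statistics import NormalDist

# Hypothetical distribution of ages: mean 30, standard deviation 12.
ages = NormalDist(mu=30, sigma=12)

# Share of the population expected within one standard deviation of the mean.
within_one_sd = ages.cdf(30 + 12) - ages.cdf(30 - 12)
print(round(within_one_sd, 3))  # 0.683
```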

12

Design Structure

13


Comparing means

When involving multiple groups, the ultimate objective of the researchers is to find

out whether there is any difference between the conditions or groups.

Example:

G1 and G2 represent the ages of two groups. Use statistical significance tests to compare G1 with G2. (BETWEEN GROUP)

Compare the values of the elements within G1. (WITHIN GROUP)

What if each element of G1 were a pair (age, gender)?

G1 = {(15,1), (19,0), (22,1), (29,1), (33,1), (45,0), (50,1)}. 1 = male, 0 = female.

Common tests:

t-tests and analysis of

variance - ANOVA

14


Comparing means

A t test is a simplified analysis of variance involving only two groups or conditions.

Independent-samples t test

Paired-samples t test

Analysis of variance (ANOVA) tests:

One-way ANOVA

Factorial ANOVA

Repeated measures ANOVA

Split-plot ANOVA

Major types of empirical study regarding design methodology and the appropriate significance test for each design

15


Comparing means, t-test

«The most widely adopted statistical procedure for comparing two

means is the t test.» (Rosenthal and Rosnow, 2008)

1) Independent-samples t test

2) Paired-samples t test

Hypothesis: «There is no significant difference in the task completion

time between individuals who use the word-prediction software and

those who do not use the software.»

16



Hypothesis: «There is no significant difference in the task completion time between individuals who use the word-prediction software and those who do not use the software.»

1) Independent-samples t test

Example: two groups of participants, G1 and G2.

G1 – text using standard word processing software

G2 – text using word prediction software

2) Paired-samples t test

G1 + G2 is one group, G. Each participant completes the tests under both conditions (i.e. using word processing software and using word prediction software). Each pair of data points belongs to the same participant

For 1) When using SPSS to run independent

samples t test, only the third and the fourth

column will be used

For 2) When using SPSS to run paired-samples t test,

only the second, and the third column will be used
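The lecture relies on SPSS for the calculation. As a rough sketch of what the two tests compute, the t values can also be derived by hand in Python. The completion times below are hypothetical, and the independent-samples version assumes equal variances (pooled variance); in practice the same three participants would not serve in both designs.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's t for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def paired_t(a, b):
    """t for paired samples: a[i] and b[i] come from the same participant."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical task completion times in minutes (not real study data).
standard = [10, 12, 14]
prediction = [8, 9, 10]

print(round(independent_t(standard, prediction), 2))  # 2.32
print(round(paired_t(standard, prediction), 2))       # 5.2
```

The same numbers give a larger t in the paired case because differences between participants are removed from the error term.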

17



Interpretation of t-results:

The t test returns a value, t. The larger the absolute value of t, the stronger the evidence against the null hypothesis. In other words, the larger the absolute t value, the more likely it is that the two means are different.

Two-tailed and one-tailed t tests: use a one-tailed test when the hypothesis indicates the direction of the result, and a two-tailed test otherwise.

Example: expecting that the word-prediction software improves the typing speed

Hypothesis: «There is no significant difference in

the task completion time between individuals

who use the word-prediction software and those

who do not use the software.»

Hypothesis: «Individuals who use word-

prediction software can type faster than

those who do not use word-prediction

software.»

18

Design Structure

19

Analysis of variance (ANOVA)

A widely used statistical method to compare the means of two or more groups.

When there are only two means to be compared, the calculation of ANOVA is simplified to t tests

The value returned by ANOVA: omnibus F. Therefore the ANOVA tests are also called «F-tests»

Types:

One-way ANOVA – for between group design

Factorial ANOVA – for between group design

Repeated measures ANOVA – within group design

ANOVA for split-plot design – between + within group design
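The mapping from study design to significance test described so far can be summarised as a small lookup function. This is only a sketch of the lecture's decision rules; the function name and the design labels ("between", "within", "mixed") are hypothetical.

```python
def choose_test(design, n_variables, n_conditions):
    """Map a study design to the significance test named in the lecture.

    design: "between", "within" or "mixed" (hypothetical labels).
    n_variables: number of independent variables.
    n_conditions: number of conditions of the (first) independent variable.
    """
    if design == "mixed":
        return "split-plot ANOVA"
    if n_variables == 1 and n_conditions == 2:
        if design == "between":
            return "independent-samples t test"
        return "paired-samples t test"
    if design == "between":
        return "one-way ANOVA" if n_variables == 1 else "factorial ANOVA"
    return "repeated measures ANOVA"


print(choose_test("between", 1, 3))  # one-way ANOVA
print(choose_test("within", 1, 2))   # paired-samples t test
```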

20


One-way ANOVA

For between-group design, and when investigating only one independent variable with three or more conditions.

Three groups: G1 (Standard), G2 (Prediction), and G3 (Speech-based dictation). Each group will complete the text entry using one of the three methods.

Hypothesis: «There is no significant difference in the task completion time between individuals who use the word-prediction software, those who do not use the software, and those who use speech-based dictation software.»

Control group – G1 (coded as 0),

who use the standard software.
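A sketch of how the omnibus F for this one-way design is built up from between-group and within-group variation. The completion times are hypothetical, and a statistics package would also report the p-value, which is omitted here.

```python
from statistics import mean


def one_way_f(groups):
    """Return the omnibus F for a one-way, between-group design."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation between the group means.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical completion times (minutes) for G1, G2 and G3.
standard, prediction, dictation = [3, 4, 5], [2, 3, 4], [1, 2, 3]
print(one_way_f([standard, prediction, dictation]))  # 3.0
```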

21


Factorial ANOVA Empirical studies that adopt a between-group design and investigate two or

more independent variables.

Hypothesis: «There is no significant difference in the task completion time between individuals who use the word-prediction software, those who do not use the software, and those who use speech-based dictation software, whether they use the software for composition or transcription.»

3 entry methods (standard, prediction, speech-based dictation)

X 2 task types (composition, transcription)

= 6 conditions
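The six conditions are simply every combination of entry method and task type, which can be enumerated as follows (a small illustration, using the factor levels named on the slide).

```python
from itertools import product

methods = ["standard", "prediction", "speech-based dictation"]
tasks = ["composition", "transcription"]

# Every combination of entry method and task type is one condition.
conditions = list(product(methods, tasks))
for method, task in conditions:
    print(method, "+", task)
print(len(conditions))  # 6
```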

22

Analysis of variance (ANOVA)-

Repeated measures, one-way ANOVA

Appropriate for empirical studies that adopt a within-group design

Can investigate one or more variables

One-way: the only independent variable is the entry method

23


Repeated measures, two-way ANOVA

Appropriate for empirical studies that adopt a within-group design

Can investigate one or more variables

Two-way: investigates the entry method and the type of task (composition, transcription)

24

Analysis of variance (ANOVA) –

for split-plot design

Sometimes you may choose a study design that involves both between-group

factors and within-group factors.

In the text-entry study, you may recruit two groups of participants

One group completes transcription tasks using all three data-entry methods

The other group completes composition tasks using all three data-entry methods

The type of task is the between-group factor and the text-entry method is the within-group factor

25


Split-plot ANOVA

Involves both between-group and within group factors

Experiment design

Example: G1 completes transcription using all three data-entry methods. G2

completes composition tasks using all three data-entry methods.

26

Assumptions of t and F tests

Errors should be independent of each other (no systematic biases):

Example: if two investigators conducted the study, and one investigator gives more

detailed instructions to the participants, whilst the other does not, this would introduce

some systematic biases, and hence affect the results.

Errors need to be identically distributed – “homogeneity of variance” (population variances should not differ widely and sample sizes should be of the same order of magnitude)

Example: When multiple group means are compared, the t test or the F test is more

accurate if the variances of the sample population are nearly equal (see the

beginning of the lecture regarding variance and sample)

The errors should be normally distributed (when errors are not normally distributed, use non-parametric tests)

Example: when the sample data is highly skewed

27

A Collaborative Brain-Computer Interface for ALS Patients

Li and Nam, ThinkMind, 2015

Abstract—This study evaluated a SSVEP-based collaborative brain-computer interface (BCI) for people with severe motor disabilities. With ten ALS (amyotrophic lateral sclerosis) patients and 10 age-matched able-bodied participants as control group, effects of collaboration and motor disability were investigated in a robot-control task. In the study, participants were requested to control a robot in a predefined path with their brain signals. Two collaboration modes were developed in the study: individual mode and simultaneous mode. In individual mode, participant performed the task alone. In simultaneous mode, two participants performed together to finish the task. Results revealed significantly better performance in simultaneous mode than individual mode, but no significant effect of motor disability. The study showed promising preliminary results for supporting collaborative work between BCI users with severe motor disabilities. It should provide invaluable empirical data and great insights for future research and system development.

Keywords-brain-computer interface (BCI), steady-state visual evoked potential (SSVEP), amyotrophic lateral sclerosis (ALS), collaboration, motor disabilities.

See the link to the article itself.

28

Identify relationships

Correlation: Two factors are correlated if there is a relationship between them

Example: is there any relationship (correlation) between age, computing experience, and target selection speed? In statistics, two factors are correlated if there is a significant relationship between them.

The most commonly used test for correlation is Pearson’s product moment correlation coefficient test

Pearson’s r: ranges from -1 to 1

Pearson’s r squared represents the proportion of the variance shared by the two variables

29

Correlation

Measures the extent to which two concepts are related

For example, years of university training vs. computer ownership per capita

How?

obtain the two sets of measurements (training and ownership)

calculate correlation coefficient

+1: positively correlated (as one variable increases, the other increases too; a strong, direct relationship)

0: no correlation (no relation)

–1: negatively correlated (as one variable increases, the other decreases; an inverse relationship)

Dangers

attributing causality (a correlation does not imply cause and effect)

the relationship may be due to a third “hidden” (intervening) variable related to both IV and DV (the age and affluence example)

drawing strong conclusion from small numbers (unreliable with small samples, min. 40 subjects)

30

Correlation

Data (14 pairs):

Condition 1: 5 4 6 4 5 3 5 4 5 6 6 7 6 7
Condition 2: 6 5 7 4 6 5 7 4 7 7 6 7 8 9

[Figure: scatter plot of Condition 2 against Condition 1, r2 = .668. Labels: Years of training, Devices owned]
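Pearson’s r for the data on this slide can be computed directly from its definition. The pairing of the 14 values into (condition 1, condition 2) pairs is an assumed reading of the slide’s data table; it is used here because it reproduces the reported r2 = .668.

```python
from math import sqrt

# Assumed reading of the slide's data as (condition 1, condition 2) pairs.
cond1 = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]
cond2 = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]


def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(cond1, cond2)
print(round(r, 3), round(r * r, 3))  # 0.817 0.668
```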

31

Example

Data entry example: text processing with and without prediction. Is there a correlation with years of computer experience?

Run Pearson’s correlation test and interpret the data. r = -0.723 means that there is a correlation, but it is negative: the more experience, the less time needed.

r = -0.468 (with prediction) is not a significant correlation here (statistical significance as usual: a p-value below 0.05 implies significance)

32


Correlation does not imply causal relationship

When observing an online e-commerce site, you may find a correlation between income and performance. It may appear that the higher the income, the poorer the performance.

Here, there may be an intervening variable (age of people with high income)

33


Regression: can investigate the relationship

between one dependent variable (DV) and

multiple independent variables (IVs)

Regression is used for 2 purposes:

Model construction (equation based on IVs

that explain changes in variance of DV)

Quantitative relationship between one DV and a number of

IVs.

Prediction (Selection of IVs that predict DV)

We use a number of factors to predict the value of the

dependent variable, also called the “criterion variable”

Different regression procedures

Simultaneous (DV and group of IVs) – most

common.

Hierarchical (DV and IVs separately) – here

the independent variable will be entered

one at a time

[Diagram: one DV linked to several IVs]

Example: You want to conduct a user study that investigates target selection tasks using a standard mouse. One important variable (DV) you are interested in is the completion time. Factors that you want to investigate to see how much impact they have on task completion time are: target size, distance, computer experience, age etc. Here these factors can be considered the IVs. Two regression analyses are possible:

• Simultaneous, where the IVs (target size, distance, computer experience, age etc.) are considered as a group.

• Hierarchical, where the IVs (target size, distance, computer experience, age etc.) are considered separately, in order to see how each of the factors impacts the DV.

34

Regression

Calculate a line of “best fit”

use the value of one variable to predict the value of the other

Data (14 pairs):

Condition 1: 5 4 6 4 5 3 5 4 5 6 6 7 6 7
Condition 2: 6 5 7 4 6 5 7 4 7 7 6 7 8 9

y = .988x + 1.132, r2 = .668

[Figure: scatter plot of Condition 2 against Condition 1 with the line of best fit]
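The line of best fit can be reproduced with ordinary least squares. As before, reading the slide’s data as 14 (condition 1, condition 2) pairs is an assumption; with that reading the computed slope and intercept agree with y = .988x + 1.132.

```python
def best_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Assumed reading of the slide's data as (condition 1, condition 2) pairs.
cond1 = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]
cond2 = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]

slope, intercept = best_fit(cond1, cond2)
print(round(slope, 3), round(intercept, 3))  # 0.988 1.132

# Use the value of one variable to predict the value of the other.
print(round(slope * 5 + intercept, 2))  # 6.07
```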

35

Parametric and

Nonparametric Tests

Two common non-parametric hypothesis tests, used to analyze categorical data

are

the chi-square test for goodness of fit

the chi-square test for independence.

Chi-square test assumptions:

Data points need to be independent

The sample size should not be too small

36

Parametric and Nonparametric

Tests

The term "non-parametric" refers to the fact that the tests do not require assumptions about population parameters nor do they test hypotheses about population parameters.

Previous examples of hypothesis tests, such as the t-tests and analysis of variance, are parametric tests and they do include assumptions about parameters and hypotheses about parameters.

37

Conditions for parametric tests:

• The data is normally distributed – this condition is usually met if the population has an approximately normal distribution

• The variables should be scaled by intervals, i.e. the distance between two adjacent data units should be equal. Example: when examining the age variable, the distances between 1 and 2, 2 and 3, and 80 and 81 are all equal to each other.

• For tests that compare means of different groups, the variance in the data collected from different groups should be approximately equal.

Conditions for non-parametric tests:

Example 1: one or several of the above conditions are not met, e.g. the data is not normally distributed or not scaled by intervals. Example: when collecting subjective satisfaction about an application (“I am satisfied with the time it took to complete the task. Rate this from 1 to 5, where 1 is strongly disagree, 3 is neutral, and 5 is strongly agree.”). This uses a Likert scale, where the distance between two adjacent data points can be unequal. Example 2: yes-and-no questions.

* Not assumption-free, but fewer assumptions about the data

Non-parametric tests

Non-parametric tests are used when:

The error is not normally distributed

The distances between adjacent data units are not equal

The variance of error is not equal

38

Parametric and Nonparametric Tests

The most obvious difference between the chi-square tests and the other hypothesis tests we

have considered (t and ANOVA) is the nature of the data.

For chi-square, the data are frequencies rather than numerical scores.

Used to conduct significance tests in order to analyze frequency counts.

Example: categorical data (Yes or No) are collected and we need to determine whether there is any relationship in the variables. The results are frequencies rather than numerical data.

Example: examining the impact of age on users’ preferences towards two target selection devices: a mouse and a touch screen. You recruit two groups of users. One group consists of 20 adults younger than 65, and the other of 20 adults aged 65 or older. After completing a number of tests, the participants specify the type of device they prefer to use. You can then run a chi-square test to see if there is indeed any relationship between age and the preference for pointing devices. If you also include a group of children, you can run another chi-square test to see how age relates to the selection of the preferred device.

Chi-square tests are not assumption-free. There are two assumptions that we make. First, the data points are independent: each participant contributes exactly one preference, either the mouse OR the touch screen, but not both and not neither. (Think of radio-button options.)

Second, the sample is not too small: the chi-square test does not work well with small samples. For a robust chi-square test, we need a sample of at least 20.
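A sketch of the chi-square statistic for the age and device-preference example. The frequency counts are hypothetical; the expected count for each cell is row total × column total / grand total, and a statistics package would additionally report the p-value.

```python
def chi_square(table):
    """Chi-square statistic for a table of observed frequency counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows = age group, columns = (mouse, touch screen).
observed = [[15, 5],   # adults younger than 65
            [5, 15]]   # adults aged 65 or older
print(chi_square(observed))  # 10.0
```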

39

Other non-parametric tests

Two groups of data (and assumptions for parametric tests are not met)

For between-group design: Mann–Whitney U test or the Wald–Wolfowitz runs test

For within-group design: Wilcoxon signed ranks test

Three or more groups of data (and assumptions for parametric tests are not met)

For between-group design: Kruskal–Wallis one-way analysis of variance by ranks

For within-group design: Friedman’s two-way analysis of variance test
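As an illustration of how a rank-based test works on ordinal data, the U statistic of the Mann–Whitney U test can be computed by counting, for every pair of scores from the two groups, which one is higher. The ratings below are hypothetical, and the sketch stops at the statistic without deriving a p-value.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: pairs where a scores higher, ties count half."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u


# Hypothetical satisfaction ratings (1 to 5) from two independent groups.
group_a = [4, 5, 5, 3]
group_b = [2, 3, 1, 2]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(u_a, u_b)  # 15.5 0.5
```

The two U values always add up to the number of pairs (here 4 × 4 = 16); the smaller one is usually the value looked up in a table.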

40

Vs.

41

“Statistics is the grammar of science.” (Karl Pearson)

Thanks!

42
