Upload
others
View
8
Download
0
Embed Size (px)
Citation preview
Statistical
analysisDIANA SAPLACAN
2017
* SLIDES ADAPTED – BASED ON LECTURE NOTES BY ALMA LEORA CULEN
Vs.
2
Background
There are different types of research methods to study behaviour:
Descriptive: observations, focus groups, field studies, interviews
Relational: correlation analysis – observations, field studies, surveys
Experimental: the only that gives possibility to discover causal
relationships, controlled experiments
To analyse the data is more demanding than collecting it
Relies on critical decisions
3
Today…
Independent samples t-test, paired samples t-test, one way analysis of variance
(ANOVA), factorial ANOVA, repeated measures ANOVA, correlation, regression,
and chi-square test
Focus: not on the mathematics, but on the context of use
Overall learning outcome for this lecture: to know how to choose appropriate
statistical analysis
4
Learning outcomes INF2260 - again 5
Preparing data for statistical
analysis
Data from lab-based experiments, usability tests, field studies,
surveys, and other channels need to be carefully processed before
any statistical analysis can be conducted (PRE-PROCESSING)
WHY?
6
Preparing data for statistical analysis -
Cleaning up data
1. Screen the data for possible errors – especially important for the
manually entered data by participants:
«To err is human. All people make mistakes. (Norman, 1988)»
Examples?
2. Compare data from online based questionnaires with paper based
questionnaires. Any inconsistencies regarding the same participant?
3. Correct the errors where possible. Sometimes you can also remove problematic values (treat them as «missing values»/null)
7
Preparing data for statistical analysis –
Coding data
Age Gender Highest degree Previous experience in
software A
Participant 1 34 male College Yes
Participant 2 28 female Graduate No
Participant 3 21 female High school No
Age Gender Highest degree Previous
experience in
software A
Participant 1 34 1 2 1
Participant 2 28 0 3 0
Participant 3 21 0 1 0
8
Preparing data for statistical analysis –
Descriptive statistics
After the data is cleaned up, run some descriptive statistical tests to
understand the nature of your data set.
Range where the data points fall + distribution
Means, medians, modes, variances, std deviations
9
Preparing data for statistical analysis –
Descriptive statistics – some terminology Measure of central tendency – where the bulk of data is set?
Characteristics: mean, median, mode
Mean: arithmetic average SUM (all) / how_many
Example: Let it be group 1 G1, and another group, group 2 G2, where we look at the ages of each participant
G1 = {15, 19, 22, 29, 33, 45, 50}
G2 = {12, 15, 22, 22, 22, 34, 34}
Mean G1: 30; Mean G2= 23
Median is the middle score in a data set.
Median G1: 15<19<22<29<33<45<50
Median G2?
Mode: the value that occurs most.
G2: 22.
10
Preparing data for statistical analysis –
Descriptive statistics – some terminology
Measures of spread: how the data deviates from the
center of the data set
Range: the distance between the highest and lowest values
G1 = {15, 19, 22, 29, 33, 45, 50} , 50-15 = 35
Variances: the variance of a data set is the mean of the squared distances of all the scores from the mean of the data set.
G1 = {15, 19, 22, 29, 33, 45, 50}
Mean G1 = 30.
Variance = [ 30 − 15 ^2 + 30 − 19 ^2 + 30 − 22 ^2 + 30 − 29 ^2 +30 − 33 ^2 + 30 − 45 ^2 + 30 − 50 ^2]/ 7 =
[(15^2 + 11^2 + 8^2 +1^2 + (-3)^2 + (-15)^2 + (-20)^2)]/ 7 =
= (225+121+64+1+9+225+400) /7 = 1045/7 = 149
Standard deviations: the square root of the variance.
Square root (149) 21,28
Sample population take fewer values to calculate the
variance. Divide by (n-1), instead of n when calculating it.
11
Preparing data for statistical analysis –
Descriptive statistics – some terminology
Normal distribution – defined by the mean and the standard
deviation
Bell-shaped, but not always
12
Design Structure 13
Preparing data for statistical analysis –
Comparing means
When involving multiple groups, the ultimate objective of the researchers is to find
out whether there is any difference between the conditions or groups.
Example:
We talked about G1 and G2 representing ages. Use statistical significance tests. (BETWEEN GROUP)
Compare the values of the elements in G1. (WITHIN GROUP)
What if G1, would be a lot of paired elements (age+gender).
G1 = {(15,1) (19,0), (22,1), (29,1), (33,1), (45,0,) (50,1)}. 1 = male, 0 = female.
Common tests:
t-tests and analysis of
variance - ANOVA
14
Preparing data for statistical analysis –
Comparing means
A t test is a simplified analysis of variance involving only two groups or conditions.
Independet-samples t test
Paired-samples t test
Analysis of variance (ANOVA) tests:
One-way ANOVA
Factorial ANOVA
Repeated measures ANOVA
Split-plot ANOVA
Major types of empirical study regarding design methodology and the appropriate significance test for each design
15
Preparing data for statistical analysis –
Comparing means, t-test
«The most widely adopted statistical procedure for comparing two
means is the t test.» (Rosental and Rosnow, 2008)
1) Independent – samples t test
2) Paired-samples t test
Hypothesis: «There is no significant difference in the task completion
time between individuals who use the word-prediction software and
those who do not use the software.»
16
Preparing data for statistical analysis –
Comparing means, t-test
Hypothesis: «There is no significant difference in the task completion time between individuals who use the word-prediction software and those who do not use the software.»
1) Independent – samples t test
Example: two groups of participants, G1 and G2.
G1 – text using standard word processing software
G2 – text using word prediction software
2) Paired-samples t test
G1 + G2 is one group, G. Each participant completes tests under both conditions (i.e. using word processing sw, and using word predicition sw). Data points here will point to the same participant
For 1) When using SPSS to run independent
samples t test, only the third and the fourth
column will be used
For 2) When using SPSS to run paired-samples t test,
only the second, and the third column will be used
17
Preparing data for statistical analysis –
Comparing means, t-test
Interpretation of t-results:
t test returns a value, t; with larger t
values suggesting higher probability of
the null hypothesis being false. In other
words, the higher t value, the more likely
the two means are different.
Two-tailed t tests and one-tailed tests:
Hypothesis indicates the direction of the
results.
Example: expecting that the word-
prediction software improves the typing
speed
Hypothesis: «There is no significant difference in
the task completion time between individuals
who use the word-prediction software and those
who do not use the software.»
Hypothesis: «Individuals who use word-
prediction software can type faster than
those who do not use word-prediction
software.»
18
Design Structure 19
Analysis of variance (ANOVA)
A widely used statistical method to compare the means of two or more groups.
When there are only two means to be compared, the calculation of ANOVA is simplified to t tests
The value returned by ANOVA: omnibus F. Therefore the ANOVA tests are also called «F-tests»
Types:
One-way ANOVA – for between group design
Factorial ANOVA – for between group design
Repeated measures ANOVA – within group design
ANOVA for split-plot design – between + within group design
20
Analysis of variance (ANOVA)
One-way ANOVA
For between-group design, and when investigating only one independent variable with three or more conditions.
Three groups: G1 (Standard), G2 (Prediction), and G3 (Speech-based dictation). Each group will complete the text entry using one of the three methods.
Hypothesis: «There is no significant difference in the task completion
time between individuals who use the word-prediction software,
those who do not use the software, and those who use speech-
based diction software.»
Control group – G1 (coded as 0),
who use the standard software.
21
Analysis of variance (ANOVA)
Factorial ANOVA Empirical studies that adopt a between-group design and investigate two or
more independent variables.
Hypothesis: «There is no significant difference in the task completion time between individuals who use the word-prediction software, those who do not use the software, and those who use speech-based diction software, whether or not they use the softwares for composition or transcription.»
3 conditions (standard, prediction, speech-based dictation)
X 2 type variables (composition, transcription)
= 6
22
Analysis of variance (ANOVA)-
Repeated measures, one-way ANOVA
Apropriate for empirical studies that adopt a within-group design
Can investigate one or more variables
One-way ANOVA:
Decided by
the entry-method
23
Analysis of variance (ANOVA)-
Repeated measures, two-ways ANOVA
Apropriate for empirical
studies that adopt a within-
group design
Can investigate one or more
variables
Two-ways method:
Investigate the entry-method
and the type of task
(composition, transcription)
24
Analysis of variance (ANOVA) –
for split-plot design
Sometimes you may choose a study design that involves both between-group
factors and within-group factors.
In the text-entry study, you may recruit two groups of participants
One group completes transcription tasks using all three data-entry methods
The other group completes composition tasks using all three data-entry methods
The type of task is between-group factor and the text-entry method is within group
factor
25
Analysis of variance (ANOVA)-
Split-plot ANOVA
Involves both between-group and within group factors
Experiment design
Example: G1 completes transcription using all three data-entry methods. G2
completes composition tasks using all three data-entry methods.
26
Assumptions of t and F tests
Errors should be independent of each other (no systematic biases):
Example: if two investigators conducted the study, and one investigator gives more
detailed instructions to the participants, whilst the other does not, this would introduce
some systematic biases, and hence affect the results.
Errors need to be identically distributed – “homogeneity of variance”(population variances should not differ widely and sample sizes should be of the same order of magnitude)
Example: When multiple group means are compared, the t test or the F test is more
accurate if the variances of the sample population are nearly equal (see the
beginning of the lecture regarding variance and sample)
The errors should be normally distributed (when errors are not normally distributed, use non-parametric tests)
Example: when the sample data is highly skewed
27
A Collaborative Brain-Computer
Interface for ALS PatientsLi and Nam, ThinkMind, 2015
Abstract—This study evaluated a SSVEP-based collaborative brain-computer interface (BCI) for people with severe motor disabilities. With ten ALS (amyotrophic lateral sclerosis) patients and 10 age-matched able-bodied participants as control group, effects of collaboration and motor disability were investigated in a robot-control task. In the study, participants were requested to control a robot in a predefined path with their brain signals. Two collaboration modes were developed in the study: individual mode and simultaneous mode. In individual mode, participant performed the task alone. In simultaneous mode, two participants performed together to finish the task. Results revealed significantly better performance in simultaneous mode than individual mode, but no significant effect of motor disability. The study showed promising preliminary results for supporting collaborative work between BCI users with severe motor disabilities. It should provide invaluable empirical data and great insights for future research and system development.
Keywords-brain-computer interface (BCI), steady-state visual evoked potential (SSVEP), amyotrophic lateral sclerosis (ALS), collaboration, motor disabilities.
See the link to the article itself.
28
Identify relationships
Correlation: Two factors are correlated if there is a relationship between them
Example: is there any relationship (correlation) between age, computing experience, and target selection speed? In statistics, two factors are correlated if there is a significant relationship between them.
Most commonly used test for correlation is the Pearson’s product moment correlation coefficient test
Pearson’s r: ranges between -1 to 1
Pearson’s r square represents the proportion of the variance shared by the two variables
29
CorrelationMeasures the extent to which two concepts are related
For example, years of university training vs. computer ownership per capita
How?
obtain the two sets of measurements (training and ownership)
calculate correlation coefficient
+1: positively correlated (both variables increases – strong correlation “directly proportional”)
0: no correlation (no relation)
–1: negatively correlated (a relationship between variables, where as one variable increases, the other decreases “indirectly proportional”)
Dangers
attributing causality (a correlation does not imply cause and effect)
cause may be due to a third “hidden” (intervening) variable related to both IV and DV, age affluence example
drawing strong conclusion from small numbers (unreliable with small samples, min. 40 subjects)
30
Correlation
5 64 56 74 45 63 55 74 45 76 76 67 76 87 9
Years of training
3
4
5
6
7
8
9
10
2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5
Condition 1
r2 = .668
Co
nd
ition
2
Devices owned
31
Example
Data entry example:Text-processing with and without predictionIs there a correlation with years of computer experience?
Do Pearson’s correlation test. Interpret data. -0.723 means that there is a correlation, but it is negative – more experience less time.
-0.468 (with prediction) implies no significant correlation(statistical significance as usual: bellow 0.05 implies significance)
32
Identify relationships
Correlation does not imply causal relationship
Observing online e-commerce site, you may find that there is correlation between income and performance.It may appear that the higher the income is, the poorer the performance.
Here, there may be an intervening variable (age of people with high income)
33
Identify relationships
Regression: can investigate the relationship
between one dependent variable (DV) and
multiple independent variables (IVs)
Regression is used for 2 purposes:
Model construction (equation based on IVs
that explain changes in variance of DV)
Quantitative relationship between one DV and a number of
IVs.
Prediction (Selection of IVs that predict DV)
We use a number of factors to predict the value of the
dependent variable, also called the “criterion variable”
Different regression procedures
Simultaneous (DV and group of IVs) – most
common.
Hierarchical (DV and IVs separately) – here
the independent variable will be entered
one at a time
1 DV
IV IVIV
Example: You want to conduct a user study that investigates target selection tasks using a standard mouse. One important variable (DV) you are interested in is the completition time. Factors that you want to investigate to see how much impact they have on task completition time are: target size, distance, computer experience, age etc. Here these factors can be considered the IVs. Two regression analyses are possible:
• Simulatenous, where the IVs (target size, distance, computer experience, age etc.) are considered as a group.
• Hierarchical, where the IVs (target size, distance, computer experience, age etc.) are considered separately, in order to see how each of the factors impact the DV.
34
Regression Calculate a line of “best fit”
use the value of one variable to predict the value of the other
3
4
5
6
7
8
9
10
3 4 5 6 7Condition 1
y = .988x + 1.132, r2 = .668y = .988x + 1.132, r2 = .668
654 56 74 45 63 55 74 45 76 76 67 76 87 9
condition 1 condition 2
Co
nd
itio
n 2
35
Parametric and
Nonparametric Tests
Two common non-parametric hypothesis tests, used to analyze categorical data
are
the chi-square test for goodness of fit
the chi-square test for independence.
CHI-square test assumptions:
Data points need to be independent
The sample size should not be too small
36
Parametric and Nonparametric
Tests
The term "non-parametric" refers to the fact that the tests do not require assumptions about population parameters nor do they test hypotheses about population parameters.
Previous examples of hypothesis tests, such as the t-tests and analysis of variance, are parametric tests and they doinclude assumptions about parameters and hypotheses about parameters.
37Conditions for parametric tests:
• The data is normally distributed – the conditions is usually met if the population has an approximately normal distribution
• The variables should be scaled by intervals, i.e. the distance between two adjacent data units should be equal. Example: when examining the age variable, the distances between 1 and 2, 2 and 3, and 80 and 81 are all equal to each other.
• For tests that compare means of different groups, the variance in the data collected in the data collected from different groups should be approximately equal.
Conditions for non-parametric tests:
Example 1: one or several of the above conditions are not met no normal distribution. Example: when collecting subjective satisfaction about an application (“I am satisfied with the time it took to complete the task. Rate this from 1 to 5, where 1 is highly disagree, 3 is neutral, and 5 is highly agree.”). This can use Likert scale. The distance between two adjacent data points can be unequal. Example 2: yes and no-questions.
* Not assumption-free, but fewer assumptions about the data
Non-parametric tests
Non-parametric tests are used when:
The error is not normally distributed
The distances between any two data units are not equal
The variance of error is not equal
38
Parametric and Nonparametric Tests The most obvious difference between the chi-square tests and the other hypothesis tests we
have considered (t and ANOVA) is the nature of the data.
For chi-square, the data are frequencies rather than numerical scores.
Used to conduct significance tests in order to analyze frequency counts.
Example: categorical data (Yes or No) are collected and we need to determine whether there is any relationship in the variables. The results are frequencies rather than numerical data.
Example: examining the impact of age on users’ preferences towards two target selection devices: a mouse and a touch screen. You recruit two groups of users. One group consists of 20 adults, younger than 65, and the other of 20 adults older or equal to 65. After completing a number of tests, the participants specify the type of device they prefer to use. You can then run a Chi-test to see if there is indeed any relationship between age and the preference for pointing devices. If you will also include a group of children, then you can run another Chi-test to see how the age relates to the selection of the preferred device.
Chi-tests are not assumption free. There are two-assumptions that we make: the participant either prefers the mouse OR the touch screen, but not both or none of them at the same time. (Think about the Radio buttons-options).
Chi-test does not work well if the sample is too small. For a robust Chi-square, we need a sample of at least 20.
39
Other non-parametric tests
Two groups of data (and assumptions for parametric tests are not met)
For between-group design: Mann–Whitney U test or the Wald–Wolfowitz runs test
For within-group design: Wilcoxon signed ranks test
Three or more groups of data (and assumptions for parametric tests are not met)
For between-group design: Kruskal–Wallis one-way analysis of variance by ranks
For within-group design: Friedman’s two-way analysis of variance test
40
Vs.
41
“Statistics is the grammar of science.” (Karl Pearson)
Thanks!
42