Introduction to applied statistics &
applied statistical methods
Prof. Dr. Chang Zhu
Aims
• Exploring data (descriptive statistics)
central tendency
data distribution (spread/dispersion)
• Testing assumption
normal distribution
population vs. sample
• In reality, we can usually collect data from
only a small subset of the population (a sample).
descriptive vs. inferential
• Descriptive statistics: summarize a data
set
• Inferential statistics: draw conclusions
about the entire population based on a
data set (sample).
descriptive statistics
• Measures of central tendency
(mean, median, mode)
• Measures of spread or dispersion
(range, variance, standard deviation)
measures of central tendency
• A researcher is interested in how much time
a person spends on Facebook (in hours per
week) and how much they socialise with
friends (number of social encounters per
month).
• He comes up with the following data set (adapted from
http://wps.pearsoned.co.uk/ema_uk_he_dancey_statsmath_4/84/21626/5536329.cw/index.html).
measures of central tendency
P_ID   Facebook use (hours per week)   Social encounters (per month)
1      10                              1
2      11                              2
3      11                              3
4      12                              3
5      14                              4
6      15                              9
7      16                              10
measures of central tendency
10 11 11 12 14 15 16
Facebook use (hours per week)
• How many hours do the participants spend on
average? (sum = 89)
• What is the score that occurs most
frequently?
• What is the score that divides the data into 2
equal halves?
measures of central tendency
10 11 11 12 14 15 16
Facebook use (hours per week)
• mean (the average) = 89 / 7 ≈ 12.7
• mode (the most frequent score) = 11
• median (the score that divides the data into 2
equal halves) = 12
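The three measures can be checked outside SPSS. The following is a minimal sketch using Python's standard `statistics` module; the variable name `facebook_hours` is chosen here for illustration and is not part of the lecture material.

```python
import statistics

# Facebook use (hours per week) for the 7 participants in the data set
facebook_hours = [10, 11, 11, 12, 14, 15, 16]

mean = statistics.mean(facebook_hours)      # sum of scores / number of scores = 89 / 7
median = statistics.median(facebook_hours)  # middle score of the sorted data
mode = statistics.mode(facebook_hours)      # most frequent score

print(round(mean, 1), median, mode)
```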
measures of central tendency
– Mean: For normally distributed data, measured on
interval and ratio scales, the appropriate measure
of central tendency is the mean.
– Median: The median is most appropriate for data
measured on ordinal scales (but can still be used
for continuous data)
– Mode: is the appropriate measure of central
tendency for nominal data.
measures of spread
observed   deviance from mean (M = 12.7)   squared deviance
10         -2.7                            7.29
11         -1.7                            2.89
11         -1.7                            2.89
12         -0.7                            0.49
14         1.3                             1.69
15         2.3                             5.29
16         3.3                             10.89
measures of spread
How representative is the mean?
• add up all the squared deviances: sum of
squared errors
→ affected by sample size
• divide by the number of participants minus 1:
variance
• square root the variance: standard deviation
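The three steps above can be sketched in Python as follows. This is an illustration, not SPSS's implementation; it uses the unrounded mean (89 / 7), so the sum of squared deviances differs slightly from a total computed with the rounded value M = 12.7.

```python
import math

scores = [10, 11, 11, 12, 14, 15, 16]  # Facebook use (hours per week)
n = len(scores)
mean = sum(scores) / n

# Step 1: add up all the squared deviances (sum of squared errors)
deviances = [x - mean for x in scores]
sum_of_squares = sum(d ** 2 for d in deviances)

# Step 2: divide by the number of participants minus 1 (variance)
variance = sum_of_squares / (n - 1)

# Step 3: take the square root of the variance (standard deviation)
standard_deviation = math.sqrt(variance)

print(round(sum_of_squares, 2), round(variance, 2), round(standard_deviation, 2))
```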
measures of spread
• Range: the difference between the highest
(maximum) and lowest (minimum) scores.
→ not a very reliable measure: it depends only
on the two most extreme scores and tends to
grow with the length of the data set.
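A short sketch of why the range is fragile: adding a single extreme score changes it drastically. The extra score of 40 is hypothetical and is not part of the lecture's data set.

```python
scores = [10, 11, 11, 12, 14, 15, 16]  # Facebook use (hours per week)

def value_range(values):
    """Difference between the highest and lowest scores."""
    return max(values) - min(values)

print(value_range(scores))         # 16 - 10 = 6
print(value_range(scores + [40]))  # 40 - 10 = 30 (one hypothetical extreme score added)
```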
In SPSS
Analyse > Descriptive Statistics > Frequencies
> Statistics
SPSS output
M = 12.7
SD = 2.29
Check it!
Are the standard deviations correct for gender
and modes of learning (full-time/part-time)
variables?
visualize the distribution
with histogram
Analyse > Descriptive Statistics >
Frequencies > Charts
normal curve
visualize the distribution
with histogram
a histogram with normal
distribution (bell-shaped)
• unimodal (one peak)
• symmetrical
• centered around the mean
skewed distribution
• positively skewed: mode < median < mean
• negatively skewed: mode > median > mean
skewness: measure of symmetry
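As a rough illustration, skewness can be computed as the third standardized moment. This simple moment-based formula is an assumption made for the sketch; SPSS reports a sample-size-adjusted version, so its value will differ somewhat. For the Facebook data, where mode (11) < median (12) < mean (12.7), the sign comes out positive.

```python
def skewness(values):
    """Moment-based skewness: 0 for symmetrical data, > 0 for a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
print(skewness(facebook_hours) > 0)  # True: positively skewed
```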
kurtosis
kurtosis: measures the shape of the “bell”
testing assumption
quantifying normality:
• the Kolmogorov-Smirnov (K-S) test and the
Shapiro-Wilk test compare the scores of our
data set with a normally distributed set of
scores with the same mean and standard
deviation
• p>.05: non-significant
→ not significantly different from normal:
treat as normally distributed
• p<.05: significant
→ significantly different from normal:
non-normal
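To make the idea of "comparing our scores with a normal distribution with the same mean and standard deviation" concrete, the sketch below computes only the K-S distance D, the largest gap between the empirical cumulative distribution of the scores and the fitted normal curve. It does not compute a p-value; that step requires a correction for estimating the mean and standard deviation from the data and is left to SPSS. The function name `ks_distance_from_normal` is hypothetical.

```python
import statistics

def ks_distance_from_normal(scores):
    """Largest gap between the empirical CDF and a normal CDF fitted to the scores."""
    data = sorted(scores)
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        cdf = fitted.cdf(x)
        # At x the empirical CDF steps up from i/n to (i + 1)/n; check both sides
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
print(round(ks_distance_from_normal(facebook_hours), 3))
```

A small D means the scores follow the fitted normal curve closely; the significance test decides whether D is larger than expected by chance for the given sample size.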
normality assumption
• not normally distributed
• outliers: a score which is very different
from the others
• How to identify outliers?
→ Boxplot
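A boxplot typically flags scores that lie more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule with Python's `statistics.quantiles`; the quartile method is an assumption and may differ slightly from the one SPSS uses, and the extra score of 40 is hypothetical.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Scores lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
print(boxplot_outliers(facebook_hours))         # [] (no outliers)
print(boxplot_outliers(facebook_hours + [40]))  # [40]
```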
In SPSS
Graphs > Chart Builder > Boxplot
practice
• using the data file sample data 1.sav
• compute descriptive statistics to explore the
variable named Intrinsic_Motivation_learn:
mean, mode, median
range, variance, standard deviation
• draw a histogram to see the frequency of
scores
practice
• using the data file sample data 1.sav
• conduct the Kolmogorov-Smirnov (K-S) test
and the Shapiro-Wilk test
• Are the scores of the variable
Intrinsic_Motivation_learn normally
distributed?
practice
• using the data file SPSSexam.sav
• conduct the Kolmogorov-Smirnov (K-S) test
and the Shapiro-Wilk test for the variable
exam
• Are the scores of the variable exam
normally distributed?
Assignment 2
• Deadline: October 22, 2014
• Detail:
Lecture 2_practical guidelines_assignment
(p.4)