Introduction to applied statistics & applied statistical methods, Prof. Dr. Chang Zhu

Applied statistics lecture_2

Page 1: Applied statistics lecture_2

Introduction to applied statistics &

applied statistical methods

Prof. Dr. Chang Zhu

Page 2: Applied statistics lecture_2

Aims

• Exploring data (descriptive statistics)

central tendency

data distribution (spread/dispersion)

• Testing assumptions

normal distribution

Page 3: Applied statistics lecture_2

population vs. sample

• In practice, we can usually collect data from only a small subset of the population (a sample).

Page 4: Applied statistics lecture_2

descriptive vs. inferential

• Descriptive statistics: summarize a data

set

• Inferential statistics: draw conclusions about the entire population based on a data set (sample).

Page 5: Applied statistics lecture_2

descriptive statistics

• Measures of central tendency

(mean, median, mode)

• Measures of spread or dispersion

(range, variance, standard deviation)

Page 6: Applied statistics lecture_2

measures of central tendency

• A researcher is interested in how much time a person spends on Facebook (in hours per week) and how much time they spend socialising with friends (number of social encounters per month).

• The researcher collects the following data set (adapted from http://wps.pearsoned.co.uk/ema_uk_he_dancey_statsmath_4/84/21626/5536329.cw/index.html).

Page 7: Applied statistics lecture_2

measures of central tendency

P_ID   Facebook use (hours per week)   Social encounters (per month)
1      10                              1
2      11                              2
3      11                              3
4      12                              3
5      14                              4
6      15                              9
7      16                              10

Page 8: Applied statistics lecture_2

measures of central tendency

10 11 11 12 14 15 16

Facebook use (hours per week)

• How many hours do the participants spend on

average? (sum = 89)

• Which score occurs most frequently?

• What is the score that divides the data into 2

equal halves?

Page 9: Applied statistics lecture_2

measures of central tendency

10 11 11 12 14 15 16

Facebook use (hours per week)

• average = 89 / 7 ≈ 12.7

• most frequent score = 11

• score that divides the data into 2 equal halves = 12

Page 10: Applied statistics lecture_2

measures of central tendency

10 11 11 12 14 15 16

Facebook use (hours per week)

• mean = 12.7

• mode = 11

• median = 12
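The three values above can be checked with a few lines of Python. This is only a sketch using the standard-library `statistics` module; the variable name `facebook_hours` is our own label for the data on the slide.

```python
import statistics

# Facebook use (hours per week) for the seven participants on the slide
facebook_hours = [10, 11, 11, 12, 14, 15, 16]

mean = statistics.mean(facebook_hours)      # sum (89) divided by n (7)
mode = statistics.mode(facebook_hours)      # most frequent score
median = statistics.median(facebook_hours)  # middle score of the sorted data

print(round(mean, 1), mode, median)
```

With an odd number of scores the median is simply the middle value; with an even number, `statistics.median` averages the two middle values.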

Page 11: Applied statistics lecture_2

measures of central tendency

– Mean: for normally distributed data measured on interval or ratio scales, the appropriate measure of central tendency is the mean.

– Median: the median is most appropriate for data measured on ordinal scales (but can still be used for continuous data, for example when the distribution is skewed).

– Mode: the mode is the appropriate measure of central tendency for nominal data.

Page 12: Applied statistics lecture_2

measures of spread

Observed score   Deviance from mean (M = 12.7)   Squared deviance
10               -2.7                            7.29
11               -1.7                            2.89
11               -1.7                            2.89
12               -0.7                            0.49
14                1.3                            1.69
15                2.3                            5.29
16                3.3                            10.89

Page 13: Applied statistics lecture_2

measures of spread

How representative is the mean?

• add up all the squared deviances: sum of squared errors

→ affected by sample size (it keeps growing as more scores are added)

• divide the sum of squared errors by the number of participants minus 1: variance

• take the square root of the variance: standard deviation
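The three steps above can be sketched in Python for the Facebook data. The variable names are our own, and the code uses the unrounded mean rather than M = 12.7, so intermediate values can differ slightly from a hand calculation with rounded deviances.

```python
import math

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
n = len(facebook_hours)
mean = sum(facebook_hours) / n

# Step 1: sum of squared deviances from the mean (sum of squared errors)
sum_of_squares = sum((x - mean) ** 2 for x in facebook_hours)

# Step 2: divide by n - 1 to get the sample variance
variance = sum_of_squares / (n - 1)

# Step 3: take the square root to get the standard deviation
sd = math.sqrt(variance)

print(round(sum_of_squares, 2), round(variance, 2), round(sd, 2))
# prints: 31.43 5.24 2.29
```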

Page 14: Applied statistics lecture_2

measures of spread

• Range: the difference between the highest (maximum) and lowest (minimum) scores.

→ not a very reliable measure: it depends only on the two most extreme scores and tends to increase with the size of the data set.
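A small sketch of why the range is fragile. The helper name `value_range` and the extra score of 40 are hypothetical, added only for illustration; they are not part of the lecture data.

```python
facebook_hours = [10, 11, 11, 12, 14, 15, 16]

def value_range(scores):
    """Difference between the highest and the lowest score."""
    return max(scores) - min(scores)

print(value_range(facebook_hours))   # 16 - 10 = 6

# Hypothetical extra participant (not in the slides) with an extreme score
with_extreme = facebook_hours + [40]
print(value_range(with_extreme))     # 40 - 10 = 30
```

A single additional score changes the range from 6 to 30, although the other seven scores are unchanged.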

Page 15: Applied statistics lecture_2

In SPSS

Analyse > Descriptive Statistics > Frequencies > Statistics

Page 16: Applied statistics lecture_2

SPSS output

M = 12.7

SD = 2.29

Page 17: Applied statistics lecture_2

Check it!

Is the standard deviation a meaningful statistic for the gender and mode of learning (full-time/part-time) variables?

Page 18: Applied statistics lecture_2

visualize the distribution

with histogram

Analyse > Descriptive Statistics > Frequencies > Charts
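Outside SPSS, the idea of a histogram (how often each score occurs) can be sketched as a simple text chart. This is an illustration only, using the Facebook data from the earlier slides and one bar per whole-hour score; it is not a reproduction of the SPSS chart.

```python
from collections import Counter

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
counts = Counter(facebook_hours)  # score -> frequency

# One row per possible score between the minimum and the maximum,
# so that scores nobody obtained (e.g. 13) show up as empty bars
for score in range(min(facebook_hours), max(facebook_hours) + 1):
    print(f"{score:>2} | {'#' * counts[score]}")
```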

Page 19: Applied statistics lecture_2

visualize the distribution with histogram

[Figure: histogram with a superimposed normal curve]

Page 20: Applied statistics lecture_2

a histogram with normal

distribution (bell-shaped)

• unimodal (one peak)

• symmetrical

• centered around the mean

Page 21: Applied statistics lecture_2

skewed distribution

positively skewed: mode < median < mean

negatively skewed: mode > median > mean

skewness: a measure of the asymmetry of the distribution
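As a rough illustration, skewness can be computed from the central moments of the data. The function below is a sketch of the simple moment-based formula; statistical packages often apply a small-sample correction, so their values may differ slightly. The `symmetric` list is made-up comparison data.

```python
def skewness(scores):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
social_encounters = [1, 2, 3, 3, 4, 9, 10]
symmetric = [1, 2, 3, 4, 5]  # made-up, perfectly symmetric data

for name, data in [("facebook", facebook_hours),
                   ("social", social_encounters),
                   ("symmetric", symmetric)]:
    print(name, round(skewness(data), 2))
```

A positive value indicates a tail towards the high scores, which matches the Facebook data, where mode (11) < median (12) < mean (12.7). A value of zero indicates a symmetric distribution.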

Page 22: Applied statistics lecture_2

kurtosis

kurtosis: a measure of the shape of the “bell” (how peaked or flat the distribution is, and how heavy its tails are)
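A minimal sketch of excess kurtosis, again using the simple moment-based formula (packages may apply a small-sample correction). A normal distribution has an excess kurtosis of about 0. Both example data sets are made up for illustration.

```python
def excess_kurtosis(scores):
    """Fourth central moment / (second central moment ** 2), minus 3."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m4 = sum((x - mean) ** 4 for x in scores) / n
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]          # made-up: every score equally frequent
peaked = [-10, 0, 0, 0, 0, 0, 0, 0, 10]     # made-up: sharp peak with extreme tails

print(round(excess_kurtosis(flat), 2))    # negative: flatter than normal
print(round(excess_kurtosis(peaked), 2))  # positive: more peaked / heavier tails
```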

Page 23: Applied statistics lecture_2

testing assumption

quantifying the departure from a normal distribution:

• the Kolmogorov-Smirnov (K-S) test and the Shapiro-Wilk test compare the scores of our data set with a normally distributed set of scores with the same mean and standard deviation

• p > .05: non-significant

→ not significantly different from normal: treat the scores as normally distributed

• p < .05: significant

→ significantly different from normal: treat the scores as non-normal
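To make the comparison concrete, the sketch below computes only the K-S test statistic D: the largest gap between the cumulative distribution of the observed scores and that of a normal distribution with the same mean and standard deviation. It does not compute a p-value (that requires tables or a statistics package, and SPSS applies its own significance correction), and it does not cover the Shapiro-Wilk test. The function name `ks_statistic` is our own.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(scores):
    """Largest gap between the empirical CDF of the scores and the CDF of a
    normal distribution with the same mean and (sample) standard deviation."""
    data = sorted(scores)
    n = len(data)
    normal = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = normal.cdf(x)
        # Compare the normal CDF with the empirical CDF just after x (i / n)
        # and just before x ((i - 1) / n), and keep the largest gap seen
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
print(round(ks_statistic(facebook_hours), 3))
```

The larger D is, the further the data are from the matching normal distribution; the test then asks whether a gap of that size is plausible for a sample of this size.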

Page 24: Applied statistics lecture_2

normality assumption

• not normally distributed

• outlier: a score that is very different from the rest of the data

• How to identify outliers?

→ Boxplot
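A boxplot typically flags scores that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule with the standard library. Quartile definitions vary between packages, so the exact cut-offs may differ from those SPSS uses, and the extra score of 40 is hypothetical.

```python
import statistics

def boxplot_outliers(scores, k=1.5):
    """Return the scores lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(scores, n=4)  # quartiles Q1, Q2, Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in scores if x < lower or x > upper]

facebook_hours = [10, 11, 11, 12, 14, 15, 16]
print(boxplot_outliers(facebook_hours))   # [] : no score is flagged

# Hypothetical extra participant (not in the slides) with an extreme score
with_extreme = facebook_hours + [40]
print(boxplot_outliers(with_extreme))     # [40]
```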

Page 25: Applied statistics lecture_2

In SPSS

Graphs > Chart Builder > Boxplot

Page 26: Applied statistics lecture_2

In SPSS

Graphs > Chart Builder > Boxplot

Page 27: Applied statistics lecture_2

practice

• using the data file sample data 1.sav

• explore the variable named Intrinsic_Motivation_learn with descriptive statistics:

mean, mode, median

range, variance, standard deviation

draw a histogram to see the frequency of scores

Page 28: Applied statistics lecture_2

practice

• using the data file sample data 1.sav

• conduct the Kolmogorov-Smirnov (K-S) test

and the Shapiro-Wilk test

• Are the scores of the variable

Intrinsic_Motivation_learn normally

distributed?

Page 29: Applied statistics lecture_2

practice

• using the data file SPSSexam.sav

• conduct the Kolmogorov-Smirnov (K-S) test

and the Shapiro-Wilk test for the variable

exam

• Are the scores of the variable exam

normally distributed?

Page 30: Applied statistics lecture_2

Assignment 2

• Deadline: October 22, 2014

• Detail:

Lecture 2_practical guidelines_assignment

(p.4)