Lecture Week 4 Inspecting Data: Distributions Introduction to Research Methods & Statistics 2013 2014 Hemmo Smit

Lecture Week 4 - Universiteit Leiden...Overview Descriptive research Describing and presenting data Frequency distributions Graphical displays (1) Measures of Central Tendency and



Lecture Week 4 Inspecting Data: Distributions

Introduction to Research Methods & Statistics

2013 – 2014

Hemmo Smit


So next week

No lecture & workgroups

But…

Practice Test on-line (BB)

Enter data for your own research

Practice SPSS skills with own data


Overview

Descriptive research

Describing and presenting data

Frequency distributions

Graphical displays (1)

Measures of Central Tendency and Variability

Graphical displays (2): Boxplots

Read:

Leary: Chapter 6

Howell: Chapter 2


Types of descriptive research

- Survey research: attitudes, lifestyles, behaviors, problems

- Demographic research: patterns of basic life events (birth, marriage, migration, death)

- Epidemiological research: occurrence of disease and death


3 types of surveys

- Cross-sectional: one-shot “cross-section” of the population

- Successive independent samples: changes over time, different respondents each time
  ! Are the samples comparable?

- Longitudinal (panel survey design): changes over time, same respondents more than once
  ! Drop-out


Describing and presenting data

3 criteria for a good description:

1) Accurate

2) Concise

3) Comprehensible

Data can be presented in numerical and graphical format

Beware: Scale of measurement?!?

TIP: Always start with graphs

Trade-off

- Loss of information

- Possible distortion


How to describe a distribution?


A) Overall pattern

1) Shape

- number of peaks (uni-, bi- or multi-modal)?

- symmetrical or skewed?

2) Central tendency / Location: midpoint

3) Spread: a little or a lot?

B) Deviations from the pattern

- Outliers: observations that lie far from the majority

- Tails: thick or thin?


Frequency distributions: Example

How do children recall stories?

Respondents: 25 children

Task: Tell researcher about a movie

Dependent variable: number of “and then…” statements

(see Howell, Exercise 2.1, p.55)


Raw data and frequency distributions

Table 1. Number of ‘and then’ statements (raw data)

18 17 16 18 15
15 18 16 20 18
22 20 17 21 17
19 17 21 20 19
18 12 23 20 20

Table 2. Number of ‘and then’ statements (frequency distribution)

Score    f     P
12       1  0.04
15       2  0.08
16       2  0.08
17       4  0.16
18       5  0.20
19       2  0.08
20       5  0.20
21       2  0.08
22       1  0.04
23       1  0.04
Total   25  1.00
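As a minimal sketch of how the frequency table follows from the raw scores, the Python snippet below counts each score (f) and divides by n to get the proportion (P). The function name `frequency_table` is a hypothetical helper chosen for illustration; the lecture itself uses SPSS for this step.

```python
from collections import Counter

# Raw data: the 25 scores from the 'and then' example.
scores = [18, 17, 16, 18, 15,
          15, 18, 16, 20, 18,
          22, 20, 17, 21, 17,
          19, 17, 21, 20, 19,
          18, 12, 23, 20, 20]


def frequency_table(data):
    """Return (score, f, P) rows sorted by score, where P = f / n."""
    n = len(data)
    counts = Counter(data)
    return [(score, f, f / n) for score, f in sorted(counts.items())]


table = frequency_table(scores)
for score, f, p in table:
    print(f"{score:>5} {f:>3} {p:.2f}")
print(f"Total {sum(f for _, f, _ in table):>3} {sum(p for _, _, p in table):.2f}")
```

The absolute frequencies sum to n and the proportions sum to 1, which is a quick check that no score was dropped.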


Absolute and relative frequencies

Absolute frequency (f)

= Number of respondents with a given score

Disadvantage: hard to interpret / compare

Relative frequency (P)

= Proportion of the total with a given score (P = f / n)

Advantage: easy to interpret

Note:

0 ≤ P ≤ 1

P × 100 = percentage (%)


SPSS: Frequencies - Menu

Analyze > Descriptive Statistics > Frequencies


SPSS: Frequencies – Dialog box


SPSS: Frequencies - Output


Grouped frequency distribution (1)

Simple frequency distributions become unclear with:

- a small number of participants in each category and/or

- variables with many categories

Solution: grouped frequency table

Distribute the raw data over K class intervals and make a new frequency distribution

Make sure all intervals are:

- exhaustive and mutually exclusive

- of equal width


Grouped frequency distribution (2)

Score    f     P
12-14    1  0.04
15-17    8  0.32
18-20   12  0.48
21-23    4  0.16
Total   25  1.00

Rule 1: number of classes (K) = √n

Rule 2: class interval width (I) = range / number of classes

(Range (R) = highest score – lowest score)

In our example

Number of intervals = √25 = 5

Range = 23 – 12 = 11

Interval width = 11 / 5 = 2.2, so use 2 or 3 (the grouped table above uses width 3, which gives 4 classes)
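The two rules of thumb and the grouping step can be sketched in Python as follows. This is only an illustration under the assumption of integer scores and intervals that start at the lowest score; `grouped_frequency_table` is a hypothetical helper, not an SPSS function.

```python
import math

scores = [18, 17, 16, 18, 15,
          15, 18, 16, 20, 18,
          22, 20, 17, 21, 17,
          19, 17, 21, 20, 19,
          18, 12, 23, 20, 20]


def grouped_frequency_table(data, width, start=None):
    """Count integer scores in equal-width classes [lower, lower + width - 1].

    Classes are mutually exclusive and together cover every score.
    """
    lower = min(data) if start is None else start
    n = len(data)
    rows = []
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, f, f / n))
        lower += width
    return rows


k = round(math.sqrt(len(scores)))        # Rule 1: K = sqrt(n)
data_range = max(scores) - min(scores)   # R = highest score - lowest score
suggested_width = data_range / k         # Rule 2: I = R / K

rows = grouped_frequency_table(scores, width=3)
for lower, upper, f, p in rows:
    print(f"{lower}-{upper} {f:>3} {p:.2f}")
```

With a width of 3 the sketch yields four classes rather than the five suggested by Rule 1, which shows that the rules are starting points rather than strict requirements.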


SPSS: Grouped frequency distribution (1)


SPSS: Grouped frequency distribution (2)


SPSS: Grouped frequency distribution (3)


SPSS: Grouped frequency distribution (4)


SPSS: Grouped frequency distribution (5)


Cumulative frequency distributions (1)

Real lower limit = lower limit – 0.5

Real upper limit = upper limit + 0.5

Midpoint = (upper limit + lower limit) / 2

Class interval   Real lower limit   Real upper limit   Midpoint    f     P     F
12-14            11.5               14.5               13
15-17            14.5               17.5               16
18-20            17.5               20.5               19
21-23            20.5               23.5               22
Total


Cumulative frequency distributions (2)

F = Cumulative Relative Frequency (CRF): the proportion for the interval plus all previous proportions.

Class interval   Real lower limit   Real upper limit   Midpoint    f     P     F
12-14            11.5               14.5               13          1  0.04
15-17            14.5               17.5               16          8  0.32
18-20            17.5               20.5               19         12  0.48
21-23            20.5               23.5               22          4  0.16
Total                                                             25  1.00


Cumulative frequency distributions (3)

NB. Also possible: cumulative absolute frequency

Class interval   Real lower limit   Real upper limit   Midpoint    f     P     F
12-14            11.5               14.5               13          1  0.04  0.04
15-17            14.5               17.5               16          8  0.32  0.36
18-20            17.5               20.5               19         12  0.48  0.84
21-23            20.5               23.5               22          4  0.16  1.00
Total                                                             25  1.00
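The columns of this table can be reproduced with a short Python sketch. It assumes integer class limits, so the real limits lie 0.5 below and above them; the dictionary keys are names chosen here for illustration only.

```python
from itertools import accumulate

# Class intervals and absolute frequencies from the grouped frequency table.
intervals = [(12, 14), (15, 17), (18, 20), (21, 23)]
freqs = [1, 8, 12, 4]
n = sum(freqs)

# Cumulative absolute frequency: running total of f.
cum_f = list(accumulate(freqs))

rows = []
for (lower, upper), f, cf in zip(intervals, freqs, cum_f):
    rows.append({
        "interval": f"{lower}-{upper}",
        "real_lower": lower - 0.5,        # real lower limit
        "real_upper": upper + 0.5,        # real upper limit
        "midpoint": (lower + upper) / 2,  # (upper + lower) / 2
        "f": f,
        "P": f / n,                       # relative frequency
        "F": cf / n,                      # cumulative relative frequency
    })

for r in rows:
    print(r["interval"], r["real_lower"], r["real_upper"],
          r["midpoint"], r["f"], f'{r["P"]:.2f}', f'{r["F"]:.2f}')
```

Dividing the cumulative absolute frequency by n, instead of adding the rounded proportions one by one, avoids accumulating rounding error in F.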


Cumulative frequency distributions (4)


The cumulative relative frequency polygon graphs the probability that someone has a score of X or lower.


Graphical displays: Nominal / Ordinal

[Figure: bar charts (x-axis: score, y-axis: Count) and pie charts of the same scores, shown for raw data (scores 2 to 9) and for grouped data (classes 2-3, 4-5, 6-7, 8-9)]


Graphical displays: Interval

[Figure: histogram]

Stem & Leaf Display

 Freq.    Stem &  Leaf

  1,00  Extremes    (=<12,0)
  2,00       15 .  00
  2,00       16 .  00
  4,00       17 .  0000
  5,00       18 .  00000
  2,00       19 .  00
  5,00       20 .  00000
  2,00       21 .  00
  1,00       22 .  0
  1,00       23 .  0

 Stem width: 1
 Each leaf:  1 case(s)


Histogram – symmetrical or skewed?

[Figure: three histograms labelled negatively skewed, symmetrical and positively skewed]


SPSS: Graphs – Chart Builder / Legacy Dialogs


SPSS: Graphs > Legacy Dialogs


SPSS - Graphs > Chart builder


Measures of central tendency

1. Mode (Mo) = most common score

2. Median (Mdn) = middle score (50th percentile)

3. Mean (M) = average

Median location: $(N + 1) / 2$

Mean: $\bar{x} = \dfrac{x_1 + x_2 + \dots + x_n}{n}$ or $\bar{x} = \dfrac{1}{n} \sum_{i=1}^{n} x_i$
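The three measures can be checked on the 25 scores of the ‘and then’ example with a brief Python sketch. It uses the standard `statistics` module rather than SPSS, and assumes Python 3.8 or later for `multimode`.

```python
import statistics

scores = [18, 17, 16, 18, 15,
          15, 18, 16, 20, 18,
          22, 20, 17, 21, 17,
          19, 17, 21, 20, 19,
          18, 12, 23, 20, 20]

# Mode: multimode returns every score that shares the highest frequency,
# so a bimodal distribution yields two values.
modes = statistics.multimode(scores)

# Median: the score at position (N + 1) / 2 in the sorted data (N is odd here).
median_location = (len(scores) + 1) / 2
median = statistics.median(scores)

# Mean: sum of all scores divided by n.
mean = statistics.mean(scores)

print("Mode(s):", modes)
print("Median location:", median_location, "Median:", median)
print("Mean:", mean)
```

In this data set two scores share the highest frequency, which illustrates why a single "mode" can be ambiguous.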


Central tendency and skewness

[Figure: three distribution curves with positions A, B and C marked]

Shape     positive skew   symmetrical   negative skew
Mode      A               A             C
Median    B               A             B
Mean      C               A             A


Measures of variability

1. Range (R) = Highest score – Lowest score

2. Interquartile range (IQR) = Q3 – Q1

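3. Standard deviation (s or σ) = spread around the mean, in the units of the scores (square root of the variance)

4. Variance (s² or σ²) = spread around the mean, in squared units (average squared deviation from the mean)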


Variance and standard deviation

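Score ($x_i$)   Deviation           Squared deviation
$x_1$           $x_1 - \bar{x}$     $(x_1 - \bar{x})^2$
$x_2$           $x_2 - \bar{x}$     $(x_2 - \bar{x})^2$
$x_3$           $x_3 - \bar{x}$     $(x_3 - \bar{x})^2$
…               …                   …
$x_n$           $x_n - \bar{x}$     $(x_n - \bar{x})^2$
Sum             0                   ≥ 0

Variance: $s_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard deviation: $s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$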

The standard deviation and variance are:

- only suitable as measures of spread around the mean

- not robust against outliers
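The deviation table and the two formulas can be worked through in Python as a rough sketch. It uses the n - 1 denominator shown on the slide and the 25 scores from the ‘and then’ example; the variable names are chosen here for illustration.

```python
import math

scores = [18, 17, 16, 18, 15,
          15, 18, 16, 20, 18,
          22, 20, 17, 21, 17,
          19, 17, 21, 20, 19,
          18, 12, 23, 20, 20]

n = len(scores)
mean = sum(scores) / n

# Column 2 of the table: deviation of each score from the mean.
deviations = [x - mean for x in scores]

# Column 3 of the table: squared deviations (never negative).
squared = [d ** 2 for d in deviations]

sum_of_deviations = sum(deviations)       # 0, up to floating-point rounding
sum_of_squares = sum(squared)

variance = sum_of_squares / (n - 1)       # s_x^2
std_dev = math.sqrt(variance)             # s_x

print(f"Sum of deviations: {sum_of_deviations:.10f}")
print(f"Variance: {variance:.3f}")
print(f"Standard deviation: {std_dev:.3f}")
```

Because the deviations always sum to zero, they are squared before averaging; taking the square root afterwards returns the measure to the original units of the scores.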


Five-number summary and boxplot

Five-number summary consists of:

- Minimum = lowest (non-outlying) score
- Q1 = 25th percentile (25% lower, 75% higher)
- Median (= Q2) = 50th percentile
- Q3 = 75th percentile
- Maximum = highest (non-outlying) score

Graphical display: Boxplot


Boxplot - Example

Data: 3 13 17 19 22 24 25 28 35 39 44 45 83 86 93

Numerical (five-number summary):

Min = 3
Q1 = 19
Median = 28
Q3 = 45
Max = 93

IQR = 45 – 19 = 26
Q1 – 1.5 × IQR = 19 – 39 = -20
Q3 + 1.5 × IQR = 45 + 39 = 84

Rule of thumb: an outlier is an observation that lies more than 1.5 × IQR above Q3 or below Q1. Here 86 and 93 lie above 84, so both are outliers.

Graphical (boxplot): [Figure: boxplot of the data]
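The quartiles and the 1.5 × IQR rule of thumb for this example can be sketched in Python. Quartile conventions differ between textbooks and software; the `method="exclusive"` option of `statistics.quantiles` is an assumption made here because it reproduces the quartiles on the slide for this data set, and other tools may give slightly different values.

```python
import statistics

data = [3, 13, 17, 19, 22, 24, 25, 28, 35, 39, 44, 45, 83, 86, 93]

# Quartiles: cut points that split the sorted data into four parts.
q1, median, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Rule of thumb: observations beyond the fences are flagged as outliers.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Five-number summary:", min(data), q1, median, q3, max(data))
print("IQR:", iqr)
print("Fences:", lower_fence, upper_fence)
print("Outliers:", outliers)
```

In a boxplot the whiskers would typically end at the most extreme scores inside the fences, with the flagged observations drawn as separate points.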


Overview

Scale of measurement    Graphical                             CT       Spread
Nominal                 Bar chart, (Pie chart)                Mode     ---
Ordinal                 Boxplot                               Median   Range, IQR
Interval (and higher)   Histogram, (Stem & Leaf display)      Mean     Standard deviation, Variance


What have you learned today?

- What are the various ways to represent distributions numerically?

- What are the various ways to represent distributions graphically?

- How to describe a distribution

- How to create and evaluate various numerical and graphical representations of distributions

- How to determine which numerical and graphical representation is suitable for a variable


Next week

No lecture and workgroups

Practice test on Blackboard

Enter your own data

Read:

Howell: Chapter 3

In two weeks

Normal distribution and standard scores