26
Displaying Data Cal State Northridge 320 Andrew Ainsworth PhD

Lecture03 displaying data

Embed Size (px)

Citation preview

Page 1: Lecture03 displaying data

Displaying Data

Cal State Northridge320Andrew Ainsworth PhD

Page 2: Lecture03 displaying data

Psy 320 - Cal State Northridge 2

Procedures for Displaying DataThe variable: scores on a 60 question exam for 20 students

50, 46, 58, 49, 50, 57, 49, 48, 53, 45, 50, 55, 43, 49, 46, 48, 44, 56, 57, 44

Page 3: Lecture03 displaying data

Psy 320 - Cal State Northridge 3

Procedures for Displaying DataFirst Step

Order the Data43, 44, 44, 45, 46, 46, 48, 48, 49, 49, 49, 50, 50, 50, 53, 55, 56, 57, 57, 58

Page 4: Lecture03 displaying data

Psy 320 - Cal State Northridge 4

Ungrouped Frequency Distribution

TESTSCOR

1 5.0 5.0 5.02 10.0 10.0 15.01 5.0 5.0 20.02 10.0 10.0 30.02 10.0 10.0 40.03 15.0 15.0 55.03 15.0 15.0 70.01 5.0 5.0 75.01 5.0 5.0 80.01 5.0 5.0 85.02 10.0 10.0 95.01 5.0 5.0 100.0

20 100.0 100.0

434445464849505355565758Total

ValidFrequency Percent Valid Percent

CumulativePercent

Page 5: Lecture03 displaying data

Psy 320 - Cal State Northridge 5

Ungrouped Frequency Distribution

TESTSCOR

58.057.0

56.055.0

54.053.0

52.051.0

50.049.0

48.047.0

46.045.0

44.043.0

3.5

3.0

2.5

2.0

1.5

1.0

.5

0.0

Page 6: Lecture03 displaying data

Psy 320 - Cal State Northridge 6

Grouped Distributions When sets of data become very large with a

large number of response categories (e.g. continuous data) it is sometimes easier to see a clear pattern in the data by grouping them into class intervals.

One can then form a Grouped Frequency Distribution, especially if the data are assumed to be continuous.

Page 7: Lecture03 displaying data

7

Construct classes of data, where number of classes varies between 10 – 20 (depending upon the range of scores).Size of the class interval is:

For our example:

Grouped Distributions

chosen number of intervalshighest lowestX X

i

58 43 1.5 - We'll round off to 210

i

Psy 320 - Cal State Northridge

Page 8: Lecture03 displaying data

Psy 320 - Cal State Northridge 8

Grouped Frequency Distribution of Testing Example

Apparent Limits

Real Limits Midpoint Frequency

Relative Frequency

Cumulative Frequency

Cumulative Relative

Frequency43-44 42.5-44.5 43.5 3 0.15 3 0.1545-46 44.5-46.5 45.5 3 0.15 6 0.347-48 46.5-48.5 47.5 2 0.1 8 0.449-50 48.5-50.5 49.5 6 0.3 14 0.751-52 50.5-52.5 51.5 0 0 14 0.753-54 52.5-54.5 53.5 1 0.05 15 0.7555-56 54.5-56.5 55.5 2 0.1 17 0.8557-58 56.5-58.5 57.5 3 0.15 20 159-60 58.5-60.5 59.5 0 0 20 1

rf =f/n, e.g.,1/20 = .05 crf =cf/n

Page 9: Lecture03 displaying data

Psy 320 - Cal State Northridge 9

20 21 22 23 24 25 26 27 28 29 30 31 32 33 34

Class interval:20-24

Class interval:25-29

Class interval:30-34

UpperStated

Limit

UpperStated

Limit

LowerStatedLimit

LowerStatedLimit

Lower real limit(25-29 interval)

Upper real limit(20-24 interval)

MidpointLower real limit(30-34 interval)

Upper real limit(25-29 interval)

Page 10: Lecture03 displaying data

Psy 320 - Cal State Northridge 10

Class Interval, Class Limits & Unit of Difference (American income data)Apparent Class Limits

Real Class Limits

21,000-25,000 20,500-25,500

16,000-20,000 15,500-20,500

11,000 -15,000 10,500-15,000

6,000-10,000 5,500-10,500

1,000-5,000 500-5,500

Unit of difference = Level of Accuracy

If the smallest unit of measurement is $1,000 this is the level of accuracy/unit of difference

Real lower limit = apparent lower limit - 0.5(unit of difference)

Real upper limit = apparent upper limit + 0.5(unit of difference)

Class interval = i = Real upper limit – real lower limit (25,500 – 20,500=5,000)

Page 11: Lecture03 displaying data

Psy 320 - Cal State Northridge 11

Note that where a given case is classified depends on the unit difference or measurement precision, i=5,000

Apparent Class Limits

Real Class Limits

21,000-25,000 20,500-25,50016,000-20,000 15,500-20,50011,000 -15,000 10,500-15,0006,000-10,000 5,500-10,5001,000-5,000 500-5,500

Apparent Class Limits

Real Class Limits

20,100-25,000 20,050-25,05015,100-20,000 15,050-20,05010,100 -15,000 10,050-15,050

5,100-10,000 5,050-10,050100-5,000 50-5,050

Income rounded to $1,000 Income rounded to $100

Person earning $20,100

Nature of distribution will also depend upon number of classes used

Page 12: Lecture03 displaying data

Psy 320 - Cal State Northridge 12

Graphical Displays

HistogramsFrequency PolygonsBar GraphsPie-charts Stem & Leaf plots

Page 13: Lecture03 displaying data

Psy 320 - Cal State Northridge 13

Histogram

TESTSCOR

59 - 6157 - 5955 - 5753 - 5551 - 5349 - 5147 - 4945 - 4743 - 45

7

6

5

4

3

2

1

0

Page 14: Lecture03 displaying data

Psy 320 - Cal State Northridge 14

Shape of Histogram & Number of Classes

TESTSCOR

59 - 6157 - 5956 - 5754 - 5652 - 5450 - 5248 - 5047 - 4845 - 4743 - 45

7

6

5

4

3

2

1

0

TESTSCOR

60 -

61

59 -

60

58 -

59

57 -

58

56 -

57

56 -

56

55 -

56

54 -

55

53 -

54

52 -

53

51 -

52

50 -

51

49 -

50

48 -

49

47 -

48

47 -

47

46 -

47

45 -

46

44 -

45

43 -

44

3.5

3.0

2.5

2.0

1.5

1.0

.5

0.0

TESTSCOR

57 - 6154 - 5750 - 5447 - 5043 - 47

10

8

6

4

2

0

5 Classes

10 classes

20 classes

Page 15: Lecture03 displaying data

Psy 320 - Cal State Northridge 15

Histograms

Height of bar = # of responses in the interval

Width of bar = size of the interval

Bars touch representing grouped continuous data

Page 16: Lecture03 displaying data

Psy 320 - Cal State Northridge 16

Frequency PolygonFrequency Polygon

0

1

2

3

4

5

6

7

43-44 45-46 47-48 49-50 51-52 53-54 55-56 57-58 59-60

Test Score Interval

Freq

uenc

y

Page 17: Lecture03 displaying data

Psy 320 - Cal State Northridge 17

Qualitative Data & Bar Graphs

MARITAL STATUS

MARITAL STATUS

NEVER MARRIED

SEPARATED

DIVORCED

WIDOWED

MARRIED

Perc

ent60

50

40

30

20

10

0

Page 18: Lecture03 displaying data

Psy 320 - Cal State Northridge 18

Bar GraphsLike Histograms

The height indicates the frequency

Unlike Histograms Bars represent categories Width is Meaningless Bars DO NOT touch Discrete Data

Page 19: Lecture03 displaying data

Psy 320 - Cal State Northridge 19

Pie-ChartsMARITAL STATUS

.1%

19.1%

2.7%

14.2%

11.0%

53.0%

Missing

NEVER MARRIED

SEPARATED

DIVORCED

WIDOWED

MARRIED

1. Pie-charts are especially good when showing distributions of a few qualitative classes and one wishes to emphasize the relative frequencies that fall into each class.

2. However, not as effective with a) large number of

classes.b) with numerical data

because the circle is confusing when ordered classes are represented.

Page 20: Lecture03 displaying data

Psy 320 - Cal State Northridge 20

Stem and Leaf Displays A stem and leaf diagram provides a

visual summary of your data. This diagram provides a partial sorting of the data and allows you to detect the distributional pattern of the data.

There are three steps for drawing a tem and leaf diagram. 1. Split the data into two pieces, the stem

(left 1, 2, 3 digits, etc.) and the leaf (the right most digit).

2. Arrange the stems from low to high. 3. Attach each leaf to the appropriate stem.

Page 21: Lecture03 displaying data

21

Stem and Leaf DisplaysOrdered Testscore Data43, 44, 44, 45, 46, 46, 48, 48, 49, 49, 49, 50, 50, 50, 53, 55, 56, 57, 57, 58

What are the stems?

What are the leaves?Psy 320 - Cal State Northridge

Page 22: Lecture03 displaying data

Stem & Leaf DisplaysTESTSCOR Stem-and-Leaf Plot

Frequency Stem & Leaf

3.00 4 . 344 8.00 4 . 56688999 4.00 5 . 0003 5.00 5 . 56778

Stem width: 10 Each leaf: 1 case(s)

Here the stem width is ten because the stems represent numbers in the 10s place numerically

Page 23: Lecture03 displaying data

Psy 320 - Cal State Northridge 23

Advantages & Disadvantages of Stem & Leaf Diagrams

Advantage: Combines frequency distribution with

histogram, thereby giving a pictorial description of data.

Disadvantages: Only works with numerical data. Works best with small and compact data

sets (e.g., will not work well with 1,000 cases & data in the range of 20-40).

Page 24: Lecture03 displaying data

Psy 320 - Cal State Northridge 24

Statistically Describing Distributions Modality (“How many peaks are

there?) Unimodal, bi-modal,

multimodal Symmetric vs. Skewed

Skewed positive (floor effect) Skewed negative (ceiling

effect) Kurtosis (“How peaked is your

data?”) Leptokurtic, Mesokurtic and

Platykurtic

Page 25: Lecture03 displaying data

Terms that Describe Distributions Term Features Example

"Symmetric" left side is mirror image of right side

"Positively skewed"

right tail is longer then the left

"Negatively skewed"

left tail is longer than the right

"Unimodal" one highest point

"Bimodal" two high points

"Normal" unimodal, symmetric, asymptotic

Page 26: Lecture03 displaying data