Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics provide the basis of nearly any quantitative analysis of data.


Descriptive Statistics

Used to describe the basic features of the data in any quantitative study.

Both graphical displays and descriptive summary statistics provide the basis of nearly any quantitative analysis of data.


Descriptive Statistics

The purpose of descriptive statistics is to organize and summarize data so that the data are more readily comprehended.

That is, descriptive statistics describe distributions with numbers.


The Process of Becoming Familiar with the Data


• Screening for valid values
• Missing data
• Value labels
• Levels of measurement
• Center
• Spread
• Shape
• Rank or relative position
• Association


Background Information

• Types of variables
– Qualitative
– Quantitative

• Scales or Levels of Measurement
– Nominal – Names the category; a qualitative variable therefore represents a nominal scale.
– Ordinal – Values can be ordered and reflect differing degrees or amounts of the characteristic being studied, but differences between values are not interpretable.
– Interval – Values can be ordered, and differences between values are interpretable.
– Ratio – Zero is a meaningful value, so ratios make sense.


Examples of Levels of Measurement

• Nominal – Numbers assigned to sports figures, gender, party affiliation

• Ordinal – Numbers assigned to educational attainment, rank in population

• Interval – Temperature: there is a zero, but where it falls depends on the scale used. It is not an absolute zero, so a temperature of 100 is not twice as hot as a temperature of 50.

• Ratio – Has an absolute zero: weight, count of the number of people, height, distance, elapsed time.


Why is knowing the level of measurement important?

• It will help you decide how to interpret the data from that variable.

• It helps you decide which statistical analysis is appropriate for the values assigned.

• http://www.socialresearchmethods.net/selstat/ssstart.htm


Central Tendency

Central Tendency refers to measuring the center or average. Only the notation for mean is standard.

The most common measures of central tendency are:

– Arithmetic Mean or Mean – $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$

– Mode – Mo – the item that occurs with greatest frequency

– Median – Mdn – the middle score when the observations are arranged in order of magnitude, so that an equal number of scores fall below and above.


Examples of these measures

• Mean of: 2, 3, 6, 7, 3, 5, 10 is (2 + 3 + 6 + 7 + 3 + 5 + 10) / 7 = 36 / 7 ≈ 5.14

• Mode of: 2, 3, 6, 7, 3, 5, 10 is 3

• Median of: 2, 3, 6, 7, 3, 5, 10

First the data are ordered: 2, 3, 3, 5, 6, 7, 10. The middle value is 5, so that is the median.
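The three calculations above can be checked with a short Python sketch. It assumes the standard-library `statistics` module is an acceptable stand-in for the hand calculation; the variable name `scores` is only illustrative.

```python
import statistics

# The example data set from the slide.
scores = [2, 3, 6, 7, 3, 5, 10]

mean = statistics.mean(scores)      # sum of the items divided by the number of items
mode = statistics.mode(scores)      # the item that occurs with greatest frequency
median = statistics.median(scores)  # the middle item once the data are ordered

print(f"mean   = {mean:.2f}")
print(f"mode   = {mode}")
print(f"median = {median}")
```

Note that `statistics.median` orders the data itself, so the list can be passed in unsorted.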


Some Important Points About These Measures

• Mode is the only measure of central tendency that can be used for nominal data.

• Median is largely unaffected by extreme values; it is resistant to extreme observations.


Some Important Points About These Measures

• Mean or Average is affected by extremely small or large values. We say that it is sensitive or nonresistant to the influence of extreme observations.

• The mean is the balance point of the distribution.

• In symmetric distributions the mean and median are close together.
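To see the difference between a nonresistant and a resistant measure, the sketch below reuses the earlier data set and then replaces the largest item with a much larger one. The value 100 is a made-up extreme observation chosen only for illustration.

```python
import statistics

original = [2, 3, 3, 5, 6, 7, 10]
with_extreme = [2, 3, 3, 5, 6, 7, 100]  # hypothetical: largest item replaced by 100

# How far each measure moves when the extreme value is introduced.
mean_shift = statistics.mean(with_extreme) - statistics.mean(original)
median_shift = statistics.median(with_extreme) - statistics.median(original)

print(f"mean moved by   {mean_shift:.2f}")
print(f"median moved by {median_shift}")
```

The mean is dragged toward the extreme item, while the median stays where it was because the middle item of the ordered data has not changed.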


More important points

• In skewed data the mean is pulled toward the tail of the distribution.

• Median is not necessarily preferred over the mean even though it is resistant. However, if the data are known to be strongly skewed, then the median is preferable.

• Finally, the mean is usually the measure of central tendency of choice because it is stable from sample to sample.


Measuring Spread or Variability

There are several measures of variability. These measures give an added dimension to the data. More information about the data is better than less.

Example: A test was given in two classes. The average in the first class was 97 and the average in the second was 94. Was the test more difficult for the second class? Was it easier to get an A in the first class than in the second? Not necessarily, in both cases. The spread of the test grades might help answer the questions. Say the grades in the first class ranged from 85 to 100 and the grades in the second class ranged from 92 to 96.


Measures of Variability, Spread, or Dispersion

• Range – Difference between the highest and lowest items in a distribution. This measure depends only on the two extreme items, so it is not responsive to every item in the distribution.

• Quartiles Q1 and Q3 – Medians of the parts of the distribution to the left and right of the median.

• Interquartile range – The IQR is the distance between Q1 and Q3, that is, Q3 − Q1.

The IQR is used to find outliers. The rule is that an item more than 1.5 times the IQR below Q1 or above Q3 is considered an outlier.
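The measures above can be sketched in Python as follows. The helper names `quartiles` and `find_outliers` are hypothetical, and the quartiles are computed with the median-of-halves rule described in the list; statistical software often uses other quartile conventions, so results may differ slightly elsewhere.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3), taking Q1 and Q3 as medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                 # items to the left of the median
    upper = ordered[len(ordered) - half:]  # items to the right of the median
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

def find_outliers(values):
    """Return the items lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [2, 3, 3, 5, 6, 7, 10]
q1, med, q3 = quartiles(data)
data_range = max(data) - min(data)

print("range:", data_range)
print("Q1, Q3, IQR:", q1, q3, q3 - q1)
print("outliers:", find_outliers(data))
```

For this data set the fences fall well outside the observed values, so no item is flagged as an outlier.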


The Five-Number Summary

A convenient and quick way to graph the data and give some preliminary descriptive statistics is to determine the five-number summary. Besides the median and the quartiles Q1 and Q3, we need two additional bits of information: the minimum and the maximum.

Example: The data set in a previous slide was: 2, 3, 3, 5, 6, 7, 10.

The median is 5.
Q1 and Q3 are 3 and 7, respectively.
The minimum is 2.
The maximum is 10.
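A minimal sketch of the five-number summary in Python is shown below. It assumes `statistics.quantiles` with `method="exclusive"` as the quartile rule, which happens to agree with the quartiles quoted in this example; the function name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into four groups, giving three cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)

summary = five_number_summary([2, 3, 3, 5, 6, 7, 10])
print(summary)
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3, a line marks the median, and the whiskers reach toward the minimum and maximum.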


The Boxplot

The boxplot would look like:

[Figure: boxplot of VAR00001 (N = 7), vertical axis running from 0 to 12]


Deviations from the mean

Another way to measure spread is to measure the deviations from the mean or average. For our example:

Avg. 5.14, so deviations are,

2 – 5.14, 3 – 5.14, 3 – 5.14, 5 – 5.14, 6 – 5.14, 7 – 5.14, 10 – 5.14. So, they are:

-3.14, -2.14, -2.14, -0.14, 0.86, 1.86, 4.86.

Notice that they add up to zero (apart from rounding error, since the mean was rounded to 5.14).

So each deviation tells you something about the spread, but because their sum is always zero, the deviations are squared before they are added.
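The deviations can be reproduced with the short Python sketch below, which uses the unrounded mean rather than 5.14. The variable names are illustrative only.

```python
import statistics

data = [2, 3, 3, 5, 6, 7, 10]
mean = statistics.mean(data)

# Deviation of each item from the mean.
deviations = [x - mean for x in data]

# The deviations cancel out, so their sum carries no information about spread.
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the squared deviations no longer cancel.
sum_of_squares = sum(d ** 2 for d in deviations)

print([round(d, 2) for d in deviations])
print(f"sum of deviations:         {sum_of_deviations:.10f}")
print(f"sum of squared deviations: {sum_of_squares:.2f}")
```

With floating-point arithmetic the sum of deviations may come out as a tiny nonzero number instead of exactly zero, which is why it is printed with a fixed number of decimals.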


Deviations from the mean continued

• Dividing the sum of the squared deviations by the number of sample items would give the average squared deviation from the mean, or variance. However, we will find out that dividing by (number of items – 1) instead gives an unbiased estimator of the population variance.

• So the formula becomes: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
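As a rough check of the formula, the sketch below computes the variance of the example data both ways, dividing by n and by n – 1, and compares the second against `statistics.variance` from the Python standard library. The variable names are illustrative.

```python
import statistics

data = [2, 3, 3, 5, 6, 7, 10]
n = len(data)
mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)

variance_n = sum_of_squares / n             # divide by the number of items
sample_variance = sum_of_squares / (n - 1)  # divide by n - 1, as in the formula

# statistics.variance also uses the n - 1 divisor.
library_value = statistics.variance(data)

print(f"dividing by n:     {variance_n:.3f}")
print(f"dividing by n - 1: {sample_variance:.3f}")
print(f"statistics.variance: {library_value:.3f}")
```

Dividing by n – 1 always gives the slightly larger value, and the gap shrinks as the number of items grows.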


Standard Deviation

A more useful and popular statistic is the standard deviation. It is expressed in the same units as the items in the data set. Fortunately, it does not involve another formula: taking the square root of the variance gives the standard deviation.

Like the mean, the standard deviation is nonresistant to extreme values.

The formula then is:

$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}$
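A small Python sketch of the same step, assuming the standard-library `statistics` and `math` modules: the standard deviation is obtained as the square root of the sample variance, and a second data set with a made-up extreme item (100 in place of 10) illustrates the nonresistance.

```python
import math
import statistics

data = [2, 3, 3, 5, 6, 7, 10]

# Square root of the sample variance (n - 1 divisor).
sd = math.sqrt(statistics.variance(data))

# Hypothetical data set: the largest item replaced by an extreme value.
sd_extreme = statistics.stdev([2, 3, 3, 5, 6, 7, 100])

print(f"standard deviation:                 {sd:.2f}")
print(f"standard deviation with extreme item: {sd_extreme:.2f}")
```

`statistics.stdev` performs the variance and square-root steps in one call, so it returns the same value as `sd` when given the original data.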


Class Demos

Outliers

Demo Data

Teacher Stress Data

Key for Teacher Stress Data