Research Methods, Winter 2008. Chapter 9 – Measures of Central Tendency and Dispersion. Instructor: Dr. Harry Webster


Research Methods

Winter 2008

Chapter 9 – Measures of Central Tendency and Dispersion

Instructor: Dr. Harry Webster


Chapter 9 – Describing Distributions with Numbers

A) Finding the median involves ordering the data and locating positions.

B) The mean, mode, and standard deviation do not involve position.

The measures of central tendency are the median, the mode, and the mean.

The measures of spread are the variance and the standard deviation.


A) Median, quartiles, five number summary and boxplots.

Median: the midpoint of an arranged (ordered from smallest to largest) distribution of data. The 50th percentile.

(Percentile: ranking out of 100)

Calculating the median:

1. Arrange scores from smallest to largest.

2. Use formula: (n + 1)/2 to find the location of the median.

3a. If you have an odd number of scores, the formula will lead you to the median score.


Ex. 1: 2 3 4 5 6 7 8 9 10
Formula: (n + 1)/2 = (9 + 1)/2 = 5 (location of median)

Count 5 scores and we get 6. 6 is the median score.

Ex. 2: 2 3 4 5 5 5 6 7 8 8 8
Formula: (n + 1)/2 = (11 + 1)/2 = 6 (location of median)

Count 6 scores and we get 5. As there are many 5’s we must indicate (underline) which 5 is the median score: it is the third 5, the sixth score in the list.
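The counting procedure for the median can be sketched in Python. This is only an illustration: the function names are made up for this sketch, and the rule for an even number of scores (averaging the two middle scores) is the usual convention, assumed here because the slides only state the odd case.

```python
def median_location(n):
    """Location of the median in an ordered list of n scores: (n + 1) / 2."""
    return (n + 1) / 2


def median(scores):
    """Median found by ordering the scores and counting to the location."""
    ordered = sorted(scores)                   # Step 1: smallest to largest
    location = median_location(len(ordered))   # Step 2: (n + 1) / 2
    if location.is_integer():
        # Step 3a: odd number of scores, the location points at one score.
        return ordered[int(location) - 1]      # -1 because Python counts from 0
    # Even number of scores (assumed convention, not stated on the slide):
    # average the two scores on either side of the location.
    lower = ordered[int(location) - 1]
    upper = ordered[int(location)]
    return (lower + upper) / 2


ex1 = [2, 3, 4, 5, 6, 7, 8, 9, 10]
ex2 = [2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8]
print(median_location(len(ex1)), median(ex1))  # 5.0 6
print(median_location(len(ex2)), median(ex2))  # 6.0 5
```
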


B) Mean and Standard Deviation

Mean: An average of scores.

Pronounced ‘x-bar’; symbol = x̄

Sum of scores divided by number of cases. Ex., 1 2 3 4: 10/4 = 2.5

Sensitive to outliers. Ex., 1 2 3 40: 46/4 = 11.5
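A short Python check of the two examples, using the standard library's `statistics` module. The median is printed next to the mean only for comparison; that comparison is an addition of this sketch, not part of the slide.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4]
with_outlier = [1, 2, 3, 40]   # same scores, but 4 replaced by the outlier 40

# The mean jumps when the outlier is added; the median does not move.
print(mean(without_outlier), median(without_outlier))  # 2.5 2.5
print(mean(with_outlier), median(with_outlier))        # 11.5 2.5
```
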


Standard Deviation

Most frequently used expression of spread/variability

Is a measure of the average spread of scores from the mean.

Small standard deviations involve a set of scores that are close to the mean.

Large standard deviations involve a set of scores that are further away from the mean.

Is influenced by outliers (the mean is used to calculate the standard deviation).


The Standard Deviation Formula

$$\text{St. Dev.} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Σ = sum of everything in parentheses

x = each score

x̄ = mean

n = number of scores


To calculate the standard deviation (S.D. or St. Dev.)

Step 1. Find the mean.

Step 2. Find the distance of each score from the mean.

Step 3. Square each result to get rid of negatives.

Step 4. Add up the squared deviations (from the mean).

Step 5. Divide by n-1. This gives the variance.

Step 6. Find the square root. This gives the St. Dev.


Example: Data set: 1 2 4 6 7

1. Find the mean: 20/5 = 4

2. Deviation        3. Squared Deviation
1 – 4 = -3          (-3) x (-3) = 9
2 – 4 = -2          (-2) x (-2) = 4
4 – 4 = 0           0 x 0 = 0
6 – 4 = 2           2 x 2 = 4
7 – 4 = 3           3 x 3 = 9

4. Total of the squared deviations: 26


5. Divide the sum of the squared deviations by n-1: 26/(5-1) = 6.5. This is the variance.

6. Take the square root of the variance: square root of 6.5 = 2.55. This is the standard deviation.
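The six steps can be followed one at a time in Python. This is a minimal sketch on the example data set; the function name and the returned tuple are choices made for illustration only.

```python
import math


def standard_deviation_steps(scores):
    """Follow the six steps and return (mean, total, variance, st_dev)."""
    n = len(scores)
    mean = sum(scores) / n                     # Step 1: find the mean
    deviations = [x - mean for x in scores]    # Step 2: distance from the mean
    squared = [d ** 2 for d in deviations]     # Step 3: square each deviation
    total = sum(squared)                       # Step 4: add the squared deviations
    variance = total / (n - 1)                 # Step 5: divide by n - 1
    st_dev = math.sqrt(variance)               # Step 6: take the square root
    return mean, total, variance, st_dev


mean, total, variance, st_dev = standard_deviation_steps([1, 2, 4, 6, 7])
print(mean, total, variance, round(st_dev, 2))  # 4.0 26.0 6.5 2.55
```
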


Use medians when there are outliers. Ex. income.

Use means and standard deviations when the distribution appears symmetrical. Ex. Test grades, performance on athletic variables that are measured in time.

Use the mode with Nominal, Ordinal, Interval, and Ratio levels of measurement.

The mode is the only measure of central tendency that can be used with Nominal data such as gender of respondents, preferred type of music, marital status, etc.
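As a small illustration, the mode of nominal data can be found with `statistics.multimode` from the Python standard library. The survey answers below are invented for this sketch.

```python
from statistics import multimode

# Hypothetical answers to "preferred type of music" (nominal data).
music = ["rock", "jazz", "rock", "classical", "pop", "rock", "jazz"]

# multimode returns every category tied for the highest count.
print(multimode(music))  # ['rock']
```

A mean or median of these answers would be meaningless, because the categories have no order or size.
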


6. We have used 2 sets of data (7 2 2 1 3 4 5 6 and 7 2 2 1 3 4 50 6) to determine five number summaries and standard deviations.

Using the numbers, show the effects of outliers.
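One possible way to set up the comparison in Python is sketched below. It assumes the quartiles are the medians of the lower and upper halves of the ordered scores; other quartile conventions exist and can give slightly different values. The function names are illustrative.

```python
import math
from statistics import mean, median


def five_number_summary(scores):
    """(minimum, Q1, median, Q3, maximum), quartiles as medians of each half.

    Needs at least two scores.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]    # scores below the location of the median
    upper = ordered[-half:]   # scores above the location of the median
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


def st_dev(scores):
    """Standard deviation with n - 1 in the denominator."""
    m = mean(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / (len(scores) - 1))


set_a = [7, 2, 2, 1, 3, 4, 5, 6]
set_b = [7, 2, 2, 1, 3, 4, 50, 6]   # same scores, but 5 replaced by the outlier 50

for name, data in (("without outlier", set_a), ("with outlier", set_b)):
    print(name, five_number_summary(data), mean(data), round(st_dev(data), 2))
```

Comparing the two printed lines shows which numbers move when 5 becomes 50 and which stay the same.
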


CHAPTER 9 – NORMAL DISTRIBUTIONS

When a graph depicts proportion of scores instead of frequency of scores it is called a density graph.

The proportions add up to 1 (100%). When the density graph is smoothed into a line, it is called a density curve.


• In a skewed distribution, the mean lies farther towards the tail than the median, as it takes into account the size of those scores (ex., outliers).

• The median depicts position in a distribution of data only; it is not affected by the more extreme scores.

[Figure: two density curves, a normal curve and a curve skewed to the right.]


• Normal Curves/Normal Distributions:

• The most important curve in Social Science and Commerce statistics.

• Many biological variables fall on a normal curve. Ex., height.

• Many psychological variables are ‘forced’ into a normal curve. Ex., I.Q., some psychological inventories.

• Many sociological/economic variables don’t fall into a normal curve.

• Ex. income, education.


Features of Normal Curves (Normal Distributions):

1. Given the mean and standard deviation, we can draw the normal curve.

2. The mean is the center of the distribution; it cuts the distribution in half. This is also the median or 50th percentile.

3. The curve is symmetrical; one side of the mean mirrors the other.

4. The standard deviation determines the shape of the curve. The smaller the standard deviation, the closer the scores are to one another and the ‘taller’ the curve.
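Features 1, 3 and 4 can be illustrated with the standard formula for the normal density, which needs only the mean and the standard deviation. The means and standard deviations below are arbitrary values chosen for this sketch.

```python
import math


def normal_density(x, mean, st_dev):
    """Height of the normal curve at x for a given mean and standard deviation."""
    z = (x - mean) / st_dev
    return math.exp(-0.5 * z * z) / (st_dev * math.sqrt(2 * math.pi))


# Feature 4: the smaller the standard deviation, the taller the curve at the mean.
for sd in (1.0, 2.5, 5.0):
    peak = normal_density(65, 65, sd)
    print(f"st. dev. = {sd}: height at the mean = {peak:.4f}")

# Feature 3: the curve is symmetrical around the mean.
print(normal_density(63, 65, 2.5), normal_density(67, 65, 2.5))
```
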


The standard deviation breaks the normal curve into segments that reflect the percent of scores in the set of scores. The 50th percentile is at st. dev. zero.

[Figure: normal curve with the horizontal axis marked in standard deviations from the mean.]


• The 68-95-99.7 Rule

• About 68% of all scores fall between -1 and +1 standard deviation.

• About 95% of all scores fall between -2 and +2 standard deviations.

• About 99.7% of all scores fall between -3 and +3 standard deviations.

• As the tails of the normal curve do not touch the horizontal axis, we cannot determine the number of standard deviations for 100% of the scores.

• This is to leave room for extreme outliers.
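The three percentages in the rule can be checked numerically. The sketch below uses the error function `math.erf` from the Python standard library, since the share of a normal distribution within k standard deviations of the mean equals erf(k/√2); the function name is illustrative.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} st. dev.: {share_within(k) * 100:.1f}%")
# within ±1 st. dev.: 68.3%
# within ±2 st. dev.: 95.4%
# within ±3 st. dev.: 99.7%
```

The share never reaches exactly 100% for any finite number of standard deviations, which matches the point about the tails not touching the horizontal axis.
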


Ex., Women’s height. Mean = 65 inches; St. Dev. = 2.5 inches.
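Applying the 68-95-99.7 rule to this example is simple arithmetic: step 1, 2 and 3 standard deviations away from the mean in both directions. A small sketch, with an illustrative function name:

```python
def height_ranges(mean, st_dev):
    """Ranges covering 1, 2 and 3 standard deviations on either side of the mean."""
    return [(mean - k * st_dev, mean + k * st_dev) for k in (1, 2, 3)]


shares = ("68%", "95%", "99.7%")
for share, (low, high) in zip(shares, height_ranges(65.0, 2.5)):
    print(f"about {share} of women are between {low} and {high} inches tall")
# about 68% of women are between 62.5 and 67.5 inches tall
# about 95% of women are between 60.0 and 70.0 inches tall
# about 99.7% of women are between 57.5 and 72.5 inches tall
```
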


Chapter 5 – What is a Confidence Interval?


INTRODUCTION TO STATISTICAL INFERENCE

Statistical inference is a technique to make decisions regarding the probability that the population would behave in the same way as the sample.

As it is based on probability, then the rules of probability must be followed. Therefore, the assumptions which must be met are:

1) Randomness: the predictable pattern of outcomes after very many trials.

1a) If samples are chosen randomly, then the pattern of outcomes across many samples is approximately a normal distribution. This is called a sampling distribution.


2) We assume the mean of the normal distribution reflects the mean of the population parameter.
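The idea of a sampling distribution can be sketched with a small simulation. Everything here is hypothetical: we pretend the true population proportion is 0.57, draw 1000 random samples of 500 people each, and look at how the sample results are spread. The seed is fixed only so that the run is repeatable.

```python
import random
from statistics import mean, stdev


def simulate_sample_proportions(p, n, samples, seed=0):
    """Draw many random samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    results = []
    for _ in range(samples):
        # Each person in the sample says "yes" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        results.append(successes / n)
    return results


p_hats = simulate_sample_proportions(p=0.57, n=500, samples=1000)

# The sample results pile up around the population value (assumption 2),
# with a spread close to sqrt(p(1 - p)/n), about 0.022 for these settings.
print(round(mean(p_hats), 3), round(stdev(p_hats), 3))
```
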

Statistical inference helps us determine how confident we are about where a result falls on the sampling distribution in two ways:

1. Confidence Intervals: How confident we are that our sample’s result captured the population parameter within a certain range (margin of error).

2. Tests of significance: We make a claim about the population and use the sample’s results to test that claim. We want to determine the probability of our claim being right.


CHAPTER 5 – WHAT IS A CONFIDENCE INTERVAL?

A confidence interval estimates a population parameter from a sample statistic at a certain level of confidence. Here confidence means the probability of being right.

We also refer to it as a Confidence Statement.

We take the sample’s statistic (data) and estimate what the population’s answer would be. This involves how sure we are (confidence level) and the margin of error (the margin within which we believe the population’s answer falls).



We can develop Confidence Statements for:

A) Data given in percents/proportions.

B) Data given in means. (The only difference is a change of formula)


Take any statistic and estimate the probability (conf. level) of it capturing the population parameter within a certain margin of scores (margin of error).

p̂ = statistics (results) from samples

p = population parameter


A) When the statistic is given in percents or proportions.

The formula to find a confidence interval for any level of confidence is:

$$\hat{p} \pm z^* \sqrt{\hat{p}(1 - \hat{p})/n}$$

p̂ = statistics (results) from samples

z* = z score; a z score is a standard score

n = sample size


p̂ = sample statistic (proportion or percent)

z* = z scores (standard scores)

n = number of subjects in the sample


Example:

Mayor Tremblay is two weeks from election day. He wants to know his chances of winning the election. A polling company asks 1000 people who they would vote for if the election were held today and 57% say Mayor Tremblay.

Tremblay wants to be 90% confident that he will win.

$$0.57 \pm 1.64\sqrt{0.57(1 - 0.57)/1000}$$

.57 ± .0257, or 57% plus and minus about 2.6%

The margin of error is about 2.6%. By subtracting it from and adding it to the percent of people who said they would vote for Mayor Tremblay (57%) we find the range of scores within which we are 90% confident the population parameter lies.

Confidence Statement: Mayor Tremblay can be 90% confident that between 54.4% and 59.6% of all voters would vote for him if the election were held today. (The word ‘all’ reflects the population parameter.)

The confidence statement is the whole sentence; the margin of error is plus or minus 2.6%, which gives the range from 54.4% to 59.6%; the confidence level is 90%.
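The calculation for the Mayor Tremblay example can be reproduced in a few lines of Python. This is a sketch of the formula for proportions with the example's own numbers (p̂ = 0.57, n = 1000, z* = 1.64); the function name is illustrative. Rounded results can differ in the last digit depending on how intermediate values are rounded.

```python
import math


def proportion_confidence_interval(p_hat, n, z_star):
    """Return (low, high, margin) for p_hat ± z* · sqrt(p_hat(1 - p_hat)/n)."""
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


low, high, margin = proportion_confidence_interval(p_hat=0.57, n=1000, z_star=1.64)
print(f"margin of error: {margin:.4f}")                     # margin of error: 0.0257
print(f"90% confidence interval: {low:.1%} to {high:.1%}")  # 54.4% to 59.6%
```

A larger sample shrinks the margin: with the same p̂ and z*, quadrupling n halves the margin of error.
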


CHAPTER 9 – DESCRIBING RELATIONSHIPS: SCATTERPLOTS AND CORRELATIONS

Scatterplots: A scatterplot involves the relationship between two or more quantitative (ordinal, interval or ratio: NB – not nominal) variables measured on the same individuals/objects.

(For our course, we will deal with two variables.)


The graph that depicts this relationship is called a scatterplot.

Sometimes, scatterplots have an explanatory variable (on the horizontal axis) and a response variable (on the vertical axis).

The explanatory variable is the independent variable. The response variable is the dependent variable.


Each dot in a scatterplot reflects two pieces of information (variables) about an individual.

In this example, the individuals are countries. The graph depicts the relationship between gross domestic product per person and longevity. (p. 271)


Some scatterplots have no explanatory and response variables; only the relationship between two variables.

Ex., The Archaeopteryx: the femur (leg bone) and humerus (arm bone); the size of one does not ‘explain’ or ‘contribute’ to the size of the other. (p. 274)


This scatterplot has a definite shape: as one variable increases, the other tends to increase.

This is called a positive association.

[Figure: Association between Ice Cream Sales and Temperature. Scatterplot with Temperature on the horizontal axis and Ice Cream Sales on the vertical axis.]


When one variable decreases as the other increases, it is called a negative association.

[Figure: Ice Cream Price and Sales. Scatterplot with Price on the horizontal axis and Sales on the vertical axis.]


When there is no relationship between the change in one variable and the change in another variable, there is no association.

[Figure: Scatterplot of Ice Cream Sales and TV Violence. TV Violence Ratings on the horizontal axis and Ice Cream Sales on the vertical axis.]


To examine a scatterplot:

1. Look at the overall pattern and any important deviations.

2. Describe the scatterplot using the form, direction and strength of the relationship.

3. Look for outliers.

4. The closer the data are to forming a straight line, the stronger the association (either negative or positive).

Ex., The Archaeopteryx: There is a strong positive association between the size of the femur and the humerus with no outliers.


When the association between two variables is expressed mathematically, it is called a correlation.

Features of Correlations

1. It is expressed as r.

2. The range is from -1.00 to +1.00.

3. -1.00 is a perfect negative correlation; +1.00 is a perfect positive correlation. These are almost never seen with real data. Zero is no correlation: there is no straight-line relationship between the variables.


4. Correlations use standard scores, so we can compute them for any two variables (they don’t have to be in the same unit of measurement).

5. Correlations measure the strength of straight-line (linear) associations between variables.

6. Correlations are affected by outliers. The more data there are, the less an outlier will influence the correlation.
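Feature 4 can be made concrete with a short Python sketch that turns each variable into standard scores and averages their products (dividing by n - 1, the usual textbook form of r). The data are the running times and pulse rates from the Professor Lively exercise in these slides; the function name is illustrative.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson's r computed from standard scores (z scores)."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    sd_x, sd_y = stdev(xs), stdev(ys)   # stdev divides by n - 1
    products = (
        ((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)   # z score of x times z score of y
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


time = [34.12, 35.52, 34.52, 34.05, 34.13, 35.52, 36.17]
pulse = [152, 124, 140, 152, 146, 128, 136]
print(round(correlation(time, pulse), 3))  # -0.815
```

Because standard scores have no units, changing minutes to seconds would leave r unchanged.
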


Correlations between:    Are considered:
.8 - 1.00                Very Strong
.6 - .79                 Strong
.4 - .59                 Moderate
.2 - .39                 Weak
0.0 - .19                Very Weak
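The table can be turned into a small lookup function. This sketch assumes the labels apply to the size of r regardless of its sign, so that r = -.815 is described the same way as r = +.815; the function name is illustrative.

```python
def describe_strength(r):
    """Label a correlation using the cut-offs in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1.00 and +1.00")
    size = abs(r)   # assumption: strength ignores the direction of the association
    if size >= 0.8:
        return "Very Strong"
    if size >= 0.6:
        return "Strong"
    if size >= 0.4:
        return "Moderate"
    if size >= 0.2:
        return "Weak"
    return "Very Weak"


print(describe_strength(-0.815))  # Very Strong
print(describe_strength(0.45))    # Moderate
```
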


2. What is wrong with the following statements?

a) The correlation between the first snow storm of any given year and the number of car accidents that day is r = -1.3.

b) The correlation between gender and income is about r = .66.


3. Give an example for each of the following:

a) A strong positive correlation

b) A strong negative correlation

c) No correlation


EXERCISES

1. Professor Lively runs every day for at least 30 minutes and checks her pulse rate.

Time     Pulse
34.12    152
35.52    124
34.52    140
34.05    152
34.13    146
35.52    128
36.17    136


1a) Draw a scatterplot for these data.

1b) The correlation is r = -.815. Briefly describe what this means.


2. The following are the closing quotes for the Nasdaq and Microsoft for ten trading days.

Nasdaq:    Microsoft:
1742       54
1785       57
1770       55
1789       56
1784       56
1804       57
1862       60
1845       60
1826       59
1824       59


2a) The correlation is r = .974. Describe what this means:

a. Does NASDAQ performance cause Microsoft sales to rise?

b. Do Microsoft sales cause NASDAQ performance to rise?

c. Neither a. nor b.

Justify your answer.


Causation: The reason something occurs; what makes it happen. Requires experimental research designs where there is a great deal of control of all variables.

Philosophically, causation requires a ‘leap of faith’ from excluding all other possible explanations to granting the independent variable the power to have caused the behavior.


a) Simple Causation: Very rare in real life. A causes B to happen. Ex., paying students $250 to get 80%+ in a course.

This would increase the number of students who get 80%+.

If everything else is kept constant, we could say that the $250 had an effect on students’ behavior; it caused an increase in grades.

A → B


b) Common Response: A causes B and C. This occurs when changes in two variables are caused by a third, common, variable.

Ex., July is the season for the highest ice cream sales; July is also the month when the most people drown.

Ice cream does not cause drowning; the warm weather increases both sales and drownings.

A → B and A → C


c) Confounding Response: We know two variables cause a change in a third but we don’t know the ‘weight’ of each variable.

Ex., a person smokes and drinks too much. The heart is affected; we know that both contribute but do not know how much each contributes. We need to do experimental research to ‘sort out’ the influences.

This helps to isolate each variable’s effect on the heart.


When experimentation is not possible, we can approach causation if the following conditions are met:

1. The association between two variables is strong.

2. The association between two variables is consistent.

3. The alleged cause precedes the effect.

4. The alleged cause is plausible.


3. People who drink diet soft drinks tend to gain more weight over a one year period than people who do not.

Does drinking diet drinks make people gain weight? Give a more plausible explanation.