Statistics topics for CBASE Exam

Dr. Glen A. Just

Major Topics

Overview

Statistical Reasoning, Misleading Data

Graphs and Visual Displays

Trends, Predictions, Extrapolation

Measures of Center (mean, median, mode)

Measures of Spread (range, standard deviation)

Probability (independence, mutually exclusive)

Overview - Basic Terms

Statistics: A systematic method of gathering, organizing, analyzing, and interpreting data. Also, a collection of data about a particular topic or event (football, baseball, etc.).

Two Primary Sources:

Population (all possible people, experiments, observations, etc.)

Sample (subset of population)

Two Primary Types of Investigations:

Qualitative study (exploratory)

Quantitative study (attempts to test hypotheses)

Statistical Reasoning – Quantitative Studies

Statistical reasoning may be defined as the way people reason with statistical ideas and make sense of statistical information.

This involves making interpretations based on sets of data, graphical representations, and statistical summaries.

Much of statistical reasoning combines ideas about data and chance, which leads to making inferences and interpreting statistical results.

Underlying this reasoning is a conceptual understanding of important ideas, such as distribution, center, spread, association, uncertainty, randomness, and sampling.

Statistical Reasoning – Quantitative Studies

Statistical reasoning can be viewed as a three-step process:

Comprehension (seeing a particular problem as similar to a class of problems),

Planning and execution (applying appropriate methods to solve the problem), and

Evaluation and interpretation (interpreting the outcome as it relates to the original problem).

The CBASE exam focuses primarily on

statistical logic,

data representation, and

computation.

Statistical Reasoning – Sample Selection

For statistical conclusions to be valid:

A sample must be selected randomly (each element has the same chance of being selected)

A sample must be representative of the population

Conclusions must be reasonable for the data gathered

Random selection methods:

Simple random selection (from a hat)

Systematic random selection (every k-th element after a random starting point)

Stratified random selection (random selection within each subgroup, in proportion to its share of the population)
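The three selection methods above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the population of 100 ID numbers, and the two strata are all hypothetical, and the seed is fixed only so that a run can be repeated.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct elements; every element is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start in the first interval, then take every k-th element."""
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting point
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Sample the same fraction from every stratum so proportions are preserved."""
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)                # fixed seed, for repeatability only
students = list(range(1, 101))        # hypothetical population of 100 ID numbers
strata = {"first-year": students[:60], "senior": students[60:]}  # hypothetical

print(simple_random_sample(students, 10, rng))
print(systematic_sample(students, 10, rng))
print(stratified_sample(strata, 0.1, rng))
```

With a 10% fraction, the stratified sample takes 6 IDs from the 60 first-year students and 4 from the 40 seniors, so the sample keeps the population's 60/40 split.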

Statistical Reasoning – Sample Selection

We want to estimate the total income of adults living in a given street. We visit each household in that street, identify all adults living there, and randomly select one adult from each household. We then interview the selected person and find their income.

Problem 1: People living on their own are certain to be selected, so we simply add their income to our estimate of the total. But a person living in a household of two adults has only a one-in-two chance of selection.

Fix 1: To correct this, when we come to such a household, we would count the selected person's income twice towards the total. (The person who is selected from that household can be loosely viewed as also representing the person who isn't selected.)

Problem 2: Not everybody has the same probability of selection, so it would not be considered a “random” selection.

Fix 2: Identify the wage earners in each house along the street. Suppose that there are k wage earners. Assign each wage earner a unique number from 1 to k. Place the numbers in a “hat” and draw out 15 numbers (without replacement). This is a simple random sample (SRS).
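The weighting idea in Fix 1 can be sketched as follows: each selected income is counted once for every adult in that household, which is the same as dividing by the person's chance of selection. The function name and the three households are hypothetical, invented purely for illustration.

```python
def estimate_total_income(selected):
    """selected: list of (income, adults_in_household), one entry per household.

    Each interviewed person's income is multiplied by the number of adults
    in the household, i.e. by 1 / (that person's chance of being selected).
    """
    return sum(income * adults for income, adults in selected)

# Hypothetical street with three households (figures invented for illustration).
interviews = [
    (30000, 1),   # lives alone: certain to be selected, counted once
    (40000, 2),   # two adults: one-in-two chance, counted twice
    (25000, 3),   # three adults: one-in-three chance, counted three times
]

print(estimate_total_income(interviews))   # 185000
```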

Statistical Reasoning – Sample Selection

In 1936, the early days of opinion polling, the American Literary Digest magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. However, the exact opposite result occurred. What happened?

Was the sample large enough (replication)? Yes. Over two million surveys were used.

Was the sample representative of the population? No. There is no mention that the sample was selected randomly. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an over-representation of individuals who were rich, who, as a group, were more likely to vote for the Republican candidate.

Sample Selection – Your Turn

Sally Statistics wants to survey the AU student body about its stance on government programs to help the poor. She decides to sit at a table in the cafeteria and ask entering students whether the amount spent by the government on welfare is a. too little, b. too much, or c. about right. After she gathers 12 responses, she reports her findings to her ethics class.

Is this a good study to gauge AU perceptions?

Problems:

Only cafeteria users would be involved.

The sample size is small.

“Welfare” can be viewed negatively (“Assistance” is better).

Reading Graphs

The purpose of a graph is to present data in a pictorial format that is easy to understand. A graph should make sense without any additional explanation needed from the body of a report.

Graphs should not use unnecessary colors, shading, or three-dimensional effects. Labeling should be adequate to make the graph informative.

Visual Representation of Data – Bar Graph

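[Figure: clustered bar graph for four regions (Northeast, Midwest, South, West) with bars for January, February, and March; vertical axis from 0 to 6. Data labels as extracted: 4.3, 2.5, 3.5, 4.5, 2.4, 4.4, 1.8, 2.8, 2.0, 2.0, 3.0, 5.0.]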

Visual Representation of Data – Line Graph

[Figure: line graph of Amount by month; vertical axis from 0 to 16. Data labels in month order: Jan 12.4, Feb 13.5, Mar 11.8, Apr 9.4, May 8.7, June 5.2.]

What is the difference between the values for February and April?
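One way to answer the question is to read the two values off the graph and subtract. The sketch below assumes that the six data labels on the line graph pair with the six months in order (Jan through June); the dictionary name is hypothetical.

```python
# Values read from the line graph's data labels, assumed to be in month order.
amount = {"Jan": 12.4, "Feb": 13.5, "Mar": 11.8,
          "Apr": 9.4, "May": 8.7, "June": 5.2}

# Difference between February and April, rounded to one decimal place
# because the graph labels carry one decimal place.
difference = round(amount["Feb"] - amount["Apr"], 1)
print(difference)   # 4.1
```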

Visual Representation of Data – Circle Graph

Visual Representation of Data – Scatter Plot

Visual Representation of Data – MISLEADING

Visual Representation of Data – MISLEADING

Visual Representation of Data – CORRECTED

Visual Representation of Data - Table

The characteristics for a table are the same as for a graph. The purpose is to make the information more understandable for the reader.

This table could be made clearer if the definition of “Group” and “Class” were indicated.

If the table is trying to indicate a pattern, then a graph (such as a line graph) might be a better choice than this table.

            Group 1    Group 2
Class 1        82         95
Class 2        76         88
Class 3        84         90

Visual Representation of Data – Stem-and-Leaf

The stem-and-leaf diagram shows data values by using two groups of numbers, stems and leaves.

For this diagram, the stem of “2” with leaf of “0” represents 20. The stem of “2” with the leaf of “1” means 21.

If a leaf is repeated, its associated value occurred multiple times. For instance, 60 occurred twice in the data set.

Leaf unit = 1

Stem | Leaf
2 | 0 1 1 3 5
3 | 2 2 5 8
4 | 3 6 8
5 | 1 2 7 8
6 | 0 0 1

Stem units are typically ten times the leaf units. Here, leaf unit = 1 and stem unit = 10.
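As a rough sketch of how such a diagram is built, the code below splits each value into a stem and a leaf. The function name is hypothetical, and the data list is simply the 19 values read back from the diagram above (20, 21, 21, 23, and so on).

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Group values by stem; each leaf is the digit in the leaf_unit place."""
    rows = defaultdict(list)
    for v in sorted(values):
        # Drop anything smaller than the leaf unit, then split off the last digit.
        stem, leaf = divmod(v // leaf_unit, 10)
        rows[stem].append(leaf)
    return dict(rows)

data = [20, 21, 21, 23, 25, 32, 32, 35, 38, 43, 46, 48,
        51, 52, 57, 58, 60, 60, 61]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Passing `leaf_unit=10` handles diagrams whose leaves stand for tens, so a value of 310 is stored as stem 3 with leaf 1.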

Stem-and-Leaf – Your Turn

In the stem-and-leaf diagram below, what do the “3” and “1” mean?

Leaf unit = 10

Stem | Leaf
1 | 0 1 2 3 7
2 | 2 5 6
3 | 1 6 9
4 | 2 3 6
5 | 0 0 1

Since the leaf unit is 10, the stem unit is 100. Thus the stem “3” means 300 and the leaf “1” means 10.

Together, “3” and “1” mean 310.

Sometimes a display will reveal a pattern or trend

This scatter plot shows a trend.

What is it?

There is a (linear) trend for larger values for the husband’s age to be matched to larger values for the wife’s age.

[Figure: scatter plot of husband’s age against wife’s age.]

Sometimes a display will reveal a pattern or trend

Sometimes the trend is not linear.

Here the trend is cyclic. In fact, there are cycles inside of cycles with this data.

Measures of Center

The three most commonly used measures of center are:

Mean

Mode

Median

Mean: Sum of the data values divided by the number of data values.

1, 5, 8, 9, 12

Sum = 35

Divided by 5: 35/5 = 7

The mean is 7.

Center of the data - Mode

Mode: Data value with the highest frequency (count).

1, 5, 8, 9, 12

Mode = none. No one number shows up more often than the others.

1, 5, 5, 9, 12.

Mode = 5. The number 5 shows up twice. All others show up only once.

1, 5, 5, 9, 9.

Modes = 5 and 9. Two numbers show up more often than the other(s).

The data are bimodal.
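The three cases above (no mode, one mode, two modes) can be reproduced with a short sketch. The function name `modes` is hypothetical; it returns every value tied for the highest count and an empty list when no value repeats.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # every value appears once: no mode
    return [v for v, c in counts.items() if c == highest]

print(modes([1, 5, 8, 9, 12]))   # []
print(modes([1, 5, 5, 9, 12]))   # [5]
print(modes([1, 5, 5, 9, 9]))    # [5, 9]
```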

Center of the data - Median

Median: Data value in the middle of the (sorted) values.

3, 5, 8, 9, 12

Median = 8. 8 is the number that is in the middle. There are two numbers less than 8 and two numbers more than 8.

5, 9, 3, 5, 12.

Median = 5. The number 3 is NOT the median because the data values were not in order (sorted). In order, the numbers are 3, 5, 5, 9, 12. The number in the middle is 5. There are two numbers “below” 5 and two numbers above 5.

3, 5, 7, 9, 11, 12.

Median = 8. When there is an even number of data values, the middle two are averaged.

(7+9)/2 = 16/2 = 8.
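The sort-then-pick-the-middle rule, including the averaging step for an even count, can be sketched like this. The function name is hypothetical; the three calls use the three data sets above.

```python
def median(values):
    """Middle value of the sorted data; average the middle two if the count is even."""
    ordered = sorted(values)          # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average of two

print(median([3, 5, 8, 9, 12]))       # 8
print(median([5, 9, 3, 5, 12]))       # 5
print(median([3, 5, 7, 9, 11, 12]))   # 8.0
```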

Center of the data - All

For the following values, find the mean, mode, and median:

11, 14, 15, 8, 9, 15

Mean = 12. (11+14+15+8+9+15)/6. 72/6 = 12.

Mode = 15. The number 15 shows up more often (twice) than the other numbers.

Median = 12.5. In order, the numbers are 8, 9, 11, 14, 15, 15. Since there is an even number of values, the middle two are averaged. (11+14)/2 = 25/2 = 12.5.
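These three answers can be checked with Python's standard `statistics` module. This is just a cross-check of the hand calculation; the variable name `scores` is hypothetical.

```python
import statistics

scores = [11, 14, 15, 8, 9, 15]

print(statistics.mean(scores))     # 12
print(statistics.mode(scores))     # 15
print(statistics.median(scores))   # 12.5
```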

Spread of the data

There are two main measures for the spread (variation) of the data values.

Range: Distance from smallest to largest data values.

Standard deviation: A measure of the typical distance of the data values from the mean.

For the following values, find the range and standard deviation:

11, 14, 15, 8, 9, 15

Range = 15 – 8 = 7.

Standard deviation needs the mean:

Mean = 12. (11+14+15+8+9+15)/6. 72/6 = 12.

Standard deviation = $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

x     Mean   x - mean   Squared
11    12     -1         1
14    12      2         4
15    12      3         9
8     12     -4         16
9     12     -3         9
15    12      3         9
Total                   48

The total of the squared values is 48.

48/5 = 9.6

The square root of 9.6 is approx. 3.1
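The same steps (mean, deviations, squares, divide by n − 1, square root) can be sketched in Python. The function name `sample_std_dev` is hypothetical; it divides by n − 1, matching the 48/5 step in the worked calculation.

```python
import math

def sample_std_dev(values):
    """Square root of the summed squared deviations divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared = [(x - mean) ** 2 for x in values]   # the "Squared" column
    return math.sqrt(sum(squared) / (n - 1))

data = [11, 14, 15, 8, 9, 15]

print(max(data) - min(data))             # 7   (range)
print(round(sample_std_dev(data), 1))    # 3.1 (standard deviation)
```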

Spread of the data – Range and Standard Deviation

Find the range and standard deviation for:

6, 4, 5, 2, 3

Range = 6 – 2 = 4.

Standard deviation needs the mean:

Mean = 4. (6+4+5+2+3)/5. 20/5 = 4.

Standard deviation = $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

x     Mean   x - mean   Squared
6     4       2         4
4     4       0         0
5     4       1         1
2     4      -2         4
3     4      -1         1
Total                   10

The total of the squared values is 10.

10/4 = 2.5

The square root of 2.5 is approx. 1.58.
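As a cross-check, Python's standard `statistics` module offers two versions of the standard deviation. `statistics.stdev` divides by n − 1, as the calculation here does (10/4); `statistics.pstdev` divides by n and therefore gives a smaller value. The comparison is offered only to show why the divisor matters, not as part of the original exercise.

```python
import statistics

data = [6, 4, 5, 2, 3]

sample_sd = statistics.stdev(data)        # divides by n - 1 = 4
population_sd = statistics.pstdev(data)   # divides by n = 5

print(round(sample_sd, 2))       # 1.58
print(round(population_sd, 2))   # 1.41
```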

Probability – Events and Likelihood

Probability is the likelihood (chance) of something happening.

If 3 red marbles and 2 blue marbles are placed in a bag and you draw out one marble (without looking), the probability that the marble is red is 3/5 or 60%.

Likewise, the probability of withdrawing a blue marble is 2/5 or 40%.

The probability of having a male child is 50%. A couple has two children, both of whom are male. What is the probability that the couple’s third child will be male?

a. 0.125

b. 0.50 (Correct answer. The probability is the same for each child.)

c. 1.00

d. 1.25 (This is not a valid probability: 0 ≤ probability ≤ 1.)
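The answer can be checked by listing every equally likely birth order for three children and keeping only the families whose first two children are boys. This sketch assumes, as the question does, that each birth is male with probability 50% regardless of earlier births; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely birth orders for three children (M = male, F = female).
families = list(product("MF", repeat=3))

# Keep only the families in which the first two children are boys.
first_two_boys = [f for f in families if f[:2] == ("M", "M")]

# Among those, count the families whose third child is also a boy.
third_is_boy = [f for f in first_two_boys if f[2] == "M"]

probability = Fraction(len(third_is_boy), len(first_two_boys))
print(probability)   # 1/2
```

Option a (0.125) is the probability of three boys in a row before any child is born, which is a different question.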

Probability – Multiple Events (AND)

Two or more events can be combined. The probability of the combined events can be computed using a few formulas.

“AND”

Probability of A and B. (Independent events)

Let “A” mean flipping a coin and getting a “head”.

Let “B” mean rolling a die and getting a “3”.

Since A and B have no connection (independent),

P(A and B) = P(A)P(B) = (1/2)(1/6) = 1/12 = 0.0833…
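The product rule for independent events can be confirmed by counting. The sketch below lists all 12 equally likely (coin, die) outcomes and finds the single outcome that is both a head and a 3; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Every (coin, die) pair is equally likely: 2 x 6 = 12 outcomes.
outcomes = list(product(["H", "T"], range(1, 7)))
favorable = [o for o in outcomes if o == ("H", 3)]

p_counted = Fraction(len(favorable), len(outcomes))
p_product = Fraction(1, 2) * Fraction(1, 6)      # P(A) * P(B)

print(p_counted, p_product)          # 1/12 1/12
print(round(float(p_counted), 4))    # 0.0833
```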

Probability – Multiple Events (AND)

“AND”

Probability of A and B. (Dependent events)

Suppose we place 3 red marbles and 2 blue marbles in a bag.

Let “A” mean drawing out a red marble (without looking).

Let “B” mean drawing out a blue marble (without looking and without replacing the first “red” marble).

Since A and B have a connection they are dependent,

P(A and B) = P(A)P(B, given A) = (3/5)(2/4) = 3/10 = 0.3

Note: If the first marble were replaced, then the events would be independent,

P(A and B) = P(A)P(B) = (3/5)(2/5) = 6/25 = 0.24
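Both cases can be checked by listing every ordered pair of draws. In this sketch the marbles are given hypothetical labels (R1, R2, R3, B1, B2) so that each one can be told apart; `permutations` models drawing without replacement and `product` models drawing with replacement.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["R1", "R2", "R3", "B1", "B2"]   # 3 red and 2 blue marbles

def red_then_blue(draws):
    """Fraction of equally likely ordered draws that are red first, blue second."""
    draws = list(draws)
    hits = [d for d in draws if d[0][0] == "R" and d[1][0] == "B"]
    return Fraction(len(hits), len(draws))

without_replacement = red_then_blue(permutations(bag, 2))   # 20 ordered pairs
with_replacement = red_then_blue(product(bag, repeat=2))    # 25 ordered pairs

print(without_replacement)   # 3/10
print(with_replacement)      # 6/25
```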

Probability – Multiple Events (OR)

“OR”

Probability of A or B. (Mutually exclusive events)

Suppose we place 3 red marbles and 2 blue marbles in a bag.

Let “A” mean drawing out a red marble (without looking).

Let “B” mean drawing out a blue marble (without looking).

P(A or B) = P(A) + P(B) = (3/5) + (2/5) = 5/5 = 1.0 (Guaranteed!)

Probability – Multiple Events (OR)

“OR”

Probability of A or B. (Mutually exclusive events)

Suppose we wish to draw one card out of a standard (shuffled) deck.

Let “A” mean drawing out a “Jack” (without looking).

Let “B” mean drawing out a “5” (without looking).

P(A or B) = P(A) + P(B) = (4/52) + (4/52) = 8/52 = 0.153846 = 15.4%

If the two events are NOT mutually exclusive (thus compatible), the duplication must be accounted for.

Probability – Multiple Events (OR)

“OR”

Probability of A or B. (compatible events)

Suppose we wish to draw one card out of a standard (shuffled) deck.

Let “A” mean drawing out a “Jack” (without looking).

Let “B” mean drawing out a “club” (without looking).

P(A or B) = P(A) + P(B) – P(A&B) = (4/52) + (13/52) – (1/52) = 16/52 = 0.30769 = 30.8%
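Both “OR” rules can be verified by counting cards in a 52-card deck. The sketch below builds the deck and compares a direct count with the addition formulas; the helper names (`prob`, `jack`, `five`, `club`) are hypothetical.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(r, s) for r in ranks for s in suits]    # 52 equally likely cards

def prob(event):
    """Fraction of cards in the deck for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def jack(card): return card[0] == "J"
def five(card): return card[0] == "5"
def club(card): return card[1] == "clubs"

# Mutually exclusive: no card is both a Jack and a 5, so just add.
p_jack_or_five = prob(lambda c: jack(c) or five(c))
print(p_jack_or_five == prob(jack) + prob(five))     # True

# Compatible: the Jack of clubs is in both groups, so subtract it once.
p_jack_or_club = prob(lambda c: jack(c) or club(c))
overlap = prob(lambda c: jack(c) and club(c))
print(p_jack_or_club == prob(jack) + prob(club) - overlap)   # True

print(p_jack_or_five, p_jack_or_club)   # 2/13 4/13
```

The fractions print in lowest terms: 8/52 reduces to 2/13 and 16/52 reduces to 4/13.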

Probability – Multiple Events (OR)

“OR”

Probability of A or B. (compatible events)

[Figure: Venn diagram of two overlapping events, A and B; the overlapping region is A&B.]

Probability – Your Turn

“AND”

The probability of a child being born a female is 50%. The probability of having blue eyes is 25%. What is the probability of a newborn being a blue-eyed female?

P(A and B) = P(A)P(B) = (.50)(.25) = 0.125 = 12.5%

“OR”

What is the probability of a newborn being blue-eyed or female?

P(A or B) = P(A) + P(B) – P(A&B) = .50 + .25 – .125 = 0.625 = 62.5%
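Both answers can be reproduced in a few lines. The sketch assumes, as the “AND” calculation does, that eye color and sex are independent; the variable names are hypothetical.

```python
p_female = 0.50
p_blue_eyes = 0.25

# "AND": multiply, assuming the two events are independent.
p_and = p_female * p_blue_eyes

# "OR": add, then subtract the overlap so blue-eyed females are not counted twice.
p_or = p_female + p_blue_eyes - p_and

print(f"{p_and:.1%}")   # 12.5%
print(f"{p_or:.1%}")    # 62.5%
```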
