Discrete Data Distributions and Summary Statistics Terms: histogram, mode, mean, range, standard deviation, outlier

Discrete Data

Page 1: Discrete Data

Discrete Data

Distributions and Summary Statistics

Terms: histogram, mode, mean, range, standard deviation, outlier

Page 2: Discrete Data

Discrete vs. Continuous Data

dis·crete adj.

1: Constituting a separate thing. See Synonyms at distinct.

2: Consisting of unconnected distinct parts.

3: Mathematics: Defined for a finite or countable set of values; not continuous.

con·tin·u·ous adj.

1: Uninterrupted in time, sequence, substance, or extent. See Synonyms at continual.

2: Attached together in repeated units: a continuous form fed into a printer.

3: Mathematics: Of or relating to a line or curve that extends without a break or irregularity.

Page 3: Discrete Data

Discrete vs. Continuous Data

discrete

Usually related to counts.

Variable values for different units often tie.

Averaging two values does not necessarily yield another possible value.

continuous

Any value in some interval.

A tie among different units is in theory virtually impossible. In practice, ties (due to rounding) are infrequent.

The average of any two values is another (and different) possible value.

Page 4: Discrete Data

Distribution

The distribution of a variable tells us what values it takes and how often it takes those values.

MAKE A PICTURE!

For discrete quantitative data, use a relative frequency chart / histogram* to display the distribution.

* Fundamentally these are the same thing.
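As a rough illustration of "what values it takes and how often," the sketch below tabulates relative frequencies and prints a crude text chart. The `siblings` data and the function name `relative_frequencies` are made up for this example; they are not from the slides.

```python
from collections import Counter

def relative_frequencies(values):
    """Map each distinct value to its share of the data (shares sum to 1)."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}

# Made-up counts (e.g. number of siblings reported by ten students).
siblings = [0, 1, 1, 2, 1, 3, 0, 1, 2, 1]

for value, share in relative_frequencies(siblings).items():
    # One text bar per value: a crude relative frequency chart.
    print(f"{value}: {'#' * round(share * 20)} {share:.0%}")
```

The bar lengths are proportional to the relative frequencies, which is all a histogram of discrete data shows.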

Page 5: Discrete Data

Left Skewed Distribution

Page 6: Discrete Data

Right Skewed Distribution

Page 7: Discrete Data

Symmetric Distribution

Page 8: Discrete Data

Outlier

outlier noun

1: something that is situated away from or classed differently from a main or related body

2: a statistical observation that is markedly different in value from the others of the sample

Page 9: Discrete Data

Measures of Center

Median

Half the data are above/below the median.

Not well suited to highly discrete data. More later about this.

(Sample) Mean

Sum all the data x, then divide by how many (n)

Denoted x̄ (“x bar”)

Both have the same measurement units as the data.

Page 10: Discrete Data

Less Important Measures of Center

Midrange

Average the minimum and maximum

For highly skewed data, the midrange is often a value that is quite atypical.

Mode

Most common value - highest proportion of occurrence

There can be 2 (or more) modes if there are ties in relative frequencies.

Generally found by graphical inspection.

Sometimes not anywhere near any “center.”

Both have the same measurement units as the data.
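The four measures of center can be compared side by side in a short sketch. The data below are invented (a small right-skewed set), and `midrange` is a helper written for this example; `statistics.multimode` is used because it returns every tied mode.

```python
import statistics

def midrange(values):
    """Average of the minimum and the maximum."""
    return (min(values) + max(values)) / 2

# Made-up, right-skewed counts.
data = [1, 1, 2, 2, 2, 3, 4, 9]

center = {
    "median": statistics.median(data),
    "mean": statistics.mean(data),
    "midrange": midrange(data),
    "modes": statistics.multimode(data),  # a list, since ties are possible
}
print(center)
```

With this skewed set the midrange lands well above the median, mean, and mode, which is the sense in which it can be atypical.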

Page 11: Discrete Data

Measures of spread / variation (SAME THING)

Range = Max – Min

In statistics, Range is a single number.

Interquartile Range

Better suited to continuous data.

More later about this.

Variance / Standard Deviation

All but variance have the same measurement units as the data.

Page 12: Discrete Data

Variance S²

Mean of the squared deviations from the mean

1. Obtain the Mean.

2. Determine, for each value, the deviation from the Mean.

3. Square each of these deviations

4. Sum these squares

5. Divide this sum by one fewer than the number of observations to get the Variance

Measure of squared variation from the mean
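The five steps translate almost line for line into code. This is a minimal sketch, not a replacement for a calculator or statistics package; the function name `sample_variance` is made up here, and the five ages are the ones used in the deck's worked Standard Deviation Calculation for the Large Section.

```python
def sample_variance(values):
    """Sample variance via the five steps listed above."""
    n = len(values)
    mean = sum(values) / n                      # 1. obtain the mean
    deviations = [x - mean for x in values]     # 2. deviation of each value
    squares = [d ** 2 for d in deviations]      # 3. square each deviation
    total = sum(squares)                        # 4. sum the squares
    return total / (n - 1)                      # 5. divide by one fewer than n

ages = [43, 48, 42, 44, 47]
print(round(sample_variance(ages), 2))
```

Because the deviations are squared, the result is in squared units (here, years squared).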

Page 13: Discrete Data

Standard Deviation S

Square root of the Variance

Measure of spread / variation (from the mean)

Same measurement units as the data.

Page 14: Discrete Data

Comparing Means & Standard Deviations

Small: Mean = 41.60 SD = 2.07

Large: Mean = 44.80 SD = 2.59

[Figure: Age Guess plotted for the Small Class and the Large Class, axis from 38 to 50]

Page 16: Discrete Data

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add a 40 and a 50…

Mean 44.86 SD 3.58

Page 18: Discrete Data

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add a 42 and a 48…

Mean 44.86 SD 2.73

Page 20: Discrete Data

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add 45 and 45…

Mean 44.86 SD 2.12
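The three "add two values" comparisons can be checked in a few lines. The sketch assumes the starting data are the five Large Section guesses from the deck's worked calculation (43, 48, 42, 44, 47), which do give Mean 44.80 and SD 2.59; the helper name `summarize` is made up.

```python
import statistics

large = [43, 48, 42, 44, 47]  # five guesses from the deck's worked calculation

def summarize(values):
    """Return (mean, sample SD), each rounded to two decimals."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)

print("original     ", summarize(large))
for extra in ([40, 50], [42, 48], [45, 45]):
    print(f"add {extra}", summarize(large + extra))
```

In each case the two added values average 45, so the mean barely moves; the SD grows when the new values sit far from the mean and shrinks when they sit close to it.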

Page 21: Discrete Data

Comparing Means & Standard Deviations

[Figure: two distributions plotted on a common axis from 0 to 16 (frequencies from 0 to 6). First: Mean = 4.0, SD = 3.0. Second: Mean = 8.0, SD = 3.0.]

Page 22: Discrete Data

Comparing Means & Standard Deviations

[Figure: two distributions plotted on a common axis from 0 to 16 (frequencies from 0 to 6). First: Mean = 8.0, SD = 3.0. Second: Mean = 8.0, SD = 6.0.]

Page 23: Discrete Data

Computing Mean & Standard Deviation

Data listed by unit

1. By hand with calculator support (UGH)

2. Using your calculator’s built-in statistics functionality

• 60 second quiz: Determine and write down the mean and standard deviation of at most 10 data values in under 1 minute

3. Using Excel

4. Using Minitab

Page 24: Discrete Data

Z = # of St Devs from Mean

“…within Z standard deviations of the mean…”

Determine Z × SD.

Find the values

Mean – Z × SD and Mean + Z × SD

This means:

“…between __________ and ______________.”
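As a small worked instance of filling in those blanks, the sketch below computes the two endpoints. The function name `within_z` is made up; the mean and SD are the Large Class summary values quoted earlier in the deck, and Z = 2 is chosen only for illustration.

```python
def within_z(mean, sd, z):
    """Return the (low, high) endpoints Mean - Z*SD and Mean + Z*SD."""
    half_width = z * sd
    return mean - half_width, mean + half_width

# Large Class summary from the deck: Mean = 44.80, SD = 2.59.
low, high = within_z(44.80, 2.59, 2)
print(f"within 2 SDs of the mean: between {low:.2f} and {high:.2f}")
```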

Page 25: Discrete Data

Mean & Standard Deviation

Where the data are

In general you’ll find that about

68% of the data fall within 1 standard deviation of the mean

95% fall within 2

nearly all (about 99.7%) fall within 3

There are exceptions.

These guidelines hold fairly precisely for data that have a bell-shaped (Normal) histogram.
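One way to see both the guideline and an exception is to count how much of a data set falls within k standard deviations. The sketch below uses simulated data, not data from the slides: `bell` is drawn from a Normal distribution and `flat` is a deliberately non-bell-shaped set of digits. The name `share_within` is made up.

```python
import random
import statistics

def share_within(values, k):
    """Fraction of values within k sample SDs of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [x for x in values if abs(x - mean) <= k * sd]
    return len(inside) / len(values)

random.seed(0)  # fixed seed so reruns give the same numbers
bell = [random.gauss(0, 1) for _ in range(10_000)]    # simulated Normal data
flat = [random.randint(0, 9) for _ in range(10_000)]  # one of the "exceptions"

for k in (1, 2, 3):
    print(k, round(share_within(bell, k), 3), round(share_within(flat, k), 3))
```

The bell-shaped column should come out close to 68%, 95%, and nearly all; the flat column should not, since every digit already lies within 2 standard deviations.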

Page 26: Discrete Data

Range Rule of Thumb

To guess the standard deviation, take the usual range of data and divide by four.

Page 27: Discrete Data

Most homes for sale in the Oswego City School District are listed at prices between $50,000 and $200,000. What would you guess for the standard deviation of prices?

Page 28: Discrete Data

$50,000 to $200,000

Range about $200,000 – $50,000 = $150,000

Apply the RRoT…

$150,000 / 4 = $37,500

Page 29: Discrete Data

Students are asked to complete a survey online. This assignment is made on a Monday at about noon. The survey closes Wednesday at midnight.

Since each student’s submission is accompanied by a time stamp, it is simple to figure how early, relative to the deadline, each student submitted the work.

For the data set of amount of time early, guess the standard deviation. Give results in both days and hours.

Page 30: Discrete Data

This assignment is made on a Monday at about noon. The survey closes Wednesday at midnight.

That’s 2.5 days, or 60 hours. People will hand it in between immediately (2.5 days / 60 hours early) and at the last minute (0 early). The range is about 2.5 days or 60 hours.

Apply the RRoT…

2.5 / 4 = 0.625 days

60 / 4 = 15 hours

These are the same: 0.625 days × 24 hours per day = 15 hours.

Page 31: Discrete Data

Consider GPAs of graduating seniors.

Guess the standard deviation.

Page 32: Discrete Data

GPAs. You can’t graduate under 2.0. All As gives 4.0.

Min about 2.0; Max probably exactly 4.0

Range about 4.0 – 2.0 = 2.0

Apply the RRoT…

2.0 / 4 = 0.5
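The three guesses above all follow the same one-line recipe, sketched here. The function name `range_rule_sd` is made up; the inputs are the usual minimum and maximum taken from each example.

```python
def range_rule_sd(low, high):
    """Range Rule of Thumb: guess the SD as (usual range) / 4."""
    return (high - low) / 4

print(range_rule_sd(50_000, 200_000))  # home prices, in dollars
print(range_rule_sd(0, 2.5))           # time early, in days
print(range_rule_sd(0, 60))            # time early, in hours
print(range_rule_sd(2.0, 4.0))         # GPAs
```

The guess carries the same units as the data, which is why the days and hours answers differ by a factor of 24.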

Page 33: Discrete Data

Example

An instructor asked students in two sections of the same course to guess the instructor’s age. Students in the first class (in a large lecture hall) had no other knowledge of the instructor’s personal life. Students in the second class (in a small classroom) knew that the instructor was the father of a young girl.

Page 34: Discrete Data

Variable

Guess of instructor’s age

Quantitative

Units

The students

Guess of instructor’s age varies from student to student.

Page 35: Discrete Data

Variable

Class (or Which class?)

Categorical

Units

The students

Which class varies from student to student.

Page 36: Discrete Data

This is a fairly symmetric distribution.

Mode = 42

Range = 54 – 32 = 22

[Figure: Dotplot of Age_Large, axis from 28 to 54]

Page 37: Discrete Data

This is a symmetric distribution.

Mode = 42

Mean = 42.0

Symmetry: Typically Mean ≈ Mode

(≈ means “nearly equal”)

[Figure: Dotplot of Age_Large, axis from 28 to 54]

Page 38: Discrete Data

Large class: Mean = 42.0

Small class: Mean = 39.0

[Figure: Dotplot of Age_Small and Dotplot of Age_Large, each on an axis from 28 to 54]

Page 39: Discrete Data

Large class: Mean = 42.0, St Dev ≈ 22 / 4 = 5.5

Small class: Mean = 39.0, St Dev ≈ 22 / 4 = 5.5

[Figure: Dotplot of Age_Large and Dotplot of Age_Small, each on an axis from 28 to 54]

Page 40: Discrete Data

Mean = 40.25 St Dev = 4.33 (guess 4.25)

Mean = 38.15 St Dev = 4.14 (guess 3.75)

[Figure: plots for the Large Class and the Small Class, each on an axis from 30 to 45]

Page 41: Discrete Data

Properties: Mean & Standard Deviation

They don’t really “depend” (in the usual sense) on how much data there is. They depend on the relative frequency (percent) of occurrence of each value.

Adding a new unit…

Sometimes the mean will go up; sometimes down. But on average it will stay the same.

Same for standard deviation.
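A quick way to see that these summaries depend on relative frequencies rather than on the amount of data is to duplicate a data set, which doubles n without changing any relative frequency. This is only a sketch; the five values are the Large Section guesses from the deck's worked calculation.

```python
import statistics

large = [43, 48, 42, 44, 47]   # five guesses from the deck's worked calculation
doubled = large * 2            # same relative frequencies, twice the data

print(statistics.mean(large), statistics.mean(doubled))
print(statistics.pstdev(large), statistics.pstdev(doubled))  # divides by n
print(statistics.stdev(large), statistics.stdev(doubled))    # divides by n - 1
```

The mean is unchanged, and so is the standard deviation computed with a divisor of n. The sample version with divisor n – 1 still shifts somewhat, noticeably for a sample this small, which is one reason the slide says they don't "really" depend on the amount of data.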

Page 42: Discrete Data

Standard Deviation Calculation

Standard Deviation Calculation for the Large Section

Age    Mean    Deviation from Mean    Deviation squared
43     44.8    43 – 44.8 = -1.8       (-1.8)² = 3.24
48     44.8    48 – 44.8 = +3.2       3.2² = 10.24
42     44.8    42 – 44.8 = -2.8       (-2.8)² = 7.84
44     44.8    44 – 44.8 = -0.8       (-0.8)² = 0.64
47     44.8    47 – 44.8 = +2.2       2.2² = 4.84
Sums   224     224.0                  0                      26.80

Mean = 224 / 5 = 44.8

Variance = 26.8 / 4 = 6.7

SD = √6.7 = 2.59
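The hand calculation can be cross-checked with software. The deck mentions calculators, Excel, and Minitab; the sketch below uses Python's standard `statistics` module instead, purely as an assumption of convenience.

```python
import statistics

ages = [43, 48, 42, 44, 47]

mean = statistics.mean(ages)
deviations = [x - mean for x in ages]

print("sum of ages:", sum(ages))
print("sum of deviations:", round(sum(deviations), 10))
print("sum of squared deviations:", round(sum(d ** 2 for d in deviations), 2))
print("variance:", round(statistics.variance(ages), 2))  # divides by n - 1
print("SD:", round(statistics.stdev(ages), 2))
```

`statistics.variance` and `statistics.stdev` use the n – 1 divisor, matching the "one fewer than the number of observations" step.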

The deviations from the mean ALWAYS sum to 0, for every data set.

Page 50: Discrete Data

Standard Deviation Calculation

Sample Mean (Large Section): x̄ = Σx / n = 224 / 5 = 44.80

Sample Standard Deviation (Large Section): S = √( Σ(x – x̄)² / (n – 1) ) = √(26.8 / 4) = 2.59