Models for Continuous Variables

[Figure: Longevity of Women (years), horizontal axis from 30 to 100]

Challenge

What to do about histograms describing distributions for continuous data, especially for large collections?

Tabulating each unique value is cumbersome.

Bin choices are arbitrary.

Models for Continuous Populations

Distributions of continuous data are modeled with smooth curves (“density functions”).

Nonnegative.

Total area under the curve is exactly 1.

The area under the curve above an interval is equal to the probability of a result in that interval.
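The three properties above can be written compactly. This is only a notational sketch: the symbols f (the density curve), X (the variable), and a, b (the endpoints of an interval) are assumed here and do not appear on the slides.

```latex
f(x) \ge 0 \ \text{for all } x, \qquad
\int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad
P(a \le X \le b) = \int_{a}^{b} f(x)\,dx
```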

Example – Female Longevity

Left skewed. Median > Mean.

Standard deviation: about 10 to 12 years.

Example – Female Longevity

If we want the probability a woman lives to at least 90 years of age, we find the area under the curve over the interval extending from 90 to the right.

Total area = 1

Total area = (# rectangles) × (area of one rectangle)

Each rectangle is 2 years wide (32 to 34, etc.) and 0.004 tall: 2 × 0.004 = 0.008 units of area.

1 = (# rectangles) × 0.008

# rectangles = 1 / 0.008 = 125

Interval     # Rectangles
90 – 92      10.9
92 – 94      10.2
94 – 96       8.5
96 – 98       5.7
98 – 100      2.2
100+          0.0+
Total        37.5

• Total of 37.5 rectangles.

• Each rectangle is 0.008 units of area.

• The area under the curve is 37.5 × 0.008 = 0.30, or 37.5 / 125 = 0.30.

30.0% of women live to at least 90 years of age.

90 years is the 70th percentile.
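The rectangle-counting arithmetic above can be checked with a short script. This is a minimal sketch: the counts are copied from the table, while the names (RECT_AREA, area_from_rectangles, and so on) are made up for illustration.

```python
# Rectangle-counting estimate of P(longevity >= 90).
RECT_WIDTH = 2.0      # each rectangle spans 2 years
RECT_HEIGHT = 0.004   # each rectangle is 0.004 density units tall
RECT_AREA = RECT_WIDTH * RECT_HEIGHT  # 0.008 units of area

# Rectangles under the curve in each 2-year interval from 90 to 100
# (values read from the table).
rectangles_90_plus = [10.9, 10.2, 8.5, 5.7, 2.2]


def area_from_rectangles(counts, rect_area=RECT_AREA):
    """Convert a list of rectangle counts into area (probability)."""
    return sum(counts) * rect_area


p_at_least_90 = area_from_rectangles(rectangles_90_plus)
total_rectangles = 1 / RECT_AREA  # rectangles needed to make total area 1

print(f"P(at least 90) = {p_at_least_90:.2f}")
print(f"Rectangles under the whole curve = {total_rectangles:.0f}")
print(f"90 is roughly percentile {100 * (1 - p_at_least_90):.0f}")
```

The same helper works for any other interval once its rectangle counts are read off the graph.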

Example – Female Longevity

What is the probability a woman dies between the ages of 60 and 70?

Interval     # Rectangles
60 – 62      1.2
62 – 64      1.6
64 – 66      1.9
66 – 68      2.3
68 – 70      2.8
Total        9.8

• Total of 9.8 rectangles.

• Each rectangle is 0.008 units of area.

• The area under the curve is 9.8 × 0.008 = 0.0784.

About 7.84% of women die between the ages of 60 and 70.

Example – Female Longevity

Determine the median.

It’s below 90 and above 80.

Example – Female Longevity

The median M has probability 0.50 above it and 0.50 below it: 0.5 of the area under the curve lies below M and 0.5 lies above M.

Median Female Longevity

67.5 rectangles lie under the curve below 86.

67.5 × 0.008 = 0.54, so 86 is the 54th percentile.

86 is too high.

58 rectangles lie under the curve below 84.

58 × 0.008 = 0.464, so 84 is the 46.4th percentile.

84 is too low.

Percentile k    46.4    50.0     54.0
Value x         84      84.75    86

The median is about 84.75 years of age (85 when rounded).
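One way to place the 50th percentile between the two bracketing values is straight-line interpolation. That method is an assumption here, since the slides do not say how 84.75 was obtained, and the function name is hypothetical. It lands a little under 85, in the same neighborhood as the value quoted above, and both round to 85.

```python
RECT_AREA = 0.008  # area of one rectangle, from the earlier slides


def interpolate_value(p_target, p_low, x_low, p_high, x_high):
    """Linear interpolation: find x whose cumulative area is p_target,
    given two bracketing (area, value) pairs."""
    fraction = (p_target - p_low) / (p_high - p_low)
    return x_low + fraction * (x_high - x_low)


p_below_84 = 58 * RECT_AREA     # 0.464, so 84 is too low
p_below_86 = 67.5 * RECT_AREA   # 0.540, so 86 is too high

median_estimate = interpolate_value(0.50, p_below_84, 84, p_below_86, 86)
print(f"Interpolated median = {median_estimate:.2f} years")
```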

Example – Female Longevity

To approximate the mean, use midpoints and probabilities.

Midpoint = 71 (rounded to nearest odd year). Area = 3.4 rectangles.

Probability = 3.4 × 0.008 = 0.0272

Left skewed

Mode = 90

Median = 85

Mean = 83

[Figure: density curve for Longevity of Women (years); horizontal axis from 30 to 100, vertical axis Density from 0.000 to 0.044; μ = 83.0 marked]

Example – Female Longevity

Mean = balance point: μ = 83.0

(Easy for symmetric distributions.)

[Figure: density curve for Longevity of Women (years); horizontal axis from 30 to 100, vertical axis Density from 0.000 to 0.044; μ = 83.0, σ = 11.0]

Example – Female Longevity

The mean and standard deviation will generally be given, or follow from formulas.

μ = 83.0, σ = 11.0

The curve models a population distribution for a continuous variable.

The model must capture the important information in the population of data.

If we think of the experiment that randomly selects a single item from the population and records a result, we call this curve a probability distribution.

Example

Wait time (minutes) until seating at a restaurant is modeled by y = x / 50 over the range from x = 0 to 10.

[Figure: graph of y = x / 50 for x from 0 to 10; vertical axis from 0.00 to 0.20]

Why is this a legitimate model for a continuous variable?

It’s nonnegative.

The total area is ½ b h = 0.5(10)(0.2) = 1.
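Both conditions can also be checked numerically. This is a minimal sketch: the function names and the choice of a midpoint sum are illustrative, not part of the slides.

```python
def density(x):
    """Wait-time model y = x / 50 on [0, 10]; zero outside that range."""
    return x / 50 if 0 <= x <= 10 else 0.0


def midpoint_area(f, a, b, n=1000):
    """Approximate the area under f from a to b with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Condition 1: the curve is never negative (checked on a grid of x values).
grid = [i / 100 for i in range(0, 1001)]
is_nonnegative = all(density(x) >= 0 for x in grid)

# Condition 2: the total area is 1.
total_area = midpoint_area(density, 0, 10)

print("Nonnegative:", is_nonnegative)
print(f"Total area = {total_area:.4f}")
```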

Determine the probability of a wait less than 6 min.

If x = 6: y = 6/50 = 0.12.

The shaded area is 0.5(6)(0.12) = 0.36.

(6 is the 36th percentile.)

Determine the probability of waiting longer than 8 min.

If x = 8, y = 8/50 = 0.16. The pink area is 0.5(8)(0.16) = 0.64. (So 8 is the 64th percentile.)

The yellow area is the probability of a result greater than 8:

1 – 0.64 = 0.36.
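The two triangle calculations above follow one pattern: the area below x is ½ · x · (x / 50). A small sketch, with the hypothetical function name area_below:

```python
def area_below(x):
    """P(wait < x) for the model y = x / 50 on [0, 10]:
    a triangle with base x and height x / 50."""
    x = min(max(x, 0.0), 10.0)  # clamp to the range of the model
    return 0.5 * x * (x / 50)


p_less_than_6 = area_below(6)        # shaded triangle up to 6
p_more_than_8 = 1 - area_below(8)    # total area minus triangle up to 8

print(f"P(wait < 6) = {p_less_than_6:.2f}")
print(f"P(wait > 8) = {p_more_than_8:.2f}")
```

Both come out to 0.36, matching the slides, even though the two regions have different shapes.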

Determine the median wait.

6 is the 36th percentile; 8 is the 64th. The median is the 50th.

7 would be a good guess. However, the probability of a result less than 7 is 0.5(7)(0.14) = 0.49. The median is a bit above 7.

Determine the median m.

Whatever m is: 0.5 · m · (m/50) = 0.5. So m² = 50.

The median is m = √50 = 7.071.

The mode is 10.00.

The mean is μ = 6 + 2/3 = 6.667.
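For reference, the median and mean can be derived as follows. The integral form of the mean is the standard definition for a continuous density. It is not shown on the slides, so treat it as a supporting sketch.

```latex
\tfrac{1}{2}\, m \cdot \frac{m}{50} = \frac{m^{2}}{100} = 0.5
\;\Longrightarrow\; m^{2} = 50
\;\Longrightarrow\; m = \sqrt{50} \approx 7.071

\mu = \int_{0}^{10} x \cdot \frac{x}{50}\, dx
    = \left[ \frac{x^{3}}{150} \right]_{0}^{10}
    = \frac{1000}{150}
    = 6\tfrac{2}{3} \approx 6.667
```

The mode is 10 because the density y = x / 50 increases over the whole range and is highest at the right endpoint.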

Determine the probability of a result between 5.5 and 6.5.

Area below 6.5: 0.4225

Area below 5.5: 0.3025

Area between 5.5 and 6.5: 0.1200

If the result were rounded to the nearest whole number, the probability is 0.12 that it rounds to 6.

Determine the probability of a result between 4.5 and 5.5.

Area below 5.5: 0.3025

Area below 4.5: 0.2025

Area between 4.5 and 5.5: 0.1000

If the result were rounded to the nearest whole number, the probability is 0.10 that it rounds to 5.
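The "area below" subtractions used for these intervals can be wrapped in one helper. A minimal sketch with hypothetical names, using area below x = x² / 100 (the same triangle area as before, simplified):

```python
def area_below(x):
    """Area under y = x / 50 from 0 to x, for 0 <= x <= 10."""
    return x ** 2 / 100


def prob_between(a, b):
    """P(a < wait < b) = (area below b) - (area below a)."""
    return area_below(b) - area_below(a)


print(f"P(5.5 < wait < 6.5) = {prob_between(5.5, 6.5):.4f}")
print(f"P(4.5 < wait < 5.5) = {prob_between(4.5, 5.5):.4f}")
```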

Rounds to    Interval        Probability
0            0.0 to 0.5
1            0.5 to 1.5      0.0200
2            1.5 to 2.5      0.0400
3            2.5 to 3.5      0.0600
4            3.5 to 4.5      0.0800
5            4.5 to 5.5      0.1000
6            5.5 to 6.5      0.1200
7            6.5 to 7.5      0.1400
8            7.5 to 8.5      0.1600
9            8.5 to 9.5      0.1800
10           9.5 to 10.0

The probabilities for 1 through 9 sum to 0.90.

Determine the probability of a result between 0 and 0.5. (Would round to 0.)

Area below 0.5: 0.0025

Determine the probability of a result between 9.5 and 10.0. (Would round to 10.)

Area below 10.0: 1.0000

Area below 9.5: 0.9025

Area between 9.5 and 10.0: 0.0975

Rounds to    Interval        Probability
0            0.0 to 0.5      0.0025
1            0.5 to 1.5      0.0200
2            1.5 to 2.5      0.0400
3            2.5 to 3.5      0.0600
4            3.5 to 4.5      0.0800
5            4.5 to 5.5      0.1000
6            5.5 to 6.5      0.1200
7            6.5 to 7.5      0.1400
8            7.5 to 8.5      0.1600
9            8.5 to 9.5      0.1800
10           9.5 to 10.0     0.0975

The probabilities sum to 1.0.
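The whole table can be regenerated from the model. This is a sketch under the same assumptions as the slides (area below x is x² / 100, and the end intervals are cut off at 0 and 10). The function names are illustrative.

```python
def area_below(x):
    """Area under y = x / 50 from 0 to x, for 0 <= x <= 10."""
    return x ** 2 / 100


def rounded_distribution():
    """Probability that the wait rounds to each whole number 0..10."""
    probs = {}
    for k in range(0, 11):
        low = max(k - 0.5, 0.0)    # interval is cut off at 0
        high = min(k + 0.5, 10.0)  # and at 10
        probs[k] = area_below(high) - area_below(low)
    return probs


probs = rounded_distribution()
for k, p in probs.items():
    print(f"Rounds to {k:2d}: {p:.4f}")
print(f"Sum = {sum(probs.values()):.4f}")
```

The values 0 and 10 get smaller probabilities than their neighbors because their intervals are only half as wide.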

Rounded value x    P(x)      Mean computation
0                  0.0025    0 × 0.0025 = 0.0000
1                  0.0200    1 × 0.0200 = 0.0200
2                  0.0400    2 × 0.0400 = 0.0800
3                  0.0600    3 × 0.0600 = 0.1800
4                  0.0800    4 × 0.0800 = 0.3200
5                  0.1000    5 × 0.1000 = 0.5000
6                  0.1200    6 × 0.1200 = 0.7200
7                  0.1400    7 × 0.1400 = 0.9800
8                  0.1600    8 × 0.1600 = 1.2800
9                  0.1800    9 × 0.1800 = 1.6200
10                 0.0975    10 × 0.0975 = 0.9750

SUM = 6.6750
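The mean computation in the table is a weighted sum, each rounded value times its probability. A minimal sketch with the P(x) column copied from the table. The comparison with the continuous mean 6 + 2/3 is added for context, and the variable names are illustrative.

```python
# P(x) for each rounded wait time x, copied from the table.
p = {
    0: 0.0025, 1: 0.0200, 2: 0.0400, 3: 0.0600, 4: 0.0800, 5: 0.1000,
    6: 0.1200, 7: 0.1400, 8: 0.1600, 9: 0.1800, 10: 0.0975,
}

# Mean of the rounded values: sum of x * P(x).
mean_rounded = sum(x * px for x, px in p.items())

# Mean of the continuous model, from the earlier slide.
mean_continuous = 6 + 2 / 3

print(f"Mean of rounded values = {mean_rounded:.4f}")
print(f"Mean of continuous model = {mean_continuous:.4f}")
print(f"Difference = {mean_rounded - mean_continuous:.4f}")
```

Rounding shifts the mean only slightly, so the rounded table is a reasonable stand-in for the continuous model here.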
