Last Time

• Hypothesis Testing

– 1-sided vs. 2-sided Paradox

• Big Picture Goals

– Hypothesis Testing

– Margin of Error

– Sample Size Calculations

• Visualization

– Histograms

Administrative Matters

Midterm I, coming Tuesday, Feb. 24

• Excel notation to avoid actual calculation

– So no computers or calculators

• Bring sheet of formulas, etc.

• No blue books needed

(will just write on my printed version)

Administrative Matters

Midterm I, coming Tuesday, Feb. 24

• Material Covered:

HW 1 – HW 5

– Note: due Thursday, Feb. 19

– Will ask grader to return Mon. Feb. 23

– Can pick up in my office (Hanes 352)

– So today’s HW not included

Reading In Textbook

Approximate Reading for Today’s Material:

Pages 261-262, 9-14, 270-276, 30-34

Approximate Reading for Next Class:

Pages 279-282, 34-43

Big Picture

• Hypothesis Testing

(Given dist’n, answer “yes-no”)

• Margin of Error

(Find dist’n, use to measure error)

• Choose Sample Size

(for given amount of error)

Need better prob. tools

Start with visualizing probability distributions

(key to “alternate representation”)

Histograms

Idea: show rectangles, where area represents:

(a) Distributions: probabilities

(b) Lists (of numbers): # of observations

Note: will study these in parallel for a while

(several concepts apply to both)

Caution: There are variations not based on areas, see bar graphs in text

But eye perceives area, so sensible to use it

Histograms

Steps for Constructing Histograms:

1. Pick class intervals that contain full dist’n

a. Prob. dist’ns:

If possible values are: x = 0, 1, … , n,

get good picture from choice:

[-½, ½), [½, 1.5), [1.5, 2.5), … , [n-½, n+½)

where [1.5, 2.5) is “all #s ≥ 1.5 and < 2.5”

(called a “half open interval”)

b. Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1

Start with [1,3), [3,7)

• As above use half open intervals

(to break ties)

• Can use anything for class intervals

• But some choices better than others…

2. Find “probabilities” or “relative frequencies”

for each class

(a) Probs: use f(x) for [x-½, x+½), etc.

(b) Lists: [1,3): rel. freq. = 1/5 = 20%

[3,7): rel. freq. = 4/5 = 80%
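Step 2 for a list can be sketched in Python. This is a minimal illustration, assuming half-open bins [lo, hi) as described above; the helper name `relative_frequencies` is hypothetical and not part of the course materials.

```python
def relative_frequencies(data, bins):
    """Return the fraction of observations falling in each half-open bin [lo, hi)."""
    n = len(data)
    freqs = []
    for lo, hi in bins:
        # Half-open interval: lo is included, hi is excluded (this breaks ties).
        count = sum(1 for x in data if lo <= x < hi)
        freqs.append(count / n)
    return freqs


data = [2.3, 4.5, 4.7, 4.8, 5.1]
bins = [(1, 3), (3, 7)]
rel_freqs = relative_frequencies(data, bins)
for (lo, hi), rf in zip(bins, rel_freqs):
    print(f"[{lo},{hi}): rel. freq. = {rf:.0%}")
```

Because the intervals are half open, an observation that lands exactly on a shared edge such as 3 is counted once, in [3,7).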

3. Above each interval, draw rectangle where

area represents class frequency

(a) Probs: If width = 1, then

area = width x height = height

So get area = f(x), by taking height = f(x)

E.g. Binomial Distribution

Binomial Prob. Histograms

From Class Example 5: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg5.xls

Construct Prob. Histo:

• Create column of x values

• Compute f(x) values

• Make bar plot

Binomial Prob. Histograms

• Make bar plot

– “Insert” tab

– Choose “Column”

– Right Click, Select Data (Horizontal: x’s, “Add series”, Probs)

– Resize, and move by dragging

– Delete legend

– Click and change title

– Right Click on Bars, Format Data Series:

• Border Color, Solid Line, Black

• Series Options, Gap Width = 0

• Make several, for interesting comparison
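The same construction can be sketched outside Excel. The Python below is an illustration only: it assumes the standard binomial formula for f(x), and the values n = 10 and p = 0.2, 0.5, 0.8 are chosen here for the example, not taken from Class Example 5. It prints a rough text version of each probability histogram.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [f(0), f(1), ..., f(n)] for a Binomial(n, p) distribution."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


n = 10  # illustrative choice, not taken from Class Example 5
for p in (0.2, 0.5, 0.8):
    probs = binomial_pmf(n, p)
    # With class intervals [x - 1/2, x + 1/2) of width 1, bar height = f(x).
    print(f"p = {p}")
    for x, fx in enumerate(probs):
        print(f"  x = {x:2d}  f(x) = {fx:.4f}  " + "#" * round(50 * fx))
```

Printing several values of p one after another makes the comparison easy to see by eye.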

Binomial Prob. Histograms

From Class Example 5a

Compare different p:

• Surprisingly similar “mound” shape (will exploit this fact)

• Centerpoint moves as p grows (will quantify, and use this, too)

Binomial Prob. Histograms

Important point:

Binomial shows common shape across p

Mound Shape

(like dumping dirt out of a truck)

What about n?

Binomial Prob. Histograms

From Class Example 5b

Compare different n:

• Again very similar mound shape (will exploit this fact)

• Center does not appear to move, but check axes! (will quantify, and use this, too)

• But width of bump does seem to change (will quantify, and use this, too)

Binomial Prob. Histograms

Important point:

Binomial shows common shape across p & n

Mound Shape

(like dumping dirt out of a truck)

Question for later: How can we put this to work?

And now for something (sort of) different

Recall survey from first class meeting

Display Results? Use “bar graph”

Bar Graph from Survey, on major

• Business biggest (true for many years)

• Biology 2nd (fairly new)

• Variety of others: Welcome!

• Labels, not Class Intervals

• Thin bars now OK

• Study counts, not rel. freq. (not areas)

Bar Graph from Survey, on year

• Distribution makes sense?

• Different color stresses different data

• Shorter & fewer labels appear as horizontal

Histograms

Steps for Constructing Histograms:

1. Pick class intervals that contain full dist’n

2. Find “probabilities” or “relative frequencies”

for each class

3. Above each interval, draw rectangle where

area represents class frequency

Histograms

HW: 5.21b (make & print an Excel plot)

Histograms

3. Above each interval, draw rectangle where

area represents class frequency

(a) Probs

(b) Lists: e.g. 2.3, 4.5, 4.7, 4.8, 5.1

same e.g. as above

Histograms

Rectangles - area represents class frequency

2.3, 4.5, 4.7, 4.8, 5.1, Class Intervals [1,3), [3,7)

From above discussion (will see: not very good)

Total Frequency = 100%, so each observation is 20%

• [1,3): 20% = Area = 2 * ht = 2 * (10% / unit)

• [3,7): each observation is 20% = Area = 4 * ht = 4 * (5% / unit), so the four observations stack to 20% / unit

[Figure: histogram on horizontal axis 1 to 7, vertical axis “% per unit” with ticks 5, 10, 15, 20]

Histograms

Note: This histogram hides structure in data:

2.3, 4.5, 4.7, 4.8, 5.1

• Quite sparse region

• Quite dense region

• Endpoints way off

General Major Challenge: Choice of Class Intervals

Histograms

Try for “better” choice:

2.3, 4.5, 4.7, 4.8, 5.1

Class Intervals [2,4), [4,5), [5,6)

Now build histogram as above (areas)

[Figure: histogram on horizontal axis 1 to 7, vertical axis “% per unit” with ticks 30, 60]

Note: much better visual impression

Histogram better reflects “structure in data”
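The area rule (height = relative frequency / width) can be checked numerically for both bin choices. The Python below is a minimal sketch; the helper name `histogram_heights` is hypothetical, and heights are reported in % per unit.

```python
def histogram_heights(data, bins):
    """Return bar heights in % per unit, so each bar's area is its rel. freq. in %."""
    n = len(data)
    heights = []
    for lo, hi in bins:
        count = sum(1 for x in data if lo <= x < hi)  # half-open bin [lo, hi)
        area = 100 * count / n  # relative frequency, in %
        heights.append(area / (hi - lo))  # height = area / width
    return heights


data = [2.3, 4.5, 4.7, 4.8, 5.1]
first_choice = [(1, 3), (3, 7)]
better_choice = [(2, 4), (4, 5), (5, 6)]

first_heights = histogram_heights(data, first_choice)
better_heights = histogram_heights(data, better_choice)
print(first_heights)
print(better_heights)
```

With the first choice the tallest bar is only twice the shortest, while the narrower bins put a much taller bar over [4,5), where three of the five observations sit.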

Histograms

General Comments:

• Total area under histogram is 100%

• So label vertical axis as “% per unit”

• Synonym for “Class Interval” is “bin”

(think of relative frequency as counting

observations that “fall into bins”)

• Choice of bins is critical

• Common Simplification: Equally spaced

• But still have choice of binwidth

(also very challenging)
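To see how much binwidth matters even with equally spaced bins, here is a small Python sketch. It reuses the five-number list from earlier; the helper `equal_width_counts`, the starting point 2, and the two binwidths are assumptions made for illustration.

```python
import math


def equal_width_counts(data, start, binwidth, n_bins):
    """Count observations in n_bins half-open bins [start + k*w, start + (k+1)*w)."""
    counts = [0] * n_bins
    for x in data:
        k = math.floor((x - start) / binwidth)
        if 0 <= k < n_bins:  # observations outside the bins are ignored
            counts[k] += 1
    return counts


data = [2.3, 4.5, 4.7, 4.8, 5.1]
# Binwidth 1: bins [2,3), [3,4), [4,5), [5,6)
narrow = equal_width_counts(data, start=2, binwidth=1, n_bins=4)
# Binwidth 2: bins [2,4), [4,6)
wide = equal_width_counts(data, start=2, binwidth=2, n_bins=2)
print(narrow, wide)
```

The narrow bins show an empty gap between the low value and the cluster, which the wide bins smooth away.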

Histograms

HW: C15 For the data:

0.8, 2.1, 2.6, 0.9, 2.2, 0.8, 2.2, 0.9

a) Make histograms using the bins:

i. [0,1), [1,2), [2,3)

ii. [0.5,1.5), [1.5,2.5), [2.5,3.5)

iii. [0,1), [1,3)

(Interesting to look at differences)

b) Why are bins [0,2), [1,3) inappropriate here?

c) Why are bins [1,2), [2,5) inappropriate here?

Histogram Real Data Example

Buffalo Snow Fall Data

• Annual totals (in inches)

• For Buffalo, N.Y.

• 63 years, ranging from ~30 to ~120

• A lot of snow, due to “lake effect”

• Any patterns in data?

Histogram Real Data Example

Buffalo Snow Fall Data

• Data Available in Class Example 6

• Left hand column of spreadsheet: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor155-2009/ClassNotes/Stor155Eg6.xls

• Now do histogram analysis

• Using Excel

Histogram Real Data Example

Buffalo Snow Fall Data – Excel Default Histo

• Data Tab

• Push Data Analysis Button

• Pulls up:

• Choose:

Histogram Real Data Example

Buffalo Snow Fall Data – Excel Default Histo

• Pulls Up:

• Link input data

• Empty for default

• Choose here

• And location

• Get Histo Plot

Histogram Real Data Example

Buffalo Snow Fall Data – Excel Default Histo

• Manually Chart Result???

• Twiddle Output (similar to above):

• Delete Series Legend

• Format Data Series – Gap Width 0

• Format Data Series – Border Color Black

• Chart Tools – Design – Choose Titled

• Type in Title

Histogram Real Data Example

Buffalo Snow Fall Data – Excel Default Histo

• Result:

• Unround numbers for bin edges

• Hard to interpret

Histogram Real Data Example

Buffalo Snow Fall Data – Excel Default Histo

• Data centered around 90

• Most data between 50 and 130

• Asymmetric distribution

Histogram Real Data Example

Buffalo Snow Fall Data – Smaller binwidth

• Chosen by me

• Binwidth = 5, << ~13 from EXCEL default

• Nicer edge numbers

• Data centered around 84 (now more precise)

• Bar graph rougher (fewer points in each bin)

• Suggests 3 main groups (called “modes” or “clusters”)

(can’t see this above: bin width is important)

Histogram Real Data Example

Buffalo Snow Fall Data – Larger binwidth

• Chosen by me

• Binwidth = 30, >> ~13 from EXCEL default

• Bar graph is “smooth”

(since many points in each bin)

• Only one mode (cluster)???

• Quite symmetric?

(different from above: bin width is important)

Histogram Real Data Example

HW:

1.28 [data in ta01_005.xls]

((c) loses bump near 50)

1.36 [data in ex01_036.xls]

((a) 4 (b) 2 (c) 1)

1.37

1.39

Research Corner

Histo Bin Width (serious issue)

Interesting Data Set: Hidalgo Stamps

• Famous among postage stamp collectors

• Printed in Mexico, 1800’s, over ~70 years

• Very different paper thicknesses…

• How many paper sources?

• Unknown, since records are lost

• Study histogram of stamp thicknesses

Research Corner

Movie over binwidth

Shows very wide range (much different visual impressions)

How many bumps?

Answer published in literature: 2, 3, 5, 7, 10

Very challenging question

Research Corner

How many bumps?

• Believe in 2?

• Believe in 3?

• Believe in 5?

• Believe in 7?

• Believe in 10?

Big Picture

• Margin of Error

• Choose Sample Size

Need better prob tools

Start with visualizing probability distributions,

Next exploit constant shape property of Binom’l

Centerpoint feels p

Spread feels n

Now quantify these ideas, to put them to work

Notions of Center

Will later study “notions of spread”

Notions of Center

Textbook: Sections 4.4 and 1.2

Recall parallel development:

(a) Probability Distributions

(b) Lists of Numbers

Study (b) 1st, since easier

Notions of Center

(b) Lists of Numbers

“Average” or “Mean” of $x_1, x_2, \ldots, x_n$

Mean $= \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Here $\bar{x}$ is the common notation, and (as before) the Greek sigma $\sum_{i=1}^{n}$ means “sum over $i = 1, \ldots, n$”
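The mean of a list is short enough to write out directly. The Python below is a minimal sketch that reuses the five-number list from the histogram examples; the function name `mean` is chosen here for illustration.

```python
def mean(xs):
    """Ordinary average: add up the values and divide by how many there are."""
    return sum(xs) / len(xs)


data = [2.3, 4.5, 4.7, 4.8, 5.1]
x_bar = mean(data)
print(round(x_bar, 2))
```

For this list the values sum to 21.4, so the mean is about 4.28.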

Notions of Center

HW:

C16: for the data of 1.57, find the mean using

the Excel function AVERAGE (10.03)

Notions of Center

Generalization of Mean:

“Weighted Average”

Idea: allow non-equal weights on the $x_i$s:

$\sum_{i=1}^{n} w_i x_i$

Where $w_i \ge 0$, $\sum_{i} w_i = 1$

E.g.: ordinary mean has each $w_i = \frac{1}{n}$ (constant weights)

Intuition: Corresponds to finding balance point of weights on number line

[Figure: number line with weights at $x_1$, $x_2$, $x_3$]

Notions of Center

HW: C17: Calculate (and think about as “balance point”) weighted average of 1, 2, 3, 10 for the weights:

a. ¼, ¼, ¼, ¼ (ordinary avg.) (4)

b. 0.1, 0.1, 0.1, 0.7 (more on 10) (7.6)

c. 0.3, 0.3, 0.3, 0.1 (less on 10) (2.8)

d. 1/3, 1/3, 1/3, 0 (none on 10) (2)

e. 0, 1, 0, 0 (all on 2) (2)
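The weighted average, and the answers given in parentheses above, can be checked with a few lines of Python. This is a sketch only: the function name `weighted_average` and the tolerance used to test that the weights sum to 1 are assumptions made here.

```python
def weighted_average(xs, ws):
    """Weighted average sum(w_i * x_i), for weights w_i >= 0 that sum to 1."""
    if any(w < 0 for w in ws) or abs(sum(ws) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * x for w, x in zip(ws, xs))


xs = [1, 2, 3, 10]
weight_sets = {
    "a": [0.25, 0.25, 0.25, 0.25],  # ordinary average
    "b": [0.1, 0.1, 0.1, 0.7],      # more weight on 10
    "c": [0.3, 0.3, 0.3, 0.1],      # less weight on 10
    "d": [1 / 3, 1 / 3, 1 / 3, 0],  # none on 10
    "e": [0, 1, 0, 0],              # all on 2
}
results = {k: weighted_average(xs, ws) for k, ws in weight_sets.items()}
for k, v in results.items():
    print(k, round(v, 2))
```

Moving weight toward 10 pulls the balance point to the right, and moving it away pulls the balance point back toward 1, 2, 3.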
