
8/12/2019 288-L2a

http://slidepdf.com/reader/full/288-l2a 1/3


ME 288 Data Analysis Lab:

Histogram and Probability Density Function (Pdf)

A good way to understand a Pdf is to start with a histogram. A histogram is the preferred graphical way of presenting data that have been collected in categories (bins). Data samples that are grouped into categories or bins cannot be plotted in a scatter plot.

Histograms are similar to bar charts except that:

(i)  A histogram is drawn to represent the proportion (fraction) in each category (bin).

(ii)  Bar width represents the range of the category (bin).

(iii)  Bar height is given by Height = Fraction/width.

(iv)  Bar area (not height) represents the proportion.

(v)  The bars are adjacent (no gaps) because the abscissa is a continuous variable.

(vi)  The proportion in each category is also the probability of belonging to that category. In other words, the next data point is most likely to fall in the category with the highest proportion (area).

Histograms and bar charts look very different if the category widths are not the same. Please look at the example on the first page of: http://en.wikipedia.org/wiki/Histogram.

A bar chart, by contrast, suits categories that are not ranges of a continuous variable: for example, we would make a bar chart with two bars to show the number of men and women in a population. Let’s do a simple example for a histogram:

The NEW ENERGY COMPANY makes pressurized heat pipes to sell commercially. The product performs well, but there is concern that the heat pipes may burst. A co-op student is hired to test the burst pressure of the manufactured heat pipes. He runs tests on 20 samples and gets the following table to make a histogram:

| Category/bin: pressure range (atm) | Midpoint (atm) | # heat pipes burst | Proportion = fraction bursting | Histogram height = fraction/width (1/atm) |
|---|---|---|---|---|
| 3.5 – 4.5 | 4 | 6 | 6/20 = 0.3 = 30% | 0.3 |
| 4.5 – 5.5 | 5 | 10 | 10/20 = 0.5 = 50% | 0.5 |
| 5.5 – 6.5 | 6 | 4 | 4/20 = 0.2 = 20% | 0.2 |
| 3.5 – 6.5 | Total | 20 | 20/20 = 1 = 100% | 1 |
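The columns of the table follow directly from the bin edges and the burst counts. A minimal sketch of that arithmetic is given here; the function name `histogram_rows` is illustrative and not part of the handout.

```python
# Bin edges (atm) and burst counts taken from the table above.
bins = [(3.5, 4.5), (4.5, 5.5), (5.5, 6.5)]
counts = [6, 10, 4]


def histogram_rows(bins, counts):
    """Return (midpoint, fraction, height) for each bin."""
    total = sum(counts)
    rows = []
    for (lo, hi), n in zip(bins, counts):
        width = hi - lo
        fraction = n / total        # proportion of samples in the bin
        height = fraction / width   # bar height, so that area = fraction
        rows.append(((lo + hi) / 2, fraction, height))
    return rows


rows = histogram_rows(bins, counts)
for mid, fraction, height in rows:
    print(f"midpoint {mid:.1f} atm: fraction {fraction:.2f}, "
          f"height {height:.2f} per atm")
```

Because every bin here is 1 atm wide, the height and the fraction are numerically equal; with unequal widths they would differ.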

In this simple case the category widths are all the same (1 atm), so each bar height equals its proportion. The proportion for each category is the same as the probability that the next data point falls in that pressure range.

We can get more points if we make the bins (categories) smaller, e.g. 3 – 3.1, 3.1 – 3.2, etc. When we do that, the histogram shape approaches the probability density function (Pdf). Therefore, a histogram is an approximation of a Pdf.
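The effect of shrinking the bins can be tried numerically. The sketch below does not use the lab's 20 samples: it assumes synthetic burst pressures drawn from a normal distribution (mean 5 atm, standard deviation 0.7 atm), chosen only for illustration, and compares the bar heights with that assumed density for several bin widths. The helper names are hypothetical.

```python
import math
import numpy as np

# ASSUMPTION: synthetic burst pressures from a normal distribution
# (mean 5 atm, standard deviation 0.7 atm). These are not the lab's data.
MEAN, STD = 5.0, 0.7
rng = np.random.default_rng(seed=0)
samples = rng.normal(MEAN, STD, size=100_000)


def density_histogram(data, width, lo=3.0, hi=7.0):
    """Return bin edges and bar heights (fraction / width) on [lo, hi]."""
    n_bins = round((hi - lo) / width)
    edges = np.linspace(lo, hi, n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    fractions = counts / data.size
    heights = fractions / np.diff(edges)
    return edges, heights


def normal_pdf(x):
    """Density of the assumed normal distribution."""
    z = (x - MEAN) / STD
    return np.exp(-0.5 * z**2) / (STD * math.sqrt(2.0 * math.pi))


results = {}
for width in (1.0, 0.5, 0.1):
    edges, heights = density_histogram(samples, width)
    mids = 0.5 * (edges[:-1] + edges[1:])
    area = float(np.sum(heights * np.diff(edges)))
    max_gap = float(np.max(np.abs(heights - normal_pdf(mids))))
    results[width] = (len(heights), area, max_gap)
    print(f"width {width}: {len(heights)} bars, "
          f"area {area:.4f}, max gap to pdf {max_gap:.4f}")
```

The total bar area stays the same for every bin width, since it is just the fraction of samples between 3 and 7 atm; only the shape becomes finer. Narrow bins need many samples, which is why this sketch uses far more than 20.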


We can draw a Pdf curve to approximate this distribution by using the 3 points (bin midpoint, histogram height) from the above table. Using EXCEL, we can get the curve going through these 3 points:

\[ pdf(x) = -25x^2 + 245x - 550 \] in %, or

\[ pdf(x) = -0.25x^2 + 2.45x - 5.50 \] in fractions.

The plot is shown below:
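The handout obtains the curve with EXCEL. As an equivalent sketch, assuming NumPy is available, a second-degree polynomial through the three (midpoint, height) points can be recovered with `numpy.polyfit`; with three points and three coefficients the fit passes through the points exactly.

```python
import numpy as np

# (midpoint, histogram height in %) for the three bins in the table.
mids = np.array([4.0, 5.0, 6.0])
heights_pct = np.array([30.0, 50.0, 20.0])

# Degree-2 fit; coefficients are returned highest power first.
coeffs = np.polyfit(mids, heights_pct, deg=2)
a, b, c = coeffs
print(f"pdf(x) = {a:.2f} x^2 + {b:.2f} x + {c:.2f}  (in %)")

# Check that the curve reproduces the three table values.
fitted = np.polyval(coeffs, mids)
print(np.round(fitted, 6))
```

Dividing the coefficients by 100 gives the same curve in fractions.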

Definition: The probability density function is a curve; the area under the curve over an interval gives the probability that a data point lies in that interval. It can be obtained by smoothing a histogram.

If we use P as the probability, then, mathematically:

\[ pdf(x) = \frac{dP}{dx} \approx \frac{\Delta P}{\Delta x} \]

Therefore the probability for the interval of x from a to b is:

\[ P = \int_a^b pdf(x)\,dx \] = area under the curve (just like the histogram!)

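What is the probability that the next data point will belong to the range 3.5 to 6.5? The answer is given by:

\[ P_{3.5-6.5} = \int_{3.5}^{6.5} \left(-25x^2 + 245x - 550\right) dx = 93.75\% \approx 94\% \approx 100\% \]

The result is close to, but not exactly, 100% because the fitted quadratic only passes through the three midpoints; it is not normalized to enclose unit area.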
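For readers who want to check the value by hand, the integral can be worked through with the antiderivative of the quadratic. The symbol F is introduced here only for this check and does not appear in the handout.

```latex
\begin{align*}
F(x) &= -\tfrac{25}{3}x^{3} + \tfrac{245}{2}x^{2} - 550x \\
F(6.5) &= -2288.54 + 5175.63 - 3575.00 = -687.92 \\
F(3.5) &= -357.29 + 1500.63 - 1925.00 = -781.67 \\
P_{3.5-6.5} &= F(6.5) - F(3.5) = 93.75\%
\end{align*}
```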


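We can do this calculation for other intervals and compare with the histogram. We expect the Pdf to approximate the histogram:

3.5 – 4.5: \[ P_{3.5-4.5} = \int_{3.5}^{4.5} \left(-25x^2 + 245x - 550\right) dx = 27.9\% \approx 30\% \]

4.5 – 5.5: \[ P_{4.5-5.5} = \int_{4.5}^{5.5} \left(-25x^2 + 245x - 550\right) dx = 47.9\% \approx 50\% \]

5.5 – 6.5: \[ P_{5.5-6.5} = \int_{5.5}^{6.5} \left(-25x^2 + 245x - 550\right) dx = 17.9\% \approx 20\% \]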
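The three interval probabilities can also be checked numerically. The sketch below assumes NumPy and uses `numpy.poly1d` with its `integ` method to form the antiderivative of the fitted quadratic; the variable names are illustrative.

```python
import numpy as np

# Fitted quadratic from the handout, in % per atm.
pdf_pct = np.poly1d([-25.0, 245.0, -550.0])
antiderivative = pdf_pct.integ()  # integration constant is 0

bins = [(3.5, 4.5), (4.5, 5.5), (5.5, 6.5)]
histogram_pct = [30.0, 50.0, 20.0]  # proportions from the table

# Area under the curve over each bin: F(hi) - F(lo).
areas = [float(antiderivative(hi) - antiderivative(lo)) for lo, hi in bins]

for (lo, hi), area, hist in zip(bins, areas, histogram_pct):
    print(f"{lo} to {hi} atm: pdf area {area:.1f}%, histogram {hist:.0f}%")

total = sum(areas)
print(f"total over 3.5 to 6.5 atm: {total:.2f}%")
```

Each area falls about 2 percentage points below the corresponding histogram proportion, which accounts for the total of 93.75% rather than 100%.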