
Random Sampling


Page 1: Random Sampling

Random Sampling

In the real world, most random variables (r.v.'s) that arise in practical applications are continuous and have no general formula for the p.d.f. fX(x) or the c.d.f. FX(x).

We may approximate the density function by taking a random sample with a large enough sample size, n, and plotting the relative frequencies within the sample.

Page 2: Random Sampling

Random Sampling

Examples:

• Suppose you wanted to know more about the GPAs of students enrolled at the U of A. Rather than look up every individual student, you can take a small sample of randomly selected students and use their GPAs to project what the GPAs of the entire student body would be.

• Taking a poll of registered voters for the presidential election.

Page 3: Random Sampling

Random Sampling

The whole idea behind random sampling is to let a part represent the whole.

Page 4: Random Sampling

Random Sampling

• Estimate distribution and properties of a random variable by taking a random sample

• We assume that the observations are independent of one another

• Collect a sample with n items (a random sample of size n)

Page 5: Random Sampling

Random Sampling

• If we take a large enough sample, the relative frequency histogram shows us, approximately, the distribution of the data

• This means we approximately know the p.m.f. graph for a finite r.v. or the p.d.f. graph for a continuous r.v.

• In addition, the mean of a large sample should be close to the mean of the variable, and the sample max and min give a rough idea of its range

Page 6: Random Sampling

Random Sampling

• Suppose that X is the number of assembly line stoppages that occur during an 8-hour shift in our manufacturing plant.

• We could obtain a random sample of size 10 by watching the line for 10 different shifts and recording the number of stoppages during each eight hour shift.

Page 7: Random Sampling

Random Sampling

• The table below shows the number of work stoppages for various shifts:

Shift observed         1    2    3    4    5    6    7    8    9   10
Number of stoppages    2   11    6    8    6    5   10    4    8    3

• A histogram plot of these stoppages gives us a pictorial representation of how this chaotic data behaves.

Page 8: Random Sampling

Random Sampling

• The relative frequency histogram plot is shown below:

[Figure: relative frequency histogram titled "Sample Data". Horizontal axis: Stoppages (0 to 11). Vertical axis: Relative Frequency (0 to 0.25).]

Page 9: Random Sampling

Random Sampling

• Our histogram plot of the relative frequencies for the work stoppage example can be used to approximate the p.m.f. for this situation.

• Of course, if we increase the number of shifts observed, our approximation of the p.m.f. will generally be more accurate.
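As a rough illustration of how those relative frequencies come about, the sketch below tallies the ten observations from the shift table. The variable names are illustrative only and are not part of the original slides or the Excel workbook.

```python
from collections import Counter

# Number of stoppages in each of the 10 observed shifts (from the table)
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

n = len(stoppages)
counts = Counter(stoppages)

# Relative frequency of a value = (times it was observed) / (sample size)
rel_freq = {x: counts.get(x, 0) / n for x in range(0, 12)}

for x, p in rel_freq.items():
    print(f"{x:2d}  {p:.2f}")
```

Each bar of the relative frequency histogram has the height printed for its value, and the heights add up to 1.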

Page 10: Random Sampling

Random Sampling

• From our example, we can also look at the average number of work stoppages:

$$E(X) \approx \bar{x} = \frac{1}{10}(2 + 11 + 6 + 8 + 6 + 5 + 10 + 4 + 8 + 3) = \frac{63}{10} = 6.3$$

• The average we just found is sometimes called the sample mean and can be found using:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
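The sample mean of the ten observed shifts can be checked with a minimal sketch like the one below. The function name `sample_mean` is an illustrative choice, not something defined in the slides.

```python
# Number of stoppages in each of the 10 observed shifts (from the table)
stoppages = [2, 11, 6, 8, 6, 5, 10, 4, 8, 3]

def sample_mean(xs):
    """Return (1/n) times the sum of the n observations."""
    return sum(xs) / len(xs)

x_bar = sample_mean(stoppages)
print(x_bar)
```

The observations sum to 63, so the result is 63/10 = 6.3.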

Page 11: Random Sampling

Random Sampling

• Suppose that the assembly line discussed in Example 1 runs 24 hours per day, with workers in three shifts. The sheet Numbers in the Excel file Stoppages.xls contains records of the number of stoppages per shift for nine months (819 shifts).

Page 12: Random Sampling

Random Sampling

• Computations in that sheet show that the number of stoppages in the sample ranged from 0 to 14, with a mean of 5.78.

• The sample in Stoppages.xls is much larger than the one of size 10 that we considered in the previous example.

• Hence, we would replace the earlier estimate of 6.3 for E(X) with the new estimate of 5.78.

Page 13: Random Sampling

Random Sampling

• A histogram plot of the relative frequencies also gives us a good estimate of the p.m.f.

[Figure: relative frequency histogram titled "Sample Data". Horizontal axis: Stoppages (0 to 14). Vertical axis: Relative Frequency (0.000 to 0.200).]

Page 14: Random Sampling

Random Sampling

• We can also use a large sample to approximate the p.d.f. for a continuous random variable.

• The plant manager wants to better understand the delays caused by stoppages of the assembly line.

• She is specifically interested in how long they last.

Page 15: Random Sampling

Random Sampling

• Let T be the length of time, in minutes, that a randomly selected stoppage will last.

• The duration of each of the 4,734 stoppages that occurred during the 819 shifts was recorded.

• This provides a random sample of observations of the continuous random variable T. The times are shown in the sheet Times in the Excel file Stoppages.xls.

Page 16: Random Sampling

Random Sampling

• The histogram of times is converted to relative frequencies. We would like to treat this as a p.d.f.

• This means the total area must be 1!

• To do this the area of each rectangle of our histogram must equal the relative frequency.

Page 17: Random Sampling

Random Sampling

• Because we have already made our bins of width 2, we must adjust the heights of the bars so that the area of each rectangle equals the relative frequency.

• This is done by taking the relative frequencies and dividing by the bin width.

• Notice, for example, that the bar whose bin label is 7 has a height of 0.07. The area of this bar is 2 × 0.07 = 0.14, which is the relative frequency for this bin.
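The height adjustment can be sketched in a few lines. This is only an illustration under stated assumptions: the durations below are made-up values, not the data in Stoppages.xls, and the helper `density_heights` is a hypothetical name.

```python
def density_heights(data, bin_width, start=0.0):
    """Map each bin's left edge to its bar height.

    height = (relative frequency of the bin) / (bin width),
    so that height * bin_width equals the relative frequency.
    """
    n = len(data)
    counts = {}
    for t in data:
        left = start + bin_width * int((t - start) // bin_width)
        counts[left] = counts.get(left, 0) + 1
    return {left: (c / n) / bin_width for left, c in sorted(counts.items())}

# Hypothetical stoppage durations in minutes (NOT the Stoppages.xls data)
times = [0.5, 1.2, 3.1, 3.9, 4.4, 6.5, 7.0, 7.8, 9.3, 12.6]

BIN_WIDTH = 2
heights = density_heights(times, BIN_WIDTH)

# Total area of all rectangles: sum of height * width
total_area = sum(h * BIN_WIDTH for h in heights.values())
print(heights)
print(total_area)
```

In this made-up sample, three of the ten durations fall in the bin from 6 to 8, so its relative frequency is 0.3 and its adjusted height is 0.15.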

Page 18: Random Sampling

Random Sampling

• An example of a histogram from the Excel file Stoppages.xls is shown below

[Figure: histogram titled "TIMES". Horizontal axis: t (bin labels 1 to 79). Vertical axis: approx. fT(t) (0.00 to 0.08).]

Page 19: Random Sampling

Random Sampling

• We can create an approximate p.d.f. by connecting the midpoints of the bins at the top of each bar

• Although the actual p.d.f. is smooth, this graph gives a good representation

[Figure: "TIMES" histogram with the midpoints of the bar tops connected. Horizontal axis: t (bin labels 1 to 79). Vertical axis: approx. fT(t) (0.00 to 0.08).]

Page 20: Random Sampling

Random Sampling

• Simulations used as a predictor

• Can give an estimate of what might happen for many trials

• The estimate is generally accurate when the number of trials is large

Page 21: Random Sampling

Random Sampling

• Simulation for finite choices

• Use RANDBETWEEN function in Excel

• Chooses an integer between two values

• Ex. =RANDBETWEEN(1,5) would return a value equal to 1, 2, 3, 4, or 5
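For readers working outside Excel, a comparable draw can be sketched in Python. This is an analogy rather than the slides' method: `random.randint(a, b)` from the standard library also includes both endpoints, which is the behavior described for RANDBETWEEN.

```python
import random

random.seed(0)  # fixed seed only so that the sketch is repeatable

# random.randint(1, 5) returns 1, 2, 3, 4, or 5, endpoints included
draws = [random.randint(1, 5) for _ in range(1000)]

print(sorted(set(draws)))
```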

Page 22: Random Sampling

Random Sampling

• Sample of RANDBETWEEN function

Page 23: Random Sampling

Random Sampling

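• The VLOOKUP function will find information in a table. Its arguments, in order, are:

1. Value to look for in the leftmost column of the table

2. Location of the table (database)

3. Number of the column to return (1, 2, 3, …)

4. Range lookup (usually left blank)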
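To make the lookup concrete, here is a minimal Python sketch of what VLOOKUP does when the last argument is left blank, which is Excel's approximate-match mode on a table sorted by its leftmost column. The function `vlookup` and the small table are hypothetical stand-ins for illustration, not part of the project workbook.

```python
from bisect import bisect_right

def vlookup(value, table, col):
    """Approximate-match lookup on rows sorted ascending by first entry.

    Returns the entry in column `col` (counted from 1) of the last row
    whose first entry is less than or equal to `value`.
    """
    keys = [row[0] for row in table]
    i = bisect_right(keys, value) - 1
    if i < 0:
        # Excel reports #N/A when the value is below every key
        raise LookupError("value is smaller than every key in the table")
    return table[i][col - 1]

# Hypothetical two-column table: (record number, service time)
table = [(1, 0.9), (2, 1.4), (3, 0.7)]

print(vlookup(2, table, 2))
print(vlookup(2.5, table, 2))
```

Both calls return the service time stored in the row numbered 2, because 2.5 falls back to the largest key not exceeding it.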

Page 24: Random Sampling

Random Sampling

• Focus on the Project:

• In the Excel file SampleData.xls on my website, we find that there are 615 arrival values for the 9 a.m. hour. This is a random sample of the random variable A.

Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
615               0.00           0.48        2.75           2.75

Page 25: Random Sampling

Random Sampling

• Focus on the Project:

• In order to graph the p.m.f. and p.d.f. the heights of the bars must be adjusted

• Take relative frequency (percentage) and divide by the bin width to give the new height

• This ensures the area under the graph will equal 1
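The reason the area comes out to 1 can be written in one line. The symbols here are introduced only for this sketch: w is the common bin width, r_k is the relative frequency of bin k, and h_k = r_k / w is the adjusted height of its bar.

$$\sum_k h_k \, w = \sum_k \frac{r_k}{w} \, w = \sum_k r_k = 1$$

The last equality holds because the relative frequencies of all the bins always add up to 1.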

Page 26: Random Sampling

Random Sampling

• Focus on the Project:

• p.m.f. approximating p.d.f. for variable A

[Figure: histogram titled "PMF approximating PDF". Horizontal axis: Arrival Times (9 a.m.), 0 to 2.8. Vertical axis: Height, 0 to 2.]

Page 27: Random Sampling

Random Sampling

• Focus on the Project:

• Approximate p.d.f. and actual p.d.f. for variable A

[Figure: plot titled "Approximate PDF and Actual PDF". Horizontal axis: Arrival Times (9 a.m.), 0 to 3. Vertical axis: Height, 0 to 2.5.]

Page 28: Random Sampling

Random Sampling

• Focus on the Project:

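• Note: using the sample mean 0.48 as the mean of A, we get the following for 9 a.m.:

$$f_A(a) = \begin{cases} 0 & \text{for } a < 0 \\ \dfrac{1}{0.48}\, e^{-a/0.48} & \text{for } a \ge 0 \end{cases}$$

$$F_A(a) = \begin{cases} 0 & \text{for } a < 0 \\ 1 - e^{-a/0.48} & \text{for } a \ge 0 \end{cases}$$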

Page 29: Random Sampling

Random Sampling

• Focus on the Project:

• We find that there are 130 arrival values for the 9 p.m. hour. This is a random sample of the random variable B.

Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
130               0.00           2.20        9.22           9.22

Page 30: Random Sampling

Random Sampling

• Focus on the Project:

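• Note: using the sample mean 2.20 as the mean of B, we get the following for 9 p.m.:

$$f_B(b) = \begin{cases} 0 & \text{for } b < 0 \\ \dfrac{1}{2.20}\, e^{-b/2.20} & \text{for } b \ge 0 \end{cases}$$

$$F_B(b) = \begin{cases} 0 & \text{for } b < 0 \\ 1 - e^{-b/2.20} & \text{for } b \ge 0 \end{cases}$$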
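The exponential p.d.f. and c.d.f. for the two arrival variables can be evaluated with a short sketch such as the following. It assumes the exponential form with the sample means 0.48 (9 a.m.) and 2.20 (9 p.m.) from the summary tables; the function names `exp_pdf` and `exp_cdf` are illustrative.

```python
import math

def exp_pdf(x, mean):
    """Exponential p.d.f.: (1/mean) * e^(-x/mean) for x >= 0, else 0."""
    return math.exp(-x / mean) / mean if x >= 0 else 0.0

def exp_cdf(x, mean):
    """Exponential c.d.f.: 1 - e^(-x/mean) for x >= 0, else 0."""
    return 1.0 - math.exp(-x / mean) if x >= 0 else 0.0

MEAN_A = 0.48  # sample mean for the 9 a.m. hour
MEAN_B = 2.20  # sample mean for the 9 p.m. hour

# Probability that the value is at most 1, under each model
print(exp_cdf(1.0, MEAN_A))
print(exp_cdf(1.0, MEAN_B))
```

Note that the p.d.f. for A starts at a height of 1/0.48, a little above 2, at a = 0.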

Page 31: Random Sampling

Random Sampling

• Focus on the Project:

• For the service times, we get

Number of Times   Minimum Time   Mean Time   Maximum Time   Range of Times
8356              0.50           1.17        7.72           7.22

Page 32: Random Sampling

Random Sampling

• Focus on the Project:

• For the service times, we get the following graph

[Figure: histogram titled "Service Time PMF approximating PDF". Horizontal axis: Service Times, 0 to 8. Vertical axis: Height, 0 to 1.2.]

Page 33: Random Sampling

Random Sampling

• Focus on the Project:

• You may not change the name of the Excel file Queue Focus.xls

• You may not delete or insert any rows, columns, or cells

Page 34: Random Sampling

Random Sampling

• Focus on the Project:

• Copy and paste your service time data from your team data into the Excel file Queue Focus.xls on the sheet Data in cells H45:H???

• This will end with your last service time

• You should also extend the numbering in column G so that every service time you pasted has a number next to it

Page 35: Random Sampling

Random Sampling

• Focus on the Project:

• Note: The sample data contains 8356 service time records

• This is the formula that exists in cell E83 (and continues to cell E282) in Queue Focus.xls in the sheet Random Sampling

=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,7634),Data!$G$45:Data!$H$7678,2),"")

Page 36: Random Sampling

Random Sampling

• Focus on the Project:

• The formula should be modified as follows:

=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,7634),Data!$G$45:Data!$H$7678,2),"")

=IF(ISNUMBER(D83),VLOOKUP(RANDBETWEEN(1,your last service time),Data!$G$45:Data!$H$your last service time cell,2),"")
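The idea behind this formula, picking a random record number and looking up its service time, can be sketched in Python as follows. This is an analogy under stated assumptions, not the workbook's implementation: the list of service times is made up, and `resample` is a hypothetical helper.

```python
import random

def resample(service_times, k, rng):
    """Draw k service times at random, with replacement.

    rng.randint(1, n) plays the role of RANDBETWEEN(1, n); indexing the
    list plays the role of VLOOKUP into the numbered table in columns G:H.
    """
    n = len(service_times)
    return [service_times[rng.randint(1, n) - 1] for _ in range(k)]

# Hypothetical service times in minutes (NOT the project data)
service_times = [0.50, 0.85, 1.10, 1.17, 1.60, 2.30, 7.72]

rng = random.Random(42)  # seeded only so that the sketch is repeatable
simulated = resample(service_times, 200, rng)

print(len(simulated))
print(sum(simulated) / len(simulated))
```

The draw count of 200 mirrors the 200 cells from E83 to E282 that hold the formula.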

Page 37: Random Sampling

Random Sampling

• Focus on the Project:

• Change the value in cell D80 (number of customers) of Queue Focus.xls in the sheet Random Sampling to the number in your team’s Excel file in cell F18 (number of customers simulation must accommodate)