Chapter 2 – Data Collection and Presentation

In Chapter 1, we briefly discussed the importance of samples. When we select a sample from a population, the sample must be representative of the population.

Let’s consider an example:

Sampling Designs

Methods by which a representative sample can be chosen from a population.

Four sampling designs in common use:

1. Simple random sampling
2. Systematic sampling
3. Stratified sampling
4. Cluster sampling

Simple Random Sampling

Putting all the students' names together, mixing them thoroughly, and then drawing names one at a time is an example of simple random sampling: every name has the same chance of being drawn.
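The name-drawing idea can be sketched in a few lines of Python. This is only an illustration: the roster, the seed, and the function name `simple_random_sample` are assumptions, not part of the chapter. `random.sample` draws without replacement, which matches drawing names one at a time.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct units, each equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(list(population), sample_size)

# Hypothetical roster, used only for illustration.
students = ["Ana", "Ben", "Chen", "Dara", "Eli", "Fay", "Gus", "Hana"]
chosen = simple_random_sample(students, 3, seed=1)
print(chosen)
```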

Systematic Sampling

In this sampling design, every kth unit (or item) is selected from a population until the sample size is reached.

k = (size of population) / (size of sample)
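A minimal Python sketch of systematic sampling follows. Two choices here are assumptions rather than anything stated in the chapter: the interval k is rounded down to a whole number, and the first unit is picked at random from the first k units. The population of numbered units is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th unit, with k = population size // sample size."""
    units = list(population)
    k = len(units) // sample_size              # sampling interval, rounded down
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)   # assumed random starting point
    return [units[start + i * k] for i in range(sample_size)]

population = list(range(1, 101))   # hypothetical units numbered 1..100
sample = systematic_sample(population, 10, seed=0)
print(sample)
```

With 100 units and a sample of 10, k is 10, so the selected units are always 10 apart.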

Stratified Sampling

In this sampling design, the entire population is divided into several groups, called strata, and a subsample is selected from each group. All subsamples are then combined to form the sample. This sampling design is used when a population is not homogeneous.

Stratified sampling could be either proportionate or disproportionate, depending on the number of units selected from each group.
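The proportionate case can be sketched as below, assuming each stratum contributes units in proportion to its share of the population. The strata, their sizes, and the function name are hypothetical. The sizes were chosen so the shares come out as whole numbers; with other sizes, rounding can make the total differ slightly from the requested sample size.

```python
import random

def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Draw from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = []
    for units in strata.values():
        n_h = round(sample_size * len(units) / total)  # subsample size for this stratum
        sample.extend(rng.sample(units, n_h))           # simple random draw within the stratum
    return sample

# Hypothetical strata of sizes 60, 30, and 10.
strata = {
    "first_year": [f"F{i}" for i in range(60)],
    "second_year": [f"S{i}" for i in range(30)],
    "third_year": [f"T{i}" for i in range(10)],
}
sample = proportionate_stratified_sample(strata, 10, seed=2)
print(sample)
```

A disproportionate design would replace the `n_h` line with subsample sizes chosen by the investigator.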

Cluster Sampling

This sampling design involves selecting a few groups, called clusters, at random from a population, and then selecting units from each cluster. Cluster sampling is used when a population is large, fairly homogeneous, and scattered over a large geographical area.
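A rough Python sketch of the two stages of cluster sampling: first pick a few clusters at random, then pick units within each chosen cluster. The cluster names, sizes, and the function name are hypothetical, and drawing the units within a cluster by simple random sampling is an assumption.

```python
import random

def cluster_sample(clusters, n_clusters, units_per_cluster, seed=None):
    """Pick clusters at random, then pick units at random inside each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: choose clusters
    return {name: rng.sample(clusters[name], units_per_cluster)  # stage 2: choose units
            for name in chosen}

# Hypothetical population: five cities with 20 households each.
clusters = {
    f"city_{c}": [f"city_{c}_household_{i}" for i in range(20)]
    for c in "ABCDE"
}
sample = cluster_sample(clusters, n_clusters=2, units_per_cluster=5, seed=3)
print(sample)
```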

Data Organization

The process of selecting a sample from a population amounts to data collection. Once the data has been collected, it must be organized to make it meaningful. Unorganized data does not convey any meaningful information.

Raw Data

A set of unorganized data.

Data organization requires two major steps:

1. Forming an array
2. Creating a frequency distribution table

Array and Frequency Distribution

Array

If a set of data is arranged in either ascending or descending order, an array is formed.

From the array, one can get some useful information, such as the lowest and the highest data value.
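Forming an array amounts to sorting the raw data. A short Python illustration, using made-up values:

```python
# Hypothetical raw data, used only for illustration.
raw_data = [23, 17, 35, 12, 29, 17, 41, 8]

ascending = sorted(raw_data)                 # array in ascending order
descending = sorted(raw_data, reverse=True)  # array in descending order

# Once sorted, the lowest and highest values sit at the two ends.
lowest, highest = ascending[0], ascending[-1]
print(ascending)
print(lowest, highest)
```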

Frequency Distribution

A table that arranges data into several classes.

All classes have:
• A lower limit
• An upper limit

Two questions:

1. How many classes to select?

2. What are the class limits?

Number of Classes

Generally, the number of classes should be no fewer than six and no more than 20. A simple formula can be used to find the total number of classes:

THE TOTAL NUMBER OF CLASSES IS THE SMALLEST k SUCH THAT 2^k IS AT LEAST EQUAL TO THE TOTAL NUMBER OF OBSERVATIONS IN THE DATA SET
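A minimal sketch of this rule in Python, taking it to mean 2 raised to the power k and taking k to be the smallest value that satisfies it. The function name is hypothetical, and the sketch does not enforce the six-to-20 guideline.

```python
def number_of_classes(n_observations):
    """Return the smallest k such that 2**k >= n_observations."""
    if n_observations < 1:
        raise ValueError("need at least one observation")
    k = 0
    while 2 ** k < n_observations:
        k += 1
    return k

for n in (50, 64, 65):
    print(n, number_of_classes(n))
```

For 50 observations the rule gives k = 6, because 2^5 = 32 is less than 50 while 2^6 = 64 is not.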

Class Limits

Once we know the number of classes, we can find the class limits (lower and upper limits) of the classes.

• Certain guidelines should be followed:

1. If the data values are integers, the lower limit of the first class should be 0.5 less than the lowest data value.

2. The midpoint of the class should be an integer.

• For other classes, follow the guidelines below:

1. The lower limit is the same as the upper limit of the preceding class.
2. The interval length is the same for all classes.
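The guidelines above can be sketched in Python for integer data. The chapter does not say how to pick the interval length, so the choice below is an assumption: the smallest whole-number length that lets k classes cover the data, bumped to an odd number so that every class midpoint is an integer. The data values and the function name are hypothetical.

```python
import math

def class_limits(data, k):
    """Build k equal-length classes starting 0.5 below the lowest value."""
    lowest, highest = min(data), max(data)
    # Assumed rule: smallest integer length that covers lowest-0.5 .. highest+0.5.
    width = math.ceil((highest - lowest + 1) / k)
    if width % 2 == 0:
        width += 1            # an odd length keeps the midpoints whole numbers
    lower = lowest - 0.5      # first lower limit: 0.5 below the lowest value
    limits = []
    for _ in range(k):
        limits.append((lower, lower + width))
        lower += width        # next lower limit = previous upper limit
    return limits

data = [23, 17, 35, 12, 29, 17, 41, 8]   # hypothetical integer data
limits = class_limits(data, 3)
print(limits)
```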

Frequency Distribution Table

Look at Table 2-2 on page 21.

Relative Frequency Distribution

• A frequency distribution can be converted into a relative frequency distribution by dividing each class frequency by the total number of observations. Look at Table 2-3 on page 22.

• The relative cumulative frequency column is obtained by cumulatively adding the relative frequencies.
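As an illustration, the frequency, relative frequency, and relative cumulative frequency columns can be computed as follows. The data and class limits are hypothetical and are not those of Tables 2-2 and 2-3. The sketch divides the running count by the total, which gives the same result as adding the relative frequencies one after another.

```python
def frequency_table(data, limits):
    """Count the values in each class and derive the relative columns."""
    n = len(data)
    rows = []
    cumulative = 0
    for lower, upper in limits:
        freq = sum(1 for x in data if lower < x < upper)  # limits end in .5, so no ties
        cumulative += freq
        rows.append({
            "class": (lower, upper),
            "frequency": freq,
            "relative": freq / n,
            "cumulative_relative": cumulative / n,
        })
    return rows

data = [23, 17, 35, 12, 29, 17, 41, 8]                  # hypothetical data
limits = [(7.5, 20.5), (20.5, 33.5), (33.5, 46.5)]      # hypothetical class limits
table = frequency_table(data, limits)
for row in table:
    print(row)
```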

Data Presentation

• Data can be presented in several ways:

Histogram
Relative frequency histogram
Polygon
Ogive

• Histogram

A type of bar chart in which class limits are shown on the x-axis and frequencies on the y-axis. See Figure 2-1 on page 25.

• Relative Frequency Histogram

If relative frequencies are shown on the y-axis, the histogram is called a relative frequency histogram. See Figure 2-2 on page 25.

• The Polygon

If the midpoints of the tops of the bars for all classes of a histogram are connected with straight lines, a frequency polygon is formed. Figure 2-3 (page 26) is a frequency polygon. A relative frequency polygon is created in the same way from a relative frequency histogram. See Figure 2-4 on page 26.
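The points that a frequency polygon connects can be computed as below. This sketch only produces the coordinates and does not draw anything; the class limits, frequencies, and function name are hypothetical.

```python
def polygon_points(limits, frequencies):
    """Pair each class midpoint with its frequency."""
    return [((lower + upper) / 2, freq)
            for (lower, upper), freq in zip(limits, frequencies)]

limits = [(7.5, 20.5), (20.5, 33.5), (33.5, 46.5)]   # hypothetical class limits
frequencies = [4, 2, 2]                               # hypothetical class frequencies

points = polygon_points(limits, frequencies)

# For a relative frequency polygon, divide each frequency by the total.
total = sum(frequencies)
relative_points = [(mid, freq / total) for mid, freq in points]
print(points)
print(relative_points)
```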

The Ogive

• On an ogive, the x-axis represents the upper limit of each class and the y-axis represents cumulative frequencies. The points are connected. The lower limit of the first class is the starting point, with zero frequency. Figure 2-5 on page 27 is an ogive. A relative cumulative frequency ogive can be formed by replacing the cumulative frequencies of an ogive with relative cumulative frequencies. Look at Figure 2-6 on page 27.
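The coordinates of an ogive can be sketched in the same spirit: start at the lower limit of the first class with zero frequency, then pair each upper limit with the running total. The class limits, frequencies, and function name are hypothetical, and the sketch does not draw the curve.

```python
from itertools import accumulate

def ogive_points(limits, frequencies):
    """Return (x, cumulative frequency) pairs for an ogive."""
    points = [(limits[0][0], 0)]   # starting point: first lower limit, zero frequency
    for (_, upper), cumulative in zip(limits, accumulate(frequencies)):
        points.append((upper, cumulative))
    return points

limits = [(7.5, 20.5), (20.5, 33.5), (33.5, 46.5)]   # hypothetical class limits
frequencies = [4, 2, 2]                               # hypothetical class frequencies

points = ogive_points(limits, frequencies)

# For a relative cumulative frequency ogive, divide by the total count.
total = sum(frequencies)
relative_points = [(x, cum / total) for x, cum in points]
print(points)
print(relative_points)
```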

Other tools for data presentation are pie charts and bar charts shown on pages 28 and 29.