How to Organize Data


  • 7/28/2019 How to Organize Data


    HOW TO ORGANIZE DATA

    One way of organizing raw data or observations is through a frequency distribution table. One example is the profile of cooperatives in a province given below:

    Initial Capital, in Pesos    Number of Cooperatives
    Below 25,000                 43
    25,000 - 49,999              28
    50,000 - 74,999              17
    75,000 and above             12

    STEPS IN CONSTRUCTING A FREQUENCY DISTRIBUTION TABLE:

    1. Obtain the number of class intervals to be used.

    Usually, the number of class intervals should be anywhere from 5 to 20. Too many intervals would result in a loss of organization. Too few intervals, on the other hand, would result in a loss of detail. For a more specific guide, we can use Sturges' Rule, which states that:

    K = 1 + 3.322*(log10 n)

    where
    K = the number of class intervals, rounded upwards
    n = the number of observations

    2. Obtain the size of the intervals.

    To obtain an initial estimate of the size of the intervals, we can use the formula:

    \[ i_i = \frac{\left[ LO + \frac{I_c}{2} \right] - \left[ SO - \frac{I_c}{2} \right]}{K_r} \]

    where
    i_i = the initial estimate of the interval size
    LO = the largest observed value
    I_c = the smallest increment of change in the data
    SO = the smallest observed value
    K_r = the K value obtained from Sturges' Rule, rounded upwards

    The interval size, I, is obtained by rounding the initial estimate to the number of decimal places indicated by the raw data. If the data are in whole numbers, round to the nearest integer.

    3. Compute the excess space.

    EXCESS = Space available - Required space, where

    Space available = K_r * I
    Required space = [LO + I_c/2] - [SO - I_c/2]

    4. Construct the table.


    With the interval size and the number of intervals known, we can construct the

    frequency distribution by first dividing the excess between the lowest and the

    highest ends of the data.
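    The four steps above can be sketched in Python as follows. This is only an illustration for whole-number data, not a definitive procedure: the function name, the sample scores, the widening of the interval when rounding leaves too little space, and the choice to place half of the excess (in whole increments) below the smallest value are all assumptions that do not come from the text.

```python
import math

def build_frequency_table(data, increment=1):
    """Return a list of (lower limit, upper limit, frequency) rows."""
    n = len(data)
    so, lo = min(data), max(data)   # smallest and largest observations

    # Step 1: Sturges' Rule, rounded upwards.
    k = math.ceil(1 + 3.322 * math.log10(n))

    # Step 2: required space is [LO + Ic/2] - [SO - Ic/2] = LO - SO + Ic.
    required = lo - so + increment
    # Round the initial estimate to the nearest multiple of the increment.
    size = math.floor(required / k / increment + 0.5) * increment
    # Assumption (not in the text): if rounding left too little space,
    # widen the interval by one increment.
    if k * size < required:
        size += increment

    # Step 3: excess = space available - required space.
    excess = k * size - required

    # Step 4 (assumed split): half of the excess, in whole increments,
    # goes below SO; the remainder ends up above LO.
    low_share = (excess // increment) // 2 * increment
    first_lower = so - low_share

    rows = []
    for j in range(k):
        lower = first_lower + j * size
        upper = lower + size - increment
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, freq))
    return rows

# Hypothetical whole-number scores, invented for illustration.
scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 72,
          74, 75, 78, 80, 83, 85, 88, 90, 94, 98]
for lower, upper, freq in build_frequency_table(scores):
    print(f"{lower:>3} - {upper:<3} {freq}")
```

    For these 20 scores the sketch gives K = 6 and I = 8, so the classes run from 52 - 59 up to 92 - 99, with an excess of 1 that falls above the largest value.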

    GRAPHICAL METHODS FOR DESCRIBING QUANTITATIVE DATA:

    (1) Frequency Histogram: a bar graph representation of a frequency distribution table. The class boundaries (CB) are marked along the horizontal axis, and the frequencies along the vertical axis. Each interval is drawn as a bar bounded by its class boundaries, with a height equal to the corresponding frequency.

    (2) Frequency Polygon: uses class midpoints (CM) to represent the intervals. The class midpoint is computed as the average of the lower class limit (LCL) and the upper class limit (UCL). Class limits are the visible limits of the intervals in the frequency distribution table.
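    As a small illustration of the quantities plotted by the two graphs, the sketch below derives class boundaries and class midpoints from class limits. It assumes that a class boundary lies half an increment beyond the visible class limit, which matches the [LO + Ic/2] and [SO - Ic/2] terms used earlier; the function name and the sample limits are hypothetical.

```python
def boundaries_and_midpoints(class_limits, increment=1):
    """For each (LCL, UCL) pair return (lower CB, upper CB, CM)."""
    half = increment / 2
    return [(lcl - half, ucl + half, (lcl + ucl) / 2)
            for lcl, ucl in class_limits]

# Hypothetical class limits for whole-number data.
limits = [(52, 59), (60, 67), (68, 75)]
for lower_cb, upper_cb, midpoint in boundaries_and_midpoints(limits):
    print(lower_cb, upper_cb, midpoint)
# 51.5 59.5 55.5
# 59.5 67.5 63.5
# 67.5 75.5 71.5
```

    The histogram bars would span each pair of boundaries, while the polygon would join the points plotted at each midpoint.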

    NUMERICAL DESCRIPTIVE MEASURES

    - are numbers that are used to create a mental image of a data set.

    (1) Measures of Central Tendency or Location:

    The measure of central tendency is the point about which scores tend to cluster; a sort of average in a series. It is the center of concentration of scores in any set of data. It is a single number which represents the general level of performance of the group.

    The three measures of central tendency in common use are: Mean, Median and

    Mode.

    MEAN is defined as the sum of the values in the data group divided by the

    number of values. The formula is:

    \[ \bar{X} = \frac{\sum X}{n} \]

    where X = the raw data or observations
    n = the number of observations or values

    For grouped data, which is in the form of a frequency table, the formula is:

    \[ \bar{X} = \frac{\sum f x}{n} \]

    where f = frequency of each class interval

    x = class midpoints

    n = total number of observations
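    A minimal sketch of both mean formulas follows. The scores, midpoints and frequencies are hypothetical values invented for illustration, and the function names are not from the text.

```python
def mean(values):
    """Ungrouped mean: sum of the values divided by their number."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped mean: sum of f * x divided by n, where n = sum of f."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

# Hypothetical raw scores and the same scores grouped into six classes.
scores = [52, 55, 58, 61, 63, 64, 67, 70, 71, 72,
          74, 75, 78, 80, 83, 85, 88, 90, 94, 98]
midpoints = [55.5, 63.5, 71.5, 79.5, 87.5, 95.5]
freqs = [3, 4, 5, 3, 3, 2]

print(mean(scores))                    # 73.9
print(grouped_mean(midpoints, freqs))  # 73.5
```

    The two results differ slightly because the grouped formula treats every value in a class as if it sat at the class midpoint.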

    MEDIAN: the middle value in arrayed data (data which have been arranged in ascending order).

    For grouped data, the formula is:

    \[ \tilde{X} = LCB_{med} + \left( \frac{\frac{n}{2} - F}{f_{med}} \right) I \]


    where:
    LCB_med = lower class boundary of the median class
    n = number of observations
    F = cumulative frequency of the class before the median class
    f_med = frequency of the median class
    I = class interval size

    The median class is identified as the class whose cumulative frequency reaches

    n/2 first.
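    The grouped median can be sketched as below, assuming the classes are listed in ascending order and share one interval size. The function name and the sample table are hypothetical.

```python
def grouped_median(lower_boundaries, freqs, size):
    """Median of grouped data: LCB_med + ((n/2 - F) / f_med) * I."""
    n = sum(freqs)
    cumulative = 0   # F: cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        # The median class is the first whose cumulative frequency
        # reaches n/2.
        if cumulative + f >= n / 2:
            return lcb + (n / 2 - cumulative) / f * size
        cumulative += f

# Hypothetical table: six classes of size 8 holding 20 observations.
lower_boundaries = [51.5, 59.5, 67.5, 75.5, 83.5, 91.5]
freqs = [3, 4, 5, 3, 3, 2]
print(round(grouped_median(lower_boundaries, freqs, 8), 2))  # 72.3
```

    Here n/2 = 10 is first reached in the third class (cumulative frequency 12), so F = 7 and f_med = 5, giving 67.5 + (3/5)(8) = 72.3.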

    MODE: the value which occurs the greatest number of times in a data set.

    For grouped data, the formula is:

    \[ M = LCB_{mod} + \left( \frac{f_1}{f_1 + f_2} \right) I \]

    where:
    LCB_mod = lower class boundary of the modal class
    f_1 = the difference between the frequency of the modal class and the class immediately before it
    f_2 = the difference between the frequency of the modal class and the class immediately after it
    I = size of the class interval
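    A possible sketch of the grouped mode is given below. Two choices here are assumptions that the text does not specify: when the modal class is the first or last class, the missing neighbor is treated as having a frequency of 0, and when several classes tie for the highest frequency the first one is used. The sketch also does not handle a modal class whose neighbors have the same frequency (f_1 + f_2 = 0).

```python
def grouped_mode(lower_boundaries, freqs, size):
    """Mode of grouped data: LCB_mod + (f1 / (f1 + f2)) * I."""
    # Index of the modal class (first class with the highest frequency).
    m = max(range(len(freqs)), key=lambda j: freqs[j])
    # Assumption: a missing neighboring class counts as frequency 0.
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    f1 = freqs[m] - before
    f2 = freqs[m] - after
    return lower_boundaries[m] + f1 / (f1 + f2) * size

# Hypothetical table: six classes of size 8 holding 20 observations.
lower_boundaries = [51.5, 59.5, 67.5, 75.5, 83.5, 91.5]
freqs = [3, 4, 5, 3, 3, 2]
print(round(grouped_mode(lower_boundaries, freqs, 8), 2))  # 70.17
```

    The modal class is the third one (frequency 5), so f_1 = 5 - 4 = 1 and f_2 = 5 - 3 = 2, giving 67.5 + (1/3)(8), or about 70.17.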

    (2) Measures of Dispersion or Variability:

    The measures of dispersion indicate the nature or degree of clustering. The more concentrated the values about the mean or average, the more meaningful is the average as a measure of location.

    RANGE is the simplest measure of spread or variability. It is the difference between the highest score and the lowest score in any given set of data or distribution.

    For data grouped into intervals, the range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.

    The range is not considered a stable measure of variability because it considers only the extreme values; its value can fluctuate greatly with a change in a single score, either the highest or the lowest.
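    The short sketch below illustrates both definitions of the range and its sensitivity to a single extreme score. The scores and class limits are hypothetical, and the class boundaries are assumed to lie half an increment beyond the class limits.

```python
def data_range(values):
    """Range of raw data: highest score minus lowest score."""
    return max(values) - min(values)

def grouped_range(class_limits, increment=1):
    """Range of grouped data: upper boundary of the highest class
    minus lower boundary of the lowest class."""
    lowest_lcl = class_limits[0][0]
    highest_ucl = class_limits[-1][1]
    return (highest_ucl + increment / 2) - (lowest_lcl - increment / 2)

scores = [52, 61, 70, 75, 83, 98]       # hypothetical scores
print(data_range(scores))                # 46
# Changing only the highest score more than doubles the range.
print(data_range(scores[:-1] + [150]))   # 98

limits = [(52, 59), (60, 67), (92, 99)]  # hypothetical class limits
print(grouped_range(limits))             # 48.0
```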

    STANDARD DEVIATION: the most useful measure of variability. It is a special form of average deviation from the mean, and it is affected by every individual value in the distribution.

    For an ungrouped population, the standard deviation is given by:

    \[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

    where X = the individual values of all the items
    μ = the population mean
    N = the population size
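    As a quick check of the formula, the sketch below computes the population standard deviation for a small hypothetical data set. The function name and the values are invented for illustration.

```python
import math

def population_sd(values):
    """Population standard deviation: sqrt(sum((X - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical population
print(population_sd(values))         # 2.0
```

    The mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, whose square root is 2.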


    For grouped population, the standard deviation is given by:

    \[ \sigma = \sqrt{\frac{\sum \left[ f (X - \mu)^2 \right]}{N}} \]

    where X = the class midpoints
    f = frequency of each class interval
    μ = the population mean
    N = the population size
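    The grouped version can be sketched the same way. One assumption is made here: the population mean is itself computed from the class midpoints and frequencies. The table values are hypothetical.

```python
import math

def grouped_population_sd(midpoints, freqs):
    """Grouped population SD: sqrt(sum(f * (X - mu)^2) / N)."""
    n = sum(freqs)
    # Assumption: mu is estimated from the grouped table itself.
    mu = sum(f * x for f, x in zip(freqs, midpoints)) / n
    squared = sum(f * (x - mu) ** 2 for f, x in zip(freqs, midpoints))
    return math.sqrt(squared / n)

# Hypothetical table: six classes holding 20 observations.
midpoints = [55.5, 63.5, 71.5, 79.5, 87.5, 95.5]
freqs = [3, 4, 5, 3, 3, 2]
print(round(grouped_population_sd(midpoints, freqs), 2))  # 12.36
```

    With a grouped mean of 73.5, the weighted squared deviations sum to 3056, and 3056 / 20 = 152.8, whose square root is about 12.36.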

    For ungrouped samples, the standard deviation is given by:

    \[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

    where X = the individual values of all the items
    Xbar = the sample mean
    n = sample size

    For grouped samples, the standard deviation is given by:

    \[ s = \sqrt{\frac{\sum \left[ f (X - \bar{X})^2 \right]}{n - 1}} \]

    where X = the class midpoints

    Xbar = the sample mean

    n = sample size
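    The two sample formulas differ from the population formulas only in dividing by n - 1 instead of N. The sketch below illustrates both, using hypothetical data and function names; in the grouped case f is taken to be the frequency of each class interval, as in the population formula.

```python
import math

def sample_sd(values):
    """Ungrouped sample SD: sqrt(sum((X - Xbar)^2) / (n - 1))."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))

def grouped_sample_sd(midpoints, freqs):
    """Grouped sample SD: sqrt(sum(f * (X - Xbar)^2) / (n - 1))."""
    n = sum(freqs)
    xbar = sum(f * x for f, x in zip(freqs, midpoints)) / n
    squared = sum(f * (x - xbar) ** 2 for f, x in zip(freqs, midpoints))
    return math.sqrt(squared / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical sample
midpoints = [55.5, 63.5, 71.5, 79.5, 87.5, 95.5]   # hypothetical table
freqs = [3, 4, 5, 3, 3, 2]

print(round(sample_sd(values), 2))                    # 2.14
print(round(grouped_sample_sd(midpoints, freqs), 2))  # 12.68
```

    Both results are slightly larger than the corresponding population values because the divisor is smaller by one.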