
Four Steps to Tabular Presentation of Data


Researchers' Corner in J-gate newsletter 4(7) July 2012. http://informindia.co.in/Jgatenewsletter-current.html


Researchers’ Corner


I have mentioned the four steps to tabular presentation twice in the past, and here they are for novice researchers who have had little exposure to basic statistics. To recapitulate, the March 2012 issue described the preparatory work for tabulation, such as tally marking, that leads to a frequency table: a table that displays data in a concise and logical order, with one-way, two-way or three-way classification depending on the number of characteristics involved. Note that raw data can be classified broadly in four ways: qualitative, quantitative, temporal and spatial (see box for their definitions). Classification, by organizing similar items into groups or classes, brings order to the data, and classified data can easily be subjected to further statistical analysis. While tabulating, classes (or groups) that are mutually exclusive and exhaustive (every observation falls into one and only one class) are created on the basis of common characteristics. We use attributes (statistics of attributes) for qualitative data, and class intervals, class limits, magnitudes and frequencies for quantitative data (statistics of variables). The four steps presented here apply to quantitative data only.

i) Qualitative classification is based on qualitative characteristics such as status, nationality, religion, marital status and gender.

ii) Quantitative classification is based on characteristics measured quantitatively, such as age, height and income. Quantitative variables can be either continuous or discrete. A continuous variable can take any numerical value, as weight and height do. A discrete variable can take only certain values, moving in finite ‘jumps’, as the number of books does: it jumps from one value to the next and takes no intermediate value between them. For example, a person can weigh 71.5 kg, but we cannot have 2.5 persons.

iii) Temporal (or chronological) classification uses time, such as hours, days, weeks, months or years, as the classifying variable (values arranged in time order, for example year by year, form a time series).

iv) Spatial classification uses place, such as village, town, block, district, state or country, as the classifying variable.

Volume 4 Issue 7 July 2012 Newsletter

1. Decide the number of classes: First, get to know the range and the variation in the values of the variable. Range is the difference between the largest and the smallest value of the variable. The classes together must cover the range, i.e., the sum of all class intervals (the number of classes multiplied by the class interval, when all intervals are equal) must be at least equal to the range. In the sample Table 2 of the March issue (see table), the prices of elementary textbooks ranged from, say, 4 to 99, giving a range of 95. It was decided to have 10 classes, each with a size (class interval) of 10, so that the classes together span 100 and cover the range of 95.
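The arithmetic of step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `range_and_classes` is hypothetical, and the price list is made up except for its extremes (4 and 99), which come from the textbook example. The sketch asks for strictly more than the range (number of classes multiplied by class interval greater than the range) so that both the smallest and the largest value fit inside the classes.

```python
import math

def range_and_classes(values, class_interval):
    """Step 1 (hypothetical helper): return the range and the number of
    equal classes needed so that classes x interval exceeds the range."""
    r = max(values) - min(values)               # range = largest - smallest
    k = math.floor(r / class_interval) + 1      # smallest k with k * interval > range
    return r, k

# Hypothetical textbook prices; only the extremes 4 and 99 come from the example.
prices = [4, 12, 25, 37, 48, 53, 66, 71, 85, 99]
r, k = range_and_classes(prices, class_interval=10)
print(f"range = {r}, classes needed = {k}")  # range = 95, classes needed = 10
```

With a class interval of 10 this reproduces the 10 classes chosen in the example.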

2. Decide the size of each class: This decision is interlinked with the previous one, i.e., with the number of classes. The rule of thumb is to have 5 to 15 classes. A mathematical way to work out the size of each class is Sturges' formula:

$$i = \frac{R}{1 + 3.3 \log_{10} N}$$

where i is the size of the class interval, R is the range, and N is the number of items to be grouped; the denominator, 1 + 3.3 log N, is the suggested number of classes. In the table referred to above, as already mentioned, we chose a size of 10 for each class.
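As a worked example of the formula in step 2, here is a minimal Python sketch. The helper name `sturges_class_size` is hypothetical, and N = 50 is an assumed number of books, since the article does not state N for the March table; only the range of 95 comes from the example.

```python
import math

def sturges_class_size(r, n):
    """Class size i = R / (1 + 3.3 * log10(N)), as in the formula above."""
    k = 1 + 3.3 * math.log10(n)   # suggested number of classes
    return r / k

# Range 95 from the example; N = 50 items is an assumption for illustration.
i = sturges_class_size(95, 50)
print(round(i, 1))  # 14.4
```

The formula gives about 6.6 classes of width about 14.4 here; in practice such a result is rounded to a convenient width (for example 15), which is why a judgement-based choice such as the width of 10 used in the article is also acceptable within the 5 to 15 class rule of thumb.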

3. Determine the class limits: Choose a value less than the minimum value of the variable as the lower class limit of the first class, and a value greater than the maximum value of the variable as the upper class limit of the last class. In the example, we chose 1 as the lower class limit of the first class and 100 as the upper class limit of the last class. The class limits should be chosen so that the mid-point (class mark) of each class coincides, as far as possible, with a value around which the data tend to concentrate; that is, each class mark should be close to the average of the observations in that class. Once the class limits are chosen, the class intervals follow; in other words, the class intervals are the successive ranges of the variable used for classifying the data. In the example, we chose an equal class interval for all 10 classes. See the diagram showing how the midpoints of even and odd class intervals are determined. Further, the class intervals can be either exclusive or inclusive (see text box for further explanation).
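Step 3 for the example can be sketched in Python as follows. This is a minimal illustration, assuming whole-number data (prices in whole units) and inclusive classes such as 1-10, 11-20; the function name `class_limits` is hypothetical. Each class mark is computed as (lower limit + upper limit) / 2.

```python
def class_limits(first_lower, size, k):
    """Step 3 (hypothetical helper): inclusive limits (lower, upper) and the
    class mark of k equal classes, assuming whole-number data."""
    classes = []
    for j in range(k):
        lower = first_lower + j * size
        upper = lower + size - 1          # inclusive upper limit: 1-10, 11-20, ...
        mark = (lower + upper) / 2        # class mark (mid-point)
        classes.append((lower, upper, mark))
    return classes

# First lower limit 1, class size 10 and 10 classes, as in the example.
for lower, upper, mark in class_limits(1, 10, 10):
    print(f"{lower:>3}-{upper:<3} class mark {mark}")
```

The first class is 1-10 with class mark 5.5 and the last is 91-100 with class mark 95.5, matching the limits 1 and 100 chosen in the example.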

4. Find the frequency of each class: Count how many observations in the raw data fall into each class, placing each observation in its class by tally marking (see March 2012 issue).
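Step 4 amounts to counting observations class by class. The Python sketch below is a hedged illustration: the helper names `frequencies` and `tally` are hypothetical, the prices are invented (not the March 2012 data), and the tally marks are drawn as "||||/" for each bundle of five.

```python
def frequencies(values, classes):
    """Step 4: count the observations in each inclusive class (lower, upper).
    Values outside every class are not counted."""
    counts = [0] * len(classes)
    for x in values:
        for j, (lower, upper) in enumerate(classes):
            if lower <= x <= upper:      # inclusive limits, as in 1-10, 11-20, ...
                counts[j] += 1
                break
    return counts

def tally(n):
    """Tally marks: '||||/' for each bundle of five, then single strokes."""
    return "||||/ " * (n // 5) + "|" * (n % 5)

# Classes 1-10, 11-20, ..., 91-100 from the example.
classes = [(1 + 10 * j, 10 + 10 * j) for j in range(10)]
# Hypothetical prices, for illustration only.
prices = [4, 8, 12, 15, 19, 25, 37, 48, 53, 66, 71, 85, 99]

counts = frequencies(prices, classes)
for (lower, upper), f in zip(classes, counts):
    print(f"{lower:>3}-{upper:<3} {tally(f):<8} {f}")
```

Because the classes are mutually exclusive and exhaustive, the frequencies add up to the number of observations, which is a quick check on the tallying.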

Lastly, one may wonder why all these mind-boggling exercises are needed when software provides ready-to-use tables. True, software removes much of the statistical drudgery, but the concepts and terms in these steps are needed even to use the software. As an exercise, try the pivot table tool of Excel to generate a frequency table with five classes.
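For readers who prefer a script to Excel, the same exercise can be sketched in plain Python. This is not the Excel pivot table tool, only an analogue under assumptions: the helper `frequency_table` is hypothetical, it assumes whole-number data, and it uses exclusive classes [lower, upper) whose common width is chosen so that five classes exceed the range. The prices are invented.

```python
import bisect
import math

def frequency_table(values, k=5):
    """Steps 1-4 for k equal exclusive classes [lower, upper), whole-number data."""
    lo, hi = min(values), max(values)
    size = math.floor((hi - lo) / k) + 1          # steps 1-2: k * size > range, so hi fits
    lowers = [lo + j * size for j in range(k)]    # step 3: lower class limits
    counts = [0] * k
    for x in values:                              # step 4: frequency of each class
        counts[bisect.bisect_right(lowers, x) - 1] += 1
    return [(low, low + size, c) for low, c in zip(lowers, counts)]

# Hypothetical prices, for illustration only.
prices = [4, 8, 12, 15, 19, 25, 37, 48, 53, 66, 71, 85, 99]
for lower, upper, count in frequency_table(prices):
    print(f"{lower} to under {upper}: {count}")
```

`bisect.bisect_right` finds the last lower limit not greater than each value, which places the value in its exclusive class without looping over every class.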

M S Sridhar


(i) Exclusive method: When the upper class limit of one class equals the lower class limit of the next class (e.g., 0-10, 10-20), the intervals are exclusive. This method suits data from a continuous variable; while recording frequencies, the lower class limit of a class is included in the interval but its upper class limit is excluded.

(ii) Inclusive method: If both the lower and the upper class limits are part of the class interval (e.g., 1-10, 11-20), the interval is inclusive. If a ‘gap’ or discontinuity is found between the upper limit of one class and the lower limit of the next, the class intervals are adjusted: divide the difference between the upper limit of the first class and the lower limit of the second class by 2, subtract the result from all lower class limits, and add it to all upper class limits. This adjustment restores the continuity of the data in the frequency distribution, and

$$\text{Adjusted class mark} = \frac{\text{Adjusted upper limit} + \text{Adjusted lower limit}}{2}$$
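The adjustment described in the box can be sketched in Python. This is a minimal illustration, assuming equal gaps between all consecutive inclusive classes and at least two classes; the helper names `adjust_inclusive` and `in_exclusive_class` are hypothetical. It uses the first three classes of the example, 1-10, 11-20 and 21-30, where the gap is 1 and half the gap is 0.5.

```python
def adjust_inclusive(classes):
    """Convert inclusive classes (lower, upper) into continuous boundaries by
    subtracting half the gap from lower limits and adding it to upper limits.
    Assumes at least two classes and equal gaps between them."""
    gap = classes[1][0] - classes[0][1]   # lower limit of 2nd class - upper limit of 1st
    half = gap / 2
    adjusted = []
    for lower, upper in classes:
        a_lower, a_upper = lower - half, upper + half
        adjusted.append((a_lower, a_upper, (a_lower + a_upper) / 2))  # adjusted class mark
    return adjusted

def in_exclusive_class(x, lower, upper):
    """Exclusive method: the lower limit is included, the upper limit excluded."""
    return lower <= x < upper

for row in adjust_inclusive([(1, 10), (11, 20), (21, 30)]):
    print(row)
```

After adjustment the classes become 0.5-10.5, 10.5-20.5 and 20.5-30.5, so each upper boundary equals the next lower boundary and the exclusive rule decides where a boundary value such as 10.5 is counted.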