Researchers' Corner, J-gate newsletter, vol.4, no.5, May 2012. http://informindia.co.in/Jgatenewsletter-current.html
Researchers’ Corner
Working out Percentages and Random Number Generation
We have seen tally marking to prepare a frequency table, and the various parts and features of
a table, in the preparation for tabular presentation (Feb 2012 issue). Before we explore the four
steps to tabular presentation, consider the following two tables, found in two recent draft
papers, that attracted my attention.
Subject-wise distribution of books

Subjects                  Number of Books   Percentage
Art & Architecture                     10         1.06
Biographies                            38         4.03
Generalia                               9         0.95
Language & Literature                 116        12.30
Mysticism                               6         0.64
Religion & Philosophy                  33         3.50
Sciences                               46         4.88
Social sciences                       685        72.64
Total                                 943       100
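
The percentage column above takes the sample total (943 books) as its base. As a minimal sketch, the column can be reproduced from the counts in the table; the variable names are illustrative only.

```python
# Counts copied from the subject-wise table; the base is the sample total.
books = {
    "Art & Architecture": 10,
    "Biographies": 38,
    "Generalia": 9,
    "Language & Literature": 116,
    "Mysticism": 6,
    "Religion & Philosophy": 33,
    "Sciences": 46,
    "Social sciences": 685,
}

total = sum(books.values())  # sample total used as the base of every percentage

# Percentage of the sample falling in each subject, rounded to two decimals.
percentages = {subject: round(100 * n / total, 2) for subject, n in books.items()}

for subject, pct in percentages.items():
    print(f"{subject:<24}{books[subject]:>5}{pct:>8.2f}")
print(f"{'Total':<24}{total:>5}")
```

These figures describe only how the sample is split by subject. They say nothing about what share of each subject's books in the library was digitized, which would need the subject-wise totals of the whole collection.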
It is a common mistake in tabular presentations to work out percentages in the wrong direction.
In these two tables, sample books and sample students are presented with their distribution by
subject and by branch of engineering respectively, ignoring the total population. The percentage
of books digitized in a given subject would have been more meaningful in relation to the total
books in that subject; similarly, the number of sample students in each branch needs to be
related to the total students in that branch. The first is a digitization study consisting of 943
books out of over 3 lakh books in the library, and the second is a study of the use of e-resources
by engineering students, with data collected through a questionnaire from 150 sample students
selected from a population of 2160. It is claimed that the simple random method was adopted,
without any clue about how the size of the sample was determined or the process followed for
random sampling. In both cases, how the sample (or response) is distributed among
characteristics like ‘subject’ in the first case and ‘engineering branch’ in the second is not
examined. Some tips relating to percentages are:
Branch-Wise Distribution of Engineering Students

Sl. No.  Branch                                   Number of Students   Percentage
1        Electronic Communication Engineering                     30           20
2        Computer Science Engineering                             32           21
3        Information Technology                                   30           20
4        Electrical and Electronic Engineering                    25           17
5        Mechanical Engineering                                   19           13
6        Civil Engineering                                        14           09
         TOTAL                                                   150          100
Volume 4 Issue 5 May 2012
• Percentages, including ratios and proportions, should be computed in the direction of the
causal factor, if any
• Percentages should run only in the direction in which a sample is representative
• Do not average percentages (without weighting by the size of samples)
• Do not use very large percentages (e.g. 1200% increase)
• Do not use too small a base (e.g. 33 1/3% for 1 in 3)
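
The tip about averaging percentages can be illustrated with a small sketch. The two samples below are hypothetical figures chosen only for illustration; they do not come from either draft paper.

```python
# Hypothetical data: (sample size, percentage observed in that sample).
samples = [
    (20, 50.0),   # small sample: 10 of 20 units
    (180, 10.0),  # large sample: 18 of 180 units
]

# Unweighted mean of the two percentages, ignoring sample sizes.
simple_mean = sum(pct for _, pct in samples) / len(samples)

# Mean weighted by sample size, equivalent to pooling the raw counts.
weighted_mean = sum(n * pct for n, pct in samples) / sum(n for n, _ in samples)

print(f"Unweighted average: {simple_mean:.1f}%")
print(f"Weighted average:   {weighted_mean:.1f}%")
```

With these assumed figures the unweighted average is 30%, while the pooled figure is 28 out of 200, i.e. 14%. The small sample dominates the unweighted average far beyond its size.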
Incidentally, the size of a sample should be:
• Adequate, to provide an estimate with sufficiently high precision
• Representative, to mirror the various patterns and sub-classes of the population
• Neither too large nor too small, but optimum to meet efficiency (cost), reliability (precision)
and flexibility
The higher the precision required and the larger the variance, the larger the sample size and
the higher the cost.
The essence of Simple Random Sampling (SRS) is the non-zero equal probability of every
unit in the population getting selected, i.e., the probability of a unit being selected in a single
draw from a population of N is 1/N [this is with replacement; without replacement, each of the
remaining units has probability 1/(N-1) on the next draw]. However, if we have to select n units
(sample size) from a finite population of N without replacement, each of the N!/(n!(N-n)!)
possible samples is equally likely, so the probability of any particular sample is n!(N-n)!/N!
and the probability of any given unit being included in the sample is n/N. For example, if N=5
and n=2, then n!(N-n)!/N! = 1/10 and n/N = 2/5. A simple random sample is usually selected
without replacement. Often the phrases ‘Random Sample’ and ‘Simple Random Sample’ are
wrongly used interchangeably. As mentioned above, in SRS each unit of the population has a
non-zero equal probability of being selected, whereas in a ‘Random sample’ each unit may have
a known (equal or unequal) probability of selection.
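
As a check on the N=5, n=2 example, the following minimal sketch enumerates every possible sample drawn without replacement and counts how often each unit appears. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 5, 2
units = range(1, N + 1)

# Every possible sample of n distinct units (order ignored).
samples = list(combinations(units, n))

# Under SRS without replacement, all samples are equally likely.
p_sample = Fraction(1, len(samples))

# Inclusion probability of each unit: share of samples that contain it.
inclusion = {
    u: Fraction(sum(u in s for s in samples), len(samples)) for u in units
}

print("Number of possible samples:", len(samples), "(formula:", comb(N, n), ")")
print("Probability of any one sample:", p_sample)
print("Inclusion probability per unit:", inclusion)
```

The enumeration gives 10 equally likely samples, so each sample has probability 1/10, and every unit appears in 4 of the 10 samples, giving an inclusion probability of 2/5 = n/N.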
The selection process for finite population could be one of the following:
1. Lottery method (blindfolded or using a rotating drum) is an old classical method. All the units
in the population are numbered from 1 to N (this list is called the sampling frame), and the
numbers are written on small slips of paper and thoroughly mixed in the drum before being
picked blindfolded. This method is used when the size of the population is small.
2. Random number table (like Tippett's numbers) is used for larger populations, as it is difficult
to mix the slips properly in the lottery method. For example, one takes two-digit numbers from
the table of random numbers if the population is up to 100, starting from any column or row of
the table. Of course any number larger than the population size is ignored, and a repeated
number is not considered in ‘sampling without replacement’. For example, to select 10 items
from a population consisting of 150 items:
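• Number the population from 1 to 150 and use three-digit random numbers from 001 to 900
(900 being the highest multiple of 150 less than 1000), so that each item corresponds to six
numbers, for instance by taking the remainder on division by 150 (a remainder of 0 standing
for item 150)
• Select a starting position in the random number table
• Continue to read numbers between 1 and 900, skipping any item that has already been
selected, till you reach 10 items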
Both the lottery method and the random number table method can be cumbersome, particularly
for large sample sizes.
3. Computer-generated random numbers can be obtained from free sources like StatTrek's
Random Number Generator (http://stattrek.com/statistics/random-number-generator.aspx) or
Random Integer Generator (http://www.random.org/integers/ ). Just answer online the
questions such as how many random numbers are needed, the minimum and maximum values,
whether to allow duplicates, and an optional seed number, and you will have the SRS numbers
in seconds.
The above example (under 2 above, Random number table) of choosing 10 items from a
population of 150, run in the Random Integer Generator, gave the result: ‘Here are your random
numbers’
7 14 82 80 87
109 46 35 73 134
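
The same selection can also be done offline. The sketch below, using only Python's standard library, shows two hypothetical helper functions: one draws an SRS without replacement directly, and the other mimics reading three-digit entries from a random number table. The second rests on an assumption that is not spelled out in the text, namely that numbers from 001 to 900 are mapped to items 1 to 150 by the remainder on division by 150.

```python
import random


def srs_without_replacement(population_size, sample_size, seed=None):
    """Draw a simple random sample of item numbers 1..population_size."""
    rng = random.Random(seed)  # optional seed makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)


def table_style_selection(population_size, sample_size, seed=None):
    """Mimic a random number table (assumed remainder mapping, see lead-in)."""
    if sample_size > population_size:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    digits = len(str(population_size))           # 3 digits for 150 items
    upper = 10 ** digits - 1                     # largest table entry, 999
    # Highest multiple of the population size not exceeding `upper` (900).
    limit = (upper // population_size) * population_size
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, upper)           # one entry read from the "table"
        if number == 0 or number > limit:
            continue                             # outside 001..900: ignore
        # Remainder 0 stands for the last item (150).
        item = number % population_size or population_size
        if item not in chosen:                   # without replacement
            chosen.append(item)
    return chosen


print(srs_without_replacement(150, 10, seed=2012))
print(table_style_selection(150, 10, seed=2012))
```

The seed value 2012 is arbitrary. Different seeds, or no seed at all, give different but equally valid samples, so the output will not match the ten numbers quoted above.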
There are other selection processes, like the Grid system for selecting a sample of an area.
Note that SRS is the basic selection process and all other complex random sampling
procedures are built on SRS.
M S Sridhar