1 IB-11-13-Pre-Induction – Basics of Business Statistics SYMBIOSIS INSTITUTE OF INTERNATIONAL BUSINESS [SIIB] Constituent of Symbiosis International University Accredited by NAAC with ‘A’ Grade

Module 2: Basics of Business Statistics


    INDEX

    MODULE 2: BASICS OF BUSINESS STATISTICS

    1. Business Statistics
       Introduction; History; Applications of Business Statistics; Levels of Measurement; Variables

    2. Presentation of Data
       Types of Data; Frequency Distribution; Diagrammatic Presentation; Cumulative Frequency Distribution; Statistical Graphs; Exercise

    3. Measures of Central Tendency and Dispersion
       Types of Data Presentation; Measures of Central Tendency; Measures of Dispersion; Shape; Exercise

    4. Probability
       Counting Principles; Permutations; Combinations; Exercise; Probability; Axiomatic Approach to Probability; Addition Rule of Probability; Exercise

    BUSINESS STATISTICS

    Introduction

    Statistics is a mathematical science pertaining to the collection, presentation, analysis and interpretation or explanation of data. It also provides tools for prediction and forecasting based on data. It is applicable to a wide variety of academic disciplines, from the natural and social sciences to the humanities, government and business.

    Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and are then used to draw inferences about the process or population being studied; this is called inferential statistics. Descriptive, predictive, and inferential statistics comprise applied statistics. Business statistics is the science of good decision making in the face of uncertainty and is used in many disciplines such as financial analysis, econometrics, auditing, production and operations including services improvement, and marketing research.

    History

    The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and the natural and social sciences.

    Because of its empirical roots and its applications, statistics is generally considered not to be a subfield of pure mathematics, but rather a distinct branch of applied mathematics. Its mathematical foundations were laid in the 17th century with the development of probability theory by Pascal and Fermat. Probability theory arose from the study of games of chance. The method of least squares was first described by Carl Friedrich Gauss around 1794. The use of modern computers has expedited large-scale statistical computation, and has also made possible new methods that are impractical to perform manually.

    In applying statistics to a scientific, industrial, or societal problem, it is necessary to begin with a process or population to be studied. A population is an aggregate of objects, animate or inanimate. There might be a population of people in a country, of crystal grains in a rock, or of goods manufactured by a particular factory during a given period. It may instead be a process observed at various times; data collected about this kind of "population" constitute what is called a time series.

    For practical reasons, rather than compiling data about an entire population, a chosen subset of the population, called a sample, is studied. Data are collected about the sample in an observational or experimental setting. The data are then subjected to statistical analysis, which serves two related purposes: description and inference.


    Descriptive statistics can be used to summarize the data, either numerically or graphically, to describe the sample. Basic examples of numerical descriptors include the mean and standard deviation. Graphical summarizations include various kinds of charts and graphs.

    Inferential statistics is used to model patterns in the data, accounting for randomness and drawing inferences about the larger population. These inferences may take the form of answers to yes/no questions (hypothesis testing), estimates of numerical characteristics (estimation), descriptions of association (correlation), or modeling of relationships (regression). Other modeling techniques include ANOVA, time series, and data mining.

    If the sample is representative of the population, then inferences and conclusions made from the sample can be extended to the population as a whole. A major problem lies in determining the extent to which the chosen sample is representative. Statistics offers methods to estimate and correct for randomness in the sample and in the data collection procedure, as well as methods for designing robust experiments in the first place. (See experimental design.)

    The fundamental mathematical concept employed in understanding such randomness is probability. Mathematical statistics (also called statistical theory) is the branch of applied mathematics that uses probability theory and analysis to examine the theoretical basis of statistics.

    The use of any statistical method is valid only when the system or population under consideration satisfies the basic mathematical assumptions of the method. Misuse of statistics can produce subtle but serious errors in description and interpretation: subtle in the sense that even experienced professionals sometimes make such errors, and serious in the sense that they may affect, for instance, social policy, medical practice and the reliability of structures such as bridges. Even when statistics is correctly applied, the results can be difficult for the non-expert to interpret. For example, the statistical significance of a trend in the data, which measures the extent to which the trend could be caused by random variation in the sample, may not agree with one's intuitive sense of its significance. The set of basic statistical skills (and skepticism) needed by people to deal with information in their everyday lives is referred to as statistical literacy.

    Applications of Business Statistics

    Accounting

    Public accounting firms use statistical sampling procedures when conducting audits for their clients.


    Economics

    Economists use statistical information in making forecasts about the future of the economy or some aspect of it.

    Marketing

    Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.

    Production

    A variety of statistical quality control charts are used to monitor the output of a production process.

    Finance

    Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.


    Levels of measurement

    There are four types of measurements or levels of measurement or measurement scales used in statistics: nominal, ordinal, interval, and ratio. They have different degrees of usefulness in statistical research.

    Nominal

    When the data for a variable consists of labels or names used to identify an attribute of the element, the scale of measurement is considered to be nominal.

    Example:

    Students of a university are classified by the school in which they are enrolled using a non-numeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g. 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

    Ordinal

    The data have the properties of nominal data and the order or rank of the data is meaningful.

    A nonnumeric label or numeric code may be used.

    Example:

    Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g. 1 denotes Freshman, 2 denotes Sophomore, and so on).

    Interval

    The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure.

    Interval data are always numeric.

    Example:

    Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

    Ratio

    The data have all the properties of interval data and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale. This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

    Example:

    Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.

    Qualitative and Quantitative Data

    Data can be further classified as being qualitative or quantitative. Qualitative data include labels or names used to identify an attribute of each element. Qualitative data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric. Quantitative data are obtained using either the interval or ratio scale of measurement. The statistical analysis appropriate for a particular variable depends upon whether the variable is qualitative or quantitative.

    If the variable is qualitative, the statistical analysis is rather limited. We can summarize the qualitative data by counting the number of observations in each category or by computing the proportion of the observations in each qualitative category.

    However, if the variable is quantitative, arithmetic operations often provide meaningful results.

    Discrete variable

    A variable that takes isolated values is called a discrete variable. The graphical representation of a discrete variable is a step function. Examples of discrete variables are the number of people in a group and the number of accidents occurring on a particular day.

    Continuous variable

    A variable that can take any value within a given interval is referred to as a continuous variable, e.g. the weight of a person, the temperature on a given day, or the rainfall on a given day. The graphical representation of a continuous variable is a curve.

    PRESENTATION OF DATA

    Types of Data

    Primary data

    Primary data are data collected for the first time by the investigator, using methods such as surveys (census), telephone interviews, or e-mail. Such data are generally referred to as raw data because they are unprocessed.

    Secondary data

    In some cases, data needed for a particular application already exist. Companies maintain a variety of records or databases about their employees and business operations. Data are also available from a variety of industry associations and special interest organizations. The internet continues to grow as an important source of information and statistical data. Almost all companies maintain Web sites that provide general information about the company as well as data on sales, number of employees, number of products etc. Government agencies are another important source of secondary data. Information on vital events (birth, death etc.) is available from government agencies. Sometimes information can also be collected from published journals.

    Presentation of Data

    After collection of data, the next stage the statistician has to go through is presentation of data. Usually, the amount of information collected is huge, so it becomes necessary to present it in a more systematic and concise way in order to bring out the important features or characteristics of the data. Basically, there are two ways to represent data:

    1. Tabular
    2. Graphical

    Let us start the discussion with the introduction of frequency distribution and various components of the frequency distribution.

    Frequency Distribution

    A frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several non-overlapping classes. Converting raw data into a frequency distribution gives a summary that offers more insight than the original data. Three steps are necessary to define the classes for a frequency distribution with quantitative data:

    1. Determine the number of non-overlapping classes.
    2. Determine the width of each class.
    3. Determine the class limits.

    Number of classes

    There is no specific rule for choosing the number of classes. As a general guideline, between 5 and 20 classes are used.

    Width of the class

    Generally, the same width is chosen for all classes. Width is denoted by h.

    Class width = Upper class boundary - Lower class boundary
                = Difference between two consecutive upper limits
                = Difference between two consecutive lower limits

    Mid-point of the class (class mark)

    It is the mid-point of the class interval. It is denoted by x. It is obtained as

    Class mark = (Upper class limit + Lower class limit) / 2
               = (Upper class boundary + Lower class boundary) / 2

    Relative frequency

    Relative frequency of a class = (Frequency of the class) / n

    where n is the total number of observations.
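As a rough illustration of the three steps and the definitions above, the following Python sketch tabulates raw data into equal-width classes and reports the class mark, frequency and relative frequency of each class. The marks, the choice of 5 classes of width 10 and the helper name `frequency_distribution` are hypothetical and are not taken from this module.

```python
def frequency_distribution(data, lower, width, num_classes):
    """Tabulate data into equal-width classes [lower, lower + width), ...

    Returns one row per class:
    (class_lower, class_upper, class_mark, frequency, relative_frequency).
    """
    n = len(data)
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)   # which class x falls into
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} lies outside the chosen classes")
        counts[index] += 1

    rows = []
    for i, f in enumerate(counts):
        lo = lower + i * width
        hi = lo + width
        class_mark = (lo + hi) / 2          # mid-point of the class
        rows.append((lo, hi, class_mark, f, f / n))
    return rows


# Hypothetical marks of 20 students (illustrative values only).
marks = [12, 25, 37, 41, 8, 19, 33, 45, 27, 31,
         22, 14, 38, 29, 5, 47, 36, 24, 18, 30]

table = frequency_distribution(marks, lower=0, width=10, num_classes=5)
for lo, hi, mark, f, rel in table:
    print(f"{lo:>2}-{hi:<2}  class mark={mark:>4}  f={f}  relative f={rel:.2f}")
```

The relative frequencies always add up to 1, which is a quick check that every observation has been counted exactly once.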

    Tabulation

    While presenting the data one can make use of tabulation. It is the most concise way of presentation of data. There can be one-way, two-way or multifold tables depending on number of columns and rows we choose.

    Diagrammatic representation

    One of the graphical ways of representing data is diagrammatic representation, which uses bar graphs or pie charts. A bar graph can be simple, multiple, sub-divided, or of percentage type. This is a pictorial presentation of data in which points are not plotted according to scale. Such diagrams are more attractive and colourful than the various graphs available in statistics.

    [Figure: Simple bar diagram]

    [Figure: Pie chart]

    Graphical Representation

    A common graphical presentation of quantitative data is a histogram. It is a series of adjacent rectangles erected on the X-axis. It is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency, or percent frequency on the vertical axis.

    [Figure: Histogram]

    Before we learn about the ogive curve, let us look at the cumulative frequency distribution.

    Cumulative Frequency Distribution

    Cumulative frequency is obtained by adding the frequency of a class interval to the frequencies of all the preceding intervals up to that class interval. This is explained by the example below. The following frequency distribution table gives the marks obtained by 40 students:

    Class (marks)   Frequency   Cumulative frequency
    0-10            4           4
    10-20           5           (4) + 5 = 9
    20-30           12          (9) + 12 = 21
    30-40           11          (21) + 11 = 32
    40-50           8           (32) + 8 = 40

    In the above table it can be observed that frequencies are added from top to bottom, and also that 4 students got marks 'less than 10', 9 students got marks 'less than 20' and so on. Therefore, the above distribution is called a 'less than' cumulative frequency distribution. The above table can be re-written as follows:

    Marks           Cumulative frequency
    Less than 10    4
    Less than 20    9
    Less than 30    21
    Less than 40    32
    Less than 50    40

    In the same way, a 'more than' cumulative frequency distribution can be obtained by adding the frequencies in the reverse order, from bottom to top. It is explained in the following table.

    Class (marks)   Frequency   Cumulative frequency
    0-10            4           (36) + 4 = 40
    10-20           5           (31) + 5 = 36
    20-30           12          (19) + 12 = 31
    30-40           11          (8) + 11 = 19
    40-50           8           8
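The two running totals can be reproduced with a short Python sketch. It uses the class frequencies of the 40 students from the tables above; the variable names are illustrative only.

```python
from itertools import accumulate

classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
frequencies = [4, 5, 12, 11, 8]

# 'Less than' type: running total from the first class to the last.
less_than = list(accumulate(frequencies))

# 'More than' type: running total from the last class back to the first.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for label, f, lt, mt in zip(classes, frequencies, less_than, more_than):
    print(f"{label:>6}  f={f:>2}  less-than c.f.={lt:>2}  more-than c.f.={mt:>2}")
```

The last 'less than' value and the first 'more than' value both equal the total number of observations (40), which gives a simple consistency check.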

    The above table can be re-written as follows:

    Marks           Cumulative frequency
    More than 0     40
    More than 10    36
    More than 20    31
    More than 30    19
    More than 40    8

    Ogive curve

    It is a cumulative frequency curve. There are two types of ogive curve: the 'less than' ogive and the 'more than' ogive. An ogive is drawn by taking data values on the horizontal axis and cumulative frequencies on the vertical axis.

    Example: Draw a 'less than' ogive curve for the following data.

    To plot an ogive:

    (i) Plot the points whose abscissae are the actual (upper) limits and whose ordinates are the cumulative frequencies: (10, 2), (20, 10), (30, 22), (40, 40), (50, 68), (60, 90), (70, 96) and (80, 100).
    (ii) Join the plotted points by a smooth curve.
    (iii) Connect the ogive to a point on the X-axis representing the actual lower limit of the first class.

    Scale: X-axis 1 cm = 10 marks, Y-axis 1 cm = 10 c.f.

    Example: Using the data given below, construct a 'more than' cumulative frequency table and draw the ogive.

    To plot an ogive:

    (i) Plot the points whose abscissae are the actual lower limits and whose ordinates are the cumulative frequencies: (70.5, 2), (60.5, 7), (50.5, 13), (40.5, 23), (30.5, 37), (20.5, 49), (10.5, 57) and (0.5, 60).
    (ii) Join the points by a smooth curve.
    (iii) Connect the ogive to a point on the X-axis representing the actual upper limit of the last class, in this case the point (80.5, 0).

    Scale: X-axis 1 cm = 10 marks, Y-axis 2 cm = 10 c.f.

    Frequency Polygon

    In a frequency distribution, the mid-value of each class is obtained. Then, on graph paper, the frequency is plotted against the corresponding mid-value. These points are joined by straight lines, which may be extended in both directions to meet the X-axis and form a polygon.

    Example: The weights of 50 students are recorded below. Draw a frequency polygon for this data.

    Answer

    If the points of the above graph are joined by a smooth curve instead, the result is known as a frequency curve.

    Exercise

    The raw data displayed below are the electric and gas utility charges during the month of July 1990 for a random sample of 50 one-bedroom apartments in Mumbai:

    96 171 202 178 147 102 153 197 127 82
    157 185 90 116 172 111 148 213 130 165
    141 149 206 175 123 128 144 168 109 167
    95 163 150 154 130 143 187 166 139 149
    108 119 183 151 114 135 191 137 129 158

    a. Form a frequency distribution having 7 class intervals with the following class boundaries: Rs.80 but less than Rs.100, Rs.100 but less than Rs.120, and so on.

    b. Form the percentage distribution from the frequency distribution developed in a.

    c. From the percentage distribution developed in b:

    i. Plot the percentage histogram.

    ii. Plot the percentage polygon.

    d. From the frequency distribution developed in a:

    i. Approximate the mean, mode, range, midrange, standard deviation and coefficient of variation.

    ii. Based on Chebyshev's rule, between what two values would we estimate that at least 75% of the data are contained?

    iii. What percentage of the data are actually contained within 2 S.D. of the mean?

    iv. Compare the above results with those in part ii.

    e. From the frequency distribution developed in a:

    i. Form the cumulative frequency distribution.

    ii. Form the cumulative percentage distribution.

    iii. Plot the ogive.

    iv. Approximate the median, Q1, Q3, the midhinge and the interquartile range.

    MEASURES OF CENTRAL TENDENCY AND DISPERSION

    Types of Data Presentation

    Generally, data can be arranged in one of the following three ways.

    Series of individual observations: x1, x2, ..., xn

    Ungrouped frequency distribution: (xi, fi); i = 1, 2, ..., n
        xi : i-th observation in the series
        fi : frequency of the i-th observation in the series

    Grouped frequency distribution: (xi, fi); i = 1, 2, ..., k
        xi : midpoint of the i-th class
        fi : frequency of the i-th class

    Describing and Summarizing Data

    Three major properties which describe a batch of numerical data are:

    Central Tendency
    Dispersion
    Shape

    Summary measures computed from a sample of data are called statistics. Descriptive summary measures computed from an entire population are called parameters.

    Measure of central tendency/Location

    Most batches of data show a distinct tendency to group or cluster about a certain central value. Hence, generally it becomes possible to select some typical value called average, to describe the entire batch. Such a typical value is measure of central tendency or location. Different measures of central tendency are

    Arithmetic Mean
    Median
    Mode
    Midrange
    Midhinge

    Arithmetic Mean

    It is obtained by adding the raw scores and dividing the sum by the number of items.

    Properties

    It is based on each and every observation in the series.
    It is capable of further mathematical treatment.
    It gives a distorted representation of the data under study if the data contain outliers, i.e. it is greatly affected by extreme observations.

    To find the mean of raw data

    Suppose the raw scores are x1, x2, x3, ..., xN; then the mean is

    $$M = \frac{\sum x}{N}$$

    where M = mean

    x = each score or item

    N = number of items

    $\Sigma$ = sigma, which means 'summation of'

    Example: Find the mean of 6, 10, 4, 12, 8.

    $$M = \frac{6 + 10 + 4 + 12 + 8}{5} = \frac{40}{5} = 8$$
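A minimal Python check of this example, computing the mean both directly (sum of the scores divided by the number of scores) and with the standard library `statistics.mean`:

```python
from statistics import mean

scores = [6, 10, 4, 12, 8]

# Sum of the scores divided by the number of scores.
m_by_hand = sum(scores) / len(scores)

# The standard library function gives the same value.
m_library = mean(scores)

print(m_by_hand, m_library)
```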

    To find mean for grouped data

    $$M = \frac{\sum f x}{\sum f}$$

    where x is the mid-point of the class interval, M is the mean and f is the frequency.
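For grouped data, each class mid-point x is weighted by its frequency f, so the mean is the sum of f times x divided by the sum of f. The sketch below applies this to the marks of 40 students used in the cumulative frequency section (it is not one of the worked examples of this section, whose tables are not reproduced here); the variable names are illustrative.

```python
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 5, 12, 11, 8]

# x: mid-point of each class interval.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# M = (sum of f * x) / (sum of f)
total_fx = sum(f * x for f, x in zip(frequencies, midpoints))
total_f = sum(frequencies)
grouped_mean = total_fx / total_f

print(total_fx, total_f, grouped_mean)
```

Because every observation in a class is represented by the class mid-point, the result is an approximation to the mean of the underlying raw data.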

    Example: Find the mean for the following table by the 'Direct Method'

    Example: Calculate the mean marks in the distribution given below.


    = 29.75

    Median

    Median is defined as the middle value in an ordered sequence of data. It is not affected by the magnitude of extreme observations but is affected by the number of observations.

    Example: Find the median of 83, 37, 70, 29, 45, 63, 41, 70, 30, 54.

    The data in order are 29, 30, 37, 41, 45, 54, 63, 70, 70, 83. Here n = 10 (even), so the median is the average of the two middle-most scores, 45 and 54.

    Median = (45 + 54) / 2 = 49.5

    Example: Find the median of 15, 8, 14, 20, 13, 12, 16.

    The series in order is 8, 12, 13, 14, 15, 16, 20. Here n = 7 (odd), so the median is the middle-most (4th) score.

    Median = 14
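The even and odd cases can be sketched in Python as follows. The helper `median_by_hand` is an illustrative name; the standard library `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_hand(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of two


even_case = [83, 37, 70, 29, 45, 63, 41, 70, 30, 54]
odd_case = [15, 8, 14, 20, 13, 12, 16]

print(median_by_hand(even_case), median(even_case))
print(median_by_hand(odd_case), median(odd_case))
```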

    Mode

    Mode is defined as the value in a batch of data which occurs most frequently. It is not affected by extreme observations. It is used only for descriptive purposes because it varies more from sample to sample than other measures of central tendency.

    Example: Find the mode of 43, 42, 44, 40, 48, 45, 40, 40.

    The given series in order is 40, 40, 40, 42, 43, 44, 45, 48. Since 40 is the most repeated score, Mode = 40.

    Midrange

    It is defined as the average of the two extremes of the data. Let xmax and xmin be the two extremes of the data; then the midrange is defined as

    Midrange = (xmax + xmin) / 2

    The main drawback of the midrange is that it becomes distorted as a summary measure of central tendency if an outlier is present.
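A short Python sketch of the mode and the midrange, using the scores from the mode example. The extra value 400 appended at the end is a hypothetical outlier, added only to show how strongly the midrange reacts to it while the mode does not.

```python
from statistics import mode

scores = [43, 42, 44, 40, 48, 45, 40, 40]

most_frequent = mode(scores)
midrange = (max(scores) + min(scores)) / 2

# Hypothetical outlier: the midrange shifts sharply, the mode does not.
with_outlier = scores + [400]
mode_with_outlier = mode(with_outlier)
midrange_with_outlier = (max(with_outlier) + min(with_outlier)) / 2

print(most_frequent, midrange)
print(mode_with_outlier, midrange_with_outlier)
```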

    Measures of Dispersion

    A measure of location alone cannot reveal all the characteristics of the data under study. For example, two series with the same measure of central tendency may have different patterns of variation, so comparing them by their averages alone would be misleading. A measure that captures this variation is called a measure of dispersion. The following measures of dispersion are the most frequently used:

    Range
    Variance
    Standard Deviation
    Coefficient of Variation

    Range

    It is a crude measure of dispersion. It measures the total spread in the batch of data. It is given by

    Range = xmax - xmin

    It fails to take into account how the data are distributed between the smallest and the largest values.

    Variance

    It is based on each and every observation in the series. It is defined as the mean of the squared deviations of the observations about the mean.

    Standard Deviation

    It is the most commonly used measure of dispersion. It is defined as the positive square root of the variance. Variance and standard deviation reflect how the data vary. They measure the average scatter around the mean, that is, they evaluate how the values fluctuate about the mean. Standard deviation is calculated using the following formulae.

    For an individual series,

    $$S.D. = \sqrt{\frac{\sum (x - M)^2}{N}}$$

    For a frequency distribution,

    $$S.D. = \sqrt{\frac{\sum f (x - M)^2}{\sum f}}$$

    The square of the standard deviation is known as the variance.

    Coefficient of Variation

    It is a relative measure of dispersion. It is particularly used when comparing the variability of two or more batches of data that are expressed in different units of measurement. C.V. is also used in a situation where we want to compare two or more sets of data which are measured in the same units but differ to such an extent that the direct comparison of the respective standard deviation is not very useful.

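    $$\text{Coefficient of variation (C.V.)} = \frac{S.D.}{A.M.} \times 100\%$$

    where S.D. is the standard deviation and A.M. is the arithmetic mean.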
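To illustrate why a relative measure is useful, the sketch below compares two batches recorded in different units, using the usual definition of the coefficient of variation as the standard deviation divided by the mean, times 100. The heights and weights are hypothetical values, and the population standard deviation (`statistics.pstdev`, divisor N) is an assumption made for this illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean (population form)."""
    return pstdev(values) / mean(values) * 100


# Hypothetical data in different units (illustrative values only).
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"C.V. of heights: {cv_heights:.2f}%")
print(f"C.V. of weights: {cv_weights:.2f}%")
```

The standard deviations themselves (in cm and kg) cannot be compared directly, but the two percentages can: in this made-up data the weights are relatively more variable than the heights.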

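    Example: Calculate the standard deviation and the variance for the following data: 7, 8, 11, 6, 13, 8, 10.

    Answer

    Here N = 7 and M = 63 / 7 = 9, so the squared deviations (x - M)^2 are 4, 1, 4, 9, 16, 1, 1, which sum to 36.

    $$\text{Variance} = \frac{\sum (x - M)^2}{N} = \frac{36}{7} = 5.14$$

    $$S.D. = \sqrt{\frac{36}{7}} = 2.27$$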
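The worked example can be checked with a few lines of Python. The sketch follows the definition step by step (divisor N) and then cross-checks against `statistics.pvariance` and `statistics.pstdev`, which also divide by N.

```python
from math import sqrt
from statistics import pstdev, pvariance

data = [7, 8, 11, 6, 13, 8, 10]

n = len(data)
m = sum(data) / n                                  # mean
squared_deviations = [(x - m) ** 2 for x in data]  # (x - M)^2 for each x
variance = sum(squared_deviations) / n             # mean squared deviation
sd = sqrt(variance)                                # positive square root

print(m, sum(squared_deviations))
print(round(variance, 2), round(sd, 2))
print(round(pvariance(data), 2), round(pstdev(data), 2))
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 instead (the sample versions), so they would give slightly larger values for the same data.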

    Shape

    For a symmetric distribution, mean = median = mode

    For a right skewed (positively skewed) distribution, the mean is affected by extremely large observations. In this case, mode < median < mean < midrange

    For a left skewed (negatively skewed) distribution, midrange < mean < median < mode

    Quartiles

    These are the partition values. A quartile is a useful measure of non-central location. It is often employed when one wants to summarize or describe the properties of large batches of quantitative data. There are three quartiles: Q1, Q2 and Q3.

    Midhinge

    The midhinge is the mean of the first and third quartiles in a batch of data. It is used to overcome potential problems introduced by extreme values in the data. It is a measure of central tendency.

    Interquartile Range

    It is a measure of dispersion which measures the spread of the middle 50% of the observations. Hence, it is not affected by extreme observations.

    For a symmetric distribution, median = midhinge = midrange = mean = mode

    For a positively skewed distribution, mode < median < midhinge < mean < midrange

    For a negatively skewed distribution, midrange < mean < midhinge < median < mode

    The Five Number Summary

    The median, midhinge and interquartile range are called resistant statistics because they are relatively insensitive to extreme values. In order to get a better idea about the shape of the distribution, we use the five number summary. These five numbers are Xmin, Q1, Q2, Q3 and Xmax.
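As a sketch of how these quantities fit together, the following Python code computes the quartiles, midhinge, interquartile range and five number summary for the ten scores used in the median example. Quartile conventions differ between textbooks and software; this sketch assumes the default "exclusive" method of `statistics.quantiles`, so the values of Q1 and Q3 may differ slightly from those obtained with another rule.

```python
from statistics import quantiles

data = sorted([83, 37, 70, 29, 45, 63, 41, 70, 30, 54])

# Three cut points dividing the ordered data into four parts
# (assumption: Python's default method="exclusive").
q1, q2, q3 = quantiles(data, n=4)

interquartile_range = q3 - q1      # spread of the middle 50% of the data
midhinge = (q1 + q3) / 2           # mean of the first and third quartiles
five_number_summary = (min(data), q1, q2, q3, max(data))

print(five_number_summary)
print(interquartile_range, midhinge)
```

Q2 coincides with the median (49.5) found earlier for the same data.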

    Exercise

    1. In a class of 50 students, 10 have failed and their average of marks is 2.5. The total marks secured by the entire class were 281. Find the average marks of the students who have passed.

    2. What will be the mean and the median of 7 consecutive integers, the least of which is x?

    3. The mean and median of 51 items are 100 and 95 respectively. At the time of calculations, two items 180 and 90 were wrongly taken as 100 and 10. What are the correct values of the mean and median?

    4. The mean of a group of 10 observations is 15. Fifteen more observations are added to this group and the mean of these 25 observations is found to be 12. Find the mean of the additional 15 observations.

    5. The mean of a group of 20 items is 30. Find the mean if each value is doubled and increased by 5.

    6. Calculate the population variance from the following information: n = 15, $\sum x = 480$, $\sum x^2 = 15735$.

    7. Means and variances of two series are given below:

                Mean   Variance
    Series A    54     9
    Series B    100    4

    Which series is more stable?

    8. Two samples of size 40 and 45 respectively have the same mean 53, but different standard deviations 19 and 8. Find the standard deviation of the combined group.

    9. Find population variance of observations 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Compare its variance with population variance of 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20.

    10. The mean and the standard deviation of population of 100 items were found to be 50 and 5 respectively. If at the time of calculations, two items were wrongly taken as 40 and 50 instead of 60 and 30, find the correct standard deviation.

    ----------

    PROBABILITY

    Counting Principles

    Addition

    If two different operations can be performed in m and n different ways respectively, then the number of ways in which either operation 1 or operation 2 can be performed is (m + n).

    Multiplication

    If two different operations can be performed in m and n different ways respectively, then the number of ways in which both operation 1 and operation 2 can be performed is (m x n).

    Permutations

    A permutation is an arrangement of n things in which the order of arrangement is important. Broadly speaking, any problem on permutations falls into one of two cases.

    Case I

    The number of arrangements of n distinct things taken r at a time is denoted by nPr.

    Examples:

    1) The digits 2 and 3 can form the numbers 32 and 23. Although both numbers consist of the digits 2 and 3, the order of the digits is different. Each of these arrangements is called a 'permutation'. Thus, the number of arrangements or permutations of the two distinct digits 2 and 3 is 2.

    2) The permutations of the three letters a, b, c taken two at a time are ab, ba, ac, ca, bc, cb.

    The number of permutations of n dissimilar things taken r at a time without repetition is denoted by nPr and is given by

    nPr = n! / (n - r)!

    The number of permutations of n different things taken r at a time is the same as the number of ways of filling r positions, arranged in a straight line, with n letters. Each position accommodates only one letter. We may fill the first position with any one of the n letters. Having filled the first position in any one of these n ways, we have (n-1) letters with which to fill the next position. Having filled the first two positions, we have (n-2) letters with which to fill the third position. Proceeding in this way, one can see that filling r positions is like performing r different operations in n, (n-1), (n-2), ..., (n-r+1) different ways respectively. Since we have to fill all r positions, we need to multiply the respective numbers of ways. Therefore, the total number of ways in which r positions can be filled with n letters without repetition is n(n-1)(n-2)(n-3)...(n-r+1). Thus, the number of r-permutations of n different things, denoted by nPr = P(n, r), is given by

    nPr = n(n-1)(n-2)(n-3)...(n-r+1)

    If we put r = n in the above formula, then

    nPn = n(n-1)(n-2)...3.2.1 = n!

    Since nPr can also be written as n! / (n-r)!, putting r = n gives n! / 0!, so we take 0! = 1.
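The position-filling argument can be checked with a small Python sketch. It lists the permutations of a, b, c taken two at a time and compares the product n(n-1)...(n-r+1) with `math.perm` and with n!/(n-r)!. The helper name `falling_product` is illustrative only.

```python
from itertools import permutations
from math import factorial, perm


def falling_product(n, r):
    """n(n-1)(n-2)...(n-r+1): one factor for each of the r positions."""
    result = 1
    for k in range(r):
        result *= n - k     # choices left for position k + 1
    return result


letters = ["a", "b", "c"]
arrangements = ["".join(p) for p in permutations(letters, 2)]
print(arrangements, len(arrangements))

n, r = 3, 2
print(falling_product(n, r), perm(n, r), factorial(n) // factorial(n - r))

# Putting r = n gives n! (an empty product of remaining factors, i.e. 0! = 1).
print(falling_product(4, 4), factorial(4), factorial(0))
```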

    Properties

    Case II Circular Permutations When things are arranged in places along a line with first and last place, they form a linear permutation. So far we have dealt only with linear permutations. When things are arranged in places along a closed curve or a circle, in which any place may be regarded as the first or last place, they form a circular permutation. Thus, the number of permutations of 4 objects in a row = 4!, where as the number of circular permutations of 4 objects is (4-1)! = 3!. The permutation in a row or along a line has a beginning and an end, but there is nothing like beginning or end or first and last in a circular permutation. In circular permutations, we consider one of the objects as fixed and the remaining objects are arranged as in linear permutation. The following arrangements of 4 objects O1, O2, O3, O4 in a circle will be considered as one or same arrangement


    Observe carefully that when arranged in a row, O1O2O3O4, O2O3O4O1, O3O4O1O2 and O4O1O2O3 are different permutations. When arranged in a circle, these 4 permutations are considered as one permutation.

    Theorem: The number of circular permutations of n different objects is (n-1)!.

    Proof: Each circular permutation corresponds to n linear permutations, depending on where we start.

    Since there are exactly n! linear permutations, there are exactly n!/n circular permutations. Hence, the number of circular permutations is (n-1)!.

    Example

    Suppose n guests are to be seated around a circular table. We fix the position of one of the guests (which can be done in only one way) and then arrange the remaining (n-1) guests in (n-1) positions, just as in the linear case. Thus, the total number of ways in which n guests can be arranged in a circular manner is (n-1)!.
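    The count (n-1)! can be confirmed by brute force for small n. The sketch below is an illustration only; the function name circular_arrangements is hypothetical. It rotates every linear permutation so that one chosen object comes first, which mirrors the idea of fixing one guest, and then counts the distinct results. Clockwise and anticlockwise orders are treated as different, as in the text.

```python
import math
from itertools import permutations


def circular_arrangements(objects):
    """Distinct arrangements around a circle, treating rotations as the same."""
    seen = set()
    for p in permutations(objects):
        # Rotate each linear permutation so that the first object of the
        # input comes first; all n rotations then map to the same tuple.
        i = p.index(objects[0])
        seen.add(p[i:] + p[:i])
    return seen


guests = ("O1", "O2", "O3", "O4")
arrangements = circular_arrangements(guests)

print(len(arrangements))                  # 6
print(math.factorial(len(guests) - 1))    # 6, i.e. (4-1)!
```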

    Combinations

    A selection of some or all of a number of things, without regard to order, is called a combination. The number of ways of selecting r things out of n dissimilar things is denoted by C(n, r) or nCr, and C(n, r) = n!/(r!(n-r)!).

    Example: From a class of 32 students, 4 are to be chosen for a competition. In how many ways can this be done? We are to select 4 students from 32. This selection can be done in C(32, 4) = 32!/(4! 28!) = 35,960 ways.

    Note that there is a relationship between permutations and combinations: nPr = r! × nCr, since each selection of r things can be arranged in r! ways. For a given set of n dissimilar things, the number of permutations is therefore never less than the corresponding number of combinations.
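    As a quick check of the example of choosing 4 students from 32, the following Python sketch uses the standard library functions math.comb and math.perm. It also verifies the relation nPr = r! × nCr, which is stated here as a known identity rather than taken from the module.

```python
import math

n, r = 32, 4

ways_to_choose = math.comb(n, r)    # selections only, order ignored
ways_to_arrange = math.perm(n, r)   # selections followed by an ordering

print(ways_to_choose)    # 35960
print(ways_to_arrange)   # 863040

# Each selection of r things can be ordered in r! ways, so nPr = r! * nCr
print(ways_to_arrange == math.factorial(r) * ways_to_choose)   # True

# For r = 0 and r = 1 the two counts coincide
print(math.perm(n, 0) == math.comb(n, 0))   # True
print(math.perm(n, 1) == math.comb(n, 1))   # True
```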


    Properties

    C(n,0) = C(n,n) = 1

    Difference between a Permutation and a Combination

    In a combination, only a selection is made. In a permutation, a selection is made and the selected things are also arranged in a definite order.

    Order does not matter in a combination. In a permutation, order is essential.

    Usually (i.e., except in special or trivial cases such as r = 0 or r = 1), the number of permutations exceeds the number of combinations.


    Exercise

    1. A gentleman has 6 friends to invite. In how many ways can he send invitation cards to them if he has 3 servants to carry the cards?

    2. How many numbers, each lying between 100 and 1000, can be formed with digits 2, 3, 4, 0, 8, 9 (if repetitions of digits are not allowed)?

    3. How many three digit numbers divisible by 5 can be formed using any numerals from 0 to 9 without repetition?

    4. There are 10 points in a plane, of which 3 are collinear. Find the number of triangles formed by joining these points.

    5. From 7 engineers and 4 doctors a committee of 5 members is to be formed. In how many ways can this be done
        i. To include exactly one doctor?
        ii. To include at least one doctor?

    6. There are 2 books each of 3 volumes and 2 books each of 2 volumes. In how many ways can these be arranged on a shelf so that the volumes of the same book remain together?

    7. A company has 11 computer engineers and 7 mechanical engineers. In how many ways can they be seated in a row so that no 2 of the mechanical engineers may sit together?

    8. A company has 11 computer engineers and 7 mechanical engineers. In how many ways can they be seated in a row so that all the mechanical engineers do not sit together?

    9. How many words can be formed using the letters of the word MATHEMATICS if
        i. there is no restriction
        ii. all the vowels are together
        iii. vowels are together and consonants are together

    10. A person has 12 friends and wants to invite 8 of them to a birthday party. Find the number of ways in which
        i. 3 particular friends always attend the party
        ii. 3 particular friends never attend the party

    --------


    Probability

    In our day-to-day life, we come across many uncertain events. We wake up in the morning and check the weather report. The statement could be 'there is a 60% chance of rain today'. This statement implies that rain is more likely than dry weather. We decide upon our breakfast from a statement that 'corn flakes might reduce cholesterol'. What is the chance of getting a flat tyre on the way to an important appointment? And so on. How probable is an event? We generally infer this by repeatedly observing such events and their long-term patterns. Probability is the branch of mathematics devoted to the study of such events.

    People have always been interested in games of chance and gambling. Games such as dice are known to have existed since 3000 BC, but such games were not treated mathematically until the fifteenth century. During this period, the calculation and theory of probability originated in Italy. Later, in the seventeenth century, the French mathematicians Pascal and Fermat contributed to this field of study. The foundation of modern probability theory is credited to the Russian mathematician Kolmogorov, who proposed the axioms on which the present subject of probability is based.

    Random Experiment and Sample Space

    An experiment repeated under essentially homogeneous and similar conditions results in an outcome that is either unique or one of several possible outcomes. When the result is unique, the experiment is called a 'deterministic' experiment. Example: While measuring the inner radius of an open tube using slide calipers, we get the same result each time we repeat the experiment. Many scientific and engineering experiments are deterministic. If the outcome is one of several possible outcomes, the experiment is called a 'random' or 'nondeterministic' experiment. In other words, any experiment whose outcome cannot be predicted in advance, but is one of a set of possible outcomes, is called a random experiment. If we think of an experiment as being performed repeatedly, then each repetition is called a trial. We observe an outcome for each trial.

    Example: An experiment consists of 'tossing a die and observing the number on the uppermost face'. In such cases, we talk of chance or probability, which numerically measures the degree of chance of the occurrence of events.

    Sample Space (S)

    The set of all possible outcomes of a random experiment is called the sample space associated with the random experiment.


    Note: Each element of S denotes a possible outcome and is known as a sample point. Any trial results in an outcome that corresponds to one and only one element of the set S, e.g.,
    1. In the experiment of tossing a coin, S = {H, T}
    2. In the experiment of tossing two coins simultaneously, S = {HH, HT, TH, TT}
    3. In the experiment of throwing a pair of dice, S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), ..., (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
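    These three sample spaces can also be generated mechanically. The sketch below is an illustration in Python, with variable names chosen here for convenience; it builds each sample space with itertools.product, which lists every ordered combination of the individual outcomes.

```python
from itertools import product

coin = ("H", "T")
die = (1, 2, 3, 4, 5, 6)

one_coin = set(coin)
two_coins = {"".join(outcome) for outcome in product(coin, repeat=2)}
two_dice = set(product(die, repeat=2))

print(sorted(one_coin))     # ['H', 'T']
print(sorted(two_coins))    # ['HH', 'HT', 'TH', 'TT']
print(len(two_dice))        # 36 sample points, from (1, 1) to (6, 6)
print((6, 1) in two_dice)   # True
```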

    Events

    An event is an outcome or a combination of outcomes of an experiment. In other words, an event is a subset of the sample space.

    Consider a random experiment of rolling a six-faced die. The sample space of this experiment is S = {1,2,3,4,5,6}.

    Let A be the event that the number on the uppermost face is odd. Then the corresponding set of favourable outcomes is {1,3,5}, i.e. A = {1,3,5}.

    Let B be the event that the number on the uppermost face is even. Then B = {2,4,6}.

    Let C be the event that the number on the uppermost face is above 7. This set is a null set or an empty set because there is no favourable outcome. Thus, C = ∅.

    Let D be the event that the number on the uppermost face is an integer between 1 and 6, both inclusive. Then D = {1,2,3,4,5,6} = S.

    Let E be the event that the outcome is less than 2. Then E = {1}.
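    Since an event is a subset of the sample space, the events A to E of the die example can be written directly as Python sets. This is a minimal sketch for illustration; each set comprehension simply restates the verbal description of the event.

```python
# Sample space for rolling a six-faced die
S = {1, 2, 3, 4, 5, 6}

# Each event is the subset of outcomes that satisfy its description
A = {x for x in S if x % 2 == 1}    # odd number
B = {x for x in S if x % 2 == 0}    # even number
C = {x for x in S if x > 7}         # above 7: no favourable outcome
D = {x for x in S if 1 <= x <= 6}   # integer between 1 and 6, both inclusive
E = {x for x in S if x < 2}         # less than 2

print(sorted(A), sorted(B))   # [1, 3, 5] [2, 4, 6]
print(C == set())             # True, C is the empty set
print(D == S)                 # True
print(sorted(E))              # [1]
print(all(event <= S for event in (A, B, C, D, E)))   # True, every event is a subset of S
```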

    Types of Events

    As we have different types of sets, we have different types of events. We illustrate the different types of events using the above example.

    Simple Event

    If an event has exactly one element of the sample space, it is called a simple or elementary event. In the above example, E = {1} is a simple event.

    Compound Event

    If an event has more than one sample point, the event is called a compound event. In the above example, A = {1,3,5} is a compound event.


    Null Event (∅)

    As the null set is a subset of S, it is also an event, called the null event or impossible event. In the above example, C is a null event.

    Sure Event

    In the above experiment, the sample space is S = {1, 2, 3, 4, 5, 6}. The event represented by D occurs whenever the experiment is performed. Therefore, the event D is called a sure event or certain event.

    Complement of an Event

    The complement of an event A with respect to S is the set of all the elements of S which are not in A. The complement of A is denoted by A' or A^c.

    Note: In an experiment if A has not occurred then A' has occurred.
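    The types of events described above can be told apart by the size of the corresponding subset. The following Python sketch is an illustration only; the function names classify and complement are hypothetical. It labels an event of the die experiment as null, sure, simple or compound, and computes a complement as a set difference. When an event is the whole sample space it is reported as sure, even if the sample space happens to contain a single point.

```python
S = frozenset({1, 2, 3, 4, 5, 6})


def classify(event, sample_space=S):
    """Name the type of an event, following the definitions above."""
    event = frozenset(event)
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    if not event:
        return "null"
    if event == sample_space:
        return "sure"
    return "simple" if len(event) == 1 else "compound"


def complement(event, sample_space=S):
    """Outcomes of the sample space that are not in the event."""
    return sample_space - frozenset(event)


print(classify({1}))                    # simple
print(classify({1, 3, 5}))              # compound
print(classify(set()))                  # null
print(classify({1, 2, 3, 4, 5, 6}))     # sure
print(sorted(complement({1, 3, 5})))    # [2, 4, 6]
```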

    Algebra of Events

    In a random experiment, considering S (the sample space) as the universal set, let A, B and C be events of S. We can define the union, intersection and complement of events and their properties on S, which are similar to those in set theory.

    ii) A-B is an event, which is the same as "A but not B"

    vii)

    Union of two events

    If A and B are two events defined on the sample space S, then A or B, written (A ∪ B), denotes the event of the occurrence of at least one of the events A or B.

    Intersection of two events

    The intersection of two events A and B is the joint occurrence of these two events. It is denoted by (A ∩ B).


    Mutually Exclusive Events

    Two events associated with a random experiment are said to be mutually exclusive if both cannot occur together in the same trial, or in other words, if the occurrence of one prevents the occurrence of the other. In the above experiment, the events A = {1,3,5} and B = {2,4,6} are mutually exclusive. Symbolically, (A ∩ B) = ∅, where (A ∩ B) is the event that both A and B occur.

    Events E1, E2, ..., En associated with a random experiment are said to be pair-wise mutually exclusive if Ei ∩ Ej = ∅ for all i ≠ j.

    Exhaustive Events

    For a random experiment, let E1, E2, E3, ..., En be subsets of the sample space S. E1, E2, E3, ..., En form a set of exhaustive events if E1 ∪ E2 ∪ E3 ∪ ... ∪ En = S.

    Independent Events

    Events are said to be independent if the occurrence of one event does not affect the occurrence of the others. Let A and B be two events defined on the sample space S. Events A and B are said to be independent if P(A ∩ B) = P(A) · P(B).

    Note: If A and B are independent, then
    i) A^c and B^c are independent
    ii) A^c and B are independent
    iii) A and B^c are independent
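    Independence can be tested with the usual product criterion P(A ∩ B) = P(A) · P(B). The sketch below is a hedged illustration for the die experiment with equally likely outcomes; the events A and B and the helper names prob and independent are assumptions made here, not taken from the module. Exact fractions are used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die, all outcomes equally likely


def prob(event):
    """Probability of an event as favourable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))


def independent(a, b):
    """Product criterion: P(A and B) equals P(A) times P(B)."""
    return prob(a & b) == prob(a) * prob(b)


A = {2, 4, 6}   # even number (illustrative event)
B = {1, 2}      # number at most 2 (illustrative event)
A_c, B_c = S - A, S - B

print(independent(A, B))       # True: 1/6 == 1/2 * 1/3
print(independent(A_c, B_c))   # True
print(independent(A_c, B))     # True
print(independent(A, B_c))     # True

# Mutually exclusive events with non-zero probabilities are not independent
print(independent({1, 3, 5}, {2, 4, 6}))   # False
```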

    Partition of the sample space

    A set of events E1, E2, E3, ..., En on S is said to form a partition of the sample space S if the events are collectively exhaustive and mutually exclusive, i.e. if Ei ∩ Ej = ∅ for all i ≠ j and E1 ∪ E2 ∪ E3 ∪ ... ∪ En = S.
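    The conditions for mutually exclusive events, exhaustive events and a partition translate directly into set operations. The following Python sketch is illustrative; the function names are hypothetical and the example events are chosen here for the die experiment.

```python
from itertools import combinations


def mutually_exclusive(events):
    """True if no two of the events share an outcome."""
    return all(not (a & b) for a, b in combinations(events, 2))


def exhaustive(events, sample_space):
    """True if the union of the events is the whole sample space."""
    return set().union(*events) == sample_space


def is_partition(events, sample_space):
    """A partition is both mutually exclusive and exhaustive."""
    return mutually_exclusive(events) and exhaustive(events, sample_space)


S = {1, 2, 3, 4, 5, 6}

print(is_partition([{1, 3, 5}, {2, 4, 6}], S))      # True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], S))   # False: exhaustive, but 3 is shared
print(is_partition([{1}, {2}], S))                  # False: exclusive, but not exhaustive
```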


    Equally Likely Outcomes

    The outcomes of a random experiment are said to be equally likely if each one of them has an equal chance of occurrence.

    Example: The outcomes of tossing an unbiased coin are equally likely.

    Probability of an Event

    So far, we have introduced the sample space of an experiment and used it to describe events. In this section, we introduce the probabilities associated with events. Let S be the sample space associated with a random experiment. Further, let S be finite with equally likely outcomes, i.e. let there be a finite number n of sample points in S, each of them equally likely. Let A be an event defined on S. Then the probability of occurrence of the event A is denoted by P(A) and is given by

    P(A) = m/n

    where m is the number of outcomes favourable for the occurrence of the event A.

    Note 1: 0 ≤ P(A) ≤ 1, as 0 ≤ m ≤ n.

    Note 2: If P(A) = 0, then A is called a null event or impossible event.

    Note 3: If P(A) = 1, then A is called a sure event.

    Note 4: If m is the number of cases favourable to A, then n-m is the number of cases favourable to the "non-occurrence of A".
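    For a finite sample space with equally likely outcomes, the probability of an event is the ratio P(A) = m/n of favourable outcomes to total outcomes. The sketch below is a minimal Python illustration; the function name probability and the two-dice event are assumptions introduced here. It uses exact fractions and also checks the four notes above on the die example.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Classical probability m/n for equally likely outcomes."""
    m = len(event & sample_space)   # favourable outcomes
    n = len(sample_space)           # all possible outcomes
    return Fraction(m, n)


S = {1, 2, 3, 4, 5, 6}
A = {1, 3, 5}

print(probability(A, S))        # 1/2
print(probability(set(), S))    # 0, null event
print(probability(S, S))        # 1, sure event
print(probability(S - A, S))    # 1/2, from the n - m cases favourable to "not A"

# Illustrative event on a pair of dice: the two numbers add up to 7
two_dice = set(product(range(1, 7), repeat=2))
sum_is_7 = {outcome for outcome in two_dice if sum(outcome) == 7}
print(probability(sum_is_7, two_dice))   # 1/6
```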

    Axiomatic Approach to Probability

    The axiomatic approach to probability closely relates the theory of probability to set theory. Let S be the sample space of an experiment. Probability is a function which associates a non-negative real number, denoted by P(A), with every event A of the sample space, satisfying the following:

    For every event A in S, P(A) ≥ 0.

    P(S) = 1

    P(A^c) = 1 - P(A)

    P(∅) = 0

    If A1, A2, A3, ..., An are mutually exclusive events in S, then P(A1 ∪ A2 ∪ A3 ∪ ... ∪ An) = P(A1) + P(A2) + P(A3) + ... + P(An).


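    Addition Rule of Probability

    If A and B are any two events, then P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

    If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B).

    If A, B, C are any three events, then P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(A ∩ B) - P(B ∩ C) - P(A ∩ C) + P(A ∩ B ∩ C).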
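    The addition rule can be verified on the die experiment by counting outcomes. In its standard form, P(A ∪ B) = P(A) + P(B) - P(A ∩ B), and for three events the pairwise intersections are subtracted and the triple intersection is added back. The Python sketch below is an illustration; the events A, B and C are chosen here and are not part of the module.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # fair die, all outcomes equally likely


def P(event):
    """Classical probability of an event of the die experiment."""
    return Fraction(len(event & S), len(S))


A = {1, 3, 5}   # odd number
B = {1, 2, 3}   # number at most 3
C = {3, 6}      # multiple of 3

# Two events
two_lhs = P(A | B)
two_rhs = P(A) + P(B) - P(A & B)
print(two_lhs, two_rhs)       # 2/3 2/3

# Three events
three_lhs = P(A | B | C)
three_rhs = (P(A) + P(B) + P(C)
             - P(A & B) - P(B & C) - P(A & C)
             + P(A & B & C))
print(three_lhs, three_rhs)   # 5/6 5/6

# Mutually exclusive events: the intersection term vanishes
even = {2, 4, 6}
print(P(A | even) == P(A) + P(even))   # True
```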


    Exercise

    1. A sample of 500 respondents was selected in a large metropolitan area in order to determine various information concerning consumer behavior. Among the questions asked was "Do you enjoy shopping for clothing?" Of 240 males, 136 answered yes. Of 260 females, 224 answered yes. What is the probability that a respondent chosen at random
        i. Is a male?
        ii. Enjoys shopping for clothing?
        iii. Is a female?
        iv. Does not enjoy shopping for clothing?

    2. A five digit number is to be formed by digits 1,2,3,4 and 5 without repetition. What is the probability that the number is divisible by 4?

    3. What is the probability that a leap year will have 52 Tuesdays?

    4. Two friends A and B apply for two vacancies at the same post. The chances of their selection are 0.25 and 0.20 respectively. What is the chance that
        i. One of them will be selected?
        ii. Both will be selected?
        iii. None of them will be selected?

    5. Probability that a man will be alive 25 years hence is 0.3 and the probability that his wife will be alive 25 years hence is 0.4. Find the probability that 25 years hence
        i. Both will be alive?
        ii. Only the man will be alive?
        iii. Only the woman will be alive?
        iv. At least one of them will be alive?

    6. One bag contains 5 red and 7 black balls and the other 3 red and 12 black balls. A ball is drawn at random from either of the bags. What is the chance that the selected ball is black?


    7. According to a survey, the probability that a family owns two cars if their annual income is greater than Rs. 8 lakh is 0.75. Of the households surveyed, 60 per cent had income over Rs. 8 lakh and 52 per cent had two cars. What is the probability that a family has two cars and an income over Rs. 8 lakh a year?

    8. The chance that a person stopping at a petrol pump will get his vehicle's tyres checked is 0.12, the chance that he will get the oil checked is 0.29 and the chance that he will get both checked is 0.07.
        i. What is the chance that a person will have neither his tyres nor oil checked?
        ii. What is the probability that a person who has his oil checked will also have tyres checked?

    9. It is known that 15 per cent of the males and 10 per cent of the females in a town having equal number of them are unemployed. A person is selected at random from the town. What is the probability that
        i. A person is employed?
        ii. A person is male given that he is employed?

    10. A certain company encourages its employees to participate in cricket and hockey. A survey indicates that 40% play cricket, 50% play hockey and 25% play both cricket and hockey. Find the probability that
        i. An employee plays only hockey?
        ii. An employee plays only cricket?
        iii. An employee takes part in at least one of the games, cricket and hockey?

    Note:

    Four chapters together with four exercises have been given in the material for the purpose of self study. Make sure that you go through the entire material. Evaluation will be conducted on this part immediately after you join the course. Wish you all the best!