
Teaching introductory business statistics


GEORGE RAWLINGS, University of Central Lancashire, Preston, Lancashire, UK. E-mail: [email protected]

IAN MCGOWAN, University of Central Lancashire, Preston, Lancashire, UK. E-mail: [email protected]

This paper compares the traditional delivery of introductory courses in business statistics within a business and management faculty with an approach that teaches through information technology (IT). To achieve this objective, the paper examines how traditionally taught courses in business statistics are delivered and compares this with a teaching approach based on the use of IT.

KEYWORDS: Higher education; business/commerce; management; pedagogy; teaching methods.

INTRODUCTION

This paper focuses on the comparison of approaches to teaching introductory business statistics courses to undergraduates in business and management faculties. Teaching through IT means that the subject material delivered has to be reviewed and re-evaluated, and the traditional learning objectives questioned and modified. This leads to a course that contains relatively similar materials, but with considerable differences in the emphasis and delivery of the topics comprising the course.

The lessons learned and an indication of the potential for higher level courses and future developments will be discussed.

DISCUSSION

The traditional approach

Textbook approaches

Examination of a range of available and commonly used textbooks aimed at students taking introductory courses in business statistics shows a relatively common approach. This approach is characterized by the broad structure shown in Fig. 1.

Education and Information Technologies 2, 143–153 (1997)

1360–2357 © 1997 IFIP

The early chapters give an introduction to what statistics is and what the study of statistics entails, followed by a discussion of the basic principles of graphical presentation of sample data and an introduction to the basic ideas of descriptive statistics. The emphasis then changes to a discussion of the ideas of probability theory, leading to the ideas of a random variable and its probability distribution and associated descriptive measures. This typically leads to the derivation of the sampling distribution of the mean and associated sample parameters. Following this block of theory, the ideas of estimation and hypothesis testing are introduced. The later chapters then cover the analysis of variance and regression modelling, followed by a range of issues, typically time-series and forecasting, decision theory, non-parametric statistics and quality control.

A feature of most current textbooks is the development of ideas, and the use of illustrative examples, using output from a statistical package. However, the ideas are developed formally and the package is used to ‘substitute’ into the developed formulae, with a common format being formulae derivation and output presented together.

Figure 1. Common structure of business statistics texts. The boxes, in order, are: Introduction (what is statistics and what does its study entail); Sample description (graphical representation of sample data, basic descriptive statistics); Theory (probability theory, random variable/probability distributions, sampling distributions); Estimation theory and hypothesis testing; Regression and analysis of variance; Other issues, e.g. non-parametric statistics, time series and forecasting.


Traditional teaching approaches

Here the evidence is more anecdotal, based on experience as an advisor and external examiner and on discussion with colleagues at various institutions. Experience generally suggests that the textbook framework is followed, in the sense that a common teaching programme starts with ideas of data collected from samples and how to summarize and present basic descriptive and summary information. It then switches to probability theory, leading to a discussion of distribution theory, in particular the normal distribution, and then discusses sampling distributions and hypothesis testing. Regression modelling and other topics, such as forecasting and quality control, typically follow.

Critique of the traditional approach

Experience suggests that the major problem that business students have when studying a course in introductory statistics is comprehending the role of the theory as developed, based on the ideas of probability theory, and its use in the analysis of sample data. The better student can certainly follow the operational procedures developed, in terms of manipulating the theoretical structures; but questioning about subtle issues, such as ‘what is the role of the ideas of a population?’ and ‘population parameters relative to sample statistics’, generally reveals some confusion. This is not surprising given the mathematical background of a typical student in a business faculty: handling the ideas of probability and statistical distributions requires a reasonable level of mathematical insight and skill.

On completion of a traditional introductory course in business statistics, what will the student be able to do?

• Can they describe what various techniques are trying to do, rather than just derive formulae and substitute into these formulae?

• Could they handle the simple initial analysis of a data set derived, for example, from a survey, i.e. a common business activity?

In general, issues of data analysis are not explicitly developed, although the key ideas are inherent in the traditional approach. Examples of statistical procedures and calculations are often illustrated with small example sample data sets, where the focus is on the correct computational procedures. Consequently, the student frequently focuses on the details of the calculations. The interpretation of the information derived, if considered at all, is often relegated to a very cursory comment at the end of the process.

Teaching through IT

Objectives

The initial objective was to provide an introductory course in business statistics that would give the student a sufficient framework to make a reasonable attempt at analysing a set of sample data. The course should provide a framework for analysis that shows the need for, and the role of, a coherent underlying theory to enable the analysis. The formal development of the underlying theoretical structures can then be discussed within the context provided.

Structure of the course

The starting point is similar to the traditional approach: a discussion of the need for and use of statistics in business. This develops into a discussion of a particular group of data sets, all of which are available in a format suitable for investigation using a statistical analysis package.

Before a data set is provided to the student, the nature of the business problem under investigation is discussed, which is typically some market research type activity that is within the experience of the student group, with the objective of:

• Showing that the business problem could be investigated by collecting some information, establishing the need for a sample, and for it to be a fair and representative sample.

• Showing that one sample element will lead to the recording of a number of observations or measurements on that sample element, i.e. a number of different variables are recorded on each sample element.

• Showing that the variables have different roles and characteristics. At least one of the variables will directly relate to the objectives of the analysis, and is defined to be the response variable. The remaining variables are factors that may influence the response variable.

One scenario is to investigate the factors that affect the ‘amount spent’ by customers in a supermarket. One way to investigate this is to undertake a survey: the mechanics of collecting the data for this survey are discussed, and the problems associated with designing the data collection and with compiling an accurate computer-based data file are highlighted.

The response variable for this scenario, the variable directly related to the objective of the analysis, is the ‘amount spent’; the other variables observed/measured in this survey are the factors that may or may not influence the ‘amount spent’.

Having defined this framework for the purpose of analysis, each variable can be classified as being either a measured variable (discrete or continuous) or an attribute variable. The initial objective of the data analysis is to examine the interaction between the response variable and each individual factor to see if the individual factor influences the response variable. From this categorization four types of analysis, according to the combination of variable types, are possible; these are summarized in Fig. 2.
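The classification of variables into the four scenarios can be sketched in a few lines of Python. This is only an illustration and not part of the course materials: the `VarType` enum, the `scenario` helper and the survey variable names are all hypothetical.

```python
from enum import Enum


class VarType(Enum):
    MEASURED = "measured"    # discrete or continuous measurement
    ATTRIBUTE = "attribute"  # category label


def scenario(response: VarType, factor: VarType) -> str:
    """Name the data analysis scenario in the form 'response v factor'."""
    return f"{response.value} v {factor.value}"


# Hypothetical supermarket survey: variable name -> variable type.
survey = {
    "amount_spent": VarType.MEASURED,    # the response variable
    "gender": VarType.ATTRIBUTE,
    "age": VarType.MEASURED,
    "payment_method": VarType.ATTRIBUTE,
}

response_name = "amount_spent"

# Pair the response with each factor in turn to find the analysis required.
plan = {
    name: scenario(survey[response_name], vtype)
    for name, vtype in survey.items()
    if name != response_name
}
```

Each entry of `plan` names the scenario needed to examine one factor against the response, which matches the one-factor-at-a-time approach described above.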

Each of these four data analysis scenarios is discussed separately, but within a consistent framework of analytical methodology. The first stage provides an initial data analysis method that allows a judgement to be made on one of three possible outcomes:

• There is no evidence of a relationship between the response and the factor.

• There is strong evidence of a relationship between the response and the factor.

• The initial data analysis is inconclusive, hence further analysis is required, which will either point to there being no relationship or to there being a relationship.

Ultimately there are two final outcomes:

• There is ‘no evidence of a relationship’, in which case the investigation is complete; or

• there is ‘evidence of a relationship’, in which case the nature of the relationship needs to be described.

This data analysis framework is shown in Fig. 3.
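The flow from the three IDA outcomes to the two final outcomes can be written as a small function. This is a sketch under stated assumptions: the verdict labels "none", "strong" and "inconclusive" and the function name are hypothetical, and the judgement behind each verdict is left to the analyst.

```python
from typing import Optional


def final_outcome(ida_verdict: str, test_rejects_null: Optional[bool] = None) -> str:
    """Combine the IDA verdict and, where needed, the hypothesis test result.

    ida_verdict is one of "none", "strong" or "inconclusive" (hypothetical labels).
    test_rejects_null is consulted only when the IDA is inconclusive.
    """
    if ida_verdict == "none":
        return "no evidence of a relationship"
    if ida_verdict == "strong":
        return "evidence of a relationship"
    if ida_verdict == "inconclusive":
        if test_rejects_null is None:
            raise ValueError("an inconclusive IDA requires a hypothesis test")
        if test_rejects_null:
            return "evidence of a relationship"
        return "no evidence of a relationship"
    raise ValueError(f"unknown IDA verdict: {ida_verdict!r}")
```

When the result is "evidence of a relationship", the remaining step in the framework is to describe the nature of that relationship.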

The teaching approach

For each of the four combinations of variable types, as illustrated in Fig. 2, the teaching approach is common. Initially the population concepts are described in terms of having perfect information, a theoretical idea that in most practical situations is unattainable. Given perfect information, the idea of a statistical distribution for both attribute and measured variables can be discussed and developed. For measured variables the usual descriptive statistics are introduced, with the emphasis on exactly what property of the distribution they measure, namely centre, width and shape. The next step is to define what is meant by a relationship between the response variable and the factor, which is defined using the ideas of perfect information. The operational procedures for examining sample data are then developed using these ideas as the framework. The procedure is developed as a two-stage process: the initial data analysis (IDA), and the description of the appropriate hypothesis test. If the result of the analysis indicates a relationship, then methods of describing the relationship are discussed.

Figure 2. The four data analysis scenarios. A two-by-two grid with RESPONSE (measured, attribute) on one axis and FACTOR (attribute, measured) on the other; the cells are Measured v Attribute, Measured v Measured, Attribute v Attribute and Attribute v Measured.


A broad indication of the teaching approach for the four data analysis scenarios, as indicated in Fig. 2, is summarized below:

Measured response versus attribute factor. If there is no relationship, the statistical distribution of the response variable is exactly the same at each level of the attribute. There is a relationship if the centre (mean) of the distribution of the response depends on the level of the attribute, as summarized in Fig. 4. The operational procedure is developed via a statistical package, here MINITAB. The IDA for this situation is a combination of the two MINITAB commands DESCRIBE/BY and BOXPLOT/BY. The interpretation of the IDA leads to one of the three outcomes given in Fig. 3, namely:

• There is no evidence of a relationship between the response and the factor.

• There is strong evidence of a relationship between the response and the factor.

• The initial data analysis is inconclusive, requiring a statistical hypothesis test that will either point to there being no relationship or to there being a relationship.

The hypotheses come naturally out of the analytical framework, as aided by Fig. 4: no relationship is defined as $\mu_1 = \mu_2$, so the null hypothesis is $H_0: \mu_1 = \mu_2$. A relationship is defined as $\mu_1$ and $\mu_2$ being different, so the alternative hypothesis is $H_1: \mu_1 \neq \mu_2$.
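As an illustration of this scenario outside MINITAB, the sketch below summarizes a measured response by the levels of an attribute factor and computes a two-sample t statistic. The data are made up for illustration, the function names are hypothetical, and the unpooled (Welch) form of the statistic is an assumption, since the paper does not say which form of the two-sample test is used.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, median, stdev


def describe_by(values, levels):
    """Group the response values by factor level and summarize each group."""
    groups = defaultdict(list)
    for value, level in zip(values, levels):
        groups[level].append(value)
    return {
        level: {"n": len(g), "mean": mean(g), "median": median(g), "stdev": stdev(g)}
        for level, g in groups.items()
    }


def welch_t(summary_a, summary_b):
    """Two-sample t statistic without assuming equal variances (an assumption here)."""
    standard_error = sqrt(
        summary_a["stdev"] ** 2 / summary_a["n"]
        + summary_b["stdev"] ** 2 / summary_b["n"]
    )
    return (summary_a["mean"] - summary_b["mean"]) / standard_error


# Made-up illustrative data: amount spent, split by payment method.
spent = [12.0, 15.0, 11.0, 14.0, 30.0, 28.0, 35.0, 31.0]
method = ["cash", "cash", "cash", "cash", "card", "card", "card", "card"]

summary = describe_by(spent, method)
t_stat = welch_t(summary["card"], summary["cash"])
```

The per-level summaries play the role of the IDA; the t statistic would then be compared with the appropriate t distribution to decide between $H_0$ and $H_1$ when the IDA is inconclusive.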

Figure 3. The data analysis method for each data analysis scenario. Flowchart boxes: Initial Data Analysis; Hypothesis Test; outcomes No and Yes; Describe Relationship.


The hypotheses extend naturally if the attribute factor has three or more levels. The appropriate hypothesis test can then be developed. With MINITAB as the operational procedure, this involves the TWOTEST or ONEWAY command, depending on the number of levels in the factor. If a relationship is found, it is described in terms of the mean values of the response variable for each level of the factor.
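For a factor with three or more levels, the usual test is the one-way analysis of variance. The following is a minimal sketch of the F statistic, assuming the standard between-groups and within-groups decomposition; the function name and the data are made up for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per factor level."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the level means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations about their own level mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up illustrative data: a measured response at three levels of a factor.
levels = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_f(levels)
```

A large F relative to the F distribution with `df_b` and `df_w` degrees of freedom points to a relationship, which is then described by the mean response at each level.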

Attribute response versus measured factor. This data analysis scenario can be developed by analogy to the measured response versus attribute factor approach described above. For example, if the response variable is pass or fail a module and the factor is the proportion of lectures attended, then examining the statistical distribution of the proportion of lectures attended, split by those students that pass and those that fail the module, will lead to a definition of no relationship and a relationship. This approach is the same as for the measured response versus attribute factor, but with the roles of the response variable and the factor variable reversed. A more ambitious approach would be to describe the logit/log-linear modelling approaches. The most current version of MINITAB provides a logistic regression facility.

Figure 4. (a) Average response does not depend on factor level, and (b) average response depends on factor level. Panel labels: (a) $\mu_1 = \mu_2$; (b) $\mu_1$ and $\mu_2$ distinct.

Attribute response versus attribute factor. This data analysis scenario uses the same key idea of the distribution of the response variable split by the factor. There is no relationship if the distribution of the attribute response variable, when split by the attribute factor, is the same at every level of the factor; otherwise there is a relationship, and the nature of the relationship can be described. The operational procedure when dealing with sample data is that set out in Fig. 3. The IDA consists of the MINITAB command TABLE to obtain the frequency counts for the response when split by the factor, which may then be presented graphically for easier interpretation. The IDA will lead to one of three conclusions:

• There is no evidence of a relationship between the response and the factor.

• There is strong evidence of a relationship between the response and the factor.

• The initial data analysis is inconclusive; a hypothesis test is required that will either point to there being no relationship or to there being a relationship.

The hypothesis test is the standard chi-squared contingency table test for independence, which is operationally obtained as a subcommand to the MINITAB command TABLE.
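The chi-squared test for independence can be sketched directly from the frequency counts. The code below is an illustration only: the function name and the data (a yes/no response at two stores) are made up, and only the test statistic and its degrees of freedom are computed.

```python
from collections import Counter


def chi_squared(responses, factors):
    """Return (statistic, degrees of freedom) for the test of independence."""
    observed = Counter(zip(responses, factors))  # cell counts
    row_totals = Counter(responses)              # counts per response level
    col_totals = Counter(factors)                # counts per factor level
    n = len(responses)
    statistic = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            # Expected count if response and factor were independent.
            expected = r_total * c_total / n
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Made-up illustrative data: a yes/no response recorded at two stores.
responses = ["yes"] * 30 + ["no"] * 10 + ["yes"] * 10 + ["no"] * 30
factors = ["A"] * 40 + ["B"] * 40

statistic, dof = chi_squared(responses, factors)
```

The statistic is compared with the chi-squared distribution on `dof` degrees of freedom; the cell counts in `observed` correspond to the frequency table used for the IDA.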

Measured response versus measured factor. This is the last of the four data analysis scenarios. With perfect information (population information) a graph of the response versus the factor would show any relationship clearly.

The IDA consists of examining a MINITAB PLOT of the response versus the factor and the output of the MINITAB REGRESS command, using the $R^2$ value as an additional indicator, defined as the percentage of the variation in the response explained by the factor. Implicitly, this process takes the simple linear model as the hypothesized relationship. The IDA can have one of three outcomes:

• There is no evidence of a relationship between the response and the factor.

• There is strong evidence of a relationship between the response and the factor.

• The initial data analysis is inconclusive; a hypothesis test is required that will either point to there being no relationship or to there being a relationship.

The hypotheses come naturally from the framework of the analysis as in Fig. 5.

The hypothesis test is implemented through the analysis of variance provided by the MINITAB REGRESS command. If there is a relationship, the description consists of the regression line and the value of $R^2$.
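A minimal sketch of the quantities involved, assuming the ordinary least-squares fit of a simple linear model: the regression line, $R^2$, and the F statistic from the regression analysis of variance. The function name and the data are made up for illustration.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares line y = intercept + slope * x, with R^2 and the ANOVA F statistic."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * sxy                # variation explained by the line
    ss_residual = syy - ss_regression
    r_squared = ss_regression / syy
    f_stat = ss_regression / (ss_residual / (n - 2))
    return {"intercept": intercept, "slope": slope, "r_squared": r_squared, "f": f_stat}


# Made-up illustrative data: a measured factor and a measured response.
factor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

fit = simple_regression(factor, response)
```

Multiplying `fit["r_squared"]` by 100 gives the percentage form of $R^2$; the F statistic is compared with the F distribution on 1 and n − 2 degrees of freedom.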

Issues of the appropriateness of a linear model applied to a particular set of variables can be a relevant part of the final discussion.

EXPERIENCES

This type of approach has been piloted for the last three years on first-year undergraduate courses in business studies and business information technology, delivered under a modular scheme. Typically the business statistics component is about one-third of the module and consists of approximately 10 h of lectures, with an equivalent amount of time for seminar–practical support.

The student is provided with a data set that is used for lecture development and illustration, and a different data set for seminar work. The seminar activity is to repeat the analysis developed in the lecture, focusing on the interpretation. The operational procedures, using the appropriate IDA and hypothesis test, provide a large amount of information: the key information needs to be highlighted and the interpretation skills developed, bearing in mind that the handling of sample data means handling imperfect information. However, because there is a clear and consistent analytical methodology, as illustrated in Fig. 6, for each combination of variable types, Fig. 2, the student has a clear framework for analysis. The IDA, because it is processing sample information, can be interpreted to give one of the following outcomes:

• There is no evidence of a relationship between the response and the factor.

• There is strong evidence of a relationship between the response and the factor.

• The initial data analysis is inconclusive; a hypothesis test is required that will either point to there being no relationship or to there being a relationship.

If the IDA is inconclusive then the diagram, Fig. 6, helps the student in defining the hypotheses for the formal test. In all situations the NO branch specifies the null hypothesis, and the analysis framework provides a formal definition of what no relationship means. Conversely, the YES branch specifies the alternative hypothesis, with a clear definition of what yes means.

Figure 5. Data analysis scenario: Measured v Measured. Flowchart boxes: I.D.A. (PLOT, REGRESS); Hypothesis Test; outcomes No ($H_0: R^2 = 0$) and Yes ($H_1: R^2 > 0$).


The student’s response to this approach is generally positive: he/she is able to examine sample data and track a relationship between a response variable and a factor. In general terms, experience suggests that the student finds these procedures straightforward and relatively easy to handle. The student gains confidence in handling these procedures, and the general preconception of some students that business statistics is a very hard subject is belied. The second and final years of study require the student to select sets of modules for further study. The number of students choosing to continue in this area is perhaps the biggest indicator of the success of the approach.

Practical issues

To set up a programme as described requires a number of good illustrative data sets. These are not easy to obtain from conventional published sources. In general the statistical journals do not provide this type of data, and the market research areas tend to have confidentiality problems. The journals may suggest a suitable scenario, but a data set containing a suitable mixture of measured and attribute variables needs to be constructed. Constructing a suitable data set that contains a good mix of variables, with factors with no effects, strong effects and those requiring a hypothesis test, is a non-trivial and time-consuming task, particularly with the added constraint that the final results should be ‘consistent with experience’.
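One way such a data set might be constructed is by simulation with known effects built in. The sketch below is not the authors' procedure; the variable names, effect sizes and noise level are all hypothetical choices made for illustration, with one factor given a strong effect, one a weak effect and one no effect at all.

```python
import random


def simulate_survey(n=200, seed=1):
    """Simulate customers with one strong, one weak and one null factor (all assumed)."""
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    rows = []
    for _ in range(n):
        household_size = rng.randint(1, 5)        # strong effect on amount spent
        payment = rng.choice(["cash", "card"])    # weak effect, likely to need a test
        day = rng.choice(["weekday", "weekend"])  # no effect by construction
        amount = (
            10.0
            + 8.0 * household_size
            + (2.0 if payment == "card" else 0.0)
            + rng.gauss(0.0, 6.0)                 # random customer-to-customer variation
        )
        rows.append({
            "household_size": household_size,
            "payment": payment,
            "day": day,
            "amount_spent": round(max(amount, 0.0), 2),  # spending cannot be negative
        })
    return rows


survey_rows = simulate_survey()
```

Adjusting the coefficients and the noise level changes which factors show no effect, a clear effect, or an inconclusive IDA; checking that the results are ‘consistent with experience’ remains a manual judgement.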

CONCLUSIONS

The teaching approach outlined above has been used successfully for the last few years on first level business statistics courses. The material delivered is not substantially different from the material delivered on the traditional approach. The emphasis and value placed on certain concepts and ideas have changed. The formal introduction of probability theory for students at the introductory stage is seriously questioned, and is not formally included in the introductory course outlined here. The idea of a statistical distribution of a variable, for both an attribute variable and a measured variable, is described intuitively. For an attribute variable, the key idea is the description of the structure of the distribution in terms of frequency counts, and that there are no simple descriptive statistics for an attribute variable. For a measured variable, the key idea is the description of the structure of the distribution using the descriptive statistics that describe its centre, width and shape.

Figure 6. The data analysis method summary. Flowchart boxes: Initial Data Analysis; Hypothesis Test; outcomes No (Null Hypothesis) and Yes (Alternative Hypothesis).

The main advantage to the student is the provision of a consistent framework for data analysis. The need for a theoretical framework is clear, and the better students enquire about the more formal approaches. A first course of this type provides a framework for a more rigorous course at the next level, with the students receptive to the ideas and to the need to develop a more formal theoretical approach.

A major difficulty is obtaining data sets that are within a student’s experience and contain a sufficient number of observations and variables to be of value. In general the statistical journals do not provide this type of data. The market research areas tend to have confidentiality problems. The data sets used within these courses are simulated data sets. To construct a realistic data set to meet the requirements of this type of course is time consuming and difficult.

Further developments

The frameworks developed provide a good platform for the development of a much more rigorous course at second and third levels. The student can in general handle operational issues, and is receptive to the development of the underlying theory. The introductory course framework as outlined allows for the development of a range of statistical modelling ideas that arise naturally from the structure, e.g. log-linear models developed from the attribute versus attribute situation. The development of multifactor models follows naturally from this framework, including the handling of attribute factors in a regression context using dummy variables. The MINITAB facility GLM, for handling a general linear model, provides a neat culmination of this process.
