    Practitioner Viewpoint

    We often use regression analysis both to examine simple relationships and as the starting point in investigating more complex relationships. For example, we might look at the relationship between a B2B customer's ratings of your company's sales support and the customer's overall satisfaction with your company. We seldom expect a high correlation between one measured item and overall satisfaction.

    Why don't we expect a high correlation between any one item and overall satisfaction? If you look at the cross-tabulation that follows, you will see a typical data set.

                                 Satisfaction Rating
                                 1     2     3     4     5
    Sales Support Rating   1     3     4     2     1     0
                           2     3     6     5     4     3
                           3     1     4     5     5     5
                           4           2     4    14     8
                           5           1     2     8    11

    The correlation between satisfaction and sales support is .55. By looking at those ratings on the highlighted diagonal, you can observe that, although there is a relationship, it is not strong. Every person whose ratings were "off the diagonal" reduced the correlation. This leads us to using multiple regression to better explain factors (such as price, product quality, customer service, and so forth) that are related to customer satisfaction. In this chapter, you will learn more about simple and multiple regression analysis using SPSS.

    Ronald L. Tatham
    Chief Executive Officer
    Burke, Inc.
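    The .55 correlation quoted in the Practitioner Viewpoint can be checked directly from the cross-tabulation. The sketch below is a minimal illustration, assuming that the two cells missing from rows 4 and 5 of the extracted table are zero counts; that assumption, and all variable names, are ours rather than the author's.

```python
import math

# Rows: sales support rating 1-5; columns: overall satisfaction rating 1-5.
# ASSUMPTION: the two cells missing from rows 4 and 5 are zero counts.
counts = [
    [3, 4, 2, 1, 0],
    [3, 6, 5, 4, 3],
    [1, 4, 5, 5, 5],
    [0, 2, 4, 14, 8],
    [0, 1, 2, 8, 11],
]

# Accumulate the sums needed for a Pearson correlation on grouped data.
n = sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
for support, row in enumerate(counts, start=1):
    for satisfaction, count in enumerate(row, start=1):
        n += count
        sum_x += count * support
        sum_y += count * satisfaction
        sum_xx += count * support ** 2
        sum_yy += count * satisfaction ** 2
        sum_xy += count * support * satisfaction

s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

# Respondents who gave the same rating on both items sit on the diagonal.
on_diagonal = sum(counts[i][i] for i in range(5))

print(f"respondents = {n}, on the diagonal = {on_diagonal}, r = {r:.2f}")
```

    Under the zero-count assumption, well under half of the respondents fall on the diagonal, which is why a single item correlates only moderately with overall satisfaction.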

    Chapter 19
    Predictive Analysis in Marketing Research

    Learning Objectives:

    To understand the basic concept of prediction
    To learn how marketing researchers use regression analysis
    To learn how marketing researchers use bivariate regression analysis
    To see how multiple regression differs from bivariate regression
    To appreciate various types of stepwise regression, how they are applied, and the interpretation of their findings
    To learn how to obtain and interpret regression analyses with SPSS


    Marketing Research: Online Research Applications, Fourth Edition, by Alvin C. Burns and Ronald F. Bush. Copyright © 2003 by Pearson Prentice Hall.

    Where We Are:

    1. Establish the need for marketing research
    2. Define the problem
    3. Establish research objectives
    4. Determine research design
    5. Identify information types and sources
    6. Determine methods of accessing data
    7. Design data collection forms
    8. Determine sample plan and size
    9. Collect data
    10. Analyze data
    11. Prepare and present the final research report

    Gambler Target Market Similarities and Differences Between Atlantic City and Las Vegas

    Although a number of casinos and other gambling venues have opened up across the United States in the past several years, the granddaddy locations remain Las Vegas and Atlantic City. Las Vegas has been the gambling capital for most of the last century, beginning with the legalization of gambling in the 1930s. In the 1950s Las Vegas exploded into the single most notorious gambling mecca in the United States with the development and aggressive marketing of "The Strip" with the famous Sands Hotel and a number of other well-known casinos such as the Mirage, Riviera, and Frontier.

    Perhaps jealous of Las Vegas's ability to attract tourists, New Jersey legalized gambling in Atlantic City in the 1970s. What immediately followed was a flurry of casino building projects that transformed the famous Atlantic City boardwalk into a city with several casinos such as Bally's, Caesars, and the Trump Taj Mahal.

    Do Las Vegas and Atlantic City compete for U.S. gamblers? This is an important marketing question, for if they do compete, the strategies of their promoters should be very different from what they would be if they do not compete. If they are not competing and are instead drawing local, regional, or otherwise unique gamblers, the marketing strategies should be designed around these considerations. On the other hand, if Las Vegas and Atlantic City are competing for the same market of gamblers across the United States, then the two gambling capitals should be aggressively battling each other for market share.

    A researcher investigated the characteristics of Las Vegas gamblers and Atlantic City gamblers.¹ The researcher used the American Travel Survey data made available by the U.S. Department of Transportation's Bureau of Transportation Statistics. This survey identifies destinations of U.S. travelers, and it includes a number of demographic and lifestyle questions. Using a sophisticated form of multiple regression analysis, the researcher answered the question, "Do Las Vegas and Atlantic City compete for U.S. gamblers?" Table 19.1 lists his findings.

    Predictive analysis reveals that Las Vegas and Atlantic City do not compete for the same U.S. gamblers.

    Table 19.1 Characteristics of Las Vegas and Atlantic City Gamblers

    CHARACTERISTIC              LAS VEGAS GAMBLERS                                         ATLANTIC CITY GAMBLERS
    Income                      More trips with higher income                              More trips with higher income
    Education                   More trips with more education                             More trips with more education
    Distance to Las Vegas       More trips the closer he or she lives to Las Vegas         Fewer trips the closer he or she lives to Las Vegas
    Distance to Atlantic City   Fewer trips the closer he or she lives to Atlantic City    More trips the closer he or she lives to Atlantic City
    Own home                    More trips with ownership                                  Not related
    Home in Midwest             More trips by Midwesterners                                Fewer trips by Midwesterners
    Home in Northeast           Not related                                                More trips by Northeasterners
    Home in South               Not related                                                Fewer trips by Southerners
    Retired                     More trips if retired                                      Not related
    Student                     More trips if a student                                    Not related
    Asian                       More trips if Asian                                        Not related
    Black                       Not related                                                More trips if Black

    The highlighted cells are the ones that distinguish the market segment profiles that differentiate Las Vegas from Atlantic City gamblers. Specifically, both Las Vegas and Atlantic City are drawing gamblers who live closer to their respective locations, and they both are attracting higher income and higher education groups. In addition, Las Vegas gamblers are more likely to be (1) homeowners, (2) Midwesterners, (3) retired, (4) students, or (5) Asian, and not Northeasterners, Southerners, or Blacks. Atlantic City, in contrast, is attractive to gamblers who are Northeasterners and Blacks, but it is definitely not attracting gamblers who are Midwesterners or Southerners. Compared to Las Vegas, Atlantic City is not attracting homeowners, retirees, students, or Asians. From this set of findings, the two great American gambling destinations do not compete for the same gamblers.

    This chapter is the last one in which we discuss statistical procedures frequently used by marketing researchers. A researcher sometimes wishes to predict what might result if the manager were to implement a certain alternative. Alternatively, the researcher may be seeking a parsimonious way to describe market segments or the differences between various types of consumers, as was the case in our introductory case about Las Vegas versus Atlantic City gamblers. In this chapter, we will describe regression analysis. Although it may seem like an intimidating procedure, we will show you how regression relates directly to the scatter diagrams and linear relationship you learned about in the previous chapter. Three types of regression analysis are described in this chapter. The first, bivariate regression, simply takes correlation analysis between two variables into the realm of prediction. Next, multiple regression analysis introduces the concept of simultaneously using two or more variables to make the prediction of a target variable such as sales. Finally, we will briefly introduce you to stepwise regression. This is a technique used by a researcher who is faced with a large number of candidate predictors and is looking for the subset of these that best predicts or describes the phenomenon under study.


    Prediction is a statement of what is believed will happen in the future made on the basis of past experience or prior observation.

    Extrapolation detects a pattern in the past and projects it into the future. Predictive modeling uses relationships found among variables to make a prediction.

    All predictions should be judged as to their "goodness" (accuracy).

    The two approaches to prediction are extrapolation and predictive modeling.

    UNDERSTANDING PREDICTION

    A prediction is a statement of what is believed will happen in the future made on the basis of past experience or prior observation. We are confronted with the need to make predictions on a daily basis. For example, you must predict whether it will rain to decide whether to carry an umbrella. You must predict how difficult an examination will be in order to properly study. You must predict how heavy the traffic will be in order to decide what time to start driving to make your dentist appointment on time.

    Marketing managers are also constantly faced with the need to make predictions, and the stakes are much higher than in the three examples just cited. That is, instead of getting wet, receiving a grade of C rather than a B, or being late for a dentist appointment, the marketing manager has to worry about competitors' reactions, changes in sales, wasted resources, and whether profitability objectives will be achieved. Making accurate predictions is a vital part of the marketing manager's workaday world.

    Two Approaches to Prediction

    There are two ways of making a prediction: extrapolation and predictive modeling. In extrapolation, you can use past experience as a means of predicting the future. This process identifies a pattern over time and projects that pattern into the future. For example, if the weather forecaster has predicted an 80 percent chance of rain every day for the past week and it had rained every day, you would expect it to rain if he or she predicted an 80 percent chance of rain today. Similarly, if the last two exams you took under a professor were quite easy, you would predict the next one would be easy as well. Of course, it might not rain or the professor might administer a hard exam, but the observed patterns argue for rain today and an easy next exam. In both cases, you have detected a consistent pattern over time and based your predictions on this pattern.

    In the other case, prediction relies on an observed relationship perceived to exist between the factor you are predicting and some condition you believe influences the factor. For example, how does the weather forecaster make his or her predictions? He or she inspects several pieces of evidence such as wind direction and velocity, barometric pressure changes, humidity, jet stream configuration, and temperature. That is, he or she goes far beyond taking what happened yesterday and forecasting that it will happen today. He or she builds a predictive model, using the relationships believed to exist among variables to make a prediction. A predictive model relates the conditions expected to be in place and influencing the factor you are predicting. It is not an extrapolation of a consistent pattern over time; rather, it is an observed relationship that exists across time.

    How to Determine the Goodness of Your Predictions

    Regardless of the method of prediction, you will always want to judge the "goodness" of your predictions, which is how good your method is at making those predictions. But because predictions are for the future and we can never know the future until it occurs, how can you judge the accuracy of your predictions? Here is a simple example that will explain the basic approach. Imagine that you are away at college. Your little brother, who is a high school sophomore, works part-time at the movie theater in your hometown. He is rather cocky about himself. When you come home for the weekend, he claims that he can predict the theater's popcorn sales for each day in the week. It turns out that you also worked at the theater while in high school, and you know the theater manager very well. She agrees to keep a record of popcorn sales and to provide the daily amount to you for the next week. So you challenge your little brother to write down the sales for the next seven days. After the week passes, how would you determine the accuracy of your brother's prediction?


    The goodness of a prediction is based on examination of the residuals.

    Residuals are the errors: comparisons of predictions to actual values.

    Table 19.2 Weekly Popcorn Sales: Using Residuals to Assess the Goodness of a Forecast

    DAY OF WEEK   YOUR BROTHER'S FORECAST   ACTUAL SALES   RESIDUAL (DIFFERENCE)   TYPE OF ERROR
    Monday        $100                      $125           -25                     Very low
    Tuesday       $110                      $130           -20                     Low
    Wednesday     $120                      $135           -15                     Low
    Thursday      $125                      $125             0                     Exact
    Friday        $260                      $225           +35                     Very high
    Saturday      $300                      $250           +50                     Very high
    Sunday        $275                      $235           +40                     Very high
    Averages      $185                      $175           +10                     High

    The easiest way would be to compare the predictions for each day's popcorn sales to the actual amount sold. We have done this in Table 19.2. When you look at the table, you will see that we have calculated the difference between your brother's prediction and the actual sales for each day. Notice that for some days, the predictions were high, whereas for others, the predictions were low. When you compare how far the predicted values are from the actual or observed values, you are performing analysis of residuals. Stated differently, assessment of the goodness of a prediction requires you to compare the pattern of errors in the predictions to the actual data. Analysis of residuals underlies all assessments of the accuracy of a forecasting method, and because researchers cannot wait a month, a quarter, or a year to compare a prediction with what actually happens, they fall back on past data. In other words, they select a predictive model and apply it to the past data. Then they examine the residuals to assess the model's predictive accuracy.
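    The residuals in Table 19.2 can be recomputed with a few lines of code. The sketch below is illustrative only: it takes the forecast and actual figures from the table, uses forecast minus actual as the residual (the sign convention the table appears to follow), and adds two common whole-week summaries that avoid positive and negative errors cancelling out. The variable names are our own.

```python
# Forecast and actual popcorn sales in dollars, taken from Table 19.2.
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
forecast = [100, 110, 120, 125, 260, 300, 275]
actual = [125, 130, 135, 125, 225, 250, 235]

# Residual = forecast minus actual (negative means the forecast was too low).
residuals = [f - a for f, a in zip(forecast, actual)]

# Whole-week summaries of the errors.
mean_residual = sum(residuals) / len(residuals)
sum_abs = sum(abs(r) for r in residuals)       # total absolute error
sum_sq = sum(r * r for r in residuals)         # total squared error

for day, r in zip(days, residuals):
    print(f"{day:<10} {r:+d}")
print(f"mean residual = {mean_residual:+.1f}")
print(f"sum of absolute residuals = {sum_abs}")
print(f"sum of squared residuals = {sum_sq}")
```

    The mean residual computed this way is a little over +9, close to the +10 shown in the Averages row of the table, while the absolute and squared totals show that the daily errors are much larger than that average suggests.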

    There are many ways to examine residuals. For example, in the case of your little brother's forecast, you could judge it either on a total basis or an individual basis. On a total basis, you might compute the average as we have done in the table, or you could sum all of the daily residuals. Of course, you would need to square the daily residuals or use the absolute values to avoid cancellation of the positive differences by the negative differences. (You have seen the necessary squaring operation before, for instance, in the formula for a standard deviation or the sums of squares formulas we described in Chapter 15.) On an individual basis, you might look for some pattern.² For example, you might notice that your little brother tends to underestimate how much popcorn will be bought on weekdays, which are low-sales days, whereas he overestimates it for Friday through Sunday, which are high-sales days. As you can see, the goodness of a prediction approach depends on how closely it predicts a set of representative values judged by examining the residuals (or errors).

    By comparing your brother's predictions of popcorn sales to actual sales, you can assess the goodness of his predictions.

    Now that you have a basic understanding of prediction and how you determine the goodness of your predictions, we turn our attention to regression analysis.

    With bivariate regression, one variable is used to predict another variable.

    The straight-line equation is the basis of regression analysis.

    Figure 19.1 The General Equation for a Straight Line in Graph Form (horizontal axis: x; vertical axis: y; a = intercept, the point on the y-axis that the line hits when x = 0; b = the slope, the change in the line for each one-unit change in x)

    BIVARIATE REGRESSION ANALYSIS

    We first define bivariate regression analysis as a predictive analysis technique in which one variable is used to predict the level of another by use of the straight-line formula. We review the equation for a straight line and introduce basic terms used in regression. We also describe basic computations and significance with bivariate regression. We show how a regression prediction is made and we illustrate how to perform this analysis on SPSS.

    A straight-line relationship underlies regression, and it is a powerful predictive model. Figure 19.1 illustrates a straight-line relationship, and you should refer to it as we describe the elements in a general straight-line formula. The formula for a straight line is:

    Formula for a straight-line relationship:

        y = a + bx

    where

        y = the predicted variable
        x = the variable used to predict y
        a = the intercept, or point where the line cuts the y-axis when x = 0
        b = the slope, or the change in y for any 1-unit change in x


    You should recall the straight-line relationship we described underlying the correlation coefficient: When the scatter diagram for two variables appears as a thin ellipse, there is a high correlation between them. Regression is directly related to correlation. In fact, we use one of our correlation examples to illustrate the application of bivariate regression shortly.

    Basic Procedure in Bivariate Regression Analysis

    We now describe independent and dependent variables and show how the intercept and slope are computed. Then we use SPSS output to show how tests of significance are interpreted.

    Independent and Dependent Variables

    As we indicated, bivariate regression analysis is a case in which only two variables are involved in the predictive model. When we use only two variables, one is termed dependent and the other is termed independent. The dependent variable is that which is predicted, and it is customarily termed y in the regression straight-line equation. The independent variable is that which is used to predict the dependent variable, and it is the x in the regression formula. We must quickly point out that the terms dependent and independent are arbitrary designations and are customary to regression analysis. There is no cause-and-effect relationship or true dependence between the dependent and the independent variable. It is strictly a statistical relationship, not causal, that may be found between these two variables.

    Computing Slope and Intercept

    To compute a and b, a statistical analysis program needs a number of observations of the various levels of the dependent variable paired with different levels of the independent variable, identical to the ones we illustrated previously when we were demonstrating how to perform correlation analysis.

    The formulas for calculating the slope (b) and the intercept (a) are rather complicated, but some instructors are in favor of their students learning these formulas, so we have included them in Marketing Research Insight 19.1.

    Regression is directly related to correlation by the underlying straight-line relationship.

    In regression, the independent variable is used to predict the dependent variable.

    MARKETING RESEARCH INSIGHT 19.1: Additional Insights

    How to Calculate the Intercept and Slope of a Bivariate Regression

    In the following example we are using the Novartis pharmaceuticals company sales territory and number of salespersons data found in Table 19.3. Intermediate regression calculations are included in Table 19.3.

    The formula for computing the regression parameter b is:

    Formula for b, the slope, in bivariate regression:

    $$ b = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} $$

    where
        x_i = an x variable value
        y_i = a y value paired with each x_i value
        n = the number of pairs

    The calculations for b, the slope, are as follows:

    Calculation of b, the slope, in bivariate regression using Novartis sales territory data:

    $$ b = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} = \frac{20 \times 58{,}603 - 251 \times 4{,}325}{20 \times 3{,}469 - 251^2} = \frac{1{,}172{,}060 - 1{,}085{,}575}{69{,}380 - 63{,}001} = \frac{86{,}485}{6{,}379} = 13.56 $$


    Table 19.3 Bivariate Regression Analysis Data and Intermediate Calculations

    TERRITORY (i)   SALES ($ MILLIONS) (y)   NUMBER OF SALESPERSONS (x)   xy        x²
    1               102                       7                              714      49
    2               125                       5                              625      25
    3               150                       9                            1,350      81
    4               155                       9                            1,395      81
    5               160                       9                            1,440      81
    6               168                       8                            1,344      64
    7               180                      10                            1,800     100
    8               220                      10                            2,200     100
    9               210                      12                            2,520     144
    10              205                      12                            2,460     144
    11              230                      12                            2,760     144
    12              255                      15                            3,825     225
    13              250                      14                            3,500     196
    14              260                      15                            3,900     225
    15              250                      16                            4,000     256
    16              275                      16                            4,400     256
    17              280                      17                            4,760     289
    18              240                      18                            4,320     324
    19              300                      18                            5,400     324
    20              310                      19                            5,890     361
    Sums            4,325                    251                          58,603   3,469
                    (Average = 216.25)       (Average = 12.55)

    The formula for computing the intercept is:

    Formula for a, the intercept, in bivariate regression:

    $$ a = \bar{y} - b\bar{x} $$

    The computations for a, the intercept, are as follows:

    Calculation of a, the intercept, in bivariate regression using Novartis sales territory data:

    $$ a = \bar{y} - b\bar{x} = 216.25 - 13.56 \times 12.55 = 216.25 - 170.15 = 46.10 $$

    The product 170.15 is obtained with the unrounded slope, 13.5577.

    In other words, the bivariate regression equation has been found to be:

    Novartis sales regression equation: y = 46.10 + 13.56x

    The interpretation of this equation is as follows. The intercept is the predicted annual sales, $46.10 million, for a territory with no salespersons, and predicted annual sales increase by $13.56 million with each additional salesperson.
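    The hand calculation in Marketing Research Insight 19.1 can be retraced in code. The sketch below is a minimal illustration using the x and y columns of Table 19.3 and the same sums-based formulas; the variable names are our own, and nothing here is meant to represent how SPSS performs the computation internally.

```python
# Table 19.3: number of salespersons (x) and sales in $ millions (y).
x = [7, 5, 9, 9, 9, 8, 10, 10, 12, 12, 12, 15, 14, 15, 16, 16, 17, 18, 18, 19]
y = [102, 125, 150, 155, 160, 168, 180, 220, 210, 205,
     230, 255, 250, 260, 250, 275, 280, 240, 300, 310]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of the xy column
sum_x2 = sum(xi * xi for xi in x)               # sum of the x-squared column

# Slope: b = (n*Sxy - Sx*Sy) / (n*Sx2 - Sx^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = mean(y) - b * mean(x)
a = sum_y / n - b * (sum_x / n)

print(f"sums: x = {sum_x}, y = {sum_y}, xy = {sum_xy}, x^2 = {sum_x2}")
print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
```

    Keeping the slope unrounded until the last step is what reproduces the intercept reported in the insight.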

    The least squares criterion used in regression analysis guarantees that the "best" straight-line slope and intercept will be calculated.

    When SPSS or any other statistical analysis program computes the intercept and the slope in a regression analysis, it does so on the basis of the least squares criterion. The least squares criterion is a way of guaranteeing that the straight line that runs through the points on the scatter diagram is positioned so as to minimize the squared vertical distances of the various points away from the line. In other words, if you draw a line where the regression line is calculated, measure the vertical distances of all the points away from that line, and square them, it would be impossible to draw any other line that would result in a lower total of those squared distances. Or, to state the least squares criterion using residuals analysis, the line is the one with the lowest total squared residuals.
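    The least squares criterion can be illustrated numerically. The sketch below fits the line to the Table 19.3 data and then compares its total squared residuals with those of a few nearby lines. The slope is computed in deviation form, which is algebraically equivalent to the sums-based formula in Marketing Research Insight 19.1; the size of the nudges and all names are arbitrary choices made for illustration.

```python
# Table 19.3: number of salespersons (x) and sales in $ millions (y).
x = [7, 5, 9, 9, 9, 8, 10, 10, 12, 12, 12, 15, 14, 15, 16, 16, 17, 18, 18, 19]
y = [102, 125, 150, 155, 160, 168, 180, 220, 210, 205,
     230, 255, 250, 260, 250, 275, 280, 240, 300, 310]


def total_squared_residuals(a, b):
    """Sum of squared vertical distances of the points from y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Least squares slope and intercept (deviation form of the formulas).
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar
best = total_squared_residuals(a, b)

# Nudge the intercept or the slope; every alternative line should fit worse.
alternatives = {
    "intercept + 5": (a + 5, b),
    "intercept - 5": (a - 5, b),
    "slope + 0.5": (a, b + 0.5),
    "slope - 0.5": (a, b - 0.5),
}
print(f"least squares line: {best:.1f}")
for label, (alt_a, alt_b) in alternatives.items():
    print(f"{label}: {total_squared_residuals(alt_a, alt_b):.1f}")
```

    Each alternative line produces a larger total than the fitted line, which is the sense in which the least squares line is "best."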


    Figure 19.2 The SPSS Clickstream for Bivariate Regression Analysis

    RETURN TO Your Integrated Case

    The Hobbit's Choice Restaurant Survey: How to Run and Interpret Bivariate Regression Analysis on SPSS

    Now let us illustrate bivariate regression with SPSS using The Hobbit's Choice Restaurant survey data with which you are well acquainted. Our purpose is to help you learn the basic SPSS commands for bivariate regression and to familiarize you with SPSS output and various regression statistics found on it.

    The first step in bivariate regression analysis is to identify the dependent and independent variables. For our example, we will use the amount spent in restaurants per month as our dependent variable. Logically, we would expect expenditures to be related to income, so the before-tax household income level is a logical independent variable. However, we have used a code system for income level: The code numbers are not in dollars or dollar units (such as thousands). In any regression, it is best to use realistic values because realistic values are easiest to interpret. Consequently, we will recode the income values to represent the midpoints of the income ranges on the questionnaire. For example, we will recode the "less than $15,000" category to 7.5, meaning $7,500, so our recoded units are in thousands of dollars. The recoded income values are $7.5, $20.0, $37.5, $62.5, $87.5, $125, and $175. Notice that the highest income level of "$150,000 or higher" did not have an upper limit, so we arbitrarily use the range of the income level just below it.
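    The recoding step described above amounts to a lookup from category code to range midpoint. The sketch below is a hypothetical illustration: the midpoints are the ones listed in the text, but the category codes 1 through 7 and the sample responses are assumptions, since the questionnaire's actual coding is not shown here.

```python
# Midpoints (in $1,000s) are the recoded values listed in the text.
# ASSUMPTION: the questionnaire codes the seven income ranges as 1-7.
INCOME_MIDPOINTS = {
    1: 7.5,     # less than $15,000
    2: 20.0,
    3: 37.5,
    4: 62.5,
    5: 87.5,
    6: 125.0,
    7: 175.0,   # $150,000 or higher (open-ended top range)
}


def recode_income(code):
    """Return the income midpoint in $1,000s for a questionnaire code."""
    return INCOME_MIDPOINTS[code]


# Made-up coded responses for four respondents.
coded_responses = [1, 4, 7, 3]
recoded = [recode_income(code) for code in coded_responses]
print(recoded)
```

    Working in thousands of dollars means the slope of the later regression reads directly as dollars spent per additional $1,000 of income.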

    As you can see in Figure 19.2, the SPSS menu command clickstream to run bivariate regression is ANALYZE-REGRESSION-LINEAR. This opens up the linear regression selection window where you would indicate which variable is the dependent variable and which is the independent variable. In our example, we are investigating what you would expect to be a linear relationship between the amount spent per month in restaurants (dependent variable) and household income (independent variable). When these variables are clicked into their respective locations on the SPSS Linear Regression setup window, clicking on OK will generate the SPSS output we are about to describe.

    SPSS Student Assistant Online: Running and Interpreting Bivariate Regression

    Figure 19.3 SPSS Output for Bivariate Regression Analysis

    There are several pieces of information provided with a regression analysis such as this. First, there is information on Variables Entered/Removed, which indicates that the regression analysis used a method designated as "Enter." This designation refers to the regression method we are using. (There are several methods, but most are beyond the scope of our textbook.)

    The annotated SPSS linear regression output is shown in Figure 19.3. In the Model Summary table, three types of Rs are indicated. For bivariate regression, R Square (.738 on the output) is the square of the correlation coefficient of 0.859. The Adjusted R Square (.737) reduces the R² by taking into account the sample size and number of parameters estimated. This R Square value is very important, because it reveals how well the straight-line model fits the scatter of points. Because a correlation coefficient ranges from -1.0 to +1.0, its square will range from 0 to +1.0. The higher the R Square value, the better is the straight line's fit to the elliptical scatter of points. A standard error value is reported, and we explain its use later.
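    The relationship among the three Rs can be written out as follows. This is the standard textbook form of the adjustment rather than a formula quoted from the SPSS output; the symbols n (sample size) and k (number of independent variables, 1 in bivariate regression) are introduced here for illustration.

    $$ R^2 = r^2 = (0.859)^2 \approx 0.738, \qquad R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1} $$

    Because the fraction is only slightly greater than 1 when n is large relative to k, the adjusted value falls just below R Square, consistent with the .737 versus .738 reported above.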

    Next, an Analysis of Variance (ANOVA) section is provided. As you can see, regression is related to analysis of variance.³ We must determine whether the straight-line model we are attempting to apply to describe these two variables is appropriate. The F value is significant (.000), so we reject the null hypothesis that a straight-line model does not fit the data we are analyzing. Just as in ANOVA, this test is a flag, and the flag has now been raised, making it justifiable to continue inspecting the output for more significant results. If the ANOVA F test is not significant, we would have to abandon our regression analysis attempts with these two variables. Finally, you can see in the Coefficients table that the values of b and a are listed under Unstandardized Coefficients. The constant (a) is 35.462, whereas b, identified as B, is 1.499. In other words, rounding to hundredths, the regression equation has been found to be the following.

    Bivariate regression equation determined by SPSS using The Hobbit's Choice data:

        Dollars spent in restaurants per month = $35.46 + $1.50 × Income in $1,000s

    To relate this finding to our regression line in Figure 19.1, it says that the regression line will intercept the dollars spent in restaurants per month (the y-axis) at $35.46, and the line will increase $1.50 per month for each $1,000 unit increase in the income level (the x-axis).

    You must always test the regression model, intercept, and slope for statistical significance.

    Regression analysis predictions are estimates that have some amount of error in them.

    The standard error of the estimate is used to calculate a range of the prediction made with a regression equation.

    Testing for Statistical Significance of the Intercept and the Slope

    Simply computing the values for a and b is not sufficient for regression analysis, because the two values must be tested for statistical significance. The intercept and slope that are computed are sample estimates of population parameters of the true intercept, α (alpha), and the true slope, β (beta). The tests for statistical significance are tests as to whether the computed intercept and computed slope are significantly different from zero (the null hypothesis). To determine statistical significance, regression analysis requires that a t test be undertaken for each parameter estimate. The interpretation of these t tests is identical to other significance tests you have seen. We describe what these t tests mean next.

    In our example, you would look at the "Sig." column in the Coefficients table. This is where the slope and intercept t test results are reported. Both of our tests have significance levels of .000, which are below our standard significance level cutoff of .05, so our computed intercept and slope are valid estimates of the population intercept and slope. If x and y do not share a linear relationship, the population regression slope will equal zero and the t test result will support the null hypothesis. However, if a systematic linear relationship exists, the t test result will force rejection of the null hypothesis, and the researcher can be confident that the calculated slope estimates the true one that exists in the population. Remember that we are dealing with a statistical concept, and you must be assured that the straight-line parameters α and β really exist in the population before you can use your regression analysis findings as a prediction device.
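The t tests described above can be sketched by hand. The following is a minimal illustration, not the SPSS procedure itself: the respondent data are hypothetical, the function name is invented for this example, and the 1.96 cutoff is only a rough large-sample stand-in for the exact significance level that SPSS reports in the "Sig." column.

```python
import math


def bivariate_regression(xs, ys):
    """Least-squares intercept a and slope b, with a t statistic for each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope
    a = mean_y - b * mean_x             # intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))      # standard error of the estimate
    se_b = s_e / math.sqrt(sxx)         # standard error of the slope
    se_a = s_e * math.sqrt(1.0 / n + mean_x ** 2 / sxx)  # standard error of the intercept
    return {"a": a, "b": b, "t_a": a / se_a, "t_b": b / se_b}


# HYPOTHETICAL respondents: income in $1,000s and dollars spent per month.
income = [20, 35, 50, 65, 80, 95, 110, 125]
spent = [60, 95, 100, 140, 150, 185, 190, 230]

result = bivariate_regression(income, spent)
for name in ("a", "b"):
    t = result["t_" + name]
    # Rough large-sample rule; a small sample needs a larger critical t value.
    verdict = "differs from zero" if abs(t) > 1.96 else "not distinguishable from zero"
    print(f"{name} = {result[name]:.3f}, t = {t:.2f}: {verdict}")
```

Each t statistic is simply the estimate divided by its standard error, so a slope that is large relative to its sampling error is what leads to rejection of the null hypothesis.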

    Making a Prediction and Accounting for Error

    Now, there is one more step to relate, and it is the most important one. How do you make a prediction? The fact that the line is a best-approximation representation of all the points means we must account for a certain amount of error when we use the line for our predictions. The true advantage of a significant bivariate regression analysis result lies in the ability of the marketing researcher to use that information gained about the regression line through the points on the scatter diagram and to estimate the value or amount of the dependent variable based on some level of the independent variable. For example, with our regression result calculated for the relationship between total monthly restaurant purchases and income level, it is now possible to estimate the dollar amount of restaurant purchases predicted to be associated with a specific income level. However, we know that the scatter of points does not describe a perfectly straight line because the correlation is .859, not 1.0. So our regression prediction can only be an estimate.

    Generating a regression prediction is conceptually identical to estimating a population mean. That is, it is necessary to express the amount of error by estimating a range rather than stipulating an exact estimate for your prediction. Regression analysis provides for a standard error of the estimate, which is a measure of the accuracy of the predictions of the regression equation. This standard error value is listed in the top half of the SPSS output and just beside the Adjusted R Square in Figure 19.3. It is analogous to the standard error of the mean you used in estimating a population mean from a sample, but it is based on the residuals, or how far away each predicted value is from the actual value. Do you recall the popcorn sales example of residuals we described earlier in this chapter? SPSS does the same comparison by using the regression equation it computed to predict the monthly restaurant purchases dollar amount for each respondent, and this predicted value is compared to the actual amount given by the respondent. The differences, or residuals, are translated into a standard error of estimate value. In our Hobbit's Choice Restaurant survey example, the standard error of the estimate was found to be $47.54 (rounded to the nearest cent).

    The use of 95 percent or 99 percent confidence interval is standard.

    Figure 19.4 Regression Assumes That Data Points Form a Bell-Shaped Curve Around the Regression Line

    One of the assumptions of regression analysis is that the plots on the scatter diagram will be spread uniformly and in accord with the normal curve assumptions over the regression line. Figure 19.4 illustrates how this assumption might be depicted graphically. The points are congregated close to the line and then become more diffuse as they move away from the line. In other words, a greater percentage of the points is found on or close to the line than is found further away. The great advantage of this assumption is that it allows the marketing researcher to use his or her knowledge of the normal curve to specify the range in which the dependent variable is predicted to fall. For example, if the researcher used the predicted dependent value result ± 1.96 times the standard error of the estimate, he or she would be stipulating a range with a 95 percent level of confidence; whereas if the researcher uses ± 2.58 times the standard error of the estimate, he or she would be stipulating a range with a 99 percent level of confidence. The interpretation of these confidence intervals is identical to interpretations for previous confidence intervals: Were the prediction made many times and an actual result determined each time, the actual results would fall within the range of the predicted value 95 percent or 99 percent of these times.
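The way residuals are "translated into" a standard error of the estimate can be sketched as follows. The actual and predicted dollar amounts are hypothetical, and the function name is an assumption made for this illustration; the divisor (sample size minus the number of independent variables minus one) is the usual one for regression.

```python
import math


def standard_error_of_estimate(actual, predicted, n_predictors=1):
    """Square root of the summed squared residuals over (n - predictors - 1)."""
    residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(actual) - n_predictors - 1))


# HYPOTHETICAL actual and predicted monthly restaurant dollars for six respondents.
actual = [120, 95, 210, 160, 75, 180]
predicted = [130, 90, 190, 170, 85, 175]

see = standard_error_of_estimate(actual, predicted)
for label, z in (("95%", 1.96), ("99%", 2.58)):
    # Half-width of the range around any predicted y at this level of confidence.
    print(f"{label} range: predicted y +/- {z * see:.2f}")
```

The larger the residuals, the larger the standard error of the estimate, and the wider the range that must be placed around every prediction.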

    Figure 19.5 To Predict with Regression, Apply Levels of Confidence Around the Regression Line (the figure plots the predicted y values along the regression line, with 95% confidence intervals of ± 1.96 times the standard error of the estimate around the predicted y)

    Regression predictions are made with confidence intervals.

    Figure 19.5 illustrates how you can envision a regression prediction. Let us use the regression equation to make a prediction about the dollar amount of monthly restaurant purchases that would be associated with an income level of $75,000. Applying the regression formula, we have the following:

    Calculation of monthly restaurant purchases predicted with an income level of $75,000

    \[
    \begin{aligned}
    y &= a + bx \\
    \text{Dollars spent in restaurants per month} &= \$35.46 + \$1.50 \times \text{Income in \$1,000s} \\
    &= \$35.46 + \$1.50 \times 75 \\
    &= \$35.46 + \$112.50 \\
    &= \$147.96
    \end{aligned}
    \]

    Next, to reflect the imperfect aspects of the predictive tool being used, we must apply confidence intervals. If the 95 percent level of confidence were applied, the computations would be:

    Calculation of 95% confidence intervals for the predicted monthly restaurant purchases

    \[
    \begin{aligned}
    \text{Predicted } y \pm z \times \text{Standard error of the estimate} &= \$147.96 \pm 1.96 \times \$47.54 \\
    &= \$147.96 \pm \$93.18 \\
    &= \$54.78 \text{ to } \$241.14
    \end{aligned}
    \]

    As you can see, the predicted y is the monthly dollar amount of restaurant purchases we just computed for an income level of $75,000, the 1.96 pertains to 95 percent level of confidence, and the standard error of the estimate is the value indicated in the regression analysis output. The interpretation of these three numbers is as follows. For a typical individual in The Hobbit's Choice Restaurant survey population, if that person's household income before taxes was $75,000, the monthly restaurant purchases amount that would be expected would be about $148, but because there are differences between income ranges and monthly restaurant purchases, the restaurant purchases would not be exactly that amount. Consequently, the 95 percent confidence interval reveals that the sales figure should fall between $55 and $241. Finally, the prediction is valid only if conditions remain the same as they were for the time period from which the original data were collected.⁴
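The $75,000 prediction and its 95 percent range can be reproduced with a short sketch. The intercept, slope, and standard error of the estimate are the rounded values quoted in the text; the function and constant names are assumptions introduced only for this illustration.

```python
A = 35.46           # intercept (rounded value quoted in the text)
B = 1.50            # slope per $1,000 of income (rounded value quoted in the text)
STD_ERROR = 47.54   # standard error of the estimate quoted in the text


def predict_spending(income_thousands: float) -> float:
    """Predicted dollars spent in restaurants per month: y = a + bx."""
    return A + B * income_thousands


def confidence_range(predicted: float, z: float = 1.96):
    """Predicted y plus and minus z times the standard error of the estimate."""
    margin = z * STD_ERROR
    return predicted - margin, predicted + margin


predicted = predict_spending(75)
low, high = confidence_range(predicted)
print(f"${predicted:.2f}, 95% range ${low:.2f} to ${high:.2f}")
# prints: $147.96, 95% range $54.78 to $241.14
```

Passing `z=2.58` to `confidence_range` gives the wider 99 percent range for the same prediction.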


    The precision of a prediction based on a regression analysis finding depends on the size of the standard error of the estimate.

    You may be troubled at the large range of our confidence intervals, and you are right to be concerned. If you recall our popcorn sales estimation example in the beginning of the chapter, you should remember that it is important to assess the precision of the predictions generated by a predictive model. How precisely a regression analysis finding predicts is determined by the size of the standard error of the estimate, a measure of the variability of the predicted dependent variable. In our Hobbit's Choice survey case, the average dollars spent on restaurants per month may be predicted by our bivariate regression findings; however, if we repeated the survey many, many times, and made our $75,000 income prediction of the average dollars spent every time, 95 percent of these predictions would fall between $55 and $241. There is no way to make this prediction more exact because its precision is dictated by the variability in the data.

    MULTIPLE REGRESSION ANALYSIS

    Now that you are familiar with bivariate regression analysis, you are ready to step up to a higher level. In this section, we will introduce you to multiple regression analysis. You will find that all of the concepts in bivariate regression apply to multiple regression, except you will be working with more than one independent variable.

    An Underlying Conceptual Model

    In Chapter 4, in which you learned about marketing research problem definition, we referred to a model as a structure that ties together various constructs and their relationships. In that chapter, we indicated that it is beneficial for the marketing manager and the market researcher to have some sort of model in mind when designing the research plan. The bivariate regression equation that you just learned about is a model that ties together an independent variable and its dependent variable. The dependent variables that market researchers are interested in are typically sales, potential sales, or some attitude held by those who make up the market. For example, in the Novartis example, the dependent variable was territory sales. If Dell computers commissioned a survey, it might want information on those who intend to purchase a Dell computer, or it might want information on those who intend to buy a competing brand as a means of understanding these consumers and perhaps dissuading them. The dependent variable would be purchase intentions for Dell computers. If Maxwell House Coffee was considering a line of gourmet iced coffee, it would want to know how coffee drinkers feel about gourmet iced coffee; that is, their attitudes toward buying, preparing, and drinking it would be the dependent variable.

    Because consumers are influenced by many factors, it is best to use a general conceptual model as the basis for multiple regression analysis.

    There is an underlying general conceptual model in multiple regression analysis.

    The researcher and the manager identify, measure, and analyze specific variables that pertain to the general conceptual model in mind.

    Figure 19.6 A Conceptual Model for Multiple Regression Analysis (center: Purchases; Intentions to Purchase; Preferences; or Satisfaction. Surrounding: Attitudes, Opinions, Feelings; Media Exposure, Word of Mouth; Past Behavior, Experience, Knowledge; Demographics, Lifestyle)

    Figure 19.6 provides a general conceptual model that fits many marketing research situations, particularly those that are investigating consumer behavior. A general conceptual model identifies independent and dependent variables and shows their basic relationships to one another. In Figure 19.6, you can see that purchases, intentions to purchase, and preferences are in the center, meaning they are dependent. The surrounding concepts are possible independent variables. That is, any one could be used to predict any dependent variable. For example, one's intentions to purchase an expensive automobile like a Lexus could depend on one's income. It could also depend on friends' recommendations (word of mouth), one's opinions about how a Lexus would enhance one's self-image, or experiences riding in or driving a Lexus.

    In truth, consumers' preferences, intentions, and actions are potentially influenced by a great number of factors, as would be very evident if you listed all of the subconcepts that make up each concept in Figure 19.6. For example, there are probably a dozen different demographic variables; there could be dozens of lifestyle dimensions, and a person is exposed to a great many types of advertising media every day. Of course, in the problem definition stage, the researcher and manager slice the myriad of independent variables down to a manageable number to be included on the questionnaire. That is, they have the general model structure in Figure 19.6 in mind, but they identify and measure specific variables that pertain to the problem at hand.

    Our underlying conceptual model example is one of many different conceptual models that researchers have available to them. In truth, every research project has a unique conceptual model that depends entirely on the research objectives. As an example of how regression analysis can be applied to how consumers from two different countries view certain questionable sales tactics, read Marketing Research Insight 19.2. We have also provided Marketing Research Insight 19.3 that illustrates how regression analysis was used to assess the success of the slogan "Buy American."

    MARKETING RESEARCH INSIGHT 19.2: A Global Perspective

    Regression Analysis Shows Ethical Reactions to Questionable Business Practices Differ Between Two Countries⁵

    Situation 1: How would you feel if you found that you were overcharged $80 on a purchase of a new suit? What about a $5 overcharge?

    Situation 2: What would be your reaction if you just bought a $25 membership in a health club and you found out it might go out of business the very next day? What if you paid $700?

    Variations of these situations were presented to samples of American and Greek college students. Among these variations, the dollar value of the situation was systematically varied. For the suit, the overcharged amounts used were $5, $40, and $80, whereas for the health club membership the amounts were $25, $200, and $700. The gender and organizational status of the culprit (salesperson, general manager, and owner) were also varied. The students gave their reactions to each situation by responding to scales, one of which pertained to how ethical or unethical they considered the act to be.

    Using a form of regression called conjoint analysis, the researchers found that Greek and American college students are similar in many ways. For example, both groups felt that the suit purchase situation was more ethically offensive than the health club one. However, the Greek students saw the situations as more unethical than did the American students.

    With respect to what factors influenced the students' reactions, American and Greek students were in agreement that the least important factor was the actor's gender, and the most important factor was the dollar size of the transgression. However, Greek students were more affected by the dollar size than were American students.

    MARKETING RESEARCH INSIGHT 19.3: A Global Perspective

    Does "Buy American" in the United States Work as Well as "Buy Japanese" in Japan?

    The global economy works in both directions. Products from the home country compete with products produced in foreign countries, and the home country's products are exported to foreign countries where they are the foreign country products. For instance, a consumer in Chicago may be comparing U.S.-made televisions to those produced in Korea, Japan, or Germany. Similarly, a German consumer in Berlin may be comparing the very same brands, but now the German brand is the home country brand.

    Suppose you are Japanese and you are in the market for a mountain bike. If you are Japanese, you are a member of a collectivist culture that values cooperation, collectivism, and teamwork. You can find Japanese mountain bikes and American import mountain bikes. After you do some research, you find that some Japanese bikes are better than some of the American bikes, and that some of the American bikes are better than some of the Japanese bikes. What bike will you buy: a superior Japanese one, a superior American one, a lesser Japanese bike, or a lesser U.S. bike?

    Now, let's assume that you are an American looking for that mountain bike. Americans are nurtured in an individualistic culture, so you value competition, self-reliance, and being distinctive. You will find the same marketscape: superior Japanese bikes, lesser Japanese bikes, superior American bikes, and lesser American bikes. Which mountain bike will you buy?

    These questions are vital ones for global marketers because the ways consumers make decisions about products with foreign country origins have important implications about the marketing strategies a global marketer should use to gain competitive advantage in each of the countries where its products are sold.

    Two researchers⁶ investigated the exact situations we have just posed for you: Japanese consumers considering superior and lesser Japanese and American mountain bike brands and American consumers considering superior and lesser American and Japanese mountain bike brands. Using various techniques including regression analysis, they found that Japanese consumers favored Japanese-made products regardless of superiority. In other words, they always chose their own country's brand. American consumers, on the other hand, chose the superior brand, regardless of the country of origin. The researchers concluded that any global marketer must be aware of the strong home country product favoritism they will face in countries where the culture is collectivist. In these markets, the global marketer may want to downplay its foreign country status. Similarly, a global marketer competing in a country where the culture is individualistic must demonstrate superior product performance even if it is the marketer's home country. The researchers caution that "Buy American" slogans issued by U.S. marketers at U.S. consumers probably have not worked for lesser-quality products. But "Buy Japanese" seems to work quite well in Japan regardless of the quality of the Japanese products.

    Bivariate regression analysis treats only dependent-independent pairs, and it would take a great many bivariate regression analyses to account for possible relevant dependent-independent pairs of variables in a survey. Fortunately, there is no need to perform a great many bivariate regressions, as there is a much better tool called multiple regression analysis, the last analysis technique described in this textbook. Although you have only one more statistical technique to learn in this textbook, you should be aware that there are a number of statistical techniques that are beyond its scope. A market researcher and author of a book on these techniques is Dr. Ronald Tatham, who is also the CEO of Burke Marketing Research, Inc., a world leader in marketing research. You can meet Dr. Tatham in Marketing Research Insight 19.4.

    MARKETING RESEARCH INSIGHT 19.4: Meet a Marketing Researcher

    Meet Ronald L. Tatham, Ph.D., Chief Executive Officer, Burke, Inc.

    Dr. Ronald Tatham began his career at Burke in 1976. He has been chairman/CEO since 1989. Before joining Burke, Ron was a professor on the Graduate Business Faculty of Arizona State University and has taught at the University of Cincinnati and Kent State University. He also consulted with several organizations, including advertising agencies, retailers, as well as consumer and industrial goods manufacturers and service providers. Ron holds a B.B.A. degree from the University of Texas at Austin, an M.B.A. from Texas Tech University, and a Ph.D. from the University of Alabama. He is coauthor of Multivariate Data Analysis (MacMillan, 4th edition, 1994). His research papers have appeared in several publications, including the Journal of Marketing Research, the Journal of Market Research Society, Business Horizons, and Management Science. His most recent article is "Product Design and the Pricing Decision: A Sequential Approach," Journal of the Market Research Society, January 1995 (with Jeff Miller and Vidyut Vashi). He is a member of the Marketing Research Advisory Board at the University of Georgia and the MSMR Advisory Board at the University of Texas at Arlington. He is also active in several professional organizations and has presented over 200 seminars and papers before professional groups. His most recent teaching assignment was a class on international research at the University of Thailand.

    Multiple Regression Analysis Described

    Multiple regression analysis is an expansion of bivariate regression analysis in that more than one independent variable is used in the regression equation. The addition of independent variables complicates the conceptualization by adding more dimensions or axes to the regression situation. But it makes the regression model more realistic because predictions normally depend on multiple factors, not just one.

    Multiple regression means that you have more than one independent variable to predict a single dependent variable.

    Basic Assumptions in Multiple Regression

    Consider our example with the number of salespeople as the independent variable and territory sales as the dependent variable. A second independent variable such as advertising levels can be added to the equation (Figure 19.7). You will note that the advertising dimension (x₂) is included at a right angle to the axis representing the number of salespeople (x₁), which creates a three-dimensional display. At the same time, the addition of a second variable turns the regression line into a regression plane. A regression plane is the shape of the dependent variable in multiple regression analysis. If other independent variables are added to the regression analysis, it would be necessary to envision each one as a new and separate axis existing at right angles to all other axes. Obviously, it is impossible to draw more than three dimensions at right angles. In fact, it is difficult to even conceive of a multiple-dimension diagram, but the assumptions of multiple regression analysis require this conceptualization.

    Figure 19.7 With Multiple Regression, the Line Becomes a Plane (axes: X1, X2, and Y)

    With multiple regression, you work with a regression plane rather than a line.

    A multiple regression equation has two or more independent variables (x's).

    Everything about multiple regression is essentially equivalent to bivariate regression except you are working with more than one independent variable. The terminology is slightly different in places, and some statistics are modified to take into account the multiple aspect, but for the most part, concepts in multiple regression are analogous to those in the simple bivariate case. We note these similarities in our description of multiple regression.

    The regression equation in multiple regression has the following form:

    Multiple regression equation

    \[ y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_m x_m \]

    where

    \(y\) = the dependent, or predicted, variable

    \(x_i\) = independent variable i

    \(a\) = the intercept

    \(b_i\) = the slope for independent variable i

    \(m\) = the number of independent variables in the equation

    As you can see, the addition of other independent variables has done nothing more than to add b_i x_i terms to the equation. We still have retained the basic y = a + bx straight-line formula, except now we have multiple x variables, and each one is added to the equation, changing y by its individual slope. The inclusion of each independent variable in this manner preserves the straight-line assumptions of multiple regression analysis. This is sometimes known as additivity because each new independent variable is added on to the regression equation.

    Multiple R indicates how well the independent variables can predict the dependent variable in multiple regression.
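One way to see what additivity implies, using the symbols of the multiple regression equation: if a single independent variable x_j rises by one unit while every other independent variable is held fixed, the predicted value changes by exactly its slope b_j. The primed symbol y' is notation introduced here only for this illustration.

```latex
\[
\begin{aligned}
y' - y &= \bigl(a + b_1 x_1 + \cdots + b_j (x_j + 1) + \cdots + b_m x_m\bigr)
        - \bigl(a + b_1 x_1 + \cdots + b_j x_j + \cdots + b_m x_m\bigr) \\
       &= b_j
\end{aligned}
\]
```

Every other term cancels, which is why each slope can be read as the separate contribution of its own independent variable.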

    Let's look at a multiple regression analysis result so you can better understand the multiple regression equation. Here is a possible result using our Lexus example.

    Lexus purchase intention multiple regression equation example

    \[
    \begin{aligned}
    \text{Intention to purchase a Lexus} = 2 &+ 1.0 \times \text{attitude toward Lexus (1--5 scale)} \\
    &- 0.5 \times \text{negative word of mouth (1--5 scale)} \\
    &+ 1.0 \times \text{income level (1--10 scale)}
    \end{aligned}
    \]

    This multiple regression equation says that you can predict a consumer's intention to buy a Lexus if you know three variables: (1) attitude toward Lexus, (2) friends' negative recommendations or other negative comments about Lexus, and (3) income level using a scale with 10 income grades. Furthermore, we can see the impact of each of these variables on Lexus purchase intentions. Here is how to interpret the equation. First, the average person has an intention level of 2, or some small propensity to want to buy a Lexus. Attitude toward Lexus is measured on a 1-5 scale, and with each attitude scale point, intention goes up 1 point. That is, an individual with a strong positive attitude of 5 will have a greater intention than one with a strong negative attitude of 1. With friends' objections to the Lexus (negative word of mouth) such as "A Lexus is overpriced," the intention decreases by .5 for each level on the 5-point scale. Finally, the intention increases by 1 with each increasing income grade.
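The interpretation above can be sketched as a small function. The intercept and slopes are the ones in the Lexus example; the dictionary keys, the function name, and the two illustrative consumers (who differ only in attitude) are assumptions made for this sketch.

```python
INTERCEPT = 2.0
SLOPES = {
    "attitude": 1.0,                 # attitude toward Lexus, 1-5 scale
    "negative_word_of_mouth": -0.5,  # friends' negative comments, 1-5 scale
    "income_level": 1.0,             # income grade, 1-10 scale
}


def purchase_intention(**levels: float) -> float:
    """Intercept plus each slope times the level of its independent variable."""
    return INTERCEPT + sum(SLOPES[name] * value for name, value in levels.items())


# HYPOTHETICAL consumers who differ only in attitude (5 versus 1).
positive = purchase_intention(attitude=5, negative_word_of_mouth=3, income_level=5)
negative = purchase_intention(attitude=1, negative_word_of_mouth=3, income_level=5)

print(positive, negative, positive - negative)  # prints: 10.5 6.5 4.0
```

Because the slope for attitude is 1.0, the four-point gap in attitude produces a four-point gap in predicted intention, while the sign of the word-of-mouth slope shows that more negative comments pull the prediction down.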

    Here is a numerical example for a potential Lexus buyer whose attitude is 4, negative word of mouth is 3, and income is 5.

    Calculation of Lexus purchase intention using the multiple regression equation

    \[
    \begin{aligned}
    \text{Intention to purchase a Lexus} &= 2 + 1.0 \times 4 - 0.5 \times 3 + 1.0 \times 5 \\
    &= 9.5
    \end{aligned}
    \]

    Multiple regression is a very powerful tool, because it tells us which factors affect the dependent variable, which way (the sign) each factor influences the dependent variable, and how much (the size of b_i) each factor influences it.

    Just as was the case in bivariate regression analysis in which we used the correlation between y and x, it is possible to inspect the strength of the linear relationship between the independent variables and the dependent variable with multiple regression. Multiple R, the multiple correlation coefficient, is a handy measure of the strength of the overall linear relationship, and its square, R Square, is called the coefficient of determination. Just as was the case in bivariate regression analysis, the multiple regression analysis model assumes that a straight-line (plane) relationship exists among the variables. Multiple R ranges from 0 to +1.0, and R Square represents the proportion of the variance in the dependent variable explained, or accounted for, by the combined independent variables. High multiple R values indicate that the regression plane applies well to the scatter of points, whereas low values signal that the straight-line model does not apply well. At the same time, a multiple regression result is an estimate of the population multiple regression equation, and, just as was the case with other estimated population parameters, it is necessary to test for statistical significance.

    Multiple R is like a lead indicator of the multiple regression analysis findings. As you will see soon, it is one of the first pieces of information provided in a multiple regression output. Many researchers mentally convert the R Square into a percentage. For example, an R Square of .75 means that the regression findings will explain 75 percent of the variance in the dependent variable. The greater the explanatory power of the multiple regression finding, the better and more useful it is for the researcher.

    With multiple regression, the independent variables should have low correlations with one another.

    Multicollinearity can be assessed and eliminated in multiple regression with the VIF statistic.

    Let us issue a caution before we show you how to run a multiple regression analysis using SPSS. The independence assumption stipulates that the independent variables must be statistically independent and uncorrelated with one another. The independence assumption is portrayed by the right-angle alignment of each additional independent variable in Figure 19.7. The presence of moderate or stronger correlations among the independent variables is termed multicollinearity and will violate the independence assumption of multiple regression analysis results when it occurs.⁷

    It is up to the researcher to test for and remove multicollinearity if it is present. There are statistics issued to identify this problem. One commonly used statistic is the variance inflation factor (VIF). The VIF is a single number, and a rule of thumb is that as long as VIF is less than 10, multicollinearity is not a concern. With a VIF of 10 or more associated with any independent variable in the multiple regression equation, it is prudent to remove that variable from consideration or to otherwise reconstitute the set of independent variables. In other words, when examining the output of any multiple regression, the researcher should inspect the VIF number associated with each independent variable that is retained in the final multiple regression equation by the procedure. If the VIF is 10 or more, the researcher should remove that variable from the independent variable set and rerun the multiple regression. This iterative process is used until only independent variables that are statistically significant and that have acceptable VIFs are in the final multiple regression equation. SPSS calculates all VIFs.
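To make the VIF rule of thumb concrete, here is a minimal sketch for the simplest case of exactly two independent variables, where the VIF of each equals 1 / (1 - r²) and r is the correlation between the two. With more independent variables, r² is replaced by the R Square from regressing one independent variable on all the others, which is what a package such as SPSS computes. The data and function names below are hypothetical.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both variables when there are exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# HYPOTHETICAL survey columns: income in $1,000s and family size.
income = [30, 45, 60, 75, 90, 50, 85, 40]
family_size = [2, 3, 4, 2, 5, 1, 3, 4]

vif = vif_two_predictors(income, family_size)
print(f"VIF = {vif:.2f}:", "acceptable" if vif < 10 else "remove or reconstitute")
```

A VIF of 1 means the two variables are uncorrelated; the value climbs steeply as the correlation between them approaches 1.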

    RETURN TO Your Integrated Case

    The Hobbit's Choice Restaurant Survey: How to Run and Interpret Multiple Regression Analysis on SPSS

    Running multiple regression is almost identical to performing simple bivariate regression with SPSS. The only difference is that you will select more than one independent variable for the analysis. Let's think about a general conceptual model that might predict how


    Figure 19.8 The SPSS Clickstream for Multiple Regression Analysis

    much people spend on restaurants per month. We already know from our bivariate regression analysis work with The Hobbit's Choice Restaurant data set that income predicts this dependent variable. Another predictor could be family size, as larger families will order more entrées because there are more family members. So, we will add this independent variable. A third independent variable could be preferences. People who prefer an elegant décor surely will pay for this atmosphere, as elegant décors are typically found in the most expensive restaurants. To summarize, we have determined our conceptual model: The average amount paid at restaurants per month may be predicted by (1) household income level, (2) family size, and (3) preference for restaurants with elegant décors.

    Just as with bivariate regression, the ANALYZE-REGRESSION-LINEAR command sequence is used to run a multiple regression analysis, and the variable, dollars spent in restaurants per month, is selected as the dependent variable, while the other three are specified as the independent variables. You will find this annotated SPSS clickstream in Figure 19.8.

    As the computer output in Figure 19.9 shows, the Adjusted R Square value (Model Summary table) indicating the strength of relationship between the independent variables and the dependent variable is .749, signifying that there is some linear relationship present. Next, the printout reveals that the ANOVA F is significant, signaling that the null hypothesis of no linear relationship is rejected, and it is justifiable to use a straight-line relationship to model the variables in this case.
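    The Adjusted R Square that SPSS reports corrects R Square for the number of independent variables in the equation. A minimal Python sketch of the usual adjustment follows. The unadjusted R Square of .75 and the sample size of 400 are assumptions chosen for illustration; they are not read from the output in Figure 19.9.

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


# Hypothetical inputs: the same R Square with three versus two predictors.
with_three = adjusted_r_squared(0.75, 400, 3)
with_two = adjusted_r_squared(0.75, 400, 2)
print(round(with_three, 4), round(with_two, 4))
```

    With a large sample the penalty for an extra independent variable is small, which is one reason the adjusted value may barely move when a weak variable is dropped.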

    Just as we did with bivariate regression, it is necessary in multiple regression analysis to test for statistical significance of the b_i's (betas) determined for the independent variables. Once again, you must determine whether sampling error is influencing the results and giving a false reading. You should recall that this test is a test for significance from zero (the null hypothesis) and is achieved through the use of separate t tests for each b_i. The SPSS output in Figure 19.9 indicates the levels of statistical significance. In this particular example, it is apparent the recoded income level and the preference for an elegant dining décor are significant as both have significance levels of .000. The constant (a) also is significant as it, too, has a significance level of .000. However, the size of family independent variable

    The SPSS ANALYZE-REGRESSION-LINEAR command is used to run multiple regression.

    With multiple regression, look at the significance level of each calculated beta.


    Figure 19.9 SPSS Output for Multiple Regression Analysis

    Run trimmed regressions iteratively until all betas are significant.

    Once the multiple regression equation coefficients are determined, one can make a prediction of the dependent variable using independent variable values.

    SPSS Student Assistant Online: Running and Interpreting Multiple Regression

    A trimmed regression means that you eliminate the nonsignificant independent variables and rerun the regression.

    is not significant because its significance level is .200 and, therefore, above our standard cutoff value of .05.

    What do you do with the mixed significance results we have just found in our dollars spent on restaurants per month multiple regression example? Before we answer this question, you should be aware that this mixed result is very likely, so how to handle it is vital to your understanding of how to perform multiple regression analysis successfully. Here is the answer: It is standard practice in multiple regression analysis to systematically eliminate those independent variables that are shown to be nonsignificant through a process called trimming. You then rerun the trimmed model and inspect the significance levels again. This series of eliminations or iterations helps to achieve the simplest model by eliminating the nonsignificant independent variables. The trimmed multiple regression model with all significant independent variables is found in Figure 19.10. Notice that the VIF diagnostics were not examined again because they were examined on the untrimmed SPSS output and found to be acceptable.

    This additional run enables the marketing researcher to think in terms of fewer dimensions within which the dependent variable relationship operates. Successive iterations sometimes cause the Adjusted R Square to decrease somewhat, and it is advisable to scrutinize this value after each run. You can see that the new Adjusted R Square is still .749, so in our example, there has been no decrease. Iterations will also cause the beta values and the intercept value to shift slightly; consequently, it is necessary to inspect all significance levels of the betas once again. Through a series of iterations, the marketing researcher finally arrives at the final regression equation expressing the salient independent variables and their linear relationships with the dependent variable. A concise predictive model has been found.
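    The trimming routine just described can be sketched in Python as a loop: fit, find the weakest independent variable, drop it if it is not significant, and rerun. This is a sketch under stated assumptions, not the SPSS procedure. It drops one variable per pass, it tests each variable with an F-to-remove statistic (the square of that variable's t value), and it uses a cutoff of 3.84 (1.96 squared) as a large-sample stand-in for the .05 significance level because the Python standard library has no t or F distribution. The data are made up so that income and décor preference matter and family size does not.

```python
# Sketch only: iterative trimming on hypothetical data.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(y, predictors):
    """Residual sum of squares from regressing y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
               for r, yi in zip(rows, y))


def trim(y, candidates, f_to_remove=3.84):
    """Drop the weakest nonsignificant variable and rerun until none remain."""
    kept = list(candidates)
    n = len(y)
    while kept:
        full_rss = rss(y, [candidates[name] for name in kept])
        df_resid = n - len(kept) - 1
        f_stats = {}
        for name in kept:
            reduced = [candidates[other] for other in kept if other != name]
            f_stats[name] = (rss(y, reduced) - full_rss) / (full_rss / df_resid)
        weakest = min(f_stats, key=f_stats.get)
        if f_stats[weakest] >= f_to_remove:
            break  # every remaining variable passes the cutoff
        kept.remove(weakest)
    return kept


# Hypothetical data set (eight respondents).
candidates = {
    "income": [40, 40, 40, 40, 80, 80, 80, 80],   # $1,000s
    "family_size": [2, 4, 2, 4, 2, 4, 2, 4],
    "decor": [2, 2, 4, 4, 2, 2, 4, 4],            # 5-point preference scale
}
spending = [41, 49, 77, 73, 127, 123, 151, 159]   # dollars per month

trimmed = trim(spending, candidates)
print(trimmed)
```

    On these made-up numbers the loop removes family size on the first pass and stops on the second, which mirrors the pattern in the chapter's example.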

    Using Results to Make a Prediction

    The use of a multiple regression result is identical in concept to the application of a bivariate regression result; that is, it relies on an analysis of residuals. Remember, we began this chapter with a description of residuals and indicated that residuals analysis is a way to determine the goodness of a prediction. Ultimately, the marketing researcher wishes to


    Figure 19.10 SPSS Output for Trimmed Multiple Regression Analysis

    predict the dependent variable based on assumed or known values of the independent variables that are found to have significant relationships within the multiple regression equation. The standard error of the estimate is provided by all regression analysis programs, and it is possible to apply this value to forecast the ranges in which the dependent variable will fall, given levels of the independent variables.

    Making a prediction with multiple regression is identical to making one with bivariate regression except you use the multiple regression equation. For a numerical example, let us assume that we are interested in upscale restaurant patrons so we can give Jeff Dean an estimate of how much they spend monthly on restaurants. We'll specify an upscale restaurant patron as someone with household income of $100K and who very strongly prefers an elegant décor. To "very strongly prefer" means a 5 on the 5-point scale of preferences for various restaurant features.

    Using our SPSS trimmed multiple regression findings, we can predict the amount of dollars spent on restaurants each month by upscale consumers. The calculations follow. Remember the constant and betas are from the trimmed multiple regression output in Figure 19.10.

    Calculation of monthly restaurant purchases predicted with an income level of $100,000 and preference for elegant décor of 5

    \[
    \begin{aligned}
    y &= a + b_1 x_1 + b_2 x_2 \\
      &= \$29.39 + \$1.15(100) + \$14.14(5) \\
      &= \$29.39 + \$115.00 + \$70.70 \\
      &= \$215.09
    \end{aligned}
    \]

    The calculated prediction is about $215; however, we must take into consideration the sample error and variability of the data with a confidence interval, that is, the predicted dollars spent on restaurants per month ± 1.96 times the standard error of the estimate.

    Calculation of 95% confidence interval for a prediction based on multiple regression findings with The Hobbit's Choice Restaurant survey

    \[
    \begin{aligned}
    \text{Predicted } y \pm 1.96 \times \text{standard error of the estimate}
      &= \$215.09 \pm 1.96 \times \$46.45 \\
      &= \$215.09 \pm \$91.04 \\
      &= \$124.05 \text{ to } \$306.13
    \end{aligned}
    \]

    Here is an example of a prediction using multiple regression.


    The interval-at-minimum scaling assumption requirement of multiple regression may be relaxed by use of a dummy variable.

    The researcher can compare standardized beta coefficient sizes directly, but comparing unstandardized betas is like comparing apples and oranges.

    Standardized betas indicate the relative importance of alternative independent variables.

    The interpretation would be that those individuals with a household income before taxes of $100,000 and who very strongly prefer an elegant décor when they dine can be expected to spend between about $124 and $306 monthly on restaurants, averaging about $215. Again, the confidence interval range is quite large, but it is perfectly reflective of the variability in the data and in no way a flaw in the multiple regression analysis.
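    The same prediction and 95 percent interval can be reproduced in a few lines of Python. The constant, betas, and standard error of the estimate are the values quoted in the calculations above; the function and variable names are our own, and treating income as measured in thousands of dollars is an inference from the use of 100 for a $100,000 household.

```python
# Values quoted in the worked example (trimmed model, Figure 19.10).
INTERCEPT = 29.39            # a
B_INCOME = 1.15              # b1, assumed to be per $1,000 of household income
B_DECOR = 14.14              # b2, per point on the 5-point preference scale
STD_ERROR_ESTIMATE = 46.45   # standard error of the estimate
Z_95 = 1.96                  # multiplier for a 95% interval


def predict_spending(income_thousands, decor_preference):
    """Apply y = a + b1*x1 + b2*x2."""
    return INTERCEPT + B_INCOME * income_thousands + B_DECOR * decor_preference


def interval(predicted, std_error=STD_ERROR_ESTIMATE, z=Z_95):
    """Predicted value plus or minus z times the standard error of the estimate."""
    margin = z * std_error
    return predicted - margin, predicted + margin


predicted = predict_spending(100, 5)
low, high = interval(predicted)
print(f"Predicted monthly spending: ${predicted:.2f}")
print(f"95% interval: ${low:.2f} to ${high:.2f}")
```

    Changing the two arguments to predict_spending lets you try other patron profiles, provided the values stay within the ranges observed in the survey.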

    Special Uses of Multiple Regression Analysis

    There are a number of special uses and considerations to keep in mind when running multiple regression analysis. These include using a dummy independent variable, using standardized betas to compare the importance of independent variables, and using multiple regression as a screening device.

    Using a Dummy Independent Variable

    A dummy independent variable is defined as one that is scaled with a nominal 0-versus-1 coding scheme. The 0-versus-1 code is traditional, but any two adjacent numbers could be used, such as 1-versus-2. The scaling assumptions that underlie multiple regression analysis require that the independent and dependent variables both be at least interval scaled. However, there are instances in which a marketing researcher may want to use an independent variable that does not embody interval-scaling assumptions. It is not unusual, for instance, for the marketing researcher to wish to use a dichotomous or two-level variable, such as gender, as an independent variable in a multiple regression problem. For instance, a researcher may want to use gender coded as 0 for male and 1 for female as an independent variable. Or you might have a buyer-nonbuyer dummy variable that you want to use as an independent variable. In these instances, it is usually permissible to go ahead and slightly violate the assumption of metric scaling for the independent variable to come up with a result that is in some degree interpretable.
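    A short Python sketch may help show why a 0-versus-1 dummy gives an interpretable result. With the dummy as the only independent variable, the intercept is the mean of the group coded 0 and the b coefficient is the difference between the two group means. The respondents and spending figures below are made up for illustration.

```python
from statistics import mean

# Hypothetical respondents and their monthly restaurant spending.
genders = ["male", "female", "female", "male", "female", "male"]
spending = [120.0, 180.0, 200.0, 140.0, 190.0, 130.0]

# 0-versus-1 coding as described in the text: 0 = male, 1 = female.
dummy = [1 if g == "female" else 0 for g in genders]


def slope_and_intercept(x, y):
    """Bivariate least-squares slope (b) and intercept (a)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx


b, a = slope_and_intercept(dummy, spending)
male_mean = mean(s for s, d in zip(spending, dummy) if d == 0)
female_mean = mean(s for s, d in zip(spending, dummy) if d == 1)
print(f"a = {a:.2f} (male mean = {male_mean:.2f})")
print(f"b = {b:.2f} (female mean - male mean = {female_mean - male_mean:.2f})")
```

    Had the codes been 1-versus-2 instead, b would be unchanged and only the intercept would shift, which is why any two adjacent numbers work.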

    Using Standardized Betas to Compare the Importance of Independent Variables

    Regardless of the application intentions of the marketing researcher, it is usually of interest to determine the relative importance of the independent variables in the multiple regression result. Because independent variables are often measured with different units, it is wrong to make direct comparisons between the calculated betas. For example, it is improper to directly compare the b coefficient for family size to another for money spent per month on personal grooming because the units of measurement are so different (people versus dollars). The most common approach is to standardize the independent variables through a quick operation that involves dividing the difference between each independent variable value and its mean by the standard deviation of that independent variable. This results in what is called the standardized beta coefficient. In other words, standardization translates each independent value into the number of standard deviations away from its own mean. Essentially, this procedure transforms these variables into a set of values with a mean of zero and a standard deviation equal to 1.0.

    When they are standardized, direct comparisons may be made between the resulting betas. The larger the absolute value of a standardized beta coefficient, the more relative importance it assumes in predicting the dependent variable. SPSS and most other statistical programs provide the standardized betas automatically. If you review the SPSS output in Figure 19.10, you will find the standardized values under the column designated as "Standardized Coefficients." It is important to note that this operation has no effect on the final multiple regression result. Its only function is to allow direct comparisons of the relative impact of the significant independent variables on the dependent variable. As an example, if you look at the Standardized


    Stepwise regression is useful if a researcher has many independent variables and wants to narrow the set down to a smaller number of statistically significant variables.

    Coefficients reported in our Hobbit's Choice Restaurant regression printout (Figure 19.10), you will see that the income level is the more important variable (.658), whereas preference for an elegant restaurant décor is much less important (.230). We have highlighted these numbers with magenta color so you can identify them easily.
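    The standardization described above can be sketched in Python. The sketch converts each value to a z score and shows that the b coefficient from the standardized data equals the unstandardized b multiplied by the ratio of standard deviations (independent over dependent). Two assumptions to flag: the data are hypothetical, and the dependent variable is standardized along with the independent variable, which is how standardized coefficients are conventionally computed. A bivariate case is used to keep the arithmetic visible.

```python
from statistics import mean, stdev


def zscores(values):
    """Express each value as standard deviations away from its own mean."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def slope(x, y):
    """Bivariate least-squares slope."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: income in $1,000s and monthly restaurant spending in dollars.
income = [30, 45, 60, 75, 90, 120]
spending = [80, 95, 130, 140, 170, 210]

z_income = zscores(income)
b = slope(income, spending)                           # dollars per $1,000 of income
beta_from_z = slope(z_income, zscores(spending))      # regression on z scores
beta_formula = b * stdev(income) / stdev(spending)    # rescaled unstandardized b

print(f"unstandardized b = {b:.3f}")
print(f"standardized beta = {beta_from_z:.3f} (formula gives {beta_formula:.3f})")
```

    Because the standardized beta is unit-free, it can be set beside the beta of a variable measured in people, points, or dollars.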

    Using Multiple Regression as a Screening Device

    Another application of multiple regression analysis is as a screening or identifying device. That is, the marketing researcher may be faced with a large number and variety of prospective independent variables, and he or she may use multiple regression as a screening device or a way of spotting the salient (statistically significant) independent variables for the dependent variable at hand. In this instance, the intent is not to determine some sort of a prediction of the dependent variable; rather, it may be to search for clues as to what factors help the researcher understand the behavior of this particular variable. For instance, the researcher might be seeking market segmentation bases and could use regression to spot which demographic variables are related to the consumer behavior variable under study. We have prepared Marketing Research Insight 19.5 that will show you how multiple regression can be used as a screening device in developing countries for global marketers to refine their market segmentation approaches.

    STEPWISE MULTIPLE REGRESSION

    When the researcher is using multiple regression as a screening tool or he or she is otherwise faced with a large number of independent variables in the conceptual model that are to be tested by multiple regression, it can become tedious to narrow down the independent variables by trimming. Fortunately, there is a type of multiple regression that does the trimming operation automatically, and this is called stepwise multiple regression.

    With stepwise multiple regression, the one independent variable that is statistically significant and explains the most variance in the dependent variable is determined, and it is entered into the multiple regression equation. Then the statistically significant independent variable that contributes most to explaining the remaining unexplained variance in the dependent variable is determined and entered. This process is continued until all statistically significant independent variables have been entered into the multiple regression equation.8 In other words, all of the nonsignificant independent variables are eliminated from the final multiple regression equation based on the level of significance stipulated by the researcher in the multiple regression options. The final output contains only statistically significant independent variables. Stepwise regression is used by researchers when they are confronted with a large number of competing independent variables and they want to narrow down the analysis to a set of statistically significant independent variables in a single regression analysis. With stepwise multiple regression, there is no need to trim and rerun the regression analysis because SPSS does the trimming automatically.
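    The entry logic just described can be sketched in Python. This is a simplified forward-selection sketch, not the SPSS algorithm: it only adds variables and never removes one that was entered earlier, and it uses an F-to-enter cutoff of 4.0 as an assumed stand-in for a significance level because the Python standard library has no F distribution. The data are hypothetical, built so that income and décor preference explain spending and family size does not.

```python
# Sketch only: forward stepwise entry on hypothetical data.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def rss(y, predictors):
    """Residual sum of squares from regressing y on the predictors plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2
               for r, yi in zip(rows, y))


def forward_stepwise(y, candidates, f_to_enter=4.0):
    """Enter, one at a time, the variable that explains the most remaining variance."""
    selected = []
    remaining = dict(candidates)
    current_rss = rss(y, [])          # intercept-only model
    n = len(y)
    while remaining:
        best_name, best_f, best_rss = None, 0.0, None
        for name, values in remaining.items():
            trial = [candidates[s] for s in selected] + [values]
            new_rss = rss(y, trial)
            df_resid = n - len(trial) - 1
            f_stat = (current_rss - new_rss) / (new_rss / df_resid)
            if f_stat > best_f:
                best_name, best_f, best_rss = name, f_stat, new_rss
        if best_name is None or best_f < f_to_enter:
            break                     # no remaining variable is worth entering
        selected.append(best_name)
        current_rss = best_rss
        del remaining[best_name]
    return selected


# Hypothetical data set (eight respondents).
candidates = {
    "income": [40, 40, 40, 40, 80, 80, 80, 80],   # $1,000s
    "family_size": [2, 4, 2, 4, 2, 4, 2, 4],
    "decor": [2, 2, 4, 4, 2, 2, 4, 4],            # 5-point preference scale
}
spending = [41, 49, 77, 73, 127, 123, 151, 159]   # dollars per month

entered = forward_stepwise(spending, candidates)
print(entered)
```

    On these made-up numbers income enters first, décor preference second, and family size never qualifies, so the final equation contains only the two useful variables.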

    How to Do Stepwise Multiple Regression with SPSS

    A researcher executes stepwise multiple regression by using the ANALYZE-REGRESSION-LINEAR command sequence precisely as was described for multiple regression. The dependent variable and many independent variables are clicked into their respective windows as before. To direct SPSS to perform stepwise multiple regression, one uses the "Method" selection menu to select "Stepwise." The findings will typically be the same as those arrived at by a researcher who uses iterative trimmed multiple regressions. Of course, with stepwise multiple regression output, there will be

    Multiple regression is sometimes used to help a marketer apply market segmentation.


    MARKETING RESEARCH INSIGHT 19.5

    A Global Perspective

    Multiple Regression Screening Analysis Leads to an Understanding of Women Consumers in Developing Markets

    Global marketers of household and other family-oriented products are keenly interested in shifts in the workforce in developing countries. As the workforce of a country moves into higher-paying jobs, discretionary income of families increases. Similarly, with these shifts, consumers often find they have less time and they become more open to time-saving goods and services being marketed by foreign corporations in their countries. One of the most significant shifts that can happen in a developing country workforce is the large-scale movement of women into the job market. When a wife takes on employment, household income can increase substantially, and it has profound implications for the products and services that household will purchase with this new income level.

    Asia is a region undergoing many economic transitions, and researchers9 chose to study five Asian countries (Korea, Thailand, Sri Lanka, Indonesia, and the Philippines) in an attempt to understand what factors are key to women from these Asian countries entering into the labor market. A great many factors may combine to foster the movement of Asian women into the workforce, and regression analysis can be used as a screening device to narrow down the many possible factors to a set of those that are significantly related to these shifts.

    The researchers speculated in their general conceptual model that a number of possible independent variables may determine whether a woman is employed outside the home or whether she remains at home. These variables included wife's age, husband's age, total number of births, number of young children, total number of children living at home, rural or urban residence, wife's illiteracy, husband's illiteracy, and level of education of the wife and of the husband (separate dummy variables of primary, middle, high school, or college education levels). A multiple regression screening analysis was conducted separately for each of the five Asian countries. Table 19.4 lists the independent variables that were found to be statistically significant for at least three of the five Asian countries. A plus sign means the beta coefficient was positive, whereas a minus sign means that the beta coefficient was found to be negative.

    Global marketers targeting Asian countries should interpret these findings in the following way. Their target markets should be older women who are past the childbearing stage and who definitely do not have young children. Rural or urban location depends on the country. When the wife graduates from college or a university, she definitely will enter the workforce. Also, since these working wives have college degrees, global marketers can expect them to be sophisticated consumers. The researchers speculate that if the husband has a high school education, he earns a higher income, and the wife does not need to work, so this group is a completely different kind of target market.

    Table 19.4 Results of a Multiple Regression Screening Analysis on Determinants of Women Entering the Workforce in Asian Countries

    FACTOR KOREA THAILAND SRI LANKA INDONESIA PHILIPPINES

    Wife's age + + + + +
    Births + + +
    Children at home
    Young children
    Rural location + +
    Wife's college education + + + + +
    Husband's high school education

    + = positive relationship, − = negative relationship, blank = not significant


    Regression is a statistical tool, not a cause-and-effect statement.

    information on those independent variables that are taken out of the multiple regression equation based on nonsignificance, and, if the researcher wishes, SPSS stepwise multiple regression will also take into account the VIF statistic to ensure that multicollinearity is not an issue.

    We do not have screenshots of stepwise multiple regression as this technique is quite advanced. Please read our "Final Comments on Multiple Regression Analysis" at the end of this chapter for an explanation.

    TWO WARNINGS REGARDING REGRESSION ANALYSIS

    Before leaving our description of multiple regression analysis, we must issue a warning about your interpretation of regression. We all have a natural tendency to think in terms of causes and effects, and regression analysis invites us to think in terms of a dependent variable's resulting from or being caused by an independent variable's actions. This line of thinking is absolutely incorrect: Regression analysis is nothing more than a statistical tool that assumes a linear relationship between two variables. It springs from correlation analysis, which is, as you will recall, a measure of the linear association and not the causal relationship between two variables. Consequently, even though two variables, such as sales and advertising, are logically connected, a regression analysis does not permit the marketing researcher to formulate cause-and-effect statements.

    The second warning we have is that you should not apply regression analysis to predict outside of the boundaries of the data used to develop your regression model. That is, you may use the regression model to interpolate within the boundaries set by the range (lowest value to highest value) of your independent variable, but if you use it to predict for independent values outside those limits, you have moved into an area that is not accounted for by the raw data used to compute your regression line. For this reason, you are not assured that the regression equation findings are valid. For example, it would not be correct to use our income-dollars spent on restaurants regression equation findings on individuals who are wealthy and have annual incomes in the millions of dollars because these individuals were not represented in The Hobbit's Choice Restaurant survey.
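    One practical way to honor this second warning is to check the range before predicting. The Python sketch below refuses to predict when the independent variable value falls outside the lowest-to-highest range seen in the data. The incomes, intercept, and slope are hypothetical placeholders, not findings from The Hobbit's Choice Restaurant survey, and the function name is our own.

```python
def predict_in_range(x_value, observed_x, intercept, slope):
    """Predict y = a + b*x only if x lies inside the observed range of x."""
    low, high = min(observed_x), max(observed_x)
    if not (low <= x_value <= high):
        raise ValueError(
            f"x = {x_value} is outside the observed range [{low}, {high}]; "
            "the regression line is not supported by data there"
        )
    return intercept + slope * x_value


# Hypothetical survey incomes (in $1,000s) and hypothetical coefficients.
observed_income = [15, 30, 45, 60, 90, 150]
INTERCEPT, SLOPE = 50.0, 1.5

inside = predict_in_range(100, observed_income, INTERCEPT, SLOPE)
print(f"Income of $100K: predicted spending ${inside:.2f}")

try:
    predict_in_range(2500, observed_income, INTERCEPT, SLOPE)
    refused = False
except ValueError as error:
    refused = True
    print(f"Income of $2.5 million: {error}")
```

    With several independent variables, the same check would be repeated for each one before the equation is applied.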

    FINAL COMMENTS ON MULTIPLE REGRESSION ANALYSIS

    There is a great deal more to multiple regression analysis, but it is beyond the scope of this textbook to delve deeper into this topic. The coverage in this chapter introduces you to regression analysis, and it provides you with enough information about it to run uncomplicated regression analyses on SPSS, identify the relevant aspects of the SPSS output, and interpret the findings. As you will see when you work with the SPSS regression analysis procedures, we have only scratched the surface of this topic. There are many more options, statistics, and considerations involved. In fact, there is so much material that whole textbooks on regression exist. Our purpose has been to teach you the basic concepts and to help you interpret the statistics associated with these concepts as you encounter them in statistical analysis program output. Our descriptions are merely an introduction to multiple regression analysis to help you comprehend the basic notions, common uses, and interpretations involved with this predictive technique.10

    Despite our simple treatment of it, we fully realize that even simplified regression analysis is very complicated and difficult to learn and that we have showered you with a great many regression statistical terms and concepts in this chapter.


    Seasoned researchers are intimately familiar with them and very comfortable in using them. However, as a student encountering them for the first time, you undoubtedly feel very intimidated. Although we may not be able to reduce your anxiety, we have created Table 19.5 that lists all of the regression analysis concepts we have described in this chapter, and it provides an explanation of each one. At least, you will not need to search through the chapter to find these concepts when you are trying to learn or use them.

    SUMMARY

    Predictive analyses are methods used to forecast the level of a variable such as sales. Model building and extrapolation are two general options available to market researchers. In either case, it is important to assess the goodness of the prediction. This assessment is typically performed by comparing the predictions against the actual data with procedures called residuals analyses.

    Market researchers use regression analysis to make predictions. The basis of this technique is an assumed straight-line relationship existing between the variables. With bivariate regression, one independent variable, x, is used