
Marketing Research Techniques


Techniques carried out in SPSS when conducting marketing research on a brand, etc.


Discriminant Analysis: Discriminant analysis is used to model the value of a dependent categorical variable based on its relationship to one or more predictors. Given a set of independent variables, discriminant analysis attempts to find linear combinations of those variables that best separate the groups of cases.

If you are a loan officer at a bank, you want to be able to identify characteristics that are indicative of people who are likely to default on loans, and you want to use those characteristics to identify good and bad credit risks. Suppose information on 850 customers is given. The first 700 cases are customers who were previously given loans. Use a random sample of these 700 customers to create a discriminant analysis model, setting the remaining customers aside to validate the analysis. Then use the model to classify the 150 prospective customers as good or bad credit risks.

The coefficients for Years with current employer and Years at current address are smaller for the Yes classification function, which means that customers who have lived at the same address and worked at the same company for many years are less likely to default. Similarly, customers with greater debt are more likely to default.
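As a rough illustration of how classification functions are used, the sketch below scores a case on each group's linear function and assigns it to the group with the higher score. All coefficients, variable names, and example cases are hypothetical values invented for this sketch; they are not the SPSS output for the loan data.

```python
# Hypothetical classification-function coefficients (NOT taken from SPSS output).
# Each group ("No" = did not default, "Yes" = defaulted) has a constant
# plus one coefficient per predictor.
CLASSIFICATION_FUNCTIONS = {
    "No":  {"constant": -4.0, "years_employed": 0.30, "years_address": 0.20, "debt_ratio": 0.10},
    "Yes": {"constant": -3.5, "years_employed": 0.10, "years_address": 0.10, "debt_ratio": 0.30},
}


def classification_scores(case):
    """Return the linear classification score of a case for every group."""
    scores = {}
    for group, coefs in CLASSIFICATION_FUNCTIONS.items():
        score = coefs["constant"]
        for name, value in case.items():
            score += coefs[name] * value
        scores[group] = score
    return scores


def classify(case):
    """Assign the case to the group whose classification score is highest."""
    scores = classification_scores(case)
    return max(scores, key=scores.get)


# Two made-up customers: one with long tenure and low debt, one with the opposite.
long_tenure = {"years_employed": 15, "years_address": 12, "debt_ratio": 5}
high_debt = {"years_employed": 1, "years_address": 1, "debt_ratio": 25}

print(classify(long_tenure))
print(classify(high_debt))
```

Because the tenure coefficients are smaller in the hypothetical Yes function, long tenure raises the No score faster than the Yes score, which mirrors the interpretation given above.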

Factor Analysis: Factor Analysis is primarily used for data reduction or structure detection. The purpose of data reduction is to remove redundant (highly correlated) variables from the data file, perhaps replacing the entire data file with a smaller number of uncorrelated variables. The purpose of structure detection is to examine the underlying (or latent) relationships between the variables.

For Data Reduction: The principal components method of extraction begins by finding a linear combination of variables (a component) that accounts for as much variation in the original variables as possible. It then finds another component that accounts for as much of the remaining variation as possible and is uncorrelated with the previous component, continuing in this way until there are as many components as original variables. Usually, a few components will account for most of the variation, and these components can be used to replace the original variables. This method is most often used to reduce the number of variables in the data file.

Q. An industry analyst would like to predict automobile sales from a set of predictors. However, many of the predictors are correlated, and the analyst fears that this might adversely affect her results.

For the initial solution, there are as many components as variables, and in an analysis of the correlation matrix, the sum of the eigenvalues equals the number of components. You have requested that eigenvalues greater than 1 be extracted, so the first three principal components form the extracted solution.

The second section of the table shows the extracted components. They explain nearly 88% of the variability in the original ten variables, so you can considerably reduce the complexity of the data set by using these components, with only a 12% loss of information.
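The extraction rule described here (keep the components whose eigenvalue exceeds 1, then report the share of total variance they carry) can be sketched in a few lines of Python. The eigenvalues below are hypothetical values chosen for illustration, not the SPSS output for the automobile data, and the function name `extract_components` is an assumption of this sketch.

```python
def extract_components(eigenvalues, threshold=1.0):
    """Count components with eigenvalue above the threshold and
    return that count with the share of total variance they explain."""
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    kept = [ev for ev in eigenvalues if ev > threshold]
    explained = sum(kept) / total
    return len(kept), explained


# Hypothetical eigenvalues for ten standardized variables (they sum to 10).
hypothetical_eigenvalues = [5.0, 2.5, 1.3, 0.5, 0.3, 0.2, 0.1, 0.05, 0.03, 0.02]

n_kept, share = extract_components(hypothetical_eigenvalues)
print(n_kept, round(share * 100, 1))
```

With these made-up numbers, three components pass the threshold and together carry about 88% of the variance, which is the kind of result the table above reports.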

The rotated component matrix helps you to determine what the components represent. The first component is most highly correlated with Price in thousands and Horsepower. Price in thousands is a better representative, however, because it is less correlated with the other two components. The second component is most highly correlated with Length. The third component is most highly correlated with Vehicle type.

K-Means Cluster Analysis: The K-Means Cluster Analysis procedure begins with the construction of initial cluster centers. You can assign these yourself or have the procedure select k well-spaced observations for the cluster centers. After obtaining initial cluster centers, the procedure (1) assigns cases to clusters based on distance from the cluster centers and (2) updates the locations of cluster centers based on the mean values of cases in each cluster. These steps are repeated until any reassignment of cases would make the clusters more internally variable or externally similar.
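The assign-then-update loop can be sketched as follows. This is a minimal plain-Python version, not the SPSS implementation: it stops when no case changes cluster, which is a simplification of the stopping rule described above, and the function name `kmeans`, the data points, and the starting centers are all made up for illustration.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its assigned points, until nothing changes."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every case to the closest cluster center.
        new_assignment = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no case changed cluster, so the solution is stable
        assignment = new_assignment
        # Step 2: move each center to the mean of the cases assigned to it.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, assignment


# Made-up two-variable data with two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, labels = kmeans(points, centers=[(1, 1), (8, 8)])
print(final_centers)
print(labels)
```

The returned centers are the per-variable means of each final cluster, which is how final cluster centers are described in the example that follows.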

Q. A telecommunications provider wants to segment its customer base by service usage patterns. If customers can be classified by usage, the company can offer more attractive packages to its customers. Use the K-Means Cluster Analysis procedure to find subsets of "similar" customers.

The final cluster centers are computed as the mean for each variable within each final cluster. The final cluster centers reflect the characteristics of the typical case for each cluster. Customers in cluster 1 tend to be big spenders who purchase a lot of services. Customers in cluster 2 tend to be moderate spenders who purchase the "calling" services. Customers in cluster 3 tend to spend very little and do not purchase many services.

This table shows the Euclidean distances between the final cluster centers. Greater distances between clusters correspond to greater dissimilarities. Clusters 1 and 3 are most different.
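A table of this kind can be reproduced by taking the Euclidean distance between every pair of final centers. The sketch below uses hypothetical centers on two made-up variables; the actual centers from the telecommunications example are not given in these notes.

```python
import math
from itertools import combinations

# Hypothetical final cluster centers on two made-up usage variables.
final_centers = {1: (9.0, 8.0), 2: (5.0, 4.0), 3: (1.0, 1.0)}


def center_distances(centers):
    """Euclidean distance between every pair of cluster centers."""
    return {
        (a, b): math.dist(centers[a], centers[b])
        for a, b in combinations(sorted(centers), 2)
    }


distances = center_distances(final_centers)
most_different = max(distances, key=distances.get)
print(distances)
print(most_different)
```

The pair with the largest distance is the most dissimilar pair of clusters.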

A large number of cases were assigned to the third cluster, which unfortunately is the least profitable group. Perhaps a fourth, more profitable, cluster could be extracted from this "basic service" group.

Correspondence Analysis: Correspondence Analysis allows you to examine the relationship between two nominal variables graphically in a multidimensional space. It computes row and column scores and produces plots based on the scores. Categories that are similar to each other appear close to each other in the plots. In this way, it is easy to see which categories of a variable are similar to each other or which categories of the two variables are related.

Q. In this example, you will use data pertaining to perceived images of six iced-coffee brands. The six brands are denoted as AA, BB, CC, DD, EE, and FF to preserve confidentiality.

The row points plot shows that fresh and ugly are both very close to the origin, indicating that they differ little from the average row profile. Three general classifications emerge. Located in the upper left of the plot, tough, men, and working are all similar to each other. The lower left contains sweet, fattening, children, and premium. In contrast, healthy, low fat, nutritious, and new cluster on the right side of the plot.
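The idea of a row "differing little from the average row profile" can be made concrete with a small sketch. It computes each row's profile (its counts divided by the row total), the average profile (column totals divided by the grand total), and the chi-square distance between the two. The counts, the three-brand layout, and the function names are hypothetical; this is only the profile arithmetic, not the full correspondence analysis that SPSS performs.

```python
def row_profiles(table):
    """Each row's counts divided by that row's total."""
    return {row: [n / sum(counts) for n in counts] for row, counts in table.items()}


def average_profile(table):
    """Column totals divided by the grand total (the centroid of the rows)."""
    col_totals = [sum(col) for col in zip(*table.values())]
    grand_total = sum(col_totals)
    return [t / grand_total for t in col_totals]


def chi_square_distance(profile, average):
    """Chi-square distance between a row profile and the average profile."""
    return sum((p - a) ** 2 / a for p, a in zip(profile, average)) ** 0.5


# Hypothetical counts: image attributes (rows) by three brands (columns).
table = {
    "fresh": [10, 10, 10],
    "tough": [25, 3, 2],
    "healthy": [2, 5, 23],
}

average = average_profile(table)
distances = {
    row: chi_square_distance(profile, average)
    for row, profile in row_profiles(table).items()
}
closest_to_origin = min(distances, key=distances.get)
print(closest_to_origin)
```

In this made-up table, the attribute spread evenly across brands has the smallest distance to the average profile, so it would plot nearest the origin.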

Notice in the column points plot that all brands are far from the origin, so no brand is similar to the overall centroid. Brands CC and DD group together at the right, whereas brands BB and FF cluster in the lower half of the plot. Brands AA and EE are not similar to any other brand.

In the upper left of the resulting biplot, brand EE is the only tough, working brand and appeals to men. Brand AA is the most popular and also viewed as the most highly caffeinated. The sweet, fattening brands include BB and FF. Brands CC and DD, while perceived as new and healthy, are also the most unpopular.

Logit Analysis: The Logit Loglinear Analysis procedure is used to model the values of one or more categorical variables given one or more categorical predictors. This is accomplished through analysis of the cell counts of the crosstabulation table formed by the cross-classification of the response and predictor variables.

Q. As part of an effort to improve the marketing of its breakfast options, a consumer packaged goods company polls 880 people, noting their age, gender, marital status, and whether or not they have an active lifestyle (based upon whether they exercise at least twice a week). Each participant then tasted 3 breakfast foods and was asked which one they liked best. Use Logit Loglinear Analysis to determine marketing profiles for each breakfast option.

Parameters with significant negative coefficients decrease the likelihood of that response category with respect to the reference category. Parameters with positive coefficients increase the likelihood of that response category.
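One common way to read such coefficients, assuming they are on the log-odds scale as in a standard logit model, is to exponentiate them: the result is the factor by which the odds of the response category (relative to the reference category) are multiplied. The coefficient values below are hypothetical and the function name is an assumption of this sketch.

```python
import math


def odds_multiplier(coefficient):
    """Factor by which the odds of a response category, relative to the
    reference category, are multiplied for a log-odds coefficient."""
    return math.exp(coefficient)


# Hypothetical coefficients, not taken from the breakfast study.
negative_effect = odds_multiplier(-0.5)  # below 1: odds shrink
no_effect = odds_multiplier(0.0)         # exactly 1: odds unchanged
positive_effect = odds_multiplier(0.5)   # above 1: odds grow

print(negative_effect, no_effect, positive_effect)
```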

Conjoint Analysis: Conjoint analysis is a technique for measuring consumer preference about the attributes of a product or service, such as price or package design. It relies on surveying subjects with a representative set of attribute combinations (for example, a particular package design and price), which the subjects rank or score according to preference. Analysis then yields quantitative information that can be used to model consumer preference for any combination of the attributes.

A study utilizing conjoint analysis consists of choosing a representative set of attribute combinations, administering them to a group of subjects, and analyzing the rankings or scores recorded by the respondents. In conjoint analysis, attributes are referred to as factors, and attribute values, like a particular price or package design, are called levels.

Conjoint uses the full-profile approach, where respondents rank alternative products defined by specific levels of all factors. Even after careful selection of the factors and levels for a study, the total number of potential product combinations is frequently too large for subjects to judge. For instance, with 5 factors and 3 levels for each factor, the number of combinations is 243 (3 × 3 × 3 × 3 × 3). To solve this problem, the full-profile approach uses what is termed a fractional factorial design, which presents a suitable fraction of all possible combinations of the factor levels. The resulting set, called an orthogonal array, is designed to capture the main effects for each factor level. Interactions between levels of one factor with levels of another factor are assumed to be negligible. The Generate Orthogonal Design procedure is used to generate an orthogonal array and is typically the starting point of a conjoint analysis.

Q. In a popular example of conjoint analysis, a company interested in marketing a new carpet cleaner wants to examine the influence of five factors on consumer preference: package design, brand name, price, a Good Housekeeping seal, and a money-back guarantee. There are three factor levels for package design, each one differing in the location of the applicator brush; three brand names (K2R, Glory, and Bissell); three price levels; and two levels (either no or yes) for each of the last two factors. The following table displays the variables used in the carpet-cleaner study, with their variable labels and values.

Variables in the carpet-cleaner study

Variable name    Variable label            Value label
package          package design            A*, B*, C*
brand            brand name                K2R, Glory, Bissell
price            price                     $1.19, $1.39, $1.59
seal             Good Housekeeping seal    no, yes
money            money-back guarantee      no, yes

There could be other factors and factor levels that characterize carpet cleaners, but these are the only ones of interest to management. This is an important point in conjoint analysis. You want to choose only those factors (independent variables) that you think most influence the subject's preference (the dependent variable). Using conjoint analysis, you will develop a model for customer preference based on these five factors.
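To see why a fractional design is needed even for this small study, the sketch below enumerates every possible product profile from the five carpet-cleaner factors and their levels. It only counts the full factorial; it does not construct an orthogonal array, which is what the Generate Orthogonal Design procedure is for.

```python
from itertools import product

# Factors and levels as listed for the carpet-cleaner study.
factors = {
    "package": ["A*", "B*", "C*"],
    "brand": ["K2R", "Glory", "Bissell"],
    "price": ["$1.19", "$1.39", "$1.59"],
    "seal": ["no", "yes"],
    "money": ["no", "yes"],
}

# Every combination of one level per factor (the full factorial design).
full_factorial = list(product(*factors.values()))

print(len(full_factorial))  # 3 * 3 * 3 * 2 * 2 = 108
print(full_factorial[0])
```

Asking each respondent to rank all 108 profiles would be impractical, which is the motivation for presenting only a fraction of them.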

This table shows the utility (part-worth) scores and their standard errors for each factor level. Higher utility values indicate greater preference. Since the utilities are all expressed in a common unit, they can be added together to give the total utility of any combination. For example, the total utility of a cleaner with package design B*, brand K2R, price $1.19, and no seal of approval or money-back guarantee is:

utility(package B*) + utility(K2R) + utility($1.19) + utility(no seal) + utility(no money-back) + constant

or

1.867 + 0.367 + (-6.595) + 2.000 + 1.250 + 12.870 = 11.759
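The same sum can be checked in a few lines of Python. The part-worth values are the ones quoted in the worked example, with the price utility taken as negative, which is what the stated total of 11.759 requires. The dictionary layout and the function name `total_utility` are assumptions of this sketch.

```python
# Part-worths quoted in the worked example (only the levels used there).
part_worths = {
    ("package", "B*"): 1.867,
    ("brand", "K2R"): 0.367,
    ("price", "$1.19"): -6.595,
    ("seal", "no"): 2.000,
    ("money", "no"): 1.250,
}
CONSTANT = 12.870


def total_utility(profile):
    """Add the constant to the part-worth of each chosen factor level."""
    return CONSTANT + sum(part_worths[(factor, level)] for factor, level in profile.items())


profile = {"package": "B*", "brand": "K2R", "price": "$1.19", "seal": "no", "money": "no"}
print(round(total_utility(profile), 3))  # 11.759
```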

Multidimensional Scaling: Given a set of objects, the goal of multidimensional scaling is to find a representation of the objects in a low-dimensional space. This solution is found by using the proximities between the objects. The procedure minimizes the squared deviations between the original, possibly transformed, object proximities and their Euclidean distances in the low-dimensional space.

The purpose of the low-dimensional space is to uncover relationships between the objects. By restricting the solution to be a linear combination of independent variables, you may be able to interpret the dimensions of the solution in terms of these variables. In the following example, you will see how 15 different kinship terms can be represented in three dimensions and how that space can be interpreted with respect to the gender, generation, and degree of separation of each of the terms.
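The quantity being minimized can be illustrated with a small sketch that sums the squared differences between given proximities (treated here as dissimilarities, with no transformation) and the Euclidean distances of a candidate configuration. The three objects, their proximities, both configurations, and the function name `raw_stress` are hypothetical; the sketch only evaluates the criterion and does not search for the best configuration.

```python
import math
from itertools import combinations


def raw_stress(proximities, coords):
    """Sum of squared deviations between the proximities and the
    Euclidean distances of the points in the low-dimensional space."""
    return sum(
        (proximities[(a, b)] - math.dist(coords[a], coords[b])) ** 2
        for a, b in combinations(sorted(coords), 2)
    )


# Hypothetical dissimilarities between three objects.
proximities = {("A", "B"): 3.0, ("A", "C"): 4.0, ("B", "C"): 5.0}

# A configuration that reproduces the proximities, and one that does not.
good_fit = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (0.0, 4.0)}
poor_fit = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}

print(raw_stress(proximities, good_fit))
print(raw_stress(proximities, poor_fit))
```

A configuration whose distances match the proximities has a criterion value near zero, so a scaling procedure would prefer it over the poorly fitting one.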