Presented by Abhijeet Birari UNIT V ANALYSIS OF DATA

Analysis of data in research


Page 1: Analysis of data in research


Page 2: Analysis of data in research

ANALYSIS OF DATA

Collection of Data

Analysis of Data

Draw Logical Inferences

Page 4: Analysis of data in research

Statistical Package for the Social Sciences (SPSS)

Page 5: Analysis of data in research

WHAT IS SPSS?

• SPSS Statistics is a software package used for statistical analysis.
• SPSS can be used for:
  – Processing questionnaires
  – Reporting in tables and graphs
  – Analyzing:
    • Mean, Median, Mode
    • Mean Deviation and Standard Deviation
    • Correlation and Regression
    • Chi-Square, T-test, Z-test, ANOVA, MANOVA, Factor Analysis, Cluster Analysis, Multidimensional Scaling, etc.
• First released in 1968 and acquired by IBM in 2009.

Page 6: Analysis of data in research
Page 7: Analysis of data in research

WHAT IS HYPOTHESIS?

“The statement speculating the outcome of a research or experiment.”

• H0 = There is no difference in the performance of Div. A, B and C in Semester I

• Ha = The Business Communication subject has been effective in developing the communication skills of students

• H0 = The biometric system has not improved the attendance of faculty members

• Ha = Excessive fishing has affected marine life

• H0 = There is no significant difference in the salaries of males and females in a particular organization

Here, H0 = Null Hypothesis and Ha = Alternate Hypothesis

Page 8: Analysis of data in research

WHAT IS LEVEL OF SIGNIFICANCE?

When the null hypothesis is true, you should accept it.

When it is false, you should reject it.

A 5% level of significance means you are taking a 5% risk of rejecting the null hypothesis when it happens to be true.

It is the maximum probability of rejecting H0 when it is true (that is, of making a Type I error).

Page 9: Analysis of data in research

TYPES OF STATISTICAL TESTS

Parametric Tests
– Meaning: Based on the assumption that the population from which the sample is drawn is normally distributed.
– When it is used: To test parameters such as the mean, standard deviation, proportions, etc.
– Statistical tests used: T-test, ANOVA, ANCOVA, MANOVA, Karl Pearson's correlation

Non-parametric Tests
– Meaning: Do not require any assumption regarding the shape of the population distribution.
– When it is used: Mostly for categorical variables, or for small samples where normality cannot be assumed.
– Statistical tests used: Chi-Square, Mann-Whitney U, Wilcoxon Signed Rank, Kruskal-Wallis, Spearman's rank correlation

Page 10: Analysis of data in research

ANOVA(Analysis of Variance)

Page 11: Analysis of data in research

INTRODUCTION

• The significance of the difference between the means of two samples can be judged using:
  – Z-test (sample size > 30)
  – T-test (sample size < 30)
• Difficulty arises when measuring the difference between the means of more than two samples.
• ANOVA is used in such cases.
• ANOVA tests the significance of the difference between more than two sample means and lets us infer whether the samples are drawn from populations having the same mean.

Significance of the difference in IQ of 2 divisions: Z-test or T-test

Significance of the difference in performance of 5 different types of vehicles: ANOVA

Page 12: Analysis of data in research

WHEN TO USE ANOVA?

Compare the yield of a crop from several varieties of seeds

Mileage of 4 automobiles

Spending habits of five groups of students

Productivity of 4 different types of machine during a given period of time

Effectiveness of fitness programme on increase in stamina of 5 players

Page 13: Analysis of data in research

WHY ANOVA INSTEAD OF MULTIPLE T TEST?

• If there are more than two groups, why not just do several two-sample t-tests to compare the mean of one group with the mean of each of the other groups?

• The problem with the multiple t-tests approach is that as the number of groups increases, the number of two sample t-tests also increases.

• As the number of tests increases the probability of making a Type I error also increases.

Page 14: Analysis of data in research

ANOVA HYPOTHESES

• The Null hypothesis for ANOVA is that the means for all groups are equal.

• The Alternative hypothesis for ANOVA is that at least two of the means are not equal.

Page 15: Analysis of data in research

ONE-WAY ANOVA AND TWO-WAY ANOVA

Page 16: Analysis of data in research

What is 1-way ANOVA and 2-way ANOVA?

• If we take only one factor and investigate the differences among its various categories, each having numerous possible values, it is called One-way ANOVA.

• If we investigate two factors at the same time, we use Two-way ANOVA.

One-way ANOVA

Training Type | Productivity
Advanced | 200
Advanced | 193
Advanced | 207
Intermediate | 172
Intermediate | 179
Intermediate | 186
Beginners | 130
Beginners | 125
Beginners | 119

Two-way ANOVA

Gender | Educational Level | Marks
Male | School | 89
Male | College | 50
Male | School | 90
Male | College | 80
Female | College | 50
Female | University | 40
Female | School | 91
Female | University | 56

Page 17: Analysis of data in research

HOW ANOVA WORKS?

• Three methods used to dissolve a powder in water are compared by the time (in minutes) it takes until the powder is fully dissolved. The results are summarized in the following table:

• It is thought that the population means of the three methods, μ1, μ2 and μ3, are not all equal (i.e., at least one μ is different from the others). How can this be tested?

Page 18: Analysis of data in research

• One way is to use multiple two-sample t-tests, comparing all the pairs:
  – Method 1 with Method 2
  – Method 1 with Method 3
  – Method 2 with Method 3
• But if each test is run at the 0.05 level, the probability of making at least one Type I error across the three tests is higher than 0.05.
• A better method is ANOVA (analysis of variance).
• The technique requires the analysis of different forms of variance, hence the name.

Important: ANOVA is used to show that the means are different, not that the variances are different.
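The growth in Type I error risk can be illustrated numerically. The sketch below assumes the t-tests are independent, which pairwise tests on shared samples are not exactly, so the figures are indicative rather than exact. The function name `familywise_error_rate` is ours and does not come from the slides.

```python
from math import comb


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one Type I error across num_tests tests,
    assuming the tests are independent (a simplifying assumption)."""
    return 1 - (1 - alpha) ** num_tests


alpha = 0.05
for num_groups in (3, 4, 5):
    num_tests = comb(num_groups, 2)  # number of pairwise two-sample t-tests
    rate = familywise_error_rate(alpha, num_tests)
    print(f"{num_groups} groups -> {num_tests} tests -> error rate {rate:.3f}")
```

For the three dissolving methods, three tests at the 0.05 level give a combined risk of about 0.143 under this assumption, nearly three times the intended 5%.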

Page 19: Analysis of data in research

• ANOVA compares two types of variance:
  – The variance within each sample
  – The variance between different samples

• The black dotted arrows show the per-sample variation of the individual data points around the sample mean (the variance within).

• The red arrows show the variation of the sample means around the grand mean (the variance between).

Page 20: Analysis of data in research

STEPS FOR USING ANOVA

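Step 1 : State Null and Alternate Hypothesis

Null Hypothesis H0 : μ1 = μ2 = μ3 = … = μk

Alternate Hypothesis Ha : at least two of the means μ1, μ2, …, μk are not equal

Step 2 : Compute Variance Between the samples

1. Calculate the mean of each sample (x̄1, x̄2, x̄3, …, x̄k)

2. Calculate the mean of the sample means:

$$\bar{\bar{x}} = \frac{\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \dots + \bar{x}_k}{k}$$

where k = total number of samples (when all samples have the same size, this equals the grand mean of all observations)

3. Calculate the Sum of Squares between the samples:

$$SS_{between} = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

where
n1 = total number of items in sample 1
n2 = total number of items in sample 2
n3 = total number of items in sample 3
…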

Page 21: Analysis of data in research

Step 3 : Compute Variance Within samples

1. Calculate the Sum of Squares within the samples:

$$SS_{within} = \sum_i (x_{1i} - \bar{x}_1)^2 + \sum_i (x_{2i} - \bar{x}_2)^2 + \sum_i (x_{3i} - \bar{x}_3)^2 + \dots + \sum_i (x_{ki} - \bar{x}_k)^2$$

Step 4 : Calculate total variance

$$SS_{total} = SS_{between} + SS_{within}$$

Step 5 : Calculate average variance between and within samples

$$MS_{between} = \frac{SS_{between}}{k - 1} \qquad MS_{within} = \frac{SS_{within}}{n - k}$$

where
n = total number of items in all samples
k = number of samples

Page 22: Analysis of data in research

Step 6 : Calculate F-ratio

$$F = \frac{MS_{between}}{MS_{within}}$$

Step 7 : Set up ANOVA table

Source of variation | Sum of squares (SS) | Degrees of freedom (d.f.) | Mean Squares (MS) | F-Value (Calculated)
Between Samples | SS Between | k − 1 | MS Between = SS Between / (k − 1) | F = MS Between / MS Within
Within Samples | SS Within | n − k | MS Within = SS Within / (n − k) |
Total | SS Total | n − 1 | |
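Steps 2 to 7 can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for SPSS output; the function name `one_way_anova` and the dictionary keys are our own choices, and the data are the Training Type productivity figures from the One-way ANOVA illustration earlier in these slides. The code uses the grand mean of all observations, which coincides with the mean of the sample means when the samples are of equal size.

```python
def one_way_anova(samples):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(samples)                       # number of samples
    n = sum(len(s) for s in samples)       # total number of items
    means = [sum(s) / len(s) for s in samples]
    grand_mean = sum(sum(s) for s in samples) / n

    # Step 2: variation of the sample means around the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Step 3: variation of each item around its own sample mean
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    # Step 4 (computed directly, as a cross-check on SS Between + SS Within)
    ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)

    # Steps 5 and 6: mean squares and F-ratio
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between, "df_between": k - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": n - k, "ms_within": ms_within,
        "ss_total": ss_total, "df_total": n - 1,
        "f_ratio": ms_between / ms_within,
    }


# Productivity by training type (Advanced, Intermediate, Beginners)
table = one_way_anova([[200, 193, 207], [172, 179, 186], [130, 125, 119]])

# Step 7: print the ANOVA table
print(f"Between: SS={table['ss_between']:.2f} df={table['df_between']} "
      f"MS={table['ms_between']:.2f} F={table['f_ratio']:.2f}")
print(f"Within:  SS={table['ss_within']:.2f} df={table['df_within']} "
      f"MS={table['ms_within']:.2f}")
print(f"Total:   SS={table['ss_total']:.2f} df={table['df_total']}")
```

The calculated F still has to be compared with the tabulated F for (k − 1, n − k) degrees of freedom before drawing a conclusion.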

Page 23: Analysis of data in research

Step 8 : Look for Table value of F

Steps:
1. Find the two degrees of freedom (one for between and one for within).
2. Denote x for between and y for within [F(x, y)].
3. In the F-distribution table, go along x columns and down y rows. The point of intersection is your tabulated F-ratio.

Decision Rule: Reject H0 if Calculated value of F > Tabulated value of F. Otherwise accept H0.

Or

Accept H0 if Calculated value of F < Tabulated value of F. Otherwise reject H0.

Page 24: Analysis of data in research

EXAMPLE

• Set up an analysis of variance table for the following per-acre production data for three varieties of wheat, each grown on 4 plots, and state whether the variety differences are significant.

• Test at the 5% level of significance.

Page 25: Analysis of data in research

H0 = The difference between varieties is not significant

Ha = The difference between varieties is significant

Page 26: Analysis of data in research

Interpretation:

Calculated Value of F < Table Value of F ∴ Accept Null Hypothesis

Difference in wheat output due to varieties is not significant and is just a matter of chance.

Page 27: Analysis of data in research

EXAMPLE

• Ranbaxy Ltd. has purchased three new machines of different makes and wishes to determine whether one of them is faster than the others in producing a certain output.

• Four hourly production figures are observed at random from each machine and the results are given below.

• Use ANOVA to determine whether the machines are significantly different in their mean speed.

Observations | M1 | M2 | M3
1 | 28 | 31 | 30
2 | 32 | 37 | 28
3 | 30 | 38 | 26
4 | 34 | 42 | 28
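The worked solution on the following slides did not survive extraction, so the arithmetic below is a reconstruction that follows the steps given earlier, with k = 3 machines and n = 12 observations. The 5% critical value is taken from standard F tables and the 5% level itself is an assumption, since the example does not state one.

```latex
\begin{align*}
\bar{x}_1 &= 31, \quad \bar{x}_2 = 37, \quad \bar{x}_3 = 28, \qquad
  \bar{\bar{x}} = \frac{31 + 37 + 28}{3} = 32 \\
SS_{between} &= 4(31 - 32)^2 + 4(37 - 32)^2 + 4(28 - 32)^2 = 4 + 100 + 64 = 168 \\
SS_{within} &= \underbrace{(9 + 1 + 1 + 9)}_{M1} + \underbrace{(36 + 0 + 1 + 25)}_{M2}
  + \underbrace{(4 + 0 + 4 + 0)}_{M3} = 20 + 62 + 8 = 90 \\
SS_{total} &= 168 + 90 = 258 \\
MS_{between} &= \frac{168}{3 - 1} = 84, \qquad MS_{within} = \frac{90}{12 - 3} = 10 \\
F &= \frac{84}{10} = 8.4
\end{align*}
```

With (2, 9) degrees of freedom the tabulated F at the 5% level is 4.26. Since 8.4 > 4.26, H0 is rejected: on these figures the machines differ significantly in mean speed.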

Page 28: Analysis of data in research

EXAMPLE

Page 29: Analysis of data in research

EXAMPLE

Page 30: Analysis of data in research

TWO WAY ANOVA

Page 31: Analysis of data in research

TWO WAY ANOVA

• The two-way ANOVA technique is used when the data are classified on the basis of two factors.

• For example, agricultural output may be classified on the basis of different varieties of seeds and also on the basis of different varieties of fertilizers used.

• Two types of 2-way ANOVA:
  – Without repeated values
  – With repeated values

Page 32: Analysis of data in research

STEPS IN 2-WAY ANOVA


Page 33: Analysis of data in research

STEPS IN 2-WAY ANOVA

SS for residual or error = total SS – (SS between columns + SS between rows)


Page 34: Analysis of data in research

STEPS IN 2-WAY ANOVA


Page 35: Analysis of data in research

STEPS IN 2-WAY ANOVA

Prepare ANOVA Table
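Most of the step slides for two-way ANOVA were images and are not preserved here. As a hedged sketch, the code below follows the common textbook correction-factor procedure for the case without repeated values, and it uses the one formula that did survive: residual SS = total SS − (SS between columns + SS between rows). The data are made up for illustration and are not from the slides; all variable names are our own.

```python
# Illustrative data (NOT from the slides):
# rows = 4 fertilizers, columns = 3 seed varieties, one yield per cell.
yields = [
    [6, 5, 5],
    [7, 5, 4],
    [3, 3, 3],
    [8, 7, 4],
]

r = len(yields)       # number of rows
c = len(yields[0])    # number of columns
n = r * c             # total number of items

grand_total = sum(sum(row) for row in yields)
correction_factor = grand_total ** 2 / n

# Total sum of squares
ss_total = sum(x ** 2 for row in yields for x in row) - correction_factor

# Sum of squares between columns and between rows
col_totals = [sum(row[j] for row in yields) for j in range(c)]
row_totals = [sum(row) for row in yields]
ss_columns = sum(t ** 2 for t in col_totals) / r - correction_factor
ss_rows = sum(t ** 2 for t in row_totals) / c - correction_factor

# Residual (error) sum of squares, as stated on the slide
ss_residual = ss_total - (ss_columns + ss_rows)

# Degrees of freedom, mean squares and the two F-ratios
df_columns, df_rows = c - 1, r - 1
df_residual = df_columns * df_rows
ms_columns = ss_columns / df_columns
ms_rows = ss_rows / df_rows
ms_residual = ss_residual / df_residual
f_columns = ms_columns / ms_residual
f_rows = ms_rows / ms_residual

print(f"Between columns: SS={ss_columns} df={df_columns} MS={ms_columns} F={f_columns}")
print(f"Between rows:    SS={ss_rows} df={df_rows} MS={ms_rows} F={f_rows}")
print(f"Residual:        SS={ss_residual} df={df_residual} MS={ms_residual}")
print(f"Total:           SS={ss_total} df={n - 1}")
```

Each F-ratio is then compared with its own table value: F(c − 1, residual d.f.) for the column factor and F(r − 1, residual d.f.) for the row factor.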

Page 36: Analysis of data in research

EXAMPLE

Page 37: Analysis of data in research
Page 38: Analysis of data in research
Page 39: Analysis of data in research

RESEARCH PROPOSAL

Page 40: Analysis of data in research

WHAT IS RESEARCH PROPOSAL?

A research proposal is a document that provides a detailed description of the intended program. It is like an outline of the entire research process that gives a reader a summary of the information discussed in a project.

Page 41: Analysis of data in research

WHAT IS RESEARCH PROPOSAL?

• A research proposal sets out:
  – The broad topic you want to research
  – What is it trying to achieve?
  – How would you do the research?
  – How much time would it need?
  – What results might it produce?

Page 42: Analysis of data in research

PURPOSE OF RESEARCH PROPOSAL

• Convince others that the research is worth doing

• Sell your idea to a funding agency

• Convince readers that the problem is significant and worth studying

• Show that the approach is new and will yield results

Page 43: Analysis of data in research

ELEMENTS OF RESEARCH PROPOSAL

Introduction

Statement of Problem

Purpose of the Study

Review of Literature

Questions and Hypothesis

The Design – Methods & Procedures

Limitations of the Study

Significance of the Study

References

Page 44: Analysis of data in research

FACTOR ANALYSIS

Page 45: Analysis of data in research

Observed variables: Color of Bike, Look, Masculine/Feminine, Mileage, Price, Maintenance Cost, Power, Speed, Control, Weight, Brand, Ease of delivery, Financial Assistance, Offer/Discounts, Tyre size, Disc Brake, Smooth Handling, Service Centers

Unobserved factors: Design, Cost, Technical, Comfort

Page 46: Analysis of data in research

FACTOR ANALYSIS

“Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.”

Page 47: Analysis of data in research

EXAMPLE

Academic ability of student

Quantitative Ability:
1. Maths Score
2. Computer Program Score
3. Physics Score
4. Aptitude Test Score

Verbal Ability:
1. English
2. Verbal Reasoning Score

Page 48: Analysis of data in research

PURPOSE OF FACTOR ANALYSIS

• To identify underlying constructs in the data.

• To reduce number of variables

• To reduce redundancy of data (E.g. Quantitative Aptitude)

Page 49: Analysis of data in research

APPLICATION OF FACTOR ANALYSIS

• Market Segmentation
• Product Research
• Advertising Studies
• Pricing Studies

Page 50: Analysis of data in research

Observed variables: X1 = Friendliness of Staff, X2 = Time Spent in Line-up, X3 = Assistance via Telephone

Unobserved factor: F1 = Service

Page 51: Analysis of data in research

[Diagram: observed variables X1, X2, X3, X4 linked to factors F1, F2, F3, F4, with coefficients labelled a1, b1, c1, d1]

Page 52: Analysis of data in research

WAYS OF FACTOR ANALYSIS

1. Confirmatory Factor Analysis
  – Factors and corresponding variables are already known
  – On the basis of literature review or past experience/expertise

2. Exploratory Factor Analysis
  – An algorithm is used to explore patterns among the variables
  – Then the factors are explored
  – No prior hypothesis to start with

Page 53: Analysis of data in research

CONDITIONS FOR FACTOR ANALYSIS

• Use interval or ratio data
• Variables are related
• Sufficient number of variables (min. 4-5 variables for one factor)
• Large number of observations
• All variables should be normally distributed

Page 54: Analysis of data in research

STEPS IN FACTOR ANALYSIS

Formulate the Problem

Construct the Correlation Matrix

Determine the method of Factor Analysis

Determine Number of Factors

Estimate the Factor Matrix

Rotate the Factors

Estimating Practical Significance
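The "Construct the Correlation Matrix" step can be sketched as follows. The scores are hypothetical and were invented only to echo the academic-ability example (quantitative versus verbal subjects); the helper `pearson` is our own small implementation of Karl Pearson's coefficient. Extracting and rotating the factors is normally left to a statistics package and is not attempted here.

```python
from math import sqrt


def pearson(x, y):
    """Karl Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of six students (illustrative data only)
scores = {
    "Maths": [90, 80, 70, 60, 50, 40],
    "Physics": [88, 82, 65, 62, 48, 45],
    "English": [55, 75, 60, 80, 65, 70],
    "Verbal Reasoning": [50, 78, 62, 77, 68, 72],
}

names = list(scores)
corr = {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}

# Print the correlation matrix, one row per variable
for a in names:
    print(f"{a:<17}" + " ".join(f"{corr[a][b]:6.2f}" for b in names))
```

In this made-up data, Maths and Physics correlate strongly with each other, as do English and Verbal Reasoning, while the cross-correlations are weak. That block pattern is what suggests two underlying factors.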

Page 55: Analysis of data in research

DISCRIMINANT ANALYSIS

Page 56: Analysis of data in research

EXAMPLE

• Basketballer or volleyballer on the basis of anthropometric variables.

• High or low performer on the basis of skill.

• Juniors or seniors category on the basis of the maturity parameters.

Page 57: Analysis of data in research

DEFINITION

“Discriminant analysis is a multivariate statistical technique used for classifying a set of observations into pre defined groups.”

Page 58: Analysis of data in research

OBJECTIVE

• To understand group differences and to predict the likelihood that a particular entity will belong to a particular class or group based on independent variables.

Page 59: Analysis of data in research

PURPOSE

• To classify a subject into one of the two groups on the basis of some independent traits.

• To study the relationship between group membership and the variables used to predict the group membership.

Page 60: Analysis of data in research

SITUATIONS FOR ITS USE

• When the dependent variable is dichotomous or multichotomous.

• Independent variables are metric, i.e. interval or ratio.

• Example:
  – Basketballer or volleyballer on the basis of anthropometric variables.
  – High or low performer on the basis of skill.
  – Juniors or seniors category on the basis of the maturity parameters.

Page 61: Analysis of data in research

ASSUMPTIONS

1. Sample size
  – Should be at least five times the number of independent variables.

2. Normal distribution
  – Each of the independent variables is normally distributed.

3. Homogeneity of variances / covariances
  – All variables have linear and homoscedastic relationships.

Page 62: Analysis of data in research

ASSUMPTIONS

• Outliers
  – Outliers should not be present in the data. DA is highly sensitive to the inclusion of outliers.

• Non-multicollinearity
  – There should not be any strong correlation among the independent variables.

• Mutually exclusive
  – The groups must be mutually exclusive, with every subject or case belonging to only one group.

Page 63: Analysis of data in research

ASSUMPTIONS

• Variability
  – No independent variable should have zero variability in either of the groups formed by the dependent variable.

Page 64: Analysis of data in research

To identify the players into different categories during selection process.

Page 65: Analysis of data in research
Page 66: Analysis of data in research

CLUSTER ANALYSIS

Page 67: Analysis of data in research

DEFINITION

• “Cluster analysis is a group of multivariate techniques whose primary purpose is to group objects (e.g., respondents, products, or other entities) based on the characteristics they possess.”

• It is a means of grouping records based upon attributes that make them similar.

• If plotted geometrically, the objects within a cluster will be close together, while different clusters will be far apart.

Page 68: Analysis of data in research

CLUSTER VS FACTOR ANALYSIS

Cluster analysis is about grouping subjects (e.g. people). Factor analysis is about grouping variables.

Cluster analysis is a form of categorization, whereas factor analysis is a form of simplification.

In cluster analysis, grouping is based on distance (proximity); in factor analysis, it is based on variation (correlation).

Page 69: Analysis of data in research

EXAMPLE

• Suppose a marketing researcher wishes to determine market segments in a community based on patterns of loyalty to brands and stores. A small sample of seven respondents is selected as a pilot test of how cluster analysis is applied. Two measures of loyalty, V1 (store loyalty) and V2 (brand loyalty), were measured for each respondent on a 0-10 scale.

Page 70: Analysis of data in research
Page 71: Analysis of data in research

HOW DO WE MEASURE SIMILARITY?

• Proximity Matrix of Euclidean Distances Between Observations

Observation | A | B | C | D | E | F | G
A | ---
B | 3.162 | ---
C | 5.099 | 2.000 | ---
D | 5.099 | 2.828 | 2.000 | ---
E | 5.000 | 2.236 | 2.236 | 4.123 | ---
F | 6.403 | 3.606 | 3.000 | 5.000 | 1.414 | ---
G | 3.606 | 2.236 | 3.606 | 5.000 | 2.000 | 3.162 | ---
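The slide with the respondents' actual V1 and V2 scores is not preserved, so the coordinates below are an assumption: they were chosen because they reproduce every distance in the proximity matrix above. With that caveat, the sketch shows how each entry is computed as a Euclidean distance between two (V1, V2) points.

```python
from math import dist

# Assumed (V1 = store loyalty, V2 = brand loyalty) scores; not taken from the slides,
# but consistent with every distance in the proximity matrix.
points = {
    "A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
    "E": (6, 6), "F": (7, 7), "G": (6, 4),
}

labels = sorted(points)

# Euclidean distance: sqrt((V1_p - V1_q)^2 + (V2_p - V2_q)^2)
proximity = {(p, q): dist(points[p], points[q]) for p in labels for q in labels}

# Print the lower triangle, matching the layout of the proximity matrix
for i, p in enumerate(labels):
    row = " | ".join(f"{proximity[(p, q)]:.3f}" for q in labels[:i])
    print(f"{p} | {row} | ---" if row else f"{p} | ---")
```

For example, the distance between A (3, 2) and B (4, 5) is the square root of 1² + 3² = 10, which is 3.162.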

Page 72: Analysis of data in research

HOW DO WE FORM CLUSTERS?

• Identify the two most similar (closest) observations not already in the same cluster and combine them.

• We apply this rule repeatedly to generate a number of cluster solutions, starting with each observation as its own “cluster” and then combining two clusters at a time until all observations are in a single cluster.

• This process is termed a hierarchical procedure because it moves in a stepwise fashion to form an entire range of cluster solutions. It is also an agglomerative method because clusters are formed by combining existing clusters.

Page 73: Analysis of data in research

AGGLOMERATIVE PROCESS AND CLUSTER SOLUTION

Step | Minimum Distance Between Unclustered Observations | Observation Pair | Cluster Membership | Number of Clusters | Overall Similarity Measure (Average Within-Cluster Distance)
Initial Solution | | | (A)(B)(C)(D)(E)(F)(G) | 7 | 0
1 | 1.414 | E-F | (A)(B)(C)(D)(E-F)(G) | 6 | 1.414
2 | 2.000 | E-G | (A)(B)(C)(D)(E-F-G) | 5 | 2.192
3 | 2.000 | C-D | (A)(B)(C-D)(E-F-G) | 4 | 2.144
4 | 2.000 | B-C | (A)(B-C-D)(E-F-G) | 3 | 2.234
5 | 2.236 | B-E | (A)(B-C-D-E-F-G) | 2 | 2.896
6 | 3.162 | A-B | (A-B-C-D-E-F-G) | 1 | 3.420
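The agglomerative steps in the table can be sketched in plain Python. Two assumptions are made here. First, the coordinates are not from the slides; they were chosen because they reproduce the proximity matrix shown earlier. Second, the rule "combine the two closest observations not already in the same cluster" is read as single linkage (nearest neighbour). The function name `single_linkage` is our own. The three merges at distance 2.000 are tied, so their order may differ from the table, but the merge distances and the resulting clusters agree.

```python
from itertools import combinations
from math import dist

# Assumed (V1, V2) scores, consistent with the proximity matrix (not from the slides)
points = {
    "A": (3, 2), "B": (4, 5), "C": (4, 7), "D": (2, 7),
    "E": (6, 6), "F": (7, 7), "G": (6, 4),
}


def cluster_distance(first, second):
    """Single linkage: smallest distance between any two members."""
    return min(dist(points[p], points[q]) for p in first for q in second)


def single_linkage(names):
    """Merge the two closest clusters repeatedly until one cluster remains."""
    clusters = [(name,) for name in sorted(names)]  # every observation starts alone
    merges = []
    while len(clusters) > 1:
        best = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1]))
        distance = cluster_distance(best[0], best[1])
        merged = tuple(sorted(best[0] + best[1]))
        clusters = [c for c in clusters if c not in best] + [merged]
        merges.append((distance, sorted(clusters)))
    return merges


merges = single_linkage(points)
for step, (distance, clusters) in enumerate(merges, start=1):
    membership = "".join("(" + "-".join(c) + ")" for c in clusters)
    print(f"{step} | {distance:.3f} | {membership} | {len(clusters)}")
```

The last column of the table (average within-cluster distance) is not computed in this sketch.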

Page 74: Analysis of data in research
Page 75: Analysis of data in research

• Dendrogram:

Graphical representation (tree graph) of the results of a hierarchical procedure. Starting with each object as a separate cluster, the dendrogram shows graphically how the clusters are combined at each step of the procedure until all are contained in a single cluster.

Page 76: Analysis of data in research

USAGE OF CLUSTER ANALYSIS

Market Segmentation: Splitting customers into different groups/segments where customers have similar requirements.

Segmenting industries/sectors:

Segmenting Markets: Cities or regions having common traits like population mix, infrastructure development, climatic condition etc.

Career Planning: Grouping people on the basis of educational qualification, experience, aptitude and aspirations.

Segmenting financial sectors/instruments: Grouping according to raw material cost, financial allocation, seasonality etc.

Page 77: Analysis of data in research

CONJOINT ANALYSIS

Page 78: Analysis of data in research

EXAMPLE

Page 79: Analysis of data in research

MEANING

• Concerned with understanding how people make choices between products or services, or combinations of product and service.

• Businesses can design new products or services that better meet customers' underlying needs.

• Conjoint analysis is a popular marketing research technique that marketers use to determine what features a new product should have and how it should be priced.

Page 80: Analysis of data in research

• Suppose we want to market a new golf ball. We know from experience and from talking with golfers that there are three important product features:
  1. Average Driving Distance
  2. Average Ball Life
  3. Price

Page 81: Analysis of data in research

TYPES OF CONJOINT ANALYSIS

1. Choice Based
  – Respondents select from grouped options

Page 82: Analysis of data in research

TYPES OF CONJOINT ANALYSIS

2. Adaptive Choice
  – It is used for studying how people make decisions regarding complex products or services
  – Packages adapt based on previous selections
  – It gets ‘smarter’ as the survey progresses

Page 83: Analysis of data in research

TYPES OF CONJOINT ANALYSIS

Page 84: Analysis of data in research

TYPES OF CONJOINT ANALYSIS

3. Menu-based
  – Respondents are shown a list of features and levels
  – They have to choose among options
  – Example: Airtel My Plan

Page 85: Analysis of data in research

TYPES OF CONJOINT ANALYSIS

Page 86: Analysis of data in research

4. Full profile rating based
  – Displays a series of product profiles
  – Typically rated on a likelihood-to-purchase or preference scale

Page 87: Analysis of data in research

5. Self explicate
  – Direct ask of features and levels
  – Each feature is presented separately for evaluation
  – Respondents rate all remaining features according to desirability

Page 88: Analysis of data in research

ADVANTAGES

• Estimates the psychological tradeoffs that consumers make when evaluating several attributes together

• Measures preferences at the individual level

• Uncovers real or hidden drivers which may not be apparent to the respondents themselves

• Realistic choice or shopping task

• Used to develop needs-based segmentation

Page 89: Analysis of data in research

DISADVANTAGES

• Designing conjoint studies can be complex

• With too many options, respondents resort to simplification strategies

• Respondents are unable to articulate attitudes toward new categories

• Poorly designed studies may over-value emotional/preference variables and undervalue concrete variables

• Does not take into account the number of items per purchase, so it can give a poor reading of market share

Page 90: Analysis of data in research

MULTIDIMENSIONAL SCALING

Page 91: Analysis of data in research

EXAMPLE

A researcher may give test subjects several varieties of apple and have them make comparisons on several criteria between two apples at a time. Once every variety has been directly compared with each other variety, the data is plotted on a graph that shows how similar one type is to another.

Page 92: Analysis of data in research

MEANING

• Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset.

• Multidimensional scaling is a method used to create comparisons between things that are difficult to compare.

• The end result of this process is generally a two-dimensional chart that shows a level of similarity between various items, all relative to one another.

Page 93: Analysis of data in research
Page 94: Analysis of data in research

APPLICATIONS OF MDS

• Understanding the position of brands in the marketplace relative to groups of homogeneous consumers.

• Identifying new products by looking for white space opportunities or gaps.

• Gauging the effectiveness of advertising by identifying the brand's position before and after a campaign.

• Assessing the attitudes and perceptions of consumers.

• Determining what attributes the brand owns and what attributes competitors own.

Page 95: Analysis of data in research

THANK YOU