
MULTIVARIATE DATA ANALYSIS: A GLIMPSE

FDP conducted @ NITC – Calicut

By Arun Kumar. S

Quote of the day…

Statistics is the science that addresses the twin question: what data should be collected, and, once collected, how should they be analyzed?

- W.G. HUNTER

Can Statistics be trusted?

“There are three kinds of lies: lies, damned lies, and statistics.” -- Mark Twain

Doing research (PhD)…

• No end
• Time constraints
• Age constraints
• Your own capability
• Importance of friends/family . . .
• Improving writing and analytical (research methodology) skills

Steps in Research process

• Course work
• Mini project
• Literature survey
• Choosing a research problem (accomplishable research)
• Research plan
• Prepare questionnaire
• Collect data
• Analyze data
• Publish or perish
• Thesis writing

Which test should I use?

Indicators

Multivariate analysis

• An extension to univariate (with a single variable) and bivariate (with two variables) analysis

• Dealing with a number of samples and species/environmental variables simultaneously

Multivariate analysis techniques

• Multiple Regression Analysis
• Logistic Regression Analysis
• Discriminant Analysis
• Multivariate Analysis of Variance (MANOVA)
• Factor Analysis
• Cluster Analysis
• Multi-dimensional Scaling
• Correspondence Analysis
• Conjoint Analysis
• Canonical Correlation
• Structural Equation Modelling

Multivariate Data Set

• Morphological measurement of organisms (e.g. length)

• Physiological measurement of organisms (e.g. blood pressure)

• Physicochemical measurement of the environment (e.g. air temperature)

• Species abundance

• Species richness etc……

Why MVA?

• Situation 1: A harried executive calls you into his office and shows you three proposed advertising campaigns for next year. He asks, “Which one should I use? They all look pretty good to me.”

• Situation 2: During the annual budget meeting, the sales manager wants to know why two of his main competitors are gaining share. Do they have better widgets? Do their products appeal to different types of customers? What is going on in the market?

Multidimensional Scaling

• A mathematical dimension reduction technique that maps the distances between observations in a high dimensional space into a lower (for example, two) dimensional space.

• There are two types of MDS: metric and non-metric

• Metric MDS
– Assumes the input data are measured on an interval or ratio scale (quantitative)

• Non-metric MDS (nMDS)
– The data should be in the form of ranks (quantitative and/or qualitative)

Data Type….

Major Advantages of nMDS

• Ordination is based on the ranked similarities/dissimilarities between pairs of samples. The actual data values are not used in the ordination, so few assumptions are made about the nature and quality of the data.

• Ordinal data can be used, e.g. 1 = very low; 2 = low; 3 = mid; 4 = high; 5 = very high

How does MDS Work?

MDS attempts to locate the n observations in a reduced-dimensional space so that the differences between pairs of points in this reduced space match, as closely as possible, the true ordered differences between the observations.
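The idea of matching ordered differences can be illustrated with a small check rather than a full MDS fit. The sketch below, using only the Python standard library, compares the rank order of pairwise distances in the original space with the rank order in a candidate low-dimensional layout (Spearman rank correlation). The function names and the toy coordinates are assumptions made for illustration; this does not fit an MDS solution, it only scores how well a given layout preserves the ordering.

```python
from itertools import combinations
from math import dist


def pairwise_distances(points):
    """Euclidean distance for every pair of points, in a fixed pair order."""
    return [dist(a, b) for a, b in combinations(points, 2)]


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def rank_agreement(original, reduced):
    """Spearman correlation between the two sets of pairwise distances."""
    r_orig = ranks(pairwise_distances(original))
    r_red = ranks(pairwise_distances(reduced))
    n = len(r_orig)
    mean_o = sum(r_orig) / n
    mean_r = sum(r_red) / n
    cov = sum((a - mean_o) * (b - mean_r) for a, b in zip(r_orig, r_red))
    var_o = sum((a - mean_o) ** 2 for a in r_orig)
    var_r = sum((b - mean_r) ** 2 for b in r_red)
    return cov / (var_o * var_r) ** 0.5


# Toy data (hypothetical): four observations measured on three variables.
original = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 3, 0.1)]
# A layout that keeps the first two coordinates preserves the ordering.
good_2d = [(x, y) for x, y, _ in original]
# A scrambled layout does not.
poor_2d = [(0, 0), (7, 0), (1, 0), (3, 0)]

print(rank_agreement(original, good_2d))
print(rank_agreement(original, poor_2d))
```

A value of 1 means the layout reproduces the rank order of the original dissimilarities exactly, which is what nMDS tries to approach; an actual nMDS routine would search for the layout instead of being handed one.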

Data Envelopment Analysis (DEA)

• DEA, initiated by Charnes, Cooper and Rhodes (CCR) (1978) and building on Farrell’s (1957) work, was further generalized by Banker, Charnes and Cooper (BCC) (1984).

• A non-parametric linear programming technique
• Measures the relative efficiency of decision-making units (DMUs)
• The method utilizes linear programming to ‘envelop’ observed data and then calculates efficiency based on the distance a unit is from the relatively efficient frontier (the envelope).

• Widely used in recent literature

• A DEA model can be constructed either to minimise inputs or to maximise outputs. An input orientation aims at reducing the input amounts as much as possible while keeping at least the present output levels, while an output orientation aims at maximising output levels without increasing use of inputs (Cooper et al., 2000).
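The two orientations can be written as linear programs. The formulation below is the standard constant-returns (CCR) envelopment form, given here as a sketch; the notation is an assumption introduced for illustration and does not come from the slides: n DMUs indexed by j, inputs x_{ij} (i = 1..m), outputs y_{rj} (r = 1..s), the evaluated DMU indexed by o, and weights \lambda_j.

```latex
% Input-oriented: shrink inputs by the factor \theta, keep outputs at least as high
\begin{aligned}
\min_{\theta,\lambda}\quad & \theta \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta\, x_{io}, \quad i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, \quad r = 1,\dots,s \\
& \lambda_j \ge 0, \quad j = 1,\dots,n
\end{aligned}

% Output-oriented: expand outputs by the factor \phi, use no more input
\begin{aligned}
\max_{\phi,\lambda}\quad & \phi \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} \le x_{io}, \quad i = 1,\dots,m \\
& \sum_{j=1}^{n} \lambda_j y_{rj} \ge \phi\, y_{ro}, \quad r = 1,\dots,s \\
& \lambda_j \ge 0, \quad j = 1,\dots,n
\end{aligned}
```

A DMU is efficient in the input-oriented model when \theta = 1. The BCC (variable returns to scale) version adds the constraint \sum_{j} \lambda_j = 1.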

Steps in DEA

• Identify the objective function
• Identify the inputs and outputs
• Collect the data
• Make the assumptions of the model
– Input oriented or output oriented
• Select the software to run DEA
• Report the results

Objectives of DEA

• To measure the technical efficiency of different departments in XYZ university in the year 2005-06

• To measure the allocative efficiency of a sample of manufacturing firms in India during the period 2009-2010

• To measure the scale efficiency of a sample of nationalized banks in Trichy city in 2010-2011

Can find out . . .

• A single efficiency score
• It highlights the areas of improvement for each single DMU
• Able to identify whether a DMU has used input excessively or its output has been under-produced
• The set of efficient DMUs is called the reference set
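These outputs can be seen in the simplest possible case: one input, one output and constant returns to scale. In that case the DEA score reduces to each unit's output/input ratio divided by the best ratio in the sample, so no linear programming solver is needed. The sketch below assumes that special case; the function name and the four DMUs are hypothetical, and with several inputs or outputs a linear program has to be solved for each DMU instead.

```python
def single_ratio_efficiency(dmus):
    """Relative efficiency for DMUs given as name -> (input, output).

    Valid only for one input, one output and constant returns to scale.
    """
    productivity = {name: out / inp for name, (inp, out) in dmus.items()}
    best = max(productivity.values())
    return {name: p / best for name, p in productivity.items()}


# Hypothetical data: (input used, output produced) for four DMUs.
dmus = {"A": (2.0, 1.0), "B": (3.0, 3.0), "C": (4.0, 2.0), "D": (5.0, 5.0)}

scores = single_ratio_efficiency(dmus)

# Efficient units (score of 1) form the reference set.
reference_set = sorted(n for n, s in scores.items() if abs(s - 1.0) < 1e-9)

# Input-oriented reading: how much input each unit could give up
# while still producing its current output.
input_excess = {n: dmus[n][0] * (1.0 - s) for n, s in scores.items()}

print(scores)
print(reference_set)
print(input_excess)
```

Units B and D define the frontier here, and the scores of A and C are relative to them only; adding or dropping a DMU can change every score.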

Conditions for implementing DEA

• DMUs must be sufficiently similar so that comparisons are meaningful.
• A similar range of resources is available to all the units and they operate in a similar environment.

• It is to be noted that those DMUs indicated as efficient are only efficient in relation to others in the sample.

• Efficiency scores from DEA analyses are known to be highly sensitive to both the choice of input and output measures and the inclusion or exclusion of key variables. A serious drawback of DEA is that it does not provide tests of significance of the input or output variables included in the model.

DEA software

• Efficiency Measurement Systems (EMS)
• Data Envelopment Analysis Program (DEAP), Tim Coelli, 1996
• MS Excel Spreadsheet for DEA computations

Rule of thumb

• DMUs must be homogeneous units or ones with similar objectives.

• The number of DMUs is expected to be larger than the product of the number of inputs and outputs in order to discriminate effectively between efficient and inefficient DMUs.

• The sample size should be at least 2 or 3 times larger than the sum of the number of inputs and outputs.
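The two numeric rules of thumb are easy to check before running a model. A minimal sketch follows; the function name and the `multiple` parameter are assumptions, with `multiple` set to 2 or 3 to reflect the "2 or 3 times" guideline.

```python
def dmu_count_ok(n_dmus, n_inputs, n_outputs, multiple=3):
    """Check both sample-size rules of thumb for a DEA model."""
    # Rule 1: more DMUs than (number of inputs x number of outputs).
    product_rule = n_dmus > n_inputs * n_outputs
    # Rule 2: at least `multiple` times (number of inputs + number of outputs).
    sum_rule = n_dmus >= multiple * (n_inputs + n_outputs)
    return product_rule and sum_rule


print(dmu_count_ok(20, 3, 2))              # 20 DMUs, 3 inputs, 2 outputs
print(dmu_count_ok(10, 3, 2))              # fails the stricter 3x rule
print(dmu_count_ok(10, 3, 2, multiple=2))  # passes with the looser 2x rule
```

If the check fails, the usual options are to collect more DMUs or to reduce the number of input and output variables.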
