Navigator IBM® SPSS® Statistics (A Vade Mecum for SPSS 19 on Windows 7)

SPSS 19 self study pack



Preface

This is an elementary guide to navigating IBM SPSS v19 on the Windows platform. Little or no previous knowledge of SPSS is assumed. The guide should be treated as an introduction to the tools available in IBM SPSS before embarking on statistical applications and their interpretation. Brief comments on data properties and the requirements for applying specific statistical techniques are outlined in the appendix.

September 2011
Royal Holloway, University of London

Further Readings:

Frequently Applied Statistical Techniques P Pal

Multivariate Statistical Analysis P Pal


Contents

Working with the Input Editor
Data View
Variable View
Data Manipulations
Pre-processing of Raw Data
Compute
Re-code
Case Selection
Simple Condition; Conjugate Condition
Split File
Working with the Output Editor
Controlling Output
Exporting Output to another Application

Appendix
Basic Statistics
Uncertainties
Frequently Applied Statistical Techniques
Mathematical Models


NAVIGATOR IBM SPSS 19

§ Working with the Input Editor

When SPSS is opened, the first window that appears on the screen is the Input Editor (Data View). At the top of this Editor two bars appear: the standard menu bar and the standard tool bar.

The main body consists of a spreadsheet with rows and columns. Like most application packages under Windows OS, there are arrows on the right-hand and bottom margins of the SPSS spreadsheet for scrolling up, down or sideways as required. At the bottom left-hand corner two tabbed buttons are located, namely Data View and Variable View.

The standard menu bar shows the following menu items:

Item        Used for                                            Usage
File        opening, saving, printing files                     (extensive)
Edit        copy, paste, editing files                          (extensive)
View        controlling the appearance of the software          (less extensive)
Data        organizing the data file, code labelling of values  (extensive)
Transform   recode, compute, defining new variables             (extensive)
Analyze     statistical applications                            (extensive)
Graphs      drawing different types of graphs                   (extensive)
Add-ons     plug-in additional applications                     (less frequent)
Utilities   accessing command languages                         (less frequent)
Window      activating a particular window                      (seldom)
Help        accessing SPSS help facilities on applications      (extensive)

Exhibit 1 The Input Editor (Data View)


The tool bar below the standard menu bar contains icons that serve as shortcuts to some of the standard menu functions. The function of an individual icon is displayed by hovering the mouse pointer over it.

Exhibit 1 shows five data values entered in each of three columns. By default, each value is shown with 2 decimal places as it is entered. The three active columns containing values are automatically labelled VAR00001, VAR00002, VAR00003 as values are typed in. The remaining columns remain passive with blank cells; these columns are labelled var (faintly displayed).

By clicking on the Variable View tab, the Input Editor switches to Variable View, showing the variables.

Exhibit 2 Input Editor (Variable View)

This window shows the list of variables entered in the Input Editor (Data View) together with information about their type and format, including names and other properties such as character width and decimal places. In the above Exhibit, the notable elements are: 3 variables with their names, their type (numeric or string; here numeric), width (8 characters), decimals (2 decimal places), and the alignment of data values (right-aligned).


§ Manipulation of Input data

Different techniques are available for the manipulation of input data. In the following sections, a selection of frequently used manipulation methods is outlined. These include: changing variable names, defining data properties, labelling code values, and increasing or decreasing decimal places.

Change Variable names:

Steps to Change Variable names (header labels)

Anchor mouse pointer to a required cell in the Name column

(Col 1 of the Variable View Editor)

Type the desired name;

Repeat process to all the variable cells as required.

Variable names must begin with a letter and have up to 8 characters (without spaces).

Data manipulation:

Usually the width of data values should not require any change. However, the default of 2 decimal places may need changing.

Steps to change decimal places in the data values

Anchor the mouse pointer to a cell in the Decimals column (an up and down arrow scroll button appears).

Click Up arrow to increase or Down arrow to decrease decimal places

in the data values.

Steps to change value labels

Values in some variables may consist of discrete numeric codes representing some

class names. These codes may be given their appropriate value labels.

Click on Data from the Menu bar and Choose Define Data Properties

(The variable scanning window appears)

Exhibit 3 Variable scanning window


Steps (contd)

Click on the desired variable from the list and transfer it to the right box
Click on Continue; the next window opens (see Exhibit below)
Click on the Variable appearing on the left.
Anchor the mouse pointer in the Label box (top right-hand corner) and type an elaborate label as desired;
Type in the labels in the boxes next to each Code value.
Finally OK.

Exhibit 4 Value edit window

§ Pre-processing of raw data

Pre-processing of raw data may consist of (i) direct arithmetic calculation, (ii) application of an in-built function, e.g. the Log10, Ln, exponential or trigonometric functions, or (iii) application of a statistical probability function, e.g. a CDF (cumulative distribution function).

Direct calculation may be based on a formula or an equation. The formula expression

may contain variables (entered as input data) and constants.

A variety of in-built functions are offered for transformation of raw data values. Any specific function may be selected from a list of functions. Data values contained in the selected variable(s) from the input list are used as the argument(s) of the selected function.


(a) Pre-processing by Direct calculation using an expression

Here constant numeric values, data variables, and operator symbols from the numeric key pad of the Compute dialog box are used.

Example 1: Calculate AGE from a raw data set containing the Month and Year of birth. The age is to be calculated with reference to the year 2011.

Suggested formula is

AGE = (2011 – Year) + (12 – Month)/12

Here AGE is the target variable into which the computed result values go, and the Year and Month variables are provided as raw data input variables (see Exhibit 5).

Exhibit 5 Compute Dialog box.

The operators (e.g. +, -, *, /), parentheses, and numeric constants are available from the key pad displayed in the Compute box; the input variables list appears in the left-hand pane.

Steps to compute a new variable

Transform

Compute (Compute dialog box appears)

Enter desired name in the Target box

Enter the desired formula to compute in

the Numeric Expression box

Finally OK
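The Compute formula above can be cross-checked outside SPSS. A minimal Python sketch of the same calculation (the function name is illustrative only, not part of SPSS):

```python
# Mirror of the Transform > Compute example:
# AGE = (2011 - Year) + (12 - Month)/12
def compute_age(year, month, ref_year=2011):
    return (ref_year - year) + (12 - month) / 12

# e.g. a candidate born in June 1980
print(compute_age(1980, 6))  # 31.5
```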


Note:

In the Transform --- Compute process (SPSS 19), entries made from the keyboard are recorded as a syntax document. This syntax document appears after the final OK is clicked. The sequence is illustrated with the following simple computational example.

Example: Conversion of Centigrade values to Fahrenheit scale.

Formula: Fahrenheit = (9/5)*Centigrade + 32

In the Compute process, we choose Fahrenheit as the Target variable and type the formula in the Numeric Expression box by picking the appropriate symbols and operators, and finally OK.

The syntax representation appears on the screen. This is shown in the Exhibit.

Exhibit 6 Syntax file on the foreground and Data file on the background.

The Syntax representation may be cancelled; the results (in this case the Fahrenheit-scale values) appear as a new variable as follows.
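The conversion itself can be sketched in a few lines of Python, using the standard Centigrade-to-Fahrenheit formula F = (9/5)*C + 32:

```python
def to_fahrenheit(centigrade):
    # Standard conversion: F = (9/5)*C + 32
    return (9 / 5) * centigrade + 32

for c in (0, 37, 100):
    print(c, "->", to_fahrenheit(c))
```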


Exhibit 7 The results after pre-processing are shown as the target variable.

(b) Computation by using a function

For computation of any in-built function on the data values, select the required function from the function group list (top half of the box) and select any procedure function from the special variables group (bottom half).

Example 2: (1) Compute Log10 of AGE and (2) the natural log of AGE.

The formulas for these computations are as follows:

LOG_AGE = LG10(AGE)
LNAGE = LN(AGE)

Here LOG_AGE and LNAGE are the target variables for the Log10 and natural log transformations respectively, and LG10 and LN are the corresponding functions to be applied to the variable AGE.

Steps

Transform

Compute (Compute dialog box appears)

Enter desired name in the Target box

Enter the desired function in Numeric Expression box

Enter required argument from the variable list

Finally OK.

(For entering a function, click on the up arrow button at the bottom right-hand corner of the key pad. The function is displayed and requires appropriate arguments. From the input variable list, highlight the appropriate variable and enter it as the required argument.)
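For reference, the LG10 and LN functions correspond to the common and natural logarithms; a small Python sketch of the same transformations:

```python
import math

def log_age(age):
    # LG10(AGE): base-10 logarithm
    return math.log10(age)

def ln_age(age):
    # LN(AGE): natural logarithm
    return math.log(age)

print(log_age(100.0))  # 2.0
```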

Exhibit 8 (Using Functions on Variables)

These pre-processed results (after applying the chosen functions) appear in the Input Editor as new variables with the names that were given as the Target variable in the Compute process.

Exhibit 9 (Input Editor displaying Computed Values)

Page 14: SPSS 19 self study pack

14

§ Re-Code

The Re-code tool is used to hold new specified (re-coded) values derived from an existing variable. The re-code mechanism allows a categorical variable to be formed from an existing (old) variable containing continuous data.

Steps to re-code

Transform

Re-code (select into different variable option)

(Re-code dialog box appears)

Exhibit 8 (Re-code Dialog box)

Steps (Contd).

Select the variable to be re-coded from the input variable list

Type an output variable in the Output variable box and click Change

Click Old and New values button

(Old and New values box appears)

Exhibit 10. (Old and New values Dialog box)

In this box three range selection options are available

(i) Range ---------through -----------

(ii) Range Lowest through ---- Value

(iii) Range Value ---- through Highest


Steps (contd)

Enter the limits of each range in the appropriate blank boxes
Type a desired value in the New value box, and click Add
(this new value is displayed in the Old --> New box)
Continue (return to the previous dialog box)
Finally OK.

Exhibit 11

The re-coded values appear in a new column in the Input Editor.
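The effect of Re-code can be sketched in Python; the cut points below are illustrative only, not taken from the text:

```python
def recode_age(age):
    # Re-code a continuous AGE value into discrete group codes
    # using three ranges (Lowest through 25, 26 through 35, 36 through Highest)
    if age <= 25:
        return 1
    elif age <= 35:
        return 2
    else:
        return 3

ages = [23, 19, 33, 42, 26, 28, 39, 27, 18, 31]
print([recode_age(a) for a in ages])  # [1, 1, 2, 3, 2, 2, 3, 2, 1, 2]
```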

Exhibit 11


§ Selection of a sub-set of data

It is often necessary to analyse the values contained in a variable by sub-groups, with the sub-sets specified in terms of some categorical variable, so two variables are involved in this process. One approach is the Split File tool, which organises the output by the specified sub-groupings. Alternatively, the Select Cases tool selects only the desired sub-set from the full data file; the selection may be based on simple conditions or compound conditions.

(a) Case selection with simple condition

Example 3. Given a sample of candidates with their gender and marital status, select

(a) the female candidates

(b) the female candidates who are married.

Calculate (a) the average age of the candidates (b) the average age of male and (c)

average age of female candidates.

Gender   Maritals   Age
1        2          23.00
2        2          19.00
2        2          33.00
2        2          42.00
1        1          26.00
1        1          28.00
1        2          39.00
2        2          27.00
2        2          18.00
2        1          31.00

Codes: Gender: male = 1, female = 2; Maritals: Unmarried = 1, Married = 2

Select female cases only (i.e. Gender = 2)

Steps

Data – Choose Select Cases (the Select Cases window appears)
This window shows the list of all the variables entered (left-hand pane), with All cases selected as the default
Choose the If condition satisfied option by clicking the adjacent circle to open the next window (see Exhibit)



Exhibit 12 (Select Cases Conditions Dialog box). The exhibit shows all cases are

selected.

Exhibit 13 (Condition applied)

Steps (contd)

Choose the appropriate variable (for the given Example, this is Gender)
Transfer it to the right-hand side box
Paste the = symbol, followed by 2 (i.e. Gender = 2)
Continue (return to the previous window)
This window now shows the instruction next to the If condition satisfied option
Finally OK.

The results of the condition are applied in the Input Editor by retaining the selected cases and crossing out the de-selected cases (see the left-hand margin and the filter_$ column).

Exhibit 14 (Input Editor showing the selection with a simple condition)
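The logic of the simple condition can be sketched in Python with the sample data from the example:

```python
# Cases as (gender, maritals, age); codes: gender 1 = male, 2 = female;
# maritals 1 = unmarried, 2 = married
cases = [(1, 2, 23.0), (2, 2, 19.0), (2, 2, 33.0), (2, 2, 42.0),
         (1, 1, 26.0), (1, 1, 28.0), (1, 2, 39.0), (2, 2, 27.0),
         (2, 2, 18.0), (2, 1, 31.0)]

# Equivalent of Select Cases with If condition Gender = 2
females = [c for c in cases if c[0] == 2]
print(len(females))  # 6 cases remain selected
```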

(b) Case selection with compound condition

With the sample example Select cases for female and married candidates.

This is a compound condition case. In order to apply the condition, two variables are

to be sought (i) gender from which gender = 2 condition is to be employed and

(ii) maritals from which maritals = 2 is to be employed.

The two separate conditions are combined by the & conjunction. See the Exhibit

below.


Exhibit 15 (Selection with conjugate conditions)

As a result, the Input Editor displays all cases which satisfy the compound condition and crosses out the remaining cases (see the margin and the filter_$ column of the Editor).

Exhibit 16 (Input Editor showing the sub-set selected with conjugate condition)
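The compound condition corresponds to combining the two comparisons with &; in Python:

```python
cases = [(1, 2, 23.0), (2, 2, 19.0), (2, 2, 33.0), (2, 2, 42.0),
         (1, 1, 26.0), (1, 1, 28.0), (1, 2, 39.0), (2, 2, 27.0),
         (2, 2, 18.0), (2, 1, 31.0)]

# Equivalent of the condition gender = 2 & maritals = 2
married_females = [c for c in cases if c[0] == 2 and c[1] == 2]
print(len(married_females))  # 5 cases satisfy both conditions
```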


§ Split Files

It is often necessary to analyse data from a data file (full set) by different categories (sub-sets). This is performed by specifying the sub-sets with an appropriate defining variable which contains the category codes.

Example: Given a sample of 10 male and female candidates with their ages.

Calculate (a) the average age of the candidates

(b) the average age of male and

(c) average age of female candidates.

Note: (b) and (c) are split file cases.

Gender   Age
1        42.00
2        34.00
2        32.00
2        23.00
1        28.00
1        31.00
1        43.00
2        36.00
1        27.00
1        42.00

Steps to Split file

From the Data menu select Split File (Split File box appears)

Choose Organize Output by groups and then OK

Note: Data file is sorted by the grouping variable

Exhibit 17


Results (a) Average of all candidates

Exhibit 18 Average age of all the candidates calculated

Average age is 33.80

(b) Average age of Male and Female candidates

Exhibit 20 (The results on the top window)
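The split-file results can be verified with a short Python sketch of the group means:

```python
# (gender, age) pairs from the example; gender: 1 = male, 2 = female
data = [(1, 42.0), (2, 34.0), (2, 32.0), (2, 23.0), (1, 28.0),
        (1, 31.0), (1, 43.0), (2, 36.0), (1, 27.0), (1, 42.0)]

ages = [age for _, age in data]
print(sum(ages) / len(ages))  # 33.8, the average age of all candidates

# Equivalent of Split File (organise output by gender groups)
for g in (1, 2):
    group = [age for sex, age in data if sex == g]
    print(g, sum(group) / len(group))  # males 35.5, females 31.25
```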


Appendix

§ Measurement scales

The majority of statistical analyses involve simple arithmetic operations on the data values. However, the types of admissible operations depend on the measurement scale of the data. Measurement scales of data values are of the following types:

Nominal               Discrete     Non-parametric
Ordinal/Rank/Likert   Discrete     Non-parametric
Interval/Ratio        Continuous   Parametric

Relative strength of 3 measurement scales

Weak <-----------------------------------------> Strong

Nominal Ordinal/Rank/Likert Interval/Ratio

Transformation from the Interval/Ratio scale to the Ordinal scale is possible by grouping Interval-scale values into discrete groups within ranges defined by the researcher (cf. the Re-code application).

Transformation from the Ordinal scale to the Interval scale is done by applying various transformations to the Ordinal-scale data, e.g. the log, square root or inverse transform.

Admissible Statistical Analyses

In general, the non-parametric variables (Nominal or Ordinal/Rank scale) define

categories. Hence these variables are called categorical or classification or grouping

or factor variables, containing the so-called non-metric or discrete data. Parametric

variables contain actual measurements or observations which represent metric or

continuous data.

The statistical techniques that may be applied to the data depend on the measurement scale of the data values. A list of admissible statistical techniques is given below.

__________Non-parametric__________             Parametric
Nominal                Ordinal                 Ratio/Interval

Mode                   Median                  Summary parameters:
Frequency              Frequency               Mean, Variance,
Non-parametric tests   Non-parametric tests    Skewness, Kurtosis
Contingency coeff      Spearman's correlation  Inferential statistics:
                                               ANOVA, MANOVA,
                                               Pearson's correlations,
                                               linear and non-linear
                                               regressions


§ Summary statistics

Statistical parameters are computed with a view to describing (or summarizing)

the properties of sampled data. Each parameter is a single-valued quantity. In

most cases, these parameters are computed for continuous scale data, with a few

for discrete data. A selection of these parameters is given below.

Definitions and Formulas of Summary Parameters for a sample

Mode: Mode is the most frequently occurring value in a collection of

observations. There may be more than one mode in a set of observations.

(Nominal)

Median: The median is the middle value in a range of observations: there are exactly the same number of observations greater than and less than the median value. With N observed values,

if N is odd, the median is the value in position (N + 1)/2, and
if N is even, the median is the average of the values in positions N/2 and N/2 + 1.

(Ordinal/Rank)

Arithmetic Mean Xbar: The (sample) mean of a sample of N observations X1, X2, ..., XN is given by

    Xbar = (1/N) * Σ Xi   (summing over i = 1, ..., N)

(Interval/Ratio)

Range: Range is the difference between the maximum and minimum values of the N observations, i.e.

    Range = Xmax - Xmin

Variance v: The variance of N observations represents the spread of the observed data set about the mean and is expressed as

    v = Σ (Xi - Xbar)^2 / (N - 1)   (summing over i = 1, ..., N)

(Interval/Ratio)

Standard Deviation s: The standard deviation is given by the square root of the variance:

    s = √v

(Interval/Ratio)


Skewness Sk: Skewness represents the presence or absence of symmetry of the distribution of the data set about the mean and is given by

    Sk = Σ (Xi - Xbar)^3 / (N * s^3)

(Interval/Ratio)

    Sk:            zero           -ve            +ve
    Distribution:  Symmetrical    Left skewed    Right skewed

Kurtosis Kr: Kurtosis represents the peakedness of a distribution:

    Kr = Σ (Xi - Xbar)^4 / (N * s^4)

(Interval/Ratio)
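The summary parameters defined above can be computed directly; a Python sketch using the same formulas (N - 1 in the variance denominator; N and the standard deviation in the skewness and kurtosis denominators):

```python
import math

def summary(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # sample variance
    sd = math.sqrt(var)                                # standard deviation
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)
    return mean, var, sd, skew, kurt

mean, var, sd, skew, kurt = summary([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean)       # 5.0
print(skew > 0)   # True: this sample is right skewed
```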


§ Summary statistics contd.

Quartiles: Quartiles divide a data set into quarters (4 equal parts).

The first quartile, Q1, is the median of the portion of the data set that lies at or below the median of the entire data set.

The second quartile, Q2, is the median of the entire data set.

The third quartile, Q3, is the median of the portion of the data set that lies at or above the median of the entire data set.

Note: The entire data set is arranged in ascending order for quartile evaluation.

Five number summary: Five number summary is the list of five summary

measures including the minimum, Q1, Q2 (median), Q3 and maximum of

the data set.

Note: The entire data set is arranged in ascending order for five number

evaluation.

A Box and Whisker plot gives a graphical representation of a data set based on the Five number summary, with the box spanning Q1 to Q3 and a line at the Median. The B and W plot also gives a visual indication of the skewness of the distribution of the data set: equal areas of the box on either side of the Median indicate a symmetrical (skew = 0) distribution of the data.

Percentiles: The Kth percentile is the value of the data at or below which K percent of the observations lie.
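The quartile convention above ("at or below / at or above the median") can be sketched as:

```python
def median(xs):
    xs = sorted(xs)
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(xs):
    # Minimum, Q1, Q2 (median), Q3, maximum
    xs = sorted(xs)
    m = median(xs)
    lower = [x for x in xs if x <= m]   # portion at or below the median
    upper = [x for x in xs if x >= m]   # portion at or above the median
    return min(xs), median(lower), m, median(upper), max(xs)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3, 5, 7, 9)
```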


§ Uncertainties associated with sample data

Confidence Interval CI

A confidence interval is a range of values, wa to wb, which is expected to include a statistical parameter θ (i.e. wa ≤ θ ≤ wb).

A sampling distribution is associated with a statistical parameter (e.g. mean, correlation coefficient, regression coefficient and so on) calculated from a sample data set, and the Central Limit Theorem states that this distribution is normal, with the well known bell shape and two tails (areas of error) on either side. By specifying a confidence level α (traditionally α is set at 95%) for the sample parameter, the regions of error are excluded. Taking both sides (two tails) of the distribution, this region is 1 - α, so the area of error on each side is (1 - α)/2. The critical value for the area of exclusion is usually looked up from a table of values at the desired error level (a t-distribution table).

a) Calculation of the CI for a mean (m)

Find the critical value for the error region (look up t0.025 from a t-distribution table). For large samples this value is 1.96 at the 95% confidence level, or 5% error level. Multiply the standard error (SE) of the mean by this value, which gives the required interval.

Note: the SE of a mean m is s/√N, where s is the standard deviation of the sample and N the sample size.

Thus the CI of a sample mean m is

    CI of m = m ± t0.025 * SE of m
            = m ± 1.96 * (s/√N)

b) Calculation of the CI for a proportion from a dichotomous set of values

Consider a sample of size N with dichotomous values. The proportion p of values in one dichotomy sub-set is f1/N and in the other sub-set is 1 - p.

Calculate the standard error SE of the proportion p. This is given by

    SE of a proportion = √( p(1 - p) / N )

The CI of a sample proportion p at the 95% level is

    CI of p = p ± t0.025 * SE of p
            = p ± 1.96 * √( p(1 - p) / N )
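Both interval calculations can be written out directly; a Python sketch with the 95% critical value 1.96 (the sample values are illustrative):

```python
import math

Z95 = 1.96  # t(0.025) for large samples, i.e. the 95% confidence level

def ci_mean(m, s, n):
    # CI of a mean: m +/- 1.96 * s/sqrt(N)
    half = Z95 * s / math.sqrt(n)
    return m - half, m + half

def ci_proportion(p, n):
    # CI of a proportion: p +/- 1.96 * sqrt(p(1-p)/N)
    half = Z95 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

print(ci_mean(33.8, 8.0, 100))   # approximately (32.23, 35.37)
print(ci_proportion(0.5, 100))   # approximately (0.40, 0.60)
```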


§ Inferential Statistics

The key objective of statistical analysis with a set of data is to draw some valid

inference from it. Two computational steps are involved in deriving such inference.

(a) Calculation of an appropriate test measure of the relevant hypothesis and

(b) Estimation of the confidence interval.

Inferential statistics comprises a wide range of tests which enable researchers to draw

inferences from their sample data set.

If an inference is to be drawn about a full population of data from a sample, certain conditions are to be satisfied. Associated with every statistical test there are two basic components, which specify the conditions for an appropriate inference. These include

i) A measurement requirement and

ii) A model.

Parametric and Non-parametric Tests:

A parametric test is a test which is applied to a set of data measured on the Interval or Ratio scale (continuous) or on counts (discrete). It is generally assumed that the data conform to some distribution, e.g. Normal or Poisson.

A non-parametric test is applied to a data set on the Ordinal scale (e.g. ranked data) or on the Nominal scale. In non-parametric tests, conformity of the data to any distribution is not assumed.

Power Efficiency of Parametric and Non-Parametric Tests

In general, inferences of a more general type are drawn from the application of statistical tests which demand weaker assumptions about the data. However, the test of the null hypothesis through these applications is less powerful for any particular sample of size N under consideration. This is true of any non-parametric test compared with a parametric test.

Thus, when two different tests (Test A and Test B) are applied to two samples of sizes Na and Nb, where Nb > Na, test B may be made as powerful as test A. This provides a scope for test B over test A.

Efficiency and Sample size

The extent of the increase in sample size needed to make a test (test B, say) as powerful as another test (test A, say) is linked with the power efficiency. This is defined in terms of the sample sizes used in the respective tests:


Power efficiency of test B = Na/Nb

Using the above formula, it is easy to calculate the sample size of test B needed to raise its power to the level of test A.

Considering the parametric and non-parametric tests for comparing the means of two groups (i.e. Student's t-test and the Mann-Whitney test), the power efficiency of the Mann-Whitney test is 95% relative to the t-test.

Similarly, among the parametric and non-parametric versions of analysis of variance (ANOVA), the power efficiency of the Kruskal-Wallis (non-parametric) test is again 95% relative to the F-test (parametric).

The same level of efficiency may be achieved for the two types of test (parametric and non-parametric) by drawing an appropriate sample size for each. For example, a researcher needs a sample of 20 cases for a Mann-Whitney test to be able to reject Ho with the same level of confidence as for every 19 cases with a t-test. Similarly, a sample of size 20 is needed in a Kruskal-Wallis test to be able to reject Ho for every 19 cases with the F-test.
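The 19-out-of-20 figures follow directly from the Na/Nb definition:

```python
def power_efficiency(na, nb):
    # Power efficiency of test B relative to test A = Na/Nb
    return na / nb

# Mann-Whitney (or Kruskal-Wallis) needs 20 cases for every 19 of the
# t-test (or F-test)
print(power_efficiency(19, 20))  # 0.95, i.e. 95%
```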


§ Hypothesis testing steps

A "no difference" statement is set up as the starting point (the null hypothesis Ho). The implicit aim of a null hypothesis is to negate what you wish to demonstrate with your sample data set.

Apply the appropriate statistical technique, which produces the test statistic value, degrees of freedom (df), and probability p (sig-value).

Compare the test statistic's p (sig) value with the chosen significance level.

If the p value is low (conventionally < 0.05), reject Ho with residual uncertainty proportional to p; otherwise accept Ho.

Significance level (p): The dichotomy

    p < 0.05?   yes --> Reject Ho
                no  --> Accept Ho

§ Common Errors

The common errors in arriving at a decision about Ho are of two types:

Type I Error To reject Ho, when in fact it is true.

Probability associated with Type I error = alpha

Type II Error To accept Ho when in fact it is false

Probability associated with Type II error = beta

The two probabilities alpha and beta are inversely related: if alpha is increased, beta is decreased.

These errors are associated with the sample size N of the data set, and they are reduced by increasing the sample size N.


§ Specific Parametric Tests.

Simple ANOVA applications

ANOVA (One Way or Univariate)

Data specification:

2 variables: one test variable which includes all the test scores (Interval/Ratio scale) and one categorical variable (Ordinal or Nominal scale) which defines the groups (sub-sets) of data on the test variable.

The key quantity that is computed in ANOVA is the F-statistic, together with the

degrees of freedom and the sig-value.

The F statistic (a single-valued result) is the ratio of the between-group (b-g) variation to the within-group (w-g) variation present in the test variable:

    F = [SS(b-g)/df(b-g)] / [SS(w-g)/df(w-g)]

where each sum of squares SS is divided by its degrees of freedom df to give a mean square.

Both the numerator and the denominator are single-valued scalar quantities for ANOVA applications, and hence the F-statistic is a single-valued quantity.
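The F computation can be sketched from raw group data; a minimal Python version (the two small groups are illustrative only):

```python
def anova_f(groups):
    # One-way ANOVA: F = [SS(b-g)/df(b-g)] / [SS(w-g)/df(w-g)]
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    k, n = len(groups), len(all_x)
    ss_bg = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_wg = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_bg / (k - 1)) / (ss_wg / (n - k))

print(anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))  # 13.5
```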

Simple MANOVA applications

GLM Multivariate Analysis of Variance (MANOVA)

The MANOVA technique is applied to analyse variances with 2 or more test variables (in contrast to ANOVA). These test variables, containing continuous data, are treated as the dependent variables and the categorical variable(s) as the independent variable(s). MANOVA computes the effects of the categorical (factor) variable on the dependent variables through the F-statistic. Normality assumptions on the test variables are not stringent.

In MANOVA with multiple test variables, the between-group (b-g) and within-group (w-g) variations are not single-valued scalar quantities but matrices. These matrices are known as the SSCP(bg) and SSCP(wg) matrices respectively.


The latent roots of this matrix represent the so-called eigenvalues. The F-statistic calculation from these eigenvalues proceeds along the following route.

The following equations relate the eigenvalues (λi) to Wilks' Λ and the F-statistic:

    Wilks' Λ = Π 1/(1 + λi)   (product over i)

    F = [(1 - Λ)/Λ] * [(n1 + n2 - p - 1)/p]

where n1 and n2 represent the numbers of cases in the two groups and p the number of test variables (this F conversion applies to the two-group case).

Other commonly used MANOVA test indices are also computed from the eigenvalues. These are given below:

    Pillai's Trace = Σ λi/(1 + λi)
    Hotelling's Trace = Σ λi
    Roy's max root = λmax/(1 + λmax)

    SSCP(b-g) matrix / SSCP(w-g) matrix --> compute eigenvalues --> compute Wilks' Λ
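The indices can be computed from a list of eigenvalues; a Python sketch (the eigenvalues here are illustrative):

```python
def manova_indices(eigenvalues):
    # Wilks' Lambda, Pillai's Trace, Hotelling's Trace and Roy's max root
    # computed from the eigenvalues of SSCP(bg) relative to SSCP(wg)
    wilks = 1.0
    for lam in eigenvalues:
        wilks *= 1 / (1 + lam)
    pillai = sum(lam / (1 + lam) for lam in eigenvalues)
    hotelling = sum(eigenvalues)
    roy = max(eigenvalues) / (1 + max(eigenvalues))
    return wilks, pillai, hotelling, roy

print(manova_indices([0.5, 0.25]))
```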


Specific MANOVA Applications

Variability of 2 or more test variables with 1 or more factor variables

Data Specification:

(i) 2 or more test variables (continuous) and
(ii) 1 or more categorical (factor) variables (discrete); a discrete variable may include 2 or more levels.

Repeated measures MANOVA (between groups)

Data Specification: Sub-sets of the test variables (Interval/Ratio scale) are to be entered in a 2-dimensional matrix format. The first element of the matrix represents the main factor and the second element denotes the factor levels (the levels indicate time repetition).

Repeated measures MANOVA (mixed within-between groups)

Sub-sets of the test variable (Interval/Ratio scale) are to be entered in a 1-dimensional matrix form with factor levels (the levels indicate time repetition).

§ BI-VARIATE RELATIONSHIPS

Pearson's Product-Moment Correlation (Parametric)

Data specification: 2 or more variables
Measurement scale: Ratio/Interval scale

Example: a correlation matrix for 3 variables A, B, C. The correlation coefficients appear as a matrix:

        A    B    C
    A   AA   AB   AC
    B   BA   BB   BC
    C   CA   CB   CC

The matrix is symmetric: AB = BA; AC = CA; BC = CB.

Range of the correlation coefficient r: -1 ≤ r ≤ +1

Qualitative description of the strength of r:

Range               Relationship
±0.1 to ±0.3        weak
±0.31 to ±0.5       medium
±0.51 to ±0.7       moderate
±0.71 to ±1.0       strong


§ MATHEMATICAL MODELS

An aspect of rationality is the possibility of going beyond purely intuitive

judgements to the use of structural models that provide methods of

aggregating intuitive judgements.

P Suppes

Lucie Stern Professor of Philosophy

Stanford University California.

§ General Linear Model Family

§ GLM Prediction Model

Simple Linear Regression

A simple linear regression is formed with one dependent variable y and one independent variable x1, and the equation takes the form:

    y = constant + b1*x1

The coefficient b1 represents the slope of the regression line, and the constant is the intercept of the regression line with the vertical axis. The coefficient may have a positive (+ve) or negative (-ve) value. A +ve coefficient means that as the independent variable x increases, the dependent variable y also increases; a -ve coefficient means that as x increases, y decreases. The magnitude of the coefficient gives a measure of the steepness of the regression line.
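The constant and slope can be obtained by a least-squares fit; a minimal Python sketch with illustrative data:

```python
def fit_line(xs, ys):
    # Least-squares fit of y = constant + b1*x
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    return my - b1 * mx, b1   # (constant, slope b1)

# Points lying exactly on y = 1 + 2x
print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))  # (1.0, 2.0)
```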

The GLM family includes Regression Analysis, Discriminant Analysis, Factor Analysis, and Principal Component Analysis.


Multiple Linear Regression

Regression is an example of a GLM structure with the objective of prediction. In multiple regression, a linear equation is constructed by least-squares fitting using multiple independent (explanatory) variables X1, X2, X3, ... and one dependent variable Y. The equation is expressed as follows:

    Y = constant + b1*X1 + b2*X2 + b3*X3 + ...

In the above equation, the coefficients (b1, b2, etc.) associated with the independent variables are partial regression coefficients. Each coefficient or parameter should be regarded as an estimate of the measure of influence of a particular independent variable on the dependent variable. These coefficients may be +ve or -ve.

Data Specification

Usually all the variables (independent and dependent) contain continuous

(Interval/Ratio scale) data.

In many regression applications, some of the independent variables may

contain Ordinal /Rank scale data.

In special applications (e.g. regression used as a classification model), the

dependent variable may contain categorical data (Nominal or Ordinal/Rank scale)

Terms and Definitions

Partial Regression Coefficient - The coefficient associated with each independent variable (e.g. b1, b2) is called the partial regression coefficient. These are measures of the influence of the independent variables on the dependent variable. They are unstandardised coefficients.

Standardised Coefficients - Owing to the presence of different scales of values in the independent variables, it is necessary to transform the unstandardised coefficients to produce the standardised coefficients, which give the measure of the relative influence of each regression coefficient in comparable units. This is given by

    β = b * (Sx/Sy)

where β is the standardised coefficient, b is the unstandardised coefficient for a particular independent variable X, Y is the dependent variable, and Sx and Sy are their respective standard deviations.
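The transformation is a one-line computation; for example, in Python (the values are illustrative):

```python
def standardised(b, sx, sy):
    # beta = b * (Sx / Sy)
    return b * sx / sy

print(standardised(2.0, 1.5, 3.0))  # 1.0
```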

Coefficient of Multiple Determination R2 - This is the measure of the proportion of the variation in the dependent variable explained by the independent variables.


§ The Ballantine for Multiple Regression

The Ballantine model provides a diagrammatic representation of the influence of the explanatory variables on the dependent variable. The variables involved (dependent and independent) in the multivariate regression analysis are depicted as mutually overlapping circles. The overlapping sections of these circles represent the extent of the partial contributions of the independent variables to the dependent variable and among themselves.

[Diagram: three mutually overlapping circles representing Y, X1 and X2, with overlap regions labelled A, B, C and D]

Three circles (above) represent 3 variables Y, X1 and X2. There are 4 overlap regions

shown by A, B, C, and D.

'A' represents the overlap between X1 and Y (independent of X2) equivalent to b1

'B' represents the overlap between X2 and Y (independent of X1) equivalent to b2

'C' represents the overlap between X1, X2 and Y

The ratio of the sum 'A' + 'B' + 'C' to the total area of the 'Y' circle is equivalent to R2,

the coefficient of multiple determination.

'D' represents the overlap between X1 and X2 (independent of Y) and is equivalent to the partial correlation of X1 and X2.