
SPSS TUTORIAL

Basic Data Entry

Name

The name of each SPSS variable in a given file must be unique; it must start with a letter; and it may have up to 8 characters (letters, numbers, and the underscore _). Note that certain keywords are reserved and may not be used as variable names (e.g., "compute", "sum", and so forth). To change an existing name, click in the cell containing the name, highlight the part you want to change, and type in the replacement. To create a new variable name, click in the first empty row under the name column and type a new (unique) variable name.

Notice that we can use "cat_dog" but not "cat-dog" and not "cat dog". The hyphen gets interpreted as subtraction (cat minus dog) by SPSS, and the space confuses SPSS as to how many variables are being named.
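For those working in syntax, renaming obeys the same rules; a minimal sketch (the starting name catdog is hypothetical):

* Rename an existing variable to a legal, more readable name.
RENAME VARIABLES (catdog = cat_dog).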

Type

The two basic types of variables that you will use are numeric and string. Numeric variables may only have numbers assigned. String variables may contain letters or numbers; even if a string variable happens to contain only numbers, numeric operations on that variable will not be allowed (e.g., finding the mean, variance, standard deviation, etc.). To change a variable type, click in that cell on the grey box with ...

Clicking on this box will bring up the variable type menu:

If you select a numeric variable, you can then click in the width box or the decimal box to change the default settings, which reserve 8 characters for displaying numbers with 2 decimal places. For whole numbers, you can drop the decimals down to 0.

If you select a string variable, you can tell SPSS how much "room" to leave in memory for each value, indicating the number of characters to be allowed for data entry in this string variable.
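In syntax, variable types are set when variables are declared; a minimal sketch (the names score and name are hypothetical):

* Declare a numeric variable displayed with width 8 and 2 decimals.
NUMERIC score (F8.2).
* Declare a string variable with room for 20 characters.
STRING name (A20).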

Width

The width of a variable is the number of characters SPSS will allow to be entered for the variable. If it is a numerical value with decimals, this total width has to include a spot for each decimal, as well as one for the decimal point. You can change a width by clicking in the width cell for the desired variable and typing a new number, or you can use the arrow keys at the edge of the cell.

Decimals

The decimals of a variable is the number of decimal places that SPSS will display. If more decimals have been entered (or computed by SPSS), the additional information will be retained internally but not displayed on screen. For whole numbers, you would reduce the number of decimals to zero. You can change the number of decimal places by clicking in the decimals cell for the desired variable and typing a new number, or you can use the arrow keys at the edge of the cell.
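Width and decimals can also be set in syntax with the FORMATS command; a minimal sketch, assuming a numeric variable named score:

* Display score as a whole number: width 8, zero decimals.
FORMATS score (F8.0).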

Label

The label of a variable is a string of text to identify in more detail what a variable represents. Unlike the name, the label may be up to 255 characters long and may contain spaces and punctuation. For instance, if there is a variable for each question on a questionnaire, you would type the question as the variable label. To change or edit a variable label, simply click anywhere within the cell.
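In syntax, labels are assigned with the VARIABLE LABELS command; a minimal sketch (the variable q1 and the label text are illustrative):

* Attach a descriptive label to a variable.
VARIABLE LABELS q1 'How satisfied were you with the hotel?'.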

Values

Although the variable label goes a long way to explaining what the variable represents, for categorical data (discrete data of both nominal and ordinal levels of measurement), we often need to know which numbers represent which categories. To indicate how these numbers are assigned, one can add labels to specific values by clicking on the ... box in the values cell.

Clicking here opens up the Value Labels dialogue box.

Click in the Value field to type a specific numeric value. Click in the Label field to type the corresponding label. Click on the Add button to add this pair of value and label to the list.

You can remove a pairing created above by clicking on that pair and then clicking on the Remove button. Similarly, you can change a pairing by clicking on the pair, then typing in a new value, a new label, or both; then, you click on the Change button. When you are satisfied with the definitions of each value, click on the OK button.
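The same value-label pairs can be defined in syntax; a minimal sketch (the variable name and codes are illustrative):

* Map numeric codes to category labels.
VALUE LABELS gender 1 'Male' 2 'Female'.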

Undesirable Variable Types

By Ruben Geert van den Berg, January 20, 2015, under 1.3 SPSS Data Preparation Tutorial.

SPSS has two variable types: string variables and numeric variables. String variables have “String” under Type in Variable View. All other variables are numeric. The screenshot below illustrates this point for hotel_evaluation.sav.

A problem with some data files is that they contain string variables that should have been numeric. A rule of thumb is that only nominal variables with many distinct values should be string variables. In Variable View we see that fname, bday, age and q1 are string variables. The screenshot below shows them in Data View.

First, fname holds respondents’ first names. Is it nominal? Yes. Does it have many different values? Yes. Conclusion: it's an appropriate string variable. No problem here.

Second, bday holds respondents’ birthdays. Is it nominal? No. Conclusion: this should have been a numeric variable. More precisely, it should be a date variable (which is also a numeric variable). Solution: convert it. Convert String to Date Variable shows how to do so, but we'll skip that for now.

Third, age is also a metric rather than a nominal variable and thus had better be converted to numeric as well. We'll cover this in SPSS Convert String to Numeric Variable, but we'll skip it for now.

Fourth, q1 appears to be an ordinal variable. It's not nominal and it doesn't have many distinct values either, so it's not a proper string variable. A labeled numeric variable (similar to q2, for example) would be appropriate here. For now, we'll skip converting it.
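One way to carry out such conversions in syntax is the ALTER TYPE command; a minimal sketch, assuming age contains only digits and bday holds dates written as yyyy/mm/dd (adjust the date format to match how the dates were actually entered):

* Convert a digits-only string variable to numeric.
ALTER TYPE age (F3.0).
* Convert a string holding dates such as 2015/01/20 to a date variable.
ALTER TYPE bday (SDATE10).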

Cronbach’s Alpha (reliability analysis)

Step 2: 

(a) Select "Analyze" 

(b) Select "Scale" 

(c) Select "Reliability Analysis"

Figure 2 shows what your screen should now display.

Figure 2: Reliability Analysis Command

Step 3: A pop-up window will appear for reliability analysis. In this window are two boxes, one to the left and one to the right. The box on the left contains the variables entered in SPSS (TV1, TV2, etc.); the box on the right, which is labeled "Items," is where one moves those variables for which Cronbach's alpha is desired. Note that I have selected the three Task Value variables in Figure 3.

 

Figure 3: Reliability Analysis Pop-up Window

 

In Figure 4, note that I have moved the three Task Value variables to the box on the right, for these are the three for which I desire Cronbach's alpha. Once we run this analysis, Cronbach's alpha will be calculated for the three Task Value variables (items) to provide information about the internal consistency of those three items. If we also wanted to obtain Cronbach's alpha for the Anxiety items, we would need to re-run the analysis with only the Anxiety items appearing in the "Items:" box. To run Cronbach's alpha with both sets of items, Task Value and Anxiety, would be a mistake because those six items are not designed to measure the same construct and the alpha that would result would be uninterpretable.

 

Figure 4: Reliability Analysis Pop-up Window

 

Step 4: Select desired statistics for the analysis. Click on the "Statistics" button which can be seen in Figure 4. Once that button is selected, a pop-up window labeled "Statistics" will appear. This window is displayed in Figure 5 below. Note in Figure 5 that I have placed a check mark next to "Scale" and "Scale if item deleted." You should also select those two. After selecting those two options, then click on the "Continue" button to return to the "Reliability Analysis" pop-up window displayed above in Figure 4, then click on the "OK" button to run the analysis. 

 

Figure 5: Statistical Options for Reliability Analysis
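For reference, the options selected in Steps 3 and 4 correspond to syntax along the following lines; a sketch using the TV1 to TV3 item names from this example (the scale label 'Task Value' is just illustrative):

* Cronbach's alpha for the three Task Value items,
* with scale statistics and item-total statistics.
RELIABILITY
  /VARIABLES=TV1 TV2 TV3
  /SCALE('Task Value') ALL
  /MODEL=ALPHA
  /STATISTICS=SCALE
  /SUMMARY=TOTAL.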

 

Step 5: Analysis of results.

(a) Overall alpha: Now that Cronbach's alpha has been run for the three Task Value items, we must next examine the results. Figure 6 below displays some of the results obtained. The red arrow points to the overall alpha for the three Task Value items. As the results in Figure 6 show, overall alpha is .907, which is very high and indicates strong internal consistency among the three Task Value items. Essentially this means that respondents who tended to select high scores for one item also tended to select high scores for the others; similarly, respondents who selected low scores for one item tended to select low scores for the other Task Value items. Thus, knowing the score for one Task Value item would enable one to predict with some accuracy the possible scores for the other two Task Value items. Had alpha been low, this ability to predict scores from one item would not be possible.

 

Figure 6: Statistical Results for Reliability Analysis (overall alpha highlighted)

 

(b) Corrected Item-Total Correlation: Figure 7 below highlights the column containing the "Corrected Item-Total Correlation" for each of the items. This column displays the correlation between a given Task Value item and the sum score of the other two items. For example, the correlation between Task Value item 1 and the sum of items 2 and 3 (i.e., item 2 + item 3) is r = .799. What this means is that there is a strong, positive correlation between the scores on the one item (item 1) and the combined score of the other two (items 2 and 3). This is a way to assess how well one item's score is internally consistent with composite scores from all other items that remain. If this correlation is weak (de Vaus suggests anything less than .30 is a weak correlation for item-analysis purposes [de Vaus (2004), Surveys in Social Research, Routledge, p. 184]), then that item should be removed and not used to form a composite score for the variable in question. For example, if the correlation between scores for item 1 and the combined scores of items 2 and 3 was low, say r = .15, then when we create the composite (overall) score for Task Value (the step taken after reliability analysis) we would create the composite using only items 2 and 3 and we would simply ignore scores from item 1 because it was not internally consistent with the other items.

 

Figure 7: Statistical Results for Reliability Analysis (Corrected Item-Total Correlation)

(c) Cronbach's Alpha if Item Deleted: Figure 8 displays the Cronbach's alpha that would result if a given item were deleted. Like the item-total correlation presented above in (b), this column of information is valuable for determining which items from among a set of items contribute to the total alpha. The value presented in this column represents the alpha value if the given item were not included. For example, Cronbach's alpha would drop from the overall total of .907 to .880 if Task Value item 1 were deleted. Since alpha would drop with the removal of TV1, this item appears to be useful and to contribute to the overall reliability of Task Value. Item 3, however, is less certain. Cronbach's alpha would increase from .907 to .911 if item 3 were deleted or not used for computing an overall Task Value score. So should this item be removed, and should the overall Task Value composite be created only from items 1 and 2? In this case the answer is no; we should instead retain all three items. Why? Note first that alpha does not increase by a large degree from deleting item 3. Second, note that item 3 still correlates very well with the composite score from items 1 and 2 (the item-total correlation for item 3 is .759). Since deletion of item 3 results in little change, and since item 3 correlates well with the composite of items 1 and 2, there is no statistical reason to drop item 3.

 

Figure 8: Statistical Results for Reliability Analysis (Cronbach's Alpha if item Deleted)

Once the reliability analysis is done and any problematic items have been deleted (or retained), the second step is to develop the scale by computing a composite score.
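A common choice is to average the retained items; a minimal sketch (the target name TaskValue is hypothetical):

* Composite Task Value score as the mean of the three retained items.
COMPUTE TaskValue = MEAN(TV1, TV2, TV3).
EXECUTE.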

Compute Variables

To compute a new variable, click Transform > Compute Variable.

The Compute Variable window will open where you will specify how to calculate your new variable.

A Target Variable: The name of the new variable that will be created during the computation.

Simply type a name for the new variable in the text field. Once a variable is entered here, you can click on “Type & Label” to assign a variable type and give it a label. The default type for new variables is numeric.

B The left column lists all of the variables in your dataset. You can use this menu to add variables into a computation: either double-click on a variable to add it to the Numeric Expression field, or select the variable(s) that will be used in your computation and click the arrow to move them to the Numeric Expression text field (C).

C  Numeric Expression: Specify how to compute the new variable by writing a numeric expression.

D  The center of the window includes a collection of arithmetic operators, Boolean operators, and numeric characters, which you can use to specify how your new variable will be calculated. There are many kinds of calculations you can specify by selecting a variable (or multiple variables) from the left column, moving them to the center text field, and using the blue buttons to specify values (e.g., “1”) and operations (e.g., +, *, /).

E  If: The If option allows you to specify the conditions under which your computation will be applied.

F  Function group: You can also use the built-in functions in the Function group list on the right-hand side of the window. The function group contains many useful, common functions that may be used for calculating values for new variables (e.g., mean, logarithm). To find a specific function, simply click one of the function groups in the Function Group list. You will now see a list of functions that belong to that function group in the Functions and Special Variables area. If you click on a specific function, a description of that function will appear in the text field to the left.

Click If (indicated by letter E in the above image) to open the Compute Variable: If Cases window.
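In syntax, a conditional computation is written with the IF command; a minimal sketch (the condition gender = 1 and all variable names are illustrative):

* Compute the average only for cases meeting the condition.
IF (gender = 1) AverageScore = (English + Reading + Math + Writing) / 4.
EXECUTE.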

EXAMPLE: COMPUTING A NEW VARIABLE USING ARITHMETIC

Now we will use what we have learned throughout this tutorial to demonstrate how to compute a new variable. In this example, we wish to compute a new variable called AverageScore that is the average of four test scores: the variables English, Reading, Math, and Writing.

1. Click Transform > Compute Variable.

2. In the Target Variable field, type a name for the new variable that will be computed. Let's call our new variable AverageScore.

3. Highlight each variable—English, Reading, Math, and Writing—from the list on the left and click the arrow to move each variable to the Numeric Expression field. (Alternatively, you can double-click on the variable name to move it to the Numeric Expression field.) Make sure you click the spacebar to create a space between each variable.  

4. Now your four variables will appear in the Numeric Expression field. Move your cursor between each set of variables and click the “+” sign to add the symbol for addition to the numeric expression. Now your expression should appear as English + Reading + Math + Writing.

5. Now insert parentheses around the expression so that it appears as (English + Reading + Math + Writing).

6. At the end of the expression, add the “/” sign and the number “4.” Now your expression should appear as (English + Reading + Math + Writing) / 4.

7. The final expression indicates that the new variable, AverageScore, will be calculated as the average of the four test scores.

8. Click OK to complete the computation and apply the changes to the data.

9. Finally, let’s make sure that a new variable called AverageScore was successfully created.

o We can find the new variable in the last column in Data View or in the last row of Variable View. If you do not see the new variable, the computation was unsuccessful.

o We can check the syntax that was executed by looking at the log in the Output Viewer window. After running Compute Variable, the syntax that should have appeared in the output window is:

o COMPUTE AverageScore=(English + Reading + Math + Writing) / 4.

EXECUTE.

If there was an error in how the computation was specified, the log in the Output Viewer will often show an error message.

o It is also useful to explore whether the computation you specified was applied correctly to the data. You can spot-check the computation by viewing your data in the Data View tab and manually calculating the averages for a few cases in your dataset to confirm that the computation worked correctly.

Recoding New Variables (Reverse Coding)

RECODE INTO DIFFERENT VARIABLES

Recoding into a different variable transforms an original variable into a new variable. That is, the changes do not overwrite the original variable; they are instead applied to a copy of the original variable under a new name.

To recode into different variables, click Transform > Recode into Different Variables.

The Recode into Different Variables window will appear.

The left column lists all of the variables in your dataset. Select the variable you wish to recode by clicking it. Click the arrow in the center to move the selected variable to the center text box, (A).

A  Input Variable -> Output Variable: The center text box lists the variable(s) you have selected to recode, as well as the name your new variable(s) will have after the recode. You will define the new name in (B).

B  Output Variable: Define the name and label for your recoded variable(s) by typing them in the text fields. Once you are finished, click Change. Now the center text box, (A), will display both the name of the original variable as well as the name for the new variable (e.g., “Height --> Height_categ”).

C  Old and New Values: Click Old and New Values to specify how you wish to recode the values for the selected variable.

D  If: The If option allows you to specify the conditions under which your recode will be applied. (We discuss the If option in more detail later in this tutorial.)

Old and New Values

Once you click Old and New Values, a new window where you will specify how to transform the values will appear.

1 Old Value: Specify the type of value you wish to recode (e.g., a specific value, missing data, or a range of values) and the specific value to be recoded (e.g., a value of “1” or a range of “1-5”).

When recoding variables, always handle the missing values first! The most common recoding errors come from not explicitly accounting for missing values, so that they end up lumped in with the valid values.

Value: Enter a specific numeric code representing an existing category.

System-missing: Applies to any system-missing values (.)

System- or user-missing: Applies to any system-missing values (.) or special missing value codes defined by the user in the Variable View window

Range: For use with ordered categories or continuous measurements. Enter the lower and upper boundaries that should be coded. The recoded category will include both endpoints, so data values that are exactly equal to the boundaries will be included in that category.

Range, LOWEST through value: For use with ordered categories or continuous measurements. Recode all values less than or equal to some number.

Range, value through HIGHEST: For use with ordered categories or continuous measurements. Recode all values greater than or equal to some number.

All other values: Applies to any value not explicitly accounted for by the previous recoding rules. If using this setting, it should be applied last.

2 New Value: Specify the new value for your variable (i.e., a specific numeric code such as “2,” system-missing, or copy old values).

3 Old -> New: Once you have selected the old and new values for your selected variable in (1) and (2), click Add in area (3), Old-->New. The recode that you have specified now appears in the text field. If you need to change one of the recodes that you have added to the Old-->New area, simply click on the one you wish to change and make changes in (1) and (2) as necessary.

You will need to repeat these steps for each value that you wish to recode. Once you have specified all the transformations that you wish to make for the selected variable, click the “Continue” button.
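In syntax, the same transformation is written with the RECODE command. Here is a minimal sketch of the reverse coding named in this section's title, assuming a 1-to-5 item called q1 (the names are hypothetical); note that the missing values are handled first, as advised above:

* Reverse-code a 1-5 item into a new variable, handling missing values first.
RECODE q1 (MISSING=SYSMIS) (1=5) (2=4) (3=3) (4=2) (5=1) INTO q1_rev.
EXECUTE.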

Descriptive Statistics

See the PDF file uploaded.

Inferential Statistics

Correlational Analysis

The Pearson product-moment correlation coefficient (Pearson’s correlation, for short) is a measure of the strength and direction of association that exists between two variables measured on at least an interval scale. For example, you could use a Pearson’s correlation to understand whether there is an association between exam performance and time spent revising, or whether there is an association between depression and length of unemployment, and so forth.

Click Analyze > Correlate > Bivariate... on the menu system as shown below:


You will be presented with the following screen:


Transfer the variables Height and Jump_Dist into the Variables: box by dragging-and-dropping or by clicking the arrow button. You will end up with a screen similar to the one below:


Note: If your study involves calculating more than one correlation and you want to carry out these correlations at the same time, we show you how to do this in our enhanced Pearson’s correlation guide. We also show you how to write up the results from multiple correlations.

Make sure that the Pearson tickbox is checked under the -Correlation Coefficients- area (although it is selected by default in SPSS).

Click the Options button. If you wish to generate some descriptives, you can do it here by clicking on the relevant tickbox under the -Statistics- area.


Click the Continue button.

Click the OK button.
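Pasting these selections produces syntax along the following lines; a sketch (shown here with the water and skin variables used in the output discussion below; substitute your own variable names):

* Pearson correlation between two variables, with two-tailed significance.
CORRELATIONS
  /VARIABLES=water skin
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.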

Correlations Box 

Take a look at the first box in your output file called Correlations. You will see your variable names in two rows. In this example, you can see the variable name ‘water’ in the first row and the variable name ‘skin’ in the second row. You will also see your two variable names in two columns. See the variable names ‘water’ and ‘skin’ in the columns on the right? You will see four boxes on the right hand side. These boxes will all contain numbers that represent variable crossings. For example, the top box on the right represents the crossing between the ‘water’ variable and the ‘skin’ variable. The bottom box on the left also happens to represent this crossing. These are the two boxes that we are interested in. They will have the same information so we really only need to read from one. In these boxes, you will see a value for Pearson’s r, a Sig. (2-tailed) value and a number (N) value.

Pearson’s r

You can find the Pearson’s r statistic in the top of each box. The Pearson’s r for the correlation between the water and skin variables in our example is 0.985.

 

When Pearson’s r is close to 1… This means that there is a strong relationship between your two variables. This means that changes in one variable are strongly correlated with changes in the second variable. In our example, Pearson’s r is 0.985. This number is very close to 1. For this reason, we can conclude that there is a strong relationship between our water and skin variables. However, we cannot make any other conclusions about this relationship, based on this number. 

When Pearson’s r is close to 0… This means that there is a weak relationship between your two variables. This means that changes in one variable are not correlated with changes in the second variable. If our Pearson’s r were 0.01, we could conclude that our variables were not strongly correlated.

When Pearson’s r is positive (+)… This means that as one variable increases in value, the second variable also increases in value. Similarly, as one variable decreases in value, the second variable also decreases in value. This is called a positive correlation. In our example, our Pearson’s r value of 0.985 was positive. We know this value is positive because SPSS did not put a negative sign in front of it. So, positive is the default. Since our example Pearson’s r is positive, we can conclude that when the amount of water increases (our first variable), the participant skin elasticity rating (our second variable) also increases.

When Pearson’s r is negative (-)… This means that as one variable increases in value, the second variable decreases in value. This is called a negative correlation. In our example, our Pearson’s r value of 0.985 was positive. But what if SPSS generated a Pearson’s r value of -0.985? If SPSS generated a negative Pearson’s r value, we could conclude that when the amount of water increases (our first variable), the participant skin elasticity rating (our second variable) decreases.

Sig (2-Tailed) value

You can find this value in the Correlations box. This value will tell you if there is a statistically significant correlation between your two variables. In our example, our Sig. (2-tailed) value is 0.002.

If the Sig (2-Tailed) value is greater than .05… You can conclude that there is no statistically significant correlation between your two variables. That means increases or decreases in one variable do not significantly relate to increases or decreases in your second variable.

If the Sig (2-Tailed) value is less than or equal to .05… You can conclude that there is a statistically significant correlation between your two variables. That means increases or decreases in one variable do significantly relate to increases or decreases in your second variable.

Our Example

The Sig. (2-Tailed) value in our example is 0.002. This value is less than .05. Because of this, we can conclude that there is a statistically significant correlation between amount of water consumed in glasses and participant rating of skin elasticity.

So what about the scatterplot?

You can find your scatterplot in your output file. It will look something like the graph below. You will see a bunch of dots. Your scatterplot can tell you about the relationship between variables, just like Pearson’s r. With it, you can determine the strength and direction of the relationship between variables.

Relationship strength

Try to imagine a line that connects the dots in your scatterplot. Is this an easy or difficult task? This task can help you determine the strength of the relationship between your two variables. If your variables have a strong relationship, it will be easy for you to imagine a line connecting all of the dots. For example, in our example scatterplots, the dots seem to go together to form a straight line. However, some scatterplots do not look like this. With some scatterplots, the dots are scattered about so that it is very hard to imagine a line connecting them. The dots are not densely positioned in one place. Instead, they are all over the place. When this is the case, your variables may not have a strong relationship.

Relationship Direction

You can use your scatterplot to understand the direction of your relationship. Your scatterplot can tell you if you have a positive, negative or zero correlation.

Positive correlation in a scatterplot If the line that you imagine in your graph slopes upward from zero, you can conclude that you have a positive correlation between your variables. Increases in one variable are correlated with increases in your other variable. Similarly, decreases in one variable are correlated with decreases in your other variable. 

Negative correlation in a scatterplot If the line that you imagine in your graph starts high at zero and gradually slopes downward, you can conclude that you have a negative correlation between your variables. Increases in one variable are correlated with decreases in your other variable.

Zero correlation in a scatterplot If the line that you imagine does not slope, or you can’t imagine a line at all, you can conclude that you have a zero correlation between your variables. That means that your variables are not related to one another. Increases or decreases in one variable have no effect on increases or decreases in your second variable.

Multiple Regression Analysis

Multiple regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more other variables. The variable we want to predict is called the dependent variable (or sometimes, the outcome, target or criterion variable). The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor, explanatory or regressor variables).

For example, you could use multiple regression to understand whether exam performance can be predicted based on revision time, test anxiety, lecture attendance and gender. Alternately, you could use multiple regression to understand whether daily cigarette consumption can be predicted based on smoking duration, age when started smoking, smoker type, income and gender.

Multiple regression also allows you to determine the overall fit (variance explained) of the model and the relative contribution of each of the predictors to the total variance explained. For example, you might want to know how much of the variation in exam performance can be explained by revision time, test anxiety, lecture attendance and gender "as a whole", but also the "relative contribution" of each independent variable in explaining the variance.

Click Analyze > Regression > Linear... on the main menu, as shown below:


Note: Don't worry that you're selecting Analyze > Regression > Linear... on the main menu or that the dialogue boxes in the steps that follow have the title, Linear Regression. You have not made a mistake. You are in the correct place to carry out the multiple regression procedure. This is just the title that SPSS Statistics gives, even when running a multiple regression procedure.

You will be presented with the Linear Regression dialogue box below:


Transfer the dependent variable, VO2max, into the Dependent: box and the independent variables, age, weight, heart_rate and gender, into the Independent(s): box, using the arrow buttons, as shown below (all other boxes can be ignored):


Note: For a standard multiple regression you should ignore the Previous and Next buttons as they are for sequential (hierarchical) multiple regression. The Method: option needs to be kept at the default value, which is Enter. If, for whatever reason, Enter is not selected, you need to change Method: back to Enter. The Enter method is the name given by SPSS Statistics to standard regression analysis.

Click the Statistics button. You will be presented with the Linear Regression: Statistics dialogue box, as shown below:


In addition to the options that are selected by default, select Confidence intervals in the –Regression Coefficients– area, leaving the Level (%): option at "95". You will end up with the following screen:


Click the Continue button. You will be returned to the Linear Regression dialogue box.

Click the OK button. This will generate the output.
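Pasting these selections produces syntax along the following lines; a sketch, assuming the variable names used in this example:

* Standard (Enter) multiple regression of VO2max on four predictors,
* with 95% confidence intervals for the coefficients.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT VO2max
  /METHOD=ENTER age weight heart_rate gender.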

Interpreting and Reporting the Output of Multiple Regression Analysis

SPSS Statistics will generate quite a few tables of output for a multiple regression analysis. In this "quick start" guide, we focus only on the three main tables you need to understand your multiple regression results, assuming that no assumptions have been violated and that your data has already met the eight assumptions required for multiple regression to give you a valid result:

Determining how well the model fits

The first table of interest is the Model Summary table. This table provides the R, R2, adjusted R2, and the standard error of the estimate, which can be used to determine how well a regression model fits the data:


The "R" column represents the value of R, the multiple correlation coefficient. R can be considered to be one measure of the quality of the prediction of the dependent variable; in this case,  VO2max . A value of 0.760, in this example, indicates a good level of prediction. The "R

Square" column represents the R2 value (also called the coefficient of determination), which is the proportion of variance in the dependent variable that can be explained by the independent variables (technically, it is the proportion of variation accounted for by the regression model above and beyond the mean model). You can see from our value of 0.577 that our independent variables explain 57.7% of the variability of our dependent variable,  VO2max . However, you

also need to be able to interpret "Adjusted R Square" (adj. R2) to accurately report your data. We explain the reasons for this, as well as the output, in our enhanced multiple regression guide.

Statistical significance

The F-ratio in the ANOVA table (see below) tests whether the overall regression model is a good fit for the data. The table shows that the independent variables statistically significantly predict the dependent variable, F(4, 95) = 32.393, p < .0005 (i.e., the regression model is a good fit of the data).


Estimated model coefficients

The general form of the equation to predict VO2max from age, weight, heart_rate, and gender is:

predicted VO2max = 87.83 – (0.165 x age) – (0.385 x weight) – (0.118 x heart_rate) + (13.208 x gender)

This is obtained from the Coefficients table, as shown below:


Unstandardized coefficients indicate how much the dependent variable varies with an independent variable when all other independent variables are held constant. Consider the effect of age in this example. The unstandardized coefficient, B1, for age is equal to -0.165 (see Coefficients table). This means that for each one year increase in age, there is a decrease in VO2max of 0.165 ml/min/kg.
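As a quick arithmetic illustration (the input values here are hypothetical, not taken from the dataset), a case with age = 30, weight = 70, heart_rate = 130 and gender = 1 would have:

predicted VO2max = 87.83 – (0.165 x 30) – (0.385 x 70) – (0.118 x 130) + (13.208 x 1) = 87.83 – 4.95 – 26.95 – 15.34 + 13.208 ≈ 53.80 ml/min/kg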

Statistical significance of the independent variables

You can test for the statistical significance of each of the independent variables. This tests whether the unstandardized (or standardized) coefficients are equal to 0 (zero) in the population. If p < .05, you can conclude that the coefficients are statistically significantly different to 0 (zero). The t-value and corresponding p-value are located in the "t" and "Sig." columns, respectively, as highlighted below:


You can see from the "Sig." column that all independent variable coefficients are statistically significantly different from 0 (zero). Although the intercept, B0, is tested for statistical significance, this is rarely an important or interesting finding.

Putting it all together

You could write up the results as follows:

General

A multiple regression was run to predict VO2max from gender, age, weight and heart rate. These variables statistically significantly predicted VO2max, F(4, 95) = 32.393, p < .0005, R2 = .577. All four variables added statistically significantly to the prediction, p < .05.

Comparing Means

Independent T-Test

The independent-samples t-test (or independent t-test, for short) compares the means between two unrelated groups on the same continuous, dependent variable. For example, you could use an independent t-test to understand whether first year graduate salaries differed based on gender (i.e., your dependent variable would be "first year graduate salaries" and your independent variable would be "gender", which has two groups: "male" and "female"). Alternately, you could use an independent t-test to understand whether there is a difference in test anxiety based on educational level (i.e., your dependent variable would be "test anxiety" and your independent variable would be "educational level", which has two groups: "undergraduates" and "postgraduates").

Click Analyze > Compare Means > Independent-Samples T Test... on the top menu, as shown below:


You will be presented with the Independent-Samples T Test dialogue box, as shown below:


Transfer the dependent variable, Cholesterol, into the Test Variable(s): box, and transfer the independent variable, Treatment, into the Grouping Variable: box, by highlighting the relevant variables and pressing the arrow buttons. You will end up with the following screen:


You then need to define the groups (treatments). Click on the Define Groups button. You will be presented with the Define Groups dialogue box, as shown below:


Enter 1 into the Group 1: box and enter 2 into the Group 2: box. Remember that we labelled the Diet Treatment group as 1 and the Exercise Treatment group as 2.


Note: If you have more than 2 treatment groups in your study (e.g., 3 groups: diet, exercise and drug treatment groups), but only wanted to compare two (e.g., the diet and drug treatment groups), you could type 1 into the Group 1: box and 3 into the Group 2: box (i.e., if you wished to compare the diet with the drug treatment).

Click the Continue button.

If you need to change the confidence level limits or change how to exclude cases, click the Options button. You will be presented with the following:


Click the Continue button. You will be returned to the Independent-Samples T Test dialogue box.

Click the OK button.
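Pasting these selections produces syntax along the following lines; a sketch, assuming the variable names and group codes from this example:

* Independent-samples t-test comparing cholesterol between groups 1 and 2.
T-TEST GROUPS=Treatment(1 2)
  /VARIABLES=Cholesterol
  /CRITERIA=CI(.95).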

Output of the independent t-test in SPSS

SPSS generates two main tables of output for the independent t-test. In this "quick start" guide, we take you through each of the two main tables in turn, assuming that your data met all the relevant assumptions.

Group Statistics Table

This table provides useful descriptive statistics for the two groups that you compared, including the mean and standard deviation.


Unless you have other reasons to do so, it would be considered normal to present information on the mean and standard deviation for this data. You might also state the number of participants that you had in each of the two groups. This can be useful when you have missing values and the number of recruited participants is larger than the number of participants that could be analysed.

A diagram can also be used to visually present your results. For example, you could use a bar chart with error bars (e.g., where the error bars could use the standard deviation, standard error or 95% confidence intervals). This can make it easier for others to understand your results. Again, we show you how to do this in our enhanced independent t-test guide.

Independent Samples Test Table

This table provides the actual results from the independent t-test.


We can see that the group means are significantly different because the value in the "Sig. (2-tailed)" column is less than 0.05. Looking at the Group Statistics table, we can see that those people who undertook the exercise trial had lower cholesterol levels at the end of the programme than those who underwent a calorie-controlled diet.

Reporting the output of the independent t-test

Based on the results above, we could report the results of the study as follows:

General (interpretation for the report)

This study found that overweight, physically inactive male participants had statistically significantly lower cholesterol concentrations (5.80 ± 0.38 mmol/L) at the end of an exercise-training programme compared to after a calorie-controlled diet (6.15 ± 0.52 mmol/L), t(38) = 2.428, p = 0.020.

ANOVA (More than two groups)

Output


ANOVA Table

This is the table that shows the output of the ANOVA analysis and whether we have a statistically significant difference between our group means. We can see that the significance level is 0.021 (p = .021), which is below 0.05, and, therefore, there is a statistically significant difference in the mean length of time to complete the spreadsheet problem between the different courses taken. This is great to know, but we do not know which of the specific groups differed.

Reporting the output of the one-way ANOVA

Based on the results above, we could report the results of the study as:

General

There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p = .021). The results suggested that participants who took the advanced-level course took less time to complete the spreadsheet problem than those at other levels.
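For reference, a one-way ANOVA like this can be run in syntax; a minimal sketch (the variable names time and course are hypothetical, standing for completion time and course level), with Tukey post hoc tests added to identify which specific groups differed:

* One-way ANOVA of completion time across course levels,
* with descriptives and Tukey post hoc comparisons.
ONEWAY time BY course
  /STATISTICS DESCRIPTIVES
  /POSTHOC=TUKEY ALPHA(0.05).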