Excel for Data Analysis

3005 30th Street Boulder, CO 80301 303-444-7863 www.n-r-c.com


Table of Contents

Introduction

Data Entry in Excel
    Unique IDs
    Setting up the Worksheet
    Entering Single-Response, Closed-Ended Questions
    Entering “Multiple-Response” Questions
    Entering Open-Ended Questions
    Creating a Codebook

Analyzing the Data
    Calculating an Average
    Creating a Frequency Distribution for a Single-Response Question
    Creating a Frequency Distribution for a Multiple-Response Question
    Functions and formulas used for simple descriptive analyses in Excel

Presenting the Results: One Quick Idea

Using Pivot Tables for Basic and Advanced Analyses
    Creating a PivotTable (“Basic” Analyses)
    Crosstabulation of Data Using PivotTables (“Advanced” Analyses)

APPENDIX I: Example Completed Surveys for Data Entry

APPENDIX II: Example Codebook

APPENDIX III: Example Analysis, with Formulas

APPENDIX IV: Example of an “Annotated Instrument”


Excel for Data Analysis Page 1 © National Research Center Inc. 3005 30th St. • Boulder, CO 80301 • (303) 444-7863

Introduction

This handbook is designed to instruct program staff on how to set up data entry processes and perform simple analyses of data collected through surveys, course evaluations, or by observation or other record keeping.

Throughout this handbook, a common example is used: data representing the results from six surveys completed by fictional participants of a fictional training program. A copy of the completed surveys can be found in Appendix I. The reader may find it helpful to review the surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull out the six surveys and refer to them periodically while reviewing the handbook.

The individual conducting the data analysis is referred to in this handbook as the “analyst.” This person may be a program staff member, volunteer, board member or other stakeholder willing to accomplish this task. There is no job description for this analyst. He or she needs only to have a basic understanding of Microsoft Excel, know how to perform calculations using the contents of multiple cells, and be familiar with formulas. Reminders about using Excel are found in text boxes throughout the handbook.

Good luck!

The Staff of NRC

Excel for Data Analysis was written by National Research Center, Inc. 3005 30th Street, Boulder, Colorado 80301

Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com

Copyright © 2003 by National Research Center, Inc. All rights reserved.


Data Entry in Excel

Before any analysis can begin, the data must be entered into an electronic file to create a dataset. This can be done fairly simply using Microsoft Excel.

Unique IDs

Before beginning the data entry, it is advisable to put a unique identifier on each survey or data form. This will allow the analyst to keep track of his/her progress, and will also make it easier to track down and correct any data entry errors. This “identifier” does not associate the survey with a particular person; rather, it only makes it easier to find a specific survey at a later date. The surveys do not need to be in any particular order; just begin at the top of the stack with 1, and number consecutively.

Setting up the Worksheet

To set up a worksheet for data entry, the analyst will use the first row (row 1) for the question or question part labels. Dedicate the first column (column A) to the IDs. Thus, the analyst will put the label “ID” in cell A1. Cell B1 would contain the label q1 (for question #1) or whatever is appropriate for the first question or field of data. Cell C1 would contain the label q2 (or whatever is appropriate), and so on. Each survey will then be entered into one row: the first survey in row 2 (ID #1), the second survey in row 3, and so on.

Reminder: Cell References

“Cells” in an Excel spreadsheet are referred to by the intersection of the Column and Row in which they appear. In the example used for this handbook, the cell that contains the label “ID” is cell A1, because it is in the first column (A) and the first row (1). The cell that contains the answer to question #1 of the third survey entered is B4 (the 2nd column and the 4th row).

Entering Single-Response, Closed-Ended Questions

A “closed-ended” question means that the respondent chooses an answer by marking a box or circling a number from a given list of possible responses. A “single-response” question means that the respondent is to choose only one answer from the list. Question #1 (shown below) from the example survey represents a single-response, closed-ended question.

1) How many of the training sessions did you attend?
- 1 to 2
- 3 to 4
- 5
- 6 or more

When entering and analyzing data, it is easiest to work with numbers. To do this, a number is assigned to each possible response option: “1 to 2” = 1, “3 to 4” = 2, “5” = 3, and “6 or more” = 4.


Thus, since the respondent to the first survey said they attended “5” sessions, a “3” would be entered as the answer. The example to the right shows how the answers to question #1 would be entered for all six fictional surveys (from Appendix I).

Entering “Multiple-Response” Questions

Question #2 from the fictional survey is a “multiple-response” question, meaning that respondents could give more than one answer to the question; in this example, they may have heard of the program from multiple sources.

2) How did you hear about this training? (Please check all that apply.)
- Neighborhood newsletter
- Bulletin boards in community buildings
- Flyers
- Your child’s school
- Word of mouth
- Other

There are two ways the data could be entered from a question of this type. In the first method, a number is assigned to each response, similar to a single-response question. However, more than one column is assigned to the question. The number of columns assigned should be as many as the highest number of answers the analyst believes that the respondent may give; if necessary, assign as many columns as there are possible responses (in case a respondent checks every box). In the example at left, 3 columns were assigned to question #2, and the answers entered as shown.

The second approach to multiple-response questions is to assign a column to each possible response. For the example question #2 (shown on the previous page), the following columns would be assigned:

- q2a: Neighborhood newsletter
- q2b: Bulletin boards in community buildings
- q2c: Flyers
- q2d: Your child’s school
- q2e: Word of mouth

If a response was marked, place a “1” in the assigned column. If no response was given, leave it blank, or place a “0” in the column. With this method, it is harder to know if a respondent skipped a question altogether. The analyst may wish to have a column before q2a where he/she marks whether or not the question was left blank (1=blank, 2=not blank). This will help in the analysis, when calculating the percent of respondents giving each answer. The example below shows how the data could be entered for question #2 using this approach.
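For readers who also work outside Excel, the second data entry approach can be sketched in Python. This is only an illustration under stated assumptions: the column name `q2_blank` and the function `encode_q2` are hypothetical and do not appear in the handbook; the 1=blank, 2=not blank convention and the q2a through q2e labels follow the text above.

```python
# Column labels for the five listed responses to question #2 (from the handbook).
Q2_OPTIONS = ["q2a", "q2b", "q2c", "q2d", "q2e"]


def encode_q2(checked):
    """Turn the set of checked boxes on one survey into one row of indicator columns.

    `checked` is a set of column labels, e.g. {"q2a", "q2e"}.
    The "q2_blank" column is a hypothetical name for the extra column placed
    before q2a: 1 = question left blank, 2 = at least one box checked.
    """
    row = {"q2_blank": 2 if checked else 1}
    for option in Q2_OPTIONS:
        row[option] = 1 if option in checked else 0
    return row


# A respondent who checked "Neighborhood newsletter" and "Word of mouth".
print(encode_q2({"q2a", "q2e"}))
# A respondent who skipped the question entirely.
print(encode_q2(set()))
```

Recording the blank indicator separately keeps a skipped question distinguishable from a question where no listed source applied.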

Reminder: “Freeze Panes”

“Freezing the panes” keeps the labels at the top of the worksheet and the IDs at the left of the worksheet visible at all times. To freeze the panes, put the cursor in the cell where the panes should break (usually B2). Then select “Window” from the menu bar, and then the option to “Freeze Panes.” This option works as a toggle; that is, if this option is selected again, the panes will “unfreeze.” (If the panes are frozen, the menu option will read “Unfreeze Panes.”) Using this option is quite helpful when there are many variables (columns) or cases (surveys, records of data in rows).


Entering Open-Ended Questions

An open-ended question is one in which respondents are invited to answer in their own words, rather than from a list of responses. Question #6 on the fictional survey represents an “open-ended” question.

6) Do you have any other comments you would like to make about this training?

________________________________________________________________________

________________________________________________________________________

Depending on the type of open-ended question asked, the analyst may or may not wish to enter these responses into the dataset at the same time as the other questions are entered. These questions could be entered later into an appendix for a report, or they could be read and assigned “codes;” that is, like answers could be grouped into categories. Each category or code could be assigned a number, and these codes entered into the dataset in a manner similar to the examples shown above. For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim into the dataset, as shown in the example below:

However, the answers to Question #7 were considered appropriate for coding. 7) What is your race? ____________________

The answers were entered into the dataset as written in by respondents, as shown below, but then codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.


Creating a Codebook

The examples above showed how the data entry would occur for each type of question. Generally, the analyst will want to set up the data entry spreadsheet before beginning the data entry. By knowing how to enter each type of question, the analyst can determine which questions will be entered into each column, being sure to reserve the first column for the IDs.

Appendix II shows the codebook for the fictional survey being used as an example in this handbook. The columns are assigned as follows:

- Column A: the ID (shown with a circle around it)
- Column B: question #1
- Columns C through E: question #2, using the first version of multiple-response data entry
- Columns F through K: question #2, using the second version of multiple-response data entry (in this example, the “others” were ignored)
- Columns L, M and N: the three parts of question #3
- and so on for the remaining questions.

This codebook also shows the numeric equivalents assigned to each question response. It is a good idea to hang on to this codebook. It will serve as a customized guide in data entry, and in the analysis of the data once the dataset has been created.

The example below shows the entered data for the surveys shown in Appendix I.

(Note: the columns for the open-ended questions were shrunk to allow all the columns to show.)


Reminder: Wrapping Text

Sometimes the text entered into a cell is too long for it to display in its entirety. To turn on text wrapping (the text will automatically move to the next line if it runs out of room), highlight the cells to be formatted, then choose “Format” from the menu bar, and then “Cells.” Click on the “Alignment” tab, and check the box labeled “Wrap text.” Click the OK button to apply the formatting. The text should be wrapped in the cell. Note that wrapping text will change the height of the rows.


Analyzing the Data

Now that the data collected by the program has been entered into an electronic dataset, the analyst is ready to start analyzing the information to get answers to the questions posed. This next section will demonstrate how to use formulas and functions within Excel to produce the “statistics” or summaries of the information needed.

Reminder: Formulas

Formulas are used to perform calculations within a spreadsheet. To insert a formula, as opposed to a number or text, type an equals sign (“=”) in the cell where the calculation is to be performed, and then type in the rest of the formula. A formula can perform mathematical calculations or execute a wide variety of functions (see below for more on functions). To add or subtract, use the plus (+) or minus (-) symbol. To multiply, use an asterisk (*) and to divide use a slash (/). Use parentheses as necessary to indicate the desired order of operations.

For example, if the analyst wanted to know how many seconds there were in 3 hours, he or she could type in the formula: =3*60*60. The result displayed in the cell would be 10,800.

There might have been a cell somewhere on the page that had a value of “3” to indicate three hours; for the sake of an example, this cell is T21. To know how many seconds that represented, use the same formula as above, but exchange the “3” for the cell reference: =T21*60*60. If the number of hours in cell T21 changed, the result of the formula would also change.

Reminder: Functions and Referring to a range of cells

Functions can be used within formulas to perform special calculations or manipulations. There are a large number and variety of functions that can be used in Excel. Some of the functions are mathematical, some are logical, some are statistical, and others serve yet more purposes. All functions begin in a similar fashion: the equals sign (=), the function name, immediately followed by an open parenthesis, the references on which the function should operate, each separated by a comma (a different number of references are needed for each function), and a close parenthesis. For example, the “SUM” function can be used to add the values of several cells.

Some functions will refer to a “range” of cells. For example, if an analyst wanted to total the number of youth served in the table below, a formula could be used like that found in cell B5: =B2+B3+B4. Alternatively, the SUM function could be used, referring to the range of cells to be summed, like this: =SUM(B2:B4). (The range stops at B4 because cell B5 holds the formula itself.) The colon indicates that a range of cells is being referred to, starting with (and including) the cell to the left of the colon, and ending with (and including) the cell to the right of the colon. The function “SUM” indicates what is to be done with this range of cells: total all the values together.

Calculating an Average

Calculating the average of a range of cells is a fairly simple procedure within Excel, and appropriate for certain types of data. For example, in the fictional survey for our training program, one of the questions asks respondents to report their annual household income. The average annual income of participants could be calculated and reported.


The function “AVERAGE” would be used to make this calculation. As shown in the table below, to create this formula an equals sign (=) is first typed, followed by the function name, with the range of cells following the function name in parentheses.
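As a cross-check outside Excel, the same calculation can be sketched in Python. The income values below are made up for illustration and are not the handbook's survey data; `None` stands in for a blank cell, which Excel's AVERAGE also leaves out of the calculation.

```python
from statistics import mean

# Hypothetical annual household incomes for six surveys; None marks a blank answer.
incomes = [22000, None, 31000, 18000, 40000, 27000]

# Keep only the surveys where an income was reported, as AVERAGE does with blank cells.
answered = [value for value in incomes if value is not None]

average_income = mean(answered)
print(average_income)
```

Note that the average is taken over the five respondents who answered, not over all six surveys returned.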

Reminder: Formatting cells

In many of the spreadsheet examples shown in this handbook, some of the cells are formatted as numbers, and some are formatted as percents. You will want to format the cells appropriately. To format a cell or group of cells, highlight the cells you wish to format, then choose “Format” from the menu bar, and then “Cells.” A dialogue box will open, with a number of formatting options. You can format the alignment of the cell contents, the cell shading or border, or the “Number.” If you choose the “Number” tab, you will be presented with a list of types of number formats, such as “currency,” “percentage,” etc. Choose the type, and then decide how many decimals you want. The highlighted cells will be formatted according to the specifications you choose.


Creating a Frequency Distribution for a Single-Response Question

Creating a frequency distribution, or a count and/or proportion of respondents giving each response to a question, is an intuitively easy process. However, doing it within Excel for a large number of cases is actually a multi-step procedure.

The first step is to count how many respondents gave each response. There is a function within Excel that will help automate this step: “COUNTIF.” To use this function, specify two items:

- What range of cells contains the answers to the question of interest, and
- Which particular answer should be counted (“the criterion”).

The function is set up as: =COUNTIF(range of cells, criterion).

To know how many people attended the training program just one or two times, the analyst would want to count how many times “1” (the numeric assignment for question #1 to the response “1 to 2”) was entered as the answer to question #1. The data for question #1 are in column B, and specifically in rows 2 through 7. The formula to enter to find out how many respondents said they attended one or two sessions would be:

=COUNTIF(B2:B7,1)

The results can be seen in the table below in cell B13. The formula is shown to the right in cell C13.

To get a count of the number of responses to each of the other possible answers, use the same formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.) In this example, no participants attended 1 to 2 sessions, three participants attended 3 to 4 sessions, two participants attended 5 sessions, and one participant attended 6 or more sessions.
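The counting step can be mirrored in a few lines of Python. This is a sketch, not part of the handbook's method: the helper name `countif` is hypothetical, and the order of the coded answers is assumed (only the first survey's code of 3 and the overall counts are stated in the text).

```python
# Coded answers to question #1 for the six surveys (1="1 to 2", 2="3 to 4",
# 3="5", 4="6 or more"). The order after the first survey is an assumption.
q1_answers = [3, 2, 2, 4, 3, 2]


def countif(values, criterion):
    """Count how many entries equal the criterion, like =COUNTIF(range, criterion)."""
    return sum(1 for value in values if value == criterion)


# One count per possible answer, i.e. one COUNTIF formula per criterion.
counts = {code: countif(q1_answers, code) for code in (1, 2, 3, 4)}
print(counts)  # {1: 0, 2: 3, 3: 2, 4: 1}
```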


To know the proportion (percent) of respondents attending 6 or more times, the analyst would want to divide the number who gave that answer by the total number of those who answered the question. The SUM function can be used to total the number of respondents who answered that question. In the example above, the formula would be: =SUM(B13:B16). In the table below, that formula was entered into cell B11. To determine the proportion of people giving that answer, the contents of cell B16 would need to be divided by cell B11. As shown below, those results are displayed in cell B22. The formulas for calculating the proportion giving each answer to question #1 are also shown.
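The division step can be checked with a short Python sketch. The counts are the ones reported above for question #1; the variable names are illustrative only.

```python
# Counts for each answer to question #1, as reported in the handbook.
counts = {"1 to 2": 0, "3 to 4": 3, "5": 2, "6 or more": 1}

# Total number of respondents who answered, like =SUM(B13:B16).
total = sum(counts.values())

# Each count divided by the total, expressed as a whole-number percent.
percents = {answer: round(100 * count / total) for answer, count in counts.items()}
print(percents)  # {'1 to 2': 0, '3 to 4': 50, '5': 33, '6 or more': 17}
```

Because each respondent gives exactly one answer to a single-response question, these percents add to 100%.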


Reminder: Absolute versus relative cell references

In a formula, a cell reference can be made in a “relative” or an “absolute” manner. For example, looking at the table below, if the analyst wanted to calculate a percent, he or she might create a formula in cell C2 which would display the proportion of youth served who are 12-14 years old. That formula would be: =B2/B5, which would divide the value of B2 (12) by the value of B5 (112).

The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old. If the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is because in Excel the cell references in this formula are “relative” references; that is, Excel has assumed that because in cell C2 the calculated number was derived by dividing the number in the same row and one column to the left by the number three rows below and one column to the left, the same thing should happen in the cell to which the formula is copied. However, cell B6 is blank, so the formula in cell C3 would divide by zero and display an error (#DIV/0!) instead of a percent.

This can be fixed by changing the formula after it has been copied, so that the denominator refers to B5. But, if the formula is then copied to cell C4, the denominator would again have to be manually changed in the formula to refer to the correct cell that contains the total number of youth served. If this manual change was not made, the formulas in column C would look like the formulas in column D in the table below.

If, however, an “absolute” reference was used to refer to the row that contains the total number of youth served, when the formula was copied, the denominator would always refer to row 5. The dollar sign ($) is used to indicate an absolute reference; here the formula in cell C2 would be written =B2/B$5. In this example, it is only used for the row designation, not for the column designation. It can be used for both the row and column designation, or only one or the other.

Excel defaults to assuming that all cell references are relative, unless the change is made manually. Knowing how to use relative and absolute references can greatly speed up creation of spreadsheets in Excel.
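To make the copying behavior concrete, here is a small Python sketch that imitates how a formula's references shift when it is copied. It is a simplified model, not Excel's actual implementation: the function names are hypothetical, and the pattern only handles simple references such as B2, B$5 or $B$5 (it would misread function names that end in digits).

```python
import re

# Matches simple A1-style references such as B2, $B2, B$5 or $B$5.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col):
    """Convert a column label to a number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a column number back to a label: 1 -> A, 27 -> AA."""
    label = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def copy_formula(formula, d_rows, d_cols):
    """Return the formula as it would read after being copied d_rows down and d_cols right.

    Parts of a reference preceded by "$" are absolute and stay fixed;
    all other parts are relative and shift with the copy.
    """
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not col_abs:
            col = num_to_col(col_to_num(col) + d_cols)
        if not row_abs:
            row = str(int(row) + d_rows)
        return f"{col_abs}{col}{row_abs}{row}"

    return REF.sub(shift, formula)


# Copying from C2 to C3 (one row down, same column):
print(copy_formula("=B2/B5", 1, 0))   # =B3/B6  (relative denominator drifts to the blank cell)
print(copy_formula("=B2/B$5", 1, 0))  # =B3/B$5 (absolute row keeps the total in row 5)
```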


Creating a Frequency Distribution for a Multiple-Response Question

The approach to be used to calculate the results to a multiple-response question depends upon the approach used to enter the data. If the data have been entered using the first approach described, where a numeric assignment is made for each possible response, but more than one column is designated for entry of the results (as in columns D, E and F in the table below), then the counts and proportions can be calculated in a manner quite similar to that of a single-response question. The change would be in the definition of the range of cells to include in the count. Instead of covering only one column, it would cover multiple columns. In this example, the number of people who said they heard of the program through the neighborhood newsletter would be determined using the formula:

=COUNTIF(D2:F7,1)

Calculating the percent of respondents who heard of the program through the neighborhood newsletter would also be changed slightly. Instead of dividing the number of respondents giving a specific answer by the sum of the cells F13 through F17 (which would be the total number of responses, not respondents answering the question), the denominator is the total number of respondents answering the question. To determine this, the number of valid answers entered in column D would need to be examined, since column D holds each respondent's first answer. This can be done using the COUNT function. This formula is not shown in the table below, but would be entered in cell D11 as follows:

=COUNT(D2:D7)

This function counts the number of cells in the specified range that contain numbers; blank cells are not counted. In this case, every respondent gave at least one answer, so the total is 6, the same as the number of returned surveys. This same formula (with the correct cell range specification) was used in cells E11 and F11. The numbers displayed there designate the number of people who gave two or more answers (4 people, see cell E11) or three answers (1 person, see cell F11).
It should be noted when reporting the percentages to a multiple response question that the percents will add to more than 100%, as respondents can give more than one answer.
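The distinction between counting responses and counting respondents can be sketched in Python. The data below are made up for illustration (they are not the handbook's table), although they were chosen so that six, four and one respondents gave a first, second and third answer, as described above. The helper names are hypothetical.

```python
# One row per survey: up to three coded answers to question #2, None = blank cell.
# Hypothetical codes: 1 = neighborhood newsletter, ..., 5 = word of mouth.
q2_answers = [
    [1, 5, None],
    [2, None, None],
    [3, 4, None],
    [1, 2, 5],
    [4, None, None],
    [3, 1, None],
]


def count_code(rows, code):
    """Count a code across all answer columns, like =COUNTIF(D2:F7, code)."""
    return sum(1 for row in rows for value in row if value == code)


def count_answered(rows, position):
    """Count non-blank entries in one answer column, like =COUNT(D2:D7)."""
    return sum(1 for row in rows if row[position] is not None)


newsletter = count_code(q2_answers, 1)       # people naming the newsletter
respondents = count_answered(q2_answers, 0)  # people who gave at least one answer
total_responses = sum(count_answered(q2_answers, i) for i in range(3))

# The denominator is respondents, not total responses.
newsletter_share = newsletter / respondents
print(newsletter, respondents, total_responses, newsletter_share)
```

Dividing by `total_responses` instead of `respondents` would understate every percentage, which is the mistake the text warns against.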


If the answers to question #2 were entered as shown in columns H through M, where each possible answer was assigned to a column, and a “1” was used to designate when a box was checked, then a slightly different approach is needed to create the frequency distribution.

First, to get the total number of respondents who gave an answer, column H needs to be appropriately analyzed. In this instance, a “1” was entered if a respondent gave no answer to the question, and a “2” was entered if a respondent gave at least one answer. The formula in cell H11 (not shown in the table below) was =COUNTIF($H$2:$H$7,2), to count the number of valid answers to question #2. This formula was copied to cells I11, J11, K11, L11 and M11.

To determine the number of people who indicated each potential source of familiarity with the training, the number of “1” responses in each column was counted, using the COUNTIF function. The formula for cell M13 (the number of respondents indicating they heard of the program by word of mouth) is shown in cell N13. A similar formula was used for each of the other responses.

Next, to determine the proportion of respondents each of those counts represented, the counts were divided by the number of valid responses to question #2. As shown in cell M19, 33% of respondents reported they had heard of the training by word of mouth. The formula used to make that calculation is shown in cell N19. A similar formula was used for each of the other responses.

Again, it should be noted when reporting the percentages to a multiple response question that the percents will add to more than 100%, as respondents can give more than one answer.

PivotTables cannot be used to calculate the frequency distribution of multiple response questions.
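The second approach, with one indicator column per answer, can be sketched in Python as follows. The rows are made-up data, not the handbook's table; the first value in each row plays the role of the blank indicator column (1 = blank, 2 = at least one answer), and the variable names are illustrative.

```python
# Column order: blank indicator, q2a, q2b, q2c, q2d, q2e (hypothetical data).
COLUMNS = ["answered", "q2a", "q2b", "q2c", "q2d", "q2e"]
rows = [
    (2, 1, 0, 0, 0, 1),
    (2, 0, 1, 0, 0, 0),
    (2, 0, 0, 1, 1, 0),
    (2, 1, 1, 0, 0, 1),
    (2, 0, 0, 0, 1, 0),
    (2, 1, 0, 1, 0, 0),
]

# Number of respondents who gave at least one answer, like =COUNTIF($H$2:$H$7,2).
valid = sum(1 for row in rows if row[0] == 2)

# For each answer column, count the 1s and divide by the number of valid respondents.
percents = {}
for index, name in enumerate(COLUMNS[1:], start=1):
    checked = sum(1 for row in rows if row[index] == 1)
    percents[name] = round(100 * checked / valid)

print(valid, percents)
print(sum(percents.values()))  # more than 100, since respondents may check several boxes
```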


Reminder: Functions Revisited

“SUM” is only one of a large number of functions available in Excel. Some of the functions are mathematical, some are logical, some are statistical, and others serve yet more purposes. All functions begin in a similar fashion: the function name, immediately followed by an open parenthesis, the references on which the function should operate, each separated by a comma (a different number of references are needed for each function), and a close parenthesis. The functions needed for simple descriptive analyses in Excel are shown below.

Functions and formulas used for simple descriptive analyses in Excel

The table below displays the functions used to perform the analyses described in this handbook. The examples all refer to the spreadsheet and examples shown in Appendix III.


Functions and formulas used for simple descriptive analyses in Excel

Calculate: the number of surveys completed
    By: counting the number of rows of data entered (regardless of whether some cells/rows are blank)
    Using the function or formula: ROWS
    Operators (arguments): range of cells for which the number of rows should be counted
    Example: =ROWS(B2:B7)
    Value displayed: 6
    What it means: 6 surveys were returned

Calculate: the average rating or answer of those who responded
    By: calculating the average of the ratings or answers given by those who gave an answer
    Using the function or formula: AVERAGE
    Operators (arguments): range of cells containing the values to be averaged
    Example: =AVERAGE(AH2:AH7)
    Value displayed: $29,000
    What it means: The average annual income as reported for question #10

Calculate: the lowest number given as an answer
    By: examining the values in a range of cells, and finding the lowest value
    Using the function or formula: MIN
    Operators (arguments): range of cells containing the values to be examined
    Example: =MIN(AH2:AH7)
    Value displayed: $15,000
    What it means: The lowest annual income as reported for question #10

Calculate: the highest number given as an answer
    By: examining the values in a range of cells, and finding the highest value
    Using the function or formula: MAX
    Operators (arguments): range of cells containing the values to be examined
    Example: =MAX(AH2:AH7)
    Value displayed: $57,000
    What it means: The highest annual income as reported for question #10

Calculate: the number of respondents who gave a specific answer*
    By: counting the number of responses of a certain type within a range of cells
    Using the function or formula: COUNTIF
    Operators (arguments): 1) the range of cells to be examined 2) the value to be counted
    Example: =COUNTIF(B$2:B$7,3)
    Value displayed: 2
    What it means: 2 people gave an answer of “5 times” to question #1

Calculate: the total number of respondents who answered the question**
    By: adding the number of people who gave a valid answer to a question
    Using the function or formula: SUM
    Operators (arguments): range of cells to be totaled
    Example: =SUM(B13:B16)
    Value displayed: 6
    What it means: 6 people answered question #1

Calculate: the total number of respondents who answered the question
    By: counting the number of nonblank answers
    Using the function or formula: COUNT
    Operators (arguments): range of cells to be examined
    Example: =COUNT(E2:E7)
    Value displayed: 4
    What it means: 4 people gave two or more answers to question #2 (as column E contains the second answer people gave to question #2)

Calculate: the proportion (percent) of respondents who gave a specific answer
    By: dividing the number of people who gave a specific answer by the total number of people who answered the question
    Using the function or formula: (division) [cell reference1]/[cell reference2]
    Operators (arguments): cell reference1 is the cell reference of the numerator; cell reference2 is the cell reference of the denominator
    Example: =B15/B$11
    Value displayed: 33%
    What it means: 33% of respondents gave an answer of “5 times” to question #1

* This is used for each “row” or part of a frequency distribution.
** Or the sum of any list of numbers.


Presenting the Results: One Quick Idea

Once the frequency distributions of the data set have been produced, how will the analyst and other program staff share this information with others? The Excel spreadsheet is not very pretty. One idea is to create an “annotated instrument”; that is, typing the results into a blank questionnaire. [1] Most evaluation forms or surveys have been created using word processing software such as Word or WordPerfect, and thus are well-suited to this approach. A new file should be created from the electronic version of the survey. The check boxes can then be replaced with the proportion of respondents giving each answer. For example:

1) How many of the training sessions did you attend?
    0%   1 to 2
    50%  3 to 4
    33%  5
    17%  6 or more

Staff can write a cover memo or report to accompany the annotated instrument that explains the methods used to obtain the data and interprets the results. An example copy of an annotated instrument can be found in Appendix IV.

[1] The term “annotated instrument” is one created by and used by staff at National Research Center, Inc. It is NOT a commonly used evaluation term, but one that we think is descriptive.

Using Pivot Tables for Basic and Advanced Analyses

Pivot tables are a powerful analytic tool available to the Excel user, although they take a bit of time to set up. They offer an alternate way to create frequency distributions, although they cannot be used for multiple-response questions. They can also be used to create crosstabulations of data. For example, the analyst might wish to know whether males and females respond differently to a training, or whether younger respondents feel more positively about staff than older respondents do.

A useful first step before creating a pivot table is to name the range of cells that will be used for the analyses. This range should include the first row, which contains the variable names.

Reminder: Naming a Range of Cells

To name a range of cells, highlight all the columns and rows that make up the database. Choose Insert from the menu bar, select Name, and then Define… In general, when the named range will be used for creating pivot tables, it is a good idea to name the range “Database.” This is the default name used by Excel in the pivot table wizard.

The “Define Name” dialogue box above shows that the name “Database” has been typed in. The field labeled “Refers to:” shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled “Data Entry.” These are the cells that contain the data entered for the fictional survey.

Once a range of cells has been defined, pivot tables can be created from those data. It is easiest to create the pivot tables on another worksheet within the workbook.

Reminder: Worksheets within a Workbook (or Spreadsheet)

An Excel file is often referred to as a “spreadsheet.” This file, however, is composed of a group of “worksheets.” By default, a new workbook in Excel usually contains three worksheets, usually labeled “Sheet1,” “Sheet2,” and “Sheet3.” The note below was entered in cell B7 on Sheet2.

To see a different worksheet, simply click on the tab of the worksheet to be viewed. To rename a worksheet, double-click its tab and type a new name. Worksheet names are limited to 31 characters.

Creating a PivotTable (“Basic” Analyses)

Before setting up the pivot table, the analyst should place the cursor in the cell where the pivot table is to be generated. To set up a pivot table, go to the Data menu, then select PivotTable and PivotChart Report… The PivotTable and PivotChart Wizard will walk the analyst through the rest of the setup. In the example below, the pivot table will be placed in cell B4.

The PivotTable and PivotChart Wizard

Once “PivotTable and PivotChart Report…” has been selected from the Data menu, the PivotTable and PivotChart Wizard will start, displaying a series of dialogue boxes. The first dialogue box is shown below as Step 1 of 3. (Note: Different versions of Excel will have slightly different PivotTable and PivotChart Wizard dialogue boxes, but the steps to follow are the same or similar.)

Step 1: Two questions are asked in Step 1 of the Wizard. For the most part, the analyst will select the Wizard’s default options. In answer to the first question, the data to be analyzed is an Excel list or database. In answer to the second question, a PivotTable will be created. (Note: PivotCharts are not discussed in this handbook, but the analyst may wish to try this option.) Click Next to continue to the next step of the Wizard.

Step 2: In Step 2, the Wizard asks for the location of the data to be used in the PivotTable. The name “Database” is automatically inserted as the answer. If another named range is desired, it can be typed into the field. If the range of cells to be used has not been named, it can be selected by clicking on the “Browse…” button. Click Next to continue to the next step of the Wizard.

Step 3: In Step 3, the Wizard asks where the PivotTable should be placed. The default is the location of the cursor when the Wizard was started. At this point, the analyst will choose the data to be displayed in the PivotTable by clicking on the “Layout…” button. When this button is clicked, another dialogue box is displayed.

Layout: The Layout dialogue box displays all the variables or fields available for display in the PivotTable. These fields are shown as a series of buttons in the right half of the dialogue box. If there are a large number of fields, the scroll button below the fields can be used to show additional field buttons. In the left half, a blank template is shown. To select a field for display, simply drag it from the right into one of the areas on the left.

To create a pivot table that displays the frequency of training attendance, the button q1 (“How many of the training sessions did you attend?”) would be dragged into the row area, so that the values in q1 will be listed vertically as rows. A field is also needed for the data section. It does not really matter which button is dragged into the data section, as it will be used simply as a counter. However, it should be a field that has no missing data; the ID field is ideal for this purpose. As shown above, the ID field was dragged into the data area. Usually, by default, the field in the data area will be shown as a “Count.” If a different summary is desired, double-click the button, and a dialogue box with various options will be displayed.

PivotTable Field: The Field dialogue box shown to the left is displayed if a button in the data portion of the template is double-clicked. In this example, the data summary chosen is “Count.” In addition, if the “Options>>” button is clicked, more options for the display of the data are shown. In this instance, it would be appropriate to display the information as a proportion, so the option of showing the data as “% of column” was selected.

Format Cells: To choose a number format for the data display, click on the “Number” button in the PivotTable Field dialogue box. A Format Cells dialogue box will be displayed, from which an appropriate number format can be selected.

After this, click the “OK” buttons until the Step 3 dialogue box is showing again. At this point, if the “Finish” button is clicked, the PivotTable will be displayed. In this example, the PivotTable will appear as shown to the right:

Note that when the PivotTable method is used for this question, the value 1 (“1 to 2 sessions”) is not listed because no one selected this response in the survey.

Crosstabulation of Data Using PivotTables (“Advanced” Analyses)

Sometimes it is useful to analyze the data based on certain respondent characteristics; for example, satisfaction ratings by gender or by program attended. One of the easiest ways to generate a table like this is through the use of a PivotTable. The example to the right shows the PivotTable layout and resulting table for a crosstabulation of the results to question #5, “How would you rate the overall quality of this training?”, by the gender of the respondent. (Of course, crosstabulations are recommended for larger datasets than the one created for these examples, with a sufficient number of cases within each subgroup examined.)

This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is placed in the row area; ID is again used for the data section) produces:

Females (1) gave more positive answers than did males (2).
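The logic behind this crosstabulation (count each row and column combination, then show each count as a share of its column) can be sketched in Python for readers who want to verify a pivot table by another route. The `responses` list is hypothetical: six invented (gender, q5) pairs with assumed codes, not the handbook's data, and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical (gender, q5 rating) pairs for six respondents.
# Assumed codes: gender 1 = female, 2 = male; q5 1 = poor ... 4 = excellent.
responses = [(1, 4), (1, 4), (1, 3), (2, 1), (2, 3), (2, 4)]

# Count each (rating, gender) cell, as the ID counter does in the data area.
cells = Counter((rating, gender) for gender, rating in responses)
# Column totals, needed for the "% of column" display.
col_totals = Counter(gender for gender, _ in responses)

ratings = sorted({rating for _, rating in responses})
genders = sorted(col_totals)

# table[rating][gender] = share of that gender giving that rating.
table = {
    r: {g: cells[(r, g)] / col_totals[g] for g in genders}
    for r in ratings
}

for r in ratings:
    print(r, "  ".join(f"{table[r][g]:.0%}" for g in genders))
```

As with the PivotTable, a rating that nobody chose (here, 2) does not appear as a row at all, so the analyst should remember to report it as 0%.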

The analysis in the previous example could also be performed using the average quality rating, on a scale from 1 to 4, where 4 = “excellent” and 1 = “poor.”

This PivotTable layout (q9, gender, is placed in the column area, while q5, quality rating, is placed in the data area; the type of data summary was changed to “Average,” and the number format was changed to a number with two decimal places) produces:

Again, this shows that females (1) gave higher quality ratings than did males (2).
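The “Average” summary amounts to grouping the ratings by gender and taking the mean of each group. A minimal Python sketch follows, again using hypothetical data: the six (gender, q5) pairs and their codes are assumptions for illustration, so the averages printed are not the handbook's results.

```python
from statistics import mean

# Hypothetical (gender, q5 rating) pairs; assumed codes: gender 1 = female,
# 2 = male; q5 runs from 1 = poor to 4 = excellent.
responses = [(1, 4), (1, 4), (1, 3), (2, 1), (2, 3), (2, 4)]

# Group the ratings by gender, as placing q9 in the column area does.
by_gender = {}
for gender, rating in responses:
    by_gender.setdefault(gender, []).append(rating)

# Average each group and round to two decimals, like the number format above.
averages = {g: round(mean(r), 2) for g, r in sorted(by_gender.items())}
print(averages)  # {1: 3.67, 2: 2.67}
```

With only three cases per group, a difference like this is descriptive only; the earlier caution about subgroup sizes applies here too.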

APPENDIX I: Example Completed Surveys for Data Entry

The following pages show the completed surveys from six participants in a fictional training program. These were used for all the examples in this handbook.

APPENDIX II: Example Codebook

APPENDIX III: Example Analysis, with Formulas

APPENDIX IV: Example of an “Annotated Instrument”

The next page shows an example of an “annotated instrument” for the training program, using the example data included in the previous appendices.

Training Evaluation: Annotated Instrument

1) How many of the training sessions did you attend?
0% 1 to 2
50% 3 to 4
33% 5
17% 6 or more

2) How did you hear about this training? (Please check all that apply.)

33% Neighborhood newsletter
50% Your child’s school
17% Bulletin boards in community buildings
33% Word of mouth
50% Flyers
0% Other

3) Please rate the following aspects of the training:

                                             Very Poor   Poor   Good   Very Good
The instructor’s knowledge of the topic          0%       17%    67%      17%
The instructor’s presentation style/skills       0%       25%    50%      25%
The handouts or take-home materials             20%        0%    80%       0%

4) Rate the extent to which you agree or disagree with each of the following statements.

                                                                  Strongly                        Strongly
                                                                  Disagree   Disagree   Agree     Agree
I would strongly recommend this training for my friend               0%        20%       60%       20%
This training will help improve the quality of life for my family    0%        17%       50%       33%

                                                              Poor   Fair   Good   Excellent
5) How would you rate the overall quality of this training?   17%     0%    33%      50%

6) Do you have any other comments you would like to make about this training?

• I think we spent too much time reviewing the background information.
• I had a lot of fun. I thought Angela was great.
• This was great! I will definitely apply what I learned at work and at home!

7) What is your race?
50% Latino/a
17% Asian
33% White

8) How long have you lived in Colorado?
17% 6 years
50% 7 years
33% 8 years

9) What is your gender?
50% Female
50% Male

10) What is your annual household income?
Average annual income = $29,000
33% less than $20,000
33% $20,000 to $29,999
17% $30,000 to $39,999
17% $40,000 or more

11) Is your child enrolled in the free lunch program?

50% Yes
50% No

Thank you for your answers!