
Using Excel’s Analysis ToolPak Add-In - WordPress.com · 2018-06-15

Using Excel’s Analysis ToolPak Add-In

Bijay Lal Pradhan, PhD

Introduction

I am strongly of the opinion that a wide range of quantitative analyses, including statistical analysis, can be performed in Excel. Excel is powerful and has a very simple user interface. The purpose of this document is to acquaint you with the free Analysis ToolPak add-in: how it works and what it can do.

Excel provides a set of data analysis tools called the Analysis ToolPak which you can use to save steps when you develop complex statistical analyses. You provide the data and parameters for each analysis; the tool uses the appropriate statistical macro functions and then displays the results in an output table.

Some tools generate charts in addition to output tables. This document illustrates the use of Excel’s Analysis ToolPak add-in for data analysis. Analysis ToolPak is bundled with Excel, so it is free for anyone who owns Excel.

Analysis ToolPak has been part of Excel since its early days, at least 20 years ago. Indeed, its current form is almost identical to its form then. Admittedly, Microsoft has recently revised many of Excel’s statistical functions to make them more accurate numerically and to provide a more consistent naming convention, but the functionality and user interface of Analysis ToolPak have changed hardly at all.

If the Data Analysis command is available on the Tools menu (Excel 2003 and earlier) or on the Data tab (Excel 2007 onwards), then the Analysis ToolPak is installed on your system. If the command is not there, you need to install the Analysis ToolPak by doing the following:

For Excel 2003 and earlier versions

On the Tools menu, click Add-Ins.... If Analysis ToolPak is not listed in the Add-Ins dialog box, click Browse and locate the drive, folder name, and file name for the Analysis ToolPak add-in (Analys32.xll), usually located in the Program Files\Microsoft Office\Office\Library\Analysis folder. Once you find the file, select it and click OK.

For Excel 2007 and above

On the File tab, click Options to open the Excel Options window (Figure 1), and then click the Add-Ins category. If you're using Excel 2007, click the Microsoft Office Button instead, and then click Excel Options. In the Manage box, select Excel Add-ins and then click Go. In the Add-Ins available box, check the Analysis ToolPak check box, and then click OK. If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.

Figure 1 Excel Options

The Analysis ToolPak has a Descriptive Statistics tool that provides an easy way to calculate summary statistics for a set of sample data. The summary statistics include Mean, Standard Error, Median, Mode, Standard Deviation, Variance, Kurtosis, Skewness, Range, Minimum, Maximum, Sum, and Count. This tool eliminates the need to type individual functions to find each of these results.

Step 1. Open the Tools pull-down menu (the Data tab in Excel 2007 and above). If you see Data Analysis, click it; otherwise, click Add-Ins... to install the Analysis ToolPak.
Step 2. Click the Data Analysis option.
Step 3. Choose Descriptive Statistics from the Analysis Tools list.
Step 4. When the dialog box appears, enter the input range, for example A1:A4. Here A1 is the cell in column A, row 1, which in this case holds the value 2, and the range runs down to the cell holding the last value. If a sample consists of 20 numbers stored in cells A1, A2, A3, and so on, the input range is A1:A20.
Step 5. Select an output range, in this case B1. Check Summary statistics and click OK. The results appear starting at the selected output cell.
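For readers who want to cross-check the tool outside Excel, here is a minimal Python sketch of several of the summary statistics listed above. The function name `describe` and the four sample values are hypothetical, chosen only for illustration; kurtosis and skewness are left out to keep the sketch short.

```python
import math
import statistics


def describe(values):
    """Return several of the Descriptive Statistics rows for one sample."""
    n = len(values)
    std_dev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": std_dev / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": std_dev,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


sample = [2, 4, 4, 5]  # hypothetical values, standing in for cells A1:A4
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```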

Figure 2 Add-Ins List

Once Analysis ToolPak is loaded, you will see a Data Analysis item on the Data ribbon. In fact, if you have also loaded the Solver add-in, the Data Analysis button is right below the Solver button, as shown in Figure 3.

Figure 3 Data Ribbon

When you click the Data Analysis button, you see the list of tools available, some of which appear in Figure 4. These tools are described in subsequent sections of this document.


Figure 4 Data Analysis Tools

Descriptive Statistics

You can obtain summary measures of numeric variables by selecting Descriptive Statistics from the Data Analysis Tools list in Figure 4. Here is an example based on the file bank confidance.xlsx (see Figure 5).

Figure 5 confidence score data

When you select Descriptive Statistics, you see the dialog box in Figure 6. It guesses correctly that the only numeric data are in the range C1:C71, although you have to check the box indicating that labels are in the first row. The “Grouped By:” option should usually be “Columns,” meaning that each variable is in a column, not a row. There are three options for the location of the results, and if you choose the New Worksheet option, you can provide a name for this new worksheet. Finally, you can check any of the four options at the bottom, although none are checked by default.

Figure 6 Descriptive Statistics Dialog Box


Figure 7 Summary Statistics of Salaries

Salary
Mean                       13906.49
Standard Error             767.3394
Median                     12800
Mode                       9200
Standard Deviation         6420.022
Sample Variance            41216686
Kurtosis                   -1.01838
Skewness                   0.455332
Range                      22000
Minimum                    5000
Maximum                    27000
Sum                        973454
Count                      70
Largest(1)                 27000
Smallest(1)                5000
Confidence Level(95.0%)    1530.8

If you check the Confidence Level for Mean option in Figure 6, you get one additional row of output. The output for the Salary variable appears in Figure 7. Unfortunately, no interpretation of the value 1530.8 in the last row is provided. It turns out that this is the value that should be subtracted from and added to the sample mean to get a 95% confidence interval for the mean.
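As a rough illustration of that interpretation, the following Python sketch rebuilds the 95% confidence interval from the numbers reported in Figure 7. It also checks that the reported standard error equals the standard deviation divided by the square root of the count. The variable names are chosen here for illustration only.

```python
import math

# Values copied from the Salary output in Figure 7
mean = 13906.49
std_dev = 6420.022
count = 70
margin = 1530.8  # the "Confidence Level(95.0%)" row

# Standard error of the mean: standard deviation / sqrt(count)
std_error = std_dev / math.sqrt(count)

# 95% confidence interval for the mean: mean minus and plus the margin
lower = mean - margin
upper = mean + margin

print(f"Standard error: {std_error:.4f}")
print(f"95% confidence interval: {lower:.2f} to {upper:.2f}")
```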

Histograms

The Histogram option in Analysis ToolPak allows you to create a frequency table and accompanying chart of a numeric variable. However, the Histogram dialog box requires not only the range for the data variable but also a “Bins” range. In this example, which uses the age data, there will be 7 bins: less than or equal to 20, greater than 20 but less than or equal to 40, and so on, up to greater than 60.
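The binning rule described above (each bin holds the values greater than the previous limit and less than or equal to its own limit, with a final bin for everything larger) can be sketched in Python as follows. The ages and the bin limits below are hypothetical and are not the values in the workbook.

```python
from bisect import bisect_left


def histogram_counts(values, upper_limits):
    """Count values per bin, where each bin is (previous limit, limit].

    The returned list has one extra entry at the end for values
    greater than the largest limit.
    """
    limits = sorted(upper_limits)
    counts = [0] * (len(limits) + 1)
    for value in values:
        # bisect_left returns the index of the first limit >= value
        counts[bisect_left(limits, value)] += 1
    return counts


ages = [18, 20, 21, 35, 40, 41, 59, 60, 61, 75]  # hypothetical ages
bin_limits = [20, 40, 60]                        # hypothetical Bins range
counts = histogram_counts(ages, bin_limits)

for limit, count in zip(bin_limits + ["More"], counts):
    print(limit, count)
```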

Figure 8 Histogram Dialog Box


Figure 9 Age Data with Bins

The results appear in Figure 10. They include the table of bin frequencies and the corresponding chart. If you prefer the bars in the histogram to be right next to one another, you can right-click any bar, select Format Data Series, and choose a Gap Width of 0. Also, you can delete the Frequency legend.

Figure 10 Age Data with Bins and Histogram

Correlation (and Covariance)

It is easy to create a table of correlations with Analysis ToolPak, as illustrated here with three selected columns of data in the file bank confidance.xlsx (see Figure 11). You choose Correlation from the Data Analysis Tools list and fill out the dialog box as shown in Figure 12. The resulting table of correlations appears below Figure 12.

Figure 11 bank confidence data


Figure 12 Correlations Dialog Box

                 Salary      Age         Qualification
Salary           1
Age              0.349988    1
Qualification    0.192526    0.122485    1

Covariances can be obtained in the same way by choosing Covariance from the Data Analysis Tools list.
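To make the lower-triangular layout of the correlation table concrete, here is a small Python sketch that computes Pearson correlations and covariances for several columns. The column values are hypothetical, and the text does not say whether the Covariance tool divides by n or by n - 1, so the sketch exposes both through the `sample` argument.

```python
import math


def covariance(xs, ys, sample=True):
    """Covariance of two equal-length columns (n - 1 divisor if sample)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1 if sample else len(xs))


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length columns."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def correlation_table(columns):
    """Lower-triangular table keyed by (row name, column name)."""
    names = list(columns)
    return {
        (row, col): correlation(columns[row], columns[col])
        for i, row in enumerate(names)
        for col in names[: i + 1]
    }


# Hypothetical data, not the contents of the workbook
data = {
    "Salary": [7000, 12500, 9200, 22000, 18200],
    "Age": [53, 20, 26, 32, 40],
    "Qualification": [1, 2, 2, 4, 3],
}
table = correlation_table(data)
for (row, col), r in table.items():
    print(f"{row} vs {col}: {r:.6f}")
```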

Rank and Percentile

Analysis ToolPak has a Rank and Percentile procedure that you might find useful. You select a column of numeric data, and the procedure essentially sorts the data from high to low. Figure 13 shows the results of doing this to the Salary variable in the file. The Rank column is equivalent to using Excel’s RANK (or the newer RANK.EQ) function. The Percent column lists the approximate percentage of salaries at or below each given salary. It is equivalent to using the PERCENTRANK.INC function (available starting in Excel 2010).

Figure 13 Ranks of salary

Point    Column1    Rank    Percent
27       27000      1       98.50%
40       27000      1       98.50%
34       25000      3       95.60%
46       25000      3       95.60%
68       24665      5       94.20%
38       24300      6       92.70%
14       24000      7       91.30%
44       23670      8       89.80%
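The following Python sketch shows one way to reproduce the Rank and Percent columns. It assumes that Rank counts the values strictly greater than the given value, plus one, and that Percent is the number of values strictly below the given value divided by n - 1. This is consistent with the table above, where the top salary out of 70 gets 68/69, or about 98.5%. The input values are hypothetical.

```python
def rank_and_percentile(values):
    """Return (point, value, rank, percent) rows sorted from high to low."""
    n = len(values)
    rows = []
    for point, value in enumerate(values, start=1):
        # Rank 1 is the largest value; tied values share the same rank
        rank = 1 + sum(1 for other in values if other > value)
        # Fraction of the other observations that lie strictly below this value
        percent = sum(1 for other in values if other < value) / (n - 1)
        rows.append((point, value, rank, percent))
    return sorted(rows, key=lambda row: row[2])


salaries = [30, 50, 40, 20, 40]  # hypothetical values
rows = rank_and_percentile(salaries)
for point, value, rank, percent in rows:
    print(point, value, rank, f"{percent:.2%}")
```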

Hypothesis Tests

Analysis ToolPak has five separate procedures for implementing common hypothesis tests. Three of these are for testing the difference between two sample means when the two samples are independent. Another tests the difference between two sample means when the two samples are paired. Finally, there is a test for equality of two sample variances.

Tests for Difference between Two Sample Means: Independent Samples


These procedures are labeled z-Test: Two Sample for Means, t-Test: Two-Sample Assuming Equal Variances, and t-Test: Two-Sample Assuming Unequal Variances. The first assumes the population variances are known, whereas the last two make no such assumption.

Analysis ToolPak requires the data to be unstacked. As an example, the data in the file bank confidance.xlsx are stacked (see Figure 17): there is a categorical variable sex and a numeric variable income. Indeed, this is the usual arrangement in such data sets. However, to use any of the Analysis ToolPak procedures for testing the difference in mean income between males and females, the data must first be unstacked, as shown in Figure 18, where the two columns of unstacked values are not necessarily the same length. Here we have to unstack the data manually (by sorting on sex and then copying and pasting).
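Outside Excel, the same unstacking step takes only a few lines. The Python sketch below groups stacked (category, value) records into one list per category. The salaries are taken from the first rows of Figure 18, but the order of the records and the labels "male" and "female" are assumptions made for illustration.

```python
from collections import defaultdict


def unstack(records):
    """Turn stacked (category, value) pairs into one list of values per category."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return dict(groups)


# Stacked layout: one row per person (row order is hypothetical)
stacked = [
    ("male", 7000),
    ("male", 12500),
    ("female", 11500),
    ("male", 9200),
    ("female", 17400),
]
unstacked = unstack(stacked)

# The unstacked columns need not have the same length
for category, values in unstacked.items():
    print(category, values)
```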

Figure 17 Stacked Exercise Data

       salary    sex
1      7000      53
2      12500     20
3      9200      26
4      22000     32
5      18200     40
6      5000      21
7      11500     29
8      17400     40
9      12200     26
10     12800     29
11     7000      42
12     7500      31
       …..       …..
       …..       …..
71     …..       …..

Figure 18 Unstacked Data

       salary male    salary female
1      7000           11500
2      12500          17400
3      9200           7000
4      22000          7500
5      18200          22000
6      5000           24000
7      12200          17000
8      12800          14000
9      18000          9200
10     15000          9500
11     7800           10000
12     6800           16200
       …..            …..
       …..            …..
71     …..            …..

In any case, once you have the unstacked data, all three of the procedures are straightforward and similar. For example, the dialog box for the t-Test: Two-Sample Assuming Equal Variances procedure is shown in Figure 19.

Figure 19 Two-Sample Test Dialog Box


After widening the columns appropriately, the results appear in Figure 20. Interestingly, even though the dialog box asks for a significance level (alpha), it is not used in the results at all. However, you can mentally compare your alpha level to the p-value shown in cell B11 for a one-tailed test or in cell B13 for a two-tailed test.

Figure 20 Two-Sample Test Results

t-Test: Two-Sample Assuming Equal Variances

                                salary male     salary female
Mean                            13217.97297     14678.45455
Variance                        47057190.92     34771460.07
Observations                    37              33
Pooled Variance                 41275670.52
Hypothesized Mean Difference    0
df                              68
t Stat                          -0.949418795
P(T<=t) one-tail                0.172884911
t Critical one-tail             1.667572281
P(T<=t) two-tail                0.345769822
t Critical two-tail             1.995468931
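To show where the pooled variance and t statistic in this output come from, the following Python sketch recomputes them from the means, variances, and counts reported above. The function name `pooled_t_test` is hypothetical, and the critical value is simply copied from the output rather than computed.

```python
import math


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2) / std_error
    return pooled_var, df, t_stat


# Summary values copied from Figure 20
pooled_var, df, t_stat = pooled_t_test(
    13217.97297, 47057190.92, 37,   # salary male
    14678.45455, 34771460.07, 33,   # salary female
)

t_critical_two_tail = 1.995468931  # copied from Figure 20, not computed here
reject_null = abs(t_stat) > t_critical_two_tail

print(f"Pooled variance: {pooled_var:.2f}")
print(f"df: {df}, t Stat: {t_stat:.6f}")
print("Reject equal means at the 5% level:", reject_null)
```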

Test for Equality of Variances

Given the test results in Figure 20, you might want to check whether the equal-variance assumption is reasonable. You can do this with the F-Test Two-Sample for Variances procedure, again using the unstacked data. The dialog box is filled out exactly as for the t-test, and the results appear in Figure 21. The one-tail p-value of about 0.19 is well above 0.05, so there is no convincing evidence that the two variances are unequal, and the equal-variance assumption appears reasonable.

Figure 21 Equal Variance Test Results

F-Test Two-Sample for Variances

                       salary male     salary female
Mean                   13217.97297     14678.45455
Variance               47057190.92     34771460.07
Observations           37              33
df                     36              32
F                      1.353328012
P(F<=f) one-tail       0.194021106
F Critical one-tail    1.783845791
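The F statistic in this output is simply the ratio of the two sample variances, with degrees of freedom one less than each sample size. A short Python sketch, using the reported values and a hypothetical function name, makes this explicit. The critical value is copied from the output, not computed.

```python
def variance_ratio(var1, n1, var2, n2):
    """F statistic and degrees of freedom for comparing two sample variances."""
    return var1 / var2, n1 - 1, n2 - 1


# Values copied from Figure 21
f_stat, df1, df2 = variance_ratio(47057190.92, 37, 34771460.07, 33)

f_critical_one_tail = 1.783845791  # copied from Figure 21
variances_differ = f_stat > f_critical_one_tail

print(f"F = {f_stat:.6f} with df = ({df1}, {df2})")
print("Evidence of unequal variances at the 5% level:", variances_differ)
```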


Test for Difference between Two Sample Means: Paired Samples

If you are comparing two samples that are paired in some natural way, you should use the t-Test: Paired Two Sample for Means procedure. As an example, the husband and wife ratings are naturally paired, assuming that the reactions of husbands and wives to bank confidence are correlated (see Figure 22). These data are already unstacked, as Analysis ToolPak requires, so the Paired Sample dialog box can be filled in directly, as shown in Figure 23.

Figure 22 Paired Confidence Ratings

Figure 23 Paired-Sample Test Dialog Box

Figure 24 Paired-Sample Test Results

t-Test: Paired Two Sample for Means

                                Confidence Score Husband    Confidence Score Wife
Mean                            4.444444444                 5.055555556
Variance                        2.614379085                 2.408496732
Observations                    18                          18
Pearson Correlation             0.786607714
Hypothesized Mean Difference    0
df                              17
t Stat                          -2.500455885
P(T<=t) one-tail                0.011463202
t Critical one-tail             1.739606726
P(T<=t) two-tail                0.022926405
t Critical two-tail             2.109815578
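The paired t statistic can be recovered from the summary rows of this output, because the variance of the paired differences equals var1 + var2 - 2 r sqrt(var1 var2), where r is the Pearson correlation. The Python sketch below applies this identity to the reported values. The function name is hypothetical, and the raw husband and wife scores are not needed.

```python
import math


def paired_t_from_summary(mean1, var1, mean2, var2, r, n):
    """Paired t statistic from means, variances, correlation, and pair count."""
    # Variance of the differences d = x1 - x2
    var_diff = var1 + var2 - 2 * r * math.sqrt(var1 * var2)
    std_error = math.sqrt(var_diff / n)
    return (mean1 - mean2) / std_error, n - 1


# Values copied from Figure 24
t_stat, df = paired_t_from_summary(
    4.444444444, 2.614379085,   # husband
    5.055555556, 2.408496732,   # wife
    r=0.786607714,
    n=18,
)

print(f"t Stat = {t_stat:.6f} with df = {df}")
```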


Analysis of Variance (ANOVA) Procedures

Single-Factor ANOVA

Single-factor ANOVA, also called one-way ANOVA, is an extension of the two-sample t-test (with independent samples) to more than two samples. It tests whether the means of all samples are equal.

The Analysis ToolPak’s Anova: Single Factor procedure implements this test, again assuming unstacked data. As an example, the file data2.xlsx lists cereal sales at Bhatbhateni superstore for five different shelf heights (see Figure 25). To run the analysis, we have to fill out the dialog box as shown in Figure 26.

Figure 25 Cereal Sales Data

Figure 26 Single-Factor ANOVA Dialog Box

The results appear in Figure 27. The sample mean and variance for each shelf height are listed, followed by the ANOVA table for the test. In this case, the p-value (0.308 > 0.05) indicates that mean cereal sales do not differ significantly across shelf heights.


Figure 27 ANOVA Results for Cereal Data

Anova: Single Factor

SUMMARY
Groups             Count    Sum      Average    Variance
Lowest             25       9483     379.32     9217.81
Next-to-lowest     25       10294    411.76     12520.61
Middle             25       10069    402.76     10373.44
Next-to-highest    25       10295    411.8      11262.83
Highest            25       11080    443.2      10722.58

ANOVA
Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         52549.71    4      13137.43    1.214241    0.308373    2.447237
Within Groups          1298335     120    10819.45
Total                  1350884     124
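The ANOVA table follows directly from the SUMMARY rows above it. As an illustration, this Python sketch rebuilds the sums of squares and the F statistic from the group counts, averages, and variances in Figure 27. The function name is hypothetical, and small differences from the Excel output are due to rounding in the displayed variances.

```python
def one_way_anova(groups):
    """Single-factor ANOVA from (count, mean, variance) per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-groups: spread of the group means around the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups: pooled spread inside each group
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


# (Count, Average, Variance) copied from the SUMMARY block of Figure 27
shelf_heights = [
    (25, 379.32, 9217.81),    # Lowest
    (25, 411.76, 12520.61),   # Next-to-lowest
    (25, 402.76, 10373.44),   # Middle
    (25, 411.80, 11262.83),   # Next-to-highest
    (25, 443.20, 10722.58),   # Highest
]
ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(shelf_heights)

print(f"Between Groups: SS = {ss_between:.2f}, df = {df_between}")
print(f"Within Groups:  SS = {ss_within:.2f}, df = {df_within}")
print(f"F = {f_stat:.6f}")
```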

Two-Factor ANOVA with Replication

Analysis ToolPak’s Anova: Two-Factor With Replication procedure is an extension of the single-factor ANOVA procedure. Now there are two factors, and observations are made for each combination of the two factor levels. The test is again basically a test of equal means, or equivalently, of equal factor-level effects. There is no easy way to rearrange the data as required other than by copying and pasting. The rearranged data and the resulting output are shown in the sheet ANOVA_R.

Figure 31 Two-Factor ANOVA with Replications Results

Two-Factor ANOVA without Replication

Analysis ToolPak also has an Anova: Two-Factor Without Replication procedure. It is like the “With Replication” procedure except that there is only one observation per factor-level combination. As an example, the production worksheet in the file data2.xlsx has one observation of production for each of the four machines and each of the four operators. With this arrangement, we can run the Two-Factor ANOVA without replication procedure directly.

Figure 32 Rearranged Soap Sales Data

             Machines
Operators    Machine1    Machine2    Machine3    Machine4
Operator1    10          12          14          16
Operator2    12          11          13          16
Operator3    14          15          12          11
Operator4    16          10          17          17

The question is whether the differences in average productivity are due to differences among the operators, among the machines, or both.

Alternative hypotheses:
a. Production differs significantly across machines.
b. Production differs significantly across operators.

The results appear in Figure 33. Summary statistics are listed for each level of each of the two factors, and the ANOVA table shows the results of the tests. Both p-values are greater than 0.05, which indicates that productivity does not differ significantly across machines or across operators. (The null hypothesis of equal means cannot be rejected in either case, so neither alternative hypothesis is supported.)


Figure 33 ANOVA Results for Soap Sales

Anova: Two-Factor Without Replication

SUMMARY      Count    Sum    Average    Variance
Operator1    4        52     13         6.666667
Operator2    4        52     13         4.666667
Operator3    4        52     13         3.333333
Operator4    4        60     15         11.33333
Machine1     4        52     13         6.666667
Machine2     4        48     12         4.666667
Machine3     4        56     14         4.666667
Machine4     4        60     15         7.333333

ANOVA
Source of Variation    SS    df    MS          F           P-value     F crit
Rows                   12    3     4           0.62069     0.619097    3.862548
Columns                20    3     6.666667    1.034483    0.422775    3.862548
Error                  58    9     6.444444
Total                  90    15
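Because the full operator-by-machine table is given above, the sums of squares and F statistics in this output can be checked by hand or with a short script. The Python sketch below does this. The function name and the dictionary keys are chosen here for illustration, and p-values are not computed.

```python
def two_factor_anova(table):
    """Two-factor ANOVA without replication for a rows-by-columns table."""
    n_rows, n_cols = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (n_rows * n_cols)
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]

    ss_rows = n_cols * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n_rows * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    df_rows, df_cols = n_rows - 1, n_cols - 1
    ms_error = ss_error / (df_rows * df_cols)
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "f_rows": (ss_rows / df_rows) / ms_error,
        "f_cols": (ss_cols / df_cols) / ms_error,
    }


# Rows are Operator1..Operator4, columns are Machine1..Machine4
production = [
    [10, 12, 14, 16],
    [12, 11, 13, 16],
    [14, 15, 12, 11],
    [16, 10, 17, 17],
]
result = two_factor_anova(production)
for name, value in result.items():
    print(f"{name}: {value:.6f}")
```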

Regression Analysis

One of the most popular Analysis ToolPak procedures is its Regression procedure. As an example, suppose we are interested in a regression of income on age and education level using our bank confidence data (see Figure 34).

Figure 34 Bank confidence data

These data are fortunately in the form Analysis ToolPak requires—the independent variables are in contiguous columns—so the Regression dialog box can be filled out as shown in Figure 35. You can decide which of the five check boxes at the bottom to check (including none of them) for diagnostic analysis of the residuals.


Figure 35 Regression Dialog Box

The regression output shown in Figure 36 is standard. It includes the regression summary statistics at the top, the ANOVA table for checking whether the regression has any significance as a whole, and the information on the individual regression coefficients. One curious feature is that you automatically get two versions of the confidence intervals for the coefficients, and you get them regardless of whether you check the Confidence Level box in Figure 35. If you check the box and enter a 90% confidence level, as in Figure 35, the output shows both the default 95% interval and the 90% interval, as in Figure 36.

Figure 36 Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.38109
R Square             0.14523
Adjusted R Square    0.119714
Standard Error       6023.492
Observations         70

ANOVA
              df    SS               MS              F       Significance F
Regression    2     413026782.72     206513391.36    5.69    0.01
Residual      67    2430924538.77    36282455.80
Total         69    2843951321.49

                 Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%    Lower 90.0%    Upper 90.0%
Intercept        4980.41         2743.26           1.82      0.07       -495.16      10455.98     404.89         9555.93
Age              201.26          69.12             2.91      0.00       63.29        339.22       85.97          316.54
Qualification    659.19          493.76            1.34      0.19       -326.36      1644.75      -164.36        1482.75

The residuals and residual plots requested in Figure 35 are shown in Figure 37. (Actually, all charts overlap one another, so you will probably want to move them around.) These residual plots let you see whether there are any obvious violations of the regression assumptions.
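Several entries in the regression output of Figure 36 are linked by simple identities, which the following Python sketch checks using the reported sums of squares. It does not refit the regression, since the raw data are not reproduced here, and the variable names are chosen for illustration.

```python
import math

# Values copied from the ANOVA block of Figure 36
ss_regression = 413026782.72
ss_residual = 2430924538.77
n_obs = 70        # Observations
n_predictors = 2  # Age and Qualification

df_residual = n_obs - n_predictors - 1
ss_total = ss_regression + ss_residual

# R Square: share of total variation explained by the regression
r_square = ss_regression / ss_total
# Adjusted R Square penalizes the number of predictors
adjusted_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual

ms_regression = ss_regression / n_predictors
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

# t Stat for a coefficient: coefficient / its standard error (Age row)
t_age = 201.26 / 69.12

print(f"R Square: {r_square:.5f}")
print(f"Adjusted R Square: {adjusted_r_square:.6f}")
print(f"F: {f_stat:.2f}")
print(f"Standard Error: {standard_error:.3f}")
print(f"t Stat (Age): {t_age:.2f}")
```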


Figure 37 Residuals and Residual Plots

Sampling

The Sampling tool in Analysis ToolPak allows you to choose a random sample from a larger “population” of values. In practice, this population could be very large, for example a long list of records. The input must be a single column; the tool cannot sample from multiple columns at once.
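For comparison, drawing a random sample from one column of values takes a few lines of Python. The text does not say whether the Sampling tool draws with or without replacement, so this sketch assumes sampling without replacement. The population, the sample size, and the seed are hypothetical.

```python
import random


def draw_sample(column, size, seed=None):
    """Draw `size` distinct entries at random from a single column of values."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(column, size)


population = list(range(1, 101))  # hypothetical column of 100 record IDs
sample = draw_sample(population, 10, seed=42)
print(sample)
```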

Conclusion

Although Analysis ToolPak lacks many features and the overall professional look of other statistical software packages, it has basic statistical tools that are easy to use and easily accessible. If you want to perform standard statistical analyses and you have only Excel, you can certainly get by with Analysis ToolPak. You might have to rearrange data, widen output columns, reformat graphs, and possibly do a few other things, but you will be able to get the basic results you need fairly quickly and easily. An alternative way to carry out these statistical analyses is through another add-in (possibly MegaStat), which we will discuss in another session.