EXCEL PROJECT TUTORIAL


GETTING YOUR UNIQUE DATA SET…

Go to the Stat 216 homepage, http://www.stat.wmich.edu/s216, and click the Weekly Homework link.

Under the Excel Projects section, click HW Data.

You will be directed to a page listing several data sets. Click the one assigned for this semester: Realestate Data.

You will then be directed to the page for that data set. Under the Select Variables section, check the box in front of every variable.

At the bottom of the page, enter 30 for the sample size and the 4-digit PIN that you use to access your weekly homework, then click the Submit button.

You will be directed to a page containing your unique data set.

COPYING YOUR DATA SET INTO EXCEL…

On the page containing your unique data set, select all, then copy.

Open Microsoft Excel and paste your data set into the first cell.

Click the Data tab, then Text to Columns, to separate the variables into columns.

In the dialog box that appears, choose Delimited, then click Next.

In the next dialog box, select how your data set is delimited. In our case the variables are separated by commas, so make sure that only the Comma box is checked. Then click Finish.
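For readers who want to see what the Delimited/Comma step does to the pasted text, here is a minimal Python sketch. The sample text and the function name split_into_columns are made up for illustration; your own data set will have different variables and values.

```python
import csv
import io

# Hypothetical pasted text: a header line followed by comma-separated rows.
pasted_text = """Price,Bedrooms,Pool
189000,3,0
245500,4,1
"""


def split_into_columns(text):
    """Split comma-delimited text into a header list and a list of data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    return header, data


header, data = split_into_columns(pasted_text)
print(header)  # the variable names, one per column
print(data)    # each row split at the commas (values are still text)
```

As in Excel, every comma starts a new column; the values remain text until you convert them to numbers.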

Your data set will now be separated into columns. You may change the font size or any other formatting. Since you will use this data set in all three phases of the project, save it under a filename you will remember, e.g. stat216project.

EXCEL PROJECT Phase I

PHASE I

In this phase, you are expected to identify the type and level of measurement of each variable that you are dealing with.

In addition, given the kind of variable you have, what is the appropriate method of data presentation for it?

Furthermore, what measures of location and spread could you compute for these variables to better describe your data set?

You may construct a table to help guide you on what to do with your variables.

Example:

Variable    Type           Level of Measurement
Price       Numerical      Ratio
Color       Categorical    Nominal

Once you have identified the type and level of measurement of each variable, what graphs or tables could you use to describe categorical variables? What about numerical variables?

Microsoft Excel has a Data Analysis ToolPak that can help you produce graphs. On the Data tab, you should see a button labeled Data Analysis. If you do not, you need to install the ToolPak.

INSTALLING DATA ANALYSIS TOOLPAK...

In Excel 2007, click the Office button at the top, then choose Excel Options.

In the Excel Options box, click Add-Ins.

At the bottom of the Add-Ins pane, select Excel Add-Ins from the Manage drop-down list, then click Go.

In the Add-Ins dialog box, check the box for Analysis ToolPak, then click OK.

You will then see the Data Analysis button on the Data tab.

GRAPHING VARIABLES…

Suppose you want to create a graph for a variable. Let's say, for example, that your variable has two categories: 1 (Yes) and 0 (No).

For this variable, the first thing you need to do is count the number of observations in each category.

Then select the appropriate graph that you want to make.

Open the file containing your data set.

Suppose your data set contains a categorical variable, say Pool (0 = No, 1 = Yes), and that the observations for Pool run from D2 to D31.

To graph a categorical variable, you must first create a “bin” that lists all the categories of the variable. Since Pool has only two categories, the bin has two entries as well: 0 and 1.

Once you have created the bin, click the Data tab, then click the Data Analysis button. You will see a menu listing the contents of the Analysis ToolPak.

Since our goal is to count the number of observations in each category, choose Histogram, then click OK.

You will then be prompted to enter the Input Range and the Bin Range.

The Input Range is the column containing the observations for the variable.

The Bin Range is the column containing the categories of the variable.

In our example, the observations for Pool run from D2 to D31, and the bin runs from K3 to K4.

Once you click OK, a new worksheet is created showing the count for each category of the variable.
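The counting that the Histogram tool performs for a 0/1 variable can be sketched in a few lines of Python. The observations below are invented for illustration (your own Pool column has 30 values), and count_by_category is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical Pool observations (0 = No, 1 = Yes).
pool = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]

# The "bin": one entry per category of the variable.
bins = [0, 1]


def count_by_category(values, categories):
    """Return the number of observations that fall in each listed category."""
    counts = Counter(values)
    return {category: counts[category] for category in categories}


frequency = count_by_category(pool, bins)
print(frequency)
```

The resulting counts are what you would then chart, for example as a pie chart or bar chart.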

On this worksheet, click the Insert tab, then choose the graph you want.

For example, suppose we want a pie chart. Click Pie, then choose the type of pie chart you want. Excel will then display the pie chart.

COMPUTING SUMMARY STATISTICS…

Suppose, for example, that you want to describe a variable using numerical descriptive measures. Let's say the variable is the price of a house and that it is in the first column of the data set.

Again, click the Data tab, then the Data Analysis button. From the menu, select Descriptive Statistics.

In the Input Range box, enter the range of the variable for which you want to compute statistics.

If the first row contains the label of the variable, check the box that says Labels in First Row.

Then check the box for Summary statistics and click OK.

On a new worksheet, the values of several numerical descriptive measures will be displayed. Adjust the column width to see the values clearly.
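If you would like to check a few of these measures by hand, the sketch below computes some common measures of location and spread in Python. The prices are hypothetical (in thousands of dollars), the function name describe is made up, and the list of measures is only a subset chosen for illustration; it is not meant to reproduce Excel's output exactly.

```python
import statistics

# Hypothetical house prices, in thousands of dollars.
prices = [150, 200, 200, 250, 300]


def describe(values):
    """Compute a few common measures of location and spread for a sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sample_stdev": statistics.stdev(values),        # divides by n - 1
        "sample_variance": statistics.variance(values),  # divides by n - 1
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "count": len(values),
    }


summary = describe(prices)
for name, value in summary.items():
    print(name, value)
```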

PHASE I WRITE-UP

Using all the graphs and computations that you made for the variables, describe the data set that you have on hand.

You need not use every variable in your write-up, but you must briefly explain why you decided to include each variable that you do use.

EXCEL PROJECT Phase II

PHASE II

The second phase of the project focuses on estimation and hypothesis testing.

In this phase, you are to compute point and interval estimates for a specific variable of interest and draw conclusions based on the confidence interval or the p-value of the test.

Suppose for example, we go back to our data set that has variables price and pool.

We might be interested to know the average price of a house, or the difference in the average price of a house with and without a pool.

POINT ESTIMATION

If we want only a point estimate of the average of a specific variable, we can use the Descriptive Statistics option in the Data Analysis menu (see the earlier instructions).

If we want a confidence interval instead, we can use an Excel worksheet that has been provided for you.

CONFIDENCE INTERVAL

We made an Excel workbook that helps you compute a confidence interval for the mean.

The first worksheet is designed for a confidence interval for one population mean. Just follow the instructions written on the spreadsheet; the worksheet then displays your confidence interval.
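The provided worksheet is not reproduced here, so its exact formula is not known. As a rough sketch, assuming it computes the usual t-based interval (sample mean plus or minus a critical t value times the standard error), the calculation could look like this in Python. The prices and the function name are hypothetical. The critical value 2.776 is the two-sided 95% t value for 4 degrees of freedom; for a sample of 30 (29 degrees of freedom) it would be about 2.045.

```python
import math
import statistics


def mean_confidence_interval(values, t_critical):
    """Return (lower, upper) for the mean, given a critical t value from a table."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    margin = t_critical * standard_error
    return mean - margin, mean + margin


# Hypothetical house prices, in thousands of dollars.
prices = [150, 200, 200, 250, 300]

# Assumed critical value: two-sided 95% confidence, n - 1 = 4 degrees of freedom.
T_CRITICAL_95_DF4 = 2.776

lower, upper = mean_confidence_interval(prices, T_CRITICAL_95_DF4)
print(round(lower, 2), round(upper, 2))
```

The critical value is passed in rather than computed because it depends on both the confidence level and the sample size; look it up in a t table for your own sample.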

If you are interested in a confidence interval for the difference between two independent means, use the second worksheet.

First, you need to sort the data set so that the values are grouped by category.

For example, suppose we want a confidence interval for the difference in the average price of homes with a pool (pool=1) and without a pool (pool=0).

We need to sort the data set so that all the rows with pool=0 are next to each other and all the rows with pool=1 are next to each other.

SORTING YOUR DATA SET

Select the entire data set (CTRL + A).

Click the Data tab, then choose the Sort button.

The Sort dialog box will appear. Since our data set has the variable names in the first row, check the box indicating that the data has headers.

Then, from the Sort By drop-down menu, choose the variable to sort on. In our case, that is pool.

Once you have selected the appropriate variable, click OK.

Your data set will then be sorted by that variable, so that all the rows with pool=0 are next to each other.
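The purpose of the sort is simply to group the prices by pool category. A minimal Python sketch of the same idea follows; the four rows are invented for illustration, and the column names Price and Pool are assumed to match your data set.

```python
# Hypothetical rows of the data set (price in thousands of dollars).
rows = [
    {"Price": 210, "Pool": 1},
    {"Price": 180, "Pool": 0},
    {"Price": 260, "Pool": 1},
    {"Price": 175, "Pool": 0},
]

# Sort by the Pool column so that rows with the same category are adjacent.
sorted_rows = sorted(rows, key=lambda row: row["Pool"])

# Pull out the prices for each category, ready to copy into separate columns.
prices_no_pool = [row["Price"] for row in sorted_rows if row["Pool"] == 0]
prices_pool = [row["Price"] for row in sorted_rows if row["Pool"] == 1]

print(prices_no_pool)
print(prices_pool)
```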

Once your data set is sorted, follow the instructions in the worksheet; it then displays your confidence interval.

Note that since our interest is the difference in price between homes with and without a pool, what you copy into the worksheet are the PRICES for homes with pool=0 under the 0 column and the PRICES for homes with pool=1 under the 1 column.

You can use this confidence interval to draw conclusions as well.

TEST OF HYPOTHESIS

There are several functions in the Data Analysis toolpak that you could use to conduct a test of hypothesis. Depending on the test that you are going to conduct, choose the appropriate test.

Suppose, in our example, that we want to know whether there is a difference in the average price of houses with and without a pool.

[Figure: Data Analysis menu with the test to use highlighted]

Once you click OK, a dialog box will appear. In it:

Specify the range of values for prices with pool=0.

Specify the range of values for prices with pool=1.

Set the level of significance.

Suppose that in our sample data set, the prices for pool=0 run from A2 to A19 and the prices for pool=1 run from A20 to A31. We want to test the hypothesis at the 5% level of significance.

The output appears on a new worksheet. Adjust the column widths to see the numbers clearly. The output includes the value of the test statistic, the p-value for the one-tailed test, and the p-value for the two-tailed test.
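To see where a two-sample test statistic comes from, here is a small Python sketch. It assumes the unequal-variances (Welch) form of the two-sample t-test, which may or may not be the variant chosen in the Data Analysis menu. The prices and the function name welch_t are hypothetical, and the sketch stops at the test statistic and degrees of freedom; it does not compute the p-values that Excel reports.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom), unequal variances assumed."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Squared standard error contributed by each sample: s^2 / n.
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t_statistic = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    degrees_of_freedom = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t_statistic, degrees_of_freedom


# Hypothetical prices (thousands of dollars) for homes without and with a pool.
prices_no_pool = [180, 175, 190, 185]
prices_pool = [210, 260, 240, 230]

t_statistic, degrees_of_freedom = welch_t(prices_no_pool, prices_pool)
print(round(t_statistic, 3), round(degrees_of_freedom, 2))
```

A negative statistic here only means the first sample (no pool) has the smaller mean; the sign flips if you swap the two samples.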

PHASE II WRITE-UP

Your write-up for phase II should include all your estimates and conclusions that you drew.

You must give supporting evidence for each conclusion, i.e., specify the p-value and explain why it led you to that conclusion.

EXCEL PROJECT Phase III

PHASE III

The Final Phase of the project is basically phase I and II combined, with some more information that you could include in your project.

For example, by the time you turn in phase II, we will not yet have covered chi-square, regression, and correlation analysis.

In your final phase, you might want to include some of these analyses to give further meaning to your data set.

For example, consider our data set containing the price of a house. Which variables are associated with price? Which variables could you use to predict the price of a house?

These are just guide questions to help you analyze your data set further.

CORRELATION ANALYSIS

Suppose you want to determine the strength of the association between the price of a house and the number of bedrooms.

In the Data Analysis ToolPak, choose Correlation.

In the dialog box, highlight the columns for price and bedrooms as the Input Range. Also check the box for Labels in First Row.

The output appears on a new worksheet; it contains the correlation coefficient of the two variables.
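The correlation coefficient can also be computed directly from its definition. The sketch below does this in Python for the Pearson correlation; the bedroom counts and prices are invented for illustration, and pearson_r is a hypothetical function name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: number of bedrooms and price (thousands of dollars).
bedrooms = [2, 3, 3, 4, 5]
prices = [150, 200, 210, 250, 300]

r = pearson_r(bedrooms, prices)
print(round(r, 3))
```

Values of r near 1 or -1 indicate a strong linear association, and values near 0 indicate a weak one.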

REGRESSION ANALYSIS

Suppose you want to predict the price of a house using, say, the number of bedrooms.

You can use the Regression option in the Data Analysis ToolPak.

In the Regression dialog box:

Specify the range of values of the variable you want to predict.

Specify the range of values of the variable you are using to predict it.

The regression output appears on a new worksheet.
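For a single predictor, the intercept and slope in the regression output come from the least-squares formulas. Here is a minimal Python sketch with hypothetical data; fit_line is a made-up function name, and the sketch covers only the two coefficients, not the rest of the regression output.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for predicting y from a single x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of bedrooms and price (thousands of dollars).
bedrooms = [2, 3, 3, 4, 5]
prices = [150, 200, 210, 250, 300]

intercept, slope = fit_line(bedrooms, prices)

# Predicted price for a hypothetical four-bedroom house.
predicted_price = intercept + slope * 4
print(round(intercept, 2), round(slope, 2), round(predicted_price, 2))
```

The slope estimates how much the predicted price changes for each additional bedroom.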

PHASE III WRITE-UP

In your final project write-up, you are expected to write an executive summary about the entire project.

You might want to include these sections in your project in order to provide your readers with an effective paper.

Introduction: What is your data set about? What are the variables? What questions did you intend to answer in this project, and what methods did you use to answer them?

Executive Summary: What are your findings? What are the answers to the questions you raised? What can you conclude about your data set?

Appendix: a copy of your data set and your references.