
# Univariate Descriptive Statistics

Introduction

In these notes we apply some basic descriptive statistics to the database “Paises.xlsx”, the same database used in the previous notes on the basic use of Excel 2007.

Data analysis

Suppose you are interested in a descriptive analysis of the variable PIB, initially considering all the observations. To do this, we can use one option of the package we installed in the previous computer class, i.e. the tool “Análisis de datos” (or Data Analysis Tool).

Note: to lock a cell reference (make it absolute), you can type the two \$ signs manually or press F4.

The steps to follow are:

1. Select Data Analysis Tool in the tab Data; select Descriptive Statistics and press OK:

2. In the new window we have to enter the parameters that indicate the variable we want to analyze (we can type the information by hand or select the cells from the sheet); in this case we should enter the range “\$G\$1:\$G\$92”. Moreover, we have to select Grouped By: Columns and Labels in First Row, because our variable is contained in a column and the selected range includes the label of the variable. Furthermore, we have to indicate where we want the output to appear; we suggest the option New Worksheet Ply: Uni_An_PIB (simply a name that helps us remember what is inside that worksheet). Finally, we should select Summary Statistics and press OK:

3. After pressing OK, Excel 2007 will show:

Another interesting option available in the Data Analysis Tool is Rank and Percentile. In this case, using the variable PIB, we could choose Output Range among the output options and select, for example, the cell “Uni_An_PIB!\$D\$1”. In this way, the new descriptive measures will appear in the same sheet in which we reported the previous ones:

After pressing OK, Excel 2007 will show:

Since the output is sorted in decreasing order, we can read off the first, the second and the third quartiles (note that the second quartile is the median, so in practice we already know its value). In this case the number of observations is odd and, because the sample is ordered from the largest to the smallest value, the first quartile is the observation in position 3(n+1)/4, the second quartile is the observation in position 2(n+1)/4 = (n+1)/2, and the third quartile is the observation in position (n+1)/4.

We can compute these three positions using Excel as a simple calculator. In cell “I1” we enter the formula “=3*(B15+1)/4”, where B15 is the cell in which Excel 2007 reported the number of observations, n. Next, in cells “I2” and “I3” we enter “=(B15+1)/2” and “=(B15+1)/4” respectively. The positions we obtain are 69, 46 and 23, and the corresponding quartiles are 470, 1690 and 7600. Clearly, we can report this new information in the sheet in which we are working, i.e. Uni_An_PIB. For example:

Note: the previous problem can also be addressed using the built-in functions PERCENTILE() or QUARTILE(). However, in this case Excel 2007 will compute Q1 and Q3 by averaging the observations in positions 68-69 and 23-24 respectively.

Note: the value of a quartile IS NOT the position, but the numerical value of the observation in that position.
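As a rough cross-check outside Excel, the position rule above can be sketched in Python. The sample below is a made-up stand-in (the integers 1 to 91, not the PIB data) and the function name `quartiles_by_position` is hypothetical. The sketch assumes that n + 1 is a multiple of 4, so that every position is a whole number.

```python
def quartiles_by_position(sample):
    """Return (Q1, Q2, Q3) read from the sample sorted in decreasing order.

    Assumes n + 1 is a multiple of 4, so every position is a whole number.
    """
    ordered = sorted(sample, reverse=True)  # largest value first
    n = len(ordered)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch only handles n + 1 divisible by 4")
    pos_q1 = 3 * (n + 1) // 4  # Q1 sits near the bottom of a decreasing list
    pos_q2 = (n + 1) // 2
    pos_q3 = (n + 1) // 4
    # Positions are 1-based, Python lists are 0-based.
    return ordered[pos_q1 - 1], ordered[pos_q2 - 1], ordered[pos_q3 - 1]


# Hypothetical data with n = 91 observations: the integers 1..91.
sample = list(range(1, 92))
q1, q2, q3 = quartiles_by_position(sample)
print(q1, q2, q3)
```

The function returns the values found at the positions, not the positions themselves, which is the distinction the note above insists on.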

The option Descriptive Statistics does not provide all the information you might want; for example, if you are interested in the coefficient of variation, you have to compute it by applying its definition, i.e. Standard Deviation/Mean. You can compute it in cell “B22” using the formula “=B7/B3”.
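The same definition can be sketched in Python. This is only an illustration with made-up numbers; it assumes that the Standard Deviation reported by Descriptive Statistics is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes. The function name `coefficient_of_variation` is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical data, not taken from Paises.xlsx.
values = [2.0, 4.0, 4.0, 6.0]
cv = coefficient_of_variation(values)
print(round(cv, 4))
```

The coefficient is unit-free, so it can be used to compare the dispersion of variables measured on different scales.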

The data used in these computer classes also include a categorical variable. Suppose you are interested in a descriptive analysis of the variable PIB, but only for the European countries. To do this, when we use the options of the Data Analysis Tool, we must select only those countries with value EU in the variable Zona.

Note: in this case the observations are already ordered according to the variable Zona. If this is not the case, you can sort a database using the following steps:

1. Tab Home.

2. Sort & Filter.

3. Custom Sort.

4. Enter your preferences (in this case we want to sort alphabetically by the variable Zona, and then by the label País):
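The sorting and selection steps can be mirrored in Python as follows. This is a minimal sketch: the rows are invented placeholders (not the contents of “Paises.xlsx”), and the dictionary keys `Pais`, `Zona` and `PIB` are assumed names for the columns.

```python
# Hypothetical rows standing in for the worksheet.
rows = [
    {"Pais": "Country D", "Zona": "EU", "PIB": 400},
    {"Pais": "Country A", "Zona": "AFR", "PIB": 100},
    {"Pais": "Country C", "Zona": "EU", "PIB": 300},
    {"Pais": "Country B", "Zona": "ASIA", "PIB": 200},
]

# Custom Sort: first by Zona, then by Pais (both alphabetically).
ordered = sorted(rows, key=lambda row: (row["Zona"], row["Pais"]))

# Keep only the PIB values of the rows whose Zona is EU.
eu_pib = [row["PIB"] for row in ordered if row["Zona"] == "EU"]
print(eu_pib)
```

After sorting, the EU rows are contiguous, which is what makes it possible to select them as a single range in the Data Analysis Tool.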

Frequency tables and histograms: quantitative variables

In this section we show how to create frequency tables and histograms for quantitative variables. To do this, we will use the variable ln(PIB), i.e. a variable that we built during the first computer class (guide 0).

Go to a new sheet and rename it Hist_Freq_lnPIB. In the first column we are going to compute the limits of the classes using the following rule:

• Number of observations: 91
• Minimum value: 4,38202663 → consider 4,3
• Maximum value: 10,4359964 → consider 10,5
• Range: 6,2
• Number of classes: 91^(1/2) = 9,53939201 → 9 or 10 classes.

Suppose you want to use 10 classes. How can we create them?

1. In cell “B1” we compute the length of each interval with the formula “=(10,5-4,3)/10”:

2. In cell “A4” we compute the upper limit of the first class, which is equal to “min+length”, with the formula “=4,3+\$B\$1”:

3. Next, we compute the remaining upper limits. Each of them is equal to “previous upper limit+length”. To compute the second upper limit, we go to cell “A5” and enter the formula “=A4+\$B\$1”. Finally, we copy the content of “A5” into the cells from “A6” to “A13”:
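The rule and the three steps above can be sketched in Python. The numbers 91, 4.3 and 10.5 come from the notes; the variable names are illustrative, and the rounding to two decimals is an assumption added only to hide floating-point noise.

```python
import math

n_obs = 91               # number of observations
lower, upper = 4.3, 10.5  # rounded minimum and maximum of ln(PIB)

# Square-root rule for the number of classes (about 9.54, so 9 or 10).
suggested_classes = math.sqrt(n_obs)

n_classes = 10
length = (upper - lower) / n_classes  # same as "=(10,5-4,3)/10"

# Upper limit of class k is min + k * length, for k = 1..10.
upper_limits = [round(lower + length * k, 2) for k in range(1, n_classes + 1)]
print(upper_limits)
```

Computing each limit as min + k * length avoids accumulating rounding error, while the spreadsheet version adds the length to the previous limit; both give the same limits up to rounding.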

Once we have the upper limits of the classes, we can compute the frequency table and draw the histogram:

1. Select Data Analysis Tool in the tab Data; select Histogram and press OK.

2. In the Histogram window we have to enter the Input Range “Hoja1!\$H\$1:\$H\$92” and the Bin Range “\$A\$3:\$A\$13”, select Labels, enter the Output Range (for example, we select the cell “A15” in Hist_Freq_lnPIB) and select Chart Output:

3. After pressing OK, Excel 2007 will show:
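To make explicit what the Histogram tool counts, here is a small Python sketch. It assumes the usual convention of the tool: a value belongs to a class if it is greater than the previous upper limit and less than or equal to the class's own upper limit, and values above the last limit go to an extra final row. The data and the function name `bin_counts` are hypothetical.

```python
from bisect import bisect_left


def bin_counts(values, upper_limits):
    """Count values per class; the last entry collects values above every limit."""
    counts = [0] * (len(upper_limits) + 1)
    for value in values:
        # Index of the first upper limit that is >= value.
        counts[bisect_left(upper_limits, value)] += 1
    return counts


# Hypothetical values and limits, for illustration only.
values = [4.5, 4.92, 5.0, 10.5, 11.0]
limits = [4.92, 5.54, 10.5]
counts = bin_counts(values, limits)
print(counts)
```

Note that a value exactly equal to an upper limit (4.92 here) is counted in the class that ends at that limit, not in the next one.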

The results obtained can be improved in two ways:

I. We can complete the frequency table.

II. We can improve the histogram.

In case I: starting from the absolute frequencies, we can compute the relative frequencies, the cumulative absolute frequencies and the cumulative relative frequencies. In this way:

1. Copy the content of cells “A15:B25” to “A30:B40” (the last row of the output is not of interest); next, select the column Upper_limits, press the right mouse button and select Insert… → Shift cells right. To show more complete information, in this new column we can compute the lower limits of the classes. We go to cell “A31” and enter the formula “=B31-\$B\$1”; finally, we copy the content of this cell into the cells from “A32” to “A40”.

2. In cells D31-D40 we compute the relative frequencies; in cell “D31” we enter the formula “=C31/Uni_An_PIB!\$B\$15”, and next we copy the content of this cell into the cells from “D32” to “D40”.

3. In cells E31-E40 we compute the cumulative absolute frequencies; in cell “E31” we simply enter the formula “=C31”; in cell “E32” we enter the formula “=E31+C32”, and next we copy the content of this cell into the cells from “E33” to “E40”.

4. In cells F31-F40 we compute the cumulative relative frequencies. Because of the structure of the table we are building, we can simply copy the content of cells E31-E40 into cells F31-F40: the relative references in the formulas shift one column to the right, so they accumulate column D instead of column C. The final table is:
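The four columns of the table can be sketched in Python as follows. The absolute frequencies below are made up for illustration (they are not the ln(PIB) counts), and the function name `frequency_table` is hypothetical. The sketch divides by the sum of the absolute frequencies, which matches dividing by n as long as every observation falls in one of the classes.

```python
from itertools import accumulate


def frequency_table(absolute):
    """Return relative, cumulative absolute and cumulative relative frequencies."""
    total = sum(absolute)
    relative = [f / total for f in absolute]
    cum_absolute = list(accumulate(absolute))  # running sum, like "=E31+C32"
    cum_relative = list(accumulate(relative))  # same rule applied to column D
    return relative, cum_absolute, cum_relative


# Hypothetical absolute frequencies for four classes.
absolute = [1, 3, 4, 2]
relative, cum_absolute, cum_relative = frequency_table(absolute)
print(relative, cum_absolute, cum_relative)
```

A quick sanity check on any such table: the last cumulative absolute frequency must equal the number of observations, and the last cumulative relative frequency must equal 1.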

In case II: the output of Excel 2007 is a histogram with gaps between the bars; however, our classes are contiguous and share lower and upper limits. To join the bars:

1. Place the mouse pointer on a bar of the histogram, press the right button and select Format Data Series…:

2. In the new window, change the Gap Width to 0%, and the new histogram is:

Frequency tables and histograms: qualitative variables

In this section we show how to create frequency tables and histograms for qualitative variables. To do this, we will use the variable Zona.

Go to a new sheet and rename it Zona. In the first column write the names of the categories of the variable:

Zona

AFR

ASIA

AME

EU

Total

In the second column, in cell “B2”, compute the absolute frequency of the category “AFR”. To do this we can use the function COUNTIF() in the following way: “=COUNTIF(Hoja1!I\$2:I\$92;A2)”. Next, copy its content to “B3”, “B4” and “B5”.

To verify that everything is correct, in cell “B6” compute the sum of the absolute frequencies (“=SUMA(B2:B5)”):

| Zona | f |
|-------|----|
| AFR | 27 |
| ASIA | 24 |
| AME | 14 |
| EU | 26 |
| Total | 91 |
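What COUNTIF() does for each category can be sketched in Python with `collections.Counter`. The list `zona` is a stand-in rebuilt from the frequencies in the table above, not the actual column of “Paises.xlsx”.

```python
from collections import Counter

# Stand-in for the Zona column, rebuilt from the table above.
zona = ["AFR"] * 27 + ["ASIA"] * 24 + ["AME"] * 14 + ["EU"] * 26

counts = Counter(zona)         # one absolute frequency per category
total = sum(counts.values())   # plays the role of the check in cell B6

for category in ["AFR", "ASIA", "AME", "EU"]:
    print(category, counts[category])
print("Total", total)
```

As in the spreadsheet, comparing the total with the number of observations is a cheap way to detect a misspelled or missing category.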

Once we have these absolute frequencies, we already know how to compute the other types of frequencies. Finally, let’s see how to create a bar plot for this variable:

1. Go to cell “D1”, move to the tab Insert and select the option Column → 3-D Clustered Column.

2. Excel 2007 will create an empty chart. To pass to Excel 2007 the data from which we want to generate the chart, select the option Select Data.

3. In Chart data range enter “=Zona!\$A\$1:\$B\$5”, and the final graph is:

BoxPlots

In Excel 2007 there is no built-in procedure to make boxplots. It is better to use a macro program. Specifically, the following macro makes boxplots:

http://www.cms.murdoch.edu.au/areas/maths/statsnotes/samplestats/BoxPlotMacro.xls

You have to allow the use of macros by removing the security restrictions of Excel 2007.

First, go to Excel Options -> Popular and tick (if it is not already ticked) Show Developer tab in the Ribbon.

Then a new tab called Developer appears in the main menu. From this one, select Macro Security:

Here, tick Enable all macros (not recommended…

Obviously, in general, you must be careful if you run macros from websites that are not secure (this is not our case here)…

Run the macro:

And use it with a data file.

For example, in the file “Paises.xlsx” mark a column and run the boxplot macro from the Macro tab.

You will obtain the boxplot.

If you tick <Chart Title> and <Data scale…> you can modify the labels of the figure.