Univariate Descriptive Statistics
Introduction
In these notes we illustrate some basic descriptive statistics by applying them to the database
“Paises.xlsx”, the same database used in the previous notes on the basic use of Excel 2007.
Data analysis
Suppose you are interested in a descriptive analysis of the variable PIB and, initially, want to
consider all the observations. To do this, we can use one option of the package we installed in
the previous computer class, i.e. the tool “Análisis de datos” (Data Analysis Tool).
Note: to lock a cell reference, you can type the two $ signs manually, or you can press F4.
The steps to follow are:
1. Select Data Analysis Tool in the tab Data; select Descriptive Statistics and press OK:
2. In the new window we have to enter the parameters that identify the variable
we want to analyze (we can type the information by hand, or select the cells on
the sheet); in this case we should enter the range “$G$1:$G$92”. Moreover, we have
to select Grouped by: Columns and Labels in First Row, because our variable is contained in
a column and the selected range includes its label. Furthermore, we have to indicate
where we want the output to appear; we suggest using the following option: New
Worksheet Ply: Uni_An_PIB (it is simply a name that helps us remember what is
inside that worksheet). Finally, we should select Summary Statistics and press OK:
3. Once OK is pressed, Excel 2007 will show:
Another interesting option available in the Data Analysis Tool is Rank and
Percentile. In this case, using the variable PIB, we can choose Output range among the Output
options and select, for example, the cell “Uni_An_PIB!$D$1”. In this way, the new
descriptive measures will appear in the same sheet in which we reported the previous ones.
Once OK is pressed, Excel 2007 will show:
Since the output is sorted in decreasing order, we can obtain the first, the second and the third
quartiles (note that the second quartile is the median, so in practice we already know its
value). In this case the number of observations is odd (n = 91) and n+1 = 92 is a multiple of 4,
so, once the ordered sample is obtained, the first quartile is the observation that occupies the
position 3(n+1)/4, the second quartile is the observation that occupies the position
2(n+1)/4=(n+1)/2, and the third quartile is the observation that occupies the position (n+1)/4,
with positions counted from the largest observation.
We can compute these three positions using Excel as a simple calculator. We go to
the cell “I1” and enter the formula “=3*(B15+1)/4”, where B15 is the cell
in which Excel 2007 reported the number of observations, n.
Next, in the cells “I2” and “I3”, we enter “=(B15+1)/2” and “=(B15+1)/4” respectively. The
results we obtain are 69, 46 and 23, and the corresponding quartiles are 470, 1690 and
7600. Clearly, we can report this new information in the sheet in which we are working, i.e.
Uni_An_PIB. For example:
Note: the previous problem can also be addressed using the built-in functions
PERCENTILE() or QUARTILE(). However, in this case Excel 2007 will compute Q1 and Q3 by
averaging the observations in the positions 68-69 and 23-24 respectively.
Note: The value of a quartile IS NOT given by the position, but by the numerical value of the
observation in that position.
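The position rule and the interpolation mentioned in the notes above can be checked outside Excel. The following Python sketch is only an illustration: the function names are hypothetical, and the interpolation at rank (n - 1)·k/4 is an assumption about how QUARTILE() works, chosen because it is consistent with the averaging of positions 68-69 and 23-24 described above.

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2, Q3 in a sample sorted in DECREASING order.

    Only valid when n + 1 is a multiple of 4, as with n = 91.
    """
    if (n + 1) % 4 != 0:
        raise ValueError("positions are integers only when n + 1 is a multiple of 4")
    return 3 * (n + 1) // 4, (n + 1) // 2, (n + 1) // 4


def quartiles_by_position(values):
    """Read Q1, Q2, Q3 directly from the decreasingly ordered sample."""
    ordered = sorted(values, reverse=True)
    p1, p2, p3 = quartile_positions(len(ordered))
    # The quartile is the VALUE found at the position, not the position itself.
    return ordered[p1 - 1], ordered[p2 - 1], ordered[p3 - 1]


def interpolated_quartile(values, k):
    """Quartile k (1, 2 or 3) by linear interpolation on the increasing sample.

    Assumption: fractional 0-based rank (n - 1) * k / 4, as QUARTILE() is
    understood to do.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * k / 4
    lower = int(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower]
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])


print(quartile_positions(91))  # (69, 46, 23)
```

With n = 91 the two approaches differ slightly for Q1 and Q3, because the interpolated rank falls halfway between two consecutive observations.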
The option Descriptive Statistics does not provide all the information you might want; for
example, if you are interested in the coefficient of variation, you have to compute it by applying
its definition, i.e. Standard Deviation/Mean. You can compute it in the cell “B22” using the formula
“=B7/B3”.
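As a cross-check of the definition, the same ratio can be sketched in Python. This is a minimal illustration on made-up numbers, not the PIB data; it assumes that the Standard Deviation reported by Descriptive Statistics is the sample version (divisor n - 1), which is what `statistics.stdev` computes.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (same ratio as =B7/B3)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical toy data, used only to show the call.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(coefficient_of_variation(sample), 4))
```

The coefficient is unit-free, so it can be used to compare the dispersion of variables measured on different scales.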
The data used in these computer classes also include a categorical variable. Suppose you are
interested in a descriptive analysis of the variable PIB, but only for the European countries. To
do this, when we use the options of the Data Analysis Tool, we must select only those
countries with value EU in the variable Zona.
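The idea of restricting the analysis to one category can be sketched in Python as follows. The rows and their values are invented for illustration (the real ones are in “Paises.xlsx”), and `values_for_zone` is a hypothetical helper name.

```python
import statistics

# Hypothetical toy rows; only the column names País, Zona and PIB come from the notes.
rows = [
    {"País": "A", "Zona": "EU", "PIB": 100.0},
    {"País": "B", "Zona": "AFR", "PIB": 20.0},
    {"País": "C", "Zona": "EU", "PIB": 300.0},
    {"País": "D", "Zona": "ASIA", "PIB": 50.0},
]


def values_for_zone(rows, zone, variable="PIB"):
    """Keep only the observations whose Zona equals the requested category."""
    return [row[variable] for row in rows if row["Zona"] == zone]


eu_values = values_for_zone(rows, "EU")
print(len(eu_values), statistics.mean(eu_values))
```

Unlike the range selection in Excel, this filter does not require the rows to be sorted by Zona beforehand.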
Note: in this case the observations are already sorted by the variable Zona. If this is
not the case, you can sort a database using the following steps:
1. Tab Home.
2. Sort & Filter.
3. Custom Sort.
4. Enter your preferences (in this case we want to sort alphabetically by the
variable Zona, and then by the label País):
Frequency tables and histograms: quantitative variables
In this section we show how to create frequency tables and histograms for quantitative
variables. To do this, we will use the variable ln(PIB), i.e., a variable that we built during the
first computer class (guide 0).
Go to a new sheet and rename it Hist_Freq_lnPIB. In the first column
we are going to compute the limits of the classes using the following rule:
• Number of observations: 91
• Minimum value: 4,38202663 → Consider 4,3
• Maximum value: 10,4359964 → Consider 10,5
• Range: 6,2
• Number of classes: 91^(1/2) = 9,53939201 → 9 or 10 classes.
Suppose you want to use 10 classes. How can we create them?
1. In the cell “B1” we compute the length of each interval with the formula “=(10,5-4,3)/10”:
2. In the cell “A4” we compute the upper limit of the first class, which is equal to
“min+length”, and which we can implement with the formula “=4,3+$B$1”:
3. Next, we compute the remaining upper limits. The first of these is equal to “previous upper
limit+length”, and indeed all of them can be computed with this rule. To compute it,
we go to the cell “A5” and enter the formula “=A4+$B$1”. Finally, we copy
the content of “A5” into the cells from “A6” to “A13”:
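The construction of the class limits can be sketched in Python as a check on the spreadsheet formulas. The function name is hypothetical; the rounded limits 4,3 and 10,5 and the choice of 10 classes are taken from the text (written here with decimal points).

```python
import math


def class_upper_limits(lower, upper, n_classes):
    """Return the class length and the upper limit of every class."""
    length = (upper - lower) / n_classes
    # lower + i * length avoids the rounding error that repeated addition
    # ("previous upper limit + length") would accumulate.
    return length, [lower + i * length for i in range(1, n_classes + 1)]


n_obs = 91
print(math.sqrt(n_obs))  # about 9.54, hence 9 or 10 classes

length, upper_limits = class_upper_limits(4.3, 10.5, 10)
print(round(length, 2))  # class length
print([round(u, 2) for u in upper_limits])
```

The last upper limit coincides with the rounded maximum, so every observation falls in one of the 10 classes.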
Once the upper limits of the classes are computed, we can build the frequency table and
draw the histogram:
1. Select Data Analysis Tool in the tab Data; select Histogram and press OK.
2. In the Histogram window we have to enter the Input range “Hoja1!$H$1:$H$92” and
the Bin range “$A$3:$A$13”, select Labels, enter the Output range (for example, we
select the cell “A15” in Hist_Freq_lnPIB) and select Chart output:
3. Once OK is pressed, Excel 2007 will show:
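To check the counts by hand, the binning can be sketched in Python. This sketch assumes that the Histogram tool assigns a value to the first class whose upper limit is greater than or equal to it, and collects anything above the last limit in an extra final row; `histogram_counts` is a hypothetical name and the data in the example are invented.

```python
from bisect import bisect_left


def histogram_counts(values, upper_limits):
    """Count values per class, each class being closed at its upper limit.

    upper_limits must be sorted in increasing order. Values above the last
    limit are counted separately.
    """
    counts = [0] * len(upper_limits)
    above_last = 0
    for value in values:
        # Index of the first upper limit that is >= value.
        index = bisect_left(upper_limits, value)
        if index == len(upper_limits):
            above_last += 1
        else:
            counts[index] += 1
    return counts, above_last


# Hypothetical toy data, not the ln(PIB) values.
print(histogram_counts([0.5, 1, 1.5, 2, 3, 4], [1, 2, 3]))  # ([2, 2, 1], 1)
```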
The results obtained can be improved in two ways:
I. We can offer more information about the frequencies.
II. We can improve the histogram.
In case I: starting from the absolute frequencies, we can compute the relative frequencies, the
cumulative absolute frequencies and the cumulative relative frequencies, as follows:
1. Copy the content of the cells “A15:B25” into “A30:B40” (the row of the last class is not
of interest); next, select the column Upper_limits, press the right mouse button and
select Insert… → Shift cells right. To provide more complete information, in this
new column we can compute the lower limits of the classes. We go to the cell
“A31” and enter the formula “=B31-$B$1”; finally, we copy the content of this cell into the
cells from “A32” to “A40”.
2. In the cells D31-D40 we compute the relative frequencies; in the cell “D31” we enter
the formula “=C31/Uni_An_PIB!$B$15”, and next we copy the content of this cell into the cells
from “D32” to “D40”.
3. In the cells E31-E40 we compute the cumulative absolute frequencies; in the cell “E31” we
simply enter the formula “=C31”; in the cell “E32” we enter the formula “=E31+C32”, and
next we copy the content of this cell into the cells from “E33” to “E40”.
4. In the cells F31-F40 we compute the cumulative relative frequencies. Because of the
structure of the table we are building, we can just copy the content of the cells E31-E40
into the cells F31-F40: the relative references shift one column to the right, so that, for
example, “=E31+C32” becomes “=F31+D32”. The final table is:
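The four frequency columns can also be reproduced outside Excel. The Python sketch below is an illustration with a hypothetical function name and invented counts. Note one deliberate difference from the spreadsheet: the cumulative relative frequencies are obtained by dividing the cumulative absolute frequencies by n rather than by summing the relative ones, which gives the same result while avoiding accumulated rounding error.

```python
from itertools import accumulate


def frequency_columns(absolute):
    """Return relative, cumulative absolute and cumulative relative frequencies."""
    n = sum(absolute)
    relative = [f / n for f in absolute]
    cumulative_absolute = list(accumulate(absolute))
    cumulative_relative = [c / n for c in cumulative_absolute]
    return relative, cumulative_absolute, cumulative_relative


# Hypothetical absolute frequencies for three classes.
relative, cum_abs, cum_rel = frequency_columns([2, 3, 5])
print(relative)
print(cum_abs)
print(cum_rel)
```

A quick sanity check is that the last cumulative absolute frequency equals n and the last cumulative relative frequency equals 1.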
In case II: the output of Excel 2007 is a histogram with gaps between the bars; however, our
classes are contiguous, and they share lower and upper limits. To join the bars:
1. Place the mouse pointer on a bar of the histogram, press the right button,
and select Format Data Series…:
2. In the new window, change the Gap Width to 0%, and the new histogram is:
Frequency tables and histograms: qualitative variables
In this section we show how to create frequency tables and bar plots for qualitative
variables. To do this, we will use the variable Zona.
Go to a new sheet and rename it Zona. In the first
column write the names of the modalities (categories) of the variable:
Zona
AFR
ASIA
AME
EU
Total
In the second column, in the cell “B2”, compute the absolute frequency of the modality “AFR”.
To do this we can use the function COUNTIF() in the following way:
“=COUNTIF(Hoja1!I$2:I$92;A2)”. Next, copy its content into “B3”, “B4” and “B5”.
To verify that everything is correct, in the cell “B6” compute the sum of the absolute frequencies
(“=SUM(B2:B5)”, or “=SUMA(B2:B5)” in the Spanish version of Excel):
Zona f
AFR 27
ASIA 24
AME 14
EU 26
Total 91
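The COUNTIF() step corresponds to counting the occurrences of each modality. A minimal Python sketch follows; since the column Zona itself is not reproduced in these notes, the list below is rebuilt from the frequencies in the table above and is only a stand-in for the real column.

```python
from collections import Counter

# Stand-in for the Zona column, rebuilt from the table above (assumption:
# the order of the observations does not matter for counting).
zona = ["AFR"] * 27 + ["ASIA"] * 24 + ["AME"] * 14 + ["EU"] * 26

frequencies = Counter(zona)
for modality in ["AFR", "ASIA", "AME", "EU"]:
    print(modality, frequencies[modality])

# Same check as the sum in cell B6: the total must equal the sample size.
total = sum(frequencies.values())
print("Total", total)
```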
Once we have these absolute frequencies, we already know how to compute the other types
of frequencies. Finally, let’s see how to create a bar plot for this variable:
1. Go to the cell “D1”, move to the tab Insert and select the option Column → 3-D
Clustered Column.
2. Excel 2007 will create an empty graph. To give Excel 2007 the data from which we
want to generate the bar plot, select the option Select Data.
3. In Chart data range enter “=Zona!$A$1:$B$5”, and the final graph is:
BoxPlots
In Excel 2007 there is no built-in procedure to make boxplots. It is better to use a macro
program. Specifically, the following macro makes boxplots:
http://www.cms.murdoch.edu.au/areas/maths/statsnotes/samplestats/BoxPlotMacro.xls
You have to allow the use of macros by removing the security restrictions of Excel 2007.
First, go to Excel Options -> Popular and tick (if it is not already ticked) Show Developer tab
in the Ribbon.
Then a new tab called Developer appears in the main menu. From this one, click Macro
Security:
Here, tick Enable all macros (not recommended…).
Obviously, in general, you must be careful if you run macros from untrusted websites (this is not
our case here)…
Run the macro:
And use it with a data file.
For example, in the file “Paises.xlsx” mark a column and run the boxplot macro from the Macro
tab.
You will obtain just
If you tick <Chart Title> and <Data scale…> you can modify the labels of the figure.