christineshearer
Univariate Analysis
Simple Tools for Description
POLI 399/691 - Fall 2008, Topic 6
Description of Variables
- Univariate analysis refers to the analysis of one variable
- Several statistical measures can be employed to describe data
  - They allow for comparison across variables measured in different units
  - They provide parsimony: one or two statistics can help us understand a large number of cases
Basic descriptive tools
- Proportion: the share of cases relative to the whole population; the range is from 0 to 1
  - E.g. if there are 50 women in a sample of 125, then the proportion of women is 50/125 = 0.4
- Percentage: the proportion multiplied by 100
  - E.g. if the proportion is .40, then the percentage is .40 x 100 = 40%
- Percentage change allows us to calculate the relative change in a variable over some period of time
- Percentage change is: ((Time 2 - Time 1) / Time 1) x 100
  - E.g. in 1993 women made up 48% of the population and in 2003 this percentage had risen to 51%. What is the percentage change from 1993 to 2003?
  - ((51-48)/48) x 100 = (3/48) x 100 = 6.25% (it is not 3%)
- Percentage point difference is the absolute change between the percentage at time 1 and the percentage at time 2
  - Using the same example, the percentage point difference in the share of women in the population between 1993 and 2003 is 3 percentage points (X2-X1) (it is not 3%)
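The two calculations can be sketched in Python as follows. This is a minimal illustration; the function names are made up for this example and the inputs are the 1993 and 2003 figures from the slide.

```python
def percentage_change(time1: float, time2: float) -> float:
    """Relative change from time1 to time2, as a percentage of time1."""
    return (time2 - time1) / time1 * 100


def percentage_point_difference(pct1: float, pct2: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return pct2 - pct1


# Share of women in the population: 48% in 1993, 51% in 2003
print(percentage_change(48, 51))            # 6.25
print(percentage_point_difference(48, 51))  # 3
```

The first function divides by the starting value, which is why the answer is 6.25% rather than 3%.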
Frequency Table
- The frequency table (or frequency distribution) is commonly used to provide a "snapshot" of a variable
- Made up of 4 columns:
  - Values (categories) of the variable
  - The number of cases
  - The percentage of cases
  - The cumulative percentage of cases
- Consider collapsing categories if the variable has a large number of values/categories
Table 1: Frequency Table of Grouped Data – Ages of Respondents

Age Group      Frequency   Percentage   Cumulative Percentage
18-24                 36         15.0                    15.0
25-34                 44         18.3                    33.3
35-44                 43         17.9                    51.2
45-54                 46         19.2                    70.4
55-64                 34         14.2                    84.6
65 and over           37         15.4                   100.0
Total                240       100.0%                  100.0%

Source: Hypothetical Data, 2005.
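As a rough sketch, the percentage and cumulative percentage columns can be derived from the frequencies alone. The helper name `frequency_table` is hypothetical; the counts are copied from Table 1.

```python
def frequency_table(counts: dict[str, int]) -> list[tuple[str, int, float, float]]:
    """Return (category, frequency, percentage, cumulative percentage) rows."""
    total = sum(counts.values())
    rows = []
    running = 0  # cases seen so far, used for the cumulative column
    for category, n in counts.items():
        running += n
        rows.append((category, n, 100 * n / total, 100 * running / total))
    return rows


# Frequencies copied from Table 1
age_counts = {
    "18-24": 36, "25-34": 44, "35-44": 43,
    "45-54": 46, "55-64": 34, "65 and over": 37,
}

for category, n, pct, cum_pct in frequency_table(age_counts):
    print(f"{category:<12}{n:>5}{pct:>8.1f}{cum_pct:>8.1f}")
```

The cumulative column is computed from the running count rather than by adding rounded percentages, which avoids accumulating rounding error.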
Bar charts, pie charts and line graphs
- Bar charts or pie charts are good for showing the variation in the percentage of cases for each value of a variable
  - Pie charts: compare parts to the whole
  - Bar graphs: compare categories/values
- Line charts are good for longitudinal data
  - They reveal trends over time
Figure 1: Federal Expenditures by Sector
[Bar chart. Horizontal axis: Expenditure Type; vertical axis: Percentage (0 to 60). Bars: Social 38, Public Debt 25, Fiscal Arrangements 12, Defence 7, Gov't Operations 4, Other 14.]
Source: Hypothetical Data, 2006
Figure 2: Federal Expenditures by Sector
[Pie chart. Slices: Social 38%, Public Debt 25%, Fiscal Arrangements 12%, Defence 7%, Gov't Operations 4%, Other 14%.]
Source: Hypothetical Data, 2006
Figure 2: Share of Women among Party Leaders Selected by Year, 1980-2005
[Line graph. Vertical axis: Percentage (0 to 50). Series: percentage per year, and a 3-period moving average of the percentage per year.]
Source: O’Neill and Stewart, “Gender and Political Party Leadership in Canada,” Party Politics, forthcoming.
Table 8: Political Participation by Volunteer Type

                                        Religious     All Other     Non-
                                        Volunteers    Volunteers    Volunteers
Voted in last federal election          83.7          80.8          71.6
Voted in last provincial election       82.6          79.2          70.6
Voted in last municipal election        72.8          67.4          58.0
Follow news or current affairs daily    70.2          66.8          65.7

N (over 18 only): (509) 537 (1603) 1745 (5346)

Note: Entries are the percentage of respondents who reported engaging in said activity. All differences across the three groups are statistically significant (p<.01). Differences between religious and other volunteers in reported municipal voting are statistically significant (p<.05).
Source: Brenda O’Neill, “Canadian Women’s Religious Volunteerism: Compassion, Connections and Comparisons” in B. O’Neill and E. Gidengil, Gender and Social Capital, New York: Routledge, 2006.
Checklist for Charts and Tables
- Have you chosen the proper type of chart?
- Have you provided a clear, descriptive title? (Note the difference between “Table” and “Figure”)
- Is the data source noted in a footnote?
- Are statistical tests reported in a footnote?
- For bivariate tables, is the dependent variable on the vertical axis? The independent on the horizontal?
- Are the axes properly labelled?
- Will colour choices matter if printed in black and white?
- Have you provided values in bar/pie charts?
- Does the length of the axes distort the result?
- Have you referred to and explained the table/chart in the text?
Measures of Central Tendency
- Measures of central tendency allow us to speak of some “standard” case for all the cases in the sample or population
  - What is the most common unit?
  - Is there some pattern in the data?
- Three different measures: mean, median and mode
  - Nominal data? Use mode
  - Ordinal data? Use mode and/or median
  - Interval data? Use mode, median and/or mean
- The mean provides the most information; the mode, the least
  - Always use the statistic that provides the most information; the goal is parsimony
Mode
- For nominal data, the mode is the measure of the “standard” or “most common” case
- The mode is simply that category of the variable that occurs the most often (i.e. has the most cases)
- The mode is the “best guess” for nominal data
- The utility of this statistic is limited
  - Can change dramatically with the addition of a few cases (not very stable)
  - Tells us about the most common value but little else
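A short Python sketch of finding the mode, and of how unstable it can be. The party-preference responses are hypothetical data invented for this illustration.

```python
from collections import Counter

# Hypothetical nominal data: party preference of eight respondents
responses = ["Liberal", "NDP", "Conservative", "Liberal",
             "Green", "Liberal", "NDP", "Conservative"]

counts = Counter(responses)
mode_category, mode_count = counts.most_common(1)[0]
print(mode_category, mode_count)  # Liberal 3

# Adding just two cases is enough to change the mode
counts.update(["NDP", "NDP"])
print(counts.most_common(1)[0])  # ('NDP', 4)
```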
Figure 1: Federal Expenditures by Sector
[Bar chart repeated from earlier in the deck, with an arrow marking the tallest bar: the mode is Social Expenditures.]
Source: Hypothetical Data, 2006
Median
- Use with ordinal data
- Indicates the middle case in an ordered set of cases: the midpoint
- To determine the median, order the data from lowest to highest; the median is the value of the middle case
  - Even number of cases? Take the average of the two middle values (add them together and divide by 2)
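The procedure can be sketched in Python as below. The function is written out by hand to mirror the steps on the slide, it assumes a non-empty list, and the example scores are hypothetical.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)  # order the data from lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([3, 1, 4, 2, 5]))  # 3   (odd number of cases)
print(median([3, 1, 4, 2]))     # 2.5 (even number of cases)
```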
Mean
- The mean describes the centre of gravity of interval data
  - Commonly called the average
- Easily allows one to locate a case relative to all others
  - Where is a case located in relation to all the others? Above average? Below average?
- To calculate: ΣXi/n = (X1 + X2 + … + Xn)/n, where n = number of cases
- Reliable but sensitive to outliers (cases that are much larger or much smaller than the rest)
  - The median provides a better sense of the typical case when there are outliers
Example: Income data

Income for 10 cases
$24,000
$25,000
$28,000
$30,000
$35,000
    ← Median ($36,500) falls here
$38,000
$56,000
$75,000
$86,000
    ← Mean ($1,039,700) falls here
$10,000,000

- For these data, the mean is $1,039,700 and the median is $36,500
- We call a distribution with outliers a skewed distribution
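The effect of the outlier can be checked with Python's standard `statistics` module. This is only an illustration using the ten incomes listed on the slide.

```python
import statistics

# The ten incomes from the example
incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]

print(statistics.mean(incomes))    # 1039700
print(statistics.median(incomes))  # 36500.0

# Dropping the single outlier moves the mean far more than the median
without_outlier = incomes[:-1]
print(statistics.mean(without_outlier))
print(statistics.median(without_outlier))  # 35000
```

Nine of the ten cases fall below the mean, which is why the median is the better summary of a typical income here.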
Measures of Dispersion
- Once you know the standard case, you should also know how standard the case is; that is, how well does this one case represent all the cases?
- For nominal data, there is no measure of dispersion; one could simply indicate how many categories exist
- For ordinal data, the range provides some information about the spread of data
  - The range is simply the highest value minus the lowest value
- When we have outliers the range gives a distorted picture of the data
  - E.g. for our income data, the range is $10,000,000 - $24,000 = $9,976,000
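A brief Python sketch of the range for the income data, with and without the outlier, to show how much a single extreme case distorts it.

```python
# The ten incomes from the income example
incomes = [24_000, 25_000, 28_000, 30_000, 35_000,
           38_000, 56_000, 75_000, 86_000, 10_000_000]

value_range = max(incomes) - min(incomes)
print(value_range)  # 9976000

# Without the $10,000,000 outlier the range shrinks dramatically
trimmed = sorted(incomes)[:-1]
trimmed_range = max(trimmed) - min(trimmed)
print(trimmed_range)  # 62000
```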
- For interval data, we use the standard deviation
  - A measure of the average deviation of a case from the mean value
  - A deviation is the distance and direction of any raw score from the mean
  - The larger the deviation, the further the score is from the mean
  - The deviation can be either positive or negative (larger or smaller than the mean value)
- The mean is that value where the sum of negative deviations equals the sum of positive deviations
- We want to calculate the average size of these deviations, but we need to ‘fix’ the problem of the deviations summing to 0
- To fix the problem, we square each deviation before we sum them, divide the total by the number of cases, and then take the square root of the result
Formula for standard deviation

$$ \text{s.d.} = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}} $$

Note: N-1 is employed for a sample
To calculate the standard deviation:
1. Calculate the mean
2. Subtract the mean from each value (these are the deviations)
3. Square each of the deviations
4. Sum them (add them together)
5. Divide this sum by the number of cases (to get the average squared deviation)
6. Compute the square root of the average squared deviation
Table 8.10 Computation of Standard Deviation, Beth’s Grades

SUBJECT              GRADE    x – x̄             (x – x̄)²
Sociology            66       66 – 82 = –16     256
Psychology           72       72 – 82 = –10     100
Political science    88       88 – 82 = 6       36
Anthropology         90       90 – 82 = 8       64
Philosophy           94       94 – 82 = 12      144
MEAN                 82.0     TOTAL             600

$$ sd = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{600}{4}} = 12.25 $$

Note: The “N – 1” term is used when sampling procedures have been used. When population values are used the denominator is “N.” SPSS uses N – 1 in calculating the standard deviation in the DESCRIPTIVES procedure.
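The calculation steps can be sketched in Python using Beth's five grades. The function name and the `sample` flag are choices made for this illustration; `sample=True` uses the N - 1 denominator from the table, and `sample=False` uses N.

```python
import math


def standard_deviation(values, sample=True):
    """Follow the steps by hand: mean, deviations, squares, sum, divide, square root."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    sum_of_squares = sum(d ** 2 for d in deviations)
    denominator = n - 1 if sample else n  # N - 1 for a sample, N for a population
    return math.sqrt(sum_of_squares / denominator)


grades = [66, 72, 88, 90, 94]  # Beth's grades

print(round(standard_deviation(grades), 2))                # 12.25
print(round(standard_deviation(grades, sample=False), 2))  # 10.95
```

The sample version is always a little larger than the population version because it divides the same sum of squares by a smaller number.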
- The result is always a positive number, but you can think of the average deviation as occurring either positively or negatively
- The last measure to review is the variance
  - Variance is simply the square of the standard deviation
- Variance and standard deviation are easily calculated by software programs
  - Good to calculate them on your own for small samples to get a “feel” for the statistic
- These are two statistics that will be used again for other calculations
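As one example of software doing the work, Python's standard `statistics` module provides both measures. The sketch below reuses the five grades from the standard deviation table and checks that the variance equals the squared standard deviation.

```python
import statistics

grades = [66, 72, 88, 90, 94]  # the five grades from the standard deviation table

sample_variance = statistics.variance(grades)       # divides by N - 1
sample_sd = statistics.stdev(grades)                # square root of the sample variance
population_variance = statistics.pvariance(grades)  # divides by N

print(sample_variance)
print(round(sample_sd ** 2, 6))
print(population_variance)
```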
- The smaller the standard deviation, the tighter the cases are around the mean
  - The mean is a “better” predictor of scores when the standard deviation is small
- Like the mean, the standard deviation is also sensitive to outliers
- Describing data effectively requires information on both the mean and the standard deviation
Statistics and SPSS

Statistic            Nominal        Ordinal            Interval
Central Tendency     Mode           Mode               Mode
                                    Median             Median
                                                       Mean
Dispersion           --             Range              Range
                                                       Standard Deviation
                                                       Variance
SPSS Commands        Frequencies    Frequencies        Descriptives
(options)            (mode)         (range, median)    (all)

Source: Jackson and Verberg, p.222.
Z Scores (or standardized scores)
- A Z score represents the distance from the mean, in standard deviation units, of any value in a distribution
- Z scores are comparable across different populations and different units because they are offered in standard units
- The Z score formula is as follows:

$$ Z = \frac{X - \bar{X}}{sd} $$
- A negative z-score means the case falls below the mean; a positive one means it lies above the mean
  - A z-score of 0 means ….?
  - The larger the score, the further from the mean
- Useful when combining variables with very different ranges into indexes
  - Transform into Z scores and then create the index
- To obtain Z scores in SPSS
  - Select Analyze → Descriptive Statistics → Descriptives
  - Select one or more variables
  - Check “Save standardized values as variables” to save z scores as new variables. They will be the last variables in the variable view screen
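Outside SPSS, the same transformation can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes the sample (N - 1) standard deviation. The input is the set of five grades from the standard deviation example.

```python
import statistics


def z_scores(values):
    """Z = (X - mean) / sd, using the sample (N - 1) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


grades = [66, 72, 88, 90, 94]  # grades from the standard deviation example
for grade, z in zip(grades, z_scores(grades)):
    print(grade, round(z, 2))
```

Grades below the mean of 82 get negative scores and grades above it get positive ones. The standardized values themselves have a mean of 0 and a standard deviation of 1, which is what makes them comparable across variables.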
Key terms
- Proportion
- Percentage
- Percentage change
- Percentage point difference
- Bar chart
- Pie chart
- Frequency table
- Cumulative percentage
- Mean
- Median
- Mode
- Outlier
- Skewed distribution
- Measures of variation
- Range
- Standard deviation
- Variance
- Standardized (Z) scores