The Information School of the University of Washington
LIS 570
Session 7.1: Bivariate Data Analysis
LIS 570 Univariate Analysis
Mason; p. 2
Objectives
• Reinforce concept of standard error and the standard normal distribution (basis of confidence level and confidence interval)
• Understand different approaches to the analysis of bivariate data
• Gain confidence in use of SPSS
Agenda
• Review Central Limit Theorem
• Visualization of “confidence interval” and “confidence level”
• Overview of bivariate analysis approaches
• Exploratory data analysis using SPSS
Shapes of distribution
• Normal distribution (symmetrical): bell-shaped curve
• Positively skewed (asymmetrical): tail on the right, cluster towards low end of the variable
• Negatively skewed (asymmetrical): tail on the left, cluster towards high end of the variable
• Bimodality: a double peak
Central Limit Theorem
The CLT states: regardless of the shape of the population distribution (provided its variance is finite), as the sample size (N) becomes very large (approaches infinity), the sampling distribution of the sample mean (m) approaches a normal distribution with a mean of µ and a standard deviation of σ/√N.
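The theorem can be checked with a small simulation. The sketch below is illustrative only: it assumes an exponential population (chosen because it is clearly skewed, not because the slides use it), and the function name `sample_means` and all parameter values are hypothetical.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size `n` from a skewed (exponential) population
    with mean 1 and standard deviation 1, and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]

means = sample_means(n=100, trials=5000)
# CLT prediction: the mean of the sample means is near mu = 1 and their
# standard deviation is near sigma / sqrt(N) = 1 / 10 = 0.1
print(round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

A histogram of `means` should look roughly bell-shaped even though the population itself is strongly skewed.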
Standard Error of the Mean
Standard error of the mean (Sm)
Sm = S / √N
S = standard deviation
N = total number in the sample
– Standard error is inversely related to square root of sample size
– To reduce standard error, increase sample size
– Standard error is directly related to standard deviation
– When N = 1, standard error is equal to standard deviation
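A short sketch of the standard error formula, Sm = S / √N, showing how the standard error shrinks as the sample size grows. The function name `standard_error_of_mean` and the standard deviation of 10 are assumptions made for illustration.

```python
import math

def standard_error_of_mean(s, n):
    """Sm = S / sqrt(N): S is the standard deviation, N the sample size."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return s / math.sqrt(n)

# Hypothetical standard deviation of 10 at increasing sample sizes
for n in (1, 4, 16, 64):
    print(n, standard_error_of_mean(10.0, n))
```

Because of the square root, quadrupling the sample size halves the standard error, and at N = 1 the result equals the standard deviation itself.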
Inferential statistics - univariate analysis
Interval estimates and interval variables
• Estimation of sample mean accuracy, based on random sampling and probability theory
Standardize the sample mean to estimate population mean:
t = (sample mean – population mean) / estimated SE
Population mean = sample mean ± t * (estimated SE)
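The interval estimate can be sketched in a few lines. This example rests on a stated assumption: it uses the normal (z) critical value from Python's `statistics.NormalDist` as a large-sample stand-in for t, since the standard library has no t distribution. The function name `interval_estimate` and the input values are hypothetical.

```python
from statistics import NormalDist

def interval_estimate(sample_mean, estimated_se, confidence=0.95):
    """Return (low, high) bounds for the population mean.

    Assumption: the normal (z) critical value approximates t, which is
    reasonable only for large samples; use a t table for small ones.
    """
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = critical * estimated_se
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample mean of 50 with an estimated standard error of 2
low, high = interval_estimate(sample_mean=50.0, estimated_se=2.0)
print(round(low, 2), round(high, 2))
```

Raising `confidence` widens the interval, while a smaller standard error narrows it.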
Exercise—sampling distribution
• Coin tossing
• Probability of heads or tails: 50%
• Each of you is a “sample” for this activity.
• Flip the coin 9 times, count the # of times you get a “head”.
Live demo: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/index.html
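For readers without a coin or a classroom, the exercise can be approximated in code. This is a sketch only: the class size of 30, the seed, and the name `classroom_heads` are assumptions, not part of the original activity.

```python
import random
from collections import Counter

def classroom_heads(students, flips=9, seed=0):
    """Each student flips a fair coin `flips` times; return their head counts."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(flips))
        for _ in range(students)
    ]

counts = classroom_heads(students=30)
tally = Counter(counts)
# Text histogram: one '#' per student who got that many heads
for heads in range(10):
    print(heads, "#" * tally[heads])
```

With more students, the histogram should pile up around 4 and 5 heads and thin out towards 0 and 9.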
Standard Error (for nominal & ordinal data)
Variable must have only two categories (could combine categories to achieve this)
Standard error for binomial distribution:
SB = √(PQ / N)
P = the % in one category of the variable
Q = the % in the other category of the variable
N = total number in the sample
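A minimal sketch of the binomial standard error, SB = √(PQ / N), with P and Q expressed as percentages. The function name `binomial_standard_error` and the example figures are illustrative assumptions.

```python
import math

def binomial_standard_error(p_percent, n):
    """SB = sqrt(P * Q / N), with P and Q as percentages that sum to 100."""
    if not 0 <= p_percent <= 100:
        raise ValueError("P must be a percentage between 0 and 100")
    q_percent = 100.0 - p_percent
    return math.sqrt(p_percent * q_percent / n)

# Hypothetical sample: 50% in one category, 100 respondents
print(binomial_standard_error(50, 100))
```

The result is in percentage points. It is largest when the two categories are evenly split and shrinks as the split becomes more lopsided.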
Choosing the Statistical Technique*
Specific research question or hypothesis
→ Determine # of variables in question
→ Univariate analysis / Bivariate analysis / Multivariate analysis
→ Determine level of measurement of variables
→ Choose univariate method of analysis
→ Choose relevant descriptive statistics
→ Choose relevant inferential statistics
* Source: De Vaus, D.A. (1991) Surveys in Social Research. Third edition. North Sydney, Australia: Allen & Unwin Pty Ltd., p133
Methods of analysis (De Vaus, 134)
Univariate methods
• Frequency distributions
Bivariate methods
• Cross tabulations
• Scattergrams
• Regression
• Correlation
• Comparison of means
Association
• Example: gender and voting
– Are gender and party supported associated (related)?
– Are gender and party supported independent (unrelated)?
– Are women more likely than men to vote Republican? Are men more likely to vote Democrat?
Association
Association in bivariate data means that certain values of one variable tend to occur more often with some values of the second variable than with other values of that variable (Moore p. 242).
Cross Tabulation Tables
• Designate the X variable and the Y variable
• Place the values of X across the table
• Draw a column for each X value
• Place the values of Y down the table
• Draw a row for each Y value
• Insert frequencies into each CELL
• Compute totals (MARGINALS) for each column and row
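The steps for building a cross tabulation can be sketched with the standard library. The observations and the helper name `cross_tabulate` are hypothetical and serve only to show cells and marginals being computed.

```python
from collections import Counter

def cross_tabulate(pairs):
    """Build cell counts plus column (X) and row (Y) marginals from (x, y) pairs."""
    cells = Counter(pairs)
    column_totals = Counter(x for x, _ in pairs)
    row_totals = Counter(y for _, y in pairs)
    return cells, column_totals, row_totals

# Hypothetical observations: (X = gender, Y = party supported)
observations = [
    ("female", "A"), ("female", "B"), ("female", "A"),
    ("male", "B"), ("male", "B"), ("male", "A"),
]
cells, column_totals, row_totals = cross_tabulate(observations)

# One printed row per Y value, one column per X value, marginals at the edges
for y in sorted(row_totals):
    print(y, [cells[(x, y)] for x in sorted(column_totals)], row_totals[y])
print("Total", [column_totals[x] for x in sorted(column_totals)], len(observations))
```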
Determining if a Relationship Exists
• Compute percentages for each value of X (down each column)
– Base = marginal for each column
• Read the table by comparing values of X for each value of Y
– Read table across each row
• Terminology
– strong/weak; positive/negative; linear/curvilinear
Cross tabulation tables
Vote by Occupation (De Vaus pp 158-160)

                        Occupation
Vote          White collar      Blue collar       Total
              Freq    %         Freq    %
Democrat      270     27%       810     81%       1080
Republican    730     73%       190     19%       920
Totals        1000    100%      1000    100%      2000

Calculate percent: down each column
Read table: across each row
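The column percentages in this table can be reproduced from the frequencies alone. The sketch below uses the counts from the table; the dictionary layout and the function name `column_percentages` are assumptions made for illustration.

```python
# Frequencies from the table: {occupation: {vote: count}}
table = {
    "White collar": {"Democrat": 270, "Republican": 730},
    "Blue collar": {"Democrat": 810, "Republican": 190},
}

def column_percentages(table):
    """Percentage each cell down a column, using the column marginal as the base."""
    result = {}
    for column, cells in table.items():
        base = sum(cells.values())
        result[column] = {row: 100 * count / base for row, count in cells.items()}
    return result

percentages = column_percentages(table)
# Read across each row: compare the occupations for the same vote
for vote in ("Democrat", "Republican"):
    print(vote, [percentages[occupation][vote] for occupation in table])
```

Reading across the Democrat row compares 27% of white collar respondents with 81% of blue collar respondents, a large difference that points to an association between occupation and vote.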
Cross tabulation
• Use column percentages and compare these across the table
• Where there is a difference this indicates some association
Describing association
Direction: Positive - Negative
Strength: Strong - Weak
Nature: Linear - Curvilinear
Describing association
Two variables are positively associated when larger values of one tend to be accompanied by larger values of the other
The variables are negatively associated when larger values of one tend to be accompanied by smaller values of the other
(Moore, p. 254)
Describing association
Scattergram or scatterplot
Graph that can be used to show how two interval level variables are related to one another
[Figure: scatterplot; axis labels: weight, Age, Variable A, Variable B]
Description of Scattergrams
– Strength of Relationship
• Strong
• Moderate
• Low
– Linearity of Relationship
• Linear
• Curvilinear
– Direction
• Positive
• Negative
Description of scatterplots
Strength and direction
[Figure: four scatterplots of Y against X]
Description of scatterplots
Strength and direction
Nature
[Figure: four scatterplots of Y against X]
Correlation
• Correlation coefficient: number used to describe the strength and direction of association between variables
• Very strong = .80 through 1
• Moderately strong = .60 through .79
• Moderate = .50 through .59
• Moderately weak = .30 through .49
• Very weak to no relationship = 0 to .29
-1.00 Perfect Negative Correlation
0.00 No relationship
1.00 Perfect Positive Correlation
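The strength bands can be expressed as a small lookup. This is a sketch under two assumptions: the bands apply to the absolute value of the coefficient, and values that fall between the listed bands (for example .795) go to the lower band. The function name `describe_strength` is hypothetical.

```python
def describe_strength(r):
    """Label a correlation coefficient using the strength bands listed above.

    Assumptions: bands apply to the absolute value of r, and values
    between bands (e.g. .795) fall into the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient lies between -1 and 1")
    size = abs(r)
    if size >= 0.80:
        return "very strong"
    if size >= 0.60:
        return "moderately strong"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "moderately weak"
    return "very weak to no relationship"

for r in (0.8913, -0.65, 0.1):
    print(r, describe_strength(r))
```

The sign is ignored here because it describes direction, not strength.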
Correlation Coefficients
– Nominal
• Phi
• Cramer’s V
– Ordinal (linear)
• Gamma
– Nominal and Interval
• Eta
http://www.nyu.edu/its/socsci/Docs/correlate.html
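As an illustration of one coefficient from this list, phi can be computed directly from a 2x2 table of frequencies. The sketch below uses the standard phi formula for a table laid out as [[a, b], [c, d]]. Applying it to the occupation-by-vote frequencies from the earlier cross tabulation is an assumption made for the example, and the function name `phi_coefficient` is hypothetical.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

# Rows: Democrat, Republican; columns: white collar, blue collar
phi = phi_coefficient(270, 810, 730, 190)
print(round(phi, 3))
```

For nominal variables the sign of phi depends only on the order in which the categories are listed, so the magnitude is what carries the interpretation.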
Correlation: Pearson’s r
– Interval and/or ratio variables
– Pearson product moment coefficient (r)
• two interval variables, normally distributed
• assumes a linear relationship
• Can be any number from 0 to -1 or 0 to 1 (+1)
• Sign (+ or -) shows direction
• Number shows strength
• Linearity cannot be determined from the coefficient
e.g.: r = .8913
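A minimal sketch of the product moment calculation, written out by hand so that each step is visible. The data are invented for illustration, and the function name `pearson_r` is an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(spread_x * spread_y)

xs = [-2, -1, 0, 1, 2]
linear = pearson_r(xs, [2 * x + 1 for x in xs])  # straight-line relationship
curved = pearson_r(xs, [x ** 2 for x in xs])     # U-shaped relationship
print(linear, curved)
```

The U-shaped data are perfectly related yet produce a coefficient of zero, which is why a scattergram should be inspected alongside r: the coefficient alone does not reveal whether the relationship is linear.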
Summary
• Bivariate analysis
• crosstabulation
– X - columns
– Y - rows
• calculate percentages for columns
• read percentages across the rows to observe association
• Correlation and scattergram: describe strength and direction of association