Measurement Math
DeShon - 2006
Univariate Descriptives
Mean
Variance, standard deviation
Skew & Kurtosis
If the distribution is normal, the mean and SD are sufficient statistics
\[ \mathrm{Var}(X) = S_x^2 = \frac{\sum (X_i - \bar{X})(X_i - \bar{X})}{n - 1} \]
\[ S_x = \sqrt{\frac{\sum (X_i - \bar{X})(X_i - \bar{X})}{n - 1}} \]
Normal Distribution
Univariate Probability Functions
Bivariate Descriptives
Mean and SD of each variable and the correlation (ρ) between them are sufficient statistics for a bivariate normal distribution
Distributions are abstractions or models
  Used to simplify
  Useful to the extent the assumptions of the model are met
2D – Ellipse or Scatterplot
Galton’s Original Graph
3D Probability Density
Covariance
Covariance is the extent to which two variables co-vary from their respective means
Case X Y x=X-3 y =Y-4 xy
1 1 2 -2 -2 4
2 2 3 -1 -1 1
3 3 3 0 -1 0
4 6 8 3 4 12
Sum 17
Cov(X,Y) = 17/(4-1) = 5.667
\[ \mathrm{Cov}(x, y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \]
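The four-case covariance example can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names `mean` and `covariance` are illustrative, and the divisor is assumed to be n - 1 as in the worked table.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 cases")
    x_bar, y_bar = mean(xs), mean(ys)
    cross_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross_products / (len(xs) - 1)


# The four cases from the table (means are 3 and 4)
X = [1, 2, 3, 6]
Y = [2, 3, 3, 8]

cov_xy = covariance(X, Y)
print(round(cov_xy, 3))  # 5.667, i.e. 17 / (4 - 1)
```

Calling `covariance(X, X)` returns the variance of X, which is the sense in which variance is the covariance of a variable with itself.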
Covariance
Covariance ranges from negative to positive infinity
Variance-covariance matrix
  Variance is the covariance of a variable with itself
\[ \begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(XY) & \mathrm{Cov}(XZ) \\ \mathrm{Cov}(XY) & \mathrm{Var}(Y) & \mathrm{Cov}(YZ) \\ \mathrm{Cov}(XZ) & \mathrm{Cov}(YZ) & \mathrm{Var}(Z) \end{pmatrix} \]
Correlation
Covariance is an unbounded statistic
Standardize the covariance with the standard deviations
-1 ≤ r ≤ 1
\[ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \]
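To illustrate the standardization step, the sketch below reuses the four cases from the covariance table and divides the covariance by the product of the standard deviations. It is an illustration under that assumption only; the helper names are hypothetical and not from the slides.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson r: covariance divided by the square root of Var(X) * Var(Y)."""
    var_x = sample_cov(xs, xs)  # variance is the covariance of X with itself
    var_y = sample_cov(ys, ys)
    return sample_cov(xs, ys) / math.sqrt(var_x * var_y)


X = [1, 2, 3, 6]
Y = [2, 3, 3, 8]

r = correlation(X, Y)
print(round(r, 3))  # 0.969
```

Unlike the covariance of 5.667, the result is bounded between -1 and 1, so it can be compared across variables measured on different scales.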
Correlation Matrix
Table 1. Descriptive Statistics for the Variables
                                       Correlations
Variables                    Mean  s.d.     1     2     3     4     5     6     7     8     9    10
 1. Self-rated cog ability   4.89   .86   .81
 2. Self-enhancement         4.03   .85   .34   .79
 3. Individualism            4.92   .89   .40   .41   .78
 4. Horiz individualism      5.19  1.05   .41   .25   .82   .80
 5. Vert individualism       4.65  1.11   .25   .42   .84   .37   .72
 6. Collectivism             5.05   .74   .21   .11   .08   .06   .06   .72
 7. Age                     21.00  1.70   .12   .01   .17   .13   .16   .01    --
 8. Gender                   1.63   .49  -.16  -.06  -.11   .07  -.11  -.02  -.01    --
 9. Academic seniority       2.17  1.01   .17   .07   .22   .23   .14   .06   .45   .12    --
10. Actual cog ability      10.71  1.60   .17  -.02   .08   .11   .03   .07  -.02  -.07   .12    --
Notes: N = 608; gender was coded 1 for male and 2 for female. Reliabilities (coefficient alpha) are on the diagonal.
Coefficient of Determination
r² = proportion of variance in Y accounted for by X (a percentage when multiplied by 100)
Ranges from 0 to 1 (positive only)
This number is a meaningful proportion
Other measures of association
Point Biserial Correlation
Biserial Correlation
Tetrachoric Correlation
  binary variables
Polychoric Correlation
  ordinal variables
Odds Ratio
  binary variables
Point Biserial Correlation
Used when one variable is a natural (real) dichotomy (two categories) and the other variable is interval or continuous
Just a normal correlation between a continuous and a dichotomous variable
Biserial Correlation
When one variable is an artificial dichotomy (two categories) and the criterion variable is interval or continuous
Tetrachoric Correlation
Estimates what the correlation between two binary variables would be if you could measure the variables on a continuous scale.
Example: difficulty walking up 10 steps and difficulty lifting 10 lbs.
[Figure: Difficulty Walking Up 10 Steps; level of difficulty split into "no difficulty" and "difficulty"]
Tetrachoric Correlation
Assumes that both “traits” are normally distributed
Correlation, r, measures how narrow the ellipse is.
a, b, c, d are the proportions in each quadrant
[Figure: ellipse divided into four quadrants labeled a, b, c, d]
Tetrachoric Correlation
For α = ad/bc,
Approximation 1:
\[ Q = \frac{\alpha - 1}{\alpha + 1} \]
Approximation 2 (Digby):
\[ Q = \frac{\alpha^{3/4} - 1}{\alpha^{3/4} + 1} \]
Tetrachoric Correlation
Example:
  Tetrachoric correlation = 0.61
  Pearson correlation = 0.41
  Assumes threshold is the same across people
  Strong assumption that underlying quantity of interest is truly continuous
                             Difficulty Walking Up 10 Steps
Difficulty Lifting 10 lb.    No     Yes    Total
No                           40     10      50
Yes                          20     30      50
Total                        60     40     100
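The 2 x 2 counts in this example can be pushed through the two approximations and the ordinary Pearson formula. The sketch below assumes Approximation 1 is (α - 1)/(α + 1) and Digby's approximation is (α^(3/4) - 1)/(α^(3/4) + 1), which is a reading of the garbled slide formulas rather than something the slides state legibly; the cell labels a, b, c, d are assumed to run row by row through the table. Neither closed form is the exact tetrachoric estimate, so they need not reproduce the 0.61 reported on the slide.

```python
import math

# Cell counts, read row by row from the example table
# (rows: difficulty lifting No / Yes; columns: difficulty walking No / Yes)
a, b, c, d = 40, 10, 20, 30

alpha = (a * d) / (b * c)  # cross-product ratio ad/bc

# Assumed form of Approximation 1
approx_1 = (alpha - 1) / (alpha + 1)

# Assumed form of Digby's approximation (exponent 3/4)
digby = (alpha ** 0.75 - 1) / (alpha ** 0.75 + 1)

# Pearson correlation computed on the two 0/1 variables (phi coefficient)
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

print(alpha)               # 6.0
print(round(approx_1, 3))  # 0.714
print(round(digby, 3))     # 0.586
print(round(phi, 3))       # 0.408, the slide's Pearson value of 0.41
```

The Pearson value on the dichotomized data is clearly smaller than either approximation, which is the contrast the example is meant to show.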
Odds Ratio
Measure of association between two binary variables
Risk associated with x given y.
Example: ratio of the odds of difficulty walking up 10 steps among people with difficulty lifting 10 lb to the odds among people without:
\[ OR = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)} = \frac{ad}{bc} = \frac{(40)(30)}{(20)(10)} = 6 \]
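As a check on the arithmetic, the odds ratio can be computed both from conditional odds and from the cross-product ad/bc. This sketch assumes p1 and p2 are the proportions reporting difficulty walking among people with and without difficulty lifting; the slides do not define them explicitly, so that labeling is an assumption.

```python
# Counts from the 2 x 2 example table
# (rows: difficulty lifting No / Yes; columns: difficulty walking No / Yes)
a, b = 40, 10  # lifting: No
c, d = 20, 30  # lifting: Yes


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


# Assumed definitions: proportion with walking difficulty in each lifting group
p1 = d / (c + d)  # among people with difficulty lifting
p2 = b / (a + b)  # among people without difficulty lifting

or_from_odds = odds(p1) / odds(p2)
or_cross_product = (a * d) / (b * c)

print(round(or_from_odds, 3), or_cross_product)  # 6.0 6.0
```

Both routes give 6: the odds of difficulty walking are six times higher for people who also have difficulty lifting.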
Pros and Cons
Tetrachoric correlation
  same interpretation as Spearman and Pearson correlations
  “difficult” to calculate exactly
  Makes assumptions
Odds Ratio
  easy to understand, but no “perfect” association that is manageable (i.e. {∞, -∞})
  easy to calculate
  not comparable to correlations
May give you different results/inference!
Dichotomized Data: A Bad Habit of Psychologists
Sometimes perfectly good quantitative data is made binary because it seems easier to talk about "High" vs. "Low"
  The worst habit is median split
  Usually the High and Low groups are mixtures of the continua
  Rarely is the median interpreted rationally
See references
  Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249-253.
  MacCallum, R.C., Zhang, S., Preacher, K.J., & Rucker, D.D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7, 19-40.
Simple Regression
The simple linear regression MODEL is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
describes how y is related to x
β0 and β1 are called parameters of the model.
ε is a random variable called the error term.
[Figure: path diagram with x, y, and e]
Simple Regression
Graph of the regression equation is a straight line.
β0 is the population y-intercept of the regression line.
β1 is the population slope of the regression line.
E(y) is the expected value of y for a given x value
\[ E(y) = \beta_0 + \beta_1 x \]
Simple Regression
[Figure: regression line, E(y) plotted against x; intercept β0; slope β1 is positive]
Simple Regression
[Figure: regression line, E(y) plotted against x; intercept β0; slope β1 is 0]
Estimated Simple Regression
The estimated simple linear regression equation is:
\[ \hat{y} = b_0 + b_1 x \]
The graph is called the estimated regression line.
b0 is the y intercept of the line.
b1 is the slope of the line.
ŷ is the estimated/predicted value of y for a given x value.
Estimation process
Regression Model: y = β0 + β1x + ε
Regression Equation: E(y) = β0 + β1x
Unknown Parameters: β0, β1
Sample Data: (x1, y1), ..., (xn, yn)
Estimated Regression Equation: ŷ = b0 + b1x
Sample Statistics: b0, b1
b0 and b1 provide estimates of β0 and β1
Least Squares Estimation
Least Squares Criterion
\[ \min \sum (y_i - \hat{y}_i)^2 \]
where:
y_i = observed value of the dependent variable for the ith observation
ŷ_i = predicted/estimated value of the dependent variable for the ith observation
Least Squares Estimation
Estimated Slope
\[ b_1 = \frac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n} = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} \]
Estimated y-Intercept
\[ b_0 = \bar{y} - b_1 \bar{x} \]
Model Assumptions
1. X is measured without error.
2. X and ε are independent.
3. The error ε is a random variable with mean of zero.
4. The variance of ε, denoted by σ², is the same for all values of the independent variable (homogeneity of error variance).
5. The values of ε are independent.
6. The error ε is a normally distributed random variable.
Example: Consumer Warfare
Number of Ads (X)    Purchases (Y)
1                    14
3                    24
2                    18
1                    17
3                    27
Example
Slope for the Estimated Regression Equation
\[ b_1 = \frac{220 - (10)(100)/5}{24 - (10)^2/5} = 5 \]
y-Intercept for the Estimated Regression Equation
\[ b_0 = 20 - 5(2) = 10 \]
Estimated Regression Equation
\[ \hat{y} = 10 + 5x \]
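The slope and intercept for the ads example can be reproduced with the computational formulas for b1 and b0. The following is a minimal sketch; the function name `least_squares` is illustrative and not from the slides.

```python
# Data from the example: number of ads (x) and purchases (y)
ads = [1, 3, 2, 1, 3]
purchases = [14, 24, 18, 17, 27]


def least_squares(xs, ys):
    """Return (b0, b1) for y-hat = b0 + b1 * x using the computational formulas."""
    n = len(xs)
    sum_x = sum(xs)                              # 10
    sum_y = sum(ys)                              # 100
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # 220
    sum_x2 = sum(x * x for x in xs)              # 24
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * (sum_x / n)            # y-bar minus b1 times x-bar
    return b0, b1


b0, b1 = least_squares(ads, purchases)
print(b0, b1)  # 10.0 5.0
```

Each additional ad is associated with five more predicted purchases, starting from a predicted 10 purchases at zero ads.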
Example
Scatter plot with regression line
[Figure: scatter plot of Purchases (0 to 30) against # of Ads (0 to 4), with the fitted line ŷ = 10 + 5x]
Evaluating Fit
Coefficient of Determination
SST = SSR + SSE
\[ \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \]
r² = SSR/SST
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Evaluating Fit
Coefficient of Determination
r² = SSR/SST = 100/114 = .8772
The regression relationship is very strong because 88% of the variation in number of purchases can be explained by the linear relationship with the number of TV ads
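The sums of squares behind r² = 100/114 can be recomputed from the ads data and the fitted line ŷ = 10 + 5x. This is a sketch for checking the arithmetic; the variable names are illustrative.

```python
# Data from the example and the fitted line y-hat = 10 + 5x
ads = [1, 3, 2, 1, 3]
purchases = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(purchases) / len(purchases)  # mean of y, 20.0
fitted = [b0 + b1 * x for x in ads]      # predicted purchases for each case

sst = sum((y - y_bar) ** 2 for y in purchases)            # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)               # due to regression
sse = sum((y - f) ** 2 for y, f in zip(purchases, fitted))  # due to error

r_squared = ssr / sst
print(round(r_squared, 4))  # 0.8772
```

Here SST = 114, SSR = 100, and SSE = 14, so the decomposition SST = SSR + SSE holds exactly for this data set.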
Mean Square Error
An Estimate of σ²
The mean square error (MSE) provides the estimate of σ²,
\[ s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2} \]
where:
\[ \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2 \]
Standard Error of Estimate
An Estimate of σ
To estimate σ we take the square root of s².
The resulting s is called the standard error of the estimate.
Also called “Root Mean Squared Error”
\[ s = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n - 2}} \]
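The slides do not work the MSE and standard error through for the ads example, so the sketch below does so under the assumption that the fitted line is ŷ = 10 + 5x and the divisor is n - 2. The variable names are illustrative.

```python
import math

# Data from the ads example and its fitted line y-hat = 10 + 5x
ads = [1, 3, 2, 1, 3]
purchases = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

n = len(ads)
residuals = [y - (b0 + b1 * x) for x, y in zip(ads, purchases)]

sse = sum(e ** 2 for e in residuals)  # sum of squared errors
mse = sse / (n - 2)                   # two parameters (b0, b1) were estimated
s = math.sqrt(mse)                    # standard error of the estimate

print(sse, round(mse, 3), round(s, 3))  # 14 4.667 2.16
```

A standard error of about 2.16 means observed purchases typically fall roughly two purchases away from the fitted line.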
Linear Composites
Linear composites are fundamental to behavioral measurement
  Prediction & Multiple Regression
  Principal Component Analysis
  Factor Analysis
  Confirmatory Factor Analysis
  Scale Development
Ex: Unit-weighting of items in a test
  Test = 1*X1 + 1*X2 + 1*X3 + … + 1*Xn
Linear Composites
Sum Scale
  ScaleA = X1 + X2 + X3 + … + Xn
Unit-weighted linear composite
  ScaleA = 1*X1 + 1*X2 + 1*X3 + … + 1*Xn
Weighted linear composite
  ScaleA = b1X1 + b2X2 + b3X3 + … + bnXn
Variance of a weighted Composite
       X          Y
X      Var(X)     Cov(XY)
Y      Cov(XY)    Var(Y)
\[ \mathrm{Var}(w_1 X + w_2 Y) = w_1^2\,\mathrm{Var}(X) + w_2^2\,\mathrm{Var}(Y) + 2 w_1 w_2\,\mathrm{Cov}(X, Y) \]
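The composite-variance identity, Var(w1*X + w2*Y) = w1²Var(X) + w2²Var(Y) + 2*w1*w2*Cov(X,Y), can be checked numerically. The sketch below reuses the four cases from the earlier covariance table; the weights w1 = 2 and w2 = 3 are hypothetical values chosen only for illustration, and the helper name `sample_cov` is not from the slides.

```python
def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1; sample_cov(xs, xs) is the variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


X = [1, 2, 3, 6]
Y = [2, 3, 3, 8]
w1, w2 = 2, 3  # hypothetical nominal weights

# Route 1: build the composite scores, then take their variance directly
composite = [w1 * x + w2 * y for x, y in zip(X, Y)]
var_direct = sample_cov(composite, composite)

# Route 2: combine the entries of the variance-covariance matrix
var_formula = (w1 ** 2 * sample_cov(X, X)
               + w2 ** 2 * sample_cov(Y, Y)
               + 2 * w1 * w2 * sample_cov(X, Y))

print(round(var_direct, 3), round(var_formula, 3))  # 152.667 152.667
```

Because the covariance term enters with weight 2*w1*w2, a component's contribution to the composite depends on its variance and its covariances as well as on its nominal weight.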
Effective vs. Nominal Weights
Nominal weights
  The desired weight assigned to each component
Effective weights
  the actual contribution of each component to the composite
  function of the desired weights, standard deviations, and covariances of the components
Principles of Composite Formation
Standardize before combining!!!!!
Weighting doesn’t matter much when the correlations among the components are moderate to large
As the number of components increases, the importance of weighting decreases
Differential weights are difficult to replicate/cross-validate
Decision Accuracy
[Table: Truth (Yes, No) by Decision (Fail, Pass); cells: True Positive, False Positive, False Negative, True Negative]
Signal Detection Theory
Polygraph Example
Sensitivity, etc…