Measurement Math DeShon - 2006. Univariate Descriptives Mean Mean Variance, standard deviation Variance, standard deviation Skew Kurtosis Skew Kurtosis

Measurement MathMeasurement MathDeShon - 2006DeShon - 2006

Univariate DescriptivesUnivariate Descriptives MeanMean Variance, standard deviationVariance, standard deviation

Skew & KurtosisSkew & Kurtosis If normal distribution, mean and SD If normal distribution, mean and SD

are sufficient statisticsare sufficient statistics

1

)( 2

nXXXX

SxVar ii 1

nXXXX

S ii

Normal DistributionNormal Distribution

Univariate Probability FunctionsUnivariate Probability Functions

Bivariate DescriptivesBivariate Descriptives Mean and SD of each variable and the Mean and SD of each variable and the

correlation (correlation (ρρ) between them are ) between them are sufficient statistics for a bivariate sufficient statistics for a bivariate normal distributionnormal distribution

Distributions are abstractions or Distributions are abstractions or models models Used to simplifyUsed to simplify Useful to the extent the assumptions of Useful to the extent the assumptions of

the model are metthe model are met

2D – Ellipse or Scatterplot2D – Ellipse or Scatterplot

Galton’sOriginal Graph

3D Probability Density3D Probability Density

CovarianceCovariance Covariance is the extent to which two Covariance is the extent to which two

variables co-vary from their respective meansvariables co-vary from their respective means

Case X Y x=X-3 y =Y-4 xy

1 1 2 -2 -2 4

2 2 3 -1 -1 1

3 3 3 0 -1 0

4 6 8 3 4 12

Sum 17

Cov(X,Y) = 17/(4-1) = 5.667

1

),(

nYYXX

yxCov ii

CovarianceCovariance Covariance ranges from negative to Covariance ranges from negative to

positive infinitypositive infinity Variance - Covariance matrixVariance - Covariance matrix

Variance is the covariance of a variable with Variance is the covariance of a variable with itselfitself

)(()()()()()()()(

ZVarYZCovXZCovYZCovYVarXYCovXZCovXYCovXVar

CorrelationCorrelation Covariance is an unbounded statisticCovariance is an unbounded statistic Standardize the covariance with the Standardize the covariance with the

standard deviations standard deviations -1 -1 ≤≤ rr ≤≤ 1 1

)()(),(YVarXVar

YXCovrp

Correlation MatrixCorrelation MatrixTable 1. Descriptive Statistics for the Variables

Correlations

Variables Mean s.d 1 2 3 4 5 6 7 8 9 10

Self-rated cog ability 4.89 .86 .81

Self-enhancement 4.03 .85 .34 .79

Individualism 4.92 .89 .40 .41 .78

Horiz individualism 5.19 1.05 .41 .25 .82 .80

Vert individualism 4.65 1.11 .25 .42 .84 .37 .72

Collectivism 5.05 .74 .21 .11 .08 .06 .06 .72

Age 21.00 1.70 .12 .01 .17 .13 .16 .01 --

Gender 1.63 .49 -.16 -.06 -.11 .07 -.11 -.02 -.01 --

Academic seniority 2.17 1.01 .17 .07 .22 .23 .14 .06 .45 .12 --

Actual cog ability 10.71 1.60 .17 -.02 .08 .11 .03 .07 -.02 -.07 .12 --

Notes: N = 608; gender was coded 1 for male and 2 for female. Reliabilities (Coefficient alpha) are on the diagonal.

Coefficient of DeterminationCoefficient of Determination rr2 = percentage of variance in Y = percentage of variance in Y

accounted for by Xaccounted for by X Ranges from 0 to 1 (positive only)Ranges from 0 to 1 (positive only) This number is a meaningful This number is a meaningful

proportionproportion

Other measures of associationOther measures of association Point Biserial CorrelationPoint Biserial Correlation Biserial CorrelationBiserial Correlation Tetrachoric CorrelationTetrachoric Correlation

binary variablesbinary variables Polychoric CorrelationPolychoric Correlation

ordinal variablesordinal variables Odds RatioOdds Ratio

binary variablesbinary variables

Point Biserial CorrelationPoint Biserial Correlation Used when one variable is a natural Used when one variable is a natural

(real) dichotomy (two categories) (real) dichotomy (two categories) and the other variable is interval or and the other variable is interval or continuouscontinuous

Just a normal correlation between a Just a normal correlation between a continuous and a dichotomous continuous and a dichotomous variablevariable

Biserial CorrelationBiserial Correlation When one variable is an artificial When one variable is an artificial

dichotomy (two categories) and the dichotomy (two categories) and the criterion variable is interval or criterion variable is interval or continuous continuous

Tetrachoric CorrelationTetrachoric Correlation Estimates what the Estimates what the

correlation between correlation between two binary variables two binary variables would be if you could would be if you could measure variables on measure variables on a continuous scale.a continuous scale.

ExampleExample: difficulty : difficulty walking up 10 steps walking up 10 steps and difficulty lifting and difficulty lifting 10 lbs.10 lbs.

Difficulty Walking Up 10 Steps

Level of Difficultyno difficulty difficulty

Tetrachoric CorrelationTetrachoric Correlation Assumes that both Assumes that both

“traits” are normally “traits” are normally distributeddistributed

Correlation, Correlation, rr, , measures how narrow measures how narrow the ellipse is.the ellipse is.

a, b, c, d are the a, b, c, d are the proportions in each proportions in each quadrantquadrant

a

cd

b

Tetrachoric CorrelationTetrachoric Correlation

For For αα = ad/bc, = ad/bc,Approximation 1:Approximation 1:

Approximation 2 (Digby):Approximation 2 (Digby):

Q

11

Q

3 4

3 4

11

Tetrachoric CorrelationTetrachoric Correlation Example:Example:

Tetrachoric correlation Tetrachoric correlation = 0.61= 0.61

Pearson correlation = Pearson correlation = 0.410.41

o Assumes threshold is Assumes threshold is the same across the same across peoplepeople

o Strong assumption Strong assumption that underlying that underlying quantity of interest is quantity of interest is truly continuoustruly continuous

Difficulty Walking Up10 Steps

No Yes

Difficulty Lifting 10 lb.

No 40 10 50

Yes 20 30 50

60 40 100

Odds RatioOdds Ratio Measure of Measure of

association between association between two binary variablestwo binary variables

Risk associated with x Risk associated with x given y.given y.

Example:Example: odds of difficulty odds of difficulty

walking up 10 steps to walking up 10 steps to the odds of difficulty the odds of difficulty lifting 10 lb:lifting 10 lb:

OR p pp p

adbc

1 1

2 2

11

40 302 0 10 6

/ ( )/ ( )

( )( )( )( )

Pros and ConsPros and Cons Tetrachoric correlationTetrachoric correlation

same interpretation as Spearman and Pearson same interpretation as Spearman and Pearson correlationscorrelations

““difficult” to calculate exactlydifficult” to calculate exactly Makes assumptionsMakes assumptions

Odds RatioOdds Ratio easy to understand, but no “perfect” association easy to understand, but no “perfect” association

that is manageable (i.e. {that is manageable (i.e. {∞∞, -, -∞∞}})) easy to calculateeasy to calculate not comparable to correlationsnot comparable to correlations

May give you different May give you different results/inference!results/inference!

Dichotomized Data: Dichotomized Data: A Bad Habit of PsychologistsA Bad Habit of Psychologists

Sometimes perfectly good quantitative data Sometimes perfectly good quantitative data is made binary because it seems easier to is made binary because it seems easier to talk about "High" vs. "Low"talk about "High" vs. "Low" The worst habit is median splitThe worst habit is median split

Usually the High and Low groups are mixtures of the Usually the High and Low groups are mixtures of the continuacontinua

Rarely is the median interpreted rationallyRarely is the median interpreted rationally See referencesSee references

Cohen, J. (1983) The cost of dichotomization. Cohen, J. (1983) The cost of dichotomization. Applied Applied Psychological MeasurementPsychological Measurement, 7, 7, , 249-253.249-253.

McCallum, R.C., Zhang, S., Preacher, K.J., Rucker, D.D. McCallum, R.C., Zhang, S., Preacher, K.J., Rucker, D.D. (2002) On the practice of dichotomization of quantitative (2002) On the practice of dichotomization of quantitative variables. variables. Psychological Methods, 7Psychological Methods, 7, 19-40., 19-40.

Simple RegressionSimple Regression The The simple linear regression MODELsimple linear regression MODEL is: is:

yy = = 00 + + 11xx + +

describes how y is related to xdescribes how y is related to x 00 and and 11 are called are called parameters of the modelparameters of the model.. is a random variable called theis a random variable called the error term error term..

x y

e

Simple RegressionSimple Regression

Graph of the regression equation is a Graph of the regression equation is a straight line.straight line.

ββ0 is the population is the population y-y-intercept of the intercept of the regression line.regression line.

ββ11 is the population slope of the is the population slope of the regression line.regression line.

EE((yy) is the expected value of ) is the expected value of yy for a for a given given xx value value

xyyE 10ˆ)(

xyE 10)(


EE((yy))

xx

Slope Slope 11is positiveis positive

Regression lineRegression line

InterceptIntercept00


EE((yy))

xx

Slope Slope 11is 0is 0

Regression lineRegression lineInterceptIntercept

00

Estimated Simple RegressionEstimated Simple Regression The The estimated simple linear regression estimated simple linear regression

equationequation is: is:

The graph is called the estimated regression The graph is called the estimated regression line.line.

bb0 is the 0 is the yy intercept of the line. intercept of the line. bb1 is the slope of the line.1 is the slope of the line. is the estimated/predicted value of is the estimated/predicted value of yy for a for a

given given xx value. value.

0 1y b b x

y

Estimation processEstimation processRegression ModelRegression Model

yy = = 00 + + 11xx + +Regression EquationRegression Equation

EE((yy) = ) = 00 + + 11xxUnknown ParametersUnknown Parameters

00, , 11

Sample Data:Sample Data:x yx yxx11 y y11. .. . . .. . xxnn yynn

EstimatedEstimatedRegression EquationRegression Equation

Sample StatisticsSample Statistics

bb00, , bb11

bb00 and and bb11provide estimates ofprovide estimates of

00 and and 11

0 1y b b x

Least Squares EstimationLeast Squares Estimation Least Squares CriterionLeast Squares Criterion

where:where:yyii = = observedobserved value of the dependent value of the dependent variable for the ivariable for the ithth observation observationyyii = = predicted/estimatedpredicted/estimated value of the value of the dependent variable for the idependent variable for the ithth observationobservation

^

min (y yi i )2

Least Squares EstimationLeast Squares Estimation Estimated SlopeEstimated Slope

EstimatedEstimated y y-Intercept-Intercept

)(),(

/)(/)(

221 XVarYXCov

nxxnyxyx

bii

iiii

xbyb 0

Model AssumptionsModel Assumptions1.1. X is measured without error.X is measured without error.2.2. X and X and are independent are independent3.3. The error The error is a random variable with is a random variable with

mean of zero.mean of zero.4.4. The variance of The variance of , denoted by , denoted by 22, is the , is the

same for all values of the independent same for all values of the independent variable (homogeneity of error variance).variable (homogeneity of error variance).

5.5. The values of The values of are independent. are independent.6.6. The error The error is a normally distributed is a normally distributed

random variable.random variable.

Example: Consumer WarfareExample: Consumer Warfare

Number of Ads (X)Number of Ads (X) Purchases (Y)Purchases (Y)11 141433 242422 181811 171733 2727

ExampleExample Slope for the Estimated Regression EquationSlope for the Estimated Regression Equation

bb11 = 220 - (10)(100)/5 = 5 = 220 - (10)(100)/5 = 5 24 - (10)24 - (10)22/5/5

yy-Intercept for the Estimated Regression -Intercept for the Estimated Regression EquationEquation bb00 = 20 - 5(2) = 10 = 20 - 5(2) = 10

Estimated Regression EquationEstimated Regression Equationyy = 10 + 5 = 10 + 5xx^

ExampleExample Scatter plot with regression lineScatter plot with regression line

y = 10 + 5x

0

5

10

15

20

25

30

0 1 2 3 4# of Ads

Purc

hase

s

^

Evaluating FitEvaluating Fit Coefficient of DeterminationCoefficient of Determination

where:where: SST = total sum of squaresSST = total sum of squares SSR = sum of squares due to regressionSSR = sum of squares due to regression SSE = sum of squares due to errorSSE = sum of squares due to error

SST = SSR + SSESST = SSR + SSE

( ) ( ) ( )y y y y y yi i i i 2 2 2^^

rr22 = SSR/SST = SSR/SST

Evaluating FitEvaluating Fit Coefficient of DeterminationCoefficient of Determination

rr22 = SSR/SST = 100/114 = .8772 = SSR/SST = 100/114 = .8772

The regression relationship is very The regression relationship is very strong because 88% of the variation in strong because 88% of the variation in number of purchases can be explained number of purchases can be explained by the linear relationship with the by the linear relationship with the between the number of TV adsbetween the number of TV ads

Mean Square ErrorMean Square Error An Estimate of An Estimate of 22

The mean square error (MSE) provides the The mean square error (MSE) provides the estimate of estimate of 22, ,

SS22 = MSE = SSE/(n-2) = MSE = SSE/(n-2)

where:where:

210

2 )()ˆ(SSE iiii xbbyyy

Standard Error of EstimateStandard Error of Estimate An Estimate of SAn Estimate of S

To estimate To estimate we take the square root of we take the square root of 22.. The resulting S is called the The resulting S is called the standard error of standard error of

the estimatethe estimate.. Also called “Root Mean Squared Error”Also called “Root Mean Squared Error”

2SSEMSE

n

s

Linear CompositesLinear Composites Linear composites are fundamental to behavioral Linear composites are fundamental to behavioral

measurementmeasurement Prediction & Multiple RegressionPrediction & Multiple Regression Principle Component AnalysisPrinciple Component Analysis Factor AnalysisFactor Analysis Confirmatory Factor AnalysisConfirmatory Factor Analysis Scale DevelopmentScale Development

Ex: Unit-weighting of items in a testEx: Unit-weighting of items in a test Test = 1*X1 + 1*X2 + 1*X3 + … + 1*XnTest = 1*X1 + 1*X2 + 1*X3 + … + 1*Xn

Linear CompositesLinear Composites Sum ScaleSum Scale

ScaleScaleAA = X = X11 + X + X22 + X + X33 + … + X + … + Xnn

Unit-weighted linear compositeUnit-weighted linear composite ScaleScaleAA = 1*X = 1*X11 + 1*X + 1*X22 + 1*X + 1*X33 + … + 1*X + … + 1*Xnn

Weighted linear compositeWeighted linear composite ScaleScaleAA = b = b11XX11 + b + b22XX22 + b + b33XX33 + … + b + … + bnnXXnn

Variance of a weighted CompositeVariance of a weighted Composite

XX YY

YY Var(X)Var(X) Cov(XY)Cov(XY)

YY Cov(XY)Cov(XY) Var(Y)Var(Y)

),(*2)(*)(*)( 212221

21 YXCovwwXVarwXVarwYXVar

Effective vs. Nominal WeightsEffective vs. Nominal Weights Nominal weightsNominal weights

The desired weight assigned to each The desired weight assigned to each componentcomponent

Effective weights Effective weights the actual contribution of each the actual contribution of each

component to the compositecomponent to the composite function of the desired weights, function of the desired weights,

standard deviations, and covariances of standard deviations, and covariances of the componentsthe components

Principles of Composite FormationPrinciples of Composite Formation

Standardize before combining!!!!!Standardize before combining!!!!! Weighting doesn’t matter much when Weighting doesn’t matter much when

the correlations among the components the correlations among the components are moderate to largeare moderate to large

As the number of components As the number of components increases, the importance of weighting increases, the importance of weighting decreasesdecreases

Differential weights are difficult to Differential weights are difficult to replicate/cross-validatereplicate/cross-validate

Decision AccuracyDecision AccuracyTr

uth

Yes

No

DecisionFail Pass

TruePositive

False Positive

FalseNegative

TrueNegative

Signal Detection TheorySignal Detection Theory

Polygraph ExamplePolygraph Example Sensitivity, etc…Sensitivity, etc…

Documents

Measurement Math DeShon - 2006. Univariate Descriptives Mean Mean Variance, standard deviation Variance, standard deviation Skew Kurtosis Skew Kurtosis