Introduction to Statistical Models and Factoring

- dependent and independent models
- “traditional” and possible applications of independent models
- 3-way sampling problem
- basic steps in factor analyses

Dependent multivariate models

Dependent models are used when we divide our variables into “criterion” and “predictor” variables

the value of the criterion(ia) is “dependent” on the value of the predictor(s) -- statistically / causally

- simple reg: y’ = bx + a
- multiple reg: y’ = b_1 x_1 + b_2 x_2 + b_3 x_3 + a
- canonical reg: a + b_1 y_1 + b_2 y_2 = b_1 x_1 + b_2 x_2 + b_3 x_3 + a (each side has its own weights and constant)

x & y vars are quantitative, binary, coded, or interaction terms
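As a rough illustration of the multiple regression form above, the sketch below estimates the weights and the constant by ordinary least squares. The data, the "true" weights and all variable names are made up for the example; they are not from the course materials.

```python
import numpy as np

# Hypothetical data: 6 cases, 3 predictors (x1, x2, x3).
X = np.array([
    [1, 2, 0],
    [2, 1, 1],
    [3, 5, 2],
    [4, 3, 3],
    [5, 8, 1],
    [6, 4, 4],
], dtype=float)

# Made-up weights and constant, used only to build a criterion variable.
true_b = np.array([2.0, -1.0, 0.5])
true_a = 3.0
y = X @ true_b + true_a

# Add a column of ones so the constant "a" is estimated with the weights.
A = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b, a = coef[:3], coef[3]

# y' = b_1 x_1 + b_2 x_2 + b_3 x_3 + a
y_hat = X @ b + a
print(b, a)
```

Because the criterion here was built without error, the estimated weights simply recover the made-up values; with real data the fit would be imperfect.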

Dependent multivariate models, cont.

Dependent models …

- these are the General Linear Models from the “multivariate class”
- research questions/hypotheses are about which predictors (with what weightings) are useful for estimating what criteria

Independent multivariate models

Independent models are used when there is no “predictor vs. criterion” distinction among our variables.

The independent models we will examine are …

- Factor Analysis
- Cluster Analysis
- Multidimensional Scaling

Research questions are about the number and identity (interpretation) of groupings among the “things” being analyzed

Independent multivariate models: “Traditional” Uses

- Factor variables to find the number and identity of the different kinds of information
  e.g., factor the 25 questions from clients’ intake interviews
- Cluster people to find the number and identity of the different kinds of characteristic profiles
  e.g., cluster 250 students who took a standardized test
- Scale stimuli to find the “rules of stimulus similarity and dissimilarity”
  e.g., MDscale 24 shape stimuli

Before we get into the “alternative” uses of these models ...

3-way sampling problem

from a statistical or research design perspective “sampling” usually refers to the selection of some set of people from which data will be collected, for the purposes of representing what the results would be if data were collected from the entire population of people in which the researcher is interested

from a psychometric perspective “sampling” is a broader issue, with three dimensions

- sampling respondents to represent the desired population of individuals
- sampling attributes to represent some desired domain of characteristics
- sampling stimuli (things or people) to represent the desired category(ies) of objects

3-way sampling

Examples

- 20 patients each rate the complexity, meaningfulness and pleasantness of the 10 Rorschach cards
- 3 co-managers judge the efficiency, effectiveness, efficacy and elegance of the 15 workers they share
- 10 psychologists rate each of 30 clients on their amenability to treatment, dangerousness and treatment progress
- 200 respondents complete a 50 item self-report personality measure

[Figure: 3-way data cube with axes “People”, “Attributes” and “Stimuli”]

Let’s look at how “people”, “attributes” and “stimuli” are used...

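From the examples...

Example   People             Stimuli        Attributes
#1        20 patients        10 cards       complexity, meaningfulness, pleasantness
#2        3 co-managers      15 workers     efficiency, effectiveness, efficacy, elegance
#3        10 psychologists   30 clients     amenability, dangerousness, treatment progress
#4        200 respondents    1 -- “self”    50 items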

So, why is it called the 3-way sampling “problem” ???

one “problem” is that most data analysis models (both dependent and independent) start from a 2-way data set, most commonly a table with “cases” as rows and “variables” as columns

So, the 3-way data must be “prepared” for analysis, by either ...

- limited collection (only collect 2-way data -- only one person, one stimulus, or one attribute involved)
- selection (only use one 2-way “layer” from the 3-way sample)
- aggregation (combine across one “way” of the 3-way sample to get a 2-way layer)

Let’s look at an example of each ...

Examples of data prep...

limited collection -- only collect 2-way data (only one person, one stimulus, or one attribute involved)

Example -- 200 respondents complete a 50 item self-report personality measure. There is only one stimulus (“self” or “I”), so this is only a 2-way sampling (people x attributes).

[Figure: the 2-way data table, with 200 respondents as rows and 50 items as columns]


selection -- only use one 2-way layer from the 3-way sample

Example -- 20 patients each rate the complexity, meaningfulness and pleasantness of the 10 Rorschach cards.

[Figure: the 3-way data array, 20 patients x 10 cards x 3 attributes (complexity, meaningfulness, pleasantness)]

Imagine the researcher were interested in only the meaningfulness data. Only those data would be selected.

Example of selection, cont.

The resulting 2-way table would look like ...

[Figure: 2-way data table, 20 patients (rows) x 10 cards (columns); all data are meaningfulness ratings]
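A minimal sketch of selection, assuming the 3-way data are held in a NumPy array with axes (patients, cards, attributes). The ratings are random placeholders on a 1-7 scale, and the variable names are hypothetical.

```python
import numpy as np

# Placeholder ratings, not real data: 20 patients x 10 cards x 3 attributes.
rng = np.random.default_rng(0)
attributes = ["complexity", "meaningfulness", "pleasantness"]
ratings = rng.integers(1, 8, size=(20, 10, len(attributes)))

# Selection: keep only the 2-way "layer" for one attribute.
layer = attributes.index("meaningfulness")
meaning = ratings[:, :, layer]

print(meaning.shape)  # (20, 10): patients x cards
```

The other two layers are simply left out of the analysis; nothing is averaged or combined.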


aggregation -- combine across one “way” of the 3-way sample to get a 2-way layer

Example -- 3 co-managers judge the efficiency, effectiveness, efficacy and elegance of the 15 workers they share.

[Figure: the 3-way data array, 3 co-managers x 15 workers x 4 attributes (efficiency, effectiveness, efficacy, elegance)]

Imagine the researcher was interested in how the workers differed in terms of the attributes

Example of aggregation, cont.

In this case, the co-manager ratings would be considered “replications” of each other -- existing primarily to get more stable data (than one manager’s rating)

So, we would aggregate (take the mean) across the three co-managers for each attribute of each worker

The resulting 2-way table would look like...

[Figure: 2-way data table, 15 workers (rows) x 4 attributes (efficiency, effectiveness, efficacy, elegance); all data are average ratings]

A second example of aggregation

Imagine the researcher was interested in how the workers differed in the ratings given by the three co-managers

In this case, the attributes would be considered “replications” of each other -- existing primarily to get more stable data (than using one attribute)

So, we would aggregate (take the mean) across the four attribute ratings from each co-manager, for each worker

The resulting 2-way table would look like...

[Figure: 2-way data table, 15 workers (rows) x 3 co-managers (co#1, co#2, co#3); all data are average ratings]
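Both aggregation examples can be sketched with one hypothetical 3-way array, assuming axes (co-managers, workers, attributes) and random placeholder ratings. Only the axis being averaged over changes.

```python
import numpy as np

# Placeholder ratings, not real data: 3 co-managers x 15 workers x 4 attributes.
rng = np.random.default_rng(1)
ratings = rng.integers(1, 8, size=(3, 15, 4)).astype(float)

# First example: co-managers are "replications", so average across them.
workers_by_attributes = ratings.mean(axis=0)      # 15 workers x 4 attributes

# Second example: attributes are "replications", so average across them.
workers_by_managers = ratings.mean(axis=2).T      # 15 workers x 3 co-managers

print(workers_by_attributes.shape, workers_by_managers.shape)
```

Which "way" gets averaged away is a research decision about what counts as a replication, not something the data decide.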

Different ways of treating data for the different models

The 2-way data table we have been discussing is often labeled the “X” matrix

starting with “X”, different things are done to prepare the data for different models

Let’s look at these ..

Remember, we’ll start with the “traditional” uses of the different models, and then look at the different ways they can be used

Factor variables to find the number and identity of the different kinds of information

[Figure: X (“Cases” x “Variables”) -> R (“Variables” x “Variables”) -> S (“Variables” x “Factors”)]

“R” captures the relationships among the variables, which are summarized in the “S” (Structure) matrix, which provides the basis for deciding how many and what are the kinds of information the variables carry
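The X -> R -> S sequence can be sketched as follows. This is only one possible way to do it: it assumes a principal-components style extraction from the eigenvalues of R and an "eigenvalue greater than 1" rule for the number of factors, neither of which is specified here. The data are simulated so that six variables carry two kinds of information.

```python
import numpy as np

# Simulated X: 200 cases x 6 variables, built from two made-up latent factors.
rng = np.random.default_rng(2)
n_cases = 200
f1 = rng.normal(size=n_cases)
f2 = rng.normal(size=n_cases)
noise = rng.normal(scale=0.5, size=(n_cases, 6))
X = np.column_stack([f1, f1, f1, f2, f2, f2]) + noise

# R: correlations among the variables (variables x variables).
R = np.corrcoef(X, rowvar=False)

# Eigen-decomposition of R, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Assumed rule for the number of factors: eigenvalues greater than 1.
k = int(np.sum(eigvals > 1.0))

# S: variable-by-factor loadings (variables x factors), unrotated.
S = eigvecs[:, :k] * np.sqrt(eigvals[:k])
print(k)
print(np.round(S, 2))
```

In practice the loadings would usually be rotated before interpreting which variables "go together" on each factor.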

Cluster people to find the number and identity of the different kinds of characteristic profiles

[Figure: X (“Cases” x “Variables”) -> D (“Cases” x “Cases”) -> C (a “Cluster” membership number for each case), with a plot of the cluster profiles across the “Variables” on a -2 to 2 scale]

“D” captures the similarities and differences among the cases, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of people
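A small sketch of the X -> D -> C sequence. It assumes Euclidean distances for D and a naive average-linkage agglomeration for C; both are illustrative choices, and the helper `average_linkage` and the data are hypothetical.

```python
import numpy as np

def average_linkage(D, n_clusters):
    """Naive agglomerative clustering on a distance matrix (average linkage)."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean distance between all members of the two clusters.
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]   # merge the closest pair
        del clusters[b]
    C = np.empty(len(D), dtype=int)
    for label, members in enumerate(clusters, start=1):
        C[members] = label
    return C

# Made-up X: 10 cases x 4 variables, two clearly different profiles.
rng = np.random.default_rng(4)
profile_1 = np.array([2.0, 2.0, -2.0, -2.0])
profile_2 = np.array([-2.0, -2.0, 2.0, 2.0])
X = np.vstack([profile_1 + rng.normal(scale=0.3, size=(5, 4)),
               profile_2 + rng.normal(scale=0.3, size=(5, 4))])

# D: case-by-case Euclidean distances.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

# C: cluster membership for each case.
C = average_linkage(D, n_clusters=2)
print(C)
```

The mean profile of each cluster (for example `X[C == 1].mean(axis=0)`) is what would then be inspected to describe each "set" of people.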

Scale stimuli to find the “rules of stimulus similarity and dissimilarity”

[Figure: D (“Stimuli” x “Stimuli”) -> Map (stimuli 1-8 positioned in a 2-dimensional space with axes simple-complex and symmetrical-asymmetrical)]

“D” captures the patterns of similarities and dissimilarities among the stimuli (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of stimulus similarities and dissimilarities.

Independent multivariate models: “Alternative” Uses

As you might imagine, we are not limited to factoring variables, clustering people, and scaling stimuli

Any combination of “interest”, “data” and “model” is possible

So, there are really nine possible combinations (* marks the “traditional” uses)

             Factoring   Clustering   Scaling
Variables        *
People                       *
Stimuli                                  *

Independent multivariate models: “Alternative” Uses of Factoring

Factoring provides a geometric (spatial) model of a pattern of intercorrelations

- number of underlying dimensions and interpretation of each

The two most common types of factoring are…

- R-type factoring -- based on inter-variable correlations; factoring variables; number & kinds of variables with “similar information”
- Q-type factoring -- based on inter-person correlations; factoring people; number & kinds of persons with “similar characteristics”

Factor people to find sets of people that have “similar characteristics”

[Figure: X (“Cases” x “Variables”) -> Q (“Cases” x “Cases”) -> S (“Cases” x “Factors”)]

“Q” captures the relationships among the cases, which are summarized in the “S” (Structure) matrix, which provides the basis for deciding how many and what are the kinds of persons with similar characteristics
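The difference between the starting matrices for R-type and Q-type factoring can be sketched in two lines, assuming a cases x variables array of placeholder data. The same X is correlated by columns for “R” and by rows for “Q”.

```python
import numpy as np

# Placeholder X: 8 cases (rows) x 5 variables (columns).
rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))

# R-type: correlate the variables (columns) with each other.
R = np.corrcoef(X, rowvar=False)   # 5 x 5, variables x variables

# Q-type: correlate the cases (rows) with each other.
Q = np.corrcoef(X)                 # 8 x 8, cases x cases

print(R.shape, Q.shape)
```

Everything after this step (extracting and interpreting “S”) proceeds the same way; only the matrix being factored differs.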

Independent multivariate models: “Alternative” Uses of Clustering

Clustering provides a non-geometric (non-spatial) model of similarities and differences

- number of groups and description of each

The three most common types of clustering are…

- clustering people -- what we’ve looked at
- clustering variables -- alternative to factoring
- clustering stimuli -- alternative to MDScaling

Cluster variables to find sets of variables that have “similar characteristics”

[Figure: X (“Cases” x “Variables”) -> D (“Variables” x “Variables”) -> C (a “Cluster” membership number for each variable)]

“D” captures the similarities and differences among the variables, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of variables

Cluster stimuli to find sets of stimuli that have “similar characteristics”

[Figure: X (“Stimuli” x “Variables”) -> D (“Stimuli” x “Stimuli”) -> C (a “Cluster” membership number for each stimulus)]

“D” captures the similarities and differences among the stimuli, which are summarized in “C” (Cluster membership), which provides the basis for deciding how many and what are the “sets” of stimuli

Independent multivariate models: “Alternative” Uses of MDScaling

Scaling provides a geometric (spatial) model of a pattern of similarities and dissimilarities

- number of underlying dimensions and interpretation of each

The three types of scaling are…

- Scaling stimuli -- what we’ve looked at
- Scaling variables -- an alternative to factoring
- Scaling people -- an alternative to clustering

Scale variables to find the “groups of variables”

[Figure: D (“Items” x “Items”) -> Map (items 1-8 positioned in a 2-dimensional space)]

“D” captures the patterns of similarities and dissimilarities among the items (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of variable similarities and dissimilarities.

Scale people to find the “dimensions of person’s similarities and dissimilarities”

[Figure: D (“Cases” x “Cases”) -> Map (cases 1-8 positioned in a 2-dimensional space)]

“D” captures the patterns of similarities and dissimilarities among the people (can be from direct ratings or derived from “X”, more later), which are summarized in the “Map”, which provides the basis for deciding how many and what are the “rules” (dimensions) underlying the patterns of person’s similarities and dissimilarities.

Intro to MDScaling

- Short History
- Purpose & Uses of MDS
- Steps in MDS Research
- Types of MDS Models/Analyses

Short History

Classical Psychophysics

- Began as the search for the relationships between the physical world and the “inner life”
- Got boring quickly -- in the name of scientific rigor: (physical attribute) by (psychological attributes) plots
- Pretty sure this wasn’t the way to go ...
  - Unidimensional research of the multivariate world
  - Assumes we know the physical attributes and the psychological attributes that are important

Short History

MDS was an “extension/rebellion” of Classical Psychophysics

Sought some way to collect data to “capture” . . .

- the attributes/dimensions that underlie “thought”
- the values of stimuli (objects) on the dimensions/attributes

What might be the “basic data” for this process ??

- similarities/dissimilarities among the stimuli

A “solution” that can represent the information in those similarities would capture the “rules” underlying the decisions represented in those similarities

- a kind of data reduction, but one that “reveals underlying structure”

How MDScaling “Works”

The result of an MDS analysis is a “map” that positions the stimuli in a k-space based on the set of stimulus similarities

Ever see a map ??? Remember that “triangle” in the corner ???

[Figure: Map of 8 Cities, labeled a-h]

Inter-City Distances

      a    b    c    d    e    f    g
b    55
c    35   35
d    45   75   40
e    10   60   40   45
f    35   80   55   30   30
g    45   30   10   45   50   60
h    80   40   40   75   85   95   35

MDS is Map-making, only backwards

- Start with the pairwise distances (dissimilarities)
- Assume some “k” (# of dimensions; 1-6 in SPSS)
- Determine the positions of the stimuli in that k-space that best match the pairwise dissimilarities - the map
- Assess how well the solution represents the dissimilarities
  - R² -- for the dissimilarities and the solution distances; larger R² indicates a “better” solution
  - Stress -- family of “badness of fit” indices; smaller stress indicates a “better” solution

If the solution is good, the stimulus positions “reveal” the structure/rules underlying the original dissimilarities
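The “map-making, only backwards” idea can be sketched with the inter-city distances from the table above. This uses classical (Torgerson) metric MDS, which is a swapped-in technique chosen because it fits in a few lines; it is not the ALSCAL procedure in SPSS. The stress value is a Stress-1 style index that treats the raw distances as the targets, which is also an assumption made for the sketch.

```python
import numpy as np

# Lower triangle of the inter-city distance table (rows b..h).
labels = list("abcdefgh")
lower = [
    [55],
    [35, 35],
    [45, 75, 40],
    [10, 60, 40, 45],
    [35, 80, 55, 30, 30],
    [45, 30, 10, 45, 50, 60],
    [80, 40, 40, 75, 85, 95, 35],
]
n = len(labels)
D = np.zeros((n, n))
for i, row in enumerate(lower, start=1):
    D[i, :i] = row
D = D + D.T                      # full symmetric dissimilarity matrix

def classical_mds(D, k):
    """Classical (Torgerson) MDS: positions in k dimensions from distances."""
    m = len(D)
    J = np.eye(m) - np.ones((m, m)) / m          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    top = np.argsort(eigvals)[::-1][:k]          # k largest eigenvalues
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0, None))

coords = classical_mds(D, k=2)                   # the "map"

# Compare the original dissimilarities with the distances in the map.
map_dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
iu = np.triu_indices(n, k=1)
r2 = np.corrcoef(D[iu], map_dist[iu])[0, 1] ** 2
stress = np.sqrt(np.sum((D[iu] - map_dist[iu]) ** 2) / np.sum(map_dist[iu] ** 2))

for name, (x, y) in zip(labels, coords):
    print(name, round(x, 1), round(y, 1))
print("R2:", round(r2, 3), "stress:", round(stress, 3))
```

The recovered map is only determined up to rotation and reflection, so it may look flipped or turned relative to the original city map while still matching the distances.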

Purpose and Uses of MDS

The purpose is to provide a spatial representation of the pairwise dissimilarities among a set of stimuli

- the assumption is that by interpreting this “space” we can understand the bases (rules, attributes) of the similarities

Used to ask how people “think about” things

- similarities and differences among stimuli may be the most basic process of “thinking about”

MDS has been used to reveal the “rules” or “bases of judgement” for a wide variety of domains and populations

- can be used for composite, group comparison or individual differences types of analyses (more later)

Purpose and Uses of MDS, cont.

One very important use of MDS is to “check-up” on stimuli you are planning to use, especially if the differences among the stimuli are the intended IV manipulation

E.g., vignette studies -- often the IV is manipulated as the differences among the stories, cases, records. You are counting on (assuming) that you can anticipate…

- what differences between the stories will influence the DV (those you build in as the IVs)
- what differences between stories will not influence the DV (“unimportant” differences you built in to give the stories some “character” or “so they’re not all the same”)

Purpose and Uses of MDS, cont.

Remember in Factoring we noted that sometimes we learn as much about the variables as about the factors?

The same is true in MDScaling

- In addition to learning about the “underlying dimensions” of thought and comparisons, we often acquire unexpected information about individual stimuli
- they often get positioned in unexpected places
  - e.g., “influence” was the “root word” of 82 synonyms/antonyms, but wasn’t anywhere near the centroid of the space
  - e.g., “felt” -- “tactually flat” but “visually fluffy”

Steps in an MDS Study

Select the stimulus domain & stimuli

- be sure to “cover” the domain
- want 8-10 stimuli per expected dimension (rules vary !!)

Select the population(s) of interest

Select a data collection procedure

- direct scaling -- pairwise dissimilarities collected “first hand”
- indirect scaling -- pairwise dissimilarities computed from a set of attribute ratings (assumes you know the attributes being used !!!)
- often use both -- direct scaling data for the MDSolution and the indirect data for helping to interpret the “map”
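Indirect scaling can be sketched as follows, assuming the dissimilarity between two stimuli is the Euclidean distance between their standardized attribute ratings. That distance measure, the standardization and the ratings themselves are illustrative assumptions, not a prescribed procedure.

```python
import numpy as np

# Made-up mean ratings: 5 stimuli (rows) on 3 attributes (columns).
ratings = np.array([
    [1.0, 6.0, 3.0],
    [2.0, 5.0, 3.5],
    [6.0, 2.0, 4.0],
    [7.0, 1.0, 6.0],
    [4.0, 4.0, 1.0],
])

# Standardize each attribute so no single rating scale dominates.
z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

# Pairwise dissimilarities: Euclidean distance between attribute profiles.
D = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
print(np.round(D, 2))
```

The resulting matrix has the same stimuli x stimuli form as directly collected dissimilarities, so it can be passed to the same scaling step.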

Steps in an MDS Study, cont.

Determine the # dimensions for the “best map”

- “scree-like” plots using R² and Stress indices
- stability & replicability
- interpretability

Interpret the “map”

- visual inspection
- dimensional interpretation -- placing attribute vectors using multiple regression
- neighborhood interpretation -- identifying clusters of stimuli

Plan the next study

- what will be the result of deleting &/or adding stimuli ?
- what will be the result of a different population ?
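Dimensional interpretation by placing an attribute vector can be sketched as a multiple regression of the attribute ratings on the map coordinates. The coordinates and the “complexity” ratings below are invented for the illustration; the regression weights give the direction of the attribute vector in the map, and R² indicates how well the attribute fits the space.

```python
import numpy as np

# Hypothetical 2-dimensional map coordinates for 6 stimuli.
coords = np.array([
    [-2.0, 1.0],
    [-1.0, -1.5],
    [0.0, 0.5],
    [1.0, -0.5],
    [2.0, 1.5],
    [0.5, -1.0],
])

# Hypothetical mean "complexity" ratings, built to track dimension 1.
complexity = 4.0 + 1.5 * coords[:, 0] + np.array([0.1, -0.2, 0.0, 0.2, -0.1, 0.0])

# Regress the attribute on the map dimensions (with a constant).
A = np.column_stack([coords, np.ones(len(coords))])
weights, *_ = np.linalg.lstsq(A, complexity, rcond=None)

# Direction of the attribute vector in the map (unit length).
b = weights[:2]
direction = b / np.linalg.norm(b)

# R2: how well the map dimensions reproduce the attribute ratings.
pred = A @ weights
r2 = 1 - np.sum((complexity - pred) ** 2) / np.sum((complexity - complexity.mean()) ** 2)
print(np.round(direction, 3), round(r2, 3))
```

An attribute with a high R² and a vector lying close to one axis is a candidate label for that dimension; a low R² suggests the attribute is not one of the “rules” the map is capturing.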

Types of MDS models & analyses

There are four major types of MDS analyses

Composite Scaling (ALSCAL)

- Assumes everybody uses the same “rules” to form dissimilarities among the stimuli and that everybody quantifies stimuli the same way using those rules

Individual Differences Scaling (INDSCAL)

- Assumes everybody uses the same “rules” (or a subset of them) but provides for individual differences in the relative importance of those rules across people

Types of MDS models & analyses, cont.

There are four major types of MDS analyses

Group Comparison Analyses (need a priori groups)

- Form a composite solution for each “group” and compare them -- looking for similarities and differences in “rules” or “quantification” across the groups

Group Identification Analyses (“like” clustering)

- Identify groups of participants that have similar pairwise dissimilarity data -- these folks use similar rules/quantifications -- then identify the group
